Introduction

This paper is intended to highlight the promise of the emerging field of comparative neuro-primatology and to propose informatics tools and interdisciplinary directions that will open up new avenues of research for ethology and neuroscience. Computational modeling, and specifically dyadic/social brain modeling, can be used to integrate, extend, and test theories from both neuroscientific and behavioral fields. However, there are considerable theoretical and practical challenges to building realistic neuro-computational models of social behavior – both capturing the social elements of behavior, and making the most of the limited data that is currently available.

To illustrate the challenges of this integrative modeling approach, we propose a computational model based on the gestural communication of great apes. Gesture – in particular gesture acquisition – provides an excellent case study in social brain modeling because it raises issues that would be problematic for modeling any social behavior. For example, how do the brains of interacting agents process shared events differently? How do agents respond to behavioral changes in others, and how are these changes reflected in brain activations and/or adaptive synaptic wiring? Are there dedicated neural structures or pathways for social interaction, or do primates largely rely on domain-general regions for social cognition? Focusing on gesture also grants us empirical purchase as ape gestural behavior has been studied both experimentally and observationally, and manual action production and recognition systems in monkeys are fairly well characterized at the neural level. Finally, the added learning component of gesture acquisition forces us to consider both immediate and lasting changes in the neural organization of behavior.

The learning process we discuss – ontogenetic ritualization – has been proposed as a mechanism through which great apes may acquire new communicative gestures through the mutual shaping of action, resulting in a stable, but non-arbitrary gestural form. Modeling the process of ontogenetic ritualization provides several distinct challenges that must be confronted. The model must be able to account for the fact that ontogenetic ritualization is (i) a dynamic process in which (ii) multiple individuals process and respond to the interaction differently, while the interaction itself (iii) changes and develops over time.

Constructing a model of the dyadic interactions of the social brain requires integration of data across multiple datasets, methodologies, and disciplines, and thus places unique demands on informatics tools and resources. Data management tools and techniques for integrating resources must focus on efficiently navigating questions of homologies between species, finding the appropriate granularity of data for modeling projects, and producing simulation results specific enough to test existing frameworks and offer novel hypotheses. We highlight the need for new and more integrated resources for researchers operating in these highly interdisciplinary fields, and offer new suggestions and challenges for the neuroinformatics community.

Primate Social Behavior

All animals face the challenges of finding and obtaining food, water, shelter, and suitable mates while, at the same time, minimizing injury from competitors or predators. For social animals, these physical challenges arise in an abstract network of social relationships that often impact an individual’s success, and which must therefore be tracked, fostered, and exploited. The need for such socio-cognitive abilities likely provided a strong selection pressure that helped shape both brain structure and cognitive skill in the primate lineage (Byrne and Whiten 1988; Dunbar 1998; Sallet et al. 2011). The study of the primate brain has only begun to explore the neural correlates of these socio-cognitive abilities, but new developments in brain imaging and neurophysiological designs allow neural activity to be measured in both human and non-human primates during social interaction. For this paper, we focus on non-human primate – henceforth “primate” – data, but recognize the substantial insights that can be gained from human lesion and neuroimaging studies (Adolphs 2010; Amodio and Frith 2006; Schilbach et al. 2012). Combining insights from neural and behavioral studies promises to greatly increase the scope of the questions that may be addressed.

Ape Gesture

Great apes – gorillas, bonobos, chimpanzees and orangutans – are proficient at copying manual actions, though their skills in this area are limited relative to those of humans (Byrne and Russon 1998; Dean et al. 2012). The ability to acquire manual skills through observation of conspecifics likely plays a role in the development of group-specific behaviors in both wild and captive populations. ‘Local traditions’ (behaviors restricted to particular populations or subgroups of individuals) involving the presence or variation of manual actions, such as tool use, food processing, and grooming, have been reported in both wild and captive great apes (Byrne 2004; Hobaiter and Byrne 2010; van Schaik et al. 2003; Whiten et al. 2001), and provide evidence that apes can develop ‘cultures’ of behavior (Whiten et al. 1999).

Surprisingly, manual gestures do not show the same levels of inter-group variability that manual actions do. Studies of ape gesture consistently report that the majority of gestures are either “species typical” (i.e., used by members of a species regardless of what geographic site they inhabit) or idiosyncratic and therefore produced by only a single individual—and presumably recognized by at least one other (Genty et al. 2009; Hobaiter and Byrne 2011; Liebal et al. 2006; Pika et al. 2003). There have been some reported differences in gesture form or use between apes at different research sites, but the predominant pattern is one of similarity across sites, with most of the gestures observed at site A also observed at sites B and C. The gestural repertoires of individual apes at different sites typically overlap as much as those of apes at the same site (Call and Tomasello 2007). Additionally, gestural repertoires typically vary more strongly between age classes than between sites, with juveniles displaying largely different repertoires from adults. The proportion of apes using a particular gesture does vary between sites – at some sites a gesture will be used by the majority of individuals, while at others it will be used by only a few – but the composition of the repertoire itself varies little (Genty et al. 2009; Hobaiter and Byrne 2011). There are some exceptions to the ‘species-typical or idiosyncratic’ characterization of gesture use. A few group-specific gestures have been observed in orangutans (Cartmill 2008; Liebal et al. 2006), gorillas (Genty et al. 2009; Pika et al. 2003), chimpanzees (Nishida 1980), and bonobos (Pika et al. 2005). These studies suggest the existence of group-specific gestures that may result from social learning (Arbib et al. 2008), though some have argued that reports of relatively infrequent gestures observed only in one group may simply be an artifact of under-sampling (Genty et al. 2009). However, if a gesture is used frequently in one group and rarely or never in another, a strong case can be made for a local ‘gesture culture.’

A significant problem with this approach of comparing gesture use across sites is that unless all data are collected and coded according to the same criteria, gestures at multiple sites may vary in how they are defined and recorded. This, in turn, may lead to inaccurate estimates of the repertoire overlap between groups. Comparing gestures across multiple sites and species is crucial to understanding gestural communication in primates, and new informatics approaches to integrating data gathered at different sites are needed to make significant progress in this field. We return to this issue in the final section.

Primate Neurophysiology

We are concerned with building a bridge between ape social behavior and its underlying neural circuitry. However, while there are data sets on single cell recordings in monkeys (especially in macaques), no such data are available for apes (although brain imaging data are now becoming available). Thus, our strategy is indirect, extending our understanding of brain processes in monkeys to construct a framework for modeling the social behavior of apes. In this section we will describe neurophysiological studies on macaques that have linked neural activity to both manual behaviors and cognitive abilities. In a later section we review a key set of computational models that link vision and action and describe the neurophysiological data. We examine the assumptions and limitations of these models and ask: “what properties must be added to macaque models to support brain modeling of ape (social) behavior?”

Primate neurophysiology has addressed some aspects of manual and social behavior in monkeys, but has traditionally relied on ‘passive’ designs that do not require interaction between the subject and another individual. For instance, the research on ‘mirror neurons’ has always been passively social in that neural responses could be elicited by observing the performance of other individuals (di Pellegrino et al. 1992; Gallese et al. 1996) rather than through interaction. Neurons were found in premotor (and later in parietal) areas whose activity during an individual’s own performance of a particular action was similar to their activity when observing another individual – usually a human researcher – performing a more-or-less similar action. In this way, mirror neurons can be driven by social variables, but the experimental designs do not require the monkeys to differentially ‘use’ this information, and so cannot assess how these responses affect downstream targets.

These passive designs can be contrasted with explicitly social or ‘interactive’ designs, more recently developed, that require the subject to directly interact with other entities, whether computer agents (Lee et al. 2005; Seo et al. 2009; Seo and Lee 2007) or conspecifics (Azzi et al. 2011; Chang et al. 2012; Fujii et al. 2008; Yoshida et al. 2011, 2012). These interactive designs have led to new insights into how the brain is organized to process specifically social information, how this information affects downstream targets, and how interaction between a monkey and another agent places unique demands on processing structures within the brain. For instance, neurons in medial frontal cortex, in and around pre-SMA, have been found to be ‘other’-responsive – instead of firing both when an action is performed by oneself and when observing another, as the ‘mirror neurons’ above do, these fire only during observation of another’s actions (Yoshida et al. 2011). Orbito-frontal cortex (OFC) neurons, recorded in monkeys playing interactive games, revealed modulations encoding social influences on motivation and reward processing (Azzi et al. 2011). OFC and anterior cingulate cortex (ACC) neurons, in a separate but similarly ‘interactive’ study, were shown to differentially process how rewards were allocated between others and oneself, with the ACC gyrus appearing important for processing the ‘shared experience’ of rewards (Chang et al. 2012). Together, these and other data demonstrate that social behaviors are becoming increasingly accessible to neurophysiological study in monkeys, not just in a ‘passively social’ sense but within tasks demanding back-and-forth exchanges. Additionally, the emergence of functional monitoring via PET, fMRI and other neuroimaging techniques adapted to non-human primates is most promising. These have the double advantage of being non-invasive while being of the same ‘format’ as the techniques most widely used in humans, easing comparison of data across species.

For example, non-invasive EEG and ERP studies have recently been applied to the production and comprehension of vocal communicative behaviors in apes (Hirata et al. 2011; Ueno et al. 2008). These techniques complement the emerging use of PET in functional brain monitoring (Parr et al. 2009; Taglialatela et al. 2011) and of eye-tracking, an indirect measure of attentional processing, in apes (Kano and Tomonaga 2009). Combining functional data with mathematical techniques to understand these indirect measures of brain activity in terms of neural firing allows researchers to ‘convert’ data between domains of analysis, including making non-invasive functional data more compatible with neuro-computational analysis (fMRI: Arbib et al. 2000; PET: Arbib et al. 1995; ERP: Barrès et al. 2013). All the above methods can be combined with the use of structural imaging techniques such as DTI (Hecht et al. 2012; Ramayya et al. 2010; Rilling et al. 2008), MRI (Sakai et al. 2011), and other imaging, anatomic and cytoarchitectonic methods comparing primate brains (Hopkins et al. 2010; Keller et al. 2012; Rilling et al. 2011; Schenker et al. 2010). Given the difficulty of directly assessing brain function in apes, it is necessary for researchers to use existing data from all available techniques to develop more complete models of primate neural processing during social behavior.

In this paper, we present the design of a brain-based conceptual model – to be followed with a fully implemented computational model elsewhere – aimed at testing a proposed learning process through which great apes may develop manual communicative gestures. Despite a dearth of direct neural data for gestural communication in great apes (Taglialatela et al. 2011), we have several reasons to focus on gesture acquisition as a test case for modeling social cognition. Firstly, the proposed learning process – ontogenetic ritualization – rests on repeated interactions between pairs of individuals, thus demanding a direct treatment of social interaction. Secondly, computational modeling of primate manual gesture intersects with a growing body of work on the mirror system and has implications for understanding the origins of human language. The Mirror System Hypothesis (MSH: Arbib 2005, 2008, 2010, 2012) makes explicit claims about the evolution of brain function throughout the hominid line, and the concomitant capacity for social learning and flexibility in intentional communication, which made the human brain ‘language-ready’. Although others have adopted a neuro-evolutionary approach to communication (e.g., Aboitiz 2012; Corballis 2002; Deacon 1997), MSH is unique in explicitly grounding the evolutionary account in the computational description of macaque neural processing (including ‘mirror neuron’ systems) and ape behavior (including gesture). It is within this MSH framework that we approach our case study, emphasizing the computational description of brain function to formalize hypotheses on gesture acquisition.

In order to properly contextualize our proposed model, we first outline the claims of ontogenetic ritualization, and then provide details on primate brain mechanisms known to be important for manual and social tasks (especially those formalized in computational models). We then describe our model – a conceptual analysis of the proposed process of ontogenetic ritualization – and discuss those features important for the field of ‘dyadic/social brain modeling’. Finally, we consider the impact of incorporating observational, experimental, and computational approaches in the study of the social brain, and conclude with a discussion of issues related to data management and sharing that will support future interdisciplinary collaborations.

Ontogenetic Ritualization

Ontogenetic ritualization (OR) is the proposed process of ritualizing movements of ‘effective’ actions (those that directly alter the behavior of other individuals) into communicative signals aimed at eliciting particular responses in others (Tomasello and Call 2007; Tomasello and Camaioni 1997). During this process of ritualization, a movement such as shoving another out of the way becomes ritualized over time into a ‘nudge’ as the actor learns that only the beginning of the movement is necessary to elicit the desired behavior in the recipient, and as the recipient learns to respond to the gesturer using only the initial movements of the action. However, according to this process, the actor and recipient form different associations resulting from their respective roles in the interactions – indeed, the recipient may only be able to perceive but not produce the gesture, and vice versa (Genty et al. 2009). The degree to which OR plays a role in the acquisition of ape gestures is debated (Genty et al. 2009; Perlman et al. 2012; Tomasello and Call 2007). Here, we do not take a strong stance on whether OR is the main acquisition mechanism for ape gesture, but we do note that OR could explain those species-typical (and not just idiosyncratic) gestures whose relation to species-typical actions is readily derivable through the OR process. We propose a model of the cognitive and neural changes that, we hypothesize, could support OR. It is our hope that such modeling work will make it possible to identify the conditions under which OR is a plausible mechanism for gesture acquisition, while simultaneously generating hypotheses for new behavioral and neuroimaging experiments that test social and communicative behaviors more broadly.

The process of ontogenetic ritualization is described by Call and Tomasello (2007) as proceeding in three steps (Fig. 1, left):

Fig. 1

Processes resulting in ontogenetic and assisted ritualization. (Left) Ontogenetic (naïve) ritualization yields a gestural form through the mutual shaping of behavior between individuals A and B. In each iteration, individual A begins with goal G, performs some action X involving individual B, and B fulfills the goal G through action Y (shaded box). Over time, B performs Y in response to shorter and shorter segments of X, resulting in A producing the ritualized gesture XR (last boxes). (Right) Assisted ritualization is similar to ontogenetic ritualization, with the exception that individual B ‘guides’ the behavior of individual A by inferring the goal G and modeling or facilitating the performance of XR. Here, the shaping is primarily unidirectional (B shaping A), whereas at left, the shaping is bidirectional and makes fewer assumptions about the mental states of the interacting agents

(1) Individual A performs behavior X (not a communicative signal), and individual B consistently reacts by doing Y.

(2) Subsequently, B anticipates A’s overall performance of X by starting to perform Y before A completes X.

(3) Eventually, A anticipates B’s anticipation and produces an initial portion of X in a ritualized form XR in order to elicit Y.

Of particular relevance to social brain modeling is that this is a dyadic learning process – it requires differential learning in the brains of A and B, which reflects the changing patterns of interaction between them throughout the ritualization process.

Ontogenetic ritualization is thought to underlie the development of some human gestures, but the process in humans differs in some important ways from the ape process we focus on here (Fig. 1, right). The palm-up ‘gimme’ gesture or the ‘arms up’ gesture in which an infant raises his arms to indicate a desire to be picked up are good examples of ritualized human gestures (Bruner et al. 1982; Clark 1978). Initially, these types of gestures occur only in the immediate context of the actions they are derived from – a 9-month-old infant will use the ‘arms up’ gesture only when an adult behaves as though she is about to pick him up (perhaps only following the adult’s contact under the arms of the child). Over time, however, the gestures become more removed from these particular contexts so that a 13-month-old infant might use the gesture according to his own desires to request rather than facilitate being picked up (Lock 2001). Thus, ‘arms up’ emerges as a sign used with communicative intentions.

Fig. 2

Data-driven model development. Our proposed model is based on the functionality of many previous computational models and on an analysis of where model integration is possible and where model performance requires ‘extensions’ of its computational powers. (a) A schematic of primate manual control and recognition, based on the MNS (Oztop and Arbib 2002) model of action recognition and the FARS (Fagg and Arbib 1998) model of action production in the macaque. Note that mirror neuron responses are limited to grasping actions directed at objects, and manual control is similarly limited to object-directed actions; the model would fail to respond to (simulated) intransitive movements. Shaded areas correspond to putative anatomical localization. (b) A schematic of simple ‘addition’ of the models discussed in the text, including MNS, FARS, and ACQ (Bonaiuto and Arbib 2010), and of novel connections between modules, including greater postural (intransitive) and tactile-based action recognition, expanded postural control of limbs, and socially-motivated decision-making. Note that shaded regions correspond to the primary architecture of previous models, and not anatomical localization

In human infants, this ritualization process may be more accurately described as assisted ritualization (Zukow-Goldring and Arbib 2007) because the adult recipient is monitoring and reinforcing seemingly communicative behavior in the infant, and in many cases the adult already has an idea of what the final gestural form should look like (XR in Fig. 1, right) – because, for example, it already exists in the culture. Thus, the process and speed of ritualizing an action into a gesture becomes driven by the recipient.

In the case study we consider, we restrict ourselves to the first interpretation of ontogenetic ritualization as a naïve interactive process through which a sign emerges, rather than a process in which a sign is shaped by a more knowledgeable individual. It may be the case that experienced primates play a more active role in shaping the behavior of others, as humans are known to do (see Ferrari et al. 2009), but here we focus on the simpler, unassisted version of ontogenetic ritualization in which each participant is naïve as to what the final form of the gesture will be.

Action, Perception and Cognition in the Brain

To fully represent the changes in the brain of each participant during ontogenetic ritualization, our model must minimally incorporate brain structures critical for (i) the visuo-motor control necessary for action and gesture, (ii) recognizing and responding to the actions of others, and (iii) motivating social interactions between conspecifics – as well as considering how learning affects each. We now review some known primate brain systems for visually-guided grasping, action-recognition, and decision-making, and outline their proposed computational properties. In a later section, we will suggest how these brain mechanisms supporting praxic actions directed at changing the physical state of an object could provide a basis for extended circuitry that also supports communicative actions (e.g., gestures) directed at changing the behavior of a conspecific.

Visually-Guided Grasping

The FARS model (Fagg and Arbib 1998) has been offered as a computational description of manual visuo-motor coordination in the macaque brain. Based on neurophysiological data, FARS describes the fronto-parietal circuitry macaques use when they reach for and grasp objects (so-called reach-to-grasp actions). Parietal structures on the dorsal stream extract the ‘affordances’ of the world relevant to the grasp (i.e., the physical and spatial properties of the object to be grasped) and forward these to premotor cortex for selection of an appropriate grasping action (Jeannerod et al. 1995). A ventral object-recognition path allows prefrontal structures to select an appropriate motor program when working memory or task structure provides relevant constraints. The model additionally invokes interaction between prefrontal cortex and the basal ganglia when a sequence of actions is required to complete the overall task. This computational description of primate manual control – well supported by contemporary accounts of brain function (Cisek 2007; Cisek and Kalaska 2010) – can help frame our model of gesture learning. It is important to point out, however, that such an ‘affordance-driven’ description must be complemented with a description of the control structures participating in guiding hand motions without explicit physical targets, as would occur during the performance of visually intransitive gestures (as opposed to tactile gestures like the ‘nudge’ example discussed previously).
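To make this division of labor concrete, the sketch below illustrates the spirit of such affordance-driven selection: a dorsal, ‘AIP-like’ stage proposes candidate grasps from object properties, and an ‘F5-like’ stage selects among them under a ventral/prefrontal task bias. This is a minimal illustration, not the published FARS implementation; the object properties, bias values, and numeric weights are all invented for the example.

```python
# A minimal sketch of affordance-driven grasp selection in the spirit of FARS.
# NOT the published implementation; all values below are illustrative assumptions.

OBJECT = {"width_cm": 3.0, "shape": "cylinder"}    # visual input to the dorsal path
TASK_BIAS = {"precision": 0.1, "power": 0.4}       # ventral/prefrontal constraint

def extract_affordances(obj):
    """Dorsal ('AIP-like') stage: map object properties to candidate grasps."""
    affordances = {}
    if obj["width_cm"] < 2.0:
        affordances["precision"] = 0.9             # small objects afford precision grips
    affordances["power"] = 0.6 if obj["shape"] == "cylinder" else 0.3
    affordances.setdefault("precision", 0.4)
    return affordances

def select_grasp(affordances, bias):
    """Premotor ('F5-like') stage: competition between biased affordances."""
    activation = {g: a + bias.get(g, 0.0) for g, a in affordances.items()}
    return max(activation, key=activation.get)

print(select_grasp(extract_affordances(OBJECT), TASK_BIAS))   # -> 'power'
```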

Action-Recognition

The MNS, for Mirror Neuron System (Oztop and Arbib 2002), and MNS2 (Bonaiuto et al. 2007) models build on the computational description of manual action in FARS to describe the recognition component of mirror neuron responses. These models have suggested that mirror neurons learn their property of ‘action parity’ – responding similarly for production and recognition – by learning the visual trajectory of the hand in relation to objects for actions already in the agent’s repertoire, combining signals of visual feedback during the course of generating an action with the efferent motor commands controlling that action (Oztop and Arbib 2002). These models formalize how neurons in parietal and premotor regions can learn to recognize a range of movements associated with a given reach-to-grasp action, and illuminate how action recognition in macaques may be supported by these parietal-premotor circuits. In our analysis of ape gesture learning, the MNS class of models provides sufficient machinery for recognizing affordance-driven actions like reach-to-grasp – crucial, as we will see, for the learning that must occur in the initial stages of ritualization. However, again because of the model’s emphasis on the relation of the hand to an explicit physical target, the MNS models (like the FARS model for action generation) are not flexible enough to account for the movements associated with known ape gestures. Thus, the MNS model for the macaque must be extended by additional visual-processing machinery to recognize movements not explicitly directed towards objects, and likewise for FARS (see Fig. 2). And while monkey reaching and grasping behavior has long been studied (Georgopoulos et al. 1981; Jeannerod and Biguer 1982; Rizzolatti et al. 1987; Taira et al. 1990), non-human primate gestural control has not (though for studies of, e.g., apraxia in humans, see: Buxbaum et al. 2005; Petreska et al. 2007; Rothi et al. 1991). This gap can be partially bridged by generating testable hypotheses derived from computational models (e.g., how do apes maintain approximate visual form when no explicit targets are available?).
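The core computational idea – matching an unfolding hand-object trajectory against learned templates, with recognition possible before the action completes – can be sketched as follows. This toy example is our own illustration, not the MNS or MNS2 code; the templates, similarity measure, and threshold are assumptions.

```python
import numpy as np

# Toy sketch of MNS-style recognition: stored hand-relative-to-object
# trajectories are matched against an observed (possibly incomplete) movement.
# Recognition can fire on a prefix, before the action completes.
templates = {
    "reach_to_grasp": np.linspace([0.0, 0.0], [1.0, 0.0], 20),  # hand closes on object
    "withdraw":       np.linspace([1.0, 0.0], [0.0, 0.5], 20),
}

def match(observed, template):
    """Similarity of an observed prefix to the same-length template prefix."""
    t = template[:len(observed)]
    return np.exp(-np.mean(np.linalg.norm(observed - t, axis=1)))

def recognize(observed, threshold=0.9):
    scores = {name: match(observed, tpl) for name, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

rng = np.random.default_rng(0)
full = templates["reach_to_grasp"] + rng.normal(0, 0.01, (20, 2))
for cut in (5, 10, 20):                  # observe growing prefixes of the action
    print(cut, recognize(full[:cut]))    # recognition precedes completion
```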

Decision-Making

In order for an animal to adjust its actions to respond to the immediate environment, it must be able to evaluate contextual and motivational information and select an appropriate action from its repertoire on the basis of that information. For an animal to adapt its actions to environmental variables over time, the neural system must also be sensitive to the outcomes of its past actions. Reinforcement learning has been a successful framework for describing this type of adaptation – particularly when considering the decisions and actions of goal-directed, reward-driven agents (Sutton and Barto 1998). Crucially, estimates of the ‘value’ in performing particular actions in particular contexts are learnable, even when no explicit positive or negative reinforcement is received until some time in the future – after the completion of further actions. These estimates of the value of particular actions predict how an agent will learn and act when it encounters similar circumstances in the future.
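The point about delayed reinforcement can be made concrete with a few lines of temporal-difference learning, the standard formalism of Sutton and Barto (1998). The chain task and parameters below are illustrative assumptions, not drawn from any primate dataset.

```python
# Minimal temporal-difference sketch: the value of an early action is learnable
# even though reward arrives only after further actions. Illustrative only.

N_STATES, ALPHA, GAMMA = 3, 0.1, 0.9
V = [0.0] * N_STATES                # learned value of acting in each state

for _ in range(200):
    for s in range(N_STATES - 1):   # state 0 -> state 1 -> state 2 (reward)
        reward = 1.0 if s + 1 == N_STATES - 1 else 0.0
        target = reward + GAMMA * V[s + 1]
        V[s] += ALPHA * (target - V[s])             # TD update

print([round(v, 2) for v in V])     # -> [0.9, 1.0, 0.0]: value propagates backwards
```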

The Augmented Competitive Queuing (ACQ) model (Bonaiuto and Arbib 2010) places reinforcement learning mechanisms alongside MNS mechanisms, in the context of making decisions in the face of changing environments, goals and, crucially, skills. This allows actions to be evaluated in a particular context for executability – the availability of affordances that allow the given action – as well as desirability – the expected (future) reinforcement following that action (e.g., motivational components for decisions). This model predicts that actions will be chosen opportunistically: the next action chosen will be that which is most desirable among the set of executable actions. This separation of decision variables into contextual and motivational components – and their ultimate integration – is supported by the neurophysiology of decision-making (Watanabe 2007; Watanabe and Sakagami 2007).

In the ACQ model, visual feedback analysis (mirror neuron system activity) of one’s own actions determines whether the action achieved its goal, and whether its execution resembled some other action (the apparent action). On this basis, reinforcement learning can update the executability of the intended action and the desirability of the self-observed action (whether intended or apparent). In this way, an agent uses an evaluation of current context based on traces of past experiences to estimate the effectiveness of different possible actions. This, in fact, fits the observed patterns of great ape gesturing, in which apes choose gestures based on their goals, the immediate social context, and their past interactions with their partner (Cartmill 2008; Hobaiter and Byrne 2011; Liebal et al. 2004a).
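The schematic below captures this selection logic – choose the most desirable among the currently executable actions, then let self-recognition drive the reinforcement update. It is a simplification of ACQ for exposition; the action set, values, and learning rate are invented, and the full model’s neural implementation is not represented.

```python
# Schematic of ACQ-style opportunistic action selection (illustrative, not the
# published implementation). Each action carries an executability (are its
# affordances present now?) and a desirability (learned reward estimate).

ALPHA = 0.2
actions = {
    # name: [executability (0/1 from current context), desirability]
    "grasp_pull": [1.0, 0.50],
    "beckon":     [1.0, 0.20],
    "groom":      [0.0, 0.70],   # partner out of reach -> not executable
}

def choose(actions):
    """Pick the most desirable of the executable actions."""
    executable = {a: v for a, (e, v) in actions.items() if e > 0.5}
    return max(executable, key=executable.get)

def update(actions, apparent_action, reward):
    """Self-recognition (mirror-system output) labels what was done;
    reinforcement then adjusts that action's desirability."""
    v = actions[apparent_action][1]
    actions[apparent_action][1] = v + ALPHA * (reward - v)

act = choose(actions)                # -> 'grasp_pull' (desirable AND executable)
update(actions, act, reward=1.0)     # the mother approached: desirability rises
print(act, actions[act])
```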

Given that computational models of neural circuitry for visually-guided grasping, action-recognition, and adaptive decision-making exist, our model of ape gesture acquisition need not be constructed de novo. The FARS, MNS, and ACQ models, along with insights drawn from other models – robotic (Chaminade et al. 2008; de Rengervé et al. 2010), and neural (Bullock and Grossberg 1988; Caligiore et al. 2010; Demiris and Hayes 2002) – provide a circuitry framework upon which neuro-computational models of gesture acquisition may be based. Moving from simpler single-agent models into more complex, social brain modeling may necessitate a re-evaluation of previous models, and require ‘extensions’ to these models to more closely capture brain function. It is here that neuroinformatics tools could provide crucial insights into (i) model benchmarking – what can a model do or explain and what can’t it do? (ii) representation of data at the appropriate ‘level’ – does it explain dynamic time-courses, or sequences of discrete decision events? and (iii) comparing predictions derived from model simulations with empirical results from behavioral studies or neurophysiology.

Case Study: Developing the Gesture ‘Beckon’ via Ontogenetic Ritualization

We now consider an analysis of the progressive changes in brain and behavior that would need to occur during the proposed process of OR. We do this by constructing a hypothetical sequence of interactions between a mother and child ape that could lead to the emergence of ‘beckoning’ as a gesture used by the child to get the mother to approach. This gesture, or variants of arm-extended ‘approach’ gestures, has been observed in several ape species (Cartmill 2008; Pika and Liebal 2012; Pika et al. 2003; Tomasello and Call 1997), though it is not clear how it is acquired. Our aim is not to claim that this specific gesture is learned in this way, but to use this example to help clarify both the types of interactions and the neural changes that would be necessary to support the general transition from action to gesture via OR. Our model is conceptual, not a fully implemented computational model (though the latter is an ongoing research goal). The conceptual model serves to make general points about gestural acquisition through OR, and offers a framework to analyze existing behavioral data from a neuro-computational perspective.

Where Call and Tomasello (2007) describe ontogenetic ritualization with the above 3-step formula, we offer a finer-grained analysis using 6 stages to highlight the distinct learning processes that we expect to occur in the Mother (M) and Child (C) as the child’s pulling action is ritualized into a beckoning gesture. We then walk through the neural changes that seem necessary to support the behavioral changes at each stage and discuss the challenges in modeling the changes in mother and child at each stage.

Proposed Behavioral Changes in Mother (M) and Child (C) During OR of a Beckoning Gesture

Stage 1) C reaches out, grabs, and tugs on M, causing M to move towards C as a response.

Stage 2) C reaches out, grabs, and begins to tug on M, and M quickly moves towards C.

Stage 3) C reaches out and makes contact with M, and M quickly moves towards C.

Stage 4) C reaches out towards M, attempting to make contact, but M responds before contact is made.

Stage 5) C reaches part way towards M, and M responds by moving towards C.

Stage 6) C gestures towards M, and M responds to this ritualized gesture by moving towards C.

It is our belief that such a finer-grained process, when pegged to behavioral and neural changes in each agent, presents a more appropriate framework with which to compare or benchmark a computational model, while still remaining consistent with Call & Tomasello’s description of the overall pattern of interaction.
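As a preview of what an implemented dyadic model must contain, the skeleton below simulates repeated interactions between two agents whose learning variables are updated differently by the same event: the child truncates his action as his expectation of an early response grows, while the mother comes to respond to ever-shorter prefixes. Every variable, rate, and threshold here is an assumption chosen for illustration; the stage-like progression emerges qualitatively, not as a fit to data.

```python
import random

# Skeleton of a dyadic simulation: two agents, one shared interaction,
# two different learning problems. All parameters are illustrative.
random.seed(1)

child_expectation = 0.0   # child's learned expectation of an early response
mother_link = 0.1         # strength of mother's recognition -> approach link

for trial in range(60):
    # Child: completeness of the action (1.0 = full grasp-and-pull;
    # the floor stands in for the stable ritualized form).
    completeness = max(0.25, 1.0 - child_expectation)

    # Mother: responds if recognition evidence (stronger for more complete
    # actions, scaled by her learned link) crosses a threshold.
    evidence = mother_link * (0.5 + completeness)
    responded = evidence > 0.2 or random.random() < 0.1  # occasional compliance anyway

    # Dyadic learning: each brain updates differently from the SAME event.
    if responded:
        child_expectation += 0.05 * (1.0 - child_expectation)  # expect earlier responses
        mother_link += 0.05 * (1.0 - mother_link)              # consolidate association

    if trial % 15 == 0:
        print(f"trial {trial:2d}: completeness={completeness:.2f} responded={responded}")
# Completeness falls toward a truncated, ritualized form as the mother's
# response link strengthens.
```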

Stage 1) Child reaches out, grabs, and tugs on Mother, causing Mother to move towards Child as a response.

Since our example is meant to illustrate the salient steps in all plausible cases of ontogenetic ritualization, the key for stage 1 is that the actor is able to achieve his desired goal directly through physical manipulation of the recipient. For this initial period of interaction, neither participant has any prior expectations of the other’s behavior.

Child

In order for the child to achieve his goal, his only option is to physically manipulate the mother to bring her into physical contact with himself (that is, we assume no latent gestural form for this goal). He orients towards the mother, identifying appropriate surfaces for grasping to pull. He reaches out, grasps a part of her body, and initiates pulling on the mother. The pulling force begins the movement of the mother towards the child, and after enough tension, the mother complies and moves closer to the child. The neural machinery required to coordinate this sequence of actions can be fully described by the FARS model of visual control of grasping discussed above, which will serve as a benchmark for the child’s behavior in subsequent stages.

Mother

Throughout this example, we will assume that the mother is motivated to complete her child’s request (not always the case!) and that her attention is appropriately oriented towards the child, allowing her to process the child’s actions visually as well as haptically. Assuming that the mother is attending to the child, her mirror neuron/action-recognition system would register the ‘reach-to-grasp’ followed by ‘pull’. A key property of mirror neuron firing, captured by the MNS models, is that mirror activity often signals recognition of the observed actions before the action is completed. However, the mother’s response to, as distinct from recognition of, the child’s action has not yet been established. Associative learning mechanisms establish this connection between the child’s action and the response ‘approach.’ Importantly, this association must be retained as the child’s action changes form over the OR process. We also note the need for ‘social’ motivation to fulfill the goals of the child or to prioritize physical proximity – a motivation shared by both agents.
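A simple delta-rule sketch can illustrate the associative demand just described: the mother links a recognition code for the child’s action to the response ‘approach’, and the link must survive as the action’s form (and thus the code’s haptic components) drifts across the ritualization stages. The feature coding, drift schedule, and learning rate below are assumptions, not claims about actual neural coding.

```python
import numpy as np

# Delta-rule sketch: the mother's action->response association must transfer
# to the truncated, contact-free form. Features and rates are assumptions.
rng = np.random.default_rng(0)

w = np.zeros(3)     # weights from recognition features to the 'approach' response
RATE = 0.2

def features(stage):
    """Recognition code: [visual form, contact, pull tension].
    Haptic components fade as the action is truncated over stages."""
    contact = max(0.0, 1.0 - 0.25 * stage)
    return np.array([1.0, contact, max(0.0, contact - 0.3)])

for stage in range(6):                   # repeated interactions across six stages
    for _ in range(20):
        x = features(stage) + rng.normal(0, 0.05, 3)
        y = w @ x                        # current response tendency
        w += RATE * (1.0 - y) * x        # nudge toward responding ('1')
    print(f"stage {stage}: response to visual-only form = {w[0]:.2f}")
# Because the visual component is present throughout, the association
# transfers to the contact-free gesture in the later stages.
```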

Stage 2) Child reaches out, grabs, and begins to tug on Mother, and Mother quickly moves towards Child.

In this step, both individuals experience an adaptive change in behavior in real-time and begin to alter their expectations of the other’s actions. The child (i) need not pull as hard once the mother begins to comply, suggesting feedback modulation of his on-going action, and (ii) forms the expectation that the mother will be increasingly compliant. The mother, for her part, learns that given contextual considerations – similar play conditions, perhaps, and/or perceived emotional state – and her child’s grasp-pull action, she is rewarded (socially) for moving to his side.

Child

The child’s intention is to reach out, grab and physically move the mother near him. However, following his grasp, his mother becomes more compliant and begins the movement towards his side. The child perceives the mother’s movement as beginning to satisfy his goal and acts less forcefully on the mother as she responds to his action. This further refines his expectations of his mother’s likely response. In future interactions, he will expect that less force is needed to achieve his goal.

At this stage, we encounter the problem of how recognition of someone else’s actions can affect the ongoing execution of one’s own actions – a general concern for social brain modeling. Here, the child, as in stage 1, expects that a full ‘reach-to-grasp-to-pull’ action is necessary to achieve his goal, but as he begins his tug on the mother, the mother responds by ‘completing’ the action. Recognition of the early success of the action must be able to modulate the child’s ongoing behavior in such a way that his action can be modified either (i) by reducing the force with which he pulls on the mother, as in this stage, or (ii) by interrupting and even extinguishing the action mid-trajectory (as we describe below). Such sensitivity to changing perceptual variables during grasping behavior has been explored in a computational model of how the reach, grasp, and their coordination may be affected by perturbations in the size and location of grasped objects (Hoff and Arbib 1993), which uses on-line feedback to modulate what might otherwise have been a feed-forward movement. Thus, insights from other models may guide our own model development.
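In the same spirit – though far simpler than the Hoff-Arbib controller – a toy feedback loop shows how pulling force can relax smoothly once the partner begins to comply, with no explicit ‘stop’ command. The gains and time constants are invented for illustration.

```python
# Toy feedback-control sketch: the child's pulling force scales with the
# remaining 'goal gap', so it relaxes once the mother complies mid-action.
# Gains and time constants are illustrative assumptions.

DT, GAIN = 0.1, 2.0
mother_pos, goal_pos = 1.0, 0.0
mother_self_motion = 0.0            # the mother's own compliance (none at first)

for step in range(20):
    gap = mother_pos - goal_pos
    force = GAIN * gap              # feedback term: effort scales with the gap
    if step == 5:
        mother_self_motion = 0.6    # mid-action: the mother begins to comply
    mother_pos -= DT * (0.2 * force + mother_self_motion * gap)
    if step % 5 == 0:
        print(f"t={step * DT:.1f}s gap={gap:.2f} child_force={force:.2f}")
# After compliance begins, the gap closes faster and the child's force decays
# without any discrete decision to stop pulling.
```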

Mother

Following contact, the mother moves towards the child, easing the tension on her arm. The association between the child’s action and the approach response is rather weak at this point, and can only be triggered by proprioceptive contact and mechanical tension as a complement to the visual-form representation established by the mirror neuron system. As in stage 1, the MNS models of monkey action recognition provide an explanation for the mother’s recognition of the child’s actions, but are unable to provide a clear description of the effects of this recognition – a problem we explore below. Future models of action recognition must address the role of multisensory integration in the recognition process more thoroughly. The MNS2 model (Bonaiuto et al. 2007) characterized the audio-visual neurons seen experimentally in Kohler et al. (2002), and showed how associative learning mechanisms may link acoustic cues with the visual form of actions. A key for a model of OR would be extending this to visual-haptic cues (see Fig. 2).

Stage 3) Child reaches out and makes contact with Mother, and Mother quickly moves towards Child.

Child

The child’s attempt at grasping and pulling the mother remains the same as in stages 1 and 2, with the exception that he becomes increasingly sensitive to the mother’s anticipatory response, having in the past two stages come to expect a ‘completing’ response. In stage 3, as he begins to make contact with the mother, the mother’s response appears consistent with his expectation, and he aborts the second half of the action sequence: the pull on his mother. As we see in Fig. 3, however, such a process may be described at different levels of representation – discrete and continuous, or ‘event’ and ‘trajectory’. Models of reaching and grasping (e.g., Bullock and Grossberg 1988; Fagg and Arbib 1998) emphasize the dynamic unfolding of the behavior and how certain elements (the positions of joints, perhaps) vary continuously in time. Models of learning and decision-making (e.g., Bonaiuto and Arbib 2010; Botvinick et al. 2009), on the other hand, emphasize the serial structure of decisions as discrete events. Both levels may be helpful in understanding brain function, and in fact the brain appears to utilize both (see: Averbeck et al. 2002; Campos et al. 2010; Georgopoulos 2002; Sawamura et al. 2002). The challenge for neuroscientists is to understand how both may coordinate behavior, and how best to represent these descriptions in models.
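The contrast can be stated compactly in code: the same (hypothetical) behavior appears at the event level as a list of discrete decisions and at the trajectory level as a sampled continuous path, and ‘truncation’ during ritualization looks different at each level.

```python
# The same hypothetical behavior at two levels of representation.

# Event level: a sequence of discrete decisions; ritualization drops later events.
full_sequence = ["orient", "reach", "grasp", "pull"]
ritualized_sequence = full_sequence[:2]            # -> ['orient', 'reach']

# Trajectory level: a continuously sampled hand path; ritualization cuts it short.
T_FULL, T_RITUAL = 1.0, 0.4                        # durations in seconds (illustrative)

def hand_path(t_end, dt=0.1):
    """Normalized hand extension over time; truncation = stopping early."""
    t, path = 0.0, []
    while t <= t_end:
        path.append(round(t / T_FULL, 2))
        t += dt
    return path

print(ritualized_sequence)
print(hand_path(T_RITUAL))     # partial extension: the 'proto-gesture' form
```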

Fig. 3

Event- and trajectory-level representations in brain modeling. (Left) Event-level representations, treating actions and decisions as discrete units, emphasize higher-order representations and the sequential unfolding of distinct actions in series. Neurophysiological recordings show that the brain can maintain such state-by-state representations of sequences (Campos et al. 2010; Sawamura et al. 2002). (Right) Trajectory-level representations treat actions, both single actions and action sequences, as dynamic and emergent trajectories in ‘action spaces’, sensitive to idiosyncratic context and performance and the on-line modulation from feedback centers. Such a perspective is supported by behavioral and neurophysiological data (Jeannerod et al. 1995). From top to bottom, both levels of representation show the putative ‘truncation’ of an instrumental action to that of a ritualized gesture. Dashed lines on the left indicate possible next-states in the action sequences (e.g., priming activation), while the shading indicates the sensitivity to feedback (e.g., ‘grasp’ may simply become a ‘touch’ if recipient responds quickly; see, for example, Stage 3 in the text). Dashed lines on the right similarly indicate possible next-states contingent on the performer’s evaluation of the goal state (e.g., whether the recipient has responded appropriately). In both representations, then, we see that the original effective action/action sequence is not lost, and may be substituted for the gesture when more appropriate – for example, if the recipient is not visually attentive (Liebal et al. 2004b)

Fig. 4

Interactions between experimental and theoretical disciplines. Modeling can be a source of anchoring insights across experimental conditions, including anatomical, physiological and behavioral. A robust computational model of brain systems that (i) is anatomically-based, (ii) computes with biologically-plausible models of neurons or populations of neurons, and (iii) generates patterns of overt behavior can both formalize findings in a unified framework and support hypothesis generation to inspire new experiments or techniques of analysis. This back-and-forth between the experimental and theoretical disciplines – facilitated by informatics tools (shading) – has been, and continues to be, highly profitable. (Left) Models in the past have been successful at engaging experimental findings, especially those that have relied on instrumental behavior in monkeys – and in utilizing insights from monkey single-unit recordings during instrumental or ‘passively social’ task conditions. Informatics tools and resources assist in developing, testing and benchmarking computational models. (Right) We propose to similarly engage this back-and-forth between models and experiments, while moving each ‘cycle’ into an arena of novel questions, as the arrows from left to right indicate. In particular, we seek to move from models of single agents engaged in instrumental tasks to models of ‘dyads’ that interact directly with each other. Informatics tools, while providing important resources for this research venture, must be expanded to handle the new challenges that will result from this novel modeling approach

Mother

Visual recognition of the child’s reach-to-grasp action, coupled with contextual cues and the proprioceptive contact as above, is sufficient for the mother to select an appropriate response consistent with the child’s goals. This stage represents the final phase in which proprioceptive cues are involved in training the recognition-response linkage – in future stages, visual recognition alone suffices to initiate the response.

Stage 4) Child reaches out towards Mother, attempting to make contact, but Mother quickly responds before contact is made.

Here, we have reached the point where both (i) the child has learned that a ‘reach-to-grasp-to-pull’ action is not necessary (though perhaps unsure about the extent to which he must contact and attempt to manipulate the mother), and (ii) the mother has learned that (attempted) grasps to her arm may signal an opportunity for social bonding. Note that whereas the child’s learning consists largely in tuning his forward expectations of the mother’s behavior, the mother’s learning consists in mapping the recognition of the child’s actions to behavioral responses that satisfy mutual goals.

Child

The child at this point has learned that incomplete versions of a ‘reach-to-grasp-to-pull’ action can be used to achieve his goal, and so only intends to make minimal contact. Here, the child’s attempted action should still be seen as transitive, directed at a surface. This will be the starting point for the last two stages, in which the actions become increasingly removed from orientation towards a specific surface, and instead the hand’s movement pattern in space becomes the most salient element, resulting eventually in a ritualized, intransitive gestural form.

Mother

By stage 4, the mother has robustly linked visual recognition of the child’s ‘reach-towards-body’ action with the ‘move-towards-child’ response, and can effectively fulfill the child’s goal without even minimal haptic cues. The key to this stage of the ritualization process is that visual form alone is now sufficient for the mother to respond. Subsequent stages serve to train the mirror neuron/action-recognition system to recognize the now visual-only ‘proto-beckoning’ act in shorter and more reduced forms.

Stage 5) Child reaches part way towards Mother, and Mother quickly responds by moving towards Child.

Child

This stage is unique in that now the child no longer intends to physically interact with the mother, but instead acts only in a way sufficient to elicit the appropriate response. Crucial for the child, though, is that his intransitive performance must appear visually similar enough to his transitive performance that the mother can recognize it and respond appropriately. This would suggest that proprioceptive and visual feedback signals from the past transitive episodes trained a forward internal model – a model of an appropriate trajectory through space – that can now control the limb sufficiently to produce an apparent ‘proto-beckoning’ action. Whereas previous models of object-directed motor control (e.g., FARS) neglected intransitive performance, models of gesture acquisition may not. Instead, as Fig. 2 shows, additional machinery must be added to recognition and production processes, informed by analyzing behavioral and/or functional data.
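One simple way to cash out this forward-model idea computationally: average the hand paths from past object-directed episodes into a trajectory template, which can then drive (a portion of) the same movement with no target present. The data, dimensions, and averaging scheme below are assumptions for illustration; a real forward model would be predictive and feedback-trained rather than a stored mean path.

```python
import numpy as np

# Illustrative sketch: past transitive episodes train a trajectory template
# that can be replayed, truncated, with no physical target present.
rng = np.random.default_rng(42)

# Past transitive episodes: noisy hand paths toward the mother's arm at (1, 0).
episodes = [np.linspace([0, 0], [1, 0], 15) + rng.normal(0, 0.03, (15, 2))
            for _ in range(10)]

trajectory_template = np.mean(episodes, axis=0)   # 'forward model' of the path

def perform_intransitive(template, fraction=0.5):
    """Replay only the initial portion of the template - no target needed."""
    return template[: int(len(template) * fraction)]

gesture = perform_intransitive(trajectory_template)
print(gesture.round(2))     # a truncated 'proto-beckon', visually like the reach
```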

Mother

The mother is able to recognize, in mid-trajectory, the motion of the child’s arm, and to respond appropriately. In this stage, the mother’s action-recognition system would need to begin to respond to smaller and smaller portions of the action. Just as the child must maintain visually similar performance in the absence of explicit targets (e.g., mother’s arm), the mother must be able to recognize the child’s actions absent such contextual cues. Without sufficient ‘overlap’ in trajectory, the putative visual training would be unable to maintain the link between recognition and response in the mother.

Stage 6) Child gestures towards Mother, and Mother responds to this ritualized gesture by moving towards Child.

A ritualized form of the gesture emerges. While previous stages have highlighted the changes that would allow both mother and child to progress towards using and recognizing an intransitive gestural signal, stage 6 represents a more stable communicative form that the child can continue to use in future interactions.

Child

The child intends the mother to come to his side, and performs the ritualized gesture. As a result of continued interaction with his mother, he may learn that the gesture is more or less effective over various distances or in different contexts. The child maintains an association between the gesture and the original action, and is able to substitute the action for the gesture if the gesture is not effective at achieving the child’s goal.

Mother

The mother sees and recognizes the child’s gesture, and responds in a manner that fulfills his goal. Again, we note that the mother would still respond appropriately to the original effective action.

Summary of the Ontogenetic Ritualization of ‘beckon’

The ‘beckoning’ gesture that would result from this 6-stage process can be seen as a truncated (and modified) version of the instrumental act of reach-to-grasp-to-pull. While this case study illustrates general behavioral changes that would occur in the ‘naïve’ ontogenetic ritualization process as it has been discussed in the literature, we do not argue that this particular gesture is necessarily derived in this way. In fact, we note that any model of the acquisition of a gestural form is likely to have idiosyncratic features. What we have described may be seen as a generic analysis of ritualizing effective actions, and not wholly specific to ‘beckoning’. For example, ‘beckoning’ as described in the literature involves a sweeping of the hand or curling of the fingers towards the body. This requires additional machinery beyond what we have described here – perhaps to anticipate the recipient’s movement towards oneself, or perhaps merely to distinguish the form from other similar gestures (e.g., ‘palm-up give’) – just as a ‘nudge’ gesture or an ‘arms-up play’ gesture would require unique mechanisms specific to each. Also, the transition from action to gesture may take several forms – including phenotypic changes – but elements of the stages we propose are likely to be central to any ritualization process that yields gestural forms.

Discussion

Computational models have been important in understanding neural control of behavior, from object and action recognition (Bonaiuto et al. 2007; Deco and Rolls 2004; Oztop and Arbib 2002), to saccadic eye control (Dominey et al. 1995; Silver et al. 2012), and visual control of grasping (Fagg and Arbib 1998). Such models have even made useful contributions to our understanding of higher order cognitive skills (O’Reilly and Frank 2006; Rougier et al. 2005). However, our proposed analysis of ontogenetic ritualization presents several unique challenges to the brain modeling community.

Challenges for Dyadic Brain Modeling

The first challenge for social brain models is simply being social. Few brain models to date have incorporated explicitly social tasks that are central to primate behavior and cognition. The Mental State Inference model (MSI) is perhaps the first explicit instance of multiple brains in simulation, and simulates the manual performance of one individual being observed by another. The MSI model is based in large part on the MNS models of action recognition and suggests mirror responses are a single part of an extended ‘internal model’ that serves to decode the intentions of others (Oztop et al. 2005). However, there is no explicit interaction between the agents in the model, and the observations do not affect the subsequent behaviors of the observer – that is, there is no ‘task’. We are not able to predict how observation would affect future performance, and we are still lacking any consideration of interaction between the brains. Thus, the MSI model is a ‘passively social’ model in the same vein as the MNS and MNS2 models (and most of the work on mirror neuron neurophysiology).

A few ‘interactive’ dyadic models have been put forth, but they often lack the neural specificity found in models of passively social or purely instrumental tasks. Taking an interactive approach, Steels and colleagues have modeled multiple interacting agents in ‘language learning’ games, showing interesting results for ideas of grammar learning and cultural transmission (e.g., Steels 2003). However, Steels’ multi-agent simulations lack ‘brains’, and instead describe agents with simplified mechanisms that are highly task-specific. The dynamics of the interactions are enlightening, but say little about the specific brain processes involved in interactions between primates. Other models simulate neural dynamics between interacting agents, but do not engage questions of the computational properties of detailed brain circuitry during a specific task (Dumas et al. 2012).

Our analysis of ontogenetic ritualization diverges from previous work incorporating purely observational social dimensions in that it is an explicitly interactive process of social learning. Social learning encompasses all learning that is modulated by the actions of another individual (Galef and Laland 2005), though different kinds of learning may be distinguished, for example, in the degree to which available environmental information may be processed to influence future behavior (Acerbi et al. 2011). Dyadic learning, like ontogenetic ritualization, might be considered an ‘interactive’ form of social learning in that the learner must interact with (rather than just observe) another individual for learning to proceed. (Indeed, both agents are ‘learners’ in ontogenetic ritualization.) In these cases, brain models must show how distributed patterns of neural activation in each individual affect their behavior and how socially-influenced learning processes in the brain give rise to adaptive changes in behavior.

A further challenge for social brain modeling concerns the fact that the motivations underlying social behaviors may not be the homeostatic motivations (hunger, thirst) that are often used in simulations of behavior. During social interactions, animals may perform the same task, but with differing motivations – for example, preferring social information over food rewards (Deaner et al. 2005; Klein et al. 2008). Brain modeling lacks serious consideration of these differing motivational drives during behavior (especially social behavior), and rarely incorporates reward-modulated or motivational responses in neural network models (Arbib and Bonaiuto 2012; Guazzelli et al. 1998). This failure to incorporate motivational elements becomes more consequential when modeling social tasks, which require both navigating the multi-dimensionality that is ‘motivation’ (Berridge 2004) and incorporating the perception of motivations and intentions in partners, thought to be a large driver of cognitive skill evolution in primates (Byrne and Whiten 1988).

Another challenge is to address the debate over the extent to which neural networks and cognitive modules can be said to be ‘domain general’ or ‘domain specific’, and how these systems would interact, especially with respect to social cognition. As a simple example, face-selective neurons have been described in specific regions of temporal cortex (Barraclough and Perrett 2011). Using more complex interactive designs, Yoshida et al. (2011) describe medial frontal neurons that appear to respond exclusively to social variables (at least for the task studied), suggesting some functional specialization, especially within frontal and prefrontal regions. In a lesion study, the anterior cingulate cortex gyrus (ACCg) and sulcus (ACCs) dissociated in their recruitment in response to social variables, with ACCg important for social valuation (Rudebeck et al. 2006) – though see Chang et al. (2012) for a more complex result. Lastly, LIP mirror neuron responses in monkeys suggest that ‘integration’ regions like PPC can represent both social and non-social information important for decision-making (Shepherd et al. 2009). Together, these and other data must be analyzed to identify whether and which structures can be said to be ‘social-domain specific’, and how such regions would interact with wider neural systems (Fig. 4).

Linking Neuroinformatics to Gestural and Behavioral Datasets

The dyadic brain approach to social behavior challenges the neuroinformatics community to provide resources to integrate data from neurophysiological and behavioral studies of primates in a way that could provide new insights into the cognitive and structural changes underlying the evolution of primate (and human) communication. We review discipline-wide concerns for managing data, including:

  • Primate behavioral data

  • Primate brain imaging data

  • Macaque neurophysiological data

  • Comparative neuroanatomical data

  • Model simulation results

Behavioral Data Management

Researchers in biological and biomedical sciences have made significant advances in constructing searchable databases and have tackled the challenges of standardizing and archiving data in a range of fields, including within neuroscience (see companion articles). Though these approaches could not be transferred verbatim to data in comparative cognition, the challenges inherent to linking studies and identifying patterns in data are common to all integrative databases and should be used to inform future efforts to consolidate data across studies of primate cognition.

Tomasello and Call (2011) raised concerns about the isolation of individual studies in primate cognition, particularly in relation to gesture studies in the great apes. Differences in what qualifies as a ‘gesture’ and in how gestures are coded and defined lead to significant differences between the conclusions of studies of the same species, and can reflect local traditions in research groups (Cartmill and Byrne 2011). The risk of drawing erroneous conclusions from single studies, or from studies by a single research group, underscores the importance of establishing a consistent ontology of primate social behavior and of developing resources for managing, integrating, and sharing behavioral data.

The level of comparison or ‘granularity’ of the searchable data between studies is particularly important. Definitions of behaviors or functional responses differ between studies, and these differences often make it difficult to compare results directly without explicitly accounting for methodological differences. For example, one paper surveying multiple groups of gorillas found a total repertoire of 33 gestures, most of which were shared between zoos (Pika et al. 2003). Another paper reported 102 gestures, also from a survey of multiple zoos (Genty et al. 2009). The difference between the repertoire sizes reported in these two studies is not the result of group-specific gestures or cultural variation between sites; rather, it results from differences in the granularity of the researchers’ gesture definitions. The first study defined gestures by the predominant movement involved, but did not typically distinguish gestures by the limb or hand shape used: all examples of hitting a surface with a hand would be considered a “slap ground” gesture. In the second study, however, the limb and hand shape used were part of the gesture definition, so “knock object,” “punch object,” “slap object 1-handed,” and “slap object 2-handed” were all recorded as separate gestures. Such differences in the granularity of definition could result in the same set of observations yielding drastically different summary results (a potential problem not just for behavioral but also for neural datasets).
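One way a shared database could accommodate such differences is to store observations at the finest granularity available and map them onto coarser categories on demand. The Python sketch below illustrates the idea; the two-level mapping simply reuses the labels from the example above for illustration and does not reflect any existing coding scheme.

# Hypothetical two-level gesture ontology: fine-grained labels (as in the
# second study) map onto a single coarser category (as in the first).
FINE_TO_COARSE = {
    "knock object": "slap ground",
    "punch object": "slap ground",
    "slap object 1-handed": "slap ground",
    "slap object 2-handed": "slap ground",
}

def repertoire(observations, granularity="fine"):
    """Return the set of distinct gestures at the requested granularity."""
    if granularity == "fine":
        return set(observations)
    return {FINE_TO_COARSE.get(obs, obs) for obs in observations}

obs = ["knock object", "punch object", "slap object 1-handed"]
print(len(repertoire(obs, "fine")))    # 3 distinct gestures
print(len(repertoire(obs, "coarse")))  # 1 distinct gesture

Under such a scheme the same observations can be summarized at either level, so repertoire counts from differently coded studies become comparable rather than contradictory.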

On the one hand, it is necessary to accurately record those methodological differences that currently make direct comparison of results between studies difficult. On the other hand, it is not practical to fully recode primary source data from different studies according to the same guidelines so that it can be easily pooled. Allowing individual variation in coding systems not only removes a substantial barrier to contributing data to a collective database, it also allows coding systems to be appropriately tailored to the differences between species’ communication systems. For example, one frequently coded behavior is whether an ape waits for a response from the recipient before giving up or attempting another gesture. This measure of ‘response waiting’ is used as an indication of intentional communication since it signals that the gesturer expects a particular response from the recipient. Since primate species differ in temperament and energy levels, the length of time that suggests waiting for a response is likely to differ: the interval thought to indicate response waiting in a low-energy species might be far too long for a high-energy species with a shorter attention span. In this case, it would be better to tolerate the differences in how response waiting is defined between studies, since those differences absorb temperament variation and allow the same underlying behavior to be measured across species. Though variation of definitions within a species is likely to cause problems (as in the gorilla gesture example), allowing definitions to vary by species facilitates direct comparison between studies by bringing cognitive ability rather than temperamental differences to the forefront.
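In database terms, this amounts to attaching species-specific parameters to a shared behavioral definition. A minimal sketch follows, with entirely invented species entries and threshold values:

# Hypothetical per-species latency thresholds (seconds) for scoring
# 'response waiting'; the values here are invented for illustration.
RESPONSE_WAIT_THRESHOLD = {
    "gorilla": 5.0,      # lower-energy species: longer pause expected
    "chimpanzee": 2.0,   # higher-energy species: shorter pause suffices
}

def shows_response_waiting(species, pause_seconds):
    """Score the same behavioral construct with species-tailored criteria."""
    return pause_seconds >= RESPONSE_WAIT_THRESHOLD[species]

The construct (‘response waiting’) is held constant across the database, while the operational criterion varies in a documented, queryable way.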

Longitudinal data are especially valuable because they allow us to ask direct questions about the development of gestures over time, but longitudinal studies in apes are rare and time intensive. Indirect questions about gesture development may be asked by comparing individuals of different age classes across sites to identify developmental trajectories in gesture use. Incorporating longitudinal data on gesture use in the same individuals into a cross-study/cross-site database would be invaluable to the field: the integration of cross-sectional and longitudinal data would allow researchers to ask more sophisticated questions about development within a species, and would facilitate comparative studies.
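To make the integration concrete, here is a minimal sketch of querying pooled records by age class regardless of whether they came from cross-sectional surveys or longitudinal follow-ups of the same individuals; the record format is hypothetical.

from collections import defaultdict

def repertoire_by_age(records):
    """records: (site, individual_id, age_years, gesture) tuples pooled
    from cross-sectional and longitudinal studies alike."""
    by_age = defaultdict(set)
    for site, individual_id, age_years, gesture in records:
        by_age[int(age_years)].add(gesture)
    return by_age  # distinct gestures per age class, across sites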

Neural Data Management

Resources for managing functional and neuroanatomical data provide a strong backbone for research in social brain modeling. BrainMap <http://www.brainmap.org/> and Brede <http://neuro.imm.dtu.dk/services/jerne/brede/> are tools for managing and performing meta-analyses of functional neuroimaging data (and see companion articles in this volume), and resources like BrainLiner <http://brainliner.jp/> offer a platform for managing and standardizing neurophysiological data. As non-invasive functional brain monitoring in apes becomes more widely available, resources tailored to the specific needs of these researchers may have to be developed. For neuroanatomical data, the NeuroHomology DataBase (Bota and Arbib 2001), for example, was developed to offer researchers tools to investigate the relationship between similar brain structures in different species. However, newer techniques like diffusion tensor imaging (DTI) now allow researchers to ask novel questions non-invasively. Recent comparative DTI analyses, for instance, suggest significant differences in the fiber pathways linking parietal, temporal, and frontal regions across modern primates – specifically between macaques, chimpanzees, and humans (Hecht et al. 2012). The results suggest progressively stronger connectivity between the superior temporal sulcus (STS) and inferior parietal regions – regions that together process the visual form of movements – moving from macaques to chimpanzees to humans, where connectivity is most robust. These and other neuroanatomical studies may support, for example, model hypotheses regarding connectivity between kinematic-processing structures, action-recognition structures, and other regions (see Fig. 2). As such data become more prevalent, efficient ways to handle them and to link them with functional and neuro-homology databases become more important.

Model Result Management

Software designed for computational neuroscience is widely available (e.g., NEURON; http://www.yale.edu/neuron) and code repositories like ModelDB (http://senselab.med.yale.edu/modeldb) offer researchers ways to share code. These resources and others can often be linked or ‘federated’ to offer access to data from other systems, as the Brain Operations DataBase (BODB) does. BODB (http://bodb.usc.edu/bodb; and see companion articles) currently allows linkages to data sources ranging from neuroanatomical datasets of monkey and human to functional imaging sets like those offered by BrainMap. BODB also offers tools for managing Summaries of Empirical Data (SEDs) with the goal of facilitating work in computational neuroscience. The SED format is designed to be at the appropriate ‘level’ to offer challenges to existing ideas of brain function, and flexible enough to be understood both in relation to other data and in relation to specific models of the brain, allowing direct comparison between model simulation results and existing (or future) empirical work against which the simulations can be benchmarked. However, as the above analyses have shown, model benchmarking becomes much harder when the behavior studied – gesture, for example – has differing operational definitions and levels of description.
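To make the benchmarking problem concrete, the sketch below pairs a simulation result with SEDs only when their operational definitions match. The record fields and matching rule are hypothetical illustrations, not BODB’s actual schema.

from dataclasses import dataclass

@dataclass
class SED:
    sed_id: str
    species: str
    coding_scheme: str    # e.g., which gesture definition was applied
    observed_value: float

@dataclass
class SimulationResult:
    model_name: str
    coding_scheme: str
    predicted_value: float

def benchmark(result, seds, tolerance=0.1):
    """Compare a simulation result only against SEDs coded under the same
    operational definitions; mismatched schemes are skipped, not compared."""
    matches = [s for s in seds if s.coding_scheme == result.coding_scheme]
    return [(s.sed_id,
             abs(s.observed_value - result.predicted_value) <= tolerance)
            for s in matches]

The design choice worth noting is the explicit coding_scheme field: making operational definitions first-class metadata is what prevents a model from being ‘falsified’ by data that were never measuring the same thing.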

Ideally, integration should be possible at multiple levels of representation. One study may ask how manual gestures are used in different age groups and integrate this information with what is known about primate brain systems involved in action recognition. Another study may focus on the ability to respond to the gaze of potential recipients by using a visual versus tactile gesture, and may be concerned only with the neural representation of gaze awareness, not the gesture type. Flexibility for future integration and expansion is key. For example, BODB currently offers tools for managing behavioral data and could integrate its functionality with other, more specific databases. It would then be possible to create links between BODB and a future database of primate gesture research, providing a platform to manage behavioral and neuroscientific data together. Still, existing resources within neuroscience need more structuring, as the examples of non-human primate brain imaging suggest.

Establishing links between collections of neural and behavioral data, and allowing searches to span and connect data across fields, would transform our ability to ask questions about the evolution of cognition, brain, and behavior. Though the creation of integrative databases holds great promise for researchers, barriers to participation in a collective database must be minimized. The simplicity and power of a database’s built-in tools for adding and managing data greatly affect the likelihood that people will contribute data and use the database to conduct research. The behavioral and cognitive sciences are becoming increasingly interdisciplinary, and advances in our understanding are more likely to come from comparisons across studies and disciplines than from individuals working in isolation on single datasets.

In this paper we have argued that social brain modeling is a promising field with the potential to combine and extend the insights gained from the neural and behavioral sciences. We use gesture, and specifically the proposed learning of gesture via ontogenetic ritualization, as a test case for the construction of this integrative modeling approach. We focus on gesture because it incorporates social features that are problematic for modeling (e.g., different processes in the signaler and receiver, goal attribution, recognition of social variables like attention, and flexible deployment), but also because it allows us to build on existing models of the production and perception of manual action. As our proposed model of ontogenetic ritualization illustrates, integrating ethological data with models grounded in neural detail offers the possibility of asking targeted questions about social learning and cognition and of making testable predictions about behavioral outcomes – and ultimately of helping unravel questions about development and evolution. However, substantial challenges remain. We believe that many of these challenges require innovative informatics approaches, such as the construction of searchable databases that allow integration of data across studies, fields, and methodologies. We call for a concerted interdisciplinary effort among primatologists, neuroscientists, and computational modelers to consider new collaborative approaches to the integration and maintenance of both raw and summarized data. Even small steps into this interdisciplinary terrain promise to transform the research landscape from isolated studies to richly collaborative conversations, and to open up powerful new approaches to very old questions.

Information Sharing Statement

This article was made possible through various online resources, including university-maintained journal access.