7.1 Introduction

Motor control and motor cognition have been under intensive scrutiny for over a century, with a growing number of experimental and theoretical tools of increasing complexity. Still, we are far from a real understanding that would allow us, for example, to integrate what we know into large-scale projects like VPH (Virtual Physiological Human). In a sense, the abundance of new behavioral, neurophysiological, and computational approaches may worsen the situation by “flooding” researchers with frequently incompatible evidence, so that the overall picture is lost from view. One aspect of this tendency is the quick dismissal of earlier “old-fashioned” ideas on the basis of specific but narrow new evidence. This chapter argues in the opposite direction, revisiting old-fashioned notions, like synergy formation, the equilibrium point hypothesis (EPH), and the body schema, in order to reuse them in a larger context focused on whole-body actions: this context, typical of humanoid robotics, stresses the need for efficient computational architectures capable of defeating the curse of dimensionality determined by the frightening “trinity”: complex body + complex brain + complex (partly unknown) environment. The idea is to organize the computational process in a local-to-global manner, grounding it in emerging studies in different areas of neuroscience, while keeping in mind that motor cognition and motor control are inseparable twins, linked through a common body/body schema. The long-term goal is to make a humanoid robot like iCub capable of “cumulative learning.” A humanoid robot should mirror both the complexity of the human form and that of the brain that drives it, in order to exhibit equally complex and often creative behaviors!
This requires emulating the gradual process of infant “cognitive development” in order to investigate the underlying interplay among multiple sensory, motor, and cognitive processes in the framework of an integrated system: a coherent, purposive system that emerges from a persistent flux of fragmented, partially inconsistent episodes in which the human/humanoid perceives, acts, learns, remembers, forgets, reasons, makes mistakes, introspects, etc. We aim at linking such a model-building approach with emerging trends in neuroscience, taking into account that one of the fundamental challenges today is to “causally and computationally” correlate the incredibly complex behavior of animals with the equally complex activity in their brains. This requires building a shared computational/neural basis for “execution, imagination, and understanding” of action, while taking into account recent findings from the field of “connectomics,” which addresses the large-scale organization of the cerebral cortex, and the discovery of the “default mode network” of the brain. We will particularly focus, in the near future, on the organization of memory instead of “learning” per se, because this helps in understanding development from a more “holistic” viewpoint that is not restricted to “isolated tasks” or “experiments.” Computationally, the proposed architecture should lead towards novel nonlinear, non-Turing computational machinery based on quasi-physical, non-digital interactions grounded in the biology of the brain.

7.2 Background Concepts on Body and Embodiment

7.2.1 Embodiment

Robotics has long been divided between approaches that depend fully on exploiting the affordances provided by the specific features/structure of the robot “body” and approaches, based on artificial intelligence (AI) principles, that neglect “embodiment” and operate in a completely abstract domain. The “vehicles” proposed by Valentino Braitenberg (1986) are examples of the former approach: in spite of the fact that the control hardware is simply a reactive system, which directly links the sensors to the actuators, the vehicles’ behaviors can be surprisingly adaptive and exhibit remarkable features that are commonly attributed to some kind of “intelligence.” There are also many biological counterparts of Braitenberg’s vehicles, such as Aplysia depilans (Kandel and Tauc 1965), which emphasize the fact that adaptive behavior does not require a central nervous system but can emerge in very simple networks of biological neurons as well. However, it is quite clear that purely reactive systems (or reflexes, in the neurophysiological jargon) can only work effectively with very simple bodies.

Nevertheless, a very influential theory proposed by Charles Sherrington (1904) that dominated the understanding of human neurophysiology for over half a century is based on a simple generalization of the reactive architecture, by positing that reflexes are the basic modules of the integrative action of the nervous system, thus enabling the entire body to function towards one definite goal at a time. A similar point of view was defended by Rodney Brooks in robotics (Brooks 1991), as a drastic alternative to GOFAI (Good Old-Fashioned Artificial Intelligence), by proposing a bottom-up design, named Subsumption Architecture, that is supposed to achieve “intelligence without representation”: this architecture is organized in layers, decomposing complicated intelligent behavior into many “simple” behavioral modules, which in turn are organized into layers of simpler behaviors, down to reflex-like mechanisms. Each layer implements a particular goal of the agent, and higher layers are increasingly general and abstract. However, this kind of layered bottom-up architecture scales up badly when one attempts to deal with complex bodies and complex behaviors in a complex environment.

In contrast with the Sherringtonian view, Hugo Liepmann (1905) was the first to suggest that actions are generated from within, requiring the existence of an internal state where they would be encoded, stored, and ultimately performed independently of the stimuli coming from the external environment. To account for the implementation of action plans, he proposed that the elementary chunks of action are assembled according to an internal representation: he called the result of this process the movement formula, i.e., an anticipatory hierarchical structure where all the aspects of an action are represented before it unfolds in time. Liepmann’s legacy is still quite influential in motor neuroscience, although the term movement formula was later replaced by several others, like engram, schema, or internal model. In the same vein, Nikolai Bernstein (1935) had an interesting analogy for explaining this mode of organization: he suggested that the representation of an action must contain, “like an embryo in an egg or a track on a gramophone record,” the entire scheme of the movement as it is expanded in time and it must also guarantee the order and the rhythm of the realization of this scheme.

In the field of human motor cognition, only recently have advanced brain imaging techniques given direct access to cognitive/mental states in the absence of overt behavior, thus making clear that actions involve a covert stage. It is now accepted that the covert stage is a representation of the future that includes:

  • The goal of the action

  • The means/tools to reach it

  • The consequences on the body

  • The effects on the external world

Covert and overt stages thus represent a continuum, such that every overtly executed action implies the existence of a covert stage, whereas a covert action does not necessarily turn into an overt action. Jeannerod (2001) provided a very important contribution by formulating the Mental Simulation Theory, which posits that cognitive motor processes such as motor imagery, movement observation, action planning, and verbalization share the same representations with motor execution. Jeannerod interpreted this brain activity as an internal simulation of a detailed representation of action and used the term S-state for describing the corresponding time-varying mental states. The crucial point is that, since the S-states occurring during covert actions are to a great extent similar to the states occurring during overt actions, it is not unreasonable to posit that real, overt actions are also the results of the same internal simulation process. Running such internal simulations on an interconnected set of neuronal networks is, in our view, the main function of what is known as body schema.

7.2.2 Synergies

Synergy is a compound noun of Greek origin that implies the interaction and cooperation of two or more elements for carrying out some function or work which is difficult or impossible to achieve with isolated elements. Bernstein (1935) was among the first to use this term for describing the complexity of the motor system, recognizing that the central problem in the neural control of movement is motor redundancy, namely the imbalance between (a small number of) task-related variables and the (extremely large number of) muscles and mechanical degrees of freedom (DoF). He suggested that the brain uses synergies to solve this problem, giving this term a strongly cybernetic meaning, indeed years before Norbert Wiener invented the term cybernetics: the idea, although not developed in a mathematical model, was that synergies allow the brain to get rid of task-irrelevant degrees of freedom, thus focusing on the simpler problem of mastering a smaller number of task-relevant variables. In this sense, a synergy can be conceived as a “dimensionality-reduction device,” and as such it has been criticized by some (e.g., Diedrichsen and Classen 2012) on the grounds that deterministic constraints on the evolution of DoFs would imply the inability to achieve large subsets of physically possible postures, an inability which is contradicted by a number of experimental findings in speech motor control, whole-body reaching, brain–machine interfaces, etc. However, this criticism can be overcome by supposing that the computational mechanism responsible for constraining DoFs and muscle activation patterns, in such a way as to allow a small number of command variables to coordinate them in a purposive manner, is not hardwired but is sensitive to task requirements, imposing task-related constraints in the preparation time of an action.
In this view, biologically plausible synergy formation mechanisms must be multireferential, in the sense of allowing task-modulated bidirectional dynamic interactions among different spaces: end-effector space, joint and muscles space, and possibly spaces related to the DoFs of manipulated tools. If such dynamic interactions are acquired by the brain of a subject via training in the real world, they will incorporate implicitly causality constraints, thus allowing a synergy formation mechanism to bind together high-dimensionality and low-dimensionality computational processes. This means that dimensionality reduction can coexist with full dimensionality representation also in a deterministic framework, provided that suitable dynamic processes link the different spaces. Later on we describe a mathematical model, based on Passive Motion Paradigm (PMP), that can achieve this goal.

In recent years a lot of effort has been focused on muscle synergies (D’Avella et al. 2003). It has been found that, for a wide variety of motor tasks, muscle activation patterns evolve in low-dimensional manifolds and thus can be approximated by the linear composition of a small set of predefined/primitive patterns or modules, i.e., the basis vectors of such a low-dimensional subspace. However, can we conclude from this empirical evidence that muscle synergies are explicitly encoded or stored in the brain, thus becoming the building blocks of the synergy formation mechanism? It is indeed possible that the observed correlations and regularities are not determined by the immediate readout of hypothetical modules, for which there is no concrete evidence, but are the effects of a multireferential neural dynamics that does not need to explicitly store or encode a number of high-dimensional patterns. It has been shown, for example, by Kutch and Valero-Cuevas (2012), that biomechanical constraints can explain the low dimensionality of muscle synergies without the need for an explicit neural coding, and it is conceivable that the specific dynamic modules incorporate such constraints in the production of synergistic patterns. A recent study with frog leg muscles before and after transection at different levels of the neuraxis (Roh et al. 2011) shows that muscle synergies are organized within the brain stem and spinal cord and are activated by descending commands. Moreover, microstimulation of cortical areas (Overduin et al. 2012) is capable of evoking muscle synergies that match those extracted from natural movements. But again, this does not imply that muscle synergies are explicitly coded in the corticospinal motor system, although it is compatible with the neural origin of such synergies (Bizzi and Cheung 2013).
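
The linear-composition statement above can be written compactly as follows. The symbols are assumptions of this sketch rather than notation taken from the cited studies: \( \overrightarrow{m}(t) \) is the vector of activations of \( M \) muscles, \( {\overrightarrow{w}}_i \) are \( N \) fixed basis patterns, and \( c_i(t) \) are time-varying combination coefficients.

$$ \overrightarrow{m}(t)\approx \sum_{i=1}^{N}{c}_i(t)\,{\overrightarrow{w}}_i,\kern2em N\ll M $$

Note that this decomposition only describes the recorded data: it holds equally well whether the \( {\overrightarrow{w}}_i \) are stored modules or by-products of an underlying dynamics, which is exactly the question raised in the paragraph above.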

It is also worth mentioning that the idea of storing muscle synergies, as basic motor primitives, is similar to the rationale of the model proposed years before by Rosenbaum et al. (1995), which defends the idea that motor planning is based on “goal postures,” selected from a “database” of stored postures. “Goal postures” take the place of “muscle synergies,” but the underlying idea is the same: using a limited, but sufficiently rich, number of high-dimensional patterns to be combined by a synergy formation process. The underlying issue, in our opinion, is the memory vs. computation trade-off: is it better to find the solution of a problem by storing a database of predefined solutions or by simulating an internal, generic, computational model? The answer is not unique, and probably the brain can switch between one method and the other in different situations. However, in the case of whole-body motor control, the curse of dimensionality, namely, the exponential growth of computational complexity when the number of recruited degrees of freedom increases, is likely to hit the memory solution earlier than the computational solution.
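
To make the trade-off concrete, the following is a purely illustrative count, not a model of any cited work. It assumes a hypothetical lookup table that samples every DoF at a fixed number of levels (the posture database of Rosenbaum et al. is not described as such a full grid) and compares its size with the number of entries of a task-space Jacobian, which is what a computational solution has to evaluate per step. The function names and the numbers are assumptions chosen for the example.

```python
def stored_posture_count(levels_per_dof, n_dof):
    """Entries of a hypothetical table sampling each DoF at `levels_per_dof` values."""
    return levels_per_dof ** n_dof


def jacobian_entry_count(task_dims, n_dof):
    """Entries of a task_dims x n_dof Jacobian, evaluated once per simulation step."""
    return task_dims * n_dof


if __name__ == "__main__":
    # Assumed sampling: 10 levels per DoF, 3 task dimensions.
    for n_dof in (2, 7, 20, 50):
        print(n_dof, stored_posture_count(10, n_dof), jacobian_entry_count(3, n_dof))
```

The table grows exponentially with the number of DoFs while the Jacobian grows linearly, which is the sense in which the memory solution is hit first.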

7.2.3 Motor Synergies and Motor Imagery

Recent discoveries about motor imagery are slowly revolutionizing our grasp of motor control and motor cognition. Motor imagery, which can be defined as the set of mental processes occurring when a movement is imagined or practiced without performing it in an overt way, shares many features with brain activities in real actions, as made explicit by means of brain imaging techniques (Decety 1996). The practical relevance of this empirical finding comes from the effectiveness of mental practice for improving performance in athletic skills (Suinn 1972) and the fact that stroke patients can use mental practice to regain motor function (Sharma et al. 2006). We should also take into account that, in spite of similarities, there is also evidence that motor imagery and neural processes during overt motor behavior are not exactly the same (Coelho et al. 2012). Nevertheless, the very existence of motor imagery indicates that muscle synergies are unlikely to be the basic building blocks of the synergy formation circuitry and suggests that what occurs in the brain during mental rehearsal or mental training reflects an endogenous dynamics, not a dynamics related to the neuromuscular system as involved in overt movements. In other words, “muscleless” motor synergies, occurring in covert movements, might be the hidden building blocks which stand behind the recorded muscle synergies.

In any case, there is mounting evidence accumulated from different directions such as brain imaging studies (Frey and Gerry 2006; Grafton 2009), mirror neuron systems (Rizzolatti et al. 1996; Rizzolatti and Luppino 2001; Rizzolatti and Sinigaglia 2010), and embodied cognition (Gallese and Sinigaglia 2011; Gallese and Lakoff 2005) that generally supports the idea that action “generation, observation, imagination, and understanding” share similar underlying functional networks in the brain: distributed, multicenter neural activities occur not only during imagination of movement but also during observation and imitation of others’ actions (Buccino et al. 2001; Anderson 2003; Frey and Gerry 2006; Grafton 2009; Iacoboni 2009) and comprehension of language, namely action-related verbs and nouns (Pulvermüller and Fadiga 2010; Glenberg and Gallese 2012). Such neural activation patterns include premotor and motor areas as well as areas of the cerebellum and the basal ganglia. During the observation of movements of others, an entire network of cortical areas, called the “action observation network,” is activated in a highly reproducible fashion (Grafton 2009). The central hypothesis that emerges from these results is that motor imagery and motor execution draw on a shared set of cortical and subcortical mechanisms underlying motor cognition.

On the other hand, single-cell recordings of motor cortical neurons have provided an apparently different picture, showing that those neurons are characterized by rather broad tuning functions and suggesting the theory of population coding of some kind of population parameter. However, after the early seminal study by Georgopoulos et al. (1986), who proposed that movement direction might be the coded parameter, alternative interpretations were proposed also on theoretical ground (Mussa-Ivaldi 1988), by showing that the same experimental findings can be correlated indeed with different movement-related parameters. Other experimental studies have also shown that the activity of motor cortical neurons correlates with a broad range of parameters of motor performance from spatial target location to hand or joint motion, joint torque, muscle activation patterns, etc. In other words, the correlation between an internal variable, such as the discharge frequency of a motor neuron, and a specific aspect of an empirically measured movement is a very weak form of explanation of the organization of the motor system.
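
For readers unfamiliar with population coding, the following minimal sketch shows the direction-coding reading of broadly tuned neurons. It assumes cosine tuning around a preferred direction and uniformly spaced preferred directions; the baseline, gain, and function names are hypothetical values chosen for illustration and are not taken from the cited studies.

```python
import math


def firing_rates(direction, preferred, baseline=10.0, gain=8.0):
    """Broadly (cosine) tuned rates of each neuron for a movement direction in radians."""
    return [baseline + gain * math.cos(direction - p) for p in preferred]


def population_vector(rates, preferred, baseline=10.0):
    """Sum preferred-direction unit vectors weighted by rate change; return its angle."""
    x = sum((r - baseline) * math.cos(p) for r, p in zip(rates, preferred))
    y = sum((r - baseline) * math.sin(p) for r, p in zip(rates, preferred))
    return math.atan2(y, x)


if __name__ == "__main__":
    # Assumed population: 8 neurons with preferred directions spaced 45 degrees apart.
    preferred = [2.0 * math.pi * i / 8 for i in range(8)]
    rates = firing_rates(0.7, preferred)
    print(population_vector(rates, preferred))
```

The decoded angle matches the movement direction under these idealized assumptions. As the paragraph above stresses, such a match is only a correlation: the same rates would correlate equally well with any other variable that covaries with direction in the task.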

This kind of indeterminacy is also found in a related area of motor control study: the attempt to explain motor invariants, such as the speed–accuracy trade-off (Woodworth 1899), the bell-shaped speed profile of aiming movements (Morasso 1981), or the power law relating the speed and curvature profiles of continuous drawing movements (Lacquaniti et al. 1983), by means of optimization processes to be associated with the main synergy formation process. Also in this case the empirically characterized smoothness of natural movements is compatible with different optimization criteria, but fails to identify in a strong manner a single organizing principle. Thus, the quest for the prevailing motor parameter directly encoded by neuron firing and the optimization criterion specifically employed in the neural control of movements both appear to be an elusive “holy grail.” The crucial point, in our opinion, is that the direct encoding/storage of specific features or criteria is basically a static concept: it may be appropriate, at least as a first-order approximation, for describing sensory/perceptual processing but fails to capture the essence of “ergonomics” (in the wide sense of the word’s etymology), namely, the capability of human beings to generate extremely complex spatiotemporal patterns, required for performing purposive actions, while interacting with external systems and environments. Another essential feature of “ergonomics” is flexibility, in the sense that each action can potentially recruit all the DoFs of the whole body, with the requirement of a rapid reorganization of the specifically recruited body parts as a function of task and environmental requirements. This makes the static encoding of movement parameters impossible or at least nonfunctional.

The alternative to static encoding is endogenous dynamics of brain circuitry which indirectly supplies the outflow of motor commands and, in turn, is sensitive to the inflow of reafferent signals. This is an idea supported by Churchland et al. (2012) who recently proposed that the evolution over time of the state vector of a cortical map (namely, the instantaneous distribution of firing rates for all the neurons of a map) can be better characterized by a nonlinear differential equation, driven by some external input vector, rather than by a direct static encoding of movement parameters. In this framework, the tuning properties of individual neurons are unintended consequences of the fact that the state vector (or population code) is causally determining the motor outflow, although in an indirect way. We agree with this idea, but we should also consider that it has been around for at least two decades, although as the opinion of a small minority: we welcome its resurrection in the context of new evidence and renewed thinking.
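
The contrast can be stated schematically as follows, with symbols that are assumptions of this sketch: \( r_i \) is the firing rate of neuron \( i \), \( \theta \) a movement parameter, \( \overrightarrow{x} \) the state vector of the cortical map, and \( \overrightarrow{u} \) the external input vector.

$$ \mathrm{static\ encoding:}\kern0.5em {r}_i(t)={g}_i\left(\theta (t)\right)\kern2em \mathrm{endogenous\ dynamics:}\kern0.5em \frac{d\overrightarrow{x}}{dt}=f\left(\overrightarrow{x},\overrightarrow{u}(t)\right) $$

In the second form, any tuning of a single component of \( \overrightarrow{x} \) to a movement parameter is a consequence of the dynamics rather than a quantity the network is built to represent.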

7.2.4 Motor Synergies and the Equilibrium Point Hypothesis

The concept of synergy, as a “dimensionality-reduction device,” was accompanied in early studies by the attempt to assign a regulatory role to the “springlike” behavior of muscles (Bernstein 1935), a behavior that was indeed suggested by several experimental studies in the 1960s and 1970s (Asatryan and Feldman 1965; Bizzi and Polit 1978, among others). The central idea was that there is no hope of explaining biological movement in terms of engineering servomechanism theory, an approach supported, for example, by Marsden et al. (1972): first of all because muscles are not force/torque generators like electrical motors, but mainly because the propagation delays in the feedback loop are a severe potential source of instability. In contrast, intrinsic muscle stiffness has two strong beneficial effects: (1) it provides, locally (i.e., in a muscle-wise manner), an instantaneous disturbance compensation action, and (2) it induces, globally (i.e., in a total body-wise manner), a multidimensional force field with attractor dynamics. This makes it possible to achieve complex body postures “for free,” without a complex, high-dimensional computational process, simply by allowing the intrinsic dynamics of the neuromuscular system to seek its equilibrium state.

In this framework, movement becomes the transition from an equilibrium state to another, with the remarkable property of “equifinality” (Kelso and Holt 1980), namely, the fact that movement endpoints should be scarcely affected either by small, transient perturbations or by variations in the starting position of the body. Such attractor properties of motor control were confirmed by several studies of electrical stimulation of different parts of the nervous system, such as interneurons in the spinal cord of the frog (Giszter et al. 1993) or pyramidal neurons in the precentral cortex of the monkey (Graziano et al. 2002).
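
The equilibrium and equifinality properties described above can be illustrated with a deliberately idealized single-joint sketch. It assumes two linear springs (a flexor and an extensor, each with its own stiffness and rest angle) and first-order viscous dynamics with no inertia; real muscles are only approximately springlike, and all names and values are hypothetical.

```python
def equilibrium_angle(k_flexor, rest_flexor, k_extensor, rest_extensor):
    """Angle at which the two spring torques cancel (stiffness-weighted mean of rest angles)."""
    return (k_flexor * rest_flexor + k_extensor * rest_extensor) / (k_flexor + k_extensor)


def settle(theta0, k_flexor, rest_flexor, k_extensor, rest_extensor,
           damping=1.0, dt=0.01, duration=10.0):
    """Let the joint relax from theta0 under the spring field (explicit Euler, overdamped)."""
    theta = theta0
    for _ in range(int(round(duration / dt))):
        torque = -k_flexor * (theta - rest_flexor) - k_extensor * (theta - rest_extensor)
        theta += (torque / damping) * dt
    return theta


if __name__ == "__main__":
    params = (2.0, 1.0, 2.0, -0.2)  # assumed stiffnesses and rest angles (rad)
    print(equilibrium_angle(*params))
    # Equifinality: different starting angles end at the same posture.
    print(settle(-1.0, *params), settle(2.0, *params))
```

In this toy model, shifting the rest angles moves the equilibrium (a movement), whereas scaling both stiffnesses by the same factor leaves the equilibrium unchanged and only stiffens the joint.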

In reality, the picture is more complicated, in the sense that detailed experimental investigations show, for example, that muscles can only be approximated by ideal springs and that equifinality can be somehow violated by small, impulsive force disturbances (Popescu and Rymer 2000) or specific environmental conditions. In spite of this, we believe that EPH can explain a lot of the overall rationale underlying synergy formation, although it cannot cover the whole range of situations. Consider, for example, the stability of the upright standing body and the coordination in whole-body aiming movements: in this case, muscle stiffness alone is insufficient to achieve stability (Loram and Lakie 2002) and requires a parallel intermittent control action (Asai et al. 2009); on the other hand, the appropriate synchronization of ankle and hip strategies, which is essential for whole-body aiming, is nicely explained by means of an extended force field-based coordination model (Morasso et al. 2010), based on the Passive Motion Paradigm (see below).

Motor imagery is quite important, again, for framing the discourse in the right perspective. In humans and other species at the high stages of phylogenetic development, actions can be goal oriented, not necessarily stimulus oriented, and can occur in anticipation of events/stimuli or in learned cycles; real/overt actions can therefore alternate with covert/mental actions in order to optimize the chance of success in a game or during social interaction. Overt actions are thus just the tip of an iceberg: hidden under the surface is a vast territory of actions without movements (covert actions), which are at the core of motor cognition. This has two main consequences: (1) the format of spatiotemporal patterns of purposive actions, namely, the organization of the synergy formation process, must be shared by covert and overt actions; (2) this format cannot be strictly dependent upon the physics of the body and the neuromuscular system, because in covert actions there is no motion of body masses or contraction of the muscles. We may then derive the hypothesis that the endogenous dynamics of cortical maps is basically the same in overt movements, when it drives the formation of neuromuscular activation patterns, and in covert movements, when it carries out mental simulations of the same movements. This concept is implicit in the Mental Simulation Theory (Jeannerod 2001), and in a similar line of reasoning, we may quote recent experiments on motor planning in tasks that require the careful coordination of rotation and translation of objects (Cohen and Rosenbaum 2011): these experimental results support theories of synergy formation as a process that generates holistic body changes between successive goal postures (Rosenbaum et al. 1995, 2001) or the Ideomotor Theory, which claims that actions are triggered by the anticipation of intended effects (Herbort and Butz 2012).

7.2.5 Motor Synergies and the Body Schema

That humans have an integrated, internal representation of their body (the body image or body schema; see Footnote 1) is strongly suggested by the variety of pathological conditions which can only be explained by a deficient internal representation (Head and Holmes 1911). More recent studies (for reviews see Graziano and Botvinick 2002; Haggard and Wolpert 2005) have identified the different cortical areas that may contribute to this function (area 5 in the superior parietal lobe and possibly premotor and motor areas) and the multimodal integration of proprioceptive, visual, tactile, and motor feedback signals that is necessary for maintaining a coherent spatiotemporal organization. It has also been suggested that such continuous body experience may be one of the key elements for allowing the emergence of individual self-consciousness. However, the role of the body schema in synergy formation needs to be investigated in more depth. We believe indeed that running internal simulations on an interconnected set of neuronal networks is perhaps one of the main functions of the body schema. Therefore, the body schema must not be considered as a static structure, like Penfield’s homunculus, but as a dynamical system that generates goal-oriented, spatiotemporal, sensorimotor patterns.

This view of the body schema is clearly multireferential and resonates well with many ideas investigated in the framework of embodied cognition: (1) cognition is situated, in the sense that it is an online process which takes place in the context of task-relevant sensorimotor information; (2) cognition is time pressured, i.e., it is constrained by the requirements of real-time interaction with the environment, which is also known as the “representational bottleneck” (Brooks 1991; Pfeifer and Scheier 1998, among others); (3) the environment is part of the cognitive system, including both the physical and social environment; (4) cognition is intrinsically action oriented, and even “off-line cognition,” namely, cognition without overt action, is body based, as argued by Lakoff and Johnson (1999), who remarked that on most occasions abstract concepts are based on metaphors grounded in bodily experience/activity.

We agree with Brooks (1991) that “the world is its own best model,” but we also believe that a human being, as well as a humanoid robot, needs an internal model or representation of its own body or body schema, extended with an internal representation of the environment and the mastered tools that allow him/her/it to succeed in physical/social interaction. Such body schema does not need to be a faithful biomechanical model, including the finest details of flesh and bones. It is just a skeleton or middleware representation where it is possible to play plausible spatiotemporal games, required at the same time and formulated in the same language by motor cognition and motor control. The power of the concept is that a well-trained agent can use it to interpret/anticipate the actions of other agents or also imagine actions that are physically impossible but crucially important for figuring out the solution of a difficult task (Fig. 7.1).

Fig. 7.1 Purely reactive system (left panel) vs. cognitive system (right panel)

The introduction of the body schema as a middleware implies two important concepts in the analysis of the organization of action: one concept is the necessity and convenience of separating motor cognition from motor control, in a multireferential framework; the other concept is the identification of different time frames. The first concept is related to flexibility and the necessity of degrees of abstraction in the acquisition of skills. Mental reasoning and mental training can be powerful and effective only if it is possible to abstract from specific environmental conditions that can require different control strategies. The capability of abstraction is made possible by a body schema that allows real and imagined actions to be formulated in the same format. This logical separation of motor cognition and motor control implies the identification of three different time frames: (1) learning time, for acquiring an approximate representation of the model modules; (2) preparation time, for recruiting the necessary body parts, configuring the networks, and setting up the specific task-dependent components; and (3) real time, for running the internal simulation of the body model and thus generating control patterns for either covert or overt actions.

7.2.6 Implementing the Body Schema by Means of the Passive Motion Paradigm

The PMP (Mussa-Ivaldi et al. 1988) was conceived as an extension of the EPH from motor control to motor cognition. The idea is that there are two attractor dynamics, nested one inside the other, which cooperate for action generation: the more internal one expresses an endogenous brain activity, related to an internal model or body schema, and is the one responsible for covert movements (as such, it does not involve body masses, muscle stiffness, and muscle synergies); the more external one, related to the conventional EPH, exploits the physical equilibrium states determined by the biomechanics of the body. Our hypothesis is that the two dynamical regimes are compatible and integrated in the same structure, allowing subjects to shift effortlessly from mental simulations of actions to real actions and back, in agreement with the evidence coming from brain imaging.

The Passive Motion Paradigm is a force field-based mechanism of synergy formation that makes it possible to coordinate the motion of a redundant set of articulations while carrying out a task, like reaching or tracking an object. Originally, it was formulated in order to demonstrate that, when carrying out inverse kinematics with a highly redundant system, it is not necessary to introduce an explicit optimization process. The idea can be expressed, in qualitative terms, by means of the animated puppet metaphor (Fig. 7.2 left panel) or the “flying hand metaphor” (Fig. 7.2 right panel), suggested by Marc Jeannerod. The key point, in both cases, is that in reaching movements, it is not the proximal part of the body which is pushing the end effector to the target but the other way around: the end effector is pulled towards the target by the force field and in turn pulls the rest of the body.

Fig. 7.2 Animated puppet metaphor (left panel). Flying hand metaphor (right panel)

In mathematical terms, let us represent the intention to reach a target \( {\overrightarrow{p}}_{\mathrm{T}} \) by means of a force field \( {\overrightarrow{F}}_{\mathrm{H}} \), aimed at the target and attached to the hand \( {\overrightarrow{p}}_{\mathrm{H}} \) (see footnote 2). The force field \( {\overrightarrow{F}}_{\mathrm{H}} \) is mapped into an equivalent torque field \( {\overrightarrow{T}}_{\mathrm{A}} \), acting on all the joints of the arm (vector \( \overrightarrow{q} \)), by means of the transpose Jacobian matrix \( {J_{\mathrm{B}}}^{\mathrm{T}} \) (see footnote 3); it is worth mentioning that the torque field has a much higher dimensionality than the force field as a consequence of the redundancy of the arm. The torque field induces in the body schema a distribution of incremental joint rotations, modulated by the admittance matrix \( A \). In turn, the joint rotation pattern is mapped into the corresponding hand motion pattern, thus updating the attractor force field and closing the computational loop:

$$ \begin{array}{ccc}\hfill \underset{\mathrm{H}\mathrm{and}\kern0.5em \mathrm{space}}{\left\{\begin{array}{l}\frac{d{\overrightarrow{p}}_{\mathrm{H}}}{ dt}={J}_{\mathrm{B}}\frac{d\overrightarrow{q}}{ dt}\\ {}{\overrightarrow{F}}_{\mathrm{H}}=K\left({\overrightarrow{p}}_{\mathrm{T}}-{\overrightarrow{p}}_{\mathrm{H}}\right)\cdot \varGamma (t)\end{array}\right.}\hfill & \hfill \begin{array}{l}\leftarrow {J}_{\mathrm{B}}\leftarrow \\ {}\to {J_{\mathrm{B}}}^{\mathrm{T}}\to \end{array}\hfill & \hfill \underset{\mathrm{A}\mathrm{rm}\kern0.5em \mathrm{joint}\kern0.5em \mathrm{space}}{\left\{\begin{array}{l}\frac{d\overrightarrow{q}}{ dt}=A{\overrightarrow{T}}_{\mathrm{A}}\\ {}{\overrightarrow{T}}_{\mathrm{A}}={J_{\mathrm{B}}}^{\mathrm{T}}{\overrightarrow{F}}_{\mathrm{H}}\end{array}\right.}\hfill \end{array} $$
(7.1)
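
As a minimal numerical sketch of the loop in (7.1), the following Python fragment integrates the PMP dynamics for a planar three-joint arm reaching a point target. The arm geometry, link lengths, gains, Euler step, and all function names are illustrative assumptions and are not taken from the chapter; Γ(t) is set to 1 here, so the target is approached only asymptotically.

```python
import math

# Hypothetical planar arm with three links (lengths in metres); illustrative only.
LINKS = [0.3, 0.3, 0.2]


def hand_position(q):
    """Forward kinematics: joint angles q -> hand position p_H."""
    x = y = angle = 0.0
    for length, qi in zip(LINKS, q):
        angle += qi
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def jacobian(q):
    """2 x 3 Jacobian J_B of the hand position with respect to q."""
    n = len(q)
    cum, angle = [], 0.0
    for qi in q:
        angle += qi
        cum.append(angle)
    J = [[0.0] * n for _ in range(2)]
    for j in range(n):
        # Joint j rotates every link from j to the tip.
        J[0][j] = -sum(LINKS[k] * math.sin(cum[k]) for k in range(j, n))
        J[1][j] = sum(LINKS[k] * math.cos(cum[k]) for k in range(j, n))
    return J


def pmp_reach(q, target, K=20.0, A=(1.0, 1.0, 1.0), dt=0.01, steps=5000):
    """Euler integration of (7.1) with Gamma(t) = 1 (assumed values for K, A, dt)."""
    q = list(q)
    for _ in range(steps):
        px, py = hand_position(q)
        # Force field in hand space: F_H = K (p_T - p_H)
        F = (K * (target[0] - px), K * (target[1] - py))
        J = jacobian(q)
        # Torque field in joint space: T_A = J_B^T F_H
        T = [J[0][j] * F[0] + J[1][j] * F[1] for j in range(len(q))]
        # Joint rotations through the (diagonal) admittance: dq/dt = A T_A
        q = [qj + dt * a * t for qj, a, t in zip(q, A, T)]
    return q


if __name__ == "__main__":
    q_final = pmp_reach([0.5, 0.5, 0.5], (0.35, 0.5))
    print("final joint angles:", q_final)
    print("final hand position:", hand_position(q_final))
```

Only the Jacobian and its transpose appear in the loop: no matrix is inverted, and the redundancy (three joints for two hand coordinates) is resolved implicitly by the relaxation of the force field.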

The mathematical description of the PMP summarized by (7.1) can be expressed graphically by means of the block diagram of Fig. 7.3. The transient induced by the activation of the force field is terminated when the target is reached, if it is reachable. If the target is not reachable, for example, if it is outside the workspace, the final posture is the one that minimizes the final positioning error. It should be noted that all the computations in the loop are “well posed” (only the Jacobian and its transpose are used, with no matrix inversion), and thus this computational model is robust. In any case, if the force field remains stationary during the movement, the acquisition of the new equilibrium state occurs in an asymptotic manner, and thus reaching time is not controlled. Such time can be controlled by means of a technique proposed by the group of Michael Zak (1988), called terminal attractor dynamics, which consists of a suitable nonlinear modulation of the force field, whose gain tends to diverge to infinity as time approaches the intended deadline. The Γ(t) function or nonlinear time-base generator implements such modulation. The function can be considered as a kind of “neural pacemaker” (Barhen et al. 1989), and a biologically plausible representation can be identified in the cortico-basal ganglia–thalamocortical loop and the well-established role of the basal ganglia in the initiation and speed control of voluntary movements. In other words, synergy formation requires a symphonic director, not a mere metronome, namely, a coordination entity that, in addition to giving the tempo, recruits the different sections of the orchestra, modulates the emphasis of the different melodic pieces, etc.: the gating action of the Γ(t) function is the key element of this symphonic action.
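
The effect of the time-base generator can be illustrated on a one-dimensional attractor. This section does not spell out Γ(t); the sketch below assumes one form used in the PMP literature, Γ(t) = ξ̇/(1 − ξ), where ξ is a minimum-jerk ramp from 0 to 1 over the duration τ. The effective gain of 1, the step count, and the function names are likewise assumptions made for illustration.

```python
def xi(s):
    """Minimum-jerk ramp from 0 to 1 for normalized time s in [0, 1]."""
    return 6 * s**5 - 15 * s**4 + 10 * s**3


def xi_dot(s, tau):
    """Time derivative of xi for a movement of duration tau."""
    return (30 * s**4 - 60 * s**3 + 30 * s**2) / tau


def gamma(s, tau):
    """Assumed time-base generator; its value grows without bound as s -> 1."""
    return xi_dot(s, tau) / (1.0 - xi(s))


def reach(x0, target, tau=1.0, steps=1000, gated=True, K=5.0):
    """Euler integration of dx/dt = rate * (target - x).

    gated=True  : rate = Gamma(t), terminal attractor (deadline tau)
    gated=False : rate = K, stationary field (asymptotic approach)
    """
    dt = tau / steps
    x = x0
    for n in range(steps):
        s = n / steps  # normalized time, always < 1 inside the loop
        rate = gamma(s, tau) if gated else K
        # Cap the per-step gain so the discrete update never overshoots.
        gain = min(rate * dt, 1.0)
        x += gain * (target - x)
    return x


if __name__ == "__main__":
    print("gated, error at deadline:     ", abs(1.0 - reach(0.0, 1.0, gated=True)))
    print("stationary, error at deadline:", abs(1.0 - reach(0.0, 1.0, gated=False)))
```

With this choice the residual error evolves in continuous time as (1 − ξ(t)) times the initial error, so it vanishes exactly at t = τ, whereas the stationary field leaves an exponentially decaying residual at the same instant.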

Fig. 7.3 Top panel: PMP network. The basic kinematic constraint that links the hand and joint spaces is represented by the Jacobian matrix. Additional constraints, in the hand and joint spaces, can be represented by means of corresponding force or torque fields. Bottom panel: Articulated body schema within the PMP framework, to be configured in the preparation time of an action with a selection of tools, targets, time-base generators, and specific constraints

The Γ(t) function was not present in the original PMP model, and it was added later on (Mohan and Morasso 2011; Mohan et al. 2009, 2011a, b) when the model was applied to the iCub robot (Metta et al. 2010). The movements determined by the model are described as “passive” in the sense that the animation of a marionette is passive: the joint rotation patterns are not explicitly programmed but are the consequences of applying a set of forces to the terminal parts of the marionette. A similar point of view has been followed by Kutch and Valero-Cuevas (2012) in their analysis of muscle synergies, but with a different conclusion: they show that the biomechanics of the limbs constrain musculotendon length changes to a low-dimensional subspace across all possible movement directions and then propose that “a modest assumption”—that each muscle is independently instructed to resist length change—can explain the formation of neuromuscular synergies. The “modest assumption” of Kutch and Valero-Cuevas (2012) is equivalent to the “passive motion” above. However, the conclusion by the former authors (namely, that “muscle synergies will arise without the need to conclude that they are a product of neural coupling among muscles”) is not the only possible one. The alternative, exemplified by the PMP hypothesis, is that the neural coupling (or the organized S-state, borrowing the terminology of Jeannerod 2001) is just the result of the simulation of the passive motion induced by the internal body model.

The simple PMP network of Fig. 7.3 (top panel) is just an example of the body schema employed in a simple reaching task. Basically, the model of the body schema is embedded in the Jacobian matrix, and the model of the task in the force field generator and the admittance matrix. The network can be easily generalized to whole-body movements, which recruit all the DoFs of the body, can be expanded in order to integrate manipulated tools, and can be easily specialized to a variety of tasks, even multiple, concurrent tasks (Fig. 7.3, bottom panel).

In the PMP framework, force fields, admittance, and stiffness matrices do not refer to physical entities, as happens in the classical EPH framework, but to features of the attractor dynamics of the internal body model. In particular, the “admittance” matrix A specifies the degree of participation of each degree of freedom to the common reaching movement, and thus it can be manipulated according to specific task requirements.
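
The role of the admittance matrix as a selector of participating degrees of freedom can be sketched with a single incremental step of the loop. The Jacobian entries, the force, and the helper name below are made-up illustrative values, and A is assumed diagonal.

```python
def joint_increment(J, A_diag, F):
    """One incremental step dq = A J^T F for a diagonal admittance matrix A."""
    n_joints = len(A_diag)
    torques = [sum(J[i][j] * F[i] for i in range(len(F))) for j in range(n_joints)]
    return [a * t for a, t in zip(A_diag, torques)]


# Hypothetical 2 x 3 Jacobian of a planar three-joint arm at some posture.
J = [[-0.60, -0.45, -0.20],
     [0.44, 0.18, 0.01]]
F = [1.0, 0.0]  # hypothetical force field pulling the hand along x

all_joints = joint_increment(J, [1.0, 1.0, 1.0], F)
first_locked = joint_increment(J, [0.0, 1.0, 1.0], F)

if __name__ == "__main__":
    print("all joints participate:", all_joints)
    print("first joint excluded:  ", first_locked)
```

Setting an entry of A to zero removes the corresponding joint from the synergy without changing the force field or the Jacobian; intermediate values grade its contribution.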

7.2.7 A Biologically Plausible Implementation of the PMP

A biologically plausible neural architecture that is consistent with the PMP dynamics described by (7.1) or the model of Fig. 7.3 is described in Morasso et al. (1998). It is formulated in terms of a collection of macro-neurons, each of which summarizes the activity level of a cortical column and is characterized by a nonlinear ordinary differential equation (ODE), gated by the same Γ(t) function defined above. These neural ensembles can be considered “maps” because the lateral connections correspond to a semi-regular grid. The rate of change of the activity of each macro-neuron is modulated by three elements: (1) a local inhibitory input; (2) a recurrent neighboring excitatory input, due to lateral connections inside the map; and (3) another excitatory input originating from external sources. The model is consistent with what is known about the cytoarchitecture of motor cortical areas. In fact, the majority of synapses in the mammalian neocortex originate from cortical neurons. In particular, lateral connections from superficial pyramids tend to be characterized by recurrent excitation with other pyramids (about 80 % of the total), while only about 20 % of the synaptic connections are with inhibitory intra-columnar interneurons (Nicoll and Blakemore 1993). It is well known that recurrent excitation in neural networks can implement many interesting functions, like finite-state automata, associative memories, or spatiotemporal pattern formation (McCulloch and Pitts 1943; Cohen and Grossberg 1983; Hopfield 1984; Morasso et al. 1998). On the other hand, inhibitory synaptic connections are an important part of the intrinsic circuitry of the neocortex, serving to modulate the propagation of sensory information.

More specifically, the inhibitory local field is expressed by a simple “leaky integrator.” The recurrent lateral connections are excitatory and approximately symmetric, as in Hopfield networks, thus making sure that the map is stable, i.e., has an attractor dynamics. We also assume that the pattern of lateral connectivity is acquired through a process of babbling and self-organization, thus encoding the dimensionality and topology of the sensorimotor space represented by the map. The distribution of activity throughout the map via the lateral connections is normalized by a mechanism of gating inhibition that takes into account, for each macro-neuron, the average activity of its neighbors (Reggia et al. 1992; Morasso et al. 1998). Finally, the input field, broadcast to the map by another map or by thalamocortical projections, is channeled to a limited population of macro-neurons via a mechanism of shunting interaction that induces a cluster of activity in the neural population around the neuron that resonates with the input field.

The equilibrium states of this network architecture are characterized by clusters of activation in register with the peak of the external field, i.e., a population code matching the external input field. After a shift of the input field, corresponding to the selection of a new target, the combination of symmetric recurrent excitation, gating inhibition, and shunting interaction induces in the map an attractor dynamics characterized as follows: first, a diffusion process (which initially flattens the population code, spreading the activity pattern over a large part of the network) and, then, a re-sharpening process around the target. The combination of the two processes can be described as a moving hill, namely, the propagation of the population code towards the new target.

Suppose now that we instantiate two cortical maps, with the same network dynamics but with different dimensionality and connectivity: for example, one map for representing hand position and another for representing arm configuration (in the case of arm motor control) or one map for representing speech sounds and another for representing configurations of the vocal tract (in the case of speech motor control). Both cases are characterized by a high degree of redundancy, and thus the latter map will have a larger number of units and a more complex connectivity than the former one. We may suppose that during a process of self-supervised learning or Piagetian circular reaction (Kuperstein 1991), it was possible to acquire two sets of topology-representing intra-connections for the two maps and, at the same time, a set of interconnections between the maps. As a consequence of the redundancy of the system, we may expect that interconnectivity will be “many to one,” i.e., many neurons of the arm map will be connected to each neuron of the hand map, thus representing in a distributed manner the “null-space” of the kinematic transformation between the two spaces.

The external field acting on each map is a combination of a bottom-up external input and an input coming from the cross-connection of the two maps. If no external input is provided, the two maps excite each other, via the two corresponding population codes, representing, for example, the current configuration of the arm and the corresponding position of the hand. Starting from this equilibrium state, if an external input is activated in the hand map, identifying a target position, then an overall dynamics will be induced in both maps by spreading activation via inter- and intra-connections until the population code of the hand settles in the target position and the population code of the arm in one of the many corresponding arm configurations. In principle, this distributed architecture can be extended, up to a full-body representation, by including cortical maps of different body parts as well as cortical representations of manipulated tools (Maravita and Iriki 2004).

The “universal” gating action of the Γ(t) function is critical for making sure that the multiple population codes in a whole-body cortical architecture remain consistent throughout the overall transient from one equilibrium condition to a new one. It can be considered as a deadline enforcing mechanism, and it has been conceived originally for attributing terminal attractor dynamics to associative memories of large size, namely, for assuring that the equilibrium state is achieved in a finite time, independent of the network size and topology. This kind of nonlinear, broadcasted gating action is generally appropriate for coordinating the timing in large-scale, distributed systems, such as different cortical maps. Moreover, the computational necessity of guaranteeing ordinal and temporal structure in complex biological or artificial organisms is supported by recent behavioral experiments (Kornysheva et al. 2013) that suggest the existence of independent ordinal and temporal structures and advocate a nonlinear multiplicative neural interaction of temporal and ordinal signals in the production of motor patterns.

7.2.8 Separating Motor Control from Motor Cognition and Integrating Them via the Body Schema

Figure 7.1 (right panel) illustrates the concept that the body schema can be considered as an internal model which serves as a middleware between the covert virtual movements generated by a motor cognitive machinery and the overt movements generated by the motor controller. In the simplest case (typically used by iCub as a default control mode), the covert movements provide reference trajectories for all the DoFs, which are then controlled as a set of independent PD-controlled servomechanisms. However, this may not be appropriate in a number of significant situations, in particular in the case of unstable tasks.

An example is whole-body reaching while standing. A biomimetic approach, based on PMP, for synergy formation of whole-body movements in humanoid robots is described by Morasso et al. (2010). It combines two tasks: (1) a focal task (reaching or approaching as much as possible a target in 3D space) and (2) a postural task (keeping the vertical projection of the center of mass inside the support base of the standing body). The synergy formation mechanism uses two force fields applied to the body schema: one linked to the hands for the focal part and the other linked to the pelvis for the postural part, thus implementing a hip strategy of stabilization. Remarkably, the simulated patterns generated by the model are consistent with distinctive aspects of human behavior for this kind of task, namely, the synchronized velocity peaks of the reaching hand and the forward shift of the center of mass. However, this PMP-based mechanism is massless and is not yet a control system because it does not provide specific stabilization signals for the inverted pendulum which, at least approximately, represents the standing body. The intrinsic instability of the inverted pendulum model is due to the fact that the rate of growth of the gravity-related toppling torque is greater than the stiffness of the critical joint involved in the stabilization of the standing body, namely, the ankle. Therefore, a controller is needed for providing ankle torque control signals that stabilize the inverted pendulum. A continuous-time PD feedback controller applied to the ankle does not work because the delay of the feedback signals (sway angle and sway speed) becomes itself a source of instability. However, such delay-induced instability can be avoided by means of an intermittent controller (Asai et al. 2009), which closes the loop according to a decision mechanism based on the analysis of the trajectories of the inverted pendulum in the phase space: this mechanism achieves bounded stability, consistent with the recorded sway movements of the standing body, in a robust way. A recent paper (Morasso et al. 2013) demonstrates the feasibility of extending the intermittent controller from quiet standing to dynamic standing. It integrates in a bidirectional manner the PMP synergy formation mechanism, which generates time-varying reference joint rotations, with the intermittent controller, which switches on/off the feedback control law according to the current state of the pendulum. In other situations, as in grabbing/pushing in which there is a physical interaction, the control part of the synergy might be more concerned with a modulation of the end-effector stiffness, in order to take into account task-dependent features like fragility of the manipulated objects. In any case, stiffness modulation requires, as a prerequisite, the selection and real-time adjustment of appropriate body postures that can be naturally provided by the animated body schema.
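
The instability argument for the standing body can be made explicit with a linearized single-joint inverted pendulum. The symbols below are introduced here for illustration and are not defined in the chapter: \( m \) is the body mass, \( g \) the gravitational acceleration, \( h \) the distance of the center of mass from the ankle, \( I \) the moment of inertia about the ankle, \( {K}_{\mathrm{a}} \) the intrinsic ankle stiffness, and \( \theta \) the sway angle. Without active control, and for small sway angles,

$$ I\ddot{\theta }= mgh\sin \theta -{K}_{\mathrm{a}}\theta \approx \left( mgh-{K}_{\mathrm{a}}\right)\theta $$

The toppling torque grows at the rate \( mgh \) per unit of sway angle, while the restoring torque grows at the rate \( {K}_{\mathrm{a}} \). If \( {K}_{\mathrm{a}}< mgh \), the coefficient on the right-hand side is positive and any small deviation from the upright posture grows roughly as \( \exp \left(t\sqrt{\left( mgh-{K}_{\mathrm{a}}\right)/I}\right) \), which is why an additional active stabilization signal at the ankle is required.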

7.3 Beyond Embodiment: Building a Brain to Understand the Brain

In the first part of this chapter, we summarized some concepts about the necessity and usefulness of embodiment and body schema as basic building blocks in the process of building a cognitive architecture of a humanoid robot like iCub. However this is only a kind of preliminary groundwork, and the actual construction is an exciting work in progress. As a matter of fact, our ongoing adventure to build a cognitive architecture for iCub in many ways is linked to the three apparently disparate citations above, namely, the power of understanding fundamental principles through a model building approach, which is essentially decentralized, local to global, nonlinear, non-digital: smooth flow through time and space. All of this relates to cumulative learning and organization of memories in our brain as well as in iCub cognitive system. Indeed our own individual experiences play a fundamental role in leading us to exhibit numerous instances of creativity, rationality, and irrationality in our behaviors. Use of “experience” to go “beyond experience” is important simply because we all inhabit a continuously changing world where neither everything can be known nor can everything be experienced. In order to succeed and ultimately survive, diverse “chunks of knowledge” emerging from one’s past experiences have to be integrated and exploited flexibly in the context of the present state of affairs to ensure smooth realization of goals. How the brain achieves such diversity in control is a central challenge facing both neuroscience and cognitive robotics today.

Simply put, beyond a point a software programmer cannot travel the journey of a cognitive robot. Instead, like natural cognitive agents, cognitive robots must also be endowed with mechanisms that enable them to efficiently organize their sensorimotor experiences into their memories, remember and exploit them effectively when needed to realize their goals, and, at the same time, keep learning new things. Enabling them to do so presents a unique opportunity to emulate the gradual process of infant development and investigate the underlying interplay between multiple sensory, motor, and cognitive processes from the perspective of an integrated system that perceives, acts, learns, remembers, forgets, reasons, makes mistakes, introspects, etc. To this effect, even simple experiments with a humanoid like iCub offer us an exciting medium to “build a brain to understand the brain” and contemplate numerous open questions related to the emergence of embodied cognition: how do structures of bodily experience gradually “work their way up” to form abstract patterns of inferences? How do playful interactions between the body and the world sculpt the memories of a cumulative learning robot? When and how do mechanisms related to abstraction, consolidation, and forgetting play a role in shaping cumulative learning and sensorimotor development? What is the role of the teacher in minimizing “blind” trial and error exploration and motivating and influencing the developmental curve? How do all these questions, phrased in the context of a gradually learning and developing humanoid, relate to emerging trends in neuroscience? And finally, to what extent is this kind of “cognitive biomimetism” effective in shaping humanlike capabilities in a humanoid robot?
We are currently investigating these fundamental issues with the help of numerous playful experiments with iCub that attempt to achieve cumulative development of procedural, semantic, and episodic memories and the parallel development of a brain-guided computational framework to organize and creatively exploit such learned knowledge for the realization of goals.

In general, after the tryst with GOFAI, most current research in the field of cognitive developmental robotics appreciates the fact that “sensorimotor experience precedes representation” and cognition is gradually bootstrapped through a cumulative process of learning by interaction (physical and social) within the zone of proximal development (Vygotsky 1978) of the agent. This approach indeed has roots in Wiener’s cybernetics (1948), Varela and Maturana’s autopoiesis (1974), Chiel and Beer’s neuroethology (1997), Clark’s situatedness (1997), Hesslow’s simulation hypothesis (Hesslow 2002; Hesslow and Jirenhed 2007), and Thompson’s enactive cognition (2007). The obvious reason to pursue this path is because it is impossible to predict and program at design time every possible situation in every time instance to which an artifact may be subjected to in the future. Straight robot programming approaches work for simple machines performing targeted functions but certainly not for general-purpose robotic companions envisaged to interact with humans in unstructured environments. Complementing the extrinsic application of specific value, the embodied/enactive approach is also relevant from an intrinsic viewpoint of understanding our own selves—understanding how interactions between body and the brain shape the mind and shape action and reason. This is because in addition to the range of direct problems typical of conventional physics, which involve computing effects of forces on objects, brains of animals have also to deal with inverse, typically ill-posed, problems of learning, reasoning, and choosing actions that would enable realization of one’s goals and hence ultimately survive. 
Strikingly, many of the inverse problems faced by the brain to learn, reason, and generate goal-directed behavior, together with the ability to make predictions inherent with the solution of direct problems, are indeed analogous to the ones roboticists must solve to make their robots act cognitively in the real world. At the same time, it is only fair to say that in spite of extensive research scattered across multiple scientific disciplines and prevalence of numerous machine learning techniques, the present artificial agents still lack much of the resourcefulness, purposefulness, flexibility, and adaptability that biological agents so effortlessly exhibit. Certainly, this points towards the need to develop novel computational frameworks that go beyond the state of the art and endow cognitive agents with the capability to learn cumulatively and use past experience effectively “to connect the dots” when faced with novel situations.

Looking at the incessant loop of gaining experience and using experience, typical of biological species that exhibit some form of cognition, learning and reasoning can be seen as foreground and background alternating each other as intricately depicted in the artistic creations of Escher. In an intriguing work during the early days of embodied/enactive cognition, Mark Johnson (1987) playfully remarked that “we are animals but we are also rational animals,” emphasizing the fact that, like learning, the structure of reasoning and inference also does not transcend the structure of bodily experience. The centrality of embodiment directly influences “what” and “how” things can be meaningful to us, the ways in which our understanding of the world is gradually bootstrapped by experience and the ways in which we reason about them. In this essence, we believe that for cognitive robots foreseen to operate in open-ended unstructured environments, learning and reasoning must cumulatively drive each other in a closed loop: more learning leading to better reasoning and inconsistencies in reasoning driving new learning. In neural computation, this implies that part of the cortical substrates activated during perceptual and motor learning (i.e., when an agent gains experience) are also activated when an agent reasons and simulates the causal consequences of its actions. While resonance between top-down and bottom-up information flows is a measure of the quality of learning, dissonance is the stepping stone to novelty detection for gaining more experience and learning further. Such neural reuse also makes sense considering the fact that brain is a product of evolution, meant to support the survival of a species in its natural environments, and importantly operates under constraints of space, time, and energy. 
A wealth of emerging evidence from neuroscience substantiates this fact (see Gallese and Sinigaglia 2011; Grafton 2009; Martin 2009; Bressler and Menon 2010; Hesslow and Jirenhed 2007 for recent reviews). We believe that this aspect must be an essential design feature in future cognitive robots that have any chance to survive, cooperate, and assist humans in the real world. While emerging results from functional imaging and behavioral studies may serve as a guiding light, there is still an urgent need to also focus on “cognitive computation” and look deeper into the underlying computational principles in order to create artificial cognitive systems that can both be “practically useful” and in turn shed deeper insights into the ongoing “neural computation” in the brain. In this context, building up on an intriguing review a decade back by Germund Hesslow (2002), we believe that computational architectures driving cognitive robots must include the three following basic building blocks that form the core of the embodied simulation hypothesis:

  1. Simulation of action through animation of the PMP-based body schema. This building block was discussed in detail in the Background section. In general one may ask why a cognitive robot like iCub needs a body schema. Simply put, for the same reason a human or a chimp needs it: without one, it would be unable to use its “complex body,” take advantage of it, and ultimately survive. In general, for an organism with a complex body inhabiting an unstructured world, the purpose of “action” is not just restricted to shaping motor output to generate movement but also to provide the self with information on the feasibility, consequence, and understanding of “potential actions” (which could lead to realization of “goals”). We already suggested the “iceberg metaphor” to explain this state of affairs; by adding to it, we should say that there must be continuity between what is above and what is below the surface: the “link or the middleware,” we suggest, is the body schema mechanism. We note here that until recently the issue of body schema has not been very popular in cognitive robotics in comparison to the concept of embodiment. These are not the same things. If you have a body schema, you also have embodiment but not the other way around. Vernon et al. (2010) in their discussion on a roadmap for cognitive development in humanoid robots present a catalogue of cognitive architectures, but in none of them is the concept of body schema a key element. However, emerging trends in neuroscience act as a motivating force to revisit old ideas like synergy formation, EPH, and body schema and reuse them in a larger context to arrive at a shared computational/neural basis for “execution, imagination, and understanding” of action in humans and humanoids.

  2. Simulation of perception and distributed organization of semantic memory. Imagining perceiving something is similar to actually perceiving it, the only difference being that the perceptual activity is generated top-down rather than by environmental stimuli. While this perspective has been emphasized in the reviews of Hesslow (2002, 2007) and Grush (2004), among others, more recent developments on the organization of semantic knowledge in the brain (see Patterson et al. 2007; Martin 2007, 2009; Martin et al. 2011; Damasio 2010) provide further insights that help to constrain computational architectures for cognitive agents. The main finding from these studies is that conceptual information is grounded in a distributed fashion in “property-specific” cortical networks that directly support perception and action. It is also established that “retrieval” or reactivation of the neural representation can be triggered from partial cues coming from multiple modalities: for example, the sound of a hammer retro-activates its shape representation (Meyer and Damasio 2009), and presentation of a real object or a 2D picture of it can both activate the complete network associated with the object. The results indicate that while there is a fine level of “functional segregation” in the higher-level cortical areas processing sensorimotor information, there is also an underlying cortical dynamics that facilitates cross-modal, top-down, and bottom-up activation of these areas. “Higher level” needs to be emphasized because there is reason to believe that both early stages of perception and late stages of action should not be involved in embodied simulation of action and perception, in order to keep a distinction between overt and covert actions, which we deem important for purposive reasoning: there is evidence of this distinction both from motor (Desmurget and Sirigu 2009) and perceptual studies (Martin 2009).

  3. Global integration through small world organization. From a computational perspective, in a large-scale complex system like the brain, efficient integrative mechanisms require a number of organizational properties, such as minimization of the number of processing steps, efficient wiring for minimizing brain volumes and metabolic cost in the transmission of information, and synchronization of neural processes in order to achieve pattern completion and conflict resolution. Recent developments in the fields of network theory (Barabási 2012, 2003) and connectomics (Sporns 2013) provide useful insights in this direction. The point of intersection is the property of “small worldness” now found to be prevalent in many large-scale networks. In simple terms, “small worlds” are complex systems where individual members form tightly knit local communities, characterized by dense local clustering combined with very short path lengths between members. Since the seminal works of Watts and Strogatz (1998) and Barabási and Albert (1999), it is now established that several complex systems like social networks, transportation networks, power grids, connectivity of the Internet, gene networks, food webs, and patterns in sexually transmitted diseases, among several others, exhibit the “small world” property. Emerging evidence from the analysis of the large-scale architecture of the cerebral cortex (Hagmann et al. 2008; Sporns et al. 2002; Sporns 2011, 2013) using techniques like Diffusion Tensor Imaging substantiates the fact that cortical networks of the brain exhibit such a small world property. These studies suggest the existence of a small set of “hubs” (highly connected cortical patches) that closely interact to facilitate swift cross-modal, top-down, and bottom-up interactions between subnetworks involved in learning, simulating, and representing various sensorimotor information.
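
The “small world” property mentioned in the third building block can be illustrated with a toy network. The sketch below is an assumption-laden illustration, not a model of cortical connectivity: it builds a ring lattice and adds a few fixed long-range shortcuts (a deterministic variant in the spirit of shortcut-adding small-world models, rather than the random rewiring of Watts and Strogatz), then compares clustering and mean shortest-path length. The network size, the shortcut pairs, and the function names are arbitrary choices.

```python
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, pairs):
    """Return a copy of the graph with extra long-range links."""
    new = {i: set(nb) for i, nb in adj.items()}
    for a, b in pairs:
        new[a].add(b)
        new[b].add(a)
    return new


def clustering(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for neighbours in adj.values():
        nb = list(neighbours)
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for x in range(k) for y in range(x + 1, k)
                    if nb[y] in adj[nb[x]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all node pairs (breadth-first search)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


lattice = ring_lattice(60, 4)
small_world = add_shortcuts(lattice, [(0, 30), (10, 40), (20, 50)])

if __name__ == "__main__":
    for name, g in (("lattice", lattice), ("with shortcuts", small_world)):
        print(name, "clustering:", clustering(g), "mean path:", mean_path_length(g))
```

A handful of long-range links shortens the average path between nodes while leaving most of the local clustering intact, which is the combination of integration and local specialization that the text attributes to hub-mediated cortical networks.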

It is also worth highlighting that the studies mentioned above, about the simulation of perception and action, also point towards the existence of a small set of hubs that facilitate both “integration and differentiation” (Patterson et al. 2007; Martin 2009; Damasio 2010). Further, with the recent discovery of the default mode network (DMN) in the brain (Buckner and Carroll 2007; Suddendorf et al. 2009; Buckner et al. 2008; Bressler and Menon 2010; Addis and Schacter 2012; Addis et al. 2009; Hassabis and Maguire 2011; Welberg 2012), it is now also known that a core network of “highly connected” areas is consistently activated when subjects perform diverse cognitive functions like recalling past experiences, simulating possible future events (or prospection), planning possible actions, and interpreting the thoughts and perspectives of other individuals. Recently, a network homologous to the DMN was also discovered in rats (Lu et al. 2012), further supporting the hypothesis that the structure of the DMN was both retained and further enhanced during evolution. Beyond natural systems, these findings provide crucial insights towards creating brain-guided computational architectures that can enhance the survival and productivity of artificial systems beyond the state of the art (e.g., robotic assistants supporting humans in numerous application domains). Figure 7.4 presents a schematic illustration of the recent developments in the fields of neuroscience that we plan to integrate in the cognitive architecture of the iCub.

Fig. 7.4

Schematic illustration of the recent developments in the fields of neuroscience that we plan to integrate in the cognitive architecture for iCub

7.3.1 Organization of a Procedural Memory from “Fast, Green, Embodied, Cumulative Learning”

We believe that central to the issue of procedural memory is the capability of humans and cognitive animals to master the use of tools. In general, the essence of “tool use” lies in our gradual progression from learning to act “on” objects to learning to act “with” objects in ways that counteract the limitations of “perceptions, actions, and movements” imposed by our bodies. At the same time, to learn both “cumulatively” and “swiftly,” a cognitive agent must be able to efficiently integrate multiple streams of information that aid the learning process itself. Most important among them are social interaction (e.g., imitating a teacher’s demonstration), physical interaction (or practice), and “recycling” previously acquired motor knowledge (experience). On the other hand, from the neuroscience perspective, there is strong evidence that action “generation and observation” share underlying functional networks in the brain, and experiments related to “tool use” learning in animals clearly indicate that a learned “tool” becomes, during coordination, a part of the acting “body schema” and is coded in the motor system as if it were an artificial hand able to interact with external objects, exactly as the natural hand does.

For the development of a “motor vocabulary” and a “procedural memory” for iCub, we took into account the following main requirements:

  1.

    The need to learn “fast” and “green,” by combining multiple learning streams (social interaction, exploration, recycling of past motor experience);

  2.

    The need to arrive at a shared computational basis for “execution, perception, imagination, and understanding” of action;

  3.

    The need to arrive at a general representational framework for motor action generation and skill learning that firstly blurs the distinction between body and tool and secondly supports both “task-specific” compositionality and “task-independent” motor knowledge reuse.

Importantly, expanding the framework to incorporate “skill learning,” “tool use,” and “motor knowledge recycling” led further towards the incorporation of several novel ideas emerging from brain science. From the perspective of the brain, the straightforward advantage of learning one motor skill in an “abstract” way is that it unlocks the implicit potential to “perceive, mimic, and begin to perform” several other skills (which share a similar structure). Our working hypothesis was that “shape of movement” could be the abstract feature with which a motor vocabulary can be efficiently composed and, inversely, “stored” as a component of the procedural memory. We observed that a wide range of human actions produce trajectories that ultimately lead to similar “shape” representations. For example, drawing a circle, turning a steering wheel, uncorking, winding, cycling, stirring, etc. are actions that have “circularity” as an invariant. If we teach a humanoid robot to perceive and synthesize “shapes” of movements (instead of motion trajectories), we can endow it with the powerful capability to “compose and recycle” previously acquired motor knowledge to swiftly learn a wide range of other motor skills. This led to the development of a general motor skill learning architecture based on the PMP framework. The value of this architecture was tested by showing how motor knowledge acquired by iCub while learning to draw (skill 1: Mohan et al. 2011a) could be systematically recycled in a task of learning the bimanual control of a toy crane as a tool to “pick up” otherwise unreachable objects in the environment (skill 2: Mohan and Morasso 2012). The underlying mechanism is indeed quite general and can be applied to acquire a wide range of skilled actions in a similar manner.
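To make the idea of a “shape” invariant concrete, here is a toy sketch in which “circularity” is scored independently of where a trajectory is traced and how large it is. The descriptor (spread of the radius after centring and scaling) is an assumption chosen for illustration; it is not the shape representation used in the cited work, and all names are hypothetical.

```python
import math


def circularity_error(points):
    """Spread of the normalised radius: close to 0 for a circle of any size or position."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / n
    return math.sqrt(sum((r / mean_r - 1.0) ** 2 for r in radii) / n)


def circle(radius, centre, n=200):
    """Evenly sampled circular trajectory."""
    cx, cy = centre
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def square(side, n_per_edge=50):
    """Evenly sampled square trajectory, used as a non-circular contrast."""
    points = []
    for k in range(n_per_edge):
        t = side * k / n_per_edge
        points += [(t, 0.0), (side, t), (side - t, side), (0.0, side - t)]
    return points


small_drawing = circle(0.05, (0.3, 0.1))   # e.g. a circle drawn on paper
large_turning = circle(0.20, (0.0, 0.4))   # e.g. a hand turning a wheel
print(circularity_error(small_drawing), circularity_error(large_turning))
print(circularity_error(square(0.1)))
```

Two movements with very different trajectories receive the same description, which is what allows knowledge acquired in one task to be reused in another.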

Figure 7.5 summarizes the central building blocks and high-level information flows that are crucial for constructing a “reusable” and “growing” motor vocabulary and procedural memory in cumulatively learning robots. Three streams of learning are integrated into the architecture: (1) learning through teacher’s demonstration (information flow in black arrow), (2) learning through physical interaction (blue arrow), and (3) learning through motor imagery (loop 1–5). The imitation loop initiates with the teacher’s demonstration and ends with iCub reproducing the observed action. The motor imagery loop is a subpart of the imitation loop, the only difference being that the motor commands synthesized by the PMP-based forward/inverse model are not transmitted to the actuators. This loop hence allows iCub to internally simulate a range of motor actions and only execute the ones that are promising, given the task and the context.
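The difference between the imitation loop and the motor imagery loop can be sketched as a gate in front of the actuators. The example below is a deliberately simplified stand-in: the one-dimensional forward model with a fixed gain, the tolerance, and the class and function names are all assumptions for illustration and do not reproduce the PMP-based forward/inverse model.

```python
def forward_model(command):
    """Hypothetical forward model: predicts the position reached by a command."""
    gain = 0.8  # assumed constant for this sketch
    return gain * command


class Actuators:
    """Records every command that is actually transmitted (overt action)."""

    def __init__(self):
        self.sent = []

    def send(self, command):
        self.sent.append(command)


def imagine(commands, target):
    """Covert loop: only the forward model runs; nothing reaches the actuators."""
    return {c: abs(forward_model(c) - target) for c in commands}


def act(commands, target, actuators, tolerance=0.05):
    """Simulate every candidate, then execute only the most promising one."""
    errors = imagine(commands, target)
    best = min(errors, key=errors.get)
    if errors[best] <= tolerance:
        actuators.send(best)
    return best, errors[best]


actuators = Actuators()
best_command, predicted_error = act([0.5, 1.0, 1.25, 1.5], target=1.0, actuators=actuators)
print(best_command, predicted_error, actuators.sent)
```

Four commands are simulated but only one is transmitted, so the cost of exploring the alternatives is paid internally rather than through the body.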

Fig. 7.5

Motor skill learning and action generation architecture for iCub: building blocks and information flows

7.4 Work in Progress: Playful Experiments with iCub for Organizing Episodic and Semantic Memory

If we focus only on learning specific tasks, embodied procedural memory is sufficient to drive learning and action generation. However, many questions remain unanswered if we stick to this framework. Let us list a few of them to summarize the range of relevant issues:

  • How do structures of bodily experience gradually “work their way up” to form abstract patterns of inferences?

  • How do we bridge the gap from task-specific “sense” to task-independent “common sense”?

  • How do playful interactions between the body and the world sculpt the memories of a cumulatively learning agent?

  • When and how do mechanisms related to abstraction, consolidation, and forgetting play a role in cumulative learning?

  • What is the specific influence of a teacher in minimizing exploration, motivating, and shaping the developmental curve?

In addition to procedural memory, what we need is semantic and episodic memory (Tulving 1972, 2002) in order to feed in an integrated and bidirectional manner the twin processes of reasoning and learning: more learning driving better reasoning and inconsistencies in reasoning driving new learning. In order to address these problems in the robotic field, it is useful to take inspiration from studies in animal cognition:

  • Causal and spatial reasoning, namely, identifying useful objects in the environment that could be exploited, as tools, in the context of the otherwise unrealizable goal.

  • Trap tube paradigm, namely, the problem of recovering a piece of food, stored in a transparent tube, by means of a sticklike object of sufficient length, while avoiding a trap in the tube.

  • Tool making: Consider the behavior of “Betty, the Caledonian crow” (Weir et al. 2002; Emery and Clayton 2004) when she faced the problem of extracting a food basket from the bottom of a transparent vertical tube and managed to bend a piece of metallic wire in such a way as to reach and pick up the basket.

Figure 7.6 shows iCub engaged in different kinds of scenarios. In particular, in these scenarios iCub must learn to push. Why is pushing interesting? As a matter of fact, this skill has been investigated extensively in studies related to the understanding of “physical causality” in primates and infants (Visalberghi and Tomasello 1997; Whiten et al. 2009; Addessi et al. 2008). It is also known from these studies on animal behavior that different species have different levels of understanding of the causality related to this task. In addition to the multiple utilities of the “push/pull” action itself in the context of assembly operations, what makes it significant is the sheer range of physical concepts that have to be “learned” and “abstracted” in order to execute this action successfully in diverse environmental conditions. For example, it has to be learned that contact is necessary to push, that object properties influence “pushability” (balls roll faster than cubes, and the color of the ball or the cube does not matter), that pushing objects gives rise to paths of motion in specific directions (the inverse applies for goal-directed pushing), that pushing can be used to support grasping and to bring objects into proximity (while working on assembly tasks), and that there can be counterforces that block the pushed object (similar to a goalkeeper in football). The requirement to capture/learn such a wide range of physical concepts through “playful interactions” of the baby humanoid with different objects makes this task both interesting and challenging.

Fig. 7.6

Playful scenarios for iCub to learn and reason

Other paradigmatic scenarios can be envisaged in order to engage iCub in significant goal-directed activities. One of them is assembling the tallest possible stack from a set of available objects/toys. This scenario is useful for exploring the computational architecture necessary to enable the robot to efficiently organize and use its own episodic memories of interacting with different objects, all channeled towards achieving the goal of building the tallest possible stack. Learning takes place cumulatively, with the robot playing with different combinations of objects (some previously experienced, some novel), and it goes on in an open-ended fashion. By incrementally exploring and building stacks with various objects, the robot has to learn about their physical properties and the relations among different objects in the context of creating the tallest stack. Since the solution itself depends on what objects are available in the “now,” multiple episodes of past experience have to be remembered and integrated in the context of the present in order to succeed. Hence, the robot is continuously pushed both to exploit “what it knows” from its past experiences in novel situations and, at the same time, to learn by exploring novel objects, remember its own mistakes, and perform better next time.

7.4.1 The Darwin Perception–Action Loop

Darwin is an EU project whose principal goal is the development and validation of a cognitive architecture to control action in the generation of assembly tasks. Figure 7.7 shows a block diagram of how the lower-level perception–action-related information is organized. At the bottom is the Darwin sensory layer that includes the sensors, associated communication protocols, and algorithms to analyze properties of the objects, such as color, shape, and size. Word information is an additional input coming from the teacher, either to issue user goals or to interact with the robot. Results of perceptual analysis activate various neural maps (property-specific SOMs in layer 1, provincial hubs), ultimately leading to a distributed representation of the perceived object in the connector hub (top-level object map). These self-organizing maps are trained using standard techniques (Kohonen 1995; Fritzke 1995), and more details with experimental results can be found in Mohan et al. (2013). An interesting aspect of this kind of organization is that as we move upwards in the hierarchy, information becomes more and more integrated and multimodal, and as we move downwards, information is more and more differentiated, down to the level of perceived properties. The connectivity between hubs and property-specific maps is essentially bidirectional, hence allowing information to move “top-down, bottom-up, or in cross-modal fashion.” For example, as illustrated in Mohan et al. (2013), when the robot is issued the goal “grasp a red container” (a new combination of known words describing an object the robot has not encountered before), bottom-up activity in the word map starts spreading through the provincial hub, leading to anticipatory top-down activations in the neural maps processing color and shape information. 
If such top-down activation resonates with the concurrent bottom-up activation (through the perceptual stream), this is sufficient to lead to the inference that the novel object being perceived is most probably the one the user has asked the robot to grasp (Mohan et al. 2013).
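The “grasp a red container” example can be sketched as a comparison between anticipated (top-down) and perceived (bottom-up) activity in the property-specific maps. The sketch below replaces the trained self-organizing maps with a few labelled units and uses a plain overlap score as the resonance measure; both simplifications, and every identifier, are assumptions made for illustration.

```python
# Property-specific maps, reduced to a few labelled units (illustrative only).
MAPS = {
    "color": ["red", "green", "blue"],
    "shape": ["container", "cylinder", "sphere"],
}


def activation(map_name, label):
    """One unit of the named map is active; all others are silent."""
    return [1.0 if unit == label else 0.0 for unit in MAPS[map_name]]


def top_down(goal_words):
    """Anticipatory activity: each known word excites its unit in the matching map."""
    expected = {}
    for word in goal_words:
        for map_name, units in MAPS.items():
            if word in units:
                expected[map_name] = activation(map_name, word)
    return expected


def bottom_up(percept):
    """Activity driven by the perceptual stream for one object."""
    return {map_name: activation(map_name, label) for map_name, label in percept.items()}


def resonance(expected, perceived):
    """Mean overlap between top-down and bottom-up activity over the cued maps."""
    overlaps = [sum(e * p for e, p in zip(expected[m], perceived[m])) for m in expected]
    return sum(overlaps) / len(overlaps)


scene = {
    "object_a": {"color": "red", "shape": "sphere"},
    "object_b": {"color": "green", "shape": "container"},
    "object_c": {"color": "red", "shape": "container"},
}
expected = top_down(["grasp", "red", "container"])
scores = {name: resonance(expected, bottom_up(p)) for name, p in scene.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Only the object whose color and shape both match the anticipated activity resonates fully, even though the combination “red container” was never stored as a single item.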

Fig. 7.7

Action–perception loop. Top panel: how lower-level sensorimotor information is organized and the main subsystems involved in the “identify–localize–reach–grasp” loop used to generate primitive actions, in the context of creating the tallest possible stack. At the bottom is the sensory layer that includes the sensors, early visual processing, and associated lower-level communication protocols. Results of perceptual analysis activate various property-specific neural maps (property-specific SOMs in layer 1, provincial hubs), ultimately leading to a distributed representation of the perceived object in the connector hub. Hubs perform the role of integration between modalities and enable “top-down, bottom-up, and cross-modal” flow of neural activity. The abstract layer forms the “connector hub” in the action space and consists of single neurons coding for different actions at an abstract level. Note that these single neurons do not code for the action itself but instead have the capability to trigger the complete network responsible for generating the plan to execute the action in the context of the present environment. Finally, all plans have to be executed by coordinating the body. This is accomplished by the iCub action generation system, which decomposes the plans to the level of motor commands to be transmitted to the actuators. Bottom panel: some snapshots of the working loop

This kind of property-specific organization and global integration through hubs is in line with emerging results from neuroscience (van den Heuvel and Sporns 2013; Martin 2009; Meyer and Damasio 2009) as depicted in Fig. 7.3. It is also worth remarking that two important features are made possible by this kind of architecture:

  1.

    The bottom-up processing leads to a distributed representation of the perceived object (in relation to its perceptual properties: color, shape, size) in the object connector hub that identifies the object (in other words, coding for “what is it”).

  2.

    Due to reciprocal connectivity between the hubs and property-specific maps, it becomes possible to go beyond “object–action” and learn things at the level of “property–action” too: in our embodied framework, “actions” are mediated through the “body” and directed towards “objects” in the environment, according to “tasks.”

Playful interactions with objects give rise to sensorimotor experience, learning, and the ability to reason in the future. Thus there is the need to connect “object,” “action,” and the “body.” Note that there is a subtle separation between the representation of actions at an abstract level (“what can be done with an object/tool”) and the memories related to the action and its consequences (“how to do”). While the former relates to the “affordances” of an object, the latter relates to memories of motor skills, sensorimotor consequences, and anticipated rewards in relation to the goal. The abstract layer forms the “connector hub” and consists of single neurons coding for different action goals like reach, grasp, push, stack, use of different tools, etc., and it grows with time as new skills are learned. Single neurons in the connector hub in turn have the capability to trigger the subsystems that hold (procedural, semantic, and episodic) knowledge related to the action (and other actions that may participate as subcomponents). In this sense neurons in the top-level “action connector hub” are similar to “canonical neurons” found in the premotor cortex (Murata et al. 1997) that are activated at the sight of objects to which specific actions are applicable. At the same time, the detailed knowledge itself is learned/represented in distributed cortical networks that are activated by the action goal (which may also involve other sub-actions and the sensorimotor memories related to them).

7.4.2 Learning to Build the Tallest Stack Given a Random Set of Objects to Play with

While building the tallest stack, the robot is allowed to explore gradually with a limited set of objects (two at a time, then add a new object, further add another new object, present them in different combinations). The role of the teacher is important: he/she gradually shapes the developmental curve, not by directly suggesting the solution but by creating situations that can aid new learning, contradictions, and abstractions. At the same time, this scenario is used to explore the organization and flexible use of the episodic memory of the robot. The main contents of the episodic memory for this scenario were identified as the temporal order of the robot’s “actions” on objects and the final reward received from the user. The activations in the neurons of this memory correspond directly to activations in the “object hubs” and “action hubs” that were also active during explorative learning. For the stacking scenario (depicted in Fig. 7.8), let us consider a very small patch of a simulated neocortex, consisting of 1,000 pyramidal cells. For simplicity in visualization, the 1,000 neurons are organized in a sheetlike structure with 20 rows each containing 50 neurons. Every row may be thought of as an event in time (related to object, action, or reward) and the complete memory as an episode of experience (e.g., picking a cylinder and placing it on a mushroom and getting a null reward from the user, and vice versa).

Fig. 7.8

Left panel: explorative attempts to build the tallest stack using a mushroomlike object and a regular cylinder. The formed memories related to object and action (rows 1–4) reflect activation in the neural maps related to object and action; row 5 is the end user reward given to the robot for its performance. Right panel: what is remembered when the robot encounters objects already explored in the past. The green table depicts the activations in the object connector hub resulting from bottom-up perception in two cases: (1) only the green mushroom is shown and (2) both mushroom and cylinder are shown. In both cases, partial cues generated by bottom-up perception enable the robot to remember its past experiences. In such a computational organization, the anticipated reward (from past explorative experiences) can be used to trigger competition between multiple “remembered episodic experiences”

This neural network, consisting of a sheet of 1,000 pyramidal cells, acts as an auto-associative memory that builds on a recent excitatory–inhibitory neural network proposed by Hopfield (2008). The next time the robot perceives a mushroom (through activations in the color and shape maps), the partial cue is sufficient to recall its past experiences with the mushroom (e.g., placing a cylinder on top of it and getting a reward of 0, or placing it on top of the cylinder, which was more rewarding). The right panel shows what is “remembered” when these objects are encountered in the future. The neural map (shown in green) depicts the activations in the object connector hub resulting from bottom-up perception (case 1: only the green mushroom; case 2: both mushroom and cylinder). Note that, under such circumstances, the anticipated reward can be used to trigger competition between “remembered episodic experiences” in such a way that all memories “compete to survive,” with survival based on their capability to reenact their plans once again through the body.
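Recall of a full episode from a partial cue can be sketched with an auto-associative network laid out as the 20 × 50 sheet described above. Note the substitution: the code uses a classical Hopfield network with Hebbian weights and ±1 units, not the excitatory–inhibitory network of Hopfield (2008) on which the chapter builds. The random episodes, the choice of rows 1–4 as the cue, and all names are assumptions for illustration.

```python
import random

ROWS, COLS = 20, 50      # sheet layout quoted in the text
N = ROWS * COLS          # 1,000 units


def random_episode(rng):
    """Stand-in for a stored episode: one +1/-1 value per unit."""
    return [rng.choice((-1, 1)) for _ in range(N)]


def recall(patterns, cue, sweeps=5):
    """Synchronous Hopfield updates with Hebbian weights w_ij = sum_p p_i * p_j."""
    state = list(cue)
    for _ in range(sweeps):
        overlaps = [sum(p[i] * state[i] for i in range(N)) for p in patterns]
        new_state = []
        for i in range(N):
            # Input field of unit i; the last term removes the self-connection w_ii.
            field = sum(p[i] * m for p, m in zip(patterns, overlaps))
            field -= len(patterns) * state[i]
            new_state.append(1 if field > 0 else -1 if field < 0 else state[i])
        if new_state == state:
            break
        state = new_state
    return state


def row(pattern, index):
    """Return one row of the sheet (one event in time)."""
    return pattern[index * COLS:(index + 1) * COLS]


rng = random.Random(1)
episodes = [random_episode(rng) for _ in range(3)]

# Partial cue: rows 1-4 (object and action) of the first episode are perceived,
# every other unit is silent (0), including the reward row.
cue = episodes[0][:4 * COLS] + [0] * (N - 4 * COLS)
remembered = recall(episodes, cue)
print(remembered == episodes[0], row(remembered, 4) == row(episodes[0], 4))
```

The cue carries no information about the reward row, yet the completed pattern contains it, which is how an anticipated reward becomes available before any action is taken.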

7.4.2.1 Interplay Between Episodic Memory and Abstraction

Colors of objects do not affect the way they behave when they are used to create the tallest stack. Can this information be abstracted through playful explorative learning and recall of such past experiences? Suppose that we started with the robot playing with a green sphere and a yellow cylinder; the teacher now presents the robot with a blue cylinder and an orange sphere. Since activity in the object hubs reflects activity in the property-specific maps that drive them, there is partial similarity in the neural activation of the object hubs: the objects have different colors but the same shapes. Approximate similarity is enough to generate the partial cue and reconstruct the related past experiences. When the robot is presented with a blue cylinder and an orange sphere, the past memories of playing with a green sphere and a yellow cylinder can still be retrieved successfully. Also note that the partial cue is different and contains less information than the partial cue generated by the original objects. This is because the objects in the world that generate the partial cue are different, sharing similarity in “shape” but not in “color.” The partial cue leads to the retrieval of the most related and valuable past memory. Even though the robot knows nothing about stacking blue cylinders and orange spheres, it knows something about yellow cylinders and green spheres and anticipates full reward. Thus, the most valuable action sequence from the past is once again executed (now on new objects), and it turns out that the consequence (in terms of reward received) is the anticipated one. In summary, the robot can pin down “causally dominant” properties while experiencing, learning, and remembering in a dynamic “cumulative” fashion.
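Retrieval by approximate similarity can be sketched as follows. Hub activity is reduced to a set of property labels, similarity is the fraction of a stored episode’s properties present in the cue, and the most rewarding among the sufficiently similar episodes is selected. The stored episodes, rewards, threshold, and all names are made-up illustrations rather than data from the experiments.

```python
def hub_activity(objects):
    """Hub activity summarised as the set of active property units."""
    units = set()
    for color, shape in objects:
        units.add("color:" + color)
        units.add("shape:" + shape)
    return units


# Hypothetical past episodes; plans are listed from bottom to top, by shape.
EPISODES = [
    {"objects": [("green", "sphere"), ("yellow", "cylinder")],
     "plan": ["cylinder", "sphere"], "reward": 1.0},
    {"objects": [("green", "sphere"), ("yellow", "cylinder")],
     "plan": ["sphere", "cylinder"], "reward": 0.0},
    {"objects": [("red", "cube"), ("white", "box")],
     "plan": ["box", "cube"], "reward": 1.0},
]


def similarity(scene, episode):
    """Fraction of the episode's property units that the present scene activates."""
    stored = hub_activity(episode["objects"])
    return len(hub_activity(scene) & stored) / len(stored)


def retrieve(scene, episodes, threshold=0.25):
    """Most valuable episode among those sufficiently similar to the partial cue."""
    candidates = [(ep["reward"], similarity(scene, ep), index)
                  for index, ep in enumerate(episodes)
                  if similarity(scene, ep) >= threshold]
    if not candidates:
        return None
    _, _, index = max(candidates)
    return episodes[index]


new_scene = [("blue", "cylinder"), ("orange", "sphere")]
recalled = retrieve(new_scene, EPISODES)
print(similarity(new_scene, EPISODES[0]), recalled["plan"], recalled["reward"])
```

Because the plan is expressed over shapes, it can be reenacted on the new objects; if the anticipated reward is then confirmed, color has been shown to be irrelevant for this task.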

7.4.2.2 Interplay Between Memory, Prospection, and Creativity

Let us focus again on the task of assembling the tallest possible stack in order to exemplify the creative use of experiences, showing how novel “action sequences” emerge out of “multiple” past experiences, without any need for “blind” exploration. The teacher puts all the objects (cube, small cylinder, large box, and sphere) in front of the robot, which has to assemble the tallest stack. Let us suppose that iCub only has isolated past experiences with subsets of them. This is interesting because none of the “past experiences” of the robot has enough information to deal with all these objects at the same time. The challenge is to “combine” knowledge from multiple experiences to come up with a “novel action sequence.”

Let us suppose that four episodic memories have been assimilated and stored in the past: EM1 (cylinder on top of sphere), EM2 (sphere on top of cylinder), EM3 (cube–cylinder–sphere), and EM4 (large box–cube). iCub is then presented with the full set of four objects (first snapshot of Fig. 7.9). The activity in the object hub results in “partial cues” that reconstruct all four EMs: this is because all the memories (EM1–EM4) have some information related to a “subset” of the objects present in the world. However, not all EMs participate in the construction system, although they all compete for controlling the hub (either fully or partially), exerting a top-down influence on the hub. Note that EM1 and EM2 can be wiped out in the competition because there are other competitors that know more (in the context of the present situation). For example, EM3 encodes information related not just to cylinders and spheres (encoded by EM1 and EM2) but also to cubes and hence is a stronger competitor. But in addition to EM3, EM4 also manages to stay alive (it knows something about large objects that none of the other EMs knows anything about). Further, since EM3 and EM4 know something in common (i.e., cubes), they must inhibit each other in order to get control. In this specific example, it happens that the sum of the activities imposed top-down on the hub by EM3 and EM4 is equal to the bottom-up activities. This implies that “the complete action sequence to solve the problem is already available in the isolated past experiences that won the competition,” and this always applies, independently of how many past experiences claim control over the hub. Either the most valuable action sequence is directly available (in a single episodic memory), or multiple past experiences may have to be combined in a novel fashion to generate a new behavior. 
In any case, if the net top-down hub activity is equivalent to the bottom-up hub activity, then even if the environment is “novel,” the robot can conclude that its past experiences contain enough information to realize the goal, by optimally combining these past memories into a novel sequence. In summary, the action sequence chunks encoded by EM3 and EM4 enter the construction system, by singling out the overlapping object (the cube, highlighted in the red box).
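The outcome of this competition can be reproduced with a small set-based sketch. Each memory is reduced to the set of objects it knows about, a memory is eliminated when a competitor knows strictly more, and the “top-down equals bottom-up” condition is approximated by checking that the winners jointly cover the perceived scene. These reductions, the labels, and the function names are assumptions; the neural dynamics of mutual inhibition are not modelled.

```python
# Episodic memories from the text, reduced to the objects they involve.
MEMORIES = {
    "EM1": {"cylinder", "sphere"},
    "EM2": {"sphere", "cylinder"},
    "EM3": {"cube", "cylinder", "sphere"},
    "EM4": {"large box", "cube"},
}


def recalled_by(scene, memories):
    """Partial cues reconstruct every memory that shares an object with the scene."""
    return {name: objs for name, objs in memories.items() if objs & scene}


def compete(recalled):
    """A memory is wiped out when some competitor knows strictly more."""
    winners = []
    for name, objs in recalled.items():
        dominated = any(objs < other for other in recalled.values())
        if not dominated:
            winners.append(name)
    return sorted(winners)


def knowledge_is_sufficient(scene, winners, memories):
    """Set analogue of 'net top-down hub activity equals bottom-up hub activity'."""
    covered = set().union(*(memories[name] for name in winners))
    return covered == scene


scene = {"cube", "cylinder", "large box", "sphere"}
winners = compete(recalled_by(scene, MEMORIES))
print(winners, knowledge_is_sufficient(scene, winners, MEMORIES))
```

If the check fails, the surviving memories do not account for everything perceived, and some exploration of the unexplained objects is still needed.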

Fig. 7.9

Snapshots of the process of building the tallest stack from an available set of objects by combining past memories without any trial and error exploration

Overlap in knowledge between different remembered experiences is advantageous, because it helps to connect them together. The construction system just employs one simple rule to achieve this: if there are overlaps in knowledge encoded by different “winning” past experiences, bring them as close as possible. In this sense, the overlapping element is similar to an intermediate subgoal (a point of intersection between two different past experiences). After the initial bootstrap explained above, the construction process goes on as illustrated in Fig. 7.9, by combining isolated memories of past experiences, in such a way that a novel sequence emerges: stack the large box at the bottom, then the cube, the small cylinder on top of the cube, and the sphere on top of the small cylinder and anticipate full reward for this! Indeed full reward was given!
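The single rule used by the construction system (bring overlapping elements together) can be sketched as chaining bottom-to-top chunks at their shared element. The sketch only handles overlaps at the ends of the chunks, which is all this example requires; the function name and that restriction are assumptions.

```python
def combine(chunks):
    """Chain bottom-to-top action chunks by merging their overlapping end elements."""
    remaining = [list(chunk) for chunk in chunks]
    sequence = remaining.pop(0)
    while remaining:
        for chunk in remaining:
            if chunk[0] == sequence[-1]:
                # The chunk continues on top of the current stack.
                sequence = sequence + chunk[1:]
            elif chunk[-1] == sequence[0]:
                # The chunk ends where the current stack starts, so it goes underneath.
                sequence = chunk[:-1] + sequence
            else:
                continue
            remaining.remove(chunk)
            break
        else:
            raise ValueError("no overlapping element to connect the remaining chunks")
    return sequence


em3 = ["cube", "small cylinder", "sphere"]   # winning memory EM3, bottom to top
em4 = ["large box", "cube"]                  # winning memory EM4, bottom to top
stack_plan = combine([em3, em4])
print(stack_plan)
```

The cube acts as the intermediate subgoal: EM4 describes how to get to it and EM3 how to continue from it.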

More advanced scenarios, such as the one depicted in Fig. 7.10, are being investigated, following the same common thread, in order to test and improve the cognitive architecture.

Fig. 7.10

Advanced Darwin scenarios are set up in a range of playful make and break style assembly tasks, also incorporating several elements of goal-directed reasoning, inspired by similar studies in animal and infant cognition

7.5 Concluding Remarks

The world we inhabit is an amalgamation of structure and chaos. There are regularities that can be exploited. Biological or artificial agents that do this best have the greatest chances of survival. Often this attempt to survive involves a complex interplay between fundamental mechanisms associated with perception, action, learning, memory, abstraction, and prospection that can be investigated in greater detail even through simple “playful” experiments using an integrated system like a baby humanoid (equipped with basic vision, touch, proprioception, force control, and whole-body coordination). Several experiments related to motor control, skill learning, and the organization of procedural, semantic, and episodic memory were presented in this chapter to describe the cross talk between these fundamental processes operating in a “cumulatively” developing cognitive robot. All of this is organized in multiple interacting subsystems that synergistically come together in the context of the “goal” executed in the present (sometimes combined with new explorative interactions). Such interplay plays a fundamental role in ensuring that not everything needs to be learned and explored and not everything needs to be memorized (even memories compete to survive in the neural substrate and get their content reenacted by the actor). The interplay goes on cumulatively, more learning driving better reasoning and inconsistencies in reasoning driving new learning. Reenacting this on a baby humanoid often reminds us of the alternating “foregrounds” and “backgrounds” intricately depicted in several artistic creations of Escher. Simply put, beyond a point a software programmer cannot travel the journey of a cognitive robot like iCub. 
Instead, like natural cognitive agents, such robots must also be endowed with mechanisms that enable them to efficiently organize their sensorimotor experiences into their memories, remember, consolidate, forget, and exploit them effectively “when needed” to realize their goals and, at the same time, keep learning new things. Open-endedness, cumulativeness, and growth of a continuously learning system, and the gradual emergence of generativity/creativity in its behaviors, are natural consequences arising out of such a scheme, as different sections in our chapter demonstrate.

In this concluding section, we do not intend to summarize all that has been said so far but instead quickly relate all this to a very fundamental evolutionary function, namely, “navigation,” an activity that all living organisms engage in. It is already developed in a sophisticated way in rats, for example, but much more so in humans, with plenty of added/recycled value (green learning!). The computational basis of this added value, at the same time grounded in the biology of the brain and recreated through playful tasks with iCub, was in fact the main subject of this chapter. In the discussion, we attempt to present a perspective that creative “goal-directed generation of behavior itself is navigation” (not in space but in time)!

7.5.1 Traveling in Time vs. Traveling in Space: The Navigating Rat, a Tool-Making Crow, and Darwin Architecture

All living organisms “navigate.” There are a few exceptions, like the sea squirts: after a few days of life, they digest their own brains for nourishment. But as the body and the environments in which a species has to survive become more complex, its brain also becomes more and more complex. A rat navigates for food, can remember places where food is found, and finds a path to reach it, sometimes involving novel solutions, as demonstrated by several studies on rat navigation. However, with an even more complex body and a more complex environment to survive in, higher-order primates need to navigate not only in “space” but also in time. Evolution, always constrained by “energy and space,” would certainly have found ways to reorganize the primitive neural substrates for navigating in space, already existing in lower-level organisms, so that they can be reused to “navigate in time.”

Indeed, the recent discovery of the default mode network of the brain (both in humans and rats) supports this perspective. There is a wide consensus in the field of neuroscience that the same network is consistently activated while recalling the past (Maguire 2001; Rugg et al. 2002) and during other activities as diverse as simulating the future (Atance and O’Neill 2001; Addis et al. 2009; Szpunar et al. 2007; Schacter et al. 2012), spatial navigation (Burgess et al. 2002; Suddendorf 2013; Corballis 2013), social cognition (Raichle et al. 2001; Frith and Frith 2010), and perspective taking (Mason et al. 2007). The essence of these findings is that disparate cognitive functions often treated as distinct might share common underlying processes.

The Darwin architecture being developed investigates the computational basis of how such diverse functions can share resources and enable a cognitive robot to “travel in time” (through its multiple past experiences, the present evolving experience, and the simulated future consequences) to give rise to intelligent goal-directed behavior. In this sense, by mimicking the default mode network (DMN), we have created a computational framework that enables the Darwin robot to travel in time, connect its multiple past experiences to simulate the future, give rise to novel behaviors in unforeseen situations, and learn new things in the process. In this context, what we want to emphasize is that “goal-directed reasoning” is very similar to a path-finding exercise during spatial navigation, but now in “time,” not “space.”

Let us consider this analogy in detail in the context of this chapter. Goals are distant events in time that have to be reached; past experiences triggered by one’s episodic memories give a path in time to reach a future event (which can be remembered based on partial cues). Frequently the paths in time also encounter obstacles, i.e., contradictions between what the robot expects to ensue as a result of its past experience and what is actually happening in the present. Clearly this is equivalent to getting lost in space, like a rat trapped in a maze. In the present context, the robot gets lost in time instead!

Alternative paths then have to be found in time by exploration, in analogy with a rat exploring its environment to come up with a new path to its spatial destination. Several cues in the environment are used to guide such exploration. The same applies when obstacles are encountered in time, just like a train that changes its tracks. Often there are multiple paths that lead to the same goal, in which case energy is used as a criterion to choose the most efficient strategy (the PMP mechanism of Sect. 7.2, for instance, which solves the degrees of freedom problem elegantly). The same applies in the context of the energy of a memory. When we navigate in space, we remember the landmarks. Similarly, events in the episodic memory are landmarks in time, and they can be connected by a mechanism of resonance. When landmarks are connected in space, we get a new trajectory to navigate spatially towards the goal. When the dots in time are connected, novel behaviors may emerge (like the examples in Sect. 7.3).
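To make the analogy concrete, the following minimal sketch treats remembered events as landmarks in a graph, assigns an energy cost to each transition, and searches for the least costly path to the goal; when an expected transition fails in the present, the path is searched again without it. This is only an illustration under stated assumptions: it uses a classical Dijkstra search as a stand-in and is not the resonance or PMP mechanism of the Darwin architecture, and all event names, costs, and the function `cheapest_path` are hypothetical.

```python
import heapq

def cheapest_path(graph, start, goal, blocked=frozenset()):
    """Dijkstra search over remembered events.

    graph   : dict mapping an event to {next_event: energy_cost}
    blocked : set of (event, next_event) transitions that failed in the present
    Returns (total_energy, path), or (inf, []) if the goal cannot be reached.
    """
    frontier = [(0.0, start, [start])]   # (energy so far, event, path so far)
    settled = set()
    while frontier:
        energy, event, path = heapq.heappop(frontier)
        if event == goal:
            return energy, path
        if event in settled:
            continue
        settled.add(event)
        for nxt, cost in graph.get(event, {}).items():
            if (event, nxt) in blocked or nxt in settled:
                continue
            heapq.heappush(frontier, (energy + cost, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical "landmarks in time" with illustrative energy costs.
episodes = {
    "see_fruit": {"reach_by_hand": 1.0, "pick_up_stick": 2.0},
    "reach_by_hand": {"hold_fruit": 1.0},
    "pick_up_stick": {"pull_with_stick": 1.5},
    "pull_with_stick": {"hold_fruit": 1.0},
}

# The path recalled from past experience: the lowest-energy one.
plan = cheapest_path(episodes, "see_fruit", "hold_fruit")

# An obstacle in time: reaching by hand fails now (the fruit is too far),
# so that transition is excluded and an alternative path is found.
replan = cheapest_path(
    episodes, "see_fruit", "hold_fruit",
    blocked={("reach_by_hand", "hold_fruit")},
)

print(plan)
print(replan)
```

In this toy example the first search follows the direct route, while the second, after the contradiction, connects events from a different experience (using a stick) to reach the same goal at a higher energy cost.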

In sum, our memories represent our past, but they can also be used to simulate the future (whether while navigating in space or navigating in time). The emergence of creativity and novelty in behavior when confronted with a novel situation is related to the power to “re-invoke,” based on the present context, experiences that otherwise lie dormant in the neural episodic memory, and to connect the dots between such diverse experiences to find a new path in time. Indeed a navigating rat, a tool-making crow, and iCub share similarities in the way they accomplish their goals. Of course, it may be tough to understand what is going on in the brain of Betty the crow reasoning in time, or of a rat navigating in space, by looking at the neural activations in their brains. But principles can be abstracted from information-rich biology that can help to both “mimic and create” artifacts that show similar competencies.

Embodied developmental robotics helps here by providing novel insights, as we computationally attempt to reenact such processes and, on the way, sometimes manage to abstract the “fundamental principles” involved. The discovery of the structure of DNA was a result of model building by Watson and Crick, of empirical measurements with X-ray diffraction images by Rosalind Franklin, and of the theoretical analysis of chemical bonds by Linus Pauling. The model building direction is what the Darwin goal-directed reasoning framework pursues, using principles that are grounded in the biology of the brain! Of course the discussion does not end here; these were just simple explorations at the tip of the iceberg! Future efforts will be directed to go deeper!