1 Introduction

Almost all of our daily activities require that we plan, execute, and control our movements in a skillful manner. Strictly speaking, movement provides the only means by which we not only physically interact with the world, but also actively operate on it. Consequently, and with some exaggeration, it has been proposed that the entire purpose of the brain is to produce movement, and that all sensory and cognitive processes can be regarded as inputs for future motor outputs [1, 2].

Typically, our motor behavior seeks to achieve action goals related to the environment. Thus, given a certain action goal, the motor system’s task is to generate a movement that will attain this goal, and hence, bring about a change in the sensory environment. This applies not only to rather complex movements such as in sport contexts, but to manual actions as well.

Bernstein [1] was among the first scientists to acknowledge the fundamental role of sensory feedback processing in the control of voluntary movements and to point out the goal-directed character of motor actions. He explicitly emphasized the importance of anticipation in realizing any type of goal-directed motor act, and argued that no voluntary motor action can be initiated without a model of what should result from the planned action. This idea is expressed in his ‘model of the desired future’ (i.e., a model of what should be), which is supposed to play an important role in motor control. Consequently, such a model must possess the capability to form a representation of future events by integrating information from past (i.e., memory) and present (i.e., sensory) events in order to generate motor commands that transform the current state in the sensory environment into the desired state (i.e., achieving the action goal). Bernstein’s model reflects the idea that motor control is based on cognitive representations, which serve goal-directed motor planning, and that these representations reflect the functional movement structure.

Building on this general idea and more recent research by Hoffmann, Rosch, Prinz, Jeannerod and others [3–7], Schack and colleagues have proposed a cognitive architecture model (see Table 1) which views the functional construction of actions on the basis of a reciprocal assignment of performance-oriented regulation levels and representational levels [8–10]. According to this view, basic action concepts (BACs) are thought to serve as major representation units for movement control. Analogous to the well-established notion of basic concepts for objects [6], BACs are considered the mental counterparts of functionally relevant elementary components or transitional states (body postures) of movements. BACs are based on the cognitive chunking of body postures and movement events concerning common functions in realizing action goals. In contrast to basic object concepts, they do not refer to behavior-related invariance properties of objects, but to perception-linked invariance properties of movements. Consequently, BACs can be understood as representational units in memory that tie together the functional and sensory features of movements. The integration of sensory features refers to the perceptual movement effects, whereas the functional features are derived from the action goals. Taken together, such movement representations provide the basis for action anticipation and control by linking higher-level action goals with the lower-level perceptual effects in the form of cognitive reference structures.

Table 1 Cognitive architecture (levels) of motor action (modified from [9, 10])

The integration of BACs into higher-order representation structures has been investigated and simulated with a number of different methods [9, 11, 12]. One way to ascertain cognitive representation structures is provided by the Structural Dimensional Analysis-Motoric (SDA-M) [13]. The SDA-M procedure determines relational structures in a given set of concepts, and has been applied in a number of different settings, such as complex action in sport contexts [14, 15], manual action [10], and rehabilitation [16]. Importantly, this method allows for a psychometric analysis of the structures without requiring participants to give explicit statements regarding their representation; instead, it relies on knowledge-based decisions in an experimental setting. In general, results of these studies have provided convincing evidence for a mutual overlap between the representation structures and the actual motor performance. In the following, examples of such studies are presented to demonstrate the broad spectrum of potential applications of this approach.

2 Representation Structures of Complex Motor Action

Schack and Mechsner [14] demonstrated differences of representation structures between skilled athletes and novices in the tennis serve. In skilled athletes, representation structures had a distinct hierarchical organization, were remarkably similar between individuals, and were well matched with the functional and biomechanical demands of the task. In comparison, representation structures in novices were organized less hierarchically, exhibited a higher variability between individuals, and were less well matched with task demands. This systematic relationship between mental representation structures and expertise has been successfully reproduced in a number of complex actions, such as golf, soccer, wind surfing, volleyball, gymnastics, and dancing [17–21].

While these differences between skilled athletes and novices suggested that representation structures changed as a function of skill level, experimental evidence for a change of the representation over the course of learning was lacking. Frank and colleagues [22] therefore investigated the effects of movement practice on representation structures during early skill acquisition. To this end, novice golfers were randomly assigned into a practice and a control group. Participants in the practice group were asked to perform a total of 600 golf putts over a period of 3 days. Using the SDA-M method, representation structures of the practice and the control group were measured before and after this acquisition phase. Results showed that the structures of the practice group had changed between tests, while the structures of the control group had not. The structural changes in the mental representation of the practice group resulted in the formation of functionally meaningful clusters and, over the course of learning, rendered the mental representation more similar to an expert structure. These findings indicate that practice results not only in a better motor control and anticipation of future states but furthermore in functional adaptations in the mental representation of complex actions.

Bläsing and colleagues [23] used the measurement of mental representation structures to study the interaction of the body schema and multiple, effector-specific actions. The body schema was defined as the multimodal representation of the body that integrates somatosensory, proprioceptive, vestibular, and visual information. The authors measured the cognitive representations of body parts and actions of two patients with congenitally absent limbs (one with, one without phantom sensations) and compared those to the representation structures of a group of paraplegic participants and two healthy control groups. Structures of the control groups and the paraplegic group revealed a clear separation of upper body, lower body, fingers and head, as well as a clear clustering of the different actions with their respective effectors. The representation structure of the patient with phantom limbs closely resembled the structures of the control groups, whereas the structure of the patient without phantom limbs exhibited no modularity of the body schema. However, actions were still clustered with the action-specific effector of the patient (i.e., the right toe). These results provided evidence for a link between motor actions and their respective effectors on the level of mental representation.

To study the functional link between representation and motor execution, Land et al. [8] measured the representation structures and the kinematic structures of the full swing in golf. The authors asked to what extent the output at the kinematic level was governed by processes of anticipation and representation at the level of motor control. To this end, the Spatio-Temporal Kinematic Decomposition (STKD) was introduced, which calculates the hierarchically organized kinematic structure of a movement at different spatio-temporal scales. The authors measured the overlap between the representation structure of a movement and its kinematic structure. Results showed that, across participants with various degrees of skill in this complex movement, the hierarchical kinematic structure was closely related to the representation structure of the movement (see Fig. 1). This finding supports the idea that representation and motor execution are closely linked. The STKD approach might also be used to assess how efficiently humans, but also artificial cognitive systems like robots, represent and guide complex actions [8].

Fig. 1 Mean hierarchical structures over all participants for the mental representation (left) and movement kinematics (right) of the golf swing. The numbers on the horizontal axis relate to the concept number, the numbers on the vertical axis display Euclidean distances. The lower the Euclidean distance between two concepts in feature space, the stronger the link between these concepts. The horizontal dotted line marks the critical distance d_crit for a given α-level (d_crit = 5.64, α = .001): links above this line are considered statistically irrelevant, whereas concepts linked below this line are clustered together. Concepts: (1) head, (2) chest, (3) left shoulder, (4) left elbow, (5) left hand, (6) right shoulder, (7) right elbow, (8) right hand, (9) hips, (10) left thigh, (11) left knee, (12) left foot, (13) right thigh, (14) right knee, (15) right foot. The two structures show that both the mental and kinematic structure are split into two main clusters pertaining to the upper and lower body (adapted from [8] with friendly permission of Frontiers in Computational Neuroscience)
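The clustering rule described in the caption of Fig. 1 (concepts joined below a critical Euclidean distance form a cluster, links above it are ignored) can be illustrated with a minimal Python sketch. This is not the SDA-M or STKD procedure itself: the choice of average linkage, the toy two-dimensional feature vectors, and the function names `average_linkage` and `clusters_below` are assumptions made purely for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def link(a, b):
        # Mean Euclidean distance between all cross-cluster pairs (assumed linkage).
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


def clusters_below(points, d_crit):
    """Keep only links formed below the critical distance d_crit."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b, d in average_linkage(points):
        if d < d_crit:
            parent[find(a[0])] = find(b[0])

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy feature vectors (hypothetical): two tight pairs and one isolated concept.
toy_concepts = [(0, 0), (0, 1), (10, 0), (10, 1), (30, 30)]
print(clusters_below(toy_concepts, d_crit=5.0))
```

In this toy example the two tight pairs are linked well below the assumed threshold, while the links between the pairs and to the isolated concept lie above it and are therefore discarded.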

3 From Representation to Anticipation

Further evidence for the close link between mental representation and action was found in a recent study on the visual processing of complex stimuli [24]. The authors used an unconscious response priming paradigm to investigate whether skilled athletes and novices differed in the visual perception of actions: Participants had to decide to which phase of the high jump (approach vs. flight) a presented target picture belonged. Before the target picture, a picture from the same or different movement phase, masked by two scrambled versions of the stimulus material, was presented briefly (17 ms) on screen. The prime and target picture were either presented in a natural (reflecting the temporal sequence of the movement) or reversed order. Results showed that skilled athletes had faster reaction times for prime-target pairs that reflected the natural movement order, indicating a facilitation of the visual processing. The authors hypothesized that, in skilled athletes, the represented BACs of the high jump also contained information about anticipated future aspects of the movement, facilitating the processing of the subsequent target picture in a natural prime-target order.

Similar anticipation of future movement states has been described by Marteniuk and colleagues [25] for the hand trajectory in a reach-to-grasp movement. Participants had to grasp a disk and either (1) throw it or (2) place it into a tight-fitting well. The authors showed that the relative length of the deceleration phase of the initial reach-to-grasp segment depended on the subsequent movement: If the object had to be placed in a tight-fitting well, the deceleration phase was prolonged. This result supported the hypothesis that subsequent movements were anticipated and movement plans were optimized according to the final goal of the task.

Rosenbaum and Jorgensen [26] extended these previous results to grasp posture planning. In the first of two experiments, participants were asked to grasp a horizontal bar and place its left or right end onto a target disk on the table. When asked to place the right end on the target disk, participants used an overhand grasp, to end the movement sequence in a comfortable, thumb-up posture. When asked to place the left end of the bar on the target, however, participants used a less comfortable, underhand grasp, to again end their movement in a comfortable, thumb-up posture. This tendency to accept uncomfortable initial postures in order to avoid awkward postures at the end of the movement has been termed end-state comfort effect [26, 27]. Since its original description, the end-state comfort effect has been reproduced in a variety of different experiments [28–33] and taken to support the notion that people represent future body postures and select initial grasps in anticipation of these forthcoming postures.
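The logic of end-state comfort planning in the bar-transport task can be sketched as a small cost comparison. The following Python fragment is only an illustration under stated assumptions: the numeric comfort costs, the weighting parameter `end_weight`, and the function name `select_grasp` are hypothetical and are not taken from the cited studies. Only the mapping from grasp and bar end to the final posture follows the task description above.

```python
# Hypothetical comfort costs in arbitrary units; lower means more comfortable.
START_COST = {"overhand": 1.0, "underhand": 2.0}
END_COST = {"thumb_up": 1.0, "thumb_down": 4.0}

# Final posture for each (initial grasp, bar end to be placed on the target).
END_POSTURE = {
    ("overhand", "right"): "thumb_up",
    ("underhand", "right"): "thumb_down",
    ("overhand", "left"): "thumb_down",
    ("underhand", "left"): "thumb_up",
}


def select_grasp(target_end, end_weight=1.0):
    """Pick the initial grasp with the lowest combined start and end posture cost."""

    def cost(grasp):
        end_posture = END_POSTURE[(grasp, target_end)]
        return START_COST[grasp] + end_weight * END_COST[end_posture]

    return min(START_COST, key=cost)


print(select_grasp("right"))  # anticipated end posture favors the overhand grasp
print(select_grasp("left"))   # anticipated end posture favors the underhand grasp
```

With `end_weight=0.0` the sketch ignores the anticipated end posture and always selects the initially more comfortable overhand grasp, which shows that the effect in this toy model depends entirely on representing the future posture.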

4 The Link of Anticipation and Cognitive Cost

In the second experiment of their study, Rosenbaum and Jorgensen [26] demonstrated that posture selection depended not only on anticipated future movement states, but also on previous ones. A demonstration of this behavior was provided by Weigelt and colleagues [34]. Participants had to open a column of slotted drawers in a sequential order, using either an over- or underhand grasp for each drawer. Results showed that the point-of-change between grasp types shifted depending on the movement direction: In ascending sequences, participants tended to keep their initial underhand grasp for the central drawers, whereas they stuck to an overhand grasp in the descending sequences. This tendency of the motor system to switch from one state to another at different values depending on its history was termed motor hysteresis. Motor hysteresis is not limited to binary state changes, but can also be observed in sequential tasks with continuous hand rotation [35]. Rosenbaum and colleagues [36] hypothesized that motor hysteresis effects resulted from the cognitive costs of movement planning. In a sequential task, planning costs could be reduced by reusing a former movement plan instead of creating a new plan from scratch for each movement. A number of studies support the cognitive cost hypothesis [34, 37, 38].

Weigelt and colleagues [34], for example, provided evidence for the interaction of motor execution and working memory in a combined sensorimotor and verbal memory task. In their study, participants had to retrieve a cup from each drawer and memorize the letter from the inside of the cup. Under these dual-task conditions, one of the most reliable effects in memory research was eliminated, namely, the tendency to recall more recent items better than items encountered earlier (recency effect). This outcome was not the result of an overall poor memory performance, as the tendency to recall initial items better than items in the middle of the sequence—the primacy effect—was preserved. Changing the difficulty of the memory task (free recall instead of serial recall) did not affect the loss of the recency effect. However, it did affect the transition of grasp type in the ascending and descending sequences. In the free recall task the range of indifference [26], within which grasp type selection depended on the movement history, increased in comparison to the serial recall task. These findings indicate that cognitive load modulates the motor execution and vice versa.

The interaction of motor planning and verbal memory was studied in detail by Spiegel and colleagues [39]. The authors combined a verbal memory task with a grasp-to-place task. Participants were asked to plan a placing movement to one of two target locations and subsequently memorize a 3 × 3 letter matrix. They then had to execute the pre-planned movement. In 20 % of the trials, an auditory cue informed the participants that the placing movement had to be executed to the alternative target. Results showed that the verbal recall performance was significantly reduced by the re-planning of the movement, indicating that movement planning and verbal working memory share common cognitive resources. In a subsequent study [40], the authors tested what kind of working memory processes were recruited for the storage and creation of a movement plan. To this end, either a verbal or a spatial memory task was used. Results showed that the storage of a motor plan disrupted spatial more than verbal memory, whereas re-planning reduced memory performance in both tasks in equal measure. These findings indicate distinct roles of working memory domains during the storage and creation of a motor plan.

Whereas the planning and re-planning of grasping movements obviously require the anticipation of a desired movement state, this requirement is less self-evident for the reuse of former movement plans. In a recent study, however, Schütz and Schack [37] argued that motor hysteresis also relies on the representation of future movement states. The authors hypothesized that, in a sequential task, the motor system does not seek to minimize the cognitive cost of movement planning alone, but the combined cost of movement planning and movement execution. To test this hypothesis, participants were asked to open a column of drawers with cylindrical knobs in a sequential order. Results showed that, at the central drawers, participants assumed a more pronated posture in the descending sequences and a more supinated posture in the ascending sequences, thus showing motor hysteresis in a continuous posture space. After an initial pre-test phase, the mechanical cost of opening and closing a drawer in the sequence was increased for 10 trials. In the subsequent post-test, the size of the motor hysteresis effect was significantly reduced in comparison to the pre-test, indicating that the increased mechanical cost during the intervention phase was represented in participants’ long-term memory. Based on the original hypothesis, this result also supports the notion that both the cognitive and mechanical cost of the upcoming movement have to be anticipated in order to optimize the total cost of the movement.
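The idea that hysteresis emerges from minimizing the sum of planning and execution cost can be made concrete with a toy simulation of the binary drawer task. The sketch below is an illustration only and not the authors' model: the greedy per-drawer choice, the linear execution costs, and the parameters `replan_cost` and `mechanical_gain` are assumptions introduced here.

```python
GRASPS = ("underhand", "overhand")


def grasp_by_drawer(order, n_drawers=9, replan_cost=3.0, mechanical_gain=1.0):
    """Greedy choice per drawer: minimize execution cost plus planning cost."""

    def execution_cost(grasp, height):
        # Assumed linear costs: underhand gets harder with height, overhand easier.
        effort = height if grasp == "underhand" else (n_drawers - 1 - height)
        return mechanical_gain * effort

    chosen, previous = {}, None
    for height in order:

        def total_cost(grasp):
            # Reusing the previous plan is free; creating a new plan is not.
            planning = 0.0 if previous in (None, grasp) else replan_cost
            return execution_cost(grasp, height) + planning

        previous = min(GRASPS, key=total_cost)
        chosen[height] = previous
    return chosen


def hysteresis_width(n_drawers=9, **kwargs):
    """Number of drawers grasped differently in ascending vs. descending order."""
    ascending = grasp_by_drawer(range(n_drawers), n_drawers, **kwargs)
    descending = grasp_by_drawer(reversed(range(n_drawers)), n_drawers, **kwargs)
    return sum(ascending[h] != descending[h] for h in range(n_drawers))


print(hysteresis_width())                     # baseline mechanical cost
print(hysteresis_width(mechanical_gain=2.0))  # increased mechanical cost
```

In this toy model, raising `mechanical_gain` makes it less worthwhile to keep a plan that has become mechanically unfavorable, so the range of drawers showing hysteresis shrinks. This mirrors, qualitatively and under the stated assumptions, the reduction of the hysteresis effect after the mechanical cost was increased.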

5 From Anticipation to Representation—Developmental Aspects

The planning of grasping actions is not only related to physical features of the grasped object such as its shape [41] or size [42], but also depends on what the actor intends to do with the object [25, 43]. This anticipatory aspect of manual action has been termed second-order motor planning [27, 44]. A number of studies investigated the development of second-order motor planning on a phylogenetic (e.g., non-human primates [45]) and ontogenetic level (human children [46–48]). Weiss et al. [45], for example, showed that end-state comfort planning in New World monkeys (cotton-top tamarins) was similar to that of human adults.

Weigelt and Schack [48], on the other hand, investigated end-state comfort sensitivity in young children (3, 4, and 5 years old). The children performed a dowel placing task, reaching for a horizontal dowel and inserting one of its ends into a target disc. If an initial overhand grasp was required to achieve end-state comfort, all children were able to perform the task successfully. If, however, an underhand grasp had to be selected, only 18 % of the 3-year-olds, 45 % of the 4-year-olds, and 67 % of the 5-year-olds were able to satisfy end-state comfort. These results show a distinct pattern of gradual improvement in children’s sensitivity to end-state comfort across the three age groups.

To compare anticipatory planning of manual actions between non-human primates and humans, Wunsch and colleagues [44] applied the motor task previously used for cotton-top tamarins [45] to pre-school children, primary school children and adults. Participants had to reach for a plastic cup that was vertically suspended in an upright or inverted orientation and retrieve a small toy from inside the cup. When the cup was presented in the inverted orientation, only adults consistently used the inverted grasp posture required to satisfy end-state comfort. The percentage of inverted grip amongst participants increased with age, indicating a gradual improvement of end-state comfort sensitivity. However, while the performance of adults was closely related to the performance of non-human primates, even children at approximately 10 years of age exhibited less end-state comfort sensitivity than the primates.

Convincing evidence that the gradual improvements in anticipatory motor planning in children are closely related to adaptations on a cognitive level was provided by Stöckel and colleagues [46]. Specifically, these authors examined links between motor planning (using the bar-transport task) and the development of cognitive representation of grasp postures in children aged 7, 8, and 9 years. In line with other studies on motor planning during childhood (see [49] for a recent review), end-state comfort satisfaction increased with age, and the 9-year-old children had more distinct representation structures compared to the 7- and 8-year-old children. Importantly, the sensitivity to comfortable end-states was related to the cognitive representation structure. Children with functionally well-structured representations exhibited a stronger preference for end-state comfort, thus supporting the notion that cognitive action representations provide a benchmark for anticipatory motor planning and behavior.

6 Computational Models

Computational frameworks of sensorimotor control based on parallel inverse and forward dynamic motor models (e.g., MOSAIC model [50]) are well established and are detailed enough to reason about neurological motor disorders in humans, such as musician’s dystonia (i.e., involuntary muscle cramps during instrument playing that can become severe enough to end a professional career [51]). In the MOSAIC model, an efference copy of a motor command is used by predictor units to internally simulate the sensory consequences of a planned action. These predictor units can therefore be interpreted as the computational model’s counterparts of BACs. Further, the hierarchical MOSAIC model (HMOSAIC) [52], which has a top symbolic/representational level, a middle motor sequence level and a bottom sensorimotor control level, bears strong similarities to the 4-layer model of complex motor control described by Schack and Ritter [10] and can be used to explain action observation, imitation learning and social interaction.
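How parallel predictor units can compete on the basis of their predictions may be illustrated with a short Python sketch. It assumes a common textbook formulation in which each forward model predicts the next sensory state from the current state and the efference copy, and a normalized Gaussian of the prediction error weights the modules. The two linear forward models, the scalar state, the parameter `sigma`, and all names are hypothetical and chosen for illustration; the sketch is not a reproduction of the MOSAIC implementation in [50].

```python
from math import exp

# Hypothetical forward models: predicted next position given position x and command u.
FORWARD_MODELS = {
    "light_object": lambda x, u: x + 1.0 * u,
    "heavy_object": lambda x, u: x + 0.25 * u,
}


def predict_all(state, efference_copy):
    """Each predictor simulates the sensory consequence of the same motor command."""
    return {name: f(state, efference_copy) for name, f in FORWARD_MODELS.items()}


def responsibilities(predictions, observed, sigma=1.0):
    """Normalized Gaussian likelihood of each prediction given the observed state."""
    likelihood = {
        name: exp(-((observed - p) ** 2) / (2.0 * sigma ** 2))
        for name, p in predictions.items()
    }
    total = sum(likelihood.values())
    return {name: value / total for name, value in likelihood.items()}


predictions = predict_all(state=0.0, efference_copy=2.0)
weights = responsibilities(predictions, observed=0.6)
print(max(weights, key=weights.get))
```

In this example the observed outcome lies close to the prediction of the `heavy_object` module, so that module receives the larger weight. In a full architecture of this kind, such weights would also gate the paired inverse models and the learning of each module.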

MOSAIC uses parallel local modules to represent motor primitives. This local approach allows for sharp segmentation between learned motor patterns, but leads to poor generalization properties due to strong competition between local modules during learning. This ‘conflict between generalization and segmentation’ [53] limits the number of learnable motor patterns. A neural network based model proposed by Yamashita and Tani [53] avoids this problem by distributed storage of motor patterns within a single network. Furthermore, the network is able to implicitly establish functional hierarchies using neural units with multiple time scales. Yet, the model requires long training times and has limited motor pattern capacity because of known problems using a classical training method for recurrent neural networks (conventional back-propagation through time).

Echo State Networks (ESN) proposed by Jäger and Haas [54] are special variants of recurrent neural networks that circumvent the problems of gradient based learning methods. They offer high model capacity, fast learning and excellent prediction capabilities. The learning speed is high enough to combine them with evolutionary algorithms to optimize both reservoir weights and network structure in a motor pattern learning task [11]. The resulting, compact networks can smoothly switch between motor patterns and show an interesting emergent feature: the ability to morph between stored patterns [11]. A recent extension of ESNs adds neural filters called conceptors [55] that pull the network dynamics to defined subspaces. Conceptors bridge the gap between (1) low level neuronal processing and learning of static and dynamic patterns and (2) high level symbolic processing. Using conceptors, motor patterns can be smoothly switched, interpolated and combined by logical operations [55]. Again, conceptors can be interpreted as computational instances of BACs.

Cruse and Schilling [56, 57] argue that cognitive systems do not necessarily have to be complex, and that even small, reactive neuronal systems like WALKNET [58] or ESN-based approaches [11, 55] can be expanded in a bottom-up approach to show basic cognitive capabilities like planning ahead or discovering new actions to solve a given problem. The cognitive architecture proposed in [56] consists of modules arranged in layers and columns, again bearing close similarities to the layered architecture of Schack and Ritter [10]. Internal body models based on grounded and embodied sensorimotor circuits provide the basis for internal simulation of given tasks. Such internal models can be employed as forward models to predict sensory consequences of planned actions, and at the same time, serve as inverse models for generating and ‘inventing’ new, task-specific motor commands [59].

7 Implications for Robotics

Information about basic principles of anticipatory planning in humans, such as the end-state comfort effect, is relevant when designing robot architectures that are used to perform manual actions and interact with humans in an appropriate, age-dependent manner [60]. Robot technology has matured to the point at which it can approximate a reasonable spectrum of perceptual, cognitive, and motor capabilities, allowing us to explore architectures for the integration of these functions into robot action control. This gives us the opportunity to fit existing models of anticipation, representation, and decision making in humans to robotic architectures. Therefore, research concerning anticipation in human motor control and the cognitive architecture of human motion is clearly linked to the ongoing field of cognitive robotics. The goal of cognitive robotics is to elevate the currently still rigid and rather narrow action repertoire of robots to a level where a robot can select and adjust its actions flexibly to highly varying contexts, maintain a shared focus of attention with a human instructor, and react to commands that are offered in a ‘natural’ format, such as speech and demonstration.

Our recent interdisciplinary research within the Center of Excellence ‘Cognitive Interaction Technology’ (CITEC) focused on the question of how motor anticipation and representation structures are established and changed step by step in compliance with task constraints. We wanted to investigate the relationship between goal anticipation, the structure of mental representations and the performance in manual actions in humans and robots. Therefore, we studied the development of structured representations (action templates) in human skill acquisition, and how these research results could be applied to robotics [10, 61]. In a next step, we translated our findings from studies of human movement into sufficiently specific models that permitted an implementation on robotic platforms [11, 12]. This research connection was used in both directions: insights gained from the attempt to validate the hypothesis about action and representation structures in the robot learning scenario have been used to change the design of experiments with human subjects. For instance, to learn about the granularity of cognitive building blocks in manual actions we tried to gain closer insight into the relationship between representation structures and motor execution, including situations in which actions resulted in errors.

An important link between cognitive architecture in humans and robots focuses on the cognitive reference structures of manual actions, especially the representations of objects and grasp postures. Therefore we investigated the hierarchical representation of objects under different task constraints and, complementary to this, the hierarchical representation of grasping movements (hierarchies of power and precision grips). The insights gained in these experiments have been implemented in robotic platforms and could be linked to different robot architectures [10, 61]. This kind of integrated (interdisciplinary) research allowed us to experimentally explore the interactions of action representation in memory (simulated with artificial neural networks) and motor skills in the context of real-world tasks. Based on present studies on anticipation, motor planning, and representation of grasp postures in humans and the corresponding experimental studies with uni- and bimanual robotic platforms we are going to refine robotic grasping step by step.