Why do we need hand-centred representations of visual space?

The primate visual system can be described as a series of brain areas whose neuronal activity represents properties of visible objects in the world. One such important property is the location of an object relative to the observer, and this can be represented using one of several different coordinate systems. Perhaps the simplest (and most common) coordinate system is based on the retina, in which objects are represented by their retinotopic (eye-centred) position. However, in order to interact successfully with a visible object, it is necessary to represent the object's position relative to the observer's body or body part. Given that our hands can move independently of our eyes, the brain needs to integrate information arising in an eye-centred reference frame with information about the current position of the hand relative to the body and to nearby potential target objects.
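
The bookkeeping involved can be made explicit with a simple vector sketch (our notation, for illustration only; rotations due to gaze direction and joint angles are ignored). If $\mathbf{t}$ is the position of a target in a body-centred frame, and $\mathbf{e}$ and $\mathbf{h}$ are the positions of the eye and hand in that same frame, then

$$\mathbf{t}_{\mathrm{eye}} = \mathbf{t} - \mathbf{e}, \qquad \mathbf{t}_{\mathrm{hand}} = \mathbf{t} - \mathbf{h} = \mathbf{t}_{\mathrm{eye}} + \mathbf{e} - \mathbf{h}.$$

Recovering the hand-centred target position from the retinal signal thus requires estimates of both eye and hand position, and must be repeated whenever either of them moves.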

A well-established solution to this problem involves transforming all information into a common reference frame for the encoding of events. Andersen and colleagues (e.g. Bhattacharyya et al. 2009; Cohen and Andersen 2002) suggested that, within the parietal cortex and via the dorsal visual stream, the represented locations of multisensory cues relevant for action (e.g. auditory cues, which arise in a head-centred frame, and proprioceptive cues, which arise in limb- or joint-centred frames) are translated into a common eye-centred reference frame. According to this account, action target positions are represented relative to the eyes (or gaze position) and, in its pure form, regardless of the posture of other body parts (Fig. 1b). Representations of this type are found throughout the primate visual system, from the retina, via the thalamus and superior colliculus, through the parietal and frontal cortices (Snyder 2000; Buneo and Andersen 2006; Marzocchi et al. 2008), and have repeatedly been demonstrated in humans using both imaging and behavioural approaches (Gardner et al. 2008; Crawford et al. 2004). Indeed, eye-centred representations are so prevalent that they are widely considered the principal reference frame for representing visual information in the brain (for an alternative view, see Làdavas 2002).

Fig. 1

Eye- and hand-centred spatial representations in the primate brain. Response patterns of illustrative neurons that are specifically modulated by changes in fixation position (with respect to reaching targets, b) or by changes in hand position (with respect to an approaching object, c). In both paradigms, monkeys were trained to maintain fixation on one of three targets (indicated by crosses in a) and to place their hand in one of three positions (as indicated by the sketch in a). In b, responses were recorded in the parietal reach region (PRR, shown in a) while monkeys performed delayed reaching movements to each of four different targets (indicated by the grey circles in a, corresponding to the four panels in each row). In c, responses were recorded in the ventral premotor cortex (PMv, shown in a) while three-dimensional objects approached the monkey along four different trajectories (indicated by the grey arrows in a, corresponding to the four panels in each row). Initial hand and fixation positions are shown on the left side of each row. In both studies, hand and eye positions were independently manipulated (upper and lower two rows, respectively). In b, the peak response of the neuron shifted when the initial eye position was varied, but not when the initial hand position was varied. In c, the peak response of the neuron shifted when the initial hand position was varied, but not when the initial eye position was varied. b Modified from Cohen and Andersen (2002). a, c Modified from Graziano et al. (1997)

At the motor end, eye-centred representations of targets would have to be transformed into effector-centred representations in order to command movements directed towards those targets (Fig. 1c). This final transformation between eye- and muscle-centred representations for action may occur in the premotor cortex (for the potential roles of the posterior parietal cortex in coordinate transformations, see Snyder 2000). Within the dorsal premotor cortex (PMd), the firing rate of reach-related neurons is a function of three spatial relationships: between the target and the eye; between the target and the hand; and between the eye and the hand (Pesaran et al. 2006, 2010). In the ventral premotor cortex (PMv), some neuronal signals have been shown to correspond to the spatial relationship between the hand and the target, independently of eye position (Mushiake et al. 1997; Graziano et al. 1997) or of intrinsic movement parameters of individual hand and arm muscles (Kakei et al. 2001). This suggests that the PMv may play a role in the transformation of target location from a visual to a motor frame of reference, relating to broader aspects of skeleto-motor control than previously assigned to it. At this point, it is important to emphasize that, according to the common eye-centred account presented above, while this final stage of eye-to-hand coordinate transformation is necessary for launching the motor behaviour, it is performed only after the motor outcome has been decided on and planned (Snyder 2000). In this respect, hand-centred visual representations do not serve ongoing sensory processing, but rather enable motor outputs.
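
To make the serial structure of this account concrete, the following is a minimal toy sketch (in Python; all names, numbers and the purely translational treatment of reference frames are our illustrative assumptions, not a model taken from the cited studies):

```python
import numpy as np

def to_eye_centred(cue_body, eye_pos):
    """Translate a cue from a body-centred into an eye-centred frame."""
    return cue_body - eye_pos

def eye_to_hand_centred(target_eye, eye_pos, hand_pos):
    """Final motor-stage transformation: eye-centred target to hand-centred goal."""
    return target_eye + eye_pos - hand_pos

# Illustrative positions in a shared body-centred frame (cm)
eye_pos  = np.array([0.0, 30.0, 50.0])   # eye relative to the trunk
hand_pos = np.array([20.0, 0.0, 10.0])   # hand relative to the trunk
target   = np.array([25.0, 10.0, 40.0])  # visible target object

# Common eye-centred account: first encode the target relative to the eye...
target_eye = to_eye_centred(target, eye_pos)
# ...plan the action in this frame, and only at the motor end derive the
# hand-centred movement vector
reach_vector = eye_to_hand_centred(target_eye, eye_pos, hand_pos)
print(reach_vector)  # displacement the hand must cover: [ 5. 10. 30.]
```

Note that whenever the eye or the hand moves, both steps must be recomputed; this is one source of the costs discussed next.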

While the scheme of transforming all inputs into a single (eye-centred) reference frame may be common in the brain, it may not be the most efficient way to control a moving body from a signal-processing point of view (e.g. Glennerster et al. 2001). This is because sensorimotor transformations come with costs. First, transformations may add biases and variability to the transformed signals, degrading the flow and quality of information in motor control circuits (McGuire and Sabes 2009; Schlicht and Schrater 2007). Second, transformations take time. In our daily interactions with objects, however, we often need to respond very quickly to unexpected changes in their position or movement, and speed is of paramount importance. Moreover, as we move around, changes in our body position (and particularly our hand position) require constant updating and re-calibration of the positions of objects relative to us in the common reference frame. Given the error, bias and delay incurred by each sensorimotor transformation, and for each movement of the subject or object, it is appealing to speculate that the brain may possess an alternative, simpler and more rapid mechanism for processing visual information to generate effector-centred motor commands to a very tight deadline.
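
The first of these costs can be stated more formally. Under the simplifying assumption (ours) that each transformation adds independent noise, the variance of the final hand-centred estimate grows with every remapping step:

$$\sigma^2_{\mathrm{hand}} = \sigma^2_{\mathrm{sensory}} + \sum_{i=1}^{n} \sigma^2_{\mathrm{transform},i},$$

so a scheme requiring fewer serial transformations yields, all else being equal, a less variable motor command (cf. McGuire and Sabes 2009). The same additive logic applies to processing delays across serial stages.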

In the following, we present and discuss anatomical and physiological evidence indicating the existence of rapid processing of visual information for hand–object interactions. We propose that the neural mechanisms previously implicated in hand-centred processing of objects in near space (peripersonal space, see Makin et al. 2008; Brozzoli et al. 2011), in particular those within the PMv (e.g. Rizzolatti et al. 1981; Graziano et al. 1994), may play a specific role in the rapid processing of visual information for the online control of actions.

Anatomical pathways for rapid processing of visual information

Which brain circuits might rapidly mediate the flow of visual information to the motor system? Most retinal output projects to the striate cortex via the thalamic lateral geniculate nucleus (LGN) and ascends the visual hierarchy in an eye-centred fashion (Maunsell and Newsome 1987). A small proportion of the retinal output bypasses this major pathway and projects instead to the superior colliculus in the midbrain (Perry and Cowey 1984; see also Sincich et al. 2004, for an alternative pathway directly connecting the LGN to the middle temporal visual area, MT). This collicular pathway may be involved in the rapid processing of visual information for action: Lyon and colleagues (Lyon et al. 2010) used a rabies virus in macaque monkeys to track transsynaptic connections projecting from the superior colliculus towards the cerebral cortex. They found that areas within the visual cortical dorsal stream (i.e. the third retinotopic visual area, V3A, and MT) receive disynaptic projections from the superior colliculus via an inferior pulvinar relay (see Berman and Wurtz 2010, for complementary physiological results). No evidence was found for similar projections from the superior colliculus to visual ventral stream areas. This finding, summarized here in Fig. 2, places the PMv within just five synapses of the retina (via the superior colliculus, the inferior pulvinar, MT and the ventral intraparietal area (VIP); see Kaas and Lyon 2007; Lewis and Van Essen 2000). This "express" route should be able to transfer visual information to the motor cortex within approximately 70–80 ms (see Pettersson et al. 1997 for comments on synaptic relay times during the online visual control of movement in cats). Lyon and colleagues attributed to this collicular-to-dorsal-stream route the functional role of processing rapidly moving visual stimuli. Here we extend this suggestion and hypothesize that this route might be directly responsible for the rapid, online updating of changing visual information with respect to hand position that is crucial for the dynamic control of action (Cisek and Kalaska 2010; see also Stuphorn et al. 2000; Reyes-Puerta et al. 2010, for evidence of reach-related visual coding in the intermediate layers of the superior colliculus).
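
A back-of-envelope calculation (ours, purely illustrative) shows that the 70–80 ms estimate is at least plausible: allowing roughly 20–30 ms for retinal processing and ~10 ms per subsequent relay for synaptic integration and axonal conduction, the five relays give approximately

$$t_{\mathrm{total}} \approx t_{\mathrm{retina}} + 5 \times t_{\mathrm{relay}} \approx (20\text{–}30\ \mathrm{ms}) + 5 \times 10\ \mathrm{ms} \approx 70\text{–}80\ \mathrm{ms}.$$

The per-stage values here are assumptions for illustration; actual latencies vary across structures and stimulus conditions.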

Fig. 2

Fast route from visual input to motor output. Most retinal output projects to the striate cortex via the thalamic lateral geniculate nucleus (LGN) and ascends to the motor cortex via the dorsal and ventral pathways (grey arrows). A small proportion of the retinal output bypasses this major pathway and projects instead to the superior colliculus (SC) in the midbrain (black arrows). Information from the SC is transferred to the middle temporal visual area (MT) via disynaptic projections. This places the ventral premotor cortex (PMv) within just five synapses of the retina (via the SC, the inferior pulvinar, MT and the ventral intraparietal area (VIP)). By effectively bypassing the visual processing hierarchy, information mediated via this pathway may become available for the rapid online control of action

Physiological evidence for the rapid online control of action

These anatomical data from monkeys (Lyon et al. 2010) converge with human neurophysiological data from experiments in our laboratory demonstrating rapid processing of visual information relevant for motor control. In our study (Makin et al. 2009), participants performed a simple button-press motor response with the right index finger while a task-irrelevant three-dimensional ball suddenly fell just above the participants' responding hand. Using single-pulse transcranial magnetic stimulation (TMS) over the contralateral primary motor cortex, we found that the sudden appearance of this potentially threatening visual stimulus was associated with a reduction in corticospinal excitability within a very early and specific time window, 70–80 ms following its appearance (see Evarts 1974, for comparable response latencies in macaque M1 neurons with respect to visual feedback for force application). We interpreted this inhibition as reflecting the proactive suppression of an automatic avoidance-related response during the execution of the task-related response. Indeed, when the two motor behaviours (the avoidance and the task-related responses) were uncoupled, the approaching ball had the opposite, facilitatory effect on corticospinal excitability (Fig. 3a). Importantly, the rapid inhibition of corticospinal excitability was predominantly hand-centred, depending most upon the distance of the ball from the hand, regardless of the locations of both visual fixation and covert spatial attention relative to the ball and hand.

Fig. 3

Rapid modulation of TMS-assessed corticospinal excitability during the online control of pre-programmed and modified movements. a Modulations in corticospinal excitability, measured using single TMS pulses over M1, during the unexpected appearance of distractor 3D objects approaching the participants. When participants responded to the task-irrelevant objects (by producing a voluntary muscle twitch of the approached hand), corticospinal excitability was specifically enhanced when the object approached near the hand, resulting in hand-centred excitation 70 ms following the appearance of the object (left). When the participants were pre-engaged in a reaction time task, the unexpected appearance of objects approaching the hand reduced corticospinal excitability, resulting in hand-centred suppression 80 ms following the object's appearance (right). b Modulations in corticospinal excitability, measured using ventral premotor cortex (PMv) and primary motor cortex (M1) paired-pulse TMS, during the performance of reaching movements to objects. When the same target object was maintained throughout the execution of the movement, PMv pulses resulted in increased corticospinal excitability 75 ms following the onset of the movement (left). When the target object was switched during movement execution, PMv pulses suppressed corticospinal excitability as early as 75 ms following the switch (which was time-locked to the movement onset, right). a Modified from Makin et al. (2009). b Modified from Buch et al. (2010)

This ability to inhibit one movement while concurrently executing another (selective inhibition) is crucial for flexible motor behaviour (Coxon et al. 2007). Such inhibition can suppress undesirable movements not only after they have been initiated, but also proactively, before any muscle response is released (Boulinguez et al. 2008). The rapid processing of visual information during action execution is therefore a prerequisite for such proactive inhibition. Since the hierarchical processing of information in the visual cortex is relatively slow (latencies of neurons in macaque secondary visual cortex (V2), for example, span 56–117 ms; Schmolesky et al. 1998), it is likely that the rapid reprogramming in response to visual information that we demonstrated in our study is mediated via more direct pathways (e.g. via area MT, which exhibits latencies as early as 49 ms), possibly including the rapid subcortical route identified by Lyon et al. (2010). Indeed, the rapid hand-centred modulations in our study fit well with the estimated time for processing visual information via the SC in humans, based on recordings in the cat brain (Pettersson et al. 1997). Our findings also correspond well with kinematic and electromyographic evidence for early movement corrections following unexpected changes in target position, occurring just under 100 ms after the perturbation (Paulignan et al. 1991a, b; Farnè et al. 2003; Pruszynski et al. 2008), assuming a ~25-ms conduction time between the primary motor cortex and the intrinsic hand and arm muscles.
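
Put as simple arithmetic (ours), the timings are mutually consistent: ~75 ms for a visual perturbation to modulate M1 output, plus ~25 ms of corticomuscular conduction, predicts electromyographic corrections at approximately $75 + 25 = 100$ ms, matching the behavioural latencies cited above.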

While single-pulse TMS allowed us to accurately infer the timeline of hand-centred representations of visual information, it did not provide any evidence regarding the brain mechanisms enabling such processing. A recent study directly links the rapid selection of motor responses following changes in visual information with the PMv. Buch and colleagues (Buch et al. 2010) used paired-pulse TMS to examine the involvement of the right PMv in the reprogramming of grasp apertures when a target cylinder unexpectedly changed in size, as compared to when the movement was executed towards the originally planned target. The authors found that the PMv facilitated corticospinal excitability prior to and following the onset of the planned movement (see also Koch et al. 2006; O'Shea et al. 2007; Davare et al. 2006, for similar results). When the planned movement had to be reprogrammed following a change in the target, however, the PMv instead inhibited corticospinal excitability (Fig. 3b). Crucially, this inhibitory effect emerged as early as 75 ms following the change in visual information, after the original reaching movement had been launched.

In contrast to our inhibitory effect, which was confined to a narrow time window of less than 20 ms, the effect reported by Buch and colleagues remained significant until at least 100 ms following the switch. This difference between the two studies might reflect the transient nature of the conflict between the avoidance and simple reaction time responses in our design, as compared to the ongoing reaching and grasping movement that Buch and colleagues studied. Indeed, it is important to note that the mechanisms underlying these two types of motor behaviour (avoidance vs. grasping) are most likely different. However, both studies examined representations of changing visual information that are crucial for rapid decision making during motor control. The generation and control of both avoidance and grasping movements in a dynamic environment require re-positioning of the hand in space and coordination between proximal and distal muscles (for example, a change in object size will require re-programming of the transport phase; Paulignan et al. 1991a, b). In our study, the relevant visual information was a ball approaching the subject at high velocity (~370 cm/s). In the study by Buch and colleagues, the target object suddenly changed in size by 50 mm, a change brought about via a change in illumination in a darkened room. Such transient, high-contrast and high-luminance visual events would activate many visual pathways, likely including those also responsive to rapidly moving objects. Given the similarity in onset times of the visual modulation of motor excitability in these two reports, we propose that the mechanisms responsible for the rapid reprogramming of hand and arm movements following visual perturbations partly rely on the same neural pathways that give rise to the rapid hand-centred representations of space observed neurophysiologically. We expand upon this notion in the next sections.

Hand-centred representations in PMv

The premotor cortex has often been implicated in eye-to-hand coordinate transformation (Pesaran et al. 2006, 2010; Kakei et al. 2001; Mushiake et al. 1997). The most striking demonstrations of hand-centred representation of visual information have been found in bimodal visual–tactile neurons in the macaque PMv (Graziano et al. 1994, 1997; Graziano 1999; Rizzolatti et al. 1981; see Fig. 1c). In these bimodal neurons, the visual and tactile receptive fields generally overlap, so the neurons respond preferentially to visual stimuli near the tactile receptive field on the hand, arm, shoulders, neck or face. Evidence for visual–tactile integration is not exclusive to the hand (e.g. Farnè et al. 2005; Schicke et al. 2009), and body-part-centred representation is not restricted to visual space (e.g. Serino et al. 2011). However, as both vision and the hands play a dominant role in human interactions with objects, we will focus our discussion on neurons representing visual information around the hands and arms. Furthermore, body-part-centred multisensory representations are not restricted to the PMv, and have also been shown in the posterior parietal cortex, particularly in area VIP (Cooke et al. 2003; Duhamel et al. 1998), which connects MT with the PMv directly (see Fig. 2). However, since hand-centred visual representations are most commonly found in the PMv, we will restrict our discussion to this particular area, without considering the relative contributions of the posterior parietal cortex. These body-part-centred multisensory neurons have been proposed to provide a general solution to the problem of visuomotor integration (Graziano and Gross 1998), and have stimulated much research in humans (for review, see Brozzoli et al. 2011). The accepted view over the years has been that visual hand-centred mechanisms play some general role in the sensory guidance of movements towards objects (Rizzolatti 1987; Graziano and Gross 1998; Fogassi and Luppino 2005) or, more recently, in object avoidance responses (Graziano et al. 2002; Cooke et al. 2003), but their specific role has not been determined.
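
As a toy illustration of what "hand-centred" means computationally (our construction, not a fitted model of the cited neurons), consider a unit whose response depends only on the distance of an object from the hand, making it invariant to gaze and retinal position:

```python
import numpy as np

def hand_centred_response(obj_pos, hand_pos, sigma=10.0):
    """Toy hand-centred unit: Gaussian tuning around the hand position (cm).

    The response depends only on the object-to-hand distance, so moving the
    eyes (i.e. changing the retinal position of the object) leaves it
    unchanged, whereas moving the hand shifts the response field with it.
    """
    d = np.linalg.norm(np.asarray(obj_pos) - np.asarray(hand_pos))
    return np.exp(-d**2 / (2 * sigma**2))

print(hand_centred_response([12, 0, 0], hand_pos=[10, 0, 0]))  # object near hand: ~0.98
print(hand_centred_response([12, 0, 0], hand_pos=[40, 0, 0]))  # hand moved away: ~0.02
```

An eye-centred unit would instead be written as a function of the object's position relative to the eye, reproducing the dissociation illustrated in Fig. 1b, c.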

The response properties of macaque PMv hand-centred neurons are similar to the hand-centred modulations of motor excitability by visual stimuli that we reported using TMS (Makin et al. 2009): in both cases, responses varied with the distance of the object from the hand, independently of the retinal position of the visual stimulus. Moreover, in both cases the hand-centred modulation was specific to three-dimensional objects approaching the hand. Although comparisons between data arising from monkeys and humans (and using different methods) should be made with caution, given the spatial specificity of the above responses with respect to visual events, and given the evidence for the involvement of the PMv in rapid visuomotor response selection (Buch et al. 2010), we suggest that these hand-centred mechanisms play a specific and prominent role in the rapid selection and control of manual actions. Moreover, as the responses of these hand-centred PMv neurons are generally limited to real, three-dimensional, moving stimuli, we suggest that hand-centred coding occurs predominantly to update the motor system about unexpected changes in the visual properties of objects (or of the hand with respect to the object) that are relevant for hand interactions during the online control of action.

It is important to emphasize that we do not assign an exclusive role to the PMv in the representation of visual information for the rapid online control of movement. Indeed, Cisek and Kalaska (2010) noted that the continuous and parallel processes critical for hand–object interactions appear as two waves of activation: an early wave (<100 ms), crudely specifying a "menu" of options, and a second wave that selects among the different available options approximately 120–150 ms after visual stimulus onset. We suggest that when a rapid motor decision is required (such as when an initiated movement needs to be corrected or aborted due to unexpected changes in the object's properties), transient hand-centred visual information, made available within a short time frame (potentially via a specialized pathway), will be utilized (however, see Davare et al. 2006, for rapid modulation of PMv during the delayed execution of a pre-planned movement). Eye-centred representations for action, which during hand movements necessitate recurrent transformations (from hand-centred to eye-centred, and then back again to hand-centred), are likely to dominate the later wave. We are currently not aware of direct electrophysiological evidence supporting the role of the PMv in rapid hand-centred visual representations during hand–object interactions. An indirect finding was reported by Cooke and Graziano (2003), who showed startle-related EMG activity occurring as early as 70 ms after stimulus onset. This EMG activity was comparable to the muscle responses artificially evoked by electrical macro-stimulation of bimodal regions of the premotor cortex (Graziano et al. 2002). We hope that our suggestions will motivate researchers to design paradigms involving hand movements while studying hand-centred representations in the PMv.

Concluding remarks

A quick review of the literature reveals that the most dominant representation for visually guided hand movements is an eye-centred one. Notably, even the theoretical framework that promotes the primary role of eye-centred representations in hand–object interactions acknowledges hand-centred processing of object position in the motor system, as a final stage in the perception-to-action pathway (Snyder 2000). In this light, the role of hand-centred representations in hand–object interactions may require re-evaluation.

We suggest that a short and rapid pathway from the retina to the motor cortex exists to update hand-relevant visual information for the dynamic control of hand–object interactions. According to our framework, sudden or unexpected changes in object attributes relative to the hand (e.g. location) are rapidly transmitted to the PMv via a subcortical route involving the superior colliculus. This relatively "crude" visual information is integrated in the PMv with sensory information regarding the current estimate of hand position, resulting in hand-centred responses in bimodal neurons with overlapping visual and tactile receptive fields (Graziano 1999). This hand-centred information can then be used to abort or inhibit pre-planned movements that have become irrelevant, or to facilitate newly emerging movements. Such a mechanism might allow for the rapid selection of appropriate actions, for example when the position of a target object unexpectedly changes (e.g. a glass that falls from the shelf as we fumble in reaching for it) or when we are unexpectedly required to avoid an approaching object (e.g. realizing that we are, in fact, unable to catch the falling glass, we instead try to avoid injury). In accordance with the model proposed by Cisek and Kalaska (2010), we emphasize the role of this rapid mechanism in an unstable environment: under predictable conditions, hand-centred representations could be outweighed by eye-centred mechanisms, which probably provide more accurate visuospatial information relating to the positions of the object and the hand.

Cooke and Graziano have previously proposed a similar notion, advocating the involvement of bimodal neurons in the PMv (as well as the ventral intraparietal area) in the coding of avoidance responses (Cooke and Graziano 2003; Cooke et al. 2003; Graziano et al. 2002). Our proposal extends this framework in several ways. First, we argue that the same coding occurs during the rapid selection of responses both towards objects (reaching and grasping) and away from objects (avoidance); the same neural mechanisms and pathways may sustain both types of hand–object interaction. Second, according to our account, hand-centred processing of objects in this pathway occurs only (or at least predominantly) during sudden changes in the properties most relevant for hand–object interactions, such as the spatial properties of the object. In this sense, therefore, there is no need for a continuous, online hand-centred representation of peripersonal space. Instead, hand-centred visual representations of objects near or approaching the hand may be dynamically formed only when they become relevant for our actions and interactions with the world. We therefore suggest that these mechanisms specialize in the rapid updating of relevant visual information during response selection and the online control of action. We believe that future studies should place an emphasis on the precise timing of visuomotor transformations, as well as on multisensory interactions during action (see, for example, Brozzoli et al. 2009, 2010), as important criteria for determining the underlying neural mechanisms of hand-centred representations.