11.1 Introduction

It used to be thought that the brain processes sensory inputs in parallel, modality-specific streams, but an extensive literature on multisensory processing now suggests that this is not the case. For example, many cerebral cortical regions previously considered to be specialized for processing various aspects of visual input are also activated during analogous tactile or haptic tasks. In humans, the set of visual cortical regions known as the MT complex, which processes visual motion, is also activated by tactile motion stimuli, even when there is no explicit task (Hagen et al. 2002; Blake et al. 2004; Summers et al. 2009). Tactile texture perception activates visually texture-selective areas in medial occipital cortex (Stilla and Sathian 2008; Sathian et al. 2011), while the lateral occipital complex (LOC) is selective for shape during both visual and haptic perception (Amedi et al. 2001, 2002; Zhang et al. 2004; Stilla and Sathian 2008). By contrast, haptic face recognition was found to activate the left fusiform gyrus (even though faces were felt with the left hand), while visual face recognition activated the right fusiform gyrus (Kilgour et al. 2005), and there is little overlap between visually and haptically face-selective voxels in ventral and inferior temporal cortex (Pietrini et al. 2004). On the whole, however, the old consensus is giving way to the concept of a “metamodal” brain with a multisensory, task-based organization (Pascual-Leone and Hamilton 2001; Lacey et al. 2009a; James et al. 2011a); for example, shape-selective regions respond whether the task is visual or haptic. An intuitively appealing idea is that the activation of classical visual regions during haptic perception reflects visual imagery (Sathian et al. 1997): when feeling an object, one naturally imagines what it might look like. However, this does not necessarily mean that visual imagery mediates visual cortical recruitment during haptic perception. In this chapter, we review the evidence concerning the potential role of visual imagery in haptic shape perception and outline a process model. By way of background, we begin with a review of the brain regions involved in visuo-haptic multisensory shape processing and the inferences that can be drawn from this evidence about the underlying representation of object shape.

11.2 Cortical Regions Involved in Visuo-Haptic Shape Processing

The principal cerebral cortical region involved in visuo-haptic shape processing is the LOC, an object-selective region in the ventral visual pathway (Malach et al. 1995). Part of the LOC responds selectively to objects in both vision and touch (Amedi et al. 2001, 2002). The LOC is shape-selective during both haptic 3D shape perception (Amedi et al. 2001; Zhang et al. 2004; Stilla and Sathian 2008) and tactile 2D shape perception (Stoesz et al. 2003; Prather et al. 2004). The LOC is thought to be a processor of geometric shape, since it is not activated during object recognition triggered by object-specific sounds (Amedi et al. 2002) but does respond when auditory object recognition is mediated by a visual-auditory sensory substitution device (Amedi et al. 2007). Such devices convert visual shape information into an auditory stream, or “soundscape,” conveying the visual horizontal axis through auditory duration and stereo panning, the vertical axis through auditory pitch, and brightness through loudness. Extracting shape information from these soundscapes, which requires substantial training, enables object recognition and generalization to untrained objects, but only when individuals (whether sighted or blind) are trained using the specific algorithms involved and not when merely arbitrary associations are taught (Amedi et al. 2007). A more recent study required participants to listen to the impact sounds made by rods and balls made of either metal or wood (James et al. 2011b). Participants matched these sounds by the shape of the object that made them, by its material, or by using all the acoustic information available. The LOC was more activated when these sounds were categorized by shape than by material (James et al. 2011b). It is possible that such a matching task engaged visual imagery, which could explain why these results differ from those obtained by Amedi et al. (2002). Taken together, these findings support the idea that the LOC is concerned with shape information, regardless of the input sensory modality.
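To make this mapping concrete, here is a minimal sketch of how such an image-to-soundscape conversion might be implemented. It is illustrative only: the function name, sampling rate, and frequency range are our assumptions, not the actual algorithm used in the training of Amedi et al. (2007).

```python
import numpy as np

def image_to_soundscape(image, duration=1.0, sr=8000,
                        f_lo=200.0, f_hi=4000.0):
    """Convert a grayscale image (rows x cols, values in [0, 1]) into a
    stereo "soundscape": columns are scanned left-to-right over time (with
    the stereo pan following the scan), each row carries a fixed pitch
    (higher rows = higher pitch), and pixel brightness sets loudness."""
    n_rows, n_cols = image.shape
    samples_per_col = int(duration * sr / n_cols)
    t = np.arange(samples_per_col) / sr
    freqs = np.geomspace(f_hi, f_lo, n_rows)   # top row -> highest pitch
    left, right = [], []
    for c in range(n_cols):
        # One sinusoid per row, weighted by that pixel's brightness.
        col = (image[:, c, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(0)
        pan = c / max(n_cols - 1, 1)           # 0 = far left, 1 = far right
        left.append(col * (1.0 - pan))
        right.append(col * pan)
    stereo = np.stack([np.concatenate(left), np.concatenate(right)])
    peak = np.abs(stereo).max()
    return stereo / peak if peak > 0 else stereo   # normalize to [-1, 1]
```

In this sketch, scanning columns over time while panning the output encodes the horizontal axis, the row-to-frequency mapping encodes the vertical axis, and the brightness weighting encodes loudness, matching the scheme described above.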

Several parietal cortical regions also show multisensory shape-selectivity, including the postcentral sulcus (PCS) (Stilla and Sathian 2008), which is the location of Brodmann’s area 2 in human primary somatosensory cortex (S1) (Grefkes et al. 2001). S1 is generally assumed to be purely somatosensory, but earlier neurophysiological studies in monkeys suggested that parts of S1 were visually responsive as well (Zhou and Fuster 1997; Iwamura 1998). Visuo-haptic shape-selectivity has also been widely reported in various parts of the human intraparietal sulcus (IPS), which lies squarely in classical multisensory cortex. In particular, there are bisensory foci in the anterior IPS (aIPS) (Grefkes et al. 2002; Stilla and Sathian 2008); in the regions referred to as the anterior intraparietal area (AIP, Grefkes and Fink 2005; Shikata et al. 2008) and medial intraparietal area (MIP, Grefkes et al. 2004); and in the posteroventral IPS (pvIPS) region (Saito et al. 2003; Stilla and Sathian 2008) comprising the caudal intraparietal area (CIP, Shikata et al. 2008) and the adjacent, retinotopically mapped, areas IPS1 and V7 (Swisher et al. 2007). It should be noted that areas AIP, MIP, and CIP were first described in macaque monkeys, and their homologies in humans remain somewhat uncertain.

A crucial question about haptic or tactile activation of supposedly visual cortical areas is whether such activation is merely a by-product, with little or no functional relevance, or whether it is, in fact, necessary for task performance. Two lines of evidence indicate that the latter is the case. Firstly, neurological case studies indicate that the LOC is necessary for both haptic and visual shape perception. A patient with a left occipito-temporal cortical lesion, likely including the LOC, had both tactile and visual agnosia (an inability to recognize objects), although somatosensory cortex and basic somatosensory function were intact (Feinberg et al. 1986). Another patient with bilateral lesions to the LOC was unable to learn new objects either visually or haptically (James et al. 2006). Secondly, some studies have employed transcranial magnetic stimulation (TMS) to temporarily deactivate specific, functionally defined, cortical areas. TMS over a parieto-occipital region activated during tactile discrimination of grating orientation (probable area V6 [Pitzalis et al. 2006]) interfered with performance of this task (Zangaladze et al. 1999). A recent study reported that repetitive TMS (rTMS) over left lateral occipital cortex disrupted object categorization while facilitating scene categorization (Mullin and Steeves 2011), suggesting that object processing cannot be carried out without a contribution from this area. Similarly, rTMS over the left aIPS impaired visual-haptic, but not haptic-visual, shape matching using the right hand (Buelte et al. 2008), whereas rTMS over the right aIPS during shape matching with the left hand had no effect on either cross-modal condition. The reason for this discrepancy is unclear, but it underscores that the exact roles of the PCS, the IPS regions, and the LOC in multisensory shape processing remain to be fully worked out.

11.3 Visual Imagery or Multisensory Convergence?

An intuitively appealing explanation for haptically evoked activation of visual cortex is that this is mediated by visual imagery (Sathian et al. 1997). The LOC is certainly active during imagery: for example, the left LOC is active during mental imagery of familiar objects previously explored haptically by blind individuals or visually by sighted individuals (De Volder et al. 2001), and also during recall of both geometric and material object properties from memory (Newman et al. 2005). More pertinently, haptic shape-selective activation magnitudes in the right LOC were strongly predicted by individual differences in ratings of the vividness of visual imagery (Zhang et al. 2004). Some have argued against the visual imagery hypothesis on the basis that the congenitally blind show shape-related activity in the same regions as the sighted: since the congenitally blind do not have visual imagery, the reasoning goes, such imagery cannot account for the activations seen in the sighted (Pietrini et al. 2004). However, the fact that the blind cannot employ visual imagery during haptic shape perception is certainly no reason to exclude this possibility in the sighted, particularly given the extensive evidence for cross-modal plasticity in studies of visual deprivation (Pascual-Leone et al. 2005; Sathian 2005; Sathian and Stilla 2010). A further objection has been that the magnitude of activity in the LOC during visual imagery is only about 20% of that seen during haptic object identification, suggesting that visual imagery is relatively unimportant during haptic shape perception (Amedi et al. 2001; and see Reed et al. 2004). However, these studies generally did not monitor performance on the visual imagery task, so the low activity in the LOC during imagery could simply mean that participants were not performing the task consistently or were not maintaining their visual images throughout the imagery scan.

It is also important to be clear about what is meant by visual imagery, as this is not a unitary ability. Recent research has shown that there are two different kinds of visual imagery: “object imagery,” i.e., images that are pictorial and deal with the actual appearance of objects in terms of shape, color, brightness, and other surface properties, and “spatial imagery,” i.e., more schematic images dealing with the spatial relations of objects and their component parts and with spatial transformations (Kozhevnikov et al. 2002; Kozhevnikov et al. 2005; Blajenkova et al. 2006; and see Chap. 16). This distinction is relevant because both vision and touch encode spatial information about objects (for example, size, shape, and the relative positions of different object features), and such information may well be encoded in a modality-independent spatial representation (Lacey and Campbell 2006). Support for this possibility is provided by recent work showing that spatial, but not object, imagery scores were correlated with accuracy on cross-modal, but not within-modal, object identification for a set of very similar and previously unfamiliar objects (Lacey et al. 2007a).

In recent work, we investigated whether object and spatial imagery dimensions exist in haptic and multisensory representations, in addition to the visual domain (Lacey et al. 2011). We employed tasks that required shape discrimination across changes in texture and texture discrimination across changes in shape; these were performed both within-modally in vision and touch and cross-modally with visual study followed by haptic test, and vice versa. In both vision and touch, we found that shape discrimination was impaired by texture changes for object imagers but not spatial imagers, while texture discrimination was impaired by shape changes for spatial imagers but not object imagers. A similar pattern occurred in the cross-modal conditions when participants were accessing a multisensory representation (see Lacey et al. 2009b): object imagers were worse at shape discrimination if texture changed while spatial imagers could discriminate shape whether texture changed or not (Lacey et al. 2011). There is also evidence that early-blind individuals perform both object-based and spatially based tasks equally well (Aleman et al. 2001; see also Noordzij et al. 2007). Thus, it is probably beneficial to explore the roles of “object” and “spatial” imagery rather than taking an undifferentiated “visual” imagery approach. Moreover, the object-spatial dimension of imagery can be viewed as orthogonal to the modality involved.

An alternative to the visual imagery hypothesis is that inputs in both vision and touch converge on a modality-independent representation, as suggested by the overlap of visual and haptic shape-selective activity in the LOC (Amedi et al. 2001, 2002; Zhang et al. 2004; Stilla and Sathian 2008). While some researchers refer to such modality-independent representations as “amodal,” we believe that this term should be reserved for linguistic or other abstract representations. Instead, we prefer the term “multisensory” for a representation that can be encoded and retrieved by multiple sensory systems and that retains the modality “tags” of the associated inputs (Sathian 2004). The multisensory hypothesis is supported by studies of effective connectivity derived from functional magnetic resonance imaging (fMRI) data indicating the existence of bottom-up projections from S1 to the LOC (Peltier et al. 2007; Deshpande et al. 2008) and also by electrophysiological data showing early propagation of activity from S1 into the LOC during tactile shape discrimination (Lucan et al. 2010). However, both Peltier et al. (2007) and Deshpande et al. (2008) also found evidence for top-down projections, indicating that shape representations in the LOC may be flexibly accessible by either bottom-up or top-down pathways (see Sect. 11.4.2).

If vision and touch engage a common spatial representational system, then we would expect to see similarities in processing of visually and haptically derived representations; this, in fact, turns out to be the case. For example, the time taken to scan both visual images (Kosslyn 1973; Kosslyn et al. 1978) and haptically derived images (Röder and Rösler 1998) increases with the spatial distance to be inspected. Also, the time taken to judge whether two objects are the same or mirror-images increases nearly linearly with increasing angular disparity between the objects for mental rotation of both visual (Shepard and Metzler 1971) and haptic stimuli (Marmor and Zaback 1976; Carpenter and Eisenberg 1978; Hollins 1986; Dellantonio and Spagnolo 1990). The same relationship was found when the angle between a tactile stimulus and a canonical angle was varied, with associated activity in the left aIPS (Prather et al. 2004), an area also active during mental rotation of visual stimuli (Alivisatos and Petrides 1997), and probably corresponding to AIP (Grefkes and Fink 2005; Shikata et al. 2008). Similar processing has been found with sighted, early- and late-blind individuals (Carpenter and Eisenberg 1978; Röder and Rösler 1998). These findings suggest that spatial metric information is preserved in representations derived from both vision and touch and that both modalities rely on similar, if not identical, imagery processes (Röder and Rösler 1998). In addition, behavioral studies have shown that cross-modal priming is as effective as within-modal priming (Easton et al. 1997a, b; Reales and Ballesteros 1999) and that visuo-haptic cross-modal object recognition is subserved by a multisensory, view-independent, representation (Lacey et al. 2007a, 2009b; Lacey et al. 2010b; but see also Lawson 2009). Candidate regions for housing a common visuo-haptic shape representation include the right LOC and the left pvIPS, since activation magnitudes during visual and haptic processing of (unfamiliar) shape are significantly correlated across subjects in these regions (Stilla and Sathian 2008).
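The near-linear dependence of response time on angular disparity in these mental rotation studies can be summarized by a simple linear model (the notation here is ours, not the original authors’):

$$\mathrm{RT}(\theta) \;\approx\; \mathrm{RT}_{0} + \frac{\theta}{\omega}$$

where θ is the angular disparity between the two stimuli, RT₀ captures encoding, comparison, and response processes, and ω is the mental rotation rate (for Shepard and Metzler’s visual stimuli, the fitted slope corresponded to a rate on the order of 60° per second). The point for the present argument is that the same analog rotation process, expressed in this common functional form, fits both visually and haptically derived images.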

11.4 A Preliminary Model of Visual Imagery in Haptic Shape Perception and Representation

An important goal of multisensory research is to model the processes underlying visuo-haptic object representation. In pursuit of this, we recently investigated connectivity and inter-task correlations of activation magnitudes during visual object imagery and haptic perception of both familiar and unfamiliar objects (Deshpande et al. 2010; Lacey et al. 2010a). As a result, we are able to outline a preliminary process model of visual imagery in haptic shape perception that draws together the various findings reviewed above.

11.4.1 Activation Analyses

In one experiment (Lacey et al. 2010a), a visual imagery task required participants to listen to word pairs and to decide whether the objects designated by those words had similar (e.g., snake-rope) or different (e.g., spoon-fork) shapes; responses were indicated by pressing buttons on a response box. Thus, in contrast to earlier studies, participants engaged in a task that required visual imagery and whose performance could be verified. In a separate session, participants performed a haptic shape task in which they felt a series of unfamiliar objects with their right hand and made a same/different shape discrimination. Each of these tasks was paired with a suitable control task (see Lacey et al. 2010a, for details). We were particularly interested in brain areas that were activated in both the imagery and the haptic tasks and whether activation magnitudes in these overlap zones were correlated between the two tasks. Although there were four such overlap zones (bilateral LOC, left aIPS, and left anteroventral IPS [avIPS]), only the last showed a significant, positive inter-task correlation. These results therefore offered only weak evidence for the visual imagery hypothesis, perhaps reflecting only transient imagery of basic shape elements of the unfamiliar objects.

However, while the haptic shape task involved unfamiliar objects, the visual imagery task obviously involved retrieving images of familiar objects from long-term memory. Reasoning that this mismatch in familiarity might have accounted for our findings, we conducted a second experiment in which the visual imagery and haptic shape tasks were exactly the same as before, except that we substituted a set of familiar objects in the haptic task, so that both tasks were now matched for familiarity. This yielded an extensive network of overlap zones, including bilateral LOC and a number of prefrontal areas. Not only were these regions active in both the imagery and haptic tasks, but activation magnitudes were also significantly positively correlated between tasks in bilateral LOC, left pvIPS, ventral premotor cortex (PMv), the inferior frontal gyrus (IFG), and the pulvinar/lateral posterior thalamic region (pul/LP). Putting both experiments together, then, visual imagery was only weakly associated with haptic perception of unfamiliar objects but strongly linked to haptic perception of familiar objects. We should also note that the visual imagery and familiar haptic shape tasks probably engaged visual object imagery rather than visual spatial imagery (see discussion above). Participants in each experiment also completed the Object-Spatial Imagery Questionnaire (OSIQ: Blajenkova et al. 2006); those with a preference for object imagery tended to be better at the familiar haptic task than those who preferred spatial imagery, while the reverse was true for the unfamiliar haptic task. This is consistent with the idea that haptic shape perception might differentially engage object and spatial imagery depending on familiarity (see Lacey et al. 2009a); however, the relationship between task performance and OSIQ scores in these experiments was fairly weak, and further investigation will be necessary to address these individual differences.
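To make the logic of this analysis concrete, the sketch below shows one way a conjunction (“overlap zone”) and its inter-task correlation could be computed. It is a minimal illustration under simplified assumptions (one summary value per subject per task); the function and variable names are ours, not those of the actual pipeline in Lacey et al. (2010a).

```python
import numpy as np
from scipy import stats

def overlap_intertask_correlation(act_imagery, act_haptic,
                                  mask_imagery, mask_haptic):
    """Sketch of an overlap-zone analysis: find voxels active in BOTH the
    visual imagery and haptic shape tasks, then test whether per-subject
    activation magnitudes in that overlap zone correlate between tasks.

    act_imagery, act_haptic: (n_subjects, n_voxels) activation magnitudes.
    mask_imagery, mask_haptic: (n_voxels,) boolean task-activation maps.
    """
    overlap = mask_imagery & mask_haptic            # conjunction: the overlap zone
    # One summary value per subject: mean magnitude over the overlap zone.
    imagery_mag = act_imagery[:, overlap].mean(axis=1)
    haptic_mag = act_haptic[:, overlap].mean(axis=1)
    r, p = stats.pearsonr(imagery_mag, haptic_mag)  # across-subject correlation
    return int(overlap.sum()), r, p
```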

11.4.2 Effective Connectivity Analyses

Having found support for the visual imagery hypothesis, we then wished to place this on a stronger footing by examining the connectivity within the cortical networks involved in visual imagery and haptic shape perception (Deshpande et al. 2010). In addition, examination of connectivity could distinguish between the visual imagery and multisensory convergence hypotheses. We had previously suggested that vision and touch share a common shape representation that is flexibly accessible via both top-down and bottom-up pathways (Lacey et al. 2007b). Visual imagery involves top-down paths from prefrontal and posterior parietal areas into visual cortex (Mechelli et al. 2004), and so, if LOC activity were mediated by visual imagery, we would expect to find similar, top-down paths into the LOC during both the visual imagery and haptic shape tasks. Alternatively, LOC activity might reflect convergence on a multisensory representation, in which case we would predict bottom-up pathways into the LOC from somatosensory cortex. The existence of paths relevant to both these possibilities was suggested by earlier studies of effective connectivity (Peltier et al. 2007; Deshpande et al. 2008), but these only employed unfamiliar objects and did not analyze task-specific connectivity.

In order to examine the effective connectivity between relevant brain regions, we employed Granger causality analyses. Briefly, causality can be inferred between two time series (in this case, activation time courses during the fMRI scan) by cross-prediction: if future values of time series y(t) are better predicted by combining past values of x(t) with past values of y(t) than by y(t)’s past alone, then x(t) can be said to have a causal influence on y(t) (Granger 1969) (for further details, see Deshpande et al. 2010). These analyses were carried out on a set of regions of interest selected to distinguish between top-down and bottom-up input into the LOC.
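For readers unfamiliar with the technique, the following is a minimal bivariate sketch of a Granger test based on comparing restricted and full autoregressive models. It is a toy illustration of the principle, not the multivariate analysis actually used in Deshpande et al. (2010); the function name and model order are our assumptions.

```python
import numpy as np
from scipy import stats

def granger_f_test(x, y, p=2):
    """Does x Granger-cause y? Compare an AR model of y using only y's past
    (restricted) with one that also includes x's past (full); if the full
    model fits significantly better, x has a Granger-causal influence on y."""
    n = len(y)
    Y = y[p:]
    # Lagged regressors: columns are y(t-1)..y(t-p) and x(t-1)..x(t-p).
    lags_y = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    X_restricted = np.hstack([ones, lags_y])
    X_full = np.hstack([ones, lags_y, lags_x])
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_f = rss(X_restricted), rss(X_full)
    df1 = p                                  # number of restrictions tested
    df2 = (n - p) - X_full.shape[1]          # residual df of the full model
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, stats.f.sf(F, df1, df2)        # F statistic and p-value
```

In this simplified setting, a small p-value for granger_f_test(x, y) is evidence that x drives y; applying such tests to region-of-interest time courses in both directions is what allows top-down and bottom-up influences on the LOC to be distinguished.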

During visual imagery, the LOC was primarily driven top-down by prefrontal areas with significant inputs from the IFG and orbitofrontal cortex (OFC). There was a similar pattern during haptic perception of familiar shape, with top-down drive into the LOC from OFC and IFG. During haptic perception of unfamiliar shape, however, a very different pattern emerged, with the right PCS driving bilateral LOC as well as the left avIPS which, in turn, provided strong input to the left LOC. Thus, here bottom-up pathways from somatosensory cortex dominated LOC inputs. 2D correlations between the connectivity matrices for the three tasks showed that the visual imagery network was strongly correlated with the familiar, but not the unfamiliar, haptic shape network, whereas the two haptic networks were uncorrelated.
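The comparison of task networks amounts to correlating the path weights of two connectivity matrices. A minimal sketch (our naming, with self-connections excluded by assumption) might be:

```python
import numpy as np
from scipy import stats

def network_similarity(conn_a, conn_b):
    """Correlate two task-specific connectivity matrices by vectorizing
    their off-diagonal path weights and computing Pearson's r."""
    mask = ~np.eye(conn_a.shape[0], dtype=bool)   # drop self-connections
    return stats.pearsonr(conn_a[mask], conn_b[mask])
```

On this logic, the imagery and familiar-haptic networks would yield a high r, while the familiar- and unfamiliar-haptic networks would not.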

Based on these findings and on the literature reviewed earlier in this chapter, we proposed a conceptual framework for visuo-haptic object representation that integrates the visual imagery and multisensory approaches (Lacey et al. 2009a). In this model, the LOC contains a representation that is independent of the input sensory modality and is flexibly accessible via either bottom-up or top-down pathways, depending on object familiarity (or other task attributes). For familiar objects, global shape can be inferred easily, perhaps from distinctive features that are sufficient to retrieve a visual image, and so the model predicts important top-down contributions from parietal and prefrontal regions on the basis that haptic perception of familiar shape utilizes visual object imagery via these regions. By contrast, because there is no stored representation of an unfamiliar object, its global shape has to be computed by exploring it in its entirety. Haptic perception of unfamiliar shape may therefore rely more on bottom-up pathways from somatosensory cortex to the LOC. Since parietal cortex in and around the IPS has been implicated in visuo-haptic perception of both shape and location (Stilla and Sathian 2008; Sathian et al. 2011), the model also predicts that, in order to compute the global shape of objects, these parietal regions would be involved in processing the relative spatial locations of object parts.

In a further test of the model, we recently compared visual spatial imagery to familiar and unfamiliar haptic shape perception (Lacey et al. 2012). Conjunction analyses revealed parietal cortical foci common to spatial imagery and both haptic shape tasks, and these foci showed inter-task correlations of activation magnitude; spatial imagery performance was also positively correlated with activity in multiple parietal cortical foci. These results suggest that spatial imagery is implicated in haptic shape perception regardless of object familiarity, possibly in relation to assembling a global shape representation from component parts (Lacey et al. 2012).

11.4.3 Future Development

One goal for further work on this model is to examine how it relates to Kosslyn’s model of visual imagery, which proposes that visual images are maintained in a visual buffer and inspected via an “attentional window” (Kosslyn 1980, 1994). In this respect, it is interesting that we found inter-task correlations of activation magnitudes in the IFG and the pul/LP thalamic area during visual object imagery and haptic perception of familiar shape. The IFG is involved in top-down generation and control of imagery processes (Kosslyn et al. 1993; Ishai et al. 2000; Mechelli et al. 2004), while the pul/LP thalamic area has been associated with shifts of attention within the visual buffer (Kosslyn et al. 1993; Kosslyn 1994). Since the imagery and haptic tasks both required a comparison between two stimuli in order to make the same/different decision, participants may well have shifted between images in making the comparison. At this stage, however, these relationships between our model and Kosslyn’s (1980, 1994) can only be regarded as tentative, and a more principled investigation is required.

In addition, objects are clearly not exclusively familiar or unfamiliar, and individuals are not purely object or spatial imagers: these are dimensions along which objects and individuals may vary. Since these factors likely interact, with different weights in different circumstances, for example depending on task demands or individual history (visual experience, training, etc.), an individual differences approach is likely to be productive (see Lacey et al. 2007b; Motes et al. 2008).

11.5 Conclusions

In this chapter, we have reviewed evidence for the functional involvement of the LOC, a supposedly visual area, in haptic shape perception and outlined our model in which this involvement reflects visual object and spatial imagery, depending on object familiarity. Both activation and connectivity analyses suggest that object imagery is associated with familiar, more than unfamiliar, objects while spatial imagery may be associated with both. Further work is required to examine individual differences as they relate to this model and to investigate how it interfaces with earlier models of visual imagery.