Introduction

Visual agnosia refers to any disorder in the visual recognition of objects that cannot be attributed to more rudimentary visual defects in acuity, stereopsis, luminance or contrast sensitivity, nor to deficits in higher order cognitive functions such as verbal comprehension and speech production, dementia, or more general memory or cognitive deterioration [1, 2]. This definition means that visual agnosia is a multifaceted disorder, one that covers a diverse spectrum of functions that are all required in order to accomplish the everyday tasks of visual perception and recognition. Research on visual agnosia can therefore serve two important functions. One is to better understand the nature of any specific individual patient’s disorder. The other is to inform our understanding of the functional and anatomical organization of the human visual system. Both of these endeavors are being rapidly enhanced by recent advances in neuroimaging and intra-operative electrical mapping.

Many excellent reviews and monographs on the topic of visual agnosia already exist and we encourage the reader to consult them for either broad overviews or for detailed expositions on specific topics [3–9]. Given this rich and contemporaneous background literature, our aim here is to highlight certain facets of the perceptual, as opposed to the semantic, symptoms of visual agnosia that we believe are still poorly understood and that therefore could benefit from further critical consideration, theorizing and investigation.

We begin by revisiting Lissauer’s [10, 11] pioneering associative-apperceptive dichotomy, a foundational pillar of visual agnosia, in order to underscore elements of his conceptual framework that are understated in current textbook treatments. We then review work involving two of the most extensively studied patients with visual agnosia, “DF” and “HJA”, who demonstrate unique patterns of perceptual and recognition deficits in shape and scene processing. We survey evidence from these patients and from patients with object-centered neglect that suggests that deficits in selective attention to the parts of objects, and to the relations between object parts, can limit the ability to bind various surface features and object parts onto a single object. This problem with binding can also occur at a higher level in the structural description, such as to the relations between objects, and even to the relations between different regions in a scene. We refer to these functions collectively as the spatial scale of processing and we limit their content to the structural hierarchy of the visual array. We further acknowledge that selective attention to a specific spatial scale is necessarily constrained by “higher level” sources of control over internal attention such as goal selection. We move on to review the cortical network hubs associated with the control of visual-spatial attention (often called “orienting”) and discuss a subset of visual neglect characterized by object-centered deficits in perception. We offer candidate cortico-cortical pathways that are capable of carrying the signals of selective attention to scale to their cortical participants along the ventral occipitotemporal areas associated with visual perception and visual agnosia. This analysis highlights the parallel nature of the neural pathways out of visual cortex and their different cognitive and behavioral functions. It also explains why damage that excludes the posterior parietal cortex can paradoxically spare visually guided grasping when directed towards simple goal objects and yet impair grasping based on the most appropriate and contextually meaningful part of more complex objects, such as tools.

Lissauer’s Patient “L”

Lissauer’s late nineteenth century studies of patient “L” constitute the earliest evidence that the human visual system can be functionally (and anatomically) fractionated between brain regions that support conscious visual experience and those brain regions that are necessary for semantic elaboration [10, 11]. L experienced great difficulty recognizing objects, people and places by sight alone despite intact central visual fields, preserved fixation, smooth pursuit and saccadic eye movements, and relatively good acuity and depth perception. Lissauer also reported that L could identify and describe common objects when permitted to explore them haptically, when permitted to hear the canonical sounds they could make or when given their name verbally. Thus, both L’s low-level vision and his semantic knowledge of common objects remained intact, despite his recognition deficit. These observations led Lissauer to conclude “…there must have been a disruption of the associate processes.” (p. 186), from damage to the pathways linking the structures dedicated to apperception and those in which are stored the associated semantic knowledge.

Lissauer’s Notion of Visual Apperception

Lissauer posited that the visual system’s construction of “mental pictures”, a process he referred to as apperception, was functionally and anatomically distinct from the association of those pictures with stored semantic information (“ideas”). Lissauer described apperception as “…the stage of conscious awareness of a sensory impression” (p. 181); “…the highest level of perception in which the conscious mind takes a sensory impression with maximal intensity.” (p. 182); “…the ability to detect discrepancies between sense perceptions.” (p. 183); and “…that function which enables us to give information about the differences between sensory impressions.” (p. 184). In other words, Lissauer conceived apperception as the process by which we achieve a visual percept, that is, a visual understanding of an object, which includes its shape or structure, surface properties and volume, and this process necessarily entails discrimination.

Lissauer was determined to quantify visual percepts, but he realized that asking patients to verbalize their conscious visual experience presumed that the pathways linking “mental pictures” to their associated “ideas” were intact. For example, Lissauer reported that although his patient could not name colours by sight, he could successfully match and sort colours when given samples, stating “If he was presented with samples of Holmgren wools and asked to select all examples of the same shade he was able to do this without hesitation. For example, he would select all the green shades and without hesitation reject all blue colours or hues tending towards yellow. If he was presented with a certain hue and asked to find its exact match, he was able to do this immediately. He would either find the closest match or report that an exact match was not available. Thus, he clearly was able to differentiate between subtle hues of grey, green, and yellow.” (p. 163). From this, Lissauer reasoned that quantifying visual percepts was best approached using one of two non-verbal methods. The first would involve “…getting the subject to copy the stimulus either by drawing it or by repetition or something along these lines.” (p. 183); the quality of the drawings and manner in which they were made could provide insights into the quality of the patient’s visual apperception. This approach is still used in more modern studies. The second would quantify visual percepts by measuring “The amount of difference necessary for two percepts to be registered as being incongruent…” (p. 183). In other words, the second method entailed the “bread and butter” method of modern visual psychophysics: forced-choice measurement.

Lissauer Fractionates Apperception and Grounds It in Spatial Vision

Lissauer further fractionated apperceptive visual agnosia into different domains. He noted L’s spared non-verbal colour discrimination and his relatively intact ability to draw copies of simple objects, and he referred separately to “…the abilities to perceive colour, form, and three-dimensional objects.” (p. 183). Lissauer’s distinction between simple and complex objects anticipates a modern distinction between patients with form and integrative visual agnosia, respectively, while Lissauer’s distinction between object “form” (2D) and 3D (real) objects anticipates Marr’s distinctions between the primal sketch, 2½D and 3D object model processing levels [12]. Furthermore, Lissauer’s notion of visual apperception was fundamentally grounded in spatial vision. In fact, he defended his entire notion of visual apperception by “…introducing spatial vision into the framework…as a prerequisite for any complex visual perception, even if it is justifiable to consider it an issue separate from apperception.” (p. 184). In other words, Lissauer believed that spatial vision was a multimodal enterprise, referring to both retinal and extra-retinal input, and suspected it was sufficiently complex to warrant its own system. Damage to this system, he speculated, would result in a “chaotic” and “confusing” visual experience that would disrupt object recognition. These speculations anticipate our contemporary theoretical understanding of patients with simultanagnosia.

It is also worthwhile to point out that, contrary to many textbook characterizations of Lissauer’s apperception-associative dichotomy, he did not believe the boundary between these categories was strict. In fact, Lissauer stated “There can be no doubt that our patient showed an impairment of apperception. In particular, as has been described in the case history under the heading “form perception”, his perception of complex visual stimuli was not intact” (p. 185). Lissauer’s conclusion flowed from his observations of L’s drawn copies of objects of various complexity. Recall that L produced good line-drawn copies of simple geometric shapes, but he became hopelessly frustrated when attempting to copy more complex objects, and, regardless of the object’s structural complexity, L’s drawings were made slowly, with concerted effort, and in a piecemeal manner.

Evaluating Visual Agnosia

Patients who report impaired visual recognition will typically undergo static and/or dynamic perimetry mapping to test for low-level defects across their visual field, along with additional tests for low-level defects in visual acuity, contrast sensitivity, stereopsis and depth discrimination. Tests of object recognition entail asking patients to name and describe objects in plain view and discriminate among them verbally or through gesture. The additional information about 3D geometry and surface properties that is available with real objects and models of real objects, relative to photographs or line drawings, can improve recognition performance in patients with visual agnosia. Furthermore, recognition improves substantially when the experimenter uses non-visual means to cue object identity, such as when they name the unrecognizable object; manipulate it in a way that produces its canonical sound (e.g. shaking a set of keys to make the familiar sound of jingling keys); or permit the patient to explore it haptically. Thus, patients with visual agnosia can demonstrate that they possess accurate semantic information about the object that is retrievable through non-visual sensory information.

Aside from drawing copies of objects from a visible template or one from memory, tests to reveal deficits in the apperception of object structure or surface properties often rely on detection and discrimination methods that are not dependent on verbal reports. For example, Efron devised a two-alternative forced-choice (2AFC) test of object form discrimination in which the participant indicates whether pairs of rectilinear shapes (squares and rectangles) are the same or different [13]. The shapes themselves possess the same texture, colour and surface area, and differ only in terms of their lengths and widths.
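The equal-area constraint on the Efron shapes can be made concrete with a short sketch. The Python below is only an illustration and is not taken from Efron's materials: the function names, the choice of an aspect-ratio parameter and the units are assumptions introduced here. It computes the width and height of a rectangle from a fixed surface area and a width-to-height ratio, so that a pair of shapes differs only in length and width.

```python
import math


def efron_dimensions(area, aspect_ratio):
    """Return (width, height) of a rectangle with the given surface area
    and width/height ratio. A ratio of 1.0 yields the standard square.
    Hypothetical helper; parameter names are illustrative assumptions."""
    if area <= 0 or aspect_ratio <= 0:
        raise ValueError("area and aspect_ratio must be positive")
    width = math.sqrt(area * aspect_ratio)
    height = math.sqrt(area / aspect_ratio)
    return width, height


def make_trial(area, ratio_a, ratio_b):
    """Build one same-different trial from two aspect ratios.
    Both shapes share the same area, so only their lengths and widths differ."""
    return {
        "shape_a": efron_dimensions(area, ratio_a),
        "shape_b": efron_dimensions(area, ratio_b),
        "correct_response": "same" if math.isclose(ratio_a, ratio_b) else "different",
    }


if __name__ == "__main__":
    # A square paired with a 4:1 rectangle of equal area.
    print(make_trial(25.0, 1.0, 4.0))
```

Holding area constant while varying the ratio keeps overall size from serving as a cue, which is the point of the matching constraint described above.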

Other tasks aim to test the integrity of representations of higher order 3D structure. Goodale and colleagues devised a version of Efron’s shape-discrimination task using 3D blocks [14]. Taylor and Warrington devised an object-naming task in which photographs of common objects were taken from conventional and unconventional angles to test the patient’s ability to access 3D information about the stimulus [15; see also 16]. Variants of these tasks entail matching photographs of objects (or faces or houses) taken from different viewpoints to a target photograph [e.g. 17]. Riddoch and Humphreys devised displays in which line drawings of different objects are superimposed on one another and the patient’s task is to match the embedded objects to samples presented in isolation [18]. Patients with deficits in figure-ground and part segmentation perform poorly on this task [e.g. 18–20].

De Renzi and colleagues devised a match-to-sample task that pits visual structural similarity against semantic identity [21]. In this task, three photographs are presented: the sample, the match and the foil. Crucially, the match is the same object as the sample but is configured differently (e.g. an open vs. closed umbrella), while the foil is a different object but is configured in a way that resembles the sample (e.g. a walking cane that resembles the sample closed umbrella serves as a foil, when the match is the open umbrella) [22]. Patients with visual associative deficits but relatively intact visual perception often choose the structurally similar foil [22].

Visual Form Agnosia

The first patient demonstrated to possess visual form agnosia was “Mr. S”, who was systematically tested by Efron [13] and Benson and Greenberg [23]. Mr. S was unable to name any common object or discriminate triangles from circles, despite being able to identify colours, discriminate hue and detect subtle differences in motion, luminance and overall size. Despite his deficit in shape perception, as far as a casual observer could tell, Mr. S could reach for and grasp real objects accurately provided they were moved by the experimenter, and he could localize small white pieces of paper on a black background by pointing at them. Furthermore, he could name objects placed in his hand and demonstrate their use through verbal or communicative gesture. Thus, his semantic knowledge of objects was intact.

Mr. S’s selective deficit was powerfully illustrated by his impaired performance when copying from a visible template (see Fig. 1) and by his poor performance on Efron’s shape-discrimination task. In the shape-discrimination task, the one that bears Efron’s name, a standard square and a rectangle are presented and the viewer is asked to make a same-different judgment about the shapes of the two stimuli. The dimensions of the rectangle are varied from trial to trial with the condition that it must always match the square in terms of its surface reflectance and overall size.

Mr. S’s perceptual impairments were also evident from the results of the attempts to train him to recognize objects using his spared perceptual capacities. For example, he learned to correctly name a red-backed playing card as a “playing card”, but when he was presented with a blue-backed copy of the same playing card, he could not name it at all. In fact, when Mr. S was later presented with a red postage stamp, he identified it as the playing card. In other words, Mr. S had relied on the colour of the object to cue its verbal identity. Furthermore, if any of the objects he was trained to identify were placed on a different background, he could no longer identify them properly. This was consistent with his poor ability to trace the outlines of photographed objects. When doing this, Mr. S would often leave the boundary of one object to trace the boundary of another where two objects overlapped, suggesting impaired figure-ground separation. As compelling as the evidence is for impaired form perception in Mr. S’s case, we do not know the exact location and extent of damage to visual cortex he sustained, because detailed neural scans were not available in the era in which he was reported.

Fig. 1

Patient-drawn copies of objects. The structural components of the objects can come from (1) long-term memory, as happens when the experimenter names an object aloud and the patient must recall and visualize the structural features of the object, and maintain them in working memory, visualizing them while translating their visualization into appropriate pen or pencil strokes on paper (left column); or (2) a real object or a picture (right column), photograph or 3D model of an object, which the patient is asked to draw a copy of, therefore circumventing, to some extent, visualization and explicit long-term memory. For patients with visual form agnosia, such as Mr. S and DF, their copies from memory are relatively easily identifiable and are given reliably higher quality ratings by normally sighted controls. In contrast, the patients’ drawn copies of visible templates are often uninterpretable and are assigned reliably lower quality ratings by normally sighted judges. Compared with the drawn copies of patients with visual form agnosia, patient HJA’s copies appear substantially better; his variant of visual agnosia leans more heavily towards the associative side of the apperceptive-associative spectrum. Nevertheless, it is important to note that in all cases the drawings are made laboriously, in a piecemeal fashion, which suggests that even HJA possesses impairments in visual perception. Indeed, HJA possesses deficits in segmenting overlapping objects, for example, which is one of several indicators for the integrative variant of visual agnosia.

The most extensively studied patient with visual form agnosia is “DF”, whose perceptual deficits are strikingly similar to Mr. S’s. DF’s visual fields are intact well beyond central vision, her contrast sensitivity thresholds are normal at high frequencies and modestly higher at lower frequencies and her colour discrimination remains largely preserved [24, 25]. Nevertheless, she exhibits prosopagnosia and possesses a profound deficit in object perception and recognition; her drawn copies of line drawings are poor (see Fig. 1) and her performance on the Efron shape-discrimination task is significantly impaired [24–26]; her match-to-sample performance when line-drawn objects are filled in with black is at chance, regardless of whether the objects are animals or simple geometric shapes [25]. Although her recognition performance never approached levels observed in normally sighted populations, it improved when the test involved coloured photographs and real objects. This is presumably because the additional spatial, colour and surface cues to texture and material properties facilitate the retrieval of intact semantic and structural knowledge [26].

Detailed MRI scans of DF’s brain were taken in several different years following her initial injury. The initial MRI scan revealed bilateral lesions to the ventrolateral areas of her occipital cortex and bilateral lesions to the cuneus of dorsomedial occipital cortex that were more extensive on the left than on the right [25]. These lesions have expanded over the decades, particularly in the left posterior parietal cortex (PPC), but functional MRI (fMRI) scans of DF’s brain suggest that her primary visual cortex remains functionally intact [27–29]. Consistent with the pattern of DF’s recognition deficits, fMRI scans reveal no differential activity while she viewed intact line drawings or their scrambled counterparts [28]. Scans of normally sighted individuals were made under identical presentations to establish the regions that are typically activated when viewing intact objects, their scrambled counterparts and, importantly, the object-preferential regions that are activated significantly more for intact objects than for scrambled ones. When the group map of controls’ object-preferential activity was superimposed over DF’s brain, the foci of activation were in the lateral occipital cortex (LOC), overlapping DF’s lesions [28]. Notably, LOC is known to play a prominent role in processing the outline shape and contour of objects [30–32; for review, see 33].

When DF viewed grey-scaled and coloured photographs of real objects, stimuli that improve her recognition performance, activation was observed in the intact areas of her visual cortex, including the fusiform gyrus, lingual gyrus and, to a lesser degree, the collateral sulcus extending into parahippocampal cortex. Furthermore, activation in these areas was positively correlated with DF’s success or failure to identify the objects. Notably, these same areas responded negligibly when she viewed scrambled versions of those photographs [28].

While DF can classify scenes as natural or artificial at above chance levels when they are presented in full colour or in greyscale, her error rate increases substantially if the scenes are presented in black and white, a finding consistent with her profound deficit in shape perception [29]. In normally sighted individuals, scene perception is associated with a network of cortical structures, including LOC, the parahippocampal place area (PPA), the retrosplenial complex (RSC; also referred to as the medial place area, MPA) and the occipital place area (OPA) (for review, see [34]). Functional MRI scans of DF’s brain showed greater activation in her intact parahippocampal gyrus when she viewed scenes compared to when she viewed faces, suggesting she retains some functionality in the scene-processing network, consistent with her ability to classify scenes relatively well when they are presented in full colour [29].

For both scene and object perception, DF fares better when colour cues are available. In normally sighted individuals, the fusiform and lingual gyri, which border the collateral sulcus in ventral occipitotemporal cortex, activate more strongly to visual surface properties, including colour, specular highlights, shading, pattern and texture, than they do to object shape [35, 36]. Activation in LOC shows an opposite preference, suggesting a lateral processing preference for object shape and a more medial and anterior processing preference for surface and material properties [35–40]. In line with the nature of her deficit in shape perception, DF’s performance in a three-item “oddball task” falls to chance levels provided the object-relevant property is shape alone [36]. When the object-relevant property is texture, she performs at well-above chance levels, albeit still below normal. In line with this dissociation, fMRI scans of DF’s brain while she performed these tasks showed there were no areas with greater activation for the shape-discrimination task than for the texture-based one, whereas the middle and posterior lingual gyrus and posterior fusiform gyrus showed greater activation for the texture-discrimination task than the shape-based one [36]. The spared aspects of DF’s scene perception can be contrasted with the topographic associative agnosia experienced by HJA, whose lesions are located more medially and more anteriorly in the inferior occipitotemporal cortex. HJA is the most heavily studied patient with integrative visual agnosia, a higher order visual disorder we turn to next.

Integrative Visual Agnosia

The term integrative agnosia was coined by Riddoch and Humphreys following a series of experiments they conducted with patient HJA [17, 18]. While in hospital for an appendectomy, HJA suffered a perioperative stroke, which left a large bilateral lesion to the anteroventral half of his occipital cortex, extending about midway into temporal cortex ventromedially. The stroke resulted in an upper field anopia and rendered him achromatopsic, atopographic, prosopagnosic, alexic and visually agnosic for common objects [17, 18]. As with DF, HJA’s acuity and contrast sensitivity were relatively good, and he had no trouble identifying and describing objects by touch [18, 41] or describing objects named aloud by others [18]. Unlike DF, however, he performed well on the Efron shape- and line-orientation discrimination tasks [13], his line-copy drawings of simple and real objects were good, and he performed well-above chance on a non-verbal object-matching task that involved pictures of common objects [18, 19, 41]. Thus, the pattern of visual deficits indicated impaired access to stored semantic knowledge through sight alone, which suggested his disorder leaned more closely towards the associative end of the visual agnosia spectrum. Indeed, MRI scans of HJA’s brain show that his lesions are more anterior and more medial than DF’s, encroaching well into the temporal cortex and including the fusiform and lingual gyri, and the inferotemporal gyrus [19, 41, 42].

Although HJA’s deficit does not conform to the pattern typical of visual form agnosia, additional testing suggested he experienced apperceptive deficits. For example, although HJA’s copies of line-drawn objects were better than those done by patients with visual form agnosia (see Fig. 1), his drawings were done in a time-consuming, piecemeal fashion [18, 41]. Furthermore, HJA’s good object-matching performance dropped substantially when the objects overlapped one another [18, 19]. He was also poor at classifying line drawings of objects as meaningful or meaningless, where the meaningless objects were Frankenstein-like constructions composed of parts from different objects [18]. HJA’s identification performance for objects presented in isolation was significantly worse when they were line drawings than when they were silhouettes; the silhouettes lacked details within the object that might normally aid recognition, but these details appeared to confound him. HJA also took roughly 2–4 times longer than controls to determine whether two beads were located on the same string or on strings that overlapped one another; and when the strings were configured to resemble amoebas, he took longer to determine if two beads were on the same “string-amoeba” or different ones, or if a bead was inside or outside of a single string-amoeba [19].

These observations suggested to Riddoch and Humphreys that HJA possessed a deficit in the ability to construct not a coherent percept per se but one that reflected the object in its entirety and a deficit in segmenting clusters of objects or scenes more generally. Although HJA could rely on local geometric features to work out what an object was, he exhibited signs of impaired perception of the relations between object parts and the object as a whole. This idea is further supported by HJA’s performance on a choice-discrimination task that used Navon-like stimuli in which a large object, which constitutes the “global” level, is comprised of smaller objects, which occupy the “local” level. In a popular task, participants classify the global-level object on one set of trials and classify the local-level objects in another set of trials. When the global and local levels share the same identity, they cue the same response and therefore operate cooperatively. When the local and global levels differ, they can compete for different responses provided the irrelevant level has been associated with a competing response. However, when the irrelevant level is not associated with any response, neither a cooperative nor a competitive effect is expected and the condition is considered neutral. In normally sighted individuals, performance is typically better when the global and local level share the same identity and will show a modest discriminative advantage for the global level over the local one. Using the letter stimuli, HJA is substantially slower and less accurate than controls. Moreover, although he shows an advantage for the global level over the local one, his performance is slowest and least accurate when classifying the local letters embedded in a neutral global one, suggesting a particularly strong interference effect transitioning from the global to the local scale, regardless of competing stimulus identity and response associations [19].
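The structure of Navon-like stimuli and the three trial conditions described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the stimulus set used with HJA: the 5 x 5 letter templates, the function names and the two-letter response set are all assumptions made for the example.

```python
# Hypothetical 5x5 templates; "X" marks cells filled by the local letter.
TEMPLATES = {
    "H": ["X...X", "X...X", "XXXXX", "X...X", "X...X"],
    "S": [".XXXX", "X....", ".XXX.", "....X", "XXXX."],
    "O": [".XXX.", "X...X", "X...X", "X...X", ".XXX."],
}


def navon(global_letter, local_letter):
    """Render a global letter built out of copies of a local letter."""
    rows = TEMPLATES[global_letter]
    return "\n".join(
        "".join(local_letter if cell == "X" else " " for cell in row)
        for row in rows
    )


def trial_condition(relevant, irrelevant, response_set):
    """Classify a trial by the relation between the attended (relevant)
    level and the unattended (irrelevant) level.

    congruent:   both levels share an identity and cue the same response
    incongruent: the irrelevant level is mapped to a competing response
    neutral:     the irrelevant level is not mapped to any response
    """
    if irrelevant == relevant:
        return "congruent"
    if irrelevant in response_set:
        return "incongruent"
    return "neutral"


if __name__ == "__main__":
    print(navon("H", "S"))  # a global H made of local S's
    responses = {"H", "S"}  # assumed response-mapped letters
    # Attending to the local S inside a global O, which has no response mapping.
    print(trial_condition("S", "O", responses))
```

In these terms, the condition in which HJA was slowest and least accurate corresponds to a local-level judgment on a trial that `trial_condition` labels neutral.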

Attention, Spatial Vision and Visual Agnosia

HJA’s deficit in integrating object components into a coherent whole and segmenting overlapping objects reflects a deficit in the ability to integrate levels of the visual structural hierarchy. By “structural hierarchy” we are referring to the way in which every visual scene can be thought of as comprised of local and global geometric elements relative to one another. Just as a laptop on a desk forms part of a larger scale desktop scene or an even larger office scene, the visual structure of a laptop is comprised of various smaller scale components (e.g. screen, keyboard, touchpad), each of which can be further segmented. A keyboard, for example, can be subdivided into keys, each of which assumes one of a few different shapes and is positioned at different locations within the keyboard. Each key possesses a printed letter or symbol, most of which can be further subdivided into component contours and lines. Farah [6, 43] suggested that selective attention played a crucial role in mediating the relationships between objects and among the parts of objects. In other words, Farah believed selective attention played a crucial role in shifting “the mind’s eye” within and between levels of the structural hierarchy.
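One way to make the notion of a structural hierarchy concrete is to write the laptop example as a tree. The Python sketch below is purely illustrative: the `Node` class, the helper functions and the exact node labels are assumptions introduced here, not a model proposed in the literature reviewed. Each node is a part at some scale, and the path from the root to a node lists the successively larger wholes that contain it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of the structural hierarchy (a scene, object or part)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, target: str) -> Optional[List[str]]:
    """Return the chain of wholes from the root down to the target, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = path_to(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


def level_of(root: Node, target: str) -> Optional[int]:
    """Depth of the target in the hierarchy (root = 0), or None if absent."""
    path = path_to(root, target)
    return None if path is None else len(path) - 1


# The office example from the text, written out as nested parts.
scene = Node("office scene", [
    Node("desktop scene", [
        Node("laptop", [
            Node("screen"),
            Node("keyboard", [
                Node("key", [Node("printed letter", [Node("contour")])]),
            ]),
            Node("touchpad"),
        ]),
    ]),
])

if __name__ == "__main__":
    print(path_to(scene, "key"))
    print(level_of(scene, "contour"))
```

On this reading, shifting “the mind’s eye” within a level corresponds to moving between sibling nodes, and shifting between levels corresponds to moving up or down a path.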

Using this theoretical perspective, Farah [43] argued that damage to selective attention of this nature could manifest, albeit rarely, in patients with what she referred to as “dorsal simultanagnosia”. The dorsal reference stemmed from the preponderance of cases with damage to dorsal parietal occipital cortex (POC) who exhibited this behavior. Farah wrote that the reported propensity of some of these patients to fixate on the parts of objects rather than the whole reflected a deficit “… with seeing objects, or seeing them at the “correct” level of the hierarchy of part-whole analysis; whatever dorsal simultanagnosics can see, they can recognize.” (p. 38 [43]). The emphasis Farah placed on the word “seeing” suggests that she was referring to the content of visual awareness in these patients, that is, their visual phenomenology. At the same time, Farah noted that, somewhat paradoxically, a deficit in the ability to transition between levels of structural hierarchy can arise from damage to ventral cortical structures. Farah referred to these cases as instances of ventral simultanagnosia. She used the term “ventral” because the location of damage tends to occur in ventral occipitotemporal cortex; and she used the term “simultanagnosia” because these patients have demonstrated impairments with discriminating and reporting the letters of relatively simple and briefly presented three-letter words and non-words [43, 44].

The similarity of symptoms between dorsal and ventral simultanagnosia leaves open the possibility that deficits in transitioning between and within levels of the structural hierarchy of the array might arise from damage to the pathways that carry signals between posterior parietal and occipitotemporal cortex. It is notable that HJA, whose damage is restricted to ventromedial occipital-temporal cortex, exhibits his strongest impairments when recognizing scenes, which occupy the pinnacle level of the visual structural hierarchy and entail small- and large-scale processing; when isolating overlapping objects, which requires assigning the parts of multiple objects to their appropriate wholes and entails competition within scales and similar levels of the structural hierarchy; and when matching objects when their parts are substituted for the parts of other objects. This pattern of deficits is consistent with what might be expected to occur following damage to structures that integrate information within and across different levels of the structural hierarchy [45]. Put another way, HJA’s behavior implies that damage to brain circuits that assemble visual representations at different scales of integration can contribute to visual agnosia. In a subsequent section, we discuss further evidence that is consistent with this viewpoint, based on neuroimaging work in normally sighted individuals and in patients with visual agnosia and spatial neglect. Nevertheless, it is first important to clarify what is implied by the term selective attention as it relates to structural hierarchy.

Selective Attention and Attention to Scale

The term selective attention is used in the cognitive sciences to refer to a wide variety of perceptual functions, including the selection of various spatial locations for privileged processing [46], the selection of particular surface and material properties such as luminance, colour and texture [47], the selection of objects and extended surfaces [48] and even the selection of some items over others that have been stored in working memory [49; for review see 50]. The form of selective attention implied by Farah [43] is distinct from all these since it refers to one or more of the many levels of structural description for a scene or object. Other researchers have referred to this form of visual selection as attention to the local versus the global aspects of a display [51, 52] and as attention to scene scale [53]. Each of these terms necessarily implies selective attention to some aspects of space, to some features, to some objects, to some surfaces and so on, but attention to each of these visual properties is circumscribed by the level in the hierarchical description of a scene that is required to accomplish a perceptual task (e.g. “attend to the shape of the tree” versus “attend to the shape of the forest”).

Selective attention to one level of scene scale over another level necessarily involves aspects of visual function that have traditionally been studied under the separate umbrellas of spatial attention (e.g. [46]), attention to features [47, 54, 55] and object-based attention (e.g. [48, 56, 57]). Yet note that the task of selectively attending to one scene level over another means, by definition, that attention to spatial locations, attention to featural properties and attention to objects are not independent. Selecting any level of the scene hierarchy implies attention to locations, features and objects at that scale. For example, when selecting at the level of “graspable object”, the object must be segmented from the surface it rests on and the background behind it (separating figure from ground). In addition, the object’s location within the visual array, its spatial relationship to parts of the viewer’s body (e.g. eyes, head, and limbs), its surface properties such as texture, and its volumetric shape at different scales (e.g. curved vs. rectangular at larger vs. smaller scales) must all be registered by the motor system to successfully guide the hand to grasp and manipulate the object appropriately.

From this perspective, it is informative to discuss the effects of manipulating selective attention at the level of objects on DF, who, as we have already discussed, possesses visual form agnosia. Normally sighted and neurologically intact individuals are generally slower to discriminate targets that are preceded by invalid spatial cues (for a review, see [58]). This cost is associated with the processing time it takes for spatial attention to disengage from the cued location and engage the different location that the target occupies [46, 59]. The crucial twist to this finding is that, when the spatial cue is invalid, participants are faster to respond if the target and the preceding cue are located within the boundaries of the same object than if they fall on different objects [48]. In other words, the boundaries of the object define a local region in which selective attention can spread, reducing the processing costs of reorienting attention to a new location. This effect is thought to operate in conjunction and in parallel with spatial attention and has been called “object-based attention” [48, 56, 57].
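The two effects described above can be expressed as simple differences between mean response times. The following is a minimal sketch, assuming a cueing design with three conditions (valid cue, invalid cue within the same object, invalid cue on a different object); the function name and the response times are hypothetical illustrations, not values from any of the cited studies.

```python
def cueing_effects(rt_valid, rt_invalid_same, rt_invalid_diff):
    """Return (spatial_cost, same_object_advantage) in milliseconds.

    spatial_cost: slowing for an invalid cue relative to a valid cue,
        measured here within the cued object (an assumption of this sketch).
    same_object_advantage: how much faster invalid-cue responses are when
        the target lies on the cued object rather than on a different one.
    """
    spatial_cost = rt_invalid_same - rt_valid
    same_object_advantage = rt_invalid_diff - rt_invalid_same
    return spatial_cost, same_object_advantage


# Hypothetical mean response times in ms (illustrative only).
cost, advantage = cueing_effects(rt_valid=350.0,
                                 rt_invalid_same=380.0,
                                 rt_invalid_diff=395.0)
print(cost, advantage)
```

On this reading, a positive spatial cost together with a same-object advantage near zero would correspond to intact spatial attention without an object-based benefit.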

When DF performed a standard spatial-attention task, her processing time costs for invalidly cued spatial targets were akin to those observed in normally sighted controls. Furthermore, as in normally sighted controls, she showed greater processing costs for vertical than for horizontal shifts in cued spatial attention [60]. Thus, DF’s spatial attention appears to be intact. Importantly, however, DF did not show the typical advantage for within-object spatial cueing [60]. In fact, her performance, unlike that of the controls, merely reflected the typical increased processing cost for vertical shifts in attention over horizontal ones. For the controls, but not DF, this cost was overcome provided object-centered attention was invoked [60]. Thus, for DF, damage to LOC meant that there was no shape content or shape processing for object-centered attention to operate on.

These results of object-centered attention measures in DF suggest that structures in visual cortex that are dedicated to processing object form, which are damaged in her brain, are recipients of the modulatory influence of spatially cued attention on performance. The neural correlates of attention in an object-centered context have also been studied using tasks in which participants attend to one or the other of two superimposed images, not unlike the superimposed image recognition and discrimination tasks on which many patients with visual apperceptive agnosia exhibit performance deficits. Two advantages of using superimposed stimuli are that (1) they control for differences in low-level visual features, because the visual input is identical across two or more tasks, leaving the perceptual and cognitive operations performed on the visual input to systematically differ; and (2) they control for large-scale spatial attention. When participants view an image of a house and a face superimposed on one another, for example, activation in PPA is enhanced whenever attention is deployed to the house, while activation in the fusiform face area (FFA) is enhanced whenever attention is deployed to the face [61–64]. Orienting attention from one image type to the other is associated with the ventrolateral prefrontal cortex, the posterior superior parietal cortex and ventral occipitotemporal cortex. Furthermore, consistent with feedback based on attentional modulation, enhanced activity in the PPA and FFA is associated with local potential responses occurring ~ 200 ms or later, well after image onset [61]. These studies highlight the influence of attention on visual perception across different stimulus classes in ventral visual cortex.

Support for the view that selective attention to scale aids the construction of the content of visual awareness comes from studies that induce inattentional blindness (for review, see [65]). In these studies, participants perform difficult tasks wherein they are asked to track, detect or classify stimuli under attentionally demanding conditions and the difficulty of the task is varied in order to induce inattentional blindness. In some task variants, participants track moving objects or count the number of instances in which they see a number during a rapid serial visual presentation (RSVP) of images. The primary target stimuli are mixed into a “noise” background composed of, for example, random patches of different colours. On critical trials, an unexpected scene or object is presented, and participants are asked if they were aware of anything different on that trial. When the primary task difficulty is increased, for example, by increasing the speed at which the tracked stimuli move, participants typically fail to notice unanticipated scenes [66]. Inattentional blindness and dual-task paradigms have also been used to demonstrate the importance of attention for the extraction of summary statistical information about variance in the colour and size of ensembles of objects [67]. Interestingly, the perception of scenes and the perception of ensembles have been linked both behaviorally and anatomically: performance on scene-perception tasks is correlated with performance on ensemble-perception tasks [68], and scenes and object ensembles are processed in overlapping structures bordering the collateral sulcus in ventral visual cortex [69–71, 72•], areas that are damaged in patient HJA.

The Cortical Structures Associated with the Control of Selective Attention to Scale

Figure 2 highlights the visual pathways out of occipital cortex that serve visual perception and the pathways out of the dorsal and ventral parietal attentional centers of the superior and inferior parietal lobe, respectively, that putatively influence visual perception. The upper panel shows a ventral view of the right hemisphere and schematically illustrates, in orange, yellow and red, the following well-established pathways that deliver visual signals out of occipital cortex to the temporal and prefrontal cortex: the occipitotemporal projection system (OTPS), the inferior longitudinal fasciculus (ILF) and the inferior fronto-occipital fasciculus (IFOF).

Fig. 2

Neural pathways carrying visual and selective attentional signals out of the occipital and parietal cortex, respectively, that are associated with visual perception. Top panel: Connections that carry visual signals from occipital cortex to temporal and frontal cortex in the inferior half of the human brain. The lateral and most superficial connections are the U and neighbourhood fibers that comprise the occipitotemporal projection system (OTPS), depicted in orange. Medial to the OTPS and slightly deeper lies the inferior longitudinal fasciculus (ILF), depicted in yellow, which is the first of two long fascicles that run along the rostro-caudal axis. The ILF terminates in the anterior third of the inferior temporal cortex. The second is the inferior fronto-occipital fasciculus (IFOF), depicted in red, which terminates in a radiating fan shape within the prefrontal cortex along a dorsoventral axis. Bottom panel: Components of the superior longitudinal fasciculus (SLF) and arcuate fasciculus (AF) that connect attentional centers in intraparietal cortex (IPC) and posterior inferior parietal lobule (pIPL) to prefrontal cortex (e.g. the dorsolateral prefrontal cortex, DLPFC) and to visual areas in temporal cortex. The pathways linking IPC and pIPL to visual areas in the occipitotemporal and temporal lobe are depicted in light green and putatively reflect a means for the attentional hubs to rapidly and directly influence visual perception and to select relevant semantic functional information about goal objects for visually guided actions, like grasping. Components of the SLF and AF that serve the traditional fronto-parietal dorsal and ventral networks are depicted in different shades of blue, with the most posterior component belonging to SLF-II, terminating in the anterior dorsolateral occipital cortex (aDLOC).
Landmark sulci are denoted as follows: AOS, anterior occipital sulcus; ATCS, anterior transverse collateral sulcus; IPS, intraparietal sulcus; LOS, lateral occipital sulcus; POS, parieto-occipital sulcus; SOS, superior occipital sulcus (posterior IPS); TOS, transverse occipital sulcus. Landmark gyri are denoted as follows: AG, angular gyrus; FG, fusiform gyrus; IOG, inferior occipital gyrus; ITG, inferior temporal gyrus; LG, lingual gyrus; MFG, middle frontal gyrus; MOG, middle occipital gyrus; MTG, middle temporal gyrus; PHG, parahippocampal gyrus; SFG, superior frontal gyrus; SMG, supramarginal gyrus; SOG, superior occipital gyrus; STG, superior temporal gyrus.

The lower panel of Fig. 2 shows a three-quarter view of the posterior right hemisphere and overlays a schematic illustration of the pathways out of parietal areas that are associated with the control of selective attention to scale and those associated with higher level cognitive operations. The more recently studied subset of these pathways, illustrated in light green, interconnects the attentional centers in the intraparietal and posterior inferior parietal cortex with structures in the occipitotemporal and temporal cortex that are associated with visual perception. These pathways can potentially carry selective attention signals directly, and therefore rapidly, between the attentional centers and the visual cortical structures necessary for the typical construction of the content of visual awareness. These direct pathways are well-positioned to aid not only in the construction of moment-to-moment phenomenological vision, but also in the selection of semantic information stored in the temporal lobe that is needed to choose grasp points on complex objects, such as tools, that are suitable for their intended use.

The pathways illustrated in blue in the lower panel of Fig. 2 reflect connections out of the dorsal attention hub of the superior parietal lobule. This dorsal subnetwork interconnects the superior parietal and intraparietal cortices (SPC and IPC) with dorsal prefrontal and premotor cortex, bilaterally, and includes core areas that are engaged when attention is voluntarily deployed from one spatial location to another and during the planning and execution of eye movements [73, 74]. The ventral subnetwork is lateralized to the right hemisphere and comprises structures in the ventral prefrontal cortex, the ventrolateral inferior parietal cortex, caudal superior temporal cortex and the anterior dorsolateral occipital cortex [73, 74]. Both subnetworks are associated with the intentional deployment of spatial attention and its maintenance, but the ventral subnetwork is engaged when covert attention is “captured” by stimuli that possess salient and task-relevant components [73, 74].

Visuospatial Neglect and Selective Attention

Neglect is conventionally considered a deficit in deploying spatial attention to objects in the contralesional field [75]. As we have discussed, this function is closely associated with the dorsal and ventral attention subnetworks [73, 74]. Classic methods for testing neglect include the line-bisection task [e.g. 76, 77], in which the patient is asked to indicate the center of a line that is oriented from left to right, and cancellation tasks [e.g. 78, 79], in which the patient is presented with a cluttered page illustrated with objects and is asked to mark each one of the objects or each instance of a particular object among a mix of different object types. In both tasks, the patients perform as if they are biased towards the ipsilesional side of the line or page. In other words, the patient behaves as if they ignore the side of space that is opposite the hemisphere in which their lesion is located [80]. Lesion analyses of patients with spatial neglect reveal right-hemispheric damage to the ventral subnetwork, including the superior temporal gyrus, supramarginal gyrus, angular gyrus, inferior and middle frontal gyri, the anterior insula, the frontal operculum and the white matter pathways that underlie these areas [81–88].

As with visual agnosia, more recent research with neglect patients is based on a diverse set of tasks. Contemporary assessments for spatial neglect contain combinations of tests like line bisection, cancellation, figure copying, representational drawing (see, for example, the Behavioral Inattention Test [89]), and word and sentence reading tasks [e.g. 90]. Crucially, performance on assessments for neglect can vary considerably from patient to patient. In fact, double dissociations have been demonstrated where one patient exhibits neglect in one subtest (e.g. line bisection) but not another (e.g. cancellation), whereas another patient exhibits the reverse pattern [91]. Double dissociations such as these indicate that these tests recruit different underlying processes and neural substrates that can be damaged independently. Thus, visuospatial neglect, like visual agnosia, does not constitute a uniform disorder (for reviews, see [90, 91]). Indeed, recent lesion analyses accommodate these differences by categorizing tests according to the reference frame on which they rely more heavily. Some tests depend on patient-centered (i.e. egocentric) spatial reference frames, which characterize the more classic symptoms of neglect that are tethered to the patient’s contralesional visual field or side of space. Others depend on object-centered reference frames, which we discuss in the next section.

Object-Centered Neglect and Object-Centered Attention

Despite the conventional viewpoint that neglect is a deficit in deploying spatial attention to the visual field or side of space opposite the hemispheric side of the lesion [75], it is clear that a subset of neglect patients experience deficits in object-based perception, regardless of the object’s location in the visual field [92–95] (for review, see [75, 93]). Cancellation tasks are one set of tests that highlight the object-centered aspects of neglect [94, 96, 97]. In these tasks, the patient is presented with a scene of items and is asked to indicate target items. These tasks are elegant because the experimenter can either manipulate the scene while keeping the task demands the same or keep the scene the same while manipulating the task demands [98]. In some versions of the task, the targets appear scattered throughout the scene, embedded in a background full of distractor items. In a pioneering study by Driver and Halligan, two groups of multiple short lines were distinguished by colour and located on opposite sides of the display [94]. The patient’s task was to cross each line out, regardless of which group the line belonged to, under conditions of free viewing in which neither the patient’s eyes nor the head is fixed. Remarkably, the patient omitted lines to the left within both groups [94]. It was as if the lines in each group were bound into a holistic unit, presumably driven by the Gestalt principles of proximity, similarity of form and colour, and by figure-ground separation. Thus, this finding suggests that visual neglect can impair Gestalt-grouping processes that integrate spatial and object information, the very processes that would aid ensemble perception.

The cancellation task was enhanced by Ota and colleagues, who created a scene comprised of two target types that differed from one another by only a subtle change in one of their parts [99]. Circles, for example, served as one target type while variants of the circle that had a small gap in them on either the left or right side served as a second, “partner” target type. A variant set of target types was created that was comprised of triangles and trapezoids. The latter were made by flattening one of the corners of the triangle, such that the two object types were distinguishable merely by this flattened part, which, like the gaps in the circles, could occur on the left or right side of the triangle. The task was to indicate each instance of one object type with one kind of mark (e.g. circling the triangles) and to indicate each instance of the other, “partner” object type with a different mark (e.g. crossing out the trapezoids) [99].

Ota and colleagues tested two patients. The first patient possessed lesions to the insula, anterior superior temporal gyrus and inferior frontal gyrus. In accordance with classic egocentric or patient-centered neglect, this patient tended to miss targets located towards the left-hand side of the page, regardless of what target type they were. The second patient possessed lesions that were more posterior, involving the angular gyrus and posterior superior and middle temporal gyri. Interestingly, regardless of where the first target type (triangles or circles) was located on the page, this patient performed just as well as the first had for targets located in their ipsilesional (i.e. “good”) visual field. In other words, the second patient with more posterior damage showed no unusual tendency to miss targets in contralesional space. Crucially, however, this patient omitted targets when the distinguishing part occurred on the left side of the target, regardless of where the targets were located on the page, indicating a deficit in attention to local scale in the contralesional side of the target.
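The contrast between the two patients can be made concrete by scoring omissions separately by page side and by the side of the distinguishing part. The following is a minimal sketch, assuming each target is coded for its page side, the side of its gap or flattened corner, and whether it was marked; the `Target` record, the index definitions and the example sheet are hypothetical and are not the scoring procedure used by Ota and colleagues.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Target:
    page_side: str            # "left" or "right" half of the page
    part_side: Optional[str]  # side of the gap/flattened corner; None if intact
    marked: bool              # True if the patient marked the target correctly


def omission_rate(targets: Iterable[Target]) -> float:
    """Proportion of targets that were not marked (0.0 for an empty set)."""
    items = list(targets)
    if not items:
        return 0.0
    return sum(not t.marked for t in items) / len(items)


def neglect_indices(targets):
    """Return (patient_centered, object_centered) omission asymmetries.

    Each index is the left-minus-right difference in omission rate, so
    positive values indicate more omissions on the left.
    """
    ego = (omission_rate(t for t in targets if t.page_side == "left")
           - omission_rate(t for t in targets if t.page_side == "right"))
    obj = (omission_rate(t for t in targets if t.part_side == "left")
           - omission_rate(t for t in targets if t.part_side == "right"))
    return ego, obj


# Hypothetical response sheet resembling the second patient's pattern:
# left-part targets are missed wherever they appear on the page.
sheet = [
    Target("left", "left", False),
    Target("right", "left", False),
    Target("left", "right", True),
    Target("right", "right", True),
]
print(neglect_indices(sheet))
```

In this illustration the patient-centered index is zero while the object-centered index is maximal, which is the dissociation the second patient's performance suggests.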

Lesion Analyses Reveal the Neural Correlates of Object-Centered Neglect

A number of groups have used lesion analytical techniques to identify abnormal voxels in large groups of neglect patients relative to neurologically intact or neurologically compromised controls. The analysis involves correlating these abnormal voxels with different symptoms as assessed by different tests. Chechlacz and colleagues administered a modified version of Ota’s cancellation task, called the apples-cancellation task, to 41 patients in order to quantify the severity of patient-centered and object-centered neglect [81]. In line with the view that object-centered and patient-centered neglect are distinct subcomponents, they found that the severities of the two deficits were uncorrelated. Additionally, the voxel-based analytical techniques, which involved morphometry and lesion-symptom mapping, provided converging support for separate clusters of regions underlying patient- and object-centered neglect. Regions associated uniquely with object-centered neglect were located in the posterior right hemisphere and included the right middle occipital gyrus, the angular gyrus and adjacent posterior regions of the inferior, middle and superior temporal gyri. These analyses also identified the superior longitudinal fasciculus, the inferior fronto-occipital fasciculus and the inferior longitudinal fasciculus, suggesting the involvement of these pathways in selective attention to scale.
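The general logic of relating damaged voxels to symptom severity can be sketched as a voxel-by-voxel comparison of patients with and without damage at each location. The following is a deliberately simplified illustration, assuming binary lesion maps and a single severity score per patient; the function name and data are hypothetical, and published analyses rely on formal statistical tests with correction for multiple comparisons rather than raw mean differences.

```python
from statistics import mean


def voxelwise_difference(lesion_maps, scores):
    """For each voxel, mean severity of damaged minus spared patients.

    lesion_maps: one list per patient of 0/1 flags (1 = voxel damaged).
    scores: one symptom-severity score per patient, in the same order.
    Returns None for voxels where every patient is damaged or every
    patient is spared, because no comparison is possible there.
    """
    n_voxels = len(lesion_maps[0])
    result = []
    for v in range(n_voxels):
        damaged = [s for m, s in zip(lesion_maps, scores) if m[v] == 1]
        spared = [s for m, s in zip(lesion_maps, scores) if m[v] == 0]
        if damaged and spared:
            result.append(mean(damaged) - mean(spared))
        else:
            result.append(None)
    return result


# Hypothetical data: four patients, three voxels (illustrative only).
maps = [[1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 1, 0]]
severity = [8, 6, 2, 0]
print(voxelwise_difference(maps, severity))
```

A large positive value at a voxel means that patients with damage there tended to score worse, which is the kind of association that lesion-symptom mapping then tests statistically.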

Verdon and colleagues tested 80 patients with a battery of behavioral tests in order to perform a principal components analysis on the resultant scores and to explore the latent factors with which the behavioral tests were most strongly associated. Among the tests was the Ota cancellation task and a similarly constructed compound-word-reading task, which entails (1) tabulating the number of omissions of the whole word as a function of the side of the page on which the word appears; and (2) tabulating separately the number of omissions of the left and right word of the compound words, regardless of where they occur on the page. Verdon and colleagues performed voxel-based lesion-symptom mapping (VBLM), which combined the patient-specific factor scores derived from the principal components analysis with the MRI scans of the patients’ brains [88]. They found three factors that together accounted for 82.1% of the variance in the behavioral test scores. Again, in line with the view that object-centered neglect is a separate component of neglect, the object-centered components of the Ota cancellation and word-reading tasks loaded strongly and uniquely onto one of the three dominant factors [88]. Furthermore, the patient scores for this factor correlated less with the other two factors than the patient scores for the other two factors correlated with one another, reinforcing the notion that the object-centered components of the tests probe a distinct function [88]. The VBLM localized the structures associated with this distinct function: variance in the object-centered factor was maximally associated with damage to the white matter adjacent to the middle temporal gyrus [88], indicating a crucial role in scale attention for the long white matter pathways connecting the occipital cortex to the temporal and frontal cortices. Of the patients with the most severe deficits on the object-centered tests, half possessed lesions extending from the occipital to the medial temporal lobe, whereas the other half possessed lesions that extended more laterally and anteriorly into the temporal cortex [88]. This final observation might reflect a difference in linguistic emphasis between the two object-centered tasks, with poor performance on the non-linguistic Ota task associated with damage to the posterior regions.

Chechlacz and colleagues used anatomic likelihood estimation (ALE) to perform a meta-analysis of 10 lesion-overlap studies that involved a combined 700 patients with visuospatial neglect [82]. The analysis separated tasks that were geared to reveal patient-centered impairments from those geared to reveal object-centered ones. Regions associated with object-centered deficits were located in posterior temporal, parietal and occipital cortex. The clusters with the largest ALE values included the right posterior middle temporal gyrus and adjacent white matter pathways of the posterior superior longitudinal fasciculus (SLF), the right middle occipital gyrus, the anterior angular gyrus, the IFOF and the white matter underlying the anterior superior parietal lobule (SPL). Again, these findings imply that object-centered neglect is associated with damage to cortical regions associated with visual perception, to the ventral attention network and to the pathways that likely carry signals from these areas to prefrontal targets, suggesting that these structures are involved in constructing the object-centered content of visual awareness.

Pathways Involved in Selective Attention to Scale

The notion that selective attention to scale plays a role in the mental construction of objects and scenes is supported by the connectivity of the vertical and posterior-most components of the SLF, illustrated schematically by the light green lines in the lower panel of Fig. 2. These cortico-cortical components would be capable of carrying attentional signals directly between the dorsal and ventral subnetworks along the intraparietal cortex (IPC) and temporal-parietal junction (TPJ) and inferior occipitotemporal cortex, where damage is associated with visual object agnosia. The figure also makes clear the long horizontal connections to cortical targets in the prefrontal cortex through which dorsal and ventral parietal attention subnetworks operate indirectly on visual perception. These regions control eye movements (e.g. frontal eye fields) and host broadly distributed executive responsibilities that require control over “internal” attention for goal, task, and response selection and inhibition, spatial and verbal working memory and visual search (e.g. [100–102]). These regions closely align with the set of cortical structures that comprise a multiple demands network [103]. Duncan has argued that the role of this large network is to construct what he refers to as “attentional episodes” over brief task epochs during which the network configures and structures cognition (and constituent processes) suitably for solving a sub-goal on its way to completing the task [104, 105].

Ventral Visual Perceptual Pathways Out of Occipital Cortex

There are at least five major intra-hemispheric pathways along which visual information is conveyed between the occipital lobe and the rest of the brain: the inferior longitudinal fasciculus (ILF), the middle longitudinal fasciculus (MLF), the superior longitudinal fasciculus (SLF), the inferior fronto-occipital fasciculus (IFOF) and the occipitotemporal projection system (OTPS). The ventral visual pathways that are well-studied and closely associated with visual perception (the OTPS, ILF and the IFOF) are schematically illustrated in orange, yellow, and red, respectively, in Fig. 2. These three pathways complement one another. The long, horizontal connectivity of the ILF [106, 107•, 108, 109] and IFOF [110–113] affords direct and rapid transmission of visual information between lower and higher levels of the visual processing hierarchy and prefrontal structures associated with executive processing, respectively. These pathways are thought to support the rapid construction of initial estimates, “hypotheses” or “primitives” of higher level descriptions of the content of the visual array (e.g. [114]). These primitives can then be reinforced or rejected with subsequent volleys of visual input through the serial, stagewise U-shaped and neighbourhood-fiber projections of the OTPS, which help refine lower and intermediate-level structural descriptions [106, 115]. Thus, the ILF, OTPS, and IFOF are crucial bidirectional pathways that transmit visual sensory input for elaboration and integration with semantic information in the medial temporal lobe. The SLF, on the other hand, can be subdivided into pathways responsible for the regulation of spatial attention, which are shown in Fig. 2, for conveying visual input to the sensorimotor structures of the posterior parietal and premotor cortices, and for the production and comprehension of speech.

Electrical Stimulation of the ILF and the IFOF

The involvement of the ILF and IFOF in visual object processing is further supported by electrical brain mapping studies of patients undergoing awake surgical resection for small lesions in posterior temporal or occipitotemporal cortical areas adjacent to the ILF and in the superior temporal, inferior parietal and frontal cortical areas adjacent to the IFOF. Mandonnet and colleagues found that stimulation at the junction between the fusiform and inferior temporal gyrus elicited errors when the patient named common objects presented as line drawings [116]. Their patient misidentified, for example, an armchair as a mirror and a mask as a cat. What is interesting about the nature of these errors is the structural similarity between the object depicted and the one perceived (see Fig. 3). The back of the armchair resembles a classic, hand-held ovoid mirror, complete with a curvilinear line inside it that is intended to illustrate the convexity of the chair’s back cushion but could be mistaken for glare or the reflection of a curvilinear object in the hand-held mirror. Interestingly, a failure to integrate the legs of the chair into the percept would exacerbate the misperception of a mirror, as would a reliance on part-based recognition.

Fig. 3

Two sample images from the Boston Naming task that were misnamed (left), along with their putative “percepts” when patients were undergoing electrical stimulation (right) [116]. Top left: chair. Bottom left: mask. The pictures on the right represent possible “mental pictures” (percepts) that result from failures of selective attention to scale rather than mere failures to name what is seen. The top right panel illustrates what might result from a failure to select and integrate the seat, legs and arms of the chair, leaving only the back of the chair, which does resemble a mirror complete with a minor reflection. The bottom right panel illustrates what might result following a failure to select and integrate the nose and mouth of the mask, resulting in something that resembles a cat. The mask’s string is misinterpreted as the body of a resting cat

Recall that HJA’s reliance on part-based recognition led him to misidentify line-drawn objects and that his recognition performance improved when the local details of line-drawn objects were removed by filling the object in with black to create silhouettes. For the case in which the electrically-stimulated patient misidentified the mask as a cat, a failure to consider the detail of the mouth cut-out of the mask, and an over-reliance on the top of the mask, which resembles the ears of a cat, helps explain the misidentification error. Furthermore, the mask’s string can be misinterpreted as outlining the boundary of a cat’s body. A failure in figure-ground assignment for the space between the string and the mask as background, therefore, can also help explain the error. Remarkably, this patient also reported that the line drawings appeared 3D during stimulation, highlighting the integral nature of visual depth processing, spatial vision, and visual awareness as Lissauer argued over a century ago. Notable too is that the spherical resection (~ 1.5 cm) was localized to the right ventrolateral occipital cortex and resulted in novel postoperative central visual deficits in shape, face, and word perception [116]. Although these deficits were resolved 3 months after surgery [116], these observations suggest that these regions were crucially involved in object-based visual perception before, presumably, neural plasticity allowed other regions to assume the role of the lesioned structures.

Coello and colleagues used a similar task, this time presenting two pictures, one to each visual field [117]. Subcortical stimulation of the ILF above the right fusiform gyrus resulted in failures to name the picture presented in the left visual field but no failure to name the picture presented in the right visual field. The patient affirmed they saw the object, denying any visual disturbance, yet could not name it, suggestive of pure optic aphasia. In two additional patients, intra-operative stimulation of the ILF led to impairments in reading short sentences and in symbol recognition [118]. These patients remarked that they experienced difficulty combining individual letters into intelligible words and were only able to spell words letter-by-letter, which is strikingly reminiscent of Farah’s descriptions of “ventral simultanagnosia”.

Electrical stimulation to the surface of the posterior aspect of the left middle and superior temporal gyri and to the IFOF beneath the superior temporal sulcus also induces picture-naming errors and, crucially, picture-matching errors on the non-verbal Pyramids and Palm Trees Test [119–122]. In this task, three line drawings are shown to the participant: a sample, its semantic match and a distractor. For example, a pair of hands should be matched with its target, a pair of gloves, as opposed to the distractor pair of shoes. The participant’s task is to point to the semantic match (the pair of gloves, in the example given). Electrical stimulation to the IFOF produced incorrect responses or no response whatsoever, with some of the patients expressing confusion about what they were looking at [120, 121]. Taken together, the evidence suggests that these ventral pathways are crucial for transmitting attentional and structural information to posterior ventral areas involved in the mental construction of conscious visual experience and the downstream anterior areas involved in linking percepts with their associated semantic features, including their verbal labels.

Electrical Stimulation of Ventral Occipitotemporal Cortex and High-Level Visual Perception

Recent studies have demonstrated that high-level cortical regions within the ventral stream of visual processing are associated with the mental construction of conscious visual experience. For example, Parvizi and colleagues studied a patient who had electrodes implanted into his right inferior temporal lobe, to probe the location of pharmaceutically resistant seizures [123]. Electrical stimulation of two of these electrodes, which were located on the posterior and middle aspects of the lateral bank of the right fusiform gyrus (i.e. overlapping FFA, as confirmed in a separate fMRI session), had a striking effect on the patient’s conscious perception of faces. Namely, the stimulation caused the patient to experience facial hallucinations, during which he remarked “You just turned into somebody else. Your face metamorphosed”, and “You almost look like somebody I’ve seen before, but somebody different. That was a trip…. It’s almost like the shape of your face, your features drooped” (both p. 14918) [123]. Importantly, electrical stimulation of these electrodes did not produce the same effect when viewing non-face objects, and sham stimulation of these electrodes and stimulation of nearby, but non-face-selective electrodes did not cause distortions in the patient’s perception of facial features [123].

Mégevand and colleagues examined a patient who was undergoing presurgical evaluation for treatment-resistant epilepsy and had several electrodes implanted into his right frontal and temporal cortices [124]. Separate fMRI and intracranial electroencephalography (iEEG) sessions determined the location and functional responsivity of scene-selective regions of cortex in the medial fusiform gyrus and collateral sulcus, overlapping the parahippocampal place area (PPA) [124]. Direct electrical stimulation of these regions induced topographic, scene-based hallucinations based in part on the patient’s memories of particular places. For example, the patient reported seeing his optometrist’s office and on a separate occasion a train station in his neighbourhood [124]. Taken together, these findings from the electrical stimulation studies of FFA and PPA strongly suggest a causal role for these structures in the construction of our moment-to-moment visual experiences of face- and scene-based perception.

Parallel Visual Object and Spatial Processing

Studies of visual agnosia have also helped illustrate the parallel nature of visual processing across different functional and behavioral end-points. These issues have been studied in the context of reaching out to grasp and manipulate objects in a few visual agnosics, most notably DF and HJA. To reach out and grasp an object successfully, the visual system must analyze the 3D geometry of an object and combine this analysis with the agent’s goal and stored functional information about the object in order to select grasp points along with an appropriate grasp type (e.g. a whole hand or a pincer grasp). This suite of information must also incorporate a set of unintuitive spatial relationships among our limbs, body, head and eyes, and the object itself. All of these computations are performed within fractions of a second and with little conscious effort in neurotypical individuals just prior to the initiation of the reach. On the basis of electrophysiological recordings in non-human primates, contemporary theories of visuomotor control implicate a cortical network spanning the parietal, prefrontal and occipital cortices for coding the spatial transformations that underlie goal-directed eye and limb movements.

Despite DF’s impairments in the perception of object size, shape and orientation, when she reaches out to pick up a goal object, her hand configures in-flight to suit the size, shape and orientation of that object [14, 25, 125]. The same counter-intuitive result was observed in visual form agnosic patient JS, when he was tested with the same sets of shapes [126]. Despite the similarity between JS and DF in terms of their perceptual deficit in shape and orientation perception, JS’s lesions are restricted to the ventromedial occipitotemporal cortex, rather than the ventrolateral site in DF. Furthermore, the published scans outlining the extent of the lesion in JS’s brain strongly indicate the involvement of the IFOF, ILF or both. The involvement of the ventromedial occipitotemporal cortex and its underlying white matter reinforces the viewpoint that shape processing for perception engages a network of a number of different cortical structures along the ventral visual pathway [34•].

Consideration for the role that scale attention must play in the selection of different parts of complex objects, particularly when those parts possess different functions, is also important for grasping complex objects, like tools. Here, DF and HJA’s grasps reveal important shortcomings. For example, when reaching to pick up a hammer in order to demonstrate its use to an experimenter, DF will reach for the end of the tool closest to her, rather than for the handle, regardless of the hammer’s orientation [127]. It is only after her hand makes contact with the hammer and explores it haptically that she adjusts her hand’s posture to grasp the handle, before lifting the hammer up and demonstrating its use successfully. Normally sighted individuals will reach for the handle, regardless of its orientation, presumably because this is the most efficient way to transition from acquiring the hammer to using it. The visual nature of DF’s deficit in shape perception impairs her ability to use geometric form to cue semantic information about what the object is and how its different parts should be used.

DF’s problems with selecting object parts for grasping are also evident in her inability to select the appropriate part of a 3D cross when asked to grasp and rotate it 45 degrees clockwise [127]. When asked to perform this task, normally sighted individuals adjust the orientation of their grasp aperture before making contact with the cross, taking into account the starting orientation of the object and its desired orientation in order to minimize awkward transitional hand configurations and wrist rotations. Unlike controls, DF adopts a default strategy, grasping the cross at a relatively consistent angle, regardless of the cross’s orientation [127]. This means she ends up grasping the intersection of the cross as much as she grasps one of the bars of the cross.

Relative to DF, HJA’s visual shape perception was by and large spared, and both his grasps when directed at simple “Efron” blocks and his performance when posting “letters” were normal [45]. Like DF, however, HJA’s reaching and grasping ability was limited to simple objects, even though his perception of object shape and orientation remained largely intact. When the objects were tools that possessed parts with distinct functions, he was unable to select the appropriate part to grasp. This suggests the medial occipitotemporal cortex is necessary for the integration of semantic information for the selection of object parts for functional grasps [45].

DF and HJA retain a parietal pathway for the visual analysis of 3D geometry for visually guided actions directed at objects with few distinct parts. However, while DF’s lesions in the ventral cortex are localized to ventrolateral occipitotemporal cortex, HJA’s lesions are confined to the ventromedial anterior occipital and temporal cortex. This suggests that the ventromedial temporal cortex plays a crucial role in scale attention for segmenting objects, particularly in cases where semantic information normally aids in the selection of appropriate object parts for grasping.

Pathways Underlying Visual Shape Processing for Action

The pathways that carry visual signals between visual and premotor and motor cortex are subcomponents of the three divisions of the superior longitudinal fasciculus (SLF; see also Fig. 2). The SLF is the largest of the long association fibers that are associated with vision [128–132]. SLF-I is the dorsomedial-most of the three divisions and it interconnects the precuneus of medial posterior superior parietal lobule with medial superior frontal gyrus, premotor and motor areas of the dorsal frontal cortex [128–132]. SLF-II is situated ventrolaterally relative to the SLF-I, interconnecting the anterior dorsolateral occipital cortex and adjacent angular gyrus in the inferior parietal lobe with the middle frontal cortex [128–132]. The SLF-III is a shorter fiber pathway that interconnects the supramarginal gyrus with the inferior frontal gyrus in the ventral frontal cortex.

Lesions to cortical structures in and around the anterior intraparietal sulcus (aIPS) have long been known to result in deficits in reaching for objects to pick them up, the in-flight configuration of the hand, the selection of grasp points on the target itself and the dexterous finger movements that unfold after the hand makes contact with it [133–138]. Different lines of evidence in neurotypical and normally sighted individuals support a necessary role for the aIPS in visually guided grasping. For example, functional MRI activation in the aIPS of normally sighted individuals is greater when they reach for objects to pick up using their index finger and thumb (a “pincer grasp”) than when they merely reach for them to touch with their index finger or knuckle [133, 139–141]. Moreover, transcranial magnetic stimulation (TMS) to aIPS disrupts the formation of the in-flight grasp aperture [142, 143] and increases the area over the object in which the fingers first make contact [144], strongly suggesting a role for the aIPS in the selection of grasp points. Notably, the aIPS forms part of a larger, left-lateralized “praxis network” involving the premotor cortex that is involved in the timing and sequencing of goal-oriented muscle movements [e.g. 145; for review, see 146].

Visual Agnosia and Semantic Contributions to Visually Guided Grasping

One open question that visual agnosia may help address is how semantic information about an object, including its use, is delivered to the visuomotor structures in the PPC and premotor cortex. When we reach out to pick up complex goal objects that are made of constituent parts that possess different functions, semantic information about the object along with shape- and surface-based visual processing must be integrated into the motor plan in order to select grasp points that are suitable for using the object in its intended manner.

We have suggested that the vertical and posterior components of the SLF that interconnect ventral and lateral occipitotemporal cortical areas with the posterior parietal cortex might mediate direct interactions between cortical sources of semantic information about the functional parts of complex objects, like tools, and cortical sources involved in the selection of hand postures and grasp points for motor planning and execution. In line with this notion, fMRI activity in praxis network areas, including the posterior middle temporal gyrus and LOC, areas associated with the vertical SLF, is greater when viewing real tools vs. Frankenstein-like objects that are made from the parts of different tools [147]. Furthermore, dynamic causal modelling suggests fMRI activity in the LOC leads activity in aIPS when participants view pictures of tools, relative to pictures of non-tool objects that possess a similar size, shape and orientation [148]. Moreover, real tool use invokes fMRI activity in these same structures as well as others in the praxis network [145, 149–151].

With a handful of noted exceptions, there are only a few detailed studies of the reach-to-grasp actions of patients with visual agnosia. This is likely because these patients often show no obvious problem reaching for and acquiring objects. Nevertheless, as case studies of HJA and DF have shown, careful laboratory observation can reveal important impairments in the selection of suitable object parts, particularly when the selection depends on visual access to semantic, functional information about what the object is and how to use it. Quantifying patterns of deficits and spared abilities and the location and extent of neural damage allows us to test ideas about the causal relationships between function and anatomy.

Neglect and the Role of Object-Centered Attention in Visually Guided Grasping

A related open question concerns the role that attention plays in the construction of motor plans for goal-directed action like reaching for and grasping objects. A few studies have investigated different aspects of reaching and grasping in neglect patients. When patients with neglect are presented with an object to pick up, the path the hand takes from its initial resting position deviates towards a distractor object, provided the distractor is located on the ipsilesional side of the target [152]. Interestingly, the hand’s in-flight grasp aperture remains unaffected, suggesting that neglect, and presumably selective attention to scale, can operate on different components of reaching and grasping movements, similar to the distinction between spatial (target location) and object-centered (the selection of grasp points) components of neglect.

Pritchard reported the results of a case of visual neglect in which the patient’s perception of the size of a target object presented in the contralesional visual field was compressed relative to when the same object was presented in the ipsilesional field [153; see also 154]. Remarkably, when the patient was asked to reach for and pick up the object, her in-flight grasp aperture reflected the bar’s real size regardless of whether the object was presented in the contralesional or ipsilesional field [153]. Unfortunately, detailed scans of the patient’s brain were not published. Nevertheless, the authors described the site of the lesion as right occipitotemporal cortex, extending into the medial temporal lobe. The extent along the superior-inferior dimension was left unspecified. Thus, it appears that the damage spared the dorsal PPC, along with those structures around the intraparietal sulcus that are engaged when we reach for and pick up goal objects.

It is also worth noting that there were other signs the patient’s visual perception may have been abnormal. She could not, for example, complete the Benton visual form discrimination task [154]. This task entails matching a target “set” of three objects against four sample sets, only one of which is identical to the target set. The remaining three foil sets contain objects that are either arranged differently with respect to one another, or some of the objects within the set differ in a subtle way from their correspondents in the target set. In short, this task strikes us as requiring selective scale-based attention, which would appear to have been severely compromised in the patient. Given the description of the lesion, it is possible that the damage to this patient’s occipital and medial temporal cortex extended into the underlying white matter, which could include the ILF, IFOF and/or the posterior, vertical segments of the SLF. Damage to these segments of the SLF would be consistent with our view that these pathways aid the operations of selective scale-based attention in the construction of the content of visual awareness. This would explain why the patient experienced a deficit in the perceived size of targets located on the left. Furthermore, the lesion did not appear to involve the PPC. Given the involvement of the PPC in visually guided reaching and grasping, this would help explain why the patient’s grasp aperture remained tuned to the real size of those same objects.

Marotta and colleagues reported a study of shape discrimination and grasp point selection in six neglect patients [155]. These authors administered a test similar to the one Goodale and colleagues administered to DF, using smooth pebble-like 3D shapes [125]. In one of the conditions, the patient is presented with two of these shapes at two different locations along their midline and is asked to make a same/different judgment about their shape. On half the trials, the shapes are the same. Furthermore, the orientation of the shapes is randomly varied. The authors found that even on the shape-discrimination task, the patients performed poorly, albeit scoring above chance, and therefore better than DF, who has visual form agnosia, but well below normally sighted controls and the right hemisphere damaged controls. In other words, these patients appeared to possess symptoms of object-centered neglect.

In a second condition, performed after the patient made their same/different judgment about object shape on each trial, Marotta and colleagues removed one of the shapes and then asked the patient to reach for and pick up the remaining one [155]. Due to the smooth pebble-like shape of the targets, the grasp points had to be chosen carefully to minimize instability of the resultant grip. This involves selecting points on the target’s surface for the thumb and forefinger that result in a finger-thumb “opposition axis” that falls close to the target’s center of mass. For this task, the patient’s fingertips were inked so that their touchpoints would leave marks on the side of the target. This way, the experimenter could record where the patient grasped the object, and then determine afterwards how close their grasp points were to the center of the target’s mass, on average, across many trials. Marotta and colleagues found that the grasp points the neglect patients selected were shifted rightward, relative to those of the controls, towards the right (ipsilesional) side of the object. In fact, the extent of shift in the grasp points was correlated with the severity of neglect, as indicated by their scores on the BIT [155]. Thus, in this case, it is possible that the impaired perceptual processing for shape may have also affected the selection of grasp points. Unfortunately, detailed scans of the only patient in the group with a lesion in the parietal, occipital and temporal cortex (presumably the TPJ) were not published, and the scans that are available lack sufficient detail to draw any conclusive inferences about the relationship between lesion site and extent and performance on the two tasks.
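To make the grasp-point measure described above concrete, the following is a minimal sketch of one way such a measure could be computed. It is an illustration under stated assumptions and not the analysis reported in [155]: we assume the inked contact points have been digitized as 2D coordinates in the plane of the target, and we take the perpendicular distance from the target’s center of mass to the line through the thumb and forefinger contact points as the index of grip stability. The function names and the coordinates in the usage lines are hypothetical.

```python
import math


def opposition_axis_offset(thumb, finger, center_of_mass):
    """Perpendicular distance from the center of mass to the opposition axis.

    The opposition axis is taken to be the straight line through the thumb
    and forefinger contact points (an assumption made for illustration).
    All points are (x, y) tuples in the same units.
    """
    (x1, y1), (x2, y2) = thumb, finger
    cx, cy = center_of_mass
    dx, dy = x2 - x1, y2 - y1
    axis_length = math.hypot(dx, dy)
    if axis_length == 0:
        raise ValueError("thumb and finger contact points coincide")
    # The 2D cross product gives twice the triangle area; dividing by the
    # base length (the axis) yields the height, i.e. the distance we want.
    return abs(dx * (cy - y1) - dy * (cx - x1)) / axis_length


def mean_offset(trials, center_of_mass):
    """Average the axis-to-center distance over (thumb, finger) trials."""
    offsets = [opposition_axis_offset(t, f, center_of_mass) for t, f in trials]
    return sum(offsets) / len(offsets)


# Hypothetical usage: one grasp through the center, one shifted rightward.
centered = ((0.0, -1.0), (0.0, 1.0))
shifted = ((1.0, -1.0), (1.0, 1.0))
print(mean_offset([centered, shifted], center_of_mass=(0.0, 0.0)))
```

A signed version of the same quantity (dropping the absolute value) would distinguish rightward from leftward shifts, which is the direction-sensitive contrast at issue when comparing neglect patients with controls.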

Conclusion

One of the overarching aims of this review is to propose a more prominent role for selective attention to scale in understanding the conditions of visual agnosia and neglect. Our review of this literature points to the critical role of attention to scene and object scale in the construction of the content of visual awareness and in the selection of different object parts and object-surface points for goal-directed action like grasping. Some of the strongest support for this proposal comes from a subset of visuospatial neglect patients who possess object-based deficits in attention that resemble the perceptual deficits of patients with visual agnosia, and from two heavily studied patients with visual agnosia, DF and HJA. Our interpretation is that selective attention to the appropriate structural scale of a scene facilitates effective visual perception. That is, attention to the appropriate scale helps to construct the contents of awareness, including scenes, ensembles of objects, objects themselves and the selection of object parts suitable for recognition and action.

At the same time, it is important to note that we are not claiming that behavioral and neural responses cannot be reliable in the absence of selective attention to scale. Blindsight, in which patients respond reliably to visual stimulation presented in clinically blind fields, is a notable case in point demonstrating that selective attention to scale is not essential for successful visual-motor coordination to simple rectilinear and cylindrical shapes [e.g. 156–158]. Rather, it is our view that under typical circumstances, the visual contents of immediate awareness are constructed within the occipital and inferior temporal cortices, and it is in the construction of these phenomenological representations that selective attention to scale plays a critical role. We have argued here that the origins of these attentional signals lie in parietal and frontal attentional centers. In so doing, we have highlighted the direct and indirect pathways that seem capable of delivering these signals to the inferior occipitotemporal structures that, as cases of visual agnosia have shown, are necessary for normal conscious visual experience.

Neuropsychological studies of visual agnosia have contributed substantially for over 100 years to informing theoretical models of the structure and function of the human visual system. The most recent strides in understanding have come from the development of brain imaging techniques that permit detailed anatomical visualization as well as functional visualization while an individual is perceiving and acting. Nonetheless, detailed patient case work is still foundational, because it often guides the brain imaging that affords us more precise tests of our ideas about structural and functional relationships. The study of visual object agnosia is central to our current understanding that the mental representation of the visible world involves a parallel interplay between visual sensory inputs, past experience and perceptual and behavioral end-points of action.

In this review, we have highlighted that the tendency among researchers to study aspects of selective attention in isolation (for example, spatial attention, featural attention and object-based attention) may have contributed to the neglect of a critically important aspect of selective attention: namely, selective attention to one level in the structural hierarchy of a visual scene over another. Such selection is essential for successful perception of, and action towards, objects within a given scene. Moreover, such selection always entails attention to spatial locations, features and objects, but notably, only at the scale that is required for a given perceptual or motor task.

In developing this account, we have also highlighted an important area for further research—visually guided action in visual agnosia and visual neglect—that is likely to yield theoretical insights on still-unresolved issues. Although cases of visual agnosia are quite rare, cases of neglect are relatively common following right hemispheric stroke (~ 44–48%, see 83, 159). Thus, neglect, and more specifically the object-centered variant of it, might be a more accessible model to study the relationship between selective attention to scale, object perception and visually guided action. More work is needed to determine the conditions in which scale-based attention operates differently on the content of visual awareness than it does on visually guided action and to determine the neural underpinnings of these processes.

Finally, it is worthwhile mentioning that the literature of case reports involving patients with visual agnosia, and some patients with visual neglect, is replete with brief clinical descriptive accounts of rapid partial recovery in visual function. We currently know very little about how neural rewiring in the visual system helps reestablish facets of visual perception and recognition following damage. Neuroimaging uniquely affords researchers and clinicians the tools to study this nascent field of neural plasticity in patients with compromised visual perception. Therefore, we remain optimistic that additional studies of patients with visual agnosia and patients with visual neglect will continue to yield important insights into how the brain uses vision for perception, cognition and action.