Introduction

Vision is the dominant sense in humans. We built our cities and buildings, furnished our homes and offices, and designed our transportation and appliances with the assumption that the users will have full vision—with occasional concessions for the visually impaired. We point at things, play sports, drive cars, and read body and facial expressions. When we are not actively interacting with our world, we watch television—about 4–5 h per day (Nielsen 2009; Ofcom 2010). These accounts illustrate the importance of vision as a source of information—and entertainment—about our environment. In short, we live in a sighted culture.

The importance of vision is also reflected in our brain. About 25 % of the human cerebral cortex (Van Essen 2003) is involved in visual processing, which is more than for any other sense. The visual system covers the occipital lobes, extends significantly into both temporal and parietal lobes, and involves parts of the frontal lobes. In closely related primates, such as macaques, the relative cortical surface area occupied by the visual system is even larger: about 50 % (Felleman and Van Essen 1991). The human visual cortex contains about five billion neurons. This number is far greater than in related primate species. The macaque visual cortex is about 20 % of that in humans despite similar numbers of nerve fibers coming from the eyes in both species. The increased number of neurons in the human visual cortex presumably reflects additional visual processing required for uniquely human skills such as language. Given these species differences in visual cortex, the human visual system likely contains features not found in nonhuman primates. Therefore, extrapolation of nonhuman findings to humans is not always possible. In addition, invasive techniques that have pioneered visual neuroscience in nonhuman primates are not feasible in humans. Therefore, noninvasive neuroimaging approaches, and in particular functional magnetic resonance imaging (fMRI), are pivotal for a full understanding of the human visual system. In addition, fMRI is viable in both species and will therefore be essential to bridge the species gap.

Studies of the visual system have a long history. Primary visual cortex (V1) was one of the first cortical areas to be distinguished. In 1782, prior to Brodmann (1903), Gennari dissociated V1 from the rest of the cerebral cortex due to the appearance of a stripe (stria of Gennari), though V1 was not identified as visual cortex until 1893 (Henschen 1893). Hence, V1 is also known as the striate cortex and the remainder as extra-striate cortex. The detailed knowledge of the visual system draws many scientists to vision. Not all these scientists are studying the visual system per se. Some use the visual system as a model either to develop and validate new methods or to investigate other neural properties, such as attention or consciousness.

In the field of fMRI, several influential studies are grounded in the visual system. These studies include the first successful human fMRI scan (Belliveau et al. 1991), and two of the three early reports using intrinsic blood oxygenation level-dependent (BOLD) fMRI signals (Bandettini et al. 1992; Kwong et al. 1992; Ogawa et al. 1992). Other examples include simultaneous electrophysiological and fMRI measurements to determine the neurobiological basis of the fMRI signal (Logothetis et al. 2001) and investigations of the linearity of the fMRI signal that form the basis of almost all fMRI data-analysis techniques (Boynton et al. 1996). Studies of the visual system have generated several advanced data-analysis techniques, such as retinotopic mapping (Engel et al. 1994; Sereno et al. 1995), information decoding (Haxby et al. 2001; Haynes and Rees 2005b; Kamitani and Tong 2005; Chap. 23), fMRI adaptation (Buckner et al. 1998; Tootell et al. 1998b; “fMRI Adaptation”), and neural model-based analyses (Thirion et al. 2006; Dumoulin and Wandell 2008; Kay et al. 2008; “Neural Model-Based Approaches”). These data-analysis techniques aim to extract more information from the fMRI data, beyond detecting the presence or absence of an fMRI signal; a quest captured by the term computational neuroimaging (Wandell 1999). Currently, the visual system provides a gold standard for high-resolution fMRI protocols to reveal columnar and laminar structures (see Chap. 26). We know where the columns are and where they terminate (for human ocular dominance columns see Adams et al. 2007). Once we can reliably detect these features of the visual system, we can turn our attention to less explored regions of cortex. In short, scientists study the visual system not just for the sake of vision itself but also as a model for the rest of the brain and as a rich database to validate new methods.

Visual Field Maps

One of the most important aspects of an image is its spatial arrangement. One can recognize the content of an image even after spatial transformations or changes in color or contrast. But recognition is completely obliterated after spatial scrambling of the image pixels. Intuitively, it may therefore not seem surprising that the spatial arrangement of an image is preserved in the visual cortex.

The existence of human visual field maps or retinotopic maps was established in the early 1900s (Fishman 1997). The reconstruction of the visual field maps was based on the correlation of visual field deficits with the location of human brain lesions suffered by soldiers of the Russo-Japanese war (Inouye 1909) and the First World War (Holmes 1918). These early authors made two important observations (Fig. 15.1). First, each hemisphere encodes the opposite hemifield, that is, the right hemisphere encodes the left visual field and vice versa. Second, the cortical representation of the central part of the visual field (fovea) is enlarged relative to more peripheral parts, a phenomenon commonly referred to as cortical magnification (Daniel and Whitteridge 1961). The cortical magnification factor was initially underestimated and was only recently corrected (Horton and Hoyt 1991b).

Fig. 15.1

Schematic illustration of the visual field representation in primary visual cortex (V1 or striate cortex). The visual field is shown in the left panel; the center of the visual field is at the black circle and the polar-coordinate axes—eccentricity and polar angle—are identified. V1 lies within and around the calcarine sulcus (inset, dashed lines). The left visual field (left panel) is represented on the right cortical surface (unfolded cortical surface, inset and right panel). This representation uses a mathematical transformation proposed by Schwartz (Schwartz 1977) that captures biological measurements. The visual field is inverted, corresponding to the inverted image on the retina. The representation of the central part of the visual field is enlarged compared to more peripheral regions, a phenomenon commonly referred to as cortical magnification. V1 primary visual cortex (Daniel and Whitteridge 1961)

The cortical magnification factor, that is, the increased number of neurons processing the input from the fovea versus the periphery, originates in the retina and is also reflected in the visual field maps. The V1 cortical representation of the central visual field is magnified to such an extent that the central 10° of our visual field, which is a little over 1 % of our total visual field, occupies approximately 50 % of the V1 cortical area. Cortical magnification relates to perception. The increased peripheral neural convergence provides increased sensitivity at the expense of spatial resolution. The higher peripheral sensitivity is used to detect events of interest, which are then inspected with the higher spatial acuity of the fovea. Performance on several visual tasks is far superior in the fovea. Examples of these improved visual skills in central vision are not only basic skills such as our ability to see fine details (visual acuity) but also more complex tasks such as reading. Importantly, the peripheral inferiority in more complex tasks cannot be explained solely by visual acuity (Legge 2007), suggesting that other differences in central–peripheral processing underlie this performance.
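The relation between "a little over 1 %" of the visual field and "approximately 50 %" of V1 can be checked with a short calculation. The sketch below is illustrative only. It assumes a linear magnification function of the form M(E) = A / (E + e2) in mm per degree, with A = 17.3 and e2 = 0.75 (values commonly attributed to Horton and Hoyt 1991), an isotropic map, a flat approximation of the visual field, and a hemifield that extends to 90° eccentricity. None of these numbers is stated in this chapter.

```python
import numpy as np

# Assumed parameters of the linear magnification function M(E) = A / (E + E2),
# in mm of cortex per degree of visual angle (illustrative values).
A_MM_PER_DEG = 17.3
E2_DEG = 0.75
MAX_ECC_DEG = 90.0  # assumed extent of one hemifield


def v1_area_within(ecc_deg, a=A_MM_PER_DEG, e2=E2_DEG):
    """V1 surface area (mm^2, one hemisphere) representing 0..ecc_deg.

    Closed form of the integral of M(r)**2 * pi * r dr, that is, areal
    magnification summed over half-annuli of the contralateral hemifield.
    """
    return np.pi * a**2 * (np.log((ecc_deg + e2) / e2) + e2 / (ecc_deg + e2) - 1.0)


# Fraction of the (flat-approximated) visual field inside 10 deg eccentricity.
field_fraction = (10.0 / MAX_ECC_DEG) ** 2
# Fraction of V1 surface devoted to that same central region.
cortex_fraction = v1_area_within(10.0) / v1_area_within(MAX_ECC_DEG)

print(f"central 10 deg: {field_fraction:.1%} of the visual field, "
      f"{cortex_fraction:.1%} of V1")
```

Under these assumptions the central 10° covers slightly more than 1 % of the visual field but a little under half of V1, in line with the approximate figures given above. The exact fraction depends on the assumed magnification parameters and field extent.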

Subsequent animal experiments refined these observations and, importantly, defined multiple visual field maps. Both the second and third visual area, V2 and V3, are visual field maps encompassing V1 in a horseshoe shape (Thompson et al. 1950; Clare and Bishop 1954; Cowey 1964; Hubel and Wiesel 1965; Tusa et al. 1978). Coinciding with identifications of multiple visual field maps was the notion that the nature of the representation must differ from map to map. Especially in humans, the identification of visual field maps, map functions, and homologies to monkeys is still ongoing (Tootell et al. 2003; Sereno and Tootell 2005; Wandell et al. 2007; Silver and Kastner 2009). Using fMRI, there are several techniques to identify visual field maps. The most commonly used visual field mapping technique is described in “Measuring Visual Field Maps Using fMRI” and Fig. 15.2. A promising new approach is discussed in “Measuring Population Receptive Fields Using fMRI” and Fig. 15.4a. Visual field maps extend significantly into the parietal and temporal lobes, and have also been reported in the frontal lobes.

Fig. 15.2

Traveling wave or phase-encoded visual field mapping. The subject looks at the red fixation dot. Expanding annuli containing flickering dartboard patterns evoke a traveling wave of BOLD activity across visual cortex; small central rings stimulate central representations near the occipital pole (a), whereas intermediate (b) and large rings (c) evoke responses in more peripheral representations in anterior occipital cortex. The phase (or delay) of the fMRI signal indicates the ring position that elicited the strongest response. The preferred eccentricity is indicated in a color map on the cortical surface (d); the colors represent different eccentricities (inset). The representation in panel d corresponds to the dashed region in panels a–c. The orthogonal dimension in polar coordinates, polar angle, is reconstructed using rotating wedges; dashed and solid lines indicate the horizontal and vertical meridians, respectively (e). Similar to eccentricity, the wedge that evoked the strongest response is indicated with a color map (f). The changes in polar angle progression reveal the borders between the visual field maps (g). V1 primary visual cortex, V2 second visual area, V3 third visual area, hV4 human homologue of V4, VO ventral occipital, PHC-1 parahippocampal cortex 1, PHC-2 parahippocampal cortex 2, IPS intraparietal sulcus, LO lateral occipital

Initial naming schemes for human visual field maps adopted the nonhuman primate nomenclature, for example, V1, V2, V3, middle temporal (MT), etc. However, questions about human and nonhuman homology demanded a different naming scheme. Such naming schemes separate the effort to identify a visual field map from the effort to establish homology. Uncertainty about homologies starts as early as V3. The layouts of the V3 and V3 accessory (V3A) visual field maps are similar in human and nonhuman primates, but their sensitivities to visual motion stimuli, and therefore perhaps their functions, are reversed (Tootell et al. 1997; Vanduffel et al. 2001). In macaques, V3 but not V3A is sensitive to visual motion stimuli, whereas in humans V3A but not V3 responds most strongly to motion stimuli. Perhaps it is only reasonable to question homologies beyond V2. Only V1 and V2 in mammals and MT in primates seem to be evolutionarily preserved (Rosa and Krubitzer 1999; Krubitzer 2009). Consequently, different naming schemes for humans have been proposed. The simplest scheme is the addition of “h” for human to the primate nomenclature, for example, human homologue of V4 (hV4) and human homologue of monkey area MT (hMT). Other schemes are based on anatomical locations or suspected functions. But gross anatomical features lack the specificity to label several small maps in the same region. Nomenclature based on suspected functions is unsafe because the full function of a region may only be appreciated after extensive studies (Smith et al. 1998). Wandell et al. (2005) proposed a naming scheme based on the gross anatomical location and a number. Several laboratories have adopted this naming scheme (Brewer et al. 2005; Schluppeck et al. 2005; Silver et al. 2005; Larsson and Heeger 2006; Swisher et al. 2007; Konen and Kastner 2008; Amano et al. 2009; Arcaro et al. 2009).

Measuring Visual Field Maps Using fMRI

One exciting advance in fMRI methodology was the ability to precisely delineate visual field maps using the traveling wave method (Engel et al. 1994), also known as phase-encoded retinotopic mapping (Sereno et al. 1995). Though this is not the only way to identify visual field maps (for a new technique see “Measuring Population Receptive Fields Using fMRI”; for other techniques see Fox et al. 1987; Sutter and Tran 1992; Schneider et al. 1993; Hansen et al. 2004; Vanni et al. 2005), its simplicity and robustness have ensured that it is still the most popular technique today.

The method sequentially stimulates each point in the visual field along the axes of a polar-coordinate system, thereby reconstructing the representation of the visual field on the cortex (Engel et al. 1994; Sereno et al. 1995; DeYoe et al. 1996; Engel et al. 1997; Warnking et al. 2002; Dumoulin et al. 2003). The analysis routine is unique because it relies on the phase—or delay—of the fMRI signal rather than the amplitude (Fig. 15.2). Expanding (or contracting) ring sections of a dartboard pattern elicit responses at increasingly eccentric visual field locations. The phase or delay of the fMRI signal identifies the ring position—eccentricity—that evokes the strongest response at each cortical location (Fig. 15.2a, b, c, d). In a similar fashion, rotating wedges are used to reconstruct the polar-angle representation on the cortical surface (Fig. 15.2e, f).
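The phase-based analysis described above can be sketched for a single voxel as follows. This is a minimal illustration, not the published analysis pipeline. The scan length, number of stimulus cycles, maximum eccentricity, the linear phase-to-eccentricity mapping, and the coherence-like reliability measure are all assumptions, and the sketch ignores the hemodynamic delay, which in real data adds a phase offset that has to be corrected.

```python
import numpy as np

N_TIMEPOINTS = 144   # volumes in the scan (assumed)
N_CYCLES = 6         # ring expansions per scan (assumed)
MAX_ECC_DEG = 12.0   # eccentricity of the largest ring (assumed)


def traveling_wave_fit(ts, n_cycles):
    """Return (phase, coherence) of a time series at the stimulus frequency.

    phase: response delay within the stimulus cycle, in radians (0..2*pi).
    coherence: amplitude at the stimulus frequency divided by the root sum
    of squares of all non-DC amplitudes (a simple reliability measure).
    """
    spectrum = np.fft.rfft(ts - np.mean(ts))
    amplitude = np.abs(spectrum)
    # For cos(2*pi*k*t/N - phi) the Fourier coefficient at bin k has angle -phi.
    phase = (-np.angle(spectrum[n_cycles])) % (2 * np.pi)
    coherence = amplitude[n_cycles] / np.sqrt(np.sum(amplitude[1:] ** 2))
    return phase, coherence


# Simulated voxel that responds most strongly a quarter of the way into each cycle.
rng = np.random.default_rng(0)
t = np.arange(N_TIMEPOINTS)
true_phase = 0.25 * 2 * np.pi
ts = 100 + np.cos(2 * np.pi * N_CYCLES * t / N_TIMEPOINTS - true_phase)
ts = ts + rng.normal(0.0, 0.5, N_TIMEPOINTS)

phase, coherence = traveling_wave_fit(ts, N_CYCLES)
# Assumed linear mapping from phase to ring eccentricity.
preferred_ecc = phase / (2 * np.pi) * MAX_ECC_DEG
print(f"phase = {phase:.2f} rad, coherence = {coherence:.2f}, "
      f"preferred eccentricity = {preferred_ecc:.1f} deg")
```

Because only the phase at the stimulus frequency is used, the estimate of the preferred position does not depend on the response amplitude. The same function applies to wedge stimuli, with the phase mapped to polar angle instead of eccentricity.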

Precise delineation of visual areas has several implications. First, it allows quantitative insights into the organization of the visual cortex, for example, by estimating cortical magnification factors or receptive field size. The quantitative measures furthermore permit interspecies comparisons (Orban et al. 2004; Sereno and Tootell 2005) and a detailed analysis of the pathological visual system. Second, it enhances the interpretability of studies of the visual system’s functional properties by allowing activations to be localized in, or constrained by, functional areas rather than anatomical locations (Di Russo et al. 2002; Appelbaum et al. 2006). Furthermore, it allows a region-of-interest (ROI) analysis, that is, averaging of the same regions in the individual brains with the underlying assumption of homogeneous processing within the region. An ROI analysis increases the signal-to-noise ratio (SNR) beyond standard stereotaxic averaging, that is, averaging of similar coordinates on the basis of anatomical instead of functional features (Talairach and Tournoux 1988; Collins et al. 1994). The increased SNR is due to intra- and intersubject averaging, that is, averaging of voxels within the same cortical area and of the same cortical area across subjects.
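The SNR benefit of ROI averaging can be illustrated with a small simulation. The sketch assumes that every voxel in the ROI carries the same response (the homogeneity assumption mentioned above) and that the noise is independent across voxels. Real fMRI noise is spatially correlated, so the square-root-of-N gain shown here is an upper bound rather than an expected value. All sizes and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_timepoints = 100, 200
noise_sd = 2.0  # arbitrary noise level, larger than the signal amplitude

# The same underlying response in every voxel of the ROI (homogeneity assumption).
signal = np.sin(2 * np.pi * np.arange(n_timepoints) / 20.0)
voxels = signal + rng.normal(0.0, noise_sd, size=(n_voxels, n_timepoints))

# ROI analysis: average the time series across all voxels in the region.
roi_mean = voxels.mean(axis=0)

# Residual noise in a typical single voxel versus in the ROI average.
single_voxel_noise = np.std(voxels - signal, axis=1).mean()
roi_noise = np.std(roi_mean - signal)
snr_gain = single_voxel_noise / roi_noise

print(f"noise reduction from averaging {n_voxels} voxels: {snr_gain:.1f}x "
      f"(sqrt(N) = {np.sqrt(n_voxels):.1f})")
```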

Identifying Visual Field Maps

Visual field maps are identified based on several criteria. These criteria are derived from the established layouts of the visual field maps V1, V2, and V3. First, each visual field map represents, by definition, each point in visual space only once (Press et al. 2001), and each map represents the entire visual field, or at least a substantial part of it (Zeki 2003). Second, each visual field map should have an orderly organization in both polar angle and eccentricity dimensions across the cortical surface. The polar angle and eccentricity gradients should be nonparallel, though not necessarily orthogonal (Tyler et al. 2005). But there are discontinuities in visual field map representations. To date, all known visual field maps are split across the vertical meridian such that the two hemifields are represented in different hemispheres. V2 and V3 are additionally split across the horizontal meridian as they wrap around V1, such that each contiguous field map region represents only a quarterfield. These discontinuities thus occur at the horizontal and vertical meridians.

Borders between visual field maps are identified based on discontinuities of the visual field representations (Fig. 15.2f, g). These discontinuities reveal themselves as reversals or local peaks/troughs in the polar-angle progression. Even at conventional fMRI resolutions, relatively straightforward interpolation schemes identify the border position to within about 1 mm (Engel et al. 1997; Olman et al. 2003). For instance, the represented polar angle gradually rotates from the upper vertical meridian to the lower vertical meridian as one traverses V1 in a dorsal direction, but then rotates back up as soon as one continues along the same route into V2 (Fig. 15.2f). Along the polar-angle dimension, these reversals coincide with reversals in visual field map representation, in other words, visual field signs: mirror or non-mirror-image representations of the visual field (Sereno et al. 1994; Sereno et al. 1995; Dumoulin et al. 2003). These visual field signs can be used to distinguish neighboring visual field maps along the polar-angle dimension but can fail to distinguish neighboring visual field maps bordering along the eccentricity dimension, for example, V3A and lateral occipital map (LO)-1 (Fig. 15.3). Alternatively, the visual field map borders may be derived from a fit of a canonical template to the reconstructed visual field layout (Dougherty et al. 2003). Though this method is sensitive to the initial starting points provided by the experimenter, it provides not only objective border definitions but also precise localization of all other parts of the visual field representation. An advantage of the traveling wave method is that the border identification depends on the change in polar-angle progression and is independent of the widely used (amplitude) significance threshold. Furthermore, it reconstructs the entire visual representation and does not assume a particular a priori layout of the visual field. Therefore, it is an ideal method to delineate new visual field maps or visualize changes in known visual field maps.
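The idea of locating borders at reversals of the polar-angle progression can be sketched in one dimension. The example below uses an idealized, noise-free polar-angle profile sampled along a hypothetical cortical path that runs dorsally through V1, dorsal V2, and dorsal V3. The map widths, sampling step, and function name are assumptions for illustration. Measured data are noisy and would need smoothing or interpolation before such a test is meaningful.

```python
import numpy as np


def find_reversals(polar_angle_deg):
    """Indices where the polar-angle progression changes direction."""
    slope = np.sign(np.diff(polar_angle_deg))
    # A reversal is a sample where the slope before and after differ in sign.
    return np.nonzero(slope[:-1] * slope[1:] < 0)[0] + 1


# Idealized polar angle in degrees (+90 = upper vertical meridian,
# 0 = horizontal meridian, -90 = lower vertical meridian), one sample per mm.
v1 = np.linspace(90, -90, 21)       # V1: upper to lower vertical meridian
v2d = np.linspace(-90, 0, 11)[1:]   # dorsal V2: back up to the horizontal meridian
v3d = np.linspace(0, -90, 11)[1:]   # dorsal V3: down again to the lower vertical meridian
profile = np.concatenate([v1, v2d, v3d])

borders = find_reversals(profile)
print("reversals at samples:", borders.tolist())
print("polar angle at reversals:", profile[borders].tolist())
```

In this idealized profile the first reversal falls on the lower vertical meridian (the V1/V2 border) and the second on the horizontal meridian (the V2/V3 border), which mirrors the description in the text.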

Fig. 15.3

Human visual field maps. A schematic overview is shown of the visual field map layout on an unfolded representation of the right hemisphere from a medial–ventral (left) and dorsal–lateral (right) perspective. The right visual field maps represent the left visual field (inset); the upper and lower visual field representations are indicated with a “+” and “−,” respectively. This schematic overview is only one interpretation of the visual field mapping data. Others exist as well. Only V1, V2, V3, and V3A are firmly established. V1 primary visual cortex, V2 second visual area, V3 third visual area

There are several factors that make accurate reconstruction of visual field maps difficult and that can confound results. Methodological choices such as stimulus parameters and data-analysis procedures may influence the ability to reconstruct visual field maps. For example, due to their different emphasis on the representation of central versus peripheral parts of the visual field, maps at the ventral surface may be clarified by fine sampling of the central part of the visual field, whereas more dorsal regions may be best revealed using larger stimuli (Baizer et al. 1991; Brewer et al. 2005; Pitzalis et al. 2006). A common hypothesis is that the visual field map organization and relative layout are preserved across subjects. But biological variability may limit accurate visual field map reconstruction. For example, visual field map sizes can vary by a factor of two between different subjects (Stensaas et al. 1974; Andrews et al. 1997; Dougherty et al. 2003; Duncan and Boynton 2003; Schira et al. 2007). Especially for high-level, that is, smaller, visual field maps, natural variability in size may introduce variability in reconstruction accuracy. Recently, another biological source of fMRI variability has been identified (Winawer et al. 2010). Winawer and colleagues found that fMRI signal dropouts associated with the presence of large veins could obscure parts of visual field maps. Though the global position of these veins is roughly related to gross anatomical features, their exact positions vary relative to functional anatomical structures. Therefore, these artifacts may obscure certain features, and fMRI signals, in some individuals but not in others. To sum up, the ability to identify visual field maps depends on many variables, some of which are outside the experimenter’s control. Therefore, the inability to identify certain visual field maps, or parts of certain maps, should be interpreted carefully, and reports of the same visual field map pattern by multiple independent laboratories should outweigh the occasional inability to define these maps.

Human Visual Field Maps

A schematic overview of the human visual field map layout is shown in Fig. 15.3. Other visual field map layouts have been proposed, and many features are intensely scrutinized and passionately debated. This scheme is likely to be adjusted as additional evidence is gathered and interpreted. It is clear, however, that these regions exhibit retinotopic responses; in other words, each cortical location represents a limited part of the visual field.

Using the traveling wave method (Engel et al. 1994), the visual field maps V1, V2, V3, V3A, and the ventral representation of the human homologue of area V4 were identified (Sereno et al. 1995; DeYoe et al. 1996; Engel et al. 1997). These maps are now routinely identified in individual subjects in fMRI experiments lasting half an hour or so.

But despite the large cortical region devoted to processing the most central part of our visual field, the human foveal representation of V1, V2, and V3 remained unclear for many years. Hence, this part of cortex was dubbed “foveal confluence” (Somers et al. 1999; Dougherty et al. 2003). Delineation of the foveal representation is important because the fovea is vital for many basic visual functions, such as reading. Recent advances in data analysis (Dumoulin and Wandell 2008) and data acquisition (Schira et al. 2009) have separated the visual field map representations within the foveal confluence. Schira et al. (2009) described the V2 and V3 representations as contiguous bands surrounding V1. Near the fovea, the width of these bands is about 5 mm. This banded organization not only minimizes visual field map distortions in these areas but also increases the cortical magnification of V2 and V3 relative to V1 (see Fig. 15.3; Schira et al. 2009, 2010).

On the ventral surface, several visual field maps were identified (Fig. 15.3); these include hV4, two ventral occipital maps (VO-1 and VO-2; Wade et al. 2002; Brewer et al. 2005; Arcaro et al. 2009; Winawer et al. 2010), and two maps in parahippocampal cortex (PHC-1 and PHC-2; Arcaro et al. 2009). In particular, the visual field map layout around hV4 is intensely debated, and several alternative proposals exist (Hadjikhani et al. 1998; Tootell and Hadjikhani 2001; Hansen et al. 2007). Only recently, Winawer and colleagues realized that this region is contaminated with vascular artifacts, providing a unifying explanation for some of the controversies (Winawer et al. 2010).

On the lateral surface, several maps have been identified. The four maps illustrated in Fig. 15.3, LO-1 and 2 (Smith et al. 1998; Larsson and Heeger 2006; Swisher et al. 2007; Amano et al. 2009), and temporal occipital maps 1 and 2 (TO-1 and 2; Huk et al. 2002; Amano et al. 2009; Kolster et al. 2010), have been confirmed by independent laboratories. TO-1 and 2 are putative homologues of monkey areas MT and medial superior temporal area (MST). Kolster and colleagues have proposed other putative homologues of monkey visual areas in this region (Kolster et al. 2010).

Along dorsal visual cortex, many maps have been identified: V3A (DeYoe et al. 1996; Tootell et al. 1997; Smith et al. 1998), V3B (Smith et al. 1998; Press et al. 2001; Schluppeck et al. 2005), and a series of maps along the intraparietal sulcus (IPS), including IPS-0 or V7 (Tootell et al. 1998a; Sereno et al. 2001; Schluppeck et al. 2005; Silver et al. 2005; Hagler et al. 2007; Swisher et al. 2007; Konen and Kastner 2008). On the medial surface, a human homologue of monkey area V6 has been suggested (Pitzalis et al. 2006; Stenbacka and Vanni 2007). A few visual field maps have been identified within the frontal lobe, including one in the approximate location of the frontal eye fields (FEFs; Hagler and Sereno 2006; Kastner et al. 2007).

Topographic organization has also been reconstructed beyond the cortex, in several subcortical nuclei. The most prominent is the lateral geniculate nucleus (LGN; Chen et al. 1999; Uğurbil et al. 1999; Schneider et al. 2004), but others include the superior colliculus (Schneider and Kastner 2005; Wall et al. 2009) and the pulvinar (Cotton and Smith 2007; Fischer and Whitney 2009). Methods beyond fMRI, namely diffusion tensor imaging (DTI) and fiber tracking (FT), have revealed a topographic organization of the occipital–callosal fibers (Dougherty et al. 2005). The discoveries of multiple visual field maps and continuing reports of novel maps support the notion of a modular design of the visual cortex. They also suggest that the labels “retinotopic” and “nonretinotopic” should be viewed as parts of a continuum rather than as a dichotomy.

Population Receptive Fields

The traveling wave method and other visual field mapping techniques summarize the most effective visual location to drive neuronal responses at a particular cortical location as a point in visual space. Yet each neuron processes not a single location but a region of visual space known as its receptive field. Moreover, given estimates of neuronal packing density (Rockel et al. 1980; Leuba and Garey 1989) and typical fMRI resolutions (~2.5 mm isotropic), each recording location contains about a million neurons. The aggregate receptive field of a neuronal population is often referred to as the population receptive field (pRF; Victor et al. 1994; Jancke et al. 2004). Using an analogous rationale in fMRI, the region of visual space that stimulates the recording site is also typically referred to as the pRF (Dumoulin and Wandell 2008).

Many factors influence the pRF properties, some neural and some not (for reviews, see Smith et al. 2001; Dumoulin and Wandell 2008). Nonneural factors include eye movements, head movements, optical defocus, recording (voxel) size, and both temporal and spatial hemodynamic response function parameters. These nonneural factors may not affect all pRF parameters equally; for example, isotropic eye movements increase pRF size but have little influence on the pRF position, and hence on visual field maps (Levin et al. 2010). There are also different neural contributions to the pRF. These include position scatter of the individual receptive fields of the recorded neural population and both classical and extra-classical neural receptive field properties. Because different neurons are included within one recorded site, different stimuli that drive different neurons can also yield different pRF properties at the same cortical site. These different contributions to the pRF can be seen as a confound, but they also provide an opportunity to examine the properties of the neural population. By comparing estimates from carefully selected stimulus conditions, we may be able to distinguish the different neural contributions to the pRF.

Measuring Population Receptive Fields Using fMRI

There are several methods to estimate pRF sizes from the fMRI signal. First, the pRF size influences the fMRI signals elicited by the traveling wave stimuli. This pRF influence was first observed by Tootell et al. (1997), who noticed different time courses in visual field maps V1 and V3A in response to conventional traveling wave stimuli. They explained this time course difference by suggesting that pRF sizes in V3A exceed those of V1. Smith et al. (2001) quantified this observation by measuring the relative amount of active versus inactive epochs—the duty cycle—in the fMRI response to the ring stimulus (for related approaches, see also Larsson and Heeger 2006, Li et al. 2007, Kolster et al. 2010). These measurements revealed differences between visual field maps and increasing pRF sizes with eccentricity.

The duty-cycle method will only work directly for the ring stimuli (Smith et al. 2001), but size estimates from wedge stimuli can also be derived after estimating the pRF’s eccentricity (Larsson and Heeger 2006; Kolster et al. 2010). But due to the lack of a baseline in the stimulus, this type of measurement will systematically underestimate larger pRF sizes (Dumoulin and Wandell 2008; Amano et al. 2009). Basically, modulations of the fMRI signals elicited by conventional traveling wave stimuli may be caused either by a small pRF, naturally responding to only certain visual field locations, or by a large pRF responding to all visual field locations but with a preference for certain locations. Without a proper baseline, these cannot be distinguished, and duty-cycle-related measures often default to the first possibility.

The second method estimates pRF sizes based on electrophysiological observations that two or more stimuli presented simultaneously within a receptive field reduce responses compared to the same stimuli presented sequentially (Moran and Desimone 1985; Luck et al. 1997; Reynolds et al. 1999). The extent of the suppressive interactions covaries with the receptive field size of the neurons. Kastner et al. (1998, 2001) used a similar paradigm to relate these suppressive interactions to receptive field sizes using fMRI (see also Bles et al. 2006; Rijpkema et al. 2008). Basically, if at a given recording site the fMRI signal is attenuated for simultaneous versus sequential stimulus presentations, the receptive fields at that recording site are assumed to be large enough to cover the different stimuli.

More recently, pRF sizes were modeled by fitting two-dimensional (2D) models to the fMRI signals (Fig. 15.4a). These pRF models were either Gaussian (Dumoulin and Wandell 2008) or Gabor wavelet pyramids (Kay et al. 2008). This type of analysis is independent of the exact stimulus layout, though the insertion of a proper baseline is crucial to estimate the exact pRF sizes (Dumoulin and Wandell 2008). The neural model predicts the fMRI time series by convolution of the neural model with the stimulus sequence and the hemodynamic response function. The optimal neural model parameters are estimated by minimizing the sum of squared errors between the predicted and observed fMRI time series. In this type of analysis, the output of the fMRI data analysis consists of the model parameters. Compared to the previous approaches, the model-based approach has several other advantages that are discussed in more detail in “Neural Model-Based Approaches”.
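The model-fitting procedure just described can be sketched for a single voxel and a 2D Gaussian pRF. This is a simplified illustration under stated assumptions, not the published implementation. The bar stimulus with blank baseline periods, the single-gamma hemodynamic response function, the coarse grid search, and all function names and parameter values are hypothetical. The sketch computes the predicted neural response as the overlap between the pRF and the stimulus aperture at each time point, then convolves it with the hemodynamic response function.

```python
import itertools
import numpy as np


def bar_apertures(X, Y, n_steps=16, bar_width=2.0, n_blank=8):
    """Binary stimulus frames: a bar sweeping along x, then along y.

    Each sweep is followed by blank frames that provide a baseline.
    """
    lim = X.max()
    frames = []
    for coord in (X, Y):
        for center in np.linspace(-lim, lim, n_steps):
            frames.append(np.abs(coord - center) <= bar_width / 2)
        frames.extend([np.zeros(X.shape, dtype=bool)] * n_blank)
    return np.array(frames, dtype=float)


def hrf(tr=1.5, duration=24.0):
    """Simplified single-gamma hemodynamic response function (assumed shape)."""
    t = np.arange(0.0, duration, tr)
    h = t**5 * np.exp(-t)
    return h / h.sum()


def predict(params, stim, X, Y, h):
    """Predicted time series for a Gaussian pRF with params (x0, y0, sigma)."""
    x0, y0, sigma = params
    prf = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma**2))
    neural = stim.reshape(len(stim), -1) @ prf.ravel()  # pRF-stimulus overlap
    return np.convolve(neural, h)[: len(neural)]        # causal HRF convolution


def fit_prf(y, stim, X, Y, h, candidates):
    """Grid search minimizing the sum of squared errors; gain and offset are fit linearly."""
    best_params, best_sse = None, np.inf
    for params in candidates:
        pred = predict(params, stim, X, Y, h)
        design = np.column_stack([pred, np.ones_like(pred)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = np.sum((y - design @ beta) ** 2)
        if sse < best_sse:
            best_params, best_sse = params, sse
    return best_params, best_sse


# Visual field sampled on a grid of +/- 10 deg (assumed stimulus extent).
axis = np.linspace(-10, 10, 41)
X, Y = np.meshgrid(axis, axis)
stim = bar_apertures(X, Y)
h = hrf()

# Simulated voxel with a known pRF plus a small amount of noise.
true_params = (2.0, -4.0, 2.0)
rng = np.random.default_rng(0)
clean = predict(true_params, stim, X, Y, h)
y = clean / clean.max() + rng.normal(0.0, 0.02, len(clean))

positions = np.linspace(-8, 8, 9)
sizes = [0.5, 1.0, 2.0, 4.0]
candidates = list(itertools.product(positions, positions, sizes))
best_params, best_sse = fit_prf(y, stim, X, Y, h, candidates)
print("estimated (x0, y0, sigma):", tuple(float(p) for p in best_params))
```

A coarse grid search keeps the sketch short. In practice a grid search of this kind would typically only seed a continuous optimization of position and size. The blank frames matter: without them, a large pRF that responds to every bar position is hard to separate from a small one, which is the baseline issue raised above.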

Fig. 15.4

Population receptive field (pRF) estimates. a Schematic illustration of the neural model-based method to estimate the pRF. Convolution of the neural model with the stimulus sequence and the hemodynamic response function predicts the fMRI time series; the optimal neural model parameters are estimated by minimizing the sum of squared errors between the predicted and observed fMRI time series (Adapted from Dumoulin and Wandell 2008). b The pRF size estimates vary between different visual field maps. Within each visual field map, pRF size increases with eccentricity. c When pRF sizes are expressed in V1 cortical surface area, cortico-cortical pRFs, they are constant across eccentricity in V2 and V3. Thus, V2, V3, and, to some degree hV4, sample from a constant extent of V1. fMRI functional magnetic resonance imaging, V1 primary visual cortex, V2 second visual area, V3 third visual area, LO-1 lateral occipital map 1, hV4 human homologue of V4, pRF population receptive field. (Adapted from Harvey and Dumoulin 2011, Amano et al. 2009)

The pRF size estimates obtained with the neural model-based analysis show trends similar to the receptive field estimates from electrophysiological studies (Fig. 15.4b; Dumoulin and Wandell 2008; Kay et al. 2008; Amano et al. 2009; Winawer et al. 2010; Harvey and Dumoulin 2011). There are large differences between different visual field maps; within each visual field map, the pRF sizes increase as a function of eccentricity. These pRF size changes across visual cortex are reminiscent of a hierarchical organization of the visual field maps in nonhuman primates (Van Essen and Maunsell 1983). The quantitative pRF size estimates are comparable to independent pRF estimates made using single- and multiunit activity and local field potentials (LFPs) in nonhuman primates (Dumoulin and Wandell 2008). They are also comparable to estimates from human electrophysiological measurements (Yoshor et al. 2007).

Receptive field sizes are typically measured in visual space, but recent efforts have related the receptive field sizes to other parts of visual cortex. This defines the receptive field of a given area by the cortical sampling extent from another area, for example, the sampling extent of V1 cortical surface by a V4 neuron (Motter 2009). When pRF sizes are expressed in terms of cortical surface area, they are typically referred to as cortico-cortical pRFs. Cortico-cortical pRFs are constant in V2, V3, and, to some extent, (h)V4 when expressed in V1 sampling extent (Fig. 15.4c; Motter 2009; Harvey and Dumoulin 2011). This suggests a constant topographic functional connectivity between visual field maps. These cortico-cortical pRFs can be estimated without any visual stimulation, linking the concept of cortico-cortical pRFs to spontaneous signal fluctuations (Heinzle et al. 2011).

Neural Model-Based Approaches

The neural model-based method is more than just a technique to estimate visual field maps and neuronal receptive field sizes. Compared to the previous approaches, it has several advantages. First, it does not depend on a particular stimulus paradigm. Second, and most important, it is poised to model many other properties of the underlying neuronal population, such as quantitative estimates of point image (Harvey and Dumoulin 2011), surround suppression (Zuiderbaan et al. 2012), and the relative amount to which neuronal populations process the contra- or ipsilateral visual field (Dumoulin and Wandell 2008).

Another example is provided by the study of Kay et al. (2008). Their study consisted of two stages. The first stage estimated the parameters of their neural model. The neural model predicts the fMRI time series. The neural model parameters were estimated by minimizing the residual sum of squares between the predicted fMRI time series and the actual fMRI time series from a separate—training—data set. In the second stage, they used the neural models with fixed parameters to predict the fMRI signals elicited from viewing natural images not previously shown to the subject. These predictions were compared to those measured with fMRI. Based on these predictions they were able to select the image that was shown in the fMRI scanner to the subject with high accuracy. Using a similar approach, Brouwer and Heeger (2009) were able to decode and reconstruct color from fMRI responses.
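The two-stage logic described above can be sketched in a few lines of Python. This is a toy illustration, not the analysis of Kay et al. (2008): the linear feature model, the simulated data, the use of correlation as the matching criterion, and the name `identify` are all assumptions. Stage 1 fits the model on a training set; stage 2 predicts responses to new images and picks, for each measured response pattern, the image whose prediction matches best.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_voxels = 20, 50

# Hypothetical ground truth, unknown to the analysis: how image features
# drive each voxel.
true_weights = rng.normal(size=(n_features, n_voxels))


def measure(features, noise_sd=0.5):
    """Simulated fMRI response patterns (images x voxels) with noise."""
    noise = rng.normal(scale=noise_sd, size=(len(features), n_voxels))
    return features @ true_weights + noise


# Stage 1: estimate model parameters on a separate training data set by
# minimizing the residual sum of squares.
train_features = rng.normal(size=(200, n_features))
train_responses = measure(train_features)
weights_hat, *_ = np.linalg.lstsq(train_features, train_responses, rcond=None)

# Stage 2: with parameters fixed, predict responses to images that were
# not part of the training set.
test_features = rng.normal(size=(30, n_features))
test_responses = measure(test_features)
predicted = test_features @ weights_hat


def identify(measured, predicted):
    """For each measured pattern, index of the best-correlated prediction."""
    m = measured - measured.mean(axis=1, keepdims=True)
    p = predicted - predicted.mean(axis=1, keepdims=True)
    m = m / m.std(axis=1, keepdims=True)
    p = p / p.std(axis=1, keepdims=True)
    corr = m @ p.T / measured.shape[1]
    return corr.argmax(axis=1)


chosen = identify(test_responses, predicted)
accuracy = np.mean(chosen == np.arange(len(test_features)))
print(f"identification accuracy: {accuracy:.2f}")
```

Because identification depends entirely on the quality of the predictions, a poor model would lower the accuracy, which is why identification performance can serve as a check on the model itself.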

The neural model-based approach is fundamentally different from statistical pattern recognition approaches that also aim to identify stimuli or conditions based on fMRI signals (Chapt. 20; Wandell 2008; Raizada and Kriegeskorte 2010)—though local pattern recognition techniques can capture some of the pRF properties modeled in neural model-based approaches (Miyawaki et al. 2008). First, as a classification technique, the neural model-based approach does not rely on predefined categories and allows any image or condition to be identified (Kay et al. 2008; Brouwer and Heeger 2009), even images imagined by the subject (Thirion et al. 2006). Second, as it is based on a neural model, the identification (and reconstruction) accuracy depends on the accuracy of the neural model: The identification accuracy provides a validation of the neural model itself. Classification based on neural models therefore not only determines the information content of a particular patch of cortex but also explicitly models the underlying brain processes.

Both Thirion et al. (2006) and Brouwer and Heeger (2009, 2011) compared their model-based approach to statistical pattern recognition. Brouwer and colleagues found similar performances. Thirion and colleagues found that the statistical pattern recognition technique outperformed the neural model-based approach. This result indicates that some fMRI signal characteristics were utilized by the statistical approach but not by the neural model. Therefore, the neural model may be extended to capture additional neural properties displayed in the fMRI signal—as in Kay et al. (2008). In this fashion, the neural model-based approach provides insights into the underlying neural processes.

Functional Specialization

Functional specialization is the notion that the cortex consists of separate areas involved in different processes. This functional specialization is presumed to be closely associated with cytoarchitecture, connections, and the layout of maps (Van Essen 2003). Functional specializations typically refer to perceptual qualities of the visual scene. Early evidence of these functional specializations was provided by studies of subjects with brain lesions. Lesions in particular places in visual cortex give rise to specific deficits, such as the inability to recognize objects (visual agnosia) or faces (prosopagnosia), to perceive motion (akinetopsia), or to read (alexia). Zeki and colleagues were the first to illustrate the notion of functional specialization or modularity in the healthy human visual cortex using positron emission tomography (PET; Zeki et al. 1991). They located separate regions involved in processing color and motion information, one in ventral and one in lateral occipital cortex. Although these are not the only regions processing color and motion information, these regions respond most strongly in experimental paradigms selectively targeting color and motion perception.

The functional specialization literature within the visual cortex is a wide field; therefore, I will focus on a number of issues that have proved to be critical points of debate in the fMRI community in early visual cortex and along the dorsal and ventral pathways. These issues include overlap with visual field maps, a well-described motion-selective region of the dorsal pathway, and various object category-specific specializations in the ventral pathway, but exclude other regions such as the parietal cortex (Culham and Kanwisher 2001; Silver and Kastner 2009).

In early visual cortex, functional specializations overlap with visual field maps. Visual field maps are being defined in regions already suspected to contain maps, such as the motion-selective region hMT+ (Huk et al. 2002; Amano et al. 2009; Kolster et al. 2010) and color-selective cortex (Hadjikhani et al. 1998; Wade et al. 2002; Brewer et al. 2005; Hansen et al. 2007; Winawer et al. 2010). The visual field maps in hMT+ have been subject to relatively minor discussions. The visual field map layout around the color-selective cortex, on the other hand, is intensely debated (Hadjikhani et al. 1998; Wade et al. 2002; Brewer et al. 2005; Hansen et al. 2007; Winawer et al. 2010). It is not the color-selective responses that are debated, but the organization of the visual field maps and monkey homologies. What is clear is that this part of cortex differs from that of monkeys. Only recently, Winawer and colleagues realized that this region contains artifacts introduced by a particular vein, the transverse sinus, which can explain some of the controversies surrounding this region (Winawer et al. 2010).

In higher-order visual cortex, the identification of functional specializations has been quite distinct from efforts defining visual field maps. Recently, these research fields have started to overlap, from the suggestion of a large-scale relationship between retinal position and functional specializations (Levy et al. 2001; Hasson et al. 2002) to the identification of visual field maps in regions such as lateral occipital complex (LOC; Larsson and Heeger 2006; Amano et al. 2009) and parahippocampal place area (PPA; Arcaro et al. 2009). Often two or more visual field maps are found, suggesting that these regions may contain more areas based on topographic criteria than traditional functional specialization definitions. Based on these observations, Wandell and colleagues suggested that visual field map clusters organized around a common eccentricity map might relate to functional specializations (Wandell et al. 2005).

The cortical region processing motion, first defined by Zeki et al. (1991), is now known as hMT or visual area 5 (V5). Using fMRI, this region has been observed many times by contrasting fMRI signals elicited by visual motion stimuli and their stationary counterparts (see, for example, McCarthy et al. 1994; Tootell et al. 1995; Dumoulin et al. 2000). In monkey cortex, several other motion-selective cortical areas surround MT; human homologues of these areas are likely included when using a functional localizer in an fMRI experiment. To acknowledge this degree of imprecision, this region is typically referred to as hMT+ (DeYoe et al. 1996). Motion-selective responses are found not only in hMT+ but also in many other distinct cortical patches (Dupont et al. 1994; Braddick et al. 2001; Culham et al. 2001) and, in particular, in V3A, in humans but not in macaques (Tootell et al. 1997; Vanduffel et al. 2001).

The ventral pathway in particular has seen a proliferation of functionally defined areas (Fig. 15.5). These regions are typically defined by contrasting fMRI signals elicited by different visual stimulus categories and/or their scrambled counterparts. These areas are named after their rough anatomical location or their presumed function. They include LOC (Malach et al. 1995), fusiform face area (FFA; Kanwisher et al. 1997), PPA (Epstein and Kanwisher 1998; Maguire et al. 1998; Epstein et al. 1999), extrastriate body area (EBA; Downing et al. 2001; Peelen and Downing 2007), and visual word form area (VWFA; Puce et al. 1996; Cohen et al. 2000). Except for LOC, all these names indicate the presumed function of the region.

Fig. 15.5

Functional specializations in visual cortex. The schematic diagram illustrates the typical organization of major cortical regions implicated in processing fundamental perceptual qualities in visual images. The cortical patches and their most frequently used acronyms are indicated for regions proposed to selectively process color (yellow), motion (turquoise, hMT+ or V5), faces (red, FFA), places (blue, PPA), bodies (purple, EBA), visual word forms (green, VWFA), and visual objects (orange, LOC). The motion- and body-selective regions, and a large part of the object-selective regions, are on the lateral surface. (Drawn after Wandell et al. 2006; Op de Beeck et al. 2008; Wandell et al. 2009; Kanwisher 2010)

The cortical region where intact objects elicit stronger responses than their scrambled counterparts defines LOC (Malach et al. 1995). It extends from lateral occipital to ventral occipital cortex (Fig. 15.5). Most of the other regions mentioned in the previous paragraphs overlap to some degree with the original LOC region. The term “complex” acknowledges that this region consists of several visual areas. Early visual cortex (V1) is often also modulated by the contrast between intact and scrambled objects but in an opposite fashion, that is, fMRI signal amplitudes are higher for scrambled images (Grill-Spector et al. 1998; Lerner et al. 2001; Murray et al. 2002; Rainer et al. 2002; Dumoulin and Hess 2006; Fang et al. 2008). Stronger responses to scrambled objects have been interpreted as feedback from predictive coding mechanisms (Murray et al. 2002; Fang et al. 2008) or incomplete match of low-level image statistics (Rainer et al. 2002; Dumoulin and Hess 2006). Several studies show that fMRI signals in LOC, but not lower visual areas, are correlated with object perception (Grill-Spector et al. 2000; James et al. 2000; Bar et al. 2001; Avidan et al. 2002; Carlson et al. 2007).

One patch of visual cortex is specifically responsive to faces (Sergent and Signoret 1992; Haxby et al. 1996; Puce et al. 1996; Kanwisher et al. 1997). It was termed the FFA (Kanwisher et al. 1997). This patch of visual cortex responds most strongly to visual stimuli containing faces. In an fMRI-guided electrophysiology experiment, Tsao and colleagues demonstrated that monkey regions found using similar fMRI experimental protocols contain a very large proportion of face-responsive neurons, if not exclusively such neurons (Tsao et al. 2006). This view of FFA has not been without opposition. Some have argued that the FFA is not specialized for faces per se, but for expertise, and we are experts at recognizing faces (Gauthier et al. 2000; Xu 2005). In addition to FFA, selective responses to visual faces have been found in other regions (Grill-Spector 2003; Rajimehr et al. 2009; Kanwisher 2010). Others have proposed that FFA itself consists of several distributed face-selective patches (Pinsk et al. 2009; Weiner and Grill-Spector 2010). Together these proposals suggest that face perception, like motion perception, may be an emergent property of a large cortical network rather than of a single cortical site (Rossion et al. 2003).

These reservations hold for the other abovementioned areas implicated in functional specialization as well. Haxby and colleagues proposed that, rather than containing clearly separated loci of functional specialization, the ventral cortex contains widely distributed and overlapping representations. Using a pattern classification approach (see Chap. 23), they demonstrated that the different stimulus categories could be identified from responses in visual cortex, even when the regions thought to be specialized in processing the categories, such as FFA for faces, were removed from the analysis (Haxby et al. 2001; O’Toole et al. 2005).

Using fMRI and other imaging techniques, regions implicated in functional specializations are identified by comparing signal amplitudes elicited by viewing two (or more) tightly controlled synthetic stimulus categories. Yet, knowledge acquired with these synthetic stimuli and tasks is supposed to extrapolate to real-life situations. Recent studies confirm that these functional specializations are preserved during uncontrolled natural viewing of movies (Bartels and Zeki 2004; Hasson et al. 2004). The modularity is also preserved when morphing stimuli from one stimulus category to another. For example, when morphing a face into a house, the fMRI activity patch does not systematically shift from FFA to intermediate positions and then to PPA; rather, signal amplitudes decrease in FFA and increase in PPA (Tootell et al. 2008; Goesaert and Op de Beeck 2010). Like the visual field maps, functionally defined areas are used to constrain the brain areas under consideration. This has the same advantage of increasing the SNR. This type of ROI analysis based on function has been subject to different critiques (see, for example, Friston et al. 2006; Saxe et al. 2006). Unlike visual field mapping, ROI analysis based on functional definitions should take care that the functional definition of the area is independent of the function examined in the main experiment; a lack of independence can lead to invalid results, a fallacy that has been pointed out on several occasions (Kriegeskorte et al. 2009; Vul et al. 2009).
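The independence requirement can be made concrete with a simulation on pure noise. The sketch below is illustrative only: the voxel and trial counts, the selection of the 50 most "selective" voxels, and the names `condition_difference`, `circular_effect`, and `independent_effect` are assumptions. Voxels are selected with one data set, and the selected ROI is then evaluated either in the same data (circular) or in an independent data set.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials = 2000, 20


def condition_difference():
    """Mean response difference (A minus B) per voxel; data are pure noise."""
    cond_a = rng.normal(size=(n_trials, n_voxels))
    cond_b = rng.normal(size=(n_trials, n_voxels))
    return cond_a.mean(axis=0) - cond_b.mean(axis=0)


localizer = condition_difference()   # data used to define the ROI
main = condition_difference()        # independent data from the same voxels

# Define the ROI as the 50 voxels with the largest A > B difference.
roi = np.argsort(localizer)[-50:]

# Circular: test the effect in the same data that selected the voxels.
circular_effect = localizer[roi].mean()
# Independent: test the effect in data not used for selection.
independent_effect = main[roi].mean()

print(f"circular estimate:    {circular_effect:.3f}")
print(f"independent estimate: {independent_effect:.3f}")
```

Although no voxel carries any true effect, the circular estimate is clearly positive because selection capitalizes on noise, whereas the independent estimate stays near zero.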

Subcortical Nuclei

In addition to the cortex, there are several subcortical nuclei that also process visual information with specific functional specializations. The most prominent nuclei are the LGN, the superior colliculus, and the pulvinar. fMRI measurements readily cover these nuclei, and they are readily identified based on their anatomical locations. On the other hand, the small sizes of the subcortical nuclei and their vicinity to large (pulsating) vasculature hinder fMRI measurements. Advances in imaging technologies, including high-resolution imaging and physiological noise suppression, have increased access to these structures in humans.

The best-known subcortical structure in the visual system is the LGN. The LGN is an intermediate nucleus transmitting signals from the retina to primary visual cortex. Traditionally, it is thought of as a passive relay station. In line with this idea, the receptive field properties of the retinal ganglion cells and LGN neurons are very similar. On the other hand, the LGN receives input from V1, thalamic, and brain-stem nuclei, and these non-retinal contributions account for 80–95 % of all the LGN inputs. These non-retinal inputs are thought to modulate signal transmission from the retina to the visual cortex. Consequently, the LGN is thought of as a gatekeeper rather than a passive relay station (Singer 1977; Burke and Cole 1978; Crick 1984; Sherman and Koch 1986; Sherman and Guillery 2002; Saalmann and Kastner 2009; Fig. 15.6).

Fig. 15.6

T-statistical maps of a single subject indicating fMRI responses elicited by visual stimulation overlaid on coronal (left) and axial (right) anatomical images. The LGNs are highlighted with dashed lines. (Adapted from Mullen et al. 2008)

Many independent laboratories have repeatedly measured fMRI signals from the LGN (Buchel et al. 1997; Chen et al. 1998b; Miki et al. 2000; Fujita et al. 2001; Kastner et al. 2004; Lu et al. 2008), characterized some of its response properties to different stimulus manipulations (Kastner et al. 2004; Schneider et al. 2004; Mullen et al. 2008), and examined its role in clinical conditions such as amblyopia (Miki et al. 2003; Hess et al. 2009; Hess et al. 2010). fMRI has revealed influences from surprisingly high-level cognitive processes and motor events, such as perceptual states (Haynes et al. 2005; Wunderlich et al. 2005; see also “Binocular Rivalry”), attention (O’Connor et al. 2002; Schneider and Kastner 2009; see also “Attention”), visual imagery (Chen et al. 1998a), saccades (Sylvester et al. 2005; Sylvester and Rees 2006), and blinking (Bristow et al. 2005). Imaging of functional subdivisions of the LGN requires several measuring sites within the small LGN (approximately 120 mm³; Andrews et al. 1997). High-resolution fMRI protocols have reconstructed functional subdivisions and visual field map representations in humans (Chen et al. 1999; Schneider et al. 2004) and cats (Zhang et al. 2010). fMRI allows simultaneous measurements of the LGN and visual cortex. This makes fMRI an ideal method to study the relationship between them. Similar to the reported anatomical covariation of the LGN and V1 (Andrews et al. 1997), LGN activation sizes correlate with those in visual cortex (Chen and Zhu 2001). This covariation may depend on stimulus characteristics. Mullen and colleagues have suggested that signals of certain neural populations are selectively amplified between the LGN and V1, in line with a modulatory role of the LGN (Mullen et al. 2008).

The superior colliculus is a layered nucleus located in the roof of the brain stem. It is extensively studied in nonhuman animals. The superior colliculus is a key component in a network mediating saccadic eye movements, fixations, and directed attention. Superficial layers receive direct input not only from the retina but also from visual cortex and FEFs. Deeper layers receive input from a range of cortical and subcortical regions involved in sensory and motor functions (Wurtz and Albano 1980; Sparks 1988). Human measurements from the superior colliculus are hindered by its small size and proximity to large pulsating vasculature. Currently, only a few laboratories have reported fMRI responses from the superior colliculus, including a reconstruction of a coarse visual field map (DuBois and Cohen 2000; Schneider and Kastner 2005; Sylvester et al. 2007; Wall et al. 2009).

The pulvinar lies in the dorsolateral posterior thalamus and consists of several nuclei. It receives input from the retina and a series of subcortical and cortical regions. The retinal input, however, is not thought to make a dominant contribution to its response properties. Instead, the pulvinar appears to receive its primary input from the cortex, and it has extensive reciprocal connections with virtually all visual cortical areas. Therefore, in contrast to the LGN, the pulvinar is considered a higher-order subcortical nucleus. Its functions are not well understood, but include visuomotor processing, attention, and complex processing of visual stimuli in conjunction with the cortex; it may also play a role in integrating information from different cortical regions (Robinson and McClurkin 1989; Grieve et al. 2000; Sherman and Guillery 2002; Casanova 2004). A few studies have observed fMRI signals in the pulvinar, and attentional manipulations seem important (Yantis et al. 2002; Kastner et al. 2004). Some nuclei within the pulvinar can discriminate small shifts in the stimulus position (Fischer and Whitney 2009) and others have contralateral hemifield representations (Cotton and Smith 2007).

fMRI Adaptation

From the functional specialization literature, new data-analysis techniques have emerged. Information decoding algorithms (Haxby et al. 2001; Norman et al. 2006) are discussed in detail in Chap. 23. Another technique is commonly referred to as fMRI adaptation (fMRI-A; Grill-Spector et al. 1999), and is also known as repetition–suppression or repetition priming (Buckner and Koutstaal 1998). The technique is grounded in a long history of psychophysical and electrophysiological research; a long exposure to a given orientation, motion, or face will change perception.

In adaptation, the response to a given stimulus decreases if a similar stimulus was recently presented. There are many unknowns about the exact mechanism underlying the decreased—adapted—response. Yet, despite these unknowns and cautionary remarks (Hegde 2009), fMRI adaptation has been used to provide insight into whether the same neurons or different neurons are processing a given stimulus dimension—adaptation is only expected when the same neurons are processing the two sequential stimuli (Grill-Spector and Malach 2001; Krekelberg et al. 2006).

The experimental rationale is as follows: Two or more stimuli are presented sequentially. If the same neural population processes all stimuli, adaptation is expected, and hence the fMRI signal will decrease in amplitude for the second and later stimulus presentations. If, on the other hand, distinct neural populations process the stimuli, no decrease in amplitude is expected. Both scenarios can be expected within the same brain, but at different stages of the visual processing hierarchy. Three examples of the technique of fMRI adaptation are given in the following paragraphs.

One of the first to use this technique in fMRI studies was Tootell et al. (1998b). Tootell and colleagues reconstructed the orientation tuning width of V1 neurons using fMRI adaptation. In these experiments, gratings with different orientations were presented sequentially. The orientation difference was varied: Smaller orientation differences between successive gratings adapt similar neurons and decrease the fMRI amplitude, whereas larger orientation differences cause less adaptation and, consequently, smaller decreases in the fMRI signal amplitude. The orientation tuning width was then reconstructed by comparing the signal decreases as a function of the orientation difference of sequential gratings.
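How a tuning width could in principle be read out from such adaptation data can be illustrated with a toy population model. The sketch below is not the analysis of Tootell et al. (1998b); it assumes Gaussian orientation tuning, a uniform distribution of preferred orientations, a noise-free population signal, and an adaptation rule in which each neuron's gain drops in proportion to its response to the adaptor (all names and parameter values are hypothetical). Under these assumptions, the signal decrease as a function of orientation difference is itself Gaussian, with a width √2 times the neuronal tuning width, so the tuning width can be recovered from a straight-line fit of the log decrease against the squared orientation difference.

```python
import numpy as np


def wrap(diff):
    """Wrap an orientation difference into [-90, 90) degrees."""
    return (diff + 90.0) % 180.0 - 90.0


def tuning(theta, prefs, sigma):
    """Gaussian orientation tuning of neurons with preferred orientations prefs."""
    return np.exp(-wrap(theta - prefs) ** 2 / (2 * sigma ** 2))


def population_response(test, prefs, sigma, adaptor=None, strength=0.5):
    """Summed population response; adapted gain drops with adaptor response."""
    gain = np.ones_like(prefs)
    if adaptor is not None:
        gain = 1.0 - strength * tuning(adaptor, prefs, sigma)
    return np.sum(gain * tuning(test, prefs, sigma))


prefs = np.arange(-90.0, 90.0, 1.0)     # preferred orientations (degrees)
true_sigma = 15.0                        # assumed neuronal tuning width
deltas = np.array([0.0, 10.0, 20.0, 30.0, 40.0])

unadapted = population_response(0.0, prefs, true_sigma)
decrease = np.array([
    unadapted - population_response(d, prefs, true_sigma, adaptor=0.0)
    for d in deltas
])

# log(decrease) = const - delta^2 / (4 sigma^2)  =>  slope = -1 / (4 sigma^2)
slope, _ = np.polyfit(deltas ** 2, np.log(decrease), 1)
sigma_hat = np.sqrt(-1.0 / (4.0 * slope))
print(f"recovered tuning width: {sigma_hat:.1f} deg")
```

With real fMRI data the decrease would be noisy and the adaptation rule unknown, so any such estimate depends on the assumed model.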

Another illustration is provided by the study of Rokers et al. (2009). They used fMRI adaptation to identify the cortical areas that are selective for three-dimensional (3D) motion. Motion towards or away from an observer is characterized by simultaneous opposite directions of retinal motion in the two eyes. After adapting to opposite directions of motion in the two eyes for some time, the researchers presented a probe that contained the same signals either synchronously or in quick succession. While the synchronous probe produces a percept of 3D motion, the quick-succession probe does not. Early cortical areas that are sensitive to retinal motion per se, such as V1 and V2, showed adaptation in both conditions, but area hMT+ showed much larger adaptation effects for the probe that produced a percept of 3D motion, compared to the probe that did not. This result suggests that area hMT+ contains neurons that are tuned to trajectories of 3D motion, and that such sensitivity does not exist in earlier cortical areas. The use of fMRI adaptation proved critical in obtaining a result that had been elusive in earlier attempts using single-cell recording techniques.

A last example of the use of the fMRI adaptation paradigm is provided by the study of Carlson et al. (2007). They used fMRI adaptation in the object-substitution masking paradigm. In the object-substitution paradigm, a mask presented after a target visual object, but in a distinct retinotopic location, removes the target visual object from the subject’s awareness. They presented another target stimulus after the object-substitution masking paradigm. Besides collecting fMRI data, behavioral responses validated the masking success on a trial-by-trial basis. fMRI adaptation of the second target stimulus was expected when the masking was—behaviorally—unsuccessful, or if, despite successful masking, the neurons still represented the stimulus but without awareness. They showed fMRI adaptation in LOC when the masking was unsuccessful, but no fMRI adaptation when the masking was successful. This result suggests that the mask not only removed the target stimulus from awareness but also removed—or significantly altered—the neural representation of the target objects in LOC.

Organization Principles

The organization of the visual system can be investigated at different spatial scales (Fig. 15.7). In the previous sections, we discussed visual field maps and functional specializations. Both distinctions support the notion of a modular design of visual cortex, with the modules representing visual field maps or functional specializations. Multiple visual field maps suggest that neurons in every visual field map perform a different computation on the visual scene. Hence, each visual field map is hypothesized to contain a unique representation of the visual field. This hypothesis relates the visual field map to the idea of functional specialization. This relationship is supported by the idea that visual areas can be defined based on unique functions, connections, architecture, and visual field map (Van Essen 2003).

Fig. 15.7

A schematic illustration of several theoretical organization schemes in the visual cortex overlaid on a lateral view of a human brain. At the largest scale, two—dorsal and ventral—pathways are distinguished in visual cortex (arrows). At a medium scale, several eccentricity representations—clusters—are dissociated (circles, with the star representing the foveal representation). These clusters may correspond to functional specializations (see Fig. 2.5). At the smallest scale, several visual field maps can be delineated within a given cluster (dashed lines). Primary visual cortex (V1) and representative visual field map naming conventions are indicated (x indicates visual field map number or letter, e.g., VO-1 or V3A, see Fig. 2.4). IPS intraparietal sulcus, V1 primary visual cortex, V3 third visual area, TO temporal occipital, VO ventral occipital, LO lateral occipital

The functional specializations mentioned in “Functional Specialization” are defined based on certain perceptual or phenomenological aspects of a visual scene, for example, motion, color, or faces. Lennie (1998) suggested that the modular organization aids retrieval of perceptually relevant information from the different modules, and it eliminates the need for information from one level to be passed on to the next. Here, however, the computational processes within a visual field map do not have to coincide with perceptual qualities. Indeed, most perceptual functions are associated with multiple visual field maps and even multiple cortical patches. Wandell et al. (2005) noticed that visual field maps are organized in clusters that share a similar eccentricity organization. Within a cluster, visual field maps are distinguished by polar angle (e.g., see Fig. 15.2; V1, V2, and V3 fall within one cluster). Many perceptual functional specializations fall within a cluster. For example, TO maps lie within the motion-selective hMT+ cluster (Amano et al. 2009; Kolster et al. 2010), and the PHC maps fall within the place-selective PPA cluster (Arcaro et al. 2009). Wandell and colleagues proposed that functional specializations for perceptual functions are organized around visual field map clusters rather than single visual field maps.

The cluster theory is reminiscent of the center–periphery organization proposed by Levy et al. (2001). Levy and colleagues proposed that object representations are organized according to central versus peripheral visual field bias. The cluster theory differs in two respects. First, the center–periphery organization was proposed for object-related areas only. Second, Levy and colleagues’ hypothesis proposed a center–periphery organization based on the absence of orderly meridian (polar-angle) representations. This proposal did not anticipate the later discovery, as technology evolved, of several visual field maps with orderly polar-angle representations in object-selective cortex. Several independent laboratories confirmed these orderly polar-angle representations (Larsson and Heeger 2006; Swisher et al. 2007; Amano et al. 2009; Arcaro et al. 2009; Kolster et al. 2010). The cluster theory thus generalizes the object-related center–periphery proposal to a large extent: first, it is founded on the widely accepted visual field map organization in V1, V2, and V3, and, second, it applies to both object-related and non-object-related patches of visual cortex.

At an even larger spatial scale, Ungerleider and Mishkin (1982) proposed another long-standing organizational principle. They proposed that the visual system is organized along two pathways: a ventral pathway identifying what an object is and a dorsal pathway identifying where an object is. This distinction is also interpreted as one between perceptual identification of objects and perception for visually guided actions (Goodale and Milner 1992). Many lines of evidence support these two distinctions, including fMRI studies (James et al. 2002; Culham et al. 2003; Shmuelof and Zohary 2005; Valyear et al. 2006).

Given that we have a modular organization of visual cortex, in terms of both visual field maps and functional specializations, the next question is how the information is integrated between the modules. In nonhuman primates, detailed knowledge of the connections of different visual areas allowed inferences about cortical organization. This has yielded intricate graphs that capture the relationships and information flow between different visual areas (Felleman and Van Essen 1991; Young 1992). Monkey–human homologue questions complicate the extrapolation of these graphs to humans. For example, in humans, novel visual field maps and functional areas have been defined, and different functions have been attributed to similar visual field maps. Both scenarios indicate different connections in humans. Promising avenues to contribute to this type of analysis in humans come from both within and outside the fMRI field (Bullmore and Sporns 2009; Smith et al. 2010).

These proposals of cortical organization relate to the spatial scale of the visual cortex’s organization and are not mutually exclusive. At different spatial scales, these proposals all support the notion of a modular design of the visual system. Marr (1982) compared the modularity of the visual system to principles in computational science. The separation of a complex task into smaller—to some degree independent—modules facilitates modification of the individual modules, whether by a human designer or evolution, without the need for many simultaneous changes elsewhere.

Evolutionarily, the visual word form area (VWFA) differs from the other functional specializations. Reading arose too recently to have significantly influenced our brain evolution. This suggests that at least the VWFA is shaped by experience. However, the VWFA is found in the same place in different individuals and cultures. The VWFA is even reported in blind Braille readers (Reich et al. 2011). To explain this consistency across subjects, Dehaene and colleagues proposed the neuronal recycling hypothesis. According to this hypothesis, new cultural skills such as reading invade evolutionarily older circuits and inherit many of their properties (Dehaene 2005; Dehaene and Cohen 2007).

The modular organization may also be a consequence of individual neuronal limitations. First, a neuron’s processing speed is slow—especially compared to the capabilities of a modern computer’s central processing unit (CPU) (about 30 Hz versus 10⁹ Hz). A modular design may speed up the overall processing time by parallel computing (Feldman and Ballard 1982). Second, a neuron can physically connect directly to only a limited number of other neurons; prioritizing these connections may result in grouping of certain neural populations (Barlow 1986). Minimizing and prioritizing the wiring length and configurations would also have an evolutionary benefit of faster processing. It may also account for the modularity of the visual system at different spatial scales, and it may even explain the anatomical folding pattern of the cortex itself (Van Essen 1997).

Visual Perception

Visual perception is initiated by retinal stimulation, and it is also guided by the brain’s existing knowledge about the visual world. The visual system reconstructs the 3D environment from the 2D retinal projection in each eye. This 2D to 3D reconstruction is inherently ambiguous, and to solve this “inverse optics problem,” the brain cannot rely on the retinal image alone. Rather, we interpret the retinal image based on existing knowledge about our environment. Many important investigators recognized this relationship between the physical sensory input and our perceptual interpretation. As early as about AD 360, Nemesius (1636) wrote: “[visual perception] hath brought together, both that which was before seen and that which is present likewise, in our sight.” Similarly, von Helmholtz (1867) wrote: “objects are always imagined as being present in the field of vision as would have to be there in order to produce the same impression on the nervous mechanism.”

Along the transformation pathway from retinal stimulation to perception, we do not expect the activity of every neuron to correlate with perception. Based on a hierarchical model of vision, activity in higher visual areas is assumed to correlate more with perception, whereas the activity in lower visual areas may correspond more with retinal stimulation. Many visual areas may contain a mixture of representations that may also depend on the specific stimulus and task. Cases of both retinal stimulation without perception and perception without retinal stimulation have been documented. Based on V1 signals—but not V2 or V3—perceptually invisible stimuli can be successfully identified (Haynes and Rees 2005b). Top-down—cognitive—influences such as attention and visual imagery can reach early stages of visual processing, from extra-striate cortex to V1 to subcortical nuclei (Pessoa et al. 2003; Boynton 2005; Yantis 2008). One way to relate fMRI signals to perception is to correlate fMRI signals with behavioral—perceptual—measurements. For example, Grill-Spector showed that the fMRI signal amplitude is correlated with object recognition performance in LOC but less so in V1 (Grill-Spector et al. 2000). In V1, fMRI signal amplitude corresponds to the likelihood of the subject detecting a stimulus (Ress et al. 2000; Ress and Heeger 2003).

Binocular Rivalry

The discrepancy between the physical image properties and our perception is the basis of numerous visual illusions. In visual illusions, percepts are dissociated from retinal stimulation. Therefore, another way to relate fMRI signals to perception is to use visual illusions. In particular, binocular rivalry has been used to study perception-related activity or even to elucidate the neural correlates of consciousness (Myerson et al. 1981; Crick and Koch 1998). In binocular rivalry, a different stimulus is presented to each eye (Fig. 15.8). These two stimuli are incongruent and cannot be fused into a coherent percept. Thus, even though physically both stimuli remain unchanged and are presented simultaneously, visual perception alternates between the two stimuli (Wheatstone 1838; Alais and Blake 2005). Using fMRI, neural correlates of binocular rivalry percepts have been reported at different stages, ranging from extra-striate cortex (Lumer et al. 1998; Tong et al. 1998; Brouwer et al. 2005), to V1 (Polonsky et al. 2000; Tong and Engel 2001; Haynes and Rees 2005a; Lee et al. 2005; Lee et al. 2007), and as early as the LGN (Haynes et al. 2005; Wunderlich et al. 2005).

Fig. 15.8

Schematic illustration of binocular rivalry. Two different stimuli are presented to each eye. In this example, the stimuli consist of oblique gratings. These two stimuli do not change over time. Visual perception—subjective experience—alternates between the two stimuli

In contrast with electrophysiology, the fMRI signals have been correlated with perception at surprisingly early stages. In binocular rivalry, electrophysiology studies report little to no evidence of neural spiking rates correlating with perception in V1 (Leopold and Logothetis 1996) and LGN (Lehky and Maunsell 1996). The site of rivalry may depend on the nature of the visual stimulation (Wilson 2003; Freeman 2005; Hohwy et al. 2008). The difference may be attributed to the different sensitivities of the two methods (Boynton 2011). But this contrast can also be explained by the finding that, in V1, the neural correlates of the perceptual alternations are present only in the low-frequency local field potentials (LFPs) but not in high-frequency LFPs or spiking rates (Maier et al. 2008). Unlike spiking activity, LFPs mainly reflect subthreshold activity, such as synaptic potentials, voltage-dependent membrane oscillations, and spike afterpotentials (Logothetis and Wandell 2004). Fries et al. (1997) suggested that the neural synchrony of the neural populations coding for the different rivalry stimuli varies, which may be reflected in LFP signal changes but not spiking rates. Though spiking activity and LFPs are generally correlated, fMRI is more sensitive to LFPs (Logothetis et al. 2001; Lauritzen and Gold 2003; Logothetis and Wandell 2004), and this may explain the discrepancy between fMRI and electrophysiological measurements of spiking rates during binocular rivalry. In sum, the quest for the neural correlate of conscious perception is still open, and fMRI studies highlight subthreshold processing and the participation of early cortical and subcortical regions in perception.

Attention

Not all aspects of the visual scene are processed equally; attention selectively concentrates on certain aspects, while ignoring others. Attention changes how sensory information is processed, though it will not affect all aspects of sensory processing equally. As such, attention plays a central role in perception (James 1890).

Where is the site of attentional modulations in visual tasks? Corbetta and colleagues, using PET, found that selective attention to speed, color, and shape enhanced activity in regions implicated in processing the selected attribute. Using fMRI, many investigators have confirmed and extended these findings. Without changes in stimuli, regions implicated in functional specializations are modulated when shifting attention to and from the attribute of interest, such as motion (Beauchamp et al. 1997; O’Craven et al. 1997; Buchel et al. 1998; Chawla et al. 1999), color (Chawla et al. 1999), faces (Wojciulik et al. 1998; O’Craven et al. 1999), and places (O’Craven et al. 1999). Besides modulating activity in regions implicated in functional specializations, attention to specific retinotopic locations, without changes in retinal stimulation, can be used to reconstruct visual field maps (Tootell et al. 1998a; Brefczynski and DeYoe 1999). These attentional modulations have been reported in surprisingly early stages of visual processing, including primary visual cortex (Tootell et al. 1998a; Watanabe et al. 1998a, b; Brefczynski and DeYoe 1999; Gandhi et al. 1999; Kastner et al. 1999; Martinez et al. 1999; Somers et al. 1999; Liu et al. 2005) and subcortical nuclei, including the LGN (O’Connor et al. 2002; Schneider and Kastner 2009).

Attention changes the gain of neural responses and hence behavior (Desimone and Duncan 1995; Kanwisher and Wojciulik 2000; Kastner and Ungerleider 2000; Treue 2001; Boynton 2005; Reynolds and Heeger 2009). Based on electrophysiological studies, this change may be a multiplicative response gain or more in line with a change in the contrast gain. Other studies suggested an attention-dependent change in tuning functions. In line with these theories, human fMRI studies suggest that these increased responses may reflect a multiplicative gain in response profiles (Saproo and Serences 2010), an increase in the response selectivity (Murray and Wojciulik 2004), and an increase in suppressive interactions (Kastner et al. 1998). Recently, Reynolds and Heeger (2009) proposed a model that captures the variety of response modulations. In this model, a so-called attention field scales the stimulus drive before the neural responses are normalized, and the model exhibits each of these different response modulations depending on the stimulus and attentional manipulations. Using behavioral measurements and fMRI, Herrmann et al. (2010) validated this model, showing that behavior can exhibit both multiplicative response gains and contrast gains that correlate with attention field sizes as measured with fMRI.
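The dependence of the attentional effect on stimulus size and attention field size can be illustrated with a deliberately reduced, one-dimensional sketch of a normalization model of attention. This is not the published implementation of Reynolds and Heeger (2009): the Gaussian profiles, the function names, and all parameter values (pool width, semi-saturation constant, attentional gain of 2) are assumptions chosen only for illustration.

```python
import math


def gaussian(x, width):
    """Unnormalized Gaussian profile centered on 0 with the given width (SD)."""
    return math.exp(-0.5 * (x / width) ** 2)


def response(contrast, stim_width, attn_width, attn_gain,
             pool_width=5.0, sigma=0.01):
    """Normalized response of a model neuron located at position 0.

    All parameter values are illustrative assumptions, not fitted values.
    """
    positions = [i * 0.25 for i in range(-80, 81)]  # 1D space from -20 to 20
    # Stimulus drive at every position, scaled by the attention field.
    drive = []
    for x in positions:
        stimulus = contrast * gaussian(x, stim_width)
        attention = 1.0 + (attn_gain - 1.0) * gaussian(x, attn_width)
        drive.append(attention * stimulus)
    # Suppressive drive: the same attended drive pooled over a wide region.
    weights = [gaussian(x, pool_width) for x in positions]
    suppression = sum(w * d for w, d in zip(weights, drive)) / sum(weights)
    excitation = drive[80]  # index 80 corresponds to position 0
    return excitation / (suppression + sigma)


def attention_effect(contrast, stim_width, attn_width):
    """Ratio of attended to unattended response (gain 2 versus gain 1)."""
    attended = response(contrast, stim_width, attn_width, attn_gain=2.0)
    unattended = response(contrast, stim_width, attn_width, attn_gain=1.0)
    return attended / unattended


for label, stim_w, attn_w in [
    ("small stimulus, large attention field", 0.5, 10.0),
    ("large stimulus, small attention field", 10.0, 0.5),
]:
    low = attention_effect(0.001, stim_w, attn_w)
    high = attention_effect(1.0, stim_w, attn_w)
    print(f"{label}: effect at low contrast {low:.2f}, at high contrast {high:.2f}")
```

Under these assumptions, a small stimulus with a large attention field yields an attentional effect that is strong at low contrast and nearly vanishes at high contrast, which resembles a contrast gain change. A large stimulus with a small attention field yields an effect that persists at high contrast, which resembles a response gain change.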

Attention modulates neural responses in the visual system, but this does not mean that these changes originate there. Indeed, attention relies on non-sensory brain functions such as intention, planning, and memory. Consequently, attention-related modulations of the visual system are accompanied by widespread activity in a network of frontal and parietal brain regions (Kanwisher and Wojciulik 2000; Corbetta and Shulman 2002). Activity in these frontoparietal regions is also observed during periods without visual stimulation when an item is anticipated, indicating that this activity is directly related to attention allocation, and, in turn, modulates sensory responses when a stimulus is present (Kastner et al. 1999). These frontoparietal regions show a strong overlap with those associated with planning eye movements, consistent with a tight functional relation between selecting input through attention and through redirection of gaze (Corbetta et al. 1998). Interestingly, a very similar network of brain regions is activated at the time of perceptual changes in the paradigm of binocular rivalry discussed above (Lumer et al. 1998; Sterzer et al. 2009; Knapen et al. 2011). This result suggests a relationship between the allocation of attention and the formation of a conscious percept.

Disorders of the Visual System

Investigations of visual system disorders take advantage of the detailed knowledge of the visual system layout. V1, in particular, is often studied because—almost—all visual information passes through V1. In addition, V1 is the largest visual field map on the cortex, reliably located in and around the calcarine sulcus (Stensaas et al. 1974), and it is routinely mapped using fMRI (“Measuring Visual Field Maps Using fMRI”). Complete removal of V1 results in—cortical—blindness. Local damage or nonfunctional regions in V1 result in corresponding blind spots—called scotomas—in the visual field (Holmes 1918). Lesions in V2/V3 may have a similar consequence (Horton and Hoyt 1991a), whereas lesions in higher visual cortex may yield more complex and specific deficits but not blindness (see “Functional Specialization”).

Yet, subjects with V1 lesions may retain limited visual capabilities in these blind regions. These residual visual capabilities—if any—are mostly unconscious (“blindsight”; Poppel et al. 1973; Weiskrantz 1990; Stoerig and Cowey 1997), but may be conscious (“Riddoch syndrome”; Riddoch 1917; Zeki and Ffytche 1998; Giaschi et al. 2003). When these residual visual capabilities are unconscious, the subject claims to have no awareness of any stimulus presentation, but, when pushed to make a choice or guess, performs above chance level. These residual visual capabilities are generally attributed to direct connections between the LGN, superior colliculus, pulvinar, and extra-striate cortex (Cowey and Stoerig 1991; Sincich et al. 2004; Leh et al. 2006). The results have to be interpreted carefully; in certain cases, spared islands in V1 may underlie blindsight (Fendrich et al. 1992, 2001), or healthy V1 may be reached due to light scatter in the eye (Faubert et al. 1999). Using fMRI in humans, visual stimulation in the blind visual fields can activate extra-striate cortex after local V1 lesions (Baseler et al. 1999; Goebel et al. 2001; Morland et al. 2004) and after complete removal of one hemisphere (hemispherectomy; Bittar et al. 1999). In nonhuman primates, where the V1 lesions are under tight experimental control, activations have also been reported in extra-striate cortex—as early as V2 (Schmid et al. 2009). Subsequent experiments demonstrated a causal role of the LGN in these extra-striate fMRI signals, providing support for the notion of a connection between the LGN and extra-striate cortex that bypasses V1 (Schmid et al. 2010).

Congenital and developmental disorders can drastically alter the layout of V1 and visual cortex. For instance, in the absence of a functional central retina due to inherited photoreceptor abnormalities, peripheral retinal signals may occupy central parts of V1 (Baseler et al. 2002), tactile information may invade V1 in a retinotopically specific manner in visually impaired subjects (Cheung et al. 2009), and the V1 hemifields normally divided across the two hemispheres may be found in the same hemisphere up to a certain eccentricity in albino subjects (Hoffmann et al. 2003) or completely in a subject born with only one hemisphere (Muckli et al. 2009). Developmental disorders may not only alter V1 organization but can also preserve V1 organization in anatomically abnormal cortex. An intact V1 and normal visual perception suggest normal visual functions, even when found within large anatomical malformations such as polymicrogyria (Dumoulin et al. 2007).

In adults, the degree to which visual cortex is able to reorganize is subject to intense dispute (Baseler et al. 2009; Gilbert et al. 2009; Wandell and Smirnakis 2009). Smirnakis et al. (2005) demonstrated limited plasticity in the adult visual system of macaques. Their thorough investigation entailed both fMRI and electrophysiology over a period of 7.5 months after retinal lesions. They failed to find evidence of plasticity in adult visual cortex, causing a reinterpretation of existing data (Smirnakis et al. 2005; Wandell and Smirnakis 2009) and an upset in the—mainly non-fMRI—plasticity literature (Calford et al. 2005). Although Smirnakis and colleagues also used electrophysiological techniques, Calford et al. (2005) questioned the use of fMRI to measure reorganization because of the many uncertainties associated with the fMRI signal. However, fMRI allows these plasticity questions to be pursued in subjects typically inaccessible to invasive approaches. Another example of limited plasticity of adult visual cortex is provided by the limited success of sight recovery from early blindness in adult life. Subjects whose sight—or more precisely, the optics in the eye—has been restored in adult life after having grown up blind are severely limited in their visual performance even many years after sight recovery (Gregory and Wallace 1963; Fine et al. 2003; Ostrovsky et al. 2006). Despite relatively normal eye responses, continuing deficits in cortical organization limit the visual abilities of these subjects (Fine et al. 2003; Saenz et al. 2008; Levin et al. 2010).

Part of the debate about adult plasticity is based on a widely publicized fMRI finding related to macular degeneration. Macular degeneration destroys the central retina, also known as the fovea or macula, resulting in a visual blind spot (scotoma). Central visual loss is particularly problematic, because the fovea is a specialized region that represents the image with the highest spatial acuity. In addition to juvenile variants, age-related macular degeneration is the leading cause of visual impairment in people over the age of 50 (Leibowitz et al. 1980). Due to the cortical magnification factor, macular degeneration deprives a large cortical surface area of retinal input. These deprived regions of visual cortex can roughly be identified based on the canonical layout of the—healthy—visual system (see, e.g., Fig. 15.3). Surprisingly, Baker et al. (2005) found that these regions deprived of visual input could still respond to visual stimulation, not when stimulating the central, degenerated retina, but when stimulating peripheral retina less affected by the degeneration. They interpreted these results as evidence of large-scale reorganization in visual cortex.
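The cortical magnification argument above can be made concrete with a small numerical sketch. It assumes a commonly used inverse-linear form for linear magnification in V1, M(E) = A / (E + E2), where E is eccentricity in degrees. The constants A and E2 below, the 90 deg field limit, and the function names are illustrative assumptions and are not taken from this chapter. The sketch also considers only distance along the eccentricity axis, not cortical surface area.

```python
import math

A_MM = 17.3    # assumed scaling constant (mm), illustrative value
E2_DEG = 0.75  # assumed eccentricity offset (deg), illustrative value


def magnification(ecc_deg):
    """Linear cortical magnification (mm of cortex per deg of visual angle)."""
    return A_MM / (ecc_deg + E2_DEG)


def cortical_distance(ecc_deg):
    """Cortical distance (mm) from the foveal representation to ecc_deg.

    This is the integral of magnification() from 0 to ecc_deg.
    """
    return A_MM * math.log((ecc_deg + E2_DEG) / E2_DEG)


def deprived_fraction(scotoma_radius_deg, max_ecc_deg=90.0):
    """Fraction of the eccentricity axis of V1 covered by a central scotoma."""
    return cortical_distance(scotoma_radius_deg) / cortical_distance(max_ecc_deg)


radius = 10.0  # hypothetical scotoma radius in deg
print(f"visual field fraction: {radius / 90.0:.2f}")
print(f"cortical extent fraction: {deprived_fraction(radius):.2f}")
```

Under these assumed constants, a central scotoma with a 10 deg radius spans about a tenth of the eccentricity range of the visual field but roughly half of the corresponding cortical extent, which illustrates why central retinal damage deprives a disproportionately large part of V1 of input.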

Several independent labs have now replicated this finding (Baker et al. 2008; Masuda et al. 2008; Schumacher et al. 2008; Dilks et al. 2009; Liu et al. 2010) though not in all subjects (Sunness et al. 2004; Masuda et al. 2008; Baseler et al. 2009, 2011). The same phenomenon has also been replicated in other types of retinal degeneration, such as retinitis pigmentosa, a condition that damages the peripheral retina leaving the subject with only central vision (Masuda et al. 2010). There are many differences between these patients: the distinction between juvenile and age-related macular degeneration, the completeness of the retinal degeneration, and the development of a peripheral preferred retinal locus are all factors that may affect the results. Masuda et al. (2008, 2010) and Liu et al. (2010) suggested that these signals are mediated by the subject’s task, which could explain the discrepancies between different studies. They advocated that these central fMRI signals reflect an imbalance in the feed-forward and feedback signals, an explanation also originally proposed as a possibility by Baker et al. (2005). But, because this explanation does not require any changes in cortical circuitry, Masuda and colleagues opposed the notion that these fMRI signals reflect reorganization of the visual system. Basically, due to the complexity of the neural networks in our brain, there is more than one way to reach the neurons in primary visual cortex, and damage to any part may cause unexpected behavior. Models of neural circuitry and the ability to simulate damage to this circuitry are therefore essential, independent of the experimental technique that is used (Wandell and Smirnakis 2009).

The terms “plasticity” and “reorganization” are ubiquitous in studies of visual disorders, but these terms are ill defined. Using fMRI, the most basic definition is that the obtained fMRI signals are not observed in control subjects. The neural basis of these terms is likewise vague, and the interpretation ranges from changes in synapse strength, to growing new connections between neurons, either dendrites or axons, to growing new neurons altogether. These neural changes also vary, in the same order, from being generally accepted, as for processes underlying standard learning activities, to unresolved, as for the processes underlying new dendrite, axon, or neuron creation. In short, care should be taken not to label any unexpected fMRI signals a priori as reorganization or plasticity of the underlying neural circuitry, and steps should be taken to specify the implied mechanism.

Conclusion

fMRI has provided several insights into the organization and function of visual cortex. It has provided a detailed image of the organization of visual cortex with a multitude of functional specializations and an increasing number of visual field maps extending into all four lobes. fMRI is one of the few techniques that is readily applied to both human and nonhuman primates, and thereby it facilitates the extrapolation of detailed findings from invasive techniques to humans. Besides providing a vehicle to integrate results between the species, fMRI has also identified several species differences, and it outlines the limits of extrapolating findings from nonhuman species to humans. A surprising finding of fMRI is the marked influence of cognitive events on the early visual system. Cognitive phenomena, such as attention and correlates of conscious perception, may influence the fMRI signals as early as V1 and the LGN. The noninvasive nature of fMRI allows investigations of clinical manifestations of human visual cortex and allows these measurements to be related to behavioral findings. Taken together, fMRI and the development of data-analysis techniques that take advantage of the rich amount of information in fMRI signals provide insights into the structural organization and function of the visual system that could not be arrived at using more traditional anatomical, behavioral, and neurophysiological techniques.

Future directions of fMRI of the visual system will continue to go beyond straightforward measures of the presence or absence of significant fMRI signal amplitudes (activity). New data-analysis techniques will extract more information from the fMRI signals, push through the hemodynamic filter, and provide a tighter link to the underlying neural population. Already several new data techniques have emerged that rely on adaptation phenomena (“fMRI Adaptation”), look beyond single locations to information contained across multiple recording sites (Chap. 23), and fit quantitative neural models to the fMRI signals (“Neural Model-Based Approaches”). Quantitative descriptions of fMRI data will be vital in future research, and they will add to the ability to link the data across different species and measurement techniques. These quantitative measurements will be invaluable when shifting questions from where to how the visual system processes information, including the question of neural communications between different cortical regions and the neural correlate of perception.