
…without selective interest, experience is an utter chaos—James (1890).

Attention, the “focalization” or “taking possession” of environmental stimuli or lines of thought (James 1890), can refer to a general level of alertness, focus on a specific object, or a shift in the point of gaze (Colombo 2001). Eye movements allow us to quickly form an impression of our surroundings and to make moment-to-moment, context-appropriate decisions about our actions. Because eye movements are directly observable, they are good targets for investigations of how infants and children learn about the visual world.

Infants inhabit the same world as do adults and encounter the same visual scenes. How do they meet the challenges of seeing and interpreting the visual environment? Is the infant’s visual system sufficiently functional and organized to make sense of the world and able to bind shapes, colors, and textures into coherent forms and to perceive objects, people, and places? Or does the infant’s visual system require a period of maturation and experience within which to observe and learn, to coordinate visual and manual skills, to recognize and utilize individual visual cues, and to integrate auditory, haptic, and visual information? These questions are rooted in the perennial “nature-nurture” debate (Johnson 2010; Spelke 1990), and they continue to motivate research investigations of the development of visual-spatial attention. The studies reviewed in this chapter reveal that development of visual-spatial attention is a function of growth, maturation, and experience from learning and from action; all happen at the same time, and all influence one another. The chapter will emphasize the neural foundations of attention and developmental psychology research that shows clear links between attention and learning.

1 Neural Foundations of Visual-Spatial Attention

The purpose of the visual system is to transduce light reflected from surfaces in the environment into neural signals that are relayed to the brain for processing and action planning. Light is first transmitted through the cornea, the outer protective covering, and the lens, which provides focus of near and far objects, and then falls on the retina, a thin film of tissue covering the back of the eyeball. The retina comprises layers of photoreceptors and a rich network of connections and nonsensory neurons that provide initial processing of visual information. Different kinds of photoreceptor accomplish different tasks: There are specialized cells and circuits in the retina for color and contrast, for example, and these help determine how information is subsequently routed to appropriate channels up the visual hierarchy in the brain.

The visual system in the brain consists of a richly interconnected yet functionally segregated network of areas, specializing in processing different aspects of visual scenes and visually guided behavior: contours, motion, luminance, color, objects, faces, approach vs. avoidance, and so forth. Information flows from the retina to the lateral geniculate nucleus (a subcortical structure) and then to the primary cortical visual area (area V1). Reciprocal connections carry information to secondary visual areas (e.g., V2, V3, V4, and the middle temporal area, MT). From these primary and secondary visual areas, visual information diverges to two partly segregated streams, ventral and dorsal (Goodale and Milner 1992; Stiles 2017; Schiller 1996). The ventral stream is connected to temporal cortex and is specialized for object recognition, mostly in inferotemporal cortex or IT (Tanaka 1997). IT connects to perirhinal cortex and other areas involved in categorization of visual stimuli and formation of visual memories (e.g., entorhinal cortex and hippocampus) as well as the lateral prefrontal cortex, involved in learning and planning (Miller 2000; Miyashita and Hayashi 2000). The dorsal stream codes for spatial information (object location and object-oriented action) and connects primarily to parietal structures that are important for voluntary action planning and coordinating somatosensory, proprioceptive, and visual inputs.

1.1 Development of Structure in the Visual System

The visual system, like other sensory and cortical systems, takes shape early in prenatal development. In humans, the retina starts to form around 40 days after conception and cells proliferate until about 160 days. The growth and organization of cells and connections continues well past birth (Finlay et al. 2003). The distinction between foveal and extrafoveal regions is present early, and the topology and patterning of receptors and neurons continue to change throughout prenatal development and the first year after birth. Foveal receptors are over-represented in the cortical visual system, and detailed information about different parts of the scene is acquired by moving the eyes to different points (more on this later). The musculature responsible for eye movements develops before birth in humans, as do subcortical systems (e.g., superior colliculus and brainstem) to control these muscles (Johnson 2001; Prechtl 2001). Different brain areas are present in rudimentary form during the first trimester, but the final forms continue to take shape well after birth; developmental processes such as synaptic pruning are partly the result of experience. Some kinds of experience are intrinsic to the visual system rather than arising from outside stimulation. Spontaneous prenatal activity in visual pathways, for example, contributes to retinotopic mapping (Sperry 1963; Wong 1999).

Many developmental mechanisms are common across mammalian species, including humans, though the timing of developmental events varies (Clancy et al. 2000; Finlay and Darlington 1995). As noted, many major structures (neurons, areas, and layers) in visual cortical and subcortical areas are in place by the end of the second trimester in utero. Later developments consist of the physical growth of neurons and the proliferation and pruning of synapses, which are, in part, activity-dependent.

As soon as neurons are formed, find their place in cortex, and grow, they begin to connect to other neurons. There is a surge in synaptogenesis in visual areas around the time of birth and then a more protracted period in which synapses are eliminated, reaching adult-like levels at puberty (Bourgeois et al. 2000). Synapses are preserved in active cortical circuits and lost in inactive circuits. In contrast to visual areas, auditory cortex experiences a synaptogenesis surge several months earlier, which corresponds to its earlier functionality relative to visual cortex (viz., prenatally). Here, too, pruning of synapses extends across the next several years. (In other cortical areas, such as frontal cortex, there is a more gradual accrual of synapses without extensive pruning.) For the visual system, the addition and elimination of synapses, the onset of which coincides with the start of visual experience, provides an important mechanism by which the cortex tunes itself to environmental demands and the structure of sensory input.

2 Visual Attention in Infancy

Much of what we know about development of visual-spatial attention comes from studies of neural mechanisms in animals and from extensive observations of human infants (Kiorpes and Movshon 2004; Teller and Movshon 1986). Even in infancy, vision is not passive; infants are active perceivers and active participants in their own development (von Hofsten 2004). At no point in development does it appear that humans are simply inactive recipients of stimulation.

2.1 Visual Attention in Newborns

Prior to birth, human fetuses orient their attention preferentially to some stimuli over others (Marx and Nagy 2015; Piontelli 2015). Recordings of cortical activity in utero have demonstrated that fetuses as young as 28 weeks gestation respond to bright light directed at the mother’s abdomen (Eswaran et al. 2004; Fulford et al. 2003); moreover, fetuses in the third trimester of gestation turn preferentially toward light patterns in a face-like configuration vs. control stimuli (Reid et al. 2017).

Human infants are thus born with a functional visual system, and if motivated (i.e., awake and alert), the baby may react to visual stimulation with head and eye movements. Newborn vision is relatively poor, however: acuity (detection of fine detail), contrast sensitivity (detection of different shades of luminance), color sensitivity, and sensitivity to direction of motion all undergo substantial improvement after birth (Banks and Salapatek 1983). The field of view is also relatively small, so that newborns often fail to detect targets that are too distant or too far in the periphery. In addition, neonates lack stereopsis, perception of depth from binocular disparity (differences in the input to the two eyes). Maturation of the eye and cortical structures (some of the mechanisms discussed previously) supports developments in these visual functions, and learning plays an important role as well, as discussed in greater detail in subsequent sections.

Observations of neonates have revealed that despite relatively poor vision, they actively scan the visual environment. Early studies, summarized by Haith (1980), revealed systematic oculomotor behaviors that provided clear evidence of visual organization at birth. Newborns, for example, will search for patterned visual stimulation, tending to scan broadly until encountering an edge, at which point scanning narrows so that the edge can be explored. Such behaviors are likely adaptive for investigating and learning about the visual world.

In addition, newborn infants show consistent visual preferences. Fantz (1961) presented newborns with pairs of pictures and other two-dimensional patterns and recorded which member of the pair attracted the infant’s visual attention, scored as the proportion of looking time per exposure. Infants often showed longer looking at one member of the pair: bull’s-eyes vs. stripes or checkerboards vs. solid forms, for example. Visual preferences have served as a method of choice ever since, in older infants as well as neonates. Slater (1995) described a number of newborns’ preferences: patterned vs. unpatterned stimuli, curved vs. rectilinear patterns, moving vs. static patterns, three-dimensional vs. two-dimensional forms, and high- vs. low-contrast patterns, among others. In addition, perhaps due to the relatively poor visual acuity of the newborn visual system, there is a preference for “global” form vs. “local” detail in newborns (Macchi Cassia et al. 2002).

Fantz (1964) also reported that repeated exposure to a single stimulus led to a decline of visual attention and increased attention to a new stimulus, in 2- to 6-month-olds. Subsequent investigations examined infants’ preferences for familiar and novel stimuli as a function of increasing exposure, and these in turn led to standardized methods for testing infant perception and cognition, such as familiarization and habituation paradigms (Cohen 1976), as well as a deeper understanding of infants’ information processing (Aslin 2007; Hunter and Ames 1988; Sirois and Mareschal 2002). For example, infants generally show preferences for novel vs. familiar stimuli after habituation (a decline of looking times across trials), implying both discrimination of novel and familiar stimuli and memory for the stimulus shown during habituation. Neonates and older infants also recognize visual constancies or invariants, that is, they identify common features of a stimulus across transformations of, for instance, shape, size, slant, and form (Slater et al. 1983). Recognition of invariants, in turn, forms the basis for visual categorization (Mareschal and Quinn 2001).
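
To make the logic of such looking-time methods concrete, the sketch below (in Python) shows one common way a habituation criterion and a novelty-preference score can be computed from raw looking times. The 50% criterion, the three-trial window, and the example values are illustrative assumptions, not parameters drawn from any particular study cited here.

```python
# Minimal sketch (illustrative): an infant-controlled habituation criterion
# and a novelty-preference score computed from looking times.

def habituated(look_times, window=3, criterion=0.5):
    """Return the index of the trial on which looking in a sliding window of
    `window` trials falls below `criterion` times looking on the first
    `window` trials, or None if the criterion is never met."""
    if len(look_times) < 2 * window:
        return None
    baseline = sum(look_times[:window])
    for i in range(window, len(look_times) - window + 1):
        if sum(look_times[i:i + window]) < criterion * baseline:
            return i + window - 1  # index of the last trial in the window
    return None

def novelty_preference(novel_look, familiar_look):
    """Proportion of test-trial looking directed to the novel stimulus; values
    reliably above 0.5 are taken as evidence of discrimination and memory."""
    return novel_look / (novel_look + familiar_look)

# Example: looking times (s) across habituation trials, then a test pair.
looks = [18.2, 15.0, 12.4, 9.1, 7.3, 5.8, 5.1, 4.2]
print(habituated(looks))             # 5, i.e., the criterion is met on the sixth trial
print(novelty_preference(9.6, 4.4))  # ~0.69, a novelty preference
```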

Finally, some kinds of functional visual development have been explained in terms of visual maturation (Atkinson 2000; Johnson 1990, 2005). Acuity, for example, improves in infancy with a number of developments, all taking place in parallel: migration of receptor cells toward the fovea at the center of the retina, elongation of the receptors to catch more incoming light, growth of the eyeball to augment the resolving power of the lens, myelination of the optic nerve and cortical neurons, and synaptogenesis and pruning. And as described in the next section, development of gaze control in infancy has been thought to reflect changes in visual-spatial attention stemming from brain development.

3 Development of Gaze Control

Both infants and adults scan visual scenes actively – on the order of 2–4 eye movements per second in general (Johnson et al. 2004; Melcher and Kowler 2001). Gaze control in adults is accomplished with a coordinated system comprising both subcortical and cortical components. Six muscles are connected to the eyeball, each under direct control by the brainstem. Eye movements are initiated in areas such as the frontal eye fields, in the cortex, and the superior colliculus, a subcortical area that receives inputs from several cortical areas. Both these areas connect to the brainstem, from which the actual signals to drive eye movements originate. Research on development of eye movements has often been viewed as an indirect means to examine cortical development, on the assumption that oculomotor behaviors can serve as “markers” of specific brain systems (e.g., Johnson 1990). In young infants, cortical influences on gaze control are less prominent.

Several types of eye movements, which develop over the first 6 postnatal months, are made during visual tracking: reflexive and voluntary saccades (quick eye movements from location to location) and smooth pursuit eye movements (slower eye movements that track moving targets; Richards 2001). Voluntary and smooth pursuit eye movements require attention, whereas reflexive eye movements are responses to sudden-onset stimuli. Visual pathways that support reflexive eye movements include the retina, lateral geniculate nucleus, superior colliculus, and potentially primary visual cortex (Schiller 1985, 1988). Voluntary eye movements are planned attention-driven saccades that involve early cortical visual areas, parietal cortex, and frontal eye fields (Schiller 1985, 1988). Smooth pursuit eye movements are supported by visual pathways involving the middle temporal (MT) and medial superior temporal (MST) areas and perhaps parietal cortex (Schiller 1985, 1988). Johnson (1990, 1995) suggested that the pathways for reflexive eye movements are mature at birth, whereas primary visual area layers supporting voluntary saccadic eye movements mature rapidly during the first 6 postnatal months in conjunction with behavioral changes. Smooth pursuit, however, continues to improve over the first 2 years (Richards 1998).

Early arguments (Bronson 1974) claimed that the development of visual orienting involved a shift from solely subcortical to a mixture of subcortical and cortical processing. However, recent evidence suggests that the development proceeds in a graded manner, with some limited cortical activity likely present in newborns (Johnson 2005). In addition, development of smooth and saccadic eye movements in infants (described in detail in the next section) has often been interpreted as revealing development of distinct cortical systems that control them. Another early and influential proposal held that there are two discrete visual systems, a relatively primitive and phylogenetically older “secondary” system and a relatively sophisticated “primary” system that is evolutionarily more recent (Schneider 1969). In the neonate, visual behavior was held to be guided principally by the secondary system, which is characterized by poor foveal vision. The secondary system is restricted to reflexive or reactive eye movements to peripheral stimuli and does not participate in analysis of complex visual patterns. The primary system was thought to develop across the first several postnatal months, accompanying improvements in acuity and contrast sensitivity, and to be responsible for the emergence of endogenous or internal control of saccades so as to support inspection of visual scenes (Bronson 1974). More recent interpretations of the two-system model have suggested that visual attention is largely under subcortical control for the first few months after birth, after which there is increasing cortical control (Atkinson 1984; Colombo 2001; Johnson 1990).

An example of how oculomotor control may emanate from cortical development is found in Johnson (1990). There are rapid improvements between 6 and 10 weeks in smooth pursuit, the ability to track a small moving target against a featureless background, and in motion direction discrimination, the ability to discriminate the direction of motion in dynamic random dot patterns (Aslin 1981; Wattam-Bell 1996). Johnson (1990) suggested that a common developmental path underlies emergence of both smooth pursuit and motion sensitivity: maturation of pathways to and from the middle temporal area (MT). That is, the perception of motion in the visual environment and the programming of eye movements to follow motion are thought to be supported by the same cortical networks (Thier and Ilg 2005). This suggestion was tested empirically by Johnson et al. (2008), who observed infants between 58 and 97 days of age in both a smooth pursuit and a motion direction discrimination task. Individual differences in performance on the two tasks were strongly correlated and were also positively correlated with age, consistent with a maturational model (though not necessarily uniquely predicted by it).

Other visual functions in infancy that have been linked to cortical maturation include development of form and motion perception, stemming from maturation of parvocellular and magnocellular processing streams, respectively (Atkinson 2000), and development of visual memory for object features and object locations, stemming from maturation of ventral and dorsal processing streams (Mareschal and Johnson 2002).

4 Eye Movements

As noted previously, humans move their eyes from location to location even before birth, a behavior that presumably reflects some kind of decision made somewhere in the visual system to find locations in the visual scene for closer inspection. This is known as foveation, the bringing of an image in the environment to the fovea, the center of the visual field and the location on the retina producing the highest acuity inputs to the brain. Foveation is an ingenious mechanism that balances the need to derive detailed visual information from the world against the need to minimize the metabolic demands of the large brain that would otherwise be required to process that information. The fovea has the highest concentration of photoreceptors, and these are preferentially mapped onto visual cortical tissue. Acuity is best at the point of gaze and drops off abruptly with increasing visual eccentricity into a low-resolution visual surround; this drop-off is reflected in the distribution of photoreceptors on the retina (Winkler et al. 1990). Detailed representation of a visual scene, therefore, which entails extensive processing by visual cortex, takes place only for a region within about 2° visual angle of the viewed scene (approximately the size of a thumbnail at arm’s length). Rather than building the much larger brain that would be needed to process detail across the entire scene, the visual system compromises by periodically shifting the point of gaze with saccadic eye movements and thus reorienting the specific location in the scene that is best represented and processed.
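
The 2° figure can be checked with a short worked example: the sketch below computes the visual angle subtended by an object of a given width at a given viewing distance. The thumbnail width (about 1.8 cm) and arm's length (about 52 cm) are illustrative assumptions.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given width at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A thumbnail roughly 1.8 cm wide viewed at roughly 52 cm (about arm's length)
print(round(visual_angle_deg(1.8, 52), 2))   # ~1.98, i.e., about 2 degrees
```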

4.1 Primary Eye Movement Systems

There are four primary eye movement systems, each of which is produced by separate neural circuitry, all channeled through the brainstem, which innervates the ocular musculature. Saccades, the most common form of eye movement, are quick shifts between fixations (periods during which gaze is held stable); their function is to place the retinal “image” of an object of interest on the fovea. Smooth pursuit eye movements maintain foveation on a moving item of interest. Optokinetic nystagmus (OKN) is a semi-reflexive eye movement pattern driven by a large visual array that moves with respect to a stationary observer, and the vestibulo-ocular response (or VOR) is a semi-reflexive eye movement pattern driven by a stationary large visual array placed in front of, or surrounding, a moving observer. The relation of eye movements to the head is similar in both OKN and VOR, a relatively slow, smooth track followed by a quick saccade-like movement, but the stimulus conditions, developmental progression, and neural systems controlling each are different. OKN, VOR, and smooth pursuit work in tandem to yield visual stability, the ability to stabilize the retinal image despite perturbations due to eye, head, and body movements and motion in the observer’s surroundings. Visual stability is vital for the effective extraction of detailed information about the visual environment as observers move about, and it improves markedly after 2 months with the onset of consistent smooth pursuit and suppression of OKN and VOR (Aslin and Johnson 1996). This is described in greater detail in the subsequent section.

4.1.1 Scanning

Overt visual attention combines saccades and fixations, which work in tandem to move the point of gaze across the visual field. The sequencing of saccades and fixations is known as scanning, a directly observable form of visual attention. During a saccade, the point of gaze for both eyes sweeps rapidly across the scene, and during a fixation, the point of gaze is relatively stationary. Information about the scene is acquired during the fixations. Analysis of the scene cannot be performed during a saccade, whose purpose is to direct attention to a different part of the scene for subsequent processing. Scanning can be interspersed with smooth eye movements, as when the head translates or rotates while the point of gaze remains stabilized on a single point in space (the eyes move to compensate for head movement) or when following a moving target.
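
In eye-tracking research, scanning is typically quantified by segmenting the raw gaze stream into fixations and saccades. The sketch below illustrates one simple velocity-threshold approach; the sampling rate, the 30°/s threshold, and the toy gaze trace are illustrative assumptions, not values from the studies cited in this chapter.

```python
# Minimal sketch of velocity-threshold segmentation of a gaze stream into
# saccade and fixation samples. Threshold and sampling rate are assumptions.

def classify_samples(x_deg, y_deg, hz=120.0, saccade_thresh_deg_s=30.0):
    """Label each gaze sample 'saccade' or 'fixation' by point-to-point velocity."""
    labels = ['fixation']  # first sample has no preceding velocity estimate
    dt = 1.0 / hz
    for i in range(1, len(x_deg)):
        dist = ((x_deg[i] - x_deg[i - 1]) ** 2 + (y_deg[i] - y_deg[i - 1]) ** 2) ** 0.5
        velocity = dist / dt  # degrees of visual angle per second
        labels.append('saccade' if velocity > saccade_thresh_deg_s else 'fixation')
    return labels

# Toy gaze trace: stable, then a rapid 5-degree shift, then stable again.
x = [0.0, 0.02, 0.01, 2.5, 5.0, 5.02, 5.01]
y = [0.0, 0.01, 0.0, 0.0, 0.0, 0.01, 0.0]
print(classify_samples(x, y))
```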

Newborn infants, if awake and alert, examine their surroundings with a series of fixations, indicating that some of the neural circuitry for saccade generation is in place at birth. Young infants’ fixations, however, often do not extend beyond areas of high contrast, such as edges (Bronson 1994), or remain centered around a limited set of stimulus features (Bronson 1990; Johnson and Johnson 2000). Infants older than 3 months will more often scan in what appears to be an exploratory fashion. Older infants will also scan between individual stimuli more readily than will younger infants (Bronson 1994). This pattern of development has been interpreted as a shift from reflexive to more purposive scanning, consistent with the maturation of cortical pathways, as noted previously.

4.1.2 Sticky Fixation

One- and 2-month-old infants have been found to exhibit much longer looking times than either neonates or 4-month-old infants in studies that use habituation (Hood et al. 1996; Johnson 1996; Slater et al. 1996a) or other looking time methods (e.g., Johnson et al. 1991). This so-called sticky fixation has been interpreted as difficulty disengaging attention and has been tied to tonic inhibition of the superior colliculus by the substantia nigra and basal ganglia, later controlled by cortical mechanisms subserving endogenous attentional control and peripheral expansion of the visual field (Johnson 1990; see also Hood et al. 1998; Maurer and Lewis 1998). (However, there are other, more cognitive explanations for longer looking times in 1- and 2-month-olds; Johnson 1996.) A decrease in fixation durations with age has also been found in studies of infants’ natural scene viewing (Helo et al. 2016; Wass and Smith 2014). Attentional disengagement continues to improve throughout childhood (Gregory et al. 2016) and into adolescence (Luna et al. 2008). Interestingly, individual differences in fixation durations are linked to attentional and behavioral control in childhood (Papageorgiou et al. 2014).

4.1.3 Smooth Pursuit

In contrast to saccades, smooth pursuit is limited in very young infants. When presented with a small, moving target, infants younger than 2 months will often attempt to track it with “catch-up” saccades, rather than smooth eye movements (Aslin 1981). Younger infants may engage in short bouts of smooth pursuit if target speed is not too high, but smooth pursuit is not robust (Kremenitzer et al. 1979; Roucoux et al. 1983). The limitation in pursuit is not an inability to move the eyes smoothly: OKN, which contains a slow-movement component, can be observed in neonates. Rather, very young infants may be incapable of engaging in predictive eye movements, such that the future location of a moving target cannot be computed, or they may be unable to track due to limitations in motion processing (a function of the middle temporal area; Komatsu and Wurtz 1989). Alternatively, immaturity of retinal photoreceptors may prevent firm registration of the target on the fovea, such that a series of saccades is necessary to recenter gaze and maintain fixation.

5 Orienting

Many studies have examined orienting – engagement of visual attention – as infants are presented with a limited number of small static or moving targets (Johnson 2005; Richards 1998). Oculomotor behaviors that have been examined include detection of targets in the periphery, saccade planning, oculomotor anticipations, sustained vs. transient attention, effects of spatial cuing, and eye/head movement integration; other tasks examined inhibition of eye movements, such as disengagement of attention, inhibition of return, and spatial negative priming. Bronson (1990, 1994) explored developmental changes in scanning patterns as infants viewed simple geometric forms. The youngest infants tested (2 weeks) were reported to attend primarily to a single prominent feature, whereas older infants (3 months) were more likely to scan between features and to direct saccades with greater accuracy, again perhaps reflecting a transition from reflexive to “volitional” scanning.

5.1 Overt vs. Covert Attention

Along with these developments in overt orienting, infants become capable of covert orienting – an internal shift of attention that can facilitate saccades to particular spatial locations. Two related phenomena are inhibition of return (IOR), a delay in eye movements toward a previously cued location, and spatial negative priming, a delay in eye movements toward a location that had previously been ignored (i.e., a distractor location presented alongside the fixated one).

5.1.1 Inhibition

In a typical spatial cueing paradigm, an infant is shown a central stimulus as a cue flashes in one of two possible peripheral locations; the cue is too brief to elicit a saccade. After a delay, a target is presented in one or both of the peripheral locations; evidence for covert orienting is provided by effects of the cue’s location. In a study by Clohessy et al. (1991), for example, infants sat in front of three screens. A dynamic stimulus first appeared in the center for fixation, followed by a cue on one of the peripheral screens and then presentation of a target on both peripheral screens. Infants 6 months and older made more saccades to the side opposite the location where the original cue had flashed, taken as evidence of IOR. Using similar methods, Johnson et al. (1994) presented 4-month-old infants with stimulus sequences in which a 100 ms cue flashed contralateral to the location of a target appearing 400 ms later. After 12 such “training” trials, “test” trials were presented intermixed with more training trials; in test trials the target appeared in the cued location after a 100 ms delay. Despite the training (to make a saccade to the side contralateral to the cue), infants showed faster saccadic RTs to orient to the target on test trials, consistent with the possibility that covert attention facilitated saccades to the cue’s location. Johnson and Tucker (1996) subsequently investigated effects of cue-target timing intervals on the speed and direction of orienting in 2-, 4-, and 6-month-olds. Two-month-olds showed only weak effects of the cue, but 4-month-olds showed facilitation to the cued location when the target appeared 200 ms after cue onset but inhibition if it appeared 700 ms after cue onset. Six-month-olds showed evidence of inhibition only (cf. Clohessy et al. 1991). Researchers using paradigms that rely on overt orienting have reported IOR in newborns (Simion et al. 1995; Valenza et al. 1994), whereas those that require covert orienting have found evidence of IOR only in infants 4 months of age and older (Johnson et al. 1994; Johnson and Tucker 1996; Richards 2001).
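
As an illustration of how such cue-target timing effects are typically summarized, the sketch below computes, for each cue-target interval, the difference in mean saccadic RT between targets at cued and uncued locations; negative values indicate facilitation and positive values indicate inhibition of return. The data are hypothetical and merely mimic the 4-month-old pattern described above; this is not the authors' analysis code.

```python
# Minimal sketch (illustrative): cueing effect = mean RT(cued) - mean RT(uncued),
# computed separately for each cue-target interval (SOA).

from statistics import mean

def cueing_effects(trials):
    """trials: list of dicts with 'soa_ms', 'cued' (bool), and 'rt_ms'."""
    effects = {}
    for soa in sorted({t['soa_ms'] for t in trials}):
        cued = [t['rt_ms'] for t in trials if t['soa_ms'] == soa and t['cued']]
        uncued = [t['rt_ms'] for t in trials if t['soa_ms'] == soa and not t['cued']]
        effects[soa] = mean(cued) - mean(uncued)
    return effects

# Hypothetical data: faster to the cued location at 200 ms (facilitation),
# slower to the cued location at 700 ms (inhibition of return).
trials = [
    {'soa_ms': 200, 'cued': True,  'rt_ms': 410}, {'soa_ms': 200, 'cued': False, 'rt_ms': 470},
    {'soa_ms': 200, 'cued': True,  'rt_ms': 430}, {'soa_ms': 200, 'cued': False, 'rt_ms': 490},
    {'soa_ms': 700, 'cued': True,  'rt_ms': 520}, {'soa_ms': 700, 'cued': False, 'rt_ms': 460},
    {'soa_ms': 700, 'cued': True,  'rt_ms': 540}, {'soa_ms': 700, 'cued': False, 'rt_ms': 480},
]
print(cueing_effects(trials))   # {200: -60, 700: 60} (ms)
```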

Scanning efficiency is assessed in the IOR paradigm by the inhibition of eye movements to previously processed locations. In a negative priming paradigm, when an object or location that had previously been ignored or inhibited becomes the target to be selected, responses to it are slowed in comparison to a control stimulus (Neill 1977; Tipper 1985), suggesting that the ignored object or location was attended covertly and inhibited in favor of selecting the attended stimulus. In spatial negative priming (SNP) tasks, competition between the target and distractor must be resolved by suppression of the distractor and simultaneous selection of the target, a demand unnecessary in IOR tasks because the cue generally precedes the target. Impairment during target selection, therefore, can be seen as validation that the distractor location was attended and inhibited during target selection in the prime. IOR, therefore, is a measure of preference for novel unattended locations, whereas SNP provides a measure of the ability to inhibit a location while selection of another is underway. These are two distinct processing demands, and behavioral results from the two methods shed light in unique ways on the development of inhibition and its role in learning in infancy.

A study with children ranging from 5 to 12 years, for example, reported negative priming of identity, location, and conceptual task features (Pritchard and Neumann 2004; Simone and McCormick 1999). The magnitude of spatial negative priming (SNP) effects (i.e., inhibition of location) has been found to remain fairly constant from about age 6 years into adulthood (Davidson et al. 2006). Amso and Johnson (2005) used a spatial negative priming task to assess the ability of 9-month-old infants and adults to select between simultaneously presented locations when inhibiting a distractor location in favor of a target location. Interstimulus intervals (ISIs) between presentation of cues and targets were manipulated to examine temporal effects on inhibition efficiency. Each trial began with simultaneous presentation of a salient cue and a nonsalient distractor, followed (after varying ISIs) by a target in the location previously occupied by the distractor or another location. Both infants and adults exhibited the SNP effect at the longest (550 ms) and intermediate (200 ms) ISIs – that is, saccades were slower to targets when they appeared at the distractor location. In a follow-up study with 3-, 6-, and 9-month-olds, the youngest infants provided no evidence of inhibition; rather, target locations were facilitated (evinced by faster saccadic RTs) (Amso and Johnson 2008). These studies provide evidence, therefore, for covert attention in young infants, but not fully mature inhibitory processes.

5.1.2 Selective Attention

Visual selective attention is the ability to select relevant stimuli for processing and to ignore or inhibit competing alternatives. Efficient allocation of visual attention is critical to learning in infancy. To organize the world into coherent percepts, a child must attend to certain environmental features while simultaneously ignoring others. This requires selection of relevant stimuli for processing and inhibition of those that are irrelevant.

Developmental investigations into visual selective attention have made use of visual search paradigms, which have origins in work by Treisman and colleagues (Treisman 1988; Treisman and Gelade 1980). In a visual search paradigm, a unique target element is placed in an array of distractors. For example, Treisman and Gormican (1988) presented an oblique target among vertically oriented distractors. Participants quickly detected the oblique target, and RTs to detect the target were largely unaffected by the number of distractors in the display. The authors concluded that, because increasing the number of elements carrying one feature did not interfere with processing of the other, oblique and vertical orientations must be coded as separate features in the visual module that codes for orientation. The “pop-out” effect has been attributed to a parallel processing mechanism that directs attention to the location of the unique item during the preattentive early stages of visual processing (Treisman 1988; Treisman and Gelade 1980).
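
A common way to quantify the pop-out result is the slope of the RT-by-set-size function, where a slope near zero ms per item indicates parallel search. The sketch below fits that slope with ordinary least squares; the set sizes and RTs are hypothetical values chosen only to illustrate a flat search function.

```python
# Minimal sketch (illustrative): estimating the RT x set-size slope with an
# ordinary least-squares line fit. A slope near 0 ms/item is the classic
# signature of parallel "pop-out" search; steep slopes indicate serial search.

def search_slope(set_sizes, mean_rts_ms):
    """Return (slope_ms_per_item, intercept_ms) from a least-squares line fit."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts_ms) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts_ms))
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: RTs barely change as distractors are added.
print(search_slope([4, 8, 16, 32], [452, 455, 458, 461]))  # slope ~0.3 ms/item: essentially flat
```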

Research using the novelty preference method (Colombo et al. 1995; Quinn and Bhatt 1998) and mobile conjugate reinforcement techniques (Adler et al. 1998; Rovee-Collier et al. 1996) has reported evidence of pop-out as early as 3 months of age. Also consistent with adult findings, this pop-out effect has been found to be unaffected by the number of distractors (Rovee-Collier et al. 1996).

Visual search tasks have also been used to test infants’ sensitivity to competition between parts of a visual scene (Dannemiller 1998, 2000, 2002). In these studies, a moving target was embedded in an array of 27 static red and green distractors, evenly distributed on the left and right sides of a monitor; each trial commenced when the infant looked at an attention getter in the center of the screen. Attention to the target was influenced by the spatial distribution of the distractors, such that orienting was weakened when higher-salience bars were contralateral to the moving target. (Salience was determined by the contrast of static bars with their background; Ross and Dannemiller 1999; Zenger et al. 2000.) Thus, attention was divided when salient static features competed for orienting with the moving probe, and sensitivity to this competition, indexed as poorer performance with increased competition, may reflect maturing selective attentional mechanisms.

Finally, Amso and Johnson (2006) used a visual search task to examine relations between visual selection and object perception in infancy. Three-month-old infants were shown displays in which single targets of varying levels of salience were presented among homogeneous static vertical distractors. Infants also completed a “perceptual completion” task in which they were habituated to a partly occluded moving rod and subsequently presented with unoccluded broken and complete rod test stimuli. Infants were divided into two groups. “Perceivers” were those infants whose posthabituation preference indicated perception of object unity in the perceptual completion task (i.e., a novelty preference for the unoccluded broken rod stimulus), whereas the “non-perceivers” looked longer at the complete rod test stimulus. Perceivers tended to scan the rod parts more than did non-perceivers; in addition, perceivers detected more targets during the visual search task. That is, infants who provided evidence of perceiving the unity of disjoint surfaces also provided evidence of efficient visual selective attention in the search task. With the emergence of selective attention, therefore, infants become better able to detect and learn about relevant parts of the visual scene. This is discussed further subsequently.

6 Perception of Space, Objects, and Scenes

6.1 Development of Spatial Perception

The gaze moves frequently from place to place in the visual scene, and our heads and bodies move around in space. Despite near-continual transformations, disruptions, and interruptions in visual input, we usually experience the visual world as an inherently stable place. This can be demonstrated vividly by trying to read while shaking the head back and forth (reading is not much compromised) vs. shaking the reading material back and forth (reading can be quite difficult). When an observer rotates the head, the VOR (vestibulo-ocular response) provides compensatory eye movements which allow the point of gaze to remain fixed or to continue moving volitionally as desired (as when reading). When the page moves, there is no such compensatory response.

Evidence from three paradigms suggests that visual stability emerges gradually across the first year after birth. First, young infants have difficulty discriminating optic flow patterns that simulate different directions of self-motion (Gilmore et al. 2004). Infants viewed a pair of random dot displays in which the dots repeatedly expanded and contracted around a central point to simulate the effect of moving forward and backward under real-world conditions. On one side, the location of this point shifted periodically, which for adults specifies a change in heading direction; the location on the other side remained stationary. Under these circumstances adults detected a shift simulating a 5° change in heading, but infants were insensitive to all shifts below 22°, and sensitivity was unchanged between 3 and 6 months. Gilmore et al. speculated that optic flow sensitivity may be improved by self-produced locomotion after 6 months of age or by maturation of the dorsal visual stream.

Second, young infants’ saccade patterns tend to be retinocentric, rather than body-centered, when studied in a “double-step” tracking paradigm (Gilmore and Johnson 1997). Retinocentric saccades are programmed without taking into account previous eye movements. Body-centered eye movements, in contrast, are programmed while updating the spatial frame of reference or coordinate system in which the behaviors occur. Infants first viewed a fixation point that then disappeared, followed in succession by the appearance and extinguishing of two targets on either side of the display. The fixation point was located at the top center of the display, and targets were located below it at the extreme left and right sides. As the infant viewed the fixation point and targets in sequence, there was an age-related transition in saccade patterns. Three-month-olds tended to direct their gaze downward from the first target, as if directed toward a target below the current point of gaze. In reality the second target was below the first location – the original fixation point – not the current point of gaze. Seven-month-old infants, in contrast, were more likely to direct gaze directly toward the second target. These findings imply that the young infants’ visual-spatial coordinate system, necessary to support perception of a stable visual world, may be insensitive to extra-retinal information, such as eye and head position, in planning eye movements. In addition, the child’s field of view transforms with the advent of the transition to walking, which yields new kinds of visual experiences (Kretch et al. 2014).
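
The logic of the double-step analysis can be made concrete with a small sketch that contrasts the second-saccade vector a retinocentric system would program with the vector a body-centered (spatiotopic) system would program. The coordinates are illustrative, chosen only to mirror the layout described above (fixation at top center, targets below at the far left and right), and the sketch assumes the classic double-step design in which both targets are flashed before the first saccade.

```python
# Minimal sketch of the double-step logic (coordinates are illustrative).
# A body-centered system aims the second saccade at T2's actual location from
# the current gaze position (T1); a retinocentric system reuses T2's position
# relative to the original fixation point, ignoring the saccade already made.

def second_saccade_vectors(fixation, t1, t2):
    retinal_t2 = (t2[0] - fixation[0], t2[1] - fixation[1])  # T2 as it fell on the retina
    retinocentric = retinal_t2                                # vector reused from current gaze
    body_centered = (t2[0] - t1[0], t2[1] - t1[1])            # updated for the saccade to T1
    return retinocentric, body_centered

# Fixation at top center, targets below at the far left and right
# (x, y in degrees; positive y is downward).
fixation, t1, t2 = (0, 0), (-10, 8), (10, 8)
print(second_saccade_vectors(fixation, t1, t2))
# retinocentric: (10, 8) -> gaze lands at (0, 16), below and to the right of T1, missing T2
# body-centered: (20, 0) -> gaze lands on T2 at (10, 8)
```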

Third, infants younger than 2 months are limited in their ability to switch attention flexibly and volitionally so as to consistently maintain a stable gaze. Movement of one’s body through the visual environment can produce an optic flow pattern, as can head movement while stationary (recall the head-shaking example). The two scenarios may produce similar visual inputs from optic flow, yet we readily distinguish between them. In addition, adult observers can generally direct attention to either moving or stationary targets, nearby or in the background, as desired. These are key features of visual stability, and the four eye movement systems described previously work in concert to produce it. OKN (optokinetic nystagmus) stabilizes the visual field on the retina as the observer moves through the environment. OKN is triggered by a large moving field, as when gazing out the window of a train: The eyes catch a feature, follow it with a smooth movement, and saccade in the opposite direction to catch another feature, repeating the cycle. The VOR, described previously, helps maintain a stable gaze to compensate for head movement (OKN and the VOR are present and functional at birth, largely reflexive or obligatory, and are likely mediated by subcortical pathways; Atkinson and Braddick 1981; Preston and Finocchio 1983). The other two systems are saccades and smooth pursuit, which compensate for or cancel the VOR or OKN as appropriate. Aslin and Johnson (1994) observed suppression (cancellation) of the VOR to fixate a small moving target in 2- and 4-month-olds, but not 1-month-olds, and Aslin and Johnson (1996) observed suppression of OKN to fixate a stationary target in 2-month-olds, but not in a younger group.

6.2 Development of Object Perception

Studies of object perception in young infants have revealed a developmental pattern from “piecemeal” to coherent perception of the visual environment that extends from birth through the first several months afterward, implying a fundamental shift in the infant’s perceptual experience (Johnson 1997; Bremner et al. 2015). Experiments that examined perceptual completion, for example, revealed that newborn infants appear to construe a partly occluded rod behind a box as consisting of two disjoint parts (Slater et al. 1990; Slater et al. 1996b; but see Valenza and Bulf 2011 for counterevidence). By 4 months, perceptual completion has become more robust and reliable under different conditions (see Bremner et al. 2015, for review). As noted previously, an important means by which infants may come to perceive occlusion is by improvements in selective attention (Amso and Johnson 2006; Johnson et al. 2008; Schlesinger et al. 2012).

Several studies have reported that young infants can maintain active representations of the solidity and location of fully hidden objects across brief delays (e.g., Aguiar and Baillargeon 1999; Spelke et al. 1992), and some interpretations of these reports have appealed to innate object concepts (e.g., Baillargeon 1993; Spelke and Kinzler 2007). However, as noted in the previous paragraph, newborns provide little evidence of perceiving partly occluded objects, raising the question of how perception of complete occlusion, or simple object persistence, emerges in infancy. Experiments addressing this question have examined infants’ responses to objects that move on a trajectory, disappear behind an occluder, reappear on the far side, and reverse direction, repeating the cycle (see Bremner et al. 2015 for review). These experiments have tended to support a “perceptual” account of object persistence, as opposed to an account based on object persistence as an innate principle through which events are interpreted. Object persistence is specified by perceptual information such as deletion and accretion of objects at an edge, surface appearance, rate and orientation of object motion, and other visual cues. The developmental question thus concerns infants’ changing ability to perceive object persistence on the basis of these cues. Four-month-olds perceive persistence across shorter spatial and temporal gaps than 6-month-olds, for example, and they require more cues to specify occlusion (and hence persistence) than adults (Johnson et al. 2003).

Other advances in object perception come from coordinated visual attention and manual exploration, which can help infants understand objects as solid in 3D space. For example, 4-month-olds who exhibited high levels of spontaneous engagement with complex objects, operationalized as manual manipulation of the object accompanied by visual attention to the object, appeared to be better at mental rotation, the ability to imagine the appearance of objects from different viewpoints and to discriminate objects from different views (Slone et al. 2018). In addition, 4- to 7-month-olds’ visual-manual object exploratory skills predicted 3D object completion, the ability to perceive objects as solid in 3D space (Soska et al. 2010).

6.3 Development of Scene Perception

Everyday visual scenes are complex and generally characterized by a number of objects at different distances, overlapping one another from the observer’s perspective. Two types of scene characteristics have been extensively studied in adults: perceptual salience (low-level features such as edges, color, and contrast) and semantic relevance (objects). Low-level scene characteristics can be measured with perceptual salience maps (Itti et al. 1998), and there is evidence that visual attention is driven toward salient regions based on these low-level properties (Borji et al. 2013; Itti and Koch 2000; Itti et al. 1998). Other studies have shown, in contrast, that attention allocation can be better explained as looking at objects or semantically relevant characteristics of a scene, rather than salience (Einhäuser et al. 2008; Nuthmann and Henderson 2010; Stoll et al. 2015). Moreover, fixations often fall on the centers of objects (Foulsham and Kingstone 2013; Nuthmann and Henderson 2010; Xu et al. 2014), which is difficult to explain from a saliency point of view, because saliency models typically highlight object contours and edges. In addition, effects of low-level salience on visual attention are minimal when task demands are made more challenging, such as engagement in visual-motor tasks (Tatler et al. 2011), or when viewing scenes with familiar objects in everyday settings, which can help the viewer establish areas of the scene with the highest semantic content (Henderson and Hayes 2017). Thus in adults, the role of salience in scene viewing is limited, and eye movements are more likely to be driven by meaning.
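
To make the salience-map idea concrete, the sketch below computes a crude "salience" map from local luminance contrast and scores a set of fixations with a normalized scanpath saliency (NSS) measure, which asks whether fixated pixels are more salient than the map average. This is a deliberately simplified stand-in, not the Itti et al. (1998) model, and the toy image, window size, and fixation coordinates are illustrative assumptions.

```python
# Minimal sketch (not the Itti et al. model): local luminance contrast as a
# stand-in for salience, plus a normalized scanpath saliency (NSS) score.

import numpy as np

def contrast_salience(image, win=8):
    """Local standard deviation of luminance as a crude salience map."""
    h, w = image.shape
    sal = np.zeros_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = image[max(0, i - win):i + win + 1, max(0, j - win):j + win + 1]
            sal[i, j] = patch.std()
    return sal

def nss(salience, fixations):
    """Mean salience at fixated pixels, in z-score units of the whole map."""
    z = (salience - salience.mean()) / salience.std()
    return float(np.mean([z[y, x] for x, y in fixations]))

# Toy image: uniform background with one high-contrast patch.
img = np.zeros((64, 64))
img[20:30, 40:50] = 1.0
sal = contrast_salience(img)
print(nss(sal, [(45, 25)]))   # fixation on the patch: well above 0
print(nss(sal, [(5, 5)]))     # fixation on the blank background: below 0
```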

As noted in previous sections, development of visual-spatial attention in infancy has been thought of as a shift from reliance on exogenous factors (e.g., perceptual salience) toward more endogenous control (e.g., knowledge) (Johnson 1990, 2001), but findings from studies of young infants’ free viewing of complex scenes are mixed with respect to the role salience plays. For instance, Frank et al. (2009) found that salience was a better predictor of attention in younger infants (3-month-olds) than faces, whereas in older (6- and 9-month-olds) infants faces were a better predictor than salience. Frank et al. (2014) replicated this finding and showed that looking at faces in the youngest age group was correlated with performance on a visual search task (i.e., number of targets detected). This implies that detecting semantically relevant information is partly a function of emerging selective attention skills. However, other studies have reported that the predictive value of perceptual salience for children’s visual attention actually increases with age (Amso et al. 2014; Franchak et al. 2016), and a recent comparison of eight saliency models found that most were better able to predict fixation locations for adults than for infants (Mahdi et al. 2017), which would not be expected if the role of saliency in driving attention were reduced over developmental time. Part of the discrepancy among findings may stem from the fact that objects are salient (Elazary and Itti 2008), and so some of the reported age-related effects of perceptual salience on visual-spatial attention may actually reflect an increase in attention toward meaningful objects.

In a recent study of the development of scene perception, 3- to 15-month-old infants were presented with a series of everyday photographs (outdoor and indoor scenes, no faces) as their eye movements were recorded (van Renswoude et al. 2019); infants also participated in visual search and spatial negative priming tasks as described previously. In addition, a group of adults viewed the photos, and their gaze patterns were compared to those of infants. Infants’ visual attention to the scenes was then modeled as a function of perceptual salience, adult fixation locations, age, and selective and inhibitory attentional mechanisms while accounting for general biases (e.g., to produce horizontal eye movements; van Renswoude et al. 2016). All of these factors played a role in guiding eye movements as infants viewed the complex scenes. Perceptual salience and adult fixation locations each made unique contributions to predicting infants’ attention allocation. Relative to younger infants, older infants were more likely to fixate parts of scenes frequently fixated by adults, and fixation durations were longer on regions more frequently fixated by adults. In addition, there was a stronger decrease in fixation durations with age in those infants with better orienting skills. Taken together, these results suggest that development of scene perception is characterized by a growing tendency to look at semantically meaningful regions in scenes and that this process is gated in part by the emergence of gaze control.

7 Conclusions

Development of visual-spatial attention is a complex interplay among multiple influences: innate propensities, neural maturation, various kinds of experiences, motor development, and learning. Many of the insights into development of visual-spatial attention described in this chapter have resulted from remarkable advances in technology to record infants’ and children’s eye movements (Franchak et al. 2011) and brain activity (Johnson 2005; Richards 2001), yet there are many aspects of attentional development that remain unknown, in particular the precise roles of different kinds of experiences and the impact of developmental disabilities.