Introduction

Autism spectrum disorder (ASD) is associated with a range of visual processing anomalies [1,2,3]. Differential development of the neural mechanisms associated with visual perception may be more than a mere epiphenomenon of the underlying social communication difficulties that define the diagnosis of ASD [4]. Indeed, for reasons outlined throughout this review, we argue it is possible that an early fundamental deficit in visual processing could adversely impact the development of many social skills. This possibility derives from the assumption that understanding social cues depends on receiving and processing relevant sensory information before any higher-level cognition can contribute to making sense of the cues. From birth, the primate (and human) visual system is an important information system that drives attention and cognition. Indeed, visual pathways activate attention networks and allow eye movements towards salient objects of interest such as faces and also supply considerable information to the emotional centres of the brain long before the development of motor skills or language [5•].

Consider the visual demands of making sense of the complex social cues that are fundamental to social interactions. Interpretation of social cues requires the rapid activation of visual attention for the detection of rapid changes in any of the numerous complex transient non-verbal stimuli, including eye position and gaze, muscles around the eyes, mouth, and forehead of the face and tenseness of body posture. These cues are all subtle and fleeting indicators of expressions of emotion. Indeed, they often manifest as micro-expressions, which are brief partial expressions that ‘leak’ out despite the intention of the expresser, reaching their peak after around 140 ms and returning to baseline after 300 ms [6], making them at times imperceptible. That is to say, social interactions present the individual with a type of rapid-serial-visual-presentation task whereby ‘targets’ (e.g. smirking lips, ‘smiling’ eyes, or shrugging of shoulders) are unpredictable in location and timing but often provide the key to successful navigation of a social encounter. Hence, for a child to develop appropriate social communication skills, their visual information processing system needs to be sufficiently fast to detect slight movements, attract attention, and process and react appropriately to the rapid, transient, and dynamic non-verbal cues that underlie social interactions. This rapid visual processing usually needs to be achieved well before conscious awareness so that the eyes can follow and fixate on salient objects, whilst initiating appropriate processing of the social or emotional context prior to conscious responses.

Such an emphasis on the dynamic aspects of successful social communication suggests that any underlying impairment in the speed of visual information processing, or in the ability to shift focus rapidly from one location to another (e.g. from the eyes to the mouth), is crucial for understanding social cognition. The neural mechanisms underlying vision, from neurons in the retina (e.g. photoreceptor, bipolar, and retinal ganglion cells) to the various cortical regions involved with visual processing and the independent parallel pathways in between, are complex. For example, neural signals project from the retina not just to visual cortex but to widespread areas of the whole brain [7, 8]. With all this in mind, it becomes easy to see how the development of visual pathways that drive attention from the moment of birth might play an important role in the development of social skills in typically developing children. We will argue that it is plausible that these mechanisms do not develop appropriately in ASD, leading to difficulties in social communication in this population.

This paper is divided into two main parts. The first provides a ‘cook’s tour’ of the visual system with a focus on the functions of key pathways and structures (or nodes) that contribute to social processing. After these basics are covered, the second discusses how certain pathways are thought to be affected in ASD and how this affects processing in other neural systems that are important for learning and understanding social cues. We will argue that an emphasis on the pathways that project information towards important nodes within the social brain (such as the amygdala) is fundamental to understanding salient social interactions. The paper then concludes with recommendations for future research that could test some of the ideas presented here.

Part 1: A Cook’s Tour of the Visual System

The Primary Pathways to Visual Cortex

The cortical visual system consists of two main parallel pathways—one faster and one slower—starting from the retina and projecting through the thalamic lateral geniculate nucleus (LGN), the primary visual cortex (V1) and beyond to the dorsal (fast conducting) and ventral (slower conducting) visual streams [9, 10]. These pathways are referred to as the retino-geniculo-cortical pathways (for a simplified schematic of this system, see Fig. 1). Most of our understanding of the cortical visual system comes from pioneering studies in cats and monkeys starting in the 1960s [11] and a series of primate studies in the 1980s and 1990s that recorded from single neurons [12, 13]. Similar functional pathways to the human visual cortex have been demonstrated with psychophysical and electrophysiological techniques [14,15,16].

Fig. 1

M and P channel contributions to the dorsal and ventral streams. The M (red) and P (green) channels originate from the retina and remain partially differentiated in V1. The dorsal stream consists mainly of the M channel, which receives input from both V1 and the motion processing area V5/MT. Some inputs to V5/MT via the M channel bypass V1 coming directly from the subcortical LGN. The ventral stream consists of both M and P channels. In this paper, we posit that the M channel is more affected in ASD. M magnocellular, P parvocellular, LGN lateral geniculate nucleus, V5/MT area V5/MT, V1 primary visual cortex

These functional pathways can be viewed as neural highways. Like lanes of a highway, these pathways have multiple channels, some of which are faster than others. The larger, faster conducting magnocellular (M) channel gathers information from multiple types of photoreceptors in the retina prior to projection to the LGN, and then to V1 [17]. The M channel responds preferentially to high temporal and low spatial frequencies (i.e. coarse-grained, rapidly presented visual information), is colour insensitive but has high contrast sensitivity, and responds via a rapid transient response, with fast axonal conduction speeds [18, 19]. The M channel also projects via the evolutionarily older visual pathway to the brainstem and drives visual attention and eye movements (see section below ‘The subcortical eye movement system’). On the other hand, the slower conducting, small retinal ganglion cells give rise to the parvocellular (P) channel and project to the LGN and then to V1. This channel is preferentially sensitive to visual stimuli with high spatial and low temporal frequencies (i.e. fine-grained and slower information) [18, 19]. It has lower contrast sensitivity and provides more sustained responses with slower axonal conduction speeds than the M channel [14, 18, 19]. The ventral stream has a much higher concentration of the P channel compared with the M channel. The P and M channels are not the only known channels, but they are the best investigated from the retina through the cortex [9, 20].

As the M and P channels send separate and parallel information through the LGN to V1 at different conduction rates, these signals arrive in V1 at different times [21], such that the faster M inputs can project rapidly through the dorsal stream for analysis of motion [22, 23] and visuo-spatial guidance of arm and eye movements towards targets [10]. Subsequently, some M fibres feed back through the dorsal stream to combine with later arriving P channel inputs in V1 and project together through the ventral stream to the inferior temporal cortex, which is critical for object and face recognition [24,25,26].

There is now a growing body of research that seeks to understand the role of the M and P channels in social processing, with a focus on facial emotion processing [27,28,29,30,31,32]. One common approach to separating out the contribution of the two channels when processing faces has been software filtering of spatial frequency information from photographs [32]. More recent data indicate that the use of temporal contrast characteristics (e.g. stimuli that rapidly flicker) is a better approach to confine visual processing to the M channel and is now being applied to test M channel functioning in ASD age groups [33••].

Early considerations of how the visual system provides information to emotion and social processing regions of the brain focused on the slower conducting, more detailed information carried by the ventral stream projections to the amygdala as well as to the frontal cortex [34, 35]. However, as explained above, social cues are brief, fleeting, and dynamic. With this in mind, one might expect that the faster M channel is important for the visual processing of these cues. Much available evidence suggests this is indeed the case. For example, it has been shown that behavioural measures of M channel functioning (luminance contrast sensitivity) correlate with face identity discrimination [36]. Face identity processing studies have generally highlighted how the M channel drives rapid processing of the broad ‘gist’ rather than a detailed processing of faces [37, 38]. This coarse-to-fine model of visual processing has been suggested to apply also for object and scene processing [39, 40]. Cortically based models of visual processing have highlighted how a faster M channel through the dorsal stream processes a rapid gist of a stimulus, prior to feedback connections combining with the slower P channel inputs through the ventral stream for processing the same stimulus with greater scrutiny [21, 26, 39].

Hence, although the processing of faces in dynamic social settings requires a complex interaction of M and P channels, it is clear that the M channel is particularly important for a quick interpretation of the essential affective aspects of the face. More recently, intracortical recordings in humans have demonstrated that the M channel via an evolutionarily older subcortical pathway is even faster than cortical pathways in carrying signals related to emotion, and in particular related to threat, to the amygdala, the accepted site of negative emotional processing [41•]. This alternative route, which bypasses the visual cortex, should also be considered for social communication. We will discuss this route next.

The Subcortical Amygdala Pathway

Whilst most research in the visual system has focussed on cortical pathways, there has been a renewed interest in the evolutionarily older retino-superior colliculus-pulvinar-amygdala visual pathway projecting through the superior colliculus (SC), to the pulvinar (PUL), and then to the amygdala (SC-PUL-AMY) (for a simplified schematic of this system, see Fig. 2). It is believed that infants rely on brainstem and subcortical sites for the control of visual attention prior to the development of binocular cortically coordinated vision around 3 months [5, 42,43,44]. This subcortical route, which has been directly observed in lower mammals and non-human primates [45, 46], is driven primarily by the faster conducting M channel [47]. In humans, the deeper midbrain pathway has been more elusive to examine with neuroimaging or electrical recording, though there are a number of studies providing support for such a pathway that has similar temporal and spatial frequency characteristics to the primate M channel [5, 32, 41, 48, 49•]. This is consistent with evidence that faster conducting large cell responses (likely M channel) are present from birth [50, 51]. Together with the fact that early in the life of all mammals the M channel drives subcortical vision through the nucleus of the optic tract and superior colliculus [47], these results highlight how detrimental an M channel deficiency could be to a baby’s ability to orient attention to salient objects, and in particular to faces.

Fig. 2

Multiple visual inputs to the amygdala. a Visual inputs to the amygdala can arise from a subcortical pathway via the superior colliculus and the thalamic pulvinar and also via a pulvinar-STS connection. b The amygdala also receives visual inputs from the retinal-geniculate pathway and through the ventral stream, though potentially also through a dorsal V1-STS-amygdala pathway. Both subcortical and cortical pathways shown in a and b carry M-type fast information. Hence, disruption of the M channel could have consequences in amygdala function. AMY amygdala, SC superior colliculus, PUL pulvinar nucleus, STS superior temporal sulcus, V5/MT area V5/MT, LGN lateral geniculate nucleus, V1 primary visual cortex

The amygdala has for a long time been thought of as a neural centre for threat detection [52, 53], though some more recent views conceptualise the amygdala as having a broader role in salience or relevance detection [54,55,56]. Nevertheless, it appears most likely that a fast and direct subcortical pathway to the amygdala provides an evolutionarily older mechanism to detect threat and attach an emotional value to biologically salient stimuli such as faces [5•]. Whilst everyday social interactions mostly do not involve life-threatening encounters, the rapid and automatic detection of facial cues of emotion is an important function of the amygdala to assist with the processing of salient social information necessary for successful social communication.

In fact, the subcortical pathway to the amygdala has been shown to activate in the absence of conscious awareness [57]. Non-conscious processing of a visual stimulus can be demonstrated by using techniques such as backward masking, in which a visual target (e.g. photo of a face) is presented briefly (e.g. for 15 ms) before a high-contrast ‘white noise’ masking stimulus takes the target’s place. Such a procedure can prevent the participant from having any awareness of the face. Under these conditions, neuroimaging has demonstrated activation in the amygdala, superior colliculus, and pulvinar to emotionally charged stimuli in the absence of the participants being aware of their presence [58]. Continuous flash suppression (CFS) is another method of assessing non-conscious processing that involves presenting a target to one eye and a series of noisy (Mondrian-like) images that are flashed rapidly to the other eye [59]. This technique carried out during neuroimaging has similarly demonstrated activation of the amygdala during the presentation of emotionally charged faces without participants being aware of their presence [60].

The Subcortical Eye Movement System

Finally, given the wide-ranging body of evidence for impairments in the ability to rapidly orient and shift attention [61] in individuals with ASD [62,63,64], this review needs to consider the role of the oculomotor system that controls eye movements. This system needs to be efficient and precise. Otherwise, it would not allow the saccadic eye movements required to maintain proper fixation on a moving object or, more relevant to this paper, to pick up and make rapid eye movements (saccades) towards the small, fleeting, and dynamic motor cues that are necessary for effective social communication.

The oculomotor system is complex as it serves many functions. However, our particular interest is the relationship between eye movements and attention, and the ability of the eyes to rapidly activate and shift attention to any facial movements. Such shifts in attention can be driven directly by the brainstem nuclei and midbrain superior colliculus in infants or controlled by fronto-parietal circuits in more mature individuals [44, 65]. There are also a number of auxiliary retinal projection areas in the brainstem, cerebellum, thalamus, and the cortex that together control eye movements for facial expression pursuit [66] (for a simplified schematic of this system, see Fig. 3). The superior colliculus receives a direct retinal projection and contains a topographical saliency map that facilitates rapid orientation of attention to salient stimuli in the visual field [61, 67]. It also serves as a hub integrating visual and motor information from many other parts of the cortex and sending out commands to the cerebellum and pretectal areas in the midbrain that innervate the extraocular muscles controlling eye movements [68]. Again, the projections the superior colliculus receives from the retina are from the faster M channel [47], which enables the system to move the eyes quickly to the targets of interest in the visual field with efficiency and precision. An impairment in this pathway, known as the retino-tectal pathway, is sufficient to alter the facilitatory capacity to reflexively saccade towards salient targets in the visual field, whether that be due to stimulus salience (when objects are distinct from their surroundings due to luminance or movement) or to behavioural relevance [69], such as when detecting faces compared with other non-face object categories [31].

Fig. 3

Multiple brain structures important for eye movements. Incoming visual information (red arrows) goes to different brain regions, from which output signals (black arrows) are computed and projected to the extraocular muscles. All of this incoming fast information is carried by the M channel. Hence, disruption of the M channel could have consequences for oculomotor responses, including those that are necessary to detect subtle and dynamic social cues

Part 2: How the Visual System Is Affected in ASD

In this section, we propose that deficits in the M channel that reduce or attenuate the rapid and dynamic visual processing required for social interactions could potentially provide a parsimonious explanation for a myriad of social-communicative difficulties associated with ASD, especially if these impairments in driving attention and eye movements are present early in life before language develops.

Whilst everyday social interactions are likely to utilise a combination of slower more deliberate and conscious processing by the P channel as an adjunct to the faster, more automatic, and non-conscious processing by the M channel, it has been suggested that it is the latter which may be more affected in ASD [8, 70,71,72,73,74,75,76]. An M channel abnormality has been reported in children with ASD [77•] and in neurotypical adult populations with high autistic traits [78]. In contrast, McCleery et al. [79] found at-risk infants demonstrated an abnormally sensitive M channel. Indeed, there have been a number of mixed findings with regard to the appearance of a specific M channel deficiency in ASD [80, 81], though as has been suggested by Greenaway et al. [77], this may in large part be a product of the utilisation of stimuli that were not optimal to selectively target this channel. Slower visual information processing for even simple stimuli (e.g. flickering light), which provides a threshold index of M channel functioning [82••], has been demonstrated in children with ASD [33••]. Impairments in the M channel have also been suggested to underlie abnormal processing of motion, given the dominance of the M channel projection in motion-sensitive regions in the dorsal stream [83,84,85,86,87]. As described in ‘Part 1: A Cook’s Tour of the Visual System’, the M channel reaches the cortex via both the LGN to V1 route, which we referred to as the primary pathway, and the superior colliculus and pulvinar projections to the amygdala, which we referred to as the subcortical route. It could be argued that the latter route is more critical to examine in the context of ASD given its role in driving automatic emotional evaluation and automatic attention orientation, particularly for salient stimuli such as faces.

There is abundant evidence for abnormalities in amygdala structure and function in ASD that can be linked to a number of social difficulties, including Theory of Mind, face emotion recognition, and eye contact [88,89,90,91,92,93,94,95]. In focussing on the functioning of the amygdala as a node in the emotional/social neural network, most research has, in general, not considered how relevant sensory information reaches this part of the brain. Accordingly, an impairment in the pathway to the node, rather than the node per se, may be critical to understanding amygdala dysfunction in ASD. Indeed, a number of authors have proposed models that directly implicate the SC-PUL-AMY route as being important for automatic face and eye contact detection in typically developing populations [31, 32, 41, 49•, 96] and as a potential source of impairment in ASD [97, 98••]. In support of this notion, this subcortical route has been shown to be more sensitive to low spatial frequency content of images, including faces, and has thus been suggested to be driven by the M channel through the superior colliculus [32, 41].

An impairment in this system in ASD could conceivably reduce one’s ability to automatically and effortlessly detect minute movements of the facial muscles that serve as important emotional cues, such as the spreading creases around the eyes or the rising eyebrow. Importantly, this system needs to be quick. It must be able to rapidly attend to and detect these emotional cues and transmit this information to the necessary brain systems for interpretation. A fast and direct pathway to the amygdala could provide the necessary automatic, non-conscious emotional responses that facilitate behavioural responses when combined with slower and more conscious or deliberative processing of the social context.

Previous research has highlighted deficits in implicit processing of emotion [99,100,101]. Whilst this may involve a contribution of non-conscious pathways, these studies were not designed to confirm non-conscious processes. Only in the last 10–15 years has this question been examined directly. Although it is not essential to assume that this automatic mechanism is specifically non-conscious, there is some evidence that individuals with ASD do not make use of non-conscious emotional information in the same way as typically developing participants [70, 72, 102, 103]. One approach for verifying the presence of non-conscious processing is through the use of pupillometry, which measures pupil diameter, which increases or decreases in accordance with brightness, cognitive effort, and emotional arousal [104]. There is evidence for reduced pupillary dilation in children with ASD during the backward masked presentation of fearful faces, when compared with visible presentation [70]. Typically developing children showed equivalent responses between conscious and non-conscious conditions, whereas the results of Nuske et al. [70] indicate reduced physiological arousal in children with ASD when they had no conscious awareness of seeing these stimuli. Other studies utilising backward masking or other techniques to suppress conscious awareness have in general found evidence for reduced or anomalous non-conscious visual processing in ASD [72, 73, 102, 103]. Such studies typically have investigated non-conscious processing of faces and consequently have argued for the amygdala to be the source of this impairment [75].

However, it should be considered that any abnormality upstream from the amygdala, either in subcortical structures such as the superior colliculus and pulvinar, or in the neural channels leading to the amygdala, could equally result in a reduced processing of visual stimuli in ASD. Given that these structures are not normally associated with a specific role in the emotional processing of faces (although for evidence for pulvinar involvement in emotional processes and saliency detection, see [105]), this would suggest that the impairment in ASD may not be specific to the processing of faces. Indeed, recent work has found evidence for reduced non-conscious processing of non-social cues, specifically hierarchical arrow stimuli, in typically developing adults with high levels of autistic traits [74].

A large literature, including multiple meta-analyses, confirms that individuals with ASD show deficiencies in recognising emotion from facial displays [106, 107]. The evidence also points towards a more general abnormality in face processing, including eye fixations [108] and neural activations when viewing faces [98, 109, 110]. When examined from a developmental perspective, these face processing deficits may result from a failure of the amygdala to automatically assign faces as salient throughout early development. Supporting this view, 1-year-old infants who were later diagnosed with ASD looked at other people less frequently than typically developing controls [111]. This apparent disinterest in faces would be expected to lead to less experience with viewing and interpreting emotional expressions [112] and consequently to inhibit the development of neural networks which specialise in face processing [113]. Thus, early deficits in face processing in ASD could lead to a social experience throughout the lifespan that is markedly different from the neurotypical population.

Deepening this picture of anomalous development of face processing requires an appreciation of how the visual system may contribute to this divergence. One possibility is that a fundamental deficit in the nodes associated with social cognition (e.g. amygdala, prefrontal cortex, cingulate cortex) or in emotional face processing (e.g. fusiform face area, occipital face area, superior temporal sulcus) predisposes individuals to the cascading difficulties with facial emotion recognition throughout development [90, 91]. Another possibility is that a fundamental deficit in the channels that drive these nodes, specifically the rapid M channel, leads to these difficulties in face processing, and consequently to associated social difficulties. This hypothesis becomes even more conceivable when one considers the specific role of the M system in driving attention mechanisms and the detection of movement. An infant with an M channel impairment could be expected to demonstrate difficulties with making saccadic movements and orienting attention towards less obvious movements of facial features whilst concentrating on obviously moving features such as a talking mouth. When neurotypical adults view faces, eye-tracking data suggest they tend to fixate more on the eyes or the mouth depending on the expression [114], making attention to only the eyes or only the mouth inadequate for full use of social cues. Instead, particularly when learning about social cues and facial expressions, one must be ready to orient visual attention to different regions of the face, which become salient due to the small rapid movements essential for detecting emotional expressions.

An impairment in the retino-tectal pathway, which provides a rapid and automatic (and potentially non-conscious) projection through the superior colliculus and pulvinar, would be sufficient to alter the facilitatory capacity to reflexively saccade towards faces compared with other non-face object categories [31]. Previous work has pointed to the importance of this subcortical route in face and eye detection in ASD [97]. The preference for faces observed in newborn infants is driven by the retino-tectal pathway and provides the basis for development of cortical face specialisation [5•]. It has been argued that this subcortical pathway involved with rapid face processing is driven by the M channel [5, 41, 57].

Thus, although it is appealing to link the suggested M channel impairment in ASD with a possible impairment of the retina-superior colliculus pathway driving early difficulties with face processing, this link has yet to be directly demonstrated. However, there is some indirect evidence. In a choice reaction time task, reflexive saccades were faster when detecting upright compared with inverted faces in typically developing adults with lower autistic traits, reflecting an expected preference for faces [115]. This saccadic face inversion effect was absent in typically developing adults with higher autistic traits. Whilst this research made use of static images of faces, as has been the case for most face processing studies, a number of factors suggest that the subcortical M channel may be critically involved in processing dynamic changes in facial expressions. The superior colliculus shows response selectivity to motion [116], whilst a direct connection between pulvinar and the motion-sensitive V5/MT region in the dorsal stream has been demonstrated [117,118,119]. In addition, the importance of this subcortical system for orienting eye and head movements, and hence attention towards salient and moving objects [120], makes this a potentially crucial site for investigation into the difficulties in dynamic social interactions found in ASD. Indeed, a recent review has proposed that the superior colliculus may provide a unifying substrate for the pathogenesis of ASD [8].

An impairment in the fast axonal conduction speeds (or a reduced signal strength) of the M channel driving attentional acquisition and initial processing of global gist may also be critical throughout early development in children with ASD [21]. Furthermore, whilst an existing literature has documented impairments in the superior temporal sulcus (STS) in ASD linked to its role in social perception [121], to date, there has been little confirmatory research examining which subcortical or cortical visual pathways project to STS. Evidence for a critical role for STS in processing biological motion and facial expression movements [122, 123], as well as models that describe the STS as part of a dorsal stream face pathway [124,125,126], suggests that problems with the M channel may also explain impaired STS function. A direct subcortical pathway to the STS from the pulvinar has been demonstrated in monkeys [127], whilst it has been shown in monkeys and in humans that the STS also sends connections to the amygdala [128, 129]. Again, this points to the importance of considering how impairments in the visual pathways projecting to ‘social brain’ regions may be crucial to understanding how individuals with ASD develop difficulties in tracking dynamic social encounters, which in turn can lead to difficulties in social communication.

Future Directions and Concluding Remarks

The proposal outlined in this review, that visual processing deficits could contribute to the development of social communication difficulties in ASD, points towards some novel experiments that could be considered in the future. Firstly, it could be the case that emotion recognition performance is even more impaired in ASD when dynamic facial expressions, rather than the more commonly utilised static photograph displays, are presented. There is already some preliminary evidence for this [130•], although it will be important to test this hypothesis using more realistic stimuli, for example using faces displaying subtle expressions rather than the obvious or exaggerated expressions most often available in face databases, or using more complex emotions (e.g. confused, ashamed, suspicious) rather than the standard six basic expressions (fear, disgust, happy, sad, surprised, angry). There is some promising research indicating that slowing the speed of videos of facial expression improves ASD performance and alters the pattern of eye movement fixations on the mouth region [131, 132•]. As has been suggested by Gepner et al. [132•], these findings provide a promising avenue for therapeutic ASD research. As well as slowing down facial displays to assist in learning, treatments that seek to train or improve rapid visual processing, in particular M channel processing, could also be fruitful. It would be interesting to explore whether spatial frequency or contrast filtering of videos of dynamic faces confirms the expectation that preferential processing by the M channel is disproportionately impaired in ASD populations. As alluded to above, the visual pathway contribution to STS is unclear, and studies attempting to determine which visual pathways provide dominant inputs to this important social perception region will allow testing of the hypothesis that M channel inputs to STS are impaired in ASD.

Less emphasised, but an aspect that deserves more research interest, is that an impairment in the M channel input to motion-sensitive regions in the dorsal stream could make processing of moving facial features more difficult. It is worth noting that motion-sensitive region V5/MT, as well as receiving inputs through the dorsal stream via V1, also receives a fast and direct input from the M channel through the superior colliculus and pulvinar [133, 134]. Marmoset research has demonstrated a direct retino-pulvinar to V5/MT pathway which would facilitate fast detection of motion in the visual field (reviewed in [119]). What role such direct inputs to V5/MT and STS might contribute to the dynamics of facial expression processing in ASD has not yet been investigated in detail.

Conclusions

In closing, ASD is a complex and heterogeneous condition. It is not likely that a single factor will be able to explain the myriad symptoms and presentations found. The current review argues that, although ASD is considered a social communication disorder, considering the important role vision has in the early development of learned social cognition, and ultimately in providing crucial inputs to a range of higher order cognitive processes that are impaired in ASD, such as attention and executive function, may be a promising avenue for future research.