Introduction

The earliest as well as the most current theories of autism are based on the premise that persons with autism process sensory information in a way that is different from others (Brock, Brown, & Boucher, 2002; Frith, 1989; Happé, 2005; Hermelin & O’Connor, 1970; Hutt, Hutt, Lee, & Ounsted, 1964; Just, Cherkassky, & Keller, 2004; Mottron, Dawson, Soulieres, Hubert, & Burack, 2005). Initial clinical reports of atypical reactions to sensory stimuli date back to Kanner (1943), who observed unusual attention to parts rather than wholes among persons he later described as autistic. These early reports were later corroborated and extended by numerous clinical and parental reports, as well as accounts from persons with autism of unusually intense attention to or avoidance of sensory stimuli in all modalities (Cesaroni & Garber, 1991; Grandin, 1992; O’Neill & Jones, 1997; Williams, 1994). The first theories of the causes of atypical behaviours among persons with autism were based on observations of hypo- or hyper-arousal (Hutt et al., 1964) and unusual reactions to sensory input (Kootz, Marinelli, & Cohen, 1982; Ornitz, 1974; Ornitz, Guthrie, & Farley, 1977), as well as evidence of atypical attentional, physiological, and neurological responses to sensory tasks among persons with autism (Hermelin & O’Connor, 1970; Hutt et al., 1964; Ornitz, 1974). Similarly, many of the current theories of autism reflect the theme that sensory atypicalities are core symptoms of autism and have downstream effects on the development of the perceptual system in persons with autism (Bertone et al., in press; Happé, 2005; Just et al., 2004; Mottron & Burack, 2001).

In this paper, we review the research on sensory issues in autism, including studies based on questionnaires, autobiographical accounts, retrospective video observations, and early experimental approaches. The strengths and limitations of the various methodologies are discussed. The goal is to operationalize the construct of sensory integration from a cognitive neuroscience perspective and to present experimental paradigms and techniques that have been used extensively to study sensory integration in normative populations. We conclude with a discussion of the implications of multisensory research for perceptual theories of autism.

Sensory profiles of persons with autism: the status of the evidence

Questionnaires and Rating Scales

Parents of infants with autism often report sensory peculiarities early in the development of their infants. These reports are among the most diagnostically salient features of autism in the first 2 years of life (Dahlgren & Gillberg, 1989). For example, children and adults with autism are reported to be easily distressed or preoccupied by innocuous sights, sounds, odours and textures, and are not responsive to other more meaningful sensations such as the sound of their name (Baranek, 1999; Talay-Ongan & Wood, 2000; Waterhouse, 1999). Atypical sensory-perceptual behaviours appear to persist throughout the development of individuals with autism (Greenspan & Weider, 1997; O’Neill & Jones, 1997) and occur in the absence of hearing and visual defects and other physical dysfunctions (Scharre & Creedon, 1992). Based on a review of research that included anecdotal and clinical reports, the prevalence of sensory sensitivities among persons with autism was estimated to be between 30 and 100% (Dawson & Watling, 2000). From a clinical perspective, the sensory-related behaviours exhibited by persons with autism are thought to help individuals cope with their sensory environment by either generating or avoiding sensory stimulation. Additionally, the frequency or intensity of these behaviours may differentiate persons with autism from other groups with developmental disabilities (Ermer & Dunn, 1998).

Traditionally, clinicians have used parental questionnaires such as the Sensory Profile (SP), for which normative data are available (Dunn & Westman, 1997), to assess the sensory responsiveness of children with autism. Kientz and Dunn (1997) compared the SPs of 32 children with autism (3–13 years) with those of typically developing (TD) children (3–10 years) and found significant differences on 85% of the items, with the children with autism showing more hypo- or hyper-responses (e.g., preoccupations with sensory features, perceptual distortions, paradoxical responses to sensory stimuli). The generalizability of these findings is limited by the small sample size, the significant variability within the group of children with autism, and the absence of matching with the TD group on any measure. In a similar study based on the same sample, Ermer and Dunn (1998) differentiated children with autism or PDD from those with ADHD on the factors of sensory seeking, oral sensory sensitivity, and fine motor perception. However, again, several methodological shortcomings, such as small, biased samples and uneven variability across the groups, limited the significance of the findings (Ermer & Dunn, 1998).

Watling, Deitz, and White (2001) compared the SPs of 40 children with autism or PDD (3–6 years) and 40 TD children matched on CA and found significant differences on 8 of the 10 factors, including Sensory Seeking, Emotional Reactive, Low Endurance/Tone, Oral Sensitivity, Inattention/Distractibility, Poor Registration, Fine Motor Perceptual, and Other. Although the methodology was improved by the narrower age range of the participants, the considerable variability within the group of children with autism/PDD and the lack of mental age matching preclude a meaningful interpretation of the findings.

Children with autism spectrum disorder (ASD) (4–14 years) showed more sensitivities in all modalities than gender- and CA-matched TD children on the Sensory Sensitivity Questionnaire-Revised (SSQ-R; Talay-Ongan & Wood, 2000). However, the heterogeneity of the ASD group (e.g., in diagnosis and severity) and the lack of an IQ-matched TD group limit the findings.

Rogers, Hepburn, and Wehner (2003) addressed many of the methodological limitations of previous sensory-questionnaire studies by studying a group of participants with autism that was more homogeneous with regard to both age (21–50 months) and diagnosis (autism proper as opposed to the broader PDD category), by matching on mental age, and by including two comparison groups of persons with fragile X syndrome and developmental delay. On the short version of the SP, significant differences among the groups were found on tactile sensitivity, taste/smell sensitivity, underreactive/seeks stimulation, auditory filtering, and low energy/weak muscles. The overall number of sensory symptoms reported among the children with autism did not differ from that reported for children with fragile X syndrome, although the type of sensory symptoms varied. The SP sensory scores were not correlated with either the Autism Diagnostic Interview—Revised or the Autism Diagnostic Observation Schedule. Thus, in toddlers, sensory symptoms as measured by the SP were neither specific to autism nor related to its symptoms. Sigman and Capps (1997) caution that persons with autism may not be impaired but rather may react to stimuli in very idiosyncratic ways, such as smelling non-edible objects and attending to objects out of the corners of their eyes. The meaning and function of sensory-related behaviours may be different in persons with autism than in other individuals; thus, sensory impairments cannot be inferred from clinical observations or reports.

Self-reports

Autobiographical accounts provide another relevant source of information on the subjective sensory-perceptual experience of people with autism. Although the communication and, in some cases, cognitive deficits of persons with autism generally preclude parents, clinicians, and researchers from directly accessing their sensory experience, a subgroup of high-functioning, verbal adults with autism have provided autobiographical accounts of their subjective sensory experience (see Volkmar & Cohen, 1985). Generally, the reports refer to difficulties in the reception (input) and processing (making sense) of sensory information (Cesaroni & Garber, 1991). The personal accounts include examples, across vision, audition, taste, smell, proprioception, and kinesthesis, of sensory distortions, sensory tune-out and overload, synesthesia (e.g., a sound provoking sensations of colour or smell), difficulties processing information from more than one modality concurrently, and difficulties identifying the source modality of sensory input (Attwood, 1998; Grandin, 1988, 2000; Williams, 1996).

Jones, Quigney, and Huws (2003) conducted a qualitative analysis of numerous first-hand web page accounts of sensory disturbances and identified four response clusters: aversive experiences, coping mechanisms, pleasurable experiences, and awareness of being different. Other findings showed that being touched by others, certain sound frequencies, and light flashing at certain frequencies were considered aversive and were avoided by persons with autism (Cesaroni & Garber, 1991; Grandin, 1988; White & White, 1987). Fascinations with certain smells and movements, and engagement in sensory stereotypies (i.e., repetitive behaviours that increase the likelihood of re-experiencing the sensory event), were sources of interest and pleasure that were sought out by persons with autism (Stehli, 1991; Volkmar & Cohen, 1985). These sensory experiences and the coping behaviours that they elicit may evoke positive or negative feelings about the self (Jones et al., 2003; Volkmar & Cohen, 1985).

Autobiographical accounts of unusual sensory experiences provide only one source of information and must be considered along with other indices, since the reports of one individual with autism may change significantly over time, may not be relevant to others, or may be a confluence of self and others’ memories about events and experiences (O’Neill & Jones, 1997). Furthermore, in the case of autism, as well as in many other disorders that affect mental functioning, characteristics of the disorder, such as idiosyncratic language (Tager-Flusberg, 2001), difficulty with certain aspects of memory (Bowler, Gardiner, & Berthollier, 2004), and perseveration on a topic, may limit the individual’s insight into, and ability to accurately report, their sensory experiences (Happé, 1991).

Retrospective Video Analyses

Systematic observation and analysis of home videos of infants who are later diagnosed with autism provide another source of information on sensory issues in autism. Lösche (1990) found that raters blind to the children’s diagnoses reported delayed sensorimotor development among infants (between 4 and 42 months of age) who were later diagnosed with autism. However, these differences may have been due to mental retardation and not autism. Using a behaviour checklist (the ERC-N scale), Adrien and colleagues (Adrien, Perrot, & Hameury, 1991; Adrien, Perrot, & Sauvage, 1992) found that 9 children with autism and 3 with PDD-NOS (birth to 2 years) showed paradoxical reactions to sounds or appeared deaf, atypical motor movements, and excitability or passivity. In a follow-up study, Adrien, Lenior, and Martineau (1993) improved their methodology by employing blind raters, narrowing the age range to the first year of life, and including observations of TD children. The children with autism showed hypotonia and a lack of social attention, social smiling, and appropriate facial expressions. Osterling and colleagues (Osterling & Dawson, 1994; Osterling, Dawson, & Munson, 2002; Werner, Dawson, & Osterling, 2000) also found that at 8–10 months of age, and increasingly as they approached their first year of life, children with autism were distinguishable from TD children and those with mental retardation in that they looked at others and oriented to their name less frequently. Compared to infants with Down syndrome, those with autism exhibited significantly more mouthing of objects, poorer visual attention, and aversion to social touch (Baranek, 1999). Retrospective videotape analysis is a promising technique; however, researchers should consider the variability in the data sources (e.g., context and setting), difficulties in estimating the age of the child, observer rating biases, and the mental age of the child in order to determine whether a behaviour is atypical or simply delayed (Baranek, 1999; Burack, Iarocci, Bowler, & Mottron, 2004).

Early Behavioural Studies

Sensory, attentional, and perceptual peculiarities were among the first behavioural symptoms to rouse research interest and generate theories about the underlying causes of autism (Hermelin & O’Connor, 1970; Hutt et al., 1964; Ornitz, 1969). In the initial neurological theory of autism, sensory abnormalities were attributed to a chronic state of overarousal resulting from a disturbance in the modulation of arousal level (Hutt et al., 1964). The behavioural symptoms of repetitive motor stereotypies and restricted focus and interests were thought to serve the function of regulating arousal level through external means. However, under experimental conditions in which the amount of sensory input was manipulated, repetitive motor behaviours were not reliably correlated with increased arousal (Sorosky, Ornitz, Brown, & Ritvo, 1968). Similarly, measures of EEG activity during controlled conditions did not support the notion of overarousal among persons with autism (Hutt & Hutt, 1965).

Ornitz (Ornitz & Ritvo, 1968; Ornitz, 1969, 1974) was also concerned with the repetitive motor behaviours exhibited by children with autism but attributed them to problems in the processing of sensory input. He argued that children with autism preferred to use proximal sense receptors such as touch, smell, and taste instead of more distal ones such as audition and vision (Schopler, 1966). Concordantly, Hermelin and O’Connor (1970) found that children with autism displayed different behavioural and physiological responses to visual and auditory stimuli than did matched TD and learning-disabled children. These and other related findings (e.g., Hermelin & O’Connor, 1971; Lovaas, Schreibman, & Koegel, 1971) suggested that individuals with autism relied on different aspects of the stimulus sensory cue or channel than TD children and that the repetitive motor movements of children with autism provided kinesthetic feedback that helped them cope with sensations in their environment, including a sense of their body in space (Ornitz, 1974). These studies were plagued by poorly defined constructs and samples and unsophisticated methodology; thus, the findings lacked validity and reliability.

Current Theories and Research on Sensory Processing

Early notions about the unique ways in which children with autism organize, process, and act on sensory input are reflected in several of the current psychological theories of autism. According to the “weak central coherence” theory, the ability to integrate information across a variety of contexts (perceptual, attentional, linguistic, semantic) for higher-level meaning is impaired (Frith, 1989; Frith & Happé, 1994; Happé, 2005). In some accounts, temporal binding is identified as the key process that is disrupted and likely implicated in the perceptual as well as higher-order deficits observed in autism (Brock et al., 2002), whereas in other studies, processing atypicalities are specifically associated with enhanced sensory processing or discrimination in various modalities (Mottron & Burack, 2001; O’Riordan, Plaisted, Baron-Cohen, & Driver, 2001; Plaisted, O’Riordan, & Baron-Cohen, 1998). Researchers of the neurological aspects of the disorder suggest that structural abnormalities in the cerebellums of persons with autism cause a disruption in the attentional system, particularly in the ability to shift attention within the visual modality and between the auditory and visual modalities (Ciesielski, Knight, Prince, Harris, & Handmaker, 1995; Courchesne, Townsend, & Akshoomoff, 1994; Martineau et al., 1992; Townsend, Harris, & Courchesne, 1996). In contrast, some argue for a broader neurological problem, such as an executive function deficit in the coordination of different sources of information from different modalities (Ozonoff, Strayer, & McMahon, 1994; Russo et al., in press; Zelazo & Müller, 2002), reduced connectivity across brain regions (e.g., Broca’s and Wernicke’s areas) that are specialized for language functions (Just et al., 2004), or reduced feedback modulation between higher and lower cortical areas (Castelli, Frith, & Happé, 2002). The various theories differ with regard to the nature of the problem (structural or functional), the domain and modality affected (sensory or complex cognitive; within or across sensory modalities), and the process involved (integration, binding, feedback modulation between cortical areas, neural connectivity), but all implicate atypical sensory processing as a core feature of autism.

Multisensory processing: operationalizing the concept of sensory integration

The development of perception is founded on the more basic abilities to selectively attend to, and spatially and temporally integrate, multiple sources of input. Typically, multiple sources of input are merged so fluently that the observer is unaware that they are initially segregated; the signals may be processed by different areas of the brain yet are combined to form a unified representation of the object, the action, and the context. The integration of sensory input is necessary for a child to achieve a coherent percept and to plan and coordinate action. In this paper, we explore the concept of sensory processing and integration that involves multiple modalities (multisensory processing) in autism. In particular, we propose that multisensory processing may be a useful construct for conceptualizing and studying the sensory processing and perceptual experience of persons with autism. We chose to focus on multisensory processing and integration because many of the leading theories of autism allude to dynamic constructs and conceptualizations, such as central coherence, temporal binding, shifting attention, enhanced perception, and neural modulation and connectivity, that may involve multisensory processing and integration. Furthermore, there are numerous clinical and anecdotal reports that the sensory abnormalities observed among individuals with autism involve several sense modalities. We present theoretical perspectives on multisensory processing and integration, as well as methodologies that have been used extensively to study these constructs in the normative population. Our goal is to operationalize the concept of sensory integration, a notion that is frequently alluded to in the field of autism yet rarely defined in empirical terms. We suspect that a rigorous definition of the term sensory integration will generate testable hypotheses and lead to refinements in current perceptual theories of autism.

A key motivation for research on multisensory processing is that most objects in the natural environment stimulate more than one of our senses simultaneously. To understand how observers perceive such multisensory objects, or how the brain processes their features, the ways in which sensory signals in different modalities influence each other need to be investigated. Research on this topic has proceeded along three lines. First, anatomical and physiological research on several non-human species has examined how sensory signals that travel along initially separate neural pathways come to interact in the brain (Stein & Meredith, 1993). This work led to the identification of neurons that respond to input from more than one modality and of particular neural firing characteristics that may underlie the integration of sensory features across modalities. Second, behavioural research on humans has focused on the ways in which sensory information in one modality can influence the perception of, and overt responses to, sensory information in another modality (see Calvert, Spence, & Stein, 2004; Welch & Warren, 1986). This work has characterized a variety of multisensory perceptual phenomena, such as ventriloquism (the integration of visual and auditory stimulus location, identity, and timing), as well as other crossmodal influences on perception that seem to be mediated by attention (Spence & McDonald, 2004; Spence, McDonald, & Driver, 2004). Third, cognitive neuroscience research is beginning to identify the patterns of activity within the human brain that are associated with different multisensory perceptual phenomena.

Multisensory Convergence and Integration in the Brain

In humans as well as other complex organisms, sensory signals that are transduced into neural impulses at the various receptor organs are relayed to subcortical and cortical structures along modality-specific pathways. Once the sensory signals reach cortex, processing continues along specialized cortical pathways that are still largely modality specific. Beyond these modality-specific pathways, however, there are several brain areas that are not dedicated to the processing of stimuli within individual sensory modalities. Some of these latter “association” areas appear to be specialized for the integration of information from different sensory systems. Individual neurons in these multisensory brain areas receive input from more than one modality-specific brain area and are responsive to stimulation in more than one modality (Stein & Meredith, 1993). Some neurons in these areas not only receive convergent sensory input and respond to stimulation in two or more modalities separately, but also respond more (or less) vigorously to concurrent multisensory stimulation than would be predicted from their responses to unimodal stimulation alone. The process by which multisensory stimulation enhances (or depresses) neural activity in this fashion is thought to be one mechanism by which information in the different senses can be integrated into multisensory representations of objects in space (see Stein, Jiang, & Stanford, 2004; Stein & Meredith, 1993).

Visual, auditory, and somatosensory inputs converge upon the deeper layers of the superior colliculus (SC), and many neurons within those layers respond to stimulation in multiple modalities (see Stein & Meredith, 1993). Most of these multisensory neurons respond differently to concurrent multisensory stimulation than to stimulation in any single modality. In seminal studies of multisensory processing in the SC, Meredith and Stein (1983, 1986) found that neural responses to multisensory stimuli can exceed the sums of the responses to the individual stimuli. For example, the neural response to an audio–visual stimulus can exceed the sum of the responses to the individual auditory and visual stimuli (denoted AV > A + V) by 1,000% or more. Such dramatic response enhancements are rare, but for the majority of multisensory neurons in the SC, the responses to multisensory stimuli are statistically stronger than the responses to the most effective unimodal stimulus (Stein, Jiang, et al., 2004; Stein, Stanford, Wallace, Vaughan, & Jiang, 2004).
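To make the arithmetic behind such comparisons concrete, the following is a minimal Python sketch of the percent-enhancement comparison and the super-additivity criterion described above. The spike counts and function names are hypothetical and serve only to illustrate the computations.

```python
# A minimal sketch of the multisensory enhancement arithmetic described
# above. The spike counts are hypothetical, chosen only to illustrate
# the comparisons (cf. Meredith & Stein, 1983).

def enhancement_index(av: float, a: float, v: float) -> float:
    """Percent by which the bimodal (AV) response exceeds the most
    effective unimodal response (the larger of A and V)."""
    best_unimodal = max(a, v)
    return 100.0 * (av - best_unimodal) / best_unimodal

def is_superadditive(av: float, a: float, v: float) -> bool:
    """Super-additivity criterion: the AV response exceeds the sum A + V."""
    return av > a + v

# Hypothetical mean spike counts per trial for one SC neuron.
a, v, av = 1.0, 2.0, 24.0
print(f"enhancement = {enhancement_index(av, a, v):.0f}%")  # 1100%
print(f"super-additive: {is_superadditive(av, a, v)}")      # True (24 > 3)
```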

Studies of the multisensory properties of SC neurons provided clues as to when and why multisensory response enhancement takes place. This work led to the discovery of several rules that govern multisensory responses, including the spatial, temporal, and inverse effectiveness rules (see Stein & Meredith, 1993). The spatial and temporal rules concern the relative locations and timing of the stimuli. Stimuli that are spatially and temporally coincident typically lead to multisensory response enhancement, whereas stimuli that are spatially or temporally disparate produce no interaction or response depression (i.e., responses that are weaker than the strongest unimodal response). In the natural environment, sensory signals that occur at about the same time and place typically arise from a common object. Multisensory response enhancements are thus believed to be integrated neural signals that represent objects with multisensory features (Stein & Meredith, 1993). The integrated signal is strongest when the unimodal stimuli elicit weak responses from the multisensory neuron and weakest when at least one of the unimodal stimuli elicits a strong response (the inverse effectiveness rule). Nearly all of the super-additive (i.e., AV > A + V) response enhancements that occur do so when the unimodal stimuli elicit weak responses from multisensory SC neurons. This is precisely the situation in which perception and action systems would benefit the most from a boosted multisensory signal (Stein, Jiang, et al., 2004).
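As a rough illustration of how these three rules interact, the toy predictor below encodes them as simple conditions. It is not a model from the literature; the thresholds are arbitrary assumptions chosen only to mirror the qualitative pattern described above.

```python
# A toy encoding of the spatial, temporal, and inverse effectiveness rules.
# The thresholds are arbitrary illustrative assumptions, not published values.

def predict_interaction(spatial_gap_deg: float, temporal_gap_ms: float,
                        a_resp: float, v_resp: float) -> str:
    """Qualitative prediction for a hypothetical multisensory SC neuron."""
    coincident = spatial_gap_deg < 15.0 and temporal_gap_ms < 100.0
    if not coincident:
        # Spatially or temporally disparate stimuli.
        return "no interaction or response depression"
    if max(a_resp, v_resp) < 5.0:
        # Inverse effectiveness: weak unimodal drive yields the largest
        # proportional (possibly super-additive) enhancement.
        return "large enhancement"
    return "modest enhancement"

print(predict_interaction(5.0, 50.0, 2.0, 1.0))    # coincident, weak drive
print(predict_interaction(5.0, 50.0, 40.0, 30.0))  # coincident, strong drive
print(predict_interaction(40.0, 50.0, 2.0, 1.0))   # spatially disparate
```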

Multisensory Interactions in Perception and Action

As in the case of physiological studies of multisensory neurons, behavioural evidence from studies of human perception and action indicates that organisms make use of multisensory stimulation. Under normal circumstances, multisensory stimulation leads to enhanced perception of, and facilitated responses to, objects in the environment (e.g., Bolognini, Frassinetti, Serino, & Làdavas, 2005; Stein, London, Wilkinson, & Price, 1996; Sumby & Pollack, 1954). In the lab, however, researchers often introduce discrepancies between stimuli that are normally concordant. In these circumstances, multisensory stimulation can actually lead to inaccurate perceptions and responses. This type of work has revealed a plethora of multisensory interactions in the processing of stimulus location, identity, and timing (see Calvert, Brammer, & Iversen, 1998; Calvert et al., 2004). Here, we concentrate on the ventriloquist effect and bimodal speech perception, which are the most well-known multisensory interactions in spatial processing and stimulus identification, respectively. Each of these multisensory interactions is dominated by the visual modality, in that visual stimulation profoundly affects auditory processing but not vice versa. Other multisensory interactions, particularly those that involve the processing of stimulus timing, are dominated by the auditory modality (see Shams, Kamitani, & Shimojo, 2004).

Multisensory interactions in spatial processing are evident in many everyday situations. For example, when we watch television or sit in a movie theatre, we hear sounds that come from speakers around the screen, but we perceive the sounds as if they were emanating from the on-screen objects themselves. A particularly amusing example of this type of multisensory interaction is the ventriloquist’s illusion, in which a performer “throws” her voice to a dummy by minimizing her own mouth movements and simultaneously moving the dummy’s mouth. Most observers know that the dummy is not actually speaking, yet it is difficult not to perceive the voice as emanating from the dummy’s mouth.

Analogous examples of the ventriloquism effect are produced in the lab using more mundane auditory and visual stimuli. The general strategy is to present concurrent auditory and visual stimuli at different locations and determine the consequences of the spatial discordance on the perceived locations of the stimuli. Participants are asked either to point to the apparent location of one of the stimuli and to ignore the other, or to indicate whether the two stimuli appeared at the same location (see Bertelson, 1999; Bertelson & de Gelder, 2004). The first method enables researchers to investigate whether the presence of a stimulus in one modality biases the perceived spatial location of a stimulus in another modality, whereas the second method enables researchers to investigate whether stimuli presented at different locations are fused into a common object. Typically, visual information biases the perceived location of auditory stimuli and dominates the perceived location of fused audio–visual objects (Bertelson, 1999; Welch & Warren, 1986).
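In pointing versions of this paradigm, the bias is commonly quantified as the proportion of the audio–visual discrepancy by which auditory localization shifts toward the visual stimulus. The sketch below illustrates that computation; the measure is standard, but the trial data and names are hypothetical.

```python
# A sketch of the proportional-bias measure used in pointing versions of
# the ventriloquism paradigm. The trial data are hypothetical.
import statistics

def ventriloquism_bias(shifts_deg: list, av_discrepancy_deg: float) -> float:
    """Mean shift of auditory localization toward the visual stimulus,
    as a proportion of the audio-visual discrepancy (0 = no bias,
    1 = complete visual capture)."""
    return statistics.mean(shifts_deg) / av_discrepancy_deg

# Hypothetical trials: sound at 0 degrees, light at +20 degrees; positive
# values are pointing shifts toward the light.
shifts = [12.0, 15.0, 10.0, 14.0, 13.0]
print(f"visual bias = {ventriloquism_bias(shifts, 20.0):.0%}")  # 64%
```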

The pattern of intersensory biases that has emerged suggests that vision dominates the multisensory interactions in spatial perception, but not completely (Welch, 1999; Welch & Warren, 1980). In particular, visual stimulation largely, but not completely, biases the perceived location of auditory and proprioceptive stimuli, whereas auditory or proprioceptive stimulation biases the perceived location of visual stimuli only slightly (e.g., Bertelson & Radeau, 1981; Hay, Pick, & Ikeda, 1965). Such biases occur only when the spatial discrepancies are sufficiently small (<30°) and the stimuli are presented concurrently (e.g., Jack & Thurlow, 1973). This pattern of findings suggests that the perceptual system integrates signals from different modalities when it can be assumed that the stimuli were produced by the same external object. Discrepancies between conflicting stimuli are resolved largely on the basis of the information provided by the most precise, or most appropriate, modality (Howard & Templeton, 1966; Welch & Warren, 1980). Vision has the highest spatial resolution and provides the most reliable spatial information; thus, it produces the largest intersensory biases in spatial perception. By comparison, the auditory modality supplies the most precise temporal information, which explains why it dominates other modalities in the perception of stimulus timing (Shams et al., 2004).
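One standard way to formalize the idea that the most precise modality dominates is reliability-weighted averaging of the unimodal estimates. The sketch below is offered in that spirit as an illustrative assumption, not as a model tested in the studies reviewed here; all values are hypothetical.

```python
# Reliability-weighted averaging: each modality's location estimate is
# weighted by its inverse variance. An illustrative formalization of
# "the most precise modality dominates"; all values are hypothetical.

def fused_location(x_v: float, var_v: float, x_a: float, var_a: float) -> float:
    """Combine visual and auditory location estimates (degrees),
    weighting each by its reliability (1/variance)."""
    w_v = (1.0 / var_v) / (1.0 / var_v + 1.0 / var_a)
    return w_v * x_v + (1.0 - w_v) * x_a

# Vision is spatially precise (small variance); audition is coarse.
print(fused_location(x_v=10.0, var_v=1.0, x_a=0.0, var_a=16.0))  # ~9.4
# The fused location falls near the visual stimulus: visual "capture".
# Reversing the variances pulls the estimate toward audition, as in
# temporal tasks where audition is the more precise modality.
```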

Multisensory interactions in stimulus identification are most evident during face-to-face communication, when people are confronted with the sound of the speaker’s voice and the sight of the speaker’s facial expressions, body “language”, and articulations (lip and mouth movements). Although few individuals are expert lip readers, most process the articulatory movements of the speaker’s lips and mouth to a surprisingly large degree. The presence of congruent visual information can substantially improve comprehension of speech in noisy environments (Sumby & Pollack, 1954), whereas the presence of incongruent visual information can lead to illusory auditory perceptions (McGurk & MacDonald, 1976). These latter incongruency effects are studied by presenting congruent and incongruent pairs of auditory and visual consonant–vowel combinations (e.g., /ba/). On incongruent-pair trials, an audible consonant–vowel combination is dubbed onto a video of a person speaking a different combination. In this case, people typically report hearing either the visual consonant–vowel combination or something else. For example, when an audible /ba/ is dubbed onto a visual /ga/, people most often hear either /ga/ or /da/. This type of effect is called the McGurk effect.
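The trial structure of such dubbing experiments is straightforward to sketch. The snippet below builds a hypothetical list of congruent and incongruent (McGurk-type) audio–visual syllable pairs; the syllable set and trial structure are illustrative assumptions, not the design of any particular study.

```python
# A sketch of a McGurk-style trial list: every auditory syllable is paired
# with every visual (mouthed) syllable. The syllable set is illustrative.
import itertools
import random

syllables = ["ba", "ga", "da", "pa"]
trials = [{"auditory": a, "visual": v, "congruent": a == v}
          for a, v in itertools.product(syllables, repeat=2)]
random.shuffle(trials)

# The classic incongruent pair: auditory /ba/ dubbed onto visual /ga/,
# most often heard as /ga/ or /da/ (McGurk & MacDonald, 1976).
classic = [t for t in trials if t["auditory"] == "ba" and t["visual"] == "ga"]
print(classic[0])
```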

The McGurk effect is a powerful demonstration of multisensory integration. The finding that people often report hearing something in between the auditory and visual speech signals suggests that both modalities contribute to the final, fused percept. Thus, there are some similarities between the multisensory interactions that influence stimulus identity and those that influence spatial perception. Unlike the ventriloquism effect, however, the McGurk effect can tolerate substantial spatial and temporal separations between the auditory and visual signals (Jones & Munhall, 1997; Munhall, Gribble, Sacco, & Ward, 1996). These differences suggest that the rules that govern multisensory integration at the perceptual level might depend on the type of information being processed (Calvert et al., 1998).

Multisensory Interactions in the Human Brain

Advances in the neurosciences over the past decade have made possible the study of multisensory interactions in the active human brain. Neuroimaging techniques, such as functional magnetic resonance imaging (fMRI), measure changes in metabolism or blood flow associated with neural activity, whereas electrophysiological techniques, such as electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings, measure fluctuations in the electrical or magnetic fields that are generated by neural currents (Huettel, Song, & McCarthy, 2004; Picton, Lins, & Scherg, 1995). Each of these techniques can provide different clues about the ways in which the human brain integrates information from the various sensory modalities into meaningful, multisensory perceptions. By considering data from multiple techniques, it is possible to document patterns of brain activity associated with the multisensory interactions in spatial processing and stimulus identification. The preliminary evidence suggests that there may be multiple mechanisms by which multisensory signals are integrated in the human brain.

Multisensory interactions in spatial processing have been investigated with functional neuroimaging and electrophysiological recordings. Using fMRI, researchers identified candidate multisensory areas by looking for regions of cortex that are activated by both visual and tactile stimuli (Macaluso & Driver, 2001, 2005a). One region—the anterior part of the intraparietal sulcus—can be activated by contralateral stimulation in either modality, which suggests that this area contains a spatially organized representation of multisensory space. Based on the spatial and temporal rules governing multisensory integration at the cellular level, concurrent visuo-tactile stimulation at a single location might be expected to enhance activity within such a multisensory area. Surprisingly, however, concurrent visuo-tactile stimulation does not modulate activity within the intraparietal sulcus, but it does modulate activity within visual and somatosensory areas of the occipital and parietal lobes, respectively (Macaluso & Driver, 2005a; Macaluso, Frith, & Driver, 2000). These findings indicate that multisensory processing in the human brain involves the multisensory areas of “association” cortex as well as the modality-specific areas of cortex that are specialized for the processing of stimulus features such as colour, shape, texture, and the like. The modulation of modality-specific sensory activity by spatially congruent bimodal stimulation was hypothesized to involve feedback from higher multisensory areas (Macaluso & Driver, 2001, 2005a, 2005b; Macaluso et al., 2000). The reciprocal nature of the interaction between multisensory and modality-specific areas of the cortex suggests that both the organization of, and the functional relations between, these areas must be considered to fully understand the perceptual consequences of multisensory stimulation.

Multisensory interactions in spatial processing have also been examined by comparing the brain’s electrical responses to bimodal stimuli with its responses to unimodal stimuli (see Fort & Giard, 2004). In this line of research, EEG is recorded from multiple electrodes placed on the scalp, and the brain’s electrical responses to the stimuli are determined by averaging portions of the EEG that are time-locked to the stimulus onsets (see Picton et al., 1995). The averaged electrical responses, called event-related potentials (ERPs), consist of several voltage fluctuations that can be characterized both temporally and spatially (e.g., distributions across the scalp, estimated anatomical sources). Multisensory interactions are studied by comparing the ERPs elicited by bimodal stimuli (e.g., audio–visual, AV) with the sum of the ERPs elicited by the constituent unimodal stimuli (e.g., auditory + visual, A + V). Differences between the bimodal response and the sum of the unimodal responses (e.g., AV > A + V) can be attributed to multisensory interactions, except when there are processes that are common to all three stimuli (see Teder-Sälejärvi, McDonald, Di Russo, & Hillyard, 2002). In this latter situation, the common activity is represented twice in the sum of the unimodal-stimulus ERPs but only once in the bimodal-stimulus ERP.
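The core arithmetic of this additive comparison is easy to illustrate. The NumPy sketch below simulates single-electrode epochs for A, V, and AV stimuli and computes the AV − (A + V) difference; real analyses involve many electrodes, artifact rejection, and baseline correction, and all simulated values here are hypothetical.

```python
# A sketch of the additive (AV vs. A + V) ERP comparison, with simulated
# single-electrode data. All signal parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 200, 500   # e.g., 500 samples = 1 s at 500 Hz

def erp(epochs: np.ndarray) -> np.ndarray:
    """Average time-locked epochs (trials x samples) into an ERP."""
    return epochs.mean(axis=0)

# Simulated epochs (microvolts): a common waveform plus noise, with a
# deliberately non-additive AV amplitude (3.0 > 1.0 + 1.5).
wave = np.sin(np.linspace(0.0, np.pi, n_samples))
a_ep = 1.0 * wave + rng.normal(0.0, 5.0, (n_trials, n_samples))
v_ep = 1.5 * wave + rng.normal(0.0, 5.0, (n_trials, n_samples))
av_ep = 3.0 * wave + rng.normal(0.0, 5.0, (n_trials, n_samples))

interaction = erp(av_ep) - (erp(a_ep) + erp(v_ep))
print(round(float(interaction.max()), 2))
# A reliable nonzero difference is a candidate multisensory interaction,
# unless activity common to all three stimuli (counted twice in A + V)
# accounts for it (cf. Teder-Salejarvi et al., 2002).
```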

The procedure described above was first applied to the study of audio–visual interactions by Giard and Peronnet (1999), who found evidence for multisensory interactions in modality-specific auditory and visual cortices beginning as early as 40 ms after stimulus onset. The earliest differences may have reflected either multisensory interactions or anticipatory processes that were common to all three stimuli (Teder-Sälejärvi et al., 2002), but such early differences were also observed in subsequent audio–visual and audio–tactile experiments that were designed to minimize anticipation (Foxe et al., 2000; Molholm et al., 2002). Other differences that were attributed to audio–visual interaction occurred in occipital and superior temporal (auditory) cortices about 190 and 260 ms after stimulus onset, respectively (Teder-Sälejärvi, Di Russo, McDonald, & Hillyard, 2005; Teder-Sälejärvi et al., 2002). The latter of these differences was hypothesized to reflect neural activity associated with shifts in the perceived locations of auditory stimuli caused by concurrent visual stimuli (i.e., the ventriloquism effect). This implies that intersensory biases in spatial processing take place at relatively late stages of processing and are mediated by feedback pathways from higher multisensory cortical areas to modality-specific cortical areas (Teder-Sälejärvi et al., 2005). Other than this one effect, however, the ERP differences attributed to audio–visual or audio–tactile interactions do not appear to depend on the spatial congruency of the stimuli (Murray et al., 2005; Teder-Sälejärvi et al., 2005). This suggests that a different set of rules may govern most of the multisensory interactions in human cortex.

Multisensory interactions in speech perception have been studied with functional neuroimaging and electromagnetic recordings (see Calvert et al., 1998). In an early fMRI study, silent viewing of a speaker’s moving lips was found to activate modality-specific auditory areas of the superior temporal cortex that are involved in the processing of auditory speech (Calvert et al., 1997; see also MacSweeney et al., 2000). In a subsequent study, congruent audio–visual speech was found to enhance activity in modality-specific auditory and visual cortices relative to either auditory or visual speech alone (Calvert et al., 1999). These enhancements were hypothesized to underlie the perceptual gains resulting from concurrent audio–visual stimulation (e.g., Sumby & Pollack, 1954).

No enhancements were found in multisensory regions of the cortex in the early fMRI studies. However, evidence for multisensory enhancement in a posterior region of the left superior temporal sulcus (STS) was found with a more advanced experimental design involving the presentation of auditory and visual speech streams (Calvert, Campbell, & Brammer, 2000). The task entailed the presentation of unpredictable periods of auditory, visual, congruent audio–visual, and incongruent audio–visual speech stimuli. Non-additive response enhancements (AV > A + V) and depressions (AV < A + V) were found in the left posterior STS for congruent and incongruent audio–visual stimuli, respectively. Weaker multisensory interactions were found in other cortical areas, including modality-specific areas of auditory and visual cortices. Thus, the STS was hypothesized to be a primary region for the multisensory convergence of auditory and visual speech signals, and the interactions in auditory and visual cortical areas were hypothesized to be a consequence of re-entrant feedback from the STS. This hypothesis is consistent with the fMRI and ERP data indicating multisensory interactions in spatial processing within modality-specific visual and auditory areas (e.g., Giard & Peronnet, 1999; Macaluso & Driver, 2005a). In both cases, the interactions in the modality-specific areas are believed to underlie the perceptual costs and benefits produced by multisensory stimulation.

Evidence from several ERP and MEG studies also suggests that the multisensory interactions in bimodal speech involve modulations of activity in modality-specific regions of cortex (Möttönen, Krause, Tiippana, & Sams, 2002; Sams et al., 1991). For example, Sams et al. (1991) found that incongruent audio–visual speech stimuli (e.g., auditory /pa/ and visual /ka/) that are perceived to deviate from standard congruent stimuli (e.g., auditory /pa/ and visual /pa/) activate auditory cortex starting at about 180 ms after stimulus onset. In a follow-up study, Möttönen et al. (2002) found that changes in sequences of visual speech stimuli can activate auditory cortex even in the absence of concurrent auditory speech stimuli. Thus, multisensory processing in humans may be achieved in different ways, including convergence in higher-level multisensory areas, feedback from such areas to lower-level areas in putatively unimodal sensory pathways (e.g., Calvert et al., 2000; Macaluso & Driver, 2005b), or even direct connections among these lower-level sensory areas (cf. Clavagnier, Falchier, & Kennedy, 2004). However, it is not yet clear how the different rules of integration map onto these diverse pathways (Murray et al., 2005; Teder-Sälejärvi et al., 2005).

Implications for Perceptual Theories of Autism

Current conceptualizations of perceptual functioning among persons with autism allude to dynamic constructs such as temporal binding, attentional coordination, and neural modulation and connectivity (Brock et al., 2002; Castelli et al., 2002; Just et al., 2004; Mottron, Burack, Iarocci, Belleville, & Enns, 2003; Schultz et al., 2000). Empirical innovations include more ecologically valid paradigms that permit the assessment of the organization of functioning, in which specific components of processing are placed in competition with each other. Within this more complex framework of coordinated processes, the patterns of performance may provide more fine-tuned depictions of expected differences between persons with and without autism. For example, perceptual atypicalities may arise from the integration and organization of specific processes, rather than solely from impairments in the individual components. In the case of global–local processing, the unexpected but consistent evidence of intact global processing among persons with autism (e.g., Mottron, Burack, Stauder, & Robaey, 1999; Ozonoff et al., 1994) was found in situations in which only that type of processing was necessary, whereas the expected impairments in global processing were manifested, with local processing predominating, when attention was divided between local and global processing (Plaisted, Swettenham, & Rees, 1999). Similarly, orienting impairments may be evident only when reflexive and voluntary components are mobilized in competition with each other (Iarocci, Burack, Mottron, Randolph, & Enns, under revision).

The debate about abnormalities in top-down vs. bottom-up processing in autism is quelled by a more systems-based approach to the development of perception, according to which the assessment of any perceptual ability involves identifying the level at which specific processes affect perceptual development and analyzing the essential aspects (i.e., the interdependencies and relations among the processes) that give meaning to the overall system that is developing (Gottlieb, Wahlsten, & Lickliter, 1998; Thelen & Smith, 1994). What emerges is the recognition that a precise and detailed understanding of the components of the perceptual system is necessary but not sufficient. Perception cannot be fully understood by dissecting its dynamic systems into their constituent parts, because the individual components and their associated functions are embedded within the fabric of the whole system. Accordingly, understanding the phenomenon of atypical perception requires investigations in multiple modalities; the use of multiple methods and contexts, including comparisons of specific clinical groups; and assessments of interactions as they unfold over time (Burack et al., 2004).

The application of theories of visual perception in autism (e.g., enhanced feature processing and discrimination, or deficits in spatial localization, in persons with autism) to better understand auditory perception among persons with autism represents a first step toward a systems-based approach to studying perception in real-life settings. The benefit of venturing beyond the visual modality is that the original theory may be fine-tuned to reflect which of the proposed phenomena are specific to the visual domain and which reflect a generalized mechanism that underlies both the visual and auditory domains (Heaton, Hermelin, & Pring, 1998; Mottron, Peretz, & Ménard, 2000; Plaisted, Saksida, Alcántara, & Weisblatt, 2003; Teder-Sälejärvi et al., 2005). Further, these refinements may generate hypotheses regarding the perceptual experience of persons with autism in multisensory contexts, wherein visual information is known to bias the perception of fused audio–visual objects (Bertelson, 1999; Welch & Warren, 1986) and auditory information dominates in the perception of stimulus timing (Shams et al., 2004).

The next step in the quest for a comprehensive theory of perception in autism is to address the consequences of enhanced feature detection or discrimination, weak central coherence or temporal binding, and atypical neural modulation or connectivity for perception in the context of the multisensory world. This issue becomes more pressing with the mounting evidence from studies of human sensory development and intersensory perception that the senses are not as segregated as initially thought and that both differentiation and integration processes are involved in perceptual development (Lickliter & Bahrick, 2004). Another impetus for sensory integration research is the sheer number of behavioural observations of sensory abnormalities, in all modalities, among individuals with autism of all ages and levels of functioning. Although not exclusive to autism, these abnormalities may occur at specific periods in development or in a particular constellation that is uniquely related to the early clinical markers of autism (e.g., lack of gaze monitoring). Thus, we argue that the exploration of multisensory issues in autism should not remain a clinical pursuit (Benaroya, 1977, 1979; Chan, Fung, & Tong, 2005) uncharted by basic researchers in the field of autism. Rather, borrowing theories and methods from cognitive neuroscience will provide a common language for a collaborative course of inquiry that may prove fruitful for both clinicians and researchers.