Accurate integration of multisensory information serves essential purposes in daily life. For example, we recognize ourselves in the mirror and distinguish our own shadow from others’ by matching the movements we intend to generate with the movements seen in the visual image. Successful motor control, such as grabbing a coffee mug, critically relies on the integration of visual and proprioceptive information about one’s arm and hand position, and of visual and tactile information about where the fingers are on the handle. Accurate integration of multisensory information is therefore crucial both for distinguishing oneself from the external world and for interacting with the environment. As the brain is constantly flooded with sensory information from both one’s own body and the environment, how it properly integrates and segregates this information becomes an important question for understanding the mechanisms underlying body representation. In this chapter, we first discuss behavioral work that investigates the constraints and principles underlying multisensory integration regarding the body. We then introduce a Bayesian framework that theorizes multisensory integration as an optimal observer inferring the source of sensory inputs. Finally, we review evidence from neuroimaging and neurophysiological studies on the neural correlates and computational principles of body representation.

5.1 Temporal and Spatial Constraints on Multisensory Integration

Multisensory integration is typically studied by creating a mismatch between inputs from different modalities. Early studies on the integration of visual and auditory information identified spatial and temporal rules of multisensory integration, such that inputs are more likely to be combined when they are closer in space and time (Meredith 1987; Meredith and Stein 1986). The same rules also apply to multisensory integration regarding the body. Studies have shown that the processing of bodily signals is influenced by external stimuli occurring within a limited space around the body part, also known as the peripersonal space (PPS) (Làdavas et al. 1998; Spence et al. 2004). For example, the perceived location of a tactile stimulus is more strongly biased toward a concurrent visual stimulus when the visual stimulus occurs close to the body than when it occurs far away (Spence et al. 2004). These findings demonstrate the multisensory nature of body representation: tactile perception is automatically biased by task-irrelevant visual information. Moreover, there is a spatial limit within which sensory inputs are treated as body-related and integrated together.

Multisensory integration underlies not only the perception of bodily sensory inputs but also how the brain represents the body itself. Because sensory signals arising from one’s own body are normally congruent and difficult to dissociate, bodily illusions are often used. In these paradigms, participants’ actual hand is hidden from view while they see a fake hand; in this way, proprioceptive and visual information are dissociated. Integration between visual and proprioceptive information is indexed by illusory embodiment of the viewed hand: the viewed hand feels like part of one’s own body (referred to as body ownership), and one’s hand is perceived as located closer to where the viewed hand is (Botvinick and Cohen 1998; Holmes et al. 2004). By examining how spatial and temporal congruence between the viewed hand and the unseen actual hand affects the strength of the illusion, researchers can identify factors influencing multisensory integration.

One important factor influencing multisensory integration is the spatial location of the viewed fake hand. As the distance between the viewed hand and the unseen actual hand increases, making the discrepancy between visual and proprioceptive information harder to reconcile, participants are less likely to feel that the viewed hand is their own (Medina et al. 2015) or that their hand is located where the viewed hand is (Holmes et al. 2004, 2006; Holmes and Spence 2005). By gradually displacing the rubber hand further from participants’ unseen actual hand, Lloyd (2007) found that the illusion decreased with increasing distance. Importantly, the decrease was nonlinear, dropping off sharply once the rubber hand was positioned outside of the participants’ reachable space, marking the boundary of the spatial range within which an object can be embodied.

In addition to spatial location, a congruent posture of the viewed and the unseen actual hand is critical. For example, with a fixed distance between the rubber hand and the unseen actual hand, subjective ownership of the rubber hand decreased when the angles of the two hands mismatched (e.g., fingers pointing forward vs. pointing 30° leftward), despite synchronous visuotactile stimulation (Costantini and Haggard 2007; Ide 2013).

It is well established that temporal synchrony between stimulation of the viewed hand and the unseen actual hand plays an important role. In the rubber hand illusion, synchronous strokes on the hidden hand and the rubber hand, such that the strokes seen on the viewed hand match the tactile sensations on the actual hand, elicit a strong illusion, whereas asynchronous strokes abolish it (Botvinick and Cohen 1998; Tsakiris and Haggard 2005). Using a different and often more powerful paradigm, the mirror box illusion, researchers can manipulate whether the unseen hand and the viewed hand perform congruent movements (Ramachandran and Rogers-Ramachandran 1996; Medina et al. 2015). Whereas participants experience strong ownership of the viewed hand when the movements seen on the viewed hand are congruent with the movements performed by the unseen actual hand, the illusion is much weaker when the movements are out of phase (Holmes and Spence 2005; Liu and Medina 2018; Medina et al. 2015). These findings provide evidence for the importance of temporal synchrony in multisensory integration.

The evidence discussed above suggests a key role of cross-modal congruence in multisensory integration and body representation. However, matching between bottom-up sensory signals is not sufficient for the brain to embody an object. Intuitively, one would not perceive a dog’s paw as one’s own hand regardless of the sensory input, implying additional constraints from prior knowledge of the body that are independent of incoming sensory inputs. The following section discusses the influence of such prior information on multisensory integration.

5.2 Prior Knowledge of the Body Influences Multisensory Integration

At the level of visual features, the human body has specific anatomical and structural properties that differ from other objects (e.g., “a hand has five fingers sticking out”). At the semantic level, we use labels and descriptive language to distinguish objects of different categories (e.g., a human body versus a tree). These forms of prior knowledge are learned through life experience and exist independently of online sensory information. Using the rubber hand illusion paradigm, studies have found weaker illusory embodiment when individuals view a non-hand object (e.g., a wooden stick) rather than a realistic rubber hand (Holmes and Spence 2005; Tsakiris et al. 2008; Tsakiris and Haggard 2005). The illusion is also weaker for neutral objects whose shape deviates from that of a standard hand (Haans et al. 2008; Tsakiris et al. 2010). These findings indicate that individuals have prior expectations about what features constitute a hand, based on a stored visual body representation (Tsakiris 2010). Interestingly, visual features that are more specific to one’s own body, such as size and color, do not have a dramatic effect on the embodiment of the viewed hand or body (Austen et al. 2004; Farmer et al. 2012, 2014; Peck et al. 2013; but see Pavani and Zampini 2007). Based on these findings, it has been proposed that the stored visual body representation encodes general shape information about body parts rather than self-specific features (Kilteni et al. 2015; Tsakiris 2010).

Another source of prior knowledge comes from the body schema, which represents online body position in space as the body moves (Head and Holmes 1911; Schwoebel and Coslett 2005). The movement of the body is limited by biomechanical constraints, the resistance caused by joint and muscle structures that determines the difficulty and the possible range of body movements (Parsons 1987, 1994). As such, biomechanical constraints are also encoded in the body schema. Importantly, biomechanical constraints not only affect physical movements but also manifest in mental body representations. For example, when individuals are asked to judge the chirality of a hand image presented at a given orientation (i.e., whether it is a left or a right hand), reaction times are longer when the rotation from the individual’s own hand posture to the posture of the hand image is more biomechanically constrained, even if the rotation angle is the same (Cooper and Shepard 1975; Parsons 1987, 1994; Zapparoli et al. 2014). These findings provide evidence that participants perform the task by mentally simulating body movements, in which biomechanical constraints are encoded.

There is evidence that biomechanical constraints between the unseen actual hand and the viewed hand influence multisensory integration. With the angular difference between the viewed hand and the unseen actual hand fixed, biomechanical constraints can be manipulated such that the rotation from the actual hand to the viewed hand is less biomechanically constrained in one condition and more constrained in another (Ide 2013; Liu and Medina 2017). Participants reported weaker illusions in the more biomechanically constrained condition despite the matched angular differences, indicating that the degree of biomechanical constraint is also factored into the overall discrepancy between visual and proprioceptive information. These findings show that multisensory integration is influenced not only by bottom-up sensory information but also by prior information stored in the body schema.

Another type of constraint concerns the anatomical plausibility of the body configuration, i.e., whether a body posture is physically possible. The rubber hand illusion is abolished when the viewed hand is in an anatomically implausible posture (e.g., fingers pointing straight toward one’s body along the sagittal axis), despite synchronous visuotactile stimulation (Ehrsson et al. 2004; Ide 2013; Tsakiris and Haggard 2005). These findings can be accounted for by the encoding of biomechanical constraints discussed above: the viewed hand occupies a position so biomechanically constrained that proprioceptive information cannot be biased toward it. Alternatively, the brain may refer to a body structural description, the stored knowledge of the relative positions of body parts on the body surface (for example, the arm is attached to the trunk) (Buxbaum and Coslett 2010). Anatomically implausible hand postures often imply broken joints, violating the body structural description and leading to lower degrees of embodiment (Kilteni et al. 2015).

In summary, the brain uses both incoming sensory information and prior knowledge of the body in multisensory integration. The interplay of bottom-up sensory inputs and top-down knowledge is summarized in a neurocognitive model of body ownership (Tsakiris 2010) that conceptualizes body representation as a set of information comparators. What factors determine the relative importance of each source of information, and how do these factors lead to the final percept that a hand is one’s own? In the next section, we discuss a Bayesian framework that addresses the computational principles of multisensory integration.

5.3 A Bayesian Framework of Multisensory Integration

Forming a coherent body representation is a process of inferring the state of the body from incoming sensory information. The Bayesian framework posits that, upon receiving sensory inputs from multiple modalities, the brain generates hypotheses about the cause of these inputs; for example, it infers the hand position that gives rise to the perceived visual and proprioceptive information. Given the noise in both external inputs and internal sensory systems, each hypothesis is correct only with a certain probability. These probabilities are called posterior probabilities because they are conditional on a particular set of sensory inputs, written as P(S| xv, xp) (the probability of state S, for example, hand position, given visual (xv) and proprioceptive (xp) information). The goal of an optimal observer is to find the state S with the maximum posterior probability. Following Bayes’ rule, the posterior probability depends on the product of two components: the likelihood of obtaining a particular set of sensory inputs given state S (P(xv, xp| S)), and the prior probability of this hypothesis based on prior knowledge (P(S)) (see Eq. 5.1).

$$ P\left(S|{x}_v,{x}_p\right)=\frac{P\;\left({x}_v,{x}_p|S\right)\;P(S)}{P\left({x}_v,{x}_p\right)} $$
(5.1)

As the denominator P(xv, xp) is independent of the state S, Eq. (5.1) is simplified as:

$$ P\left(S|{x}_v,{x}_p\right)\propto P\left({x}_v,{x}_p|S\right)P(S) $$
(5.2)

Assuming a uniform prior distribution P(S), i.e., equal prior probability that the hand appears anywhere in space, maximizing the posterior probability is equivalent to maximizing the likelihood. Further assuming that, given the state S, each sensory modality is corrupted by independent Gaussian noise, Eq. (5.2) simplifies to:

$$ P\left(S|{x}_v,{x}_p\right)\propto P\left({x}_v|S\right)P\left({x}_p|S\right) $$
(5.3)

Maximizing the likelihood in Eq. (5.3) then yields a weighted sum of the visual and proprioceptive inputs based on their relative reliability, where the reliability of each input is the inverse of its variance:

$$ \hat{S}=\frac{\frac{1}{\sigma_v^2}}{\frac{1}{\sigma_v^2}+\frac{1}{\sigma_p^2}}{x}_v+\frac{\frac{1}{\sigma_p^2}}{\frac{1}{\sigma_v^2}+\frac{1}{\sigma_p^2}}{x}_p $$
(5.4)
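To make the step from Eq. (5.3) to Eq. (5.4) explicit, here is a brief derivation under the stated assumptions (a flat prior and Gaussian likelihoods centered on S). Taking the logarithm of Eq. (5.3) gives

$$ \log P\left(S|{x}_v,{x}_p\right)=-\frac{{\left({x}_v-S\right)}^2}{2{\sigma}_v^2}-\frac{{\left({x}_p-S\right)}^2}{2{\sigma}_p^2}+\mathrm{const}. $$

Setting the derivative with respect to S to zero yields

$$ \frac{{x}_v-\hat{S}}{\sigma_v^2}+\frac{{x}_p-\hat{S}}{\sigma_p^2}=0\quad \Rightarrow \quad \hat{S}=\frac{{x}_v/{\sigma}_v^2+{x}_p/{\sigma}_p^2}{1/{\sigma}_v^2+1/{\sigma}_p^2}, $$

which is Eq. (5.4). A standard consequence of combining the two Gaussians is that the variance of the fused estimate equals 1/(1/σv2 + 1/σp2), smaller than either unimodal variance, so the fused percept is more reliable than either input alone.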

As a result, the final percept is biased toward the more reliable unimodal estimate, and the reliability of the final percept is maximized (Ernst and Bülthoff 2004). This model therefore accounts for the finding that participants perceive their hand as shifted toward the viewed hand (i.e., visual capture), presumably because visual information about hand position is typically less noisy than proprioceptive information. As an optimal weighting principle, maximum-likelihood estimation has been supported by multiple studies across sensory modalities (Alais and Burr 2004; Ernst and Banks 2002; Van Beers et al. 1999; Witten and Knudsen 2005). For example, in one such study, participants estimated the height of a block based on haptic and visual information (Ernst and Banks 2002). Small discrepancies between visual and haptic inputs were introduced by manipulating the visual image viewed through a pair of goggles. Consistent with the model, participants’ estimates were biased toward the visual information, with the weight of the visual information decreasing as visual noise increased.
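As a concrete illustration of Eq. (5.4), the short Python sketch below implements reliability-weighted fusion. It is only an illustration: the positions and noise levels are hypothetical values chosen to mimic the visual-capture situation described above (vision less noisy than proprioception), not parameters from any of the cited studies.

def fuse_visual_proprioceptive(x_v, x_p, sigma_v, sigma_p):
    """Reliability-weighted fusion of a visual and a proprioceptive position
    estimate (Eq. 5.4), with reliability defined as inverse variance."""
    r_v, r_p = 1.0 / sigma_v**2, 1.0 / sigma_p**2    # reliabilities
    w_v = r_v / (r_v + r_p)                          # weight of the visual input
    s_hat = w_v * x_v + (1.0 - w_v) * x_p            # fused position estimate
    sigma_fused = (1.0 / (r_v + r_p)) ** 0.5         # fused estimate is less noisy
    return s_hat, sigma_fused

# Hypothetical example: vision reports the hand at 0 cm with little noise,
# proprioception reports it at 10 cm with more noise; the fused estimate is
# pulled strongly toward the visual position (visual capture).
print(fuse_visual_proprioceptive(x_v=0.0, x_p=10.0, sigma_v=0.5, sigma_p=2.0))
# approximately (0.59, 0.49)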

An important consequence of this model is that the unimodal estimates are always fused into a single final percept; hence the model is referred to as the forced-fusion model (Körding et al. 2007; Shams and Beierholm 2010). Although it explains human behavior well when the discrepancy between unimodal inputs is small and easy to resolve, it does not apply to all real-world situations. Organized representations require not only accurate integration of inputs belonging to the same object, but also segregation of inputs that come from different sources. For example, when playing a duet, it is equally important to integrate information about one’s own hand and to segregate information coming from the partner’s hand. In experiments on bodily illusions, the viewed hand is not embodied if it is placed too far from the participant’s actual hand (Lloyd 2007; Medina et al. 2015), indicating segregation of visual and proprioceptive information. The underlying problem is to infer whether the inputs are emitted by the same object, or in other words, have a common cause, a process referred to as “causal inference.”

Researchers have proposed a Bayesian causal inference model to account for how the brain infers the causal structure of sensory inputs (Fig. 5.1; Körding et al. 2007; Shams and Beierholm 2010). In this model, whether the multisensory inputs come from a common cause is captured by the posterior probability over the causal structure (P(C| xv, xp)). Two causal hypotheses are compared: the inputs are caused by a common source (C = 1), which leads to complete integration of inputs from both modalities, or the inputs are caused by independent sources (C = 2), which leads to complete segregation of information from the different modalities. The Bayesian causal inference model thus has a hierarchical structure in which the brain first infers the causal structure of the sensory inputs and then estimates the object state (e.g., location) under that causal structure. Following Bayes’ rule, the posterior probability of each causal hypothesis depends on the likelihood of receiving the current sensory inputs given that hypothesis (P(xv, xp| C)) and the prior probability of the hypothesis (P(C)). An important cue that informs the brain about the causal structure of inputs from multiple modalities is the similarity between the inputs: inputs that are closer in time, space, or other dimensions have a higher likelihood under the common-cause hypothesis and are therefore more likely to be integrated.

Fig. 5.1

Causal inference problem in body representation. Upon receiving visual and proprioceptive information about the hand, the brain infers the probability that the two sources of information come from a common cause

How is the estimate of object location contingent on the inferred causal structure? One strategy is to follow the most likely causal hypothesis (model selection; Wozny et al. 2008): if the probability of the common-cause hypothesis is higher than that of the independent-cause hypothesis, the inputs are fully integrated by maximum-likelihood estimation; otherwise, the brain estimates each modality independently without combining them. Another strategy is to weight the estimate under each causal hypothesis in proportion to that hypothesis’s posterior probability (model averaging; see Eqs. (5.5a) and (5.5b); Wozny et al. 2010; Körding et al. 2007). By considering the causal structure of inputs from multiple modalities, the Bayesian causal inference model can account for the full range of multisensory behavior, from complete integration to complete segregation.

$$ {\hat{S}}_p=P\left(C=1|{x}_p,{x}_v\right)\;{\hat{S}}_{vp,c=1}+P\left(C=2|{x}_p,{x}_v\right){\hat{S}}_{p,c=2} $$
(5.5a)
$$ {\hat{S}}_v=P\left(C=1|{x}_p,{x}_v\right)\;{\hat{S}}_{vp,c=1}+P\left(C=2|{x}_p,{x}_v\right){\hat{S}}_{v,c=2} $$
(5.5b)
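To make the hierarchical computation concrete, the Python sketch below computes the posterior probability of a common cause and the model-averaged estimate of Eq. (5.5a) by numerical integration over candidate hand positions. It is an illustrative implementation in the spirit of the model described above, not the fitting code of any cited study; the prior width sigma_0 and the prior probability of a common cause p_common are hypothetical parameters.

import numpy as np

def gauss(x, mu, sigma):
    # Gaussian probability density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def causal_inference(x_v, x_p, sigma_v, sigma_p, sigma_0=30.0, p_common=0.5):
    """Bayesian causal inference with model averaging (Eq. 5.5a).
    sigma_0: width of a zero-mean Gaussian prior over hand position (assumed).
    p_common: prior probability of a common cause, P(C = 1) (assumed)."""
    s = np.linspace(-150.0, 150.0, 30001)        # grid of candidate positions (cm)
    ds = s[1] - s[0]
    prior_s = gauss(s, 0.0, sigma_0)

    # Likelihood of the data under each causal structure, marginalized over s
    like_c1 = np.sum(gauss(x_v, s, sigma_v) * gauss(x_p, s, sigma_p) * prior_s) * ds
    like_c2 = (np.sum(gauss(x_v, s, sigma_v) * prior_s) * ds
               * np.sum(gauss(x_p, s, sigma_p) * prior_s) * ds)

    # Posterior probability of a common cause, P(C = 1 | x_v, x_p)
    post_c1 = p_common * like_c1 / (p_common * like_c1 + (1.0 - p_common) * like_c2)

    # Position estimates under each hypothesis (inverse-variance weighting,
    # with the spatial prior acting as an additional cue centered at 0)
    s_vp_c1 = ((x_v / sigma_v**2 + x_p / sigma_p**2)
               / (1.0 / sigma_v**2 + 1.0 / sigma_p**2 + 1.0 / sigma_0**2))
    s_p_c2 = (x_p / sigma_p**2) / (1.0 / sigma_p**2 + 1.0 / sigma_0**2)

    # Model averaging (Eq. 5.5a): the final (felt) proprioceptive hand position.
    # Model selection would instead pick s_vp_c1 whenever post_c1 > 0.5.
    s_hat_p = post_c1 * s_vp_c1 + (1.0 - post_c1) * s_p_c2
    return post_c1, s_hat_p

# Hypothetical example: proprioception reports 0 cm. A nearby viewed hand (5 cm)
# yields a relatively high probability of a common cause and a felt position
# drawn toward vision; a distant viewed hand (40 cm) yields a near-zero
# probability and almost no drift.
print(causal_inference(x_v=5.0, x_p=0.0, sigma_v=1.0, sigma_p=5.0))
print(causal_inference(x_v=40.0, x_p=0.0, sigma_v=1.0, sigma_p=5.0))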

The Bayesian causal inference model can account for the effects of the various factors discussed above on multisensory integration regarding the body (Fig. 5.2; Fang et al. 2019; Kilteni et al. 2015; Samad et al. 2015). Provided with multisensory information about the hand, including visual, proprioceptive, and tactile signals, the brain faces a causal inference problem: do all sources of information come from the same hand, i.e., “my hand”? The posterior probability of a common hand depends on the likelihood of receiving the current sensory information if it all belongs to the same hand and on the prior probability of there being one hand. The closer the visual and proprioceptive hand positions, the more likely they define the same hand. Similarly, the other spatial and temporal congruence factors discussed above contribute to the likelihood of a common cause. Prior body knowledge, in turn, constrains how strongly a viewed object can be embodied; for example, the model-fitted prior probability of a common cause was lower when participants viewed a woodblock versus a verisimilar hand (Fang et al. 2019). The higher the posterior probability of a common cause, the more strongly the viewed hand is perceived as one’s own (Fang et al. 2019), suggesting that causal inference may be a mechanism by which the brain distinguishes itself from the environment.
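In terms of the sketch above, this influence of prior body knowledge can be captured by assigning a higher prior probability of a common cause to a hand-like object than to a non-hand object; the values below are purely illustrative, not the fitted priors reported by Fang et al. (2019).

# Same sensory disparity, but different (hypothetical) priors of a common cause
print(causal_inference(x_v=5.0, x_p=0.0, sigma_v=1.0, sigma_p=5.0, p_common=0.8))  # hand-like object
print(causal_inference(x_v=5.0, x_p=0.0, sigma_v=1.0, sigma_p=5.0, p_common=0.2))  # woodblock
# The hand-like object yields a higher posterior probability of a common cause,
# consistent with stronger embodiment of realistic hands than of neutral objects.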

Fig. 5.2

Bayesian causal inference in body representation. The posterior probability of a common cause depends on the prior probability and the likelihood. Prior information refers to existing knowledge, including the visual features and anatomical plausibility of the body. The likelihood depends on the congruence between information across modalities, such as spatial disparity and visual-tactile synchrony

5.4 Neurophysiological Evidence of Body Representation

5.4.1 Neuronal Basis of Multisensory Integration of Body-Related Signals

This final section focuses on the neurophysiological and neuroimaging evidence related to multisensory body representation. Because numerous behavioral findings have demonstrated that body representation depends on the integration of body-related signals from multiple modalities, most studies exploring the neural basis of body representation have focused on multisensory neurons (Noel et al. 2018; Blanke et al. 2015). Human and non-human primate neurophysiological studies have demonstrated a high degree of overlap between the brain regions involved in multisensory integration and those involved in body representation (Blanke et al. 2015). Using single-unit recordings in animals, numerous kinds of multisensory neurons have been identified in various brain regions, including the parietal association cortex, premotor cortex, insula, and superior colliculus (Mountcastle et al. 1975; Avillac et al. 2007; Stein and Stanford 2008). In particular, body-related multisensory neurons are predominantly located in the posterior parietal cortex, including the ventral intraparietal (VIP) area, area 5, and area 7, and in the premotor cortex (Graziano et al. 1994, 1997, 1999; Avillac et al. 2005; Fogassi et al. 1996; Leinonen 1980).

Multisensory integration of body-related signals at the single-neuron level has been studied most extensively in bimodal neurons that respond to somatosensory stimuli and to visual (or auditory) stimuli near the body. In most of these neurons, the visual (auditory) receptive field is localized on a given body part (Graziano et al. 1994, 1999; Avillac et al. 2005). One important multisensory property of these neurons is that the neural response to a tactile stimulus is modulated by visual (auditory) stimuli presented within their receptive field. Similar to other multisensory neurons (e.g., visual-vestibular multisensory neurons), this modulation can be super-additive (increased firing rate) or sub-additive (decreased firing rate) compared to the arithmetic sum of the responses in the unimodal conditions (Avillac et al. 2007). In analogy to peripersonal space in human psychological studies, the size of the visual (auditory) receptive field is proportional to that of the tactile receptive field on a given body part. The visual receptive field of monkey premotor neurons typically extends about 40 cm from the upper limb (Graziano et al. 1994, 1999), and for neurons representing other body parts it ranges from roughly 5 cm to 1 m, depending on the size of the body part (Avillac et al. 2005; Schlack et al. 2005; Jiang et al. 2013). Furthermore, the visual (auditory) receptive field of these multisensory neurons remains anchored to the corresponding body part as the body moves (Fogassi et al. 1996; Graziano et al. 1997; Graziano 1999, 2000). For example, Graziano and colleagues found that the visual receptive field of monkey premotor neurons shifts to the new spatial location where the limb is placed (Graziano 1999). Such anchored receptive fields are also observed in other body-part-centered multisensory neurons, such as face-centered neurons in VIP (Avillac et al. 2005) and trunk-centered neurons in area 7 (Iriki et al. 1996). These multisensory neurons with anchored visual (auditory) receptive fields thus encode body-related signals in body-part-centered reference frames. Taken together, the visual (auditory) receptive fields of these multisensory neurons are conceived as the neural basis of the peripersonal space described in human psychological studies, constituting an interface for body-environment interaction (Noel et al. 2018).

Human neuroimaging studies have consistently highlighted the premotor and posterior parietal cortex in integrating body-related signals (Makin et al. 2008). Similar to the single-neuron responses in non-human primates, super-additive and sub-additive responses are observed in the intraparietal sulcus (IPS) and ventral premotor cortex when visual stimuli are presented within peripersonal space (Gentile et al. 2011). For example, human fMRI studies found enhanced BOLD responses in IPS when participants were presented with a visual object near the body (Makin et al. 2007; Sereno and Huang 2006). Using a BOLD adaptation paradigm, Brozzoli and colleagues examined neural activation when consecutive visual stimuli were presented in the peripersonal space around the participants’ hand. Because the adaptation effect (reduced neural activity in response to repeatedly presented stimuli) is well established in electrophysiology and fMRI studies as reflecting the selectivity of neural responses to a specific stimulus, it is a useful method for identifying neural populations that respond to visual stimuli within PPS. The adaptation effect was observed in IPS, the inferior parietal lobe, and premotor cortex when the visual object was presented near the right hand, but not when it was presented far from the hand (Brozzoli et al. 2011), indicating that these brain regions selectively respond to inputs near the body. Taken together, both human and animal neurophysiological findings point to two key regions, the posterior parietal cortex and the premotor cortex, for the multisensory processing of bodily signals. Single-neuron and population activities in these areas offer a common neural basis in humans and non-human primates for multisensory integration in peripersonal space.

5.4.2 Neuronal Representation of One’s Own Body

Although neurophysiological studies have richly characterized multisensory processing within peripersonal space, the most direct evidence about the neural basis of the subjective recognition of one’s own body comes from bodily illusion studies in humans. Using synchronous visual-tactile stimulation of a viewed hand and the participant’s hidden actual hand inside an fMRI scanner, Ehrsson and colleagues examined brain activation while participants experienced the rubber hand illusion (RHI). Activation in IPS, ventral premotor cortex, the cerebellum (Ehrsson et al. 2004, 2005), and the right posterior insula (Tsakiris et al. 2007) was closely correlated with changes in limb ownership during the RHI. These neural correlates are further supported by the threat paradigm commonly used in behavioral tests of body ownership. Lloyd and colleagues found increased activity in the posterior parietal cortex and supplementary motor cortex when a threatening visual object approached the fake limb during the RHI (Lloyd et al. 2006), as if the participant’s own body were being threatened.

Furthermore, several studies have examined the shift of peripersonal space toward the viewed hand after the rubber hand illusion has been induced. As expected, the peripersonal space around the limb is remapped onto the fake limb under the RHI: the adaptation effect in IPS and premotor cortex was observed when visual stimuli were repeatedly presented near the fake limb, but not when a contralateral fake limb was presented (Brozzoli et al. 2012).

The most direct evidence at the single-neuron level for the relationship between body-related multisensory integration and body representation comes from work by Graziano and colleagues in monkeys. Using single-unit recordings, they examined neuronal responses in monkey area 5 while the animal experienced a visual-somatosensory condition similar to the human rubber hand illusion (Graziano 2000). The multisensory neurons showed spatial selectivity to both proprioceptive information (the monkey’s veridical limb) and visual information (a fake limb) about limb position. For instance, if a neuron fired more when the monkey’s veridical limb (proprioceptive input) was positioned on the left side versus the right side, its firing rate also increased when the fake visual limb was presented on the left side and decreased when it was shown on the right side. Intriguingly, this tuning modulation by visual information depended on the visual features of the fake limb: non-body objects (e.g., a piece of wood) and anatomically impossible limb positions did not modulate the neural response. Furthermore, electrical stimulation of these body-related multisensory neurons in non-human primates elicits defensive-like actions (Cooke and Graziano 2003; Graziano and Cooke 2006). These results are comparable to the RHI findings in human behavioral and imaging studies.

5.4.3 Electrophysiological Evidence of Causal Inference in Body Representation

In this final subsection, we consider the Bayesian theory of multisensory integration as a framework for explaining the neural implementation of the subjective experience of body representation and self-other discrimination. In a recent study, the authors established a moving rubber hand illusion paradigm based on reaching movements in a virtual reality setup and recorded single-neuron responses in the premotor cortex of awake, behaving monkeys. By introducing various disparities between the monkey’s real limb and a visual fake limb, the authors could measure proprioceptive drift, a probe of illusion strength under dynamic multisensory conditions. The behavioral results showed that the integration of visual and proprioceptive limb information was well explained by Bayesian causal inference, consistent with the human behavioral results (Fang et al. 2019).

More importantly, under the Bayesian causal inference framework, ownership of the visual fake limb is determined by the posterior probability that the sensory signals come from a common source. That is, the central nervous system should integrate the signals when the visual and proprioceptive limbs are aligned and segregate them when the disparity is too large. To examine the neural correlates of this posterior probability, the authors included two control conditions that established the ideal neuronal responses for integration and segregation: in the integration control condition, the visual and proprioceptive limb positions were always perfectly aligned, whereas in the segregation control condition the visual limb was not presented at all. Thus, when the disparity was systematically varied across trials, the representation of the common-source probability at the single-neuron level could be assessed by how similar a neuron’s response was to these ideal responses. The single-neuron analysis revealed a considerable population in the premotor cortex whose activity was associated with the posterior probability of a common source predicted by the Bayesian causal inference model. The dynamics of this posterior probability across trials could also be decoded from population activity. Moreover, the probability of integration at both the behavioral and neural levels decreased when the visual feedback was replaced by a piece of wood (Fang et al. 2019).

The neural mechanism of causal inference has been further explored in a recent artificial neural network study in which the authors trained a network to solve causal inference for motion estimation (Rideaux et al. 2021). It had been suggested that multisensory neurons with congruent Gaussian tuning may account for multisensory integration, whereas those with incongruent (opposite) tuning may account for segregation (French and DeAngelis 2020). In line with this prediction, the trained network developed multisensory units with both congruent and opposite tuning, and both populations contributed to its multisensory behavior. This simulation thus showed that the causal inference problem, deciding whether signals should be integrated or segregated, can be solved by balancing the activities of these two populations.