Introduction

The brain must constantly combine different types of information coming from the environment and, at the same time, track and anticipate changes in these signals, which simultaneously stimulate different sensory modalities. Several previous studies have shown that the integration of multisensory signals relies not only on the anatomical convergence of sensory-specific cortices onto multisensory brain areas, but also on reciprocal influences between cortical regions traditionally considered sensory specific (for a review see Macaluso 2006). Multisensory signals, mainly visual (Wassenhove et al. 2005), auditory (Morís Fernández et al. 2017), and somatosensory (Foxe et al. 2000) inputs, provide complementary estimates of the same event and allow the formation of unified percepts. Studies on multisensory integration (MSI) help us understand the organization of sensory and cognitive systems and better explain pathological (e.g., Feldman et al. 2018; Williams et al. 2010) and physiological conditions (e.g., Ramkhalawansingh et al. 2016; Stephen et al. 2010).

To investigate the neural basis of MSI, neuroimaging (e.g., Beauchamp et al. 2004, 2005; Laurienti et al. 2003; Calvert et al. 1997) and electrophysiological (e.g., Santangelo et al. 2008; Talsma et al. 2010; Teder-Sälejärvi et al. 2005) methods have often been used. MSI is known to be mediated by several brain networks involving the superior temporal and intraparietal sulci (Seltzer and Pandya 1989), the posterior insula (Mesulam and Mufson 1982), and the temporopolar cortex (Jones and Powell 1970). Recent functional magnetic resonance imaging (fMRI) studies also showed the contribution of the middle temporal gyrus, thalamus, insula, and inferior frontal gyrus (e.g., Li et al. 2020; Scheliga et al. 2022). Among the most notable challenges in MSI studies is the need to control stimulus timing, spatial location, and sensory quality and quantity during stimulus delivery (Spence et al. 2001). For this purpose, one advantage of the electrophysiological event-related potential (ERP) technique is its high temporal resolution, which allows the investigation of different processing stages, from the (preparatory) pre-stimulus to the (reactive) post-stimulus components. For example, fMRI has high spatial but low temporal resolution, so studies that used fMRI alone cannot tell at which processing stage the involved brain areas, such as the prefrontal cortex (PFC), contribute to MSI. Post-stimulus ERP studies provided solid evidence that MSI affects both early and late stages of stimulus processing in sensory cortices and that attention can modulate these effects, showing that top-down attention can influence MSI processes (e.g., Talsma and Woldorff 2005). There is also some evidence that top-down processing can facilitate the integration of multisensory inputs and lead to a spread of attention across sensory modalities (e.g., Talsma et al. 2010).
Top-down modulation of modality-specific inputs can induce critical changes in the MSI process (Choi et al. 2018). Several ERP studies (e.g., Di Russo et al. 2019) have further shown that top-down cognitive control can be detected in the PFC during the preparation/anticipation stage of processing, even before stimulus onset. However, to date, only one study has addressed anticipatory MSI: modality-specific pre-stimulus ERP components (i.e., the auditory positivity, aP, and the visual negativity, vN, originating from auditory and visual areas, respectively) started earlier and were larger during multimodal stimulation than the sum of the preparatory ERPs elicited by the unimodal stimulations (Fiorini et al. 2021). This anticipatory MSI also affected stimulus processing, boosting the magnitude of early post-stimulus ERP components. Although it paved the way for the exploration of anticipatory MSI, Fiorini et al. (2021) focused only on sensory preparation during passive stimulation (i.e., no task was administered, and participants only passively watched or listened to auditory, visual, or simultaneous audiovisual stimuli). To date, no information is available on the effects of multimodal integration on the motor and cognitive preparation processes that occur in cognitive tasks requiring a motor response. These non-sensory anticipatory processes are associated with two specific ERP components: the Bereitschaftspotential (BP) and the prefrontal negativity (pN), reflecting, respectively, the progressive excitability of premotor areas and top-down attentional control in the PFC for the upcoming stimuli (Berchicci et al. 2012; Di Russo et al. 2016; Ragazzoni et al. 2019; Sulpizio et al. 2017; for a review see Di Russo et al. 2017). In addition, it has recently been shown that the BP and pN are also present in unimodal visual, auditory, and somatosensory discriminative response tasks (DRT; Bianco et al. 2020a, b).
These components usually appear as negative slow waves, although in auditory tasks they can be shifted toward positive polarity by the concurrent sensory aP.

Given that the ERP literature on MSI is largely limited to post-stimulus processing, and that the only study of MSI on pre-stimulus ERPs (Fiorini et al. 2021) focused on passive stimulation and sensory brain areas, in this study we aimed to verify the presence of anticipatory MSI in non-sensory brain areas during the preparation phase of a cognitive task and to test its possible correlation with task performance. To this end, pre-stimulus preparatory activity in premotor (the BP) and prefrontal (the pN) areas was recorded in auditory (A) and visual (V) sensory–motor tasks and compared with that obtained in an audiovisual (AV) task.

To quantify the AV effect on perception in sensory brain areas, previous studies compared the AV condition with the sum of the A and V unisensory conditions (AV – (A + V)) to evaluate “superadditive” enhancements, in which the multisensory response (AV) exceeds the sum of the responses evoked by each modality alone (A + V; e.g., Foxe et al. 2000; Molholm et al. 2002; Teder-Sälejärvi et al. 2002). That is, the activity evoked in the A or V task alone is not sufficient to explain the increased activity during the AV task. Hence, to look for an AV effect in non-sensory areas before stimulus presentation, a different comparison is needed: non-sensory preparatory brain activity is not affected by the sensory modality of the upcoming stimulus (e.g., Bianco et al. 2020b), so summing the A and V conditions would count this shared, modality-independent activity twice. In addition, some studies have shown that intra-modal integration could be more accurately described as an averaging of responses to individual component stimuli. This averaging could function as a gain-control mechanism, preventing response saturation and allowing neurons to continue to use response speed to signal information along a certain feature dimension (Alvarado et al. 2007; Pluta et al. 2011). For these reasons, we operationally defined the AV effect on preparatory non-sensory areas as activity larger than the average of the unimodal ones (AV – (A + V)/2). If the brain is capable of anticipatory MSI, the BP and/or the pN elicited in the AV condition should be more negative than the average of the two unimodal conditions.
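The reason the averaging criterion suits modality-independent preparatory activity can be made explicit with a worked example. Assume, for illustration only, that a non-sensory preparatory component has the same amplitude P in every task regardless of modality, and that a genuine multisensory effect adds a term Δ in the AV task only:

```latex
\begin{aligned}
A &= P, \qquad V = P, \qquad AV = P + \Delta \\
AV - (A + V) &= (P + \Delta) - 2P = \Delta - P \\
AV - \frac{A + V}{2} &= (P + \Delta) - P = \Delta
\end{aligned}
```

Under these assumptions, the additive contrast is offset by −P even when Δ = 0, whereas the averaging contrast isolates Δ. For a negative-going component such as the pN (P < 0), a multisensory enhancement corresponds to Δ < 0, that is, AV more negative than (A + V)/2.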

In addition, to test the possible contribution of preparatory AV integration to task performance, we correlated the pN and BP amplitudes in the AV task with accuracy and response time, respectively. The early post-stimulus perceptual components, the auditory N1 and the visual P1, were also analyzed to confirm the AV effect, as done in previous studies (e.g., Giard and Peronnet 1999).

We hypothesized that if MSI improves behavioral performance in a cognitive sensorimotor task, the motor (BP) and cognitive (pN) preparatory ERP components should also be affected by MSI. Specifically, an improvement in response accuracy should be associated with a pN increase, and an improvement in response time with a BP increase.

Materials and methods

Participants

The a priori power analysis, performed using G*Power 3.1.9.2 (Faul et al. 2007), showed that a minimum of 15 participants was required to detect an effect size (Cohen’s dz) of 0.862 with a power (1 − β) of 0.85 and an alpha level of 0.05 in a two-tailed t test for dependent samples (matched pairs). The dz value was the average of the dz values reported in the only previous study addressing MSI in preparatory ERP activity (Fiorini et al. 2021).
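The sample-size result can be checked approximately outside G*Power. The sketch below is a minimal, standard-library Python illustration that assumes the usual noncentral-t formulation of the matched-pairs t test (df = n − 1, noncentrality δ = dz·√n) and evaluates the noncentral t CDF by numerical integration over a chi-distributed denominator. The function names are hypothetical, and this is not G*Power's own code.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_cdf(t, df, nc=0.0, steps=1000):
    """CDF of the (noncentral) t distribution.

    Uses P(T <= t) = E[Phi(t * U / sqrt(df) - nc)], where U follows a chi
    distribution with df degrees of freedom; the expectation is evaluated
    with Simpson's rule on [0, sqrt(df) + 12].
    """
    upper = math.sqrt(df) + 12.0  # chi density is negligible beyond this point
    h = upper / steps
    scale = t / math.sqrt(df)
    log_norm = (df / 2.0 - 1.0) * math.log(2.0) + math.lgamma(df / 2.0)

    def integrand(u):
        if u == 0.0:
            return 0.0  # chi density vanishes at u = 0 when df > 1
        log_density = (df - 1.0) * math.log(u) - 0.5 * u * u - log_norm
        return math.exp(log_density) * normal_cdf(scale * u - nc)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def t_critical(alpha, df):
    """Two-tailed critical value, found by bisection on the central t CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < 1.0 - alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def paired_t_power(n, dz, alpha=0.05):
    """Power of a two-tailed matched-pairs t test with n pairs and effect size dz."""
    df = n - 1
    nc = dz * math.sqrt(n)  # noncentrality parameter
    crit = t_critical(alpha, df)
    # Rejection occurs in either tail under the noncentral alternative.
    return (1.0 - t_cdf(crit, df, nc)) + t_cdf(-crit, df, nc)


def minimum_sample_size(dz, power=0.85, alpha=0.05, n_max=200):
    """Smallest n (starting at 3) whose power reaches the target."""
    for n in range(3, n_max + 1):
        if paired_t_power(n, dz, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")


n_required = minimum_sample_size(0.862)
print(n_required)
```

With dz = 0.862, power 0.85, and α = 0.05, the search should stop at n = 15, in line with the G*Power result above; individual power values may differ from G*Power's in the last decimals because of the numerical integration.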

Sixteen adults volunteered for this study (8 females, 8 males; mean age 33 ± 7.1 years, mean ± SD). The inclusion criterion was the absence of reported neurological or psychological disorders. All participants were naive about the aim of the study and reported normal or corrected-to-normal vision and hearing. All were right-handed, as assessed with the Italian version of the Edinburgh Handedness Inventory (Salmaso and Longoni 1985). Written informed consent was obtained from all participants according to the Declaration of Helsinki, and the project was approved by the Santa Lucia Foundation Ethical Committee.

Procedure

Participants were seated in front of a screen placed 114 cm from their eyes, with their arms on the armrests, the left arm relaxed and the right arm palm down on a push-button board so that the fingers could move freely on it. Throughout each run, a fixation point, a yellow circle (0.15° of visual angle in diameter) on a dark background, was displayed in the center of the computer screen. Participants were instructed to remain relaxed, not to cross their legs, and to keep their gaze on the fixation point during the run. In each run, 80 stimuli were presented in random order with a variable interstimulus interval (ISI) of 1–2 s (0.01 s steps), for a total duration of approximately 2.5 min per run. This ISI range has been shown to avoid activity overlap between adjacent trials (Quinzi et al. 2019). Participants could take a short break after each run if needed.

Participants performed three tasks in separate blocks: an auditory (A), a visual (V), and an audiovisual (AV) sensory–motor discrimination task, as depicted in Fig. 1. The order of the tasks was randomized between participants. In each task, four different stimuli were presented for 250 ms with equal probability: two were designated as targets and the other two as non-targets. Participants were asked to discriminate the stimuli accurately and to respond as quickly as possible to targets, withholding the response to non-targets. The order of presentation was randomized within blocks and among participants. Each task comprised 8 runs, for a total of 640 trials per task (i.e., 320 target and 320 non-target). Before the experiment, participants were familiarized with the task through a few practice trials. The experimental session lasted about 60 min. To ensure comparability with previous studies using similar auditory and visual unimodal stimulation (Bianco et al. 2020a, b), we used the same auditory and visual stimuli. The stimuli were chosen for their abstract and neutral character to prevent any emotional or attentional bias in sensory processing.
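The trial structure described above (80 trials per run, four equiprobable stimuli, an ISI of 1–2 s in 0.01 s steps, 8 runs per task) can be sketched as a sequence generator. This sketch assumes exactly 20 presentations of each stimulus per run (equal probability is stated, exact balancing is not), uses hypothetical stimulus labels, and does not model whether the ISI was measured onset-to-onset or offset-to-onset.

```python
import random

# Hypothetical labels for the two targets and two non-targets of one task.
STIMULI = ["target_1", "target_2", "nontarget_1", "nontarget_2"]


def make_run(rng, trials_per_run=80):
    """One run: each stimulus equally often, shuffled, each with a random ISI."""
    per_stimulus = trials_per_run // len(STIMULI)
    sequence = STIMULI * per_stimulus
    rng.shuffle(sequence)
    # ISI drawn uniformly from 1.00, 1.01, ..., 2.00 s.
    isis = [rng.randint(100, 200) / 100.0 for _ in sequence]
    return list(zip(sequence, isis))


def make_task(seed=0, runs=8):
    """All runs of one task, reproducible through the seed."""
    rng = random.Random(seed)
    return [make_run(rng) for _ in range(runs)]


task = make_task(seed=1)
all_trials = [trial for run in task for trial in run]
n_targets = sum(label.startswith("target") for label, _ in all_trials)
print(len(all_trials), n_targets)  # 640 320
```

Balancing within each run guarantees the 320/320 target/non-target split per task; drawing stimuli independently on each trial would only approximate it.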

Fig. 1

Representation of all the stimuli used in the auditory (A), visual (V), and audiovisual (AV) conditions. In the AV condition, auditory and visual stimuli were presented simultaneously

In the A task, stimuli were presented binaurally through two 20 W loudspeakers placed symmetrically on each side of the computer screen, with an intensity of 60 dB SPL measured at the head. Stimuli consisted of four complex tones with the following characteristics: 10 ms rise and fall times, 16 harmonic components, 44,100 Hz sample rate, 16-bit depth, stereo, and WAVE audio file format. Sounds were synthesized with the open-source Praat software (www.fon.hum.uva.nl/praat). The target stimuli had frequencies of 740 Hz (F#5) and 1046 Hz (C6); the non-target stimuli had frequencies of 2093 Hz (C7) and 2960 Hz (F#7). In the V task, stimuli were presented on a 24-inch CRT monitor and consisted of four square figures made of black vertical and horizontal bars, displayed on a dark gray background 4° on each side. Two of them were defined as targets and the other two as non-targets (see Fig. 1). In the AV task, the A and V stimuli described above were presented simultaneously, as shown in Fig. 1. Presentation 20.2 software (Neurobehavioral Systems, Inc., Berkeley, CA, USA) was used for stimulation and EEG synchronization.
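The tones were synthesized in Praat. Purely to illustrate what the listed parameters mean, the following Python sketch builds a 250 ms harmonic complex at 44,100 Hz with 10 ms onset and offset ramps. Equal-amplitude, zero-phase harmonics and linear ramps are assumptions not stated in the text, and the sketch skips harmonics at or above the Nyquist frequency (22,050 Hz) to avoid aliasing; at this sample rate, for instance, only the first seven harmonics of the 2960 Hz tone fall below that limit.

```python
import math

SAMPLE_RATE = 44100  # Hz
DURATION = 0.25      # s, stimulus duration
RAMP = 0.010         # s, rise and fall time
N_HARMONICS = 16


def harmonic_frequencies(f0, sample_rate=SAMPLE_RATE, n_harmonics=N_HARMONICS):
    """Harmonics k * f0 (k = 1..n_harmonics) that lie below the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    return [k * f0 for k in range(1, n_harmonics + 1) if k * f0 < nyquist]


def complex_tone(f0, sample_rate=SAMPLE_RATE, duration=DURATION, ramp=RAMP):
    """Peak-normalized harmonic complex with linear onset and offset ramps.

    Assumptions (not stated in the text): equal-amplitude, zero-phase
    harmonics and linear ramps.
    """
    freqs = harmonic_frequencies(f0, sample_rate)
    n_samples = int(round(duration * sample_rate))
    n_ramp = int(round(ramp * sample_rate))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(math.sin(2.0 * math.pi * f * t) for f in freqs)
        # Gain rises linearly over the first n_ramp samples and falls over the last n_ramp.
        gain = min(1.0, i / n_ramp, (n_samples - 1 - i) / n_ramp)
        samples.append(value * gain)
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


target_tone = complex_tone(740.0)  # F#5, one of the two targets
```

To obtain a WAVE file, the samples could be scaled to 16-bit integers and written with Python's standard `wave` module.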

Behavioral data

Individual response speed was assessed as the median response time (RT) of correct responses to targets; at the group level, the mean of the individual medians was used. RTs shorter than 100 ms or longer than 1000 ms were excluded from further analysis (0.21% of trials). Accuracy was assessed as the percentage of omission errors (missed responses to targets) and of false alarms (responses to non-targets).
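The scoring rules above can be expressed as a short function. The sketch assumes a hypothetical per-trial record with `is_target`, `responded`, and `rt` fields, and treats responses outside the 100–1000 ms window as excluded from the RT median rather than counted as omissions, since the text does not say how they were classified.

```python
from statistics import mean, median


def score_participant(trials, rt_min=100, rt_max=1000):
    """Score one participant in one condition.

    Each trial is a dict with hypothetical keys 'is_target' (bool),
    'responded' (bool), and 'rt' (ms, or None without a response).
    Returns (median RT of correct target responses, % omissions, % false alarms).
    """
    targets = [t for t in trials if t["is_target"]]
    nontargets = [t for t in trials if not t["is_target"]]
    # Only target responses inside the RT window enter the median.
    rts = [t["rt"] for t in targets
           if t["responded"] and rt_min <= t["rt"] <= rt_max]
    omissions = sum(not t["responded"] for t in targets)
    false_alarms = sum(t["responded"] for t in nontargets)
    return (median(rts),
            100.0 * omissions / len(targets),
            100.0 * false_alarms / len(nontargets))


def group_mean_rt(individual_medians):
    """Group-level RT: the mean of the participants' median RTs."""
    return mean(individual_medians)


trials = [
    {"is_target": True, "responded": True, "rt": 420},
    {"is_target": True, "responded": True, "rt": 380},
    {"is_target": True, "responded": True, "rt": 1200},   # outside the window
    {"is_target": True, "responded": False, "rt": None},  # omission
    {"is_target": False, "responded": True, "rt": 350},   # false alarm
    {"is_target": False, "responded": False, "rt": None},
]
print(score_participant(trials))  # (400.0, 25.0, 50.0)
```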

EEG recording and analysis

Participants were tested in a sound-attenuated room using a 64-channel EEG system (Brainamp™ amplifiers) with active electrodes (Acticap™) and software (Recorder 1.2 and Analyzer 2.2), all by Brain Products GmbH (Munich, Germany). Electrodes were mounted according to the International 10-10 system, referenced online to the left mastoid (M1), and re-referenced offline to the average of M1 and M2. Horizontal and vertical electrooculograms (HEOG and VEOG) were monitored with bipolar recordings, with electrodes at the left and right external canthi (HEOG) and below and above the left eye (VEOG). Electrode impedances were kept below 5 kΩ. The EEG was amplified, digitized at 250 Hz, bandpass filtered (0.01–80 Hz, with a 50 Hz notch filter), and stored for further analysis. Offline, the EEG was further filtered (0.1–40 Hz bandpass, second-order zero-phase-shift IIR Butterworth filter), and ocular artifacts were reduced using the independent component analysis (ICA) available in the Analyzer software. As done by Fiorini et al. (2021), the signal was then segmented into 1350 ms epochs, from −1100 ms before to 250 ms after stimulus onset. Before averaging, automatic artifact rejection discarded epochs still contaminated by signals exceeding ±70 μV. Rejected trials were 3.5%, 3.3%, and 3.6% in the A, V, and AV conditions, respectively. Pre- and post-stimulus ERP components were measured with a −1100/−900 ms and a −100/0 ms baseline, respectively. Target and non-target trials were collapsed because this factor does not affect the studied components (e.g., Di Russo et al. 2009) and to keep the same number of trials for the pre- and post-stimulus ERP analyses.
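The epoching, baseline, and rejection steps can be sketched with NumPy. This is a minimal illustration, not the Analyzer pipeline: filtering, re-referencing, and ICA are omitted, the data layout (channels × samples, in µV) and function names are assumptions, and the ±70 µV threshold is applied here to baseline-corrected epochs, which the text does not specify. At 250 Hz one sample lasts 4 ms, so the 250 ms upper edge falls between samples; the sketch keeps samples from −1100 to 248 ms.

```python
import numpy as np

FS = 250  # Hz, sampling rate of the recording


def epoch_offsets(tmin_ms, tmax_ms, fs=FS):
    """Sample offsets (relative to stimulus onset) covering [tmin_ms, tmax_ms]."""
    step_ms = 1000.0 / fs
    first = int(np.ceil(tmin_ms / step_ms))
    last = int(np.floor(tmax_ms / step_ms))
    return np.arange(first, last + 1)


def extract_epochs(data, onsets, tmin_ms=-1100, tmax_ms=250, fs=FS):
    """Cut continuous data (channels x samples) into trials x channels x samples."""
    offsets = epoch_offsets(tmin_ms, tmax_ms, fs)
    return np.stack([data[:, onset + offsets] for onset in onsets])


def baseline_correct(epochs, times_ms, window_ms):
    """Subtract each channel's mean over the baseline window, epoch by epoch."""
    mask = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return epochs - epochs[:, :, mask].mean(axis=2, keepdims=True)


def reject_artifacts(epochs, threshold_uv=70.0):
    """Keep epochs in which no channel exceeds +/- threshold_uv at any sample."""
    keep = np.all(np.abs(epochs) <= threshold_uv, axis=(1, 2))
    return epochs[keep], keep


# Usage on synthetic data (64 channels, values in microvolts).
rng = np.random.default_rng(0)
data = rng.normal(0.0, 10.0, size=(64, 10_000))
data[5, 3000] = 150.0  # an artifact inside the second epoch
onsets = [1000, 3000, 5000]
times_ms = epoch_offsets(-1100, 250) * (1000.0 / FS)
epochs = extract_epochs(data, onsets)
pre = baseline_correct(epochs, times_ms, (-1100, -900))  # pre-stimulus baseline
post = baseline_correct(epochs, times_ms, (-100, 0))     # post-stimulus baseline
clean, keep = reject_artifacts(pre)
```

With the artifact injected into the second epoch, `keep` should flag only that epoch for rejection.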

In addition to the three recorded conditions, a fourth condition, labeled (A + V)/2, was created by averaging the A and V conditions at the individual level. This condition was statistically compared with the AV condition to test the multisensory effect on pre-stimulus ERP components. A fifth condition, labeled A + V, was obtained by summing the A and V conditions at the individual level. This condition was statistically compared with the AV condition to test the multisensory effect on post-stimulus ERP components.
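Because the synthetic conditions are built per participant, the difference waves can be computed before any group statistics. The following NumPy sketch assumes a hypothetical (participants × channels × samples) layout for the averaged ERPs; the function names are illustrative.

```python
import numpy as np


def synthetic_conditions(erp_a, erp_v):
    """Build the (A + V)/2 and A + V comparison waveforms for each participant.

    erp_a, erp_v: arrays shaped (participants, channels, samples) holding each
    participant's averaged ERP in the A and V tasks (hypothetical layout).
    """
    avg_av = (erp_a + erp_v) / 2.0  # compared with AV before stimulus onset
    sum_av = erp_a + erp_v          # compared with AV after stimulus onset
    return avg_av, sum_av


def multisensory_differences(erp_av, erp_a, erp_v):
    """Per-participant difference waves AV - (A + V)/2 and AV - (A + V)."""
    avg_av, sum_av = synthetic_conditions(erp_a, erp_v)
    return erp_av - avg_av, erp_av - sum_av


# Toy example: 2 participants, 1 channel, 3 samples (values in microvolts).
erp_a = np.full((2, 1, 3), -2.0)
erp_v = np.full((2, 1, 3), -4.0)
erp_av = np.full((2, 1, 3), -4.0)
pre_effect, post_effect = multisensory_differences(erp_av, erp_a, erp_v)
print(pre_effect[0, 0, 0], post_effect[0, 0, 0])  # -1.0 2.0
```

For a negative-going component such as the pN, a negative AV − (A + V)/2 value means that the AV response is more negative than the unimodal average, the pattern predicted in the Introduction.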

The intervals and electrodes for statistical analysis were defined a priori using the “collapsed localizer” method (Luck and Gaspelin 2017), in which a localizer ERP is obtained by collapsing (averaging) all experimental conditions. To identify the intervals of analysis, the global field power (GFP) was calculated from −900 to 200 ms, and the intervals in which the GFP exceeded 80% of its maximum value were selected. This GFP approach selected a pre-stimulus interval from −500 to 0 ms and a post-stimulus interval from 100 to 132 ms, in which the mean amplitude was calculated for statistical purposes. Electrodes with an amplitude larger than 80% of the maximum value in the intervals selected by the collapsed localizer were grouped into spatial pools for statistical analysis. The pN was measured in a pool containing Fp1, Fpz, Fp2, and AFz (prefrontal pool). The BP was measured in a pool containing Cz, CPz, and Pz (centro-parietal pool). The auditory N1 was measured in a pool containing AFz, F1, Fz, and F2 (frontal pool). The visual P1 was measured in a pool containing PO9, PO7, O1, O2, PO8, and PO10 (parieto-occipital pool).
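The collapsed-localizer selection can be sketched as follows. The sketch assumes the common definition of GFP as the standard deviation of the potential across electrodes at each sample, uses the absolute mean amplitude for electrode selection (so that negative components such as the pN qualify), and takes the search range as an argument, because the text does not state whether one GFP maximum over −900 to 200 ms or separate pre- and post-stimulus maxima were used. All names and data are hypothetical.

```python
import numpy as np


def collapsed_localizer(condition_erps):
    """Average the ERPs (channels x samples) of all conditions into one localizer ERP."""
    return np.mean(np.stack(condition_erps), axis=0)


def global_field_power(erp):
    """GFP: standard deviation across electrodes at each sample (channels x samples)."""
    return erp.std(axis=0)


def select_window(erp, times_ms, search_ms, fraction=0.8):
    """Samples within search_ms where GFP exceeds fraction * its maximum there."""
    in_search = (times_ms >= search_ms[0]) & (times_ms <= search_ms[1])
    gfp = global_field_power(erp)
    return in_search & (gfp > fraction * gfp[in_search].max())


def select_electrodes(erp, window, channel_names, fraction=0.8):
    """Electrodes whose |mean amplitude| in the window exceeds fraction * the maximum."""
    amplitude = np.abs(erp[:, window].mean(axis=1))
    return [name for name, amp in zip(channel_names, amplitude)
            if amp > fraction * amplitude.max()]


# Toy localizer: 3 electrodes x 5 samples, built from two identical "conditions".
times_ms = np.array([-400.0, -200.0, 0.0, 100.0, 200.0])
erp = np.array([[0.0, 1.0, 4.0, 1.0, 0.0],
                [0.0, -1.0, -4.0, -1.0, 0.0],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
localizer = collapsed_localizer([erp, erp])
window = select_window(localizer, times_ms, (-400.0, 200.0))
print(window.tolist())  # [False, False, True, False, False]
print(select_electrodes(localizer, window, ["Fp1", "Fpz", "Cz"]))  # ['Fp1', 'Fpz']
```

Because the standard deviation across electrodes removes their common mean, this GFP is unaffected by the choice of reference.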

Statistical analysis

For behavioral data, statistical analysis was performed using a repeated-measures analysis of variance (ANOVA) with the within-subject factor “Condition” (A, V, and AV). Partial eta squared (ηp²) was reported as a measure of effect size. To reduce the likelihood of Type I errors, Bonferroni post hoc tests were used (Bonferroni-adjusted p values are the raw p values multiplied by the number of comparisons, which is equivalent to dividing the alpha level by that number). The alpha level was set to 0.05. For pre-stimulus activity, two-tailed t tests for dependent samples compared the AV and (A + V)/2 conditions. For post-stimulus ERP components, two-tailed t tests for dependent samples compared the AV and A + V conditions. We also performed topographical t tests at every electrode to characterize the scalp distribution of the observed differences; for these t tests, the Bonferroni correction was used to compensate for multiple comparisons, whereas the alpha threshold for the t tests on the electrode pools was set to 0.05. Cohen’s d (d) was reported as a measure of effect size. Correlations between ERP components and behavioral performance were assessed using the Pearson product-moment correlation coefficient. The significance of each correlation was tested by ANOVAs comparing the slope of the linear fit with zero, with Bonferroni-corrected p values. Beforehand, Levene’s and Shapiro–Wilk tests for equality of variance and normality, respectively, showed no violations of homoscedasticity or normality. All statistical analyses were performed with Statistica 12.0 (StatSoft Inc., Tulsa, OK, USA).
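The core test statistics can be written out with Python's standard library. This sketch assumes that the reported Cohen's d is the paired-samples variant dz (mean difference divided by the SD of the differences), which the text does not state explicitly; p values from the t distribution are left to a statistics package. For a simple linear fit, the ANOVA F test of the slope against zero equals the square of the slope t statistic computed here, and the Bonferroni helper multiplies p values by the number of tests, capping them at 1.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic, Cohen's dz, and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean(diffs) / (sd / math.sqrt(n))
    dz = mean(diffs) / sd
    return t, dz, n - 1


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def pearson_r_slope_test(x, y):
    """Pearson r and the t statistic for H0: regression slope = 0 (df = n - 2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))  # F for the slope equals t ** 2
    return r, t, n - 2


# Hypothetical pool amplitudes (microvolts): AV vs (A + V)/2 for four participants.
t, dz, df = paired_t([-2.0, -3.0, -1.5, -2.5], [-1.0, -2.5, -1.0, -1.5])
print(round(t, 3), round(dz, 3), df)  # -5.196 -2.598 3
```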

Results

Figure 2 shows the behavioral results. The ANOVA showed a significant effect of Condition for all measures (RT, omissions, and false alarms). Post hoc analysis indicated that, for all measures, values in the AV condition were smaller than in both unimodal conditions, which did not differ from each other. Table 1 reports the statistical data.

Fig. 2

Behavioral results in the three studied conditions. Vertical lines denote 95% confidence intervals. RT, response time; * p < 0.05

Table 1 Statistical data of the behavioral analysis

The left side of Fig. 3 shows the pre-stimulus ERP waveforms (BP and pN) for the A, V, and AV conditions at the prefrontal and centro-parietal pools. On the right side, scalp topographies (top-flat view) for the three conditions in the − 500/0 ms interval are shown.

Fig. 3

On the left, pre-stimulus ERPs in the auditory (A), visual (V), and audiovisual (AV) conditions at the prefrontal pool, showing the pN component, and at the centro-parietal pool, showing the BP. The electrode pools are represented on the small heads. On the right, topographical scalp distribution in the − 500/0 ms interval in the three conditions

In the V condition, the pN was recognizable over the prefrontal scalp as a slowly rising negativity starting at −800 ms. In the A condition, the prefrontal activity had a similar onset latency but was positive, because it was shifted by the sensory auditory positivity (aP), as described by Bianco et al. (2020a). In the AV condition, this prefrontal activity was slightly positive. Over centro-parietal areas, the BP was present in all conditions but was largest for V, smallest for A, and intermediate for AV. The BP onset was at about −620 ms, and its medial centro-parietal scalp distribution was similar among conditions.

Figure 4 compares the pre-stimulus phase of the (A + V)/2 and AV conditions, showing a more negative amplitude for AV than for (A + V)/2 in the prefrontal pool and similar amplitudes in the centro-parietal pool. The t test on the prefrontal pool showed a significant difference between conditions, whereas the t test on the centro-parietal pool was not significant. Table 2 reports the statistical data. The right side of Fig. 4 shows the topographical t tests on all scalp electrodes, indicating that significant differences between AV and (A + V)/2 extend beyond the prefrontal pool, involving ten prefrontal and lateral frontal electrodes. No significant effects were found on the BP component. Significant electrodes are highlighted in white in Fig. 4.

Fig. 4

On the left, pre-stimulus ERPs in the A–V average ((A + V)/2) and the audiovisual (AV) conditions at the prefrontal and centro-parietal pools. The electrode pools are represented on the small heads. On the right, a statistical map is presented, depicting the t test topographical distribution. Electrodes with significant differences are highlighted in white

Table 2 Statistical data of the pre-stimulus ERP analysis

Figure 5 shows the post-stimulus auditory N1 and visual P1 components in the A + V and AV conditions at the frontal and parieto-occipital pools; their scalp topographies in the 100–132 ms interval are displayed on the right. The t tests on both pools were significant, indicating a larger amplitude in the AV than in the A + V condition. Table 3 reports the statistical data.

Fig. 5

On the left, the waveforms show the early post-stimulus ERPs in the auditory plus visual (A + V) and audiovisual (AV) conditions at the frontal pool, with the auditory N1 component, and at the parieto-occipital pool, with the visual P1. The electrode pools are represented on the small heads. On the right, the topographical scalp distribution in the 100–132 ms interval in the two conditions is reported

Table 3 Statistical data of the post-stimulus ERP analysis

Correlation analysis between ERP components and behavioral performance showed that, in the AV condition, the pre-stimulus prefrontal amplitude (pN) correlated with the percentage of false alarms. The post-stimulus frontal (N1) and parieto-occipital (P1) amplitudes correlated with RT in the AV condition. Table 4 reports all correlation data and the corresponding statistical significance (slope different from zero). Figure 6 shows the three significant correlations.

Table 4 Statistical data of the correlation analysis between ERPs and behavioral measures in the AV condition
Fig. 6

Significant correlations in the AV condition between the pN amplitude and false alarm (FA) percentage, between the P1 amplitude and the RT, and between the N1 amplitude and the RT

Discussion

In this study, we investigated the possible effects of multimodal interaction on motor and cognitive preparation in premotor and prefrontal brain areas. The main purpose was to verify the presence of multimodal integration in non-sensory brain areas involved in task preparation. In addition, we aimed to test the possible correlation of these ERP components with task performance.

The behavioral results confirmed previous literature (e.g., Stevenson et al. 2014), showing faster and more accurate responses in the multimodal than in the unimodal tasks. This multimodal advantage was previously explained by enhanced sensory and attentional processing of the presented stimulus in sensory-related areas (Busse et al. 2005). Here, however, we showed that this behavioral advantage might also be explained by enhanced cognitive preparation, occurring before stimulus processing and indexed by the pN component. In line with this interpretation, we found a positive relation between pN amplitude and behavioral performance (a larger pN was associated with higher accuracy), confirming previous studies (for normative data, see Di Russo et al. 2019). Positive correlations were also found between the auditory N1 and visual P1 post-stimulus components and response time, confirming earlier findings (e.g., Mangun and Buck 1998).

The pN has been repeatedly observed in discriminative response tasks and associated with top-down attentional and inhibitory control. Many studies have shown that this component is sensitive to several variables such as age, physical activity, and task complexity (Berchicci et al. 2012, 2015, 2016, 2019; Mussini et al. 2021; Perri et al. 2019).

Given that MSI influences attention in both a stimulus-driven and a top-down fashion (e.g., Matusz and Eimer 2013; Talsma et al. 2010), one possible conclusion is that the increased involvement of the pN in multimodal, compared with unimodal, tasks is related to the higher top-down control required when multiple sensory modalities are involved.

An alternative explanation for this effect relates to the view of the brain as a predictive organ that actively infers and learns the causes of its sensory inputs (Friston 2005). This theory interprets ERP components as precision-weighted prediction errors ascending from lower to higher levels of the cortical hierarchy (e.g., Visalli et al. 2021). According to this theory, the brain tends to minimize uncertainty (surprise) by trying to predict outcomes from the available inputs. Friston’s theory of cortical responses could offer a more parsimonious account of the present results in terms of surprise minimization (Friston 2005), interpreting the increased ERP activity in multimodal tasks as the additional resources the brain must recruit to cope with increased perceptual uncertainty.

The PFC contribution to MSI has been found in non-human primate studies, which show that the interaction of higher-order association areas such as the PFC with the thalamus and sensory cortices is necessary to focus attention on a task in complex environments with multiple sensory stimulations (Zikopoulos and Barbas 2007). In humans, it has also been proposed that the role of the PFC in MSI is to synchronize the neural networks of different sensory channels to allow reciprocal influence (integration). This interpretation is further supported by evidence that prefrontal function is impaired in people with autism spectrum disorders, contributing to the social impairment and atypical behavior observed in these populations (Martínez-Sanchis 2014). This synchronizing role of the PFC is in line with single-neuron data in non-human primates showing that disrupting the temporal synchrony of an audiovisual signal changes the firing rate of PFC neurons, which could underlie the loss of intelligibility that occurs with asynchronous speech (Romanski and Hwang 2012). The present results are in line with this literature and additionally show that PFC synchronization might be an anticipatory process needed when motor interactions with multimodal stimuli are required.

A limitation of the current study is that the AV − (A + V)/2 contrast is used here for the first time in the literature. In the present case, it represented the best choice given the lack of previous literature on pre-stimulus MSI effects in non-sensory brain areas. Future studies could address this issue, for example by combining EEG with fMRI to confirm that the observed effects are truly linked to MSI. It would also be useful to obtain more information about the precise source of the pN and its connectivity with other brain regions, including sensory ones, to achieve a broader view of these phenomena. Another limitation is the use of a blocked rather than an event-related intermixed design, which might limit the strength of the conclusions that can be drawn from the results. However, we have already shown in two studies that pre-stimulus ERP components do not differ between blocked and intermixed designs, confirming that the observed anticipatory multisensory effect can be safely attributed to top-down processes (Fiorini et al. 2021, 2022). Finally, the role of attention in MSI, which could be examined by manipulating the relevance of sensory modality (see Talsma et al. 2010 for a review), deserves further exploration.

In conclusion, the present results confirm previous literature indicating that MSI is a very flexible phenomenon (e.g., Van Atteveldt et al. 2014). MSI might act through both bottom-up and top-down control (Choi et al. 2018; Tang et al. 2016), extends beyond perception and sensory-related brain areas (Li et al. 2020; Scheliga et al. 2022), and is also involved in sensory anticipation (Fiorini et al. 2021). We have extended the current literature by showing that MSI influences PFC processing before stimulus onset, acting during the anticipatory cognitive preparation stage of task execution and allowing more accurate responses.