Introduction

The “Deutsch’s illusion” (or octave illusion, Deutsch 1974, 1981) occurs when a dichotic pair of tones spaced an octave apart is repeatedly presented in alternation, i.e., when the right ear receives the high tone, the left ear receives the low tone and vice versa. Specifically, while each ear receives the same tone sequence, the sequences at the two ears are shifted by one tone offset, so that the high- and low-frequency tones are always presented to different ears (Fig. 1). The most common percept consists of a single tone that alternates in pitch between ears, so that a high tone heard in one ear alternates with a low tone heard in the other ear. Two other observed percepts (but in less than 1% of the subjects) consist of (1) a single tone that alternates between ears, with a pitch which remains nearly constant as the perceived location of the tone changes and (2) a mixed group of complex percepts, which often involve three different pitches (Deutsch 1974, 1981). Since 1974 an extensive series of studies has been carried out aimed at investigating several variables relevant to the acoustic and perceptual parameters that elicit the Deutsch’s illusion (McClurkin and Hall 1981; Deutsch 1983; Efron et al. 1983; Zwicker 1984; Akerboom et al. 1985; Brennan and Stevens 2002; Chambers et al. 2005; Sonnadara and Trainor 2005; Brancucci et al. 2009, 2011a, b; Brancucci and Tommasi 2011; Brännström and Nilsson 2011; Xu et al. 2012).

Fig. 1

The stimulation conditions. Each tone lasts 500 ms and the complete sequence lasts 15 s. The first stimulus of each sequence was counterbalanced (see text). Time flows from left to right

From a theoretical standpoint, Deutsch (Deutsch 1980, 1981, 1988; Deutsch and Roll 1976) proposed a two-channel model to explain the octave illusion in terms of separate “what” and “where” implicit decision mechanisms. According to this model, the “what” mechanism determines that the perceived pitches are equivalent to the frequencies presented to the dominant ear, whereas the frequencies presented to the non-dominant ear are suppressed; the “where” mechanism determines that percepts are localized at the ear receiving the higher frequency, regardless of which frequency is in fact perceived. An alternative model was suggested more than 25 years later by Chambers et al. (2002, 2004), who proposed that the pitch variation experienced during listening to the octave illusion may arise from mechanisms of harmonic fusion combined with binaural diplacusis (see also Deutsch 2004a, b).

Despite the deep interest around the psychophysical, behavioral and perceptual features of the illusion, just a few studies investigated its neural mechanisms (Ross and Näätänen 1996; Lamminmäki and Hari 2000; Lamminmäki et al. 2012). Using electroencephalography and mismatch negativity, i.e., a change-specific event-related potential originating at cortical level, Ross and Näätänen (1996) concluded that the stimuli giving rise to the octave illusion are encoded at lower cortical levels according to their physical rather than perceptual properties and that the generator of the octave illusion is located beyond the first levels of the auditory cortex. To further investigate the neural bases of the illusion, Lamminmäki and Hari (2000) recorded the magnetoencephalographic (MEG) neural responses to binaural and dichotic 400 and 800 Hz tones. They found that the M100 (the component of the evoked magnetic field appearing bilaterally 100 ms after the onset of the stimulus) localized in the left and right auditory cortices was stronger in the hemisphere contralateral to the ear receiving the 800 Hz tone, than vice versa. In contrast, the sustained fields tended to behave in an opposite manner. They suggested that the illusorily perceived locations of the sounds would follow the M100 lateralization, and that the perceived pitch is determined by different mechanisms related to both activation strength and streaming by ear. In the most recent study investigating the neural bases of the illusion, the same group (Lamminmäki et al. 2012) explored the link between the perceived pitches and brain activity. They recorded magnetic responses to 4 binaural and dichotic combinations of 2-min long continuous 400- and 800-Hz tones and to four monaural tones. Responses to left and right ear stimuli were made recognizable by tagging the ear-specific stimuli at different modulation frequencies. 
During the dichotic presentation, ipsilateral left ear tones elicited weaker responses and ipsilateral right ear tones elicited stronger responses compared to the binaural presentation (i.e., when both ears received the same tone). In one of the most paradoxical situations, i.e., stimulation with high tone to the left ear and low tone to the right ear (perceived as a low tone in left ear during the illusion), also the contralateral responses to the left ear tones decreased. The authors concluded that illusory pitch perception depends on a modified binaural interaction between dichotic tones separated by one octave.

Several fundamental aspects still need to be elucidated to understand the neurophysiological mechanisms underlying the Deutsch’s illusion. First, the key role of sequential interactions between alternating tones in the illusion, as pointed out by several behavioral studies (Deutsch 1980, 1981, 1988, 2004a, b; Chambers et al. 2002, 2004; Brancucci et al. 2011b), remains to be clarified. Second, the previous studies focused exclusively on the primary auditory cortex. Third, only early evoked responses were analyzed in the previous studies. To fill these gaps at least in part, we used three stimulation conditions in which tones lasted 500 ms (Brancucci et al. 2009), allowing the investigation of middle and late neural evoked activity. The tones were presented in relatively long sequences (15 s), permitting the emergence of sequential interaction effects. The first stimulation condition of the present experiment, here labeled as ILL, consisted of the standard tone sequence eliciting the illusion. The second condition (MON) consisted of a mimicry of the illusory percept, i.e., a sequence of one 400 Hz monaural tone presented at one ear followed by one 800 Hz monaural tone at the other ear (tones alternating in both frequency and ears). The third condition (NIL) differed slightly from ILL, as the two tones composing the dichotic stimuli were separated by a frequency interval which was chosen in the range not eliciting the illusion (Brancucci et al. 2009). The selection of the second and third condition was driven by previous literature (Ross and Näätänen 1996; Lamminmäki et al. 2012). Moreover, in the NIL condition the slight frequency difference of one of the tones composing the dichotic pair (which produces a drastically different percept) can be controlled since the topography of auditory cortex activation follows well-known rules related to sound frequency (Romani et al. 1982). 
To carry out our study we used MEG, a silent neuroimaging technique especially suited to detect cortical activity related to auditory stimulation in humans and able to analyze the response dynamics.

The main hypothesis of the study is based on the assumption that the experimental conditions elicit responses involving the main nodes of the auditory “what” and “where” pathways and that these responses should differ in terms of amplitude, latency or duration. The “what” pathway runs from the primary auditory cortex ventrally to more anterior areas, whereas the “where” pathway runs from the primary auditory cortex dorsally to parietal areas and then to frontal areas (Rauschecker and Tian 2000; Ahveninen et al. 2006; Woods and Alain 2009; Hackett 2011). In the ILL and NIL conditions both ears are always stimulated together, with the same number of 400 and 800-Hz tones (400 and 480 in the NIL condition), whereas the perception of pitch (what) and origin (where) is known to be strongly different from the physical input (Brancucci et al. 2009). Hence, a different response between the two conditions in the two auditory streams would be of interest, as it would presumably be driven by processes related to illusory perception rather than to stimulation. In turn, the comparison between ILL and MON involves two conditions sharing, as far as possible, the same perceptual features, although elicited by different stimulations. Here, conversely, a similarity in the neural response, despite the different ear inputs, would be of interest to elucidate mechanisms at the basis of the illusion. The prediction is that, although illusory in nature, the perceptual features of the illusion (which tone is perceived, “what”; in which ear, “where”) described by Deutsch and Roll (1976) have a defined biological counterpart in the two main neural streams leaving the primary auditory cortex. In addition, we expect a leading role of frontal areas in the present paradigm as it involves higher cognitive functions and suppose that, due to the non-verbal nature of the present acoustic stimuli, the right hemisphere (RH) would play a predominant role during the perception of the illusion.

Materials and methods

Subjects

Nineteen healthy adults (13 females, 6 males) aged between 21 and 31 years (mean age 25.1 years) participated in the study. They reported no history of auditory impairments and were additionally administered an auditory functional assessment (absolute hearing threshold < 20 dB) showing no differences between left and right ear (± 5 dB, Brancucci et al. 2005). Subjects were recruited through a preliminary experiment, carried out with the explicit purpose of assessing their percepts in the Deutsch’s illusion; only subjects perceiving the auditory illusion in the most common way (i.e., a single tone that alternates in pitch between ears) were recruited. Sixteen subjects were right-handed as determined by a standard handedness measurement (Salmaso and Longoni 1985) while three subjects scored < 0. Since one male subject was seated too far from the MEG helmet during 2 out of 3 experimental conditions and we aimed at comparing the responses in the 3 conditions, he was not included in further analysis.

Stimuli

Stimuli were synthesized on a personal computer using the CSound language for sound synthesis (Vercoe 1992). All tones were sinusoids with an amplitude envelope comprising a 10 ms rise and a 490 ms decay. Tones were arranged in sequences representing the experimental condition and the two control conditions (Fig. 1). The first condition (experimental condition; ILL, illusion; Fig. 1, top) was a sequence composed of the simultaneous 400 and 800 Hz tones constituting the two dichotic pairs (400 Hz to the left ear, 800 Hz to the right ear and vice versa: 800 Hz left, 400 Hz right). The pairs were presented repeatedly in alternation with no interstimulus interval so that, when the right ear received the high tone, the left ear simultaneously received the low tone and vice versa. The tone sequence lasted 15 s and was presented 25 times with an interval of 8 s between two sequences. Twenty-five additional sequences starting with the reversed dichotic pair were presented to all subjects. The second condition (control condition; MON, monaural; see Fig. 1, center) consisted of a monaural stimulation made of 400 and 800 Hz tones presented sequentially in alternation (400 left, 800 right, 400 left and so on, or the reversed sequence). Specifically, stimuli mimicked the standard percept of the Deutsch’s illusion. The third condition (control condition; NIL, no illusion; Fig. 1, bottom) was instead a sequence identical to the experimental condition, in which the 800 Hz tone was replaced by a 480 Hz tone, obtaining a stimulation with dichotic pairs of simultaneous 400 Hz left and 480 Hz right tones (and vice versa) eliciting no illusory percept (Brancucci et al. 2009; Deutsch 1981). As in the experimental condition, the sequences in the two control conditions lasted 15 s and were presented 25 times (+ 25 times with stimuli inverted between ears) in all subjects, with an interval of 8 s between two sequences. 
Based on previous experiments on the Deutsch’s illusion, stimulation intensity was the same for all subjects (Deutsch 1974) and was set at a comfortable level of 70 dB as measured with a phonometer (SL-318, PCE group, UK). The acoustic stimuli were provided by Sensorcom plastic ear tubes connected to a transducer that was placed inside a µ-metal box to avoid any artifact on the MEG recordings. The initial stimulus of each sequence was counterbalanced within subjects, while the position of the earpieces was counterbalanced between subjects (i.e., to control for possible output differences in the earpieces, half of the subjects wore them in inverted position).
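The three tone sequences described above can be illustrated with a short numerical sketch. This is not the CSound code used in the study: the audio sampling rate (44.1 kHz), the linear shape of the rise and decay ramps, and the names `tone`, `sequence` and `swap_ears` are assumptions introduced here for illustration. The `swap_ears` option reflects one reading of "stimuli inverted between ears"; for ILL and NIL it is equivalent to starting with the reversed dichotic pair. Calibration to 70 dB is not modeled.

```python
import numpy as np

FS = 44100                        # assumed audio sampling rate (Hz); not stated in the text
TONE_S = 0.5                      # tone duration from the text (500 ms)
RISE_S = 0.010                    # 10 ms rise; the remaining 490 ms is the decay


def tone(freq_hz):
    """One 500 ms sinusoid with a 10 ms rise and 490 ms decay (linear ramps assumed)."""
    n = int(round(FS * TONE_S))
    n_rise = int(round(FS * RISE_S))
    t = np.arange(n) / FS
    envelope = np.concatenate([
        np.linspace(0.0, 1.0, n_rise, endpoint=False),
        np.linspace(1.0, 0.0, n - n_rise),
    ])
    return envelope * np.sin(2 * np.pi * freq_hz * t)


def sequence(condition, seq_s=15.0, swap_ears=False):
    """Stereo array (n_samples, 2) with columns (left, right) for ILL, MON or NIL."""
    high_hz = {"ILL": 800.0, "MON": 800.0, "NIL": 480.0}[condition]
    low, high = tone(400.0), tone(high_hz)
    silence = np.zeros_like(low)
    chunks = []
    for i in range(int(round(seq_s / TONE_S))):
        if condition == "MON":
            # one ear at a time: 400 Hz left, then 800 Hz right, and so on
            left, right = (low, silence) if i % 2 == 0 else (silence, high)
        else:
            # dichotic pair, exchanged between ears on every tone
            left, right = (low, high) if i % 2 == 0 else (high, low)
        chunks.append(np.column_stack([left, right]))
    stereo = np.concatenate(chunks)
    return stereo[:, ::-1] if swap_ears else stereo
```

With these settings each sequence contains 30 consecutive tones with no gap between them, matching the 15 s duration given in the text.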

MEG recordings

The magnetic field was recorded by using a whole-head MEG system consisting of 165 dc SQUID integrated magnetometers and sited in a high-quality magnetically shielded room (Della Penna et al. 2000). Evoked magnetic fields were bandpass filtered at 0.16–250 Hz and recorded at 1 kHz sampling rate. To determine the position of the subject’s head with respect to the MEG sensor, the magnetic field generated by five coils placed on the scalp was recorded before and after each of three measurement sessions. A coordinate system was defined by digitizing anatomical landmarks on the subject’s head by a 3D digitizer (3Space Fastrak; Polhemus). Cardiac and ocular activities were also monitored by means of bipolar electrodes placed on the chest (1 pair) and on the lower and upper eyelids, one on one side and the other at the opposite corner of the eye (1 pair), and were used to filter out possible heart and eyes contaminations of the MEG signals by means of an adaptive algorithm (as in Della Penna et al. 2004). Magnetic resonance images of the subjects’ heads were obtained by means of a Philips scanner at 1.5 T using 3D T1-turbo field echo sequence. During the acquisition, spherical oil capsules were applied on the anatomical landmarks to allow coregistration of MEG and MRI coordinate systems.

Data analysis

The aim of the post-processing analysis was first to localize the sources generating the evoked field during the three stimulation conditions (see “Source localization” section) and then to compare activity and dynamics of the active sources. To define the ROIs to be compared across conditions, a completely data-driven approach was applied to the whole brain activity maps (see “ROI selection” section).

Source localization

For each condition, evoked magnetic fields were obtained by averaging MEG signals from all sequences of homologous stimuli. The averaging window comprised the time interval [− 50, + 450] ms with respect to the stimulus onset at 0 ms. For each condition, the amplitude of the evoked fields was calculated with respect to a baseline level in the interval [− 50, 0] ms. Then, we selected and applied a suitable localization strategy, which in our case was completely data-driven. In principle, to identify generators of MEG-evoked responses, parametric approaches assuming a fixed number of equivalent point-like models for the brain generators (Scherg and Berg 1996; Torquati et al. 2005; Stavrinou et al. 2007) or approaches based on extended source models such as minimum-norm estimates, low-resolution tomographies, and beam-forming methods (Hämäläinen and Ilmoniemi 1994; Pascual-Marqui et al. 2002; Sekihara et al. 2002; Brookes et al. 2007) can be used. Here, we used the classical LORETA algorithm (low-resolution brain electromagnetic tomography; Pascual-Marqui et al. 2002; Pascual-Marqui 1995) implemented in the BESA software (version 5.2.4, Besa GmbH). LORETA is an imaging approach that can provide a blurred image of a point source exactly centered on the location of maximal activity, even for shallow and correlated sources. We used this approach based on distributed sources because we had no a priori assumption on the number and locations of sources, which is needed in multiple dipole fitting.

LORETA estimated the activity intensity of each voxel in the 3D volume grid (7 mm side) modeling the volume conductor, with 1 ms resolution over the whole time interval [− 50, 450] ms. We thus obtained 500 whole brain maps of activity for each subject and condition.
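The averaging and baseline correction that precede source localization can be sketched as follows. This is only an illustration, not the pipeline used in the study: the function name `evoked_field` and the array layout are assumptions, and artifact rejection and the LORETA step itself are not modeled. At the 1 kHz sampling rate reported above, one sample corresponds to 1 ms.

```python
import numpy as np

PRE_MS = 50     # baseline length before tone onset, [-50, 0] ms
POST_MS = 450   # post-stimulus window, [0, +450] ms


def evoked_field(continuous, onsets):
    """Average epochs around tone onsets and subtract the baseline level.

    continuous: (n_channels, n_samples) recording sampled at 1 kHz
    onsets:     sample indices of tone onsets for one condition
    returns:    (n_channels, 500) baseline-corrected evoked field
    """
    epochs = np.stack([continuous[:, o - PRE_MS:o + POST_MS] for o in onsets])
    evoked = epochs.mean(axis=0)
    baseline = evoked[:, :PRE_MS].mean(axis=1, keepdims=True)
    return evoked - baseline
```

The output has 500 samples per channel, which matches the 500 whole-brain maps per subject and condition mentioned above.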

ROI selection

We adopted a data-driven strategy to define group-level ROIs, the activity of which was subsequently compared across conditions. First, after source localization, the individual 3D anatomical images and the coregistered functional maps were transformed into stereotaxic coordinates in the Talairach space using Brain Voyager Qx. Then we applied a statistical threshold to the transformed functional maps, and we looked for contiguous groups of voxels with a statistically significant current density magnitude (Pascual-Marqui et al. 2002) during at least one stimulation condition, to generate a volumetric mask. In this search, we removed from the entire brain volume the slices including the cerebellum and the voxels inside a sphere 40 mm in diameter placed at the center of the brain, to discard the contribution of the cerebral ventricles (Franciotti et al. 2009). Specifically, we applied the following strategy: (1) for each subject and condition, the sequence of whole brain maps of current density magnitude was down-sampled to a time step of 25 ms (averaging the 25 intensity values within this time interval) just to reduce the computational burden; (2) for each voxel and time point t′, we first estimated the difference sig(t′) between the activity intensity at time t′ after the trigger onset and the mean intensity during the baseline ([− 50, 0] ms), and then we transformed this difference into z-score values according to the formula

$$ zs(t') = \frac{\text{sig}(t') - \mu}{\sigma_{\text{b}}}, $$

where sig(t′) was the activity intensity with respect to the baseline, µ and \( \sigma_{\text{b}} \) were the mean and the standard deviation of sig(t′) over the whole brain and over the post-trigger epoch, respectively. Here the null hypothesis was that no local peaks of current density magnitude were detected on the 3D grid. Then, (3) these values were averaged over the first neighboring voxels to account for possible coregistration errors of the individual brains over the common atlas (see motivation in point 6 below); (4) still at the subject level, z-score maps at each time point were thresholded at p = 0.05 (Bonferroni corrected on the basis of the number of MEG channels) to obtain binary maps, where 1 corresponded to voxels with z ≥ 3.99 (p < 0.05) and 0 to voxels with z < 3.99 (p > 0.05); (5) since we aimed to find supra-threshold voxels in the whole post-trigger interval whatever the latency, the Boolean logical operator OR was applied over time t′, producing an individual binary mask map for each condition; (6) for each condition and for each voxel, we then counted how many times that voxel scored 1 across subjects and we defined a consistency of 50% as a threshold to obtain group masks of significant current density magnitude; the spatial smoothing in point (3) was thus needed to avoid discarding significant voxels lying close together within the same region in different subjects, but non-overlapping due to coregistration errors; without this procedure we would artificially reduce the consistency. Finally, (7) a binary, global significance mask was obtained from the logical OR of the group significance masks obtained in the three conditions. Thus, the global significance mask was obtained from the data and showed voxels above threshold during at least one condition, consistently across subjects. Notably, no assumptions on latency of activity and position of sources were adopted. 
From the global significance mask we manually delimited the ROIs for further analyses on modulations and latency of activity intensity across conditions. Finally, for visualization of group results and labeling of the group-level ROI, an additional transformation of the maps of current density magnitude (in Talairach space) to the MNI152 atlas was applied through in-house software, based on Matlab (The MathWorks Inc., Natick, MA, USA).
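Steps (1) to (7) of the mask construction can be sketched numerically as below. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the maps are assumed to sit on a regular (x, y, z, time) grid sampled at 1 ms over [− 50, 450] ms, µ and σ_b are pooled over all voxels and post-trigger time points (one possible reading of the definition above), and "first neighboring voxels" is taken to mean the six face neighbors. The threshold z = 3.99 is the value reported in the text; the removal of cerebellar slices and of the central sphere is omitted.

```python
import numpy as np


def downsample(maps, step=25):
    """Step 1: average consecutive blocks of `step` samples along the time axis."""
    n_t = maps.shape[-1] // step * step
    return maps[..., :n_t].reshape(*maps.shape[:-1], -1, step).mean(axis=-1)


def zscore_maps(post, baseline_mean):
    """Step 2: z-score of the baseline-corrected intensity sig(t')."""
    sig = post - baseline_mean[..., None]
    return (sig - sig.mean()) / sig.std()       # pooled mu and sigma_b (assumption)


def smooth_first_neighbours(z):
    """Step 3: average each voxel with its six face neighbours (assumed neighbourhood)."""
    p = np.pad(z, ((1, 1), (1, 1), (1, 1), (0, 0)), mode="edge")
    total = (p[1:-1, 1:-1, 1:-1]
             + p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
             + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1]
             + p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2])
    return total / 7.0


def subject_mask(maps_1ms, n_baseline=50, z_thr=3.99):
    """Steps 1-5 for one subject and condition; maps_1ms has shape (x, y, z, 500)."""
    baseline_mean = maps_1ms[..., :n_baseline].mean(axis=-1)
    post = downsample(maps_1ms[..., n_baseline:])
    z = smooth_first_neighbours(zscore_maps(post, baseline_mean))
    return (z >= z_thr).any(axis=-1)            # steps 4-5: threshold, then OR over time


def group_mask(subject_masks, consistency=0.5):
    """Step 6: keep voxels that are supra-threshold in at least 50% of subjects."""
    return np.mean(subject_masks, axis=0) >= consistency


def global_mask(group_masks):
    """Step 7: logical OR of the group masks of the three conditions."""
    return np.logical_or.reduce(group_masks)
```

In this sketch the 450 ms post-trigger interval is reduced to 18 time points of 25 ms each, and a voxel enters the subject-level mask if it exceeds the threshold at any of them.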

Analysis of ROI activity intensities and latencies

We carried out our analysis using two different strategies providing complementary information. First we looked for a global modulation of activity intensity within the ROIs, during macro-epochs centered across activity peaks. This analysis was designed to show modulations of the time integral of the z-scored activity intensity. To this end we inspected the individual time courses of voxels in the ROIs in the three conditions. This inspection suggested three peaks of activity and eventually we selected three macro-epochs with the same duration: early (0–150 ms), middle (150–300 ms) and late (300–450 ms), each containing six time points. Then, for each condition and macro-epoch, we computed the mean of the z-scores across the six time points, and obtained three maps (early, middle and late) for each subject and condition. Finally, we averaged these maps across subjects, we computed the z-score differences ILL–NIL and ILL–MON for each macro-epoch, we masked these differences using the global significance mask and we thresholded them at p = 0.05 (Bonferroni corrected).
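The macro-epoch averaging described above can be sketched as follows. The function names and the array layout (subjects × voxels × 18 time points at 25 ms steps) are assumptions made for illustration, and the final Bonferroni thresholding of the difference maps is not reproduced because its exact form is not specified in the text.

```python
import numpy as np

# Six 25 ms time points per macro-epoch: 0-150, 150-300 and 300-450 ms
MACRO_EPOCHS = {"early": slice(0, 6), "middle": slice(6, 12), "late": slice(12, 18)}


def macro_epoch_means(z):
    """Mean z-score over the six time points of each macro-epoch; z has shape (..., 18)."""
    return {name: z[..., s].mean(axis=-1) for name, s in MACRO_EPOCHS.items()}


def group_difference(z_a, z_b, mask):
    """Masked group difference map (e.g. ILL minus NIL) for each macro-epoch.

    z_a, z_b: (n_subjects, n_voxels, 18) z-scores for the two conditions
    mask:     (n_voxels,) boolean global significance mask
    """
    mean_a = {k: v.mean(axis=0) for k, v in macro_epoch_means(z_a).items()}
    mean_b = {k: v.mean(axis=0) for k, v in macro_epoch_means(z_b).items()}
    return {k: np.where(mask, mean_a[k] - mean_b[k], 0.0) for k in mean_a}
```

Voxels outside the global significance mask are set to zero here purely for convenience; any other placeholder value would serve equally well.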

The second strategy aimed at inspecting possible modulations of peak activity and peak latency exploring each voxel in each ROI within the early, middle and late macro-epochs for each condition. Specifically, for each condition, the activities obtained from LORETA with 1 ms time resolution were analyzed for all the voxels within the global significance mask, to look for voxels showing the maximum activity within an ROI during each of the macro-epochs. In this way, we selected an activity peak for each ROI, condition, and macro-epoch. To assess statistical significance of modulations, Student’s t tests were performed for each ROI separately and for each macro-epoch, with both amplitude differences and latency differences as dependent variables. Because of the relatively high number of comparisons (3 in each ROI for both amplitude and latency), we decided to use a stricter significance threshold than the conventional one and set p = 0.01.
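A minimal sketch of the peak analysis is given below. The names `roi_peak` and `t_against_zero` are hypothetical, and the time courses are assumed to be sampled at 1 ms over [− 50, 450] ms, so that sample 50 corresponds to stimulus onset. Only the t statistic of a one-sample test on the between-condition differences is computed; the p value would follow from the t distribution with n − 1 degrees of freedom.

```python
import numpy as np

PRE_MS = 50   # samples before stimulus onset in each time course
MACRO_EPOCHS_MS = {"early": (0, 150), "middle": (150, 300), "late": (300, 450)}


def roi_peak(roi_activity, epoch):
    """Largest activity over all voxels of an ROI within one macro-epoch.

    roi_activity: (n_voxels, 500) LORETA time courses at 1 ms resolution
    returns:      (peak amplitude, peak latency in ms after stimulus onset)
    """
    start, stop = MACRO_EPOCHS_MS[epoch]
    window = roi_activity[:, start + PRE_MS:stop + PRE_MS]
    voxel, sample = np.unravel_index(np.argmax(window), window.shape)
    return float(window[voxel, sample]), start + int(sample)


def t_against_zero(differences):
    """t statistic of a one-sample Student's t test of the differences against 0."""
    d = np.asarray(differences, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

For example, the per-subject differences in peak amplitude between ILL and MON in one ROI and macro-epoch would be passed to `t_against_zero`, and the same would be done for the latency differences.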

Analysis of lateralization effects

As for the analysis of ROI activity, the strategy to evaluate lateralization effects was twofold. The first approach aimed at inspecting lateralization effects on the integral of the activity signal in the three macro-epochs, using the group z-score maps described in the previous paragraph. The Laterality Index (LI), defined as \( \text{LI} = 100 \times \frac{Q_{\text{L}} - Q_{\text{R}}}{Q_{\text{L}} + Q_{\text{R}}} \), was estimated for each macro-epoch and for each ROI pair in each z-score difference map (ILL–NIL, ILL–MON, NIL–MON). Here \( Q_{\text{L}} \) and \( Q_{\text{R}} \) were defined as the means of the z-score values over each ROI. To estimate them, only voxels with a z-score value above the statistical threshold (p = 0.05, Bonferroni corrected) in the left hemisphere (LH) and/or the RH were included. Additionally, each mean value was transformed into a z-score (i.e., it was multiplied by \( N^{1/2} \), where N was the number of voxels in each ROI) to account for both the size (as the number of voxels above threshold could be different in the LH and RH) and the increased (or decreased) activity of the ROI during ILL with respect to both other conditions. A 20% threshold was used to identify lateralized ROIs.
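The LI computation can be illustrated with a short sketch. The function names are hypothetical, N is taken here to be the number of supra-threshold voxels included in the mean, and an ROI is flagged as lateralized when the absolute LI reaches 20%; both choices are assumptions about details that the text leaves open.

```python
import numpy as np


def roi_q(z_values, z_threshold):
    """Mean of the supra-threshold z-scores of an ROI, multiplied by sqrt(N)."""
    z = np.asarray(z_values, dtype=float)
    kept = z[z >= z_threshold]
    if kept.size == 0:
        return 0.0
    return float(kept.mean() * np.sqrt(kept.size))


def laterality_index(q_left, q_right):
    """LI = 100 * (Q_L - Q_R) / (Q_L + Q_R); undefined (NaN) when both are zero."""
    total = q_left + q_right
    if total == 0:
        return float("nan")
    return 100.0 * (q_left - q_right) / total


def is_lateralized(li, threshold=20.0):
    """Apply the 20% threshold to the absolute LI (inclusive bound assumed)."""
    return abs(li) >= threshold
```

With this definition, positive values indicate larger values in the LH and negative values indicate larger values in the RH.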

The second approach analyzed the peak responses elicited in homologous ROIs in the LH and RH to investigate possible hemispheric asymmetries. Four-way ANOVA with Condition (ILL, MON, NIL), ROI (MFG, IC, IFG and HG), Hemisphere (left, right) and Macro-epoch (first, second, third) as factors was calculated for amplitudes and latencies.

Results

Regions of interest following the three stimulation conditions

In all conditions (ILL, MON, NIL), we looked for voxels above significance threshold during the whole post-trigger time interval, consistently across subjects, to obtain binary significance masks (see “Materials and methods”).

Figure 2 displays these binary masks in the MNI atlas, using Caret software (http://www.nitrc.org/projects/caret/; Van Essen et al. 2001). In the figure, the masks have been overlapped and thus we show the voxels above threshold in one or more conditions in different colors. Specifically, significant activity was found in the majority of subjects over the bilateral Heschl gyri (HG, BA41, centroid MNI coordinates left: − 47 − 25 10, right: 51 − 25 16), the bilateral insulae (IC, BA13, ± 45 − 3 5), the bilateral inferior frontal gyri (left IFG, BA47, − 46 20 − 3, right IFG, BA44, 46 17 12), the bilateral middle frontal gyri (MFG, BA6, left: − 42 − 1 42, right: 42 − 4 42) as well as the left inferior parietal lobe (IPL, BA40, − 34 − 58 33). A large part of the voxels comprised in the HG and MFG were above threshold in all the three conditions (red voxels in Fig. 2). All the other ROIs were elicited during the ILL condition, with some overlap with the other conditions. For further analyses we adopted a global significance mask (not shown) obtained as the logical OR of the three overlapped binary maps shown in Fig. 2.

Fig. 2

Map of voxels showing an activity significantly above the baseline level (p < 0.05, Bonferroni corrected) with a consistency of 50% across subjects in at least one of the three conditions. The different colors code the conditions eliciting significant activity

Using the global significance mask, we extracted the time course of activity of the voxels in the ROIs. Figure 3 shows the source waveforms of sample voxels included in the ROIs and obtained through LORETA from a representative subject in the three conditions (ILL, thick line; MON, thin line; NIL, dashed line). For all three conditions, the source waveforms show three evoked activity components (peaks) in the interval from 75 to 400 ms after the stimulus onset. For this representative subject, the first peak fell in the 75–130 ms time interval, the second peak in the 150–260 ms interval and the third peak in the 300–400 ms time interval. These intervals are included in the three macro-epochs defined as early ([0, 150] ms), middle ([150, 300] ms) and late ([300, 450] ms), respectively.

Fig. 3

Time course in the three conditions (ILL, NIL, MON) of neural responses in the significant areas as obtained by LORETA from a representative subject

Results on ROI analyses

Figure 4 shows the significant (p < 0.05 Bonferroni corrected) z-score group difference maps at each macro-epoch, together with the statistical results on peak amplitude and latency, as obtained from the comparisons between the different conditions (ILL, MON, NIL).

Fig. 4

a Activation differences between ILL and MON conditions in the ROIs. Top: maps depict activation differences (z-score at ROI level) in the middle macro-epoch (150–300 ms). Bottom: graphs report in each ROI (single voxel) the across subjects mean difference between the two conditions of the peak response amplitude (first row) and latency (second row) for the early (left panel, 0–150 ms), middle (central panel, 150–300 ms) and late (right panel, 300–450 ms) time intervals. Asterisks indicate significant t test results (p < 0.01) against 0. b Activation differences between ILL and NIL conditions in the ROIs. Top: maps depict activation differences (z-score at ROI level) in the middle macro-epoch (150–300 ms). Bottom: graphs report in each ROI (single voxel) the across subjects mean difference between the two conditions of the peak response amplitude (first row) and latency (second row) for the early (left panel), middle (central panel) and late (right panel) time intervals. Asterisks indicate significant t test results (p < 0.01) against 0. c Activation differences between NIL and MON conditions in the ROIs. Graphs report in each ROI (single voxel) the across subjects mean difference between the two conditions of the peak amplitude (top) and latency (bottom) for the early (left panel), middle (central panel) and late (right panel) time intervals. Asterisks indicate significant t test results (p < 0.01) against 0. No significant differences were observed in the z-score maps, hence none of them are displayed for this comparison

ILL vs. MON

Figure 4a and Fig. S1 (group z-score difference map at the macro-epochs) show (top left, Fig. S1) an early larger integral of the z-scored activity for the ILL condition. This increase mainly occurs in both left and right superior temporal gyri, and also in more anterior areas to some extent. At the middle latencies (Fig. 4a, top; Fig. S1, center), the integral of the z-scored activity is significantly stronger in the ILL condition and involves the superior temporal gyri, bilateral inferior and medial frontal areas as well as left parietal regions. No significant differences were obtained at the late macro-epoch (Fig. S1, top right).

The bar plots in Fig. 4a suggest that during the ILL condition, compared to the MON condition, the peak activity at the early-middle latencies was larger in all ROIs (first and second macro-epochs, left and center top plots). This effect was statistically significant in the right IC (second macro-epoch, t = 3.455, p = 0.003), in the left MFG (first macro-epoch, t = 3.701, p = 0.002), in the left IPL (first macro-epoch, t = 3.196, p = 0.005), in the left HG [first macro-epoch (t = 3.598, p = 0.0029) and second macro-epoch (t = 4.565, p < 0.001)] and in the right HG (second macro-epoch, t = 3.650, p = 0.002). Conversely, during the third macro-epoch (right top plot), peak activity tended to be larger in the MON condition in all ROIs except the right IFG and right IC, although without reaching statistical significance. Finally, bottom bar plots in Fig. 4a suggest that in the ILL condition there was a trend towards a slowing of the peak latency during the third macro-epoch in all ROIs except for left IC and left IPL. The latency increase reached significance in the left HG (t = 4.485, p < 0.001). Overall, these results suggest a more pronounced activity (larger integral) in the right insula at the middle latencies during the ILL condition, and a possibly faster processing at the late latencies in right primary auditory areas.

ILL vs. NIL

The group z-score difference map in Fig. S2 (top left) suggests that the ILL condition elicited a stronger integral of the z-scored early activity mainly in the right superior temporal gyrus. At middle latencies (Fig. 4b, top; Fig. S2 top center) the integral of the z-scored activity is significantly larger in the majority of the ROIs, which are specifically comprised in the superior temporal gyrus, in more inferior frontal and inferior parietal areas, and in particular in the LH. At the late latencies, no difference was observed above threshold (Fig. S2, top right maps). The bar plots in Fig. 4b (left and center plots) show that the ILL condition, compared to the NIL condition, was associated with a larger middle peak response (peak amplitudes in the first and second macro-epochs) in all ROIs except for the IPL. In the right IC the effect was statistically significant at the second macro-epoch (t = 3.294, p = 0.004). On the contrary, the late peak response (Fig. 4b, top right bar plot) was larger in the NIL condition in all ROIs except the right IC and IFG. In the right HG the effect was statistically significant (t = 3.710, p = 0.002). Instead, neither consistent difference patterns nor statistically relevant effects were detected when analyzing peak latency differences (Fig. 4b, bottom bar plots). Overall these results suggest a more pronounced activity (larger integral) and possibly prolonged activity in the ROIs at the middle latencies during the ILL condition, and a possibly prolonged processing in left primary auditory areas (same integral, reduced peak activity) at the late latencies.

NIL vs. MON

No significant voxels were detected in the group z-score difference maps at any macro-epoch; hence, no maps are displayed in Fig. 4c. The bar plot in Fig. 4c (top left) shows that the NIL condition, compared to the MON condition, was associated with a larger early response (occurring during the first macro-epoch) in all ROIs, and this peak value reached statistical significance in the left IPL (first component, t = 2.977, p = 0.008). By contrast, no clear trend towards an increase or decrease of peak activity could be observed at later latencies (within the second and third macro-epochs, top center and right plots). However, the peak activity of the left IPL during the third macro-epoch was significantly larger in the NIL condition (t = 3.061, p = 0.007). Figure 4c (bottom center) shows that the peak latency during the second macro-epoch tended to be later in the MON condition (all ROIs except the left HG), although no comparison reached statistical significance.

In summary, this comparison indicates that only mild differences could be found between the NIL and MON conditions. The only finding was that peak responses to NIL tended to be larger than those to MON at the early latencies.

Hemispheric lateralization effects

Figure 5 shows the z-scored QL and QR used for the computation of the ROI LIs. Only the above-threshold LIs are displayed. The response of the left IPL is always lateralized by construction, owing to the global significance mask. Different patterns of LIs can be noticed across macro-epochs and conditions. Specifically, in the first macro-epoch (Fig. 5, top) a larger increase of the activity integral was detected in the right MFG when comparing ILL vs. NIL (LI = 35%) and in the right IC when comparing ILL vs. MON (LI = 21%). Conversely, in the second macro-epoch (Fig. 5, bottom), the increase of the activity integral was larger in the left IC and IFG irrespective of the comparison (LI = 33% and LI = 30% for ILL vs. NIL, respectively, and LI = 30% and LI = 25% for ILL vs. MON, respectively). Because no significant activity was found at the late latencies, possible lateralization effects could not be investigated there.

Fig. 5
figure 5

Mean z-score activity difference in the ROIs used to estimate the LI in the first two macro-epochs. Numbers report LIs above 20%; a positive sign indicates leftward hemispheric asymmetry
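The exact LI formula is given in the "Materials and methods" and is not restated in this section. As a minimal sketch only, the following assumes the common normalized-difference form, LI = 100 × (QL − QR) / (|QL| + |QR|), with a positive sign for leftward asymmetry and the 20% display threshold mentioned in the caption of Fig. 5. The function names and the input values are hypothetical and are not taken from the study's data.

```python
def lateralization_index(q_left: float, q_right: float) -> float:
    """Percent lateralization index; positive values mean leftward asymmetry.

    ASSUMPTION: normalized-difference form. The study's own definition is in
    its Materials and methods and may differ (e.g., in the denominator).
    """
    denom = abs(q_left) + abs(q_right)
    if denom == 0:
        raise ValueError("LI is undefined when both activity values are zero")
    return 100.0 * (q_left - q_right) / denom


def above_threshold(li: float, threshold: float = 20.0) -> bool:
    """True if the LI magnitude exceeds the display threshold (20% assumed)."""
    return abs(li) > threshold


# Hypothetical z-scored activity integrals for one bilateral ROI.
li = lateralization_index(q_left=2.0, q_right=1.0)
print(round(li, 1), above_threshold(li))  # 33.3 True
```

Absolute values are used in the denominator because z-scored quantities can be negative; with strictly positive inputs this reduces to the usual (QL − QR) / (QL + QR) ratio.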

When inspecting possible lateralization over the peak activities or latencies, no statistically significant effects involving the factor hemisphere were obtained.

In summary, the above results suggest different patterns of lateralization in the first two macro-epochs. From an ILL condition-based perspective (ILL vs. MON, ILL vs. NIL), a larger and prolonged increase of activity in the RH at the early macro-epoch and a predominance of the LH at the middle macro-epoch were observed. Notably, while the latter predominance was not condition-specific (left IC and left IFG for both comparisons), the former involved the right MFG for the ILL vs. NIL comparison and the right IC for the ILL vs. MON comparison.

Effects on what and where pathways

Two 3 × 3 × 2 × 2 ANOVAs were performed with factors Condition (ILL, MON, NIL), Macro-epoch (1, 2, 3), Pathway (What, Where) and Pathway ROI (What pathway: IC, IFG; Where pathway: IPL, MFG) on amplitude and latency data. Latencies were computed with respect to HG, as this ROI is involved in both pathways. Data were analyzed in the LH since the right IPL was not included in the global mask. The ANOVA on amplitude yielded no significant effects involving Condition and Pathway. The ANOVA on latency yielded a significant three-way interaction involving the factors Condition, Macro-epoch and Pathway (F(4,68) = 3.597, p = 0.010). Figure 6 displays the corresponding mean values and post hoc results. Whereas no differences between conditions were observed in the first macro-epoch in either auditory pathway, the second macro-epoch showed a significantly slower response in the MON condition compared to both the ILL and NIL conditions, which was confined to the where pathway. The third macro-epoch showed that in the what pathway the MON condition was associated with a slower response compared to both the NIL and ILL conditions, and that the ILL condition was associated with a faster response compared to the NIL condition. In the where pathway at the third macro-epoch, responses were faster in the ILL condition compared to both the MON and NIL conditions.

Fig. 6
figure 6

Mean latencies in the what and where auditory pathways for the three conditions, for the main components (peaks) of the three macro-epochs. Values represent the peak latency difference of each pathway relative to HG, taken as the starting point of the two pathways. Asterisks indicate significant post hoc comparisons (p < 0.05) observed in the significant three-way interaction Condition × Pathway × Peak
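The degrees of freedom reported for the three-way interaction (4 and 68) can be checked with simple arithmetic. The sketch below assumes a fully within-subject design without any sphericity correction, in which case the effect df is the product of (levels − 1) over the interacting factors and the error df is that product times (N − 1). Under these assumptions the reported values would correspond to N = 18 participants; this figure is derived here and is not stated in this section. The function names are hypothetical.

```python
from math import prod


def interaction_df(levels, n_subjects):
    """(effect df, error df) for a within-subject interaction.

    ASSUMPTION: all factors are repeated measures and no sphericity
    correction (e.g., Greenhouse-Geisser) is applied.
    """
    df_effect = prod(k - 1 for k in levels)
    return df_effect, df_effect * (n_subjects - 1)


def implied_n_subjects(levels, df_error):
    """Number of subjects implied by the error df under the same assumption."""
    df_effect = prod(k - 1 for k in levels)
    n_minus_1, remainder = divmod(df_error, df_effect)
    if remainder:
        raise ValueError("error df is not a multiple of the effect df")
    return n_minus_1 + 1


# Condition (3) x Macro-epoch (3) x Pathway (2), error df as reported.
levels = (3, 3, 2)
n = implied_n_subjects(levels, 68)
print(n, interaction_df(levels, n))  # 18 (4, 68)
```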

Discussion

Basic neurophysiological results

According to the present results, the signature of the Deutsch’s illusion presents the following features: the areas showing activity differences between the ILL and the control conditions are few, and include the HG, the IC, the left IPL, the IFG, and the MFG, as revealed by the z-score difference maps. The neural activations observed therein indicate that the sequence eliciting the illusion, compared to both monaural illusion-mimicking and dichotic non-illusion-inducing stimuli, is associated with stronger early-middle evoked activations in the right IC, left MFG, left IPL and bilateral HG, and with reduced and slowed late evoked activity in bilateral HG. These effects were substantially concordant across the mentioned areas. Thus, listening to the auditory sequence which elicits the Deutsch’s illusion produces neural activations that differ considerably both from those produced during listening to a very similar auditory sequence which does not elicit the illusion (NIL condition), and from those produced during listening to a monaural auditory sequence which reproduces the illusory percept (MON condition).

In the illusory vs. monaural (ILL–MON) comparison, the HG in particular showed effects at the earliest stages of processing. This is possibly due to the substantial difference between dichotic and monaural acoustic stimuli, which is strongly reflected in the activity of the auditory pathways: the interaction of neural activity elicited by the stimulation of the two ears determines a strong influence of sensory thalamocortical input on lower level cortical activity. Specifically, it has been shown that binaural stimuli induce a cortical response which is stronger than the one elicited by monaural stimuli, although attenuated with respect to the sum of the two monaural stimuli (Pantev et al. 1986). Moreover, the interaction between afferent auditory pathways results in an inhibition exerted by the contralateral over the ipsilateral pathway, which reduces the cortical representation of stimuli delivered to the ipsilateral ear (Brancucci et al. 2004, 2008; Della Penna et al. 2007; Franciotti et al. 2011). By contrast, stimulations that are more similar to each other (dichotic illusory vs. dichotic non-illusory, the present ILL and NIL conditions) have smaller repercussions on early cortical activity, though the considerably different perceptual outcomes have a neural counterpart in higher level cortical areas and at later latencies, where presumably the finest features of the illusion find their neural counterparts.

A possible neurophysiological interpretation of such a result is that the neurons of the auditory cortex, which are accustomed to receiving concordant information from the periphery (i.e., binaural hearing in everyday life), are put in an unnatural situation when stimulated with two different auditory objects in the two ears, as in the ILL and NIL conditions, and generate a thoroughly novel, habituation-free early response of higher amplitude. This is consistent with the increased peak amplitude observed during ILL or NIL with respect to MON, which may, however, also depend on the fact that two ears are stimulated in the ILL and NIL conditions and only one in the MON condition. Further, the higher amplitudes in the ILL compared to the NIL condition could be explained by the frequency gap between the tones composing the respective dichotic pairs. The octave interval of the ILL condition (400 Hz to one ear and 800 Hz to the other) stimulates a wider neural population than the narrower interval (400–480 Hz) used in the NIL condition, whose target cells in the auditory cortex overlap more (Pantev et al. 1986). Concerning the late evoked activity (third macro-epoch), the reduction of the peak response and the prolonged processing during ILL with respect to NIL and to MON could be ascribed to an interference or inhibitory influence arising from contralateral higher-order auditory areas as a possible cause of the illusory perception. This effect was accompanied, in terms of latency, by a slowed late response in the two HG (8 comparisons out of 8, see bars in Fig. 4a, b concerning the third macro-epoch in lHG and rHG) in the ILL condition. This explanation is, however, speculative and further targeted research is needed to achieve an exhaustive explanation of the cellular bases of the illusion.
On the contrary, the ILL response in the third macro-epoch was consistently either the only one to show reduced activity or the one with the smallest increase in terms of response amplitude. This effect was accompanied, in terms of latency, by a consistently later third response in the HG (4 comparisons out of 4) in the ILL condition.

Overall, the distinctive effect associated with listening to the Deutsch’s sequence is thus a stronger early and middle evoked activity followed by a consistently reduced, but temporally prolonged, late peak activity.

Remarkably, the differences observed between the ILL and NIL conditions pointing to a stronger early response in the ILL condition cannot be explained by the mere difference in frequency between the two stimulations (800 vs. 480 Hz in one of the tones composing the dichotic pair). In fact, dipole modeling studies have shown that the neuromagnetic signals produced by higher frequency tones stem from deeper sources which yield reduced responses, exactly the opposite of what was observed here (Romani et al. 1982).

What and where pathways activations

The cortical areas that the present study showed to play a role during the perception of the Deutsch’s illusion largely overlap with those constituting the auditory “what” and “where” streams. These two auditory pathways have been investigated in works performed mainly in the past two decades using direct neural recordings in monkeys and neuroimaging techniques in humans (Rauschecker and Tian 2000; Arnott et al. 2004; Brunetti et al. 2005, 2008; Woods and Alain 2009). The “what” pathway projects from the anterior primary auditory cortex to more anterior areas, such as the planum polare (Ahveninen et al. 2006), the MFG and the IC (Renier et al. 2009) and then to the ventrolateral prefrontal cortex (Rauschecker and Scott 2009). In turn, the “where” pathway projects dorsally from the primary auditory cortex to more medioposterior areas, such as the planum temporale (Ahveninen et al. 2006), the IPL (Renier et al. 2009), and then to more frontal areas such as the MFG (Rauschecker and Scott 2009). Within the ILL and NIL conditions of the present experiment both ears are always stimulated together and with the same tones. However, only in the ILL condition do a perceived spatial difference in tone provenience (left–right ear) and a perceived height difference between the tones (high–low tones; Deutsch and Roll 1976) emerge. These percepts are of illusory nature as they are not present in the physical stimulation. Conversely, in the MON condition the differences in height and ear provenience are both perceived and present in the physical stimulation. On this premise, the present experimental paradigm permits a set of observations concerning the auditory what and where streams. In the what pathway activity (HG, IC, IFG), we observed a slower involvement of the late response peak (third macro-epoch) in the MON condition compared to the NIL and ILL conditions, together with a faster activation of the same peak in the ILL compared to the NIL condition.
This presumably reflects an underlying mechanism for the generation of the illusory perception. In fact, whereas in the NIL condition no difference in tone perception (tone height) emerges, in the MON condition this perception is generated in a bottom-up way, i.e., starting from the stimulus, while in the ILL condition it is generated internally, in a top-down mode. The presence of a different mechanism involving the what pathway is suggested by the earlier latency of the late peak observed in the ILL condition (even earlier than in HG), which is in line with enhanced top-down activity. The observations in the where pathway activity corroborate this explanation. Here too, the MON condition shows slower activity, specifically at the second peak compared to both other conditions and at the third peak compared to the ILL condition. On the whole, both conditions in which a perceived variation of tone height and provenience is present (ILL and MON) differed considerably in both pathways from the NIL condition, where no perception of tone height and provenience emerges. The fact that this difference is especially clear at late latencies speaks in favor of a strong role of top-down processes in the Deutsch’s illusion. The differences between the ILL and MON conditions could reflect two interesting aspects of the neural correlates of perception. First, the two auditory pathways are presumably not part of the core auditory neural correlate of consciousness (Tononi and Koch 2015); otherwise, given the correspondence in perception (very similar in ILL and MON), one would also expect a similar underlying neural activity. Second, in order to build two similar perceptions starting from different physical stimulations, the brain needs to implement differential mechanisms at intermediate and late levels of analysis, in addition to the earlier levels, which differ because of the stimulation.
On the whole, this evidence indicates that the activation of the what and where auditory pathways can also be elicited independently of stimulus properties and that these pathways host the neural mechanisms subserving what–where perception even if it is fictitiously produced by internal processes.

Perceptual mechanisms underlying the illusion

Concerning the mechanisms proposed for the generation of the illusion (Deutsch 1980, 1981, 1988; Deutsch and Roll 1976; Chambers et al. 2002, 2004), the present results speak in favor of the original explanation by Deutsch, i.e., the idea that the illusion is strongly grounded in what and where features. The present results add a neural basis to this suggestion by showing that the features of its phenomenology have a counterpart in the activity of the two corresponding auditory what and where streams. However, future studies should verify whether the activation of the two pathways directly produces the perceptual features of the illusion, i.e., whether interfering (e.g., with TMS) with the what pathway specifically abolishes the illusory pitch perception effects, whereas interfering with the where pathway specifically abolishes the illusory side perception effects.

Hemispheric asymmetries

A basic asymmetry involving the IPL was observed, showing activity above threshold only in the LH in all conditions. Further, as regards the areas showing above-threshold activity bilaterally, we found that the asymmetry differences between conditions varied during the course of the neural response, so that an RH prevalence was observed in the increase/prolongation of the early response in ILL vs. NIL and ILL vs. MON, and an LH prevalence was observed in the enhanced/prolonged ILL (vs. NIL and vs. MON) middle response. Interestingly, the ILL LH asymmetry involved areas belonging to the what pathway (IC and IFG), whereas the ILL RH asymmetry involved different areas depending on the comparison made (MFG in comparison with NIL and IC in comparison with MON). These results are not comparable with the only previous study on lateralization processes in the Deutsch’s illusion (i.e., Lamminmäki and Hari 2000), as their hemispheric asymmetries were analyzed at the level of the single tone evoked activity, whereas the present analysis concerns the whole auditory sequence eliciting the illusion and, as explained in the “Materials and methods”, was based on activation magnitude and activation duration. As mentioned above, neural activity differences observed between the ILL and MON conditions should be ascribed more to the stimulation than to the perception, since the stimulation is substantially different between the two conditions and the percept is not. This applies to the early RH and middle-latency LH asymmetric increases in IC activation and to the middle-latency leftward increase in IFG activation. Conversely, neural activity differences observed between the ILL and NIL conditions should reflect differences in the perceptual processes eliciting the illusion, since the stimulation is very similar in the two conditions and the respective percepts are very different.
This applies to the early RH MFG asymmetry, which presumably reflects the startup of a top-down process, and to the middle-latency LH asymmetry in IC and IFG, which presumably reflects a lateralization of the what pathway. This last outcome would be a novelty with respect to the visual system, whose what pathway has been described as substantially bilateral. Finally, we found that the right IFG was active only in the ILL condition (Fig. 2), which may be a finer signature of the Deutsch’s illusion within the present experimental paradigm; this finding needs examination in future research. On the whole, the present results suggest that the illusory percept is based on a cooperation between the hemispheres in which the RH is possibly involved in bootstrapping top-down processes and the LH has a prime role in the generation of the percept (what), which results in the illusory perception of height.

Comparison with earlier studies

With respect to the studies briefly described in the “Introduction”, these results are in line with the findings by Ross and Näätänen (1996), who pointed to an encoding at the cortical level of the auditory sequence eliciting the illusion according to its physical rather than perceptual properties, and who proposed that the generator of the octave illusion is located beyond the lower levels of the auditory cortex. The areas identified in the present study, i.e., the IC, the left IPL and the MFG, in addition to the HG, would be good candidates as “generators” of the perceptual features experienced during the Deutsch’s illusion by the majority of individuals. These results also confirm and extend the results obtained by the Finnish group (Lamminmäki and Hari 2000; Lamminmäki et al. 2012) in that they demonstrate that the inverse relation between amplitude and time (i.e., strong early activity followed by reduced late activity) is a large-scale feature typical of the dichotic sequence eliciting the illusion, which disappears in similar sequences (NIL and MON conditions). In addition, the present results show that this effect can also be observed in cortical regions higher than HG, such as IC, MFG and IPL, and confirm the possibility that illusory pitch perception depends on a modified binaural interaction between dichotic tones separated by intervals similar to the octave (Brancucci et al. 2009), as demonstrated by the considerable difference in neural activation observed during the NIL condition.

It should be mentioned that the present results should be considered together with two previously published MEG and fMRI studies (Brancucci et al. 2011a, b, 2014). In those studies, however, the focus was on neural correlates of consciousness (“NCC”), and the Deutsch illusion was used to exploit a feature of its stimuli that is very rare in the auditory domain, namely auditory perceptual multistability. In such protocols, the analysis is focused on the different neural responses elicited by two identical physical stimuli which are, however, perceived differently in subsequent presentations, as in the well-known visual multistable images (e.g., Rubin’s vase). In addition, in one case (2014) the original acoustic stimulation eliciting the Deutsch illusion was substantially altered to meet fMRI constraints.

Conclusion

The findings of the present study extend to illusory perception the auditory what and where neural mechanisms previously observed only in ordinary auditory processing. In addition, in accordance with our hypotheses, they suggest a neurophysiological basis for the psychophysics of the Deutsch’s illusion, showing an involvement of the two auditory neural pathways and substantiating the original explanation based on the psychological concepts of sound identification (what) and origin (where, Deutsch and Roll 1976). The present findings also show that the neural mechanisms underlying the illusion mainly involve activity in frontal areas, as predicted. Furthermore, the results suggest that relevant activity underlying the illusion is mainly reflected by larger activity at early and middle latencies and by a dynamic lateralization pattern with a net prevalence of the LH.