Introduction

The neural correlates of consciousness (NCC) are patterns of brain activity that specifically accompany a particular conscious experience (Tononi 2004; Tononi and Koch 2008; Dehaene and Changeux 2011; Edelman et al. 2011). Previous research has investigated them extensively in the visual system, using particularly well-suited paradigms such as binocular rivalry and bistable percepts in combination with neural recordings or neuroimaging (Leopold and Logothetis 1996; Tononi et al. 1998; Blake and Logothetis 2002). The main outcome of these studies has been the consistent identification of frontal, parietal, and inferior temporal areas as a substrate for the emergence of visual awareness (Rees et al. 2002; Rees 2007). Compared to vision, the NCC in the auditory modality have been scarcely investigated. The aim of the present study was to explore consciousness in the auditory domain with an approach that mimics visual multistability (i.e. the same stimulus perceived differently at different times) by using acoustic stimuli that elicit illusory percepts. We started from the dichotic sequence eliciting Deutsch’s octave illusion (Deutsch 1974), which occurs when a sequence of tones alternating in frequency between 400 and 800 Hz is presented dichotically. Each ear receives the same sequence, but the sequences at the two ears are offset by one tone, so that the high- and low-frequency tones are always presented to different ears: when the right ear receives the high tone, the left ear simultaneously receives the low tone, and vice versa. The most common illusory percept, which is independent of the subject’s attentional shifts (Chambers et al. 2005), consists of a single high-pitch tone in one ear alternating with a single low-pitch tone in the other ear (Fig. 1, left). The percepts of the Deutsch illusion can be compared to visual multistability in that one dichotic stimulus can be perceived in four different ways: high left (i.e. a high pitch in the left ear), high right, low left and low right. A notable difference from visual multistability is that in the Deutsch illusion the stimulus perceived in different ways is not presented to the subject continuously but in successive (identical) sequences. Within a single sequence one dichotic stimulus is always perceived in the same manner, whereas across sequences the perception can change. Although the perception is often stable across sequences of the Deutsch illusion, a fraction of subjects shows changes (multistability). Thanks to this effect, the illusion can help to shed light on the NCC.

Fig. 1

The four conditions used in the fMRI sparse sampling experimental design. Acoustic stimuli: A = 400 Hz left ear + 800 Hz right ear, B = 800 Hz left ear + 400 Hz right ear, duration of each stimulus: 500 ms. Response was given by button press associated to the last perceived tone (high left, low left, high right, low right). Time flows from left to right

One of the problems encountered when studying the Deutsch illusion with high spatial resolution techniques is that each dichotic pair does not last long enough for its underlying neural activity to be detected by fMRI, which requires a duration of about a second to detect a reliable activation (Malonek et al. 1997). To overcome this problem, in a previous behavioral study (Brancucci et al. 2011a) we developed a variant of the acoustic sequence eliciting the Deutsch illusion that keeps the illusory percept constant for long enough to be detected with fMRI, a brain imaging technique which offers the highest precision and reliability of the spatial details observed in the working brain. The variant consists of a first sequence of two dichotic stimuli composed of tones spaced an octave apart (400 and 800 Hz) presented repeatedly in alternation, as in the standard version of the illusion described above, immediately followed by a second sequence composed of just one of the two dichotic stimuli presented repeatedly (Fig. 1, left). We demonstrated that with this kind of stimulation the illusory percept of pitch and ear of origin can be kept constant for at least 6 s, a duration long enough to elicit an observable response in the BOLD (blood oxygen level dependent) signal.

It is of primary importance for the present study that the stimulation is made up of two stimuli: a dichotic stimulus composed of a 400 Hz tone presented at the left ear and an 800 Hz tone presented at the right ear (here referred to as stimulus A), and a mirror-image dichotic stimulus composed of an 800 Hz tone presented at the left ear and a 400 Hz tone presented at the right ear (stimulus B). The repeated illusory tone, which has a high or low pitch and is perceived at the left or right ear, is not fixedly associated with one of the two dichotic stimuli: subjects can perceive either dichotic stimulus as having a high or low pitch and as coming from the left or the right ear (multistable percept). Nevertheless, within one auditory sequence the association between stimulus and percept does not change (Brancucci et al. 2011a; Deutsch 1981). Since the percepts do not depend on the sounds actually presented, this procedure makes it possible to ascribe the different brain activations observed in concomitance with each of them solely to the neural processes specifically underlying the percept, i.e. to auditory consciousness.

In a previous study (Brancucci et al. 2011b), using a stimulation paradigm similar to the present one, we investigated the neural mechanisms of auditory consciousness using magnetoencephalography, a technique with high temporal resolution that allowed us to study the time course of neural responses to auditory stimuli. Results showed that the neural activity accompanying auditory perception was timed differently for awareness of pitch (earlier) and of sound origin (later). Here, we aim to improve spatial resolution by using fMRI, which reveals finer spatial details of the biological mechanisms underlying cognition and may allow us to shed light on the topography of the neural networks underlying auditory consciousness. Visual consciousness has been investigated extensively and its neural correlate has been observed in a network comprising frontal, parietal and inferior temporal areas. This could be due to the fact that the latter two areas are higher-order visual areas. Since the parietal lobe is mainly devoted to somatosensory and higher-order visuospatial processing, we predict that it does not play a primary role in auditory consciousness, which may instead be based on a network comprising only frontal and (superior) temporal cortex, in addition to subcortical areas around the thalamus as in all kinds of conscious perception.

Methods

Subjects

Twenty-six subjects (23 female, 3 male) aged between 19 and 25 years (mean age = 22.3) participated in the study. Subjects were recruited on the basis of their perception of the Deutsch’s illusion, as assessed in a preliminary experiment, to represent as much as possible all perception variants of the stimulation during the fMRI recordings (i.e. low vs. high pitch and left vs. right ear for each stimulus). Therefore, both right-handers and left-handers were recruited, since it has been described that the percepts reported during the illusion are related to hand preference (Deutsch 1983).

None of the subjects had auditory impairments as measured by auditory functional assessment (absolute hearing threshold <20 dB) and no differences (±5 dB) of hearing threshold were found between left and right ear (Brancucci et al. 2005). Handedness measurement (Salmaso and Longoni 1985) yielded 57.3 (±2.2) as an average result at group level (scale max score = 100, min score = −100; 0 = no hand preference) with 18 subjects scoring >0 and 8 subjects scoring <0.

Stimulation

All tones were sinusoids having a duration of 500 ms and a frequency of 400 or 800 Hz, synthesized on a personal computer by means of the CSound language for sound synthesis (Vercoe 1992). Amplitude envelope of the tones had a linear attack of 10 ms and a linear decay of 490 ms. Stimulation intensity was 70 dBA.
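The tone parameters above can be illustrated with a minimal sketch. It is not the CSound code used in the study: the sample rate of 44.1 kHz and the function names `tone` and `dichotic` are assumptions made for illustration, and the 70 dBA presentation level depends on playback calibration and is not modeled.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed, the text does not state a sample rate


def tone(freq_hz, duration_s=0.5, attack_s=0.010, sample_rate=SAMPLE_RATE):
    """Sinusoid with a 10 ms linear attack followed by a linear decay."""
    n = int(round(duration_s * sample_rate))
    n_attack = int(round(attack_s * sample_rate))
    samples = []
    for i in range(n):
        if i < n_attack:
            env = i / n_attack  # linear rise from 0 to 1
        else:
            env = 1.0 - (i - n_attack) / (n - n_attack)  # linear fall towards 0
        samples.append(env * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return samples


def dichotic(left_hz, right_hz):
    """Return (left channel, right channel) for one 500 ms dichotic stimulus."""
    return tone(left_hz), tone(right_hz)


stimulus_a = dichotic(400, 800)  # A: 400 Hz left ear, 800 Hz right ear
stimulus_b = dichotic(800, 400)  # B: 800 Hz left ear, 400 Hz right ear
```

With these definitions, stimulus B is simply stimulus A with the two channels swapped, which is what makes the two stimuli acoustically mirror-symmetric.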

Tones were arranged in the auditory sequences which constitute the four stimulation conditions of the study (Fig. 1). Each condition was composed of two sequences, one “alternating” sequence followed by one “repeating” sequence. In the first condition, the first auditory sequence was composed of the 400 and 800 Hz tones constituting the two dichotic stimuli (A: 400 Hz left, 800 Hz right; B: 800 Hz left, 400 Hz right) presented in alternation, so that when the right ear received the 800 Hz tone, the left ear received the 400 Hz tone, and vice versa. The second auditory sequence followed immediately, without inter-stimulus interval, and was composed of just one of the two dichotic stimuli presented repeatedly. The whole stimulation condition (alternating + repeating sequences) lasted 12 s. The three other stimulation conditions, aimed at counterbalancing all possible tone biases of the first condition, differed from the first condition as follows: in the second condition the alternating sequence started with B; in the third condition the repeating sequence was composed of B; and in the fourth condition the alternating sequence started with B and the repeating sequence was composed of B. The durations of the alternating and repeating sequences differ across conditions (7 or 7.5 s and 4.5 or 5 s, respectively; see Fig. 1) because the alternation had to be maintained between the last tone of the first sequence and the first tone of the second sequence. Each stimulation was presented eight times in each of the three fMRI runs (see below).
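The structure of the four conditions can be sketched as follows. This is an illustrative reconstruction, not the authors' script. It assumes that the first condition starts its alternating sequence with A and repeats A (as implied by the description of the other three conditions), and that the alternating part has 14 tones unless one extra tone is needed to preserve the alternation into the repeating part. The function name `build_condition` is hypothetical.

```python
TONE_S = 0.5      # duration of each dichotic stimulus, from the text
TOTAL_TONES = 24  # 12 s of stimulation / 0.5 s per tone


def build_condition(first, repeated, min_alternating=14):
    """Return (alternating, repeating) lists of stimulus labels 'A'/'B'.

    The alternating part starts with `first` and must end on the stimulus
    opposite to `repeated`, so that the alternation continues into the first
    tone of the repeating part. Starting from 14 tones is an assumption that
    reproduces the reported 7/7.5 s and 4.5/5 s durations.
    """
    other = {"A": "B", "B": "A"}
    alternating = [first if i % 2 == 0 else other[first]
                   for i in range(min_alternating)]
    if alternating[-1] == repeated:
        # add one tone so the last alternating tone differs from the repeated one
        alternating.append(other[alternating[-1]])
    repeating = [repeated] * (TOTAL_TONES - len(alternating))
    return alternating, repeating


# (first stimulus of the alternating sequence, repeated stimulus)
CONDITIONS = {1: ("A", "A"), 2: ("B", "A"), 3: ("A", "B"), 4: ("B", "B")}

for number, (first, repeated) in CONDITIONS.items():
    alt, rep = build_condition(first, repeated)
    print(number, "".join(alt), "".join(rep),
          len(alt) * TONE_S, "s +", len(rep) * TONE_S, "s")
```

Under these assumptions, conditions 1 and 4 have a 7 s alternating and a 5 s repeating sequence, whereas conditions 2 and 3 have 7.5 s and 4.5 s, which matches the two pairs of durations reported in the text.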

Task

Each stimulation was followed by 2 s of silence (response time) in which the subject was asked to press one of four buttons describing the subjective properties of the stimulus presented repeatedly (i.e. the last perceived tone): (1) low pitch at the left ear (left hand, middle finger), (2) high pitch at the left ear (left hand, second finger), (3) low pitch at the right ear (right hand, middle finger), (4) high pitch at the right ear (right hand, second finger). The response time (max 2 s) was followed by 2 s of volume acquisition (Fig. 1, middle), by a rest of 14 s and by a further 2 s of volume acquisition (Hall et al. 1999; Fig. 1, right). In this way, the likelihood that the motor response biased the neural activity related to consciousness was minimized, according to previous studies on the temporal relationships between the dynamics of cortical blood flow, oxygenation, and cognitive activity. The task allowed us to know the subjective illusory percept associated with the repeated stimulus, a percept which was maintained for 4.5–5 s and should therefore produce a BOLD response lasting long enough to be detected with fMRI (Malonek et al. 1997). Of relevance here is the fact that the present paradigm should minimize possible confounds between attention and consciousness, a central issue in the search for the NCC (Koch and Tsuchiya 2007, 2012), since the percept during the Deutsch illusion is not contingent upon attentional shifts of the subject (Chambers et al. 2005).

fMRI recordings

fMRI was carried out with a 3T Philips Achieva MRI scanner (Philips Medical Systems, Best, The Netherlands). BOLD contrast functional images were acquired by means of T2*-weighted echo planar imaging free induction decay sequences with the following parameters: repetition time (TR) 16 s, echo time (TE) 35 ms, matrix size 96 × 96, field of view (FoV) 230 mm, in-plane voxel size 2.875 × 2.875 mm, flip angle 90°, slice thickness 3 mm, and no gap. To avoid the interference of the scanner bursts of noise, the “sparse” sampling technique proposed by Hall and colleagues (1999) was used. Following the procedure outlined by these authors, functional images were acquired at the end of the stimulation periods and of the baseline (silent) periods. The interval between two consecutive acquisitions (TR) was set at 16 s to allow activations elicited by scanner noise to return to baseline before the next image acquisition (Fig. 1).
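The timing of one trial in this sparse-sampling design can be checked with a short sketch. The event durations are those given in the Task section and in the paragraph above. The run length is a derived figure that assumes a run contains nothing but the 4 conditions × 8 repetitions, which the text does not state explicitly. The names `EVENTS` and `timeline` are illustrative.

```python
TR_S = 16.0  # interval between consecutive acquisitions, from the text

# (event, duration in seconds) for one trial, in order of occurrence
EVENTS = [
    ("stimulation", 12.0),
    ("response window", 2.0),
    ("volume acquisition", 2.0),  # image following the stimulation period
    ("rest", 14.0),
    ("volume acquisition", 2.0),  # image following the silent baseline
]


def timeline(events):
    """Turn a list of (name, duration) into (name, start, end) tuples."""
    t = 0.0
    schedule = []
    for name, duration in events:
        schedule.append((name, t, t + duration))
        t += duration
    return schedule


schedule = timeline(EVENTS)
acq_onsets = [start for name, start, _ in schedule
              if name == "volume acquisition"]
trial_s = schedule[-1][2]

# Derived under the assumption stated in the lead-in
trials_per_run = 4 * 8
run_s = trials_per_run * trial_s

for name, start, end in schedule:
    print(f"{start:5.1f}-{end:5.1f} s  {name}")
```

The two acquisitions start 16 s apart, equal to the TR, and each trial spans two TRs (32 s).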

BOLD fMRI data were analyzed by means of the BrainVoyager QX software version 2.3 (Brain Innovation, The Netherlands). Preprocessing of functional scans included motion correction and removal of linear trends from voxel time series (Hajnal et al. 1994; Friston et al. 1996). The preprocessed functional volumes of each subject were co-registered with the corresponding structural data set. Since the 2D functional and 3D structural measurements were acquired in the same session, the co-registration transformation was determined using the position parameters of the two images. The alignment between functional and anatomical scans was checked by careful visual inspection. Structural and functional volumes were transformed into Talairach space (Talairach and Tournoux 1998) using a piecewise affine and continuous transformation. Functional volumes were resampled at a voxel size of 3 × 3 × 3 mm.

Data analysis

The experiment was designed with the aim of obtaining a brain activity profile during the different percepts elicited by the same dichotic stimulus (A or B) in the period in which it was presented repeatedly (Fig. 1, left). The percepts could be of four types: low (pitch) left (ear), high left, low right, and high right, and were separated on the basis of the response (button press) indicating the subjective perception. Subsequently, the perception categories along each dimension (i.e. illusory pitch and ear of origin) were associated with the physical stimuli eliciting them (A or B) and four t-test comparisons were performed on the behavioral data: A low vs. high, B low vs. high, A left vs. right, B left vs. right.

Accordingly, for each physical stimulus (A and B) we selected the fMRI trials in which the pitch was perceived as “high” or “low” and those in which the sound was perceived at the left or at the right ear. Within fMRI trials evoked by stimulus A, comparisons were made between those associated with the perception of high vs. low pitch and, in a separate analysis, between those associated with a tone perceived at the left vs. right ear. Similar comparisons were made within trials evoked by stimulus B.

Statistical activation maps were generated by means of paired t tests comparing, voxel by voxel, the images corresponding to the subjective percepts (high vs. low, left vs. right) produced by the same stimulus. The maps were thresholded at p < 0.01 (uncorrected) at the voxel level. The correction for multiple comparisons was performed using a cluster-size thresholding algorithm (Forman et al. 1995; Cox 1996) based on Monte Carlo simulations implemented in the BrainVoyager QX software. The uncorrected thresholded maps were used as input to the Monte Carlo simulations, with a Gaussian kernel of FWHM = 1.9 voxels modeling the spatial correlation among voxels and with 5000 iterations, yielding a minimum cluster size of 27 face-contiguous voxels. Using this method we obtained a map thresholded at a significance level (the probability of a false detection for the entire functional volume) of α < 0.05, corrected for multiple comparisons.
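The final step of the cluster-size correction, discarding supra-threshold clusters smaller than the minimum size, can be illustrated with a short sketch. It does not reproduce the BrainVoyager QX implementation or the Monte Carlo simulation that produced the value of 27. It only shows what "face-contiguous" means operationally: two voxels belong to the same cluster only if they share a face (6-connectivity), not merely an edge or a corner. All function names are hypothetical.

```python
from collections import deque

# The six face-sharing neighbours of a voxel in a 3D grid
FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def face_contiguous_clusters(voxels):
    """Group (x, y, z) supra-threshold voxels into face-contiguous clusters."""
    remaining = set(voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, z = queue.popleft()
            for dx, dy, dz in FACE_NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def apply_cluster_threshold(voxels, min_size=27):
    """Keep only clusters with at least `min_size` voxels."""
    return [c for c in face_contiguous_clusters(voxels) if len(c) >= min_size]


# Toy supra-threshold map: a 3 x 3 x 3 block, a voxel touching the block
# only at a corner, and an isolated pair of voxels.
block = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
corner_only = {(3, 3, 3)}
pair = {(10, 10, 10), (11, 10, 10)}

survivors = apply_cluster_threshold(block | corner_only | pair)
```

In this toy map only the 27-voxel block survives. The voxel at (3, 3, 3) is not merged with it, because it touches the block at a corner rather than across a face.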

Results

Behavioral responses

The percentages of “high” responses (perception of illusory high-pitch tone) were 44.8 % within left and 45.9 % within right ear perceptions elicited by stimulus A (t = 0.3, p = 0.7) and 50.4 % within left and 40.1 % within right ear perceptions elicited by stimulus B (t = 1.2, p = 0.2). “Low” responses are the complement of “high” responses to 100 %.

The percentages of “left” responses (perception of illusory tone in the left ear) were 26.4 % within high pitch and 26.9 % within low pitch perceptions elicited by stimulus A (t = 0.1, p = 0.9) and 71.3 % within high pitch and 63.1 % within low pitch perceptions elicited by stimulus B (t = 1.1, p = 0.3). “Right” responses are the complement of “left” responses to 100 %.

Within the perceptions elicited by stimulus A, a left-ear tone was perceived 10.9 ± 3.1 times whereas a right-ear tone was perceived 32.0 ± 3.1 times (t = 3.4, p < 0.01, the only significant difference found in behavioral data analyses, Bonferroni corrected) and a high-pitch tone was perceived 20.3 ± 2.3 times whereas a low-pitch tone was perceived 22.7 ± 2.5 times (t = 0.5, p > 0.05).

Within the perceptions elicited by stimulus B, a left-ear tone was perceived 29.1 ± 3.8 times whereas a right-ear tone was perceived 16.0 ± 3.8 times (t = 1.8, p > 0.05) and a high-pitch tone was perceived 20.6 ± 2.6 times whereas a low-pitch tone was perceived 24.6 ± 2.7 times (t = 0.8, p > 0.05). Figure 2 depicts graphically the described behavioral responses.

Fig. 2

Behavioral results. In black, the mean number of perceptions associated to stimulus A (400 Hz left ear + 800 Hz right ear), in white the mean number of perceptions associated to stimulus B (800 Hz left ear + 400 Hz right ear). The only statistically significant difference was observed between A right and A left (p = 0.008, Bonferroni corrected)

Fig. 3

a Mean difference maps. The brain regions that showed activation differences when the subjective perception of stimulus A differed in terms of pitch. Red color indicates stronger activation when a high pitch was perceived, blue when a low pitch was perceived. The first map (z = 25) shows activity differences in the left and right SFG, the second map (y = −22) in the left insula and posterior lateral nucleus of the thalamus (p < 0.05 corrected for multiple comparisons). b The same as in a but elicited by stimulus B. Red color indicates stronger activation when a high pitch was perceived, blue when a low pitch was perceived. The first map (z = 24) shows activity differences in the MFG (bilateral) and in the right SFG, the second map (z = 64) in the right MFG (p < 0.05 corrected for multiple comparisons)

Fig. 4

a Mean difference maps. The brain regions that showed activation differences when the subjective perception of stimulus A differed in terms of ear of origin. Red color indicates stronger activation when a tone was perceived in the right ear, blue when a tone was perceived in the left ear. The first map (x = −8) shows activity differences in the left MFG, the second map (z = 12) in the left STG, and the third map (z = −14) in the right PHG (p < 0.05 corrected for multiple comparisons). b The same as in a but elicited by stimulus B. Red color indicates stronger activation when a tone was perceived in the right ear, blue when a tone was perceived in the left ear. The map (z = 0) shows activity difference in the right insula (p < 0.05 corrected for multiple comparisons)

To test for possible effects of hand preference, we divided the sample into three groups based on handedness (8 subjects scoring >70, 10 subjects scoring between 0 and 70, 8 subjects scoring <0) (Figs. 3, 4). No significant effects were observed on the perception of the illusion or on the related neural activations.

fMRI activation differences

All following comparisons were made within perceptions elicited by the same stimulus (stimulus A or stimulus B). Table 1 reports the areas in which, when brain activity was associated with the subjective report, a statistically significant difference was observed for the perception of a high-pitch tone minus a low-pitch tone or for the perception of a right-ear tone minus a left-ear tone (p < 0.01, minimum cluster size = 27 face-contiguous voxels).

Table 1 Brain regions where a statistically significant difference was observed between the perceptions of a high minus a low pitch or between the perceptions of a right ear tone minus a left ear tone

In summary, concerning consciousness of pitch, the maximal differences between the activations related to the perception of a high minus low pitch were found, when elicited by stimulus A: in the right SFG (BA9, with more activation during the perception of a high pitch), in the left SFG (separate activations in BA9, with more activation during the perception of a high pitch, and BA10, with stronger activation during the perception of a low pitch), in the left insula (BA13) and in the left lateral posterior nucleus of the thalamus (in both cases with stronger activation during the perception of a low pitch). When elicited by stimulus B, differences were found in the left and right SFG (BA10), in the right medial frontal gyrus (MFG, BA6 and BA9 in two separate activations) and in the left MFG (BA9). All activations elicited by stimulus B were stronger during the perception of a low compared to high pitch (Fig. 3).

Concerning consciousness of side (ear of origin), the maximal differences between the activations related to the perception of a tone in the right minus left ear were found, for stimulus A, in the left MFG (BA6) and in the left STG (BA41) where a stronger activation was observed during the perception of a tone in the right ear and in the right parahippocampal gyrus (PHG, BA28) where a stronger activation was observed during the perception of a tone in the left ear. When elicited by stimulus B, differences were found in the right insula (BA13), with a stronger activation during the perception of a tone in the left ear (Fig. 4).

Further analyses devoted to ascertain whether the observed neural activation differences related to pitch perception were influenced by perception of side (ear of origin) and vice versa yielded no significant results. Similar non-significant results were found concerning the influence of the stimulus on pitch perception. On the contrary, we found that the stimulus played a significant role in the perception of side (ear of origin). Neural activation differences elicited by stimulus A minus B were of opposite sign (and different size) during the perception of side in the PHG (β values of A right = −0.33 ± 0.11, B right = 0.03 ± 0.26, A left = 0.14 ± 0.12, B left = −0.11 ± 0.08: negative difference during the perception of right, positive difference during the perception of left, p = 0.04) and in the insula (β values of A right = 0.24 ± 0.32, B right = 0.15 ± 0.31, A left = 0.49 ± 0.07, B left = 0.79 ± 0.07: positive difference during the perception of right, negative difference during the perception of left, p = 0.01).
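The sign pattern described above follows directly from the reported mean β values. The short sketch below recomputes the A minus B differences from those means only; the standard errors and the statistical test are not reproduced. The dictionary layout and the function name `a_minus_b` are illustrative choices.

```python
# Mean beta values as reported in the text (stimulus A and B, perceived side)
BETAS = {
    "PHG": {
        "right": {"A": -0.33, "B": 0.03},
        "left": {"A": 0.14, "B": -0.11},
    },
    "insula": {
        "right": {"A": 0.24, "B": 0.15},
        "left": {"A": 0.49, "B": 0.79},
    },
}


def a_minus_b(region):
    """Difference of mean betas (stimulus A minus stimulus B) per perceived side."""
    return {side: round(values["A"] - values["B"], 2)
            for side, values in BETAS[region].items()}


for region in BETAS:
    print(region, a_minus_b(region))
```

In the PHG the difference is negative for right-ear percepts (about −0.36) and positive for left-ear percepts (about 0.25). In the insula the pattern is reversed (about 0.09 and −0.30), in agreement with the description in the text.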

Additional differences, contralateral to the hand/finger motor responses, were found within the motor hand areas of the precentral gyri (BA4).

Control analysis

We performed a further analysis on a sub-group of participants (n = 10, 1 male, mean age = 22.7 years) who perceived the presented stimuli (A and B) in all of the four possible ways described above, to control whether the results found in the main analyses on the whole group could suffer from the fact that they were obtained comparing different subgroups of subjects. Again, no difference of activation was found in the parietal lobe between percepts elicited by the same stimulus, thus confirming the main result of the study. Statistically significant different activations were found in the left MFG (−32, 12, 30; BA6; p < 0.001) comparing brain activations during the perception of high vs. low pitch, and in the right MFG (19, −8, 60; BA6; p < 0.001) comparing brain activations during the perception of right vs. left ear sound origin, both elicited by stimulus B. Furthermore, within percepts elicited by stimulus A, we found different activations in the cingulate cortex (2 foci: 0, −44, 9 and −3, 17, 43; both p < 0.001), and in the left MFG (−53, 3, 38; BA6, p < 0.001) during the perception of right vs. left ear tones.

Discussion

The present study showed that when the same acoustic stimulus is perceived in different ways, it produces hemodynamic brain activity that varies with the different auditory perceptions. This variation, which is thus completely independent of the acoustic stimulation, was observed in a fronto-temporal brain network comprising bilateral SFG, MFG, STG, insulae, lateral posterior nucleus of the thalamus and parahippocampal gyrus. This finding is based on the analysis of BOLD responses elicited with a stimulation condition consisting of a variant of the classical Deutsch illusion (or octave illusion, Deutsch 1974; Brancucci et al. 2009, 2011) that maintains the percept, an illusory segregation of pitch and side (ear of origin), for long enough to permit the detection of a hemodynamic response by fMRI. This stimulation has some advantages over other paradigms based on multistable percepts: a motor response is required just once for a series of percepts rather than for each percept, and the different percepts, otherwise typically free running, are here independent of attentional factors and temporally defined by the auditory stimulation. Each conscious percept can therefore be restricted to a known time window, yielding more homogeneous data (Brancucci and Tommasi 2011).

The perception of a high-pitch or a low-pitch tone, both elicited by the same physical stimulus (A or B), was associated with differences in BOLD signal in neighboring frontal cortical areas (in particular BA9 and BA10 in the SFG, and BA6 and BA9 in the MFG) and in subcortical areas (BA13 in the insula, lateral posterior nucleus of the thalamus). In contrast, the perception of a tone in the right or left ear was associated with differences in BOLD signal in frontal (BA6 in the MFG), temporal (BA41 in the STG) and subcortical areas (BA13 in the insula, BA28 in the PHG). Of note, the sample was skewed towards female subjects, which does not allow any conclusion about possible gender effects; none were observed, nor were any expected, as they have never been reported in the field. Hand preference, which has instead been reported to influence the perception of the Deutsch illusion (Deutsch 1983), did not produce noticeable effects in our observations. This apparently odd result is possibly due to the different method of subject recruitment. Deutsch (1983) recruited subjects on the basis of their handedness and could afford a large sample, whereas we selected our subjects on the basis of their percept, because of the need to represent all possible perception variants in a relatively small sample undergoing brain scanning.

Although it is not clear how neural activity can give rise to subjective experience, the NCC is an experimental construct that researchers seek to identify and characterize in the form of patterns of neural activity. It should be noted that the expression “NCC” is a very cautious one, as it confines neural activity to a simple correlate of conscious experience, like an epiphenomenon. There is now, however, evidence suggesting that the NCC is the substrate of experience or, in other words, the physical event that causes consciousness (Tononi 2004; de Graaf et al. 2012). In the light of the present results, it can be proposed that the neural areas underlying and possibly causing conscious experience are strictly related to each precise conscious experience, as they were different for each kind of conscious experience tested. The results also confirm that no single area is responsible for consciousness (James 1890; Crick and Koch 2005; Rees 2007). Moreover, they give a rather clear picture in which auditory consciousness depends on a network of areas comprising cortical regions located in the frontal and superior temporal lobes, as well as several subcortical regions. As hypothesized, the parietal lobe seems to play a role of limited importance in consciousness when the subjective experience is purely auditory in nature. In contrast, as reported by much evidence in the literature, when the experience is visual in nature the parietal lobe seems to play a prominent role in association with frontal and inferior temporal cortical areas (Rees 2007; Sergent and Dehaene 2004). Thus, in accord with several previous studies, the present results show that the NCC may not be generalizable, but may rather have, at least in part, specific features depending on the sensory system involved. It has been suggested that primary auditory areas are involved in different aspects of auditory consciousness (Bekinschtein et al. 2009; Bidet-Caulet and Bertrand 2009), whereas there is now general agreement that the primary visual areas do not have a specific role in the NCC (Tononi and Koch 2008; Rees et al. 2002). Such a dissimilarity between the visual and the auditory system is not surprising given both the intrinsic differences between the two modalities and the different neural organization of the respective primary and non-primary cortices.

Hence, the prominent role in consciousness assigned to the parietal lobe could represent a bias due to the bulk of studies investigating consciousness in the visual domain, a function resting directly upon parts of the parietal cortex. Concerning auditory perception, the involvement of parietal areas has been associated mostly to spatial aspects of perception rather than to other features (Rauschecker and Tian 2000; Arnott et al. 2004; Woods and Alain 2009). This interpretation is further supported by a study comparing fronto-parietal activity in visual and auditory awareness (Eriksson et al. 2007). It showed that whereas frontal regions were related to both visual and auditory awareness, parietal activity was related only to visual awareness and superior temporal activity was correlated to auditory awareness. According to the authors’ interpretation, these results indicate that frontal regions interact with specific posterior regions to produce awareness in different sensory modalities.

Concerning more specific aspects of the stimulation used in the present research, an interesting observation emerges from the analysis of the activations associated with the same percepts when elicited by different stimuli (A and B), another analysis made possible by the present stimulation paradigm. This comparison is in a sense the opposite of that exploited in the main analysis, and generally in the search for the NCC, but it can tell us something about the relationships between stimulus (environment), neural activation, and consciousness. A vast debate is in progress about the substrates of conscious experience, and some researchers argue that consciousness has to be sought also in the environment around the experiencing organism, as if the two stood in an indissoluble relationship (e.g. Di Francesco 2008). What answer can be given to this question in the light of the present observations? Although this issue requires further research, the fMRI activations observed in the present study showed that one specific percept is not necessarily related to a single specific neural activation pattern (activation maps related to, e.g., the experience of high pitch were different when elicited by stimulus A or by stimulus B). In addition, they point to a key role of the stimulus, which can drive the outcome of consciousness in a way not directly related to neural activation, and in different ways in different brain areas. In our opinion this reflects the ability of the brain to produce the same percept in multiple ways, and suggests that the external world can play a role in conscious processes if needed. The external world is, however, not necessary in principle, as demonstrated by other manifestations of consciousness such as dreams. This is further substantiated by the observation that each external stimulation can be simulated by appropriate peripheral nerve stimulation, and it is in line with the notion of degeneracy, a feature of several biological systems at both the structural and cognitive level (Tononi et al. 1999; Edelman and Gally 2001; Price and Friston 2002), which refers to the ability of structurally different elements to perform the same function. In the present case, degeneracy would be a feature of the relation between neural tissue/activity and consciousness, a notion already pointed out recently (Balduzzi and Tononi 2009).

In conclusion, it can be proposed that auditory consciousness rests upon neural mechanisms hosted in the superior and medial frontal gyri, two areas whose prominent role in consciousness has been repeatedly observed in different contexts (Lumer et al. 1998; Beauregard et al. 2001; Tong et al. 2006; Brancucci et al. 2011b), in superior temporal areas, possibly acting as neural auditory hubs (Brancucci et al. 2011b; Plourde et al. 2006), and in insular (Craig 2009) and thalamic (Llinás et al. 1998; Liu et al. 2013) regions, in accordance with most research postulating a thalamocortical basis for the substrate of consciousness. Finally, the lack of evidence pointing to a role of the parietal cortex in auditory consciousness suggests that this region, often associated with consciousness in general (Taylor 2001; Pins and Ffytche 2003), may instead be implicated when consciousness is of a visual nature. In this view, it can be argued that the SFG/MFG activity is an NCC not linked to a specific perceptual modality, and in forthcoming research it will be crucial to understand which specific features characterize this brain area in terms of microanatomy, functional connectivity and information integration processing, to further advance our knowledge in the field.