Introduction

In everyday life, it is rare to find an environment where there is only one sound at a time. The daily auditory scene of humans is made of a wide variety of sounds with various spectral and temporal components. These competing sounds form a complex acoustic signal. The auditory system must interpret this signal according to the listening context, either by dividing the signal into its distinct components (segregation) or by grouping components together (integration), to form a heterogeneous or homogeneous auditory image. These two mechanisms (segregation and integration) enable mental representations of sounds as distinct auditory objects. This ability is called auditory scene analysis (Bregman, 1994). Auditory illusions are valuable tools for understanding perception, as they allow us to investigate the limits of auditory perception. An auditory illusion occurs when the perception of a sound differs from the sound actually presented to the extent that it creates a false perception (e.g., hearing a sound that does not exist) or an impossible perception (e.g., hearing the pitch increase forever [Shepard, 1964]) through some top-down or bottom-up perceptual processes. These illusions show that the auditory system does not merely transmit sound information faithfully as it enters our ears, but also alters and rearranges this information in various ways. Some auditory illusions can help us to learn more about how we integrate and segregate sounds. However, some of them remain under-investigated to this day.

A well-known illusion used to investigate auditory scene analysis is the octave illusion (Deutsch, 1974a, 1974b). This illusion is perceived when two sounds separated by an octave are repeatedly and alternately played (high–low–high–low) through stereo headphones (see Fig. 1). The same sequence is played to both ears simultaneously. However, when the right ear receives the high sound, the left ear receives the low sound, and vice versa. Instead of perceiving two alternating tones in each ear, most participants report a single tone that alternates between the ears while its pitch simultaneously shifts back and forth between high and low (see Fig. 1). This sensory phenomenon highlights two essential abilities for auditory scene analysis: sound localization and binaural fusion/segregation.

Fig. 1

A The classical pattern of sounds used to elicit the octave illusion is a combination of a 400 and 800 Hz pure tone alternating every 250 ms. When one ear is receiving the high tone, the other is receiving the low tone (Deutsch, 1974a, 1974b). B-C The what and where or suppression model as suggested by Deutsch (2004a). D Most common percept reported by right-handers (Deutsch, 2004a): a high tone on the right alternating with a low tone on the left. (Dashed lines represent the continuation of the signal or the percept.)

The model proposed by Deutsch to explain this phenomenon divides the perceptual process giving rise to the octave illusion into two channels: the what and the where pathways of the auditory system (Deutsch, 1975, 1978, 1980a, b, 1981, 1988, 2004a; Deutsch & Roll, 1976). The what refers to the pitch perceived by the individual during the octave illusion. The where refers to the perceived localization of the pitch percept to the right or left ear (Deutsch, 1975, 1981, 1983). The model also posits different inhibitory mechanisms for the two channels, which lead to channel-specific suppression of certain percepts. The model is, therefore, called either the two-channel or the suppression model.

Based on the most common percept reported (Fig. 1, D), the perceived pitch alternates between a high tone and a low tone (Fig. 1, B) based on which tone the dominant ear receives at the time (Fig. 1, A). This observation led to the hypothesis that the what channel contains inhibitory connections from the neural units responsible for pitch perception in the dominant ear to the neural units responsible for pitch perception in the non-dominant ear (Deutsch, 1981). These hypothesized inhibitory connections would suppress the non-dominant ear’s pitch perception when there is conflicting pitch information received in the dominant ear. This lateral dominance hypothesis is supported by studies that have looked at the impact of handedness on the perception of the illusion. For example, Deutsch (1974b) found that most right-handed subjects localized the high tone (800 Hz) of the octave percept on the right ear, while left-handed subjects did not preferentially localize the 400 or 800 Hz tone on either ear.

The localization of the pitch percept, however, is not based on the dominant ear. Instead, the localization alternates between the right and left ears based on which ear is physically receiving the high tone at the time. For instance, the sound would be localized to the right from 0 to 250 ms and from 500 to 750 ms (Fig. 1, C) since it is the right ear which receives the high tone at these time steps (Fig. 1, A). Conversely, since the left ear receives the high tone from 250 to 500 ms and from 750 to 1000 ms, the sound would be localized to the left during these time steps. This observation led to the hypothesis that the where channel, which is implicated in sound localization, contains inhibitory connections from the neural units sensitive to higher tones to the neural units sensitive to lower tones (Deutsch, 1981).
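As an illustration only, the two rules described above (pitch follows the dominant ear, location follows the ear receiving the higher tone) can be written as a short Python sketch. The function name `predicted_percept` and the `dominant` parameter are hypothetical and are not part of Deutsch's formulation; the sketch encodes the verbal description of the model's predictions, not its neural mechanisms.

```python
# Illustrative sketch of the two-channel (what/where) predictions.
# Names are hypothetical; this is not an implementation from the literature.
HIGH, LOW = 800, 400  # Hz, the classical tone pair


def predicted_percept(right_ear, left_ear, dominant="right"):
    """Return one (pitch_hz, side) tuple per 250 ms step of a dichotic sequence."""
    percept = []
    for r, l in zip(right_ear, left_ear):
        # "What" channel: the perceived pitch follows the tone at the dominant ear.
        pitch = r if dominant == "right" else l
        # "Where" channel: the location follows the ear receiving the higher tone.
        side = "right" if r > l else "left"
        percept.append((pitch, side))
    return percept


# Stimulus of Fig. 1, A: the right ear starts on the high tone.
right = [HIGH, LOW, HIGH, LOW]
left = [LOW, HIGH, LOW, HIGH]

for step, (pitch, side) in enumerate(predicted_percept(right, left)):
    print(f"{step * 250}-{(step + 1) * 250} ms: {pitch} Hz heard on the {side}")
```

With a right-dominant listener the sketch yields a high tone on the right alternating with a low tone on the left, which corresponds to the most common percept in Fig. 1, D. Setting `dominant="left"` yields the mirrored pattern.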

Deutsch (1981) suggested that the octave illusion arises from neural units that integrate information from both what and where channels. This proposition is supported by the discovery of neural units that are sensitive to both frequency and spatial localization at the same time (e.g., neural units that possess both a characteristic frequency and a characteristic interaural time or intensity difference). Such neural units have notably been discovered in the superior olivary complex, the inferior colliculi and the auditory cortex (Deutsch, 1981). The two-channel or suppression model has been a subject of debate and attempts to explain the fundamental perceptual mechanisms that give rise to the octave illusion (Chambers et al. 2002, 2004a, 2004b; Deutsch, 1975, 2004b).

Since the original study in 1974, many acoustic features have been investigated to assess their impact on the percept. McClurkin and Hall (1981) pointed out that it is possible to elicit the illusion even when one of the two stimuli is a harmonic complex instead of a pure tone and that the use of a harmonic complex does not appear to have an impact on the most common percept reported by the participants. They also investigated the impact of the timbre. Their results suggest that timbre is not a parameter with a significant effect on the most common percept reported either. These findings helped to clarify the two-channel model by suggesting that the object of the what component of the illusion appears to be based on pitch perception rather than on frequency-based harmonic analysis (Bregman & Steiger, 1980) or dichotic fusion (Chambers et al. 2002).

Deutsch (1974b, 1988) tested the octave illusion with different tone combinations (200–400 Hz, 400–800 Hz, 600–1200 Hz, 800–1600 Hz) and found no impact of tone combination on the percept distribution. This suggested that the model can be applied to octave intervals constructed from pitches within the range of 200–1600 Hz. Deutsch (1988) also explored variations in intensity and found that increasing the amplitude of the low-frequency signal overcomes the lateralization effect. This finding is consistent with the suggestion by Deutsch (1981) that the where pathway of the two-channel model involves inhibitory connections from neural units sensitive to high frequencies to neural units sensitive to lower frequencies. This hypothesis indeed predicts that increasing the amplitude of the lower tone would offset the inhibitory effect of the higher tone, and would in turn cancel the lateralization effect. This highlights the importance of presenting the two frequencies at equivalent intensity in order to elicit the common percept of the illusion. Brancucci et al. (2009) found that the illusion could be elicited by several frequency ratios other than the octave interval (tones separated by a major seventh, a minor ninth, a major ninth, and a minor tenth) without impacting the distribution of percepts. This result is consistent with the finding of McClurkin and Hall (1981) that the what pathway of the two-channel model involves pitch perception rather than the harmonic relation between the frequencies of the two tones. They also explored the impact of duration, suggesting that the illusion is stronger with tones lasting 500 ms than with tones lasting 200 ms. Finally, Brännström and Nilsson (2011) showed that it is possible to replace the pure tones with narrowband noises and still elicit the illusion. These kinds of narrowband noise contain no harmonic information, but still elicit pitch perception (Fastl & Zwicker, 2007; Horbach et al. 2018). Thus, this constitutes further evidence that the octave illusion involves pitch perception.

Most studies have found that the octave illusion is rather resilient to modification of acoustic features. Notably, the what pathway of the model seems to rely heavily on pitch perception and appears so far insensitive to the harmonic relation between the tones (Brancucci et al. 2009) and to the harmonic content (McClurkin & Hall, 1981) or lack of harmonic content (Brännström & Nilsson, 2011) of the tones. The most noticeable acoustic parameter that affects the perception of the illusion appears to be the difference in amplitude between the low and high tones, which impacts the localization percept in the where pathway (Deutsch, 1981). The hypothesis that pitch perception constitutes the object of the what component of the octave illusion is well supported; thus we would expect that a degradation of pitch perception would alter the perception of the illusion. However, no study to date has directly tested this hypothesis. For instance, while the experiments of Deutsch (1974b, 1988) showed that the octave illusion is insensitive to changes in pitch, they did not investigate whether the response pattern changes when the illusion is evoked with stimuli that produce reduced pitch perception. We suggest that this could be achieved using stimuli spanning the whole musical scale, particularly towards its upper and lower boundaries where pitch perception starts to disappear.

An 88-key grand piano generally spans from 27.5 Hz (A0) to 4186 Hz (C8). It is, therefore, not surprising that our sense of musical pitch and our ability to make octave correspondences largely disappear above 5000 Hz (Attneave, 1971). In addition, a major and minor triad recognition task in low and high frequency regions performed by Biasutti (1997) suggests that amateur musicians failed to recognize minor and major triads when these were placed below around 120 Hz and above around 3000 Hz. Together, these findings raise the importance of exploring the perception of the octave illusion across a similar range. The objective of our study was, therefore, to investigate the impact on the distribution of percepts elicited by pairs of frequencies ranging from 40 to 4000 Hz while maintaining the other classic parameters of the octave illusion.
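The piano range quoted above can be checked with the standard twelve-tone equal temperament formula, f(n) = 440 × 2^((n − 49)/12), where n is the key number and key 49 is A4. The short Python sketch below assumes A4 = 440 Hz tuning, which the text does not state; the function name is hypothetical.

```python
def piano_key_frequency(key_number, a4_hz=440.0):
    """Frequency in Hz of piano key 1..88 in equal temperament (key 49 = A4)."""
    return a4_hz * 2 ** ((key_number - 49) / 12)


lowest = piano_key_frequency(1)    # A0, the lowest key
highest = piano_key_frequency(88)  # C8, the highest key
print(f"A0 = {lowest:.1f} Hz, C8 = {highest:.0f} Hz")
```

Under this assumption, the 40–4000 Hz range targeted by the study sits inside the piano range at both ends.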

Methods

Participants

Thirty adults (14 males) with self-reported normal hearing took part in the study. Participants were 26 ± 3 years old, had less than 5 years of musical experience on average, and were all self-reported right-handers.

Rationale for sample size

Power analysis was conducted using R version 4.2.1 (R Core Team, 2022) and the pwr package (Champely, 2020). A previous study on the octave illusion by Brännström and Nilsson (2011) reported strong to very strong effect sizes with a similar study design where the frequency distribution of two percepts (octave illusion, other illusory percept) reported by N = 16 participants was compared across different test conditions. Accordingly, the expected effect size for the present study’s chi-square tests was estimated between 0.60 and 0.80, which constitutes a strong association according to the Rea and Parker (1992) criteria. With a significance criterion of α = 0.05, a power of 0.80, and an effect size of w = 0.6, the minimum sample size needed was calculated as N = 27. Thus, the sample size of N = 30 was deemed adequate for the study hypothesis.
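For readers without access to R, the calculation above can be approximated with the Python standard library. The sketch below is not the authors' pwr script. It assumes the test has 2 degrees of freedom (two conditions compared over three percept categories), a value the text does not report, and it relies on the fact that a noncentral chi-square distribution is a Poisson-weighted mixture of central chi-square distributions. All function names are hypothetical.

```python
import math


def central_chi2_sf_even_df(x, df):
    """P(X > x) for a central chi-square with even df (closed-form series)."""
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j  # builds (x/2)^j / j! iteratively
        total += term
    return math.exp(-half) * total


def chi2_test_power(w, n, alpha=0.05, terms=200):
    """Power of a chi-square test with df = 2 (ASSUMED) for effect size w."""
    crit = -2.0 * math.log(alpha)  # exact critical value when df = 2
    mu = n * w ** 2 / 2.0          # half of the noncentrality parameter n * w^2
    weight, power = math.exp(-mu), 0.0
    for i in range(terms):
        if i > 0:
            weight *= mu / i       # Poisson(i; mu) weight
        power += weight * central_chi2_sf_even_df(crit, 2 + 2 * i)
    return power


def minimum_sample_size(w, target_power=0.80, alpha=0.05):
    """Smallest n whose power reaches the target."""
    n = 2
    while chi2_test_power(w, n, alpha) < target_power:
        n += 1
    return n


print(minimum_sample_size(0.6))
```

Under the df = 2 assumption, this search stops at N = 27, the value reported above.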

Protocol

The illusion was presented to the participants through insert earphones (ER3C, Etymotic) at a comfortable sound level (65–70 dB SPL). The stimuli were generated with the open-source software Audacity 3.0.0. The pure tones (generated sine waves) all had a duration of 250 ms (as in the original experiment; Deutsch, 1974b) and were presented alternately 120 times in each sequence for a total duration of 30 s. As in Deutsch’s original study, stimuli were controlled to avoid drops in amplitude and to preserve phase continuity during the frequency transitions. Seven frequency pairs were presented to cover virtually the entire musical spectrum: 40–80 Hz, 100–200 Hz, 200–400 Hz, 400–800 Hz, 800–1600 Hz, 1600–3200 Hz, and 2000–4000 Hz. Each participant listened to these sequences in a randomized order and, for each sequence, had to choose one of four answers: (1) a high-pitched sound on the right alternating with a low-pitched sound on the left, (2) a high-pitched sound on the left alternating with a low-pitched sound on the right, (3) a sound that passes from one ear to the other without a change in pitch, and (4) none of these answers (participants who gave this answer had to describe what they perceived).
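The stimuli in this study were built in Audacity. Purely as an illustration of the timing and phase-continuity constraints described above, a comparable dichotic sequence could be synthesized as follows. The 44.1 kHz default sample rate, the choice of starting with the high tone in the right ear (as in Fig. 1, A), and the function name are assumptions of this sketch.

```python
import math


def octave_illusion_samples(low_hz, high_hz, tone_s=0.25, n_tones=120,
                            sample_rate=44100):
    """Return (left, right) sample lists for a dichotic alternating sequence."""
    per_tone = int(round(tone_s * sample_rate))
    left, right = [], []
    phase_l = phase_r = 0.0
    for tone in range(n_tones):
        # The right ear starts on the high tone; the left ear gets the other tone.
        if tone % 2 == 0:
            f_right, f_left = high_hz, low_hz
        else:
            f_right, f_left = low_hz, high_hz
        for _ in range(per_tone):
            left.append(math.sin(phase_l))
            right.append(math.sin(phase_r))
            # Accumulating the phase (rather than restarting each tone at zero)
            # keeps the waveform continuous across frequency transitions.
            phase_l += 2 * math.pi * f_left / sample_rate
            phase_r += 2 * math.pi * f_right / sample_rate
    return left, right


# Short demonstration: four tones at a reduced sample rate to keep it fast.
left, right = octave_illusion_samples(400, 800, n_tones=4, sample_rate=8000)
print(len(left) / 8000, "s per channel")
```

With the default arguments the sequence contains 120 tones of 250 ms, that is, 30 s per channel. Because the amplitude is constant and the phase is accumulated, there is no amplitude drop or click at the transitions.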

Ethical considerations

The Research Committee for sectorial research in neuroscience and mental health of the CIUSSS—Capitale Nationale approved all procedures and each participant provided written informed consent. All experiments were performed in accordance with relevant guidelines and regulations.

Analysis

First, the responses to choices 1 and 2 of the questionnaire were combined, as both represent the same category of percept, namely the octave percept (Oehler & Reuter, 2013). These two response choices are analyzed separately in studies focusing on handedness, but since only right-handed participants were recruited for the present study, it was not appropriate to separate them. In summary, we used three categories of percept to carry out the analyses: octave (answers 1 and 2), simple (answer 3) and complex (answer 4; Oehler & Reuter, 2013). Multiple chi-square tests with Bonferroni correction were carried out to compare the relative frequency distributions of responses (i.e., octave, simple or complex) between the seven frequency conditions (i.e., 40–80 Hz, 100–200 Hz, 200–400 Hz, 400–800 Hz, 800–1600 Hz, 1600–3200 Hz, 2000–4000 Hz).
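The analysis steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis script: it assumes that all pairwise comparisons between the seven conditions are corrected for (21 tests), the function names are hypothetical, and the response counts in the example are invented placeholders rather than the study's data.

```python
import math
from itertools import combinations

# Answers 1 and 2 are pooled into the octave category.
CATEGORY = {1: "octave", 2: "octave", 3: "simple", 4: "complex"}
CONDITIONS = ["40-80", "100-200", "200-400", "400-800",
              "800-1600", "1600-3200", "2000-4000"]


def tally(answers):
    """Count responses per percept category for one condition."""
    counts = {"octave": 0, "simple": 0, "complex": 0}
    for answer in answers:
        counts[CATEGORY[answer]] += 1
    return counts


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square for a 2 x k table; returns (statistic, df)."""
    cols = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]  # drop empty columns
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    grand = total_a + total_b
    stat = 0.0
    for a, b in cols:
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * (a + b) / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1


def chi2_p_value(stat, df):
    """Exact upper-tail p-value for df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 is handled in this sketch")


n_comparisons = len(list(combinations(CONDITIONS, 2)))  # 7 choose 2
alpha_adjusted = 0.05 / n_comparisons                   # Bonferroni threshold

# HYPOTHETICAL counts (octave, simple, complex) for two conditions, N = 30 each.
stat, df = chi_square_2xk([5, 15, 10], [25, 3, 2])
p = chi2_p_value(stat, df)
print(f"chi2({df}) = {stat:.2f}, p = {p:.2g}, significant: {p < alpha_adjusted}")
```

With seven conditions there are 21 pairwise comparisons, so under this assumption a comparison would be declared significant only if p < 0.05 / 21 ≈ 0.0024.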

Results

Chi-square tests were carried out to compare the distribution of responses between every pair of frequency conditions. The results of all these comparisons are available in Table 1.

Table 1 Statistical results for multiple Chi-squares used to compare the distribution of percept between each frequency pair

Statistical analysis suggests that the 40–80 Hz condition differs significantly from all other frequency pairs (100–200, 200–400, 400–800, 800–1600, 1600–3200, 2000–4000 Hz; see Table 1 for relevant statistical values). Figure 2 illustrates the number of responses for each category of percept for all frequency pairs. Looking at this graph, we see that 40–80 Hz differs from all other frequency pairs, since it is the only condition where the most common percept is simple (60%) followed by complex (30%), with a very low percentage of participants reporting the octave percept (10%). Furthermore, the 400–800 Hz condition differs significantly from the 40–80 Hz, 100–200 Hz, 1600–3200 Hz, and 2000–4000 Hz conditions, but does not differ, after Bonferroni correction, from the 200–400 Hz and 800–1600 Hz conditions. The 400–800 Hz condition is the one in which the octave percept is most frequent (93%), followed by the 800–1600 Hz condition (83%) and the 200–400 Hz condition (67%). Finally, the 100–200 Hz, 1600–3200 Hz and 2000–4000 Hz conditions do not differ from each other and are conditions in which the distribution of results is more divided, with the octave percept being reported by around half of the participants and the non-octave percepts (simple or complex) being reported by the other half.

Fig. 2

Number of responses for each of the three types of percept across all frequency pairs. (** Statistically different distributions between two conditions, after Bonferroni correction; *** Statistically different distribution from every other condition, after Bonferroni correction)

Discussion

The main goal of this study was to investigate how the distribution of percepts elicited by the octave illusion is influenced by using different combinations of tones covering the whole musical scale. More precisely, we elicited the illusion with pairs of frequencies ranging from 40 to 4000 Hz while maintaining the other classic parameters of the octave illusion. For the frequency pairs near the center of the musical scale that had already been tested in previous studies (200–400, 400–800, 800–1600 Hz), we found distributions of percepts similar to those reported in past studies. For the 400–800 Hz condition, which is the frequency pair most often used to elicit the illusion, the most common percept, reported by 93% of participants, was the classical octave illusion. This result is congruent with the experiment by Deutsch (1988), where 90.3% of right-handers reported hearing the octave percept when the same 400–800 Hz frequency pair was used. In the 200–400 Hz and 800–1600 Hz conditions, the most frequently reported percept was also the octave one. Deutsch (1974b) obtained similar results with these same two frequency pairs when she tested the illusion in 12 right-handers. However, we also tested the illusion with other pairs of frequencies that had not yet been tested in prior studies: two lower frequency pairs (40–80 Hz, 100–200 Hz) and two higher frequency pairs (1600–3200 Hz, 2000–4000 Hz). The perception of pitch is known to decline near these upper bounds (Moore, 1977) and lower bounds (Biasutti, 1997) of the musical scale. This is relevant since the two-channel or suppression model (Deutsch & Roll, 1976; Deutsch, 1975, 1978, 1981, 1988, 2004a) proposes that the what component of the octave illusion is based on the perception of pitch (Brancucci et al., 2009; Brännström & Nilsson, 2011; McClurkin & Hall, 1981). Thus, we hypothesized that eliciting the illusion with these higher or lower frequency pairs should alter the distribution of the percepts reported by the participants. This is indeed what our results suggest. Specifically, the proportion of participants that reported the octave illusion percept was lower in these conditions, which seems to support the prediction of the model that reduced pitch perception may impair, in whole or in part, the perception of the illusion.

The 40–80 Hz condition differs significantly from all the other pairs of frequencies tested. In this low-tone condition, the most frequently reported percepts were the simple percept (60%) and the complex percept (30%). This condition is different from all other frequency pairs tested, since it is the only one where the most common percept is not the octave percept. Our hypothesis is that this unique distribution occurs because these very low frequencies are found at the limits of the musical spectrum where a degradation of pitch perception occurs, thus significantly affecting the way the illusion tends to be perceived. This degradation of pitch perception at lower frequencies is often reported in the literature. For instance, Houtsma and Goldstein (1972) reported a drop in recognition of melodic intervals below 150 Hz and Biasutti (1997) suggested that non-musicians failed to recognize minor and major triads when below 120 Hz. Based on the what and where model suggested by Deutsch (2004a), if the what (i.e., the pitch we hear) is altered, such as in the low-tone condition (40–80 Hz), the where (i.e., the tone localization) should still be unaffected. This is what our results show, since, in the 40–80 Hz condition, most participants (60%) reported hearing the simple percept, meaning they heard a sound without a recognizable pitch alternating between their two ears. A simple percept response suggests a difficulty in perceiving pitch that goes beyond the simple incomplete pitch perception found in the classical octave percept. Furthermore, in the 40–80 Hz condition, 30% of participants reported perceiving a complex percept. For example, some participants described their complex percept as the impression of hearing a continuous noise in both ears (like a helicopter sound). Thus, for these participants hearing a complex percept, not only was the notion of pitch lost, but the localization was also inaccurate. Taken together, the most and the second most reported percepts for the 40–80 Hz condition suggest a greater difficulty in analyzing the auditory stimuli compared to the 400–800 Hz condition. This interpretation is consistent with studies that investigated low pitch perception in a musical context (Biasutti, 1997; Houtsma & Goldstein, 1972).

The 100–200 Hz condition also lies in the low-pitch region of the musical scale and differs significantly from the 400–800 Hz condition. This frequency pair is closer to the center of the scale than the 40–80 Hz pair; thus, it is not surprising to observe divided responses for this condition. Half of the participants perceived the octave percept, as in the classical octave illusion, while the other half perceived a degraded percept (simple or complex).

Furthermore, the 400–800 Hz condition differs from the 1600–3200 Hz and 2000–4000 Hz conditions. In these two conditions, the results are also divided, as in the 100–200 Hz condition. The proportion of combined responses for the simple and complex percepts is 50% for 1600–3200 Hz and 54% for 2000–4000 Hz. This degradation of the illusion perception in these conditions suggests that pitch perception is indeed diminishing at higher frequencies. Past studies about musical perception in the higher frequency range suggested that the sensation of musical pitch and the ability to match octave intervals largely disappear above 5000 Hz (Moore, 1977) and that pitch discrimination becomes very difficult at very high frequencies (Henning, 1966; Shower & Biddulph, 1931; Wever & Wedell, 1941). Here, our experiment did not go above 5000 Hz, but there is still a significant difference in the distribution of percepts between the two higher frequency conditions and the classical octave illusion pair of frequencies. We could, therefore, expect a greater proportion of simple and complex percepts when testing even higher frequencies. Further research is needed to confirm what happens to the distribution of percepts beyond this 5000 Hz threshold.

Considering that we suggest for the first time that lower and higher frequency pairs (compared to the classical octave illusion frequency pair) have an impact on the distributions of the percepts reported when eliciting the octave illusion, it would be interesting to explore the impact of musical training on these distributions. It has been shown that professional musicians are better at discriminating the pitch of pure and harmonic tones (for a review, see Besson et al., 2007) and that regular practice of a musical instrument leads to cortical reorganization in the areas associated with the modalities solicited (e.g., Herholz & Zatorre, 2012). Considering that perception of the octave illusion appears to be associated with pitch perception, we could hypothesize that professional musicians may present stronger pitch perception and thus may perceive the octave illusion more consistently than non-musicians in the lower and higher frequencies. To date, very few studies have investigated the octave illusion specifically among musicians. Brennan and Stevens (2002) found that, even among musicians with the most extensive musical training, only a small minority reported hearing the veridical perception of the octave stimulus (i.e., a perception matching the true physical stimulus instead of an illusory percept). The authors also noted that pipe organ players in particular were more likely to report a veridical perception than other equally trained musicians. The pipe organ is an instrument where pitch perception is of fundamental importance since organists must consciously add individual harmonics to attain the desired timbre (Brennan & Stevens, 2002). With these results in mind, we think that a protocol with frequency combinations covering a broader spectrum would better show the differences between musicians and non-musicians.

Finally, exploring the neurophysiological correlates of the octave illusion may grant insights into the central mechanisms involved in pitch processing and sound localization. Deutsch’s model puts forward the idea that the octave illusion comes from neural units sensitive to frequency and spatial localization (Deutsch, 1981), but only a handful of studies have attempted to directly investigate the neural mechanisms of the octave illusion using neurophysiological methods. One of these studies comes from Ross and Näätänen (1996), who used electroencephalography and mismatch negativity (MMN), an event-related potential originating at the cortical level (e.g., in the auditory cortex) when an occasional deviant stimulus is presented amidst a sequence of repeated standard stimuli. A stimulus mimicking the most common percept was used as the deviant and the classical octave illusion stimuli as the standard. The deviant tones elicited the MMN, which suggests that the stimuli giving rise to the octave illusion are encoded according to their physical rather than perceptual properties. Based on their results, the authors suggested that the generator of the octave illusion is located beyond the first level of the auditory cortex. Then, Lamminmäki and Hari (2000) used whole-scalp magnetoencephalography (MEG) to investigate the N100m responses to binaural and dichotic 400 and 800 Hz tones. The N100m is a large negative potential elicited by an unpredictable stimulus in the absence of a task. Their results suggest that the N100m was stronger in both hemispheres when tones were different between ears (400L/800R and 800L/400R) rather than the same (400L/400R and 800L/800R). In addition, the N100m was stronger in the right hemisphere when the 800 Hz tone was presented to the left ear and the 400 Hz tone to the right. The authors suggest that the perceived locations of the sounds would follow the N100m lateralization, and that streaming by ear contributes to the percept. They also explored more recently (Lamminmäki & Hari, 2000), still employing MEG, the link between the perceived pitch and brain activity. Their results showed that illusory pitch perception depends on a modified binaural interaction between dichotic tones separated by one octave. Brancucci et al. (2018) investigated the neural mechanism underlying the what and where model using MEG. Their subjects listened to the classic acoustic sequence used to elicit the octave illusion and to a second, similar sequence that does not elicit the illusion. They also listened to a monaural sequence mimicking the illusory percept. Their results showed that Heschl’s gyrus, the insular cortex, the inferior frontal gyrus, and the medial frontal gyrus are involved bilaterally in the illusion, as is the left inferior parietal lobe. These structures are known to belong to the two main auditory streams known as the what and where pathways. Furthermore, the sound sequence eliciting the illusion was associated with larger activity at early and middle latencies and with a lateralization pattern in favor of the left hemisphere. A recent fMRI study by Tanaka et al. (2018) compared participants who reported perceiving the octave percept with participants who perceived another percept. The activation in the bilateral planum polare in the auditory cortex was significantly larger in the group who reported the octave percept. In the group who did not report the octave percept, there was a greater activation in the right premotor cortex. The authors thus suggest that not perceiving the classical octave illusion may be linked with a greater sensation of musical meter. Such a study requires the comparison of two groups that possess different perceptions at baseline. However, our protocol might provide a new way to correlate different percepts with neurophysiological patterns of activation within the same subjects, in a way that is less sensitive to between-participant differences.

Conclusion

The objective of our study was to investigate the impact on the distribution of percepts when eliciting the octave illusion with pairs of frequencies ranging from 40 to 4000 Hz while maintaining the other classic parameters of the octave illusion. The results obtained for the frequency pairs used in prior study protocols are consistent with past results. However, when using pairs of stimuli towards the upper and lower extremes of our selected range (40–80 Hz to 2000–4000 Hz), where pitch perception is known to be degraded, our results showed that the distribution of percepts differed significantly, especially in the case of the 40–80 Hz pair. The findings of this research support the what and where model, in which pitch perception is hypothesized to play a significant role in the perception of the octave illusion.