Introduction

In everyday life, we frequently witness that humans do not perceive everything that happens in their field of vision. People overlook a waving person in a crowd or fail to notice changes after cuts in a movie (unnoticed “continuity errors” in film terminology, change blindness in psychological terms). A core effect underlying such phenomena is inattentional blindness. First demonstrated by Neisser and Becklen (1975; Neisser, 1979; neither yet using the term), it means that people engaged in a simple attention task, such as counting the occurrences of a certain event in a visual scene, tend to overlook rather obvious unexpected events of long duration, even in the center of their visual field. The most impressive demonstration of the inattentional blindness effect comes from Simons and Chabris (1999), who showed that people concentrating on counting passes in a casual basketball game do not notice a gorilla (i.e., a person in a gorilla costume) walking through the scene. This effect, replicated in many studies, highlights our ability to blank out even highly startling information when we are focusing on a specific task.

We show for the first time that an analogous phenomenon exists in music. In two studies, we investigated whether participants notice an “acoustic gorilla”, namely an e-guitar solo blended into Richard Strauss’ Thus Spake Zarathustra, when they are focusing on counting timpani beats. We further examine whether musical expertise (Study 1) and perceptual salience of the solo (Study 2) attenuate inattentional deafness. To explain our rationale, we review the research on the attentional processes involved in inattentional unawareness across different perceptual domains.

As mentioned above, inattentional blindness has been most effectively demonstrated by Simons and Chabris (1999; see also Most, Simons, Scholl, Jimenez, Clifford, & Chabris, 2001; Simons, 2000; Simons & Jensen, 2009), who also received lots of “internet fame” beyond the scientific community via YouTube and other media channels. They showed that when people are counting the number of passes in one of two teams in a leisurely student basketball scene, an average of 56 % do not notice a person in a black gorilla costume walking squarely across the scene. The effect is so striking that people who are not counting passes while they watch the film cannot believe that anyone could ever miss the gorilla. The scenario even works under rather uncontrolled conditions, as with audiences of several hundred people in lecture halls. Simons and Chabris (1999) found that fewer participants noticed the black gorilla when the counting task demanded more attentional resources (difficulty of the primary task) and when they attended to the white team rather than to the black team (feature similarity effect). Feature similarity implies that inattentional blindness may occur because attention narrows to processing of features for task-relevant objects and/or because it actively inhibits the intentionally unattended, irrelevant objects and thus any new object of this kind.

The experimental design of Simons and Chabris’ study has been extended frequently (e.g., Graham & Burke, 2011; Memmert, 2006; Seegmiller, Watson, & Strayer, 2011), and provides important insights about the structural link between attention and awareness. Using eye tracking, Memmert (2006) showed that children who did not consciously notice the gorilla did not differ in the frequency and duration of fixations on the gorilla from those who did notice it. Thus, looking at an object does not necessarily result in conscious perception of it (for similar results with static stimuli see Koivisto, Hyönä, & Revonsuo, 2004). Memmert (2006) concludes that self-organized processes act “as an early ‘gating’ mechanism, which influences the direction of attention through potentially useful or emotionally interesting information before conscious knowledge of the observed object is available” (p. 625). Wayand, Levin, and Varakin (2005) emphasized the tight relationship between attention and awareness in their research on inattentional blindness under multimodal conditions. However, the general debate on the nature of this relationship is unresolved, and ranges from the assumption that top-down attention and awareness are “distinct phenomena that need not occur together” (Koch & Tsuchiya, 2007) to attention being a prerequisite of awareness (Cohen, Cavanagh, Chun, & Nakayama, 2012).

Less disputed is that awareness of unexpected events plays a functional role in complex tasks that involve skilled performance. Evidence from sports suggests that experts are less prone to such attentional lapses than novices. Werner and Thies (2000) showed that football players notice changes in static football images more often than novices. Memmert (2006) found that 61 % of experienced basketball players noticed the gorilla when counting the passes on the white team as opposed to 37 % of non-experts. In a dynamic basketball scene, experienced players noticed an unguarded team mate more often than less experienced participants when engaged in a distracting primary task (Furley, Memmert, & Heller, 2010). However, experts also performed less efficiently under a distracting primary task than with untaxed attention. These results suggest that expertise increases awareness of unattended objects but that inattentional blindness does not fully disappear.

Phenomena of inattentional unawareness are by no means restricted to vision. A seminal overview, also including tactile designs, is given by Mack and Rock (1998). The acoustical analog of inattentional blindness, inattentional deafness, is less well known than its visual counterpart, but related phenomena of selective auditory attention have a long research record. Actually, Neisser (1979) had been inspired by findings on selective listening when designing his visual attention studies that later led to the term “inattentional blindness”; hence, “inattentional deafness” can be seen as a renaissance of selective listening research in new paradigmatic clothes. For example, the classic cocktail party effect (“…the ability to focus one’s listening attention on a single talker among a cacophony of conversations and background noise”, Arons, 1992, p. 35; Cherry, 1953; Moray, 1959; Koch, Lawo, Fels, & Vorländer, 2011; Wood & Cowan, 1995a, 1995b; see Spence & Read, 2003, for a cross-modal example) resembles inattentional deafness effects in terms of underlying cognitive processes, although this link is rarely made explicit in contemporary studies. In inattentional deafness, focus on a relevant stimulus completely blocks out an unexpected acoustic stimulus, whereas in the cocktail party effect, the person is aware of the irrelevant stimuli but ignores them, as long as no high-salience events, such as hearing one’s own name, occur. More generally, the dichotic listening paradigm (participants listening simultaneously to two different streams of information, each presented to one ear; e.g., Cherry, 1953; Broadbent, 1954; Wood & Cowan, 1995a, 1995b) bears structural similarities to inattentional deafness designs. It is a matter of debate, however, whether “inattentional deafness” findings based on dichotic listening approaches are a valid analog of “inattentional blindness” findings: information presented to one ear only may be blocked from attention more rigorously than information presented binaurally in a dynamic auditory scene, and this kind of blocking may involve other neural and cognitive processes.

In studies on change deafness, participants are often explicitly instructed to report changes in objects that are audible from the outset and they perform no other task in parallel (e.g., Eramudugolla, Irvine, McAnally, Martin, & Mattingley, 2005; see Snyder & Gregg, 2011, for a review). Interesting exceptions, building a bridge to inattentional deafness designs, are the studies by Vitevitch (2003; Vitevitch & Donoso, 2011). With shadowing (Vitevitch, 2003) and lexical decisions (Vitevitch & Donoso, 2011) as the primary tasks, the speaking voice unexpectedly changed identity in the middle of the trials, and roughly 40 % of participants remained unaware of this voice change. In a more naturalistic setup, Fenn, Shintel, Atkins, Skipper, Bond, and Nusbaum (2011) showed that most participants engaged in a telephone conversation failed to notice a change in speaker voice in the middle of the conversation unless they expected a change to occur or the two voices differed in gender. Vitevitch (2003) argues further that if attention is allocated to the index dimension of the voice (e.g., speaker identity), then the processing speed of the linguistic content dimension decreases. In fact, those participants who noticed the voice change tended to perform more slowly on the shadowing task than those who were unaware of the voice change. While these studies addressed allocation of attention to different features (content vs. speaker identity) of a single auditory stream (voice stream), in the present study we address attention allocation among different objects of a complex auditory scene (timpani beats vs. e-guitar).

Mack and Rock (1998) demonstrated inattentional deafness under static conditions with a single unexpected stimulus. Participants listened to recorded strings of five letters and wrote down each string as soon as the last letter was played. Additionally, they had to press a button whenever a string contained the letter “A”. A clarinet sound of 200 ms duration served as the unexpected event. To examine the perceptual salience of the unexpected stimulus, the loudness varied between groups of participants. In total, 71 % of the participants did not notice the unexpected sound. The frequency of inattentional deafness increased (from 33 % to 100 %) with decreasing loudness of the unexpected event.

A few studies have investigated the effects of inattention with multimodal stimuli. Wayand et al. (2005) found that more than half of their participants did not notice a salient and unpleasant audio-visual stimulus in an inattentional blindness design. Sinnett, Costa, and Soto-Faraco (2006) showed that inattentional blindness effects were less extreme, but still existed, in cross-modal than in exclusively visual conditions. Macdonald and Lavie (2011) found that visual perceptual load affected the frequency of inattentional deafness.

Beyond designs using static stimuli and multimodal designs, Dalton and Fraenkel (2012) were the first to publish a study of sustained inattentional deafness under dynamic conditions in a scholarly journal. Strictly following Simons and Chabris’ (1999) gorilla-study design, they asked participants to listen to one of two recorded conversations, one by two men and one by two women. In the course of the conversation, a male voice repeated the words “I’m a gorilla” for 19 s while moving through the auditory scene. A substantial share of participants was unable to report the presence of this unexpected event. Similar to Simons and Chabris’ (1999) feature similarity effect, high similarity between target and unexpected stimulus (i.e., same gender) resulted in lower occurrence of inattentional deafness (10 %) than low similarity (different gender; 70 %). The authors also varied the spatial separation of voices and found that both spatial location and voice gender contributed to inattentional deafness. The effect was most pronounced when target and unexpected event differed along both dimensions (women’s conversation and male gorilla utterance at different locations). However, the intriguing finding in Simons and Chabris (1999) is that the gorilla is missed even if it appears with almost no spatial separation from the attended objects. When gender differed but the conversation and the gorilla utterance were at the same location, 55 % of the participants missed the utterance. Dalton and Fraenkel (2012) argue that the qualitative difference between dichotic listening and inattentional deafness is that in the former, basic auditory features of the unattended stream remain consciously present, whereas in the latter the unexpected stimulus and its basic features are missed entirely.

To our knowledge, inattentional deafness has never been demonstrated under controlled experimental conditions with respect to music, in spite of some indirect evidence for related effects. Repp (1996) showed that pianists’ errors are difficult to hear even for a jury of other pianists currently practicing the same piece, as the errors and omissions usually happen in non-dominant voices and without violation of the harmonic context, instead of, for example, in the melody lead or highest pitch voice (see also Byo, 1993; Sheldon, 2004). Repp (1996) found that on average merely 38 % of such errors were detected by experienced pianists and argues that learning to avoid errors in dominant voices is an implicit part of musical training. However, the relation to inattentional deafness is only weak here, as the salience of the “unexpected events” (errors) is very low due to their unobtrusive contextual fit.

By definition, inattentional deafness in music means that striking and unexpected musical events within familiar music settings remain unnoticed when an explicit task engages attentional resources. Studies on the effects of musical training suggest that musicians may be less prone to inattentional deafness. It has been shown on the neurophysiological level, for instance, that violinists extract information on chord structure preattentively (Koelsch, Schröger, & Tervaniemi, 1999) and that conductors are more efficient in selectively locating sounds in the periphery of an auditory scene (Münte, Kohlmetz, Nager, & Altenmüller, 2001). In the following studies, we tested inattentional deafness in a complex and dynamic auditory scene, and investigated to what degree the effect is attenuated by musical expertise (Study 1) and by a more salient unexpected sound (Study 2).

Method

Design

The experiment was neutrally framed as a “perceptual study in music psychology”. All participants were presented with a modification of the first 1′50″ (Introduction, or Sunrise) of Richard Strauss’ orchestral tone poem Also sprach Zarathustra (Thus Spake Zarathustra), op. 30. An e-guitar solo intruding into several bars of Strauss’ orchestral piece served as the unexpected event. The music was presented from a digital source via loudspeakers, with the e-guitar’s onset at 1′16″, lasting for 20 s in Study 1 and for 26 s in Study 2. The participants’ task in the experimental group was to count the number of timpani beats in the piece, while the control group was instructed to just listen. The counting of the timpani beats was chosen as the attentive task for the experimental group due to (1) its relative simplicity, requiring no formal music training, and (2) the timpani’s spectral distance to the e-guitar, directing attention to the bass voices of the orchestra (low feature similarity). Note that the two groups listened to exactly the same recording, “meaning that any differences in detection rate between groups can be attributed directly to differences in participants’ attentional focus” (Dalton & Fraenkel, 2012, p. 369).

Procedure

Participants were tested individually. They sat on a chair opposite the hi-fi system at a distance of 50 cm. The sound intensity was maintained constant at a pleasant volume. First, participants reported demographic data and their level of musical expertise. After that, they were informed about their task (see Design). The experimental group additionally listened to a short excerpt of four timpani beats for clarification and warming-up purposes. Subsequently, all subjects were presented with the modified piece of music. Afterwards, following Simons and Chabris (1999), all participants were successively asked if they had noticed (a) anything peculiar, (b) any unfitting instruments or sounds, and (c) the e-guitar. Participants stating “no” to all three questions were classified as not having noticed any modification. If they answered any of these questions with “yes”, they were immediately asked what exactly they had perceived, how it had sounded, and when it had happened (at the beginning, middle, or end of the sequence). Participants were classified as having noticed the e-guitar only if they were able to name it at any point of this short interview. Additionally, they were asked how many beats they had counted.
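The interview-based classification described above amounts to a small decision rule. The following is a minimal sketch of that rule, not the procedure as the experimenters implemented it: the function name, the argument names, and the encoding of answers as booleans are illustrative assumptions, and whether a participant named the e-guitar is taken as a judgment supplied by the interviewer.

```python
def classify_response(answers, named_e_guitar):
    """Classify one participant from the post-listening interview.

    answers: three booleans for questions (a), (b), and (c), where
             True means "yes" (hypothetical encoding, not from the study).
    named_e_guitar: True if the participant named the e-guitar at any
             point of the interview, as judged by the interviewer.
    """
    if len(answers) != 3:
        raise ValueError("expected answers to questions (a), (b), and (c)")
    if not any(answers):
        # "No" to all three questions: no follow-up interview took place.
        return "no modification noticed"
    if named_e_guitar:
        return "e-guitar"
    # Something was reported, but the e-guitar was never named.
    return "peculiarity noticed"


print(classify_response([False, True, False], named_e_guitar=False))
```

The three labels correspond to the response categories used in the Results, where “e-guitar” and “peculiarity noticed” are later collapsed into one category.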

All participants were asked whether they had heard the piece before, and if so, whether they could name the title, the composer, or any specific movie(s) featuring it. Participants who had never heard the piece before were excluded from the analyses. Out of the 115 participants who stated having heard the piece before, only 4, 12, and 24 could name the composer, the title, and the movie (Stanley Kubrick’s 2001: A Space Odyssey), respectively. To explore the effects of familiarity, the participants were classified into two groups: “more familiar” if they could name the title, the composer, or the movie correctly (n = 32), and “less familiar” if they merely stated having heard the piece before (n = 83). Finally, participants were excluded if they were familiar with the concept of inattentional blindness or with the “gorilla video”.

Materials

The e-guitar improvisations were recorded by a locally renowned professional jazz guitar player on a semi-acoustic guitar, linked to a standard guitar amplifier (Mesa Boogie, Quad Preamp). They were cut and mixed into a commercial standard recording of the orchestra piece (Chicago Symphony Orchestra, Sir Georg Solti, Decca 1994) using the Samplitude Professional 8.0 software. The two versions (Take 1 in Study 1 and Take 2 in Study 2) were chosen out of seven different takes which varied in distinctiveness/embeddedness, loudness, and use of the bottleneck technique, based on the experimenters’ pre-selective judgment and a pilot study on the ease of perception of the takes. Compared to the take selected for Study 1, the alternate take (with bottleneck) selected for Study 2 is more distinct from the original piece in terms of timbre, rhythmic pattern, melodic contour, and number of notes. It is clearly less embedded (more salient) and, therefore, more likely to be segmented from the orchestral voices and noticed more easily. Audio files (MP3) of both versions are available for download in the supplementary material [Study 1 = version_1; Study 2 = version_2].

Sample

A total of 125 students of the University of Klagenfurt initially participated in Study 1. Ten participants had to be excluded because they were unfamiliar with the piece, had hearing impairments, or reported a highly deviant counting result (cf. Simons & Chabris, 1999). None of the participants claimed to know the inattentional deafness / inattentional blindness phenomenon. Hence, data of n = 115 subjects were analyzed (age 18–63 years, M = 26 years, 64 % female). Of these, n₁ = 58 were non-musicians, and n₂ = 57 were musicians. Musicians were either students of music or musicians of higher qualification with a minimum of either (a) 7 h of weekly instrumental practice during the last 3 or more years or (b) 3 h of weekly instrumental practice during the last 5 or more years. Participants who had regularly played an instrument over the last 10 years were also classified as musicians.

In Study 2, all 50 participants (students) were non-musicians, and 3 were excluded because they did not know the piece. Thus, n = 47 subjects (age 19–51 years, M = 26 years, 70 % female) remained.

Hypotheses and analyses

The main goal of this work was to demonstrate inattentional deafness in music. Hence, we predicted that significantly more participants would notice the e-guitar solo in the control group than in the experimental group. Musicians were expected to show considerably smaller, or even no, inattentional deafness effects in this setting. Similar to expertise, higher familiarity with the piece was expected to attenuate inattentional deafness. We tested main and interaction effects of experimental group, familiarity, and expertise in a logistic regression analysis.

Results

Study 1: general effect, familiarity, and musical expertise

We collapsed the response categories “e-guitar” and “peculiarity noticed” into a single category. For the two resulting categories (e-guitar or peculiarity noticed vs. no modification noticed), we first computed a logistic regression for the probability to notice the e-guitar (or a peculiarity) with the three main effects of experimental group, familiarity, and musical expertise directly entered into the model equation. This initial model showed a significant improvement in goodness-of-fit over a constant-only model [likelihood ratio χ²(3, N = 115) = 42.38, p < 0.001, Nagelkerke’s R² = 0.42]. Elimination of any of the three main effects leads to a significant loss in model fit (all p < 0.001). Thus, each predictor was significantly associated with the probability to notice the e-guitar. Second, we added each of the three two-way interactions separately to the initial model and found that neither experimental group by expertise [χ²(1, N = 115) = 0.17, p = 0.68], nor experimental group by familiarity [χ²(1, N = 115) = 2.43, p = 0.12], nor expertise by familiarity [χ²(1, N = 115) = 0.08, p = 0.78] resulted in a significant improvement in goodness-of-fit compared to the initial model. Therefore, when controlling for main effects, none of the interactions was significantly associated with the probability to notice the e-guitar. The chosen model with the three main effects correctly classified 74 % of all participants, 80 % of those that noticed the e-guitar or some peculiarity and 64 % of those that did not notice any modification. The model with the estimated B-weights was: Logit(noticing the e-guitar) = 2.12 (control group) + 2.01 (more familiar) + 1.65 (musician) − 1.71 (constant). The constant reflects the fact that the odds of noticing the e-guitar for the less familiar non-musicians in the experimental group are e^(−1.71) = 0.18. The corresponding raw cell frequencies are displayed in Table 1.
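To make the model equation concrete, the sketch below exponentiates the reported B-weights to obtain odds ratios and converts the logit into a predicted probability. It is an illustration only: the B-weights are those given in the text, while the function names and the 0/1 coding of the predictors are assumptions, and the predicted probabilities are model-based values, not the observed cell frequencies.

```python
import math

# B-weights as reported in the text (Study 1, main-effects model).
B_CONTROL = 2.12    # control group (vs. experimental group)
B_FAMILIAR = 2.01   # more familiar (vs. less familiar)
B_MUSICIAN = 1.65   # musician (vs. non-musician)
B_CONSTANT = -1.71  # less familiar non-musician in the experimental group


def logit(control, familiar, musician):
    """Linear predictor; each argument is 0 or 1 (assumed dummy coding)."""
    return (B_CONSTANT + B_CONTROL * control
            + B_FAMILIAR * familiar + B_MUSICIAN * musician)


def probability(control, familiar, musician):
    """Model-predicted probability of noticing the e-guitar."""
    return 1.0 / (1.0 + math.exp(-logit(control, familiar, musician)))


# Odds ratios are the exponentiated B-weights.
odds_ratios = {
    "control group": math.exp(B_CONTROL),
    "more familiar": math.exp(B_FAMILIAR),
    "musician": math.exp(B_MUSICIAN),
}
baseline_odds = math.exp(B_CONSTANT)

for name, value in odds_ratios.items():
    print(f"OR {name}: {value:.1f}")
print(f"baseline odds: {baseline_odds:.2f}")
print(f"baseline probability: {probability(0, 0, 0):.2f}")
```

Rounded to one decimal, the exponentiated weights give the odds ratios of 8.3, 7.5, and 5.2 reported for experimental group, familiarity, and musical expertise, and the exponentiated constant gives the baseline odds of 0.18.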

Table 1 Frequencies of noticing the e-guitar solo (or some peculiarity) vs. not noticing any modification by experimental groups, expertise, and familiarity in Studies 1 and 2

In the control group, 81 % (46 out of 57) of the participants noticed the unexpected e-guitar solo. This proportion was lower in the experimental group, where only 43 % (25 out of 58) noticed the e-guitar fill-in. According to the Wald criterion, experimental group was a significant predictor in the logistic regression model when controlling for familiarity and musical expertise [B = 2.12, χ²(1) = 17.48, p < 0.001]. The corresponding estimated odds ratio (OR) of e^2.12 = 8.3 suggests that the odds of noticing the e-guitar were 8.3 times higher in the control than in the experimental group.

A total of 84 % (27 out of 32) of the participants more familiar with the piece but only 53 % (44 out of 83) of those less familiar noticed the unexpected e-guitar. Familiarity was a significant predictor in the logistic regression model [B = 2.01, χ²(1) = 10.70, p = 0.001]. The corresponding odds to notice the e-guitar were OR = 7.5 times higher for the more familiar than for the less familiar participants. Note that familiarity was unrelated to experimental group and musical expertise. Familiarity did not differ between experimental and control group: 28 % (16 out of 58) and 28 % (16 out of 57) of the participants in the experimental and the control group, respectively, were classified as “more familiar”. More and less familiar participants were also similarly distributed among non-musicians and musicians: 29 % (17 out of 58) of the non-musicians and 26 % (15 out of 57) of the musicians were classified as “more familiar”.

Musical expertise. Only 48 % (28 out of 58) of the non-musicians noticed the e-guitar compared to 75 % (43 out of 57) of the musicians. Thus, musical expertise was a significant predictor in the logistic regression model [B = 1.65, χ²(1) = 10.94, p = 0.001], and the odds of noticing the e-guitar were OR = 5.2 times higher for musicians than for non-musicians. The descriptive results suggest that the difference between the experimental and control group in noticing the e-guitar was larger among non-musicians than among musicians. While 72 % (21 out of 29) of the non-musicians in the control group noticed the e-guitar, only 24 % (7 out of 29) in the experimental group noticed it. Among musicians, 89 % (25 out of 28) noticed the e-guitar in the control compared to 62 % (18 out of 29) in the experimental group. These relations are displayed in Fig. 1. However, as shown before, the interaction between group and expertise was not significant in the regression model when controlling for main effects [B = −0.40, OR = 0.67, χ²(1) = 0.17, p = 0.68].
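The descriptive pattern can be expressed as unadjusted odds ratios computed directly from the counts given above. This is a rough sketch for illustration: the helper function is hypothetical, and because these raw odds ratios do not control for familiarity, their ratio is not expected to reproduce the model-based interaction OR of 0.67 exactly.

```python
def odds_ratio(noticed_a, total_a, noticed_b, total_b):
    """Unadjusted odds ratio of noticing in group A relative to group B."""
    odds_a = noticed_a / (total_a - noticed_a)
    odds_b = noticed_b / (total_b - noticed_b)
    return odds_a / odds_b


# Control vs. experimental group, counts as reported in the text.
or_non_musicians = odds_ratio(21, 29, 7, 29)
or_musicians = odds_ratio(25, 28, 18, 29)

# A ratio below 1 means the group effect is descriptively smaller
# among musicians than among non-musicians.
ratio_of_odds_ratios = or_musicians / or_non_musicians

print(f"non-musicians: OR = {or_non_musicians:.2f}")
print(f"musicians:     OR = {or_musicians:.2f}")
print(f"ratio of ORs:  {ratio_of_odds_ratios:.2f}")
```

The ratio falls below 1, in the same direction as the negative interaction weight, which is consistent with a descriptively smaller group difference among musicians.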

Fig. 1 Experimental effect (inattentional deafness) for (a) non-musicians and (b) musicians in Study 1

In the experimental group, performance in counting timpani beats did not differ between musicians (Md = 32) and non-musicians (Md = 33), Mann–Whitney U = 360.5, p = 0.35. As implied by previous results on change deafness (Vitevitch, 2003; Vitevitch & Donoso, 2011), performance could also differ depending on whether the e-guitar solo was noticed or not. The number of counted timpani beats did not differ between these two groups, neither for non-musicians (U = 62.5, p = 0.46) nor for musicians (U = 68, p = 0.16).

Overall, participants engaged in counting timpani beats were less likely to notice the unexpected e-guitar than those just listening, when controlling for familiarity and musical expertise. Participants classified as more familiar with the piece noticed the e-guitar more often than less familiar participants, when controlling for experimental group and expertise. Finally, musicians were more likely to notice the e-guitar than non-musicians, when controlling for experimental group and familiarity.

Study 2: salience of the unexpected stimulus

The aim of Study 2 was to investigate the inattentional deafness effect with a more salient unexpected event. Thus, an e-guitar take was chosen which was less embedded into the main piece, and thus presumed to be more easily spontaneously noticed. This complementary study was conducted with non-musicians only. In Study 2, 79 % (19 out of 24) of the participants in the experimental group and 91 % (21 out of 23) of the control group noticed the e-guitar (see bottom of Table 1). Detection rates in both groups were higher than in Study 1, demonstrating that the change in salience was successful. The difference between control group and experimental group was not significant [χ²(1, n = 47) = 1.37, p = 0.24]. Again, performance in the primary task (counting timpani beats) did not differ depending on noticing or not noticing the e-guitar (U = 34.5, p = 0.35).
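The group comparison in Study 2 can be checked from the reported counts. The sketch below assumes that the reported statistic is an uncorrected Pearson chi-square for a 2 × 2 table, which the text does not state explicitly; the function name is hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(√(χ²/2)).

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: experimental group, control group.
# Columns: noticed, not noticed (counts as reported in the text).
chi2, p_value = pearson_chi2_2x2(19, 5, 21, 2)
print(f"chi2(1, n = 47) = {chi2:.2f}, p = {p_value:.2f}")
```

Under this assumption the counts reproduce the reported values of χ² = 1.37 and p = 0.24. Note that two cells have expected frequencies below 5, so an exact test might be preferred in practice.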

To systematically test the differences in salience between the two e-guitar soli used in both studies, we combined the data from Study 2 and from the non-musicians in Study 1 in spite of possible cohort effects, because in both studies the non-musicians were sampled from the same subject pool of students with overall similar characteristics in terms of educational background, age, and familiarity with the piece. Across these pooled data, a logistic regression with experimental group and salience (lower salience in Study 1 vs. higher salience in Study 2) as the two main effects showed a good fit judged by the improvement over a constant-only model [likelihood ratio χ²(2, N = 105) = 30.93, p < 0.001; Nagelkerke’s R² = 0.35]. Addition of the interaction between experimental group and salience did not result in a significant improvement in goodness-of-fit [χ²(1, N = 105) = 0.97, p = 0.33]. In the resulting model with the two main effects, the odds to notice the e-guitar were OR = 6.1 times higher in the control than in the experimental group [B = 1.81, χ²(1) = 12.63, p < 0.001]. The odds to notice the more salient e-guitar solo in Study 2 were 8.3 times higher than the odds to notice the less salient solo [Study 1; B = 2.12, χ²(1) = 15.43, p < 0.001], thus providing a comparative picture of the size of the stimulus salience effect.

Discussion

Our results demonstrate that sustained inattentional deafness exists in the musical realm, in close correspondence to visual blindness effects with dynamic stimuli (in both film and reality settings). Most participants who were counting the timpani beats in the opening sequence of Richard Strauss’ Thus Spake Zarathustra did not notice a 20-s e-guitar solo during the piece, while most participants just listening (in the control group) noticed it. We found no evidence of primary task trade-offs, and participants noticing or not noticing the e-guitar performed similarly in the counting task. Familiarity with the piece and musical training both led to generally higher detection rates, but did not eliminate inattentional deafness.

In Dalton and Fraenkel (2012), 45 % of the participants noticed the male gorilla utterance at the same location as the attended conversation of women. Detection rates in the experimental conditions in the present studies ranged from 43 % for the less salient to 79 % for the more salient unexpected stimulus. Simons and Chabris (1999) report detection rates of the gorilla of 42 and 50 %. It is difficult to tell to what degree the unexpected events in all three studies are similar with respect to salience because the control conditions differed. Participants in Dalton and Fraenkel (2012) listened to the scene twice and were explicitly instructed to listen for unusual stimuli on the second trial, whereas in the present study control group participants were given no hint about anything unusual. Our control group setup therefore provides a more conservative estimate of stimulus salience and may explain why we did not find detection rates at ceiling. However, given that in the present Study 1, 19 % of the control group participants missed the unexpected stimulus, we cannot rule out that the e-guitar solo may have been less salient than the gorilla utterance. Also, when directly comparing these studies, consider that feature similarity was chosen to be low (with timpani beats in the primary task vs. an e-guitar as the unexpected stimulus), corresponding to counting the passes in the white team in Simons and Chabris’ work and to the different-gender condition in Dalton and Fraenkel.

Comparing the results of Study 1 with those of Study 2, effects due to the salience of the unexpected stimulus are clearly visible. Overall, the odds for missing the less salient stimulus in Study 1 were about 8 times higher than for the more salient one in Study 2. Since the less salient stimulus was more embedded and, therefore, more similar or “close” to the orchestral voices than the more salient stimulus, this finding is consistent with inhibitory process explanations of the inattentional deafness effect. When counting timpani beats, it is likely that other orchestral voices were inhibited, as well as any similar sounding new stimulus. Therefore, by similarity to the orchestral voices other than the timpani, the less salient (more similar) stimulus in Study 1 may have suffered more from inhibition than the more salient (dissimilar) one in Study 2.

Musicians noticed the e-guitar solo more often than non-musicians in general. Musicians in the experimental group noticed the e-guitar solo more frequently (62 %) than non-musicians (24 %). In the control group, the large majority of both musicians (89 %) and non-musicians (72 %) were aware of the e-guitar. Musical training and domain-specific knowledge enable musicians to process and represent the auditory scene more efficiently (e.g., Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011; Zendel & Alain, 2008), so that they spend fewer attentional resources on the counting of timpani beats and thus notice the unexpected e-guitar more often. Polyphonic awareness for the entire score is a cornerstone of most musical training regimens, most prominently so for conductors, but also for musicians of all trades. Since the field of music is well suited to explore expertise effects (e.g., Vitouch, 2005), this variation of “perceptual expertise” is another novel aspect of our approach (cf. Memmert, 2006, and Furley et al., 2010, for visual examples).

Having said this, one possible countereffect has to be considered: With compliant subjects, musical expertise will also have helped to concentrate on the primary task, efficiently filtering most other tonal information. (Just think of a timpanist, who has to re-tune his instruments well in advance of a modulation while the orchestra is still playing in another key.) Musicians probably had more attentional resources in reserve, but might also have had lower field dependence and stronger focus (or attentional shielding), with these two effects working in opposite directions with regard to inattentional deafness.

In any case, the implications of our results are twofold: First, music is a suitable domain for demonstrating sustained inattentional deafness under highly realistic conditions and in close analogy to Simons and Chabris’ inattentional blindness, using a binaurally presented, complex and dynamic auditory stream without the necessity of artificial or reductionist laboratory restraints (showing inattentional deafness with music). Second, from a perspective centered on music perception, we learn that the polyphonic awareness of both non-musicians and musicians is clearly limited as soon as attentional resources are competitively engaged, which has further implications for music education and music performance (showing inattentional deafness in music).

Since we have demonstrated inattentional unawareness under dynamic conditions and with real-life musical material, the ecological validity of these results may be higher than in earlier studies, especially from the visual domain, using static and experimentally simplified stimuli. For another example, see Simons and Levin (1998), who brilliantly succeeded in demonstrating person-related change blindness effects in a real-life setting on a university campus. Additionally, in evolutionary theories of human perception, hearing is conceptualized as an “alarm sense” (you cannot “close your ears”) which is especially prone to react pre-attentively to unexpected events (as in the so-called orienting reflex extensively studied in psychophysiology). This makes the strong effect of inattentional deafness even more remarkable.

Just as in the visual domain, there will be limits to this effect depending on the salience and the sheer physical properties of the unexpected stimulus: There is probably no counting task engaging enough to make people miss the sudden fortissimo in Joseph Haydn’s Symphonie mit dem Paukenschlag (Surprise Symphony), or the “tutti” fortissimo following a fading bassoon pianississimo in the first movement of Pyotr Ilyich Tchaikovsky’s Pathétique. But despite these obvious limits, it is intriguing to see how a relatively simple task can bind attentional resources in a way that is strongly counter-intuitive, and even hard to believe, when listening to the material under control conditions.

A possible point of criticism is that since the days of “The London Philharmonic Orchestra playing the Beatles” and other cross-over projects (pop meets classic), an e-guitar improvisation in a classical piece might not be perceived as unusual, or at least will be less out of place than a gorilla in a basketball scene. However, this does not explain why participants in the control group mentioned the e-guitar much more often than participants counting timpani beats; so this interpretation is ruled out by design.

Altogether, our results show close correspondence to findings from the visual domain, and are just as perplexing: People are quite likely to completely miss an e-guitar improvisation in a classical piece of music under simple conditions of attentional distraction. Hence, our research implies that human auditory processing is even more selective, due to attentional processes, than typically assumed. Also, the “score awareness” of musicians seems to be much more restricted than hoped for. The striking effects even among musicians shed new light on the role of attentional processes in music perception and performance.