Introduction

Music is a ubiquitous, fundamental instrument of communication, second only to language. Individual musical elements such as pitch, rhythm, tempo, contour, timbre, loudness, and reverberation are related to one another in music and are the basis of higher-order concepts such as melody or harmony (Levitin 2006). Rhythmic and melodic patterns need to be represented mentally to comprehend and process music as such, and not just as a sequence of rhythms and melodies (Azzara 1991). Music is a multifaceted domain, and musical experts must contend with a wide range of demands that depend on their specific profession. Studying instrumental or vocal musicians provides an approach to explore the impact of musical training on the processing of music, as musical experience is greatly influenced by practicing music and learning a musical instrument (Münte et al. 2002). According to Posner (1988), an expert is a person who consistently demonstrates outstanding achievement in an activity. Acquired domain knowledge, that is, explicit and implicit memory, has been the most prominent explanation for the superiority of expert performance (e.g., Lehmann and Gruber 2006; Jäncke 2009).

Performance differences between musicians and non-musicians in auditory tasks are well documented: expertise seems to promote both the perception and the processing of music (e.g., Zatorre et al. 1998; Pantev et al. 2001; Kraus and Chandrasekaran 2010). This is often evident in behavioral benefits such as accuracy of responses and response times (e.g., Tervaniemi et al. 2005; Pallesen et al. 2010). For example, musicians are better at differentiating frequencies and intensities (Houtsma et al. 1987), an effect that is domain specific, as the advantage in discriminating power does not translate into spoken sounds (Münzer et al. 2002). Musicians also show advantages in the processing of simple phonetic stimuli (e.g., Geiser et al. 2010) and in the processing of prosody and pitch perception (e.g., Magne et al. 2006). Musicians are superior to non-musicians especially when it comes to the processing of harmonic complex sounds (e.g., Micheyl et al. 2006; Nikjeh et al. 2009). However, it is still discussed whether these advantages also apply to the fundamental level of processing sound, such as in the detection of pure sine tones (Kishon-Rabin et al. 2001; Tervaniemi et al. 2005). One method of investigating the discrimination of sounds is the method of determining difference thresholds. In psychoacoustic studies, the difference threshold is the minimal distance in frequency between two consecutively presented sounds that is required for it to be recognized: the just noticeable difference (JND). The accuracy of indicating the JND between two stimuli increases with age, experience, and practice (Halberda et al. 2008). That is, the JND has proven to be training-dependent, for example, it can be improved by specific pitch discrimination training (e.g., Menning et al. 2000; Jäncke et al. 2001). To date, there has not been an examination of whether this applies broadly to all types of musical training. 
Therefore, the present study aimed to investigate whether musicians have smaller JNDs for auditory material (pure sine tones) than non-musicians owing to their general musical training (i.e., experience) and thus superior “basic auditory abilities” (Kishon-Rabin et al. 2001, p. 125; Bianchi et al. 2017).

Additionally, we ask whether differences in individual JND thresholds are also reflected in differences in brain activity between musicians and non-musicians (Bianchi et al. 2017), as it is debated whether this advantage of musicians arises at the cortical (see Jäncke 2009) and subcortical levels (e.g., Bianchi et al. 2017; Parbery-Clark et al. 2011), or perhaps even at the level of the inner ear (Bidelman et al. 2016). We therefore examine an event-related potential (ERP) that has been investigated in this context: the mismatch negativity (MMN). The MMN represents the pre-attentive sensitivity to deviations of a stimulus from a standard stimulus at the neurophysiological level in the EEG (Näätänen 1995). The auditory MMN is mostly generated bilaterally in the temporal auditory cortex and in the frontal cortex. It is a good indicator of auditory discrimination ability, as it seems to occur regardless of attention, motivation, or task requirements. It occurs approximately 100–250 ms after a deviant stimulus in an auditory oddball paradigm (Näätänen 1995, 2000). In this paradigm, a sequence of identical auditory stimuli is presented, within which a less probable deviant stimulus occurs at random. The MMN therefore presupposes a difference between the current and previous events. Repeated sequential auditory input is represented as a trace in memory, and the current auditory input is automatically checked for consistency with that memory trace (Näätänen 1995, 2000). The latency and amplitude of the MMN are related to the magnitude of the deviation of the stimuli: the greater the deviation, the larger the amplitude and the shorter the latency (Näätänen 1995). The MMN is particularly well suited for examining the neural response to specific musical components, for example, deviations in frequency or intensity, and also effects of musical training (Näätänen 1995; Tervaniemi and Brattico 2004). 
Many studies have shown that the MMN has larger amplitudes and shorter latencies in response to different types of deviance, such as temporal or spatial variations and frequency or intensity variations in musicians compared to non-musicians (Koelsch et al. 1999; Fujioka et al. 2004; Tervaniemi et al. 2005; Tervaniemi et al. 2006; Nikjeh et al. 2008; Nager et al. 2003). This suggests that musicians have a pre-attentive processing advantage for musical stimuli. In addition, the amplitude of the MMN correlates with the behavioral discrimination performance of individuals while examining auditory stimuli, that is, with greater accuracy of discrimination and shorter response times (Novitsky et al. 2004; Lang et al. 1990). Also, the amplitude of the MMN increases with increasing training of pitch discrimination (Putkinen 2014). The MMN is often followed by a fronto-central positivity, the P3a, indexing involuntary attentional orienting or attention shifting (e.g., Schröger and Wolff 1998; Escera et al. 1998; Friedman et al. 2001; Čeponienė et al. 2004; Polich 2007). According to Polich (2012), the P3a “results from an early attention-related process stemming from a working memory representational change” (p. 180). The peak latency of the P3a in adults is in the range of 250–350 ms (Čeponienė et al. 2004). The P3a shows a similar pattern to the MMN in terms of amplitude and latency: the greater the deviation, the larger the amplitude and the shorter the latency (e.g., Escera et al. 1998; Escera et al. 2000).

For the present study, participants’ individual auditory difference thresholds (individual JND thresholds) were determined first, followed by a passive oddball paradigm. Participants were monitored by EEG to investigate whether musicians have smaller individual JND thresholds (Kishon-Rabin et al. 2001) and, therefore, better neural discrimination skills compared to non-musicians (Tervaniemi et al. 2005). The individual JND threshold was determined by means of frequency comparisons of two sine tones. In the passive oddball paradigm, a rarely occurring stimulus that differed in frequency was presented in a series of standard stimuli to evoke the MMN and P3a. We assumed that musicians and non-musicians differ in the following aspects. First, musicians and non-musicians differ in their individual JND thresholds, with musicians having smaller individual JND thresholds (Kishon-Rabin et al. 2001). Second, musicians react faster to the presented sine tone pairs than non-musicians (Tervaniemi et al. 2005). Third, musicians and non-musicians alike respond with the MMN and P3a to the sound deviations in the oddball paradigm, but differ in the amplitudes of the MMN and P3a, with musicians showing larger amplitudes of the MMN and P3a across all conditions (Putkinen 2014).

Methods

Participants

Fifty-three musicians (26 females, age range 15–55, M = 30.02 years, SD = 9.96 years) from various musical institutions and 51 non-musicians (26 females, age range 17–51, M = 27.30 years, SD = 7.78 years) initially participated in this experiment. There was no significant difference in age (t(98) = 1.52, p = .13).

Musicians were categorized by means of a musical expertise questionnaire (based on Ollen 2006). A musician was defined as an individual who met two or more of the following criteria: (1) they are employed primarily as a musician, (2) they have had a minimum of 10 years of musical training (range 10–46 years; M = 21.04 years, SD = 10.45 years), or (3) they average at least 1–2 h of practice per day (range 1–12 h; M = 2.1 h, SD = 1.88 h). Five musicians were drummers, 8 were pianists, 10 were guitarists, 15 were violinists, 6 played wind instruments (horn, saxophone), and the remaining 6 were singers. Thirty-eight had started musical training before 6 years of age, and the remaining 12 had started musical training before the age of 10.

Non-musicians were defined as those who had never played a musical instrument and did not have any special musical education besides normal school education. Musicians and non-musicians were matched in terms of age, gender, and education level. All participants reported normal hearing and normal or corrected-to-normal vision. None of the musicians reported having absolute pitch. A total of four participants were excluded from the data analysis. Of the musicians, two did not meet the abovementioned criteria we set to be considered “musical experts”, and one was unable to fully perceive the presented auditory material. One non-musician was excluded because she subsequently disclosed that she had learned a musical instrument and thus did not fulfill our non-musician criteria. The final sample comprised 100 participants: 50 musicians (25 female) and 50 non-musicians (25 female).

The experimental protocols were conducted in accordance with the Declaration of Helsinki (1964) and approved by the ethics committee of the Department of Psychology, Humboldt-Universität zu Berlin. Written parental consent was required for underage participants. Participation was remunerated with either course credits (psychology students) or money.

Tasks

JND threshold task

Based on Tervaniemi and colleagues (2005), a two-alternative forced choice (2AFC) experiment was programmed using the method of constant stimuli (Bortz and Schuster 2010) to record the individual JND threshold. The sounds were created using a Soundblaster 16-bit sound card and the GoldWave software (version 4.26, GoldWave Inc.). All sounds had a presentation time of 300 ms including 5 ms rise and 5 ms fall times, and were presented binaurally via speakers at an intensity of 65 decibels. The sounds were presented in pairs of two successive sine tones with a silent interstimulus interval (ISI) of 1400 ms duration, so the presentation of a pair always lasted 2 s. The response time was not limited, but participants were asked to respond as fast and as accurately as possible. The standard sound had a frequency of 528 Hz and was presented randomly in the first or second position of the tone pair. The thirty comparative sounds had frequencies between 529 Hz and 558 Hz in 1-Hz steps (percentage change range .19–5.68%). The individual JND threshold was determined in the 2AFC procedure, in which participants had to decide whether the first or second sine tone was higher in pitch by pressing the corresponding push-button (left or right). The order of the presented tones, as well as the assignment of the push-buttons, was randomized. The change of pitch was adaptive: the distance between the tones of a pair was reduced by 1 Hz after a correct answer and increased by 3 Hz after an incorrect answer (weighted up-down method, Kaernbach 1991). The first pair of sine tones to be assessed had frequencies of 528 Hz and 558 Hz. To familiarize themselves with the task and the handling of the push-buttons, participants performed five practice trials with the 528 Hz and 558 Hz pair and received true/false feedback. 
The individual JND threshold is defined as the lowest tone that a participant can successfully differentiate from the standard sound in the sine tone pairs presented in the experiment. Each participant’s “final” individual JND threshold was tested three times to rule out chance responding.
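The adaptive rule described above can be illustrated with a minimal sketch. It is not the software used in the study: the function names, the simulated listener, and the clamping of the track to the 529–558 Hz range are assumptions made for illustration. Only the standard frequency, the step sizes, and the starting pair come from the text.

```python
import random

STANDARD_HZ = 528      # standard tone (from the text)
MIN_DELTA_HZ = 1       # smallest comparison tone: 529 Hz
MAX_DELTA_HZ = 30      # largest comparison tone: 558 Hz
STEP_DOWN_HZ = 1       # applied after a correct response
STEP_UP_HZ = 3         # applied after an incorrect response


def next_delta(delta_hz, correct):
    """Return the next frequency difference under the weighted up-down rule."""
    if correct:
        delta_hz -= STEP_DOWN_HZ
    else:
        delta_hz += STEP_UP_HZ
    # Assumption: the track is clamped to the comparison tones that exist.
    return max(MIN_DELTA_HZ, min(MAX_DELTA_HZ, delta_hz))


def simulate_listener(true_jnd_hz, n_trials=200, seed=0):
    """Hypothetical listener: always correct at or above true_jnd_hz,
    guessing (50% correct) below it."""
    rng = random.Random(seed)
    delta = MAX_DELTA_HZ          # first pair: 528 Hz vs. 558 Hz
    track = []
    for _ in range(n_trials):
        track.append(delta)
        correct = delta >= true_jnd_hz or rng.random() < 0.5
        delta = next_delta(delta, correct)
    return track


track = simulate_listener(true_jnd_hz=4)
print("last comparison tones:", [STANDARD_HZ + d for d in track[-10:]])
```

With a 1 Hz step down and a 3 Hz step up, a track of this kind settles where the proportion of correct responses p satisfies 3(1 − p) = p, that is, p = .75.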

Oddball paradigm

For the passive oddball paradigm, the sounds from the individual JND threshold task were used. The experiment consisted of three blocks of 15 min each. The standard sound was combined with (1) a deviant sound of 558 Hz, (2) a deviant sound of 535 Hz, or (3) the participant’s individual JND threshold as a deviant sound. This is the first study using individually determined JND threshold values, whereas previous studies used only sample-average JND thresholds as deviant stimuli. The deviant sounds, whose frequencies were chosen to reflect different levels of neural responses, comprised 15% of the tones presented in each block. The order of the blocks was randomized. Each of the 1800 sounds (600 per block) had a length of 300 ms and was followed by an ISI of 300 ms. During the experiment, participants watched a wildlife video with no sound to maintain alertness and minimize movement artifacts. Participants were instructed not to pay attention to the sine tones.
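As an illustration of the block structure described above, the following sketch builds one randomized tone sequence. The function name and the use of a plain shuffle are assumptions; the text states only that deviants made up 15% of the tones and occurred randomly, so any further constraints the authors may have applied (for example, a minimum number of standards between deviants) are not represented.

```python
import random


def make_oddball_block(deviant_hz, n_tones=600, deviant_share=0.15,
                       standard_hz=528, seed=None):
    """Return a shuffled list of tone frequencies for one oddball block."""
    rng = random.Random(seed)
    n_deviants = round(n_tones * deviant_share)
    block = [deviant_hz] * n_deviants + [standard_hz] * (n_tones - n_deviants)
    rng.shuffle(block)   # assumption: unconstrained random order
    return block


block = make_oddball_block(deviant_hz=535, seed=1)
print(block.count(535), "deviants among", len(block), "tones")
```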

Procedure

The entire study lasted about 3–4 h, and took place in the examination rooms of the Department of Cognitive Psychology of the Humboldt-Universität zu Berlin. The tests described in the present study took about 2–2.5 h (Footnote 1). The participants first answered the musical expertise questionnaire, and then the EEG was prepared (Footnote 2). A resting EEG served to capture the spontaneous activity of the brain. Participants were then asked to complete the individual JND threshold task, followed by the oddball paradigm. To ensure that participants remained attentive and motivated, they were given an opportunity for a rest or snack break between single tasks.

Data recording and analyses

The participants sat in an EEG cabin while the experiment was being conducted. Responses and response times were recorded by the computer via push-buttons. The EEG was recorded with the QRefa Acquisition Software, Version 1.0 beta (Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany) from 46 Ag/AgCl electrodes on the scalp and around the eyes of the participant. Forty-two electrodes were mounted in an elastic electrode cap (Easycap GmbH, Germany) at positions FP1, FP2, AF3, AF4, Fz, F3, F4, F7, F8, FC1, FC2, FC5, FC6, FT7, FT8, Cz, C3, C4, T7, T8, A1, A2, CP1, CP2, CP5, CP6, TP7, TP8, Pz, P3, P4, P7, P8, PO3, PO4, PO7, PO8, PO9, PO10, Oz, O1, and O2 of the International 10/20 system. Four EOG electrodes were placed at the outer canthi of both eyes and the supraorbital and infraorbital ridges of the right eye. FPz was used as ground electrode, and impedances were kept below 5 kΩ. The signal was amplified with a PORTI-32/MREFA amplifier (TMS International B.V., Enschede, Netherlands) and sampled at a rate of 500 Hz with an online reference to Cz.

Offline, the data were re-referenced to the mastoid electrodes A1 and A2 and filtered with a band-pass filter according to the common approach in the literature for auditory material (.1–30 Hz, Tanner et al. 2015). The data were divided into epochs of 600 ms including a 100 ms pre-stimulus baseline (see Tervaniemi et al. 2005). Subsequently, artifact rejection was performed: all epochs containing amplitudes above 100 μV or below −100 μV were excluded (Tervaniemi et al. 2005), as such values indicate muscle artifacts. Afterwards, an independent component analysis (ICA) of the EEG data was performed to remove muscle and eye-movement artifacts. The corrected epochs were averaged per participant and condition (standard or deviant stimulus).
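The epoching and amplitude-based rejection steps can be sketched as follows for a single channel. This is an illustrative outline in plain Python, not the EEGLAB pipeline used in the study; the function names are hypothetical, filtering, re-referencing, and ICA are omitted, and applying the ±100 μV criterion after baseline correction is an assumption, since the text does not state the order.

```python
SAMPLING_RATE_HZ = 500   # from the recording description
BASELINE_MS = 100        # pre-stimulus baseline
EPOCH_MS = 600           # total epoch length, baseline included
REJECT_UV = 100.0        # rejection threshold in microvolts


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a number of samples."""
    return ms * SAMPLING_RATE_HZ // 1000


def cut_epoch(signal_uv, onset_sample):
    """Return a baseline-corrected epoch, or None if it is rejected."""
    start = onset_sample - ms_to_samples(BASELINE_MS)
    stop = start + ms_to_samples(EPOCH_MS)
    if start < 0 or stop > len(signal_uv):
        return None                       # epoch runs past the recording
    epoch = signal_uv[start:stop]
    n_base = ms_to_samples(BASELINE_MS)
    baseline = sum(epoch[:n_base]) / n_base
    epoch = [v - baseline for v in epoch]
    # Assumption: the threshold is applied to baseline-corrected data.
    if max(abs(v) for v in epoch) > REJECT_UV:
        return None
    return epoch


def average_epochs(epochs):
    """Average all accepted epochs sample by sample."""
    kept = [e for e in epochs if e is not None]
    return [sum(column) / len(kept) for column in zip(*kept)]


# Synthetic single-channel recording with one artificial artifact.
signal = [0.0] * 2000
signal[1200] = 150.0
epochs = [cut_epoch(signal, onset) for onset in (300, 700, 1150)]
erp = average_epochs(epochs)
print([e is not None for e in epochs], len(erp))
```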

In reference to Tervaniemi and colleagues (2005), the ERP effects were quantified using the mean ERP amplitudes in 50-ms time windows. These were centered on the peak of each component in the waves of average ERP difference. ERP amplitudes were averaged for the MMN and P3a for each region of interest (ROI). The ROIs and the parameters of the time windows are shown in Table 1 for each condition. In reference to Tervaniemi and colleagues (2005), the latencies of the MMN components were not analyzed since visual inspection suggested that the latencies did not remarkably differ between the groups (see Fig. 2).
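The quantification described above (difference wave, peak search, mean amplitude in a 50-ms window centered on the peak) can be sketched as follows. The function names, the search range, and the synthetic data are assumptions for illustration. In the study the windows were centered on the peaks of the average difference waves and are listed in Table 1, whereas this sketch locates the peak in whichever wave it is given.

```python
SAMPLING_RATE_HZ = 500
BASELINE_MS = 100


def difference_wave(deviant_erp, standard_erp):
    """Deviant minus standard ERP, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def sample_to_ms(index):
    """Latency of a sample relative to stimulus onset."""
    return index * 1000 // SAMPLING_RATE_HZ - BASELINE_MS


def mean_amplitude_around_peak(wave, search_ms, polarity, window_ms=50):
    """Return (peak latency in ms, mean amplitude in the window).

    polarity is -1 for a negative component (MMN) and +1 for a
    positive component (P3a).
    """
    lo, hi = search_ms
    candidates = [i for i in range(len(wave)) if lo <= sample_to_ms(i) <= hi]
    peak = max(candidates, key=lambda i: polarity * wave[i])
    half = window_ms * SAMPLING_RATE_HZ // 1000 // 2
    window = wave[max(0, peak - half): peak + half + 1]
    return sample_to_ms(peak), sum(window) / len(window)


# Synthetic example: a triangular negativity peaking 200 ms after onset.
standard = [0.0] * 300
deviant = [-max(0.0, 3.0 - 0.1 * abs(i - 150)) for i in range(300)]
latency, amplitude = mean_amplitude_around_peak(
    difference_wave(deviant, standard), search_ms=(100, 250), polarity=-1)
print(latency, round(amplitude, 3))
```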

Table 1 Time windows (in milliseconds, ms) and regions of interests (ROIs) for ERP quantification separately for each condition

Statistical analyses

All EEG data were analyzed using Matlab version R2014a (The MathWorks, Inc., Natick, Massachusetts, United States) and EEGLAB version 12.0.2.4b. All statistical analyses were performed with the Statistical Package for the Social Sciences Version 22 software program (SPSS Inc., Chicago, USA). Independent sample t tests were calculated for the mean individual JND thresholds and for response times. The main effects of the deviant stimulus condition and musical expertise, as well as their interaction, were tested for the MMN and P3a amplitudes using repeated measures ANOVA with the factors Deviant (individual JND threshold, 535 Hz, and 558 Hz) and Group (musicians, non-musicians). Additionally, Cohen’s d effect sizes were calculated. Post hoc tests were conducted by two-tailed t tests. The Greenhouse–Geisser correction was applied when appropriate. Alpha level was set at .05.
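For orientation, the independent-samples t statistic and Cohen’s d can be computed as in the following sketch. It assumes the pooled-standard-deviation form of both statistics; the text does not specify which variant SPSS was asked to report, and the two small samples are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference (assumed pooled-SD variant)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def independent_t(a, b):
    """Student's t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    t = (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical data, not taken from the study.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, df = independent_t(group_a, group_b)
print(round(t_value, 3), df, round(cohens_d(group_a, group_b), 3))
```

With two groups of 50 participants, the degrees of freedom are 50 + 50 − 2 = 98, and under the pooled-SD form t equals d multiplied by √(50 · 50 / 100) = 5.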

Results

JND threshold task

Descriptive statistics of the JND threshold task are displayed in Table 2. As expected, the individual JND thresholds of musicians (M = 531.1 Hz, SD = 1.83; range 529–537 Hz; percentage deviation from the standard: .59%) were significantly smaller than those of non-musicians (M = 539.8 Hz, SD = 7.26; range 530–554 Hz; percentage deviation from the standard: 2.23%; see Table 2).
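The percentage deviations from the standard follow directly from the reported mean thresholds. The symbols f_JND and f_std are labels introduced here for the mean individual JND threshold and the 528 Hz standard.

```latex
\[
\Delta_{\%} = \frac{f_{\mathrm{JND}} - f_{\mathrm{std}}}{f_{\mathrm{std}}} \times 100
\]
\[
\text{Musicians: } \frac{531.1 - 528}{528} \times 100 \approx 0.59\%,
\qquad
\text{Non-musicians: } \frac{539.8 - 528}{528} \times 100 \approx 2.23\%
\]
```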

Table 2 The individual JND thresholds (in Hz) and mean response times (in milliseconds, ms; SEM in parentheses) in the individual JND threshold task in musicians (M) and non-musicians (NM)

Contrary to the hypothesis, there were no significant differences in response times between musicians and non-musicians (see Table 2). We analyzed the differences in speed and accuracy data between participant groups, but found no significant correlation (i.e., no speed–accuracy trade-off).

Oddball paradigm

MMN

Figure 1 illustrates the grand-average ERP responses to standard and deviant tones in musicians and non-musicians. Figure 2 illustrates the difference waves (deviant minus standard tone ERPs) in musicians and non-musicians. As Figs. 1 and 2 illustrate, the MMN was elicited in both groups by the deviants of 558 Hz and 535 Hz as reflected by the negativity in the frontal components (Footnote 3). The MMN peaked between 160 and 240 ms depending on the extent of the deviant.

Fig. 1 The grand-average ERPs elicited by standard tone (solid blue line) and deviant tones (dashed and dotted green lines) in musicians (top) and non-musicians (bottom). Individual JND thresholds are averaged over participants’ responses for their personal individual JND threshold sounds

Fig. 2 Difference waves (deviant minus standard tone ERPs) in musicians (green line) and non-musicians (blue line) divided according to the three conditions (558 Hz, 535 Hz, individual JND threshold). Individual JND thresholds are averaged over participants’ responses for their personal individual JND threshold sounds

A 2 (musicians vs. non-musicians) × 3 (individual JND threshold vs. 535 Hz vs. 558 Hz) repeated-measures ANOVA on MMN amplitudes was performed. Across both groups, the MMN significantly differed in amplitude depending on the extent of the deviant (main effect Deviant: F(2,98) = 14.20, p < .001). The post hoc comparisons indicate that the MMN amplitude was smaller in the individual JND threshold condition than in the 535 Hz or 558 Hz condition (individual JND threshold vs. 535 Hz: p < .001, d = −.55; individual JND threshold vs. 558 Hz: p < .001, d = −.64; 535 Hz vs. 558 Hz: n.s., p = .29, d = −.13) (Fig. 1; Table 3). There was no significant main effect of Group (p = .78). There was a significant Group × Deviant interaction (F(2,98) = 3.47, p < .05). The post hoc comparisons show no significant differences between musicians and non-musicians in the 558 Hz (p = .75, d = .07) and individual JND threshold condition (p = .13, d = −.31). The only significant group difference is in the 535 Hz condition, with musicians showing larger amplitudes than non-musicians (t(98) = −2.13, p < .05, d = .43) (Fig. 2; Table 3).

Table 3 The mean amplitudes of the MMN and P3a (SEM in parentheses)

Musicians’ averaged individual JND threshold (Footnote 4) of 531.1 Hz was close to the standard sound of 528 Hz. As can be seen in Figs. 1 and 2, there was no MMN elicited in musicians in the individual JND threshold condition (p = .42), even though they were able to distinguish the sounds behaviorally (see “JND threshold task”). Non-musicians, whose averaged individual threshold was 539.8 Hz, predictably showed the MMN (F(1,48) = 5.75, p < .05). Exploratory analyses at the subject level have shown that non-musicians whose individual JND thresholds were close to the standard sound of 528 Hz (530–532 Hz; n = 9) also showed no MMN (p = .38).

P3a

The MMN was followed by a P3a, which peaked between 305 ms and 355 ms. A 2 (musicians vs. non-musicians) × 3 (individual JND threshold vs. 535 Hz vs. 558 Hz) repeated-measures ANOVA on P3a amplitudes was performed. Across both groups, the P3a significantly differed in amplitude depending on the extent of the deviant (main effect Deviant: F(2,98) = 8.19, p < .001). According to the post hoc comparisons, the P3a amplitude was smaller in the individual JND threshold condition than in the 558 Hz condition, but did not differ significantly from that in the 535 Hz condition; the 535 Hz and 558 Hz conditions also differed significantly (individual JND threshold vs. 535 Hz: n.s., p = .21, d = −.17; individual JND threshold vs. 558 Hz: p < .01, d = .34; 535 Hz vs. 558 Hz: p < .001, d = .51) (Fig. 1; Table 3). There was a main effect of Group (F(1,98) = 11.10, p < .001), that is, the P3a amplitude was larger in musicians than in non-musicians (Fig. 2; Table 3). The post hoc comparisons show significant differences between musicians and non-musicians in the 558 Hz (t(98) = 2.07, p < .05, d = −.41) and 535 Hz (t(98) = 3.21, p < .01, d = −.64) conditions, but not in the individual JND threshold condition (p = .25, d = −.23) (Fig. 2; Table 3). The Group × Deviant interaction was not significant (p = .40).

Discussion

The aim of this study was to deepen our insights into neural auditory differences between musicians and non-musicians. To address this issue, an individual JND threshold task and a passive oddball paradigm were applied while recording EEG activity. We investigated if musicians and non-musicians differ in their individual JND thresholds (pure sine tones) and in their neural auditory discrimination as reflected by the amplitudes of the MMN and P3a.

The study yielded the following main results. First, musicians had significantly smaller individual JND thresholds than non-musicians in the JND threshold task. Second, musicians were not faster than non-musicians in the individual JND threshold task. Third, we found an MMN, followed by a P3a for both groups in the 535 Hz and 558 Hz conditions in the oddball paradigm. Fourth, non-musicians, whose individual JND threshold (M = 539.8 Hz) was above the deviant sound of 535 Hz, showed the MMN and P3a if their individual JND threshold was presented as a deviant stimulus. Fifth, we did not find an MMN and P3a in musicians in response to their individual JND threshold condition (M = 531.1 Hz), although they were behaviorally able to differentiate frequencies within their individual JND threshold range.

JND threshold task

The JND threshold task was used to shed more light on the processing of music (pure sine tones) in musicians versus non-musicians. It was assumed that musicians benefit from their expertise and show significantly smaller individual JND thresholds compared to non-musicians. It was also assumed that musicians react faster to the presented sine tone pairs than non-musicians.

In line with our hypothesis, the behavioral results showed that musicians and non-musicians differed in their individual JND thresholds: musicians had smaller individual JND thresholds than non-musicians (531.1 Hz vs. 539.8 Hz, that is, .59% vs. 2.23% deviation from the standard). As assumed, the fine discrimination of pitch differences is part of musical expertise. Numerous studies have argued that the superior performance of musicians in auditory tasks results from the auditory material that is being used, for example, harmonic complex sounds (Micheyl et al. 2006; Nikjeh et al. 2009). Because this material is familiar to musicians, they benefit from their expertise in processing these stimuli (e.g., Altenmüller 2002; Pallesen et al. 2010). In contrast to these studies, the pitch task selected here captured the ability to perceive subtle differences in sine tones. We recorded the JND of the participants, which only a few studies have addressed so far (e.g., Spiegel and Watson 1984; Kishon-Rabin et al. 2001; Tervaniemi et al. 2005). Our findings support the results of Tervaniemi and colleagues (2005), who demonstrated detectable effects of musical expertise on discrimination for sine tones. In contrast to prior studies on sine tones, however, we found no significant difference in response times between musicians and non-musicians (cf. Novitsky et al. 2004; Lang et al. 1990; Tervaniemi et al. 2005). Nevertheless, since musicians had significantly smaller individual JND thresholds than non-musicians, they did benefit to some extent from their musical expertise. In light of this, we examined the data for a speed–accuracy trade-off, but found none. Following Nikjeh and colleagues (2009), we propose that musicians process sine tones differently from non-musicians. The authors found that the “physiological detection of pure sine tones occurred more slowly for musicians than non-musicians” (Nikjeh et al. 2009, p. 442) as reflected in a longer P1 latency. 
They concluded that “musicians may have been slower to detect pure tones because they perceived this audible stimulus energy as irrelevant sensory stimuli” (p. 442), reflecting “a musician’s learned ability to modulate neural sensitivity to incoming irrelevant sensory stimuli” (p. 442).

As musicians were able to detect subtler differences (had smaller JND than non-musicians), the results lead to the conclusion that musicians seem to have superior “basic auditory abilities” (Kishon-Rabin et al. 2001, p. 125), and, therefore, better pitch discrimination skills than non-musicians (Tervaniemi et al. 2005). Further research is needed to analyze whether these superior “basic auditory abilities” (Kishon-Rabin et al. 2001, p. 125) are general in nature or are only restricted to sine tones as other authors argue (e.g., Demany and Semal 2002; Nikjeh et al. 2009). As already mentioned, pitch and tone perception strongly depend on learning (e.g., Menning et al. 2000; Jäncke et al. 2001; Halberda et al. 2008). That is, the JND can be improved by specific pitch discrimination training. However, it cannot be ruled out that better pitch discrimination in musicians could be partially innate and cannot be traced back to training alone. In the case of our study, this assumption is supported by the lack of correlation between years of musical training and JND (p = .69) (cf. Halberda et al. 2008). It could be speculated that other factors such as intensity of training, type of musical education, musical instrument, or mental imagery strategies might play a crucial role, too. Musicians may have smaller JND than non-musicians, but that difference may already have been present before they began musical training.

Oddball paradigm: MMN and P3a

The oddball paradigm was used to shed light on the neural differences in pitch discrimination between musicians and non-musicians as reflected in the MMN and P3a. The behavioral-independent neural responses of the participants were recorded in three blocks with different types of deviants (participants’ individual JND threshold, 535 Hz, and 558 Hz) versus the standard sound of 528 Hz.

In line with our expectations, we found an MMN and a P3a in musicians and non-musicians for the deviant sounds of 535 Hz and 558 Hz. The time windows in which the MMN occurred corresponded to those in the literature (Näätänen 1995) and showed the usual pattern: the MMN occurred earlier and over a longer period of time at larger deviants. The time windows in which the P3a occurred also corresponded to those in the literature (e.g., Čeponienė et al. 2004).

Contrary to our hypothesis, musicians and non-musicians did not significantly differ in their MMN amplitudes across all conditions. The 535 Hz condition was the only one in which musicians showed significantly larger MMN amplitudes compared to non-musicians. In the 558 Hz condition, the deviant-standard difference was greatest, and both musicians and non-musicians were able to easily identify this difference. As a result, musicians and non-musicians did not differ in their MMN amplitudes (see Tervaniemi et al. 2005). As for the P3a, musicians compared to non-musicians showed significantly larger P3a amplitudes in the 535 Hz and 558 Hz conditions. Following Nikjeh and colleagues (Nikjeh et al. 2008), we propose that the P3a “may be a significant index of music experience” (p. 1005) as the P3a distinguishes both groups regarding their “accurate auditory discrimination” (Putkinen et al. 2013, p. 658).

The present data indicate that non-musicians showed the MMN and P3a in the 535 Hz condition, although their individual JND threshold (M = 539.8 Hz) was above the deviant sound of 535 Hz. In the JND threshold task (where they consciously focused their attention on the sine tones) they were unable to distinguish between the two sine tones (535 Hz vs. 528 Hz). However, in the passive oddball paradigm (in which they did not pay attention to the sine tones), this distinction succeeded as mirrored in the MMN. This finding supports the suitability of the MMN as an indicator of the auditory discrimination ability, as the MMN is assumed to occur regardless of attention, motivation or requirement (Näätänen 1995, 2000).

As expected, non-musicians also showed the MMN and P3a when their individual JND thresholds, which at M = 539.8 Hz were sufficiently separated from the standard sound of 528 Hz and above the deviant sound of 535 Hz (as already mentioned above), were presented as the deviant sound. However, the opposite was true for musicians when their individual JND thresholds (M = 531.1 Hz) were presented as the deviant sound. Although musicians were behaviorally able to distinguish the sine tones (individual JND threshold vs. 528 Hz; see JND threshold task), they showed no MMN and P3a in the oddball paradigm for their individual JND thresholds. That is, when musicians put their attention on the sine tones (as in the JND threshold task), they benefited from their musical expertise as they were able to detect subtle differences. On the other hand, if they did not pay attention to the sine tones (as in the oddball paradigm), they did not distinguish between the sine tones as no MMN and P3a were present.

Individual JND thresholds are easily influenced and depend not only on the actual sensitivity of the sensory organ to the difference between two stimuli but also on non-sensory factors. Signal detection theory (Swets et al. 1961) describes how these two kinds of factors can be separated from each other so that the actual performance of the sensory organ can be determined. According to the theory, the recognition of the difference between two frequencies is subject to fluctuations which are influenced by the sensitivity of the sensory system (sensitivity parameter d′) and its activity, and also by the decision strategy (decision criterion c) (Müsseler 2016). The criterion c can be shaped by external influences such as the task instruction, the probability of the response alternatives, and the costs and benefits of giving a certain answer. Therefore, motivation, vigilance, and attention, which can also influence the activity of the sensory system, can take effect (Harvey and Parker 2014). The relationship between the two parameters d′ and c can be represented in the receiver operating characteristic (ROC) curve, in which a curve of constant sensitivity d′ contains all possible response strategies for the respective task. Thus, d′ and c can be considered separately. Whether the individual JND threshold is identified as a deviant in the oddball paradigm depends not only on the sensitivity of the sensory organ but also on the attention and motivation of the participant (Müsseler 2016). In the individual JND threshold task, the attention was explicitly on the sine tone differences, that is, participants directly focused on the sine tones. In the oddball paradigm, where the MMN and P3a were recorded, the attention was explicitly not on the sine tones, because the participants were instructed not to pay attention to them. 
We propose that musicians did not identify their individual JND thresholds due to the “unfocused attention” and, as a result, the MMN and P3a also failed to appear as neural correlates of deviance detection. Control processes that were used to differentiate the sine tones in the JND threshold task were missing here, possibly resulting in the absence of better detection. Exploratory analyses at the subject level showed that non-musicians whose individual JND thresholds were between 530 Hz and 532 Hz, and thus close to the standard sound of 528 Hz, also lacked the MMN and P3a. This suggests that the MMN may not be completely independent of attention in the individual JND threshold range. This corresponds to the findings of Tervaniemi and colleagues (2005), who investigated an “attend” and an “unattend” condition in their experiment. They found that musicians had significantly lower MMN and P3a responses to sine tone differences in the threshold range when no attention was paid to detecting them than when they specifically pursued the task of recognizing them.

Alho et al. (1992) also showed that the MMN was smaller for frequency deviants in the threshold range than for larger deviants when no attention was paid. As soon as attention was focused on detecting deviances, there was no difference in the MMN expression between large and threshold-close deviants. They concluded that attention modulates MMN expression only for small, threshold-close deviants, but this conclusion was not confirmed in another study (Sanju and Kumar 2016). However, unlike the present study, Alho and colleagues (1992) and Sanju and Kumar (2016) used deviant sounds in the sample average threshold range, and not in the individual JND threshold range, which possibly reduced the discrimination performance for some participants and affected the expression of the MMN. Thus, subsequent studies would have to examine the role of attention in the occurrence and magnitude of the MMN for deviant sounds in the individual JND threshold range when participants, especially musicians, direct their attention explicitly to the sine tone differences. Future studies should also consider the attention and motivation of the participants. This could be done within the framework of signal detection theory, based on analyses of the area under the ROC curve, which represents the percentage of correct answers (Harvey and Parker 2014). Thus, the actual sound discrimination skills of the participants would be detectable and, to a greater extent, separable from other influencing factors.
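
As an illustration of the ROC-based analysis suggested above, the following sketch shows two common ways of obtaining the area under the ROC curve (AUC). The first derives it from d′ under the equal-variance Gaussian model, where AUC = Φ(d′/√2). The second integrates empirically measured ROC points with the trapezoidal rule. Both function names and the example inputs are hypothetical and serve only to illustrate the idea; they do not describe an analysis performed in the present study.

```python
from math import sqrt
from statistics import NormalDist


def auc_from_d_prime(d_prime):
    """AUC implied by d' under the equal-variance Gaussian model."""
    return NormalDist().cdf(d_prime / sqrt(2))


def empirical_auc(roc_points):
    """Trapezoidal area under (false_alarm_rate, hit_rate) points.

    The end points (0, 0) and (1, 1) are added automatically.
    """
    points = sorted(set(roc_points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Area of one trapezoid between two neighbouring ROC points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A d′ of 0 yields an AUC of 0.5 (chance level), and the AUC approaches 1 as d′ increases. Because the AUC summarizes performance across all decision criteria, it is less affected by an individual participant's response strategy than a single percentage-correct score.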

Moreover, sine tones might not be the appropriate stimuli for the present research question. For example, studies have shown that natural sounds and a musical context, such as melodies, produce a stronger MMN, with a larger amplitude and shorter latency, than sine tones (Tervaniemi et al. 2000; Pantev et al. 2003; Novitsky et al. 2004). Sine tones may not be complex enough to trigger the MMN (Nikjeh et al. 2008) in the JND threshold range without targeted attention (i.e., if the individual JND threshold is close to the standard). Nonetheless, we deliberately decided to use sine tones because they are not common features of music and thus musicians would not be more familiar with them than non-musicians. Future studies should therefore contrast natural sounds with sine tones to test whether sine tones lack the necessary complexity. In addition, a direct comparison of conditions with versus without attention in the oddball paradigm would be useful to clarify the role of attention in relation to the MMN and P3a in the threshold range.

Conclusion

The present study provides valuable insights into the behavioral and neural differences in pitch discrimination between musicians and non-musicians. Musicians appear to have better differentiation abilities, as indicated by their smaller individual JND thresholds. Furthermore, our findings show that attention may play an important role in triggering the MMN, at least in the JND threshold range. Since this study is, to our knowledge, the first to use the individual JND threshold as a deviant sound in the oddball paradigm, further studies are needed to specify the neural correlates of the individual JND threshold in more detail. The MMN and P3a are suitable indicators of auditory discrimination ability, but additional ERPs may provide further information. For example, Seppänen et al. (2012) suggest the P100, N100, and P200 as good additional indicators for the investigation of auditory discrimination ability, as they are influenced by training in sound discrimination tasks (Brattico et al. 2003). Moreover, attention processes should be considered in the analysis of further ERPs (for example, by comparing attend vs. unattend conditions, see Tervaniemi et al. 2005). Finally, longitudinal studies would be needed to analyze the causal relationships between the individual JND threshold, complex musical processing, expertise, and neural correlates.