Listening to and playing music involves almost all cognitive functions [1, 2], which is reflected in a wide range of rehabilitation and developmental procedures that use music and music-like signals [3–5]. To improve the effectiveness of these procedures, it is of fundamental importance to study the neurocognitive mechanisms of processing the elements and structures of music.

The main aspects of music are the tonal relationships between musical sounds (tonal space), as well as meter and rhythm (the organization of sounds in time). A key factor in the perception of melodies and harmonies is a tonal system of reference, or musical scale. Music of the European tradition is dominated by the diatonic scale, the notes of which differ both in pitch and in the level of attraction to the tonal center, or tonic (the first note of a scale). A basic formula of tonal syntax consists of three triads built on the following diatonic steps of the scale: I-IV-V-I. The reorientation of the diatonic scale from one tonic to another, i.e., the transition from one tonality to another within the same musical composition, is called tonal modulation [6, 7]. Since tones in the diatonic scale differ in the level of attraction to a tonic [8], this tonal hierarchy sets various options for reorienting the tonal schema in the tonal space. Namely, reorientations of the tonal schema differ in the degree of relatedness between the beginning and ending tonalities. The amount of information shared by the beginning and ending tonalities determines the tonal distance, or the degree of relatedness of tonalities: the more tones are replaced in the ending scale with respect to the beginning scale, the greater the distance between the two tonalities and the smaller the amount of information they share.

Previous studies have found that the processing of tonal modulation is associated with the N200 wave, or early right-anterior negativity (ERAN) and early anterior negativity (EAN) [9–11], and with the P600 wave [12, 13]. The N200 wave (ERAN) is associated with processing the level of correspondence between melody and harmony [14] and with the processes of harmonic integration [15]. The P600 wave is associated with the processing of linguistic syntax, since it arises in response to syntactic disturbances in speech, e.g., when the grammar of a sentence is violated or when reading grammatically correct sentences that are built in such a way that their interpretation will be either incorrect or ambiguous [16]. This wave is also recorded in tasks related to the perception of sequences (e.g., sequences of geometric figures), mathematical rules, and violations of tonal syntax in music [13, 17, 18]. The space of tonal relationships, or pitch space [19], is described geometrically as a torus [20], which makes it possible to classify the perception of tonal modulation (the reorientation of a tonal system of reference from one center to another) as a quasi-spatial phenomenon. We assume that the amplitude–time characteristics of these ERP waves reflect the neurocognitive processing of tonal distance during modulation.

The goal of this study was to investigate the amplitude–time characteristics of the N200 and P600 waves depending on the degree of tonal modulation.

METHODS

The study involved 20 volunteers (six men); the average age of the participants was 19.7 ± 2.3 years. None of them was a professional musician or had any special musical education beyond the general school curriculum. Before taking part in the experiment, each subject was familiarized with the procedure and signed a voluntary informed consent form.

The Stimulus Material

To carry out the experiment, we used a number of harmonic progressions (Fig. 1) modulating to the subdominant (near modulation that leads to the replacement of one tone in the beginning scale), minor sixth (distant modulation that leads to the replacement of four tones in the beginning scale), and tritone (distant modulation that leads to the replacement of six tones in the beginning scale). A non-modulating progression (zero tonal distance) was used as a baseline to assess the modulation effect. Each harmonic progression consisted of eight chords. The progressions were alike in rhythm, tempo, duration (9.6 s), sound intensity, timbre, and frequency range. All progressions were in a major mode. Modulation fell on the last chord of the progressions. In addition, a second set of stimuli was prepared that consisted of the same progressions, but with a change of timbre on the fourth chord.
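The tone counts given for the three modulations (one, four, and six) coincide with the shortest distance between the two major keys on the circle of fifths, i.e., with the number of accidentals by which their key signatures differ. The sketch below illustrates this reading; it is an assumption about how the counts can be formalized, not a procedure described by the authors, and the function names are hypothetical.

```python
# Illustrative sketch only: the circle-of-fifths measure and all names here
# are assumptions, not taken from the study.

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale

def fifths_distance(semitones_up):
    """Shortest number of circle-of-fifths steps between two major keys
    whose tonics lie `semitones_up` semitones apart."""
    steps = (7 * semitones_up) % 12  # 7 is its own inverse modulo 12
    return min(steps, 12 - steps)

def replaced_pitch_classes(semitones_up):
    """Pitch classes of the starting major scale that are absent from the
    target major scale (enharmonic equivalents count as the same tone)."""
    start = set(MAJOR_SCALE)
    target = {(pc + semitones_up) % 12 for pc in MAJOR_SCALE}
    return len(start - target)

# Tonic shifts in semitones for the four progression variants
MODULATIONS = {"none": 0, "subdominant": 5, "minor sixth": 8, "tritone": 6}

for name, shift in MODULATIONS.items():
    print(name, fifths_distance(shift), replaced_pitch_classes(shift))
```

The two measures agree for the non-modulating, subdominant, and minor-sixth variants (0, 1, and 4). For the tritone, the key-signature count is 6, while the enharmonic pitch-class count is 5, because one altered tone of the new key (E sharp when starting from C major) sounds the same as a tone of the old key (F).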

Fig. 1.

Musical notation of the harmonic progressions used in the experiment. (a) The harmonic progression without modulation. Progressions with modulation: to the (b) subdominant (near modulation, replacement of one tone in the beginning scale), (c) minor sixth (distant modulation, replacement of four tones in the beginning scale), and (d) tritone (distant modulation, replacement of six tones in the beginning scale). Modulation fell on the last chord of the progressions. The frame highlights the modulating chords. The arrows mark the duration of the progression chords before the modulating chord, the duration of the modulating chord, and the duration of the inter-stimulus interval.

The Experiment Procedure

The experiment procedure consisted of the following stages: instruction of the subject; a training session, during which the participants were asked to listen to harmonic progressions with different degrees of modulation; and an experimental session, during which the participants were asked to count the number of fragments with timbre changes in the harmonic progressions. During the experimental session, the participants listened to a set of harmonic progressions that consisted of 240 progressions without change in timbre (60 for each progression variant) and 60 progressions with changes in timbre (15 for each progression variant); the ratio of the progressions was 240 : 60. All harmonic progressions were presented in random order; the interval between the stimuli ranged from 1.5 to 2 s. Together with the duration of the progressions (9.6 s), the interval between modulating chords therefore varied in the range of 11.1–11.6 s. During the experimental session, the participants took a break every 12 min, during which they could relax and report to the experimenter the number of fragments with changes in timbre. The harmonic progressions were presented through headphones. The experiment was prepared using the “Presentation” software (version 19.0, Neurobehavioral Systems, Berkeley, CA, United States, www.neurobs.com).
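The trial structure can be checked with a short sketch. It takes the stated totals (240 and 60 progressions), the four progression variants, the 9.6-s progression duration, and the 1.5–2 s inter-stimulus interval as given; the equal split across variants, the uniform distribution of the interval, and all names are assumptions made for illustration, not details of the actual “Presentation” script.

```python
import random

# Values taken from the text
VARIANTS = ["none", "subdominant", "minor sixth", "tritone"]
N_PLAIN_TOTAL = 240      # progressions without a timbre change
N_TIMBRE_TOTAL = 60      # progressions with a timbre change
PROGRESSION_S = 9.6      # duration of one progression, seconds
ISI_RANGE_S = (1.5, 2.0)  # inter-stimulus interval, seconds

def build_trials(seed=0):
    """Return a shuffled list of (variant, timbre_change, isi_seconds).

    Assumptions: both totals are split equally across variants and the
    inter-stimulus interval is drawn uniformly from the stated range.
    """
    rng = random.Random(seed)
    trials = []
    for variant in VARIANTS:
        trials += [(variant, False)] * (N_PLAIN_TOTAL // len(VARIANTS))
        trials += [(variant, True)] * (N_TIMBRE_TOTAL // len(VARIANTS))
    rng.shuffle(trials)
    return [(v, changed, rng.uniform(*ISI_RANGE_S)) for v, changed in trials]

trials = build_trials()
# The modulating chord is always the last chord, so the time between two
# successive modulating chords is one progression plus one interval.
onset_gaps = [PROGRESSION_S + isi for _, _, isi in trials]
print(len(trials), round(min(onset_gaps), 2), round(max(onset_gaps), 2))
```

With these inputs the list contains 300 trials, and every gap between successive modulating chords lies between 11.1 and 11.6 s, which is the range reported in the EEG Processing section.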

The Parameters of Electroencephalogram Recording

The electroencephalogram (EEG) was recorded using an EEGA-21-26 Encephalan-131-03 electroencephalograph-analyzer (Medicom MTD, Taganrog, Russia). Recording was carried out from 19 electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2), which were located according to the standard 10–20 scheme, with reference electrodes on the earlobes and a grounding electrode at the vertex. The sampling rate was 250 Hz. The following filtering parameters of the original signal were used: a high-pass filter with a cut-off frequency of 0.5 Hz, a low-pass filter with a cut-off frequency of 70 Hz, and a notch filter at the mains frequency (50 Hz). Impedance values did not exceed 10 kΩ. Vertical and horizontal electrooculograms (EOG) were recorded, as well as an electrocardiogram (ECG) in the first standard lead. EEG recording was carried out with the subject sitting in a chair with an adjustable backrest, armrests, and seat height. During recording, the subjects sat in total silence in a dark room with their eyes closed.

EEG Processing

EEG recordings were processed using the Brainstorm software (version 161110), which is distributed under an open license agreement [20]. To eliminate the effect of the artifacts caused by eye movements, the EOG channels were analyzed with a 200-ms sliding window, and samples with a standard deviation above 35 µV were excluded from further analysis. To eliminate the artifacts caused by movements and signal drift, each EEG channel was analyzed with a 500-ms sliding window, and samples with a standard deviation above 35 µV were excluded. The fragments that remained after the automatic filtering were inspected visually by the experimenters. A low-pass filter with a cutoff frequency of 35 Hz was then applied. According to the results of post-processing of the recording, the first subject was excluded from further analysis due to the large number of samples that contained artifacts caused by eye movement. The filtered signal of each subject was averaged for each stimulus type, time-locked to the onset of the modulating chord of a progression. The time between the chords in response to which ERPs were recorded was 11.1–11.6 s. The average amplitude and fractional latency parameters [22] were calculated for each averaged wave in the intervals of 150–250 and 550–650 ms after the onset of the modulating chord. The fractional latency is the time point that divides the area under the curve into two specified parts (in this case, 50% was used) in the corresponding time range. A multifactor repeated-measures analysis of variance (ANOVA) was carried out with the following factors: Modulation Distance, Zone, and Laterality. The Modulation Distance factor included the four modulation steps described above.
The Laterality factor included the values obtained for the electrodes, which were grouped by five lateral axes: axis 1 comprised F7, T3, T5; axis 2 comprised F3, C3, P3; axis 3 comprised Fz, Cz, Pz; axis 4 comprised F4, C4, P4; axis 5 comprised F8, T4, T6. The Zone factor included the values obtained for the electrodes, which were grouped by three zones: the frontal zone comprised F7, F3, Fz, F4, F8; the central zone comprised T3, C3, Cz, C4, T4; the parietal zone comprised T5, P3, Pz, P4, T6. After ANOVA, the Greenhouse–Geisser correction and Bonferroni correction were used. Statistical processing was carried out using the Microsoft Excel 2010 and Statistica 6.1 software.
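The three signal measures described above (sliding-window rejection, average amplitude, and 50% fractional latency) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Brainstorm implementation: the use of the population standard deviation, the flagging of every sample in an offending window, the rectification of the wave by polarity before the area is computed, and all function names are hypothetical choices.

```python
from statistics import pstdev

FS = 250  # sampling rate in Hz, as stated for the recording

def flag_noisy_samples(signal, win_ms, threshold, fs=FS):
    """Flag every sample that lies in a sliding window whose standard
    deviation exceeds `threshold` (same amplitude units as `signal`)."""
    n = round(win_ms * fs / 1000)  # window length in samples
    flagged = [False] * len(signal)
    for start in range(len(signal) - n + 1):
        if pstdev(signal[start:start + n]) > threshold:
            for i in range(start, start + n):
                flagged[i] = True
    return flagged

def mean_amplitude(erp, times_ms, t0, t1):
    """Average amplitude of the averaged wave between t0 and t1 (ms)."""
    values = [v for t, v in zip(times_ms, erp) if t0 <= t <= t1]
    return sum(values) / len(values)

def fractional_area_latency(erp, times_ms, t0, t1, fraction=0.5, polarity=1):
    """Time point at which the cumulative area in [t0, t1] first reaches
    `fraction` of the total area.

    Assumption: only the deflection of the requested polarity contributes
    (polarity=-1 for a negative wave such as N200, +1 for P600).
    """
    window = [(t, max(polarity * v, 0.0))
              for t, v in zip(times_ms, erp) if t0 <= t <= t1]
    total = sum(area for _, area in window)
    if total == 0:
        return None  # no deflection of this polarity in the window
    running = 0.0
    for t, area in window:
        running += area
        if running >= fraction * total:
            return t
    return window[-1][0]

# Example on synthetic data: a symmetric positive peak centred at 600 ms
times = list(range(0, 800, 4))                      # 4-ms steps at 250 Hz
erp = [max(0.0, 10 - abs(t - 600) / 4) for t in times]
print(fractional_area_latency(erp, times, 550, 650))
```

For a symmetric peak, the 50% fractional latency coincides with the peak itself, which is why this measure is less sensitive to noise than picking the single largest sample.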

RESULTS

The subjects detected timbre changes while listening to the fragments with modulations with an accuracy of 95% correct answers.

For the average amplitude of the N200 wave in the time interval of 150–250 ms, the analysis of variance showed a significant effect of the Modulation Distance factor (F (3, 810) = 37.957, p ≤ 0.001, ph ≤ 0.001; here and below, ph means the p value after applying the Greenhouse–Geisser correction) and a significant interaction of the Modulation Distance and Laterality factors (F (12, 810) = 3.503, p ≤ 0.001, ph ≤ 0.008). The amplitude of the N200 wave was found to decrease when listening to the fragments with modulations compared to the zero-step modulation. These changes were the most pronounced on the central axis Fz, Cz, Pz (Fig. 2). No significant differences between the amplitudes of the N200 wave for the fragments with modulating chords were identified. The ANOVA of fractional latency parameters for the time interval of 150–250 ms revealed a significant effect of the Modulation Distance factor (F (3, 810) = 4.736, p ≤ 0.002, ph ≤ 0.03). The average amplitudes of the N200 wave are given in Table 1.

Fig. 2.

Averaged ERP waves in the areas of the most pronounced changes in N200 and P600 during listening to modulations (a) with zero tonal distance, with modulation to (b) the subdominant, (c) minor sixth, and (d) triton. The dotted frame indicates the time interval of 150–250 ms; the solid frame marks the interval of 550–650 ms. A low pass filter with a cutoff frequency of 35 Hz was used.

Table 1.   Average values of the amplitude of N200 and P600 waves during listening to the progressions with different distances of tonal modulation

The analysis for the time interval of 550–650 ms revealed a significant effect of the Modulation Distance factor (F (3, 810) = 12.933, p ≤ 0.001, ph ≤ 0.001) and a significant interaction of the Modulation Distance and Zone factors (F (6, 810) = 23.050, p ≤ 0.001, ph ≤ 0.001). In the frontal zone, the amplitude of the P600 wave increased with the modulation distance (Fig. 3). An inverse relationship was obtained in the parietal zone, where the amplitude of the P600 wave decreased with increasing modulation distance. The analysis of variance of fractional latency parameters for the time interval of 550–650 ms did not reveal significant effects for any of the factors. The average amplitudes of the P600 wave are given in Table 1.

Fig. 3.

Topographic amplitude map of the N200 and P600 waves during listening to chords without modulations, with modulation to the subdominant, minor sixth, and tritone. The amplitude of the waves is displayed at time points of (a) 200 ms for the N200 wave and (b) 596 ms for the P600 wave. A low-pass filter with a cutoff frequency of 35 Hz was used.

DISCUSSION

The amplitude of the N200 wave during processing of the modulating chords was lower than the wave amplitude for the non-modulating progression. This differs from the results of previous studies [9, 10], which found that the N200 amplitude was higher during the perception of tonal modulations.

The result may be due to several factors. The first factor is that we used unprepared modulations (interrupted cadences), while the previous studies used prepared modulations that were introduced by three or four transition chords [9]. The second factor is that our experiment had gradations of modulations according to their degree (zero, near, and distant), while in other studies the authors assessed the effects of modulations without taking into account the tonal distance between the beginning and ending tonics [9, 10]. It can also be assumed that the increase in the N200 wave amplitude in response to the stimuli without modulation is associated with the oddball paradigm, in which the standard stimuli for the subjects were modulating progressions (to the subdominant, minor sixth, and tritone), and the deviant stimuli were the stimuli without modulation (zero tonal distance). In this case, the observed increase in the amplitude can be attributed to mismatch negativity upon presentation of deviant stimuli (without modulation) with a frequency of 25%. This hypothesis is also supported by the fact that the observed effect was manifested mainly in the fronto-central leads [23]. This suggests that the neurocognitive processing of the degree of tonal modulation (near or distant in tonal space) occurred at later times compared to “rough” deviations in the structure of musical phrases, such as the presence or absence of modulation as such; these earlier reactions to the presence/absence of modulation were reflected by the N200 wave.

The study also revealed that the amplitude of P600 depends on the degree of modulation: the greatest amplitude fell on a suddenly appearing triad on the tritone (distant modulation), and the smallest amplitude fell on the ending triad in phrases without modulation (zero tonal distance). The amplitudes of P600 in response to the modulations to the subdominant and minor sixth did not differ significantly from each other; they were significantly smaller than the amplitude of the wave during modulation to the tritone and significantly larger than in the absence of modulation. These amplitude changes were the most pronounced in the frontal leads; i.e., the amplitude of the P600 wave forms a gradation from the smallest amplitude at zero distance to the maximum amplitude for modulation to the tritone, with an intermediate amplitude for modulation to the near subdominant and minor sixth (a major tonality on the ascending minor sixth, i.e., diatonic step VI, belongs to the subdominant sphere).

The basic assumptions about the interpretation of the P600 wave come from the field of neurolinguistics, since this wave is most often observed when various syntactic anomalies occur. Coulson et al. [24] suggest that the P600 wave reflects the processing of “improbable” (atypical) syntactic elements of a sentence and is similar in its mechanism to the P3b wave, which is associated with the processes of identifying and matching external events with internal representations, since the wave arises in the case of deviations of a context-significant event from a predicted result. It was shown that the amplitudes of the P600 and P3b waves were sensitive to changes in the probability of presenting syntactic distortions in sentences and the degree of these distortions, and a similar spatial distribution of the P600 and P3b waves was recorded in this case. The authors also note that the P600 wave is not specific to modality and cannot be directly correlated with syntax processing; hence, they suggest that the P600 wave occurs when the mental schema that reflects the patterns of perception of various stimuli is disturbed.

Therefore, linguistic syntax serves as such a mental schema in speech, the rules of tonal harmony form a mental schema in music, and the laws of mathematics make up a mental schema in mathematics. In relation to our results, the P600 wave may reflect the degree of violation of the rules of harmony during an unprepared transition into another tonality, which may be either near or distant in relation to the beginning tonality. It is important to note that the topography of the P600 wave in our experiment was different from that of the analogous wave obtained in the experiments of Coulson et al. In our experiment, the P600 wave was the most pronounced in the frontal leads, while in the experiments of Coulson et al. it was observed in the parietal and occipital leads. This may be due to differences in the experimental tasks: in our experiment, listening to fragments with modulations was passive and took place against the background of the task of identifying changes in timbre, while in the experiments of Coulson et al. the subjects read sentences presented on a monitor screen and had to answer questions about the comprehension of what was read.

Kaan et al. [25] suggest that the P600 wave reflects a number of characteristics of syntactic processing. It is assumed that the P600 wave in the parietal and occipital regions reflects the complexity of syntactic processing, whereas the frontal P600 wave is associated with the degree of ambiguity and complexity of syntactic processing. In this case, the authors attribute the occurrence of the P600 wave in the occipital regions to the processes of “recovery” and “revision”. The “revision” is understood by the authors as repeated syntactic analysis in situations of processing a structure that is syntactically correct but ambiguous in meaning. The “recovery,” in turn, reflects the reconstruction of a grammatically incorrect sentence and the creation of its syntactically correct representation. The conclusion is made that the P600 wave is associated with the emergence of ambiguity of resolution or with the increasing complexity of semantic analysis. In our experiment, we observe a similar topography of the P600 wave during listening to modulations; the modulation distance in this case reflects the degree to which musical syntax (tonal harmony rules) is violated. We can assume that the degree of tonal modulation is a feature that characterizes the level of complexity of syntactic analysis in music by analogy with syntactic and semantic processing, although they cannot be directly equated, since the syntaxes of language and music are based on different morphological principles.

Gouvea et al. [26] investigated the amplitude–time characteristics of the P600 wave during processing different types of linguistic syntactic violations: grammar violations, grammatically correct sentences that contain semantic violations that are detected after their complete reading, and sentences that contain interrogative pronouns. It may be noted that the topology of P600 wave distributions for all types of linguistic syntactic deviations is the same, namely, the wave is observed in the parietal and occipital leads. The authors suggest that P600 reflects the complexity of the structural relationships in the formation of the mental schema for processing homogeneous syntactic elements and that P600 may be associated with the processes of assessing the compliance of new elements with the already formed syntactic schema. Meanwhile, the latency of P600 reflects the time required to process the elements that form a syntactic structure, and the amplitude and duration of the P600 wave can be regarded as a function of creating and destroying the mental schema of syntactic relationships.

According to the results of our study, we can assume that the P600 wave is associated with the processes of creating the mental representation that reflects the laws of harmony and the amplitude of the wave reflects the degree of mismatch of the elements processed with this representation.

CONCLUSIONS

(1) The amplitude of the N200 wave is associated with the processing of tonal harmony, and this amplitude decreases when processing tonal modulations, regardless of the degree or distance of modulation. Changes in the amplitude of the N200 wave are the most pronounced on the central axis Fz, Cz, Pz.

(2) The increase in the amplitude of P600 in the frontal leads is due to the increase in tonal distance between the beginning and ending tonalities during modulation, which suggests that the amplitude of P600 is related to the level of difficulty in processing changes in the tonal schema, which are contingent on the degree of reorientation from one tonality to another.