1 Introduction

Magnetoencephalography (MEG) is a noninvasive technique for investigating neuronal activity in the living human brain and has been used to study human auditory function since 1990. MEG is well suited to auditory research because of two major advantages. First, MEG does not produce any sound during measurement. The neuromagnetometer is a completely passive instrument that measures only the magnetic flux produced by neuronal currents in the brain, without emitting radiation, particles, or electromagnetic waves. For auditory researchers, it is a great advantage that MEG can record neural activity in complete silence, so that the acoustic stimulation delivered to the subjects is not contaminated by machine noise. Second, MEG can clearly resolve the signals produced by the auditory cortices located bilaterally in the temporal lobes. Electric potentials produced by the auditory cortex in each hemisphere are more difficult to resolve, because the scalp potential is symmetrical even when the auditory cortex on only one side is activated. Moreover, for anatomical reasons, auditory stimulation cannot activate the auditory cortex on one side alone. Auditory information from each ear reaches the auditory cortices on both sides with contralateral weighting, and the contributions of the two ears cannot be separated without specialized experimental conditions such as frequency tagging [17]. Furthermore, the electric potential reflects activity in subcortical regions as well as in the auditory cortex, whereas MEG predominantly reflects cortical responses. Thus, MEG is an indispensable tool for noninvasively measuring and analyzing neural signals from the auditory cortices with high temporal and spatial resolution.

A considerable number of studies on the human auditory system have been reported so far, addressing selective attention [39], auditory stream segregation [81], scene analysis [20], and informational masking. In this chapter, we briefly review clinical MEG studies of basic auditory function from the viewpoint of experimental paradigms and describe our recent attempts to extend these paradigms in ways specifically suited to auditory research.

2 Repetition

Simple averaging of the evoked responses to repeated stimuli (e.g., a click, tone, or noise burst) has long been used in clinical studies. In adults, the most prominent deflection of the auditory evoked fields (AEFs) to tone bursts of sufficient duration, presented at interstimulus intervals long enough for response recovery, is the N1m (the magnetic counterpart of the N1 potential), which peaks approximately 100 ms after stimulus onset [21, 33, 44, 51]. Although clicks or tone bursts with a rapidly rising envelope can elicit earlier magnetic responses such as brain stem [15] and middle latency responses [40], their clinical application has been limited [31].

The AEF components, as well as auditory evoked potentials (AEPs) [19], are known to change with maturation [25, 84]. In children up to the age of 7 years, the most prominent AEP component is P1, from which the later components N1 and P2 branch off (Fig. 5.1). The P1 (P1m) response shows a progressive and rapid decrease in latency with white matter maturation (i.e., myelination), which is reflected in increasing fractional anisotropy (FA) in diffusion tensor imaging. The white matter FA in the acoustic radiations of the auditory pathway, from the medial geniculate nucleus in the thalamus to the primary auditory cortex, is negatively correlated with P1m latency [60]. This latency shortening is delayed in children with autism [61, 62].

Fig. 5.1

(a) Maturation of auditory evoked responses (Revised from [19]). Evoked potentials were recorded at Cz referenced to the right mastoid from normal-hearing participants (N = 8–12 for each age group), while a 23-ms speech syllable was presented with an interstimulus interval of 2 s. Averaged responses were filtered with a 4–30 Hz band-pass filter and grand-averaged within each age group. (b) Developmental latency shortening of P1m reported in two previous studies (Revised from [25] and [84])

The source of N1m responses to pure tones in the left hemisphere is known to be located posterior (~14 mm) to that in the right hemisphere [14], which results at least partially from the larger planum temporale in the left hemisphere. A significant reduction of this right-sided N1m anteriority, consistent with neuroanatomical deviation, was first reported in patients with schizophrenia [63]. More recently, a similar reduction was found in patients with autism, and an association between language functioning and the degree of asymmetry was suggested [68].

Gamma-band oscillatory activity in MEG has been regarded as a reliable marker of cognitive function. Oscillatory activity can be evoked or induced by a variety of cognitive tasks. In the auditory modality, repeated click trains have often been used as stimuli to record the auditory steady-state response (ASSR), which most often shows a resonant frequency at approximately 40 Hz [50, 54]. Previous MEG studies reported a reduced gamma-band ASSR in patients with chronic schizophrenia [76] and in patients with bipolar disorder [49].

3 Paired Repetition

The paired-click paradigm has been used in clinical research since the 1980s. Sensory gating, which is considered to represent a very early stage of attention, has been studied with this paradigm [2]. Identical clicks (or short sounds) are presented as a pair “S1-S2,” typically separated by 0.5 s, at intertrial intervals of several seconds to ca. 10 s [48]. In normal subjects, the responses to S2 are significantly reduced compared with those to S1, which has been interpreted as reflecting a sensory gating function that suppresses irrelevant incoming sensory inputs. The most reliable neurophysiological index of this response reduction is considered to be the P50 (or its magnetic counterpart, the P50m), which is generated near the primary auditory cortex and peaks approximately 50 ms after stimulus onset. In patients with schizophrenia, this amplitude reduction is less pronounced than in normal subjects, which has been thought to account for cognitive symptoms such as difficulty focusing caused by sensory overload.
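The response reduction described above is commonly summarized as a gating ratio, that is, the S2 amplitude divided by the S1 amplitude, so that smaller values indicate stronger gating. The following Python sketch illustrates the arithmetic on made-up averaged waveforms. The function names, the 40–80 ms search window around the P50/P50m latency, and the toy values are assumptions for illustration and are not taken from the studies cited here.

```python
def peak_amplitude(waveform, times_ms, window_ms=(40.0, 80.0)):
    """Return the largest absolute amplitude inside a latency window.

    The 40-80 ms default window is an assumed search range around P50/P50m.
    """
    lo, hi = window_ms
    in_window = [abs(v) for v, t in zip(waveform, times_ms) if lo <= t <= hi]
    if not in_window:
        raise ValueError("no samples fall inside the latency window")
    return max(in_window)


def gating_ratio(s1_waveform, s2_waveform, times_ms, window_ms=(40.0, 80.0)):
    """Ratio of the S2 peak to the S1 peak; smaller means stronger gating."""
    s1_peak = peak_amplitude(s1_waveform, times_ms, window_ms)
    s2_peak = peak_amplitude(s2_waveform, times_ms, window_ms)
    if s1_peak == 0:
        raise ValueError("S1 peak amplitude is zero; the ratio is undefined")
    return s2_peak / s1_peak


# Toy averaged responses sampled every 10 ms from 0 to 100 ms.
# The values are invented and carry arbitrary units.
times_ms = [10.0 * i for i in range(11)]
s1 = [0.0, 0.5, 1.0, 2.0, 6.0, 10.0, 6.0, 2.0, 1.0, 0.5, 0.0]
s2 = [0.0, 0.2, 0.5, 1.0, 2.5, 4.0, 2.5, 1.0, 0.5, 0.2, 0.0]

ratio = gating_ratio(s1, s2, times_ms)
print(f"gating ratio (S2/S1): {ratio:.2f}")
```

With this definition, a ratio close to 1 corresponds to little suppression of the S2 response, which is the direction described as larger (worse) in the patient studies.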

The sensory gating effect has also been studied using MEG [8, 9, 13, 24, 27, 78]. The sensory gating deficit in schizophrenia was replicated with MEG, with left hemispheric dominance [74], and was presumed to result partially from a complex alteration of information processing [55] and an impaired gating process associated with alpha oscillation [43]. In the schizophrenia group, anterior hippocampal volume was smaller, and both the P50 and M50 gating ratios were larger (worse) than in controls [75]. Patients with schizophrenia showed significantly higher P50m gating ratios to human voices specifically in the left hemisphere.

Moreover, patients with higher left P50m gating ratios showed more severe auditory hallucinations, while patients with higher right P50m gating ratios showed more severe negative symptoms [23]. Acoustic startle prepulse inhibition (PPI) is known as another index of sensory (sensorimotor) gating. However, previous studies found little correlation between the two measures, suggesting independent aspects of brain inhibitory functions [7, 26, 28].

Several MEG studies were conducted to investigate the pathophysiology of stuttering [4, 6, 66, 67, 77]. Using this paired-click paradigm, a previous study reported that stutterers exhibited impaired left auditory sensory gating and expanded tonotopic organization in the right hemisphere, which was consistent with a significant increase in the gray matter volume of the right superior temporal gyrus revealed by voxel-based morphometry [37]. The degree of PPI, on the other hand, was not correlated with the effect of altered auditory feedback on stuttering [3].

In this paradigm, the inter-pair interval is the only variable parameter. The predictability of the timing of S1 and S2 is quite asymmetric: the forthcoming S2 appears 0.5 s after the most recent S1, whereas S1 appears several seconds to ca. 10 s after the most recent S2. Previous studies on temporal reproduction and sensorimotor synchronization revealed a hierarchically organized chronometric function in humans [42, 56]. The neural substrates for auditory temporal prediction in this time range may involve a cortical and subcortical temporal processing network (Fig. 5.2) [69]. The function assessed by this paradigm may therefore include not only attention-mediated gating but also temporal processing.

Fig. 5.2

A histogram of timing errors in a behavioral task of time-interval reproduction in a representative subject. Intervals of 0.25 and 0.5 s were well reproduced, whereas the timing error increased drastically when the interval became longer than 1 or 2 s. Behavioral performance in temporal reproduction correlated with neuromagnetic measures of P1m gating (unpublished data)

4 Oddball Paradigm

The oddball paradigm (first used by Squires et al. [72]) has been used extensively in a wide range of MEG and EEG studies to probe the human auditory system. A large part of the studies on human auditory function using oddball paradigms has focused on the mismatch negativity (MMN) [46], which has been considered an index of pre-attentive change detection processes occurring in the absence of directed attention. The conventional MMN is defined as a differential event-related component superimposed on the response to oddballs. The MMN can hence be extracted by subtracting the responses to the standard stimuli from those to the rare stimuli in oddball paradigms. The MMN is one of the most promising neurophysiological candidates for a biomarker of mental illness, such as schizophrenia (first reported by Shelley et al. [70]) and Alzheimer’s disease (first reported by Pekkonen et al. [53]).
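The subtraction that extracts the MMN can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each trial is a list of samples on a common time axis, all names are hypothetical, and the toy numbers are invented. Real analyses add steps such as artifact rejection, filtering, and baseline correction, which are omitted here.

```python
def average(trials):
    """Average equally long single-trial waveforms sample by sample."""
    if not trials:
        raise ValueError("need at least one trial")
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def difference_waveform(rare_trials, standard_trials):
    """Averaged response to rare stimuli minus averaged response to standards."""
    rare_avg = average(rare_trials)
    standard_avg = average(standard_trials)
    if len(rare_avg) != len(standard_avg):
        raise ValueError("rare and standard epochs must have the same length")
    return [r - s for r, s in zip(rare_avg, standard_avg)]


# Toy epochs with four samples each (invented values, arbitrary units).
standard_trials = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 3.0, 4.0, 1.0],
]
rare_trials = [
    [0.0, 2.0, 6.0, 3.0],
    [0.0, 2.0, 4.0, 1.0],
]

mmn = difference_waveform(rare_trials, standard_trials)
print(mmn)
```

Averaging each condition before subtracting means that the two conditions may contain different numbers of trials, as is always the case in an oddball sequence.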

Previous studies showed that the MMNm (mismatch fields; the magnetic counterpart of the MMN) produced by speech sounds rather than tonal stimuli exhibited a more marked reduction in schizophrenia [35]. The power of the speech-sound MMNm in the left hemisphere was positively correlated with the gray matter volume of the left planum temporale in patients with schizophrenia, implying that a significant reduction of the gray matter volume of the left planum temporale may underlie functional abnormalities of fundamental language-related processing in schizophrenia (Fig. 5.3) [82]. The latency of the speech-sound MMNm in adults with autism was prolonged compared with that in normal subjects [34]. The prolonged peak latency of the speech-sound MMNm was replicated in children with autism [59], and the delay was most evident in those with concomitant language impairment, which may validate the speech-sound MMNm as a biomarker for pathology relevant to language ability.

Fig. 5.3

Scatterplot depicting a correlation between the phonetic MMN power in the left hemisphere and the gray matter volume of the left planum temporale in patients with schizophrenia (n = 13). (Revised from [82])

MMN generation is regulated through the N-methyl-D-aspartate-type glutamate receptor [30], a view supported by a recent study showing that the MMNm was modulated by genetic variations in metabotropic glutamate receptor 3 (GRM3) in healthy subjects [36]. People with a variant of the GRM3 gene are known to be at increased risk of developing bipolar disorder as well as schizophrenia [32]. The MMNm in the right hemisphere under the pure-tone condition was significantly delayed in patients with bipolar disorder [73]. The MMNm elicited by pitch deviance could be a potential trait marker reflecting the global severity of bipolar disorder [71]. Recent advances in this line of research can be found elsewhere [47].

In basic research, it was found that ensembles of physically different stimuli sharing the same abstract feature can produce an MMN (abstract-feature MMN) [64]. Error signals in a predictive processing framework may include MMN-like activity produced by cross-modal oddball paradigms involving visual-auditory [22, 79, 85, 87] and motor-auditory [86] links. Positioned between simple stimulus repetition and the oddball paradigm, the roving standard paradigm has been receiving attention [10]. Using this paradigm, a recent study proposed that two separate mechanisms are involved in auditory memory trace formation [58].

5 Higher-Order Stochastic Sequence

Predictive coding frameworks of perception [16] suggest that most of the stimulus sequences used in AEF studies may have been too simple for the brain, which is a highly predictive organ. In other words, most event-related response studies may reflect a brain that has already adapted to (i.e., learned) the experimental paradigm (i.e., its contrivances). Indeed, rule learning is achieved very rapidly even when complex rules are embedded in the sequence [5, 45].

The Markov chain [41] is one way to control statistical rules and to systematically expand the scope of research on stochastic information processing in the auditory system. The Markov chain model is a specific form of variable-order Markov model, a class that has been applied in a wide variety of research fields such as information theory, machine learning, and human learning of artificial grammars [57]. The Markov property states that the next state depends only on the current state and not on the sequence of events that preceded it.

A conventional oddball paradigm, in which standard (s) stimuli are presented repeatedly and are infrequently replaced by rare (r) stimuli with a probability of p, is described by the following two unconditional probability components: \( \left\{\mathrm{P}(s)=1-\mathrm{p},\ \mathrm{P}(r)=\mathrm{p}\right\} \). The oddball sequence is often constrained by pseudo-random replacement so that rare stimuli are never presented consecutively. Such a stimulus sequence can be regarded as a first-order discrete-time Markov chain (DTMC) and described by the following three conditional probability components: \( \left\{\mathrm{P}(s\mid s)=(1-2\mathrm{p})/(1-\mathrm{p}),\ \mathrm{P}(r\mid s)=\mathrm{p}/(1-\mathrm{p}),\ \mathrm{P}(s\mid r)=1\right\} \). The next stimulus “s or r” is statistically defined by the most recent stimulus “s or r” (Fig. 5.4a). The DTMC, which thus includes conventional pseudo-random oddball sequences as a special case, can systematically control the randomness and regularities embedded in a stimulus sequence.
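The transition probabilities above can be checked numerically. The Python sketch below generates a pseudo-random oddball sequence from the first-order chain and confirms that rare stimuli never occur consecutively while their overall proportion stays close to p. The function names, the sequence length, and the seed are illustrative assumptions.

```python
import random


def oddball_transition_probs(p):
    """Transition probabilities of a no-repeat oddball chain with P(r) = p."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    return {
        "s": {"s": (1 - 2 * p) / (1 - p), "r": p / (1 - p)},
        "r": {"s": 1.0, "r": 0.0},  # a rare stimulus is always followed by a standard
    }


def generate_oddball(n, p, seed=0):
    """Draw n stimuli from the chain, starting from a standard stimulus."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    probs = oddball_transition_probs(p)
    sequence = ["s"]
    while len(sequence) < n:
        previous = sequence[-1]
        # The next stimulus depends only on the most recent one.
        is_rare = rng.random() < probs[previous]["r"]
        sequence.append("r" if is_rare else "s")
    return sequence


sequence = generate_oddball(20000, p=0.1, seed=1)
rare_fraction = sequence.count("r") / len(sequence)
has_consecutive_rares = any(
    a == "r" and b == "r" for a, b in zip(sequence, sequence[1:])
)
print(f"proportion of rare stimuli: {rare_fraction:.3f}")
print(f"consecutive rare stimuli present: {has_consecutive_rares}")
```

Because P(r|s) = p/(1-p) is larger than p, the chain compensates for the positions directly after a rare stimulus, where a rare stimulus is forbidden, and the long-run proportion of rare stimuli remains p.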

Fig. 5.4

(a) A state transition diagram of pseudo-random oddball sequences. From the law of total probability, \( \mathrm{P}(r)=\mathrm{P}(r\mid s)\mathrm{P}(s)+\mathrm{P}(r\mid r)\mathrm{P}(r) \) and \( \mathrm{P}(s)=\mathrm{P}(s\mid s)\mathrm{P}(s)+\mathrm{P}(s\mid r)\mathrm{P}(r) \), where \( \mathrm{P}(r\mid r)=0,\ \mathrm{P}(s\mid r)=1,\ \mathrm{P}(r)=\mathrm{p},\ \mathrm{and}\ \mathrm{P}(s)=1-\mathrm{p} \), we have the transition probabilities \( \mathrm{P}(s\mid s)=(1-2\mathrm{p})/(1-\mathrm{p})\ \mathrm{and}\ \mathrm{P}(r\mid s)=\mathrm{p}/(1-\mathrm{p})\ (\mathrm{p}<0.5) \). (b) A state transition diagram of a second-order Markov chain used in our study. The circled digits indicate two adjacent tones: “3,5” indicates tone 5 followed by tone 3. The arrows on the solid lines indicate transitions with 80 % probability, and those on the dashed lines indicate transitions with 5 % probability each from the state “2,1.” All the other transition arrows with 5 % probability were omitted for legibility

Although many behavioral studies have investigated auditory sequence learning or statistical learning [65], surprisingly few neurophysiological studies have been conducted to date [1, 12, 52]. To the best of our knowledge, Furl et al. conducted the first MEG study using stimulus sequences based on a Markov stochastic model [18]. In that study, MEG responses were recorded while listeners attended to the auditory sequence. We carried out a study using a Markov paradigm to clarify whether learning could be detected by a neurophysiological measure in an ignoring condition, bearing in mind the clinical applicability of such a condition. The difficulty of sequence learning, which parallels the time needed to learn the sequence, depends on the variability of the stimuli, the transition patterns, and the order of the Markov chain.

After pilot studies, we set the parameters so that the learning effect could be captured within a measurement period of 20–30 min. Figure 5.4b shows an example of the stochastic sequences used in our recent study [11].
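A second-order chain of the kind shown in Fig. 5.4b can be sketched in Python as follows. The sketch assumes five tones, so that one transition with 80 % probability and four transitions with 5 % probability each sum to one. The rule that picks the high-probability tone for each pair of tones (`likely_next`) is hypothetical and is not the transition table used in the study [11]; the other names and the seed are likewise illustrative.

```python
import random

TONES = (1, 2, 3, 4, 5)  # assumed tone labels


def likely_next(previous_two):
    """Hypothetical rule giving the high-probability tone for each tone pair."""
    older, newer = previous_two
    return (older + 2 * newer) % len(TONES) + 1


def next_tone_probs(previous_two, p_high=0.8):
    """Distribution over the next tone given the two most recent tones."""
    favoured = likely_next(previous_two)
    p_low = (1.0 - p_high) / (len(TONES) - 1)  # 5 % each when p_high is 80 %
    return {tone: (p_high if tone == favoured else p_low) for tone in TONES}


def generate_sequence(n, seed=0):
    """Draw n tones; each tone depends on the two tones that precede it."""
    rng = random.Random(seed)
    sequence = [rng.choice(TONES), rng.choice(TONES)]  # arbitrary starting pair
    while len(sequence) < n:
        probs = next_tone_probs((sequence[-2], sequence[-1]))
        tones = list(probs)
        weights = [probs[tone] for tone in tones]
        sequence.append(rng.choices(tones, weights=weights, k=1)[0])
    return sequence[:n]


sequence = generate_sequence(300, seed=1)
print(sequence[:20])
```

Because the next tone depends on two preceding tones, a listener who tracks only the most recent tone cannot fully predict the sequence, which is one way in which the order of the chain can raise the difficulty of learning.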

As the exposure sequence progressed, the responses to the tones that appeared with higher transitional probability decayed, whereas those to the tones that appeared with lower probability retained their amplitude. The temporal profiles of the source waveforms appear strikingly similar to those depicted in previous MMN/MMNm studies (Fig. 5.5). The amplitude decay, which may reflect learning achievement, was more rapid in the attending condition than in the ignoring condition (Fig. 5.6). The learning effect detected between the first and second thirds of the explicit sequence was preserved in the last third of the sequence, in which spectral shifts occurred without changing relative pitch intervals (i.e., transposition), suggesting that the participants could recognize the transposed sequence as the same melody already learned in the preceding exposure sequence. In the ignoring condition, the MEG data indicated that the participants eventually learned the melody.

Fig. 5.5

Representative grand-averaged source-strength waveforms for the N1m responses (N = 14). The solid lines represent the responses to tones that appeared with higher transitional probability, and the dashed lines represent the responses to tones that appeared with lower transitional probability

Fig. 5.6

Time courses of N1m amplitude changes along sequence progression in the ignoring (a) and attending conditions (b) (N = 14). The solid lines link N1m amplitudes for tones that appeared with higher transitional probability, and the dashed lines link N1m amplitudes for tones that appeared with lower transitional probability. The error bars indicate the standard error of the mean

The temporal profiles of the difference waveforms obtained from the responses to tones that appeared with higher and lower transitional probabilities were quite similar to those of the MMNm. The difference observed in this study cannot be explained by short-term effects based on the assumption that there are distinct change-specific neurons in the auditory cortex that elicit the MMN. On the other hand, the adaptation hypothesis, which assumes that preceding stimuli adapt feature-specific neurons, has been proposed as an interpretation of the MMN [29].

The present results suggest that adaptation to the statistical structure embedded in tone sequences may extend over a longer timescale than sensory memory, which has been considered to work as a comparator between the immediate past and the forthcoming stimulus in previous studies using the conventional oddball paradigm. A possible comparator that works during statistical learning in this time range may involve the hippocampus [38].

Recently, Yaron et al. found that neurons in the rat auditory cortex were sensitive to the detailed structure of sound sequences over timescales of minutes, by controlling the periodicity of deviant occurrence in oddball sequences [83]. On the other hand, Wilson et al. found a difference between humans and macaque monkeys despite a considerable level of cross-species correspondence [80]. These findings suggest that, although mammals share an auditory statistical learning mechanism spanning a timescale of minutes, there may be some difference between humans and monkeys when the statistical structure of the auditory sequence becomes as complex as that of human languages.

One of the main functions of our brain is to convert ensembles of individual occurrences in our environment into integrated knowledge by accumulating instances and extracting hidden rules in order to better cope with future environmental changes. Introducing higher-order stochastic rules into neurophysiological measurement may open a new avenue for clinical pathological evaluation.

6 Conclusion

Clinical studies related to basic auditory function were briefly reviewed, and our recent trials were described as a possible way to further investigate human auditory function. Any auditory sequence has statistics along the temporal and spectral axes. In discrete tone sequences, simple repetition {P(s) = 1} and a random sequence of n equiprobable tones {P(Si) = 1/n} are the two ends of a broad statistical spectrum of the auditory events occurring in our living environment. Auditory sequences such as language and music have a hierarchical statistical structure, which can be interpreted as a context inspired by the human brain. Conversely, the human brain also processes sensory inputs in a context-seeking fashion, which leads to the integration of knowledge necessary to cope with forthcoming events at the least cost. In any experimental approach to the human brain, this learning feature of the brain cannot be neglected. It must be noted that various levels of learning (e.g., gating, adaptation, reasoning), which involve distinct coupling of multiple brain regions depending on the experimental paradigm or task, may underlie the acquired neurophysiological data.