Introduction

Music production and perception invoke a complex set of cerebral processes that rely on the integration of sensorimotor, cognitive, and emotional pathways. A universal feature of the human experience, music is created as a form of expression, evocation of emotion, or means of social interaction [1]. Music perception requires the complex task of integrating the components of pitch, rhythm, harmony, and timbre over time. Despite the complexity of this cognitive task, infants and newborns show autonomic and emotional responses to music [2].

Speech and language share many of the processes involved in music production and appreciation. Both language and music rely on processing and integrating multiple components of sound. Musical elements such as pitch and rhythm correlate with similar elements in speech. The basic elements combine to create phonemes in speech and notes in music, which are then combined into higher order structures, words and sentences in speech and melody and harmony in music. Much work has been done to understand the overlap in neural processing of language and music both at fundamental and higher order levels [3]. Our understanding is aided by research on disorders of both production and processing of music and speech. Given the overlap between these two systems, there is potential for the application of music therapy in language processing disorders such as dyslexia and aphasia [4]. This review will focus on the neural mechanisms underlying pitch as a component of both music and language, and the application of music training for dyslexia and language processing disorders.

Pitch

Pitch is a fundamental perceptual attribute of sound that is organized on a scale from low to high. Although pitch is not equivalent to frequency, the perception of pitch is correlated with the fundamental frequency of sounds. The frequency of oscillations of sound waves (e.g., 440 Hz) is a fundamental element of sound. Pure tones of a given frequency can be generated and are extensively used in research of the auditory system. However, the vast majority of sounds encountered in the environment are complex tones made up of multiple frequencies. Periodic complex tones, such as those created by the voice or musical instruments, are a composite of a fundamental frequency and harmonics. Harmonics are integer multiples of a given fundamental frequency. For example, the fundamental frequency of a violin D-string is 294 Hz, and the vibrating string also produces harmonics at multiples of that frequency (588 Hz, 882 Hz, and so on), which together form a complex tone. It is the composite of the fundamental and its harmonics that is heard and processed by our brain into the psychoacoustic phenomenon we call pitch [5].
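The relationship between a fundamental frequency and its harmonics can be illustrated with a short sketch. The function names, the sample rate, and the 1/k amplitude weighting below are assumptions made for illustration only; the amplitudes of the harmonics of a real violin string differ and are not specified in the text.

```python
import math

def harmonic_series(f0, n_harmonics):
    """Return the first n_harmonics integer multiples of f0.

    The fundamental itself is counted as the first harmonic.
    """
    return [k * f0 for k in range(1, n_harmonics + 1)]

def complex_tone(f0, n_harmonics, sample_rate=8000, duration=0.05):
    """Synthesize a periodic complex tone as a sum of sine waves.

    Each harmonic k is weighted by 1/k. This weighting is an
    illustrative assumption, not a measured instrument spectrum.
    """
    n_samples = round(sample_rate * duration)
    freqs = harmonic_series(f0, n_harmonics)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) / k
                    for k, f in enumerate(freqs, start=1))
        samples.append(value)
    return samples

# Harmonics of the 294 Hz fundamental used as the example in the text.
print(harmonic_series(294, 5))  # [294, 588, 882, 1176, 1470]
```

The list returned by `complex_tone(294, 5)` is one possible waveform whose components all share the 294 Hz fundamental.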

As noted above, most sounds heard in everyday life are complex tones comprising multiple frequencies and harmonics. Sounds are initially deconstructed in the peripheral auditory system, which is organized tonotopically, i.e., in a gradient of frequencies. The inner hair cells in the cochlea and the auditory nerve are tuned to single frequencies [6]. The majority of brainstem and subcortical neurons are tuned to a single frequency; however, a significant portion of neurons are found to have multi-peak frequency tuning [7, 8]. In marmosets, a non-human primate, 20 % of primary auditory cortex neurons have multi-peak frequency tuning [9]. Harmonically sensitive neurons have been found in the auditory cortex in various animal models. In fact, some cortical neurons respond best to harmonic tones rather than individual frequencies [9]. These studies have led to the hypothesis that auditory processing is organized in terms of harmonic frequencies [10].

Pitch information is part of the human perception of sound and a fundamental building block for more complex cognitive functions such as speech and music. Changes in pitch are used to convey information in both music and speech as well as allow humans to separate sounds that originate from different sources [11]. Although pitch perceived by a listener is often related to the specific frequencies of a waveform, there are multiple examples where the perceived pitch is not explained by the physical parameters of the stimulus (e.g., the missing fundamental) [12]. The human cochlea can resolve only the first five to eight harmonic frequencies of a complex tone [13]; thus, only certain frequencies of a given sound are transmitted. Interestingly, sounds with very different spectra can have the same pitch, and sounds with similar spectra can have different pitches [14]. There is much still to uncover with regard to the basic encoding of pitch in the auditory system, but recent animal studies and fMRI studies have begun to unlock the answers to these fundamental questions.
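The missing fundamental mentioned above can be made concrete with a small numerical sketch. A tone is built from the second, third, and fourth harmonics of 200 Hz, so it contains no energy at 200 Hz, yet its waveform still repeats every 1/200 s. The autocorrelation used here is a textbook periodicity estimate chosen for illustration; it is not a model of how the auditory system extracts pitch, nor the method of any cited study. The sample rate, frequencies, and function names are all assumptions.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for illustration

def tone(freqs, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of equal-amplitude sine waves at the given frequencies."""
    return [sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs)
            for n in range(n_samples)]

def component_amplitude(signal, freq, sample_rate=SAMPLE_RATE):
    """Amplitude of a single frequency component (one-bin Fourier sum)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def periodicity_pitch(signal, window, min_lag, max_lag,
                      sample_rate=SAMPLE_RATE):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(window))
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

# Harmonics 2 to 4 of a 200 Hz fundamental; the 200 Hz component is absent.
signal = tone([400, 600, 800], n_samples=500)

# Essentially zero energy at 200 Hz, full amplitude at 400 Hz.
print(component_amplitude(signal[:400], 200) < 1e-9)   # True
print(round(component_amplitude(signal[:400], 400), 6))  # 1.0

# The waveform nevertheless repeats at the period of the absent fundamental.
print(periodicity_pitch(signal, window=400, min_lag=16, max_lag=60))  # 200.0
```

The lag range of 16 to 60 samples restricts the search to roughly 133 to 500 Hz, so that only one full period of the 200 Hz waveform falls inside it.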

Functional localization of pitch

Since the recognition that pitch is a construct of human perception, researchers have sought to identify a “pitch center” in the brain, where processing auditory information is a matter of resolving distinct pitches. A pitch processing center should show pitch constancy, i.e., same response to a given pitch value and strength regardless of the stimulus [11]. There is little evidence that a pitch center or pitch-selective neurons exist in the periphery or the brainstem. Pitch-responsive neurons have been identified in the primary auditory cortex of marmosets, which respond to a given pitch rather than the individual components of a complex tone [15]. It is important to differentiate neurons that merely bear pitch information from neurons (or more likely cortical regions) that discriminate and process pitch. Multiple techniques including fMRI, electroencephalography (EEG), and magnetoencephalography (MEG) have been employed in humans and non-human primates in attempts to localize a pitch center.

Numerous studies have identified the lateral Heschl’s gyrus (HG), located adjacent to the anterolateral border of the primary auditory cortex along the superior temporal gyrus, as the putative pitch center (Fig. 1) [16, 17]. Heschl’s gyrus is one of the first and most consistently identified areas to be activated by pitch modulation [18]. The temporal and spectral maps are distinct within the primary auditory cortex but appear to overlap portions of HG [19•]. Many of the studies implicating the lateral HG also showed activation of the planum temporale (PT), located more posteriorly along the superior temporal gyrus, as previously reviewed [14]. Taken together, these studies call into question the idea of a singular specialized region for pitch [14, 19•]. A recent meta-analysis of fMRI studies showed loci for pitch processing in the superior temporal gyrus lateral to Heschl’s gyrus [20••]. The same meta-analysis found that processing of infrequent pitch changes was centered in the PT, significantly posterior to other areas of pitch processing. In some listeners, pitch responses were also identified in the temporo-parieto-occipital junction or the prefrontal cortex [11], suggesting that significant individual variation may exist in the localization of pitch processing.

Fig. 1

Schematic of the neuroanatomical structures involved in auditory processing and pitch perception. Located on the superior temporal gyrus (STG) are the primary auditory cortex (PAC) and planum temporale (PT). Note that Heschl’s gyrus is located deep to the primary auditory cortex and is not shown in this projection. White matter structures: superior longitudinal fasciculus (SLF), arcuate fasciculus (AF), uncinate fasciculus (UF), inferior longitudinal fasciculus (ILF), extreme capsule fiber system (ECFS). Additional gray matter structures: middle temporal gyrus (MTG), inferior frontal gyrus (IFG), Brodmann areas 44 and 45 (44 and 45), supplementary motor area (SMA), presupplementary motor area (pre-SMA) (adapted with permission from Loui, Psyche [24])

Initial stages of pitch processing occur symmetrically between both cerebral hemispheres [18]. Asymmetries begin to emerge when temporal information is taken into account, i.e., melodies versus fixed pitch. In the right-lateralized areas of Heschl’s gyrus and the anterolateral border of the PT, there were markedly increased responses to modulation of complex tones [21]. When pitch stimuli create melodies, even further cortical areas from Heschl’s gyrus and the planum temporale are activated, consistent with further processing of melodic sound in an anterolateral direction [18]. Thus, in the hierarchical structure of pitch processing in the auditory cortex, pitch is initially extracted bilaterally, whereas long-term variations in pitch are subsequently processed asymmetrically (Fig. 1). One plausible interpretation is that the left hemisphere has selective acuity at encoding temporal changes, whereas the right hemisphere possesses superior spectral resolution, which is necessary for detecting changes in both pitch and timbre [22].

Studies on higher order functions investigate divergent streams of processing leaving the primary auditory cortex. The components of language are processed in independent pathways, the dorsal and ventral systems (Fig. 1) [23]. After the early cortical stages of speech perception, the information diverges into the dorsal and ventral processing streams, which map sensorimotor integration and perception of speech, respectively. The dual-stream model is thought to apply to both music and speech [24]. However, the contribution of these networks to understanding auditory processing is only beginning to be elucidated [25]. Projecting ventro-laterally toward the inferior posterior temporal cortex, the ventral stream is connected via the extreme capsule, uncinate fasciculus, and middle and inferior longitudinal fasciculi. The dorsal system projects dorso-posteriorly and is supported by the superior longitudinal and arcuate fasciculi. The dorsal stream plays a role in speech production and categorization of phonemes [26]. The arcuate fasciculus (AF) is a prominent tract consisting of white matter that connects the caudal temporal and inferior parietal cortices to the frontal lobe [27]. Recent studies suggest that during development, white matter tracts of the arcuate fasciculus continue to develop into adolescence whereas ventral stream tracts reach maturation in early infancy, suggesting that the arcuate fasciculus plays an important role in the developmental stages of children with disorders such as auditory processing dysfunction, congenital amusia (tone deafness), and dyslexia [28•].

Musical Training and Variations in Pitch Processing

Investigations of musicians and the effects of musical training have been informative of the anatomic and functional pathways of music and auditory processing. Musical training is associated with volumetric differences in the primary auditory cortex in the HG, PT, corpus callosum, and cerebellum [29]. These volume differences may have functional consequences; for example, musicians were found to have 130 % larger anteromedial HG that correlated with significantly better performance on melody discrimination [30]. Early music training is associated with enhanced auditory discrimination, bimanual motor synchronization, and sensorimotor integration [31, 32].

Musical training results in plasticity of neural networks [33]. On a pitch memory task, non-musicians rely more on brain regions important for pitch discrimination while musicians use brain regions specialized in short-term memory and recall [34]. Cerebral networks show increased sensitivity in certain brain regions associated with musical syntax, timbre, and sensorimotor integration in musicians when listening to music played on their instrument of expertise (e.g., a flutist listening to music played on a flute) [35, 36]. Even the instrument played and type of music performed influence brain structure and function. For example, the relative size of the left and right motor cortices differs between piano and string players [37], and the arcuate fasciculus volume is increased in vocalists as compared to instrumentalists [38]. A recent fMRI study showed that individuals with expertise in musical improvisation, a staple of jazz, had higher functional connectivity among prefrontal, premotor, and presupplementary motor cortices, suggesting a more efficient neural network [39].

Underlying all of the anatomic and functional changes, there is a genetic predisposition to certain aspects of musical ability. A recent review of the literature suggested loci 4p14 and 4q22 on chromosome 4 may play a role in pitch discrimination, while 4q23 changes are linked to pitch accuracy of singing [40]. Further study is needed to understand the phenotype-genotype correlations of these studies and the functional consequences of these variants.

Perfect pitch, more accurately labeled as absolute pitch, is an auditory phenomenon in which one is able to identify and/or reproduce a given musical note without external reference [41]. It is a rare ability (~1:10,000 individuals) and suggested to have a genetic basis in at least some cases, with candidate genes remaining to be validated [40]. In subjects possessing absolute pitch, the left posterior dorso-lateral frontal cortex has been implicated as a key cortical association area [42]. During an interval judgment task, activity within the right inferior frontal cortex of subjects without absolute pitch was observed, but such activity in subjects with absolute pitch was not present, suggesting that subjects with absolute pitch do not need to access working memory when assessing musical intervals. MRI measurements of cortical volume indicated that the subjects with absolute pitch had a larger left planum temporale [43], which either correlates with improved “pruning” of the right PT or improved pitch naming associated with the left PT. A more recent study argues that individuals with absolute pitch may be using different pitch processing mechanisms [44]. Structural and functional neuroimaging studies, comparing absolute pitch possessors with controls who were matched for musical training, showed increased activation as well as white matter connectivity in the superior temporal gyrus [45, 46], suggesting that the increased categorization ability in absolute pitch possessors is subserved by structural and functional connectivity between brain regions that enable auditory perception and categorization.

A wide spectrum of pitch perception exists within the population, with up to 17 % of individuals self-reporting tone deafness [47]. Tone deafness, or congenital amusia, is a deficit of musical ability, specifically in pitch processing, recognition, and reproduction. Neuroimaging studies have identified several specific neural correlates of tone deafness. Results from a cortical thickness study found a thicker cortex in the right inferior frontal gyrus and right auditory cortex in individuals with amusia [48]. Structural abnormalities in the superior arcuate fasciculus in the right hemisphere were identified using diffusion tensor imaging (DTI) in individuals with amusia [49, 50]. Taken together, these studies led to the hypothesis that amusia is a result of abnormal neurodevelopmental connections along the fronto-temporal pathways [51].

Few studies to date have evaluated children to understand the development of congenital amusia. Congenital amusia can be identified in children as young as 6 years [52], and 4 weeks of daily music listening did not normalize children’s abilities or neural activity as measured electrophysiologically [53•]. While the right superior arcuate fasciculus is identifiable by DTI in 6–10-year-old children (Fig. 2), it shows heterogeneity between individuals. It remains to be seen whether this inter-subject heterogeneity correlates with pitch discrimination in children. Findings on this question would help clarify the neurodevelopmental processes that underlie pitch discrimination in the developing brain.

Fig. 2

Group-averaged arcuate fasciculus of nine pediatric patients as identified by diffusion tensor imaging (DTI). The averaged arcuate fasciculus of nine 6–10-year-old subjects was calculated using DTI, overlaid on a template fractional anisotropy (FA) image in radiological convention. Superior arcuate fasciculus (tracts identified between the superior temporal gyrus and inferior frontal gyrus) are in red and yellow. Inferior arcuate fasciculus (tracts identified between the middle temporal gyrus and inferior frontal gyrus) are in blue and light blue. Brightness of the voxels in each tract corresponds to the number of subjects who share a tract in that voxel

Dyslexia

Dyslexia is a learning disorder characterized by difficulties with reading fluency, phonological awareness, and accurate comprehension despite average or above average intelligence. A common finding among individuals with dyslexia is impairment in phonemic awareness, which is the ability to process and manipulate spoken words made up of individual sounds or phonemes [54, 55]. Significant overlaps between phonemic awareness and musical sound processing have been shown [56, 57].

Children with developmental dyslexia often have persistent difficulty recognizing rhythmic patterns. In a study by Wolff in which adults and children with dyslexia performed the metrical task of finger-tapping to keep time, results showed that children with dyslexia have great difficulty with rhythmic finger-tapping and primary sensory impairment [58]. Metrical organization similar to that of finger-tapping tasks is used in the phonological processes of language and pitch perception. Pattern perception of pitch is important for speech prosody [59], a predictor of later reading skills [60] and one of the major impairments in dyslexia [61]. Phonemic awareness is correlated with pitch perception-production skills in children [62]. Tone-deaf individuals have impairments in speech intonation, lexical tone in tonal languages, and phonemic awareness [63–65].

The speed of processing is an important component of pitch and tone recognition and is thought to be an essential component underlying language processing disorders. Rapid temporal processing of speech and non-speech signals is altered in children with language impairments [66]. The speed of pitch processing may also be a predictor for language development. Benasich and Tallal showed that when given rapidly presented tones of different frequencies, infants’ discrimination threshold for rapid auditory processing at 7.5 months predicted language outcome at 3 years of age [67]. Further studies have shown behavioral and electrophysiological differences in rapid auditory processing even in newborns with genetic or familial risk for developmental language-learning impairments. Electrophysiological studies using event-related potentials have demonstrated left-hemisphere-specific dysfunction prior to language development in children at risk for language delay [68–71]. Functional MRI data show overlapping activation among speech and non-speech sounds within the left primary and secondary auditory cortices, supporting the notion of a shared neural network for rapid temporal processing of speech and non-speech sounds [72].

Imaging studies have expanded our understanding of brain regions involved in dyslexia and language disorders. Adolescent children with dyslexia demonstrated reduced activation of the left occipital-temporal cortex, which is considered to be a critical region for reading skills [73]. In addition, gray matter volume was diminished in the bilateral anterior cerebellum, bilateral fusiform gyrus, and right supramarginal gyrus of children with dyslexia. Another volumetric study was consistent with a multifocal network of brain abnormalities in dyslexia involving the left superior temporal gyrus, occipital-temporal cortices, and cerebellum [74].

Given the overlap that exists between music and speech processing, music training may provide benefit to individuals with dyslexia. Musical training has been shown to improve many aspects of auditory processing, language, and literacy skills [75]. A recent study in low-income children showed that those receiving music instruction retained age-normed reading performance, whereas those without had expected declines [76•]. Musical training alters the functional anatomy underlying rapid spectrotemporal processing of non-linguistic stimuli, resulting in improved behavioral performance along with a more efficient functional neural network primarily involving traditional language regions [66]. Studies have shown a strong relationship between musical ability and language and literacy. While the underlying mechanisms are unknown, research suggests that musical training enhances the neural encoding of speech, perhaps because musical training demands greater precision in auditory processing than speech perception alone [77]. Although a meta-analysis found no conclusive evidence for music education improving reading skills in children [78••], studies have consistently suggested that music training could be beneficial for reading skills [79], with persistent functional brain changes following even short-term musical training in children supporting the benefit of early intervention [80–82]. Future studies should elucidate the effects of music therapy on dyslexia and the overlapping neurodevelopmental processes that underlie music and language processing.

Conclusions

Perception of music and language involves processing and integrating multiple components of sound. The tonotopic system of pitch perception in the peripheral auditory system transforms into an increasingly complex system in primary and then association auditory cortex. The human pitch center probably does not correspond to a single area but involves at least the lateral Heschl’s gyrus and the planum temporale. Other areas implicated include the temporo-parieto-occipital junction and prefrontal cortex. Further processing and lateralization of functions may play a role in perception of more complex melodies. The arcuate fasciculus, a white matter tract connecting the temporal and parietal cortices to the frontal lobe, may play a role in neurodevelopmental disorders ranging from congenital amusia to dyslexia. Music training is associated with volumetric brain differences and enhanced auditory processing, language, and literacy skills. Such intervention may have beneficial effects for patients with neurodevelopmental disorders affecting language and reading acquisition as well as neuroprotective or enhancing effects for the general population.