Speech disorders characterize the cerebellar syndrome. In classical descriptions, speech is reported as scanning, hesitant, and explosive (Darley et al. 1975) due to a lack of coordination of both articulation and phonation within a broader ataxic syndrome (ataxic dysarthria) (Duffy 2013). Speech disorders are commonly observed in the presence of atrophic damage (Schalling and Hartelius 2013); they have been described less frequently in the presence of focal lesions confined to the cerebellum, primarily in the superior cerebellar artery (SCA) territory (Urban 2013). Cerebellar damage also produces disorders of covert articulation (Silveri et al. 1998), which affect the planning of speech production at a “prearticulatory” level (Ackermann et al. 2007).

Evidence of cerebellar involvement in processing the temporal parameters of speech and in discriminating its perceptual components suggests that the role of the cerebellum transcends the articulatory level (Ackermann et al. 2007; Mariën et al. 2014), with implications for the evolution and acquisition of language. Time is a crucial parameter in discriminating speech sounds, particularly sounds that are perceived categorically. Linguistic perception is, in fact, categorical (Liberman et al. 1967): we perceive as different those stimuli that we learned to treat as different through a categorization process which, during language acquisition, “mapped” continuous sensory phenomena such as acoustic stimuli onto a limited number of sound categories (phonemes) that vary across languages. Thus, the brain assigns acoustic stimuli to qualitatively different categories (and never to intermediate categories). In particular, Voice-Onset-Time (VOT), i.e., the interval between the start of a stop consonant and the onset of the vibration of the vocal folds (voicing), allows sounds to be perceived as different. For example, VOTs of different lengths allow the listener to distinguish the voiced sound /ba/ (VOT < 30 ms) from the voiceless sound /pa/ (VOT > 30 ms) (the phoneme boundary effect). In humans, the vocal tract is divided into independent components (particulation) (Studdert-Kennedy 1998). Speech sounds produced by each of these components are combined in various ways, and each sound is influenced by both the preceding and the following ones (coarticulation).
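The phoneme boundary effect described above can be illustrated with a minimal sketch. It is only a toy model of the /ba/ versus /pa/ example, not a model of auditory processing: the function names are hypothetical, the 30 ms boundary is simply the value quoted in the example, and the treatment of a VOT falling exactly on the boundary is an arbitrary assumption.

```python
# Toy illustration of categorical perception along a VOT continuum.
# Assumption: a single fixed boundary at 30 ms, as in the /ba/-/pa/ example.
VOT_BOUNDARY_MS = 30.0


def categorize_stop(vot_ms: float, boundary_ms: float = VOT_BOUNDARY_MS) -> str:
    """Map a continuous VOT value (in ms) onto one of two discrete categories.

    Values below the boundary are labeled voiced (/ba/), all others voiceless
    (/pa/). Assigning the exact boundary value to /pa/ is an arbitrary choice.
    """
    return "/ba/" if vot_ms < boundary_ms else "/pa/"


def heard_as_different(vot_a_ms: float, vot_b_ms: float) -> bool:
    """Two stimuli are heard as different only if they fall in different categories."""
    return categorize_stop(vot_a_ms) != categorize_stop(vot_b_ms)


# A continuum of VOT values is collapsed onto two labels, never an intermediate one.
continuum_ms = [0, 10, 20, 25, 35, 40, 50, 60]
labels = [categorize_stop(v) for v in continuum_ms]
```

In this sketch, two stimuli 10 ms apart are distinguished when they straddle the boundary (25 ms and 35 ms) but a pair 20 ms apart within one category (40 ms and 60 ms) is not, which is the pattern the phoneme boundary effect refers to.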

Perception of speech has to be integrated with visual articulatory information (McGurk and MacDonald 1976). The “motor theory of speech perception” (Liberman et al. 1967; Liberman and Whalen 2000) assumes that perceiving verbal sounds requires identifying the “articulatory gestures” that the vocal tract would have to perform to pronounce those sounds, by evoking the corresponding motor representation in the listener’s motor cortex. Thus, “articulatory gestures” are the objects of both production and perception, which develop together in evolution; their representations are immediately linguistic and do not require the intervention of other components of the linguistic system (Liberman and Whalen 2000). In other words, “speech” does not merely consist of sounds as such (Liberman and Whalen 2000); instead, “speech” has to be understood as the only natural human communication system (Rizzolatti and Craighero 2007).

The identification of a mirror neuron system in the premotor cortex of humans (Rizzolatti et al. 1996) provided experimental support for Liberman’s “motor theory of speech perception” because this system represents the structural basis of the direct link between speaker and listener. The existence of mirror neurons that respond to sounds produced by the orolaryngeal tract of the speaker (echo-mirror neurons) has also been hypothesized (Fadiga et al. 2002). This echo-mirror neuron system might also have a role in evolution. In the human homologue of area F5 (Broca’s area), echo-mirror neurons might have evolved to simultaneously code “gestures” generated in the vocal tract (speech articulatory movements) and in the body (actions), which is a prerequisite for the development of a relationship between phonetics and semantics (Rizzolatti and Craighero 2007).

The cerebellum has been considered a “timing system” in both movement and perception (Keele and Ivry 1991). Ivry and Keele (1989) demonstrated that the ability to estimate the duration of time intervals is impaired when cerebellar damage is present. In fact, patients (unlike normal subjects) were unable to discriminate between different time intervals marked by two clicks. VOT was confirmed to be altered in speech production (Ackermann and Hertrich 1997), and an impaired phoneme boundary effect was demonstrated in patients with cerebellar atrophy by using acoustic stimuli that differed only in terms of a duration parameter (occlusion time), independently of voicing (Ackermann et al. 1997).

Despite the need for further experimental evidence, these data seem to confirm that the cerebellum acts as an internal clock (Keele and Ivry 1991) and intervenes to differentiate speech sounds in both production and perception. Thus, it contributes to accessing the phonological aspects of language and indirectly supports language functions in which the “timing” of phonological components has a role, such as in the application of syntactic rules (Silveri et al. 1994) and in working memory (Silveri et al. 1998). Cerebellar “timing” might also contribute to the correct combination of sounds in coarticulation processes by coordinating movements of the components of the vocal tract (where the cerebellum controls about 100 muscles) (Ackermann et al. 2007).

The reciprocal connectivity between the phylogenetically newest portion of the cerebellum and the anterior cerebral cortex (primarily Brodmann areas 44 and 45, i.e., Broca’s area, and the premotor cortex) (Leiner et al. 1991; Stoodley et al. 2012) represents the neural basis of the cerebellar contribution to speech. However, as clinical studies suggest (Kumral et al. 2007), speech is a distributed function that requires the integrity of both cortical and subcortical structures. Among the subcortical structures, the basal ganglia make the greatest contribution to speech (Kotz et al. 2009), but with a different role than the cerebellum (Booth et al. 2007).

In conclusion, within a broad functional system that includes both cortical and subcortical structures, the cerebellum is responsible for processing temporal information during speech production and during the discrimination of perceptual components of speech. Thus, cerebellar damage can be followed by disorders of speech production and perception and, in turn, by disorders of the linguistic and cognitive processes connected to speech production and perception.

If the “articulatory gestures” represent, by means of the (echo) mirror neurons mechanism, the direct link between speaker and listener, allowing that type of communication from which language evolved (parity and direct comprehension between speaker and listener) (Rizzolatti and Arbib 1998), then a contribution to language evolution should also be attributed to the cerebellum to the extent that it modulates the temporal components of speech. Likewise, congenital damage of the cerebellum might account for disorders of speech/language acquisition (Misciagna et al. 2010).