1 Introduction

A theory is a way of thinking about what a given system is and how it behaves. It is grounded in available experimental data, but a good theory will also be consistent with data obtained in the future. Furthermore, a good theory provides experimentally testable predictions that further enhance our understanding of the system.

Around 1970, Marr (1969), Ito (1970), and Albus (1971) conceptualized the computational principle of the cerebellum based on a blueprint of the cerebellar circuit (Eccles et al., 1967). About 10 years later, Ito and his colleagues found the missing piece of the concept experimentally, which was plasticity at parallel fiber-Purkinje cell synapses (Ito et al., 1982; Ito, 1989). Following the discovery of this plasticity, which we now know as long-term depression (LTD), the Marr-Albus-Ito model was established. Since then, the model has been analyzed, challenged, and extended to obtain a better and deeper understanding of cerebellar computational principles (Ito, 1984, 2012).

The Marr-Albus-Ito model has provided a compass with which to navigate the ocean of cerebellar research for 50 years (Yamazaki & Lennon, 2019). Without the compass, researchers would easily drown in the huge amount of experimental data. Meanwhile, the model has gradually evolved to account for new findings and concepts.

The original Marr-Albus-Ito model postulated various important concepts: applications to behavioral studies, granular layer encoding, memory capacity of Purkinje cells, distributed and synergistic synaptic plasticity, internal models, and general computational principles of the cerebellar circuit. For these concepts, a number of theoretical models have been proposed. We have summarized the “evolution” of the original model as an evolutionary tree (Fig. 11.1). In this article, we review how the original Marr-Albus-Ito model has been adapted to account for those concepts.

Fig. 11.1

An evolutionary tree of the Marr-Albus-Ito model. Gray boxes represent models, whereas lines represent inheritance relationships. Blue boxes represent conceptual works published outside cerebellar research. Green boxes are representative review papers. The pink box indicates the Marr-Albus-Ito model. Please note that the tree is not exhaustive. Only the models necessary to draw the diagram are shown. Abbreviations as in the text

Meanwhile, computational studies using supercomputers are another important direction for cerebellar research. While theoretical studies aim to extract the principle or “essence” of a given system by eliminating a number of biological details, computational studies aim to replicate the system itself to reproduce the dynamics by incorporating as many details as possible. Both are called models, but their approaches are completely different. For computational studies, another review article will be available (Yamazaki et al., Submitted).

2 Evolutionary Tree of the Marr-Albus-Ito Model

To illustrate how the Marr-Albus-Ito model has influenced its successors, we compiled a number of theoretical models that stemmed from the Marr-Albus-Ito model, and created an “evolutionary tree” (Fig. 11.1).

2.1 The Marr-Albus-Ito Model

The ancestor is of course the Marr-Albus-Ito model. Although there are fewer than ten types of neurons in the cerebellum (more specifically, in a corticonuclear microcomplex) that constitute a network with recurrent connections, these pioneers focused on a feedforward network composed of the pons (input layer), granule cells (middle layer), and Purkinje cells (output layer) with climbing fiber inputs, while leaving aside the other neurons and connections (Fig. 11.2). The resulting network was assumed to be a cerebellar counterpart of the perceptron (Rosenblatt, 1958), which is the simplest form of supervised learning machine. The Marr-Albus-Ito model, which was also called the perceptron hypothesis, postulated two important ideas:

  • Granule cells in the middle layer sparsely encode afferent inputs from the pons, conveyed via mossy fibers, in a distributed manner.

  • Connection strengths from granule cells to Purkinje cells are adjusted by climbing fibers.

Fig. 11.2

Reduction of the cerebellar circuit to a perceptron. (a) Schematic of a corticonuclear microcomplex. Abbreviations: MF mossy fibers, GR granule cells, Go Golgi cells, MLI molecular layer interneurons, PC Purkinje cells, PF parallel fibers, CN cerebellar nuclei, CF climbing fibers, IO inferior olive. (b) The Marr-Albus-Ito model known as a perceptron. Only Pons, GR, PC, and IO are considered

The first idea was called “codon theory” (Marr, 1969) or “expansion recoding” (Albus, 1971). The second was confirmed about 10 years later by Ito and his colleagues as the plasticity that we now know as long-term depression (LTD) (Ito et al., 1982). During those 10 years, parallel fiber-Purkinje cell LTD was the missing piece needed to establish the Marr-Albus-Ito model. LTD differs from the most common synaptic plasticity mechanism, Hebbian learning (Hebb, 1949), in which a synaptic connection between a pair of neurons is updated based on the correlated activity of the pre- and postsynaptic neurons. In LTD, a synaptic weight is instead updated based on the correlated activities of the presynaptic parallel fiber and the climbing fiber innervating the same Purkinje cell.
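
The contrast between the two rules can be made concrete with a minimal Python sketch. The function names, the learning rate `eta`, and the lower bound `w_min` are illustrative assumptions and do not come from the cited studies; activities are treated as simple rate values between 0 and 1.

```python
def hebbian_update(weights, pre, post, eta=0.1):
    """Hebbian rule: the change depends on pre- and postsynaptic activity."""
    return [w + eta * x * post for w, x in zip(weights, pre)]


def ltd_update(weights, pf, cf, eta=0.1, w_min=0.0):
    """LTD-like rule (illustrative): depress synapses whose parallel fiber (pf)
    was active together with the climbing fiber (cf) of the same Purkinje cell.
    The postsynaptic Purkinje cell activity does not appear in the rule."""
    return [max(w_min, w - eta * x * cf) for w, x in zip(weights, pf)]


weights = [1.0, 1.0, 1.0]
pf = [1.0, 0.0, 1.0]  # parallel fibers 0 and 2 are active

no_error = ltd_update(weights, pf, cf=0.0)    # climbing fiber silent
with_error = ltd_update(weights, pf, cf=1.0)  # climbing fiber fires
```

In this sketch, only the synapses of active parallel fibers are depressed, and only when the climbing fiber fires; the inactive synapse is left unchanged.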

Moreover, these authors did not forget to discuss the potential roles of other components such as Golgi cells and molecular layer interneurons. The authors first dissected the essential components from other peripheral components, continued to investigate the roles of the other components separately, and finally integrated their roles once again into the main model.

2.2 Applications to Behavioral Studies

2.2.1 Eye Movement Control

The Marr-Albus-Ito model was readily applied to behavioral studies. The first attempt was an application to eye movement control such as the vestibulo-ocular reflex (VOR) and optokinetic response (OKR). VOR is an eye movement reflex in which the eyes rotate in the direction opposite to the head rotation, whereas OKR is a reflex in which the eyes rotate in the same direction as a slow movement of the entire visual world. Information on the head and visual world movements is fed by mossy fibers to the cerebellum. When the eye rotation is insufficient to compensate for the head or visual world movements, the visual image on the retina slips. This retinal slip provides an “error” signal to Purkinje cells via climbing fibers, which induces learning to adjust the eye movement gain (gain adaptation). In VOR, artificial stimuli can also shift the phase of eye rotation relative to head rotation while keeping the eye movement gain, which is called phase adaptation.

In the original Marr-Albus-Ito model, granule cells were considered an encoder of spatial input patterns conveyed by mossy fibers. Fujita (1982a) introduced sinusoidal temporal dynamics of mossy fibers representing the head and eye rotations and that of a climbing fiber that represents retinal slip errors. The model also included an implementation of the granular layer network composed of granule cells and a Golgi cell. Due to distributed synaptic weights of mossy fibers and inhibition exerted by the Golgi cell, granule cells exhibited various sinusoidal activity patterns with different amplitudes and phases. In the end, the model successfully reproduced both gain and phase adaptations in VOR and OKR (Fujita, 1982b). The model adopted the notion of adaptive filtering in the field of engineering (Widrow et al., 1975) and therefore was called an adaptive filter model.
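
The idea of an adaptive filter can be sketched as follows. This is a minimal illustration and not Fujita's model: it assumes that each granule cell emits a sinusoid at the stimulus frequency with its own fixed phase, that the Purkinje cell output is a weighted sum of these signals, and that the weights follow a signed least-mean-squares (LMS) rule in the spirit of Widrow et al. (1975), with the error term standing in for retinal slip. All names and parameter values are hypothetical.

```python
import math


def granule_basis(t, n_cells=8, freq=0.5):
    """Granule cell signals: sinusoids at the stimulus frequency,
    each with its own phase (assumed evenly spaced)."""
    return [math.sin(2 * math.pi * freq * t + 2 * math.pi * k / n_cells)
            for k in range(n_cells)]


def train_adaptive_filter(target_gain=0.5, target_phase=0.3, n_cells=8,
                          freq=0.5, eta=0.02, dt=0.01, n_steps=4000):
    """Learn weights so the output matches a sinusoid with the desired
    gain and phase. Returns the weights and the absolute error per step."""
    w = [0.0] * n_cells
    errors = []
    for step in range(n_steps):
        t = step * dt
        g = granule_basis(t, n_cells, freq)
        output = sum(wi * gi for wi, gi in zip(w, g))
        desired = target_gain * math.sin(2 * math.pi * freq * t + target_phase)
        err = desired - output  # stands in for the retinal slip signal
        # LMS update: each weight changes in proportion to error x input
        w = [wi + eta * err * gi for wi, gi in zip(w, g)]
        errors.append(abs(err))
    return w, errors


weights, errors = train_adaptive_filter()
early_error = sum(errors[:200]) / 200
late_error = sum(errors[-200:]) / 200
```

Because the desired signal differs from the input in both amplitude and phase, the same weight update accounts for gain and phase adaptation in this toy setting.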

The adaptive filter model was a pioneering theoretical model that first put the Marr-Albus-Ito model into action. The model became a stepping stone for eyeblink conditioning (section “Eyeblink Conditioning”) and general computational principles (Sect. 2.7). In fact, Fujita’s original adaptive filter model was later generalized in the context of general computational principles under the same name (Dean et al., 2010), which might cause unnecessary confusion.

After Ito’s (1975) work on the flocculus hypothesis for VOR, another hypothesis on VOR was proposed from outside the Marr-Albus-Ito model (Miles & Lisberger, 1981). Those authors hypothesized that plasticity at mossy fiber synapses on the vestibular nuclei played the prominent role in VOR. In those days, the LTD at parallel fiber synapses on Purkinje cells was a matter of debate. A research group strongly argued that the inferior olive was a timing device that controls motor timing precisely (Llinás & Sugimori, 1980) (Sect. 2.8), but not a subsystem that provided teacher or error signals. Based on the Miles and Lisberger (1981) hypothesis, a series of theoretical models were reported later (Lisberger, 1988; Lisberger & Sejnowski, 1992; Lisberger, 1994), which proposed that Purkinje cells “guide” learning in the downstream neurons of the vestibular nuclei while incorporating recurrent connections from vestibular nuclei to Purkinje cells. Ito immediately responded to the hypothesis (Ito, 1982), and a debate lasting over 30 years began (Kandel et al., 2000). Through this debate, the concept of distributed synaptic plasticity within the cerebellum has been gradually developed (Sect. 2.5). In addition, there was an attempt to include the effect of recurrent connections in the Marr-Albus-Ito model (Tabata et al., 2002).

2.2.2 Eyeblink Conditioning

Eyeblink conditioning is a type of classical conditioning, in which an animal receives repeated presentations of a neutral stimulus such as a tone (conditioned stimulus; CS) paired with an aversive stimulus such as an airpuff to the eye (unconditioned stimulus; US). The animal becomes conditioned to close its eyes in response to the tone (conditioned response; CR). Moreover, in delay eyeblink conditioning paradigms, the CR is elicited with a delay equal to the interstimulus interval (ISI) between the CS and US onsets (see Mauk and Donegan (1997) for review). The essence of eyeblink conditioning models is how to represent the passage of time during the CS. Most studies attempted to include the timing mechanisms in the granular layer. The first model used tapped delay lines (Desmond & Moore, 1988; Moore et al., 1989), in which neurons are connected in series and activated one by one sequentially. Delay lines of this kind have been proposed for sound localization in the auditory brainstem; this is known as the Jeffress model (Jeffress, 1948). Fujita’s adaptive filter model (section “Eye Movement Control”) was also extended and applied (Gluck et al., 1990). In that model, granule cells were assumed to exhibit different sinusoidal temporal activity patterns with various frequencies and phases, which could perform Fourier expansion. Another extension was a spectral timing model (Bullock et al., 1994), which assumed multiple Golgi cells with different membrane time constants and generated transient activities of granule cells with different timings and amplitudes. Buonomano and Mauk (1994) focused on the network dynamics in the granular layer and proposed that granule cells exhibit sparse and chaotic or random activity patterns through recurrent inhibitory connections with Golgi cells. These models led to the refinement of general computational principles of the cerebellum (Sect. 2.7).
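
How a tapped delay line can support timed responses may be illustrated with a toy Python sketch. It is not a reimplementation of any cited model: it assumes discrete time steps, one delay-line unit active per step after CS onset, a Purkinje cell that sums the weighted unit activities, and an LTD-like rule that depresses the synapse of whichever unit is active when the US-driven climbing fiber fires. The function names and parameters are hypothetical.

```python
def delay_line_activity(n_taps, t):
    """Units fire one by one: unit t is active at time step t after CS onset."""
    return [1.0 if k == t else 0.0 for k in range(n_taps)]


def condition(n_taps=10, isi=6, n_trials=20, eta=0.2):
    """Pair CS and US repeatedly; the climbing fiber fires at the ISI."""
    w = [1.0] * n_taps
    for _ in range(n_trials):
        for t in range(n_taps):
            pf = delay_line_activity(n_taps, t)
            cf = 1.0 if t == isi else 0.0
            # LTD-like rule: depress synapses active together with the CF
            w = [max(0.0, wi - eta * p * cf) for wi, p in zip(w, pf)]
    return w


def purkinje_response(w):
    """Purkinje cell drive at each time step of a CS-alone test trial."""
    n = len(w)
    return [sum(wi * p for wi, p in zip(w, delay_line_activity(n, t)))
            for t in range(n)]


trained = condition()
response = purkinje_response(trained)
pause_time = response.index(min(response))  # step with the weakest drive
```

After training, the Purkinje cell drive dips only at the time step matching the ISI, which in this simplified picture would release the downstream nuclei and time the CR.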

Other studies attempted to represent the passage of time outside of the granular layer. Braitenberg et al. (1997) proposed that parallel fibers provide delay lines by assuming large conduction delays. The concept of spectral timing models (Bullock et al., 1994) was followed by Fiala et al. (1996), Steuber and Willshaw (2004), and Majoral et al. (2020), which proposed that parallel fiber synapses on Purkinje cells could exhibit various temporal activity patterns lasting for seconds through the activation of metabotropic glutamate receptors. Kotaleski et al. (2002) assumed that biochemical interactions within Purkinje cells produce an increase in protein kinase C (PKC) activation, which could contribute to temporal sensitivity of Purkinje cells lasting for seconds. Hong and Optican (2008) proposed a similar timing mechanism through interactions between Purkinje cells and molecular layer interneurons.

These models provide theoretical support for sparse coding in granule cells (Sect. 2.3), which has long stood in opposition to a similar coding hypothesis (section “Distributed Versus Similar Coding in the Granular Layer”).

For timing models, more detailed reviews are provided elsewhere (Yamazaki & Tanaka, 2009).

2.3 Granular Layer Encoding

One of the important concepts of the Marr-Albus-Ito model is the granular layer encoding of mossy fiber inputs. After the publication of Albus (1971), Albus published another paper that aimed to apply the cerebellar control mechanisms to engineering applications (Albus, 1975). The model, called the Cerebellar Model Articulation Controller (CMAC), introduced a tile coding scheme within the granular layer. Models for VOR/OKR and eyeblink conditioning assumed various encoding schemes in the granular layer (Sect. 2.2). Tyrrell and Willshaw (1992) examined the possibility and efficiency of the granular layer encoding by a large-scale computer simulation for the first time. Later, the notion of sparse coding (Olshausen & Field, 1996) was introduced for the granular layer encoding, which was realized by anti-Hebbian learning mechanisms (Schweighofer et al., 2001), principal component analysis (PCA) (Dean et al., 2002), and chaotic spatiotemporal dynamics (Rössert et al., 2015). These studies were followed by those on unified gain and timing mechanisms and general computational principles (Sect. 2.7). Recently, the information capacity of sparse coding in the granular layer was examined mathematically (Cayco-Gajic et al., 2017), adding a new approach to an abundance of studies of the information capacity of Purkinje cells (Sect. 2.4). However, not all experimental data support the sparse encoding hypothesis, which is discussed later (section “Distributed Versus Similar Coding in the Granular Layer”).
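
As a rough illustration of tile coding, the following Python sketch encodes a scalar input with several overlapping tilings, each shifted by a fraction of a tile width. It is a generic textbook-style sketch under assumed parameters (number of tilings, tiles per tiling, input range) and is not taken from Albus (1975).

```python
def active_tiles(x, n_tilings=4, tiles_per_tiling=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a scalar input x in [lo, hi].

    Each tiling is shifted by a fraction of the tile width, so nearby inputs
    share most of their active tiles while distant inputs share none.
    """
    width = (hi - lo) / tiles_per_tiling
    active = []
    for k in range(n_tilings):
        offset = k * width / n_tilings
        idx = int((x - lo + offset) / width)
        idx = min(idx, tiles_per_tiling)  # extra tile absorbs the shift at hi
        # give every tiling its own block of indices
        active.append(k * (tiles_per_tiling + 1) + idx)
    return active


code_a = active_tiles(0.30)
code_b = active_tiles(0.31)  # close to 0.30
code_c = active_tiles(0.90)  # far from 0.30
overlap_near = len(set(code_a) & set(code_b))
overlap_far = len(set(code_a) & set(code_c))
```

The resulting code is sparse (only one tile per tiling is active) and generalizes locally, which is the property that makes a linear readout on top of it easy to train.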

2.4 Information Capacity of Purkinje Cells

Another issue of neural encoding is the information encoding by parallel fibers on Purkinje cells. Purkinje cells receive excitatory inputs from about 200,000 parallel fibers, but 80% of them are silent. This massive convergence may play an important functional role in cerebellar computation. The first influential study was reported by Brunel et al. (2004). The authors calculated how many spatial patterns can be embedded in parallel fiber synapses on a Purkinje cell, assuming that parallel fibers and Purkinje cells are binary neurons and using the same technique as for analyzing associative memory capacity (Hopfield, 1982). Later, the study was extended to temporally correlated input patterns (Clopath et al., 2012) and to analog rather than binary neurons (Clopath & Brunel, 2013). Independently of these studies, Porrill and Dean (2008) reported that adaptive filter models using a covariance learning rule could achieve optimal synaptic weights against noisy parallel fiber inputs, suggesting that long-term potentiation (LTP) at parallel fiber-Purkinje cell synapses is also important (Medina & Mauk, 1999).

2.5 Distributed Synaptic Plasticity

One of the most active debates on the cerebellum is probably the location of motor memory in the cerebellum. As seen above (section “Eye Movement Control”), Miles and Lisberger (1981) proposed that mossy fiber-vestibular nuclei synapses store the memory of eye movement gain in VOR, whereas Ito et al. (1982) proposed parallel fiber-Purkinje cell synapses as the memory site (Kandel et al., 2000). Medina and Mauk (1999) built a computer simulation model that has two synaptic plasticity mechanisms, one for mossy fiber-vestibular nuclei synapses and one for parallel fiber-Purkinje cell synapses. They found that learned memory at mossy fiber-vestibular nuclei synapses is stable at the resting state if the memory formation is guided by Purkinje cells innervating the nuclei. Dual plasticity models have also been studied in depth mathematically (Masuda & Amari, 2008; Clopath et al., 2014). These models mainly address the formation of motor memory, not the consolidation process of learned memory. Yamazaki et al. (2015) integrated both formation and consolidation processes in a single model and succeeded in reproducing various experimental results including spacing effects.

These studies were accompanied by experimental findings of multiple distributed forms of synaptic plasticity within the cerebellum (see Boyden et al. (2004) and D’Angelo (2014) for review). Among them, plasticity at parallel fiber synapses on molecular layer interneurons is considered a mechanism that could supersede parallel fiber-Purkinje cell LTD. Parallel fiber-molecular layer interneuron synapses undergo LTP with conjunctive activation of a presynaptic parallel fiber and a postsynaptic molecular layer interneuron that could be activated by spillover of glutamate released from nearby climbing fibers, whereas molecular layer interneurons inhibit Purkinje cells. This suggests that parallel fiber-molecular layer interneuron LTP could provide the same function for the cerebellar cortex as a supervised learning machine (Jörntell et al., 2010). Yamazaki and Lennon (2019) built a model of the cerebellar cortex that takes both parallel fiber-Purkinje cell LTD and parallel fiber-molecular layer interneuron LTP into account and analyzed the system dynamics. Contrary to the classical hypothesis of the cerebellar cortex as a supervised learning machine, the authors suggested that the cerebellar cortex could act as a reinforcement learning machine. We will discuss this issue in Sect. 2.7.

2.6 Internal Models

Ito (1970) already described the role of internal feedback from the cerebellum to the cerebral cortex that could act as a forward model. Forward models are well-known in engineering and control theory, and so they were readily adopted in the context of motor control by the cerebellum. A Kalman filter model (Paulin, 1989) and a Smith predictor model (Miall & Stein, 1993) were successful examples. Inverse models, closely related to forward models, were proposed by Kawato et al. (1987). Inverse models are acquired by a feedback error learning scheme (Kawato & Gomi, 1992), which was proposed as a model of cerebro-cerebellar interactions. Furthermore, a general architecture consisting of multiple paired forward and inverse models was proposed (Wolpert & Kawato, 1998) and was finalized as the MOSAIC (modular selection and identification for control) architecture (Haruno et al., 2001). The concept has even been adopted for cognitive processes (Ramnani, 2006, 2014). A tandem architecture of forward and inverse models was applied to interpret adaptation of voluntary movements (Honda et al., 2018). In general, forward and inverse models are called internal models. Internal models may be the most successful of all theoretical studies in the history of cerebellar research. A comprehensive review on internal models has been provided elsewhere (Wolpert et al., 1998).
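
The principle of feedback error learning can be sketched in a deliberately simplified setting. The Python example below assumes a static plant whose output is the motor command multiplied by a fixed gain, a proportional feedback controller, and a one-parameter feedforward (inverse) model trained with the feedback command as its error signal. The plant, the gains, and the function name are hypothetical and chosen only for illustration; the original scheme addresses dynamic systems.

```python
import math


def feedback_error_learning(plant_gain=2.0, fb_gain=1.0, eta=0.1, n_steps=2000):
    """Learn a scalar inverse model w so that w * desired drives the plant.

    The ideal inverse of the static plant y = plant_gain * u is 1 / plant_gain.
    """
    w = 0.0
    for step in range(n_steps):
        desired = math.sin(0.1 * step)  # desired trajectory
        u_ff = w * desired              # feedforward command (inverse model)
        # Closed-loop output, solved from
        #   y = plant_gain * (u_ff + fb_gain * (desired - y))
        y = plant_gain * (u_ff + fb_gain * desired) / (1.0 + plant_gain * fb_gain)
        u_fb = fb_gain * (desired - y)  # feedback command
        # The feedback command serves as the teaching signal
        w += eta * u_fb * desired
    return w


learned_w = feedback_error_learning()
```

As the inverse model improves, the feedback command shrinks toward zero, so learning stops once the feedforward path alone produces the desired output.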

2.7 General Computational Principles

Theoretical models for VOR/OKR (section “Eye Movement Control”) and eyeblink conditioning (section “Eyeblink Conditioning”) led to generalization of the computational principles. Liquid state machines (Yamazaki & Tanaka, 2007) and generalized adaptive filter models (Dean et al., 2010) were proposed as general computational principles of the cerebellum, extending the Marr-Albus-Ito model. These studies were able to explain gain learning (e.g., VOR adaptation) and timing learning (e.g., eyeblink conditioning) by a single computational principle (Yamazaki & Nagao, 2012). Another theoretical study examined the potential of the cerebellar cortical circuit as a universal function approximator based on mathematical functional analysis (Fujita, 2016).

Pursuing general computational principles of the cerebellum led to studies that would supersede the classical view of the cerebellum as a supervised learning machine. In supervised learning, learning is driven by teacher or error signals (Raymond & Medina, 2018). However, various other learning schemes may be used as well in the cerebellar cortex (Streng et al., 2018; Hull, 2020). An early attempt was made by Kitazawa (2002), who proposed noise-driven learning at Purkinje cells. The same idea was elaborated recently under the name stochastic gradient descent (SGD), which is a general technique for finding optimal solutions used in the field of machine learning (Bouvier et al., 2018). These schemes enable the cerebellar cortex to search for optimal solutions autonomously. Further elaboration proposes that the cerebellar cortex is a reinforcement learning machine. In reinforcement learning, an agent (i.e., the cerebellum) acquires an optimal action strategy called a policy for a given environment by maximizing expected future reward through trial and error (Sutton & Barto, 2018). Yamazaki and Lennon (2019) proposed that Purkinje cells and molecular layer interneurons act as an actor and a critic, respectively, in an actor-critic model of reinforcement learning, while assuming that climbing fibers convey reward information.
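
The actor-critic scheme can be illustrated with a generic, single-state Python sketch in the style of Sutton and Barto (2018). It is not the circuit model of Yamazaki and Lennon (2019); the mapping of the actor to Purkinje cells, the critic to molecular layer interneurons, and the reward to climbing fiber signals appears only as comments, and all names and parameters are assumptions.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(rewards=(1.0, 0.0), n_steps=500,
                     alpha=0.1, beta=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # actor (Purkinje cells in the analogy)
    value = 0.0                   # critic (molecular layer interneurons)
    for _ in range(n_steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]  # reward signal (climbing fibers)
        # Prediction error; with a single state there is no next-state term
        delta = reward - value
        value += beta * delta     # critic update
        for a in range(len(prefs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += alpha * delta * grad  # actor (policy gradient) update
    return prefs, value


prefs, value = run_actor_critic()
policy = softmax(prefs)
```

In this toy task, the policy comes to favor the rewarded action without any explicit teacher signal specifying which action is correct, which is the key difference from supervised learning.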

2.8 Olivocerebellar System

In the Marr-Albus-Ito model, the inferior olive is considered the source of teacher or error signals conveyed by climbing fibers that drive learning. Neurons in the inferior olive are connected electrically by gap junctions and exhibit subthreshold oscillation of membrane potentials (Llinás & Sugimori, 1980). A research group has proposed that the inferior olive is not the site for motor learning but a site for controlling motor timing (Welsh et al., 2005) by using temporal dynamics of the subthreshold oscillation (see Llinás (2011) for review). Tokuda et al. (2010) proposed that such subthreshold oscillation combined with chaotic dynamics accelerates learning owing to chaotic resonance mechanisms. A recent theoretical study proposed that gap junctions in the inferior olive constrain motor learning by controlling the degrees of freedom of a system (Hoang et al., 2020). This study could interpret the role of the subthreshold oscillation in the inferior olive, which was the basis of the motor timing hypothesis, within the context of the Marr-Albus-Ito model. Another group studying the olivocerebellar system developed a recurrent network model composed of the inferior olive, Purkinje cells, and cerebellar nuclei, linked via nucleo-olivary connections, that could control the learning rate and suppress overlearning (Kenyon et al., 1998a, b).

3 Perspectives

3.1 Summary of the Evolutionary Tree

The evolutionary tree provides several interesting observations. First, the node that has the largest number of outgoing paths is Fujita (1982a), suggesting that this model is the most influential one, from which many successors were derived. On the other hand, the node that has the largest number of incoming paths is Dean et al. (2010), which is also a variant of adaptive filter models. Thus, adaptive filter models could be regarded as a backbone of all theoretical models on the cerebellum. Second, various important concepts have been introduced from outside cerebellar research, such as the perceptron, adaptive filtering, sparse coding, and reinforcement learning. This “crossover” has continuously improved and expanded the original concept of the Marr-Albus-Ito model for the next generations. In turn, the original concept has not been altered substantially by the crossover, indicating its robustness and concreteness. Third, a number of excellent review papers have been made available in a timely manner. These review papers help researchers to obtain an up-to-date and consistent view of the computational principles of the cerebellum at each point in time. The evolution will be able to continue further, as long as we regularly incorporate new concepts from the outside and publish many review papers.

3.2 Unresolved Issues

Although various issues have been resolved through the evolution of the Marr-Albus-Ito model, there are still several unresolved issues, including:

  • Distributed versus similar coding in the granular layer

  • Information representation by mossy fiber and climbing fiber signals

  • Localized versus distributed computation revealed by distributed climbing fiber activity

3.2.1 Distributed Versus Similar Coding in the Granular Layer

While the granular layer encoding, in which granule cells receive mossy fibers transmitting different information, has theoretical as well as experimental support (Billings et al., 2014; Ishikawa et al., 2015; Gilmer & Person, 2017), another group suggests that granule cells receive mossy fibers that represent the same information (Bengtsson & Jörntell, 2009). The scheme, which those authors called “similar coding,” may enable granule cells to transmit weak sensory inputs in a graded manner. The similar coding hypothesis suggests that the timing mechanism in eyeblink conditioning (section “Eyeblink Conditioning”) exists not in the granular layer but at Purkinje cells, which is supported by experimental observations (see Johansson et al. (2016) for review). To help resolve this argument, large-scale, wide field-of-view imaging studies of granule cells have provided an essential perspective (Knogler et al., 2017; Giovannucci et al., 2017; Wagner et al., 2017). For comprehensive reviews, see Spanne and Jörntell (2015) and Gilmer and Person (2018).

3.2.2 Information Representation by Mossy Fiber and Climbing Fiber Signals

Classical debates on the information conveyed by mossy and climbing fibers centered on whether it is sensory or motor, because the cerebellum is known as a central locus for motor control. However, recent imaging studies have revealed that various types of information are represented by mossy and climbing fibers. Notably, even reward-related information is represented in both mossy fiber and climbing fiber signals (Badura & Zeeuw, 2017; Gilmer & Person, 2018). Furthermore, anatomical connections with the ventral tegmental area have been found (Carta et al., 2019). These findings suggest the involvement of the cerebellum in reinforcement learning.

3.2.3 Localized Versus Distributed Computation Revealed by Distributed Climbing Fiber Activity

Previous theoretical studies have examined the computational capability of a single microcomplex, which is thought of as a functional module of the cerebellum (Ito, 1984). Over the cerebellar surface, a number of microcomplexes are arranged regularly in space to constitute the entire cerebellar circuit. Different microcomplexes could have different functional roles, so for each specific task, a subset of microcomplexes might be employed to achieve the task, while the rest remain silent. In other words, microcomplexes are organized in a task-specific manner. MOSAIC models seem consistent with this view (Haruno et al., 2001).

However, a striking Ca2+ imaging study by Michikawa et al. (2020) has revealed that all microzones are always activated simultaneously, suggesting that all cerebellar modules could function in a holistic manner. This study implies that many microcomplexes, rather than a small subset, share functions for a given task. Many questions arise: how do different microcomplexes share a task? And do they interact with each other to accomplish the task? To address this issue, we must examine how multiple microcomplexes share a task on the fly. This could provide new insights into distributed computations over the entire cerebellum.

3.3 Future Directions

A future direction of cerebellar research will be to uncover the role of the cerebellum embedded in the whole brain network for higher-order cognitive functions, for which internal models of mental processes called “mental models” would play essential roles (Ito, 2008, 2012), while sharing the same computational principles with motor functions (Koziol et al., 2012). The interactions will be made through the cerebro-cerebellar communication loop (Allen & Tsukahara, 1974). For such higher-order functions, a whole brain learning mechanism would be necessary, including the cerebral cortex, basal ganglia, and cerebellum. In a pioneering study, Doya pointed out different roles of the cerebral cortex, basal ganglia, and cerebellum and argued how these regions interact with each other (Doya, 1999, 2000). In particular, direct interactions between the basal ganglia and cerebellum have been found experimentally (Carta et al., 2019), suggesting that the cerebellum is involved in even social tasks (D’Angelo, 2019). The whole brain learning architecture model has been extended recently (Caligiore et al., 2019).

These studies have assumed that the cerebellum is a supervised learning machine. However, Yamazaki and Lennon (2019) proposed that the cerebellum might act as a reinforcement learning machine by incorporating synaptic plasticity at parallel fiber-molecular layer interneuron synapses as well as conventional parallel fiber-Purkinje cell LTD. Furthermore, the paper proposed that the whole brain could act as a hierarchical deep reinforcement learning machine, where the cerebral cortex stores deep representation of states and actions, the cerebro-basal ganglia loop performs higher reinforcement learning for goal setting and planning by decomposing a global task into a number of smaller subtasks, and the cerebro-cerebellar loop performs lower reinforcement learning that solves the subtasks in parallel (Fig. 11.3). Deep hierarchical reinforcement learning has proven to be a powerful machine learning algorithm (Kulkarni et al., 2016), which would be suitable for higher-order cognitive and social functions realized by the whole brain (Kawato et al., 2021).

Fig. 11.3

A hypothetical role of cerebro-basal ganglia loop and cerebro-cerebellar loop for hierarchical reinforcement learning (Yamazaki & Lennon, 2019). The cerebral cortex (green) stores deep representation of states and actions. The cerebro-basal ganglia (blue) loop performs higher reinforcement learning that decomposes a global task into a number of smaller subtasks for goal setting and planning. The cerebro-cerebellar (yellow) loop performs lower reinforcement learning to solve subtasks for action execution in parallel. The thalamus (red) would play a certain role in the interactions of the dual loops. Abbreviation: RL reinforcement learning

4 Conclusion

The Marr-Albus-Ito model has cultivated a vast research field for theoretical models. The evolution of our knowledge of the cerebellum will continue to expand, including interactions with other brain regions towards understanding the whole brain learning mechanism.