
1 The Classic D1/D2 Direct/Indirect Model of the Basal Ganglia Networks

Today, neurology textbooks (e.g., Adams and Victor’s Principles of Neurology 10th Edition, 2014) depict the basal ganglia (BG) as the feed-forward part of a closed loop connecting all cortical areas sequentially through the BG direct and indirect pathways back to the motor cortex [1, 5]. The motor cortex projects to the spinal level through the corticospinal pathway and controls muscle activation and movements (Fig. 1.1).

Fig. 1.1

The classical D1/D2 direct/indirect model of the basal ganglia. Gray and black arrows represent excitatory (glutamate) and inhibitory (GABA) connections, respectively. Gray and black round shape arrows represent excitatory and inhibitory effects of dopamine on MSNs expressing D1 and D2 receptors, respectively. Abbreviations: DAN midbrain dopaminergic neurons, GPe, GPi external and internal segments of the globus pallidus, MSN striatal medium spiny (projection) neurons, SNr substantia nigra pars reticulata, STN subthalamic nucleus

This BG model emphasizes the structure of the two segregated internal BG pathways. Both pathways start in the projection neurons of the striatum and converge on the output structures of the basal ganglia (the internal segment of the globus pallidus – the GPi; the substantia nigra pars reticulata – the SNr). The striatal projection neurons in the direct pathway are medium spiny neurons (MSNs) that express D1 dopamine receptors, whereas those in the indirect pathway express D2 dopamine receptors [11]. Both D1 and D2 MSNs use GABA as their main neurotransmitter. The “direct pathway” is a monosynaptic GABAergic inhibitory projection from the striatum to the GPi/SNr, whereas the “indirect pathway” projection is polysynaptic and disinhibitory through the external segment of the globus pallidus (GPe) and the glutamatergic (excitatory) subthalamic nucleus (STN). Dopamine has differential effects on the two striato-pallidal pathways. It excites and facilitates transmission along the direct pathway via activation of D1 receptors and inhibits transmission along the indirect pathway via the D2 receptors.

The classical D1/D2 direct/indirect rate model of the basal ganglia has been one of the most influential models in the history of clinical neuroscience. It provides a general framework for the findings of physiological studies in Parkinsonian MPTP-treated monkeys (Fig. 1.2). These studies found that, following dopamine depletion, the average discharge rate of GPe neurons decreased, whereas the discharge rates of GPi [9, 19] and STN [6] neurons increased. Opposite changes in pallidal discharge rates in response to dopamine replacement therapy have been reported in both human patients [13, 15, 18] and nonhuman primates [10, 12, 21].

Fig. 1.2

The classical D1/D2 direct/indirect model of the basal ganglia in the dopamine-depleted Parkinsonian animal. Conventions and abbreviations as in Fig. 1.1. Increased/decreased arrow widths (in comparison with Fig. 1.1) represent increased/decreased activity. Black bold/gray characters represent neuronal structures with increased/decreased discharge rates

The classical D1/D2 direct/indirect model can also explain the physiological mechanisms of dopamine replacement therapy for Parkinson's disease. Postsynaptic dopamine agonists restore the normal dopamine tone in the striatum, and therefore raise the excitability of the motor cortex and ameliorate Parkinsonian akinesia. Similarly, inactivation of the STN or GPi, whether by GABA agonists, by lesions [5, 32], or by deep brain stimulation (under the assumption that deep brain stimulation mimics inactivation, see below), reduces the over-activation of the BG inhibitory output to the motor thalamocortical networks.
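
To make the sign logic of the model concrete, here is a minimal steady-state rate sketch. The linear form, the weights, the baseline rates, and the names (`BGRates`, `steady_state`, `stn_gain`) are illustrative assumptions, not values fitted to data; only the signs of the connections follow Fig. 1.1 (D1 MSNs inhibit GPi/SNr, D2 MSNs inhibit GPe, GPe inhibits STN, STN excites GPi/SNr, dopamine excites D1 and inhibits D2 MSNs).

```python
from dataclasses import dataclass


@dataclass
class BGRates:
    """Steady-state firing rates (arbitrary units) of each stage."""
    d1: float   # direct-pathway (D1) MSNs
    d2: float   # indirect-pathway (D2) MSNs
    gpe: float  # external globus pallidus
    stn: float  # subthalamic nucleus
    gpi: float  # GPi/SNr output


def relu(x: float) -> float:
    """Firing rates cannot be negative."""
    return max(x, 0.0)


def steady_state(dopamine: float = 1.0, stn_gain: float = 1.0) -> BGRates:
    """Feed-forward rates for a given dopamine level (1.0 = assumed normal tone).

    stn_gain = 0.0 silences the STN, standing in for lesion, GABA-agonist
    inactivation, or DBS under the inactivation assumption.
    """
    d1 = relu(1.0 + (dopamine - 1.0))          # dopamine excites D1 MSNs
    d2 = relu(1.0 - (dopamine - 1.0))          # dopamine inhibits D2 MSNs
    gpe = relu(60.0 - 20.0 * d2)               # D2 MSNs inhibit GPe (GABA)
    stn = stn_gain * relu(40.0 - 0.5 * gpe)    # GPe inhibits STN (GABA)
    gpi = relu(60.0 - 20.0 * d1 + 0.5 * stn)   # D1 MSNs inhibit, STN excites (glutamate)
    return BGRates(d1, d2, gpe, stn, gpi)


normal = steady_state(dopamine=1.0)
depleted = steady_state(dopamine=0.2)                       # MPTP-like depletion
depleted_stn_off = steady_state(dopamine=0.2, stn_gain=0.0)  # plus STN inactivation
```

With these assumed numbers, depletion moves GPe from 40 to 24, STN from 20 to 28, and GPi from 50 to 70, the qualitative pattern reported in MPTP-treated monkeys; silencing the STN in the depleted state brings GPi back down to 56, and restoring `dopamine=1.0` returns all rates to normal.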

However, the model has several limitations. First, recent anatomical, physiological, and theoretical studies have revealed that the connectivity of the basal ganglia is more complex than the simple connectivity depicted by the D1/D2 direct/indirect model (e.g., back projections from the STN to the GPe and from the GPe to the striatum, the hyper-direct cortex–STN pathway, etc.). Second, the model falls short in explaining the dynamic patterns of basal ganglia activity in Parkinson's disease. A common finding of physiological recordings in MPTP-treated monkeys [6, 7, 9, 19, 23, 31] and in human patients with Parkinson's disease [16, 17, 29, 30, 33] is an increase in the fraction of basal ganglia neurons that discharge in periodic bursts at the tremor frequency (3–7 Hz), at double the tremor frequency, and in the beta range (12–30 Hz). Finally, the classical D1/D2 direct/indirect rate model ignores the emerging roles of the basal ganglia in reinforcement learning (see below) and in behavioral adaptation to a changing environment.
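
Such oscillatory bursting is usually quantified spectrally. As a minimal sketch, assuming a recorded signal (for example a binned spike train or a local field potential) sampled at `fs` Hz, the power in the tremor and beta bands named above can be compared with a plain FFT periodogram. The function names and the synthetic test signals are hypothetical, the double-tremor band is omitted for brevity, and real analyses typically use smoothed estimators (e.g., Welch's method) rather than a raw periodogram.

```python
import numpy as np

TREMOR_BAND = (3.0, 7.0)    # Hz, tremor frequency range given in the text
BETA_BAND = (12.0, 30.0)    # Hz, beta range given in the text


def band_power(signal, fs, band):
    """Summed periodogram power of `signal` (sampled at `fs` Hz) inside `band`."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2           # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # bin frequencies in Hz
    lo, hi = band
    in_band = (freqs >= lo) & (freqs <= hi)
    return float(power[in_band].sum())


def dominant_oscillation(signal, fs):
    """Return 'tremor' or 'beta', whichever band carries more power."""
    tremor = band_power(signal, fs, TREMOR_BAND)
    beta = band_power(signal, fs, BETA_BAND)
    return "tremor" if tremor > beta else "beta"


fs = 1000.0                          # assumed sampling rate, Hz
t = np.arange(int(2 * fs)) / fs      # 2 s of sample times
print(dominant_oscillation(np.sin(2 * np.pi * 5 * t), fs))   # tremor
print(dominant_oscillation(np.sin(2 * np.pi * 20 * t), fs))  # beta
```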

2 The Reinforcement Learning Model of the Basal Ganglia

More modern computational models of the basal ganglia [27] treat the basal ganglia as an actor/critic reinforcement learning network (Fig. 1.3). The main axis, or actor, implements the behavioral policy, i.e., the mapping between states and actions, whereas the critic calculates the mismatch between predictions and the actual state (the prediction error). The prediction error is used to update the agent's predictions and to optimize the behavioral policy, by reinforcing the state-to-action associations that led to a state of affairs better than predicted and by weakening those that led to a state worse than predicted. Rewards can be either positive or negative in these models, and the computational goal is to maximize the cumulative (future discounted) reward.
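
The actor/critic scheme can be illustrated with a minimal tabular sketch. It assumes a one-state task in which every action ends the episode with a fixed reward, so the prediction error reduces to δ = r + γ·V(s′) − V(s) with V(s′) = 0. The critic moves its prediction V by a fraction of δ, and the actor adds a fraction of δ to the preference of the action just taken (a textbook form of the actor/critic update). The task, parameter values, and function names are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)                                 # subtract max for stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(rewards, episodes=2000, alpha_critic=0.1, alpha_actor=0.1,
                 gamma=0.9, seed=0):
    """Tabular actor/critic on a one-state task.

    rewards[a] is the fixed reward for action a; every action ends the
    episode, so the value of the next (terminal) state is 0.
    """
    rng = random.Random(seed)
    value = 0.0                      # critic: predicted return V(s)
    prefs = [0.0] * len(rewards)     # actor: state-to-action preferences
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        next_value = 0.0                                       # terminal state
        delta = rewards[action] + gamma * next_value - value   # prediction error
        value += alpha_critic * delta          # critic: update the prediction
        prefs[action] += alpha_actor * delta   # actor: strengthen/weaken the choice
    return value, softmax(prefs)


value, policy = actor_critic(rewards=[0.0, 1.0])
```

After training, the policy strongly favors the rewarded action and the critic's prediction approaches its reward, so the prediction error, and with it further learning, fades.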

Fig. 1.3

Schematic actor/critic model of a reinforcement learning agent

In terms of BG anatomy (Fig. 1.4), the neural networks of the BG main axis (actor) connect the state-encoding cortical domains with the cortical and brainstem motor centers. The midbrain dopaminergic neurons (located mainly in the substantia nigra pars compacta, SNc, and in the ventral tegmental area, VTA) are the critics of the basal ganglia. Deviations from their normal background activity (~4–5 spikes/s) encode the mismatch between predictions and reality. Positive prediction errors (reality better than predictions) are encoded by bursts of the dopamine neurons. Conversely, omission of an expected reward, prediction of aversive events, and other cases of negative prediction error (reality worse than predictions) are encoded by depression (pauses) of their spiking activity [26, 28]. These changes in dopamine activity, together with coinciding cortical and striatal discharge, lead to plastic changes in the efficacy of the cortico-striatal synapses (long-term potentiation or depression, respectively), and therefore to modulation of the association between states (encoded by cortical activity) and actions (encoded by BG output activity).
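
A minimal sketch of this three-factor idea, assuming that the prediction error is read out as the deviation of a dopamine neuron's firing rate from a 4.5 spikes/s background (the midpoint of the range quoted above) and that the cortico-striatal weight change is the product of presynaptic activity, postsynaptic activity, and that deviation. The learning rate, the clipping bounds, and the function names are hypothetical.

```python
BASELINE_RATE = 4.5   # spikes/s, assumed midpoint of the ~4-5 spikes/s background


def prediction_error(dan_rate, baseline=BASELINE_RATE):
    """Signed prediction error as the deviation from background firing:
    bursts (> baseline) are positive, pauses (< baseline) are negative."""
    return dan_rate - baseline


def update_weight(w, pre, post, dan_rate, lr=0.01, w_min=0.0, w_max=1.0):
    """Three-factor rule: dw = lr * pre * post * prediction error.

    Only synapses whose cortical (pre) and striatal (post) activity coincide
    with a dopamine deviation change; the result is clipped to [w_min, w_max].
    """
    dw = lr * pre * post * prediction_error(dan_rate)
    return min(max(w + dw, w_min), w_max)


w0 = 0.5
w_burst = update_weight(w0, pre=1.0, post=1.0, dan_rate=20.0)   # burst -> LTP
w_pause = update_weight(w0, pre=1.0, post=1.0, dan_rate=0.0)    # pause -> LTD
w_silent = update_weight(w0, pre=0.0, post=1.0, dan_rate=20.0)  # no coincidence
```

Because firing rates cannot fall below zero, a pause can signal at most a deviation of 4.5 spikes/s below baseline in this sketch, whereas bursts can exceed baseline several-fold; this asymmetry is one way to see why the encoding of negative prediction errors remains debated.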

Fig. 1.4

Anatomical description of the actor/critic reinforcement model of the basal ganglia. Abbreviations and conventions as in Fig. 1.1

The reinforcement actor/critic model of the basal ganglia has revolutionized the current understanding of the physiological mechanisms of model-free (procedural, implicit) learning and may provide insights into basal ganglia-related disorders such as akinesia and levodopa-induced dyskinesia. However, like the classical D1/D2 direct/indirect model, this model has its own shortcomings. For example, the reinforcement learning BG model fails to provide a mechanism for the ultrafast action of dopamine agonists and antagonists (such as apomorphine and haloperidol, respectively). There is also an ongoing debate in the electrophysiological literature on whether dopaminergic neurons can encode the negative domain of pleasure prediction [14]. Finally, the model assumes a single final currency (pleasure or its absence) to control behavior, and thus probably does not describe the multidimensional emotional repertoire of humans and animals.

3 The Multi-objective Optimization Model of the Basal Ganglia

There are other neuromodulators of the basal ganglia in addition to the midbrain dopaminergic neurons. The striatum is highly enriched with cholinergic, serotonergic, and histaminergic markers, and many anatomical and physiological studies have suggested that the striatal cholinergic interneurons, dorsal raphe serotonin (5-HT) neurons, and tuberomammillary histamine neurons are part of the basal ganglia critic system. We recently hypothesized [22] that the computational goal of the basal ganglia is to optimize the trade-off between the orthogonal goals of maximizing future cumulative gain and minimizing the behavioral (information) cost (i.e., multi- rather than single-objective optimization). This multi-objective optimization goal naturally leads to a softmax-like behavioral policy in which each of the BG critics plays a dual role. First, as in previous reinforcement learning models, the BG critics affect the efficacy of the cortico-striatal synapses [2, 24, 27]. Second, the BG critics also affect the excitability of the striatal projection neurons (as in the classical D1/D2 direct/indirect BG model), and therefore act as the pseudo-temperature parameter of the softmax. This pseudo-temperature parameter controls the trade-off between gain and cost, and the continuum between exploratory (gambling) and greedy (akinetic) behavioral policies (the motor vigor [8, 20]). The different critics have differential effects on the state-to-action coupling and on the pseudo-temperature (excitability) of the basal ganglia network (Fig. 1.5 and Table 1.1). At present, we assume that dopamine and serotonin increase the temperature, whereas acetylcholine and histamine reduce it. Similarly, dopamine and histamine increase the coupling between state and action, whereas serotonin and acetylcholine reduce it. The reason for this heterogeneity is the variability of the environment and of the optimal responses of the agent.
Both appetitive and aversive predictive cues and events should increase the pseudo-temperature to enable approach and escape, respectively. However, appetitive events should increase the state-to-action coupling, leading to reinforcement of the behavior that resulted in a better-than-predicted state. Conversely, aversive events should reduce the state-to-action coupling. Dopamine, released for appetitive events, and serotonin, released for aversive events, have similar effects on the pseudo-temperature and opposite effects on the state-to-action coupling (Table 1.1), and are therefore ideally suited to these demands. Similar reasoning can be applied to the roles of acetylcholine and histamine in BG information processing.
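
The dual role of the critics can be sketched with a softmax policy. As an assumption, if the information cost is taken to be the mutual information between states and actions (one standard formalization), the policy that trades expected gain against that cost has the form π(a|s) ∝ p(a)·exp(Q(s,a)/T), where p(a) is the marginal action distribution and T is the pseudo-temperature. The sketch below encodes the directions of Table 1.1 as stated in the text, lets each critic rescale T (the excitability effect) and shift the value of the action just taken (the coupling effect). The gains `k_temp` and `k_coupling`, the multiplicative form, and all names are hypothetical.

```python
import math

# Directions of each critic's effect as stated in the text (Table 1.1):
# (pseudo-temperature, state-to-action coupling); +1 increase, -1 decrease.
CRITIC_EFFECTS = {
    "dopamine": (+1, +1),
    "serotonin": (+1, -1),
    "acetylcholine": (-1, -1),
    "histamine": (-1, +1),
}


def softmax_policy(q_values, temperature, prior=None):
    """pi(a) proportional to prior(a) * exp(Q(a) / T); uniform prior by default."""
    if prior is None:
        prior = [1.0] * len(q_values)
    m = max(q_values)                      # shift for numerical stability
    weights = [p * math.exp((q - m) / temperature) for q, p in zip(q_values, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (nats): high for exploratory, low for greedy policies."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def apply_critic(critic, level, temperature, q_values, action,
                 k_temp=0.5, k_coupling=0.1):
    """Apply a critic signal of size `level` (>= 0) after `action` was taken.

    Excitability effect: temperature is rescaled multiplicatively (stays positive).
    Coupling effect: the value of the action just taken is shifted up or down.
    """
    sign_t, sign_c = CRITIC_EFFECTS[critic]
    new_temperature = temperature * math.exp(sign_t * k_temp * level)
    new_q = list(q_values)                 # do not mutate the input
    new_q[action] += sign_c * k_coupling * level
    return new_temperature, new_q
```

Raising T flattens the policy toward exploration, while lowering it concentrates probability on the best-valued action, the greedy end of the continuum described above.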

Fig. 1.5

Anatomical description of the multi-objective optimization model of the basal ganglia. Abbreviations and conventions as in Fig. 1.1

Table 1.1 Multiple and differential effects of BG critics on BG pseudo-temperature (through modulation of the excitability of striatal projection neurons) and state-to-action coupling (through modulation of the efficacy of cortico-striatal synapses)

Finally, the unique features of funneling along the main axis of the basal ganglia [4] are included in this new model. In the nonhuman primate, there are 10⁹ neurons in the cortex that project to the striatum, 10⁷ projection neurons in the striatum, and 10⁵ neurons in the output structure of the basal ganglia (GPi and SNr). This funneling structure (schematically illustrated by the box size in Fig. 1.5) enables the basal ganglia to extract the features of the current state that are important for the ongoing and future movements. For example, when you unexpectedly meet your grandmother in the corridor of your department, the most relevant feature is that this is your grandmother and not the white–blue dress or the new hairstyle. Your next action is to approach and kiss your grandmother, and this does not depend on her specific dress and hairstyle.
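
The neuron counts above imply a simple compression ratio between consecutive stages of the main axis. The sketch below only divides the order-of-magnitude counts given in the text; the stage labels and the function name are illustrative.

```python
# Order-of-magnitude neuron counts in the nonhuman primate, from the text.
NEURON_COUNTS = {
    "cortico-striatal neurons": 1e9,
    "striatal projection neurons": 1e7,
    "GPi/SNr output neurons": 1e5,
}


def funneling_ratios(counts):
    """Ratio of neuron numbers between each pair of consecutive stages."""
    stages = list(counts.items())
    return {
        f"{src} -> {dst}": n_src / n_dst
        for (src, n_src), (dst, n_dst) in zip(stages, stages[1:])
    }


ratios = funneling_ratios(NEURON_COUNTS)
overall = NEURON_COUNTS["cortico-striatal neurons"] / NEURON_COUNTS["GPi/SNr output neurons"]
for step, ratio in ratios.items():
    print(f"{step}: {ratio:.0f}x")
print(f"overall: {overall:.0f}x")
```

Each stage has roughly 100 times fewer neurons than the one before it, and there are about 10⁴ cortico-striatal neurons for every output neuron, the scale of compression that motivates the feature-extraction interpretation.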

The multi-objective optimization model better captures the multifaceted organization of the actor/critic network of the basal ganglia. The combined effects of the critics on striatal excitability and on cortico-striatal synaptic efficacy enable the model to account for both ultrafast effects (e.g., apomorphine) and slow procedural learning kinetics. Furthermore, the model provides insights into the role of the non-dopaminergic critics in the basal ganglia physiology and pathophysiology (e.g., dopamine–acetylcholine motor balance and serotonin-related depression in Parkinson’s disease).
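
The two timescales can be contrasted with a toy two-action softmax. The sketch compares a single step change of the pseudo-temperature (standing in for a fast excitability effect, such as a drug acting on striatal neurons) with slow plasticity that shrinks the value difference between the actions by a small amount per trial, and counts how many plasticity trials are needed to reach the same policy. The direction of the change, all numbers, and the function names are hypothetical; the sketch shows only the separation of timescales, not a model of any specific drug.

```python
import math


def p_first(q_diff, temperature):
    """Softmax probability of the first of two actions, given Q(a0) - Q(a1)."""
    return 1.0 / (1.0 + math.exp(-q_diff / temperature))


def trials_to_match(q_diff=1.0, t_before=0.5, t_after=2.0, lr=0.01,
                    max_trials=100_000):
    """Count slow plasticity steps needed to match one fast temperature change.

    Fast route: the pseudo-temperature jumps from t_before to t_after at once.
    Slow route: T stays at t_before and each trial shrinks the value
    difference by lr (cortico-striatal plasticity).
    """
    target = p_first(q_diff, t_after)      # reached immediately by the fast route
    diff = q_diff
    for trial in range(1, max_trials + 1):
        diff -= lr
        if p_first(diff, t_before) <= target + 1e-12:  # tolerance for float error
            return trial
    return None                            # never matched within max_trials


print(trials_to_match())   # 75 with these assumed numbers
```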

The first step in the treatment of Parkinson's disease today is dopamine replacement therapy (DRT). This treatment aims to restore the normal function of the BG critics. The first goal of DRT is to restore the full dynamic range of dopamine physiology, including phasic and environment-related changes in the dopamine level. However, the increased sprouting of dopaminergic axons, the over-sensitization of dopamine receptors, and other pathophysiological changes occurring over the many years of DRT lead to abnormal dynamics of dopamine in the striatum of these patients. This is further aggravated by the use of dopamine agonists, which act directly on postsynaptic receptors. Hence, striatal dopamine concentration and effects are no longer dependent on the environment and the behavior of the patient [3]. After five to ten years of DRT, Parkinsonian patients no longer experience the benefits that DRT provided at the start of treatment, and side effects such as levodopa-induced dyskinesia impair their quality of life.

The D1/D2 direct/indirect model and the physiological recordings in the MPTP primate model of Parkinson's disease have shifted the focus of therapy from the critic to the actor part of the basal ganglia. Physiological and metabolic studies have revealed changes in the discharge rate, pattern, and synchronization of neurons in the STN and GPi of MPTP-treated monkeys. Inactivation of these overactive BG nuclei in monkeys and humans ameliorates Parkinsonian symptoms and has led to new therapeutic methods that can be applied after DRT fails. We hypothesized that the basal ganglia network is the default, fast, and unconscious link between the neural structures encoding the current state and action (akin to System 1 in Daniel Kahneman's Thinking, Fast and Slow, 2011). However, there are many additional networks, for example, the amygdala–hypothalamic–pituitary–adrenal (HPA) axis and cortico-cortical networks. These networks provide parallel connectivity between state and action (Fig. 1.6); however, since the BG is the default connection between state and action, the other networks cannot compensate for abnormal BG activity. Silencing the abnormal BG activity enables the other networks to compensate and to reestablish close-to-normal state-to-action coupling.

Fig. 1.6

The basal ganglia network is one of many neural networks linking state to action in the nervous system

However, permanent inactivation of a BG target can only be achieved by lesioning, and hence is not recommended as the therapy of choice. Deep brain stimulation (DBS) is a reversible and adjustable procedure, and thus better suits current demands for efficient and ethical therapy. DBS effects mimic inactivation effects. There is still an active debate concerning the mechanism underlying DBS (e.g., depolarization block or activation of afferent inhibitory projections); however, there is a general consensus that STN and GPi DBS provide effective treatment of late and even early-stage Parkinson's disease. Thus, the modern therapy of Parkinson's disease and other BG disorders has shifted from chemical manipulation of the neurotransmitter levels of the BG critics to manipulation of the spiking activity of the BG actor. DBS treatments are also effective in other basal ganglia-related movement disorders such as dystonia and essential tremor, and are currently being tested for mental disorders such as obsessive–compulsive disorder and major depressive disorder.

We predict that the next generation of DBS devices will exploit BG actor/critic multi-objective optimization algorithms and will provide even better therapy for human patients. Today, DBS adjustments must be made by a physician every 2–10 weeks. However, the dynamic and complex nature of Parkinson's disease calls for more frequent and more sophisticated adjustment of the DBS parameters. This can be achieved by closed-loop DBS methods [25]. These future closed-loop DBS devices will be modulated by BG neural activity, by objective telemetry of the patient's symptoms, and by the patient's and caregivers' subjective evaluation of quality of life. This closed-loop modulation aims to achieve multi-objective optimization of the patient's motor and nonmotor symptoms while minimizing the side effects of DBS therapy. Better understanding of the computational physiology of the basal ganglia in health and disease is therefore the first step on the long path toward better treatment of patients with basal ganglia disorders.
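
As a purely illustrative sketch of the multi-objective idea, assume that a closed-loop controller estimates symptom severity from a neural biomarker and side-effect burden from the stimulation amplitude, and at each new reading picks the amplitude that minimizes a weighted sum of the two, with the weights standing in for the priorities of the patient and caregivers. The dose-response curves, the weights, the amplitude grid, and every function name are hypothetical; this is not the method of [25] or of any existing device.

```python
import math


def symptom_estimate(amplitude, biomarker):
    """Hypothetical dose-response: symptoms scale with a neural biomarker
    (e.g., pathological oscillatory power) and decay with amplitude."""
    return biomarker * math.exp(-amplitude / 1.5)


def side_effect_estimate(amplitude):
    """Hypothetical side-effect burden that grows quadratically with amplitude."""
    return 0.1 * amplitude ** 2


def choose_amplitude(biomarker, w_symptom=1.0, w_side=1.0, grid=None):
    """Amplitude (arbitrary units) minimizing the weighted multi-objective cost."""
    if grid is None:
        grid = [0.25 * i for i in range(21)]   # candidate amplitudes 0.0 ... 5.0

    def cost(a):
        return (w_symptom * symptom_estimate(a, biomarker)
                + w_side * side_effect_estimate(a))

    return min(grid, key=cost)


def closed_loop(biomarker_stream, w_symptom=1.0, w_side=1.0):
    """Re-optimize the amplitude at every new biomarker reading."""
    return [choose_amplitude(b, w_symptom, w_side) for b in biomarker_stream]


print(closed_loop([0.0, 1.0, 4.0]))   # [0.0, 1.25, 2.5]
```

In this toy setting a stronger biomarker drives a higher amplitude, while a heavier weight on side effects pulls the amplitude back down, which is the trade-off a multi-objective controller would have to arbitrate.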