6.1 Introduction

Chakravarthy, Joseph, and Bapi (2010) suggested that the STN–GPe loop, a coupled excitatory–inhibitory network in the IP, might be the substrate for exploration. It is well known that coupled excitatory–inhibitory pools of neurons can exhibit rich dynamic behavior such as oscillations and chaos (Borisyuk, Borisyuk, Khibnik, & Roose, 1995; Sinha, 1999). This hypothesis has inspired models simulating various BG functions, ranging from action selection in continuous spaces (Krishnan, Ratnadurai, Subramanian, Chakravarthy, & Rengaswamy, 2011) to reaching movements (Magdoom et al., 2011), spatial navigation (Sukumar, Rengaswamy, & Chakravarthy, 2012), precision grip (Gupta, Balasubramani, & Chakravarthy, 2013), and gait (Muralidharan, Balasubramani, Chakravarthy, Lewis, & Moustafa, 2013), in normal and Parkinsonian conditions. Using a network of rate-coding neurons, Kalva, Rengaswamy, Chakravarthy, and Gupte (2012) showed that exploration emerges out of the chaotic dynamics of the STN–GPe system. Most rate-coded models, by design, fail to capture dynamic phenomena such as synchronization that are found in more realistic spiking neuron models (Bevan, Magill, Terman, Bolam, & Wilson, 2002; Park, Worth, & Rubchinsky, 2010, 2011). Synchronization within BG nuclei has gained attention since the discovery that STN, GPe, and GPi neurons show high levels of synchrony in Parkinsonian conditions (Bergman, Wichmann, Karmon, & DeLong, 1994; Bevan et al., 2002; Hammond, Bergman, & Brown, 2007; Tachibana, Iwamuro, Kita, Takada, & Nambu, 2011; Weinberger & Dostrovsky, 2011). This oscillatory activity is present in two frequency bands, one around the tremor frequency (2–4 Hz) and another in the 10–30 Hz range (Weinberger & Dostrovsky, 2011). Park et al. (2011) report intermittent synchrony between STN neurons and their local field potentials (LFPs), recorded using multiunit activity electrodes from PD patients undergoing deep brain stimulation (DBS) surgery, which is absent in healthy controls.

One of the key objectives of the current study is to use a 2D spiking neuron model to understand and correlate STN–GPe synchrony levels with exploration. As the second objective, we apply the above-mentioned model to the n-armed bandit problem of Daw, O’Doherty, Dayan, Seymour, and Dolan (2006) and Bourdaud, Chavarriaga, Galán, and del R Millan (2008), with the specific aim of studying the contributions of STN–GPe dynamics to exploration. The proposed model shares some aspects of the classical RL-based approach to BG modeling. For example, the dopamine signal is compared to the reward prediction error (Schultz, 1998). Furthermore, DA is allowed to control cortico-striatal plasticity (Reynolds & Wickens, 2002), modulate the gains of striatal neurons (Hadipour-Niktarash, Rommelfanger, Masilamoni, Smith, & Wichmann, 2012; Kliem, Maidment, Ackerson, Chen, Smith, & Wichmann, 2007), and influence the dynamics of STN–GPe by modulating their connections (Fan, Baufreton, Surmeier, Chan, & Bevan, 2012; Kreiss, Mastropietro, Rawji, & Walters, 1997).

6.2 Methods

6.2.1 Spiking Neuron Model of the Basal Ganglia

The network model of BG (Mandali, Rengaswamy, Chakravarthy, & Moustafa, 2015) described earlier was used to simulate the binary action selection and n-armed bandit tasks. For details of the model and its related equations, refer to the earlier sections. The details of the tasks and the related measures are explained below.

6.2.2 Binary Action Selection Task

The first task we simulated was simple binary action selection, similar to Humphries, Stewart, and Gurney (2006), where two competing stimuli were presented to the model. The input firing frequency is taken to represent ‘saliency,’ with higher frequencies representing higher salience (Humphries et al., 2006). The response of striatal output to cortical input falls in the range of a few tens of Hz (Sharott, Doig, Mallet, & Magill, 2012). Therefore, the frequencies representing the two actions were assumed to be around 4 Hz (stimulus #1) and 8 Hz (stimulus #2). The spontaneous output firing rate of the striatal neurons (without input) is assumed to be around 1 Hz (Plenz & Kitai, 1998; Sharott et al., 2012). Selecting the more salient stimulus among the available choices can be considered ‘exploitation,’ while selecting the less salient one is ‘exploration’ (Sutton & Barto, 1998). Accordingly, the action selected is defined as ‘Go’ if stimulus #2 (more salient) is selected, ‘Explore’ if stimulus #1 (less salient) is selected, and ‘NoGo’ if neither is selected.

The inputs were given spatially such that the neurons in the upper half of the lattice receive stimulus #1 and those in the lower half receive stimulus #2 (Fig. 6.1). The striatal outputs from the D1 and D2 neurons of the striatum are given as input to the GPi and GPe modules, respectively, with the projection pattern shown in Fig. 6.1. Poisson spike trains corresponding to stimulus #1 were presented as input to neurons 1–1250 and were fully correlated among themselves; likewise, Poisson spike trains corresponding to stimulus #2 were presented as input to neurons 1251–2500 and were fully correlated among themselves. Stimuli #1 and #2 are presented for an interval of 100 ms, between 100 and 200 ms; at other times, uncorrelated spike trains at 1 Hz are presented to all the striatal neurons.
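As a concrete illustration, the input scheme above can be sketched as follows. This is a minimal sketch: the time step, total duration, and array layout are assumptions, while the rates, pool sizes, and stimulus timing follow the text.

```python
import numpy as np

def striatal_inputs(n_neurons=2500, rate1=4.0, rate2=8.0, base_rate=1.0,
                    t_total=0.3, dt=1e-3, stim_on=0.1, stim_off=0.2, seed=0):
    """Poisson input spike trains: neurons 1-1250 get stimulus #1 (4 Hz)
    and neurons 1251-2500 get stimulus #2 (8 Hz) during the stimulus
    window; uncorrelated 1 Hz background at all other times."""
    rng = np.random.default_rng(seed)
    n_steps = round(t_total / dt)
    spikes = np.zeros((n_neurons, n_steps), dtype=bool)
    half = n_neurons // 2
    for step in range(n_steps):
        t = step * dt
        if stim_on <= t < stim_off:
            # fully correlated within each pool: one Bernoulli draw per pool
            spikes[:half, step] = rng.random() < rate1 * dt
            spikes[half:, step] = rng.random() < rate2 * dt
        else:
            # uncorrelated background, independent draw per neuron
            spikes[:, step] = rng.random(n_neurons) < base_rate * dt
    return spikes
```

Because a single Bernoulli draw is shared by all neurons in a pool during the stimulus window, the spike trains within each pool are fully correlated, as the task specifies.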

6.2.3 The N-Armed Bandit Task

We now describe the four-armed bandit task (Bourdaud et al., 2008; Daw et al., 2006) used to study exploratory and exploitative behavior. In this experimental task, subjects were presented with four arms, one of which had to be selected in every trial, for a total of 300 trials. The reward/payoff for each of these slots was drawn from a Gaussian distribution whose mean changes from trial to trial, with payoffs ranging from 0 to 100. The payoff \( r_{i,k} \) associated with the \( i \)th machine at the \( k \)th trial was drawn from a Gaussian distribution with mean \( \mu_{i,k} \) and standard deviation (SD) \( \sigma_{0} \), and was rounded to the nearest integer in the range [0, 100]. At each trial, the mean is diffused according to a decaying Gaussian random walk. A trial was labeled ‘exploitative’ if the arm giving the highest reward was selected, and ‘exploratory’ otherwise.

The payoffs generated by the slot machines are computed as follows,

$$ \mu_{i,k + 1} = \lambda_{m} \mu_{i,k} + (1 - \lambda_{m} )\theta_{m} + {\text{e}} $$
(6.1)
$$ r_{i,k}^{{\prime }} \sim N(\mu_{i,k} ,\sigma_{0}^{2} ) $$
(6.2)
$$ r_{i,k} = {\text{round}}(r_{i,k}^{\prime} ) $$
(6.3)

where

\( \mu_{i,k} \) is the mean of the Gaussian distribution with standard deviation \( \sigma_{0} \) for the \( i \)th machine during the \( k \)th trial. \( \lambda_{m} \) and \( \theta_{m} \) control the random walk of the mean \( \mu_{i,k} \), and \( e \sim N(0,\sigma_{d}^{2}) \) is drawn from a Gaussian distribution with mean 0 and standard deviation \( \sigma_{d} \). \( r_{i,k}^{\prime } \) and \( r_{i,k} \) are the payoffs before and after rounding to the nearest integer, respectively. The initial value of the mean payoff, \( \mu_{i,0} \), is set to 50. All the values of the parameters \( \lambda_{m} \), \( \theta_{m} \), \( \sigma_{d} \), and \( \sigma_{0} \) were adapted from Bourdaud et al. (2008).
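A minimal sketch of this payoff-generation process (Eqs. 6.1–6.3) might look as follows. The specific parameter values used here are illustrative placeholders, not the fitted values from Bourdaud et al. (2008).

```python
import numpy as np

def bandit_payoffs(n_arms=4, n_trials=300, lam_m=0.98, theta_m=50.0,
                   sigma_d=2.8, sigma_0=4.0, mu_0=50.0, seed=0):
    """Generate the slot-machine payoff schedule of Eqs. (6.1)-(6.3):
    each arm's mean follows a decaying Gaussian random walk, and each
    payoff is a rounded Gaussian sample clipped to [0, 100]."""
    rng = np.random.default_rng(seed)
    mu = np.full(n_arms, mu_0)                        # mu_{i,0} = 50
    means = np.zeros((n_trials, n_arms))
    rewards = np.zeros((n_trials, n_arms), dtype=int)
    for k in range(n_trials):
        means[k] = mu
        r_prime = rng.normal(mu, sigma_0)                 # Eq. (6.2)
        rewards[k] = np.clip(np.round(r_prime), 0, 100)   # Eq. (6.3)
        e = rng.normal(0.0, sigma_d, size=n_arms)
        mu = lam_m * mu + (1 - lam_m) * theta_m + e       # Eq. (6.1)
    return means, rewards
```

With \( \lambda_{m} < 1 \), each arm's mean decays toward \( \theta_{m} \) while the noise term keeps the arms drifting, so the identity of the best arm changes over trials, forcing an explore–exploit trade-off.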

To make an optimal decision, the subjects need to keep track of the rewards associated with each of the four arms. The subject’s decision to explore or exploit depends on this internal representation, which should closely resemble the actual payoffs being obtained. It is quite difficult to identify whether the subject made an exploratory or an exploitative decision just by observing the EEG and selected-slot data; a subject-specific model is required to classify the decisions and identify the strategy (Bourdaud et al., 2008; Daw et al., 2006). Keeping this in mind, Bourdaud et al. (2008) used a ‘behavioral model’ based on the softmax principle of RL to fit the selection pattern of human subjects. The parameter ‘β’, which controls the exploration level in the behavioral model, was tuned so that the final selection pattern matches the % exploitation of each individual subject in the experiment. Of the eight subjects (one subject’s data were discarded because of artifacts), two had similar exploration levels; hence, a total of six subjects’ data were taken into account to check the performance of the proposed spiking BG model.

6.2.3.1 Behavioral Model (Adapted from Bourdaud et al. (2008))

The behavioral model labels each trial as corresponding to either an exploratory or an exploitative decision. The model assumes that the subject estimates the mean payoff of each machine using a Bayesian linear Gaussian rule (i.e., a Kalman filter) and, using these estimates, selects a machine according to a softmax rule. All the subjects are assumed to share the same model for tracking the payoff means, and thus the parameters are computed using all the available data. The parameters of the model (for both mean tracking and machine selection) are estimated by maximizing the model likelihood with respect to the subject’s choices.

At any given trial, the behavioral model provides the mean payoff for all machines considering previous observations (i.e., the payoff obtained at previous trials). Comparison between the model’s estimated payoffs for all machines is used to label that trial as either exploration or exploitation. Those trials in which the user selects the machine with the highest estimated mean are labeled as corresponding to exploitative decisions.

The subject strategy for tracking the payoff of each machine is modeled by a Kalman filter, whose parameters are assumed to remain constant over trials. Once the jth machine is selected, at the kth trial, the estimated payoff distribution is updated from its preselection values \( \left( {\widehat{\mu }_{j,k}^{\text{pre}} ,\left( {\widehat{\sigma }_{j,k}^{\text{pre}} } \right)^{2} } \right) \) to its post-selection values \( \left( {\widehat{\mu }_{j,k}^{\text{post}} ,\left( {\widehat{\sigma }_{j,k}^{\text{post}} } \right)^{2} } \right) \) as follows

$$ \widehat{\mu }_{j,k}^{\text{post}} = \widehat{\mu }_{j,k}^{\text{pre}} + K_{k} \left( {r_{k} - \widehat{\mu }_{j,k}^{\text{pre}} } \right) $$
(6.4)
$$ \left( {\widehat{\sigma }_{j,k}^{\text{post}} } \right)^{2} = (1 - K_{k} )\left( {\widehat{\sigma }_{j,k}^{\text{pre}} } \right)^{2} $$
(6.5)

where

$$ K_{k} = \frac{{\left( {\widehat{\sigma }_{j,k}^{\text{pre}} } \right)^{2} }}{{\left( {\widehat{\sigma }_{j,k}^{\text{pre}} } \right)^{2} + \sigma_{0}^{2} }} $$
(6.6)

The mean estimation for the remaining machines does not change as result of the choice since the user cannot observe the payoff of these machines. That is,

$$ \forall i \ne j $$
$$ \widehat{\mu }_{i,k}^{\text{post}} = \widehat{\mu }_{i,k}^{\text{pre}} $$
(6.7)
$$ \widehat{\sigma }_{i,k}^{\text{post}} = \widehat{\sigma }_{i,k}^{\text{pre}} $$
(6.8)

The estimates then evolve according to the diffusion rule:

$$ \widehat{\mu }_{j,k + 1}^{\text{pre}} = \widehat{\lambda }\widehat{\mu }_{j,k}^{\text{post}} + (1 - \widehat{\lambda })\widehat{\theta } $$
(6.9)
$$ \left( {\widehat{\sigma }_{j,k + 1}^{\text{pre}} } \right)^{2} = \widehat{\lambda }^{2} \left( {\widehat{\sigma }_{j,k}^{\text{post}} } \right)^{2} + \widehat{\sigma }_{d}^{2} $$
(6.10)
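Putting Eqs. (6.4)–(6.10) together, one trial of the behavioral model’s payoff tracking can be sketched as below. This is an illustrative reimplementation in plain Python, not the authors’ code; the hat-parameters are passed in explicitly.

```python
def kalman_trial_update(mu_pre, var_pre, chosen, reward,
                        sigma0_sq, lam_hat, theta_hat, sigmad_sq):
    """One trial of payoff tracking: Kalman update for the chosen
    machine (Eqs. 6.4-6.6), unchanged estimates for the rest
    (Eqs. 6.7-6.8), then diffusion of all estimates (Eqs. 6.9-6.10)."""
    mu_post = list(mu_pre)
    var_post = list(var_pre)
    K = var_pre[chosen] / (var_pre[chosen] + sigma0_sq)               # (6.6)
    mu_post[chosen] = mu_pre[chosen] + K * (reward - mu_pre[chosen])  # (6.4)
    var_post[chosen] = (1.0 - K) * var_pre[chosen]                    # (6.5)
    # Diffuse all estimates to the next trial's pre-selection values:
    mu_next = [lam_hat * m + (1.0 - lam_hat) * theta_hat
               for m in mu_post]                                      # (6.9)
    var_next = [lam_hat ** 2 * v + sigmad_sq for v in var_post]       # (6.10)
    return mu_next, var_next
```

Note that only the chosen machine’s estimate is corrected toward the observed payoff; the others merely drift, mirroring the fact that the subject cannot observe unchosen payoffs.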

The choice of subjects is modeled by a softmax rule; i.e., at each trial k, the probability of choosing the machine is

$$ P_{i,k} = \frac{{\exp \left( {\beta \widehat{\mu }_{i,k}^{\text{pre}} } \right)}}{{\sum\limits_{j} {\exp \left( {\beta \widehat{\mu }_{j,k}^{\text{pre}} } \right)} }} $$
(6.11)

where ‘β’ is a scaling parameter: higher values of β drive the system toward exploitative behavior, and lower values toward exploration. The parameters of the behavioral model \( \left( {\sigma_{0} ,\widehat{\theta },\widehat{\lambda },\widehat{\sigma }_{d} } \right) \) are estimated by maximizing the log likelihood. To speed up convergence, the estimated parameters \( \left( {\sigma ,\widehat{\mu }_{j,0}^{\text{pre}}\, \& \, \widehat{\sigma }_{j,0}^{\text{pre}} } \right) \) are initialized to the parameters of the original model \( (\sigma_{0} ,\mu_{j,0} \, \& \, \sigma_{j,0} ) \), respectively. Fixing the last two parameters does not significantly affect the estimation of the others, because their influence vanishes within a few trials. Table 6.1 shows the estimated values of the model, which are consistent with the real values of the machines.
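The softmax choice rule of Eq. (6.11) can be written compactly as below. This is a sketch; the max-subtraction is a standard numerical-stability trick, not part of the original formulation.

```python
import numpy as np

def softmax_choice(mu_pre, beta, rng):
    """Select a machine with probability proportional to
    exp(beta * estimated mean payoff), Eq. (6.11)."""
    z = beta * np.asarray(mu_pre, dtype=float)
    p = np.exp(z - z.max())          # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p)), p
```

Higher β concentrates the probability mass on the machine with the highest estimated payoff (exploitation), while β → 0 approaches a uniform random choice (exploration).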

Table 6.1 Estimation of parameters of the behavioral model (Bourdaud et al., 2008)

6.2.3.2 Strategy for Slot Machine Selection

To simulate the experiment, we utilized the concepts of RL and combined them with the dynamics of the BG model to select an optimally rewarding slot in each trial. Experimental data show that BG receives reward-related information in the form of dopaminergic input to the striatum (Chakravarthy et al., 2010; Niv, 2009). Dopamine-dependent changes in cortico-striatal plasticity (Reynolds & Wickens, 2002) were incorporated in the model by allowing DA signals to modulate the Hebb-like plasticity of cortico-striatal synapses (Surmeier, Ding, Day, Wang, & Shen, 2007).

The architecture of the proposed network model is depicted in Fig. 6.1. The output of the striatum (both D1 and D2 parts) was divided equally into four quadrants, each receiving input from the corresponding stimulus. Each stimulus is associated with two weights \( \left( {w_{i,0}^{{{\text{D}}1}} ,w_{i,0}^{{{\text{D}}2}} } \right) \), both initialized to 50, which represent the cortico-striatal weights of the D1 and D2 MSNs in the striatum. Each cortico-striatal weight represents the saliency (in terms of striatal spike rate) of the corresponding arm. The output spikes generated by the D1 and D2 striatum project to GPi and GPe, respectively. The final selection of an arm is made as in Sect. 6.2.5. The reward \( r_{i,k} \) received for the selected slot was sampled from a Gaussian distribution with mean \( \mu_{i,k} \) and SD \( \sigma_{0} \) (Eqs. 6.1–6.3).

Fig. 6.1
figure 1

a Computational spiking basal ganglia model with key nuclei such as striatum (D1, D2), STN, GPe, GPi, and thalamus. Excitatory glutamatergic, inhibitory GABAergic, and modulatory dopaminergic projections are shown by green, red, and violet arrows, respectively. b The BG model and the regions within each nucleus corresponding to the four decks

Using the reward obtained for input \( i \) on trial \( k \), the expected value of the slots, i.e., the cortico-striatal weights to the D1 and D2 striatum, are updated with the following equations:

$$ \Delta w_{i,k + 1}^{\text{D1}} = \eta \delta_{k} x_{i,k}^{\text{inp}} $$
(6.12)
$$ \Delta w_{i,k + 1}^{\text{D2}} = - \eta \delta_{k} x_{i,k}^{\text{inp}} $$
(6.13)

The expected value (V k ) for kth trial is calculated as

$$ V_{k} = \sum\limits_{i = 1}^{4} {w_{i,k}^{\text{D1}} *x_{i,k}^{\text{inp}} } $$
(6.14)

The received payoff (Re k ) for kth trial is calculated as

$$ {\text{Re}}_{k} = \sum\limits_{i = 1}^{4} {r_{i,k} *x_{i,k}^{\text{inp}} } $$
(6.15)

The error (δ) for kth trial is defined as

$$ \delta_{k} = {\text{Re}}_{k} - V_{k} $$
(6.16)

where \( w_{i,k}^{\text{D1}} \) and \( w_{i,k}^{\text{D2}} \) are the cortico-striatal weights of the D1 and D2 striatum for the \( i \)th machine in the \( k \)th trial, \( r_{i,k} \) is the reward obtained for the selected \( i \)th machine in the \( k \)th trial, \( x_{i,k}^{\text{inp}} \) is the binary input vector representing the four slot machines (e.g., if the first slot machine is selected, \( x_{i,k}^{\text{inp}} = [1\;0\;0\;0] \)), \( \eta \) (=0.3) is the learning rate of the D1 and D2 striatal MSNs, \( {\text{Re}}_{k} \) is the received payoff for the selected slot in the \( k \)th trial, and \( V_{k} \) is the expected value for the selected slot in the \( k \)th trial.

The cortico-striatal weights are updated (Eqs. 6.12 and 6.13) using the error term \( \delta \) (Eq. 6.16). The reward-related information carried by the dopaminergic input to the striatum has been correlated with this error (Chakravarthy et al., 2010; Niv, 2009). The \( \delta \) calculated from Eq. (6.16) takes both positive and negative values and is unbounded, but the working DA range in the model is limited to small positive values (0.1–0.9). Hence, a mapping from \( \delta \) to DA is defined as follows:

$$ {\text{DA}} = {\text{sig}}(\lambda *\delta_{k} ) $$
(6.17)

where

DA is the dopamine signal, restricted to the range 0.1–0.9; \( \lambda \) is the slope of the sigmoid (=0.2); \( \delta_{k} \) is the error obtained for the \( k \)th trial (Eq. 6.16); and sig(·) is the sigmoid function.
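The learning step defined by Eqs. (6.12)–(6.17) can be sketched as follows. The exact form of the bounded sigmoid (scaling into 0.1–0.9) is an assumption, since the text only states the working range.

```python
import numpy as np

def striatal_learning_step(w_d1, w_d2, reward, chosen, eta=0.3, lam=0.2):
    """Update cortico-striatal weights from the reward prediction
    error (Eqs. 6.12-6.16) and map the error to a DA level in the
    working range 0.1-0.9 (Eq. 6.17, with an assumed bounded sigmoid)."""
    x = np.zeros_like(w_d1)
    x[chosen] = 1.0                          # binary input vector x_inp
    V = float(w_d1 @ x)                      # expected value, Eq. (6.14)
    Re = reward                              # received payoff, Eq. (6.15)
    delta = Re - V                           # prediction error, Eq. (6.16)
    w_d1 = w_d1 + eta * delta * x            # Eq. (6.12)
    w_d2 = w_d2 - eta * delta * x            # Eq. (6.13)
    da = 0.1 + 0.8 / (1.0 + np.exp(-lam * delta))   # Eq. (6.17), bounded
    return w_d1, w_d2, delta, da
```

A positive prediction error strengthens the D1 (‘Go’) weight and weakens the D2 (‘NoGo’) weight of the chosen arm, and yields a DA level above the mid-range value of 0.5.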

6.2.4 Measures

6.2.4.1 Synchronization

The phenomenon of neural synchrony has attracted the attention of many computational and experimental neuroscientists in recent decades (Hauptmann & Tass, 2007; Kumar, Cardanobile, Rotter, & Aertsen, 2011; Park et al., 2011; Pinsky & Rinzel, 1995; Plenz & Kitai, 1999). It is believed that partial synchrony helps in the generation of various EEG rhythms such as alpha and beta (Izhikevich, 2007). Studying synchrony in neural networks has been gaining importance due to its presence in normal functioning (e.g., coordinated movement of the limbs) and in pathological states (e.g., synchronized activity of CA3 neurons in the hippocampus during an epileptic seizure) (Pinsky & Rinzel, 1995). Plenz and Kitai (1999) proposed that STN–GPe might act as a pacemaker, a source of oscillations in pathological conditions such as Parkinson’s disease. Park et al. (2011) report the presence of intermittent synchrony between STN neurons and their local field potentials (LFPs), recorded using multiunit activity electrodes from PD patients undergoing DBS surgery. They also calculated the durations of synchronized and desynchronized events in neuronal activity by estimating transition rates, obtained from first return maps of the neurons’ phases (Park et al., 2010, 2011). To observe how dopamine changes synchrony in STN–GPe, we calculated the phases of individual neurons as defined in Pinsky and Rinzel (1995).

The phase of jth neuron was calculated as follows:

$$ \emptyset_{j} \left( t \right) = 2\pi \frac{{\left( {T_{j,k} - t_{j,k} } \right)}}{{\left( {t_{j,k + 1} - t_{j,k} } \right)}} $$
(6.18)
$$ R^{\text{sync}} \left( t \right)\,{\text{e}}^{i\theta \left( t \right)} = \frac{1}{N}\mathop \sum \limits_{j = 1}^{N} {\text{e}}^{{i\emptyset_{j} \left( t \right)}} $$
(6.19)

where

\( t_{j,k} \) and \( t_{j,k+1} \) are the onset times of the \( k \)th and \( (k+1) \)th spikes of the \( j \)th neuron, and \( T_{j,k} \in \left[ {t_{j,k} ,t_{j,k + 1} } \right] \); \( \emptyset_{j} \left( t \right) \) is the phase of the \( j \)th neuron at time \( t \); \( R^{\text{sync}} \) is the synchronization measure, with \( 0 \le R^{\text{sync}} \le 1 \); \( \theta \) is the average phase of the neurons; and \( N \) is the total number of neurons in the network.
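Equations (6.18) and (6.19) amount to a Kuramoto-style order parameter computed over interpolated spike phases; a minimal sketch:

```python
import numpy as np

def spike_phase(t, t_k, t_k1):
    """Phase of a neuron at time t between its kth and (k+1)th
    spike onsets, Eq. (6.18)."""
    return 2.0 * np.pi * (t - t_k) / (t_k1 - t_k)

def r_sync(phases):
    """Population synchrony R and mean phase theta, Eq. (6.19):
    R = 1 for perfect synchrony, near 0 for scattered phases."""
    z = np.mean(np.exp(1j * np.asarray(phases)))
    return np.abs(z), np.angle(z)
```

When all neurons share the same phase the complex exponentials add coherently and R = 1; when the phases are spread uniformly around the circle they cancel and R approaches 0.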

6.2.5 Action Selection Using the Race Model

Action selection is modulated by the BG output nucleus GPi, which projects back to the cortex via the thalamus. We used the race model (Vickers, 1970) for the final action selection, in which an action is selected when the temporally integrated activity of the output neurons crosses a threshold (Frank, 2006; Frank, Samanta, Moustafa, & Sherman, 2007; Humphries, Khamassi, & Gurney, 2012).

The dynamics of the thalamic neurons is as follows:

$$ \frac{{{\text{d}}z_{k} \left( t \right)}}{{{\text{d}}t}} = - z_{k} \left( t \right) + f_{\text{GPik}} (t) $$
(6.20)
$$ \begin{aligned} f^{\prime}_{\text{GPik}} & = \frac{1}{(N*N)/k}\sum\limits_{t = 1}^{T} {\left( {\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N/k} {S_{ij}^{\text{GPik}} } } (t)} \right)} \\ f_{\text{GPik}} & = \frac{{f_{\text{GPi}}^{ \hbox{max} } - f^{\prime}_{\text{GPik}} }}{{f_{\text{GPi}}^{ \hbox{max} } }} \\ \end{aligned} $$
(6.21)

where

\( z_{k}(t) \) is the integrating variable for the \( k \)th stimulus; \( f_{\text{GPik}}(t) \) is the normalized and reversed average firing frequency of the GPi neurons receiving the \( k \)th stimulus from the striatum; \( f_{\text{GPi}}^{ \hbox{max} } \) is the highest firing rate among the GPi neurons; \( S_{ij}^{\text{GPik}} \) denotes the spikes of the GPi neurons receiving the \( k \)th stimulus; \( N \) is the number of neurons in a single row/column of the GPi array (=50); and \( T \) is the duration of the simulation.

The first integrator \( z_{k} \) among the \( k \) stimuli to cross the threshold (=0.15) determines the action selected. All the variables representing neuronal activity are reset immediately after each action selection.
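A simplified version of this race mechanism (Eqs. 6.20–6.21), with the per-channel GPi rates supplied directly rather than computed from spike counts, might look like:

```python
import numpy as np

def race_select(gpi_rates, dt=1e-3, threshold=0.15, max_steps=10000):
    """Race-model action selection: each channel's leaky integrator
    (Eq. 6.20) is driven by the normalized, reversed GPi firing rate
    (Eq. 6.21); the first channel to cross the threshold wins."""
    f = np.asarray(gpi_rates, dtype=float)
    drive = (f.max() - f) / f.max()      # reversed: low GPi rate -> high drive
    z = np.zeros_like(drive)
    for _ in range(max_steps):
        z += dt * (-z + drive)           # Euler step of dz/dt = -z + f_GPik
        if (z >= threshold).any():
            return int(np.argmax(z))     # winning channel
    return None                          # no action selected ('NoGo')
```

Because GPi is inhibitory, the rates are reversed before integration: the channel with the lowest GPi rate is the most disinhibited and wins the race.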

6.3 Results

We start with the results on neural dynamics (STN–GPe) as a function of DA and then present the decision-making results.

6.3.1 Neural Dynamics

Pathological oscillations of STN and GP have been associated with various PD symptoms (Brown, 2003; Plenz & Kitai, 1999). Correlated neural firing patterns in STN and GPi are seen both under experimental dopamine depletion and in Parkinsonian conditions. In the present model, we show increased synchronized behavior under conditions of reduced dopamine, resembling the dopamine-deficient conditions of Parkinson’s disease. The effect of DA on the synchronization of STN and GPe neurons was studied by estimating the values of \( R_{\text{STN}}^{\text{sync}} \), \( R_{\text{GPe}}^{\text{sync}} \), and \( R_{\text{STNGPe}}^{\text{sync}} \) for increasing values of DA (0.1–0.9).

The three ‘R sync’ values (Eq. 6.19) decreased in amplitude with an increase in DA level (Fig. 6.2a–c). Under low DA conditions, GPe activity follows STN activity (Plenz & Kitai, 1999), forming a pacemaker-like circuit that could be the source of STN–GPe oscillations (Fig. 6.2d). One suspected cause of bursting activity in STN is the decreased inhibition from GPe neurons at low DA levels (Plenz & Kitai, 1999). This feature is captured by the model, since GPe firing rates are smaller at lower DA levels. The STN neurons showed oscillations around 10 Hz at low DA, which were absent at high DA levels (Kang & Lowery, 2013).

Fig. 6.2
figure 2

Change in the three synchronization values \( R_{\text{STN}}^{\text{sync}} \) (a), \( R_{\text{GPe}}^{\text{sync}} \) (b), and \( R_{\text{STNGPe}}^{\text{sync}} \) (c), and the frequency content of the oscillatory activity in STN neurons (d), as the value of DA varies (0.1–0.9). Simulations show reduced synchronization within the STN and GPe networks, and also between the STN and GPe networks, as DA is increased

6.3.2 Decision Making

After the model’s performance was characterized at the neural level, we studied the role of BG in decision making, particularly its explorative and exploitative dynamics, using two tasks. This work continues our earlier hypothesis that exploration originates from STN–GPe dynamics (Kalva et al., 2012). The first task was simple binary action selection, similar to Humphries et al. (2006), where two competing stimuli were presented to the model. The input firing frequency represents ‘saliency,’ with higher frequencies representing higher salience. Selecting the stimulus with the higher salience between the two available choices can be considered ‘exploitation,’ while selecting the less salient one is ‘exploration’ (Sutton & Barto, 1998). Accordingly, the action selected is defined as ‘Go’ if stimulus #2 (more salient) is selected, ‘Explore’ if stimulus #1 (less salient) is selected, and ‘NoGo’ if neither is selected. Simulations were run for 100 trials, and the percentage of actions selected under each regime (Go, Explore, and NoGo) was calculated for dopamine levels ranging from low (0.1) to high (0.9) (Fig. 6.3). Note that the probability of NoGo, where no action is selected, decreases as dopamine increases; the probability of Go increases with dopamine; and the peak of exploration occurs at intermediate levels of dopamine (Fig. 6.3). The range of DA where the peak in exploration was observed is the same range in which the STN and GPe networks showed chaotic activity.

Fig. 6.3
figure 3

Percentage of action selection observed in the Go, NoGo, and Explore regimes averaged over 200 trials with DP and IP weight values \( w_{\text{STN} \to \text{GPi}} = 1.15 \) and \( w_{\text{Str} \to \text{GPi}} = 0.8 \). We ran the simulation for 100 trials and segmented it into 4 bins (25 trials each). We then calculated the variance of each regime across all DA levels

The second task was the four-armed bandit task (Bourdaud et al., 2008; Daw et al., 2006), which resembles a real-world decision-making scenario. In this task, the subjects are presented with four arms, one of which is to be selected in every trial, for a total of 300 trials. The reward/payoff for each of these slots was drawn from a Gaussian distribution whose mean changes from trial to trial, with payoffs ranging from 0 to 100. The model’s performance (% exploitation) was compared with that of the behavioral model, which represents the experimental data in the n-armed bandit task (Fig. 6.4). The parameter ‘β’ of the behavioral model, which controls the Exploit–Explore balance, was adjusted to match the performance of individual subjects in the experiment. Exploration in the model can be obtained either by increasing the IP weight (influence from STN) or by decreasing the DP weight (influence from the striatum).

Fig. 6.4
figure 4

Comparison of the performance of the BG model with the behavioral model. a, b The percentage exploitation obtained for each of the six subjects from the BG and behavioral models. c The relationship between the betas (β) of the behavioral model and the DP weights (\( w_{\text{Str} \to \text{GPi}} \)), with a constant \( w_{\text{STN} \to \text{GPi}} \) value (=0.75), used to attain (a). d The relationship between the betas (β) of the behavioral model and the IP weights (\( w_{\text{STN} \to \text{GPi}} \)) of the BG model, with a constant \( w_{\text{Str} \to \text{GPi}} \) value (=5), used to attain (b). The Y-axis represents percentage exploitation, and the X-axis represents a subject, which corresponds to a specific beta value (β) in the behavioral model and to an IP or DP weight in the BG model

6.4 Discussion

The synchrony results tally with the general observation from electrophysiology that at higher levels of dopamine the STN–GPe system shows desynchronized activity, while under the dopamine-deficient conditions of PD it exhibits synchronized bursts (Bergman et al., 1994; Gillies, Willshaw, Gillies, & Willshaw, 1998; Park et al., 2011). We observed that STN activity was oscillatory at a frequency (≈10 Hz) that falls within the 10–30 Hz band observed in experimental PD studies (Weinberger & Dostrovsky, 2011). One of the aims of the present work is also to show that the complex dynamics of the STN–GPe system contributes to exploration. To this end, we first simulated the binary action selection task (similar to Humphries et al., 2006), where saliency was coded in the firing rate. Selecting the more salient stimulus was defined as ‘exploitation/Go,’ selecting the less salient one as ‘exploration/Explore,’ and selecting neither input as ‘NoGo.’ The model showed NoGo at low DA levels (0.1–0.3) and Go at high DA levels (0.7–0.9), consistent with the classical picture of BG function. In addition, a peak in ‘Explore’ at intermediate levels of DA (0.4–0.6) was observed (Fig. 6.3). To check whether any other module in the network influences exploration in the system, we removed the STN-to-GPi connection (which effectively eliminated the IP). With this omission, the system displayed only the Go and NoGo regimes (no exploration; results not included). We then simulated the n-armed bandit task, where the performance of the model was compared with experimental results. The results obtained from the BG model closely match the behavioral model (Fig. 6.4), reinforcing the idea that STN–GPe could be a source of exploration at the subcortical level.