1 Introduction

A common approach in neuroscience involves recording the spiking activity (action potentials) of neurons using microelectrodes. The resulting neuronal data may then be represented as the times at which spikes occur. The main objective of a large number of statistical methods is to model the temporal evolution of the firing patterns of a group of neurons (Brillinger 1988; Brown et al. 2004; Kass et al. 2005; Gerstner and Kistler 2002; Tuckwell 1988; H.C. 1989; Riccardi 1977; Holden 1976; West 2007; Rigat et al. 2006). For a comprehensive review of the topic see Kass et al. (2005).

As an example, consider a study of neurons recorded from the primary motor cortex (M1) of a macaque monkey performing a sequential task of reaching five targets arranged horizontally on a touch-sensitive screen (Matsuzaka et al. 2007). The targets were numbered 1 to 5 from left to right and could be illuminated upon reaching them. The animal was trained to respond to the visual stimuli under two experimental conditions or modes. In the “repeating mode,” a sequence of targets appeared on the screen in a repeating order. In the “random mode,” targets appeared in a pseudo-random order. An experimental window of 300 milliseconds (ms) was used. This time window began 200 ms prior to the target reach and continued for 100 ms after it. The upper segment of Fig. 13.1 shows the corresponding raster plots for a neuron recorded under both modes of this task. Rows in the raster plot represent trials, and the tick-marks are spike time occurrences.

Fig. 13.1 Raster and PSTH plots for a neuron under repeating (left panel) and random (right panel) modes

Typically, neuronal data are summarized through peri-stimulus time histograms (PSTH). For the above example, by dividing the window of 300 ms into bins of 10 ms, and pooling the spike occurrences within each bin, one can create the PSTH plots shown in the lower segment of Fig. 13.1.
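The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `psth`, the toy spike times, and the conversion to spikes per second are ours and are not taken from the study described above.

```python
import numpy as np

def psth(spike_times_per_trial, t_start=-200.0, t_stop=100.0, bin_ms=10.0):
    """Pool spike times (in ms) across trials into fixed-width bins.

    Returns bin centers, pooled counts, and the trial-averaged rate in spikes/s.
    """
    n_bins = int(round((t_stop - t_start) / bin_ms))
    edges = t_start + bin_ms * np.arange(n_bins + 1)
    counts = np.zeros(n_bins, dtype=int)
    for spikes in spike_times_per_trial:
        c, _ = np.histogram(spikes, bins=edges)
        counts += c
    centers = 0.5 * (edges[:-1] + edges[1:])
    n_trials = len(spike_times_per_trial)
    # Divide by (number of trials x bin width in seconds) to obtain a rate.
    rate_hz = counts / (n_trials * bin_ms / 1000.0)
    return centers, counts, rate_hz

# Hypothetical toy data: three trials, spike times relative to target reach (ms).
trials = [np.array([-150.0, -20.0, 5.0]), np.array([-18.0, 40.0]), np.array([])]
centers, counts, rate_hz = psth(trials)
```

With the 300 ms window and 10 ms bins used above, this yields 30 bins; plotting `rate_hz` against `centers` as a bar chart gives a PSTH of the kind shown in the lower segment of Fig. 13.1.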

Let \(Y _{1},\mathop{\ldots },Y _{n}\) denote the number of spike occurrences within the bins centered at times \(t_{1},\mathop{\ldots },t_{n}\). A common approach to modeling the neuronal firing rates is by discretizing an inhomogeneous Poisson point process, resulting in a hierarchical model of the form

$$\displaystyle{ \begin{array}{ll} Y _{j}\, \sim &\ p(y_{j}\vert \theta _{j},\,\zeta ) \\ \theta _{j} = &\ f(t_{j}),\end{array} }$$
(13.1)

where the data model \(p(y_{j}\vert \theta _{j},\zeta )\) is usually a Poisson(\(\theta _{j}\)) density. If the bins are narrow enough they can safely be assumed (or thresholded) to contain at most one spike, so that \(Y _{j}\) can be modeled as a Bernoulli random variable. The model includes a vector of nuisance parameters \(\zeta\) to allow for generality.
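To see why narrow bins justify the Bernoulli simplification, the following sketch computes the probability that a Poisson bin contains two or more spikes and simulates one spike train from the discretized model. It assumes that \(\theta _{j}\) is the expected spike count per bin (rate times bin width); the rate function and all numerical values are hypothetical.

```python
import numpy as np

def prob_multiple_spikes(rate_hz, bin_ms):
    """P(Y >= 2) for Y ~ Poisson(theta), where theta = rate x bin width."""
    theta = rate_hz * bin_ms / 1000.0
    return 1.0 - np.exp(-theta) * (1.0 + theta)

rng = np.random.default_rng(0)

# Centers of 1 ms bins over the window [-200, 100) ms.
t = np.arange(-200.0, 100.0, 1.0) + 0.5

# Hypothetical latent firing rate f(t) in spikes/s (an assumption for illustration).
rate_hz = 20.0 + 15.0 * np.exp(-0.5 * (t / 30.0) ** 2)

# theta_j = f(t_j) x bin width in seconds: expected count in bin j.
theta = rate_hz * 1.0 / 1000.0

y_poisson = rng.poisson(theta)        # Poisson data model
y_binary = np.minimum(y_poisson, 1)   # thresholded to at most one spike per bin
```

For a 20 spikes/s neuron, `prob_multiple_spikes(20.0, 1.0)` is below one in a thousand, whereas with 10 ms bins it exceeds one in a hundred, so the thresholding matters much more for wide bins.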

The (latent) firing rates \(f(t_{j})\) are often the key quantity of interest. In particular, ignoring the details of binning or the data model, one should think of \(f(t)\) as a latent function over the whole time interval of interest. Thus in the Bayesian approach a key challenge is to produce an appropriately realistic and flexible prior distribution over latent functions, and then to provide computationally efficient procedures for approximating the posterior distribution of \(f\) given the observed spike data \(Y _{1},\mathop{\ldots },Y _{n}\).

One highly flexible approach is known as Bayesian adaptive regression splines (BARS) (Dimatteo et al. 2001). In this model the latent function \(f\) is assumed to be a spline having knots at unknown locations \(\xi _{1},\ldots,\xi _{k}\). Writing \(f(t)\) in terms of basis functions \(b_{\xi,h}(t)\) as \(f(t) =\sum _{h}b_{\xi,h}(t)\beta _{\xi,h}\), the function evaluations \(f(t_{1}),\ldots,f(t_{n})\) may be collected into a vector \((f(t_{1}),\ldots,f(t_{n}))^{T} = X_{\xi }\beta _{\xi }\), where \(X_{\xi }\) is the design matrix and \(\beta _{\xi }\) the coefficient vector. BARS then employs a reversible jump MCMC algorithm (Green 1995) to sample from a suitable approximate posterior distribution on the knot set \(\xi\). Eventually, curves are fitted via model averaging.

One advantage of using BARS for modeling PSTH is the ability to develop inferential methods suitable for comparing patterns of spiking activity in comparative problems similar to the one depicted in Fig. 13.1 (Behseta and S. 2011; Behseta et al. 2005). Kottas et al. (2012) and Kottas and Behseta (2010) also treated the problem of comparing spike trains arising from experiments similar to the one shown in Fig. 13.1, and developed a fully Bayesian inferential methodology for such comparative studies; however, they used a Dirichlet process mixture of Beta densities as the prior for \(f\).

Although single-neuron analysis of this type has led to many interesting discoveries, it is widely perceived that complex behaviors are driven by networks of neurons instead of a single neuron (Buzsáki 2010). Therefore, investigators have been recording neuronal activity from multi-probe electrodes. From the statistical point of view, multiple channel recordings greatly facilitate assessing the temporal properties of networks of neurons in real time.

Early analysis of simultaneously recorded neurons focused on correlation of activity across pairs of neurons using cross-correlation analyses (Narayanan and Laubach 2009) and analyses of changes in correlation over time, i.e., by using a Joint Peri-Stimulus Time Histogram or JPSTH (Gerstein and Perkel 1969). Similar analyses were performed in the frequency domain by using coherence analysis of neuron pairs based on Fourier-transformed neural activity (Brown et al. 2004). For the Bayesian correction for attenuation of correlation in multi-trial spike data, see Behseta et al. (2009). There are also a number of multivariate analysis techniques for the investigation of simultaneously recorded populations of neurons (Chapin 1999; Nicolelis 1999; Grün et al. 2002; Pillow et al. 2008; Harrison et al. 2013; Brillinger 1988; Brown et al. 2004; Kass et al. 2005; West 2007; Rigat et al. 2006; Patnaik et al. 2008; Diekman et al. 2009; Sastry and Unnikrishnan 2010; Kottas et al. 2012).

Recently, Kelly and Kass (2012) proposed a new method to quantify synchrony among multiple neurons. The authors argued that separating stimulus effects from history effects would allow for a more precise estimation of the instantaneous conditional firing rate. Specifically, given the firing history \(H_{t}\), define \(\lambda ^{A}(t\vert H_{t}^{A})\), \(\lambda ^{B}(t\vert H_{t}^{B})\), and \(\lambda ^{AB}(t\vert H_{t}^{AB})\) to be the conditional firing intensities of neuron A, neuron B, and their synchronous spikes respectively. Independence between the two point processes may be examined by testing the null hypothesis \(H_{0}:\zeta (t) = 1\), where \(\zeta (t) = \frac{\lambda ^{AB}(t\vert H_{ t}^{AB})} {\lambda ^{A}(t\vert H_{t}^{A})\lambda ^{B}(t\vert H_{t}^{B})}\). Here \(\zeta\) represents the excess firing rate (\(\zeta > 1\)) or the suppression of firing rate (\(\zeta < 1\)) due to dependence between two neurons (Ventura et al. 2005; Kelly and Kass 2012). That is, \(\zeta\) accounts for the excess joint spiking beyond what is explained by independence.

In this chapter we discuss alternative approaches that place a Gaussian process (GP) prior over the latent function in order to model the time-varying and history-dependent firing rate for each neuron. The joint distribution of spikes for multiple neurons is connected to their marginals using a parametric copula model. We first provide a brief overview of univariate GP models in Sect. 13.2. Then, in Sect. 13.3 we discuss the application of GP for single neuron analysis. The copula model for simultaneously recorded neurons is presented in Sect. 13.4. In Sect. 13.5, we discuss some future directions.

2 Gaussian Process Models

A Gaussian process (GP) on the real line is a random real-valued function \(x(t)\), with statistics determined by its mean function \(\mathbb{E}x(s)\) and kernel \(\kappa (s,t) = \mathrm{Cov}(x(s),x(t))\). More precisely, all finite-dimensional distributions \((x(t_{1}),\mathop{\ldots },x(t_{n}))\) are multivariate Gaussian with mean \((\mathbb{E}x(t_{1}),\mathop{\ldots }, \mathbb{E}x(t_{n}))\), and with covariance matrix \((\kappa (t_{k},t_{\ell }))_{k,\ell =1}^{n}\). Since the latter must be positive semi-definite for every finite collection of inputs \(t_{1},\mathop{\ldots },t_{n}\), only certain kernels \(\kappa\) are valid. Thus when using Gaussian processes, a practitioner often chooses from among a few popular classes of kernels, such as the Squared Exponential (SE), Ornstein–Uhlenbeck (OU), Matérn, Polynomial, and linear combinations of these. For example, we can use the following covariance form, which combines a random constant with the SE kernel and iid observation noise (Rasmussen and Williams 2006; Neal 1998):

$$\displaystyle\begin{array}{rcl} C_{ij}& =& Cov[x(t_{i}),x(t_{j})] \\ & =& \lambda ^{2} +\eta ^{2}\exp [-\rho ^{2}(t_{ i} - t_{j})^{2}] +\delta _{ ij}\sigma _{\epsilon }^{2}.{}\end{array}$$
(13.2)

Here, \(\lambda\), \(\eta\), \(\rho\), and \(\sigma _{\epsilon }\) are hyperparameters with their own hyperpriors. In general, the choice of kernel encodes our qualitative beliefs about the underlying signal. For instance, samples from a GP with OU kernel are always non-differentiable functions \(x(t)\), whereas the SE kernel generates only infinitely differentiable functions. Despite such differences, both kernels have the inverse length-scale \(\rho\) as a hyperparameter: smaller values of \(\rho\) result in more slowly varying functions. In practice we only observe GPs at a finite number of points, so local properties such as differentiability often matter little; in Cunningham et al. (2007), for example, it was observed that using the Matérn instead of the SE kernel resulted in negligible differences when modeling spike trains.
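As a minimal sketch, the covariance matrix in (13.2) can be built on a grid of time points and used to draw one realization from the GP prior. The function name `se_covariance` and the hyperparameter values are illustrative assumptions; in a full analysis the hyperparameters would be given hyperpriors rather than fixed.

```python
import numpy as np

def se_covariance(t, lam, eta, rho, sigma_eps):
    """Covariance (13.2): constant term + SE kernel + iid noise on the diagonal."""
    t = np.asarray(t, dtype=float)
    diff = t[:, None] - t[None, :]
    return (lam**2
            + eta**2 * np.exp(-rho**2 * diff**2)
            + sigma_eps**2 * np.eye(len(t)))

# Hypothetical grid and hyperparameter values, chosen only for illustration.
t = np.linspace(0.0, 1.0, 50)
C = se_covariance(t, lam=1.0, eta=1.0, rho=5.0, sigma_eps=0.1)

# One draw from the zero-mean GP prior evaluated on the grid.
rng = np.random.default_rng(0)
x = rng.multivariate_normal(np.zeros(len(t)), C)
```

Repeating the draw with a smaller `rho` produces visibly smoother, more slowly varying sample paths, in line with the role of \(\rho\) as an inverse length-scale.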

It should be remarked that many dynamical models such as autoregressive processes with Gaussian noise are also multivariate Gaussian and hence can be situated within the GP framework, albeit with a usually less interpretable kernel.

3 Gaussian Process Model of Firing Rates

With the model (13.1), note that the latent firing rates \(f(t_{i})\) need to be non-negative, hence a Gaussian process cannot be used directly as a prior distribution for \(f\). In the case of Poisson observations one can use an exponential link function, letting \(f(t) =\exp (x(t))\), where \(x(t)\) is a GP. In Cunningham et al. (2007) it was instead proposed to set the constant mean function \(\mu (t) =\mu > 0\) as an additional hyperparameter for a GP, and then to let the latent rate \(f\) be this GP conditioned to be non-negative. In a recent work, Shahbaba et al. (2014) also use the model (13.1) to estimate the underlying firing rate of neurons, but after discretizing time so that there is at most one spike within each time interval, resulting in a binary time series \(Y _{1},\mathop{\ldots },Y _{n}\) composed of 1s (spike) and 0s (silence). To model the latent firing probabilities \(f(t_{i}) = P(Y _{i} = 1)\), they apply the sigmoidal transformation

$$\displaystyle\begin{array}{rcl} f(t_{i})& =& \frac{1} {1 +\exp [-u(t_{i})]}, {}\\ \end{array}$$

where \(u(t)\) has a GP prior. Note that as \(u(t)\) increases, so does \(f(t)\). The prior autocorrelation imposed by this model allows the firing rate to change smoothly over time. When there are \(R\) trials (i.e., \(R\) spike trains) for each neuron, we can model the corresponding spike trains as conditionally independent given the latent variable \(u(t)\). Figure 13.2 shows the posterior expectation of firing rate (blue curve) overlaid on the PSTH plot of a single neuron with 5 ms bin intervals.
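The generative side of this model can be sketched as follows: draw a latent \(u\) from a GP prior, pass it through the sigmoid, and simulate \(R\) conditionally independent binary spike trains. The kernel, the prior mean of \(-3\), and the grid are assumptions made purely for illustration; the sketch does not reproduce the posterior sampling scheme of Shahbaba et al. (2014).

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bernoulli_loglik(Y, u):
    """Log-likelihood of an R x n binary matrix Y given latent u (length n).

    Trials are treated as conditionally independent given u.
    """
    log_f = -np.logaddexp(0.0, -u)     # log sigmoid(u), numerically stable
    log_1mf = -np.logaddexp(0.0, u)    # log(1 - sigmoid(u))
    return float(np.sum(Y * log_f + (1 - Y) * log_1mf))

rng = np.random.default_rng(1)
n, R = 200, 51                          # time bins and trials (illustrative)
t = np.linspace(0.0, 1.0, n)

# SE kernel with a small jitter for numerical stability (assumed values).
diff = t[:, None] - t[None, :]
K = np.exp(-(4.0 * diff) ** 2) + 1e-6 * np.eye(n)

# Latent GP draw with an assumed constant prior mean of -3 (low baseline rate).
u = -3.0 + np.linalg.cholesky(K) @ rng.standard_normal(n)
f = sigmoid(u)                          # firing probability per bin

# R spike trains, independent given u.
Y = (rng.random((R, n)) < f).astype(int)
psth_estimate = Y.mean(axis=0)          # empirical firing probability per bin
```

The function `bernoulli_loglik` is the likelihood term that would be combined with the GP prior on \(u\) in any posterior computation; averaging `Y` over trials gives the PSTH-style estimate against which a posterior mean curve like the one in Fig. 13.2 is usually compared.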

Fig. 13.2 Using the Gaussian process model of Shahbaba et al. (2014) to capture the underlying firing rate of a single neuron from prefrontal cortical areas in a rat’s brain. There are 51 spike trains recorded over 10 s. The PSTH plot is generated by creating 5 ms intervals. The curve shows the estimated firing rate (posterior expectation)

4 Detecting Synchrony Among Multiple Spike Trains

For multiple neurons, Shahbaba et al. (2014) propose to use a generalization of the method by Kelly and Kass (2012) (see Sect. 13.1) to model the joint distribution as a function of marginals. In general, models that couple the joint distribution of two (or more) variables to their individual marginal distributions are called copula models. See Nelsen (1998) for a detailed discussion of copula models. Onken et al. (2009) and Berkes et al. (2009) also use copula models for capturing neural dependencies.

Let \(H\) be an \(n\)-dimensional distribution function with marginals \(F_{1},\ldots,F_{n}\). Then an \(n\)-dimensional copula is a function \(\mathcal{H}\) of the following form:

$$\displaystyle\begin{array}{rcl} H(y_{1},\ldots,y_{n}) = \mathcal{H}(F_{1}(y_{1}),\ldots,F_{n}(y_{n})),\ \text{for all }y_{1},\ldots,y_{n}.& & {}\\ \end{array}$$

Here, \(\mathcal{H}\) defines the dependence structure between the marginals. For example, the Farlie–Gumbel–Morgenstern (FGM) copula family (Farlie 1960; Gumbel 1960; Morgenstern 1956; Nelsen 1998) is defined as follows:

$$\displaystyle\begin{array}{rcl} \mathcal{H} =\big [1 +\sum _{ k=2}^{n}\ \ \sum _{ 1\leq j_{1}<\cdots <j_{k}\leq n}\beta _{j_{1}j_{2}\ldots j_{k}}\prod _{l=1}^{k}(1 - F_{ j_{l}})\big]\prod _{i=1}^{n}F_{ i},& &{}\end{array}$$
(13.3)

where \(F_{i} = F_{i}(y_{i})\). As shown by Wilson and Ghahramani (2012), this idea can be generalized to multivariate processes. Restricting the above model to second-order interactions, we have

$$\displaystyle\begin{array}{rcl} H(y_{1},\ldots,y_{n}) =\big [1 +\sum _{1\leq j_{1}<j_{2}\leq n}\beta _{j_{1}j_{2}}\prod _{l=1}^{2}(1 - F_{ j_{l}})\big]\prod _{i=1}^{n}F_{ i},& &{}\end{array}$$
(13.4)

where \(F_{i} = P(Y _{i} \leq y_{i})\). Here, we use \(y_{1},\ldots,y_{n}\) to denote the firing status of \(n\) neurons at time \(t\); \(\beta _{j_{1}j_{2}}\) captures the relationship between the \(j_{1}\)th and \(j_{2}\)th neurons.

For a pair of neurons with firing probabilities p and q respectively, we can show that \(\beta = \frac{\zeta -1} {(1-p)(1-q)}\). As discussed in Sect. 13.1, ζ represents the excess firing rate (ζ > 1) or the suppression of firing rate (ζ < 1) due to dependence between two neurons (Ventura et al. 2005; Kelly and Kass 2012). In our model, β = 0 indicates that the two neurons are independent; the excess firing rate and the suppression of firing rate between two dependent neurons are represented by β > 0 and β < 0 respectively.
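The relation between \(\beta\) and \(\zeta\) can be checked numerically for two binary neurons. The sketch below evaluates the pairwise FGM distribution function in (13.4) at the four corners, converts it to a joint probability table, and recovers \(\zeta = P(Y _{1} = 1,Y _{2} = 1)/(pq)\). The function names and the values of \(p\), \(q\), and \(\beta\) are hypothetical.

```python
import numpy as np

def fgm_cdf(F1, F2, beta):
    """Pairwise FGM copula (13.4) for n = 2, evaluated at marginal CDF values."""
    return (1.0 + beta * (1.0 - F1) * (1.0 - F2)) * F1 * F2

def joint_pmf(p, q, beta):
    """2 x 2 table pmf[a, b] = P(Y1 = a, Y2 = b) for binary neurons.

    p and q are the marginal firing probabilities P(Y1 = 1) and P(Y2 = 1).
    """
    F1 = np.array([1.0 - p, 1.0])   # P(Y1 <= 0), P(Y1 <= 1)
    F2 = np.array([1.0 - q, 1.0])   # P(Y2 <= 0), P(Y2 <= 1)
    H = fgm_cdf(F1[:, None], F2[None, :], beta)   # H[a, b] = P(Y1 <= a, Y2 <= b)
    pmf = np.empty((2, 2))
    pmf[0, 0] = H[0, 0]
    pmf[0, 1] = H[0, 1] - H[0, 0]
    pmf[1, 0] = H[1, 0] - H[0, 0]
    pmf[1, 1] = H[1, 1] - H[0, 1] - H[1, 0] + H[0, 0]
    return pmf

# Hypothetical firing probabilities and dependence parameter.
p, q, beta = 0.1, 0.2, 0.5
pmf = joint_pmf(p, q, beta)

zeta = pmf[1, 1] / (p * q)                          # excess synchrony ratio
beta_recovered = (zeta - 1.0) / ((1.0 - p) * (1.0 - q))
```

Working through the algebra, \(P(Y _{1} = 1,Y _{2} = 1) = pq[1 +\beta (1 - p)(1 - q)]\), so \(\zeta = 1 +\beta (1 - p)(1 - q)\), which rearranges to the expression for \(\beta\) given above; the marginals of the table remain \(p\) and \(q\) for any admissible \(\beta\).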

To ensure that probability distribution functions remain within [0, 1], the following constraints on all \(n\choose 2\) parameters \(\beta _{j_{1}j_{2}}\) are imposed:

$$\displaystyle{ 1 +\sum _{1\leq j_{1}<j_{2}\leq n}\beta _{j_{1}j_{2}}\prod _{l=1}^{2}\epsilon _{ j_{l}} \geq 0,\quad \epsilon _{1},\cdots \,,\epsilon _{n} \in \{-1,1\}. }$$

Considering all possible combinations of \(\epsilon _{j_{1}}\) and \(\epsilon _{j_{2}}\) in the above condition, there are n(n − 1) linear inequalities, all of which are satisfied whenever the following inequality holds:

$$\displaystyle{ \sum _{1\leq j_{1}<j_{2}\leq n}\vert \beta _{j_{1}j_{2}}\vert \leq 1. }$$
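A brute-force check makes the relationship between the two conditions concrete. The sketch below enumerates every sign vector \(\epsilon \in \{-1,1\}^{n}\) and compares the result with the norm bound; it treats the norm bound as a sufficient condition, since \(\vert \sum \beta _{j_{1}j_{2}}\epsilon _{j_{1}}\epsilon _{j_{2}}\vert \leq \sum \vert \beta _{j_{1}j_{2}}\vert\). The function names and the example matrices are hypothetical.

```python
import itertools
import numpy as np

def sign_constraints_hold(beta, tol=1e-12):
    """Check 1 + sum_{j1<j2} beta[j1, j2] * eps[j1] * eps[j2] >= 0 for all sign vectors.

    beta is an n x n array; only its strict upper triangle is used.
    """
    beta = np.asarray(beta, dtype=float)
    n = beta.shape[0]
    iu = np.triu_indices(n, k=1)
    for eps in itertools.product((-1.0, 1.0), repeat=n):
        eps = np.array(eps)
        total = 1.0 + np.sum(beta[iu] * eps[iu[0]] * eps[iu[1]])
        if total < -tol:
            return False
    return True

def l1_bound_holds(beta):
    """Check sum_{j1<j2} |beta[j1, j2]| <= 1."""
    beta = np.asarray(beta, dtype=float)
    iu = np.triu_indices(beta.shape[0], k=1)
    return bool(np.sum(np.abs(beta[iu])) <= 1.0)

def constant_beta(n, b):
    """Hypothetical example: every pair of neurons shares the same beta = b."""
    B = np.zeros((n, n))
    B[np.triu_indices(n, k=1)] = b
    return B

inside = constant_beta(3, 0.3)     # sum |beta| = 0.9
outside = constant_beta(3, -0.5)   # sum |beta| = 1.5
```

The enumeration grows as \(2^{n}\) and is only practical for a handful of neurons, which is one reason a single norm-type constraint is convenient for posterior sampling.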

4.1 Computation

Sampling from the posterior distribution of β’s in the above copula model is quite challenging because of the imposed constraints. Lan et al. (2014) developed a Markov chain Monte Carlo algorithm for constrained target distributions of this type based on Hamiltonian Monte Carlo (HMC) (Duane et al. 1987; Neal 2011). They show that in many cases, bounded connected constrained \(D\)-dimensional parameter spaces can be bijectively mapped onto the \(D\)-dimensional unit ball. Their method then augments the original \(D\)-dimensional parameter \(\theta\) with an extra auxiliary variable \(\theta _{D+1}\) to form an extended \((D + 1)\)-dimensional parameter \(\tilde{\theta }= (\theta,\theta _{D+1})\) such that \(\Vert \tilde{\theta }\Vert _{2} = 1\), so that \(\theta _{D+1} = \pm \sqrt{1 -\Vert \theta \Vert _{ 2 }^{2}}\). In this way, the domain of the target distribution is changed from the unit ball to the \(D\)-dimensional sphere \(S^{D}\), and the Hamiltonian dynamics are defined on the sphere. The resulting HMC sampler can move freely on \(S^{D}\) while implicitly handling the constraints imposed on the original parameters. As illustrated in Fig. 13.3, the boundary of the constraint, i.e., \(\Vert \theta \Vert _{2} = 1\), corresponds to the equator on the sphere \(S^{D}\). Therefore, as the sampler moves on the sphere, passing across the equator from one hemisphere to the other translates to “bouncing back” off the boundary in the original parameter space.

Lan et al. (2014) show that by defining HMC on the sphere, besides handling the constraints implicitly, the computational efficiency of the sampling algorithm could be improved since the resulting dynamics has a partial analytical solution (geodesic flow on the sphere). They used this approach, called Spherical HMC, for sampling from the posterior distribution of β’s in the above copula model and showed that the resulting sampler is substantially more efficient than alternative methods.
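The geometric idea can be illustrated with a short sketch of the ball-to-sphere augmentation and of the great-circle (geodesic) flow on the unit sphere. This is not the full Spherical HMC algorithm of Lan et al. (2014), which also includes gradient-based velocity updates and an accept/reject step; the function names, the starting point, and the velocity are assumptions made for illustration.

```python
import numpy as np

def ball_to_sphere(theta, sign=1.0):
    """Augment theta (inside the unit ball) with theta_{D+1} = +/- sqrt(1 - ||theta||^2)."""
    theta = np.asarray(theta, dtype=float)
    r2 = float(theta @ theta)
    if r2 > 1.0:
        raise ValueError("theta must lie in the unit ball")
    return np.append(theta, sign * np.sqrt(1.0 - r2))

def geodesic_step(x, v, t):
    """Move along the great circle through x with tangent velocity v for time t.

    Assumes ||x|| = 1 and x . v = 0; returns the new position and velocity.
    """
    speed = np.linalg.norm(v)
    if speed == 0.0:
        return x.copy(), v.copy()
    x_new = x * np.cos(speed * t) + (v / speed) * np.sin(speed * t)
    v_new = -x * speed * np.sin(speed * t) + v * np.cos(speed * t)
    return x_new, v_new

# Hypothetical starting point in the 2-dimensional unit ball.
theta = np.array([0.6, 0.0])
x = ball_to_sphere(theta)                 # point on the sphere in R^3

# Arbitrary vector projected onto the tangent space at x.
w = np.array([0.3, -1.0, 0.5])
v = w - (w @ x) * x

x_new, v_new = geodesic_step(x, v, t=2.5)
theta_new = x_new[:-1]                    # drop the auxiliary coordinate
```

However long the flow runs, `x_new` stays on the sphere, so `theta_new` never leaves the unit ball; a sign change in the last coordinate of `x_new` corresponds to the trajectory touching the boundary \(\Vert \theta \Vert _{2} = 1\) and turning back.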

Fig. 13.3 Transforming the unit ball \(B_{0}^{D}(1)\) to the sphere \(S^{D}\)

4.2 Results for Experimental Data

We now consider an experiment designed to investigate the role of the prefrontal cortex in rats in conjunction with reward-seeking behaviors and inhibition of reward-seeking in the absence of a rewarded outcome. The neural activity (spike trains) of several prefrontal neurons was recorded simultaneously. There are two conditions during the experiment: rewarded and non-rewarded. During the recording/test sessions, two different stimuli were presented: tone 1 (10 kHz) or tone 2 (5 kHz), individually and in pseudorandom order. At the same time, one of two levers was presented: an active lever paired with tone 1 (Rewarded Stimulus, RS) and an inactive lever paired with tone 2 (Non-rewarded Stimulus, NS). Pressing the active lever resulted in the offset of tone 1, retraction of the lever, and illumination of the reward receptacle. If the rat then went to the reward receptacle, 0.1 ml of 15 % sucrose solution was delivered as a reward. Pressing the inactive lever produced no effect. See Moorman and Aston-Jones (2014) for more details.

Here, we focus on five simultaneously recorded neurons. There are 51 trials per neuron under each scenario. We set the time intervals to 5 ms. Tables 13.1 and 13.2 show the estimates of \(\beta _{i,j}\), which capture the association between the \(i\)th and \(j\)th neurons, under the two scenarios. Figure 13.4 shows the schematic representation of these results under the two experimental conditions. The solid line indicates significant association.

Fig. 13.4 A schematic representation of connections between five neurons under two experimental conditions. The solid line indicates significant association

Table 13.1 Estimates of β’s along with their 95 % probability intervals for the first scenario (Rewarded) based on the copula model. Statistically significant values are shown in bold
Table 13.2 Estimates of β’s along with their 95 % probability intervals for the second scenario (Non-rewarded) based on the copula model. Statistically significant values are shown in bold

These results show that neurons recorded simultaneously in the same brain area are correlated in some conditions and not others. This strongly supports the hypothesis that population coding among neurons (here through correlated activity) is a meaningful way of signaling differences in the environment (rewarded or non-rewarded stimulus) or behavior (going to press the rewarded lever or not pressing) (Buzsáki 2010). It also shows that neurons in the same brain region are differentially involved in different tasks, an intuitive perspective but one that is neglected by much of behavioral neuroscience. Finally, these results indicate that network correlation is dynamic and that functional pairs (again, even within the same brain area) can appear and disappear depending on the environment or behavior. This suggests (but does not confirm) that correlated activity across separate populations within a single brain region can encode multiple aspects of the task. For example, the pairs that are correlated in reward and not in non-reward could be related to reward-seeking, whereas pairs that are correlated in non-reward could be related to response inhibition. Characterizing neural populations within a single brain region based on task-dependent differences in correlated firing is a less frequently studied problem than the commonly pursued goal of identifying the overall function of the brain region based on individual neural firing (Stokes et al. 2013).

5 Future Directions

The methods discussed here can be generalized in several ways as discussed below.

5.1 Multivariate GPs

The multivariate model presented in the previous section uses univariate Gaussian processes for the marginal distributions and a copula model for the joint distribution of multiple neurons in terms of these marginals. Alternatively, we can use a multivariate GP for modeling the joint distribution of multiple neurons directly. A multivariate Gaussian process can be defined in a similar way as a univariate GP, but this time the kernel function depends on two pairs of inputs. For simplicity we can assume that the mean of each process is the zero function. The kernel κ is now defined for \(i,j = 1,\mathop{\ldots },p\) and \(s,t \in \mathbb{R}\) as

$$\displaystyle{ \kappa ([i,s],[j,t]) = \mathbb{E}x_{i}(s)x_{j}(t). }$$
(13.5)

The initial challenge within the Gaussian process context is to produce a valid and interpretable kernel. A common technique for generating multivariate GP kernels is known as co-kriging, borrowed from the geostatistical literature (Cressie 1993).

One variant of co-kriging describes \((x_{1}(t),\mathop{\ldots },x_{p}(t))\) as linear combinations of latent factors. We suppose \(u_{1}(t),\mathop{\ldots },u_{q}(t)\) are independent mean zero Gaussian processes, and let

$$\displaystyle{ x_{i}(t) =\sum _{ k=1}^{q}a_{ i,k}u_{k}(t),\quad \text{ for }i = 1,2,\mathop{\ldots }p. }$$
(13.6)

Let \(\kappa _{k}(s,t) = \mathbb{E}u_{k}(s)u_{k}(t)\) be the kernel for the \(k\)th latent process. Then the observed processes \(\mathbf{x}(t) = (x_{1}(t),\mathop{\ldots },x_{p}(t))\) are jointly mean-zero Gaussian with covariances

$$\displaystyle{ \mathbb{E}x_{i}(s)x_{j}(t) =\sum _{ k=1}^{q}a_{ i,k}a_{j,k}\kappa _{k}(s,t). }$$
(13.7)

This is the semi-parametric latent factor model of Teh et al. (2005), so-called because the linear combination of latent GPs is parameterized by the matrix of coefficients \(A = (a_{i,k})\), while each Gaussian process is of course a non-parametric model. See Alvarez et al. (2011) for a survey of co-kriging and other multivariate GPs seen in the literature.
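The covariance structure in (13.7) can be assembled on a common time grid as a sum of Kronecker products, one per latent process. The sketch below is an illustration only: the SE kernels for the latent processes, the mixing matrix, and the function names are assumptions, not choices made by Teh et al. (2005).

```python
import numpy as np

def se_kernel(t, rho):
    """SE kernel matrix on the grid t with inverse length-scale rho."""
    d = t[:, None] - t[None, :]
    return np.exp(-(rho * d) ** 2)

def lfm_covariance(A, kernels):
    """Joint covariance of p processes on a shared grid of n time points.

    A is the p x q mixing matrix; kernels is a list of q (n x n) kernel matrices.
    Block (i, j) of the result equals sum_k A[i, k] * A[j, k] * kernels[k], as in (13.7).
    """
    p, q = A.shape
    n = kernels[0].shape[0]
    C = np.zeros((p * n, p * n))
    for k in range(q):
        C += np.kron(np.outer(A[:, k], A[:, k]), kernels[k])
    return C

# Hypothetical setup: p = 3 observed processes driven by q = 2 latent GPs.
t = np.linspace(0.0, 1.0, 20)
kernels = [se_kernel(t, 2.0), se_kernel(t, 8.0)]
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
C = lfm_covariance(A, kernels)
```

In this example the first and third processes load on different latent factors and are therefore uncorrelated, while the second process is correlated with both; the zero and non-zero blocks of `C` reflect exactly this pattern.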

Recently, Vandenberg-Rodes and Shahbaba (2015) proposed a multivariate Gaussian process model for multiple time series \(X(t) = (x_{1}(t),\mathop{\ldots },x_{p}(t))\) such that each marginal process \(x_{j}(t)\) is a stationary mean-zero Gaussian process with Matérn kernel. Crucially, the marginal processes are not required to share the same hyperparameter values. This approach can be used to model the joint distribution of the firing rates of multiple neurons directly, and allows for significant heterogeneity among neurons while also providing a high degree of interpretability.

5.2 Dynamic Networks

The static (stationary) model discussed here aggregates cross-neuronal spike-train interactions over time. This can lead to misleading results. Although there exist many dynamic methods developed for modeling brain functional and effective connectivity (Friston et al. 1997; Cribben et al. 2013; Ombao et al. 2005; Ombao and Van Bellegem 2008; Motta and Ombao 2012; Park et al. 2014; Lindquist et al. 2014), these approaches are primarily designed for continuous-valued signals such as functional magnetic resonance imaging (fMRI) and electroencephalogram (EEG) data. The GP-based method discussed here can be extended to model neuronal connections dynamically.

5.3 Community Detection

Besides allowing for time-varying firing rates and interactions among neurons, the GP-based method can also be extended to cluster neurons based on their cross-dependencies in order to detect subnetworks (communities). To this end, stochastic block models could be used to identify network partitions (Holland et al. 1983). For example, Rodriguez (2012) recently proposed a stochastic block model for network analysis where interactions among factors are observed at multiple time points. This method uses a Bayesian hierarchical stochastic block model to detect possible structural changes in a network. Alternatively, one can use a method similar to the product partition model (PPM) of Müller and Quintana (2010). In general, these methods assume a prior probability on all possible partitions. The assumed prior probability could be influenced by some covariates. This approach can be used to partition neurons into subnetworks.