Introduction

Although mathematical modeling plays a key role in describing the dynamics of complex systems, it remains a challenging problem (Banga and Balsa-Canto 2008; van Riel 2006; Stelling 2004; Kell 2004). To build a successful model that reveals the mechanism underlying a complex system, we first need to select a robust model whose output is consistent with the a priori available knowledge about the system dynamics (Kitano 2002; Rodriguez-Fernandez et al. 2006a; Rodriguez-Fernandez et al. 2013). The selected model should be able to reproduce, at least qualitatively, specific features observed in experimental data. This task is referred to as structure identification (Lillacci and Khammash 2010; Tashkova et al. 2011). The subsequent task is parameter estimation (Ashyraliyev et al. 2008, 2009): once the model structure has been identified, one needs to determine the unknown model parameters from the measurements. Since the output of a model depends on the values of its parameters, reproducing specific features of the experimental measurements requires selecting a suitable set of the unknown parameters. Therefore, parameter estimation is a very important component of the model development procedure. Broadly speaking, given a set of experimental data and a particular mathematical model, the aim of parameter estimation (also known as model calibration) is to identify the values of the unknown model parameters for which the model equations reproduce the experimental data in the best possible way (Rodriguez-Fernandez et al. 2006a). Nevertheless, finding a set of model parameters that accurately fits the recorded data is an extremely difficult task, especially for nonlinear dynamic models with many parameters and constraints. Numerical integration of differential equations and finding the best parameter values in the entire search domain, i.e., finding the global minimum, are two major challenges in parameter estimation problems (Zhan and Yeung 2011). In particular for biological systems, these challenges need to be addressed in nonlinear high-dimensional models.

In general, there are two broad classes of approaches for solving parameter estimation problems: frequentist (classical) inference and Bayesian (probabilistic) estimation (Kimura et al. 2005; Myung 2003; Gelman et al. 2004). Both approaches have been applied successfully in a wide range of scientific areas, although one may be preferable to the other in specific problems (Green and Worden 2015; Prasad and Souradeep 2012; Lillacci and Khammash 2010; Ashyraliyev et al. 2009). Bayesian inference gives the full probability distribution of the parameters rather than the single optimal values obtained in frequentist inference. However, the former approach is more complex and computationally more expensive than the latter (Lillacci and Khammash 2010). In practice, the frequentist framework is simpler and more suitable for high-dimensional models (Tashkova et al. 2011).

It is important to point out that there are various algorithms in both frequentist and Bayesian inferences, and no single algorithm is the best for all problems or even for a broad class of problems (Mendes and Kell 1998; Gelman et al. 2004; Haario et al. 2006; Girolami and Calderhead 2011; Kramer et al. 2014). Specifically, in the frequentist approach the choice of the optimization technique commonly depends on the nonlinearity of the model and its constraints, on the problem dimensionality as well as on the a priori knowledge about the system.

In the present study, we employ different algorithms within both the frequentist and Bayesian inference frameworks. As frequentist techniques, we apply the Levenberg-Marquardt (LM) algorithm as a gradient-based local search method and the algorithm by Hooke and Jeeves (HJ) as a direct local search method, in addition to Particle Swarm Optimization (PSO), Differential Evolution (DE), Genetic Algorithm (GA), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as stochastic global search methods that have previously been compared and/or shown to be efficient for fitting electrophysiological neuronal recordings (Buhry et al. 2012). We also use Metropolis-Hastings (MH) and Simulated Annealing (SA) as the most established Markov chain Monte Carlo (MCMC) algorithms, which are widely used in the Bayesian framework. Furthermore, we evaluate the performance of the aforementioned algorithms to determine which method is most suitable for each of the parameter estimation problems considered in this study.

It is well known that the dynamics of a majority of biological systems can be described by a set of coupled Ordinary Differential Equations (ODEs) or Delay Differential Equations (DDEs) (Mendes and Kell 1998). Moreover, biological systems are often subject to external random fluctuations (noise) from signal stimuli and environmental perturbations (Daunizeau et al. 2009; Breakspear 2017). Despite the importance of stochastic differential equations (SDEs) in brain stimulation (Deco et al. 2009; Herrmann et al. 2016) and in describing biological systems (Wilkinson 2011; Hutt et al. 2016), their parameter inference by a rigorous analytical approach has received relatively little attention, and substantial challenges remain in this context. This motivated us to focus on the parameter estimation of systems whose dynamics are governed by SDEs.

More precisely, a parameter estimation problem is shown for a neurophysiological model describing electroencephalographic (EEG) data recorded under anesthesia. We show that the proposed neural mass model fits very well to the observed EEG spectral power peaks in the δ (0-4 Hz) and α (8-13 Hz) frequency ranges. For illustration, two in silico parameter estimation problems are first presented using synthetic data. These case studies consider very basic linear stochastic models and illustrate in detail the analysis applied.

After the parameter estimation task, another important challenge is the identifiability of the estimates (Ashyraliyev et al. 2009; Rodriguez-Fernandez et al. 2006b). Identifiability analysis allows one to estimate whether the model parameters can be uniquely determined by the given experimental data (Rodriguez-Fernandez et al. 2013). For each considered case study, we employ different methods to address this issue. The confidence regions of the estimates are plotted and the correlation and sensitivity matrices are analyzed to assess the accuracy of the estimates.

Several previous methods need to integrate differential equations to estimate model parameters, which is a major time-consuming step in the parameter estimation of nonlinear dynamic systems (Tsai and Wang 2005). In this work, we present a general methodological framework for estimating the parameters of systems described by a set of stochastic ODEs or DDEs. In our proposed scheme, which is applicable in both frequentist and Bayesian inference frameworks, we compute the power spectrum of the model solutions analytically with the aid of the Green’s function and fit it to the spectral power of the measured data. This combination of techniques provides high estimation accuracy in addition to a great advantage in terms of optimization speed, because it allows us to avoid the numerical integration of the model equations.

The following section presents the acquisition procedure of the experimental EEG under anesthesia. Then, we briefly review the parameter estimation algorithms and present the mathematical formulation of the identifiability analysis in detail. Next, we provide the analytical derivation of the system spectral power for the two synthetic case studies and the thalamo-cortical model considered in this work. The subsequent results section reports the performance of the employed optimization algorithms for the synthetic and neurophysiological models. We show that the parameters of the thalamo-cortical model differ in their sensitivity. Moreover, employing evolutionary algorithms (EAs) yields very good model fits to the EEG spectral features within the δ and α frequency ranges measured during general anesthesia. A final patient group study reveals which model parameters vary statistically significantly between experimental conditions and which are robust across conditions.

Materials and Methods

EEG Acquisition during General Anesthesia

The details of the patient management and EEG acquisition are described in Sleigh et al. (2010). In brief, frontal (FP2-FT7 montage) EEG was obtained from adult patients under general anesthesia that was maintained using either propofol and fentanyl, or desflurane and fentanyl. The hypnotic drugs were titrated to obtain a bispectral index value of 40-50 as per clinical guidelines. The EEG data were collected 2 minutes before, and 2 minutes after, the initial skin incision. The signal was digitized at 128 samples per second with 14-bit precision. To remove line artefact, it was band-pass filtered between 1 Hz and 41 Hz.

Objective Function

The most widely used criteria to evaluate the goodness of a model fit are maximum likelihood estimation (MLE) and least-squares estimation (LSE) (Bates and Watts 1988; Villaverde and Banga 2013). MLE implies Bayesian inference and was originally introduced by R.A. Fisher in 1912 (Aldrich 1997). It searches the parameter space for the parameter probability distributions under which the observed data are most likely (Kay 1993). In other words, MLE assesses the quality of the estimated parameters by maximizing the likelihood function (or, equivalently, the log-likelihood function, which is easier to work with mathematically). The likelihood function is the probability of obtaining the set of observed data for a given set of parameter values. The set of parameters that maximizes the likelihood function is called the maximum likelihood estimator. On the other hand, with the LSE method (frequentist inference), we search for the parameter values that minimize the sum of squared errors (SSE) between the measured and the simulated data (Ljung 1999; Myung 2003). As is widely known, if the measurement errors are assumed to be independent (uncorrelated) and normally distributed, MLE is equivalent to LSE (Bates and Watts 1980; Ljung 1999):

$$ \underset{{\boldsymbol{p}}} {\text{argmax}}~ \left\{ \mathcal P(\boldsymbol p)\right\}=\underset{\boldsymbol{p}} {\text{argmin}}~ \left\{ \mathrm{\mathcal E}(\boldsymbol{ p})\right\}, $$
(1)

where

$$\begin{array}{@{}rcl@{}} \mathcal P(\boldsymbol{p})&=&\ln \left( \prod\limits_{i = 1}^{N_{y}} \left( \frac{1}{2\pi {\sigma_{i}^{2}}}\right)^{\frac{1}{2}} \right)\\&&-\frac{1}{2} \left( \sum\limits_{i = 1}^{N_{y}} \left[ \frac{{\left( {\hat Y}_{i}-{Y}_{i}(t,\boldsymbol{p})\right)^{2}}}{{\sigma_{i}^{2}}}\right] \right), \end{array} $$
(2)
$$\begin{array}{@{}rcl@{}} \mathrm{\mathcal E}(\boldsymbol{p})&=&\sum\limits_{i = 1}^{N_{y}} \left[ \frac{{\left( {\hat Y}_{i}-{Y}_{i}(t,\boldsymbol{p})\right)^{2}}}{{\sigma_{i}^{2}}}\right] , \end{array} $$
(3)

where \(\mathcal E (\boldsymbol {p})\) is the weighted least-squares fitness function, \(\hat Y_{i}\) denotes the measured data at the i-th data point, \(Y_{i}(t,\boldsymbol{p})\) represents the corresponding model prediction at time point \(t_{i}\), \(\boldsymbol{p}\) is the parameter vector being estimated, \({\sigma ^{2}_{i}}\) are the variances of the measurement errors (the experimental fluctuations), and \(N_{y}\) is the number of sampling points of the observed data. In addition, if we assume that all variances \({\sigma ^{2}_{i}}\) are equal, Eq. 3 simplifies, up to a constant factor, to the well-known chi-squared error criterion (Walter and Pronzato 1997)

$$ {\chi^{2}}= \sum\limits_{i = 1}^{N_{y}} \left( {\hat Y}_{i}-{Y}_{i}(t,\boldsymbol{p})\right)^{2}. $$
(4)

When minimizing the standard chi-squared error criterion fails to reveal the power peaks in certain frequency bands, we employ a modified chi-squared error criterion, referred to as the biased chi-squared function, given by

$$\begin{array}{@{}rcl@{}} {\chi^{2}}\!\!&=&\!\! c_{1} \sum\limits_{i = 1}^{N_{1}} \left( {\hat Y}_{i}-{Y}_{i}(t,\boldsymbol{p})\right)^{2}\!+ c_{2} \sum\limits_{i=N_{1}}^{N_{2}} \left( {\hat Y}_{i}\,-\,{Y}_{i}(t,\boldsymbol{p})\right)^{2}\\ &&\!\!+c_{3}\sum\limits_{i=N_{2}}^{N_{3}} \left( {\hat Y}_{i}\,-\,{Y}_{i}(t,\boldsymbol{p})\right)^{2}+ c_{4}\sum\limits_{i=N_{3}}^{{N_{y}}} \left( {\hat Y}_{i}\,-\,{Y}_{i}(t,\boldsymbol{p})\right)^{2},\\ \end{array} $$
(5)

where \(c_{1}\), \(c_{2}\), \(c_{3}\), and \(c_{4}\) are manually chosen constants depending on the observed spectral peaks in the estimation problem. Let us consider a power spectrum that exhibits two peaks in the δ (0-4 Hz) and α (8-13 Hz) frequency ranges. We can choose \(N_{1}\), \(N_{2}\), and \(N_{3}\) in such a way that the δ and α peaks fall within the index ranges \([1,N_{1}]\) and \([N_{2},N_{3}]\), respectively. Then, large values of \(c_{1}\) and \(c_{3}\) force the model output to fit the observed spectral peaks within these frequency ranges. Clearly, \(c_{1} = c_{2} = c_{3} = c_{4} = 1\) yields the standard chi-squared error criterion given by Eq. 4. To fit the model’s power spectrum to the empirical data, we take the logarithm of the spectral power, i.e., \(Y_{i}(t,\boldsymbol{p}) = \log (\text{PSD}_{model}(f_{i},\boldsymbol{p}))\), where \(f_{i}\) is the i-th frequency value and \(\boldsymbol{p}\) contains all the unknown model parameters being estimated. Here, \(\text{PSD}_{model}\) is the analytical power spectrum derived in Section “Case Studies”.
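The biased criterion of Eq. 5 can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the function name `biased_chi2` and its argument names are hypothetical, and the bands are taken as half-open index slices so that every data point is counted exactly once, which is an assumption (Eq. 5 as printed shares the boundary indices between neighbouring sums).

```python
def biased_chi2(y_obs, y_model, n1, n2, n3, c=(1.0, 1.0, 1.0, 1.0)):
    """Band-weighted sum of squared residuals in the spirit of Eq. 5.

    y_obs, y_model : sequences of log-power values on the same frequency grid
    n1, n2, n3     : band edges (0-based indices); assumed half-open bands
                     [0, n1), [n1, n2), [n2, n3), [n3, len(y_obs))
    c              : the four band weights c1..c4
    """
    if len(y_obs) != len(y_model):
        raise ValueError("y_obs and y_model must have the same length")
    edges = [0, n1, n2, n3, len(y_obs)]
    total = 0.0
    for weight, lo, hi in zip(c, edges[:-1], edges[1:]):
        # Unweighted squared error inside the current band.
        band = sum((yo - ym) ** 2 for yo, ym in zip(y_obs[lo:hi], y_model[lo:hi]))
        total += weight * band
    return total
```

With all four weights equal to one the function reduces to the standard criterion of Eq. 4; raising the first and third weights penalizes misfit in the two peak bands more strongly.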

Parameter Estimation Algorithms

Optimization methods can be broadly divided into two major groups known as local optimization methods and global optimization methods. Local optimization methods can be further subdivided into two categories. First, gradient-based methods, such as the Levenberg-Marquardt and Gauss-Newton algorithms, involve the use of derivative information. Second, pattern search methods, such as the Nelder-Mead simplex and Hooke-Jeeves algorithms, involve the use of function evaluations only and do not need derivative information. Local optimization methods start with an initial guess for the parameter values and, in order to obtain satisfactory results, one has to manually tune the initial parameters. Although local search algorithms converge very rapidly to a solution, they can easily get trapped at a local minimum if the algorithm is not initialized close to the global minimum (Moles et al. 2003; Mendes and Kell 1998; Rodriguez-Fernandez et al. 2006a; Hamm et al. 2007). To overcome such drawbacks, stochastic global optimization methods have been widely used for solving nonlinear optimization problems (Rodriguez-Fernandez et al. 2006b; Svensson et al. 2012; Tashkova et al. 2011). These methods need neither an initial guess for the parameters nor the gradient of the objective function. Although stochastic global search methods cannot guarantee convergence to a global optimum, they are particularly well adapted to black-box optimization problems (Pardalos et al. 2000; Papamichail and Adjiman 2004; Lera and Dergeyev 2010). These methods are also usually more efficient in locating a global minimum than deterministic methods, which are based on the computation of gradient information (Georgieva and Jordanov 2009; Cuevas et al. 2014).

There are several types of stochastic global optimization methods, which are mostly based on biological or physical phenomena (Corne et al. 1999; Fogel 2000). Evolutionary algorithms (EAs) are stochastic search methods that incorporate a random search principle existing in natural systems, including biological evolution (e.g. GA inspired by mating and mutation), artificial evolution (if one does not deal with binary data), and the social swarming behavior of living organisms. As an example of the latter, Particle Swarm Optimization is inspired by bird flocking and fish schooling.

In this study, we use the most popular optimization algorithms, namely the Levenberg-Marquardt (LM) algorithm and the Hooke and Jeeves (HJ) algorithm from the local search category, and Particle Swarm Optimization (PSO), Differential Evolution (DE), Genetic Algorithm (GA), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) from the stochastic global search methods. Furthermore, we use Metropolis-Hastings (MH) and Simulated Annealing (SA) as popular sampling algorithms belonging to the Markov chain Monte Carlo (MCMC) methods. In addition, to confirm our results obtained by MH, we have used PyMC, a probabilistic programming package for performing Bayesian inference in Python (Patil et al. 2010). The details of these algorithms are explained in Appendix A in Supplementary Material.

Identifiability Analysis

Once the model parameters have been estimated, it is necessary to determine the identifiability of the estimates, i.e., whether the model parameters can be uniquely determined by the given experimental data (Raue et al. 2011, 2009; Quaiser and Monnigmann 2009). This task is referred to as practical identifiability of the estimates. Several approaches have been suggested to assess the reliability and accuracy of the estimated parameters. In what follows, we describe the most widely used metrics for assessing the accuracy of estimates.

Confidence Regions

A widely used method in statistical inference to assess the precision of estimated parameters is constructing confidence regions (Draper and Smith 1998; Rawlings et al. 1998). A confidence region with a confidence level of 100(1 − α)% is a region around the estimated parameter that contains the true parameter with a probability of (1 − α). Since the sum of squares function is quadratic in linear models, the confidence regions for linear problems with Gaussian noise can be obtained exactly as the ellipsoid (Kay 1993)

$$ (\boldsymbol{p}^{\ast}-\boldsymbol{p})^{\top} C_{lin}^{-1} (\boldsymbol{p}^{\ast}-\boldsymbol{p}) \leq N_{p} \mathcal{F}^{1-\alpha}_{{N_{p}, {N_{y}-N_{p}}}}~. $$
(6)

It is centered at the estimated parameter \(\boldsymbol{p}^{\ast}\) with principal axes directed along the eigenvectors of \(C_{lin}^{-1}\), where \(C_{lin}\) denotes the covariance matrix of the linear model, \(\mathcal {F}\) is the Fisher distribution with \(N_{p}\) and \(N_{y}-N_{p}\) degrees of freedom, and \(N_{p}\) and \(N_{y}\) are the number of model parameters and the total number of data points, respectively.

In contrast, for nonlinear models there is no exact solution to obtain the confidence regions (Marsili-Libelli et al. 2003). In these cases, we have to approximate the covariance matrix to extend Eq. 6 to nonlinear models, leading to (Seber and Wild 1997; Ljung 1999)

$$ (\boldsymbol{p}^{\ast}-\boldsymbol{p})^{\top} C_{approx}^{-1} (\boldsymbol{p}^{\ast}-\boldsymbol{p}) \leq N_{p} \mathcal{F}^{1-\alpha}_{{N_{p}, {N_{y}-N_{p}}}}~. $$
(7)

Here \(C_{approx}\) is an approximation of the covariance matrix, and it can be computed from either the Fisher information matrix (represented by \(C_{J}\)) or the Hessian matrix (represented by \(C_{H}\)).

Applying the Fisher matrix \(C_{J}={FIM }^{-1}\), the approximate covariance matrix is given by (Rodriguez-Fernandez et al. 2006a)

$$ C_{J}= s^{2} \left( J(\boldsymbol{p})^{\top} W J(\boldsymbol{p}) \right)^{-1}, $$
(8)

where \(s^{2}=\mathcal E(\boldsymbol{p}^{\ast})/(N_{y}-N_{p})\) is an unbiased estimate of the measurement variance,

$$\begin{array}{@{}rcl@{}} J(\boldsymbol p)= \frac{\partial {Y}(t,\boldsymbol{p})}{\partial \boldsymbol p} \mid_{\boldsymbol{p}^{\ast}} \end{array} $$

is an \(N_{y} \times N_{p}\) matrix, namely the Jacobian evaluated at \(\boldsymbol{p}^{\ast}\), and W is a diagonal weighting matrix with elements \(w^{2}_{ii}= 1/\sigma ^{2}_{ii}\) on the principal diagonal. Consequently, by substituting Eq. 8 into Eq. 7, the confidence region obtained with the Fisher matrix reads

$$\begin{array}{@{}rcl@{}} (\boldsymbol{p}^{\ast}\,-\,\boldsymbol{p})^{\top} \left( J(\boldsymbol{p})^{\top} W J(\boldsymbol{p}) \right)(\boldsymbol{p}^{\ast}\,-\,\boldsymbol{p}) \!&\leq&\! N_{p} \frac{\mathcal E(\boldsymbol{p}^{\ast})}{N_{y}\,-\,N_{p}}\\ &&\!\times\mathcal{F}^{1-\alpha}_{{N_{p}, {N_{y}-N_{p}}}}. \end{array} $$
(9)
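As a hedged illustration of Eq. 8, the following pure-Python sketch builds \(C_{J}\) from a Jacobian given as a list of rows. The helper names `invert` and `fisher_covariance` are hypothetical, the weights are assumed to be the diagonal entries of W, and the small Gauss-Jordan routine is only meant for the low-dimensional matrices that arise here; it is not the authors' Matlab code.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fisher_covariance(jac, weights, sse):
    """C_J = s^2 (J^T W J)^{-1} with s^2 = sse / (N_y - N_p), cf. Eq. 8.

    jac     : N_y rows of N_p partial derivatives dY_i/dp_j at the estimate
    weights : N_y diagonal entries of W (assumed layout)
    sse     : value of the objective function at the estimate
    """
    n_y, n_p = len(jac), len(jac[0])
    jtwj = [[sum(weights[i] * jac[i][a] * jac[i][b] for i in range(n_y))
             for b in range(n_p)] for a in range(n_p)]
    s2 = sse / (n_y - n_p)
    return [[s2 * v for v in row] for row in invert(jtwj)]
```

For a straight-line model sampled at t = -1, 0, 1, the Jacobian rows are (1, t), so the off-diagonal entries of \(J^{\top} W J\) vanish and the two parameter estimates are uncorrelated.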

In another approach, the approximate covariance matrix can be derived from the curvature of the objective function through the Hessian matrix (Marsili-Libelli et al. 2003):

$$ C_{H}= 2 s^{2} H(\boldsymbol{p})^{-1}, $$
(10)

where

$$\begin{array}{@{}rcl@{}} H(\boldsymbol{p}) = \frac{\partial^{2} \mathcal E(\boldsymbol{p})}{\partial \boldsymbol p \partial \boldsymbol p^{\top}} \mid_{\boldsymbol{p}^{\ast}}~. \end{array} $$

Therefore, the confidence region based on Hessian matrix reads

$$ (\boldsymbol{p}^{\ast}-\boldsymbol{p})^{\top} H (\boldsymbol{p}) (\boldsymbol{p}^{\ast}-\boldsymbol{p}) \leq 2 N_{p} \frac{\mathcal E(\boldsymbol{p}^{\ast})}{N_{y}-N_{p}} \mathcal{F}^{1-\alpha}_{{N_{p}, {N_{y}-N_{p}}}}. $$
(11)

It is important to note that if both approaches yield the same confidence ellipsoids, the estimation converges to the true parameters. Otherwise, any discrepancy between them indicates an inaccurate estimation (Marsili-Libelli et al. 2003; Rodriguez-Fernandez et al. 2006b).
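One way to see why the two approximations are expected to agree near a good fit is to differentiate Eq. 3 twice. The sketch below assumes that the diagonal entries of W equal \(1/\sigma_{i}^{2}\) and introduces the shorthand \(r_{i}(\boldsymbol{p})={\hat Y}_{i}-Y_{i}(t,\boldsymbol{p})\) for the residuals, which is not part of the original notation:

$$ H(\boldsymbol{p}) = 2\, J(\boldsymbol{p})^{\top} W J(\boldsymbol{p}) - 2 \sum\limits_{i = 1}^{N_{y}} \frac{r_{i}(\boldsymbol{p})}{\sigma_{i}^{2}}\, \frac{\partial^{2} Y_{i}(t,\boldsymbol{p})}{\partial \boldsymbol p\, \partial \boldsymbol p^{\top}}. $$

If the second term vanishes (a model that is linear in the parameters) or is negligible (small residuals at \(\boldsymbol{p}^{\ast}\)), then \(H \approx 2 J^{\top} W J\) and Eq. 10 gives \(C_{H} = 2 s^{2} H^{-1} \approx s^{2} (J^{\top} W J)^{-1} = C_{J}\). Under these assumptions, a visible difference between the two ellipsoids suggests appreciable nonlinearity or large residuals at the estimate.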

Another way of constructing the confidence regions in non-linear models is known as the likelihood method. In this approach, an approximate confidence region is defined as all the parameter sets that satisfy (Donaldson and Schnabel 1985)

$$ \mathcal E(\boldsymbol p) \leq \mathcal E(\boldsymbol{p}^{\ast}) \left( 1+ \frac{N_{p}}{N_{y}-N_{p}} \mathcal{F}^{1-\alpha}_{{N_{p}, {N_{y}-N_{p}}}} \right). $$
(12)

In general, the confidence regions constructed by this approach do not have to be elliptical. Furthermore, since Eq. 12 does not depend on a linearization, the confidence regions obtained through the likelihood method are more precise than those computed through the approximate covariance matrix (Schmeink et al. 2011). Generating likelihood-based confidence regions requires a large number of function evaluations, which can be computationally expensive. Nevertheless, since metaheuristic optimization algorithms like PSO minimize an objective function through function evaluations alone, they are a suitable way to obtain the likelihood confidence regions (Schwaab et al. 2008). In this work, we employ the PSO algorithm to compute the likelihood confidence regions, which will be compared with those obtained through the covariance approximation.
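The membership test of Eq. 12 is simple enough to sketch directly. In the hypothetical helpers below, the quantile of the Fisher distribution is passed in by the caller (for instance from statistical tables), and the candidate parameter sets are assumed to be available as pairs of a parameter vector and its objective value, such as the points visited by a population-based optimizer.

```python
def likelihood_threshold(sse_best, n_p, n_y, f_quantile):
    """Right-hand side of Eq. 12 for a given (1 - alpha) quantile of the F distribution."""
    return sse_best * (1.0 + n_p / (n_y - n_p) * f_quantile)


def likelihood_region(evaluated_points, sse_best, n_p, n_y, f_quantile):
    """Keep the parameter vectors whose objective value satisfies Eq. 12.

    evaluated_points : iterable of (parameter_vector, objective_value) pairs
    """
    limit = likelihood_threshold(sse_best, n_p, n_y, f_quantile)
    return [p for p, sse in evaluated_points if sse <= limit]
```

Because the test only reuses objective values that were already computed during the search, it adds almost no cost on top of the optimization run itself.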

Correlation Analysis

The correlation matrix quantifies the possible interrelationship among the model parameters, which can be obtained from the covariance matrix. The correlation coefficient between the i-th and j-th parameter is defined by

$$ \begin{array}{ccl} R_{ij}& = & \frac{C_{ij}}{\sqrt {C_{ii} C_{jj}}} \end{array} $$
(13)

where \(C_{ij}\) is the covariance between the i-th and j-th parameter estimates (Rodriguez-Fernandez et al. 2006a). By virtue of the conceptual definition of the correlation coefficient, strong correlation among parameters leads to non-identifiability problems (Li and Vu 2013; Rodriguez-Fernandez et al. 2006b). Thus, highly correlated parameters cannot be uniquely estimated, because the change in the output caused by a small change in one of the correlated parameters can be compensated by an appropriate change in the other parameter.
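Eq. 13 translates directly into code. The following sketch assumes the covariance matrix is given as a list of rows with strictly positive diagonal entries; the function name `correlation_matrix` is illustrative.

```python
import math


def correlation_matrix(cov):
    """R_ij = C_ij / sqrt(C_ii * C_jj), cf. Eq. 13."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]
```

Off-diagonal entries close to +1 or -1 flag parameter pairs whose effects on the output can compensate each other.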

Sensitivity Analysis

Sensitivity analysis is an appropriate way to identify which model parameters contribute most to variations in model output due to the changes in model input (Rateitschak et al. 2012). A local sensitivity coefficient measures the influence of small changes in one model parameter on the model output, while the other parameters are held constant (Ingalls 2008; Zi 2011). The local sensitivity coefficients can be defined by (Brun et al. 2001)

$$ {\Gamma}(p_{j})= \mathcal{D}(J(\boldsymbol{p})^{\top} W J(\boldsymbol{p})), $$
(14)

where \(\mathcal {D}\) denotes the main diagonal elements of a matrix. In addition, the local sensitivity matrix can be determined by computing the curvature of the objective function through the Hessian matrix (Bates and Watts 1980)

$$ {\Lambda}(p_{j})=\mathcal{D} (H(\boldsymbol{p})). $$
(15)

The sensitivity analysis can shed light on the identifiability of model parameters. Making a small change in a very sensitive model parameter causes a strong response in the model output, which indicates that the parameter is more identifiable. In contrast, a model parameter with low sensitivity is more difficult to identify, because a modification of an insensitive parameter has little influence on the model output (Rodriguez-Fernandez et al. 2013).
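The diagonal of \(J^{\top} W J\) in Eq. 14 can be approximated without an analytical Jacobian. The sketch below uses central finite differences, which is an assumption made here for illustration and not necessarily how the derivatives were obtained in this work; `model(x, params)` is a hypothetical callable returning the model output at one sample point.

```python
def local_sensitivities(model, xs, params, weights=None, rel_step=1e-6):
    """Approximate Gamma(p_j) = sum_i w_i (dY_i/dp_j)^2, the diagonal of J^T W J.

    model    : callable model(x, params) -> float
    xs       : sample points (e.g. frequencies)
    params   : parameter values at which the sensitivities are evaluated
    weights  : diagonal entries of W (defaults to ones)
    rel_step : relative step of the central finite difference (assumed value)
    """
    if weights is None:
        weights = [1.0] * len(xs)
    gammas = []
    for j, pj in enumerate(params):
        h = rel_step * max(abs(pj), 1.0)
        up = list(params)
        down = list(params)
        up[j] = pj + h
        down[j] = pj - h
        gamma = 0.0
        for x, w in zip(xs, weights):
            derivative = (model(x, up) - model(x, down)) / (2.0 * h)
            gamma += w * derivative ** 2
        gammas.append(gamma)
    return gammas
```

Parameters with a comparatively small value of Γ are candidates for poor identifiability in the sense described above.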

Case Studies

First, to illustrate the performance and capability of the parameter estimation method used in this work, we estimate the model parameters of two case studies: Case Study I, a stochastic damped harmonic oscillator, and Case Study II, a stochastic delayed oscillator. For each case we generated in silico data, i.e., the measured data were generated artificially by adding noise to the model output obtained by simulating the model equations with a set of pre-chosen parameters referred to as the true values. Finally, in Case Study III, the parameters of a thalamo-cortical model are inferred by fitting the model power spectrum to the EEG spectral power recorded under various experimental conditions. All computations in the present work were implemented in Matlab (The Mathworks Inc., MA) on a Mac OS X machine with a 2.5 GHz Intel Core i5 processor and 12 GB of 1333 MHz DDR3 memory.

Case Study I: a Stochastic Damped Harmonic Oscillator

Consider a damped harmonic oscillator driven by a random force given by (Øksendal 2007)

$$\begin{array}{@{}rcl@{}} \frac{d^{2}x}{dt^{2}}+\gamma \frac{dx}{dt}+ {\omega_{0}^{2}} x=\xi(t), \end{array} $$
(16)

where ω0 is the intrinsic angular frequency of the oscillator, and γ denotes the damping coefficient. The additive Gaussian white noise ξ(t) obeys

$$ \langle \xi(t) \rangle= 0, ~~ \langle \xi(t) \xi(t^{\prime}) \rangle= 2 \kappa \delta(t-t^{\prime}), $$
(17)

where κ is the intensity of the uncorrelated driving noise, and 〈.〉 denotes the ensemble average (Risken 1984; 1996). Using the Wiener-Khinchin theorem, the power spectrum of the stochastic differential equation (16) reads (Wang and Uhlenbeck 1945; Masoliver and Porrá 1993)

$$\begin{array}{@{}rcl@{}} P(\omega)=\frac{2 \kappa}{\sqrt{2\pi}} \frac{1}{(\omega^{2}-{\omega_{0}^{2}})^{2}+\gamma^{2} \omega^{2}}, \end{array} $$
(18)

where ω = 2πf denotes the angular frequency. It can be shown that, for \({\omega _{0}^{2}} > \gamma ^{2}/2\), the only maximum of P(ω) at positive frequencies is located at \(\omega _{max}=\sqrt {{\omega _{0}^{2}}-\gamma ^{2}/2}\). Here \(f_{0} = \omega_{0}/2\pi\) is the resonant frequency of the system. In this case study, the vector of unknown parameters being estimated is \(\boldsymbol{p}_{I} = (\kappa,\gamma,f_{0})\) with the constraints \(\kappa,\gamma,f_{0} > 0\).
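Because Eq. 18 is available in closed form, evaluating the model spectrum requires no numerical integration. The sketch below evaluates it on a frequency axis in Hz; the function names are illustrative, and the parameter order (κ, γ, f0) follows the parameter vector of this case study.

```python
import math


def psd_oscillator(f, kappa, gamma, f0):
    """Power spectrum of Eq. 18 at frequency f (Hz), with omega = 2*pi*f."""
    omega = 2.0 * math.pi * f
    omega0 = 2.0 * math.pi * f0
    denom = (omega ** 2 - omega0 ** 2) ** 2 + gamma ** 2 * omega ** 2
    return 2.0 * kappa / math.sqrt(2.0 * math.pi) / denom


def peak_frequency(gamma, f0):
    """Frequency (Hz) of the spectral maximum; 0.0 if no peak exists at f > 0."""
    arg = (2.0 * math.pi * f0) ** 2 - gamma ** 2 / 2.0
    return math.sqrt(arg) / (2.0 * math.pi) if arg > 0.0 else 0.0


def log_psd_oscillator(freqs, params):
    """Log-spectrum on a frequency grid, i.e. the quantity compared with the data."""
    kappa, gamma, f0 = params
    return [math.log(psd_oscillator(f, kappa, gamma, f0)) for f in freqs]
```

For weak damping the peak sits just below f0; for example, with f0 = 10 Hz and γ = 5 s⁻¹ the shift is only a few hundredths of a hertz.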

Case Study II: a Stochastic Linear Delayed Oscillator

Consider a linear scalar delay differential equation in the presence of additive white noise given by

$$\begin{array}{@{}rcl@{}} \frac{dy(t)}{dt}=a y(t) + by(t-\tau)+ \xi(t), \end{array} $$
(19)

where the noise ξ(t) obeys the properties given by Eq. 17. The power spectrum of the corresponding solution is

$$\begin{array}{@{}rcl@{}} P(\omega)=\frac{2 \kappa}{\sqrt{2\pi}} \frac{1}{(a+b \cos (\omega \tau))^{2}+(\omega+b \sin (\omega \tau))^{2}}, \end{array} $$
(20)

where κ is the intensity of the additive white Gaussian noise. In this case study the vector of unknown parameters being estimated is \(\boldsymbol{p}_{II} = (\kappa,a,b,\tau)\), where κ > 0, τ > 0, and \(a,b \in \mathbb {R}\).
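Eq. 20 follows from Fourier transforming Eq. 19, which gives the transfer function \(1/(i\omega - a - b e^{-i\omega\tau})\); the squared modulus of its denominator is exactly the denominator of Eq. 20. The sketch below evaluates both forms so that they can be checked against each other. Function names are illustrative, and the prefactor is taken from Eq. 20 as printed.

```python
import cmath
import math


def psd_delayed(omega, kappa, a, b, tau):
    """Closed-form power spectrum of Eq. 20 at angular frequency omega."""
    denom = ((a + b * math.cos(omega * tau)) ** 2
             + (omega + b * math.sin(omega * tau)) ** 2)
    return 2.0 * kappa / math.sqrt(2.0 * math.pi) / denom


def psd_delayed_via_transfer(omega, kappa, a, b, tau):
    """Same spectrum computed from the complex transfer function of Eq. 19."""
    transfer = 1.0 / (1j * omega - a - b * cmath.exp(-1j * omega * tau))
    return 2.0 * kappa / math.sqrt(2.0 * math.pi) * abs(transfer) ** 2
```

Working with the complex transfer function is often the more convenient route for larger delay systems, where expanding the squared modulus by hand becomes error-prone.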

Case Study III: a Thalamo-Cortical Model Reproducing the EEG Rhythms

Case Study III aims to estimate the parameters of a neural mass model by fitting the power spectrum of the system to the recorded EEG data during awake and anesthesia conditions. To this end, we consider a reduced thalamo-cortical neuronal population model, which is able to reproduce the characteristic spectral changes in EEG rhythms observed experimentally during propofol-induced anesthesia (Hashemi et al. 2014; 2015). In the following, the model equations are given, then we derive the analytical expression for EEG power spectrum which will be fitted to the empirical spectra.

Consider the thalamo-cortical system shown schematically in Fig. 1. The model consists of a network of three populations of neurons: cortical pyramidal neurons (E) and thalamo-cortical relay neurons (S), which are both excitatory glutamatergic neurons, and the thalamic reticular nucleus (R), which is a thin shell of GABAergic cells surrounding the thalamus. The cortical pyramidal neurons (E) receive excitatory input from the thalamo-cortical relay neurons (S) and project back to the same nucleus. This reciprocal long-range excitatory interaction generates a positive feedback loop that is associated with a conduction delay τ. However, incessant excitation in this loop is prevented by the interposed inhibition of the thalamo-cortical relay neurons (S), which originates from the thalamic reticular nucleus (R). The thalamic reticular nucleus (R) receives excitatory input from axon collaterals of the cortical pyramidal neurons (E) and of the thalamo-cortical relay neurons (S); the former input is associated with a constant time delay τ (Robinson et al. 2001a; Victor et al. 2011).

Fig. 1 Schematic diagram of the reduced thalamo-cortical model. The excitatory connections (glutamatergic) are indicated with blue arrows, while the inhibitory connections (GABAergic) are represented by red lines with filled circle ends. The connections between cortical pyramidal neurons (E) and the thalamus consisting of thalamo-cortical relay neurons (S) and thalamic reticular nucleus (R) are associated with a constant time delay τ

Following Hashemi et al. (2014, 2015), we denote the excitatory and inhibitory postsynaptic potentials (PSPs) in the model’s neuronal populations by \({V_{a}^{c}}\), where \(a \in \left \{E,S,R \right \}\) represents the pyramidal (E), relay (S), and reticular (R) neurons, respectively, and \(c \in \left \{e,i \right \}\) indicates the excitatory and inhibitory synapses, respectively. The system dynamics are governed by the following set of coupled delay differential equations

$$\begin{array}{@{}rcl@{}} \hat {L}_{e}{V^{e}_{E}}(t)&=& K_{ES}S_{S}[{V^{e}_{S}}(t-\tau)-{V^{i}_{S}}(t-\tau)] , \\ \hat{L}_{e}{V^{e}_{S}}(t)&=&K_{SE}S_{E}[{V^{e}_{E}}(t-\tau)]+I(t) ,\\ \hat{L}_{i}{V^{i}_{S}}(t)&= &K_{SR}S_{R}[{V^{e}_{R}}(t)] ,\\ \hat{L}_{e} {V^{e}_{R}}(t)&=& K_{RE} S_{E}[{V^{e}_{E}}(t-\tau)]+K_{RS} S_{S}[{V^{e}_{S}}(t)-{V^{i}_{S}}(t)]\\ \end{array} $$
(21)

where the parameters \(K_{ab}\) are the synaptic connection strengths in population a originating from population b and τ is the transmission time delay between cortex and thalamus. The additional activity I(t) introduces an external input to the system, considered as a non-specific input to relay neurons

$$ I(t)=I_{0} +\xi(t), $$
(22)

where \(I_{0}\) is the input mean value, and the noise ξ(t) obeys the properties given by Eq. 17. Following previous studies, we assume that the EEG can be described to a good approximation by spatially constant neural population activity (Robinson et al. 2001a, 2001b, 2002). Thus, under the assumption of spatial homogeneity, the mean post-synaptic potentials in the above equations do not depend on spatial location. The functions \(S_{a}[\cdot]\) describe the mean firing rates of the neuronal populations \(a \in \left \{E,S, R \right \}\) and are generally taken to be standard sigmoid functions

$$ S_{a}(V)=\frac{S_{a}^{max}}{1+e^{-c(V-V_{a}^{th})}}, $$
(23)

where \(S_{a}^{max}\) is the maximum firing rate of population a, \( V_{a}^{th}\) indicates the mean firing threshold, and c determines the slope of the sigmoid function at the inflection point \(V_{a}^{th}\). The temporal operators \(\hat {L}_{e,i}\) are given by

$$ \begin{array}{l} \hat{L}_{e}(\partial/\partial t)=\frac{1}{\alpha_{e} \beta_{e}} \dfrac {\partial^{2}}{\partial t^{2}}+ (\frac{1}{\alpha_{e}}+\frac{1}{\beta_{e}} ) \frac{\partial}{\partial t} + 1,\\\\ \hat{L}_{i} (\partial/\partial t)=\frac{1}{\alpha_{i} \beta_{i}} \dfrac {\partial^{2}}{\partial t^{2}}+ (\frac{1}{\alpha_{i}}+\frac{1}{\beta_{i}} ) \frac{\partial}{\partial t} + 1, \end{array} $$
(24)

with \(\alpha_{e} > \beta_{e}\) and \(\alpha_{i} > \beta_{i}\), where \(\alpha_{e}\) and \(\alpha_{i}\) indicate the synaptic rise rates of the response functions for excitatory and inhibitory synapses in s\(^{-1}\), respectively, and \(\beta_{e}\) and \(\beta_{i}\) denote the corresponding decay rate constants. Moreover, the delay term τ is zero if both the sending and receiving populations are in the thalamus, whereas it is nonzero for the thalamo-cortical and cortico-thalamic pathways. For further details on the derivation of the model equations, see Hashemi et al. (2015).

Finally, since we assume that the EEG is generated by the activity of pyramidal cortical cells (Nunez and Srinivasan 2006; Rennie et al. 2002), and by virtue of the specific choice of external input to relay neurons, the power spectrum of the EEG depends on only one matrix component of the Green’s function (Hutt 2013; Hashemi et al. 2015):

$$ P_{E}(\omega)={2 \kappa}\sqrt{2\pi} \left\vert {\tilde{G}}_{1,2}(\omega)\right\vert^{2}, $$
(25)

where

$$ \tilde{G}_{1,2}(\omega)=\frac{-K_{1} \hat {L}_{i}e^{- i\omega\tau} }{\hat {L}_{e}(\hat {L}_{e} \hat {L}_{i} +G_{srs})+ e^{-2i \omega\tau} (G_{esre}-G_{ese} \hat {L}_{i}) }, $$
(26)

with \(G_{ese}=K_{1}K_{2}\), \(G_{srs}=K_{3}K_{5}\) and \(G_{esre}=K_{1}K_{3}K_{4}\), and

$$\begin{array}{@{}rcl@{}} \hat {L}_{e}\!\!&=&\!\!\left( 1+\frac{i \omega}{\alpha_{e}} \right)\left( 1+\frac{i \omega}{\beta_{e}} \right)~,~~~~~~ \quad \quad \hat {L}_{i}\,=\,\left( 1+\frac{i \omega}{\alpha_{i}} \right)\!\left( 1+\frac{i \omega}{\beta_{i}} \right)~\!\!,\\ K_{1}\!\!&=&\!\!K_{ES}\frac{d S_{S}[ V]}{d V} \mid_{V=(V^{*e}_{S}-V^{*i}_{S})}, ~~~~ \quad K_{2}\,=\,K_{SE}\frac{d S_{E}[ V]}{d V}\mid_{V=V^{*e}_{E}}~\!,\\ K_{3}\!\!&=&\!\!K_{SR} \frac{d S_{R}[V]}{d V}\mid_{V=V^{*e}_{R}}~, ~~~~~~ \quad \quad K_{4}\,=\,K_{RE} \frac{d S_{E}[ V]}{d V} \mid_{V=V^{*e}_{E}},\\ K_{5}\!\!&=&\!\!K_{RS} \frac{d S_{S}[ V]}{d V}\mid_{V=(V^{*e}_{S}-V^{*i}_{S})}~. \end{array} $$
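The linear gains \(K_{1},\dots,K_{5}\) are products of a connection strength and the slope of the sigmoid in Eq. 23 evaluated at the resting state. The following minimal sketch illustrates this computation. The function names and all numerical values are hypothetical and serve only as an illustration; the resting-state potentials themselves must be obtained separately from the stationary solution of Eq. 21.

```python
import math

def sigmoid(V, S_max, c, V_th):
    """Mean firing rate function of Eq. 23."""
    return S_max / (1.0 + math.exp(-c * (V - V_th)))

def sigmoid_slope(V, S_max, c, V_th):
    """Analytical derivative dS/dV of Eq. 23, written as c * S * (1 - S / S_max)."""
    S = sigmoid(V, S_max, c, V_th)
    return c * S * (1.0 - S / S_max)

def linear_gain(K_ab, V_star, S_max, c, V_th):
    """Gain K_ab * dS/dV evaluated at the resting-state potential V_star."""
    return K_ab * sigmoid_slope(V_star, S_max, c, V_th)

# Hypothetical values, chosen only to exercise the functions
S_max, c, V_th = 100.0, 0.5, 10.0
K2 = linear_gain(K_ab=0.2, V_star=8.0, S_max=S_max, c=c, V_th=V_th)
```

Using the closed-form derivative avoids finite-difference errors when the gains are recomputed many times during a fit.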

To a reasonable approximation, we assume an instantaneous rise of the synaptic response function followed by an exponential decay, i.e., \(\alpha_{e} \gg \beta_{e}\) and \(\alpha_{i} \gg \beta_{i}\) (Hashemi et al. 2017). This approximation reduces the second-order temporal operators \(\hat {L}_{e,i}\) given by Eq. 24 to the first-order operators \(\hat {L}_{e}= 1+i\omega /\beta _{e}\) and \(\hat {L}_{i}= 1 + i \omega / \beta _{i}\). Using this approximation, the sixth-order characteristic equation (the denominator of \(\tilde {G}_{1,2}\) given by Eq. 26) simplifies to a third-order equation, which is more analytically tractable. In our previous study (Hashemi et al. 2017), we have shown that this simplification does not affect the spectral power in the delta and alpha ranges. Moreover, it is widely accepted that the anesthetic agent propofol prolongs the temporal decay phase of inhibitory synapses while the rise rates remain unaffected (Hutt and Longtin 2009; Hutt et al. 2015; Hashemi et al. 2014, 2015).

Taken together, by fitting the power spectrum of the EEG given by Eq. 25 to the empirical spectra, we aim to estimate seven model parameters, namely, the power normalization \(D=\sqrt {2\kappa K_{1}}\), the excitatory and inhibitory synaptic decay rates \(\beta_{e}\) and \(\beta_{i}\), respectively, the axonal propagation delay τ, and the closed-loop gains \(G_{ese}\), \(G_{srs}\), and \(G_{esre}\). Thus, the vector of unknown parameters to be estimated is \(\boldsymbol{p_{III}} = (D,\tau,\beta_{e},\beta_{i},G_{ese},G_{srs},G_{esre})\), where, based on physiological limits, all parameters are restricted to be positive.
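As an illustration of how the model spectrum can be evaluated without numerical integration, the following minimal sketch implements Eqs. 25 and 26 with the first-order operators \(\hat{L}_{e}=1+i\omega/\beta_{e}\) and \(\hat{L}_{i}=1+i\omega/\beta_{i}\). For clarity, it keeps κ and \(K_{1}\) as separate arguments rather than the combined normalization D. The function name and all parameter values are hypothetical and are not the estimates reported in this work.

```python
import cmath
import math

def power_spectrum(omega, kappa, K1, beta_e, beta_i, tau, G_ese, G_srs, G_esre):
    """EEG power P_E(omega) following Eqs. 25 and 26 with first-order operators."""
    L_e = 1 + 1j * omega / beta_e          # reduced excitatory operator
    L_i = 1 + 1j * omega / beta_i          # reduced inhibitory operator
    numerator = -K1 * L_i * cmath.exp(-1j * omega * tau)
    denominator = (L_e * (L_e * L_i + G_srs)
                   + cmath.exp(-2j * omega * tau) * (G_esre - G_ese * L_i))
    G12 = numerator / denominator          # Green's function component of Eq. 26
    return 2 * kappa * math.sqrt(2 * math.pi) * abs(G12) ** 2

# Hypothetical parameter values, for illustration only
params = dict(kappa=0.1, K1=1.0, beta_e=100.0, beta_i=50.0, tau=0.04,
              G_ese=0.5, G_srs=0.8, G_esre=0.6)
freqs = [0.5 * k for k in range(1, 81)]    # 0.5 to 40 Hz
spectrum = [power_spectrum(2 * math.pi * f, **params) for f in freqs]
```

Each evaluation costs only a handful of complex operations, which is what makes repeated evaluation inside an optimizer or a sampler inexpensive.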

Furthermore, there are six inequality constraints on the system parameters, which are imposed on the chi-squared error function in the spectral fitting problem. The first constraint relates to the synaptic rise and decay rate constants. Since the response functions of excitatory synapses exhibit shorter characteristic rise and decay times (i.e., larger rate constants) than those of inhibitory synapses, we require \(\alpha_{e} > \alpha_{i}\) and \(\beta_{e} > \beta_{i}\) (Constraint I). Following the analytical approach described in Forde and Nelson (2004) to obtain stability conditions for the characteristic equation of DDEs, we have derived five analytical conditions for the stability of the considered thalamo-cortical system. According to this approach, we first investigate the conditions under which the system is stable in the absence of time delay (τ = 0). Then, by increasing the delay value (τ > 0), we seek to determine whether there exists a critical delay value at which the system becomes unstable. Since the power spectrum analysis is valid only if the resting state of the system is stable, we probe the conditions under which the introduction of a time delay cannot cause a bifurcation. The following conditions guarantee that the system is stable when τ = 0 and that increasing the delay value does not change the stability of the system (see Hashemi et al. 2017 for details):

$$\begin{array}{@{}rcl@{}} \beta_{i}(2+G_{srs})\,+\,\beta_{e}(1\,-\,G_{ese})\!\!&>&\!\!0, ~~~ (\mathrm{Constraint~ II})\\ 1\,+\,G_{esre}\,+\,G_{srs}\,-\,G_{ese}\!\!&>&\!\!0, ~~~ (\mathrm{Constraint~ III})\\ (2\beta_{e}+\beta_{i})\left( \frac{2+G_{srs}}{\beta_{e}}+\frac{1\,-\,G_{ese}}{\beta_{i}}\right)&&\\ -(1+G_{esre}+G_{srs}-G_{ese})\!\!&>&\!\!0, ~~~ (\mathrm{Constraint~ IV})\\ ({\beta_{e}^{2}} \beta_{i})^{2} \left( (1\,+\,G_{srs})^{2}\,-\, (G_{esre}\,-\,G_{ese})^{2} \right)\!\!&>&\!\!0, ~~~(\mathrm{Constraint~ V})\\ {\Delta}\,=\,18\xi_{2}\xi_{1}\xi_{0}-4{\xi_{2}^{3}}\xi_{0}+{\xi_{2}^{2}}{\xi_{1}^{2}}-4{\xi_{1}^{3}}-27{\xi_{0}^{2}}\!\!&<&\!\!0. ~~~~ (\mathrm{Constraint~ VI}) \end{array} $$
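A candidate parameter vector can be screened against these inequalities before the fitness function is evaluated. The sketch below transcribes the decay-rate part of Constraint I and Constraints II to V; Constraint VI is omitted because the coefficients \(\xi_{0},\xi_{1},\xi_{2}\) are not defined in this section. The function name and the example values are hypothetical.

```python
def check_constraints(beta_e, beta_i, G_ese, G_srs, G_esre):
    """Return a dict of booleans for Constraint I (decay rates only) and II to V."""
    s = 1 + G_esre + G_srs - G_ese           # left-hand side of Constraint III
    return {
        "I": beta_e > beta_i,
        "II": beta_i * (2 + G_srs) + beta_e * (1 - G_ese) > 0,
        "III": s > 0,
        "IV": (2 * beta_e + beta_i)
              * ((2 + G_srs) / beta_e + (1 - G_ese) / beta_i) - s > 0,
        "V": (beta_e ** 2 * beta_i) ** 2
             * ((1 + G_srs) ** 2 - (G_esre - G_ese) ** 2) > 0,
    }

# Hypothetical candidate: all listed constraints hold
ok = check_constraints(beta_e=100.0, beta_i=50.0, G_ese=0.5, G_srs=0.8, G_esre=0.6)
# Hypothetical candidate with a large G_ese: Constraint III is violated
bad = check_constraints(beta_e=100.0, beta_i=50.0, G_ese=5.0, G_srs=0.8, G_esre=0.6)
feasible = all(ok.values())
```

One possible way to use such a check, stated here only as an assumption, is to assign a large penalty to the fitness of any infeasible candidate so that the search is steered back into the stable region.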

Results

In the following, we present the results of model parameter estimation for the case studies described in the previous section. The first two case studies illustrate important features of the applied methods, laying the ground for the analysis of recorded experimental data with a thalamo-cortical model. An outline of the parameter inference in this study is illustrated in Fig. 2. In Case Studies I and II, the unknown parameters of a set of SDEs (a stochastic ordinary and a stochastic delay differential equation, respectively) are inferred from pseudo-experimental data. As can be seen from the schematic illustration, in order to estimate the unknown parameters of a set of SDEs, we transform the observations from the time domain to the frequency domain. To this end, the power spectrum of the system is computed analytically with the aid of the Green’s function to generate the true signal, i.e. the signal constructed from the nominal (true) parameters. In addition, the system spectral power is calculated numerically by applying the Welch method to obtain the measurement signal. Then, the model parameters are estimated by fitting the model power spectrum to the corresponding experimental data. In general, the generated in silico data can be expressed mathematically as Ψ = Φ + noise, where Φ and Ψ denote the noise-free observation (true signal) and the corresponding noisy data (measured signal), respectively. Finally, in the main Case Study III, the proposed parameter inference method is applied to a real experimental data set to estimate the parameters of a neural mass thalamo-cortical model (true signal) from the EEG spectral power (measured signal).

Fig. 2

Schematic illustration of the parameter inference carried out in this work. In Case Studies I and II, the true signal (analytical power spectrum, Φ) is fitted to the measured signal (numerical power spectrum, Ψ). In a similar manner, applied to real measured data, in Case Study III the power spectrum of a neural mass model (true signal) is fitted to the EEG spectral power (measured signal)

Case Study I

Case Study I deals with estimating the parameters of a stochastic damped harmonic oscillator by fitting the model’s spectrum to a set of pseudo-experimental data. The result of this estimation is shown in Fig. 3. In Fig. 3a, the estimated power spectrum obtained by PSO is compared with the respective noise-free and noisy spectra. We observe that the estimated power spectrum is in very good agreement with the power spectrum computed from the given signal. The noise-free power spectrum was generated according to Eq. 18 with the true parameters \(\boldsymbol {p_{I}}=(\kappa ,\gamma ,f_{0})=(0.1 ~\text {mV}, 5.0 ~\text {Hz}, 3.0 ~\text {Hz})\). The estimated parameters \(\boldsymbol {p_{I}^{\ast }}=(\kappa ,\gamma ,f_{0})=(0.103 ~\text {mV}, 4.562 ~\text {Hz}, 3.00 ~\text {Hz})\) are very close to the true parameters \(\boldsymbol {p_{I}}\) and yield the best-fit value \(\mathrm {\mathcal E (\boldsymbol {p_{I}^{\ast }})} = 0.6554\). It is worth pointing out that other EAs such as GA and DE yield similar estimates.

Moreover, using MCMC methods we can estimate the means and standard deviations of the inferred parameters. The histograms of the Markov chains constructed by the MH algorithm for the model parameters κ, γ, and \(f_{0}\) are shown in Fig. 3b, c and d, respectively. One can see that the Markov chains follow a Gaussian distribution, where the mean values (vertical red lines) are nearly identical to the estimates obtained by the PSO algorithm. This result shows very close agreement between the MLE and the LSE obtained by the MH and the PSO algorithm, respectively.
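To make the sampling step concrete, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler for a single parameter. It is not the implementation used in this study: the Gaussian toy target, the proposal width, the chain length, and the burn-in are all hypothetical choices. In an actual fit, `log_target` would be replaced by the log-likelihood of the spectral data given the model parameters.

```python
import math
import random

def metropolis_hastings(log_target, x0, step, n_samples, seed=0):
    """Random-walk MH with a symmetric Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    chain = [x]
    for _ in range(n_samples - 1):
        x_new = x + rng.gauss(0.0, step)
        logp_new = log_target(x_new)
        diff = logp_new - logp
        # Accept with probability min(1, exp(diff))
        if diff >= 0 or rng.random() < math.exp(diff):
            x, logp = x_new, logp_new
        chain.append(x)
    return chain

def log_target(x, mean=3.0, sd=0.5):
    """Toy log-density (unnormalized Gaussian); a stand-in for a log-likelihood."""
    return -0.5 * ((x - mean) / sd) ** 2

chain = metropolis_hastings(log_target, x0=0.0, step=0.5, n_samples=20000)
kept = chain[2000:]                          # discard burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((x - post_mean) ** 2 for x in kept) / (len(kept) - 1))
```

The mean and standard deviation of the retained samples play the role of the parameter estimate and its uncertainty, which is the information summarized by the histograms.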

Once the model parameters have been inferred, one can determine the uncertainties in the parameter estimates. In order to assess the accuracy of the estimates shown in Fig. 3, we plot the confidence regions of the calibrated parameters. Figure 4 illustrates the 95% confidence regions for different pairs of parameter estimates in Case Study I. Covariance matrix estimation yields elliptical confidence regions, whereas the likelihood confidence regions are estimated by the PSO algorithm. Since J(p)WJ(p) = 2H(p), the covariance matrices approximated by the Fisher Information Matrix (cf. Eq. 9) and the Hessian matrix (cf. Eq. 11) are equal. This yields identical elliptical confidence regions, cf. dashed red and green lines in Fig. 4a. Considering the conceptual difference between the Hessian and FIM approaches in the derivative terms, the exact coincidence of the ellipsoids obtained by these methods confirms that the accuracy of the parameter estimates is well captured (Marsili-Libelli et al. 2003). Moreover, comparing the likelihood confidence regions (calculated from Eq. 12) with the elliptical confidence regions indicates that high inference precision has been obtained by the PSO algorithm. This further demonstrates the benefit of the PSO algorithm in estimating the model parameters combined with a simultaneous computation of the confidence estimates.

Fig. 3

Parameter estimation of a stochastic damped harmonic oscillator (Case Study I) from a set of noisy in silico data. a Estimated power spectrum is plotted versus the noise-free and the noisy spectrum, encoded in dashed green, solid blue, and dashed red lines, respectively. In addition, the grey shaded area represents the 95% confidence interval. The true and estimated parameters obtained by PSO are \(\boldsymbol {p_{I}}=(\kappa ,\gamma ,f_{0})=(0.1 ~\text {mV}, 5.0 ~\text {Hz}, 3.0 ~\text {Hz})\), and \(\boldsymbol {p_{I}^{\ast }}=(\kappa ,\gamma ,f_{0})=(0.103 ~\text {mV}, 4.562 ~\text {Hz}, 3.00 ~\text {Hz})\), respectively. b, c, d Histograms of the Markov chains constructed by the MH algorithm for parameters κ, γ and \(f_{0}\), respectively. The mean values of the Markov chains (vertical red lines) are nearly identical to the estimates obtained by the PSO algorithm

To further confirm the reliability of the obtained confidence regions, we have also computed the 95% confidence regions with the PyMC package (Patil et al. 2010). As presented in Fig. 4b, the result agrees very well with the regions shown in panel a.

Fig. 4

Comparison of 95% confidence regions for different pairs of parameter estimates in Case Study I. a The ellipsoids encoded in dashed red and green lines show the confidence regions obtained by approximating the covariance matrix through the use of FIM and Hessian approaches, respectively. The regions constructed by the blue markers indicate the likelihood confidence regions produced by the PSO algorithm. b Confidence regions for model parameters obtained by MH algorithm. The regions are centered at the optimal parameters \(\boldsymbol {p_{I}^{\ast }}\) illustrated by the filled red circles

An easy way to study the practical identifiability of an estimation is to plot the correlation matrix of the model parameters. Here, the local identifiability of the obtained estimations is evaluated based on the correlation analysis. For Case Study I, Fig. 5 displays the absolute value of the correlation coefficients obtained according to Eq. 13. The figure shows low correlation values in non-diagonal elements. The lack of correlation between the estimated parameters indicates that all the parameters are identifiable. Furthermore, we have carried out the sensitivity analysis for this case study (see the relevant result presented in Appendix B in Supplementary Material) revealing that the estimated parameters in this case study are captured in an accurate manner.
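The correlation analysis can be sketched in a few lines once a covariance matrix of the estimates is available. Eq. 13 is not reproduced in this section, so the sketch assumes the standard normalization \(R_{ij}=C_{ij}/\sqrt{C_{ii}C_{jj}}\). The covariance values below are hypothetical and are not the ones underlying Fig. 5.

```python
import math

def correlation_matrix(cov):
    """Normalize a covariance matrix: R_ij = C_ij / sqrt(C_ii * C_jj)."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical covariance of (kappa, gamma, f0), for illustration only
cov = [[1e-4, 2e-4, 0.0],
       [2e-4, 0.25, 1e-3],
       [0.0,  1e-3, 0.01]]
R = correlation_matrix(cov)
abs_R = [[abs(r) for r in row] for row in R]   # absolute values, as plotted
```

Off-diagonal entries of `abs_R` close to zero indicate weakly correlated, and hence practically identifiable, parameters, whereas entries close to one flag pairs that can compensate for each other.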

Fig. 5

Correlation matrix for Case Study I. The figure shows the absolute value of the correlation coefficients indicating lack of correlation between the estimated model parameters κ, γ, and f0

Furthermore, it is interesting to take a closer look at the convergence speed of the different algorithms applied in Case Study I. Figure 6 shows the convergence functions, i.e., the fitness values versus the number of function evaluations, for the LM, HJ, PSO, DE, GA, CMA-ES, MH, and SA algorithms, averaged over 100 runs. Although the fitness functions of all algorithms finally reach the global minimum, the local search algorithms (LM and HJ) converge faster than the others, whereas the EAs (PSO, DE, GA, CMA-ES) converge faster than the MCMC algorithms MH and SA. In addition, SA finally converges to the minimum value in a damped manner (as the temperature is reduced toward zero). In contrast, the fitness function of MH keeps oscillating about the minimum value.
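For readers unfamiliar with PSO, the following minimal sketch shows a basic global-best variant and how a convergence curve can be recorded. It is not the implementation used in this study: the inertia weight, acceleration coefficients, swarm size, and the quadratic toy fitness function are hypothetical choices, and the curve is recorded per iteration rather than per function evaluation.

```python
import random

def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns best position, best value, and convergence history."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    history = [g_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history

def toy_fitness(p):
    """Hypothetical unimodal stand-in for a spectral fitness function."""
    return (p[0] - 5.0) ** 2 + 10.0 * (p[1] - 3.0) ** 2

best, value, history = pso(toy_fitness, bounds=[(0.0, 20.0), (0.0, 20.0)])
```

Plotting `history` on a log-log scale against the cumulative number of function evaluations (here `n_particles` per iteration) would give a curve of the kind compared across algorithms.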

Fig. 6

Convergence functions of several optimization algorithms used in Case Study I. The fitness values versus the function evaluations in a log-log scale for different algorithms: LM and HJ as local search algorithms, PSO, DE, GA, and CMA-ES as global search algorithms, and MH and SA as sampling algorithms

Case Study II

In Case Study II, the power spectrum of a linear SDDE is fit to a set of pseudo-experimental data. Note that this case study poses a multimodal objective function, for which finding the global minimum is more challenging than for the unimodal function of Case Study I. Figure 7 illustrates the parameter inference of the SDDE from a noisy measurement. The estimated power spectrum shows a strikingly close match to the reference spectrum in Fig. 7a. Here, the noise-free observations are generated by substituting the true parameters \(\boldsymbol {p_{II}}=(\kappa ,a,b,\tau )=(0.1 ~\text {mV}, -17.3, -21.32, 0.2)\) in Eq. 20. The fit based on PSO yields the optimal parameters \(\boldsymbol {p_{II}^{\ast }}=(\kappa ,a,b,\tau )=(0.103 ~\text {mV}, -18.4, -21.49, 0.2)\), which are in very good agreement with the original model parameters. The corresponding fitness function value is \(\mathrm {\mathcal E(\boldsymbol {p_{II}^{\ast }})} = 33.19\). Furthermore, the histograms of the Markov chains constructed by the MH algorithm for the model parameters κ, a, b and τ are shown in Fig. 7b–e, respectively. We observe that the estimates calculated by MH (vertical red lines) are very close to those obtained by the PSO algorithm.

Fig. 7

Inferring the parameter values of a stochastic linear delay differential equation (Case Study II) from a set of in silico data. a The estimated power spectrum (dashed green line), the corresponding noise-free spectrum (blue line) and the spectrum from noisy measured data (dashed red line). The grey shaded area encodes the 95% confidence interval. The true and estimated parameters are \(\boldsymbol {p_{II}}=(\kappa ,a,b,\tau )=(0.1 ~\text {mV}, -17.3, -21.32, 0.2)\), and \(\boldsymbol {p_{II}^{\ast }}=(\kappa ,a,b,\tau )=(0.103 ~\text {mV}, -18.4, -21.49, 0.2)\), respectively. b, c, d, e Histograms of the Markov chains constructed by the MH algorithm for parameters κ, a, b and τ, respectively. The mean values of the generated Markov chains (vertical red lines) are very close to the estimates obtained by the PSO algorithm

Figure 8 displays the confidence regions for all possible pairs of the estimated parameters in Case Study II. As in Case Study I, the elliptical confidence regions are computed by covariance matrix estimation according to Eqs. 9 and 11, whereas the likelihood confidence regions are provided by PSO according to Eq. 12. One can see that the ellipsoids constructed by covariance matrix estimation using the FIM and the Hessian matrix coincide, because in this case study J(p)WJ(p) = 2H(p). However, the elliptical and likelihood confidence regions differ, i.e., there is a discrepancy between the regions evaluated from the covariance matrix and those computed through the PSO method.

Fig. 8

Elliptical and likelihood confidence regions at 95% confidence level for each pair of estimated parameters in Case Study II. The ellipsoids are computed with the FIM information (in dashed red) and Hessian matrix (in green), whereas the likelihood confidence regions (in blue) are estimated by the PSO algorithm. The estimated parameters \(\boldsymbol {p_{II}^{\ast }}=(\kappa ,a,b,\tau )=(0.103 ~\text {mV}, -18.4, -21.49, 0.201)\) are represented by filled red circles

In order to identify the origin of the discrepancy between the elliptical and likelihood confidence regions observed in Fig. 8, we investigate the correlation among the model parameters. Figure 9 shows the correlation matrix of the model parameters in Case Study II. If two parameters are highly correlated, the change in model output caused by a change in one parameter can be compensated by an appropriate change in the other parameter. This prevents the parameters from being uniquely identifiable. In other words, for a pair of correlated parameters there exist many combinations that give almost the same value of the fitness function. This reflects a degeneracy of solutions resulting from the non-uniqueness of the solution of the inverse problem. According to the absolute values of the correlation coefficients plotted in Fig. 9a, the parameters a and b are practically non-identifiable since they are highly correlated, whereas the other pairs of parameters are uncorrelated. To overcome this problem, the correlated pair must be eliminated analytically by introducing a new variable. In this case study, inserting a candidate solution of the form \(y(t) = Ce^{\lambda t}\) yields the following nonlinear transcendental characteristic equation:

$$ \lambda-a-b e^{-\lambda \tau}= 0, $$

where, by inserting λ = iω, and separating the real and imaginary parts we obtain

$$\begin{array}{@{}rcl@{}} a&=&-b \cos(\omega \tau),\\ \omega&=&-b \sin (\omega \tau), \end{array} $$
(27)

or equivalently,

$$\begin{array}{@{}rcl@{}} a&=&\omega/\tan(\omega \tau),\\ b&=&-\omega/\sin(\omega \tau). \end{array} $$
(28)

where ω = 2πΩ. Introducing the parameter Ω according to the above equations leads to a model equation containing three uncorrelated parameters: κ, Ω, τ (cf. Fig. 9b). As shown in Fig. 10a, for this set of uncorrelated parameters, the elliptical confidence regions coincide very well with the likelihood-based regions. These results indicate a precise estimation with uniquely identifiable estimates. Here, to compute the confidence regions of the model parameters, we employed the same approach as used in Fig. 8. In addition, the 95% confidence regions obtained by PyMC are displayed in Fig. 10b. From Figs. 7 and 10, we observe very close agreement between the inferences obtained by PSO and MH.
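The reparametrization can be checked numerically. The minimal sketch below maps a pair (Ω, τ) to (a, b) through Eq. 28 and verifies that λ = iω then satisfies the characteristic equation λ − a − b e^{−λτ} = 0 up to rounding error. The function name is hypothetical; the values Ω = 1.99 and τ = 0.2 are the estimates quoted in the caption of Fig. 10.

```python
import cmath
import math

def a_b_from_omega(Omega, tau):
    """Map (Omega, tau) to (a, b) via Eq. 28, with omega = 2*pi*Omega."""
    omega = 2 * math.pi * Omega
    a = omega / math.tan(omega * tau)
    b = -omega / math.sin(omega * tau)
    return a, b

Omega, tau = 1.99, 0.2
a, b = a_b_from_omega(Omega, tau)

# Residual of the characteristic equation at lambda = i*omega
omega = 2 * math.pi * Omega
residual = 1j * omega - a - b * cmath.exp(-1j * omega * tau)
```

Because a and b are both fixed by the single quantity Ω (for a given τ), the optimizer no longer sees two parameters that can trade off against each other.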

Fig. 9

Correlation matrix (absolute values) for Case Study II. a The estimated parameters are κ, a, b, and τ. From this panel, we observe that parameters a and b are highly correlated, which causes an identifiability problem. b Introducing the parameter Ω according to Eq. 27 yields a model with three uncorrelated parameters: κ, Ω, τ

Fig. 10

Confidence regions for the parameters of Case Study II. a The elliptic and likelihood confidence regions for the uncorrelated parameters κ, Ω, and τ. b Confidence regions of the parameters built from MH algorithm. The regions are centered at the estimated parameters \(\boldsymbol {p_{II}^{\ast }}=(\kappa ,{\Omega },\tau )=(0.103 ~\text {mV}, 1.99, 0.2)\)

Finally, for this case study, we compare the performance of different algorithms used in this study. For the sake of fair comparison, the initial guesses in the MH and SA algorithms were created randomly within the parameter search space to have an identical strategy for starting condition with the EAs (i.e., PSO, DE, GA, CMA-ES). The parameter search space was limited in the range [0,20] for each parameter. We have also applied the local algorithms LM and HJ, but nonlocal algorithms out-perform them clearly (LM and HJ algorithms failed to arrive at the global minimum). This is why we do not discuss their corresponding results in the following.

The results for 100 runs are reported in Fig. 11 and Table 1. We found that PSO, DE, GA, CMA-ES, SA, MH methods succeeded in finding the global minimum.

Fig. 11

Comparing the performance of different algorithms through 100 independent runs in Case Study II. The red bars indicate the histogram of fitness function values (the number of counts of the best fitness value) obtained by the PSO, DE, GA, CMA-ES, MH and SA algorithms

In addition, for each algorithm, the mean and minimum values of obtained fitness function, and the average of running time are listed in Table 1. Although EAs reveal a high computational cost, they show a very good performance in finding the global solution. According to these results, PSO delivers slightly better solutions than other EAs, although the employed EAs are competitive in finding the global minimum.

Table 1 Comparing the results obtained by different search algorithms achieved from 100 independent runs in Case Study II

Case Study III

The first two case studies were designed with the measured in silico data. In the following, we identify the parameters of a thalamo-cortical model described by a set of coupled stochastic delay differential equations through the model spectral fitting to the in vivo experimental data.

Figure 12 shows the power spectrum of the model given by Eq. 25 fit to the power spectra of EEG recorded over frontal and occipital head regions during awake and anesthesia conditions. As a consequence of the very good performance of the parameter estimation based on PSO, we applied it to estimate model parameters optimally. Figure 12 shows a good prediction of the observed spectral power features in experimental data.

Fig. 12

Fitting a reduced thalamo-cortical model to the EEG power spectra in awake and anesthesia conditions. In each panel, the spectral power of recorded EEG data is shown as a dashed red line. The fit EEG power spectra using standard chi-squared function are illustrated by green lines, whereas those obtained through the biased chi-squared function are shown by blue lines. Panels a and b illustrate the EEG spectral power over the frontal head region in awake and anesthesia conditions, respectively. The occipital EEG spectral power in awake and anesthesia conditions are displayed in panels c and d, respectively

It is important to point out that, in most of the datasets, a standard fitness function defined by the discrepancy between the model’s output and the measured data does not fit the spectral power peaks in the δ- and α-frequency ranges well (cf. the inset in Fig. 12a). Since the δ- and α-peaks are important and informative signal features observed during anesthesia, we employed the biased chi-squared function given by Eq. 5 in order to fit the model to the spectral power peaks within these frequency ranges. With a biased fitness function that assigns more weight to the δ- and α-frequency bands, the model output is forced to improve the fit of the corresponding experimental spectral power peaks. For instance, in panel a, we set \(c_{1} = 20\), \(c_{2} = 1\), \(c_{3} = 10\), \(c_{4} = 1\) to capture the observed δ- and α-peaks. Setting \(c_{1} = c_{2} = c_{3} = c_{4} = 1\) recovers the standard chi-squared function.
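The effect of the band weights can be illustrated with a minimal sketch. Eq. 5 is not reproduced in this section, so the sketch makes several assumptions that should be read as hypothetical: the error is taken as a plain sum of squared residuals, the four weights are assigned to consecutive frequency bands with illustrative edges at 4, 8 and 12 Hz, and the function names are invented for this example.

```python
def band_weight(f, weights, edges=(4.0, 8.0, 12.0)):
    """Weight of frequency f; band edges (in Hz) are assumed, not taken from Eq. 5."""
    for w, edge in zip(weights, edges):
        if f < edge:
            return w
    return weights[-1]                      # everything above the last edge

def biased_chi_squared(freqs, measured, model, weights=(1, 1, 1, 1)):
    """Band-weighted sum of squared residuals between measured and model spectra."""
    return sum(band_weight(f, weights) * (m - p) ** 2
               for f, m, p in zip(freqs, measured, model))

# Toy spectra with a unit residual in each assumed band
freqs = [2.0, 6.0, 10.0, 20.0]
measured = [1.0, 1.0, 1.0, 1.0]
model = [0.0, 0.0, 0.0, 0.0]
standard = biased_chi_squared(freqs, measured, model)                  # all weights 1
biased = biased_chi_squared(freqs, measured, model, (20, 1, 10, 1))    # emphasize two bands
```

With equal residuals in every band, the weighted error is dominated by the two emphasized bands, so an optimizer reduces misfit there first.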

The sensitivity analysis of the fitness function to the estimated parameters for this case study is shown in Appendix B in Supplementary Material.

In order to demonstrate the power of the thalamo-cortical neural mass model, it is fit to the EEG spectral power of eight patients recorded during pre- and post-incision anesthesia induced by propofol and desflurane, as shown in Fig. 13. In this figure, we also observe that the model fits the measured data very well in the δ- and α-frequency bands. These results indicate that the thalamo-cortical model considered in this work is able to adequately reproduce the specific features observed in EEG spectral power data. For completeness, statistics of the estimated parameters are given in Fig. 14 for all patients. Most parameters, such as the thalamo-cortical delay time τ, are stable over experimental conditions and subjects. Conversely, the decay rates \(\beta_{e}\) and \(\beta_{i}\) differ significantly between desflurane and propofol anesthesia in the pre-incision condition (p < 0.05). Moreover, the noise strength differs significantly between desflurane and propofol anesthesia in both experimental conditions (p < 0.05). The detailed parameter statistics for each patient are reported in Appendix C in Supplementary Material.

Fig. 13

Fitting a reduced thalamo-cortical model to EEG spectral power in pre- and post-incision anesthesia induced by propofol and desflurane. The recorded EEG data for eight patients are shown by dashed lines, whereas the corresponding fitted models are illustrated by solid lines. a The EEG power spectra recorded in pre- and b in post-incision condition induced by propofol are illustrated by dashed red and green lines, respectively. In addition, the solid blue and black lines depict the corresponding thalamo-cortical model fitted to the experimental data. Panels c and d show the fitted model against the spectral power of EEG data recorded during desflurane-induced anesthesia in pre- and post-incision conditions, respectively

Fig. 14

Statistics of the estimated parameters of the thalamo-cortical model for 25 patients during general anesthesia. Each boxplot shows the Kruskal-Wallis test statistic for the estimated parameters of the thalamo-cortical neural mass model fitted to EEG spectral power in pre- and post-incision anesthesia induced by propofol and desflurane. \(D_{pre}\) and \(D_{pst}\) stand for pre- and post-incision induced by desflurane, respectively. \(P_{pre}\) and \(P_{pst}\) stand for pre- and post-incision induced by propofol, respectively

Discussion

In a great variety of scientific fields, stochastic differential equations arise naturally in the modeling of systems due to random forcing or other noisy input (Faisal et al. 2008). Numerical integration of differential equations is a major time-consuming step in the parameter estimation of nonlinear dynamics describing biological systems (Liang and Lord 2010). Furthermore, inferring the parameters of SDEs is more problematic due to the inherent noise in the system equations.

Various previous methods address the parameter inference problem. It has been shown that a decoupling strategy (slope approximation), which considers the derivative values of the system state variables, avoids numerical integration altogether by fitting models to the slope of time-series data (Almeida and Voit 2003; Voit and Almeida 2004). However, this technique is not applicable in most inverse problems. For instance, if an equation is affected by a state variable for which no data are available, then the decoupling technique cannot be applied to that equation. Moreover, this strategy cannot provide a model that is readily applicable to computational simulation when the given time-series data contain measurement errors (Kimura et al. 2005).

In another work, Tsai and Wang (2005) have proposed a modified collocation approximation technique to convert differential equations into a set of algebraic equations. This method has the obvious advantage of avoiding numerical integration of differential equations. They have shown that their method yields accurate parameter estimation for S-system models of genetic networks, which also saves much computational time. However, such an approximation cannot always be employed in general complex nonlinear inverse problems.

In the last decade, there have been several studies on fitting neural population models to experimental data. In the neuroimaging literature, Dynamic Causal Modeling (DCM) has been used successfully to infer hidden neuronal states from measurements of brain activity (Friston et al. 2003; David et al. 2006; Pinotsis et al. 2012). It has been shown previously that characterizing neural fluctuations in terms of spectral densities leads to more accurate inference than stochastic schemes (Razi et al. 2015; Jirsa et al. 2017). However, in most of the previous studies, a rigorous analytical approach to overcome the inference difficulties due to additive noise has received relatively little attention (Daunizeau et al. 2012; Ostwald et al. 2014; Ostwald and Starke 2016). In the technique presented in this study, we estimated the model parameters from the power spectrum derived analytically from the system equations. With the aid of the Green’s function method, we can easily compute the power spectrum of a linear system whose dynamics are governed by a set of coupled stochastic ordinary or delay differential equations. By fitting the analytically computed spectral power to the spectral power estimated from the corresponding measurements, we can estimate the model parameters without solving the model equations. Hence, we avoid the cost of numerical integration, which dramatically reduces the computational time. Note that investigating the structural identifiability (model selection practice) in order to identify which model best explains the experimental data is beyond the scope of the present manuscript. The reader is referred to further literature for a more detailed review of model comparison (Daunizeau et al. 2009; Raue et al. 2009; Penny 2012).

In general, inverse problems can be solved by optimization algorithms and MCMC methods (Myung 2003; Tashkova et al. 2011; Gelman et al. 2004). Optimization methods provide a simple and straightforward way to minimize the error between the model prediction and the measured data (Mendes and Kell 1998; Moles et al. 2003; Kimura et al. 2015). On the other hand, many sampling algorithms and probabilistic programming languages have been created to perform Bayesian inference, especially for high-dimensional and complex posterior distributions, e.g., Carpenter et al. (2017) and Patil et al. (2010). This maximum likelihood approach provides us with uncertainty information in addition to the optimum value for each parameter. In the present work, we have used several optimization algorithms as well as a classical sampling method (MH) to benefit from and compare both classic and probabilistic inference.

We compared the performance of EAs, including PSO, DE, GA, and CMA-ES, and the well-known sampling algorithms MH and SA (Case Study I and Case Study II, cf. Figs. 6 and 11). Our results show that in the case of a unimodal problem (single spectral peak), EAs outperform the sampling algorithms, although they are computationally more expensive.

In recent years, many algorithms have been proposed to solve inverse problems (Rodriguez-Fernandez et al. 2006b; Kramer et al. 2014; Kimura et al. 2015). Notably, it has been shown that both the choice of algorithm applied in the estimation problem and the formulation of the objective function play a crucial role in reproducing the key features of the measured data (Kimura et al. 2005). This is confirmed by our study, which demonstrates that the specific choice of the fitness function, e.g. by weighting different signal elements, plays a decisive role in reproducing the key features of the measured data. We showed that, using the standard least squares function, the thalamo-cortical neural mass model fails to fit the spectral power peaks observed in the δ- and α-frequency ranges. This can be improved by giving certain frequency bands more weight in the fitness function than others, cf. Fig. 12.

For each parameter estimation problem carried out in this study, we also employed practical identifiability analysis to check the reliability of the estimates. The identifiability analysis in this work used the Fisher Information Matrix (FIM) to compute the sensitivity and correlation matrices, in addition to plotting the confidence regions for the estimated parameters. We illustrated that the identifiability analysis can be readily carried out by plotting the confidence regions according to the covariance approximation or by employing the PSO and MH algorithms. For instance, the confidence regions obtained through the Hessian and FIM approaches were compared in Figs. 4 and 10. Given the conceptual difference between these approaches, the exact coincidence of the ellipsoids obtained from the Hessian and FIM information indicates that the estimated parameters are uniquely identifiable and that we were able to obtain reliable estimates (Marsili-Libelli et al. 2003). To further confirm the reliability of the shown confidence regions, we also compared the results obtained by the PSO and MH algorithms. As presented in Figs. 4 and 10, we observed very good agreement between these approaches.
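For two parameters, the covariance approximation behind an elliptical confidence region can be sketched as follows. This is a simplified illustration rather than the code used for Figs. 4 and 10: it assumes additive Gaussian noise with a known standard deviation sigma, a precomputed sensitivity matrix, and the 95% chi-square quantile for two degrees of freedom; all function names are hypothetical.

```python
import math

CHI2_95_2DOF = 5.991  # 95% quantile of the chi-square distribution, 2 degrees of freedom


def fisher_information(sens, sigma):
    """FIM = S^T S / sigma^2 for a two-parameter sensitivity matrix S.

    sens holds one (dy/dp1, dy/dp2) pair per data point.
    Returns the unique entries (f11, f12, f22) of the symmetric 2x2 matrix.
    """
    f11 = sum(s[0] * s[0] for s in sens) / sigma ** 2
    f12 = sum(s[0] * s[1] for s in sens) / sigma ** 2
    f22 = sum(s[1] * s[1] for s in sens) / sigma ** 2
    return f11, f12, f22


def covariance_from_fim(f11, f12, f22):
    """Covariance approximation C = FIM^{-1} (closed-form 2x2 inverse)."""
    det = f11 * f22 - f12 ** 2
    if det <= 0.0:
        raise ValueError("FIM is singular: parameters are not identifiable")
    return f22 / det, -f12 / det, f11 / det


def confidence_ellipse(c11, c12, c22, chi2=CHI2_95_2DOF):
    """Semi-axes and major-axis orientation (radians) of the confidence ellipse."""
    mean = 0.5 * (c11 + c22)
    half_diff = math.hypot(0.5 * (c11 - c22), c12)
    lam_max = mean + half_diff             # larger eigenvalue of C
    lam_min = max(mean - half_diff, 0.0)   # smaller eigenvalue, guarded against rounding
    angle = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    return math.sqrt(chi2 * lam_max), math.sqrt(chi2 * lam_min), angle


if __name__ == "__main__":
    sens = [(1.0, 0.0), (0.0, 2.0)]  # toy sensitivities, not taken from the model
    cov = covariance_from_fim(*fisher_information(sens, sigma=1.0))
    print(confidence_ellipse(*cov))
```

A nearly singular FIM produces a very elongated ellipse, which is one way a poorly identifiable parameter combination shows up in such plots.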

Furthermore, by measuring the sensitivity values, it is possible to investigate how the system output changes in response to small modifications of the model parameters (Rodriguez-Fernandez et al. 2006b, 2013). This allows us to reveal which model parameters play a decisive role in the model behavior. A high sensitivity index for a parameter shows that small changes in that parameter cause a strong response in the model output. This indicates that parameters with higher sensitivity values are more identifiable than parameters with low sensitivity indices (cf. Appendix B in Supplementary Material). The correlation plots also provide information about parameter identifiability. A lack of correlation among the estimated parameters reveals that the parameters are identifiable, as shown in Fig. 5. In contrast, highly correlated parameters are not identifiable, since there exist combinations of them that yield an identical fitness value (cf. Fig. 9). High correlation between parameters can also cause a discrepancy between the elliptical and likelihood-based confidence regions, as illustrated in Fig. 8. To overcome this problem, the pairs of correlated parameters must be removed by introducing new variables.
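The two diagnostics discussed above can be sketched in a few lines. The snippet is a hedged illustration only: the central-difference sensitivity, the normalization by parameter and output values, the correlation threshold, and all function names are assumptions and do not reproduce the procedure of Appendix B.

```python
import math


def relative_sensitivity(model, params, index, rel_step=1e-4):
    """Central-difference estimate of (p / y) * dy/dp for one parameter.

    model maps a parameter list to a scalar output; params[index] must be nonzero.
    """
    h = rel_step * abs(params[index])
    up = list(params)
    down = list(params)
    up[index] += h
    down[index] -= h
    dy_dp = (model(up) - model(down)) / (2.0 * h)
    return dy_dp * params[index] / model(params)


def correlation_matrix(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]


def correlated_pairs(corr, threshold=0.95):
    """Index pairs whose absolute correlation exceeds the (illustrative) threshold."""
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i][j]) > threshold]


if __name__ == "__main__":
    toy_model = lambda p: p[0] ** 2 * p[1]  # hypothetical scalar model output
    print(relative_sensitivity(toy_model, [3.0, 2.0], 0))
    print(relative_sensitivity(toy_model, [3.0, 2.0], 1))
    corr = correlation_matrix([[4.0, 1.9], [1.9, 1.0]])
    print(correlated_pairs(corr, threshold=0.9))
```

Pairs flagged by such a check are candidates for the reparameterization mentioned above, in which correlated parameters are replaced by new variables.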

Up to now, few studies have investigated parameter estimation problems in the context of neural population modeling, which is well established for reproducing EEG data measured during different behavioral states. To the best of our knowledge, the present study is the first to fit a thalamo-cortical model to EEG spectral power peaks observed in both the δ- and α-frequency ranges. A pioneering study by Bojak and Liley (2005) fitted a neural population model comprising excitatory and inhibitory cortical neurons to a set of pseudo-experimental data. In another study, Rowe et al. (2004) estimated the values of key neurophysiological parameters by fitting the model’s single-peak spectrum to EEG spectra in awake eyes-closed and eyes-open states. Although they achieved good predictions of the measured data, their data do not exhibit a second spectral power peak in the δ-frequency range, as our data do. Moreover, they used a local search method (LM method), which requires an initial guess for the parameters. In a similar approach, Van Albada et al. (2010) fitted a neural mass model to eyes-closed EEG spectra of a large number of subjects to probe age-associated changes in the physiological model’s parameters. Their findings suggest that inverse modeling of EEG spectral power is a reliable and non-invasive method for investigating large-scale dynamics, which allows us to extract physiological information from EEG spectra. In line with these studies, the data-driven approach presented in the current study provides guidance for fitting the thalamo-cortical model to a large set of experimental recordings. This enables us to investigate the parameter changes during the transition from the awake to the anesthetized state, especially for those parameters that cannot be measured directly.

An important finding of our data-based analysis in fitting a thalamo-cortical model to the EEG spectra is that the model is highly sensitive to the transmission delay in the system (cf. Appendix B in Supplementary Material). This is in agreement with previous studies suggesting that the location of spectral power peaks, especially in the alpha frequency range, depends heavily on the delay values in the thalamo-cortical circuits (Robinson et al. 2001a, 2001b; Rowe et al. 2004). Hence, the transmission delay can provide a basis for reproducing certain features of experimental data seen at high concentrations of anesthetics. For instance, a recent study by Hashemi et al. (2017) considered the effect of anesthetics on the axonal transmission delay to reproduce the surge in beta power observed in the EEG power spectrum close to loss of consciousness. Inferring the parameter changes associated with changes in brain activity by fitting the model to a large data set remains to be investigated in future work.

Conclusion

The results obtained in the present work reveal that, given a set of stochastic ordinary or delay differential equations (SDEs) and a set of experimental data, we are able to fit the model power spectrum to the related data with high accuracy and at very low computational cost with the aid of the Green’s function method and evolutionary algorithms. We demonstrated that, using evolutionary algorithms, the proposed thalamo-cortical neural population model fits very well to the EEG spectral features within the δ- and α-frequency ranges measured during general anesthesia.

Moreover, we showed that in multimodal optimization problems, the use of a global optimization approach such as PSO or DE is required in order to accurately estimate the model parameters.

Our analysis further indicates that one can employ a data-driven approach to provide valuable new insights into the mechanisms underlying the behavior of complex systems. This approach will provide guidance for future brain experiments aimed at better understanding different behavioral activities. In summary, this work can serve as a basis for future studies revealing biomarkers from physiological signals.

Information Sharing Statement

The authors do not have ethical approval to make the data set publicly available, as this did not form part of the subject consent process. Consequently, we do not have the written approval of the patients to publish their data and hence we refrain from making the data public. However, the data are available upon request from the authors by contacting meysam.hashemi@univ-amu.fr and/or Jamie.Sleigh@waikatodhb.health.nz.

The source codes needed to reproduce the presented results are available on GitHub (https://www.github.com/mhashemi0873/SpectralPowerFitting).