Motions and processes observed in nature are extremely diverse and complex. Therefore, opportunities to model them with explicit functions of time are rather restricted. Much greater potential is expected from difference and differential equations (Sects. 3.3, 3.5 and 3.6). Even a simple one-dimensional map with a quadratic maximum is capable of demonstrating chaotic behaviour (Sect. 6.2). Such model equations, in contrast to explicit functions of time, describe how a future state of an object depends on its current state or how the velocity of the state change depends on the state itself. However, the technology for the construction of these more sophisticated models, including parameter estimation and selection of approximating functions, is basically the same. A simple example: construction of a one-dimensional map \(\eta_{n+1}=f(\eta_n,\textbf{c})\) differs from obtaining an explicit temporal dependence \(\eta=f(t,\textbf{c})\) only in that one needs to draw a curve through experimental data points on the plane \((\eta_n,\eta_{n+1})\) (Fig. 8.1a–c) rather than on the plane \((t,\eta)\) (Fig. 7.1). To construct model ODEs \({\mathrm{d}}\textbf{x}/{\mathrm{d}}t=\textbf{f}(\textbf{x},\textbf{c})\), one may first get time series of the derivatives \({\mathrm{d}}x_k/{\mathrm{d}}t\) (\(k=1,\ldots,D\), where D is the model dimension) via numerical differentiation and then approximate the dependence of \({\mathrm{d}}x_k/{\mathrm{d}}t\) on x in the usual way. Model equations can be multidimensional, which is another difference from the construction of models as explicit functions of time.

Fig. 8.1

Parameter estimation for the quadratic map (8.1) at \(c_{0}=1.85\). Circles show the observed values: (a) no noise; the dashed line is an original parabola; (b) only dynamical noise is present; the dashed line is a model parabola obtained via minimisation of the mean-squared vertical distance (some of those distances are shown by solid lines); (c) only measurement noise is present; the dashed line is a model parabola obtained via minimisation of the mean-squared orthogonal distance; (d) rhombs show a model realisation which is the closest one to an original time series in the sense (8.4)

For a long time, in empirical modelling of complex processes, one used linear difference equations containing noise to allow for irregularity (Sect. 4.4). The idea was first suggested in 1927 (Yule, 1927) and proved so fruitful that autoregression and moving average models became the main tool for the description of complex behaviour for the next 50 years.

Only in the 1960s–1970s did researchers widely realise that simple low-dimensional models in the form of non-linear maps or differential equations can exhibit complex oscillations even without noise influence. This gave a new impulse to the development of empirical modelling techniques, since the advent of powerful and widely accessible computers made practical implementation of the ideas possible.

In this chapter, we consider a situation when an observed time series \(\eta_{i}=h(\textbf{x}(t_{i})),i=1,\ldots,N\), is produced by iterations of a map \(\textbf{x}_{n+1}=\textbf{f}(\textbf{x}_{n},\textbf{c})\) or integration of an ordinary differential equation \({\mathrm{d}}\textbf{x}/{\mathrm{d}}t=\textbf{f(x, c)}\), whose structure is completely known. The problem is to estimate the parameters c from the observed data. This is called the “transparent box” problem (Sect. 5.2). To make the consideration more realistic, we add dynamical and/or measurement noise.

Such a problem setting is encountered in different applications and attracts serious attention. One singles out two main aspects of interest considered below:

  1. (i)

    Parameter estimation with a required accuracy is important if the parameters cannot be measured directly due to experimental conditions. Then, the modelling procedure plays the role of a “measurement device” (Butkovsky et al., 2002; Horbelt and Timmer, 2003; Jaeger and Kantz, 1996; Judd, 2003; McSharry and Smith, 1999; Pisarenko and Sornette, 2004; Smirnov et al., 2005b) (see Sect. 8.1).

  2. (ii)

    Parameter estimation in the case of data deficit is even more problematic. This is the situation when one cannot get time series of all model dynamical variables \(x_k\) from the measured values of an observable η, i.e. some variables are “hidden” (Baake et al., 1992; Bezruchko et al., 2006; Breeden and Hubler, 1990; Parlitz, 1996; Voss et al., 2004) (see Sect. 8.2).

1 Parameter Estimators and Their Accuracy

Let us consider the estimation of a single parameter in a non-linear map from its noise-corrupted time realisation. The object is the quadratic map in a chaotic regime with unknown parameter \(c=c_{0}\):

$${x_{n+1}}=f({x_n},{c_0})+{\xi_n}=1-{c_0}x_n^2+{\xi_n}, {\eta_n}={x_n}+{\zeta_n},$$
((8.1))

where \(\xi_n\), \(\zeta_n\) are independent random processes. The first of them is a dynamical noise (since it influences the dynamics) and the second one is a measurement noise (since it affects only the observations).

If both noises are absent, one has \(\eta_{n}=x_{n}\) and experimental data points on the plane \((x_{n},x_{n+1})\) lie exactly on the sought parabola (Fig. 8.1a). Finding the value of c reduces to an algebraic equation whose solution is \(\hat{c}=(1-x_{n+1})/x_{n}^{2}\). Hence, it is sufficient to use any two successive observations \(x_{n},x_{n+1}\) with \(x_{n}\neq 0\). As a result, the model coincides with the object up to computational error.
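This elementary case is easy to verify numerically. The following minimal sketch (in Python; the initial condition and series length are arbitrary choices of ours) iterates the noise-free map and recovers c from a single pair of successive values:

```python
import numpy as np

c0 = 1.85
x = np.empty(100)
x[0] = 0.3                                   # arbitrary initial condition
for n in range(99):
    x[n + 1] = 1.0 - c0 * x[n] ** 2          # quadratic map (8.1) without noise

c_hat = (1.0 - x[1]) / x[0] ** 2             # \hat{c} = (1 - x_{n+1}) / x_n^2
print(c_hat)                                 # 1.85 up to round-off error
```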

If noise is present either in dynamics or measurements, one looks for a parameter estimate rather than for the precise value of the parameter. The most widely used estimation techniques are described in Sect. 7.1.1. Peculiarities of their application under the considered problem setting are as follows.

1.1 Dynamical Noise

Let the dynamical noise \(\xi_n\) in Eq. (8.1) be a sequence of statistically independent, identically distributed random quantities with probability density \(p(\xi)\). To estimate the parameter c, one can use the ML technique (Sects. 2.2.1, 7.1.1 and 7.1.2), which is the most efficient one under sufficiently general conditions (Ibragimov and Has’minskii, 1979; Pisarenko and Sornette, 2004). The likelihood function [see also Eqs. (2.26) and (7.10)] reads in this case as

$${\mathrm{ln}}\,L(c) \equiv{\mathrm{ln}}\,{p_N}(\left.{{\eta_1},{\eta_2},\ldots,{\eta_N}} \right|c) \approx \sum\limits_{n=1}^{N-1}{{\mathrm{ln}}\,p\left({{\eta_{n+1}}-f({\eta_n},c)} \right)}.$$
((8.2))

To apply the technique, one must know the distribution law \(p(\xi)\), which is rarely the case. Most often, one assumes Gaussian noise so that the maximisation of Eq. (8.2) becomes equivalent to the so-called ordinary least-squares technique, i.e. to the minimisation

$$S(c)=\sum\limits_{n=1}^{N-1}{{{\left({{\eta_{n+1}}-f({\eta_n},c)}\right)}^2}}\to\min.$$
((8.3))

It means that the plot of the model function on the plane \((\eta_{n},\eta_{n+1})\) should pass so as to minimise the sum of the squared vertical distances from it to the experimental data points (Fig. 8.1b).

As a rule, the error in the estimator \(\hat{c}\) decreases with the time series length N. Under the considered problem setting, both the ML approach and the LS technique give asymptotically unbiased and consistent estimators. It can be shown that the estimator variance decreases as \(N^{-1}\), analogous to the examples in Sect. 7.1.2. The reason can be described in the same way: the terms in Eq. (8.3) are stationary with respect to n, i.e. the partial Fisher information is bounded.

The ordinary LS technique often gives an acceptable estimator accuracy even if the noise is not Gaussian (Sects. 7.1.1 and 7.1.2). Although one may apply other methods, e.g. the least absolute values technique, the LS technique is much more easily implemented. An essential technical difficulty arises when the “relief” of the cost function (8.3) exhibits many local minima, which is often the case if f is non-linear with respect to c. Then, the optimisation problem is solved in an iterative way with some starting guesses for the sought parameters (Dennis and Schnabel, 1983). Whether a global extremum is found depends on how “lucky” the starting guesses are, i.e. how close they are to the true parameter values. In the example (8.1), f is linear with respect to c; therefore, the cost function S is quadratic with respect to c and exhibits a single minimum, which is easily found as a solution to a linear algebraic equation.
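The closed-form solution is easy to demonstrate. The sketch below (our own Python illustration; the noise level and the crude guard against noise-induced escape from the bounded attractor are assumptions, not prescriptions from the text) generates a realisation of Eq. (8.1) with dynamical noise only and computes the LS estimate from the linear equation \(\mathrm{d}S/\mathrm{d}c=0\):

```python
import numpy as np

rng = np.random.default_rng(0)
c0, N, noise_sd = 1.85, 10_000, 0.01

x = np.empty(N)
x[0] = 0.3
for n in range(N - 1):
    x[n + 1] = 1.0 - c0 * x[n] ** 2 + noise_sd * rng.standard_normal()
    x[n + 1] = np.clip(x[n + 1], -1.0, 1.0)  # guard: a strong noise kick could
                                             # push the orbit out of the basin
eta = x                                      # dynamical noise only, eta_n = x_n

# dS/dc = 0  =>  c = sum((1 - eta_{n+1}) * eta_n^2) / sum(eta_n^4)
c_hat = np.sum((1.0 - eta[1:]) * eta[:-1] ** 2) / np.sum(eta[:-1] ** 4)
print(c_hat)                                 # close to 1.85; the estimator
                                             # variance decreases as N^{-1}
```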

We note that if f is linear with respect to x, the model (8.1) is a linear first-order autoregression model. More general ARMA models involve a dependence of \(x_{n+1}\) on several previous values of x and ξ, see Eq. (4.13) in Sect. 4.4.

1.2 Measurement Noise

If only a measurement noise is present \((\eta_{n}=x_{n}+\zeta_{n})\), the estimation problem gets more complicated. This is because one aims at finding a dependence of \(x_{n+1}\) on \(x_n\), where \(x_n\) is an “independent” variable whose observed values are noise corrupted [a confluent analysis problem, see Eq. (2.28) in Sect. 2.2.1.8].

1.2.1 Bias in the Estimator Obtained Via the Ordinary LS Technique

The bias is non-zero for an arbitrarily long time series, since the technique (8.3) is developed under the assumption that only dynamical noise is present. It can be illustrated with the example (8.1), where one has

$$\begin{array}{ll} S(c)&=\sum\limits_{i=1}^{N-1}{{{\left({{\eta_{i+1}}-f({\eta_i},c)}\right)}^2}}=\sum\limits_{i=1}^{N-1}{{{\left({{x_{i+1}}+{\zeta_{i+1}}-1+c{{\left({{x_i}+{\zeta_i}}\right)}^2}}\right)}^2}}= \\ &=\sum\limits_{i=1}^{N-1}{{{\left({cx_i^2-{c_0}x_i^2+{\zeta_{i+1}}+2c{x_i}{\zeta_i}+c\zeta_i^2}\right)}^2}}. \\ \end{array}$$

Minimum of S over c can be found from the condition \(\partial S/\partial c=0\), which reduces to the form

$$\sum\limits_{i=1}^{N-1}{\left({cx_i^2-{c_0}x_i^2+{\zeta_{i+1}}+2c{x_i}{\zeta_i}+c\zeta_i^2}\right)\cdot\left({x_i^2+2{x_i}{\zeta_i}+\zeta_i^2}\right)}=0.$$

By solving this equation, one gets the estimator (McSharry and Smith, 1999)

$$\hat c=\frac{{{c_0}\left({\sum\limits_{i=1}^{N-1}{x_i^4}+2\sum\limits_{i=1}^{N-1}{x_i^3{\zeta_i}}+\sum\limits_{i=1}^{N-1}{x_i^2\zeta_i^2}}\right)-\sum\limits_{i=1}^{N-1}{x_i^2{\zeta_{i+1}}}-2\sum\limits_{i=1}^{N-1}{{x_i}{\zeta_i}{\zeta_{i+1}}}-\sum\limits_{i=1}^{N-1}{\zeta_i^2{\zeta_{i+1}}}}} {{\sum\limits_{i=1}^{N-1}{x_i^4}+4\sum\limits_{i=1}^{N-1}{x_i^3{\zeta_i}}+6\sum\limits_{i=1}^{N-1}{x_i^2\zeta_i^2}+4\sum\limits_{i=1}^{N-1}{{x_i}\zeta_i^3}+\sum\limits_{i=1}^{N-1}{\zeta_i^4}}}.$$

Fig. 8.2

Ordinary LS estimates of the parameter c in Eq. (8.1) versus the noise level at \(N=\infty\) and \(c_{0}=2\). The noise-to-signal ratio, quantified as the ratio of standard deviations, is plotted along the abscissa (McSharry and Smith, 1999)

Under the condition \(N\to\infty\), one can take into account the statistical independence of \(\zeta_i\), \(\zeta_{i+1}\) and \(x_i\) and replace sums like \(\sum\limits_{i=1}^{N-1}x_{i}^{4}\) (temporal averaging) by integrals like \(\int\limits_{-\infty}^{\infty}\mu(x,c_{0})x^{4}{\mathrm{d}}x\equiv\langle x^{4}\rangle\) (ensemble averaging). Here, \(\mu(x,c_{0})\) is an invariant measure for the map (8.1), i.e. a probability density function. At \(c_{0}=2\) it can be found analytically: \(\mu(x,2)=1\left/\left(\pi\sqrt{1-x^{2}}\right)\right.,\ -1<x<1\). Hence, one gets \(\langle x^{2}\rangle=1/2,\ \langle x^{4}\rangle=3/8\) and \(\langle x^{n}\rangle=0\) for odd n. Finally, at \(c_{0}=2\), one comes to the asymptotic expression \(\hat{c}=c_{0}\left(4\sigma_{\zeta}^{2}+3\right)\left/\left(8\langle\zeta^{4}\rangle+24\sigma_{\zeta}^{2}+3\right)\right.\). This is a biased estimator. It underestimates the true value \(c_0\), since the denominator is greater than the numerator in the above expression. Figure 8.2 shows the asymptotic value of \(\hat{c}\) versus the noise-to-signal ratio for Gaussian noise. It is close to the true value only at low noise levels; the bias is less than 1% if the noise level is less than 0.05. The bias rises with the noise level. Analogous properties are observed for other noise distributions (Butkovsky et al., 2002; McSharry and Smith, 1999).
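The asymptotic bias is straightforward to check numerically. The sketch below (our own illustration; the noise-to-signal ratio of 0.1 is an arbitrary choice) compares the ordinary LS estimate from a long noisy realisation at \(c_0=2\) with the asymptotic expression above, using \(\langle\zeta^{4}\rangle=3\sigma_{\zeta}^{4}\) for Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(1)
c0, N = 2.0, 200_000
sigma = 0.1 * np.sqrt(0.5)                   # noise-to-signal ratio 0.1, std(x) = sqrt(1/2)

x = np.empty(N)
x[0] = 0.3
for n in range(N - 1):
    x[n + 1] = 1.0 - c0 * x[n] ** 2          # noise-free dynamics
eta = x + sigma * rng.standard_normal(N)     # measurement noise only

c_ls = np.sum((1.0 - eta[1:]) * eta[:-1] ** 2) / np.sum(eta[:-1] ** 4)
# asymptotic expression from the text, with <zeta^4> = 3 sigma^4 for Gaussian noise
c_asym = c0 * (4 * sigma**2 + 3) / (8 * 3 * sigma**4 + 24 * sigma**2 + 3)
print(c_ls, c_asym)                          # both noticeably below the true value 2
```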

However, since the LS technique is simple to implement and, in contrast to other methods, can easily be used in the case of many estimated parameters, it is often applied in practice under an additional assumption of a low measurement noise level.

1.2.2 Increasing Estimator Accuracy at High Measurement Noise Level

It is possible in part through the use of the total least-squares technique (Jaeger and Kantz, 1996), when one minimises the sum of the squared orthogonal distances from the data points \((\eta_{n},\eta_{n+1})\) to the plot of the function \(f(x_{n},c)\) (Fig. 8.1c). Thereby, one takes into account that deviations of the observed data points with coordinates \((\eta_{n},\eta_{n+1})\) from the plot of the sought function \(f(x_{n},c_{0})\) are induced by the noise influence on both coordinates. Therefore, the deviations may occur in any direction, not only vertically. The use of the orthogonal distances is justified in Jaeger and Kantz (1996) as an approximate version of the ML approach.

However, a non-zero estimator bias is not fully eliminated by the total LS technique (especially in the case of a very strong noise), since the latter is just an approximation to the ML technique. It may seem that a way out is to write down the likelihood function for the new situation “honestly”, i.e. taking into account how the noise enters the observations. For Gaussian noise, the problem reduces to the minimisation of the sum of squared deviations of a model realisation from the observed time series (Fig. 8.1d):

$$S(c,{x_1})=\sum\limits_{n=0}^{N-1}{{{\left({{\eta_{n+1}}-{f^{(n)}}({x_1},c)} \right)}^2}} \to{\mathrm{min}},$$
((8.4))

where \(f^{(n)}\) is the nth iterate of the map \(x_{n+1}=f(x_{n},c),\ f^{(0)}(x,c)=x\), and the initial model state \(x_1\) is considered as an estimated quantity as well.

An orbit of a chaotic system is very sensitive to initial conditions and parameters. Therefore, the variance of the estimator obtained from Eq. (8.4) for a chaotic orbit rapidly decreases with N, sometimes even exponentially (Horbelt and Timmer, 2003; Pisarenko and Sornette, 2004). This is a desirable property, but it is achieved only if one manages to find the global minimum of Eq. (8.4). In practice, even for moderate values of N, the “relief” of S for a chaotic system becomes strongly “jagged” (Fig. 8.3a) so that it gets almost impossible to find the global minimum numerically (Dennis and Schnabel, 1983). To do it, one would need very “lucky” starting guesses for c and \(x_1\). It is also difficult to speak of asymptotic properties of the estimators, since the cost function gets non-smooth in the limit \(N\to\infty\). Therefore, one develops modifications of the ML technique in application to the parameter estimation from a chaotic time series (Pisarenko and Sornette, 2004; Smirnov et al., 2005b).
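The jaggedness is easy to reproduce. The sketch below (our illustration with the settings of Fig. 8.3: \(c_0=1.85\), \(N=20\), \(x_1=0.3\), and noise-free observations for simplicity) evaluates the cost (8.4) on a parameter grid; plotting it reveals many local minima around the global one:

```python
import numpy as np

def S_direct(c, x1, eta):
    """Cost (8.4): squared deviations of the iterated model orbit from the data."""
    s, x = 0.0, x1
    for e in eta:
        s += (e - x) ** 2
        x = 1.0 - c * x ** 2                 # next iterate f(x, c)
    return s

c0, N, x1 = 1.85, 20, 0.3
eta, x = np.empty(N), x1
for n in range(N):
    eta[n] = x
    x = 1.0 - c0 * x ** 2                    # noise-free observations for simplicity

cs = np.linspace(1.7, 2.0, 2001)
S = np.array([S_direct(c, x1, eta) for c in cs])
print(cs[S.argmin()])                        # global minimum at c = 1.85, but S(c)
                                             # is strongly "jagged" around it
```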

Fig. 8.3

Cost functions for the quadratic map (8.1) at \(c_{0}=1.85\) and \(N=20\): (a) the cost function for the direct iterates (8.4), where \(x_{1}=0.3\); (b) the cost function for the reverse iterates (8.5), where \(x_{N}=f^{(N-1)}(0.3,c_{0})\)

Thus, in a piecewise technique, one divides an original time series into segments of a moderate length, such that it is possible to find the global minimum of Eq. (8.4) for each of them, and averages the obtained estimates. This is a reasonable approach, but the resulting estimator may remain asymptotically biased. Its variance decreases again as \(N^{-1}\). Some ways to improve the estimator properties are described in Sect. 8.2.

Here, we note only an approach specific to one-dimensional maps (Smirnov et al., 2005b). It is based on the property that the only Lyapunov exponent of a one-dimensional chaotic map becomes negative under the time reversal so that an orbit gets much less sensitive to parameters and a “final condition”. Therefore, one minimises a quantity

$$S(c,{x_N})=\sum\limits_{n=0}^{N-1}{{{\left({{\eta_{N-n}}-{f^{(-n)}}({x_N},c)} \right)}^2}} \to{\mathrm{min}},$$
((8.5))

where \(f^{(-n)}\) is the nth iterate of the map \(x_{n+1}=f(x_{n},c)\) in reverse time; in particular, \(f^{(-1)}\) is the inverse function of f with respect to x. The plot of the cost function (8.5) remains sufficiently smooth for an arbitrarily long time series (Fig. 8.3b) so that it is not difficult to find its global minimum. At low and moderate noise levels (\(\sigma_{\zeta}/\sigma_{x}\) up to 0.05–0.15), the error in the estimator (8.5) turns out to be smaller than for the piecewise technique. Moreover, the expression (8.5) gives asymptotically unbiased estimates, whose variance typically scales as \(N^{-2}\) at weak noise. The high rate of error decrease is determined by the returns of a chaotic orbit to a small vicinity of the extremum of the function f (Smirnov et al., 2005b).
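A sketch of the reverse-time cost (8.5) for the quadratic map is given below (our illustration, with \(x_N\) fixed at its true value as in Fig. 8.3b). The inverse \(f^{(-1)}(x)=\pm\sqrt{(1-x)/c}\) is two-valued; here the branch sign is taken from the noisy observations, which is one practical choice and not necessarily the one adopted in Smirnov et al. (2005b):

```python
import numpy as np

def S_reverse(c, xN, eta):
    """Cost (8.5): backward iterates compared with the observed series."""
    s, x = 0.0, xN
    for n in range(len(eta) - 1, -1, -1):
        s += (eta[n] - x) ** 2
        if n > 0:
            arg = max((1.0 - x) / c, 0.0)    # guard against small negative arguments
            x = np.sign(eta[n - 1]) * np.sqrt(arg)   # data-driven branch sign
    return s

c0, N = 1.85, 200
orbit, x = np.empty(N), 0.3
for n in range(N):
    orbit[n] = x
    x = 1.0 - c0 * x ** 2

rng = np.random.default_rng(2)
eta = orbit + 0.01 * rng.standard_normal(N)  # weak measurement noise

cs = np.linspace(1.7, 2.0, 601)
S = np.array([S_reverse(c, orbit[-1], eta) for c in cs])
print(cs[S.argmin()])                        # S(c) stays smooth, with a pronounced
                                             # minimum near c0 = 1.85
```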

2 Hidden Variables

If the measurement noise level is considerable, the state variable x is often regarded as “hidden”, since its true values are, in fact, unknown. Variables are “even more hidden” if even their noise-corrupted values cannot be measured directly or computed from the observed data. This is often the case in practice. In such a situation, the parameter estimation is much more problematic than in the cases considered in Sect. 8.1. However, if one manages to solve the problem, then an additional opportunity to restore time series of the hidden variables appears as a by-product. Then, the modelling procedure serves as a measurement device with respect to the dynamical variables as well.

2.1 Measurement Noise

We illustrate the techniques with the example of parameter estimation in ordinary differential equations without dynamical noise. The object is a classical chaotic system, the Lorenz system:

$$\begin{array}{ll}&{\textrm d}x_1/{\textrm d}t=c_1(x_2-x_1), \quad {\textrm d}x_2/{\textrm d}t=-x_2+x_1(c_3-x_3),\\ &{\textrm d}x_3/{\textrm d}t=-c_2x_3+x_1x_2,\end{array}$$
((8.6))

for a “canonical” set of parameter values \(c_{1}=10,c_{2}=8/3,c_{3}=46\). A noise-corrupted realisation of \(x_1\) is considered as the observed data, i.e. \(\eta_{n}=x_{1}(t_{n})+\zeta_{n}\), while the variables \(x_2\) and \(x_3\) are hidden. A model is constructed in the form (8.6), where all three parameters \(c_k\) are regarded as unknown.

2.1.1 Initial Value Approach

All the estimation techniques are based to a certain extent on ideas like Eq. (8.4), i.e. one chooses initial conditions and parameter values so as to provide maximal closeness of a model realisation to an observed time series in the least-squares sense. The direct solution to a problem like (8.4) is called the initial value approach (Horbelt, 2001; Voss et al., 2004). As indicated in Sect. 8.1.2, it is not applicable to a long chaotic time series. Improving the approach is not straightforward. Thus, a simple division of a time series into segments with subsequent averaging of the respective estimates gives a low accuracy of the resulting estimator. The reverse-time iterations are not suitable in the case of a multidimensional dissipative system.

2.1.2 Multiple Shooting Approach

The difficulties can be overcome in part with Bock’s algorithm (Baake et al., 1992; Bock, 1981). It is also called a multiple shooting approach since one replaces the Cauchy initial value problem for an entire observation interval with a set of boundary value problems. Namely, one divides an original time series \(\{\eta_{1},\eta_{2},\ldots,\eta_{N}\}\) into L non-overlapping segments of length M and considers model initial states \(\textbf{x}^{(i)}\) for each of them (i.e. at time instants \(t_{(i-1)M+1},i=1,\ldots,L\)) as estimated quantities, but not as free parameters. One solves the problem of conditional minimisation, which reads in the case of a scalar observable \(\eta=h(\textbf{x})+\zeta\) (\(\eta=x_{1}+\zeta\) in our example) as

$$\begin{array}{ll}&S\left({{\mathbf{c}},{{\mathbf{x}}^{(1)}},{{\mathbf{x}}^{(2)}},\ldots,{{\mathbf{x}}^{(L)}}} \right)= \\ &\quad=\sum\limits_{i=1}^L{\sum\limits_{n=1}^M{{{\left[{\eta \left({{t_{(i-1)M+n}}} \right)-h\left({{\mathbf{x}}\left({{t_{(i-1)M+n}}-{t_{(i-1)M+1}},{{\mathbf{x}}^{(i)}},{\mathbf{c}}} \right)} \right)} \right]}^{\,2}} \to{\mathrm{min}}}}, \\ &{\mathbf{x}}\left({{t_{iM+1}}-{t_{(i-1)M+1}},{{\mathbf{x}}^{(i)}},{\mathbf{c}}} \right)={{\mathbf{x}}^{(i+1)}},\,\,i=1,2,\ldots,L-1. \end{array}$$
((8.7))

The quantity \(\textbf{x}(t,\textbf{x}^{(i)},\textbf{c})\) denotes a model realisation (a solution to the model equations), i.e. a model state x at a time instant t for an initial state \(\textbf{x}^{(i)}\) and a parameter value c. The first equation in Eq. (8.7) means minimisation of the deviations of a model realisation from the observed series over the entire observation interval. The second line provides “matching” of the segments so as to get, finally, a continuous model orbit over the entire observation interval. This matching imposes equality-type conditions on the L sought vectors \(\textbf{x}^{(i)}\), i.e. only one of the vectors can be regarded as a free parameter of the problem.
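A simplified sketch of the multiple shooting idea for the system (8.6) is given below. We stress the assumptions: the matching conditions of Eq. (8.7) are imposed softly via a quadratic penalty with weight w instead of hard equality constraints (a common relaxation, not Bock’s original formulation), and whether the optimiser reaches the global minimum still depends on the starting guesses:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

rng = np.random.default_rng(3)
dt, M, L, w = 0.04, 20, 5, 10.0              # sampling time, segment length, number
                                             # of segments, matching penalty weight

def lorenz(t, x, c1, c2, c3):                # right-hand side of Eq. (8.6)
    return [c1 * (x[1] - x[0]), -x[1] + x[0] * (c3 - x[2]), -c2 * x[2] + x[0] * x[1]]

c_true, N = (10.0, 8.0 / 3.0, 46.0), L * M
sol = solve_ivp(lorenz, (0, (N - 1) * dt), [1.0, 5.0, 40.0],
                t_eval=np.arange(N) * dt, args=c_true, rtol=1e-9)
eta = sol.y[0] + 0.2 * np.std(sol.y[0]) * rng.standard_normal(N)   # noisy x1

def residuals(p):
    c, X0 = p[:3], p[3:].reshape(L, 3)       # parameters and segment states x^{(i)}
    res, ends = [], []
    for i in range(L):
        s = solve_ivp(lorenz, (0, (M - 1) * dt), X0[i],
                      t_eval=np.arange(M) * dt, args=tuple(c), rtol=1e-8)
        res.append(eta[i * M:(i + 1) * M] - s.y[0])   # data misfit, first line of (8.7)
        ends.append(s.y[:, -1])
    for i in range(L - 1):                   # soft version of the matching conditions
        res.append(w * (ends[i] - X0[i + 1]))
    return np.concatenate(res)

# starting guesses: perturbed parameters; x1 taken from the data, crude constants
# for the hidden variables x2 and x3
p0 = np.concatenate([[8.0, 2.0, 40.0],
                     np.column_stack([eta[::M], np.full(L, 5.0),
                                      np.full(L, 40.0)]).ravel()])
fit = least_squares(residuals, p0)
print(fit.x[:3])                             # estimates of (c1, c2, c3)
```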

Fig. 8.4

Parameter estimation from a chaotic realisation of the coordinate \(x=x_{1}\) of the Lorenz system (8.6): \(N=100\) data points, the sampling interval is 0.04, the measurement noise is Gaussian and white with a standard deviation equal to 0.2 of that of the noise-free signal (Horbelt, 2001). (a) The initial value approach: realisations of the observable (circles) and the corresponding model variable. The fitting process converges to a local minimum, where the model orbit and parameter estimates strongly differ from the true ones. (b) The multiple shooting approach: the fitting process converges to the global minimum, where the model orbit and parameter estimates are close to the true ones

Next, one solves the problem with ordinary numerical iterative techniques using some starting guesses for the sought quantities \(\textbf{c, x}^{(1)},\textbf{x}^{(2)},\ldots,\textbf{x}^{(L)}\). The starting guesses correspond, as a rule, to L non-matching pieces of a model orbit (Fig. 8.4b, upper panel). The situation is similar for intermediate values of the sought quantities during the iterative minimisation procedure, but the “discrepancies” should get smaller and smaller if the procedure converges at given starting guesses. Such a temporary admission of the model orbit discontinuity distinguishes Bock’s algorithm (Fig. 8.4b) from the initial value approach (Fig. 8.4a) and provides greater flexibility of the former.

An example of the application of the two techniques to the parameter estimation for the system (8.6) is shown in Fig. 8.4. The multiple shooting technique “finds” the global minimum, while the initial value approach stops at a local one. This is quite a typical situation. However, the multiple shooting technique does not assure finding the global minimum. It only softens the requirements on the “goodness” of starting guesses for the sought quantities (Bezruchko et al., 2006). For even longer chaotic time series, it also gets inefficient, since the basic principle of closeness of a chaotic model orbit to the observed time series over a long time interval again leads to very strict requirements on starting guesses.

2.1.3 Modification of the Multiple Shooting Technique

As shown in Bezruchko et al. (2006), one can avoid some of these difficulties by allowing discontinuity of the resulting model orbit at several time instants within the observation interval, i.e. by ignoring several equalities in the last line of Eq. (8.7). In such a way, it is easier to find the global minimum of the cost function S in Eq. (8.7).

Such a modification allows one to use arbitrarily long chaotic time series, but the price is that a model with an inadequate structure may sometimes be accepted as a “good” one due to its ability to reproduce short segments of the time series. Therefore, one should carefully select the number and the size of the continuity segments for the model orbit.

There is also an additional difficulty in the situation with hidden variables. Apart from lucky starting guesses for the parameters c, it appears important to generate lucky starting guesses for the hidden variables (components of the vectors \(\textbf{x}^{(i)}\)), in contrast to very optimistic early statements (Baake et al., 1992). Quite often, one has to proceed intuitively or by blind trial and error. However, useful information may sometimes be obtained through a preliminary study of the properties of model time realisations at several trial parameter values (Bezruchko et al., 2006).

2.1.4 Synchronisation-Based Parameter Estimation

A further improvement of the technique is based on the idea of synchronising a model by the observed time series. It was suggested in Parlitz (1996) and further elaborated in many works. Here, we briefly describe its main points, advantages and difficulties following Parlitz (1996) and Parlitz et al. (1996).

Let us consider a system of ODEs \({\mathrm{d}}\textbf{y}/{\mathrm{d}}t=\textbf{f}(\textbf{y, c}_{0})\) with a state vector \(\textbf{y}\in R^{D}\) and a parameter vector \(\textbf{c}_{0}\in R^{P}\) as an object of modelling. Parameter values \(\textbf{c}_{0}\) are unknown. An observable vector is \(\boldsymbol{\upeta}=\textbf{h}(\textbf{y})\). It may have a lower dimension than the state vector y (the case of hidden variables). A model is given by the equation \({\mathrm{d}}\textbf{x}/{\mathrm{d}}t=\textbf{f(x, c)}\). Let us assume that there is a unidirectional coupling scheme using the available signal \(\boldsymbol{\upeta}(t)\), which enables asymptotically stable synchronisation of the model (x) by the object (y), i.e. \(\textbf{x}\to\textbf{y}\) as \(t\to\infty\) if \(\textbf{c}=\textbf{c}_{0}\). Thus, one can vary model parameters c and integrate model equations with somehow introduced input signal \(\boldsymbol{\upeta}(t)\) at each value of c. If at some value \(\textbf{c}=\hat{\textbf{c}}\) identical synchronisation between \(\boldsymbol{\upeta}(t)\) and the corresponding model realisation \(\textbf{h}(\textbf{x}(t))\) (i.e. the regime \(\boldsymbol{\upeta}(t)=\textbf{h}(\textbf{x}(t))\)) is achieved after some transient process, then the value \(\hat{\textbf{c}}\) should be equal to \(\textbf{c}_{0}\). If only an approximate relationship \(\boldsymbol{\upeta}(t)\approx\textbf{h}(\textbf{x}(t))\) holds true, then \(\hat{\textbf{c}}\) can be taken as an estimate of \(\textbf{c}_{0}\).

There are different implementations of the idea. One can introduce a measure of the discrepancy between \(\boldsymbol{\upeta}(t)\) and \(\textbf{h}(\textbf{x}(t))\), e.g. their mean-squared difference after some transient, and minimise it as a function of c (Parlitz et al., 1996). This is a complicated problem of non-linear optimisation similar to those encountered under the multiple shooting approach. An advantage of the synchronisation-based estimation is that the minimised function often changes quite gradually with c and has a pronounced minimum at \(\textbf{c}=\textbf{c}_{0}\) with a broad “basin of attraction”, i.e. a starting guess for c does not have to be as “lucky” as under the multiple shooting approach. This is explained by the following property of many non-linear systems: if c is not equal to \(\textbf{c}_{0}\) but reasonably close to it, identical synchronisation is impossible, but generalised synchronisation (Sect. 6.4.5) often occurs, where x is a function of y that differs little from \(\textbf{x}=\textbf{y}\). Then, the discrepancy between \(\boldsymbol{\upeta}(t)\) and \(\textbf{h}(\textbf{x}(t))\) changes smoothly in the vicinity of \(\textbf{c}=\textbf{c}_{0}\).

It can be even more convenient to avoid the minimisation of a complicated cost function by considering the parameters c as additional variables in the model ODEs and updating their values depending on the current mismatch between \(\boldsymbol{\upeta}(t)\) and \(\textbf{h}(\textbf{x}(t))\) in the course of integration of the model ODEs (Parlitz, 1996). An example is again the chaotic Lorenz system

$$\begin{array}{ll} &{\mathrm{d}}y_1/{\mathrm{d}}t=\sigma({y_2}-{y_1}), \quad {\mathrm{d}}y_2/{\mathrm{d}}t={c_{1,0}}{y_1}-{c_{2,0}}{y_2}-{y_1}{y_3}+{c_{3,0}}, \\ &{\mathrm{d}}y_3/{\mathrm{d}}t={y_1}{y_2}-b{y_3},\end{array}$$
((8.8))

with the parameters \(c_{1,0}=28,\ c_{2,0}=1,\ c_{3,0}=0,\ \sigma=10,\ b=8/3\) and an observable \(\eta=h(\textbf{y})=y_{2}\). The following unidirectional coupling scheme and equations for the parameter updates were considered:

$$\begin{array}{ll} &{\mathrm{d}}x_1/{\mathrm{d}}t=\sigma(\eta-{x_1}),\\ &{\mathrm{d}}x_2/{\mathrm{d}}t={c_1}{x_1}-{c_2}{x_2}-{x_1}{x_3}+{c_3},\\ &{\mathrm{d}}x_3/{\mathrm{d}}t={x_1}{x_2}-b{x_3}, \\ &{\mathrm{d}}c_1/{\mathrm{d}}t=(\eta-{x_2}){x_1},\\ &{\mathrm{d}}c_2/{\mathrm{d}}t=-(\eta-{x_2}){x_2},\\ &{\mathrm{d}}c_3/{\mathrm{d}}t=\eta-{x_2}.\end{array}$$
((8.9))

Using a global Lyapunov function, the author has shown that at the correct parameter values \(\textbf{c}=\textbf{c}_{0}\), the model is synchronised by the signal \(\eta(t)\) for all initial conditions. The system (8.9) tends to the regime \(\textbf{x}(t)=\textbf{y}(t)\) and \(\textbf{c}(t)=\textbf{c}_{0}\) at any values of \(x_{1}(0),x_{2}(0),x_{3}(0),c_{1}(0),c_{3}(0)\) and any positive \(c_{2}(0)\). Parameter estimates in this example turned out to be not very sensitive to a mismatch in the parameter σ. More general recommendations on the choice of the coupling scheme were also suggested from geometrical considerations in Parlitz (1996). Rather different equations for the parameter updates were suggested by several authors, e.g. Konnur (2003).
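A sketch of this scheme is given below: the object (8.8) and the driven model with parameter updates (8.9) are integrated jointly as one ODE system, so that \(\eta(t)=y_2(t)\) is available at every integration step. The initial conditions and starting guesses are illustrative assumptions of ours (with \(c_2(0)>0\), as required):

```python
import numpy as np
from scipy.integrate import solve_ivp

sigma, b = 10.0, 8.0 / 3.0
c10, c20, c30 = 28.0, 1.0, 0.0               # true parameter values in Eq. (8.8)

def joint(t, s):
    y1, y2, y3, x1, x2, x3, c1, c2, c3 = s
    eta = y2                                 # the observable driving the model
    return [sigma * (y2 - y1),               # object, Eq. (8.8)
            c10 * y1 - c20 * y2 - y1 * y3 + c30,
            y1 * y2 - b * y3,
            sigma * (eta - x1),              # model and parameter updates, Eq. (8.9)
            c1 * x1 - c2 * x2 - x1 * x3 + c3,
            x1 * x2 - b * x3,
            (eta - x2) * x1,
            -(eta - x2) * x2,
            eta - x2]

s0 = [1.0, 5.0, 20.0,                        # object initial state
      0.0, 0.0, 0.0,                         # model initial state
      20.0, 2.0, 1.0]                        # starting guesses; c2(0) must be positive
sol = solve_ivp(joint, (0.0, 200.0), s0, rtol=1e-8, max_step=0.01)
print(sol.y[6:, -1])                         # c1, c2, c3 should approach (28, 1, 0)
```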

The synchronisation-based approach is theoretically justified for the noise-free case. Still, numerical experiments have shown its good performance when a moderate measurement noise is present. As already mentioned, the technique can readily be used in the case of hidden variables. However, despite several important advantages of the approach, it may encounter its own significant difficulties. Firstly, an asymptotically stable synchronisation may not be achievable for an arbitrary observable \(\boldsymbol{\upeta}=\textbf{h}(\textbf{y})\). This possibility depends on the system under study and the coupling scheme. Secondly, it is not always clear what coupling scheme should be used to assure synchronisation at \(\textbf{c}=\textbf{c}_{0}\). Finally, it may be very important to select appropriate initial conditions in Eqs. (8.8) and (8.9), in other words, to select starting guesses for the model parameters and hidden variables. Further details can be found in Chen and Kurths (2007); Freitas et al. (2005); Hu et al. (2007); Huang (2004); Konnur (2003); Marino and Miguez (2005); Maybhate and Amritkar (1999); Parlitz (1996); Parlitz et al. (1996); Tao et al. (2004).

2.2 Dynamical and Measurement Noise

Estimating model parameters in the case of a simultaneous presence of dynamical and measurement noise is a more complicated task. However, sophisticated techniques have recently been developed for it, such as Kalman filtering-based methods (Sitz et al., 2002, 2004; Voss et al., 2004) and Bayesian approaches (Bremer and Kaplan, 2001; Davies, 1994; Meyer and Christensen, 2000). Below, we describe in some detail a recently suggested technique (Sitz et al., 2002) called unscented Kalman filtering.

Kalman filtering is a general idea which was originally developed for the state estimation in linear systems from observed noise-corrupted data (Kalman and Bucy, 1961). It is widely used, e.g., in data assimilation (Sect. 5.1) as well as in many other applications (Bar-Shalom and Fortmann, 1988). The idea was later generalised to the estimation of model parameters together with model states and has recently been further generalised to the estimation of parameters and state vectors in non-linear systems (Sitz et al., 2002).

In the linear case, the model is assumed to be of the form

$$\begin{array}{ll} &{{\mathbf{x}}_i}={\mathbf{A}} \cdot{{\mathbf{x}}_{i-1}}+{\boldsymbol{\upxi}_i}, \\ &{{\boldsymbol{\upeta}}_i}={\mathbf{B}} \cdot{{\mathbf{x}}_i}+{{\boldsymbol{\upzeta}}_i},\end{array}$$
((8.10))

where x is a state vector of dimension D, i is the discrete time, η is the vector of observables whose dimension may differ from that of x, A and B are constant matrices, and \(\boldsymbol{\upxi}_i\) and \(\boldsymbol{\upzeta}_i\) are independent zero-mean Gaussian white noises with diagonal covariance matrices. The problem is to estimate the state x at a time instant n having observations \(\boldsymbol{\upeta}_i\) up to time n inclusive, i.e. a set \(\textbf{H}_{n}=\{\boldsymbol{\upeta}_{1},\boldsymbol{\upeta}_{2},\ldots,\boldsymbol{\upeta}_{n}\}\). Let us denote such an estimator as \(\hat{\textbf{x}}(n|n)\). The Kalman filter provides an optimal linear estimate, i.e. an unbiased one with the smallest variance. Formally, the estimation procedure consists of predictor and corrector steps. At the predictor step, one estimates the value \(\hat{\textbf{x}}(n|n-1)\), i.e. takes into account only the observations \(\textbf{H}_{n-1}\) up to the time instant \(n-1\). The optimal solution to such a problem is \(\hat{\textbf{x}}(n|n-1)=E[\textbf{x}_{n}|\textbf{H}_{n-1}]\). It can be written as

$$\begin{array}{ll} &{\hat {\textbf x}}(n\left|{n-1} \right.)=E[{\mathbf{A}} \cdot{{\mathbf{x}}_{n-1}}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}]={\mathbf{A}} \cdot E[{{\mathbf{x}}_{n-1}}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}]={\mathbf{A}} \cdot{\hat {\textbf x}}(n-1\left|{n-1} \right.),\\ &{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.)=E[{\mathbf{B}} \cdot{{\mathbf{x}}_n}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}]={\mathbf{B}} \cdot{\mathbf{A}} \cdot{\hat {\textbf x}}(n-1\left|{n-1} \right.).\end{array}$$
((8.11))

The second equation in Eq. (8.11) will be used at the corrector step below. The point estimators must be equipped with confidence bands (see Sect. 2.2.1). It is known that, due to the linearity of the system, the estimators are Gaussian distributed. Thus, their confidence bands are simply expressed via their covariance matrices, whose optimal estimates read as

$$\begin{array}{ll} {\mathbf{P}}(n\left|{n-1} \right.)&=E[({{\mathbf{x}}_n}-{\hat {\textbf x}}(n\left|{n-1} \right.)) \cdot{({{\mathbf{x}}_n}-{\hat {\textbf x}}(n\left|{n-1} \right.))^{\mathrm{T}}}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}],\\ {{\mathbf{P}}_{{\boldsymbol{\upeta} \boldsymbol{\upeta}}}}(n\left|{n-1} \right.)&=E[({{\boldsymbol{\upeta}}_n}-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.)) \cdot{({{\boldsymbol{\upeta}}_n}-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.))^{\mathrm{T}}}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}],\\ {{\mathbf{P}}_{\textbf{x}{\boldsymbol{\upeta}}}}(n\left|{n-1} \right.)&=E[({{\mathbf{x}}_n}-{\hat {\textbf x}}(n\left|{n-1} \right.)) \cdot{({{\boldsymbol{\upeta}}_n}-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.))^{\mathrm{T}}}{\left|{\mathbf{{\textrm H}}} \right._{n-1}}].\end{array}$$
((8.12))

These matrices can be expressed via their previous estimates \(\textbf{P}(n-1|{n-1}),\textbf{P}_{\boldsymbol{\upeta}\boldsymbol{\upeta}}(n-1|{n-1}),\textbf{P}_{\textbf{x}\boldsymbol{\upeta}}(n-1|{n-1})\) analytically for the linear system.

Now, the corrector step updates the predictor-step estimators taking into account the last observation η n as follows:

$$\begin{array}{ll} {\hat {\textbf x}}(n\left| n \right.)&={\hat {\textbf x}}(n\left|{n-1} \right.)+{{\mathbf{K}}_n} \cdot ({{\boldsymbol{\upeta}}_n}-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.)),\\ {\mathbf{P}}(n\left| n \right.)&={\mathbf{P}}(n\left|{n-1} \right.)-{{\mathbf{K}}_n} \cdot{{\mathbf{P}}_{{\boldsymbol{\upeta \upeta}}}}(n\left|{n-1} \right.) \cdot{\mathbf{K}}_n^{\mathrm{T}},\\ {{\mathbf{K}}_n}&={{\mathbf{P}}_{\textbf{x}{\boldsymbol{\upeta}}}}(n\left|{n-1} \right.) \cdot{\mathbf{P}}_{{\boldsymbol{\upeta \upeta}}}^{-1}(n\left|{n-1} \right.). \end{array}$$
((8.13))

Thus, the corrections represent the discrepancy between the predictor-step estimates and the actual observations, multiplied by the so-called Kalman gain matrix \(\textbf{K}_n\). Having Eqs. (8.11), (8.12) and (8.13), one can start from initial guesses \(\hat{\textbf{x}}(1|1)\) and \(\textbf{P}(1|1)\) and recursively get optimal state estimates for all subsequent time instants, taking into account subsequent observations. Due to the Gaussianity of the distributions, a 95% confidence band for the jth component of the vector \(\textbf{x}_n\) is given by \(\hat{x}_{j}(n|n)\pm 1.96\sqrt{\textbf{P}_{\mathit{jj}}(n|n)}\).
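For concreteness, here is a minimal sketch of the recursion (8.11), (8.12) and (8.13) for a scalar system; the matrices A, B and the noise variances are illustrative assumptions of ours:

```python
import numpy as np

A, B = np.array([[0.95]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])  # covariances of xi and zeta

rng = np.random.default_rng(4)
N, x, etas = 200, np.array([0.0]), []
for _ in range(N):                           # simulate the system (8.10)
    x = A @ x + rng.multivariate_normal([0.0], Q)
    etas.append(B @ x + rng.multivariate_normal([0.0], R))

x_hat, P = np.array([0.0]), np.array([[1.0]])
for eta in etas:
    # predictor step, Eqs. (8.11) and (8.12)
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    eta_pred = B @ x_pred
    P_ee = B @ P_pred @ B.T + R
    P_xe = P_pred @ B.T
    # corrector step, Eq. (8.13)
    K = P_xe @ np.linalg.inv(P_ee)
    x_hat = x_pred + K @ (eta - eta_pred)
    P = P_pred - K @ P_ee @ K.T

band = 1.96 * np.sqrt(P[0, 0])               # 95% confidence band for the final state
print(x_hat[0], "+/-", band, "true:", x[0])
```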

To apply the idea to a non-linear system \(\textbf{x}_{i+1}=\textbf{f}(\textbf{x}_{i})+\boldsymbol{\upxi}_{i}\) and \(\boldsymbol{\upeta}_{i}=\textbf{h}(\textbf{x}_{i})+\boldsymbol{\upzeta}_{i}\), one can either approximate the non-linear functions with a Taylor expansion or simulate the distribution of states and compute many model orbits to get an estimator \(\hat{\textbf{x}}(n|n)\) and its covariance matrix \(\textbf{P}(n|n)\). The latter idea proved more fruitful in practice (Sitz et al., 2002). Its fast and convenient implementation includes the selection of the so-called sigma points \(\textbf{x}^{(1)},\ldots,\textbf{x}^{(2D)}\) specifying the distribution of states at a time instant \(n-1\):

$$\begin{array}{ll} &{{\mathbf{x}}^{(j)}}(n-1\left|{n-1} \right.)={\hat {\textbf x}}(n-1\left|{n-1} \right.)+{\left[{\sqrt{D \cdot{\mathbf{P}}(n-1\left|{n-1} \right.)}} \right]_j},\\ &{{\mathbf{x}}^{(j+D)}}(n-1\left|{n-1} \right.)={\hat {\textbf x}}(n-1\left|{n-1} \right.)-{\left[{\sqrt{D \cdot{\mathbf{P}}(n-1\left|{n-1} \right.)}} \right]_j}, \end{array}$$
((8.14))

where \(j=1,2,\ldots,D\) and \(\left[\sqrt{\cdot}\right]_{j}\) means the jth column of the matrix square root. The sigma points are propagated through the non-linear system, giving

$$\begin{array}{ll} &{{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)={\mathbf{f}}({{\mathbf{x}}^{(j)}}(n-1\left|{n-1} \right.)),\\ &{{\mathbf{y}}^{(j)}}(n\left|{n-1} \right.)={\mathbf{h}}({{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)), \end{array}$$
((8.15))

where \(j=1,\ldots,2D\). Now, their sample means and covariances define the predictor estimates as follows:

$$\begin{array}{ll} &{\hat {\textbf x}}(n\left|{n-1} \right.)=\frac{1} {{2D}}\sum\limits_{j=1}^{2D}{{{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)},\\ &{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.)=\frac{1} {{2D}}\sum\limits_{j=1}^{2D}{{{\mathbf{y}}^{(j)}}(n\left|{n-1} \right.)},\\ &{{\mathbf{P}}_{{\boldsymbol{\upeta \upeta}}}}(n\left|{n-1} \right.)=\frac{1} {{2D}}\sum\limits_{j=1}^{2D}{({{\mathbf{y}}^{(j)}}(n\left|{n-1} \right.)-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.)) \cdot{{({{\mathbf{y}}^{(j)}}(n\left|{n-1} \right.)-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.))}^{\mathrm{T}}}},\\ &{{\mathbf{P}}_{\textbf{x}{\boldsymbol{\upeta}}}}(n\left|{n-1} \right.)=\frac{1} {{2D}}\sum\limits_{j=1}^{2D}{({{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)-{\hat {\mathbf x}}(n\left|{n-1} \right.)) \cdot{{({{\mathbf{y}}^{(j)}}(n\left|{n-1} \right.)-{\boldsymbol{\hat \upeta}}(n\left|{n-1} \right.))}^{\mathrm{T}}}},\\ &{\mathbf{P}}(n\left|{n-1} \right.)=\frac{1} {{2D}}\sum\limits_{j=1}^{2D}{({{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)-{\hat {\mathbf x}}(n\left|{n-1} \right.)) \cdot{{({{\mathbf{x}}^{(j)}}(n\left|{n-1} \right.)-{\hat {\mathbf x}}(n\left|{n-1} \right.))}^{\mathrm{T}}}}. \end{array}$$
((8.16))

The estimates (8.16) are updated via the usual Kalman formulas (8.13). This procedure is called unscented Kalman filtering (Sitz et al., 2002).

If a parameter vector a of a system \(\textbf{x}_{i+1}=\textbf{f}(\textbf{x}_{i},\textbf{a})+\boldsymbol{\upxi}_{i}\) is unknown, then it can formally be considered as additional state components. Moreover, it is convenient to consider the noises \(\boldsymbol{\upxi}_i\) and \(\boldsymbol{\upzeta}_i\) as components of a joint state vector as well. Thus, the joint equations of motion read as

$$\begin{array}{ll} &{{\mathbf{x}}_{i+1}}={\mathbf{f}}({{\mathbf{x}}_i},{{\mathbf{a}}_i})+{{\boldsymbol{\upxi}}_i},\\ &{{\mathbf{a}}_{i+1}}={{\mathbf{a}}_i}, \\ &{{\boldsymbol{\upxi}}_{i+1}}={{\boldsymbol{\upxi}}_i},\\ &{{\boldsymbol{\upzeta}}_{i+1}}={{\boldsymbol{\upzeta}}_i},\end{array}$$
((8.17))

where the last three equations do not alter starting values of the additional state components. However, the estimates of these components change in time due to the correction step (8.13), which takes into account a new observation.
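A self-contained sketch of the sigma-point recursion (8.13), (8.14), (8.15) and (8.16) with the parameter appended to the state, in the spirit of Eq. (8.17), is given below for the noisy quadratic map. One deviation from Eq. (8.17) should be stressed: instead of carrying the noises as extra state components, we add their covariances Q and R to the predicted covariances, which is a standard but slightly different variant; the noise levels and starting guesses are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
c0, N, q_sd, r_sd = 1.85, 400, 0.01, 0.05

x, eta = 0.3, np.empty(N)
for n in range(N):                           # simulate the object and observations
    eta[n] = x + r_sd * rng.standard_normal()
    x = np.clip(1.0 - c0 * x * x + q_sd * rng.standard_normal(), -1.05, 1.05)

D = 2                                        # augmented state z = (x, c)
f = lambda z: np.array([1.0 - z[1] * z[0] ** 2, z[1]])   # dynamics, cf. Eq. (8.17)
h = lambda z: z[0]                           # observation function
Q, R = np.diag([q_sd ** 2, 1e-6]), r_sd ** 2 # tiny artificial drift on c aids tracking

z_hat, P = np.array([0.0, 1.5]), np.eye(2)
for n in range(N):
    S = np.linalg.cholesky(D * P)            # matrix square root for Eq. (8.14)
    sig = np.vstack([z_hat + S.T, z_hat - S.T])          # 2D sigma points
    sig_f = np.array([f(s) for s in sig])    # propagation, Eq. (8.15)
    y = np.array([h(s) for s in sig_f])
    z_pred, eta_pred = sig_f.mean(axis=0), y.mean()      # sample means, Eq. (8.16)
    dz, dy = sig_f - z_pred, y - eta_pred
    P_pred = dz.T @ dz / (2 * D) + Q
    P_ee = dy @ dy / (2 * D) + R
    P_xe = dz.T @ dy / (2 * D)
    K = P_xe / P_ee                          # Kalman gain, Eq. (8.13)
    z_hat = z_pred + K * (eta[n] - eta_pred)
    P = P_pred - np.outer(K, K) * P_ee
    P = (P + P.T) / 2                        # keep the covariance symmetric
print(z_hat[1])                              # estimate of c, expected near 1.85
```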

The technique can easily be generalised to the case of ordinary and stochastic differential equations, where a numerical integration scheme enters the first equation of Eq. (8.17) instead of a simple function f.

We note that the procedure can be used in the case of hidden variables, which corresponds to the situation where the dimension of the observable vector η is less than D. Examples with a scalar observable from two- and three-dimensional dynamical systems are considered in Sitz et al. (2002), where the efficiency of the approach is illustrated. Thus, even in the case of the deterministically chaotic Lorenz system, the unscented Kalman filtering allowed accurate estimation of the three parameters from a scalar time realisation. Concerning this example, we note two things. Firstly, the successful application of a statistical method (Kalman filtering has its roots in mathematical statistics and the theory of random processes) to estimate parameters in a deterministic system again illustrates the close interaction between deterministic and stochastic approaches to modelling discussed in Chap. 2 (see, e.g., Sect. 2.6). Secondly, the unscented Kalman filtering seems to be more efficient than the multiple shooting approach (Sect. 8.2.1) in many cases, since the former technique does not require a continuous model orbit over the entire observation interval. In this respect, the unscented Kalman filtering is similar to the modified multiple shooting approach, which allows several discontinuity points. However, the unscented Kalman filtering is easier to implement, since it does not require solving an optimisation problem for the parameter estimation. On the other hand, the multiple shooting technique would certainly give more accurate parameter estimates for a deterministically chaotic system if one manages to find good starting guesses for the parameters and to find the global minimum of the cost function (8.7).

The Kalman filtering-based approach also resembles the synchronisation-based approach: It also involves parameter updates depending on the current model state. However, the difference is that the Kalman filtering does not require any synchronisation between a model and the observed data and is appropriate when a dynamical noise is present in contrast to the synchronisation-based technique developed for deterministic systems.

A proper selection of starting guesses for the model parameters and the hidden variables is of importance for the unscented Kalman filtering as well, since an arbitrary starting guess does not assure convergence of the non-linear recursive procedure [Eqs. (8.16) and (8.13)] to the true parameter values. A more detailed discussion of the unscented Kalman filtering can be found in Sitz et al. (2002, 2004) and Voss et al. (2004).

Even more general techniques are based on the Bayesian approach (Bremer and Kaplan, 2001; Davies, 1994; Meyer and Christensen, 2000), where the state and parameter estimators are calculated based on the entire available set of observations rather than only on the previous ones. However, full Bayesian estimation is more difficult to use and implement, since it requires solving a complicated numerical problem of sampling from multidimensional probability distributions.

2.2.1 Concluding Remark

Model validation for the considered “transparent box” problems is performed along two main lines. The first one is the analysis of the model residuals (Box and Jenkins, 1970), i.e. the check for their correspondence to the assumed noise properties (see Sect. 7.3). The second one is the computation of dynamical, geometrical and topological characteristics of a model attractor and their comparison with the respective properties of an original (Gouesbet et al., 2003b) (see Sect. 10.4).

3 What One Can Learn from Modelling Successes and Failures

Successful realisation of the above techniques (Sect. 8.2) promises an opportunity to obtain parameter estimates and time courses of hidden variables. This would allow several useful applications, such as validation of model ideas, “measurement” of quantities inaccessible to measurement devices and restoration of lost or distorted segments of data. Let us comment on this in more detail.

In studying real-world objects, a researcher never encounters a purely “transparent box” setting. He/she can only believe subjectively that a trial model structure is adequate to the original. Therefore, even with a perfect realisation of the procedures corresponding to the final modelling stages (Fig. 5.1), the result may turn out to be negative, i.e. one may not get a valid model with a given structure. Then, a researcher should declare his/her ideas about the process under investigation incorrect and return to the stage of model structure selection. If there are several alternative mathematical constructions, then modelling from time series may reveal the most adequate among them. Thus, the modelling procedure gives an opportunity to reject or confirm (possibly, to make more accurate) some substantial ideas about the object under investigation.

There are a number of practical examples of successful application of the approaches described above. Thus, in Horbelt et al. (2001) the authors confirm the validity of their ideas about gas laser functioning and obtain directly immeasurable parameters, the transition rates between energy levels, as functions of the pumping current. In Swameye et al. (2003) the authors are able to make substantial conclusions about the mechanism underlying a biochemical signalling process in cells, which is described below.

3.1 An Example from Cell Biology

In many applications it is necessary to find out which cell properties determine an undesirable process in the cells most strongly and how one can purposefully affect those properties. To answer such questions, it may be sufficient to get an adequate mathematical model, as demonstrated in Swameye et al. (2003).

Fig. 8.5

A scheme of a biochemical signalling process in a cell (Swameye et al., 2003)

The authors investigate one of many intracellular signalling pathways, which provide a cell with an opportunity to produce necessary substances in response to variations in its surroundings. In particular, such pathways provide reproduction, differentiation and survival of cells. The authors consider the so-called JAK-STAT signalling pathway, which transforms an external chemical signal into activation of the transcription of a respective gene in the cell nucleus (Fig. 8.5). One of the simplest mathematical models of the process can be written down based on the law of mass action (a usual approach in chemical kinetics) and reads as

$$\begin{array}{ll} &{\mathrm{d}}x_1/{\mathrm{d}}t=-{k_1}{x_1}E(t),\\ &{\mathrm{d}}x_2/{\mathrm{d}}t=-{k_2}x_2^2+{k_1}{x_1}E(t), \\ &{\mathrm{d}}x_3/{\mathrm{d}}t=-{k_3}{x_3}+{k_2}x_2^2/2,\\ &{\mathrm{d}}x_4/{\mathrm{d}}t={k_3}{x_3}.\end{array}$$
((8.18))

Here, \(k_i\) are reaction rates and \(E(t)\) is the concentration of erythropoietin in the extracellular space (denoted Epo in Fig. 8.5), whose variations lead to the activation of respective receptors on the cell membrane. The receptors are bound to tyrosine kinase of the type JAK-2 existing in the cell cytoplasm. Tyrosine kinase reacts with molecules of the substance STAT5, whose concentration is denoted \(x_1\). As a result of the reaction, the latter are phosphorylated. There arise monomeric tyrosine phosphorylated molecules STAT5, whose concentration is denoted \(x_2\). This reaction leads to a decrease in \(x_1\) [the first line in Eq. (8.18)] and an increase in \(x_2\) [a positive term in the second line of Eq. (8.18)]. The monomeric molecules dimerise when they meet each other. The concentration of the dimeric molecules is denoted \(x_3\). This reaction leads to a decrease in \(x_2\) [a negative term in the second line of Eq. (8.18)] and an increase in \(x_3\) [a positive term in the third line of Eq. (8.18)]. The dimeric molecules penetrate into the nucleus, where their concentration is denoted \(x_4\). This process leads to a decrease in \(x_3\) [a negative term in the third line of Eq. (8.18)] and an increase in \(x_4\) [the fourth line in Eq. (8.18)]. The dimeric molecules activate the transcription of a target gene. As a result, a specific protein is produced. In the process, the dimeric molecules dissociate into monomeric ones, which degrade inside the nucleus according to the hypothesis underlying the model (8.18).

However, there is another hypothesis according to which the monomeric molecules STAT5 are relocated from the cell nucleus to the cytoplasm after a certain delay time. Under such an assumption, the mathematical model slightly changes and takes the form

$$\begin{array}{ll} &{\mathrm{d}}x_1(t)/{\mathrm{d}}t=-{k_1}{x_1}(t)E(t)+2{k_4}{x_3}(t-\tau),\\ &{\mathrm{d}}x_2(t)/{\mathrm{d}}t=-{k_2}x_2^2(t)+{k_1}{x_1}(t)E(t), \\ &{\mathrm{d}}x_3(t)/{\mathrm{d}}t=-{k_3}{x_3}(t)+{k_2}x_2^2(t)/2,\\ &{\mathrm{d}}x_4(t)/{\mathrm{d}}t=-{k_4}{x_3}(t-\tau)+{k_3}{x_3}(t),\end{array}$$
((8.19))

where additional time-delayed terms appear in the first and the fourth lines and τ is the delay time. Which of the two models (i.e. which of the two hypotheses) is valid is unknown. The observation opportunities are quite limited: the equipment is expensive and the measurement process is complicated (Swameye et al., 2003). In the experiment, the authors could measure only the total mass of the phosphorylated STAT5 in the cytoplasm, \(\eta_1\), and the total mass of STAT5 in the cytoplasm, \(\eta_2\), up to constant multipliers:

$$\begin{array}{ll} &{\eta_1}={k_5}({x_2}+2{x_3}),\\ &{\eta_2}={k_6}({x_1}+{x_2}+2{x_3}), \end{array}$$
((8.20))

where \(k_{5},k_{6}\) are unknown proportionality coefficients. Along with those two observables, the concentration \(E(t)\) is measured up to a proportionality coefficient (Fig. 8.6a). We stress that all model dynamical variables \(x_k\) are hidden. There are only two observables, which are related to the four dynamical variables in the known way (8.20).
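For illustration, the model (8.18) together with the observation equations (8.20) can be simulated as follows. This is only a sketch: the reaction rates, the scale factors and the “pulse” form of \(E(t)\) are assumed values for demonstration, whereas in Swameye et al. (2003) they are estimated from the measured data:

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, k3 = 0.02, 2.0, 0.1                  # reaction rates (assumed values)
k5, k6 = 1.0, 1.0                            # scale factors in Eq. (8.20), assumed

def E(t):
    return 1.0 if 2.0 <= t <= 10.0 else 0.0  # assumed "pulse" of erythropoietin

def rhs(t, x):                               # right-hand side of Eq. (8.18)
    x1, x2, x3, x4 = x
    return [-k1 * x1 * E(t),
            -k2 * x2 ** 2 + k1 * x1 * E(t),
            -k3 * x3 + k2 * x2 ** 2 / 2.0,
            k3 * x3]

sol = solve_ivp(rhs, (0.0, 60.0), [1.0, 0.0, 0.0, 0.0],
                t_eval=np.linspace(0.0, 60.0, 16), max_step=0.1)
x1, x2, x3, x4 = sol.y
eta1 = k5 * (x2 + 2 * x3)                    # total phosphorylated STAT5 in cytoplasm
eta2 = k6 * (x1 + x2 + 2 * x3)               # total STAT5 in the cytoplasm
print(eta1[-1], eta2[-1])                    # the hidden x_k never enter the data directly
```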

Fig. 8.6

Modelling of a biochemical cell signalling process (Swameye et al., 2003): (a) a time series of an external driving, variations in erythropoietin concentration; (b), (c) the results of an empirical model construction in the form (8.18) and (8.19), respectively

The parameters in both models (8.18) and (8.19) are estimated in Swameye et al. (2003) from the described experimental data. The authors have shown the invalidity of the former model and a good correspondence between the experiments and the latter model. Namely, the time series were measured in three independent experiments, and one of them is shown in Fig. 8.6. Model parameters were estimated from all the available data via the initial value approach (Sect. 8.2.1). This approach was appropriate since the time series were rather short. They contained only 16 data points per experimental session and represented responses to “pulse” changes in the erythropoietin concentration (Fig. 8.6a). Thus, one should not expect problems with finding the global minimum of the cost function. The model (8.18) appeared incapable of reproducing the observed time series (Fig. 8.6b), while the model (8.19) was adequate in this respect (Fig. 8.6c). The estimate of the delay time was \(\tau\approx 6\) min, which agreed in order of magnitude with the results of other authors obtained with different techniques for similar objects.

Thus, only the modelling from time series allowed the authors to make the non-trivial conclusion that relocation of STAT5 molecules to the cytoplasm plays a significant role in the process under study. They also found out some details of the process which cannot be observed directly, for instance, that the STAT5 molecules stay in the nucleus for approximately 6 min.

Moreover, by studying the model (8.19), one can predict what happens if some parameters of the process are varied. For example, the authors studied how the total mass of the protein produced by a cell (which is proportional to the total number of STAT5 molecules participating in the process) changes under variations in different model parameters. This quantity appeared to depend very weakly on \(k_{1},k_{2}\) and quite strongly on \(k_{3},k_{4},\tau\). In other words, the processes in the nucleus play the major role. According to the model (8.19), decreasing \(k_4\) down to zero (inhibition of the nuclear export) leads to a decrease in the produced protein mass by 55%. In experiments with leptomycin B, the authors inhibited the nuclear export (an analogue of decreasing the parameter \(k_4\)) only by 60%. According to the model, this should lead to a reduction in the amount of activated STAT5 by 40%. In the experiment, a reduction by 37% was observed. Thus, the model prediction was nicely confirmed, which further increases one’s confidence in the model and allows one to use it for a more detailed study of the process and its control. Having such achievements, one could think about opportunities to use empirical modelling for medical purposes.

3.2 Concluding Remarks

Despite the successes mentioned above, the problem of modelling may often appear technically unsolvable even under the relatively simple “transparent box” setting and for an adequate model structure. To date, examples of successful modelling are observed when the difference between the model dimension and the number of observables is not greater than two or three and the number of estimated parameters is not greater than 3–5. These are only rough figures to give an impression of typical practical opportunities. In each concrete case, modelling results depend on the specific non-linearities involved in the model equations. In any case, the greater the number of hidden variables and unknown model parameters, the weaker the chances of successful modelling and the lower the accuracy of the parameter estimators.