
This chapter is dedicated to illustrating the examples, theory, and algorithms presented in the previous chapters through a few short and easy-to-follow MATLAB programs. These programs are provided for two reasons: (i) for some readers, they will form the best route by which to appreciate the details of the examples, theory, and algorithms we describe; (ii) for other readers, they will be a useful starting point to develop their own codes. While ours are not necessarily the optimal implementations of the algorithms discussed in these notes, they have been structured to be simple to understand, to modify, and to extend. In particular, the code may be readily extended to solve problems more complex than those described in Examples 2.1–2.7, which we will use for most of our illustrations. The chapter is divided into three sections, corresponding to programs relevant to each of the preceding three chapters.

Before getting into details, we highlight a few principles that have been adopted in the programs and in the accompanying text of this chapter. First, notation is consistent between programs, and it matches the text in the previous sections of the book as far as possible. Second, since many of the elements of the individual programs are repeated, they will be described in detail only in the text corresponding to the program in which they first appear; the short annotations explaining them will, however, be repeated within the programs. Third, the reader is advised to use the documentation available at the command line for any built-in functions of MATLAB; this information can be accessed using the help command—for example, the documentation for the command help can be accessed by typing help help.

5.1 Chapter 2 Programs

The programs p1.m and p2.m, used to generate the figures in Chapter 2, are presented in this section. These programs simply solve the dynamical system (2.1) and process the resulting data.

5.1.1 p1.m

The first program, p1.m, illustrates how to obtain sample paths from equations (2.1) and (2.3). In particular, the program simulates sample paths of the equation

$$\displaystyle{ u_{j+1} =\alpha \sin (u_{j}) +\xi _{j}, }$$
(5.1)

with \(\xi _{j} \sim N(0,\sigma ^{2})\) i.i.d. and α = 2.5, both for deterministic (σ = 0) and stochastic dynamics (σ ≠ 0) corresponding to Example 2.3. In line 5, the variable J is defined, which corresponds to the number of forward steps that we will take. The parameters α and σ are set in lines 6–7. The seed for the random-number generator is set to sd \(\in \mathbb{N}\) in line 8 using the command rng(sd). This guarantees that the results will be reproduced exactly by running the program with this same sd. Different choices of sd \(\in \mathbb{N}\) will lead to different streams of random numbers used in the program, which may also be desirable in order to observe the effects of different random numbers on the output. The command rng(sd) will be called in the preamble of all of the programs that follow. In line 9, two vectors of length J+1 are created, named v and vnoise; after the program has run, these two vectors contain the solutions for the case of deterministic (σ = 0) and stochastic dynamics (σ = 0.25), respectively. After the initial conditions are set in line 10, the desired map is iterated, without and with noise, in lines 12–15. Note that the only difference between the forward iteration of v and that of vnoise is the presence of the sigma*randn term, which corresponds to the generation of a random variable sampled from \(N(0,\sigma ^{2})\). Lines 17–18 contain code that graphs the trajectories, with and without noise, to produce Figure 2.3. Figures 2.1, 2.2, and 2.5 were obtained by simply modifying lines 12–15 of this program, in order to create sample paths for the corresponding \(\mathrm{\varPsi }\) for the other three examples; furthermore, Figure 2.4a was generated from output of this program, and Figure 2.4b was generated from output of a modification of this program.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p1.m - behaviour of sin map (Ex. 1.3)
 3 %%% with and without observational noise
 4
 5 J=10000;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 sigma=0.25;% dynamics noise variance is sigma^2
 8 sd=1;rng(sd);% choose random number seed
 9 v=zeros(J+1,1); vnoise=zeros(J+1,1);% preallocate space
10 v(1)=1; vnoise(1)=1;% initial conditions
11
12 for i=1:J
13     v(i+1)=alpha*sin(v(i));
14     vnoise(i+1)=alpha*sin(vnoise(i))+sigma*randn;
15 end
16
17 figure(1), plot([0:1:J],v),
18 figure(2), plot([0:1:J],vnoise),

5.1.2 p2.m

The second program presented here, p2.m, is designed to visualize the posterior distribution in the case of one-dimensional deterministic dynamics. For clarity, the program is separated into three main sections. The setup section in lines 5–10 defines the parameters of the problem. The model parameter r is defined in line 6, and it determines the dynamics of the forward model, in this case given by the logistic map (2.9):

$$\displaystyle{ v_{j+1} = rv_{j}(1 - v_{j}). }$$
(5.2)

The dynamics are taken as deterministic, so the parameter sigma does not feature here. The parameter r is equal to 2, so that the dynamics are not chaotic, as the explicit solution given in Example 2.4 shows. The parameters m0 and C0 define the mean and covariance of the prior distribution \(v_{0} \sim N(m_{0},C_{0})\), while gamma defines the observational noise \(\eta _{j} \sim N(0,\gamma ^{2})\).

The truth section in lines 14–20 generates the true reference trajectory (or truth) vt in line 18 given by (5.2), as well as the observations y in line 19 given by

$$\displaystyle{ y_{j} = v_{j} +\eta _{j}. }$$
(5.3)

Note that the index of y(:,j) corresponds to observation of H*v(:,j+1). This is due to the fact that the first index of an array in MATLAB is j=1, while the initial condition is \(v_{0}\), and the first observation is of \(v_{1}\). So effectively the indices of y are correct as corresponding to the text and equation (5.3), but the indices of v are off by 1. The memory for these vectors is preallocated in line 14. Preallocation is not strictly necessary, because MATLAB would simply allocate the memory dynamically in its absence, but its omission slows down the computations, owing to the need to allocate new memory each time the given array changes size. Commenting this line out allows observation of this effect, which becomes significant when J becomes sufficiently large.
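
The following minimal sketch (not part of p2.m; the names Jbig, a, and b are purely illustrative) shows one way to observe this effect at the command line using tic and toc:

Jbig=1e5;
tic
a=[];% no preallocation: the array grows at every step
for j=1:Jbig
    a(j)=sin(j);
end
toc
tic
b=zeros(Jbig,1);% preallocated once
for j=1:Jbig
    b(j)=sin(j);
end
toc

On most systems the second loop is noticeably faster, and the gap widens as Jbig grows.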

The solution section after line 24 computes the solution, in this case the pointwise representation of the posterior smoothing distribution on the scalar initial condition. The pointwise values of the initial condition are given by the vector v0 (\(v_{0}\)) defined in line 24. There are many ways to construct such vectors; this convention defines the initial (0.01) and final (0.99) values and a uniform step size of 0.0005. It is also possible to use the command v0=linspace(0.01,0.99,1961), defining the number 1961 of intermediate points rather than the step size 0.0005. The corresponding vectors of values of Phidet (\(\mathrm{\varPhi }_{\mathrm{det}}\)), Jdet (\(\mathsf{J}_{\mathrm{det}}\)), and Idet (\(\mathsf{I}_{\mathrm{det}}\)) are computed in lines 32, 29, and 34 for each value of v0, as related by the equation

$$\displaystyle{ \mathsf{I}_{\mathrm{det}}(v_{0};y) = \mathsf{J}_{\mathrm{det}}(v_{0}) + \mathrm{\varPhi }_{\mathrm{det}}(v_{0};y), }$$
(5.4)

where \(\mathsf{J}_{\mathrm{det}}(v_{0})\) is the background penalization and \(\mathrm{\varPhi }_{\mathrm{det}}(v_{0};y)\) is the model–data misfit functional given by (2.29b) and (2.29c) respectively. The function \(\mathsf{I}_{\mathrm{det}}(v_{0};y)\) is the negative log-posterior as given in Theorem 2.11. Having obtained \(\mathsf{I}_{\mathrm{det}}(v_{0};y)\), we calculate \(\mathbb{P}(v_{0}\vert y)\) in lines 37–38, using the formula

$$\displaystyle{ \mathbb{P}(v_{0}\vert y) = \frac{\exp (-\mathsf{I}_{\mathrm{det}}(v_{0};y))} {\int \exp (-\mathsf{I}_{\mathrm{det}}(v_{0};y))\,dv_{0}}. }$$
(5.5)

The trajectory v corresponding to the given value of \(v_{0}\) (v0(j)) is denoted by vv and is recomputed for each new value of v0(j) in lines 29 and 31, since it is required only to compute Idet. The command trapz(v0,exp(-Idet)) in line 37 approximates the denominator of the above by the trapezoidal rule, i.e., the summation

$$\displaystyle{ \mathtt{trapz(v0,exp(-Idet))} =\sum _{ i=1}^{N-1}\mathtt{(v0(i + 1) - v0(i))}{\ast}{\bigl (\mathtt{exp(-Idet(i + 1))} + \mathtt{exp(-Idet(i))}\bigr )}/2. }$$
(5.6)
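
As a quick sanity check (not taken from the text; x, f, s1, and s2 are illustrative names), one can verify at the command line that trapz reproduces this sum:

x=0:0.1:1;
f=exp(-x.^2);
s1=trapz(x,f);
s2=sum((x(2:end)-x(1:end-1)).*(f(2:end)+f(1:end-1))/2);
disp([s1 s2])% the two values agree up to round-off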

The rest of the program deals with plotting our results, and in this instance, it coincides with the output of Figure 2.11b. Again, simple modifications of this program were used to produce Figures 2.10, 2.12, and 2.13. Note that rng(sd) in line 10 allows us to use the same random numbers every time the file is executed; those random numbers are generated with the seed sd, as described in Section 5.1.1. Commenting this line out would result in the creation of new realizations of the random data y, different from those used to obtain Figure 2.11b.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p2.m smoothing problem for the deterministic logistic map (Ex. 1.4)
 3 %% setup
 4
 5 J=1000;% number of steps
 6 r=2;% dynamics determined by r
 7 gamma=0.1;% observational noise variance is gamma^2
 8 C0=0.01;% prior initial condition variance
 9 m0=0.7;% prior initial condition mean
10 sd=1;rng(sd);% choose random number seed
11
12 %% truth
13
14 vt=zeros(J+1,1); y=zeros(J,1);% preallocate space to save time
15 vt(1)=0.1;% truth initial condition
16 for j=1:J
17     % can be replaced by Psi for each problem
18     vt(j+1)=r*vt(j)*(1-vt(j));% create truth
19     y(j)=vt(j+1)+gamma*randn;% create data
20 end
21
22 %% solution
23
24 v0=[0.01:0.0005:0.99];% construct vector of different initial data
25 Phidet=zeros(length(v0),1); Idet=Phidet; Jdet=Phidet;% preallocate space
26 vv=zeros(J,1);% preallocate space
27 % loop through initial conditions vv0, and compute log posterior I0(vv0)
28 for j=1:length(v0)
29     vv(1)=v0(j); Jdet(j)=1/2/C0*(v0(j)-m0)^2;% background penalization
30     for i=1:J
31         vv(i+1)=r*vv(i)*(1-vv(i));
32         Phidet(j)=Phidet(j)+1/2/gamma^2*(y(i)-vv(i+1))^2;% misfit functional
33     end
34     Idet(j)=Phidet(j)+Jdet(j);
35 end
36
37 constant=trapz(v0,exp(-Idet));% approximate normalizing constant
38 P=exp(-Idet)/constant;% normalize posterior distribution
39 prior=normpdf(v0,m0,sqrt(C0));% calculate prior distribution
40
41 figure(1), plot(v0,prior,'k','LineWidth',2)
42 hold on, plot(v0,P,'r--','LineWidth',2), xlabel 'v_0',
43 legend 'prior' 'J=10^3'

5.2 Chapter 3 Programs

The programs p3.m-p7.m, used to generate the figures in Chapter 3, are presented in this section. In particular, various MCMC algorithms used to sample the posterior smoothing distribution are given. Furthermore, optimization algorithms used to obtain solutions of the 4DVAR and w4DVAR variational methods are also introduced. Our general theoretical development of MCMC methods in Section 3.2 employs a notation of u for the state of the chain and w for the proposal. For deterministic dynamics, the state is the initial condition \(v_{0}\); for stochastic dynamics, it is either the signal v or the pair \((v_{0},\xi )\), where ξ is the noise (since this pair determines the signal). Where appropriate, the programs described here use the letter \(v\), and variants thereof, for the state of the Markov chain to keep the connection with the underlying dynamics model.

5.2.1 p3.m

The program p3.m contains an implementation of the random walk Metropolis (RWM) MCMC algorithm. The development follows Section 3.2.3, where the algorithm is used to determine the posterior distribution on the initial condition arising from the deterministic logistic map of Example 2.4 given by (5.2). Note that in this case, since the underlying dynamics are deterministic and hence completely determined by the initial condition, the RWM algorithm will provide samples from a probability distribution on \(\mathbb{R}\).

As in program p2.m, the code is divided into three sections: setup, where parameters are defined; truth, where the truth and data are generated; and solution, where the solution is computed, this time by means of MCMC samples from the posterior smoothing distribution. The parameters in lines 5–10 and the true solution (here taken as only the initial condition, rather than the trajectory it gives rise to) vt in line 14 are taken to be the same as those used to generate Figure 2.13. The temporary vector vv generated in line 19 is the trajectory corresponding to the truth (vv(1)=vt in line 14) and is used to calculate the observations y in line 20. The true value vt will also be used as the initial sample in the Markov chain for this and for all subsequent MCMC programs. This scenario is, of course, impossible in the case that the data is not simulated. However, it is useful when the data is simulated, as it is here, because it can reduce the burn-in time, i.e., the time necessary for the current sample in the chain to reach the target distribution, or the high-probability region of the state space. Because we initialize the Markov chain at the truth, the value of \(\mathsf{I}_{\mathrm{det}}(v^{\dag })\), denoted by the temporary variable Idet, is required to determine the initial acceptance probability, as described below. It is computed in lines 15–23 exactly as in lines 25–34 of program p2.m, as described around equation (5.4).

In the solution section, some additional MCMC parameters are defined. In line 28, the number of samples is set to \(N = 10^{5}\). For the parameters and specific data used here, this is sufficient for convergence of the Markov chain. In line 30, the step-size parameter beta is preset such that the algorithm for this particular posterior distribution has a reasonable acceptance probability, or ratio of accepted to rejected moves. A general rule of thumb is that this should be somewhere around 0.5, to ensure that the chain is neither too correlated because of a high rejection rate (acceptance probability near zero) nor too correlated because of small moves (acceptance probability near one). The vector V defined in line 29 will save all of the samples. This is an example in which preallocation is very important. Try using the commands tic and toc before and after the loop in lines 33–50, respectively, in order to time the chain both with and without preallocation. In line 34, a move is proposed according to the proposal equation (3.15):

$$\displaystyle{w^{(k)} = v^{(k-1)} +\beta \iota ^{(k-1)},}$$

where v represents \(v^{(k-1)}\), the current state of the chain (initially taken to be equal to the true initial condition \(v^{\dag }\)), \(\iota ^{(k-1)}\)=randn is an i.i.d. standard normal, and w represents \(w^{(k)}\). Indices are not used for v and w because they will be overwritten at each iteration.

The temporary variable vv is again used for the trajectory corresponding to \(w^{(k)}\), as a vehicle to compute the value of the proposed \(\mathsf{I}_{\mathrm{det}}(w^{(k)};y)\), denoted in line 42 by Idetprop = Jdetprop + Phidetprop. In lines 44–46, the decision to accept or reject the proposal is made based on the acceptance probability

$$\displaystyle{a(v^{(k-1)},w^{(k)}) = 1 \wedge \exp (\mathsf{I}_{\mathrm{ det}}(v^{(k-1)};y) -\mathsf{I}_{\mathrm{ det}}(w^{(k)};y)).}$$

In practice, this corresponds to drawing a uniform random number rand and replacing v and Idet in line 45 with w and Idetprop if rand<exp(Idet-Idetprop) in line 44. The variable bb is incremented if the proposal is accepted, so that the running ratio of accepted moves bb to total steps n can be computed in line 47. This approximates the average acceptance probability. The current sample \(v^{(k)}\) is stored in line 48. Notice that here one could replace v by V(n-1) in line 34, and by V(n) in line 45, thereby eliminating v and letting w be the only temporary variable. However, the present construction is favorable, because as mentioned above, in general one may not wish to save every sample.

The samples V are used in lines 51–53 to visualize the posterior distribution. In particular, bins of width dx are defined in line 51, and the command hist is used in line 52. The assignment Z = hist(V,v0) means first that the real number line is split into M bins with centers defined according to v0(i) for \(i = 1,\ldots,M\), with the first and last bins corresponding to the negative, respectively positive, half-lines. Second, Z(i) counts the number of k for which V(k) is in the bin with center determined by v0(i).
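
As a small illustration (not taken from the text; centers and samples are illustrative names), the binning behavior of hist with prescribed bin centers can be seen at the command line:

centers=[0.1 0.3 0.5 0.7 0.9];
samples=[0.12 0.14 0.29 0.65 0.66 0.97];
counts=hist(samples,centers)% returns [2 1 0 2 1]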

Again, trapz (5.6) is used to compute the normalizing constant in line 53, directly within the plotting command. The choice of the location of the histogram bins allows for a direct comparison with the posterior distribution calculated from the program p2.m by directly evaluating \(\mathsf{I}_{\mathrm{det}}(v;y)\) defined in (5.4) for different values of the initial condition v. This output is then compared with the corresponding output of p2.m for the same parameters in Figure 3.2.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p3.m MCMC RWM algorithm for logistic map (Ex. 1.4)
 3 %% setup
 4
 5 J=5;% number of steps
 6 r=4;% dynamics determined by r
 7 gamma=0.2;% observational noise variance is gamma^2
 8 C0=0.01;% prior initial condition variance
 9 m0=0.5;% prior initial condition mean
10 sd=10;rng(sd);% choose random number seed
11
12 %% truth
13
14 vt=0.3;vv(1)=vt;% truth initial condition
15 Jdet=1/2/C0*(vt-m0)^2;% background penalization
16 Phidet=0;% initialization model-data misfit functional
17 for j=1:J
18     % can be replaced by Psi for each problem
19     vv(j+1)=r*vv(j)*(1-vv(j));% create truth
20     y(j)=vv(j+1)+gamma*randn;% create data
21     Phidet=Phidet+1/2/gamma^2*(y(j)-vv(j+1))^2;% misfit functional
22 end
23 Idet=Jdet+Phidet;% compute log posterior of the truth
24
25 %% solution
26 % Markov Chain Monte Carlo: N forward steps of the
27 % Markov Chain on R (with truth initial condition)
28 N=1e5;% number of samples
29 V=zeros(N,1);% preallocate space to save time
30 beta=0.05;% step-size of random walker
31 v=vt;% truth initial condition (or else update I0)
32 n=1; bb=0; rat(1)=0;
33 while n <= N
34     w=v+sqrt(2*beta)*randn;% propose sample from random walker
35     vv(1)=w;
36     Jdetprop=1/2/C0*(w-m0)^2;% background penalization
37     Phidetprop=0;
38     for i=1:J
39         vv(i+1)=r*vv(i)*(1-vv(i));
40         Phidetprop=Phidetprop+1/2/gamma^2*(y(i)-vv(i+1))^2;
41     end
42     Idetprop=Jdetprop+Phidetprop;% compute log posterior of the proposal
43
44     if rand<exp(Idet-Idetprop)% accept or reject proposed sample
45         v=w; Idet=Idetprop; bb=bb+1;% update the Markov chain
46     end
47     rat(n)=bb/n;% running rate of acceptance
48     V(n)=v;% store the chain
49     n=n+1;
50 end
51 dx=0.0005; v0=[0.01:dx:0.99];
52 Z=hist(V,v0);% construct the posterior histogram
53 figure(1), plot(v0,Z/trapz(v0,Z),'k','Linewidth',2)% visualize the posterior

5.2.2 p4.m

The program p4.m contains an implementation of the independence dynamics sampler for stochastic dynamics, as introduced in Section 3.2.4. Thus the posterior distribution is on the entire signal \(\{v_{j}\}_{j\in \mathbb{J}}.\) The forward model in this case is from Example 2.3, given by (5.1).

The smoothing distribution \(\mathbb{P}(v\vert Y )\) is therefore over the state space \(\mathbb{R}^{J+1}\).

The sections setup, truth, and solution are defined as for program p3.m, but note that now the smoothing distribution is over the entire path, not just over the initial condition, because we are considering stochastic dynamics. Since the state space is now the path space, rather than the initial condition as it was in program p3.m, the truth vt \(\in \mathbb{R}^{J+1}\) is now a vector. Its initial condition is taken as a draw from \(N(m_{0},C_{0})\) in line 16, and the trajectory is computed in line 20, so that at the end, vt \(\sim \rho _{0}\). As in program p3.m, \(v^{\dag }\) (vt) will be the chosen initial condition in the Markov chain (to ameliorate burn-in issues), and so \(\mathrm{\varPhi }(v^{\dag };y)\) is computed in line 23. Recall from Section 3.2.4 that only \(\mathrm{\varPhi }(\cdot;y)\) is required to compute the acceptance probability in this algorithm.

Notice that the collection of samples V \(\in \mathbb{R}^{N\times (J+1)}\) preallocated in line 30 is substantial in this case, illustrating the memory issue that arises when the dimension of the signal space, and the number of samples, increase.

The current state of the chain \(v^{(k)}\) and the value of \(\mathrm{\varPhi }(v^{(k)};y)\) are again denoted by v and Phi, while the proposal \(w^{(k)}\) and the value of \(\mathrm{\varPhi }(w^{(k)};y)\) are again denoted by w and Phiprop, as in program p3.m. As discussed in Section 3.2.4, the proposal \(w^{(k)}\) is an independent sample from the prior distribution \(\rho _{0}\), similarly to \(v^{\dag }\), and it is constructed in lines 34–39. The acceptance probability used in line 40 is now

$$\displaystyle{ a(v^{(k-1)},w^{(k)}) = 1 \wedge \exp (\mathrm{\varPhi }(v^{(k-1)};y) -\mathrm{\varPhi }(w^{(k)};y)). }$$
(5.7)

The remainder of the program is structurally the same as p3.m. The outputs of this program are used to plot Figures 3.3, 3.4, and 3.5. Note that in the case of Figure 3.5, we have used \(N = 10^{8}\) samples.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p4.m MCMC INDEPENDENCE DYNAMICS SAMPLER algorithm
 3 %%% for sin map (Ex. 1.3) with noise
 4 %% setup
 5
 6 J=10;% number of steps
 7 alpha=2.5;% dynamics determined by alpha
 8 gamma=1;% observational noise variance is gamma^2
 9 sigma=1;% dynamics noise variance is sigma^2
10 C0=1;% prior initial condition variance
11 m0=0;% prior initial condition mean
12 sd=0;rng(sd);% choose random number seed
13
14 %% truth
15
16 vt(1)=m0+sqrt(C0)*randn;% truth initial condition
17 Phi=0;
18
19 for j=1:J
20     vt(j+1)=alpha*sin(vt(j))+sigma*randn;% create truth
21     y(j)=vt(j+1)+gamma*randn;% create data
22     % calculate log likelihood of truth, Phi(v;y) from (1.11)
23     Phi=Phi+1/2/gamma^2*(y(j)-vt(j+1))^2;
24 end
25
26 %% solution
27 % Markov Chain Monte Carlo: N forward steps of the
28 % Markov Chain on R^{J+1} with truth initial condition
29 N=1e5;% number of samples
30 V=zeros(N,J+1);% preallocate space to save time
31 v=vt;% truth initial condition (or else update Phi)
32 n=1; bb=0; rat(1)=0;
33 while n <= N
34     w(1)=sqrt(C0)*randn;% propose sample from the prior
35     Phiprop=0;
36     for j=1:J
37         w(j+1)=alpha*sin(w(j))+sigma*randn;% propose sample from the prior
38         Phiprop=Phiprop+1/2/gamma^2*(y(j)-w(j+1))^2;% compute likelihood
39     end
40     if rand<exp(Phi-Phiprop)% accept or reject proposed sample
41         v=w; Phi=Phiprop; bb=bb+1;% update the Markov chain
42     end
43     rat(n)=bb/n;% running rate of acceptance
44     V(n,:)=v;% store the chain
45     n=n+1;
46 end
47 % plot acceptance ratio and cumulative sample mean
48 figure;plot(rat)
49 figure;plot(cumsum(V(1:N,1))./[1:N]')
50 xlabel('samples N')
51 ylabel('(1/N) \Sigma_{n=1}^N v_0^{(n)}')

5.2.3 p5.m

The independence dynamics sampler of Section 5.2.2 may be very inefficient, since typical random draws from the dynamics may be unlikely to fit the data as well as the current state does, and will then be rejected. The fifth program, p5.m, gives an implementation of the pCN algorithm from Section 3.2.4 that is designed to overcome this issue by including the parameter β, which, if chosen small, allows for incremental steps in signal space and hence the possibility of nonnegligible acceptance probabilities. This program is used to generate Figure 3.6.

This program is almost identical to p4.m, and so only the points at which it differs will be described. First, since the acceptance probability is given by

$$\displaystyle{a(v^{(k-1)},w^{(k)}) = 1 \wedge \exp (\mathrm{\varPhi }(v^{(k-1)};y) -\mathrm{\varPhi }(w^{(k)};y) + G(v^{(k-1)}) - G(w^{(k)})),}$$

the quantity

$$\displaystyle{G(u) =\sum _{ j=0}^{J-1}{\Bigl (\frac{1} {2}\vert \varSigma ^{-\frac{1} {2} }\mathrm{\varPsi }(u_{j})\vert ^{2} -\langle \varSigma ^{-\frac{1} {2} }u_{j+1},\varSigma ^{-\frac{1} {2} }\mathrm{\varPsi }(u_{j})\rangle \Bigr )}}$$

will need to be computed, both for \(v^{(k)}\) (denoted by v in lines 31 and 44), where its value is denoted by G (since the chain is initialized at \(v^{(0)} = v^{\dag }\), the value \(G(v^{\dag })\) is computed in line 22), and for \(w^{(k)}\) (denoted by w in line 36), where its value is denoted by Gprop in line 39.

As discussed in Section 3.2.4, the proposal \(w^{(k)}\) is given by (3.19):

$$\displaystyle{ w^{(k)} = m + (1 -\beta ^{2})^{\frac{1} {2} }(v^{(k-1)} - m) +\beta \iota ^{(k-1)}; }$$
(5.8)

here \(\iota ^{(k-1)} \sim N(0,C)\) are i.i.d. and denoted by iota in line 35. Here C is the covariance of the Gaussian measure \(\pi _{0}\) given in Equation (2.24) corresponding to the case of trivial dynamics \(\mathrm{\varPsi } = 0\), and m is the mean of \(\pi _{0}\). The value of m is given by m in line 33.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p5.m MCMC pCN algorithm for sin map (Ex. 1.3) with noise
 3
 4 %% setup
 5 J=10;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=.1;% dynamics noise variance is sigma^2
 9 C0=1;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=0;rng(sd);% Choose random number seed
12
13 %% truth
14
15 vt(1)=m0+sqrt(C0)*randn;% truth initial condition
16 G=0;Phi=0;
17
18 for j=1:J
19     vt(j+1)=alpha*sin(vt(j))+sigma*randn;% create truth
20     y(j)=vt(j+1)+gamma*randn;% create data
21     % calculate log density from (1.--)
22     G=G+1/2/sigma^2*((alpha*sin(vt(j)))^2-2*vt(j+1)*alpha*sin(vt(j)));
23     % calculate log likelihood phi(u;y) from (1.11)
24     Phi=Phi+1/2/gamma^2*(y(j)-vt(j+1))^2;
25 end
26
27 %% solution
28 % Markov Chain Monte Carlo: N forward steps
29 N=1e5;% number of samples
30 beta=0.02;% step-size of pCN walker
31 v=vt;% truth initial condition (or update G + Phi)
32 V=zeros(N,J+1); n=1; bb=0; rat=0;
33 m=[m0,zeros(1,J)];
34 while n <= N
35     iota=[sqrt(C0)*randn,sigma*randn(1,J)];% Gaussian prior sample
36     w=m+sqrt(1-beta^2)*(v-m)+beta*iota;% propose sample from the pCN walker
37     Gprop=0;Phiprop=0;
38     for j=1:J
39         Gprop=Gprop+1/2/sigma^2*((alpha*sin(w(j)))^2-2*w(j+1)*alpha*sin(w(j)));
40         Phiprop=Phiprop+1/2/gamma^2*(y(j)-w(j+1))^2;
41     end
42
43     if rand<exp(Phi-Phiprop+G-Gprop)% accept or reject proposed sample
44         v=w; Phi=Phiprop; G=Gprop; bb=bb+1;% update the Markov chain
45     end
46     rat(n)=bb/n;% running rate of acceptance
47     V(n,:)=v;% store the chain
48     n=n+1;
49 end
50 % plot acceptance ratio and cumulative sample mean
51 figure;plot(rat)
52 figure;plot(cumsum(V(1:N,1))./[1:N]')
53 xlabel('samples N')
54 ylabel('(1/N) \Sigma_{n=1}^N v_0^{(n)}')

5.2.4 p6.m

The pCN dynamics sampler is now introduced as program p6.m. The independence dynamics sampler of Section 5.2.2 may be viewed as a special case of this algorithm for proposal variance β = 1. This proposal combines the benefits of tuning the step size β while still respecting the prior distribution on the dynamics. It does so by sampling the initial condition and noise \((v_{0},\xi )\) rather than the path itself, in lines 34 and 35, as given by equation (5.8). However, as opposed to the pCN sampler of the previous section, the variable w is now interpreted as a sample of \((v_{0},\xi )\) and is therefore fed into the path vv itself in line 39. The acceptance probability is the same as that of the independence dynamics sampler (5.7), depending only on Phi. If the proposal is accepted, both the forcing u=w and the path v=vv are updated in line 44. Only the path is saved, as in the previous routines, in line 47.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p6.m MCMC pCN Dynamics algorithm for
 3 %%% sin map (Ex. 1.3) with noise
 4 %% setup
 5
 6 J=10;% number of steps
 7 alpha=2.5;% dynamics determined by alpha
 8 gamma=1;% observational noise variance is gamma^2
 9 sigma=1;% dynamics noise variance is sigma^2
10 C0=1;% prior initial condition variance
11 m0=0;% prior initial condition mean
12 sd=0;rng(sd);% Choose random number seed
13
14 %% truth
15
16 vt(1)=m0+sqrt(C0)*randn;% truth initial condition
17 ut(1)=vt(1);
18 Phi=0;
19 for j=1:J
20     ut(j+1)=sigma*randn;
21     vt(j+1)=alpha*sin(vt(j))+ut(j+1);% create truth
22     y(j)=vt(j+1)+gamma*randn;% create data
23     % calculate log likelihood phi(u;y) from (1.11)
24     Phi=Phi+1/2/gamma^2*(y(j)-vt(j+1))^2;
25 end
26
27 %% solution
28 % Markov Chain Monte Carlo: N forward steps
29 N=1e5;% number of samples
30 beta=0.2;% step-size of pCN walker
31 u=ut; v=vt;% truth initial condition (or update Phi)
32 V=zeros(N,J+1); n=1; bb=0; rat=0; m=[m0,zeros(1,J)];
33 while n <= N
34     iota=[sqrt(C0)*randn,sigma*randn(1,J)];% Gaussian prior sample
35     w=m+sqrt(1-beta^2)*(u-m)+beta*iota;% propose sample from the pCN walker
36     vv(1)=w(1);
37     Phiprop=0;
38     for j=1:J
39         vv(j+1)=alpha*sin(vv(j))+w(j+1);% create path
40         Phiprop=Phiprop+1/2/gamma^2*(y(j)-vv(j+1))^2;
41     end
42
43     if rand<exp(Phi-Phiprop)% accept or reject proposed sample
44         u=w; v=vv; Phi=Phiprop; bb=bb+1;% update the Markov chain
45     end
46     rat(n)=bb/n;% running rate of acceptance
47     V(n,:)=v;% store the chain
48     n=n+1;
49 end
50 % plot acceptance ratio and cumulative sample mean
51 figure;plot(rat)
52 figure;plot(cumsum(V(1:N,1))./[1:N]')
53 xlabel('samples N')
54 ylabel('(1/N) \Sigma_{n=1}^N v_0^{(n)}')

5.2.5 p7.m

The next program, p7.m, contains an implementation of the weak constrained variational algorithm w4DVAR discussed in Section 3.3. This program is written as a function, while all previous programs were written as scripts. This choice was made for p7.m so that the MATLAB built-in function fminsearch can be used for optimization in the solution section, while the program remains self-contained. To use this built-in function, it is necessary to define an auxiliary objective function I to be optimized. The function fminsearch can be used within a script, but the auxiliary function would then have to be written separately, so we cannot avoid functions altogether unless we write the optimization algorithm by hand. We avoid the latter in order not to divert the focus of this text from the data-assimilation problem, and algorithms to solve it, to the problem of how to optimize an objective function.

Again the forward model is that given by Example 2.8, namely (5.1). The setup and truth sections are similar to the previous programs, except that G, for example, need not be computed here. The auxiliary objective function I in this case is \(\mathsf{I}(\cdot;y)\) from equation (2.21), given by

$$\displaystyle{ \mathsf{I}(\cdot;y) = \mathsf{J}(\cdot ) +\mathrm{ \varPhi }(\cdot;y), }$$
(5.9)

where

$$\displaystyle{ \mathsf{J}(u):= \frac{1} {2}{\bigl |C_{0}^{-\frac{1} {2} }(u_{0} - m_{0})\bigr |}^{2} +\sum _{ j=0}^{J-1}\frac{1} {2}{\bigl |\varSigma ^{-\frac{1} {2} }{\bigl (u_{j+1} -\mathrm{\varPsi }(u_{j})\bigr )}\bigr |}^{2} }$$
(5.10)

and

$$\displaystyle{ \mathrm{\varPhi }(u;y) =\sum _{ j=0}^{J-1}\frac{1} {2}{\bigl |\varGamma ^{-\frac{1} {2} }{\bigl (y_{j+1} - h(u_{j+1})\bigr )}\bigr |}^{2}. }$$
(5.11)

It is defined in lines 38–45. The auxiliary objective function takes as inputs (u,y,sigma,gamma,alpha,m0,C0,J) and gives output out \(= \mathsf{I}(u;y)\), where \(u \in \mathbb{R}^{J+1}\) (given all the other parameters in its definition; the issue of identifying the input to be optimized over is discussed below).

The initial guess for the optimization algorithm uu is taken as a standard normal random vector over \(\mathbb{R}^{J+1}\) in line 27. In line 24, a standard normal random matrix of size 100 × 100 is drawn and thrown away. This is so that one can easily change the input, e.g., to randn(z) for z \(\in \mathbb{N}\), and induce different random initial vectors uu for the optimization algorithm, while keeping the data fixed by the random-number seed sd set in line 12. The truth vt may be used as the initial guess by uncommenting line 28. In particular, if the output of the minimization procedure is different for different initial conditions, then it is possible that the objective function \(\mathsf{I}(\cdot;y)\) has multiple minima, and hence the posterior distribution \(\mathbb{P}(\cdot \vert y)\) is multimodal. As we have already seen in Figure 3.8, this is certainly true even in the case of scalar deterministic dynamics, when the underlying map gives rise to a chaotic flow.

The MATLAB optimization function fminsearch is called in line 32. The function handle command @(u)I(u, \(\cdots\) ) is used to tell fminsearch that the objective function I is to be considered a function of u, even though it may take other parameter values as well (in this case, y,sigma,gamma,alpha,m0,C0, and J). The outputs of fminsearch are the value vmap such that I(vmap) is minimum, the value fval = I(vmap), and the exit flag, which takes the value 1 if the algorithm has converged. The reader is encouraged to use the help command for more details on this and other MATLAB functions used in the notes. The results of this minimization procedure are plotted in lines 34–35 together with the true value \(v^{\dag }\) as well as the data y. In Figure 3.9, such results are presented, including two minima that were found with different initial conditions.
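
For readers unfamiliar with this pattern, the following toy example (not from the text; f, a, b, and x0 are illustrative names) shows the function-handle mechanism and the three outputs on a simple quadratic:

a=2; b=-3;% parameters held fixed during the optimization
f=@(x,a,b) (x-a)^2+b;% objective depending on extra parameters
x0=0;% initial guess
[xmin,fval,exitflag]=fminsearch(@(x) f(x,a,b),x0)
% xmin is close to 2, fval is close to -3, and exitflag=1 indicates convergence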

 1 function this=p7
 2 clear; set(0,'defaultaxesfontsize',20); format long
 3 %%% p7.m weak 4DVAR for sin map (Ex. 1.3)
 4 %% setup
 5
 6 J=5;% number of steps
 7 alpha=2.5;% dynamics determined by alpha
 8 gamma=1e0;% observational noise variance is gamma^2
 9 sigma=1;% dynamics noise variance is sigma^2
10 C0=1;% prior initial condition variance
11 m0=0;% prior initial condition mean
12 sd=1;rng(sd);% choose random number seed
13
14 %% truth
15
16 vt(1)=sqrt(C0)*randn;% truth initial condition
17 for j=1:J
18     vt(j+1)=alpha*sin(vt(j))+sigma*randn;% create truth
19     y(j)=vt(j+1)+gamma*randn;% create data
20 end
21
22 %% solution
23
24 randn(100);% try uncommenting or changing the argument for different
25            % initial conditions -- if the result is not the same,
26            % there may be multimodality (e.g. 1 & 100).
27 uu=randn(1,J+1);% initial guess
28 % uu=vt;% truth initial guess option
29
30 % solve with blackbox
31 % exitflag=1 ==> convergence
32 [vmap,fval,exitflag]=fminsearch(@(u)I(u,y,sigma,gamma,alpha,m0,C0,J),uu)
33
34 figure;plot([0:J],vmap,'Linewidth',2);hold;plot([0:J],vt,'r','Linewidth',2)
35 plot([1:J],y,'g','Linewidth',2);hold;xlabel('j');legend('MAP','truth','y')
36
37 %% auxiliary objective function definition
38 function out=I(u,y,sigma,gamma,alpha,m0,C0,J)
39
40 Phi=0;JJ=1/2/C0*(u(1)-m0)^2;
41 for j=1:J
42     JJ=JJ+1/2/sigma^2*(u(j+1)-alpha*sin(u(j)))^2;
43     Phi=Phi+1/2/gamma^2*(y(j)-u(j+1))^2;
44 end
45 out=Phi+JJ;

5.3 Chapter 4 Programs

The programs p8.m-p15.m, used to generate the figures in Chapter 4, are presented in this section. Various filtering algorithms used to sample the posterior filtering distribution are given, involving both Gaussian approximation and particle approximation. Since these algorithms are run for very large times (large J), they will be divided into only two sections: setup, in which the parameters are defined, and solution, in which both the truth and observations are generated, and the online assimilation of the current observation into the filter solution is performed. The generation of the truth could be separated into a truth section as in the previous sections, but two loops of length J would then be required, and loops are inefficient in MATLAB, so the present format is preferred. The programs in this section are all very similar, and their output is also similar, giving rise to Figures 4.3–4.12. With the exception of p8.m and p9.m, the forward model is given by Example 2.8 (5.1), and the output, which is of identical format for p10.m through p15.m, is given in Figures 4.5–4.7 and 4.8–4.10. Figures 4.11 and 4.12 compare the filters from the other figures. The program p8.m features a two-dimensional linear forward model, and p9.m features the forward model from Example 2.9 (5.2). At the end of each program, the outputs are used to plot the mean and the covariance as well as the mean-square error of the filter as functions of the iteration number j.

5.3.1 p8.m

The first filtering program is p8.m, which contains an implementation of the Kalman filter applied to Example 2.2,

$$\displaystyle{v_{j+1} = Av_{j}+\xi _{j},\quad \text{with}\quad A = \left (\begin{array}{cc} 0 &1\\ - 1 &0 \end{array} \right ),}$$

and observed data given by

$$\displaystyle{y_{j+1} = Hv_{j+1} +\eta _{j+1}}$$

with H = (1, 0) and Gaussian noise. Thus only the first component of \(v_{j}\) is observed.

The parameters and initial condition are defined in the setup section, lines 3–19. The vectors v, m \(\in \mathbb{R}^{N\times J}\), y \(\in \mathbb{R}^{J}\), and c \(\in \mathbb{R}^{N\times N\times J}\) are preallocated to hold the truth, mean, observations, and covariance over the J observation times defined in line 5. In particular, notice that the true initial condition is drawn from \(N(m_{0},C_{0})\) in line 16, where \(m_{0} = 0\) and \(C_{0} = I\) are defined in lines 10–11. The initial estimate of the distribution is defined in lines 17–18 as \(N(m_{0},C_{0})\), where \(m_{0} \sim N(0,100I)\) and \(C_{0} \leftarrow 100C_{0}\), so that the code may test the ability of the filter to lock onto the true distribution, asymptotically in j, given a poor initial estimate. That is to say, the values of \((m_{0},C_{0})\) are changed such that the initial condition is not drawn from this distribution.

The main solution loop then follows in lines 21–34. The truth v and the data that are being assimilated y are sequentially generated within the loop, in lines 24–25. The filter prediction step, in lines 27–28, consists in computing the predictive mean and covariance \(\widehat{m}_{j}\) and \(\widehat{C}_{j}\) as defined in (4.4) and (4.5) respectively:

$$\displaystyle{\widehat{m}_{j+1} = Am_{j},\quad \widehat{C}_{j+1} = AC_{j}A^{T} +\varSigma.}$$

Notice that indices are not used for the transient variables mhat and chat representing \(\widehat{m}_{j}\) and \(\widehat{C}_{j}\), because they will not be saved from one iteration to the next. In lines 30–33, we implement the analysis formulas for the Kalman filter from Corollary 4.2. In particular, the innovation between the observation of the predicted mean and the actual observation, as introduced in Corollary 4.2, is first computed in line 30,

$$\displaystyle{ d_{j} = y_{j} - H\widehat{m}_{j}. }$$
(5.12)

Again d, which represents \(d_{j}\), does not have any index, for the same reason as above. Next, the Kalman gain defined in Corollary 4.2 is computed in line 31:

$$\displaystyle{ K_{j} =\widehat{ C}_{j}H^{T}(H\widehat{C}_{ j}H^{T}+\varGamma )^{-1}. }$$
(5.13)

Once again, an index j is not used for the transient variable K representing \(K_{j}\). Notice that a "forward slash" / is used to compute B/A = \(BA^{-1}\). This is an internal function of MATLAB that analyzes the matrices B and A to determine an "optimal" method for inversion, given their structure; a brief illustration follows after (5.14). The update given in Corollary 4.2 is completed in lines 32–33 with the equations

$$\displaystyle{ m_{j} =\widehat{ m}_{j} + K_{j}d_{j}\quad \mathrm{and}\quad C_{j} = (I - K_{j}H)\widehat{C}_{j}. }$$
(5.14)
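
As a brief illustration (not part of p8.m; A, B, X1, and X2 are illustrative names), the forward slash gives the same result as multiplying by an explicit inverse, without ever forming inv(A):

A=[4 1;1 3]; B=[1 2;0 1];
X1=B/A;% solves X*A=B via a matrix factorization
X2=B*inv(A);% forms the explicit inverse
disp(norm(X1-X2))% difference is at the level of round-off error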

Finally, in lines 36–50, the outputs of the program are used to plot the mean and the covariance as well as the mean-square error of the filter as functions of the iteration number j, as shown in Figure 4.3.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p8.m Kalman Filter, Ex. 1.2
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 N=2;% dimension of state
 7 I=eye(N);% identity operator
 8 gamma=1;% observational noise variance is gamma^2*I
 9 sigma=1;% dynamics noise variance is sigma^2*I
10 C0=eye(2);% prior initial condition variance
11 m0=[0;0];% prior initial condition mean
12 sd=10;rng(sd);% choose random number seed
13 A=[0 1;-1 0];% dynamics determined by A
14
15 m=zeros(N,J); v=m; y=zeros(J,1); c=zeros(N,N,J);% pre-allocate
16 v(:,1)=m0+sqrtm(C0)*randn(N,1);% initial truth
17 m(:,1)=10*randn(N,1);% initial mean/estimate
18 c(:,:,1)=100*C0;% initial covariance
19 H=[1,0];% observation operator
20
21 %% solution % assimilate!
22
23 for j=1:J
24     v(:,j+1)=A*v(:,j)+sigma*randn(N,1);% truth
25     y(j)=H*v(:,j+1)+gamma*randn;% observation
26
27     mhat=A*m(:,j);% estimator predict
28     chat=A*c(:,:,j)*A'+sigma^2*I;% covariance predict
29
30     d=y(j)-H*mhat;% innovation
31     K=(chat*H')/(H*chat*H'+gamma^2);% Kalman gain
32     m(:,j+1)=mhat+K*d;% estimator update
33     c(:,:,j+1)=(I-K*H)*chat;% covariance update
34 end
35
36 figure;js=21;plot([0:js-1],v(2,1:js));hold;plot([0:js-1],m(2,1:js),'m');
37 plot([0:js-1],m(2,1:js)+reshape(sqrt(c(2,2,1:js)),1,js),'r--');
38 plot([0:js-1],m(2,1:js)-reshape(sqrt(c(2,2,1:js)),1,js),'r--');
39 hold;grid;xlabel('iteration, j');
40 title('Kalman Filter, Ex. 1.2');
41
42 figure;plot([0:J],reshape(c(1,1,:)+c(2,2,:),J+1,1));hold
43 plot([0:J],cumsum(reshape(c(1,1,:)+c(2,2,:),J+1,1))./[1:J+1]','m', ...
44 'Linewidth',2); grid; hold; xlabel('iteration, j'); axis([1 1000 0 50]);
45 title('Kalman Filter Covariance, Ex. 1.2');
46
47 figure;plot([0:J],sum((v-m).^2));hold;
48 plot([0:J],cumsum(sum((v-m).^2))./[1:J+1],'m','Linewidth',2);grid
49 hold;xlabel('iteration, j');axis([1 1000 0 50]);
50 title('Kalman Filter Error, Ex. 1.2')

5.3.2 p9.m

The program p9.m contains an implementation of the 3DVAR method applied to the chaotic logistic map of Example 2.4 (5.2) for r = 4. As in the previous section, the parameters and initial condition are defined in the setup section, lines 3–16. In particular, notice that the truth initial condition v(1) and initial mean m(1) are now initialized in lines 12–13 with a uniform random number using the command rand, so that they are in the interval [0, 1], where the model is well defined. Indeed, the solution will eventually become unbounded if initial conditions are chosen outside this interval. With this in mind, we set the dynamics noise sigma = 0 in line 8, i.e., deterministic dynamics, so that the true dynamics themselves do not become unbounded.

The analysis step of 3DVAR consists in minimizing

$$\displaystyle{\mathsf{I}_{\mathrm{filter}}(v) = \frac{1} {2}\vert \varGamma ^{-\frac{1} {2} }(y_{j+1} - Hv)\vert ^{2} + \frac{1} {2}\vert \widehat{C}^{-\frac{1} {2} }(v -\mathrm{\varPsi }(m_{j}))\vert ^{2}.}$$

In this one-dimensional case, we set \(\varGamma =\gamma ^{2}\) and \(\widehat{C} =\sigma ^{2}\), and define \(\eta =\gamma ^{2}/\sigma ^{2}\), so that \(\widehat{C} =\gamma ^{2}/\eta \) as in line 15. The stabilization parameter η (eta) from Example 4.12 is set in line 14, representing the ratio of the uncertainty in the data to that of the model; equivalently, it measures trust in the model over the observations. The choice η = 0 means that the model is irrelevant in the minimization step (4.12) of 3DVAR, in the observed space; this is the synchronization filter. Since in this example, the signal space and observation space both have dimension equal to 1, the choice η = 0 simply corresponds to using only the data. In contrast, the choice η = ∞ ignores the observations and uses only the model.
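
For concreteness (this computation is implied by lines 15–16 of p9.m rather than stated explicitly in the text), substituting \(\widehat{C} =\gamma ^{2}/\eta \) into the scalar Kalman gain gives

$$\displaystyle{K = \frac{\widehat{C}} {\widehat{C} +\gamma ^{2}} = \frac{\gamma ^{2}/\eta } {\gamma ^{2}/\eta +\gamma ^{2}} = \frac{1} {1+\eta },}$$

so that η = 0 yields K = 1 (the update uses only the data), while η → ∞ yields K → 0 (the update uses only the model prediction).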

The 3DVAR setup gives rise to the constant scalar covariance C and the resultant constant scalar gain K; this should not be confused with the changing \(K_{j}\) in (5.13), temporarily defined by K in line 31 of p8.m. The main solution loop follows in lines 20–33. Up to the different forward model, lines 21–22, 24, 26, and 27 of this program are identical to lines 24–25, 27, 30, and 32 of p8.m, described in Section 5.3.1. The only other difference is that the covariance updates are absent here, because of the constant-covariance assumption underlying the 3DVAR algorithm.

The 3DVAR filter may in principle generate the estimated mean mhat outside [0, 1], because of the noise in the data. In order to flag potential unbounded trajectories of the filter, which in principle could arise because of this, an extra stopping criterion is included in lines 29–32. To illustrate this, try setting sigma ≠ 0 in line 8. Then the signal will eventually become unbounded, regardless of how small the noise variance is chosen. In this case, the estimate will surely blow up while tracking the unbounded signal. Otherwise, if η is chosen appropriately so as to stabilize the filter, it is extremely unlikely that the estimate will ever blow up. Finally, similarly to p8.m, in the last lines of the program we use the outputs of the program in order to produce Figure 4.4, namely plotting the mean and the covariance as well as the mean-square error of the filter as functions of the iteration number j.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p9.m 3DVAR Filter, deterministic logistic map (Ex. 1.4)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 r=4;% dynamics determined by r
 7 gamma=1e-1;% observational noise variance is gamma^2
 8 sigma=0;% dynamics noise variance is sigma^2
 9 sd=10;rng(sd);% choose random number seed
10
11 m=zeros(J,1); v=m; y=m;% pre-allocate
12 v(1)=rand;% initial truth, in [0,1]
13 m(1)=rand;% initial mean/estimate, in [0,1]
14 eta=2e-1;% stabilization coefficient 0 < eta << 1
15 C=gamma^2/eta;H=1;% covariance and observation operator
16 K=(C*H')/(H*C*H'+gamma^2);% Kalman gain
17
18 %% solution % assimilate!
19
20 for j=1:J
21     v(j+1)=r*v(j)*(1-v(j)) + sigma*randn;% truth
22     y(j)=H*v(j+1)+gamma*randn;% observation
23
24     mhat=r*m(j)*(1-m(j));% estimator predict
25
26     d=y(j)-H*mhat;% innovation
27     m(j+1)=mhat+K*d;% estimator update
28
29     if norm(mhat)>1e5
30         disp('blowup!')
31         break
32     end
33 end
34 js=21;% plot truth, mean, standard deviation, observations
35 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
36 plot([0:js-1],m(1:js)+sqrt(C),'r--');plot([1:js-1],y(1:js-1),'kx');
37 plot([0:js-1],m(1:js)-sqrt(C),'r--');hold;grid;xlabel('iteration, j');
38 title('3DVAR Filter, Ex. 1.4')
39
40 figure;plot([0:J],C*[0:J].^0);hold
41 plot([0:J],C*[0:J].^0,'m','Linewidth',2);grid
42 hold;xlabel('iteration, j');title('3DVAR Filter Covariance, Ex. 1.4');
43
44 figure;plot([0:J],(v-m).^2);hold;
45 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
46 hold;xlabel('iteration, j');
47 title('3DVAR Filter Error, Ex. 1.4')

5.3.3 p10.m

A variation of program p9.m is given by p10.m, where the 3DVAR filter is implemented for Example 2.3 given by (5.1). Indeed, the remaining programs of this section will all be for the same example, namely Example 2.3, so this will not be mentioned again. In this case, the initial condition is again taken as a draw from the prior \(N(m_{0},C_{0})\), as in p7.m, and the initial mean estimate is again changed to \(m_{0} \sim N(0,100)\) so that the code may test the ability of the filter to lock onto the signal given a poor initial estimate. Furthermore, for this problem, there is no need to introduce the stopping criterion present in p9.m, since the underlying deterministic dynamics are dissipative. The output of this program is shown in Figure 4.5.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p10.m 3DVAR Filter, sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12
13 m=zeros(J,1); v=m; y=m;% pre-allocate
14 v(1)=m0+sqrt(C0)*randn;% initial truth
15 m(1)=10*randn;% initial mean/estimate
16 eta=2e-1;% stabilization coefficient 0 < eta << 1
17 c=gamma^2/eta;H=1;% covariance and observation operator
18 K=(c*H')/(H*c*H'+gamma^2);% Kalman gain
19
20 %% solution % assimilate!
21
22 for j=1:J
23     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
24     y(j)=H*v(j+1)+gamma*randn;% observation
25
26     mhat=alpha*sin(m(j));% estimator predict
27
28     d=y(j)-H*mhat;% innovation
29     m(j+1)=mhat+K*d;% estimator update
30
31 end
32
33 js=21;% plot truth, mean, standard deviation, observations
34 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
35 plot([0:js-1],m(1:js)+sqrt(c),'r--');plot([1:js-1],y(1:js-1),'kx');
36 plot([0:js-1],m(1:js)-sqrt(c),'r--');hold;grid;xlabel('iteration, j');
37 title('3DVAR Filter, Ex. 1.3')
38
39 figure;plot([0:J],c*[0:J].^0);hold
40 plot([0:J],c*[0:J].^0,'m','Linewidth',2);grid
41 hold;xlabel('iteration, j');
42 title('3DVAR Filter Covariance, Ex. 1.3');
43
44 figure;plot([0:J],(v-m).^2);hold;
45 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
46 hold;xlabel('iteration, j');
47 title('3DVAR Filter Error, Ex. 1.3')

5.3.4 p11.m

The next program is p11.m. This program comprises an implementation of the extended Kalman filter. It is very similar in structure to p8.m, except with a different forward model. Since the dynamics are scalar, the observation operator is defined by setting H to take the value 1 in line 16. The predicting covariance \(\widehat{C}_{j}\) is not independent of the mean, as it is for the linear problem p8.m. Instead, as described in Section 4.2.2, it is determined via the linearization of the forward map around \(m_{j}\), in line 26:

$$\displaystyle{\widehat{C}_{j+1} = \left (\alpha \cos (m_{j})\right )C_{j}\left (\alpha \cos (m_{j})\right ) +\sigma ^{2}.}$$

As in p8.m, we change the prior to a poor initial estimate of the distribution to study whether, and how, the filter locks onto a neighborhood of the true signal, despite poor initialization, for large j. This initialization is in lines 15–16, where \(m_{0} \sim N(0,100)\) and \(C_{0} \leftarrow 10C_{0}\). Subsequent filtering programs use an identical initialization, with the same rationale as in this case. We will not state this again. The output of this program is shown in Figure 4.6.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p11.m Extended Kalman Filter, sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12
13 m=zeros(J,1); v=m; y=m; c=m;% pre-allocate
14 v(1)=m0+sqrt(C0)*randn;% initial truth
15 m(1)=10*randn;% initial mean/estimate
16 c(1)=10*C0;H=1;% initial covariance and observation operator
17
18 %% solution % assimilate!
19
20 for j=1:J
21
22     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
23     y(j)=H*v(j+1)+gamma*randn;% observation
24
25     mhat=alpha*sin(m(j));% estimator predict
26     chat=alpha*cos(m(j))*c(j)*alpha*cos(m(j))+sigma^2;% covariance predict
27
28     d=y(j)-H*mhat;% innovation
29     K=(chat*H')/(H*chat*H'+gamma^2);% Kalman gain
30     m(j+1)=mhat+K*d;% estimator update
31     c(j+1)=(1-K*H)*chat;% covariance update
32
33 end
34
35 js=21;% plot truth, mean, standard deviation, observations
36 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
37 plot([0:js-1],m(1:js)+sqrt(c(1:js)),'r--');plot([1:js-1],y(1:js-1),'kx');
38 plot([0:js-1],m(1:js)-sqrt(c(1:js)),'r--');hold;grid;xlabel('iteration, j');
39 title('ExKF, Ex. 1.3')
40
41 figure;plot([0:J],c);hold
42 plot([0:J],cumsum(c)./[1:J+1]','m','Linewidth',2);grid
43 hold;xlabel('iteration, j');
44 title('ExKF Covariance, Ex. 1.3');
45
46 figure;plot([0:J],(v-m).^2);hold;
47 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
48 hold;xlabel('iteration, j');
49 title('ExKF Error, Ex. 1.3')

5.3.5 p12.m

The program p12.m contains an implementation of the ensemble Kalman filter, with perturbed observations, as described in Section 4.2.3. The structure of this program is again very similar to those of p8.m and p11.m, except that now an ensemble of particles, of size N defined in line 12, is retained as an approximation of the filtering distribution. The ensemble \(\{v^{(n)}\}_{n=1}^{N}\), represented by the matrix U, is then constructed in line 18 out of draws from the Gaussian with the modified mean and covariance of lines 16–17, and the initial mean m(1) is reset to the ensemble sample mean.

In line 27, the predicting ensemble \(\{\widehat{v}_{j}^{(n)}\}_{n=1}^{N}\) represented by the matrix Uhat is computed from a realization of the forward map applied to each ensemble member. This is then used to compute the ensemble sample mean \(\widehat{m}_{j}\) (mhat) and covariance \(\widehat{C}_{j}\) (chat). There is now an ensemble of “innovations” with a new i.i.d.  realization \(y_{j}^{(n)} \sim N(y_{j},\varGamma )\) for each ensemble member, computed in line 31 (not to be confused with the actual innovation as defined in equation (5.12)),

$$\displaystyle{d_{j}^{(n)} = y_{ j}^{(n)} - H\widehat{v}_{ j}^{(n)}.}$$

The Kalman gain \(K_{j}\) (K) is computed using (5.13) in line 32, very much as in p8.m and p11.m, and the ensemble of updates is computed in line 33:

$$\displaystyle{v_{j}^{(n)} =\widehat{ v}_{ j}^{(n)} + K_{ j}d_{j}^{(n)}.}$$

The output of this program is shown in Figure 4.7. Furthermore, long simulations of length \(J = 10^{5}\) were performed for this and the previous two programs, p10.m and p11.m, and their errors are compared in Figure 4.11.
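Although the listing below works with a scalar state, the same analysis step extends directly to vector states and observations. The following is a minimal sketch of such a step, not one of the book's programs; the function name enkf_po_analysis and the argument layout (Uhat a d x N forecast ensemble, y a k x 1 observation, H a k x d observation operator, Gamma the k x k observational covariance) are our own illustrative choices.

function [U,m,C]=enkf_po_analysis(Uhat,y,H,Gamma)
% One analysis step of the EnKF with perturbed observations (sketch).
N=size(Uhat,2);
mhat=mean(Uhat,2);% forecast sample mean
E=Uhat-repmat(mhat,1,N);% centered forecast ensemble
Chat=E*E'/(N-1);% forecast sample covariance
D=repmat(y,1,N)+sqrtm(Gamma)*randn(size(y,1),N)-H*Uhat;% perturbed innovations
K=(Chat*H')/(H*Chat*H'+Gamma);% Kalman gain
U=Uhat+K*D;% updated ensemble
m=mean(U,2);% updated sample mean
C=(U-repmat(m,1,N))*(U-repmat(m,1,N))'/(N-1);% updated sample covariance

With d = k = 1, H = 1, and Gamma equal to gamma^2, this corresponds to the update performed in lines 28–35 of p12.m below.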

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p12.m Ensemble Kalman Filter (PO), sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e5;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12 N=10;% number of ensemble members
13
14 m=zeros(J,1); v=m; y=m; c=m; U=zeros(J,N);% pre-allocate
15 v(1)=m0+sqrt(C0)*randn;% initial truth
16 m(1)=10*randn;% initial mean/estimate
17 c(1)=10*C0;H=1;% initial covariance and observation operator
18 U(1,:)=m(1)+sqrt(c(1))*randn(1,N);m(1)=sum(U(1,:))/N;% initial ensemble
19
20 %% solution % assimilate!
21
22 for j=1:J
23
24     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
25     y(j)=H*v(j+1)+gamma*randn;% observation
26
27     Uhat=alpha*sin(U(j,:))+sigma*randn(1,N);% ensemble predict
28     mhat=sum(Uhat)/N;% estimator predict
29     chat=(Uhat-mhat)*(Uhat-mhat)'/(N-1);% covariance predict
30
31     d=y(j)+gamma*randn(1,N)-H*Uhat;% innovation
32     K=(chat*H')/(H*chat*H'+gamma^2);% Kalman gain
33     U(j+1,:)=Uhat+K*d;% ensemble update
34     m(j+1)=sum(U(j+1,:))/N;% estimator update
35     c(j+1)=(U(j+1,:)-m(j+1))*(U(j+1,:)-m(j+1))'/(N-1);% covariance update
36
37 end
38
39 js=21;% plot truth, mean, standard deviation, observations
40 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
41 plot([0:js-1],m(1:js)+sqrt(c(1:js)),'r--');plot([1:js-1],y(1:js-1),'kx');
42 plot([0:js-1],m(1:js)-sqrt(c(1:js)),'r--');hold;grid;xlabel('iteration, j');
43 title('EnKF, Ex. 1.3')
44
45 figure;plot([0:J],c);hold
46 plot([0:J],cumsum(c)./[1:J+1]','m','Linewidth',2);grid
47 hold;xlabel('iteration, j');
48 title('EnKF Covariance, Ex. 1.3');
49
50 figure;plot([0:J],(v-m).^2);hold;
51 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
52 hold;xlabel('iteration, j');
53 title('EnKF Error, Ex. 1.3')

5.3.6 p13.m

The program p13.m contains a particular square-root implementation of the ensemble Kalman filter, namely the ETKF, described in detail in Section 4.2.4. The program is thus very similar to p12.m for the EnKF with perturbed observations. In particular, the filtering distribution of the state is again approximated by an ensemble of particles. The predicting ensemble \(\{\widehat{v}_{j}^{(n)}\}_{n=1}^{N}\) (Uhat), mean \(\widehat{m}_{j}\) (mhat), and covariance \(\widehat{C}_{j}\) (chat) are computed exactly as in p12.m. However, this time the covariance is kept in the factorized form \(\widehat{X}_{j}\widehat{X}_{j}^{\top } =\widehat{C}_{j}\) in lines 29–30, with the factor denoted by Xhat. The transformation matrix is computed in line 31,

$$\displaystyle{T_{j} = \left (I_{N} +\widehat{ X}_{j}^{\top }H^{\top }\varGamma ^{-1}H\widehat{X}_{ j}\right )^{-\frac{1} {2} },}$$

and \(X_{j} =\widehat{X}_{j}T_{j}\) (X) is computed in line 32, from which the covariance \(C_{j} = X_{j}X_{j}^{\top }\) is reconstructed in line 38. A single innovation \(d_{j}\) is computed in line 34, and a single updated mean \(m_{j}\) is then computed in line 36 using the Kalman gain \(K_{j}\) (5.13) computed in line 35. This is the same as in the Kalman filter and extended Kalman filter (ExKF) of p8.m and p11.m, in contrast to the EnKF with perturbed observations of p12.m. The ensemble is then updated to U in line 37 using the formula

$$\displaystyle{v_{j}^{(n)} = m_{ j} + X_{j}^{(n)}\sqrt{N - 1},}$$

where \(X_{j}^{(n)}\) is the nth column of \(X_{j}\).

Notice that the operator that is factorized and inverted is of dimension N, which in this case is large in comparison to the state and observation dimensions. This is, of course, natural for computing sample statistics, but in the context of the one-dimensional examples considered here, it makes p13.m run far more slowly than p12.m. However, in many applications, the signal state-space dimension is the largest, the observation dimension comes next, and the ensemble size is far smaller than either of these. In this context, the ETKF has become a very popular method, so its relative inefficiency compared with, for example, the EnKF with perturbed observations should not be given too much weight in the overall evaluation of the method. Results illustrating the algorithm are shown in Figure 4.8.
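As a standalone consistency check (not part of p13.m), one can verify numerically that the square-root construction reproduces the standard covariance update \((I - K_{j}H)\widehat{C}_{j}\); the parameter values below simply mirror those in the listing.

N=10;gamma=1;H=1;% values as in the listing below
Xhat=randn(1,N);Xhat=(Xhat-mean(Xhat))/sqrt(N-1);% a centered, scaled ensemble factor
chat=Xhat*Xhat';% predicted covariance
T=sqrtm(inv(eye(N)+Xhat'*H'*H*Xhat/gamma^2));% transformation matrix
X=Xhat*T;% transformed factor
K=(chat*H')/(H*chat*H'+gamma^2);% Kalman gain
disp([X*X' (1-K*H)*chat])% the two numbers agree up to rounding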

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p13.m Ensemble Kalman Filter (ETKF), sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12 N=10;% number of ensemble members
13
14 m=zeros(J,1); v=m; y=m; c=m; U=zeros(J,N);% pre-allocate
15 v(1)=m0+sqrt(C0)*randn;% initial truth
16 m(1)=10*randn;% initial mean/estimate
17 c(1)=10*C0;H=1;% initial covariance and observation operator
18 U(1,:)=m(1)+sqrt(c(1))*randn(1,N);m(1)=sum(U(1,:))/N;% initial ensemble
19
20 %% solution % assimilate!
21
22 for j=1:J
23
24     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
25     y(j)=H*v(j+1)+gamma*randn;% observation
26
27     Uhat=alpha*sin(U(j,:))+sigma*randn(1,N);% ensemble predict
28     mhat=sum(Uhat)/N;% estimator predict
29     Xhat=(Uhat-mhat)/sqrt(N-1);% centered ensemble
30     chat=Xhat*Xhat';% covariance predict
31     T=sqrtm(inv(eye(N)+Xhat'*H'*H*Xhat/gamma^2));% right-hand sqrt transform
32     X=Xhat*T;% transformed centered ensemble
33
34     d=y(j)-H*mhat;randn(1,N);% innovation
35     K=(chat*H')/(H*chat*H'+gamma^2);% Kalman gain
36     m(j+1)=mhat+K*d;% estimator update
37     U(j+1,:)=m(j+1)+X*sqrt(N-1);% ensemble update
38     c(j+1)=X*X';% covariance update
39
40 end
41
42 js=21;% plot truth, mean, standard deviation, observations
43 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
44 plot([0:js-1],m(1:js)+sqrt(c(1:js)),'r--');plot([1:js-1],y(1:js-1),'kx');
45 plot([0:js-1],m(1:js)-sqrt(c(1:js)),'r--');hold;grid;xlabel('iteration, j');
46 title('EnKF(ETKF), Ex. 1.3');
47
48 figure;plot([0:J],(v-m).^2);hold;
49 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
50 plot([0:J],cumsum(c)./[1:J+1]','r--','Linewidth',2);
51 hold;xlabel('iteration, j');
52 title('EnKF(ETKF) Error, Ex. 1.3')

5.3.7 p14.m

The program p14.m is an implementation of the standard SIRS filter from Section 4.3.2. The setup section is almost identical to those of the EnKF methods, because those methods also rely on particle approximations of the filtering distribution. However, the particle filters consistently estimate quite general distributions, while the EnKF is provably accurate only for Gaussian distributions. The truth and data generation and the ensemble prediction in lines 24–27 are the same as in p12.m and p13.m. In the notation of Section 4.3.2, the prediction in line 27 reads \(\widehat{v}_{j+1}^{(n)} \sim \mathbb{P}(\cdot \vert v_{j}^{(n)})\). An ensemble of “innovation” terms \(\{d_{j}^{(n)}\}_{n=1}^{N}\) is again required, but with all terms using the same observation, as computed in line 28. Assuming \(w_{j}^{(n)} = 1/N\), then

$$\displaystyle{\widehat{w}_{j}^{(n)} \propto \mathbb{P}(y_{ j}\vert v_{j}^{(n)}) \propto \exp \left \{-\frac{1} {2}\left \vert d_{j}^{(n)}\right \vert _{\varGamma }^{2}\right \},}$$

where \(d_{j}^{(n)}\) is the innovation of the nth particle, as given in (4.27). The vector of unnormalized weights \(\{\widehat{w}_{j}^{(n)}\}_{n=1}^{N}\) (what) is computed in line 29 and normalized to \(\{w_{j}^{(n)}\}_{n=1}^{N}\) (w) in line 30. Lines 32–39 implement the resampling step. First, the cumulative distribution function of the weights \(W \in [0,1]^{N}\) (ws) is computed in line 32. Notice that W has the properties \(W_{1} = w_{j}^{(1)}\), \(W_{n} \leq W_{n+1}\), and \(W_{N} = 1\). Then N uniform random numbers \(\{u^{(n)}\}_{n=1}^{N}\) are drawn. For each \(u^{(n)}\), let \(n^{\ast}\) be such that \(W_{n^{\ast}-1} \leq u^{(n)} < W_{n^{\ast}}\) (with the convention \(W_{0} = 0\)). This \(n^{\ast}\) (ix) is found in line 34 using the built-in function find, which returns the indices of the nonzero entries of an array; the additional arguments 1 and 'first' restrict the output to the first such index (see the help file): ix = find(ws > rand, 1, 'first'). This corresponds to drawing the \((n^{\ast})\)th element from the discrete measure defined by \(\{w_{j}^{(n)}\}_{n=1}^{N}\). The nth particle \(v_{j}^{(n)}\) (U(j+1,n)) is set equal to \(\widehat{v}_{j}^{(n^{\ast})}\) (Uhat(ix)) in line 37. The sample mean and covariance are then computed in lines 41–42. The rest of the program follows the others, generating the output displayed in Figure 4.9.
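The following standalone sketch, not part of p14.m, illustrates that this inverse-CDF construction does indeed draw index \(n^{\ast}\) with probability \(w^{(n^{\ast})}\); the weights used here are arbitrary illustrative values.

w=[0.5 0.3 0.2];% illustrative normalized weights
ws=cumsum(w);% cdf of the weights
M=1e5;counts=zeros(size(w));% number of trial draws
for k=1:M
    ix=find(ws>rand,1,'first');% first index at which the cdf exceeds u ~ U[0,1]
    counts(ix)=counts(ix)+1;
end
disp(counts/M)% empirical frequencies, approximately equal to w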

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p14.m Particle Filter (SIRS), sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12 N=100;% number of ensemble members
13
14 m=zeros(J,1); v=m; y=m; c=m; U=zeros(J,N);% pre-allocate
15 v(1)=m0+sqrt(C0)*randn;% initial truth
16 m(1)=10*randn;% initial mean/estimate
17 c(1)=10*C0;H=1;% initial covariance and observation operator
18 U(1,:)=m(1)+sqrt(c(1))*randn(1,N);m(1)=sum(U(1,:))/N;% initial ensemble
19
20 %% solution % Assimilate!
21
22 for j=1:J
23
24     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
25     y(j)=H*v(j+1)+gamma*randn;% observation
26
27     Uhat=alpha*sin(U(j,:))+sigma*randn(1,N);% ensemble predict
28     d=y(j)-H*Uhat;% ensemble innovation
29     what=exp(-1/2*(1/gamma^2*d.^2));% weight update
30     w=what/sum(what);% normalize predict weights
31
32     ws=cumsum(w);% resample: compute cdf of weights
33     for n=1:N
34         ix=find(ws>rand,1,'first');% resample: draw rand ~ U[0,1] and
35         % find the index of the particle corresponding to the first time
36         % the cdf of the weights exceeds rand.
37         U(j+1,n)=Uhat(ix);% resample: reset the nth particle to the one
38         % with the given index above
39     end
40
41     m(j+1)=sum(U(j+1,:))/N;% estimator update
42     c(j+1)=(U(j+1,:)-m(j+1))*(U(j+1,:)-m(j+1))'/N;% covariance update
43
44 end
45
46 js=21;% plot truth, mean, standard deviation, observations
47 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
48 plot([0:js-1],m(1:js)+sqrt(c(1:js)),'r--');plot([1:js-1],y(1:js-1),'kx');
49 plot([0:js-1],m(1:js)-sqrt(c(1:js)),'r--');hold;grid;xlabel('iteration, j');
50 title('Particle Filter (Standard), Ex. 1.3');
51
52 figure;plot([0:J],(v-m).^2);hold;
53 plot([0:J],cumsum((v-m).^2)./[1:J+1]','m','Linewidth',2);grid
54 hold;xlabel('iteration, j');title('Particle Filter (Standard) Error, Ex. 1.3')

5.3.8 p15.m

The program p15.m is an implementation of the SIRS(OP) algorithm from Section 4.3.3. The setup section and the truth and observation generation are again the same as in the previous programs. The difference between this program and p14.m arises because the importance-sampling proposal kernel \(Q_{j}\) with density \(\mathbb{P}(v_{j+1}\vert v_{j},y_{j+1})\) is used to propose each \(\widehat{v}_{j+1}^{(n)}\) given each particular \(v_{j}^{(n)}\); in particular, \(Q_{j}\) depends on the next data point, whereas the kernel P used in p14.m has density \(\mathbb{P}(v_{j+1}\vert v_{j})\), which is independent of \(y_{j+1}\).

Observe that if \(v_{j}^{(n)}\) and \(y_{j+1}\) are both fixed, then \(\mathbb{P}\left (v_{j+1}\vert v_{j}^{(n)},y_{j+1}\right )\) is the density of the Gaussian with mean \(m'^{(n)}\) and covariance \(\varSigma '\) given by

$$\displaystyle{m'^{(n)} =\varSigma '\left (\varSigma ^{-1}\mathrm{\varPsi }\left (v_{ j}^{(n)}\right ) + H^{\top }\varGamma ^{-1}y_{ j+1}\right ),\quad \left (\varSigma '\right )^{-1} =\varSigma ^{-1} + H^{\top }\varGamma ^{-1}H.}$$
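To see this, note that by Bayes's theorem,

$$\displaystyle{\mathbb{P}\left (v_{j+1}\vert v_{j}^{(n)},y_{j+1}\right ) \propto \exp \left \{-\frac{1} {2}\left \vert v_{j+1} -\mathrm{\varPsi }\left (v_{j}^{(n)}\right )\right \vert _{\varSigma }^{2} -\frac{1} {2}\left \vert y_{j+1} - Hv_{j+1}\right \vert _{\varGamma }^{2}\right \},}$$

and completing the square in \(v_{j+1}\) identifies the exponent, up to an additive constant independent of \(v_{j+1}\), with \(-\frac{1}{2}\left \vert v_{j+1} - m'^{(n)}\right \vert _{\varSigma '}^{2}\), with \(m'^{(n)}\) and \(\varSigma '\) as above.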

Therefore, \(\varSigma '\) (Sig) and the ensemble of means \(\left \{m'^{(n)}\right \}_{n=1}^{N}\) (the vector em) are computed in lines 27 and 28 and used in line 29 to sample \(\widehat{v}_{j+1}^{(n)} \sim N(m'^{(n)},\varSigma ')\) for each member of \(\left \{\widehat{v}_{j+1}^{(n)}\right \}_{n=1}^{N}\) (Uhat).

Now the weights are updated by (4.34) rather than (4.27); assuming \(w_{j}^{(n)} = 1/N\), this gives

$$\displaystyle{\widehat{w}_{j+1}^{(n)} \propto \mathbb{P}\left (y_{ j+1}\vert v_{j}^{(n)}\right ) \propto \exp \left \{-\frac{1} {2}\left \vert y_{j+1} -\mathrm{\varPsi }\left (v_{j}^{(n)}\right )\right \vert _{\varGamma +\varSigma }^{2}\right \}.}$$

This is computed in lines 31–32, using another auxiliary “innovation” vector d in line 31. Lines 35–45 are again identical to lines 32–42 of program p14.m, performing the resampling step and computing the sample mean and covariance.
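For completeness, we recall why the covariance \(\varGamma +\varSigma \) appears in the exponent: conditional on \(v_{j}^{(n)}\), and with H = 1 as in this example, the observation satisfies

$$\displaystyle{y_{j+1} =\mathrm{\varPsi }\left (v_{j}^{(n)}\right ) +\xi _{j} +\eta _{j+1},\quad \xi _{j} \sim N(0,\varSigma ),\quad \eta _{j+1} \sim N(0,\varGamma ),}$$

where \(\xi _{j}\) is the dynamics noise and \(\eta _{j+1}\) denotes the observational noise, so that \(y_{j+1}\vert v_{j}^{(n)} \sim N\left (\mathrm{\varPsi }\left (v_{j}^{(n)}\right ),\varSigma +\varGamma \right )\); this is the density evaluated, up to normalization, in lines 31–32.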

The output of this program was used to produce Figure 4.10, similarly to the other filtering algorithms. Furthermore, long simulations of length \(J = 10^{5}\) were performed for this and the previous three programs, p12.m, p13.m, and p14.m, and their errors are compared in Figure 4.12, similarly to Figure 4.11, which compares the basic filters p10.m, p11.m, and p12.m.

 1 clear; set(0,'defaultaxesfontsize',20); format long
 2 %%% p15.m Particle Filter (SIRS, OP), sin map (Ex. 1.3)
 3 %% setup
 4
 5 J=1e3;% number of steps
 6 alpha=2.5;% dynamics determined by alpha
 7 gamma=1;% observational noise variance is gamma^2
 8 sigma=3e-1;% dynamics noise variance is sigma^2
 9 C0=9e-2;% prior initial condition variance
10 m0=0;% prior initial condition mean
11 sd=1;rng(sd);% choose random number seed
12 N=100;% number of ensemble members
13
14 m=zeros(J,1); v=m; y=m; c=m; U=zeros(J,N);% pre-allocate
15 v(1)=m0+sqrt(C0)*randn;% initial truth
16 m(1)=10*randn;% initial mean/estimate
17 c(1)=10*C0;H=1;% initial covariance and observation operator
18 U(1,:)=m(1)+sqrt(c(1))*randn(1,N);m(1)=sum(U(1,:))/N;% initial ensemble
19
20 %% solution % Assimilate!
21
22 for j=1:J
23
24     v(j+1)=alpha*sin(v(j)) + sigma*randn;% truth
25     y(j)=H*v(j+1)+gamma*randn;% observation
26
27     Sig=inv(inv(sigma^2)+H'*inv(gamma^2)*H);% optimal proposal covariance
28     em=Sig*(inv(sigma^2)*alpha*sin(U(j,:))+H'*inv(gamma^2)*y(j));% proposal mean
29     Uhat=em+sqrt(Sig)*randn(1,N);% ensemble optimally importance sampled
30
31     d=y(j)-H*alpha*sin(U(j,:));% ensemble innovation
32     what=exp(-1/2/(sigma^2+gamma^2)*d.^2);% weight update
33     w=what/sum(what);% normalize predict weights
34
35     ws=cumsum(w);% resample: compute cdf of weights
36     for n=1:N
37         ix=find(ws>rand,1,'first');% resample: draw rand ~ U[0,1] and
38         % find the index of the particle corresponding to the first time
39         % the cdf of the weights exceeds rand.
40         U(j+1,n)=Uhat(ix);% resample: reset the nth particle to the one
41         % with the given index above
42     end
43
44     m(j+1)=sum(U(j+1,:))/N;% estimator update
45     c(j+1)=(U(j+1,:)-m(j+1))*(U(j+1,:)-m(j+1))'/N;% covariance update
46
47 end
48
49 js=21;% plot truth, mean, standard deviation, observations
50 figure;plot([0:js-1],v(1:js));hold;plot([0:js-1],m(1:js),'m');
51 plot([0:js-1],m(1:js)+sqrt(c(1:js)),'r--');plot([1:js-1],y(1:js-1),'kx');
52 plot([0:js-1],m(1:js)-sqrt(c(1:js)),'r--');hold;grid;xlabel('iteration, j');
53 title('Particle Filter (Optimal), Ex. 1.3');

5.4 ODE Programs

The programs p16.m and p17.m are used to simulate and plot the Lorenz ’63 and ’96 models from Examples 2.6 and 2.7, respectively. These programs are both MATLAB functions, similar to the program p7.m presented in Section 5.2.5. The reason for using functions rather than scripts is that the black-box MATLAB built-in function ode45 is used for the time integration (see the help page for details regarding this function). Each program therefore has an auxiliary function defining the right-hand side of the given ODE, which is passed to ode45 via a function handle, and such subfunctions can be defined only within a function file.
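As a minimal illustration of this pattern (not one of the book's programs), consider the scalar linear ODE \(\dot{u} = -\lambda u\); the illustrative parameter \(\lambda \) is fixed inside the handle, while ode45 supplies the arguments (t,u):

lambda=2;% illustrative parameter value
rhs=@(t,u) -lambda*u;% right-hand side; t is accepted but unused
[t,u]=ode45(rhs,[0 1],1);% integrate on [0,1] from u(0)=1
plot(t,u,'k',t,exp(-lambda*t),'r--')% numerical solution versus the exact solution

The two programs below follow exactly this structure, with the right-hand sides given by the subfunctions lorenz63 and lorenz96.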

5.4.1 p16.m

The first of the ODE programs, p16.m, integrates the Lorenz ’63 model of Example 2.6. The setup section of the program, on lines 4–11, defines the parameters of the model and the initial conditions. In particular, a random Gaussian initial condition is chosen in line 9, and a small perturbation to its first (x) component is introduced in line 10. The trajectories are computed on lines 13–14 using the built-in function ode45. Notice that the auxiliary function lorenz63, defined on line 29, takes as arguments (t, y), prescribed through the definition of the function handle @(t,y), while (α, b, r) are passed as the fixed parameters (a,b,r), defining the particular instance of the function. The argument t allows for nonautonomous ODEs and is spurious here, since the ODE is autonomous and t does not appear on the right-hand side; it is nonetheless included for completeness, since the handle passed to ode45 must accept the pair (t,y), and it causes no harm. The Euclidean norm of the error between the two trajectories is computed in line 16, and the results are plotted, similarly to previous programs, in lines 18–25. This program is used to plot Figs. 2.6 and 2.7.

5.4.2 p17.m

The second of the ODE programs, p17.m, integrates the J=40-dimensional Lorenz ’96 model of Example 2.7. This program is almost identical to the previous one; a small perturbation of the random Gaussian initial condition defined on line 9 is introduced on lines 10–11. The major difference is the function passed to ode45 on lines 14–15, which now defines the right-hand side of the Lorenz ’96 model, given by the subfunction lorenz96 on line 30. Again the system is autonomous, and the spurious t-variable is included for completeness. A few of the 40 degrees of freedom are plotted, along with the error, in lines 19–27. This program is used to plot Figs. 2.8 and 2.9.

 1 function this=p16
 2 clear; set(0,'defaultaxesfontsize',20); format long
 3 %%% p16.m Lorenz '63 (Ex. 2.6)
 4 %% setup
 5
 6 a=10;b=8/3;r=28;% define parameters
 7 sd=1;rng(sd);% choose random number seed
 8
 9 initial=randn(3,1);% choose initial condition
10 initial1=initial + [0.0001;0;0];% choose perturbed initial condition
11
12 %% calculate the trajectories with blackbox
13 [t1,y]=ode45(@(t,y) lorenz63(t,y,a,b,r), [0 100], initial);
14 [t,y1]=ode45(@(t,y) lorenz63(t,y,a,b,r), t1, initial1);
15
16 error=sqrt(sum((y-y1).^2,2));% calculate error
17
18 %% plot results
19
20 figure(1), semilogy(t,error,'k')
21 axis([0 100 10^-6 10^2])
22 set(gca,'YTick',[10^-6 10^-4 10^-2 10^0 10^2])
23
24 figure(2), plot(t,y(:,1),'k')
25 axis([0 100 -20 20])
26
27
28 %% auxiliary dynamics function definition
29 function rhs=lorenz63(t,y,a,b,r)
30
31 rhs(1,1)=a*(y(2)-y(1));
32 rhs(2,1)=-a*y(1)-y(2)-y(1)*y(3);
33 rhs(3,1)=y(1)*y(2)-b*y(3)-b*(r+a);

 1 function this=p17
 2 clear; set(0,'defaultaxesfontsize',20); format long
 3 %%% p17.m Lorenz '96 (Ex. 2.7)
 4 %% setup
 5
 6 J=40;F=8;% define parameters
 7 sd=1;rng(sd);% choose random number seed
 8
 9 initial=randn(J,1);% choose initial condition
10 initial1=initial;
11 initial1(1)=initial(1)+0.0001;% choose perturbed initial condition
12
13 %% calculate the trajectories with blackbox
14 [t1,y]=ode45(@(t,y) lorenz96(t,y,F), [0 100], initial);
15 [t,y1]=ode45(@(t,y) lorenz96(t,y,F), t1, initial1);
16
17 error=sqrt(sum((y-y1).^2,2));% calculate error
18
19 %% plot results
20
21 figure(1), plot(t,y(:,1),'k')
22 figure(2), plot(y(:,1),y(:,J),'k')
23 figure(3), plot(y(:,1),y(:,J-1),'k')
24
25 figure(4), semilogy(t,error,'k')
26 axis([0 100 10^-6 10^2])
27 set(gca,'YTick',[10^-6 10^-4 10^-2 10^0 10^2])
28
29 %% auxiliary dynamics function definition
30 function rhs=lorenz96(t,y,F)
31
32 rhs=[y(end);y(1:end-1)].*([y(2:end);y(1)] - ...
33     [y(end-1:end);y(1:end-2)]) - y + F*y.^0;