1 Introduction

The \(H_{\infty }\) linear quadratic problems were introduced by Basar and Bernhard [1] as a two-player zero-sum game. \(H_{\infty }\) control theory plays an important role in contemporary control theory. There are engineering applications where algebraic Riccati equations with an indefinite quadratic term have to be solved (see [5] and the references therein). In addition, new types of methods and their extensions for solving such equations are studied in [3, 5, 7,8,9]. Recently, linear quadratic differential games and their applications have been widely investigated in the literature. Riccati-type equations are of significant importance in the investigation of \(H_2/H_{\infty }\) optimal control problems. The stochastic \(H_2/H_{\infty }\) control problem with state-, control- and external disturbance-dependent noise is discussed by Zhu and Zhang in [11]. In [11], necessary and sufficient conditions for the existence of the Nash strategy, expressed by means of four coupled stochastic algebraic Riccati equations, are derived under the assumption of mean-square stabilizability of the stochastic systems. A linear quadratic stochastic two-person zero-sum differential game with constant coefficients on an infinite time horizon is considered in [12]. The closed-loop saddle points are characterized by the solvability of an algebraic Riccati equation with a certain stabilizing condition. A model-free policy iteration method for learning the \(H_{\infty }\) control policy from measured system data, without system model information, is proposed in [13]. A least-squares based model-free policy iteration approach using real system data is applied in order to solve the corresponding algebraic Riccati equation. The data-driven \(H_{\infty }\) control problem of nonlinear distributed parameter systems is considered in [14]. A data-driven off-policy learning method is proposed based on the simultaneous policy update algorithm and its convergence is proved.

A new type of Riccati equation is introduced in [15], where a generalized class of continuous-time two-person zero-sum stochastic differential games is studied. Here we consider a more general \(H_{\infty }\) Riccati equation:

$$\begin{aligned} {\mathcal R}(X):= & {} A^\mathrm{T}X + XA + C^\mathrm{T}C + \varPi (X) \nonumber \\&\quad - [X(G \ \ B) + ( S_1^\mathrm{T} \ \ S_2^\mathrm{T})] \,R^{-1}\,[X(G \ \ B) + ( S_1^\mathrm{T} \ \ S_2^\mathrm{T})]^\mathrm{T} \, = \, 0 \end{aligned}$$
(1)

with \( R = \left( \begin{array}{cc} R_1 &{} 0 \\ 0 &{} R_2 \end{array} \right) , \ \ R_1 <0\,, R_2>0\), where \((G \ \ B)\) and \(( S_1^\mathrm{T} \ \ S_2^\mathrm{T})\) are block matrices. Here \(\varPi (X)\) is a positive definite (semidefinite) operator, i.e. \(\varPi (X) > 0\) if \(X>0\) (\(\varPi (X) \ge 0\) if \(X \ge 0\)).

The Riccati equations for the \(H_{\infty }\) control of stochastic systems differ from the \(H_{\infty }\) Riccati equations of deterministic systems because Eq. (1) contains the additional term \(\varPi (X)\), which arises from the state-dependent noise of the system. The term \(\varPi (X)\) complicates the computation of the stabilizing solution of (1). Traditional methods applied to the \(H_{\infty }\) Riccati equations of deterministic systems do not work in the stochastic case. For this reason many researchers have derived and proposed new iterative methods for computing the stabilizing solutions of \(H_{\infty }\) Riccati equations in the stochastic case.

The matrix function \({\mathcal R}(X)\) defined for a symmetric matrix \(X, \, (X=X^\mathrm{T})\) can be rewritten in the following form:

$$\begin{aligned} \begin{array}{cl} {\mathcal R}(X) = A^\mathrm{T}X + XA + C^\mathrm{T}C - (XG+S_1^\mathrm{T})R_1^{-1}(G^\mathrm{T}X+S_1) \\ \ \ \ \ - (XB + S_2^\mathrm{T})\,R_2^{-1}\,(B^\mathrm{T}X+S_2) + \varPi (X)\,. \end{array} \end{aligned}$$

Different algorithms for solving \(H_{\infty }\) control problems have been investigated by Wu and Luo in [10]. The main idea is to introduce two iterative algorithms, each constructing one matrix sequence, for computing the stabilizing solution to the algebraic Riccati equation with an indefinite quadratic part (1). However, no direct proof of the convergence properties of these iterative methods is available. For this reason we consider an iterative process with two matrix sequences and derive its convergence properties. Moreover, we show that the introduced iterative algorithms are equivalent to the iterative process in which two matrix sequences are studied. Based on this equivalence we conclude that the iterative algorithms constructing one matrix sequence are convergent.

In this paper we use the following notation: \(\mathbb {R}^{n\times s}\) stands for the set of \(n\times s\) real matrices. We write \(X> Y\) or \(X\ge Y\) if \(X-Y\) is positive definite or positive semidefinite, respectively, for any two symmetric matrices X and Y. A matrix A is called asymptotically stable (or Hurwitz) if all eigenvalues of A have negative real parts. E(.) denotes the mathematical expectation. A feedback gain matrix K is said to be stabilizing for the linear system \(dx = (Ax+Bu)dt\) if the closed-loop matrix \(A - BK\) is Hurwitz.

Remark 1

Consider the following example [5] of a two-mass spring system with uncertain stiffness. The dynamical stochastic system is

$$\begin{aligned} \left\{ \begin{array}{l} dx(t) = A_0 x(t) dt + G v(t) dt + B u(t) dt + A_1 x(t) dw \\ z(t) = Cx(t) + Du(t) \end{array} \right. , \end{aligned}$$
(2)

where \(x(t) \in \mathbb {R}^{n}\) is the state vector, \(v(t) \in \mathbb {R}^{m_1}\) denotes the control input of the “disturbance player”, \(u(t) \in \mathbb {R}^{m_2}\) denotes the control input of the “controller player”, and w(t) is a one-dimensional Wiener process. The interpretation of the “controller player” and the “disturbance player” can be found in Vrabie [9]. The disturbance player wishes to maximize, while the controller player seeks to minimize, the cost functional:

$$\begin{aligned} J_{\gamma } (u,v) = E \, \int _0^{\infty }\, ( x^\mathrm{T}C^\mathrm{T}Cx + u^\mathrm{T}D^\mathrm{T}Du - \gamma ^{2} v^\mathrm{T}\,v) \, dt \,. \end{aligned}$$
(3)

The corresponding Riccati equation, whose stabilizing solution yields the equilibrium point of the \(H_{\infty }\) control problem defined above, is (1) with \(\varPi (X)= {A_1}^\mathrm{T} X A_1\) and \(A=A_0, S_1=S_2=0\), and \(R_1=-\gamma ^2\,I, R_2=D^\mathrm{T}D\).
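As an illustration (not part of the original formulation in [5]), a minimal MATLAB sketch that evaluates the residual \({\mathcal R}(X)\) of (1) for this special case may look as follows; the names A0, A1, B, G, C, D and gamma are placeholders for the data of the system (2) and are assumed to be given.

  % Illustrative sketch (assumed data): residual R(X) of Eq. (1) for Remark 1,
  % where Pi(X) = A1'*X*A1, A = A0, S1 = S2 = 0, R1 = -gamma^2*I, R2 = D'*D.
  function RX = hinf_residual(X, A0, A1, B, G, C, D, gamma)
    R1  = -gamma^2 * eye(size(G, 2));   % indefinite weight of the disturbance
    R2  = D' * D;                       % positive definite weight of the control
    PiX = A1' * X * A1;                 % state-dependent noise term Pi(X)
    RX  = A0'*X + X*A0 + C'*C + PiX ...
          - (X*G) / R1 * (G'*X) ...     % disturbance (maximizing) quadratic part
          - (X*B) / R2 * (B'*X);        % control (minimizing) quadratic part
  end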

2 Preliminaries

Some lemmas that are instrumental in the construction of the iterative processes are presented in this section. We introduce the following perturbed Lyapunov operator

$$\begin{aligned} \mathcal {L}_{W; \varPi } ( X) = (W)^\mathrm{T}\,X + X\,W + \varPi (X)\,, \end{aligned}$$

and we will present the solvability of (1) through properties of the perturbed Lyapunov operator.

Lemma 1

[4] The following are equivalent:

  1. (i)

    The matrix \(\tilde{X}\) is the stabilizing solution to (1);

  2. (ii)

    The perturbed Lyapunov operator \({\mathcal {L}}_{\tilde{A}; \varPi }\) is asymptotically stable where:

    $$\begin{aligned}\left\{ \begin{array}{l} F_{1, \tilde{X}} = R_1^{-1}(G^\mathrm{T}\,{\tilde{X}}\,+S_1)\,, \ \ \ \ \ \ \ F_{2, \tilde{X}} = - R_2^{-1}\,(B^{T}{\tilde{X}} +S_2)\,, \\ {\tilde{A}} = A - G\,F_{1, \tilde{X}} + B\,F_{2, \tilde{X}} \,. \\ \end{array} \right. \end{aligned}$$

The above lemma gives a deterministic characterization of the stabilizing solution to the Riccati Eq. (1).
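To illustrate how this characterization can be checked numerically, the following hedged MATLAB sketch builds \(\tilde{A}\) from a candidate solution (denoted Xtil) and tests the asymptotic stability of \({\mathcal {L}}_{\tilde{A}; \varPi }\) for the special case \(\varPi (X)=A_1^\mathrm{T}XA_1\), using the standard Kronecker (vectorized) matrix representation of the operator. The workspace names A, A1, B, G, R1, R2, S1, S2 and Xtil are assumptions of the sketch.

  % Illustrative check: build Atil as in Lemma 1 and test the asymptotic
  % stability of L_{Atil;Pi} with Pi(X) = A1'*X*A1 via its matrix
  % representation M, since vec(Atil'*X + X*Atil + A1'*X*A1) = M*vec(X).
  n    = size(A, 1);
  F1   = R1 \ (G'*Xtil + S1);                  % F_{1,Xtil}
  F2   = -(R2 \ (B'*Xtil + S2));               % F_{2,Xtil}
  Atil = A - G*F1 + B*F2;                      % closed-loop matrix
  M = kron(eye(n), Atil') + kron(Atil', eye(n)) + kron(A1', A1');
  isStable = all(real(eig(M)) < 0)             % true if L_{Atil;Pi} is asymptotically stable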

Lemma 2

For the map \({\mathcal R}(X)\) the following identities are valid:

$$\begin{aligned} \mathrm{(i)} \quad {\mathcal R}(X+Z)= & {} {\mathcal R}(X) + (A(X))^\mathrm{T} Z + Z A(X) - ZB\,R_2^{-1}\,B^\mathrm{T}Z \nonumber \\&\quad - Z\,G\,R_1^{-1}G^\mathrm{T}\,Z + \varPi (Z) \,, \nonumber \\ \mathrm{where} \quad A(X)= & {} A - GF_1(X) + BF_2(X)\,, \nonumber \\ F_1(X)= & {} R_1^{-1}(G^\mathrm{T}\,X\,+S_1), \nonumber \\ F_2(X)= & {} - R_2^{-1}\,(B^\mathrm{T}X +S_2)\, \end{aligned}$$
(4)

for any symmetric matrices \(X, Z\);

$$\begin{aligned} \mathrm{(ii)}\quad \ \ {\mathcal R}(W,V,X)= & {} (A(W,V))^\mathrm{T} X + X A(W,V) + C^\mathrm{T}C \nonumber \\&\quad -\,( F_2(X) - V)^\mathrm{T}\,R_2\,(F_2(X)-V) + \varPi (X) \nonumber \\&\quad -\,(F_1(X) - W)^\mathrm{T} R_1 (F_1(X) - W) \nonumber \\&\quad +\,(W-R_1^{-1}S_1)^\mathrm{T}R_1(W-R_1^{-1}S_1) - S_1^\mathrm{T}R_1^{-1}S_1 \nonumber \\&\quad +\,(V+R_2^{-1}S_2)^\mathrm{T}R_2(V+R_2^{-1}S_2) - S_2^\mathrm{T}R_2^{-1}S_2\,, \end{aligned}$$
(5)

with \( A(W,V) = A - GW + BV , \ W\in {\mathbb R}^{m_1\times n}, \ V\in {\mathbb R}^{m_2\times n}\,. \)

Proof

The statements of Lemma 2 are verified by direct manipulations. \(\square \)
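To make the identity (i) concrete, the following short MATLAB sketch checks it numerically for random symmetric matrices X and Z in the special case \(\varPi (X)=A_1^\mathrm{T}XA_1\); the coefficient matrices and the dimension n are assumed to be available in the workspace, and calcR evaluates \({\mathcal R}(\cdot )\) directly from its definition.

  % Numerical check of identity (i) in Lemma 2 for Pi(X) = A1'*X*A1
  % (assumed workspace data: A, A1, B, C, G, R1, R2, S1, S2 and n).
  calcR = @(X) A'*X + X*A + C'*C + A1'*X*A1 ...
          - (X*G + S1')/R1*(G'*X + S1) - (X*B + S2')/R2*(B'*X + S2);
  X = randn(n); X = X + X';                    % random symmetric X
  Z = randn(n); Z = Z + Z';                    % random symmetric Z
  F1 = R1 \ (G'*X + S1);  F2 = -(R2 \ (B'*X + S2));
  AX = A - G*F1 + B*F2;                        % A(X) from (4)
  lhs = calcR(X + Z);
  rhs = calcR(X) + AX'*Z + Z*AX - Z*B/R2*B'*Z - Z*G/R1*G'*Z + A1'*Z*A1;
  norm(lhs - rhs)                              % of the order of round-off error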

Lemma 3

Assume there exist positive definite symmetric matrices \(X,Z, \hat{X}\) with \({\hat{X}} \ge X\,, \, {\mathcal R}({\hat{X}}) \le 0,\) and Z is the stabilizing solution to

$$\begin{aligned} 0 = (A(X))^\mathrm{T} Z + Z A(X) + \varPi ({Z}) + {\mathcal R} ({X}) - ZB\,R_2^{-1}\,B^\mathrm{T}Z\,. \end{aligned}$$
(6)

Then

  1. (i)

    if \(\mathcal {L}_{\bar{A}(X, \hat{X});\varPi }\) is asymptotically stable with \( \bar{A}(X, \hat{X})=A- GF_1(X) + BF_2({\hat{X}}) \) then \(\hat{X} - X - Z \ge 0\);

  2. (ii)

    if \({\hat{X}} - X - Z \ge 0\) then the Lyapunov operator \(\mathcal {L}_{ {\check{A}}({X}+{Z},{\hat{X}});\varPi }\) is asymptotically stable with \( \check{A}({X}+{Z}, \hat{X})=A - GF_1({X}+{Z}) + BF_2({\hat{X}}) \).

Proof

There exists a matrix \(\hat{Q} \ge 0\) for which \({\mathcal R} (\hat{X}) + {\hat{Q}} =0\). We have

$$\begin{aligned} {\mathcal R} (\hat{X}) + {\hat{Q}}= & {} A^\mathrm{T}\, {\hat{X}} + {\hat{X}}\, A + \varPi (\hat{ X}) + C^\mathrm{T}C \\&\quad - ({\hat{X}}G+S_1^\mathrm{T})R_1^{-1}(G^\mathrm{T}{\hat{X}}+S_1) - ({\hat{X}}B + S_2^\mathrm{T})\,R_2^{-1}\,(B^\mathrm{T}{\hat{X}}+S_2) = 0\,. \end{aligned}$$

Since Z is a solution to (6) then

$$\begin{aligned} {\mathcal R} ({X}+{Z}) = -Z\,G\,R_1^{-1}G^\mathrm{T}\,Z\,. \end{aligned}$$

Thus

$$\begin{aligned} {\mathcal R} ({X}+{Z}) = {\mathcal R} (\hat{X}) + {\hat{Q}} - Z\,G\,R_1^{-1}G^\mathrm{T}\,Z\,. \end{aligned}$$

According to (5) we rewrite the last equation \((W= F_1(X) = R_1^{-1}(G^\mathrm{T}\,X + S_1)\,, \ V= F_2(\hat{X}) = - R_2^{-1}(B^\mathrm{T}\,\hat{X} +S_2)\,)\):

$$\begin{aligned} {\mathcal R}(F_1(X),F_2(\hat{X}),{X}+{Z}) = {\mathcal R} (F_1(X),F_2(\hat{X}), \hat{X}) + \hat{Q} - Z\,G\,R_1^{-1}G^\mathrm{T}\,Z \,, \end{aligned}$$

and then by applying some matrix manipulations we obtain the identity (\( \bar{A}(X, \hat{X})= A-GF_1(X) + BF_2(\hat{X})\)):

$$\begin{aligned} 0= & {} (\bar{A}(X, \hat{X}))^\mathrm{T}\, [ {\hat{X}}-X-Z ] + [{\hat{X}}-X-Z]\, \bar{A}(X, \hat{X}) \nonumber \\&\quad +\,\varPi (\hat{X} - {X} - {Z}) + {\hat{Q}} - (\hat{X} - X)\,G\,R_1^{-1}\,G^\mathrm{T}\,( \hat{X} - X )\nonumber \\&\quad +\,\left( F_2( X+Z) - F_2(\hat{X}) \right) ^\mathrm{T}\,R_2\,\left( F_2( X+Z) - F_2(\hat{X}) \right) \,, \end{aligned}$$
(7)

or

$$\begin{aligned} 0 = {\mathcal {L}}_{\bar{A}(X, \hat{X});\varPi } \left( \hat{X} - {X} - {Z} \right) + W\,, \end{aligned}$$

and

$$\begin{aligned} W= & {} {\hat{Q}} - (\hat{X} - X)\,G\,R_1^{-1}\,G^\mathrm{T}\,( \hat{X} - X ) \\&\quad + \left( F_2( X+Z) - F_2(\hat{X}) \right) ^\mathrm{T}\,R_2\,\left( F_2( X+Z) - F_2(\hat{X}) \right) \ \ge \ 0 \,. \end{aligned}$$

Since the operator \(\mathcal {L}_{\bar{A}(X, \hat{X});\varPi }\) is assumed to be asymptotically stable and \(W \ge 0\), it follows that \({\hat{X}} - {X} - {Z} \ge 0\). The statement (i) is proved.

In order to prove the statement (ii) we derive a connection between the matrix coefficients \(\bar{A}(X, \hat{X})\) and \({\check{A}}({X}+{Z},{\hat{X}})\):

$$\begin{aligned} \bar{A}(X, \hat{X}) \pm G\,F_1(X+Z)= & {} {\check{A}}({X}+{Z},{\hat{X}}) - G (F_1(X) - F_1(X+Z)) \\= & {} {\check{A}}({X}+{Z},{\hat{X}}) + \,GR_1^{-1}G^\mathrm{T}\, Z \end{aligned}$$

and using it to transform identity (7) we obtain

$$\begin{aligned} 0= & {} ({\check{A}}({X}+{Z},{\hat{X}}))^\mathrm{T}\, ({\hat{X}}-X-Z) + ( {\hat{X}}-X-Z )\, {\check{A}}({X}+{Z},{\hat{X}}) \\&\quad +\,\varPi (\hat{X} - {X} - {Z}) + {\hat{Q}} \\&\quad +\,[\hat{X} - X-Z]\,B\,R_2^{-1}\,B^\mathrm{T} [ \hat{X} -X-Z ] \\&\quad -\,Z\,GR_1^{-1}G^\mathrm{T}\,Z - ({\hat{X}}-X-Z )\,G R_1^{-1}G^\mathrm{T}\,({\hat{X}}-X-Z )\,. \end{aligned}$$

Thus

$$\begin{aligned} 0= & {} {\mathcal {L}}_{ {\check{A}}({X}+{Z},{\hat{X}});\varPi } \left( \hat{X} - {X} - {Z} \right) + [\hat{X} - X-Z]\,BR_2^{-1}B^\mathrm{T} [ \hat{X} -X-Z] \nonumber \\&\quad +\,{\hat{Q}}- Z\,GR_1^{-1}G^\mathrm{T}\,Z - ({\hat{X}}-X-Z )\,G R_1^{-1}G^\mathrm{T}\,({\hat{X}}-X-Z ) \,. \end{aligned}$$
(8)

Since \({\hat{X}}-X-Z\) is positive definite and \(R_1\) is negative definite, the Lyapunov operator \( \mathcal {L}_{ {\check{A}}({X}+{Z},{\hat{X}}) ;\varPi }\) is asymptotically stable, because the Riccati Eq. (8) has a stabilizing positive semidefinite solution.

The lemma is proved. \(\square \)

3 Some iterative procedures

3.1 An iterative process with two matrix sequences

We present the main theorem, which establishes the properties needed to construct two sequences of positive semidefinite matrices \(\{ {X}^{(k)} \}_{k=0}^{\infty }\) and \(\{ {Z}^{(k)} \}_{k=0}^{\infty }\). These matrix sequences are constructed as follows. We take

$$\begin{aligned} X^{(k+1)}= & {} X^{(k)} + Z^{(k)}, \ \ \ \ \ \text{ with } \ \ X^{(0)} = 0, \\ \nonumber k= & {} 0,1,2\ldots \,. \end{aligned}$$
(9)

Each matrix \({Z}^{(k)}, \ k=0,1,2, \ldots \) is computed as the stabilizing solution of an algebraic Riccati equation with a definite quadratic part:

$$\begin{aligned} {\mathcal G} ({Z}^{(k)}):= & {} (A^{(k)})^\mathrm{T}\, Z^{(k)} + Z^{(k)}\,A^{(k)} + \varPi ({Z}^{(k)}) + {\mathcal R} ({X}^{(k)}) \nonumber \\&\quad - Z^{(k)}\,B\,R_2^{-1}B^\mathrm{T}\, Z^{(k)} \ = \ 0 \,, \end{aligned}$$
(10)

where

$$\begin{aligned} \left\{ \begin{array}{l} F_1(X^{(k)}) = R_1^{-1}\,(G^\mathrm{T}\, X^{(k)}+S_1)\,, \\ F_2(X^{(k)}) = - R_2^{-1}\,(B^\mathrm{T}\,X^{(k)}+S_2)\,, \\ A^{(k)} = A - G F_1(X^{(k)}) + B\,F_2(X^{(k)})\,. \end{array} \right. \end{aligned}$$

We formulate sufficient conditions for the existence of a stabilizing solution to the algebraic Riccati Eq. (1). In fact, the next theorem establishes the convergence properties of the main iterative loop (9), (10) under these sufficient conditions for the existence of a stabilizing solution to (1).

The matrices \( \{ Z^{(k)} \}_{k=0}^{\infty }\) are the stabilizing solutions to the sequence of algebraic Riccati equations (10). We will prove that the sequence \( \{ X^{(k)} \}_{k=0}^{\infty }\) is monotonically non-decreasing and converges to the unique stabilizing solution to Eq. (1). We reformulate the convergence theorem introduced in [5, Theorem 3] and present it as sufficient conditions for the existence of the stabilizing solution to (1).

Theorem 1

Assume there exist symmetric matrices \(\hat{X}\) and \({X}^{(0)}\) such that \({\mathcal R}({X}^{(0)})\ge 0\,, \, {\mathcal R}(\hat{X}) \le 0\) and \({X}^{(0)} \le \hat{X}\). Assume the Lyapunov operator \({\mathcal {L}}_{ A^{(0)};\varPi }\) is asymptotically stable. Then the matrix sequences \(\{{X}^{(k)} \}_{k=0}^{\infty }, \, \{ {Z}^{(k)} \}_{k=0}^{\infty }\) satisfy the following:

  1. (i)

    The Lyapunov operator \( {\mathcal {L}}_{A^{(k)};\varPi } \) is asymptotically stable;

  2. (ii)

    \({\mathcal R}({X}^{(k+1)}) = - \,Z^{(k)}\,G\,R_1^{-1}G^\mathrm{T}\,Z^{(k)}\ge 0\);

  3. (iii)

    the Lyapunov operator \( {\mathcal {L}}_{ \tilde{A}^{(k)};\varPi } \) is asymptotically stable,

    where \(\tilde{A}^{(k)} = A - GF_1(X^{(k)}) + BF_2(X^{(k+1)})\);

  4. (iv)

    \(\hat{X} \ge {X}^{(k+1)} \ge {X}^{(k)} \ge 0\) for \(k=0,1,\ldots \);

  5. (v)

    \(\lim _{k\rightarrow \infty } X^{(k)} = {\tilde{X}}\) is the stabilizing solution to (1) .

Proof

The proof follows the proof of Theorem 3 from [5]. \(\square \)

Moreover, the practical computation of the stabilizing solution \(Z^{(k)}\) of (10) can be carried out by the following simple recursive procedure, similar to those in [2, 6]. We choose \(Y_0=0\) and compute \(Y_s\), \(s=1,2, \ldots \), as the stabilizing solution to the algebraic Riccati equation:

$$\begin{aligned} (A^{(k)})^\mathrm{T}\, Y_s + Y_s\,A^{(k)} - Y_s\,BR_2^{-1}\,B^\mathrm{T}\, Y_s + {Q}_{R,s-1} =0 \,. \end{aligned}$$
(11)

Note that the matrix \({Q}_{R,s-1} = {\mathcal R} ({X}^{(k)}) + \varPi ({Y}_{s-1})\). We apply the MATLAB procedure care for solving (11).
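A condensed MATLAB sketch of the resulting two-sequence procedure, i.e. the outer loop (9) combined with the inner recursion (11) solved by care, is given below for the case \(\varPi (X)=A_1^\mathrm{T}XA_1\). It is an illustration under the stated assumptions rather than the exact implementation used in our experiments; tolR and tol are the stopping tolerances of Sect. 4, and the inner loop is guarded by a simple iteration limit.

  % Sketch of the iterative process (9)-(10), inner recursion (11) via care;
  % Pi(X) = A1'*X*A1 and the coefficient matrices are assumed to be given.
  calcR = @(X) A'*X + X*A + C'*C + A1'*X*A1 ...
          - (X*G + S1')/R1*(G'*X + S1) - (X*B + S2')/R2*(B'*X + S2);
  X = zeros(n);                                % X^(0) = 0
  while norm(calcR(X)) > tolR                  % outer loop (9): Error_R <= tolR
      F1  = R1 \ (G'*X + S1);                  % F_1(X^(k))
      F2  = -(R2 \ (B'*X + S2));               % F_2(X^(k))
      Ak  = A - G*F1 + B*F2;                   % A^(k)
      RXk = calcR(X);                          % R(X^(k))
      Y = zeros(n);                            % Y_0 = 0
      for s = 1:50                             % inner Riccati recursion (11)
          Q = RXk + A1'*Y*A1;                  % Q_{R,s-1} = R(X^(k)) + Pi(Y_{s-1})
          Q = (Q + Q')/2;                      % symmetrize before calling care
          Y = care(Ak, B, Q, R2);              % stabilizing solution of (11)
          Gres = Ak'*Y + Y*Ak + A1'*Y*A1 + RXk - Y*B/R2*B'*Y;   % residual of (10)
          if norm(Gres) <= tol, break; end     % Error_G <= tol
      end
      X = X + Y;                               % X^(k+1) = X^(k) + Z^(k), Eq. (9)
  end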

3.2 An iterative process with one matrix sequence

In this section we consider an alternative iterative process in which one matrix sequence is constructed. This sequence converges to the stabilizing solution of the given Riccati equation. We prove that the introduced iteration is equivalent to the iterative loop (9), (10). We substitute \({\mathcal R} ({X}^{(k)})\) in the recurrence Eq. (10):

$$\begin{aligned} 0= & {} (A^{(k)})^\mathrm{T}\, \left( X^{(k+1)} - X^{(k)} \right) + \left( X^{(k+1)} - X^{(k)} \right) \,A^{(k)} + \varPi ({Z}^{(k)}) \\&\quad +\,A^\mathrm{T}{X}^{(k)} + {X}^{(k)}A + C^\mathrm{T}C - ({X}^{(k)}G+S_1^\mathrm{T})R_1^{-1}\left( G^\mathrm{T}{X}^{(k)}+S_1\right) \\&\quad - \left( {X}^{(k)}B + S_2^\mathrm{T}\right) \,R_2^{-1}\,\left( B^\mathrm{T}{X}^{(k)}+S_2\right) + \varPi ({X}^{(k)}) \\&\quad - \left( X^{(k+1)} - X^{(k)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k+1)} - X^{(k)} \right) \,. \end{aligned}$$

Transforming the above matrix equation we obtain:

$$\begin{aligned} 0= & {} (A^{(k)})^\mathrm{T}\, X^{(k+1)} + X^{(k+1)}\,A^{(k)} + \varPi ({X}^{(k+1)}) \nonumber \\&\quad +\, C^\mathrm{T}C - S_1^\mathrm{T}R_1^{-1}S_1 - S_2^\mathrm{T}R_2^{-1}S_2 + X^{(k)}GR_1^{-1}G^\mathrm{T}X^{(k)} \nonumber \\&\quad +\,X^{(k)}BR_2^{-1}B^\mathrm{T}X^{(k)} \nonumber \\&\quad - \,\left( X^{(k+1)} - X^{(k)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k+1)} - X^{(k)} \right) \,. \end{aligned}$$
(12)

There are two approaches to continue the transformation of the last recurrence equation. The first one is to extract the term \((A - G F_1(X^{(k)}))^\mathrm{T}X^{(k+1)} + X^{(k+1)}\,(A - G F_1(X^{(k)}))\) and the second one is to extract the term \((A + B\,F_2(X^{(k)}))^\mathrm{T}X^{(k+1)} + X^{(k+1)}\,(A + B\,F_2(X^{(k)}))\).

We apply the first approach:

$$\begin{aligned} 0= & {} (A - G F_1(X^{(k)}))^\mathrm{T} X^{(k+1)} + X^{(k+1)}\,(A - G F_1(X^{(k)})) + \varPi ({X}^{(k+1)}) \\&\quad + \,X^{(k+1)}\,B\,F_2(X^{(k)}) + F_2(X^{(k)})^\mathrm{T}B^T X^{(k+1)} \\&\quad + \,C^\mathrm{T}C - S_1^\mathrm{T}R_1^{-1}S_1 - S_2^\mathrm{T}R_2^{-1}S_2 + X^{(k)}GR_1^{-1}G^\mathrm{T}X^{(k)} \\&\quad +\, X^{(k)}BR_2^{-1}B^\mathrm{T}X^{(k)} \\&\quad -\, \left( X^{(k+1)} - X^{(k)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k+1)} - X^{(k)} \right) \,. \end{aligned}$$

After some matrix manipulations we derive

$$\begin{aligned} 0= & {} \left( A - G F_1(X^{(k)})\right) ^\mathrm{T} X^{(k+1)} + X^{(k+1)}\,(A - G F_1(X^{(k)})) \nonumber \\&\quad - \,[X^{(k+1)}B+S_2^\mathrm{T}] R_2^{-1}[ B^\mathrm{T}\,X^{(k+1)}+S_2 ] + \varPi ({X}^{(k+1)}) \nonumber \\&\quad +\, C^\mathrm{T}C - S_1^\mathrm{T}R_1^{-1}S_1 + X^{(k)}GR_1^{-1}G^\mathrm{T}X^{(k)} \,. \end{aligned}$$
(13)

Based on the above matrix manipulations we conclude that the perturbed Riccati Eq. (13) is equivalent to the main iterative process (9), (10). Thus the matrix sequence defined by (13) with \(X^{(0)}=0\) converges to the stabilizing solution to (1). Numerical solvers for the perturbed Riccati Eq. (13) based on the Riccati recurrence equation are investigated in [2, 6]. Following the experience with the iterations considered in [2, 6] we transform the latest recurrence equation into the following form:

$$\begin{aligned} 0= & {} \left( A - GF_1(X^{(k)})\right) ^\mathrm{T}\, X^{(k+1)} + X^{(k+1)}\,\left( A - GF_1(X^{(k)})\right) \nonumber \\&\quad - [X^{(k+1)}B+S_2^\mathrm{T}] R_2^{-1}[ B^\mathrm{T}\,X^{(k+1)}+S_2 ] + {\tilde{Q}}^{(k)} \,, \end{aligned}$$
(14)

where

$$\begin{aligned} {\tilde{Q}}^{(k)} = \varPi ({X}^{(k)}) + C^\mathrm{T}C - S_1^\mathrm{T}R_1^{-1}S_1 + X^{(k)}GR_1^{-1}G^\mathrm{T}X^{(k)}\,. \end{aligned}$$
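A possible MATLAB realization of the Riccati iteration (14), again for \(\varPi (X)=A_1^\mathrm{T}XA_1\) and with calcR as in the previous sketch, is the following illustrative sketch (not the exact code used in Sect. 4):

  % Sketch of the Riccati iteration (14) solved with care at every step;
  % calcR and the coefficient data are as in the previous sketch.
  X = zeros(n);                                % X^(0) = 0
  while norm(calcR(X)) > tolR                  % stop when Error_R <= tolR
      F1 = R1 \ (G'*X + S1);                   % F_1(X^(k))
      Qt = A1'*X*A1 + C'*C - S1'/R1*S1 + X*G/R1*G'*X;   % Qtilde^(k)
      Qt = (Qt + Qt')/2;                       % symmetrize before calling care
      X  = care(A - G*F1, B, Qt, R2, S2');     % stabilizing solution of (14)
  end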

Further on, we apply the second approach to transform (12):

$$\begin{aligned} 0= & {} (A + B\,F_2(X^{(k)}))^\mathrm{T}\, X^{(k+1)} + X^{(k+1)}\,(A + B\,F_2(X^{(k)})) + \varPi ({X}^{(k+1)}) \\&\quad -\,F_1(X^{(k)})^\mathrm{T}G^T X^{(k+1)} - X^{(k+1)}\,G F_1(X^{(k)}) \\&\quad +\,C^\mathrm{T}C - S_1^\mathrm{T}R_1^{-1}S_1 - S_2^\mathrm{T}R_2^{-1}S_2 + X^{(k)}GR_1^{-1}G^\mathrm{T}X^{(k)} \\&\quad +\,X^{(k)}BR_2^{-1}B^\mathrm{T}X^{(k)} \\&\quad -\,\left( X^{(k+1)} - X^{(k)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k+1)} - X^{(k)} \right) \,. \end{aligned}$$

We obtain

$$\begin{aligned} 0= & {} (A + B\,F_2(X^{(k)}))^\mathrm{T}\, X^{(k+1)} + X^{(k+1)}\,(A + B\,F_2(X^{(k)})) \nonumber \\&+\,C^\mathrm{T}C+ X^{(k)}BR_2^{-1}B^\mathrm{T}X^{(k)} - S_2^\mathrm{T}R_2^{-1}S_2 + \varPi ({X}^{(k+1)}) \nonumber \\&- \,(G^\mathrm{T} X^{(k+1)} +S_1)^\mathrm{T}R_1^{-1}(G^\mathrm{T} X^{(k+1)} +S_1) \nonumber \\&+\,(X^{(k+1)} -X^{(k)})GR_1^{-1}G^\mathrm{T}(X^{(k+1)} -X^{(k)}) \nonumber \\&-\,\left( X^{(k+1)} - X^{(k)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k+1)} - X^{(k)} \right) \,. \end{aligned}$$
(15)

Thus, the recurrence Eq. (15) is equivalent to the main iterative process (9), (10). In order to construct the matrix sequence via Eq. (15) we apply the following practical realization of (15):

$$\begin{aligned} \begin{array}{lcl} &{}&{} \mathrm{1.\,We\,take} \ X^{(0)} =0 \ ; \\ &{}&{} \mathrm{2.\, Compute} \ X^{(1)} \ \mathrm{as\, the\,stabilizing\, solution\, to} \\ &{}&{} \mathrm{\, the\,Lyapunov\, equation:} \\ 0&{}=&{} A^\mathrm{T}\, X^{(1)} + X^{(1)}\,A + \varPi ({X}^{(0)}) + C^\mathrm{T}\,C - S_2^\mathrm{T}R_2^{-1}S_2 - S_1^\mathrm{T}R_1^{-1}S_1 \\ &{}&{} \mathrm{3.\, Compute} \ X^{(2)},X^{(3)}, \ldots \mathrm{as\, the\,stabilizing\, solution\, to} \\ &{}&{} \mathrm{\, the \,Lyapunov\, equation:} \\ 0&{}=&{} (A + B\,F_2(X^{(k)}))^\mathrm{T}\, X^{(k+1)} + X^{(k+1)}\,(A + B\,F_2(X^{(k)})) + \varPi ({X}^{(k)}) \\ &{}&{} +\,C^\mathrm{T}C+ X^{(k)}BR_2^{-1}B^\mathrm{T}X^{(k)} - (G^\mathrm{T} X^{(k)} +S_1)^\mathrm{T}R_1^{-1}(G^\mathrm{T} X^{(k)} +S_1) \\ &{}&{} +\,(X^{(k)} - X^{(k-1)})GR_1^{-1}G^\mathrm{T}(X^{(k)} -X^{(k-1)}) - S_2^\mathrm{T}R_2^{-1}S_2 \\ &{}&{} -\,\left( X^{(k)} - X^{(k-1)} \right) \,BR_2^{-1}B^\mathrm{T}\,\left( X^{(k)} - X^{(k-1)} \right) \,, \\ &{}&{} \ \ k=1,2, \ldots \end{array} \end{aligned}$$
(16)
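For comparison, a hedged MATLAB sketch of the Lyapunov iteration (16), under the same assumptions as in the previous sketches, reads:

  % Sketch of the Lyapunov iteration (16) solved with lyap; note that
  % lyap(Ac', Qc) returns the solution of Ac'*X + X*Ac + Qc = 0.
  Xold = zeros(n);                                     % X^(0) = 0
  X = lyap(A', C'*C - S1'/R1*S1 - S2'/R2*S2);          % step 2: X^(1) (Pi(X^(0)) = 0)
  while norm(calcR(X)) > tolR                          % step 3: X^(2), X^(3), ...
      F2 = -(R2 \ (B'*X + S2));                        % F_2(X^(k))
      dX = X - Xold;                                   % X^(k) - X^(k-1)
      Qc = A1'*X*A1 + C'*C + X*B/R2*B'*X - S2'/R2*S2 ...
           - (G'*X + S1)'/R1*(G'*X + S1) ...
           + dX*G/R1*G'*dX - dX*B/R2*B'*dX;
      Xold = X;
      X = lyap((A + B*F2)', (Qc + Qc')/2);             % X^(k+1) from (16)
  end

Each step of this sketch requires only one Lyapunov solve, which explains the lower cost per iteration reported in Sect. 4.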

In this section we have considered two iterative methods for computing the stabilizing solution to (1):

  • Riccati iteration (14);

  • New Lyapunov iteration (16).

One matrix sequence is constructed in both methods (14) and (16).

4 Numerical experiments

In this paper we have studied three iterative methods for solving the algebraic Riccati equation (1) with an indefinite quadratic term, arising in stochastic control. Here we provide some numerical experiments with these iterative methods. The first one is the iterative process (9), (10). We construct two matrix sequences \(\{ {X}^{(k)} \} \) and \(\{{Z}^{(k)} \} \ , k=0,1, \ldots \) for each example. The matrix sequence \(\{ {X}^{(k)} \} \) is computed using the external loop of the iterative process (9), (10). In order to find the stabilizing solution \({Z}^{(k)}\) to (10) we apply the Riccati iteration (11). The other two methods are described by the iterative procedures (14) and (16). These methods construct one matrix sequence, which converges to the stabilizing solution of (1). The results from the numerical experiments are compared.

In our experiments we solve the Riccati recurrence equations (11) and (14) with the MATLAB procedure care, which requires \(81\,n^3\) flops per iteration. The MATLAB procedure lyap is applied for solving (16) and requires \(\frac{27}{2}n^3\) flops per iteration.

We consider a family of examples of Riccati Eq. (1) with \(\varPi (X) = {A_1}^\mathrm{T}XA_1 + {A_2}^\mathrm{T}XA_2\) in two tests. The real coefficient matrices were constructed using the following MATLAB notation:

$$\begin{aligned} \begin{array}{lcl} \mathrm{Test \ 1}&{}:&{}A=abs(randn(n,n))/6-6.5*eye(n,n); \\ &{}&{} n=14,..., 17; \\ &{}&{} R_1=-[0.35 \ 0 \ 0;0 \ 0.32 \ 0; 0 \ 0 \ 0.375]; \ \ \ \ \ S_1=randn(3,n)/4; \\ &{}&{} R_2=[8.7 \ 2 \ 0;2 \ 7 \ 0; 0 \ 0 \ \ 11.5]; \ \ \ \ \ S_2=randn(3,n)/4; \\ &{}&{} G=randn(n,3)/5; \ \ \ \ B=randn(n,3)/5; \ \ \ Q = 0.3*eye(n,n); \\ &{}&{} A_1=randn(n,n)/2; \ \ \ \ \ A_2=randn(n,n)/5; \\ &{}&{} \\ \mathrm{Test \ 2}&{}:&{}A=(randn(n,n))/24-9*eye(n,n); \\ &{}&{} n=26,...,32; \\ &{}&{} R_1=-[0.35 \ 0 \ 0;0 \ 0.32 \ 0; 0 \ 0 \ 0.375]; \ \ \ \ \ S_1=randn(3,n)/4; \\ &{}&{} R_2=[8.7 \ 2 \ 0;2 \ 7 \ 0; 0 \ 0 \ \ 11.5]; \ \ \ \ \ S_2=randn(3,n)/4; \\ &{}&{} G=randn(n,3)/5; \ \ \ \ B=randn(n,3)/5; \ \ \ \ Q = 0.002*eye(n,n); \\ &{}&{} A_1=randn(n,n)/2; \ \ \ \ A_2=randn(n,n)/5; \\ \end{array} \end{aligned}$$

In our definitions the function randn(p,k) returns a p-by-k matrix of pseudorandom values drawn from the standard normal distribution (for more information see the MATLAB documentation).

Our experiments are executed in MATLAB on a 2.20GHz Intel(R) Core(TM) i7-4702MQ CPU computer. We use two small positive numbers, tolR and tol, to control the accuracy of the computations; in our experiments \(tolR= 1e-7\) and \(tol = 1e-5\). We denote \(Error_{\mathcal {R},k} = \ \left\| \mathcal {R}\, ( {\mathbf X}^{(k)} ) \right\| \) and \(Error_{\mathcal {G},k} = \ \left\| \mathcal {G}\, ( {\mathbf Z}^{(k)} ) \right\| \). Iteration (9) stops when the inequality \(Error_{\mathcal {R}, k_0} \le tolR\) is satisfied for some \(k_0\), and iteration (10) stops when the inequality \(Error_{\mathcal {G}, s_0} \le tol\) is satisfied for some \(s_0\). These inequalities are the practical stopping criteria for (9) and (11), (14) and (16).

Table 1 Results for the example

Fig. 1 Comparison for CPU time

We have executed one hundred examples for each value of n. Table 1 shows the values of the variables “\(It_M\)” and “\(It_S\)”. The variable “\(It_M\)” is the largest number of iterations needed by the main iterative process (9) over all runs, i.e. \(It_M = \max _{p=1,\ldots ,100} {\tilde{k}}_p,\) where \(Error_{\mathcal {R}, {\tilde{k}}_p} \le tolR\). The variable “\(It_S\)” stands for the largest (over all runs) average number of iterations executed in the inner loop by iterations (11), (14) and (16) until the main iteration is finished, i.e. \(It_S = \max _{p=1,\ldots ,100} {s_p \over {\tilde{k}}_p},\) where \(Error_{\mathcal {G}, s_p} \le tol\).

Iterations (14) and (16) construct one matrix sequence, which converges to the stabilizing solution. Results from the experiments with (14) and (16) are presented in the same table. These iterations stop when the inequality \(Error_{\mathcal {R}, k_0} \le tolR\) holds. We display the biggest number of iterations “\(It_b\)” and the average number of iterations avIt for each size over all one hundred runs. In addition, the column “CPU” presents the CPU time for executing the corresponding iterations over all one hundred runs.

The results described in the table show that all of the introduced iterative methods are effective for computing the stabilizing solution of the Riccati equations arising in \(H_{\infty }\) stochastic control problems.

5 Conclusion

The novelty of this paper lies in the construction and numerical analysis of new iterative algorithms that produce a convergent matrix sequence. We have studied two iterative processes for finding the stabilizing solution to the generalized Riccati equation (1). In the proposed algorithms, we transform the given two matrix sequences \(\{ {X}^{(k)} \} \) and \(\{{Z}^{(k)} \}\) into a single matrix sequence. We have performed numerical experiments for computing this solution and have compared the results with regard to the number of iterations and the CPU time. Our numerical experiments confirm the effectiveness of the proposed new methods (14) and (16). The proposed algorithms compute the stabilizing solution in less computational time with no loss of accuracy. Moreover, the proposed algorithms use less computer memory, which can be considered an additional benefit.