On Class of Linear Quadratic Non-cooperative Differential Games with Continuous Updating

Kuchkarov, Ildus; Petrosian, Ovanes

doi:10.1007/978-3-030-22629-9_45

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11548))

Included in the following conference series:

International Conference on Mathematical Optimization Theory and Operations Research

1003 Accesses
12 Citations

Abstract

The subject of this paper is a linear quadratic case of a differential game model with continuous updating. This class of differential games is essentially new, there it is assumed that at each time instant, players have or use information about the game structure defined on a closed time interval with a fixed duration. As time goes on, information about the game structure updates. Under the information about the game structure we understand information about motion equations and payoff functions of players. A linear quadratic case for this class of games is particularly important for practical problems arising in the engineering of human-machine interaction. The notion of Nash equilibrium as an optimality principle is defined and the explicit form of Nash equilibrium for the linear quadratic case is presented. Also, the case of dynamic updating for the linear quadratic differential game is studied and uniform convergence of Nash equilibrium strategies and corresponding trajectory for a case of continuous updating and dynamic updating is demonstrated.

Research was supported by a grant from the Russian Science Foundation (Project No 18-71-00081).

Access provided by Autonomous University of Puebla. Download conference paper PDF

Hamilton-Jacobi-Bellman Equations for Non-cooperative Differential Games with Continuous Updating

Open-Loop Based Strategies for Autonomous Linear Quadratic Game Models with Continuous Updating

On the Numerical Solution of Differential Games for Neutral-Type Linear Systems

Article 01 July 2018

Keywords

1 Introduction

Most of the real-life conflict-driven processes evolve continuously in time, and their participants continuously receive updated information and adapt. Main models considered in the classical differential game theory are associated with problems defined on a fixed time interval (players have all the information on a closed time interval) [6], problems defined on an infinite time interval with discounting (players have all the information specified on an infinite time interval) [1], problems defined on a random time interval (players have information on a given time interval, but the duration of this interval is a random variable) [14]. One of the first works in the theory of differential games is devoted to the differential pursuit game (the player’s gain depends on the time of capture of the opponent) [12]. In all the above models and approaches it is assumed that at the beginning of the game players know all information about the game dynamics (equations of motion) and about player’s preferences (cost functions). However, these approaches do not take into account the fact that in many real conflict-controlled processes, players at the initial time instant do not have all information about the game. Therefore classical approaches for defining in some sense optimal strategies (for example, Nash equilibrium), such as Hamilton-Jacobi-Bellman equation [2] or the Pontryagin maximum principle [13], cannot be directly used to construct a large range of real game-theoretic models.

In this paper, we apply the approach of continuous updating to a special class of dynamic games, where the environment can be modeled by a set of linear differential equations and the objectives can be modeled by the functions containing affine and quadratic terms. The popularity of the so-called linear quadratic differential games [4] on one hand can be explained by practical applications in engineering. To some extent, this kind of differential games is analytically and numerically solvable. On the other hand, this linear quadratic problem setting naturally appears if the agents’ objective is to minimize the effect of a small perturbation of their nonlinear optimally controlled environment. By solving a linear quadratic control problem, and using the optimal actions implied by this problem, players can avoid most of the additional cost incurred by this perturbation.

Most of the real conflict-driven processes are continuously evolving over time, and their participants constantly adapt. This paper presents the approach of constructing Nash equilibrium for game models with continuous updating. In the game models with continuous updating, it is assumed that players

have information about motion equations and payoff functions only on $[t, t + \overline{T}]$, where $\overline{T}$ – information horizon, t – current time instant.
receive updated information about motion equations and payoff functions as time $t \in [t_0, +\infty )$ evolves.

In the general form, it is supposed that motion equations and payoff functions explicitly depend on the time parameter. Therefore, in the general form of the differential game with continuous updating information about motion equations and payoff functions updates, because its form changes as the current time $t \in [t_0, +\infty )$ evolves. In this paper, we consider a particular class of linear quadratic differential games with continuous updating, where motion equations and payoff functions do not explicitly depend on time parameter t, but the meaning of the updating procedure is not missed, because the main goal of modeling of behavior of players with continuous updating is reached.

Obviously, it is difficult to obtain Nash equilibrium due to the lack of fundamental approaches to control problems with moving information horizon. Classical methods such as dynamic programming and Hamilton-Jacobi-Bellman equation do not allow to directly construct Nash equilibrium in problems with moving information horizon.

In the framework of dynamic updating approach the following papers were published [5, 7,8,9,10,11, 15]. Their authors laid a foundation for further study of a class of games with dynamic updating. It is assumed that the information about motion equations and payoff functions is updated in discrete time instants and interval on which players know the information is defined by the value of the information horizon. However, the class of games with continuous updating provides with the new theoretical results.

For the linear quadratic game models with continuous updating Nash equilibrium in closed-loop form are constructed and it is proved that Nash equilibrium in the corresponding linear quadratic game with dynamic updating uniformly converges to the constructed controls. This approach allows concluding that the constructed control indeed is optimal in the game model with continuous updating, i.e. in the case when the length of updating interval converges to zero. The similar procedure is performed for the corresponding trajectory.

The paper is structured as follows. In Sect. 2, a description of the initial differential game model and corresponding game model with continuous updating as well as the concept of a strategy for it are presented. In Sect. 3, the concept of Nash equilibrium is adapted for a class of games with continuous updating and the explicit form of it for a class of linear quadratic differential games is presented. In Sect. 4, the description of the game model with dynamic updating and the form of Nash equilibrium with continuous updating is presented. In Sect. 5, the convergence of Nash equilibrium strategies and corresponding trajectories for a case of dynamic and continuous updating is demonstrated. The illustrative model example and corresponding numerical simulation are presented in Sect. 6. Demonstration of convergence result is as well presented in the numerical simulation part. In Sect. 7, the conclusion is drawn.

2 Game Model

In this section description of the initial linear quadratic game model and corresponding game model with continuous updating are presented.

2.1 Initial Linear Quadratic Game Model

Consider n-player ($|N| = n$) linear quadratic differential game $\varGamma (x_0, T - t_0)$ defined on the interval $[t_0, T]$:

Motion equations have the form

$$\begin{aligned} \begin{array}{l} \dot{x}(t) = A x(t) + B_1 u_1(t,x) + \ldots + B_n u_n(t,x),\\ x(t_0)=x_0, \\ x \in \mathbb {R}^l, \ u = (u_1, \ldots , u_n), \ u_i=u_i(t,x) \in U_i \subset \text {comp} \mathbb {R}^k, \ t \in [t_0, T]. \end{array}\end{aligned}$$

(1)

Payoff function of player $i \in N$ is defined as

$$\begin{aligned} \begin{aligned} K_i(x_0, t_0, T; u) = \int \limits _{t_0}^{T} \left( x'(t) Q_i x(t) + \sum \limits _{j=1}^{n} u'_{j}(t,x) R_{ij} u_{j}(t,x) \right) d t, \ i \in N, \end{aligned} \end{aligned}$$

(2)

where $Q_i$, $R_{ij}$ are assumed to be symmetric, $R_{ii}$ is positive defined, $({}\cdot {})'$ means transpose here and hereafter.

2.2 Linear Quadratic Game Model with Continuous Updating

Consider n-player differential game $\varGamma (x, t, \overline{T})$, $t \in [t_0, +\infty )$ defined on the interval $[t, t + \overline{T}]$, where $0< \overline{T}< +\infty $.

Motion equations of $\varGamma (x, t, \overline{T})$ have the form

$$\begin{aligned} \begin{array}{l} \dot{x}^{t}(s) = A x^{t}(s) + B_1 u_1^{t}(s,x^{t}) + \ldots + B_n u_n^{t}(s,x^{t}),\\ x^{t}(t)=x, \\ x^{t} \in \mathbb {R}^l, \ u^{t} = (u^{t}_1, \ldots , u^{t}_n), \ u^{t}_i=u^{t}_i(s,x^{t}) \in U_i \subset \text {comp} \mathbb {R}^k, \ t \in [t_0, +\infty ). \end{array}\end{aligned}$$

(3)

Payoff function of player $i \in N$ in the game $\varGamma (x, t, \overline{T})$ is defined as

$$\begin{aligned} \begin{aligned} K^t_i(x^{t}, t, \overline{T}; u^{t}) = \int \limits _{t}^{t+\overline{T}} \left( \left( x^{t}(s) \right) ' Q_i x^{t}(s) + \sum \limits _{j=1}^{n} \left( u^{t}_j (s,x^{t}) \right) ' R_{i j} u_{j}^{t}(s,x^{t}) \right) d s, \end{aligned} \end{aligned}$$

(4)

where $x^{t}(s)$, $u^{t}(s,x)$ are trajectory and strategies in the game $\varGamma (x, t, \overline{T})$.

Differential game with continuous updating evolves according to the rule:

Time parameter $t \in [t_0, +\infty )$ evolves continuously, as a result players continuously receive updated information about motion equations and payoff functions under $\varGamma (x, t, \overline{T})$.

Strategies u(t, x) in the game model with continuous updating are defined in the following way:

$$\begin{aligned} u(t,x) = u^t(t,x), \ t \in [t_0, +\infty ), \end{aligned}$$

(5)

where $u^t(s,x)$, $s \in [t, t + \overline{T}]$ are some fixed strategies defined in the subgame $\varGamma (x, t, \overline{T})$.

State x(t) in the model with continuous updating is defined according to

$$\begin{aligned} \begin{array}{l} \dot{x}(t)=A x(t) + B_1 u_1(t,x) + \ldots + B_n u_n(t,x), \\ x(t_0)=x_0, \\ x \in \mathbb {R}^l \end{array} \end{aligned}$$

(6)

with strategies with continuous updating u(t, x) involved.

Essential difference between the game model with continuous updating and classic differential game $\varGamma (x_0, T - t_0)$ with prescribed duration is that players in the initial game are guided by the payoffs that they will eventually receive on the interval $[t_0, T]$, but in the case of a game with continuous updating, at the time instant t they orient themselves on the expected payoffs (4), which are calculated using information about the game structure defined on the interval $[t, t + \overline{T}]$.

3 Nash Equilibrium with Continuous Updating in LQ Differential Games

3.1 Concept of Nash Equilibrium for Games with Continuous Updating

Within the framework of continuously updated information in this class of differential games it is interesting to understand of how to model the behavior of players. To do this, we use the concept of Nash equilibrium in feedback strategies. However, for the class of differential games with continuous updating, we would like $u^{NE}(t,x) = (u^{NE}_1(t,x), \ldots , u^{NE}_n(t,x))$ for each fixed $t \in [t_0, + \infty )$ to coincide with the feedback Nash equilibrium in the game (6), (4) defined on the interval $[t, t + \overline{T}]$ at the instant t.

Consider two time intervals $[t, t+\overline{T}]$ and $[t + \epsilon , t + \overline{T} + \epsilon ]$, $\epsilon<< \overline{T}$. According to the problem statement, $u ^ {NE} (t, x)$ at the instant t should coincide with the Nash equilibrium in the game defined on the interval $[t, t + \overline{T}] $ and $ u ^ {NE} (t + \epsilon , x) $ at instant $ t + \epsilon $ should coincide with the Nash equilibrium in the game defined on the interval $[t + \epsilon , t + \epsilon + \overline{T}]$. Therefore direct application of classical approaches for determining Nash equilibrium in feedback strategies is not possible.

In order to construct such a strategy profile, we define the concept of generalized Nash equilibrium in feedback strategies as an optimality principle:

$$\begin{aligned} \widetilde{u}^{NE}(t,s,x) = (\widetilde{u}^{NE}_1(t,s,x), \ldots , \widetilde{u}^{NE}_n(t,s,x)), \ t \in [t_0, T], \ s \in [t, t + \overline{T}], \end{aligned}$$

(7)

which we further use to construct desired strategy profile $u^{NE}(t,x)$.

Definition 1

Strategy profile $\widetilde{u}^{NE}(t,s,x) = $ $(\widetilde{u}^{NE}_1(t,s,x), \ldots , \widetilde{u}^{NE}_n(t,s,x))$, $t \in [t_0, + \infty )$, $s \in [t, t + \overline{T}]$ is a generalized Nash equilibrium in the game with continuous updating, if for any fixed $t \in [t_0, +\infty )$ strategy profile $\widetilde{u}^{NE}(t,s,x)$ is Nash equilibrium in feedback strategies in the game $\varGamma (x, t, \overline{T})$, $0< \overline{T}< \infty $.

Using generalized feedback Nash equilibrium it is possible to define solution concept for a game model with continuous updating.

Definition 2

Strategy profile $u^{NE}(t, x)$ is called the Nash equilibrium with continuous updating, if it is defined in the following way:

$$\begin{aligned} u^{NE}(t,x) = \widetilde{u}^{NE}(t,s,x) |_{s = t} = (\widetilde{u}^{NE}_1(t,s,x) |_{s = t}, \ldots , \widetilde{u}^{NE}_n(t,s,x) |_{s = t}), \end{aligned}$$

(8)

where $t \in [t_0, + \infty ),$ $\widetilde{u}^{NE}(t,s,x)$ is the generalized feedback Nash equilibrium defined above.

Strategy profile $u^{NE}(t, x)$ will be used as a solution concept in the game with continuous updating.

3.2 Theorem on Nash Equilibrium with Continuous Updating for LQ Differential Games

Here we present the explicit form of Nash equilibrium with continuous updating for a two-player differential game.

Theorem 1

The two-player linear quadratic differential game $\varGamma (x_0, t_0, \overline{T})$ with continuous updating has, for every initial state, a linear feedback Nash equilibrium, if and only if the following set of coupled Riccati differential equations has a set of symmetric solutions $K_1$, $K_2$ on the interval [0, 1]:

$$\begin{aligned} \dot{K}_i(\tau )&= - (A \overline{T}- S_j K_j(\tau ))' K_i(\tau ) - K_i(\tau ) (A \overline{T}- S_j K_j(\tau )) \nonumber \\&\quad +\, K_i(\tau ) S_i K_i(\tau ) -Q_i - K_j(\tau ) S_{ji} K_j(\tau ), \nonumber \\&\qquad \qquad \qquad \qquad \quad K_i(1) = 0, \ i \ne j \in N, \end{aligned}$$

(9)

where

$$\begin{aligned} S_i = \overline{T}^2 B_i R_{ii}^{-1} B'_i, \quad S_{ij} = \overline{T}^2 B_i R_{ii}^{-1} R_{ji} R_{ii}^{-1} B'_i, \ i \ne j \in N. \end{aligned}$$

(10)

In this case there is a unique feedback Nash equilibrium with continuous updating, which has the form:

$$\begin{aligned} u^{NE}_i(t,x) = - R_{ii}^{-1} B'_{i} K_i(0) \overline{T}x, \ i \in N. \end{aligned}$$

(11)

Proof

In order to prove the Theorem we introduce the following change of variables

$$\begin{aligned} \begin{aligned} s&= t + \overline{T}\tau , \\ y(\tau )&= x(t+\overline{T}\tau ), \\ v_i(\tau ,y)&= u_i(t+\overline{T}\tau ,x), \ i \in N. \end{aligned} \end{aligned}$$

(12)

By substituting (12) to the motion equations (3), payoff function (4) we obtain

$$\begin{aligned} \begin{array}{l} \dot{y}(\tau ) = \overline{T} A y(s) + \overline{T} B_1 v_1(\tau ,y) + \overline{T} B_2 v_2(\tau ,y) \end{array}\end{aligned}$$

(13)

and

(14)

It is known [4] that the criterion for existence of feedback Nash equilibrium is the existence of symmetric solution for the system of differential Eq. (9). According to [4] feedback Nash equilibrium strategies have the form

$$\begin{aligned} v^{NE}_i(\tau , y) = - R_{ii}^{-1} B'_{i} K_i(\tau ) \overline{T}y. \end{aligned}$$

(15)

From (12) we have

$$\tau = \frac{s-t}{\overline{T}},$$

returning to original variables we obtain the following strategies

$$ u^t_i(s, x) = - R_{ii}^{-1} B'_{i} K_i\left( \frac{s-t}{\overline{T}} \right) \overline{T}x. $$

These strategies are Nash equilibrium in feedback strategies in the subgame $\varGamma (x, t, \overline{T})$ by construction.

Task (13), (14) and solution (15) have the same form for all values t in original game with continuous updating. Then a generalized Nash equilibrium in the game with continuous updating has the form

$$\begin{aligned} \widetilde{u}^{NE}_i(t,s,x) = - R_{ii}^{-1} B'_{i} K_i\left( \frac{s-t}{\overline{T}} \right) \overline{T}x. \end{aligned}$$

(16)

Apply the procedure (8) to determine Nash equilibrium with continuous updating using generalized Nash equilibrium (16), $s=t$:

$$\begin{aligned} u^{NE}_i(t,x) = - R_{ii}^{-1} B'_{i} K_i(0) \overline{T}x, \ t \in [t_0, + \infty ), \ i \in N. \end{aligned}$$

(17)

This proves the theorem.

4 LQ Differential Game with Dynamic Updating

In this section, we define a game model with dynamic updating in order to later demonstrate the convergence of Nash equilibrium strategies and corresponding trajectories for a case of dynamic and continuous updating.

4.1 LQ Game Model with Dynamic Updating

In papers [5, 7,8,9,10,11, 16] the method for constructing differential game model with dynamic updating is described. There it is assumed that players have information about the game structure only over a truncated interval and, based on this, make decisions. In order to model the behavior of players in the case, when information updates dynamically, consider the case when information is updated every $\varDelta {t} > 0$ and the behavior of players on each segment $[t_0+j\varDelta {t},t_0+(j+1)\varDelta {t}]$, $j = 0, 1, 2, \ldots $ is modeled using the notion of truncated subgame:

Definition 3

Let $j = 0, 1, 2, \ldots $. Truncated subgame $\bar{\varGamma }_j (x^j_0, t_0 + j\varDelta t, t_0 + j\varDelta t + \overline{T})$ is the game defined on the interval $[t_0 + j \varDelta t, t_0 + j \varDelta t + \overline{T}]$ in the following way. On the interval $[t_0 + j \varDelta t, t_0 + j \varDelta t + \overline{T}]$ payoff function, motion equation in the truncated subgame and initial game model $\varGamma (x_ {0}, T - t_0)$ coincide:

$$\begin{aligned} \begin{array}{l} \dot{x}^{j}(s) = A x^{j}(s) + B_1 u^{j}_{1}(s,x^{j}) + \ldots + B_n u^{j}_{n}(s,x^{j}),\\ x^{j}(t_0 + j \varDelta t)=x^{j}_{0}, \\ x^{j} \in \mathbb {R}^n, \ u^j = (u^{j}_{1}, \ldots , u^{j}_{n}), \ u^{j}_{i}=u^{j}_{i}(s,x^{j}) \in U_i \subset \text {comp} \mathbb {R}^k, \ t \in [t_0, +\infty ). \end{array}\end{aligned}$$

(18)

$$\begin{aligned} \begin{aligned} K^{j}_{i}(x^{j}, t_0 + j \varDelta {t}, t_0 + j \varDelta {t} + \overline{T}; u^j)&= \int \limits _{t_0 + j \varDelta {t}}^{t_0 + j \varDelta {t}+\overline{T}} \left( x^{j}(s) \right) ' Q_i x^{j}(s) \\&\quad + \sum \limits _{k=1}^{n} { \left( {u^k} (s,x^{j}) \right) ' } R_{i k} u^{k}(s,x^{j}) d s, \ i \in N, \end{aligned} \end{aligned}$$

(19)

At any instant $t=t_0 + j \varDelta {t}$ information about the game structure updates, and therefore players adapt to it. This class of game models is called differential games with dynamic updating.

As a solution concept in the differential game model with dynamic updating we will use feedback Nash equilibrium. In the same way as in Sect. 3 we will need to define a special form of it. According to the approach described above, at any time instant $t \in [t_0, + \infty )$, players have or use truncated information about the game structure $\varGamma (x_0, T - t_0)$, therefore classical approaches for determining optimal strategies (cooperative and noncooperative) cannot be directly applied. In order to determine the solution for games with dynamic updating, the notion of resulting feedback Nash equilibrium is introduced:

Definition 4

Resulting feedback Nash equilibrium

$$\hat{u}^{NE}(t,x) = (\hat{u}_1^{NE}(t,x), \ldots , \hat{u}_n^{NE}(t,x))$$

of players in the game model with dynamic updating have the form:

$$\begin{aligned} \{ \hat{u}^{NE}(t,x) \}^{\infty }_{t = t_0} = {\left\{ \begin{array}{ll} u_{0}^{NE}(t,x), \quad t \in [t_0, t_0 + \varDelta t], \\ \cdots \\ u_{j}^{NE}(t,x), \quad t \in (t_0 + j\varDelta t, t_0 + (j+1)\varDelta t], \\ \cdots \end{array}\right. } \end{aligned}$$

(20)

where $u^{NE}_{j}(t,x) = (u^{j,NE}_1(t,x), \ldots , u^{j,NE}_n(t,x))$ is some fixed feedback Nash equilibrium in the truncated subgame $\bar{\varGamma }_j(x^{j,NE}_{0}, t_0 + j \varDelta t, t_0 + j \varDelta t + \overline{T})$, $j = 0, 1, 2, \ldots $ starting along the equilibrium trajectory of the previous truncated subgame: $x^{j, NE}_{0} = x^{j-1, NE}(t_0 + j \varDelta t)$.

Trajectory obtained by using motion equation (1) and the resulting feedback Nash equilibrium $\hat{u}^{NE}(t,x)=(\hat{u}^{NE}_1(t,x),\ldots ,\hat{u}^{NE}_n(t,x))$ we denote by $\hat{x}^{NE}(t)$ and call the resulting equilibrium trajectory.

4.2 Resulting Feedback Nash Equilibrium with Dynamic Updating

Firstly, consider Nash Equilibrium in truncated subgame $\bar{\varGamma }_j (x^j_0, t_0 + j\varDelta t, t_0 + j\varDelta t + \overline{T})$.

Theorem 2

The two-player linear quadratic differential game $\bar{\varGamma }_j (x^j_0, t_0 + j\varDelta t, t_0 + j\varDelta t + \overline{T})$ has, for every initial state, a linear feedback Nash equilibrium if and only if the following set of coupled Riccati differential equations has a set of symmetric solutions $K_1$, $K_2$ on the interval [0, 1]:

$$\begin{aligned} \dot{K}_i(\tau )&= - (A \overline{T}- S_j K_j(\tau ))' K_i(\tau ) - K_i(\tau ) (A \overline{T}- S_j K_j(\tau )) \nonumber \\&\quad +\,K_i(\tau ) S_i K_i(\tau ) -Q_i - K_j(\tau ) S_{ji} K_j(\tau ),\nonumber \\&\qquad \qquad \qquad \qquad \qquad \quad \, K_i(1) = 0, \ i \ne j, \end{aligned}$$

(21)

where

$$\begin{aligned} S_i = \overline{T}^2 B_i R_{ii}^{-1} B'_i, \quad S_{ij} = \overline{T}^2 B_i R_{ii}^{-1} R_{ji} R_{ii}^{-1} B'_i, \ i \ne j \in N. \end{aligned}$$

(22)

In that case there is a unique equilibrium. The equilibrium strategies are

$$\begin{aligned} u^{j,NE}_{i}(t,x) = - R_{ii}^{-1} B'_{i} K_i \left( \frac{t-(t_0+j \varDelta t)}{\overline{T}} \right) \overline{T}x. \end{aligned}$$

(23)

Proof

To prove this theorem we use similar change of variables as in (12) for each truncated subgame:

$$\begin{aligned} \tau = \frac{t-(t_0+j \varDelta t)}{\overline{T}}. \end{aligned}$$

(24)

According to (20) Nash equilibrium for the game model with dynamic updating ${\hat{u}}^{NE}_{i}(t,x)$ can be constructed using the Nash equilibrium defined in each truncated subgame $u^{j, NE}_{i}(t,x)$. Corresponding trajectory $\hat{x}^{NE}(t)$ is constructed using ${\hat{u}}^{NE}_{i}(t,x)$ and (1).

5 Convergence of Resulting Nash Equilibrium Strategies and Trajectory

Theorem 3

For $\varDelta {t}\rightarrow {0}$ and $x \in X$ (X—limited set) resulting feedback Nash equilibrium strategies $\hat{u}^{NE}_i(t,x)$ in the game with dynamic updating uniformly converge to feedback Nash equilibrium with continuous updating $\widetilde{u}_i^{NE}(t,x)$:

$$\begin{aligned} \hat{u}^{NE}_i(t,x) \underset{[t_0,+ \infty )}{\rightrightarrows } \widetilde{u}^{NE}_i(t,x), \ i \in N. \end{aligned}$$

(25)

Proof

Introduce the notation: $t_j {\mathop {=}\limits ^{\text {def}}} t_0+j \varDelta t $ and let $t \in [t_j, t_{j+1}]$ for some j. According to the definition of $\hat{u}^{NE}(t,x)$ (20) we will need to show that $\Vert \widetilde{u}^{NE}_i(t,x) - u^{j,NE}_{i}(t,x) \Vert \rightarrow 0$, when $\varDelta t \rightarrow 0$.

Consider the expressions for $\widetilde{u}^{NE}_i$ and $u^{j,NE}_{i}:$

$$\begin{aligned} \widetilde{u}^{NE}_i(t,x)= & {} - R_{ii}^{-1} B'_{i} K_i(0) \overline{T}x, \\ u^{j,NE}_{i}(t,x)= & {} - R_{ii}^{-1} B'_{i} K_i \left( \frac{t-t_j}{\overline{T}} \right) \overline{T}x. \end{aligned}$$

From Taylor decomposition for K(t) at the point $t=0$ we obtain:

$$\begin{aligned} \Vert \widetilde{u}^{NE}_i(t,x) - u^{j,NE}_{i}(t,x) \Vert \le \Vert R_{ii}^{-1} B'_{i} \Vert \Vert x\Vert \left( \left\| \dot{K}(0) \right\| \frac{\varDelta t}{\overline{T}} + o(\varDelta t) \right) . \end{aligned}$$

(26)

When $\varDelta t \rightarrow 0$ the right hand side of (26) converges to zero and as a result the left hand side of (26) also converges to zero. This completes the proof.

Theorem 4

Equilibrium trajectory in the game with dynamic updating $\hat{x}^{NE}(t)$ pointwise converges to the equilibrium trajectory $\widetilde{x}^{NE}(t)$ in the game with continuous updating $\widetilde{x}^{NE}(t)$ for $\varDelta {t}\rightarrow {0}$:

$$\begin{aligned} \hat{x}^{NE}(t) \underset{[t_0,+ \infty )}{\rightarrow } \widetilde{x}^{NE}(t). \end{aligned}$$

(27)

Proof

Let $t \in [t_j, t_{j+1}]$ for some j. According to the definition of $\hat{x}^{NE}(t)$ we will need to show that $\Vert \widetilde{x}^{NE}(t) - x^{NE}_{j}(t) \Vert \rightarrow 0$ when $\varDelta t \rightarrow 0$.

Trajectories $\widetilde{x}^{NE}(t)$ and $x^{NE}_{j}(t)$ satisfy the differential equations respectively

$$\begin{aligned} \dot{\widetilde{x}}(t)= & {} \left( A - B_1 R_{11}^{-1} B_{1}' K_1(0) \overline{T}- B_2 R_{22}^{-1} B_{2}' K_2(0) \overline{T}\right) \widetilde{x}(t), \\ \dot{x}_{j}(t)= & {} \left( A - B_1 R_{11}^{-1} B_{1}' K_1\left( \frac{t-t_j}{\overline{T}} \right) \overline{T}- B_2 R_{22}^{-1} B_{2}' K_2\left( \frac{t-t_j}{\overline{T}} \right) \overline{T}\right) x_{j}(t). \end{aligned}$$

Notice that

$$ \begin{aligned} K_i(0) \widetilde{x} - K_i\left( \frac{t-t_j}{\overline{T}} \right) x_j = K_i(0) (\widetilde{x} - x_j) + \left( K_i(0) - K_i \left( \frac{t-t_j}{\overline{T}} \right) \right) x_j. \end{aligned} $$

Let $y_j^{NE}(t) = \widetilde{x}^{NE}(t) - x^{NE}_{j}(t) $, $ \widetilde{A} = A - B_1 R_{11}^{-1} B_{1}' K_1(0) \overline{T}- B_2 R_{22}^{-1} B_{2}' K_2(0) \overline{T}$ and

$$ \begin{aligned} f_j(t) = -&B_1 R_{11}^{-1} B_{1}' \left[ K_1(0) - K_1 \left( \frac{t-t_j}{\overline{T}} \right) \right] \overline{T}x_j(t) \\&-B_2 R_{22}^{-1} B_{2}' \left[ K_2(0) - K_2 \left( \frac{t-t_j}{\overline{T}} \right) \right] \overline{T}x_j(t). \end{aligned} $$

Then $y_j^{NE}(t)$ satisfies following differential equation

$$ \dot{y}_j(t) = \widetilde{A} y_j(t) + f_j(t). $$

Consider

$$\begin{aligned} y(t) = {\left\{ \begin{array}{ll} y_{0}(t), \quad t \in [t_0, t_0 + \varDelta t], \\ \cdots \\ y_{j}(t), \quad t \in (t_0 + j\varDelta t, t_0 + (j+1)\varDelta t], \\ \cdots \end{array}\right. } \end{aligned}$$

(28)

and

$$\begin{aligned} f(t) = {\left\{ \begin{array}{ll} f_{0}(t), \quad t \in [t_0, t_0 + \varDelta t], \\ \cdots \\ f_{j}(t), \quad t \in (t_0 + j\varDelta t, t_0 + (j+1)\varDelta t], \\ \cdots \end{array}\right. } \end{aligned}$$

then (28) satisfies following differential equation

$$ \dot{y}(t) = \widetilde{A} y(t) + f(t). $$

with initial state $y(t_0) = 0,$ since $\hat{x}^{NE}(t_0)=\widetilde{x}^{NE}(t)=x_0.$

By the Cauchy formula we have for any $t\ge t_0$

$$ y(t) = \int \limits _{t_0}^{t} e^{\widetilde{A}(t-s)} f(s) d s. $$

Taking this into account we have for fixed t

$$\begin{aligned} \lim \limits _{\varDelta t \rightarrow 0} \Vert y(t_j)\Vert \le \lim \limits _{\varDelta t \rightarrow 0} \Vert e^{\widetilde{A}(t-t_0)} \Vert (t-t_0) \beta \left( \frac{\varDelta t}{\overline{T}} + o(\varDelta t) \right) = 0, \end{aligned}$$

(29)

where

$$ \beta = \left( \Vert B_1 R_{11}^{-1} B_{1}'\Vert \left\| \dot{K_1}(0) \right\| +\Vert B_2 R_{22}^{-1} B_{2}'\Vert \left\| \dot{K_2}(0) \right\| \right) \overline{T}M(t), $$

$$M(t) = \max \limits _{\tau \in [t_0, t]} \Vert \hat{x}^{NE}(\tau )\Vert . $$

According to (29) $y(t) \underset{[t_0,+\infty )}{\rightarrow } 0,$ when $ \varDelta t \rightarrow 0 $. This proves the theorem.

6 Example Model

6.1 Common Description

Consider the model in which there are two individuals investing in a public stock of knowledge (see also Dockner et al. [3]). Let x(t) be the stock of knowledge at time t and $u_i(t)$ – the investment of player i in public knowledge at time t. Assume that the stock of knowledge evolves according to the accumulation equation

$$\begin{aligned} \dot{x}(t)=-\beta x(t)+u_1(t,x)+u_2(t,x), \quad x(0)=x_0, \end{aligned}$$

(30)

where $\beta $ is the depreciation rate. Assume that each player derives quadratic utility from the consumption of the stock of knowledge and that the cost of investment increases quadratically with the investment effort. That is, the cost function of both players is given by

$$\begin{aligned} K_i(x_0,t_0,T;u)=\int _{0}^T\big (-q_ix^2(t)+r_i u^2_i(t,x)\big )d t, \ i=1,2. \end{aligned}$$

(31)

Consider the initial game (30), (31) in the terms of LQ-games theory [4]. To find a feedback Nash equilibrium, we need to solve the following set of coupled Riccati differential equations:

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{k_1}(t)=-2(-\beta -\frac{1}{r_2}k_2(t))k_1(t)+\frac{1}{r_1}k^2_1(t)+q_1,\\ \dot{k_2}(t)=-2(-\beta -\frac{1}{r_1}k_1(t))k_2(t)+\frac{1}{r_2}k^2_2(t)+q_2,\\ k_1(T)=0, \\ k_2(T)=0. \end{array}\right. } \end{aligned}$$

(32)

As an example consider the symmetric case $r_1=r_2=r$, $q_1=q_2=q$. Let $k(t)=k_1(t)=k_2(t).$ We obtain the following differential equation:

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{k}(t)=2\beta k(t)+\frac{3k^2(t)}{r}+q,\\ k(T)=0. \end{array}\right. } \end{aligned}$$

(33)

The solution of Cauchy problem (33) is

$$ k(t)=\frac{\beta r+v}{3}\left( \frac{2v}{(v-\beta r)e^{\frac{2v}{r}(t-T)}+v+\beta r}-1\right) , $$

where $v=\sqrt{\beta ^2r^2-3qr}$. According to [4] feedback Nash equilibrium for the initial game model will have the form:

$$\begin{aligned} u^{NE}_i(t,x)=-\frac{k(t)x}{r}, \ i=1,2. \end{aligned}$$

(34)

By substituting the value for k(t) in (34) we obtain:

$$ u^{NE}_i(t,x)=\frac{\beta r+v}{3r}\left( 1-\frac{2v}{(v-\beta r)e^{\frac{2v}{r}(t-T)}+v+\beta r}\right) x(t). $$

6.2 Game Model with Continuous Updating

Now consider the case of continuous updating. Here we suppose that two individuals at each time instant $t \in [t_0, +\infty )$ use information about motion equations and payoff functions on the interval $[t, t + \overline{T}]$. As the current time t evolves the interval, which defines the information shifts as well. Motion equations for the game model with continuous updating have the form

$$\begin{aligned} \begin{array}{l} \dot{x}^{t}(s)=-\beta x^{t}(s)+u^{t}_1(s,x)+u^{t}_2(s,x), \ x^{t}(t)=x, \quad t \in [t_0, +\infty ). \end{array}\end{aligned}$$

(35)

Payoff function of player $i \in N$ for the game model with continuous updating is defined as

$$\begin{aligned} \begin{aligned} K^t_i(x^{t}, t, \overline{T}; u^{t}) = \int \limits _{t}^{t+\overline{T}} \left( - \left( x^t(s) \right) ^2 q_i + \left( u^t_i(s,x) \right) ^2 r_i \right) d s, \ i=1,2. \end{aligned} \end{aligned}$$

(36)

According to the Theorem 2 defining the form of feedback Nash equilibrium with continuous updating on the first step we need to solve the following differential equation:

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{k}(\tau )=2\beta \overline{T} k(\tau )+\frac{3\overline{T}k^2(\tau )}{r}+\overline{T}q,\\ k(1)=0. \end{array}\right. } \end{aligned}$$

(37)

The solution of (37) is

$$\begin{aligned} k(\tau )=\frac{\beta r+v}{3}\left( \frac{2v}{(v-\beta r)e^{\frac{2v\overline{T}}{r}(\tau -1)}+v+\beta r}-1\right) , \end{aligned}$$

(38)

where $v=\sqrt{\beta ^2r^2-3qr}$. According to (23) feedback Nash equilibrium with continuous updating has the form:

$$\begin{aligned} \widetilde{u}^{NE}_i(t,x)=-\frac{k(0)x\overline{T}}{r}. \end{aligned}$$

(39)

By substituting (38) in (39) we obtain:

$$\begin{aligned} \widetilde{u}^{NE}_i(t,x)=\frac{\beta r+v}{3r}\left( 1-\frac{2v}{(v-\beta r)e^{-\frac{2v\overline{T}}{r}}+v+\beta r}\right) \overline{T}x, \end{aligned}$$

(40)

by substituting (40) in (30) we obtain $\widetilde{x}^{NE}(t)$ as solution of equation

$$\begin{aligned} \dot{\widetilde{x}}^{NE}(t)=-\beta \widetilde{x}^{NE}(t)+\widetilde{u}^{NE}_1(t,x)+\widetilde{u}^{NE}_2(t,x), \quad \widetilde{x}^{NE}(0)=x_0. \end{aligned}$$

(41)

6.3 Game Model with Dynamic Updating

Perform similar calculations for the resulting Nash equilibrium for a game with dynamic updating based on the calculations for the original game and the approach described in Sect. 4.1 and obtain

$$\begin{aligned} \widetilde{u}^{NE}_i(t,x)=-\frac{k \left( \frac{t-t_i}{\overline{T}} \right) x\overline{T}}{r}, \ t \in [t_i, t_{i+1}]. \end{aligned}$$

(42)

By substituting (38) in (42) we obtain:

$$\begin{aligned} \hat{u}^{NE}_i(t,x)=\frac{\beta r+v}{3r}\left( 1-\frac{2v}{(v-\beta r)e^{\frac{2v(t-t_i-\overline{T})}{r}}+v+\beta r}\right) \overline{T}x, \quad t \in [t_i, t_{i+1}], \end{aligned}$$

(43)

by substituting (43) in (30) we obtain $\hat{x}^{NE}(t)$ as solution of equation

$$\begin{aligned} \dot{\hat{x}}^{NE}(t)=-\beta \hat{x}^{NE}(t)+\hat{u}^{NE}_1(t,x)+\hat{u}^{NE}_2(t,x), \quad \hat{x}^{NE}(0)=x_0. \end{aligned}$$

(44)

6.4 Game Model on Infinite Interval

Consider classic approach for Nash equilibrium for the game on infinite interval $[0, +\infty ).$ Motion equations have the form

$$\begin{aligned} \dot{x}(t)=-\beta x(t)+u_1(t,x)+u_2(t,x), \quad x(0)=x_0. \end{aligned}$$

(45)

Payoff function of player $i \in N$ is defined as

$$\begin{aligned} K_i(x_0;u)=\lim \limits _{T \rightarrow \infty }\int _{0}^T\big (-q_ix^2(t)+r_i u^2_i(t,x)\big )d t, \ i=1,2. \end{aligned}$$

(46)

According to [4] feedback Nash equilibrium strategies have the form

$$\begin{aligned} u^{NE}(t, x) = -\frac{k x}{r} \end{aligned}$$

(47)

in our symmetric case ($r_1=r_2=r,$ $ q_1 = q_2=q$), where k is solution of

$$ \begin{aligned} \frac{3k^2}{r}+2\beta k + q = 0. \end{aligned} $$

By substituting (47) in (45) we obtain $x^{NE}(t)$ as solution of equation

$$\begin{aligned} \dot{x}^{NE}(t)=\left( -\beta - \frac{2 k}{r} \right) x^{NE}(t), \quad {x}^{NE}(0)=x_0. \end{aligned}$$

(48)

6.5 Numerical Simulation

Consider the results of numerical simulation for the game model presented above on the interval [0, 8], i.e. $t_0 = 0$, $T = 8$. At the initial instant $t_0 = 0$ the stock of knowledge is 100, i.e. $x_0 = 100$. The other parameters of models: $\beta =0.9,$ $r=6,$ $q=1.$ Suppose that for the case of a dynamic updating (blue solid and dotted lines Figs. 1 and 2), the intervals between updating instants are $\varDelta t = 2$, therefore $l = 4$. In Fig. 1 the comparison of resulting Nash equilibrium in the game with dynamic updating (blue line) and Nash equilibrium with continuous updating (red lines) is presented. In Fig. 2 similar results are presented for the strategies.

In order to demonstrate the results of Theorems 3 and 4 on convergence of resulting equilibrium strategies and corresponding trajectory to the equilibrium strategies and trajectory with continuous updating, consider the simulation results for a case of frequent updating, namely $l=20$. Figures 3 and 4 represent the same solutions as in Figs. 1 and 2, but for the case, when $\varDelta t = 0.4$. Therefore, convergence results are confirmed by the numerical experiments presented below.

7 Conclusion

The concept of feedback Nash equilibrium for the class of linear quadratic differential games with continuous updating is constructed and the corresponding Theorem is presented. The form of feedback Nash equilibrium for a game model with dynamic updating is also presented and convergence of resulting feedback Nash equilibrium with dynamic updating to the feedback Nash equilibrium with continuous updating as the number of updating instants converges to infinity is proved. The results are demonstrated using the differential game model of knowledge stock. Obtained results are both fundamental and applied in nature since they allow specialists from the applied field to use a new mathematical tool for more realistic modeling of engineering system describing human-machine interaction.

References

Basar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Academic Press, London (1995)
MATH Google Scholar
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
MATH Google Scholar
Dockner, E., Jorgensen, S., Long, N., Sorger, G.: Differential Games in Economics and Management Science. Cambridge University Press, Cambridge (2000)
Book Google Scholar
Engwerda, J.: LQ Dynamic Optimization and Differential Games. Willey, New York (2005)
Google Scholar
Gromova, E., Petrosian, O.: Control of information horizon for cooperative differential game of pollution control. In: 2016 International Conference Stability and Oscillations of Nonlinear Control Systems (Pyatnitskiy’s Conference) (2016)
Google Scholar
Kleimenov, A.: Non-antagonistic Positional Differential Games. Science, Ekaterinburg (1993)
Google Scholar
Petrosian, O.: Looking forward approach in cooperative differential games. Int. Game Theory Rev. 18, 1–14 (2016)
MathSciNet MATH Google Scholar
Petrosian, O.: Looking forward approach in cooperative differential games with infinite-horizon. Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr. (4), 18–30 (2016)
Google Scholar
Petrosian, O., Barabanov, A.: Looking forward approach in cooperative differential games with uncertain-stochastic dynamics. J. Optim. Theory Appl. 172, 328–347 (2017)
Article MathSciNet Google Scholar
Petrosian, O., Nastych, M., Volf, D.: Non-cooperative differential game model of oil market with looking forward approach. In: Petrosyan, L.A., Mazalov, V.V., Zenkevich, N. (eds.) Frontiers of Dynamic Games, Game Theory and Management, St. Petersburg 2017. Birkhäuser, Basel (2018)
Chapter Google Scholar
Petrosian, O., Nastych, M., Volf, D.: Differential game of oil market with moving informational horizon and non-transferable utility. In: 2017 Constructive Nonsmooth Analysis and Related Topics (dedicated to the memory of V.F. Demyanov) (2017)
Google Scholar
Petrosyan, L., Murzov, N.: Game-theoretic problems in mechanics. Lith. Math. Collect. 3, 423–433 (1966)
Google Scholar
Pontryagin, L.: On theory of differential games. Successes Math. Sci. 26, 4(130), 219–274 (1966)
Google Scholar
Shevkoplyas, E.: Optimal solutions in differential games with random duration. J. Math. Sci. 199(6), 715–722 (2014)
Article MathSciNet Google Scholar
Yeung, D., Petrosian, O.: Cooperative stochastic differential games with information adaptation. In: International Conference on Communication and Electronic Information Engineering (2017)
Google Scholar
Yeung, D., Petrosian, O.: Infinite horizon dynamic games: a new approach via information updating. Int. Game Theory Rev. 19, 1–23 (2017)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

St. Petersburg State University, Saint-Petersburg, 199034, Russia
Ildus Kuchkarov & Ovanes Petrosian
National Research University Higher School of Economics, Saint-Petersburg, 194100, Russia
Ovanes Petrosian

Authors

Ildus Kuchkarov
View author publications
You can also search for this author in PubMed Google Scholar
Ovanes Petrosian
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ovanes Petrosian .

Editor information

Editors and Affiliations

Krasovsky Institute of Mathematics and Mechanics, Ekaterinburg, Russia
Michael Khachay
Sobolev Institute of Mathematics, Novosibirsk, Russia
Yury Kochetov
University of Florida, Gainesville, FL, USA
Panos Pardalos

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kuchkarov, I., Petrosian, O. (2019). On Class of Linear Quadratic Non-cooperative Differential Games with Continuous Updating. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Lecture Notes in Computer Science(), vol 11548. Springer, Cham. https://doi.org/10.1007/978-3-030-22629-9_45

Download citation

DOI: https://doi.org/10.1007/978-3-030-22629-9_45
Published: 12 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22628-2
Online ISBN: 978-3-030-22629-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

On Class of Linear Quadratic Non-cooperative Differential Games with Continuous Updating

Abstract

Similar content being viewed by others

Hamilton-Jacobi-Bellman Equations for Non-cooperative Differential Games with Continuous Updating

Open-Loop Based Strategies for Autonomous Linear Quadratic Game Models with Continuous Updating

On the Numerical Solution of Differential Games for Neutral-Type Linear Systems

Keywords

1 Introduction

2 Game Model

2.1 Initial Linear Quadratic Game Model

2.2 Linear Quadratic Game Model with Continuous Updating

3 Nash Equilibrium with Continuous Updating in LQ Differential Games

3.1 Concept of Nash Equilibrium for Games with Continuous Updating

Definition 1

Definition 2

3.2 Theorem on Nash Equilibrium with Continuous Updating for LQ Differential Games

Theorem 1

Proof

4 LQ Differential Game with Dynamic Updating

4.1 LQ Game Model with Dynamic Updating

Definition 3

Definition 4

4.2 Resulting Feedback Nash Equilibrium with Dynamic Updating

Theorem 2

Proof

5 Convergence of Resulting Nash Equilibrium Strategies and Trajectory

Theorem 3

Proof

Theorem 4

Proof

6 Example Model

6.1 Common Description

6.2 Game Model with Continuous Updating

6.3 Game Model with Dynamic Updating

6.4 Game Model on Infinite Interval

6.5 Numerical Simulation

7 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation