
6.1 Introduction

The dynamical bimatrix game with discounted integral payoff functionals is considered on the infinite horizon. The discount parameter is usually a rather uncertain value reflecting subjective components in economic and financial models. Models with discounted indices therefore require a sensitivity analysis of solutions with respect to changes of the discount parameter. In the paper we build optimal control strategies based on the Krasovskii minimax approach [10, 11], using constructions of the Pontryagin maximum principle [21] and the Subbotin method of characteristics for generalized (minimax) solutions of Hamilton-Jacobi equations [22, 23]. Based on the constructed optimal control strategies, we simulate equilibrium trajectories for the dynamical bimatrix game in the framework of the Kleimenov approach [8]. It is important to note that in the considered statement we can obtain analytical solutions for control strategies depending explicitly on the uncertain discount parameter. This allows us to carry out the sensitivity analysis of equilibrium trajectories with respect to changes of the discount parameter and to determine the asymptotic behavior of solutions when the discount parameter converges to zero. It is shown that control strategies and equilibrium solutions asymptotically converge to the solution of the dynamical bimatrix game with the average integral payoff functional considered in papers by Arnold [1].

It is worth noting that we use the dynamical constructions and methods of evolutionary game analysis proposed in the paper [18]. To explain the dynamics of players’ interaction we use elements of evolutionary game models [2, 5, 6, 25, 27]. For the analysis of shifting equilibrium trajectories from the competitive static Nash equilibrium to points of cooperative Pareto maximum we consider ideas and constructions of cooperative dynamical games [20]. The dynamics of the bimatrix game can be interpreted as a generalization of Kolmogorov’s equations for probabilities of states [9], which are widely used in Markov processes, stochastic models of mathematical economics, and queuing theory. The generalization is understood in the sense that the parameters of the dynamics are not fixed a priori: they appear as control parameters and are constructed by the feedback principle in the framework of control theory and the theory of differential games.

The solution of dynamical bimatrix games is based on the construction of positional strategies that maximize a player’s own payoff for any behavior of the competing player, i.e. “guaranteeing” strategies [10, 11, 19]. The construction of solutions on the infinite horizon is divided into fragments with a finite horizon, for which the Pontryagin maximum principle is used [21] in accordance with constructions of the theory of positional differential games [11]. More precisely, elements of the maximum principle are considered together with the method of characteristics for Hamilton-Jacobi equations [12, 22, 24, 26]. The optimal trajectory in each time interval is constructed from pieces of characteristics, while the moments of switching from one characteristic to another are determined by the maximum principle. In this method switching moments and points generate switching lines in the phase space, which determine the synthesis of optimal positional strategies. Let us note that analogous methods for the construction of positional strategies are used in papers [7, 13,14,15,16,17].

In the framework of the proposed approach we consider a model of competition on financial markets described by a dynamical bimatrix game. For this game we construct switching curves for optimal control strategies and synthesize equilibrium trajectories for various values of the discount parameter. We analyze the qualitative behavior of equilibrium trajectories and demonstrate that equilibrium trajectories of the dynamical bimatrix game provide better results than the static Nash equilibrium. Results of the sensitivity analysis for the obtained solutions are demonstrated. This analysis shows that the switching curves of optimal control strategies computed for a series of values of the discount parameter converge as the parameter decreases. We provide calculations confirming that equilibrium trajectories in the problem with discounting converge to the equilibrium trajectory in the problem with the average integral payoff functional.

6.2 Model Dynamics

We investigate the system of differential equations that defines the dynamics of behavior of two players

$$\displaystyle \begin{aligned} \begin{array}{c} \dot{x}(t) = -x(t) + u(t), \quad x(t_0)=x_0,\\ \dot{y}(t) = -y(t) + v(t), \quad y(t_0)=y_0. \end{array}\end{aligned} $$
(6.1)

The parameter x = x(t), 0 ≤ x ≤ 1, is the probability that the first player holds to the first strategy (respectively, (1 − x) is the probability that he holds to the second strategy). The parameter y = y(t), 0 ≤ y ≤ 1, is the probability that the second player chooses the first strategy (respectively, (1 − y) is the probability that he holds to the second strategy). Control parameters u = u(t) and v = v(t) satisfy the conditions 0 ≤ u ≤ 1, 0 ≤ v ≤ 1, and can be interpreted as signals that recommend players to change strategies. For example, the value u = 0 (v = 0) corresponds to the signal: “change the first strategy to the second one”. The value u = 1 (v = 1) corresponds to the signal: “change the second strategy to the first one”. The value u = x (v = y) corresponds to the signal: “keep the previous strategy”.

It is worth noting that the basis for the dynamics (6.1) and its properties were examined in papers [18, 25]. This dynamics generalizes Kolmogorov’s differential equations for probabilities of states [9]. The generalization assumes that the coefficients of incoming and outgoing streams inside the coalitions of players are not fixed a priori and can be constructed as positional strategies in the controlled process.
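For illustration, a minimal simulation sketch of the dynamics (6.1) in Python may be written as follows; the explicit Euler scheme, the step size, and the constant control signals are illustrative assumptions, and all function names are hypothetical.

```python
# A minimal sketch: integrate dx/dt = -x + u, dy/dt = -y + v by explicit Euler.
import numpy as np

def simulate(x0, y0, u_fn, v_fn, t_end=10.0, dt=1e-3):
    """Simulate the dynamics (6.1) under feedbacks u_fn(x, y), v_fn(x, y)."""
    n = int(t_end / dt)
    x, y = x0, y0
    traj = np.empty((n + 1, 2))
    traj[0] = (x, y)
    for k in range(n):
        u, v = u_fn(x, y), v_fn(x, y)
        x += dt * (-x + u)
        y += dt * (-y + v)
        traj[k + 1] = (x, y)
    return traj

# Example: both players constantly signal "switch to the first strategy";
# the state then relaxes exponentially toward the vertex (1, 1).
traj = simulate(0.2, 0.8, lambda x, y: 1.0, lambda x, y: 1.0)
print(traj[-1])  # close to [1., 1.]
```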

6.3 Local Payoff Functions

Let us assume that the payoff of the first player is described by the matrix A = (a_{ij}), and the payoff of the second player is described by the matrix B = (b_{ij})

$$\displaystyle \begin{aligned} A = \left( \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right), \quad B = \left( \begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array} \right).\end{aligned} $$

Local payoff functions of the players at time t, t ∈ [t_0, +∞), are determined by the mathematical expectation of payoffs given by the corresponding matrices A and B in the bimatrix game, and can be interpreted as “local” interests of the players

$$\displaystyle \begin{aligned} \begin{array}{c} g_A(x(t),y(t)) = C_A x(t) y(t) - \alpha_1 x(t) - \alpha_2 y(t) + a_{22}, \end{array} \end{aligned}$$
$$\displaystyle \begin{aligned} \begin{array}{c} g_B(x(t),y(t)) = C_B x(t) y(t) - \beta_1 x(t) - \beta_2 y(t) + b_{22}. \end{array} \end{aligned}$$

Here parameters C A, α 1, α 2 and C B, β 1, β 2 are determined according to the classical theory of bimatrix games (see [27])

$$\displaystyle \begin{aligned} \begin{array}{c} C_A = a_{11} - a_{12} - a_{21} + a_{22}, \quad D_A = a_{11} a_{22} - a_{12} a_{21}, \\ \alpha_1 = a_{22} - a_{12}, \quad \alpha_2 = a_{22} - a_{21}, \end{array} \end{aligned}$$
$$\displaystyle \begin{aligned} \begin{array}{c} C_B = b_{11} - b_{12} - b_{21} + b_{22}, \quad D_B = b_{11} b_{22} - b_{12} b_{21}, \\ \beta_1 = b_{22} - b_{12}, \quad \beta_2 = b_{22} - b_{21}. \end{array} \end{aligned}$$
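These definitions translate directly into code. The small sketch below, with illustrative function names, computes the parameters and the local payoff function from a 2 × 2 payoff matrix given as nested lists.

```python
# A sketch: compute C, D and the linear coefficients of the local payoff
# function from a 2x2 payoff matrix; names are illustrative.
def payoff_params(M):
    (m11, m12), (m21, m22) = M
    C = m11 - m12 - m21 + m22
    D = m11 * m22 - m12 * m21
    c1 = m22 - m12          # alpha_1 for A, beta_1 for B
    c2 = m22 - m21          # alpha_2 for A, beta_2 for B
    return C, D, c1, c2

def local_payoff(M, x, y):
    """Local payoff g_A (resp. g_B) at the mixed-strategy position (x, y)."""
    C, _, c1, c2 = payoff_params(M)
    return C * x * y - c1 * x - c2 * y + M[1][1]
```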

6.4 Nash Equilibrium in the Differential Game with Discounted Functionals

In this section we consider a non-zero-sum differential game of two players with discounted payoff functionals on the infinite horizon

$$\displaystyle \begin{aligned} \begin{array}{rcl} JD_{A}^{\infty} &\displaystyle =&\displaystyle [JD_{A}^{-},JD_{A}^{+}], {} \\ JD_{A}^{-} &\displaystyle =&\displaystyle JD_{A}^{-}(x(\cdot),y(\cdot))= \liminf_{T \rightarrow \infty} \int_{t_0}^{T} e^{-\lambda (t-t_0)} g_{A}(x(t),y(t)) \, dt, \\ JD_{A}^{+} &\displaystyle =&\displaystyle JD_{A}^{+}(x(\cdot),y(\cdot))= \limsup_{T \rightarrow \infty} \int_{t_0}^{T} e^{-\lambda (t-t_0)} g_{A}(x(t),y(t)) \, dt, \end{array} \end{aligned} $$
(6.2)

defined on the trajectories (x(⋅), y(⋅)) of the system (6.1).

Payoff functionals of the second player \(JD_{B}^{\infty }\), \(JD_{B}^{-}\), \(JD_{B}^{+}\) are determined analogously by replacement of the function g A(x, y) by the function g B(x, y).

Discounted functionals (6.2) are traditional in problems of evolutionary economics and economic growth [6, 12] and are related to the idea of depreciation of financial funds in time. In problems of optimal guaranteed control such functionals were considered in the paper [25]. Unlike payoff functionals optimized in each period, discounted functionals admit the possibility of losing in some periods in order to win in other periods and to obtain a better integral result over all periods. This allows the system to stay longer in favorable domains, where the values of the local payoffs of the players are strictly better than the values at the static Nash equilibrium.
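As a simple numerical counterpart of (6.2), the discounted integral can be approximated along a sampled trajectory; the Riemann-sum quadrature below is a rough sketch with illustrative names, not part of the formal construction.

```python
import numpy as np

def discounted_payoff(traj, g, lam, dt):
    """Riemann-sum approximation of the discounted integral (6.2) along a
    trajectory sampled with time step dt; traj is an array of (x, y) rows,
    g is a vectorized local payoff function such as g_A or g_B."""
    t = np.arange(len(traj)) * dt
    return float(np.sum(np.exp(-lam * t) * g(traj[:, 0], traj[:, 1])) * dt)
```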

Let us introduce the notion of dynamical Nash equilibrium for the evolutionary game with the dynamics (6.1) and discounted payoff functionals (6.2) in the context of constructions of non-antagonistic positional differential games [8, 11, 18]. Let us define the dynamical Nash equilibrium in the class of positional strategies (feedbacks) U = u(t, x, y, ε), V = v(t, x, y, ε).

Definition 6.1

A dynamical Nash equilibrium (U 0, V 0), U 0 = u 0(t, x, y, ε), V 0 = v 0(t, x, y, ε), in the class of feedback controls U = u(t, x, y, ε), V = v(t, x, y, ε) for the given problem is determined by the inequalities

$$\displaystyle \begin{aligned} \begin{array}{c} JD_{A}^{-}(x^{0}(\cdot),y^{0}(\cdot)) \geq JD_{A}^{+}(x_{1}(\cdot),y_{1}(\cdot)) - \varepsilon, \\ JD_{B}^{-}(x^{0}(\cdot),y^{0}(\cdot)) \geq JD_{B}^{+}(x_{2}(\cdot),y_{2}(\cdot)) - \varepsilon, \end{array} \end{aligned}$$
$$\displaystyle \begin{aligned} \begin{array}{c} (x^{0}(\cdot),y^{0}(\cdot)) \in X(x_{0},y_{0},U^{0},V^{0}), \quad (x_{1}(\cdot),y_{1}(\cdot)) \in X(x_{0},y_{0},U,V^{0}), \\ (x_{2}(\cdot),y_{2}(\cdot)) \in X(x_{0},y_{0},U^{0},V). \end{array} \end{aligned}$$

Here the symbol X stands for the set of trajectories starting from the initial point and generated by the corresponding positional strategies in the sense of the paper [11].

6.5 Auxiliary Zero-Sum Games

For the construction of the desired equilibrium feedbacks U 0, V 0 we use the approach of [8]. In accordance with this approach we construct the equilibrium using optimal feedbacks for the differential games \(\varGamma _A = \varGamma _A^- \cup \varGamma _A^+\) and \(\varGamma _B = \varGamma _B^- \cup \varGamma _B^+\) with payoffs \(JD_A^{\infty }\) and \(JD_B^{\infty }\) (6.2). In the game Γ A the first player maximizes the functional \(JD_A^-(x(\cdot ),y(\cdot ))\) with a guarantee using the feedback U = u(t, x, y, ε), while the second player, on the contrary, minimizes the functional \(JD_A^+(x(\cdot ),y(\cdot ))\) using the feedback V = v(t, x, y, ε). Vice versa, in the game Γ B the second player maximizes the functional \(JD_B^-(x(\cdot ),y(\cdot ))\) with a guarantee, and the first player minimizes the functional \(JD_B^+(x(\cdot ),y(\cdot ))\).

Let us introduce the following notations. By \(u_A^0=u_A^0(t,x,y,\varepsilon )\) and \(v_B^0=v_B^0(t,x,y,\varepsilon )\) we denote feedbacks that solve the problem of guaranteed maximization of the payoff functionals \(JD_A^-\) and \(JD_B^-\), respectively. Let us note that these feedbacks provide the guaranteed maximization of the players’ payoffs in the long run and can be called “positive”. By \(u_B^0=u_B^0(t,x,y,\varepsilon )\) and \(v_A^0=v_A^0(t,x,y,\varepsilon )\) we denote feedbacks most unfavorable for the opposite players, namely, those that minimize the payoff functionals \(JD_B^+\), \(JD_A^+\) of the opposite players. Let us call them “punishing”.

Let us note that inflexible solutions of the indicated problems can be obtained in the framework of the classical theory of bimatrix games. Let us assume for definiteness (this assumption is made for illustration and without loss of generality) that the following relations, corresponding to the almost antagonistic structure of the bimatrix game, hold for the parameters of the matrices A and B,

$$\displaystyle \begin{aligned} \begin{array}{c} C_A>0, \quad \quad C_B<0, \vspace{0.5ex}\\ \displaystyle{ 0<x_A = \frac {\alpha_2} {C_A} <1, \quad 0<x_B = \frac {\beta_2} {C_B} <1,} \\ \displaystyle{ 0<y_A = \frac {\alpha_1} {C_A} <1, \quad 0<y_B = \frac {\beta_1} {C_B} <1.} \end{array} \end{aligned} $$
(6.3)

The following proposition holds.

Lemma 6.1

Differential games \(\varGamma _A^-,\varGamma _A^+\) have equal values

$$\displaystyle \begin{aligned} w_A^- = w_A^+ = w_A = \frac{D_A}{C_A}, \end{aligned}$$

and differential games \(\varGamma _B^-, \varGamma _B^+\) have equal values

$$\displaystyle \begin{aligned} w_B^- = w_B^+ = w_B = \frac{D_B}{C_B} \end{aligned}$$

for any initial position (x 0, y 0) ∈ [0, 1] × [0, 1]. These values, for example, can be guaranteed by the “positive” feedbacks \(u_A^{cl}\) , \(v_B^{cl}\) corresponding to the classical solutions x A , y B

$$\displaystyle \begin{aligned} u_A^0 = u_A^{cl} = u_A^{cl}(x,y) = \left\{ \begin{array}{l} 0, \quad x_A < x \leq 1 , \\ 1, \quad 0 \leq x < x_A , \\ \left[ 0,1 \right], \quad x = x_A. \end{array} \right. \end{aligned}$$
$$\displaystyle \begin{aligned} v_B^0 = v_B^{cl} = v_B^{cl}(x,y) = \left\{ \begin{array}{l} 0, \quad y_B < y \leq 1 , \\ 1, \quad 0 \leq y < y_B , \\ \left[ 0,1 \right], \quad y = y_B. \end{array} \right. \end{aligned}$$

“Punishing” feedbacks are determined by the formulas

$$\displaystyle \begin{aligned} u_B^0 = u_B^{cl} = u_B^{cl}(x,y) = \left\{ \begin{array}{l} 0, \quad x_B < x \leq 1 , \\ 1, \quad 0 \leq x < x_B , \\ \left[ 0,1 \right], \quad x = x_B, \end{array} \right. \end{aligned}$$
$$\displaystyle \begin{aligned} v_A^0 = v_A^{cl} = v_A^{cl}(x,y) = \left\{ \begin{array}{l} 0, \quad y_A < y \leq 1 , \\ 1, \quad 0 \leq y < y_A , \\ \left[ 0,1 \right], \quad y = y_A, \end{array} \right. \end{aligned}$$

and correspond to classical solutions x B, y A (6.3), which generate the static Nash equilibrium NE = (x B, y A).

The proof of this proposition can be obtained by direct substitution of the indicated strategies into the corresponding payoff functionals (6.2).
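In code, the classical “positive” and “punishing” feedbacks of Lemma 6.1 are simple threshold rules. The sketch below uses the numerical values of x A, y A, x B, y B computed later in Sect. 6.10 as an illustrative assumption, with an arbitrary admissible selection at the threshold itself.

```python
def threshold_feedback(threshold):
    """Classical bang-bang rule: 0 above the threshold coordinate, 1 below it;
    any value in [0, 1] is admissible exactly at the threshold."""
    def fb(z):
        if z > threshold:
            return 0.0
        if z < threshold:
            return 1.0
        return 0.5  # arbitrary admissible selection at z = threshold
    return fb

x_A, y_A = 1.25 / 11.25, 3.0 / 11.25     # classical solutions for matrix A
x_B, y_B = -9.5 / -17.5, -2.5 / -17.5    # classical solutions for matrix B
u_A_cl = threshold_feedback(x_A)  # "positive" feedback of player 1, acts on x
v_B_cl = threshold_feedback(y_B)  # "positive" feedback of player 2, acts on y
u_B_cl = threshold_feedback(x_B)  # "punishing" feedback of player 1
v_A_cl = threshold_feedback(y_A)  # "punishing" feedback of player 2
```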

Remark 6.1

Values of payoff functions g A(x, y), g B(x, y) coincide at points (x A, y B), (x B, y A)

$$\displaystyle \begin{aligned} g_A (x_A,y_B) = g_A (x_B,y_A) = w_A, \quad \quad g_B (x_A,y_B) = g_B (x_B,y_A) = w_B. \end{aligned}$$

The point NE = (x B, y A) is the “mutually punishing” Nash equilibrium, and the point (x A, y B) does not possess equilibrium properties in the corresponding static game.

6.6 Construction of the Dynamical Nash Equilibrium

Let us construct the pair of feedbacks that constitutes the Nash equilibrium. To this end, let us combine the “positive” feedbacks \(u_A^0,v_B^0\) and the “punishing” feedbacks \(u_B^0,v_A^0\).

Let us choose the initial position (x 0, y 0) ∈ [0, 1] × [0, 1] and an accuracy parameter ε > 0. Let us choose the trajectory \((x^0(\cdot ),y^0(\cdot )) \in X (x_0,y_0,u_A^0(\cdot ),v_B^0(\cdot ))\), generated by the “positive” feedbacks \(u_A^0=u_A^0(t,x,y,\varepsilon )\) and \(v_B^0=v_B^0(t,x,y,\varepsilon )\). Let us choose T ε > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{c} g_A(x^0(t),y^0(t)) \quad > \quad JD_A^-(x^0(\cdot),y^0(\cdot)) - \varepsilon, \vspace{0.5ex}\\ g_B(x^0(t),y^0(t)) \quad > \quad JD_B^-(x^0(\cdot),y^0(\cdot)) - \varepsilon, \vspace{0.5ex}\\ t \in [T_{\varepsilon},+\infty). \end{array} \end{aligned}$$

Let us denote by \(u_A^{\varepsilon }(t)\): [0, T ε) → [0, 1], \(v_B^{\varepsilon }(t)\): [0, T ε) → [0, 1] step-by-step implementations of the strategies \(u_A^0,v_B^0\) such that the corresponding step-by-step trajectory (x ε(⋅), y ε(⋅)) satisfies the condition

$$\displaystyle \begin{aligned} \max_{t \in [0,T_{\varepsilon}]} \| (x^0(t),y^0(t)) - (x_{\varepsilon}(t), y_{\varepsilon}(t)) \| < \varepsilon. \end{aligned}$$

From the results of the paper [8] the following proposition follows.

Lemma 6.2

The pair of feedbacks U 0 = u 0(t, x, y, ε), V 0 = v 0(t, x, y, ε), which combines the “positive” feedbacks \(u_A^0\) , \(v_B^0\) and the “punishing” feedbacks \(u_B^0\) , \(v_A^0\) according to the relations

$$\displaystyle \begin{aligned} U^0 = u^0(t,x,y,\varepsilon) = \left\{ \begin{array}{l} u_A^{\varepsilon}(t), \quad \| (x,y) - (x_{\varepsilon}(t),y_{\varepsilon}(t)) \| < \varepsilon, \\ u_B^0(x,y), \quad otherwise, \end{array} \right. \end{aligned}$$
$$\displaystyle \begin{aligned} V^0 = v^0(t,x,y,\varepsilon) = \left\{ \begin{array}{l} v_B^{\varepsilon}(t), \quad \| (x,y) - (x_{\varepsilon}(t),y_{\varepsilon}(t)) \| < \varepsilon, \\ v_A^0(x,y), \quad otherwise \end{array} \right. \end{aligned}$$

is the dynamical ε-Nash equilibrium.
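A schematic implementation of this combination, assuming the nominal trajectory (x ε(⋅), y ε(⋅)), the step-by-step control u A ε(⋅), and the punishing feedback are already available as functions, might look as follows (the constructor name is hypothetical):

```python
import numpy as np

def make_equilibrium_feedback(x_eps, y_eps, u_pos, u_pun, eps):
    """Lemma 6.2 combination rule: play the 'positive' open-loop control
    u_pos(t) while the state stays inside the eps-tube around the nominal
    trajectory (x_eps(t), y_eps(t)); switch to the 'punishing' feedback
    u_pun(x, y) as soon as the state leaves the tube."""
    def U0(t, x, y):
        if np.hypot(x - x_eps(t), y - y_eps(t)) < eps:
            return u_pos(t)
        return u_pun(x, y)
    return U0
```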

Below we construct flexible “positive” feedbacks that generate trajectories (x fl(⋅), y fl(⋅)) leading to “better” positions than the inflexible dynamical equilibrium based on the points (x B, y A), (x A, y B), in the sense of both criteria \(JD_A^{\infty }(x^{fl}(\cdot ),y^{fl}(\cdot )) \geq w_A\), \(JD_B^{\infty }(x^{fl}(\cdot ),y^{fl}(\cdot )) \geq w_B\).

6.7 Two-Step Optimal Control Problems

For the construction of the “positive” feedbacks \(u_{A}^{0}=u_{A}^{fl}(x,y)\), \(v_{B}^{0}=v_{B}^{fl}(x,y)\) we consider in this section an auxiliary two-step optimal control problem with the discounted payoff functional of the first player in the situation when the actions of the second player are most unfavorable. Namely, let us analyze the optimal control problem for the dynamical system (6.1)

$$\displaystyle \begin{aligned} \begin{array}{c} \dot{x}(t) = -x(t) + u(t), \quad x(0)=x_0,\\ \dot{y}(t) = -y(t) + v(t), \quad y(0)=y_0. \end{array} \end{aligned} $$
(6.4)

with the payoff functional

$$\displaystyle \begin{aligned} JD_{A}^{f} = \int_{0}^{T_{f}} e^{-\lambda t} g_{A}(x(t),y(t)) \, dt. \end{aligned} $$
(6.5)

Here, without loss of generality, we set t 0 = 0 and T = T f; the terminal time T f = T f(x 0, y 0) will be determined later.

Without loss of generality, we assume that the value of the static game equals zero

$$\displaystyle \begin{aligned} w_{A} = \frac{D_{A}}{C_{A}} = 0, \end{aligned} $$
(6.6)

and the following conditions hold

$$\displaystyle \begin{aligned} C_{A} > 0, \quad 0 < x_{A}=\frac{\alpha_{2}}{C_{A}} < 1, \quad 0 < y_{A}=\frac{\alpha_{1}}{C_{A}} < 1. \end{aligned} $$
(6.7)

Let us consider the case when the initial conditions (x 0, y 0) of the system (6.4) satisfy the relations

$$\displaystyle \begin{aligned} x_{0} = x_{A}, \quad y_{0} > y_{A}. \end{aligned} $$
(6.8)

Let us assume that the actions of the second player are most unfavorable for the first player. For trajectories of the system (6.4), which start from initial positions (x 0, y 0) (6.8), these actions \(v_{A}^{0}=v_{A}^{cl}(x,y)\) are determined by the relation

$$\displaystyle \begin{aligned} v_{A}^{cl}(x,y) \equiv 0. \end{aligned}$$

Optimal actions \(u_{A}^{0}=u_{A}^{fl}(x,y)\) of the first player with respect to the payoff functional \(JD_{A}^{f}\) (6.5) in this situation can be presented as a two-step impulse control: it equals one from the initial time t 0 = 0 till the moment of the optimal switch s, and then equals zero till the final time T f

$$\displaystyle \begin{aligned} u_{A}^{0}(t)=u_{A}^{fl}(x(t),y(t))= \left\{ \begin{array}{ll} 1, \quad \mbox{if} \quad t_{0} \leq t<s, \\ 0, \quad \mbox{if} \quad s \leq t<T_{f}. \end{array} \right. \end{aligned}$$

Here s is the optimization parameter. The final time T f is determined by the following condition: the trajectory (x(⋅), y(⋅)) of the system (6.4), which starts from the line where x(t 0) = x A, returns to this line when x(T f) = x A.

Let us consider two families of characteristics. The first one is described by the system of differential equations with the value of the control parameter u = 1

$$\displaystyle \begin{aligned} \begin{array}{l} \dot{x}(t) = -x(t)+1, \\ \dot{y}(t) = -y(t), \end{array} \end{aligned} $$
(6.9)

solutions of which are determined by the Cauchy formula

$$\displaystyle \begin{aligned} x(t) = (x_{0}-1) e^{-t}+1, \quad y(t) = y_{0} e^{-t}. \end{aligned} $$
(6.10)

Here initial positions (x 0, y 0) satisfy conditions (6.8) and time parameter t satisfies the inequality 0 ≤ t < s.

The second family of characteristics is given by the system of differential equations with the value of the control parameter u = 0

$$\displaystyle \begin{aligned} \begin{array}{l} \dot{x}(t) = -x(t), \\ \dot{y}(t) = -y(t), \end{array} \end{aligned} $$
(6.11)

solutions of which are determined by the Cauchy formula

$$\displaystyle \begin{aligned} x(t) = x_{1} e^{-t}, \quad y(t) = y_{1} e^{-t}. \end{aligned} $$
(6.12)

Here the initial positions (x 1, y 1) = (x 1(s), y 1(s)) are determined by the relations

$$\displaystyle \begin{aligned} x_{1}=x_{1}(s) = (x_{0}-1) e^{-s} + 1, \quad y_{1}=y_{1}(s) = y_{0} e^{-s}, \end{aligned} $$
(6.13)

and the time parameter t satisfies the inequality 0 ≤ t < p. Here the final time p = p(s) and the final position (x 2, y 2) = (x 2(s), y 2(s)) of the whole trajectory (x(⋅), y(⋅)) are given by the formulas

$$\displaystyle \begin{aligned} x_{1} e^{-p} = x_{A}, \quad p = p(s) = \ln \frac{x_{1}(s)}{x_{A}}, \quad x_{2} = x_{A}, \quad y_{2} = y_{1}e^{-p}. \end{aligned} $$
(6.14)
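These formulas translate directly into code; the following small sketch (function names are illustrative) computes the switching point (6.13) and the return time p(s) from (6.14):

```python
import numpy as np

def switching_point(x0, y0, s):
    """Position (6.13) reached at the switching time s along the u = 1
    characteristics (6.10)."""
    return (x0 - 1.0) * np.exp(-s) + 1.0, y0 * np.exp(-s)

def return_time(x1, xA):
    """Time p(s) from (6.14) needed to return to the line x = x_A along the
    u = 0 characteristics (6.12)."""
    return np.log(x1 / xA)
```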

The optimal control problem is to find the switching time s and the corresponding switching point (x 1, y 1) = (x 1(s), y 1(s)) on the trajectory (x(⋅), y(⋅)) at which the integral I = I(s) attains its maximum value

$$\displaystyle \begin{aligned} \begin{array}{rcl} I(s) &\displaystyle =&\displaystyle I_{1}(s) + I_{2}(s), {} \\ I_{1}(s) &\displaystyle =&\displaystyle \int_{0}^{s} e^{- \lambda t} (C_{A}((x_{0}-1) e^{-t} + 1) y_{0} e^{-t} - \alpha_{1}((x_{0}-1)e^{-t}+1) \\ &\displaystyle &\displaystyle -\alpha_{2} y_{0} e^{-t} + a_{22}) \, dt, \\ I_{2}(s) &\displaystyle =&\displaystyle e^{- \lambda s} \int_{0}^{p(s)} e^{- \lambda t} (C_{A}x_{1}(s)y_{1}(s) e^{-2t} - \alpha_{1}x_{1}(s) e^{-t} - \alpha_{2}y_{1}(s) e^{-t} + a_{22}) \, dt. \end{array} \end{aligned} $$
(6.15)
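Before the analytic solution is derived, the maximization of I(s) can be checked numerically. The sketch below evaluates (6.15) by the trapezoidal rule and locates the optimal switching time by a crude grid search; the matrix data of Sect. 6.10 and λ = 0.1 are used as an illustrative assumption, and all names are hypothetical.

```python
import numpy as np

def trapezoid(f, t):
    """Trapezoidal quadrature, kept explicit for portability."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def I_of_s(s, x0, y0, lam, CA, a1, a2, a22, n=2000):
    """Discounted payoff (6.15) of the two-step control u = 1 on [0, s),
    u = 0 on [s, s + p(s))."""
    xA = a2 / CA
    gA = lambda x, y: CA * x * y - a1 * x - a2 * y + a22
    t1 = np.linspace(0.0, s, n)                  # piece along (6.10)
    x1 = (x0 - 1.0) * np.exp(-t1) + 1.0
    y1 = y0 * np.exp(-t1)
    I1 = trapezoid(np.exp(-lam * t1) * gA(x1, y1), t1)
    xs, ys = x1[-1], y1[-1]                      # switching point (6.13)
    t2 = np.linspace(0.0, np.log(xs / xA), n)    # piece along (6.12)
    I2 = np.exp(-lam * s) * trapezoid(
        np.exp(-lam * t2) * gA(xs * np.exp(-t2), ys * np.exp(-t2)), t2)
    return I1 + I2

CA, a1, a2, a22 = 11.25, 3.0, 1.25, 3.0          # matrix A of Sect. 6.10
grid = np.linspace(1e-3, 3.0, 1000)
vals = [I_of_s(s, a2 / CA, 0.9, 0.1, CA, a1, a2, a22) for s in grid]
print(grid[int(np.argmax(vals))])                # numerically optimal s
```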

In Fig. 6.1 we depict the initial position IP, chosen on the line x = x A with y > y A, the characteristic CH, oriented toward the vertex (1, 0) of the unit square, the characteristics CH 1, CH 2, CH 3, oriented toward the vertex (0, 0) of the unit square, the switching points SP 1, SP 2, SP 3 of the motion along characteristics, and the final points of the motion FP 1, FP 2, FP 3, located on the line x = x A.

Fig. 6.1 Families of characteristics and switching points in the two-step optimal control problem

6.8 The Solution of the Two-Step Optimal Control Problem

We obtain the solution of the two-step optimal control problem (6.9)–(6.15) by calculating the derivative dI∕ds, presenting it as a function of the optimal switching point (x, y) = (x 1, y 1), equating this derivative to zero, dI∕ds = 0, and finding the equation F(x, y) = 0 of the curve that consists of the optimal switching points (x, y).

Sufficient maximum conditions in this construction follow from the fact that the integral I(s) monotonically increases in the variable s in the initial period, because the integrand g A(x, y) is positive, g A(x, y) > w A = 0, in the domain x > x A, y > y A. In the final period the integral I(s) strictly monotonically decreases in s, because the integrand g A(x, y) is negative, g A(x, y) < w A = 0, in the domain x > x A, y < y A.

First, let us calculate the integrals I 1, I 2

$$\displaystyle \begin{aligned} \begin{array}{rcl} I_{1} = I_{1}(s) &\displaystyle =&\displaystyle C_{A}(x_{0}-1)y_{0}\frac{(1-e^{-(\lambda + 2)s})}{(\lambda + 2)} + C_{A}y_{0}\frac{(1-e^{-(\lambda + 1)s})}{(\lambda + 1)} \\ &\displaystyle &\displaystyle -\alpha_{1}(x_{0}-1)\frac{(1-e^{-(\lambda + 1)s})}{(\lambda + 1)} - \alpha_{1}\frac{(1-e^{-\lambda s})}{\lambda} \\ &\displaystyle &\displaystyle -\alpha_{2}y_{0}\frac{(1-e^{-(\lambda + 1)s})}{(\lambda + 1)} + a_{22}\frac{(1-e^{-\lambda s})}{\lambda}. \end{array} \end{aligned} $$
$$\displaystyle \begin{aligned}\begin{array}{rcl} I_{2} = I_{2}(s) &\displaystyle =&\displaystyle e^{-\lambda s}C_{A}x_{1}(s)y_{1}(s)\frac{(1-e^{-(\lambda + 2)p(s)})}{(\lambda + 2)} \\ &\displaystyle &\displaystyle -e^{-\lambda s} \alpha_1 x_1(s) \frac{(1-e^{-(\lambda+1)p(s)})}{(\lambda+1)} \\ &\displaystyle &\displaystyle -e^{-\lambda s} \alpha_{2}y_{1}(s)\frac{(1-e^{-(\lambda+1)p(s)})}{(\lambda+1)} \\ &\displaystyle &\displaystyle +e^{-\lambda s} a_{22} \frac{(1-e^{-\lambda p(s)})}{\lambda}. \end{array} \end{aligned} $$

Let us calculate the derivatives dI 1∕ds, dI 2∕ds and present them as functions of the optimal switching point (x, y) = (x 1, y 1)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{dI_{1}}{ds} &\displaystyle =&\displaystyle C_{A}(x_{0}-1)y_{0}e^{-2s}e^{-\lambda s} + C_{A}y_{0}e^{-s}e^{-\lambda s} \\ &\displaystyle &\displaystyle -\alpha_{1}(x_{0}-1)e^{-s}e^{-\lambda s} - \alpha_{1}e^{-\lambda s} - \alpha_{2}y_{0}e^{-s}e^{-\lambda s} + a_{22}e^{-\lambda s} \\ &\displaystyle = &\displaystyle e^{-\lambda s}(C_{A}xy-\alpha_{1}x-\alpha_{2}y+a_{22}). \end{array} \end{aligned} $$

While calculating the derivative dI 2∕ds let us take into account the following expressions for the derivatives dx∕ds, dy∕ds, dp∕ds and the exponent e −p as functions of the variables (x, y):

$$\displaystyle \begin{aligned} \frac{dx}{ds} = 1-x, \quad \frac{dy}{ds} = -y, \quad \frac{dp}{ds} = \frac{1-x}{x}, \quad e^{-p} = \frac{\alpha_{2}}{C_{A}x}. \end{aligned}$$

Let us introduce the new variable q = e −p and obtain the expression for dI 2∕ds

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{dI_{2}}{ds} = &\displaystyle e^{- \lambda s}&\displaystyle \Big(-\lambda C_A x y \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} + C_A (1-x) y \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} \\ &\displaystyle &\displaystyle - C_A x y \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} + C_A (1 - x) y q^{(\lambda + 2)} \\ &\displaystyle &\displaystyle + \lambda \alpha_1 x \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} - \alpha_1 (1-x) \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} - \alpha_1 (1-x) q^{(\lambda + 1)} \\ &\displaystyle &\displaystyle + \lambda \alpha_2 y \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} + \alpha_2 y \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} - \alpha_2 y \frac{(1-x)}{x} q^{(\lambda + 1)} \\ &\displaystyle &\displaystyle + a_{22} \frac{(1-x)}{x} q^{\lambda} - a_{22} (1 - q^{\lambda}) \Big). \end{array} \end{aligned} $$

Let us sum the derivatives dI 1∕ds and dI 2∕ds, equate the resulting expression to zero, and express y through x in the following form:

$$\displaystyle \begin{aligned} \begin{array}{rcl} y &\displaystyle =&\displaystyle \Big(\alpha_1 x - \lambda \alpha_1 x \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} + \alpha_1 (1-x) \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} + \alpha_1 (1-x) q^{(\lambda + 1)} \\ &\displaystyle &\displaystyle - a_{22} \frac{(1-x)}{x} q^{\lambda} + a_{22} (1 - q^{\lambda}) - a_{22} \Big) \Big/ \\ &\displaystyle &\displaystyle \Big(C_A x - \lambda C_A x \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} + C_A (1-2x) \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} + C_A (1 - x) q^{(\lambda + 2)} \\ &\displaystyle &\displaystyle + \lambda \alpha_2 \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} + \alpha_2 \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} - \alpha_2 \frac{(1-x)}{x} q^{(\lambda + 1)} - \alpha_2 \Big). \end{array} \end{aligned} $$

Simplifying the expression we obtain the formula:

$$\displaystyle \begin{aligned} \begin{array}{c} \displaystyle{y = \Big(\alpha_1 \frac{(1-q^{(\lambda + 1)})}{(\lambda + 1)} + \alpha_1 q^{(\lambda + 1)} - a_{22} \frac{1}{x} q^{\lambda} \Big) \Big/} \\ \displaystyle{\Big(C_A \frac{(1-q^{(\lambda + 2)})}{(\lambda + 2)} + C_A q^{(\lambda + 2)} - \alpha_2 \frac{1}{x} q^{(\lambda+1)} \Big).} \end{array} \end{aligned}$$

Taking into account the fact that w A = 0 (6.6), we obtain a 22 = (α 1 α 2)∕C A. Substituting this relation and the expression q = α 2∕(C A x) into the previous formula, we obtain:

$$\displaystyle \begin{aligned} y = \Big( \alpha_1 \Big(1 - \Big(\frac{\alpha_2}{C_A x} \Big)^{(\lambda + 1)} \Big) (\lambda + 2) \Big) \Big/ \Big( C_A \Big(1 - \Big(\frac{\alpha_2}{C_A x} \Big)^{(\lambda + 2)} \Big) (\lambda + 1) \Big). \end{aligned}$$

Multiplying the numerator and the denominator of this expression by x (λ+2), we obtain:

$$\displaystyle \begin{aligned} y = \Big( \alpha_1 \Big(x^{(\lambda + 1)} - \Big(\frac{\alpha_2}{C_A} \Big)^{(\lambda + 1)} \Big) (\lambda + 2) x \Big) \Big/ \Big( C_A \Big(x^{(\lambda + 2)} - \Big(\frac{\alpha_2}{C_A} \Big)^{(\lambda + 2)} \Big) (\lambda + 1) \Big). \end{aligned} $$

Taking into account the relations x A = α 2∕C A and y A = α 1∕C A (6.7), we obtain the final expression for the switching curve \(M_A^1(\lambda )\):

$$\displaystyle \begin{aligned} y=\frac{(\lambda + 2) \Big(x^{(\lambda + 1)} - x_A^{(\lambda + 1)}\Big) y_A x}{(\lambda + 1) \Big(x^{(\lambda + 2)} - x_A^{(\lambda + 2)}\Big)}. \end{aligned} $$

To construct the final switching curve M A(λ) for the optimal strategy of the first player in the game with the discounted functional in the case C A > 0, we add to the curve \(M_{A}^{1}(\lambda )\) the similar curve \(M_{A}^{2}(\lambda )\) in the domain where x ≤ x A and y ≤ y A

$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{A}(\lambda) &\displaystyle =&\displaystyle M_{A}^{1}(\lambda) \cup M_{A}^{2}(\lambda), {} \\ M_{A}^{1}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle y=\frac{(\lambda + 2) \Big(x^{(\lambda + 1)} - x_A^{(\lambda + 1)}\Big) y_A x}{(\lambda + 1) \Big(x^{(\lambda + 2)} - x_A^{(\lambda + 2)}\Big)}, \; x \geq x_A, \; y \geq y_A \bigg\}, \\ M_{A}^{2}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle y= - \frac{(\lambda + 2) \Big((1-x)^{(\lambda + 1)} - (1-x_A)^{(\lambda + 1)}\Big) (1-y_A) (1-x)}{(\lambda + 1) \Big((1-x)^{(\lambda + 2)} - (1-x_A)^{(\lambda + 2)}\Big)} + 1, \\ &\displaystyle &\displaystyle x \leq x_A, \; y \leq y_A \bigg\}. \vspace{-3pt}\end{array} \end{aligned} $$
(6.16)

In the case when C A < 0, the curves M A(λ), \(M_{A}^{1}(\lambda )\) and \(M_{A}^{2}(\lambda )\) are described by the formulas

$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{A}(\lambda) &\displaystyle =&\displaystyle M_{A}^{1}(\lambda) \cup M_{A}^{2}(\lambda), {} \\ M_{A}^{1}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle y=\frac{(\lambda + 2) \Big((1-x)^{(\lambda + 1)} - (1-x_A)^{(\lambda + 1)}\Big) y_A (1-x)}{(\lambda + 1) \Big((1-x)^{(\lambda + 2)} - (1-x_A)^{(\lambda + 2)}\Big)}, \\ &\displaystyle &\displaystyle x \leq x_A, \; y \geq y_A \bigg\}, \\ M_{A}^{2}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle y = - \frac{(\lambda + 2) \Big(x^{(\lambda + 1)} - x_A^{(\lambda + 1)}\Big) (1-y_A) x}{(\lambda + 1) \Big(x^{(\lambda + 2)} - x_A^{(\lambda + 2)}\Big)} + 1, \; x \geq x_A, \; y \leq y_A \bigg\}. \end{array} \end{aligned} $$
(6.17)
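A sketch of the curve M A(λ) in the case C A > 0, written as a function y = y(x), is given below; the removable singularity at x = x A is filled with its limit value y A, and the function name is illustrative.

```python
import numpy as np

def MA_curve(x, xA, yA, lam):
    """Switching curve M_A(lambda) from (6.16) for C_A > 0, written as
    y = y(x); at x = xA both branches have the limit value yA."""
    if np.isclose(x, xA):
        return yA
    if x > xA:                                   # branch M_A^1
        num = (lam + 2) * (x**(lam + 1) - xA**(lam + 1)) * yA * x
        den = (lam + 1) * (x**(lam + 2) - xA**(lam + 2))
        return num / den
    xr, xAr, yAr = 1.0 - x, 1.0 - xA, 1.0 - yA   # branch M_A^2, reflected
    num = (lam + 2) * (xr**(lam + 1) - xAr**(lam + 1)) * yAr * xr
    den = (lam + 1) * (xr**(lam + 2) - xAr**(lam + 2))
    return 1.0 - num / den
```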

The curve M A(λ) divides the unit square [0, 1] × [0, 1] into two parts: the upper part

$$\displaystyle \begin{aligned} D_{A}^{u} \supset \{ (x,y): \quad x = x_{A}, \quad y > y_{A} \} \end{aligned} $$

and the lower part

$$\displaystyle \begin{aligned} D_{A}^{l} \supset \{ (x,y): \quad x = x_{A}, \quad y < y_{A} \}. \end{aligned} $$

The “positive” feedback \(u_{A}^{fl}\) has the following structure

$$\displaystyle \begin{aligned} \begin{array}{rcl} u_{A}^{fl} &=& u_{A}^{fl}(x,y) = \left\{ \begin{array}{ll} \max \{ 0,sgn(C_{A}) \}, & \; if \; (x,y) \in D_{A}^{u}, \\ \max \{ 0,-sgn(C_{A}) \}, & \; if \; (x,y) \in D_{A}^{l}, \\ {[0,1]}, & \; if \; (x,y) \in M_{A}(\lambda). \end{array} \right. {} \vspace{3pt}\end{array} \end{aligned} $$
(6.18)
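With the MA_curve sketch above, the feedback (6.18) for C A > 0 reduces to a comparison of y with the curve value at x; this is a sketch, and the selection on the curve itself is an arbitrary admissible choice.

```python
def u_A_fl(x, y, xA, yA, lam):
    """'Positive' flexible feedback (6.18) in the case C_A > 0: u = 1 in the
    upper domain D_A^u, u = 0 in the lower domain D_A^l."""
    boundary = MA_curve(x, xA, yA, lam)
    if y > boundary:
        return 1.0   # D_A^u: max{0, sgn(C_A)} = 1 for C_A > 0
    if y < boundary:
        return 0.0   # D_A^l: max{0, -sgn(C_A)} = 0 for C_A > 0
    return 0.5       # arbitrary admissible selection on M_A(lambda)
```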

In Fig. 6.2 we show the switching curves \(M_{A}^{1}(\lambda )\), \(M_{A}^{2}(\lambda )\) for the first player. Directions of the velocities \(\dot {x}\) are depicted by horizontal (left and right) arrows.

Fig. 6.2 Switching curves \(M_{A}^{1}(\lambda )\), \(M_{A}^{2}(\lambda )\) for the first player in the problem with discounted payoff functionals

For the second player one can obtain similar switching curves M B(λ) for the optimal control problem with the discounted functional corresponding to the matrix B. More precisely, in the case when C B > 0, the switching curve M B(λ) is given by the relations

$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{B}(\lambda) &\displaystyle =&\displaystyle M_{B}^{1}(\lambda) \cup M_{B}^{2}(\lambda), {} \\ M_{B}^{1}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle x=\frac{(\lambda + 2) \Big(y^{(\lambda + 1)} - y_B^{(\lambda + 1)}\Big) x_B y}{(\lambda + 1) \Big(y^{(\lambda + 2)} - y_B^{(\lambda + 2)}\Big)}, \; x \geq x_B, \; y \geq y_B \bigg\}, \\ M_{B}^{2}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle x= - \frac{(\lambda + 2) \Big((1-y)^{(\lambda + 1)} - (1-y_B)^{(\lambda + 1)}\Big) (1-x_B) (1-y)}{(\lambda + 1) \Big((1-y)^{(\lambda + 2)} - (1-y_B)^{(\lambda + 2)}\Big)} + 1, \\ &\displaystyle &\displaystyle x \leq x_B, \; y \leq y_B \bigg\}. \end{array} \end{aligned} $$
(6.19)

In the case when the parameter C B is negative, C B < 0, the curves M B(λ), \(M_{B}^{1}(\lambda )\) and \(M_{B}^{2}(\lambda )\) are determined by the formulas

$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{B}(\lambda) &\displaystyle =&\displaystyle M_{B}^{1}(\lambda) \cup M_{B}^{2}(\lambda), {} \\ M_{B}^{1}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle x=\frac{(\lambda + 2) \Big((1-y)^{(\lambda + 1)} - (1-y_B)^{(\lambda + 1)}\Big) x_B (1-y)}{(\lambda + 1) \Big((1-y)^{(\lambda + 2)} - (1-y_B)^{(\lambda + 2)}\Big)}, \\ &\displaystyle &\displaystyle x \geq x_B, \; y \leq y_B \bigg\}, \\ M_{B}^{2}(\lambda) &\displaystyle =&\displaystyle \bigg\{ (x,y) \in [0,1] \times [0,1]\colon \\ &\displaystyle &\displaystyle x= - \frac{(\lambda + 2) \Big(y^{(\lambda + 1)} - y_B^{(\lambda + 1)}\Big) (1-x_B) y}{(\lambda + 1) \Big(y^{(\lambda + 2)} - y_B^{(\lambda + 2)}\Big)} + 1, \; x \leq x_B, \; y \geq y_B \bigg\}. \end{array} \end{aligned} $$
(6.20)

The curve M B(λ) divides the unit square [0, 1] × [0, 1] into two parts: the left part

$$\displaystyle \begin{aligned} D_{B}^{l} \supset \{ (x,y): \quad x < x_{B}, \quad y = y_{B} \} \end{aligned}$$

and the right part

$$\displaystyle \begin{aligned} D_{B}^{r} \supset \{ (x,y): \quad x > x_{B}, \quad y = y_{B} \}. \end{aligned}$$

The “positive” feedback \(v_{B}^{fl}\) has the following structure

$$\displaystyle \begin{aligned} \begin{array}{rcl} v_{B}^{fl} &=& v_{B}^{fl}(x,y) = \left\{ \begin{array}{ll} \max \{ 0,-sgn(C_{B}) \}, & \; if \; (x,y) \in D_{B}^{l}, \\ \max \{ 0,sgn(C_{B}) \}, & \; if \; (x,y) \in D_{B}^{r}, \\ {[0,1]}, & \; if \; (x,y) \in M_{B}(\lambda). \end{array} \right. {} \end{array} \end{aligned} $$
(6.21)

Remark 6.2

Let us note that in papers by Arnold [1] average integral payoff functionals were considered

$$\displaystyle \begin{aligned} \frac{1}{(T-t_0)} \int_{t_0}^{T} g_{A}(x(t),y(t)) \, dt. \end{aligned} $$
(6.22)

In the paper [16] switching curves for optimal control strategies of the players in the game with average integral functionals were obtained. For example, for the first player in the case when C A > 0 the switching curve in the domain x ≥ x A, y ≥ y A is described by the relation

$$\displaystyle \begin{aligned} y = \frac{2\alpha_{1}x}{C_{A}x+\alpha_{2}}. \end{aligned} $$
(6.23)

The asymptotic analysis of the solutions (6.16) for the game with discounted payoff functionals shows that, according to L’Hospital’s rule, when the discount parameter λ tends to zero, the switching curves (6.16) of the control strategy of the first player converge to the switching curve (6.23) of the game with average integral payoff functionals (6.22).
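This convergence is easy to check numerically with the MA_curve sketch above: for a very small λ the curve (6.16) is pointwise close to (6.23). The parameter values below are the illustrative data of Sect. 6.10.

```python
import numpy as np

CA, a1, a2 = 11.25, 3.0, 1.25        # illustrative data from Sect. 6.10
xA, yA = a2 / CA, a1 / CA
for x in np.linspace(0.2, 1.0, 5):
    avg = 2 * a1 * x / (CA * x + a2)             # curve (6.23)
    disc = MA_curve(x, xA, yA, 1e-6)             # curve (6.16), small lambda
    print(f"{x:.2f}  {disc:.6f}  {avg:.6f}")     # the two columns agree
```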

In Fig. 6.2 the solid line shows the switching curve of control strategies of the first player in the game with average integral payoff functionals, which is asymptotically approximated by solutions of the game with discounted functionals when λ → 0. The dashed line and the dotted line show switching curves of control strategies of the first player in the game with discounted payoff functionals for the values of the discount parameter λ = 0.1 and λ = 0.2, respectively.

In Fig. 6.3 we show the switching curves \(M_{B}^{1}(\lambda )\), \(M_{B}^{2}(\lambda )\) for the second player. Directions of the velocities \(\dot {y}\) are depicted by vertical (up and down) arrows.

Fig. 6.3 Switching curves \(M_{B}^{1}(\lambda )\), \(M_{B}^{2}(\lambda )\) for the second player in the problem with discounted payoff functionals

It is worth clarifying the asymptotic behavior of the switching curves for optimal control when the discount parameter becomes infinitely large. In this case one can check that the switching curve M A(λ) for optimal control in the problem with discounted integral payoffs, describing the long-term interests of the players, converges to the switching line y = y A generated by the short-run payoff function g A(x, y) when the discount parameter λ tends to infinity. Such behavior of the switching curve M A(λ) is shown in Fig. 6.4.

$$\displaystyle \begin{aligned} \begin{array}{l} \displaystyle{y=\frac{(\lambda + 2) \Big(x^{(\lambda + 1)} - x_A^{(\lambda + 1)}\Big) y_A x}{(\lambda + 1) \Big(x^{(\lambda + 2)} - x_A^{(\lambda + 2)}\Big)}}\\ \displaystyle{=\Big( 1 + \frac{1}{(\lambda+1)} \Big) \frac{\Big( 1 - \Big(\displaystyle{\frac{x_A}{x}}\Big)^{(\lambda + 1)} \Big)}{\Big( 1 - \Big(\displaystyle{\frac{x_A}{x}}\Big)^{(\lambda + 2)} \Big)} y_A \rightarrow y_A, \quad \mbox{when} \; \lambda \rightarrow +\infty.} \end{array} \end{aligned}$$
Fig. 6.4 Asymptotic behavior of the switching curve M A(λ) for the first player in the problem with discounted payoff functionals
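The same MA_curve sketch illustrates this limit numerically: at a fixed x > x A the curve value approaches y A as λ grows (parameter values are the illustrative data of Sect. 6.10).

```python
CA, a1, a2 = 11.25, 3.0, 1.25        # illustrative data from Sect. 6.10
xA, yA = a2 / CA, a1 / CA
for lam in (0.1, 1.0, 10.0, 100.0):
    print(lam, MA_curve(0.6, xA, yA, lam))  # tends to yA = 0.2667 as lam grows
```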

6.9 Guaranteed Values of Discounted Payoffs

Let us formulate the proposition which confirms that the “positive” optimal feedback control \(u_{A}^{fl}(x,y)\) (6.18) with the switching curve M A, defined by formulas (6.16), (6.17), guarantees that the value of the discounted payoff functional is greater than or equal to the value w A (6.6) of the static matrix game.

Theorem 6.1

For any initial position (x 0, y 0) ∈ [0, 1] × [0, 1] and for any trajectory

$$\displaystyle \begin{aligned} (x^{fl}(\cdot),y^{fl}(\cdot)) \in X(x_{0},y_{0},u_{A}^{fl}), \quad x^{fl}(t_{0})=x_{0}, \quad y^{fl}(t_{0})=y_{0}, \quad t_{0}=0, \end{aligned}$$

generated by the optimal feedback control \(u_{A}^{fl}=u_{A}^{fl}(x,y)\) , there exists a finite moment of time t ∗ ∈ [0, T A] at which the trajectory (x fl(⋅), y fl(⋅)) reaches the line where x = x A , namely x fl(t ∗) = x A . Then, according to the construction of the optimal feedback control \(u_{A}^{fl}\) , which maximizes the integral (6.15), the following estimate holds

$$\displaystyle \begin{aligned} \int_{t_{\ast}}^{T} e^{-\lambda (t-t_{\ast})} g_{A}(x(t),y(t)) \, dt \geq \frac {w_{A}}{\lambda} \big( 1-e^{-\lambda (T - t_{\ast})} \big), \quad \forall T \geq t_{\ast}. {} \end{aligned} $$
(6.24)

In particular, this inequality remains valid as T tends to infinity

$$\displaystyle \begin{aligned} \liminf_{T \rightarrow +\infty} \lambda \int_{t_{\ast}}^{T} e^{-\lambda (t-t_{\ast})} g_{A}(x^{fl}(t),y^{fl}(t)) \, dt \geq w_{A}. {} \end{aligned} $$
(6.25)

Inequalities (6.24), (6.25) mean that the value of the discounted functional is not worse than the value w A (6.6) of the static matrix game.

An analogous result holds for trajectories generated by the optimal control \(v_{B}^{fl}\) (6.21), which corresponds to the switching curve M B (6.19), (6.20).

Proof

The result of the theorem follows from the fact that the value of the payoff functional (6.5) is maximal on the constructed broken line. In particular, it is greater than or equal to the value of this functional on the trajectory that stays on the segment x = x A (see Fig. 6.1) under the control u(t) = x A. The value of the payoff functional on such a trajectory is

$$\displaystyle \begin{aligned} \int_{t_{\ast}}^{T} e^{-\lambda (t-t_{\ast})} w_{A} \, dt = \frac {w_{A}}{\lambda} \big( 1-e^{-\lambda (T - t_{\ast})} \big). \end{aligned}$$

These arguments imply the required relation (6.24), which by passing to the limit yields the relation (6.25). □

Remark 6.3

Let us consider the admissible trajectory \((x_{AB}^{fl}(\cdot ),y_{AB}^{fl}(\cdot ))\), generated by the “positive” feedbacks \(u_{A}^{fl}\) (6.18), \(v_{B}^{fl}\) (6.21). Then, in accordance with Theorem 6.1, the following inequalities hold

$$\displaystyle \begin{aligned} \liminf_{T \rightarrow +\infty} \lambda \int_{t_{\ast}}^{T} e^{-\lambda (t-t_{\ast})} g_{A}(x_{AB}^{fl}(t),y_{AB}^{fl}(t)) \, dt \geq w_{A} \end{aligned}$$
$$\displaystyle \begin{aligned} \liminf_{T \rightarrow +\infty} \lambda \int_{t_{\ast}}^{T} e^{-\lambda (t-t_{\ast})} g_{B}(x_{AB}^{fl}(t),y_{AB}^{fl}(t)) \, dt \geq w_{B} \end{aligned}$$

and, hence, the admissible trajectory \((x_{AB}^{fl}(\cdot ),y_{AB}^{fl}(\cdot ))\) provides a better result for both players than trajectories converging to the points of the static Nash equilibrium, at which the corresponding payoffs equal the values w A and w B.

6.10 Equilibrium Trajectories in the Game with Discounted Payoffs

Let us consider payoff matrices of the players on the financial market, which reflect data from the investigated markets of stocks [3] and bonds [4] in the USA. The matrix A corresponds to the behavior of traders who play on the rise of prices and are called “bulls”. The matrix B corresponds to the behavior of traders who play on the fall of prices and are called “bears”. The parameters of the matrices represent rates of return for stocks and bonds, expressed in the form of interest rates,

$$\displaystyle \begin{aligned} A = \left( \begin{array}{cc} 10 & 0 \\ 1.75 & 3 \end{array} \right), \quad B = \left( \begin{array}{cc} -5 & 3 \\ 10 & 0.5 \end{array} \right). \end{aligned} $$
(6.26)

Characteristic parameters of the static games take the following values [27]

$$\displaystyle \begin{aligned} C_A = a_{11} - a_{12} - a_{21} + a_{22} = 11.25, \end{aligned}$$
$$\displaystyle \begin{aligned} \alpha_1 = a_{22} - a_{12} = 3, \quad \alpha_2 = a_{22} - a_{21} = 1.25, \end{aligned}$$
$$\displaystyle \begin{aligned} x_A = \frac{\alpha_2}{C_A} = 0.11, \quad y_A = \frac{\alpha_1}{C_A} = 0.27; \end{aligned}$$
$$\displaystyle \begin{aligned} C_B = b_{11} - b_{12} - b_{21} + b_{22} = -17.5, \end{aligned}$$
$$\displaystyle \begin{aligned} \beta_1 = b_{22} - b_{12} = -2.5, \quad \beta_2 = b_{22} - b_{21} = -9.5, \end{aligned}$$
$$\displaystyle \begin{aligned} x_B = \frac{\beta_2}{C_B} = 0.54, \quad y_B = \frac{\beta_1}{C_B} = 0.14. \end{aligned}$$
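These values are reproduced by the payoff_params sketch from Sect. 6.3:

```python
A = [[10.0, 0.0], [1.75, 3.0]]
B = [[-5.0, 3.0], [10.0, 0.5]]
CA, DA, a1, a2 = payoff_params(A)    # 11.25, 30.0, 3.0, 1.25
CB, DB, b1, b2 = payoff_params(B)    # -17.5, -32.5, -2.5, -9.5
print(a2 / CA, a1 / CA)              # x_A ~ 0.111, y_A ~ 0.267
print(b2 / CB, b1 / CB)              # x_B ~ 0.543, y_B ~ 0.143
```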

In Fig. 6.5 we present the broken lines of the players’ best replies, the saddle points NA, NB of the static antagonistic games, and the point NE of the Nash equilibrium of the static bimatrix game.

Fig. 6.5 Saddle points NA, NB and the point of the Nash equilibrium NE

Let us note that players of the coalition of “bulls” gain in the case of an upward trend of the markets, when players of both coalitions invest in the same market. Players of the coalition of “bears” profit from their investments in the case of a downward trend of the markets, when players of the coalition of “bulls” move their investments from one market to another.

For the game of the coalitions of “bulls” and “bears” we construct the switching curves M A(λ), M B(λ) and compute equilibrium trajectories of the market dynamics for the value of the discount parameter λ = 0.1.

These calculations are presented in Fig. 6.6. Here we show the saddle points NA, NB of the static antagonistic games, the point NE of the Nash equilibrium of the static bimatrix game, and the switching lines \(M_A(\lambda ) = M_A^1(\lambda ) \bigcup M_A^2(\lambda )\) and \(M_B(\lambda ) = M_B^1(\lambda ) \bigcup M_B^2(\lambda )\) for the players’ controls in the dynamical bimatrix game with discounted payoff functionals for the matrices A, B (6.26). The field of velocities of the players is depicted by arrows.

Fig. 6.6 The equilibrium trajectory in the game with discounted payoffs

The field of directions generates equilibrium trajectories, one of which is presented in Fig. 6.6. This trajectory \(TR(\lambda )=(x_{AB}^{fl}(\cdot ),y_{AB}^{fl}(\cdot ))\) starts from the initial position IP = (0.1, 0.9) and moves along the characteristic in the direction of the vertex (1, 1) of the unit square [0, 1] × [0, 1] with the control signals u = 1, v = 1. Then it crosses the switching line M B(λ), and the second coalition switches the control v from 1 to 0. The trajectory TR(λ) then moves in the direction of the vertex (1, 0) until it reaches the switching line M A(λ). Here players of the first coalition change the control signal u from 1 to 0. After that the motion of the trajectory is directed along the characteristic to the vertex (0, 0). Then the trajectory crosses the line M B(λ), on which a sliding mode arises, during which the controls of the second coalition switch, and the trajectory TR(λ) converges to the point \(IM(\lambda )=M_{A}(\lambda ) \bigcap M_{B}(\lambda )\) of intersection of the switching lines M A(λ), M B(λ). A numerical sketch of this motion is given below.
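The motion just described can be reproduced with the sketches above (MA_curve and u_A_fl from Sect. 6.8) plus an analogous curve and feedback for the second player; the Euler step, the horizon, and the chattering approximation of the sliding mode are simplifying assumptions.

```python
import numpy as np

def MB_curve(y, xB, yB, lam):
    """Switching curve M_B(lambda) from (6.20) for C_B < 0, as x = x(y)."""
    if np.isclose(y, yB):
        return xB
    if y < yB:                                   # branch M_B^1, reflected
        yr, yBr = 1.0 - y, 1.0 - yB
        num = (lam + 2) * (yr**(lam + 1) - yBr**(lam + 1)) * xB * yr
        den = (lam + 1) * (yr**(lam + 2) - yBr**(lam + 2))
        return num / den
    num = (lam + 2) * (y**(lam + 1) - yB**(lam + 1)) * (1.0 - xB) * y
    den = (lam + 1) * (y**(lam + 2) - yB**(lam + 2))
    return 1.0 - num / den                       # branch M_B^2

def v_B_fl(x, y, xB, yB, lam):
    """'Positive' flexible feedback (6.21) in the case C_B < 0: v = 1 in the
    left domain D_B^l, v = 0 in the right domain D_B^r."""
    boundary = MB_curve(y, xB, yB, lam)
    if x < boundary:
        return 1.0
    if x > boundary:
        return 0.0
    return 0.5

lam, dt = 0.1, 1e-3
xA, yA = 1.25 / 11.25, 3.0 / 11.25
xB, yB = -9.5 / -17.5, -2.5 / -17.5
x, y = 0.1, 0.9                                  # initial position IP
for _ in range(int(60.0 / dt)):                  # Euler steps of (6.1)
    x += dt * (-x + u_A_fl(x, y, xA, yA, lam))
    y += dt * (-y + v_B_fl(x, y, xB, yB, lam))
print(x, y)   # chatters into the intersection point IM(lambda)
```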