Strongly Time-Consistent Solutions in Cooperative Dynamic Games

Petrosyan, Leon A.

doi:10.1007/978-3-030-39789-0_2


Part of the book series: Annals of the International Society of Dynamic Games ((AISDG,volume 16))


Abstract

This paper investigates the evolution of a dynamic game along the cooperative trajectory. At each time instant along the cooperative trajectory the players find themselves in a new game, which is a subgame of the originally defined game. In many cases the optimal solution of the initial game, restricted to the subgame along the cooperative trajectory, fails to be optimal in the subgame. To overcome this difficulty a special payment mechanism was introduced (see Petrosyan and Danilov, Vestnik Leningrad Univ Mat Mekh Astronom 1:52–59, 1979; Petrosyan and Zaccour, J Econ Dyn Control 27(3):381–398, 2003; Yeung and Petrosyan, Subgame consistent economic optimization. Birkhäuser, 2012): the imputation distribution procedure (IDP), or payment distribution procedure (PDP). But another serious question arises: under what conditions does the initial optimal solution, converted to an arbitrary optimal solution in the subgame, remain optimal in the whole game? We call this condition the strong time-consistency condition of the optimal solution. If it is not satisfied, the players may in reality switch at some time instant from the previously selected optimal solution to any optimal solution in the subgame, and as a result realize a solution that is not optimal in the whole game. We propose different types of strongly time-consistent solutions for multicriterial control, cooperative differential, and cooperative dynamic games.


1 What Is Strong Time-Consistency?

We begin by explaining the notion of strong time-consistency. Let M ∈ Rⁿ be a fixed point. Consider a classical control problem (with one player)

$$\displaystyle \begin{aligned} \begin{array}{l} \dot x=f(x, u), x\in R^n, u\in U \subset Comp R^l\\ x(t_0)=x_0, \ t \in [t_0,T]. \end{array} \end{aligned} $$

(1)

Find the control $\bar {u}(t)$, and corresponding trajectory $\bar {x}(t)$ such that at terminal instant the distance $\rho (\bar {x}(T),M)$ will be minimal.

Denote this problem by Γ(x₀, T − t₀), and denote by C(x₀, T − t₀) the reachability set of system (1) from the initial point x₀ at the terminal time T.

Suppose for simplicity that M ∉ C(x₀, T − t₀). The solution of this optimal control problem is shown in Fig. 1.

Consider an intermediate time instant τ ∈ [t₀, T] and the intermediate control problem $\varGamma (\bar {x}(\tau ), T-\tau )$ of duration T − τ with initial condition on the optimal trajectory. It is clear that the control $\bar {u}(t)$, t ∈ [τ, T], is also optimal in $\varGamma (\bar {x}(\tau ), T-\tau )$, and so is the trajectory $\bar {x}(t)$, t ∈ [τ, T].

This is Bellman's optimality principle, and it also expresses the time-consistency of the optimal control $\bar {u}(t)$, t ∈ [t₀, T]. Suppose now that there is another optimal control $\bar {\bar {u}}(t)$, t ∈ [τ, T], in the problem $\varGamma (\bar {x}(\tau ), T-\tau )$. Then it is easy to see that the control

$$\displaystyle \begin{aligned} \hat{u}(t)=\left\{ \begin{array}{ll} \bar{u}(t),& t\in[t_0, \tau]\\ \bar{\bar{u}}(t),& t\in[\tau,T] \end{array} \right. \end{aligned}$$

is also optimal in the problem Γ(x₀, T − t₀). In other words, “any optimal continuation of the original problem in the subproblem along the optimal trajectory generates an optimal solution of the original problem.” We shall call this property strong time-consistency (strong dynamic stability) (see Fig. 1).

Consider now a slightly more complicated problem. The motion equations are the same as in (1), but the aim of control is different: it is necessary to come as close as possible to a system of points $M_1, \ldots, M_k$, $M_i \in R^n$, $i \in \{1, \ldots, k\}$.

Denote as before the problem by Γ(x ₀, T − t ₀) and by C(x ₀, T − t ₀) the reachability set of (1) and suppose that $C(x_0,T-t_0) \cap \hat {M} =\emptyset $, where $\hat {M}$ is the convex hull of points {M ₁, …, M _k}. As optimal solution here we may consider Pareto-optimal set which coincides with arc AB, the projection (suppose that C(x ₀, T − t ₀) is convex) of $\hat {M}$ on C(x ₀, T − t ₀) (see Fig. 2).

Consider Pareto-optimal control $\bar {u}(t)$, t ∈ [t ₀, T] which connects the initial point x ₀ ∈ C(x ₀, T − t ₀) with the point M belonging to the Pareto-optimal set (M belongs to the arc AB which is projection of the set $\hat {M}$ on C(x ₀, T − t ₀)). And let $\bar {x}(t)$, t ∈ [t ₀, T] be the corresponding Pareto-optimal trajectory.

Consider the subproblem $\varGamma (\bar {x}(\tau ), T-\tau )$ from the initial position $\bar {x}(\tau )$ on the Pareto-optimal trajectory. We see that the Pareto-optimal set in $\varGamma (\bar {x}(\tau ), T-\tau )$ (arc A′B′) is different from the Pareto-optimal set in Γ(x₀, T − t₀) and has (in our example) only one point M in common with it. Since M belongs to both sets, the control $\bar {u}(t)$, t ∈ [τ, T], is Pareto-optimal in the subproblem $\varGamma (\bar {x}(\tau ), T-\tau )$, and the Pareto-optimal solution $\bar {u}(t)$, t ∈ [t₀, T], is time-consistent (dynamically stable) [4, 5].

At the same time we can see that a control of the type

$$\displaystyle \begin{aligned}\hat{u}(t)=\left\{ \begin{array}{ll} \bar{u}(t),& t\in[t_0, \tau]\\ \bar{\bar{u}}(t),& t\in[\tau,T], \end{array} \right.\end{aligned}$$

where $\bar {\bar {u}}(t)$ is an arbitrary Pareto-optimal control in subproblem $\varGamma (\bar {x}(\tau ), T-\tau )$, may not be Pareto-optimal in Γ(x ₀, T − t ₀).

This means that an optimal continuation of the motion in the subproblem with initial conditions on the Pareto-optimal trajectory, taken together with the initial Pareto-optimal motion, may fail to be Pareto-optimal in the original problem. In other words, the Pareto-optimal solution is time-consistent but not strongly time-consistent (see Fig. 2).
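The loss of Pareto-optimality after a switch can be checked numerically on a toy instance. The sketch below is not taken from the chapter: it assumes the hypothetical dynamics ẋ = u, |u| ≤ 1 in the plane (so that every reachability set is a disc), two hypothetical target points `M1`, `M2`, and a switch at `TAU = 0.5`. It only verifies that the end-point obtained after switching to another Pareto-optimal control of the subproblem is dominated in the original problem.

```python
import math

# Hypothetical toy instance (an assumption, not from the chapter):
# dx/dt = u, |u| <= 1, x in R^2, t in [0, 1], x0 = (0, 0).
# The reachability set from a point y with remaining time s is the disc
# of radius s centred at y.
M1, M2 = (2.0, 1.0), (2.0, -1.0)
T, TAU = 1.0, 0.5

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_point_of_disc(center, radius, target):
    """Point of the disc closest to a target lying outside the disc."""
    d = dist(center, target)
    return (center[0] + radius * (target[0] - center[0]) / d,
            center[1] + radius * (target[1] - center[1]) / d)

# Pareto-optimal control of the original problem: u = (1, 0), heading for
# (1, 0), the point of the unit disc closest to (2, 0) in the hull of M1, M2.
x_tau = (TAU, 0.0)

# In the subproblem from x_tau, steering to the point closest to M1 is
# Pareto-optimal: no other point of the small disc is as close to M1.
end_switched = closest_point_of_disc(x_tau, T - TAU, M1)

# The switched end-point lies strictly inside the original reachability disc,
inside_original_disc = math.hypot(*end_switched) < 1.0

# so a small shift to the right is still reachable from x0 and is closer
# to both targets, i.e. the switched end-point is dominated.
shifted = (end_switched[0] + 0.01, end_switched[1])
dominated = (math.hypot(*shifted) <= 1.0
             and dist(shifted, M1) < dist(end_switched, M1)
             and dist(shifted, M2) < dist(end_switched, M2))

print(end_switched, inside_original_disc, dominated)
```

In this instance both flags are true, which matches the qualitative picture described for Fig. 2.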

In this special problem there is one approach to constructing strongly time-consistent solutions on the basis of Pareto-optimal solutions. Let $t_0 < t_1 < \ldots < t_k < t_{k+1} < \ldots < t_n = T$ be a decomposition of the time interval $[t_0, T]$ with $t_{k+1} - t_k = \delta > 0$. The idea is to consider all possible outcomes which may occur if on each time interval $[t_k, t_k + \delta)$ a control u(τ) is selected that leads to one of the Pareto-optimal points in the subproblem $\varGamma (x(t_k), T - t_k)$. The resulting trajectory is in general not Pareto-optimal, and we shall call it conditionally Pareto-optimal. Denote by $P(x(t_k), t_k)$ the set of end-points of these trajectories for all possible controls selected in the described manner. It is clear that

$$\displaystyle \begin{aligned} P(x(t_0),t_0) \supset P(x(t_1),t_1) \supset \ldots \supset P(x(t_k),t_k)\supset \ldots \supset P(x(T),T).\end{aligned}$$

And the set $P(x(t_0), t_0)$ is δ-strongly time-consistent if we allow changes of controls only at the points $t_k$, k = 0, …, n.

For the system

$$\displaystyle \begin{aligned}\dot{x}=u_{1}+u_{2}+u_{3}, x(t_{0})=x_{0}\end{aligned}$$

$$\displaystyle \begin{aligned}|u_{i}|\leq 1, x\in R^{2}, t\in [t_{0}, T],\end{aligned}$$

the set $P(x(t_0), t_0)$ is denoted by $\hat {D}$ in Fig. 3 (dashed region).

1.1 Cooperative Differential Game

Consider now a cooperative differential game with player set N = {1, …, n}. The motion equations have the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \dot x=f(x, u_1, \ldots, u_n), \quad x\in R^n, \quad u_i\in U_i\subset Comp R^l \end{array} \end{aligned} $$

(2)

$$\displaystyle \begin{aligned} \begin{array}{rcl} x(t_0)=x_0 \end{array} \end{aligned} $$

(3)

and the payoffs of the players are defined as

$$\displaystyle \begin{aligned}K_i(x_0, T-t_0; u_1, \ldots, u_n) = \int_{t_0}^{T} h_i(x(t)) dt, \quad h_i>0, \ i\in N.\end{aligned}$$

Denote this game by Γ(x₀, T − t₀). The cooperative trajectory $\overline {x}(t)$, $\overline {x}(t_0)=x_0$, t ∈ [t₀, T], is the trajectory generated by controls $\overline{u}_1, \ldots, \overline{u}_n$ that maximize the sum of the payoffs:

$$\displaystyle \begin{aligned} \max_{u_1, \ldots, u_n} \sum \limits_{i=1}^n K_i(x_0, T-t_0; u_1, \ldots, u_n)= \sum\limits_{i=1}^n K_i(x_0, T-t_0; \overline{u}_1, \ldots, \overline{u}_n)= \end{aligned}$$

$$\displaystyle \begin{aligned} =\sum_{i=1}^n \int_{t_0}^{T} h_i(\overline{x}(t)) dt= v(x_0, T-t_0; N). \end{aligned} $$

(4)

We suppose that the maximum in (4) is attained. Let v(x₀, T − t₀; S), S ⊂ N, be the characteristic function defined in the classical way as the value of the zero-sum game between coalition S as the first player and N∖S as the second (see [6]), and let E(x₀, T − t₀) be the set of imputations

$$\displaystyle \begin{aligned} E(x_0, T-t_0)= \lbrace\xi=\lbrace \xi_i \rbrace : \sum\limits_{i=1}^n \xi_i=v(x_0, T-t_0; N), \end{aligned}$$

$$\displaystyle \begin{aligned} \xi_i\geq v(x_0, T-t_0;\lbrace i\rbrace ), i\in N\rbrace. \end{aligned} $$

(5)

Denote by C(x₀, T − t₀) the reachability set of system (2), (3). For y ∈ C(x₀, t − t₀), t ∈ [t₀, T], define a subgame Γ(y, T − t) of Γ(x₀, T − t₀) with characteristic function v(y, T − t; S), S ⊂ N, and imputation set E(y, T − t).

An optimality principle (solution) is a subset of the imputation set

$$\displaystyle \begin{aligned}C(y, T-t)\subset E(y, T-t)\end{aligned}$$

(core, NM-solution, …). From here on the symbol C denotes this solution set, not the reachability set.

Consider the family of subgames $\varGamma (\bar {x}(t), T-t)$ along the cooperative trajectory, together with the imputation sets $E(\overline {x}(t), T-t)$ and the solutions $C(\overline {x}(t), T-t)$ of these subgames.

For each ξ ∈ C(x₀, T − t₀) define an imputation distribution procedure (IDP) [3] as a function β(t) = (β₁(t), …, β_i(t), …, β_n(t)) such that

$$\displaystyle \begin{aligned} \xi=\int_{t_0}^{T} \beta(\tau) d\tau, \ \xi \in C(x_0, T-t_0).\end{aligned}$$

The imputation ξ ∈ C(x₀, T − t₀) is called dynamically stable (time-consistent) [3,4,5] if

$$\displaystyle \begin{aligned}\xi - \int_{t_0}^{t} \beta(\tau) d\tau \in C(\overline{x}(t), T-t), \ t\in[t_0, T].\end{aligned}$$

Definition 1

The solution C(x ₀, T − t ₀) is called time-consistent if all imputations ξ ∈ C(x ₀, T − t ₀) are time-consistent.

Definition 2

The optimality principle C(x₀, T − t₀) is called strongly dynamically stable [11] (strongly time-consistent) if for each ξ ∈ C(x₀, T − t₀) there exists an IDP β(τ) such that

$$\displaystyle \begin{aligned}\int_{t_0}^{t} \beta(\tau) d\tau \oplus C(\overline{x}(t), T-t) \subset C(x_0, T-t_0), \ t\in[t_0, T],\end{aligned}$$

where a ⊕ B (a ∈ Rⁿ, B ⊂ Rⁿ) is defined as {a + b : b ∈ B}.

It is well known that cooperative solutions taken from classical one-shot game theory are time-consistent only in special cases, so strong time-consistency is clearly an even rarer property. Note that strong time-consistency makes sense only for multivalued (set-valued) optimality principles (core, NM-solution).

1.2 Transformation of Characteristic Function

Let v(y, T − t;S) be characteristic function in Γ(y, T − t). Define the following integral transformation

$$\displaystyle \begin{aligned}\overline{v} (x_0,T-t_0; S)= \int_{t_0}^{T} \frac {v(\overline{x}(t), T-t; S)\sum\limits_{i\in N} h_i(\overline{x}(t))}{v(\overline{x}(t), T-t; N)} dt,\end{aligned}$$

where $v(\bar {x}(t), T-t;S)$ is the characteristic function computed for the subgame $\varGamma (\bar {x}(t), T-t)$ along the cooperative trajectory. For S = N the ratio under the integral equals 1, so by (4)

$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; N)= v(x_0, T-t_0; N).\end{aligned}$$
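As a numerical illustration, the transformation can be approximated by a quadrature rule once the characteristic function along the cooperative trajectory is known. The following is a minimal sketch under stated assumptions: the function name `transformed_cf` and the toy data `h_total`, `v_N`, `v_S` are hypothetical and not from the chapter, and the midpoint rule is chosen only to avoid evaluating the ratio at t = T, where numerator and denominator both vanish.

```python
def transformed_cf(v_S, v_N, h_total, t0, T, steps=1000):
    """Midpoint-rule approximation of the integral transformation of v(.; S).

    v_S(t), v_N(t): values v(x(t), T-t; S) and v(x(t), T-t; N) along the
    cooperative trajectory; h_total(t): sum over players of h_i(x(t)).
    """
    dt = (T - t0) / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * dt  # midpoint of the k-th cell, never equal to T
        total += v_S(t) / v_N(t) * h_total(t) * dt
    return total

# Hypothetical toy data on [0, 1] (assumptions for illustration only).
h_total = lambda t: 2.0                 # total instantaneous payoff
v_N = lambda t: 2.0 * (1.0 - t)         # integral of h_total from t to 1
v_S = lambda t: 0.5 * (1.0 - t) ** 2    # value of some coalition S

v_bar_N = transformed_cf(v_N, v_N, h_total, 0.0, 1.0)
v_bar_S = transformed_cf(v_S, v_N, h_total, 0.0, 1.0)
print(v_bar_N, v_bar_S)
```

With these toy data the first value approximates `v_N(0) = 2`, in line with the identity for the grand coalition, while the transformed value of S approximates 0.25.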

Define the imputation set $\overline {E}(x_0, T-t_0)$ and the core $ \overline {C}(x_0, T-t_0) \subset \overline {E}(x_0, T-t_0)$ under the new characteristic function $\overline {v}(x_0, T-t_0; S)$, and define the integral transformation of the imputation ξ ∈ E(x₀, T − t₀) to $\bar {\xi }\in \overline {E}(x_0,T-t_0)$ as

$$\displaystyle \begin{aligned}\bar{\xi}_i=\int_{t_0}^{T}\frac{\xi_i(t)\displaystyle\sum_{j\in N}h_j(\bar{x}(t))}{v(\bar{x}(t),T-t;N)}dt, \ i\in N,\end{aligned} $$

where $\xi (t) \in E(\bar {x}(t),T-t)$. Similarly, let $\overline {E}(\bar {x}(t), T-t)$ and $\overline {C}(\bar {x}(t), T-t)$ be the set of imputations and the core in the subgame $\varGamma (\bar {x}(t), T-t)$ along the cooperative trajectory under the characteristic function

$$\displaystyle \begin{aligned}\overline{v} (\bar{x}(t),T-t; S)= \int_{t}^{T} \frac {v(\overline{x}(\tau), T-\tau; S)\sum\limits_{i\in N} h_i(\overline{x}(\tau))}{v(\overline{x}(\tau), T-\tau; N)} d\tau.\end{aligned}$$

Theorem 1

$\overline {C}(x_0, T-t_0)$ is strongly time-consistent.

To prove this it is sufficient to take, for each $\bar {\xi }\in \overline {E}(x_0,T-t_0)$, the IDP with components

$$\displaystyle \begin{aligned}\beta_i(t)=\frac{\xi_i(t)\sum\limits_{j\in N} h_j(\overline{x}(t))}{v(\overline{x}(t), T-t; N)},\end{aligned}$$

where $\xi (t) \in C(\overline {x}(t), T-t)$ is an integrable selector from $C(\overline {x}(t), T-t)$.
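The omitted argument can be sketched as follows. This is a reconstruction from the definitions above rather than the chapter's own proof, and it assumes that $v(\bar{x}(t), T-t; N) > 0$ for t < T so that β is well defined. For a coalition S ⊂ N and a selector $\xi(t) \in C(\bar{x}(t), T-t)$, the core inequalities together with $h_j > 0$ give

$$\displaystyle \begin{aligned}\sum_{i\in S}\beta_i(t) = \frac{\sum\limits_{i\in S}\xi_i(t)\sum\limits_{j\in N} h_j(\bar{x}(t))}{v(\bar{x}(t), T-t; N)} \geq \frac{v(\bar{x}(t), T-t; S)\sum\limits_{j\in N} h_j(\bar{x}(t))}{v(\bar{x}(t), T-t; N)},\end{aligned}$$

with equality for S = N. Integrating over $[t_0, t]$ and adding any $\eta \in \overline{C}(\bar{x}(t), T-t)$, for which $\sum_{i\in S}\eta_i \geq \overline{v}(\bar{x}(t), T-t; S)$, yields

$$\displaystyle \begin{aligned}\sum_{i\in S}\left(\int_{t_0}^{t}\beta_i(\tau)d\tau + \eta_i\right) \geq \int_{t_0}^{t}\frac{v(\bar{x}(\tau), T-\tau; S)\sum\limits_{j\in N} h_j(\bar{x}(\tau))}{v(\bar{x}(\tau), T-\tau; N)}d\tau + \overline{v}(\bar{x}(t), T-t; S) = \overline{v}(x_0, T-t_0; S),\end{aligned}$$

again with equality for S = N. Hence $\int_{t_0}^{t}\beta(\tau)d\tau \oplus \overline{C}(\bar{x}(t), T-t) \subset \overline{C}(x_0, T-t_0)$, which is the inclusion required in Definition 2.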

What is the connection between $\overline {C}$ and C? If the intersection of $\overline {C}$ and C is nonempty, then this intersection could serve as a preferable optimality principle in Γ(x₀, T − t₀). Introduce

$$\displaystyle \begin{aligned}\lambda(S)=\max\limits_{t_0\leq t\leq T} \frac{v(\overline{x}(t), T-t; S)}{v(\overline{x}(t), T-t; N)},\end{aligned}$$

$$\displaystyle \begin{aligned}\lambda(N)=1.\end{aligned} $$

We have

$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; S) \leq \lambda(S)\int_{t_0}^{T} \sum\limits_{i\in N} h_i(\overline{x}(t)) dt = \lambda(S)v(x_0, T-t_0; N),\end{aligned}$$

$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; N) = \lambda(N)v(x_0, T-t_0; N)= v(x_0, T-t_0; N),\end{aligned}$$

$$\displaystyle \begin{aligned}\lambda(S)\geq \frac{v(x_0, T-t_0; S)}{v(x_0, T-t_0; N)},\end{aligned}$$

$$\displaystyle \begin{aligned}v(x_0, T-t_0; S)\leq \lambda(S)v(x_0, T-t_0; N).\end{aligned}$$

Denote by $\hat {C}(x_0, T-t_0)$ the set of all vectors $\xi = \{\xi_1, \ldots, \xi_n\}$ satisfying

$$\displaystyle \begin{aligned}\sum\limits_{i\in S} \xi_i \geq \lambda(S) v(x_0, T-t_0; N), \ S\subset N, \ \sum\limits_{i\in N} \xi_i = v(x_0, T-t_0; N).\end{aligned}$$

From previous considerations it follows

$$\displaystyle \begin{aligned}\sum\limits_{i\in S} \xi_i \geq \lambda(S) v(x_0, T-t_0; N)\geq v(x_0, T-t_0; S).\end{aligned}$$

We see that

$$\displaystyle \begin{aligned}\hat{C}(x_0, T-t_0)\subset C(x_0, T-t_0) \cap \overline{C}(x_0, T-t_0)\end{aligned}$$

and

$$\displaystyle \begin{aligned}\hat{C}(\overline{x}(t), T-t)\subset C(\overline{x}(t), T-t)\cap \overline{C}(\overline{x}(t), T-t).\end{aligned}$$
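The construction of λ(S) and the inclusion of $\hat{C}$ in the original core can be illustrated on sampled data. The sketch below is only an illustration under assumptions: the three-player ratios in `ratio`, the time grid, and the helper names `lam` and `in_set` are hypothetical, and the maximum over [t₀, T] is approximated by a maximum over the grid. It checks the inequalities defining $\hat{C}$ and C only, not membership in $\overline{C}$.

```python
from itertools import combinations

def coalitions(players):
    """All nonempty proper coalitions of the player set."""
    return [frozenset(c) for r in range(1, len(players))
            for c in combinations(players, r)]

def lam(ratio_S, grid):
    """lambda(S): maximum over the time grid of v(.; S) / v(.; N)."""
    return max(ratio_S(t) for t in grid)

def in_set(xi, lower_bound, total, players, tol=1e-9):
    """Check sum_{i in S} xi_i >= lower_bound(S) for all S, plus efficiency."""
    if abs(sum(xi[i] for i in players) - total) > tol:
        return False
    return all(sum(xi[i] for i in S) >= lower_bound(S) - tol
               for S in coalitions(players))

# Hypothetical three-player data (assumptions, not from the chapter):
# the ratio v(x(t), T-t; S) / v(x(t), T-t; N) depends only on |S| and t.
players = (0, 1, 2)
grid = [k / 100 for k in range(101)]            # t in [0, 1]
ratio = {1: lambda t: 0.1 + 0.1 * t, 2: lambda t: 0.5 + 0.1 * t}
v_N0 = 6.0                                      # v(x0, T - t0; N)

lam_bound = lambda S: lam(ratio[len(S)], grid) * v_N0   # bounds defining C-hat
core_bound = lambda S: ratio[len(S)](0.0) * v_N0        # v(x0, T - t0; S)

xi_a = {0: 2.0, 1: 2.0, 2: 2.0}
xi_b = {0: 1.0, 1: 2.0, 2: 3.0}
results = [(in_set(x, lam_bound, v_N0, players),
            in_set(x, core_bound, v_N0, players)) for x in (xi_a, xi_b)]
print(results)
```

In this toy instance `xi_a` satisfies both systems of inequalities, while `xi_b` belongs to the original core but violates the stronger bounds with λ(S), so the inclusion can be strict.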

The following theorem holds.

Theorem 2

$$\displaystyle \begin{aligned}\overline{C}(x_0, T-t_0)\supset \displaystyle\int_{t_0}^{t}\displaystyle \frac{\xi(\tau)\displaystyle\sum_{i=1}^n h_i(\bar{x}(\tau))}{v(\bar{x}(\tau),T-\tau;N)}\,d\tau \oplus \hat{C}(\bar{x}(t), T-t)\end{aligned} $$

(6)

for any integrable selector $\xi(\tau) \in C(\bar{x}(\tau), T-\tau)$.

Proof

Theorem 2 follows from the inclusion $\hat {C}(\bar {x}(t), T-t) \subset \bar {C}(\bar {x}(t), T-t)$ and strongly time-consistency of $\bar {C}(x_0,T-t_0)$.

From Theorem 2 it follows that for each imputation $\xi _0\in C(x_0,T-t_0)\cap \hat {C}(x_0,T-t_0)$ there exists an IDP

$$\displaystyle \begin{aligned}\beta(t)=\displaystyle \frac{\xi(t)\displaystyle\sum_{i=1}^n h_i(\bar{x}(t))}{v(\bar{x}(t), T-t;N)},\end{aligned}$$

where ξ(t ₀) = ξ ₀ and ξ(t) is an integrable selector from $C(\bar {x}(t), T-t)$, such that

$$\displaystyle \begin{aligned} \displaystyle\int_{t_0}^{t}\displaystyle \beta (\tau)d\tau \oplus \hat{C}(\bar{x}(t), T-t)\subset \bar{C}(x_0, T-t_0). \end{aligned} $$

(7)

□

Suppose that $ \hat {C} (x_0, T-t_0)\neq \emptyset $. The interpretation of (7) is as follows. The set $\hat {C}(x_0,T-t_0)$ is a subset of the original core C(x₀, T − t₀), and for any imputation $\xi \in \hat {C} (x_0, T-t_0)\cap C(x_0,T-t_0)$ from this subset one can construct an IDP (imputation distribution procedure) with the following property: if at an intermediate time instant t the players for some reason wish to switch to another optimal imputation $(\xi ^t)'\in \hat {C}(\bar {x}(t), T-t)\subset C(\bar {x}(t), T-t)$ from the corresponding subset of the core of the subgame, they will still receive payments according to an imputation from $\bar {C}(x_0, T-t_0)$, which results from the integral transformation of C(x₀, T − t₀).

2 Repeated Games

Folk theorems are well known in game theory [1, 2, 6,7,8,9]. Using so-called punishment strategies, they show that outcomes which are preferable in some sense can be attained. These outcomes are stable against deviations of single players. A natural question arises: is it possible to obtain “good” outcomes that are stable against deviations of coalitions (coalition-proofness)? We now construct a mechanism, based on the introduction of an analog of the characteristic function, which makes it possible (under some conditions on this newly defined characteristic function) to obtain coalition-proofness for repeated and multistage games [9]. This will show us a way of constructing strongly time-consistent optimality principles in multistage games.

Denote by G the infinitely repeated n-person game with the game Γ played at each stage. For simplicity suppose that the stage game Γ is finite (has finite sets of strategies):

$$\displaystyle \begin{aligned} \varGamma= \langle N; U_1,\ldots,U_i, \ldots,U_n; K_1,\ldots,K_i,\ldots,K_n \rangle.\end{aligned}$$

If at stage k (1 ≤ k < ∞) the strategy profile $u^k=(u_1^k,\ldots ,u_i^k,\ldots ,u_n^k)$ is chosen, the payoff in G is defined as

$$\displaystyle \begin{aligned} \begin{array}{l} {} H_i(u_1(\cdot),\ldots,u_i(\cdot),\ldots,u_n(\cdot)) =\displaystyle\sum_{k=1}^{\infty} \delta^{k-1}K_i(u_1^k,\ldots,u_i^k,\ldots,u_n^k)= \\ =\displaystyle\sum_{k=1}^{\infty} \delta^{k-1}K_i(u^k)=H_i(u(\cdot)), \ i \in N, \end{array} \end{aligned} $$

(8)

where $u_1(\cdot )= (u_1^1,\ldots ,u_1^k,\ldots )$, …, $u_i(\cdot )=(u_i^1,\ldots , u_i^k,\ldots )$, …, $u_n(\cdot )=(u_n^1,\ldots ,u_n^k,\ldots )$, and δ ∈ (0, 1).

In the expression $u_i(\cdot )=(u_i^1,\ldots , u_i^k,\ldots )$, i ∈ N, $u_i^k$ is the strategy chosen by player i in the game Γ at stage k. We suppose that at stage k, when choosing $u_i^k$, player i knows the choices of the other players and remembers his own choices at the previous stages. Thus $u_i^k$ is a function of the history

$$\displaystyle \begin{aligned}h^k= (u_1^1,\ldots,u_1^{k-1};\ldots; u_i^1,\ldots, u_i^{k-1};\ldots; u_n^1,\ldots, u_n^{k-1}). \end{aligned}$$

Formally we have to write $ u^k_i(h^k)$, i.e. $u_i^k$ depends upon the history $h^k$, k = 1, 2, …. However, in this paper for convenience we shall write $u_i^k$ instead of $ u^k_i(h^k)$.

Consider the strategy profile $\bar {u}(\cdot )= (\bar {u}_1(\cdot ),\ldots , \bar {u}_i(\cdot ),\ldots , \bar {u}_n(\cdot ))$ such that

$$\displaystyle \begin{aligned} \sum_{i \in N} H_i(\bar{u})= \max_{u(\cdot)} \sum _{i\in N} H_i(u). \end{aligned} $$

(9)

It is evident that such a strategy profile always exists.

One can take $\bar {u}_i(\cdot )= (\bar {u}_i^1,\ldots , \bar {u}_i^k, \ldots )$, i ∈ N, where the stage profile $(\bar {u}_1,\ldots , \bar {u}_n)$ is such that

$$\displaystyle \begin{aligned} \displaystyle \sum_{i\in N} K_i(\bar{u}_1,\ldots, \bar{u}_i, \ldots, \bar{u}_n) =\displaystyle \max_{u_1,\ldots, u_i, \ldots, u_n}\sum_{i\in N} K_i(u_1,\ldots,u_i,\ldots,u_n) \end{aligned} $$

(10)

and, since the stage games are the same (G is a repeated game), set $\bar {u}_i^k=\bar {u}_i$ for all k = 1, 2, …. Then from (8)–(10) we get that

$$\displaystyle \begin{aligned} \begin{array}{l} {} \displaystyle\sum_{i \in N} H_i(\bar{u})= \displaystyle\sum_{i \in N} \left(\sum_{k=1}^{\infty} \delta^{k-1}K_i(\bar{u}_1^k,\ldots, \bar{u}_n^k) \right)=\\ =\displaystyle \sum_{i \in N} \left(\sum_{k=1}^{\infty} \delta^{k-1}K_i(\bar{u}_1,\ldots, \bar{u}_n) \right) =\displaystyle \frac{1}{1-\delta} \sum_{i \in N} K_i(\bar{u}_1,\ldots, \bar{u}_n). \end{array}\end{aligned} $$

(11)
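The geometric-series step in (11) is easy to confirm numerically. The snippet below is a small sketch with hypothetical numbers (`delta`, `stage_total`, and the truncation horizon are assumptions): it compares a truncated discounted sum of a constant stage payoff with the closed form 1/(1 − δ) times that payoff.

```python
def discounted_total(stage_payoffs, delta, horizon):
    """Truncated discounted sum: sum_{k=1}^{horizon} delta^(k-1) * stage_payoffs(k)."""
    return sum(delta ** (k - 1) * stage_payoffs(k) for k in range(1, horizon + 1))

# Hypothetical numbers (assumptions for illustration only).
delta = 0.9
stage_total = 6.0   # sum over players of K_i at the cooperative stage profile

truncated = discounted_total(lambda k: stage_total, delta, horizon=500)
closed_form = stage_total / (1.0 - delta)
print(truncated, closed_form)
```

The neglected tail is of order δ raised to the horizon, so for a horizon of 500 stages the two numbers agree up to rounding.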

Introduce the characteristic function V(S), S ⊂ N, in Γ in the classical sense. Then we have

$$\displaystyle \begin{aligned} V(N) = \sum_{i\in N} K_i(\bar{u}_1, \ldots, \bar{u}_n) \end{aligned} $$

(12)

and it can be easily shown that the characteristic function W(S), S ⊂ N in G will have the form

$$\displaystyle \begin{aligned} W(S)= \frac{1}{1-\delta}V(S), \ S\subset N. \end{aligned} $$

(13)

Recall now the definition of a strong (or coalition-proof) Nash equilibrium.

Definition 3

The n-tuple of strategies $(\hat {u}_1,\ldots, \hat {u}_i, \ldots, \hat {u}_n)=\hat {u}$ is called a strong (or coalition-proof) Nash equilibrium (SNE) if for all S ⊂ N and all $u_S = \{u_i, i \in S\}$ the following inequality holds

$$\displaystyle \begin{aligned} \sum_{i\in S} K_i(\hat{u}) \geq \sum_{i\in S} K_i(\hat{u}|| u_S). \end{aligned} $$

(14)
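For a small finite stage game, inequality (14) can be checked by brute force. The sketch below is an illustration under assumptions, not the chapter's construction: it considers pure strategies only, reads $(\hat{u}\|u_S)$ as the profile in which the players from S switch to $u_S$ while the others keep $\hat{u}$, includes S = N among the coalitions, and uses two hypothetical 2×2 games as data. The function name `is_sne` is likewise hypothetical.

```python
from itertools import combinations, product

def is_sne(profile, strategy_sets, payoff, tol=1e-12):
    """Pure-strategy check of inequality (14) for every nonempty coalition S.

    profile: tuple with one strategy per player; strategy_sets: list of
    strategy lists; payoff(profile) -> tuple of payoffs K_1, ..., K_n.
    """
    n = len(profile)
    base = payoff(profile)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            for dev in product(*(strategy_sets[i] for i in S)):
                deviated = list(profile)
                for i, u_i in zip(S, dev):
                    deviated[i] = u_i
                new = payoff(tuple(deviated))
                if sum(new[i] for i in S) > sum(base[i] for i in S) + tol:
                    return False  # coalition S gains by deviating
    return True

# Hypothetical 2x2 stage games (assumptions for illustration only).
prisoners_dilemma = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
                     ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
coordination = {("A", "A"): (2, 2), ("A", "B"): (0, 0),
                ("B", "A"): (0, 0), ("B", "B"): (1, 1)}

pd_sne = [p for p in prisoners_dilemma
          if is_sne(p, [["C", "D"]] * 2, prisoners_dilemma.get)]
coord_sne = [p for p in coordination
             if is_sne(p, [["A", "B"]] * 2, coordination.get)]
print(pd_sne, coord_sne)
```

In the hypothetical prisoner's dilemma no pure profile passes the test (mutual defection is blocked by the grand coalition, cooperation by single players), which is the kind of situation the modified game constructed below is meant to address.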

Consider now the core C in Γ and suppose that C ≠ ∅. Suppose also that there exists an imputation α ∈ C such that

$$\displaystyle \begin{aligned} \sum_{i\in S} \alpha _i> V(S),\ S \subset N, \ S \ne N. \end{aligned} $$

(15)

2.1 Associated Zero-Sum Games

Consider a family of zero-sum games $\varGamma _{N\backslash i,i}$ with coalition N∖{i} as the first player and coalition {i} as the second. The payoff of N∖{i} is equal to the sum of the payoffs of the players from N∖{i}. Denote by V(N∖i) the value of $\varGamma _{N\backslash i,i}$, and let $(\bar {\mu }_{N \backslash i},\bar {\mu }_{i} )$ be a saddle point (in mixed strategies) in $\varGamma _{N\backslash i,i}$.

Consider the n-tuple of strategies $\bar {\mu }=(\bar {\mu }_1,\ldots ,\bar {\mu }_n)$, and define

$$\displaystyle \begin{aligned}\overline{W}(S)=\max_{\mu_S}\sum_{i\in S} K_i(\mu_S; \bar{\mu}_{N\backslash S}),\end{aligned}$$

here μ _S = {μ _i, i ∈ S}, $\bar {\mu }_{N\backslash S} =\{\bar {\mu }_i, \ i\in N\backslash S\}$. It is clear that

$$\displaystyle \begin{aligned}\overline{W}(S)\geq V(S), \ \overline{W}(N)=V(N), \ S\subset N. \end{aligned}$$

Suppose that there exists a solution of the system

$$\displaystyle \begin{aligned} \sum_{i\in S} \alpha _i> \overline{W}(S),\ \sum_{i\in N} \alpha _i=\overline{W}(N)=V(N). \end{aligned} $$

(16)

Construct now the modification $G^\alpha$ of the game G. The games $G^\alpha$ and G differ only in the payoffs of the stage games Γ when the cooperative strategies $\bar {u}=(\bar {u}_1, \ldots , \bar {u}_n)$ are used: in this case the payoff is equal to α = (α₁, …, α_n), where α satisfies (16). For all other strategy combinations the payoffs remain as in Γ.

The following theorem holds [10].

Theorem 3

In the game $G^\alpha$ there exist δ ∈ (0, 1) and an SNE such that the payoffs in this SNE are equal to $\alpha _i\displaystyle \frac {1}{1-\delta }$, which are the payoffs in $G^\alpha$ under cooperation.

2.2 Multistage Games

The multistage game G starts from a fixed stage game $\varGamma (z_1)$, which can be considered as situated at the position (root) $z_1$ of the game tree G:

$$\displaystyle \begin{aligned} \varGamma(z_1) = <N; U_1^{z_1},\ldots,U_i^{z_1}, \ldots,U_n^{z_1}; K_1^{z_1},\ldots,K_i^{z_1},\ldots,K_n^{z_1}>. \end{aligned} $$

(17)

For simplicity we suppose that the set of players N is the same in all stage games. As the game G develops, an infinite sequence of stage games is realized, but only a finite number of them are different, since we suppose that the total number of different stage games Γ(z) is finite. As usual in multistage games, we consider the general case when the next stage game depends only upon the controls chosen by the players in the previous stage game. As in the previous section, denote by $u_i(\cdot )$ the strategy of player i in G (defined as a function of histories). The strategy profile which maximizes the sum of the players' payoffs in G is called the “cooperative” strategy profile, and the corresponding sequence of stage games (or, equivalently, sequence of positions on the tree G) is called the “cooperative trajectory.” Suppose that for each stage game Γ(z) the characteristic function V(z, S) (in the classical sense) is defined.

For each stage game Γ(z) consider the family of zero-sum games $\varGamma _{N\backslash i,i}(z)$ and the corresponding saddle points $(\bar {\mu }^z_{N\backslash i}, \bar {\mu }^z_i)$. For $\bar {\mu }^z=(\bar {\mu }_1^z,\ldots , \bar {\mu }_n^z)$ define

$$\displaystyle \begin{aligned}\overline{W}(z,S)=\max_{\mu_S^z}\sum_{i\in S}K_i^z(\mu_S^z, \bar{\mu}^z_{N\backslash S}).\end{aligned}$$

Let

$$\displaystyle \begin{aligned}\overline{W}(S)=\sup_{z}\overline{W}(z,S).\end{aligned}$$

Suppose that

$$\displaystyle \begin{aligned}\overline{W}(S)<\inf_{z}\overline{W}(z,N)=\inf_z V(z,N).\end{aligned}$$

Suppose the core C(z) is not empty in each stage game Γ(z), and denote by D(z) the subcore of C(z), defined as the set of all imputations $\alpha ^z=(\alpha ^z_1,\ldots , \alpha _n^z)$ such that $\displaystyle \sum _ {i\in S} \alpha _i^z \geq \overline {W}(S)$ for all S.

Suppose that D(z) ≠ ∅ for all z ∈ G, and suppose also that there exists an imputation $\alpha ^z=(\alpha ^z_1,\ldots , \alpha _n^z)$ such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sum_ {i\in S} \alpha_i^z > \overline{W}(S)\ \mbox{for }\ \mbox{all}\ S, \quad \quad \end{array} \end{aligned} $$

(18)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \inf_{ S,z} \left[\sum_ {i\in S} \alpha_i^z - \overline{W}(S)\right] = A >0. \end{array} \end{aligned} $$

(19)

For simplicity we shall consider the special case when $V(z,N) = \overline {W}(N)$ for all z. Then conditions (18) and (19) can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_ {i\in S} \alpha_i >\overline{W}(S)\ \mbox{for }\ \mbox{all}\ S, \quad \quad \end{array} \end{aligned} $$

(20)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \inf_{ S} \left[\sum_ {i\in S} \alpha_i -\overline{W}(S)\right] = A >0, \end{array} \end{aligned} $$

(21)

since the number of different stage games is finite and the same α can be selected in all stage games.

Construct now the modification $G^\alpha$ of the game G in the same way as in Sect. 2.1. Theorem 3 from Sect. 2.1 also holds for the game $G^\alpha$.

Theorem 4

In the game $G^\alpha$ there exist δ ∈ (0, 1) and an SNE such that the payoffs in this SNE are equal to $\alpha _i\frac {1}{1-\delta }$, which are the payoffs in $G^\alpha$ under cooperation.

2.3 Time-Consistency and Strongly Time-Consistency

Consider the cooperative version of the game G and of the subgame G(z). Introduce the following characteristic function in G and in G(z), respectively,

$$\displaystyle \begin{aligned}\hat{W} (S) =\frac{1}{1-\delta} \bar{W}(S).\end{aligned}$$

Denote by $\hat {C}$ and $\hat {C}(z)$ the analogs of the core in G and G(z) under the characteristic function defined above.

Strong time-consistency in this case means that for each imputation $\bar {\alpha }\in \hat {C}(\bar {z}_0)$ there exists a corresponding IDP $\bar {\beta }(0), \bar {\beta }(1), \ldots , \bar {\beta }(l), \ldots $ such that

$$\displaystyle \begin{aligned} \sum_{k=0}^l\delta^k\bar{\beta}(k) \oplus\delta^{l+1}\hat{C}(\bar{z}_{l+1})\subset \hat{C}(\bar{z}_0). \end{aligned} $$

(22)

It can be easily seen that if D(z) = D ≠ ∅ for all z, then by selecting $\bar {\beta }(k)=\bar {\beta }\in D(\bar {z}_k)$ we can guarantee the strong time-consistency of $\hat {C}(\bar {z}_0)$.

Suppose $\bar {\alpha }\in \hat {C}(\bar {z}_0)$; then by definition we have

$$\displaystyle \begin{aligned} \sum_{i\in S}\bar{\alpha}_i\geq \hat{W}(S)=\frac{1}{1-\delta} \bar{W} (S); \ \sum_{i\in N}\bar{\alpha}_i= \hat{W}(N)=\frac{1}{1-\delta} \bar{W} (N).\end{aligned}$$

Represent $\bar {\alpha }$ in the form

$$\displaystyle \begin{aligned} \bar{\alpha}=\sum_{k=0}^{\infty}\delta^k\bar{\beta}=\frac{1}{1-\delta}\bar{\beta},\end{aligned}$$

that is, $\bar {\beta }=(1-\delta )\bar {\alpha }$. Since $\bar {\alpha } \in \hat {C}(\bar {z}_0)$,

$$\displaystyle \begin{aligned}\sum_{i\in S}\bar{\alpha}_i=\sum_{i\in S}\frac{1}{1-\delta}\bar{\beta}_i\geq \hat{W}(S) = \frac{1}{1-\delta}\bar{W}(S),\end{aligned}$$

and

$$\displaystyle \begin{aligned}\sum_{i\in S}\bar{\beta}_i\geq \bar{W}(S), \ \sum_{i\in N}\bar{\beta}_i=\bar{W}(N).\end{aligned}$$

Thus $\bar {\beta }\in D(\bar {z}_k)=D$, k = 0, 1, …, l, …. And we get that each imputation $\bar {\alpha }\in \hat {C}(\bar {z}_0)$ can be represented in the form $\bar {\alpha }=\displaystyle \sum _{k=0}^{\infty }\delta ^k\bar {\beta }(k)$, when $\bar {\beta }(k)=\bar {\beta }\in D(\bar {z}_k)=D$.

This will give us also strongly time-consistency of $\hat {C}(\bar {z}_0)$.

We have seen that for an arbitrary $\bar {\alpha }\in \hat {C}(\bar {z}_0)$ there exists an IDP $\bar {\beta }(0), \bar {\beta }(1), \ldots , \bar {\beta }(k), \ldots $ (in our case $\bar {\beta }(k)=\bar {\beta }\in D$) such that

$$\displaystyle \begin{aligned}\bar{\alpha}= \sum_{k=0}^{\infty}\delta^k\bar{\beta}(k).\end{aligned}$$

Suppose that $\alpha '\in \displaystyle \sum _{k=0}^{l}\delta ^k\bar {\beta }(k)\oplus \delta ^{l+1}\hat {C}(\bar {z}_{l+1})$. To prove (22) we have to prove that in this case $\alpha '\in \hat {C}(\bar {z}_0)$. Considering stage l, we can write the imputation α′ in the form

$$\displaystyle \begin{aligned}\alpha'=\displaystyle\sum_{k=0}^{l}\delta^k\bar{\beta}(k)+ \delta^{l+1}\alpha^{\prime\prime},\end{aligned}$$

where $\bar {\beta }(k)=\bar {\beta }\in D$ and $\alpha ^{\prime \prime }\in \hat {C}(\bar {z}_{l+1})$.

Since $\alpha ^{\prime \prime }\in \hat {C}(\bar {z}_{l+1})$ we have

$$\displaystyle \begin{aligned}\sum_{i\in S}\alpha^{\prime\prime}_i \geq \hat{W}(S)=\frac{1}{1-\delta}\bar{W}(S), \sum_{i\in N}\alpha^{\prime\prime}_i=\hat{W}(N)=\frac{1}{1-\delta}\bar{W}(N),\end{aligned}$$

and we can show, as in the previous case of $\bar {\alpha }\in \hat {C}(\bar {z}_0)$, that $\alpha ^{\prime \prime }$ can be represented in the form

$$\displaystyle \begin{aligned}\alpha^{\prime\prime}= \sum_{k=l+1}^{\infty}\delta^{k-(l+1)}\beta^{\prime\prime}(k),\end{aligned}$$

where $\beta ^{\prime \prime }(k) = \beta ^{\prime \prime }\in D$, k = l + 1, ….

Then we get

$$\displaystyle \begin{aligned}\alpha'= \sum_{k=0}^{l}\delta^{k}\bar{\beta}(k)+\delta^{l+1}\sum_{k=l+1}^{\infty}\delta^{k-(l+1)}\bar{\bar{\beta}}(k)=\sum_{k=0}^{\infty}\delta^{k}\tilde{\beta}(k),\end{aligned}$$

where $\tilde {\beta }(k)\in D$, $\tilde {\beta }(k)=\bar {\beta }(k)=\bar {\beta }$ for k = 0, …, l, and $\tilde {\beta }(k)=\bar {\bar {\beta }}(k)=\beta ^{\prime \prime }$ for k = l + 1, ….

And we have

$$\displaystyle \begin{aligned}\sum_{i\in S}\alpha'_i= \sum_{k=0}^{l}\delta^{k}\sum_{i\in S}\tilde{\beta}_i(k)+\sum_{k=l+1}^{\infty}\delta^{k}\sum_{i\in S}\tilde{\beta}_i(k) =\sum_{k=0}^{l}\delta^{k}\sum_{i\in S}\bar{\beta}_i(k)+\sum_{k=l+1}^{\infty}\delta^k\sum_{i\in S}\bar{\bar{\beta}}_i(k) \geq\end{aligned}$$

$$\displaystyle \begin{aligned} \geq \sum_{k=0}^l\delta^k\bar{W}(S)+\sum_{k=l+1}^{\infty}\delta^k\bar{W}(S)=\sum_{k=0}^{\infty}\delta^k\bar{W}(S)=\frac{1}{1-\delta}\bar{W}(S)=\hat{W}(S).\end{aligned}$$

In a similar way we can prove that $\displaystyle \sum _{i\in N }\alpha _i^{\prime }=\hat {W}(N)$. This proves that $\alpha '\in \hat {C}(\bar {z}_0).$
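The inclusion (22) in the case D(z) = D can also be checked on numbers. The sketch below uses hypothetical three-player data (the values in `size_value`, the stage payments `beta_bar` and `beta_new`, `delta`, and the switching stage `l` are all assumptions) and a hypothetical helper `in_scaled_core`; it builds α′ from a stage payment used up to stage l and a different element of D used afterwards, and tests the inequalities that define $\hat{C}(\bar{z}_0)$.

```python
from itertools import combinations

def in_scaled_core(x, w_bar, scale, players, tol=1e-9):
    """Check sum_{i in S} x_i >= scale * w_bar[S] for proper S, equality for S = N."""
    grand = frozenset(players)
    if abs(sum(x) - scale * w_bar[grand]) > tol:
        return False
    for r in range(1, len(players)):
        for S in combinations(players, r):
            if sum(x[i] for i in S) < scale * w_bar[frozenset(S)] - tol:
                return False
    return True

# Hypothetical data (assumptions): W-bar depends only on the coalition size.
players = (0, 1, 2)
size_value = {1: 1.0, 2: 3.0, 3: 6.0}
w_bar = {frozenset(S): size_value[r]
         for r in (1, 2, 3) for S in combinations(players, r)}

delta, l = 0.9, 4
beta_bar = (2.0, 2.0, 2.0)   # stage payment used up to stage l, an element of D
beta_new = (1.5, 2.0, 2.5)   # another element of D, used from stage l + 1 on

# alpha'' = beta_new / (1 - delta) is an imputation of the subgame.
alpha_pp = tuple(b / (1 - delta) for b in beta_new)
# alpha' = sum_{k=0}^{l} delta^k * beta_bar + delta^(l+1) * alpha''.
head = sum(delta ** k for k in range(l + 1))
alpha_prime = tuple(head * b + delta ** (l + 1) * a
                    for b, a in zip(beta_bar, alpha_pp))

checks = (in_scaled_core(beta_bar, w_bar, 1.0, players),
          in_scaled_core(beta_new, w_bar, 1.0, players),
          in_scaled_core(alpha_pp, w_bar, 1 / (1 - delta), players),
          in_scaled_core(alpha_prime, w_bar, 1 / (1 - delta), players))
print(alpha_prime, checks)
```

All four checks succeed for these data: both stage payments lie in D, α″ satisfies the inequalities with $\hat{W}$, and so does the combined imputation α′, as the argument above predicts.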

References

1. Aumann, R.J., Maschler, M.: Repeated Games with Incomplete Information. MIT Press, Cambridge (1995)
2. Myerson, R.B.: Multistage Games with Communication. Econometrica 54, 323–358 (1986)
3. Petrosyan, L.A., Danilov, N.N.: Stability of the solutions in nonantagonistic differential games with transferable payoffs. Vestnik Leningrad. Univ. Mat. Mekh. Astronom. 1, 52–59 (1979)
4. Petrosyan, L.A., Zaccour, G.: Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control 27(3), 381–398 (2003)
5. Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Economic Optimization. Birkhäuser (2012)
6. Maschler, M., Solan, E., Zamir, S.: Game Theory. Cambridge University Press (2013)
7. Aumann, R., Shapley, L.: Long-Term Competition – A Game-Theoretic Analysis. Essays in Game Theory (1994). https://doi.org/10.1007/978-1-4612-2648-2_1
8. Rubinstein, A.: Equilibrium in Supergames. Essays in Game Theory (1994). https://doi.org/10.1007/978-1-4612-2648-2_2
9. Fudenberg, D., Maskin, E.: The Folk Theorem in Repeated Games with Discounting or with Incomplete Information. Econometrica 54(3), 533–554 (1986). https://doi.org/10.2307/1911307. JSTOR 1911307
10. Petrosjan, L.A., Pankratova, Y.B.: Construction of Strong Nash Equilibria in a class of infinite nonzero-sum games. Trudy Inst. Mat. Mekh. UrO RAN. 24 (2018)
11. Petrosjan, L.A.: Strongly time-consistent differential optimality principles. Vestnik St. Petersburg Univ. Math. 26(4), 40–46 (1993)

Acknowledgement

This research was supported by the Russian Science Foundation (grant 17-11-01079).

Author information

Authors and Affiliations

St. Petersburg State University, Saint Petersburg, Russia
Leon A. Petrosyan


Editor information

Editors and Affiliations

SRS Consortium for Advanced Study, Hong Kong Shue Yan University, Hong Kong, Hong Kong
David Yeung
School of Public Finance and Taxation, Zhejiang University of Finance and Economics, Hangzhou, China
Shravan Luckraz
School of Economics, University of Nottingham Ningbo China, Ningbo, China
Chee Kian Leong


Cite this chapter

Petrosyan, L.A. (2020). Strongly Time-Consistent Solutions in Cooperative Dynamic Games. In: Yeung, D., Luckraz, S., Leong, C. (eds) Frontiers in Games and Dynamic Games. Annals of the International Society of Dynamic Games, vol 16. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-39789-0_2


DOI: https://doi.org/10.1007/978-3-030-39789-0_2
Published: 09 June 2020
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-39788-3
Online ISBN: 978-3-030-39789-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
