Abstract
This paper investigates the evolution of a dynamic game along the cooperative trajectory. At each time instant along the cooperative trajectory the players find themselves in a new game, which is a subgame of the originally defined game. In many cases the optimal solution of the initial game, restricted to the subgame along the cooperative trajectory, fails to be optimal in the subgame. To overcome this difficulty we introduced (see Petrosyan and Danilov, Vestnik Leningrad Univ Mat Mekh Astronom 1:52–59, 1979; Petrosyan and Zaccour, J Econ Dyn Control 27(3):381–398, 2003; Yeung and Petrosyan, Subgame consistent economic optimization. Birkhäuser, 2012) a special payment mechanism, the imputation distribution procedure (IDP), or payment distribution procedure (PDP). However, another serious question arises: under what conditions does the initial optimal solution, continued by any optimal solution of the subgame, remain optimal in the whole game? We call this condition the strong time-consistency condition of the optimal solution. If it is not satisfied, the players may in reality switch at some time instant from the previously selected optimal solution to some optimal solution of the subgame and, as a result, realize a solution that is not optimal in the whole game. We propose different types of strongly time-consistent solutions for multicriteria control, cooperative differential, and cooperative dynamic games.
1 What Is Strong Time-Consistency?
What is strong time-consistency? Let us try to explain this notion. Let \(M \in R^n\) be a fixed point in \(R^n\). Consider a classical control problem (with one player)
\[\dot{x} = f(x, u), \quad u \in U, \quad x(t_0) = x_0, \quad t \in [t_0, T]. \tag{1}\]
Find the control \(\bar{u}(t)\) and the corresponding trajectory \(\bar{x}(t)\) such that the distance \(\rho(\bar{x}(T), M)\) at the terminal instant is minimal.
Denote this problem by \(\varGamma(x_0, T-t_0)\), and denote by \(C(x_0, T-t_0)\) the reachability set of system (1) from the initial point \(x_0\) at the terminal time \(T\).
Suppose for simplicity that \(M \notin C(x_0, T-t_0)\). The solution of this optimal control problem is shown in Fig. 1.
Consider an intermediate time instant \(\tau \in [t_0, T]\) and the intermediate control problem \(\varGamma(\bar{x}(\tau), T-\tau)\) of duration \(T-\tau\) with the initial condition on the optimal trajectory. It is clear that the control \(\bar{u}(t)\), \(t \in [\tau, T]\), is also optimal in \(\varGamma(\bar{x}(\tau), T-\tau)\), and so is the trajectory \(\bar{x}(t)\), \(t \in [\tau, T]\).
This is Bellman's optimality principle, and it is also the time-consistency of the optimal control \(\bar{u}(t)\), \(t \in [t_0, T]\). Suppose now that we have another optimal control \(\bar{\bar{u}}(t)\), \(t \in [\tau, T]\), in the problem \(\varGamma(\bar{x}(\tau), T-\tau)\). Then it is easy to see that the control
\[\hat{u}(t) = \begin{cases} \bar{u}(t), & t \in [t_0, \tau), \\ \bar{\bar{u}}(t), & t \in [\tau, T] \end{cases}\]
is also optimal in the problem \(\varGamma(x_0, T-t_0)\). In other words, “any optimal continuation of the original problem in the subproblem along the optimal trajectory generates an optimal solution of the original problem.” We shall call this property strong time-consistency (strong dynamic stability) (see Fig. 1).
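The single-target case can be checked numerically in a minimal sketch. It assumes, purely for illustration, the simple-motion system \(\dot{x} = u\), \(|u| \le 1\) (so the reachability set is a ball of radius \(T - t_0\)); the helper names and the numbers are hypothetical and do not come from the text.

```python
import math

def min_terminal_distance(x, target, time_left):
    """Smallest reachable distance to `target` from `x` when |u| <= 1."""
    return max(0.0, math.dist(x, target) - time_left)

def move_towards(x, target, duration):
    """State reached by moving at unit speed straight at `target`."""
    d = math.dist(x, target)
    return tuple(xi + duration * (ti - xi) / d for xi, ti in zip(x, target))

x0, M, horizon = (0.0, 0.0), (3.0, 4.0), 2.0
best = min_terminal_distance(x0, M, horizon)

# Follow the optimal control up to tau, then solve the subproblem again.
continuations = []
for tau in (0.5, 1.0, 1.5):
    x_tau = move_towards(x0, M, tau)
    continuations.append(min_terminal_distance(x_tau, M, horizon - tau))

# Every optimal continuation reproduces the optimal value of the whole problem.
assert all(math.isclose(c, best) for c in continuations)
```

In this assumed system the optimal continuation from every intermediate state attains the same terminal distance as the original optimal control, which is the property described above.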
Consider now a slightly more complicated problem. The motion equations are the same as in (1), but the aim of the control is different: it is necessary to come as close as possible to a system of points \(M_1, \ldots, M_k\), \(M_i \in R^n\), \(i \in \{1, \ldots, k\}\).
Denote the problem, as before, by \(\varGamma(x_0, T-t_0)\) and the reachability set of (1) by \(C(x_0, T-t_0)\), and suppose that \(C(x_0, T-t_0) \cap \hat{M} = \emptyset\), where \(\hat{M}\) is the convex hull of the points \(\{M_1, \ldots, M_k\}\). As the optimal solution here we may consider the Pareto-optimal set, which coincides with the arc \(AB\), the projection of \(\hat{M}\) onto \(C(x_0, T-t_0)\) (we suppose that \(C(x_0, T-t_0)\) is convex; see Fig. 2).
Consider a Pareto-optimal control \(\bar{u}(t)\), \(t \in [t_0, T]\), which connects the initial point \(x_0 \in C(x_0, T-t_0)\) with a point \(M\) belonging to the Pareto-optimal set (\(M\) belongs to the arc \(AB\), the projection of the set \(\hat{M}\) onto \(C(x_0, T-t_0)\)), and let \(\bar{x}(t)\), \(t \in [t_0, T]\), be the corresponding Pareto-optimal trajectory.
Consider a subproblem \(\varGamma(\bar{x}(t), T-t)\) from the initial position \(\bar{x}(t)\) on the Pareto-optimal trajectory. We see that the Pareto-optimal set in \(\varGamma(\bar{x}(t), T-t)\) (arc \(A'B'\)) is different from the Pareto-optimal set in \(\varGamma(x_0, T-t_0)\); in our example they have only one common point, \(M\). Since \(M\) remains Pareto-optimal in the subproblem, the control \(\bar{u}(t)\), \(t \in [\tau, T]\), is Pareto-optimal in the subproblem \(\varGamma(\bar{x}(\tau), T-\tau)\), and the Pareto-optimal solution \(\bar{u}(t)\), \(t \in [t_0, T]\), is time-consistent (dynamically stable) [4, 5].
At the same time we can see that a control of the type
\[\hat{u}(t) = \begin{cases} \bar{u}(t), & t \in [t_0, \tau), \\ \bar{\bar{u}}(t), & t \in [\tau, T], \end{cases}\]
where \(\bar{\bar{u}}(t)\) is an arbitrary Pareto-optimal control in the subproblem \(\varGamma(\bar{x}(\tau), T-\tau)\), may fail to be Pareto-optimal in \(\varGamma(x_0, T-t_0)\).
This means that, in this case, an optimal continuation of the motion in the subproblem with the initial condition on the Pareto-optimal trajectory, combined with the initial Pareto-optimal motion, may fail to be Pareto-optimal in the original problem. Thus the Pareto-optimal solution is time-consistent but not strongly time-consistent (see Fig. 2).
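The loss of Pareto-optimality after a switch can be checked numerically. The sketch assumes, for illustration only, the simple-motion system \(\dot{x} = u\), \(|u| \le 1\) in the plane (so reachability sets are disks), two hypothetical targets `M1` and `M2`, and helper names that are not part of the text.

```python
import math

def radial_point(p, center, radius):
    """Point at distance `radius` from `center` in the direction of `p`."""
    d = math.dist(p, center)
    return tuple(c + radius * (pi - c) / d for pi, c in zip(p, center))

M1, M2 = (-1.0, 2.0), (1.0, 2.0)       # hypothetical targets
x0, horizon = (0.0, 0.0), 1.0          # reachability set: unit disk

# Pareto-optimal end point of the original problem: nearest point to (0, 2).
M = radial_point((0.0, 2.0), x0, horizon)

# State at tau on the straight trajectory from x0 to M.
tau = 0.5
x_tau = (0.0, tau)

# Pareto-optimal end point of the subproblem: the unique point of the
# smaller disk (radius horizon - tau around x_tau) that is nearest to M1.
A_sub = radial_point(M1, x_tau, horizon - tau)

# A_sub lies strictly inside the original reachability set ...
inside = math.dist(A_sub, x0) < horizon
# ... and pushing it radially to the boundary improves both criteria.
q = radial_point(A_sub, x0, horizon)
dominated = all(math.dist(q, Mi) < math.dist(A_sub, Mi) for Mi in (M1, M2))
assert inside and dominated
```

Here `A_sub` is Pareto-optimal in the subproblem because it is the unique minimizer of the distance to `M1`, yet the reachable point `q` is closer to both targets, so the combined control is not Pareto-optimal in the original problem.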
In this special problem there is one approach to constructing strongly time-consistent solutions on the basis of Pareto-optimal solutions. Let \(t_0 < t_1 < \ldots < t_k < t_{k+1} < \ldots < t_n = T\) be a decomposition of the time interval \([t_0, T]\) with \(t_{k+1} - t_k = \delta > 0\). The idea is to consider all outcomes that may occur if on each time interval \([t_k, t_k + \delta)\) the control \(u(\tau)\) is selected so as to lead to one of the Pareto-optimal points of the subproblem \(\varGamma(x(t_k), T-t_k)\). The resulting trajectory is in general not Pareto-optimal, and we shall call it conditionally Pareto-optimal. Denote by \(P(x(t_k), t_k)\) the set of end points of these trajectories for all controls selected in the described manner. It is clear that
\[P(x(t_{k+1}), t_{k+1}) \subset P(x(t_k), t_k), \quad k = 0, \ldots, n-1,\]
and the set \(P(x(t_0), t_0)\) is δ-strongly time-consistent if changes of controls are allowed only at the points \(t_k\), \(k = 0, \ldots, n\).
For the system
the set \(P(x(t_0), t_0)\) is denoted by \(\hat{D}\) in Fig. 3 (dashed region).
1.1 Cooperative Differential Game
Consider now a cooperative differential game with player set \(N\). The motion equations have the form
and the payoffs of players are defined as
Denote this game by \(\varGamma(x_0, T-t_0)\). The cooperative trajectory \(\overline{x}(t)\), \(\overline{x}(t_0) = x_0\), \(t \in [t_0, T]\), is defined as
We suppose that the maximum in (4) is attained. Let \(v(x_0, T-t_0; S)\), \(S \subset N\), be the characteristic function, defined in the classical way as the value of the zero-sum game between coalition \(S\) as the first player and \(N \setminus S\) as the second (see [6]), and let \(E(x_0, T-t_0)\) be the set of imputations
Denote by \(C(x_0, T-t_0)\) the reachability set of system (1). For \(y \in C(x_0, t-t_0)\), \(t \in [t_0, T]\), define a subgame \(\varGamma(y, T-t)\) of \(\varGamma(x_0, T-t_0)\) with characteristic function \(v(y, T-t; S)\), \(S \subset N\), and imputation set \(E(y, T-t)\).
An optimality principle (solution) is a subset of the imputation set, \(C(x_0, T-t_0) \subset E(x_0, T-t_0)\) (core, NM-solution, …).
Consider the family of subgames \(\varGamma(\bar{x}(t), T-t)\) along the cooperative trajectory, together with the imputation sets \(E(\overline{x}(t), T-t)\) and the solutions \(C(\overline{x}(t), T-t)\) of these subgames.
For each \(\xi \in C(x_0, T-t_0)\) define the imputation distribution procedure (IDP) [3] \(\beta(t) = (\beta_1(t), \ldots, \beta_i(t), \ldots, \beta_n(t))\)
\[\xi_i = \int_{t_0}^{T} \beta_i(t)\,dt, \quad i \in N.\]
The imputation \(\xi \in C(x_0, T-t_0)\) is called dynamically stable [3,4,5] (time-consistent) if there exists an IDP \(\beta(t)\) such that
\[\xi - \int_{t_0}^{t} \beta(\tau)\,d\tau \in C(\overline{x}(t), T-t), \quad t \in [t_0, T].\]
Definition 1
The solution \(C(x_0, T-t_0)\) is called time-consistent if all imputations \(\xi \in C(x_0, T-t_0)\) are time-consistent.
Definition 2
The optimality principle \(C(x_0, T-t_0)\) is called strongly dynamically stable [11] (strongly time-consistent) if for each \(\xi \in C(x_0, T-t_0)\) there exists an IDP \(\beta(\tau)\) such that
\[\int_{t_0}^{t} \beta(\tau)\,d\tau \oplus C(\overline{x}(t), T-t) \subset C(x_0, T-t_0), \quad t \in [t_0, T],\]
here \(a \oplus B\) (\(a \in R^n\), \(B \subset R^n\)) is defined as \(\{a + b : b \in B\}\).
It is well known that cooperative solutions taken from classical one-shot game theory are time-consistent only in special cases, so it is clear that strong time-consistency is a very special event. Note that strong time-consistency makes sense only for multivalued (set-valued) optimality principles (core, NM-solution).
1.2 Transformation of Characteristic Function
Let \(v(y, T-t; S)\) be the characteristic function in \(\varGamma(y, T-t)\). Define the following integral transformation
here \(v(\bar{x}(t), T-t; S)\) is the characteristic function computed for the subgame \(\varGamma(\bar{x}(t), T-t)\) along the cooperative trajectory. It can be seen that
Define the imputation set \(\overline{E}(x_0, T-t_0)\) and the core \(\overline{C}(x_0, T-t_0) \subset \overline{E}(x_0, T-t_0)\) under the new characteristic function \(\overline{v}(x_0, T-t_0; S)\), and define the integral transformation of the imputation \(\xi \in E(x_0, T-t_0)\) to \(\bar{\xi} \in \overline{E}(x_0, T-t_0)\) as
where \(\xi(t) \in E(\bar{x}(t), T-t)\). Similarly, let \(\overline{E}(\bar{x}(t), T-t)\) and \(\overline{C}(\bar{x}(t), T-t)\) be the set of imputations and the core in the subgame \(\varGamma(\bar{x}(t), T-t)\) along the cooperative trajectory under the characteristic function
Theorem 1
\(\overline {C}(x_0, T-t_0)\) is strongly time-consistent.
To prove this, it is sufficient to take, for each \(\bar{\xi} \in \overline{E}(x_0, T-t_0)\), the IDP \(\beta_i(t)\) as
where \(\xi (t) \in C(\overline {x}(t), T-t)\) is an integrable selector from \(C(\overline {x}(t), T-t)\).
What is the connection between \(\overline{C}\) and \(C\)? If the intersection of \(\overline{C}\) and \(C\) is nonempty, then this intersection could be a preferable optimality principle in \(\varGamma(x_0, T-t_0)\). Introduce
We have
Denote by \(\hat{C}(x_0, T-t_0)\) the set of all solutions \(\xi = \{\xi_1, \ldots, \xi_n\}\)
From previous considerations it follows
We see that
and
The following theorem holds.
Theorem 2
for any integrable selector ξ(t) ∈ C(x(t), T − t).
Proof
Theorem 2 follows from the inclusion \(\hat{C}(\bar{x}(t), T-t) \subset \bar{C}(\bar{x}(t), T-t)\) and the strong time-consistency of \(\bar{C}(x_0, T-t_0)\).
From Theorem 2 it follows that for each imputation \(\xi_0 \in C(x_0, T-t_0) \cap \hat{C}(x_0, T-t_0)\) there exists an IDP
where \(\xi(t_0) = \xi_0\) and \(\xi(t)\) is an integrable selector from \(C(\bar{x}(t), T-t)\), such that
□
Suppose that \(\hat{C}(x_0, T-t_0) \neq \emptyset\). The interpretation of (7) is as follows. \(\hat{C}(x_0, T-t_0)\) is a subset of the original core \(C(x_0, T-t_0)\), and for any imputation \(\xi \in \hat{C}(x_0, T-t_0) \cap C(x_0, T-t_0)\) from this subset one can construct an IDP such that, if at an intermediate time instant \(t\) the players for some reason would like to switch to another optimal imputation \((\xi^t)' \in \hat{C}(\bar{x}(t), T-t) \subset C(\bar{x}(t), T-t)\) from the subset of the original core, they will still receive payments according to an imputation from \(\bar{C}(x_0, T-t_0)\), which results from the integral transformation of \(C(x_0, T-t_0)\).
2 Repeated Games
Folk theorems are well known in game theory [1, 2, 6,7,8,9]. Using so-called punishment strategies, they show that outcomes which are in some sense preferable can be attained. These outcomes are stable against deviations of single players. A natural question arises: is it possible to obtain “good” outcomes that are stable against deviations of coalitions (coalition-proofness)? We now try to construct a mechanism, based on an analog of the characteristic function, which makes it possible (under some conditions on this newly defined characteristic function) to obtain coalition-proofness for repeated and multistage games [9]. This will show us a way of constructing strongly time-consistent optimality principles in multistage games.
Denote by \(G\) the infinitely repeated n-person game with the game \(\varGamma\) played on each stage. For simplicity suppose that the stage game \(\varGamma\) is finite (has finite sets of strategies).
If on stage \(k\) (\(1 \leq k < \infty\)) the strategy profile \(u^k = (u_1^k, \ldots, u_i^k, \ldots, u_n^k)\) is chosen, the payoff in \(G\) is defined as
here \(u_i(\cdot) = (u_i^1, \ldots, u_i^k, \ldots)\), \(i = 1, \ldots, n\), and \(\delta \in (0, 1)\).
In the expression \(u_i(\cdot) = (u_i^1, \ldots, u_i^k, \ldots)\), \(i \in N\), \(u_i^k\) is the strategy chosen by player \(i\) in the game \(\varGamma\) on stage \(k\). We suppose that on stage \(k\), when choosing \(u_i^k\), player \(i\) knows the choices of the other players and remembers his own choices on the previous stages. Thus \(u_i^k\) is a function of the history
Formally we have to write \(u^k_i(h^k)\), i.e., \(u_i^k\) depends upon the history \(h^k\), \(k = 1, \ldots\). However, for convenience we shall write \(u_i^k\) instead of \(u^k_i(h^k)\) in this paper.
Consider the strategy profile \(\bar{u}(\cdot) = (\bar{u}_1(\cdot), \ldots, \bar{u}_i(\cdot), \ldots, \bar{u}_n(\cdot))\) such that
It is evident that such a strategy profile always exists.
One can take \(\bar{u}_i(\cdot) = (\bar{u}_i^1, \ldots, \bar{u}_i^k, \ldots)\), \(i \in N\), such that
and since the stage games are the same (\(G\) is a repeated game), we can take \(\bar{u}_i^k = \bar{u}_i\) for all \(k = 1, 2, \ldots\). Then from (8)–(10) we get that
Introduce the characteristic function \(V(S)\), \(S \subset N\), in \(\varGamma\) in the classical sense. Then we shall have
and it can be easily shown that the characteristic function \(W(S)\), \(S \subset N\), in \(G\) has the form
Recall now the definition of a strong (or coalition-proof) Nash equilibrium.
Definition 3
The n-tuple of strategies \((\hat{u}_1, \ldots, \hat{u}_i, \ldots, \hat{u}_n) = \hat{u}\) is called a strong (or coalition-proof) Nash equilibrium (SNE) if for all \(S \subset N\) and all \(u_S = \{u_i, i \in S\}\) the following inequality holds
Consider now the core \(C\) in \(\varGamma\), and suppose that \(C \neq \emptyset\) and that there exists an imputation \(\alpha \in C\) such that
2.1 Associated Zero-Sum Games
Consider a family of zero-sum games \(\varGamma_{N\setminus i,i}\) with coalition \(N \setminus \{i\}\) as the first player and coalition \(\{i\}\) as the second. The payoff of \(N \setminus \{i\}\) is equal to the sum of the payoffs of the players from \(N \setminus \{i\}\). Denote by \(V(N \setminus i)\) the value of \(\varGamma_{N\setminus i,i}\). Let \((\bar{\mu}_{N\backslash i}, \bar{\mu}_i)\) be the saddle point (in mixed strategies) in \(\varGamma_{N\setminus i,i}\).
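The value \(V(N \setminus i)\) is the value of a matrix game in mixed strategies. When \(\varGamma_{N\setminus i,i}\) happens to reduce to a 2×2 matrix it can be computed in closed form. The sketch below assumes such a reduction; the function name and the payoff matrices are hypothetical. Rows are joint strategies of the coalition \(N \setminus \{i\}\) (the maximizer), columns are strategies of player \(i\), and the entries are the summed payoffs of the coalition.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a):
    """Return (value, p, q) of a 2x2 zero-sum game with integer entries.

    p is the probability of row 0 for the maximizing row player,
    q is the probability of column 0 for the minimizing column player.
    """
    maximin = max(min(row) for row in a)
    minimax = min(max(a[0][j], a[1][j]) for j in range(2))
    if maximin == minimax:  # saddle point in pure strategies
        i = max(range(2), key=lambda r: min(a[r]))
        j = min(range(2), key=lambda c: max(a[0][c], a[1][c]))
        return Fraction(maximin), Fraction(1 - i), Fraction(1 - j)
    # No pure saddle point: both players mix so that the opponent is indifferent.
    denom = a[0][0] + a[1][1] - a[0][1] - a[1][0]
    value = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0], denom)
    p = Fraction(a[1][1] - a[1][0], denom)
    q = Fraction(a[1][1] - a[0][1], denom)
    return value, p, q

# Hypothetical coalition payoffs against player i.
mixed = solve_2x2_zero_sum([[3, -1], [-2, 1]])
pure = solve_2x2_zero_sum([[2, 1], [0, -1]])
```

For larger stage games the same value would normally be obtained by linear programming; the closed form is used here only to keep the example self-contained.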
Consider the n-tuple of strategies \(\bar {\mu }=(\bar {\mu }_1,\ldots ,\bar {\mu }_n)\), and define
here \(\mu_S = \{\mu_i, i \in S\}\), \(\bar{\mu}_{N\backslash S} = \{\bar{\mu}_i, \ i \in N\backslash S\}\). It is clear that
Suppose that there exists a solution of the system
Construct now the modification \(G^\alpha\) of the game \(G\). The difference between \(G^\alpha\) and \(G\) is in the payoffs of the stage games \(\varGamma\) when the cooperative strategies \(\bar{u} = (\bar{u}_1, \ldots, \bar{u}_n)\) are used: the payoff in this case is equal to \(\alpha = (\alpha_1, \ldots, \alpha_n)\), where \(\alpha\) satisfies (16). For all other strategy combinations the payoffs remain as in \(\varGamma\).
The following theorem holds [10].
Theorem 3
In the game \(G^\alpha\) there exist \(\delta \in (0, 1)\) and an SNE such that the payoffs in this SNE are equal to \(\alpha_i \displaystyle\frac{1}{1-\delta}\), which are the payoffs in \(G^\alpha\) under cooperation.
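The payoff \(\alpha_i/(1-\delta)\) is the discounted sum of a constant stage payoff \(\alpha_i\). A minimal sketch checks this numerically and adds the standard single-player punishment arithmetic that underlies folk theorems. It is not the coalition condition of the theorem: the deviation payoff `d`, the punishment payoff `p`, and all numbers are hypothetical.

```python
def discounted_sum(stage_payoffs, delta):
    """Sum of delta**k * payoff_k over the listed stages k = 0, 1, ..."""
    return sum(delta**k * x for k, x in enumerate(stage_payoffs))

alpha_i, delta = 3.0, 0.9
closed_form = alpha_i / (1 - delta)
truncated = discounted_sum([alpha_i] * 500, delta)  # long finite truncation

# One-shot deviation: gain d once, then receive p on every later stage.
d, p = 5.0, 1.0
deviation_total = d + delta * p / (1 - delta)

# Cooperation is at least as good as deviating iff delta >= (d - alpha)/(d - p).
critical_delta = (d - alpha_i) / (d - p)
cooperation_is_stable = closed_form >= deviation_total

assert abs(truncated - closed_form) < 1e-9
assert cooperation_is_stable == (delta >= critical_delta)
```

With these assumed numbers the threshold is 0.5, so a discount factor of 0.9 makes the single-player deviation unprofitable.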
2.2 Multistage Games
The multistage game \(G\) starts from a fixed stage game \(\varGamma(z_1)\), which can be considered as situated in the position (root) \(z_1\) of the game tree \(G\).
For simplicity we suppose that the set of players \(N\) is the same in all stage games. When the game \(G\) develops, an infinite sequence of stage games is realized, but only a finite number of them are different, since we suppose that the total number of different stage games \(\varGamma(z)\) is finite. As usual in multistage games, we consider the general case in which the next stage game depends only on the controls chosen by the players in the previous stage game. As in the previous section, denote by \(u_i(\cdot)\) the strategy of player \(i\) in \(G\) (defined as a function of histories). The strategy profile that maximizes the sum of the players' payoffs in \(G\) is called the “cooperative” strategy profile, and the corresponding sequence of stage games (or, equivalently, the sequence of positions on the tree \(G\)) is called the “cooperative trajectory.” Suppose that for each stage game \(\varGamma(z)\) the characteristic function \(V(z, S)\) (in the classical sense) is defined.
For each stage game \(\varGamma(z)\) consider the family of zero-sum games \(\varGamma_{N\setminus i,i}(z)\) and the corresponding saddle points \(\bar{\mu}^z_{N\backslash i}, \bar{\mu}^z_i\), and for \(\bar{\mu}^z = (\bar{\mu}_1^z, \ldots, \bar{\mu}_n^z)\) define
Let
Suppose that
Suppose that the core \(C(z)\) is nonempty in each stage game \(\varGamma(z)\), and denote by \(D(z)\) the subcore of \(C(z)\), that is, the set of all imputations \(\alpha^z = (\alpha^z_1, \ldots, \alpha_n^z)\) such that \(\displaystyle\sum_{i\in S} \alpha_i^z \geq \overline{W}(S)\) for all \(S\).
Suppose that \(D(z) \neq \emptyset\) for all \(z \in G\), and suppose also that there exists an imputation \(\alpha^z = (\alpha^z_1, \ldots, \alpha_n^z)\) such that
For simplicity we shall consider the special case when \(V(z, N) = \overline{W}(N)\) for all \(z\). Then the previous conditions (18) and (19) can be written as
since the number of different stage games is finite, and we can select the same \(\alpha\) in all stage games.
Construct now the modification \(G^\alpha\) of the game in the same way as was done in Sect. 2.1. The analog of Theorem 3 from Sect. 2.1 also holds for the game \(G^\alpha\).
Theorem 4
In the game \(G^\alpha\) there exist \(\delta \in (0, 1)\) and an SNE such that the payoffs in this SNE are equal to \(\alpha_i \frac{1}{1-\delta}\), which are the payoffs in \(G^\alpha\) under cooperation.
2.3 Time-Consistency and Strong Time-Consistency
Consider the cooperative version of the game \(G\) and of the subgame \(G(z)\). Introduce the following characteristic functions in \(G\) and in \(G(z)\), respectively,
Denote by \(\hat{C}\) and \(\hat{C}(z)\) the analogs of the core in \(G\) and in \(G(z)\) under the characteristic functions defined above.
Strong time-consistency in this case means that for each imputation \(\bar{\alpha} \in \hat{C}(\bar{z}_0)\) there exists a corresponding IDP \(\bar{\beta}(1), \ldots, \bar{\beta}(l), \ldots\) such that
It can be easily seen that if \(D(z) = D \neq \emptyset\), then by selecting \(\bar{\beta}(k) = \beta \in D(\bar{z}_k)\) we can guarantee the strong time-consistency of \(\hat{C}(\bar{z}_0)\).
Suppose \(\bar{\alpha} \in \hat{C}(\bar{z}_0)\); then by definition we have
Represent \(\bar {\alpha }\) in the form
since \(\bar {\alpha } \in \hat {C}(\bar {z}_0)\)
and
Thus \(\bar{\beta} \in D(\bar{z}_k) = D\), \(k = 0, 1, \ldots, l, \ldots\), and we get that each imputation \(\bar{\alpha} \in \hat{C}(\bar{z}_0)\) can be represented in the form \(\bar{\alpha} = \displaystyle\sum_{k=0}^{\infty} \delta^k \bar{\beta}(k)\), where \(\bar{\beta}(k) = \bar{\beta} \in D(\bar{z}_k) = D\).
This also gives us the strong time-consistency of \(\hat{C}(\bar{z}_0)\).
We have seen that for an arbitrary \(\bar{\alpha} \in \hat{C}(\bar{z}_0)\) there exists an IDP \(\bar{\beta}(0), \bar{\beta}(1), \ldots, \bar{\beta}(k), \ldots\) (in our case \(\bar{\beta}(k) = \bar{\beta} \in D\)) such that
Suppose that \(\alpha' \in \displaystyle\sum_{k=0}^{l} \delta^k \bar{\beta}(k) \oplus \delta^{l+1} \hat{C}(\bar{z}_{l+1})\). To prove (22) we have to prove that in this case \(\alpha' \in \hat{C}(\bar{z}_0)\). Consider stage \(l\); then we can write the imputation \(\alpha'\) in the form
here \(\bar{\beta}(k) = \bar{\beta} \in D\) and \(\alpha'' \in \hat{C}(\bar{z}_{l+1})\).
Since \(\alpha ^{\prime \prime }\in \hat {C}(\bar {z}_{l+1})\) we have
and we can show that, similarly to the previous case when \(\bar{\alpha} \in \hat{C}(\bar{z}_0)\), \(\alpha''\) can be represented in the form
where \(\beta''(k) = \beta'' \in D\), \(k = l+1, \ldots\).
Then we get
where \(\tilde{\beta}(k) \in D\), \(\tilde{\beta}(k) = \bar{\beta}(k) = \bar{\beta}\), \(k = 0, \ldots, l\), and \(\tilde{\beta}(k) = \bar{\bar{\beta}}(k) = \beta''\), \(k = l+1, \ldots\).
And we have
In a similar way we can prove that \(\displaystyle\sum_{i\in N} \alpha_i' = \hat{W}(N)\). This proves that \(\alpha' \in \hat{C}(\bar{z}_0)\).
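The switching argument can be illustrated numerically. With a constant IDP the representation \(\bar{\alpha} = \sum_{k\ge 0} \delta^k \bar{\beta}\) forces \(\bar{\beta} = (1-\delta)\bar{\alpha}\), so paying \(\bar{\beta}\) on stages \(0, \ldots, l\) and then switching to \(\alpha''\) yields \(\alpha' = (1-\delta^{l+1})\bar{\alpha} + \delta^{l+1}\alpha''\), a convex combination of the two imputations. The sketch below only checks this identity; the function name and the payoff vectors are hypothetical.

```python
def switch_imputation(alpha, alpha_new, delta, l):
    """Total payment when beta = (1 - delta) * alpha is paid on stages 0..l
    and the imputation alpha_new is used from stage l + 1 onward."""
    beta = [(1 - delta) * a for a in alpha]
    paid = [sum(delta**k * b for k in range(l + 1)) for b in beta]
    return [p + delta**(l + 1) * a2 for p, a2 in zip(paid, alpha_new)]

alpha = [6.0, 3.0, 1.0]       # hypothetical imputation chosen at the start
alpha_new = [2.0, 4.0, 4.0]   # hypothetical imputation chosen after stage l
delta, l = 0.8, 3

result = switch_imputation(alpha, alpha_new, delta, l)

# The same vector written as a convex combination of the two imputations.
w = delta**(l + 1)
expected = [(1 - w) * a + w * a2 for a, a2 in zip(alpha, alpha_new)]

assert all(abs(r - e) < 1e-12 for r, e in zip(result, expected))
assert abs(sum(result) - sum(alpha)) < 1e-12
```

Since a convex combination satisfies every linear inequality that both of its components satisfy, \(\alpha'\) meets the same coalitional constraints as \(\bar{\alpha}\) and \(\alpha''\) whenever these two are constrained by the same characteristic function.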
References
Aumann, R.J., Maschler, M.: Repeated Games with Incomplete Information. MIT Press, Cambridge (1995)
Myerson, R.B.: Multistage Games with Communication. Econometrica. 54, 323–358 (1986)
Petrosyan, L.A., Danilov, N.N.: Stability of the solutions in nonantagonistic differential games with transferable payoffs. Vestnik Leningrad. Univ. Mat. Mekh. Astronom. 1, 52–59 (1979)
Petrosyan, L.A., Zaccour, G.: Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control. 27, 3, 381–398 (2003)
Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Economic Optimization. Birkhäuser (2012)
Maschler, M., Solan, E., Zamir, S.: Game Theory. Cambridge University Press (2013)
Aumann, R., Shapley, L.: Long-Term Competition – A Game-Theoretic Analysis. Essays in Game Theory. (1994). https://doi.org/10.1007/978-1-4612-2648-2_1
Rubinstein, A.: Equilibrium in Supergames. Essays in Game Theory. (1994). https://doi.org/10.1007/978-1-4612-2648-2_2
Fudenberg, D., Maskin, E.: The Folk Theorem in Repeated Games with Discounting or with Incomplete Information. Econometrica. 54, 3, 533–554 (1986). https://doi.org/10.2307/1911307
Petrosjan, L.A., Pankratova, Y.B.: Construction of Strong Nash Equilibria in a class of infinite nonzero-sum games. Trudy Inst. Mat. Mekh. UrO RAN. 24 (2018)
Petrosjan, L.A.: Strongly time-consistent differential optimality principles. Vestnik St. Petersburg Univ. Math. 26, 4, 40–46 (1993)
Acknowledgement
This research was supported by the Russian Science Foundation (grant 17-11-01079).
© 2020 Springer Nature Switzerland AG
Petrosyan, L.A. (2020). Strongly Time-Consistent Solutions in Cooperative Dynamic Games. In: Yeung, D., Luckraz, S., Leong, C. (eds) Frontiers in Games and Dynamic Games. Annals of the International Society of Dynamic Games, vol 16. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-39789-0_2
DOI: https://doi.org/10.1007/978-3-030-39789-0_2
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-39788-3
Online ISBN: 978-3-030-39789-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)