
1 What Is Strong Time-Consistency?

What is strong time-consistency? Let us try to explain this notion. Let M ∈ R^n be a fixed point. Consider a classical control problem (with one player)

$$\displaystyle \begin{aligned} \begin{array}{l} \dot x=f(x, u), x\in R^n, u\in U \subset Comp R^l\\ x(t_0)=x_0, \ t \in [t_0,T]. \end{array} \end{aligned} $$
(1)

Find the control \(\bar {u}(t)\) and the corresponding trajectory \(\bar {x}(t)\) such that at the terminal instant the distance \(\rho (\bar {x}(T),M)\) is minimal.

Denote this problem by Γ(x_0, T − t_0), and denote by C(x_0, T − t_0) the reachability set of system (1) from the initial point x_0 at the terminal time T.

Suppose for simplicity that \(M \notin C(x_0, T-t_0)\). The solution of this optimal control problem is shown in Fig. 1.

Fig. 1 Classical optimal control problem

Consider an intermediate time instant τ ∈ [t_0, T] and the intermediate control problem \(\varGamma (\bar {x}(\tau ), T-\tau )\) with initial condition on the optimal trajectory and duration T − τ. It is clear that the control \(\bar {u}(t)\), t ∈ [τ, T], will also be optimal in \(\varGamma (\bar {x}(\tau ), T-\tau )\), as will the trajectory \(\bar {x}(t)\), t ∈ [τ, T].

This is Bellman's optimality principle, and also the time-consistency of the optimal control \(\bar {u}(t)\), t ∈ [t_0, T]. Suppose now that we have another optimal control \(\bar {\bar {u}}(t)\), t ∈ [τ, T], in the problem \(\varGamma (\bar {x}(\tau ), T-\tau )\). Then it is easy to see that the control

$$\displaystyle \begin{aligned} \hat{u}(t)=\left\{ \begin{array}{ll} \bar{u}(t),& t\in[t_0, \tau]\\ \bar{\bar{u}}(t),& t\in[\tau,T] \end{array} \right. \end{aligned}$$

will also be optimal in the problem Γ(x_0, T − t_0). In other words, "any optimal continuation in a subproblem along the optimal trajectory generates an optimal solution of the original problem." We shall call this property strong time-consistency (strong dynamic stability) (see Fig. 1).
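This splicing property can be checked numerically. The sketch below is an illustration only, under assumptions of our own: a pure integrator ẋ = u with ||u|| ≤ 1 in R^2, and (unlike the case above) a target M placed inside the reachability set, so that optimal controls are non-unique and splicing is non-trivial.

```python
import numpy as np

def simulate(x0, controls, dt):
    # Euler integration of xdot = u for piecewise-constant controls
    # (exact here, since the dynamics are a pure integrator).
    x = np.array(x0, dtype=float)
    for u in controls:
        x = x + dt * np.asarray(u)
    return x

# Toy instance (our choice, not the text's general setup): xdot = u,
# ||u|| <= 1 in R^2, M inside the reachability set.
M = np.array([1.0, 0.0])
x0 = np.array([0.0, 0.0])
T, dt = 2.0, 0.01
n = int(T / dt)                      # 200 steps; tau = T/2 is step 100

# One optimal control u_bar: head straight for M, rho(x(T), M) = 0.
u_bar = [np.array([0.5, 0.0])] * n

# Splice: keep u_bar on [t0, tau], then use a *different* optimal
# continuation on [tau, T] (a detour that still ends exactly at M).
u_hat = (u_bar[: n // 2]
         + [np.array([0.5,  0.8])] * (n // 4)
         + [np.array([0.5, -0.8])] * (n // 4))

d_bar = np.linalg.norm(simulate(x0, u_bar, dt) - M)
d_hat = np.linalg.norm(simulate(x0, u_hat, dt) - M)
print(d_bar, d_hat)                  # both ~0: the spliced control is optimal too
```

With the single criterion ρ(x(T), M), any optimal continuation from the midpoint again reaches M, so the spliced control û loses nothing; this is exactly the strong time-consistency of the one-player problem.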

Consider now a slightly more complicated problem. The motion equations are the same (1), but the aim of the control is different: it is necessary to come as close as possible to the system of points M_1, …, M_k, M_i ∈ R^n, i ∈ {1, …, k}.

Denote the problem, as before, by Γ(x_0, T − t_0) and by C(x_0, T − t_0) the reachability set of (1), and suppose that \(C(x_0,T-t_0) \cap \hat {M} =\emptyset \), where \(\hat {M}\) is the convex hull of the points {M_1, …, M_k}. As the optimal solution here we may take the Pareto-optimal set, which coincides with the arc AB, the projection (assuming C(x_0, T − t_0) is convex) of \(\hat {M}\) onto C(x_0, T − t_0) (see Fig. 2).

Fig. 2 Multicriterial optimal control problem

Consider a Pareto-optimal control \(\bar {u}(t)\), t ∈ [t_0, T], which connects the initial point x_0 with a point M belonging to the Pareto-optimal set (M lies on the arc AB, the projection of \(\hat {M}\) onto C(x_0, T − t_0)), and let \(\bar {x}(t)\), t ∈ [t_0, T], be the corresponding Pareto-optimal trajectory.

Consider the subproblem \(\varGamma (\bar {x}(\tau ), T-\tau )\) from the initial position \(\bar {x}(\tau )\) on the Pareto-optimal trajectory. The Pareto-optimal set in \(\varGamma (\bar {x}(\tau ), T-\tau )\) (the arc A′B′) differs from the Pareto-optimal set in Γ(x_0, T − t_0); in our example they have only one common point M. This means that the control \(\bar {u}(t)\), t ∈ [τ, T], is Pareto-optimal in the subproblem \(\varGamma (\bar {x}(\tau ), T-\tau )\), and the Pareto-optimal solution \(\bar {u}(t)\), t ∈ [t_0, T], is time-consistent (dynamically stable) [4, 5].

At the same time we can see that a control of the type

$$\displaystyle \begin{aligned}\hat{u}(t)=\left\{ \begin{array}{ll} \bar{u}(t),& t\in[t_0, \tau]\\ \bar{\bar{u}}(t),& t\in[\tau,T], \end{array} \right.\end{aligned}$$

where \(\bar {\bar {u}}(t)\) is an arbitrary Pareto-optimal control in the subproblem \(\varGamma (\bar {x}(\tau ), T-\tau )\), may fail to be Pareto-optimal in Γ(x_0, T − t_0).

This means that the optimal continuation of the motion in the subproblem with initial conditions on the Pareto-optimal trajectory, together with the initial Pareto-optimal motion, may fail to be Pareto-optimal in the original problem. Thus the Pareto-optimal solution is time-consistent but not strongly time-consistent (see Fig. 2).

In this special problem there is an approach to constructing strongly time-consistent solutions on the basis of Pareto-optimal solutions. Let t_0 < t_1 < … < t_k < t_{k+1} < … < t_n = T be a decomposition of the time interval [t_0, T], with t_{k+1} − t_k = δ > 0. The idea of the approach is to consider all possible outcomes which may occur if on each interval [t_k, t_k + δ) the control u(τ) is selected so as to lead to one of the Pareto-optimal points of the subproblem Γ(x(t_k), T − t_k). The resulting trajectory will not be Pareto-optimal, but we shall call it conditionally Pareto-optimal. Denote by P(x(t_k), t_k) the set of end-points of these trajectories over all possible controls selected in the described manner. It is clear that

$$\displaystyle \begin{aligned} P(x(t_0),t_0) \supset P(x(t_1),t_1) \supset \ldots \supset P(x(t_k),t_k)\supset \ldots \supset P(x(T),T).\end{aligned}$$

The set P(x(t_0), t_0) is δ-strongly time-consistent if changes of controls are allowed only at the points t_k, k = 0, …, n.

For the system

$$\displaystyle \begin{aligned}\dot{x}=u_{1}+u_{2}+u_{3}, x(t_{0})=x_{0}\end{aligned}$$
$$\displaystyle \begin{aligned}|u_{i}|\leq 1, x\in R^{2}, t\in [t_{0}, T],\end{aligned}$$

the set P(x(t_0), t_0) is denoted by \(\hat {D}\) in Fig. 3 (dashed region).

Fig. 3 Example of a strongly time-consistent solution
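A crude Monte-Carlo sketch of the set P(x(t_0), t_0) for this system can be obtained by re-selecting, at each decomposition instant t_k, a control aimed at one of a few fixed target points. All specifics below are our own assumptions: the points M_j, the horizon, δ, and the rule "steer all three controls toward a randomly re-chosen target" as a proxy for selecting a Pareto-optimal point of the current subproblem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for xdot = u1 + u2 + u3, |u_i| <= 1, x in R^2 (our choices).
M_pts = [np.array([10.0, 4.0]), np.array([10.0, -4.0]), np.array([12.0, 0.0])]
x0 = np.array([0.0, 0.0])
t0, T, delta = 0.0, 1.0, 0.1          # decomposition t_k = t0 + k*delta

def endpoint():
    # On each [t_k, t_k + delta) steer toward a randomly re-chosen target,
    # i.e. toward one candidate Pareto-optimal point of the subproblem.
    x = x0.copy()
    for _ in range(int(round((T - t0) / delta))):
        d = M_pts[rng.integers(len(M_pts))] - x
        u = 3.0 * d / np.linalg.norm(d)   # u1 = u2 = u3 = unit vector toward target
        x = x + delta * u
    return x

# Sampled end-points of conditionally Pareto-optimal trajectories.
P = np.array([endpoint() for _ in range(2000)])
print(P.min(axis=0), P.max(axis=0))       # a cloud inside the disc of radius 3(T - t0)
```

The sampled cloud plays the role of the dashed region \(\hat{D}\): every end-point stays within the reachability disc of radius 3(T − t_0) around x_0.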

1.1 Cooperative Differential Game

Consider now cooperative differential games with player set N. Motion equations have the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \dot x=f(x, u_1, \ldots, u_n), x\in R^n, u_i\in U_i\subset Comp R^l \end{array} \end{aligned} $$
(2)
$$\displaystyle \begin{aligned} \begin{array}{rcl} x(t_0)=x_0 \end{array} \end{aligned} $$
(3)

and the payoffs of players are defined as

$$\displaystyle \begin{aligned}K_i(x_0, T-t_0; u_1, \ldots, u_n) = \int_{t_0}^{T} h_i(x(t)) dt, \ h_i>0, \ i\in N.\end{aligned}$$

Denote this game by Γ(x_0, T − t_0). The cooperative trajectory \(\overline {x}(t)\), \(\overline {x}(t_0)=x_0\), t ∈ [t_0, T], is defined by

$$\displaystyle \begin{aligned} \max_{u_1, \ldots, u_n} \sum \limits_{i=1}^n K_i(x_0, T-t_0; u_1, \ldots, u_n)= \sum\limits_{i=1}^n K_i(x_0, T-t_0; \overline{u}_1, \ldots, \overline{u}_n)= \end{aligned}$$
$$\displaystyle \begin{aligned} =\sum_{i=1}^n \int_{t_0}^{T} h_i(\overline{x}(t)) dt= v(x_0, T-t_0; N). \end{aligned} $$
(4)

We suppose that the max in (4) is attained. Let v(x_0, T − t_0; S), S ⊂ N, be the characteristic function defined in the classical way as the value of the zero-sum game between coalition S as the first player and N∖S as the second (see [6]), and let E(x_0, T − t_0) be the set of imputations

$$\displaystyle \begin{aligned} E(x_0, T-t_0)= \lbrace\xi=\lbrace \xi_i \rbrace : \sum\limits_{i=1}^n \xi_i=v(x_0, T-t_0; N), \end{aligned}$$
$$\displaystyle \begin{aligned} \xi_i\geq v(x_0, T-t_0;\lbrace i\rbrace ), i\in N\rbrace. \end{aligned} $$
(5)
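For a single game the conditions in (5) are easy to check mechanically. The sketch below uses a hypothetical 3-player characteristic function; all numbers are made up for illustration.

```python
# Minimal check of the imputation conditions (5) for a toy 3-player
# characteristic function (values are ours, for illustration only).
v = {frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 1.5,
     frozenset({1, 2, 3}): 9.0}
N = frozenset({1, 2, 3})

def is_imputation(xi):
    # xi = {player: payoff}; efficiency plus individual rationality.
    efficient = abs(sum(xi.values()) - v[N]) < 1e-9
    rational = all(xi[i] >= v[frozenset({i})] for i in N)
    return efficient and rational

print(is_imputation({1: 3.0, 2: 3.0, 3: 3.0}))   # True
print(is_imputation({1: 0.5, 2: 4.0, 3: 4.5}))   # False: player 1 gets less than v({1})
```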

Denote by C(x_0, T − t_0) the reachability set of the system (2). For y ∈ C(x_0, t − t_0), t ∈ [t_0, T], define the subgame Γ(y, T − t) of Γ(x_0, T − t_0), with characteristic function v(y, T − t; S), S ⊂ N, and imputation set E(y, T − t).

An optimality principle (solution) is a subset of the imputation set

$$\displaystyle \begin{aligned}C(y, T-t)\subset E(y, T-t)\end{aligned}$$

(Core, NM-solution,…).

Consider the family of subgames \(\varGamma (\bar {x}(t), T-t)\) along the cooperative trajectory, together with the imputation sets \(E(\overline {x}(t), T-t)\) and the solutions \(C(\overline {x}(t), T-t)\) of these subgames.

For each ξ ∈ C(x_0, T − t_0) define the imputation distribution procedure (IDP) [3] β(t) = (β_1(t), …, β_i(t), …, β_n(t)) by

$$\displaystyle \begin{aligned} \xi=\int_{t_0}^{T} \beta(\tau) d\tau, \ \xi \in C(x_0, T-t_0).\end{aligned}$$

The imputation ξ ∈ C(x_0, T − t_0) is called dynamically stable [3,4,5] (time-consistent) if

$$\displaystyle \begin{aligned}\xi - \int_{t_0}^{t} \beta(\tau) d\tau \in C(\overline{x}(t), T-t), \ t\in[t_0, T].\end{aligned}$$

Definition 1

The solution C(x 0, T − t 0) is called time-consistent if all imputations ξ ∈ C(x 0, T − t 0) are time-consistent.

Definition 2

The optimality principle C(x_0, T − t_0) is called strongly dynamically stable [11] (strongly time-consistent) if for each ξ ∈ C(x_0, T − t_0) there exists an IDP β(τ) such that, for all t ∈ [t_0, T],

$$\displaystyle \begin{aligned}\int_{t_0}^{t} \beta(\tau) d\tau \oplus C(\overline{x}(t), T-t) \subset C(x_0, T-t_0),\end{aligned}$$

here a ⊕ B (a ∈ R^n, B ⊂ R^n) is defined as {a + b : b ∈ B}.

Since, as is well known, time-consistency of cooperative solutions taken from classical one-shot game theory holds only in special cases, it is clear that strong time-consistency is a very special event. Note that strong time-consistency makes sense only for multivalued (set-valued) optimality principles (the core, NM-solutions).

1.2 Transformation of Characteristic Function

Let v(y, T − t; S) be the characteristic function in Γ(y, T − t). Define the following integral transformation:

$$\displaystyle \begin{aligned}\overline{v} (x_0,T-t_0; S)= \int_{t_0}^{T} \frac {v(\overline{x}(t), T-t; S)\sum\limits_{i\in N} h_i(\overline{x}(t))}{v(\overline{x}(t), T-t; N)} dt,\end{aligned}$$

here \(v(\bar {x}(t), T-t;S)\) is the characteristic function of the subgame \(\varGamma (\bar {x}(t), T-t)\) along the cooperative trajectory. It can be seen that

$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; N)= v(x_0, T-t_0; N).\end{aligned}$$
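This identity can be sanity-checked numerically. The sketch below assumes constant instantaneous payoffs h_i (our simplifying choice), so that along the cooperative trajectory v(x̄(t), T − t; N) = Σ_i h_i · (T − t).

```python
import numpy as np

# Numeric check of v_bar(x0, T-t0; N) = v(x0, T-t0; N) for constant h_i
# (an illustrative special case of our own).
h = np.array([1.0, 2.0])              # h_i > 0
t0, T = 0.0, 5.0

def v_N(t):
    # v(x_bar(t), T-t; N) = integral of sum(h) over [t, T]
    return h.sum() * (T - t)

# v_bar(.; N) = integral of v(., N) * sum(h) / v(., N) dt; the ratio cancels,
# leaving sum(h) * (T - t0).  (Grid stops short of t = T to avoid 0/0.)
ts = np.linspace(t0, T - 1e-6, 100000, endpoint=False)
dt = ts[1] - ts[0]
v_bar_N = float(np.sum(v_N(ts) * h.sum() / v_N(ts)) * dt)

print(v_bar_N, v_N(t0))               # both ~15
```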

Define the imputation set \(\overline {E}(x_0, T-t_0)\) and the core \( \overline {C}(x_0, T-t_0) \subset \overline {E}(x_0, T-t_0)\) under the new characteristic function \(\overline {v}(x_0, T-t_0; S)\), and define the integral transformation of the imputation ξ ∈ E(x_0, T − t_0) into \(\bar {\xi }\in \overline {E}(x_0,T-t_0)\) as

$$\displaystyle \begin{aligned}\bar{\xi}_i=\int_{t_0}^{T}\frac{\xi_i(t)\displaystyle\sum_{i\in N}h_i(\bar{x}(t))}{v(\bar{x}(t),T-t;N)}dt, \ i\in N,\end{aligned} $$

where \(\xi (t) \in E(\bar {x}(t),T-t)\). Similarly, let \(\overline {E}(\bar {x}(t), T-t)\) and \(\overline {C}(\bar {x}(t), T-t)\) be the imputation set and the core of the subgame \(\varGamma (\bar {x}(t), T-t)\) along the cooperative trajectory under the characteristic function

$$\displaystyle \begin{aligned}\overline{v} (\bar{x}(t),T-t; S)= \int_{t}^{T} \frac {v(\overline{x}(\tau), T-\tau; S)\sum\limits_{i\in N} h_i(\overline{x}(\tau))}{v(\overline{x}(\tau), T-\tau; N)} d\tau.\end{aligned}$$

Theorem 1

\(\overline {C}(x_0, T-t_0)\) is strongly time-consistent.

To prove this, it is sufficient to take, for each \(\bar {\xi }\in \overline {E}(x_0,T-t_0)\), as β_i(t)

$$\displaystyle \begin{aligned}\beta_i(t)=\frac{\xi_i(t)\sum\limits_{i\in N} h_i(\overline{x}(t))}{v(\overline{x}(t), T-t; N)},\end{aligned}$$

where ξ(t) is an integrable selector from \(C(\overline {x}(t), T-t)\).

What is the connection between \(\overline {C}\) and C? If the intersection of \(\overline {C}\) and C is nonempty, then this imputation set could serve as a good, preferable optimality principle in Γ(x_0, T − t_0). Introduce

$$\displaystyle \begin{aligned}\lambda(S)=\max\limits_{t_0\leq t\leq T} \frac{v(\overline{x}(t), T-t; S)}{v(\overline{x}(t), T-t; N)},\end{aligned}$$
$$\displaystyle \begin{aligned}\lambda(N)=1.\end{aligned} $$

We have

$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; S) \leq \lambda(S)\int_{t_0}^{T} \sum\limits_{i\in N} h_i(\overline{x}(t)) dt = \lambda(S)v(x_0, T-t_0; N),\end{aligned}$$
$$\displaystyle \begin{aligned}\overline{v}(x_0, T-t_0; N) = \lambda(N)v(x_0, T-t_0; N)= v(x_0, T-t_0; N),\end{aligned}$$
$$\displaystyle \begin{aligned}\lambda(S)\geq \frac{v(x_0, T-t_0; S)}{v(x_0, T-t_0; N)},\end{aligned}$$
$$\displaystyle \begin{aligned}v(x_0, T-t_0; S)\leq \lambda(S)v(x_0, T-t_0; N).\end{aligned}$$

Denote by \(\hat {C}(x_0, T-t_0)\) the set of all imputations ξ = (ξ_1, …, ξ_n) such that

$$\displaystyle \begin{aligned}\sum\limits_{i\in S} \xi_i \geq \lambda(S) v(x_0, T-t_0; N), \ S\subset N, \ \sum\limits_{i\in N} \xi_i = v(x_0, T-t_0; N).\end{aligned}$$

From the previous considerations it follows that

$$\displaystyle \begin{aligned}\sum\limits_{i\in S} \xi_i \geq \lambda(S) v(x_0, T-t_0; N)\geq v(x_0, T-t_0; S).\end{aligned}$$

We see that

$$\displaystyle \begin{aligned}\hat{C}(x_0, T-t_0)\subset C(x_0, T-t_0) \cap \overline{C}(x_0, T-t_0)\end{aligned}$$

and

$$\displaystyle \begin{aligned}\hat{C}(\overline{x}(t), T-t)\subset C(\overline{x}(t), T-t)\cap \overline{C}(\overline{x}(t), T-t).\end{aligned}$$
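This chain of inclusions can be illustrated on hand-picked 2-player data; the ratios r_S(t) = v(x̄(t), T − t; S)/v(x̄(t), T − t; N), the payoffs h = (1, 1), and all constants below are our own assumptions.

```python
import numpy as np

# Hand-picked 2-player data (illustration only): along the cooperative
# trajectory take v(x_bar(t), T-t; N) = sum(h)*(T - t) with h = (1, 1).
T = 4.0
ts = np.linspace(0.0, T, 4001)[:-1]       # grid on [t0, T), t0 = 0
dt = ts[1] - ts[0]

def v_N(t):
    return 2.0 * (T - t)

def r(S, t):
    # ratio v(x_bar(t), T-t; {S}) / v(x_bar(t), T-t; N) for singletons
    return {1: 0.2 + 0.1 * t / T, 2: 0.35 - 0.05 * t / T}[S]

lam = {S: float(np.max(r(S, ts))) for S in (1, 2)}   # lambda({S})
vN0 = v_N(0.0)                                        # v(x0, T-t0; N) = 8

# An imputation xi in C_hat: xi_i >= lambda({i}) v(N), sum xi_i = v(N).
xi = {1: lam[1] * vN0 + 0.5, 2: vN0 - lam[1] * vN0 - 0.5}
assert xi[2] >= lam[2] * vN0              # xi really lies in C_hat

# xi in C:     xi_i >= v(x0; {i}) = r_i(t0) v(N)
in_C = all(xi[S] >= r(S, 0.0) * vN0 for S in (1, 2))
# xi in C_bar: xi_i >= v_bar(x0; {i}) = integral of r_S(t) * sum(h) dt
v_bar = {S: float(np.sum(r(S, ts) * 2.0) * dt) for S in (1, 2)}
in_C_bar = all(xi[S] >= v_bar[S] for S in (1, 2))
print(in_C, in_C_bar)                     # True True
```

The sampled imputation from \(\hat{C}\) satisfies both the original core constraints and the transformed ones, as the inclusions require.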

The following theorem holds.

Theorem 2

$$\displaystyle \begin{aligned}\overline{C}(x_0, T-t_0)\supset \displaystyle\int_{t_0}^{t}\displaystyle \frac{\xi(\tau)\displaystyle\sum_{i=1}^n h_i(\bar{x}(\tau))}{v(\bar{x}(\tau),T-\tau;N)}\,d\tau \oplus \hat{C}(\bar{x}(t), T-t)\end{aligned} $$
(6)

for any integrable selector \(\xi (\tau ) \in C(\bar {x}(\tau ), T-\tau )\).

Proof

Theorem 2 follows from the inclusion \(\hat {C}(\bar {x}(t), T-t) \subset \bar {C}(\bar {x}(t), T-t)\) and the strong time-consistency of \(\bar {C}(x_0,T-t_0)\).

From Theorem 2 it follows that for each imputation \(\xi _0\in C(x_0,T-t_0)\cap \hat {C}(x_0,T-t_0)\) there exists an IDP

$$\displaystyle \begin{aligned}\beta(t)=\displaystyle \frac{\xi(t)\displaystyle\sum_{i=1}^n h_i(\bar{x}(t))}{v(\bar{x}(t), T-t;N)},\end{aligned}$$

where ξ(t 0) = ξ 0 and ξ(t) is an integrable selector from \(C(\bar {x}(t), T-t)\), such that

$$\displaystyle \begin{aligned} \displaystyle\int_{t_0}^{t}\displaystyle \beta (\tau)d\tau \oplus \hat{C}(\bar{x}(t), T-t)\subset \bar{C}(x_0, T-t_0). \end{aligned} $$
(7)

Suppose that \( \hat {C} (x_0, T-t_0)\neq \emptyset \). The interpretation of (7) is as follows. \(\hat {C}(x_0,T-t_0)\) is a subset of the original core C(x_0, T − t_0), and for any imputation \(\xi \in \hat {C} (x_0, T-t_0)\) one can construct an IDP (imputation distribution procedure) such that if at an intermediate time instant t the players, for some reason, decide to switch to another optimal imputation \((\xi ^t)'\in \hat {C}(\bar {x}(t), T-t)\subset C(\bar {x}(t), T-t)\) from the corresponding subset of the current core, they will still receive payments according to an imputation from \(\bar {C}(x_0, T-t_0)\), the set resulting from the integral transformation of C(x_0, T − t_0).

2 Repeated Games

Folk theorems are well known in game theory [1, 2, 6,7,8,9]. Using so-called punishment strategies, they show that outcomes preferable in some sense can be attained; these outcomes are stable against deviations by single players. A natural question arises: is it possible to obtain "good" outcomes stable against deviations by coalitions (coalition-proofness)? We now construct a mechanism based on introducing an analog of the characteristic function which makes it possible (under some conditions on this newly defined characteristic function) to obtain coalition-proofness for repeated and multistage games [9]. This will show the way to constructing strongly time-consistent optimality principles in multistage games.

Denote by G the infinitely repeated n-person game with the game Γ played at each stage. For simplicity suppose that the stage game Γ is finite (has finite strategy sets):

$$\displaystyle \begin{aligned} \varGamma= <N; U_1,\ldots,U_i, \ldots,U_n; K_1,\ldots,K_i,\ldots,K_n>.\end{aligned}$$

If at stage k (k ≥ 1) the strategy profile \(u^k=(u_1^k,\ldots ,u_i^k,\ldots ,u_n^k)\) is chosen, the payoff in G is defined as

$$\displaystyle \begin{aligned} \begin{array}{l} {} H_i(u_1(\cdot),\ldots,u_i(\cdot),\ldots,u_n(\cdot)) =\displaystyle\sum_{k=1}^{\infty} \delta^{k-1}K_i(u_1^k,\ldots,u_i^k,\ldots,u_n^k)= \\ =\displaystyle\sum_{k=1}^{\infty} \delta^{k-1}K_i(u^k)=H_i(u(\cdot)), \ i \in N, \end{array} \end{aligned} $$
(8)

here \(u_1(\cdot )= (u_1^1,\ldots ,u_1^k,\ldots )\), …, \(u_i(\cdot )=(u_i^1,\ldots , u_i^k,\) …), …, \(u_n(\cdot )=(u_n^1,\ldots ,u_n^k,\ldots )\), δ ∈ (0, 1).

Here, in the expression \(u_i(\cdot )=(u_i^1,\ldots , u_i^k,\) …), i ∈ N, \(u_i^k\) is the strategy chosen by player i in the game Γ at stage k. We suppose that at stage k, when choosing \(u_i^k\), player i knows the choices of the other players and remembers his own choices at previous stages. Thus \(u_i^k\) is a function of the history

$$\displaystyle \begin{aligned}h^k= (u_1^1,\ldots,u_1^{k-1};\ldots; u_i^1,\ldots, u_i^{k-1};\ldots; u_n^1,\ldots, u_n^{k-1}). \end{aligned}$$

Formally we should write \( u^k_i(h^k)\), i.e. \(u_i^k\) depends on the history h^k, k = 1, …. However, in this paper, for convenience, we shall write \(u_i^k\) instead of \( u^k_i(h^k)\).
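To make the notion of a history-dependent strategy \(u_i^k(h^k)\) concrete, here is a minimal sketch of a trigger strategy of the kind used in folk-theorem constructions; the two-action stage game and the deviation pattern are invented purely for illustration.

```python
# Toy 2-player repeated game; "C"/"D" are stage-game actions (our example).
C, D = "C", "D"

def trigger(i, history):
    # u_i^k(h^k): cooperate while the whole history is mutual cooperation,
    # otherwise switch to the punishment action forever (grim trigger).
    return C if all(profile == (C, C) for profile in history) else D

def deviator(i, history):
    # Deviates exactly at the second stage, cooperates otherwise.
    return D if len(history) == 1 else C

def play(strategies, stages):
    history = []
    for _ in range(stages):
        history.append(tuple(s(i, history) for i, s in enumerate(strategies)))
    return history

print(play([trigger, trigger], 4))   # [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play([trigger, deviator], 3))  # [('C', 'C'), ('C', 'D'), ('D', 'C')]
```

In the second run the deviation at stage 2 is observed through the history, and the trigger strategy switches to punishment from stage 3 on.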

Consider the strategy profile \(\bar {u}(\cdot )= (\bar {u}_1(\cdot ),\) \(\ldots , \bar {u}_i(\cdot ),\) \(\ldots , \bar {u}_n(\cdot ))\) such that

$$\displaystyle \begin{aligned} \sum_{i \in N} H_i(\bar{u})= \max_{u(\cdot)} \sum _{i\in N} H_i(u). \end{aligned} $$
(9)

It is evident that such a strategy profile always exists.

One can take \(\bar {u}_i(\cdot )= (\bar {u}_i^1,\ldots , \bar {u}_i^k, \ldots )\), i ∈ N, such that

$$\displaystyle \begin{aligned} \displaystyle \sum_{i\in N} K_i(\bar{u}_1,\ldots, \bar{u}_i, \ldots, \bar{u}_n) =\displaystyle \max_{u_1,\ldots, u_i, \ldots, u_n}\sum_{i\in N} K_i(u_1,\ldots,u_i,\ldots,u_n) \end{aligned} $$
(10)

and since the stage games are the same (G is a repeated game) we can take \(\bar {u}_i^k=\bar {u}_i\) for all k = 1, 2, …. Then from (8)–(10) we get

$$\displaystyle \begin{aligned} \begin{array}{l} {} \displaystyle\sum_{i \in N} H_i(\bar{u})= \displaystyle\sum_{i \in N} \left(\sum_{k=1}^{\infty} \delta^{k-1}K_i(\bar{u}_1^k,\ldots, \bar{u}_n^k) \right)=\\ =\displaystyle \sum_{i \in N} \left(\sum_{k=1}^{\infty} \delta^{k-1}K_i(\bar{u}_1,\ldots, \bar{u}_n) \right) =\displaystyle \frac{1}{1-\delta} \sum_{i \in N} K_i(\bar{u}_1,\ldots, \bar{u}_n). \end{array}\end{aligned} $$
(11)

Introduce the characteristic function V(S), S ⊂ N, in Γ in the classical sense. Then we have

$$\displaystyle \begin{aligned} V(N) = \sum_{i\in N} K_i(\bar{u}_1, \ldots, \bar{u}_n) \end{aligned} $$
(12)

and it can easily be shown that the characteristic function W(S), S ⊂ N, in G has the form

$$\displaystyle \begin{aligned} W(S)= \frac{1}{1-\delta}V(S), \ S\subset N. \end{aligned} $$
(13)
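Relation (13) is just the geometric series over stages; a one-line numeric check with toy numbers of our own:

```python
# Check of (13): the discounted sum of identical stage values is geometric,
# W(S) = sum_{k>=1} delta^(k-1) V(S) = V(S) / (1 - delta).
delta, V_S = 0.9, 4.0                     # toy numbers
partial = sum(delta ** (k - 1) * V_S for k in range(1, 500))
print(partial, V_S / (1 - delta))         # both ~40
```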

Recall now the definition of a strong (or coalition-proof) Nash equilibrium.

Definition 3

The n-tuple of strategies \((\hat {u}_1,\ldots, \hat {u}_i, \ldots, \hat {u}_n)=\hat {u}\) is called a strong (or coalition-proof) Nash equilibrium (SNE) if for all S ⊂ N and all u_S = {u_i, i ∈ S} the following inequality holds:

$$\displaystyle \begin{aligned} \sum_{i\in S} K_i(\hat{u}) \geq \sum_{i\in S} K_i(\hat{u}|| u_S). \end{aligned} $$
(14)

Consider now the core C in Γ; suppose that C ≠ ∅ and that there exists an imputation α ∈ C such that

$$\displaystyle \begin{aligned} \sum_{i\in S} \alpha _i> V(S),\ S \subset N, \ S \ne N. \end{aligned} $$
(15)

2.1 Associated Zero-Sum Games

Consider a family of zero-sum games Γ_{N∖i,i} with coalition N∖{i} as the first player and coalition {i} as the second. The payoff of N∖{i} equals the sum of the payoffs of the players from N∖{i}. Denote by V(N∖i) the value of Γ_{N∖i,i}, and let \((\bar {\mu }_{N \backslash i},\bar {\mu }_{i} )\) be a saddle point (in mixed strategies) in Γ_{N∖i,i}.

Consider the n-tuple of strategies \(\bar {\mu }=(\bar {\mu }_1,\ldots ,\bar {\mu }_n)\), and define

$$\displaystyle \begin{aligned}\overline{W}(S)=\max_{\mu_S}\sum_{i\in S} K_i(\mu_S; \bar{\mu}_{N\backslash S}),\end{aligned}$$

here μ S = {μ i, i ∈ S}, \(\bar {\mu }_{N\backslash S} =\{\bar {\mu }_i, \ i\in N\backslash S\}\). It is clear that

$$\displaystyle \begin{aligned}\overline{W}(S)\geq V(S), \ \overline{W}(N)=V(N), \ S\subset N. \end{aligned}$$

Suppose that there exists a solution of the system

$$\displaystyle \begin{aligned} \sum_{i\in S} \alpha _i> \overline{W}(S),\ \sum_{i\in N} \alpha _i=\overline{W}(N)=V(N). \end{aligned} $$
(16)

Construct now the modification G_α of the game G. The difference between G_α and G lies in the payoffs of the stage games Γ: when the cooperative strategies \(\bar {u}=(\bar {u}_1, \ldots , \bar {u}_n)\) are used, the payoff is equal to α = (α_1, …, α_n), where α satisfies (16). For all other strategy profiles the payoffs remain as in Γ.

The following theorem holds [10].

Theorem 3

In the game G_α there exist δ ∈ (0, 1) and an SNE such that the payoffs in this SNE are equal to \(\alpha _i\displaystyle \frac {1}{1-\delta }\), which are the payoffs in G_α under cooperation.

2.2 Multistage Games

A multistage game G starts from a fixed stage game Γ(z_1), which can be considered as situated at the position (root) z_1 of the game tree G:

$$\displaystyle \begin{aligned} \varGamma(z_1) = <N; U_1^{z_1},\ldots,U_i^{z_1}, \ldots,U_n^{z_1}; K_1^{z_1},\ldots,K_i^{z_1},\ldots,K_n^{z_1}>. \end{aligned} $$
(17)

For simplicity we suppose that the set of players N is the same in all stage games. As the game G develops, an infinite sequence of stage games is realized, but only a finite number of them are different, since we suppose that the total number of different stage games Γ(z) is finite. As usual in multistage games, we consider the case when the next stage game depends only upon the controls chosen by the players in the previous stage game. As in the previous section, denote by u_i(⋅) the strategy of player i in G (defined as a function of histories). The strategy profile which maximizes the sum of the players' payoffs in G is called the "cooperative" strategy profile, and the corresponding sequence of stage games (equivalently, the sequence of positions on the tree G) the "cooperative trajectory." Suppose that for each stage game Γ(z) the characteristic function V(z, S) (in the classical sense) is defined.

For each stage game Γ(z) consider the family of zero-sum games Γ_{N∖i,i}(z) with corresponding saddle points \((\bar {\mu }^z_{N\backslash i}, \bar {\mu }^z_i)\); for \(\bar {\mu }^z=(\bar {\mu }_1^z,\ldots , \bar {\mu }_n^z)\) define

$$\displaystyle \begin{aligned}\overline{W}(z,S)=\max_{\mu_S^z}\sum_{i\in S}K_i^z(\mu_S^z, \bar{\mu}^z_{N\backslash S}).\end{aligned}$$

Let

$$\displaystyle \begin{aligned}\overline{W}(S)=\sup_{z}\overline{W}(z,S).\end{aligned}$$

Suppose that

$$\displaystyle \begin{aligned}\overline{W}(S)<\inf_{z}\overline{W}(z,N)=\inf_z V(z,N).\end{aligned}$$

Suppose the core C(z) is nonempty in each stage game Γ(z), and denote by D(z) the subcore of C(z), the set of all imputations \(\alpha ^z=(\alpha ^z_1,\ldots , \alpha _n^z)\) with \(\displaystyle \sum _ {i\in S} \alpha _i^z \geq \overline {W}(S)\) for all S.

Suppose that D(z) ≠ ∅ for all z ∈ G, and suppose also that there exists an imputation \(\alpha ^z=(\alpha ^z_1,\ldots , \alpha _n^z)\) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sum_ {i\in S} \alpha_i^z > \overline{W}(S)\ \mbox{for }\ \mbox{all}\ S, \quad \quad \end{array} \end{aligned} $$
(18)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \inf_{ S,z} \left[\sum_ {i\in S} \alpha_i^z - \overline{W}(S)\right] = A >0. \end{array} \end{aligned} $$
(19)

For simplicity we shall consider the special case when \(V(z,N) = \overline {W}(N)\) for all z; then the previous conditions (18) and (19) can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_ {i\in S} \alpha_i >\overline{W}(S)\ \mbox{for }\ \mbox{all}\ S, \quad \quad \end{array} \end{aligned} $$
(20)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \inf_{ S} \left[\sum_ {i\in S} \alpha_i -\overline{W}(S)\right] = A >0, \end{array} \end{aligned} $$
(21)

since the number of different stage games is finite, we can select the same α in all stage games.

Construct now the modification G_α of the game in the same way as it was done above for repeated games. The analog of Theorem 3 holds for the game G_α.

Theorem 4

In the game G_α there exist δ ∈ (0, 1) and an SNE such that the payoffs in this SNE are equal to \(\alpha _i\frac {1}{1-\delta }\), which are the payoffs in G_α under cooperation.

2.3 Time-Consistency and Strong Time-Consistency

Consider the cooperative version of the game G and of the subgame G(z). Introduce the following characteristic function in G and in G(z), respectively:

$$\displaystyle \begin{aligned}\hat{W} (S) =\frac{1}{1-\delta} \overline{W}(S).\end{aligned}$$

Denote by \(\hat {C}\) and \(\hat {C}(z)\) the analogs of the core in G and G(z) under the characteristic function defined above.

Strong time-consistency in this case means that for each imputation \(\bar {\alpha }\in \hat {C}(\bar {z}_0)\) there exists a corresponding IDP \(\bar {\beta }(0), \bar {\beta }(1), \ldots , \bar {\beta }(l), \ldots \) such that

$$\displaystyle \begin{aligned} \sum_{k=0}^l\delta^k\bar{\beta}(k) \oplus\delta^{l+1}\hat{C}(\bar{z}_{l+1})\subset \hat{C}(\bar{z}_0). \end{aligned} $$
(22)

It can easily be seen that if D(z) = D ≠ ∅, then by selecting \(\bar {\beta }(k)=\beta \in D(\bar {z}_k)\) we can guarantee the strong time-consistency of \(\hat {C}(\bar {z}_0)\).
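The inclusion (22) and the switching argument developed below can be checked on toy data; in this sketch the two-player values W̄(S), the discount δ, and all imputations are our own choices.

```python
# Numeric check of the strong time-consistency inclusion (22) on toy data.
delta = 0.5
W_bar = {"1": 1.0, "2": 1.2, "N": 4.0}    # stage values (our illustration)

def in_C_hat(alpha):
    # alpha in C_hat iff alpha_i >= W_bar({i})/(1-delta) and
    # sum(alpha) = W_bar(N)/(1-delta).
    s = 1.0 / (1.0 - delta)
    return (alpha[0] >= W_bar["1"] * s and alpha[1] >= W_bar["2"] * s
            and abs(alpha[0] + alpha[1] - W_bar["N"] * s) < 1e-9)

beta = (1.5, 2.5)                 # beta in D: sum = W_bar(N), beta_i >= W_bar({i})
alpha_bar = tuple(b / (1 - delta) for b in beta)
assert in_C_hat(alpha_bar)        # alpha_bar = sum_k delta^k * beta lies in C_hat

# Switch at stage l = 2 to another imputation alpha'' in C_hat(z_{l+1}):
l, alpha2 = 2, (2.2, 5.8)
assert in_C_hat(alpha2)
geom = sum(delta ** k for k in range(l + 1))          # 1 + delta + delta^2
alpha_prime = tuple(geom * b + delta ** (l + 1) * a
                    for b, a in zip(beta, alpha2))
print(alpha_prime, in_C_hat(alpha_prime))   # approx (2.9, 5.1), and True
```

The spliced payoff stream (β up to stage l, then the switched imputation) again lands in \(\hat{C}(\bar{z}_0)\), which is the content of (22).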

Suppose \(\bar {\alpha }\in \hat {C}(\bar {z}_0)\); then by definition we have

$$\displaystyle \begin{aligned} \sum_{i\in S}\bar{\alpha}_i\geq \hat{W}(S)=\frac{1}{1-\delta} \bar{W} (S); \ \sum_{i\in N}\bar{\alpha}_i= \hat{W}(N)=\frac{1}{1-\delta} \bar{W} (N).\end{aligned}$$

Represent \(\bar {\alpha }\) in the form

$$\displaystyle \begin{aligned} \bar{\alpha}=\sum_{k=0}^{\infty}\delta^k\bar{\beta},\end{aligned}$$

since \(\bar {\alpha } \in \hat {C}(\bar {z}_0)\)

$$\displaystyle \begin{aligned}\sum_{i\in S}\bar{\alpha}_i=\sum_{i\in S}\frac{1}{1-\delta}\bar{\beta}_i\geq \hat{W}(S) = \frac{1}{1-\delta}\bar{W}(S),\end{aligned}$$

and

$$\displaystyle \begin{aligned}\sum_{i\in S}\bar{\beta}_i\geq \bar{W}(S), \ \sum_{i\in N}\bar{\beta}_i=\bar{W}(N).\end{aligned}$$

Thus \(\bar {\beta }\in D(\bar {z}_k)=D\), k = 0, 1, …, l, …. We get that each imputation \(\bar {\alpha }\in \hat {C}(\bar {z}_0)\) can be represented in the form \(\bar {\alpha }=\displaystyle \sum _{k=0}^{\infty }\delta ^k\bar {\beta }(k)\) with \(\bar {\beta }(k)=\bar {\beta }\in D(\bar {z}_k)=D\).

This also gives the strong time-consistency of \(\hat {C}(\bar {z}_0)\).

We have seen that for an arbitrary \(\bar {\alpha }\in \hat {C}(\bar {z}_0)\) there exists an IDP \(\bar {\beta }(0), \bar {\beta }(1), \ldots , \bar {\beta }(k), \ldots \) (in our case \(\bar {\beta }(k)=\bar {\beta }\in D\)) such that

$$\displaystyle \begin{aligned}\bar{\alpha}= \sum_{k=0}^{\infty}\delta^k\bar{\beta}(k).\end{aligned}$$

Suppose that \(\alpha '\in \displaystyle \sum _{k=0}^{l}\delta ^k\bar {\beta }(k)\oplus \delta ^{l+1}\hat {C}(\bar {z}_{l+1})\). To prove (22) we have to show that in this case \(\alpha '\in \hat {C}(\bar {z}_0)\). Considering the stage l, we can write the imputation α′ in the form

$$\displaystyle \begin{aligned}\alpha'=\displaystyle\sum_{k=0}^{l}\delta^k\bar{\beta}(k)+ \delta^{l+1}\alpha^{\prime\prime},\end{aligned}$$

here \(\bar {\beta }(k)=\bar {\beta }\in D\) and \(\alpha ^{\prime \prime }\in \hat {C}(\bar {z}_{l+1})\).

Since \(\alpha ^{\prime \prime }\in \hat {C}(\bar {z}_{l+1})\) we have

$$\displaystyle \begin{aligned}\sum_{i\in S}\alpha^{\prime\prime}_i \geq \hat{W}(S)=\frac{1}{1-\delta}\bar{W}(S), \sum_{i\in N}\alpha^{\prime\prime}_i=\hat{W}(N)=\frac{1}{1-\delta}\bar{W}(N),\end{aligned}$$

and, similarly to the previous case when \(\bar{\alpha} \in \hat {C}(\bar {z}_0)\), we can show that α ′′ can be represented in the form

$$\displaystyle \begin{aligned}\alpha^{\prime\prime}= \sum_{k=l+1}^{\infty}\delta^{k-(l+1)}\beta^{\prime\prime}(k),\end{aligned}$$

where β ′′(k) = β ′′∈ D, k = l + 1, ….

Then we get

$$\displaystyle \begin{aligned}\alpha'= \sum_{k=0}^{l}\delta^{k}\bar{\beta}(k)+\delta^{l+1}\sum_{k=l+1}^{\infty}\delta^{k-(l+1)}\bar{\bar{\beta}}(k)=\sum_{k=0}^{\infty}\delta^{k}\tilde{\beta}(k),\end{aligned}$$

where \(\tilde {\beta }(k)\in D\), with \(\tilde {\beta }(k)=\bar {\beta }(k)=\bar {\beta }\) for k = 0, 1, …, l, and \(\tilde {\beta }(k)=\bar {\bar {\beta }}(k)=\beta ^{\prime \prime }\) for k = l + 1, ….

And we have

$$\displaystyle \begin{aligned}\sum_{i\in S}\alpha^{\prime}_i= \sum_{k=0}^{l}\delta^{k}\sum_{i\in S}\tilde{\beta}_i(k)+\sum_{k=l+1}^{\infty}\delta^{k}\sum_{i\in S}\tilde{\beta}_i(k) =\sum_{k=0}^{l}\delta^{k}\sum_{i\in S}\bar{\beta}_i(k)+\sum_{k=l+1}^{\infty}\delta^k\sum_{i\in S}\bar{\bar{\beta}}_i(k) \geq,\end{aligned}$$
$$\displaystyle \begin{aligned} \geq \sum_{k=0}^l\delta^k\bar{W}(S)+\sum_{k=l+1}^{\infty}\delta^k\bar{W}(S)=\sum_{k=0}^{\infty}\delta^k\bar{W}(S)=\frac{1}{1-\delta}\bar{W}(S)=\hat{W}(S).\end{aligned}$$

In a similar way we can prove that \(\displaystyle \sum _{i\in N }\alpha _i^{\prime }=\hat {W}(N)\). This proves that \(\alpha '\in \hat {C}(\bar {z}_0).\)