1 Introduction

The operator introduced by Shapley (1953) to study zero-sum discounted stochastic games is a non expansive map \(\mathbf{T}\) from a Banach space to itself. Several results have been obtained by using a similar “operator approach” in the framework of zero-sum repeated games (Rosenberg and Sorin 2001; Neyman 2003; Sorin 2003; Neyman and Sorin 2010). In particular the analysis extends to general repeated games (including incomplete information and signals, see Mertens et al. 2015, Chapter IV) and arbitrary evaluation of the sequence of stage payoffs.

An important part of the literature studies families of evaluations with vanishing stage weight (either length going to infinity or discount factor going to 0) and the main issue is the existence of an asymptotic value. Assuming that the stage duration is one, each evaluation induces a time ponderation on \({\mathbb {R}}^+\) and vanishing stage weight leads to an increasing number \(n_a\) of interactions during any given fraction \(a\in ]0,1[\) of the game that has been played according to this ponderation.

We consider here another direction of research: the time ponderation is fixed and the stage duration vanishes (leading to a continuous time game at the limit). Note that, as above, this leads to an increasing number \(n_a\) of interactions.

We study in particular stochastic games with varying stage duration, in the spirit of Neyman (2013). Our approach is based on the non expansive property of the Shapley operator to derive convergence results, characterization of the values, and links with evolution equations in continuous time.

The structure of the paper is as follows:

We first recall the definition of the Shapley operator T associated to a stochastic game (Sect. 2) and the related finite and discounted iterations.

We introduce in Sect. 3 two models of stochastic games with variable stage duration h: linearization via “exact” games, and “discretization” of a continuous time model. In both frameworks we describe the link with the fractional Shapley operator \(\mathbf{T}_h\).

Sections 4 and 5 are devoted to the abstract analysis of various fractional iterations of a general non expansive map \(\mathbf{T}\):

  • first in the finite iteration case, where we establish relations between the n-iterate \( \mathbf{T}_h^n\) and the solution of the evolution equation \(\dot{f}_t = (\mathbf{T}- Id )f_t\) at time \(t = nh\),

  • then in the discounted case, where we identify the \(\lambda \)-discounted evaluation associated to \(\mathbf{T}_h\) as the \( \mu = \frac{\lambda }{1+\lambda - \lambda h} \)-discounted evaluation associated to \(\mathbf{T}\).

We then apply these results to the case of exact stochastic games. Sections 6 and 7 are respectively devoted to the study of games with finite length and with discount factor. In both frameworks we establish results of two different kinds. Firstly, for a fixed evaluation (finite length or discount factor), the value of a game with varying stage duration converges as the stage duration goes to 0. Secondly, the asymptotic behavior of the value (for large length or small discount factor) does not depend on the stage duration.

In Sect. 8 we study the discretization of the continuous time game by approximating with exact games and we prove convergence of the values in the finite length and discounted case as the stage duration vanishes.

The last section provide concluding comments.

2 Stochastic games and Shapley operator

Consider a two person zero-sum stochastic game G with a finite state space \(\Omega \). I and J are compact metric action spaces, X and Y are the sets of regular probabilities on the corresponding Borel \(\sigma \)-algebra. g is a bounded measurable payoff function from \( \Omega \times I \times J \) to \({\mathbb {R}}\) (with multilinear extension to \(X\times Y\)) and for each \((i,j) \in I \times J, \) P(ij) is a transition probability from \( \Omega \) to \(\Delta (\Omega )\) (the set of probabilities on \(\Omega \)). g and P are separately continuous on I and J.

The game is played in stages. At stage n, knowing the state \(\omega _n\), player 1 (resp. 2) chooses \(i_n \in I\) (resp. \(j_n \in J\)), the stage payoff is \(g_n = g( \omega _n, i_n, j_n)\). The next state \( \omega _{n+1}\) is selected according to the probability \(P(i_n, j_n)[\omega _n]\) and is announced to the players.

One associates to G a Shapley operator, see Shapley (1953), which is a map \(\mathbf{T}\) from \(F = {\mathbb {R}}^\Omega \) to itself: \( f \in F \mapsto \mathbf{T}(f)\) defined by

$$\begin{aligned} \mathbf{T}(f) (\omega ) = \underset{(x,y) \in X\times Y}{\mathtt{val}} \left\{ g(\omega ; x, y) + P( x, y) [\omega ] \circ f \right\} , \qquad \forall \omega \in \Omega \end{aligned}$$
(1)

where \( \underset{X\times Y}{\mathtt{val}} \) is the \(\max \min = \min \max = \) value operator on \(X\times Y\),

\(P( x, y) [\omega ] (\omega ') = \int _{I\times J} P( i, j) [\omega ] (\omega ') x(di) y(dj) \) and for \(R \in {\mathbb {R}}^\Omega \), \(R\circ f = \sum _{\zeta \in \Omega } R(\zeta ) f(\zeta )\).

Note that \(\mathbf{T}\) is a non expansive map. Moreover \(\mathbf{T}\) is monotone and translates the constants (for a converse result see, e.g., Kolokoltsov 1992; Sorin 2004 for related consequences) but we will not use here these additional properties.

One can consider two other frameworks with \(\Omega \) standard Borel, where \(\mathbf{T}\) is defined in a similar way with \(P( x, y) [\omega ] \circ f = \int _{\Omega } f(\zeta ) P( x, y) [\omega ] (d\zeta )\) and where F is either:

  • the set of bounded measurable functions on \(\Omega \) and \(P(i,j) [\omega ] (A) \) is separately continous in (ij) for each Borel subset \(A \subset \Omega \) (see Mertens et al. 2015, Prop. VII.1.4),

  • or the set of bounded continuous functions on \(\Omega \) and both maps \((x, \omega ) \mapsto I (x,y; \omega ) = \int _{\Omega } f(\zeta ) P( x, y) [\omega ] (d\zeta )\) and \((y, \omega ) \mapsto I (x,y; \omega )\) are continuous for any bounded continuous function f on \(\Omega \) (see Mertens et al. 2015, Prop. VII.1.5).

For more general conditions see Nowak (1985, 2003).

From now on we assume that one of these cases holds so that \(\mathbf{T}\) is well defined from some Banach space F to itself.

Recall that \(V_n = \mathbf{T}^n (0)\) is the value of the n-stage game with total evaluation \(\sum _{m = 1}^n g_m\) (as a function of the initial state) so that the normalized value is \(v_n = \frac{V_n}{n}\).

\(W_{\lambda } \), which is the unique fixed point of \(f \mapsto \mathbf{T}((1- \lambda ) f)\) on F, is the un-normalized value of the discounted game with total evaluation \(\sum _{m = 1}^{\infty } g_m (1- \lambda )^{m-1}\) and the normalized discounted value is \(w_\lambda = \lambda W_{\lambda } \).

3 Stochastic games with varying stage duration

Let us introduce, for each \((i,j)\in I\times J\), the kernel Q(ij) such that \(P( i, j) = Id + Q( i,j)\) and write \(G= (g, Q)\) for a stochastic game defined as above. One introduces two families of varying stage duration games, see Neyman (2013), associated to G.

3.1 Exact sequence

Consider G as a game with stage duration one. Given a step size \(h \in (0, 1]\), define an “exact” game \( G^h\) with stage duration h, stage payoff hg and stage transition \(P_h = Id + h\, Q\). That is, \(G^h = (h\,g, h\,Q)\).

\(G^h\) appears as a linearization of the game G. During a stage of duration h both the payoff and the state variation are proportional with factor h to those of a stage of duration one.

Definition 3.1

Given \( h \in [0,1]\), let \( \mathbf{T}_{h} = (1-h) Id + h\, \mathbf{T}. \)

Then one has:

Proposition 3.1

If \(\mathbf{T}\) is the Shapley operator of G, then \(\mathbf{T}_h\) is the Shapley operator of the game \(G^h\).

Proof

Since \( \mathbf{T}_{h} (f) = (1-h) f + h \, \mathtt{val}\{ g + P \circ f \}= (1-h) f + \mathtt{val}\{ h\, g + h (Id + Q) \circ f \}\), one obtains

$$\begin{aligned} \mathbf{T}_{h} (f) = \mathtt{val}\left\{ h\, g + P_{h}\circ f \right\} \end{aligned}$$
(2)

with \(P_{h} = Id + h Q\).

Hence \( \mathbf{T}_{h} \) is the one stage operator associated to the game \(G^h\). \(\square \)

We will consider the associated finitely repeated games and discounted games asociated to \(G^h\). Natural questions are, in the finite case:

  1. 1.

    given a total length M, what is the asymptotic behavior of the value of the N-stage game with stage duration h, as h vanishes and \(Nh= M\).

  2. 2.

    what is the asymptotic behavior of the value, as Nh goes to \(\infty \),

and similarly in the discounted framework.

These topics will be addressed in the general setting of a non expansive map \(\mathbf{T}\) in Sects. 4 and 5. In both cases we will obtain explicit formulations for the limits.

3.2 Discretization

Let \(G= (g, Q)\) be a stochastic game with a finite state space. We consider here a continuous time jointly controlled Markov process associated to the kernel Q.

Explicitly, define \({\mathsf {P}}^t (i, j)\) as the continuous time homogeneous Markov chain on \(\Omega \), indexed by \({\mathbb {R}}^+\), with generator Q(ij):

$$\begin{aligned} \dot{{\mathsf {P}}}^t(i,j) = {\mathsf {P}}^t(i,j) Q(i,j). \end{aligned}$$
(3)

Given a stepsize \(h \in (0, 1]\), \(\overline{G}^h\) has to be considered as the discretization with mesh h of the game in continuous time \(\overline{G}\) where the state variable follows \({\mathsf {P}}^t\) and is controlled by both players, see Zachrisson (1964), Tanaka and Wakuta (1977), Guo and Hernandez-Lerma (2003), Neyman (2012), Prieto-Rumeau and Hernandez-Lerma (2012).

More precisely the players act at time \(s= kh\) by choosing actions \((i_s, j_s)\) (at random according to some \({x_s}\), resp. \({y_s}\)), knowing the current state. Between time s and \(s+h\), the state \(\omega _t\) evolves with conditional law \({\mathsf {P}}^t\) following (3) with \(Q(i_s, j_s)\) and \({\mathsf {P}}^s = Id\).

The associated Shapley operator of this stochastic game is \(\overline{\mathbf{T}}_h \) with

where \(g^h (\omega _0, x,y)\) stands for \({\mathsf {E}}[ \int _0^h g( \omega _t; x, y )dt]\) and \({\mathsf {P}}^h ( x, y) = \int _{I\times J} {\mathsf {P}}^h (i, j) x(di) y(dj)\).

The corresponding finitely repeated and discounted games will be analyzed in Sect. 8.

4 Finite iterations of non expansive maps and evolution equations

Consider a non expansive map \(\mathbf{T}\) from a Banach space Z to itself. In this section we recall basic results concerning its iterations and the corresponding discrete and continuous dynamics.

4.1 Finite iteration

The n-stage iteration starting from \(z\in Z \) is \(U_n = {\mathbf{T}}^n (z)\) hence satisfies

$$\begin{aligned} U_n - U_{n-1} = - (Id - {\mathbf{T}})(U_{n-1}) \end{aligned}$$

which can be considered as a discretization of the differential equation

$$\begin{aligned} \dot{f}_t = -(Id - \mathbf{T}) f_t,\quad f_0 = z. \end{aligned}$$
(4)

(Note that this is a special case of the differential inclusion \(\dot{f}_t \in - A f_t\), for the accretive (maximal monotone) operator \(A = Id - {\mathbf{T}}\)).

The comparison between the iterates of \(\mathbf{T}\) and the solution \(f_t(z)\) of the differential equation (4) is given by the generalized Chernoff’s formula (Miyadera and Oharu 1970; Brézis and Pazy 1970), see, e.g., Brézis (1973), p. 16:

Proposition 4.1

$$\begin{aligned} \Vert f_t(z) - {\mathbf{T}}^n (z))\Vert \le \Vert z - \mathbf{T}(z) \Vert \sqrt{ t + (n-t)^2}. \end{aligned}$$
(5)

In particular with \(z = 0\) and \(t=n\), one obtains

$$\begin{aligned} \left\| \frac{f_n(0)}{n} - v_n \right\| \le \frac{\Vert {\mathbf{T}} (0) \Vert }{\sqrt{n}} \end{aligned}$$
(6)

where as before, \(\mathbf{T}^n (0) =V_n= n v_n \).

Given \(h \in (0,1]\), a change of time shows that \(f_{t/h}(z)\) is the solution of

$$\begin{aligned} \dot{g}_t =- \frac{ (Id - {\mathbf{T}}) g_t}{h},\quad g_0 = z. \end{aligned}$$
(7)

4.2 Interpolation

Given \(h \in [0,1]\) introduce again:

$$\begin{aligned} \mathbf{T}_{h} = (1-h) Id + h\, \mathbf{T}. \end{aligned}$$
(8)

Then using (7) which is

$$\begin{aligned} \dot{g}_t =- { (Id - {\mathbf{T}}_h) g_t} ,\quad g_0 = z, \end{aligned}$$

one obtains from (5)

$$\begin{aligned} \Vert f_t(z) - \mathbf{T}_{h}^n (z)\Vert \le \Vert z - \mathbf{T}z\Vert \sqrt{ {t}{h} + (n h - {t})^2}, \end{aligned}$$
(9)

hence in particular with \(h = \frac{t}{n}\)

$$\begin{aligned} \Vert f_t(z) - \mathbf{T}_{t/n}^n (z)\Vert \le \Vert z - \mathbf{T}z\Vert \frac{t}{ \sqrt{n}}, \end{aligned}$$
(10)

or

$$\begin{aligned} \Vert f_{n h} (z) - \mathbf{T}_{h}^n (z)\Vert \le \Vert z - \mathbf{T}z\Vert \, h \sqrt{n}. \end{aligned}$$
(11)

4.3 Eulerian schemes

More generally for a sequence of step sizes \( \{h_{k}\}\) in [0, 1] one defines inductively an Eulerian scheme \(\{z_k\}\) by

$$\begin{aligned} z_{k+1}-z_k = h_{k+1}(T-Id)(z_k) \end{aligned}$$

or

$$\begin{aligned} z_{k+1}=\mathbf{T}_{h_{k+1}} z_{k}. \end{aligned}$$

For two sequences \(\{ h_{k}\}, \{ \hat{h}_{\ell }\}\) in [0, 1], with associated Eulerian schemes

$$\begin{aligned} z_{k+1}= & {} \mathbf{T}_{h_{k+1}} z_{k},\\ \hat{z}_{\ell +1}= & {} \mathbf{T}_{\hat{h}_{\ell +1}} \hat{z}_{\ell }, \end{aligned}$$

Vigeral (2010) obtains

Proposition 4.2

$$\begin{aligned}&\Vert \hat{z}_{\ell } - z_k \Vert \le \Vert \hat{z}_0 - z \Vert + \Vert z_0 - z \Vert + \Vert z - \mathbf{T}z\Vert \, \sqrt{ (\sigma _k - \hat{\sigma }_{\ell } )^2 + \tau _k + \hat{\tau }_{\ell }},\quad \forall z\in z,\nonumber \\ \end{aligned}$$
(12)
$$\begin{aligned}&\Vert f_{t} (z) - z_k \Vert \le \Vert z - \mathbf{T}z\Vert \, \sqrt{ (\sigma _k - t )^2 + \tau _k }, \end{aligned}$$
(13)

with \(z_0 = z\), \(\sigma _k = \sum _{i = 1}^k h_ i\) , \( \tau _k = \sum _{i = 1}^k h_ i^2\), \(\hat{\sigma }_\ell = \sum _{j = 1}^\ell \hat{h}_ j\) , \(\hat{\tau }_\ell = \sum _{j = 1}^\ell \hat{h}_ j^2\).

In particular this gives, in the uniform case \(h_i = h, \forall i\)

$$\begin{aligned} \Vert f_t(z) - \mathbf{T}_{h}^n (z))\Vert \le \Vert z - \mathbf{T}z\Vert \sqrt{ n h^2+ (n h - {t})^2} \end{aligned}$$
(14)

and coincides with (10) and (11) at \(t = n h\).

4.4 Two approximations

Equation (9) or more generally (13) corresponds to two approximations:

  1. i.

    Comparison on a compact interval [0, M] of \(f_t\) to the linear interpolation \( \hat{\mathbf{T}}_{M/n}^{s}\) of \(\{\mathbf{T}_{M/n}^m\}, m= 0, \ldots , n \), which is, using (10):

    $$\begin{aligned} \left\| f_t(0) - \hat{\mathbf{T}}_{M/n}^{nt/M} (0)\right\| \le K \frac{M}{ \sqrt{n}},\quad \forall t \in [0, M ], \end{aligned}$$

    for some constant K.

Or more generally if one considers a sequence of step sizes \(\{h_i\}, i= 1, \ldots , k\) with \(\sigma _k = \sum _{i=1}^k h_i = M\), \(h^i \le h, \forall i\) and \( \Pi _i\mathbf{T}_{h_i} = \mathbf{T}_{h_1}\circ \cdots \circ \mathbf{T}_{h_k}\), one has:

$$\begin{aligned} \Vert f_M(0) - \Pi _i\mathbf{T}_{h_i} (0)\Vert \le K \sqrt{h M}. \end{aligned}$$
(15)

Thus the composite iteration \( \Pi _i\mathbf{T}_{h_i}\) converges to the solution of (4) as the mesh h goes to 0.

  1. ii.

    Asymptotic comparison of the behavior of \(f_t\), solution of (4) and iterations of the form \(\Pi _i\mathbf{T}_{h_i}\) with step size \(h_i \le 1\) and total length \( \sigma _k = t\):

    $$\begin{aligned} \Vert f_t (0) -\Pi _i\mathbf{T}_{h_i} (0) \Vert \le K \sqrt{t}. \end{aligned}$$
    (16)

5 Discounted iterations of non expansive maps

5.1 General properties

For \(\lambda \in (0,1]\) denote again by \( W_{\lambda }\) the unique fixed point of \(z \mapsto T ((1- \lambda ) z)\) and let \(w_{\lambda }= \lambda W_{\lambda }\). We recall some basic evaluations, see e.g. Vigeral (2009).

Proposition 5.1

$$\begin{aligned} \Vert w_{\lambda }\Vert\le & {} \Vert \mathbf{T}(0)\Vert ,\\ \Vert w_{\lambda }- w_\mu \Vert\le & {} 2| 1 - \frac{\lambda }{\mu } | \Vert \mathbf{T}(0)\Vert . \end{aligned}$$

Proof

First, one has:

$$\begin{aligned} \Vert W_{\lambda }\Vert - \Vert \mathbf{T}(0) \Vert \le \Vert W_{\lambda }- \mathbf{T}(0) \Vert = \Vert \mathbf{T}( [1- \lambda ] W_{\lambda }) - \mathbf{T}(0) \Vert \le (1- \lambda ) \Vert W_{\lambda }\Vert \end{aligned}$$

which implies \( \lambda \Vert W_{\lambda }\Vert \le \Vert \mathbf{T}(0) \Vert \).

Moreover:

$$\begin{aligned} \Vert W_{\lambda }- W_\mu \Vert= & {} \Vert \mathbf{T}( [1- \lambda ] W_{\lambda }) - \mathbf{T}( [1- \mu ] W_\mu ) \Vert \\\le & {} \Vert [1- \lambda ] W_{\lambda }- [1- \mu ] W_\mu \Vert \\\le & {} (1- \lambda ) \Vert W_{\lambda }- W_\mu \Vert + | \lambda - \mu \Vert \Vert W_\mu \Vert \end{aligned}$$

thus

$$\begin{aligned} \lambda \Vert W_{\lambda }- W_\mu \Vert \le | \lambda - \mu \Vert \Vert W_\mu \left\| \le \frac{| \lambda - \mu |}{\mu } \right\| \mathbf{T}(0) \Vert . \end{aligned}$$

On the other hand:

$$\begin{aligned} \Vert w_{\lambda }- w_\mu \Vert \le \lambda \Vert W_{\lambda }- W_\mu \Vert + | \lambda - \mu \Vert \Vert W_\mu \Vert \end{aligned}$$

hence

$$\begin{aligned} \Vert w_{\lambda }- w_\mu \Vert \le 2 | \lambda - \mu \Vert \Vert W_\mu \Vert \end{aligned}$$

and the result follows from \( \Vert W_\mu \Vert \le \Vert \mathbf{T}(0) \Vert / \mu \). \(\square \)

5.2 Discounted values

For any non expansive operator \(\mathbf{T}\) on Z and \(h \in (0,1]\), introduce

$$\begin{aligned} W_\lambda ^h=\mathbf{T}_h((1-\lambda h)W_\lambda ^h) \end{aligned}$$

as the unique fixed point point of \(u\mapsto \mathbf{T}_h ( (1-\lambda h)u)\) and define

$$\begin{aligned} w_\lambda ^h=\lambda W_\lambda ^h = \lambda \mathbf{T}_h\left( \frac{1-\lambda h}{\lambda }w_\lambda ^h\right) . \end{aligned}$$
(17)

\(W_\lambda ^h\) is the un-normalized \(\lambda \)-evaluation computed through a stage of duration h using the linearization \(\mathbf{T}_h\) of \( \mathbf{T}\) and \(w_\lambda ^h\) is the associated normalization.

Recall that for \(h=1\), \(\mathbf{T}_h = \mathbf{T}\) and \(w_\lambda = w_\lambda ^h\).

Proposition 5.2

$$\begin{aligned} w_{\lambda }^h=w_\mu ,\quad \text{ with } \ \mu = \frac{\lambda }{1+ \lambda - \lambda \, h }. \end{aligned}$$

Proof

By definition of \(\mathbf{T}_h\),

$$\begin{aligned} w_{\lambda }^h= & {} \lambda ((1-h)Id+h\mathbf{T})\left( \frac{1-\lambda h}{\lambda }w_\lambda ^h\right) \\= & {} (1-h)(1-\lambda h)w_{\lambda }^h+\lambda h \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_\lambda ^h\right) . \end{aligned}$$

Hence

$$\begin{aligned} (1+\lambda -\lambda h)w_{\lambda }^h=\lambda \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_\lambda ^h\right) \end{aligned}$$

which is

$$\begin{aligned} w_{\lambda }^h=\mu \mathbf{T}\left( \frac{1-\mu }{\mu }w_\lambda ^h\right) \end{aligned}$$

for \(\mu = \frac{\lambda }{1+ \lambda - \lambda \, h }\).

The non expansiveness of \(\mathbf{T}\) yields uniqueness, hence the result. \(\square \)

5.3 Vanishing duration

Introduce \( \mathbf{D}^h_\lambda \), the auxiliary one stage operator associated to the \(\lambda \)-discounted evaluation of \(\mathbf{T}_h\), defined by

$$\begin{aligned} \mathbf{D}^h_\lambda \,z = \lambda \mathbf{T}_h\left[ \left( \frac{1-\lambda h}{\lambda }\right) z\right] \end{aligned}$$

which is \((1 - \lambda h) \) contracting.

In particular \( \mathbf{D}^h_\lambda \, w_{\lambda }^h = w_{\lambda }^h\) and for \(h=1\), \( \mathbf{D}^1_\lambda \, w_{\lambda } = w_{\lambda }\).

Proposition 5.3

For any \(z \in Z\)

$$\begin{aligned} w_{\lambda }^h= \lim _{n \rightarrow + \infty } \left( \mathbf{D}^h_\lambda \right) ^n z \end{aligned}$$

and \( w_{\lambda }^h \rightarrow w_\frac{\lambda }{1+ \lambda }\) as \(h \rightarrow 0\) with

$$\begin{aligned} \left\| w_{\lambda }^h- w_{\frac{\lambda }{1+\lambda }}\right\| \le C \lambda h. \end{aligned}$$

Proof

The first equality follows from definition (17).

By Proposition 5.1 and 5.2

$$\begin{aligned} \left\| w_\lambda ^h-w_{\frac{\lambda }{1+\lambda }}\right\|\le & {} C\left\| 1-\frac{1+\lambda -\lambda h}{1+\lambda }\right\| \\\le & {} C\lambda h. \end{aligned}$$

\(\square \)

More generally one can consider a sequence of step sizes \(\{h_i\}\) with \(h_i \le h\) and \(\sum _i h_i = + \infty \) and the associated operator \( \Pi _i \mathbf{D}^{h_i}_\lambda \).

Lemma 5.1

For any \(z \in Z\) and any sequence \(h_1,\ldots ,h_n\),

$$\begin{aligned} \left\| \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda (z)-w_{\frac{\lambda }{1+\lambda }} \right\| \le 2\Vert \mathbf{T}(0)\Vert \max _{1\le i\le n} h_i + (\Vert \mathbf{T}(0)\Vert +\Vert z\Vert ) \Pi _{i=1}^n (1-\lambda h_i). \end{aligned}$$

Proof

By non expansiveness,

$$\begin{aligned} \left\| \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda (z) - \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) \right\|\le & {} \left\| z-w_{\frac{\lambda }{1+\lambda }}\right\| \, \Pi _{i=1}^n (1-\lambda h_i)\\\le & {} \left( \Vert \mathbf{T}(0)\Vert +\Vert z\Vert \right) \, \Pi _{i=1}^n (1-\lambda h_i). \end{aligned}$$

Hence it is enough to show that \(\Vert \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda (w_{\frac{\lambda }{1+\lambda }})-w_{\frac{\lambda }{1+\lambda }} \Vert \le 2\Vert \mathbf{T}(0)\Vert \max _{1\le i\le n} h_i\).

Let \(d_k=\Vert \Pi _{i=k}^n \mathbf{D}^{h_i}_\lambda (w_{\frac{\lambda }{1+\lambda }})-w_{\frac{\lambda }{1+\lambda }} \Vert \). Then

$$\begin{aligned} d_k\le & {} \left\| \Pi _{i=k}^n \mathbf{D}^{h_i}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) - \mathbf{D}^{h_{k}}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) \right\| + \left\| \mathbf{D}^{h_{k}}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) -w_{\frac{\lambda }{1+\lambda }}\right\| \\\le & {} (1-\lambda h_{k})d_{k+1} +\left\| \mathbf{D}^{h_{k}}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) -w_{\frac{\lambda }{1+\lambda }}\right\| . \end{aligned}$$

Now, for any h,

$$\begin{aligned} \left\| \mathbf{D}^{h}_\lambda \left( w_{\frac{\lambda }{1+\lambda }}\right) -w_{\frac{\lambda }{1+\lambda }}\right\|= & {} \left\| (1-h)(1-\lambda h)w_{\frac{\lambda }{1+\lambda }} +\lambda h \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_{\frac{\lambda }{1+\lambda }}\right) -w_{\frac{\lambda }{1+\lambda }}\right\| \\\le & {} \lambda h^2 \left\| w_{\frac{\lambda }{1+\lambda }} \right\| + \left\| \lambda h \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_{\frac{\lambda }{1+\lambda }}\right) -h(1+\lambda )w_{\frac{\lambda }{1+\lambda }} \right\| \\= & {} \lambda h^2 \left\| w_{\frac{\lambda }{1+\lambda }} \right\| + \left\| \lambda h \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_{\frac{\lambda }{1+\lambda }}\right) -\lambda h\mathbf{T}\left( \frac{1}{\lambda }w_{\frac{\lambda }{1+\lambda }}\right) \right\| \\\le & {} 2\lambda h^2 \left\| w_{\frac{\lambda }{1+\lambda }}\right\| \\\le & {} 2\Vert \mathbf{T}(0)\Vert \lambda h^2. \end{aligned}$$

Hence

$$\begin{aligned} d_k\le & {} (1-\lambda h_{k})d_{k+1}+ 2\Vert \mathbf{T}(0)\Vert \lambda h_k^2\\= & {} (1-\lambda h_{k})d_{k+1}+\lambda h_k (2\Vert \mathbf{T}(0)\Vert h_k)\\\le & {} \max (d_{k+1}, 2\Vert \mathbf{T}(0)\Vert h_k). \end{aligned}$$

Since \(d_{n+1}=0\) we get \(d_1\le 2\Vert \mathbf{T}(0)\Vert \max _{1\le i\le n} h_i\) as claimed. \(\square \)

In particular one gets

Proposition 5.4

For any \(z \in Z\), and any sequence \(\{h_i\}\) with \(h_i \le h\) and \(\sum _i h_i = + \infty \),

$$\begin{aligned} \left\| \Pi _{i=1}^\infty \mathbf{D}^{h_i}_\lambda (z)-w_{\frac{\lambda }{1+\lambda }}\right\| \le 2\Vert \mathbf{T}(0)\Vert h. \end{aligned}$$

5.4 Asymptotic properties

An easy consequence of Proposition 5.2 is that for a given h, \(w_\lambda ^h\) has the same asymptotic behavior, as \(\lambda \) tends to 0, as \(w_\lambda \).

Proposition 5.5

$$\begin{aligned} \Vert w_\lambda ^h-w_\lambda \Vert \le 2C\lambda . \end{aligned}$$

Proof

By Proposition 5.1 and 5.2,

$$\begin{aligned} \left\| w_\lambda ^h-w_\lambda \right\|= & {} 2 C | 1 - (1 + \lambda - \lambda h | \\\le & {} 2C \lambda . \end{aligned}$$

\(\square \)

To generalize this property in order to apply it to games with varying duration we need an additional assumption on the operator \(\mathbf{T}\).

Definition 5.1

The operator \(\mathbf{T}\) satisfies assumption (H) if there exists two nondecreasing functions \(k:]0,1]\rightarrow {\mathbb {R}}^+\) and \(\ell :[0,+\infty ]\rightarrow {\mathbb {R}}^+\) with \(k(\lambda )=o(\sqrt{\lambda })\) as \(\lambda \) goes to 0 and

$$\begin{aligned} \Vert \mathbf{D}^{1}_\lambda (z)-\mathbf{D}^{1}_\mu (z)\Vert \le k(|\lambda -\mu |) \ell (\Vert z\Vert ) \end{aligned}$$

for all \((\lambda ,\mu )\in ]0,1]^2\) and \(z\in Z\).

Proposition 5.6

If \(\mathbf{T}\) satisfies (H) then for any \(z \in Z\) and any sequence \(\{h_i\}\) with \(\sum _i h_i = + \infty \), \( \Vert \Pi _{i=1}^\infty \mathbf{D}^{h_i}_\lambda (z)-w_\lambda \Vert \) goes to 0 as \(\lambda \) goes to 0.

Proof

Since \(\mathbf{D}^{h_i}_\lambda \) is \(1-\lambda h\) contracting and \(\sum _i h_i = + \infty \), \( \Pi _{i=1}^\infty \mathbf{D}^{h_i}_\lambda (z)\) is independent of z and one may assume \(z=w_\lambda \).

Define \(d_n=\Vert \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda (w_\lambda )-w_\lambda \Vert \) hence \(d_0=0\) and

$$\begin{aligned} d_n\le & {} \left\| \Pi _{i=1}^n \mathbf{D}^{h_i}_\lambda (w_\lambda )- \mathbf{D}^{h_n}_\lambda (w_\lambda )\right\| +\left\| \mathbf{D}^{h_n}_\lambda (w_\lambda )-w_\lambda \right\| \\\le & {} (1-\lambda h_n)d_{n-1}+\left\| \mathbf{D}^{h_n}_\lambda (w_\lambda )-w_\lambda \right\| . \end{aligned}$$

For any h,

$$\begin{aligned} \left\| \mathbf{D}^{h}_\lambda (w_\lambda )-w_\lambda \right\|\le & {} \left\| (1-h)(1-\lambda h)w_\lambda +\lambda h \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_\lambda \right) -w_\lambda \right\| \\= & {} h(1+\lambda -\lambda h)\left\| \frac{\lambda }{1+\lambda -\lambda h} \mathbf{T}\left( \frac{1-\lambda h}{\lambda }w_\lambda \right) -w_\lambda \right\| \\= & {} h(1+\lambda -\lambda h)\left\| \mathbf{D}^{1}_\mu (w_\lambda )-\mathbf{D}^{1}_\lambda (w_\lambda )\right\| \text { with }\mu =\frac{\lambda }{1+\lambda -\lambda h}\\\le & {} h(1+\lambda -\lambda h)\ell (\Vert w_\lambda \Vert ) k\left( \frac{\lambda ^2(1-h)}{1+\lambda -\lambda h}\right) \text { by (H)}\\\le & {} 2h\ell (\Vert T(0)\Vert )k(\lambda ^2). \end{aligned}$$

Hence

$$\begin{aligned} d_n\le & {} (1-\lambda h_n)d_{n-1}+ \lambda h_n \left[ 2\ell (\Vert T(0)\Vert )\frac{k(\lambda ^2)}{\lambda }\right] \\\le & {} \max \left( d_{n-1}, 2\ell (\Vert T(0)\Vert )\frac{k(\lambda ^2)}{\lambda }\right) \end{aligned}$$

and \(d_n\le 2\ell (\Vert T(0)\Vert )\frac{k(\lambda ^2)}{\lambda }\) for all n. The result follows since by assumption \(k(\lambda ^2)=o(\lambda )\). \(\square \)

5.5 Invariant properties

We now consider another family of operators parametrized by \(\alpha \in [0,1]\).

Define for \(\alpha \in [0,1]\), \(\widetilde{\mathbf{T}}_\alpha \) by

$$\begin{aligned} \widetilde{\mathbf{T}}_\alpha z = (1-\alpha )z +\mathbf{T}(\alpha z ). \end{aligned}$$
(18)

Thus \(\widetilde{\mathbf{T}}_\alpha \) is non expansive, hence for \(\lambda \in ]0,1]\) one can consider the associated \(\lambda \)-discounted fixed point \({\widetilde{w}}^\alpha _\lambda \) defined by

$$\begin{aligned} {\widetilde{w}}^\alpha _\lambda = \lambda \widetilde{\mathbf{T}}_\alpha \left( \frac{1-\lambda }{\lambda }\widetilde{w}^\alpha _\lambda \right) . \end{aligned}$$

Note that for \(\alpha = 1\), \({\widetilde{w}}^\alpha _\lambda = {w}_\lambda \).

Proposition 5.7

$$\begin{aligned} \widetilde{w}^\alpha _\lambda =w_\mu , \quad \text{ with } \ \mu = \frac{\lambda }{ \alpha + \lambda - \lambda \, \alpha }. \end{aligned}$$

Proof

Direct computation gives

$$\begin{aligned} {\widetilde{w}}^\alpha _\lambda= & {} \lambda \widetilde{\mathbf{T}}_\alpha \left( \frac{1-\lambda }{\lambda }\widetilde{w}^\alpha _\lambda \right) \\= & {} \lambda \left[ (1-\alpha ) \frac{(1-\lambda )}{\lambda }\widetilde{w}^\alpha _\lambda + \mathbf{T}\left( \alpha \frac{(1-\lambda )}{\lambda }\widetilde{w}^\alpha _\lambda \right) \right] . \end{aligned}$$

Thus

$$\begin{aligned} \left( \alpha +\lambda -\lambda \alpha \right) {\widetilde{w}}^\alpha _\lambda =\lambda \mathbf{T}\left. \left( \alpha \frac{(1-\lambda )}{\lambda }\widetilde{w}^\alpha _\lambda \right) \right) \end{aligned}$$

which is

$$\begin{aligned} {\widetilde{w}}^\alpha _\lambda =\mu \mathbf{T}\left( \frac{1-\mu }{\mu }\widetilde{w}^\alpha _\lambda \right) \end{aligned}$$

for \(\mu = \frac{\lambda }{\alpha + \lambda - \lambda \, \alpha }\), hence the result. \(\square \)

Corollary 5.1

For \(\lambda \le 1/2\), \(\widetilde{w}^{\frac{\lambda }{1-\lambda }}_\lambda \) does not depend on \(\lambda \) and equals \(w_{1/2}\).

6 Finitely repeated exact games

We consider the family of exact games \(G^h = (h\, g, h \, Q) \) with \(h\in [0,1]\).

6.1 Approximation of the value

The recursive equation for the un-normalized value \(V_n^h \) of the n-stage game \(G^h\) (of total length nh) is given by:

$$\begin{aligned} V_n^h (\omega ) = \mathtt{val}\left[ h \,g( \omega ; .) + P_h (.) [\omega ]\circ V_{n-1}^h \right] \end{aligned}$$

so that

$$\begin{aligned} V_n^h = \mathbf{T}_h V_{n-1}^h = \mathbf{T}_h^n (0) \end{aligned}$$

Let f be the solution of (4) with \( \mathbf{T}\) satisfying (1). Using the results of Sect. 4 in particular (11) we obtain:

Proposition 6.1

There exists a constant L such that for all n and \(h \in [0,1]\)

$$\begin{aligned} \left\| V_n^h - f_{nh} (0) \right\| \le L h \sqrt{n}. \end{aligned}$$

6.2 Vanishing step sizes

The previous Proposition 6.1 shows that \(v_n^h=\frac{ V_n^h}{N}\), which is the normalized value of the n-stage game \(G^h\) with length \(N= nh\), satisfies:

$$\begin{aligned} \Vert v_n^h - \frac{f_{N} (0)}{N} \Vert \le L\sqrt{\frac{h}{N}}. \end{aligned}$$

In fact Proposition 4.2 induces a more precise result for vanishing stage duration, that we now describe.

Given \(t>0\) and a finite partition \(H_t \) of \([0,t ], t_0= 0, \ldots , t_k = t \), induced by step sizes \(\{ h_i\}, 1\ge h_i >0, i = 1, \ldots ,k, \sum _{i\le j} h_i = t_j\), we define its mesh as \(m (H_t) = \max _i h_i\).

We consider the k stage game where the duration of stage i is \(h_i\). Let \(U( H_t)\) be its un-normalized value (the normalized value is \(u(H_t) = \frac{U( H_t)}{t}\)). Thus \(U( H_t) = \mathbf{T}^{h_1} \circ \cdots \circ \mathbf{T}^{h_k} (0)\).

Definition 6.1

\(\widehat{V}_t \) is the limit value on [0, t ] if for any sequence of partitions \(\{H^n_t\}\) of [0, t] with vanishing mesh, the sequence of values \(\{U( H_t^n)\}\) of the corresponding games converges to \(\widehat{V}_t \).

Proposition 6.2

There exists a constant \(L'\) such that for any \(H_t\) with \(m(H_t) \le h\), the un-normalized value \(U ( H_t)\) satisfies

$$\begin{aligned} \Vert U ( H_t) - f_t (0)\Vert \le L' \sqrt{h \, t}. \end{aligned}$$

Thus the limit value \(\widehat{V}_t \) exists and is given by \(\widehat{V}_t = f_t(0)\).

Proof

The inequality is obtained from Eq. (13) with \(\sigma _k = t\) and \(\tau _k \le h\, t\).

The existence of \(\widehat{V}_t \) follows. \(\square \)

The interpretation of these results is twofold:

first the value of the game with finite length is essentially independent of the duration of the stages when this duration is small enough,

second, this value is given by the solution of the associated differential equation (4).

Note that \(\widehat{V}_t \) equals also the value of the continuous time game of length t introduced in Neyman (2012).

6.3 Asymptotic analysis

A further consequence of Proposition 6.2 is that for any t and any k-stage game associated to a finite partition \(H_t\), with normalized value \(u(H_t)\), one has:

Proposition 6.3

There exists \(L'\) such that for any \(H_t\)

$$\begin{aligned} \left\| u(H_t) - \frac{f_t(0)}{t} \right\| \le \frac{L'}{\sqrt{t}}. \end{aligned}$$

In particular the asymptotic behavior of the (normalized) value of the game depends only on its total length t (and not on the durations of the individual stages) up to a term \(O ( \frac{1}{\sqrt{t}})\). Again the comparison quantity is given by the normalized solution of the associated differential equation (4).

7 Discounted exact games

7.1 Values of discounted exact game

We follow the definition of Neyman (eq. (3) p. 254 in Neyman 2013) : the (normalized) value \(w_{\lambda }^h\) of the \(\lambda \) discounted game \(G^h\) is the unique solution of

$$\begin{aligned} w_{\lambda }^h (\omega ) = \mathtt{val}_{X\times Y}\left[ h \lambda g(\omega ,x,y) + (1- h {\lambda }) P_h (x,y)[\omega ] \circ w_{\lambda }^h \right] . \end{aligned}$$

with \(P_h=Id+hQ\).

In particular, for \(h= 1\) one recovers \(w_{\lambda } = w_{\lambda }^1\) (see Sect. 5.1) associated to \(\mathbf{T}\) defined in (1).

The notation is consistent with the previous Sect. 5 since one has

Proposition 7.1

\(w_{\lambda }^h\) corresponds to the solution of (17).

Proof

$$\begin{aligned} w_\lambda ^h= \lambda \mathbf{T}_h\left( \frac{1-\lambda h}{\lambda }w_\lambda ^h\right) =\lambda \, \mathtt{val}_{X\times Y}\left[ h g(\omega ,x,y) + P_h (x,y)[\omega ] \circ \frac{1-\lambda h}{\lambda }w_{\lambda }^h \right] . \end{aligned}$$

Hence \(w_\lambda ^h= \mathtt{val}_{X\times Y}\Bigg [ h \lambda g(\omega ,x,y) + (1- h {\lambda }) P_h (x,y)[\omega ] \circ w_{\lambda }^h \Bigg ]\). \(\square \)

A direct computation using \( P_h = Id + h\, Q\) gives

Proposition 7.2

\(w_{\lambda }^h\) is the only solution of

$$\begin{aligned} \varphi (\omega ) = \mathtt{val}_{X\times Y}\left[ g(\omega ,x,y) + \frac{(1- h {\lambda })}{\lambda } Q (x,y)[\omega ] \circ \varphi \right] . \end{aligned}$$

We now apply the results of Sect. 5

Proposition 7.3

$$\begin{aligned} w_{\lambda }^h=w_\mu , \quad \text{ with } \ \mu = \frac{\lambda }{1+ \lambda - \lambda \, h }. \end{aligned}$$

Proof

Apply Proposition 5.2. \(\square \)

7.2 Vanishing duration

We now recover the convergence property in Neyman (2013).

Corollary 7.1

For a fixed \(\lambda \), \(w_{\lambda }^h\) converges as h goes to 0. The limit, denoted \(\widehat{w}_{\lambda }\), equals \(w_\frac{\lambda }{1+\lambda }\), hence is the only solution of:

$$\begin{aligned} \varphi = \mathtt{val}\left[ g + \frac{Q}{\lambda } \circ \varphi \right] . \end{aligned}$$
(19)

Moreover, \(\Vert w_{\lambda }^h-\widehat{w}_{\lambda }\Vert \le C \lambda h\).

Proof

For the convergence, apply Proposition 5.3.

By definition \(w_\frac{\lambda }{1+\lambda }\) satisfies

$$\begin{aligned} w_\frac{\lambda }{1+\lambda } = \mathtt{val}\left[ \frac{\lambda }{1+\lambda } g + \frac{1}{1+\lambda } (Id+Q) \circ w_\frac{\lambda }{1+\lambda }\right] . \end{aligned}$$

that is

$$\begin{aligned} w_\frac{\lambda }{1+\lambda } = \mathtt{val}\left[ g + \frac{Q}{\lambda } \circ w_\frac{\lambda }{1+\lambda }\right] . \end{aligned}$$

\(\square \)

More generally consider a sequence of stage durations \(\{h_i\}\) with \(h_i \le h\) and \(\sum _i h_i = + \infty \) inducing a partition H. The value of the associated \(\lambda \)-discounted game \(W_\lambda ^H\) is given by \( \Pi _{i=1}^{+ \infty } \mathbf{D}^{h_i}_\lambda (0)\) hence satisfies

Proposition 7.4

$$\begin{aligned} \left\| W_\lambda ^H - \widehat{w}_{\lambda }\right\| \le 2 \Vert \mathbf{T}(0)\Vert \, h. \end{aligned}$$

Proof

Apply Proposition 5.4. \(\square \)

Once again \(\widehat{w}_{\lambda }\) has to be interpreted as the \(\lambda \)-discounted value of the continuous time game, see Guo and Hernandez-Lerma (2005); Neyman 2012, 2013).

Note that our game theoretic framework is very general, in particular there is no finiteness assumption on the actions or states.

7.3 Asymptotic behavior

The value \(w_\lambda ^h\) of the \(\lambda \) discounted game with stage duration h has the same asymptotic behavior, as \(\lambda \) tends to 0, as \(w_\lambda \).

Proposition 7.5

$$\begin{aligned} \left\| w_\lambda ^h-w_\lambda \right\| \le 2C\lambda . \end{aligned}$$

Proof

Apply Proposition 5.5. \(\square \)

More generally one obtains

Proposition 7.6

For any \(\{h_i\}\) with \(\sum _i h_i = + \infty \) inducing a partition H, \(\Vert W_\lambda ^H - w_{\lambda }\Vert \le C' \lambda \) where C’ depends only on the game.

Proof

Immediate consequence of Proposition 5.6 and its proof, since, by non expansiveness of the value operator, for any game with a payoff bounded by C the associated Shapley operator \(\mathbf{T}\) satisfies assumption (H) with \(k(\lambda )= \lambda =o(\sqrt{\lambda })\) and \(\ell (\Vert z\Vert )=C+\Vert z\Vert \). \(\square \)

7.4 Invariance properties

Let \(\mathbf{T}\) be the Shapley operator associated to the game (gQ). Then \(\widetilde{\mathbf{T}}_\alpha \) defined by (18) is the Shapley operator associated to \((g, \alpha \,Q)\) since

$$\begin{aligned} \widetilde{\mathbf{T}}_\alpha (f)= & {} \mathtt{val}_{X\times Y}\left[ g(\omega ,x,y) + (Id + Q (x,y)) [\omega ] \circ \alpha f \right] + (1- \alpha ) f\\= & {} \mathtt{val}_{X\times Y}\left[ g(\omega ,x,y) + (Id + \alpha Q (x,y)) [\omega ] \circ f \right] . \end{aligned}$$

This implies

Proposition 7.7

For any kernel R, the \(\lambda \)-discounted value of the game \(G (g; \frac{\lambda }{(1- {\lambda })}R)\) is independent of \(\lambda \le 1/2\) and the only solution of

$$\begin{aligned} \varphi (\omega ) = \mathtt{val}_{X\times Y}\left[ g(\omega ,x,y) + R (x,y)[\omega ] \circ \varphi \right] . \end{aligned}$$

Proof

Apply Corollary 5.1. \(\square \)

This shows a tradeoff between the size of the kernel and the discount factor. Taking into account Proposition 7.3 one derives an invariance property of the value on the product space: discount factor \( \times \) stage duration \( \times \) kernel:

$$\begin{aligned} Val (\lambda , h , R) = Val \left( \frac{\lambda }{1+ \lambda - \lambda \,h} , 1, R\right) = Val \left( \lambda , 1, \frac{1- \lambda \,h }{1- \lambda }R \right) . \end{aligned}$$

Similar covariance properties were obtained in Neyman (2012, 2013).

8 Discretization approach

We consider now the game \(\overline{G}^h\) which corresponds to the discretization of the continuous time game. We will study two frameworks, like in the previous sections: either a fixed finite length or a fixed discount factor and we will analyse the behavior of the associated values as the stage duration h goes to 0.

8.1 Finite length

The un-normalized value \(\overline{V}^h_n\) of the n-stage game with stage duration h satisfies \(\overline{V}^h_n= (\overline{\mathbf{T}}_h)^n (0)\). Similarly for varying stage duration, corresponding to a partition H, one gets a recursive equation of the form \(\overline{V}_H(t)= \Pi _i \overline{\mathbf{T}}_{h_i}(0)\).

Lemma 8.1

There exists \(C_0\) such that

$$\begin{aligned} \left\| \mathbf{T}_h(f)-\overline{\mathbf{T}}_h(f)\right\| \le C_0(1+\Vert f\Vert ) h^2. \end{aligned}$$

Proof

By non expansiveness of the value operator,

$$\begin{aligned} \left\| \mathbf{T}_h(f)-\overline{\mathbf{T}}_h(f)\right\|\le & {} \Vert h \,g(\cdot )-g^h(\cdot ) \Vert + \Vert f\Vert \Vert P_h(\cdot )-{\mathsf {P}}^h(\cdot ) \Vert \\= & {} h \,O(h)+ \Vert f\Vert h\, O(h) \end{aligned}$$

since \(P_h=Id+h\,Q\) and \({\mathsf {P}}^h=e^{hQ}=Id+h\,Q+ h \,O(h)\). \(\square \)

Proposition 8.1

There exists C depending only of the game G such that for any finite sequence \((h_i)_{i\le n}\) in [0, h] with sum t and corresponding partition H:

$$\begin{aligned} \left\| \overline{V}_H (t)-\widehat{V}_t \right\| \le C\left( \sqrt{ht}+ht+ht^2\right) . \end{aligned}$$

In particular for a given t, \(\overline{V}_H (t)\) tends to \(\widehat{V}_t\) as h goes to 0.

Proof

The value of any game with total length less than t is bounded by some \(C_1\)t, independently of h. Hence non expansiveness of the operators as well as the previous Lemma 8.1 gives

$$\begin{aligned} \left\| \overline{V}_H (t)-\Pi _i \mathbf{T}_{h_i}(0)\right\|&=\left\| \Pi _i \overline{\mathbf{T}}_{h_i}(0)-\Pi _i \mathbf{T}_{h_i}(0)\right\| \\&\le \left\| \overline{\mathbf{T}}_{h_1}\left( \underset{i\ge 2}{\Pi } \overline{\mathbf{T}}_{h_i}(0)\right) -\overline{\mathbf{T}}_{h_1}\left( \underset{i\ge 2}{\Pi } \mathbf{T}_{h_i}(0)\right) \right\| \\&\quad +\left\| \overline{\mathbf{T}}_{h_1}\left( \underset{i\ge 2}{\Pi }\mathbf{T}_{h_i}(0)\right) -\mathbf{T}_{h_1}\left( \underset{i\ge 2}{\Pi } \mathbf{T}_{h_i}(0)\right) \right\| \\&\le \left\| \underset{i\ge 2}{\Pi } \overline{\mathbf{T}}_{h_i}(0)-\underset{i\ge 2}{\Pi } \mathbf{T}_{h_i}(0)\right\| +C_0h_1^2\left( 1+C_1\sum _{i=2}^n h_i\right) . \end{aligned}$$

Without loss of generality \(C_1\ge 1\) hence by summation,

$$\begin{aligned} \left\| \overline{V}_H(t)-\Pi _i \mathbf{T}_{h_i}(0)\right\|\le & {} C_0C_1(1+t)\sum _{i=1}^n h_i^2 \\\le & {} C\, h(1+t)\sum _{i=1}^n h_i\\= & {} C\,h\,t(1+t) \end{aligned}$$

for \(C=C_0C_1\). Then Proposition 6.2 yields the result. \(\square \)

Remark 8.1

For a given h, the right hand term is quadratic in t, hence we do not link the asymptotic behavior of the normalized quantity \(\overline{v}^h_n=\frac{\overline{V}^h_n}{nh}\) and of \(\widehat{v}_t=\frac{\widehat{V}_t}{t}\). However if n is a function of h converging slowly enough to infinity, the previous proposition can be used. For example for \(n(h)=\frac{1}{h\sqrt{h}}\) (so that \(t(h)=\frac{1}{\sqrt{h}}\)), one has

$$\begin{aligned} \left\| \overline{v}^h_{n(h)}-\widehat{v}_{t(h)}\right\| =O(\sqrt{h}). \end{aligned}$$

8.2 Discounted case

We consider uniform stage duration h. The normalized value \(\overline{w}^h_k\) of the discretization with mesh h of the \(\lambda \)-discounted continuous game satisfies the fixed point equation

$$\begin{aligned} \overline{w}^h_\lambda (\omega )=\mathtt{val}_{X\times Y}\left[ \int _0^h \lambda e^{-\lambda t} g(\omega _t,x,y) + e^{-\lambda h} {\mathsf {P}}^h (x,y)[\omega ] \circ \overline{w}_{\lambda }^h \right] . \end{aligned}$$

Proposition 8.2

For a given \(\lambda \), \(\overline{w}^h_\lambda \) tends to \(\widehat{w}_\lambda \) as h goes to 0.

Proof

The equations for \(\overline{w}^h_\lambda \) and \(w^h_\lambda \), as well as the non expansiveness of the value operator, give:

$$\begin{aligned} \left\| \overline{w}^h_\lambda -w^h_\lambda \right\|\le & {} \lambda h\left\| g(\cdot )-\frac{1}{h}\int _0^h e^{-\lambda t}g_t(\cdot ) \right\| +e^{-\lambda h}\left\| \overline{w}^h_\lambda -w^h_\lambda \right\| + \Vert w^h_\lambda \Vert \Vert P_h-{\mathsf {P}}^h \Vert \\&+(1-\lambda h-e^{-\lambda h})\Vert w^h_\lambda \Vert \\\le & {} \lambda O(h^2)+e^{-\lambda h}\left\| \overline{w}^h_\lambda -w^h_\lambda \right\| + O(h^2)+ \lambda ^2 O(h^2) \end{aligned}$$

hence for a fixed \(\lambda \), \((1-e^{-\lambda h})\Vert \overline{w}^h_\lambda -w^h_\lambda \Vert =O(h^2)\) and the result follows from Corollary 7.1. \(\square \)

Similar properties were obtained in Neyman (2013).

For an alternative approach to the limit behavior of the discretization of the continuous model, relying on viscosity solution tools and extending to various information structures on the state, see Sorin (2015).

9 Extensions and concluding comments

9.1 Stochastic games: no signals on the state

Consider a finite stochastic game where the players know only the initial distribution \(m \in \Delta (\Omega )\) and the actions at each stage.

The basic equation for the exact game with duration h is then

$$\begin{aligned} \hat{\mathbf{T}}^h f (m) = \mathtt{val}\left[ h g(m; x,y) + \sum _{ij} x^i y^j f \left( m*P_h(i, j) \right) \right] . \end{aligned}$$

with \( [m*P_h(i, j)]( \omega ) = \sum _z m (z) P_h(i, j)[z] ( \omega )\) being the image of the probability m by the kernel \(P_h(i, j)\).

The equation

$$\begin{aligned} \hat{\mathbf{T}}^h = h \hat{\mathbf{T}} + (1-h) Id \end{aligned}$$

does not hold anymore and \( \mathbf{T}- Id \) has to be replaced by \( \lim _{ h \rightarrow 0} \frac{ \hat{\mathbf{T}}^h - Id}{h}\) in (4). The study of such games with varying duration thus seems more involved.

9.2 Link with games with uncertain duration

Notice that \(\mathbf{T}_h = (1-h) Id + h \mathbf{T}\) is a particular case of an operator of the form \( \sum _i \alpha _i \mathbf{T}^i, \alpha _i \ge 0, \sum \alpha _i = 1,\) which corresponds to some generalized iterate Neyman (2003), Neyman and Sorin (2010) of \(\mathbf{T}\). Hence all the values computed in Sects. 4.3, 5.3 and so on, can also be seen as the value of some games with uncertain duration. See Vigeral (2010) for specific remarks in the particular case of \(V_n^h\).

9.3 Oscillations

Several examples of stochastic games (either with a finite set of states and compact sets of actions Vigeral (2013), or compact set of states and finite set of actions Ziliotto (2013)) were recently constructed for which the values \(v_n\) and \(v_\lambda \) do not converge. Hence the values of the corresponding exact games with vanishing duration (and thus their limit as continuous time games) do not converge when t goes to infinity or \(\lambda \) to 0.

9.4 Comparison to the literature

The approach here is different from the one of Neyman (2013): the proofs are based on properties of operators and not on strategies. For example Neyman (2013) shows that playing optimally in (19) will imply Corollary 7.1.

By comparison our tools consider only the values and apply to any non expansive map \(\mathbf{T}\).

9.5 Main results

The main results can be summarized in two parts:

  • for a given finite length (or discounted evaluation) the value of the game with vanishing stage duration converges thus defining a limit value for the associated continuous time game. Moreover the limit is described explicitly.

  • as the length goes to \(\infty \) or the discount factor goes to 0, the impact of the stage duration goes to 0 and the asymptotic behavior of the normalized value function is independent of the discretization.