1 Notations

  • The notation “\(X:=Y\)” means “X is defined by the expression Y”.

  • If \(A \subset B\), the complement of A in B is denoted by \(B \setminus A\).

  • The set of non-negative integers is denoted by \(\mathbb {N}\), and \(\mathbb {N}^*:=\mathbb {N} \setminus \{0\}\).

  • The set of real numbers is denoted by \(\mathbb {R}\), and the set of strictly positive real numbers is denoted by \(\mathbb {R}^*_+\).

  • If \((C,{\mathcal {C}})\) is a measurable space, we denote by \({\varDelta }(C)\) the set of probability measures on C. We call \(\delta _c\) the Dirac measure at \(c \in C\). If \(C_0 \subset C\) is a finite set and \((\alpha _c)_{c \in C_0} \in {\varDelta }(C_0)\), then \(\sum _{c \in C_0} \alpha _c \delta _c\) is denoted by \(\sum _{c \in C_0} \alpha _c \cdot c\).

2 Introduction

Zero-sum stochastic games were introduced by Shapley (1953). In this model, two players repeatedly play a zero-sum game, which depends on the state of nature. At each stage, a new state of nature is drawn from a distribution depending on the actions of the players and on the state of the previous stage. The new state of nature is announced to both players, along with the actions played at the previous stage. Unless mentioned explicitly, we consider finite stochastic games: the state space and the action sets are assumed to be finite.

There are several ways to evaluate the payoff in a stochastic game. For \(n \in \mathbb {N}^*\), the payoff in the \(n\)-stage game is the Cesaro mean \(\frac{1}{n} \sum _{m=1}^n g_m\), where \(g_m\) is the payoff at stage \(m \ge 1\). For \(\lambda \in (0,1]\), the payoff in the \(\lambda \)-discounted game is the Abel mean \(\sum _{m \ge 1} \lambda (1-\lambda )^{m-1} g_m\).
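The two evaluations can be illustrated numerically as follows (a minimal sketch, not taken from the paper; the payoff stream \(g\) is arbitrary and the Abel mean is truncated to finitely many stages).

```python
# Minimal numerical sketch (not from the paper): the n-stage (Cesaro) and
# lambda-discounted (Abel) evaluations of an arbitrary stream of stage payoffs.

def n_stage_payoff(g, n):
    """Cesaro mean of the first n stage payoffs: (1/n) * sum_{m=1}^{n} g_m."""
    return sum(g[:n]) / n

def discounted_payoff(g, lam):
    """Abel mean sum_{m>=1} lam*(1-lam)^(m-1) * g_m, truncated to len(g) stages."""
    return sum(lam * (1 - lam) ** m * gm for m, gm in enumerate(g))

g = [1, 0, 1, 0, 1, 0]            # hypothetical stage payoffs g_1, g_2, ...
print(n_stage_payoff(g, 4))       # 0.5
print(discounted_payoff(g, 0.1))  # about 0.25
```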

Two main approaches are used to understand the properties of stochastic games with long duration:

  • The asymptotic approach aims at determining if the value \(v_n\) of the n-stage game and the value \(v_{\lambda }\) of the \(\lambda \)-discounted game converge respectively when n goes to infinity and \(\lambda \) goes to 0. Bewley and Kohlberg (1976) have proved that in finite stochastic games, \((v_n)\) and \((v_{\lambda })\) converge to the same limit. This result cannot be extended to stochastic games with compact action sets (see Vigeral 2013). Neither can it be extended to the case of public imperfect observation of the state of nature (see Ziliotto 2013).

  • The uniform approach analyzes the existence of strategies that are approximately optimal in any n-stage game and \(\lambda \)-discounted game, provided that n is big enough and \(\lambda \) is small enough. When this is the case, the stochastic game is said to have a uniform value. Mertens and Neyman (1981) have shown that finite stochastic games have a uniform value. Note that the existence of the uniform value implies the existence of the asymptotic value.

In this paper we investigate these two approaches, when payoffs are not restricted to be Cesaro means or Abel means of stage payoffs. As in Cardaliaguet et al. (2012), if \(\pi :=(\pi _m)_{m \ge 1} \in {\varDelta }(\mathbb {N}^*)\) is a sequence of weights, the payoff in the \(\pi \)-weighted game is defined as the weighted sum \(\sum _{m \ge 1} \pi _m g_m\). Intuitively, a \(\pi \)-weighted game with long duration corresponds to the case where the weights \((\pi _m)_{m \ge 1}\) are close to 0 (but still sum to one); there are, however, many different ways to define the convergence of \(\pi \) to 0. Once a criterion of convergence is defined, the asymptotic approach consists in determining whether or not the value \(v_{\pi }\) of the \(\pi \)-weighted game converges when \(\pi \) goes to 0. When this is the case, the game is said to have a general asymptotic value (with respect to the chosen criterion). Likewise, the uniform approach deals with the existence of strategies that are approximately optimal in any \(\pi \)-weighted game, with \(\pi \) small enough. When this is the case, the game is said to have a general uniform value (with respect to the chosen criterion). Two main results can be found in the literature:

  • If \((\pi _m)_{m \ge 1} \in {\varDelta }(\mathbb {N}^*)\) is decreasing with respect to m, and if the criterion of convergence is: \(\pi _1\) goes to 0, then finite stochastic games have a general uniform value (and thus a general asymptotic value). This result stems from the existence of the uniform value, established by Mertens and Neyman (1981), and from Theorem 1 and Remark (4) in Neyman and Sorin (2010).

  • Renault and Venel (2012) examine payoff weights that are not necessarily decreasing with respect to time, and consider the impatience \(I_1(\pi ):=\sum _{m \ge 1} |\pi _{m+1}-\pi _m|\) of \(\pi \) (see Sorin 2002, section 5.7). They investigate the limit behavior of finite stochastic games with one player (Markov Decision Processes) and of finite POMDPs (Partially Observable Markov Decision Processes), when \(I_1(\pi )\) goes to 0. In this framework, they show the existence of the general uniform value. Note that if \(\pi \) is decreasing and \(\pi _1\) goes to 0, then \(I_1(\pi )=\pi _1\) goes to 0. Thus, for MDPs, this second result is more general than the first one.

In this paper, we also define a criterion on the convergence of \(\pi \) to 0 under which the general asymptotic value exists in stochastic games. For the asymptotic approach, our theorem generalizes the two aforementioned results. In addition, we provide an example which shows first that our result is tight, and second that the result of Renault and Venel (2012) cannot be extended to the Two-Player Case. We also show that for absorbing games with compact action sets and separately continuous transition and payoff functions, a sufficient condition under which \((v_{\pi })\) converges is that \(\sup _{m \ge 1} \pi _m\) goes to 0 (when the action sets are finite, a sketch of proof of this last result is given in Cardaliaguet et al. 2012). As for the uniform approach, we provide an example of an absorbing game which shows that there is no natural way to relax the decreasing assumption on the weights.

The paper is organized as follows. Section 3 presents the model of stochastic games and some basic concepts. Section 4 deals with the asymptotic approach, and Sect. 5 presents the uniform approach.

3 Generalities

3.1 Model of stochastic game

A stochastic game \({\varGamma }\) is defined by:

  • A state space K,

  • An action set I (resp. J) for Player 1 (resp. 2),

  • A payoff function \(g:K \times I \times J \rightarrow [0,1]\),

  • A transition function \(q:K \times I \times J \rightarrow {\varDelta }(K)\).

Except in Sect. 4.2, we assume that K, I and J are (nonempty) finite sets.

The initial state is \(k_1 \in K\), and the stochastic game \({\varGamma }^{k_1}\) which starts in \(k_1\) proceeds as follows. At each stage \(m \ge 1\), both players choose simultaneously and independently an action, \(i_m \in I\) (resp. \(j_m \in J\)) for Player 1 (resp. 2). The payoff at stage m is \(g_m:=g(k_m,i_m,j_m)\). The state \(k_{m+1}\) of stage \(m+1\) is drawn from the probability distribution \(q(k_m,i_m,j_m)\). Then \((k_{m+1},i_m,j_m)\) is publicly announced to both players.

The set of all possible histories before stage m is \(H_m:=(K \times I \times J)^{m-1} \times K\). A behavioral strategy for Player 1 (resp. 2) is a mapping \(\sigma :\cup _{m \ge 1} H_m \rightarrow {\varDelta }(I)\) (resp. \(\tau :\cup _{m \ge 1} H_m \rightarrow {\varDelta }(J)\)). The set of all behavioral strategies for Player 1 (resp. 2) is denoted by \({\varSigma }\) (resp. \({\mathcal {T}}\)).

A pure strategy for Player 1 (resp. 2) is a mapping \(\sigma :\cup _{m \ge 1} H_m \rightarrow I\) (resp. \(\tau :\cup _{m \ge 1} H_m \rightarrow J\)).

A Markov strategy is a strategy that depends only on the current stage and state. A Markov strategy for Player 1 (resp. 2) can be identified with a mapping from \(\mathbb {N}^* \times K\) to \({\varDelta }(I)\) (resp. \({\varDelta }(J)\)).

A stationary strategy is a strategy that depends only on the current state. A stationary strategy for Player 1 (resp. 2) can be identified with a mapping from K to \({\varDelta }(I)\) (resp. \({\varDelta }(J)\)).

The set of infinite plays of the game is \(H_\infty :=(K \times I \times J)^{\mathbb {N}^*}\), and is equipped with the \(\sigma \)-algebra generated by cylinders. A triple \((k_1,\sigma ,\tau ) \in K \times {\varSigma } \times {\mathcal {T}}\) induces a unique probability measure on \(H_\infty \), denoted by \({\mathbb {P}}^{k_1}_{\sigma ,\tau }\) (see Sorin 2002, Appendix D). Let \(\pi =(\pi _m)_{m \ge 1} \in {\varDelta }(\mathbb {N}^*)\) be a sequence of weights. The \(\pi \)-weighted game \({\varGamma }^{k_1}_\pi \) is the game defined by its normal form \(({\varSigma },{\mathcal {T}},\gamma _{\pi }^{k_1})\), where

$$\begin{aligned} \gamma ^{k_1}_{\pi }(\sigma ,\tau ):={\mathbb {E}}^{k_1}_{\sigma ,\tau }\left( \sum _{m \ge 1} \pi _m g_m \right) . \end{aligned}$$

By the minmax theorem (see Sorin 2002, Appendix A.5), the game \({\varGamma }^{k_1}_{\pi }\) has a value, denoted by \(v_{\pi }(k_1)\):

$$\begin{aligned} v_{\pi }(k_1)=\max _{\sigma \in {\varSigma }} \min _{\tau \in {\mathcal {T}}} \gamma ^{k_1}_{\pi }(\sigma ,\tau ) =\min _{\tau \in {\mathcal {T}}} \max _{\sigma \in {\varSigma }} \gamma ^{k_1}_{\pi }(\sigma ,\tau ). \end{aligned}$$

When for some \(n \in \mathbb {N}^*\), \(\pi _m=n^{-1} 1_{m \le n}\) for every \(m \in \mathbb {N}^*\), the game \({\varGamma }_n:={\varGamma }_{\pi }\) is called the n-stage game, and its payoff function is denoted by \(\gamma _n\). When for some \(\lambda \in (0,1]\), \(\pi _m=\lambda (1-\lambda )^{m-1}\) for every \(m \in \mathbb {N}^*\), the game \({\varGamma }_{\lambda }:={\varGamma }_{\pi }\) is called the \(\lambda \)-discounted game, and its payoff function is denoted by \(\gamma _{\lambda }\).

3.2 Two results in the literature

Let us fix a stochastic game \({\varGamma }\). Two standard definitions are recalled below:

Definition 1

The stochastic game \({\varGamma }\) has an asymptotic value if the sequences \((v_{\lambda })\) and \((v_n)\) converge to the same limit, when respectively \(\lambda \) goes to 0 and n goes to infinity.

Definition 2

The stochastic game \({\varGamma }\) has a uniform value \(v_{\infty }:K \rightarrow [0,1]\) if for all \(k_1 \in K\), for all \(\epsilon >0\), there exists \((\sigma ^*,\tau ^*) \in {\varSigma } \times {\mathcal {T}}\) and \({\bar{n}} \in \mathbb {N}^*\), such that for all \(n \ge {\bar{n}}\) and \((\sigma ,\tau ) \in {\varSigma } \times {\mathcal {T}}\), we have

$$\begin{aligned} \gamma ^{k_1}_n(\sigma ^*,\tau ) \ge v_{\infty }(k_1)-\epsilon \quad \text {and} \quad \gamma ^{k_1}_n(\sigma ,\tau ^*) \le v_{\infty }(k_1)+\epsilon . \end{aligned}$$

Bewley and Kohlberg (1976) have proved that \({\varGamma }\) has an asymptotic value, and Mertens and Neyman (1981) have generalized this result in the following way:

Theorem 1

The stochastic game \({\varGamma }\) has a uniform value \(v_{\infty }\). In particular, \({\varGamma }\) has an asymptotic value, and \((v_{\lambda })\) and \((v_n)\) converge to \(v_{\infty }\), when respectively \(\lambda \) goes to 0 and n goes to infinity.

This theorem shows that both players have strategies that are approximately optimal in any long game \({\varGamma }^{k_1}_n\). The existence of a stronger notion of uniform value is then straightforward (see Theorem 1 and Remark (4) in Neyman and Sorin 2010): players have strategies that are approximately optimal in any game \({\varGamma }^{k_1}_{\pi }\) with \(\pi =(\pi _m)_{m \ge 1}\) decreasing with respect to m and \(\pi _1\) sufficiently small. For completeness, we give a sketch of the proof of this corollary.

Corollary 1

For all \(k_1 \in K\), for all \(\epsilon >0\), there exists \((\sigma ^*,\tau ^*) \in {\varSigma } \times {\mathcal {T}}\) and \(\alpha >0\) such that for all \(\pi =(\pi _m)_{m \ge 1} \in {\varDelta }(\mathbb {N}^*)\) decreasing with respect to m, and that satisfies \( I_{\infty }(\pi ):=\sup _{m \ge 1} \pi _m=\pi _1 \le \alpha \), we have for all \((\sigma ,\tau ) \in {\varSigma } \times {\mathcal {T}}\)

$$\begin{aligned} \gamma ^{k_1}_{\pi }(\sigma ^*,\tau ) \ge v_{\infty }(k_1)-\epsilon \quad \text {and} \quad \gamma ^{k_1}_{\pi }(\sigma ,\tau ^*) \le v_{\infty }(k_1)+\epsilon . \end{aligned}$$

In particular, \((v_{\pi })\) converges to \(v_{\infty }\) when \(\pi \) is decreasing and \(I_{\infty }(\pi )\) goes to 0.

Proof

(Sketch) If \(\pi \in {\varDelta }(\mathbb {N}^*)\) is decreasing, then the \(\pi \)-weighted payoff is a convex combination of Cesaro-mean payoffs:

$$\begin{aligned} \sum _{m \ge 1} \pi _m g_m=\sum _{m \ge 1} m(\pi _m-\pi _{m+1}) \frac{1}{m} \sum _{l=1}^m g_l. \end{aligned}$$

The proof of the corollary follows from this equality. \(\square \)
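The identity can be checked numerically on any decreasing weight sequence with finite support (a sketch, not from the paper; the numbers are arbitrary).

```python
# Numerical check (not from the paper) of the identity used in the proof sketch:
# sum_m pi_m g_m = sum_m m*(pi_m - pi_{m+1}) * (1/m) * sum_{l<=m} g_l,
# for a decreasing pi with finite support (pi_m = 0 beyond the support).

pi = [0.4, 0.3, 0.2, 0.1]            # decreasing weights summing to 1
g = [0.7, 0.1, 0.9, 0.3]             # arbitrary stage payoffs on the support of pi

lhs = sum(p * gm for p, gm in zip(pi, g))

pi_ext = pi + [0.0]                  # pi_{m+1} = 0 beyond the support
cesaro = [sum(g[:m]) / m for m in range(1, len(pi) + 1)]
rhs = sum((m + 1) * (pi_ext[m] - pi_ext[m + 1]) * cesaro[m] for m in range(len(pi)))

print(abs(lhs - rhs) < 1e-12)        # True
```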

In the One-Player Case, by a particular case of Theorem 3.19 in Renault and Venel (2012), this result can be extended to a wider class of weights, in the following way:

Theorem 2

Assume that \({\varGamma }\) is a Markov decision process, that is, the functions q and g do not depend on the action of Player 2. Then \({\varGamma }\) has a general uniform value: for all \(k_1 \in K\), for all \(\epsilon >0\), there exists \(\sigma ^* \in {\varSigma }\) and \(\alpha >0\), such that for all \(\pi \in {\varDelta }(\mathbb {N}^*)\) that satisfies \(I_1(\pi ):=\sum _{m \ge 1}|\pi _{m+1}-\pi _m| \le \alpha \), we have

$$\begin{aligned} v_{\infty }(k_1)-\epsilon \le \gamma ^{k_1}_{\pi }(\sigma ^*) \le v_{\infty }(k_1)+\epsilon . \end{aligned}$$

In particular, \((v_{\pi })\) converges to \(v_{\infty }\) when \(I_1(\pi )\) goes to 0.

In the next section, we study the asymptotic approach and investigate whether we can also relax the decreasing assumption in Corollary 1 in the Two-Player Case.

4 Asymptotic approach

4.1 A criterion for the convergence of \((v_{\pi })\)

The first obvious point is that if one removes the decreasing assumption in Corollary 1 and only assumes that \(I_{\infty }(\pi ):=\sup _{m \ge 1} \pi _m\) goes to zero, \((v_{\pi })\) does not necessarily converge. Indeed, consider the Markov chain which oscillates deterministically between two states, one with payoff 1, the other with payoff 0. Consider two sequences of weights, one which puts its weight on even stages and one which puts its weight on odd stages. The difference between these two payoff evaluations is always equal to 1. Thus the condition \(I_{\infty }(\pi ) \rightarrow 0\) is not a sufficient condition to obtain the convergence of \((v_{\pi })\). Let us now provide a more restrictive criterion under which \((v_{\pi })\) converges.
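The obstruction can be made explicit with a small numerical sketch (not from the paper; by convention we start the chain in the payoff-1 state).

```python
# Sketch (not from the paper): deterministic two-state chain with alternating payoffs
# 1, 0, 1, 0, ...  Weights spread over the first n odd stages evaluate to 1, weights
# spread over the first n even stages evaluate to 0, while the sup of the weights is 1/n.

def payoff(m):
    """Stage payoff of the oscillating chain, started in the payoff-1 state."""
    return 1 if m % 2 == 1 else 0

def evaluation(weights):
    """Weighted payoff sum_m pi_m * g_m, weights given as a dict {stage: weight}."""
    return sum(w * payoff(m) for m, w in weights.items())

n = 1000
pi_odd = {2 * k + 1: 1 / n for k in range(n)}   # weight 1/n on the first n odd stages
pi_even = {2 * k + 2: 1 / n for k in range(n)}  # weight 1/n on the first n even stages

print(evaluation(pi_odd), evaluation(pi_even))  # about 1.0 and 0.0, with sup(pi) = 1/n
```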

Definition 3

Let \(\pi \in {\varDelta }(\mathbb {N}^*)\) and \(p \in (0,+\infty ]\). The \(\textit{p-impatience}\) of \(\pi \) is the quantity \(I_p(\pi ) \in (0,+\infty ]\) defined by

$$\begin{aligned} I_p(\pi ):= \left\{ \begin{array}{l@{\quad }l} \sum _{m \ge 1} \left| {(\pi _{m+1})^p}-{(\pi _m)^p} \right| &{} \text{ if } \ \ p<\infty , \\ \sup _{m \ge 1} \pi _m &{} \text{ if } \ \ p=\infty . \end{array} \right. \end{aligned}$$

When \(I_p(\pi )\) is small, it means that players are very patient. When in addition \(p<\infty \), it means that the variations of \(\pi \) with respect to m are small.
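Definition 3 can be transcribed directly for weight sequences with finite support (a sketch; the function name is ours).

```python
# Sketch (our naming, not from the paper): the p-impatience of Definition 3 for a
# finitely supported weight sequence pi = (pi_1, pi_2, ...); p = float('inf') gives
# I_infinity(pi) = sup_m pi_m.
import math

def impatience(pi, p):
    """I_p(pi): sum_m |pi_{m+1}^p - pi_m^p| for finite p, sup_m pi_m for p = inf."""
    if math.isinf(p):
        return max(pi)
    pi_ext = list(pi) + [0.0]          # pi_m = 0 beyond the support
    return sum(abs(pi_ext[m + 1] ** p - pi_ext[m] ** p) for m in range(len(pi)))

pi = [0.4, 0.3, 0.2, 0.1]              # a decreasing example
print(impatience(pi, 1.0))             # 0.4 = pi_1 (cf. Remark 1 below)
print(impatience(pi, float('inf')))    # 0.4
```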

Proposition 1

Let \(\pi \in {\varDelta }(\mathbb {N}^*)\) and \(p,p' \in \mathbb {R}^*_{+}\), such that \(p \le p'\). Then

  • \(I_{p'}(\pi ) \le (p'/p) I_p(\pi )\),

  • \(I_{\infty }(\pi ) \le (I_p(\pi ))^{1/p}\).

Proof

Let \(m \in \mathbb {N}^*\) and \(q:=p'/p \ge 1\). Since \((\pi _{m+1})^{p}\) and \((\pi _m)^{p}\) belong to [0, 1], the Mean Value Theorem implies that

$$\begin{aligned} \left| (\pi _{m+1})^{p'}-(\pi _m)^{p'} \right| =\left| \left[ (\pi _{m+1})^{p}\right] ^q-\left[ (\pi _m)^{p}\right] ^q \right| \le q \left| (\pi _{m+1})^{p}-(\pi _m)^{p} \right| , \end{aligned}$$

and it yields: \(I_{p'}(\pi ) \le q I_p(\pi )\). As for the second inequality, we have

$$\begin{aligned} (\pi _m)^p=\sum _{m' \ge m} \left[ (\pi _{m'})^p-(\pi _{m'+1})^p \right] \le I_p(\pi ), \end{aligned}$$

and it yields: \(I_{\infty }(\pi ) \le (I_p(\pi ))^{1/p}\). \(\square \)
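Both inequalities can be sanity-checked numerically on a non-monotone weight sequence, reusing the impatience sketch given after Definition 3 (again, the numbers are arbitrary).

```python
# Sketch (not from the paper): numerical check of Proposition 1 on an arbitrary,
# non-monotone, finitely supported weight sequence; `impatience` is the function
# sketched after Definition 3.
pi = [0.1, 0.3, 0.05, 0.25, 0.2, 0.1]          # sums to 1, not monotone
p, p_prime = 0.5, 2.0                          # p <= p'

assert impatience(pi, p_prime) <= (p_prime / p) * impatience(pi, p)
assert impatience(pi, float('inf')) <= impatience(pi, p) ** (1 / p)
print("both inequalities of Proposition 1 hold on this example")
```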

Remark 1

When \((\pi _m)_{m \ge 1}\) is decreasing, for all \(p \in \mathbb {R}_+^*\), we have \(I_p(\pi )=(\pi _1)^p\). Consequently, given \(p,p' \in \mathbb {R}_+^*\) such that \(p < p'\), there does not exist a real number \(C(p,p')>0\) such that for all \(\pi \in {\varDelta }(\mathbb {N}^*)\), \(I_{p'}(\pi ) \ge C(p,p') I_p(\pi )\).

Let us fix a stochastic game \({\varGamma }\), and let \(v_{\infty }=\lim _{\lambda \rightarrow 0} v_{\lambda }=\lim _{n \rightarrow +\infty } v_n\) be its uniform value. For f a real-valued function, denote by \(\Vert f\Vert _{\infty }\) the supremum of \(|f|\).

Definition 4

Let \(p \in (0,+\infty ]\). The stochastic game \({\varGamma }\) has a p-asymptotic value if for all \(\epsilon >0\), there exists \(\alpha >0\) such that for all \(\pi \in {\varDelta }(\mathbb {N}^*)\) verifying \(I_p(\pi ) \le \alpha \), we have \(\Vert v_{\pi }-v_{\infty }\Vert _{\infty } \le \epsilon \).

Remark 2

 

  • If for some \(p' \in (0,+\infty ]\) the game \({\varGamma }\) has a \(p'\)-asymptotic value, then it has a p-asymptotic value for all \(p \le p'\). This follows directly from Proposition 1.

  • By Theorem 2, any Markov decision process has a 1-asymptotic value.

  • Finite absorbing games have an \(\infty \)-asymptotic value (see Cardaliaguet et al. 2012).

  • The Markov chain described at the beginning of this subsection has no p-asymptotic value for any \(p>1\).

Recall that \((v_{\lambda })\) can be expanded in Puiseux series (see Bewley and Kohlberg 1976): there exist \(\beta >0\), \(M \in \mathbb {N}^*\) and a family \((r_m)_{m \ge 0}\) in \(\mathbb {R}^{K}\) such that for all \(k \in K\) and \(\lambda \in [0,\beta )\)

$$\begin{aligned} v_{\lambda }(k)=\sum _{m \ge 0} r_m(k) \lambda ^{\frac{m}{M}} , \end{aligned}$$
(1)

with the convention \(v_0:=v_{\infty }\).

Definition 5

Let \(m_0=\inf \{m \ge 1 \ | \ r_m \ne 0 \}\). The quantity \(s:=m_0/M \in (0,+\infty ]\) is called the order of \({\varGamma }\).
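For instance (a purely illustrative expansion, not a claim about any particular game), if the discounted values satisfy

$$\begin{aligned} v_{\lambda }(k)=\frac{1}{2}-\frac{1}{2}\lambda ^{1/2}+o\left( \lambda ^{1/2}\right) \quad \text{ for } \text{ all } \ k \in K, \end{aligned}$$

then one can take \(M=2\); the first nonzero coefficient beyond \(r_0\) is \(r_1\), so \(m_0=1\) and the order is \(s=1/2\). This is also the order of the game constructed in Sect. 4.3.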

Note that if \(s<+\infty \), there exists \(C>0\) such that for all \((\lambda ,\lambda ') \in [0,\beta )^2\), we have

$$\begin{aligned} \left\| v_{\lambda }-v_{\lambda '}\right\| _\infty \le C \left| \lambda ^{s}-\lambda '^{s} \right| . \end{aligned}$$
(2)

If A is a finite set, the cardinality of A is denoted by \({\text {Card}}A\). By Remark 3 in Oliu-Barton (2014), we have

$$\begin{aligned} s \ge ({\text {Card}}K {\text {Card}}I)^{-\sqrt{{\text {Card}}K {\text {Card}}I}}. \end{aligned}$$

Now we can state our main theorem.

Theorem 3

The stochastic game \({\varGamma }\) has an s-asymptotic value. In particular, if \(p \in \mathbb {R}_{+}^*\) is smaller than or equal to \(({\text {Card}}K {\text {Card}}I)^{-\sqrt{{\text {Card}}K {\text {Card}}I}}\), then \({\varGamma }\) has a p-asymptotic value.

Proof

Neyman has shown that in a stochastic game the convergence of \((v_n)\) can be deduced from the Shapley equation and the fact that \((v_{\lambda })\) is absolutely continuous with respect to \(\lambda \) (see Sorin 2002, Theorem C.8, p.177). We use similar tools.

Let \(\pi \in {\varDelta }(\mathbb {N}^*)\) and \(r \in \mathbb {N}\) such that there exists \(m \ge r+2\), \(\pi _m \ne 0\). A sequence of weights \(\pi ^r \in {\varDelta }(\mathbb {N}^*)\) is defined in the following way: for \(m \in \mathbb {N}^*\),

$$\begin{aligned} \pi ^r_m:= \left\{ \begin{array}{l@{\quad }l} \frac{\pi _{m+r}}{\sum _{m' \ge r+1} \pi _{m'}} &{} \text{ if } \ \ \pi _{m+r} \ne 0, \\ 0 &{} \text{ if } \ \ \pi _{m+r}=0. \end{array} \right. \end{aligned}$$
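In words, \(\pi ^r\) is the tail of \(\pi \) from stage \(r+1\) on, shifted back to start at stage 1 and renormalized; a minimal sketch (our naming, finite support):

```python
# Sketch (our naming, not from the paper): the shifted and renormalized weights pi^r,
# for a finitely supported pi (the condition pi_m != 0 for some m >= r+2 ensures that
# the normalization below is well defined).
def shifted_weights(pi, r):
    """pi^r_m = pi_{m+r} / sum_{m' >= r+1} pi_{m'} (and 0 where pi_{m+r} = 0)."""
    tail = pi[r:]                          # pi_{r+1}, pi_{r+2}, ...
    total = sum(tail)
    return [w / total if w != 0 else 0.0 for w in tail]

pi = [0.0, 0.5, 0.25, 0.25]
print(shifted_weights(pi, 1))              # [0.5, 0.25, 0.25]; its first entry is the
                                           # quantity lambda_1 introduced just below
```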

Let \(\lambda _r:=\pi ^r_1\). Let \(k \in K\). Shapley equations yield (see Cardaliaguet et al. 2012):

$$\begin{aligned} v_{\pi ^r}(k)= & {} \max _{x \in {\varDelta }(I)} \min _{y \in {\varDelta }(J)} \left\{ \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\pi ^{r+1}}) \right\} \end{aligned}$$
(3)
$$\begin{aligned}= & {} \min _{y \in {\varDelta }(J)} \max _{x \in {\varDelta }(I)} \left\{ \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\pi ^{r+1}}) \right\} \end{aligned}$$
(4)

and

$$\begin{aligned} v_{\lambda _r}(k)= & {} \max _{x \in {\varDelta }(I)} \min _{y \in {\varDelta }(J)} \left\{ \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\lambda _{r}}) \right\} \end{aligned}$$
(5)
$$\begin{aligned}= & {} \min _{y \in {\varDelta }(J)} \max _{x \in {\varDelta }(I)} \left\{ \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\lambda _{r}}) \right\} , \end{aligned}$$
(6)

where

$$\begin{aligned} \mathbb {E}^k_{x,y}(f):=\sum _{(k',i,j) \in K \times I \times J} x(i) y(j)q(k,i,j)(k') f(k') \end{aligned}$$
(7)

and

$$\begin{aligned} g(k,x,y):=\sum _{(i,j) \in I \times J} x(i) y(j) g(k,i,j). \end{aligned}$$
(8)

Note that these equations also hold when \(\lambda _r=0\), with the convention \(v_0:=v_{\infty }\). Let \(x \in {\varDelta }(I)\) be optimal in (3) and \(y \in {\varDelta }(J)\) be optimal in (6). We have

$$\begin{aligned} v_{\pi ^r}(k) \le \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\pi ^{r+1}}) \end{aligned}$$
(9)

and

$$\begin{aligned} v_{\lambda _r}(k) \ge \lambda _r g(k,x,y)+(1-\lambda _r)\mathbb {E}^k_{x,y}(v_{\lambda _{r}}). \end{aligned}$$
(10)

Combining the two inequalities yields:

$$\begin{aligned} v_{\pi ^r}(k)-v_{\lambda _r}(k) \le (1-\lambda _r)\left\| v_{\pi ^{r+1}}-v_{\lambda _r}\right\| _\infty . \end{aligned}$$

Symmetrically, if \(x' \in {\varDelta }(I)\) is optimal in (4) and \(y' \in {\varDelta }(J)\) is optimal in (5), then

$$\begin{aligned} v_{\lambda _r}(k)-v_{\pi ^r}(k) \le (1-\lambda _r)\left\| v_{\pi ^{r+1}}-v_{\lambda _r}\right\| _\infty , \end{aligned}$$

and thus

$$\begin{aligned} \left\| v_{\pi ^r}-v_{\lambda _r}\right\| _\infty \le (1-\lambda _r)\left\| v_{\pi ^{r+1}}-v_{\lambda _r}\right\| _\infty . \end{aligned}$$
(11)

Let \({\varPi }_r:=\prod _{r'=0}^{r-1} (1-\lambda _{r'})= \sum _{m \ge r+1} \pi _m\). Note that \(\lim _{r \rightarrow +\infty } {\varPi }_r=0\). The last inequality yields:

$$\begin{aligned} {\varPi }_r \left\| v_{\pi ^r}-v_{\lambda _r}\right\| _\infty \le {\varPi }_{r+1} \left\| v_{\pi ^{r+1}}-v_{\lambda _{r+1}}\right\| _\infty +{\varPi }_{r+1} \left\| v_{\lambda _{r+1}}-v_{\lambda _r}\right\| _\infty . \end{aligned}$$

Let \(N \in \mathbb {N}^*\) such that there exists \(m \ge N+1\), \(\pi _m \ne 0\). Summing this inequality over \(r \in \{0,1,\ldots ,N-1 \}\) yields:

$$\begin{aligned} \left\| v_{\pi }-v_{\lambda _0}\right\| _\infty \le {\varPi }_{N} \left\| v_{\pi ^{N}}-v_{\lambda _{N}}\right\| _\infty + \sum _{r=1}^N {\varPi }_{r} \left\| v_{\lambda _{r}}-v_{\lambda _{r-1}}\right\| _\infty . \end{aligned}$$
(12)

Let \(\epsilon \in (0,1)\). Let \(N_0:=\max \{N \ge 1 \ | \ {\varPi }_{N} \ge \epsilon \}\). We have \({\varPi }_{N_0} \le \epsilon +I_{\infty }(\pi )\). For \(N=N_0\), inequality (12) becomes:

$$\begin{aligned} \left\| v_{\pi }-v_{\lambda _0}\right\| _{\infty } \le \epsilon +I_{\infty }(\pi ) +\sum _{r=1}^{N_0} \left\| v_{\lambda _{r}}-v_{\lambda _{r-1}}\right\| _\infty . \end{aligned}$$
(13)

Assume that \(I_{\infty }(\pi ) < \epsilon \beta \) (see Eq. (1) for the definition of \(\beta \)).

Thus \(\lambda _r \le I_{\infty }(\pi )/{\varPi }_{N_0}<\beta \) for all \(r \in \{0,1,\ldots ,N_0 \}\). If \(s=\infty \), then \(\sum _{r=1}^{N_0} \Vert v_{\lambda _{r}}-v_{\lambda _{r-1}}\Vert _\infty =0\), and the last inequality proves that \({\varGamma }\) has an \(\infty \)-asymptotic value. Assume now that \(s<\infty \). Using (2), let us majorize the term on the right in inequality (13):

$$\begin{aligned} \sum _{r=1}^{N_0} \left\| v_{\lambda _{r}}-v_{\lambda _{r-1}}\right\| _\infty \le C \sum _{r=1}^{N_0} \left| (\lambda _{r})^s-(\lambda _{r-1})^s \right| . \end{aligned}$$

Let \(r \in \{1,2,\ldots ,N_0 \}\). The quantity \(|(\lambda _r)^s-(\lambda _{r-1})^s |\) is smaller than

$$\begin{aligned} \left| \frac{(\pi _{r+1})^s}{\left( \sum _{m \ge r+1} \pi _m \right) ^s}-\frac{(\pi _r)^s}{\left( \sum _{m \ge r+1} \pi _m\right) ^s}\right| + \left| \frac{(\pi _{r})^s}{\left( \sum _{m \ge r+1} \pi _m \right) ^s}-\frac{(\pi _r)^s}{\left( \sum _{m \ge r} \pi _m\right) ^s}\right| . \end{aligned}$$

By definition of \(N_0\), we have

$$\begin{aligned} \sum _{m \ge r} \pi _m \ge \sum _{m \ge r+1} \pi _m \ge \epsilon . \end{aligned}$$

Therefore we can majorize the term on the left by \(\epsilon ^{-s} |(\pi _{r+1})^s-(\pi _r)^s |\). As for the term on the right, by the Mean Value theorem we have

$$\begin{aligned} \left( \sum _{m \ge r+1} \pi _m \right) ^{-s}-\left( \sum _{m \ge r} \pi _m \right) ^{-s}\le & {} s \left( \sum _{m \ge r+1} \pi _m \right) ^{-1-s} \pi _r\\\le & {} s \epsilon ^{-1-s} \pi _r. \end{aligned}$$

Finally we have

$$\begin{aligned} \sum _{r \ge 1} \left| (\lambda _r)^s-(\lambda _{r-1})^s\right|\le & {} \sum _{r \ge 1} \left( \epsilon ^{-s}\left| (\pi _{r+1})^s-(\pi _r)^s\right| +s \epsilon ^{-1-s} (\pi _r)^{1+s} \right) \\\le & {} \epsilon ^{-s} I_s(\pi )+s \epsilon ^{-1-s} I_{\infty }(\pi )^s \\\le & {} (\epsilon ^{-s}+s \epsilon ^{-1-s})I_s(\pi ). \end{aligned}$$

Plugging this into (13) gives

$$\begin{aligned} \left\| v_{\pi }-v_{\lambda _0}\right\| _{\infty } \le \epsilon +I_s(\pi )^{1/s} +C\left( \epsilon ^{-s}+s \epsilon ^{-1-s}\right) I_s(\pi ). \end{aligned}$$

Thus for \(I_s(\pi )\) sufficiently small, we have both \(\Vert v_{\pi }-v_{\lambda _0}\Vert _\infty \le \epsilon \) and \(\Vert v_{\lambda _0}-v_{\infty }\Vert _\infty \le \epsilon \), which concludes the proof. \(\square \)

Corollary 2

Let \((\pi ^n) \in ({\varDelta }(\mathbb {N}^*))^{\mathbb {N}}\) such that for all \(p>0\), \(\lim _{n \rightarrow +\infty } I_p(\pi ^n)=0\). Then in any stochastic game, \((v_{\pi ^n})_{n \ge 0}\) converges to \(v_{\infty }\).

Proof

Let \({\varGamma }\) be a stochastic game of order \(s \in (0,+\infty ]\). By Theorem 3, \({\varGamma }\) has an s-asymptotic value. By assumption, we have \(\lim _{n \rightarrow +\infty } I_s(\pi ^n)=0\), thus \((v_{\pi ^n})_{n \ge 0}\) converges to \(v_{\infty }\). \(\square \)

The following remarks show that for the asymptotic approach, Corollary 2 is more general than Corollary 1 and Theorem 2.

Remark 3

 

  • When \((\pi _m)_{m \ge 1}\) is decreasing, for all \(p>0\), \(I_p(\pi )=(\pi _1)^p\). According to Corollary 2, \((v_{\pi })\) converges when \(\pi _1\) goes to 0 (compare with the asymptotic approach in Corollary 1).

  • When \(s=1\), the mapping \(\lambda \rightarrow v_\lambda \) is Lipschitz. For instance, this is the case when \({\varGamma }\) is a Markov decision process: see Sorin 2002, Chapter 5, Proposition 5.20. By Theorem 3, \({\varGamma }\) has a 1-asymptotic value (compare with the asymptotic approach in Theorem 2).

  • For \((l,n) \in \mathbb {N}\times \mathbb {N}^*\), let \(\pi ^{l,n}\) be defined by \(\pi ^{l,n}_m=n^{-1} 1_{l+1 \le m \le l+n}\) for every \(m \in \mathbb {N}^*\). The \((\pi ^{l,n})\) are non-monotonic sequences, thus Corollary 1 does not apply. Nevertheless, \(I_s(\pi ^{l,n}) \le 2 n^{-s}\) (see the numerical check after this remark). Consequently, for any \(\epsilon >0\), there exists \({\bar{n}} \in \mathbb {N}^*\) such that for all \(n \ge {\bar{n}}\), for all \(l \in \mathbb {N}\), \(I_s(\pi ^{l,n}) \le \epsilon \). By Theorem 3, \({\varGamma }\) has an s-asymptotic value, and we deduce that

    $$\begin{aligned} \lim _{n \rightarrow +\infty } \sup _{l \in \mathbb {N}} \left\| v_{\pi ^{l,n}}-v_{\infty } \right\| _{\infty }=0. \end{aligned}$$
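The bound on \(I_s(\pi ^{l,n})\) can be checked with the impatience sketch given after Definition 3: whatever l is, only the two boundaries of the window \(\{l+1,\ldots ,l+n\}\) contribute (our code, arbitrary values).

```python
# Sketch (not from the paper): I_p(pi^{l,n}) for the shifted-window weights of Remark 3,
# reusing the `impatience` function sketched after Definition 3.
def window_weights(l, n):
    """pi^{l,n}_m = 1/n for l+1 <= m <= l+n, and 0 otherwise (finite support)."""
    return [0.0] * l + [1.0 / n] * n

l, n, s = 7, 100, 0.5
print(impatience(window_weights(l, n), s), 2 * n ** (-s))   # both print 0.2
```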

4.2 Absorbing games

In this subsection, we relax the finiteness assumption on the action sets. An absorbing state is a state such that once it is reached, the game remains in this state forever, and the payoff does not depend on the actions (absorbing payoff). An absorbing game is a stochastic game that has at most one nonabsorbing state.

Mertens et al. (2009) have proved the existence of the uniform value in absorbing games with compact action sets and separately continuous transition and payoff functions. In particular, \((v_{\lambda })\) converges. Adapting the proof of the previous subsection, we prove the following proposition:

Proposition 2

Let \({\varGamma }\) be an absorbing game with compact action sets and separately continuous transition and payoff functions. Then, \({\varGamma }\) has an \(\infty \)-asymptotic value.

Remark 4

For finite I and J, this result was stated in Cardaliaguet et al. (2012), with a sketch of proof. Here, we provide a complete and simpler proof, which holds in a more general framework.

Proof

Again, we adopt the convention \(v_0:=v_{\infty }\). The Shapley equations (3), (4), (5) and (6) still hold true for compact action sets (see Maitra and Parthasarathy 1970). The only difference is that in (7) and (8), the sum has to be replaced by an integral.

If \(k^*\) is an absorbing state, we have \(v_{\pi }(k^*)=v_\lambda (k^*)\), for any \(\pi \in {\varDelta }(\mathbb {N}^*)\) and \(\lambda \in [0,1]\). Let k be the only non-absorbing state of the game, and \(r \in \mathbb {N}\). In the previous proof, inequalities (9) and (10) yield:

$$\begin{aligned} (v_{\pi ^r}-v_{\lambda _r})(k) \le (1-\lambda _r)\mu _r (v_{\pi ^{r+1}}-v_{\lambda _r})(k), \end{aligned}$$

where \(\mu _r\) is the probability that the game is not absorbed, when Player 1 (resp. 2) plays an optimal strategy x (resp. y) in (3) (resp. (6)). In what follows, for simplicity we omit the variable k.

Let \({\varPi }_r:=\prod _{r'=0}^{r-1} (1-\lambda _{r'})\mu _{r'}\). Relying on the same steps as in the previous proof, we get the analogue of (13), where \(N_0\) is defined similarly:

$$\begin{aligned} (v_{\pi }-v_{\lambda _0}) \le \left( \epsilon +I_{\infty }(\pi ) \right) +\sum _{r=1}^{N_0} {\varPi }_r (v_{\lambda _{r}}-v_{\lambda _{r-1}}). \end{aligned}$$
(14)

Let us majorize the right-hand side. The sequence \(({\varPi }_r)_{r \ge 0}\) is decreasing. Moreover, for all \(r \in \{1,2,\ldots ,N_0 \}\), we have

$$\begin{aligned} \sum _{m \ge r+1} \pi _m \ge {\varPi }_r \ge \epsilon . \end{aligned}$$

Hence, \(\lambda _r \in [0,I_{\infty }(\pi )/\epsilon ]\). Let \(V:=\sup _{\lambda \in [0,I_{\infty }(\pi )/\epsilon ]} v_\lambda \). We have:

$$\begin{aligned} \sum _{r=1}^{N_0} {\varPi }_r (v_{\lambda _{r}}-v_{\lambda _{r-1}})= & {} \sum _{r=1}^{N_0} {\varPi }_r v_{\lambda _{r}}-\sum _{r=0}^{N_0-1} {\varPi }_{r+1} v_{\lambda _{r}}\\= & {} \sum _{r=1}^{N_0-1} ({\varPi }_r-{\varPi }_{r+1})v_{\lambda _r}+{\varPi }_{N_0} v_{\lambda _{N_0}} -{\varPi }_1 v_{\lambda _0} \\\le & {} V \sum _{r=1}^{N_0-1} ({\varPi }_r-{\varPi }_{r+1})+{\varPi }_{N_0} v_{\lambda _{N_0}} -{\varPi }_1 v_{\lambda _0} \\\le & {} V({\varPi }_1-{\varPi }_{N_0})+{\varPi }_{N_0} v_{\lambda _{N_0}} -{\varPi }_1 v_{\lambda _0} \\= & {} {\varPi }_1\left( V-v_{\lambda _0}\right) -{\varPi }_{N_0}\left( V-v_{\lambda _{N_0}}\right) \\\le & {} \left| V-v_{\lambda _0}\right| +\left| V-v_{\lambda _{N_0}}\right| . \end{aligned}$$

Because \((v_\lambda )\) converges to \(v_\infty \) when \(\lambda \) goes to 0, the terms \(|V-v_{\lambda _0}|\) and \(|V-v_{\lambda _{N_0}}|\) vanish when \(I_\infty (\pi )\) goes to 0. Consequently, the right-hand side of the last inequality goes to 0 when \(I_\infty (\pi )\) goes to 0. Together with (14), this shows that the positive part of \((v_{\pi }-v_{\lambda _0})\) goes to 0 when \(I_\infty (\pi )\) goes to 0.

Symmetrically, one can show that the negative part of \((v_{\pi }-v_{\lambda _0})\) goes to 0 when \(I_\infty (\pi )\) goes to 0. Hence, the proposition is proved. \(\square \)

4.3 An example

We construct a stochastic game of order 1/2 which has no p-asymptotic value for any \(p>1/2\). First, this shows that our main result (Theorem 3) cannot be improved; second, this implies that Theorem 2 does not extend to the Two-Player Case.

Let us consider the following stochastic game \({\varGamma }\) (Table 1):

Table 1 Transition and payoff functions in states \(\omega _1\) and \(\omega _2\)

The set of states of the game is \(K=\{\omega _1,\omega _2,1^*,0^* \}\). The action set is \(I= \{T,M,B\}\) for Player 1 and \(J= \{L,R \}\) for Player 2. States \(1^*\) and \(0^*\) are absorbing states with absorbing payoff respectively 1 and 0. The payoff and transition functions in state \(\omega _1\) (resp. \(\omega _2\)) are described by the left table (resp. the right one). The symbol \(\overrightarrow{1}\) (resp. \(\overleftarrow{0}\)) means that the payoff is 1 (resp. 0) and the game moves on to state \(\omega _2\) (resp. \(\omega _1\)). When there is no arrow or star, this means that the game remains in the same state.

In Vigeral (2013), a similar stochastic game \({\varGamma }'\) is mentioned. The only difference is that in \({\varGamma }'\), Player 1 has only the two actions T and B. The uniform value \(v_{\infty }'\) of \({\varGamma }'\) satisfies \(v_{\infty }'(\omega _1)=v_{\infty }'(\omega _2)=1/2\). In addition, the order of \({\varGamma }'\) is 1/2. Moreover, for all \(\epsilon >0\), the stationary strategy x (resp. y) for Player 1 (resp. 2) defined by \(x(\omega _1)=x(\omega _2)= (1-\sqrt{\lambda }) \cdot T+\sqrt{\lambda } \cdot B\) (resp. \(y(\omega _1)=y(\omega _2)=(1-\sqrt{\lambda }) \cdot L+\sqrt{\lambda } \cdot R\)) is \(\epsilon \)-optimal in \({\varGamma }'_{\lambda }\), for \(\lambda \) small enough (they are asymptotically optimal strategies).

In our example, in \({\varGamma }_{\lambda }\), the action M is dominated by T in every state. Thus for all \(\lambda \in (0,1]\), \(v_{\lambda }=v'_{\lambda }\). In particular, \({\varGamma }\) has order 1/2, and its uniform value \(v_{\infty }\) satisfies \(v_{\infty }(\omega _1)=v_{\infty }(\omega _2)=1/2\). In addition, the strategy x (resp. y) is an asymptotically optimal stationary strategy for Player 1 (resp. 2) in \({\varGamma }_{\lambda }\).

Remark 5

 

  • Assume that for some \(\alpha >0\), Player 1 plays \((1-\alpha ) \cdot T+\alpha \cdot B\) in state \(\omega _2\) until the state changes. Whatever Player 2 plays, Player 1 spends at most a number of stages of order \(\alpha ^{-1}\) in state \(\omega _2\) before moving to state \(\omega _1\) or \(0^*\), and the probability that the state goes to \(0^*\) and not to \(\omega _1\) is at most of order \(\alpha \). Hence, for Player 1 there is a trade-off between staying not too long in state \(\omega _2\), and having a low probability of being absorbed in \(0^*\). In view of what precedes, the optimal trade-off in \({\varGamma }_{\lambda }\) is \(\alpha \approx \sqrt{\lambda }\) (a heuristic computation is given right after this remark).

  • Let \(\theta \in {\varDelta }(\mathbb {N}^*)\). At some stage m in \({\varGamma }_{\theta }\), the action M may not be dominated by T in state \(\omega _1\). Indeed, if \(\theta _m=0\), the payoff of stage m has weight 0 in \({\varGamma }_{\theta }\), whatever the actions played. Thus, it is optimal for Player 1 to play M, because it makes the state remain in \(\omega _1\). The example builds on this fact. By contrast, for any \(\theta \in {\varDelta }(\mathbb {N}^*)\), in \({\varGamma }_{\theta }\), the action M is dominated by T in state \(\omega _2\). In what follows, we build a family of strategies for Player 1 that all use action M in state \(\omega _2\), but this is only to make the proof easier.
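The trade-off in the first point can be quantified heuristically (an order-of-magnitude computation, not part of the formal argument): in \({\varGamma }_{\lambda }\), the roughly \(\alpha ^{-1}\) stages spent in state \(\omega _2\) carry a total discounted weight of order \(\lambda /\alpha \), while absorption in \(0^*\) costs a quantity of order \(\alpha \), so the total loss is of order

$$\begin{aligned} \frac{\lambda }{\alpha }+\alpha , \quad \text{ which } \text{ is } \text{ minimized } \text{ at } \ \alpha =\sqrt{\lambda }, \ \text{ with } \text{ value } \ 2\sqrt{\lambda }. \end{aligned}$$

This is consistent with the asymptotically optimal strategies x and y recalled above, and with the order 1/2 of the game.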

Theorem 4

For all \(p>1/2\), the game \({\varGamma }\) has no p-asymptotic value.

The remainder of the subsection is dedicated to the proof of Theorem 4. Let us introduce the following piece of notation: given three sequences of strictly positive real numbers \((u_n)_{n \ge 1}\), \((v_n)_{n \ge 1}\) and \((w_n)_{n \ge 1}\), we write \(u_n = v_n+o(w_n)\) if the sequence \(([u_n-v_n]/{w_n})_{n \ge 1}\) converges to 0.

To simplify the presentation, we first show that \({\varGamma }\) has no 1-asymptotic value. Let \(n \in \mathbb {N}^*\). For \(l \in \{0,1,\ldots ,n^3-1 \}\), define \(a_n(l):=l(n+n^5)+1\) and \(b_n(l):=l(n+n^5)+n\). Let \(E_1:=\mathop {\cup }\limits _{0 \le l \le n^3-1} \{a_n(l),a_n(l)+1,\ldots ,b_n(l)\}\).

We consider the sequence \(\pi ^n \in {\varDelta }(\mathbb {N}^*)\) defined by \(\pi ^n_m:=n^{-4}\) if \(m \in E_1\), and \(\pi ^n_m:=0\) otherwise. We have

$$\begin{aligned} I_1(\pi ^n)=(2n^3-1)n^{-4}, \end{aligned}$$

thus \(\lim _{n \rightarrow +\infty } I_1(\pi ^n)=0\). We show below that \(\lim _{n \rightarrow +\infty } v_{\pi ^n}(\omega _1)=1\).
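The weights \(\pi ^n\) and their impatience can be generated explicitly (a sketch with a small n, reusing the impatience function sketched after Definition 3; this is only a numerical illustration of the construction).

```python
# Sketch (not from the paper): the weights pi^n of the counterexample, equal to n^{-4}
# on the n^3 blocks {a_n(l), ..., b_n(l)} of length n and to 0 on the gaps of length n^5.
def counterexample_weights(n):
    pi = {}
    for l in range(n ** 3):
        a = l * (n + n ** 5) + 1               # a_n(l)
        for m in range(a, a + n):              # the block {a_n(l), ..., b_n(l)}
            pi[m] = n ** -4
    return pi                                  # dict {stage: weight}, zero elsewhere

n = 3
pi = counterexample_weights(n)
max_stage = n ** 3 * (n + n ** 5)
pi_list = [pi.get(m, 0.0) for m in range(1, max_stage + 1)]
print(sum(pi_list))                                          # 1.0: pi^n lies in Delta(N*)
print(impatience(pi_list, 1.0), (2 * n ** 3 - 1) * n ** -4)  # both equal (2n^3 - 1) n^{-4}
```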

Table 2 Strategy \(\sigma ^n\)

We consider the Markovian strategy \(\sigma ^n \in {\varSigma }\) for Player 1, described in Table 2.
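Table 2 is not reproduced here; the following sketch of \(\sigma ^n\) is reconstructed from the proofs of Lemma 1 and Proposition 3 below and from Remark 5, and should be read as our best guess of the missing table rather than as the authors' exact statement.

```python
# Hedged reconstruction (not the original Table 2): the Markov strategy sigma^n,
# as inferred from the proofs of Lemma 1 and Proposition 3; the entry for omega_2
# at stages of E_1 is a guess suggested by Remark 5.
def sigma_n(n, state, in_E1):
    """Mixed action of Player 1 over {T, M, B} in the nonabsorbing states."""
    if state == "omega_1":
        # (1 - n^{-2}) T + n^{-2} B during the blocks of E_1 (cf. alpha_n in the
        # proof of Proposition 3), and the freezing action M on the zero-weight stages
        return {"T": 1 - n ** -2, "B": n ** -2} if in_E1 else {"M": 1.0}
    if state == "omega_2":
        # outside E_1: the slow escape (1 - n^{-4}) T + n^{-4} B towards omega_1
        return {"M": 1.0} if in_E1 else {"T": 1 - n ** -4, "B": n ** -4}
    raise ValueError("absorbing states need no action")
```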

We show that for any \(\epsilon >0\), for any n sufficiently large, \(\sigma ^n\) guarantees the payoff \(1-\epsilon \) in \({\varGamma }^{\omega _1}_{\pi ^n}\) for Player 1.

Let \(\tau ^n\) be a pure Markovian best-response to \(\sigma ^n\) in \({\varGamma }^{\omega _1}_{\pi ^n}\). Let \({\varOmega }_n\) be the event

$$\begin{aligned} {\varOmega }_n:= \mathop {\cap }\limits _{l \in \left\{ 0,1,\ldots ,n^3-1 \right\} } \left\{ k_{a_n(l)} \in \left\{ \omega _1,1^* \right\} \right\} . \end{aligned}$$

When the state of the game is \(\omega _2\) and \(m \notin E_1\), Player 1 plays B with probability \(n^{-4}\). By Remark 5, Player 1 spends at most a number of stages of order \(n^4\) in \(\omega _2\), and the state goes to \(0^*\) with a probability at most of order \(n^{-4}\). As a result, if for some \(l \in \{0,\ldots ,n^3-1 \}\) the state is in \(\omega _2\) at stage \(b_n(l)+1\), the probability that it will move to \(\omega _1\) before stage \(a_n(l+1)\) is at least of order \(1-n^{-4}\). Once the state has moved to \(\omega _1\), Player 1 plays M and the state remains in \(\omega _1\) until stage \(a_n(l+1)\). Hence the probability that \(k_{a_n(l)}\) lies in \(\{\omega _1,1^* \}\) for any l in \(\{0,1,\ldots ,n^3-1 \}\) is at least of order \((1-n^{-4})^{n^3}= 1+o(1)\). This informal discussion provides intuition for the following lemma.

Lemma 1

$$\begin{aligned} \lim _{n \rightarrow +\infty } \mathbb {P}^{\omega _1}_{\sigma ^n,\tau ^n}({\varOmega }_n)=1. \end{aligned}$$

For notational convenience, in the proof of this lemma and the proof of the next proposition, for \(n \in \mathbb {N}^*\), \(\mathbb {P}^{\omega _1}_{\sigma ^n,\tau ^n}\) is denoted by \({\mathbb {P}}\) and \(\mathbb {E}^{\omega _1}_{\sigma ^n,\tau ^n}\) is denoted by \(\mathbb {E}\).

Proof

Let \(n \in \mathbb {N}^*\) and \(l \in \{0,\ldots ,n^3-1 \}\). Let us minorize the probability \({\mathbb {P}} (k_{a_n(l+1)} \in \{ \omega _1,1^* \}| k_{a_n(l)} \in \{ \omega _1,1^* \})\).

First, notice that \({\mathbb {P}} (k_{b_n(l)+1} \ne 0^* |k_{a_n(l)} \in \{ \omega _1,1^* \})=1\) (see Table 2). Let us now analyze how the state may evolve during the block \(\{b_n(l)+1,b_n(l)+2,\ldots ,a_n(l+1)-1 \}\), discriminating between the cases \(k_{b_n(l)+1}=1^*\), \(k_{b_n(l)+1}=\omega _1\), and \(k_{b_n(l)+1}=\omega _2\):

  • \({\mathbb {P}} (k_{a_n(l+1)}=1^*|k_{b_n(l)+1}=1^*)=1\).

  • If \(k_{b_n(l)+1}=\omega _1\), then Player 1 will play M at each stage \(m \in \{b_n(l)+1,b_n(l)+2,\ldots ,a_n(l+1)-1 \}\). Therefore, the state will remain in \(\omega _1\): \({\mathbb {P}}(k_{a_n(l+1)}=\omega _1|k_{b_n(l)+1}=\omega _1)=1\).

  • If \(k_{b_n(l)+1}=\omega _2\), then Player 1 will play \((1-n^{-4}) \cdot T + n^{-4} \cdot B\) as long as the state is \(\omega _2\) and \(m \le a_n(l+1)-1\). We discriminate between two cases:

    • if Player 2 plays L as long as the state is \(\omega _2\) and \(m \le a_n(l+1)-1\), the game will never be absorbed in \(0^*\), and the probability that the state will move to \(\omega _1\) before stage \(a_n(l+1)\) is equal to \(1- (1-n^{-4})^{n^5}\). If the state moves to \(\omega _1\) at some stage \(m \le a_n(l+1)-1\), then Player 1 will play M until stage \(a_n(l+1)\), thus the state will remain in \(\omega _1\). Consequently, in this case we have

      $$\begin{aligned} {\mathbb {P}}(k_{a_n(l+1)}=\omega _1|k_{b_n(l)+1}=\omega _2)=1-\left( 1-n^{-4}\right) ^{n^5}. \end{aligned}$$
    • if Player 2 plays R at one stage in \(\{b_n(l)+1,b_n(l)+2,\ldots ,a_n(l+1)-1 \}\), and if at the first stage he does so the state is \(\omega _2\), then with probability \(1-n^{-4}\) the state will move to \(\omega _1\). It will remain in \(\omega _1\) until stage \(a_n(l+1)\). If the state has already switched to \(\omega _1\) before Player 2 plays R, then it will remain in \(\omega _1\) until stage \(a_n(l+1)\). Therefore, in this case we have

      $$\begin{aligned} {\mathbb {P}}(k_{a_n(l+1)}=\omega _1|k_{b_n(l)+1}=\omega _2) \ge 1-n^{-4}. \end{aligned}$$

    The last two subcases show that

    $$\begin{aligned} {\mathbb {P}}(k_{a_n(l+1)}=\omega _1|k_{b_n(l)+1}=\omega _2) \ge \min \left\{ 1-\left( 1-n^{-4} \right) ^{n^5}, 1-n^{-4} \right\} . \end{aligned}$$

This exhaustive study shows that

$$\begin{aligned} {\mathbb {P}}(k_{a_n(l+1)} \in \left\{ \omega _1,1^* \right\} |k_{a_n(l)} \in \left\{ \omega _1,1^* \right\} ) \ge \min \left\{ 1-\left( 1-n^{-4}\right) ^{n^5}, 1-n^{-4} \right\} . \end{aligned}$$

We have \((1-n^{-4})^{n^5}=o(n^{-4})\), thus for n large enough, the minimum in the above equation is reached at \(1-n^{-4}\). By induction, it yields

$$\begin{aligned} \mathbb {P}({\varOmega }_n) \ge \prod _{l=0}^{n^3-1} (1-n^{-4}) = 1+o(1), \end{aligned}$$

and the lemma is proved. \(\square \)

Now we can prove the following proposition:

Proposition 3

The game \({\varGamma }\) has no 1-asymptotic value.

Proof

Let \(n \in \mathbb {N}^*\). We minorize \(\gamma _{\pi ^n}^{\omega _1}(\sigma ^n,\tau ^n)\) by a quantity that goes to 1 as n goes to infinity.

The last lemma shows that with high probability, at the beginning of each block \(\{a_n(l),a_n(l)+1,\ldots ,b_n(l)\}\), the state is either \(\omega _1\) or \(1^*\). Recall that these blocks exactly correspond to the stages where the payoff weight is nonzero. Hence, to get a good payoff between stage \(a_n(l)\) and stage \(b_n(l)\), Player 2 should make the state move from \(\omega _1\) to \(\omega _2\) no later than stage \(b_n(l)\). If Player 2 plays L at each stage \(m \in \{a_n(l),a_n(l)+1,\ldots ,b_n(l)\}\), with probability at least \((1-n^{-2})^n\), the state will remain in \(\omega _1\) until stage \(b_n(l)\). This probability goes to 1 as n goes to infinity, which is a bad outcome for Player 2. Thus, Player 2 should play R at some stage \(m \in \{a_n(l),a_n(l)+1,\ldots ,b_n(l)\}\). We show that:

  • either the number of \(l \in \{0,1,\ldots ,n^3-1 \}\) such that Player 2 plays R at least once in \(\{a_n(l),a_n(l)+1,\ldots ,b_n(l)\}\) is small, and thus the total payoff in \({\varGamma }^{\omega _1}_{\pi ^n}\) is close to 1,

  • or the number of \(l \in \{0,1,\ldots ,n^3-1 \}\) such that Player 2 plays R at least once in \(\{a_n(l),a_n(l)+1,\ldots ,b_n(l) \}\) is high. In this case, with probability close to 1, the state is absorbed in \(1^*\) very rapidly, thus the total payoff in \({\varGamma }^{\omega _1}_{\pi ^n}\) is close to 1.

Let \(n \in \mathbb {N}^*\) and \(l \in \{0,1,\ldots ,n^3-1\}\), and \({\varOmega }_n(l)\) be the event defined by

$$\begin{aligned} {\varOmega }_n(l):=\mathop {\cap }\limits _{0 \le l' \le l} \left\{ k_{a_n(l')} \in \left\{ \omega _1,1^* \right\} \right\} . \end{aligned}$$

Note that \({\varOmega }_n(n^3-1)={\varOmega }_n\). Let

$$\begin{aligned} M_n(l):=\left\{ l' \in \left\{ 0,1,\ldots ,l \right\} \ | \ \exists m \in \left\{ a_n(l'),a_n(l')+1,\ldots ,b_n(l') \right\} , \tau ^n(m,\omega _1)=R \right\} , \end{aligned}$$

and let \(\overline{M_n(l)}:= \{0,1,\ldots ,l \} \setminus M_n(l)\). If \(l \in M_{n}(n^3-1)\), let

$$\begin{aligned} m_n(l):=\min \left\{ m \in \left\{ a_n(l),a_n(l)+1,\ldots ,b_n(l) \right\} \ | \ \tau ^n(m,\omega _1)=R \right\} . \end{aligned}$$

Fix \(\delta \in (0,1]\). Let \(l_n:=\max \{l \in \{0,1,\ldots ,n^3-1 \} \ | \ {\text {Card}}M_n(l) \le \delta n^3 \}\). We show that between stages 1 and \(b_n(l_n)\), Player 2 did not play R often enough to affect the total payoff, and that at stage \(b_n(l_n)+1\), either \(l_n=n^3-1\) and the game is finished, or he has played R so many times that the state has been absorbed in \(1^*\) with high probability.

By definition of \(l_n\), we have \({\text {Card}}M_n(l_n) \le \delta n^3\), and if \(l_n<n^3-1\), then \({\text {Card}}M_n(l_n) \ge \delta n^3-1\).

We have

$$\begin{aligned} \mathbb {E}\left( \sum _{m=1}^{b_n(l_n)} \pi ^n_m g_m \right)= & {} \frac{1}{n^4}\sum _{l=0}^{l_n} \mathbb {E}\left( \sum _{m=a_n(l)}^{b_n(l)} g_m \right) \nonumber \\\ge & {} \frac{1}{n^4} \sum _{l \in \overline{M_n(l_n)}} \mathbb {E}\left( 1_{{\varOmega }_n(l)} \sum _{m=a_n(l)}^{b_n(l)} g_m \right) . \end{aligned}$$
(15)

If \(l \in \overline{M_n(l_n)}\) and \(k_{a_n(l)}=\omega _1\), Player 2 plays L as long as \(k_m=\omega _1\) and \(m \le b_n(l)\), while Player 1 plays \((1-n^{-2}) \cdot T + n^{-2} \cdot B\). As a result, if \(k_{a_n(l)}=\omega _1\), the probability that the state remains in \(\omega _1\) until stage \(b_n(l)\) is \(\alpha _n:=(1-n^{-2})^n\). Thus, the last inequality yields

$$\begin{aligned} \mathbb {E}\left( \sum _{m=1}^{b_n(l_n)} \pi ^n_m g_m \right)\ge & {} n^{-3} {\text {Card}}\overline{M_n(l_n)} \mathbb {P}({\varOmega }_n) \alpha _n\end{aligned}$$
(16)
$$\begin{aligned}\ge & {} (n^{-3}(l_n+1)-\delta ) \mathbb {P}({\varOmega }_n) \alpha _n . \end{aligned}$$
(17)

Case 1

\(l_n=n^3-1\).

By (16) and Lemma 1, there exists \({\bar{n}} \in \mathbb {N}^*\) such that for all \(n \ge {\bar{n}}\) verifying \(l_n=n^3-1\),

$$\begin{aligned} \gamma ^{\omega _1}_{\pi ^n}(\sigma ^n,\tau ^n) \ge 1-2\delta . \end{aligned}$$
(18)

Case 2

\(l_n<n^3-1\).

Let \(n \in \mathbb {N}^*\) such that \(l_n<n^3-1\). In particular, \(|M_n(l_n) | \ge \delta n^3-1\) and \(|\overline{M_n(l_n)}| \le l_n-\delta n^3+2\).

We are going to show the following inequality:

$$\begin{aligned} \mathbb {P}(k_{b_n(l_n)}=1^*) \ge \mathbb {P}({\varOmega }_n)-\left( 1-n^{-2}\left( 1-n^{-2}\right) ^n\right) ^{\delta n^3 -1}=:\beta _n . \end{aligned}$$
(19)

The idea is the following. Each time Player 2 plays R in state \(\omega _1\), the state goes to \(1^*\) with probability \(n^{-2}\). If \(k_{a_n(l)}=\omega _1\) and \(l \in M_n(l)\), then at each stage \(m \in \{a_n(l),a_n(l)+1,\ldots ,m_n(l)-1 \}\), Player 2 will play L, hence at each of these stages the state will remain in \(\omega _1\) with probability \(1-n^{-2}\). Since \(m_n(l)-a_n(l) \le n\), with high probability \(k_{m_n(l)}=\omega _1\). At stage \(m_n(l)\), Player 2 plays R. Thus with high probability, conditionally on the event \({\varOmega }_n(l_n)\), Player 2 has played the action R in state \(\omega _1\) more than \(\delta n^3-1\) times before stage \(b_n(l_n)\), leading the state to be absorbed in \(1^*\) before stage \(b_n(l_n)\) with high probability.

Formally, if \(l \in \{0,1,\ldots ,l_n \}\), we have

$$\begin{aligned} \mathbb {P}\left( \left\{ k_{b_n(l)} \ne 1^* \right\} \cap {\varOmega }_n(l)\right)= & {} \mathbb {P}\left( \left\{ k_{b_n(l)} \ne 1^* \right\} \cap \left\{ k_{a_n(l)} = \omega _1 \right\} \cap {\varOmega }_n(l)\right) \nonumber \\= & {} \mathbb {P}\left( \left\{ k_{b_n(l)} \ne 1^*\right\} |\left\{ k_{a_n(l)} = \omega _1 \right\} \cap {\varOmega }_n(l)\right) \nonumber \\&\times \, \mathbb {P}\left( \left\{ k_{a_n(l)} = \omega _1 \right\} \cap {\varOmega }_n(l)\right) . \end{aligned}$$
(20)

First we majorize the first term \(P_1:=\mathbb {P}(k_{b_n(l)} \ne 1^*| \{k_{a_n(l)} = \omega _1 \} \cap {\varOmega }_n(l))\). If \(l \notin M_n(l)\), we simply majorize it by 1. Assume now that \(l \in M_n(l)\). By the Markov property (\(\sigma ^n\) and \(\tau ^n\) are Markovian strategies), we have

$$\begin{aligned} P_1= & {} \mathbb {P}(k_{b_n(l)} \ne 1^*|\left\{ k_{a_n(l)}=\omega _1 \right\} )\\= & {} \mathbb {P}\left( \left\{ k_{b_n(l)} \ne 1^*\right\} \cap \left\{ k_{m_n(l)}=\omega _1\right\} |\left\{ k_{a_n(l)}=\omega _1 \right\} \right) \\&+\, \mathbb {P}\left( \left\{ k_{b_n(l)} \ne 1^*\right\} \cap \left\{ k_{m_n(l)} \ne \omega _1\right\} |\left\{ k_{a_n(l)}=\omega _1 \right\} \right) . \end{aligned}$$

Let \(P_3:=\mathbb {P}(k_{m_n(l)} \ne \omega _1 | k_{a_n(l)}=\omega _1)\). The last equality and the Markov property give

$$\begin{aligned} P_1\le & {} \mathbb {P}(k_{b_n(l)} \ne 1^* | \left\{ k_{m_n(l)}=\omega _1 \right\} \cap \left\{ k_{a_n(l)}=\omega _1 \right\} ) (1-P_3)+P_3 \nonumber \\= & {} \mathbb {P}(k_{b_n(l)} \ne 1^* | k_{m_n(l)}=\omega _1) (1-P_3)+P_3. \end{aligned}$$
(21)

If \(k_{m_n(l)}=\omega _1\), then at stage \(m_n(l)\) Player 2 plays the action R, hence the state is absorbed in \(1^*\) with probability \(n^{-2}\). Thus

$$\begin{aligned} \mathbb {P}(k_{b_n(l)} \ne 1^* | k_{m_n(l)}=\omega _1) \le 1-n^{-2}. \end{aligned}$$
(22)

If \(k_{a_n(l)}=\omega _1\), then at each stage \(m \in \{a_n(l),a_n(l)+1,\ldots ,m_n(l)-1 \}\), Player 2 will play L, hence at each stage the state will remain in \(\omega _1\) with probability \(1-n^{-2}\), and \(m_n(l)-a_n(l) \le n\). We deduce that

$$\begin{aligned} P_3 \le 1-\left( 1-n^{-2} \right) ^n. \end{aligned}$$
(23)

Combining (21), (22) and (23) gives

$$\begin{aligned} P_1\le & {} \left( 1-n^{-2}\right) (1-P_3)+P_3 \nonumber \\= & {} 1+ n^{-2}(P_3-1) \nonumber \\\le & {} 1-n^{-2}\left( 1-n^{-2}\right) ^n. \end{aligned}$$
(24)

As for the second term in (20), we have

$$\begin{aligned} \mathbb {P}\left( \left\{ k_{a_n(l)} = \omega _1 \right\} \cap {\varOmega }_n(l)\right) \le \mathbb {P}\left( \left\{ k_{b_n(l-1)} \ne 1^* \right\} \cap {\varOmega }_n(l-1)\right) . \end{aligned}$$
(25)

Combining (20), (24) and (25), we deduce that if \(l \in M_n(l)\), then

$$\begin{aligned} \mathbb {P}\left( \left\{ k_{b_n(l)} \!\ne \! 1^* \right\} \cap {\varOmega }_n(l)\right) \!\le \! \left( 1-n^{-2}\left( 1-n^{-2}\right) ^n \right) \mathbb {P}\left( \left\{ k_{b_n(l-1)} \!\ne \! 1^* \right\} \cap {\varOmega }_n(l-1)\right) . \end{aligned}$$

Because \(|M_n(l_n)| \ge \delta n^3-1\), by induction we obtain

$$\begin{aligned} \mathbb {P}\left( \left\{ k_{b_n(l_n)} \ne 1^* \right\} \cap {\varOmega }_n(l_n)\right) \le \left( 1-n^{-2}\left( 1-n^{-2}\right) ^n\right) ^{\delta n^3 -1}, \end{aligned}$$

and inequality (19) follows. Now we can minorize the other part of the payoff:

$$\begin{aligned} \mathbb {E}\left( \sum _{m \ge b_n(l_n)+1} \pi ^n_m g_m \right)\ge & {} \mathbb {E}\left( 1_{\left\{ k_{b_n(l_n)}=1^* \right\} }\sum _{m \ge b_n(l_n)+1}\pi ^n_m \right) \nonumber \\= & {} n^{-3}(n^3-l_n-1) \mathbb {P}\left( \left\{ k_{b_n(l_n)}=1^*\right\} \right) \nonumber \\\ge & {} \left( 1-n^{-3}(l_n+1) \right) \beta _n. \end{aligned}$$
(26)

Inequalities (17) and (26) yield

$$\begin{aligned} \mathbb {E}\left( \sum _{m \ge 1} \pi ^n_m g_m \right)= & {} \mathbb {E}\left( \sum _{m=1}^{b_n(l_n)} \pi ^n_m g_m \right) +\mathbb {E}\left( \sum _{m \ge b_n(l_n)+1} \pi ^n_m g_m \right) \\\ge & {} (n^{-3}(l_n+1)-\delta ) \mathbb {P}({\varOmega }_n) \alpha _n+ \left( 1-n^{-3}(l_n+1) \right) \beta _n. \end{aligned}$$

The sequences \((\alpha _n)_{n \ge 1}\), \((\beta _n)_{n \ge 1}\) and \((\mathbb {P}({\varOmega }_n))_{n \ge 1}\) converge to 1, thus there exists \(n_1 \in \mathbb {N}^*\) such that for all \(n \ge n_1\) verifying \(l_n<n^3-1\), we have

$$\begin{aligned} v_{\pi ^n}(\omega _1) \ge \gamma ^{\omega _1}_{\pi ^n}(\sigma ^n,\tau ^n) \ge 1-2\delta . \end{aligned}$$
(27)

Because \(\tau ^n\) is a best-response strategy to \(\sigma ^n\) in \({\varGamma }^{\omega _1}_{\pi ^n}\), inequalities (18) and (27) show that for \(n \ge \max ({\bar{n}},n_1)\), we have

$$\begin{aligned} v_{\pi ^n}(\omega _1) \ge \gamma ^{\omega _1}_{\pi ^n}(\sigma ^n,\tau ^n) \ge 1-2\delta . \end{aligned}$$
(28)

Because \(\delta \in (0,1]\) is arbitrary, the sequence \((v_{\pi ^n}(\omega _1))_{n \ge 1}\) converges to 1. Since \(v_{\infty }(\omega _1)=1/2\) and \(\lim _{n \rightarrow +\infty } I_1(\pi ^n)=0\), this shows that \({\varGamma }\) has no 1-asymptotic value. \(\square \)

Now we can prove Theorem 4.

Proof of Theorem 4

Let \(\epsilon >0\) and \(p:=1/2+\epsilon \). Proving that \({\varGamma }\) has no p-asymptotic value proceeds in the same way as previously. The only difference is that the sequence of weights \((\pi ^n)\) has to be modified. Let \(n \in \mathbb {N}^*\). In what follows, the integer part of a real number x is denoted by \(\lfloor x \rfloor \). Define two integers \(N_1\) and \(N_2\) by

$$\begin{aligned} N_1:=\lfloor n^{2-\epsilon } \rfloor \quad \text {and} \quad N_2:=\lfloor n^{2+\epsilon } \rfloor . \end{aligned}$$

For \(l \in \{0,1,\ldots ,N_2-1 \}\), let \(a_n'(l):=l(N_1+n^5)+1\) and \(b_n'(l):=l(N_1+n^5)+N_1\). Let

$$\begin{aligned} E'_1:=\mathop {\cup }\limits _{l \in \left\{ 0,1,\ldots ,N_2-1 \right\} } \left\{ a_n'(l),a_n'(l)+1,\ldots ,b_n'(l) \right\} . \end{aligned}$$

Let \(\pi '^n \in {\varDelta }(\mathbb {N}^*)\) defined in the following way: for \(m \in \mathbb {N}^*\),

$$\begin{aligned} \pi '^n_m:= \left\{ \begin{array}{l@{\quad }l} n^{-4} &{} \text{ if } \ \ m \in E'_1 \setminus \left\{ N_1+1 \right\} , \\ 1-\sum _{m \ne N_1+1} \pi '^n_m &{} \text{ if } \ \ m=N_1+1, \\ 0 &{} \text{ if } \ \ m \notin E_1'. \end{array} \right. \end{aligned}$$

We have

$$\begin{aligned} I_{p}(\pi '^n) \le \lfloor n^{2+\epsilon } \rfloor n^{-4 (1/2+\epsilon )}+2 \pi '^n_{N_1+1}. \end{aligned}$$

Hence \(\lim _{n \rightarrow +\infty } I_p(\pi '^n)=0\). We claim that \(\lim _{n \rightarrow +\infty } v_{\pi '^n}(\omega _1)=1\). The proof is the same as above. We still consider the same strategy \(\sigma ^n\) for Player 1 in \({\varGamma }^{\omega _1}_{\pi '^n}\). Lemma 1 is still true. Indeed, the length of the blocks \(\{b'_n(l)+1,b'_n(l)+2,\ldots ,a'_n(l+1)-1\}\) is still \(n^5\).

Now let us check the remainder of the proof. The quantity \((1-n^{-2})^{n^{2-\epsilon }}\) goes to 1 as n goes to infinity. Hence if \(l \in \{0,\ldots ,N_2-1 \}\) and \(k_{l(N_1+n^5)+1}=\omega _1\), to get a good payoff between stage \(a'_n(l)\) and stage \(b'_n(l)\), Player 2 should make the state move from \(\omega _1\) to \(\omega _2\) no later than stage \(b'_n(l)\). Thus, he has to play R at least once, and take a risk \(n^{-2}\) of being absorbed in \(1^*\). There are approximately \(n^{2+\epsilon }\) such blocks. Since \((1-n^{-2})^{n^{2+\epsilon }}\) goes to 0 as n goes to infinity, the same proof as before shows that the sequence \((v_{\pi '^n}(\omega _1))_{n \ge 1}\) converges to 1 as n goes to infinity. \(\square \)

5 Uniform approach

To relax the assumption that sequences of weights are decreasing in Corollary 1, the simplest sequences of weights one can imagine are the \(\pi ^{l,n}\) defined in Remark 3: \(\pi ^{l,n}_m:=n^{-1} 1_{l+1 \le m \le l+n}\) for every \(m \in \mathbb {N}^*\). As we have seen, Theorem 3 shows that for any stochastic game,

$$\begin{aligned} \lim _{n \rightarrow +\infty } \sup _{l \in \mathbb {N}} \left\| v_{\pi ^{l,n}}-v_{\infty } \right\| _{\infty }=0. \end{aligned}$$

Is it possible to show the existence of strategies that are approximately optimal in any game \({\varGamma }_{\pi ^{l,n}}\), for any \(l \ge 0\) and n large enough, for both players? We provide an example of an absorbing game where this property does not hold. Thus, no natural extension of Theorem 1 to sequences of weights which are not decreasing seems to exist.

Consider the following absorbing game, introduced by Gillette (1957) under the name of “Big Match”. The state space is \(K= \{\omega ,1^*,0^* \}\), where \(1^*\) (resp. \(0^*\)) is an absorbing state with payoff 1 (resp. 0). Action sets for Player 1 and 2 are respectively \(I= \{T,B \}\) and \(J= \{L,R\}\). The payoff and transition functions in state \(\omega \) are described in Table 3.

Table 3 Transition and payoff functions in state \(\omega \)

Like any finite stochastic game, the Big Match has a uniform value \(v_{\infty }\), and \(v_{\infty }(\omega )=1/2\) (see Sorin 2002, Chapter 5, p. 93). The stationary strategy \(1/2 \cdot L+1/2 \cdot R\) is a 0-optimal uniform strategy for Player 2. Constructing \(\epsilon \)-optimal uniform strategies for Player 1 is trickier (see Blackwell and Ferguson 1968).
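To see why this stationary strategy guarantees 1/2, recall the Big Match payoffs (Table 3 is not reproduced here; the entries below are the standard ones, and they are consistent with the computations in the proof of Proposition 4): in state \(\omega \), the pair (T, L) absorbs in \(1^*\), (T, R) absorbs in \(0^*\), while (B, L) gives stage payoff 0 and (B, R) gives stage payoff 1, the state remaining \(\omega \). Against \(1/2 \cdot L+1/2 \cdot R\), each action of Player 1 then yields an expected payoff of 1/2 at every stage:

$$\begin{aligned} T: \quad \frac{1}{2} \cdot 1+\frac{1}{2} \cdot 0=\frac{1}{2}, \qquad \qquad B: \quad \frac{1}{2} \cdot 0+\frac{1}{2} \cdot 1=\frac{1}{2}, \end{aligned}$$

so that, with these entries, \(\gamma ^{\omega }_{\pi }(\sigma ,1/2 \cdot L+1/2 \cdot R)=1/2\) for every \(\sigma \in {\varSigma }\) and every \(\pi \in {\varDelta }(\mathbb {N}^*)\).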

Now we investigate the general uniform approach in the Big Match.

Definition 6

Let \({\varGamma }\) be a stochastic game, and \(k_1\) the initial state. Player 1 can guarantee uniformly in the general sense \(\alpha \in \mathbb {R}\) in \({\varGamma }^{k_1}\) if for all \(\epsilon >0\), there exists \(\sigma ^* \in {\varSigma }\) and \(N_1 \in {\mathbb {N}}^*\) such that for all \(\tau \in {\mathcal {T}}\), \(n \ge N_1\) and \({\bar{n}} \in {\mathbb {N}}\), we have

$$\begin{aligned} {\mathbb {E}}^{k_1}_{\sigma ^*,\tau }\left( \frac{1}{n}\sum _{m={\bar{n}}+1}^{{\bar{n}}+n} g_m \right) \ge \alpha -\epsilon . \end{aligned}$$
(29)

First, let us explain why Player 1 cannot guarantee uniformly more than 0 in the general sense. Assume the contrary: Player 1 can guarantee uniformly in the general sense some \(\alpha >0\). Let \((N_1,\sigma ^*) \in \mathbb {N}^* \times {\varSigma }\) be associated with \(\epsilon =\alpha /2\) in (29). The stationary strategy \(\alpha /10 \cdot L+(1-\alpha /10) \cdot R\) is denoted by y.

On the one hand, against the strategy y, Player 1 should not play T at any stage of the game. On the other hand, in the infinitely repeated game, with high probability Player 2 will eventually play L at \(N_1\) consecutive stages. At that point, Player 1 does not know whether Player 2 has switched to a pure strategy that plays L at every stage, or whether he still plays y. In the first case, Player 1 should play T at least once during these \(N_1\) stages, but in the second case, he should not play T. Thus, he cannot guarantee a good payoff against both strategies. This provides intuition for the following proposition:

Proposition 4

Player 1 cannot guarantee uniformly more than 0 in the general sense.

Remark 6

This proposition shows in particular that Theorem 2 (Renault and Venel) does not generalize to zero-sum stochastic games, even to absorbing games: the Big Match has no general uniform value.

Proof

We use the same notation as in the discussion above.

For \(n \in \mathbb {N}^*\), let \(A_n\) be the event \(\{\hbox {Player 1 plays } T \hbox { before stage } n \}\), and let \(\overline{A_n}\) be the complement of \(A_n\). The sequence \((\mathbb {P}_{\sigma ^*,y}(A_n))_{n \ge 1}\) is increasing and bounded by 1, therefore it converges to some \(l \in [0,1]\). Let \(N_0 \in \mathbb {N}^*\) such that for all \(n \ge N_0\),

$$\begin{aligned} \mathbb {P}_{\sigma ^*,y}(A_n) \ge l-\alpha /10. \end{aligned}$$

To avoid confusion, in what follows h denotes an element of \(H_{\infty }\), and \({\widetilde{h}}\) denotes the random variable with values in \(H_{\infty }\) describing the infinite history of the game. Let \(n \ge N_0+N_1\). Let \(H^n \subset H_{\infty }\) be the following set:

$$\begin{aligned} \left\{ h \in H_{\infty } | \exists \ a(h) \in \left\{ N_0,\ldots ,n-N_1 \right\} , \ \forall m \in \left\{ a(h)+1,\ldots ,a(h)+N_1\right\} , j_m=L \right\} . \end{aligned}$$

There exists \(N_2 \ge N_0+N_1\) such that

$$\begin{aligned} {\mathbb {P}}_{\sigma ^*,y}\left( {\widetilde{h}} \in H^{N_2}\right) \ge 1/2. \end{aligned}$$
(30)

From now on, we take \(n:=N_2\). We have

$$\begin{aligned} \mathbb {P}_{\sigma ^*,y}(\overline{A_{N_0}} \cap A_{N_2})\ge & {} \mathbb {E}_{\sigma ^*,y}\left( 1_{\left\{ {\widetilde{h}} \in H^n \right\} } \mathbb {P}_{\sigma ^*,y'({\widetilde{h}})}(\overline{A_{N_0}} \cap A_{N_2}) \right) , \end{aligned}$$
(31)

where for \(h \in H^n\), the strategy \(y'(h)\) is the Markov strategy equal to y between stages 1 and a(h), and equal to \(j_m(h)\) for each stage \(m \ge a(h)+1\).

Let \(h \in H^n\). Let us now minorize \(\mathbb {P}_{\sigma ^*,y'(h)}(\overline{A_{N_0}} \cap A_{N_2})\).

If Player 1 plays T at some stage \(m \le a(h)\) against the strategy \(y'(h)\), the game is absorbed in \(1^*\) with probability \(\alpha /10\), and in \(0^*\) with probability \(1-\alpha /10\). Therefore we have

$$\begin{aligned} \mathbb {E}_{\sigma ^*,y'(h)}\left( 1_{A_{N_0}} \frac{1}{N_1} \sum _{m=a(h)+1}^{a(h)+N_1} g_m \right) \le \frac{\alpha }{10}. \end{aligned}$$
(32)

Because \(h \in H^n\), we have

$$\begin{aligned} \mathbb {E}_{\sigma ^*,y'(h)}\left( 1_{\overline{A_{N_2}}} \frac{1}{N_1} \sum _{m=a(h)+1}^{a(h)+N_1} g_m \right) =0. \end{aligned}$$
(33)

Combining (32) and (33), we obtain

$$\begin{aligned} \mathbb {E}_{\sigma ^*,y'(h)} \left( \frac{1}{N_1} \sum _{m=a(h)+1}^{a(h)+N_1} g_m \right)\le & {} \frac{\alpha }{10}+\mathbb {P}_{\sigma ^*,y'(h)}(\overline{A_{N_0}} \cap A_{N_2}), \end{aligned}$$

and by (29),

$$\begin{aligned} \mathbb {P}_{\sigma ^*,y'(h)}(\overline{A_{N_0}} \cap A_{N_2}) \ge \alpha /2 - \alpha /10= 2 \alpha /5. \end{aligned}$$

Plugging the last inequality into (31) and using (30), we deduce that

$$\begin{aligned} \mathbb {P}_{\sigma ^*,y}(\overline{A_{N_0}} \cap A_{N_2})\ge & {} \mathbb {E}_{\sigma ^*,y}\left( 1_{\left\{ {\widetilde{h}} \in H^n \right\} }2 \alpha /5 \right) \\\ge & {} \alpha /5. \end{aligned}$$

Because \(A_{N_0} \subset A_{N_2}\), we have

$$\begin{aligned} \mathbb {P}_{\sigma ^*,y}(\overline{A_{N_0}} \cap A_{N_2})= \mathbb {P}_{\sigma ^*,y}(A_{N_2})-\mathbb {P}_{\sigma ^*,y}(A_{N_0})\le \alpha /10, \end{aligned}$$

thus \(\alpha /10 \ge \alpha /5\), which is a contradiction. \(\square \)