Stochastic Games with Short-Stage Duration

Neyman, Abraham

doi:10.1007/s13235-013-0083-x

Stochastic Games with Short-Stage Duration

Published: 17 May 2013

Volume 3, pages 236–278, (2013)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Dynamic Games and Applications Aims and scope Submit manuscript

Stochastic Games with Short-Stage Duration

Download PDF

Abraham Neyman¹

216 Accesses
17 Citations
Explore all metrics

Abstract

We introduce asymptotic analysis of stochastic games with short-stage duration. The play of stage k, k≥0, of a stochastic game Γ _δ with stage duration δ is interpreted as the play in time kδ≤t<(k+1)δ and, therefore, the average payoff of the n-stage play per unit of time is the sum of the payoffs in the first n stages divided by nδ, and the λ-discounted present value of a payoff g in stage k is λ ^kδ g. We define convergence, strong convergence, and exact convergence of the data of a family (Γ _δ)_δ>0 as the stage duration δ goes to 0, and study the asymptotic behavior of the value, optimal strategies, and equilibrium. The asymptotic analogs of the discounted, limiting-average, and uniform equilibrium payoffs are defined. Convergence implies the existence of an asymptotic discounted equilibrium payoff, strong convergence implies the existence of an asymptotic limiting-average equilibrium payoff, and exact convergence implies the existence of an asymptotic uniform equilibrium payoff.

Stationary Equilibria in Discounted Stochastic Games

Article 05 March 2023

The Asymptotic Value in Finite Stochastic Games

A Folk theorem for stochastic games with finite horizon

Article 17 February 2015

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Most strategic interactions evolve over time, and are often modeled as a discrete-time multistage game. The discrete-time modeling enables us to use the classic theory of extensive form games, which entails no conceptual difficulties. This, however, comes at implicit costs: Players cannot change their actions within a stage and additional information about others’ actions and nature’s moves is obtained only at a discrete set of times. An alternative modeling of dynamic interactions is continuous-time games, which avoids the above mentioned costs, but entails some conceptual difficulties.

The present paper develops a complementary approach that studies the asymptotic behavior of multistage games when the stage duration goes to zero. We focus on the theory of stochastic games.

A discrete-time stochastic game, introduced by Shapley [9], proceeds in stages. The stage payoff is a function g(z,a) of the stage state z and the stage action a, and the transitions to the next state z′ are defined by conditional probabilities P(z′∣z,a) of the next state z′ given the present state z and the stage action a. Players’ stage-action choices are made simultaneously and are observed by all players following the stage play.

Discrete-time stochastic games are multistage game-theoretic models that enable us to account for changes of states between different stages of the interaction, and where the change is impacted by the players’ actions. However, no single discrete-time stochastic game can model the case where the probability of a state change in any short time interval can be positive yet arbitrarily small. This feature can be analyzed by studying continuous-time stochastic games, introduced in [13], and studied in, e.g., [2–4, 8, 13]. An alternative and complementary approach is to study the asymptotic behavior of discrete-time stochastic games, where the individual stage represents short time intervals that converge to zero and the transition probabilities to a new state also converge to zero.

The continuous-time stochastic game model provides us with a tractable analytic model (whose results are neatly stated), but, as mentioned earlier, the model entails some conceptual difficulties. The complementary asymptotic approach builds on the classic discrete-time (well-defined) game model and, therefore, avoids the conceptual issues of continuous-time games. The results of the asymptotic approach supplement and cement the conclusions of the analytic continuous-time model.

We consider a family of discrete-time stochastic games Γ _δ, where the positive parameter δ>0 represents the stage duration. The sets of players N, states S, and actions A are independent of the parameter δ, and the conditional transition probabilities P _δ and the payoff function g _δ depend on the parameter δ. We study the asymptotic behavior of the strategic analysis of Γ _δ as δ goes to zero.

The payoff function g _δ describes the stage payoff in Γ _δ. As the stage duration is δ the stage payoff per unit of time is g _δ/δ. One natural condition, (g.1), on the family of discrete-time stochastic games Γ _δ is that the stage payoff function per unit of time is a function of the current state and action, and independent of δ, i.e., g _δ/δ=g, where $g:S\times A\to\mathbb{R}^{N}$. A less restrictive condition, (g.2), is that the stage payoff function per unit of time converges (as δ goes to zero) to a payoff function $g:S\times A\to\mathbb {R}^{N}$. In the asymptotic results, the distinction between assumptions (g.1) and (g.2) is immaterial.

The transition rates, p _δ, are the functions defined on S×S×A by p _δ(z′,z,a)=P _δ(z′∣z,a) if z′≠z and p _δ(z′,z,a)=P _δ(z′∣z,a)−1 if z′=z. The transition rate p _δ(z′,z,a) represents the difference between the probability that the next state will be z′ and the probability (0 or 1) that the current state is z′ when the current state is z and the current action profile is a. Note that it follows that for every (z,a) the sum of p _δ(z′,z,a) over all states z′ is zero and p _δ(z′,z,a) is nonnegative whenever z′ and z are two distinct states. It is convenient to express our conditions on the conditional transition probabilities P _δ as conditions on the transition rates p _δ.

There are several natural conditions on the transition rates function p _δ, each reflecting a dependence of p _δ on the stage duration parameter δ. One such condition, (p.1), is that the transition rates per unit of time is constant, i.e., for each δ>0, p _δ/δ=μ, where $\mu:S\times S\times A\to\mathbb{R}$. A weaker asymptotic condition, (p.2), called convergence, is that the equality with μ holds in the limit, i.e., for all triples (z′,z,a) of states z′,z and action profile a, p _δ(z′,z,a)/δ converges (as δ goes to zero) to a limit μ(z′,z,a). Condition (p.3), called strong convergence, requires that condition (p.2) hold and that p _δ(z′,z,a)>0 if and only if μ(z′,z,a)>0. Condition (p.1) implies condition (p.3) and condition (p.3) implies condition (p.2).

An exact family of discrete-time stochastic games Γ _δ is one that obeys (g.1) and (p.1). A family of discrete-time stochastic games Γ _δ is said to converge in data if it obeys (g.2) and (p.2), and it is said to converge strongly if it obeys (g.2) and (p.3).

The above-mentioned convergence conditions on a family (Γ _δ)_δ>0 are stated as conditions on the data of the games in the family. The data convergence condition seems natural and, therefore, the study of the asymptotic behavior of equilibria of a data-convergent family is of interest. However, one may wonder if the strategic dynamics of some other families of games that do not converge in data have a limit and, therefore, such families deserve an asymptotic analysis as well. This leads us to the study of convergence conditions on a family (Γ _δ)_δ>0 that depend on the stochastic processes of payoffs and states that are defined by the initial state and a strategy profile σ, in particular, when the strategy profile σ is stationary. This leads to our definition of stationary convergence. Roughly speaking, stationary convergence states that for every stationary strategy profile σ and real time t, both the cumulative payoff (in Γ _δ) up to time t and the distribution of the state at time t converge as the stage duration δ goes to zero.

Proposition 1 asserts that stationary convergence is equivalent to data convergence. This result shows that the continuous-time model (see, e.g., [8]) captures all possible limits of “nicely behaved” families of discrete-time stochastic games with short-stage duration.

Data (or its equivalent stationary) convergence is sufficient for our asymptotic results (e.g., Theorems 1 and 8) on the stationary (as well as the nonstationary) discounted games. In these results, we associate with a discount rate ρ and a stage duration δ the discount factor 1−ρδ. These results remain intact if the (δ,ρ)-dependent discount factor λ _δ,ρ is such that the limit, as δ goes to zero, of (1−λ _δ,ρ)/δ exists and equals ρ. For example, λ _δ,ρ=e ^−ρδ.

The unnormalized ρ-discounted payoff of a play (z ₀,a ₀,z ₁,…) of the game Γ _δ is $\sum_{m =0}^{\infty}(1-\rho \delta )^{m}g_{\delta}(z_{m} ,a_{m} )$. The corresponding ρ-discounted game is denoted by Γ _δ,ρ. In the two-person zero-sum case, Sect. 4.1 shows that, given a converging family (Γ _δ)_δ>0 of two-person zero-sum games, (1) the value of Γ _δ,ρ, denoted by V _δ,ρ, converges as δ goes to zero, and (2) there is a stationary strategy σ that is ε(δ)-optimal in the game Γ _δ,ρ, where ε(δ) goes to zero as δ goes to zero.

An asymptotic ρ-discounted stationary equilibrium strategy of the family (Γ _δ)_δ>0 of non-zero-sum stochastic games is a profile σ of stationary strategies that is an ε(δ)-equilibrium of Γ _δ, where ε(δ)→0 as δ goes to zero. In the discounted non-zero-sum case, we prove (Theorem 8) that (for every ρ>0) a converging family has an asymptotic ρ-discounted stationary equilibrium strategy.

The average (per unit of time) payoff to player i up to time s (in the game Γ _δ) is $g^{i}_{\delta}(s):=\frac{1}{s}\sum_{0\leq m<s/\delta}g^{i}_{\delta}(z_{m} ,a_{m} )$, where $g^{i}_{\delta}$ is the ith coordinate of g _δ. The lim inf, respectively lim sup, game Γ _δ is the game where the payoff to player i is $\underline{g}^{i}_{\delta}:=\liminf_{s\to\infty}g^{i}_{\delta}(s)$, respectively $\bar{g}^{i}_{\delta} :=\limsup_{s\to\infty}g^{i}_{\delta}(s)$. The limiting-average value or equilibrium payoff is a payoff v such that for every ε>0, there is a strategy profile such that (1) for every player i, his payoff in the lim inf game is at least v ⁱ−ε, and (2) every unilateral deviation of player i results in a payoff to him in the lim sup game of no more than v ⁱ+ε.

For every δ>0, v _δ,ρ:=ρV _δ,ρ converges to a limit (denoted by v _δ,0) as ρ→0+ [1]. The limit v _δ,0 is the uniform and limiting-average value of Γ _δ [5]. Convergence in data is not sufficient to guarantee the convergence of v _δ,0 as δ goes to zero (Remark 10). Strong convergence implies that v _δ,ρ converges as δ goes to zero uniformly in ρ (Theorem 2) and, therefore, v _δ,0 converges as δ goes to zero.

A family (Γ _δ)_δ>0 of two-person zero-sum stochastic games has an asymptotic limiting-average value v if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2 and a duration δ ₀>0, such that for every 0<δ<δ ₀, strategy σ of player 1, and strategy τ of player 2, $\varepsilon +E^{z}_{\sigma _{\delta},\tau} \underline{g}_{\delta}\geq v(z)\geq -\varepsilon + E^{z}_{\sigma ,\tau_{\delta}} \bar{g}_{\delta}$.

A family (Γ _δ)_δ>0 of non-zero-sum stochastic games has an asymptotic limiting-average equilibrium payoff v if for every ε>0 there are strategy profiles σ _δ and a duration δ ₀>0, such that for every 0<δ<δ ₀, player i, and strategy τ ⁱ of player i,

$$\varepsilon +E^z_{\sigma _{\delta}} \underline{g}^i_{\delta} \geq v^i(z)\geq -\varepsilon + E^z_{\sigma ^{-i}_{\delta},\tau^i} \bar{g}^i_{\delta}(s). $$

A family (Γ _δ)_δ>0 that converges strongly has an asymptotic limiting-average value in the zero-sum case (Theorem 4), and an asymptotic limiting-average equilibrium payoff in the non-zero-sum case (Theorem 11).

A family (Γ _δ)_δ>0 of two-person zero-sum stochastic games has an asymptotic uniform value v if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2, a duration δ ₀>0, and a time s ₀>0, such that for every 0<δ<δ ₀, s>s ₀, strategy σ of player 1, and strategy τ of player 2, $\varepsilon +E^{z}_{\sigma _{\delta},\tau} g_{\delta}(s)\geq v(z)\geq-\varepsilon + E^{z}_{\sigma ,\tau _{\delta}} g_{\delta}(s)$.

A family (Γ _δ)_δ>0 of non-zero-sum stochastic games has an asymptotic uniform equilibrium payoff v if for every ε>0 there are strategy profiles σ _δ, a duration δ ₀>0, and a time s ₀>0, such that for every 0<δ<δ ₀, s>s ₀, player i, and strategy τ ⁱ of player i,

$$\varepsilon +E^z_{\sigma _{\delta}} g^i_{\delta}(s)\geq v^i(z)\geq-\varepsilon + E^z_{\sigma ^{-i}_{\delta},\tau^i} g^i_{\delta}(s). $$

An exact family of games Γ _δ has an asymptotic uniform value in the zero-sum case (Theorem 6), and an asymptotic uniform equilibrium payoff in the non-zero-sum case (Theorem 12).

2 The Model and Results

Throughout the paper, the set of players N, the set of states S, and the set of actions A, are finite. The set of feasible actions may depend on the state z∈S. We denote by A ⁱ(z) the set of actions of player i∈N in state z∈S. A(z) is the set of action profiles at state z, A(z)=×_i∈N A ⁱ(z). For notational convenience, we set $\mathcal{A}=\{(z,a): z\in S,\ a\in A(z)\}$.

The data of the stochastic game Γ _δ that depend on the parameter δ are the $\mathbb{R}^{N}$-valued payoff function g _δ that is defined on $\mathcal{A}$ and the conditional probabilities P _δ(z′∣z,a) that are defined for all z′∈S and $(z,a)\in \mathcal{A}$. The payoff function g _δ defines the stage payoff $g_{\delta} (z,a)\in\mathbb{R}^{N}$ as a function of the stage state z and the stage action profile a. The ith coordinate of a vector $g\in \mathbb {R}^{N}$ is denoted by g ⁱ. The conditional probabilities P _δ(z′∣z,a) specify the conditional probability of the next state being z′ conditional on playing the action profile a at the current state z.

The conditional probabilities P _δ(z′∣z,a) obey P _δ(z′∣z,a)≥0 and ∑_z′∈S P _δ(z′∣z,a)=1. We describe the conditional probabilities by specifying the function p _δ(z′,z,a) that is defined on $S\times \mathcal{A}$ by p _δ(z′,z,a)=P _δ(z′∣z,a) if z′≠z and p _δ(z′,z,a)=P _δ(z′∣z,a)−1 if z′=z. Obviously, p _δ(z′,z,a)≥0 if z′≠z, p(z,z,a)≥−1, and ∑_z′∈S p _δ(z′,z,a)=0.

The set H of plays of Γ _δ is the set of all sequences h=(z ₀,a ₀,…,z _k,a _k,…) with $(z_{k},a_{k})\in \mathcal{A}$. The events are the elements of the minimal σ-algebra $\mathcal{H}$ of subsets of H for which each one of the maps $H\ni h=(z_{0},a_{0},\ldots )\mapsto(z_{k},a_{k})\in \mathcal{A}$, k≥0, is measurable. We denote by $\mathcal{H}_{k}$ the σ-algebra generated by (z ₀,a ₀,…,z _k).

The set of strategies in the stochastic game Γ _δ is independent of δ. The transition probabilities, however, do depend on δ. For every strategy profile σ=(σ ⁱ)_i∈N we denote by $P^{z}_{\delta,\sigma }$ the probability distribution defined by the transition probabilities of the game Γ _δ, the initial state z ₀=z, and the strategy profile σ, on the measurable space $(H,\mathcal{H})$ of plays. The expectation with respect to the probability $P^{z}_{\delta,\sigma }$ is denoted by $E^{z}_{\delta,\sigma }$. The parameter δ that appears in the probability and expectation above is formally needed as the transition probabilities depend on δ. However, wherever there is an implicit reference to the parameter δ, we suppress (the formally needed) δ; e.g., we write $E^{z}_{\sigma _{\delta} }$, for short, instead of the more explicit $E^{z}_{\delta,\sigma _{\delta}}$.

2.1 The Discounted Games

Given a discount factor 0<λ<1, the discrete-time stochastic game Γ with a discount factor λ is the game where the (unnormalized) valuation of the stream of payoffs (g _m=g(z _m,a _m))_m≥0 is $\sum_{m=0}^{\infty }\lambda^{m} g_{m}$. The normalized valuation is the unnormalized one times 1−λ. The generalization to the case of individual discount factors is straightforward. Given a vector $\vec {\lambda}=(\lambda _{i})_{i\in N}$ of discount factors, the game with discount factors $\vec {\lambda}$ is the game where the unnormalized (respectively, normalized) valuation of player i of the stream of vector payoffs (g _m)_m≥0 is $\sum_{m=0}^{\infty}\lambda_{i}^{m}g^{i}_{m}$ (respectively, $(1-\lambda_{i})\sum_{m=0}^{\infty}\lambda_{i}^{m}g^{i}_{m}$).

We study the family of discrete-time stochastic games Γ _δ with discount factors λ _δ that depend on the stage duration parameter δ. We require that the limit, as δ goes to zero, of the valuation of a unit payoff per unit of time (i.e., g _δ=δ for all δ>0) with the discount factor λ _δ, exist. This requirement is equivalent to the existence of the limit of $\frac{1-\lambda_{\delta}}{\delta}$ as δ goes to zero. A family of δ-dependent discount factors λ _δ is called admissible if $\lim_{\delta\to0+}\frac{1-\lambda_{\delta}}{\delta}$ exists. The limit is called the asymptotic discount rate (and is equal to $\lim_{\delta\to0+}\frac{-\ln\lambda_{\delta}}{\delta }$). Two examples of admissible δ-dependent discount factors, with asymptotic discount rate ρ>0, are λ _δ=e ^−ρδ and λ _δ=1−ρδ.

A family of δ-dependent discount factors, λ _δ, is admissible and has an asymptotic discount rate ρ>0, if and only if for all streams x _δ=(g _δ,0,g _δ,1,…) of payoffs, with uniformly bounded payoffs per unit of time (i.e., |g _δ,m|≤Cδ), the difference between the valuation of x _δ according to the discount factors λ _δ and its valuation according to the discount factors e ^−ρδ goes to zero as δ goes to zero.

Our asymptotic results on the δ-dependent discounted games depend only on the asymptotic discount rate ρ (and not on the exact choice of the δ-dependent discount factor with asymptotic discount rate ρ). Therefore, it suffices to select, for each ρ>0, an admissible family of δ-dependent discount factors λ _δ,ρ with asymptotic discount rate ρ. Our choice of the δ-dependent discount factor with asymptotic discount rate ρ is λ _δ,ρ=1−ρδ. This simplifies some parts of the presentation.

The ρ-discounted game, denoted by Γ _δ,ρ, is the game Γ _δ with discount factor 1−ρδ. In the zero-sum case, we say that the family (Γ _δ)_δ>0 of two-person zero-sum games^{Footnote 1} has an asymptotic ρ-discounted value V _ρ if the values of Γ _δ,ρ, denoted by V _δ,ρ, converge to V _ρ as δ goes to zero. Theorem 1 asserts that a family (Γ _δ)_δ>0 that converges in data has an asymptotic ρ-discounted value. In addition, it provides a system of S equations that has a unique solution, which equals V _ρ, and proves the existence of a (δ-independent) stationary strategy that is ε(δ)-optimal in Γ _δ,ρ, where ε(δ)→0 as δ goes to zero. In the non-zero-sum case, Theorem 8 asserts that a family (Γ _δ)_δ>0 that converges in data has a (δ-independent) stationary strategy that is an ε(δ)-equilibrium of Γ _δ,ρ, where ε(δ)→0 as δ goes to zero.

Section 4.1 notes that the map ρ↦V _ρ is semialgebraic and bounded and, therefore, $v_{\rho}:=\rho V_{\rho}=\sum_{k=0}^{\infty} c_{k}(z)\rho^{k/M}$ in a right neighborhood of zero. This fact, in conjunction with the covariance properties of v _ρ as a function of (g,μ) (see Sect. 4.1), is used in the study of the asymptotic uniform value (see Sect. 4.5). It shows that for an exact family (Γ _δ)_δ>0 there is an integrable function $\psi: [0,1]\to\mathbb{R}_{+}$ and δ ₀>0 such that $\|\rho V_{\delta,\rho}-\rho' V_{\delta,\rho'}\|\leq\int_{\rho} ^{\rho'} \psi(x)\,dx$ for 0<ρ<ρ′≤1 and δ≤δ ₀.

The covariance properties (in conjunction with [10, Theorem 6]) are used in the proof of Theorem 2 that asserts that if Γ _δ converges strongly, then v _δ,ρ (:=ρV _δ,ρ) converges, as δ goes to zero, uniformly on 0<ρ<1.

2.2 The Nonstationary Discounted Games

A time-separable valuation u of streams of payoff is represented by a positive measure w on the nonnegative integers. It is given by the valuation function $u_{w}(g_{0},g_{1},\ldots)=\sum_{m=0}^{\infty} w(m)g_{m}$. The valuation function u _w is (well) defined over all bounded streams (g ₀,g ₁,…) of payoffs. The valuation u _w is normalized if the total mass of w equals 1, i.e., $\sum_{m=0}^{\infty} w(m)=1$. The generalization to the case of individual time-separable valuations is straightforward. Given a vector $\vec {w}=(w^{i})_{i\in N}$ of positive measures on the nonnegative integers the game with valuation $u_{\vec {w}}$ is the game where the valuation of player i of the stream of vector payoffs (g _m)_m≥0 is $\sum_{m=0}^{\infty} w^{i}(m)g^{i}_{m}$. The discrete-time stochastic game Γ with the valuation $u_{\vec {w}}$ is denoted by $\varGamma_{\vec {w}}$.

The set of all probability measures on a set ∗ is denoted by Δ(∗). As A ⁱ(z) is finite, the set X ⁱ(z):=Δ(A ⁱ(z)) is a compact subset of a Euclidean space. The set of profiles of Markovian strategies in a discrete-time stochastic game is identified with the Cartesian product $\times_{(i,z,n)\in N\times S\times\mathbb {N}}X^{i}(z)$, which is a compact space in the product topology. Let Γ be a discrete-time stochastic game (with finitely many states and actions). A profile σ of Markovian strategies is an equilibrium of $\varGamma_{\vec {w}}$ whenever: (1) for every $k\in \mathbb {N}$, $\vec {w}_{k}$ is a vector of positive measures on the nonnegative integers, (2) for every $k\in\mathbb{N}$, σ(k) is a profile of Markovian strategies that is an equilibrium of $\varGamma _{\vec {w}_{k}}$, (3) σ(k)→_k→∞ σ (in the product topology), and (4) for every i∈N, $\sum_{m=0}^{\infty} |w^{i}_{k}(m)-w^{i}(m)|\to_{k\to\infty}0$.

By backward induction, if $\vec {w}$ has finite support, the game $\varGamma_{\vec {w}}$ has an equilibrium in Markovian strategies. Therefore, the above-mentioned comment implies that a discrete-time stochastic game with individual time-separable evaluations has an equilibrium in Markovian strategies. The discrete-time stochastic game Γ _δ with the individual time-separable valuation $\vec {w}_{\delta}$ is denoted by $\varGamma_{\delta,\vec {w}_{\delta}}$. In this game, the payoff to player i of a play (z ₀,a ₀,…) is $g^{i}_{\delta}(w^{i}_{\delta} ):=\sum_{m=0}^{\infty} w^{i}_{\delta}(m) g^{i}_{\delta}(z_{m},a_{m})$. The discrete-time stochastic game Γ _δ with the common time-separable valuation w _δ, denoted by $\varGamma_{\delta, w_{\delta}}$, is the game $\varGamma_{\delta,\vec {w}_{\delta}}$ with ${w}^{i}_{\delta}=w_{\delta}$ for every player i.

If $\vec {w}=(w^{i})_{i\in N}$ is a profile of nonnegative measures on [0,∞], we say that the vector $\vec {w}_{\delta}=(w^{i}_{\delta})_{i\in N}$ of N measures on $\mathbb{N}\cup \{ \infty\}$ converges (as δ→0+) to $\vec {w}$ if (1) $\vec {w}_{\delta}(\mathbb{N}\cup\{\infty\})$ converges (as δ goes to 0) to $\vec {w}([0,\infty])$, and (2) for every 0≤t<∞ there is a family of nonnegative integers m _δ with δm _δ→_δ→0+ t, and such that $\sum_{m=0}^{m_{\delta}} \vec {w}_{\delta}(m)\to_{\delta\to 0+}\vec {w}([0,t])$. Note that by identifying the N-vector measure $\vec {w}_{\delta}$ with the N-vector measure $\vec {w}'_{\delta}$ on [0,∞] (the one-point compactification of [0,∞)) that is supported on {δm:m≥0}∪{∞} and satisfies $\vec {w}'_{\delta}([\delta m, \delta(m+1)))=\vec {w}_{\delta}(m)$ and $\vec {w}'_{\delta}(\infty )=\vec {w}_{\delta}(\infty)$, our definition of convergence here is equivalent to w ^∗ convergence of measures on compact spaces. Explicitly, $\vec {w}_{\delta}$ converges as δ→0+ to the N-vector measure $\vec {w}$ on [0,∞] if for every continuous function f on [0,∞], $\int_{[0,\infty]} f(x) \, d\vec {w}'_{\delta}(x)$ (which equals $f(\infty )\vec {w}_{\delta}(\infty)+\sum_{m=0}^{\infty} f(\delta m)\vec {w}_{\delta}( m)$) converges as δ→0+ to $\int_{[0,\infty ]} f(x) \,d\vec {w}(x)$.

In this section, we focus on the case that $\vec {w}_{\delta}$ is supported on $\mathbb{N}$ and $\vec {w}$ is supported on [0,∞). The more general convergence definition (above) is used in subsequent parts of the paper.

Of special interest are the nonstationary discounting valuations and their limits. In the discrete-time model, the nonnegative measure w on $\mathbb{N}\cup\{\infty\}$ is called a nonstationary discounting valuation (measure) if w(m)≥w(m+1). The vector measure ${\vec {w}}$ is said to be nonstationary discounting if each of its components w ⁱ is a nonstationary discounting. A nonnegative measure w on [0,∞] is said to be nonstationary discounting if for every s>0 the function [0,∞)∋t↦w([t,t+s)) is nonincreasing in t. Note that if the family of nonstationary discounting measures w _δ on $\mathbb{N}$ converges to the nonnegative measure w on [0,∞], then w is a nonstationary discounting measure.

Let $\vec {w}$ be a nonstationary discounting N-vector measure on [0,∞). We say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic $\vec {w}$ equilibrium payoff of the family of N-person games (Γ _δ)_δ>0, if for every ε>0 and a family of nonstationary discounting N-vector measures $\vec {w}_{\delta}$ on $\mathbb{N}$ that converges to $\vec {w}$, v is an ε-equilibrium payoff of $\varGamma _{\delta,\vec {w}_{\delta}}$ for every δ>0 sufficiently small.

Let w be a nonstationary discounting measure on [0,∞). We say that $v\in\mathbb{R}^{S}$ is an asymptotic w value of the family of two-person zero-sum games (Γ _δ)_δ>0, if for every ε>0 and a family of nonstationary discounting measures w _δ on $\mathbb{N}$ that converges to w, the value v _δ of $\varGamma _{\delta, w_{\delta}}$ satisfies |v _δ(z)−v(z)|<ε for every δ>0 sufficiently small and state z. Note that $v\in\mathbb {R}^{S}$ is an asymptotic w value of the family of two-person zero-sum games (Γ _δ)_δ>0 if and only if (v,−v) is an asymptotic (w,w) equilibrium payoff of (Γ _δ)_δ>0.

Theorem 9 asserts (in particular) that if (Γ _δ)_δ>0 converges in data, then for every nonstationary discounting N-vector measure $\vec {w}$ on [0,∞) the family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$ equilibrium payoff. In addition, if the nonstationary discounting N-vector measure $\vec {w}_{\delta} $ converges (as δ goes to 0) to the N-vector measure $\vec {w}$ on [0,∞), then for every ε>0 there is δ ₀>0 and a family of Markovian strategy profiles σ _δ, such that (1) for 0<δ<δ ₀, σ _δ is an ε-equilibrium of $\varGamma_{\delta,\vec {w}_{\delta}}$ and its corresponding payoff is within ε of an asymptotic $\vec {w}$ equilibrium payoff v, and (2) σ _δ converges to a profile of continuous-time Markov strategies.^{Footnote 2} In Sect. 3.2, we define the convergence of Markovian strategies.

Theorem 9 implies in particular that a finite-horizon continuous-time stochastic game has an ε-equilibrium in Markov strategies. Reference [4] shows that a finite-horizon continuous-time stochastic game need not have an equilibrium in Markov strategies. Therefore, it is impossible to require (in the additional part) that σ _δ be an equilibrium (rather than an ε-equilibrium) of $\varGamma_{\delta,\vec {w}_{\delta}}$ and at the same time converge to a profile of continuous-time Markov strategies.

In several dynamic interactions, the game payoff is composed of stage payoffs and a terminal payoff. Such games are also useful in backward induction arguments. For example, in order to find an equilibrium (or an approximate equilibrium) of an extensive form game, a classical procedure is to replace a subgame of the game with a terminal node whose payoff equals an equilibrium (or approximate equilibrium) payoff of the subgame. An equilibrium (or approximate equilibrium) of the original game is obtained by patching together an equilibrium (or an approximate equilibrium) of the truncated game with an equilibrium (or approximate equilibrium) of the subgame. This motivates the definition of the following useful family of games.

Let $\vec {w}_{\delta}=(w_{\delta}^{i})_{i\in N}$ be a vector of positive measures on $\mathbb{N}$, m _δ>0, and let $\nu_{\delta} =(\nu_{\delta}^{i})_{i\in N}$ be a vector of N payoff functions $\nu _{\delta}^{i}: \mathcal{A}\to\mathbb{R}$. The game $\varGamma_{\delta ,\vec {w}_{\delta}}^{m_{\delta},\nu_{\delta}}$ is the game Γ _δ where the valuation of player i of the play (z ₀,a ₀,z ₁,…) is the sum of two terms: $\nu^{i}_{\delta }(z_{m_{\delta}},a_{m_{\delta}}) +\sum_{m=0}^{\infty} w_{\delta} ^{i}(m)g_{\delta} ^{i}(z_{m},a_{m})$. The first term accounts for a one-time (e.g., terminal) payoff. This variation enables us to view games like soccer, where the objective is to reach the best score at the end of the game, as stochastic games.

We say that (m _δ,ν _δ) converges to (t,ν), where 0≤t<∞ and $\nu: \mathcal{A}\to\mathbb{R}^{N}$, if (1) ν _δ(z,a) converges to ν(z,a) for all $(z,a)\in\mathcal{A}$, and (2) δm _δ converges to t as δ goes to zero.

Let $\vec {w}$ be a nonstationary discounting N-vector measure on [0,∞), 0≤t<∞, and $\nu: \mathcal{A}\to \mathbb{R}^{N}$. The N×S payoff vector $v\in\mathbb {R}^{N\times S}$ is called an asymptotic $(\vec {w},t,\nu)$ equilibrium payoff of the family (Γ _δ)_δ>0, if for every (1) family of nonstationary discounting N-vector measure $\vec {w}_{\delta}$ on $\mathbb{N}$ that converges (as δ goes to 0) to $\vec {w}$, (2) $m_{\delta}\in\mathbb{N}$ and $\nu_{\delta}: \mathcal{A}\to\mathbb{R}^{N}$ such that (m _δ,ν _δ) converges to (t,ν), and (3) ε>0, there is δ ₀>0, such that for 0<δ<δ ₀, $\varGamma_{\delta ,\vec {w}_{\delta}}^{m_{\delta},\nu_{\delta}}$ has an ε-equilibrium payoff within ε of v.

Theorem 9 asserts if (1) $\vec {w}$ is a nonstationary discounting N-vector measure on [0,∞), (2) 0≤t<∞, and (3) $\nu: \mathcal{A}\to\mathbb{R}^{N}$, then a family (Γ _δ)_δ>0 that converges in data has an asymptotic $(\vec {w},t,\nu)$ equilibrium payoff. In addition, if (1) $\vec {w}_{\delta}$ is a nonstationary discounting N-vector measure on $\mathbb{N}$ that converges (as δ goes to 0) to $\vec {w}$, and (2) $m_{\delta}\in\mathbb{N}$ and $\nu _{\delta}: \mathcal{A}\to\mathbb{R}^{N}$ are such that (m _δ,ν _δ) converges to (t,ν), then for every ε>0 there are (1) δ ₀>0, (2) Markov strategy profiles σ _δ, and (3) a continuous-time Markov strategy profile σ, such that (1) for 0<δ<δ ₀, σ _δ is a ε-equilibrium of $\varGamma _{\delta ,\vec {w}_{\delta}}^{m_{\delta},\nu_{\delta}}$ with a payoff within ε of an asymptotic $(\vec {w},t,\nu)$ equilibrium payoff v, and (2) the Markov strategy profiles σ _δ converge w ^∗ to σ.

2.3 The Limiting-Average Games

The classic limiting-average valuation of a stream (g ₀,g ₁,…) of payoffs is the limit of the average payoff per stage, $\lim_{n\to \infty} \frac{1}{n}\sum_{0\leq m<n}g_{m}$, if the limit exists. The interpretation is that the stage duration is one unit of time, and therefore the average $\frac{1}{n}\sum_{0\leq m<n}g_{m}$ represents the average payoff per unit of time. In studying the limiting-average valuation of streams (g _δ,0,g _δ,1,…) of payoffs in Γ _δ, one has to take into account that the stage duration is δ. Therefore, the average payoff per unit of time up to time s is $(g^{i}_{\delta}(s))_{i\in N}=g_{\delta}(s)$ ($=\frac{1}{s}\sum_{m: 0\leq m\delta<s}g_{\delta,m}$). In the two-person zero-sum case, the set of players is N={1,2} and we write g for g ¹ and g _δ for $g^{1}_{\delta}$. No confusion should result.

The averages $g^{i}_{\delta}(s)$ need not converge as s goes to infinity. Therefore, in defining the limiting-average (value or) equilibrium payoff v=(v ⁱ)_i∈N, we require that for every ε>0 the (ε-optimal or) ε-equilibrium strategy result in a distribution on streams of payoffs such that the expectation of $\underline{g}^{i}_{\delta}$ ($=\liminf_{\delta\to0+}g^{i}_{\delta}(s)$) is within ε of v, and no unilateral deviation by a player, say player i, can result in a distribution on streams of payoffs with an expectation of $\bar{g}^{i}_{\delta}$ ($=\limsup g^{i}_{\delta}(s)$) greater than v ⁱ+ε.

Note that if w _δ,s is the probability measure on $\mathbb{N}$ with w _δ,s(m)=1/⌈s/δ⌉ (where ⌈⌉ denotes the smallest positive integer that is ≥) if mδ<s and w _δ,s(m)=0 otherwise, then $g^{i}_{\delta}(s)=g^{i}_{\delta} (w_{\delta,s})$. For each δ>0, the probability measures w _δ,s, s>0, are the extreme points of the convex set $M^{1}_{d}(\mathbb{N})$ of nonstationary discounting probability measures w _δ on $\mathbb {N}$. Indeed, $w_{\delta}=\sum_{m=1}^{\infty}(w_{\delta}(m-1)-w_{\delta}(m))m w_{\delta, m\delta}$ and $\sum_{m=1}^{\infty}(w_{\delta} (m-1)-w_{\delta} (m))m =1$. As $\sum_{m=1}^{k} (w_{\delta}(m-1)-w_{\delta}(m))m\leq w_{\delta} (0)k^{2}\to_{w_{\delta}(0)\to0+}0$, we deduce the following (known) property of the lim inf valuation $\underline{g}^{i}_{\delta}$ and the lim sup valuation $\bar{g}^{i}_{\delta}$:

A two-person zero-sum discrete-time stochastic game (with finitely many states and actions) has a limiting-average value [5]. However, this does not imply that a convergent family (Γ _δ)_δ>0 has an asymptotic limiting-average value. A non-zero-sum discrete-time stochastic game (with finitely many states and actions) has a limiting-average correlated equilibrium payoff [11], but it is unknown if it has a limiting-average equilibrium payoff.

Recall that $v\in\mathbb{R}^{S}$ is an asymptotic limiting-average value of the family (Γ _δ)_δ>0 if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2 and a duration δ ₀>0, such that for every strategy τ of player 2, strategy σ of player 1, and 0<δ<δ ₀, we have

$$\varepsilon +E^z_{\sigma _{\delta},\tau}\underline{g}_{\delta}\geq v(z)\geq - \varepsilon +E^z_{\sigma ,\tau_{\delta}}\bar{g}_{\delta}. $$

The definition implies that a family (Γ _δ)_δ>0 has at most one asymptotic limiting-average value.

Recall that $v\in\mathbb{R}^{N\times S}$ is an asymptotic limiting-average equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are strategy profiles σ _δ and a duration δ ₀>0, such that for every strategy τ ⁱ of player i and every 0<δ<δ ₀, we have

$$\varepsilon + E^z_{\sigma _{\delta}}\underline{g}^i_{\delta} \geq v^i(z)\geq -\varepsilon +E^z_{\sigma ^{-i}_{\delta},\tau^i} \bar{g}^i_{\delta}. $$

We prove that a family (Γ _δ)_δ>0 that converges strongly has an asymptotic limiting-average value in the zero-sum case (Theorem 4), and an asymptotic limiting-average equilibrium payoff in the non-zero-sum case (Theorem 11).

A variation of the limiting-average value, respectively, limiting-average equilibrium payoff, is the weak limiting-average value, respectively weak limiting-average equilibrium payoff, obtained by exchanging the order of the limiting and the expectation operations. Therefore, we say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic weak limiting-average equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are strategy profiles σ _δ and a duration δ ₀>0, such that for every strategy τ ⁱ of player i and every 0<δ<δ ₀, we have

$$\varepsilon +\liminf_{s\to\infty}E^z_{\sigma _{\delta}}g^i_{\delta}(s) \geq v^i(z)\geq -\varepsilon +\limsup_{s\to\infty}E^z_{\sigma ^{-i}_{\delta},\tau ^i}g^i_{\delta}(s). $$

In the general model of repeated games (which includes repeated games with incomplete information), the existence of a limiting-average (value or) equilibrium payoff implies the existence of a weak limiting-average (value or) equilibrium payoff, but not vice versa. In the game models studied in the present paper, all results that we can prove regarding the weak limiting-value hold also for the limiting-average value. Therefore, no special consideration is given to these weaker concepts. It should be noted, however, that in the analogous study of the general model of repeated games, in particular, in repeated games with incomplete information, the limiting-average value, or equilibrium payoff will typically not exist, while the weak limiting-average value and equilibrium payoff may exist in some of these models.

2.4 The Mixed Discounting and Limiting-Average Games

The mixed time-separable and the limiting-average (respectively, the weak limiting-average) valuation of payoffs is a positive linear combination of a time-separable valuation u _w and the limiting-average (respectively, the weak limiting-average) valuation. It is represented by a measure w on $\mathbb{N}\cup\{\infty\}$, where w(∞) represents the weight given to the limiting-average (or weak limiting-average) valuation, and w(m) represents the weight of the payoff at stage $m\in\mathbb{N}$. A normalized mixed time-separable and limiting-average (or weak limiting-average) valuation of payoffs is a convex combination of a normalized time-separable valuation u _w and the limiting-average (or the weak limiting-average) valuation, and is represented by a probability measure on $\mathbb{N}\cup\{\infty\}$.

Let $\vec {w}_{\delta}=(w^{i})_{i\in N}$ be a vector of positive measures on $\mathbb{N}\cup\{\infty\}$, m _δ>0, and let $\nu _{\delta}=(\nu_{\delta}^{i})_{i\in N}$ be a vector of N payoff functions $\nu^{i}_{\delta}: \mathcal{A}\to\mathbb{R}$. The game $\varGamma _{\delta ,\vec {w}_{\delta}}^{m_{\delta},\nu_{\delta}}$ is the game Γ _δ where the valuation of player i of the play (z ₀,a ₀,z ₁,…) is the sum of three terms

$$\nu^i_{\delta}(z_{m_{\delta}},a_{m_{\delta}})+ w_{\delta}^i(\infty)\lim_{s\to \infty}g_{\delta}^i(s)+ \sum_{m=0}^{\infty} w_{\delta}^i(m)g_{\delta}^i(z_m,a_m), $$

if the limit exists.

The limit of $g_{\delta}^{i}(s)$ as s→∞ need not exist. Therefore, in defining (the value or) an equilibrium payoff v of $\varGamma_{\delta,\vec {w}_{\delta}}^{m_{\delta},\nu _{\delta}}$, we require that for every ε>0 the (ε-optimal or) ε-equilibrium strategy result in a distribution on plays such that the expectation of the $\nu^{i}_{\delta}(z_{m_{\delta}},a_{m_{\delta}})+ w_{\delta}^{i}(\infty )\underline{g}_{\delta}^{i}+\sum_{m=0}^{\infty} w_{\delta}^{i}(m)g_{\delta} ^{i}(z_{m},a_{m}) $ is within ε of v ⁱ, and no unilateral deviation by a player, say player i, can result in a distribution on plays with an expectation of $\nu^{i}_{\delta}(z_{m_{\delta}},a_{m_{\delta}})+ w_{\delta}^{i}(\infty )\bar {g}_{\delta}^{i}+\sum_{m=0}^{\infty} w_{\delta}^{i}(m)g_{\delta}^{i}(z_{m},a_{m}) $ greater than v ⁱ+ε.

Theorem 13 asserts that if (1) (Γ _δ)_δ>0 is an exact family, (2) the nonstationary discounting N-vector measure $\vec {w}_{\delta}$ converges (as δ goes to 0) to the N-vector measure $\vec {w}$ on [0,∞], and (3) (m _δ,ν _δ) converges to (t,ν), then for every ε>0 there are strategy profiles σ _δ, an N×S vector v, and δ ₀>0, such that for 0<δ<δ ₀, σ _δ is an ε-equilibrium of $\varGamma_{\delta,\vec {w}_{\delta} }^{m_{\delta} ,\nu_{\delta}}$ with a payoff within ε of v.

2.5 The Uniform Games

In a uniform (value or) equilibrium payoff v, we require that for every ε>0 there be a time s ₀ and a strategy profile for which for every s>s ₀ the expectation of g _δ(s) is within ε of v, and that there be no unilateral deviation by a player, say player i, and a time s>s ₀ such that the expectation of $g^{i}_{\delta}(s)$ is more than v ⁱ+ε. It is known that a uniform value exists in the zero-sum case (with finitely^{Footnote 3} many states and actions) [5]. In the discrete-time non-zero-sum case (with finitely many states and actions), (a uniform correlated equilibrium payoff exists [11], but) it is unknown if a uniform equilibrium payoff exists in this case.

We say that $v\in\mathbb{R}^{S}$ is an asymptotic uniform value of the family (Γ _δ)_δ>0 if for every ε>0 there are (1) a time s ₀>0, (2) a duration δ ₀>0, and (3) strategies σ _δ of player 1 and τ _δ of player 2, such that for all strategies τ of player 2 and σ of player 1, duration 0<δ<δ ₀, and time s>s ₀, we have

$$\varepsilon +E^z_{\sigma _{\delta},\tau}g_{\delta}(s)\geq v(z)\geq-\varepsilon +E^z_{\sigma ,\tau _{\delta}}g_{\delta}(s). $$

The definition implies that a family (Γ _δ)_δ>0 has at most one asymptotic uniform value.

Similarly, we say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic uniform equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are (1) a time s ₀>0, (2) a duration δ ₀>0, and (3) strategy profiles σ _δ, such that for every player i, strategy τ ⁱ of player i, duration 0<δ<δ ₀, and time s>s ₀, we have

$$\varepsilon +E^z_{\sigma _{\delta}}g^i_{\delta}(s)\geq v^i(z)\geq-\varepsilon +E^z_{\sigma ^{-i}_{\delta},\tau^i}g^i_{\delta}(s). $$

An exact family has an asymptotic uniform value in the zero-sum case (Theorem 6), and an asymptotic uniform equilibrium payoff in the non-zero-sum case (Theorem 12).

Remark 1

The existence of an asymptotic uniform equilibrium payoff has the following corollaries.

If v is the asymptotic uniform equilibrium payoff of a family (Γ _δ)_δ>0 then for every ε>0 there is δ ₀>0 such that if 0<δ<δ ₀ and $\vec {w}_{\delta}=(w^{i})_{i\in N}$ is a profile of nonstationary discounting probability measures on $\mathbb{N}$ with $w^{i}_{\delta}(0)<\delta\delta_{0}$, then the game Γ _δ,w has an ε-equilibrium payoff within ε of v.

2.6 The Robust Nonstationary Discounted Solutions

Given a nonstationary discounting measure w on [0,∞], we define $\underline{g}^{i}_{\delta}(w)$ by

$$\underline{g}^i_{\delta}(w):=\liminf_{w_{\delta}\to w} g^i_{\delta} (w_{\delta})\quad \mbox{and}\quad \bar{g}^i_{\delta}(w):=\limsup_{w_{\delta}\to w} g^i_{\delta}(w_{\delta}), $$

where the lim inf and lim sup are over all nonstationary discounting measures w _δ on $\mathbb{N}$ that converge to w. If 1_∞ denotes the probability measure on [0,∞] with 1_∞(∞)=1, then $\underline{g}^{i}_{\delta}(1_{\infty} )=\underline {g}^{i}_{\delta}$ and $\bar{g}^{i}_{\delta}(1_{\infty})= \bar{g}^{i}_{\delta}$.

Fix a nonstationary discounting measure w on [0,∞] and a profile $\vec {w}=(w^{i})_{i\in N}$ of nonstationary discounting measures w ⁱ on [0,∞].

We say that $v\in\mathbb{R}^{S}$ is an asymptotic w-limiting-average value of the family (Γ _δ)_δ>0 if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2, and a duration δ ₀>0, such that for every strategy τ of player 2, strategy σ of player 1, and 0<δ<δ ₀, we have

$$\varepsilon +E^z_{\sigma _{\delta},\tau}\underline{g}_{\delta}(w)\geq v(z)\geq -\varepsilon +E^z_{\sigma ,\tau_{\delta}}\bar{g}_{\delta}(w). $$

We say that $v\in\mathbb{R}^{S}$ is an asymptotic w-uniform value of the family (Γ _δ)_δ>0 if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2, such that for all strategies $\tau_{\delta}^{*}$ of player 2, strategies $\sigma _{\delta}^{*}$ of player 1, and nonstationary discounting measures w _δ on $\mathbb{N}$ that converge (as δ→0+) to w, we have

$$\varepsilon +\liminf_{\delta\to0+}E^z_{\sigma _{\delta},\tau_{\delta} ^{*}}g_{\delta} (w_{\delta})\geq v(z)\geq-\varepsilon +\limsup_{\delta\to0+}E^z_{\sigma _{\delta} ^{*},\tau_{\delta}}g_{\delta}(w_{\delta}). $$

Similarly, we say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic $\vec {w}$ -limiting-average equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are strategy profiles σ _δ (δ>0) and a duration δ ₀>0, such that for every player i, strategy $\tau^{i}_{\delta}$ of player i, and 0<δ<δ ₀, we have

$$\varepsilon +E^z_{\sigma _{\delta}}\underline{g}^i_{\delta} \bigl(w^i\bigr)\geq v^i(z)\geq -\varepsilon +E^z_{\sigma _{\delta}^{-i},\tau^i_{\delta}} \bar{g}^i_{\delta}\bigl(w^i\bigr). $$

We say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic $\vec {w}$ -uniform equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are strategy profiles σ _δ, such that for every player i, all strategies $\tau ^{i}_{\delta}$ of player i, and all nonstationary discounting measures $w^{i}_{\delta}$ on $\mathbb{N}$ that converge (as δ→0+) to w ⁱ, we have

$$\varepsilon +\liminf_{\delta\to0+}E^z_{\sigma _{\delta}}g^i_{\delta} \bigl(w^i_{\delta} \bigr)\geq v^i(z)\geq-\varepsilon + \limsup_{\delta\to0+}E^z_{\sigma _{\delta} ^{-i},\tau _{\delta} ^i}g^i_{\delta} \bigl(w^i_{\delta}\bigr). $$

Note that v is an asymptotic limiting-average, respectively asymptotic uniform, equilibrium payoff of a family (Γ _δ)_δ>0 if and only if it is an asymptotic 1_∞-limiting-average, respectively, asymptotic 1_∞-uniform, equilibrium payoff of this family. Therefore, the results in the paragraph below generalize our results about the existence of an asymptotic limiting-average, respectively, asymptotic uniform, equilibrium payoff.

A strongly convergent family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$-limiting-average equilibrium payoff, and an exact family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$-uniform equilibrium payoff.

In what follows, we define the asymptotic w-robust value and the asymptotic $\vec {w}$-robust equilibrium payoff.

We say that $v\in\mathbb{R}^{S}$ is an asymptotic w-robust value of the family (Γ _δ)_δ>0 (of two-person zero-sum games) if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2, such that for all strategies $\tau ^{*}_{\delta}$ of player 2, strategies $\sigma ^{*}_{\delta}$ of player 1, and nonstationary discounting measures w _δ on $\mathbb{N}\cup\{ \infty \}$ that converge (as δ→0+) to w, we have

$$\varepsilon +\liminf_{\delta\to0+}E^z_{\sigma ^1_{\delta},\tau^2_{\delta} }\underline {g}^i_{\delta}(w_{\delta})\geq v^i(z)\geq- \varepsilon +\limsup_{\delta\to 0+}E^z_{\tau_{\delta}^1,\sigma _{\delta}^2,} \bar{g}^i_{\delta}(w_{\delta}). $$

We say that $v\in\mathbb{R}^{N\times S}$ is an asymptotic $\vec {w}$ -robust equilibrium payoff of the family (Γ _δ)_δ>0 if for every ε>0 there are strategy profiles σ _δ, such that for every player i, all strategies $\tau ^{i}_{\delta}$ of player i, and all nonstationary discounting measures w _δ on $\mathbb{N}\cup\{\infty\}$ that converge (as δ→0+) to w, we have

$$\varepsilon +\liminf_{\delta\to0+}E^z_{\sigma _{\delta}}\underline {g}^i_{\delta} \bigl(w^i_{\delta}\bigr)\geq v^i(z)\geq-\varepsilon +\limsup_{\delta\to0+}E^z_{\sigma _{\delta} ^{-i},\tau_{\delta}^i} \bar{g}^i_{\delta}\bigl(w^i_{\delta}\bigr). $$

An asymptotic $\vec {w}$-robust equilibrium payoff of a family (Γ _δ)_δ>0 is (by definition) an asymptotic w-limiting-average equilibrium payoff and an asymptotic $\vec {w}$-uniform equilibrium payoff.

Theorem 13 asserts that for every nonstationary discounting N-vector measure $\vec {w}$ on [0,∞], an exact family (Γ _δ)_δ>0 of N-person games has an asymptotic $\vec {w}$-robust equilibrium payoff.

2.7 The Variable Short-Stage Duration Games

The paper states and proves asymptotic results on families (Γ _δ)_δ>0 of discrete-time stochastic games. In each game Γ _δ, the stage duration is a constant positive number δ>0. The results remain intact also in the case where the parameter δ is a sequence of stage durations δ=(δ _m)_m≥0 with d _n:=∑_0≤m<n δ _m→_n→∞∞, where δ _m is the duration of the mth stage, the mth stage payoff function is g _δ,m (or g _m for short), and the mth stage transition function is p _δ,m (or p _m for short).^{Footnote 4}

The condition that the constant stage duration is sufficiently small needs to be replaced with the condition that the supremum of the stage durations, d(δ):=sup_m≥0 δ _m, is sufficiently small. A family (Γ _δ)_δ with variable stage duration converges in data if sup_m≥0∥g _m/δ _m−g∥ and sup_m≥0∥p _m/δ _m−μ∥ converge to zero as d(δ) goes to zero. It is an exact sequence if g _m=δ _m g and p _m=δ _m μ, and it converges strongly if it converges in data and for every δ, m≥0, z′≠z, and a∈A(z), p _m(z′,z,a)≠0 iff μ(z′,z,a)≠0.

The ρ-discounted present value of the payoff g _m at stage m is g _m∏_0≤j<m(1−δ _j ρ) (where a product over an empty set of indices is zero). Therefore, in the ρ-discounted game Γ _δ, the valuation of a play (z ₀,a ₀,…,z _m,a _m…) by player i is $\sum_{m=0}^{\infty} g_{m}(z_{m},a_{m})\cdot\allowbreak \prod_{0\leq j<m}(1-\nobreak \delta_{j} \rho)$.

In the case of a time-separable valuation, w _δ is said to be nonstationary discounting if $\frac{w_{\delta}(m)}{\delta_{m}}$ is nonincreasing in m. We assign to the measure w _δ on $\mathbb {N}$ the measure $w'_{\delta}$ on [0,∞) that is supported on $\{ d_{n}: n\in\mathbb{N}\}$ and $w'_{\delta}(d_{n})=w_{\delta}(n)$. We say that w _δ converges, as d(δ)→0+, to the measure w on [0,∞) if $w'_{\delta}$ converges w ^∗ to w.

Similarly, in the limiting-average games with variable stage duration δ, we set $g(s)=\frac{1}{s}\sum_{0\leq m: d_{m}< s}g_{m}(z_{m},a_{m})$ and in the definitions of $\underline{g}^{i}_{\delta}$ and $\bar{g}^{i}_{\delta}$, the condition w _δ(m)<η needs to be replaced with w _δ(m)<ηδ _m.

3 Convergence of Stochastic Games with Short-Stage Duration

We study the “convergence” of the family (Γ _δ)_δ>0, and the presentation of the “limit” as a continuous-time stochastic game Γ.

We define various conditions of the dependence of the transition rates p _δ on the stage duration δ. Some of these conditions relate directly to assumptions on the homogeneous Markov chain of states that are defined by an initial state, a stationary strategy, and the stage duration δ. Each one of the conditions can be interpreted as a consistency, or approximate consistency, of the models Γ _δ as δ varies.

Condition (p.0) asserts that the probability of a state change within the first m stages (namely, in a time t≤mδ) converges to zero as mδ goes to zero. In particular, the probability of a state change in a single stage converges to zero as δ goes to zero. Condition (p.0) is equivalent to mp _δ(z,z,a) converging to zero as mδ goes to zero. Recall that condition (p.2) is lim_δ→0+ p _δ/δ=μ where $\mu:S\times\mathcal {A}\to \mathbb{R}$, and note that condition (p.2) implies condition (p.0).

Recall that condition (p.3) requires (p.2) and that p _δ(z′,z,a)>0 if and only if μ(z′,z,a)>0 (where μ(z′,z,a) is the limit, as δ goes to zero, of p _δ(z′,z,a)/δ). Condition (p.3) implies that the ergodic classes of the homogeneous Markov chain that is defined by a stationary strategy and the transition rates p _δ are independent of δ.

Recall that condition (p.1) is p _δ=δμ, condition (p.1) implies condition (p.3), and condition (p.3) implies condition (p.2). Therefore, each asymptotic property that holds in any family (Γ _δ)_δ>0 that obeys (g.2) and (p.k) holds also in any family (Γ _δ)_δ>0 that obeys (g.1) and (p.k′), where k′=3 if k=2 and k′=1 if k=3.

Recall the following definitions of convergence in data and strong convergence.

Definition 1

(Convergence in data)

We say that Γ _δ converges in data (as δ→0) if the family (Γ _δ)_δ>0 satisfies conditions (g.2) and (p.2).

Definition 2

(Strong convergence)

We say that Γ _δ converges strongly (as δ→0) if the family (Γ _δ)_δ>0 satisfies conditions (g.2) and (p.3).

Next, we wish to define the “convergence” of the family (Γ _δ)_δ>0 as a convergence (as δ→0+) of the stochastic process of states and payoffs that is defined by the initial state and a strategy σ. Obviously, in defining the convergence of the stochastic process of states and payoffs one has to take into account the stage duration δ. The state z _n in the play of the discrete-time stochastic game Γ _δ is interpreted as the state at time nδ. Similarly, the sum $\sum_{j=0}^{n-1}g_{\delta} (z_{j},a_{j})$ of stage payoffs in stages 0≤j<n is interpreted as the cumulative payoff in the time interval [0,nδ].

Definition 3

(Convergence in stationary dynamics)

We say that Γ _δ converges in stationary dynamics if for all pure stationary strategies σ, states z′,z∈S, times t≥0, and positive integers n _δ such that n _δ δ→_δ→0+ t, we have

$$P^z_{\delta,\sigma }\bigl(z_{n_{\delta}}=z'\bigr) \rightarrow_{\delta\to0+} F^{\sigma} _{z,z'}(t) $$

and

$$E^z_{\delta,\sigma } \sum_{j=0}^{n_{\delta}}g_{\delta} (z_j,a_j) \rightarrow_{\delta \to0+} G_t(z,\sigma ), $$

where $(\sigma ,z',z,t)\mapsto F^{\sigma }_{z,z'}(t)\in\mathbb{R}$ and $(t,z,\sigma )\mapsto G_{t}(z,\sigma )\in\mathbb{R}^{N}$ are functions that are defined for all pure stationary strategies σ, states z′,z∈S, and times t≥0.

3.1 Stationary Convergence

Proposition 1

The following conditions are equivalent:

(A)
(Γ _δ)_δ>0 converges in stationary dynamics.
(B)
(Γ _δ)_δ>0 converges in data.

Proof

(A) ⇒ (B). Assume condition (A) holds. Obviously, $\sum_{z'\in S}P^{z}_{\delta,\sigma }(z_{n_{\delta}}=z')=1$. Therefore, $\sum_{z'\in S}F^{\sigma }_{z,z'}(t)=1$. Applying condition (A) to n _δ=0 and z′=z, we have $F^{\sigma }_{z,z}(0)=1$. Applying condition (A) to t=0 and all nonnegative integers n _δ with δn _δ→_δ→0+0, we deduce that for every ε>0 there are t _ε>0 and δ _ε>0, such that for every 0<δ<δ _ε and n with nδ≤t _ε, we have $P^{z}_{\delta, \sigma }(z_{n}=z)>1-\varepsilon $ for all states z∈S and pure stationary strategy profiles σ.

Fix z∈S and a∈A(z), set K _δ=K _δ(z)=∑_z′≠z p _δ(z′,z,a), and let σ be a pure stationary strategy with σ(z)=a, and n=n _δ=[t _1/3/δ] (where [∗] denotes the largest integer that is less than or equal to ∗). Then, for δ<δ _1/3, $1/3>P^{z}_{\delta, \sigma }(z_{n}\neq z)\geq \sum_{m=1}^{n} P^{z}_{\delta, \sigma }(\forall j<m\ z_{j}= z \mbox{ and } z\neq z_{m}=z_{n})\geq\sum_{m=1}^{n} (1-K_{\delta})^{m-1} K_{\delta} 2/3=(1-(1-K_{\delta} )^{n})2/3$, which implies the inequality (1−K _δ)ⁿ≥1/2. Therefore, lim sup_δ→0+ K _δ/δ<∞. Therefore, there is a positive constant K such that for all δ>0, z∈S, and a∈A(z), we have ∑_z′≠z p _δ(z′,z,a)<Kδ.

Next, we prove that if for a pair of distinct states z′≠z and an action profile a∈A(z) we have lim inf_δ→0+ p _δ(z′,z,a)/δ<c, then, for t>0 sufficiently small and a stationary strategy σ with σ(z)=a, we have $F^{\sigma} _{z,z'}(t)<ct$. Indeed, the set {z _n=z′,z ₀=z} is the union of the disjoint sets Y _m,z″={∀0≤j<m,z _j=z ₀,z _m=z″ and z _n=z′}, where m ranges over the positive integers 1≤m≤n and z″ ranges over all states z″≠z. Let ε>0 and set n=n _δ=[t _ε/δ]. Note that $P^{z}_{\delta,\sigma }(Y_{m,z''})\leq p_{\delta}(z',z,a) $ for z″=z′ and $\sum_{m=1}^{n-1}\sum_{z\neq z''\neq z'}P^{z}_{\delta ,\sigma }(Y_{m,z''})\leq \varepsilon K\delta n$ for δ sufficiently small. Therefore, if δ>0 is sufficiently small so that, in addition, p _δ(z′,z,a)/δ<c and for all z″≠z and a∈A(z) we have p _δ(z″,z,a)≤Kδ, then $P^{z}_{\delta,\sigma }(z_{n}=z')\leq\sum_{m=1}^{n} P^{z}_{\delta,\sigma }(Y_{m,z'})+\varepsilon K\delta n\leq(c+K\varepsilon ) \delta n$. Therefore, for t>0 sufficiently small, we have $F^{\sigma} _{z,z'}(t)<ct$.

Finally, we prove that if for a pair of distinct states z′≠z and an action profile a∈A(z) we have lim sup_δ→0+ p _δ(z′,z,a)/δ>c, then, for t>0 sufficiently small and a stationary strategy σ with σ(z)=a, we have $F^{\sigma} _{z,z'}(t)>ct$. Indeed, the set {z _n=z′, z ₀=z} contains the disjoint sets Y _m,z′={∀0≤j<m, z _j=z ₀,z _m=z′=z _n}, where m ranges over the positive integers 1<m≤n. Let ε>0 and set n=n _δ=[t _ε/δ]. Note that $P^{z}_{\delta,\sigma }(Y_{m,z'})\geq(1-\varepsilon )^{2}p_{\delta} (z',z,a) $ for δ sufficiently small. Therefore, if δ>0 is sufficiently small so that, in addition, p _δ(z′,z,a)/δ>c, then $P^{z}_{\delta,\sigma }(z_{n}=z')\geq\sum_{m=1}^{n} P^{z}_{\delta,\sigma }(Y_{m,z'})\geq n(1-\varepsilon )^{2}\delta c$. Therefore, for t>0 sufficiently small, we have $F^{\sigma} _{z,z'}(t)>ct$.

We conclude that the lim sup_δ→0+ p _δ(z′,z,a)/δ and the lim inf_δ→0+ p _δ(z′,z,a)/δ coincide.

We will now prove that the second part of (B) holds. Fix a player i∈N and assume that $\limsup_{\delta\to0+}\| g^{i}_{\delta}\|/\delta<\infty$, where $\|g^{i}_{\delta}\|:=\max_{z,a}|g^{i}_{\delta}(z,a)|$. For t>0 let $\gamma_{t}(z,\sigma )=\frac{1}{t}G_{t}(z,\sigma )$. Then, for δ>0 sufficiently small, $g^{i}_{\delta}(z,\sigma (z))/\delta -2\varepsilon \|g^{i}_{\delta}\|/\delta\leq\gamma^{i}_{t_{\varepsilon }}(z,\sigma )+\varepsilon $. Therefore,

$$\limsup_{\delta\to0+} g^i_{\delta}\bigl(z,\sigma (z)\bigr)/\delta\leq\gamma ^i_{t_{\varepsilon } }(z,\sigma )+\varepsilon +2\varepsilon \limsup_{\delta\to0+}\bigl\|g^i_{\delta}\bigr\|/\delta, $$

and, therefore,

$$\limsup_{\delta\to0+} g^i_{\delta}\bigl(z,\sigma (z)\bigr)/\delta\leq\liminf_{\varepsilon \to 0+}\gamma^i_{t_{\varepsilon }}(z, \sigma ). $$

Similarly, for δ>0 sufficiently small, $\gamma^{i}_{t_{\varepsilon } }(z,\sigma )-\varepsilon \leq g^{i}_{\delta}(z,\sigma (z))/\delta+2\varepsilon \|g^{i}_{\delta}\| /\delta$, and therefore $\limsup_{\varepsilon \to0+}\gamma^{z}_{t_{\varepsilon }}(z,\sigma )\leq\liminf_{\delta \to0+} g^{i}_{\delta}(z,\sigma (z))/\delta$. Given a∈A(z) and applying these inequalities to a stationary strategy σ with σ(z)=a, we conclude that the $\liminf_{\delta\to0+} g^{i}_{\delta}(z,a)/\delta$ and the $\limsup_{\varepsilon \to0+}g^{i}_{\delta}(z,a)/\delta$ coincide.

It remains to prove that condition (A) implies that $\limsup_{\delta \to0+}\|g^{i}_{\delta}\|/\delta<\infty$. For every 1>δ>0, let z _δ∈S and a _δ∈A(z) be such that $|g^{i}_{\delta} (z_{\delta},a_{\delta})|= \|g^{i}_{\delta}\|$. Let ε>0, and let σ=σ _δ be a stationary strategy with σ(z _δ)=a _δ. Set n=n _δ=[t _ε/δ] and z ₀=z _δ. If $g^{i}_{\delta} (z_{\delta},a_{\delta})\geq0$, then, for sufficiently small δ>0, we have $G^{i}_{t_{\varepsilon } }(z_{\delta} ,\sigma )+t_{\varepsilon }/3\geq E^{z_{\delta}}_{\sigma} \sum_{j=0}^{n-1}g^{i}_{\delta} (z_{j},a_{j})\geq(1-2\varepsilon )ng^{i}_{\delta}(z_{\delta},a_{\delta})$. Therefore, if ε<1/3, we have $g^{i}_{\delta}(z_{\delta},a_{\delta} )/\delta \leq3|\gamma^{i}_{t_{\varepsilon }}(z,\sigma )|+1$ for δ>0 sufficiently small. If g ⁱ(z _δ,a _δ)<0, then for sufficiently small δ>0, we have $G^{i}_{t_{\varepsilon }}(z_{\delta},\sigma )-t_{\varepsilon }/3\leq E^{z_{\delta}}_{\sigma} \sum_{j=0}^{n-1}g^{i}_{\delta}(z_{j},a_{j})\leq(1-2\varepsilon )ng^{i}_{\delta}(z_{\delta} ,a_{\delta})$. Therefore, if ε<1/3, we have $g^{i}_{\delta}(z_{\delta},a_{\delta} )/\delta \geq-3|\gamma^{i}_{t_{\varepsilon }}(z,\sigma )|-1$. This proves that $\limsup_{\delta \to0+}\|g^{i}_{\delta}\|/\delta\leq3|\gamma^{i}_{t_{\varepsilon }}(z,\sigma )|+1<\infty$.

(B) ⇒ (A). Let σ be a stationary strategy and let Q be the S×S matrix whose (z,z′)-th entry is Q _z,z′=μ(z′,z,σ(z)). Note that for δ>0 sufficiently small, I+δQ is a transition matrix, where I stands for the identity matrix, and ∥I+δQ∥:=max_z∈S∑_z′∈S|(I+δQ)_z,z′|=1. In addition, e ^δQ (which equals by definition the convergent sum $\sum_{j=0}^{\infty}\frac{\delta^{j} Q^{j}}{j!}$) is an S×S matrix, and (e ^δQ)ⁿ=e ^nδQ. Let P _δ be the S×S transition matrix whose (z,z′)-th entry is (P _δ)_z,z′=I _z,z′+p _δ(z′,z,σ(z)). Therefore, if n is a positive integer, then $P^{z}_{\delta,\sigma }(z_{n}=z')=(P^{n}_{\delta})_{z,z'}$. By the assumption on p _δ and the definitions of Q and e ^δQ, we have ∥e ^δQ−P _δ∥≤o(δ) as δ→0+.

For any two S×S matrices (or elements of a norm algebra) A and B we have $A^{n}-B^{n}=\sum_{k=1}^{n} A^{n-k}(A-B)B^{k-1}$, implying that $\|A^{n}-B^{n}\|\leq\|A-B\| \sum_{j=0}^{n-1}\|A\|^{j} \|B\|^{n-j}$. Therefore, $\|P_{\delta}^{n}-e^{n\delta Q}\|\leq\|P_{\delta}-e^{\delta Q}\|\sum_{j=0}^{n-1}\|e^{\delta Q}\|^{j}\leq o(\delta)n$ as δ→0+.

Therefore, $\|P_{\delta}^{n}-e^{t Q}\|\leq\|P_{\delta}^{n}-e^{n\delta Q}\|+\| e^{tQ}- e^{n\delta Q}\|\to0$ as δ→0+ and nδ→t. We conclude that $P^{z}_{\delta,\sigma }(z_{n}=z')\to F^{\sigma} _{z,z'}(t)=(e^{t Q})_{z,z'}\in \mathbb {R}$ as δ→0+.

By assumption (B), we have g _δ(z,a)=δg(z,a)+o(δ). Therefore, if δ→0+ and n _δ δ→t>0, then $|E^{z}_{\delta,\sigma } \sum_{j=0}^{n_{\delta}-1}g^{i}_{\delta} (z_{j},a_{j})-E^{z}_{\delta ,\sigma } \sum_{j=0}^{n_{\delta}-1}\delta g^{i}(z_{j},a_{j})|\to 0$. If δ→0+ and n _δ δ→t>0, then, as shown earlier, $P^{z}_{\delta,\sigma }(z_{n}=z')\to F^{\sigma} _{z,z'}(t)$, and, therefore, $E^{z}_{\delta,\sigma } \sum_{j=0}^{n_{\delta}-1}\delta g^{i}(z_{j},a_{j})\allowbreak \to G_{t}(z,\sigma )=\int_{0}^{t} \sum_{z'\in S}F^{\sigma} _{z,z'}(s)g(z',\sigma (z'))\,ds$. Therefore, $E^{z}_{\delta,\sigma } \sum_{j=0}^{n_{\delta}-1}g^{i}_{\delta} (z_{j},a_{j})\to G_{t}(z,\sigma )$ as δ→0+ and n _δ δ→t>0. □

Remark 2

The above proof of condition (B) implying condition (A) proves that for every stationary strategy σ, every time t≥0, all states z,z′∈S, and all integers 0≤n _δ with n _δ δ→_δ→0+ t, $P^{z}_{\sigma} (z_{n_{\delta} }=z')\to _{\delta\to0+} F^{\sigma} _{z,z'}(t)=e^{tQ}_{z,z'}$ where Q is the S×S matrix whose (z,z′)-th entry is Q _z,z′=μ(z′,z,σ(z)).

Note that every continuous-time stochastic game Γ=〈N,S,A,μ,g〉 is a “data limit” of the family of discrete-time stochastic games Γ _δ=〈N,S,A,p _δ,g _δ〉, where g _δ(z,a)=δg(z,a) and p _δ(z′,z,a)=δμ(z′,z,a) for all pairs of distinct states z′≠z and every action profile a∈A(z).

3.2 Markov Convergence

The next proposition gives a sufficient condition for a family of Markov strategies σ _δ in Γ _δ to have a continuous-time limiting dynamics and payoffs as δ→0+. In the formulas that follow, we view σ _δ(z,j) ($j\in \mathbb {N}$) as a measure on A(z); i.e., σ _δ(z,j)∈Δ(A(z)), and σ _δ(j):=(σ _δ(z,j))_z∈S is an element of ×_z∈SΔ(A(z)). Therefore, for any fixed z∈S, any linear combination of σ _δ(z,j) is a measure on A(z). Similarly, if $\sigma :S\times\mathbb{R}_{+}\to\Delta(A)$ is measurable with σ(z,t)∈Δ(A(z)), then, for any function $f\in L_{1}(\mathbb {R}_{+})$, the integral $\int_{0}^{\infty} f(t)\sigma (z,t)\,dt$ is well defined.

We say that the Markov strategies σ _δ in Γ _δ converge w ^∗ if for every continuous function $f:\mathbb{R}_{+}\to \mathbb {R}$ with bounded support, the limit of $\sum_{j=0}^{\infty} f(j\delta )\delta\sigma _{\delta}(z,j)$ as δ→0+ exists. In that case, there is a measurable function $\sigma :S\times\mathbb{R}_{+}\to\Delta(A)$ (with σ(z,t)∈Δ(A(z))) such that for every $f\in L_{1}(\mathbb{R}_{+})$ the limit of $\int_{0}^{\infty} f(t)\sigma _{\delta}(z,[t/\delta])\, dt$ as δ→0+ exists and equals $\int_{0}^{\infty} f(t)\sigma (z,t)\,dt$, and we say that the discrete-time Markov strategies σ _δ converge w ^∗ to (the continuous-time Markov correlated strategy) $\sigma :S\times\mathbb {R}_{+}\in \Delta(A)$.

Whenever the conditional probability $P^{z_{0}}_{\delta,\sigma }(E_{1}\mid E_{2})$ is independent of the initial state z ₀, we suppress the superscript of the initial state z ₀.

Proposition 2

If the (correlated) Markov strategies σ _δ in Γ _δ converge w ^∗ to $\sigma :S\times \mathbb{R}_{+}\to\Delta(A)$ and the family of discrete-time stochastic games (Γ _δ)_δ>0 converges in data, then, for every 0≤s<t, there are S×S transition matrices F ^σ(s,t) such that

$$P_{\sigma _{\delta}}\bigl(z_n=z'\mid z_k=z \bigr)\to F^{\sigma} _{z,z'}(s,t) \quad \mbox{\textit{as} } \delta \to0+, k \delta\to s, \mbox{ \textit{and} } n\delta\to t, $$

and

$$E^z_{\sigma _{\delta}}\sum_{0\leq m<n}g_{\delta}(z_m,a_m) \to\int_0^t \sum_{z'\in S}F^{\sigma }_{z,z'}(0,t)g \bigl(z',\sigma \bigl(z',t\bigr)\bigr)\,dt \quad \mbox{\textit{as} } \delta\to0+ \mbox{ \textit{and} } n\delta\to t. $$

Proof

As the family of discrete-time stochastic games (Γ _δ)_δ>0 converges in data, there is a positive constant K>0 such that for every $(z,a)\in\mathcal{A}$ we have |p _δ(z,z,a)|>1−Kδ. Therefore, if 0≤k<n, $|P_{\delta,\sigma _{\delta}}(z_{n}=z'\mid z_{k}=z)- I_{z,z'}|<1-(1-K\delta )^{n-k}\to0$ as nδ−kδ→0+. Therefore, it suffices to prove that for every s<t there are sequences k _δ<n _δ such that k _δ δ→s and n _δ δ→t such that

$$P_{\delta,\sigma _{\delta}}\bigl(z_{n_{\delta}}=z'\mid z_{k_{\delta}}=z \bigr)\to F^{\sigma} _{z,z'}(s,t) \quad \mbox{as } \delta\to0+. $$

We will prove it for n _δ=[t/δ] and k _δ=[s/δ].

Assume that the Markov strategies σ _δ in Γ _δ converge w ^∗ to $\sigma :S\times\mathbb{R}_{+}\to\Delta(A)$. Let M be the space of all S×S matrices Q, let M ₀ be the subset of all its matrices Q with ∑_z′∈S Q _z,z′=0 for every z∈S and Q _z,z′≥0 for all z≠z′, and let M ₁ be the subset of M of all transition matrices. The space M is a (noncommutative) Banach algebra with the norm ∥Q∥=max_z∈S∑_z′∈S|Q _z,z′|, and M ₁ is closed under multiplication. For an ordered list F ₁,…,F _j∈M, we denote by $\prod_{i=1}^{j} F_{i}$ the matrix (ordered) product F ₁ F ₂…F _j.

Let Q:[0,∞)→M be defined by Q _z,z′(u)=μ(z′,z,σ(z,u)), and let Q ^δ:[0,∞)→M be defined by $Q^{\delta} _{z,z'}(u)=p_{\delta}(z',z, \sigma _{\delta}(z,[u/\delta]))/\delta$. As (Γ _δ)_δ>0 converges in data, $Q^{\delta}_{z,z'}(u)=\mu (z',z, \sigma _{\delta}(z,[u/\delta]))+o(1)$ as δ→0+. Therefore, $\int_{s}^{t}Q^{\delta}_{z,z'}(u)\,du=\mu(z',z,\allowbreak \int_{s}^{t}\sigma _{\delta} (z,[u/\delta ]))\, du+o(1)$ as δ→0+, where for a measure α on A(z) we define μ(z′,z,α):=∑_a∈A(z) α(a)μ(z′,z,a). Therefore, as the Markov strategies σ _δ converge w ^∗ to σ, for every s<t, we have

$$\int_s^t Q^{\delta}(u)\,du\underset{\delta\to0+}{\longrightarrow} \int_s^t Q(u)\,du. $$

Let $G^{\delta}_{j}$ be the transition matrix $(G^{\delta} _{j})_{z,z'}=p_{\delta} (z',z, \sigma _{\delta}(z,j))+I_{z,z'}$, and given 0≤s≤t we define G ^δ(s,t) to be the transition matrix $\prod_{j=[s/\delta ]}^{[t/\delta]-1}G^{\delta}_{j}$, where a product over an empty set of indices is defined as the identity. It suffices to prove that G ^δ(s,t) converges as δ→0+.

Let C=2max_z,a|μ(z,z,a)|<C′. It follows that for every t≥0 we have ∥Q(t)∥≤C, and for sufficiently small δ>0 we have ∥Q ^δ(t)∥<C′. Let L _δ(s,t)=[t/δ]−[s/δ], and note that δL _δ(s,t)≤t−s+δ.

As M is a Banach algebra, for every finite sequence Q ₁,…,Q _m of elements in M, we have

$$ \Biggl\|\prod_{j=1}^m(I+Q_j)-I- \sum_{j=1}^m Q_j\Biggr\|\leq e^{\sum_{j=1}^m \| Q_j\| }-1-\sum_{j=1}^m \|Q_j\|. $$

(1)

Inequality (1) follows from the inequality e ^x≥1+x, the triangle inequality, and the Banach algebra inequality ∥QQ′∥≤∥Q∥∥Q′∥. Indeed, if θ _j=∥Q _j∥, then $\|\prod_{j=1}^{m}(I+Q_{j})-I-\sum_{j=1}^{m} Q_{j}\|\leq\prod_{j\in J}(1+\theta_{j})-1-\sum_{j=1}^{m} \theta_{j}\leq e^{\sum_{j=1}^{m} \theta _{j}}-1-\sum_{j=1}^{m} \theta_{j}$.

As $G^{\delta}_{j}=I+\int_{j\delta}^{j\delta+\delta}Q^{\delta}(u)\,du$, $\int_{j\delta}^{j\delta+\delta}\|Q^{\delta}(u)\|\,du\leq\delta C'$, and e ^x−1−x is monotonic increasing on x≥0, for all 0≤s<t, we have

for (t−s)C′≤1 and δ>0 sufficiently small.

For every sequence s=t ₀<t ₁<⋯<t _k=t, set A _j=G ^δ(t _j−1,t _j), $B^{\delta}_{j}=I+\int_{t_{j-1}}^{t_{j}}Q^{\delta}(u)\,du$, and $B_{j}=I+\int_{t_{j-1}}^{t_{j}}Q(u)\,du$, j=1,…,k. Note that $G^{\delta}(t_{0},t)=\prod_{j=1}^{k} A_{j}$ and $\prod_{j=1}^{k} A_{j}-\prod_{j=1}^{k} B_{j}=\sum_{i=1}^{k} \prod_{j=1}^{i-1}A_{j}(A_{i}-B_{i})\prod_{j=i+1}^{k}B_{j}$. For 1≤j<k, ∥A _j∥=1, and for sufficiently small $\max_{i=1}^{k} (t_{i}-t_{i-1})$, ∥B _i∥=1 for every 1≤i≤k. Therefore, $\|\prod_{j=1}^{k} A_{j}-\prod_{j=1}^{k} B_{j}\|\leq\sum_{j=1}^{k} \|A_{j}-B_{j}\|\leq\sum_{j=1}^{k} \| A_{j}-B^{\delta}_{j}\|+\sum_{j=1}^{k} \|B^{\delta}_{j}-B_{j}\|$. Therefore, for a sufficiently large k, by setting t _j=s+j(t−s)/k and $F(t_{j-1},t_{j})=I+\int_{t_{j-1}}^{t_{j}}Q(u)\,du$, there is a (sufficiently small) δ(k)>0 such that for 0<δ<δ(k), we have

$$\Biggl\|G^{\delta}(s,t)-\prod_{j=1}^{k}F(t_{j-1},t_j) \Biggr\|\leq 2(t-s)^2 C'^2/k. $$

Therefore, sup_{0<δ,δ′<δ(k)}∥G ^δ(s,t)−G ^δ′(s,t)∥≤4(t−s)² C′²/k, implying that lim_k→∞sup_{0<δ,δ′<δ(k)}∥G ^δ(s,t)−G ^δ′(s,t)∥=0. Therefore, G ^δ(s,t) converges to a limit as δ→0+. □

Remark 3

The result applies in particular to profiles $\sigma _{\delta} =(\sigma _{\delta}^{i})_{i\in N}$ of (uncorrelated) Markov strategies in Γ _δ that converge w ^∗ to (a continuous-time correlated Markov strategy) $\sigma :S\times\mathbb{R}_{+}\to\Delta(A)$. In this case the w ^∗ limit σ need not represent a profile of continuous-time Markov strategies.

For example, if $\sigma ^{1}_{\delta}$ and $\sigma ^{2}_{\delta}$ play (T,L) at even stages and (B,R) at odd stages, then the Markov strategy profiles $\sigma _{\delta}=(\sigma ^{1}_{\delta},\sigma ^{2}_{\delta})$ converge w ^∗ to (the continuous-time stationary correlated strategy) σ with σ(∗)(T,L)=1/2=σ(∗)(B,R). Therefore, asymptotic results that involve referral to Markov strategies need special attention. They are not obtained by simply “taking limits.” However, if $\sigma :S\times \mathbb {R}_{+}\to\Delta(A)$ is a continuous-time correlated Markov strategy, there are profiles of pure (and thus uncorrelated) Markov strategies $\sigma _{\delta}=(\sigma ^{i}_{\delta})_{i\in N}$ such that σ _δ converge w ^∗ to σ.

Remark 4

Proposition 2 holds also in the model of variable stage duration games. The conditions δ→0+, kδ→s, and nδ→t, are replaced with d(δ)→0+, d _k→s, and d _n→t, respectively, and the term g _δ(z _m,a _m) is replaced with g _m(z _m,a _m).

The proof of Remark 4 is obtained by the following (additional) notational modifications in the proof of Proposition 2. The inequality p _δ(z,z,a)>1−Kδ is replaced with p _m(z,z,a)>1−Kδ _m for every m≥0, the term (1−Kδ)^n−k is replaced with ∏_k≤m<n(1−Kδ _m), and a term of the form [t/δ] is replaced with the largest integer m such that d _m≤t. The definition (in the proof of Proposition 2) of the S×S matrix $Q^{\delta}_{z,z'}(u)$ is modified to $Q^{\delta }_{z,z'}(u)=p_{[u/\delta]}(z',z,\sigma _{\delta}(z,[u/\delta ]))/\delta _{[u/\delta]}$. The inequality 0<δ<δ(k) is interpreted as 0<d(δ)<δ(k).

4 Two-Person Zero-Sum Stochastic Games with Short-Stage Duration

4.1 The Discounted Case

Fix the sets of player N={1,2}, states S, and actions A, and let Γ _δ=〈N,S,A,g _δ,p _δ〉, or Γ _δ=〈g _δ,p _δ〉 for short, be a stochastic game whose stage payoff function g _δ and transitions p _δ depend on the parameter δ that represents the single-stage duration. Recall that Γ _δ,ρ denotes the (unnormalized) discounted game Γ _δ with discount factor 1−ρδ, V _δ,ρ denotes its value, and $V_{\rho}\in\mathbb{R}^{S}$ is the asymptotic ρ-discounted value of (Γ _δ)_δ>0 if V _δ,ρ→_δ→0+ V _ρ.

Given a family (Γ _δ)_δ>0 that has an asymptotic ρ-discounted value V _ρ, we say that the stationary strategy σ, respectively τ, is asymptotic ρ-discounted optimal if for every ε>0, there is δ ₀>0, such that for every 0<δ<δ ₀, strategy σ ^∗ of player 1 (in Γ _δ), strategy τ ^∗ of player 2 (in Γ _δ), and state z,

$$\varepsilon +E^z_{\delta,\sigma ,\tau^{*}}\sum_{m=0}^{\infty}(1- \rho\delta)^m g_{\delta} (z_m,x_m)\geq V_{\rho}(z) \geq-\varepsilon + E^z_{\delta,\sigma ^{*},\tau}\sum _{m=0}^{\infty}(1-\rho\delta)^m g_{\delta}(z_m,x_m). $$

Given a converging family (Γ _δ)_δ>0, we denote by g and μ the limits, as δ→0+, of g _δ/δ and p _δ/δ, respectively.

We denote by X ⁱ(z), respectively X(z), all probability distributions over A ⁱ(z), respectively over A(z) (=A ¹(z)×A ²(z)). For z∈S and x ⁱ∈X ⁱ(z), we denote by x ¹⊗x ² the product distribution x∈X(z) that is given by x(a)=x ¹(a ¹)x ²(a ²) for a=(a ¹,a ²)∈A ¹(z)×A ²(z). For any function h:a↦h(a), that is defined over A(z), e.g., A(z)∋a↦g(z,a) or A(z)∋a↦μ(z′,z,a), we denote also by h its linear extension to X(z), i.e., h(x)=∑_a∈A(z) x(a)h(a).

Theorem 1

Every converging family (Γ _δ)_δ>0 has an asymptotic ρ-discounted value, which equals the unique solution $V\in\mathbb{R}^{S}$ of the system of S equations, z∈S,

$$ \rho v(z)=\max_{x^1\in X^1(z)} \min_{x^2\in X^2(z)} \biggl( g\bigl(z,x^1\otimes x^2\bigr)+\sum_{z'\in S}\mu \bigl(z',z,x^1\otimes x^2\bigr)v \bigl(z'\bigr) \biggr), $$

(2)

and each player has an asymptotic ρ-discounted optimal stationary strategy.

Proof

By the theory of discrete-time stochastic games, V _δ,ρ exists and is the unique solution of the system of equations

$$ v(z)=\max_{x^1\in X^1(z)}\min _{x^2\in X^2(z)} \biggl( g_{\delta} \bigl(z,x^1\otimes x^2\bigr)+\sum_{z'\in S}(1-\rho \delta)P_{\delta}\bigl(z'\mid z,x^1\otimes x^2\bigr)v\bigl(z'\bigr) \biggr). $$

(3)

Since P _δ(z′∣z,a)=p _δ(z′,z,a) for z′≠z, and P _δ(z′∣z,a)=1+p _δ(z′,z,a) for z′=z, we can deduce, by subtracting (1−ρδ)v(z) from both sides of the z equation that V _δ,ρ exists and is the unique solution of the system of equations

$$ \rho\delta v(z)=\max_{x^1\in X^1(z)}\min _{x^2\in X^2(z)} \biggl( g_{\delta} \bigl(z,x^1\otimes x^2\bigr)+\sum_{z'\in S}(1-\rho \delta)p_{\delta} \bigl(z',z,x^1\otimes x^2\bigr)v\bigl(z'\bigr) \biggr). $$

(4)

For g _δ=δg and $p_{\delta}=\frac{\delta}{1-\rho\delta }\mu $, v solves (4) if and only if it solves (2). For δ>0 sufficiently small, $p_{\delta}=\frac{\delta}{1-\rho\delta}\mu$ indeed represents transition probabilities. Therefore, the system (2) of equations has a unique solution.

Let V be the unique solution of (2). Let σ be a stationary strategy of player 1 with σ(z) maximizing (over all x ¹∈X ¹(z))

$$ \min_{x^2\in X^2(z)}g\bigl(z,x^1 \otimes x^2\bigr)+\sum_{z'\in S}\mu \bigl(z',z,x^1\otimes x^2\bigr)V \bigl(z'\bigr). $$

(5)

Therefore, for every z∈S and x ²∈X ²(z), we have

$$ g\bigl(z,\sigma (z)\otimes x^2\bigr)+\sum _{z'\in S}\mu \bigl(z',z,\sigma (z)\otimes x^2\bigr)V\bigl(z'\bigr)\geq\rho V(z). $$

(6)

Fix ε>0. We claim that there is δ ₀>0, such that for every 0<δ<δ ₀, strategy τ of player 2, and state z,

$$ E^z_{\delta,\sigma ,\tau} \sum _{m=0}^{\infty} (1-\rho\delta)^m g_{\delta}(z_m,a_m)\geq V(z)-\varepsilon . $$

(7)

Fix an initial history h _m=(z ₀,a ₀,…,z _m), and let $x_{m}^{2}=\tau (h_{m})$ and $x_{m}=\sigma (z_{m})\otimes x^{2}_{m}$. Let Y _m:=E _σ,τ(g _δ(z _m,a _m)+(1−ρδ)V(z _m+1)∣h _m).

Therefore, for every m≥0, $E^{z}_{\delta,\sigma ,\tau}(1-\rho \delta )^{m}g_{\delta}(z_{m},a_{m})\geq(1-\rho\delta)^{m}E^{z}_{\delta,\sigma ,\tau }V(z_{m})-(1-\rho\delta)^{m+1}E^{z}_{\delta,\sigma ,\tau }V(z_{m+1})-o(\delta )(1-\rho\delta)^{m}$. Summing over m=0,1,… , we deduce that

$$E^z_{\delta,\sigma ,\tau} \sum_{m=0}^{\infty}(1- \rho\delta)^m g_{\delta} (z_m,a_m)\geq V(z) -o(\delta)\sum_{m=0}^{\infty}(1-\rho\delta )^m\to _{\delta\to0+} V(z). $$

By duality, if τ is a stationary strategy of player 2 with τ(z) minimizing (over all x ²∈X ²(z))

$$ \max_{x^1\in X^1(z)}g\bigl(z,x^1 \otimes x^2\bigr)+\sum_{z'\in S}\mu \bigl(z',z,x^1\otimes x^2\bigr)V \bigl(z'\bigr), $$

(8)

then for every strategy σ of player 1 we have

$$E^z_{\delta,\sigma ,\tau} \sum_{m=0}^{\infty}(1- \rho\delta)^m g_{\delta} (z_m,a_m)\leq V(z) +o(\delta)\sum_{m=0}^{\infty}(1-\rho\delta )^m\to _{\delta\to0+} V(z). $$

□

Denote by V _ρ(g,μ) the asymptotic ρ-discounted value of the family (Γ _δ=〈g _δ,μ _δ〉)_δ>0 that converges (as δ goes to zero) to 〈g,μ〉, and by V _δ,ρ(g,p) the value of the discounted discrete-time stochastic game 〈g,p〉 with a discount factor 1−ρδ.

Remark 5

The above proof of Theorem 1 shows that

$$ V_{\rho}(g,\mu)=V_{\delta,\rho}\biggl(\delta g, \frac{\delta}{1-\rho \delta}\mu \biggr) \quad \mbox{whenever } \delta\leq\frac{1}{\|\mu\|+\rho}, $$

(9)

where ∥μ∥=max_z,a|μ(z,z,a)|.

Remark 6

The proof shows in addition that a stationary strategy σ of player 1, respectively τ of player 2, is asymptotic ρ-discounted optimal if and only if, for every state z∈S, σ(z) maximizes (5), respectively, τ(z) minimizes (8).

Remark 7

It is worth recalling that a stationary strategy is a (behavioral) strategy whose mixed action at every stage is independent of the stage, past states, and past actions of the players. Therefore, the result holds also in a model where some of the players do not observe past actions, and even in a model where some of the players are unable to recall the current stage and past states.

Remark 8

The proof that (2) has a solution was based on the corresponding result from the theory of discounted discrete-time stochastic games. In what follows, we prove it directly.

For a vector $v\in\mathbb{R}^{S}$ we denote by ∥v∥ its maximum norm ∥v∥:=max_z∈S|v(z)|. For every z∈S, a∈A(z), $v\in\mathbb{R}^{S}$, and x∈X(z), G ^z[v](a) is defined by

$$G^z[v](a)=\frac{1}{\|\mu\|+\rho} \biggl(g(z,a)+\sum _{z'\in S}\mu \bigl(z',z,a\bigr)v \bigl(z'\bigr)+\|\mu\|v(z) \biggr), $$

and (thus) G ^z[v](x) is defined by

Define the operator Q from $\mathbb{R}^{S}$ to $\mathbb{R}^{S}$ by

$$Qv(z)=\max_{x\in X^1(z)}\min_{x^2\in X^2(z)}G^z[v] \bigl(x^1\otimes x^2\bigr). $$

By the minmax theorem, we have

$$Qv(z)=\min_{x^2\in X^2(z)}\max_{x\in X^1(z)}G^z[v] \bigl(x^1\otimes x^2\bigr) $$

and, therefore, v is a solution of Qv=v if and only if it is a solution of (2). Therefore, it suffices to prove that Q has a fixed point. Note that $G^{z}[v+c1_{S}](x)=G^{z}[v](x)+\frac{c\|\mu\|}{\|\mu\|+\rho}$ and, therefore,

$$Q(v+c1_S) (z)=Qv+\frac{c\|\mu\|}{\|\mu\|+\rho}. $$

In addition, Q is monotonic; i.e., u≥v implies that Qu≥Qv and, therefore, for $v,u\in\mathbb{R}^{S}$ we have

$$\|Qv-Qu\|\leq\frac{\|\mu\|}{\|\mu\|+\rho}\|v-u\|. $$

Therefore, Q is a strict contraction and, therefore, Q has a unique fixed point. □

Remark 9

The following (alternative) proof of Theorem 1 is based on results from the theory of continuous-time stochastic games in conjunction with stationary convergence of the family of games Γ _δ.

We apply notations and inequalities from [8]. First, one recalls that a pair of stationary strategies, σ of player 1 and τ of player 2, where σ(z) maximizes (5), and τ(z) minimizes (8), is a pair of optimal strategies in the continuous-time ρ-discounted game Γ=〈g,μ〉, and V is its value. In particular, for every stationary strategy τ ^∗ of player 2 and every stationary strategy σ ^∗ of player 1, we have

$$E^z_{\sigma ,\tau^{*}}\int_0^{\infty} e^{-\rho t}g\bigl(z_t , \sigma (z_t )\otimes \tau ^{*}(z_t )\bigr)\,dt \geq V(z)\geq E^z_{\sigma ^{*},\tau}\int _0^{\infty} e^{-\rho t}g\bigl(z_t , \sigma ^{*}(z_t)\otimes\tau(z_t )\bigr)\,dt . $$

Next, stationary convergence implies that for stationary strategies σ′ of player 1 and τ′ of player 2 we have

$$E^z_{\delta,\sigma ',\tau'}\sum_{m=0}^{\infty}(1- \rho\delta)^m g_{\delta} (z_m,a_m) \to_{\delta\to0+}E^z_{\sigma ',\tau'}\int_0^{\infty} e^{-\rho t}g\bigl(z_t , \sigma '(z_t )\otimes\tau'(z_t )\bigr)\,dt. $$

Therefore, given ε>0, for δ>0 sufficiently small, for every pure stationary strategy τ ^∗ of player 2 and pure stationary strategy σ ^∗ of player 1, we have

$$\varepsilon +E^z_{\delta,\sigma ,\tau^{*}}\sum_{m=0}^{\infty}(1- \rho\delta)^m g_{\delta} (z_m,a_m)\geq V(z)\geq-\varepsilon + E^z_{\delta,\sigma ^{*},\tau}\sum_{m=0}^{\infty} (1-\rho\delta)^m g_{\delta}(z_m,a_m). $$

In a discrete-time discounted game (with finitely many states and actions), there is always a pure stationary strategy that is a best reply to a given stationary strategy. Therefore, V is an asymptotic ρ-discounted value and σ and τ are asymptotic ρ-discounted optimal strategies of the converging family (Γ _δ)_δ>0.

The Algebraic Approach

Fix the finite state space S and the finite action sets A ⁱ(z) (i=1,2 and z∈S), and recall that $\mathcal{A}=\{(z,a): z\in S, a\in A(z)\}$. The set of all (g,μ,v,ρ,x ¹,x ²), where $g\in\mathbb{R}^{\mathcal{A}}$, $\mu\in\mathbb {R}^{S\times \mathcal{A}}$ (with μ(z′,z,a)≥0 for S∋z′≠z∈S and a∈A(z), and ∑_z′∈S μ(z′,z,a)=0 for $(z,a)\in \mathcal{A}$), $v\in \mathbb{R}^{S}$, 0<ρ<1, x ⁱ∈X ⁱ(z), that satisfies the following finite^{Footnote 5} lists of inequalities:

(10)

(11)

is semialgebraic. Therefore, for each fixed (g,μ), the graph of the correspondence assigning to each ρ the asymptotic ρ-discounted optimal stationary strategies of each player and the asymptotic ρ-discounted value function V _ρ is semialgebraic. Therefore (see, e.g., [1, 6]), there is a semialgebraic map ρ↦(V _ρ,σ ^ρ,τ ^ρ), where V _ρ is the ρ-discounted asymptotic value and σ ^ρ and τ ^ρ are stationary asymptotic ρ-discounted optimal strategies. In particular, the map has a convergent expansion in fractional powers of ρ in a right neighborhood of 0 (and a convergent expansion in fractional powers of ρ in any one-sided neighborhood of a point 0<ρ ₀<1). As V _ρ is the ρδ-discounted value of the discrete-time stochastic game with payoff function δg and transitions $p_{\delta}=\frac{\delta }{1-\rho\delta}\mu$ it is bounded by ∥g∥/ρ. Therefore, ρ↦v _ρ:=ρV _ρ is a bounded semialgebraic function. In particular, there is (1) a positive integer M, (2) real coefficients c _k(z), and (3) a positive discount rate $\bar{\rho}>0$, such that for $0<\rho\leq\bar{\rho}$ the series $\sum_{k=0}^{\infty} c_{k}(z)\rho^{k/M}$ converges and

$$v_{\rho}(z) =\sum_{k=0}^{\infty} c_k(z)\rho^{k/M}. $$

If the game is one of perfect information, then each player has for each 1>ρ>0 a pure stationary strategy that is an asymptotic ρ-discounted optimal strategy. Therefore (following the classical argument from discrete-time stochastic games), the value function ρ↦v _ρ(z) is a rational function in ρ in a right neighborhood of 0 (and in any one-sided neighborhood of a point 1>ρ ₀>0). It follows that there are $\bar{\rho}>0$ and real coefficients c _k(z), and pure stationary strategies σ ⁱ, i=1,2, such that for $\rho\leq\bar{\rho}$ the series $\sum_{k=0}^{\infty} c_{k}(z)\rho ^{k}$ converges,

$$v_{\rho}(z) =\sum_{k=0}^{\infty} c_k(z)\rho^{k}, $$

and σ ⁱ is asymptotic ρ-discounted optimal in the family (Γ _δ)_δ>0.

Covariance Properties

Fix the sets of states S and actions A. Let V _ρ(g,μ) be the unique solution of the system (2) of S equations. Recall that it equals the asymptotic ρ-discounted value of any family 〈g _δ,p _δ〉 that converges in data to 〈g,μ〉. (It is also the value of the continuous-time stochastic game 〈N,S,A,g,μ〉, e.g., [8].) Consider the function V _ρ(g,μ) as a function of ρ, g, and μ. Obviously, the ρ-discounted asymptotic value V _ρ(g,μ) is monotonic in g and covariant with respect to multiplication of the payoff function g by a positive scalar. Namely, if g′≥g and α is a nonnegative real number, V _ρ(g′,μ)≥V _ρ(g,μ) and V _ρ(αg,μ)=αV _ρ(g,μ). For α>0, a vector V satisfies Eq. (2) if and only if it satisfies the same equation when ρ is replaced by αρ, g is replaced by αg, and μ is replaced by αμ. Therefore, V _αρ(αg,αμ)=V _ρ(g,μ). (In the continuous-time game interpretation, this equality is interpreted as, and can be derived by, a simple rescaling of time: t↦αt.)

Now we turn to the expression of the ρ-discounted asymptotic value as a value of a discrete-time discounted stochastic game.

If ∥μ∥≤1, we assign to (the continuous-time game) Γ=〈N,S,A,g,μ〉 the discrete-time game $\bar{\varGamma }=\langle N,S,A,g,p=\mu\rangle$. By Remark 5, the value $\bar{V}_{\rho}(g,\mu)$ of the discrete-time ρ-discounted (with discount factor 1−ρ) stochastic game $\bar{\varGamma}=\langle\{ 1,2\} ,S,A,g, p=\mu\rangle$ equals V _ρ(g,(1−ρ)μ) whenever 0<ρ≤1−∥μ∥.

Summarizing,

(12)

(13)

(14)

equivalently,

$$ \bar{V}_{\rho}(g,\mu) = V_{\rho}\bigl(g,(1-\rho)\mu\bigr)\quad \mbox{whenever } 0 < \rho< 1 \mbox{ and } \|\mu\|\leq1. $$

(15)

Note that for a constant payoff function g=c, we have ρV _ρ(c,μ)=c. The normalization v _ρ:=ρV _ρ of the function V _ρ, is a function of (g,μ): v _ρ(g,μ)=ρV _ρ(g,μ). Given two transition rates μ and μ′,

$$d\bigl(\mu,\mu'\bigr):=\max \biggl\{ \frac{\mu(z',z,a)}{\mu'(z',z,a)}, \frac {\mu '(z',z,a)}{\mu(z',z,a)}\biggm{|} a\in A(z), z,z'\in S \biggr\}-1, $$

where by convention x/0=∞ for x>0, and 0/0=1.

Lemma 1

For every pair of payoff functions g and g′ and every pair of transition rates μ and μ′ the following inequality holds:

$$ \bigl\|v_{\rho}\bigl(g',\mu' \bigr)-v_{\rho}(g,\mu)\bigr\| _{\infty}\leq 4|S| d\bigl(\mu, \mu'\bigr) \min\bigl\{\|g\|,\bigl\|g'\bigr\|\bigr\}+ \bigl\|g-g'\bigr\|. $$

(16)

Proof

The proof applies [10, Theorem 6] in conjunction with the covariance properties (13) and (14). Fix ρ,g,g′,μ,μ′. Let β>0, and note that d(μ,μ′)=d(μ/β,μ′/β). As μ=βμ/β, equality (13) implies that $v_{\rho} (g,\mu)=\frac{\rho}{\beta}V_{\frac{\rho}{\beta}}(g,\mu/\beta )=v_{\frac {\rho}{\beta}}(g,\mu/\beta)$, and similarly, $v_{\rho}(g',\mu ')=v_{\frac {\rho}{\beta}}(g',\mu'/\beta)$. We choose β>0 sufficiently large, e.g., $\beta>\rho+\frac{\max\{\|\mu\|, \|\mu'\|\}}{1-\rho}$, so that $\rho/\beta<1-\frac{\|\mu\|}{(1-\rho)\beta}$ and $\rho/\beta <1-\frac {\|\mu'\|}{(1-\rho)\beta}$. This will enable us to apply equality (14) in the third equality below. Therefore,

where the first and second equalities follow from (13), the third equality follows from (14), and the last inequality follows from [10, Theorem 6]. □

Recall that the family of discrete-time stochastic games Γ _δ=〈N,S,A,g _δ,p _δ〉 converges strongly to Γ=〈N,S,A,g,μ〉 if for all $(z',z,a)\in S\times \mathcal{A}$, g _δ(z,a)=δg(z,a)+o(δ) and p _δ(z′,z,a)=δμ(z′,z,a)(1+o(1)) as δ→0+.

Theorem 2

If Γ _δ=〈g _δ,p _δ〉 converges strongly to Γ=〈g,μ〉 then ρV _δ,ρ→_δ→0+ ρV _ρ(μ,g) uniformly in 0<ρ<1.

Proof

By Remark 5, V _δ,ρ=V _ρ(g _δ/δ,(1−ρδ)p _δ/δ). Therefore, $v_{\delta,\rho}=\rho V_{\delta,\rho}= v_{\rho}(g'_{\delta} ,\allowbreak \mu _{\delta,\rho}):=\rho V_{\rho}(g'_{\delta},\mu_{\delta,\rho})$, where $g'_{\delta}= g_{\delta}/\delta$ and μ _δ,ρ(z′,z,a)=(1−ρδ)p _δ/δ. Therefore, as ∥g′−g∥→0 as δ→0+ and d(μ,μ _δ,ρ)→_δ→0+0 uniformly in ρ, inequality (16) implies that $\rho V_{\delta ,\rho }=v_{\rho}(g'_{\delta},\mu_{\delta,\rho})\to_{\delta\to0+}v_{\rho} (g,\mu)$ uniformly in ρ. □

4.2 The Asymptotic Nonstationary Discounted Value

We start with a few simple and useful properties of nonstationary discounting measures. First, if w is a nonstationary discounting measure on [0,∞] then w has no atoms in (0,∞), w is absolutely continuous on (0,∞), and $\frac{dw}{dt}(t)$ is nonincreasing in 0<t<∞. Given a nonstationary discounting measure w on [0,∞] and a finite sequence $\tilde {t}=(t_{0}=0<t_{1}<\cdots<t_{\ell}<\infty)$, we define the nonstationary discounting measure $\tilde{w}_{\tilde{t}}$, or $\tilde{w}$ for short, on [0,∞] by $\tilde{w}([t_{j},t_{j+1}))=w([t_{j},t_{j+1}))$, $\frac{d\tilde{w}}{dt}(t)$ being a constant (thus, $\frac{d\tilde {w}}{dt}(t)=w([t_{j},t_{j+1}))/(t_{j+1}-t_{j})$) on each interval [t _j,t _j+1) (0≤j<ℓ), and $\tilde{w}$ coincides with w on subsets of [t _ℓ,∞]. Set $d(\tilde{t}):= \max_{0\leq j<\ell }(t_{j+1}-t_{j})$.

Lemma 2

Let w be a nonstationary measure on [0,∞] and $\tilde {t}=(t_{0}=0<t_{1}<\cdots <t_{\ell}<\infty)$ a finite sequence. Then

$$ \int_{t_1}^{t_{\ell}} \biggl| \frac{dw}{dt}(t)-\frac{d\tilde{w}}{dt}(t)\biggr|\, dt\leq 2\int_{t_1}^{t_1+d(\tilde{t})} \frac{dw}{dt}(t)\,dt, $$

(17)

and if the nonstationary discounting measures w _δ on [0,∞] converge to the measure w on [0,∞] then

$$ \int_{t_1}^{t_{\ell}} \biggl| \frac{dw_{\delta}}{dt}(t)-\frac{dw}{dt}(t)\biggr|\, dt\to _{\delta\to0+}0. $$

(18)

Proof

As $\frac{dw}{dt}(t)$ is nonincreasing in t, $\int_{t_{j}}^{t_{j+1}} |\frac{dw}{dt}(t)-\frac{d\tilde{w}}{dt}(t)|\,dt\leq2\int_{t_{j}}^{t_{j+1}} \frac{dw}{dt}(t)- \frac{dw}{dt}(t+\nobreak d(\tilde{t}))\,dt$. Therefore, $\int_{t_{1}}^{t_{\ell}} |\frac{dw}{dt}(t)-\frac{d\tilde {w}}{dt}(t)|\,dt=\sum_{1\leq j<\ell}\int_{t_{j}}^{t_{j+1}} |\frac {dw}{dt}(t)-\frac{d\tilde{w}}{dt}(t)|\,dt\leq2\sum_{1\leq j<\ell }\int_{t_{j}}^{t_{j+1}} \frac{dw}{dt}(t)-\frac{dw}{dt}(t+d(\tilde{t}))\, dt\leq 2\int_{t_{1}}^{t_{1}+d(\tilde{t})}\frac{dw}{dt}(t)\,dt$, which proves (17).

In order to prove (18), it suffices to prove that for every ε>0 there is δ ₀>0 such that for 0<δ<δ ₀, $\int_{t_{1}}^{t_{\ell}} |\frac{dw_{\delta}}{dt}(t)-\frac{dw}{dt}(t)|\, dt<4\varepsilon $. Fix ε>0.

For every d>0 and a nonstationary discounting measure ν on [0,∞], we define the nonstationary discounting measures ν ^d on [0,∞] by $\nu^{d}([a,b])=\frac{1}{d}\int_{0}^{d}\nu ([a+t,b+t])\, dt$. Note that $\frac{dw^{d}}{dt}(t)$ and $\frac{dw^{d}_{\delta}}{dt}(t)$ are continuous at each t<∞ and $\frac{dw^{d}_{\delta}}{dt}(t)\to_{\delta\to 0+}\frac {dw^{d}}{dt}(t)$. Therefore, $\int_{t_{1}}^{t_{\ell}} |\frac{dw^{d}_{\delta} }{dt}(t)-\frac{dw^{d}}{dt}(t)|\,dt\to_{\delta\to0+}0$. As $\frac {dw}{dt}(t)$ is nonincreasing in t, $\int_{t_{1}}^{t_{\ell}} |\frac {dw}{dt}(t)-\frac{dw^{d}}{dt}(t)|\,dt=\int_{t_{1}}^{t_{\ell}} \frac {dw}{dt}(t)-\frac{dw^{d}}{dt}(t)\,dt\leq\int_{t_{1}}^{t_{1}+d}\frac {dw}{dt}(t)\,dt -\int_{t_{\ell}}^{t_{\ell}+d}\frac{dw}{dt}(t)\,dt\leq w([t_{1},t_{1}+d])$. Similarly, $\int_{t_{1}}^{t_{\ell}} |\frac{dw_{\delta}}{dt}(t)-\frac {dw^{d}_{\delta}}{dt}(t)|\,dt\leq w_{\delta}([t_{1},t_{1}+d])$. Let d>0 be sufficiently small so that w([t ₁,t ₁+d])<ε, and δ ₀>0 be sufficiently small so that for all 0<δ<δ ₀, $\int_{t_{1}}^{t_{\ell}} |\frac{dw^{d}_{\delta}}{dt}(t)-\frac {dw^{d}}{dt}(t)|\, dt<\varepsilon $. Therefore, as $|\frac{dw_{\delta}}{dt}(t)-\frac {dw}{dt}(t)|\leq |\frac{dw_{\delta}}{dt}(t)-\frac{dw^{d}_{\delta}}{dt}(t)|+|\frac {dw^{d}_{\delta} }{dt}(t)-\frac{dw^{d}}{dt}(t)|+|\frac{dw^{d}}{dt}(t)-\frac{dw}{dt}(t)|$,

$$\int_{t_1}^{t_{\ell}} \biggl|\frac{dw_{\delta}}{dt}(t)- \frac{dw}{dt}(t)\biggr|\, dt<4\varepsilon . $$

□

Theorem 3

Let w be a nonstationary discounting measure on [0,∞), t≥0, and $\nu: \mathcal{A}\to\mathbb{R}$. Then a family (Γ _δ)_δ>0 that converges in data has an asymptotic (w,t,ν) value, and if w _δ, δ>0, are nonstationary discounting measures on $\mathbb{N}$ that converge to w, and m _δ≥0 and $\nu_{\delta}: \mathcal{A}\to\mathbb{R}$ are such that (m _δ,ν _δ) converges to (t,ν) (as δ→0+), then for every ε>0 there are ε-optimal Markov strategies in $\varGamma _{\delta ,w_{\delta}}^{m_{\delta}, \nu_{\delta}}$ that converge to a continuous-time Markov strategy.

Before turning to the proof of the theorem, we introduce a useful auxiliary lemma.

Fix a payoff function $g:\mathcal{A}\to\mathbb{R}$ and a transition rate function $\mu:S\times\mathcal{A}\to\mathbb{R}$ with μ(z′,z,a)≥0 if z′≠z and ∑_z′∈S μ(z′,z,a)=0. Let $\|\mu\|:=\max_{(z,a)\in\mathcal{A}}|\mu(z,z,a)|$. For every z∈S, α,β>0, $V\in\mathbb{R}^{S}$, and x∈Δ(A(z)), F(z,x,α,β,V) is defined by

$$F(z,x,\alpha,\beta,V) = \alpha g(z,x)+V(z)+\sum_{z'\in S} \beta\mu \bigl(z',z,x\bigr)V\bigl(z'\bigr), $$

and $T(\alpha,\beta,V)\in\mathbb{R}^{S}$ is defined by

$$T(\alpha,\beta,V) (z)=\max_{x^1\in X^1(z)}\min_{x^2\in X^2(z)}F \bigl(z,x^1\otimes x^2,\alpha,\beta,V\bigr). $$

Let $V_{1}\in\mathbb{R}^{S}$, α,β>0, and define $V_{0}\in \mathbb {R}^{S}$ by V ₀=T(α,β,V ₁). Given a sequence γ=(0=γ ₀<⋯<γ _m=1), define $U_{\gamma_{j}}$, 0≤j≤m (recursively in j) by $U_{1}=U_{\gamma_{m}}=V_{1}$, and for 0≤j<m and z∈S, $U_{\gamma_{j}}=T((\gamma_{j+1}-\gamma_{j})\alpha, (\gamma _{j+1}-\gamma _{j})\beta, U_{\gamma_{j+1}})$. If d(γ):=max_0≤j<m γ _j+1−γ _j is sufficiently small so that d(γ)β∥μ∥≤1, Γ(γ) denotes the m-stage game with set of plays $S\times\mathcal{A}^{m}$, the payoff of a play z ₀,a ₀,…,z _m is V ₁(z _m)+∑_0≤j<m(γ _j+1−γ _j)αg(z _j,a _j), and past play is observed by the players, and the “states transitions” are such that the conditional probability of z _j+1=z, given z ₀,a ₀,…,a _j, is $I_{z_{j},z}+(\gamma_{j+1}-\gamma_{j})\beta\mu(z,z_{j},a_{j})$.

Lemma 3

Assume that β∥μ∥≤1/2. Then (1) the game Γ(γ) is well defined and its value equals U ₀, (2) the stationary strategy σ of player 1 (respectively, τ of player 2) that for every state z∈S, σ(z) maximizes $\min_{x^{2}\in X^{2}(z)}F(z,\sigma (z)\otimes x^{2},\alpha,\beta,V_{1})$, (respectively, τ(z) minimizes $\max_{x^{1}\in X^{1}(z)}F(z,x^{1}\otimes\tau(z),\alpha,\beta,V_{1})$) is 4β∥μ∥(α∥g∥+4β∥μ∥∥V ₁∥)-optimal in Γ(γ), and (3) ∥U ₀−V ₀∥≤4β∥μ∥(α∥g∥+4β∥μ∥∥V ₁∥).

Proof

For every $(z_{j},a_{j})\in\mathcal{A}$, the condition d(γ)β∥μ∥≤1 implies that $I_{z_{j},z}+(\gamma_{j+1}-\gamma_{j})\beta \mu (z,z_{j},a_{j})\geq0$, and in addition $\sum_{z\in S} (I_{z_{j},z}+(\gamma_{j+1}-\gamma_{j})\beta\mu (z,z_{j},a_{j}))=1$. Therefore, Γ(γ) is well defined. The recursive formula for the value of the m-stage game Γ(γ) shows that the value of Γ(γ) equals U ₀.

For every strategy profile σ in Γ(γ) and state z, $P^{z}_{\sigma} (z_{0}=z_{1}=\cdots=z_{m})\geq\prod_{0\leq j<m}(1-(\gamma _{j+1}-\gamma _{j})\beta\|\mu\|)\geq1-\beta\|\mu\|$. Therefore, for every Markov strategy profile σ in Γ(γ) and state z,

$$E^z_{\sigma} \sum_{0\leq j<m}( \gamma_{j+1}-\gamma_j)\alpha g(z_j,a_j) \geq \alpha g\bigl(z,\bar{\sigma }(z)\bigr)-2\beta\|\mu\|\alpha\|g\|, $$

where $\bar{\sigma }(z)=\sum_{0\leq j<m}(\gamma_{j+1}-\gamma _{j})\sigma (z,j)$.

Let σ ¹ be a stationary strategy of player 1 in Γ(γ) such that for every state z∈S, σ ¹(z) maximizes $\min_{x^{2}\in X^{2}(z)}F(z,\sigma (z)\otimes x^{2},\alpha,\beta,V_{1})$. Then for every Markov strategy σ ² of player 2 in Γ(γ), inequality (1) implies that $\sum_{z'\in S}|P^{z}_{\sigma} (z_{m}=z')-I_{z,z'}-\beta\mu(z',z,\bar {\sigma }(z))|\leq e^{2\beta\|\mu\|}-1-2\beta\|\mu\|\leq4\beta^{2}\|\mu\|^{2}$, where σ is the strategy profile (σ ¹,σ ²) and the last inequality uses the assumption 2β∥μ∥≤1. Therefore,

$$E^z_{\sigma} \biggl(V_1(z_m)+\sum _{0\leq j<m}(\gamma_{j+1}-\gamma _j) \alpha g(z_j,a_j) \biggr) \geq V_0(z)-2\beta \|\mu\|\bigl(\alpha\|g\|+4\beta\| \mu\|\| V_1\|\bigr). $$

Let τ ² be a stationary strategy of player 2 in Γ(γ) such that for every state z∈S, τ ²(z) minimizes $\max_{x^{1}\in X^{1}(z)}F(z,\sigma (z)\otimes x^{2},\alpha,\beta,V_{1})$. Then, by duality, for every Markov strategy τ ¹ of player 1 in Γ(γ),

$$E^z_{\tau}\biggl(V_1(z_m)+\sum _{0\leq j<m}(\gamma_{j+1}-\gamma_j)\alpha g(z_j,a_j)\biggr)\leq V_0(z)+2\beta\|\mu\|\bigl(\alpha \|g\|+4\beta\|\mu\|\| V_1\|\bigr), $$

where τ=(τ ¹,τ ²).

Therefore, ∥U ₀−V ₀∥≤4β∥μ∥(α∥g∥+4β∥μ∥∥V ₁∥) and σ ¹ and τ ² are (4β∥μ∥(α∥g∥+4β∥μ∥∥V ₁∥))-optimal. □

Proof of Theorem 3

The first stage of the proof is obtained by associating an extensive form ℓ-stage game $\varGamma(\tilde{t})$ with a finite sequence $\tilde{t}=(t_{0}=0<t_{1}<\cdots<t_{k}=t<t_{k+1}<\cdots<t_{\ell})$ of times (and the triple (w,t,ν)) as follows.

The game $\varGamma(\tilde{t})$ is an ℓ-stage “stochastic game” with (1) the same sets of states, actions, and players as in Γ _δ, (2) stage-dependent payoffs (that also incorporate an extra payment in stage k), and (3) stage-dependent transitions. Let Δ_j:=t _j+1−t _j and let $\tilde{t}$ be such that $d(\tilde{t})$ is sufficiently small so that $d(\tilde{t})\|\mu\|<1/2$. A play of $\varGamma (\tilde{t})$ is a sequence $(\tilde{z}_{0},\tilde{a}_{0},\ldots ,\tilde {z}_{\ell})$ with $\tilde{a}_{j}\in A(\tilde{z}_{j})$ and the payoff of the play $(\tilde{z}_{0},\tilde{a}_{0},\ldots,\tilde{z}_{\ell})$ is $\nu (\tilde{z}_{k},\tilde{a}_{k})+\sum_{j=0}^{\ell-1} w_{j} g(\tilde {z}_{j},\tilde{a}_{j})$, where w _j:=w([t _j,t _j+1)).

Past play is observed by the players. Therefore, a strategy of a player chooses his action at stage j=0,…,ℓ−1 as a function of $(\tilde{z}_{0},\tilde{a}_{0},\ldots,\tilde{z}_{j})$. The conditional probability, given $\tilde{z}_{0},\tilde{a}_{0},\ldots,\tilde {z}_{j},\tilde{a}_{j}$, of $\tilde{z}_{j+1}=z$ is $\Delta_{j} \mu(z, \tilde{z}_{j},\tilde{a}_{j})+I_{\tilde{z}_{j},z}$. It is helpful to view the states transitions in $\varGamma(\tilde {t})$ as those of an “exact” stochastic game whose jth stage duration, 0≤j<ℓ, is Δ_j. The game $\varGamma(\tilde{t})$ has a value $\tilde{V}$ and the players have Markovian optimal strategies.

The value $\tilde{V}$ equals $\tilde{V}_{0}$, where $\tilde{V}_{j}\in \mathbb{R}^{S}$ are defined recursively for 0≤j≤ℓ. For every z∈S, $\tilde{V}_{\ell}(z)=0$, and for 0≤j<ℓ we define $\tilde{V}_{j}(z)$ by

$$\tilde{V}_j(z)=\max_{x^1\in X^1(z)}\min _{x^2\in X^2(z)} \bigl(1_{j=k}\nu (z,x)+ F(z,x,w_j, \Delta_j,\tilde{V}_{j+1}) \bigr), $$

where x=x ¹⊗x ².

Note that for every j<ℓ, $\|\tilde{V}_{j}\|\leq1_{j=k}\|\nu\| +w_{j}\| g\|+\|\tilde{V}_{j+1}\|$, where $\|\nu\|= \max_{(z,a)\in\mathcal {A}}|\nu (z,a)|$. Therefore, by induction on 0≤ℓ−j≤ℓ, $\| \tilde {V}_{j}\|\leq1_{j\leq k}\|\nu\|+\sum_{j'\geq j}w_{j'}\|g\|\leq \|\nu\|+w([0,\infty))\|g\|$.

The Markov strategy $\tilde{\sigma }$ of player 1 in $\varGamma (\tilde{t})$ with $\tilde{\sigma }(z,j)$ maximizing (over all x ¹∈X ¹(z))

$$\min_{x^2\in X^2(z)} \bigl(1_{j=k}\nu\bigl(z,x^1 \otimes x^2\bigr)+ F\bigl(z,x^1\otimes x^2,w_j ,\Delta_j ,\tilde{V}_{j+1} \bigr) \bigr) $$

is an optimal strategy of player 1 in $\varGamma(\tilde{t})$. Indeed, for every strategy τ of player 2 in $\varGamma(\tilde{t})$ and stage 0≤j<ℓ,

$$E^z_{\tilde{\sigma },\tau} \bigl( 1_{j=k}\nu( \tilde{z}_{j},\tilde {a}_{j})+w_j g( \tilde{z}_{j},\tilde{a}_{j}) \bigr)\geq E^z_{\tilde{\sigma },\tau } \bigl(\tilde{V}_j( \tilde{z}_{j})-\tilde{V}_{j+1}(\tilde{z}_{j+1}) \bigr). $$

Therefore, by summing these inequalities over 0≤j<ℓ, we have

$$E^z_{\tilde{\sigma },\tau} \biggl(\nu(\tilde{z}_{k},\tilde {a}_{k})+\sum_{0\leq j<\ell}w_j g( \tilde{z}_{j},\tilde{a}_{j}) \biggr)\geq \tilde{V}_0(z). $$

The second stage of the proof is to associate with $\tilde{t}$, $\tilde {\sigma }$, and δ>0, a sequence $\tilde{m}_{\delta}=(m_{\delta ,0}=0<m_{\delta,1}<\cdots<m_{\delta,\ell})$, a Markov strategy σ _δ in Γ _δ, and a nonstationary discounting measure $\tilde{w}_{\delta}$, as follows.

For m _δ,j≤m<m _δ,j+1, $\sigma _{\delta}(z,m)= \tilde {\sigma }(z,j)$, for m≥m _δ,ℓ, σ _δ(z,m) coincides with an arbitrary stationary strategy, m _δ,k=m _δ, m _δ,j=[t _j/δ] for j≠k (thus δm _δ,j→_δ→0+ t _j for all 0≤j<ℓ), $\tilde{w}_{\delta}(m)=w_{\delta}(m)$ for m≥m _δ,ℓ, and $\tilde{w}_{\delta}(m)=\frac {1}{m_{\delta ,j+1}-m_{\delta,j}}\sum_{m_{\delta,j}\leq m<m_{\delta,j+1}}w_{\delta}(m)$ for m _δ,j≤m<m _δ,j+1 and j<ℓ.

Note that $\tilde{w}_{\delta}$ is a nonstationary discounting measure that converges, as δ→0+, to w.

Consider the family of games $\tilde{\varGamma}_{\delta,\tilde {w}_{\delta} }^{m_{\delta}, \nu_{\delta}}$ with $\tilde{g}_{\delta}=\delta g$ and $\tilde{p}_{\delta}=\delta\mu$. By Lemma 3, for every ε>0, there is a sufficiently small d>0 such that if $\tilde{t}$ is such that $d(\tilde {t})<d$ and w([t _ℓ,∞))<d, then for sufficiently small δ>0, the Markov strategy σ _δ guarantees in $\tilde {\varGamma }_{\delta,\tilde{w}_{\delta}}^{m_{\delta}, \nu_{\delta}}$ a payoff that is at least $\tilde{V}-\varepsilon $. Therefore, for sufficiently small δ>0, the Markov strategy σ _δ guarantees in $\varGamma_{\delta,\tilde{w}_{\delta} }^{m_{\delta}, \nu _{\delta}}$ a payoff that is at least $\tilde{V}-2\varepsilon $.

Note that for sufficiently small δ>0, $P^{z}_{\sigma} (z_{m}=z\ \forall m\leq m_{\delta,1})\geq1-d \|\mu\|$ for every strategy profile σ and state z. Therefore, if $d\|\mu\| \|\tilde{V}_{1}\|<\varepsilon /4$, for sufficiently small δ>0, for every strategy τ of player 2, we have $E^{z}_{\sigma _{\delta},\tau}(\tilde{V}_{1}(z_{m_{\delta,1}})+\sum_{m<m_{\delta ,1}}w_{\delta}(m)g_{\delta}(z_{m},a_{m}))\geq\tilde{V}_{0}(z)-2d\|\mu\| \| \tilde {V}_{1}\|-\varepsilon /2>\tilde{V}_{0}(z)-\varepsilon $.

By Lemma 2, $\sum_{m\geq m_{\delta,1}}|\tilde{w}_{\delta}(m)-w_{\delta}(m)|\to0$ as δ→0+. If δ>0 is sufficiently small so that $\sum_{m\geq m_{\delta ,1}}|\tilde{w}_{\delta}(m)-w_{\delta}(m)|<\varepsilon $, then σ _δ guarantees in $\varGamma_{\delta,w_{\delta}}^{m_{\delta}, \nu_{\delta}}$ a payoff that is at least $\tilde{V}-3\varepsilon -\varepsilon \|g\|$. By the construction of σ _δ, σ _δ converges to a continuous-time Markov strategy.

Similarly, we associate with the Markov strategy $\tilde{\tau}$ (and δ>0) a Markov strategy τ _δ that for δ>0 sufficiently small guarantees in $\varGamma_{\delta,w_{\delta} }^{m_{\delta}, \nu_{\delta}}$ a payoff that is at most $\tilde{V}+3\varepsilon +\varepsilon \|g\|$ while τ _δ converges to a continuous-time strategy τ. □

4.3 The Asymptotic Limiting-Average Value

Recall that the family (Γ _δ)_δ>0 has an asymptotic limiting-average value v if for every ε>0 there are δ ₀>0 sufficiently small and strategies σ _δ and τ _δ in Γ _δ, such that for every strategy pair (σ ^∗,τ ^∗), every initial state z, and every 0<δ<δ ₀, we have

$$ \varepsilon +E^z_{\sigma _{\delta} ,\tau ^{*}} \underline{g}_{\delta}\geq v(z)\geq-\varepsilon + E^z_{\sigma ^{*},\tau _{\delta} }\bar {g}_{\delta}. $$

(19)

Theorem 4

A family (Γ _δ)_δ>0 that converges strongly has an asymptotic limiting-average value.

Proof

Let g=lim_δ→0+ g _δ/δ and μ=lim_δ→0+ p _δ/δ. As the function ρ↦v _ρ(g,μ) is semialgebraic and bounded, it converges to a limit v as ρ→0+. Fix ε>0. As every discrete-time stochastic game with finitely many states and actions has a limiting-average value [5], which is the limit of its ρ-discounted values as ρ goes to 0+, there are strategies σ _δ of player 1 and τ _δ of player 2, such that for every strategy pair (σ ^∗,τ ^∗) and every initial state z∈S,

$$ \varepsilon /2+E^z_{\sigma _{\delta} ,\tau^{*}} \underline{g}_{\delta}\geq\lim_{\rho\to0+}{v}_{\delta ,\rho }(z) \geq-\varepsilon /2 + E^z_{\sigma ^{*},\tau_{\delta}}\bar{g}_{\delta}. $$

(20)

As v _δ,ρ→v _ρ(g,μ) uniformly in ρ, there is δ ₀>0 such that for every 0<δ<δ ₀ and every state z∈S, |v _δ,ρ(z)−v _ρ(g,μ)(z)|<ε/2. Therefore, for 0<δ<δ ₀, |lim_ρ→0+ v _δ,ρ(z)−v(z)|≤ε/2, which together with (20) implies (19). □

Remark 10

A family (Γ _δ)_δ>0 that converges in data need not have an asymptotic limiting-average value.

For example, consider a game with two states and a single action for each player in each state. The payoff in state one is 1 and in state 2 it is 0. State 2 is absorbing, i.e., P _δ(1∣2)=0, and the probability of transition from state 1 to state 2, P _δ(2∣1), equals δ ² if δ is rational, and it equals 0 if δ is irrational. Then v _δ,0=0 if δ is rational, and v _δ,0=1 if δ is irrational. Therefore, v _δ,0 does not converge as δ goes to 0.

4.4 The Asymptotic Mixed Discounting and Limiting-Average Value

For every positive measure w _δ on $\mathbb{N}\cup\{\infty\}$, $\varGamma_{\delta,w_{\delta}}$ is the game Γ _δ where the valuation of a play (z ₀,a ₀,z ₁,…) of Γ _δ is given by $\sum_{m=0}^{\infty} w_{\delta}(m)g_{\delta}(z_{m},a_{m})+w_{\delta} (\infty )\lim_{s\to\infty}g_{\delta}(s)$, if the limit exists. Obviously, the limit need not exist.

We say that the two-person zero-sum game $\varGamma_{\delta,w_{\delta}}$ has a value $V_{\delta,w_{\delta} }$, if for every ε>0 there are strategies σ _δ of player 1 and τ _δ of player 2, such that for every strategy τ of player 2, strategy σ of player 1, and initial state z, we have

$$E^z_{\sigma _{\delta},\tau} \Biggl(w_{\delta}(\infty)\underline {g}_{\delta} +\sum_{m=0}^{\infty} w_{\delta}(m)g_{\delta}(z_m,a_m) \Biggr)\geq V_{\delta ,w_{\delta}}(z)-\varepsilon $$

and

$$E^z_{\sigma ,\tau_{\delta}} \Biggl(w_{\delta}(\infty) \bar{g}_{\delta} +\sum_{m=0}^{\infty} w_{\delta}(m)g_{\delta}(z_m,a_m) \Biggr)\leq V_{\delta ,w_{\delta}}(z)+\varepsilon . $$

Theorem 5

If Γ _δ converges strongly and the nonstationary discounting measure w _δ converges to a positive measure w on [0,∞], and w _δ(∞) converges to w(∞), then $V_{\delta,w_{\delta}}$ converges.

Proof

The proof is obtained by collating the result of Theorem 3 with the result of Theorem 4. Let 0<ε<1. Let 0<t<∞ be sufficiently large so that 2w([t,∞))∥g∥<ε, and let w _t be the restriction of w to the interval [0,t). Let v be the asymptotic limiting-average value of the family (Γ _δ)_δ>0, and define $\nu: \mathcal{A}\to\mathbb{R}$ by ν(z,a)=w(∞)v(z). The family (Γ _δ)_δ>0 has an asymptotic (w _t,t,ν) value V.

Assume that the nonstationary discounting measure w _δ converges to w and w _δ(∞) converges to w(∞). Let m _δ=[t/δ] and let w _δ,t be the restriction of w _δ to {0,1,…,m _δ}.

The value $V_{\delta, w_{\delta,t}}^{m_{\delta},\nu}$ of the game $\varGamma _{\delta, w_{\delta,t}}^{m_{\delta},\nu}$ converges to V. Recall that as Γ _δ converges strongly, the limiting-average value of the game Γ _δ, which is denoted by v _δ,0, converges as δ goes to zero to v. Let δ ₀ be sufficiently small so that for 0<δ<δ ₀, (1) $\|V_{\delta, w_{\delta ,t}}^{m_{\delta},\nu}-V\|<\varepsilon $, (2) ∥v _δ,0−v∥<ε, (3) ∥w _δ(∞)−w(∞)∥<ε, and (4) $\|g\|\sum_{m=m_{\delta} }^{\infty} w_{\delta}(m)|<\varepsilon $.

Let σ _δ follow an optimal strategy in $\varGamma _{\delta, w_{\delta,t}}^{m_{\delta},\nu}$ up to stage m _δ, and thereafter it “restarts” with an ε-optimal strategy in the limiting-average game Γ _δ. It follows that for every 0<δ<δ ₀ and strategy τ of player 2,

$$E^z_{\sigma _{\delta},\tau} \Biggl(w_{\delta}(\infty) \bar{g}_{\delta} +\sum_{m=0}^{\infty} w_{\delta}(m)g_{\delta}(z_m,a_m) \Biggr)\geq V(z)-\varepsilon w(\infty )-3\varepsilon . $$

Similarly, if τ _δ follows an optimal strategy in $\varGamma _{\delta, w_{\delta,t}}^{m_{\delta},\nu}$ up to stage m _δ, and thereafter it “restarts” with an ε-optimal strategy in the limiting-average game Γ _δ, then for every 0<δ<δ ₀ and strategy σ of player 1,

$$E^z_{\sigma ,\tau_{\delta}} \Biggl(w_{\delta}(\infty) \bar{g}_{\delta} +\sum_{m=0}^{\infty} w_{\delta}(m)g_{\delta}(z_m,a_m) \Biggr)\leq V(z)+\varepsilon w(\infty )+3\varepsilon . $$

□

4.5 The Asymptotic Uniform and w-Robust Value

Theorem 6

An exact family of two-person zero-sum games Γ _δ has an asymptotic uniform value.

Proof

Let v=lim_δ→0+ v _δ,0. It is sufficient to prove that for every ε>0 there are (1) a duration δ ₀>0, (2) strategies σ _δ of player 1 and τ _δ of player 2, and (3) a positive real number s _ε, such that for every strategy τ of player 2, strategy σ of player 1, 0<δ<δ ₀, and s>s _ε we have

$$ E^z_{\sigma _{\delta},\tau}g_{\delta}(s)\geq v(z)-\varepsilon , $$

(21)

and

$$ E^z_{\sigma ,\tau_{\delta}}g_{\delta}(s)\leq v(z)+\varepsilon . $$

(22)

By duality, it suffices to prove (21).

Let $A=\max\{|g(z,a)|: (z,a)\in\mathcal{A}\}$, and g _δ=δg.

The first step is to show that for an exact family Γ _δ the following property holds. There is an integrable function $\psi :[0,1]\to\mathbb{R}_{+}$ and δ ₀>0 sufficiently small such that for 0<ρ<ρ′≤1 and 0<δ<δ ₀, we have

$$ \|v_{\delta,\rho}-v_{\delta,\rho '}\| \leq\int _{\rho}^{\rho'}\psi(x)\,dx. $$

(23)

The second step is to show that if the family Γ _δ of two-person zero-sum games satisfies the above-mentioned property, then it has an asymptotic uniform value.

We start with the first step. Fix the payoff function g and the transition rates μ. By the covariance properties, $V_{\delta,\rho}=\bar{V}_{\rho\delta}(\delta g, \delta\mu )=V_{\rho\delta }(\delta g, (1-\rho\delta)\delta\mu)= \frac{\delta}{(1-\rho\delta)\delta}V_{\frac{\delta\rho }{(1-\delta\rho )\delta}}(g,\mu)=\frac{1}{1-\rho\delta} V_{\frac{\rho}{(1-\rho\delta)}}(g,\mu)$. Therefore,

$$v_{\delta,\rho}=v_{\frac{\rho}{(1-\rho\delta)}}(g,\mu). $$

The function ρ↦v _ρ:=v _ρ(g,μ) (is semialgebraic and thus) has a convergent expansion, $v_{\rho}(z)=\sum_{k=0}^{\infty} c_{k}(z)\rho^{k/K}$ (where K is a positive integer), in a right neighborhood of 0. Therefore, there is 1/2>ρ ₀>0 such that its derivative, $v'_{\rho}(z):=\frac{d}{d\rho}v_{\rho}(z)$, exists in the interval (0,2ρ ₀], and its absolute value is bounded by a positive constant C ₁ times ρ ^1/K−1. Therefore, for δ<1/4, the derivative $\frac{d}{d\rho}v_{\frac{\rho}{(1-\rho\delta)}}$ of the function $(0,\rho_{0}]\ni\rho\mapsto v_{\frac{\rho}{(1-\rho\delta )}}:=v_{\frac{\rho }{(1-\rho\delta)}}(g,\mu)$ equals $\frac{1}{(1-\rho\delta)^{2}} v'_{\frac {\rho}{(1-\rho\delta)}}$; thus, it is bounded (in the interval (0,ρ ₀]) by a positive constant C ₂ times ρ ^1/K−1. (E.g., C ₂=2C ₁.) The function ρ↦v _δ,ρ is (2A/ρ ₀)-Lipschitz in ρ in the interval (ρ ₀,1] (∥v _δ,ρ−v _δ,θ∥≤2A|ρ−θ|/ρ, e.g., by [5, Lemma 4.2]). The function ψ that is defined by ψ(x)=2C ₁ x ^1/K−1 for 0<x≤ρ ₀ and ψ(x)=2A/ρ ₀ for 1≥x>ρ ₀ is integrable and satisfies (23).

We turn now to the second step. Let Γ _δ be a converging family, $\psi:[0,1]\to\mathbb{R}_{+}$ be an integrable function, and δ ₀>0, such that for 0<ρ<ρ′≤1 and 0<δ<δ ₀, inequality (23) holds.

Fix ε>0 and w.l.o.g. we assume that 0<ε<A. Fix δ ₀>0 and λ ₀>0 sufficiently small so that for 0<δ<δ ₀ and 0<ρ<λ ₀, ∥v _δ,ρ−v∥<ε.

Fix 0<δ<δ ₀. We apply the proof of the existence of a value of the discrete-time stochastic game 〈δg,δμ〉, [5, Sect. 2]. In what follows, we define a strategy σ _δ of player 1 in Γ _δ. We will define a sequence $(\rho_{k})_{m=0}^{\infty}$ so that ρ _k is a function of the past history up to stage k[1/δ], i.e., measurable with respect to the σ-algebra $\mathcal {F}_{k}:=\mathcal{H}_{k[1/\delta]}$ where [∗] stands for the largest integer that is ≤. The $(\rho_{k})_{k=0}^{\infty}$ strategy σ _δ of player 1 is to play a stationary optimal strategy in $\varGamma _{\delta,\rho_{k}}$ at stage k[1/δ]≤m<(k+1)[1/δ]. Let

For every strategy τ of player 2, we have

$$E_{\sigma _{\delta},\tau}\bigl(\rho_k y_k+ (1-\delta \rho_k)^{[1/\delta ]}v_{\delta ,\rho_k}(\bar{z}_{k+1})\mid \mathcal{F}_k\bigr)\geq v_{\delta,\rho _k}(\bar{z}_k). $$

Note that for every ε>0 there is λ ₀>0 and δ ₀ such that for 0<ρ _k<λ ₀ and 0<δ<δ ₀ we have

$$\sum_{k[1/\delta]\leq m < (k+1)[1/\delta]}\bigl|(1-\delta\rho _k)^{m-k[1/\delta]}-1\bigr| \delta\rho_k+\bigl|(1-\delta\rho_k)^{[1/\delta]}- (1- \rho_k)\bigr|\leq \varepsilon \rho_k/A. $$

It follows that for 0<δ<δ ₀ and 0<ρ _k<λ ₀ we have

$$ E_{\sigma _{\delta},\tau}\bigl(v_{\delta ,\rho _k}(\bar {z}_{k+1})-v_{\delta,\rho_k}( \bar{z}_k)+\rho_k \bigl(x_k-v_{\delta,\rho _k}( \bar{z}_{k+1})\bigr)\mid\mathcal{F}_k\bigr)\geq-\varepsilon \rho_k $$

(24)

for every strategy τ of player 2. Now one follows the proof of [5, Sect. 2] by replacing inequality [5, (2.1)] with inequality (24). The index i in [5, Sect. 2] is replaced by our stage index k (λ _i by ρ _k, v _λ by v _δ,ρ, and z _i by $\bar{z}_{k}$).

With these substitutions, inequality [5, (2.15)] becomes

$$ \sum_{k<n}x_k\geq\sum _{k<n} v_{\delta,\rho_k}(\bar {z}_{k+1})+s_n-s_0-2A \sum_{k<n}I(s_{k+1}=M)-4n\varepsilon . $$

(25)

Note that the term −ερ _k in inequality (24) does not appear in [5, (2.1)]. It impacts inequality [5, (2.9)] as −ερ _k needs to be added to its right side. Therefore, we have to replace [5, (2.9)] with $E(Y_{k+1}-Y_{k}\mid\mathcal{F}_{k})\geq \varepsilon \rho_{k}$ (where E stands for $E_{\sigma _{\delta},\tau}$) and, therefore, $E \#\{k: \rho_{k}\geq \eta\} \leq \frac{A}{\varepsilon \eta}$ (rather than $\leq\frac{2A}{\varepsilon \eta}$ in [5, (2.12)]). Therefore, $E\sum_{k<n}I(s_{k+1}=M)\leq\frac{A}{\varepsilon \lambda(M)}$ and, therefore, for n sufficiently large E∑_k<n I(s _k+1=M)≤εn/(2A).

For δ and ρ sufficiently small, ∥v _δ,ρ−v∥≤ε, where v=lim_δ→0 v _δ,0. Therefore, inequality (25) implies that

$$ E_{\sigma _{\delta},\tau}\sum_{k<n}x_k \geq nv(z_0)-3\varepsilon n -s_0-\varepsilon n-4n\varepsilon . $$

(26)

□

Remark 11

Note that the inequality $E \sum_{k<n}I(s_{k+1}=M)\leq \frac {A}{\varepsilon \lambda(M)}$ (in the above proof) implies that ∑_k<∞ I(s _k+1=M) is finite a.s. Therefore,

$$E_{\sigma _{\delta},\tau}\biggl(\liminf_{n\to\infty}\frac{1}{n}\sum _{m<n}g(z_m,a_m) \biggr)=E_{\sigma _{\delta},\tau}\biggl(\liminf_{n\to\infty }\frac {1}{n} \sum_{k<n}x_k\biggr)\geq v(z_0)-7\varepsilon . $$

This shows that the above-constructed strategy σ _δ of player 1 is approximate optimal in both the uniform game and the limiting-average game. Therefore, an exact family of two-person zero-sum games Γ _δ has an asymptotic 1_∞-robust value.

Theorem 7

For every nonstationary discounting measure w on [0,∞], an exact family of two-person zero-sum games Γ _δ has an asymptotic w-robust value.

Proof

If w(∞)=0, then an asymptotic w value is a w-robust value. Therefore, it suffices to prove the result for w with w(∞)>0. For every β>0, the family (Γ _δ)_δ>0 has an asymptotic w-robust value if and only if it has an asymptotic βw-robust value. Therefore, we may assume that w(∞)=1.

Let ν be the asymptotic 1_∞-robust value of the exact family (Γ _δ)_δ>0. Fix ε>0 and let τ _δ be a family of strategy profiles that are ε-optimal in the 1_∞-robust game. Let t=t _ε<∞ be sufficiently large so that w([t,∞))<ε/∥g∥. The family (Γ _δ)_δ>0 has an asymptotic (w _t,t,ν) value v _ε, where w _t is the restriction of w to the interval [0,t). Let m _δ=[t/δ] and let τ _δ be a profile of strategies that is optimal in $\varGamma^{m_{\delta} ,\nu }_{\delta,w_{\delta,t}}$, where w _δ,t is the nonstationary discounting measure that satisfies w _δ,t(m)=w([mδ,(m+1)δ)) if m<m _δ and w _δ,t(m)=0 otherwise.

The strategy profile σ _δ follows the strategy profile τ _δ,t in stages 0≤m<m _δ and in stage m _δ starts following the strategy profile τ _δ (explicitly, $\sigma _{\delta}(z_{0},a_{0},\ldots,z_{m_{\delta}+k})=\tau_{\delta}(z_{m_{\delta} },\ldots ,z_{m_{\delta}+k})$).

Then for every player i, all strategies $\bar{\tau}^{i}_{\delta}$ (δ>0) of player i, and all nonstationary discounting measures w _δ on $\mathbb{N}\cup\{\infty\}$ that converge (as δ→0+) to w, we have

$$2\varepsilon +\liminf_{\delta\to0+}E^z_{\sigma ^1_{\delta},\bar{\tau }^2_{\delta}} \underline{g}_{\delta}(w_{\delta})\geq v_{\varepsilon }(z)\geq-2\varepsilon + \liminf_{\delta \to0+}E^z_{\bar{\tau}^1_{\delta},\sigma ^2_{\delta},} \bar{g}_{\delta} (w_{\delta}). $$

A limit point (as ε→0+) of v _ε is an asymptotic w-robust value of the family (Γ _δ)_δ>0. □

5 Non-zero-Sum Stochastic Games with Short-Stage Duration: The Discounted Games

5.1 The Asymptotic Discounted Equilibrium

Fix the sets of players N, states S, and actions A, and let Γ _δ=〈N,S,A,g _δ,p _δ〉 be a stochastic game whose stage payoff function g _δ and transition function p _δ depend on the parameter δ>0 that represents the single-stage duration. Let Γ _δ,ρ be the (unnormalized) discounted game Γ _δ with discount factor 1−ρδ. We say that pair (V,σ), where $V\in\mathbb {R}^{N\times S}$ is a payoff vector and σ is a strategy profile, is an asymptotic ρ-discounted ε-equilibrium of (Γ _δ)_δ>0 if for every δ>0 sufficiently small, every player i∈N, every strategy τ ⁱ of player i in Γ _δ, and every state z,

$$-\varepsilon +E^z_{\delta,\sigma ^{-i},\tau^i}\sum_{m=0}^{\infty}(1- \delta \rho)^m g^i_{\delta}(z_m,a_m) \leq V^i(z)\leq E^z_{\delta,\sigma }\sum _{m=0}^{\infty} (1-\delta\rho)^m g^i_{\delta}(z_m,a_m)+\varepsilon . $$

The pair (V,σ) is an asymptotic ρ-discounted equilibrium if it is an asymptotic ρ-discounted ε-equilibrium for every ε>0. It is called an asymptotic ρ-discounted stationary ε-equilibrium, respectively an asymptotic ρ-discounted stationary equilibrium, if, in addition, σ is stationary.

Theorem 8

Every converging family (Γ _δ)_δ>0 has an asymptotic ρ-discounted stationary equilibrium.

Proof

Let σ be a stationary strategy and $V_{\rho}\in\mathbb {R}^{N\times S}$ such that for every z∈S, i∈N, and a ⁱ∈A ⁱ(z), we have

$$\rho V(z)=g\bigl(z,\sigma (z)\bigr)+ \sum_{z'\in S}\mu \bigl(z',z,\sigma (z)\bigr)V\bigl(z'\bigr), $$

and

$$\rho V^i(z)\geq g^i\bigl(z,\sigma (z)^{-i},a^i \bigr)+ \sum_{z'\in S}\mu \bigl(z',z,\sigma (z)^{-i},a^i\bigr)V\bigl(z'\bigr). $$

The existence of such a pair (V,σ) follows (as in the proof of Theorem 1) from the existence of stationary equilibria in discounted discrete-time stochastic games; alternatively, see, e.g., [8].

Let τ ⁱ be a strategy of player i. Fix an initial history h _m=(z ₀,a ₀,…,z _m), and let y _m=σ(z _m), $x_{m}^{i}=\tau^{i}(h_{m})$, and $x_{m}=\sigma ^{-i}(z_{m})\otimes x^{i}_{m}$. Let

and

It follows that

Therefore,

$$E^z_{\delta,\sigma } \sum_{m=0}^{\infty}(1- \rho\delta )^mg^i_{\delta} (z_m,a_m) \geq V^i(z_0) -o(\delta) \sum _{m=0}^{\infty}(1- \rho\delta )^m\to _{\delta\to0+} V^i(z_0). $$

Similarly,

$$\begin{aligned} U_m &\leq \delta g^i(z_m,a_m)+ \sum_{z'\in S}\delta\mu \bigl(z',z,x_m \bigr)V^i\bigl(z'\bigr)-\rho\delta V^i(z_m)+V^i(z_m)+o( \delta) \\ &\leq V^i(z_m)+o(\delta). \end{aligned} $$

Therefore,

$$E^z_{\delta,\sigma ^{-i},\tau^i} \sum_{m=0}^{\infty}(1- \rho \delta )^mg^i_{\delta}(z_m,a_m) \leq V^i(z) +o(\delta) \sum_{m=0}^{\infty}(1- \rho \delta)^m\to_{\delta\to0+} V^i(z). $$

We conclude that for sufficiently small δ>0 we have

$$-\varepsilon + E^z_{\delta,\sigma ^{-i},\tau^i} \sum_{m=0}^{\infty}(1- \rho \delta )^mg^i_{\delta}(z_m,a_m) \leq V^i(z)\leq E^z_{\delta,\sigma } \sum _{m=0}^{\infty} (1- \rho\delta)^mg^i_{\delta}(z_m,a_m)+ \varepsilon . $$

□

Remark 12

The conclusion of Theorem 8 (as well as its proof) holds also for the model with individual discount rates $\vec {\rho}=(\rho_{i})_{i\in N}$.

Covariance Properties

Fix α,β>0. A point (x,V)∈×_z∈S,i∈N(X ⁱ(z)×[−∥g ⁱ∥/ρ,∥g ⁱ∥/ρ]) is a stationary equilibrium (strategies and payoffs) of the continuous-time ρ-discounted game Γ=〈N,S,A,g,μ〉 if and only if (x,V) is a stationary equilibrium of the continuous-time αρ-discounted game Γ=〈N,S,A,αg,αμ〉, and given 0<ρ<1 and ∥μ∥≤1−ρ, if and only if it is a stationary equilibrium of the discrete-time ρ-discounted game $\bar {\varGamma}=\langle N,S,A,g,\bar{p}\rangle$, where $\bar{p}$ is the transition probability that is given by $\bar{p}(z', z,a)= \frac {1}{1-\rho}\mu(z',z,a)$ for all z′≠z.

5.2 The Asymptotic Discounted Minmax

Fix the sets of players N, states S, and actions A, and let Γ _δ=〈N,S,A,g _δ,p _δ〉 be a stochastic game whose stage payoff function g _δ and transition function p _δ depend on the parameter δ>0 that represents the single-stage duration. The (unnormalized) ρ-discounted minmax of the discrete-time game Γ _δ is defined as the (uncorrelated) minmax of the discrete-time stochastic game Γ _δ with discount factor 1−δρ. It exists and is denoted by V _δ,ρ. We say that $V_{\rho}\in\mathbb{R}^{N\times S}$ is the (unnormalized) asymptotic ρ-discounted minmax of the family (Γ _δ)_δ>0 if V _δ,ρ→V _ρ as δ→0+.

Using arguments analogous to those used in earlier sections, it follows that (1) $V_{\delta,\rho}=(V^{i}_{\delta,\rho}(z))_{(i,z)\in N\times S}$ is the unique solution of the following system of |N×S| equalities,

$$\delta\rho V^i(z)=\min_{x^{-i}\in X^{-i}(z)}\max _{x^i\in X^i(z)}g^i_{\delta}\bigl(z,x^{-i} \otimes x^i\bigr)+(1-\delta\rho)\sum_{z'\in S}p_{\delta} \bigl(z',z,x^{-i}\otimes x^i \bigr)V^i\bigl(z'\bigr), $$

where X ⁻ⁱ(z):=×_j≠i X ⁱ(z), (2) a family (Γ _δ)_δ>0 (Γ _δ=〈g _δ,p _δ〉 for short) that converges to 〈g,μ〉 has an asymptotic ρ-discounted minmax V _ρ, and (3) $V_{\rho}=(V^{i}_{\rho}(z))_{(i,z)\in N\times S}$ is the unique solution of the system of |N×S| equalities,

$$\rho V^i(z)=\min_{x^{-i}\in X^{-i}}\max_{x^i\in X^i(z)}g \bigl(z,x^{-i}\otimes x^i\bigr)+\sum _{z'\in S}\mu\bigl(z',z,x^{-i}\otimes x^i\bigr)V^i\bigl(z'\bigr). $$

The normalized ρ-discounted minmax values are v _ρ=ρV _ρ and v _δ,ρ=ρV _δ,ρ. The semialgebraic and covariance properties of the value of zero-sum games hold for the minmax value of non-zero-sum games as well.

In particular, for fixed g _δ, p _δ, g, and μ, the maps ρ↦v _ρ and ρ↦v _δ,ρ are bounded semialgebraic functions, and thus have a limit as ρ→0+, the maps ρ↦V _ρ and ρ↦V _δ,ρ are semialgebraic, v _δ,ρ(g _δ,p _δ)=v _ρ(g _δ/δ,(1−ρδ)p _δ/δ), inequality (16) holds, and if Γ _δ=〈g _δ,p _δ〉 converges strongly to Γ=〈g,μ〉, then v _δ,ρ converges, as δ→0+, uniformly in ρ.

5.3 The Asymptotic Equilibrium of Nonstationary Discounting Games

The following theorem is a generalization of Theorem 3 to the non-zero-sum case. Its proof is analogous to the proof of Theorem 3.

Theorem 9

If (1) (Γ _δ)_δ>0 is a family that converges in data, (2) $\vec {w}$ is a nonstationary discounting N-vector measure on [0,∞), (3) $t\in\mathbb{R}$, and (4) $\nu: \mathcal {A}\to\mathbb{R}^{N}$, then the family (Γ _δ)_δ>0 has an asymptotic $(\vec {w},t,\nu)$ equilibrium payoff v. If (1) $\vec {w}_{\delta}$ is a nonstationary discounting N-vector measure on $\mathbb{N}$ that converges to $\vec {w}$, and (2) $0\leq m_{\delta}\in\mathbb{N}$ and $\nu_{\delta}: \mathcal{A}\to\mathbb{R}^{N}$ are such that (m _δ,ν _δ)→_δ→0+(t,ν), then for every ε>0, there are Markov strategy profiles σ _δ and δ ₀>0 such that (1) for every 0<δ<δ ₀, σ _δ is an ε-equilibrium of $\varGamma _{\delta ,\vec {w}_{\delta}}^{m_{\delta},\nu_{\delta}}$ with an equilibrium payoff within ε of v, and (2) σ _δ converge w ^∗ to a profile of continuous-time Markov strategies.

6 Non-zero-Sum Stochastic Games with Short-Stage Duration: The Limiting-Average and Uniform Games

Fix the sets of players N, states S, and actions A, and let Γ _δ=〈N,S,A,p _δ,g _δ〉 be a stochastic game whose stage payoff function g _δ and transition function p _δ depend on the parameter δ>0 that represents the single-stage duration.

For every strategy profile σ in Γ _δ, we set

$$\bar{\gamma}^i_{\delta}(z,\sigma )=E^z_{\delta,\sigma } \,\bar {g}^i_{\delta}, \quad \mbox{and}\quad \underline{ \gamma}^i_{\delta}(z,\sigma )=E^z_{\delta,\sigma }\, \underline {g}^i_{\delta}. $$

6.1 The Asymptotic Limiting-Average and Uniform Minmax

We say that the vector $v\in\mathbb{R}^{N\times S}$ is the asymptotic limiting-average minmax of the family (Γ _δ)_δ>0 if for every ε>0 there is δ ₀>0 such that for every player i and 0<δ<δ ₀, (1) there is a strategy profile $\sigma _{\delta,\varepsilon }^{-i}$ of players N∖{i} such that for every strategy τ ⁱ of player i and every state z∈S,

$$\bar{\gamma}^i_{\delta}\bigl(z,\sigma _{\delta,\varepsilon }^{-i}, \tau^i\bigr)\leq v^i(z)+\varepsilon , $$

and (2) for every strategy profile $\sigma _{\delta}^{-i}$ of players N∖{i} there is a strategy τ ⁱ of player i such that for every state z∈S,

$$\underline{\gamma}^i_{\delta}\bigl(z,\sigma _{\delta}^{-i},\tau^i\bigr) \geq v^i(z)- \varepsilon . $$

We say that the vector $v\in\mathbb{R}^{N\times S}$ is the asymptotic uniform minmax of the family (Γ _δ)_δ>0 if for every ε>0 there are δ ₀>0 and s ₀>0 such that for every player i and 0<δ<δ ₀, (1) there is a strategy profile $\sigma _{\delta,\varepsilon }^{-i}$ of players N∖{i} such that for every strategy τ ⁱ of player i, state z∈S, and duration s>s ₀,

$$E^z_{\sigma _{\delta,\varepsilon }^{-i},\tau^i}g^i_{\delta}(s)\leq v^i(z)+\varepsilon , $$

and (2) for every strategy profile $\sigma _{\delta}^{-i}$ of players N∖{i} there is a strategy τ ⁱ of player i such that for s>s ₀,

$$E_{\sigma _{\delta}^{-i},\tau^i}g^i_{\delta}(s) \geq v^i(z)-\varepsilon . $$

We say that the vector $v\in\mathbb{R}^{N\times S}$ is the asymptotic robust minmax of the family (Γ _δ)_δ>0 if for every ε>0 there are δ ₀>0 and s ₀>0 such that for every player i and 0<δ<δ ₀, (1) there is a strategy profile $\sigma _{\delta,\varepsilon }^{-i}$ of players N∖{i} such that for every strategy τ ⁱ of player i, state z∈S, and duration s>s ₀,

$$E^z_{\sigma _{\delta,\varepsilon }^{-i},\tau^i}g^i_{\delta}(s)\leq v^i(z)+\varepsilon \quad \mbox{and}\quad \bar{\gamma}^i_{\delta} \bigl(z,\sigma _{\delta,\varepsilon }^{-i},\tau ^i\bigr)\leq v^i(z)+\varepsilon , $$

and (2) for every strategy profile $\sigma _{\delta}^{-i}$ of players N∖{i} there is a strategy τ ⁱ of player i, such that for every state z∈S and duration s>s ₀,

$$E^z_{\sigma _{\delta}^{-i},\tau^i}g^i_{\delta}(s) \geq v^i(z)-\varepsilon \quad \mbox{and}\quad \underline{\gamma}^i_{\delta} \bigl(z,\sigma _{\delta }^{-i},\tau ^i\bigr) \geq v^i(z)-\varepsilon . $$

Theorem 10

A family (Γ _δ)_δ>0 that converges strongly to Γ=〈μ,g〉 has a limiting-average minmax $v:S\to\mathbb{R}^{N}$, which is the limit of ρV _ρ as ρ→0+, where V _ρ is the unique solution of the following system of equalities:

$$\rho V^i(z)=\min_{x^{-i}}\max_{y^i} \biggl(g^i\bigl(z,x^{-i},y^i\bigr)+\sum _{z'\in S}\mu\bigl(z',z,x^{-i},y^i \bigr)V^i(z) \biggr),\quad \forall i\in N,\ z\in S. $$

If the family is exact it has an asymptotic robust minmax (and, therefore, an asymptotic uniform minmax as well).

Proof

The proof that a family that converges strongly has an asymptotic limiting-average minmax is analogous to the proof of Theorem 4. Let $\tilde{v}_{\delta}=\lim_{\rho\to0+}v_{\delta,\rho}$.

As every discrete-time stochastic game with finitely many states and actions has a limiting-average minmax [5, 7], which is the limit of its ρ-discounted minmax as ρ goes to 0+, it suffices to prove that $\lim_{\delta\to0+}\tilde{v}_{\delta}$ exists.

As mentioned in the last section, if 〈g _δ,p _δ〉 converges strongly, then v _δ,ρ converges to v _ρ, as δ→0, uniformly in ρ. Therefore, for every ε>0, there is δ ₁>0 such that for 0<δ,δ′≤δ ₁ we have ∥v _δ,ρ−v _δ′,ρ∥<ε and, therefore, $\| \tilde {v}_{\delta}-\tilde{v}_{\delta'}\|\leq \varepsilon $.

The proof that an exact family has an asymptotic minmax is analogous to the proof of Theorem 6. □

6.2 The Asymptotic Limiting-Average Equilibrium

We say that $u=(u^{i}(z))_{i\in N,\,z\in S}\in\mathbb{R}^{N\times S}$ is an asymptotic limiting-average ε-equilibrium payoff of (Γ _δ)_δ>0 if for every δ>0 sufficiently small there is a strategy profile σ _δ, such that for every player i∈N, strategy τ ⁱ of player i, and state z,

$$-\varepsilon +\bar{\gamma}^i_{\delta}\bigl(z,\sigma ^{-i}_{\delta},\tau^i\bigr)\leq u^i(z) \leq \underline{\gamma}^i_{\delta}(z,\sigma _{\delta})+ \varepsilon . $$

Note that it is an asymptotic limiting-average equilibrium payoff if it is an asymptotic limiting-average ε-equilibrium payoff for every ε>0.

Remark 13

Note that the existence of a limiting-average equilibrium, respectively ε-equilibrium, payoff in each one of the games Γ _δ does not imply (and is not implied by) the existence of an asymptotic limiting-average equilibrium, respectively ε-equilibrium, payoff of the family (Γ _δ)_δ>0.

Remark 14

If $u_{\varepsilon }\in\mathbb{R}^{N\times S}$ is an asymptotic limiting-average ε-equilibrium payoff of the family (Γ _δ)_δ>0 and $u\in\mathbb {R}^{N\times S}$, then u is an asymptotic limiting-average ε′-equilibrium payoff of the family (Γ _δ)_δ whenever ε′≥ε+∥u−u _ε∥. Therefore, a limit point, as δ→0+, of asymptotic limiting-average ε-equilibrium payoffs is an asymptotic limiting-average ε′-equilibrium payoff whenever ε′>ε, and a limit point, as ε→0+, of asymptotic limiting-average ε-equilibrium payoffs is an asymptotic limiting-average equilibrium payoff.

Two related equilibrium concepts are the lim sup and the lim inf equilibrium payoffs. We say that $u=(u^{i}(z))_{i\in N,\,z\in S}\in\mathbb{R}^{N\times S}$ is an asymptotic lim sup ε-equilibrium payoff, respectively an asymptotic lim inf ε-equilibrium payoff, of (Γ _δ)_δ>0 if for every δ>0 sufficiently small there is a strategy profile σ _δ, such that for every player i∈N, strategy τ ⁱ of player i in Γ _δ, and state z,

$$-\varepsilon +\bar{\gamma}^i_{\delta}\bigl(z,\sigma ^{-i}_{\delta},\tau^i\bigr)\leq u^i(z) \leq\bar {\gamma}^i_{\delta}(z,\sigma _{\delta})+\varepsilon , $$

respectively,

$$-\varepsilon +\underline{\gamma}^i_{\delta}\bigl(z,\sigma ^{-i}_{\delta},\tau ^i\bigr)\leq u^i(z) \leq\underline{\gamma}^i_{\delta}(z,\sigma _{\delta})+\varepsilon . $$

The corresponding strategies σ _δ are 2ε-equilibrium strategies of Γ _δ with the lim sup, respectively lim inf, payoff function.

We say that $u=(u^{i}(z))_{i\in N,\,z\in S}\in\mathbb{R}^{N\times S}$ is an asymptotic lim sup equilibrium payoff, respectively an asymptotic lim inf equilibrium payoff, if it is an asymptotic lim sup ε-equilibrium payoff, respectively, an asymptotic lim inf ε-equilibrium payoff, for every ε>0.

Remark 15

Obviously, an asymptotic limiting-average equilibrium payoff is an asymptotic lim sup and an asymptotic lim inf equilibrium payoff. However, there are stochastic games with countably many states that have both an asymptotic lim sup equilibrium payoff and an asymptotic lim inf equilibrium payoff, such that, moreover, both payoffs coincide, but have no asymptotic limiting-average equilibrium payoffs.

Remark 16

It is unknown whether every stochastic game with finitely many states and actions has a lim sup, respectively lim inf, equilibrium payoff. In particular, it is unknown whether every stochastic game with finitely many states and actions has a limiting-average equilibrium payoff.

Theorem 11

A family (Γ _δ=〈g _δ,p _δ〉)_δ>0 that converges strongly to Γ=〈g,μ〉 has a limiting-average equilibrium payoff.

Proof

Let (Γ _δ)_δ>0 be a family that converges strongly to Γ=〈μ,g〉. Then $|g^{i}_{\delta}(z,a)-\delta g^{i}(z,a)|=o(\delta)$ and, therefore, $|\frac{1}{n\delta}\sum_{0\leq m<n}g^{i}_{\delta}(z_{m},a_{m}) -\frac{1}{n\delta}\sum_{0\leq m<n}\delta g^{i}(z_{m},a_{m}) |\leq \max_{z,a}|g^{i}_{\delta}(z,a)-\delta g^{i}(z,a)|/\delta =o(1)$ as δ→0+. Therefore, it suffices to prove the theorem for the special case where $g^{i}_{\delta}=\delta g^{i}$. Note that in this special case

$$\frac{1}{n\delta}\sum_{0\leq m<n}g^i_{\delta}(z_m,a_m)= \frac {1}{n\delta }\sum_{0\leq m<n}\delta g^i(z_m,a_m)=\frac{1}{n}\sum _{0\leq m<n}g^i(z_m,a_m). $$

Therefore, $\bar{g}^{i}_{\delta}$ and $\underline{g}^{i}_{\delta}$, as a function of the play z ₀,a ₀,… , are independent of δ. Therefore, we write $\bar{g}^{i}$ and $\underline{g}^{i}$ for short for $\bar{g}^{i}_{\delta}$ and $\underline{g}^{i}_{\delta}$. Without loss of generality, we may assume that 0≤g ⁱ≤1.

By Remark 14, it suffices to prove that for every ε>0 there is a vector $u\in\mathbb{R}^{N\times S}$ that is an asymptotic limiting-average ε-equilibrium payoff.

Fix ε>0 and let u and σ be, respectively, the uniform (and limiting-average) ε/8-equilibrium payoff and the uniform (and limiting-average) ε/8-equilibrium strategy of the continuous-time stochastic game Γ=〈N,S,A,μ,g〉 that are constructed in [8]. In particular, for every state z∈S, player i∈N, and strategy τ ⁱ of player i, we have

$$ u^i(z)+\varepsilon /8\geq E^z_{\sigma} \bar{g}^i\geq E^z_{\sigma} \underline {g}^i\geq u^i(z)-\varepsilon /8, $$

(27)

where $\bar{g}^{i}=\limsup_{s\to\infty}\frac{1}{s} \int_{0}^{s} g^{i}(z_{t},x_{t})\, dt$ and $\underline{g}^{i}=\liminf_{s\to\infty}\frac{1}{s} \int_{0}^{s} g^{i}(z_{t},x_{t})\,dt$, and

$$ E^z_{\sigma ^{-i},\tau^i} \bar {g}^i\leq u^i(z)+\varepsilon /8. $$

(28)

These inequalities follow from (u,σ) being a limiting-average ε/8-equilibrium payoff and strategy profile. (An additional property that follows from the special construction of σ in [8] is that $\bar{g}^{i}= \underline{g}^{i}$ $P^{z}_{\sigma} $ a.e.)

Let $v:S\to\mathbb{R}^{N}$ be the limit of ρV _ρ as ρ→0+, where V _ρ is the asymptotic ρ-discounted minmax. Recall that V _ρ is the unique solution of the following system of equalities:

$$\rho V^i(z)=\min_{x^{-i}}\max_{y^i} \biggl(g^i\bigl(z,x^{-i},y^i\bigr)+\sum _{z'\in S}\mu\bigl(z',z,x^{-i} \otimes y^i\bigr)V^i(z) \biggr), \quad \forall i\in N,\ z\in S. $$

As the strategy profile σ (that is constructed in [8]) is a discretimized strategy (namely, there is a strictly increasing sequence of continuous times t ₀=0<t ₁<t ₂<⋯ , such that t _ℓ→_ℓ→∞∞ and the mixed-action profile selected by σ at time t _ℓ≤t<t _ℓ+1 is a function of the play up to time t _ℓ and the state at time t), it follows that for every ε′>0 and for every player i there is a strategy $\tau^{i}_{\varepsilon '}$ such that $v^{i}(z)-\varepsilon '< E^{z}_{\sigma ^{-i},\tau^{i}_{\varepsilon '}}\underline{g}^{i}$ ($\leq E^{z}_{\sigma ^{-i},\tau^{i}_{\varepsilon '}}\bar{g}^{i}$). Therefore, the inequalities $u^{i}(z)+\varepsilon /8\geq E^{z}_{\sigma ^{-i},\tau^{i}_{\varepsilon '}}\bar{g}^{i}\geq v^{i}(z)-\varepsilon '$ hold for every ε′>0. Therefore,

$$ u^i(z)\geq v^i(z)-\varepsilon /8. $$

(29)

We will prove that u is an asymptotic limiting-average ε-equilibrium payoff of the family (Γ _δ)_δ>0. The construction of the corresponding limiting-average ε-equilibrium strategy profile σ _δ is analogous to the construction of σ in [8]. The continuous-time pure-action strategy profiles $\bar {\tau}$ and $\hat{\tau}$, which are used in [8] in the definition of σ, will be adapted to the discrete-time pure strategies $\bar{\tau}_{\delta}$ and τ _δ, respectively.

The continuous-time pure-action strategy profile $\bar{\tau}$ obeys the following property. There is a sequence of continuous times $\mathcal{T}: 0=t_{0}<t_{1}<\cdots$ (with t _k→_k→∞∞) such that for t _k≤t<t _k+1, $\bar{\tau}(h,t)$ is a function of t, z _t, and the finite sequence of states $\vec {z}_{k}=(z_{t_{0}},\ldots ,z_{t_{k}})$. Therefore, for t _k≤t<t _k+1, we can write $\bar {\tau }(\vec {z}_{k},z_{t},t)$ for $\bar{\tau}(h,t)$.

The corresponding discrete-time pure strategy $\bar{\tau}_{\delta}$ will be such that there is a sequence of stages $\mathcal{T}^{\delta}:0=n_{\delta ,0}<\cdots<n_{\delta,k}<\cdots$ such that (1) δn _δ,k→_δ→0+ t _k, (2) for n _δ,k≤m<n _δ,k+1, $\bar {\tau}_{\delta}(z_{0},a_{0},\ldots,z_{m})$ is a function of m, z _m, and the finite sequence of states $\vec {z}^{\delta}_{k}=(z_{n_{\delta ,0}},\ldots,z_{n_{\delta,k}})$; thus we can write $\bar{\tau }_{\delta} (\vec {z}^{\delta}_{k},z_{m},m)$ for $\bar{\tau}(z_{0},a_{0},\ldots ,z_{m})$, and (3) for fixed $\vec {z}_{k}=\vec {z}^{\delta} _{k}$ ∈S ^k+1, the map $[n_{\delta,k},n_{\delta,k+1})\ni m \mapsto \bar{\tau}_{\delta}(\vec {z}^{\delta}_{k},z_{m},m)$, which (given $\vec {z}^{\delta}_{k}$) is a Markov strategy on this interval of stages, converges w ^∗ to the (given $\vec {z}_{k}$) Markov strategy $[t_{k},t_{k+1})\ni t\mapsto\bar{\tau}(\vec {z}_{k},z_{t},t)$.

This relation between the continuous-time strategy $\bar{\tau}$ and the discrete-time strategy $\bar{\tau}_{\delta}$ implies, by inductive application of Proposition 2, that for every state z and every positive M,

$$\gamma^i_{\delta,M}(z,\bar{\tau}_{\delta}):=E^z_{\bar{\tau}_{\delta} } \frac {1}{[M/\delta]}\sum_{0\leq m<[M/\delta]}g^i(z_m,a_m) \to_{\delta\to 0+}\gamma^i_M(z,\bar{\tau}), $$

where

$$\gamma^i_M(z,\bar{\tau}):=E^z_{\bar{\tau}} \frac{1}{M}\int_0^Mg^i(z_t,x_t) \,dt, $$

and [∗] stands for the largest integer that is less than or equal to ∗.

Definition of $\bar{\tau}_{\delta}$

We map the discrete-time sequence of states z ₀,z ₁,… to a continuous-time (step-function) list of states: for any nonnegative real t≥0 we define z _δ,t=z _[t/δ]. Next, we define the profile of strategies $\bar{\tau}_{\delta}$ in Γ _δ by $\bar {\tau}_{\delta} (z_{0},a_{m},\ldots,z_{m})=\bar{\tau}((z_{\delta,t})_{t\leq m\delta})$.

Properties of $\bar{\tau}_{\delta}$

Recall the definition and properties of the positive integer N ₀, the sufficiently small ε ₁>0, the disjoint subset of states, S ₁, S ₂, and $\bar{S}$, and the pure-action strategy profile $\bar{\tau}$ (that were constructed in [8]). One of the properties of $\bar{\tau}$ is that for every z∈S ₁ and z _s∈C _z:={z′∈S∣v(z′)=v(z)} for all s≤t, $\mu(\bar {S}\cup(S\setminus C_{z}),z_{t},\bar{\tau}_{t})=0$. Therefore, by the definition of $\bar{\tau,}_{\delta}$ we have

$$ P^z_{\bar{\tau}_{\delta}}(z_m\in C_z\setminus \bar{S})=1 \quad \forall z\in S_1,\ m\geq0. $$

(30)

The following inequality^{Footnote 6} is proved in [8]. For z∈S ₁, for every player i,

$$\gamma^i_{N_0}(z,\bar{\tau})\geq v^i(z)-\varepsilon /7, $$

and, therefore, for sufficiently small δ>0,

$$ \gamma^i_{\delta,N_0}(z,\bar {\tau }_{\delta}) \geq v^i(z)-\varepsilon /6. $$

(31)

Definition of τ _δ

We define a stopping time m _δ=m _δ(z ₀,a ₀,z ₁,…) as follows. On z ₀∈S ₁, m _δ=[N ₀/δ]; on $z_{0}\in\bar{S}$, m _δ=[1/δ]; on z ₀=z∈S ₂, $m_{\delta}=\min(\{m: m=[j/\delta],\allowbreak j\in\nobreak \mathbb{N}, \mbox{ and } z_{m}\notin C_{z}\setminus \bar{S}\}\cup\{[N_{0}/\delta]\})$. Define m _k,δ, k≥0 inductively: m _0,δ=0 and $m_{k+1,\delta}=m_{k,\delta}+m_{\delta}(z_{m_{k,\delta }},a_{m_{k,\delta }},z_{m_{k,\delta}+1},\ldots)$.

The strategy profile τ _δ is defined as follows:

$$\tau_{\delta}(z_0,a_0,\ldots,z_m)= \bar{\tau}_{\delta}(z_{m_{k,\delta }},z_{m_{k,\delta}}, \ldots,z_m) \quad \mbox{if } m_{k,\delta}\leq m< m_{k+1,\delta}. $$

Properties of τ _δ

We define the sequence of states $\bar{z}^{\delta}_{k}$, k≥0, by $\bar {z}^{\delta}_{k}=z_{m_{k,\delta}}$. Note that this definition is analogous to that of the sequence of states $\bar{z}_{k}$, k≥0, in [8]. Let F ^δ, respectively F, be the transition matrix of the homogeneous Markov chain $\bar{z}^{\delta}_{0},\bar{z}^{\delta}_{1},\ldots$ with its $P^{z}_{\tau_{\delta}}$ distribution, respectively, $\bar {z}_{0},\bar {z}_{1},\ldots$ with its $P^{z}_{\hat{\tau}}$ distribution.

By the strong data convergence of 〈δg,p _δ〉 to 〈g,μ〉, p _δ(z′,z,a)>0 if and only if μ(z′,z,a)>0. Therefore, (for δ>0 sufficiently small) $F^{\delta} _{z,z'}=0$ if and only if F _z,z′=0, and thus the ergodic classes of states of the two homogeneous Markov chains, the one with transition matrix F ^δ and the other with transition matrix F, coincide.

Let $\mathcal{E}$ denote the set of ergodic classes of states, and for $E\in\mathcal{E}$ we denote by $q^{E}_{\delta}$ and q ^E the F ^δ and F invariant measures that are supported on E, and $q^{z}_{\delta} (E)$ (respectively q ^z(E)) denotes the probability of the F ^δ-Markov chain (respectively F-Markov chain) with initial state z entering the ergodic class E. Recall that every ergodic class $E\in \mathcal{E}$ is a subset of S ₁, and on z ₀∈S ₁ we have m _δ=[N ₀/δ]. Therefore,

$$E^z_{\tau_{\delta}}\underline{g}^i=E^z_{\tau_{\delta}} \bar{g}^i=\sum_{E\in \mathcal{E}}q^z_{\delta}(E) \sum_{z\in E}q^E_{\delta}(z) \gamma ^i_{\delta ,N_0}(z,\bar{\tau}_{\delta}). $$

Similarly,

$$E^z_{\hat{\tau}}\underline{g}^i=E^z_{\hat{\tau}} \bar{g}^i=\sum_{E\in \mathcal{E}}q^z(E) \sum_{z\in E}q^E(z) \gamma^i_{N_0}(z, \bar{\tau}). $$

In addition, by Proposition 2 and the w ^∗ convergence of $\bar{\tau}_{\delta}$ to $\bar{\tau}$, F ^δ→F as δ→0+. Therefore, $q^{z}_{\delta}(E)\to_{\delta\to0+}q^{z}(E)$ and $q^{E}_{\delta}\to_{\delta\to0+}q^{E}$. Since for all z∈S, $E\in\mathcal{E}$, and z′∈E, we have

$$\bigl(q^z_{\delta}(E),q^E_{\delta} \bigl(z'\bigr),\gamma^i_{\delta,N_0} \bigl(z',\bar{\tau }_{\delta} \bigr)\bigr)\to_{\delta\to0+} \bigl(q^z(E),q^E\bigl(z'\bigr), \gamma^i_{N_0}\bigl(z',\bar{\tau}\bigr)\bigr), $$

we deduce that

$$E^z_{\tau_{\delta}}\underline{g}^i =E^z_{\tau_{\delta}} \bar{g}^i \to _{\delta \to0+}E^z_{\hat{\tau}} \bar{g}^i=E^z_{\hat{\tau}}\underline{g}^i. $$

Therefore, for sufficiently small δ>0, we have

$$ u^i(z)-\varepsilon \leq E^z_{\tau_{\delta}} \underline{g}^i =E^z_{\tau_{\delta} }\bar {g}^i \leq u^i(z)+\varepsilon /6. $$

(32)

Recall the definition of τ, ε ₁ and $\tilde{\tau}$ in [8], where it is proved that for every z∈S, every player i, and every stopping time T, $E^{z}_{\tau} v^{i}(z_{T})\geq v^{i}(z)-\varepsilon _{1}/2$. Therefore, for every z∈S ₂, ∑_z′∈S F _z,z′ v ⁱ(z′)≥v ⁱ(z)−ε ₁/2. For $z\in S_{1}\cup\bar{S}$, ∑_z′∈S F _z,z′ v ⁱ(z′)=v ⁱ(z). Therefore,

$$\sum_{z'\in S}F_{z,z'}v^i \bigl(z'\bigr)\geq v^i(z)-\frac{\varepsilon _1}{2} 1_{z\in S_2\cup\bar{S}}, $$

where 1_∗ is the indicator function of ∗. In addition, if we replace the symbols δ and ε in [8] with the symbol η and ε/8, it can be seen that for $\varepsilon _{1}<\eta d^{2}\frac{\varepsilon }{32}$,

$$E^z_{\sigma }\sum_{k=0}^{\infty}1_{\bar{z}_k\notin S_1} \leq\frac {128}{\eta d^2 \varepsilon }, $$

and, therefore, for ε<1 and $\varepsilon _{1}<\eta d^{2}\frac{\varepsilon ^{2}}{8}\frac {1}{128}$ ($<\eta d^{2}\frac{\varepsilon }{32}$),

$$\varepsilon _1 E^z_{\sigma }\sum _{k=0}^{\infty}1_{\bar{z}_k\notin S_1}< \eta d^2 \frac {\varepsilon ^2}{8}\frac{1}{128} \frac{128}{\eta d^2 \varepsilon }=\varepsilon /8. $$

Therefore, for sufficiently small δ,

$$\varepsilon _1 E^z_{\tau_{\delta}}\sum _{k=0}^{\infty}1_{\bar{z}_k\notin S_1}< \varepsilon /6. $$

Assume that ε<1 and $\varepsilon _{1}<\eta d^{2}\frac{\varepsilon ^{2}}{8}\frac{1}{128}$.

Lemma 4

For sufficiently small δ>0, for every stopping time T, we have

$$ E^z_{\tau_{\delta}}v^i_T \leq E^z_{\tau_{\delta}}v^i_{\infty}+\varepsilon /6, $$

(33)

where v _m=v(z _m) and $v^{i}_{\infty}=\limsup_{m\to\infty}v^{i}_{m}$ (which equals $\lim_{m\to\infty}v^{i}_{m}$ $P^{z}_{\tau_{\delta}}$ a.e.), and

$$ E^z_{\tau_{\delta}}1_{T<\infty }v^i_T \leq E^z_{\tau_{\delta}}1_{T<\infty}v^i_{\infty}+ \varepsilon /6\leq E^z_{\tau _{\delta}}1_{T<\infty}\bar{g}^i+ \varepsilon /3. $$

(34)

Proof

The strategy τ defined in [8] obeys $v^{i}(z_{T})\leq E^{z}_{\tau}(v^{i}(z_{T'})\mid \mathcal{H}_{T})+\varepsilon _{1}/2$ for all finite stopping times T≤T′. Therefore, for all stopping times T≤T′≤N ₀, $E^{z}_{\tau} v^{i}(z_{T})\leq E^{z}_{\tau} v^{i}(z_{T'})+\varepsilon _{1}/2$, and $E^{z}_{\bar {\tau}} v^{i}(z_{T})\leq E^{z}_{\bar{\tau}} v^{i}(z_{T'})+3\varepsilon _{1}/4$. For z∈S ₁ we have, v(z _m)=v(z) for all m≤m _δ, $P^{z}_{\bar{\tau }_{\delta}}$ a.e. Therefore, for sufficiently small δ>0, for every stopping time T≤m _δ (in the discrete-time game), we have

$$E^z_{\tau_{\delta}} v^i_T=E^z_{\bar{\tau}_{\delta}} v^i_T\leq E^z_{\tau _{\delta}} v^i\bigl(\bar{z}^{\delta}_1\bigr)+\varepsilon _1 1_{z\notin S_1}. $$

Therefore, for δ>0 sufficiently small, for every stopping time T,

$$E^z_{\tau_{\delta}} v^i_T\leq E^z_{\tau_{\delta}} v^i_{\infty}+\varepsilon _1 E^z_{\tau_{\delta}}\sum_{k=0}^{\infty}1_{\bar{z}_k\notin S_1} \leq E^z_{\tau _{\delta}} v^i_{\infty}+\varepsilon /6. $$

This completes the proof of inequality (33).

Since 1_T=∞ v _T=1_T=∞ v _∞ $P^{z}_{\tau_{\delta}}$ a.e., we deduce that

$$ E^z_{\tau_{\delta}}1_{T<\infty }v_T=E^z_{\tau_{\delta}}1_{T<\infty}v_{\infty}+ \varepsilon /6. $$

(35)

By inequality (31), we have $v^{i}_{\infty}\leq\bar {g}^{i}+\varepsilon /6$, $P^{z}_{\tau_{\delta}}$ a.e. Therefore,

$$E^z_{\tau_{\delta}} 1_{T<\infty}v^i_{\infty} \leq E^z_{\tau_{\delta}} 1_{T<\infty}\bar{g}^i + \varepsilon /6, $$

which together with (35) implies (34). □

The Punishing Strategies

Recall that $v:S\to\mathbb{R}^{N}$ is the asymptotic limiting-average minmax of the family (Γ _δ)_δ>0. It follows that for every ε>0, z∈S, i∈N, and δ sufficiently small, there is a strategy profile $\sigma _{\delta ,\varepsilon }^{-i}$ of players N∖{i} such that for every strategy τ ⁱ of player i we have

$$\bar{\gamma}^i_{\delta}\bigl(z,\sigma _{\delta,\varepsilon }^{-i}, \tau ^i\bigr):=E^z_{\delta,\sigma _{\delta,\varepsilon }^{-i},\tau^i}\bar{g}^i \leq v^i(z)+\varepsilon /3. $$

The Limiting-Average ε-Equilibrium Strategy σ _δ

The strategy profile σ _δ follows the pure strategy profile τ _δ as long as the play coincides with a play that is compatible with the strategy τ _δ, and reverts to punishing (in the lim sup game Γ _δ) a deviating player. A formal definition of σ _δ follows.

Let k _δ be the first stage m with a _m≠τ _δ(z ₀,a ₀,…,z _m); k _δ=∞ if a _m=τ _δ(z ₀,a ₀,…,z _m) for every m≥0. Fix an order of the player set N, and on k _δ<∞ let i _δ be the minimal player i with $a^{i}_{k_{\delta}}\neq\tau_{\delta}^{i}(z_{0},a_{0},\ldots ,z_{k_{\delta} })$. For every player i∈N,

$$\sigma _{\delta}^{-i}(z_0,a_0, \ldots,z_m)= \begin{cases}\tau_{\delta}^{-i}(z_0,a_0,\ldots,z_m) & \mbox{if } k_{\delta}\geq m,\\ \sigma _{\delta,\varepsilon }^{-i}(z_{k_{\delta} +1},a_{k_{\delta} +1},\ldots,z_m) & \mbox{if } k_{\delta}<m \mbox{ and } i=i_{\delta}. \end{cases} $$

To complete the definition of the strategy profile σ _δ, there is a need to define $\sigma _{\delta}^{i}(z_{0},a_{0},\ldots,\allowbreak z_{m})$ on k _δ<m and i=i _δ. However, this has no impact on the reasoning that follows. We therefore define it arbitrarily.

Let τ ⁱ be a pure strategy of player i. Note that $(\sigma _{\delta} ^{-i},\tau^{i})$ is a pure strategy profile. Let n _δ be the stopping time of the first stage m such that $\tau^{i}(z_{0},a_{0},\ldots ,z_{m})\neq\tau^{i}_{\delta}(z_{0},a_{0},\ldots,z_{m})$. Note that for every state z, with $P^{z}_{\tau_{\delta}^{-i},\tau^{i}}$-probability 1, k _δ=n _δ, and i _δ=i on k _δ<∞. Let $\mathcal{H}_{n_{\delta}}$ be the σ-algebra generated by all $(z_{m})_{m\leq n_{\delta}}$ and $(a_{m})_{m< n_{\delta}}$.

(36)

(37)

(38)

(39)

(40)

(41)

(42)

(43)

Equality (36) follows from the definition of $\bar {\gamma }^{i}(z,{\sigma _{\delta}^{-i},\tau^{i}})$. Equality (37) follows from one of the basic properties of conditional expectation: that the expectation equals the expectation of the conditional expectation. Equality (38) follows from the rewriting of the constant function 1 as the sum of the two {0,1}-valued functions $1_{n_{\delta}=\infty}$ and $1_{n_{\delta}<\infty}$. Equality (39) follows from the facts that (1) the expectation is additive, (2) $1_{n_{\delta}=\infty}$ is measurable with respect to σ-algebra $\mathcal{H}_{n_{\delta}}$ and therefore $E^{z}_{\sigma ^{-i}_{\delta}, \tau ^{i}}1_{n_{\delta}=\infty}\,\bar{g}^{i}=E^{z}_{\sigma ^{-i}_{\delta}, \tau ^{i}}1_{n_{\delta}=\infty}\,E^{z}_{\sigma ^{-i}_{\delta},\tau^{i}}(\bar {g}^{i}\mid \mathcal{H}_{n_{\delta}})$, and (3) the $P^{z}_{\sigma _{\delta}}$-distribution and the $P^{z}_{\sigma _{\delta}^{-i},\tau^{i}}$-distribution of $1_{n_{\delta} =\infty }\bar {g}^{i}$ coincide. Inequality (40) follows from the definitions of $\sigma ^{-i}_{\delta}$ and $\sigma ^{-i}_{\delta,\varepsilon }$. Inequality (41) follows from the fact that for sufficiently small δ>0, for every strategy σ and stopping time T, $E^{z}_{\sigma } 1_{T<\infty}v^{i}(z_{T+1})\leq E^{z}_{\sigma} 1_{T<\infty}v^{i}(z_{T})+\varepsilon /6$. Inequality (42) follows from Lemma 4, which asserts that for every stopping time T, $E^{z}_{\sigma _{\delta} }1_{T<\infty}\,v^{i}(z_{T})\leq E^{z}_{\sigma _{\delta}}1_{T<\infty}\, \bar {g}^{i}+\varepsilon /3$. Inequality (43) follows from inequality (32) which asserts that $E^{z}_{\sigma _{\delta}}\bar{g}^{i} \leq u^{i}(z)+\varepsilon /6$.

By (32), $E^{z}_{\sigma _{\delta}}\underline{g}^{i}\geq u^{i}(z)-\varepsilon $, which together with the equality $\underline{\gamma}^{i}(z,\sigma _{\delta} )=E^{z}_{\sigma _{\delta}}\underline{g}^{i}$ implies that $\underline {\gamma }^{i}(z,\sigma _{\delta})\geq u^{i}(z)-\varepsilon $. We conclude that (u,σ _δ) is a limiting-average ε-equilibrium payoff and strategy and, therefore, u is an asymptotic limiting-average equilibrium payoff. □

Theorem 12

An exact family (Γ _δ)_δ>0 has an asymptotic uniform equilibrium payoff.

Proof

First, recall that an exact family has an asymptotic uniform minmax. The uniform ε-equilibrium strategy σ _δ follows the pure strategy profile τ _δ (defined in the proof of the previous theorem), and reverts to punishing a deviating player (in the uniform game). □

Theorem 13

An exact family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$-robust equilibrium payoff whenever $\vec {w}=(w^{i})_{i\in N}$ is a vector of nonstationary discounting measures on [0,∞].

Proof

For $\beta=(\beta^{i})_{i\in N} \in\mathbb{R}^{N}_{+}$ we denote by $\beta* \vec {w}$ the vector (β ⁱ w ⁱ)_i∈N. Note that if β ⁱ>0 for every i∈N, then the family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$-robust equilibrium payoff if and only if it has an asymptotic $\beta *\vec {w}$-robust equilibrium payoff. Therefore, we may assume that w ⁱ(∞)=1.

Fix ε>0 and an asymptotic 1_∞-robust equilibrium payoff $\nu \in\mathbb{R}^{N\times S}$ of the exact family (Γ _δ)_δ>0. Let 0<t<∞ be such that w ⁱ([t,∞))<ε/∥g∥ for every i∈N, and let m _δ=[t/δ] and ν _δ=ν. Then (m _δ,ν _δ) converges to (t,ν). Let $v\in\mathbb{R}^{N\times S}$ be an asymptotic $(\vec {w}_{t},t,\nu)$ equilibrium payoff of the family (Γ _δ)_δ>0, where $\vec {w}_{t}$ is the restriction of $\vec {w}$ to the interval [0,t). If $\vec {w}_{\delta}$ converges (as δ goes to zero) to $\vec {w}$, then $\vec {w}_{t,\delta}$—the restriction of $\vec {w}_{\delta}$ to {0,1,2,…,m _δ}—converges to $\vec {w}_{t}$.

If σ _δ is the strategy profile that follows up to stage m _δ an ε-equilibrium strategy profile in $\varGamma ^{m_{\delta} ,\nu}_{\delta,\vec {w}_{t,\delta}}$ with a payoff within ε of v, and thereafter a 1_∞-robust ε-equilibrium with a payoff within ε of ν, then for every player i and all strategies $\tau^{i}_{\delta}$ (δ>0) of player i in Γ _δ,

$$6\varepsilon + \liminf_{\delta\to0+} E^z_{\sigma _{\delta}} g^i_{\delta} \bigl(w^i_{\delta}\bigr)\geq v^i(z)\geq-6\varepsilon + \limsup_{\delta\to0+} E^z_{\sigma _{\delta},\tau ^i_{\delta}} g^i_{\delta} \bigl(w^i_{\delta}\bigr). $$

Therefore, the exact family (Γ _δ)_δ>0 has an asymptotic $\vec {w}$-robust equilibrium payoff. □

Notes

Henceforth, whenever we discuss a value concept of a family (Γ _δ), we will omit the statement of the implicit condition that it is a family of two-person zero-sum games.
A continuous-time strategy σ is a mixed-action-valued measurable function defined on $S\times\mathbb{R}$.
Without the assumption of finitely many actions, a uniform value need not exist [12]. The assumption of finitely many states is obviously needed.
Moreover, the stage-dependent duration δ _m, payoff g _m, and transition function p _m can depend on past history.
The finiteness follows from the fact that the minimum and the maximum of a linear function over a simplex is attained in one of the finitely many extreme points of the simplex.
The ε in [8] is ε/8 here, and ε ₁ there is sufficiently small.

References

Bewley T, Kohlberg E (1976) The asymptotic theory of stochastic games. Math Oper Res 1:197–208
Article MathSciNet MATH Google Scholar
Guo X, Hernandez-Lerma O (2005) Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs. J Appl Probab 42:303–320
Article MathSciNet MATH Google Scholar
Jasso-Fuentes H (2005) Noncooperative continuous-time Markov games. Morfismos 9:39–54
Google Scholar
Levy Y (2012) Continuous-time stochastic games of fixed duration. Dyn Games Appl (this issue)
Mertens J-F, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66
Article MathSciNet MATH Google Scholar
Neyman A (2003) Real algebraic tools in stochastic games. In: Neyman A, Sorin S (eds) Stochastic games and applications. NATO ASI series. Kluwer Academic, Dordrecht, pp 57–75
Chapter Google Scholar
Neyman A (2003) Stochastic games: existence of the minmax. In: Neyman A, Sorin S (eds) Stochastic games and applications. NATO ASI series. Kluwer Academic, Dordrecht, pp 173–193
Chapter Google Scholar
Neyman A (2012) Continuous-time stochastic games. DP 616, Center for the Study of Rationality, Hebrew University of Jerusalem
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Article MathSciNet MATH Google Scholar
Solan E (2003) Continuity of the value of competitive Markov decision processes. J Theor Probab 16:831–845
Article MathSciNet MATH Google Scholar
Solan E, Vieille N (2002) Correlated equilibrium in stochastic games. Games Econ Behav 38:362–399
Article MathSciNet MATH Google Scholar
Vigeral G (2012) A zero-sum stochastic game with compact action sets and no asymptotic value. Dyn Games Appl (this issue)
Zachrisson LE (1964) Markov games. In: Advances in game theory. Princeton University Press, Princeton
Google Scholar

Download references

Acknowledgements

This research was supported in part by Israel Science Foundation grant 1596/10.

Author information

Authors and Affiliations

Institute of Mathematics, and Center for the Study of Rationality, The Hebrew University of Jerusalem, Givat Ram, Jerusalem, 91904, Israel
Abraham Neyman

Authors

Abraham Neyman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abraham Neyman.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Neyman, A. Stochastic Games with Short-Stage Duration. Dyn Games Appl 3, 236–278 (2013). https://doi.org/10.1007/s13235-013-0083-x

Download citation

Published: 17 May 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s13235-013-0083-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Stochastic Games with Short-Stage Duration

Abstract

Similar content being viewed by others

Stationary Equilibria in Discounted Stochastic Games

The Asymptotic Value in Finite Stochastic Games

A Folk theorem for stochastic games with finite horizon

1 Introduction

2 The Model and Results

2.1 The Discounted Games

2.2 The Nonstationary Discounted Games

2.3 The Limiting-Average Games

2.4 The Mixed Discounting and Limiting-Average Games

2.5 The Uniform Games

Remark 1

2.6 The Robust Nonstationary Discounted Solutions

2.7 The Variable Short-Stage Duration Games

3 Convergence of Stochastic Games with Short-Stage Duration

Definition 1

Definition 2

Definition 3

3.1 Stationary Convergence

Proposition 1

Proof

Remark 2

3.2 Markov Convergence

Proposition 2

Proof

Remark 3

Remark 4

4 Two-Person Zero-Sum Stochastic Games with Short-Stage Duration

4.1 The Discounted Case

Theorem 1

Proof

Remark 5

Remark 6

Remark 7

Remark 8

Remark 9

The Algebraic Approach

Covariance Properties

Lemma 1

Proof

Theorem 2

Proof

4.2 The Asymptotic Nonstationary Discounted Value

Lemma 2

Proof

Theorem 3

Lemma 3

Proof

Proof of Theorem 3

4.3 The Asymptotic Limiting-Average Value

Theorem 4

Proof

Remark 10

4.4 The Asymptotic Mixed Discounting and Limiting-Average Value

Theorem 5

Proof

4.5 The Asymptotic Uniform and w-Robust Value

Theorem 6

Proof

Remark 11

Theorem 7

Proof

5 Non-zero-Sum Stochastic Games with Short-Stage Duration: The Discounted Games

5.1 The Asymptotic Discounted Equilibrium

Theorem 8

Proof

Remark 12

Covariance Properties

5.2 The Asymptotic Discounted Minmax

5.3 The Asymptotic Equilibrium of Nonstationary Discounting Games

Theorem 9

6 Non-zero-Sum Stochastic Games with Short-Stage Duration: The Limiting-Average and Uniform Games

6.1 The Asymptotic Limiting-Average and Uniform Minmax

Theorem 10

Proof

6.2 The Asymptotic Limiting-Average Equilibrium

Remark 13

Remark 14

Definition of τ _δ

Properties of τ _δ

The Limiting-Average ε-Equilibrium Strategy σ _δ