1 Introduction

This paper deals with zero-sum (ratio) average payoff semi-Markov games with Borel spaces, weakly continuous transition probabilities and an unbounded, lower semicontinuous payoff function. The existence of solutions to the Shapley equation is established, together with the existence of an optimal stationary strategy for the minimizing player and of an \(\varepsilon \)-optimal stationary strategy for the maximizing one, assuming that the game model satisfies growth and Lyapunov conditions besides some continuity properties. The framework set by these conditions has already been used in several previous works dealing with Markov and semi-Markov decision processes (for transition probabilities strongly continuous in the control variable, see references [4, 6, 7, 21]; for transition probabilities weakly continuous in the state-action pair, see references [9, 12, 13, 16, 24]) as well as with Markov and semi-Markov games (for strongly continuous transition probabilities see [8, 22]; for the weakly continuous case see [10, 11]). See also references [2, 17] for applications in communication systems. In fact, the present work extends the fixed-point approach of references [21, 24] to zero-sum average semi-Markov games with weakly continuous transition probabilities and provides an application to a minimax semi-Markov inventory problem under fairly weak assumptions on the demand distributions. The reader is referred to reference [15] for a review of zero-sum stochastic games in discrete time. It is worth mentioning that zero-sum average payoff semi-Markov games were seemingly first studied by Tanaka and Wakuta [20], who considered compact state and action spaces and assumed, among other conditions, that the payoff function is continuous (thus, bounded) and that the transition law is weakly continuous.

Concerning reference [10], some comments are in order. Jaśkiewicz [10] shows results similar to those of the present work under similar conditions, but there are, of course, some important differences. To begin with, to take advantage of the contraction property implied by the Lyapunov condition, she directly “smooths” functions–which are not lower semicontinuous–by taking the liminf pointwise, which makes the proof of the validity of the Shapley equation somewhat technically involved. The present work gives a simpler proof of this result by using lower semicontinuous envelopes of functions and defining a suitable contraction operator. Moreover, the Lyapunov conditions of the present paper are weaker than those used in [10]. A second difference concerns the regularity property of the controlled processes, which is guaranteed in [10] by imposing essentially the standard condition on the holding time distribution (see, for instance, [19, Prop. 5.1(a), p. 88]). Recall that the regularity property states that the involved stochastic processes experience finitely many transitions in bounded time intervals. This property, together with the Lyapunov condition, plays an important role in guaranteeing that the ratio average payoff is well defined and finite-valued, and also in showing that the Shapley equation yields the existence of optimal or almost optimal stationary strategies for the players. These two latter facts are taken for granted and not discussed explicitly by Jaśkiewicz [10]. In contrast, the present work does not impose any probabilistic condition in addition to the Lyapunov condition; instead, in order to ensure that the ratio average payoff criterion is well defined, it is assumed that the admissible action sets for both players are compact. (Jaśkiewicz [10] supposes that the admissible action sets for one player are compact, while for the other player they are complete spaces.) In fact, the mentioned Lyapunov condition implies that the semi-Markov processes induced by stationary policies are regular (see [23, Theorem 18.3.4 and Remark 18.3.6]). On the other hand, Jaśkiewicz [10] illustrates her results with a minimax Markov inventory problem with a finite number of possible distributions for the random demands, while the present work considers a minimax semi-Markov inventory problem assuming only that the first moments of the demands lie in a bounded interval and that their second moments are bounded above. For additional results on minimax (or robust) control problems see references [3, 14, 16].

The remainder of the paper is organized as follows. Section 2 introduces the zero-sum semi-Markov game and the (ratio) average payoff performance index, besides some standard concepts and notation. Section 3 states the assumptions and the main result (Theorem 3.4), whose proof is given in Sect. 5. Section 4 shows the existence of a deterministic stationary minimax policy for a semi-Markov minimax inventory problem (see Theorem 4.2); the proof is given in Sect. 6.

2 Zero-Sum Average Payoff Semi-Markov Games

The following standard concepts and notation are used throughout the paper. For a Borel space \(\mathcal{S}\)–that is, a Borel subset of a complete and separable metric space–\(\mathcal{B(S)}\) denotes the Borel \(\sigma \)-algebra; any statement about “measurability” is to be understood as measurability with respect to \(\mathcal{B}(\mathcal{S})\). The family of probability measures on \(\mathcal{S}\) is denoted by \(\mathbb{P}(\mathcal{S})\). Given two Borel spaces \(\mathcal{S}\) and \(\mathcal{S}^{\prime }\), a kernel \(K(\cdot |\cdot )\) on \(\mathcal{S}\) given \(\mathcal{S}^{\prime }\) is a mapping such that \(K(\cdot |s^{\prime })\) is a measure on \(\mathcal{S}\) for each \(s^{\prime }\in \mathcal{S}^{\prime }\), and \(K(B|\cdot )\) is a measurable function on \(\mathcal{S}^{\prime }\) for each set \(B\in \mathcal{B}(\mathcal{S})\). The kernel \(K(\cdot |\cdot )\) is called a stochastic kernel if \(K(\cdot |s^{\prime })\in \mathbb{P}(\mathcal{S})\) for all \(s^{\prime }\in \mathcal{S}^{\prime }\).

Let \(K(\cdot |\cdot )\) be a kernel on \(\mathcal{S}\) given \(\mathcal{S}^{\prime }\). For an arbitrary measurable function \(u:\mathcal{S\rightarrow }\mathbb{R}\), let

$$ Ku(s^{\prime }):=\int _{\mathcal{S}}u(s)K(ds|s^{\prime }),\ \ s^{\prime }\in \mathcal{S}^{\prime }, $$

whenever the integral is well defined. Similarly, for a measure \(\nu \in \mathbb{P}(\mathcal{S)}\), set

$$ \nu (u):=\int _{\mathcal{S}}u(s)\nu (ds). $$

Given a measurable function \(w:\mathcal{S}\rightarrow \lbrack 1,\infty )\), \(B_{w}(\mathcal{S})\) stands for the family of functions \(u\) on \(\mathcal{S}\) with finite \(w\)-norm, which is defined as

$$ ||u||_{w}:=\sup _{s\in \mathcal{S}}\frac{|u(s)|}{w(s)}. $$

Let \(L_{w}(\mathcal{S})\) denote the class of lower semicontinuous functions belonging to \(B_{w}(\mathcal{S})\). The normed space \((B_{w}(\mathcal{S}),||\cdot ||_{w})\) is a Banach space, while \((L_{w}(\mathcal{S}),d_{w})\) is a complete metric space, where \(d_{w}\) is the metric induced by the \(w\)-norm. The space of continuous bounded functions on \(\mathcal{S}\) is denoted by \(\mathcal{C}_{b}(\mathcal{S})\).

The set of nonnegative real numbers is denoted by \(\mathbb{R}_{+}\), and the set of positive (nonnegative, resp.) integers by \(\mathbb{N}\) (\(\mathbb{N}_{0}\), resp.).

The game model. We are interested in zero-sum semi-Markov games with the (ratio) average payoff criterion given below in (6). This kind of game is specified by a semi-Markov game model given by the collection

$$ (\mathbf{X},\mathbf{A},\mathbf{B},\mathbb{K}_{\mathbf{A}}, \mathbb{K}_{\mathbf{B}},q,c), $$
(1)

where the Borel spaces \(\mathbf{X},\mathbf{A},\mathbf{B}\) denote the state space of the game and the action or control sets for player 1 and player 2, respectively. The constraint sets \(\mathbb{K}_{\mathbf{A}}\) and \(\mathbb{K}_{\mathbf{B}}\) belong to \(\mathcal{B}(\mathbf{X\times A)}\) and \(\mathcal{B}(\mathbf{X\times B)}\), respectively. The \(x\)-sections

$$\begin{aligned} A(x) & :=\{a\in \mathbf{A}:(x,a)\in \mathbb{K}_{\mathbf{A}}\} \\ B(x) & :=\{b\in \mathbf{B}:(x,b)\in \mathbb{K}_{\mathbf{B}}\} \end{aligned}$$

stand for the admissible action or control subsets for players 1 and 2, respectively, when the game is in state \(x\in \mathbf{X}\). The set

$$ \mathbb{K}:=\{(x,a,b):x\in \mathbf{X},a\in A(x),b\in B(x)\} $$

is a Borel subset of the Cartesian product \(\mathbf{X}\times \mathbf{A}\times \mathbf{B}\) (see [18, Lemma 1.1]). The stochastic kernel \(q(\cdot |\cdot ,\cdot ,\cdot )\) on \(\mathbf{X\times }\mathbb{R}_{+}\) given \(\mathbb{K}\) is the transition law of the game. Finally, the measurable function \(c:\mathbb{K\times R}_{+}\rightarrow \mathbb{R}\) is the payoff function of the game.

The game is played over an infinite horizon as follows: at time \(t=0\), both players observe the game in some state, say, \(x_{0}=x\in \mathbf{X}\), and independently choose admissible controls \(a_{0}=a\in A(x)\) and \(b_{0}=b\in B(x)\). Then, the game remains in state \(x_{0}=x\) for a nonnegative random time \(\Delta _{1}\) and, at this time, it moves to a new state \(x_{1}=x^{\prime }\in \mathbf{X}\) according to the probability measure \(q(\cdot |x,a,b)\), that is,

$$ q(B\times D|x,a,b)=\Pr [(x_{1},\Delta _{1})\in B\times D|x_{0}=x,a_{0}=a,b_{0}=b], $$
(2)

for \(B\in \mathcal{B}(\mathbf{X}),D\in \mathcal{B}(\mathbb{R}_{+})\). Immediately after the transition occurs, player 1 pays the amount \(c(x,a,b,\Delta _{1})\) to player 2, and they choose new controls, say, \(a_{1}=a^{\prime }\in A(x^{\prime })\) and \(b_{1}=b^{\prime }\in B(x^{\prime })\), and the above process repeats over and over again.

This procedure engenders a stochastic process \(\{(x_{n},a_{n},b_{n},\Delta _{n+1})\}\), where, for each \(n\in \mathbb{N}_{0}\), \(x_{n}\) is the state of the game, \(a_{n}\) and \(b_{n}\) are the control variables for players 1 and 2, respectively, and \(\Delta _{n+1}\) is the time the game spends in state \(x_{n}\); thus, the random time \(\Delta _{n+1}\) is called the holding or sojourn time at state \(x_{n}\). Note that the random variable

$$ T_{n}:=T_{n-1}+\Delta _{n},\ n\in \mathbb{N},\ \ \text{and}\ \ T_{0}:=0, $$

is the time of the nth jump of the game. Thus, if \(x_{n}=x,a_{n}=a\) and \(b_{n}=b\), according to (2), the conditional marginal

$$ Q(B|x,a,b):=q(B\times \mathbb{R}_{+}|x,a,b) $$
(3)

governs the state to which the game moves at the next transition, irrespective of the time it takes to occur. Similarly, the conditional marginal

$$ F(D|x,a,b):=q(\mathbf{X}\times D|x,a,b),\ \ D\in \mathcal{B}( \mathbb{R}_{+}), $$

governs the time at which the next transition happens, irrespective of the state to which the system moves; thus, it is called the holding time distribution. Then,

$$ \tau (x,a,b):=\int _{\mathbb{R}_{+}}tF(dt|x,a,b) $$
(4)

is the mean holding or sojourn time, while

$$ C(x,a,b):=\int _{\mathbb{R}_{+}}c(x,a,b,t)F(dt|x,a,b) $$
(5)

is the mean payoff that player 1 pays to player 2.

Strategies. Let \(H_{0}:=\mathbf{X}\) and \(H_{n}:=\mathbb{K\times R}_{+}\times H_{n-1}\) for \(n\in \mathbb{N}\). Thus, for \(n\in \mathbb{N}\), each element

$$ h_{n}=(x_{0},a_{0},b_{0},\Delta _{1},\ldots ,x_{n-1},a_{n-1},b_{n-1}, \Delta _{n},x_{n})\in H_{n}$$

is the history of the game up to the nth transition, which occurs at time \(T_{n}\). A strategy for player 1 is a sequence \(\pi ^{1}=\{\pi _{n}^{1}\}\) of stochastic kernels on \(\mathbf{A}\) given \(H_{n}\) that satisfy the constraint

$$ \pi _{n}^{1}(A(x_{n})|h_{n})=1\ \ \forall h_{n}\in H_{n},n\in \mathbb{N}_{0}. $$

The class of all strategies for player 1 is denoted by \(\Pi ^{1}\). Now, for each \(x\in \mathbf{X}\), let \(\mathbb{A}(x):=\mathbb{P}(A(x))\) and denote by \(\Phi ^{1}\) the class of stochastic kernels on \(\mathbf{A}\) given \(\mathbf{X}\) such that \(\phi ^{1}(\cdot |x)\in \mathbb{A}(x)\) for each \(x\in \mathbf{X}\). A policy \(\pi ^{1}=\{\pi _{n}^{1}\}\) is called stationary if

$$ \pi _{n}^{1}(\cdot |h_{n})=\phi ^{1}(\cdot |x_{n})\ \ \forall h_{n} \in H_{n},n\in \mathbb{N}_{0}$$

for some stochastic kernel \(\phi ^{1}\in \Phi ^{1}\). In this case, as usual, the strategy \(\pi ^{1}=\{\pi _{n}^{1}\}\) is identified with the stochastic kernel \(\phi ^{1}\), and the class of all stationary strategies with \(\Phi ^{1}\). A stationary policy \(\phi ^{1}\) for player 1 is called deterministic stationary if there exists a measurable function \(f:\mathbf{X}\rightarrow \mathbf{A}\) such that \(f(x)\in A(x)\) and \(\phi ^{1}(\cdot |x)\) is concentrated at \(f(x)\) for each \(x\in \mathbf{X}\); the set of deterministic stationary policies for player 1 is denoted by \(\mathbb{F}^{1}\). The sets of strategies \(\Pi ^{2}\), \(\Phi ^{2}\) and \(\mathbb{F}^{2}\) for player 2 are defined similarly, but considering \(B(x)\) and \(\mathbb{B}(x):=\mathbb{P}(B(x))\) in lieu of \(A(x)\) and \(\mathbb{A}(x)\), respectively.

Throughout the remainder of the present work the following standard notation is used: for a measurable function \(u\) on \(\mathbb{K}\), \(x\in \mathbf{X}\) and probability measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\), let

$$ u_{\gamma ^{1},\gamma ^{2}}(x):=\int _{B(x)}\int _{A(x)}u(x,a,b) \gamma ^{1}(da)\gamma ^{2}(db). $$

Similarly, for a stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\) and \(x\in \mathbf{X}\), set

$$ u_{\phi ^{1},\phi ^{2}}(x):=\int _{B(x)}\int _{A(x)}u(x,a,b)\phi ^{1}(da|x) \phi ^{2}(db|x). $$

Thus, in particular,

$$\begin{aligned} C_{\phi ^{1},\phi ^{2}}(x) & =\int _{B(x)}\int _{A(x)}C(x,a,b)\phi ^{1}(da|x)\phi ^{2}(db|x), \\ \tau _{\phi ^{1},\phi ^{2}}(x) & =\int _{B(x)}\int _{A(x)}\tau (x,a,b) \phi ^{1}(da|x)\phi ^{2}(db|x), \end{aligned}$$

and also

$$ Q_{\phi ^{1},\phi ^{2}}(\cdot |x)=\int _{B(x)}\int _{A(x)}Q(\cdot |x,a,b) \phi ^{1}(da|x)\phi ^{2}(db|x). $$

The average payoff performance index. Let \(\Omega :=(\mathbb{K\times R}_{+})^{\infty }\) and let \(\mathcal{F}\) be the corresponding product \(\sigma \)-algebra. For each strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and probability measure \(\nu \) on \(\mathbf{X}\), there exist a probability measure \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) and a stochastic process \(\{(x_{n},a_{n},b_{n},\Delta _{n+1})\}\) defined on the sample space \((\Omega ,\mathcal{F})\) with the following properties:

\((i)\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[x_{0}\in B]=\nu (B)\);

\((\mathit{ii})\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[(a_{n},b_{n})\in C_{1}\times C_{2}|h_{n}]=\pi _{n}^{1}(C_{1}|h_{n})\pi _{n}^{2}(C_{2}|h_{n})\);

\((\mathit{iii})\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[(x_{n+1},\Delta _{n+1})\in B \times D|h_{n},a_{n},b_{n}]=q(B\times D|x_{n},a_{n},b_{n})\);

for all \(B\in \mathcal{B}(\mathbf{X}),\ C_{1}\in \mathcal{B}(\mathbf{A}),C_{2}\in \mathcal{B}(\mathbf{B}),D\in \mathcal{B}( \mathbb{R}_{+}),h_{n}\in H_{n},n\in \mathbb{N}\).

The expectation operator with respect to \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) is denoted by \(E_{\nu }^{(\pi ^{1},\pi ^{2})}\). According to property \((i)\), the probability measure \(\nu \) is called the initial distribution. If the initial distribution \(\nu \) is concentrated at some state \(x\in \mathbf{X}\), we shall write \(P_{x}^{(\pi ^{1},\pi ^{2})}\) and \(E_{x}^{(\pi ^{1},\pi ^{2})}\) instead of \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) and \(E_{\nu }^{(\pi ^{1},\pi ^{2})}\), respectively.

If the players use a stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\), by property (iii), the state process \(\{x_{n}\}\) is a Markov chain with one-step transition probability \(Q_{\phi ^{1},\phi ^{2}}(\cdot |\cdot )\). In this case, the n-step transition probability is denoted by \(Q_{\phi ^{1},\phi ^{2}}^{n}(\cdot |\cdot )\) for \(n\in \mathbb{N}_{0}\), where \(Q_{\phi ^{1},\phi ^{2}}^{0}(\cdot |x)\) is the Dirac measure concentrated at \(x\in \mathbf{X}\). Thus, for a measurable function \(u:\mathbf{X}\rightarrow \mathbb{R}\),

$$ Q_{\phi ^{1},\phi ^{2}}^{n}u(x)=\int _{\mathbf{X}}u(y)Q_{\phi ^{1}, \phi ^{2}}^{n}(dy|x)=E_{x}^{(\phi ^{1},\phi ^{2})}u(x_{n})\ \ \ \forall x\in \mathbf{X}, $$

whenever these quantities are well defined.

The (ratio) expected average payoff (EAP) for a strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and initial state \(x\in \mathbf{X}\) is defined as

$$ J(\pi ^{1},\pi ^{2},x):=\limsup _{n\rightarrow \infty } \frac{E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}c(x_{i},a_{i},b_{i},\Delta _{i+1})}{E_{x}^{(\pi ^{1},\pi ^{2})}T_{n}}. $$
(6)

Since the equalities

$$\begin{aligned} E_{x}^{(\pi ^{1},\pi ^{2})}\Delta _{i+1} & =E_{x}^{(\pi ^{1},\pi ^{2})} \tau (x_{i},a_{i},b_{i}), \end{aligned}$$
(7)
$$\begin{aligned} E_{x}^{(\pi ^{1},\pi ^{2})}c(x_{i},a_{i},b_{i},\Delta _{i+1}) & =E_{x}^{(\pi ^{1},\pi ^{2})}C(x_{i},a_{i},b_{i}) \end{aligned}$$
(8)

hold for every strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\), state \(x\in \mathbf{X}\) and \(i\in \mathbb{N}_{0}\), the performance index (6) can be rewritten as

$$ J(\pi ^{1},\pi ^{2},x)=\limsup _{n\rightarrow \infty } \frac{E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}C(x_{i},a_{i},b_{i})}{E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}\tau (x_{i},a_{i},b_{i})}. $$
(9)

Roughly speaking, the goal of player 1 (player 2, resp.) is to minimize (maximize, resp.) (9). This leads to the lower and upper value functions

$$ L(\cdot ):=\sup _{\pi ^{2}\in \Pi ^{2}}\inf _{\pi ^{1}\in \Pi ^{1}}J( \pi ^{1},\pi ^{2},\cdot )\ \ \ \text{and}\ \ \ U(\cdot ):=\inf _{\pi ^{1} \in \Pi ^{1}}\sup _{\pi ^{2}\in \Pi ^{2}}J(\pi ^{1},\pi ^{2},\cdot ), $$

respectively. Observe that, in general, \(L(\cdot )\leq U(\cdot )\). Thus, if \(L(\cdot )=U(\cdot )\), it is said that the game has a value and the common value of these functions, which is denoted as \(V(\cdot )\), is called the value of the game.

If the game has value \(V(\cdot )\), a policy \(\pi _{\ast }^{1}\in \Pi ^{1}\) is said to be EAP-optimal for player 1 if

$$ \sup _{\pi ^{2}\in \Pi ^{2}}J(\pi _{\ast }^{1},\pi ^{2},\cdot )=V( \cdot ); $$

similarly, a policy \(\pi _{\ast }^{2}\in \Pi ^{2}\) is said to be EAP-optimal for player 2 if

$$ \inf _{\pi ^{1}\in \Pi ^{1}}J(\pi ^{1},\pi _{\ast }^{2},\cdot )=V( \cdot ). $$

If \(\pi _{\ast }^{i}\) is EAP-optimal for player \(i\ (i=1,2)\), then the pair \((\pi _{\ast }^{1},\pi _{\ast }^{2})\) is called an EAP-optimal pair or a saddle point. Observe that \((\pi _{\ast }^{1},\pi _{\ast }^{2})\) is an EAP-optimal pair if and only if

$$ J(\pi _{\ast }^{1},\pi ^{2},\cdot )\leq J(\pi _{\ast }^{1},\pi _{\ast }^{2}, \cdot )\leq J(\pi ^{1},\pi _{\ast }^{2},\cdot )\ \ \ \forall \pi ^{1} \in \Pi ^{1},\pi ^{2}\in \Pi ^{2}. $$

Remark 2.1

An important particular case of semi-Markov games is that of Markov games, which arise when the holding time distribution is concentrated at some positive constant \(c\), say \(c=1\), that is,

$$ F(\{1\}|x,a,b)=1\text{ }\forall (x,a,b)\in \mathbb{K}. $$

In this case,

$$ P_{x}^{(\pi ^{1},\pi ^{2})}[T_{n}=n]=1\ \ \ \forall x\in \mathbf{X},( \pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},n\in \mathbb{N}, $$

and \(\tau \equiv 1\). Thus, the performance index (9) becomes

$$ J(\pi ^{1},\pi ^{2},x)=\limsup _{n\rightarrow \infty }\frac{1}{n}E_{x}^{( \pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}C(x_{i},a_{i},b_{i}). $$

Remark 2.2

The performance index (9) depends on the kernel \(q\) only through its marginals \(Q\) and \(F\), which enter the computation in a “decoupled” way. Thus, one can assume without loss of generality that

$$ q(B\times D|x,a,b)=F(D|x,a,b)Q(B|x,a,b) $$

for all \((x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X}),D\in \mathcal{B}(\mathbb{R}_{+})\). A second consequence is that the holding time distribution \(F\) can be replaced by an exponential distribution with rate \(\tau ^{-1}\), that is, by the distribution

$$ F^{\prime }(D|x,a,b):=\tau ^{-1}(x,a,b)\int _{D}e^{-\tau ^{-1}(x,a,b)t}dt $$

under the assumption that \(\tau >0\). In other words, the semi-Markov game model (1) can be replaced without loss of generality by the model \((\mathbf{X},\mathbf{A},\mathbf{B},\mathbb{K}_{\mathbf{A}}, \mathbb{K}_{\mathbf{B}},q^{\prime },C)\) where

$$ q^{\prime }(B\times D|x,a,b):=F^{\prime }(D|x,a,b)Q(B|x,a,b), $$

\(Q\) is the transition kernel (3) and \(C\) is the mean payoff function (5). It is worth mentioning that such a replacement is in general not possible for a discounted performance index.
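To see why this replacement is harmless, note that \(F^{\prime }\) has the same mean as \(F\):

$$ \int _{\mathbb{R}_{+}}tF^{\prime }(dt|x,a,b)=\tau ^{-1}(x,a,b)\int _{0}^{\infty }te^{-\tau ^{-1}(x,a,b)t}dt=\tau (x,a,b), $$

so the pair \((C,\tau )\), and hence the index (9), is unchanged. A discounted criterion, in contrast, integrates a discount factor \(e^{-\alpha t}\) against \(F\) and therefore depends on the whole holding time distribution, not only on its mean.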

3 Solutions to the Shapley Equation and Optimal Stationary Strategies

The main result of the present work, Theorem 3.4 below, extends to semi-Markov games with weakly continuous transition probabilities the analysis given in [24] for Markov decision processes with continuous transition probabilities. Specifically, the existence of lower semicontinuous solutions to the Shapley equation is shown under Assumptions 1, 2 and 3 given below. As commented in the Introduction, the framework set by these conditions has become quite standard for the study of average payoff optimization problems and has been used, for instance, for Markov and semi-Markov decision processes in [4, 6, 7, 9, 12, 13, 16, 21, 24], and for Markov and semi-Markov games in [10, 11, 15, 22]. See reference [5] for further comments and discussion on this kind of conditions.

Assumption 1

The following conditions hold for all \((x,a,b)\in \mathbb{K}\):

(a) \(\tau (x,a,b)>0\);

(b) there exist a measurable function \(W\geq 1\) on \(\mathbf{X}\) and a constant \(k>0\) such that

$$ \max \{\tau (x,a,b),|C(x,a,b)|\}\leq kW(x). $$

Assumption 2

There exist a measurable nonnegative function \(s\) on \(\mathbb{K}\), a nontrivial measure \(\nu \) on \(\mathbf{X}\), and a constant \(\lambda \in (0,1)\) such that the following properties hold:

(a) \(\nu (W)<\infty \);

(b) \(Q(B|x,a,b)\geq s(x,a,b)\nu (B)\) for all \((x,a,b)\in \mathbb{K}\);

(c) \(QW(x,a,b)\leq \lambda W(x)+\nu (W)s(x,a,b)\) for all \((x,a,b)\in \mathbb{K}\);

(d) \(\nu (s_{\phi ^{1},\phi ^{2}})>0\) for every stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\).

The key point is that Assumption 2 entails a contraction property. To see this, let

$$ \widehat{Q}(B|x,a,b):=Q(B|x,a,b)-s(x,a,b)\nu (B),\ (x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X}). $$

Then, Assumption 2(b) implies that \(\widehat{Q}\) is a nonnegative kernel, while Assumption 2(c) leads to the inequality

$$ \sup _{(x,a,b)}\frac{|\widehat{Q}u(x,a,b)|}{W(x)}\leq \lambda ||u||_{W}\ \ \forall u\in B_{W}(\mathbf{X}). $$
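Indeed, for \(u\in B_{W}(\mathbf{X})\), since \(\widehat{Q}\) is a nonnegative kernel and \(|u|\leq ||u||_{W}W\),

$$ |\widehat{Q}u(x,a,b)|\leq ||u||_{W}\widehat{Q}W(x,a,b)=||u||_{W}[QW(x,a,b)-\nu (W)s(x,a,b)]\leq \lambda ||u||_{W}W(x), $$

where the last inequality is Assumption 2(c).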

The results in the next proposition are proved in [21] using this contraction property.

Proposition 3.1

Suppose that Assumption 2 holds and let \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\) be an arbitrary stationary strategy pair. Then:

(a) the transition probability \(Q_{\phi ^{1},\phi ^{2}}\) is positive Harris recurrent; thus, it admits a unique invariant probability measure \(\mu _{\phi ^{1},\phi ^{2}}\) and \(\nu \) is an irreducibility measure;

(b) \(\mu _{\phi ^{1},\phi ^{2}}(W)\) is finite and

$$ 1\leq \mu _{\phi ^{1},\phi ^{2}}(W)\leq \frac{1}{1-\lambda } \frac{\nu (W)}{\nu (\mathbf{X})}; $$

moreover,

$$ \mu _{\phi ^{1},\phi ^{2}}(s_{\phi ^{1},\phi ^{2}})\geq \theta := \frac{1-\lambda }{\nu (W)}; $$

(c) for any \(u\in B_{W}(\mathbf{X})\), \(\mu _{\phi ^{1},\phi ^{2}}(|u|)<\infty \) and

$$ \lim _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\phi ^{1},\phi ^{2})} \sum _{i=0}^{n-1}u(x_{i})=\mu _{\phi ^{1},\phi ^{2}}(u)\ \ \forall x \in \mathbf{X}; $$

hence,

$$ \lim _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\phi ^{1},\phi ^{2})}u(x_{n})=0. $$

Remark 3.2

Thus, for each pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\), Assumptions 1 and 2 imply that the constant

$$ \rho (\phi ^{1},\phi ^{2}):= \frac{\mu _{\phi ^{1},\phi ^{2}}(C_{\phi ^{1},\phi ^{2}})}{\mu _{\phi ^{1},\phi ^{2}}(\tau _{\phi ^{1},\phi ^{2}})}, $$
(10)

is well defined and finite, and also that the operator

$$ T_{\phi ^{1},\phi ^{2}}u:=C_{\phi ^{1},\phi ^{2}}-\rho (\phi ^{1}, \phi ^{2})\tau _{\phi ^{1},\phi ^{2}}+\widehat{Q}_{\phi ^{1},\phi ^{2}}u $$

is a contraction map from \(B_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). Since \((B_{W}(\mathbf{X}),||\cdot ||_{W})\) is a Banach space, it follows that there exists a unique function \(h_{\phi ^{1},\phi ^{2}}\in B_{W}(\mathbf{X})\) such that

$$\begin{aligned} h_{\phi ^{1},\phi ^{2}} & =T_{\phi ^{1},\phi ^{2}}h_{\phi ^{1},\phi ^{2}} \\ & =C_{\phi ^{1},\phi ^{2}}-\rho (\phi ^{1},\phi ^{2})\tau _{\phi ^{1}, \phi ^{2}}+\widehat{Q}_{\phi ^{1},\phi ^{2}}h_{\phi ^{1},\phi ^{2}}. \end{aligned}$$
(11)

Now, integrating both sides of the above equation with respect to the probability measure \(\mu _{\phi ^{1},\phi ^{2}}\) leads to the equality

$$ 0=\nu (h_{\phi ^{1},\phi ^{2}})\mu _{\phi ^{1},\phi ^{2}}(s_{\phi ^{1}, \phi ^{2}}), $$

which implies, by Proposition 3.1(b), that \(\nu (h_{\phi ^{1},\phi ^{2}})=0\). Hence, \(h_{\phi ^{1},\phi ^{2}}\) is the unique function in \(B_{W}(\mathbf{X})\) that satisfies the (semi-Markov) Poisson equation

$$ h_{\phi ^{1},\phi ^{2}}=C_{\phi ^{1},\phi ^{2}}-\rho (\phi ^{1},\phi ^{2}) \tau _{\phi ^{1},\phi ^{2}}+Q_{\phi ^{1},\phi ^{2}}h_{\phi ^{1},\phi ^{2}}$$

and the condition \(\nu (h_{\phi ^{1},\phi ^{2}})=0\). Then, by iterating this equation, one can see that

$$ J(\phi ^{1},\phi ^{2},x)=\lim _{n\rightarrow \infty } \frac{E_{x}^{(\phi ^{1},\phi ^{2})}\sum _{i=0}^{n-1}C_{\phi ^{1},\phi ^{2}}(x_{i})}{E_{x}^{(\phi ^{1},\phi ^{2})}\sum _{i=0}^{n-1}\tau _{\phi ^{1},\phi ^{2}}(x_{i})}= \rho (\phi ^{1},\phi ^{2}) $$
(12)

for all \(x\in \mathbf{X}\).

It is worth mentioning that reference [21] proves Proposition 3.1 using arguments similar to those displayed above to show the existence of solutions to the Poisson equation (11).
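To make the fixed-point construction above concrete, the following minimal numerical sketch computes \(\rho (\phi ^{1},\phi ^{2})\) and the solution of the Poisson equation by iterating the contraction \(T_{\phi ^{1},\phi ^{2}}\) on a hypothetical three-state model; all data below (with the stationary strategies already integrated out, and \(\nu \) taken as the Dirac measure at state 0) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical data playing the roles of Q_{phi1,phi2}, C_{phi1,phi2},
# tau_{phi1,phi2} for a fixed stationary strategy pair (assumptions).
Q = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2],
              [0.6, 0.1, 0.3]])
C = np.array([1.0, 2.0, 0.5])
tau = np.array([1.0, 0.5, 2.0])
nu = np.array([1.0, 0.0, 0.0])   # minorizing measure: Dirac at state 0
s = Q[:, 0]                      # then Q(B|x) >= s(x) nu(B) (Assumption 2(b))

# Invariant probability measure mu of Q: left eigenvector for eigenvalue 1.
vals, vecs = np.linalg.eig(Q.T)
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
mu /= mu.sum()

rho = (mu @ C) / (mu @ tau)      # the ratio average payoff (10)

# Banach fixed-point iteration for h = C - rho*tau + Qhat h, cf. (11);
# with W = 1 the contraction modulus here is max_x (1 - s(x)) < 1.
Qhat = Q - np.outer(s, nu)       # nonnegative kernel Q - s(.)nu(.)
h = np.zeros(3)
for _ in range(500):
    h = C - rho * tau + Qhat @ h

print("rho =", rho, " nu(h) =", nu @ h)   # nu(h) vanishes, as shown above
print("Poisson residual:", np.max(np.abs(h - (C - rho * tau + Q @ h))))
```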

Now, to pass from the Poisson equations to the Shapley equation, it is assumed that the game model satisfies the following compactness and continuity conditions.

Assumption 3

(a) \(C\) is lower semicontinuous on \(\mathbb{K}\);

(b) the mapping \(x\rightarrow A(x)\) is lower semicontinuous and compact-valued;

(c) the mapping \(x\rightarrow B(x)\) is upper semicontinuous and compact-valued;

(d) \(\tau \) is continuous;

(e) the state transition law \(Q\) is weakly continuous on \(\mathbb{K}\), that is, the mapping

$$ (x,a,b)\rightarrow \int _{\mathbf{X}}u(y)Q(dy|x,a,b) $$

is continuous for all \(u\in \mathcal{C}_{b}(\mathbf{X})\);

(f) \(W\) and \(QW\) are continuous functions.

Notice that Remark 3.2 shows that the average payoff criterion is well defined and finite-valued whenever the players use stationary strategies. The next theorem extends this assertion to all admissible strategies.

Theorem 3.3

Under Assumptions 1, 2 and 3, the performance criterion (9) is well defined and finite-valued.

The next theorem states the main results of the present work.

Theorem 3.4

If Assumptions 1, 2 and 3 hold, then:

(a) there exist \(h^{\ast }\in L_{W}(\mathbf{X})\) and \(\rho ^{\ast }\in \mathbb{R}\) that satisfy the Shapley equation

$$\begin{aligned} h^{\ast }(x) & =\inf _{\gamma ^{1}\in \mathbb{A}(x)}\sup _{\gamma ^{2}\in \mathbb{B}(x)}[C_{\gamma ^{1},\gamma ^{2}}(x)-\rho ^{\ast }\tau _{ \gamma ^{1},\gamma ^{2}}(x)+Q_{\gamma ^{1},\gamma ^{2}}h^{\ast }(x)] \\ & =\sup _{\gamma ^{2}\in \mathbb{B}(x)}\inf _{\gamma ^{1}\in \mathbb{A}(x)}[C_{\gamma ^{1},\gamma ^{2}}(x)-\rho ^{\ast }\tau _{\gamma ^{1}, \gamma ^{2}}(x)+Q_{\gamma ^{1},\gamma ^{2}}h^{\ast }(x)] \end{aligned}$$

for all \(x\in \mathbf{X}\);

(b) there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that

$$ h^{\ast }(x)=\sup _{\gamma ^{2}\in \mathbb{B}(x)}[C_{\phi _{\ast }^{1}, \gamma ^{2}}(x)-\rho ^{\ast }\tau _{\phi _{\ast }^{1},\gamma ^{2}}(x)+Q_{\phi _{ \ast }^{1},\gamma ^{2}}h^{\ast }(x)]\ \ \forall x\in \mathbf{X}; $$

(c) for each \(\varepsilon >0\) there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that

$$ h^{\ast }(x)\leq \inf _{\gamma ^{1}\in \mathbb{A}(x)}[C_{\gamma ^{1}, \phi _{\varepsilon }^{2}}(x)-\rho ^{\ast }\tau _{\gamma ^{1},\phi _{ \varepsilon }^{2}}(x)+Q_{\gamma ^{1},\phi _{\varepsilon }^{2}}h^{\ast }(x)]+\varepsilon \ \ \forall x\in \mathbf{X.}$$

(d) the constant \(\rho ^{\ast }\) is the value of the game and the stationary policy \(\phi _{\ast }^{1}\) is optimal for player 1; for each \(\varepsilon >0\), the stationary policy \(\phi _{\varepsilon }^{2}\) is \(\varepsilon ^{\prime }\)-optimal for player 2 with \(\varepsilon ^{\prime }=\nu (W)\varepsilon /(l(1-\lambda ))\), where \(l\) is a fixed constant (given below in Lemma 5.3);

(e) moreover,

$$ \rho ^{\ast }=\inf _{\phi ^{1}\in \Phi ^{1}}\sup _{\phi ^{2}\in \Phi ^{2}} \rho (\phi ^{1},\phi ^{2})=\sup _{\phi ^{2}\in \Phi ^{2}}\inf _{\phi ^{1} \in \Phi ^{1}}\rho (\phi ^{1},\phi ^{2}). $$

The proof of Theorem 3.4 is given in Sect. 5.

Remark 3.5

Assumption 3(d) trivially holds for Markov games (see Remark 2.1). Moreover, as can be checked in the proof of Theorem 3.4, instead of the compactness property in Assumption 3(c), it suffices to assume the completeness of the sets \(B(x)\), \(x\in \mathbf{X}\). In fact, the compactness property is used only to prove the existence of a positive constant \(\gamma \) such that

$$ \liminf _{m\rightarrow \infty }\frac{1}{m}\sum _{i=0}^{m-1}E_{x}^{( \pi ^{1},\pi ^{2})}\tau (x_{i},a_{i},b_{i})\geq \gamma $$

for all initial state \(x\in \mathbf{X}\) and strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) (see Corollary 5.5 in Sect. 5). This latter inequality clearly holds for Markov games.

4 A Minimax Semi-Markov Inventory Problem

A controller seeks to minimize the cost of operating a single-item inventory system without backlog, for which the decision epochs form a nondecreasing stochastic process \(\{T_{n}\}\), \(n\in \mathbb{N}_{0}\). The demand process \(\{w_{n}\}\) is formed by independent nonnegative random variables, with \(w_{n}\) being the quantity of product demanded between the decision epochs \(T_{n}\) and \(T_{n+1}\). The controller knows that the expected demands lie in a bounded interval and that their second moments are bounded above, but she/he does not know the demand distributions themselves. To state these assumptions formally, for \(p\in \mathbb{P}([0,\infty ))\), let

$$ \mu _{p}:=\int _{0}^{\infty }sp(ds)\ \ \ \text{and}\ \ \ s_{p}^{2}:= \int _{0}^{\infty }s^{2}p(ds). $$

Assumption 4

The demand distributions belong to the class

$$ \mathbf{B:}=\{b\in \mathbb{P}(\mathbb{R}_{+}):z_{\ast }\leq \mu _{b} \leq z^{\ast },s_{b}^{2}\leq s^{\ast }\}, $$

where \(z^{\ast },z_{\ast }\) and \(s^{\ast }\) are constants such that \(z^{\ast }>z_{\ast }>0\) and \(s^{\ast }\geq 0\).

In what follows it is assumed that \(\mathbf{B}\) is metrized with a metric compatible with the topology of weak convergence of probability measures.

The inventory evolves as follows: at the nth decision epoch, which occurs at time \(T_{n}\), the controller observes the inventory level \(x_{n}=x\in \mathbf{X:}=\mathbb{R}_{+}\) and orders a product quantity \(a_{n}=a\in \mathbf{A:}=[0,\widehat{a}]\) to face the (nonnegative) product demand \(w_{n}=w\) accumulated between times \(T_{n}\) and \(T_{n+1}\); the constant \(\widehat{a}\) is positive and it is assumed that the replenishing quantity \(a_{n}\) is supplied immediately. Thus, the controller incurs the cost

$$ c_{1}(1-\delta _{0}(a))+c_{2}a+c_{3}(x+a)+c_{4}(w-x-a)^{+}$$

where \(\delta _{0}(a)\) is the Kronecker delta and the nonnegative constants \(c_{i},i=1,\ldots ,4\), stand for the setup cost for placing an order and, per unit of product, the production cost, the holding cost and the penalty cost for unmet demand, respectively.

Then, the expected payoff (or one-step cost) is

$$ C(x,a,b):=c_{1}(1-\delta _{0}(a))+c_{2}a+c_{3}(x+a)+c_{4}E_{b}(w-x-a)^{+} $$
(13)

for \((x,a,b)\in \mathbb{K}\), with \(A(x):=\mathbf{A}\) and \(B(x):=\mathbf{B}\) for all \(x\in \mathbf{X}\), and where \(E_{b}\) denotes the expectation with respect to the distribution \(b\in \mathbf{B}\). Notice that \(\mathbb{K=}\mathbf{X}\times \mathbf{A}\times \mathbf{B}\).

At time \(T_{n+1}\) the inventory changes according to the recursive equation

$$ x_{n+1}=(x_{n}+a_{n}-w_{n})^{+},n\in \mathbb{N}_{0}, $$
(14)

where \(r^{+}:=\max (r,0)\) for \(r\in \mathbb{R}\), which leads to the stochastic kernel

$$ Q(B|x,a,b):=b([w\geq 0:(x+a-w)^{+}\in B]),\ B\in \mathcal{B} \mathbb{(}\mathbf{X}),(x,a,b)\in \mathbb{K}. $$
(15)

Since the distribution of the product demand is unknown, the controller wants to hedge against the worst possible scenario; thus, she/he approaches the problem as a game against “nature”, who chooses the distribution \(b_{n}\in \mathbf{B}\) for the product demand \(w_{n}\). The controller's goal is then to find a minimax policy, that is, a policy \(\pi _{\ast }^{1}\in \Pi ^{1}\) that satisfies the condition

$$ \sup _{\pi ^{2}\in \Pi ^{2}}J(\pi _{\ast }^{1},\pi ^{2},x)=\inf _{\pi ^{1} \in \Pi ^{1}}\sup _{\pi ^{2}\in \Pi ^{2}}J(\pi ^{1},\pi ^{2},x)\ \ \forall x\in \mathbf{X}. $$
(16)

Notice that a minimax policy is nothing other than an optimal policy for the controller when the inventory system is seen as a game against nature and this game has a value.
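Before proceeding, the following minimal simulation sketch illustrates the dynamics (14) and the stage costs behind (13) for one fixed demand distribution in \(\mathbf{B}\); the cost constants, the demand law and the truncated base-stock rule are illustrative assumptions (in particular, the rule is not claimed to be the minimax policy), and, taking \(\tau \equiv 1\), the empirical average approximates the per-transition average in (9).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative cost constants: setup, production, holding, penalty (assumed).
c1, c2, c3, c4 = 1.0, 2.0, 0.5, 4.0
a_hat, S = 3.0, 4.0              # order cap and a hypothetical base-stock target

def order(x):
    # Simple base-stock rule truncated to A = [0, a_hat] (not the minimax policy).
    return min(max(S - x, 0.0), a_hat)

x, total, horizon = 0.0, 0.0, 10**5
for _ in range(horizon):
    a = order(x)
    w = rng.gamma(4.0, 1.0)      # demand with mean 4 > a_hat, cf. Assumption 5
    total += c1 * (a > 0) + c2 * a + c3 * (x + a) + c4 * max(w - x - a, 0.0)
    x = max(x + a - w, 0.0)      # the recursion (14)

print("empirical average cost per transition:", total / horizon)
```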

Jaśkiewicz [10] considers a minimax Markov inventory problem, that is, a minimax inventory problem for which \(T_{n}=n\) for all \(n\in \mathbb{N}\); moreover, she assumes that the set \(\mathbf{B}\) of possible distributions for the demand is finite. Reference [16] also follows the minimax approach to study a class of semi-Markov inventory systems, but assumes that the holding time distribution depends on an unknown parameter and that the demand distribution is completely known.

The main result of this section, Theorem 4.2 given below, shows the existence of a deterministic stationary minimax policy for the inventory system (13)-(15) under Assumption 4 and Assumptions 5 and 6, which are given next. Assumption 5 is also related to the distributions of the product demands.

Assumption 5

The inequality \(z_{\ast }>\widehat{a}\) holds.

It is shown in Lemma 6.1, Sect. 6, that Assumptions 4 and 5 imply that there exists a constant \(r>0\) such that

$$ \lambda :=\sup _{b\in \mathbf{B}}\Phi _{b}(r)< 1, $$
(17)

where

$$ \Phi _{p}(t):=\int _{\mathbb{R}_{+}}e^{t(\widehat{a}-s)}p(ds),\ \ t \geq 0,p\in \mathbb{P}(\mathbb{R}_{+}). $$
(18)

This inequality plays a key role in proving that the inventory system satisfies Assumption 2. In fact, Lemma 6.2 shows that the constant \(\lambda \) in (17), the functions

$$ W(x):=\exp (rx)\ \ \text{ and}\ \ \ s(x,a,b):=b((x+a,\infty )),\ (x,a,b) \in \mathbb{K}, $$
(19)

and the Dirac measure at zero \(\nu (\cdot )\) satisfy Assumption 2.

The next condition concerns the holding time distribution. To be specific, it is assumed that it has an exponential density with rate \(\kappa ^{-1}\), where \(\kappa \) is a positive continuous function dominated by a multiple of the function \(W\); this condition is formally stated below. However, it should be noted that any other class of distributions can be considered as long as Assumption 1 holds (see Remark 2.2).

Assumption 6

The holding time distribution is given by

$$ F([0,t]|x,a,b)=\kappa ^{-1}(x,a,b)\int _{0}^{t}e^{-\kappa ^{-1}(x,a,b)s}ds $$

for all \(t\geq 0\) and \((x,a,b)\in \mathbb{K}\), where \(\kappa :\mathbb{K}\rightarrow \mathbb{(}0,\infty )\) is a continuous function such that \(\kappa (\cdot ,\cdot ,\cdot )\leq k_{1}W(\cdot )\) for some positive constant \(k_{1}\).

Thus, according to Remark 2.2, the semi-Markov kernel can be taken without loss of generality as

$$ q(B\times \lbrack 0,t]|x,a,b)=[1-e^{-\kappa ^{-1}(x,a,b)t}]b([w\geq 0:(x+a-w)^{+} \in B]) $$

for all \(t\geq 0,(x,a,b)\in \mathbb{K}\) and \(B\in \mathcal{B}(\mathbb{R}_{+})\).

Remark 4.1

Observe that Assumption 4 implies the inequality

$$ 0\leq C(x,a,b)\leq M+c_{3}x\ \ \ \forall (x,a,b)\in \mathbb{K}, $$

where \(M:=c_{1}+(c_{2}+c_{3})\widehat{a}+c_{4}z^{\ast }\). On the other hand, Assumption 6 implies that

$$ 0< \tau (x,a,b)=\kappa (x,a,b)\leq k_{1}W(x)\ \ \ \forall (x,a,b)\in \mathbb{K}. $$

Thus, one can choose a constant \(k\geq k_{1}>0\) such that

$$ \max \{\tau (x,a,b),|C(x,a,b)|\}\leq kW(x), $$

for all \((x,a,b)\in \mathbb{K}\). Notice that this latter inequality shows that Assumption 1(b) holds.

The main result of this section is stated next.

Theorem 4.2

Suppose that Assumptions 4, 5 and 6 hold. Then there exist a function \(h^{\ast }\in L_{W}(\mathbf{X})\), a constant \(\rho ^{\ast }\) and a deterministic stationary policy \(f_{\ast }^{1}\in \mathbb{F}^{1}\) for player 1 such that the equalities

$$\begin{aligned} h^{\ast }(x) & =\inf _{a\in \mathbf{A}}\sup _{b\in \mathbf{B}}[C(x,a,b)- \rho ^{\ast }\tau (x,a,b)+\int _{\mathbf{X}}h^{\ast }(y)Q(dy|x,a,b)] \\ & =\sup _{b\in \mathbf{B}}[C(x,f_{\ast }^{1}(x),b)-\rho ^{\ast }\tau (x,f_{ \ast }^{1}(x),b)+\int _{\mathbf{X}}h^{\ast }(y)Q(dy|x,f_{\ast }^{1}(x),b)] \end{aligned}$$

hold for all \(x\in \mathbf{X}\). Moreover, \(f_{\ast }^{1}\) is a minimax policy and

$$ \rho ^{\ast }=\sup _{\pi ^{2}\in \Pi ^{2}}J(f_{\ast }^{1},\pi ^{2},x)= \inf _{\pi ^{1}\in \Pi ^{1}}\sup _{\pi ^{2}\in \Pi ^{2}}J(\pi ^{1},\pi ^{2},x)\ \ \forall x\in \mathbf{X}. $$

5 Proof of Theorems 3.3 and 3.4

The proof relies on a number of preliminary results, which are collected in several propositions and lemmas. Proposition 5.1 uses the concept of the lower semicontinuous envelope of a function, which is introduced below together with some related properties.

Let \((\mathcal{S},d)\) be a metric space. For each function \(u:\mathcal{S\rightarrow }\mathbb{R}\) define

$$ u^{e}(s):=\sup _{r>0}\inf _{s^{\prime }\in B_{r}(s)}u(s^{\prime }), $$

where \(B_{r}(s)\) is the ball with center at \(s\in \mathcal{S}\) and radius \(r>0\). The function \(u^{e}\) is the largest lower semicontinuous function dominated by \(u\), that is: (i) \(u^{e}\) is lower semicontinuous; (ii) \(u\geq u^{e}\); (iii) if \(v\) is lower semicontinuous and \(u\geq v\), then \(u^{e}\geq v\). Thus, \(u^{e}\) is called the lower semicontinuous envelope of the function \(u\). Clearly, \(u\) is lower semicontinuous if and only if \(u=u^{e}\); moreover, if \(u\geq v\), then \(u^{e}\geq v^{e}\). Additionally, if \(w:\mathcal{S}\rightarrow [1,\infty )\) is a continuous function, then

$$ ||u^{e}-v^{e}||_{w}\leq ||u-v||_{w}\ \ \ \forall u,v\in B_{w}( \mathcal{S}). $$
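The following sketch illustrates the envelope numerically at a jump point of a piecewise linear function on a grid; the test functions are illustrative assumptions. A function that takes the lower value at the jump is lower semicontinuous there and is left (essentially) unchanged, whereas one taking the upper value is pulled down to the liminf.

```python
import numpy as np

xs = np.linspace(0.0, 2.0, 2001)   # grid on [0, 2]; the jump sits at index 1000
i0 = 1000
u = xs.copy(); u[i0 + 1:] += 1.0   # u takes the lower value at the jump: lsc
v = xs.copy(); v[i0:] += 1.0       # v takes the upper value at the jump: not lsc

def envelope_at(vals, i):
    # u^e(s) = sup_{r>0} inf_{s' in B_r(s)} u(s'); the inf is nonincreasing in r,
    # so on the grid the sup is attained at the smallest ball (adjacent points).
    return min(vals[max(i - 1, 0):i + 2])

print("u(1) =", u[i0], " grid u^e(1) =", envelope_at(u, i0))  # ~1 and ~1: unchanged
print("v(1) =", v[i0], " grid v^e(1) =", envelope_at(v, i0))  # 2 vs ~1: pulled down
```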

In what follows it is assumed that Assumptions 1, 2 and 3 hold.

Proposition 5.1

Let \(S:=s^{e}\). Then:

(a) \(Q(B|x,a,b)\geq S(x,a,b)\nu (B)\) for all \((x,a,b)\in \mathbb{K}\);

(b) \(QW(x,a,b)\leq \lambda W(x)+\nu (W)S(x,a,b)\) for all \((x,a,b)\in \mathbb{K}\);

(c) \(\mu _{\phi ^{1},\phi ^{2}}(S_{\phi ^{1},\phi ^{2}})\geq \theta >0\) for every pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\).

Proof

Property (a) follows directly from the definition of lower semicontinuous envelope. To prove (b), note that \(\nu (W)s\geq QW-\lambda W\) and recall that \(QW\) and \(W\) are continuous functions; then

$$ \nu (W)S=(\nu (W)s)^{e}\geq (QW-\lambda W)^{e}=QW-\lambda W, $$

which proves (b). Now, note that part (b) yields for each pair of stationary strategies \((\phi ^{1},\phi ^{2})\) the inequality

$$ Q_{\phi ^{1},\phi ^{2}}W\leq \lambda W+\nu (W)S_{\phi ^{1},\phi ^{2}}. $$

Thus, integrating both sides with respect to \(\mu _{\phi ^{1},\phi ^{2}}\) and using the invariance \(\mu _{\phi ^{1},\phi ^{2}}(Q_{\phi ^{1},\phi ^{2}}W)=\mu _{\phi ^{1},\phi ^{2}}(W)\), it follows that

$$ \mu _{\phi ^{1},\phi ^{2}}(S_{\phi ^{1},\phi ^{2}})\geq (1-\lambda ) \frac{\mu _{\phi ^{1},\phi ^{2}}(W)}{\nu (W)}\geq \frac{1-\lambda }{\nu (W)}=\theta , $$

which proves part (c). □

Proposition 5.2

For every strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\), \(x\in \mathbf{X}\) and \(u\in B_{W}(\mathbf{X})\):

$$ \lim _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\pi ^{1},\pi ^{2})}u(x_{n})=0. $$

Proof

This result follows directly after noting that Assumptions 2(b) and 2(c) imply that

$$ 1\leq E_{x}^{(\pi ^{1},\pi ^{2})}W(x_{n})\leq \lambda ^{n}W(x)+ \frac{\nu (W)}{(1-\lambda )\nu (X)}\ \ \forall n\in \mathbb{N}_{0}. $$
(20)

 □
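For completeness, note that Assumption 2(b) with \(B=\mathbf{X}\) gives \(s\leq 1/\nu (\mathbf{X})\), so Assumption 2(c) yields

$$ E_{x}^{(\pi ^{1},\pi ^{2})}W(x_{n})\leq \lambda E_{x}^{(\pi ^{1},\pi ^{2})}W(x_{n-1})+\frac{\nu (W)}{\nu (\mathbf{X})}\ \ \forall n\in \mathbb{N}, $$

and iterating this bound \(n\) times gives (20).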

Lemma 5.3

There exists a constant \(l>0\) such that

$$ E_{x}^{(\pi ^{1},\pi ^{2})}\tau (x_{k},a_{k},b_{k})\geq lE_{x}^{(\pi ^{1}, \pi ^{2})}s(x_{k-1},a_{k-1},b_{k-1}) $$

for all \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},k\in \mathbb{N}\).

Proof

Fix \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and \(k\in \mathbb{N}\). Now, observe that, by Assumptions 1(a) and 3(b)-(d),

$$ \tau ^{\ast }(x):=\min _{(a,b)\in A(x)\times B(x)}\tau (x,a,b)>0. $$

Next consider the subsets

$$ \mathbf{X}_{n}:=\{x\in \mathbf{X}:\tau ^{\ast }(x)\geq 1/n\} $$

and notice that \(\mathbf{X}_{n}\uparrow \mathbf{X}\). Thus, for some \(N\in \mathbb{N}\), \(\nu (\mathbf{X}_{N})>0\) since \(\nu (\mathbf{X}_{n})\uparrow \nu (\mathbf{X})>0\). Then,

$$\begin{aligned} E_{x}^{(\pi ^{1},\pi ^{2})}\tau (x_{k},a_{k},b_{k}){}&{}=E_{x}^{(\pi ^{1}, \pi ^{2})}E_{x}^{(\pi ^{1},\pi ^{2})}[\tau (x_{k},a_{k},b_{k})|h_{k-1},a_{k-1},b_{k-1}]\\ &{}=E_{x}^{(\pi ^{1},\pi ^{2})}{\displaystyle \int _{\mathbf{X}}} {\displaystyle \int _{\mathbf{A}}} { \displaystyle \int _{\mathbf{B}}} \tau (y,a,b)\pi _{k}^{1}(da|h_{k}) \pi _{k}^{2}(db|h_{k})Q(dy|x_{k-1},a_{k-1},b_{k-1})\\ &{}\geq E_{x}^{(\pi ^{1},\pi ^{2})}{\displaystyle \int _{\mathbf{X}_{N}}} {\displaystyle \int _{ \mathbf{A}}} {\displaystyle \int _{\mathbf{B}}} \frac{1}{N}\pi _{k}^{1}(da|h_{k}) \pi _{k}^{2}(db|h_{k})Q(dy|x_{k-1},a_{k-1},b_{k-1})\\ &{}=\frac{1}{N}E_{x}^{(\pi ^{1},\pi ^{2})}Q(\mathbf{X}_{N}|x_{k-1},a_{k-1},b_{k-1})\\ &{}\geq \frac{1}{N}\nu (\mathbf{X}_{N})E_{x}^{( \pi ^{1},\pi ^{2})}s(x_{k-1},a_{k-1},b_{k-1}); \end{aligned}$$

which proves the desired result with \(l:=\nu (\mathbf{X}_{N})/N\). □

Lemma 5.4

The following inequality holds for all \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\):

$$ \liminf _{m\rightarrow \infty }\frac{1}{m}E_{x}^{(\pi ^{1},\pi ^{2})} \sum _{k=0}^{m-1}s(x_{k},a_{k},b_{k})\geq \frac{1-\lambda }{\nu (W)}. $$

Proof

Let \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) be fixed but arbitrary. It follows from Assumption 2(c) that

$$\begin{aligned} E_{x}^{(\pi ^{1},\pi ^{2})}s(x_{k-1},a_{k-1},b_{k-1}) & \geq \frac{1}{\nu (W)}E_{x}^{(\pi ^{1},\pi ^{2})}[QW(x_{k-1},a_{k-1},b_{k-1})- \lambda W(x_{k-1})] \\ & =\frac{1}{\nu (W)}E_{x}^{(\pi ^{1},\pi ^{2})}[W(x_{k})-W(x_{k-1})+(1-\lambda )W(x_{k-1})]. \end{aligned}$$

Then,

$$\begin{aligned} \frac{1}{m}\sum _{k=1}^{m}E_{x}^{(\pi ^{1},\pi ^{2})}s(x_{k-1},a_{k-1},b_{k-1}) & \geq \frac{1}{\nu (W)}\frac{1}{m}\left ( [E_{x}^{(\pi ^{1},\pi ^{2})}W(x_{m})-W(x)]\right . \\ & \ \ \ \ \ \ \ \left . +(1-\lambda )E_{x}^{(\pi ^{1},\pi ^{2})}\sum \limits _{k=1}^{m}W(x_{k-1})\right ) . \end{aligned}$$

The last inequality implies that

$$\begin{aligned} \liminf _{m\rightarrow \infty }\frac{1}{m}\sum _{k=1}^{m}E_{x}^{(\pi ^{1}, \pi ^{2})}s(x_{k-1},a_{k-1},b_{k-1}) & \geq \frac{1-\lambda }{\nu (W)}\liminf _{m \rightarrow \infty }\frac{1}{m}\sum _{k=1}^{m}E_{x}^{(\pi ^{1},\pi ^{2})}W(x_{k-1}) \\ & \geq \frac{1-\lambda }{\nu (W)}, \end{aligned}$$

proving the desired inequality. □

The next corollary is a direct consequence of the two previous lemmas and Proposition 5.2.

Corollary 5.5

Suppose Assumptions 1 and 2 hold. Then, for all \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},x\in \mathbf{X}\), the following inequality holds:

$$ \liminf _{m\rightarrow \infty }\frac{1}{m}\sum _{k=0}^{m-1}E_{x}^{( \pi ^{1},\pi ^{2})}\tau (x_{k},a_{k},b_{k})\geq \frac{(1-\lambda )l}{\nu (W)}. $$

Therefore,

$$ \sum _{k=0}^{\infty }E_{x}^{(\pi ^{1},\pi ^{2})}\tau (x_{k},a_{k},b_{k})= \infty $$

and

$$ \lim _{m\rightarrow \infty } \frac{E_{x}^{(\pi ^{1},\pi ^{2})}u(x_{m})}{\sum _{i=0}^{m-1}E_{x}^{(\pi ^{1},\pi ^{2})}\tau (x_{i},a_{i},b_{i})}=0 $$

for all \(u\in B_{W}(\mathbf{X})\).

Proof of Theorem 3.3

Fix an arbitrary strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and an arbitrary initial state \(x\in \mathbf{X}\). Assumption 1(b) and inequality (20) imply that

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\pi ^{1},\pi ^{2})} \sum _{i=0}^{n-1}|C(x_{i},a_{i},b_{i})| & \leq \limsup _{n \rightarrow \infty }\frac{k}{n}E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}W(x_{i}) \\ & \leq \lim _{n\rightarrow \infty }\frac{k}{n}[ \frac{1-\lambda ^{n}}{1-\lambda }+ \frac{n\nu (W)}{(1-\lambda )\nu (X)}]W(x) \\ & =\frac{k\nu (W)}{(1-\lambda )\nu (X)}W(x). \end{aligned}$$

Then, from the above inequality and Corollary 5.5, it follows that

$$\begin{aligned} |J(\pi ^{1},\pi ^{2},x)| & \leq \frac{\limsup _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}|C(x_{i},a_{i},b_{i})|}{\liminf _{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\pi ^{1},\pi ^{2})}\sum _{i=0}^{n-1}\tau (x_{i},a_{i},b_{i})} \\ & \leq \frac{k(\nu (W))^{2}}{l(1-\lambda )^{2}\nu (X)}W(x)< \infty , \end{aligned}$$
(21)

which is the desired result. □

Lemma 5.6

For all stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\):

$$ |\rho (\phi ^{1},\phi ^{2})|\leq \frac{k}{l\nu (\mathbf{X})}\left ( \frac{\nu (W)}{1-\lambda }\right ) ^{2}. $$

Hence, the constants

$$ \rho ^{l}:=\sup _{\phi ^{2}\in \Phi ^{2}}\inf _{\phi ^{1}\in \Phi ^{1}} \rho (\phi ^{1},\phi ^{2})\ \ \ \textit{and}\ \ \ \rho ^{u}:=\inf _{ \phi ^{1}\in \Phi ^{1}}\sup _{\phi ^{2}\in \Phi ^{2}}\rho (\phi ^{1},\phi ^{2}) $$

are well defined and finite; notice that \(\rho ^{u}\geq \rho ^{l}\).

Proof

It follows from (10), Proposition 3.1, Corollary 5.5 and Assumption 1(b) that

$$\begin{aligned} |\rho (\phi ^{1},\phi ^{2})| & \leq \frac{\mu _{\phi ^{1},\phi ^{2}}(|C_{\phi ^{1},\phi ^{2}}|)}{\mu _{\phi ^{1},\phi ^{2}}(\tau _{\phi ^{1},\phi ^{2}})} \\ & \leq \frac{k\nu (W)}{l(1-\lambda )}\mu _{\phi ^{1},\phi ^{2}}(W) \\ & \leq \frac{k}{l\nu (\mathbf{X})}\left ( \frac{\nu (W)}{1-\lambda } \right ) ^{2}, \end{aligned}$$

which is the result to be proven. □

Now, to show the existence of solutions to the Shapley equation, define the nonnegative kernel

$$ \widetilde{Q}(B|x,a,b):=Q(B|x,a,b)-S(x,a,b)\nu (B) $$

for \((x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X})\), and consider the operators on \(B_{W}(\mathbf{X})\) given by

$$\begin{aligned} Lu(x,a,b) & :=C(x,a,b)-\rho ^{l}\tau (x,a,b)+\widetilde{Q}u(x,a,b), \\ L^{e}u(x,a,b) & :=(Lu)^{e}(x,a,b), \\ Tu(x) & :=\sup _{\gamma ^{2}\in \mathbb{B}(x)}\inf _{\gamma ^{1}\in \mathbb{A}(x)}(L^{e}u)_{\gamma ^{1},\gamma ^{2}}(x) \end{aligned}$$

for \((x,a,b)\in \mathbb{K}\) and \(x\in \mathbf{X}\). Notice that \(\widetilde{Q}\geq 0\) and also that

$$ \sup _{(x,a,b)\in \mathbb{K}}\frac{|\widetilde{Q}u(x,a,b)|}{W(x)} \leq \lambda ||u||_{W}\ \ \forall u\in B_{W}(\mathbf{X}). $$

Lemma 5.7

The operator \(T\) is a contraction operator from \(L_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). Thus, by the Banach fixed-point theorem, there exists a unique function \(h^{\ast }\in L_{W}(\mathbf{X})\) such that \(h^{\ast }=Th^{\ast }\). Moreover:

(a) \(Th^{\ast }(x)=\inf _{\gamma ^{1}\in \mathbb{A}(x)}\sup _{\gamma ^{2}\in \mathbb{B}(x)}(L^{e}h^{\ast })_{\gamma ^{1},\gamma ^{2}}(x)\) for all \(x\in \mathbf{X}\);

(b) there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that

$$ h^{\ast }(x)=\sup _{\gamma ^{2}\in \mathbb{B}(x)}(L^{e}h^{\ast })_{ \phi _{\ast }^{1},\gamma ^{2}}(x)\ \ \forall x\in \mathbf{X}; $$

(c) for each \(\varepsilon >0\) there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that

$$ h^{\ast }(x)-\varepsilon \leq \inf _{\gamma ^{1}\in \mathbb{A}(x)}(L^{e}h^{ \ast })_{\gamma ^{1},\phi _{\varepsilon }^{2}}(x)\ \ \forall x\in \mathbf{X}. $$

Proof

Let \(u\) be an arbitrary function in \(L_{W}(\mathbf{X})\). It follows from Assumption 1 that \(|Lu|\leq MW\), where \(M:=k+|\rho ^{l}|k+\lambda ||u||_{W}\). Then, since \(W\) is continuous, \(|L^{e}u|\leq MW\), which in turn implies that \(|(L^{e}u)_{\gamma ^{1},\gamma ^{2}}|\leq MW\) for all measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\). On the other hand, \((L^{e}u)_{\gamma ^{1},\gamma ^{2}}\) is a lower semicontinuous function on \(\mathbf{X}\); hence, \((L^{e}u)_{\gamma ^{1},\gamma ^{2}}\in L_{W}(\mathbf{X})\) for all measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\). Now, combining the above facts with the minimax theorem [10, Lemma 3.3], it follows that \(Tu\in L_{W}(\mathbf{X})\) and also that

$$ Tu(x)=\inf _{\gamma ^{1}\in \mathbb{A}(x)}\sup _{\gamma ^{2}\in \mathbb{B}(x)}(L^{e}u)_{\gamma ^{1},\gamma ^{2}}(x)\ \ \forall x\in \mathbf{X.}$$

On the other hand, for \(u,v\in B_{W}(\mathbf{X})\), notice that

$$\begin{aligned} ||Tu-Tv||_{W} & =\sup _{x\in \mathbf{X}}\frac{|Tu(x)-Tv(x)|}{W(x)} \\ & \leq \sup _{x\in \mathbf{X}}\sup _{a\in A(x)}\sup _{b\in B(x)} \frac{|L^{e}u(x,a,b)-L^{e}v(x,a,b)|}{W(x)} \\ & \leq \sup _{x\in \mathbf{X}}\sup _{a\in A(x)}\sup _{b\in B(x)} \frac{|Lu(x,a,b)-Lv(x,a,b)|}{W(x)} \\ & \leq \sup _{x\in \mathbf{X}}\sup _{a\in A(x)}\sup _{b\in B(x)} \frac{|\widetilde{Q}(u-v)(x,a,b)|}{W(x)} \\ & \leq \lambda ||u-v||_{W}. \end{aligned}$$

Hence, \(T\) is a contraction operator from \(L_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). So, there exists a unique function \(h^{\ast }\in L_{W}(\mathbf{X})\) such that

$$ h^{\ast }(x)=Th^{\ast }(x)=\inf _{\gamma ^{1}\in \mathbb{A}(x)}\sup _{ \gamma ^{2}\in \mathbb{B}(x)}(L^{e}h^{\ast })_{\gamma ^{1},\gamma ^{2}}(x)\ \ \forall x\in \mathbf{X}. $$

The last two assertions follow from the minimax theorem [10, Lemma 3.3]. □
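The contraction property of \(T\) can also be checked numerically on a finite model, where \(L^{e}u=Lu\) and the inner \(\inf \sup \) reduces to the value of a matrix game, computable by linear programming. In the sketch below all model data are randomly generated illustrative assumptions, with state 0 playing the role of the minorizing atom, \(W\equiv 1\) and \(\rho ^{l}\) set to 0 (its value is irrelevant for the contraction estimate, since the term \(\rho ^{l}\tau \) cancels in \(Tu-Tv\)).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
nX, nA, nB = 3, 2, 2

# Random transition law with a uniform minorization at state 0 (Assumption 2(b)).
Q = rng.dirichlet(np.ones(nX), size=(nX, nA, nB))
Q[..., 0] = np.maximum(Q[..., 0], 0.4)
Q /= Q.sum(axis=-1, keepdims=True)
C = rng.normal(size=(nX, nA, nB))
tau = np.ones((nX, nA, nB))

s = Q[..., 0]                    # s(x,a,b), with nu = Dirac at state 0
lam = 1.0 - s.min()              # contraction modulus when W = 1
Qt = Q.copy()
Qt[..., 0] = 0.0                 # Qtilde = Q - s(.)nu(.)

def game_value(M):
    # Value of min_{gamma1} max_{gamma2} gamma1' M gamma2 via LP:
    # variables (gamma1, v); minimize v s.t. M' gamma1 <= v, gamma1 a distribution.
    na, nb = M.shape
    res = linprog(c=np.r_[np.zeros(na), 1.0],
                  A_ub=np.c_[M.T, -np.ones(nb)], b_ub=np.zeros(nb),
                  A_eq=np.concatenate([np.ones((1, na)), np.zeros((1, 1))], axis=1),
                  b_eq=[1.0], bounds=[(0, None)] * na + [(None, None)])
    return res.x[-1]

def T(u, rho=0.0):
    L = C - rho * tau + np.einsum('xaby,y->xab', Qt, u)
    return np.array([game_value(L[x]) for x in range(nX)])

u, v = rng.normal(size=nX), rng.normal(size=nX)
print("||Tu - Tv|| =", np.max(np.abs(T(u) - T(v))),
      " lam * ||u - v|| =", lam * np.max(np.abs(u - v)))
```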

Lemma 5.8

\(\nu (h^{\ast })=0\) and \(\rho ^{l}=\rho ^{u}\).

Proof

First it is shown that \(\nu (h^{\ast })\leq 0\) and that \(\rho ^{l}=\rho ^{u}\); after this, it is proved that \(\nu (h^{\ast })\geq 0\), thus completing the proof.

For each \(\varepsilon >0\), by Lemma 5.7(c) and the fact that \(L^{e}h^{\ast }\leq Lh^{\ast }\), there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that

$$ h^{\ast }\leq C_{\phi ^{1},\phi _{\varepsilon }^{2}}-\rho ^{l}\tau _{ \phi ^{1},\phi _{\varepsilon }^{2}}+Q_{\phi ^{1},\phi _{\varepsilon }^{2}}h^{ \ast }-\nu (h^{\ast })S_{\phi ^{1},\phi _{\varepsilon }^{2}}+\varepsilon \ \ \ \forall \phi ^{1}\in \Phi ^{1}. $$

Then, integrating both sides of the above inequality with respect to the invariant probability measure \(\mu _{\phi ^{1},\phi _{\varepsilon }^{2}}\), it follows after a rearrangement of terms that

$$ \rho ^{l}\leq \rho (\phi ^{1},\phi _{\varepsilon }^{2})+ \frac{\varepsilon -\nu (h^{\ast })\mu _{\phi ^{1},\phi _{\varepsilon }^{2}}(S_{\phi ^{1},\phi _{\varepsilon }^{2}})}{\mu _{\phi ^{1},\phi _{\varepsilon }^{2}}(\tau _{\phi ^{1},\phi _{\varepsilon }^{2}})} \ \ \forall \varepsilon >0,\phi ^{1}\in \Phi ^{1}. $$

Now suppose that \(\nu (h^{\ast })>0\) and take \(\varepsilon \in (0,\nu (h^{\ast })\theta )\), where \(\theta =(1-\lambda )/\nu (W)\) (see Proposition 3.1(b)). Then

$$ \rho ^{l}\leq \rho (\phi ^{1},\phi _{\varepsilon }^{2})+ \frac{[\varepsilon -\nu (h^{\ast })\theta ]\nu (W)}{1-\lambda }\ \ \ \forall \phi ^{1}\in \Phi ^{1}, $$

which, taking infimum over \(\phi ^{1}\), yields

$$ \rho ^{l}\leq \inf _{\phi ^{1}\in \Phi ^{1}}\rho (\phi ^{1},\phi _{ \varepsilon }^{2})+\frac{[\varepsilon -\nu (h^{\ast })\theta ]\nu (W)}{1-\lambda }< \rho ^{l}. $$

Therefore, \(\nu (h^{\ast })\leq 0\).

Recall that \(C\) and \(S\) are lower semicontinuous and \(\tau \) is continuous. Moreover, \(Qu\) is lower semicontinuous for each \(u\in L_{W}(\mathbf{X})\). Now, since \(\nu (h^{\ast })\leq 0\), the function \(Lh^{\ast }\) is lower semicontinuous too. Hence,

$$\begin{aligned} L^{e}h^{\ast }(x,a,b) & =Lh^{\ast }(x,a,b) \\ & =C(x,a,b)-\rho ^{l}\tau (x,a,b)+Qh^{\ast }(x,a,b)-\nu (h^{\ast })S(x,a,b) \end{aligned}$$

for all \((x,a,b)\in \mathbb{K}\). From Lemma 5.7(b), there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that

$$\begin{aligned} h^{\ast }(x) & =\sup _{\gamma ^{2}\in \mathbb{B}(x)}Lh^{\ast }{}_{\phi _{ \ast }^{1},\gamma ^{2}}(x) \\ & \geq C_{\phi _{\ast }^{1},\phi ^{2}}(x)-\rho ^{l}\tau _{\phi _{\ast }^{1}, \phi ^{2}}(x)+Q_{\phi _{\ast }^{1},\phi ^{2}}h^{\ast }(x)-\nu (h^{\ast })S_{ \phi _{\ast }^{1},\phi ^{2}}(x) \\ & \geq C_{\phi _{\ast }^{1},\phi ^{2}}(x)-\rho ^{l}\tau _{\phi _{\ast }^{1}, \phi ^{2}}(x)+Q_{\phi _{\ast }^{1},\phi ^{2}}h^{\ast }(x) \end{aligned}$$
(22)

for all \(x\in \mathbf{X},\phi ^{2}\in \Phi ^{2}\). Integrating with respect to \(\mu _{\phi _{\ast }^{1},\phi ^{2}}\), it follows that

$$ \rho ^{l}\geq \rho (\phi _{\ast }^{1},\phi ^{2})\ \ \ \forall \phi ^{2} \in \Phi ^{2}. $$

Thus,

$$ \rho ^{l}\geq \sup _{\phi ^{2}\in \Phi ^{2}}\rho (\phi _{\ast }^{1}, \phi ^{2})\geq \inf _{\phi ^{1}\in \Phi ^{1}}\sup _{\phi ^{2}\in \Phi ^{2}}\rho (\phi ^{1},\phi ^{2})=\rho ^{u}. $$

Therefore, \(\rho ^{u}=\rho ^{l}\).

Next it is shown that \(\nu (h^{\ast })\geq 0\). Since \(\rho ^{u}=\rho ^{l}\), the inequality (22) implies that

$$ \frac{\nu (h^{\ast })\mu _{{\phi _{\ast }^{1},\phi ^{2}}}(S_{\phi _{\ast }^{1},\phi ^{2}})}{\mu _{{\phi _{\ast }^{1},\phi ^{2}}}(\tau _{\phi _{\ast }^{1},\phi ^{2}})}\geq \rho (\phi _{\ast }^{1},\phi ^{2})-\inf _{\phi ^{1}\in \Phi ^{1}} \sup _{\phi ^{2}\in \Phi ^{2}}\rho (\phi ^{1},\phi ^{2})\ \ \forall \phi ^{2}\in \Phi ^{2}; $$

thus,

$$ \sup _{\phi ^{2}\in \Phi ^{2}} \frac{\nu (h^{\ast })\mu _{{\phi _{\ast }^{1},\phi ^{2}}}(S_{\phi _{\ast }^{1},\phi ^{2}})}{\mu _{{\phi _{\ast }^{1},\phi ^{2}}}(\tau _{\phi _{\ast }^{1},\phi ^{2}})} \geq \sup _{\phi ^{2}\in \Phi ^{2}}\rho (\phi _{\ast }^{1},\phi ^{2})- \inf _{\phi ^{1}\in \Phi ^{1}}\sup _{\phi ^{2}\in \Phi ^{2}}\rho ( \phi ^{1},\phi ^{2})\geq 0. $$

This inequality and Proposition 5.1(c) imply that \(\nu (h^{\ast })\geq 0\). Therefore, \(\nu (h^{\ast })=0\). □

Proof of Theorem 3.4

Parts (a), (b) and (c) follow from Lemmas 5.7 and 5.8 with \(\rho ^{\ast }=\rho ^{l}=\rho ^{u}\). Now to prove part (d), observe that (b) implies that

$$ h^{\ast }(x)\geq E_{x}^{(\phi _{\ast }^{1},\pi ^{2})}\sum _{i=0}^{n-1}C(x_{i},a_{i},b_{i})-\rho ^{\ast }E_{x}^{(\phi _{\ast }^{1},\pi ^{2})}\sum _{i=0}^{n-1}\tau (x_{i},a_{i},b_{i})+E_{x}^{(\phi _{\ast }^{1},\pi ^{2})}h^{\ast }(x_{n}) $$

for all \(x\in \mathbf{X},\pi ^{2}\in \Pi ^{2}\). In turn, by Corollary 5.5, this inequality implies that

$$ \rho ^{\ast }\geq J(\phi _{\ast }^{1},\pi ^{2},x)\ \ \ \forall x\in \mathbf{X},\pi ^{2}\in \Pi ^{2}. $$

Hence,

$$ \rho ^{\ast }\geq \sup _{\pi ^{2}\in \Pi ^{2}}J(\phi _{\ast }^{1},\pi ^{2},x) \geq U(x)\ \ \ \forall x\in \mathbf{X.}$$

Similarly, part (c) implies that

$$ h^{\ast }(x)\leq E_{x}^{(\pi ^{1},\phi _{\varepsilon }^{2})}\sum _{i=0}^{n-1}C(x_{i},a_{i},b_{i})-\rho ^{\ast }E_{x}^{(\pi ^{1},\phi _{\varepsilon }^{2})} \sum _{i=0}^{n-1}\tau (x_{i},a_{i},b_{i})+E_{x}^{(\pi ^{1},\phi _{ \varepsilon }^{2})}h^{\ast }(x_{n})+n\varepsilon $$

for all \(x\in \mathbf{X},\pi ^{1}\in \Pi ^{1}\), which, together with Corollary 5.5, implies that

$$ \rho ^{\ast }\leq J(\pi ^{1},\phi _{\varepsilon }^{2},x)+ \frac{\nu (W)}{l(1-\lambda )}\varepsilon \ \ \forall x\in \mathbf{X}, \pi ^{1}\in \Pi ^{1}, $$

where \(l\) is the constant in Lemma 5.3. Thus,

$$\begin{aligned} \rho ^{\ast } & \leq \inf _{\pi ^{1}\in \Pi ^{1}}J(\pi ^{1},\phi _{ \varepsilon }^{2},x)+\frac{\nu (W)}{l(1-\lambda )}\varepsilon \\ & \leq L(x)+\frac{\nu (W)}{l(1-\lambda )}\varepsilon \end{aligned}$$

for all \(x\in \mathbf{X}\). Hence, since this inequality holds for all positive \(\varepsilon \), it follows that

$$ U(x)\leq \sup _{\pi ^{2}\in \Pi ^{2}}J(\phi _{\ast }^{1},\pi ^{2},x) \leq \rho ^{\ast }\leq L(x)\ \ \forall x\in \mathbf{X.}$$

Therefore, \(\rho ^{\ast }\) is the value of the game and \(\phi _{\ast }^{1}\) is optimal for player 1.

Finally note that part (e) was already proved in Lemma 5.8. Thus, the proof of Theorem 3.4 is now complete. □

6 Proof of Theorem 4.2

The plan to prove Theorem 4.2 is to show that the inventory system satisfies all assumptions of Theorem 3.4, namely, Assumptions 1, 2 and 3. The next lemma is essential for the proof that Assumption 2 holds.

Lemma 6.1

Suppose that Assumptions 4 and 5 hold. Then, there exists \(r>0\) such that (17) holds, that is, \(\sup _{b\in \mathbf{B}}\Phi _{b}(r)<1\), where \(\Phi _{p}(\cdot )\) is as in (18).

Proof

Integrating by parts twice leads to the equality

$$ e^{-x}=1-x+\frac{x^{2}}{2}-\frac{1}{2}\int _{0}^{x}(x-s)^{2}e^{-s}ds\ \ \ \forall x\geq 0; $$

this implies that

$$ e^{-x}\leq 1-x+\frac{x^{2}}{2}\ \ \forall x\geq 0. $$

Let \(Z\) be a random variable with distribution \(b\in \mathbf{B}\). Then,

$$ \Phi _{b}(t)=E_{b}e^{t(\widehat{a}-Z)}=e^{t\widehat{a}}E_{b}e^{-tZ}. $$

This equality, combined with the inequality above, implies that

$$\begin{aligned} \Phi _{b}(t) & \leq e^{t\widehat{a}}\left( 1-tE_{b}Z+\frac{t^{2}}{2}E_{b}Z^{2}\right) \\ & \leq e^{t\widehat{a}}\left( 1-tz_{\ast }+\frac{t^{2}}{2}s^{\ast }\right) \ \ \ \forall t\geq 0. \end{aligned}$$
(23)

Now consider the function

$$ h(t):=e^{-t\widehat{a}}-1+tz_{\ast }-\frac{t^{2}}{2}s^{\ast },\ \ t\in \mathbb{R}, $$

and its derivative

$$ h^{\prime }(t)=-\widehat{a}e^{-t\widehat{a}}+z_{\ast }-ts^{\ast }. $$

Since \(h^{\prime }\) is continuous and \(h^{\prime }(0)=z_{\ast }-\widehat{a}>0\), there exists \(\delta >0\) such that \(h^{\prime }>0\) on the interval \((0,\delta )\). For each \(r\in (0,\delta )\), by the mean value theorem, there exists \(t^{\ast }\in (0,r)\) such that

$$ h(r)=h(r)-h(0)=h^{\prime }(t^{\ast })r>0; $$

thus,

$$ e^{-r\widehat{a}}-1+rz_{\ast }-\frac{r^{2}}{2}s^{\ast }>0, $$

which implies that

$$ \rho :=e^{r\widehat{a}}(1-rz_{\ast }+\frac{r^{2}}{2}s^{\ast })< 1. $$

This latter fact combined with (23) yields that \(\Phi _{b}(r)\leq \rho \) for all \(b\in \mathbf{B}\), which in turn implies the desired result. □
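The argument above can be checked numerically. The following minimal sketch uses hypothetical data (none of it comes from the paper): a demand \(Z\) that is exponential with mean 2, so \(\mu _{b}=2\) and \(E_{b}Z^{2}=8\), together with the illustrative values \(\widehat{a}=1<\mu _{b}\) and \(r=0.2\). It estimates \(\Phi _{b}(r)\) by Monte Carlo and compares it with the constant \(\rho \) from the proof.

```python
import numpy as np

# A minimal numeric sketch of Lemma 6.1 under hypothetical parameters
# (not taken from the paper): demand Z ~ Exp(mean 2), so mu_b = 2 and
# E[Z^2] = 8; maximum order a_hat = 1 < mu_b, and r = 0.2.
rng = np.random.default_rng(0)
a_hat, z_star, s_star, r = 1.0, 2.0, 8.0, 0.2
Z = rng.exponential(scale=2.0, size=1_000_000)

# Monte Carlo estimate of Phi_b(r) = E_b exp(r * (a_hat - Z))
phi = np.exp(r * a_hat) * np.mean(np.exp(-r * Z))
# The constant rho = e^{r a_hat} (1 - r z_* + r^2 s^* / 2) from the proof
rho = np.exp(r * a_hat) * (1.0 - r * z_star + 0.5 * r**2 * s_star)

print(f"Phi_b(r) ~ {phi:.4f} <= rho = {rho:.4f} < 1")  # ~0.8724 <= 0.9283
```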

Lemma 6.2

Suppose that Assumptions 4 and 5 hold. Let \(\nu \) be the Dirac measure at 0. Then, the constant \(\lambda :=\rho <1\) obtained in the proof of Lemma 6.1, the measure \(\nu \) and the functions \(W\) and \(s\) given in (19) satisfy Assumption 2.

Proof

First, observe that Assumption 2(a) holds since \(\nu (W)=W(0)\). On the other hand,

$$\begin{aligned} \int _{\mathbf{X}}v(y)Q(dy|x,a,b) & =\int _{\mathbb{R}_{+}}v((x+a-w)^{+})b(dw) \\ & =\int _{[0,x+a]}v(x+a-w)b(dw)+v(0)b((x+a,\infty )) \end{aligned}$$

for every measurable function \(v:\mathbf{X}\rightarrow \mathbb{R}\) for which these integrals exist. In particular, it holds that

$$\begin{aligned} \int _{\mathbf{X}}W(y)Q(dy|x,a,b) & =W(x)\int _{[0,x+a]}e^{r(a-w)}b(dw)+W(0)s(x,a,b) \\ & \leq \Phi _{b}(r)W(x)+\nu (W)s(x,a,b) \\ & \leq \lambda W(x)+\nu (W)s(x,a,b), \end{aligned}$$

which proves that Assumption 2(c) holds; here the first equality uses the exponential form of \(W\) given in (19), and the first inequality uses \(a\leq \widehat{a}\). Similarly, taking \(v=\mathbb{I}_{B}\), it follows that

$$\begin{aligned} Q(B|x,a,b) & =\int _{[0,x+a]}\mathbb{I}_{B}(x+a-w)b(dw)+\nu (B)b((x+a, \infty )) \\ & \geq \nu (B)s(x,a,b), \end{aligned}$$

which is Assumption 2(b).

Finally, to prove the inequality \(\nu (s_{\phi ^{1},\phi ^{2}})>0\) for every pair of stationary strategies \((\phi ^{1},\phi ^{2})\), observe that

$$ \nu (s_{\phi ^{1},\phi ^{2}})\geq \int _{\mathbf{B}}b((\widehat{a}, \infty ))\phi ^{2}(db|0). $$

Next, note that \(b((\widehat{a},\infty ))>0\) for every \(b\in \mathbf{B}\): otherwise \(Z\leq \widehat{a}\) \(b\)-almost surely, so that \(\mu _{b}\leq \widehat{a}\), contradicting \(\mu _{b}>\widehat{a}\). This implies that

$$ \int _{\mathbf{B}}b((\widehat{a},\infty ))\phi ^{2}(db|0)>0. $$

Hence, \(\nu (s_{\phi ^{1},\phi ^{2}})>0\). □
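As a complementary sanity check, the drift inequality of Assumption 2(c) can be simulated directly. The sketch below assumes the exponential form \(W(y)=e^{ry}\) and \(s(x,a,b)=b((x+a,\infty ))\) from (19); the demand distribution and the state-action pair \((x,a)\) are hypothetical, as in the previous sketch.

```python
import numpy as np

# A sketch of the drift inequality QW <= lambda*W + nu(W)*s of Lemma 6.2,
# assuming W(y) = exp(r*y) and s(x,a,b) = b((x+a, inf)) as in (19); the
# demand distribution and the pair (x, a) are hypothetical.
rng = np.random.default_rng(1)
r = 0.2
lam = 0.9283                      # upper bound for Phi_b(r), cf. Lemma 6.1
Z = rng.exponential(scale=2.0, size=1_000_000)

def W(y):
    return np.exp(r * y)

x, a = 3.0, 1.0
QW = np.mean(W(np.maximum(x + a - Z, 0.0)))   # QW(x,a,b) by Monte Carlo
s_xab = np.mean(Z > x + a)                    # s(x,a,b) = b((x+a, inf))
print(QW <= lam * W(x) + W(0.0) * s_xab)      # expected output: True
```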

Lemma 6.3

The payoff function \(C\) in (13) is lower semicontinuous.

Proof

Suppose that \((x_{n},a_{n},b_{n})\rightarrow (x,a,b)\) and define

$$\begin{aligned} h_{n}(w) & :=(w-y_{n})^{+},\ \ n\in \mathbb{N},w\geq 0, \\ h(w) & :=(w-y)^{+},\ \ w\geq 0, \end{aligned}$$

where \(y_{n}:=x_{n}+a_{n},n\in \mathbb{N}\), and \(y:=x+a\); since \(y_{n}\rightarrow y\), there exists \(L>0\) such that \(y_{n}\leq L\) for all \(n\in \mathbb{N}\). These functions satisfy the following properties:

  1. (i)

    \(\{h_{n}\}\) is asymptotically uniformly integrable with respect to the sequence of probability measures \(\{b_{n}\}\), which means that

    $$ \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{0}^{ \infty }|h_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)=0, $$

    where \(B_{n}^{K}:=\{w\geq 0:|h_{n}(w)|\geq K\},n\in \mathbb{N},K>0\). In fact, note that

    $$\begin{aligned} \int _{\mathbb{R}_{+}}|h_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw) & = \int _{\mathbb{R}_{+}}(w-y_{n})\mathbb{I}_{(y_{n},\infty )}(w) \mathbb{I}_{(y_{n}+K,\infty )}(w)b_{n}(dw) \\ & \leq \int _{\mathbb{R}_{+}}w\mathbb{I}_{[K,\infty )}(w)b_{n}(dw)-y_{n}b_{n}((L+K,\infty )). \end{aligned}$$

    Thus, taking the limit superior as \(n\rightarrow \infty \), it follows that

    $$ \limsup _{n\rightarrow \infty }\int _{\mathbb{R}_{+}}|h_{n}(w)| \mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)\leq \int _{\mathbb{R}_{+}}w\mathbb{I}_{[K, \infty )}(w)b(dw)-yb((L+K,\infty )), $$

    from which the desired result follows.

  2. (ii)

    the sequence \(\{h_{n}\}\) is equicontinuous; indeed, each \(h_{n}\) is 1-Lipschitz because the mapping \(u\rightarrow u^{+}\) is 1-Lipschitz.

  3. (iii)

    \(\{h_{n}\}\) converges to \(h\) in measure \(b\); this follows because \(h_{n}\rightarrow h\) pointwise.

Hence, by [1, Corollary 5.2], it holds that

$$\begin{aligned} E_{b}(w-x-a)^{+} & =\int _{\mathbb{R}_{+}}(w-x-a)^{+}b(dw) \\ & =\lim _{n\rightarrow \infty }\int _{\mathbb{R}_{+}}(w-x_{n}-a_{n})^{+}b_{n}(dw) \\ & =\lim _{n\rightarrow \infty }E_{b_{n}}(w-x_{n}-a_{n})^{+}. \end{aligned}$$

Therefore,

$$\begin{aligned} \liminf _{n\rightarrow \infty }C(x_{n},a_{n},b_{n}) & =\liminf _{n\rightarrow \infty }[c_{1}\mathbb{I}_{(0,\infty )}(a_{n})+c_{2}a_{n}+c_{3}(x_{n}+a_{n})+c_{4}E_{b_{n}}(w-x_{n}-a_{n})^{+}] \\ & \geq C(x,a,b), \end{aligned}$$

which proves that the cost function \(C\) is lower semicontinuous. □
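The convergence \(E_{b_{n}}(w-x_{n}-a_{n})^{+}\rightarrow E_{b}(w-x-a)^{+}\) obtained from [1, Corollary 5.2] can also be observed numerically; a minimal sketch with a hypothetical weakly convergent sequence (exponential demands with means \(2+1/n\)) follows.

```python
import numpy as np

# Illustration of E_{b_n}(w - x_n - a_n)^+ -> E_b(w - x - a)^+ (Lemma 6.3)
# for the hypothetical choices b_n = Exp(mean 2 + 1/n) -> b = Exp(mean 2)
# and y_n = x_n + a_n = 1.5 + 1/n -> y = 1.5.
rng = np.random.default_rng(2)

def expected_shortage(mean, y, size=1_000_000):
    Z = rng.exponential(scale=mean, size=size)
    return np.mean(np.maximum(Z - y, 0.0))    # E(Z - y)^+ by Monte Carlo

limit = expected_shortage(2.0, 1.5)           # E_b(w - x - a)^+
for n in (1, 10, 100):
    approx = expected_shortage(2.0 + 1.0 / n, 1.5 + 1.0 / n)
    print(n, abs(approx - limit))             # shrinks to Monte Carlo noise
```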

Lemma 6.4

\(W\) and \(QW\) are continuous functions.

Proof

Clearly, \(W\) is continuous. To prove the second statement, proceed as in the proof of the previous lemma: consider a sequence \((x_{n},a_{n},b_{n})\in \mathbb{K},n\in \mathbb{N}\), that converges to \((x,a,b)\in \mathbb{K}\). Next, define

$$\begin{aligned} \widehat{h}_{n}(w) & :=W((x_{n}+a_{n}-w)^{+}),\ \ n\in \mathbb{N},w\geq 0, \\ \widehat{h}(w) & :=W((x+a-w)^{+}),\ \ w\geq 0. \end{aligned}$$

The following facts hold:

  1. (i)

    \(\{\widehat{h}_{n}\}\) is asymptotically uniformly integrable with respect to the sequence of probability measures \(\{b_{n}\}\), which means that

    $$ \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{ \mathbb{R}_{+}}|\widehat{h}_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)=0, $$

    where \(B_{n}^{K}:=\{w\geq 0:|\widehat{h}_{n}(w)|\geq K\}\). This is true because the sequence \(\{\widehat{h}_{n}\}\) is uniformly bounded.

  2. (ii)

    the sequence \(\{\widehat{h}_{n}\}\) is equicontinuous; to prove this, let \(L\) be a bound for the sequence \(\{x_{n}+a_{n}\}\) and observe that the sequence of functions \(g_{n}(w):=(x_{n}+a_{n}-w)^{+},n\in \mathbb{N},w\geq 0\), is equicontinuous (each \(g_{n}\) is 1-Lipschitz). Since \(W\) is uniformly continuous on \([0,L]\), it follows that the sequence \(\{\widehat{h}_{n}\}\) is equicontinuous too.

  3. (iii)

    \(\{\widehat{h}_{n}\}\) converges to \(\widehat{h}\) in measure \(b\); this follows from the pointwise convergence.

Hence, by [1, Corollary 5.2], it holds that

$$\begin{aligned} QW(x,a,b) & =\int _{\mathbb{R}_{+}}\widehat{h}(w)b(dw) \\ & =\lim _{n\rightarrow \infty }\int _{\mathbb{R}_{+}}\widehat{h}_{n}(w)b_{n}(dw) \\ & =\lim _{n\rightarrow \infty }QW(x_{n},a_{n},b_{n}), \end{aligned}$$

which proves the continuity of \(QW\). □

Lemma 6.5

The transition law \(Q\) is weakly continuous.

Proof

Let \((x_{n},a_{n},b_{n})\in \mathbb{K},n\in \mathbb{N}\), be a sequence that converges to \((x,a,b)\in \mathbb{K}\); fix an arbitrary bounded continuous function \(v:\mathbf{X}\rightarrow \mathbb{R}\) and an arbitrary real number \(\varepsilon >0\). The sequence \(\{b_{n}\}\) is tight since it is weakly convergent; thus, there exists a constant \(K_{1}>0\) such that

$$ \sup _{n}b_{n}([K_{1},\infty ))< \varepsilon /(4M), $$

where \(M>0\) is a bound for \(|v|\). On the other hand, since \(x_{n}+a_{n}\rightarrow x+a\), there exists \(r>0\) such that

$$ 0\leq x_{n}+a_{n}< K_{2}:=x+\widehat{a}+r\ \ \ \forall n\in \mathbb{N}. $$

Now, let \(K:=\max \{K_{1},K_{2}\}\) and observe that

$$ (x_{n}+a_{n}-z)^{+}=0\ \ \ \forall z>K,n\in \mathbb{N}. $$

Moreover, since \(v\) is uniformly continuous on \([0,K]\), there exists \(\delta >0\) such that

$$ |v(z_{1})-v(z_{2})|< \varepsilon /4 $$

for all \(z_{1},z_{2}\in \lbrack 0,K]\) with \(|z_{1}-z_{2}|<\delta \).

Next consider the continuous bounded function

$$ g(z):=v((x+a-z)^{+}),\ \ z\in \lbrack 0,\infty ), $$

and take \(N\in \mathbb{N}\) such that

$$\begin{aligned} |\int _{0}^{\infty }g(z)b_{n}(dz)-\int _{0}^{\infty }g(z)b(dz)| & < \varepsilon /4, \\ |(x_{n}+a_{n}-z)^{+}-(x+a-z)^{+}| & < \delta \ \ \ \forall z\in \lbrack 0,\infty ) \end{aligned}$$

hold for all \(n\geq N\) (the second inequality because \(|(x_{n}+a_{n}-z)^{+}-(x+a-z)^{+}|\leq |x_{n}+a_{n}-(x+a)|\)), and put

$$ A_{n}:=|\int _{\mathbf{X}}v(y)Q(dy|x_{n},a_{n},b_{n})-\int _{ \mathbf{X}}v(y)Q(dy|x,a,b)|,\ \ n\in \mathbb{N}. $$

Then,

$$\begin{aligned} A_{n} & =\left \vert \int _{0}^{\infty }v((x_{n}+a_{n}-z)^{+})b_{n}(dz)-\int _{0}^{\infty }v((x+a-z)^{+})b(dz)\right \vert \\ & \leq \left \vert \int _{0}^{\infty }v((x_{n}+a_{n}-z)^{+})b_{n}(dz)-\int _{0}^{K}v((x_{n}+a_{n}-z)^{+})b_{n}(dz)\right \vert \\ & \ \ \ +\left \vert \int _{0}^{K}v((x_{n}+a_{n}-z)^{+})b_{n}(dz)-\int _{0}^{K}v((x+a-z)^{+})b_{n}(dz)\right \vert \\ & \ \ \ +\left \vert \int _{K}^{\infty }v((x+a-z)^{+})b_{n}(dz)\right \vert \\ & \ \ \ +\left \vert \int _{0}^{\infty }v((x+a-z)^{+})b_{n}(dz)-\int _{0}^{\infty }v((x+a-z)^{+})b(dz)\right \vert . \end{aligned}$$

Thus, for all \(n\geq N\), it holds that

$$\begin{aligned} A_{n} & \leq \int _{K}^{\infty }|v((x_{n}+a_{n}-z)^{+})|b_{n}(dz)+ \\ & \ \ \ \ \ \ \ \ \ \ \int _{0}^{K}|v((x_{n}+a_{n}-z)^{+})-v((x+a-z)^{+})|b_{n}(dz)+ \\ & \ \ \ \ \ \ \ \ \ \ \int _{K}^{\infty }|v((x+a-z)^{+})|b_{n}(dz)+ \\ & \ \ \ \ \ \ \ \ \ \ |\int _{0}^{\infty }v((x+a-z)^{+})b_{n}(dz)- \int _{0}^{\infty }v((x+a-z)^{+})b(dz)|, \end{aligned}$$

which yields the inequality

$$ A_{n}< 2Mb_{n}([K,\infty ))+\varepsilon b_{n}([0,K])/4+\varepsilon /4< \varepsilon \ \ \ \forall n\geq N. $$

Hence, \(Q(\cdot |x_{n},a_{n},b_{n}),n\in \mathbb{N}\), converges weakly to \(Q(\cdot |x,a,b)\). □
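For a concrete (and purely hypothetical) illustration of this weak continuity, one can take \(v(y)=1/(1+y)\), which is bounded and continuous, and check by simulation that \(\int v\,dQ(\cdot |x_{n},a_{n},b_{n})\rightarrow \int v\,dQ(\cdot |x,a,b)\) along an assumed weakly convergent sequence.

```python
import numpy as np

# A sketch of the weak continuity of Q (Lemma 6.5) with hypothetical data:
# v(y) = 1/(1 + y) is bounded and continuous, b_n = Exp(mean 2 + 1/n) -> b,
# and x_n + a_n = 1.5 + 1/n -> x + a = 1.5.
rng = np.random.default_rng(3)

def v(y):
    return 1.0 / (1.0 + y)

def integral_v_dQ(mean, y, size=1_000_000):
    Z = rng.exponential(scale=mean, size=size)
    return np.mean(v(np.maximum(y - Z, 0.0)))  # int v dQ(.|x,a,b)

limit = integral_v_dQ(2.0, 1.5)
for n in (1, 10, 100):
    print(n, abs(integral_v_dQ(2.0 + 1.0 / n, 1.5 + 1.0 / n) - limit))
```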

Lemma 6.6

The set of probability distributions \(\mathbf{B}\) is sequentially compact.

Proof

Notice that the following inequalities

$$ z^{\ast }\geq \mu _{b}\geq \int _{(k,\infty )}sb(ds)\geq kb((k,\infty )) $$
(24)

hold for all \(b\in \mathbf{B}\) and \(k>0\). Hence, \(\mathbf{B}\) is tight.
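For instance (an illustrative computation only, with a distribution that need not belong to \(\mathbf{B}\)): if \(b\) is the exponential distribution with mean 2, then \(\mu _{b}=2\) and

$$ kb((k,\infty ))=ke^{-k/2}\leq 2e^{-1}< 2=\mu _{b}\ \ \ \forall k>0, $$

in agreement with (24).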

Now take a sequence \(\{b_{n}\}\subset \mathbf{B}\). By Prohorov's theorem, there exist \(b_{0}\in \mathbb{P}(\mathbf{X})\) and a subsequence \(\{b_{n_{k}}\}\) that converges weakly to \(b_{0}\). Then,

$$ z^{\ast }\geq \liminf _{k\rightarrow \infty }\int _{0}^{\infty }sb_{n_{k}}(ds)\geq \int _{0}^{\infty }sb_{0}(ds)=\mu _{b_{0}}. $$

Moreover, since \(\int _{0}^{\infty }s\mathbb{I}_{(k,\infty )}(s)b(ds)\leq \frac{1}{k}\int _{0}^{\infty }s^{2}b(ds)\leq s^{\ast }/k\) for all \(b\in \mathbf{B}\), for each \(\varepsilon >0\) there exists \(k_{0}\in \mathbb{N}\) such that

$$ \sup _{b\in \mathbf{B}}\int _{0}^{\infty }s\mathbb{I}_{(k_{0},\infty )}(s)b(ds)< \varepsilon . $$

Since the mapping \(s\rightarrow s\mathbb{I}_{[0,m]}(s)\) is upper semicontinuous and bounded from above, it follows for \(m>k_{0}\) that

$$\begin{aligned} \mu _{b_{0}} & \geq \int _{0}^{\infty }s\mathbb{I}_{[0,m]}(s)b_{0}(ds) \\ & \geq \limsup _{k\rightarrow \infty }\int _{0}^{\infty }s\mathbb{I}_{[0,m]}(s)b_{n_{k}}(ds) \\ & =\limsup _{k\rightarrow \infty }\left (\int _{0}^{\infty }sb_{n_{k}}(ds)-\int _{0}^{\infty }s\mathbb{I}_{(m,\infty )}(s)b_{n_{k}}(ds)\right ) \\ & \geq z_{\ast }-\varepsilon . \end{aligned}$$

Because \(\varepsilon \) can be chosen arbitrarily, the last inequality implies that \(\mu _{b_{0}}\geq z_{\ast }\).

Finally observe that

$$ s^{\ast }\geq \liminf _{k\rightarrow \infty }\int _{0}^{\infty }s^{2}b_{n_{k}}(ds)\geq \int _{0}^{\infty }s^{2}b_{0}(ds). $$

Therefore, \(b_{0}\in \mathbf{B}\), which proves that \(\mathbf{B}\) is sequentially compact. □

Proof of Theorem 4.2

As mentioned at the beginning of this section, the plan is to show that the semi-Markov inventory system satisfies Assumptions 1, 2 and 3. Thus, first note that Remark 4.1 and Lemma 6.2 prove that Assumptions 1 and 2 hold, respectively. Moreover, Lemma 6.3 proves the lower semicontinuity of the function \(C\), which is Assumption 3(a). On the other hand, Assumption 3(d) follows from Assumption 6 since \(\tau =\kappa \). Furthermore, Lemma 6.5 shows that \(Q\) is weakly continuous, which is Assumption 3(e), while Lemma 6.4 shows that Assumption 3(f) holds. Finally, note that Assumptions 3(b) and (c) trivially hold because \(A(x)=[0,\widehat{a}]\) and \(B(x)=\mathbf{B}\) for all \(x\in \mathbf{X}\) and, according to Lemma 6.6, \(\mathbf{B}\) is (sequentially) compact.

Next, consider a function \(v:\mathbb{K}\rightarrow \mathbb{R}\) and observe that

$$ \inf _{a\in A}\sup _{b\in B}v(x,a,b)=\inf _{\nu ^{1}\in \mathbb{A}} \sup _{\nu ^{2}\in \mathbb{B}}v(x,\nu ^{1},\nu ^{2})\ \ \forall x\in \mathbf{X.}$$

Thus, from Theorem 3.4, there exists a function \(h^{\ast }\) in \(L_{W}(\mathbf{X})\) such that

$$ h^{\ast }(x)=\inf _{a\in A}\sup _{b\in B}[C(x,a,b)-\rho ^{\ast }\tau (x,a,b)+Qh^{ \ast }(x,a,b)]\ \ \forall x\in \mathbf{X}. $$

Now, notice that for each \(b\in \mathbf{B}\), the function

$$ C(\cdot ,\cdot ,b)-\rho ^{\ast }\tau (\cdot ,\cdot ,b)+Qh^{\ast }( \cdot ,\cdot ,b) $$

is lower semicontinuous. Then, the mapping

$$ (x,a)\rightarrow \sup _{b\in B}[C(x,a,b)-\rho ^{\ast }\tau (x,a,b)+Qh^{ \ast }(x,a,b)] $$

is also lower semicontinuous. Hence, there exists \(f_{\ast }^{1}\in \mathbb{F}^{1}\) such that

$$ h^{\ast }(x)=\sup _{b\in \mathbf{B}}[C(x,f_{\ast }^{1}(x),b)-\rho ^{ \ast }\tau (x,f_{\ast }^{1}(x),b)+\int _{\mathbf{X}}h^{\ast }(y)Q(dy|x,f_{ \ast }^{1}(x),b)] $$

for all \(x\in \mathbf{X}\), which proves the first statement of Theorem 4.2. The other statements are proved following standard arguments (such as those given in the proof of Theorem 3.4(d) and (e)). □