Abstract
Under the framework given by a growth condition, a Lyapunov property and some continuity assumptions, the present work shows the existence of lower semicontinuous solutions to the Shapley equation for zero-sum semi-Markov games with Borel spaces, weakly continuous transition probabilities and possibly unbounded payoff. The existence of stationary optimal strategies for the minimizing player and of stationary \(\varepsilon \)-optimal strategies for the maximizing player is also shown. These results are proved using a fixed-point approach. Moreover, the existence of a deterministic stationary minimax strategy is established for a minimax semi-Markov inventory problem under mild assumptions on the demand distribution.
1 Introduction
This paper deals with zero-sum (ratio) average payoff semi-Markov games with Borel spaces, weakly continuous transition probabilities and an unbounded lower semicontinuous payoff function. It shows the existence of solutions to the Shapley equation, as well as the existence of an optimal stationary strategy for the minimizing player and an \(\varepsilon \)-optimal stationary strategy for the maximizing one, assuming that the game model satisfies growth and Lyapunov conditions besides some continuity properties. The framework settled by these conditions was already used in several previous works dealing with Markov and semi-Markov decision processes (for transition probabilities strongly continuous in the control variable, see references [4, 6, 7, 21]; for transition probabilities weakly continuous in the state-action pair, see references [9, 12, 13, 16, 24]) as well as with Markov and semi-Markov games (for strongly continuous transition probabilities see [8, 22]; for the weakly continuous case see [10, 11]). See also references [2, 17] for applications in communication systems. In fact, the present work extends the fixed-point approach of references [21, 24] to zero-sum average semi-Markov games with weakly continuous transition probabilities and provides an application to a minimax semi-Markov inventory problem under fairly weak assumptions on the demand distributions. The readers are referred to reference [15] for a review of zero-sum stochastic games in discrete time. It is worth mentioning that zero-sum average payoff semi-Markov games were seemingly first studied by Tanaka and Wakuta [20], who considered compact state and action spaces and assumed, among other conditions, that the payoff function is continuous (thus, bounded) and that the transition law is weakly continuous.
Concerning reference [10], some comments are in order. Jaśkiewicz [10] shows results similar to those of the present work under similar conditions, but there are, of course, some important differences. To begin, to take advantage of the contraction property implied by the Lyapunov condition, she directly “smooths” functions that are not lower semicontinuous by taking the pointwise liminf, which makes the proof of the validity of the Shapley equation somewhat technically involved. The present work gives a simpler proof of this result by using lower semicontinuous envelopes of functions and defining a suitable contraction operator. Moreover, the Lyapunov conditions of the present paper are weaker than those used in [10]. A second difference concerns the regularity property of the controlled processes, which is guaranteed in [10] by imposing essentially the standard condition on the holding time distribution (see, for instance, [19, Prop. 5.1(a), p. 88]). Recall that the regularity property states that the involved stochastic processes experience finitely many transitions in bounded time intervals. This property, together with the Lyapunov condition, plays an important role in guaranteeing that the ratio average payoff is well defined and finite-valued, and also in showing that the Shapley equation yields the existence of optimal or almost optimal stationary strategies for the players. These two latter facts are taken for granted and not discussed explicitly by Jaśkiewicz [10]. In contrast, the present work does not impose any probabilistic condition in addition to the Lyapunov condition; instead, in order to ensure that the ratio average payoff criterion is well defined, it is assumed that the admissible action sets for both players are compact. (Jaśkiewicz [10] supposes that the admissible action sets for one player are compact, while for the other player the admissible action sets are complete spaces.)
In fact, the mentioned Lyapunov condition implies that the semi-Markov processes induced by stationary policies are regular (see [23, Theorem 18.3.4 and Remark 18.3.6]). On the other hand, Jaśkiewicz [10] illustrates her results with a minimax Markov inventory problem with a finite number of possible distributions for the random demands, while the present work considers a minimax semi-Markov inventory problem assuming only that the first and second moments of the demands belong to bounded intervals. For additional results on minimax (or robust) control problems see references [3, 14, 16].
The remainder of the paper is organized as follows. Section 2 introduces the zero-sum semi-Markov game and the (ratio) average payoff performance index besides some standard concepts and notation. Section 3 states the assumptions and the main result (Theorem 3.4); its proof is given in Sect. 5. Section 4 shows the existence of a deterministic stationary minimax policy for a semi-Markov minimax inventory problem (see Theorem 4.2); the proof is given in Sect. 5.
2 Zero-Sum Average Payoff Semi-Markov Games
The following standard concepts and notation are used throughout the paper. For a Borel space \(\mathcal{S}\), that is, a Borel subset of a complete and separable metric space, \(\mathcal{B(S)}\) denotes the Borel \(\sigma \)-algebra; any statement about “measurability” should be understood as measurability with respect to \(\mathcal{B}(\mathcal{S)}\). The family of probability measures on \(\mathcal{S}\) is denoted by \(\mathbb{P}(\mathcal{S})\). Given two Borel spaces \(\mathcal{S}\) and \(\mathcal{S}^{\prime }\), a kernel \(K(\cdot |\cdot )\) on \(\mathcal{S}\) given \(\mathcal{S}^{\prime }\) is a mapping such that \(K(\cdot |s^{\prime })\) is a measure on \(\mathcal{S}\) for each \(s^{\prime }\in \mathcal{S}^{\prime }\), and \(K(B|\cdot )\) is a measurable function on \(\mathcal{S}^{\prime }\) for each \(B\in \mathcal{B}(\mathcal{S})\). The kernel \(K(\cdot |\cdot )\) is called a stochastic kernel if \(K(\mathcal{\cdot }|s^{\prime })\in \mathbb{P}(\mathcal{S)}\) for all \(s^{\prime }\in \mathcal{S}^{\prime }\).
Let \(K(\cdot |\cdot )\) be a kernel on \(\mathcal{S}\) given \(\mathcal{S}^{\prime }\). For an arbitrary measurable function \(u:\mathcal{S\rightarrow }\mathbb{R}\), let
\[ Ku(s^{\prime }):=\int_{\mathcal{S}}u(s)K(ds|s^{\prime }),\qquad s^{\prime }\in \mathcal{S}^{\prime }, \]
whenever the integral is well defined. Similarly, for a measure \(\nu \in \mathbb{P}(\mathcal{S)}\), set
\[ \nu (u):=\int_{\mathcal{S}}u(s)\nu (ds). \]
Given a measurable function \(w:\mathcal{S}\rightarrow \lbrack 1,\infty )\), \(B_{w}(\mathcal{S})\) stands for the family of measurable functions \(u\) on \(\mathcal{S}\) with finite \(w\)-norm, which is defined as
\[ ||u||_{w}:=\sup_{s\in \mathcal{S}}\frac{|u(s)|}{w(s)}. \]
Denote by \(L_{w}(\mathcal{S})\) the class of lower semicontinuous functions belonging to \(B_{w}(\mathcal{S})\). The normed space \((B_{w}(\mathcal{S}),||\cdot ||_{w})\) is a Banach space, while \((L_{w}(\mathcal{S}),d_{w})\) is a complete metric space, where \(d_{w}\) is the metric induced by the \(w\)-norm. The space of continuous bounded functions on \(\mathcal{S}\) is denoted by \(\mathcal{C}_{b}(\mathcal{S})\).
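For illustration only (not part of the paper's development), the \(w\)-norm and the induced metric \(d_{w}\) can be evaluated on a finite grid of states; the grid, the weight \(w\) and the function \(u\) below are hypothetical choices.

```python
# Sketch (hypothetical data): the weighted supremum norm ||u||_w = sup |u|/w
# and the metric d_w it induces, evaluated over a finite grid of states.

def w_norm(u, w, states):
    """Weighted sup-norm of u with respect to the weight function w >= 1."""
    return max(abs(u(s)) / w(s) for s in states)

def d_w(u, v, w, states):
    """Metric induced by the w-norm: d_w(u, v) = ||u - v||_w."""
    return w_norm(lambda s: u(s) - v(s), w, states)

states = [0.0, 1.0, 2.0, 3.0]
w = lambda s: 1.0 + s**2          # weight function, w >= 1
u = lambda s: 2.0 * s             # |u| grows slower than w, so the norm is finite
print(w_norm(u, w, states))       # maximum of |u|/w over the grid
```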
The set of nonnegative real numbers is denoted by \(\mathbb{R}_{+}\), and the sets of positive and nonnegative integers by \(\mathbb{N}\) and \(\mathbb{N}_{0}\), respectively.
The game model. We are interested in zero-sum semi-Markov games with the (ratio) average cost criterion given below in (6). This kind of game is specified by a semi-Markov game model given by the collection
\[ (\mathbf{X},\mathbf{A},\mathbf{B},\mathbb{K}_{\mathbf{A}},\mathbb{K}_{\mathbf{B}},q,c), \]
where the Borel spaces \(\mathbf{X},\mathbf{A},\mathbf{B}\) denote the state space of the game and the action or control sets for player 1 and player 2, respectively. The constraint sets \(\mathbb{K}_{\mathbf{A}}\) and \(\mathbb{K}_{\mathbf{B}}\) belong to \(\mathcal{B}(\mathbf{X\times A)}\) and \(\mathcal{B}(\mathbf{X\times B)}\), respectively. The \(x\)-sections
\[ A(x):=\{a\in \mathbf{A}:(x,a)\in \mathbb{K}_{\mathbf{A}}\},\qquad B(x):=\{b\in \mathbf{B}:(x,b)\in \mathbb{K}_{\mathbf{B}}\} \]
stand for the admissible action or control sets for players 1 and 2, respectively, when the game is in state \(x\in \mathbf{X}\). The set
\[ \mathbb{K}:=\{(x,a,b):x\in \mathbf{X},\ a\in A(x),\ b\in B(x)\} \]
is a Borel subset of the Cartesian product \(\mathbf{X}\times \mathbf{A}\times \mathbf{B}\) (see [18, Lemma 1.1]). The stochastic kernel \(q(\cdot |\cdot ,\cdot ,\cdot )\) on \(\mathbf{X\times }\mathbb{R}_{+}\) given \(\mathbb{K}\) is the transition law of the game. Finally, the measurable function \(c:\mathbb{K\times R}_{+}\rightarrow \mathbb{R}\) is the payoff function of the game.
The game is played over an infinite horizon as follows: at time \(t=0\), both players observe the game in some state, say, \(x_{0}=x\in \mathbf{X}\), and independently choose admissible controls \(a_{0}=a\in A(x)\) and \(b_{0}=b\in B(x)\). Then, the game remains in state \(x_{0}=x\) for a nonnegative random time \(\Delta _{1}\) and, at this time, it moves to a new state \(x_{1}=x^{\prime }\in \mathbf{X}\) according to the probability measure \(q(\cdot |x,a,b)\), that is,
\[ q(B\times D|x,a,b)=\mathrm{Prob}\left[ x_{1}\in B,\Delta _{1}\in D\,|\,x_{0}=x,a_{0}=a,b_{0}=b\right] \]
for \(B\in \mathcal{B}(\mathbf{X}),D\in \mathcal{B}(\mathbb{R}_{+})\). Immediately after the transition occurs, player 1 pays the amount \(c(x,a,b,\Delta _{1})\) to player 2; then both players choose new controls, say, \(a_{1}=a^{\prime }\in A(x^{\prime })\) and \(b_{1}=b^{\prime }\in B(x^{\prime })\), and the above process repeats over and over again.
This procedure engenders a stochastic process \(\{(x_{n},a_{n},b_{n},\Delta _{n+1})\}\), where, for each \(n\in \mathbb{N}_{0}\), \(x_{n}\) is the state of the game, \(a_{n}\) and \(b_{n}\) are the control variables for players 1 and 2, respectively, and \(\Delta _{n+1}\) is the time the game spends in state \(x_{n}\); thus, the random time \(\Delta _{n+1}\) is called the holding or sojourn time at state \(x_{n}\). Note that the random variable
\[ T_{n}:=\Delta _{1}+\Delta _{2}+\cdots +\Delta _{n},\qquad n\in \mathbb{N},\quad T_{0}:=0, \]
is the time of the nth jump of the game. Thus, if \(x_{n}=x,a_{n}=a\) and \(b_{n}=b\), according to (2), the conditional marginal
\[ Q(B|x,a,b):=q(B\times \mathbb{R}_{+}|x,a,b),\qquad B\in \mathcal{B}(\mathbf{X}), \]
governs the state to which the game moves at the next transition, irrespective of the time this transition takes to occur. Similarly, the conditional marginal
\[ F(D|x,a,b):=q(\mathbf{X}\times D|x,a,b),\qquad D\in \mathcal{B}(\mathbb{R}_{+}), \]
governs the time at which the next transition happens, irrespective of the state toward which the system moves; thus, it is called the holding time distribution. Then,
\[ \tau (x,a,b):=\int_{0}^{\infty }tF(dt|x,a,b) \]
is the mean holding or sojourn time, while
\[ C(x,a,b):=\int_{0}^{\infty }c(x,a,b,t)F(dt|x,a,b) \]
is the mean payoff that player 1 makes to player 2.
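The dynamics just described can be sketched numerically. The following snippet (all numbers hypothetical; a two-state model with the actions suppressed) simulates jumps of the state according to \(Q\), draws exponential holding times with means \(\tau \), and estimates the ratio of accumulated payoff to elapsed time:

```python
# Hypothetical two-state toy model of the semi-Markov dynamics: at each jump
# the state moves according to Q and the process waits a random holding time;
# the ratio of accumulated payoff to elapsed time estimates the average payoff.
import random

random.seed(0)
Q = {0: [0.3, 0.7], 1: [0.6, 0.4]}   # next-state distribution Q(.|x)
tau = {0: 1.0, 1: 2.0}               # mean holding times tau(x)
C = {0: 2.0, 1: 5.0}                 # expected one-step payoffs C(x)

def simulate(x, n_jumps):
    total_cost, total_time = 0.0, 0.0
    for _ in range(n_jumps):
        total_cost += C[x]
        total_time += random.expovariate(1.0 / tau[x])  # holding time, mean tau[x]
        x = 0 if random.random() < Q[x][0] else 1
    return total_cost / total_time

print(simulate(0, 50_000))   # estimate of the ratio average payoff
```

For this chain the invariant distribution is \((6/13,7/13)\), so the ratio average payoff is \(47/20=2.35\), and the simulated estimate approaches this value.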
Strategies. Let \(H_{0}:=\mathbf{X}\) and \(H_{n}:=\mathbb{K\times R}_{+}\times H_{n-1}\) for \(n\in \mathbb{N}\). Thus, for \(n\in \mathbb{N}\), each element
\[ h_{n}=(x_{0},a_{0},b_{0},\Delta _{1},\ldots ,x_{n-1},a_{n-1},b_{n-1},\Delta _{n},x_{n})\in H_{n} \]
is the history of the game up to the nth transition, which occurs at time \(T_{n}\). A strategy for player 1 is a sequence \(\pi ^{1}=\{\pi _{n}^{1}\}\) of stochastic kernels on \(\mathbf{A}\) given \(H_{n}\) that satisfy the constraint
\[ \pi _{n}^{1}(A(x_{n})|h_{n})=1,\qquad h_{n}\in H_{n},\ n\in \mathbb{N}_{0}. \]
The class of all strategies for player 1 is denoted by \(\Pi ^{1}\). Now, for each \(x\in \mathbf{X}\), let \(\mathbb{A}(x):=\mathbb{P}(A(x))\) and \(\mathbb{B}(x):=\mathbb{P}(B(x))\), and denote by \(\Phi ^{1}\) the class of stochastic kernels on \(\mathbf{A}\) given \(\mathbf{X}\) such that \(\phi ^{1}(\cdot |x)\in \mathbb{A}(x)\) for each \(x\in \mathbf{X}\). A policy \(\pi ^{1}=\{\pi _{n}^{1}\}\) is called stationary if
\[ \pi _{n}^{1}(\cdot |h_{n})=\phi ^{1}(\cdot |x_{n}),\qquad h_{n}\in H_{n},\ n\in \mathbb{N}_{0}, \]
for some stochastic kernel \(\phi ^{1}\in \Phi ^{1}\). In this case, as usual, the strategy \(\pi ^{1}=\{\pi _{n}^{1}\}\) is identified with the stochastic kernel \(\phi ^{1}\) and the class of all stationary strategies with \(\Phi ^{1}\). A stationary policy \(\phi ^{1}\) for player 1 is called deterministic stationary if there exists a measurable function \(f:\mathbf{X}\rightarrow \mathbf{A}\) satisfying \(f(x)\in A(x)\) for all \(x\in \mathbf{X}\) and such that \(\phi ^{1}(\cdot |x)\) is concentrated at \(f(x)\) for each \(x\in \mathbf{X}\); the set of deterministic stationary policies for player 1 is denoted by \(\mathbb{F}^{1}\). The sets of strategies \(\Pi ^{2}\), \(\Phi ^{2}\) and \(\mathbb{F}^{2}\) for player 2 are defined similarly but considering \(B(x)\) and \(\mathbb{B}(x)\) in lieu of \(A(x)\) and \(\mathbb{A}(x)\), respectively.
Throughout the remainder of the present work the following standard notation is used: for a measurable function \(u\) on \(\mathbb{K}\), \(x\in \mathbf{X}\) and probability measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\), let
\[ u_{\gamma ^{1},\gamma ^{2}}(x):=\int_{A(x)}\int_{B(x)}u(x,a,b)\gamma ^{2}(db)\gamma ^{1}(da). \]
Similarly, for a stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\) and \(x\in \mathbf{X}\), set
\[ u_{\phi ^{1},\phi ^{2}}(x):=\int_{A(x)}\int_{B(x)}u(x,a,b)\phi ^{2}(db|x)\phi ^{1}(da|x). \]
Thus, in particular,
\[ Q_{\phi ^{1},\phi ^{2}}(B|x)=\int_{A(x)}\int_{B(x)}Q(B|x,a,b)\phi ^{2}(db|x)\phi ^{1}(da|x),\qquad B\in \mathcal{B}(\mathbf{X}), \]
and also
\[ C_{\phi ^{1},\phi ^{2}}(x)=\int_{A(x)}\int_{B(x)}C(x,a,b)\phi ^{2}(db|x)\phi ^{1}(da|x),\qquad \tau _{\phi ^{1},\phi ^{2}}(x)=\int_{A(x)}\int_{B(x)}\tau (x,a,b)\phi ^{2}(db|x)\phi ^{1}(da|x). \]
The average payoff performance index. Let \(\Omega :=(\mathbb{K\times R}_{+})^{\infty }\) and ℱ the corresponding product \(\sigma \)-algebra. For each strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and probability measure \(\nu \) on \(\mathbf{X}\) there exist a probability measure \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) and a stochastic process \(\{(x_{n},a_{n},b_{n},\Delta _{n+1})\}\) defined on the sample space \((\Omega ,\mathcal{F})\) with the following properties:
\((i)\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[x_{0}\in B]=\nu (B)\);
\((\mathit{ii})\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[(a_{n},b_{n})\in C_{1}\times C_{2}|h_{n}]=\pi _{n}^{1}(C_{1}|h_{n})\pi _{n}^{2}(C_{2}|h_{n})\);
\((\mathit{iii})\ P_{\nu }^{(\pi ^{1},\pi ^{2})}[(x_{n+1},\Delta _{n+1})\in B \times D|h_{n},a_{n},b_{n}]=q(B\times D|x_{n},a_{n},b_{n})\);
for all \(B\in \mathcal{B}(\mathbf{X}),\ C_{1}\in \mathcal{B}(\mathbf{A}),C_{2}\in \mathcal{B}(\mathbf{B}),D\in \mathcal{B}( \mathbb{R}_{+}),h_{n}\in H_{n},n\in \mathbb{N}\).
The expectation operator with respect to \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) is denoted as \(E_{\nu }^{(\pi ^{1},\pi ^{2})}\). According to property \((i)\), the probability measure \(\nu \) is called initial distribution. If the initial distribution \(\nu \) is concentrated at some state \(x\in \mathbf{X}\), we shall write \(P_{x}^{(\pi ^{1},\pi ^{2})}\) and \(E_{x}^{(\pi ^{1},\pi ^{2})}\) instead of \(P_{\nu }^{(\pi ^{1},\pi ^{2})}\) and \(E_{\nu }^{(\pi ^{1},\pi ^{2})}\), respectively.
If the players use a stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\), by property (iii), the state process \(\{x_{n}\}\) is a Markov chain with one-step transition probability \(Q_{\phi ^{1},\phi ^{2}}(\cdot |\cdot )\). In this case, the n-step transition probability is denoted by \(Q_{\phi ^{1},\phi ^{2}}^{n}(\cdot |\cdot )\) for \(n\in \mathbb{N}_{0}\), where \(Q_{\phi ^{1},\phi ^{2}}^{0}(\cdot |x)\) is the Dirac measure concentrated at \(x\in \mathbf{X}\). Thus, for a measurable function \(u:\mathbf{X}\rightarrow \mathbb{R}\),
\[ E_{x}^{(\phi ^{1},\phi ^{2})}u(x_{n})=Q_{\phi ^{1},\phi ^{2}}^{n}u(x),\qquad x\in \mathbf{X},\ n\in \mathbb{N}_{0}, \]
whenever these quantities are well defined.
The (ratio) expected average payoff (EAP) for a strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and initial state \(x\in \mathbf{X}\) is defined as
\[ J(x,\pi ^{1},\pi ^{2}):=\limsup_{n\rightarrow \infty }\frac{E_{x}^{(\pi ^{1},\pi ^{2})}\sum_{i=0}^{n-1}c(x_{i},a_{i},b_{i},\Delta _{i+1})}{E_{x}^{(\pi ^{1},\pi ^{2})}T_{n}}. \]
Since the equalities
\[ E_{x}^{(\pi ^{1},\pi ^{2})}c(x_{i},a_{i},b_{i},\Delta _{i+1})=E_{x}^{(\pi ^{1},\pi ^{2})}C(x_{i},a_{i},b_{i}),\qquad E_{x}^{(\pi ^{1},\pi ^{2})}\Delta _{i+1}=E_{x}^{(\pi ^{1},\pi ^{2})}\tau (x_{i},a_{i},b_{i}) \]
hold for every strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\), state \(x\in \mathbf{X}\) and \(i\in \mathbb{N}_{0}\), the performance index (6) can be rewritten as
\[ J(x,\pi ^{1},\pi ^{2})=\limsup_{n\rightarrow \infty }\frac{E_{x}^{(\pi ^{1},\pi ^{2})}\sum_{i=0}^{n-1}C(x_{i},a_{i},b_{i})}{E_{x}^{(\pi ^{1},\pi ^{2})}\sum_{i=0}^{n-1}\tau (x_{i},a_{i},b_{i})}. \]
Roughly speaking, the goal of player 1 (player 2, resp.) is to minimize (maximize, resp.) (9). This leads to the lower and upper value functions
\[ L(x):=\sup_{\pi ^{2}\in \Pi ^{2}}\inf_{\pi ^{1}\in \Pi ^{1}}J(x,\pi ^{1},\pi ^{2}),\qquad U(x):=\inf_{\pi ^{1}\in \Pi ^{1}}\sup_{\pi ^{2}\in \Pi ^{2}}J(x,\pi ^{1},\pi ^{2}),\qquad x\in \mathbf{X}, \]
respectively. Observe that, in general, \(L(\cdot )\leq U(\cdot )\). Thus, if \(L(\cdot )=U(\cdot )\), it is said that the game has a value, and the common value of these functions, denoted as \(V(\cdot )\), is called the value of the game.
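The inequality \(L(\cdot )\leq U(\cdot )\) can be illustrated with a toy matrix game restricted to pure strategies (a hypothetical example, not taken from the paper): the max-min never exceeds the min-max.

```python
# Toy illustration (hypothetical payoff matrix, pure strategies only): the
# lower value sup_b inf_a never exceeds the upper value inf_a sup_b.
payoff = [[3.0, 1.0],
          [0.0, 4.0]]   # payoff[a][b]: amount player 1 (rows) pays player 2 (cols)

lower = max(min(payoff[a][b] for a in range(2)) for b in range(2))  # sup_b inf_a
upper = min(max(payoff[a][b] for b in range(2)) for a in range(2))  # inf_a sup_b

print(lower, upper)   # here lower < upper: this pure-strategy game has no value
assert lower <= upper
```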
If the game has value \(V(\cdot )\), a policy \(\pi _{\ast }^{1}\in \Pi ^{1}\) is said to be EAP-optimal for player 1 if
similarly, a policy \(\pi _{\ast }^{2}\in \Pi ^{2}\) is said to be EAP-optimal for player 2 if
If \(\pi _{\ast }^{i}\) is EAP-optimal for player \(i\ (i=1,2)\), then the pair \((\pi _{\ast }^{1},\pi _{\ast }^{2})\) is called an EAP-optimal pair or a saddle point. Observe that \((\pi _{\ast }^{1},\pi _{\ast }^{2})\) is an EAP-optimal pair if and only if
Remark 2.1
An important particular class of semi-Markov games is that of Markov games, which come out when the holding time distribution is concentrated at some positive constant \(c\), say \(c=1\), that is,
\[ F(\cdot |x,a,b)=\delta _{1}(\cdot ),\qquad (x,a,b)\in \mathbb{K}. \]
In this case,
\[ C(x,a,b)=c(x,a,b,1),\qquad (x,a,b)\in \mathbb{K}, \]
and \(\tau \equiv 1\). Thus, the performance index (9) becomes
\[ J(x,\pi ^{1},\pi ^{2})=\limsup_{n\rightarrow \infty }\frac{1}{n}E_{x}^{(\pi ^{1},\pi ^{2})}\sum_{i=0}^{n-1}C(x_{i},a_{i},b_{i}). \]
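As a toy numerical check of this reduction (hypothetical two-state chain, actions suppressed): when every holding time equals 1, the denominator of the ratio index is exactly \(n\), so the ratio index and the classical time average coincide.

```python
# Hypothetical two-state Markov chain: with unit holding times (tau == 1) the
# ratio index sum E[C] / sum E[tau] collapses to the time average (1/n) E[sum C].
Q = {0: [0.3, 0.7], 1: [0.6, 0.4]}   # one-step transition probabilities
C = {0: 2.0, 1: 5.0}                 # expected one-step payoffs
tau = {0: 1.0, 1: 1.0}               # constant unit holding times

def indices(x, n):
    dist = {0: 0.0, 1: 0.0}; dist[x] = 1.0     # distribution of x_i
    total_c = total_tau = 0.0
    for _ in range(n):
        total_c += sum(dist[y] * C[y] for y in (0, 1))
        total_tau += sum(dist[y] * tau[y] for y in (0, 1))
        dist = {z: sum(dist[y] * Q[y][z] for y in (0, 1)) for z in (0, 1)}
    return total_c / total_tau, total_c / n    # ratio index vs. time average

ratio_index, time_average = indices(0, 200)
print(ratio_index, time_average)               # the two indices agree
```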
Remark 2.2
The performance index (9) depends on the kernel \(q\) only through its marginal distributions \(Q\) and \(F\), which enter the computation in a “decoupled” way. Thus, one can assume without loss of generality that
\[ q(B\times D|x,a,b)=Q(B|x,a,b)F(D|x,a,b) \]
for all \((x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X}),D\in \mathcal{B}(\mathbb{R}_{+})\). A second consequence is that the holding time distribution \(F\) can be replaced by an exponential distribution with parameter \(\tau ^{-1}\), that is, by the distribution
\[ F^{\prime }([0,t]|x,a,b):=1-e^{-t/\tau (x,a,b)},\qquad t\geq 0, \]
under the assumption that \(\tau >0\). In other words, the semi-Markov game model (1) can be replaced without loss of generality by the model \((\mathbf{X},\mathbf{A},\mathbf{B},\mathbb{K}_{\mathbf{A}}, \mathbb{K}_{\mathbf{B}},q^{\prime },C)\) where
\[ q^{\prime }(B\times D|x,a,b):=Q(B|x,a,b)F^{\prime }(D|x,a,b), \]
\(Q\) is the transition kernel (3) and \(C\) is the cost function (8). It is worth mentioning that this might not be the case for a discounted performance index.
3 Solutions to the Shapley Equation and Optimal Stationary Strategies
The main result of the present work, Theorem 3.4 below, extends to semi-Markov games with weakly continuous transition probabilities the analysis given in [24] for Markov decision processes with continuous transition probabilities. Specifically, it shows the existence of lower semicontinuous solutions to the Shapley equation under Assumptions 1, 2 and 3 given below. As commented in the Introduction, the framework settled by these conditions has become quite standard for the study of average payoff optimization problems and has been used, for instance, for Markov and semi-Markov decision processes in [4, 6, 7, 9, 12, 13, 16, 21, 24], and for Markov and semi-Markov games in [10, 11, 15, 22]. See reference [5] for further comments and discussion on this kind of conditions.
Assumption 1
The following conditions hold for all \((x,a,b)\in \mathbb{K}\):
(a) \(\tau (x,a,b)>0\);
(b) there exist a measurable function \(W\geq 1\) on \(\mathbf{X}\) and a constant \(k>0\) such that
\[ \max \{|C(x,a,b)|,\tau (x,a,b)\}\leq kW(x). \]
Assumption 2
There exist a measurable nonnegative function \(s\) on \(\mathbb{K}\), a nontrivial measure \(\nu \) on \(\mathbf{X}\), and a constant \(\lambda \in (0,1)\) such that the following properties hold:
(a) \(\nu (W)<\infty \);
(b) \(Q(B|x,a,b)\geq s(x,a,b)\nu (B)\) for all \((x,a,b)\in \mathbb{K}\);
(c) \(QW(x,a,b)\leq \lambda W(x)+\nu (W)s(x,a,b)\) for all \((x,a,b)\in \mathbb{K}\);
(d) \(\nu (s_{\phi ^{1},\phi ^{2}})>0\) for all \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\).
The key point is that Assumption 2 entails a contraction property. To see this, let
\[ \widehat{Q}(B|x,a,b):=Q(B|x,a,b)-s(x,a,b)\nu (B),\qquad (x,a,b)\in \mathbb{K},\ B\in \mathcal{B}(\mathbf{X}). \]
Then, Assumption 2(b) implies that \(\widehat{Q}\) is a nonnegative kernel, while Assumption 2(c) leads to the inequality
\[ \widehat{Q}W(x,a,b)\leq \lambda W(x),\qquad (x,a,b)\in \mathbb{K}. \]
The results in the next proposition are proved in [21] using this contraction property.
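The contraction device can be checked numerically on a hypothetical two-state model: choosing \(s\) and \(\nu \) so that \(Q\) dominates \(s\nu \) componentwise, the kernel \(\widehat{Q}=Q-s\nu \) is nonnegative and contracts the weight \(W\).

```python
# Hypothetical two-state check of the device behind Assumption 2: if
# Q(y|x) >= s(x)*nu(y) and QW <= lam*W + nu(W)*s, then the nonnegative kernel
# Qhat(y|x) := Q(y|x) - s(x)*nu(y) satisfies QhatW <= lam*W.
Q = [[0.5, 0.5],
     [0.2, 0.8]]            # transition matrix Q(y|x)
nu = [0.2, 0.8]             # minorizing probability measure
s = [0.6, 0.6]              # s(x), chosen so that Q(y|x) >= s(x)*nu(y)
W = [1.0, 2.0]              # Lyapunov weight, W >= 1
lam = 0.42

Qhat = [[Q[x][y] - s[x] * nu[y] for y in range(2)] for x in range(2)]
assert all(Qhat[x][y] >= 0 for x in range(2) for y in range(2))  # Qhat nonnegative

QhatW = [sum(Qhat[x][y] * W[y] for y in range(2)) for x in range(2)]
assert all(QhatW[x] <= lam * W[x] + 1e-12 for x in range(2))     # QhatW <= lam*W
print(QhatW)
```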
Proposition 3.1
Suppose that Assumption 2 holds and let \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\) be an arbitrary stationary strategy pair. Then:
(a) the transition probability \(Q_{\phi ^{1},\phi ^{2}}\) is positive Harris recurrent; thus, it admits a unique invariant probability measure \(\mu _{\phi ^{1},\phi ^{2}}\) and \(\nu \) is an irreducibility measure;
(b) \(\mu _{\phi ^{1},\phi ^{2}}(W)\) is finite and
moreover,
(c) for any \(u\in B_{W}(\mathbf{X})\), \(\mu _{\phi ^{1},\phi ^{2}}(|u|)<\infty \) and
hence,
Remark 3.2
Thus, for each pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\), Assumptions 1 and 2 imply that the constant
\[ \rho _{\phi ^{1},\phi ^{2}}:=\frac{\mu _{\phi ^{1},\phi ^{2}}(C_{\phi ^{1},\phi ^{2}})}{\mu _{\phi ^{1},\phi ^{2}}(\tau _{\phi ^{1},\phi ^{2}})} \]
is well defined and finite, and also that the operator
\[ T_{\phi ^{1},\phi ^{2}}u(x):=C_{\phi ^{1},\phi ^{2}}(x)-\rho _{\phi ^{1},\phi ^{2}}\tau _{\phi ^{1},\phi ^{2}}(x)+\widehat{Q}_{\phi ^{1},\phi ^{2}}u(x),\qquad x\in \mathbf{X}, \]
is a contraction map from \(B_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). Since \((B_{W}(\mathbf{X}),||\cdot ||_{W})\) is a Banach space, it follows that there exists a unique function \(h_{\phi ^{1},\phi ^{2}}\in B_{W}(\mathbf{X)}\) such that
\[ h_{\phi ^{1},\phi ^{2}}=C_{\phi ^{1},\phi ^{2}}-\rho _{\phi ^{1},\phi ^{2}}\tau _{\phi ^{1},\phi ^{2}}+\widehat{Q}_{\phi ^{1},\phi ^{2}}h_{\phi ^{1},\phi ^{2}}. \]
Now an integration of both sides of the above equation with respect to the probability measure \(\mu _{\phi ^{1},\phi ^{2}}\) leads to the equality
\[ \nu (h_{\phi ^{1},\phi ^{2}})\mu _{\phi ^{1},\phi ^{2}}(s_{\phi ^{1},\phi ^{2}})=0, \]
which implies, by Proposition 3.1(b), that \(\nu (h_{\phi ^{1},\phi ^{2}})=0\). Hence, \(h_{\phi ^{1},\phi ^{2}}\) is the unique function in \(B_{W}(\mathbf{X})\) that satisfies the (semi-Markov) Poisson equation
\[ h_{\phi ^{1},\phi ^{2}}(x)=C_{\phi ^{1},\phi ^{2}}(x)-\rho _{\phi ^{1},\phi ^{2}}\tau _{\phi ^{1},\phi ^{2}}(x)+Q_{\phi ^{1},\phi ^{2}}h_{\phi ^{1},\phi ^{2}}(x) \]
and the condition \(\nu (h_{\phi ^{1},\phi ^{2}})=0\). Then, by iterating this equation, one can see that
\[ J(x,\phi ^{1},\phi ^{2})=\rho _{\phi ^{1},\phi ^{2}} \]
for all \(x\in \mathbf{X}\).
It is worth mentioning that reference [21] proves Proposition 3.1 using similar arguments to those displayed above to show the existence of solutions to the Poisson equation (11).
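Continuing the hypothetical two-state example used above, the Poisson equation can be solved by iterating the contraction \(u\mapsto C-\rho \tau +\widehat{Q}u\); at the fixed point, \(\nu (h)=0\) and the Poisson equation holds with the kernel \(Q\) itself. All numbers below are illustrative assumptions.

```python
# Hypothetical two-state example: solve the (semi-Markov) Poisson equation
# h = C - rho*tau + Qh with nu(h) = 0 by Banach fixed-point iteration of the
# contraction T u := C - rho*tau + Qhat u, where Qhat = Q - s*nu.
Q = [[0.5, 0.5], [0.2, 0.8]]
nu, s = [0.2, 0.8], [0.6, 0.6]
C, tau = [2.0, 5.0], [1.0, 2.0]
mu = [2.0 / 7.0, 5.0 / 7.0]                      # invariant distribution of Q
rho = sum(mu[x] * C[x] for x in range(2)) / sum(mu[x] * tau[x] for x in range(2))

Qhat = [[Q[x][y] - s[x] * nu[y] for y in range(2)] for x in range(2)]
h = [0.0, 0.0]
for _ in range(200):                             # contraction with modulus < 1
    h = [C[x] - rho * tau[x] + sum(Qhat[x][y] * h[y] for y in range(2))
         for x in range(2)]

nu_h = sum(nu[y] * h[y] for y in range(2))       # should vanish at the fixed point
poisson = [C[x] - rho * tau[x] + sum(Q[x][y] * h[y] for y in range(2))
           for x in range(2)]                    # right-hand side of the Poisson equation
print(rho, nu_h)
```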
Now, to pass from the Poisson equations to the Shapley equation, it is assumed that the game model satisfies the following compactness and continuity conditions.
Assumption 3
(a) \(C\) is lower semicontinuous on \(\mathbb{K}\);
(b) the mapping \(x\rightarrow A(x)\) is lower semicontinuous and compact-valued;
(c) the mapping \(x\rightarrow B(x)\) is upper semicontinuous and compact-valued;
(d) \(\tau \) is continuous;
(e) the state transition law \(Q\) is weakly continuous on \(\mathbb{K}\), that is, the mapping
\[ (x,a,b)\mapsto Qu(x,a,b):=\int_{\mathbf{X}}u(y)Q(dy|x,a,b) \]
is continuous for all \(u\in \mathcal{C}_{b}(\mathbf{X})\);
(f) \(W\) and \(QW\) are continuous functions.
Notice that Remark 3.2 shows that the average payoff criterion is well defined and finite-valued whenever the players use stationary strategies. The next theorem extends this assertion to all admissible strategies.
Theorem 3.3
Under Assumptions 1, 2 and 3, the performance criterion (9) is well defined and finite-valued.
The next theorem states the main results of the present work.
Theorem 3.4
If Assumptions 1, 2, 3 hold, then:
(a) there exist \(h^{\ast }\in L_{W}(\mathbf{X})\) and \(\rho ^{\ast }\in \mathbb{R}\) that satisfy the Shapley equation
\[ h^{\ast }(x)=\inf_{\gamma ^{1}\in \mathbb{A}(x)}\sup_{\gamma ^{2}\in \mathbb{B}(x)}\left[ C_{\gamma ^{1},\gamma ^{2}}(x)-\rho ^{\ast }\tau _{\gamma ^{1},\gamma ^{2}}(x)+(Qh^{\ast })_{\gamma ^{1},\gamma ^{2}}(x)\right] \]
for all \(x\in \mathbf{X}\);
(b) there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that
(c) for each \(\varepsilon >0\) there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that
(d) the constant \(\rho ^{\ast }\) is the value of the game and the stationary policy \(\phi _{\ast }^{1}\) is optimal for player 1; for each \(\varepsilon >0\), the stationary policy \(\phi _{\varepsilon }^{2}\) is \(\varepsilon ^{\prime }\)-optimal for player 2, where \(\varepsilon ^{\prime }=\nu (W)l\varepsilon /(1-\lambda )\) and \(l\) is a fixed constant given in Lemma 5.3 below;
(e) moreover,
The proof of Theorem 3.4 is given in Sect. 5.
Remark 3.5
Assumption 3(d) trivially holds for Markov games (see Remark 2.1). Moreover, as can be checked in the proof of Theorem 3.4, instead of the compactness property in Assumption 3(c), it suffices to assume the completeness of the sets \(B(x)\), \(x\in \mathbf{X}\). In fact, the compactness property is used only to prove the existence of a positive constant \(\gamma \) such that
for all initial state \(x\in \mathbf{X}\) and strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) (see Corollary 5.5 in Sect. 5). This latter inequality clearly holds for Markov games.
4 A Minimax Semi-Markov Inventory Problem
A controller seeks to minimize the cost of operating a single-item inventory system without backlog, for which the decision epochs form a nondecreasing stochastic process \(T_{n},n\in \mathbb{N}_{0}\). The demand process \(\{w_{n}\}\) is formed by independent nonnegative random variables, with \(w_{n}\) being the quantity of product demanded between the decision epochs \(T_{n}\) and \(T_{n+1}\). The controller knows that the expected demands lie in a bounded interval and that their second moments are bounded above, but she/he does not know the demand distributions themselves. To state these assumptions formally, for \(p\in \mathbb{P}([0,\infty ))\), let
Assumption 4
The demand distributions belong to the class
\[ \mathbf{B}:=\left\{ p\in \mathbb{P}([0,\infty )):z_{\ast }\leq \int_{0}^{\infty }w\,p(dw)\leq z^{\ast },\ \int_{0}^{\infty }w^{2}\,p(dw)\leq s^{\ast }\right\} , \]
where \(z^{\ast },z_{\ast }\) and \(s^{\ast }\) are constants such that \(z^{\ast }>z_{\ast }>0\) and \(s^{\ast }\geq 0\).
In what follows it is assumed that \(\mathbf{B}\) is metrized with any metric compatible with the topology of weak convergence of probability measures.
The inventory evolves as follows: at the nth decision epoch, which occurs at time \(T_{n}\), the controller observes the inventory level \(x_{n}=x\in \mathbf{X}:=\mathbb{R}_{+}\) and orders a product quantity \(a_{n}=a\in \mathbf{A}:=[0,\widehat{a}]\) to face the (nonnegative) product demand \(w_{n}=w\) accumulated between times \(T_{n}\) and \(T_{n+1}\); the constant \(\widehat{a}\) is positive and it is assumed that the replenishing quantity \(a_{n}\) is immediately supplied. Thus, the controller incurs the cost
\[ c(x,a,w):=c_{1}(1-\delta _{0}(a))+c_{2}a+c_{3}(x+a-w)^{+}+c_{4}(w-x-a)^{+}, \]
where \(\delta _{0}(a)\) is the Kronecker delta and the nonnegative constants \(c_{i},i=1,\ldots ,4\), stand for the cost of placing an order, the production cost, the holding cost and the penalty cost for unmet demand, respectively, per unit of product.
Then, the expected payoff (or one-step cost) is
\[ C(x,a,b):=E_{b}\,c(x,a,w)=\int_{0}^{\infty }c(x,a,w)b(dw) \]
for \((x,a,b)\in \mathbb{K}\), with \(A(x):=\mathbf{A}\) and \(B(x):=\mathbf{B}\) for all \(x\in \mathbf{X}\), and where \(E_{b}\) denotes the expectation with respect to the distribution \(b\in \mathbf{B}\). Notice that \(\mathbb{K}=\mathbf{X}\times \mathbf{A}\times \mathbf{B}\).
At time \(T_{n+1}\) the inventory level changes according to the recursive equation
\[ x_{n+1}=(x_{n}+a_{n}-w_{n})^{+},\qquad n\in \mathbb{N}_{0}, \]
where \(r^{+}:=\max (r,0)\) for \(r\in \mathbb{R}\), which leads to the stochastic kernel
\[ Q(B|x,a,b)=\int_{0}^{\infty }\mathbf{1}_{B}\left( (x+a-w)^{+}\right) b(dw),\qquad B\in \mathcal{B}(\mathbf{X}). \]
Since the distribution of the product demand is unknown, the controller wants to hedge against the worst possible scenario; thus, she/he approaches the problem as a game against “nature”, who chooses the distribution \(b_{n}\in \mathbf{B}\) for the product demand \(w_{n}\). Thus, the controller's goal is to find a minimax policy, that is, a policy \(\pi _{\ast }^{1}\in \Pi ^{1}\) that satisfies the condition
\[ \sup_{\pi ^{2}\in \Pi ^{2}}J(x,\pi _{\ast }^{1},\pi ^{2})=\inf_{\pi ^{1}\in \Pi ^{1}}\sup_{\pi ^{2}\in \Pi ^{2}}J(x,\pi ^{1},\pi ^{2}),\qquad x\in \mathbf{X}. \]
Notice that a minimax policy is nothing other than an optimal policy for the controller when the inventory system is seen as a game against nature and this game has a value.
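To make the controller's problem concrete, the following sketch simulates the inventory recursion \(x_{n+1}=(x_{n}+a_{n}-w_{n})^{+}\) under a hypothetical base-stock ordering rule and one fixed demand distribution chosen by nature; the rule and all numbers are illustrative assumptions, not the paper's optimal policy. In the spirit of Assumption 5 below, the mean demand (6) exceeds the maximal order \(\widehat{a}=5\).

```python
# Illustrative simulation of the no-backlog inventory recursion
# x_{n+1} = (x_n + a_n - w_n)^+ under a hypothetical base-stock rule and a
# fixed demand distribution chosen by "nature"; all numbers are made up.
import random

random.seed(1)
a_hat, target = 5.0, 6.0              # maximal order size and base-stock level

def order(x):
    """Base-stock rule: order up to `target`, capped at a_hat."""
    return min(a_hat, max(0.0, target - x))

def run(x, n):
    for _ in range(n):
        a = order(x)
        w = random.uniform(4.0, 8.0)  # nature's demand choice, mean 6 > a_hat
        x = max(x + a - w, 0.0)       # no backlog: unmet demand is lost
    return x

print(run(0.0, 1000))                 # inventory level stays in [0, target]
```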
Jaśkiewicz [10] considers a minimax Markov inventory problem, that is, a minimax inventory problem for which \(T_{n}=n\) for all \(n\in \mathbb{N}\); moreover, she assumes that the set \(\mathbf{B}\) of possible distributions for the demand is finite. Reference [16] also follows the minimax approach to study a class of semi-Markov inventory systems, but assuming that the holding time distribution depends on an unknown parameter and that the demand distribution is completely known.
The main result of this section, Theorem 4.2 given below, shows the existence of a deterministic stationary minimax policy for the inventory system (13)-(15) under Assumption 4 and Assumptions 5 and 6, which are given next. Assumption 5 is also related to the distributions of the product demands.
Assumption 5
The inequality \(z_{\ast }>\widehat{a}\) holds.
It is shown in Lemma 6.1, Sect. 6, that Assumptions 4 and 5 imply that there exists a constant \(r>0\) such that
\[ \lambda <1, \]
where
\[ \lambda :=\sup_{b\in \mathbf{B}}E_{b}\,e^{r(\widehat{a}-w)}. \]
This inequality plays a key role in proving that the inventory system satisfies Assumption 2. In fact, Lemma 6.2 shows that the constant \(\lambda \) in (17), the functions
\[ W(x):=e^{rx},\qquad s(x,a,b):=b([x+a,\infty )), \]
and the Dirac measure at zero \(\nu (\cdot )\) satisfy Assumption 2.
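The minorization can be checked by Monte Carlo for one hypothetical demand distribution: with \(\nu \) the Dirac measure at zero, the probability that the next inventory level is \(0\) equals \(P_{b}(w\geq x+a)\), a natural candidate for the function \(s(x,a,b)\). The state, order and demand law below are illustrative assumptions.

```python
# Sketch of the minorization in the inventory model: starting from inventory
# x with order a, the next level is 0 exactly when the demand satisfies
# w >= x + a, so Q({0}|x,a,b) = P_b(w >= x + a).  Numbers are hypothetical.
import random

random.seed(2)
x, a = 3.0, 2.0
demand = lambda: random.uniform(4.0, 8.0)    # nature's demand distribution b

n = 100_000
hits = sum(1 for _ in range(n) if max(x + a - demand(), 0.0) == 0.0)
exact = (8.0 - (x + a)) / (8.0 - 4.0)        # P(w >= x + a) for Uniform(4, 8)
print(hits / n, exact)                       # Monte Carlo estimate vs exact value
```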
The next condition concerns the holding time distribution. To be specific, it is assumed that this distribution has an exponential density whose rate parameter is \(\kappa ^{-1}\), where \(\kappa \) is a positive continuous function dominated by a multiple of the function \(W\); the condition is formally stated below. However, it should be noted that any other class of distributions can be considered as long as Assumption 1 holds (see Remark 2.2).
Assumption 6
The holding time distribution is given as
\[ F([0,t]|x,a,b)=1-e^{-t/\kappa (x,a,b)} \]
for all \(t\geq 0\) and \((x,a,b)\in \mathbb{K}\), where \(\kappa :\mathbb{K}\rightarrow \mathbb{(}0,\infty )\) is a continuous function such that \(\kappa (\cdot ,\cdot ,\cdot )\leq k_{1}W(\cdot )\) for some positive constant \(k_{1}\).
Thus, according to Remark 2.2, the semi-Markov kernel can be taken without loss of generality as
\[ q(B\times \lbrack 0,t]|x,a,b)=Q(B|x,a,b)\left( 1-e^{-t/\kappa (x,a,b)}\right) \]
for all \(t\geq 0,(x,a,b)\in \mathbb{K}\) and \(B\in \mathcal{B}(\mathbf{X})\).
Remark 4.1
Observe that Assumption 4 implies the inequality
\[ C(x,a,b)\leq M+c_{3}x,\qquad (x,a,b)\in \mathbb{K}, \]
where \(M:=c_{1}+(c_{2}+c_{3})\widehat{a}+c_{4}z^{\ast }\). On the other hand, Assumption 6 implies that
\[ \tau (x,a,b)=\kappa (x,a,b)\leq k_{1}W(x),\qquad (x,a,b)\in \mathbb{K}. \]
Thus, one can choose a constant \(k\geq k_{1}>0\) such that
\[ \max \{C(x,a,b),\tau (x,a,b)\}\leq kW(x) \]
for all \((x,a,b)\in \mathbb{K}\). Notice that this latter inequality shows that Assumption 1(b) holds.
The main result of this section is stated next.
Theorem 4.2
Suppose that Assumptions 4, 5 and 6 hold. Then there exist a function \(h^{\ast }\in L_{W}(\mathbf{X})\), a constant \(\rho ^{\ast }\) and a deterministic stationary policy \(f_{\ast }^{1}\in \mathbb{F}^{1}\) for player 1 such that the equalities
hold for all \(x\in \mathbf{X}\). Moreover, \(f_{\ast }^{1}\) is a minimax policy and
5 Proof of Theorems 3.3 and 3.4
The proof relies on a number of preliminary results, which are collected in several propositions and lemmas. Proposition 5.1 uses the concept of the lower semicontinuous envelope of a function, which is introduced below together with some related properties.
Let \((\mathcal{S},d)\) be a metric space. For each function \(u:\mathcal{S\rightarrow }\mathbb{R}\) define
\[ u^{e}(s):=\sup_{r>0}\inf_{s^{\prime }\in B_{r}(s)}u(s^{\prime }),\qquad s\in \mathcal{S}, \]
where \(B_{r}(s)\) is the ball with center \(s\in \mathcal{S}\) and radius \(r>0\). The function \(u^{e}\) is the largest lower semicontinuous function dominated by \(u\), that is: (i) \(u^{e}\) is lower semicontinuous; (ii) \(u\geq u^{e}\); (iii) if \(v\) is lower semicontinuous and \(u\geq v\), then \(u^{e}\geq v\). Thus, \(u^{e}\) is called the lower semicontinuous envelope of the function \(u\). Clearly, \(u\) is lower semicontinuous if and only if \(u=u^{e}\); moreover, if \(u\geq v\), then \(u^{e}\geq v^{e}\). Additionally, if \(w:\mathcal{S\rightarrow }[1,\infty )\) is a continuous function, then
\[ (wu)^{e}=w\,u^{e}. \]
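A discrete sketch (an approximation on a finite grid, not the paper's construction) illustrates the envelope: for a function that jumps upward at \(0\), taking the infimum over neighbouring grid points lowers the value exactly at the jump, producing a lower semicontinuous minorant.

```python
# Discrete sketch of the lower semicontinuous envelope on a grid:
# u^e(s) = sup_{r>0} inf_{|s'-s|<=r} u(s') is approximated here by the minimum
# of u over immediately neighbouring grid points.  The example function jumps
# up at 0, so the envelope lowers the value exactly at the jump point.
grid = [i / 10.0 for i in range(-20, 21)]          # grid over [-2, 2]
u = [0.0 if s < 0.0 else 1.0 for s in grid]        # jumps from 0 to 1 at s = 0

def lsc_envelope(vals):
    n = len(vals)
    return [min(vals[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]

u_e = lsc_envelope(u)
i0 = grid.index(0.0)
print(u[i0], u_e[i0])    # value at the jump before and after taking the envelope
```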
In what follows it is assumed that Assumptions 1, 2 and 3 hold.
Proposition 5.1
Let \(S:=s^{e}\). Then:
(a) \(Q(B|x,a,b)\geq S(x,a,b)\nu (B)\) for all \((x,a,b)\in \mathbb{K}\);
(b) \(QW(x,a,b)\leq \lambda W(x)+\nu (W)S(x,a,b)\) for all \((x,a,b)\in \mathbb{K}\);
(c) \(\mu _{\phi ^{1},\phi ^{2}}(S_{\phi ^{1},\phi ^{2}})\geq \theta >0\) for all pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\).
Proof
Property (a) follows directly from the definition of the lower semicontinuous envelope. To prove (b), note that \(\nu (W)s\geq QW-\lambda W\) and recall that \(QW\) and \(W\) are continuous functions; then
\[ \nu (W)S=\nu (W)s^{e}\geq (QW-\lambda W)^{e}=QW-\lambda W, \]
which proves (b). Now, note that part (b) yields, for each pair of stationary strategies \((\phi ^{1},\phi ^{2})\), the inequality
\[ Q_{\phi ^{1},\phi ^{2}}W(x)\leq \lambda W(x)+\nu (W)S_{\phi ^{1},\phi ^{2}}(x),\qquad x\in \mathbf{X}. \]
Thus, the inequality resulting from integrating both sides with respect to \(\mu _{\phi ^{1},\phi ^{2}}\) implies that
\[ \mu _{\phi ^{1},\phi ^{2}}(S_{\phi ^{1},\phi ^{2}})\geq \frac{(1-\lambda )\mu _{\phi ^{1},\phi ^{2}}(W)}{\nu (W)}\geq \frac{1-\lambda }{\nu (W)}=:\theta , \]
which proves part (c). □
Proposition 5.2
For all strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},x\in \mathbf{X}\) and \(u\in B_{W}(\mathbf{X)}\):
Proof
This result follows directly after noting that Assumption 1(b) and Assumption 2(c) imply that
□
Lemma 5.3
There exists a constant \(l>0\) such that
for all \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},k\in \mathbb{N}\).
Proof
Fix \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},k\in \mathbb{N}_{0}\). Now, observe that
Next consider the subsets
and notice that \(\mathbf{X}_{n}\uparrow \mathbf{X}\). Thus, for some \(N\in \mathbb{N}\), \(\nu (\mathbf{X}_{N})>0\) since \(\nu (\mathbf{X}_{n})\uparrow \nu (\mathbf{X})>0\). Then,
which proves the desired result with \(l:=\nu (\mathbf{X}_{N})/N\). □
Lemma 5.4
The following inequality holds for all \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\):
Proof
Let \(x\in \mathbf{X},(\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) be fixed but arbitrary. It follows from Assumption 2(c) that
Then,
The last inequality implies that
proving the desired inequality. □
The next corollary is a direct consequence of the two previous lemmas and Proposition 5.2.
Corollary 5.5
Suppose Assumptions 1 and 2 hold. Then, for all \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2},x\in \mathbf{X}\), the following inequality holds:
Therefore,
and
for all \(u\in B_{W}(\mathbf{X})\).
Proof of Theorem 3.3
Fix an arbitrary strategy pair \((\pi ^{1},\pi ^{2})\in \Pi ^{1}\times \Pi ^{2}\) and an arbitrary initial state \(x\in \mathbf{X}\). Assumption 1(b) and inequality (20) imply that
Then, from the above inequality and Corollary 5.5, it follows that
which is the desired result. □
Lemma 5.6
For every stationary strategy pair \((\phi ^{1},\phi ^{2})\in \Phi ^{1}\times \Phi ^{2}\):
Hence, the constants
are well defined and finite; notice that \(\rho ^{u}\geq \rho ^{l}\).
Proof
It follows from (10), Proposition 3.1, Corollary 5.5 and Assumption 1(b) that
which is the result to be proven. □
Now, to show the existence of solutions to the Shapley equation define the nonnegative kernel
for \((x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X})\), and consider the operators on \(B_{W}(\mathbf{X})\) given by
for \((x,a,b)\in \mathbb{K},B\in \mathcal{B}(\mathbf{X})\). Notice that \(\widetilde{Q}\geq 0\) and also that
Lemma 5.7
The operator \(T\) is a contraction operator from \(L_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). Thus, by the Banach fixed-point theorem, there exists a unique function \(h^{\ast }\in L_{W}(\mathbf{X})\) such that \(h^{\ast }=Th^{\ast }\). Moreover:
(a) \(Th^{\ast }(x)=\inf _{\gamma ^{1}\in \mathbb{A}(x)}\sup _{\gamma ^{2}\in \mathbb{B}(x)}(L^{e}h^{\ast })_{\gamma ^{1},\gamma ^{2}}(x)\) for all \(x\in \mathbf{X}\);
(b) there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that
(c) for each \(\varepsilon >0\) there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that
Proof
Let \(u\) be an arbitrary function in \(L_{W}(\mathbf{X})\). It follows from Assumption 1 that \(|Lu|\leq MW\), where \(M:=(k+|\rho ^{l}|k+\lambda ||u||_{W})\). Then, since \(W\) is continuous, \(|L^{e}u|\leq MW\), which in turn implies that \(|(L^{e}u)_{\gamma ^{1},\gamma ^{2}}|\leq MW\) for all measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\). On the other hand, \((L^{e}u)_{\gamma ^{1},\gamma ^{2}}\) is a lower semicontinuous function on \(\mathbf{X}\); hence, \((L^{e}u)_{\gamma ^{1},\gamma ^{2}}\in L_{W}(\mathbf{X})\) for all measures \(\gamma ^{1}\in \mathbb{A}(x),\gamma ^{2}\in \mathbb{B}(x)\). Now, combining the above facts with the minimax theorem [10, Lemma 3.3], it follows that \(Tu\in L_{W}(\mathbf{X})\) and also that
On the other hand, for \(u,v\in B_{W}(\mathbf{X})\), notice that
Hence, \(T\) is a contraction operator from \(L_{W}(\mathbf{X})\) into itself with modulus \(\lambda \). So, there exists a unique function \(h^{\ast }\in L_{W}(\mathbf{X})\) such that
The last two assertions follow from the minimax theorem [10, Lemma 3.3]. □
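The contraction step above can be illustrated with a toy computation. The following sketch is illustrative only: the map `T` below is a hypothetical one-dimensional stand-in, not the game operator of the paper. It iterates a contraction with modulus \(\lambda \) and exhibits the geometric convergence guaranteed by the Banach fixed-point theorem.

```python
# Illustrative sketch only: a one-dimensional contraction with modulus
# lam < 1, iterated as in the Banach fixed-point argument.  The map T
# is a toy stand-in, not the operator T of the paper.

def fixed_point(T, u0, lam, tol=1e-12, max_iter=10_000):
    """Iterate u_{k+1} = T(u_k); the a priori bound
    |u_k - u*| <= lam**k * |u_1 - u_0| / (1 - lam) yields
    geometric convergence for any contraction with modulus lam."""
    u = u0
    for _ in range(max_iter):
        u_next = T(u)
        if abs(u_next - u) <= tol * (1 - lam):
            return u_next
        u = u_next
    return u

lam = 0.5
T = lambda u: lam * u + 1.0      # contraction with modulus lam; fixed point u* = 2
u_star = fixed_point(T, 0.0, lam)
print(round(u_star, 6))          # -> 2.0
```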
Lemma 5.8
\(\nu (h^{\ast })=0\) and \(\rho ^{l}=\rho ^{u}\).
Proof
First it is shown that \(\nu (h^{\ast })\leq 0\) and that \(\rho ^{l}=\rho ^{u}\); after this, it is proved that \(\nu (h^{\ast })\geq 0\), thus completing the proof.
For each \(\varepsilon >0\), by Lemma 5.7(c), there exists \(\phi _{\varepsilon }^{2}\in \Phi ^{2}\) such that
Then, integrating with respect to the invariant probability measure \(\mu _{\phi ^{1},\phi _{\varepsilon }^{2}}\) in both sides of the above inequality, it follows after a rearrangement of terms that
Now suppose that \(\nu (h^{\ast })>0\) and take \(\varepsilon \in (0,\nu (h^{\ast })\theta )\), where \(\theta =(1-\lambda )/\nu (W)\) (see Proposition 3.1(b)). Then
which, taking infimum over \(\phi ^{1}\), yields
Therefore, \(\nu (h^{\ast })\leq 0\).
Recall that \(C\) and \(S\) are lower semicontinuous and \(\tau \) is continuous. Moreover, \(Qu\) is lower semicontinuous for each \(u\in L_{W}(\mathbf{X})\). Now, since \(\nu (h^{\ast })\leq 0\), the function \(Lh^{\ast }\) is lower semicontinuous too. Hence,
for all \((x,a,b)\in \mathbb{K}\). From Lemma 5.7(b), there exists \(\phi _{\ast }^{1}\in \Phi ^{1}\) such that
for all \(x\in \mathbf{X},\phi ^{2}\in \Phi ^{2}\). Integrating with respect to \(\mu _{\phi _{\ast }^{1},\phi ^{2}}\), it follows that
Thus,
Therefore, \(\rho ^{u}=\rho ^{l}\).
Next it is shown that \(\nu (h^{\ast })\geq 0\). Since \(\rho ^{u}=\rho ^{l}\), the inequality (22) implies that
thus,
This inequality and Proposition 5.1(c) imply that \(\nu (h^{\ast })\geq 0\). Therefore, \(\nu (h^{\ast })=0\). □
Proof of Theorem 3.4
Parts (a), (b) and (c) follow from Lemmas 5.7 and 5.8 with \(\rho ^{\ast }=\rho ^{l}=\rho ^{u}\). Now to prove part (d), observe that (b) implies that
for all \(x\in \mathbf{X},\pi ^{2}\in \Pi ^{2}\). In turn, by Corollary 5.5, this inequality implies that
Hence,
Similarly, part (c) implies that
for all \(x\in \mathbf{X},\pi ^{1}\in \Pi ^{1}\), which, together with Corollary 5.5, implies that
where \(l\) is the constant in Lemma 5.3. Thus,
for all \(x\in \mathbf{X}\). Hence, since this inequality holds for all positive \(\varepsilon \), it follows that
Therefore, \(\rho ^{\ast }\) is the value of the game and \(\phi _{\ast }^{1}\) is optimal for player 1.
Finally note that part (e) was already proved in Lemma 5.8. Thus, the proof of Theorem 3.4 is now complete. □
6 Proof of Theorem 4.2
The plan to prove Theorem 4.2 is to show that the inventory system satisfies all assumptions of Theorem 3.4, namely, Assumptions 1, 2 and 3. The next lemma is essential for the proof that Assumption 2 holds.
Lemma 6.1
Suppose that Assumptions 4 and 5 hold. Then, there exists \(r>0\) such that (17) holds, that is, \(\sup _{b\in \mathbf{B}}\Phi _{b}(r)<1\), where \(\Phi _{b}(\cdot )\) is as in (18).
Proof
Integrating by parts twice leads to the equality
this implies that
Let \(Z\) be a random variable with distribution \(b\in \mathbf{B}\). Then,
The last inequality implies that
Now consider the function
and its derivative
Since \(h^{\prime }\) is continuous and \(h^{\prime }(0)=z_{\ast }-\widehat{a}>0\), there exists \(\delta >0\) such that \(h^{\prime }>0\) on the interval \((0,\delta )\). For each \(r\in (0,\delta )\), by the mean value theorem, there exists \(t^{\ast }\in (0,r)\) such that
thus,
which implies that
This latter fact combined with (23) yields that \(\Phi _{b}(r)\leq \rho \) for all \(b\in \mathbf{B}\), which in turn implies the desired result. □
Lemma 6.2
Suppose that Assumptions 4 and 5 hold. Let \(\nu \) be the Dirac measure at 0. Then the constant \(\lambda \) in Lemma 6.1, the measure \(\nu \), and the functions \(W\) and \(s\) given in (19) satisfy Assumption 2.
Proof
First observe that Assumption 2(a) holds since \(\nu (W)=W(0)\). On the other hand,
for every measurable function \(v:\mathbf{X}\rightarrow \mathbb{R}\) for which these integrals exist. In particular, it holds that
which proves that Assumption 2(c) holds. Similarly, taking \(v=\mathbb{I}_{B}\), it follows that
which is Assumption 2(b).
Finally, to prove the inequality \(\nu (s_{\phi ^{1},\phi ^{2}})>0\) for every stationary strategy pair \((\phi ^{1},\phi ^{2})\), observe that
Next, note that \(b((\widehat{a},\infty ))>0\) because \(\mu _{b}>\widehat{a}\) for all \(b\in \mathbf{B}\), which implies that
Hence, \(\nu (s_{\phi ^{1},\phi ^{2}})>0\). □
Lemma 6.3
The payoff function \(C\) in (13) is lower semicontinuous.
Proof
Suppose that \((x_{n},a_{n},b_{n})\rightarrow (x,a,b)\) and define
where \(y_{n}:=x_{n}+a_{n}\), \(n\in \mathbb{N}\), and \(y:=x+a\); since \(y_{n}\rightarrow y\), there exists \(L>0\) such that \(y_{n}\leq L\) for all \(n\in \mathbb{N}\). These functions satisfy the following properties:
(i) \(\{h_{n}\}\) is asymptotically uniformly integrable with respect to the sequence of probability measures \(\{b_{n}\}\), which means that
$$ \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{0}^{ \infty }|h_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)=0, $$where \(B_{n}^{K}:=\{w\geq 0:|h_{n}(w)|\geq K\},n\in \mathbb{N},K>0\). In fact, note that
$$\begin{aligned} \int _{\mathbb{R}_{+}}|h_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw) & = \int _{\mathbb{R}_{+}}(w-y_{n})\mathbb{I}_{(y_{n},\infty )}(w) \mathbb{I}_{(y_{n}+K,\infty )}(w)b_{n}(dw) \\ & \leq \int _{\mathbb{R}_{+}}w\mathbb{I}_{[K,\infty )}(w)b_{n}(dw)-y_{n}b_{n}((L+K,\infty )). \end{aligned}$$Thus, taking limsup as \(n\) goes to \(\infty \), it results that
$$ \limsup _{n\rightarrow \infty }\int _{\mathbb{R}_{+}}|h_{n}(w)| \mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)\leq \int _{\mathbb{R}_{+}}w\mathbb{I}_{[K, \infty )}(w)b(dw)-yb((L+K,\infty )), $$from which the desired result follows.
(ii) the sequence \(\{h_{n}\}\) is equicontinuous; this property is verified directly.
(iii) \(\{h_{n}\}\) converges to \(h\) in \(b\)-measure; this follows because \(h_{n}\rightarrow h\) pointwise.
Hence, by [1, Corollary 5.2], it holds that
$$\begin{aligned} E_{b}(w-x-a)^{+} & =\int _{\mathbb{R}_{+}}(w-x-a)^{+}b(dw) \\ & =\lim _{n\rightarrow \infty }\int _{\mathbb{R}_{+}}(w-x_{n}-a_{n})^{+}b_{n}(dw) \\ & =\lim _{n\rightarrow \infty }E_{b_{n}}(w-x_{n}-a_{n})^{+}. \end{aligned}$$Therefore,
$$\begin{aligned} \liminf _{n\rightarrow \infty }C(x_{n},a_{n},b_{n}) & =\liminf _{n \rightarrow \infty }[c_{1}\mathbb{I}_{(0,\infty )}(a_{n})+c_{2}a_{n}+c_{3}(x_{n}+a_{n}) \\ & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +c_{4}E_{b_{n}}(w-x_{n}-a_{n})^{+}] \\ & \geq C(x,a,b), \end{aligned}$$which proves that the cost function \(C\) is lower semicontinuous.
□
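As a hedged numerical aside (not part of the proof), the convergence of the shortage term \(E_{b_{n}}(w-x_{n}-a_{n})^{+}\) established above can be checked in closed form for exponential demands, where \(E(Z-y)^{+}=e^{-\text{rate}\cdot y}/\text{rate}\). The particular rates and the sequence below are invented for the illustration.

```python
import math

# Hedged illustration only: for exponential demand Z ~ Exp(rate),
# E(Z - y)^+ = exp(-rate * y) / rate.  Taking rate_n -> rate and
# y_n -> y mimics the weak convergence b_n -> b in Lemma 6.3 and
# shows the expected-shortage term converging to its limit.

def expected_shortage(rate, y):
    """E(Z - y)^+ for Z ~ Exp(rate), y >= 0 (closed form)."""
    return math.exp(-rate * y) / rate

rate, y = 1.0, 0.5
limit = expected_shortage(rate, y)
approx = [expected_shortage(rate + 1.0 / n, y + 1.0 / n) for n in (10, 100, 1000)]
errors = [abs(a - limit) for a in approx]
print(all(errors[i] > errors[i + 1] for i in range(len(errors) - 1)))  # -> True
```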
Lemma 6.4
\(W\) and \(QW\) are continuous functions.
Proof
Clearly, \(W\) is continuous. Now, to prove the second statement proceed as in the proof of the above lemma; thus, consider a sequence \((x_{n},a_{n},b_{n})\in \mathbb{K},n\in \mathbb{N}\), that converges to \((x,a,b)\in \mathbb{K}\). Next, define
The following facts hold:
(i) \(\{\widehat{h}_{n}\}\) is asymptotically uniformly integrable with respect to the sequence of probability measures \(\{b_{n}\}\), which means that
$$ \lim _{K\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{ \mathbb{R}_{+}}|\widehat{h}_{n}(w)|\mathbb{I}_{B_{n}^{K}}(w)b_{n}(dw)=0, $$where \(B_{n}^{K}:=\{w\geq 0:|\widehat{h}_{n}(w)|\geq K\}\). This is true because the sequence \(\{\widehat{h}_{n}\}\) is uniformly bounded.
(ii) the sequence \(\{\widehat{h}_{n}\}\) is equicontinuous; to prove this, let \(L\) be a bound for the sequence \(\{x_{n}+a_{n}\}\) and observe that the sequence of functions \(g_{n}(w):=(x_{n}+a_{n}-w)^{+}\), \(n\in \mathbb{N}\), \(w\geq 0\), is equicontinuous. Since \(W\) is uniformly continuous on \([0,L]\), the sequence \(\{\widehat{h}_{n}\}\) is equicontinuous too.
(iii) \(\{\widehat{h}_{n}\}\) converges to \(\widehat{h}\) in \(b\)-measure; this follows from the pointwise convergence.
Hence, by [1, Corollary 5.2], it holds that
which proves the continuity of \(QW\). □
Lemma 6.5
The transition law \(Q\) is weakly continuous.
Proof
Let \((x_{n},a_{n},b_{n})\in \mathbb{K},n\in \mathbb{N}\), be a sequence that converges to \((x,a,b)\in \mathbb{K}\); fix an arbitrary bounded continuous function \(v:\mathbf{X\rightarrow }\mathbb{R}\) and an arbitrary real number \(\varepsilon >0\). The sequence \(b_{n},n\in \mathbb{N}\), is tight since it is weakly convergent; thus, there exists a constant \(K_{1}>0\) such that
where \(M>0\) is a bound for the function \(v\). On the other hand, since \(x_{n}+a_{n}\rightarrow x+a\), there exists \(r>0\) such that
Now, let \(K:=\max \{K_{1},K_{2}\}\) and observe that
Moreover, since \(v\) is uniformly continuous on \([0,K]\), there exists \(\delta >0\) such that
for all \(z_{1},z_{2}\in \lbrack 0,K]\) with \(|z_{1}-z_{2}|<\delta \).
Next consider the continuous bounded function
and take \(N\in \mathbb{N}\) such that
hold for all \(n\geq N\), and put
Then,
Thus, for all \(n\geq N\), it holds that
which yields the inequality
Hence, \(Q(\cdot |x_{n},a_{n},b_{n}),n\in \mathbb{N}\), converges weakly to \(Q(\cdot |x,a,b)\). □
Lemma 6.6
The set of probability distributions \(\mathbf{B}\) is sequentially compact.
Proof
Notice that the following inequalities
hold for all \(b\in \mathbf{B}\) and \(k>0\). Hence, \(\mathbf{B}\) is tight.
Now take a sequence \(\{b_{n}\}\subset \mathbf{B}\). By Prohorov's theorem, there exist \(b_{0}\in \mathbb{P}(\mathbf{X})\) and a subsequence \(\{b_{n_{k}}\}\) that converges weakly to \(b_{0}\). Then,
By (24), for each \(\varepsilon >0\) there exists \(k_{0}\in \mathbb{N}\) such that
Since the mapping \(s\mapsto s\mathbb{I}_{[0,k]}\) is upper semicontinuous and bounded from above, it follows for \(m>k_{0}\) that
Because \(\varepsilon \) can be chosen arbitrarily, the last inequality implies that \(\mu _{b_{0}}\geq z_{\ast }\).
Finally observe that
Therefore, \(b_{0}\in \mathbf{B}\), which proves that \(\mathbf{B}\) is sequentially compact. □
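The tightness step in the proof above rests on a Markov-type tail bound. As a hedged aside (not part of the proof), the sketch below checks numerically that such a bound, \(b([k,\infty ))\leq \mu _{b}/k\), is uniform over a family of nonnegative distributions with uniformly bounded means; the particular distributions, the bound `mean_bound` and the level `k` are made up for the illustration.

```python
import random

# Hedged illustration only: Markov's inequality P(Z >= k) <= E[Z] / k
# gives a tail bound uniform over any family of nonnegative demand
# distributions with uniformly bounded means -- the mechanism behind
# the tightness of B.  The family below is hypothetical.

random.seed(0)

def empirical_tail(samples, k):
    """Empirical estimate of P(Z >= k)."""
    return sum(1 for z in samples if z >= k) / len(samples)

mean_bound = 2.0   # hypothetical uniform bound on the means
k = 10.0
families = [
    [random.expovariate(1.0) for _ in range(100_000)],   # mean 1
    [random.uniform(0.0, 4.0) for _ in range(100_000)],  # mean 2
]
for samples in families:
    assert empirical_tail(samples, k) <= mean_bound / k
print("tails uniformly below", mean_bound / k)
```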
Proof of Theorem 4.2
As mentioned at the beginning of this section, the plan is to show that the semi-Markov inventory system satisfies Assumptions 1, 2 and 3. First note that Remark 4.1 and Lemma 6.2 prove that Assumptions 1 and 2 hold, respectively. Moreover, Lemma 6.3 proves the lower semicontinuity of the function \(C\), which is Assumption 3(a). On the other hand, Assumption 3(d) follows from Assumption 6 since \(\tau =\kappa \). Furthermore, Lemma 6.5 shows that \(Q\) is weakly continuous, which is Assumption 3(e), while Lemma 6.4 shows that Assumption 3(f) holds. Finally, note that Assumptions 3(b) and (c) trivially hold because \(A(x)=[0,\widehat{a}]\) and \(B(x)=\mathbf{B}\) for all \(x\in \mathbf{X}\) and, according to Lemma 6.6, \(\mathbf{B}\) is (sequentially) compact.
Next, consider a function \(v:\mathbb{K}\rightarrow \mathbb{R}\) and observe that
Thus, from Theorem 3.4, there exists a function \(h^{\ast }\) in \(L_{W}(\mathbf{X})\) such that
Now, notice that for each \(b\in \mathbf{B}\), the function
is lower semicontinuous. Then, the mapping
is also lower semicontinuous. Hence, there exists \(f_{\ast }^{1}\in \mathbb{F}^{1}\) such that
for all \(x\in \mathbf{X}\), which proves the first statement of Theorem 4.2. The other statements are proved following standard arguments (as those given in the proof of Theorem 3.4(d) and (e)). □
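To make the inf–sup structure behind the selector \(f_{\ast }^{1}\) concrete, the following hedged sketch computes, by grid search, a minimax order quantity for the one-stage cost \(c_{1}\mathbb{I}_{(0,\infty )}(a)+c_{2}a+c_{3}(x+a)+c_{4}E_{b}(w-x-a)^{+}\) of (13) against a small hypothetical family of exponential demand laws. The constants, the bound \(\widehat{a}\) and the family are invented for the demonstration, and the result is only a one-stage minimax, not the game-optimal strategy of Theorem 4.2.

```python
import math

# Hedged illustrative sketch only: brute-force minimax of the one-stage
# inventory cost  C(x,a,b) = c1*1{a>0} + c2*a + c3*(x+a) + c4*E_b(w-x-a)^+
# over a hypothetical family B of exponential demand laws, for which
# E(Z - y)^+ = exp(-rate*y)/rate in closed form.  All numbers are made up.

c1, c2, c3, c4 = 5.0, 1.0, 0.5, 4.0   # hypothetical cost coefficients
a_hat = 10.0                           # hypothetical order bound a-hat
rates = [0.5, 0.8, 1.2]                # hypothetical family B

def shortage(rate, y):
    return math.exp(-rate * y) / rate

def stage_cost(x, a, rate):
    return c1 * (a > 0) + c2 * a + c3 * (x + a) + c4 * shortage(rate, x + a)

def minimax_order(x, grid=200):
    """Grid-search inf over a in [0, a_hat] of sup over b in B."""
    actions = [a_hat * i / grid for i in range(grid + 1)]
    return min(actions, key=lambda a: max(stage_cost(x, a, r) for r in rates))

print(minimax_order(0.0))  # -> 0.0 (the fixed cost c1 makes ordering unprofitable)
```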
Notes
The authors thank the referee for bringing this paper to their attention.
References
Feinberg, E.A., Kasyanov, P.O., Liang, Y.: Fatou’s lemma in its classical form and Lebesgue’s convergence theorems for varying measures with applications to Markov decision processes. Theory Probab. Appl. 65, 270–291 (2020)
Gatsis, K., Ribeiro, A., Pappas, G.J.: Optimal power management in wireless control systems. IEEE Trans. Autom. Control 59, 1495–1510 (2014)
González-Trejo, J.I., Hernández-Lerma, O., Hoyos-Reyes, L.F.: Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41, 1626–1659 (2002)
Guo, X.P., Zhu, Q.: Average optimality for Markov decision processes in Borel spaces: a new condition and approach. J. Appl. Probab. 43, 318–334 (2006)
Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Springer, New York (1999)
Hernández-Lerma, O., Vega-Amaya, O.: Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality. Appl. Math. 25, 153–178 (1998)
Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G.: Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optim. 38(1), 79–93 (1999)
Jaśkiewicz, A.: Zero-sum semi-Markov games. SIAM J. Control Optim. 41, 723–739 (2002)
Jaśkiewicz, A.: A fixed point approach to solve the average cost optimality equation for semi-Markov decision processes with Feller transition probabilities. Commun. Stat., Theory Methods 36, 2559–2575 (2007)
Jaśkiewicz, A.: Zero-sum ergodic semi-Markov games with weakly continuous transition probabilities. J. Optim. Theory Appl. 141, 321–347 (2009)
Jaśkiewicz, A., Nowak, A.S.: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45, 773–789 (2006)
Jaśkiewicz, A., Nowak, A.S.: On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316, 495–509 (2006)
Jaśkiewicz, A., Nowak, A.S.: Optimality in Feller semi-Markov control processes. Oper. Res. Lett. 34, 713–718 (2006)
Jaśkiewicz, A., Nowak, A.S.: Robust Markov control processes. J. Math. Anal. Appl. 420, 1337–1353 (2014)
Jaśkiewicz, A., Nowak, A.S.: Zero-sum stochastic games. In: Başar, T., Zaccour, G. (eds.) Handbook of Dynamic Game Theory. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27335-8_8-1
Luque-Vásquez, F., Minjárez-Sosa, J.A., Rosas-Rosas, L.C.: Semi-Markov control models with partially known holding times distribution: discounted and average criteria. Acta Appl. Math. 114, 135–156 (2011)
Mesquita, A.R., Hespanha, J.P., Nair, G.N.: Redundant data transmission in control/estimation over lossy networks. Automatica 48, 1612–1620 (2012)
Nowak, A.S.: Measurable selection theorems for minimax stochastic optimization problems. SIAM J. Control Optim. 23, 466–476 (1985)
Ross, S.M.: Applied Probability Models with Optimization Applications. Dover, New York (1970)
Tanaka, K., Wakuta, K.: On semi-Markov games. Sci. Rep. Niigata Univ. Ser. A 13, 55–64 (1976)
Vega-Amaya, O.: The average cost optimality equation: a fixed point approach. Bol. Soc. Mat. Mex. 9, 185–195 (2003)
Vega-Amaya, O.: Zero-sum semi-Markov games: fixed-point solutions of the Shapley equation. SIAM J. Control Optim. 42, 1876–1894 (2003)
Vega-Amaya, O.: On the regularity property of semi-Markov processes with Borel state spaces. In: Hernández-Hernández, D., Minjárez-Sosa, A. (eds.) Optimization, Control, and Applications of Stochastic Systems, pp. 301–309. Springer, Berlin (2012)
Vega-Amaya, O.: Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: the fixed-point approach revisited. J. Math. Anal. Appl. 464, 152–163 (2018)
Work partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT-Mexico) under grant Ciencia Frontera 2019-87787.
Vega-Amaya, Ó., Luque-Vásquez, F. & Castro-Enríquez, M. Zero-Sum Average Cost Semi-Markov Games with Weakly Continuous Transition Probabilities and a Minimax Semi-Markov Inventory Problem. Acta Appl Math 177, 9 (2022). https://doi.org/10.1007/s10440-022-00470-5