Abstract
In this paper, we introduce a class of two-person finite discounted AR–AT (Additive Reward–Additive Transition) semi-Markov games (SMGs). We provide counterexamples to show that AR–AT and AR–AT–PT (Additive Reward–Additive Transition Probability and Time) SMGs do not satisfy the ordered field property. We also obtain results on AR–AT–AITT (Additive Reward–Additive Transition and Action Independent Transition Time) and AR–AIT–ATT (Additive Reward–Action Independent Transition and Additive Transition Time) games. For the zero-sum games in these two classes, we prove the ordered field property and the existence of pure stationary optimal strategies for the players. Moreover, such games can be formulated as a vertical linear complementarity problem (VLCP) and solved by Cottle-Dantzig's algorithm under a mild assumption. We illustrate that the nonzero-sum case of such games does not necessarily have pure stationary equilibria; however, there exists a stationary equilibrium that uses at most two pure actions in each state for each player.
1 Introduction
Stochastic (Markov) games (SGs) with additive rewards and additive transitions (AR–AT) have been widely studied in the literature (e.g., Himmelberg et al. 1976; Raghavan et al. 1985). Raghavan et al. (1985) proved that for zero-sum AR–AT stochastic games, the ordered field property holds and both players have pure optimal stationary strategies for the discounted as well as the undiscounted payoff criteria. However, for nonzero-sum games in this class, a pure stationary equilibrium may not exist. Raghavan et al. (1985) suggested a finite step method, which involves solving a finite number of Markov decision problems, to compute a pair of pure stationary optimal strategies and the value of the zero-sum AR–AT stochastic game. Mohan et al. (1999) formulated such games as a single vertical linear complementarity problem (VLCP) which can be solved by Cottle-Dantzig's algorithm under a mild assumption.
In this paper, we introduce a class of two-person finite AR–AT semi-Markov games (SMGs) under the discounted payoff criterion. In Sect. 2, we define two-person finite SMGs and supply relevant material on VLCP. In Sect. 3, counterexamples are given to show that zero-sum AR–AT and AR–AT–PT (Additive Reward–Additive Transition Probability and Time) games do not satisfy the ordered field property. We prove that AR–AT–AITT (Additive Reward–Additive Transition and Action Independent Transition Time) and AR–AIT–ATT (Additive Reward–Action Independent Transition and Additive Transition Time) SMGs satisfy the ordered field property and possess pure optimal stationary strategies. We show by examples that an equilibrium in pure stationary strategies may not exist in the nonzero-sum case of such games. However, the existence of a stationary equilibrium point which uses at most two pure actions for each player in each state is proved. In Sect. 4, we formulate the zero-sum AR–AT–AITT and AR–AIT–ATT games as VLCPs and show that Cottle-Dantzig's algorithm can process these problems under some mild assumptions.
2 Background and preliminaries
2.1 Two-person finite semi-Markov games
A finite two-person semi-Markov game is defined by a collection of objects \(<\mathrm {S},\{A(s):s\in \mathrm {S}\},\{B(s):s\in \mathrm {S}\},q,P,r_1,r_2>\) where \(\mathrm {S}=\{1,2,\ldots ,k\}\) is the finite nonempty state space and for each \(s\in \mathrm {S}\), \(A(s)\) and \(B(s)\) are the finite sets of admissible actions in state \(s\) for Player 1 (P1) and Player 2 (P2) respectively. Let \(|A(s)|=m_s\) and \(|B(s)|=n_s\) denote the cardinalities of the sets \(A(s)\) and \(B(s)\) respectively. The set of admissible triplets is defined by \(\fancyscript{K}=\{(s,i,j)|~s\in \mathrm {S},~i\in A(s),~j\in B(s)\}\). Let \(\mathbf {P}(D)\) be the family of probability distributions on a finite set \(D\). Then the map \(q:\fancyscript{K}\rightarrow \mathbf {P}(\mathrm {S})\) is called the transition function of the game. Therefore, for each \((s,i,j)\in \fancyscript{K}\), there is a transition probability vector \(q(s,i,j)=(q(1|s,i,j),q(2|s,i,j),\ldots ,q(k|s,i,j))\in [0,1]^k\) such that \(\sum _{s'=1}^kq(s'|s,i,j)=1\). Given \((s,i,j)\in \fancyscript{K}\) and \(s'\in \mathrm {S}\), let \(\tau _{ss'}^{ij}\) be the transition (sojourn) time random variable, which denotes the time for a transition from a state \(s\) to a state \(s'\) under a pair of actions \((i,j)\in A(s)\times B(s)\). For a finite SMG, we assume that each \(\tau _{ss'}^{ij}\) has a finite support \(\{1,2,\ldots ,T\}\). Then, the map \(P:\fancyscript{K}\times \mathrm {S}\rightarrow \mathbf {P}(\{1,2,\ldots ,T\})\) is called the conditional transition time distribution. Symbolically, for each \((s,i,j)\in \fancyscript{K}\) and \(s'\in \mathrm {S}\), there is a transition time distribution vector \(P_{ss'}^{ij}=(P_{ss'}^{ij}(1),\ldots ,P_{ss'}^{ij}(T))\in [0,1]^T\) such that \(\sum _{t=1}^T P_{ss'}^{ij}(t)=1\). Finally, \(r_1\), \(r_2: \fancyscript{K}\rightarrow \mathbf {R}\) are called the payoff functions and they represent the immediate (expected) rewards of P1 and P2 respectively.
If \(r_1(s,i,j)+r_2(s,i,j)=0\) for all \((s,i,j)\in \fancyscript{K}\), then we have a two-person zero-sum game. Otherwise, the game is nonzero-sum. For the zero-sum case, we define \(r=r_1=-r_2\) and consider P1 as the maximizer and P2 as the minimizer.
A finite SMG is played over the infinite future as follows: At the \(0\)th decision epoch, the game starts at a state \(s_0\in \mathrm {S}\) and the players simultaneously (and hence independently of each other) choose a pure action pair \((i_0,j_0)\in A(s_0)\times B(s_0)\). Then the following happens: P1 and P2 receive immediate rewards \(r_{1}(s_0,i_0,j_0)\) and \(r_{2}(s_0,i_0,j_0)\) respectively and the system moves to a new state \(s_1\) with probability \(q(s_1|s_0,i_0,j_0)\). The time until the transition occurs is determined by a discrete random variable \(\tau _{s_0s_1}^{i_0j_0}\) having the probability mass function \(P_{s_0s_1}^{i_0j_0}(\cdot )\). Once the transition to \(s_1\) occurs at the next decision epoch, the entire process is repeated over and over again.
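The play just described can be sketched in a few lines of Python. The two-state data below are hypothetical (not taken from the paper), and the action pair is held fixed in each state so that only the chance moves remain: the state transition and the sojourn time.

```python
import random

# Toy two-state SMG fragment (hypothetical data): with the action pair fixed,
# q[s] gives q(s'|s,i,j) and P[(s, s')] gives P_{ss'}(t) on support {1, 2}.
q = {0: [0.3, 0.7], 1: [1.0, 0.0]}
P = {(0, 0): [0.5, 0.5], (0, 1): [1.0, 0.0],
     (1, 0): [1.0, 0.0], (1, 1): [0.5, 0.5]}

def step(s, rng):
    """Sample the next state and the sojourn time for one decision epoch."""
    s_next = rng.choices([0, 1], weights=q[s])[0]
    tau = rng.choices([1, 2], weights=P[(s, s_next)])[0]
    return s_next, tau

rng = random.Random(0)
s, clock = 0, 0
for _ in range(5):      # five decision epochs; clock accumulates sojourn times
    s, tau = step(s, rng)
    clock += tau
```

Each epoch advances the clock by at least one time unit, which is the regularity property a finite SMG needs.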
Clearly, when all the transition times are identical, the game is just a finite stochastic game (Shapley 1953).
A behavioral strategy \(\pi _1\) for P1 is a sequence \(\{\pi _{1n}\}_{n=0}^{\infty }\), where \(\pi _{1n}\) is a probability distribution on \(A(s_n)\) depending on the history \(h_n=(s_0,i_0,j_0,\ldots ,s_{n-1},i_{n-1},j_{n-1},s_n)\) of the system up to the \(n\)-th decision epoch, where \((s_k,i_k,j_k)\in \fancyscript{K}\) are respectively the state and actions of the players at the \(k\)-th decision epoch (\(k=0,1,2,\ldots ,n\)).
A stationary strategy \(f_1\) for P1 consists of a \(k\)-tuple \(f_1=(f_1(1),f_1(2),\ldots ,f_1(k))\), where \(f_1(s)\in \mathbf {P}(A(s))\) for each \(s\in \mathrm {S}\). The probability of choosing an action \(i\) in state \(s\) under \(f_1\) will be denoted by \(f_1(s,i)\). A stationary strategy \(f_1\) is called pure if \(f_1(s,i)\in \{0,1\}\) for all \(s\in \mathrm {S}\) and \(i\in A(s)\).
For P2, a behavioral strategy \(\pi _2\) and a stationary strategy \(f_2\) are similarly defined. Let \(\varPi _1\), \(\varPi _2\) and \(\fancyscript{F}_1\), \(\fancyscript{F}_2\) be respectively the classes of all behavioral and stationary strategies for P1 and P2 in a semi-Markov game.
Let \((X_0,A_0,B_0,X_1,A_1,B_1,\ldots )\) be the coordinate sequence on the infinite product space \(\mathrm {S}\times (\{A(s):s\in \mathrm {S}\}\times \{B(s):s\in \mathrm {S}\}\times \mathrm {S})^\infty \). Let \(r(X_m,A_m,B_m)\) be the immediate reward on the \(m\)-th decision epoch \((m=0,1,2,\ldots )\), where \(X_m\), \(A_m\) and \(B_m\) are respectively the state and actions chosen in that epoch. Once the initial state \(s\) and a pair of strategies \((\pi _1,\pi _2)\) are specified, then so is the probability distribution of \(r(X_m,A_m,B_m)\) for every \(m\). The expectation of \(r\) is well defined and will be denoted by \(\mathbf {E}_{\pi _1,\pi _2}\bigl [\,r(X_m,A_m,B_m)\mid X_0=s\bigr ]\). Let \(\fancyscript{B}(\mathrm {S})\) be the set of all bounded functions on \(\mathrm {S}\).
Definition 1
(Discounted payoffs) For \(\beta \in (0,1)\) and \((\pi _1,\pi _2)\in \varPi _1\times \varPi _2\), the expected total \(\beta \)-discounted payoff to Player \(l\), \(l=1,2\), is defined by
where \(\tau _0=0\) and \(\tau _n\) is the time between the \((n-1)\)-th and \(n\)-th decision epochs. Clearly, \(\tau _{n+1}\) is a random variable depending on \(X_n,A_n,B_n\) and \(X_{n+1}\) for each \(n\) and \(A_n\in A(X_n)\) and \(B_n\in B(X_n)\) are pure actions of the players at the \(n\)-th decision epoch. For the zero-sum case, \(V_{\pi _1,\pi _2}^{\beta }(s)=V_{\pi _1,\pi _2}^{1\beta }(s)=-V_{\pi _1,\pi _2}^{2\beta }(s)\) for all \(s\in \mathrm {S}\).
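Written out in the notation of this section (a reconstruction from the surrounding definitions, with the convention \(\tau_0=0\)), the expected total \(\beta\)-discounted payoff reads:

```latex
V_{\pi_1,\pi_2}^{l\beta}(s)
  = \mathbf{E}_{\pi_1,\pi_2}\Bigl[\,\sum_{n=0}^{\infty}
      \beta^{\tau_0+\tau_1+\cdots+\tau_n}\,
      r_l(X_n,A_n,B_n) \,\Big|\, X_0=s \Bigr],
  \qquad l=1,2.
```

Since \(\beta\in(0,1)\), each \(\tau_n\ge 1\) for \(n\ge 1\) and the rewards are bounded, the series converges.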
Definition 2
(Value and optimal strategies) A zero-sum two-person \(\beta \)-discounted semi-Markov game is said to have a value vector \(v_{\beta }=\bigl [v_{\beta }(s)\bigr ]_{s\in \mathrm {S}}\) if
A pair \((\pi _1^*,\pi _2^*)\) is said to be an optimal strategy pair if for all \(s\in \mathrm {S}\),
To develop a standard SMG model, we require a regularity condition (which ensures that an infinite number of transitions do not occur in a finite time interval) and various boundedness and continuity assumptions (see Sect. 4 in Luque-Vasquez 2002). If we endow \(\fancyscript{K}\) with the discrete topology in a finite SMG, all these assumptions are trivially satisfied. Luque-Vasquez (2002) proved that a \(\beta \)-discounted semi-Markov game has a value and a pair of stationary optimal strategies for a countable state space, compact action spaces and continuous sojourn times. He obtained a Shapley equation for such games. For a finite semi-Markov game, we observe the following result:
Theorem 1
The value vector \(v_{\beta }\) of a zero-sum two-person finite \(\beta \)-discounted semi-Markov game satisfies the Shapley equation
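In the notation of Sect. 2.1, the Shapley equation can be written as follows (a reconstruction consistent with the surrounding definitions, where \(\operatorname{val}[\cdot]\) denotes the value of a matrix game):

```latex
v_{\beta}(s)
  = \operatorname{val}_{i\in A(s),\, j\in B(s)}
    \Bigl[\, r(s,i,j)
      + \sum_{s'\in \mathrm{S}} q(s'|s,i,j)
        \Bigl(\sum_{t=1}^{T}\beta^{t}P_{ss'}^{ij}(t)\Bigr)
        v_{\beta}(s') \,\Bigr],
  \qquad s\in \mathrm{S}. \tag{1}
```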
Further, \((f_1^*,f_2^*)\in \fancyscript{F}_1\times \fancyscript{F}_2\) is an optimal strategy pair if and only if \((f_1^*(s),f_2^*(s))\) is a pair of optimal strategies of the matrix game in (1) above for all \(s\in \mathrm {S}\).
For each \(u\in \fancyscript{B}(\mathrm {S})\), \((s,i,j)\in \fancyscript{K}\) and \((f_1,f_2)\in \fancyscript{F}_1\times \fancyscript{F}_2\), we define:
We call \(T_{\beta }\) the Shapley operator. Since the reward is bounded, \(T_{\beta }u\) and \( T_{\beta }(f_1,f_2)u\in \fancyscript{B}(\mathrm {S})\) for each \(u\in \fancyscript{B}(\mathrm {S})\). Note that Theorem 1 can be written as \(v_{\beta }(s)=T_{\beta }v_{\beta }(s)\) and \(T_{\beta }(f_1^*,f_2^*)v_{\beta }(s)=v_{\beta }(s)\) for all \(s\in \mathrm {S}\).
We need the following lemmas to prove Theorem 1:
Lemma 1
(a) For each \(u\in \fancyscript{B}(\mathrm {S})\), there exists a pair \((f_1^*,f_2^*)\in \fancyscript{F}_1\times \fancyscript{F}_2\) such that
(b) Both \(T_{\beta }\) and \(T_{\beta }(f_1,f_2)\) are contraction operators with modulus less than 1.
Proof
The proof follows on similar lines as Lemmas 5.1 and 5.2 in Luque-Vasquez (2002) and the required assumptions are trivially true for our finite SMGs. \(\square \)
Since both \(T_{\beta }\) and \(T_{\beta }(f_1,f_2)\) are contraction operators, by Banach’s Fixed Point Theorem there exist unique functions \(v^*\) and \(v_{f_1,f_2}^*\) in \(\fancyscript{B}(\mathrm {S})\) such that \(T_{\beta }v^*(s)=v^*(s)\) and \(T_{\beta }(f_1,f_2)\,v_{f_1,f_2}^*(s)=v_{f_1,f_2}^*(s)\) for all \(s\in \mathrm {S}\).
Lemma 2
For each \((f_1,f_2)\in \fancyscript{F}_1\times \fancyscript{F}_2\), \( V_{f_1,f_2}^{\beta }(\cdot )\) is the unique fixed point of \(T_{\beta }(f_1,f_2)\).
Proof
In what follows \({\mathbf {E}}_{f_1,f_2}(\cdot \mid X_0=s)\) is denoted by \({\mathbf {E}}\). Now, by definition
\(\square \)
Lemma 3
A pair of strategies \((f_1^*,f_2^*)\in \fancyscript{F}_1\times \fancyscript{F}_2\) is optimal if and only if its expected payoff satisfies the Shapley equation \(T_{\beta }V_{f_1^*,f_2^*}^\beta (s)=V_{f_1^*,f_2^*}^\beta (s)\) for all \(s\in \mathrm {S}\).
Proof
We use Lemmas 1 and 2 and the rest of the proof follows on similar lines as in Luque-Vasquez (2002). For details, see proof of Theorem 4.3, pp. 10–12. \(\square \)
Proof of Theorem 1
The result now follows from Lemmas 2 and 3. \(\square \)
Let \(p(s'|s,i,j)=q(s'|s,i,j)\sum \nolimits _{t=1}^T{\beta }^t P_{ss'}^{ij}(t)\) be the probability of a transition from state \(s\) to \(s'\) by the next decision epoch under the pair of actions \((i,j)\in A(s)\times B(s)\). Then the system stops in state \(s\) with probability \(p'(s,i,j)=1-\sum \nolimits _{s'\in \mathrm {S}}p(s'|s,i,j)\) for \((i,j)\in A(s)\times B(s)\). Since \(\beta \in (0,1)\), \(0<\sum \nolimits _{s'\in \mathrm {S}}p(s'|s,i,j)<1\) and hence \(p'(s,i,j)\in (0,1)\) for all \((s,i,j)\in \fancyscript{K}\), so the game ends with probability 1 after a finite number of steps. Thus a finite discounted SMG reduces to a stopping stochastic game (Shapley 1953), where the optimality equation becomes
Note that Shapley Eqs. (1) and (2) are identical. However, (1) is more convenient for studying a structured (e.g., AR–AT) \(\beta \)-discounted SMG.
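As a concrete check of this reduction, the following sketch computes \(p(s'|s,i,j)\) and the stopping mass \(p'(s,i,j)\) for a single hypothetical triplet \((s,i,j)\), with two successor states and sojourn times supported on \(\{1,2,3\}\):

```python
# Stop-probability reduction for one admissible triplet (s, i, j);
# the numbers are hypothetical, chosen only to illustrate the formula.
beta = 0.5
q_sij = [0.4, 0.6]               # q(s'|s,i,j) for s' = 1, 2
P_sij = [[1.0, 0.0, 0.0],        # P_{ss'}^{ij}(t), rows indexed by s',
         [0.0, 0.5, 0.5]]        # columns by t = 1, 2, 3

# p(s'|s,i,j) = q(s'|s,i,j) * sum_t beta^t P_{ss'}^{ij}(t)
p = [q_sij[sp] * sum(beta ** (t + 1) * P_sij[sp][t] for t in range(3))
     for sp in range(2)]
p_stop = 1.0 - sum(p)            # p'(s,i,j): the "stop" probability
```

Here \(p(1|s,i,j)=0.4\cdot 0.5=0.2\) and \(p(2|s,i,j)=0.6\cdot 0.1875=0.1125\), so \(p'(s,i,j)=0.6875\in(0,1)\), as the argument above guarantees.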
Remark 1
One can rewrite a finite semi-Markov game as a stochastic game on a larger state space by replacing each transition time with a sequence of stays in auxiliary states, with appropriate probabilities, in which the payoff is zero. However, the transformed SG obtained in this way need not be equivalent to the original SMG. See Schweitzer (1971) and Jianyong and Xiaobo (2004) (Example 2.1, pp. 342–343) for details.
Definition 3
(Nash equilibrium) A pair of strategies \((f_1^{*},f_2^{*})\in \fancyscript{F}_1\times \fancyscript{F}_2\) constitutes a stationary Nash equilibrium pair in the discounted SMG, if for all \(s\in \mathrm {S}\),
Suppose that for all \(s\in \mathrm {S}\), \(\bar{V}_{f_2}^{1\beta }(s)=\max \nolimits _{{f_1}\in \fancyscript{F}_1}V_{f_1,f_2}^{1\beta }(s) \, \hbox {and} \, \bar{V}_{f_1}^{2\beta }(s)=\max \nolimits _{{f_2}\in \fancyscript{F}_2}V_{f_1,f_2}^{2\beta }(s)\). If one of the players, say P2, chooses a fixed stationary strategy \(f_2\), then the semi-Markov game reduces to a discounted semi-Markov decision problem for P1. In this problem, \(\bar{V}_{f_2}^{1\beta }(s)\) is the maximal (optimal) \(\beta \)-discounted payoff for P1 in state \(s\) and there exists a \(\beta \)-discounted stationary maximal (optimal) policy \(f_1^*\) for P1. The existence of such an \(f_1^*\) (or \(f_2^*\) for P2) is ensured by the finiteness of the state and action spaces (Ross 1970). We now state the following useful result. For a proof we refer to Ross (1970).
Lemma 4
For \((f_1,f_2)\in \fancyscript{F}_1\times \fancyscript{F}_2\), \(\bigl (\bar{V}_{f_2}^{1\beta }(\cdot ),\bar{V}_{f_1}^{2\beta }(\cdot )\bigr )\) is the unique solution in \(\fancyscript{B}(\mathrm {S})\times \fancyscript{B}(\mathrm {S})\) of the following system of equations in \(\bigl (v^1(\cdot ),v^2(\cdot )\bigr ){:}\)
Furthermore, \((f_1^*,f_2^*)\in \fancyscript{F}_1\times \fancyscript{F}_2\) is a pair of optimal responses for the two players iff \(f_1^*(s)\) and \(f_2^*(s)\) maximize (3) and (4) respectively for each \(s\in \mathrm {S}\).
Theorem 2
There exists a Nash equilibrium pair \((f_1^*,f_2^*)\in \fancyscript{F}_1\times \fancyscript{F}_2\) in any finite two-person nonzero-sum \(\beta \)-discounted semi-Markov game.
Proof
Polowczuk (2000) proved that there exists a \(\beta \)-discounted stationary Nash equilibrium pair for any \(N\)-person nonzero-sum semi-Markov game with a countable state space, compact metric action spaces and continuous sojourn times. For the finite case, the result follows on similar lines using the optimality Eqs. (3) and (4). \(\square \)
Definition 4
(Semi-Markov games with ordered field property) A two-person finite semi-Markov game with rational inputs, that is, with rational payoffs, rational transition probabilities and transition time distributions, and a rational discount factor, is said to possess the (Archimedean) ordered field property if it has a pair of optimal/Nash equilibrium strategies whose coordinates are rational.
It follows that the value/Nash equilibrium payoffs of the game are rational too.
It is not required for the ordered field property that the game can be solved with stationary strategies. Thuijsman and Raghavan (1997) proved the ordered field property for (nonzero-sum) AR–AT stochastic games by allowing non-stationary strategies. Mondal and Sinha (2013, 2015) proved the ordered field property for a subclass of SeR-SIT games and for one-player-control finite semi-Markov games.
2.2 Vertical linear complementarity problem
The vertical linear complementarity problem (VLCP) is a generalization of the linear complementarity problem (LCP). Let \(A\) be a matrix of order \(m\times k\) with \(m\ge k\) and let \(q\) be an \(m\)-vector. Suppose that \(A\) and \(q\) are partitioned in the following form
where \(A^j\in \mathbf {R}^{m_j\times k}\) and \(q^j\in \mathbf {R}^{m_j}\), \(1\le j\le k\) with \(\sum _{j=1}^km_j=m\). We call \(A\) a vertical block matrix of type \((m_1,m_2,\ldots ,m_k)\). Now, the generalized LCP (also known as the vertical generalization of the LCP) is to find \(w\in R^m\) and \(z\in R^k\) such that
This is denoted by VLCP\((q,A)\). When \(m_j=1\) for each \(j\), this reduces to the standard LCP\((q,A)\). For a detailed study of the LCP, we refer to the book by Cottle et al. (1992). The VLCP arises in solving some classes of stochastic game problems (see Mohan et al. 1999, 2001). Cottle and Dantzig (1970) described a procedure for solving a VLCP, which is an extension of Lemke's algorithm (1965) for the LCP. Under the standard non-degeneracy assumption, the algorithm terminates either in a solution to the VLCP\((q,A)\) or in a secondary proper ray.
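A small feasibility-and-complementarity checker makes the definition concrete. This is only a verification routine under the block convention just described (it does not implement Cottle-Dantzig's pivot scheme); when every block has a single row it reduces to the ordinary LCP check.

```python
def is_vlcp_solution(q, A, blocks, w, z, tol=1e-9):
    """Check whether (w, z) solves VLCP(q, A) for a vertical block matrix A.

    blocks[j] lists the row indices of block j; complementarity demands
    that z[j] = 0 or some w-component within block j vanishes.
    """
    m, k = len(A), len(z)
    # feasibility: w = q + A z, w >= 0, z >= 0
    for r in range(m):
        if abs(w[r] - (q[r] + sum(A[r][c] * z[c] for c in range(k)))) > tol:
            return False
    if any(x < -tol for x in w) or any(x < -tol for x in z):
        return False
    # complementarity: for each block j, z_j = 0 or min_{r in block j} w_r = 0
    for j in range(k):
        if z[j] > tol and min(w[r] for r in blocks[j]) > tol:
            return False
    return True
```

For instance, for the standard LCP with \(q=(-1,-1)\) and \(A=\bigl(\begin{smallmatrix}2&1\\1&2\end{smallmatrix}\bigr)\) (every block a single row), the pair \(w=(0,0)\), \(z=(1/3,1/3)\) passes the check.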
Definition 5
\(A\) is said to be a vertical block \(E(d)\)-matrix (denoted by \(A\in VBE(d)\)) for some \(d > 0\) if VLCP\((d,A)\) has a unique solution \(w=d\), \(z=0\).
Definition 6
\(A\) is said to be a vertical block \(R_0\)-matrix (denoted by \(A\in VBR_0\)) if VLCP\((0,A)\) has a unique solution \(w=0\), \(z=0\).
If the vertical block matrix \(A\in VBE(d)\cap VBR_0\), then VLCP\((q,A)\) is processable by Cottle-Dantzig's algorithm (Mohan et al. 1996).
3 Main results
We first describe the different conditions used throughout this paper:
AR (Additive Rewards): The rewards can be written as the sum of two functions, one depending on P1 and the other on P2. So, for the zero-sum case,
and for the nonzero-sum case,
AT (Additive Transitions): The transition probabilities can be written as the sum of two functions, one depending on P1 and the other on P2, i.e.,
ATT (Additive Transition Times): The transition times are additive, i.e.,
AIT (Action Independent Transitions): Transition probabilities are independent of the actions of the players. That means
AITT (Action Independent Transition Times): Transition times are independent of the actions of the players. Thus,
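The AR/AT conditions can be tested mechanically: a table \(r(s,i,j)\) is additive iff \(r(s,i,j)+r(s,i_0,j_0)=r(s,i,j_0)+r(s,i_0,j)\) for fixed reference actions \(i_0,j_0\). A sketch with hypothetical data:

```python
# Check the AR condition r(s,i,j) = r1(s,i) + r2(s,j) on a toy reward table
# (hypothetical data): one state, two actions per player.
r1 = {(0, 0): 1.0, (0, 1): 3.0}    # P1's additive part r1(s, i)
r2 = {(0, 0): -2.0, (0, 1): 4.0}   # P2's additive part r2(s, j)
r = {(0, i, j): r1[(0, i)] + r2[(0, j)] for i in (0, 1) for j in (0, 1)}

def is_additive(r, states, actions1, actions2, tol=1e-9):
    """Return True iff r(s, ., .) decomposes as u(s, i) + v(s, j) for each s.

    Additivity holds iff the difference r(s,i,j) - r(s,i,j0) does not
    depend on i, for a fixed reference action j0 of P2.
    """
    for s in states:
        i0, j0 = actions1[s][0], actions2[s][0]
        for i in actions1[s]:
            for j in actions2[s]:
                if abs((r[(s, i, j)] - r[(s, i, j0)])
                       - (r[(s, i0, j)] - r[(s, i0, j0)])) > tol:
                    return False
    return True
```

The same routine checks AT by feeding it \(q(s'|s,i,j)\) for each fixed \(s'\), and ATT by feeding it \(P_{ss'}^{ij}(t)\) for each fixed \((s',t)\).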
Definition 7
(AR–AT games) A finite semi-Markov game with AR and AT conditions is called an AR–AT (Additive Rewards Additive Transitions) game.
Definition 8
(AR–AT–PT games) A finite semi-Markov game with AR, AT and ATT conditions is called an AR–AT–PT (Additive Rewards and Additive Transition Probabilities and Times) game.
Definition 9
(AR–AT–AITT games) A finite semi-Markov game is called an AR–AT–AITT (Additive Rewards Additive Transitions and Action Independent Transition Times) game if AR, AT and AITT conditions hold.
Definition 10
(AR–AIT–ATT games) A finite semi-Markov game is called an AR–AIT–ATT (Additive Rewards Action Independent Transitions and Additive Transition Times) game if AR, AIT and ATT conditions hold.
Fishery games (Filar and Vrieze 1997) were proposed as an application of AR–AT stochastic games. One can construct a petroleum game model (where the state space represents the petroleum reserves of a country and the players are different petroleum companies) for AR–AT–AITT semi-Markov games in a similar manner.
Zero-sum discounted AR–AT semi-Markov games do not possess the ordered field property as the following example shows:
Example 1
Let \(p^1=q^1=(1/2,0)\), \(p^2=q^2=(0,1/2)\) and \(a=2\), \(b=1\), \(c=1\), \(d=0\).
where a cell corresponds to an immediate reward \(r\) and a transition with probability \(q_1\) to state 1 and probability \(q_2\) to state 2 if this cell is chosen in the present state. Let P1 and P2 have pure strategies \(x_1\), \(x_2\) and \(y_1\), \(y_2\) respectively in state 1, \(x_3\) and \(y_3\) in state 2. Transition times of this game are defined by \( P_{11}^{x_1y_1}=P_{12}^{x_2y_2}=\{1(1)\}, ~ P_{11}^{x_1y_2}=P_{12}^{x_1y_2}=P_{11}^{x_2y_1}=P_{12}^{x_2y_1}=\{2(1/2),3(1/2)\}~ \hbox {and}~ P_{22}^{x_3y_3}=\{1(1)\},\) where the entries inside the round brackets are the probabilities of the respective transition times. Let \(\beta =1/2\). By Shapley Eq. (1) we get,
Solving the above matrix game by the method of Kaplansky (1945), we get \(v_1=\frac{64\sqrt{455}-182}{329}\), which is irrational. This implies that the ordered field property does not hold.
The question arises whether the zero-sum AR–AT–PT games (a natural extension of AR–AT stochastic games) possess the ordered field property. The answer is negative, as the following example shows:
Example 2
In Example 1, we modify the transition times to be additive; the rest of the data remains the same as before. Let
Choose \(\beta =1/2\). Now by Shapley equation (1) we get, \(v_2=6\),
Solving by the method of Kaplansky (1945), we get \(v_1=\frac{48\sqrt{382}-103}{217}\), which is irrational. Thus in this case too, the ordered field property does not hold.
We observe the following result:
Theorem 3
A zero-sum two-person \(\beta \)-discounted AR–AT–AITT semi-Markov game possesses the ordered field property and each player has an optimal stationary strategy that is pure.
A similar result holds for an AR–AIT–ATT semi-Markov game.
Proof
Consider the Shapley Eq. (1) of an AR–AT–AITT semi-Markov game:
The above equation can be decomposed in the following way:
Thus, when solving \(v_{\beta }(s)\), P1 needs to consider only \(G_{1s}(v_\beta )\) and P2 only needs to look at \(G_{2s}(v_\beta )\). From the above observation, it is easy to see that both players have optimal pure actions in each state \(s\in \mathrm {S}\). Therefore, the ordered field property holds.
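The decomposition can be turned directly into a value-iteration scheme: each stage game splits as \(\max_i G_{1s}+\min_j G_{2s}\), so no matrix-game solver is needed. Below is a sketch on hypothetical two-state data, with the action-independent discount factors \(D[s][s']=\sum_t \beta^t P_{ss'}(t)\) precomputed; all numbers are illustrative only.

```python
# Decomposed value iteration for a toy zero-sum AR-AT-AITT game
# (hypothetical data): two states, two actions per player in each state.
S = [0, 1]
r1 = [[1.0, 0.0], [0.5, 0.5]]        # r1[s][i], P1's additive reward part
r2 = [[0.0, -1.0], [0.0, 0.0]]       # r2[s][j], P2's additive reward part
q1 = [[[0.5, 0.0], [0.0, 0.5]],      # q1[s][i][s'], each row sums to 1/2
      [[0.25, 0.25], [0.5, 0.0]]]
q2 = [[[0.5, 0.0], [0.25, 0.25]],    # q2[s][j][s'], each row sums to 1/2
      [[0.0, 0.5], [0.25, 0.25]]]
D = [[0.5, 0.25], [0.375, 0.5]]      # D[s][s'] = sum_t beta^t P_{ss'}(t)

def G(r_part, q_part, v, s):
    # G_{ls}(v)(a) = r_l(s, a) + sum_{s'} q_l(s'|s, a) * D[s][s'] * v(s')
    return [r_part[s][a] + sum(q_part[s][a][sp] * D[s][sp] * v[sp] for sp in S)
            for a in (0, 1)]

v = [0.0, 0.0]
for _ in range(200):   # contraction iteration: v(s) = max_i G1 + min_j G2
    v = [max(G(r1, q1, v, s)) + min(G(r2, q2, v, s)) for s in S]
```

Since every entry of \(D\) is at most \(\beta=1/2\), the iteration is a contraction and converges to the fixed point of the decomposed Shapley equation; the maximizing \(i\) and minimizing \(j\) at the fixed point give the pure optimal stationary strategies of Theorem 3.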
For an AR–AIT–ATT game, the proof follows on similar lines. \(\square \)
We now ask: does there exist an equilibrium in pure stationary strategies for nonzero-sum AR–AT–AITT and AR–AIT–ATT games? Examples 3 and 4 below answer this question in the negative.
Example 3
(AR–AT–AITT game): Let \(p^1=(0,0,1/2)\), \(p^2=(1/4,1/4,0)\), \(q^1=(0,1/4, 1/4)\), \(q^2=(1/4,0,1/4); a=(1,1), b=(2,-4), c=(0,0), d=(-8,0).\)
The action independent transition times are defined by \( P_{11}=\{1(1/2),2(1/2)\}\), \(P_{12}=\{1(3/4),2(1/4)\}\), \(P_{13}=\{2(1/2),3(1/2)\}\), \(P_{22}=\{1(1/4),2(3/4)\}\), \(P_{33}=\{1(1)\}\). Let the players have pure strategies \(x_1\), \(x_2\) and \(y_1\), \(y_2\) respectively in state 1. Take \(\beta =1/2\). It is easy to verify that the Nash equilibrium payoffs corresponding to the absorbing states 2 and 3 are \((0,0)\) and \((4,4)\) respectively. Computing \(V_{x_iy_j}^{l\beta }(1)\) for \(l=1,2\), \(i=1,2\) and \(j=1,2\), we obtain the bimatrix game for state 1 as
For example, \(V_{x_1y_2}^{1\beta }(1)\) can be computed as the unique solution \(v\) of
The other \(V_{x_iy_j}^{l\beta }(1)\) can be verified similarly.
From the bimatrix game (5), we can show that there exists no equilibrium point in pure stationary strategies and that for this example, the unique Nash equilibrium point in stationary strategies is completely mixed for state 1.
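Checking for pure equilibria in a bimatrix game is a simple scan over cells. The matrices below are hypothetical stand-ins with the same qualitative feature as Example 3 (no pure equilibrium; the unique equilibrium is completely mixed):

```python
# Scan a bimatrix game for pure Nash equilibria.  A and B are hypothetical
# payoff matrices in a matching-pennies pattern, not the data of Example 3.
A = [[1, -1], [-1, 1]]     # P1's payoffs
B = [[-1, 1], [1, -1]]     # P2's payoffs

def pure_nash(A, B):
    """Return all cells (i, j) from which neither player gains by deviating."""
    m, n = len(A), len(A[0])
    return [(i, j) for i in range(m) for j in range(n)
            if A[i][j] >= max(A[k][j] for k in range(m))
            and B[i][j] >= max(B[i][l] for l in range(n))]
```

Here `pure_nash(A, B)` returns the empty list, while for a coordination game such as `A = B = [[2, 0], [0, 0]]` it finds the pure equilibrium cell `(0, 0)`.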
Example 4
(AR–AIT–ATT game): Let \(a=(0,0), b=(1,-3), c=(0,0), d=(-15,0)\); \(q(1)=(q(1|1),q(2|1))=(1/2,1/2), q(2)=(q(1|2),q(2|2))=(0,1)\).
Let the players have pure strategies \(x_1\), \(x_2\) and \(y_1\), \(y_2\) respectively in state 1, \(x_3\) and \(y_3\) respectively in state 2. Take \(\beta =1/2\). The additive transition times are defined by
The Nash equilibrium payoffs corresponding to the absorbing state 2 are \((2,2)\). The bimatrix game corresponding to state 1 can be computed as (similar to Example 3):
It is easy to show that there exists a unique Nash equilibrium point in stationary strategies, which is completely mixed in state 1. This implies that a pure stationary Nash equilibrium does not exist.
For \(x=(x_1,x_2,\ldots ,x_n)\in \mathbf {R}^n\), we define \(\hbox {car}(x)=\{k:x_k\ne 0\}\). Now for the discounted nonzero-sum AR–AT–AITT and AR–AIT–ATT semi-Markov games, we have the following remarkable result:
Theorem 4
A discounted nonzero-sum AR–AT–AITT semi-Markov game has a Nash equilibrium point \((\tilde{f}_1,\tilde{f}_2)\in \fancyscript{F}_1\times \fancyscript{F}_2\) such that
A similar result holds for an AR–AIT–ATT semi-Markov game.
Proof
We use (3) and (4) and the rest of the proof follows on similar lines as in Raghavan et al. (1985) (Theorem 3.1, p. 462) for AR–AT stochastic games. \(\square \)
4 Computing optimal pure strategies of the subclasses of AR–AT games
We show that the zero-sum discounted AR–AT–AITT and AR–AIT–ATT semi-Markov games can be formulated as a VLCP, which can be solved by a pivot scheme under mild assumptions. We follow an approach similar to that of Mohan et al. (1999).
4.1 Discounted zero-sum AR–AT–AITT semi-Markov games
4.1.1 VLCP formulation
The Shapley Eq. (1) gives us the following for state \(s\), \(s\in \mathrm {S}\):
In particular, suppose that an optimal pure action in state \(s\) is \(i_0\) for P1 and \(j_0\) for P2 (by Theorem 3). Then, for an AR–AT–AITT semi-Markov game, we have
where
We may assume without loss of generality that \(\eta _{\beta }(s)\), \(\xi _{\beta }(s)\) are all positive. Since there is at least one inequality in each of (6) and (7) for each \(s\in \mathrm {S}\) that holds as an equality, the following complementarity conditions will hold:
The inequalities (6) and (7) along with the complementarity conditions (8) and (9) lead to the VLCP \((q,A)\), where the matrix \(A\) is of the form
In the above VLCP, \({\bar{M}}_1=\bigl [q_1(s'|s,i)\sum _{t}{\beta }^tP_{ss'}(t)\bigr ]~\hbox {of order } (\sum m_s)\times k\) and \({\bar{M}}_2= \bigl [q_2(s'|s,j)\sum _{t}{\beta }^tP_{ss'}(t)\bigr ]~\hbox {of order } (\sum n_s)\times k\). For each \(s\in \mathrm {S}\), \({\bar{M}}_1(s)\) and \({\bar{M}}_2(s)\) denote the matrices of order \(m_s\times k\) and \(n_s\times k\) respectively. \(E\) and \(F\) are vertical block identity matrices, corresponding to P1 and P2 respectively, of the form
where \(e^s\), \(s\in \{1,2,\ldots ,k\}\) is a column vector of all \(1\)’s of appropriate order.
4.1.2 Convergence of Cottle-Dantzig algorithm for AR–AT–AITT games
We show that the vertical block matrix arising from a zero-sum discounted AR–AT–AITT game belongs to a processable class under a mild assumption.
Theorem 5
Consider the vertical block matrix A arising from the zero-sum AR–AT–AITT game. Then \(A\in VBE(l)\) where \(l\) is the vector each of whose entries is 1.
Proof
Let \(d=\left[ \begin{array}{l} d^1\\ d^2 \end{array} \right] \) where \(d^1> 0\) and \(d^2> 0\). We shall show by contradiction that VLCP\((d,A)\) has only the trivial solution \(w=d\), \(z=0\), when \(d=l\).
Let \(w=\left[ \begin{array}{l} w^1\\ w^2 \end{array} \right] \) and \(z=\left[ \begin{array}{l} \eta _{\beta }\\ \xi _{\beta } \end{array} \right] \) be a solution to VLCP\((d,A)\). Then \(\left[ \begin{array}{l} w^1\\ w^2 \end{array} \right] =\left[ \begin{array}{l} d^1\\ d^2 \end{array} \right] +A\left[ \begin{array}{l} \eta _{\beta }\\ \xi _{\beta } \end{array} \right] \).
Assume \(\left[ \begin{array}{l} \eta _{\beta }\\ \xi _{\beta } \end{array} \right] \ne \left[ \begin{array}{l} 0\\ 0 \end{array} \right] \). Let \(v_{\beta }(s)=\xi _{\beta }(s)+\eta _{\beta }(s)\) and \(v_{\beta }(s^*)=\max \limits _{s\in \mathrm {S}}v_{\beta }(s)>0\).
Case 1: Let \(\eta _{\beta }(s^*)>0\). Then, there exists an \(i\in A(s^*)\) such that
We also have from the feasibility condition
Since, \(d=l\) implies \(d^2(j)=d^1(i)\) for all \(i\) and \(j\), so we have a contradiction unless \(v_{\beta }(s^*)=0\) or \(v_{\beta }(s')=0\) for all \(s'\in \mathrm {S}\) or \(\xi _{\beta }(s')=\eta _{\beta }(s')=0\) for all \(s'\in \mathrm {S}\).
Case 2: Let \(\xi _{\beta }(s^*)>0\). Then by complementarity there exists a \(j\in B(s^*)\) such that \(\eta _{\beta }(s^*)=d^2(j)+\sum \nolimits _{s'\in \mathrm {S}}q_2(s'|s^*,j)v_{\beta }(s')\sum \nolimits _{t}{\beta }^tP_{s^*s'}(t)\). Since \(d^2(j)> 0\), it follows that \(\eta _{\beta }(s^*)>0\). Hence, the theorem follows. \(\square \)
We denote \(\sum \limits _{s'\in \mathrm {S}}q(s'|s,i,j)\sum \limits _{t}{\beta }^tP_{ss'}(t)={\mathbf {E}}(\beta ^{{\bar{\tau }}_s^{ij}})\), where \({\bar{\tau }}_s^{ij}\) is the transition time random variable when the game is AITT. We have the following theorem.
Theorem 6
Consider the vertical block matrix A arising from the zero-sum AR–AT–AITT game. Then \(A\in VBR_0\) if either the condition (a) or the set of conditions (b) stated below is satisfied:
(a) For each \(s\in \mathrm {S}\) and each \(j\in B(s)\), \(q_2(s|s,j)>0\).

(b) (i) For each \(s\in \mathrm {S}\), the matrix \({\bar{M}}_1(s)\) does not contain any zero column, and (ii) the matrix \({\bar{M}}_2(s)\) is a non-null matrix and \({\mathbf {E}}(\beta ^{{\bar{\tau }}_s^{ij}})\) is independent of \(i\) and \(j\).
Proof
We shall show by contradiction that VLCP\((0,A)\) has only the trivial solution \(w=0\), \(z=0\). Let \(w=\left[ \begin{array}{l} w^1\\ w^2 \end{array} \right] \) and \(z=\left[ \begin{array}{l} \eta _{\beta }\\ \xi _{\beta } \end{array} \right] \) be a solution to VLCP\((0,A)\). Suppose \(\left[ \begin{array}{l} \eta _{\beta }\\ \xi _{\beta } \end{array} \right] \ne \left[ \begin{array}{l} 0\\ 0 \end{array} \right] \). Let \(v_{\beta }(s)=\xi _{\beta }(s)+\eta _{\beta }(s)\) and \(v_{\beta }(s^*)=\max \limits _{s\in \mathrm {S}}v_{\beta }(s)>0\).
Case 1: Let \(\eta _{\beta }(s^*)>0\). Since \(d^2(j)=d^1(i)=0\) for all \(i\) and \(j\), we arrive at a contradiction in (13), unless \(\xi _{\beta }(s')=\eta _{\beta }(s')=0\) for all \(s'\in \mathrm {S}\).
Case 2: Next suppose \(\eta _{\beta }(s^*)=0\). This implies \(\xi _{\beta }(s^*)>0\). Therefore, by complementarity, there exists a \(j\in B(s^*)\) such that
Suppose now condition (a) holds. Then, \(q_2(s^*|s^*,j)>0\). Also, we have \(v_{\beta }(s^*)>0\) and \(\sum _{t}{\beta }^tP_{s^*s^*}(t)>0\). It follows that \(\eta _{\beta }(s^*)>0\). Hence Case 2 does not arise if condition (a) holds. Now suppose the set of conditions (b) holds. Since for each \(s\in \mathrm {S}\), the matrix \(\bar{M}_1(s)\) does not have a zero column, we have by the feasibility condition
where \(e^s\) denotes the vector of \(1\)'s of order \(m_s\). Then it follows that \(\xi _{\beta }(s)> 0\) for each \(s\in \mathrm {S}\). Using condition (b)(ii), it is easy to see (similar to Lemma 1 in Mohan et al. 1999) that for \(s=s^*\), \(q_2(s'|s^*,j)>0\) for some \(s'\in \mathrm {S}\) and thus \(\eta _{\beta }(s^*)>0\).
Hence again Case 2 does not arise. \(\square \)
4.2 Discounted zero-sum AR–AIT–ATT games
4.2.1 VLCP formulation
We use Shapley Eq. (1) and proceed along similar lines as in Sect. 4.1.1, obtaining the following VLCP\((q,A)\), where the matrix \(A\) is of the form
In the above VLCP, \(\tilde{M}_1=\bigl [q(s'|s)\sum _{t}{\beta }^t(P_1)_{ss'}^i(t)\bigr ]~~\hbox {of order } (\sum m_s)\times k\) and \(\tilde{M}_2=\bigl [q(s'|s)\sum _{t}{\beta }^t(P_2)_{ss'}^j(t)\bigr ]~\hbox {of order } (\sum n_s)\times k\). For each \(s\in \mathrm {S}\), \({\tilde{M}}_1(s)\) and \({\tilde{M}}_2(s)\) denote the matrices of order \(m_s\times k\) and \(n_s\times k\) respectively. E and F are vertical block identity matrices, corresponding to P1 and P2 respectively, of the form (10).
4.2.2 Convergence of Cottle-Dantzig algorithm for AR–AIT–ATT games
We write \(\sum \nolimits _{s'\in \mathrm {S}}q(s'|s)\sum \nolimits _{t}{\beta }^tP_{ss'}^{ij}(t)=\mathbf {E}(\beta ^{\tilde{\tau }_s^{ij}})\), where \({\tilde{\tau }_s^{ij}}\) is the transition time random variable when the game is AIT.
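Since the pair (sojourn time, successor state) has joint law \(q(s'|s)P_{ss'}(t)\) when the game is AIT, the displayed identity is definitional. The following sketch checks it numerically on assumed toy data (one fixed state and action pair), comparing the analytic sum against a Monte Carlo estimate of \(\mathbf {E}(\beta ^{\tilde{\tau }})\):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.8

# Assumed toy data for one fixed state s and action pair (i, j):
q_s = np.array([0.25, 0.75])   # AIT law q(s'|s) over two successor states
P = np.array([[0.5, 0.5],      # P_{ss'}(t) over the support t = 1, 2 (one row per s')
              [0.1, 0.9]])

# Left-hand side: sum_{s'} q(s'|s) sum_t beta^t P_{ss'}(t)
analytic = sum(q_s[s2] * sum(beta ** (t + 1) * P[s2, t] for t in range(2))
               for s2 in range(2))

# Right-hand side: E(beta^tau), estimated by sampling s' ~ q(.|s), then t ~ P_{ss'}(.)
n = 200_000
succ = rng.choice(2, size=n, p=q_s)
t = np.where(rng.random(n) < P[succ, 1], 2, 1)   # t = 2 w.p. P_{ss'}(2), else t = 1
mc = (beta ** t).mean()
print(abs(analytic - mc) < 1e-2)                 # the two quantities agree
```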
Following the same techniques as in Sect. 4.1.2, we have the following results:
Theorem 7
Consider the vertical block matrix A arising from the zero-sum AR–AIT–ATT game. Then \(A\in VBE(l)\) where \(l\) is the vector each of whose entries is 1.
Theorem 8
Consider the vertical block matrix A arising from the zero-sum AR–AIT–ATT game. Then \(A\in VBR_0\) if either the condition (a) or the set of conditions (b) stated below is satisfied:
(a) For each \(s\in \mathrm {S}\) and each \(j\in B(s)\), \(q(s|s)\sum \nolimits _{t}{\beta }^t(P_2)_{ss}^{j}(t)>0\).

(b) (i) For each \(s\in \mathrm {S}\), the matrix \(\tilde{M}_1(s)\) does not contain any zero column, and (ii) the matrix \(\tilde{M}_2(s)\) is a non-null matrix and \({\mathbf {E}}(\beta ^{\tilde{\tau }_s^{ij}})\) is independent of \(i\) and \(j\).
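Both conditions of Theorem 8 are mechanically checkable from the game data. The sketch below does so for assumed toy matrices (the block matrices and \(\mathbf {E}(\beta ^{\tilde{\tau }})\) tables are illustrative, not from the paper):

```python
import numpy as np

def check_condition_a(diag_vals):
    """Condition (a): every q(s|s) * sum_t beta^t (P_2)^j_{ss}(t) is positive.
    diag_vals[s] lists that quantity for each j in B(s)."""
    return all(v > 0 for vals in diag_vals for v in vals)

def check_condition_b(M1_blocks, M2_blocks, E_beta_tau):
    """Condition (b): (i) no M1(s) block has a zero column, and
    (ii) each M2(s) block is non-null and E(beta^tau) is action independent."""
    no_zero_col = all((np.abs(M1).sum(axis=0) > 0).all() for M1 in M1_blocks)
    non_null = all(np.abs(M2).sum() > 0 for M2 in M2_blocks)
    action_indep = all(np.ptp(E) < 1e-12 for E in E_beta_tau)  # constant over (i, j)
    return no_zero_col and non_null and action_indep

# Assumed toy data for two states:
M1_blocks = [np.array([[0.2, 0.1], [0.3, 0.4]]), np.array([[0.5, 0.2]])]
M2_blocks = [np.array([[0.1, 0.0]]), np.array([[0.0, 0.3], [0.2, 0.1]])]
E_beta_tau = [np.full((2, 1), 0.7), np.full((1, 2), 0.6)]  # per-state (i, j) tables
print(check_condition_b(M1_blocks, M2_blocks, E_beta_tau))  # True for this toy data
```

In practice one would build `M1_blocks` and `M2_blocks` from the game primitives as in Sect. 4.2.1 and run these checks before invoking the Cottle-Dantzig algorithm.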
5 Further areas of research
In this paper, we deal with pivotal algorithms for some subclasses of AR–AT semi-Markov games. It would be interesting to know how neural network algorithms (see Neogy et al. 2008) perform for this class of games. For the undiscounted case of such games, the existence of a value vector and of a Nash equilibrium pair is still an open problem.
References
Cottle RW, Dantzig GB (1970) A generalization of the linear complementarity problem. J Comb Theory 8:79–90
Cottle RW, Pang JS, Stone RE (1992) The linear complementarity problem. Academic Press, New York
Filar JA, Vrieze K (1997) Competitive Markov decision processes. Springer, New York
Himmelberg CJ, Parthasarathy T, Raghavan TES, van Vleck FS (1976) Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proc Am Math Soc 60:245–251
Jianyong L, Xiaobo Z (2004) On average reward semi-Markov decision processes with a general multichain structure. Math Oper Res 29(2):339–352
Kaplansky I (1945) A contribution to von Neumann's theory of games. Ann Math 46(2):474–479
Lemke CE (1965) Bimatrix equilibrium points and mathematical programming. Manag Sci 11:681–689
Luque-Vasquez F (2002) Zero-sum semi-Markov games in Borel spaces: discounted and average payoff. Bol Soc Mat Mexicana 8:227–241
Mohan SR, Neogy SK, Parthasarathy T, Sinha S (1999) Vertical linear complementarity and discounted zero-sum stochastic games with ARAT structure. Math Prog Ser A 86:637–648
Mohan SR, Neogy SK, Parthasarathy T (2001) Pivoting algorithms for some classes of stochastic games: a survey. Int Game Theory Rev 3(2 & 3):253–281
Mohan SR, Neogy SK, Sridhar R (1996) The generalized linear complementarity problem revisited. Math Program 74:197–218
Mondal P, Sinha S (2013) Ordered field property in a subclass of finite SeR-SIT semi-Markov games. Int Game Theory Rev 15(4):1340026-1–1340026-20. doi:10.1142/S0219198913400264
Mondal P, Sinha S (2015) Ordered field property for semi-Markov games when one player controls transition probabilities and transition times. Int Game Theory Rev. 17(1):1540022-1–1540022-26. doi:10.1142/S0219198915400228
Nash JF (1951) Non-cooperative games. Ann Math 54:286–295
Neogy SK, Das AK, Das P (2008) Complementarity problem involving a vertical block matrix and its solution using a neural network model. In: Mathematical programming and game theory for decision making. ISI Platinum Jubilee Series on Statistical Science and Interdisciplinary Research, vol 1, pp 113–130
Polowczuk W (2000) Nonzero-sum semi-Markov games with countable state spaces. Appl Math 27(4):395–402
Raghavan TES, Tijs SH, Vrieze OJ (1985) On stochastic games with additive reward and transition structure. J Optim Theory Appl 47:375–392
Ross SM (1970) Applied probability models with optimization applications. Holden-Day, San Francisco
Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Schweitzer PJ (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J Math Anal Appl 34(3):495–501
Thuijsman F, Raghavan TES (1997) Perfect information stochastic games and related classes. Int J Game Theory 26:403–408
Acknowledgments
The authors wish to thank the unknown referees who have patiently gone through this paper and whose suggestions have improved its presentation and readability considerably. This research was supported by a DST project.
Mondal, P., Sinha, S., Neogy, S.K. et al. On discounted AR–AT semi-Markov games and its complementarity formulations. Int J Game Theory 45, 567–583 (2016). https://doi.org/10.1007/s00182-015-0470-1