1 Introduction

Shapley (1953) introduced the concept of two-person zero-sum discounted stochastic games of infinite horizon with finite state space and finite action space, a payoff matrix at each state, a discount factor between 0 and 1, and probabilities of transitions between the states for every pair of actions of the players. Without loss of generality, the row chooser can be considered as the maximizer and the column chooser as the minimizer. Given a starting state, each player simultaneously selects strategies (pure actions or probability distributions over their respective sets of actions) that result in an immediate (expected) payoff to the row player from the column player. The game then transitions to another state depending on the transition probabilities and the strategies of the players. As the game progresses, the payoffs are discounted by the given discount factor. One player maximizes the expected discounted payoffs accumulated over the infinite horizon, and the other minimizes the same.

In this paper, we address three different aspects of zero-sum and nonzero-sum stochastic games: (1) existence of stationary optimal/equilibrium strategies for discounted and undiscounted stochastic games, (2) symmetric equilibria, and (3) completely mixed games. The following paragraphs give a brief motivation for the questions we wish to address on each of these topics.

Shapley showed that every two-person zero-sum finite discounted stochastic game has a unique optimum expected payoff (called the value of the game) that the maximizer obtains from the minimizer, and that the players have stationary optimal strategies. Stationary strategies are those which depend only on the current state of the game and not on how the state was reached. Fink (1964) and Takahashi (1964) extended this concept of stochastic games to n players with countably many states, while Rieder (1979) extended it to games with countably many players. Maitra and Parthasarathy (1970) proved the existence of equilibrium for stochastic games with infinite action space and uncountable state space. Gillette (1957) introduced the concept of undiscounted (or limiting average) payoffs in stochastic games. Gillette (1957), and Blackwell and Ferguson (1968) gave an example of an undiscounted stochastic game (“The Big Match”) where one of the players does not have stationary optimal strategies. Mertens and Neyman (1980, 1981) showed that every two-person zero-sum finite undiscounted stochastic game has a value, though stationary optimal strategies may not exist. The existence of stationary optimal or equilibrium strategies was proved for some classes of stochastic games, including single-player controlled games (Parthasarathy and Raghavan 1981; Filar 1984, 1985), perfect information stochastic games (Shapley 1953), switching control stochastic games (Filar 1981), additive reward–additive transition (ARAT) games (Himmelberg et al. 1976; Parthasarathy 1982; Raghavan et al. 1986), separable reward–state independent transition (SER-SIT) games (Parthasarathy et al. 1984), and state independent transition (SIT) games (Parthasarathy and Sinha 1989; Nowak 2003). We define some of these classes in Sect. 2. In Subsect. 3.2, we identify certain classes of undiscounted stochastic games that have stationary optima.

Nash (1951) showed that finite symmetric games have symmetric equilibria. Symmetric equilibria are those where the players play the same probability distribution over their respective action sets. Symmetric games are typically used to describe single-population games in evolutionary game theory, where the players in the population have the same set of strategies (Hofbauer and Sigmund 2003). In this context, the only relevant equilibrium is the symmetric Nash equilibrium which is an evolutionarily stable strategy (Maynard Smith and Price 1973). This serves as our motivation to first look at symmetric equilibria in bimatrix games. Gale (1960) showed the existence of a symmetric optimal strategy in finite zero-sum games with skew symmetric payoff matrices. Symmetric equilibria have been shown to exist in discontinuous symmetric games as well (Dasgupta and Maskin 1986; Reny 1999). Sujatha et al. (2014) showed the existence of a symmetric optimal strategy for discounted stochastic games with skew symmetric payoff matrices. In Subsect. 3.3, we provide some sufficient conditions for stochastic games to have symmetric optima or equilibria.

Completely mixed matrix games are games where all optimal strategies are completely mixed. For a two-person zero-sum game to be completely mixed, Kaplansky (1945) showed that the payoff matrix must be a square matrix and each player must have a unique optimal strategy. He also provided the necessary and sufficient condition for the game to be completely mixed (and specifically for a symmetric game with payoff matrix of order \(5 \times 5\)), and later extended the same to odd-ordered skew symmetric payoff matrices (Kaplansky 1995). Raghavan (1970) extended the above result to nonzero-sum bimatrix games. Oviedo (1996) further extended the result to show the conditions under which the set of all equilibrium strategies is completely mixed in a bimatrix game. Using the results of Oviedo, Sujatha et al. (2014) showed that not all two-person zero-sum games with skew symmetric payoff matrices are completely mixed. They also showed necessary and sufficient conditions for a bimatrix game with odd ordered skew symmetric payoff matrices to be completely mixed. It was also shown that bimatrix games with skew symmetric payoff matrices of even order are never completely mixed. We prove some necessary and some sufficient conditions for classes of discounted as well as undiscounted stochastic games, and discuss tightness of these conditions in Subsect. 3.1.

In Sect. 2, we provide some definitions. In Sect. 3, we discuss our results, restricting attention to two-person finite stochastic games. We provide necessary conditions for certain classes of stochastic games to have completely mixed optimal strategies. We also identify classes of undiscounted zero-sum stochastic games with stationary optima. We then look at symmetric optima and symmetric equilibria; in particular, when all the payoff matrices are skew symmetric, we provide conditions for optimal strategies to be symmetric. Finally, for nonzero-sum discounted stochastic games, we show conditions under which a symmetric equilibrium exists. Our proof follows along the lines of Blackwell and Ferguson (1968). We also give examples to indicate why some of these conditions are required.

2 Definitions

Definition 1

Bimatrix Game and Matrix Game: Let \(A = (a_{ij})\) and \(B = (b_{ij})\) be two \(m \times n\) matrices. The bimatrix game (A, B) is a two-person game in normal form, where each player chooses their strategy independently; in short, a bimatrix game is a two-person nonzero-sum game. Let \(p_i\) and \(q_j\) be the probabilities that player-1 (the row player) chooses the i-th row and player-2 (the column player) chooses the j-th column respectively. Then, the expected payoff to player-1 is given by \(p^tAq = \sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} p_{i}a_{ij}q_{j}\). Similarly, the expected payoff to player-2 is given by \(p^tBq = \sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} p_{i}b_{ij}q_{j}\). This is a one-shot game: the players choose their strategies, obtain their respective expected payoffs, and the game ends.

If \(B = -A\), the game is a zero-sum game, also called a matrix game. We refer to the game as the matrix game A, or as the matrix game with payoff matrix A.
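As a small numerical illustration of the expected payoffs \(p^tAq\) and \(p^tBq\) (the payoff matrix and strategies below are hypothetical, chosen only for the example):

```python
import numpy as np

# Hypothetical 2x2 payoff matrix for the row player.
A = np.array([[3.0, 0.0], [1.0, 2.0]])
B = -A  # zero-sum case: the bimatrix game (A, -A) is the matrix game A

p = np.array([0.5, 0.5])    # row player's mixed strategy
q = np.array([0.25, 0.75])  # column player's mixed strategy

payoff_1 = p @ A @ q  # p^t A q
payoff_2 = p @ B @ q  # p^t B q; in the zero-sum case this is -(p^t A q)
```

In the zero-sum case the two expected payoffs sum to zero, which is why a single matrix A suffices to describe the game.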

Definition 2

Optimum (Minmax Value) and Nash Equilibrium: Let A be a matrix game where the row player (player-1) is the maximizer and the column player (player-2) is the minimizer. von Neumann (1928) showed that there exists a pair of strategies \( (x^o, y^o)\) of the players which is optimal for both the players, that is

$$\begin{aligned} x^t A y^o \le {x^o}^t A y^o \le {x^o}^t A y , \quad \text{for all strategies } x \text{ of player-1 and } y \text{ of player-2}. \end{aligned}$$

\( (x^o, y^o)\) is an optimal strategy pair. \({x^o}^t A y^o\) is a constant across all optimal strategy pairs, and this constant is called the optimum or minmax value or, just, the value. We denote the value of A by val(A).
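The value and a maximin strategy can be computed with the standard linear program for matrix games. The sketch below (function name ours, using `scipy.optimize.linprog`) maximizes \(v\) subject to \(\sum _i p_i a_{ij} \ge v\) for every column j:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and a maximin strategy of the matrix game A (row player
    maximizes), via the LP:
        maximize v  s.t.  sum_i p_i a_ij >= v for all j,
                          p >= 0, sum_i p_i = 1."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # maximize v <=> minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - sum_i p_i a_ij <= 0
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                          # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
v, p = matrix_game_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

The dual of this LP produces the column player's optimal strategy, which is how the minimax theorem is reflected in LP duality.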

Given a bimatrix game (A, B) where both players are maximizers, \( (x^o, y^o)\) is a Nash equilibrium strategy pair (Nash 1951) if

$$\begin{aligned} x^t A y^o \le {x^o}^t A y^o , \quad \text{for all strategies } x \text{ of player-1, and} \end{aligned}$$
$$\begin{aligned} {x^o}^t B y \le {x^o}^t B y^o , \quad \text{for all strategies } y \text{ of player-2}. \end{aligned}$$

\({x^o}^t A y^o\) and \({x^o}^t B y^o\) are Nash equilibrium payoffs of player-1 and player-2 respectively, corresponding to \( (x^o, y^o)\) and may not be unique across Nash equilibrium strategies.

Definition 3

Two-Person Stochastic Game: A two-person nonzero-sum stochastic game denoted by \(\varGamma = (S, A_1, A_2, r_1, r_2, q)\) consists of

  1. Two players, player-1 and player-2.

  2. A non-empty Borel set S of states, which may be finite, countable, or uncountable. (If the state space is finite, we write \(S = \{ s_1, s_2, \ldots , s_N \}\) and the game is called a finite stochastic game.)

  3. For each state \(s \in S\), finite, non-empty sets of actions available to player-k (\(k = 1, 2\)), denoted by \(A_k (s) = \{1, 2, \ldots , m_k (s)\}\). Without loss of generality, we may assume \(A_k (s) = A_k\) (and hence \(m_k (s) = m_k\)) for all \(s \in S\).

  4. Let the game be in state s and let player-1 and player-2 choose actions \(i \in A_1\) and \(j \in A_2\) respectively. Then the immediate rewards to the players are given by \(r_k (s, i, j)\), \(k = 1, 2\). The payoff matrix in state s for player-1 is denoted by \(R_1 (s)\) and that for player-2 by \(R_2 (s)\).

  5. Let player-1 and player-2 choose actions \(i \in A_1\) and \(j \in A_2\) respectively in state \(s \in S\). Then the probability of transition from state s to state \(s'\) is given by \(q (s' | s, i, j)\). The transition probability matrix is denoted by Q(i, j) for each action pair (i, j); in a finite stochastic game, this is an \(N \times N\) matrix.

For zero-sum stochastic games, \(r_1 = - r_2\) (\(= r\), say). We denote the game as \( (S, A_1, A_2, r, q)\).

Definition 4

Stationary Strategy: Let S be the state space and \(P_{A_1}\) be the set of probability distributions on player-1’s action set \(A_1\). A stationary strategy for player-1 is a Borel measurable mapping \(f: S \rightarrow P_{A_1}\) that is independent of the history that led to the state \(s \in S\). Similarly, we define a stationary strategy for player-2 as a Borel measurable mapping \(g: S \rightarrow P_{A_2}\) that is independent of the history that led to the state \(s \in S\). We denote the set of stationary strategies by \(P_{A_1}^S\) and \(P_{A_2}^S\) respectively.

Definition 5

\(\beta \)-Discounted Payoffs: Consider a two-person discounted nonzero-sum stochastic game \(\varGamma _\beta = (S, A_1, A_2, r_1, r_2, q, \beta )\). Given the initial state \(s_0\), a pair of stationary strategies (f, g) for the players, and a discount factor \(\beta \in (0, 1)\), the \(\beta \)-discounted payoff to player-k (\(k = 1, 2\)) is as follows:

$$\begin{aligned} I_{\beta }^{ (k)} (f, g) ( s_{0} ) = \sum \limits _{t=0}^{\infty } \beta ^{t} r^{ (t)}_{k} (s_{0}, f, g) \end{aligned}$$

Here, \(r^{ (t)}_{k} (s_{0}, f, g)\) is the expected immediate reward at the t-th stage to player-k.

For the zero-sum case, \(I_{\beta }^{ (1)} (f, g) ( s_{0} ) = -I_{\beta }^{ (2)} (f, g) ( s_{0} )\). We denote \(I_{\beta }^{ (1)}\) as \(I_{\beta }\).
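For fixed stationary strategies, the discounted payoff vector satisfies the linear system \(I = r + \beta Q I\), where r and Q are the expected one-stage rewards and transition matrix induced by (f, g). A minimal sketch (function name and the two-state example are ours):

```python
import numpy as np

def discounted_payoff(r_fg, Q_fg, beta):
    """beta-discounted payoff vector (one entry per starting state) for
    fixed stationary strategies (f, g): I satisfies I = r + beta * Q I,
    so I = (I - beta*Q)^{-1} r."""
    N = len(r_fg)
    return np.linalg.solve(np.eye(N) - beta * Q_fg, r_fg)

# Hypothetical two-state example: reward 1 in state 0, reward 0 in state 1,
# deterministic alternation between the two states.
r = np.array([1.0, 0.0])
Q = np.array([[0.0, 1.0], [1.0, 0.0]])
I = discounted_payoff(r, Q, beta=0.5)  # I[0] = 4/3, I[1] = 2/3
```

Because \(\beta < 1\), the matrix \(I - \beta Q\) is always invertible, so the discounted payoff is well defined for every stationary pair.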

Definition 6

Undiscounted Payoffs: For a two-person undiscounted nonzero-sum stochastic game with starting state \(s_0\), let (f, g) be a pair of stationary strategies for player-1 and player-2 respectively. Then the undiscounted or limiting average payoff is given as follows:

$$\begin{aligned}&[\varPhi ^{(1)} (f, g)] (s_{0}) = \liminf _{T\uparrow \infty }\left[ \left( \frac{1}{T+1}\right) \sum _{t=0}^T r^{(t)}_{1} (s_{0}, f, g) \right] , \quad \text{for player-1}\\&[\varPhi ^{(2)} (f, g)] (s_{0}) = \liminf _{T\uparrow \infty } \left[ \left( \frac{1}{T+1}\right) \sum _{t=0}^T r^{(t)}_{2} (s_{0}, f, g) \right] , \quad \text{for player-2} \end{aligned}$$

For the zero-sum case, \(\varPhi ^{ (1)} (f, g) ( s_{0} ) = -\varPhi ^{ (2)} (f, g) ( s_{0} )\). We denote \(\varPhi ^{ (1)}\) as \(\varPhi \).
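Under fixed stationary strategies, the expected stage reward at time t is \(Q^t r\), so the limiting average can be approximated by truncating the Cesàro sum. A sketch under those assumptions (function name ours):

```python
import numpy as np

def limiting_average_payoff(r_fg, Q_fg, T=20_000):
    """Approximate Cesaro (limiting average) payoff under fixed stationary
    strategies: averages the expected stage rewards r_t = Q^t r
    for t = 0, ..., T, one entry per starting state."""
    N = len(r_fg)
    total = np.zeros(N)
    Qt = np.eye(N)
    for _ in range(T + 1):
        total += Qt @ r_fg   # expected reward at stage t from each state
        Qt = Qt @ Q_fg       # advance the t-step transition matrix
    return total / (T + 1)
```

For the alternating two-state chain with rewards (1, 0), the average payoff from either state approaches 1/2, matching the intuition that the chain spends half its time in each state.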

Definition 7

Optimal Strategies and Value (in the stochastic game): A pair of stationary strategies \( (f^o, g^o)\) is optimal in the zero-sum discounted stochastic game if, for all \(s \in S\)

$$\begin{aligned} I_{\beta } (f, g^o) (s) \le I_{\beta } (f^o, g^o) (s) \le I_{\beta } (f^o, g) (s), \text{ for } \text{ all } \; f \in P_{A_1}^S, \text{ for } \text{ all } \; g \in P_{A_2}^S \end{aligned}$$

(where player-1 is the maximizer and player-2 is the minimizer).

In other words,

$$\begin{aligned} I_{\beta } (f^o, g^o) (s) = \inf \limits _{g} [I_{\beta } (f^o, g) (s)] = \sup \limits _{f} [I_{\beta } (f, g^o) (s)], \text{ for } \text{ all } \; s \in S. \end{aligned}$$

Shapley (1953) proved the existence and uniqueness of the value \(I_{\beta } (f^o, g^o)\) across all pairs of optimal strategies (\(f^o, g^o\)). The value of the stochastic game, \(v_{\beta }\), is given by:

$$\begin{aligned} v_{\beta } (s) = I_{\beta } (f^o, g^o) (s) = \sup \limits _{f} \inf \limits _{g} [I_{\beta } (f, g) (s)] = \inf \limits _{g} \sup \limits _{f} [I_{\beta } (f, g) (s)] , \text{ for } \text{ all } \; s \in S. \end{aligned}$$

While the value of the stochastic game is unique, optimal strategies may not be unique.

For the undiscounted zero-sum game, \( (f^*, g^*)\) is a pair of optimal strategies if for all \(s \in S\), we have:

$$\begin{aligned}{}[\varPhi (f, g^*)] (s) \le [ \varPhi (f^*, g^*)] (s) \le [ \varPhi (f^*, g)] (s) \text{ for } \text{ all } \; f \in P_{A_1}^S, g \in P_{A_2}^S. \end{aligned}$$

Definition 8

Matrix (Bimatrix) Game Restricted to a State: For a two-person finite zero-sum (discounted or undiscounted) stochastic game, recall that \(m_k\) denotes the number of pure actions of player-k, \(k=1,2\). Consider the matrix \(R (s) = (r (s, i, j))_{m_1 \times m_2}\) restricted to state \(s \in S\). That is, for a fixed \(s \in S\), the (i, j)-th element of R(s) is the immediate reward to player-1 (and its negative the reward to player-2) when they choose actions i and j respectively. Then the one-shot game where the payoff matrix of player-1 is R(s) is referred to as the matrix game restricted to state s.

Throughout the paper, we will use the notation R(s) to indicate the matrix game restricted to state s.

Similarly, for a two-person finite nonzero-sum stochastic game, given \(s \in S\), we refer to the bimatrix game \((R_1 (s), R_2 (s))\) as the bimatrix game restricted to state s.

Definition 9

Auxiliary Game: The game with payoff matrix \(\mathcal {A} (s)\), whose (i, j)-th element is \(r (s, i, j) + \beta \sum \nolimits _{s' \in S} v_\beta (s') q (s' | s, i, j)\), where \(v_\beta (s')\) is the value of the discounted stochastic game with initial state \(s'\), is called the auxiliary game at state s (or starting at state s).

Shapley (1953) showed that val \(\mathcal {A} (s) = v_\beta (s)\).
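Shapley's fixed-point characterization suggests a successive-approximation scheme: repeatedly replace v(s) by the value of the auxiliary game built from the current v. The sketch below (function names and the data layout are ours; the LP value subroutine uses `scipy.optimize.linprog`) is a contraction with modulus \(\beta \), so it converges to \(v_\beta \):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the matrix game A (row player maximizes), via LP."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - sum_i p_i a_ij <= 0
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_iteration(r, q, beta, iters=100):
    """Successive approximation of v_beta for a finite discounted zero-sum
    stochastic game. r[s] is the m1 x m2 payoff matrix in state s;
    q[s] is an m1 x m2 x N array of transition probabilities.
    Each step takes v(s) <- val(A(s)), the value of the auxiliary game."""
    N = len(r)
    v = np.zeros(N)
    for _ in range(iters):
        v = np.array([matrix_game_value(r[s] + beta * (q[s] @ v))
                      for s in range(N)])
    return v
```

As a check, a single absorbing state with payoff matrix \([[2,0],[0,2]]\) and \(\beta = 1/2\) gives \(v = 1 + \beta v\), i.e. \(v_\beta = 2\).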

Definition 10

Nash Equilibrium (in the stochastic game): A pair of stationary strategies \( (f^o, g^o)\) constitutes a Nash equilibrium in the discounted stochastic game if, for all \(s \in S\)

$$\begin{aligned}&I_{\beta }^{ (1)} (f^o, g^o) (s) \ge I_{\beta }^{ (1)} (f, g^o) (s) \text{, } \text{ for } \text{ all } \;f \in P_{A_1}^S, \text{ and }\\&I_{\beta }^{ (2)} (f^o, g^o) (s) \ge I_{\beta }^{ (2)} (f^o, g) (s) \text{, } \text{ for } \text{ all } \;g \in P_{A_2}^S \end{aligned}$$

assuming that both players want to maximize their payoffs.

Similarly, a pair of strategies \( (f^*, g^*)\) constitutes a Nash equilibrium for an undiscounted stochastic game if for all \(s \in S\),

$$\begin{aligned}&[\varPhi ^{ (1)} (f^*, g^*)] (s) \ge [ \varPhi ^{ (1)} (f, g^*)] (s) \quad \hbox {for all}\; f \in P_{A_1}^{S}, \hbox {and}\\&[\varPhi ^{ (2)} (f^{*}, g^{*})] (s) \ge [ \varPhi ^{(2)} (f^{*}, g)] (s) \quad \text{ for } \text{ all }\; g \in P_{A_2}^{S}. \end{aligned}$$

Henceforth, whenever we say “equilibrium”, we mean “Nash equilibrium”.

Definition 11

Symmetric Optimal and Symmetric Equilibrium Strategy Pairs: A pair of optimal (or equilibrium) strategies \( (f^*, g^*)\) is called a symmetric optimal (equilibrium) strategy if both players use the same strategy at optimum (equilibrium), that is, \(f^* = g^*\). We say \( (f^*, f^*)\) is a symmetric optimal (equilibrium) strategy pair or simply \(f^*\) is a symmetric optimal (equilibrium) strategy. (Clearly, we talk of symmetric optima (equilibria) only when the payoff matrix (matrices) is (are) square. We assume \(A_1 = A_2\)).

Definition 12

Completely Mixed Stochastic Game (Filar 1985): Consider a two-person stochastic game \( (S, A_1, A_2, r_1, r_2, q, \beta )\). If every optimal stationary strategy for either player assigns a positive probability to every action in every state, then the stochastic game is said to be completely mixed. Such strategies are referred to as completely mixed strategies of the stochastic game.

For example, consider a single-player controlled stochastic game with positive payoffs, and let \(F^o\) and \(G^o\) denote the sets of all optimal stationary strategies for player-1 and player-2 respectively. Then the stochastic game is completely mixed if, for all \( (f^o, g^o) \in F^o \times G^o\), \(f_i^o (s)\) and \(g_j^o (s)\) are strictly positive for all i, j, and s.

Definition 13

Single-Player Controlled Stochastic Games (Parthasarathy and Raghavan 1981): In single-player controlled stochastic games, only one of the players controls the transitions. For example, when player-1 controls transitions, \(q (s' | s, i, j) = q (s' | s, i)\) for all \(i \in {A_1}\), for all \(j \in {A_2}\), and for all \(s, s' \in S\).
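The single-player control property is easy to test numerically: the transition distribution must not change with player-2's action j. A sketch (the function name and array layout `q[s, i, j, s']` are our conventions):

```python
import numpy as np

def is_player1_controlled(q, tol=1e-12):
    """Check q(s' | s, i, j) = q(s' | s, i): for every state s and
    action i, the transition distribution q[s, i, j, :] must be the same
    for all actions j of player-2. q has shape (N, m1, m2, N)."""
    return bool(np.all(np.abs(q - q[:, :, :1, :]) <= tol))

# Hypothetical 2-state example: action i = 0 always moves to state 0,
# action i = 1 always moves to state 1, regardless of j.
q_good = np.zeros((2, 2, 2, 2))
q_good[:, 0, :, 0] = 1.0
q_good[:, 1, :, 1] = 1.0
```

The same broadcast comparison, with the roles of i and j swapped, checks player-2 control.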

Definition 14

Switching Control Stochastic Games (Filar 1981): In switching control stochastic games, the transition is controlled by player-1 alone in a certain subset of states and by player-2 alone in the remaining states. That is, \(S = S_1 \cup S_2\), \(S_1 \cap S_2 = \emptyset \), and the transition probabilities are given by

$$\begin{aligned} q (s' | s, i, j)&= q (s' | s, i), \quad \text{for all } s' \in S,\; s \in S_1,\; i \in {A_1},\; j \in {A_2}\\ q (s' | s, i, j)&= q (s' | s, j), \quad \text{for all } s' \in S,\; s \in S_2,\; i \in {A_1},\; j \in {A_2}. \end{aligned}$$

Definition 15

AIT (Action Independent Transition) Games (Krishnamurthy 2011): In AIT stochastic games, the transitions are independent of the actions of the two players. That is, for all \(i \in {A_1}\), for all \(j \in {A_2}\), and for all \(s, s' \in S\), \(q (s' | s, i, j) = q (s' | s)\).

Definition 16

SER-SIT (Separable Reward–State Independent Transition) Games (Parthasarathy et al. 1984): SER-SIT stochastic games exhibit the following two properties:

  1. The rewards can be written as the sum of two functions, one depending on the state alone and the other on the actions alone. That is, for all \(s \in S\), for all \(i \in {A_1}\), for all \(j \in {A_2}\),

    $$\begin{aligned} r_k (s, i, j) = c_k (s) + a_k (i, j), \quad k = 1, 2, \end{aligned}$$

    where \(c_k (s)\) is a measurable function.

  2. The transitions are independent of the state from which the game transitions. That is, for all \(i \in {A_1}\), for all \(j \in {A_2}\), and for all \(s, s' \in S\),

    $$\begin{aligned} q (s' | s, i, j) = q (s' | i, j). \end{aligned}$$
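Both SER-SIT properties above reduce to simple array identities: separability holds iff \(r (s, i, j) - r (s_1, i, j)\) is constant in (i, j) for every s, and SIT holds iff the transition array is identical across current states. A sketch under our assumed layouts (function names ours):

```python
import numpy as np

def is_ser(r, tol=1e-10):
    """Separable rewards r(s,i,j) = c(s) + a(i,j): equivalently, for each
    state s, r[s] - r[0] is a constant matrix. r has shape (N, m1, m2)."""
    diffs = r - r[0]
    return bool(all(np.allclose(d, d.flat[0], atol=tol) for d in diffs))

def is_sit(q, tol=1e-12):
    """State-independent transitions q(s' | s, i, j) = q(s' | i, j):
    q[s] must be identical for all s. q has shape (N, m1, m2, N)."""
    return bool(np.all(np.abs(q - q[:1]) <= tol))

# Hypothetical SER rewards: a state-dependent constant plus an action game.
base = np.array([[1.0, 2.0], [3.0, 4.0]])
r = np.stack([base, base + 5.0])   # c(s_1) = 0, c(s_2) = 5
```

These checks are handy when constructing examples, since SER-SIT structure is exactly what lets the stochastic game collapse to a single auxiliary bimatrix game.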

3 Results

We consider only two-person finite stochastic games. In each of the following subsections, we address a different aspect of stochastic games—namely, completely mixed stochastic games, undiscounted stochastic games with stationary optimal strategies, and symmetric equilibrium strategies in stochastic games respectively.

3.1 Completely mixed stochastic games

In general, a stochastic game need not be completely mixed. In this subsection, using results from Kaplansky (1945), we provide conditions under which finite discounted zero-sum single-player controlled and switching control stochastic games are completely mixed. As single-player controlled stochastic games are a subclass of switching control stochastic games, results for the former follow from those of the latter; however, we state them separately because we first prove the results for single-player controlled stochastic games and then extend them to switching control stochastic games. We also provide examples to highlight the necessity of some of the conditions in these results. Further, for two-person nonzero-sum games, we provide sufficient conditions for SER-SIT games to have a completely mixed equilibrium, and a necessary condition for SER-SIT games to be completely mixed in both the discounted and undiscounted cases. Finally, we extend a result of Sujatha et al. (2014) on finite discounted nonzero-sum stochastic games to finite undiscounted zero-sum stochastic games.

3.1.1 Zero-sum stochastic games

Lemma 1

Let \(A \in \mathbb {R}^{n \times n}\) and let \(A = A^t\). Suppose there exists a probability vector \(x \in \mathbb {R}^n\) with \(\sum \nolimits _{j=1}^n{a_{ij}x_j} = c\), for \(i = 1, 2, \dots , n\). Then \(\sum \nolimits _{i=1}^n{a_{ij}x_i} = c\), for \(j = 1, 2, \dots , n\). Further, c is the (minmax) value of the matrix game A, and x is optimal for both the players.

Proof

We skip the proof as it is straightforward. \(\square \)
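A quick numerical check of Lemma 1 (the symmetric matrix, probability vector, and constant below are hypothetical):

```python
import numpy as np

# Hypothetical symmetric matrix and probability vector satisfying Ax = c*e.
A = np.array([[0.0, 3.0], [3.0, 2.0]])
x = np.array([0.25, 0.75])
c = 2.25

assert np.allclose(A @ x, c)  # hypothesis: every row of A against x gives c
assert np.allclose(x @ A, c)  # conclusion: every column too, by symmetry
# x guarantees at least c to the maximizer and at most c to the minimizer,
# so c = val(A) and x is optimal for both players.
```

The symmetry of A is what turns the row identity into the column identity; for a non-symmetric A the conclusion can fail.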

The following results of Kaplansky (1945) are needed for proving some of our results.

  1. (Theorem 1 of Kaplansky 1945) Consider a two-person matrix game with payoff matrix \(A \in \mathbb {R}^{m \times n}\). Suppose player-1 has a completely mixed optimal strategy. If \(y^o\) is any optimal strategy for player-2, then \(\sum \nolimits _{j=1}^n a_{ij} y_j^o \equiv v\) for \(i=1, \ldots , m\), where v is the value of the matrix game.

  2. (Theorem 2 of Kaplansky 1945) Consider a completely mixed two-person matrix game with payoff matrix \(A \in \mathbb {R}^{n \times n}\). Let \(A_{ij}\) be the cofactor of \(a_{ij}\). Then the value of the game is given by \(v = \frac{\det (A)}{\sum \nolimits _i \sum \nolimits _j A_{ij}}\), where the denominator is always nonzero. Also, if \(v \ne 0\), then \(\det (A) \ne 0\).

  3. (Theorem 4 of Kaplansky 1945) Let \(A \in \mathbb {R}^{n \times n}\) be the payoff matrix of a two-person matrix game. Every optimal strategy of player-1 is completely mixed if and only if every optimal strategy of player-2 is completely mixed.
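Kaplansky's value formula in item 2 is straightforward to implement; the sketch below (function name ours) computes the cofactors by minors and applies \(v = \det (A) / \sum _{i,j} A_{ij}\):

```python
import numpy as np

def kaplansky_value(A):
    """Value of a completely mixed (hence square) matrix game via
    Kaplansky's Theorem 2: v = det(A) / (sum of all cofactors of A)."""
    n = A.shape[0]
    cof = np.array([[(-1) ** (i + j) *
                     np.linalg.det(np.delete(np.delete(A, i, 0), j, 1))
                     for j in range(n)] for i in range(n)])
    return np.linalg.det(A) / cof.sum()
```

For example, the game \([[0,3],[3,2]]\) is completely mixed with optimal strategy (1/4, 3/4); the formula gives \(\det = -9\), cofactor sum \(-4\), hence value 9/4. Matching pennies gives value 0, illustrating that \(\det (A) = 0\) is possible exactly when \(v = 0\).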

We now present the following results on stochastic games with completely mixed optimal strategies. Though some of these results are about symmetric optimal strategies, these results are primarily used to prove Theorem 3 on completely mixed stochastic games. Subsection 3.3 discusses further results on symmetric optima and equilibria in stochastic games.

We first make the following observation.

Let \(f^*\) be an optimal stationary strategy of player-1, and suppose there is a constant c such that, for every state \(s \in S\) and every strategy \(\lambda \) of player-2 in state s, \(r(s, f^*(s), \lambda ) + \beta \sum _{s' \in S} v_\beta (s') q(s' | s, f^*(s), \lambda ) \ge c\). Then \(I^{ (1)}_\beta (f^*, g) (s) \ge c\) for all states \(s \in S\) and all strategies g of player-2. That is, for the given discount factor \(\beta \), the payoff to player-1 against any strategy of player-2 is at least c. \(\square \)

In the following lemma, we provide conditions that are sufficient for the existence of a symmetric optimal strategy for the matrix game R(s) restricted to state s, for all \(s \in S\). In general, the symmetric optimal strategy for R(s) need not be unique, as seen in the example of a game with \(r (s, i, j) = 1\) for all s, i, j. We provide a sufficient condition for the symmetric optimal strategy to be unique. Further, under certain conditions, the existence of a symmetric optimal strategy pair for R(s) for all \(s \in S\) is necessary for the existence of a completely mixed optimal stationary strategy for the stochastic game.

Lemma 2

Consider a finite discounted zero-sum single-player controlled stochastic game where player-1 is the controlling player, that is, \(q (s' | s, i, j) = q (s' | s, i)\) for all \(s \in S\). Let R(s) be symmetric for each \(s \in S\). Let \((f^*, g^*)\) be a completely mixed optimal stationary strategy pair for the stochastic game. Then \(f^* (s)\) is a symmetric optimal strategy for R(s), for all \(s \in S\).

Further, for all \(s \in S\), let R(s) be non-singular. Then \(f^*(s)\) is the unique symmetric optimal strategy for R(s), for all \(s \in S\).

Proof

Let \((f^*, g^*)\) be a completely mixed optimal stationary strategy pair for the stochastic game. Since \(f^*\) is a completely mixed optimal stationary strategy of player-1 in the stochastic game, by Kaplansky (1945, Theorem 1) we have, for all \(s \in S\)

$$\begin{aligned} v_\beta (s) = r (s, f^* (s), j) + \beta \sum \limits _{s' \in S}{v_\beta (s')q (s' | s, f^* (s))} , \text{ for } \text{ all } \;j \in A_2. \end{aligned}$$
(1)

Since the game is controlled by player-1, the second term of Eq. 1 is independent of player-2’s actions (that is, independent of any \(j \in A_2\)). Therefore, for all \(s \in S\), \(r (s, f^* (s), j)\) is independent of j since \(v_\beta (s)\) is a constant. That is,

$$\begin{aligned} r (s, f^* (s), j) = v_\beta (s) - \beta \sum \limits _{s' \in S}{v_\beta (s')q (s' | s, f^* (s))} = c(s), \quad \text{a constant, for all } s \in S \text{ and } j \in A_2 . \end{aligned}$$

By Lemma 1, c(s) is the minmax value of the matrix game R(s). Further, by Lemma 1, \(f^* (s)\) is a symmetric optimal strategy for R(s).

Now, if R(s) is non-singular for all \(s \in S\), we show that \(f^* (s)\) is the unique symmetric optimal strategy for R(s) for all \(s \in S\). If possible, let \(\mu ^*\) be another optimal strategy of player-1 for the game R(s), \(s \in S\). Then, by Kaplansky (1945, Theorem 1), \( (\mu ^{*})^{t} R (s) \equiv v (s) e^t\) where \(e = (1, \ldots , 1)^t\) and \(v (s) = \mathrm{val} (R(s))\). Also \( (f^* (s))^t R (s) \equiv v (s)e^t\). Since R(s) is non-singular, it follows that \(\mu ^* = f^* (s)\). Hence, \(f^* (s)\) is the unique symmetric optimal strategy for R(s). \(\square \)

Remark 1

Lemma 2 can be extended to AIT stochastic games. In fact, we do not require symmetry of R(s).

Remark 2

Lemma 2 can be extended to switching control stochastic games as follows. Consider a finite discounted zero-sum switching control stochastic game where \(S_1\) and \(S_2\) are the sets of states in which player-1 and player-2, respectively, are the controlling players, with \(S = S_1 \cup S_2\) and \(S_1 \cap S_2 = \emptyset \). Further, let \( (f^*, g^*)\) be a completely mixed optimal stationary strategy pair for the stochastic game. For each \(s \in S\), let R(s) be symmetric and non-singular. Then \(f^* (s)\) is the unique symmetric optimal strategy for R(s) for all \(s \in S_1\), and \(g^* (s)\) is the unique symmetric optimal strategy for R(s) for all \(s \in S_2\).

For finite discounted zero-sum single-player controlled and switching control stochastic games, we provide sufficient conditions for R(s) to be completely mixed, as well as for the stochastic game to be completely mixed. We will use the following result by Parthasarathy and Raghavan (1981) in the proof of Theorem 3.

(Lemma 4.1 of Parthasarathy and Raghavan 1981) Consider a non-singular matrix \(C = (c_{ij})_{n \times n}\) where \(c_{ij} = a_{ij} + b_j\) with \(a_{ij} > 0\) for all ij. Suppose \(Cx = \alpha e\) where x is a probability vector, \(\alpha \) is a scalar and \(e = (1, \ldots , 1)^t\). Then the matrix \(A= (a_{ij})\) is non-singular and \(Ax = \beta e\) for some scalar \(\beta \). \(\square \)

Theorem 3

Consider a finite discounted zero-sum stochastic game which is either a single-player controlled stochastic game where player-1 is the controlling player or a switching control stochastic game. Let R(s) be symmetric for each \(s \in S\). Further, suppose there exists a completely mixed optimal stationary strategy pair for the stochastic game. Then the following are equivalent.

  1. The discounted stochastic game is completely mixed.

  2. The matrix game R(s) is completely mixed for every s.

Proof

We will first prove the result for the single-player controlled stochastic game where player-1 is the controlling player. Without loss of generality, assume that \(r (s, i, j) > 0\) for every tuple (sij).

We will first show that 2 follows from 1. Let the discounted stochastic game be completely mixed. This means that the auxiliary game \(\mathcal {A} (s)\) is completely mixed for every s. The (i, j)-th element of \(\mathcal {A} (s)\) is given by \(r (s, i, j) + \beta \sum \nolimits _{s' \in S}v_\beta (s') q (s' | s, i)\).

By Kaplansky (1945, Theorem 2), the value of \(\mathcal {A} (s)\) is \(\frac{ \det ( \mathcal {A} (s)) }{\sum \nolimits _i \sum \nolimits _j \mathcal {A}_{ij} (s)}\), where \(\mathcal {A}_{ij} (s)\) is the cofactor of the (i, j)-th entry of \(\mathcal {A} (s)\). Since \(\mathcal {A} (s)\) is completely mixed and \(v_\beta (s) > 0\), the value of \(\mathcal {A} (s)\) is nonzero. Hence \(\det (\mathcal {A} (s)) \ne 0\), that is, \(\mathcal {A} (s)\) is non-singular for every \(s \in S\). By Parthasarathy and Raghavan (1981, Lemma 4.1), it follows that R(s) is non-singular for all \(s \in S\).

Now, let \( (f^o, g^o)\) be a completely mixed optimal stationary strategy pair for the stochastic game. By Lemma 2, \( (f^o (s), f^o (s))\) is an optimal strategy pair for R(s). \( (f^o (s), f^o (s))\) is also completely mixed as \(f^o\) is a completely mixed strategy of player-1, and \(A_1 = A_2\). Thus, as R(s) is non-singular for all \(s \in S\), it follows by Lemma 2 that R(s) is completely mixed for every \(s \in S\).

Conversely, suppose R(s) is completely mixed for all \(s \in S\). Let \( (f^o, g^o)\) be a completely mixed optimal stationary strategy pair for the discounted stochastic game. For all \(s \in S\), as \(r (s, i, j) > 0\) and R(s) is completely mixed, R(s) is non-singular by Kaplansky (1945, Theorem 2). The auxiliary game starting at state s is \(\mathcal {A} (s) = ( (r (s, i, j) + \beta \sum \nolimits _{s' \in S}v_\beta (s') q (s' | s, i)))\). By Parthasarathy and Raghavan (1981, Lemma 4.1), \(\mathcal {A} (s)\) is non-singular for all \(s \in S\).

By Lemma 2, (\(f^o (s)\), \(f^o (s)\)) is the unique symmetric optimal strategy pair for R(s). Then, \(f^o (s)^t R (s) = v (s)e^t\).

If possible, let \(\mathcal {A} (s_0)\) not be completely mixed for some \(s_0 \in S\). Then by Kaplansky (1945, Theorem 4), there exists an optimal strategy \(\mu ^*\) of player-1 such that \(\mu ^* (s_0)\) is not completely mixed.

Thus, \(\sum \nolimits _{i=1}^n [ (r (s_0, i, j) + \beta \sum \nolimits _{s' \in S}v_\beta (s') q (s' | s_0, i))\mu ^*_i] = v_\beta (s_0)\) for all \(j \in A_2\), where \(\mu ^* (s_0) = (\mu ^*_1, \mu ^*_2, \ldots , \mu ^*_n)^t\) and \(n = m_1 = m_2\) is the number of actions each player has.

Also, \(\sum \nolimits _{i=1}^n [ (r (s_0, i, j) + \beta \sum \nolimits _{s' \in S}v_\beta (s') q (s' | s_0, i))f^o_i (s_0)] = v_\beta (s_0)\) for all \(j \in A_2\).

Since \(\mathcal {A} (s_0)\) is non-singular, \(\mu ^* (s_0)\) and \(f^o (s_0)\) must coincide. But \(f^o (s_0)\) is completely mixed, contradicting the choice of \(\mu ^*\). Thus every optimal strategy for player-1 is completely mixed. Hence the single-player controlled stochastic game is completely mixed.

For switching control stochastic games, the above proof can be mimicked for sets of states \(S_1\) and \(S_2\) separately, and the result follows due to Remark 2 that extends Lemma 2 to switching control stochastic games. \(\square \)

We now provide two examples (Examples 1 and 2) to show that some of the conditions in Theorem 3 are not necessary. Both examples are discounted zero-sum player-2 controlled stochastic games with three states, namely \(s_1\), \(s_2\), and \(s_3\), and discount factor \(\beta = 1/2\). As \(s_2\) and \(s_3\) are absorbing states with stage values 0 and 1 respectively, it can be seen that \(v_\beta (s_2) = 0\) and \(v_\beta (s_3) = 1/ (1 - \beta ) = 2\) in both the examples.

Example 1

Symmetry of the payoff matrices for all states is not a necessary condition.

Let the payoffs and the transition probabilities for each of the three states be as follows.

$$\begin{aligned} s_1 : \left[ \begin{array}{cc} 0 &{} 2 \\ (0, 1, 0) &{} (0, 0, 1) \\[2mm] 3 &{} 1 \\ (0, 1, 0) &{} (0, 0, 1) \end{array}\right] , \quad s_2 : \left[ \begin{array}{cc} 1 &{} -1 \\ (0, 1, 0) &{} (0, 1, 0) \\[2mm] -1 &{} 1 \\ (0, 1, 0) &{} (0, 1, 0) \end{array}\right] , \quad s_3 : \left[ \begin{array}{cc} 2 &{} 0 \\ (0, 0, 1) &{} (0, 0, 1) \\[2mm] 0 &{} 2 \\ (0, 0, 1) &{} (0, 0, 1) \end{array}\right] \end{aligned}$$

The auxiliary game when the stochastic game starts at \(s_1\) is

$$\begin{aligned} \mathcal {A} (s_1) = \left[ \begin{array}{cc} 0 &{} 2 + \beta / (1 - \beta )\\ 3 &{} 1 + \beta / (1 - \beta ) \end{array}\right] = \left[ \begin{array}{cc} 0 &{} 3\\ 3 &{} 2 \end{array}\right] \end{aligned}$$

which is clearly completely mixed.

In fact, \(\mathcal {A}(s)\) as well as R(s) are completely mixed, for all \(s \in S\). However, \(R (s_1)\) is not symmetric. Hence, symmetry of R(s), for all \(s \in S\), is not a necessary condition for Theorem 3 to hold. \(\square \)
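The computation in Example 1 is easy to verify numerically. The sketch below (plain Python; the helper `solve_2x2` is ours, and we use the absorbing-state values \(v_\beta (s_2) = 0\) and \(v_\beta (s_3) = 1/(1-\beta ) = 2\), which reproduce the \(2 + \beta /(1-\beta )\) entries above) rebuilds \(\mathcal {A} (s_1)\) and solves it in closed form:

```python
from fractions import Fraction

def solve_2x2(a):
    """Value and an optimal row strategy of a 2x2 zero-sum matrix game."""
    (a11, a12), (a21, a22) = a
    maximin = max(min(a11, a12), min(a21, a22))
    minimax = min(max(a11, a21), max(a12, a22))
    if maximin == minimax:  # pure saddle point: the game is not completely mixed
        x = (1, 0) if min(a11, a12) >= min(a21, a22) else (0, 1)
        return maximin, x
    # Otherwise the unique optimal strategy is completely mixed (equalizing).
    denom = a11 - a12 - a21 + a22
    p = (a22 - a21) / denom
    return (a11 * a22 - a12 * a21) / denom, (p, 1 - p)

beta = Fraction(1, 2)
v2, v3 = Fraction(0), 1 / (1 - beta)     # absorbing-state values: 0 and 2
R_s1 = [[0, 2], [3, 1]]
# In state s1, column 1 moves the game to s2 and column 2 moves it to s3.
A_s1 = [[R_s1[i][0] + beta * v2, R_s1[i][1] + beta * v3] for i in range(2)]

value, x = solve_2x2(A_s1)
print(value, x)   # value 9/4 with the completely mixed strategy (1/4, 3/4)
```

Both coordinates of the optimal strategy are strictly positive, confirming that \(\mathcal {A} (s_1)\) is completely mixed even though \(R (s_1)\) is not symmetric.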

Example 2

The stochastic game being completely mixed is not necessary for R(s) to be completely mixed for all \(s \in S\).

We consider a minor modification to the payoffs in state \(s_1\) in Example 1.

$$\begin{aligned} s_1 : \left[ \begin{array}{cc} 0 &{} 2 \\ (0, 1, 0) &{} (0, 0, 1) \\[2mm] 2 &{} 1 \\ (0, 1, 0) &{} (0, 0, 1) \end{array}\right] , \quad s_2 : \left[ \begin{array}{cc} 1 &{} -1 \\ (0, 1, 0) &{} (0, 1, 0) \\[2mm] -1 &{} 1 \\ (0, 1, 0) &{} (0, 1, 0) \end{array}\right] , \quad s_3 : \left[ \begin{array}{cc} 2 &{} 0 \\ (0, 0, 1) &{} (0, 0, 1) \\[2mm] 0 &{} 2 \\ (0, 0, 1) &{} (0, 0, 1) \end{array}\right] \end{aligned}$$

The auxiliary game starting at \(s_1\) is \(\mathcal {A} (s_1) = \left[ \begin{array}{cc} 0 &{} 3\\ 2 &{} 2 \end{array}\right] \), which is not completely mixed (row 2 and column 1 form a pure saddle point with value 2). However, the matrix game \(R (s_1)\) is completely mixed, since \((1/3, 2/3)\) is its unique symmetric optimal strategy. In fact, R(s) is completely mixed for all s. \(\square \)
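The contrast in Example 2 can be checked the same way. The short sketch below (our own check, with \(\beta = 1/2\), \(v_\beta (s_2) = 0\), and \(v_\beta (s_3) = 1/(1-\beta ) = 2\) as in Example 1) confirms that \(\mathcal {A} (s_1)\) has a pure saddle point while \(R (s_1)\) is completely mixed:

```python
from fractions import Fraction

R_s1 = [[0, 2], [2, 1]]
A_s1 = [[0 + 0, 2 + 1], [2 + 0, 1 + 1]]   # = [[0, 3], [2, 2]]: payoffs plus beta * v

# A(s1) has a pure saddle point: row 2 guarantees 2 and column 1 caps the
# payoff at 2, so A(s1) is not completely mixed.
maximin = max(min(row) for row in A_s1)
minimax = min(max(A_s1[i][j] for i in range(2)) for j in range(2))
assert maximin == minimax == 2

# R(s1) itself is completely mixed: (1/3, 2/3) equalizes both columns.
x = (Fraction(1, 3), Fraction(2, 3))
col_payoffs = [sum(x[i] * R_s1[i][j] for i in range(2)) for j in range(2)]
print(col_payoffs)   # both columns give 4/3
```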

Our next example (Example 3) shows that a stochastic game need not be completely mixed even though R(s) is completely mixed for every \(s \in S\). In fact, in this example the stochastic game does not have even one completely mixed optimal stationary strategy. We start with the following result on completely mixed \(n \times n\) matrix games, which is used in the example.

Lemma 4

Let \(A \in \mathbb {R}^{n \times n}\) and let the matrix game A be completely mixed. Then the new matrix game \(\left[ \begin{array}{cccc} a_{11} + d_1 &{} a_{12} + d_2 &{} \dots &{} a_{1n} + d_n \\ a_{21} + d_1 &{} a_{22} + d_2 &{} \dots &{} a_{2n} + d_n \\ \vdots &{} \vdots &{} &{} \vdots \\ a_{n1} + d_1 &{} a_{n2} + d_2 &{} \dots &{} a_{nn} + d_n\end{array}\right] \) has no row domination, for every choice of \(d_1, \dots , d_n \in \mathbb {R}\).

That is, no row is dominated by a convex combination of other rows.

Proof

If possible, let the new game have row domination, that is, let some convex combination of its rows dominate another of its rows entry-wise. Since the constant \(d_j\) is added to every entry of column j, the same convex combination of the corresponding rows of A would then dominate the corresponding row of A entry-wise. But no convex combination of rows of A can dominate another row of A, as the matrix game A is completely mixed. Hence the new game has no row domination. \(\square \)
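Lemma 4 can be checked exactly for small matrices. The sketch below (our own check, specialized to the \(3 \times 3\) case, where dominance by a convex combination of the other two rows is a one-parameter feasibility problem in the weight w) verifies that a column-shifted version of a completely mixed matrix has no row domination:

```python
from fractions import Fraction

def row_dominated(A, i):
    """For a 3x3 matrix, test whether row i is weakly dominated entry-wise by
    some convex combination w*row_k + (1-w)*row_l of the other two rows."""
    k, l = [r for r in range(3) if r != i]
    lo, hi = Fraction(0), Fraction(1)
    for j in range(3):
        d = A[k][j] - A[l][j]
        rhs = A[i][j] - A[l][j]      # the constraint is: w * d >= rhs
        if d > 0:
            lo = max(lo, Fraction(rhs, d))
        elif d < 0:
            hi = min(hi, Fraction(rhs, d))
        elif rhs > 0:                # unsatisfiable for every w in [0, 1]
            return False
    return lo <= hi

A = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]   # a completely mixed matrix game
d = [1, 2, 3]                                  # arbitrary column shifts
B = [[A[i][j] + d[j] for j in range(3)] for i in range(3)]

print([row_dominated(B, i) for i in range(3)])   # [False, False, False]
```

As Lemma 4 predicts, no row of the shifted matrix B is dominated, even though B itself need not be completely mixed.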

Example 3

We demonstrate the necessity of the condition in Theorem 3 that the stochastic game has at least one completely mixed optimal stationary strategy. We construct a discounted single-player controlled stochastic game with symmetric payoff matrices where R(s) is completely mixed for each \(s \in S\), but the stochastic game does not have any completely mixed optimal stationary strategy.

Let \(R (s_0) = \left[ \begin{array}{ccc} 2 &{} -1 &{} -1 \\ -1 &{} 2 &{} -1\\ -1 &{} -1 &{} 2\end{array}\right] \). Let \(s_1\), \(s_2\), and \(s_3\) be absorbing states whose payoff matrices are given by:

$$\begin{aligned} R (s_1) = \left[ \begin{array}{ccc} 6 &{} 0 &{} 0 \\ 0 &{} 6 &{} 0\\ 0 &{} 0 &{} 6\end{array}\right] , \quad R (s_2) = \left[ \begin{array}{ccc} 12 &{} 0 &{} 0 \\ 0 &{} 12 &{} 0\\ 0 &{} 0 &{} 12\end{array}\right] , \quad \mathrm{and}\quad R (s_3) = \left[ \begin{array}{ccc} 18 &{} 0 &{} 0 \\ 0 &{} 18 &{} 0\\ 0 &{} 0 &{} 18\end{array}\right] . \end{aligned}$$

Let the game start in state \(s_0\) and move to state \(s_i\) if column i is played. Let the discount factor be \(\beta = \frac{1}{7}\).

Clearly, R(s) is completely mixed for each s.

Further, \(v_\beta (s_1) = \frac{6}{1 - \beta } = 7\), \(v_\beta (s_2) = \frac{12}{1 - \beta } = 14\), and \(v_\beta (s_3) = \frac{18}{1 - \beta } = 21\).

Hence,

$$\begin{aligned} \mathcal {A} (s_0) = \left[ \begin{array}{ccc} 2 + \beta v_\beta (s_1) &{} -1 + \beta v_\beta (s_2) &{} -1 + \beta v_\beta (s_3) \\ -1 + \beta v_\beta (s_1) &{} 2 + \beta v_\beta (s_2) &{} -1 + \beta v_\beta (s_3)\\ -1 + \beta v_\beta (s_1) &{} -1 + \beta v_\beta (s_2) &{} 2 + \beta v_\beta (s_3)\end{array}\right] =\left[ \begin{array}{ccc} 3 &{} 1 &{} 2 \\ 0 &{} 4 &{} 2\\ 0 &{} 1 &{} 5\end{array}\right] . \end{aligned}$$

In this specific example, adding 1, 2, and 3 to the first, second, and third columns of \(R (s_0)\) respectively yields the matrix \(\mathcal {A} (s_0)\), so by Lemma 4, \(\mathcal {A} (s_0)\) has no row domination. Moreover, \( (\frac{2}{3}, \frac{1}{3}, 0)\) is the only optimal strategy for player-1 (the maximizer) in the matrix game \(\mathcal {A} (s_0)\), and it is not completely mixed. Since \(\mathcal {A} (s_0)\) has no completely mixed optimal strategy, the assumption that there exists at least one completely mixed optimal stationary strategy for both players in the stochastic game fails here, even though R(s) is completely mixed for all \(s \in S\). \(\square \)
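The claims in Example 3 can be verified directly. The check below (plain Python, exact arithmetic; the minimizer's equalizing strategy \((1/3, 1/3, 1/3)\) is a certificate of our own choosing) confirms that the value of \(\mathcal {A} (s_0)\) is 2, attained by the non-completely-mixed strategy \((2/3, 1/3, 0)\):

```python
from fractions import Fraction as F

beta = F(1, 7)
v = [F(6) / (1 - beta), F(12) / (1 - beta), F(18) / (1 - beta)]   # 7, 14, 21
assert v == [7, 14, 21]

R_s0 = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
# From s0, playing column j moves the game to the absorbing state s_{j+1}.
A_s0 = [[R_s0[i][j] + beta * v[j] for j in range(3)] for i in range(3)]
assert A_s0 == [[3, 1, 2], [0, 4, 2], [0, 1, 5]]

x = (F(2, 3), F(1, 3), F(0))      # claimed optimal strategy of the maximizer
y = (F(1, 3), F(1, 3), F(1, 3))   # certificate strategy for the minimizer

cols = [sum(x[i] * A_s0[i][j] for i in range(3)) for j in range(3)]
rows = [sum(A_s0[i][j] * y[j] for j in range(3)) for i in range(3)]
# x guarantees exactly 2 in every column and y concedes exactly 2 in every
# row, so the value of A(s0) is 2; x is optimal but has a zero coordinate.
assert cols == [2, 2, 2] and rows == [2, 2, 2] and min(x) == 0
```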

3.1.2 Nonzero-sum stochastic games

We now look at some results for two-person finite nonzero-sum SER-SIT stochastic games, and at an extension of a result by Sujatha et al. (2014) to finite undiscounted zero-sum stochastic games. Though the latter result concerns zero-sum stochastic games, we state and prove it in this subsection because it extends a result of Sujatha et al. (2014) on finite discounted nonzero-sum stochastic games, whose proof is provided in the Appendix.

Parthasarathy et al. (1984) showed that, given a two-person finite nonzero-sum SER-SIT game, a state independent stationary equilibrium strategy pair can be found by solving a single bimatrix game (E, F), where, in the case of discounted SER-SIT games, the (i, j)th entries of E and F are \(a_1 (i,j) + \beta \sum _{s'}c_1 (s')q (s' | i,j)\) and \(a_2 (i,j) + \beta \sum _{s'}c_2 (s')q (s' | i,j)\) respectively. Here \(a_1, a_2, c_1\), and \(c_2\) are as per Definition 16. In Krishnamurthy et al. (2009), the authors discuss pure strategy equilibria and show that the pure strategy equilibria of the SER-SIT game and of the bimatrix game (E, F) correspond. In general, however, equilibria of the SER-SIT game and of (E, F) need not correspond: Parthasarathy et al. (1984) give an example where the SER-SIT game has more equilibrium points than the bimatrix game to which it is reduced. In the following lemma, we show that the SER-SIT game has a completely mixed equilibrium if and only if the bimatrix game (E, F) has one. Further, if the SER-SIT game is completely mixed, so is (E, F).

Lemma 5

Let \(\varGamma _\beta \) be a two-person finite discounted nonzero-sum SER-SIT game where the reward functions of the players are \(r_k (s, i, j) = c_k (s) + a_k (i, j)\), \(k = 1, 2\), and the transition probabilities are \(q (s' | s, i, j) = q (s' | i, j)\), for all \(i \in {A_1}\), \(j \in {A_2}\), \(s, s' \in S\). Let (E, F) be the \(m_1 \times m_2\) bimatrix game with entries \( (a_1 (i,j) + \beta \sum _{s'}c_1 (s')q (s' | i,j)\), \(a_2 (i,j) + \beta \sum _{s'}c_2 (s')q (s' | i,j))\), \(i = 1, \ldots , m_1\), \(j = 1, \ldots , m_2\), where \(m_1 = | A_1 | \) and \(m_2 = | A_2 | \). Then, (E, F) has a completely mixed equilibrium if and only if \(\varGamma _\beta \) has a completely mixed equilibrium. In fact, if \(\varGamma _\beta \) is a completely mixed game, so is (E, F).

Proof

Let \( (x^*, y^*)\) be a completely mixed equilibrium point of the bimatrix game (E, F). Define \(f^* (s) \equiv x^*\) and \(g^* (s) \equiv y^*\) for all \(s \in S\). By Parthasarathy et al. (1984, Theorem 4.1), \( (f^*, g^*)\) is an equilibrium pair for the discounted SER-SIT game \(\varGamma _\beta \), and by construction, it is completely mixed.

Conversely, let \( (f^*, g^*)\) be a completely mixed equilibrium pair for the discounted SER-SIT game \(\varGamma _\beta \). Fix any state \(s' \in S\). Then \( (f^o, g^o)\), defined by \(f^o (s) \equiv f^* (s')\) and \(g^o (s) \equiv g^* (s')\) for all \(s \in S\), is a state-independent completely mixed equilibrium pair for \(\varGamma _\beta \), and it is easy to see that \( (f^* (s'), g^* (s'))\) is a completely mixed equilibrium of (E, F) too.

By Parthasarathy et al. (1984, Theorem 4.1), from any equilibrium of (E, F) we can construct an equilibrium of \(\varGamma _\beta \). Therefore, if (E, F) has an equilibrium that is not completely mixed, so does \(\varGamma _\beta \). In other words, if \(\varGamma _\beta \) is a completely mixed game, (E, F) is a completely mixed game too. \(\square \)
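The reduction in Lemma 5 can be sketched on a toy SER-SIT instance. All the data below (a matching-pennies action part \(a_k\), state parts \(c_k\), and uniform action-dependent transitions) is our own choice for illustration:

```python
from fractions import Fraction as F

beta = F(1, 2)
states, acts = ["s1", "s2"], [0, 1]
a1 = [[1, 0], [0, 1]]            # action part of player 1's reward (toy choice)
a2 = [[0, 1], [1, 0]]            # action part of player 2's reward
c1 = {"s1": F(0), "s2": F(1)}    # state parts of the rewards
c2 = {"s1": F(1), "s2": F(0)}
# SER-SIT: transitions depend on the actions only; here they are uniform.
q = {(i, j): {"s1": F(1, 2), "s2": F(1, 2)} for i in acts for j in acts}

# The bimatrix game (E, F) of Lemma 5.
E = [[a1[i][j] + beta * sum(c1[s] * q[i, j][s] for s in states)
      for j in acts] for i in acts]
Fm = [[a2[i][j] + beta * sum(c2[s] * q[i, j][s] for s in states)
       for j in acts] for i in acts]

# (x, y) is a completely mixed equilibrium of (E, F): each strategy makes
# the other player indifferent between both of their actions.
x = y = (F(1, 2), F(1, 2))
p1 = [sum(E[i][j] * y[j] for j in acts) for i in acts]
p2 = [sum(x[i] * Fm[i][j] for i in acts) for j in acts]
assert p1[0] == p1[1] and p2[0] == p2[1]

# Lemma 5 then lifts (x, y) to the state-independent completely mixed
# equilibrium f*(s) = x, g*(s) = y of the SER-SIT game itself.
f_star, g_star = {s: x for s in states}, {s: y for s in states}
```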

Remark 3

Using Parthasarathy et al. (1984, Theorem 4.2), Lemma 5 can be proved for two-person finite undiscounted nonzero-sum SER-SIT games as well.

Corollary 6

If the \(m_1 \times m_2\) matrices \( (a_k (i, j))\), \( (k=1, 2)\), and the transition probability matrices are symmetric, then the corresponding SER-SIT game has a symmetric equilibrium pair in the discounted as well as undiscounted case. \(\square \)

We state two relevant results on conditions under which matrix and bimatrix games are completely mixed. Kaplansky (1945) showed that the zero-sum game with payoff matrix \(A_{m \times n}\) is completely mixed if and only if A is a square matrix \( (m = n)\) of rank \(n - 1\) all of whose cofactors are nonzero and of the same sign. Sujatha et al. (2014) showed that a two-person zero-sum game with a skew-symmetric payoff matrix of even order can never be completely mixed. This follows directly from Kaplansky's characterization, since the rank of a skew-symmetric matrix is always even and hence cannot equal \(n - 1\) when n is even.

Theorem 8 extends Theorem 7 to undiscounted stochastic games.

Theorem 7

(Theorem 3 of Sujatha et al. 2014) Consider a bimatrix game (A, B) with odd ordered skew-symmetric payoff matrices. Let \(\varepsilon \) be the set of all equilibrium points. For every \( (x, y) \in \varepsilon \), let there exist \(v_1\) and \(v_2\) such that \(Ay = v_1e\) and \(x^tB = v_2e^t\). Then the game is completely mixed if and only if the principal Pfaffians of both payoff matrices are all nonzero and alternate in sign.

For the sake of completeness, the proof of Theorem 7 is provided in the Appendix.

Theorem 8

Consider a finite undiscounted zero-sum stochastic game \(\varGamma \) with skew symmetric payoff matrices that are odd ordered. Suppose R(s) is completely mixed for all \(s \in S\). Then \(\varGamma \) has value 0 and has a completely mixed optimal strategy.

Proof

As all payoff matrices are skew symmetric, the value of the stochastic game is 0. Let \( (f^o (s), g^o (s))\) be the completely mixed optimal strategy pair for R(s), for each \(s \in S\). Then \( (f^o, g^o)\) is a completely mixed stationary optimal strategy pair for the discounted stochastic game with the same payoff matrices and transition probabilities as \(\varGamma \). As this holds for every value of \(\beta \) (in particular, for \(\beta \) near 1), \( (f^o, g^o)\) is a completely mixed stationary optimal strategy pair for the undiscounted stochastic game too. \(\square \)

It is an open question as to whether or not the undiscounted stochastic game is completely mixed for the conditions listed in Theorem 8.

3.2 Undiscounted stochastic games with stationary optimal strategies

Sujatha et al. (2014) showed that finite discounted zero-sum stochastic games have symmetric optimal stationary strategies under certain symmetry conditions (Theorem 9 below). We extend this result to a larger class of finite discounted zero-sum stochastic games and to finite undiscounted zero-sum stochastic games. The results provide sufficient conditions for finite undiscounted zero-sum stochastic games to have stationary optimal strategies; in other words, they identify a new class of finite undiscounted zero-sum stochastic games that have stationary optimal strategies. Theorem 10 extends Theorem 9 to undiscounted stochastic games.

Theorem 9

(Theorem 5 of Sujatha et al. 2014) Consider a finite discounted zero-sum stochastic game where R(s) is skew-symmetric for all \(s \in S\). Then the value of the stochastic game is 0 and the stochastic game has symmetric optimal stationary strategies independent of the discount factor and the transition probabilities.

For the sake of completeness, we give the proof for Theorem 9 in the Appendix.

Theorem 10

Consider a finite undiscounted zero-sum stochastic game where R(s) is skew-symmetric for all \(s \in S\). Then the value of the stochastic game is 0 and the stochastic game has symmetric optimal stationary strategies independent of the transition probabilities.

Proof

From Theorem 9, a finite discounted zero-sum stochastic game with skew symmetric payoff matrices has a symmetric optimal stationary strategy pair \( (f^o, f^o)\) that is independent of the discount factor and the transition probabilities. Hence, for every starting state s,

$$\begin{aligned} (1 - \beta )\, I_\beta (f^o, g) (s)&\ge 0, \quad \text{ for } \text{ all } \; g \text{ and } \text{ all } \; \beta ,\\ \text{ so } \text{ that } \quad \varPhi (f^o, g) (s)&\ge 0 \; \text{ as } \beta \rightarrow 1 \text{ (by } \text{ the } \text{ Tauberian } \text{ theorem). } \end{aligned}$$

Hence, \(f^o\) is optimal for player-1 in the undiscounted stochastic game too. Similarly, \(f^o\) is optimal for player-2 as

$$\begin{aligned} \varPhi (f, f^o) (s) \le 0 , \quad \text{ for } \text{ all } \; f. \end{aligned}$$

Hence, the undiscounted zero-sum stochastic game has a symmetric optimal stationary strategy pair, namely \( (f^o, f^o)\), that does not depend on the transition probabilities. \(\square \)
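Theorem 10 can be illustrated on a small instance. The check below (the two skew-symmetric matrices and their equalizing strategies are our own choices) verifies that in each state a symmetric strategy makes \(x^t R(s)\) identically zero, so either player guarantees 0; note that the transition probabilities never enter the computation:

```python
from fractions import Fraction as F

# Two states, each with a skew-symmetric payoff matrix (R(s)^t = -R(s)).
R = {
    "s1": [[0, 1, -1], [-1, 0, 1], [1, -1, 0]],   # rock-paper-scissors
    "s2": [[0, 2, -3], [-2, 0, 4], [3, -4, 0]],
}
# Symmetric optimal strategies for the individual matrix games (equalizers).
f = {"s1": (F(1, 3), F(1, 3), F(1, 3)), "s2": (F(4, 9), F(1, 3), F(2, 9))}

for s, A in R.items():
    assert all(A[i][j] == -A[j][i] for i in range(3) for j in range(3))
    x = f[s]
    # x^t R(s) = 0: playing x guarantees at least 0 to either player, so the
    # value of each state's matrix game, and hence of the stochastic game, is 0.
    assert [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)] == [0, 0, 0]
```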

Remark 4

Let \( f^o\) be a symmetric optimal stationary strategy for a finite undiscounted zero-sum stochastic game with skew symmetric payoff matrices. Then, whether or not \((f^o (s), f^o (s))\) is optimal for R(s) for all \(s \in S\), is an open question.

The following theorem is a natural extension of Theorem 9 to a discounted stochastic game in which \(N-1\) states have skew symmetric payoff matrices and the probability of transitioning into the remaining state (the one with an arbitrary payoff matrix) is the same constant from every state and every action pair.

Theorem 11

Consider a finite discounted zero-sum stochastic game. (Recall that \(S = \{s_1, s_2, \ldots , s_N\}\).) Without loss of generality, let \(R (s_1), R (s_2), \ldots , R (s_{N-1})\) be skew symmetric and let \(R (s_N)\) be an arbitrary matrix. Further, let \(q (s_N | s, i, j) = c\), a constant with \(0 \le c \le 1\), for all \(s \in S\) and all action pairs (i, j). Then the stationary strategy pair composed of optimal strategies of the matrix games R(s) is optimal for the stochastic game.

Proof

For \(s \ne s_N\), the value of the matrix game R(s) is 0 since R(s) is skew-symmetric. Let \(v_N\) be the minmax value of the matrix game \(R (s_N)\). Let \(f^o (s)\) be an optimal strategy for player-1 in the matrix game R(s) for each \(s \in S\).

Now, for the stochastic game starting at state \(s \ne s_N\), if player-1 plays \(f^o (s)\) for all \(s \in S\), then for each stationary strategy g of player-2, the payoff to player-1 is given by

$$\begin{aligned} I_\beta (f^o, g) (s) =&\; r (s, f^o (s), g (s)) + \beta \sum \limits _{s' \in S} r (s', f^o (s'), g (s'))\, q (s' | s, f^o (s), g (s)) \nonumber \\&+ \beta ^2 \sum \limits _{s'' \in S} r (s'', f^o (s''), g (s''))\, q^2 (s'' | s, f^o (s), g (s)) + \dots \nonumber \\ =&\; r (s, f^o (s), g (s)) + \beta \sum \limits _{s' \ne s_N} r (s', f^o (s'), g (s'))\, q (s' | s, f^o (s), g (s)) \nonumber \\&+ \beta \, r (s_N, f^o (s_N), g (s_N))\, q (s_N | s, f^o (s), g (s)) \nonumber \\&+ \beta ^2 \sum \limits _{s'' \ne s_N} r (s'', f^o (s''), g (s''))\, q^2 (s'' | s, f^o (s), g (s)) \nonumber \\&+ \beta ^2\, r (s_N, f^o (s_N), g (s_N))\, q^2 (s_N | s, f^o (s), g (s)) + \dots \end{aligned}$$
(2)

Let the transition probability matrix be denoted by \(Q = \begin{pmatrix} q_{11}&\dots &q_{1, N-1}&c\\ \vdots & &\vdots &\vdots \\ q_{N1}&\dots &q_{N, N-1}&c \end{pmatrix}.\)

Then the (i, N)th entry of \(Q^2\) is \(q_{i1}c + \dots + q_{i, N-1}c + c^2 = c (q_{i1} + \dots + q_{i, N-1}+ c) = c\), since each row of Q sums to 1. Hence, \(q^2 (s_N | s, f^o, g) = c\) for all \(s \in S\), and similarly for all higher powers of Q. Also, R(s) is skew symmetric for all \(s \ne s_N\), so the value of each such matrix game is 0. Using these facts in Eq. 2, we have

$$\begin{aligned} I_\beta (f^o, g) (s) \ge \beta c\, v_N + \beta ^2 c\, v_N + \dots = c\, \frac{\beta }{1 - \beta }\, v_N , \quad \text{ for } \text{ all } g \end{aligned}$$
(3)

If the stochastic game started at state \(s_N\) instead, then the payoff to player-1 is

$$\begin{aligned} I_\beta (f^o, g) (s_N) =&\; r (s_N, f^o (s_N), g (s_N)) \nonumber \\&+ \beta \sum \limits _{s' \in S} r (s', f^o (s'), g (s'))\, q (s' | s_N, f^o (s_N), g (s_N)) \nonumber \\&+ \beta ^2 \sum \limits _{s'' \in S} r (s'', f^o (s''), g (s''))\, q^2 (s'' | s_N, f^o (s_N), g (s_N)) + \dots \nonumber \\ \ge&\; v_N + c\, \frac{\beta }{1 - \beta }\, v_N = \frac{1 + \beta (c - 1)}{1 - \beta }\, v_N , \quad \text{ for } \text{ all } g \end{aligned}$$
(4)

Similarly, let \(g^o (s)\) be the optimal strategy for player-2 (the minimizer) for R(s) for each \(s \in S\). We can show that \(I_\beta (f, g^o) (s) \le c \frac{\beta }{1 - \beta } v_N\), for all f, when the stochastic game starts in state \(s \ne s_N\), and \(I_\beta (f, g^o) (s_N) \le \frac{1 + \beta (c - 1)}{1 - \beta } v_N\), for all f, when the stochastic game starts in state \(s_N\). Hence, \( (f^o, g^o)\) is an optimal stationary strategy pair for the stochastic game. \(\square \)
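The key computation in the proof, that the probability of being in \(s_N\) equals c after every number of steps, uses only the fact that the last column of Q is constant and each row sums to 1. A quick numerical check (the matrix entries are an arbitrary choice of ours):

```python
from fractions import Fraction

# A 3-state stochastic matrix whose last column is the constant c = 1/4.
c = Fraction(1, 4)
Q = [[Fraction(1, 2), Fraction(1, 4), c],
     [Fraction(1, 4), Fraction(1, 2), c],
     [Fraction(3, 8), Fraction(3, 8), c]]
assert all(sum(row) == 1 for row in Q)

Q2 = [[sum(Q[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
# (i, N)th entry of Q^2: sum_k q_ik * c = c * (row sum of Q) = c, for every i.
assert all(Q2[i][2] == c for i in range(3))
```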

Remark 5

We can extend Theorem 11 to the case where the payoff matrices of all except m states \( (m < N)\) are skew symmetric and the probability of transitioning into each of these m states is a constant (possibly a different constant for each such state).

Remark 6

Theorem 11 and Remark 5 can be extended to finite undiscounted zero-sum stochastic games too. Note, in particular, that this class of undiscounted stochastic games has stationary optimal strategies, and that it contains the class of undiscounted stochastic games provided by Theorem 10.

3.3 Symmetric equilibrium in stochastic games

Symmetric equilibrium in bimatrix games has relevance especially in single population evolutionary games (Hofbauer and Sigmund 2003). Gale (1960) showed that if A is a skew-symmetric matrix, then there exists a symmetric optimal strategy for the matrix game A. Nash (1951) showed that given any square matrix A, there exists a symmetric equilibrium for the bimatrix game \( (A, A^t)\). Hofbauer and Sigmund (2003) also provide an alternative proof to this result by Nash (1951). Flesch et al. (2013) have shown the existence of symmetric stationary equilibria in symmetric irreducible discounted stochastic games. However, characterizing the class of symmetric stochastic games with symmetric stationary strategies remains open.

In this subsection, we provide sufficient conditions for two-person finite discounted as well as undiscounted, zero- as well as nonzero-sum stochastic games to have symmetric optimal/equilibrium strategies. We also provide a sufficient condition for the existence of pure strategy symmetric equilibria in two-person finite discounted as well as undiscounted nonzero-sum stochastic games. We begin with Theorem 12, which extends the result of Nash (1951) to stochastic games.

Theorem 12

Consider a two-person finite nonzero-sum discounted stochastic game \(\varGamma _\beta \). Suppose \(r_1 (s, i, j)\) \(= r_2 (s, j, i)\) and \(q (s' | s, i, j) = q (s' | s, j, i)\), for all \(s, s' \in S, i, j \in A_1 (= A_2)\). Then, \(\varGamma _\beta \) has a symmetric equilibrium.

Proof

Since all transition probabilities are symmetric and \(R_2(s) = R_1(s)^t\) for all \(s \in S\), it follows that \(I_\beta ^{ (1)} (f, g) (s)\) = \(I_\beta ^{ (2)} (g, f) (s)\). Along the lines of the alternative proof given by Hofbauer and Sigmund (2003) of Nash's (1951) result, it then follows from Kakutani's fixed point theorem (Kakutani 1941) that \(\varGamma _\beta \) has a symmetric equilibrium. \(\square \)
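In the single-state case, Theorem 12 reduces to Nash's (1951) result for the bimatrix game \( (A, A^t)\). A minimal sketch (hawk-dove style payoffs of our own choosing) verifies the indifference condition behind a symmetric mixed equilibrium:

```python
from fractions import Fraction as F

A = [[0, 3], [1, 2]]   # player 1's payoff matrix; player 2's matrix is A^t

# Candidate symmetric equilibrium (x, x): x makes the opponent indifferent
# between both actions, so x is a best response to itself (and by symmetry
# the same holds for the column player).
x = (F(1, 2), F(1, 2))
row_payoffs = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
assert row_payoffs[0] == row_payoffs[1] == F(3, 2)
```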

Remark 7

Theorem 12 holds for two-person finite zero-sum discounted stochastic games too, as well as for two-person finite undiscounted, zero- and nonzero-sum SER-SIT and AIT games. In SER-SIT games, for example, the individual matrix games can be converted, via Gale's (1960) symmetrization technique, into equivalent games with skew-symmetric payoff matrices.

The following result follows from the results of Duersch et al. (2012).

Corollary 13

Consider a two-person finite nonzero-sum discounted stochastic game \(\varGamma _\beta \). For each \(s \in S\), let the bimatrix game restricted to state s have a pure strategy symmetric equilibrium. Further, suppose all payoff matrices and transition probability matrices are symmetric. Then, \(\varGamma _\beta \) has a symmetric pure strategy equilibrium.
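For a single state, the pure strategy symmetric equilibria assumed in Corollary 13 are easy to enumerate: for a symmetric A (so that player 2's matrix \(A^t = A\)), the diagonal action pair (i, i) is a pure equilibrium if and only if no unilateral deviation pays, that is, \(a_{ii} \ge a_{ji}\) for every j. A sketch (the matrix below is a toy instance of ours):

```python
def pure_symmetric_equilibria(A):
    """Indices i such that (i, i) is a pure equilibrium of the bimatrix game
    (A, A^t) with A symmetric: a_ii >= a_ji for every row j."""
    n = len(A)
    return [i for i in range(n) if all(A[i][i] >= A[j][i] for j in range(n))]

A = [[3, 0, 2], [0, 2, 1], [2, 1, 1]]   # symmetric payoff matrix
print(pure_symmetric_equilibria(A))     # [0, 1]
```

Here coordinating on action 0 or on action 1 is stable, while (2, 2) is not, since deviating to action 0 against action 2 pays 2 > 1.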

Corollary 13 holds for two-person finite nonzero-sum undiscounted stochastic games too. Without the symmetry conditions, it is not known whether two-person finite nonzero-sum undiscounted stochastic games have symmetric equilibria; only partial results are available (Flesch et al. 2013).

4 Conclusion and future work

In this paper, we showed some necessary conditions and some sufficient conditions for discounted as well as undiscounted stochastic games to have a completely mixed optimal strategy (in the zero-sum case) or a completely mixed equilibrium (in the nonzero-sum case). We gave sufficient conditions for discounted as well as undiscounted stochastic games to have symmetric equilibria. We also showed that an undiscounted stochastic game with skew-symmetric payoff matrices has optimal stationary strategies, and that these strategies are in fact symmetric. We extended this result to discounted and undiscounted stochastic games in which finitely many states carry arbitrary payoff matrices, the transition probability into each such state is constant, and all other states have skew-symmetric payoff matrices. We have looked at certain classes of stochastic games here; extending the results to classes such as additive reward–additive transition (ARAT) games, as well as to some classes of stochastic games with uncountable state spaces, is a direction for future work.