
12.1 Introduction

We present and subsequently analyze a stochastic game in which transition probabilities at any point in time depend on the history of the play, i.e., players’ past action choices, their current choices, and the current state. This development has been inspired by the ambition to incorporate certain empirical phenomena into Small Fish Wars [37]. Here, agents possess the fishing rights on a body of water, and the resource can be in either of two states, High or Low. In the former, the fish are more abundant and therefore catches are larger than in the latter. The agents have two options, to fish with or without restraint. Fishing with restraint by both agents is assumed to be sustainable in the long run, as the resource is then assumed to be able to recover; unrestrained fishing by both yields higher immediate catches, but damages the resource significantly if continued for prolonged periods of time. This damage becomes apparent in the dynamics of the system as an increase in the probabilities that the system moves from High to Low and a simultaneous decrease in the probabilities that it moves from Low to High. This causes the system, and hence the play, to spend a higher proportion of time in Low.

We additionally aim to incorporate hysteresis effects, called poaching pits in the field of management of replenishable resources (e.g., Bulte [11], Courchamp et al. [13], Hall et al. [24]). Hysteresis may be caused by biological phenomena induced by the (nature of the) exploitation of the resource. For instance, full-grown cod spawn a considerably higher number of eggs than younger specimens: Oosthuizen and Daan [57] and Armstrong et al. [3] find linear fecundity-weight relations, whereas Rose et al. [63] report exponential fecundity-weight relations. As mature cod are targeted by modern catching techniques such as gill netting, overfishing hurts mainly the cohorts most productive in providing offspring. To regain full reproductive capacity, younger cohorts must reach ages well beyond adulthood. Hence, it may take cod a long while to escape a poaching pit after a recovery plan or program to replenish the stock has been put into effect.

To achieve our goals we engineered a stochastic game as follows. Nature (chance) may move the play from one state to the other depending on the current action choices of the agents, but also on their past catching behavior. To achieve the modeling aims formulated above, we introduce endogenously changing stochastic variation: the evolution of the transition probabilities reflects that the more frequently the agents exploit the resource without restraint, the more it deteriorates. Here, the probability of moving to High may decrease over time in each state and for each action combination if the agents show a prolonged lack of restraint, i.e., overfish frequently.

Transition probabilities from Low to High may become zero, resulting in Low becoming a temporarily absorbing state. If the agents keep overexploiting the resource, this situation does not change in our model. Even if the agents revert to restraint in order to bring about the recovery of the resource, it may take a long time before High becomes accessible again. Thus, we endeavor to reproduce effects similar to the ones associated with hysteresis.

The agents are assumed to wish to maximize their long-term average catches. We adopt a Folk Theorem type analysis as in Joosten et al. [42], and validate relevant procedures in this new setting. First, we show how to establish the rewards for any pair of jointly convergent pure strategies. Then, we determine the set of jointly convergent pure-strategy rewards. A more complex issue is then to find for each player the threat point reward, i.e., the highest amount this player can guarantee himself if his opponent tries to minimize his rewards. Finally, we obtain a large set of rewards which can be supported by equilibria using threats, namely all jointly-convergent pure-strategy rewards giving each player more than the threat point reward.

In the model analyzed throughout the chapter for expository purposes, we gain insights relevant to the management of the resource. Our findings reveal a potential for compromise between ecological and economic maximalistic goals, thus overcoming the one-sidedness of management policies for natural resources as noted by, e.g., Holden [33] and Brooks et al. [10], and in turn improving their chances of success (cf., e.g., BenDor et al. [5], Sanchirico et al. [64]). Full restraint, an ecological maximalistic goal, yields total rewards which are considerably higher than never-restraint rewards. Yet, a possible economic maximalistic goal, i.e., Pareto-efficient equilibrium rewards resulting from jointly convergent pure strategies with threats, yields a sizeable increase of total rewards even over full restraint. We find that the proportion of time spent in such a poaching pit goes to zero in the long run under equilibrium behavior. A whole range of models should be analyzed to obtain general findings providing insights into the full range of fishery management games.

In the next section, we introduce our model with endogenous transition probabilities. In Sect. 12.3, we focus on strategies and on restrictions desirable in, or resulting from, the model. Section 12.4 treats rewards in a very general sense, and equilibrium rewards more specifically. Some attention is also paid to the complexity of computing threat point rewards. Section 12.5 concludes.

12.2 Endogenous Transition Probabilities

A Small Fish War is played by row player A and column player B at discrete moments in time called stages. Each player has two actions and at each stage \(t\in \mathbb {N}\) the players independently and simultaneously choose an action. Action 1 for either player denotes the action for which some restriction exists allowing the resource to recover, e.g., catching with wide-mazed nets or catching a low quantity. Action 2 denotes the action with little restraint.

We assume catches to vary due to random shocks, which we model by means of a stochastic game with two states at every stage of the play. First, let us capture the past play until stage t, \(t>1\), by the following two matrices:

$$\begin{aligned} QH^{t}=\left[ \begin{array}{cc} q_{1}^{t} &{} q_{2}^{t} \\ q_{3}^{t} &{} q_{4}^{t} \end{array} \right] ,\text { and }QL^{t}=\left[ \begin{array}{cc} q_{5}^{t} &{} q_{6}^{t} \\ q_{7}^{t} &{} q_{8}^{t} \end{array} \right] . \end{aligned}$$

Here, e.g., \(q_{1}^{t}\) is the relative frequency with which action pair top-left in High has occurred until stage t, and \(q_{7}^{t}\) is the relative frequency of action pair bottom-left in Low having occurred during past play. So, we must have \(q^{t}=\left( q_{1}^{t},\ldots ,q_{8}^{t}\right) \in \varDelta ^{7}=\{x\in \mathbb {R} ^{8}|x_{i}\ge 0\) for all \(i=1,\ldots ,8\) and \(\sum _{j=1}^{8}x_{j}=1\}\). We refer to such a vector as the relative frequency vector.

Let the interaction at stage t of the play be represented by the following:

$$\begin{aligned} H^{t}= & {} H\left( q^{t}\right) =\left[ \begin{array}{cc} \theta _{1},p_{1}\left( q^{t}\right) &{} \theta _{2},p_{2}\left( q^{t}\right) \\ \theta _{3},p_{3}\left( q^{t}\right) &{} \theta _{4},p_{4}\left( q^{t}\right) \end{array} \right] , \\ L^{t}= & {} L\left( q^{t}\right) =\left[ \begin{array}{cc} \theta _{5},p_{5}\left( q^{t}\right) &{} \theta _{6},p_{6}\left( q^{t}\right) \\ \theta _{7},p_{7}\left( q^{t}\right) &{} \theta _{8},p_{8}\left( q^{t}\right) \end{array} \right] . \end{aligned}$$

Here \(H^{t}\left( q^{t}\right) \) (\(L^{t}\left( q^{t}\right) \)) indicates state High (Low) at stage t of the play if the play until then resulted in relative frequency vector \(q^{t}\). Each entry of the two matrices contains an ordered pair \(\theta _{i}=\left( \theta _{i}^{A},\theta _{i}^{B}\right) \) denoting the payoffs to the players if the corresponding action pair is chosen, together with the probability \(p_{i}\left( q^{t}\right) \) that the system moves to High at stage \(t+1\) (and to Low with the complementary probability). All functions \( p_{i}:\varDelta ^{7}\rightarrow [0,1]\) are assumed continuous. We now give an example.

Example 1

In this Small Fish War we assume that in both states Action 1, i.e., catching with restraint, is dominated by the alternative. Let, for given relative frequency vector \(q^{t}\in \varDelta ^{7} \), the transition functions \(p_{i}:\varDelta ^{7}\rightarrow [0,1],\) \(i=1,\ldots ,8,\) governing the transition probabilities, be given by

$$\begin{aligned} \begin{array}{l} p_{1}(q^{t})=\left[ \frac{8}{10}-\frac{11}{24}q_{4}^{t}-\frac{11}{12} q_{8}^{t}\right] _{+} \\ p_{2}(q^{t})=p_{3}(q^{t})=\left[ \frac{6}{10}-\frac{11}{20}q_{4}^{t}-\frac{11 }{10}q_{8}^{t}\right] _{+} \\ p_{4}(q^{t})=\left[ \frac{3}{10}-\frac{11}{16}q_{4}^{t}-\frac{11}{8}q_{8}^{t} \right] _{+} \\ p_{5}(q^{t})=\left[ \frac{6}{10}-\frac{11}{12}q_{4}^{t}-\frac{11}{6}q_{8}^{t} \right] _{+} \\ p_{6}(q^{t})=p_{7}(q^{t})=\left[ \frac{4}{10}-\frac{11}{8}q_{4}^{t}-\frac{11 }{4}q_{8}^{t}\right] _{+} \\ p_{8}(q^{t})=\left[ \frac{1}{10}-\frac{11}{4}q_{4}^{t}-\frac{11}{2}q_{8}^{t} \right] _{+}. \end{array} \end{aligned}$$

Here, \(\left[ x\right] _{+}\) is shorthand for \(\max \{x,0\}\). These equations capture the following deliberations. Two-sided full restraint is assumed to cause no more damage to the resource, in either state, than exactly one player catching with restraint. Hence, the probability that play is in High during the next stage is at least as large in the first case as in the second. We also assume symmetry, hence \(p_{2}\left( q^{t}\right) =p_{3}\left( q^{t}\right) \) and \(p_{6}\left( q^{t}\right) =p_{7}\left( q^{t}\right) .\) Furthermore, we assume that exactly one player catching without restraint is not more harmful to the resource than two players catching without restraint. The inequalities \(p_{i}\left( q^{t}\right) \ge p_{i+4}\left( q^{t}\right) \) for \(i=1,\ldots ,4,\) are assumed to hold because in Low the system is assumed to be at least as vulnerable to overfishing as in High. We refer to e.g., Kelly et al. [46] for an empirical underpinning of these modeling choices.
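To make these transition rules concrete, the following minimal Python sketch (our own illustration, not code from the chapter) evaluates \(p_{1},\ldots ,p_{8}\) of Example 1 for a given relative frequency vector; the zero-based indexing of q is an assumption on our part, with q[3] and q[7] playing the roles of \(q_{4}^{t}\) and \(q_{8}^{t}\).

```python
# A minimal sketch (not the authors' code) of the transition functions of
# Example 1.  q is the relative frequency vector (q1,...,q8); only q[3] and
# q[7], i.e., q4 (unrestrained fishing by both agents in High) and q8 (the
# same in Low), enter the formulas.

def clamp(x):
    """[x]_+ = max(x, 0)."""
    return max(x, 0.0)

def transition_probs(q):
    """Return (p1,...,p8): the probabilities of being in High next stage."""
    q4, q8 = q[3], q[7]
    p1 = clamp(8/10 - 11/24 * q4 - 11/12 * q8)
    p2 = p3 = clamp(6/10 - 11/20 * q4 - 11/10 * q8)
    p4 = clamp(3/10 - 11/16 * q4 - 11/8 * q8)
    p5 = clamp(6/10 - 11/12 * q4 - 11/6 * q8)
    p6 = p7 = clamp(4/10 - 11/8 * q4 - 11/4 * q8)
    p8 = clamp(1/10 - 11/4 * q4 - 11/2 * q8)
    return (p1, p2, p3, p4, p5, p6, p7, p8)

# With no history of unrestrained fishing (q4 = q8 = 0) this returns the
# baseline probabilities (0.8, 0.6, 0.6, 0.3, 0.6, 0.4, 0.4, 0.1).
print(transition_probs((1.0, 0, 0, 0, 0, 0, 0, 0)))
```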

Now, we show that renewable resources may recuperate slowly after a program of recovery has been taken up. Suppose both agents play Action 1 twice, followed by Action 2, for a sufficiently long period of time until stage \( t^{*}\). Clearly, \(q_{4}^{t^{*}}+q_{8}^{t^{*}}=\frac{t^{*}-2}{ t^{*}}.\) Now, for \(t^{*}\) sufficiently large, \(p_{5}\left( q^{t^{*}}\right) =p_{6}\left( q^{t^{*}}\right) =p_{7}\left( q^{t^{*}}\right) =p_{8}\left( q^{t^{*}}\right) =0,\) because

$$\begin{aligned} \begin{array}{l} \frac{6}{10}-\frac{11}{12}q_{4}^{t^{*}}-\frac{11}{6}q_{8}^{t^{*}}= \frac{6}{10}-\frac{11}{12}\left( \frac{t^{*}-2}{t^{*}} -q_{8}^{t^{*}}\right) -\frac{11}{6}q_{8}^{t^{*}}= \\ \frac{6}{10}-\frac{11}{12}\left( 1-\frac{2}{t^{*}}-q_{8}^{t^{*}}\right) -\frac{11}{6}q_{8}^{t^{*}}=-\frac{19}{60}+\frac{11}{6t^{*}} -\frac{11}{12}q_{8}^{t^{*}}<0. \end{array} \end{aligned}$$

Then, \(p_{5}\left( q^{t^{*}}\right) =0\) and, by the relation to the other transition probability functions, \(p_{6}\left( q^{t^{*}}\right) =p_{7}\left( q^{t^{*}}\right) =p_{8}\left( q^{t^{*}}\right) =0\) as well. Take, for instance, \(t^{*}=16\); clearly

$$\begin{aligned} \begin{array}{c} -\frac{19}{60}+\frac{11}{6t^{*}}-\frac{11}{12}q_{8}^{t^{*}}<-\frac{19 }{60}+\frac{11}{6t^{*}}<0. \end{array} \end{aligned}$$

If both agents switch to playing sequences of \((1,1,1,\ldots )\) from then on, it will take a while before \(p_{5}\left( q^{t}\right) \) becomes positive again. Since

$$\begin{aligned} \begin{array}{l} \frac{6}{10}-\frac{11}{12}q_{4}^{t^{*}+k}-\frac{11}{6}q_{8}^{t^{*}+k}=\frac{6}{10}-\frac{11}{12}\frac{t^{*}-2}{t^{*}+k}-\frac{11}{12} q_{8}^{t^{*}+k}= \\ \frac{6}{10}-\frac{11}{12}\left( 1-\frac{k+2}{t^{*}+k}\right) -\frac{11}{ 12}q_{8}^{t^{*}+k}=-\frac{19}{60}+\frac{11}{12}\frac{k+2}{t^{*}+k}- \frac{11}{12}q_{8}^{t^{*}+k} \\ <-\frac{19}{60}+\frac{11}{12}\frac{k+2}{t^{*}+k}, \end{array} \end{aligned}$$

the first expression cannot be positive for \(k<\frac{19t^{*}-110}{36}.\) So, for \(t^{*}=16\) it takes at least six stages for the play to be able to return to High.
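The recovery delay can be checked numerically. The following sketch (our own companion to the bound derived above, not code from the chapter) searches for the first stage k of renewed restraint at which the upper bound \(-\frac{19}{60}+\frac{11}{12}\frac{k+2}{t^{*}+k}\) on the argument of \(p_{5}\) can become positive; before that stage \(p_{5}\) certainly remains zero and Low stays absorbing.

```python
# Our own sketch: smallest k at which the upper bound on the argument of p5
# turns positive after t* stages ending in prolonged unrestrained fishing.

def first_possible_recovery(t_star):
    """Smallest k for which -19/60 + (11/12)(k+2)/(t*+k) becomes positive."""
    k = 0
    while -19/60 + (11/12) * (k + 2) / (t_star + k) <= 0:
        k += 1
    return k

print(first_possible_recovery(16))   # 6, i.e., at least six stages, as above
```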

12.3 Strategies and Restrictions

A strategy is a game plan for the entire infinite time horizon; allowing it to depend on arbitrary conditions makes an extensive analysis of infinitely repeated games all but impossible. Most restrictions in the literature put requirements on what aspects the strategies are conditioned upon. For instance, a history-dependent strategy prescribes a possibly mixed action to be played at each stage conditional on the current stage and state, as well as on the full history until then, i.e., all states visited and all action combinations realized before.

Less general strategies are, for instance, action-independent ones, which condition on all states visited before but not on the action combinations chosen [31]. Markov strategies condition on the current state and the current stage, and stationary strategies condition only on the present state (cf., e.g., Filar and Vrieze [20], Flesch [21]).

The challenge in the present framework is to find restrictions on strategies which are helpful in the analysis. Although Markov and stationary strategies have proven their value in the analysis of finite state stochastic games with fixed transition probabilities, it is quite unclear what their contribution can be in the present framework.

Essentially, (at least) two points of view can be adopted to analyze the present framework. The one we favor is the one in which High and Low are seen as the states, with the transitions between these states being a function of the history of the play as captured by the relative frequency vector \(q^{t}.\) Stationary strategies are easily formulated here, but they are probably much too simple for analytical purposes, as some link with \(q^{t}\) seems necessary for them to be useful. An alternative is to define the states according to the relative frequency vector, in which case there exist infinitely many states \(H(q^{t})\) and \(L(q^{t}).\) Here, the practical problem is the enormity of the task of defining infinitely many stationary or Markov strategies.

Let \(\mathcal {X}^{k}\) denote the set of history-dependent strategies of player \(k=A,B.\) A strategy is pure if at each stage a pure action is chosen, i.e., an action is chosen with probability 1. The set of pure strategies for player k is \(\mathcal {P}^{k}\), and \( \mathcal {P\equiv P}^{A}\times \mathcal {P}^{B}.\) Let us now define, a bit more formally, the following notions introduced before in a rather informal manner. For \(j=1,2,\) \(t>1\)

$$\begin{aligned} \begin{array}{l} q_{j}^{t}\equiv \frac{\#\left\{ \left( j_{u}^{A,H},j_{u}^{B,H}\right) |\text { }j_{u}^{A,H}=1,\text { }j_{u}^{B,H}=j,\text { }1\le u<t\right\} }{t-1}, \\ q_{j+2}^{t}\equiv \frac{\#\left\{ \left( j_{u}^{A,H},j_{u}^{B,H}\right) | \text { }j_{u}^{A,H}=2,\text { }j_{u}^{B,H}=j,\text { }1\le u<t\right\} }{t-1}, \\ q_{j+4}^{t}\equiv \frac{\#\left\{ \left( j_{u}^{A,L},j_{u}^{B,L}\right) | \text { }j_{u}^{A,L}=1,\text { }j_{u}^{B,L}=j,\text { }1\le u<t\right\} }{t-1}, \\ q_{j+6}^{t}\equiv \frac{\#\left\{ \left( j_{u}^{A,L},j_{u}^{B,L}\right) | \text { }j_{u}^{A,L}=2,\text { }j_{u}^{B,L}=j,\text { }1\le u<t\right\} }{t-1}. \end{array} \end{aligned}$$

Here, \(j_{u}^{A,X}\) (\(j_{u}^{B,X}\)) denotes the action taken by player A (B) while being in state \(X=H,L\) at stage u. So, for instance, \(q_{4}^{t}\) is the relative frequency with which action pair \(\left( 2,2\right) \) has been chosen in state H until stage t.
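As an illustration of this bookkeeping, the following minimal Python sketch (our own, with an assumed encoding of states as 'H'/'L' and actions as 1/2) computes the relative frequency vector from the history of play up to, but not including, stage t.

```python
# A minimal bookkeeping sketch (our own encoding, not the authors' code)
# computing the relative frequency vector q^t from the history of play.

def relative_frequencies(history):
    """history: list of (state, action_A, action_B) for stages u = 1,...,t-1."""
    counts = [0] * 8
    for state, a, b in history:
        index = (0 if state == 'H' else 4) + 2 * (a - 1) + (b - 1)
        counts[index] += 1
    n = len(history)                      # equals t - 1
    return [c / n for c in counts]

# Example: three stages of play give q1 = q4 = q8 = 1/3, the rest 0.
print(relative_frequencies([('H', 1, 1), ('H', 2, 2), ('L', 2, 2)]))
```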

The strategy pair \(\left( \pi ,\sigma \right) \in \mathcal {X}^{A}\times \mathcal {X}^{B}\) is jointly convergent if and only if \(q\in \varDelta ^{7}\) exists such that for all \(\varepsilon >0,\) \(i\in \{1,2,\ldots ,8\}:\)

$$\begin{aligned} \begin{array}{l} {\limsup }_{t\rightarrow \infty }\Pr _{\pi ,\sigma }\left[ \left| q_{i}^{t}-q_{i}\right| \ge \varepsilon \right] =0. \end{array} \end{aligned}$$
(12.1)

\(\Pr _{\pi ,\sigma }\) denotes the probability under strategy pair \(\left( \pi ,\sigma \right) \). \(\mathcal {JC}\) denotes the set of jointly convergent strategy pairs. Under such a pair of strategies, the relative frequency of each action pair in both states converges, as play goes to infinity, to a fixed number with probability 1 in the terminology of Billingsley [8, p. 274]. The set of jointly-convergent pure-strategy rewards \(P^{ \mathcal {JC}}\) is then the set of pairs of rewards obtained by using a pair of jointly-convergent pure strategies.

For a pair of jointly convergent pure strategies, let \(p_{i}\equiv \lim _{t\rightarrow \infty }p_{i}\left( q^{t}\right) =p_{i}(q)\) for \( i=1,\ldots ,8.\) These notions are well defined as the relevant functions are continuous (cf., e.g., Billingsley [8]). We distinguish the following restrictions to be explained below:

$$\begin{aligned}&\begin{array}{c} 0<{\sum }_{i=1}^{4}q_{i}\left( 1-p_{i}\right) ={\sum }_{i=5}^{8}\,q_{i}p_{i}\text { and }0<{\sum }_{i=1}^{4}q_{i}<1, \end{array} \end{aligned}$$
(12.2)
$$\begin{aligned}&\begin{array}{c} {\sum }_{i=5}^{8}q_{i}=1,\text { and }q_{i}>0\Longrightarrow p_{i}=0,\text { } i=5,\ldots ,8, \end{array} \end{aligned}$$
(12.3)
$$\begin{aligned}&\begin{array}{c} {\sum }_{i=1}^{4}q_{i}=1,\text { and }q_{i}>0\Longrightarrow p_{i}=1,\text { } i=1,\ldots ,4. \end{array} \end{aligned}$$
(12.4)

Restriction (12.2) is a conservation-of-flow equation: play takes place in both states infinitely often; therefore, by the law of large numbers, the actual instances of leaving High must be proportional to the long-run probability of leaving it, and the latter must equal the probability of returning to it.

If the long-run play occurs in Low exclusively, (12.3) must hold. The former part is obvious; for the latter, if \(q_{i}p_{i}>0\) for some \(i=5,\ldots ,8,\) then play would visit the corresponding entry infinitely often as time goes to infinity, hence with probability at least \(q_{i}p_{i}\) state High would occur. Similar reasoning applies to the other case, in which play occurs only in High, hence (12.4). We now show the implications for jointly-convergent pure strategies.
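For concreteness, a small helper along the following lines (a sketch under our reading of (12.2)–(12.4); the function name and tolerance are our own) classifies a limiting relative frequency vector q together with its limiting transition probabilities p.

```python
# Our own sketch of a classifier for limiting vectors q = (q1,...,q8) and
# limiting transition probabilities p = (p1,...,p8).

def classify(q, p, tol=1e-12):
    high, low = sum(q[:4]), sum(q[4:])
    out_of_high = sum(qi * (1 - pi) for qi, pi in zip(q[:4], p[:4]))
    into_high = sum(qi * pi for qi, pi in zip(q[4:], p[4:]))
    if tol < high < 1 - tol and out_of_high > tol and abs(out_of_high - into_high) <= tol:
        return "both states visited infinitely often, (12.2)"
    if abs(low - 1) <= tol and all(pi <= tol for qi, pi in zip(q[4:], p[4:]) if qi > tol):
        return "play absorbed in Low, (12.3)"
    if abs(high - 1) <= tol and all(pi >= 1 - tol for qi, pi in zip(q[:4], p[:4]) if qi > tol):
        return "play absorbed in High, (12.4)"
    return "not consistent with (12.2)-(12.4)"

# In Example 1, all long-run mass on entry 8 of Low makes every limiting
# transition probability zero, which satisfies (12.3).
print(classify((0, 0, 0, 0, 0, 0, 0, 1), (0.0,) * 8))
```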

Example 2

Now, (12.3) can only hold if \(p_{i}=0\) or \(q_{i}=0\) for all \( i=5,\ldots ,8.\) Similarly, (12.4) can only hold if \(1-p_{i}=0\) or \(q_{i}=0\) for all \(i=1,\ldots ,4.\) So, if a state is absorbing, then positive mass on a component of the relative frequency vector q can only occur if the associated probability of leaving that state is zero. Observe that therefore only Low can be absorbing, as in this example \(p_{i}\left( q\right) <1\) for all \(i=1,\ldots ,4\) and every q. From the ranking of probabilities, we may distinguish the following three subcases.

$$\begin{aligned} \begin{array}{l} q_{8}=1\text { and }p_{8}=\frac{1}{10}-\frac{11}{4}q_{4}-\frac{11}{2} q_{8}\le 0\text { or} \\ {\sum }_{i=6}^{8}q_{i}=1\text { and }p_{6}=p_{7}=\frac{4}{10}-\frac{11}{8}q_{4}- \frac{11}{4}q_{8}\le 0\text { or} \\ {\sum }_{i=5}^{8}q_{i}=1\text { and }p_{5}=\frac{6}{10}-\frac{11}{12}q_{4}-\frac{ 11}{6}q_{8}\le 0. \end{array} \text { } \end{aligned}$$

Clearly, \(q_{4}=0.\) The first case is easily checked, reducing the analysis to

$$\begin{aligned} \begin{array}{l} {\sum }_{i=6}^{8}q_{i}=1\text { and }\frac{16}{110}\le q_{8}\le \frac{36}{110}, \text { or} \\ {\sum }_{i=5}^{8}q_{i}=1\text { and }q_{8}\ge \frac{36}{110}. \end{array} \end{aligned}$$

These conditions lead to either \(q_{5}=0\) and \(\frac{16}{110}\le q_{8}\le \frac{36}{110}\), or \( q_{8}\ge \frac{36}{110}\) and \(q_{5},q_{6},q_{7}\ge 0\).

Figure 12.1 visualizes these restrictions for Low being absorbing. The upper three-dimensional subset of \(\varDelta ^{3}\) is connected to the final inequality; the parallelogram on the face of \(\varDelta ^{3}\) is connected to the former.

Fig. 12.1

If play concentrates on Low, \(q_{1}=\cdots =q_{4}=0\) and \(q_{5}+\cdots +q_{8}=1.\) We depict this face of \(\varDelta ^{7}\) as a “projection” onto \(\varDelta ^{3}\). Extreme point \(e_{i}\) has component \(i-4\) equal to one. The admissible q’s are sketched as the three-dimensional set on top and the two-dimensional boundary set

12.4 On Rewards and Equilibrium Rewards

The players receive an infinite stream of stage payoffs and are assumed to wish to maximize their average rewards. For a given pair of strategies \( \left( \pi ,\sigma \right) ,\) let \(R_{t}^{k}\left( \pi ,\sigma \right) \) denote the expected payoff to player k at stage t under strategy combination \( \left( \pi ,\sigma \right) \); then player k’s average reward, \( k=A,B,\) is \(\gamma ^{k}\left( \pi ,\sigma \right) =\liminf _{T\rightarrow \infty }\frac{1}{T}\sum _{t=1}^{T}R_{t}^{k}\left( \pi ,\sigma \right) ,\) and \( \gamma \left( \pi ,\sigma \right) \equiv \left( \gamma ^{A}\left( \pi ,\sigma \right) ,\gamma ^{B}\left( \pi ,\sigma \right) \right) \). Moreover, for a vector \(q\in \varDelta ^{7}\), the q-averaged payoffs \((x,y)_{q}\) are given by

$$\begin{aligned} \begin{array}{c} (x,y)_{q}={\sum }_{i=1}^{8}q_{i}\theta _{i}. \end{array} \end{aligned}$$

The strategy pair \(\left( \pi ^{*},\sigma ^{*}\right) \in \mathcal {X} ^{A}\times \mathcal {X}^{B}\) is an equilibrium if and only if

$$\begin{aligned} \gamma ^{A}\left( \pi ^{*},\sigma ^{*}\right)\ge & {} \gamma ^{A}\left( \pi ,\sigma ^{*}\right) \text { for all }\pi \in \mathcal {X} ^{A} \\ \gamma ^{B}\left( \pi ^{*},\sigma ^{*}\right)\ge & {} \gamma ^{B}\left( \pi ^{*},\sigma \right) \text { for all }\sigma \in \mathcal {X} ^{B}. \end{aligned}$$

The rewards \(\gamma \left( \pi ^{*},\sigma ^{*}\right) \) associated with an equilibrium \(\left( \pi ^{*},\sigma ^{*}\right) \) will be referred to as equilibrium rewards.

In the analysis of repeated games, another helpful measure to reduce complexity is to focus on rewards instead of strategies. It is the rule rather than the exception that one and the same reward combination can be achieved by several distinct strategy combinations. Here, we focus on rewards to be obtained by jointly-convergent pure strategies.

12.4.1 Jointly Convergent Pure-Strategy Rewards

The next result connects notions introduced in the previous sections.

Proposition 1

Let the strategy pair \(\left( \pi ,\sigma \right) \in \mathcal {JC}\) and let \( q\in \varDelta ^{7}\) be such that (12.1) is satisfied; then the average payoffs are given by \(\gamma \left( \pi ,\sigma \right) =(x,y)_{q}.\)

Proof

Let \(\left( \pi ,\sigma \right) \in \mathcal {JC}\) and \( E\{\theta _{u}^{\pi ,\sigma }\}\equiv \left( R_{u}^{A}\left( \pi ,\sigma \right) ,R_{u}^{B}\left( \pi ,\sigma \right) \right) ;\) then

$$\begin{aligned} \begin{array}{l} \lim _{t\rightarrow \infty }\frac{1}{t}{\sum }_{u=1}^{t}E\{\theta _{u}^{\pi ,\sigma }\}=\lim _{t\rightarrow \infty }E\left\{ \frac{1}{t} {\sum }_{u=1}^{t}\theta _{u}^{\pi ,\sigma }\right\} = \\ \lim _{t\rightarrow \infty }E\left\{ {\sum }_{i=1}^{8}q_{i}^{t}\theta _{i}\right\} =\lim _{t\rightarrow \infty }{\sum }_{i=1}^{8}E\left\{ q_{i}^{t}\right\} \theta _{i}={\sum }_{i=1}^{8}q_{i}\theta _{i}=(x,y)_{q}. \end{array} \end{aligned}$$

The second equality sign involves a change in counting: on the left-hand side we sum over all periods, on the right-hand side over all eight entries of the two bi-matrices, weighted by their relative frequencies. Equalities one and three are standard, the penultimate one follows from (12.1), cf., e.g., Billingsley [8, p. 274], and the final one holds by the definition given above. Since \(\lim _{t\rightarrow \infty }\frac{1}{t}\sum _{u=1}^{t}E\{\theta _{u}^{\pi ,\sigma }\}\) equals \((x,y)_{q}\), it follows that \(\gamma \left( \pi ,\sigma \right) =(x,y)_{q}.\)

Example 3

To continue the example, we add stage payoffs

$$\begin{aligned} H\left( q^{t}\right)= & {} \left[ \begin{array}{cc} \left( 4,4\right) ,p_{1}\left( q^{t}\right) &{} \left( \frac{7}{2},6\right) ,p_{2}\left( q^{t}\right) \\ \left( 6,\frac{7}{2}\right) ,p_{3}\left( q^{t}\right) &{} \left( \frac{11}{2}, \frac{11}{2}\right) ,p_{4}\left( q^{t}\right) \end{array} \right] , \\ L\left( q^{t}\right)= & {} \left[ \begin{array}{cc} \left( 2,2\right) ,p_{5}\left( q^{t}\right) &{} \left( \frac{7}{4},3\right) ,p_{6}\left( q^{t}\right) \\ \left( 3,\frac{7}{4}\right) ,p_{7}\left( q^{t}\right) &{} \left( \frac{11}{4}, \frac{11}{4}\right) ,p_{8}\left( q^{t}\right) \end{array} \right] . \end{aligned}$$

Observe that \(\theta _{i}=\frac{1}{2}\theta _{i-4}\) for \(i=5,\ldots ,8.\) The specifics for the probabilities \(p_{1}\left( q^{t}\right) ,\ldots ,p_{8}\left( q^{t}\right) \) were already presented earlier. Note that in both states, the first action is dominated by the second for both players.
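Before turning to the reward sets, a quick simulation helps fix ideas. The following Monte Carlo sketch (our own illustration, not code from the chapter; the seed and the choice to start in High are arbitrary) plays Example 3 under mutual full restraint, so \(q_{4}^{t}=q_{8}^{t}=0\) throughout, \(p_{1}=\frac{8}{10}\) and \(p_{5}=\frac{6}{10}\) stay fixed, and the long-run average payoff of each player should approach the full-restraint reward \(\frac{7}{2}\) mentioned below.

```python
# A Monte Carlo sketch (our own, not from the chapter) of Example 3 under
# mutual full restraint, i.e., action pair (1,1) at every stage.

import random

random.seed(1)
T = 200_000
state = "H"                                     # start in High (assumption)
total = 0.0
for _ in range(T):
    if state == "H":
        total += 4.0                            # theta_1 = (4, 4)
        state = "H" if random.random() < 0.8 else "L"   # stay with prob p1
    else:
        total += 2.0                            # theta_5 = (2, 2)
        state = "H" if random.random() < 0.6 else "L"   # recover with prob p5
print(total / T)                                # close to 3.5
```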

Figure 12.2 shows the rewards consistent with Low being absorbing; note that this hexagon is not convex. The link between the rewards in Fig. 12.2 and the strategy restrictions visualized in Fig. 12.1 is that the extreme points in Fig. 12.2 have the following coordinates (i.e., rewards):

$$\begin{aligned} \begin{array}{l} \frac{74}{110}\left( 2,2\right) +\frac{36}{110}\left( \frac{11}{4},\frac{11}{ 4}\right) =\left( \frac{247}{110},\frac{247}{110}\right) \\ \frac{74}{110}\left( 3,\frac{7}{4}\right) +\frac{36}{110}\left( \frac{11}{4}, \frac{11}{4}\right) =\left( \frac{642}{220},\frac{457}{220}\right) \\ \frac{74}{110}\left( \frac{7}{4},3\right) +\frac{36}{110}\left( \frac{11}{4}, \frac{11}{4}\right) =\left( \frac{457}{220},\frac{642}{220}\right) \\ \frac{94}{110}\left( 3,\frac{7}{4}\right) +\frac{16}{110}\left( \frac{11}{4}, \frac{11}{4}\right) =\left( \frac{652}{220},\frac{417}{220}\right) \\ \frac{94}{110}\left( \frac{7}{4},3\right) +\frac{16}{110}\left( \frac{11}{4}, \frac{11}{4}\right) =\left( \frac{417}{220},\frac{652}{220}\right) . \end{array} \end{aligned}$$

The first three rewards correspond to the lower three vertices of the shaded simplex of dimension 3 within \(\varDelta ^{3}\) in Fig. 12.1. The latter two correspond to the lower two vertices of the quadrangle on the face of \( \varDelta ^{3}\) in Fig. 12.1. Finally, the reward \(\left( \frac{11}{4},\frac{11}{ 4}\right) \) corresponds to the vertex \(e_{8}\) in Fig. 12.1.

So, \(e_{5}\) corresponds to the situation that in the long run the relative frequency of play on action pair (1, 1) in Low is 1 (if that were possible). The left-hand lowest vertex of the shaded simplex in Fig. 12.1 has coordinates \(\left( 74/110,0,0,36/110\right) \), so the corresponding rewards are obtained as the linear combination of \(\left( 2,2\right) \) and \(\left( \frac{11}{4},\frac{11}{4}\right) \) with the associated weights.

Similarly, all interior points of the shaded simplex in Fig. 12.1 correspond to the interior of the shaded parallelogram in Fig. 12.2. The interior points of the boundary quadrangle in Fig. 12.1 correspond to the interior of the trapezium in Fig. 12.2.
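The arithmetic behind these extreme points is easy to reproduce. The sketch below (our own, using exact rational arithmetic; Fractions are printed in lowest terms) recomputes a few of the q-averaged rewards listed above from the Low-state stage payoffs of Example 3.

```python
# Exact recomputation (our own sketch) of some extreme-point rewards of the
# hexagon, using the Low-state stage payoffs of Example 3.

from fractions import Fraction as F

theta_low = {5: (F(2), F(2)), 6: (F(7, 4), F(3)),
             7: (F(3), F(7, 4)), 8: (F(11, 4), F(11, 4))}

def q_averaged(weights):
    """(x, y)_q = sum_i q_i * theta_i for a q supported on the Low entries."""
    x = sum(w * theta_low[i][0] for i, w in weights.items())
    y = sum(w * theta_low[i][1] for i, w in weights.items())
    return x, y

print(q_averaged({5: F(74, 110), 8: F(36, 110)}))   # (247/110, 247/110)
print(q_averaged({7: F(74, 110), 8: F(36, 110)}))   # (642/220, 457/220), in lowest terms
print(q_averaged({7: F(94, 110), 8: F(16, 110)}))   # (652/220, 417/220), in lowest terms
```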

We must also find rewards such that (12.2) is satisfied. Figure 12.3 shows all jointly-convergent pure-strategy rewards. For instance, rewards \(\left( \frac{7}{2},\frac{7}{2}\right) \) correspond to mutual full restraint; furthermore, the Pareto-efficient line segment connecting \(\left( \frac{22}{6 },\frac{23}{6}\right) \) and \(\left( \frac{23}{6},\frac{22}{6}\right) \) is achieved by playing Top-Right in High and by playing the off-diagonal action pairs in Low exclusively.

Fig. 12.2

A sketch of the hexagon, being the union of the lightly shaded parallelogram and the darker trapezium. The former corresponds to the three-dimensional set, the latter to the two-dimensional boundary set in Fig. 12.1. The other rewards, corresponding to the convex hull of the four entries associated with Low, are not feasible by jointly-convergent pure strategies

Fig. 12.3

The set \(P^{\mathcal {JC}}\): on the lower left-hand side the hexagon of Fig. 12.2; for the other rewards both states are visited infinitely often

12.4.2 Equilibrium Rewards

We now focus on rewards from equilibria involving threats. Our approach is similar to a well-established one in the repeated games literature (cf., e.g., Hart [28], Forges [23]), linked to the Folk Theorem (see e.g., Van Damme [74]) and applied to stochastic games as well (cf., e.g., Thuijsman and Vrieze [71], Joosten et al. [42], Schoenmakers [67]).

We call \(v=\left( v^{A},v^{B}\right) \) the threat point, where \( v^{A}=\min _{\sigma \in \mathcal {X}^{B}}\max _{\pi \in \mathcal {X}^{A}}\) \( \gamma ^{A}(\pi ,\sigma ),\) and \(v^{B}=\min _{\pi \in \mathcal {X} ^{A}}\max _{\sigma \in \mathcal {X}^{B}}\gamma ^{B}(\pi ,\sigma ).\) So, \(v^{A}\) is the highest amount A can get if B tries to minimize A’s average payoffs. A pair of (feasible) rewards is individually rational if each player receives at least his threat-point reward.

Let \(E=\left\{ (x,y)\in P^{\mathcal {JC}}|\text { }x>v^{A}\text { and } y>v^{B}\right\} \) be the set of all individually rational jointly convergent pure-strategy rewards giving each player strictly more than his threat point reward. We can now present the following formal result:

Theorem 1

Each pair of rewards in E can be supported by an equilibrium.

Proof

Let \(\left( x,y\right) \in E\); then a pure-strategy combination \(\left( \pi ,\sigma \right) \in \mathcal {JC}\) exists such that \( \gamma \left( \pi ,\sigma \right) =\left( x,y\right) .\) Let \(\varepsilon = \frac{1}{2}\min \left( x-v^{A},y-v^{B}\right) \) and let \(\pi ^{p}\) (\(\sigma ^{p}\)) be a punishment strategy of A (B), i.e., a strategy holding his opponent to at most \(v^{B}+\varepsilon \) (\(v^{A}+\varepsilon \)). Let

$$\begin{aligned} \pi _{t}^{*}\equiv & {} \left\{ \begin{array}{ll} \pi _{t} &{} \text {if }j_{k}=\sigma _{k}^{*}\text { for all }k<t, \\ \pi _{t}^{p} &{} \text {otherwise.} \end{array} \right. \\ \sigma _{t}^{*}\equiv & {} \left\{ \begin{array}{ll} \sigma _{t} &{} \text {if }i_{k}=\pi _{k}^{*}\text { for all }k<t, \\ \sigma _{t}^{p} &{} \text {otherwise.} \end{array} \right. \end{aligned}$$

Here, \(i_{t}\) (\(j_{t})\) denotes the action taken by player A (B) at stage t of the play. Clearly, \(\gamma \left( \pi ^{*},\sigma ^{*}\right) =\gamma \left( \pi ,\sigma \right) =\left( x,y\right) \). Suppose player A were to play \(\pi ^{\prime }\) such that \(\pi _{k}^{\prime }\ne \pi _{k}^{*}\) for some k; then player B would play according to \( \sigma ^{p}\) from then on. Since \(\gamma ^{A}\left( \pi ^{\prime },\sigma ^{p}\right) \le v^{A}+\varepsilon <x\), it follows immediately that player A cannot improve against \(\sigma ^{*}.\) A similar statement holds in case player B deviates unilaterally. Hence, \(\left( \pi ^{*},\sigma ^{*}\right) \) is an equilibrium.

Such a pair of strategies \(\left( \pi ^{*},\sigma ^{*}\right) \) is called an equilibrium involving threats, e.g., Hart [28], Van Damme [74], Thuijsman and Vrieze [71].
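The construction in the proof can be phrased operationally. The following Python sketch (a schematic illustration under our own conventions, not the authors' code) implements a trigger strategy of this type: follow the agreed pure-strategy path as long as the opponent has done so, and switch to the punishment strategy forever after the first observed deviation.

```python
# A schematic sketch (our own conventions) of the threat construction in the
# proof of Theorem 1.

def make_trigger(own_path, opponent_path, punish):
    """own_path, opponent_path, punish: functions from a stage number to a
    pure action.  Returns a history-dependent strategy mapping the current
    stage t and the opponent's past actions to a pure action."""
    def strategy(t, opponent_history):
        on_path = all(a == opponent_path(u + 1)
                      for u, a in enumerate(opponent_history))
        return own_path(t) if on_path else punish(t)
    return strategy

# Example: the agreed path alternates restraint (1) and no restraint (2);
# any deviation is met with permanent play of action 2.
sigma_star = make_trigger(lambda t: 2 - t % 2, lambda t: 2 - t % 2, lambda t: 2)
print(sigma_star(3, [1, 2]))   # opponent followed the path so far: play 1
print(sigma_star(3, [1, 1]))   # opponent deviated at stage 2: play 2
```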

Joosten et al. [42] prove by construction that each reward in the convex hull of E can be supported by an equilibrium, too. Equilibrium rewards in the convex hull of E but not in E can be obtained by history-dependent strategies with threats which are neither jointly convergent nor pure. The construction of Joosten et al. [42] involves a randomization phase, which obviously violates the pure-strategy part. The randomization phase serves to identify and communicate to both players which equilibrium pair of jointly convergent pure strategies is to be played afterwards. So, this also violates the very notion of jointly convergent strategies. This construction need not work for every stochastic game, but for the present class of games it does, as no state is (permanently) absorbing.

Whether equilibria exist yielding rewards that are not in the convex hull of E is an open question. Such equilibria must then be associated with strategies which are not jointly convergent. For instance, in the example here, it can be shown by construction that rewards in the convex hull of \( \left( \frac{417}{220},\frac{417}{220}\right) \) and \(P^{\mathcal {JC}}\) can be obtained for the average reward criterion using the limes inferior. Similarly, although this is out of the scope of this chapter, one can obtain the convex hull of \(\left( \frac{7}{4},\frac{7}{4}\right) \) and \(P^{\mathcal { JC}}\) as feasible rewards for the average reward criterion using the limes superior. For the latter criterion all additional rewards Pareto dominate all equilibrium rewards in \(P^{\mathcal {JC}}\). Therefore, these rewards can be supported by equilibria as well for this alternative evaluation criterion.

Theorem 1 hinges on the possibility of punishing unilateral deviations, as in e.g., Hämäläinen et al. [25]. So, we cannot restrict ourselves to Markov or stationary strategies, as these types of strategies do not offer the strategic richness to allow punishing. History-dependent strategies do offer the required flexibility, but it is an open question whether less general classes of strategies might suffice. What is clear, though, is that action-independent strategies do not.

There is no contradiction between strategy pairs being both jointly convergent and history-dependent, or for that matter cooperative, e.g., Tołwinski [72], Tołwinski et al. [73], Krawczyk and Tołwinski [48], or incentive strategies, or combinations, e.g., Ehtamo and Hämäläinen [15,16,17,18].

12.4.3 On Computing Threat Points

We illustrate Theorem 1 and the notions introduced. Moreover, we use the examples to show the scope of the problem of computing threat points. The next example shows that linear programs may not suffice.

Example 4

Assume that player B uses his second action at all stages of the play. Now, consider the (nonlinear) program

$$\begin{aligned} \begin{array}{l} {\min }_{q_{2},q_{4},q_{6},q_{8}}6q_{2}+\frac{11}{2}q_{4}+3q_{6}+\frac{11}{4} q_{8} \\ \text {s.t. }1=q_{2}+q_{4}+q_{6}+q_{8} \\ 0=(1-p_{2})q_{2}+(1-p_{4})q_{4}-p_{6}q_{6}-p_{8}q_{8} \\ p_{2}=\left[ \frac{6}{10}-\frac{11}{20}q_{4}-\frac{11}{10}q_{8}\right] _{+} \\ p_{4}=\left[ \frac{3}{10}-\frac{11}{16}q_{4}-\frac{11}{8}q_{8}\right] _{+} \\ p_{6}=\left[ \frac{4}{10}-\frac{11}{8}q_{4}-\frac{11}{4}q_{8}\right] _{+} \\ p_{8}=\left[ \frac{1}{10}-\frac{11}{4}q_{4}-\frac{11}{2}q_{8}\right] _{+} \\ 0\le q_{2},q_{4},q_{6},q_{8}. \end{array} \end{aligned}$$

Clearly, \(q_{8}=1\) yields rewards equal to \(\frac{11}{4}\); all other feasible rewards involve \(q_{8}<1\) yielding a reward strictly higher than \( \frac{11}{4}.\) Evidently, player B can guarantee himself at least 2.75. This implies \(v^{B}\ge 2.75.\)
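The reasoning behind this lower bound can be checked mechanically. The sketch below (our own, in exact rational arithmetic, and not the authors' computation) verifies that \(q_{8}=1\) satisfies the flow constraint of the program above and that every objective coefficient is at least \(\frac{11}{4}\).

```python
# A sanity check in exact arithmetic (our own sketch): q8 = 1 is feasible for
# the minimization program, and no objective coefficient falls below 11/4.

from fractions import Fraction as F

def clamp(x):
    return max(x, F(0))

q2, q4, q6, q8 = F(0), F(0), F(0), F(1)
p2 = clamp(F(6, 10) - F(11, 20) * q4 - F(11, 10) * q8)
p4 = clamp(F(3, 10) - F(11, 16) * q4 - F(11, 8) * q8)
p6 = clamp(F(4, 10) - F(11, 8) * q4 - F(11, 4) * q8)
p8 = clamp(F(1, 10) - F(11, 4) * q4 - F(11, 2) * q8)

flow = (1 - p2) * q2 + (1 - p4) * q4 - p6 * q6 - p8 * q8
value = 6 * q2 + F(11, 2) * q4 + 3 * q6 + F(11, 4) * q8
print(flow == 0, value)                        # prints: True 11/4
print(min(F(6), F(11, 2), F(3), F(11, 4)))     # 11/4: no feasible q does better
```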

Next, we aim to show that player A can hold his opponent to at most 2.75 by using his second action at all stages of the play. First, we argue that a best reply of player B resulting in a pair of jointly convergent strategies yields at most 2.75. Then, we argue that if B uses a strategy resulting in a pair of strategies which is not jointly convergent, then this cannot yield more than 2.75 either. We do not provide the lengthy computations underlying our findings, only the intuitions.

For the first part, since we assume that the pair of strategies is jointly convergent, we may consider the (nonlinear) program

$$\begin{aligned} \begin{array}{l} {\max }_{q_{3},q_{4},q_{7},q_{8}}\frac{7}{2}q_{3}+\frac{11}{2}q_{4}+\frac{7}{4} q_{7}+\frac{11}{4}q_{8} \\ \text {s.t. }1=q_{3}+q_{4}+q_{7}+q_{8} \\ 0=(1-p_{3})q_{3}+(1-p_{4})q_{4}-p_{7}q_{7}-p_{8}q_{8} \\ p_{3}=\left[ \frac{6}{10}-\frac{11}{20}q_{4}-\frac{11}{10}q_{8}\right] _{+} \\ p_{4}=\left[ \frac{3}{10}-\frac{11}{16}q_{4}-\frac{11}{8}q_{8}\right] _{+} \\ p_{7}=\left[ \frac{4}{10}-\frac{11}{8}q_{4}-\frac{11}{4}q_{8}\right] _{+} \\ p_{8}=\left[ \frac{1}{10}-\frac{11}{4}q_{4}-\frac{11}{2}q_{8}\right] _{+} \\ 0\le q_{3},q_{4},q_{7},q_{8}. \end{array} \end{aligned}$$

Observe that if \(p_{7}=0\), then \(p_{8}=0\) as well, hence \(q_{3}=q_{4}=0.\) Then, the maximization program implies \(q_{8}=1\) and the value of the objective function is \(\frac{11}{4}.\) Let us define \(e_{k}=\left( q_{3},q_{4},q_{7},q_{8}\right) \) by \(q_{k}=1,\) \(q_{j}=0\) for \(j\ne k.\) Now, \(p_{7}=0\) if the relative frequency vector \(\left( q_{3},q_{4},q_{7},q_{8}\right) \) is in

$$\begin{aligned} \begin{array}{ll} S^{0}= &{} conv\left\{ \left\{ e_{4},e_{8},\left( \frac{78}{110},\frac{32}{110} ,0,0\right) ,\left( 0,\frac{32}{110},\frac{78}{110},0\right) \right\} \right. \\ &{} \cup \left. \left\{ \left( 0,0,\frac{94}{110},\frac{16}{110}\right) ,\left( \frac{94}{110},0,0,\frac{16}{110}\right) \right\} \right\} , \end{array} \end{aligned}$$

where conv S denotes the convex hull of the set S. Possibly higher rewards are only to be found for \(\left( q_{3},q_{4},q_{7},q_{8}\right) \in \varDelta ^{3}\backslash S^{0}.\)

Furthermore, \(1-p_{4}>1-p_{3}\ge \frac{4}{10}\ge p_{7}>p_{8},\) hence \( q_{3}+q_{4}\le \frac{1}{2}\le q_{7}+q_{8}.\) So, only tuples \(\left( q_{3},q_{4},q_{7},q_{8}\right) \) in

$$\begin{aligned} \begin{array}{ll} S^{1}= &{} conv\left\{ \left\{ \left( \frac{1}{2},0,\frac{1}{2},0\right) ,\left( \frac{1}{2},0,\frac{39}{110},\frac{16}{110}\right) ,\left( \frac{23}{ 110},\frac{32}{110},\frac{1}{2},0\right) \right\} \right. \\ &{} \cup \left. \left\{ e_{7},\left( 0,0,\frac{94}{110},\frac{16}{110}\right) ,\left( 0,\frac{32}{110},\frac{78}{110},0\right) \right\} \right\} . \end{array} \end{aligned}$$

may yield higher rewards than \(\frac{11}{4}.\) This follows from the observation that the probabilities of moving to (from) Low are always above (below) \(\frac{4}{10}\); hence the long-term proportion of the play spent in Low is at least \(\frac{1}{2}.\)

The points in \(S^{1}\) satisfying the restriction

$$\begin{aligned} 0=(1-p_{3})q_{3}+(1-p_{4})q_{4}-p_{7}q_{7}-p_{8}q_{8} \end{aligned}$$

form a two-dimensional manifold, say M, and the restriction is clearly violated in a neighborhood of the plane

$$\begin{aligned} \begin{array}{c} P=conv\left\{ \left( \frac{1}{2},0,\frac{39}{110},\frac{16}{110}\right) ,\left( \frac{23}{110},\frac{32}{110},\frac{1}{2},0\right) ,\left( 0,0,\frac{ 94}{110},\frac{16}{110}\right) ,\left( 0,\frac{16}{55},\frac{39}{55} ,0\right) \right\} \end{array} \end{aligned}$$

which is the facet of \(S^{1}\) opposite the line segment \(\left( \frac{1}{2} -x,0,\frac{1}{2}+x,0\right) \), \(x\in [0,\frac{1}{2}].\) Hence, M does not intersect P. The following defines for \(\alpha \in [0, \frac{16}{55}]\) a family of two-dimensional planes in \(S^{1}\):

$$\begin{aligned} S\left( \alpha \right) =\left\{ \left( q_{3},q_{4},q_{7},q_{8}\right) \in S^{1}|q_{4}+q_{8}=\alpha \right\} . \end{aligned}$$

For increasing \(\alpha \), we establish whether \(S\left( \alpha \right) \cap M\ne \varnothing ,\) and in that case the intersection is either a point, a line segment or a two-dimensional subset of \(S\left( \alpha \right) .\) If the intersection contains a unique point with the highest weight on \(q_{4}\), that point clearly maximizes the objective function on \(S\left( \alpha \right) \); otherwise a one-dimensional set of points with the highest weight on \(q_{4}\) exists, and then the point among these with the highest weight on \(q_{3}\) is the solution with respect to \(S\left( \alpha \right) .\) So, for fixed \(\alpha \) one observes immediately that \(q_{4}=\alpha \) and \(q_{8}=0\) for any solution with respect to \(S\left( \alpha \right) .\)

Take \(q_{3}+q_{7}=1\); then \(1-p_{3}=p_{7}=\frac{4}{10}\), which in turn implies \(q_{3}=q_{7}=\frac{1}{2}.\) In this case, \(\frac{7}{2}q_{3}+\frac{11}{ 2}q_{4}+\frac{7}{4}q_{7}+\frac{11}{4}q_{8}=\frac{21}{8}\). To obtain higher values of the objective function, \(q_{4}\) should be increased from zero while keeping \(q_{8}=0.\) The final point is that the one-dimensional set of solutions restricted to such \(S\left( \alpha \right) \) for \(\alpha \in [0,\frac{16}{55}]\) “beginning at” \(\left( \frac{1}{2},0,\frac{1}{2} ,0\right) \) does not lead to higher values of the objective function than \( \frac{21}{8}\).

As no solution satisfying the restrictions of the maximization problem yields more than \(\frac{11}{4}\) in \(\varDelta ^{3}\backslash S^{0}\), the solution is located in \(S^{0}\), so the global solution is \(q_{8}=1\); the associated reward to player B is 2.75. As player A can hold B to this amount, we have \(v^{B}\le 2.75.\) Hence, under the assumption that the outcome of the maximization problem of player B against his opponent using his second action in any state and at any stage is a pair of jointly convergent strategies, we find \(v^{B}=2.75\).
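A crude numerical cross-check of this conclusion is possible. The sketch below (our own grid search over approximately flow-feasible points, not the authors' computation) evaluates the objective of the maximization program; because feasibility is only enforced up to a tolerance, the reported maximum may exceed \(\frac{11}{4}\) by an amount of the order of that tolerance, with the best points putting almost all mass on \(q_{8}\).

```python
# Our own grid-search sketch for the maximization program of Example 4.

import itertools

def clamp(x):
    return max(x, 0.0)

n, tol = 100, 1e-2
best, arg = -1.0, None
for i3, i4, i7 in itertools.product(range(n + 1), repeat=3):
    q3, q4, q7 = i3 / n, i4 / n, i7 / n
    q8 = 1.0 - q3 - q4 - q7
    if q8 < -1e-12:
        continue
    q8 = max(q8, 0.0)
    p3 = clamp(0.6 - 11/20 * q4 - 11/10 * q8)
    p4 = clamp(0.3 - 11/16 * q4 - 11/8 * q8)
    p7 = clamp(0.4 - 11/8 * q4 - 11/4 * q8)
    p8 = clamp(0.1 - 11/4 * q4 - 11/2 * q8)
    flow = (1 - p3) * q3 + (1 - p4) * q4 - p7 * q7 - p8 * q8
    if abs(flow) <= tol:                      # approximately feasible
        value = 3.5 * q3 + 5.5 * q4 + 1.75 * q7 + 2.75 * q8
        if value > best:
            best, arg = value, (q3, q4, q7, q8)
print(best, arg)   # stays within a few hundredths of 2.75, near q8 = 1
```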

Now, we continue our reasoning under the assumption that the maximization problem does not result in a pair of jointly convergent strategies. First, note that in the present framework this means that B uses a strategy \(\sigma \) against player A playing \(\pi ^{*}=\left( 2,2,2,\ldots \right) \), such that \(q^{t}=\left( q_{3}^{t},q_{4}^{t},q_{7}^{t},q_{8}^{t}\right) \) never converges, i.e., \( q^{t}\) must move around in the three-dimensional unit simplex forever.

Note that if for some (non-jointly convergent) pair of strategies \((\pi ^{*},\sigma )\) and some T it holds that \(\{q^{t}\}_{t\ge T}\subset S^{0},\) then \( \lim _{t\rightarrow \infty }q_{3}^{t}=\lim _{t\rightarrow \infty }q_{4}^{t}=0.\) This follows because \(p_{7}\left( q^{t}\right) =p_{8}\left( q^{t}\right) =0\) for all \(t\ge T.\) So, the long-term average payoff to player B at stage t, for t sufficiently large, satisfies

$$\begin{aligned} \frac{7}{4}q_{7}^{t}+\frac{11}{4}q_{8}^{t}=\frac{7}{4}q_{7}^{t}+\frac{11}{4} \left( 1-q_{7}^{t}\right) =\frac{11}{4}-q_{7}^{t}<\frac{11}{4}. \end{aligned}$$

This means that \(\gamma ^{B}(\pi ^{*},\sigma )<\frac{11}{4}.\)

Furthermore, let \(S^{2}=conv\{e_{7},e_{8},\left( \frac{4}{7},0,\frac{3}{7},0\right) ,\left( 0,\frac{4}{15},\frac{11}{15},0\right) \}.\) Then it is easily confirmed that \(\frac{7}{2}q_{3}+\frac{11}{2}q_{4}+\frac{7}{4}q_{7}+ \frac{11}{4}q_{8}\le \frac{11}{4}\) for all \(q\in S^{2}.\) Hence, if for some \((\pi ^{*},\sigma )\) it holds that

$$\begin{aligned} \limsup _{T\rightarrow \infty }\Pr _{\pi ^{*},\sigma }\left[ \frac{ \#\left\{ t\le T\mid q^{t}\in S^{2}\right\} }{T}\ge \varepsilon \right]>0\text { for all }\varepsilon >0, \end{aligned}$$

then \(\gamma ^{B}(\pi ^{*},\sigma )\le \frac{11}{4}.\)

Let \(S^{3}=\varDelta ^{3}\backslash \left( S^{0}\cup S^{2}\right) \) and note that \(\frac{7}{2}q_{3}+\frac{11}{2}q_{4}+\frac{7}{4}q_{7}+\frac{11}{4} q_{8}\ge \frac{11}{4}\) for \(q\in S^{3}.\) Using a set of convenient (but not even tight) upper and lower bounds, it takes quite some effort to confirm that if for some \((\pi ^{*},\sigma )\)

$$\begin{aligned} \limsup _{T\rightarrow \infty }\Pr _{\pi ^{*},\sigma }\left[ \frac{ \#\left\{ t\le T\mid q^{t}\in S^{3}\right\} }{T}\ge \varepsilon \right] =0\text { for all }\varepsilon >0, \end{aligned}$$

then \(\gamma ^{B}(\pi ^{*},\sigma )<\frac{11}{4}.\) This contradiction implies that it is impossible to guarantee play such that the resulting relative frequency vectors stay in \(S^{3}\) (hence out of \(S^{0}\cup S^{2}\)) almost forever.

So, candidates to yield a limiting average reward higher than \(\frac{11}{4}\) must induce play such that relative frequency vectors stay forever in \( \left( S^{0}\backslash S^{2}\right) \cup S^{3}\). However, in \(\left( S^{0}\backslash S^{2}\right) \cup S^{3}\) there is persistent drift away from \(conv\{e_{3},e_{4}\}\), because the transition probabilities from Low to High are small and the transition probabilities from High to Low are large. Away from \(conv\{e_{3},e_{4}\}\) means towards \(conv\{e_{7},e_{8}\}\), which implies that the play will induce relative frequency vectors in \(S^{2}.\) Note that the assumption that \((\pi ^{*},\sigma )\) is not jointly convergent means that \(e_{8}\) can only be approached infinitely often by relative frequency vectors coming from \(S^{2}\) or returning to \(S^{2},\) yielding limiting average rewards below \(\frac{11}{4}.\)

The negative results above imply that the maximization problem can be solved in jointly convergent strategies in this example. Hence, \(v^{B}=2.75\) (Fig. 12.4).

Fig. 12.4

Each pair of jointly convergent pure-strategy rewards to the “north-east” of \(v=(2.75,2.75)\) can be supported by an equilibrium involving threats

Example 4 illustrates that finding threat points may be cumbersome, as it requires at least a nonlinear program. Our approach was to alternate a minimization and a maximization program against sequences of stationary strategies to obtain lower and upper bounds for the threat point. If the solutions coincide, as in the example above after two steps, we are done. Otherwise, all rewards yielding more than the lowest upper bound established can be associated with equilibria involving threats.
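Schematically, the procedure can be summarized as follows (a paraphrase in code of the description above, not the authors' algorithm); the guarantees and best-reply values are assumed to come from (nonlinear) programs such as those of Example 4.

```python
# A schematic paraphrase (our own) of the bounding procedure: defensive
# strategies of the punished player give lower bounds on his threat-point
# reward, candidate punishment strategies of the opponent give upper bounds,
# and one stops once the bounds meet.

def bound_threat_point(guarantees, best_reply_values):
    """guarantees: rewards the punished player can secure (minimization
    programs); best_reply_values: his best rewards against candidate
    punishments (maximization programs)."""
    lower = max(guarantees)
    upper = min(best_reply_values)
    return lower, upper        # the threat-point reward lies in [lower, upper]

# In Example 4 both bounds equal 2.75, hence v^B = 2.75.
print(bound_threat_point([2.75], [2.75]))
```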

We can interpret every minimization and maximization program as a single-controller stochastic game (cf., e.g., Parthasarathy and Raghavan [60]). However, the circumstance that the number of states captured in the relative frequency vectors (recall our remarks on this issue in Sect. 12.3) is not finite takes our problem out of the scope of the algorithms devised to compute the associated values (e.g., Filar and Raghavan [19], Vrieze [75]; see Raghavan and Filar [62] for a survey). Hordijk et al. [34] show that a stationary strategy suffices as a best reply against a fixed stationary strategy, and the optimization problems mentioned reduce to Markov decision problems (cf., e.g., Filar and Vrieze [20]). We used these results partially above, but otherwise found little help in them.

The general problem is equivalent to finding the value of a zero-sum stochastic game. Well-known techniques from standard stochastic game theory, e.g., Bewley and Kohlberg [6, 7] and Mertens and Neyman [56], offer insufficient solace because the state space is not finite but denumerable.

12.5 Conclusions

We added an innovation to the framework of Small Fish Wars (e.g., Joosten [37, 38, 41]) by allowing endogeneity in the transition structure: transition probabilities depend on the agents’ current actions, on the current state, and on the history of the play. In this new setting, states may become temporarily absorbing. Here, this feature is used to model the phenomenon that, even if the agents turn to ecologically sound exploitation policies, it may take a long time before the first transition to a state yielding higher outcomes occurs if the state Low turns out to have become temporarily absorbing. Thus, we capture hysteresis, called a poaching pit in the literature on the management of natural resources (cf., e.g., Bulte [11]). Hysteresis is an empirical phenomenon and may be observed in the slow recovery of coastal cod stocks in Canada after a moratorium on cod fishing imposed in 1992 (cf., Rose et al. [63]). More recent estimates of stocks show a less bleak picture due to recent developments unrelated to resource management, but the stocks are still far removed from their high historical levels.

Our approach generalizes standard stochastic games, too. We propose methods of analysis originally introduced in Joosten et al. [42], inspired by Folk Theorems for stochastic games, e.g., Thuijsman and Vrieze [71], Joosten [35, 36] and Schoenmakers [67], and developed further in, for instance, Joosten [37, 38, 41]. A crucial notion is that of jointly-convergent strategies, which justifies the necessary steps in creating analogies to the Folk Theorem. In our view, it is convenient that the complex model arising from endogenous transition probabilities may be solved quite analogously to repeated games.

Our analysis of a special example with hysteresis shows that a “tragedy of the commons” can be averted by sufficiently patient rational agents maximizing their utilities non-cooperatively. All equilibrium rewards yield more than the amounts associated with the permanent ruthless exploitation of the resource. Pareto-optimal equilibrium rewards correspond to strategy pairs involving a considerable amount of restraint on the part of the agents, and are considerably higher than no-restraint rewards and slightly higher than perfect-restraint rewards.

To present a tractable model and to economize on notation, we kept the fish stock fixed yet stochastic, i.e., the variation in stock size and catches is only due to random effects; we imposed symmetry and used the three “twos”: two states, two players and two actions. Two distinct states allow us to model the kind of transitions we had in mind; two agents are minimally required to model strategic interaction; two stage-game actions leave something to choose. In order to capture additional real-life phenomena observed, such as seasonalities or other types of correlations, a larger number of states may be required. Furthermore, more levels or dimensions of restraining measures may be necessary. Adding states, (asymmetric) players or actions changes nothing conceptually in our approach.

By keeping the model and its analysis relatively simple, hence presumably more tractable, further links to and comparisons with contributions in the social dilemma literature (cf., e.g., Komorita and Parks [1994], Heckathorn [30], Marwell and Oliver [53]), where dyadic choice is predominant, may be facilitated. Our resource game is to be associated primarily with a social trap (see e.g., Hamburger [26], Platt [61], Cross and Guyer [14]), of which the ‘tragedy of the commons’ (cf., e.g., Hardin [27], Messick et al. [55], Messick and Brewer [54]) is a notorious special example.

Ongoing related research focusses on designing algorithms that improve on the computational efficiency of existing ones in generating large sets of jointly-convergent pure-strategy rewards. The algorithms used to find the rewards visualized in the consecutive figures in this chapter are unacceptably slow. This was an unpleasant surprise, as they were in fact modifications of algorithms working extremely rapidly in models within the same and related frameworks (e.g., Joosten [37, 39, 40, 41]). The new algorithms not only generate the desired sets within acceptable computing times here, but also seem much more efficient than the algorithms we used before when applied to certain repeated games, stochastic games and games with frequency-dependent stage payoffs (cf., Joosten and Samuel [43, 44]).

Related ongoing research is devoted to computing threat points with spin-offs of the algorithms of Joosten and Samuel [43, 44] for the same models as mentioned in the previous paragraph. This is a solution born out of necessity, because very little is known about finding threat points in this new framework. Future research should address this knowledge gap.

Future research should combine the various modifications and extensions of the original Small Fish Wars [37] with the innovation presented here. Joosten [41] adds various price-scarcity feedbacks to the model, as well as another low-density phenomenon called the Allee effect. For the majority of the results and our methods of analysis we anticipate needing no more than the notion of jointly convergent strategies and the continuity of the stage payoff functions and transition probability functions involved.

We envision applications of stochastic games with endogenous transitions where hysteresis-like phenomena occur, for instance shallow lakes (e.g., Scheffer [65], Carpenter et al. [12], Mäler et al. [52]), labor markets (e.g., Blanchard and Summers [9]), climate change (e.g., Lenton et al. [49]), or, more generally, settings where tipping or regime shifts may occur [2, 66]. We also see possible extensions of earlier models on (un)learning by (not) doing, cf., Joosten et al. [42, 45], and related work, e.g., Schoenmakers et al. [68], Schoenmakers [67], Flesch et al. [22].