1 Introduction

A cold standby redundancy is one where a unit is brought into operation only when there is a vital need for it. Hospital emergency power supplies, emergency response vehicles, and many military weapon systems are typical examples of standby units. The cost of a failure of such a unit is large compared with all other costs, so a cost criterion is inappropriate. Instead, we maximise the time until a catastrophic event occurs (when the equipment is needed and is unable to function) for a standby unit in an uncertain environment. The uncertainty in the environment is reflected in the frequency with which initiating events (to which the equipment needs to respond) occur. In other research, changes in the environment, and hence in the frequency of the initiating events, were modelled as a random process [7], but here the environment is controlled by an opponent, so the problem is modelled as a stochastic game.

When on duty in peacekeeping roles countering terrorist threats, troops and their equipment cannot remain on perpetual standby. The troops have to be given rest and relaxation, and even if they are replaced by other forces there will be a learning period during which the new forces cannot respond as rapidly as their predecessors. The equipment has to receive regular maintenance and, where appropriate, repair. The terrorists or warring parties wish to initiate events which will require the troops or equipment to respond. It is assumed that the readiness of the terrorists to initiate events in the next period of time is partially known by the authorities and is reflected in their state-of-alertness level (such as the U.S. DEFCON levels). The terrorist player decides how active they will be in the next period, which then determines the alertness level; this is equivalent to saying that the terrorist player chooses what the alertness level will be. One also assumes that the terrorists have a good knowledge of the state of the standby "equipment" or troops, both by calculating how long it has been on standby and also by open or clandestine inspection of the equipment. This is then a maintenance model involving two players, and such situations can be modelled as stochastic games.

The literature on Maintenance, Repair and Replacement policies for deteriorating equipment is long and distinguished. It started with the work of [1] and, as the surveys and bibliographies of Refs. [5, 10, 12, 19, 21, 22, 24] and Wang [25] indicate, it has continued apace to the present day. Almost all the literature concentrates on policies which minimise the average discounted cost criterion. The idea of using a catastrophic event criterion, to overcome the problem that failure results in an unquantifiably large cost, was first suggested by Thomas et al. [23], with other instances considered by Kim and Thomas [7]. In all these cases, the background environment, and hence the probability of an initiating event, is either fixed or follows a random Markovian process. Other authors, such as Refs. [2–4], [9, 20] and [17, 18], have looked at maintenance in a random environment, but in those cases the unit is always in use, so changes in the environment age the equipment at different rates but do not affect when it is needed. Refs. [8, 26–28] and [6] study protective systems, such as circuit breakers, alarms and protective relays with non-self-announcing failures, where the rate of deterioration is governed by a random environment. We, on the other hand, allow the deterioration of the equipment to be independent of the environment, while the environment affects the need for the equipment. Yeh [29] studied an optimal maintenance model for a standby system but focused on availability and reliability as the criteria. Modelling the maintenance process as a game where the opponent is able to set the environmental conditions has not been discussed before; in fact, the application of game theory to maintenance problems has been restricted to warranty contracts [13, 14]. Here, we model the situation using stochastic games, which were first introduced by Shapley (1953).

It is clear that there has to be some constraint on the activity of the “terrorist” and hence on the alertness level. Otherwise, the game is trivial—the “terrorist” will always force the activity level to its highest (most dangerous) state. This then reduces to a problem with one decision maker and no variation in the external state, which was the problem considered in [23].

In Sect. 2, we define our notation, set up the basic unconstrained game and confirm that in such a game it is optimal for the terrorist player to keep the state of alertness at its highest level. In Sect. 3, we consider situations where there are constraints on the frequency with which the terrorist can be sufficiently active to force the alertness index to its highest level. For ease of notation, we concentrate on the game where there are only two alertness states, Peaceful and Dangerous, but the results apply in more complicated situations. We investigate two constraints. The first type of constraint is on the average frequency of dangerous states in the game played so far. The second constraint discounts the activity of the terrorist, so that what he was doing in the last period is much more important than his activity (or lack of it) several periods ago. In Sect. 4, we produce numerical examples, and in Sect. 5 we draw conclusions on how the maintenance/recuperation strategy depends on the interaction between the state of the equipment and the alertness level. We believe these models are a useful step in estimating Repair and Maintenance policies for standby equipment (and staff) used to combat events initiated by intelligent and malevolent opponents.

2 Unconstrained Stochastic Game Model

We assume throughout that Player I is the owner of the standby capability (hereafter called the equipment) and Player II is the one who seeks to create a catastrophic event, that is, to initiate an event to which the equipment fails to respond. The parameters of the model are:

\(i = 1, 2, \ldots, N\): the state of the equipment, where \(N\) is the failed state;

\(P_{ij}\): the probability of the equipment moving from state \(i\) to state \(j\) in one period of time, if no Repair action is performed.

This is independent of whether the equipment is "used" or not in that period. The standby unit is inspected regularly each period, and this gives Player I information on the operational state of the equipment. We assume that, either through open inspection or by clandestine means, Player II is also aware of the state of the equipment.

Assume \(\sum_{j=1}^N P_{ij} = 1\), \(P_{NN} = 1\), and that the Markov chain is such that there exists \(T = \min\{n \ge 0 : (P^n)_{iN} > 0 \text{ for all } i\}\), so that within \(T\) periods the chance of the equipment failing is positive from every starting state, i.e. \((P^T)_{iN} > 0\) for all \(i\) (equivalently, \(\min_i (P^T)_{iN} = p > 0\)).

This ensures that, without some maintenance, the equipment is bound to fail eventually. The "ordering" of the intermediate states of the equipment reflects increasing pessimism about their future operability. This corresponds to \(P_{ij}\) satisfying a first-order stochastic dominance condition, namely

$$\begin{aligned} \sum_{j<k} P_{ij} \ge \sum_{j<k} P_{i+1,j} \quad \text{for all } i=1,\ldots,N-1,\; k=1,\ldots,N. \end{aligned}$$

This means that if one considers states lower than \(k\) to be the "good" ones, one is more likely to move to a good state from \(i\) than from \(i+1\).

The preventive Maintenance/Repair action (the former if the equipment is in state \(i = 1, \ldots, N-1\), the latter if the state is \(N\)) takes one time period, during which the equipment cannot be used if required. Such an action returns the equipment to state 1, the good-as-new state. The subsequent results also hold if the maintenance action is not perfect and returns the equipment to state \(i\) with probability \(r_i\), but we will not complicate the notation by describing this case.

\(a = 1, 2, \ldots, M\) is the level of alertness of Player I, but it is really a decision by Player II on how active he intends to be in the next period. Both sides know that Player I has sufficient information sources to be able to correctly identify what this activity level will be. When Player II decides on his activity level, this corresponds to him choosing the "environment" for the next period. \(b_a\) is the probability of an initiating event occurring when the environment is \(a\), where \(b_1 \le b_2 \le \ldots \le b_M\), since the higher the alertness level, the more likely it is that Player II will seek to initiate an event.
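The assumptions above are easy to verify computationally for a given transition matrix. The following sketch is ours, not the authors' (in Python, with hypothetical function names): it checks that rows sum to one, that the failed state \(N\) is absorbing, that failure is reachable from every state within some \(T\), and that the rows are first-order stochastically ordered. With the three-state matrix of Sect. 4 it would return \(T = 1\) and \(p = 0.3\).

```python
import numpy as np

def check_deterioration_model(P, max_T=100):
    """Check the deterioration-model assumptions on a transition matrix P.

    P[i, j] = probability of moving from state i+1 to state j+1
    (0-indexed internally; state N-1 is the failed state N of the text).
    Returns (T, p): the smallest T with (P^T)_{iN} > 0 for all i,
    and p = min_i (P^T)_{iN}.
    """
    N = P.shape[0]
    assert np.allclose(P.sum(axis=1), 1.0), "rows must sum to 1"
    assert P[N - 1, N - 1] == 1.0, "failed state N must be absorbing"

    # First-order stochastic ordering: partial sums of row i dominate row i+1.
    cum = np.cumsum(P, axis=1)
    for i in range(N - 1):
        assert np.all(cum[i] >= cum[i + 1] - 1e-12), "rows not stochastically ordered"

    # T = min{n : (P^n)_{iN} > 0 for all i}; p = min_i (P^T)_{iN}.
    Pn = np.eye(N)
    for n in range(1, max_T + 1):
        Pn = Pn @ P
        if np.all(Pn[:, N - 1] > 0):
            return n, Pn[:, N - 1].min()
    raise ValueError("failed state not reachable from every state within max_T steps")
```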

In the basic game, Player I has to decide at each period whether to undertake preventive Maintenance or Repair on his standby equipment, and Player II has to decide what the threat level of the environment should be. The game is played repeatedly until there is a catastrophic event when Player I cannot respond to an initiating event either because the equipment is being preventively maintained or because it has failed. Thus, Player I wants a Repair/Maintenance strategy that maximises the expected time until a catastrophic event, while Player II wishes to choose effort levels (environments) to minimise this expected time.

Thus the basic game \(\Gamma\) is a two-person zero-sum stochastic game consisting of \(N\) subgames \(\Gamma_i\), \(i = 1, 2, \ldots, N\), where \(\Gamma_i\) is the game starting in the situation where the equipment is in state \(i\). Player I decides whether to perform a maintenance action or Do Nothing for the next period, while Player II decides what the environment will be. This determines the probability that an initiating event will occur during the period and hence, if the equipment is down or being repaired, whether there is a catastrophic event. If the equipment is in state \(i\) (\(\Gamma_i\)) and no maintenance is carried out, it will move to state \(j\) (\(\Gamma_j\)) for the next period with probability \(P_{ij}\). The payoff matrix (rows for Player I, columns for Player II) when the game is in subgame \(\Gamma_i\), \(i \ne N\), is

$$\begin{aligned} \begin{array}{l|ccc} \Gamma_i,\ i\ne N & \text{Environmental level } 1 & \text{Environmental level } a & \text{Environmental level } M \\ \hline \text{Do Nothing} & 1+\sum_{j=1}^N P_{ij}\,\Gamma_j & 1+\sum_{j=1}^N P_{ij}\,\Gamma_j & 1+\sum_{j=1}^N P_{ij}\,\Gamma_j \\ \text{Repair} & (1-b_1)(1+\Gamma_1) & (1-b_a)(1+\Gamma_1) & (1-b_M)(1+\Gamma_1) \end{array} \end{aligned}$$
(1)

while in the failed state \(N\) the payoff matrix is

$$\begin{aligned} \begin{array}{l|ccc} \Gamma_N & \text{Environmental level } 1 & \text{Environmental level } a & \text{Environmental level } M \\ \hline \text{Do Nothing} & (1-b_1)(1+\Gamma_N) & (1-b_a)(1+\Gamma_N) & (1-b_M)(1+\Gamma_N) \\ \text{Repair} & (1-b_1)(1+\Gamma_1) & (1-b_a)(1+\Gamma_1) & (1-b_M)(1+\Gamma_1) \end{array} \end{aligned}$$
(2)

The deterioration assumption guarantees that in every block of \(T\) periods there is a probability of at least \(p\) that the equipment is down or being repaired at some point, and the least chance of an initiating event in any period is \(b_1\); in such a period any initiating event becomes a catastrophic event. Thus, the expected time until a catastrophic event is bounded above by \(T/(pb_1)\). So \(\Gamma\) is a two-person zero-sum stochastic game with a finite number of subgames, each of which has only a finite number of pure strategies (a \(2 \times M\) payoff matrix), and where the total reward to each player is bounded above. Mertens and Neyman [11] proved that such games have a solution. The value \(v(i)\) of the game starting with equipment in state \(i\) satisfies the following:

$$\begin{aligned}&v(i)=\textit{val}\left[ {\begin{array}{ccc} 1+\sum_{j=1}^N P_{ij}\, v(j) & \cdots & 1+\sum_{j=1}^N P_{ij}\, v(j) \\ (1-b_1)(1+v(1)) & \cdots & (1-b_M)(1+v(1)) \end{array}} \right] \quad \text{for } i \ne N \end{aligned}$$
(3)
$$\begin{aligned}&v(N)=\textit{val}\left[ {\begin{array}{ccc} (1-b_1)(1+v(N)) & \cdots & (1-b_M)(1+v(N)) \\ (1-b_1)(1+v(1)) & \cdots & (1-b_M)(1+v(1)) \end{array}} \right] \end{aligned}$$
(4)

where val denotes the value of the matrix game whose payoff matrix follows. Moreover, this game can be solved using a value iteration approach, where the \(n\)th iterate \(v_n(i)\) (which corresponds to the value if only \(n\) periods were allowed) satisfies \(v_0(i) = 0\) for all \(i\) and then

$$\begin{aligned} v_n(i)=\textit{val}\left[ {\begin{array}{ccc} 1+\sum_{j=1}^N P_{ij}\, v_{n-1}(j) & \cdots & 1+\sum_{j=1}^N P_{ij}\, v_{n-1}(j) \\ (1-b_1)(1+v_{n-1}(1)) & \cdots & (1-b_M)(1+v_{n-1}(1)) \end{array}} \right] \quad \text{for } i \ne N \end{aligned}$$
(5)

with a similar equation based on (4) for \(v_n(N)\). This allows us to solve the game with the help of the results below.
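As an illustration, the value iteration (5) can be implemented directly by solving each stage's matrix game with linear programming. This sketch is ours, not the authors' code; `matrix_game_value` and `unconstrained_value` are hypothetical names, and `numpy`/`scipy` are assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and optimal row mixture of the zero-sum matrix game A
    (row player maximises)."""
    m, n = A.shape
    # Variables: row mixture x (m entries) and the value v.
    # Maximise v subject to x^T A[:, j] >= v for every column j, sum(x) = 1.
    c = np.zeros(m + 1); c[-1] = -1.0          # linprog minimises, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - x^T A[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

def unconstrained_value(P, b, tol=1e-9, max_iter=10_000):
    """Value iteration for Eqs. (3)-(5); returns the vector v(i), i = 1..N."""
    N, M = P.shape[0], len(b)
    b = np.asarray(b, dtype=float)
    v = np.zeros(N)
    for _ in range(max_iter):
        v_new = np.empty(N)
        for i in range(N):
            if i == N - 1:                       # failed state: Eq. (4)
                do_nothing = (1 - b) * (1 + v[N - 1])
            else:                                # working state: Eq. (3)
                do_nothing = np.full(M, 1 + P[i] @ v)
            repair = (1 - b) * (1 + v[0])        # Repair returns the unit to state 1
            v_new[i], _ = matrix_game_value(np.vstack([do_nothing, repair]))
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new
```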

Theorem 1

(i) \(v_n(i)\) is non-decreasing in \(n\), non-increasing in \(i\), and converges to \(v(i)\).

(ii) \(v(i)\) is non-increasing in \(i\).

(iii) The optimal strategy in the unconstrained game is: for Player II, always to choose the most dangerous environment (level \(M\)); for Player I, to Do Nothing in states \(i < i^*\), where \(i^* \le N\), and to perform maintenance/repair in states \(i^*\) to \(N\).

Proof

(i) The non-decreasing result in \(n\) follows since \(v_1(i) \ge v_0(i) = 0\), and then by induction: since \(v_{n-1}(i) \ge v_{n-2}(i)\) for all \(i\), each term in the payoff matrix for \(v_n(i)\) is greater than or equal to the corresponding term in the matrix for \(v_{n-1}(i)\); hence \(v_n(i) \ge v_{n-1}(i)\) and the induction step is proved. Similarly, \(0 = v_0(i+1) \le v_0(i) = 0\) for all \(i\), so the hypothesis that \(v_n(i)\) is non-increasing in \(i\) holds for \(n = 0\). Assume it is true for \(v_{n-1}(i)\); then the stochastic ordering plus the monotonicity of \(v_{n-1}(i)\) implies \(\sum_{j=1}^N P_{i+1,j}\, v_{n-1}(j) \le \sum_{j=1}^N P_{i,j}\, v_{n-1}(j)\). Each entry in (5) for \(v_n(i)\) is at least as large as the corresponding term for \(v_n(i+1)\), so \(v_n(i+1) \le v_n(i)\) for \(i = 1, \ldots, N-2\). The remaining comparison \(v_n(N) \le v_n(N-1)\) also holds, since for \(v_n(N)\) it is clear that Repair dominates Do Nothing (because \(v_{n-1}(N) \le v_{n-1}(1)\)). Hence \(v_n(N) = \min_a (1-b_a)(1+v_{n-1}(1)) \le v_n(N-1)\), and the induction step holds.

(ii) Since \(v_n(i) \le v_{n+1}(i)\) and \(v_n(i)\) is bounded above by \(T/(pb_1)\), it is a bounded increasing sequence, so \(v_n(\cdot)\) converges to \(v(\cdot)\). The monotonicity of \(v_n(i)\) then guarantees the monotonicity of \(v(i)\).

(iii) Player II's strategy is obvious, since the column for the most dangerous environmental choice (\(M\)) always dominates the other columns. Since \(v(1) \ge v(N)\), the Repair strategy (against the dangerous environment) is at least as good as the Do Nothing strategy in state \(N\). The monotonicity of \(v(j)\), together with the stochastic ordering of \(P_{ij}\), implies that \(\sum_j P_{ij}\, v(j)\) is non-increasing in \(i\), and so once \(\sum_j P_{ij}\, v(j)\) falls below \((1-b_M)(1+v(1))\) (the definition of \(i^*\)), it remains below it for all higher states \(i\).

So the unconstrained game is solved by the terrorist player always being at the highest state of activity. This is both unrealistic and reduces the problem to a single-decision-maker problem such as that in [7]. In the next section, we make a more realistic assumption, namely that there is some limit on the terrorist's activity and hence on the frequency with which the environment is at its highest danger level. To keep the situation clear, we will hereafter assume there are only two levels of alertness, which we label Dangerous (level 2) and Peaceful (level 1).

3 Models with Constraints on Effort

One reason an enemy cannot continuously create a dangerous environment is that it needs time to regroup, plan and rest its forces, which we facetiously describe as "sleep". One possible assumption is that, by stage n of the game, the enemy can have created a dangerous environment in at most a proportion c of those stages. Thus, if it has created a dangerous situation in d of the n periods that the game has been running, then \(d \le cn\), and \(s = cn - d\) is a measure of the "sleep index". This sleep index governs how many consecutive periods of dangerous environment the enemy can create before it has to rest. If the sleep index is s and at the next period Player II chooses a Peaceful environment, the index will move to \(s + c\), while if he chooses a Dangerous environment, it will move to \(s + c - 1 = s - (1-c)\). In this model, the effect of the rest induced by a peaceful environment endures undiminished throughout the future. An alternative view is that the value c that a restful period adds to the sleep index should diminish to \(\alpha c\) in the next period, \(\alpha^2 c\) in the period thereafter, and so on. In this case, if the current sleep index is s and Player II chooses a Peaceful environment this period, the index will move to \(\alpha s + c\), while if Player II chooses to make the environment Dangerous, the index will move to \(\alpha s - (1-c)\).

We will prove results for the two cases \(\alpha = 1\) (undiscounted) and \(\alpha < 1\) (discounting of the index) in the same model, though in the former case the sleep index could be infinite, while in the latter case it is bounded above by \(c/(1-\alpha)\). In order to ensure a finite set of subgames, we will always assume in the undiscounted case that the index cannot exceed S. So the stochastic game \(\Gamma\) modelling this situation consists of a series of subgames \(\Gamma_{i,s}\), where \(i = 1, \ldots, N\) and \(0 \le s \le \min\{S, c/(1-\alpha)\}\). Although the sleep index set appears continuous, it is in fact countably infinite, and indeed finite if only r stages are allowed: if the index starts at \(s_0\), then after r stages its value can only be \(\alpha^r s_0 + c(1-\alpha^r)/(1-\alpha) - \sum_{i=1}^r Z_i \alpha^{r-i}\), where \(Z_i = 1\) or 0 depending on whether Player II played Dangerous or Peaceful at the \(i\)th stage.
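As a concrete illustration of these dynamics (a sketch of ours, not from the chapter), both cases can be written as a single update rule, with \(\alpha = 1\) recovering the undiscounted index capped at S:

```python
def update_sleep_index(s, dangerous, c, alpha=1.0, S=float("inf")):
    """One-step update of the sleep index.

    Peaceful period:  s -> alpha * s + c
    Dangerous period: s -> alpha * s + c - 1  (feasible only if the result is >= 0)
    With alpha = 1 (undiscounted) the index is capped at S; with alpha < 1
    it is automatically bounded above by c / (1 - alpha).
    """
    s_next = alpha * s + c - (1 if dangerous else 0)
    if s_next < 0:
        raise ValueError("Dangerous is infeasible at this sleep index")
    return min(s_next, S)

def can_play_dangerous(s, c, alpha=1.0):
    """Player II can play Dangerous only when alpha*s - (1 - c) >= 0,
    i.e. s >= (1 - c) / alpha."""
    return alpha * s + c - 1 >= 0
```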

Let \(v(i,s)\) be the value of the game \(\Gamma\) starting in \(\Gamma_{i,s}\), where the equipment is in state \(i\) and the sleep index is \(s\). Then the values satisfy

$$\begin{aligned} v(i,s)=\textit{val} \left[ \begin{array}{cc} (1-\delta_N(i)\,b_1)\Big(1+\sum\limits_{j=1}^N P_{ij}\, v(j,\alpha s+c)\Big) & (1-\delta_N(i)\,b_2)\Big(1+\sum\limits_{j=1}^N P_{ij}\, v(j,\alpha s+c-1)\Big) \\ (1-b_1)(1+v(1,\alpha s+c)) & (1-b_2)(1+v(1,\alpha s+c-1)) \end{array} \right] \end{aligned}$$
(6)

where \(\delta_N(i) = 1\) if \(i = N\), and 0 otherwise.

One can solve this problem, as in the previous section, using value iteration. The iterates \(v_n(i,s)\) satisfy an equation like (6), but with \(v(i,s)\) replaced by \(v_n(i,s)\) on the left-hand side and by \(v_{n-1}(i,s)\) on the right-hand side.
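A computational sketch of this value iteration, again ours and reusing the hypothetical `matrix_game_value` above: for \(\alpha = 1\) and c a multiple of a grid step, the reachable sleep indices stay on a finite grid, so (6) can be iterated directly over (state, index) pairs, including Player II's Dangerous column only when it is feasible.

```python
import numpy as np

def constrained_value(P, b1, b2, c, S, step=0.1, tol=1e-9, max_iter=100_000):
    """Value iteration for the constrained game (6) with alpha = 1.

    The sleep index is restricted to the grid {0, step, ..., S}; c and 1
    are assumed to be multiples of `step`, so the dynamics stay on the grid.
    Returns v[i, k]: the value with equipment state i+1 and index s = k*step.
    """
    N = P.shape[0]
    K = int(round(S / step))                  # grid indices 0..K, capped at S
    dc = int(round(c / step))                 # index gain from a Peaceful period
    d1 = int(round(1.0 / step))               # index cost of a Dangerous period
    v = np.zeros((N, K + 1))
    for _ in range(max_iter):
        v_new = np.empty_like(v)
        for i in range(N):
            delta = 1.0 if i == N - 1 else 0.0    # delta_N(i) of Eq. (6)
            for k in range(K + 1):
                kp = min(k + dc, K)               # Peaceful: s -> s + c (capped)
                kd = k + dc - d1                  # Dangerous: s -> s + c - 1
                cols = [np.array([(1 - delta * b1) * (1 + P[i] @ v[:, kp]),
                                  (1 - b1) * (1 + v[0, kp])])]
                if kd >= 0:                       # Dangerous only if feasible
                    cols.append(np.array([(1 - delta * b2) * (1 + P[i] @ v[:, kd]),
                                          (1 - b2) * (1 + v[0, kd])]))
                v_new[i, k], _ = matrix_game_value(np.column_stack(cols))
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new
```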

As in Sect. 2, in order to prove results about the optimal policies for the game, \(\Gamma \), one proves results about \(v_{n}(i,s)\) and hence \(v(i,s)\).

Lemma 1

(i) \(v_n(i,s)\) is non-decreasing in \(n\) and non-increasing in \(i\) and \(s\).

(ii) \(v(i,s)\) is non-increasing in \(i\) and \(s\).

Proof

(i) All the results follow by induction and the fact that if \(W_1 =\textit{val}\left[ {\begin{array}{cc} a_1 & b_1 \\ c_1 & d_1 \end{array}} \right]\) and \(W_2 =\textit{val}\left[ {\begin{array}{cc} a_2 & b_2 \\ c_2 & d_2 \end{array}} \right]\) with \(a_1 \ge a_2\), \(b_1 \ge b_2\), \(c_1 \ge c_2\), \(d_1 \ge d_2\), then \(W_1 \ge W_2\).

(ii) Since \(v_n(i,s)\) is non-decreasing in \(n\) and bounded above by \(T/(pb_1)\), it is a monotonic bounded sequence and so converges to \(v(i,s)\). The properties \(v_n(i+1,s) \le v_n(i,s)\) and \(v_n(i,s') \le v_n(i,s)\) for \(s \le s'\) therefore hold for the limit function \(v(i,s)\).

This allows one to describe features of the optimal strategies. If the item is down (in state \(N\)), then Player I will want to Repair it, while Player II will want to make the environment Dangerous if he can. Making the environment Dangerous is only feasible if \(\alpha s - (1-c) \ge 0\), i.e. \(s \ge (1-c)/\alpha\). If Player II starts with a sleep index of 0, the largest the index can become is \(s < c/(1-\alpha)\), so Player II can only ever play the Dangerous strategy if \(c/(1-\alpha) > (1-c)/\alpha\), i.e. \(\alpha + c > 1\). So if \(\alpha + c \le 1\), the resulting game becomes trivial, with Player II only able to play Peaceful, and the results of the one-player situation in [7] hold.

Theorem 2

Provided \(\alpha +c \ge 1\), then in state N

1. if \(s \ge (1-c)/\alpha\), the optimal strategies are "Repair vs Dangerous";

2. if \(s < (1-c)/\alpha\), the optimal strategies are "Repair vs Peaceful".

Proof

The payoff matrix in the subgame \(\Gamma _{N, s}\) is

$$\begin{aligned} \begin{array}{l|cc} \Gamma_{N,s} & \text{Making Peaceful situation} & \text{Making Dangerous situation} \\ \hline \text{Do Nothing} & (1-b_1)(1+v(N,\alpha s+c)) & (1-b_2)(1+v(N,\alpha s+c-1)) \\ \text{Repair} & (1-b_1)(1+v(1,\alpha s+c)) & (1-b_2)(1+v(1,\alpha s+c-1)) \end{array} \end{aligned}$$

Since, by Lemma 1, \(v(N,s) \le v(1,s)\), it is immediate that the Repair strategy dominates the Do Nothing strategy for Player I. If \(s < (1-c)/\alpha\), then Player II can only play the Peaceful strategy, and so "Repair versus Peaceful" is optimal. If \(s \ge (1-c)/\alpha\), we need to show that it is better for Player II to play Dangerous than Peaceful on the first occasion the system is in state N. Assume the system is currently down, and let \(\pi_P^*\) be the policy that plays Peaceful this period and optimally thereafter, so \(v^{\pi_P^*}(N,s) = (1-b_1)(1+v(1,\alpha s+c))\). Let \(\pi_1\) be the policy that plays Peaceful in the current period when \(i = N\) and is the same as \(\pi_P^*\) except that at the next down situation it chooses the Dangerous environment. Since playing Dangerous rather than Peaceful cannot increase the time until a catastrophic event, \(v^{\pi_1}(N,s) \le v^{\pi_P^*}(N,s)\). Let \(\pi_2\) be the policy that plays Dangerous now and Peaceful at the next down event, but otherwise chooses the same actions as \(\pi_P^*\). Let K be the expected time between now and the next time \(i = N\) under \(\pi_P^*\), and let T be the expected time from that next down time until a catastrophic event occurs under \(\pi_P^*\), conditional on there being a next down time. Then \(v^{\pi_1}(N,s) = (1-b_1)(K + (1-b_2)T) > (1-b_2)(K + (1-b_1)T) = v^{\pi_2}(N,s)\), since \(b_2 > b_1\).

If \(\pi_D^*\) is Player II's optimal policy against the optimal policy of Player I, given that he chooses Dangerous this period, then \(v^{\pi_D^*}(N,s) \le v^{\pi_2}(N,s) \le v^{\pi_1}(N,s) \le v^{\pi_P^*}(N,s)\). So choosing the Dangerous environment is Player II's best response to Player I's optimal policy.

If the standby system is working, then any of the four combinations of pure strategies may be chosen, or even mixed strategies. What one can show, though, is that if the sleep index is so low that Player II cannot provoke a dangerous environment either this period or the next, then Player I will Do Nothing if the system is working.

Theorem 3

If \(s < (1-\alpha c-c)/\alpha^2\), then Player I will Do Nothing in state \((i,s)\) when \(i\) is a working state (\(i < N\)).

Proof

The condition on s means that Player II must let the environment be Peaceful for the next two periods. Consider the possible strategies for Player I over these two periods:

  • strategy 1 : Repair in both periods

  • strategy 2 : Repair in period 1 and Do Nothing in period 2

  • strategy 3 : Do Nothing in period 1 and Repair in period 2

Let \(W_1\), \(W_2\), \(W_3\) be the respective expected times until a catastrophic event if the optimal policy is used after the first two periods. Then

$$\begin{aligned} W_1&=(1-b_1)\big(1+(1-b_1)+(1-b_1)\,v(1,\alpha^2 s+\alpha c+c)\big) \\ W_2&=(1-b_1)\Big(1+1+\sum_{j=1}^N P_{1j}\,v(j,\alpha^2 s+\alpha c+c)\Big) \\ W_3&=1+(1-b_1)+(1-b_1)\,v(1,\alpha^2 s+\alpha c+c) \end{aligned}$$

and trivially \(W_3 \ge W_1\) and \(W_3 \ge W_2\), since \(v(1,s) \ge v(j,s)\) for all \(j\) and \(s\). Hence the Do-Nothing-now policy dominates the policies that Repair now, and the result holds.

It need not be the case that it is optimal to Do Nothing even if one is in the new state \(i = 1\), because one may recognise that the opponent has to play Peaceful this period if the sleep index s satisfies \(\alpha s + c - 1 < 0\): Repairing keeps the item in state 1, while it could degrade under the Do Nothing strategy. An instance of this appears in an example in the next section (\(s = 0.6\) in Table 1). Before that, we show that in the undiscounted case, if the system is working and s is large enough, then the players will either play Do Nothing against Peaceful or they will play mixed strategies in which Player I has a very high chance of playing Do Nothing. To do that, we need the following limit result.

Lemma 2

In the case \(\alpha = 1\), as \(s \rightarrow \infty\), \(v_n(i,s)\) and \(v(i,s)\) converge, respectively, to \(v_n(i)\) and \(v(i)\), where

$$\begin{aligned} v_{n} (i) = \max \left\{ {1 + \sum \limits _{{j = 1}}^{N} {P_{{ij}} v_{{n - 1}} (j)} ,(1 - b_{2} )(1 + v_{{n - 1}} (1))} \right\} \end{aligned}$$

and

$$\begin{aligned} v(i) = \max \left\{ {1 + \sum \limits _{{j = 1}}^{N} {P_{{ij}} v(j)} ,(1 - b_{2} )(1 + v(1))} \right\} \end{aligned}$$

These equations correspond to the situation where Player II is choosing the dangerous environment all the time.

Proof

From Lemma 1, \(v_n(i,s)\) and \(v(i,s)\) are non-increasing in s, and as they are bounded below by 0, they must converge as \(s \rightarrow \infty\). In the limit, since \(b_1 < b_2\), Player II's Dangerous strategy dominates his Peaceful one: the payoffs against Do Nothing become the same, while against Repair \((1-b_2)(1+v_{n-1}(1)) < (1-b_1)(1+v_{n-1}(1))\).

We are now in a position to describe what happens in the game when the sleep index gets very large.

Theorem 4

In the game with \(\alpha = 1\), if the equipment is in a working state \(i\), then for any \(\varepsilon > 0\) there exists \(S\) such that for \(s \ge S\) the optimal strategies are either (a) Do Nothing versus Peaceful, or (b) mixed strategies in which Player I plays Do Nothing with probability at least \(1-\varepsilon\).

Proof

Consider the payoff matrix in the subgame \(\Gamma_{k,s}^n\), \(k \ne N\), of the game with \(n\) periods to go:

$$\begin{aligned} \begin{array}{l|cc} \Gamma_{k,s}^n,\ k\ne N & \text{Making Peaceful situation} & \text{Making Dangerous situation} \\ \hline \text{Do Nothing} & 1+\sum_{j=k}^N P_{kj}\, v_{n-1}(j,s+c) =: A_n & 1+\sum_{j=k}^N P_{kj}\, v_{n-1}(j,s+c-1) =: B_n \\ \text{Repair} & (1-b_1)(1+v_{n-1}(1,s+c)) =: C_n & (1-b_2)(1+v_{n-1}(1,s+c-1)) =: D_n \end{array} \end{aligned}$$

and let \(A, B, C, D\) be the comparable values in \(\Gamma_{k,s}\) when \(v_{n-1}\) is replaced by \(v\). From Lemma 1 and the stochastic ordering property it follows that \(B > A\). We can also prove \(B > D\). By convergence, we can choose \(N\) and \(S\) so that \(|v_n(j,s)-v(j,s)| < \varepsilon\) for all \(j, s\) if \(n \ge N\), and, provided \(s > S\), \(|v(j,s)-v(j)| < \varepsilon\) for all \(j\), where \(v(j)\) is defined in Lemma 2. Then

$$\begin{aligned} 1+\sum_{j=k}^N P_{kj}\, v(j,s)&\ge 1+\sum_{j=k}^N P_{kj}\, v_{n+1}(j,s)-\varepsilon \ge 1+\sum_{j=k}^N P_{kj}\, v_{n+1}(j)-2\varepsilon \\&\ge 1+\sum_{j=k}^N P_{kj}\,(1-b_2)(1+v_n(1))-2\varepsilon =1+(1-b_2)(1+v_n(1))-2\varepsilon \\&\ge 1+(1-b_2)(1+v_n(1,s))-3\varepsilon \ge 1+(1-b_2)(1+v(1,s))-4\varepsilon \\&\ge (1-b_2)+(1-b_2)\,v(1,s), \quad \text{provided } \varepsilon <1/4. \end{aligned}$$

Hence \(B>D\).

If \(A \ge C\), then the fact that \(B > D\) means Do Nothing dominates Repair for Player I, and \(A < B\) means that Peaceful dominates Dangerous for Player II. Thus, Do Nothing versus Peaceful is optimal.

In the case \(A < C\), note that as \(b_2 > b_1\), for \(s\) large enough \(C > D\), since

$$\begin{aligned}&(1 - b_{1} )(1 + v(1,s + c)) \ge (1 - b_{1} )(1 + v(1)) - \varepsilon \\&\ge (1 - b_{2} )(1 + v(1)) + \varepsilon \ge (1 - b_{2} )(1 + v(1,s + c - 1)) \end{aligned}$$

Hence, with \(C>A\), \(C>D\), \(B>A\), \(B>D\), the optimal strategy is mixed, with Player I playing \(\left( \frac{C-D}{C+B-A-D},\frac{B-A}{C+B-A-D}\right)\). For any \(\delta > 0\), choose \(\varepsilon\) so that \(\delta > 2\varepsilon /(b_2 - b_1)\) and \(\varepsilon <\frac{b_2 -b_1}{2(1-b_2)}\). Then the convergence of \(v(j,s)\) in \(s\) means one can choose \(S^*\) so that \(|v(j,s+c-1)-v(j,s+c)| < \varepsilon\) for all \(s \ge S^*\) and all \(j\). For such \(s\),

$$\begin{aligned} 0&\le B-A=\sum_{j=k}^N P_{kj}\,[v(j,s+c-1)-v(j,s+c)]<\varepsilon \\ C+B-A-D&\ge C-D=(b_2-b_1)+[(1-b_1)\,v(1,s+c)-(1-b_2)\,v(1,s+c-1)] \\&\ge (b_2-b_1)+(b_2-b_1)\,v(1,s+c)-(1-b_2)\,\varepsilon \\&\ge (b_2-b_1)-(1-b_2)\,\varepsilon >(b_2-b_1)/2. \end{aligned}$$

Then Player I plays Repair with probability

$$\begin{aligned} \frac{B-A}{C+B-A-D}\le \frac{\varepsilon }{(b_2 -b_1 )/2}<\delta \end{aligned}$$

and the result holds.
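For completeness, the equalising mixtures used in this proof have the standard closed form for a 2×2 zero-sum game with no saddle point. A small sketch of ours (the function name is hypothetical):

```python
def mixed_2x2(A, B, C, D):
    """Equalising mixed strategies for the zero-sum game
                   Peaceful  Dangerous
       Do Nothing [   A    ,    B    ]
       Repair     [   C    ,    D    ]
    valid when B > A, B > D, C > A, C > D (no saddle point).
    Returns (p, q, value): p = Player I's P(Do Nothing),
    q = Player II's P(Peaceful).
    """
    denom = C + B - A - D
    p = (C - D) / denom        # Player I: probability of Do Nothing
    q = (B - D) / denom        # Player II: probability of Peaceful
    value = (B * C - A * D) / denom
    return p, q, value
```

Here \(1-p = \frac{B-A}{C+B-A-D}\) is exactly the Repair probability bounded by \(\delta\) in the proof.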

4 Numerical Examples

The actual policies in specific cases can be obtained by value iteration calculations. The following examples have three equipment states, 1 (new), 2 (used) and 3 (failed), and doing nothing gives the following transition probabilities:

$$\begin{aligned} P=\left( {{\begin{array}{c@{\quad }c@{\quad }c} {0.3}&{} {0.4}&{} {0.3} \\ 0&{} {0.4}&{} {0.6} \\ 0&{} 0&{} 1 \\ \end{array} }} \right) \end{aligned}$$

The first examples are the undiscounted cases, where \(\alpha = 1\). Assume the constraint is \(c = 0.3\), so Player II can only create a dangerous environment \(30\,\%\) of the time.
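Using the hypothetical `constrained_value` sketch from Sect. 3, this example could be set up as follows (the cap S on the index is our arbitrary choice; the tables below are the authors' results, not output of this sketch):

```python
import numpy as np

P = np.array([[0.3, 0.4, 0.3],
              [0.0, 0.4, 0.6],
              [0.0, 0.0, 1.0]])

v_big_gap   = constrained_value(P, b1=0.1, b2=0.5, c=0.3, S=2.0)  # Table 1 setting
v_small_gap = constrained_value(P, b1=0.4, b2=0.5, c=0.3, S=2.0)  # Table 2 setting
```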

Tables 1 and 2 give the results for the new state (i = 1), first when \(b_1 = 0.1\) and \(b_2 = 0.5\), so there is a large difference between the Peaceful and Dangerous states (Table 1), and then when \(b_1 = 0.4\) and \(b_2 = 0.5\) (Table 2), so there is little difference between the two. Notice that in all cases Player II can only choose the Peaceful environment if the sleep index s is less than 0.7. Theorem 3 says that for \(s < 0.4\) Player I does nothing, but notice in Table 1 that at s = 0.6 Player I will Repair, even though (perhaps because) Player II can only ensure a Peaceful environment this period but could move the environment to the Dangerous level in the next.

Looking at Table 1, where \(b_1\) and \(b_2\) are quite different, the optimal strategies become mixed as s increases, though Player I becomes more and more likely to Do Nothing. When s is large enough, Theorem 4 applies, and in Table 1 an \(\varepsilon\)-mixed strategy is optimal. In Table 2, where \(b_1\) and \(b_2\) are similar, Do Nothing versus Peaceful is optimal at all sleep index values: there is no point in repairing equipment in the best state when the impact of the environment is so small.

Table 1 The result for \(i = 1\) (new), \(b_1 = 0.1\), \(b_2 = 0.5\)
Table 2 The result for \(i = 1\) (new), \(b_1 = 0.4\), \(b_2 = 0.5\)
Table 3 The result for \(i = 2\) (not new, but working)
Table 4 The result for \(i = 3\) (down)
Table 5 The result for \(i = 1\) (new), \(c = 0.4\), \(\alpha = 0.8\)
Fig. 1 Simple form of the result in Tables 1, 3 and 4

Tables 3 and 4 give the policies for the used and failed states when \(b_1 = 0.1\) and \(b_2 = 0.5\) (the same parameters as in Table 1 for the new state). In state 2, one has Do Nothing vs Peaceful for \(s < 0.4\) (no dangerous environment possible for at least two periods), then Repair vs Peaceful for \(0.4 \le s < 0.7\). Mixed strategies become optimal as s increases, and as \(s \rightarrow \infty\) Player I tends to Do Nothing with probability \(1-\varepsilon\) while Player II tends to (0.62, 0.38). Table 4 confirms the results of Theorem 2: when the unit is down it must be repaired, and the enemy will seek to make the environment dangerous if he can.

Figure 1 summarises the results of Tables 1, 3 and 4. If the equipment has failed, one must repair it, and the enemy will try to ensure a dangerous environment if its sleep index is high enough to allow it. If the equipment is working, then for a low sleep index the solution is Do Nothing against Peaceful. As the sleep index increases, so that the enemy will be able to be dangerous in the next period, the equipment is repaired ready for that. If the sleep index is high enough that the enemy can ensure a dangerous environment this period, both sides play a mixed strategy, with Player I more and more likely to Do Nothing and Player II slightly more likely to play Dangerous (but still likely to play Peaceful most of the time because of the "sleep" restrictions).

Looking at the same problem (\(b_1 = 0.1\), \(b_2 = 0.5\)) but in the discounted case, with \(\alpha = 0.8\) and \(c = 0.4\) (rather than 0.3), leads to Tables 5, 6 and 7.

Table 6 The result for \(i = 2\) (not new, but working), \(c = 0.4\), \(\alpha = 0.8\)
Table 7 The result for \(i = 3\) (down), \(c = 0.4\), \(\alpha = 0.8\)
Fig. 2 Simple form of the result in Tables 5, 6 and 7

Again Table 7 confirms the results of Theorem 2, since Player II can only play Dangerous if \(s \ge 0.75\), while Table 5 shows that as the sleep index increases the strategies change from Do Nothing versus Peaceful to Repair versus Peaceful and then to mixed strategies. Note that 2 is the greatest value the sleep index can take when \(c = 0.4\) and \(\alpha = 0.8\), and at this value both players play a mixed strategy.

The results of Tables 5, 6 and 7 are summarised in Fig. 2. They are very similar to those of the undiscounted case. The only difference is that, because discounting prevents the sleep index from getting too large, Player I's mixed strategy does not tend to playing Do Nothing almost all the time, but instead goes to a strategy where one does nothing \(80\,\%\) of the time.

5 Conclusion

These models investigate the Maintenance and Repair policy for a standby system where the environment that determines when it is needed is controlled by an opponent. The most obvious context for this problem is the military one, in either conventional or peacekeeping roles. It could also apply to emergency services that need to respond to terrorist threats. We have shown that if there is no limit on the resources available to the "enemy", then the problem reduces to one with a single decision maker dealing with a constantly high-risk environment. If, more realistically, the enemy cannot always be ready to act but needs time to recuperate, resupply and plan, the situation is much more complex, both when the restful periods have a long-term effect and when this effect is discounted over time.

One interesting feature is that the optimal policies can be mixed, so that each period there is a certain probability one should perform maintenance and a certain probability one does nothing. Clearly, if there are a number of such standby units, the mixed policy can translate into the proportion that should be given preventive maintenance at that time. If the difference between the benign and the dangerous environments (\(b_1\), \(b_2\)) is small, one tends to perform maintenance only when the equipment is close to failure; in other situations, one will maintain the equipment when it is in a good state, because one feels the environment is soon likely to be dangerous (especially if the sleep index is high). One always repairs a failed unit, but the "enemy" will seek to take advantage of the failure by making the environment as dangerous as it can in those circumstances.

The models introduced in this chapter are the first to address the question of maintenance in an environment where failure can be catastrophic and where there is an enemy seeking such catastrophes. Clearly, more sophisticated models can be developed, but we believe this chapter has indicated that one can get useful insights by addressing the problem as a stochastic game. Moreover, the game-theoretic approach may be used to model Maintenance and Repair policies for equipment that is routinely used to deal with threats, such as airport passenger and luggage screening devices.