
17.1 Introduction

In the game of obstacle tag, let the segment \(P^{t_1}E^{t_1}\) cross the circular obstacle centered at C at \(t=t_1\) (see Fig. 17.1). Striving to catch E in minimal time, P may follow the geodesic line that is shortest at this instant. However, when E retreats along the continuation of this line, P may recognize that the other geodesic line first becomes of equal length (when P, E, C are collinear at \(t=t_2\)) and then gets even shorter (at \(t=t_3\)). By switching lines, e.g., at \(t=t_3\), P may reduce the initially evaluated chase time, which equals the length of the geodesic line at the current state divided by the speed difference. At the first International Symposium on the Theory and Applications of Differential Games held in Amherst in 1969, R. Isaacs mentioned [1] that the ideas of his book [3] are not suitable for analyzing the obstacle tag game in the described situation (see Problem 6.10.1). Different aspects of this game were studied by numerous authors (see, e.g., [1, 4, 5, 6] for the most relevant results and further references).

Fig. 17.1 Switching preferable geodesic lines

J.R. Isbell described a solution of this game without using any formalism [4]. He assumed that P moves along the geodesic line whereas E maintains collinearity of P, C, E (Case 7) or takes a secant line to the circle (Case 8), avoiding the situation shown in Fig. 17.1 in those ways. According to the generalized Isaacs' approach, singular surfaces are the main subject of zero-sum two-person differential games with full information. The state space has to be filled with the trajectories corresponding to the coupled optimal pursuit-evasion strategies. The value function is evaluated as the payoff along these trajectories. Singular surfaces are manifolds where the value function or its derivatives are discontinuous. However, there is no general theory for the construction of singular surfaces. Commonly, a researcher needs to explore different known options for such surfaces [2]. Isbell's solution is rather simple and includes no singular surfaces. It was revisited several times by J. Breakwell and his students, first with the use of a focal line, and then with two switching envelopes [1, 5]. A. Melikyan and his students suggested that P and E have to move along straight lines in the attraction domains of the corner surfaces, and their solution contains two equivocal surfaces [6].

A pursuit-evasion game is called alternative if it can be terminated by P at will on any of two given terminal manifolds, the payoff functionals of Bolza type on these manifolds differ only in their terminal parts (the integral part is common and equal to 1), and the optimal feedback strategies and the value functions are known [7, 8, 9]. We consider the obstacle tag game as an alternative pursuit game. At every state where the segment PE crosses the obstacle, P has two alternatives, i.e., to follow the south or the north geodesic line. For each of them, the guaranteed catch time is known, and P may choose the one with the lesser value. However, if at any state P chooses the shortest path to E, a sliding mode may arise on the manifold with collinear P, C, E, and the payoff is undefined there if the corresponding trajectories are defined as limits of Euler broken lines.

First, we describe the setup of the game. Then, we analyze the structure of the game space in terms of the relations of domination between the alternatives and their consistency. Finally, we describe a pursuit strategy with memory and evaluate the guaranteed result by solving control optimization problems for E.

17.2 Setup

Let the obstacle be a circular hole of unit radius centered at \(z_c = (x_c, y_c)\) in the plane. Let \(z_p = (x_p, y_p)\) and \(z_e = (x_e, y_e)\) be the Cartesian coordinates of the players, \(\|z_p - z_c\| > 1\), \(\|z_e - z_c\| > 1\). Let P and E have simple motions with speeds 1 and β, β < 1, let the players perfectly measure all coordinates, and let P strive to catch E in minimum time. We consider the game only for the initial states where the obstacle separates the players. The game terminates at the first instant when P gets on the obstacle boundary or the line segment PE is tangential to the circle. P can follow the shortest geodesic line chosen at the initial state and guarantee that the time spent on E's point capture is less than or equal to the initial distance between P and E along the geodesic line divided by (1 − β). Our goal is to generate a pursuit strategy which allows P to switch geodesic lines when it is advantageous to him, and to evaluate the corresponding guaranteed results for this strategy.

We will put the game into different reduced spaces of dimension three. Depending on the chosen state space, we will have different equations describing the motions. The target set will be the set of states where P fixes his choice of the geodesic line when finally evaluating the guaranteed payoff.

Let the obstacle be centered at \(z_c = (0, 1)\), and let P be on the negative part of the x-axis. If \(z_p = (-\rho_p, 0)\) then \(\rho_p > 0\) is the distance from P to the obstacle along the tangential line. Let \(z_e = (x_e, y_e)\) be the Cartesian coordinates of E, \(x_e, y_e \ge 0\), \(\|z_e - z_c\| > 1\). Let \(\alpha _p=\arctan \rho _p^{-1}\) and \( y_e/(\rho _p+x_e)\le \tan \alpha _p\). Then, the function \(V^s\) that evaluates the guaranteed time needed for the point capture of E along the south geodesic line and its continuation may be described as

$$\displaystyle \begin{aligned} V^s(\rho_p,x_e,y_e)=\frac{\rho_p+\theta_e+\rho_e}{1-\beta}, \end{aligned} $$
(17.1)

where \(\theta_e = \alpha_e + \gamma_e\), \(\alpha _e=\arctan \rho _e^{-1}\), \(d_e=\sqrt {x_e^2+(1-y_e)^2}\), \(\gamma _e=\arctan \big ((y_e-1)/x_e\big )\), \(\rho _e=\sqrt {d_e^2-1}\) (see Fig. 17.2).
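For numerical experimentation, Eq. (17.1) can be transcribed directly; the Python sketch below is ours and assumes \(x_e>0\) and \(d_e>1\), so that all the quantities above are well defined.

```python
import math

def v_south_first_space(rho_p: float, x_e: float, y_e: float, beta: float) -> float:
    """Guaranteed time along the south geodesic line, Eq. (17.1).

    First reduced space: obstacle of unit radius centred at (0, 1),
    P at (-rho_p, 0); beta < 1 is E's speed.
    """
    d_e = math.hypot(x_e, 1.0 - y_e)      # distance from E to the centre C
    rho_e = math.sqrt(d_e**2 - 1.0)       # tangent length from E
    alpha_e = math.atan(1.0 / rho_e)
    gamma_e = math.atan((y_e - 1.0) / x_e)
    theta_e = alpha_e + gamma_e           # arc length on the unit circle
    return (rho_p + theta_e + rho_e) / (1.0 - beta)

# e.g. v_south_first_space(2.0, 1.5, 0.5, beta=0.5)
```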

Fig. 17.2 The first reduced space

In the second reduced space [5], the obstacle is centered at the origin O. P lies on the negative part of the x-axis at \(z_p = (-d_p, 0)\), \(d_p \ge 1\), the distance from the origin to E equals \(d_e \ge 1\), and ξ is the angle between OP and OE (see Fig. 17.3). The evaluation function may be represented as

$$\displaystyle \begin{aligned} V^s(d_p,\xi,d_e)=\frac{\sqrt{d_p^2-1}+ \xi-\arccos d_p^{-1}-\arccos d_e^{-1}+\sqrt{d_e^2-1}}{1-\beta}. \end{aligned} $$
(17.2)

At the instant t > 0, let P and E be separated by the obstacle and move at the angles \(u_p(t)\) and \(u_e(t)\) (see Fig. 17.4). Then, for \(\arccos d_p^{-1}\le \xi \le \pi \), their motions may be described by the equations

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \dot d_p(t)=&\displaystyle \cos u_p(t), \\ \dot \xi(t)=&\displaystyle \frac{\displaystyle \sin u_p(t)}{\displaystyle d_p(t)}+\beta \frac{\displaystyle \sin (u_e(t)+\pi- \xi(t))}{\displaystyle d_e(t)},\\ {} \dot d_e(t)= &\displaystyle \beta \cos{}(u_e(t)+\pi-\xi(t)). \end{array} \end{aligned} $$
(17.3)
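Equations (17.2) and (17.3) transcribe just as directly; the sketch below is ours and stores the reduced state as a tuple \((d_p,\xi,d_e)\) with \(d_p,d_e>1\).

```python
import math

def v_south(d_p, xi, d_e, beta):
    """Evaluation function V^s in the second reduced space, Eq. (17.2)."""
    return (math.sqrt(d_p**2 - 1.0) + xi
            - math.acos(1.0 / d_p) - math.acos(1.0 / d_e)
            + math.sqrt(d_e**2 - 1.0)) / (1.0 - beta)

def rhs(state, u_p, u_e, beta):
    """Right-hand side of the reduced dynamics, Eq. (17.3)."""
    d_p, xi, d_e = state
    return (math.cos(u_p),
            math.sin(u_p) / d_p + beta * math.sin(u_e + math.pi - xi) / d_e,
            beta * math.cos(u_e + math.pi - xi))
```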
Fig. 17.3 The second reduced space

Fig. 17.4 A two-dimensional model of the state space: two fields of trajectories

Let \(Z \subseteq R^3\) and M be the game space and the terminal set, \(U_p = \{u_p : \|u_p\|\le 1\}\), \(U_e = \{u_e : \|u_e\|\le \beta \}\), \(z(t) \in Z\), \(u_p(t) \in U_p\), \(u_e(t) \in U_e\), and

$$\displaystyle \begin{aligned} \dot z(t)=f(z(t), u_p(t),u_e(t)),\; z(0)=z^0, \end{aligned} $$
(17.4)

be the equation that describes motions.

Strategies are rules that map the available information into control values. We use equations like (17.4) to generate pencils of trajectories for given initial states and strategies, and then to evaluate the performance index for them. We consider only trajectories that are limits of Euler broken lines as the diameters of the time partitions tend to zero. This approach yields solutions whose results in numerical simulations of the game are close to the guaranteed ones.

Let \(\varDelta = \{t_0, t_1, \ldots , t_i, t_{i+1}, \ldots \}\) be a partition of the time axis \(R^+\). For a given \(z^0 \in Z\) and a chosen strategy \(\mathscr U_p\) with available information I (e.g., \(\mathscr U_p\div u_p: Z \rightarrow U_p\) for feedback strategies or \(\mathscr U_p\div u_p: R^+\times C^3_{[0,\infty )}\rightarrow U_p\) for memory strategies), let us denote by \( Z_p( z^0,\mathscr U_p,\varDelta )\) the pencil of piecewise-constant solutions of the inclusion

$$\displaystyle \begin{aligned} \dot z(t)\in f(z(t), u_p(t_i),U_e),\; z(0)=z^0, \end{aligned} $$
(17.5)

where \(t \in [t_i, t_{i+1})\), \(i \in N\), \(t_0 = 0\), \(t_i \rightarrow \infty \), and \(u_p(t_i)\) is generated by \(\mathscr U_p\) with the information available at the instant \(t = t_i\). By this means, \( Z_p( z^0,\mathscr U_p,\varDelta )\) contains continuous functions \(z : R^+ \rightarrow Z\) whose restriction onto \([0, \theta ]\) is absolutely continuous for any \(\theta > 0\) and meets (17.5) for almost all \(t \in [0, \theta ]\).
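Numerically, an element of the pencil \(Z_p(z^0,\mathscr U_p,\varDelta )\) can be mimicked by a forward Euler scheme in which the pursuit control is sampled at the nodes of the partition and held constant on each step. The sketch below is ours and assumes, for simplicity, a uniform partition, a feedback pursuit strategy, and an evasion control given by a callable.

```python
import numpy as np

def euler_broken_line(f, z0, pursuit_strategy, evasion_control,
                      t_end=10.0, h=1e-2):
    """One Euler broken line from the pencil Z_p(z0, U_p, Delta).

    f(z, u_p, u_e) is the right-hand side of (17.4); the pursuer's control
    is sampled from the feedback strategy at each node t_i and held constant
    on [t_i, t_{i+1}), so the polygonal line approximates a solution of (17.5).
    """
    ts = np.arange(0.0, t_end + h, h)
    traj = [np.asarray(z0, dtype=float)]
    for t in ts[:-1]:
        z = traj[-1]
        u_p = pursuit_strategy(z)        # constant on the current step
        u_e = evasion_control(t, z)      # any admissible evasion control
        traj.append(z + h * np.asarray(f(z, u_p, u_e)))
    return ts, np.array(traj)
```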

For the first (south) alternative, given \(z^0\), \(\mathscr U_p\), \(M = M^s\), \(\varepsilon > 0\), \(\varDelta \), and \(z(\cdot )\in Z_p( z^0,\mathscr U_p,\varDelta )\), let

$$\displaystyle \begin{aligned} \tau^s_\varepsilon(z(\cdot))=\min \{t_i\in \varDelta: z(t_i)\in M^s_\varepsilon\}, \end{aligned} $$
(17.6)

if \(\exists t_i\in \varDelta : z(t_i)\in M^s_\varepsilon \), and \(+\infty \) otherwise, where \(M^s_\varepsilon =\{z: z\in Z, \min _{z^\prime \in M^s} ||z-z^\prime ||\leq \varepsilon \}\) is the ε-neighbourhood of \(M^s\).

Let

$$\displaystyle \begin{aligned} \mathscr P_\varepsilon^s( z(\cdot))=\tau^s_\varepsilon+V^s(z(\tau^s_\varepsilon)), \end{aligned} $$
(17.7)

if \(\tau ^s_\varepsilon =\tau ^s_\varepsilon (z(\cdot ))<+\infty \), and \(+\infty \) otherwise. Let also \(|\varDelta | =\sup _{i\in N}(t_{i+1} - t_i)\). The guaranteed result for a particular pursuit strategy \(\mathscr U_p\) may be evaluated as

$$\displaystyle \begin{aligned} \mathscr P^s( z^0,\mathscr U_p) = \lim_{ \varepsilon\rightarrow 0+} \mathscr P_\varepsilon^s( z^0,\mathscr U_p), \end{aligned} $$
(17.8)

where

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathscr P_\varepsilon^s( z^0, \mathscr U_p)=\lim_{|\varDelta|\rightarrow +0}\mathscr P_\varepsilon^s( z^0,\mathscr U_p,\varDelta), \\ \mathscr P_\varepsilon^s( z^0, \mathscr U_p,\varDelta)=\sup_{ z(\cdot)\in Z_p( z^0,\mathscr U_p,\varDelta)} \mathscr P_\varepsilon^s( z(\cdot)).\notag \end{array} \end{aligned} $$
(17.9)
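In simulations, the capture instant (17.6) and the payoff (17.7) are read off a computed broken line, and the operations in (17.8) and (17.9) amount to taking the worst case over the sampled evasion controls while refining ε and |Δ|. The sketch below is ours; dist_to_Ms and v_south are assumed to be user-supplied callables for the distance to \(M^s\) and for \(V^s\).

```python
import math

def payoff_south(ts, traj, dist_to_Ms, v_south, eps=1e-2):
    """Payoff of one Euler broken line, Eqs. (17.6)-(17.7)."""
    for t, z in zip(ts, traj):
        if dist_to_Ms(z) <= eps:      # first node in the eps-neighbourhood of M^s
            return t + v_south(z)     # tau_eps + V^s(z(tau_eps))
    return math.inf                   # M^s_eps is never reached
```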

For coupled pursuit and evasion strategies \(\mathscr U_p\) and \(\mathscr U_e\), the guaranteed result defined according to the described scheme is denoted by \(\mathscr P^s( z^0,\mathscr U_p,\mathscr U_e)\).

Similarly, we define the game with the second (north) alternative and the terminal set \(M^n\), the guaranteed payoff \(\mathscr P^n( z^0,\mathscr U_p)\) for \(z^0 \in Z\) and \(\mathscr U_p\), etc.

The game with free alternative terminates on \(M = M^s \cup M^n\). For \(z^0 \in Z\) and \(\mathscr U_p\), if P fixes the preferable alternative, he guarantees the payoff

$$\displaystyle \begin{aligned}\mathscr P( z^0,\mathscr U_p)=\min(\mathscr P^s( z^0,\mathscr U_p),\mathscr P^n( z^0,\mathscr U_p)).\end{aligned}$$

Our goal is to generate featured pursuit strategies for the game with free alternative and evaluate corresponding guaranteed payoffs.

17.3 Gradient Strategies

In the game with the south alternative, define the (universal) gradient pursuit strategy for P that generates the control according to the following relation [10]:

$$\displaystyle \begin{aligned} u^s_p(z)= \arg \min_{u_p \in U_p}\max_{u_e \in U_e} \frac{\partial V^s(z)}{\partial f(z, u_p,u_e)}, \; z\in Z. \end{aligned} $$
(17.10)

For \(z = (d_p, \xi , d_e)\) and (17.2), (17.3), where \(-\pi \le u_p, u_e \le \pi \), we have

$$\displaystyle \begin{aligned} \frac{\partial V^s(z)}{\partial f(z, u_p, u_e)}= \frac{ \sin ( u_p+\arccos d_p^{-1})-\beta \sin ( u_e -\xi+ \arccos d_e^{-1})}{(1-\beta)}. \end{aligned} $$
(17.11)

Therefore, the guaranteed pursuit strategy corresponds to \( u^s_p=-\arcsin d_p^{-1}\), i.e. P follows the south geodesic line at every state.

Moreover, the (universal) gradient evasion strategy, defined in a similar way, corresponds to \( u^s_e=\arcsin d_e^{-1}-(\pi -\xi )\), whereby at every state E flees along the continuation of the south geodesic line; then

$$\displaystyle \begin{aligned} \min_{ u_p}\max_{ u_e}\frac{\partial V^s(z)}{\partial f(z, u_p, u_e)}=\frac{\partial V^s(z)}{\partial f(z, u_p^s, u_e^s)}=-1, \end{aligned} $$
(17.12)

and

$$\displaystyle \begin{aligned} \min_{ u_p}\frac{\partial V^s(z)}{\partial f(z, u_p, u_e)}=\frac{\partial V^s(z)}{\partial f(z, u_p^s, u_e)}<-1 \end{aligned} $$
(17.13)

if \( u_e\neq u^s_e\) [6, 11].

The same results are valid for the north alternative with the evaluation function \(V^n\) and the controls \(u_p^n, u_e^n\).

Therefore, in the game with a fixed alternative, at any state, the payoff \(V^s\) or \(V^n\) is guaranteed if P follows the respective geodesic line and E retreats along its continuation.
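For reference, the closed-form geodesic controls quoted in this section can be packaged as feedback strategies in the second reduced space and passed, e.g., to the Euler sketch of Sect. 17.2. The code below merely transcribes the stated formulas (the names are ours); the north alternative is treated analogously.

```python
import math

def u_p_south(state):
    """Gradient pursuit control u_p^s = -arcsin(1/d_p) (south geodesic line)."""
    d_p, xi, d_e = state
    return -math.asin(1.0 / d_p)

def u_e_south(state):
    """Gradient evasion control u_e^s = arcsin(1/d_e) - (pi - xi)."""
    d_p, xi, d_e = state
    return math.asin(1.0 / d_e) - (math.pi - xi)

# Illustrative wiring with the earlier sketches (beta and the state are ours):
# f = lambda z, up, ue: rhs(z, up, ue, beta=0.5)
# ts, traj = euler_broken_line(f, (3.0, math.pi, 2.0), u_p_south,
#                              lambda t, z: u_e_south(z))
```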

17.4 Decomposition of the State Space

Let us denote by \(\mathscr U_p^0\div u^0_p: Z \rightarrow U_p\) the gradient pursuit strategy that at the state z generates the control \(u_p^0 (z^0,z)=u_p^s (z)\) (17.10) if the south geodesic line is preferable for P at the state \(z^0\) (or either of them if they are of equal length) and \(u_p^n (z)\) otherwise. If P updates the target alternative at any current state, we denote this universal strategy by \(\mathscr U_p^Z\div u^t_p: Z \rightarrow U_p\). Let us also use the similar notations \(\mathscr U_e^0\) and \(\mathscr U_e^Z\) for the corresponding evasion strategies.

At any state, P has two alternative ways to chase E with known guaranteed results evaluated with \(V^s\) or \(V^n\). Then the state space is filled with two families of ideal trajectories corresponding to the coupled geodesic pursuit-evasion strategies \(\mathscr U_p^0\) and \(\mathscr U_e^0\) (see Fig. 17.4). Starting at \(z^0 \in Z\), P using \(\mathscr U_p^0\) can guarantee the payoff equal to \(\min (V^s(z^0),V^n(z^0))\).

Consider now the guaranteed results when P can switch between the alternatives.

At the state \(z^0 \in Z\), an alternative (south or north) is called consistent (stable) if it dominates the other one at the initial state and also at any state that emerges when P and E move along the related geodesic line and its continuation. Let us divide the state space depending on the consistency of the relation of domination (see Fig. 17.5):

  • \(Z^s\) and \(Z^n\) are subsets where the particular alternative (south or north) strictly dominates the other one and is consistent,

    Fig. 17.5 A two-dimensional model of the state space: a partition of the state space

  • \(D^{s|n}\) separates \(Z^s\) and \(Z^n\), and \(V^s = V^n\) there,

  • \(Z^{\bar {s}}\) and \(Z^{\bar {n}}\) are subsets where the particular alternative strictly dominates the other and is not consistent, \(Z^{\bar {s|n}}=Z^{\bar {s}}\cup Z^{\bar {n}}\),

  • \(D^{\bar {s|n}}\) separates \(Z^{\bar {s}}\) and \(Z^{\bar {n}}\), and \(V^s = V^n\) there,

  • \(D^0 \subset D^{s|n}\) separates \(D^{s|n}\) and \(D^{\bar {s|n}}\),

  • \(B^s\) separates \(Z^s\) and \(Z^{\bar {s}}\); when the players move along the south geodesic line and its continuation, the south alternative strictly dominates the north one, and there exists exactly one instant at which the alternatives become equivalent,

  • \(B^n\) is defined similarly to \(B^s\).

In the game of obstacle tag (see Fig. 17.6), for a given \(\rho_p\), \(Z^{\bar {s|n}}\) consists of two curvilinear triangles. They are joined along the half-line from P through C. All \(D^0\) for different \(\rho_p\) lie on this half-line

$$\displaystyle \begin{aligned} y=1+\sqrt{\beta}, \;\;\;x\ge \sqrt{1-\beta}. \end{aligned} $$
(17.14)
Fig. 17.6 Decomposition of the reduced space for a given \(\rho_p\)

It is evident that \(Z^{\bar {s|n}}=\varnothing \) for \(\rho _p<\rho _p^*(\beta )\) where

$$\displaystyle \begin{aligned}\notag \rho_p^*(\beta)=\sqrt{(1-\beta)/\beta}. \end{aligned}$$

Therefore, the game may in fact be terminated on the subsurface \(\rho _p=\rho _p^*(\beta )\), and the state necessarily leaves \(Z^{\bar {s|n}}\) when P follows the geodesic line.

The dotted line in Fig. 17.6 shows the locus of E's terminal positions on the secondary domain \(\tilde D^0\) for different \(\rho_p\) in some known solutions of the game (see, e.g., [5, 6]). It has the parametric representation

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} x_e=&\displaystyle 1/\sqrt{1-(1-s)^2}, \end{array} \end{aligned} $$
(17.15)
$$\displaystyle \begin{aligned} \begin{array}{rcl} y_e=&\displaystyle \beta/\sqrt{\beta^2-(\beta-s)^2}, \; 0<s\le \beta. \end{array} \end{aligned} $$
(17.16)

The half-line \(D^0\) (see (17.14)) is a horizontal asymptote for it.
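The quantities of this section are easy to tabulate, e.g., for reproducing a picture like Fig. 17.6; the sketch below merely transcribes \(\rho_p^*(\beta)\), the half-line \(D^0\) (17.14), and the locus (17.15)-(17.16), with names of our choosing.

```python
import math
import numpy as np

def rho_p_star(beta):
    """Threshold rho_p^*(beta) below which Z^{bar s|n} is empty."""
    return math.sqrt((1.0 - beta) / beta)

def d0_half_line(beta):
    """Half-line D^0, Eq. (17.14): returns (y, x_min) with y = 1 + sqrt(beta), x >= sqrt(1 - beta)."""
    return 1.0 + math.sqrt(beta), math.sqrt(1.0 - beta)

def locus_points(beta, n=200):
    """Sample points of the locus (17.15)-(17.16) for 0 < s <= beta."""
    s = np.linspace(beta / n, beta, n)
    x_e = 1.0 / np.sqrt(1.0 - (1.0 - s)**2)
    y_e = beta / np.sqrt(beta**2 - (beta - s)**2)
    return x_e, y_e
```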

17.5 Alternative Pursuit Strategy with Memory

The strategy \(\mathscr U_p^Z\) is discontinuous on \(D=D^{s|n}\cup D^{\bar {s|n}}\). When the state enters the neighbourhood of \(D^{\bar {s|n}}\), piecewise-constant solutions of the inclusion (17.5) with \(\mathscr U_p^Z\) stay there for some time. For P, this may lead to control switching in a sliding mode, for which the payoff cannot be evaluated with the use of Euler broken lines.

For the initial state \(z^0\in Z^{\bar {s|n}}= Z^{\bar {s}}\cup D^{\bar {s|n}}\cup Z^{\bar {n}}\), let us allow P to remember the history of the game development and to update the target alternative no more than once, on \(B^s\) or \(B^n\), when the state leaves \(Z^{\bar {s|n}}\). Let us denote the corresponding strategy by \(\mathscr U_p^1\). This strategy prevents P from switching between the alternative strategies in the neighbourhood of \(D^{\bar {s|n}}\).

Let us evaluate the guaranteed result \(\mathscr P^{s|n}\) (see (17.8)) when \(z^0\in Z^{\bar {s|n}}\), P applies \(\mathscr U_p^1(z^0,z)\) for \(z\in Z^{\bar {s|n}}\), and \(\mathscr U_p^Z(z^0,z)\) for z in the rest of the state space. In order to do that, we set up and solve control optimization problems for the termination sets \(D^{\bar {s|n}}\), \(B^s\) and \(B^n\) (see Fig. 17.7 for \(z^0\in Z^{\bar {s}}\)). In doing so, we assume that E moves at the angle ψ in a straight line within \(Z^{\bar {s|n}}\) until the first instant when the state arrives on one of the terminal sets. The maximal of the three corresponding estimates determines the guaranteed result.

Fig. 17.7 A two-dimensional model of the state space: different options to leave \(Z^{\bar {s}}\)

On \(B^s\) and \(B^n\), the guaranteed results correspond to the gradient strategies (see Sect. 17.3). From these boundaries, the state shifts to \(D^0\), since \(B^s\) and \(B^n\) are themselves ideal trajectories for the coupled gradient strategies. Thus, in all cases, the state leaves the closure of \(Z^{\bar {s|n}}\) through \(D^0\) (see Fig. 17.7).

If the state under E's control first reaches \(D^{\bar {s|n}}\) and then moves along it to \(D^0\) (see Fig. 17.7), the guaranteed result is described in [7]. It turns out that the preliminary straight line is tangent to the curvilinear motion along \(D^{\bar {s|n}}\) (Isbell's Case 8 in [4]), where at the state \((\rho _p,x_e,y_e)\in D^{\bar {s|n}}\) E chooses the angle [7]

$$\displaystyle \begin{aligned}\psi^{D^{\bar{s|n}}}=\arcsin \frac{\displaystyle (y_e-1)} {\displaystyle \beta\sqrt{1+\rho_p^2}}+\arcsin \frac{\displaystyle 1}{\displaystyle \sqrt{1+\rho_p^2}}.\end{aligned}$$
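A direct transcription of this angle is given below; the domain assumption \(|y_e-1|\le \beta \sqrt {1+\rho _p^2}\), which keeps the arcsine defined, is ours.

```python
import math

def psi_on_D(rho_p: float, y_e: float, beta: float) -> float:
    """E's heading on D^{bar s|n} (Isbell's Case 8), as quoted above.

    Assumes |y_e - 1| <= beta * sqrt(1 + rho_p**2) so the arcsine is defined.
    """
    r = math.sqrt(1.0 + rho_p**2)
    return math.asin((y_e - 1.0) / (beta * r)) + math.asin(1.0 / r)
```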

It is important to note that Isbell's solution for Case 7 [4], in which the state shifts to \(D^0\) directly from within \(Z^{\bar {s|n}}\), is simply infeasible.

Thus, for the initial state \(z^0\in Z^{\bar {s|n}}\), at any current state \(z\in Z^{\bar {s|n}}\) with the reduced coordinates \((\rho_p, x_e, y_e)\), the angle ψ chosen by E determines the instants \(\tau ^{s|n},\tau ^{\bar {s}},\tau ^{\bar {n}}\) when the state arrives on \(D^{\bar {s|n}}\), \(B^s\), \(B^n\), and the associated payoffs. The maximal of them defines the guaranteed result \(\mathscr P^{s|n}\) for the described pursuit strategy. As numerical simulations show, in the secondary optimization problem, the preferable option for E always corresponds to the case when E shifts from the initial states in \(Z^{\bar {s}}\) onto \(B^{\bar {n}}\), and onto \(B^{\bar {s}}\) from \(Z^{\bar {n}}\). In this case, E takes the secant line with the minimal angle for which he gets from \(Z^{\bar {s}}\) onto \(B^{\bar {n}}\) missing \(B^{\bar {s}}\), or from \(Z^{\bar {n}}\) onto \(B^{\bar {s}}\) missing \(B^{\bar {n}}\).
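Structurally, the evaluation of \(\mathscr P^{s|n}\) thus reduces to an outer maximization over E's heading ψ of the best of the termination estimates; the schematic sketch below illustrates only this outer loop, with the payoff evaluations left as hypothetical user-supplied callables.

```python
import math
import numpy as np

def guaranteed_result_candidate(z, payoffs, n_psi=720):
    """Outer maximization over E's heading psi of the termination estimates.

    `payoffs` is a list of callables payoff(z, psi), one per termination set
    (e.g. D^{bar s|n}, B^s, B^n); each is assumed to return -math.inf when
    E's straight-line motion at the angle psi from z does not reach its set.
    """
    best = -math.inf
    for psi in np.linspace(-math.pi, math.pi, n_psi, endpoint=False):
        best = max(best, max(payoff(z, psi) for payoff in payoffs))
    return best
```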

Let \((d_p, \pi , 1)\) be the state vector in the second reduced space. An example of the optimal evasion trajectory generated with the described approach and providing the guaranteed result is shown in Fig. 17.8 (\(D^{\bar {s|n}}\), \(B^s\), \(B^n\), etc. are given for \(t=\tau ^{D^0}\)). Detailed descriptions of the solutions of the obstacle tag game mentioned in, e.g., [1, 5, 6] are not available. However, it may be safely suggested that for the states with E on the obstacle and collinear with P and C, the known value functions take the value defined on straight-line motions of the players along the segment PE and its continuation. Then, the guaranteed result for them may be evaluated as \(\tau ^{\tilde D^0}+V^s(d_p-\tau ^{\tilde D^0},\pi ,1+\beta \tau ^{\tilde D^0})\) (see (17.2)), where \(\tau ^{\tilde D^0}\) is the first instant when E gets on \(\tilde D^0\) (17.15) (see also Fig. 17.7). It is also worth noting that for 0.1 ≤ β ≤ 0.9 and \(\rho _p >\rho _p^*(\beta )\), the maximum relative difference between these values and the guaranteed results for the described pursuit strategy with memory is about 1%.

Fig. 17.8 An example of optimal evasion in \(Z^{\bar {s}}\)

17.6 Conclusion

In the situations shown in Fig. 17.1, evaluating the guaranteed result as corresponding to the south geodesic line appears too pessimistic for P. On the other hand, if P chooses the shortest geodesic at any current state, this feedback strategy is discontinuous for collinear P, C, E. The resulting pencil of trajectories approximated by Euler broken lines does not include the associated trajectory in which P and E move along the half-line. To form a pursuit strategy and to evaluate the guaranteed result, e.g., J. Breakwell and A. Melikyan described the fields of trajectories for the coupled optimal strategies of the players with two switching envelopes or equivocal lines [1, 5, 6]. The construction of such fields involves a cumbersome integration of the characteristic equations for the Hamilton-Jacobi-Isaacs equation.

We considered the obstacle tag game as an alternative pursuit game. The state space was divided into several parts depending on the domination and consistency features of the alternatives at the initial state. In the parts corresponding to situations similar to that shown in Fig. 17.1, the generated strategy with memory allows P to switch between the alternatives only a finite number of times, on their boundaries. The guaranteed results fit the evasion strategy whereby E takes a secant line to the obstacle until the state arrives on the boundary. Therefore, for the states in this special region, P uses a strategy that does not depend on the current position of E until the state reaches the boundary.

The approach can be modified to handle games with convex obstacles of different shapes; see, e.g., [6]. However, the decomposition of the state space will be asymmetric, and there will be different termination options for the alternatives when solving the secondary control problems for E.