
1 Introduction

Differential games are games played by agents, also called players, who jointly control (through their actions over time, as inputs) a dynamical system described by differential state equations. Hence, the game evolves over a continuous-time horizon (with the length of the horizon known to all players, as common knowledge), and over this horizon each player is interested in optimizing a particular objective function (generally different for different players) which depends on the state variable describing the evolution of the game, on the player’s own action variable, and also possibly on other players’ action variables. The objective function for each player could be a reward (or payoff, or utility) function, in which case the player is a maximizer, or it could be a cost (or loss) function, in which case the player would be a minimizer. In this chapter we adopt the former, and this clearly brings in no loss of generality, since optimizing the negative of a reward function would make the corresponding player a minimizer. The players determine their actions in a way to optimize their objective functions, by also utilizing the information they acquire on the state and other players’ actions as the game evolves, that is, their actions are generated as a result of the control policies they design as mappings from their information sets to their action sets. If there are only two players and their objective functions add up to zero, then this captures the scenario of two totally conflicting objectives – what one player wants to minimize the other one wants to maximize. Such differential games are known as zero-sum differential games. Otherwise, a differential game is known to be nonzero-sum.

The study of differential games (more precisely, zero-sum differential games) was initiated by Rufus Isaacs at the Rand Corporation through a series of memoranda in the 1950s and early 1960s. His book Isaacs (1965), published in 1965 after a long delay due to classification of the material it covered, is still considered the starting point of the field. The early books following Isaacs, such as those by Blaquière et al. (1969), Friedman (1971), and Krassovski and Subbotin (1977), all dealt (in most part) with two-player zero-sum differential games. Indeed, initially the focal point of differential games research stayed within the zero-sum domain and was driven by military applications and the presence of antagonistic elements. The topic of two-player zero-sum differential games is covered in some detail in a separate chapter (TPZSDG) of this Handbook.

Motivated and driven by applications in management science, operations research, engineering, and economics (see, e.g., Sethi and Thompson 1981), the theory of differential games was then extended to the case of many players controlling a dynamical system while playing a nonzero-sum game. It soon became clear that nonzero-sum differential games present a much richer set of features than zero-sum differential games, particularly with regard to the interplay between information structures and nature of equilibria. Perhaps the very first paper on this topic, by Case, appeared in 1969, followed closely by a two-part paper by Starr and Ho (1969a,b). This was followed by the publication of a number of books on the topic, by Leitmann (1974), and by Başar and Olsder (1999), with the first edition dating back to 1982, Mehlmann (1988), and Dockner et al. (2000), which focuses on applications of differential games in economics and management science. Other selected key book references are the ones by Engwerda (2005), which is specialized to linear-quadratic differential (as well as multistage) games, Jørgensen and Zaccour (2004), which deals with applications of differential games in marketing, and Yeung and Petrosjan (2005), which focuses on cooperative differential games.

This chapter is on noncooperative nonzero-sum differential games, presenting the basics of the theory, illustrated by examples. It is based in most part on material in Chaps. 6 and 7 of Başar and Olsder (1999) and Chap. 7 of Haurie et al. (2012).

2 A General Framework for m-Player Differential Games

2.1 A System Controlled by m Players

2.1.1 System Dynamics

Consider an n-dimensional dynamical system controlled by a set of m players over a time interval [t0, T], where T > t0 is a final time that is either given as part of the data or determined endogenously as the time of reaching a given target, as detailed below. For future use, let M = {1, …, m} denote the set of players. This dynamical system has the following elements:

  1. 1.

    A state variable \(x\in X\subset {\mathbb {R}}^{n}\), and for each player j ∈ M, a control vector \(u_{j}\in U_{j}\subset \mathbb {R}^{p_{j}}\), where X and Uj’s are open domains.

  2. 2.

    A state equation (which is an n-dimensional ordinary differential equation) and an initial value for the state (at time t0)

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{x}(t) &\displaystyle =&\displaystyle f(x(t),u(t),t), {} \end{array} \end{aligned} $$
    (2.1)
    $$\displaystyle \begin{aligned} \begin{array}{rcl} x(t^{0}) &\displaystyle =&\displaystyle x^{0}, {} \end{array} \end{aligned} $$
    (2.2)

    where {x(t) : t ∈ [t0, T]} is the state trajectory and \( \{u(t)\triangleq (u_{1}(t),\ldots ,u_{m}(t)):t\in \lbrack t^{0},T]\}\) is the control (or action) schedule (or simply the control) chosen by the m players, with uj(⋅) generated by player j as a pj -dimensional function. Here \(\dot {x}(t)\) denotes the time derivative \( \displaystyle{\frac {d}{dt}}x(t)\). The function \(f(\cdot ,\cdot ,\cdot ): \mathbb {R}^{n}\times \mathbb {R}^{p_{1}+\dots +p_{m}}\times \mathbb {R}\mapsto \mathbb {R}^{n}\) is assumed to be continuously differentiable (of class C1) in x, u, and t.

  3. 3.

    If the control vector generated by the m players is a measurable function of t, or more simply a piecewise continuous function, there is a unique state trajectory solution of (2.1) and (2.2), and each player j ∈ M receives a cumulative reward over the time horizon [t0, T]:

    $$\displaystyle \begin{aligned} J_{j}(u(\cdot );x^{0},t^{0})=\int_{t^{0}}^{T}g_{j}(x(t),u(t),t)\>dt+S_{j}(x(T),T), {} \end{aligned} $$
    (2.3)

    where gj is player j’s instantaneous reward rate and Sj is the terminal reward, also called salvage value function. The functions \( g_{j}(\cdot ,\cdot ,\cdot ):\mathbb {R}^{n}\times \mathbb {R}^{p_{1}+\dots +p_{m}}\times \mathbb {R}\mapsto \mathbb {R}\), j ∈ M, are assumed to be continuously differentiable in x, u, and t, and \(S_{j}(\cdot ,\cdot ): \mathbb {R}^{n}\times \mathbb {R}\mapsto \mathbb {R}\), j ∈ M, are assumed to be continuously differentiable in x and t.

2.1.2 Control Constraints

The choice of a control by player j is subject to a pointwise constraint for each t ∈ [t0, T]

$$\displaystyle \begin{aligned} u_{j}(t)\in U_{j},\quad t\in \lbrack t^{0},T], {} \end{aligned} $$
(2.4)

where Uj is referred to as the player’s admissible pointwise control set. In a more general setting, the admissible control set may depend on time t and state x(t). Then, the choice of a control is subject to a constraint

$$\displaystyle \begin{aligned} u_{j}(t)\in U_{j}(x(t),t),\quad t\in \lbrack t^{0},T], {} \end{aligned} $$
(2.5)

where the correspondence, or point-to-set mapping \(\left \{ U_{j}(\cdot ,\cdot ):\mathbb {R}^{n}\times \mathbb {R}\mapsto 2^{\mathbb {R} ^{p_{j}}}\right \} \) is assumed to be upper-semicontinuous. In such a case, player j will of course also have to have access to the current value of the state, which brings in the question of what information a player has to have access to before constructing her control; this is related to the information structure of the differential game, without which the formulation of a differential game would not be complete. Information structures will be introduced shortly, in the next subsection.

2.1.3 Target

The terminal time T can either be prespecified (as part of the initial data), \(T\in \mathbb {R}^+\), or determined endogenously as the time when the state trajectory reaches a target. The target is a surface or manifold defined by an equation of the form

$$\displaystyle \begin{aligned} \Theta(x,t)=0{,} {} \end{aligned} $$
(2.6)

where \(\Theta (\cdot ,\cdot ):\mathbb {R}^n\times \mathbb {R}\mapsto \mathbb {R}\) is continuously differentiable. The trajectory ends (reaches the target), and the rewards are computed, at the first time T when the condition Θ(x(T), T) = 0 is satisfied.

2.1.4 Infinite-Horizon Games

In economic and engineering applications, one also considers games where the terminal time T tends to \(\infty\). The payoff to player j is then defined as

$$\displaystyle \begin{aligned} J_{j}(u(\cdot );x^{0},t_{0})={\int}_{0}^{\infty }e^{-\rho _{j}t}g_{j}(x(t),u(t)) dt. {} \end{aligned} $$
(2.7)

Note that player j’s payoff does not include a terminal reward and the reward rate depends explicitly on the running time t through a discount factor \(e^{-{\rho }_{j}t}\), where ρj is a discount rate satisfying ρj ≥ 0, which could be player dependent. An important issue in an infinite-horizon dynamic optimization problem (one-player version of the problem above) is the fact that when the discount rate ρj is set to zero, then the integral payoff (2.7) may not be well defined, as the integral may not converge to a finite value for all feasible control paths u(⋅), and in some cases for any of them. In such situations, one has to rely on a different notion of optimality, e.g., overtaking optimality, a concept well developed in Carlson et al. (1991). We refer the reader to the next chapter (Chap. 3) for a deeper discussion of this topic.

2.2 Information Structures and Strategies

2.2.1 Open Loop Versus State Feedback

To complete the formulation of a differential game, one has to describe precisely the information available to each player (regarding the state and past actions of other players) when they choose their controls at time t. Let us first focus on two information structures of common use in applications of differential games, namely, open-loop and state-feedback information structures. Letting ν(t) denote the information available to a generic player at time t, we say that the information structure is open loop if

$$\displaystyle \begin{aligned} \nu (t) =\{x^{0},t\}, \end{aligned}$$

that is, the available information is the current time and the initial state. An information structure is state feedback if

$$\displaystyle \begin{aligned} \nu (t) =\{x(t),t\},\end{aligned} $$

that is, the available information is the current state of the system in addition to the current time. We say that a differential game has open-loop (respectively, state-feedback) information structure if every player in the game has open-loop (respectively, state-feedback) information. It is of course possible for some players to have open-loop information while others have state-feedback information, but we will see later that such a mixed information structure does not lead to a well-defined differential game unless the players who have access to the current value of the state also have access to the initial value of the state, that is,

$$\displaystyle \begin{aligned} \nu (t) =\{x(t),x^{0},t\}. \end{aligned}$$

Another more general information structure is the one with memory, known as closed-loop with memory, where at any time t a player has access to the current value of the state and also recalls all past values, that is,

$$\displaystyle \begin{aligned} \nu (t) =\{x(s),s\leq t\}. \end{aligned}$$

The first two information structures above (open loop and state feedback) are common in optimal control theory, i.e., when the system is controlled by only one player. In optimal control of a deterministic system, the two information structures are in a sense equivalent. Typically, an optimal state-feedback control is obtained by “synthesizing” the optimal open-loop controls defined from all possible initial states.Footnote 1 It can also be obtained by employing dynamic programming or equivalently Bellman’s optimality principle (Bellman 1957). The situation is, however, totally different for nonzero-sum differential games. The open-loop and state-feedback information structures generally lead to two very different types of differential games, except for the cases of two-player zero-sum differential games (see Chap. 8, “Zero-sum Differential Games” in this Handbook and also our brief discussion later in this chapter) and differential games with identical objective functions for the players (known as dynamic teams, which are equivalent to optimal control problems as we are dealing with deterministic systems) – or differential games that are strategically equivalentFootnote 2 to zero-sum differential games or dynamic team problems. Now, to understand the source of the difficulty in the nonequivalence of two differential games that differ (only) in their information structures, consider the case when the control sets are state dependent, i.e., uj(t) ∈ Uj(x(t), t). In the optimal control case, when the only player who controls the system selects a control schedule, she can compute also the associated unique state trajectory. In fact, selecting a control amounts to selecting a trajectory. So, it may be possible to select jointly the control and the associated trajectory to ensure that at each time t the constraint u(t) ∈ U(x(t), t) is satisfied; hence, it is possible to envision an open-loop control for such a system. Now, suppose that there is another player involved in controlling the system; let us call them players 1 and 2. When player 1 defines her control schedule, she does not know the control schedule of the other player, unless there has been an exchange of information between the two players and a tacit agreement to coordinate their choices of control. Therefore, player 1, not knowing what player 2 will do, cannot decide in advance if her control at time t will be in the admissible set U1(x(t), t) or not. Hence, in that case, it is impossible for the players to devise feasible and implementable open-loop controls, whereas this would indeed be possible under the state-feedback information structure. The difference between the two information structures is in fact even more subtle, since even when the admissible control sets are not state dependent, knowing at each instant t what the state x(t) is, or not having access to this information will lead to two different types of noncooperative games in normal form as we will see in the coming sections.

2.2.2 Strategies

In game theory one calls strategy (or policy or law) a rule that associates an action to the information available to a player at a position of the game. In a differential game, a strategy γj for player j is a function that associates to each possible information ν(t) at t, a control value uj(t) in the admissible control set. Hence, for each information structure we have introduced above, we will have a different class of strategies in the corresponding differential game. We make precise below the classes of strategies corresponding to the first two information structures, namely, open loop and state feedback.

Definition 1

Assuming that the admissible control sets Uj are not state dependent, an open-loop strategy γj for player j (j ∈ M) selects a control action according to the rule

$$\displaystyle \begin{aligned}u_{j}(t)={\gamma}_{j}(x^{0},t),\quad\forall x^{0},\forall t,j\in M, {} \end{aligned} $$
(2.8)

where \(\gamma _{j}(\cdot ,\cdot ):\mathbb {R}^{n}\times \mathbb {R}\mapsto U_{j}\) is a function measurable (or piecewise continuous) in t, for each fixed x0. The class of all such strategies for player j is denoted by \({\Gamma }_{j}^{\mathrm {OL}}\) or simply by Γj.

Definition 2

A state-feedback strategy γj for player j (j ∈ M) selects a control action according to a state-feedback rule

$$\displaystyle \begin{aligned} u_{j}(t)=\gamma _{j}(x(t),t){,}\quad j\in M{,} \end{aligned} $$
(2.9)

where \(\gamma _{j}(\cdot ,\cdot ):(x,t)\in \mathbb {R}^{n}\times \mathbb {R} \mapsto U_{j}(x,t)\) is a given function that must satisfy the required regularity conditions imposed on feedback controls.Footnote 3 The class of all such strategies for player j is denoted by \(\Gamma _{j}^{ \mathrm {SF}}\) or simply by Γj.

Remark 1

In the literature on dynamic/differential games, state-feedback strategy is sometimes called “Markovian,” in contrast to “open loop,” with the argument being that the former implies less “commitment” than the latter. Such an interpretation is misleading on two counts. First, one can actually view both classes of strategies as Markovian, since, at each time t, they exploit only the information received at time t. The strategies do not exploit the history of the information received up to time t, which is in fact not available. Second, in both cases, a strategy is a full commitment. Using an open-loop strategy means that the player commits, at the initial time, to a fixed time path for her control, that is, her choice of control at each instant of time is predetermined. When using a state-feedback strategy , a player commits to the use of a well-defined servomechanism to control the system, that is, her reaction to the information concerning the state of the system is predetermined. The main advantages of state-feedback strategies lie elsewhere: (i) state-feedback strategies are essential if one has a stochastic differential game (a differential game where the state dynamics are perturbed by disturbance (or noise) with a stochastic description); in fact, if we view a deterministic differential game as the “limit” of a sequence of stochastic games with vanishing noise, we are left with state-feedback strategies. (ii) State-feedback strategies allow us to introduce the refined equilibrium solution concept of “subgame-perfect Nash equilibrium,” which is a concept much appreciated in economic applications, and will be detailed below.

3 Nash Equilibria

Recall the definition of a Nash equilibrium for a game in normal form (equivalently, strategic form).

Definition 3

With the initial state x0 fixed, consider a differential game in normal form, defined by a set of m players, M={1, …, m}, and for each player j(j ∈ M) a strategy set Γj and a payoff function

$$\displaystyle \begin{aligned} {\bar J}_j:\Gamma_1\times\dots\times\Gamma_j\times\dots\times\Gamma_m\mapsto \mathbb{R}, \quad j\in M. \end{aligned}$$

A Nash equilibrium is a strategy m-tuple \(\gamma ^*=(\gamma _1^*, \dots ,\gamma _m^*)\), such that for each player j the following holds:

$$\displaystyle \begin{aligned} {\bar J}_j(\gamma^*)\ge {\bar J}_j([\gamma_j, \gamma^{*}_{-j}]), \;\; \forall \gamma_j\in \Gamma_j{,} \end{aligned} $$
(2.10)

where \(\gamma ^{*}_{-j} := (\gamma _i^*:i\in M\setminus j)\) and \( [\gamma _j, \gamma ^*_{-j}]\) is the m-tuple obtained when, in γ∗, \( \gamma _j^*\) is replaced by γj. In other words, in a Nash equilibrium, for each player j, the strategy \(\gamma _j^*\) is the best reply to the (m − 1)-tuple of strategies \(\gamma ^{*}_{-j}\) chosen by the other players.

Corresponding to the first two information structures we have introduced for differential games, we will now define two different games in normal form, leading to two different concepts of Nash equilibrium for nonzero-sum differential games.

3.1 Open-Loop Nash Equilibrium (OLNE)

Assume that the admissible control sets Uj, j ∈ M are not state dependent. If the players use open-loop strategies (2.8), each γj defines a unique control schedule uj(⋅) : [t0, T]↦Uj for each initial state x0. The payoff functions for the normal form game are defined by

$$\displaystyle \begin{aligned} {\bar J}_j(\gamma)=J_{j} (u(\cdot );x^{0},t^{0}),\quad j\in M, \end{aligned} $$
(2.11)

where Jj(⋅;⋅, ⋅) is the reward function defined in (2.3). Then, we have the following definition:

Definition 4

The control m-tuple \(u^*(\cdot )=\left ( u_{1}^{\ast }(\cdot ),\ldots , {u} _{m}^{\ast }(\cdot )\right ) \) is an open-loop Nash equilibrium (OLNE) at (x0, t0) if the following holds:

$$\displaystyle \begin{aligned} J_{j}(u^{\ast }(\cdot );x^{0},t^{0})\geq J_{j}([u_{j}(\cdot ),u_{-j}^{\ast }(\cdot )];x^{0},t^{0}),\quad\forall u_{j}(\cdot ),j\in M, \end{aligned}$$

where uj(⋅) is any admissible control of player j and \( [u_{j}(\cdot ),u_{-j}^\ast (\cdot )]\) is the m-tuple of controls obtained by replacing the j-th block component in u(⋅) by uj(⋅).

Note that in the OLNE, for each player j, \(u_{j}^{\ast }(\cdot )\) solves the optimal control problem

$$\displaystyle \begin{aligned} \displaystyle{\max_{u_{j}(\cdot )}\left\{ \int_{t^{0}}^{T}g_{j}\left(x(t),[u_{j}(t),u_{-j}^{\ast }(t)],t\right) dt+S_{j}(x(T))\right\}}, \end{aligned}$$

subject to the state equation

$$\displaystyle \begin{aligned} \dot{x}(t):=\frac{d}{dt}x(t)=f\left(x(t),[u_{j}(t),u_{-j}^{\ast }(t)],t\right) ,\quad x(t^{0})=x^{0}, {} \end{aligned} $$
(2.12)

control constraints uj(t) ∈ Uj, and target Θ(⋅, ⋅). Further note that OLNE strategies will in general also depend on the initial state x0, but this is information available to each player under the open-loop information structure.

3.2 State-Feedback Nash Equilibrium (SFNE)

Now consider a differential game with the state-feedback information structure. The system is then driven by a state-feedback strategy m-tuple γ(x, t)=(γj(x, t):j ∈ M), with \( \gamma _j \in \Gamma _j^{\mathrm {SF}}\) for j ∈ M. Its dynamics are thus defined by

$$\displaystyle \begin{aligned} \dot{ {x}}(t):=\frac{d }{dt}{x}(t)=f({x}(t), {\gamma }({x}(t), t),t) ,\quad x(t^{0})= {x}^{0}. {} \end{aligned} $$
(2.13)

The normal form of the game, at ( x0, t0), is now defined by the payoff functionsFootnote 4

$$\displaystyle \begin{aligned} {\bar{J}}_{j}({\gamma };{x}^{0},t^{0})=\int_{t^{0}}^{T}g_{j}( x(t),{ \gamma }(x(t),t),t) dt+S_{j}(x(T)), \end{aligned} $$
(2.14)

where, for each fixed x0, \(x(\cdot ):[t^{0},T]\mapsto \mathbb {R}^{n}\) is the state trajectory solution of (2.13).

In line with the convention in the OL case, let us introduce the notation

$$\displaystyle \begin{aligned} {\gamma }_{-j}(t, {x}(t))\triangleq \left( \gamma _{1}({x}(t), t),\ldots ,\gamma _{j-1}({x}(t), t),\gamma _{j+1}({x}(t), t),\ldots ,\gamma _{m}({x} (t), t)\right){,} \end{aligned}$$

for the strategy (m − 1)-tuple where the strategy of player j does not appear.

Definition 5

The state-feedback m-tuple \({\gamma }^{\ast }=\left ( \gamma _{1}^{\ast },\ldots ,\gamma _{m}^{\ast }\right ) \) is a state-feedback Nash equilibrium (SFNE) onFootnote 5 \( X\times \left [ t^{0},\mathcal {T}\right ] \) if for any initial data \(({x} ^{0},t^{0})\in X\times [0,\mathcal {T}\,] \subset \mathbb {R} ^{n}\times \mathbb {R}^{+},\) the following holds:

$$\displaystyle \begin{aligned} \bar{J}_{j}({\gamma }^{\ast };{x}^{0},t^{0})\geq \bar{J}_{j}([\gamma _{j}(\cdot ),{\gamma }_{-j}^{\ast }(\cdot )];{x}^{0},t^{0}),\quad\forall \gamma _{j}\in \Gamma _{j}^{SF},\;j\in M, \end{aligned}$$

where [γj, γ−j∗] is the m-vector of strategies obtained by replacing the j-th block component in γ∗ by γj.

In other words, \(\{u_{j}^{\ast }(t)\equiv \gamma _{j}^{\ast }({x}^{\ast }(t),t):t\in \lbrack t^{0},T]\}\), where x(⋅) is the equilibrium trajectory generated by γ from (x0, t0), solves the optimal control problem

$$\displaystyle \begin{aligned} \displaystyle{\max_{u_{j}(\cdot )}\left\{ \int_{t^{0}}^{T}g_{j}\left(x(t),[u_{j}(t),{\gamma }_{-j}^{\ast }(x(t),t)],t\right) dt+S_{j}(x(T))\right\}}, {} \end{aligned} $$
(2.15)

subject to the state equation

$$\displaystyle \begin{aligned} \dot{{x}}(t)=f({x}(t),\left[ u_{j}(t),{\gamma }_{-j}^{\ast }({x}(t),t)\right] ,t),\ {x}(t^{0})={x}^{0}, {} \end{aligned} $$
(2.16)

control constraints uj(t) ∈ Uj(x(t), t), and target Θ(⋅, ⋅). We can also say that \(\gamma _{j}^{\ast }\) is the optimal state-feedback control \(u_{j}^{\ast }(\cdot )\) for the problem (2.15) and (2.16). We also note that the single-player optimization problem (2.15) and (2.16) is a standard optimal control problem whose solution can be expressed in a way compatible with the state-feedback information structure, that is, solely as a function of the current value of the state and current time, and not as a function of the initial state and initial time. The remark below further elaborates on this point.

Remark 2

Whereas an open-loop Nash equilibrium is defined only for the given initial data, here the definition of a state-feedback Nash equilibrium asks for the equilibrium property to hold for all initial points, or data, in a region \( X\times \left [ t^{0},\mathcal {T}\right ] \subset \mathbb {R}^{n}\times \mathbb { R}^{+}\). This is tantamount to asking a state-feedback Nash equilibrium to be subgame perfect (Selten 1975), in the parlance of game theory, or strongly time consistent (Başar 1989). Indeed, even if the state trajectory is perturbed, either because a player has had a “trembling hand” or an unforeseen small shock happened, holding on to the same state-feedback strategy will still constitute a Nash equilibrium in the limit as the perturbations vanish; this property is more pronounced in the case of linear-quadratic differential games (games where the state dynamics are linear, payoff functions are jointly quadratic in the state and the controls, and the time horizon is fixed), in which case the stochastic perturbations in the state equation do not have to be vanishingly small as long as they have zero mean (Başar 1976, 1977). It should be clear that open-loop Nash equilibrium strategies do not possess such a property.

3.3 Necessary Conditions for a Nash Equilibrium

For the sake of simplicity in the exposition below, we will henceforth restrict the target set to be defined simply by a prescribed terminal time, that is, the set {(t, x) : t = T}. Also, the control constraint sets Uj, j ∈ M, will be taken to be independent of the state and time. As noted earlier, at a Nash equilibrium, each player solves an optimal control problem where the system’s dynamics are influenced by the strategic choices of the other players. We can thus write down necessary optimality conditions for each of the m optimal control problems, which will then constitute a set of necessary conditions for a Nash equilibrium. Throughout, we make the assumption that sufficient regularity holds so that all the derivatives that appear in the necessary conditions below exist.

3.3.1 Necessary Conditions for an OLNE

By using the necessary conditions for an open-loop optimal control, obtained, e.g., from the maximum principle (see, e.g., Başar and Olsder 1999; Bryson et al. 1975), we arrive at the conditions (2.17), (2.18), (2.19), (2.20), and (2.21) below, which are necessary for an open-loop Nash equilibrium. Let us introduce the individual Hamiltonians, with Hj being the Hamiltonian for player j, Footnote 6

$$\displaystyle \begin{aligned} H_{j}(x,u,{\lambda }_{j},t)=g_{j}(x,u,t)+{\lambda }_{j}(t)f(x,u,t), {} \end{aligned} $$
(2.17)

where λj(⋅) is the adjoint (or costate) variable, which satisfies the adjoint variational equation (2.18), along with the transversality condition (2.19):

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{{\lambda }}_{j}(t) &\displaystyle =&\displaystyle -\frac{\partial }{\partial x}H_{j}|{}_{x^{\ast }(t),u^{\ast }(t),t}, {} \end{array} \end{aligned} $$
(2.18)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\lambda }_{j}(T) &\displaystyle =&\displaystyle \frac{\partial }{\partial x}S_{j}|{}_{x^{\ast }(T),T.} {} \end{array} \end{aligned} $$
(2.19)

Further, Hj is maximized with respect to uj, with all other players’ controls fixed at NE, that is,

$$\displaystyle \begin{aligned} u_{j}^{\ast }(t)=\arg \max_{u_{j}\in U_{j}}H_{j}(x^{\ast }(t),u_{j},u_{-j}^{\ast }(t),{\lambda }_{j}(t),t). {} \end{aligned} $$
(2.20)

If the solution to the maximization problem above is in the interior of Uj, then naturally a necessary condition is for the first derivative to vanish at \(u_{j}^{\ast }\) for all t, that is,

$$\displaystyle \begin{aligned} \frac{\partial }{\partial u_{j}}H_{j}|{}_{x^{\ast }(t),u^{\ast }(t),t}=0{,} {} \end{aligned} $$
(2.21)

and for the Hessian matrix of second derivatives (with respect to uj) to be nonpositive definite.
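Conditions (2.18), (2.19), (2.20), and (2.21), together with the state equation (2.12), constitute a two-point boundary value problem: the state runs forward from x0 while the costates run backward from their terminal values (2.19). The following minimal numerical sketch (not part of the original exposition) illustrates this structure for a two-player scalar game with dynamics \(\dot{x}=u_1+u_2-\alpha x\), reward rates \(g_j=u_j(\kappa -u_j/2)-(\varphi /2)x^2\), and Sj ≡ 0, that is, a finite-horizon version of the example of Sect. 3.7; all parameter values are arbitrary, and SciPy’s solve_bvp is used as the boundary value solver:

```python
import numpy as np
from scipy.integrate import solve_bvp

# OLNE necessary conditions as a two-point boundary value problem (illustrative values only)
alpha, kappa, phi, T, x0 = 0.5, 1.0, 0.8, 10.0, 2.0

def rhs(t, y):
    x, lam1, lam2 = y                       # state and the two costates
    u1, u2 = kappa + lam1, kappa + lam2     # interior maximizers of H_1, H_2, cf. (2.20)
    dx = u1 + u2 - alpha * x                # state equation (2.12)
    dlam1 = phi * x + alpha * lam1          # adjoint equation (2.18): -dH_1/dx
    dlam2 = phi * x + alpha * lam2
    return np.vstack([dx, dlam1, dlam2])

def bc(ya, yb):
    # x(0) = x0, and transversality (2.19) with S_j = 0: lambda_j(T) = 0
    return np.array([ya[0] - x0, yb[1], yb[2]])

mesh = np.linspace(0.0, T, 200)
guess = np.zeros((3, mesh.size))
guess[0] = x0
sol = solve_bvp(rhs, bc, mesh, guess)
u1_star = kappa + sol.sol(mesh)[1]          # OLNE control path of player 1
print("converged:", sol.status == 0, "| u_1*(0) ≈", round(float(u1_star[0]), 4))
```

Here the interior maximization (2.20) gives uj = κ + λj, so the OLNE controls are read off from the costate trajectories returned by the solver.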

3.3.2 Necessary Conditions for SFNE

The state-feedback NE can be obtained in various different ways. One could again use the approach above, but paying attention to the fact that in the optimal control problem faced by a generic player, the other players’ strategies are now dependent on the current value of the state. A second approach would be to adapt to this problem the method used in optimal control to directly obtain state-feedback controls (i.e., dynamic programming). We discuss both approaches here, starting with the former in this subsection. The Hamiltonian for player j is again:

$$\displaystyle \begin{aligned} H_{j}(x,u,{\lambda }_{j},t)=g_{j}(x,u,t)+{\lambda }_{j}(t)f(x,u,t). {} \end{aligned} $$
(2.22)

The controls ui for i ∈ M ∖ j are now defined by the state-feedback rules \(\gamma _{i}^{\ast }(x,t)\). Along the equilibrium trajectory {x(t) : t ∈ [t0, T]}, the optimal control of player j is \(u_{j}^{\ast }(t)=\gamma _{j}^{\ast }(x^{\ast }(t),t)\). Then, as the counterpart of (2.18) and (2.19), we have λj(⋅) satisfying (as a necessary condition)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{{\lambda }}_{j}(t) &\displaystyle =&\displaystyle -\left( \frac{\partial }{\partial x} H_{j}+\sum_{i\in M\setminus j}\frac{\partial }{\partial u_{i}}H_{j}\frac{ \partial }{\partial x}\gamma _{i}^{\ast }\right) |{}_{x^{\ast }(t),u^{\ast }(t),t}, {} \end{array} \end{aligned} $$
(2.23)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\lambda }_{j}(T) &\displaystyle =&\displaystyle \frac{\partial }{\partial x}S_{j}|{}_{x^{\ast }(T),T}{,} {} \end{array} \end{aligned} $$
(2.24)

where the second term in (2.23), involving a summation, is a reflection of the fact that Hj depends on x not only through gj and f but also through the strategies of the other players. The presence of this extra term clearly makes the necessary condition for the state-feedback solution much more complicated than for the open-loop solution.

Again, \(u_j^*(t) = \gamma ^*_j(x^*(t), t)\) maximizes the Hamiltonian Hj for each t, with all other variables fixed at equilibrium:

$$\displaystyle \begin{aligned} u_j^\ast(t) =\arg \max_{u_j\in U_j}H_j(x^\ast(t),u_j, \gamma_{-j}^\ast(x^*(t), t), {\lambda}_j,t). {} \end{aligned} $$
(2.25)

If the solution to the maximization problem above is in the interior of Uj , then as in (2.21) a necessary condition is for the first derivative to vanish at \(u_j^*\) for all t, that is,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial u_j}H_j|{}_{x^*(t),u_j^*(t), \gamma_{-j}^*(x^*(t), t),t} = 0{,} {} \end{aligned} $$
(2.26)

and for the Hessian matrix of second derivatives (with respect to uj) to be nonpositive definite.

Remark 3

The summation term in (2.23) is absent in three important cases: (i) in optimal control problems (m = 1), since \(\frac {\partial }{\partial u }H \frac {\partial u}{\partial x } = 0\); (ii) in two-person zero-sum differential games, because H1 ≡−H2 so that for player 1, \(\frac { \partial }{\partial u_2 }H_1\frac {\partial u_2}{\partial x } = - \frac { \partial }{\partial u_2 }H_2\frac {\partial u_2}{\partial x }=0\), and likewise for player 2; and (iii) in open-loop nonzero-sum differential games, because \(\frac {\partial u_j }{\partial x } = 0\). It would also be absent in nonzero-sum differential games with state-feedback information structure that are strategically equivalent (Başar and Olsder 1999) to (i) (single objective) team problems (which in turn are equivalent to single-player optimal control problems) or (ii) two-person zero-sum differential games.

3.4 Constructing an SFNE Using a Sufficient Maximum Principle

As alluded to above, the necessary conditions for an SFNE as presented are not very useful for computing a state-feedback Nash equilibrium, as one has to infer the form of the partial derivatives of the equilibrium strategies in order to write the adjoint equations (2.23). However, as an alternative, the sufficient maximum principle given below can be a useful tool when one has an a priori guess of the class of equilibrium strategies (see, Haurie et al. 2012, page 249).

Theorem 1

Assume that the terminal reward functions Sj are continuously differentiable and concave, and let \(X \subset \mathbb {R}^n\) be a state constraint set to which the state x(t) belongs for all t. Suppose that an m-tuple \({\gamma }^{\ast }=\left ( \gamma _{1}^{\ast },\ldots ,\gamma _{m}^{\ast }\right ) \) of state-feedback strategies \(\gamma _{j}:X\times \lbrack t^{0},T]\mapsto \ \mathbb {R}^{p_{j}},\ j\in M,\) is such that

  1. (i)

    γ(x, t) is continuously differentiable in x almost everywhere, and piecewise continuous in t;

  2. (ii)

    γ(x, t) generates at (x0, t0) a unique trajectory x(⋅) : [t0, T]↦X, solution of

    $$\displaystyle \begin{aligned} \dot{{x}}(t)=f({x}(t), {\gamma }^{\ast }(x, t),t),\quad{x}(t^{0})={x}^{0}, \end{aligned}$$

    which is absolutely continuous and remains in the interior of X;

  3. (iii)

    there exist m costate vector functions \({\lambda } _{j}(\cdot ):[t^{0},T]\mapsto \mathbb {R}^{n}\), which are absolutely continuous and such that, for all j ∈ M, if we define the Hamiltonians

    $$\displaystyle \begin{aligned} &H_{j}({x}(t),[{u}_{j},{u}_{-j}],{\lambda }_{j}(t) ,t)\\ &=g_{j}\left( {x}(t),[{u}_{j},{u}_{-j}],t\right) +{\lambda }_{j}(t)f({x} (t),[{u}_{j},{u}_{-j}],t), \end{aligned} $$

    and the equilibrium Hamiltonians

    $$\displaystyle \begin{aligned} {\mathcal{H}}_{j}^{\ast }({x}^{\ast }(t),{\lambda }_{j}(t),t)=\max_{u_{j}\in U_{j}}H_{j}({x}^{\ast }(t) ,[{u}_{j},{\gamma }_{-j}^{\ast }({x} ^{\ast }(t),t)],{\lambda }_{j}(t) ,t){,} {} \end{aligned} $$
    (2.27)

    the maximum in (2.27) is reached at \(\gamma _{j}^{\ast }({x} ^{\ast }(t),t),\) i.e.,

    $$\displaystyle \begin{aligned} {\mathcal{H}}_{j}^{\ast }({x}^{\ast }(t),{\lambda }_{j}(t),t)=H_{j}({x}^{\ast }(t) ,{\gamma }^{\ast }({x}^{\ast }(t),t),{ \lambda }_{j}(t) ,t); {} \end{aligned} $$
    (2.28)
  4. (iv)

    the functions \({x}\mapsto {\mathcal {H}}_{j}^{\ast }({x},{\lambda }_{j}(t),t)\), where \({\mathcal {H}}_{j}^{\ast }\) is defined as in (2.27), but at position (x, t), are continuously differentiable and concave for all t ∈ [t0, T] and j ∈ M;

  5. (v)

    the costate vector functions λj(⋅), j ∈ M, satisfy the following adjoint differential equations for almost all t ∈ [t0, T],

    $$\displaystyle \begin{aligned} \dot{{\lambda}}_{j}(t)=-\frac{\partial}{\partial x} {\mathcal{H}}_{j}^{\ast } |{}_{({x}^{\ast }(t), {\lambda} _{j}(t),t)}, {} \end{aligned} $$
    (2.29)

    along with the transversality conditions

    $$\displaystyle \begin{aligned} {\lambda} _{j}(T)= \frac{\partial}{\partial x} S_{j}|{}_{ ({x}^*(T),T)}. {} \end{aligned} $$
    (2.30)

Then, \(\left ( \gamma _{1}^{\ast },\ldots ,\gamma _{m}^{\ast }\right ) \) is an SFNE at (x0, t0).

3.5 Constructing an SFNE Using Hamilton-Jacobi-Bellman Equations

We now discuss the alternative dynamic programming approach which delivers the state-feedback solution directly without requiring synthesis or guessing of the solution. The following theorem captures the essence of this effective tool for SFNE (see, Başar and Olsder 1999, page 322; Haurie et al. 2012, page 252).

Theorem 2

Suppose that there exists an m-tuple \({\gamma }^{\ast }=\left ( \gamma _{1}^{\ast },\ldots ,\gamma _{m}^{\ast }\right ) \) of state-feedback laws, such that

  1. (i)

    for any admissible initial point (x0, t0), there exists a unique, absolutely continuous solution \(t\in \lbrack t^{0},T]\mapsto {x} ^{\ast }(t)\in X{\subset {\mathbb {R}}^{n}}\) of the differential equation

    $$\displaystyle \begin{aligned} {\dot{x}}^{\ast }(t)=f( {x}^{\ast }(t),\gamma _{1}^{\ast }({x}^{\ast }(t), t),\ldots ,\gamma _{m}^{\ast }\left( {x}^{\ast }(t), t\right) ,t),\quad{x }^{\ast }(t^{0})= {x}^{0}\,; \end{aligned}$$
  2. (ii)

    there exist continuously differentiable value functionals \(V_{j}^{\ast }: X\times [t^{0},T]\mapsto \mathbb {R}\), j ∈ M, such that the following coupled Hamilton-Jacobi-Bellman (HJB) partial differential equations are satisfied for all (x, t) ∈ X × [t0, T]

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \hspace{-10pt}-\frac{\partial}{\partial t}{\ V_{j}^{\ast }({x},t) } &\displaystyle &\displaystyle =\max_{ { u}_{j}\in U_{j}}\left\{ g_{j}\left( {x},[ {u}_{j}, {\gamma }_{-j}^{\ast }(x,t)],t\right) \right. \\ \hspace{-10pt}&\displaystyle &\displaystyle {\quad }\left. +\frac{\partial}{\partial {x}}{\ V_{j}^{\ast }(x, t)}f( {x},\left[ { u}_{j}, {\gamma }_{-j}^{\ast }(x,t)\right] ,t)\right\} {} \end{array} \end{aligned} $$
    (2.31)
    $$\displaystyle \begin{aligned} -\frac{\partial}{\partial t}{\ V_{j}^{\ast }({x},t) } = g_{j}\left( {x},{\gamma }^{\ast }(x,t),t\right) +\frac{\partial}{\partial {x}}{\ V_{j}^{\ast }(x, t)}\,f\left( {x},{\gamma }^{\ast }(x,t),t\right), {} \end{aligned} $$
    (2.32)
  3. (iii)

    the boundary conditions

    $$\displaystyle \begin{aligned} V_{j}^{\ast }(x,T)=S_{j}({x}), {} \end{aligned} $$
    (2.33)

    are satisfied for all x ∈ X and j ∈ M.

Then, \(\gamma _{j}^{\ast }(x,t)\) is a maximizer of the right-hand side of the HJB equation for player j, and the m-tuple \(\left ( \gamma _{1}^{\ast },\ldots ,\gamma _{m}^{\ast }\right ) \) is an SFNE at every initial point (x0, t0) ∈ X × [t0, T].

Remark 4

Note that once a complete set of value functionals, {Vj, j ∈ M}, is identified, then (2.31) directly delivers the Nash equilibrium strategies of the players in state-feedback form. Hence, in this approach one does not have to guess the SFNE but rather the structure of each player’s value function; this can be done in a number of games, with one such class being linear-quadratic differential games, as we will see shortly. Also note that Theorem 2 provides a set of sufficient conditions for SFNE, and hence once a set of strategies are found satisfying them, we are assured of their SFNE property. Finally, since the approach entails dynamic programming, it directly follows from (2.31) that a natural restriction of the set of SFNE strategies obtained for the original differential game to a shorter interval [s, T], with s > t0, constitutes an SFNE for the differential game which is similarly formulated but on the shorter time interval [s, T]. Hence, the SFNE is subgame perfect and strongly time consistent.

3.6 The Infinite-Horizon Case

Theorems 1 and 2 were stated under the assumption that the time horizon is finite. If the planning horizon is infinite, then the transversality or boundary conditions, that is, \({\lambda } _j (T)=\frac {\partial S_j }{\partial x}\left ( x(T),T\right ) \) in Theorem 1 and \(V_j^{\ast}(x, T) = S_j(x)\) in Theorem 2, have to be modified. Below we briefly state the required modifications, and work out a scalar example in the next subsection to illustrate this.

If the time horizon is infinite, the dynamic system is autonomous (i.e., f does not explicitly depend on t) and the objective functional of player j is as in (2.7), then the transversality conditions in Theorem 1 are replaced by the limiting conditions:

$$\displaystyle \begin{aligned} \lim_{t\rightarrow +\infty }e^{-\rho _{j}t}{q}_{j}(t)=0,\quad\forall j\in M, {} \end{aligned} $$
(2.34)

where \({q}_{j}(t)=e^{\rho _{j}t}\lambda _{j}(t)\) is the so-called current-value costate variable. In the coupled set of HJB equations of Theorem 2, the value function \(V_{j}^{\star }(x,t)\) is multiplicatively decomposed as

$$\displaystyle \begin{aligned} V_{j}^{\star }(x,t)=e^{-\rho _{j}t}\mathcal{V}_{j}^{\star }(x), \end{aligned} $$
(2.35)

and the boundary condition (2.33) is replaced by

$$\displaystyle \begin{aligned} V_{j}^{\star }(x,t)\rightarrow 0,\text{ when t }\rightarrow \infty , \end{aligned} $$
(2.36)

which is automatically satisfied if \(\mathcal {V}_{j}^{\star }(x)\) is bounded.
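To connect these modifications with the finite-horizon conditions, substituting the decomposition (2.35) into the HJB equation (2.31), written for the discounted reward rate \(e^{-\rho _{j}t}g_{j}(x,u)\) and with f not depending explicitly on t, removes the time argument and yields the stationary current-value form (this intermediate step is spelled out here for convenience):

$$\displaystyle \begin{aligned} \rho _{j}\mathcal{V}_{j}^{\star }(x)=\max_{u_{j}\in U_{j}}\left\{ g_{j}\left( x,[u_{j},{\gamma }_{-j}^{\ast }(x)]\right) +\frac{\partial }{\partial x}\mathcal{V}_{j}^{\star }(x)\,f\left( x,[u_{j},{\gamma }_{-j}^{\ast }(x)]\right) \right\} ,\quad j\in M, \end{aligned}$$

which is exactly the form used in the example of the next subsection (cf. (2.40)).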

3.7 Examples of Construction of Nash Equilibria

We consider here a two-player infinite-horizon differential game with scalar linear dynamics and quadratic payoff functions, which will provide an illustration of the results of the previous subsection, also to be viewed as illustration of Theorems 1 and 2 in the infinite-horizon case.

Let uj(t) be the scalar control variable of player j, j = 1, 2, and x(t) be the state variable, with t ∈ [0, ∞). Let player j’s optimization problem be given by

$$\displaystyle \begin{aligned} \max_{u_{j}(\cdot )}\ J_{j}=\int_{0}^{\infty }e^{-\rho t}\left( u_{j}(t) \left( \kappa -\frac{1}{2}u_{j}(t) \right) -\frac{1}{2}\varphi x^{2}(t) \right) dt, \end{aligned}$$

subject to the state equation

$$\displaystyle \begin{aligned} \dot{x}(t)=u_{1}(t)+u_{2}(t)-\alpha x(t),\quad x(0)=x^{0}, \end{aligned}$$

where φ and κ are positive parameters, 0 < α < 1, and ρ > 0 is the discount parameter. This game has the following features: (i) the objective functional of player j is quadratic in the control and state variables and only depends on the player’s own control variable; (ii) there is no interaction (coupling) either between the control variables of the two players or between the control and the state variables; (iii) the game is fully symmetric across the two players in the state and the control variables; and (iv) by adding the term \(u_{-j}(t) \left( \kappa -\frac{1}{2}u_{-j}(t) \right)\), where − j denotes the other player, to the integrand of Jj, for j = 1, 2, we can make the two objective functions identical:

$$\displaystyle \begin{aligned} J:=\int_{0}^{\infty }e^{-\rho t}\left( u_{1}(t) \left( \kappa - \frac{1}{2}u_{1}(t) \right) +u_{2}(t) \left( \kappa - \frac{1}{2}u_{2}(t) \right) -\frac{1}{2}\varphi x^{2}(t) \right) dt. {} \end{aligned} $$
(2.37)

The significance of this last feature will become clear shortly when we discuss the OLNE (next). Throughout the analysis below, we suppress the time argument when no ambiguity may arise.

Open-Loop Nash Equilibrium (OLNE).

We first discuss the significance of feature (iv) exhibited by this scalar differential game. Note that when the information structure of the differential game is open loop, adding to the objective function of a player (say, player 1) terms that involve only the control of the other player (player 2) does not alter the optimization problem faced by player 1. Hence, whether player j maximizes Jj, or J given by (2.37), makes no difference as far as the OLNE of the game goes. Since this applies to both players, it readily follows that every OLNE of the original differential game is also an OLNE of the single-objective optimization problem (involving maximization of J by each player). In such a case, we say that the two games are strategically equivalent, and note that the second game (described by the single-objective functional J) is a dynamic team. Now, Nash equilibrium (NE) in teams corresponds to person-by-optimality, and not team optimality (which means joint optimization by members of the team), but when every person-by-person optimal solution is also team optimal (the reverse implication is always true), then one can obtain all NE of games strategically equivalent to a particular dynamic team by solving for team optimal (equivalently, globally optimal) solutions of the team. Further, when solving for team-optimal solutions in deterministic teams, whether the information structure is open loop or state feedback does not make any difference, as mentioned earlier. In the particular dynamic team of this example, since J is strictly concave in u1 and u2 and x (jointly), and the state equation is linear, every person-by-person optimal solution is indeed team optimal, and because of strict concavity the problem admits a unique globally optimal solution. Hence, the OLNE of the original game exists and is unique.
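In equation form, this observation (a restatement of feature (iv)) reads

$$\displaystyle \begin{aligned} J=J_{j}+\int_{0}^{\infty }e^{-\rho t}u_{-j}(t)\left( \kappa -\frac{1}{2}u_{-j}(t)\right) dt,\quad j=1,2, \end{aligned}$$

and under the open-loop information structure the added term is fixed once u−j(⋅) is fixed, so that maximizing Jj over uj(⋅) and maximizing J over uj(⋅) are one and the same problem.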

Having established the correspondence with a deterministic concave team and thereby the existence of a unique OLNE, we now turn to our main goal here, which is to apply the conditions obtained earlier for OLNE to the differential game at hand. Toward that end, we introduce the current-value Hamiltonian of player j:

$$\displaystyle \begin{aligned} {\mathcal{H}}_j (x, {q}_j, u_{1},u_{2})=u_j \left( \kappa -\frac{1}{2}u_j \right) -\frac{1}{2}\varphi x^{2}+{q}_j (u_{1}+u_{2}-\alpha x),\quad j=1,2, \end{aligned}$$

where qj(t) is the current-value costate variable, at time t, defined as

$$\displaystyle \begin{aligned} q_j(t)= e^{\rho_j t} {\lambda} _j (t). \end{aligned} $$
(2.38)

Being strictly concave in uj, \({\mathcal {H}}_j\) admits a unique maximum, achieved by

$$\displaystyle \begin{aligned} u_{j}(t)=\kappa +{q}_{j}(t),\quad j=1,2. {} \end{aligned} $$
(2.39)

Note that the Hamiltonians of both players are strictly concave in x, and hence the equilibrium Hamiltonians also are. Then, the equilibrium conditions read:

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \dot{{q}}_j =\rho {q} _j -\frac{\partial}{\partial x}{\ {\mathcal{H}}_j} =(\rho +\alpha ){q} _j +\varphi x,\quad\lim_{t\rightarrow +\infty }e^{-\rho t}{q} _j (t)=0, \; j=1,2{,} \\ &\displaystyle &\displaystyle \dot{x}=2\kappa + q_{1}+q_{2}-\alpha x,\quad x(0)=x^{0}. \end{array} \end{aligned} $$

It is easy to see that q1(t) = q2(t) =: q(t), ∀t ∈ [0, ∞), and therefore u1(t) = u2(t), ∀t ∈ [0, ∞). This is not surprising given the symmetry of the game. We then have a two-equation differential system in x and q:

$$\displaystyle \begin{aligned} \left( \begin{array}{c} \dot{x} \\ \dot{{q}} \end{array} \right) =\left( \begin{array}{cc} -\alpha & 2 \\ \varphi & \rho +\alpha \end{array} \right) \left( \begin{array}{c} x \\ {q} \end{array} \right) +\left( \begin{array}{c} 2\kappa \\ 0 \end{array} \right) . \end{aligned}$$

We look for the solution of this system converging to the steady state which is given by

$$\displaystyle \begin{aligned} (x_{ss},{q} _{ss})=\left( \frac{2\kappa (\alpha +\rho )}{\alpha ^{2}+\alpha \rho +2\varphi },-\frac{2\kappa \varphi }{\alpha ^{2}+\alpha \rho +2\varphi } \right) . \end{aligned}$$

The solution can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} x(t) &\displaystyle =&\displaystyle (x^{0}-x_{ss})e^{\mu _{1}t}+x_{ss}, \\ {q} (t) &\displaystyle =&\displaystyle -(x^{0}-x_{ss})\frac{2\varphi }{2\alpha +\rho +\sqrt{{(2\alpha +\rho )}^{2}+8\varphi }}e^{\mu _{1}t}+{q} _{ss}, \end{array} \end{aligned} $$

where μ1 is the negative eigenvalue of the matrix associated with the differential equations system and is given by

$$\displaystyle \begin{aligned} \mu _{1}=\frac{1}{2}(\rho -\sqrt{{(2\alpha +\rho )}^{2}+8\varphi }). \end{aligned}$$

Using the corresponding expression for q(t) for qj in (2.39) leads to the OLNE strategies (which are symmetric).
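As a quick numerical illustration, the sketch below (the parameter values are arbitrary and serve only to exercise the formulas; they are not taken from the chapter) evaluates the steady state, the stable eigenvalue μ1, and the resulting symmetric OLNE control path uj∗(t) = κ + q(t):

```python
import numpy as np

# Evaluate the closed-form OLNE of the scalar game (illustrative parameter values)
alpha, kappa, phi, rho, x0 = 0.5, 1.0, 0.8, 0.1, 2.0

den = alpha**2 + alpha*rho + 2*phi
x_ss = 2*kappa*(alpha + rho) / den          # steady-state state
q_ss = -2*kappa*phi / den                   # steady-state current-value costate
root = np.sqrt((2*alpha + rho)**2 + 8*phi)
mu1 = 0.5*(rho - root)                      # negative eigenvalue of the (x, q) system

t = np.linspace(0.0, 20.0, 201)
x = (x0 - x_ss)*np.exp(mu1*t) + x_ss                                   # OLNE state path
q = -(x0 - x_ss)*(2*phi/(2*alpha + rho + root))*np.exp(mu1*t) + q_ss   # costate path
u = kappa + q                                                          # OLNE control, cf. (2.39)

print(f"x_ss = {x_ss:.4f}, q_ss = {q_ss:.4f}, mu_1 = {mu1:.4f}")
print(f"u_j*(0) = {u[0]:.4f}, long-run control = {kappa + q_ss:.4f}")
```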

State-Feedback Nash Equilibrium (SFNE).

The strategic equivalence between the OL differential game and a team problem that we established above does not carry over to the differential game with state-feedback information structure, since adding to J1 any term that involves the control u2 of player 2 will now alter the optimization problem faced by player 1, because u2 depends on u1 through the state x. Hence, the example system is a genuine game under state-feedback information, and therefore the only way to obtain its SFNE is to resort to Theorem 2, in view of the extension to the infinite horizon discussed in Sect. 3.6. The HJB equation for player j, written for the current-value function \(\mathcal {V}_ j(x)=e^{\rho t}V_j(x,t)\), is

$$\displaystyle \begin{aligned} \rho {\mathcal{V}}_j (x)=\max_{u_j }\left[ u_j \left( \kappa -\frac{1}{2}u_j \right) -\frac{1}{2}\varphi x^{2}+\frac{\partial}{\partial x} {\mathcal{V}} _j (x) \left( u_{1}+u_{2}-\alpha x\right) \right] . {} \end{aligned} $$
(2.40)

Being strictly concave in uj, the RHS of (2.40) admits a unique maximum, with the maximizing solution being

$$\displaystyle \begin{aligned} u_j (x)=\kappa +\frac{\partial}{\partial x} {\mathcal{V}}_j (x). {} \end{aligned} $$
(2.41)

Given the symmetric nature of this game, we focus on symmetric equilibrium strategies. Taking into account the linear-quadratic specification of the differential game, we make the informed guess that the current-value function is quadratic (because the game is symmetric, and we focus on symmetric solutions, the value function is the same for both players), given by

$$\displaystyle \begin{aligned} {\mathcal{V}}_j (x)=\frac{a}{2}x^{2}+bx+c,\quad j=1,2{,} \end{aligned}$$

where a, b, c are parameters yet to be determined. Using (2.41) then leads to uj(x) = κ + ax + b. Substituting this into the RHS of (2.40), we obtain

$$\displaystyle \begin{aligned} \frac{1}{2}(3a^{2}-2a\alpha -\varphi )x^{2}+(3ab-b\alpha +2a\kappa )x+\frac{1 }{2}(3b^{2}+4b\kappa +\kappa ^{2}). \end{aligned}$$

The LHS of (2.40) reads

$$\displaystyle \begin{aligned} \rho \left(\frac{a}{2}x^{2}+bx+c\right), \end{aligned}$$

and equating the coefficients of x2, x and the constant term, we obtain three equations in the three unknowns, a, b, and c. Solving these equations, we get the following coefficients for the noncooperative value functions:

$$\displaystyle \begin{aligned} \begin{array}{rcl} a &\displaystyle =&\displaystyle \frac{\rho +2\alpha \pm \sqrt{{(\rho +2\alpha )}^{2}+12\varphi }}{6}, \\ b &\displaystyle =&\displaystyle \frac{-2a\kappa }{3a-(\rho +\alpha )}, \\ c &\displaystyle =&\displaystyle \frac{\kappa ^{2}+4b\kappa +3b^{2}}{2\rho }. \end{array} \end{aligned} $$

Remark 5

The coefficient a is the root of a second-degree polynomial having two roots: one positive and one negative. The selection of the negative root

$$\displaystyle \begin{aligned} a=\frac{\rho +2\alpha -\sqrt{{(\rho +2\alpha )}^{2}+12\varphi }}{6}, \end{aligned}$$

guarantees the global stability of the state trajectory. The resulting noncooperative equilibrium state trajectory is given by

$$\displaystyle \begin{aligned} x^*(t)=\left[ x^{0}+\frac{2(\kappa +b)}{2a-\alpha }\right] e^{(2a-\alpha )t}- \frac{2(\kappa +b)}{2a-\alpha }. \end{aligned}$$

The state dynamics of the game has a globally asymptotically stable steady state if 2a − α < 0. It can be shown that to guarantee this inequality and therefore global asymptotic stability, the only possibility is to choose a < 0.
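A companion numerical sketch (again with arbitrary, purely illustrative parameter values) solves the three coefficient equations obtained above by equating powers of x in (2.40), selects the stabilizing root, and returns the symmetric SFNE strategy uj(x) = κ + ax + b:

```python
import numpy as np

# Coefficient equations from matching powers of x in the HJB equation (2.40):
#   rho*a/2 = (3a^2 - 2*alpha*a - phi)/2,  rho*b = 3ab - alpha*b + 2a*kappa,
#   rho*c   = (3b^2 + 4*b*kappa + kappa^2)/2          (illustrative parameter values)
alpha, kappa, phi, rho = 0.5, 1.0, 0.8, 0.1

roots = np.roots([3.0, -(2*alpha + rho), -phi])   # quadratic in a
a = float(min(roots.real))                        # negative root, so that 2a - alpha < 0
b = -2*a*kappa / (3*a - (rho + alpha))
c = (kappa**2 + 4*b*kappa + 3*b**2) / (2*rho)

def u_sfne(x):
    # symmetric state-feedback equilibrium strategy, cf. (2.41)
    return kappa + a*x + b

print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}, 2a - alpha = {2*a - alpha:.4f}")
print(f"u_j(x=2) = {u_sfne(2.0):.4f}")
```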

3.8 Linear-Quadratic Differential Games (LQDGs)

We have seen in the previous subsection, within the context of a specific scalar differential game, that the linear-quadratic structure (linear dynamics and quadratic payoff functions) enables explicit computation of both OLNE and SFNE strategies (for the infinite-horizon game). We now take this analysis a step further, and discuss the general class of linear-quadratic (LQ) games, but in finite horizon, and show that the LQ structure leads (using the necessary and sufficient conditions obtained earlier for OLNE and SFNE, respectively) to computationally feasible equilibrium strategies. Toward that end, we first make precise, in the definition that follows, the class of LQ differential games under consideration (we in fact define a slightly larger class of DGs, namely, affine-quadratic DGs, where the state dynamics are driven by also a known exogenous input). Following the definition, we discuss characterization of the OLNE and SFNE strategies, in that order. Throughout, x′ denotes the transpose of a vector x, and B′ denotes the transpose of a matrix B.

Definition 6

An m-player differential game of fixed prescribed duration [0, T] is of the affine-quadratic type if \(U_{j}=\mathbb {R} ^{p_{j}}\;(j\in M)\) and

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(t,x,u) &\displaystyle =&\displaystyle A(t)x+\sum_{i\in M}{}B_{i}(t)u_{i}+c(t), \\ g_{j}(t,x,u) &\displaystyle =&\displaystyle -{ \frac{1}{2}}\left( x^{\prime }Q_{j}(t)x+\sum_{i\in M}{}u_{i}^{\prime}R_{j}^{i}(t)u_{i}\right) , \\ S_{j}(x) &\displaystyle =&\displaystyle -{\frac{1}{2}}x^{\prime}Q_j^fx, \end{array} \end{aligned} $$

where A(⋅), Bj(⋅), Qj(⋅), \(R_{j}^{i}(\cdot )\) are matrices of appropriate dimensions, c(⋅) is an n-dimensional vector, all defined on [0, T], and with continuous entries (i, j ∈ M). Furthermore, \(Q_{j}^{f},Q_{j}(\cdot )\) are symmetric, \(R_{j}^{j}(\cdot )>0\,\,(j\in M)\), and \(R_{j}^{i}(\cdot )\geq 0\;(i\neq j,\,\,i,j\in M)\).

An affine-quadratic differential game is of the linear-quadratic type if c ≡ 0.

3.8.1 OLNE

For the affine-quadratic differential game formulated above, let us further assume that Qj(⋅) ≥ 0, \(Q_{j}^{f}\geq 0\) (j ∈ M). Then, under the open-loop information structure, player j’s payoff function Jj([uj, u−j];x0, t0 = 0), defined by (2.3), is a strictly concave function of uj(⋅) for all permissible control functions u−j(⋅) of the other players and for all \(x^{0}\in \mathbb {R}^{n}\). This then implies that the necessary conditions for OLNE derived in Sect. 3.3.1 are also sufficient, and every solution set of the first-order conditions provides an OLNE. Now, the Hamiltonian for player j is

$$\displaystyle \begin{aligned} H_{j}(x,u,\lambda _{j},t)=-{\frac{1}{2}}\left(x^{\prime }Q_{j}x+\sum_{i\in M}{}u_{i}^{\prime}R_{j}^{i}u_{i}\right)+\lambda _{j}\left(Ax+c+\sum_{i\in M}{}B_{i}u_{i}\right), \end{aligned}$$

whose maximization with respect to \(u_{j}(t)\in \mathbb {R}^{p_{j}}\) yields the unique relation

$$\displaystyle \begin{aligned} u_{j}^{\ast }(t)={R_{j}^{j}(t)}^{-1}B_{j}(t)^{\prime }\lambda _{j}(t),\quad j\in M. \end{aligned} $$
(i)

Furthermore, the costate equations are

$$\displaystyle \begin{aligned} \dot{\lambda}_{j}=Q_{j}x^{\ast }-A^{\prime }\lambda _{j};\quad\lambda _{j}(T)=-Q_{j}^{f}x^{\ast}(T)\quad(j\in M), \end{aligned} $$
(ii)

and the optimal state trajectory is generated by

$$\displaystyle \begin{aligned} \dot{x}^{\ast }=Ax^{\ast }+c+\sum_{i\in M}{}B_{i}{R_{i}^{i}} ^{-1}B_{i}^{\prime }\lambda _{i}\,;\quad x^{\ast }(0)=x^{0}. \end{aligned} $$
(iii)

This set of differential equations constitutes a two-point boundary value problem, the solution of which can be written, without any loss of generality, as {λj(t) = −Kj(t)x(t) − kj(t), j ∈ M;x(t), t ∈ [0, T]} where Kj(⋅) are (n × n) -dimensional matrices and kj(⋅) are n-dimensional vectors. Now, substituting λj = −Kjx− kj (j ∈ M) into the costate equations (ii), we can arrive at the conclusion that Kj (j ∈ M) and kj (j ∈ M) should then satisfy, respectively, the following two sets of matrix and vector differential equations:

$$\displaystyle \begin{aligned} \dot{K}_{j}+K_{j}A+A^{\prime }K_{j}+Q_{j}-K_{j}\sum_{i\in M}{}B_{i}{ R_{i}^{i}}^{-1}B_{i}^{\prime }K_{i} = 0;\qquad K_{j}(T)=Q_{j}^{f}\quad(j\in M){,}{} \end{aligned} $$
(2.42)

and

$$\displaystyle \begin{aligned} \dot{k}_{j}+A^{\prime }k_{j}+K_{j}c-K_{j}\sum_{i\in M}{}B_{i}{R_{i}^{i}} ^{-1}B_{i}^{\prime }k_{i} = 0;\qquad k_{j}(T)=0\quad(j\in M).{} \end{aligned} $$
(2.43)

The expressions for the OLNE strategies can then be obtained from (i) by substituting λj = −Kjx− kj, and likewise the associated state trajectory for x follows from (iii).

The following theorem now captures this result (see, Başar and Olsder 1999, pp. 317–318).

Theorem 3

For the m-player affine-quadratic differential game with Qj(⋅) ≥ 0, \(Q_{j}^{f}\geq 0\;(j\in M)\), let there exist a unique solution set {Kj, j ∈ M} to the coupled set of matrix Riccati differential equations (2.42). Then, the differential game admits a unique OLNE solution, given by

$$\displaystyle \begin{aligned} \gamma _{j}^{\ast }(x^{0},t)\equiv u_{j}^{\ast }(t)=-{R_{j}^{j}(t)}^{-1}B_{j}^{\prime }(t)[K_{j}(t)x^{\ast }(t)+k_{j}(t)]\quad(j\in M), \end{aligned}$$

where {kj(⋅), j ∈ M} solve uniquely the set of linear differential equations (2.43) and x∗(⋅) denotes the corresponding OLNE state trajectory, generated by (iii), which can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} x^{\ast }(t) & = & \Phi (t,0)x_{0}+\displaystyle{\int_{0}^{t}}\Phi (t,\sigma )\eta (\sigma )\,\mathrm{d}\sigma , \\ \displaystyle{\frac{\mathrm{d}}{\mathrm{d}t}}\Phi (t,\sigma ) & = & F(t)\Phi (t,\sigma );\quad\Phi (\sigma ,\sigma )=I, \\ F(t) & := & A-\sum_{i\in M}{}B_{i}{R_{i}^{i}}^{-1}B_{i}^{\prime }K_{i}(t), \\ \eta (t) & := & c(t)-\sum_{i\in M}{}B_{i}{R_{i}^{i}}^{-1}B_{i}^{\prime }k_{i}(t).{} \end{array} \end{aligned}$$
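Since (2.42) and (2.43) are terminal-value problems, they can be integrated backward from t = T. The sketch below illustrates this for a hypothetical two-player instance with scalar state (all matrices and numerical values are illustrative assumptions, not data from the chapter); the OLNE controls are then recovered as uj∗(t) = −(Rjj)−1Bj′[Kj(t)x∗(t) + kj(t)]:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the coupled Riccati equations (2.42) and linear equations (2.43)
# for a hypothetical two-player affine-quadratic game with scalar state (n = 1).
T = 5.0
A, c = np.array([[-0.2]]), np.array([0.5])
B  = [np.array([[1.0]]), np.array([[0.8]])]      # B_1, B_2
Q  = [np.array([[1.0]]), np.array([[2.0]])]      # Q_1, Q_2
Qf = [np.array([[0.5]]), np.array([[0.5]])]      # Q_1^f, Q_2^f
R  = [np.array([[1.0]]), np.array([[1.5]])]      # R_1^1, R_2^2
n, m = A.shape[0], 2

def backward_rhs(t, z):
    K = [z[i*n*n:(i + 1)*n*n].reshape(n, n) for i in range(m)]
    k = [z[m*n*n + i*n: m*n*n + (i + 1)*n] for i in range(m)]
    S = sum(B[i] @ np.linalg.solve(R[i], B[i].T @ K[i]) for i in range(m))  # sum_i B_i (R_i^i)^{-1} B_i' K_i
    s = sum(B[i] @ np.linalg.solve(R[i], B[i].T @ k[i]) for i in range(m))
    dK = [-(K[j] @ A + A.T @ K[j] + Q[j] - K[j] @ S) for j in range(m)]     # from (2.42)
    dk = [-(A.T @ k[j] + K[j] @ c - K[j] @ s) for j in range(m)]            # from (2.43)
    return np.concatenate([d.ravel() for d in dK] + dk)

zT = np.concatenate([Qf[0].ravel(), Qf[1].ravel(), np.zeros(n), np.zeros(n)])
sol = solve_ivp(backward_rhs, (T, 0.0), zT, dense_output=True, rtol=1e-8)
print("K_1(0) =", sol.sol(0.0)[:n*n].reshape(n, n))
```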

Remark 6 (Nonexistence and multiplicity of OLNE)

Note that the existence of OLNE for the affine-quadratic differential game hinges on the existence of a solution to the set of coupled matrix Riccati equations (2.42), since the second differential equation (2.43) always admits a solution, being linear in the ki’s. Further, the OLNE is unique whenever the matrix solution to (2.42) is unique. It is within the realm of possibility, however, that an OLNE may not exist, just as a Nash equilibrium may not exist in static quadratic games (reaction planes may not have a common point of intersection), or there may be multiple OLNEs (using the earlier analogy to static games, reaction planes may have more than one point of intersection). Note also that even for the LQDG (i.e., when c ≡ 0), in which case kj ≡ 0, j ∈ M, the same possibilities (of nonexistence or multiplicity of OLNE) remain valid.

An important point to note regarding the OLNE in Theorem 3 above is that the solution does not depend on all the parameters that define the affine-quadratic differential game, particularly the matrices \( \{R_{j}^{i},\,i\neq j,\,i,j\in M\}\). Hence, the OLNE would be the same if gj were replaced by

$$\displaystyle \begin{aligned} \tilde{g}_{j}(t,x,u_{j})=-{\frac{1}{2}}\left( x^{\prime }Q_{j}(t)x+u_{j}^{\prime}R^j_{j}(t)u_{j}\right).\end{aligned} $$

This is in fact not surprising in view of our earlier discussion in Sect. 3.7 on strategic equivalence. Under the open-loop information structure, adding to gj(t, x, u) of one game any function of the other players’ controls ui, i ≠ j, generates another game that is strategically equivalent to the first one and hence has the same set of OLNE strategies; in this particular case, adding the term \( (1/2)\sum _{i\neq j}u_{i}^{\prime }R^i_{j}(t)u_{i}\) to gj generates \( \tilde {g}_{j}\). We can go a step further and subtract the term \( (1/2)\sum _{i\neq j}{}u_{i}^{\prime }R^i_{i}(t)u_{i}\) from \(\tilde {g}_{j}\). Assuming also that the state weighting matrices Qj(⋅) and \( Q_{j}^{f}\) are the same across all players (i.e., Q(⋅) and Qf, respectively), we arrive at a single objective function common to all players (where we suppress dependence on t in the weighting matrices):

(2.44)

Hence, the affine-quadratic differential game where Qj(⋅) and \( Q_{j}^{f}\) are the same across all players is strategically equivalent to a team problem, which, being deterministic, is in fact an optimal control problem. Letting \(u:=({u_{1}}^{\prime },\ldots ,u_{m}^{\prime })^{\prime }\), B := (B1, …, Bm), and \(R=\mathrm {diag}(R_{1}^{1},\ldots ,R_{m}^{m}) \), this affine-quadratic optimal control problem has state dynamics

$$\displaystyle \begin{aligned} \dot{x}=A(t)x+B(t)u(t)+c(t){,}\;x(0)=x^{0}{,}\end{aligned} $$

and payoff function

where R(⋅) > 0. Being strictly concave (and affine-quadratic), this optimal control problem admits a unique globally optimal solution, given byFootnote 7

$$\displaystyle \begin{aligned} u^{\ast }(t)=-R(t)^{-1}B(t)^{\prime}[K(t) x^\ast (t)+k(t)],\;t\geq 0{,}\end{aligned} $$

where K(⋅) is the unique nonnegative-definite solution of the matrix Riccati equation:

$$\displaystyle \begin{aligned} \dot{K}+KA+A^{\prime }K+Q-KBR^{-1}B^{\prime }K=0;\quad K(T)=Q^{f}{,} {} \end{aligned} $$
(2.45)

k(⋅) uniquely solves

$$\displaystyle \begin{aligned} \dot{k}+A^{\prime }k+Kc-KBR^{-1}B^{\prime }k=0;\quad k(T)=0{,} {} \end{aligned} $$
(2.46)

and x∗(⋅) is generated as in Theorem 3, with

$$\displaystyle \begin{aligned} F(t)=A-BR^{-1}B^{\prime}K(t),\quad\eta(t) = c(t) - BR^{-1} B^{\prime }k(t){,}\;t\geq 0. \end{aligned}$$

Note that for each block component of u, the optimal control can be written as

$$\displaystyle \begin{aligned} u_{j}^{\ast }(t)=-{R_{j}^{j}}(t)^{-1}B_{j}^{\prime }(t)[K(t)x^{\ast }(t)+k(t)]\quad(j\in M), {} \end{aligned} $$
(2.47)

which by strategic equivalence is the unique OLNE. The following corollary to Theorem 3 summarizes this result.

Corollary 1

The special class of affine-quadratic differential games with open-loop information structure, where in Definition 6, Qj = Q ≥ 0 ∀j ∈ M and \(Q_{j}^{f}=Q^{f}\geq 0\,\forall j\in M\) , is strategically equivalent to a strictly concave optimal control problem and admits a unique OLNE, given by (2.47), where K(⋅) is the unique nonnegative-definite solution of (2.45), k(⋅) uniquely solves (2.46), and x∗(⋅) is the unique OLNE state trajectory as defined above.
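A short numerical sketch (again with illustrative, assumed parameter values, in the scalar case) of the team-equivalent construction behind Corollary 1 is given below: a single Riccati pair, the scalar form of (2.45)–(2.46), is integrated backward, and the OLNE strategies are read off blockwise as in (2.47).

```python
# Sketch of Corollary 1 (common Q, Q^f) for an assumed scalar two-player example.
import numpy as np
from scipy.integrate import solve_ivp

A, c, T = 0.5, 0.2, 5.0
B  = np.array([1.0, 0.8])                  # B_1, B_2
R  = np.array([1.0, 2.0])                  # R_1^1, R_2^2
Q, Qf = 2.0, 1.0                           # common Q and Q^f
BRB = np.sum(B**2 / R)                     # scalar B R^{-1} B'

def rhs(t, y):
    K, k = y
    dK = -(2.0 * A * K + Q - BRB * K**2)   # scalar form of (2.45)
    dk = -(A * k + K * c - BRB * K * k)    # scalar form of (2.46)
    return [dK, dk]

sol = solve_ivp(rhs, (T, 0.0), [Qf, 0.0], dense_output=True, rtol=1e-8)

def olne_affine_pair(j, t):
    """Gain and offset of u_j^* in (2.47): u_j^* = -(R_j^j)^{-1} B_j (K x^* + k)."""
    K, k = sol.sol(t)
    return -B[j] / R[j] * K, -B[j] / R[j] * k

print(olne_affine_pair(0, 0.0), olne_affine_pair(1, 0.0))
```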

Remark 7 (Strategic equivalence and symmetry)

A special class of affine-quadratic differential games which fits into the framework covered by Corollary 1 is the class of symmetric differential games, where the players are indistinguishable (with \(B_j, Q_j, Q_j^f, R_j^j\) being the same across all players, that is, index j free). Hence, symmetric affine-quadratic differential games, with Qj = Q ≥ 0 ∀j ∈ M, \(Q_j ^f= Q^f \geq 0\, \forall j\in M\), \(R^j_j = \bar {R}> 0\, \forall j\in M\), and \(B_j = \bar {B}\, \forall j\in M\), admit a unique OLNE:

$$\displaystyle \begin{aligned} \gamma _{j}^{\ast }(x^{0},t)\equiv u_{j}^{\ast }(t)=-\bar{R}(t)^{-1}\bar{B}(t)^{\prime }[K(t)x^{\ast }(t)+k(t)]\quad(j\in M), \end{aligned}$$

where K(⋅) and k(⋅) uniquely solve

$$\displaystyle \begin{aligned} \dot{K}+KA+A^{\prime }K+Q-mK\bar{B}\bar{R}^{-1}\bar{B}^{\prime }K=0{,}\; K(T)=Q^{f}{,} {} \end{aligned} $$
(2.48)

and

$$\displaystyle \begin{aligned} \dot{k} + A^{\prime }k + Kc - mK\bar{B}\bar{R}^{-1} \bar{B}^{\prime }k = 0{,}\; k(T) = 0{,} \end{aligned}$$

and x(⋅) is as defined before.

Remark 8 (Zero-sum differential games)

A special class of nonzero-sum differential games is zero-sum differential games, where in the general framework, m = 2 and J2 ≡−J1 =: J. The (two) players in this case have totally opposing objectives, and hence what one would be minimizing, the other one would be maximizing. Nash equilibrium in this case corresponds to the saddle-point equilibrium, and if \( (\gamma _{1}^{\ast },\gamma _{2}^{\ast })\) is one such pair of strategies, with player 1 as minimizer (of J) and player 2 as maximizer, they satisfy the pair of saddle-point inequalities:

$$\displaystyle \begin{aligned} J(\gamma _{1}^{\ast },\gamma _{2})\leq J(\gamma _{1}^{\ast },\gamma _{2}^{\ast })\leq J(\gamma _{1},\gamma _{2}^{\ast })\quad \forall \gamma _{1}\in \Gamma _{1},\ \gamma _{2}\in \Gamma _{2}. {} \end{aligned} $$
(2.49)

Affine-quadratic zero-sum differential games are defined as in Definition 6, with m = 2 and (suppressing dependence on the time variable t)

$$\displaystyle \begin{aligned} g_{2}\equiv -g_{1}&=:g(x,u_{1},u_{2},t)\\ &={\frac{1}{2}}(x^{\prime }Qx+u_{1}^{\prime }R_{1}u_{1}-u_{2}^{\prime }R_{2}u_{2}),\;Q\geq 0,R_{i}>0,i=1,2{,} \end{aligned} $$
$$\displaystyle \begin{aligned} S_{2}(x)\equiv -S_{1}(x)=:S(x)={\frac{1}{2}}x^{\prime}Q^f x{,}\;\;Q^{f}\geq 0. \end{aligned}$$

Note, however, that this formulation cannot be viewed as a special case of two-player affine-quadratic nonzero-sum differential games with nonpositive-definite weighting on the state and negative-definite weighting on the controls of the players in their payoff functions (which makes the payoff functions strictly concave in the individual players’ controls, rendering their individual maximization problems automatically well defined), because here the maximizing player (player 2 in this case) has nonnegative-definite weighting on the state, which raises the possibility of player 2’s optimization problem being unbounded. To make the game well defined, we have to ensure that it is convex-concave. Convexity of J in u1 is readily satisfied, but for concavity in u2, we have to impose an additional condition. It turns out (see Başar and Bernhard 1995; Başar and Olsder 1999) that a practical way of checking strict concavity of J in u2 is to ensure that the following matrix Riccati differential equation has a continuously differentiable nonnegative-definite solution over the interval [0, T], that is, that there are no conjugate points:

(2.50)

Then, one can show that the game admits a unique saddle-point solution in open-loop policies, which can be obtained directly from Theorem 3 by noticing that \(K_{1}=-K_{2}=:\hat {K}\) and \(k_{1}=-k_{2}=:\hat {k}\), which satisfy

$$\displaystyle \begin{aligned} \dot{\hat{K}}+\hat{K}A+A^{\prime }\hat{K}+Q-\hat{K}(B_{1}R_{1}^{-1}B_{1}^{\prime }-B_{2}R_{2}^{-1}B_{2}^{\prime })\hat{K}=0,\;\hat{K}(T)=Q^{f}, {} \end{aligned} $$
(2.51)

and

$$\displaystyle \begin{aligned} \dot{\hat{k}}+A^{\prime }\hat{k}+\hat{K}c-\hat{K}(B_{1}R_{1}^{-1}B_{1}^{ \prime }-B_{2}R_{2}^{-1}B_{2}^{\prime })\hat{k}=0,\;\hat{k}(T)=0. {} \end{aligned} $$
(2.52)

Under the condition of existence of a well-defined solution to (2.50), the matrix Riccati differential equation (2.51) admits a unique continuously differentiable nonnegative-definite solution, and the open-loop saddle-point (OLSP) strategies for the players, satisfying (2.49), are given by

$$\displaystyle \begin{aligned} \gamma _{1}^{\ast }(x^{0},t)&=-R_{1}^{-1}B_{1}^{\prime }[\hat{K}(t)x^{\ast }(t)+\hat{k}(t)],\\ \gamma _{2}^{\ast }(x^{0},t)&=R_{2}^{-1}B_{2}^{\prime }[ \hat{K}(t)x^{\ast }(t)+\hat{k}(t)]{,}\;t\geq 0{,} \end{aligned} $$

where x(⋅) is the saddle-point state trajectory, generated by

$$\displaystyle \begin{aligned} \dot{x}=(A-(B_{1}R_{1}^{-1}B_{1}^{\prime }-B_{2}R_{2}^{-1}B_{2}^{\prime })\hat{K})x-(B_{1}R_{1}^{-1}B_{1}^{\prime }-B_{2}R_{2}^{-1}B_{2}^{\prime })\hat{k}+c{,}\;x(0)=x^{0}. \end{aligned}$$

Hence, the existence of an OLSP hinges on the existence of a nonnegative-definite solution to the matrix Riccati differential equation (2.50), which as indicated is related to the nonexistence of conjugate points in the interval [0, T],Footnote 8 which in turn is related to whether the game in the infinite-dimensional function space (Hilbert space in this case) is convex-concave or not, as mentioned earlier. For details, we refer to Başar and Bernhard (1995).
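As a numerical illustration of Remark 8, the following Python sketch (with assumed, illustrative parameter values for a scalar state, and assuming the concavity condition discussed above holds for these values) integrates (2.51)–(2.52) backward and forms the OLSP strategies together with the saddle-point state trajectory.

```python
# Sketch of the open-loop saddle point of Remark 8 for an assumed scalar example.
import numpy as np
from scipy.integrate import solve_ivp

A, c, T, x0 = 0.3, 0.1, 4.0, 1.0
B1, B2, R1, R2 = 1.0, 0.7, 1.0, 2.0
Q, Qf = 1.0, 0.5
D = B1**2 / R1 - B2**2 / R2               # B1 R1^{-1} B1' - B2 R2^{-1} B2'

def backward_rhs(t, y):
    K, k = y
    dK = -(2.0 * A * K + Q - D * K**2)    # scalar form of (2.51)
    dk = -(A * k + K * c - D * K * k)     # scalar form of (2.52)
    return [dK, dk]

bwd = solve_ivp(backward_rhs, (T, 0.0), [Qf, 0.0], dense_output=True, rtol=1e-8)

def state_rhs(t, x):
    K, k = bwd.sol(t)
    return (A - D * K) * x - D * k + c    # saddle-point state dynamics

fwd = solve_ivp(state_rhs, (0.0, T), [x0], dense_output=True, rtol=1e-8)

def olsp_controls(t):
    K, k = bwd.sol(t)
    xs = fwd.sol(t)[0]
    u1 = -B1 / R1 * (K * xs + k)          # minimizer (player 1)
    u2 =  B2 / R2 * (K * xs + k)          # maximizer (player 2)
    return u1, u2

print(olsp_controls(0.0))
```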

3.8.2 SFNE

We now turn to affine-quadratic differential games (cf. Definition 6) with state-feedback information structure. We have seen earlier (cf. Theorem 2) that the CLNE strategies can be obtained from the solution of coupled HJB partial differential equations. For affine-quadratic differential games, these equations can be solved explicitly, since their solutions admit a general quadratic (in x) structure, as we will see shortly. This also readily leads to a set of SFNE strategies which are expressible in closed form. The result is captured in the following theorem, which follows from Theorem 2 by using in the coupled HJB equations the structural specification of the affine-quadratic game (cf. Definition 6), testing the solution structure \(V_j (t,x) = -{\frac {1 }{2}} x^{\prime }Z_j (t) x - x^{\prime }\zeta _j (t) - n_j(t){,}\; j \in M\), showing consistency, and equating like powers of x to arrive at differential equations for Zj, ζj, and nj (see, Başar and Olsder 1999, pp. 323–324).

Theorem 4

For the m-player affine-quadratic differential game introduced in Definition 6 , with \(Q_j (\cdot ) \geq 0,\ Q^f_j \geq 0\ \; ( j \in M)\), let there exist a set of matrix-valued functions Zj(⋅) ≥ 0, j ∈ M, satisfying the following m-coupled matrix Riccati differential equations:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{array}{c} \dot{Z}_j + Z_j \tilde{F} + \tilde{F}^{\prime }Z_j+ \sum_{i \in M}{} Z_i B_i {R^{i}_{i}}^{-1} R^{i}_{j} {R^i_i}^{-1} B_{i}^{\prime }Z_i + Q_j = 0; \\ Z_j(T) = Q^f_j, {} \end{array} \end{array} \end{aligned} $$
(2.53)

where

$$\displaystyle \begin{aligned} \begin{array}{rcl} \tilde{F }(t) := A (t) - \sum_{i \in M}{} B_i (t) R^i_i (t)^{-1} B_i (t)^{\prime }Z_i (t). {} \end{array} \end{aligned} $$
(2.54)

Then, under the state-feedback information structure, the differential game admits an SFNE solution, affine in the current value of the state, given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \gamma_{j}^* (x, t) = - R^j_j (t)^{-1} B_j (t)^{\prime }[ Z_j (t) x(t) + \zeta_j(t)], \quad j \in M, {} \end{array} \end{aligned} $$
(2.55)

where ζj (j ∈ M) are obtained as the unique solution of the coupled linear differential equations

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{\zeta}_j + \tilde{F}^{\prime }\zeta_j+ \sum_{i \in M}{}\ Z_i B_i {R^i_i} ^{-1} R^{i}_{j} {R^i_i}^{-1} B_i^{\prime }\zeta_i + Z_j \beta= 0;\; \zeta_j (T) = 0, {} \end{array} \end{aligned} $$
(2.56)

with

$$\displaystyle \begin{aligned} \begin{array}{rcl} \beta := c - \sum_{i \in M}{}\ B_i {R^i_i}^{-1} B_{i}^{\prime }\zeta_i . {} \end{array} \end{aligned} $$
(2.57)

The corresponding values of the payoff functionals are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \bar{J}_{j}^* = V_j (x^0, 0) = -{\frac{1}{2}} {x^0}^{\prime }Z_j(0) x^0 - { x^0}^{\prime }\zeta_j (0) - n_j(0), \quad j \in M, {} \end{array} \end{aligned} $$
(2.58)

where nj(⋅) (j ∈ M) are obtained as unique continuously differentiable solutions of

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{n}_j + \beta^{\prime }\zeta_j + {\frac{1 }{2}}\sum_{i \in M}{}\ \zeta_i^{\prime }B_i {R^i_i}^{-1}R^{i}_{j} {R^i_i}^{-1} B_i^{\prime }\zeta_i = 0;\; n_j (T) = 0. {} \end{array} \end{aligned} $$
(2.59)
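The following Python sketch (not from the chapter; all numerical values are assumptions, for a scalar state and two players) indicates how the SFNE construction of Theorem 4 can be carried out: the coupled equations (2.53) and (2.56) are integrated backward in time, and the state-feedback strategies (2.55) are then read off.

```python
# Sketch of Theorem 4 for an assumed scalar (n = 1), two-player example.
import numpy as np
from scipy.integrate import solve_ivp

A, c, T = 0.5, 0.2, 5.0
B    = np.array([1.0, 0.8])               # B_1, B_2
Rown = np.array([1.0, 2.0])               # R_1^1, R_2^2
Rcross = np.array([[1.0, 0.5],            # row j, column i: R_j^i (R_j^j on the diagonal)
                   [0.3, 2.0]])
Q   = np.array([2.0, 1.0])                # Q_1, Q_2
Qf  = np.array([1.0, 0.5])                # Q_1^f, Q_2^f

def rhs(t, y):
    Z, zeta = y[:2], y[2:]
    Ftil = A - np.sum(B**2 / Rown * Z)                          # (2.54)
    beta = c - np.sum(B**2 / Rown * zeta)                       # (2.57)
    dZ, dzeta = np.zeros(2), np.zeros(2)
    for j in range(2):
        cross  = np.sum(Z**2    * B**2 * Rcross[j] / Rown**2)   # sum_i Z_i B_i R_i^{-1} R_j^i R_i^{-1} B_i Z_i
        crossz = np.sum(Z * zeta * B**2 * Rcross[j] / Rown**2)  # same kernel applied to zeta_i
        dZ[j]    = -(2.0 * Z[j] * Ftil + cross + Q[j])          # scalar form of (2.53)
        dzeta[j] = -(Ftil * zeta[j] + crossz + Z[j] * beta)     # scalar form of (2.56)
    return np.concatenate([dZ, dzeta])

sol = solve_ivp(rhs, (T, 0.0), np.concatenate([Qf, np.zeros(2)]),
                dense_output=True, rtol=1e-8)

def sfne_strategy(j, x, t):
    """gamma_j^*(x, t) = -(R_j^j)^{-1} B_j [Z_j(t) x + zeta_j(t)]   (2.55)."""
    Z, zeta = sol.sol(t)[:2], sol.sol(t)[2:]
    return -B[j] / Rown[j] * (Z[j] * x + zeta[j])

print(sfne_strategy(0, 1.0, 0.0), sfne_strategy(1, 1.0, 0.0))
```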

Remark 9

Note that the “nonnegative-definiteness” requirement imposed on Zj(⋅) is a consequence of the fact that \(V_j (x, t) \leq 0 \; \forall x \in \mathbb {R}^n, t \in [0,T]\), this latter feature being due to the eigenvalue restrictions imposed a priori on Qj(⋅), \( Q^f_j\), and \(R^{i}_{j} (\cdot )\), i, j ∈ M. The corresponding “Nash” values for the payoff functionals follow from the fact that Vj(x, t) is the value function for player j at the SFNE, at any point (x, t). We also note that Theorem 4 provides only one set of SFNE strategies for the affine-quadratic game under consideration, and it does not attribute any uniqueness feature to this solution set. What can be shown, however, is the uniqueness of the SFNE when the players are restricted at the outset to affine memoryless state-feedback strategies (Başar and Olsder 1999).

Remark 10

The result above extends readily to more general affine-quadratic differential games where the payoff functions of the players contain additional terms that are linear in x, that is, with gj and Sj in Definition 6 extended, respectively, to

$$\displaystyle \begin{aligned} \begin{array}{rcl} g_j = -{\frac{1 }{2}} \left( x^{\prime }[Q_j(t) x + 2l_j(t)]+ \sum_{i \in M}{} u_i^{\prime }R^{i}_{j} u_i \right); \quad S_j(x)= -{\frac{1 }{2}} x^{\prime }[Q_j^f x + 2 l_j^f], \end{array} \end{aligned} $$

where lj(⋅) is a known n-dimensional vector-valued function, continuous on [0, T], and \(l_j^f\) is a fixed n-dimensional vector, for each j ∈ M. Then, the statement of Theorem 4 remains intact, with only the differential equation (2.56) that generates ζj(⋅) now reading:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{\zeta}_j + \tilde{F}^{\prime }\zeta_j + \sum_{i \in M}{}\ Z_i B_i { R^{i}_{i}}^{-1} R^{i}_j {R_i^i}^{-1} B_i^{\prime }\zeta_i + Z_j\beta + l_j= 0;\; \zeta_j (T) = l_j^f. \end{array} \end{aligned} $$

When comparing the SFNE with the OLNE, one question that comes up is whether there is a counterpart of Corollary 1 in the case of SFNE. The answer is no, because adding to gj terms that involve the controls of other players generally leads to a different optimization problem faced by player j, since the ui’s for i ≠ j depend on x and, through it, on uj. Hence, in general, a differential game with state-feedback information structure cannot be made strategically equivalent to a team (and hence optimal control) problem. One can, however, address the issue of simplification of the set of sufficient conditions (particularly the coupled matrix Riccati differential equations (2.53)) when the game is symmetric. Let us use the same setting as in Remark 7, but introduce in addition a common notation \(\hat {R} \) for the weighting matrices \(R_{j}^{i},i\neq j\), on the controls of the other players appearing in player j’s payoff function, for all j ∈ M (note that this was not an issue in the case of open-loop information since the \( R_{j}^{i}\)’s, i ≠ j, were not relevant to the OLNE). Focusing on symmetric SFNE, we can then rewrite the SFNE strategy (2.55) for player j as

$$\displaystyle \begin{aligned} \gamma _{j}^{\ast }(x,t)=-\bar{R}(t)^{-1}\bar{B}(t)^{\prime }[Z(t)x(t)+\zeta (t)],\quad j\in M, \end{aligned} $$
(2.60)

where Z(⋅) ≥ 0 solves

$$\displaystyle \begin{aligned} \dot{Z}+Z\tilde{F}+\tilde{F}^{\prime }Z+Z\bar{B}\bar{R}^{-1}\bar{B}^{\prime }Z+\sum_{i\neq j,\,i\in M}{}Z\bar{B}\bar{R}^{-1}\hat{R}\bar{R}^{-1}\bar{B}^{\prime }Z+Q=0{,}\;Z(T)=Q^{f}, {} \end{aligned} $$
(2.61)

with (from (2.54))

$$\displaystyle \begin{aligned} \tilde{F}:=A(t)-m\bar{B}(t)\bar{R}(t)^{-1}\bar{B}(t)^{\prime }Z(t). \end{aligned} $$
(2.62)

Substituting this expression for \(\tilde {F}\) into (2.61), we arrive at the following alternative (more revealing) representation:

$$\displaystyle \begin{aligned} \dot{Z}+ZA+A^{\prime }Z+Q-Z\bar{B}\bar{R}^{-1}\left[ (2m-1)\bar{R}-(m-1)\hat{R}\right] \bar{R}^{-1}\bar{B}^{\prime }Z=0{,}\;Z(T)=Q^{f}. {} \end{aligned} $$
(2.63)

Using the resemblance to the matrix Riccati differential equation that arises in standard optimal control (compare it with the differential equation (2.48) for K in Remark 7), we can conclude that (2.63) admits a unique continuously differentiable nonnegative-definite solution whenever the condition

$$\displaystyle \begin{aligned} 2\bar{R}-\hat{R}>0, {} \end{aligned} $$
(2.64)

holds. This condition can be interpreted as each player placing relatively more weight (in her payoff function) on her own control than on the control of any other individual player. In fact, if the weights are equal, then \(\bar {R }=\hat {R}\), and (2.63) becomes equivalent to (2.48); this is of course not surprising, since a symmetric game with \(\bar {R}=\hat {R}\) is essentially an optimal control problem (players have identical payoff functions), for which the OL and SF solutions have the same underlying Riccati differential equations.
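To see why condition (2.64) suffices here, note the elementary rearrangement (spelled out for completeness)

$$\displaystyle \begin{aligned} (2m-1)\bar{R}-(m-1)\hat{R}=(m-1)\left( 2\bar{R}-\hat{R}\right) +\bar{R}, \end{aligned}$$

so that, under (2.64) and \(\bar{R}>0\), the matrix multiplying the quadratic term in (2.63) is positive definite, and (2.63) has exactly the structure of the Riccati equation of a standard (concave) linear-quadratic optimal control problem.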

Now, to complete the characterization of the SFNE for the symmetric differential game, we have to write down the differential equation for ζj in Theorem 4, that is (2.56), using the specifications imposed by symmetry. Naturally, it now becomes independent of the index j and can be simplified to the form below:

$$\displaystyle \begin{aligned} \dot{\zeta}+\left[ A^{\prime }-Z\bar{B}\bar{R}^{-1}[(2m-1)\bar{R}-(m-1)\hat{R }]\bar{R}^{-1}\bar{B}^{\prime }\right] \zeta +Zc=0;\;\zeta (T)=0{.} {}\end{aligned} $$
(2.65)

The following corollary to Theorem 4 now captures the main points of the discussion above.

Corollary 2

For the symmetric affine-quadratic differential game introduced above, let the matrix Riccati differential equation (2.63) admit a unique continuously differentiable nonnegative-definite solution Z(⋅). Then the game admits a CLNE solution, which is symmetric across all players and is given by (2.60), where ζ(⋅) is generated uniquely by (2.65). If, furthermore, the condition (2.64) holds, then Z(⋅) exists and is unique.

Remark 11 (Zero-sum differential games with SF information)

The counterpart of Remark 8 on the saddle-point equilibrium can also be derived under the state-feedback information structure, this time specializing Theorem 4 to the two-player zero-sum differential game. Using the same setting as in Remark 8, it follows by inspection from Theorem 4 that \(Z_1 = - Z_2=:\hat {Z}\) and \(\zeta _1 = -\zeta _2= :\hat \zeta \), where the differential equations satisfied by \(\hat {Z}\) and \( \hat \zeta \) are precisely the ones satisfied by \(\hat {K}\) and \(\hat {k}\) in the OL case, that is, (2.51) and (2.52), respectively. Under the condition of the existence of a well-defined (unique continuously differentiable nonnegative-definite) solution to the matrix Riccati differential equation (2.51), the state-feedback saddle-point (SFSP) strategies for the players, satisfying (2.49), are given by (directly from Theorem 4)

$$\displaystyle \begin{aligned} \gamma_1^*(x, t) = -R_1^{-1} B_1^{\prime }[\hat{K}(t) x(t) + \hat{k}(t)],\; \gamma_2^*(x, t) = R_2^{-1} B_2^{\prime }[\hat{K}(t) x(t) + \hat{k}(t)]{,}\; t\geq 0.\end{aligned} $$

Note that these are in the same form as the OLSP strategies, with the difference being that they are now functions of the actual current value of the state instead of the computed value (as in the OLSP case). Another difference between the OLSP and the SFSP is that the latter does not require an a priori concavity condition to be imposed, and hence whether there exists a solution to (2.50) is irrelevant under state-feedback information; this condition is replaced by the existence of a solution to (2.51), which is less restrictive (Başar and Bernhard 1995). Finally, since the forms of the OLSP and SFSP strategies are the same, they generate the same state trajectory (and hence lead to the same value for J), provided that the corresponding existence conditions are satisfied.

4 Stackelberg Equilibria

In the previous sections, the assumption was that the players select their strategies simultaneously, without any communication. Consider now a different scenario: a two-player game where one player, the leader, makes her decision before the other player, the follower.Footnote 9 Such a sequence of moves was first introduced by von Stackelberg in the context of a duopoly output game; see, von Stackelberg (1934).

Denote by L the leader and by F the follower. Suppose that uL(t) and uF(t) are, respectively, the control vectors of L and F. The control constraints uL(t) ∈ UL and uF(t) ∈ UF must be satisfied for all t. The state dynamics and the payoff functionals are given as before by (2.1), (2.2), and (2.3), where we take the initial time to be t0 = 0, without any loss of generality. As with the Nash equilibrium, we will define an open-loop Stackelberg equilibrium (OLSE). We will also introduce what is called the feedback (or Markovian) Stackelberg equilibrium (FSE), which uses state-feedback information and provides the leader with only a time-incremental (instantaneous) lead advantage.

4.1 Open-Loop Stackelberg Equilibria (OLSE)

When both players use open-loop strategies, μL and μF, their control paths are determined by uL(t) = μL(x0, t) and uF(t) = μF(x0, t), respectively. Here μj denotes the open-loop strategy of player j.

The game proceeds as follows. At time t = 0, the leader announces her control path uL(⋅) for t ∈ [0, T]. Suppose, for the moment, that the follower believes in this announcement. The best she can do is then to select her own control path uF(⋅) to maximize the objective functional

$$\displaystyle \begin{aligned} J_{F}=\int_{0}^{T}g_{F}\left( {x}(t),{u}_{L}(t),{\ {u}}_{F}(t),t\right) dt+S_{F}({x}(T)), {} \end{aligned} $$
(2.66)

subject to the state dynamics

$$\displaystyle \begin{aligned} \dot{{x}}(t)=f\left( {x}(t),{u}_{L}(t),{u}_{F}(t),t\right) \quad x(0)=x^{0}, {} \end{aligned} $$
(2.67)

and the control constraint

$$\displaystyle \begin{aligned} {u}_{F}(t)\in U_{F}. {} \end{aligned} $$
(2.68)

This is a standard optimal control problem. To solve it, introduce the follower’s Hamiltonian

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle H_{F}({x}(t),{\lambda }_{F}(t),{u}_{F}(t),{u}_{L}(t),t) \\ &\displaystyle &\displaystyle \qquad\qquad\qquad\qquad=g_{F}({x}(t),{u}_{F}(t),{u}_{L}(t),t)+{ \lambda }_{F}f({x}(t),{\ {u}}_{F}(t) ,{u}_{L}(t) ,t), \end{array} \end{aligned} $$

where the adjoint variable λF = λF(t) is an n-vector. Suppose that the Hamiltonian HF is strictly concave in uF ∈ UF, where UF is a convex set. Then the maximization of HF with respect to uF, for t ∈ [0, T], uniquely determines uF(t) as a function of t, x, uL, and λF, which we write as

$$\displaystyle \begin{aligned} {u}_{F}(t)=R({x}(t),t,{u}_{L}(t),{\lambda }_{F}(t)). {} \end{aligned} $$
(2.69)

This defines the follower’s best reply (response) to the leader’s announced time path uL(⋅).

The follower’s costate equations and their boundary conditions in this maximization problem are given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{{\lambda}}_{F}(t) &\displaystyle =&\displaystyle -\frac{\partial}{\partial {x} } H_{F}, \\ {\lambda} _{F}(T) &\displaystyle =&\displaystyle \frac{\partial}{\partial {x}} S_{F}\left( {x } (T)\right) . \end{array} \end{aligned} $$

Substituting the best response function R into the state and costate equations yields a two-point boundary-value problem. The solution of this problem, \(\left ( {x}(t),{\lambda } _{F}(t)\right ) ,\) can be inserted into the function R. This represents the follower’s optimal behavior, given the leader’s announced time path uL(⋅).

The leader can replicate the follower’s arguments. This means that, since she knows everything the follower does, the leader can calculate the follower’s best reply R to any uL(⋅) that she may announce. The leader’s problem is then to select a control path uL(⋅) that maximizes her payoff given F’s response, that is, maximization of

$$\displaystyle \begin{aligned} J_{L}=\int_{0}^{T}g_{L}\left( {x}(t),{\ {u}}_{L}(t),R\left( {x}(t),t,{u} _{L}(t),{\lambda }_{F}(t)\right) ,t\right)dt+S_{L}({x}(T)), {} \end{aligned} $$
(2.70)

subject to

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{{x}}(t) &\displaystyle =&\displaystyle f({x}(t),{u}_{L}(t),R({x}(t),t,{u}_{L}(t),{\lambda } _{F}(t)),t),\quad x(0)=x^{0}, \\ \dot{{\lambda }}_{F}(t) &\displaystyle =&\displaystyle -\frac{\partial }{\partial {x}}H_{F}\left( {x}(t),{\lambda }_{F}(t),R\left( {x}(t),t,{u}_{L}(t),{\lambda }_{F}(t)\right) ,{u}_{L}(t),t\right) , \\ {\lambda }_{F}(T) &\displaystyle =&\displaystyle \frac{\partial}{\partial {x}} S_{F}\left( {x}(T)\right) , \end{array} \end{aligned} $$

and the control constraint

$$\displaystyle \begin{aligned} {u}_{L}(t)\in U_{L}. \end{aligned}$$

Note that the leader’s dynamics include two state equations, one governing the evolution of the original state variables x and a second one accounting for the evolution of λF, the adjoint variables of the follower, which are now treated as state variables. Again, we have an optimal control problem that can be solved using the maximum principle. To do so, we introduce the leader’s Hamiltonian

where λL =  λL(t) is the n-vector of costate variables appended to the state equation for x(t), with the boundary conditions

$$\displaystyle \begin{aligned} {\lambda }_{L}(T)=\frac{\partial}{\partial {x}} S_{L}\left( {x}(T)\right) , \end{aligned}$$

and θ = θ(t) is the vector of n costate variables appended to the state equation for λF(t), satisfying the initial condition

$$\displaystyle \begin{aligned} \theta \left( 0\right) =0. \end{aligned}$$

This initial condition is a consequence of the fact that λF(0) is “free,” i.e., unrestricted, being free of any soft constraint in the payoff function, as opposed to x(T) which enters a terminal reward term. The following theorem now collects all this for the OLSE (see, Başar and Olsder 1999, pp. 409–410).

Theorem 5

For the two-player open-loop Stackelberg differential game formulated in this subsection, let \(u_L^*(t) = \mu _L^*(x^0, t)\) be the leader’s open-loop equilibrium strategy and \(u_F^*(t) = \mu _F^*(x^0, t)\) be the follower’s. Let the solution to the follower’s optimization problem of maximizing JF given by (2.66) subject to the state equation (2.67) and control constraint (2.68) exist and be uniquely given by (2.69). Then,

  1. (i)

    The leader’s open-loop Stackelberg strategy \(\mu _L^*\) maximizes (2.70) subject to the given control constraint and the 2n -dimensional differential equation system for x and λF (given after (2.70)) with mixed boundary specifications.

  2. (ii)

    The follower’s open-loop Stackelberg strategy \(\mu _F^*\) is (2.69) with uL replaced by \(u_L^*\).

Remark 12

In the light of the discussion in this subsection leading to Theorem 5, it is possible to write down a set of necessary conditions (based on the maximum principle) which can be used to solve for L’s open-loop strategy \(\mu _{L}^{\ast }\). Note that, as mentioned before, in this maximization problem in addition to the standard state (differential) equation with specified initial conditions, we also have the costate differential equation with specified terminal conditions, and hence the dynamic constraint for the maximization problem involves a 2n-dimensional differential equation with mixed boundary conditions (see the equations for x and λF following (2.70)). The associated Hamiltonian is then HL, defined prior to Theorem 5, which has as its arguments two adjoint variables, λL and θ, corresponding to the differential equation evolutions for x and λF, respectively. Hence, from the maximum principle, these new adjoint variables satisfy the differential equations:

Finally, \(u_{L}^{\ast }(t)=\mu _{L}^{\ast }(x^{0},t)\) is obtained from the maximization of the Hamiltonian HL (where we suppress dependence on t ):

$$\displaystyle \begin{aligned} u_{L}^{\ast }=\arg \max_{{u}_{L}\in U_{L}}H_{L}(x,u_{L},R(x,u_{L},\lambda _{F}),\lambda _{L},\theta ). \end{aligned}$$

4.2 Feedback Stackelberg Equilibria (FSE)

We now endow both players with state-feedback information, as was done in the case of the SFNE, which is a memoryless information structure, not allowing the players to recall even the initial value of the state, x0, except at t = 0. In the case of the Nash equilibrium, this led to a meaningful solution, which also had the appealing feature of being subgame perfect and strongly time consistent. We will see in this subsection that this appealing feature does not carry over to the Stackelberg equilibrium when the leader announces her strategy in advance for the entire duration of the game; in fact, the differential game then becomes ill posed. This will force us to introduce, again under the state-feedback information structure, a different concept of Stackelberg equilibrium, called feedback Stackelberg, where strong time consistency is imposed at the outset. This will then lead to a derivation that parallels the one for the SFNE.

Let us first address the “ill-posedness” of the classical Stackelberg solution when the players use state-feedback information, in which case their strategies are mappings from \(\mathbb {R}^{n}\times \lbrack 0,T]\) into UL and UF, for L and F, respectively, with the state-time pair (x, t) as argument. Let us denote these strategies by γL ∈ ΓL and γF ∈ ΓF, respectively. Hence, the realizations of these strategies lead to the control actions (or control paths): uL(t) = γL(x, t) and uF(t) = γF(x, t), for L and F, respectively. Now, in line with the OLSE we discussed in the previous subsection, under the Stackelberg equilibrium, the leader L announces at time zero her strategy γL(x, t) and commits to using this strategy throughout the duration of the game. Then the follower F reacts rationally to L’s announcement, by maximizing her payoff function. Anticipating this, the leader selects a strategy that maximizes her payoff functional subject to the constraint imposed by the best response of F.

First let us look at the follower’s optimal control problem. Using the dynamic programming approach, we have the Hamilton-Jacobi-Bellman (HJB) equation characterizing F’s best response to an announced γL ∈ ΓL:

where VF is the value function of F, which has the terminal condition VF(x, T) = SF(x(T)). Note that, for each fixed γL ∈ ΓL, the maximizing control for F on the RHS of the HJB equation above is a function of the current time and state and hence is an element of ΓF. Thus, F’s maximization problem and its solution are compatible with the state-feedback information structure, and hence we have a well-defined problem at this stage. The dependence of this best response on γL, however, will be quite complex (much more than in the open-loop case), since what we have is a functional dependence in an infinite-dimensional space. Nevertheless, at least formally, we can write down this relationship as a best reaction function, \(\tilde {R}:\Gamma _{L}{\rightarrow } \Gamma _{F}\), for the follower:

$$\displaystyle \begin{aligned} \gamma _{F}=\tilde{R}(\gamma _{L}). {} \end{aligned} $$
(2.71)

Now, L can make this computation too, and according to the Stackelberg equilibrium concept, which is also called global Stackelberg solution (see, Başar and Olsder 1999), she has to maximize her payoff under the constraints imposed by this reaction function and the state dynamics that is formally

$$\displaystyle \begin{aligned} \max_{\gamma _{L}\in \Gamma _{L}}J_{L}(\gamma _{L}, \tilde{R}(\gamma _{L})) \end{aligned}$$

Leaving aside the complexity of this optimization problem (which is not an optimal control problem of the standard type, because of the presence of the reaction function, which depends on the entire strategy of L over the full time interval of the game), we note that this optimization problem is ill posed: for each choice of γL ∈ ΓL, \(J_{L}(\gamma _{L},\tilde {R}(\gamma _{L}))\) is not a real number but generally a function of the initial state x0, which is not available to L. Hence, what we have is a multi-objective optimization problem, and not a single-objective one, which makes the differential game with the standard (global) Stackelberg equilibrium concept ill posed. One way around this difficulty would be to allow the leader (as well as the follower) to recall the initial state (and hence modify their information sets to ν(x(t), x0, t)), or even to have full memory on the state (in which case, ν is ν(x(s), s ≤ t;t)); this would make the game well posed, but it would require a different set of tools to obtain the solution (see, e.g., Başar and Olsder 1980; Başar and Selbuz 1979 and Chap. 7 of Başar and Olsder 1999), and it also has connections to incentive designs and inducement of collusive behavior, further discussed in the next section of this chapter. We should also note that including x0 in the information set also makes it possible to obtain the global Stackelberg equilibrium under mixed information sets, with F’s information being inferior to that of L, such as νL(x(t), x0, t) for L and ν(x0, t) for F. Such a differential game would also be well defined.

Another way to resolve the ill-posedness of the global Stackelberg solution under state-feedback information structure is to give the leader only a stagewise (in the discrete-time context) first-mover advantage; in continuous time, this translates into an instantaneous advantage at each time t (Başar and Haurie 1984). This pointwise (in time) advantage leads to what is called a feedback Stackelberg equilibrium (FSE), which is also strongly time consistent (Başar and Olsder 1999). The characterization of such an equilibrium for j ∈{L, F} involves the HJB equations

$$\displaystyle \begin{aligned} \left\{ - \frac{\partial}{\partial t} V_{j}(x,t)\right\} _{j=L,F}& =\mathrm{Sta}\bigg\{ g_{j}({x},[u_{F},u_{L}],t)\\ & \left.\qquad+\frac{\partial}{\partial {x}} V_{j}(x,t)f({x} ,[u_{F},u_{L}],t)\right\} _{j=L,F}, {} \end{aligned} $$
(2.72)

where the “Sta” operator on the RHS solves, for each (x, t), for the Stackelberg equilibrium solution of the static two-player game in braces, with L as leader and F as follower. More precisely, the pointwise (in time) best response of F to γL ∈ ΓL is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{R}(x,t;\gamma _{L}(x,t))= \\ &\displaystyle &\displaystyle \qquad\arg \max_{{u}_{F}\in U_{F}}\left\{ g_{F}({x},[{u}_{F},\gamma _{L}(x,t)],t)+\frac{\partial}{\partial {x}} V_{F}(x,t)f({x},[{u}_{F},\gamma _{L}(x,t)],t)\right\} , \end{array} \end{aligned} $$

and taking this into account, L solves, again pointwise in time, the maximization problem:

$$\displaystyle \begin{aligned} \max_{{u}_{L}\in U_{L}}\left\{ g_{L}({x},[\hat{R}(x,t;{u}_{L}),{u}_{L}],t)+\frac{\partial}{\partial {x}} V_{L}(x,t)f(x,[\hat{R}(x,t;{u}_{L}),{u} _{L}],t)\right\} . \end{aligned}$$

Denoting the solution to this maximization problem by \(u_{L}=\hat {\gamma } _{L}(x,t)\), an FSE for the game is then the pair of state-feedback strategies:

$$\displaystyle \begin{aligned} \left( \,\,\hat{\gamma}_{L}(x,t),\;\hat{\gamma}_{F}(x,t)=\hat{R}(x,t;\hat{ \gamma}_{L}(x,t))\,\,\right) . {} \end{aligned} $$
(2.73)

Of course, following the lines we have outlined above, it should be obvious that explicit derivation of this pair of strategies depends on the construction of the value functions, VL and VF, satisfying the HJB equations (2.72). Hence, to complete the solution, one has to solve (2.72) for VL(x, t) and VF(x, t) and use these functions in (2.73). The main difficulty here is, of course, in obtaining explicit solutions to the HJB equations, which however can be done in some classes of games, such as those with linear dynamics and quadratic payoff functions (in which case VL and VF will be quadratic in x) (Başar and Olsder 1999). We provide some evidence of this solvability through numerical examples in the next subsection.

4.3 An Example: Construction of Stackelberg Equilibria

Consider the example of Sect. 3.7 but now with player 1 as the leader (from now on referred to as player L) and player 2 as the follower (player F). Recall that player j’s optimization problem and the underlying state dynamics are

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \max_{{u}_{j}}\left\{ J_{j}{=}\int_{0}^{\infty }e^{-\rho t}\left( u _{j}(t)\left( \kappa {-} \frac{1}{2}{u}_{j}(t)\right) -\frac{1}{2}\varphi x^{2}(t) \right) dt\right\}{,} \; j{=}L, F, {} \end{array} \end{aligned} $$
(2.74)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \dot{{x}}(t)={u}_{L}(t) +{u}_{F}(t) -\alpha x(t),\;\;x(0)=x^{0}, \end{array} \end{aligned} $$
(2.75)

where φ and κ are positive parameters and 0 < α < 1. We again suppress the time argument henceforth when no ambiguity may arise. We discuss below both OLSE and FSE, but with horizon length infinite. This will give us an opportunity to introduce, in this context, also the infinite-horizon Stackelberg differential game.

4.3.1 Open-Loop Stackelberg Equilibrium (OLSE).

To obtain the best reply of the follower to the leader’s announcement of the path uL(t), we introduce the Hamiltonian of player F:

$$\displaystyle \begin{aligned} {H}_{F}(x,{u}_{L},{u}_{F})={u}_{F}\left( \kappa -\frac{1}{2}{u}_{F}\right) - \frac{1}{2}\varphi x^{2}+{q} _{F}({u}_{L}+{u}_{F}-\alpha x), \end{aligned}$$

where qF is the follower’s costate variable associated with the state variable x. HF being quadratic and strictly concave in uF, it has a unique maximum:

$$\displaystyle \begin{aligned} {u}_{F}=\kappa +{q} _{F}{,} {} \end{aligned} $$
(2.76)

where (from the maximum principle) qF satisfies

$$\displaystyle \begin{aligned} \dot{{q}}_{F}=\rho {q} _{F}-\frac{\partial}{\partial x} {H}_{F} =(\rho +\alpha ){q} _{F}+\varphi x,\quad\lim_{t\rightarrow \infty }e^{-\rho t}{q} _{F}(t)=0, {} \end{aligned} $$
(2.77)

and with (2.76) used in the state equation, we have

$$\displaystyle \begin{aligned} \dot{{x}}(t)={u}_{L}(t) +\kappa +{q} _{F}(t) -\alpha x(t),\;\;x(0)=x^{0}. {} \end{aligned} $$
(2.78)

Now, one approach here would be first to solve the two differential equations (2.77) and (2.78) and next to substitute the solutions in (2.76) to arrive at follower’s best reply, i.e., uF(t) = R(x(t), uL(t), qF(t)). Another approach would be to postpone the resolution of these differential equations and instead use them as dynamic constraints in the leader’s optimization problem:

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \max_{{u}_{L}}\left\{ J_{L}=\int_{0}^{\infty }e^{-\rho t}\left( u_{L}\left( \kappa - \frac{1}{2}{u}_{L}\right) -\frac{1}{2}\varphi x^{2}\right) dt\right\} \\ &\displaystyle &\displaystyle \dot{{q}}_{F} =(\rho +\alpha ){q} _{F}+\varphi x,\quad\lim_{t\rightarrow \infty }e^{-\rho t}{q} _{F}(t)=0, \\ &\displaystyle &\displaystyle \dot{x} ={u}_{L}+\kappa +{q} _{F}-\alpha x,\quad x(0)=x^{0}. \end{array} \end{aligned} $$

This is an optimal control problem with two state variables (qF and x) and one control variable (uL). Introduce the leader’s Hamiltonian:

$$\displaystyle \begin{aligned} {H}_{L}(x,{q}_{F},{u}_{L},{q}_{L},\theta )={u}_{L}\left( \kappa -\frac{1}{2}{u}_{L}\right) -\frac{1}{2}\varphi x^{2}+{q}_{L}\left( {u}_{L}+\kappa +{q}_{F}-\alpha x\right) +\theta \left( (\rho +\alpha ){q}_{F}+\varphi x\right) , \end{aligned}$$

where θ and qL are adjoint variables associated with the two state equations in the leader’s optimization problem. Being quadratic and strictly concave in uL, HL also admits a unique maximum, given by

$$\displaystyle \begin{aligned} {u}_{L}=\kappa +{q} _{L}{,} \end{aligned}$$

and we have the corresponding state and adjoint equations. Substituting the expression for uL into the differential equation for x, we obtain a system of four differential equations, written in matrix form as follows:

$$\displaystyle \begin{aligned} \left( \begin{array}{c} \dot{\theta} \\ \dot{{q}}_{L} \\ \dot{x} \\ \dot{{q}}_{F} \end{array} \right) =\left( \begin{array}{cccc} -\alpha & -1 & 0 & 0 \\ -\varphi & \rho +\alpha & \varphi & 0 \\ 0 & 1 & -\alpha & 1 \\ 0 & 0 & \varphi & \rho +\alpha \end{array} \right) \left( \begin{array}{c} \theta \\ {q} _{L} \\ x \\ {q} _{F} \end{array} \right) +\left( \begin{array}{c} 0 \\ 0 \\ 2\kappa \\ 0 \end{array} \right) . \end{aligned}$$

Solving the above system yields \(\left ( \theta ,{q} _{L},x,{q} _{F}\right ) \). The last step would be to insert the solutions for qF and qL in the equilibrium conditions

$$\displaystyle \begin{aligned} {u}_{F}=\kappa +{q} _{F},\quad{u}_{L}=\kappa +{q} _{L}, \end{aligned}$$

to obtain the open-loop Stackelberg equilibrium controls uL and uF.
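A small numerical sketch of this last step (with assumed, illustrative parameter values) is given below: the steady state of the four-dimensional linear system is obtained by solving M z = −b, and the eigenvalues of M indicate which directions are stable, which is what one needs in order to pick the trajectory satisfying the mixed boundary conditions θ(0) = 0, x(0) = x0, and the transversality conditions on qL and qF.

```python
# Steady state and eigenstructure of the 4x4 linear system of the OLSE example.
import numpy as np

alpha, rho, phi, kappa = 0.2, 0.1, 0.5, 1.0   # illustrative parameters

M = np.array([
    [-alpha, -1.0,          0.0,   0.0        ],   # theta
    [-phi,    rho + alpha,  phi,   0.0        ],   # q_L
    [ 0.0,    1.0,         -alpha, 1.0        ],   # x
    [ 0.0,    0.0,          phi,   rho + alpha]])  # q_F
b = np.array([0.0, 0.0, 2.0 * kappa, 0.0])

z_ss = np.linalg.solve(M, -b)                  # steady state (theta, q_L, x, q_F)
theta_ss, qL_ss, x_ss, qF_ss = z_ss

print("steady state:", z_ss)
print("steady-state controls:  u_L =", kappa + qL_ss, "  u_F =", kappa + qF_ss)
print("eigenvalues of M:", np.linalg.eigvals(M))
```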

4.3.2 Feedback-Stackelberg Equilibrium (FSE).

To obtain the FSE, we first have to consider the infinite-horizon version of (2.72) and compute the best response of F to uL = γL(x). The maximization problem faced by F has the associated steady-state HJB equation for the current-value function \(\mathcal {V}_F(x)\) (with the value function defined as \(V_F(x,t)=e^{-\rho t}\mathcal {V}_F(x)\)):

$$\displaystyle \begin{aligned} \rho {\mathcal{V}}_{F}(x)=\max_{{u}_{F}}\left[ {u}_{F}\left( \kappa - \frac{1}{2}{u}_{F}\right) -\frac{1}{2}\varphi x^{2}+\frac{\partial}{{\partial }{x}}{{\mathcal{V}}_{F}(x)}\left( {u}_{L}+{u}_{F}-\alpha x\right) \right]. {} \end{aligned} $$
(2.79)

Maximization of the RHS yields (uniquely, because of strict concavity)

$$\displaystyle \begin{aligned} {u}_{F}=\kappa +\frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{F}(x)}. {} \end{aligned} $$
(2.80)

Note that the above reaction function of the follower does not directly depend on the leader’s control uL, but only indirectly, through the state variable.

Accounting for the follower’s response, the leader’s HJB equation is

$$\displaystyle \begin{aligned} \rho {\mathcal{V}}_{L}(x)=\max_{{u}_{L}}\left[ {u}_{L}\left( \kappa -\frac{1}{2}{u}_{L}\right) -\frac{1}{2}\varphi x^{2}+\frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{L}(x)}\left( {u}_{L}+\kappa +\frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{F}(x)}-\alpha x\right) \right] , {} \end{aligned} $$
(2.81)

where \({\mathcal {V}}_{L}(x)\) denotes the leader’s current-value function. Maximizing the RHS yields

$$\displaystyle \begin{aligned} {u}_{L}=\kappa +\frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{L}(x)} . \end{aligned}$$

Substituting in (2.81) leads to

$$\displaystyle \begin{aligned} \begin{array}{rcl} \rho {\mathcal{V}}_{L}(x) &\displaystyle =&\displaystyle \left( \kappa +\frac{\partial}{{\partial}{x}}{{ \mathcal{V}}_{L}(x)} \right) \left( \kappa -\frac{1}{2}\left( \kappa +\frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{L}(x)} \right) \right) {} \\ &\displaystyle &\displaystyle -\frac{1}{2}\varphi x^{2}+\frac{\partial}{{\partial}{\ x}}{{\mathcal{V}} _{L}(x)}\left( \frac{\partial}{{\partial}{x}}{{\mathcal{V}}_{L}(x)} +\frac{ \partial}{{\partial}{x}}{{\mathcal{V}}_{F}(x)} +2\kappa -\alpha x\right) .\end{array} \end{aligned} $$
(2.82)

As the game at hand is of the linear-quadratic type, we can take the current value functions to be general quadratic. Accordingly, let

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathcal{V}}_{L}(x) &\displaystyle =&\displaystyle \frac{A_{L}}{2}x^{2}+B_{L}x+C_{L}, {} \end{array} \end{aligned} $$
(2.83)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathcal{V}}_{F}(x) &\displaystyle =&\displaystyle \frac{A_{F}}{2}x^{2}+B_{F}x+C_{F}, {} \end{array} \end{aligned} $$
(2.84)

be, respectively, the leader’s and the follower’s current-value functions, where the six coefficients are yet to be determined. Substituting these structural forms in (2.82) yields

$$\displaystyle \begin{aligned} &\rho \left( \frac{A_{L}}{2}x^{2}+B_{L}x+C_{L}\right) =\frac{1}{2}\left( A_{L}^{2}-\varphi +2\left( A_{F}-\alpha \right) A_{L}\right) x^{2}\\ &+\left( A_{L}\left( B_{L}+B_{F}+2\kappa \right)+\left( A_{F}-\alpha \right) B_{L}\right) x+\frac{1}{2}\left( \kappa ^{2}+B_{L}^{2}\right) +\left( B_{F}+2\kappa \right) B_{L}. \end{aligned} $$

Using (2.79), (2.80), and (2.83)–(2.84), we arrive at the following algebraic equation for the follower:

$$\displaystyle \begin{aligned} &\rho \left( \frac{A_{F}}{2}x^{2}+B_{F}x+C_{F}\right) =\frac{1}{2}\left( A_{F}^{2}-\varphi +2\left( A_{L}-\alpha \right) A_{F}\right) x^{2}\\ &+\left( A_{F}\left( B_{F}+B_{L}+2\kappa \right)+\left( A_{L}-\alpha \right) B_{F}\right) x+\frac{1}{2}\left( \kappa ^{2}+B_{F}^{2}\right) +\left( B_{L}+2\kappa \right) B_{F}. \end{aligned} $$

By comparing the coefficients of like powers of x, we arrive at the following six-equation, nonlinear algebraic system:

$$\displaystyle \begin{aligned} \begin{array}{rcl} 0 &\displaystyle =&\displaystyle A_{L}^{2}+\left( 2A_{F}-2\alpha -\rho \right) A_{L}-\varphi , \\ 0 &\displaystyle =&\displaystyle A_{L}\left( B_{L}+B_{F}+2\kappa \right) +\left( A_{F}-\alpha -\rho \right) B_{L}, \\ 0 &\displaystyle =&\displaystyle \frac{1}{2}\left( \kappa ^{2}+B_{L}^{2}\right) +\left( B_{F}+2\kappa \right) B_{L}-\rho C_{L}, \\ 0 &\displaystyle =&\displaystyle A_{F}^{2}-\varphi +\left( 2A_{L}-2\alpha -\rho \right) A_{F}, \\ 0 &\displaystyle =&\displaystyle A_{F}\left( B_{F}+B_{L}+2\kappa \right) +\left( A_{L}-\alpha -\rho \right) B_{F}, \\ 0 &\displaystyle =&\displaystyle \frac{1}{2}\left( \kappa ^{2}+B_{F}^{2}\right) +\left( B_{L}+2\kappa \right) B_{F}-\rho C_{F}. \end{array} \end{aligned} $$

The above system generally admits multiple solutions. One can eliminate some of these based on, e.g., convergence to an asymptotically globally stable steady state. Let the sextuple \(\left ( A_{L}^{S},B_{L}^{S},C_{L}^{S},A_{F}^{S},B_{F}^{S},C_{F}^{S}\right ) \) denote a solution to the above system, satisfying the additional desirable properties. Then, a pair of FSE strategies is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} {u}_{F} &\displaystyle =&\displaystyle \kappa +{\mathcal{V}}_{F}^{\prime }\left( x\right) =\kappa +A_{F}^{S}x+B_{F}^{S}, \\ {u}_{L} &\displaystyle =&\displaystyle \kappa +{\mathcal{V}}_{L}^{\prime }\left( x\right) =\kappa +A_{L}^{S}x+B_{L}^{S}. \end{array} \end{aligned} $$
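A numerical sketch of this computation (with assumed, illustrative parameter values) is given below: the six nonlinear algebraic equations are solved with a root finder, and the FSE strategies are then formed. Different initial guesses may return different roots; stability of the induced state dynamics can be used to select among them.

```python
# Sketch of the FSE computation for the example, with assumed parameters.
import numpy as np
from scipy.optimize import fsolve

alpha, rho, phi, kappa = 0.2, 0.1, 0.5, 1.0

def equations(z):
    AL, BL, CL, AF, BF, CF = z
    return [
        AL**2 + (2*AF - 2*alpha - rho)*AL - phi,
        AL*(BL + BF + 2*kappa) + (AF - alpha - rho)*BL,
        0.5*(kappa**2 + BL**2) + (BF + 2*kappa)*BL - rho*CL,
        AF**2 - phi + (2*AL - 2*alpha - rho)*AF,
        AF*(BF + BL + 2*kappa) + (AL - alpha - rho)*BF,
        0.5*(kappa**2 + BF**2) + (BL + 2*kappa)*BF - rho*CF,
    ]

# start from a guess with negative A's (value functions concave in x)
AL, BL, CL, AF, BF, CF = fsolve(equations, [-1.0, -1.0, 0.0, -1.0, -1.0, 0.0])

uF = lambda x: kappa + AF*x + BF     # follower's FSE strategy
uL = lambda x: kappa + AL*x + BL     # leader's FSE strategy

print(AL, BL, AF, BF)
# closed-loop state dynamics: xdot = (AL + AF - alpha) x + (BL + BF + 2 kappa)
print("closed-loop state coefficient:", AL + AF - alpha)   # negative for stability
```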

4.4 Time Consistency of Stackelberg Equilibria

When, at an initial instant of time, the leader announces a strategy she will use throughout the game, her goal is to influence the follower’s strategy choice in a way that will be beneficial to her. Time consistency addresses the following question: given the option to re-optimize at a later time, will the leader stick to her original plan, i.e., the announced strategy and the resulting time path for her control variable? If it is in her best interest to deviate, then the leader will do so, and the equilibrium is then said to be time inconsistent. An inherently related question is then why would the follower, who is a rational player, believe in the announcement made by the leader at the initial time if it is not credible? The answer is clearly that she would not.

In most Stackelberg differential games, it turns out that the OLSE is time inconsistent, that is, the leader’s announced control path uL(⋅) is not credible. Markovian or feedback Stackelberg equilibria (FSE), on the other hand, are subgame perfect and hence time consistent; they are in fact strongly time consistent, which refers to the situation where the restriction of the leader’s originally announced strategy to a shorter time interval (sharing the same terminal time) is still an FSE, regardless of the evolution of the game up to the start of that shorter interval.

The OLSE in Sect. 4.3 is time inconsistent. To see this, suppose that the leader has the option of revising her plan at time τ > 0 and of choosing a new decision rule uL(⋅) for the remaining time span [τ, ∞). Then she will select a rule that satisfies θ(τ) = 0 (because this choice fulfills the initial condition on the costate θ). It can be shown, by using the four state and costate equations \(\left ( \dot {x},\dot {{q}}_{F},\dot {{q}}_{L},\dot {\theta }\right ) ,\) that at some instant of time τ > 0 the original solution yields θ(τ) ≠ 0. Therefore, the leader will want to announce a new strategy at time τ, and this makes the original strategy time inconsistent, i.e., the new strategy does not coincide with the restriction of the original strategy to the interval [τ, ∞).

Before concluding this subsection, we make two useful observations.

Remark 13

Time consistency (and even stronger, strong time consistency) of FSE relies on the underlying assumption that the information structure is state feedback and hence without memory, that is, at any time t, the players do not remember the history of the game.

Remark 14

In spite of being time inconsistent, the OLSE can still be a useful solution concept for some short-term horizon problems, where it makes sense to assume that the leader will not be tempted to re-optimize at an intermediate instant of time.

5 Memory Strategies and Collusive Equilibria

5.1 Implementing Memory Strategies in Differential Games

As mentioned earlier, by memory strategies we mean that the players can, at any instant of time, recall any specific past information. The motivation for using memory strategies in differential games is to reach, through an equilibrium, a desirable outcome that is not obtainable noncooperatively using open-loop or state-feedback strategies. Loosely speaking, this requires that the players agree (implicitly, or without entering into any binding agreement) on a desired trajectory to follow throughout the game (typically a cooperative solution) and are willing to implement a punishment strategy if a deviation is observed. Richness of an information structure, brought about through incorporation of memory, enables such monitoring.

If one party realizes, or remembers, that, in the past, the other party deviated from an agreed-upon strategy, it implements some pre-calculated punishment. Out of the fear of punishment, the players adhere to the Pareto-efficient path, which would be unobtainable in a strictly noncooperative game.

A punishment is conceptually and practically attractive only if it is effective, i.e., it deprives a player of the benefits of a defection, and credible, i.e., it is in the best interest of the player(s) who did not defect to implement this punishment. In this section, we first introduce the concept of non-Markovian strategies and the resulting Nash equilibrium and next illustrate these concepts through a simple example.

Consider a two-player infinite-horizon differential game, with state equation

$$\displaystyle \begin{aligned} \dot{x}(t)=f\left( {x}(t),{\ {u}}_{1}(t),{\ {u}}_{2}(t),t\right) ,\quad{x} (0)={x}^{0}. \end{aligned}$$

To a pair of controls \(\left ( {u}_{1}(t),{u}_{2}(t)\right ) \), there corresponds a unique trajectory x(⋅) emanating from x0. Player j’s payoff is given by

$$\displaystyle \begin{aligned} J_{j}({\ {u}}_{1}(t),{\ {u}}_{2}(t);x^{0})=\int_{0}^{\infty }e^{-\rho _{j}t}g_{j}({x}(t),{\ {u}}_{1}(t),{\ {u}}_{2}(t),t)\>dt,\;j=1,2, \end{aligned}$$

where gj(x(t), u1(t), u2(t), t) is taken to be bounded and continuously differentiable.Footnote 10 As before, the control set of player j is Uj and the state set X is identical to \(\mathbb {R}^{n}\).

Heretofore, a strategy has been defined as a mapping from a player’s information space to her control set. Unfortunately, this direct approach poses formidable mathematical difficulties in the present context; therefore, we will define a strategy as an infinite sequence of approximate constructions, called δ-strategies. For player j, consider the sequence of times ti = iδ, i = 0, 1, …, where δ is a fixed positive number. For any time interval [ti, ti+1), let \(\mathcal {U} _{j}^{i}\) be the set of measurable control functions uj,i : [ti, ti+1) → Uj, and let \(\mathcal {U}^{i}=\mathcal {U }_{1}^{i}\times \mathcal {U}_{2}^{i}\). A δ-strategy for player j is a sequence \(\Delta _{j}^{\delta }=\left ( \Delta _{j,i}\right ) _{i=0,1,\ldots ,}\) of mappings

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta _{j,0} &\displaystyle \in &\displaystyle \mathcal{U}_{j}^{0}, \\ \Delta _{j,i} &\displaystyle : &\displaystyle \mathcal{U}^{0}\times \mathcal{U}^{1}\times \ldots \times \mathcal{U}^{i-1}\rightarrow \mathcal{U}_{j}^{i}\quad\text{for }i=1,2,\ldots . \end{array} \end{aligned} $$

A strategy for player j is an infinite sequence of δ -strategies:

$$\displaystyle \begin{aligned} \Delta _{j}=\left\{ \Delta _{j}^{\delta _{n}}:\delta _{n}\rightarrow 0,n=1,2,\ldots \right\} . \end{aligned}$$

Note that this definition implies that the information set of player j at time t is

$$\displaystyle \begin{aligned} \left\{ \left({u}_{1}(s),{\ {u}}_{2}(s)\right) ,0\leq s<t\right\} , \end{aligned}$$

that is, the entire control history up to (but not including) time t. So when players choose δ-strategies, they are using, at the successive sample times ti, the accumulated information to generate a pair of measurable controls \(\left ( {{u}}_{1}^{\delta }(\cdot ),{\ {u}} _{2}^{\delta ^{\prime }}(\cdot )\right ) \) which, in turn, generate a unique trajectory \(x^{\bar {\delta }}\left ( \cdot \right ) \) and thus, a unique outcome \(w^{\bar {\delta }}=\) \(\left ( w_{1}^{\bar {\delta }},w_{2}^{\bar { \delta }}\right ) \in \mathbb {R}^{2},\) where \(\bar {\delta }=\left ( \delta ,\delta ^{\prime }\right ) \), and

$$\displaystyle \begin{aligned} w_{j}^{\bar{\delta}}=\int_{0}^{\infty }e^{-\rho _{j}t}g_{j}({x}^{\bar{\delta} }(t) ,{\ {u}}_{1}^{\delta }(t),{\ {u}}_{2}^{\delta ^{\prime }}(t),t)\>dt. \end{aligned}$$

An outcome of the strategy pair \(\bar {\Delta }\) is a pair \(\bar {w}\in \mathbb {R}^{2},\) which is a limit of the sequence \(\left \{ w^{\bar { \delta }_{n}}\right \} \) of the outcomes of δ-strategy pairs \( \bar {\Delta }^{\bar {\delta }_{n}}=\left ( \Delta _{1}^{\delta _{n}},\Delta _{2}^{\delta _{n}^{\prime }}\right ) \) when n tends to infinity. With a strategy pair, the initial state and time are thus associated with a set \( v\left ( t^{0}, {x}^{0};\bar {\Delta }\right ) \) of possible outcomes. (Note that we have used the obvious extension to a non-zero initial time t0.) The game is well defined if, for any strategy pair \(\bar {\Delta }\) and any initial conditions \(\left ( t^{0},x^{0}\right ) ,\) the set of outcomes \( v\left ( t^{0}, {x}^{0};\bar {\Delta }\right ) \) is nonempty.Footnote 11

Definition 7

A strategy pair \(\bar {\Delta }^{\ast }\) is a Nash equilibrium at \(\left ( t^{0}, {x}^{0}\right ) \) if, and only if,

  1. 1.

    the outcome set \(v\left ( t^{0}, {x}^{0}; \bar {\Delta }\right ) \) reduces to a singleton \(w^{\ast }=\ \left ( w_{1}^{\ast },w_{2}^{\ast }\right )\);

  2. 2.

    for all strategy pairs \(\bar {\Delta }^{\left ( 1\right ) }\triangleq \left ( \Delta _{1},\Delta _{2}^{\ast }\right ) \) and \(\bar {\Delta }^{\left ( 2\right ) }\triangleq \left ( \Delta _{1}^{\ast },\Delta _{2}\right )\), the following holds for j = 1, 2:

$$\displaystyle \begin{aligned} \left( w_{1},w_{2}\right) \in v\left( t^{0}, {x}^{0};\bar{\Delta} ^{\left( j\right) }\right) \Rightarrow w_{j}\leq w_{j}^{\ast }. \end{aligned}$$

The equilibrium condition for the strategy pair is valid only at \(\left ( t^{0}, {x}^{0}\right ) \). This implies, in general, that the Nash equilibrium that was just defined is not subgame perfect.

Definition 8

A strategy pair \(\bar {\Delta }^{\ast }\) is a subgame-perfect Nash equilibrium at \(\left ( t^{0}, {x}^{0}\right ) \) if, and only if,

  1. 1.

    given a control pair \(\bar {u}\left ( \cdot \right ) :[t^{0},t)\rightarrow U_{1}\times U_{2}\) and the state x(t) reached at time t, we define the prolongation of \(\bar {\Delta }^{\ast }\) at \(\left ( t, {x}(t)\right ) \) as \(\left \{ \Delta ^{{ }^{\prime \ast \delta _{n}}}:\delta _{n}\rightarrow 0,n=1,2,\ldots \right \} \) defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \Delta ^{{}^{\prime \ast \delta _{n}}}\left( \bar{ {u}}_{\left[ t,t+\delta _{n} \right] },\ldots ,\bar{ {u}}_{\left[ t+i\delta _{n},t+\left( i+1\right) \delta _{n}\right] }\right) \\ &\displaystyle =&\displaystyle \Delta ^{{}^{\ast \delta _{n}}}\left( \bar{ {u}}_{\left[ 0,\delta _{n} \right] }, \bar{ {u}}_{\left[ \delta _{n},2\delta _{n}\right] },\ldots , \bar{u}_{\left[ t+i\delta _{n},t+\left( i+1\right) \delta _{n}\right] }\right) ; \end{array} \end{aligned} $$
  2. 2.

    the prolongation of \(\bar {\Delta }^{* }\) at \(\left ( t,x(t) \right ) \) is again an equilibrium at \(\left ( t, {x}(t) \right ) \).

Before providing an illustrative example, we make a couple of points in the following remark.

Remark 15

  1. 1.

    The information set was defined here as the entire control history. An alternative definition is \(\left \{ x\left ( s\right ) ,0\leq s<t\right \} \), that is, each player bases her decision on the entire past state trajectory. Clearly, this definition requires less memory capacity and hence may be an attractive option, particularly when the differential game involves more than two players. (See Tolwinski et al. 1986 for details.)

  2. 2.

    The consideration of memory strategies in differential games can be traced back to Varaiya and Lin (1963), Friedman (1971), and Krassovski and Subbotin (1977). Their setting was (mainly) zero-sum differential games , and they used memory strategies as a convenient tool for proving the existence of a solution. Başar used memory strategies in the 1970s to show how richness of and redundancy in information structures could lead to informationally nonunique Nash equilibria (Başar 1974, 1975, 1976, 1977) and how the richness and redundancy can be exploited to solve for global Stackelberg equilibria (Başar 1979, 1982; Başar and Olsder 1980; Başar and Selbuz 1979) and to obtain incentive designs (Başar 1985). The exposition above follows Tolwinski et al. (1986) and Haurie and Pohjola (1987), where the setting is nonzero-sum differential games and the focus is on the construction of cooperative equilibria.

5.2 An Example

Consider a two-player differential game where the evolution of the state is described by

$$\displaystyle \begin{aligned} {\ \dot{x}}(t)=\left( 1-{u}_{1}(t) \right) {u}_{2}(t) , \quad x\left( 0\right) =x^{0}>0, {} \end{aligned} $$
(2.85)

where 0 < uj(t) < 1. The players maximize the following objective functionals:

$$\displaystyle \begin{aligned} \begin{array}{rcl} J_{1}({u}_{1}(t),{u}_{2}(t);x^{0}) &\displaystyle =&\displaystyle \alpha \int_{0}^{\infty }e^{-\rho t}\left( \ln {u}_{1}(t) +x(t)\right) \>dt, \\ J_{2}({u}_{1}(t),{u}_{2}(t);x^{0}) &\displaystyle =&\displaystyle \left( 1-\alpha \right) \int_{0}^{\infty }e^{-\rho t}\left( \ln \left( 1-{u}_{1}(t) \right) \left( 1-{u}_{2}(t) \right) +x(t)\right) \>dt, \end{array} \end{aligned} $$

where 0 < α < 1 and 0 < ρ ≤ 1∕4.

Suppose that the two players wish to implement a cooperative solution noncooperatively by using non-Markovian strategies and threats.

Step 1: Determine Cooperative Outcomes.

Assume that these outcomes are given by the joint maximization of the sum of players’ payoffs. To solve this optimal control problem, we introduce the current-value Hamiltonian (we suppress the time argument):

$$\displaystyle \begin{aligned} {{\mathcal{H}}}\left( {u}_{1},{u}_{2},x,{q} \right) =\alpha \ln {u} _{1}+\left( 1-\alpha \right) \ln \left( 1-{u}_{1}\right) \left( 1-{u} _{2}\right) +x+{q} \left( 1-{u}_{1}\right) {u}_{2}, \end{aligned}$$

where q is the current-value adjoint variable associated with the state equation (2.85). Necessary and sufficient optimality conditions are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{x} &\displaystyle =&\displaystyle \left( 1-{u}_{1}\right) {u}_{2},\quad x\left( 0\right) =x^{0}>0, \\ \dot{{q}} &\displaystyle =&\displaystyle \rho {q} -1,\quad\lim_{t\rightarrow \infty }e^{-\rho t}{q(t)} =0, \\ \frac{\partial {{\mathcal{H}}}}{\partial {u}_{1}} &\displaystyle =&\displaystyle \frac{\alpha }{{u}_{1}}- \frac{\left( 1-\alpha \right) }{\left( 1-{u}_{1}\right) }-{q} {u}_{2}=0, \\ \frac{\partial {{\mathcal{H}}}}{\partial {u}_{2}} &\displaystyle =&\displaystyle -\frac{\left( 1-\alpha \right) }{ \left( 1-{u}_{2}\right) }+{q} \left( 1-{u}_{1}\right) =0. \end{array} \end{aligned} $$

It is easy to verify that the unique optimal solution is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left( {u}_{1}^{\ast },{u}_{2}^{\ast }\right) &\displaystyle =&\displaystyle \left( \alpha \rho ,\frac{ 1-\rho }{1-\alpha \rho }\right) ,\quad x^{\ast }(t) =x^{0}+\left( 1-\rho \right) t, \\ J_{1}\left( {u}_{1}^{\ast }(\cdot ),{u}_{2}^{\ast }(\cdot ); {x} ^{0}\right) &\displaystyle =&\displaystyle \frac{\alpha }{\rho }\left( \ln \alpha \rho +x^{0}+\frac{ 1-\rho }{\rho } \right) ,\quad\\ J_{2}\left( {u}_{1}^{\ast }(\cdot ),{u}_{2}^{\ast }(\cdot ); {x} ^{0}\right) &\displaystyle =&\displaystyle \frac{1-\alpha }{\rho }\left( \ln \left( 1-\alpha \right) \rho +x^{0}+ \frac{1-\rho }{\rho }\right) . \end{array} \end{aligned} $$

Note that both optimal controls satisfy the constraints \(0<u_{j}(t)<1\), \(j=1,2\).
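As a quick sanity check, the following Python sketch (ours, not part of the source derivation) verifies numerically that the pair \(\left( \alpha \rho ,\left( 1-\rho \right) /\left( 1-\alpha \rho \right) \right) \), together with the steady-state costate \(q=1/\rho \), satisfies the stationarity conditions above for a sample parameter choice.

```python
# Sketch (ours): check the cooperative first-order conditions at the candidate
# solution (u1*, u2*) = (alpha*rho, (1-rho)/(1-alpha*rho)) with q = 1/rho.
import numpy as np

alpha, rho = 0.4, 0.2
q = 1.0 / rho                                  # from q_dot = rho*q - 1 = 0
u1 = alpha * rho
u2 = (1.0 - rho) / (1.0 - alpha * rho)

dH_du1 = alpha / u1 - (1.0 - alpha) / (1.0 - u1) - q * u2
dH_du2 = -(1.0 - alpha) / (1.0 - u2) + q * (1.0 - u1)

print(np.isclose(dH_du1, 0.0), np.isclose(dH_du2, 0.0))   # both True
print(0 < u1 < 1, 0 < u2 < 1)                              # admissibility
```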

Step 2: Compute Nash-Equilibrium Outcomes.

As the game is of the linear-state variety, open-loop and state-feedback Nash equilibria coincide. We therefore derive the OLNE, which is the easier of the two to obtain. To determine this equilibrium, we first write the players’ current-value Hamiltonians:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {{\mathcal{H}}}_{1}\left( {u}_{1},{u}_{2},x,{q}_{1}\right) &\displaystyle =&\displaystyle \alpha \left( \ln {u}_{1}+x\right) +{q}_{1}\left( 1-{u}_{1}\right) {u}_{2}, \\ {{\mathcal{H}}}_{2}\left( {u}_{1},{u}_{2},x,{q}_{2}\right) &\displaystyle =&\displaystyle \left( 1-\alpha \right) \left( \ln \left( 1-{u}_{1}\right) \left( 1-{u}_{2}\right) +x\right) +{q}_{2}\left( 1-{u}_{1}\right) {u}_{2}, \end{array} \end{aligned} $$

where \(q_{j}\) is the costate variable attached by player \(j\) to the state equation (2.85). Necessary conditions for a Nash equilibrium are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{x} &\displaystyle =&\displaystyle \left( 1-{u}_{1}\right) {u}_{2},\quad x\left( 0\right) =x^{0}>0, \\ \dot{{q}}_{1} &\displaystyle =&\displaystyle \rho {q}_{1}-\alpha ,\quad\lim_{t\rightarrow \infty }e^{-\rho t}{q}_{1}(t)=0, \\ \dot{{q}}_{2} &\displaystyle =&\displaystyle \rho {q}_{2}-\left( 1-\alpha \right) ,\quad\lim_{t\rightarrow \infty }e^{-\rho t}{q}_{2}(t)=0, \\ \frac{\partial }{\partial {u}_{1}}{{\mathcal{H}}}_{1} &\displaystyle =&\displaystyle \frac{\alpha }{{u} _{1}}-{q}_{1}{u}_{2}=0, \\ \frac{\partial }{\partial {u}_{2}}{{\mathcal{H}}}_{2} &\displaystyle =&\displaystyle -\frac{\left( 1-\alpha \right) }{\left( 1-{u}_{2}\right) }+{q}_{2}\left( 1-{u}_{1}\right) =0. \end{array} \end{aligned} $$

It is easy to check that the Nash equilibrium is unique and is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left( \bar{u}_{1},\bar{u}_{2}\right) &\displaystyle =&\displaystyle \left( \frac{1-k}{2},\frac{1+k}{2} \right) ,\quad\bar{x}(t)=x^{0}+\left( \frac{1+k}{2}\right)^{2} t, \\ J_{1}\left( \bar{u}_{1}\left( \cdot \right) ,\bar{u}_{2}(\cdot );x^{0}\right) &\displaystyle =&\displaystyle \frac{\alpha }{\rho }\left( \ln \left( \frac{1-k}{2} \right) +x^{0}+\frac{1+k}{2\rho }-1\right) ,\quad\\ J_{2}\left( \bar{u}_{1}\left( \cdot \right) ,\bar{u}_{2}(\cdot );x^{0}\right) &\displaystyle =&\displaystyle \frac{1-\alpha }{\rho }\left( \ln \rho +x^{0}+\frac{1+k}{ 2\rho }-1\right) , \end{array} \end{aligned} $$

where \(k=\sqrt {1-4\rho }\). Note that the equilibrium controls satisfy the constraints \(0<u_{j}(t)<1\), \(j=1,2\), and, as expected in view of the game structure, they are constant over time.
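Analogously, the Nash conditions can be checked numerically. The sketch below (ours) confirms that \(\left( (1-k)/2,(1+k)/2\right) \), together with the steady-state costates \(q_{1}=\alpha /\rho \) and \(q_{2}=\left( 1-\alpha \right) /\rho \), satisfies the first-order conditions, and that the equilibrium controls are the two roots of \(z^{2}-z+\rho =0\) (so that \(\bar{u}_{1}+\bar{u}_{2}=1\) and \(\bar{u}_{1}\bar{u}_{2}=\rho \)).

```python
# Sketch (ours): check the Nash first-order conditions at the candidate
# equilibrium (u1, u2) = ((1-k)/2, (1+k)/2), k = sqrt(1-4*rho).
import numpy as np

alpha, rho = 0.4, 0.2
k = np.sqrt(1.0 - 4.0 * rho)
q1, q2 = alpha / rho, (1.0 - alpha) / rho      # steady-state costates
u1, u2 = (1.0 - k) / 2.0, (1.0 + k) / 2.0

dH1_du1 = alpha / u1 - q1 * u2
dH2_du2 = -(1.0 - alpha) / (1.0 - u2) + q2 * (1.0 - u1)

print(np.isclose(dH1_du1, 0.0), np.isclose(dH2_du2, 0.0))   # both True
print(u1 + u2, u1 * u2)   # 1 and rho: the roots of z**2 - z + rho = 0
```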

Step 3: Construct a Collusive Equilibrium.

We have thus far obtained

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left( w_{1}^{\ast },w_{2}^{\ast }\right) &\displaystyle =&\displaystyle \left( J_{1}\left( {u} _{1}^{\ast }(\cdot ),{u}_{2}^{\ast }(\cdot );x^{0}\right) ,J_{2}\left( {u} _{1}^{\ast }(\cdot ),{u}_{2}^{\ast }(\cdot );x^{0}\right) \right) , \\ \left( \bar{w}_{1},\bar{w}_{2}\right) &\displaystyle =&\displaystyle \left( J_{1}\left( \bar{u}_{1}\left( \cdot \right) ,\bar{u}_{2}(\cdot );x^{0}\right) ,J_{2}\left( \bar{u} _{1}\left( \cdot \right) ,\bar{u}_{2}(\cdot );x^{0}\right) \right) . \end{array} \end{aligned} $$

Computing the differences

$$\displaystyle \begin{aligned} \begin{array}{rcl} w_{1}^{\ast }-\bar{w}_{1} &\displaystyle =&\displaystyle \frac{\alpha }{\rho }\left( \ln \left( \frac{ 2\alpha \rho }{1-k}\right) +\frac{1-k}{2\rho }\right) , \\ w_{2}^{\ast }-\bar{w}_{2} &\displaystyle =&\displaystyle \frac{1-\alpha }{\rho }\left( \ln \left( 1-\alpha \right) +\frac{1-k}{2\rho }\right) , \end{array} \end{aligned} $$

we note that they are independent of the initial state \(x^{0}\) and that their signs depend on the parameter values. For instance, if we have the following restriction on the parameter values:

$$\displaystyle \begin{aligned} \frac{1-k}{2\rho }\exp \left( \frac{k-1 }{2\rho }\right) <\alpha <1-\exp \left( \frac{k-1}{2\rho }\right) , \end{aligned}$$

then \(w_{1}^{\ast }>\bar {w}_{1}\) and \(w_{2}^{\ast }>\bar {w}_{2}\). Suppose that this is the case. It then remains to show that, by combining the cooperative (Pareto-optimal) controls with the state-feedback (equivalent, in this case, to open-loop) Nash strategy pair,

$$\displaystyle \begin{aligned} \left( \gamma _{1}\left( x\right) ,\gamma _{2}\left( x\right) \right) =\left( \bar{u}_{1},\bar{u}_{2}\right) =\left( \frac{1-k}{2},\frac{1+k}{2} \right) , \end{aligned}$$

we can construct a subgame-perfect equilibrium strategy in the sense of Definition 8.
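Before constructing the strategy, it may help to see the payoff dominance numerically. The sketch below (ours; the parameter values are purely illustrative) evaluates the cooperative and Nash payoff formulas obtained in Steps 1 and 2 and checks that the cooperative pair dominates the Nash pair for a parameter choice satisfying the above restriction.

```python
# Sketch (ours): compare cooperative and Nash payoffs from their closed forms
# and report whether the collusive outcome payoff-dominates the Nash outcome.
import numpy as np

def outcomes(alpha, rho, x0):
    k = np.sqrt(1.0 - 4.0 * rho)
    # cooperative payoffs (Step 1)
    w1s = (alpha / rho) * (np.log(alpha * rho) + x0 + (1.0 - rho) / rho)
    w2s = ((1.0 - alpha) / rho) * (np.log((1.0 - alpha) * rho) + x0 + (1.0 - rho) / rho)
    # Nash payoffs (Step 2)
    w1n = (alpha / rho) * (np.log((1.0 - k) / 2.0) + x0 + (1.0 + k) / (2.0 * rho) - 1.0)
    w2n = ((1.0 - alpha) / rho) * (np.log(rho) + x0 + (1.0 + k) / (2.0 * rho) - 1.0)
    return (w1s, w2s), (w1n, w2n)

coop, nash = outcomes(alpha=0.5, rho=0.25, x0=1.0)
print(coop, nash)
print(coop[0] > nash[0] and coop[1] > nash[1])   # True: collusion dominates here
```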

Consider a strategy pair

$$\displaystyle \begin{aligned} \bar{\Delta}_{j}=\left\{ \bar{\Delta}_{j}^{\ast \delta _{n}}:\delta _{n}\rightarrow 0,n=1,2,\ldots \right\} , \end{aligned}$$

where, for \(j=1,2,\,\,\bar {\Delta }_{j}^{\ast \delta }\) is defined as follows:

$$\displaystyle \begin{aligned} \bar{\Delta} _{j}^{\ast \delta }=\left( \Delta _{j,i}^{\ast }\right) _{i=0,1,2,\ldots }, \end{aligned}$$

with

$$\displaystyle \begin{array}{rcl} \Delta_{j,0}^{\ast } &\displaystyle =&\displaystyle {u}_{j,0}^{\ast}\left( \cdot \right) , \\ \Delta _{j,i}^{\ast } &\displaystyle =&\displaystyle \left\{ \begin{array}{cc} {u}_{j,i}^{\ast }\left(\cdot \right) , &\displaystyle \ \text{if } \ \bar{u}\left( s\right) ={u}^{\ast }\left( s\right) \ \text{for almost all } \ s \leq i\delta , \\ \gamma _{j}\left( x\left( i\delta \right) \right) =\bar{u}_{j}, &\displaystyle \ \text{otherwise,} \end{array} \right. \end{array}$$

for i = 1, 2, …, where \({u}_{j,i}^{\ast }\left ( \cdot \right ) \) denotes the restriction of \({u}_{j}^{\ast }(\cdot )\) to the subinterval \( \left [ i\delta ,\left ( i+1\right ) \delta \right ] ,i=0,1,2,\ldots \), and \( x\left ( i\delta \right ) \) denotes the state observed at time \(t=i\delta \).

The strategy just defined is known as a trigger strategy . A statement of the trigger strategy, as it would be made by a player, is “At time t, I implement my part of the optimal solution if the other player has never cheated up to now. If she cheats at t, then I will retaliate by playing the state-feedback Nash strategy from t onward.” It is easy to show that this trigger strategy constitutes a subgame-perfect equilibrium.
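To make the mechanics concrete, the following Python sketch (ours; the grid step, horizon, and deviation pattern are purely illustrative) implements a discretized version of the trigger strategy: each player keeps playing her part of the cooperative control as long as the observed control history agrees with the cooperative path, and reverts to the Nash control forever after the first detected deviation.

```python
# Sketch (ours): trigger strategy on a grid of step delta. Cooperative controls
# are played until a deviation is observed; thereafter the Nash controls are used.
import numpy as np

alpha, rho, delta, T = 0.5, 0.25, 0.1, 5.0
k = np.sqrt(1.0 - 4.0 * rho)
u_coop = np.array([alpha * rho, (1.0 - rho) / (1.0 - alpha * rho)])   # (u1*, u2*)
u_nash = np.array([(1.0 - k) / 2.0, (1.0 + k) / 2.0])                 # (u1_bar, u2_bar)

def trigger(history):
    """Control pair prescribed for the next interval, given past control pairs."""
    deviated = any(not np.allclose(u, u_coop) for u in history)
    return u_nash if deviated else u_coop

# Example run: player 2 deviates on the third interval; both then revert to Nash.
history = []
for i in range(int(T / delta)):
    u = trigger(history).copy()
    if i == 2:
        u[1] = 0.9           # player 2's unilateral deviation
    history.append(u)

print(history[:5])           # cooperative, cooperative, deviation, Nash, Nash
```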

Remark 16

It is possible in this differential-game setting to define a retaliation period of finite length, following a deviation. Actually, the duration of this period can be designed to discourage any player from defecting. Also, in the above development and example, we assumed that a deviation is instantaneously detected. This may not necessarily be the case, and in such situations we can consider a detection lag. For an example of a trigger strategy with a finite retaliation period and detection lag, see Hämäläinen et al. (1984).

6 Conclusion

This chapter has provided an overview of the theory of nonzero-sum differential games formulated in continuous time and without any stochastic elements. Only noncooperative aspects of the theory have been covered, primarily under two different solution concepts, Nash equilibrium and Stackelberg equilibrium, and several of their variants. The importance of information structures in such dynamic games has been emphasized, with special focus on open-loop and state-feedback information structures. The additional degrees of freedom that memory strategies bring in inducing specific behavior on the part of the players have also been discussed, and several special structures of differential games, such as linear-quadratic (or affine-quadratic) games, symmetric games, and zero-sum differential games, have also been covered, with some illustrative examples. The chapter has also emphasized the important role that strategic equivalence plays in the solvability of some classes of differential games.

There are several other issues very relevant to the topic and material of this chapter, which are covered by selected other chapters in the Handbook. These include dynamic games described in discrete time, concave differential games with coupled state constraints defined over an infinite horizon, dynamic games with an infinite number of players (more precisely, mean-field games), zero-sum differential games (with more in-depth analysis than the coverage in this chapter), games with stochastic elements (more precisely, stochastic games), mechanism designs, and computational methods, to list just a few.