2.1 Introduction

Most noncooperative differential games lack Pareto efficiency. That is, all agents can increase their individual payoffs if they agree to coordinate controls. However, in order to attain the socially optimal outcome, at least two conditions must be fulfilled: (1) the agents form the grand coalition to derive the efficient controls, and (2) payoffs must be transferable and distributed in such a way that every agent benefits from cooperation.

Here we study a mechanism which implements the Pareto efficient outcome as a bargaining solution. The crucial difference from the classic cooperative approach is that the agents do not mutually agree to maximize overall payoffs and distribute them appropriately, but instead bargain over the controls. In order to support the resulting controls as an equilibrium we fix grim trigger strategies. If an agent defects from the agreement, all agents switch to their noncooperative Nash equilibrium strategies [7].

Sorger [6] proposed the recursive Nash [4] bargaining solution for difference games. We introduce a continuous-time analogue and apply it to the differential game of public good provision of Fershtman and Nitzan [2]. For the noncooperative equilibrium they showed that the public good is underprovided relative to the efficient solution. The result, however, crucially depends on the linearity of the Markovian strategies. This simplification makes the game analytically tractable and yields a unique steady state.

This note contributes to the literature on cooperative agreements in noncooperative differential games. It is well known that grim trigger strategies can support a set of control paths as equilibria if they payoff-dominate the noncooperative Nash equilibrium. The Nash bargaining solution can then be used as an equilibrium selection device. Since bargaining problems are defined in the payoff space, we need to construct a value under agreement. In games with transferable payoffs one can simply fix the efficient value of the grand coalition and define an imputation. Here, however, we do not assume that the grand coalition forms and jointly maximizes payoffs. Instead, we define the agreement value in terms of a stationary Hamilton-Jacobi-Bellman equation (HJBe), provided the agents stick to the agreement strategies over the entire time interval. The agreement strategies are then determined by the Nash bargaining solution.

The remainder of the paper is organized as follows: Sect. 2.2 presents the problem, Sect. 2.3 the solution concepts, and Sect. 2.4 concludes.

2.2 Problem Statement

The model is essentially the most rudimentary version of Fershtman and Nitzan [2]. Let \(x(t) \in X := [0, \frac{1}{2}]\) denote the stock of a pure public good at time \(t \in \mathbb{R}_+\). We may think of x as the total contribution to some joint project carried out by n agents. Each agent i ∈ N := {1, 2, …, n} can partially control the evolution of the state according to the state equation

$$\displaystyle \begin{aligned} &\dot x(t) = f(x(t), u(t)) = \sum_{i \in N}{u_i(t)} - \delta x(t) {} \end{aligned} $$
(2.1)
$$\displaystyle \begin{aligned} &x_0 := x(0) \in X \end{aligned} $$
(2.2)

where \(u(t) := (u_i(t))_{i \in N} \in \times_{i \in N} U_i =: U \subset \mathbb{R}^n\) denotes the investment (control) vector and δ ∈ (0, 1] is the depreciation rate. In the context of the joint project, \(u_i(t)\) then denotes the contribution rate of agent i ∈ N. We consider quadratic payoffs of the form

$$\displaystyle \begin{aligned} F_i(x(t), u_i(t)) = x(t)(1-x(t)) - \frac{1}{2}u_i(t)^2 \end{aligned} $$
(2.3)

such that the game is linear quadratic and thus possesses a closed form solution [1, Ch. 7.1]. Note that the instantaneous payoff function is monotonically increasing in the state, i.e., \(\frac{\partial F_i(x,u_i)}{\partial x} > 0\) for all x ∈ X. The state is thus a pure public good and each agent benefits from its provision. With costly investment, however, there exists a trade-off between increasing the stock and minimizing costs. This trade-off defines a public good game: each agent wants the others to invest, so that he can free ride on their effort. This behavior results in an inefficiently low overall investment level. The objective functional of each agent i ∈ N is then given by the stream of discounted payoffs

$$\displaystyle \begin{aligned} J_i(u(s),t) := \int^\infty_t{e^{-r(s-t)}F_i(x(s), u_i(s))ds} \end{aligned} $$
(2.4)

where r > 0 denotes the time preference rate.
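
To fix ideas numerically, the following minimal Python sketch simulates the state equation (2.1) and approximates the discounted payoff stream (2.4) under a symmetric Markovian strategy by Euler discretization. All parameter values, the horizon, and the step size are illustrative assumptions, not values from the paper; the state constraint X = [0, 1/2] is not enforced.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper)
n, r, delta = 3, 0.05, 0.2   # agents, discount rate, depreciation rate
dt, T = 0.01, 200.0          # Euler step size and truncation horizon

def f(x, u):
    """State equation (2.1): sum of investments minus depreciation."""
    return np.sum(u) - delta * x

def F(x, u_i):
    """Instantaneous payoff (2.3) of a single agent."""
    return x * (1.0 - x) - 0.5 * u_i**2

def simulate(strategy, x0=0.0):
    """Euler-discretized state path and discounted payoff (2.4) when every
    agent plays the symmetric Markovian strategy u_i = strategy(x)."""
    x, J = x0, 0.0
    for k in range(int(T / dt)):
        u = np.full(n, strategy(x))
        J += np.exp(-r * k * dt) * F(x, u[0]) * dt
        x += f(x, u) * dt
    return x, J
```

For instance, `simulate(lambda x: 0.1)` returns the approximate long-run state and an agent's discounted payoff under a constant contribution rate of 0.1.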

2.3 Solution Concepts

In what follows we consider a stationary setup and hence frequently suppress the time argument t. First we derive the efficient collusive solution of joint payoff maximization; the efficient value is an upper bound on the agreement value. We then derive the noncooperative Nash equilibrium, which serves as the disagreement value for the bargaining solution; the noncooperative equilibrium value is a lower bound on the agreement value. Any cooperative agreement lies in the set of strategies which support payoffs between the noncooperative Nash value and the efficient value. The noncooperative equilibrium strategies also serve as threats against deviations from the agreed upon bargaining solution.

2.3.1 Collusive Solution

Assume all agents agree to cooperate and jointly maximize overall payoffs. The value function for the efficient solution then reads

$$\displaystyle \begin{aligned} C(x(t)) := \max_{u(s) \in U}\sum_{i \in N}{J_i(u(s),t)} . \end{aligned} $$
(2.5)

The optimal controls must satisfy the stationary HJBe

$$\displaystyle \begin{aligned} rC(x) = \max_{u \in U}\left\{\sum_{i \in N}{F_i(x, u_i)} + C'(x)f(x, u)\right\}. {} \end{aligned} $$
(2.6)

The maximizers of the right hand side of (2.6) are \(u_i = C'(x)\) for all i ∈ N. Substituting the maximizers into the HJBe yields

$$\displaystyle \begin{aligned} rC(x) = nx(1-x) + \frac{n}{2}C'(x)^2 - C'(x)\delta x. {} \end{aligned} $$
(2.7)

Theorem 2.1

If we consider symmetric stationary linear strategies of the form \(\hat u_i = \alpha x + \beta \) for all i ∈ N, where α and β are constants, then there exists a unique quadratic solution to (2.7)

$$\displaystyle \begin{aligned} C(x) = \frac{\alpha}{2} x^2 + \beta x + \gamma {} \end{aligned} $$
(2.8)

with

$$\displaystyle \begin{aligned} \alpha &:= \frac{1}{2n}\left(r + 2\delta - \sqrt{(r + 2\delta)^2 + 8n^2}\right) {}, \end{aligned} $$
(2.9)
$$\displaystyle \begin{aligned} \beta &:= \frac{n}{\delta - n\alpha + r} {}, \end{aligned} $$
(2.10)
$$\displaystyle \begin{aligned} \gamma &:= \frac{n}{2r}\beta^2 {}. \end{aligned} $$
(2.11)

Proof

Substitute the guess (2.8) and thus C′(x) = αx + β into (2.7)

$$\displaystyle \begin{aligned} r\left(\frac{\alpha}{2} x^2 + \beta x + \gamma\right) = nx(1-x) + \frac{n}{2}(\alpha x + \beta)^2 - (\alpha x + \beta)\delta x. {} \end{aligned} $$
(2.12)

This optimality condition must hold at any x ∈ X. Evaluate (2.12) at x = 0, which yields γ

$$\displaystyle \begin{aligned} r\gamma = \frac{n}{2}\beta^2 \quad\Longleftrightarrow \quad\gamma = \frac{n}{2r}\beta^2. \end{aligned} $$
(2.13)

Taking the derivative of (2.12) with respect to x gives

$$\displaystyle \begin{aligned} r(\alpha x + \beta) = n(1 - 2x) + \alpha n(\alpha x + \beta) - \delta(2\alpha x + \beta). {} \end{aligned} $$
(2.14)

Again, at x = 0 we have

$$\displaystyle \begin{aligned} r\beta = n + \alpha n\beta - \delta\beta \quad\Longleftrightarrow \quad\beta = \frac{n}{\delta - \alpha n + r}. \end{aligned} $$
(2.15)

Resubstituting β in (2.14) and solving for α yields

$$\displaystyle \begin{aligned} \alpha = \frac{1}{2n}\left(r + 2\delta \pm \sqrt{(r + 2\delta)^2 + 8n^2}\right). {} \end{aligned} $$
(2.16)

Note that the state dynamics become \(\dot x(t) = (n\alpha - \delta )x(t) + n\beta \). There exists a unique and globally asymptotically stable steady state at \(x^\infty = -n\beta/(n\alpha - \delta)\) if \(n\alpha - \delta < 0\) holds, which is ensured for the negative root of (2.16).
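
The proof can be cross-checked numerically. The following sketch (with the same illustrative parameter values as above) verifies that the coefficients (2.9)–(2.11) make the HJBe (2.7) hold identically in x and computes the steady state:

```python
import numpy as np

n, r, delta = 3, 0.05, 0.2   # illustrative values

# Coefficients (2.9)-(2.11), taking the negative root of (2.16)
alpha = (r + 2*delta - np.sqrt((r + 2*delta)**2 + 8*n**2)) / (2*n)
beta  = n / (delta - n*alpha + r)
gamma = n * beta**2 / (2*r)

C  = lambda x: 0.5*alpha*x**2 + beta*x + gamma   # value function (2.8)
Cp = lambda x: alpha*x + beta                    # C'(x), the efficient control

# The HJBe (2.7) must hold at every x in X; the residual should vanish.
for x in np.linspace(0.0, 0.5, 6):
    residual = r*C(x) - (n*x*(1 - x) + 0.5*n*Cp(x)**2 - Cp(x)*delta*x)
    assert abs(residual) < 1e-10

# Unique steady state of x_dot = (n*alpha - delta)*x + n*beta
print(-n*beta / (n*alpha - delta))   # approx. 0.4986 for these parameters
```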

2.3.2 Noncooperative Equilibrium

The collusive solution rests on two restrictive assumptions: the grand coalition must form, and payoffs must be transferable in order to split the total payoff. Let us assume that the collusive solution is not feasible. In this case we consider a noncooperative differential game in which each agent maximizes his individual payoff. The noncooperative Markovian strategies are denoted by \(\phi_i : X \to U_i\) and satisfy

$$\displaystyle \begin{aligned} \phi_i(x(s)) \in \arg\max_{u_i(s) \in U_i}J_i(u_i(s), \phi_{-i}(x(s)),t) \end{aligned} $$
(2.17)

where \(\phi_{-i} := (\phi_j)_{j \in N \setminus \{i\}}\). A noncooperative Nash equilibrium is then defined as follows.

Definition 2.1

The strategy tuple \(\phi(x(s)) := (\phi_i(x(s)))_{i \in N} \in U\) is a noncooperative Nash equilibrium if the following holds

$$\displaystyle \begin{aligned} J_i(\phi(x(s)), t) \geq J_i(u_i(s), \phi_{-i}(x(s)), t) \quad\forall u_i(s) \in U_i, ~\forall i \in N. \end{aligned} $$
(2.18)

Denote by

$$\displaystyle \begin{aligned} D_i(x(t)) := J_i(\phi(x(s)), t) {} \end{aligned} $$
(2.19)

the noncooperative disagreement value.

Theorem 2.2

If we consider symmetric stationary linear strategies of the form \(\phi_i(x) = \omega x + \lambda\) for all i ∈ N, where ω and λ are constants, then there exists a unique quadratic solution to (2.19)

$$\displaystyle \begin{aligned} D_i(x) = \frac{\omega}{2}x^2 + \lambda x + \mu \end{aligned} $$
(2.20)

with

$$\displaystyle \begin{aligned} \omega & := \frac{1}{2(2n-1)}\left(r + 2\delta - \sqrt{(r + 2\delta)^2 +8(2n-1)}\right), \end{aligned} $$
(2.21)
$$\displaystyle \begin{aligned} \lambda & := \frac{1}{\delta - (2n-1)\omega + r}, \end{aligned} $$
(2.22)
$$\displaystyle \begin{aligned} \mu & := \frac{2n-1}{2r}\lambda^2. \end{aligned} $$
(2.23)

Proof

The proof follows the same steps as the proof of Theorem 2.1.
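
For completeness (this display is a standard step, not part of the original text), the stationary HJBe behind (2.19) reads for each i ∈ N

$$\displaystyle \begin{aligned} rD_i(x) = \max_{u_i \in U_i}\left\{F_i(x, u_i) + D_i'(x)\Big(u_i + \sum_{j \in N \setminus \{i\}}{\phi_j(x)} - \delta x\Big)\right\} \end{aligned} $$

with maximizer \(u_i = D_i'(x)\). Substituting \(\phi_j(x) = \omega x + \lambda\) for the opponents and the quadratic guess (2.20), and comparing coefficients at x = 0 as in the proof of Theorem 2.1, yields (2.21)–(2.23).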

The noncooperative equilibrium, however, is generally not efficient. It can be shown that the collusive solution yields a cooperation dividend such that the value under cooperation always exceeds the sum of the noncooperative values, i.e., \(C(x) > \sum_{i \in N} D_i(x)\) for all x ∈ X. The investment levels, and thus the provision of the public good, are inefficiently low. This result is standard in public good games and is due to free riding. It is reasonable to assume that the agents do not want to stick to the fully noncooperative equilibrium, but rather increase overall efficiency by exploiting the cooperation dividend.
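
The cooperation dividend is easy to illustrate numerically. The sketch below (illustrative parameters again, negative roots as in the theorems) evaluates both value functions on X and confirms the dominance of the collusive value:

```python
import numpy as np

n, r, delta = 3, 0.05, 0.2   # illustrative values
k = r + 2*delta

# Collusive coefficients (2.9)-(2.11) and value (2.8)
alpha = (k - np.sqrt(k**2 + 8*n**2)) / (2*n)
beta  = n / (delta - n*alpha + r)
C = lambda x: 0.5*alpha*x**2 + beta*x + n*beta**2/(2*r)

# Noncooperative coefficients (2.21)-(2.23) and value (2.20)
m = 2*n - 1
omega = (k - np.sqrt(k**2 + 8*m)) / (2*m)
lam   = 1 / (delta - m*omega + r)
D = lambda x: 0.5*omega*x**2 + lam*x + m*lam**2/(2*r)

# Cooperation dividend C(x) - n*D_i(x) > 0 on X = [0, 1/2]
for x in np.linspace(0.0, 0.5, 6):
    assert C(x) > n*D(x)
print(C(0.0) - n*D(0.0), C(0.5) - n*D(0.5))  # approx. 0.092 and 0.026
```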

2.3.3 Bargaining Solution

It was shown by Tolwinski et al. [7] that any control path \(\tilde u_i^t := (\tilde u_i(s))_{s \geq t}\), i ∈ N, can be supported as an equilibrium if the control profiles are agreeable and defection from the agreement is punished. Let \(\sigma_i : X \to U_i\) denote a Markovian strategy that generates \(\tilde u_i\). Suppose the agents agree on some strategy profile \(\sigma(x) := (\sigma_i(x))_{i \in N}\) at \( \underline t < 0\), before the game starts. If the agents stick to the agreement from t onwards, the agreement value is defined as

$$\displaystyle \begin{aligned} A_i(x(t)) = J_i(\sigma(x(s)),t). {} \end{aligned} $$
(2.24)

Definition 2.2

A strategy tuple σ(x) is agreeable at \( \underline t\) if

$$\displaystyle \begin{aligned} A_i(x(t)) \geq D_i(x(t)) \quad\forall t , ~ \forall x(t), ~ \forall i \end{aligned} $$
(2.25)

such that every agent benefits from the agreement in comparison to the noncooperative equilibrium.

If this inequality did not hold, there would exist an agent who would rather switch to the noncooperative equilibrium, because it payoff-dominates the agreement. The condition, also referred to as dynamic individual rationality, is necessary but not sufficient for the dynamic stability of an agreement: an agent might still deviate from the agreement if he benefits from doing so.

Now we construct the history-dependent non-Markovian grim trigger strategies \(\tau_i : [0, \infty) \to U_i\) that support σ<sub>i</sub>(x) as an equilibrium. Given some agreement strategy profile σ(x), the agents can solve the differential equation (2.1) for the agreement trajectory of the state

$$\displaystyle \begin{aligned} x^a(t) := x_0 + \int_0^t{f(x^a(s), \sigma(x^a(s)))ds}. \end{aligned} $$
(2.26)

Suppose the agents perfectly observe the state and can recall its history \((x(s))_{s \in [0,t]}\). If they observe that an agent deviated at t, they can impose punishment with delay 𝜖, i.e., from t + 𝜖 onwards. The grim trigger strategies then read

$$\displaystyle \begin{aligned} \tau(s) = \begin{cases} \sigma(x(s)) \quad\text{for } s \in [t, t + \epsilon] \quad&\text{ if } ~ x(l) = x^a(l) ~ \forall l \in [0, t],\\ \phi(x(s)) \quad\text{for } s \in[t + \epsilon, \infty) &\text{ if } ~ x(t) \neq x^a(t). \end{cases} \end{aligned} $$
(2.27)
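
The trigger logic in (2.27) can be rendered schematically in code. The following fragment is a non-authoritative sketch; the function names and the detection tolerance are illustrative assumptions:

```python
def make_trigger(sigma, phi, x_agreement, epsilon, tol=1e-9):
    """Grim trigger rule in the spirit of (2.27): play the agreement strategy
    sigma while the observed state tracks the agreement trajectory x^a, and
    switch to the Nash strategies phi forever once a deviation detected at
    some time s has aged beyond the punishment delay epsilon."""
    detected = {"at": None}   # time of first detected deviation, if any

    def tau(s, x_observed):
        if detected["at"] is None and abs(x_observed - x_agreement(s)) > tol:
            detected["at"] = s                     # deviation detected at s
        if detected["at"] is not None and s >= detected["at"] + epsilon:
            return phi(x_observed)                 # punishment phase
        return sigma(x_observed)                   # cooperative phase
    return tau
```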

In words, if the agents observe that another player deviated from the agreement at t, they implement their noncooperative equilibrium strategies from t + 𝜖 onwards. Let d ∈ N denote a potential defector who deviates from σ(x) at t. In the interval s ∈ [t, t + 𝜖] he maximizes his payoff against the agreement strategies of his opponents; from t + 𝜖 onwards he receives the discounted disagreement payoff. Let \(V_d(x(t); \epsilon)\) denote the value of the defector, defined as

$$\displaystyle \begin{aligned} \begin{aligned} & V_d(x(t); \epsilon) := \max_{(u_d(s))_{s \in [t, t + \epsilon]}}\int_t^{t + \epsilon}{e^{-r(s-t)}F_d(x(s), u_d(s))ds}\\ & \hspace{2cm} + e^{-r\epsilon}D_d(x(t + \epsilon)) {}\\ \text{s.t.} \quad& \dot x(s) = f(x(s), u_d(s), \sigma_{-d}(x(s))) \quad(s \in [t, t + \epsilon]). \end{aligned} \end{aligned} $$
(2.28)

The threat is effective if

$$\displaystyle \begin{aligned} A_i(x(t)) \geq V_i(x(t); \epsilon) \quad\forall x(t), ~ \forall i {} \end{aligned} $$
(2.29)

holds and every agent prefers the agreement to defecting from it. Now we can always fix an \(\epsilon \in (0, \overline \epsilon ]\) such that (2.29) holds. Suppose punishment can be implemented instantly, i.e., 𝜖 = 0. Equation (2.29) then becomes

$$\displaystyle \begin{aligned} A_i(x(t)) \geq V_i(x(t); 0) = D_i(x(t)) \end{aligned} $$
(2.30)

which is true by the definition of individually rational agreements. Let \(\overline \epsilon \) denote the threshold such that (2.29) holds with equality

$$\displaystyle \begin{aligned} A_i(x(t)) = V_i(x(t); \overline \epsilon). \end{aligned} $$
(2.31)

Then the threat is effective for all \(\epsilon \in (0, \overline \epsilon ]\). The threat is also credible, because after defection occurs all agents switch to their noncooperative equilibrium strategies and thus have no unilateral incentive to deviate from the punishment by the definition of an equilibrium. The grim trigger strategies and a sufficiently small punishment delay guarantee that the agents stick to the initial agreement over the entire time horizon.
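
How the defector's value (2.28) varies with the punishment delay can be illustrated numerically. The sketch below is a rough, non-authoritative illustration: it restricts the defector to constant controls on the deviation interval (so it only produces a lower bound on V_d, i.e., a necessary check of (2.29)) and assumes, anticipating Theorem 2.3, that the agents agreed on the efficient strategies, so that A_i = C∕n. The parameters, control bounds, and discretization are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, r, delta = 3, 0.05, 0.2                        # illustrative values
k = r + 2*delta
alpha = (k - np.sqrt(k**2 + 8*n**2)) / (2*n)      # (2.9)
beta  = n / (delta - n*alpha + r)                 # (2.10)
m = 2*n - 1
omega = (k - np.sqrt(k**2 + 8*m)) / (2*m)         # (2.21)
lam   = 1 / (delta - m*omega + r)                 # (2.22)

A = lambda x: (0.5*alpha*x**2 + beta*x + n*beta**2/(2*r)) / n  # assumes A_i = C/n
D = lambda x: 0.5*omega*x**2 + lam*x + m*lam**2/(2*r)          # (2.20)

def V_d_lower(x0, eps, steps=400):
    """Lower bound on (2.28): best constant deviation on [0, eps] against the
    agreement strategies sigma_{-d}(x) = alpha*x + beta, then D_d forever."""
    dt = eps / steps
    def neg_payoff(u_d):
        x, J = x0, 0.0
        for j in range(steps):
            J += np.exp(-r*j*dt) * (x*(1 - x) - 0.5*u_d**2) * dt
            x += (u_d + (n - 1)*(alpha*x + beta) - delta*x) * dt
        return -(J + np.exp(-r*eps) * D(x))
    return -minimize_scalar(neg_payoff, bounds=(0.0, 2.0), method="bounded").fun

x0 = 0.25
for eps in (0.5, 1.0, 2.0, 4.0):
    print(eps, A(x0) - V_d_lower(x0, eps))   # positive: threat passes the check
```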

Differentiating (2.24) w.r.t. time yields a representation of the agreement value in terms of the stationary HJBe

$$\displaystyle \begin{aligned} &A^{\prime}_i(x(t))\dot x(t) = -F_i(x(t),\sigma_i(x(t))) + r\int_t^{\infty}{e^{-r(s-t)}F_i(x(s),\sigma_i(x(s)))ds} \end{aligned} $$
(2.32)
$$\displaystyle \begin{aligned} \Longleftrightarrow \quad& rA_i(x) = F_i(x,\sigma_i(x)) + A^{\prime}_i(x)f(x, \sigma(x)) {} \end{aligned} $$
(2.33)

This gives us a stationary definition of the agreement value. Next we want to determine a particular strategy profile σ(x) by the Nash bargaining solution. Define the excess demand function as

$$\displaystyle \begin{aligned} E_i(x, \sigma(x)) := \frac{1}{r}[F_i(x,\sigma_i(x)) + A^{\prime}_i(x)f(x, \sigma(x))] - D_i(x). \end{aligned} $$
(2.34)

By (2.33), the term in brackets equals \(rA_i(x)\), such that \(E_i(x, \sigma(x)) = A_i(x) - D_i(x)\). That is, each agent claims an amount which exceeds his disagreement value. Since each agent will only agree on a bargaining strategy if it gives him at least his disagreement value, we must restrict the control set. The set of individually rational strategies is then defined as

$$\displaystyle \begin{aligned} \varOmega(x) := \{\sigma(x) \in U \mid E_i(x, \sigma(x)) \geq 0 \quad\forall i \in N\}. \end{aligned} $$
(2.35)

Note that these are all stationary representations. That is, the actual time instant t is not important, only the state x(t). Since the relations hold for all \(t \in \mathbb{R}_+\), we suppress the time argument. We are now in a position to state our main result and show how to solve for the bargaining strategy σ(x).

Theorem 2.3

For the fully symmetric case, the agreement strategies that maximize the Nash bargaining product

$$\displaystyle \begin{aligned} \sigma^N(x) \in \arg\max_{\sigma(x) \in \varOmega(x)} \prod_{i \in N}{E_i(x, \sigma(x))} {} \end{aligned} $$
(2.36)

yield the Pareto optimal controls.

Proof

The first order condition of (2.36) with respect to σ<sub>j</sub>(x), for j ∈ N, is given by

$$\displaystyle \begin{aligned} \begin{aligned} & 0 = \frac{\partial \prod_{i \in N}{E_i(x, \sigma(x))}}{\partial \sigma_j(x)}\\ \Longleftrightarrow \quad& 0 = \frac{1}{r}\sum_{i \in N}{\left[\frac{\partial E_i(x, \sigma(x))}{\partial \sigma_j(x)} \prod_{k \in N \setminus \{i\}}{E_k(x, \sigma(x))}\right]}\\ \Longleftrightarrow \quad& 0 = \frac{\partial E_j(x, \sigma(x))}{\partial \sigma_j(x)}\prod_{k \in N \setminus \{j\}}{E_k(x, \sigma(x))} \\ &+\sum_{i \in N \setminus \{j\}}{\left[\frac{\partial E_i(x, \sigma(x))}{\partial \sigma_j(x)} \prod_{k \in N \setminus \{i\}}{E_k(x, \sigma(x))}\right]}\\ \Longleftrightarrow \quad& 0 =(-\sigma_j(x) + A^{\prime}_j(x))\prod_{k \in N \setminus \{j\}}{E_k(x, \sigma(x))} \\ &+ \sum_{i \in N \setminus \{j\}}{\left[A^{\prime}_i(x)\prod_{k \in N \setminus \{i\}}{E_k(x, \sigma(x))}\right]}. \end{aligned} \end{aligned} $$
(2.37)

Under symmetry, we must have \(E_i(\cdot ) =: \overline E(\cdot )\), \(A^{\prime }_i(\cdot ) =: \overline A'(\cdot )\) and \(\sigma _i(\cdot ) =: \overline \sigma (\cdot )\) for all i ∈ N. The first order condition then becomes

$$\displaystyle \begin{aligned} (-\overline \sigma(x) + n\overline{A}'(x))\overline E(x, \overline \sigma(x))^{n-1} = 0 \quad\Longleftrightarrow \quad\overline \sigma(x) = n\overline A'(x). {} \end{aligned} $$
(2.38)

Since \(\overline E(\cdot ) = 0 \Leftrightarrow \overline A(\cdot ) = \overline D(\cdot )\) implies that all agents stick to the disagreement strategies, we can neglect this case here. Now substitute the maximizer \(\overline \sigma (x) = n\overline A'(x)\) into (2.33), which gives

$$\displaystyle \begin{aligned} \begin{aligned} r\overline A(x) &= x(1-x) - \frac{1}{2}\overline \sigma(x)^2 + \frac{\overline \sigma(x)}{n}(n\overline \sigma(x) - \delta x)\\ &= x(1-x) + \frac{1}{2}\overline \sigma(x)^2 - \frac{\delta}{n}\overline \sigma(x)x. \end{aligned} \end{aligned} $$
(2.39)

Take the derivative with respect to x

$$\displaystyle \begin{aligned} r\overline A'(x) \stackrel{(2.38)}{=} \frac{r}{n}\overline \sigma(x) = 1 - 2x + \overline \sigma(x) \overline \sigma'(x) - \frac{\delta}{n}(\overline \sigma'(x)x + \overline \sigma(x)). {} \end{aligned} $$
(2.40)

We claim that the agreement strategies coincide with the efficient solution and are thus given by \(\overline \sigma (x) = \alpha x + \beta \) with \(\overline \sigma '(x) = \alpha \). Equation (2.40) then becomes

$$\displaystyle \begin{aligned} \frac{r}{n}(\alpha x + \beta) = 1 - 2x + (\alpha x + \beta)\alpha - \frac{\delta}{n}(2\alpha x + \beta). {} \end{aligned} $$
(2.41)

This relation must hold at any x ∈ X. At x = 0, the equation simplifies to

$$\displaystyle \begin{aligned} \frac{r}{n}\beta = 1 + \beta\alpha - \frac{\delta}{n}\beta \quad\Longleftrightarrow \quad\beta = \frac{n}{\delta - n\alpha + r}, {} \end{aligned} $$
(2.42)

which is identical to (2.10). Now substitute β into (2.41) and solve for α, which then coincides with (2.9). Since the controls, and thus the dynamics, are identical under the collusive and the bargaining solution, the values must be identical as well.
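
Theorem 2.3 can also be checked numerically. With \(\overline \sigma (x) = \alpha x + \beta \) and \(\overline A = C/n\), both the first order condition (2.38) and the agreement HJBe (2.39) hold identically on X; the sketch below uses the same illustrative parameters as before:

```python
import numpy as np

n, r, delta = 3, 0.05, 0.2                        # illustrative values
k = r + 2*delta
alpha = (k - np.sqrt(k**2 + 8*n**2)) / (2*n)      # (2.9)
beta  = n / (delta - n*alpha + r)                 # (2.10)

A  = lambda x: (0.5*alpha*x**2 + beta*x + n*beta**2/(2*r)) / n  # A = C/n
Ap = lambda x: (alpha*x + beta) / n                             # A'(x)
sigma = lambda x: alpha*x + beta                                # bargaining strategy

for x in np.linspace(0.0, 0.5, 6):
    assert abs(sigma(x) - n*Ap(x)) < 1e-12        # FOC (2.38)
    residual = r*A(x) - (x*(1 - x) + 0.5*sigma(x)**2 - (delta/n)*sigma(x)*x)
    assert abs(residual) < 1e-10                  # agreement HJBe (2.39)
print("Theorem 2.3 verified at the sampled states")
```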

2.4 Conclusion

We studied the recursive Nash bargaining solution for symmetric differential games. By means of an analytically tractable example it was shown that the bargaining solution yields the Pareto efficient outcome of full cooperation. In an accompanying paper the author plans to investigate asymmetric games and compare different solution concepts (e.g. the Kalai-Smorodinsky and the Egalitarian solution). Especially for the case of asymmetric discounting the recursive bargaining solution can be useful, because efficient controls are then not derivable in the standard way by joint payoff maximization.