1 Introduction

An important issue in cooperative games is whether coordinated outcomes can be sustained (enforced) over time, and this is the central question of this paper. We shall show that the sustainability over time of an agreement reached at the starting date of the game can indeed be achieved. As pointed out by Haurie [1], the sharing rule initially agreed upon may become individually irrational in the course of the game, implying that the agreement would be broken at an intermediate instant of time. In other words, the agreed solution fails individual rationality if one player stands to receive a lower payoff in the coordinated solution than he would get in a noncooperative solution. If this is the case, the player will find it optimal to deviate: he has an incentive to cheat on the agreement, that is, to choose a different course of action than that prescribed by the agreement.

Two different approaches have been proposed in the literature to ensure the sustainability over time of coordinated outcomes. The first proposes the design of a cooperative solution or an agreement which is time-consistent or agreeable. In this situation, the coordinated payoffs-to-go are greater than the noncooperative ones, along the cooperative state trajectory (time-consistency) or along any state trajectory (agreeability). See, for example, [2–6] for the analysis of time-consistent agreements, and [7–10] for agreeable agreements.

The second approach proposes to design a self-enforceable agreement which is in effect an equilibrium. In this case, each player will find it individually rational to stick to his part of the agreed solution. The problem is already solved when the efficient solution is in itself an equilibrium, as in [11–13]. Usually this coincidence is absent, and two different options have been proposed in the literature to endow the cooperative solution with an equilibrium property.

The first is to use trigger strategies (see, for example, [14–16]). In the 1980s, the authors of [14, 15], among others, showed how the use of control-dependent memory strategies permits the inclusion of a threat in a cooperative strategy, which leads to a class of acceptable equilibria in dynamic games. These strategies are based on past actions, and they include a threat to punish, credibly and effectively, any player who cheats on the agreement. Within differential game theory, this approach presents technical problems, because trigger strategies are in general discontinuous and require memory strategies for the players. These strategies are based on all past information on the game's evolution up to the current time and, as a consequence, they are non-Markovian.

The second option to implement cooperative solutions by means of noncooperative play is through incentive strategies, establishing the efficient solution as an incentive equilibrium. The key to this result is to design the incentive in such a way that a coordinated outcome becomes a Nash equilibrium. These strategies were introduced into the two-player dynamic games literature by Ehtamo and Hämäläinen in a series of papers [17–20]. The idea is to find strategies for both players such that when one player implements, or is believed to implement, his strategy, no temptation exists for the other player to cheat or to break the agreement in the course of the game. Incentive strategies are functions which depend on the possible deviation of the other player with respect to the agreed solution. These strategies recommend that each player implement his part of the coordinated solution whenever the other player is doing so. These strategies are relatively easy to construct, but the difficulty of their use arises when studying their credibility. The equilibrium strategies are credible when it is more beneficial for each player to follow his equilibrium strategy rather than his cooperative action, in case the opponent deviates from his cooperative action. This credibility property establishes that there will be no temptation for unilateral deviation from the agreed decision. The credibility of incentive strategies is the topic of this paper.

The starting point of this paper is the main result in [21, 22], where the credibility of the incentive equilibrium strategies is characterized for the class of linear-state and linear-quadratic differential games. This result can be summarized as follows: linear incentive strategies are credible, i.e. the players do believe that the announced strategies will be followed, when deviations are not too large.

This result inspires the main research questions of this paper. Does flexibility facilitate the sustainability of cooperation over time? Does the use of more flexible incentive strategies help to ensure the credibility of the incentive strategies defined to sustain the coordinated outcome? The focus of this follow-up paper is to show that the definition of more flexible nonlinear incentive equilibrium strategies for two-player differential games helps to guarantee the sustainability of the agreement over time. The aim of the study is to check whether the definition of incentive strategies that are less restricted in terms of the permitted deviation from the coordinated solution facilitates their credibility and implementation. To this end, we consider a class of incentive strategies that are defined as nonlinear functions of the control variables of both players and the current value of the state variable. Note that in our paper, instead of considering only decision-dependent equilibrium strategies as is commonly done in the literature, we also consider state-dependent equilibrium strategies.

Let us recall that in the previous literature on incentive strategies, these strategies are constructed in such a way that the incentive equilibrium is the cooperative solution. Importantly, the idea of this paper is to relax this somewhat demanding requirement and look for an incentive strategy equilibrium such that, first, the corresponding optimal state trajectory is close enough, but not necessarily identical, to the optimal cooperative trajectory; and second, in the long run, the steady state of the state variable is close to the steady state under the cooperative mode of play. It is worth noting that the incentive strategies defined in this paper have the following appealing property: the strategy for one player only calls for a reaction when the control level of the other player is “not close enough” to his cooperative action. Two different realizations for measuring the “distance” from the coordinated action are proposed. The first concerns the distance in the long run, as measured by the distance from the steady-state level of the state variable. The second refers to the distance along the whole optimal trajectory, measured by the distance from the optimal cooperative time path.

In both cases we show that it is possible to choose the incentive strategy functions in such a manner that the optimal state path evolves arbitrarily close to the corresponding cooperative state trajectory.

We illustrate the use of these strategies in a well-known example taken from environmental economics—a transboundary pollution differential game [24]. We outline the main steps of the mathematical analysis, and numerical experiments are presented to illustrate the results. Essentially, the numerical algorithm solves an approximate time-discrete dynamic game; the dynamic programming equations are solved by a spline collocation method.

The rest of the paper is organized as follows. In Sect. 2 we briefly review some related works on incentive equilibrium strategies. In Sect. 3 we briefly recall the main ingredients of a linear-quadratic differential game, and in particular, the formulation of a well-known transboundary pollution differential game and its cooperative solution, the feedback noncooperative Nash strategies, as well as the steady-state pollution stocks under cooperative and noncooperative modes of play. In Sect. 4, we define two types of incentive strategies and equilibrium, the so-called stationary incentive and path-dependent incentive. In Sect. 5, we analyse the credibility of these incentive strategies. Sect. 6 concludes. The Appendix contains the numerical algorithm used throughout the paper.

2 Background

The use of incentive strategies has proved successful in analysing how a desired coordinated strategy can be implemented in different areas such as marketing [23, 25–27], environmental economics [21, 22, 28, 29], and others [30]. In these papers, different types of incentive strategies are used to achieve the cooperative outcome as an incentive equilibrium. In all cases, the strategies are assumed to be linear, continuous in the information, and decision-dependent in the sense that each player makes his current decision contingent upon the current decision of the other player. All these papers have solvable game structures; specifically, all belong to the class of linear-state or linear-quadratic differential games. Furthermore, the credibility property has only been studied in [21–23, 29].

As already pointed out in [19], the credibility property is usually difficult to study analytically, and one has to focus on numerical studies. In [19], the authors perform numerical experiments, adopting piecewise constant functions to describe the players’ possible deviations. In that paper the credibility property was studied numerically in the context of a continuous time whaling model. In [20], the analysis for the corresponding discrete-time model is presented. In this discrete-time setting the authors derive the credibility property analytically. In both papers the authors prove that credibility can be obtained for sufficiently small values of the deviation from cooperation.

This last result agrees with the findings in [23], where no closed form results for the credibility problem are derived in the general case, but some insights are obtained in a simplified model. Credibility is assured only against deviations of the following type: a player cheats on the agreement by making a lower effort than the desired rate during some initial period of time.

In [29], in the context of a two-period overlapping generations model in discrete time, the authors show that the decoupled linear incentive strategies considered may not be credible for some parameter values, while for some sets of parameters they would be effective to implement the desired cooperative outcome.

In [21, 22], the credibility of the incentive equilibrium strategies for the class of linear-state and linear-quadratic differential games is characterized. In each of these papers a general condition for credibility is derived, and its use is illustrated in two examples. In [21], sufficient conditions are provided which ensure the existence of neighbourhoods in which the incentive strategies are credible. In both examples the proposed linear incentive strategies are not always credible. In particular, and in line with other studies, linear incentive strategies are credible if the deviation from the cooperative solution is less than a fixed quantity. Alternative ad-hoc nonlinear credible strategies are provided (a hyperbola for the environmental economics game and a parabola for the knowledge accumulation game), suggesting that we should not restrict ourselves to linear incentive strategies even in a class of differential games as simple as the linear-state one. In [22], in order to preserve the linear-quadratic structure of the game, the analysis is restricted to linear incentive strategies and, additionally, if one player deviates from the cooperative solution, he is assumed to use a linear control. In other words, the analysis focuses on the fulfilment of inequalities ensuring that the linear incentive strategies are credible when linear deviations are considered. Under these hypotheses, it is not possible to characterize analytically the feasible sets which ensure credibility of the incentive strategies, showing that establishing credibility, even for linear incentive strategies, is tedious. In [22], the authors use numerical simulations and conclude that only small deviations from the cooperative levels lead to credible strategies. As far as the credibility property is concerned, the results of all these papers can be summarized as follows: linear incentive strategies are credible, i.e. the players do believe that the announced strategies will be followed, when deviations are not too large.

This result inspires the main research question of the present paper. Does the use of more flexible incentive strategies help in ensuring the credibility of the incentive strategies defined to sustain the coordinated outcome?

3 A Linear-Quadratic Differential Game

We consider a general infinite horizon two-player linear-quadratic differential game. Player \(i\)’s objective is to maximize

$$\begin{aligned}&W_{i}(u_1,u_2,x_0):=\int _{0}^{\infty }f_i(x,u_1,u_2) e^{-\rho t}\, \mathrm{{d}}t,\quad i=1,2,\end{aligned}$$
(1)
$$\begin{aligned}&{\hbox {s.t.:}}\ \dot{x}=g(x,u_1,u_2),\quad x(0)=x_0, \end{aligned}$$
(2)

where \(f_i\) is a general quadratic function of its arguments satisfying standard concavity assumptions, and \(g\) is a linear (affine) function of its arguments, see [16] for details. Parameter \(\rho \) is a positive constant discount rate. Although the setting is more general, it is assumed that the state \(x\) and control variables, \(u_i\), \(i=1,2\), are scalar functions of time \(t\). To simplify the notation, we will drop the explicit dependence on the time variable when no confusion can arise.

Let \(\fancyscript{U}_i\) denote the set of admissible controls for Player \(i\). In this paper, we restrict ourselves to Markovian strategies [16], so \(\fancyscript{U}_i\) is defined as the set of measurable functions \(u_i=u_i(x,t)\) defined in \(\mathbb {R}\times [0,+\infty [\) with values in some subset of the real numbers, \(\mathbb {U}_i\subset \mathbb {R}\), such that for all \(u_i\in \fancyscript{U}_i\), \(i=1,2\), the differential Eq. (2) with \(u_i=u_i(x,t)\), \(i=1,2\), possesses a unique absolutely continuous solution defined in \([t_0,+\infty [\) for all \(t_0\ge 0\) and \(x_0\in \mathbb {R}\). Let \(\fancyscript{U}:=\fancyscript{U}_1\times \fancyscript{U}_2\).

From now on, the state variable is the (unique) solution of the ordinary differential Eq. (2) given the pair of admissible controls \(u_1\) and \(u_2\). As problem (1)–(2) is autonomous, we assume, unless explicitly stated otherwise, that the players use stationary Markovian strategies \(u_i=u_i(x)\). For notational simplicity, we will use \(W_i(u_1,u_2)\) instead of \(W_i(u_1,u_2,x_0)\), dropping the dependence on the initial condition when no confusion can arise.

The cooperative solution denoted by \(u^\mathrm{{c}}=(u_{1}^\mathrm{{c}},u_{2}^\mathrm{{c}})\) is obtained as the result of the joint optimization problem

$$\begin{aligned} \max _{u\in \fancyscript{U}}\left( W_{1}+W_{2}\right) = \max _{u\in \fancyscript{U}}\int _{0}^{\infty }\left( f_1(x,u_1,u_2)+f_2(x,u_1,u_2)\right) e^{-\rho t}\;\mathrm{{d}}t \end{aligned}$$

subject to dynamics (2). Here, and in the rest of the paper, we have used the notation \(u\) to represent a pair \(u=(u_1,u_2)\in \fancyscript{U}=\fancyscript{U}_1\times \fancyscript{U}_2\).

The cooperative strategies can be explicitly computed. As is well known, the cooperative feedback optimal controls are affine functions of the state variable of the form \(u_{i}^\mathrm{{c}}(x)=a_{i}^\mathrm{{c}}x+b_{i}^\mathrm{{c}}\), \(i=1,2\), where the coefficients \(a_{i}^\mathrm{{c}}\) and \(b_{i}^\mathrm{{c}}\) are characterized in [22]. As these coefficients are not relevant for our analysis, they will not be presented here. We denote by \(x^\mathrm{{c}}(t)\) the optimal cooperative trajectory, that is, the unique solution of (2) when \(u_i=u_i^\mathrm{{c}}(x)\), \(i=1,2\).

In the noncooperative case, a Markov-perfect Nash equilibrium (MPNE) [16], \(u^\mathrm{{N}}\in \fancyscript{U}\), \(u^\mathrm{{N}}=(u_{1}^\mathrm{{N}},u_{2}^\mathrm{{N}})\), can be obtained as a pair of linear strategies, \(u_{i}^\mathrm{{N}}(x)=a_{i}^\mathrm{{N}}x+b_{i}^\mathrm{{N}}\), \(i=1,2\). The Nash equilibrium, \(u^\mathrm{{N}}\), is defined by the following pair of inequalities:

$$\begin{aligned} W_1(u_1^\mathrm{{N}},u_2^\mathrm{{N}})\ge W_1(u_1,u_2^\mathrm{{N}}),\;\forall u_1\in \fancyscript{U}_1;\quad W_2(u_1^\mathrm{{N}},u_2^\mathrm{{N}})\ge W_2(u_1^\mathrm{{N}},u_2),\;\forall u_2\in \fancyscript{U}_2. \end{aligned}$$

In some models the coefficients \(a_i^\mathrm{{N}}\) and \(b_i^\mathrm{{N}}\) can be explicitly computed, see [22] for example. We denote by \(x^\mathrm{{N}}(t)\) the solution of (2) when the Markov-perfect Nash strategies are used.

We assume that, as is the case in the majority of models used in practice, both \(x^\mathrm{{c}}(t)\) and \(x^\mathrm{{N}}(t)\) converge to a steady state denoted by \(x_\mathrm{{ss}}^\mathrm{{c}}\) and \(x_\mathrm{{ss}}^\mathrm{{N}}\), respectively.

Although the concepts and techniques presented here can be applied to the general case with obvious modifications, for simplicity of presentation we focus, from now on, on a particular linear–quadratic model borrowed from the environmental economics literature. This particularization will allow us to compare our results with the previous literature on the subject.

Let us consider two players (countries, regions,\(\ldots \)) who wish to coordinate their pollution strategies in order to maximize their joint payoff. The control variables, denoted by \(u_{i}(t)\), \(i=1,2\), are the emissions of the two players (countries) at time \(t\ge 0\). The state variable, denoted by \(x(t)\), represents the stock of pollution, which we assume follows the dynamics defined by the ordinary differential equation

$$\begin{aligned} \dot{x}=g(x,u_1,u_2):=\beta (u_{1}+u_{2})-\alpha x. \end{aligned}$$
(3)

In the model, \(\beta >0\) is a scale parameter, and \(\alpha >0 \) denotes the natural absorption rate. The objective of player \(i\) is defined by functional (1) with

$$\begin{aligned} f_i(x,u_i,u_j):= u_{i}\left( A_i-\frac{1}{2}u_{i}\right) -\frac{1}{2}\varphi _{i}x^{2}, \end{aligned}$$
(4)

where \(A_i\) and \(\varphi _{i}\) are positive parameters. We suppose, as is commonly done for the model at hand, that \(\mathbb {U}_i=[0,+\infty [\), \(i=1,2\), although other control restrictions could also be considered.
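To fix ideas, the accumulation dynamics (3) can be simulated directly. The following minimal Python sketch integrates (3) by forward Euler under constant emissions; all numerical values are illustrative, not taken from the paper. With constant total emissions \(u_1+u_2\), the stock converges to the steady state \(\beta (u_1+u_2)/\alpha \).

```python
# Forward-Euler simulation of the pollution dynamics (3):
#   x' = beta*(u1 + u2) - alpha*x
# under constant emissions. All parameter values are illustrative.

def simulate_stock(u1, u2, x0, alpha=0.2, beta=1.0, dt=0.01, t_end=100.0):
    """Integrate x' = beta*(u1 + u2) - alpha*x with constant emissions."""
    x = x0
    steps = int(t_end / dt)
    for _ in range(steps):
        x += dt * (beta * (u1 + u2) - alpha * x)
    return x

# With constant total emissions, the stock converges to beta*(u1+u2)/alpha.
x_long_run = simulate_stock(u1=0.1, u2=0.1, x0=0.0)
analytic = 1.0 * (0.1 + 0.1) / 0.2  # = 1.0
```

The linear dynamics make the steady state independent of the initial condition, which is why the comparison of long-run stocks under different modes of play is meaningful.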

Particular cases of our specification are analysed in [24, 31]. Both models are completely symmetric (\(A_i=A, \varphi _i=\varphi \) for all players). In [24], an \(n\)-country model is studied, whereas [31] concentrates on a two-region differential game.

In this particular model both the cooperative and the Markov-perfect noncooperative feedback linear strategies can be explicitly computed, see [22]. In [22], it is proved that \( x_\mathrm{{ss}}^\mathrm{{c}}<x_\mathrm{{ss}}^\mathrm{{N}},\) so the noncooperative game leads to a greater long-term stock of pollution than the one under cooperation.

As is well known in the related literature, the noncooperative mode of play leads to an overpolluted environment in the long run. However, the environmentally preferred coordinated solution is not an equilibrium, and the players have an incentive to deviate from the prescribed paths. The main aim of this paper is to design an incentive strategy such that the players (countries) will not move away significantly from their part of the coordinated solution. As a result, in the particular model at hand, the optimal time path of the stock of pollution will be close to the cooperative trajectory and the steady state of the stock of pollution will be near the steady state of the stock of pollution under the cooperative setting.

4 Incentive Equilibria

We start this section with the definition of incentive equilibria. The set \(\Gamma _i\) of admissible incentive strategies for Player \(i\) is defined as the set of piecewise smooth functions \(\psi _i\), defined in \(\mathbb {U}_i\times \mathbb {U}_j\times \mathbb {R}\times [0,+\infty [\) with values in \( \mathbb {U}_i\), such that, for all \(v_k=v_k(x,t)\in \fancyscript{U}_k\), \(k=1,2\), the function defined by \(\Psi _i(x,t)=\psi _i(v_i(x,t),v_j(x,t),x,t)\) satisfies \(\Psi _i\in \fancyscript{U}_i\). For notational simplicity, in what follows we drop the explicit dependence of \(\psi _i\) on the time variable \(t\) when no confusion can arise.

Definition 4.1

A pair \(\psi _1(v_1,v_2,x)\), \(\psi _2(v_1,v_2,x)\) with \(\psi _i\in \Gamma _i\), \(i=1,2\), is an incentive equilibrium at \((u_1^*,u_2^*)\in \fancyscript{U}=\fancyscript{U}_1\times \fancyscript{U}_2\) iff for all \(u_1\in \fancyscript{U}_1\) and \(u_2\in \fancyscript{U}_2\),

$$\begin{aligned} W_1(u_1^*,u_2^*)\ge W_1(u_1,\psi _2(u_1,u_2^*,\hat{x})), \quad W_2(u_1^*,u_2^*)\ge W_2(\psi _1(u_1^*,u_2,\check{x}),u_2), \end{aligned}$$

where \(\hat{x}\) and \(\check{x}\) satisfy \(\dot{\hat{x}}=\beta (u_{1}+\psi _2(u_1,u_2^*,\hat{x}))-\alpha \hat{x}\), and \(\dot{\check{x}}=\beta (\psi _1(u_1^*,u_2,\check{x})+u_{2})-\alpha \check{x}\), respectively, with \(\hat{x}(0)=\check{x}(0)=x_{0}\). Furthermore, \(u_1^*=\psi _1(u_1^*,u_2^*,x^*)\), \(u_2^*=\psi _2(u_1^*,u_2^*,x^*)\), where \( \dot{x}^*=\beta (u_{1}^*+u_{2}^*)-\alpha x^*\), with \(x^*(0)=x_{0}\).

An incentive equilibrium is thus characterized by the following pair of optimal control problems:

$$\begin{aligned} \max _{u_i\in \fancyscript{U}_i} W_i(u_i,u_j^*)&=\int _{0}^{\infty }\left( u_{i}\left( A_i-\frac{1}{2}u_{i}\right) -\frac{1}{2}\varphi _{i}x^{2}\right) e^{-\rho t} \mathrm{{d}}t,\nonumber \\ \text {s.t.: }\dot{x}&=\beta (u_{i}+\psi _j(u_i,u_j^*,x))-\alpha x,\ x(0)=x_{0}, \end{aligned}$$
(5)

with \(u_i^*=\arg \max _{u_i}W_i(u_i,u_j^*)\), \(i,j=1,2\), \( i\ne j\). The equilibrium condition \(u_i^*=\psi _i(u_i^*,u_j^*,x^*)\), \( i,j=1,2\), \( i\ne j,\) has to be satisfied.

Remark 4.1

Some authors have argued that observations of the current actions of the other player may only be available after a time lag. Thus, the implicit assumption of instantaneous observability in Definition 4.1 should be understood as a mathematical abstraction. It is possible to introduce a time lag in the observations of actions as, for example, in [18, 19]. However, in [19] the authors argue that when the delay is small, its effect can be neglected in theoretical considerations. At the end of that paper, the authors consider the construction of incentive strategies with a constant time lag. They show that when the lag is small, the payoffs and the incentive strategies in their model are very similar (they agree to an accuracy of four digits) for both specifications (without and with time delay). Nevertheless, this point could be an interesting question for future research in the context of nonlinear incentive strategies.

Remark 4.2

Linear incentive strategies are a particular case of Definition 4.1. In [17] (see also [21, 22, 28]), the incentive strategy is defined as an affine function with the following form,

$$\begin{aligned} \psi _j(u_i,u_j,x)=\psi _{j}(u_{i})=u_{j}^\mathrm{{c}}+D_{j}(u_{i}-u_{i}^\mathrm{{c}}),\quad i,j=1,2,\;i\ne j, \end{aligned}$$
(6)

with \(D_{j},\,j=1,2\), denoting an appropriate non-zero constant.

In this case, the incentive equilibrium is the pair \((u_1^*,u_2^*)=(u_1^\mathrm{{c}},u_2^\mathrm{{c}})\), that is, the incentive equilibrium is exactly the cooperative solution and, consequently, for all \(t\ge 0\), \( x^*(t)=x^\mathrm{{c}}(t)\), and \(x^*_\mathrm{{ss}}=x^\mathrm{{c}}_\mathrm{{ss}}\), where \(x^*_\mathrm{{ss}}\) denotes the steady state of the system when the incentive strategies are used.
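As a small illustration, the affine incentive rule (6) is straightforward to code. The cooperative actions and the slope \(D_j\) below are hypothetical placeholders; in the paper they come from the cooperative solution characterized in [22].

```python
# Linear incentive strategy (6): player j responds to player i's deviation
# from the cooperative action. The numerical values are placeholders, not
# the coefficients derived in the paper.

def linear_incentive(u_i, u_i_coop, u_j_coop, D_j):
    """psi_j(u_i) = u_j^c + D_j * (u_i - u_i^c)."""
    return u_j_coop + D_j * (u_i - u_i_coop)

# If player i plays his cooperative action, the rule prescribes u_j^c:
resp = linear_incentive(u_i=0.3, u_i_coop=0.3, u_j_coop=0.3, D_j=1.0)  # -> 0.3
```

When player \(i\) sticks to \(u_i^\mathrm{{c}}\), the rule returns exactly \(u_j^\mathrm{{c}}\), which is the property that makes the cooperative pair the incentive equilibrium in this linear case.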

The main idea of this paper is to relax this rather demanding requirement, looking instead for an incentive strategy equilibrium \((u_1^*,u_2^*)\) such that, first, the corresponding optimal state trajectory \(x^*(t)\) is close enough, but not exactly equal, to the cooperative trajectory \(x^\mathrm{{c}}(t)\); and second, in the long run the steady state of the stock of pollution satisfies \(x_\mathrm{{ss}}^\mathrm{{c}}\le x_\mathrm{{ss}}^*<x_\mathrm{{ss}}^\mathrm{{N}}\), with \(x_\mathrm{{ss}}^*\) close to \(x_\mathrm{{ss}}^\mathrm{{c}}\) and lower than the long-run value under the noncooperative setting, \(x_\mathrm{{ss}}^\mathrm{{N}}\).

In the rest of this section we present two different realizations of this idea.

4.1 Stationary Incentive

In the first realization, which we call the stationary incentive, we choose the incentive functions \(\psi _j\), \(j=1,2\), in Definition 4.1 of the form

$$\begin{aligned} \psi _j^\mathrm{{s}}(u_i,u_j,x)=(u_{j}^\mathrm{{c}}+D_{j}(u_{i}-u_{i}^\mathrm{{c}}))\phi (x-x_\mathrm{{ss}}^\mathrm{{c}},\varepsilon ) +u_j(1-\phi (x-x_\mathrm{{ss}}^\mathrm{{c}},\varepsilon )), \end{aligned}$$
(7)

where \(\varepsilon >0\) is a small positive parameter, and \(\phi (x,\varepsilon )\) is a smooth function satisfying

$$\begin{aligned} \phi (x,\varepsilon )=0,\; {\hbox {if}}\; x\le \varepsilon ;\quad \phi (x,\varepsilon )=1,\; {\hbox {if}}\; x\ge 2\varepsilon . \end{aligned}$$
(8)

The superscript \(s\) denotes the “stationary” incentive scenario.

Remark 4.3

Note that, with the previous definition, if the state trajectory \(x(t)\) is far from the steady-state value of the stock of pollution under cooperation, then the linear incentive strategy (6) is activated, forcing the players’ choices in such a way that the state path returns close to the cooperative long-term state, \(x_\mathrm{{ss}}^\mathrm{{c}}\). When the state path \(x(t)\) is close enough to \(x_\mathrm{{ss}}^\mathrm{{c}}\) (as measured by the parameter \(\varepsilon \)), the nonlinear stationary incentive (7) gives the players freedom: they are not bound by any incentive and are allowed to choose any time path.

Note also that the incentive strategy is implemented only if, at some point, the trajectory \(x(t)\) is above \(x_\mathrm{{ss}}^\mathrm{{c}}\), so only deviations from the cooperative outcome by one player leading to a more polluted environment trigger an immediate response by the other player.

The same technique can be applied, with obvious modifications, in different models if the incentive is implemented for deviations below and/or above the desired outcome.
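A concrete realization of the cutoff \(\phi \) in (8) and of the stationary incentive (7) can be sketched as follows. The paper only requires \(\phi \) to be smooth with the two plateaus in (8); here a \(C^1\) smoothstep is used on the transition band as one possible choice, and the cooperative levels \(u_i^\mathrm{{c}}\), \(u_j^\mathrm{{c}}\), \(x_\mathrm{{ss}}^\mathrm{{c}}\) passed in below are placeholder constants, not values computed from the model.

```python
def phi(x, eps):
    """Cutoff satisfying (8): 0 for x <= eps, 1 for x >= 2*eps,
    with a C^1 smoothstep transition in between (one possible choice)."""
    if x <= eps:
        return 0.0
    if x >= 2 * eps:
        return 1.0
    s = (x - eps) / eps          # s in (0, 1) on the transition band
    return 3 * s**2 - 2 * s**3   # smoothstep polynomial

def stationary_incentive(u_i, u_j, x, u_i_coop, u_j_coop, D_j, x_ss_coop, eps):
    """psi_j^s in (7): blend the linear incentive (6) with player j's own
    action according to the distance of x from the cooperative steady state."""
    w = phi(x - x_ss_coop, eps)
    linear = u_j_coop + D_j * (u_i - u_i_coop)
    return linear * w + u_j * (1 - w)
```

For \(x\le x_\mathrm{{ss}}^\mathrm{{c}}+\varepsilon \) the function returns \(u_j\) (the player is unconstrained), while for \(x\ge x_\mathrm{{ss}}^\mathrm{{c}}+2\varepsilon \) the linear incentive (6) is fully active.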

The incentive equilibrium functions \(\psi _j^\mathrm{{s}}\), \(j=1,2\), do not depend explicitly on time. Therefore, in this case, problem (5) is autonomous. The value function \(V_i^\mathrm{{s}}(x)\), for Player \(i\), \(i=1,2\), satisfies the system of Hamilton–Jacobi–Bellman equations

$$\begin{aligned} \rho V_i^\mathrm{{s}}(x)=\max _{u_i\in \mathbb {U}_i}\left\{ f_i(x,u_i,u_j^{*\mathrm{{s}}})+\frac{\mathrm {d}}{\mathrm {d}x}V_i^\mathrm{{s}}(x)g_i^\mathrm{{s}}(x,u_i,u_j^{*\mathrm{{s}}})\right\} , \end{aligned}$$
(9)

where the function \(f_i\) has been defined in (4), and

$$\begin{aligned} g_i^\mathrm{{s}}(x,u_i,u_j)=\beta (u_i+\psi _j^\mathrm{{s}}(u_i,u_j,x))-\alpha x,\quad i,j=1,2,\ i\ne j. \end{aligned}$$
(10)

The optimal policies \(u^{*\mathrm{{s}}}_i\), \(i=1,2\), are defined by

$$\begin{aligned} u^{*\mathrm{{s}}}_i(x)=\arg \max _{u_i\in \mathbb {U}_i}\left\{ f_i(x,u_i,u_j^{*\mathrm{{s}}}) +\frac{\mathrm {d}}{\mathrm {d}x}V_i^\mathrm{{s}}(x)g_i^\mathrm{{s}}(x,u_i,u_j^{*\mathrm{{s}}})\right\} ,\quad j\ne i. \end{aligned}$$

4.2 Path-dependent Incentive

In the second realization, the pair of incentive functions \(\psi _j\), \(j=1,2\), is allowed to depend on \(t\) through the cooperative state trajectory \(x^\mathrm{{c}}(t)\); hence the name path-dependent incentive strategies. More precisely, we define for \(i,j=1,2\), \(i\ne j\) and \(t\ge 0\),

$$\begin{aligned} \psi _j^\mathrm{{ns}}(u_i,u_j,x,t)= (u_{j}^\mathrm{{c}}+D_{j}(u_{i}-u_{i}^\mathrm{{c}}))\phi (x-{x^\mathrm{{c}}(t)},\varepsilon ) +u_j(1-\phi (x-{x^\mathrm{{c}}(t)},\varepsilon )), \end{aligned}$$
(11)

where \(x^\mathrm{{c}}(t)\) is the cooperative state trajectory, and \(\phi \) is the cutoff function defined in (8). The superscript ns denotes the “non-stationary” incentive scenario, as opposed to the stationary one.

Remark 4.4

Note that, with the choice (11), the players implement the linear incentive strategies when the state trajectory is far from the cooperative trajectory, \(x^\mathrm{{c}}(t)\). The difference from the stationary incentive (7) presented in Sect. 4.1 is that there the players were forced to stay close to the steady state of the cooperative game, \(x_\mathrm{{ss}}^\mathrm{{c}}\), whereas under the path-dependent incentive (11) they are forced to stay close to the whole trajectory of the cooperative game, \(x^\mathrm{{c}}(t)\).
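The path-dependent incentive (11) only replaces the fixed target \(x_\mathrm{{ss}}^\mathrm{{c}}\) by the moving target \(x^\mathrm{{c}}(t)\). In the sketch below the cooperative trajectory is a hypothetical exponential path converging to a hypothetical steady state, and the cutoff is the same smoothstep realization of (8) used for the stationary case; none of the numerical values come from the paper.

```python
import math

def phi(x, eps):
    """Cutoff (8): 0 below eps, 1 above 2*eps, C^1 smoothstep in between."""
    if x <= eps:
        return 0.0
    if x >= 2 * eps:
        return 1.0
    s = (x - eps) / eps
    return 3 * s**2 - 2 * s**3

def path_dependent_incentive(u_i, u_j, x, t, x_coop_path,
                             u_i_coop, u_j_coop, D_j, eps):
    """psi_j^ns in (11): the target is the cooperative trajectory x^c(t)
    rather than the cooperative steady state."""
    w = phi(x - x_coop_path(t), eps)
    linear = u_j_coop + D_j * (u_i - u_i_coop)
    return linear * w + u_j * (1 - w)

# Hypothetical cooperative path converging to a steady state of 0.5:
x_coop = lambda t: 0.5 * (1 - math.exp(-0.2 * t))
```

Since \(x^\mathrm{{c}}(t)\rightarrow x_\mathrm{{ss}}^\mathrm{{c}}\), the path-dependent rule reduces to the stationary one in the long run, which is consistent with the boundary condition (14) below.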

In the case of a time-dependent incentive, the problem is nonautonomous and, in consequence, the value function \(V_i^\mathrm{{ns}}(t,x)\), for Player \(i=1,2\), satisfies the system of time-dependent Hamilton–Jacobi–Bellman equations

$$\begin{aligned} -\frac{\partial }{\partial t}V_i^\mathrm{{ns}}(t,x)\!+\!\rho V_i^\mathrm{{ns}}(t,x)\!=\! \max _{u_i\in \mathbb {U}_i}\left\{ f_i(x,u_i,u_j^{*\mathrm{{ns}}}) \!+\!\frac{\partial }{\partial x}V_i^\mathrm{{ns}}(t,x)g_i^\mathrm{{ns}}(x,u_i,u_j^{*\mathrm{{ns}}},t)\right\} ,\nonumber \\ \end{aligned}$$
(12)

where \(f_i\), \(i=1,2\), is defined in (4) and

$$\begin{aligned} g_i^\mathrm{{ns}}(x,u_i,u_j,t)=\beta (u_i+\psi _j^\mathrm{{ns}}(u_i,u_j,x,t))-\alpha x, \quad i,j=1,2,\ i\ne j, \end{aligned}$$
(13)

with \(\psi _j^\mathrm{{ns}}\) defined in (11).

Equation (12) is supplemented with the boundary condition

$$\begin{aligned} \lim _{t\rightarrow \infty } V_i^\mathrm{{ns}}(t,x)=V_i^\mathrm{{s}} (x), \end{aligned}$$
(14)

which is a natural boundary condition, taking into account that \(\lim _{t\rightarrow \infty }x^\mathrm{{c}}(t)=x_\mathrm{{ss}}^\mathrm{{c}}\). The optimal policies \(u^{*\mathrm{{ns}}}_i(t,x)\), \(i=1,2\) are defined by

$$\begin{aligned} u^{*\mathrm{{ns}}}_i(t,x)= \arg \max _{u_i\in \mathbb {U}_i}\left\{ f_i(x,u_i,u_j^{*\mathrm{{ns}}}) +\frac{\partial }{\partial x}V_i^\mathrm{{ns}}(t,x)g_i^\mathrm{{ns}}(x,u_i,u_j^{*\mathrm{{ns}}},t)\right\} , \quad j\ne i. \end{aligned}$$

4.3 Computing Nonlinear Incentive Strategies

The analysis of the nonlinear incentive strategies presented in the previous subsections requires numerical methods. In this section, we present some results obtained with the numerical method described in the Appendix. Let us analyse a symmetric example. Similar qualitative results were found in all the numerical experiments carried out. We present here only the results for the following particular values of the parameters: \(A_1=A_2=0.5\), \(\varphi _1=\varphi _2=1\), \(\alpha =0.2\), \(\beta =1\), \(\rho =0.1\). The threshold \(\varepsilon \) in the definition of the cutoff function \(\phi \) in (8) defining the nonlinear incentive was set to \(\varepsilon =0.025\). The parameter \(D_j\) in (7) and (11) was set to \(D_j=1\), \(j=1,2\) as in [22]. In this first experiment, the initial condition was set to \(x_0=0\).
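The paper's algorithm (see the Appendix) solves an approximate time-discrete dynamic game with spline collocation. As a rough, simplified illustration of the same idea, the sketch below runs value iteration for one player's best-response problem (5) on a state grid, using linear interpolation in place of splines and holding the opponent to the stationary incentive (7) with a hypothetical constant equilibrium action; the model parameters are those listed above, while the cooperative levels are illustrative placeholders.

```python
import numpy as np

# Parameters from the numerical experiment in this section.
A, varphi, alpha, beta, rho = 0.5, 1.0, 0.2, 1.0, 0.1
eps, D = 0.025, 1.0
# Hypothetical cooperative levels, used only to illustrate the algorithm:
u_coop, x_ss_coop = 0.1, 0.5

def phi(z):
    """Vectorized cutoff (8) via a C^1 smoothstep on [eps, 2*eps]."""
    s = np.clip((z - eps) / eps, 0.0, 1.0)
    return 3 * s**2 - 2 * s**3

def psi(u_i, x):
    """Stationary incentive (7), with the opponent's action fixed at u_coop."""
    w = phi(x - x_ss_coop)
    return (u_coop + D * (u_i - u_coop)) * w + u_coop * (1 - w)

# Time-discrete dynamic programming on a state grid; linear interpolation
# replaces the spline collocation used in the paper's Appendix.
dt = 0.2
disc = np.exp(-rho * dt)
xs = np.linspace(0.0, 2.0, 201)     # state grid
us = np.linspace(0.0, 1.0, 101)     # control grid
X, U = np.meshgrid(xs, us, indexing="ij")
X_next = X + dt * (beta * (U + psi(U, X)) - alpha * X)
X_next = np.clip(X_next, xs[0], xs[-1])     # keep states on the grid
payoff = (U * (A - 0.5 * U) - 0.5 * varphi * X**2) * dt

V = np.zeros_like(xs)
for _ in range(5000):
    cont = np.interp(X_next, xs, V)              # interpolated continuation value
    V_new = np.max(payoff + disc * cont, axis=1) # best response over u_i
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new
```

Because the discounted Bellman operator is a contraction, the iteration converges geometrically; the spline collocation of the Appendix serves the same role as the interpolation step here, but with higher accuracy.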

In Fig. 1, we represent the optimal control time paths and optimal state trajectories for four different modes of play. Dashed (purple) lines represent the cooperative control (left) and state (right) optimal trajectories; dotted (red) lines are the optimal trajectories corresponding to the noncooperative MPNE under linear strategies. The optimal trajectories for the stationary and path-dependent incentive equilibrium strategies are represented by solid (black) and dash-dotted (blue) lines, respectively.

Fig. 1

Optimal control (left) and state (right) trajectories. Initial condition \(x_0=0\). Dashed (purple) line: cooperative game. Dotted (red) line: noncooperative game, linear MPNE. Solid (black) line: stationary incentive equilibrium. Dash-dotted (blue) line: path-dependent incentive equilibrium

As we can see in Fig. 1(left), the emission of pollutants when the incentive equilibrium strategies are implemented is higher than in both the cooperative and noncooperative games for a short period of time. This has the effect of rapidly adjusting the trajectory to the specific target of the corresponding incentive strategy. In both cases, stationary and path-dependent incentive equilibrium, the long-term steady state, \(x_\mathrm{{ss}}^{*}\), is within a distance of \(2\varepsilon \) of the steady state of the pollution stock of the cooperative game; see the right part of Fig. 1. In the case of the path-dependent incentive, after a short period of adjustment, the whole trajectory is within a distance of \(2\varepsilon \) of the cooperative trajectory. In Fig. 1(right) we can see that the cooperative and path-dependent incentive state trajectories are, after the adjustment period, parallel. The distance between the cooperative and path-dependent trajectories can be controlled by means of the parameter \(\varepsilon \) in (8).

Furthermore, the stationary and path-dependent incentive trajectories are very close in the long term. In fact, by construction, they provide the same steady state of the pollution stock, \(x_\mathrm{{ss}}^{*}\). We remark that the steady state of the pollution stock can be made arbitrarily close to the Pareto efficient pollution stock by choosing a smaller value of the parameter \(\varepsilon \). It is also worth noting that, in the long run, the emission levels are, in both cases (stationary and path-dependent incentive), very close to the Pareto efficient emission level (Fig. 1).
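The long-run behaviour discussed above can be checked by a direct Euler integration of the accumulation dynamics \(\dot{x}=\beta (u_1+u_2)-\alpha x\): with a constant symmetric emission rate \(u\), the stock converges to \(x_\mathrm{ss}=2\beta u/\alpha \). The emission rate in the sketch below is purely illustrative, not a computed equilibrium strategy.

```python
def simulate_stock(u, x0=0.0, alpha=0.2, beta=1.0, dt=0.01, T=200.0):
    """Euler integration of x' = beta*(u1 + u2) - alpha*x with u1 = u2 = u constant."""
    x = x0
    for _ in range(int(T / dt)):
        x += dt * (2.0 * beta * u - alpha * x)
    return x

# Illustrative constant rate; the steady state is 2*beta*u/alpha = 0.5.
x_T = simulate_stock(0.05)
```

The same routine, fed with the numerically computed feedback strategies instead of a constant rate, reproduces the state trajectories of Fig. 1.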

In Fig. 2 we represent the symmetric stationary incentive optimal feedback law together with the optimal feedback laws for the cooperative and noncooperative cases. The cooperative feedback \(u=u_i^\mathrm{{c}}(x)\), \(i=1,2\), is represented with a dashed (purple) line, the linear feedback Nash equilibrium \(u=u_i^\mathrm{{N}}(x)\), \(i=1,2\), with a dotted (red) line, and the stationary incentive equilibrium \(u=u_i^*(x)\), \(i=1,2\), with a solid (black) line. The dash-dotted light-grey line represents the line of possible steady states for the dynamics (3) with symmetric strategies, that is, the line defined by the equation \(2\beta u-\alpha x=0\). We have not included the natural restriction \(u\ge 0\) in this picture, to make it easier to read. In the picture, we can observe the role played by the parameter \(\varepsilon \): it marks the point at which the emissions should start to decline towards their stationary level. With a smaller value of \(\varepsilon \), the emissions must start their decline earlier in order to reach a steady state closer to the cooperative one.

Fig. 2

Stationary feedbacks. Discontinuous (purple) line cooperative feedback. Dotted (red) line MPNE. Solid (black) line stationary incentive equilibrium feedback

In the rest of this section we present the results obtained when the initial condition is set to \(x_0=0.25\), so that the initial state is larger than the cooperative steady state, \(x_\mathrm{{ss}}^\mathrm{{c}}\), and lower than the noncooperative steady state, \(x_\mathrm{{ss}}^\mathrm{{N}}\). In Fig. 3 we have represented the optimal control (left) and state (right) paths, using the same colour and line-type code as before: discontinuous purple line for the results of the cooperative game, dotted red line for the noncooperative case, and solid black line for the stationary incentive results. In this particular case, the stationary and path-dependent incentives coincide. We can observe in Fig. 3 that, in a first period of time, the optimal controls, in both the cooperative and incentive equilibrium cases, are null, forcing the stock of pollution to decrease; see the right-hand picture in Fig. 3. In consequence, the cooperative and incentive equilibrium state trajectories coincide in this initial period of time. After this period, the incentive equilibrium control path rapidly adjusts to its stationary level, driving the stock of pollution near the incentive equilibrium steady state \(x_\mathrm{{ss}}^{*}\). We remark that \(|x_\mathrm{{ss}}^{*}-x_\mathrm{{ss}}^\mathrm{{c}}|\approx 2\varepsilon \), so that the difference can be made as small as needed by choosing \(\varepsilon \) appropriately. In the noncooperative game, the optimal emissions trajectory is positive and decreases towards its stationary level. As can be appreciated in Fig. 3, the stock of pollution increases steadily towards \(x_\mathrm{{ss}}^\mathrm{{N}}\), the steady state of the noncooperative game, which is much larger than both \(x_\mathrm{{ss}}^\mathrm{{c}}\) and \(x_\mathrm{{ss}}^{*}\).

Fig. 3

Optimal control (left) and state (right) trajectories. Initial condition \(x_0=0.25\). Discontinuous (purple) line cooperative game. Dotted (red) line noncooperative game, linear MPNE. Solid (black) line stationary incentive equilibrium. Dash-dotted (blue) line path-dependent incentive equilibrium

5 Credibility

We start this section with a definition of credible incentive that extends the definition given in [21–23, 29], which applies only to linear incentive strategies. As indicated in the introduction, the credibility of the incentive strategies means, essentially, that if Player \(j\) deviates unilaterally from his incentive equilibrium action, \(u_j=u_j^*(x)\), then it is more beneficial for Player \(i\) to follow the incentive strategy than to stick to \(u_i=u_i^*(x)\). This credibility property establishes that there is no temptation for unilateral deviation from the pair \(u_j=u_j^*(x)\), \(j=1,2\).

Definition 5.1

A pair of incentive equilibrium strategies \(\psi _1(v_1,v_2,x)\), \(\psi _2(v_1,v_2,x)\), with \(\psi _i\in \Gamma _i\), \(i=1,2\), is credible in a set \(U_1\times U_2\subset \fancyscript{U}_1\times \fancyscript{U}_2\) iff given \(u_1\in U_1\) and \(u_2\in U_2\) there exist \(\hat{u}_1\in \fancyscript{U}_1\) and \(\check{u}_2\in \fancyscript{U}_2\) such that

$$\begin{aligned} W_1(\psi _1(\hat{u}_1,u_2,\hat{x}),u_2)\ge W_1(u_1^*,u_2),\quad W_2(u_1,\psi _2(u_1,\check{u}_2,\check{x}))\ge W_2(u_1,u_2^*), \end{aligned}$$
(15)

where \(\hat{x}\) and \(\check{x}\) satisfy \(\dot{\hat{x}}=\beta (\psi _1(\hat{u}_1,u_2,\hat{x})+u_2)-\alpha \hat{x}\) and \(\dot{\check{x}}=\beta (u_1+\psi _2(u_1,\check{u}_2,\check{x}))-\alpha \check{x}\), respectively, with \(\hat{x}(0)=\check{x}(0)=x_{0}\).

A sufficient, although obviously not necessary, condition for credibility is that for all \(u_1\in U_1\) and \(u_2\in U_2\)

$$\begin{aligned} W_1(\psi _1(u_1^*,u_2,\hat{x}),u_2)\ge W_1(u_1^*,u_2);\quad W_2(u_1,\psi _2(u_1,u_2^*,\check{x}))\ge W_2(u_1,u_2^*), \end{aligned}$$

where \(\hat{x}\) and \(\check{x}\) are defined as in Definition 5.1, with \(\hat{u}_1=u_1^*\) and \(\check{u}_2=u_2^*\).

Remark 5.1

Note that, in the case of linear incentive strategies, \(\psi _j\) is given by (6). Then, Definition 5.1 reduces to \(W_i(\psi _i(u_j),u_j)\ge W_i(u_i^*,u_j)\), \(\forall u_j\in U_j\), with \(u_i^*=u_i^\mathrm{{c}}\), for \(i=1,2\), which is the credibility definition proposed in the literature (see, for example, [21–23, 29]).

In what follows, we restrict ourselves to the stationary incentive defined in (7). Definition 5.1 requires conditions (15) to be checked in some subset of admissible controls \(U_1\times U_2\subset \fancyscript{U}_1\times \fancyscript{U}_2\). To analyse the credibility properties of the nonlinear incentive strategies, we assume that the set of possible deviations is restricted to \(U_1=U_2=\{u(x)=ax+b,\ a\le 0,\ b\ge 0\}\). This is enough to illustrate the credibility properties of the proposed incentive strategies and allows a comparison with the linear incentive strategies studied in [22].

In Fig. 4 we have represented the region of credibility for deviations of Player 2 in the symmetric scenario studied in Sect. 4.3. We have checked condition (15) for deviations of Player 2 in the set \(U_2\) with \(-2\le a\le 0\), \(0\le b\le 1\). This set contains both the cooperative control \(u_i^\mathrm{{c}}\) and the linear MPNE \(u_i^\mathrm{{N}}\). We have represented in light grey the region \(C\) of parameter values such that, if \(u_2(x)=ax+b\) with \((a,b)\in C\),

$$\begin{aligned} W_1(\psi _1(u_1^*,u_2,\hat{x}),u_2)\ge W_1(u_1^*,u_2). \end{aligned}$$

That is, \(C\) is the region of credibility for Player 1 against deviations of Player 2. The regions \(D\), represented in a darker grey in Fig. 4, correspond to the set of parameter values where the following two conditions are satisfied simultaneously:

$$\begin{aligned} W_1(\psi _1(u_1^*,u_2,\hat{x}),u_2)< W_1(u_1^*,u_2)\quad \text {and} \quad W_2(u_1^*,u_2)<W_2(u_1^*,u_2^*). \end{aligned}$$

That is, \(D\) is a region where the incentive equilibrium strategy is not credible according to Definition 5.1, but where it would be irrational for Player 2 to implement a strategy that provides him a smaller payoff than that associated with the incentive strategy. Finally, the darkest region \(E\) is the non-credible region. For a deviation of Player 2 in \(E\), we have

$$\begin{aligned} W_1(\psi _1(u_1^*,u_2,\hat{x}),u_2)< W_1(u_1^*,u_2)\quad \hbox {and} \quad W_2(u_1^*,u_2)\ge W_2(u_1^*,u_2^*). \end{aligned}$$

We can clearly see in Fig. 4 that, apart from the very small region \(E\), the incentive strategy is credible against affine deviations from the incentive equilibrium that provide a greater payoff to the deviating player. Furthermore, the incentive strategy also has the following interesting property:

$$\begin{aligned} W_2(\psi _1(u_1^*,u_2,\hat{x}),u_2)<W_2(u_1^*,u_2^*),\quad \forall u_2\in U_2. \end{aligned}$$

This last inequality means that the incentive strategy also works as a trigger strategy (see, for example, [14–16]) against possible deviations of Player 2 in the set \(U_2\).
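The three-way partition of the deviation set into the regions \(C\), \(D\) and \(E\) can be encoded directly. The sketch below only formalizes the classification logic; the payoff values \(W_i\), which require the full model and the numerical method of the Appendix, are taken as given inputs.

```python
def classify_deviation(W1_incentive, W1_stick, W2_deviate, W2_star):
    """Classify an affine deviation of Player 2 into the regions of Fig. 4:
    C -- credible: Player 1 prefers following the incentive to sticking to u1*;
    D -- not credible, but the deviation pays Player 2 less than the incentive
         equilibrium, so implementing it would be irrational;
    E -- not credible and the deviation is (weakly) profitable for Player 2."""
    if W1_incentive >= W1_stick:
        return "C"
    if W2_deviate < W2_star:
        return "D"
    return "E"
```

Sweeping this classifier over a grid of coefficients \((a,b)\) of the deviation \(u_2(x)=ax+b\), with the four payoffs computed numerically at each grid point, produces pictures such as Fig. 4.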

Fig. 4

Region of credibility. Symmetric case. Parameter values as in Sect. 4.3

In Fig. 5, we can observe in more detail the mechanism through which the incentive defined by (7) neutralizes the possible advantage of a deviating player. In Fig. 5 we have represented the control (left) and state (right) trajectories when one of the players, say Player 2, plays his part of the linear MPNE strategy \(u_2=u_2^\mathrm{{N}}\), instead of the incentive equilibrium, whereas the other player, Player 1, plays \(u_1=\psi _1(u_1^*,u_2^\mathrm{{N}},\tilde{x})\), where \(\tilde{x}\) is the solution of Eq. (2) with this choice of \(u_1\) and \(u_2\). We have used a dash-dotted (blue) line to represent the state trajectory in Fig. 5 (right). The control trajectories for Player 1 and Player 2 are represented, on the left part of Fig. 5, using dash-dotted and dashed (blue) lines, respectively. For reference, we have also depicted the trajectories when both players apply the incentive equilibrium \((u_1^*,u_2^*)\) (solid (black) line), the trajectories when both players play the Nash equilibrium \((u_1^\mathrm{{N}},u_2^\mathrm{{N}})\) (dotted (red) line), and the trajectories when both players apply the cooperative solution \((u_1^\mathrm{{c}},u_2^\mathrm{{c}})\) (discontinuous (purple) line). We can observe that when the stock of pollution (state variable) lies below the threshold \(x_\mathrm{{ss}}^\mathrm{{c}}+\varepsilon \) which defines the incentive (see formulas (7) and (8)), Player 1 emits pollutants at rate \(u_1=u_1^*\), regardless of the deviation of Player 2. However, when, as a consequence of the strategy \(u_2=u_2^\mathrm{{N}}\) implemented by Player 2, the stock of pollution moves away from the allowed threshold, Player 1 changes, smoothly but rapidly, to the rate of emission given by his part of the linear MPNE strategy, \(u_1=u_1^\mathrm{{N}}\), neutralizing the advantage obtained by Player 2 by deviating from the incentive equilibrium. In this way, the final outcome of the game is that both players play à la Nash and, of course, the stock of pollution eventually approaches \(x_\mathrm{{ss}}^\mathrm{{N}}\).
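This switching mechanism can be sketched as a smooth state-dependent blend between \(u_1^*\) and \(u_1^\mathrm{{N}}\). The logistic cutoff below is a hypothetical stand-in for the function \(\phi \) in (8), which is not reproduced here; it is chosen only to illustrate the smooth but rapid transition around the threshold \(x_\mathrm{{ss}}^\mathrm{{c}}+\varepsilon \).

```python
import math

def incentive_response(x, u_star, u_nash, threshold, steepness=200.0):
    """Blend Player 1's emission between u* (well below the threshold) and the
    MPNE rate u^N (well above it); the logistic weight is an assumed stand-in
    for the cutoff phi in (8)."""
    w = 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))  # ~0 below, ~1 above
    return (1.0 - w) * u_star + w * u_nash

# Illustrative values: far below the threshold Player 1 keeps u1 = u1*;
# far above it he switches to u1 = u1^N.
low = incentive_response(0.00, 0.1, 0.3, threshold=0.125)
high = incentive_response(0.30, 0.1, 0.3, threshold=0.125)
```

Plugging such a response into the state equation reproduces the qualitative behaviour of Fig. 5: as long as the deviation keeps the stock near the threshold nothing changes, and once the stock escapes, the incentive reverts to Nash play.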

Fig. 5

Optimal control (left) and state (right) trajectories. Initial condition \(x_0=0\). Discontinuous (purple) line cooperative game. Dotted (red) line noncooperative game linear MPNE. Solid (black) line stationary incentive equilibrium. Dash-dotted (blue) line path-dependent incentive equilibrium

To finish this section, we compare the nonlinear incentive proposed in the present paper with the linear incentive used for the same model in [22]. We repeat the experiment using the same parameter values as in [22]: \(A_1=1.01\), \(A_2=1.0\), \(\varphi _1=\varphi _2=1\), \(\alpha =0.2\), \(\beta =1\), \(\rho =0.1\) and \(x_0=0.05\). We have represented in Fig. 6 the region of credibility for Player 1 against deviations of Player 2 (left) and the region of credibility for Player 2 against deviations of Player 1 (right), using the same colour codes as in Fig. 4. We can see that, in both cases, the region of credibility attained with the nonlinear incentive contains, and is considerably larger than, that reported for the linear incentive in [22]. In fact, \(-0.85\le a_1 \le -0.7\), \(0.15\le b_1\le 0.2\) for deviations of Player 1 of the form \(u_1=a_1x+b_1\) (compare with the right part of Fig. 6) and \(-0.9\le a_2 \le -0.7\), \(0.15\le b_2\le 0.2\) for deviations of Player 2 of the form \(u_2=a_2x+b_2\) (compare with the left part of Fig. 6). However, in [22] the final outcome was exactly the cooperative Pareto efficient solution, whereas in our case this goal is, by construction, only approximately attained.

Fig. 6

Credibility region for Player 1 (left) and Player 2 (right). Nonsymmetric case. \(A_1=1.01\), \(A_2=1.00\). Regions \(C\), \(D\) and \(E\) are defined as in Fig. 4

This comparison allows us to answer the main research question of this paper in the affirmative: the introduction of flexibility can be a useful device to facilitate the sustainability of cooperation over time.

6 Conclusions

The previous literature on dynamic games has shown that incentive strategies can be an interesting device for sustaining a cooperative agreement over time, provided they are credible. In this paper, we have proposed a class of nonlinear incentive strategies that lead to an optimal state path remaining close to the cooperative state trajectory. These nonlinear incentive strategies, defined as nonlinear functions of the control variables of both players and the current value of the state variable, act as trigger strategies that discourage players from deviating from the incentive equilibrium and, ultimately, from the cooperative solution. Furthermore, these nonlinear incentive strategies enlarge the credibility region. This is an interesting feature that, we think, merits more detailed exploration. The characterization of credible incentive strategies defined by means of more general nonlinear functions is another future research project that could be of interest if the aim is to extend the results of this paper to more general economic or environmental models. Finally, appropriate numerical methods are needed to design and analyse more realistic differential game models. This will be the subject of future research.