1 Introduction

Theoretical research and applications in dynamic games are proceeding apace in many areas, including mathematics, economics, operations research and the social sciences. Dynamic games originated with continuous-time games (known as differential games), developed by Rufus Isaacs in the late 1940s (his work was published in Isaacs [9]). Pontryagin [13] solved differential games in open-loop form with his maximum principle. Krasovskii [11] extended the development to extremal controls. The work of Bellman [3] on dynamic programming facilitated the development of discrete-time dynamic games. Basar and Olsder [2] and Yeung and Petrosyan [18,19,20,21] presented theories and applications of dynamic games and cooperative dynamic games. So far, cooperative dynamic games have been confined to paradigms in which the players’ controls do not exhibit lags in the players’ payoffs. Bellman [4] considered dynamic programming in the case where the underlying dynamical equations are differential-difference equations with time lags. Kramer [10] and Arthur [1] considered the control of linear processes with distributed control lags affecting the state dynamics. Wang [15] studied time-optimal controls with lags appearing in the state dynamics. Burdet and Sethi [6] and Hartl and Sethi [7] derived maximum principles for a class of systems with control lags affecting the dynamics of the state variable. Brandt-Pollmann et al. [5] and Huschto et al. [8] provided solutions to optimal control problems with control delays in the state dynamics. Application studies involving control lags that affect the state dynamics can be found in Sethi and McGuire [14] and Winkler et al. [16].

Lags in controls are not uncommon: they appear in many real-life situations in which a control executed in stage \( k \) continues to be effective in subsequent stages. There are two major reasons for the appearance of control lags. First, lagged controls arise from the controls’ lasting properties. Examples of this type include durable goods, capital assets, money stocks released through quantitative easing, toxic waste dumping, release of radiation, investment expenditures, emission of hydrofluorocarbons and deforestation. The second major reason is binding institutional arrangements, for instance laws or regulations enacted to be effective over a certain period of time, binding contracts, and the rules and actions of coalitions like those in the EU. In the presence of control lags, the decision maker has to take into consideration the lagged controls not only in the current stage but also in future stages. Significant modifications have to be made in standard control theory to develop novel dynamic optimization techniques that accommodate control lags.

This paper extends the existing paradigm in cooperative dynamic games by incorporating the lagged effects of controls on the players’ payoffs in subsequent stages. The examples listed above are instances where the lagged controls affect the payoffs. Lagged controls that bring about adverse effects on the players’ payoffs often make the negative impacts and externalities more prolonged and significant in a non-cooperative equilibrium. Cooperation offers the best promise of alleviating the problem and providing a group optimal and individually rational solution. We first consider a dynamic optimization problem with control lags affecting the decision-maker’s payoff. A novel dynamic optimization theorem for solving control problems with control lags affecting the payoffs is developed. Then, we develop a class of cooperative dynamic games with control lags affecting the players’ future payoffs. Subgame consistent solutions are derived to ensure sustainable cooperation. In particular, subgame consistency guarantees that the optimality principle agreed upon at the outset remains effective throughout the game. Hence, there is no incentive for any player to deviate from the cooperation scheme. Subgame consistent cooperative solutions in dynamic games without control lags can be found in Yeung [17] and Yeung and Petrosyan [18,19,20,21]. In this paper, a subgame consistent payoff distribution procedure for cooperative games with lagged controls is presented. A novel feature is that the payoff distribution procedure depends not only on the state but also on the lagged controls executed in previous stages. This is the first time that cooperative dynamic games with control lags are studied.

The paper is organized as follows. Section 2 presents dynamic optimization techniques for solving control problems with control lags affecting the decision-maker’s payoff. Section 3 formulates a class of dynamic games with control lags affecting the players’ payoffs. In Sect. 4, dynamic cooperation under control lags is analyzed. A subgame consistent cooperative solution is presented and an imputation distribution procedure leading to a subgame consistent outcome is derived. Section 5 provides an application in cooperative environmental management with lagged controls. Concluding remarks are given in Sect. 6.

2 Dynamic Optimization Under Control Lags

In this section, we develop dynamic optimization techniques for solving one-player optimization problems with control lags affecting the decision-maker’s payoff. This technique is crucial in deriving the solution to the game problem of this paper. Consider a \( T \)-stage dynamic optimization problem in which there exist controls with lags. We use \( \mu_{k}^{(0)} \in U^{(0)} \subset R^{{m^{(0)} }} \) to denote control strategies executed in stage \( k \) that involve no (zero) lags. We use \( \mu_{k}^{(\tau )} \in U^{(\tau )} \subset R^{{m^{(\tau )} }} \) to denote control strategies executed in stage \( k \) that involve lags in the subsequent \( \tau \) stages. That means \( \mu_{k}^{(\tau )} \) is effective in stages \( k \), \( k + 1 \) and up to stage \( k + \tau \). For clarity of exposition and without much loss in generality, we consider the case where there exists one set of controls without lags and one set of lagged controls. The lagged control strategies are \( \mu_{k}^{(T)} \in U^{(T)} \subset R^{{m^{(T)} }} \) which have permanent lag effects until the end of the planning horizon. The controls with permanent lag effects can be conveniently modified to become controls with lags in the subsequent \( \tau \) stages, where \( \tau \in \{ 1,2, \ldots ,T - 1\} \). The confinement to one set of lagged controls avoids non-essential notational complexity in explaining the analysis of the paper.

The payoff received at stage \( k \) can be expressed as \( g_{k}^{{}} (x_{k}^{{}} ,\mu_{k}^{(0)} ,\mu_{k}^{(T)} ;\;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \), where \( x_{k}^{{}} \in X \subset R^{m} \) is the state at stage \( k \). The dynamic optimization problem becomes the maximization of

$$ \sum\limits_{k = 1}^{T} g_{k} \left( x_{k} ,\mu_{k}^{(0)} ,\mu_{k}^{(T)} ;\;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} \right)\delta^{k - 1} + q_{T + 1} (x_{T + 1} )\,\delta^{T} , $$
(2.1)

subject to the state dynamics

$$ x_{k + 1}^{{}} = f_{k}^{{}} \left( {x_{k}^{{}} ,\mu_{k}^{(0)} ,\mu_{k}^{(T)} } \right),\quad x_{1}^{{}} = x_{1}^{0} , $$
(2.2)

where \( \delta \) is the discount factor, \( q_{T + 1} (x_{T + 1} ) \) is the terminal payoff, and the controls \( \mu_{0}^{(T)} \), \( \mu_{ - 1}^{(T)} , \ldots ,\mu_{ - (T - 1)}^{(T)} \) are zero because the problem starts at stage 1.

A novel theorem characterizing the optimal control strategies in the dynamic optimization problem (2.1)–(2.2) is presented below.

Theorem 2.1

The optimal strategies\( \{ \mu_{k}^{(0)*} ,\mu_{k}^{(T)*} ,\;k \in \{ 1,2, \ldots ,T\} \} \)for the dynamic optimization problem (2.1)–(2.2) can be obtained by solving the following system of recursive equations:

$$ W\left( {T + 1,x;\mu_{T}^{(T)} ,\mu_{T - 1}^{(T)} , \ldots ,\mu_{1}^{(T)} } \right) = q_{T + 1}^{{}} (x)\;\delta^{T} , $$
(2.3)
$$ \begin{aligned} W\left( k,x;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} \right) & = \max_{\mu_{k}^{(0)} ,\mu_{k}^{(T)} } \Big\{ g_{k} \left( x,\mu_{k}^{(0)} ,\mu_{k}^{(T)} ;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} \right)\delta^{k - 1} \\ & \quad + W\left( k + 1,f_{k} \left( x,\mu_{k}^{(0)} ,\mu_{k}^{(T)} \right);\mu_{k}^{(T)} ,\mu_{k - 1}^{(T)} , \ldots ,\mu_{1}^{(T)} \right) \Big\} , \\ & \qquad \text{for } k \in \{ 1,2, \ldots ,T\} , \end{aligned} $$
(2.4)

where\( W(k,x;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \)is the maximal value of the payoffs

$$ \sum\limits_{t = k}^{T} g_{t} \left( x_{t} ,\mu_{t}^{(0)} ,\mu_{t}^{(T)} ;\mu_{t - 1}^{(T)} ,\mu_{t - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} \right)\delta^{t - 1} + q_{T + 1} \left( x_{T + 1} \right)\delta^{T} $$

for the problem starting at stage \( k \) with state \( x_{k} = x \) and previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \).

Proof

See “Appendix A”. □

Theorem 2.1 yields a new optimization technique for solving control problems in which lagged controls affect the payoffs of the decision maker. Note that both the current state \( x_{k} \) and the previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) appear as given in the stage \( k \) optimization problem. However, unlike the state variable \( x_{k} \), there are no transition equations governing the evolution of \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) from one stage to another. Using Theorem 2.1, one can obtain the optimal control strategies \( \{ \mu_{k}^{(0)*} ,\mu_{k}^{(T)*} \} \), for \( k \in \{ 1,2, \ldots ,T\} \), in an optimization problem involving control lags. In Bellman’s [3] standard dynamic programming technique, the controls executed in stage \( k \) affect the state \( x_{k + 1} \) in stage \( k + 1 \) through the dynamic equation. In Theorem 2.1, the lagged controls executed in stage \( k \), that is \( \mu_{k}^{(T)} \), affect the state \( x_{k + 1} \) in stage \( k + 1 \) through the dynamic Eq. (2.2) and also influence the payoff functions:

$$ g_{t}^{{}} \left( {x_{t}^{{}} ,\mu_{t}^{(0)} ,\mu_{t}^{(T)} ;\mu_{t - 1}^{(T)} ,\mu_{t - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} } \right),\quad {\text{for}}\;t \in \{ k,k + 1, \ldots ,T\} . $$

The major differences between the lagged-control optimization technique in Theorem 2.1 and the optimal control problems with delays cited in Sect. 1 include the following:

(i) The control lags in Theorem 2.1 appear in the payoffs of the decision maker and not in the state dynamics, while in the cited papers control lags appear in the state dynamics and not in the payoffs.

(ii) In Theorem 2.1, the previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) in stage \( k \) act like a vector of idiosyncratic state variables that cannot be changed but last for a finite number of stages. Although the lagged effects do not enter the state dynamics (2.2), the lagged controls produce a vector of state-like variables \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) in the value function of each stage \( k \in \{ 1,2, \ldots ,T\} \). This is a new feature in dynamic optimization, and it does not appear in previously developed solution techniques for delayed controls. A computational sketch of the recursion in Theorem 2.1 is given below.
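To make the recursion concrete, the following minimal sketch implements (2.3)–(2.4) by backward induction in Python. It is an illustration under assumed toy specifications: the horizon, the control grids `U0` and `UT`, and the functional forms of `f`, `g` and `q` are hypothetical placeholders, not objects from the model above. The value function is keyed by the stage, the state and the tuple of previously executed lagged controls, exactly the arguments of \( W \) in Theorem 2.1.

```python
from functools import lru_cache

# Toy specification (all values assumed for illustration only).
T = 4
delta = 0.9
U0 = (0.0, 0.5, 1.0)   # grid for controls without lags, mu_k^(0)
UT = (0.0, 0.5, 1.0)   # grid for controls with permanent lag effects, mu_k^(T)

def f(x, u0, uT):
    # State dynamics (2.2); a linear form is assumed for the sketch.
    return 0.8 * x + u0 + uT

def g(k, x, u0, uT, hist):
    # Stage payoff; 'hist' holds (mu_1^(T), ..., mu_{k-1}^(T)) in execution order.
    lag_effect = sum(0.3 ** (k - t) * mu for t, mu in enumerate(hist, start=1))
    return x - 0.5 * u0 ** 2 - uT ** 2 - 0.2 * lag_effect

def q(x):
    # Terminal payoff q_{T+1}.
    return 0.5 * x

@lru_cache(maxsize=None)
def W(k, x, hist):
    """Value function W(k, x; mu_{k-1}^(T), ..., mu_1^(T)) of Theorem 2.1."""
    if k == T + 1:
        return q(x) * delta ** T                     # boundary condition (2.3)
    best = float("-inf")
    for u0 in U0:                                    # recursion (2.4)
        for uT in UT:
            val = g(k, x, u0, uT, hist) * delta ** (k - 1) \
                  + W(k + 1, round(f(x, u0, uT), 6), hist + (uT,))
            best = max(best, val)
    return best

print(W(1, 1.0, ()))   # the problem starts at stage 1 with no lagged controls
```

The tuple `hist` is precisely the vector of state-like variables described in (ii): it has no transition equation of its own and simply accumulates the executed lagged controls, so the list of value-function arguments grows with the stage.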

3 Cooperative Dynamic Games with Control Lags

Consider a \( T \)-stage \( n \)-person nonzero-sum discrete-time cooperative dynamic game with state space \( X \subset R^{m} \) and initial state \( x_{1}^{0} \). Player \( i \) has control strategies \( \mu_{k}^{(0)i} \in U^{(0)i} \subset R^{m^{(0)} } \) which involve no lags and control strategies \( \mu_{k}^{(T)i} \in U^{(T)i} \subset R^{m^{(T)} } \) which have permanent lag effects until the end of the planning horizon. The single-stage payoff function of player \( i \) at stage \( k \) is

$$ g_{k}^{i} \left( {x_{k}^{{}} ,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} ;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} } \right)\;,{\text{ for}}\;i \in \{ 1,2, \ldots ,n\} \equiv N, $$

where \( \underline{\mu }_{k}^{(0)} = (\mu_{k}^{(0)1} ,\mu_{k}^{(0)2} , \ldots ,\mu_{k}^{(0)n} ) \) and \( \underline{\mu }_{k}^{(T)} = (\mu_{k}^{(T)1} ,\mu_{k}^{(T)2} , \ldots ,\mu_{k}^{(T)n} ) \), for \( k \in \{ 1,2, \ldots ,T\} \).

The payoff of player \( i \) is:

$$ \sum\limits_{k = 1}^{T} g_{k}^{i} \left( x_{k} ,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} ;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right)\delta^{k - 1} + q_{T + 1}^{i} \left( x_{T + 1} \right)\delta^{T} , $$
(3.1)

where \( q_{T + 1}^{i} (x_{T + 1}^{{}} ) \) is the terminal payoff.

The state dynamics is characterized by the difference equation:

$$ x_{k + 1}^{{}} = f_{k}^{{}} \left( {x_{k}^{{}} ,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} } \right),\quad x_{1}^{{}} = x_{1}^{0} , $$
(3.2)

for \( k \in \{ 1,2, \ldots ,T\} \).

3.1 Dynamic Cooperation and Group Optimality

Now consider the case when the players agree to cooperate and distribute the payoff among themselves according to a gain sharing optimality principle. Two essential properties that a cooperative scheme has to satisfy are group optimality and individual rationality. To satisfy group optimality, the players will maximize their joint payoff by solving the dynamic optimization problem which maximizes

$$ \sum\limits_{j = 1}^{n} \sum\limits_{k = 1}^{T} g_{k}^{j} \left( x_{k} ,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} ;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right)\delta^{k - 1} + \sum\limits_{j = 1}^{n} q_{T + 1}^{j} (x_{T + 1} )\delta^{T} , $$
(3.3)

subject to (3.2).

Note that the payoffs are not weighted because they are transferable. An optimal solution to the joint maximization problem (3.2)–(3.3) can be characterized by the theorem below.

Theorem 3.1

A set of optimal control strategies\( \{ \underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} , \)for\( k \in \{ 1,2, \ldots ,T\} \} \)of the dynamic optimization problem (3.2)–(3.3) can be obtained by solving the following system of recursive equations:

$$ W\left( {T + 1,x;\underline{\mu }_{T}^{(T)} ,\underline{\mu }_{T - 1}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} } \right) = \sum\limits_{j = 1}^{n} {q_{T + 1}^{j} (x)\;\delta^{T} } , $$
(3.4)
$$ \begin{aligned} & W\left( k,x;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right) = \max_{\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} } \Big\{ \sum\limits_{j = 1}^{n} g_{k}^{j} \left( x,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} ;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right)\delta^{k - 1} \\ & \quad + W\left( k + 1,f_{k} \left( x,\underline{\mu }_{k}^{(0)} ,\underline{\mu }_{k}^{(T)} \right);\underline{\mu }_{k}^{(T)} ,\underline{\mu }_{k - 1}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right) \Big\} , \\ & \qquad \text{for } k \in \{ 1,2, \ldots ,T\} , \end{aligned} $$
(3.5)

where\( W(k,x;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} ) \)represents the maximal value of the joint payoffs

$$ \sum\limits_{j = 1}^{n} \sum\limits_{t = k}^{T} g_{t}^{j} \left( x_{t} ,\underline{\mu }_{t}^{(0)} ,\underline{\mu }_{t}^{(T)} ;\underline{\mu }_{t - 1}^{(T)} ,\underline{\mu }_{t - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right)\delta^{t - 1} + \sum\limits_{j = 1}^{n} q_{T + 1}^{j} \left( x_{T + 1} \right)\delta^{T} , $$

for the control problem starting at stage\( k \)with state\( x_{k}^{{}} = x \)and previously executed controls\( (\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} ) \).

Proof

See “Appendix B”. □

Substituting the optimal controls \( \{ \underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} \} \), for \( k \in \{ 1,2, \ldots ,T\} \), into the state dynamics (3.2), one can obtain the dynamics of the optimal cooperative trajectory as:

$$ x_{k + 1} = f_{k} \left( x_{k} ,\underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} \right), $$
(3.6)

for \( k \in \{ 1,2, \ldots ,T\} \) and \( x_{1}^{{}} = x_{1}^{0} \).

We use \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T + 1} \) to denote the solution to (3.6) which yields the optimal cooperative state trajectory. In addition, we use \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) to denote the optimal cooperative controls with lags executed in stages preceding stage \( k \). The players agree on an optimality principle which will distribute the total cooperative payoff among themselves. Let

$$ \begin{aligned} & \xi \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \\ & = \left[ \xi^{1} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right),\xi^{2} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right), \right. \\ & \quad \left. \ldots ,\xi^{n} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \right], \end{aligned} $$
(3.7)

for \( k \in \{ 1,2, \ldots ,T\} \), denote the agreed-upon distribution of cooperative payoffs among the players along the cooperative trajectory \( \left\{ x_{k}^{*} \right\}_{k = 1}^{T} \), given the controls \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) executed in stages preceding stage \( k \).

To satisfy group optimality, the imputation vector has to satisfy

$$ W\left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) = \sum\limits_{j = 1}^{n} \xi^{j} \left( k,x_{k}^{*} ;\;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right), $$
(3.8)

for \( k \in \{ 1,2, \ldots ,T\} \).

This condition guarantees the maximal joint payoff is distributed to the players.

3.2 Individual Rationality

For individual rationality to be satisfied the payoffs received by the players under cooperation have to be no less than their non-cooperative payoffs along the cooperative state trajectory. The non-cooperative payoffs of the players in a Nash equilibrium of the dynamic game (3.1)–(3.2) can be characterized by the following theorem.

Theorem 3.2

A set of Nash equilibrium strategies\( \{ \mu_{k}^{(0)i**} ,\mu_{k}^{(T)i**} \} \), for\( k \in \{ 1,2, \ldots ,T\} \)and\( i \in N \), to the non-cooperative game (3.1)–(3.2) can be obtained by solving the following recursive equations:

$$ V_{{}}^{i} \left( {T + 1,x;\underline{\mu }_{T}^{(T)**} ,\underline{\mu }_{T - 1}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} } \right) = q_{T + 1}^{i} (x)\;\delta^{T} ; $$
(3.9)
$$ \begin{aligned} & V^{i} \left( k,x;\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} \right) \\ & = \max_{\mu_{k}^{(0)i} ,\mu_{k}^{(T)i} } \Big\{ g_{k}^{i} \left[ x,\mu_{k}^{(0)i} ,\underline{\mu }_{k}^{(0) \ne i**} ,\mu_{k}^{(T)i} ,\underline{\mu }_{k}^{(T) \ne i**} ;\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} \right]\delta^{k - 1} \\ & \quad + V^{i} \left[ k + 1,f_{k} \left( x,\mu_{k}^{(0)i} ,\underline{\mu }_{k}^{(0) \ne i**} ,\mu_{k}^{(T)i} ,\underline{\mu }_{k}^{(T) \ne i**} \right);\left( \mu_{k}^{(T)i} ,\underline{\mu }_{k}^{(T) \ne i**} \right),\underline{\mu }_{k - 1}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} \right] \Big\} , \\ & \qquad \text{for } k \in \{ 1,2, \ldots ,T\} \text{ and } i \in N, \end{aligned} $$
(3.10)

where\( \underline{\mu }_{k}^{(0) \ne i**} = (\mu_{k}^{(0)1**} ,\mu_{k}^{(0)2**} , \ldots ,\mu_{k}^{(0)i - 1**} ,\mu_{k}^{(0)i + 1**} , \ldots ,\mu_{k}^{(0)n**} ) \), \( \underline{\mu }_{k}^{(T) \ne i**} = (\mu_{k}^{(T)1**} ,\mu_{k}^{(T)2**} , \ldots ,\mu_{k}^{(T)i - 1**} ,\mu_{k}^{(T)i + 1**} , \ldots ,\mu_{k}^{(T)n**} ) \); and\( V_{{}}^{i} \left( {k,x;\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} } \right) \)is the maximal value of player\( i \)’s payoff

$$ \sum\limits_{t = k}^{T} {g_{t}^{i} \left[ {x_{t}^{{}} ,\mu_{t}^{(0)i} ,\underline{\mu }_{t}^{(0) \ne i**} ,\mu_{t}^{(T)i} ,\underline{\mu }_{t}^{(T) \ne i**} ;\underline{\mu }_{t - 1}^{(T)**} ,\underline{\mu }_{t - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} } \right]\delta^{t - 1} + q_{T + 1}^{i} (x_{T + 1}^{{}} )\delta^{T} } $$

for the non-cooperative game starting at stage\( k \)with state\( x_{k}^{{}} = x \). The\( n - 1 \)other players’ game equilibrium strategies are\( \underline{\mu }_{t}^{(0) \ne i**} \)and\( \underline{\mu }_{t}^{(T) \ne i**} \), and the already executed game equilibrium strategies are\( (\underline{\mu }_{t - 1}^{(T)**} ,\underline{\mu }_{t - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} ) \).

Proof

See “Appendix C”. □
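The recursive equations (3.9)–(3.10) can likewise be computed backward, with a Nash equilibrium of the stage game solved at every (stage, state, lag-history) node. The following sketch does this by enumeration over assumed finite control grids; the payoffs, dynamics and grids are hypothetical placeholders, and a pure-strategy equilibrium need not exist on an arbitrary grid, so the search is hedged with an explicit failure case.

```python
from functools import lru_cache
from itertools import product

# Toy specification (all values assumed for illustration only).
n, T, delta = 2, 3, 0.9
U0, UT = (0.0, 1.0), (0.0, 1.0)

def f(x, u0s, uTs):
    # Dynamics (3.2); an additive form is assumed for the sketch.
    return 0.7 * x + sum(u0s) + sum(uTs)

def g(i, k, x, u0s, uTs, hist):
    # Stage payoff of player i; 'hist' = (mu_1^(T), ..., mu_{k-1}^(T)) profiles.
    lag = sum(sum(profile) for profile in hist)
    return x + 2 * u0s[i] - u0s[i] ** 2 + uTs[i] - 0.3 * lag

def q(i, x):
    # Terminal payoff q_{T+1}^i.
    return 0.4 * x

@lru_cache(maxsize=None)
def node(k, x, hist):
    """Equilibrium controls and values (V^1, ..., V^n) at one node of (3.9)-(3.10)."""
    if k == T + 1:
        return None, tuple(q(i, x) * delta ** T for i in range(n))

    def cont(i, u0s, uTs):
        # Stage payoff plus continuation value, as in (3.10).
        v = node(k + 1, round(f(x, u0s, uTs), 6), hist + (uTs,))[1]
        return g(i, k, x, u0s, uTs, hist) * delta ** (k - 1) + v[i]

    for u0s, uTs in product(product(U0, repeat=n), product(UT, repeat=n)):
        # Mutual best-response check over each player's joint deviations.
        if all(cont(i, u0s, uTs) >= cont(i, u0s[:i] + (d0,) + u0s[i + 1:],
                                            uTs[:i] + (dT,) + uTs[i + 1:])
               for i in range(n) for d0 in U0 for dT in UT):
            return (u0s, uTs), tuple(cont(i, u0s, uTs) for i in range(n))
    raise ValueError("no pure-strategy equilibrium on this grid")

print(node(1, 1.0, ()))
```

As in Theorem 2.1, the lag history enters each node as a state-like argument; only the equilibrium-selection step distinguishes this computation from the single-decision-maker recursion.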

To uphold individual rationality the payoff that player \( i \) receives under cooperation along the cooperative trajectory \( \left\{ {x_{k}^{*} } \right\}_{k = 1}^{T + 1} \) must be greater than or equal to his non-cooperative payoff, that is

$$ \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \ge V^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right), $$
(3.11)

for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \).

4 Dynamic Cooperation Under Control Lags

To guarantee dynamical stability in a dynamic cooperation scheme, the solution has to satisfy the property of subgame consistency, so that the agreed-upon optimality principle remains effective in all stages of the game along the optimal cooperative state trajectory. For subgame consistency to be satisfied, the agreed-upon imputation \( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) has to be applied at all stages along the cooperative trajectory \( \left\{ x_{k}^{*} \right\}_{k = 1}^{T} \).

4.1 An Example of Optimality Principle

If the optimality principle specifies that the players share the total cooperative payoff proportional to their non-cooperative payoffs, then along the optimal cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \) the imputation to player \( i \) becomes

$$ \begin{aligned} & \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \\ & = \frac{V^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right)}{\sum\nolimits_{j = 1}^{n} V^{j} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right)}W\left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right), \end{aligned} $$
(4.1)

for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \). □
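Computationally, (4.1) is a one-line proportional split. A minimal sketch, assuming the stage-\( k \) non-cooperative values \( V^{i} \) and joint value \( W \) are already available as numbers (the figures below are hypothetical):

```python
def proportional_imputation(V_values, W_value):
    # Split the joint payoff W proportionally to the non-cooperative
    # payoffs V^1, ..., V^n, as in (4.1).
    total = sum(V_values)
    return [v / total * W_value for v in V_values]

# Hypothetical stage values: V^1 = 12, V^2 = 8, joint W = 26.
print(proportional_imputation([12.0, 8.0], 26.0))   # [15.6, 10.4]
```

Note that the shares sum to \( W \) (group optimality, (3.8)) and, whenever \( W \ge \sum\nolimits_{j} V^{j} \), each share weakly exceeds the corresponding \( V^{i} \) (individual rationality, (3.11)).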

Crucial for the outcome of cooperation is the formulation of a payment mechanism so that the agreed-upon imputation \( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) can be realized. Following Petrosyan and Danilov [12] and Yeung and Petrosyan [18], we first formulate an Imputation Distribution Procedure (IDP) to guarantee that the agreed-upon imputations in (4.1) are allotted to the players.

Let \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) denote the payment that player \( i \) receives at stage \( k \) under the cooperative agreement along the cooperative trajectory \( \left\{ x_{k}^{*} \right\}_{k = 1}^{T} \), given that the optimal control strategies with lags executed in the preceding \( k - 1 \) stages are \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \). The payment scheme involving \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) constitutes an IDP in the sense that the payoff to player \( i \) over the stages from \( k \) to \( T + 1 \) satisfies the condition:

$$ \begin{aligned} \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) & = B_{k}^{i} \left( x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right)\delta^{k - 1} \\ & \quad + \sum\limits_{\zeta = k + 1}^{T} B_{\zeta }^{i} \left( x_{\zeta }^{*} ;\underline{\mu }_{\zeta - 1}^{(T)*} ,\underline{\mu }_{\zeta - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right)\delta^{\zeta - 1} + q_{T + 1}^{i} \left( x_{T + 1}^{*} \right)\delta^{T} , \end{aligned} $$
(4.2)

for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \).

A theorem providing a formula for \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), for \( k \in \{ 1,2, \ldots ,T\} \) and \( i \in N \), which satisfies (4.2) is provided below.

Theorem 4.1

The agreed-upon imputation\( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), for\( k \in \{ 1,2, \ldots ,T\} \)along the cooperative trajectory\( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \), can be realized by a payment

$$ \begin{aligned} B_{k}^{i} \left( x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) & = \delta^{ - (k - 1)} \left[ \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \right. \\ & \quad \left. - \,\xi^{i} \left( k + 1,f_{k} \left( x_{k}^{*} ,\underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} \right);\underline{\mu }_{k}^{(T)*} ,\underline{\mu }_{k - 1}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} \right) \right] \end{aligned} $$
(4.3)

given to player\( i \in N \)at stage\( k \in \{ 1,2, \ldots ,T\} \).

Proof

See “Appendix D”. □

The payment scheme in Theorem 4.1 gives rise to the realization of the imputation guided by the agreed-upon optimality principle and thus constitutes a subgame consistent payment scheme. More specifically, the payment \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) allotted to player \( i \in N \) in stage \( k \in \{ 1,2, \ldots ,T\} \) establishes a cooperative plan that matches the agreed-upon imputation to every player. A novel feature of the subgame consistent imputation distribution procedure in Theorem 4.1 is that the payment scheme (4.3) at stage \( k \) is contingent upon the conventional state variable \( x_{k}^{*} \) and the previously executed controls \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), which act like a vector of idiosyncratic state variables.
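The following sketch illustrates the mechanics of Theorem 4.1 with hypothetical numbers: the imputation path `xi` of one player along the cooperative trajectory is assumed given, the payments are computed from (4.3), and the defining property (4.2) is then verified.

```python
delta, T = 0.9, 3

# Hypothetical imputations xi^i(k, x_k^*; ...) of one player at stages 1..T+1;
# the entry at stage T+1 stands for the discounted terminal payoff q^i delta^T.
xi = {1: 20.0, 2: 14.0, 3: 7.5, 4: 2.0}

# Payment formula (4.3): B_k^i = delta^{-(k-1)} [xi^i(k) - xi^i(k+1)].
B = {k: (xi[k] - xi[k + 1]) / delta ** (k - 1) for k in range(1, T + 1)}

# Condition (4.2): the discounted payments from stage k onward, plus the
# terminal term, recover the imputation at every stage k.
for k in range(1, T + 1):
    tail = sum(B[z] * delta ** (z - 1) for z in range(k, T + 1)) + xi[T + 1]
    assert abs(tail - xi[k]) < 1e-9
```

The assertion holds identically because (4.3) makes the sum in (4.2) telescope, whatever the imputation path happens to be.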

5 An Application in Environmental Management

In this section, we provide an application of cooperative dynamic games with control lags in environmental management. Consider a 10-stage dynamic environmental game with 2 nations. Each nation produces its own industrial output, which brings about two types of damage to the environment. The first type of damage involves the building up of existing pollution stocks, like atmospheric particulates. The second type involves toxic environmental effects which last for all stages of the planning horizon. We use \( \mu_{k}^{(10)i} \in U^{(10)i} \subset R \) to denote the amount of industrial output of nation \( i \) at stage \( k \), because its environmental impacts last for 10 stages. In sum, industrial production produces pollutants that add to the pollution stock and yields lasting toxic environmental damages. Each nation adopts its own pollution abatement policy to reduce pollutants in the environment. We use \( \mu_{k}^{(0)i} \in U^{(0)i} \subset R \) to denote the pollution abatement effort of nation \( i \) at stage \( k \). At the initial stage 1, the level of pollution is given as \( x_{1} = x_{1}^{0} \). The dynamics of pollution accumulation is governed by the difference equation:

$$ x_{k + 1}^{{}} = x_{k}^{{}} + \sum\limits_{j = 1}^{2} {a_{{}}^{j} \mu_{k}^{(10)j} } - \sum\limits_{j = 1}^{2} {b_{{}}^{j} \mu_{k}^{(0)j} } (x_{k}^{{}} )^{1/2} - \lambda \,x_{k}^{{}} ,\quad x_{1}^{{}} = x_{1}^{0} , $$
(5.1)

where \( a_{{}}^{j} \mu_{k}^{(10)j} \) is the amount of pollutants created by nation \( j \)’s stage \( k \) output, \( b_{{}}^{j} \mu_{k}^{(0)j} (x_{k}^{{}} )^{1/2} \) is the amount of pollutants removed by \( \mu_{k}^{(0)j} \) unit of abatement effort from nation \( j \), and \( \lambda \) is the natural rate of decay of the pollutants.

We follow the assumption in most environmental studies that the effect of abatement effort in stage \( k \) is confined to the pollutants removed in the same stage. Hence, abatement effort does not accumulate. The economic benefits of nation \( i \)’s industrial output produced in stage \( k \) are \( [\alpha_{k}^{i} \mu_{k}^{(10)i} - c_{k}^{i} (\mu_{k}^{(10)i} )^{2} ] \). The objective of the government of nation \( i \) is to maximize

$$ \begin{aligned} & \sum\limits_{k = 1}^{10} \left[ \alpha_{k}^{i} \mu_{k}^{(10)i} - c_{k}^{i} \left( \mu_{k}^{(10)i} \right)^{2} - \gamma_{k}^{i} \left( \mu_{k}^{(0)i} \right)^{2} - h_{k}^{i} x_{k} - \sum\limits_{t = 1}^{k} \varepsilon_{k}^{i(t)i} \mu_{t}^{(10)i} - \sum\limits_{t = 1}^{k} \omega_{k}^{i(t)j} \mu_{t}^{(10)j} \right]\delta^{k - 1} \\ & \quad + \left( Q_{11}^{i} x_{11} + \varpi^{i} \right)\delta^{10} ,\quad \text{for } i,j \in \{ 1,2\} \text{ and } j \ne i. \end{aligned} $$
(5.2)

The expression \( \varepsilon_{k}^{i(t)i} \mu_{t}^{(10)i} \) is the toxic impact of output \( \mu_{t}^{(10)i} \) to nation \( i \) itself in stage \( k \), and \( \omega_{k}^{j(t)i} \mu_{t}^{(10)i} \) is the toxic impact of nation \( i \)’s output \( \mu_{t}^{(10)i} \) to nation \( j \) in stage \( k \). The pollution abatement cost is \( \gamma_{k}^{i} (\mu_{k}^{(0)i} )^{2} \) for nation \( i \)’s abatement effort \( \mu_{k}^{(0)i} \). The damage of the pollutant stock to nation \( i \) at stage \( k \) is \( h_{k}^{i} x_{k}^{{}} \), where \( h_{k}^{i} \) is the average damage cost incurred by a unit of the pollutant in stage \( k \). The terminal valuation of the government’s objective at stage 11 is \( (Q_{11}^{i} x_{11}^{{}} + \varpi_{{}}^{i} ) \), which is related to the level of pollutants in the atmosphere. The term \( \varpi_{{}}^{i} \) reflects the anticipated present value of the stream of discounted economic benefits of nation \( i \)’s industrial output net of the pollution related costs after stage 11. The term \( Q_{11}^{i} x_{11}^{{}} \) is the linear approximation of the damage of the pollution stock to the nation.

To secure group optimality the participating nations seek to cooperate and maximize their joint payoff by solving the following dynamic optimization problem:

$$ \begin{aligned} & \max_{\substack{\mu_{\tau }^{(0)1} ,\mu_{\tau }^{(0)2} ,\mu_{\tau }^{(10)1} ,\mu_{\tau }^{(10)2} \\ \tau \in \{ 1,2, \ldots ,10\} }} \Bigg\{ \sum\limits_{k = 1}^{10} \sum\limits_{j = 1}^{2} \left[ \alpha_{k}^{j} \mu_{k}^{(10)j} - c_{k}^{j} \left( \mu_{k}^{(10)j} \right)^{2} - \gamma_{k}^{j} \left( \mu_{k}^{(0)j} \right)^{2} - h_{k}^{j} x_{k} \right. \\ & \quad \left. - \sum\limits_{t = 1}^{k} \varepsilon_{k}^{j(t)j} \mu_{t}^{(10)j} - \sum\limits_{t = 1}^{k} \omega_{k}^{j(t)\ell } \mu_{t}^{(10)\ell } \right]\delta^{k - 1} + \sum\limits_{j = 1}^{2} \left( Q_{11}^{j} x_{11} + \varpi^{j} \right)\delta^{10} \Bigg\} \end{aligned} $$
(5.3)

subject to (5.1), where \( \ell \in \{ 1,2\} \) and \( \ell \ne j \). Invoking Theorem 3.1, a set of optimal control strategies \( \{ \mu_{k}^{(0)i*} ,\mu_{k}^{(10)i*} \} \), for \( k \in \{ 1,2, \ldots ,10\} \) and \( i \in \{ 1,2\} \), for the dynamic optimization problem (5.1) and (5.3) can be obtained by solving the following system of recursive equations:

$$ W\left( 11,x;\underline{\mu }_{10}^{(10)} ,\underline{\mu }_{9}^{(10)} , \ldots ,\underline{\mu }_{1}^{(10)} \right) = \sum\limits_{j = 1}^{2} \left( Q_{11}^{j} x + \varpi^{j} \right)\delta^{10} , $$
(5.4)
$$ \begin{aligned} & W\left( k,x;\underline{\mu }_{k - 1}^{(10)} ,\underline{\mu }_{k - 2}^{(10)} , \ldots ,\underline{\mu }_{1}^{(10)} \right) = \max_{\mu_{k}^{(0)1} ,\mu_{k}^{(0)2} ,\mu_{k}^{(10)1} ,\mu_{k}^{(10)2} } \Bigg\{ \sum\limits_{j = 1}^{2} \left[ \alpha_{k}^{j} \mu_{k}^{(10)j} - c_{k}^{j} \left( \mu_{k}^{(10)j} \right)^{2} \right. \\ & \quad \left. - \,\gamma_{k}^{j} \left( \mu_{k}^{(0)j} \right)^{2} - h_{k}^{j} x - \sum\limits_{t = 1}^{k} \varepsilon_{k}^{j(t)j} \mu_{t}^{(10)j} - \sum\limits_{t = 1}^{k} \omega_{k}^{j(t)\ell } \mu_{t}^{(10)\ell } \right]\delta^{k - 1} \\ & \quad + W\left( k + 1,x + \sum\limits_{j = 1}^{2} a^{j} \mu_{k}^{(10)j} - \sum\limits_{j = 1}^{2} b^{j} \mu_{k}^{(0)j} (x)^{1/2} - \lambda x;\underline{\mu }_{k}^{(10)} ,\underline{\mu }_{k - 1}^{(10)} , \ldots ,\underline{\mu }_{1}^{(10)} \right) \Bigg\} , \\ & \qquad \text{for } k \in \{ 1,2, \ldots ,10\} . \end{aligned} $$
(5.5)

Performing the indicated maximization in (5.5) and solving the system (5.4)–(5.5) one can obtain the maximized joint payoff under cooperation as:

Proposition 5.1

System (5.4)–(5.5) admits a solution with the maximized joint payoff being:

$$ W\left( {k,x;\underline{\mu }_{k - 1}^{(10)} ,\underline{\mu }_{k - 2}^{(10)} , \ldots ,\underline{\mu }_{1}^{(10)} } \right) = (A_{k}^{{}} x + C_{k}^{{}} )\;\delta^{k - 1} ,\quad {\text{for}}\;k \in \{ 1,2, \ldots ,10\} , $$
(5.6)

with\( A_{11}^{{}} = \sum\nolimits_{j = 1}^{2} {Q_{11}^{j} } \), and\( C_{11}^{{}} = \sum\nolimits_{j = 1}^{2} {\varpi_{{}}^{j} } \);

$$ A_{k} = \sum\limits_{j = 1}^{2} \left[ - \gamma_{k}^{j} \left( \frac{\delta A_{k + 1} b^{j} }{2\gamma_{k}^{j} } \right)^{2} - h_{k}^{j} \right] + \left[ A_{k + 1} \left( \sum\limits_{j = 1}^{2} b^{j} \frac{\delta A_{k + 1} b^{j} }{2\gamma_{k}^{j} } + 1 - \lambda \right) \right]\delta , $$
(5.7)

for\( k \in \{ 1,2, \ldots ,10\} \); and\( C_{k}^{{}} \)being an expression involving the model parameters, for\( k \in \{ 1,2, \ldots ,10\} \).

Proof

See “Appendix E”. □

Using Proposition 5.1 and (5.4)–(5.5), the optimal cooperative controls can be obtained as:

$$ \mu_{k}^{(0)i*} = - A_{k + 1}^{{}} \frac{{b_{{}}^{i} (x_{k}^{{}} )^{1/2} \delta }}{{2\gamma_{k}^{i} }},\quad {\text{for}}\;k \in \{ 1,2, \ldots ,10\} ; $$
(5.8)
$$ \mu_{10}^{(10)i*} = \frac{{\alpha_{10}^{i} - \varepsilon_{10}^{i(10)i} - \omega_{10}^{\ell (10)i} + \delta A_{11}^{{}} a_{{}}^{i} }}{{2c_{10}^{i} }},\quad i,\ell \in \{ 1,2\} \;{\text{and}}\;\ell \ne i. $$
$$ \begin{aligned} \mu_{k}^{(10)i*} & = \frac{\alpha_{k}^{i} - \varepsilon_{k}^{i(k)i} - \omega_{k}^{\ell (k)i} + \delta A_{k + 1} a^{i} - \delta \sum\nolimits_{\tau = k + 1}^{10} \delta^{\tau - (k + 1)} \left( \varepsilon_{\tau }^{i(k)i} + \omega_{\tau }^{\ell (k)i} \right) }{2c_{k}^{i} }, \\ & \qquad i,\ell \in \{ 1,2\} ,\;\ell \ne i\;\text{and}\;k \in \{ 1,2, \ldots ,9\} . \end{aligned} $$
(5.9)

The cooperative abatement effort \( \mu_{k}^{(0)i*} \) in (5.8) is the level of pollution abatement at which the marginal abatement cost \( 2\gamma_{k}^{i} \mu_{k}^{(0)i*} \) equals the marginal social benefit \( - A_{k + 1} b^{i} (x_{k} )^{1/2} \delta \) from pollution abatement. The cooperative industrial output \( \mu_{k}^{(10)i*} \) in (5.9) yields the optimal output level. In particular, the marginal revenue of industrial output \( \alpha_{k}^{i} \) equals the total marginal private and social costs, which include (i) the marginal cost of production \( 2c_{k}^{i} \mu_{k}^{(10)i*} \), (ii) the marginal toxic impact of nation \( i \)’s output on nation \( i \) itself, \( \varepsilon_{k}^{i(k)i} \), (iii) the marginal toxic impact of nation \( i \)’s output on nation \( j \), that is \( \omega_{k}^{\ell (k)i} \), (iv) the marginal cost of the increment to the pollution stock, \( - \delta A_{k + 1} a^{i} \), and (v) the sum of the marginal toxic impacts on both nations from stage \( k + 1 \) to stage 10, that is \( \delta \sum\nolimits_{\tau = k + 1}^{10} \delta^{\tau - (k + 1)} (\varepsilon_{\tau }^{i(k)i} + \omega_{\tau }^{\ell (k)i} ) \).
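Rearranging (5.9) makes this marginal condition explicit; the following display is a restatement of (5.9), not a new result:

$$ \alpha_{k}^{i} = 2c_{k}^{i} \mu_{k}^{(10)i*} + \varepsilon_{k}^{i(k)i} + \omega_{k}^{\ell (k)i} - \delta A_{k + 1} a^{i} + \delta \sum\limits_{\tau = k + 1}^{10} \delta^{\tau - (k + 1)} \left( \varepsilon_{\tau }^{i(k)i} + \omega_{\tau }^{\ell (k)i} \right), $$

where \( - \delta A_{k + 1} a^{i} \ge 0 \) whenever \( A_{k + 1} \le 0 \), so an increase in any cost item on the right-hand side lowers the optimal output level.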

Worth noting is that, according to item (v), the marginal lagged toxic environmental effects of nation \( i \)’s output on future stages are included. The environmental implication of control lags is that lagged environmental impacts can last for a long time, and the total level of lagged impacts accumulates as outputs are produced stage after stage. The economic implication is that a levy on industrial production equal to the sum of items (ii) to (v) has to be imposed to achieve the socially optimal level of output. In particular, the amount in item (v) is the tax on the marginal toxic impacts in future stages. Finally, in the case where there are no control lags, item (v) disappears.

Substituting the optimal control strategies (5.8)–(5.9) into the state dynamics (5.1), we obtain the cooperative state trajectory

$$ x_{k + 1} = x_{k} + \sum\limits_{j = 1}^{2} a^{j} \mu_{k}^{(10)j*} + \sum\limits_{j = 1}^{2} A_{k + 1} \frac{\left( b^{j} \right)^{2} \delta }{2\gamma_{k}^{j} }x_{k} - \lambda x_{k} ,\quad x_{1} = x_{1}^{0} . $$
(5.10)

Equation (5.10) is a first-order linear difference equation which can be solved by standard techniques. We use \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{11} \) to denote the values of \( x_{k}^{{}} \) generated by (5.10).

Using (5.8)–(5.9), one can obtain the optimal cooperative strategies in stages 1 to 10.
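The cooperative solution can be computed numerically in a few lines. The sketch below uses hypothetical, uncalibrated parameter values (stage-invariant for simplicity): it runs the backward recursion (5.7) for \( A_{k} \), evaluates the optimal controls (5.8)–(5.9), and iterates the trajectory (5.10).

```python
import math

# Hypothetical parameter values (placeholders, not calibrated).
n, T, delta, lam = 2, 10, 0.95, 0.1
a = [1.0, 1.2]        # emissions per unit of output, a^j
b = [0.4, 0.5]        # abatement productivity, b^j
alpha = [10.0, 9.0]   # marginal revenue of output, alpha_k^i (stage-invariant)
c = [1.0, 1.1]        # output cost coefficients, c_k^i
gamma = [0.8, 0.9]    # abatement cost coefficients, gamma_k^i
h = [0.3, 0.25]       # per-unit damage of the pollution stock, h_k^i
eps = [0.05, 0.04]    # own toxic impact per unit of nation i's output
omega = [0.03, 0.02]  # cross toxic impact per unit of nation i's output
Q = [-1.0, -1.0]      # terminal marginal damage Q_11^i
x0 = 5.0

# Backward recursion (5.7) for the x-coefficient of the joint value function.
A = [0.0] * (T + 2)
A[T + 1] = sum(Q)
for k in range(T, 0, -1):
    A[k] = sum(-gamma[j] * (delta * A[k + 1] * b[j] / (2 * gamma[j])) ** 2 - h[j]
               for j in range(n)) \
           + A[k + 1] * (sum(b[j] * delta * A[k + 1] * b[j] / (2 * gamma[j])
                             for j in range(n)) + 1 - lam) * delta

def output(i, k):
    # Optimal cooperative output mu_k^(10)i* from (5.9); the tail sum is empty
    # at k = 10, which reproduces the stage-10 formula.
    tail = sum(delta ** (tau - (k + 1)) * (eps[i] + omega[i])
               for tau in range(k + 1, T + 1))
    return (alpha[i] - eps[i] - omega[i] + delta * A[k + 1] * a[i]
            - delta * tail) / (2 * c[i])

def abatement(i, k, xk):
    # Optimal cooperative abatement mu_k^(0)i* from (5.8).
    return -A[k + 1] * b[i] * math.sqrt(xk) * delta / (2 * gamma[i])

# Cooperative pollution trajectory (5.10).
x = [0.0] * (T + 2)
x[1] = x0
for k in range(1, T + 1):
    x[k + 1] = (x[k] + sum(a[j] * output(j, k) for j in range(n))
                + sum(A[k + 1] * b[j] ** 2 * delta / (2 * gamma[j])
                      for j in range(n)) * x[k] - lam * x[k])

print([round(v, 2) for v in x[1:]])                      # x_1*, ..., x_11*
print(round(output(0, 1), 2), round(abatement(0, 1, x[1]), 2))
```

With \( Q_{11}^{i} < 0 \), the recursion keeps \( A_{k} \) negative under these parameters, so the abatement in (5.8) is positive and the levy terms in (5.9) reduce output, as the discussion above indicates.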

To derive the non-cooperative equilibrium solution of the game (5.1)–(5.2), we invoke Theorem 3.2 and obtain the game equilibrium payoff of nation \( i \) as:

Proposition 5.2

The game equilibrium payoff of nation \( i \in \{ 1,2\} \) is:

$$ V_{{}}^{i} \left( {k,x;\underline{\mu }_{k - 1}^{(10)} ,\underline{\mu }_{k - 2}^{(10)} , \ldots ,\underline{\mu }_{1}^{(10)} } \right) = \left( {A_{k}^{i} x + C_{k}^{i} } \right)\delta^{k - 1} ,\quad {\text{for}}\;k \in \{ 1,2, \ldots ,10\} , $$
(5.11)

with \( A_{11}^{i} = Q_{11}^{i} \), and \( C_{11}^{i} = \varpi_{{}}^{i} \)

$$ \begin{aligned} A_{k}^{i} & = \left[ - \gamma_{k}^{i} \left( \frac{\delta A_{k + 1}^{i} b^{i} }{2\gamma_{k}^{i} } \right)^{2} - h_{k}^{i} \right] + \left[ A_{k + 1}^{i} \left( b^{i} \frac{\delta A_{k + 1}^{i} b^{i} }{2\gamma_{k}^{i} } + b^{j} \frac{\delta A_{k + 1}^{j} b^{j} }{2\gamma_{k}^{j} } + 1 - \lambda \right) \right]\delta , \\ & \qquad \text{for } k \in \{ 1,2, \ldots ,10\} \end{aligned} $$
(5.12)

and \( C_{k}^{i} \) being an expression involving the model parameters.

Proof

Follow the proof of Proposition 5.1. □

If the nations agree to share the total cooperative payoff proportional to their non-cooperative payoffs, then the payoff of player \( i \) along the cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{11} \) becomes

$$ \begin{aligned} & \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right) \\ & = \frac{V^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right)}{\sum\nolimits_{j = 1}^{2} V^{j} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right)}W\left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right) \\ & = \frac{A_{k}^{i} x_{k}^{*} + C_{k}^{i} }{\sum\nolimits_{j = 1}^{2} \left( A_{k}^{j} x_{k}^{*} + C_{k}^{j} \right)}\left( A_{k} x_{k}^{*} + C_{k} \right)\delta^{k - 1} , \end{aligned} $$
(5.13)

for \( i \in \{ 1,2\} \) and \( k \in \{ 1,2, \ldots ,10\} \).

Invoking Theorem 4.1, a payment

$$ \begin{aligned} B_{k}^{i} \left( x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right) & = \delta^{ - (k - 1)} \left[ \xi^{i} \left( k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right) \right. \\ & \quad \left. - \,\xi^{i} \left( k + 1,x_{k + 1}^{*} ;\underline{\mu }_{k}^{(10)*} ,\underline{\mu }_{k - 1}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} \right) \right] \end{aligned} $$
(5.14)

given to player \( i \) at stage \( k \in \{ 1,2, \ldots ,10\} \) along the cooperative trajectory \( \left\{ x_{k}^{*} \right\}_{k = 1}^{10} \), with preceding strategies \( (\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} ) \), leads to the realization of the imputation (5.13), and hence a subgame consistent solution results.
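Continuing the sketch above with the same hypothetical parameters, the non-cooperative coefficients \( A_{k}^{i} \) follow the coupled backward recursion (5.12), after which (5.13) and (5.14) deliver the imputations and the stage payments. The constants \( C_{k} \) and \( C_{k}^{i} \) are stage-invariant placeholders here, since their closed forms are relegated to the appendices.

```python
# Non-cooperative recursion (5.12); the two players' coefficients are coupled
# through the stage-(k+1) values, so both are updated together at each step.
Ai = [[0.0] * (T + 2) for _ in range(n)]
for i in range(n):
    Ai[i][T + 1] = Q[i]
for k in range(T, 0, -1):
    for i in range(n):
        j = 1 - i
        Ai[i][k] = (-gamma[i] * (delta * Ai[i][k + 1] * b[i] / (2 * gamma[i])) ** 2
                    - h[i]
                    + Ai[i][k + 1] * (b[i] * delta * Ai[i][k + 1] * b[i] / (2 * gamma[i])
                                      + b[j] * delta * Ai[j][k + 1] * b[j] / (2 * gamma[j])
                                      + 1 - lam) * delta)

# Placeholder constants standing in for C_k and C_k^i (assumed, illustrative);
# choosing C_joint = sum(C_ind) makes the stage-(T+1) split equal each
# nation's terminal payoff, since W(T+1) is the sum of the V^i(T+1).
C_joint, C_ind = 50.0, [28.0, 22.0]

def imputation(i, k, xk):
    # Proportional imputation (5.13) along the cooperative trajectory.
    V = [(Ai[j][k] * xk + C_ind[j]) * delta ** (k - 1) for j in range(n)]
    W = (A[k] * xk + C_joint) * delta ** (k - 1)
    return V[i] / sum(V) * W

def payment(i, k):
    # IDP payment (5.14) along the trajectory x computed above.
    return (imputation(i, k, x[k]) - imputation(i, k + 1, x[k + 1])) / delta ** (k - 1)

print([round(payment(0, k), 2) for k in range(1, T + 1)])
```

Summing `payment(i, k)` discounted over the stages, plus the terminal term, recovers `imputation(i, 1, x[1])`, mirroring the verification of (4.2) in Sect. 4.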

6 Concluding Remarks

This paper presents a new class of cooperative dynamic games in which control lags affect the payoffs of the players. The incorporation of control lags extends the application of cooperative dynamic games to a wider spectrum of real-life scenarios. A novel dynamic optimization theorem for solving problems with control lags is developed. This theorem can be applied to a wide range of practical dynamic optimization problems with control lags. For instance, in dynamic consumer utility maximization, the purchase of durable goods is a classic case of lagged control. Almost all kinds of investments generate returns in future stages and can be modeled as lagged controls. The production of goods that require a gestation period (like wine or plants) involves controls with lag effects. Housing decisions are optimization processes involving lagged controls. Insurance affects the payoff of the decision maker in subsequent stages, and multi-stage contracts generate binding effects on future payoffs. The optimization techniques developed in this paper play an important role in the derivation of solutions for such analyses. A new form of the feedback Nash equilibrium recursive equations is derived using the novel dynamic optimization theorem. A subgame consistent imputation distribution procedure contingent upon the current state \( x_{k}^{*} \) and the previously executed controls \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) is established.

This is the first time that cooperative dynamic games with lagged controls are formulated, and further research along this line is expected. An immediate extension of the game is to allow the terminal payoffs to depend on the previously executed lagged controls, as in \( q_{T + 1}^{i} \left[ x_{T + 1} ,\underline{\mu }_{T}^{(T)} ,\underline{\mu }_{T - 1}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} \right] \), for \( i \in N \). Further theoretical developments to be pursued include research on dynamic cooperation among players with asynchronous game horizons under control lags and the consideration of more complicated lag structures. In addition, applications of the analysis to cooperation schemes involving control lags, like Brexit, international climate agreements and nuclear war threats, would likely generate interesting new results.