Abstract
Controls with lags are control strategies with prolonged effects lasting for more than one stage of the game after the controls have been executed. Lags in controls that yield adverse effects often make the negative impacts more significant. Cooperation provides an effective means to alleviate the problem and to obtain an optimal solution. This paper extends the existing paradigm in cooperative dynamic games by allowing controls with lag effects on the players’ payoffs in subsequent stages. A novel dynamic optimization theorem with control lags is developed to derive the Pareto optimal cooperative controls. Subgame consistent solutions are derived to ensure sustainable cooperation. In particular, subgame consistency guarantees that the optimality principle agreed upon at the outset will remain effective throughout the game and, hence, no player has an incentive to deviate from the cooperation scheme. A procedure for imputation distribution is provided to formulate a dynamically stable cooperative scheme under control lags. An application in cooperative environmental management is presented. This is the first time that cooperative dynamic games with control lags are studied.
1 Introduction
Theoretical research and applications in dynamic games are proceeding apace in many areas including mathematics, economics, operations research and the social sciences. Dynamic games originated with continuous-time formulations (called differential games) developed by Rufus Isaacs in the late 1940s (whose work was published in Isaacs [9]). Pontryagin [13] derived open-loop solutions of differential games with his maximum principle. Krasovskii [11] extended the development to extremal controls. The work of Bellman [3] on dynamic programming facilitated the development of discrete-time dynamic games. Basar and Olsder [2] and Yeung and Petrosyan [18,19,20,21] presented theories and applications of dynamic games and cooperative dynamic games. So far, cooperative dynamic games have been confined to paradigms in which the players’ controls do not exhibit lags in the players’ payoffs. Bellman [4] considered dynamic programming in the case where the underlying dynamical equations are differential-difference equations with time lags. Kramer [10] and Arthur [1] considered control of linear processes with distributed control lags affecting the state dynamics. Wang [15] studied time-optimal controls with lags appearing in the state dynamics. Burdet and Sethi [6] and Hartl and Sethi [7] derived a maximum principle for a class of systems with control lags affecting the dynamics of the state variable. Brandt-Pollmann et al. [5] and Huschto et al. [8] provided solutions to optimal control problems with control delays in the state dynamics. Application studies involving control lags that affect the state dynamics can be found in Sethi and McGuire [14] and Winkler et al. [16].
Lags in controls are not uncommon: they appear in many real-life situations in which a control executed in stage \( k \) continues to be effective in subsequent stages. There are two major reasons for the appearance of control lags. First, lagged controls arise from the controls’ lasting properties—examples of this type include durable goods, capital assets, money stocks released through quantitative easing, toxic waste dumping, release of radiation, investment expenditures, emission of hydrofluorocarbons and deforestation. The second major reason is binding institutional arrangements: for instance, laws or regulations enacted to be effective over a certain period of time, binding contracts, and rules and actions of coalitions like those in the EU. In the presence of control lags, the decision maker has to take the lagged controls into consideration not only in the current stage but also in future stages. Significant modifications have to be made to standard control theory to develop novel dynamic optimization techniques that accommodate control lags.
This paper extends the existing paradigm in cooperative dynamic games by incorporating lagged effects of controls on the players’ payoffs in subsequent stages. The examples listed above are instances where the lagged controls affect the payoffs. Lagged controls which bring about adverse effects to the players’ payoffs often make the negative impacts and externalities more prolonged and significant in a non-cooperative equilibrium. Cooperation offers the best promise to alleviate the problem and provide a group optimal and individually rational solution. We first consider a dynamic optimization problem with control lags affecting the decision-maker’s payoff. A novel dynamic optimization theorem for solving control problems with control lags affecting the payoffs is developed. Then, we develop a class of cooperative dynamic games with control lags affecting the players’ future payoffs. Subgame consistent solutions are derived to ensure sustainable cooperation. In particular, subgame consistency guarantees that the optimality principle agreed upon at the outset will remain effective throughout the game. Hence, no player has an incentive to deviate from the cooperation scheme. Subgame consistent cooperative solutions in dynamic games without control lags can be found in Yeung [17] and Yeung and Petrosyan [18,19,20,21]. In this paper, a subgame consistent payoff distribution procedure for cooperative games with lagged controls is presented. A novel feature is that the payoff distribution procedure depends not only on the state but also on the lagged controls executed in previous stages. This is the first time that cooperative dynamic games with control lags are studied.
The paper is organized as follows. Section 2 presents dynamic optimization techniques for solving control problems with control lags affecting the decision-maker’s payoff. Section 3 formulates a class of dynamic games with control lags affecting the players’ payoffs. In Sect. 4, dynamic cooperation under control lags is analyzed. A subgame consistent cooperative solution is presented and an imputation distribution procedure leading to a subgame consistent outcome is derived. Section 5 provides an application in cooperative environmental management with lagged controls. Concluding remarks are given in Sect. 6.
2 Dynamic Optimization Under Control Lags
In this section, we develop dynamic optimization techniques for solving one-player optimization problems with control lags affecting the decision-maker’s payoff. This technique is crucial in deriving the solution to the game problem of this paper. Consider a \( T \)-stage dynamic optimization problem in which there exist controls with lags. We use \( \mu_{k}^{(0)} \in U^{(0)} \subset R^{{m^{(0)} }} \) to denote control strategies executed in stage \( k \) that involve no (zero) lags. We use \( \mu_{k}^{(\tau )} \in U^{(\tau )} \subset R^{{m^{(\tau )} }} \) to denote control strategies executed in stage \( k \) that involve lags in the subsequent \( \tau \) stages. That means \( \mu_{k}^{(\tau )} \) is effective in stages \( k \), \( k + 1 \) and up to stage \( k + \tau \). For clarity of exposition and without much loss in generality, we consider the case where there exists one set of controls without lags and one set of lagged controls. The lagged control strategies are \( \mu_{k}^{(T)} \in U^{(T)} \subset R^{{m^{(T)} }} \) which have permanent lag effects until the end of the planning horizon. The controls with permanent lag effects can be conveniently modified to become controls with lags in the subsequent \( \tau \) stages, where \( \tau \in \{ 1,2, \ldots ,T - 1\} \). The confinement to one set of lagged controls avoids non-essential notational complexity in explaining the analysis of the paper.
The payoff received at stage \( k \) can be expressed as \( g_{k}^{{}} (x_{k}^{{}} ,\mu_{k}^{(0)} ,\mu_{k}^{(T)} ;\;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \), where \( x_{k}^{{}} \in X \subset R^{m} \) is the state at stage \( k \). The dynamic optimization problem becomes the maximization of
subject to the state dynamics
where \( \delta \) is the discount factor, \( q_{T + 1}^{{}} (x_{T + 1}^{{}} ) \) is the terminal payoff, and the controls \( \mu_{0}^{(T)} \), \( \mu_{ - 1}^{(T)} , \ldots ,\mu_{ - (T - 1)}^{(T)} \) are zero because the problem starts at stage one.
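Consistent with the notation above, the optimization problem (2.1)–(2.2) can be written as follows (a reconstruction from the surrounding definitions, with the stage payoffs \( g_k \) and transition functions \( f_k \) as defined in the text):

```latex
\[
\max_{\{\mu_k^{(0)},\,\mu_k^{(T)}\}_{k=1}^{T}}
\sum_{k=1}^{T} g_k\!\left(x_k,\mu_k^{(0)},\mu_k^{(T)};\,
\mu_{k-1}^{(T)},\mu_{k-2}^{(T)},\ldots,\mu_{1}^{(T)}\right)\delta^{\,k-1}
+ q_{T+1}(x_{T+1})\,\delta^{\,T}
\]
subject to
\[
x_{k+1} = f_k\!\left(x_k,\mu_k^{(0)},\mu_k^{(T)}\right),
\qquad x_1 = x^{0},\quad k\in\{1,2,\ldots,T\}.
\]
```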
A novel theorem characterizing the optimal control strategies in the dynamic optimization problem (2.1)–(2.2) is presented below.
Theorem 2.1
The optimal strategies\( \{ \mu_{k}^{(0)*} ,\mu_{k}^{(T)*} ,\;k \in \{ 1,2, \ldots ,T\} \} \)for the dynamic optimization problem (2.1)–(2.2) can be obtained by solving the following system of recursive equations:
where\( W(k,x;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \)is the maximal value of the payoffs
for the problem starting at stage\( k \)with state\( x_{k}^{{}} = x \)and previously executed controls\( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \).
Proof
See “Appendix A”. □
Theorem 2.1 yields a new optimization technique which can be used to solve control problems with lagged controls affecting the payoffs of the decision maker. Note that both the current state \( x_{k}^{{}} \) and the previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) appear as given in the stage \( k \) optimization problem. However, unlike the state variable \( x_{k}^{{}} \), there are no transition equations governing the transition of \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) from one stage to another. Using Theorem 2.1, one can obtain the optimal control strategies \( \{ \mu_{k}^{(0)*} ,\mu_{k}^{(T)*} \} \), for \( k \in \{ 1,2, \ldots ,T\} \), in an optimization problem involving control lags. In Bellman’s [3] standard dynamic programming technique, the controls executed in stage \( k \) will affect the state \( x_{k + 1}^{{}} \) in stage \( k + 1 \) through the dynamic equation. In Theorem 2.1, the lagged controls executed in stage \( k \), that is \( \mu_{k}^{(T)} \), will affect the state \( x_{k + 1}^{{}} \) in stage \( k + 1 \) through the dynamic Eq. (2.2) and influence the payoff functions \( g_{t} \) for \( t \in \{ k,k + 1, \ldots ,T\} \).
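To make the recursion concrete, the following sketch solves a small discrete instance by the backward recursion of Theorem 2.1 and checks it against exhaustive enumeration; all functional forms, horizons and control grids are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# A minimal numerical sketch of the recursion in Theorem 2.1 (illustrative
# assumptions throughout): a T-stage problem with a no-lag control u0 and a
# lagged control uT whose effect persists in the payoffs of every
# subsequent stage.

T = 3            # planning horizon
DELTA = 0.9      # discount factor
U0 = (0, 1)      # feasible controls without lags
UT = (0, 1)      # feasible controls with permanent lag effects
X1 = 0           # initial state

def transition(x, u0, uT):
    # state dynamics (2.2): only the current controls enter
    return x + u0 + uT

def stage_payoff(x, u0, uT, hist):
    # payoff depends on the state, the current controls and all previously
    # executed lagged controls hist = (uT_{k-1}, ..., uT_1)
    return x + 2 * u0 - u0 ** 2 + uT - 0.5 * sum(hist)

def terminal_payoff(x):
    return 0.5 * x

_memo = {}

def W(k, x, hist):
    """Maximal payoff from stage k on, given state x and the lagged
    controls already executed in stages 1..k-1 (the value function of
    Theorem 2.1, which carries the control history as state-like data)."""
    if k == T + 1:
        return terminal_payoff(x)
    key = (k, x, hist)
    if key not in _memo:
        _memo[key] = max(
            stage_payoff(x, u0, uT, hist)
            + DELTA * W(k + 1, transition(x, u0, uT), (uT,) + hist)
            for u0, uT in product(U0, UT)
        )
    return _memo[key]

def brute_force():
    """Enumerate every control path; the best total discounted payoff
    must coincide with W(1, X1, ())."""
    best = float("-inf")
    for path in product(product(U0, UT), repeat=T):
        x, hist, total = X1, (), 0.0
        for k, (u0, uT) in enumerate(path, start=1):
            total += DELTA ** (k - 1) * stage_payoff(x, u0, uT, hist)
            x, hist = transition(x, u0, uT), (uT,) + hist
        total += DELTA ** T * terminal_payoff(x)
        best = max(best, total)
    return best
```

The key design point mirrors the remark above: the memoized value function is indexed not only by the stage and state but also by the tuple of previously executed lagged controls, which has no transition equation of its own.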
The major differences between the lagged control optimization techniques in Theorem 2.1 and those in the optimal control problems with delays cited in Sect. 1 include
-
(i)
The control lags in Theorem 2.1 appear in the payoffs of the decision-maker and not in the state dynamics while in the cited papers control lags appear in the state dynamics and not in the payoffs.
-
(ii)
In Theorem 2.1, the previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) in stage \( k \) act like a vector of idiosyncratic state variables that cannot be changed but will last for some finite number of stages. Although the lagged effects do not enter into the state dynamics (2.2), the lagged controls produce a vector of state-like variables \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) in the value function of each stage \( k \in \{ 1,2, \ldots ,T\} \). This is a new feature in dynamic optimization which does not appear in any of the previously developed solution techniques for delayed controls.
3 Cooperative Dynamic Games with Control Lags
Consider a \( T \)-stage \( n \)-person nonzero-sum discrete-time cooperative dynamic game with state space \( X \subset R^{m} \) and initial state \( x_{{}}^{0} \). Player \( i \) has control strategies \( \mu_{k}^{(0)i} \in U_{{}}^{(0)i} \subset R_{{}}^{{m_{{}}^{(0)} }} \) which have no lags and control strategies \( \mu_{k}^{(T)i} \in U_{{}}^{(T)i} \subset R_{{}}^{{m_{{}}^{(T)} }} \) which have permanent lag effects until the end of the planning horizon. The single-stage payoff function of player \( i \) at stage \( k \) is
where \( \underline{\mu }_{k}^{(0)} = (\mu_{k}^{(0)1} ,\mu_{k}^{(0)2} , \ldots ,\mu_{k}^{(0)n} ) \) and \( \underline{\mu }_{k}^{(T)} = (\mu_{k}^{(T)1} ,\mu_{k}^{(T)2} , \ldots ,\mu_{k}^{(T)n} ) \), for \( k \in \{ 1,2, \ldots ,T\} \).
The payoff of player \( i \) is:
where \( q_{T + 1}^{i} (x_{T + 1}^{{}} ) \) is the terminal payoff.
The state dynamics is characterized by the difference equation:
for \( k \in \{ 1,2, \ldots ,T\} \).
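In this notation, the payoff (3.1) of player \( i \) and the state dynamics (3.2) take the following form (a reconstruction consistent with the surrounding definitions):

```latex
\[
\sum_{k=1}^{T} g_k^{\,i}\!\left(x_k,\underline{\mu}_k^{(0)},\underline{\mu}_k^{(T)};\,
\underline{\mu}_{k-1}^{(T)},\ldots,\underline{\mu}_1^{(T)}\right)\delta^{\,k-1}
+ q_{T+1}^{\,i}(x_{T+1})\,\delta^{\,T},
\]
\[
x_{k+1} = f_k\!\left(x_k,\underline{\mu}_k^{(0)},\underline{\mu}_k^{(T)}\right),
\qquad x_1 = x^{0},\quad k\in\{1,2,\ldots,T\}.
\]
```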
3.1 Dynamic Cooperation and Group Optimality
Now consider the case when the players agree to cooperate and distribute the payoff among themselves according to a gain sharing optimality principle. Two essential properties that a cooperative scheme has to satisfy are group optimality and individual rationality. To satisfy group optimality, the players will maximize their joint payoff by solving the dynamic optimization problem which maximizes
subject to (3.2).
Note that the payoffs are not weighted because they are transferable. An optimal solution to the joint maximization problem (3.2)–(3.3) can be characterized by the theorem below.
Theorem 3.1
A set of optimal control strategies\( \{ \underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} , \)for\( k \in \{ 1,2, \ldots ,T\} \} \)of the dynamic optimization problem (3.2)–(3.3) can be obtained by solving the following system of recursive equations:
where\( W(k,x;\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} ) \)represents the maximal value of the joint payoffs
for the control problem starting at stage\( k \)with state\( x_{k}^{{}} = x \)and previously executed controls\( (\underline{\mu }_{k - 1}^{(T)} ,\underline{\mu }_{k - 2}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} ) \).
Proof
See “Appendix B”. □
Substituting the optimal controls \( \{ \underline{\mu }_{k}^{(0)*} ,\underline{\mu }_{k}^{(T)*} \} \), for \( k \in \{ 1,2, \ldots ,T\} \), into the state dynamics (3.2), one can obtain the dynamics of the optimal cooperative trajectory as:
for \( k \in \{ 1,2, \ldots ,T\} \) and \( x_{1}^{{}} = x_{{}}^{0} \).
We use \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T + 1} \) to denote the solution to (3.6) which yields the optimal cooperative state trajectory. In addition, we use \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) to denote the optimal cooperative controls with lags executed in stages preceding stage \( k \). The players agree on an optimality principle which will distribute the total cooperative payoff among themselves. Let
for \( k \in \{ 1,2, \ldots ,T\} \), denote the agreed distribution of cooperative payoffs among the players along the cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \) given the controls \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) being executed in stages preceding stage \( k \).
To satisfy group optimality, the imputation vector has to satisfy
for \( k \in \{ 1,2, \ldots ,T\} \).
This condition guarantees the maximal joint payoff is distributed to the players.
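Written out, the group optimality condition states that the imputations exhaust the cooperative value:

```latex
\[
\sum_{j=1}^{n}\xi^{\,j}\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)
= W\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right),
\qquad k\in\{1,2,\ldots,T\}.
\]
```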
3.2 Individual Rationality
For individual rationality to be satisfied the payoffs received by the players under cooperation have to be no less than their non-cooperative payoffs along the cooperative state trajectory. The non-cooperative payoffs of the players in a Nash equilibrium of the dynamic game (3.1)–(3.2) can be characterized by the following theorem.
Theorem 3.2
A set of Nash equilibrium strategies\( \{ \mu_{k}^{(0)i**} ,\mu_{k}^{(T)i**} \} \), for\( k \in \{ 1,2, \ldots ,T\} \)and\( i \in N \), to the non-cooperative game (3.1)–(3.2) can be obtained by solving the following recursive equations:
where\( \underline{\mu }_{k}^{(0) \ne i**} = (\mu_{k}^{(0)1**} ,\mu_{k}^{(0)2**} , \ldots ,\mu_{k}^{(0)i - 1**} ,\mu_{k}^{(0)i + 1**} , \ldots ,\mu_{k}^{(0)n**} ) \), \( \underline{\mu }_{k}^{(T) \ne i**} = (\mu_{k}^{(T)1**} ,\mu_{k}^{(T)2**} , \ldots ,\mu_{k}^{(T)i - 1**} ,\mu_{k}^{(T)i + 1**} , \ldots ,\mu_{k}^{(T)n**} ) \); and\( V_{{}}^{i} \left( {k,x;\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} } \right) \)is the maximal value of player\( i \)’s payoff
for the non-cooperative game starting at stage\( k \)with state\( x_{k}^{{}} = x \). The\( n - 1 \)other players’ game equilibrium strategies in stages\( t \ge k \)are\( \underline{\mu }_{t}^{(0) \ne i**} \)and\( \underline{\mu }_{t}^{(T) \ne i**} \), and the game equilibrium strategies already executed before stage\( k \)are\( (\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} ) \).
Proof
See “Appendix C”. □
To uphold individual rationality the payoff that player \( i \) receives under cooperation along the cooperative trajectory \( \left\{ {x_{k}^{*} } \right\}_{k = 1}^{T + 1} \) must be greater than or equal to his non-cooperative payoff, that is
for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \).
4 Dynamic Cooperation Under Control Lags
To guarantee dynamical stability in a dynamic cooperation scheme, the solution has to satisfy the property of subgame consistency so that the agreed-upon optimality principle remains effective in all stages of the game along the optimal cooperative state trajectory. For subgame consistency to be satisfied, the agreed upon imputation \( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) has to be applied at all the stages along the cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \).
4.1 An Example of Optimality Principle
If the optimality principle specifies that the players share the total cooperative payoff proportional to their non-cooperative payoffs, then along the optimal cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \) the imputation to player \( i \) becomes
for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \).
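One plausible rendering of this proportional sharing rule, in terms of the non-cooperative values of Theorem 3.2 evaluated along the cooperative trajectory, is:

```latex
\[
\xi^{\,i}\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)
= \frac{V^{\,i}\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)}
{\sum_{j=1}^{n} V^{\,j}\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)}
\, W\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right).
\]
```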
Crucial for the outcome of cooperation is the formulation of a payment mechanism so that the agreed-upon imputation \( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) can be realized. Following Petrosyan and Danilov [12] and Yeung and Petrosyan [18], we first formulate an Imputation Distribution Procedure (IDP) to guarantee the agreed imputations in (4.1) be allotted to the players.
Let \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) denote the payment that player \( i \) will receive at stage \( k \) under the cooperative agreement along the cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \) given that the optimal control strategies with lags in the preceding \( k - 1 \) stages are \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \). The payment scheme involving \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) constitutes an IDP in the sense that the payoff to player \( i \) over the stages from \( k \) to \( T + 1 \) satisfies the condition:
for \( i \in N \) and \( k \in \{ 1,2, \ldots ,T\} \).
A theorem providing a formula for \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), for \( k \in \{ 1,2, \ldots ,T\} \) and \( i \in N \), which satisfies (4.2) is given below.
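In the IDP tradition of Petrosyan and Danilov [12] and Yeung and Petrosyan [18], a payment of the following shape satisfies condition (4.2); this is a sketch of the likely form of the stage payment, namely the current imputation less the discounted continuation imputation:

```latex
\[
B_k^{\,i}\!\left(x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)
= \xi^{\,i}\!\left(k,x_k^{*};\underline{\mu}_{k-1}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right)
- \delta\,\xi^{\,i}\!\left(k+1,x_{k+1}^{*};\underline{\mu}_{k}^{(T)*},\ldots,\underline{\mu}_1^{(T)*}\right).
\]
```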
Theorem 4.1
The agreed-upon imputation\( \xi (k,x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), for\( k \in \{ 1,2, \ldots ,T\} \)along the cooperative trajectory\( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{T} \), can be realized by a payment
given to player\( i \in N \)at stage\( k \in \{ 1,2, \ldots ,T\} \).
Proof
See “Appendix D”. □
The payment scheme in Theorem 4.1 gives rise to the realization of the imputation guided by the agreed-upon optimality principle and constitutes a subgame consistent payment scheme. More specifically, the payment of \( B_{k}^{i} (x_{k}^{*} ;\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \) allotted to player \( i \in N \) in stage \( k \in \{ 1,2, \ldots ,T\} \) will establish a cooperative plan that matches the agreed-upon imputation to every player. A novel feature of the subgame consistent imputation distribution procedure in Theorem 4.1 is that the payment scheme (4.3) at stage \( k \) is contingent upon the conventional state variable \( x_{k}^{*} \) and the previously executed controls \( (\underline{\mu }_{k - 1}^{(T)*} ,\underline{\mu }_{k - 2}^{(T)*} , \ldots ,\underline{\mu }_{1}^{(T)*} ) \), which act like a vector of idiosyncratic state variables.
5 An Application in Environmental Management
In this section, we provide an application of cooperative dynamic games with control lags in an environmental management game. Consider a 10-stage dynamic environmental game with 2 nations. Each nation produces its own industrial output which brings about two types of damage to the environment. The first type of damage involves the building up of existing pollution stocks like atmospheric particulates. The second type of damage involves toxic environmental effects which last for all stages of the planning horizon. We use \( \mu_{k}^{(10)i} \in U_{{}}^{(10)i} \subset R \) to denote the amount of industrial output of nation \( i \) at stage \( k \), because its environmental impacts last for 10 stages. In sum, industrial production produces pollutants that add to the pollution stock and yields lasting toxic environmental damages. Each nation adopts its own pollution abatement policy to reduce pollutants in the environment. We use \( \mu_{k}^{(0)i} \in U_{{}}^{(0)i} \subset R \) to denote the pollution abatement effort of nation \( i \) at stage \( k \). At initial stage 1, the level of pollution is given as \( x_{1}^{{}} = x_{{}}^{0} \). The dynamics of pollution accumulation is governed by the difference equation:
where \( a_{{}}^{j} \mu_{k}^{(10)j} \) is the amount of pollutants created by nation \( j \)’s stage \( k \) output, \( b_{{}}^{j} \mu_{k}^{(0)j} (x_{k}^{{}} )^{1/2} \) is the amount of pollutants removed by \( \mu_{k}^{(0)j} \) unit of abatement effort from nation \( j \), and \( \lambda \) is the natural rate of decay of the pollutants.
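Spelled out from the description above, the pollution accumulation dynamics (5.1) read:

```latex
\[
x_{k+1} = x_k + \sum_{j=1}^{2} a^{j}\mu_k^{(10)j}
- \sum_{j=1}^{2} b^{j}\mu_k^{(0)j}\,(x_k)^{1/2}
- \lambda x_k,
\qquad x_1 = x^{0},\quad k\in\{1,2,\ldots,10\}.
\]
```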
We follow the assumption in most environmental studies that the effect of abatement effort in stage \( k \) is confined to the pollutants removed in the same stage. Hence, abatement effort does not accumulate. The economic benefits of nation \( i \)’s industrial output produced in stage \( k \) are \( [\alpha_{k}^{i} \mu_{k}^{(10)i} - c_{k}^{i} (\mu_{k}^{(10)i} )^{2} ] \). The objective of the government of nation \( i \) is to maximize
The expression \( \varepsilon_{k}^{i(t)i} \mu_{t}^{(10)i} \) is the toxic impact of output \( \mu_{t}^{(10)i} \) to nation \( i \) itself in stage \( k \), and \( \omega_{k}^{j(t)i} \mu_{t}^{(10)i} \) is the toxic impact of nation \( i \)’s output \( \mu_{t}^{(10)i} \) to nation \( j \) in stage \( k \). The pollution abatement cost is \( \gamma_{k}^{i} (\mu_{k}^{(0)i} )^{2} \) for nation \( i \)’s abatement effort \( \mu_{k}^{(0)i} \). The damage of the pollutant stock to nation \( i \) at stage \( k \) is \( h_{k}^{i} x_{k}^{{}} \), where \( h_{k}^{i} \) is the average damage cost incurred by a unit of the pollutant in stage \( k \). The terminal valuation of the government’s objective at stage 11 is \( (Q_{11}^{i} x_{11}^{{}} + \varpi_{{}}^{i} ) \), which is related to the level of pollutants in the atmosphere. The term \( \varpi_{{}}^{i} \) reflects the anticipated present value of the stream of discounted economic benefits of nation \( i \)’s industrial output net of the pollution related costs after stage 11. The term \( Q_{11}^{i} x_{11}^{{}} \) is the linear approximation of the damage of the pollution stock to the nation.
To secure group optimality the participating nations seek to cooperate and maximize their joint payoff by solving the following dynamic optimization problem:
subject to (5.1), where \( \ell \in \{ 1,2\} \) and \( \ell \ne j \). Invoking Theorem 3.1, a set of optimal control strategies \( \{ \mu_{k}^{(0)i*} ,\mu_{k}^{(10)i*} \} \), for \( k \in \{ 1,2, \ldots ,10\} \) and \( i \in \{ 1,2\} \), for the dynamic optimization problem (5.1) and (5.3) can be obtained by solving the following system of recursive equations:
Performing the indicated maximization in (5.5) and solving the system (5.4)–(5.5), one can obtain the maximized joint payoff under cooperation as:
Proposition 5.1
System (5.4)–(5.5) admits a solution with the maximized joint payoff being:
with\( A_{11}^{{}} = \sum\nolimits_{j = 1}^{2} {Q_{11}^{j} } \), and\( C_{11}^{{}} = \sum\nolimits_{j = 1}^{2} {\varpi_{{}}^{j} } \);
for\( k \in \{ 1,2, \ldots ,10\} \); and\( C_{k}^{{}} \)being an expression involving the model parameters, for\( k \in \{ 1,2, \ldots ,10\} \).
Proof
See “Appendix E”. □
Using Proposition 5.1 and (5.4)–(5.5), the optimal cooperative controls can be obtained as:
The cooperative abatement effort \( \mu_{k}^{(0)i*} \) in (5.8) is the optimal level of pollution abatement where the marginal abatement cost \( 2\gamma_{k}^{i} \mu_{k}^{(0)i*} \) equals the marginal social benefit \( - A_{k + 1}^{{}} b_{{}}^{i} (x_{k}^{{}} )^{1/2} \delta \) from pollution abatement. The cooperative industrial output \( \mu_{k}^{(10)i*} \) in (5.9) yields the optimal output level. In particular, the marginal revenue of industrial output \( \alpha_{k}^{i} \) equals the total marginal private and social costs which include (i) the marginal cost of production \( 2c_{k}^{i} \mu_{k}^{(10)i*} \), (ii) the marginal toxic impact of nation \( i \)’s output to nation \( i \) itself \( \varepsilon_{k}^{i(k)i} \), (iii) the marginal toxic impact of nation \( i \)’s output to nation \( j \), that is \( \omega_{k}^{\ell (k)i} \), (iv) the increment of pollution stock \( - \delta A_{k + 1}^{{}} a_{{}}^{i} \), and (v) the sum of the marginal toxic impacts to both nations in stage \( k \) to stage 10, that is \( \delta \sum\nolimits_{\tau = k + 1}^{10} {\delta_{{}}^{\tau - (k + 1)} (\varepsilon_{\tau }^{i(k)i} + \omega_{\tau }^{\ell (k)i} )} \).
Worth noting is that, according to item (v), the marginal lagged toxic environmental effects of nation \( i \)’s output on future stages are included. The environmental implication of control lags is that lagged environmental impacts can last for a long time, and the total level of lagged impacts accumulates over time from outputs produced stage after stage. The economic implication is that a levy on industrial production equaling the sum of items (ii) to (v) has to be imposed to achieve the socially optimal level of output. In particular, the amount in item (v) is the tax on the marginal toxic impacts in future stages. Finally, in the case where there are no control lags, item (v) disappears.
Substituting the optimal control strategies (5.8)–(5.9) into the state dynamics (5.1), we obtain the cooperative state trajectory
Equation (5.10) is a first-order linear difference equation which can be solved by standard techniques. We use \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{11} \) to denote the values of \( x_{k}^{{}} \) generated by (5.10).
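As an illustration of iterating such a first-order difference equation, the following sketch simulates the pollution dynamics (5.1) under fixed control paths; all parameter values and control levels below are illustrative assumptions, not calibrated from the paper.

```python
# A small simulation sketch of the pollution dynamics (5.1):
# x_{k+1} = x_k + sum_j a^j u10_j - sum_j b^j u0_j sqrt(x_k) - lam x_k.
# Parameter values are illustrative assumptions only.

A = (2.0, 1.5)        # a^j: pollutants per unit of output of nation j
B = (0.5, 0.4)        # b^j: abatement effectiveness of nation j
LAM = 0.1             # natural rate of decay of the pollutant stock
X0 = 10.0             # initial pollution stock x_1

def pollution_path(outputs, abatements, x0=X0, stages=10):
    """Iterate the pollution dynamics for k = 1..stages and return the
    trajectory [x_1, ..., x_{stages+1}].  outputs[k] and abatements[k]
    are the (u10_1, u10_2) and (u0_1, u0_2) pairs chosen at stage k+1."""
    xs = [x0]
    for k in range(stages):
        x = xs[-1]
        inflow = sum(a * u for a, u in zip(A, outputs[k]))
        removed = sum(b * u * x ** 0.5 for b, u in zip(B, abatements[k]))
        xs.append(x + inflow - removed - LAM * x)
    return xs

# constant policies for both nations over the 10 stages
outputs = [(1.0, 1.0)] * 10
abatements = [(0.8, 0.8)] * 10
path = pollution_path(outputs, abatements)
```

Raising the abatement efforts lowers the terminal pollution stock, as the marginal-benefit discussion above suggests.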
Using (5.8)–(5.9), one can obtain the optimal cooperative strategies in stages 1 to 10.
To derive the non-cooperative equilibrium solution of the game (5.1)–(5.2), we invoke Theorem 3.2 and obtain the game equilibrium payoff of nation \( i \) as:
Proposition 5.2
The game equilibrium payoff of nation \( i \in \{ 1,2\} \) is:
with \( A_{11}^{i} = Q_{11}^{i} \), and \( C_{11}^{i} = \varpi_{{}}^{i} \)
and \( C_{k}^{i} \) being an expression involving the model parameters, for \( k \in \{ 1,2, \ldots ,10\} \).
Proof
Follow the proof of Proposition 5.1. □
If the nations agree to share the total cooperative payoff proportional to their non-cooperative payoffs, then the payoff of player \( i \) along the cooperative trajectory \( \left\{ {\,x_{k}^{*} \,} \right\}_{k = 1}^{11} \) becomes
for \( i \in \{ 1,2\} \) and \( k \in \{ 1,2, \ldots ,10\} \).
Invoking Theorem 4.1, a payment
given to player \( i \) at stage \( k \in \{ 1,2, \ldots ,10\} \) along the cooperative trajectory \( \left\{ {x_{k}^{*} } \right\}_{k = 1}^{10} \) with preceding strategies \( (\underline{\mu }_{k - 1}^{(10)*} ,\underline{\mu }_{k - 2}^{(10)*} , \ldots ,\underline{\mu }_{1}^{(10)*} ) \) would lead to the realization of the imputation (5.13) and hence a subgame consistent solution results.
6 Concluding Remarks
This paper presents a new class of cooperative dynamic games in which there are control lags affecting the payoffs of the players. The incorporation of control lags extends the application of cooperative dynamic games to a wider spectrum of real-life scenarios. A novel dynamic optimization theorem for solving problems with control lags is developed. This theorem can be applied to a wide range of practical dynamic optimization problems with control lags. For instance, in dynamic consumer utility maximization the purchase of durable goods is a classic case of lagged control. Almost all kinds of investments generate returns in future stages and can be modeled as lagged controls. The production of goods (like wine or plants) which require a gestation period involves controls with lag effects. Housing decisions are optimization processes involving lagged controls. Insurance affects the payoff of the decision maker in subsequent stages, and multi-stage contracts generate binding effects on future payoffs. The optimization techniques developed in this paper play an important role in the derivation of solutions for these analyses. A new form of the feedback Nash equilibrium recursive equations is derived using the novel dynamic optimization theorem. A subgame consistent imputation distribution procedure contingent upon the current state \( x_{k}^{*} \) and previously executed controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) is established.
This is the first time that cooperative dynamic games with lagged controls are formulated, and further research along this line is expected. An immediate extension of the game is to allow the terminal payoffs to depend on the previously executed lagged controls, like \( q_{T + 1}^{i} \left[ {x_{T + 1}^{{}} ,\underline{\mu }_{T}^{(T)} ,\underline{\mu }_{T - 1}^{(T)} , \ldots ,\underline{\mu }_{1}^{(T)} } \right] \), for \( i \in N \). Further theoretical developments to be pursued include research on dynamic cooperation among players with asynchronous game horizons under control lags and the consideration of more complicated control lags. In addition, application of the analysis to cooperation schemes involving control lags, like Brexit, international climate agreements and nuclear war threats, would likely generate interesting and new results.
References
Arthur WB (1977) Control of linear processes with distributed lags using dynamic programming from first principles. J Optim Theory Appl 23:429–443
Basar T, Olsder GJ (1995) Dynamic noncooperative game theory, 2nd edn. Academic Press, London
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
Bellman R (1957) Terminal control, time lags, and dynamic programming. Proc Natl Acad Sci USA 43:927–930
Brandt-Pollmann U, Winkler R, Sager S, Moslener U, Schlöder J (2008) Numerical solution of optimal control problems with constant control delays. Comput Econ 31:181–206
Burdet CA, Sethi SP (1976) On the maximum principle for a class of discrete dynamical systems with lags. J Optim Theory Appl 19:445–454
Hartl RF, Sethi SP (1984) Optimal control of a class of systems with continuous lags: dynamic programming approach and economic interpretations. J Optim Theory Appl 43(1):73–88
Huschto T, Feichtinger G, Hartl RF, Kort PM, Sager S (2011) Numerical solution of a conspicuous consumption model with constant control delay. Automatica 47:1868–1877
Isaacs R (1965) Differential games. Wiley, New York
Kramer JDR (1960) On control of linear systems with time lags. Inf Control 3:299–326
Krasovkii NN (1972) External control in a nonlinear differential game. J Appl Math Mech 36:986–1006
Petrosyan LA, Danilov NN (1979) Stability of solutions of nonzero sum differential games with transferrable payoffs. Vestn Leningr Univ 21:52–59
Pontryagin LS (1966) On the theory of differential games. Uspekhi Mat Nauk 21:219–274
Sethi SP, McGuire TW (1977) Optimal skill mix: an application of the maximum principle for systems with retarded controls. J Optim Theory Appl 23:245–275
Wang PKC (1975) Time-optimal control of time-lag systems with time-lag controls. J Math Anal Appl 52:366–378
Winkler R, Brandt-Pollmann U, Moslener U, Schlöder J (2003) Time-lags in capital accumulation. In: Ahr D, Fahrion R, Oswald M, Reinelt G (eds) Operations research proceedings. Springer, Berlin, pp 451–458
Yeung DWK (2014) Dynamically consistent collaborative environmental management with production technique choices. Ann Oper Res 220:181–204
Yeung DWK, Petrosyan LA (2010) Subgame consistent solutions for cooperative stochastic dynamic games. J Optim Theory Appl 145:579–596
Yeung DWK, Petrosyan LA (2011) Subgame consistent cooperative solution of dynamic games with random horizon. J Optim Theory Appl 150:78–97
Yeung DWK, Petrosyan LA (2013) Subgame-consistent cooperative solutions in randomly furcating stochastic dynamic games. Math Comput Model 57:976–991
Yeung DWK, Petrosyan LA (2016) Subgame consistent cooperation: a comprehensive treatise. Springer, Berlin. https://doi.org/10.1007/978-981-10-1545-8
Acknowledgements
This research was supported by the Russian Science Foundation Grant (RSF N17-11-01079). The authors would like to thank two anonymous reviewers for their extremely valuable comments and suggestions that improved this paper significantly.
Appendices
Appendix A: Proof of Theorem 2.1
According to (2.4) in Theorem 2.1
We prove the validity of (A.1) by contradiction. Suppose
This is not possible because, on the right-hand side of (A.2), the controls \( \mu_{k}^{(0)} \) and \( \mu_{k}^{(T)} \) are chosen to maximize the value inside the curly brackets, and this maximized value can at best equal the maximal payoff.

Now suppose
From (A.1), we obtain \( W(k,x;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \), which is the maximum of the payoffs
By definition, the term \( W(k + 1,x;\mu_{k}^{(T)} ,\mu_{k - 1}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) is the maximum of the payoffs
which is no less than
Hence,
(A.6) contradicts (A.3). Hence, the validity of Eq. (2.4) is established.
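For reference, the recursive relation just established can be written schematically as follows; here \( g_{k} \) and \( f_{k} \) are hypothetical stand-ins for the stage payoff function and the state transition function (the paper's displayed equations use their specific functional forms):

```latex
% Schematic form of the recursive equation (2.4): the value function carries
% the previously executed lagged controls as arguments, and the stage-k
% maximization is over the current controls only.
\begin{aligned}
W\bigl(k,x;\mu_{k-1}^{(T)},\mu_{k-2}^{(T)},\ldots,\mu_{1}^{(T)}\bigr)
  = \max_{\mu_{k}^{(0)},\,\mu_{k}^{(T)}}
    \Bigl\{\, & g_{k}\bigl(x,\mu_{k}^{(0)},\mu_{k}^{(T)};
        \mu_{k-1}^{(T)},\ldots,\mu_{1}^{(T)}\bigr) \\
  & + W\bigl(k+1, f_{k}(x,\mu_{k}^{(0)},\mu_{k}^{(T)});
        \mu_{k}^{(T)},\mu_{k-1}^{(T)},\ldots,\mu_{1}^{(T)}\bigr) \Bigr\}.
\end{aligned}
```

Note that the continuation value at stage \( k+1 \) has the newly chosen lagged control \( \mu_{k}^{(T)} \) prepended to its history of executed controls, which is what distinguishes this recursion from the standard dynamic programming equation.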
Next, to solve the problem (2.3)–(2.4), we adopt the technique of backward induction. Consider first the last operational stage \( T \); invoking Theorem 2.1, we have
The maximization operator in stage \( T \) involves \( \mu_{T}^{(0)} \) and \( \mu_{T}^{(T)} \) only. However, the current state \( x \) and the previously determined controls \( (\mu_{T - 1}^{(T)} ,\mu_{T - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) appear in the stage \( T \) payoff function. If the first-order conditions of the maximization problem in (A.7) satisfy the conditions of the implicit function theorem, one can obtain the optimal controls \( \mu_{T}^{(0)} \) and \( \mu_{T}^{(T)} \) as functions of \( x \) and \( (\mu_{T - 1}^{(T)} ,\mu_{T - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \). Substituting these optimal controls into the function on the right-hand side of (A.7) yields the function \( W(T,x;\mu_{T - 1}^{(T)} ,\mu_{T - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \), which satisfies the optimality conditions of a maximum for given \( x \) and \( (\mu_{T - 1}^{(T)} ,\mu_{T - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \).
Consider the second-to-last operational stage \( T - 1 \); invoking Theorem 2.1, we have
The maximization operator in stage \( T - 1 \) involves \( \mu_{T - 1}^{(0)} \) and \( \mu_{T - 1}^{(T)} \). The current state \( x \) and the previously determined controls \( (\mu_{T - 2}^{(T)} ,\mu_{T - 3}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) appear in the stage \( T - 1 \) payoff function. If the first-order conditions of the maximization problem in (A.8) satisfy the conditions of the implicit function theorem, one can obtain the optimal controls \( \mu_{T - 1}^{(0)} \) and \( \mu_{T - 1}^{(T)} \) as functions of \( x \) and the previously determined controls \( (\mu_{T - 2}^{(T)} ,\mu_{T - 3}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \). Substituting these optimal controls into the function on the right-hand side of (A.8) yields the function \( W(T - 1,x;\mu_{T - 2}^{(T)} ,\mu_{T - 3}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \).
Now consider stage \( k \in \{ T - 2,T - 3, \ldots ,2,1\} \); invoking Theorem 2.1, we have
The maximization operator involves \( \mu_{k}^{(0)} \) and \( \mu_{k}^{(T)} \). Again, the current state \( x \) and the previously determined controls \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \) appear in the stage \( k \) payoff function. If the first-order conditions of the maximization problem in (A.9) satisfy the conditions of the implicit function theorem, one can obtain the optimal controls \( \mu_{k}^{(0)} \) and \( \mu_{k}^{(T)} \) as functions of \( x \) and \( (\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \). Substituting these optimal controls into the function on the right-hand side of (A.9) yields the function \( W(k,x;\mu_{k - 1}^{(T)} ,\mu_{k - 2}^{(T)} , \ldots ,\mu_{1}^{(T)} ) \). □
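The backward-induction scheme above can be illustrated with a minimal numerical sketch. This is not the paper's model: the scalar state, the discrete control grid, and the stage payoff and transition functions below are hypothetical placeholders, with a single control acting as both current and lagged instrument. The sketch only shows the structural point of Theorem 2.1: the value function carries the history of previously executed controls as an extra argument.

```python
# Minimal sketch of dynamic programming with control lags (Theorem 2.1 style).
# All model primitives here are illustrative, not taken from the paper.

T = 3                          # number of operational stages
controls = [0.0, 0.5, 1.0]     # discrete grid standing in for the control set

def stage_payoff(k, x, u_now, u_hist):
    # hypothetical payoff: the current control earns immediately, while
    # previously executed controls still impose a lagged cost
    return x + u_now - 0.3 * sum(u_hist)

def transition(x, u_now):
    # hypothetical state dynamics
    return 0.9 * x + u_now

def W(k, x, u_hist):
    """Value function W(k, x; mu_{k-1}, ..., mu_1): maximal payoff-to-go
    given the current state and the tuple of previously executed controls."""
    if k > T:
        return 0.0             # zero terminal payoff in this sketch
    best = float("-inf")
    for u in controls:         # maximize over the stage-k control only
        value = stage_payoff(k, x, u, u_hist) \
            + W(k + 1, transition(x, u), (u,) + u_hist)
        best = max(best, value)
    return best

print(W(1, 1.0, ()))           # optimal payoff from stage 1 at state x = 1.0
```

As in the proof, the stage-\( k \) maximization takes the executed controls as given parameters, and the chosen control is prepended to the history passed to the stage-\( k+1 \) value function.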
Appendix B: Proof of Theorem 3.1
The conditions in (3.4)–(3.5) satisfy the optimality conditions of the dynamic optimization technique with control lags in Theorem 2.1; hence, an optimal solution to the control problem results. □
Appendix C: Proof of Theorem 3.2
Conditions (3.9)–(3.10) show that \( V_{{}}^{i} \left( {k,x;\underline{\mu }_{k - 1}^{(T)**} ,\underline{\mu }_{k - 2}^{(T)**} , \ldots ,\underline{\mu }_{1}^{(T)**} } \right) \) is the maximized payoff of player \( i \in N \) according to Theorem 2.1 given the game equilibrium strategies of the other \( n - 1 \) players. Hence a Nash equilibrium results. □
Appendix D: Proof of Theorem 4.1
Using (4.2) one can obtain
Substituting (D.1) into (4.2) yields
which can be expressed as
From (D.2) one can obtain Theorem 4.1. □
Appendix E: Proof of Proposition 5.1
First, we consider the last operational stage \( T = 10 \). Using (5.5) and Proposition 5.1, we can obtain the optimal cooperative strategies:
Substituting (E.1) into the stage 10 equation in (5.5) we obtain:
Both the LHS and RHS of (E.2) are linear functions of \( x \). For (E.2) to hold it is required that:
and that \( C_{10}^{{}} \) be equal to the (undiscounted) expression not involving \( x \) on the right-hand side of (E.3). Note that the term \( C_{10}^{{}} \) contains the previously executed strategies in the form of \( - \sum\nolimits_{t = 1}^{9} {\varepsilon_{10}^{j(t)j} \mu_{t}^{(10)j} } - \sum\nolimits_{t = 1}^{9} {\omega_{10}^{j(t)\ell } \mu_{t}^{(10)\ell } } \).
Invoking Proposition 5.1 and performing similar operations in stage \( k \in \{ 9,8, \ldots ,1\} \) in (5.5), we obtain the optimal cooperative strategies:\( \mu_{k}^{(0)i*} = - A_{k + 1}^{{}} \frac{{b_{{}}^{i} (x)^{1/2} \delta }}{{2\gamma_{k}^{i} }} \),
where \( \partial C_{k + 1}^{{}} /\partial \mu_{k}^{(10)i} = - \sum\nolimits_{\tau = k + 1}^{10} {\delta^{\tau - (k + 1)} (\varepsilon_{\tau }^{i(k)i} + \omega_{\tau }^{\ell (k)i} )} \). Substituting (E.4) into Eq. (5.5) for \( k \in \{ 9,8, \ldots ,1\} \), we obtain:
Both the LHS and RHS of (E.5) are linear functions of \( x \). For (E.5) to hold it is required that:
and that \( C_{k}^{{}} \) be equal to the (undiscounted) expression not involving \( x \) on the right-hand side of (E.5). Note that the term \( - \sum\nolimits_{t = 1}^{k - 1} {\varepsilon_{k}^{j(t)j} \mu_{t}^{(10)j} } - \sum\nolimits_{t = 1}^{k - 1} {\omega_{k}^{j(t)\ell } \mu_{t}^{(10)\ell } } \) appears in \( C_{k}^{{}} \) as a given parameter.
Finally, substituting the optimal cooperative controls
into \( C_{k}^{{}} \) for \( k \in \{ 1,2, \ldots ,10\} \), we can express \( C_{k}^{{}} \) in terms of the model parameters explicitly. □
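The coefficient-matching logic of Appendix E can be illustrated with a hedged numerical sketch. The model below is a hypothetical single-agent problem (stage payoff \( ax - u^{2} \), dynamics \( x_{k+1} = bx + u \), discount factor \( \delta \)), not the paper's two-player environmental game; it is chosen only so that the value function is linear in the state, \( W_{k}(x) = A_{k}x + C_{k} \), and the coefficients follow a backward recursion by matching both sides of the stage-\( k \) equation, as (E.3) and (E.6) do.

```python
# Hedged sketch: backward recursion for the coefficients of a value function
# linear in the state, W_k(x) = A_k * x + C_k.  Parameters are illustrative.
# In the paper's model, C_k additionally absorbs the previously executed
# lagged controls as given parameters.

T = 10
a, b, delta = 2.0, 0.9, 0.95   # hypothetical payoff, dynamics, discount

A = {T + 1: 0.0}               # zero terminal value in this sketch
C = {T + 1: 0.0}
u_star = {}

for k in range(T, 0, -1):
    # first-order condition of max_u [ -u**2 + delta * A_{k+1} * u ]
    u_star[k] = delta * A[k + 1] / 2.0
    # matching the coefficients of x on both sides of the stage-k equation
    A[k] = a + delta * A[k + 1] * b
    # the terms not involving x collect into C_k
    C[k] = -u_star[k] ** 2 + delta * (A[k + 1] * u_star[k] + C[k + 1])

print(A[1], C[1])              # stage-1 value function coefficients
```

The same pattern drives Proposition 5.1: equating the \( x \)-terms of the LHS and RHS yields the recursion for \( A_{k} \), and everything not involving \( x \), including the lagged-control terms, is collected into \( C_{k} \).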
Yeung, D.W.K., Petrosyan, L.A. Cooperative Dynamic Games with Control Lags. Dyn Games Appl 9, 550–567 (2019). https://doi.org/10.1007/s13235-018-0266-6