Abstract
We study a class of dynamic decision problems of mean-field type with time-inconsistent cost functionals and derive a stochastic maximum principle to characterize sub-game perfect equilibrium points. Subsequently, this approach is extended to a mean-field game to construct decentralized strategies and obtain an estimate of their performance.
1 Introduction
In dynamic decision-making problems, a policy is time consistent if, whenever it is optimal at time \(t\), it remains optimal when implemented at a later time \(s>t\). In optimal control, this is known as the Bellman principle. A time-inconsistent policy, in contrast, need not remain optimal at a later time \(s>t\), even if it is optimal at time \(t\). Time inconsistency occurs, for example, when a hyperbolic discount rate is preferred to an exponential discount rate, or when the performance criterion is a nonlinear function of the expected utility, such as the variance in the standard Markowitz investment problem. For a recent review of time consistency in dynamic decision-making problems, we refer to Ekeland and Lazrak [14], Ekeland and Pirvu [15], and Zaccour [39].
In his work on a deterministic Ramsey problem, Strotz [33] was the first to formulate the dynamic time-inconsistent decision problem as a game-theoretic problem, for which it is natural to look for sub-game perfect equilibria. Pollak [32], Phelps and Pollak [31], Peleg and Yaari [29], and Goldman [17] extended this framework to discrete- and continuous-time dynamics. The recent works by Ekeland and Lazrak [14] and Ekeland and Pirvu [15] apply this game-theoretic approach to an optimal investment and consumption problem under hyperbolic discounting for deterministic and stochastic models. Among their achievements, they provide a precise definition of the equilibrium concept in continuous time, using a Pontryagin-type “spike variation” formulation (which we recall in Sect. 2 below), and derive, among other things, an extension of the Hamilton–Jacobi–Bellman (HJB) equation along with a verification theorem that characterizes Markov (or feedback-type) sub-game perfect equilibria. Their work is extended by Björk and Murgoci [6] and Björk et al. [7] to performance functions that are nonlinear functions of expected utilities, for dynamics driven by a quite general class of Markov processes. Hu et al. [19], followed by Bensoussan et al. [4], characterize sub-game perfect equilibria using a Pontryagin-type stochastic maximum principle (SMP) approach to a time-inconsistent stochastic linear-quadratic control problem of mean-field type, where the performance functional is a conditional expectation with respect to the history \({\mathcal {F}}_t\) of the system up to time \(t\). They derive a general sufficient condition for equilibria through a new class of flows of forward-backward stochastic differential equations (FBSDEs). The properties of this class of flows of FBSDEs are far from well understood and deserve further investigation. Both the extended HJB equation provided in Björk and Murgoci [6] and Björk et al. [7] and the sufficient condition suggested by Hu et al. [19] yield explicit expressions for the equilibria in only very few cases. In a more recent work, Yong [37] studied a class of linear-quadratic models with very general weight matrices in the cost, for which a time-consistent equilibrium control is constructed by the stochastic maximum principle approach and Riccati equations. Yong [37] also considered closed-loop equilibrium strategies obtained by discretization of time for the game.
In this paper, we suggest an SMP approach to time-inconsistent decision problems for dynamics driven by diffusion processes of mean-field type that are not necessarily Markov, and whose performance criterion is a nonlinear function of the conditional expectation of a utility function, given the present location of the state process. We do not condition on the whole history \({\mathcal {F}}_t\) of the system as in Hu et al. [19] because, for all practical purposes, even under the best conditions, the decision-maker can only observe the current state of the system. She can never provide a complete and explicit description of the history \({\mathcal {F}}_t\) (which is a \(\sigma \)-algebra) of the system, simply because, except in trivial situations, this is a huge set of information. Our model generalizes the ones studied in Ekeland and Pirvu [15] and Björk et al. [6, 7].
In the first main result of the paper, the sub-game perfect equilibria (not necessarily of feedback type) are fully characterized as maximizers of the Hamiltonian associated with the system, in a similar fashion as in the SMP for diffusions of mean-field type obtained in Andersson and Djehiche [1] and Buckdahn et al. [8]. This approach is illustrated by several examples for which explicit solutions are obtained.
Next, we address the time-inconsistency issue in a mean-field game of \(N\) players. The players in such games are individually insignificant and interact via an aggregate effect generated by the population. A substantial literature exists on this class of games. Huang et al. [21–23] introduced an approach based on consistent mean-field approximations to design decentralized strategies, where each player solves a localized optimal control problem by dynamic programming. These strategies have an \(\varepsilon \)-Nash equilibrium property when applied to a large but finite population. Closely related developments were presented by Lasry and Lions [26], who introduced the name mean-field game, and by Weintraub et al. [35], who studied oblivious equilibria in a Markov decision setup. Within the linear-quadratic setup, various explicit solutions can be obtained; see, e.g., [2, 5, 23, 27]. Tembine et al. [34] introduced risk-sensitive costs for mean-field games and analyzed the linear exponential quadratic Gaussian model in detail. For games with dynamics modelled by nonlinear diffusions, Carmona and Delarue [12] developed a probabilistic approach, and Kolokoltsov et al. [25] presented a very general mean-field game modelling framework via nonlinear Markov processes. Gomes et al. [18] considered games with discrete time and discrete states. For additional information, the reader may consult the overviews of this area by Buckdahn et al. [8] and Bensoussan et al. [3].
To give an overall picture of these developments in a mean-field context, we briefly remark on the difference between mean-field-type optimal control and mean-field games. For the former (see, e.g., [1, 16, 36]), there is only a single decision-maker, who can instantly affect the mean of the underlying state process. In contrast, a player in a mean-field game consisting of comparably small players (called peers) has little influence on a mean-field term such as \(X^{(N)}=\frac{1}{N}\sum _{i=1}^N X_i\). An exception is games with a major player, whose control can notably affect everyone; see, e.g., [20, 28].
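As a quick numerical illustration of this distinction (a hypothetical setup, not the paper's model): perturbing a single player's state changes the empirical average \(X^{(N)}\) by exactly \(\delta /N\), which vanishes as the population grows.

```python
import numpy as np

def mean_field_sensitivity(n_players: int, delta: float = 1.0) -> float:
    """Change in the empirical mean X^(N) when one player's state is perturbed by `delta`."""
    rng = np.random.default_rng(0)
    states = rng.normal(size=n_players)
    baseline = states.mean()
    states[0] += delta  # a single player unilaterally deviates
    return abs(states.mean() - baseline)

# The influence of any single player on X^(N) is exactly delta/N.
for n in (10, 100, 1000):
    print(n, mean_field_sensitivity(n))
```

This is why each peer treats the mean-field term as (asymptotically) exogenous, whereas the single decision-maker in a mean-field-type control problem internalizes her effect on the mean.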
So far, most existing research on mean-field games deals with time-consistent cost functionals. The state feedback strategies based on consistent mean-field approximations are sub-game perfect in the infinite-population limit model, so no individual has an incentive to revise its strategy as time moves forward. In a recent work, Bensoussan et al. [4] considered time-inconsistent quadratic cost functionals in a mean-field game with a continuum population and linear dynamics. A so-called time-consistent optimal strategy is derived based on spike variation, followed by a consistency condition on the mean field generated by an infinite population.
The novelty and main contributions of the paper are summarized as follows.
(i) Under the notion of sub-game perfect equilibrium control, we present a characterization of time-consistent control via a stochastic maximum principle for general nonlinear diffusion models. The associated adjoint equations are indexed by the time-state pair to which the system has just evolved.
(ii) The notion of \(\delta _N\)-sub-game perfect equilibrium is introduced for a mean-field game of \(N\) players with time-inconsistent costs. By combining mean-field approximations and the SMP, we obtain strategies using only the local information of a player. The performance of the set of strategies is characterized via a \(\delta _N\)-sub-game perfect equilibrium, which implies that, for large \(N\), no individual player has a notable incentive to revise its strategy during its execution while interacting with other players.
(iii) The computational aspects of our approach are illustrated by various examples.
The mean-field game which we will analyze involves nonlinear dynamics, and each player is cost coupled with others by their average state \(X^{(-i)}=\frac{1}{N-1}\sum _{k\ne i}^N X_k\). Time inconsistency arises from the conditioning in the cost functional. Our approach for strategy design is to use a freezing idea so that the coupling term is approximated by a deterministic function \(\bar{X}\). This naturally introduces an optimal control problem with a time inconsistent cost which in turn is handled by the SMP approach. After finding the equilibrium strategy for the limiting control problem, we determine \(\bar{X}\) by a consistency condition. The remaining important issue is to analyze the performance of the obtained strategies when applied by \(N\) players.
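The freezing-and-consistency loop described above can be sketched numerically. The following toy linear model and all names are hypothetical (not the paper's model): with the coupling frozen at a deterministic path \(m(\cdot )\), the representative closed-loop mean evolves by a linear ODE, and the consistency condition asks that the output mean path coincide with \(m\) itself, which we enforce by Picard (fixed-point) iteration.

```python
import numpy as np

# Hypothetical toy model: with the coupling term frozen at a deterministic path m(t),
# the closed-loop mean state of a representative player evolves as
#   dx/dt = a*x + b*(m(t) - x),   x(0) = x0.
# The consistency condition requires the resulting mean path to equal m itself.
a, b, x0 = -0.5, 0.3, 1.0
T, n_steps = 1.0, 200
dt = T / n_steps

def mean_path(m: np.ndarray) -> np.ndarray:
    """Forward Euler for the frozen-coupling mean dynamics."""
    x = np.empty_like(m)
    x[0] = x0
    for k in range(len(m) - 1):
        x[k + 1] = x[k] + dt * (a * x[k] + b * (m[k] - x[k]))
    return x

# Picard (fixed-point) iteration on the mean path: m -> mean_path(m).
m = np.full(n_steps + 1, x0)
for _ in range(50):
    m_next = mean_path(m)
    if np.max(np.abs(m_next - m)) < 1e-10:
        break  # consistency condition met (up to tolerance)
    m = m_next
```

For these parameter values, the map \(m\mapsto \) `mean_path(m)` is a contraction, so the iteration converges; in the paper the analogous fixed point determines \(\bar{X}\).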
The organization of the paper is as follows. In Sect. 2, we state the SMP approach for our game problem and the associated adjoint equations. Section 3 characterizes the equilibrium point by an SMP (Theorem 1). Section 4 is devoted to some examples illustrating the main results. In Sect. 5, we extend the previous results to a system of \(N\) decision-makers (Theorem 2). Section 6 provides the proof of Theorem 2. Section 7 presents explicit computations in a mean-field LQG game with time-inconsistent costs.
To streamline the presentation, we only consider the one-dimensional case for the state; the extension to the multidimensional case is straightforward. For the reader’s convenience, we fix a notational convention: the analysis of the mean-field game uses \(C\) as a generic constant, which may change from place to place but depends neither on the population size \(N\) nor on the parameter \( \varepsilon \) of the spike variation.
2 Notation and Statement of the Problem
Let \(T>0\) be a fixed time horizon and \((\varOmega ,{\mathcal {F}},\mathbb {F}, \mathbb {P})\) be a given filtered probability space whose filtration \(\mathbb {F}=\{{\mathcal {F}}_s,\ 0\le s \le T\}\) satisfies the usual conditions of right continuity and completeness, on which a one-dimensional standard Brownian motion \(W=\{W_s\}_{s\ge 0}\) is given. We assume that \(\mathbb {F}\) is the natural filtration of \(W\) augmented by \(\mathbb {P}\)-null sets of \({\mathcal {F}}.\)
An admissible strategy \(u\) is an \(\mathbb {F}\)-adapted and square-integrable process with values in a non-empty subset \(U\) of \(\mathbb {R}\). We denote the set of all admissible strategies over \([0,T]\) by \({\mathcal {U}}[0,T]\).
For each admissible strategy \(u\in {\mathcal {U}}[0,T]\), we consider the dynamics given by the following SDE of mean-field type, defined on \((\varOmega ,{\mathcal {F}},\mathbb {F}, \mathbb {P})\),
We consider decision problems related to the following cost functional
associated with the state process \(X^{u,t,x}\), parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), whose dynamics is given by the SDE
where
We note that \(X^{u,0,x_0}=X^{u}\). The mean of the state process appears in (2.1)–(2.3). This mean-field-type model involves a single decision-maker, and a motivating example is the mean-variance portfolio optimization problem. Owing to its simplicity, we mention in Remark 5 below possible extensions to more general classes of mean-field coupling. The inclusion of the average state of a finite number of decision-makers will be considered later in Sect. 5; under some conditions, such an average converges to a mean term as well.
The dependence of (2.2)–(2.3) on the term \(E[X^{u,t,x}(s)]\) makes the system (2.2)–(2.3) time-inconsistent in the sense that the Bellman principle of optimality does not hold, i.e., the \(t\)-optimal policy \(u^*(t,x, \cdot )\) which minimizes \(J(t,x, u)\) may not be optimal after \(t\): the restriction of \(u^*(t,x,\cdot )\) to \([t', T]\) does not minimize \(J(t',x',u)\) for some \(t'>t\) when the state process is steered to \(x'\) by \(u^*\). Therefore, as noted by Ekeland et al. [14, 15], time-inconsistent optimal solutions (although they exist mathematically) are irrelevant in practice: the decision-maker would not implement the \(t\)-optimal policy at a later time unless forced to do so. The review paper by Zaccour [39] gives a nice guided tour of the concept of time consistency in differential games.
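As a stock illustration of why such nonlinearities break the Bellman principle (a standard argument, not specific to the model above), consider a variance criterion as in the Markowitz problem. By the law of total variance, for \(t<s\),

```latex
% The conditional variance is a nonlinear function of a conditional expectation,
%   \operatorname{Var}_t[X(T)] = E_t[X(T)^2] - (E_t[X(T)])^2,
% so the tower property of conditional expectations does not transfer to the criterion:
\[
  \operatorname{Var}_t[X(T)]
  \;=\;
  E_t\bigl[\operatorname{Var}_s[X(T)]\bigr]
  \;+\;
  \operatorname{Var}_t\bigl[E_s[X(T)]\bigr]
  \;\neq\;
  E_t\bigl[\operatorname{Var}_s[X(T)]\bigr].
\]
```

Hence minimizing the time-\(t\) variance is not the same as minimizing the expected time-\(s\) variance, and the \(t\)-optimal policy need not remain optimal at time \(s\).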
Following Ekeland et al. [14, 15], and Björk and Murgoci [6], we may view the problem as a game and look for a sub-game perfect equilibrium point \(\hat{u}\) in the following sense:
(i) Assume that all players (selves) \(s\), such that \(s>t\), use the strategy \(\hat{u}(s)\).
(ii) Then it is optimal in a certain sense for player (self) \(t\) to also use \(\hat{u}(t)\).
When the players use feedback strategies depending on \(t\) and on the position \(x\) in space, player \(t\) will choose a strategy of the form \(u(t){:=}\varphi (t, x)\), where \(\varphi \) is a deterministic function, so the action chosen by player \(t\) is given by the mapping \(x\longmapsto \varphi (t, x)\). The cost to player \(t\) is given by the functional \(J(t, x, \varphi )\). It is clear that \(J(t, x, \varphi )\) does not depend on the actions taken by any player \(s\) for \(s < t\), so in fact \(J\) depends only on the restriction of the strategy \(u\) to the time interval \([t, T]\). The strategy \(\varphi \) can thus be viewed as a complete description of the chosen strategies of all players in the game.
If feedback strategies are to be used, a deterministic function \(\hat{\varphi }:\, [0,T]\times \mathbb {R}\longrightarrow U\) is a sub-game perfect equilibrium point when the following actions are performed:
(i) Assume that all players (selves) \(s\), such that \(s>t\), use the strategy \(\hat{\varphi }(s,\cdot )\).
(ii) Then it is optimal in a certain sense for player (self) \(t\) to also use \(\hat{\varphi }(t,\cdot )\).
Although the \(t\)-self is intuitively assigned the cost \(J(t, x, u)\) for the initial time-state pair \((t,x)\), one cannot obtain the equilibrium strategy in this continuous time model by considering the unilateral perturbation of \(u(t)\) while the controls of all s-selves, \(s\in (t, T]\), are fixed. This is due to the fact that \(J(t,x, u)\) is insensitive to the modification of \(u(\cdot )\) at a single point of time \(t\). To characterize the equilibrium strategy \({\hat{u}}\), Ekeland et al. [14, 15] suggest the following definition that uses a “local” spike variation in a natural way.
Define the admissible strategy \(u^{\varepsilon }\) as the “local” spike variation of a given admissible strategy \(\hat{u}\in {\mathcal {U}}[0,T]\) over the set \([t,t+\varepsilon ]\),
where \(u\in {\mathcal {U}}[0,T]\) and \(t\in [0,T]\) are arbitrarily chosen. We view \([t, t+\varepsilon ]\) as an infinitesimal coalition \(\text{ Co }[t, t+\varepsilon ]\) of \(s\)-selves which is associated with the dynamics (2.3) and the cost \(J(t, x, u^\varepsilon (\cdot ))\) and which is able to choose its strategy \(u(s)\), \(s\in [t, t+\varepsilon ]\). All future \(s\)-selves, \(s>t+\varepsilon \) affect \(J(t, x, u^\varepsilon (\cdot ))\) by their controls on \((t+\varepsilon , T]\). Let \({\mathcal {U}}[r,s]\) denote the restriction of \({\mathcal {U}}[0,T]\) on \([r,s]\) for \(0 \le r\le s\le T\). Then the strategy space of \(\text{ Co }[t, t+\varepsilon ]\) may be denoted by \({\mathcal {U}}[t, t+\varepsilon ]\).
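For the reader's convenience, the "local" spike variation described above takes the standard form (reconstructed here from the description; the displayed formula is omitted in this excerpt):

```latex
\[
  u^{\varepsilon }(s) \;{:=}\;
  \begin{cases}
    u(s), & s\in [t,t+\varepsilon ],\\[2pt]
    \hat{u}(s), & s\in [0,T]\setminus [t,t+\varepsilon ].
  \end{cases}
\]
```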
Hu et al. [19] suggest the following open-loop form of the local spike variation:
where \(\nu \in L^2(\varOmega , {\mathcal {F}}_t, \mathbb {P}; \mathbb {R}^l)\) is arbitrarily chosen. This form is suitable only when \(U\) is a linear space.
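For reference, the open-loop form of Hu et al. [19] referred to above is standard and reads:

```latex
\[
  u^{\varepsilon }(s) \;{:=}\; \hat{u}(s) + \nu \,\mathbf {1}_{[t,t+\varepsilon ]}(s),
  \qquad s\in [0,T],
\]
```

which explains why it requires \(U\) to be a linear space: the additive perturbation \(\hat{u}(s)+\nu \) must remain admissible.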
For either form of local spike variation, we have the following
Definition 1
The admissible strategy \(\hat{u}\) is a sub-game perfect equilibrium for the system (2.2)–(2.3) if
for all \(u\in {\mathcal {U}}[0,T]\), \(x\in \mathbb {R}\) and \({\mathrm{a.e.}\,}t \in [0,T]\). The corresponding equilibrium dynamics solves the SDE
If feedback strategies are to be used, the previous definition reduces to the following
Definition 2
A deterministic function \({\hat{\varphi }}:\, [0,T]\times \mathbb {R}\longrightarrow U\) is a sub-game perfect equilibrium for the system (2.2)–(2.3) if
for all \(u\in {\mathcal {U}}[0,T], x\in \mathbb {R}\) and \({\mathrm{a.e.}\,}t \in [0,T]\), where \({\hat{u}}(s){:=}{\hat{\varphi }}(s,{\hat{X}}(s)), 0\le s\le T\) and \({\hat{X}}\) is given by (2.9). The associated equilibrium dynamics solves the SDE
For brevity, sometimes we simply call \(\hat{u}\) an equilibrium point when there is no ambiguity.
The purpose of this study is to characterize sub-game perfect equilibria for the system (2.2)–(2.3) by evaluating the limit (2.6) in terms of a stochastic maximum principle criterion. We will apply the general stochastic maximum principle for SDEs of mean-field type derived in Buckdahn et al. [10].
The following assumptions (imposed in [10]) will be in force throughout Sects. 2–3. These assumptions can be weakened, but we do not pursue this here.
Assumption 1
(i) The functions \(b, \sigma , h,g\) are continuous in \((y,z,u)\) and bounded.
(ii) The functions \(b, \sigma , h, g\) are twice continuously differentiable with respect to \((y,z)\), and their derivatives up to second order are continuous in \((y,z, u)\) and bounded.
Although we are interested in characterizing sub-game perfect equilibrium points by considering the action of player \(t\) at a deterministic position \(x\), we perform the analysis for the more general case where player \(t\) has a random variable \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) as a state.
For a given admissible strategy \(u\in {\mathcal {U}}[0,T]\), if player \(t\) has \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) as its state, (2.3) becomes
and the associated cost functional (2.2) becomes
Remark 1
Definitions 1 and 2 can be accordingly generalized by replacing \((t,x)\) by \((t,\xi )\) and the inequality condition takes the form
for all \(u\in {\mathcal {U}}[0,T]\), \(\xi \in L^2(\varOmega , {\mathcal {F}}_t, {\mathbb {P}}; { \mathbb {R}})\) and a.e. \(t\in [0,T]\).
It is a well-known fact, see, e.g., Karatzas and Shreve ([24], pp. 289–290), that under Assumption 1, for any \(u\in {\mathcal {U}}[0,T]\), the SDE (2.10) admits a unique strong solution. Moreover, there exists a constant \(C>0\) which depends only on the bounds of \(b,\sigma \) and their first derivatives w.r.t. \(y,z\), such that, for any \(t\in [0,T], \, u\in {\mathcal {U}}[0,T]\) and \(\xi , \xi ^{\prime }\in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), we also have the following estimates, \(\mathbb {P}-{\mathrm{a.s.}\,}\)
Moreover, the performance functional (2.11) is well defined and finite.
For convenience, we will use the following notation throughout the paper. We will denote by \(X^{t,\xi }{:=}X^{u,t,\xi }\) the solution of the SDE (2.10), associated with the strategy \(u\), and accordingly, \({\hat{X}}^{t,\xi }{:=}X^{{\hat{u}},t,\xi }\) associated with \({\hat{u}}\).
For \(\varphi =b, \sigma , h, g\), we define
Let us introduce the Hamiltonian associated with the r.v. \(X\in L^1(\varOmega ,\mathcal {F}, \mathbb {P})\):
3 Adjoint Equations and the Stochastic Maximum Principle
In this section, we introduce the adjoint equations involved in the SMP which characterize the equilibrium points \({\hat{u}}\in {\mathcal {U}}[0,T]\) of our problem.
The first-order adjoint equation is the following linear backward SDE of mean-field type parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), satisfied by the processes \((p^{t,\xi }(s),q^{t,\xi }(s)),\, s\in [t,T],\)
where, in view of the notation (2.14), for \(j=y,z\),
This equation reduces to the standard one when the coefficients do not explicitly depend on the expected value (or the marginal law) of the underlying diffusion process. Under Assumption 1 on \(b,\sigma , h, g\), by an adaptation of Theorem 3.1 in Buckdahn et al. [9] that keeps track of the parametrization \((t,\xi )\), Eq. (3.1) admits a unique \(\mathbb {F}\)-adapted solution \((p^{t,\xi },q^{t,\xi })\). Moreover, there exists a constant \(C>0\) such that, for all \(t\in [0,T]\) and \(\xi , \xi ^{\prime }\in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), we have the following estimate, \({\mathbb {P}}\)-a.s.,
The second order adjoint equation is the classical linear backward SDE, parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), which appears in Peng’s stochastic maximum principle (see Peng [30]):
where in view of (2.14),
This is a standard linear backward SDE, whose unique \(\mathbb {F}\)-adapted solution \((P^{t,\xi },Q^{t,\xi })\) satisfies the following estimate: There exists a constant \(C>0\) such that, for all \(t\in [0,T]\) and \(\xi ,\xi ^{\prime }\in L^2(\varOmega ,\mathcal {F}_t,\mathbb {P}; \mathbb {R})\),
The SDEs (3.1) and (3.4) have a unique solution for any fixed control \(u\in {\mathcal {U}}[0,T]\) and the corresponding estimates (3.3) and (3.6) hold. However, for Theorem 1 below, only the equilibrium control \({\hat{u}}\) is substituted into the two equations. The following theorem is the first main result of the paper.
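For orientation, when the coefficients depend on the pair \((y,z)=(\text{state},\text{mean})\) as in (2.14), a first-order adjoint equation of mean-field type typically takes the following form (a sketch of the standard structure, with all coefficients evaluated along \((\hat{X}^{t,\xi }(s), E[\hat{X}^{t,\xi }(s)], \hat{u}(s))\); the precise equation is (3.1)):

```latex
\[
\left\{
\begin{aligned}
  dp^{t,\xi }(s) &= -\Bigl( b_y(s)\,p^{t,\xi }(s) + \sigma _y(s)\,q^{t,\xi }(s) + h_y(s)\\
  &\qquad\quad + E\bigl[ b_z(s)\,p^{t,\xi }(s) + \sigma _z(s)\,q^{t,\xi }(s) + h_z(s) \bigr] \Bigr)\,ds
   + q^{t,\xi }(s)\,dW(s), \quad s\in [t,T],\\[2pt]
  p^{t,\xi }(T) &= g_y(T) + E\bigl[ g_z(T) \bigr],
\end{aligned}
\right.
\]
```

in which the expectation terms account for the sensitivity of the cost to the mean of the state, and disappear when the coefficients do not depend on \(z\).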
Theorem 1
(Characterization of equilibrium strategies) Let Assumption 1 hold. Then \({\hat{u}}\) is an equilibrium strategy for the system (2.10)–(2.11) if and only if there are pairs of \(\mathbb {F}\)-adapted processes \(\left( p,q\right) \) and \(\left( P,Q\right) \) which satisfy (3.1)–(3.3) and (3.4)–(3.6), respectively, and for which
In particular, we have
For feedback strategies, the deterministic function \({\hat{\varphi }}: \, [0,T]\times \mathbb {R}\longrightarrow U\) is an equilibrium strategy for the system (2.10)–(2.11) if and only if there are pairs of \(\mathbb {F}\)-adapted processes \(\left( p,q\right) \) and \(\left( P,Q\right) \) which satisfy (3.1)–(3.3) and (3.4)–(3.6), respectively, and for which
Proof
Denote
where the Hamiltonian \(H\) is given by (2.15). By Theorem 2.1 in Buckdahn et al. [8], keeping track of the parametrization \((t,\xi )\), the key relation between the cost functional (2.11) and the associated Hamiltonian (2.15) reads
for arbitrary \(u\in {\mathcal {U}}[0,T]\) and \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), where
for some function \(\bar{\rho }: (0,\infty )\rightarrow (0,\infty )\) such that \(\bar{\rho }(\varepsilon )\downarrow 0\) as \(\varepsilon \downarrow 0\); see Eq. (3.53) of [8] for a similar upper bound estimate of the error term \(R(\varepsilon )\).
Dividing both sides of (3.11) by \(\varepsilon \) and then passing to the limit \(\varepsilon \downarrow 0\), in view of Assumption 1, (3.3) and (3.6), we obtain
Now, if (3.7) holds, by setting \(v{:=}u(t)\) for arbitrary \(u\in {\mathcal {U}}[0,T]\), we also have
Therefore, by (3.12) we obtain (2.12), i.e., \({\hat{u}}\) is an equilibrium point for the system (2.10)–(2.11).
Conversely, assume that (2.12) holds. Then, in view of (3.12), we have
for all \(u\in {\mathcal {U}}[0,T]\), \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) and \({\mathrm{a.e.}\,}t \in [0,T]\). Now, let \(A\) be an arbitrary set in \({\mathcal {F}}_t\) and set
for an arbitrary \(v\in U\). Obviously, \(u\) is an admissible strategy. Moreover, we have, for every \( s\in [t,T]\),
and
Hence, in view of (3.13), we have
which in turn yields inequality (3.7) since \(v\in U\) and the set \(A\in {\mathcal {F}}_t\) are arbitrary.
Finally, both (3.8) and (3.9) follow from (3.7), by replacing \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) with \(x\in \mathbb {R}\). \(\square \)
Remark 2
Theorem 1 does not address uniqueness. Multiple controls may satisfy (2.10) and (3.1)–(3.7), in which case each of them is an equilibrium strategy.
Remark 3
Define the so-called \({\mathcal {H}}\)-function associated with \((\hat{u}(t),p^{t,\xi }(t),q^{t,\xi }(t), P^{t,\xi }(t))\)
Then, it is easily checked that inequality (3.7) is equivalent to
For all practical purposes, it is desirable to find or characterize equilibrium points by maximizing only the Hamiltonian \(H\), which amounts to solving only the first-order adjoint equation (3.1). This happens in the special case where the diffusion coefficient does not contain the control variable, i.e.,
whence, manifestly, inequality (3.7) is equivalent to
for all \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}), \; {\mathrm{a.e.}\,}t \in [0,T],\; {\mathbb {P}}-a.s.\)
Another very useful case, which we will use in some examples below, is described in the following
Proposition 1
Assume that \(U\) is a convex subset of \(\mathbb {R}\), and the coefficients \(b, \sigma \) and \(h\) satisfy Assumption 1, and are locally Lipschitz in \(u\). Then, the admissible strategy \({\hat{u}}\) is an equilibrium point for the system (2.10)–(2.11) if and only if there is a pair of \(\mathbb {F}\)-adapted processes \(\left( p^{t,\xi },q^{t,\xi }\right) \) that satisfies (3.1)–(3.3) and for which
for all \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}),\; {\mathrm{a.e.}\,}t \in [0,T],\; {\mathbb {P}}-a.s.\)
Proof
In view of (3.14), it suffices to show that \({\mathcal {H}}\) and \(H\) have the same Clarke generalized gradient at \({\hat{u}}\). This follows, for instance, from Lemma 5.1 in Yong and Zhou [38], since \(U\) is a convex subset of \(\mathbb {R}\), the coefficients \(b, \sigma \) and \(h\) are locally Lipschitz in \(u\) and, by Assumption 1, their derivatives in \(y\) are continuous in \((y,u)\). Hence, \({\hat{u}}\) is a maximizer of \({\mathcal {H}}(t,\xi ,\cdot ,p^{t,\xi }(t),q^{t,\xi }(t))\) if and only if it is a maximizer of \(H(t,\xi ,\cdot ,p^{t,\xi }(t), q^{t,\xi }(t))\). \(\square \)
Remark 4
In fact, both Theorem 1 and Proposition 1 extend to the following cost functionals parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}){:}\)
where both \(h\) and \(g\) are allowed to depend explicitly on \((t,x)\). This is because the spike variation and the subsequent Taylor expansions used to derive (3.11) are not affected by this extra dependence of \(h\) and \(g\) on \((t,x)\).
Remark 5
Theorem 1 and Proposition 1 extend to more general mean-field couplings than the mean. For couplings of the form \(E[\phi (X^{t,\xi }(s))]\) with sufficiently smooth functions \(\phi \), the SMP developed in Andersson and Djehiche [1] may be used to derive similar results. For the more general coupling involving the probability distribution \({\mathcal {L}}(X^{t,\xi }(s))\) of \(X^{t,\xi }(s)\), the SMP derived in Carmona and Delarue [13] together with the flow properties of solutions of (2.3) obtained recently by Buckdahn et al. [11] are to be used to obtain a similar characterization of the sub-game perfect equilibrium points.
4 Some Applications
In this section, we illustrate the above results through some examples discussed in Björk and Murgoci [6] and Björk et al. [7] using an extended Hamilton–Jacobi–Bellman equation. In these examples, we look for equilibrium strategies of feedback type, i.e., a deterministic function \(\hat{\varphi }: [0,T]\times \mathbb {R}\longrightarrow U\) which satisfies (3.9). The corresponding equilibrium point is \({\hat{u}}(s){:=}\hat{\varphi }(s,{\hat{X}}(s))\), where \({\hat{X}}\) is the equilibrium dynamics given by the SDE
Although Assumption 1 does not hold for the cost functionals of this section (for instance, the quadratic cost), the stochastic maximum principle in Theorem 1 can still be proved in a similar manner by exploiting the linearity of the dynamics. These details are omitted here.
4.1 Mean-Variance Portfolio Selection with Constant Risk Aversion
The dynamics over \([0,T]\) defined on \((\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})\) is given by the following SDE:
where \(r, \alpha \) and \(\sigma \) are real constants, and \(\alpha >r\).
The cost functional is given by
where the constant \(\gamma >0\) is the risk aversion coefficient. The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is
The Hamiltonian associated to this system is
and the \({\mathcal {H}}\)-function is
The equation for \(P\) takes the form
where \(P^{t,x}(T)= -\gamma \). We obtain \(P^{t,x}(s)= -\gamma e^{2r(T-s)}\) for \(s\in [t, T]\).
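A quick numerical sanity check (with hypothetical parameter values) that the quoted solution \(P^{t,x}(s)=-\gamma e^{2r(T-s)}\) solves the linear backward equation it implies, \(dP(s)=-2rP(s)\,ds\) with \(P(T)=-\gamma \):

```python
import math

gamma, r, T = 2.0, 0.05, 1.0  # hypothetical parameter values

def P_closed(s: float) -> float:
    """Closed-form solution P(s) = -gamma * exp(2r(T - s))."""
    return -gamma * math.exp(2 * r * (T - s))

def P_numeric(s: float, n_steps: int = 100_000) -> float:
    """Backward Euler for dP/ds = -2 r P with terminal condition P(T) = -gamma."""
    dt = (T - s) / n_steps
    P = -gamma
    for _ in range(n_steps):
        P -= dt * (-2 * r * P)  # step backward in time from T toward s
    return P
```

The backward Euler iterate agrees with the closed form to the discretization error, confirming the exponential solution of this scalar linear backward ODE.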
In view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. Such a maximum exists if and only if
Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:
We try a solution of the form
where \(A_s\) and \(C_s\) are deterministic functions such that
Identifying the coefficients in (4.3) and (4.6), we get, for \(s\ge t\),
In view of (4.5), we have
Now, from (4.7), we have
which is deterministic and independent of \(x\).
Hence, from (4.5), we get
In view of (4.9), the equilibrium point is the deterministic function
It remains to determine \(A_s\) and \(C_s\).
Indeed, inserting (4.11) in (4.8), we obtain
giving the equations satisfied by \(A_s\) and \(C_s\)
The solutions of these equations are
Whence, we obtain the following explicit form of the equilibrium point:
which is identical to the one obtained in Björk and Murgoci [6] by solving an extended HJB equation.
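Assuming the standard closed form of this equilibrium (as reported in Björk and Murgoci [6]), \(\hat{\varphi }(t)=\frac{\alpha -r}{\gamma \sigma ^2}\,e^{-r(T-t)}\), deterministic and independent of the wealth \(x\), a minimal numerical sketch with hypothetical market parameters:

```python
import math

# Hypothetical market parameters.
alpha, r, sigma, gamma, T = 0.08, 0.02, 0.2, 2.0, 1.0

def phi_hat(t: float) -> float:
    """Equilibrium amount invested in the risky asset under constant risk aversion
    (standard closed form; deterministic and independent of the current wealth x)."""
    return (alpha - r) / (gamma * sigma ** 2) * math.exp(-r * (T - t))
```

At the horizon, \(\hat{\varphi }(T)=\frac{\alpha -r}{\gamma \sigma ^2}\), the myopic Merton-type ratio; for \(r>0\) the invested amount increases toward this value as \(t\uparrow T\).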
4.2 Mean-Variance Portfolio Selection with State Dependent Risk Aversion
Consider the same state process over \([0,T]\) as in Sect. 4.1. Namely,
where \(r, \alpha \) and \(\sigma \) are real constants. The modified cost functional takes the form
where the risk aversion coefficient \(\gamma (x)>0\) is made dependent on the current wealth \(x\). We refer to Björk et al. [7] for an economic motivation of this dependence.
The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is
Now, since \(\gamma (x)\) is assumed strictly positive for all \(x\), the equilibrium points of \(J\) are the same as the ones of the cost functional
Therefore, we will find feedback equilibrium points associated with (4.14).
The Hamiltonian associated to this system is
and the \({\mathcal {H}}\)-function is
Again, in view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. Such a maximum exists if and only if
Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:
We try a solution of the form
where \(A_s, B_s\), and \(C_s\) are deterministic functions such that
Identifying the coefficients in (4.13) and (4.16), we get for \(s\ge t\),
and, by (4.15), we have
But, from (4.17), we have
Therefore, we get from (4.20)
which together with (4.19) suggests an equilibrium point \( \hat{\varphi }\) of the form
It remains to determine \(A_s\) and \(C_s\).
Indeed, inserting (4.22) in (4.18), we obtain,
From (4.23), it is hard to draw any conclusion about the form of the deterministic functions \(A_s\) and \(C_s\) unless we have an explicit form of the function \(\gamma (x)\). In fact, a closer look at (4.23) suggests that a feasible identification of the coefficients is possible, for instance, when \(\gamma (x)=\frac{\gamma }{x}\). Let us examine this case.
4.2.1 The Case \(\gamma (x)=\frac{\gamma }{x}\)
Let us consider the particular case when
In this special case, (4.23) becomes
This suggests that the functions \(A_s, B_s\) and \(C_s\) solve the following system of equations:
which admits the following explicit solution:
Hence, the equilibrium point \(\hat{\varphi }\) is explicitly given by
4.3 Time-Inconsistent Linear-Quadratic Regulator
We consider the following variant of a time-inconsistent linear-quadratic regulator discussed in Björk and Murgoci [6]. We refer to recent work by Bensoussan et al. [4], Yong [36], and Hu et al. [19], where more general models are considered. The state process over \([0,T]\) defined on \((\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})\) is a scalar with dynamics
where \( a, b\), and \(\sigma \) are real constants. The cost functional is given by
where \(\gamma \) is a positive constant. The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is
As mentioned in Björk and Murgoci [6], in this time-inconsistent version of the linear-quadratic regulator, we want to control the system so that the final state \(X^{t,x}(T)\) stays as close as possible to \(X^{t,x}(t)=x\), while at the same time, we keep the control energy (expressed by the integral term) small. The time-inconsistency stems from the fact that the target point \(X^{t,x}(t)=x\) is changing with time.
The Hamiltonian associated to this system is
and the \({\mathcal {H}}\)-function is
Again, in view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. Such a maximizer is
Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:
We try a solution of the form
where \(\alpha _s\) and \(\beta _s\) are deterministic functions such that
Identifying the coefficients in (4.26) and (4.29), we get, for \(s\ge t\),
and
On the other hand, in view of (4.28)
Thus, by (4.30), the function \(\varphi \) which yields the equilibrium point has the form
Therefore, (4.31) reduces to
suggesting that \((\alpha _s,\beta _s)\) solves the system of equations
The first equation in (4.33) yields the solution
The second equation is of Riccati type, whose solution is \(\alpha _s{:=}\frac{v_s}{w_s}\), where \((v,w)\) solves the following system of linear differential equations:
which is obviously solvable.
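The substitution \(\alpha =v/w\), which turns the Riccati equation into a linear system for \((v,w)\), is easy to check numerically. The sketch below integrates a generic scalar Riccati ODE \(\alpha '=p\alpha ^2+q\alpha +r\) this way; the coefficients \(p,q,r\) are hypothetical placeholders, not the specific ones arising from (4.33).

```python
# Linearizing a scalar Riccati ODE via the substitution alpha = v / w.
# The coefficients p, q, r are hypothetical; the paper's equation (4.33)
# has its own specific coefficients.
def solve_riccati(p, q, r, alpha0, T, n_steps=100_000):
    """Integrate alpha' = p*alpha**2 + q*alpha + r on [0, T].

    Substituting alpha = v/w turns the Riccati equation into the linear
    system  v' = q*v + r*w,  w' = -p*v,  with v(0) = alpha0, w(0) = 1.
    """
    dt = T / n_steps
    v, w = alpha0, 1.0
    for _ in range(n_steps):
        dv = q * v + r * w
        dw = -p * v
        v, w = v + dt * dv, w + dt * dw
    return v / w

# Sanity check against a closed form: alpha' = -alpha**2, alpha(0) = 1
# has the solution alpha(t) = 1 / (1 + t), so alpha(1) = 0.5.
approx = solve_riccati(p=-1.0, q=0.0, r=0.0, alpha0=1.0, T=1.0)
```

For this particular test case the Euler scheme is exact (\(v\) stays constant and \(w\) grows linearly), so the returned value matches \(1/2\) up to rounding.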
5 Extension to Mean-Field Game Models
In this section, we extend the SMP approach to an \(N\)-player stochastic differential game of mean-field type where the \(i\)th player would like to find a strategy to optimize her own cost functional regardless of the other players’ cost functionals.
Let \(X=(X_1, \ldots , X_N)\) describe the states of the \(N\) players and \(u=(u_1, \ldots , u_N)\in \prod _{i=1}^N\mathcal {U}_i[0,T]\) be the ensemble of all the individual admissible strategies. Each \(u_i\) takes values in a non-empty subset \(U_i\) of \(\mathbb {R}\), and the class of admissible strategies is given by
To simplify the analysis, we consider a population of uniform agents, so that \(U_i=U\) and all players have the same initial state \(X_i(0)=x_0\) at time 0 for all \(i\in \{1, \ldots , N\}\). In this case, the \(N\) sets \(\mathcal {U}_i[0,T]\) are identical and equal to \(\mathcal {U}[0,T]\). Let the dynamics be given by the following SDE:
where the strategy \(u_i\) does not enter the diffusion coefficient \(\sigma \).
For notational simplicity, we do not explicitly indicate the dependence of the state on the control by writing \(X_i^{u_i}(s)\). We take \({\mathbb {F}}\) to be the natural filtration of the \(N\)-dimensional standard Brownian motion \((W_1, \ldots , W_N)\) augmented by \({\mathbb {P}}\)-null sets of \({\mathcal {F}}\).
Denote
Then, the \(i\)th player selects \(u_i\in \mathcal {U}[0,T]\) to evaluate her cost functional
where
The associated dynamics, parametrized by \((t,x_i)\), is
The \(i\)th player interacts with others through the mean-field coupling term
which models the aggregate impact of all other players.
Note that the \(i\)th player assesses her cost functional over \([t,T]\) from her local state \(X_i(t)= x_i\), and she knows only the initial states of all other players at time 0, namely \(X_k(0)=x_0\) for \(k\ne i\). Thus the game may be cast as a decision problem in which each player has incomplete state information about the other players. The development of a solution framework in terms of an exact equilibrium notion is challenging. Our objective is to address this incomplete state information issue and design a set of individual strategies with a meaningful interpretation. This will be achieved by using the so-called consistent mean-field approximation.
For a large \(N\), even if each player has full state information of the system, the exact characterization of the equilibrium points based on the SMP has high complexity, since each player's problem leads to a variational inequality for the underlying Hamiltonians similar to (3.7), which is further coupled with the state processes of all the other players. Therefore, we rely on the mean-field approximation of our system.
We note that \(J^{i,N}\) depends on not only \(u_i\), but also all other players’ strategies \(u_{-i}\) through the mean-field coupling term \(X^{(-i)}\). This suggests that we extend Definition 1 to the \(N\)-player case as follows.
Definition 3
The admissible strategy \({\hat{u}}=({\hat{u}}_1, \ldots , {\hat{u}}_N)\) is a \(\delta _N\)-sub-game perfect equilibrium point for \(N\) players in the system (5.2)–(5.3) if for every \(i\in \{1,\ldots ,N\}\),
for each given \(u_i\in \mathcal {U}_i[0,T],\,x_i\in \mathbb {R}\) and \(\, {\mathrm{a.e.}\,}t\in [0,T]\), where \(u_i^{\varepsilon }\) is the spike variation (2.4) of the strategy \({\hat{u}}_i\) of the \(i\)th player using \(u_i\) and \(0\le \delta _N\rightarrow 0 \) as \(N\rightarrow \infty \).
The error term \(O(\delta _N)\) is due to the mean-field approximation to be introduced below for designing \({\hat{u}}\).
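For illustration, the spike variation (2.4) underlying Definition 3 can be sketched on a discretized time grid: the equilibrium strategy is overwritten by the perturbing control only on the window \([t,t+\varepsilon )\). The grid and control values below are hypothetical.

```python
# Spike variation of a control on a discretized time grid, mirroring (2.4):
# the equilibrium control u_hat is replaced by u on [t, t + eps) and kept
# unchanged elsewhere.  Grid and control values are hypothetical examples.
def spike_variation(u_hat, u, grid, t, eps):
    """Return the spike-varied control evaluated on the given time grid."""
    return [u_s if t <= s < t + eps else uh_s
            for s, uh_s, u_s in zip(grid, u_hat, u)]

grid = [k * 0.1 for k in range(10)]   # times 0.0, 0.1, ..., 0.9
u_hat = [0.0] * 10                    # equilibrium control (toy values)
u = [1.0] * 10                        # perturbing control (toy values)
u_eps = spike_variation(u_hat, u, grid, t=0.3, eps=0.2)
```

Only the two grid points falling in \([0.3,0.5)\) are perturbed; everywhere else the equilibrium control is retained.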
5.1 The Local Limiting Decision Problem
Let \(X^{(-i)}\) be approximated by a deterministic function \(\bar{X}(s)\) on \([0,T]\). Denote the cost functional
which is intended as an approximation of \(J^{i,N}\). Note that once \(\bar{X}\) is assumed fixed, \(\bar{J}^{i}\) is affected only by \(u_i\). The introduction of \(\bar{X}\) as a fixed function of time is based on the freezing idea in mean-field games. The reason is that \(X^{(-i)}=\frac{1}{N-1} \sum _{k\ne i}^N X_k\) is generated by many negligibly small players, and therefore, a given player has little influence on it.
The strategy selection of the \(i\)th player is based on finding a sub-game perfect equilibrium for \(\bar{J}^{i}\), to which the method based on the stochastic maximum principle [cf. (3.8)] of Sect. 3 can be applied under the following conditions:
Assumption 2
-
(i)
The functions \(b(s,y,z,u), \sigma (s,y,z), h(s,y,z,w,u),g(y,z,w)\) are bounded.
-
(ii)
The functions \(b, \sigma \) are differentiable with respect to \((y,z)\). The derivatives are Lipschitz continuous in \((y,z)\) and bounded.
-
(iii)
The functions \( h, g\) are differentiable with respect to \((y,z,w)\), and their derivatives are continuous in \((y,z, w, u)\) and \((y,z,w)\), respectively, and bounded.
Let \({\hat{u}}_i\in \mathcal {U}[0,T]\) be a sub-game perfect equilibrium point for (5.4) and (5.6) and denote the associated backward SDE
where for \(\zeta =y,z\),
for which
The closed-loop equilibrium state associated to \({\hat{u}}_i\) of the \(i\)th player is given by
We call \({\hat{u}}_i\) a decentralized strategy in that its sample paths depend only on the local Brownian motion \(W_i\). The processes \(\{{\hat{u}}_k, 1\le k\le N\}\) are independent. Further, we impose
Assumption 3
All the processes \(\{{\hat{u}}_k, 1\le k\le N\}\) have the same law.
This restriction ensures that \(\{{{\hat{X}}}_i, 1\le i\le N\}\) are i.i.d. random processes. Since each \({\hat{u}}_i\) is obtained as a process adapted to the filtration generated by \(W_i\), it can be represented as a non-anticipative functional \(\hat{F}(\{W_i(s)\}_{s\le t}) \) of \(W_i\). For a given \(\bar{X}\), if non-uniqueness of \({\hat{u}}_i\) arises, we stipulate that the same functional \({\hat{F}}\) is used by all players applying their respective Brownian motions so that all the individual control processes have the same law. This means some coordination is necessary for the strategy selection under non-uniqueness. By the law of large numbers, the consistency condition on \(\bar{X}\) reads
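In simple cases, the consistency condition can be computed by Picard (fixed-point) iteration on the frozen mean field. A minimal sketch, assuming a toy deterministic mean dynamics \(m'(s)=am(s)+b\bar{X}(s)\) with hypothetical coefficients \(a,b\), whose consistent solution (\(\bar{X}=m\)) is \(m(s)=x_0e^{(a+b)s}\):

```python
# Picard iteration for the mean-field consistency condition
# X_bar(s) = E[X_hat(s)], on a toy deterministic mean dynamics
# m'(s) = a*m(s) + b*X_bar(s).  Coefficients a, b are hypothetical;
# the consistent solution solves m' = (a + b)*m.
import math

def iterate_consistency(a, b, x0, T, n_steps=2000, n_iter=50):
    dt = T / n_steps
    x_bar = [x0] * (n_steps + 1)      # initial guess: frozen at x0
    for _ in range(n_iter):
        m = [x0] * (n_steps + 1)
        for k in range(n_steps):      # Euler step for m' = a*m + b*x_bar
            m[k + 1] = m[k] + dt * (a * m[k] + b * x_bar[k])
        x_bar = m                     # update the frozen mean field
    return x_bar[-1]

terminal = iterate_consistency(a=-1.0, b=0.5, x0=1.0, T=1.0)
exact = math.exp(-0.5)                # x0 * exp((a + b) * T)
```

The iteration converges since the Picard map is a contraction on \([0,T]\) for these toy coefficients; the terminal value matches \(x_0e^{(a+b)T}\) up to discretization error.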
A question of central interest is how to characterize the performance of the set of strategies \({\hat{u}}=({\hat{u}}_1, \ldots , \hat{u}_N)\) when they are implemented and assessed according to the original cost functionals \(\{J^{i,N}, 1\le i\le N\}\). An answer is provided in the following theorem for which the proof is displayed in the next section. This is the second main result of the paper.
Theorem 2
Under Assumptions 2 and 3, suppose there exists a solution to (5.7), (5.9) and (5.10). Then we have
Moreover, \({\hat{u}}=({\hat{u}}_1, \ldots , {\hat{u}}_N)\in \prod _{i=1}^N\mathcal {U}[0,T]\) is a \(\delta _N\)-sub-game perfect equilibrium for the system (5.2)–(5.3) where \(\delta _N\le C/\sqrt{N}\) and \(C\) depends only on \((b,\sigma , h, g,T)\). \(\square \)
If there exists a unique solution \((\bar{X}, {\hat{u}}_i)\) to (5.7), (5.9) and (5.10), each player can locally construct its strategy. When there are multiple solutions, the players need to coordinate to choose the same \(\bar{X}\) and further ensure that \(\{{\hat{u}}_i, 1\le i\le N\}\) have the same law.
6 Proof of Theorem 2
This section is devoted to the proof of Theorem 2. We first establish some performance estimates which will be used to conclude the proof of the theorem.
6.1 The Performance Estimate
We have
Now we fix \(i\in \{1, \ldots , N\}\) and change \({\hat{u}}_i\) to \(u_i^\varepsilon \) when all other players apply \({\hat{u}}_{-i}\), where
and \(u_i\in {\mathcal {U}}[0,T]\). We have
where \(X^{t,x_i}_i\) is the solution of (5.4) with admissible strategy \(u_i^\varepsilon \). The following estimates will be frequently used in the sequel.
Lemma 1
For the \(i\)th player, let \( X_i\) and \( {{\hat{X}}}_i\) be the state processes corresponding to \(u_i^\varepsilon \) and \({\hat{u}}_i\), respectively. Then
where \(C\) does not depend on \((t, x_i)\).
Proof
Using the SDEs (5.4) for the two state processes, we have
By the Burkholder–Davis–Gundy inequality, we have
where \(C\) is a positive constant.
In view of Assumption 2-(i), letting the positive constant \(C\) denote the bound of \(b\), we have
Thus, since \(b\) is Lipschitz in \((y,z)\), by Assumption 2-(ii), we have
The Cauchy–Schwarz inequality yields
In a similar fashion, since \(\sigma \) is Lipschitz in \((y,z)\), by Assumption 2-(ii), we obtain
Therefore,
The lemma follows from Gronwall’s lemma. \(\square \)
Lemma 2
We have
Proof
We write
Then, by the Burkholder–Davis–Gundy inequality, we have
By the Lipschitz condition on \(b\) and \(\sigma \) (their derivatives w.r.t \((y,z)\) being bounded), we further obtain
which combined with Gronwall’s lemma yields the desired estimate. \(\square \)
Corollary 1
We have, for \(N\ge 2\),
where \(C\) does not depend on \(N\).
Proof
Thanks to Assumption 3, \({\hat{X}}_1, \ldots , {\hat{X}}_N\) are i.i.d. processes. The estimate follows from Lemma 2. \(\square \)
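The \(O(1/N)\) rate behind Corollary 1 is driven by the law of large numbers for the i.i.d. processes \({\hat{X}}_1,\ldots ,{\hat{X}}_N\). A quick Monte Carlo sanity check of this scaling, using hypothetical i.i.d. Gaussian terminal states in place of the actual equilibrium processes:

```python
# Numeric check of the O(1/N) concentration in Corollary 1: the mean
# squared deviation of the empirical mean of N i.i.d. toy Gaussian
# terminal states from the true mean scales like C/N.
import random

def mse_of_mean(N, n_trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(N)) / N
        total += sample_mean ** 2     # true mean is 0
    return total / n_trials

mse_small, mse_large = mse_of_mean(25), mse_of_mean(400)
```

For standard Gaussians the exact value is \(1/N\), so increasing \(N\) by a factor of 16 should shrink the mean squared error by roughly the same factor.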
6.2 Proof of Theorem 2
In order to estimate \( J^{i,N}(t,x_i; {\hat{u}})-J^{i,N}(t,x_i; {\hat{u}}_{-i}, u_i^\varepsilon )\), we introduce some notation. Let
We have
Similarly,
Now, noting that
the cost difference satisfies
We proceed to estimate
Lemma 3
We have
Proof
We will only estimate \(E\left[\int _t^T \varDelta _{h2}(s)\hbox {d}s\right]\). The second term may be handled in a similar fashion. Let
and
Then we have
Noting Assumption 2-(iii) on \(h\), we may perform calculations similar to those leading to (6.2) to obtain
Therefore,
Therefore, by the Cauchy–Schwarz inequality, we get
Subsequently by Lemma 1 and Corollary 1,
\(\square \)
Finally, we combine Lemma 3 and the relation (6.7) to conclude
Furthermore, since \({\hat{u}}\) is determined by (5.7)–(5.10),
we finally get
This completes the proof of Theorem 2. \(\square \)
7 A Mean-Field LQG Game
Consider a system of \(N\) players. The dynamics of the \(i\)th player is given by
Denote \(x=(x_1, \ldots , x_N)\) and \(u=(u_1,\ldots ,u_N)\). The \(i\)th player's cost functional at time \(t\) is
where \(X^{(-i)}(t)=\frac{1}{N-1}\sum _{k\ne i}^N X_k(t)\). We take \(\varGamma _1\ne 0\) and \(\varGamma _2\ne 0\). A simple interpretation of the terminal cost is that each agent wants to adjust its terminal state based on its current state and also the mean-field term \(X^{(-i)}\) at time \(T\). The cost functional is time inconsistent. Below, we will apply a consistent mean-field approximation to construct a limiting control problem.
Following the scheme in Sect. 5, we introduce \(\bar{X}_T\) as an approximation of \(X^{(-i)}(T)\). The new cost functional is
This is a time-inconsistent control problem. The same approach as in Sect. 4.3 can be applied. The adjoint equation now reads
We look for a solution of the form
The same set of ODEs is obtained as in Sect. 4.3. The equilibrium strategy is given in the feedback form
when the current state is \(x_i\). The closed-loop equilibrium dynamics of the \(i\)th player is
Finally, we impose the consistency requirement. Assume all players have the same initial condition \(y_0\), and so \(\bar{X}_T\) can be obtained as \(E{\hat{X}}_i(T)\). Now we take expectation in (7.3) to construct the ODE
With the obvious notation for the transition function \(\varPhi \), we write the solution of this ODE as
Now the consistency condition for \(\bar{X}\) becomes
For this approach to have a solution for any given \(y_0\), we need
If (7.4) holds, we can solve \(\bar{X}_T\) first and next determine the strategy (7.2).
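When the closed-loop mean dynamics is affine in \(\bar{X}_T\), the consistency condition reduces to a scalar linear equation, solvable precisely when the analogue of (7.4) holds. A sketch with toy coefficients (hypothetical placeholders, not the paper's \((a,b,\varGamma _1,\varGamma _2)\)):

```python
# Solving the consistency condition for X_bar_T when the closed-loop mean
# dynamics is affine: m'(s) = a*m(s) + c*X_bar_T with hypothetical toy
# coefficients a, c, so E X_hat(T) = A*y0 + B*X_bar_T for constants A, B
# obtained from the transition function.  The analogue of condition (7.4)
# is B != 1, in which case X_bar_T = A*y0 / (1 - B).
import math

def consistent_terminal_mean(a, c, y0, T):
    A = math.exp(a * T)                       # Phi(T, 0)
    B = c * (math.exp(a * T) - 1.0) / a       # integral of Phi(T, s)*c ds
    if abs(1.0 - B) < 1e-12:
        raise ValueError("consistency condition (analogue of (7.4)) fails")
    return A * y0 / (1.0 - B)

x_bar_T = consistent_terminal_mean(a=-1.0, c=0.5, y0=2.0, T=1.0)
```

One can verify directly that the returned value satisfies the fixed-point equation \(\bar{X}_T = A y_0 + B \bar{X}_T\).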
7.1 The Performance Difference
Suppose (7.4) holds. For the performance estimate, we consider the following set of admissible strategies
which is smaller than \({\mathcal {U}}[0,T]\). The costs associated with \({\hat{u}}\) and \((u_i, {\hat{u}}_{-i})\) are, respectively, given by
The difference can be written as
where
For any fixed \(u_i\in {\mathcal {U}}_0[0,T],\) we can still prove Lemma 1. Corollary 1 also holds for \(\hat{u}_j, 1\le j\le N\). We have
where \(C\) may depend on \(u_i\). If \(u_i\in {\mathcal {U}}[0,T]\) were considered, we would be unable to obtain the second inequality above. Finally,
Thus, \({\hat{u}}\) is a \(\delta _N\)-sub-game perfect Nash equilibrium for \(N\) players where \(\delta _N\le C/\sqrt{N-1}\). \(\square \)
References
Andersson D, Djehiche B (2010) A maximum principle for SDEs of mean-field type. Appl Math Optim 63(3):341–356
Bardi M (2012) Explicit solutions of some linear-quadratic mean field games. Netw Heterog Media 7(2):243–261
Bensoussan A, Frehse J, Yam P (2013) Mean field games and mean field type control theory, Briefs in Mathematics. Springer, New York
Bensoussan A, Sung KCJ, Yam SCP (2013) Linear–quadratic time-inconsistent mean field games. Dyn Games Appl 3(4):537–552
Bensoussan A, Sung KCJ, Yam SCP, Yung SP (2011) Linear-quadratic mean-field games (preprint). arXiv:1404.5741
Björk T, Murgoci A (2008) A general theory of Markovian time inconsistent stochastic control problems. SSRN:1694759
Björk T, Murgoci A, Zhou XY (2014) Mean-variance portfolio optimization with state-dependent risk aversion. Math Finance 24(1):1–24
Buckdahn R, Cardaliaguet P, Quincampoix M (2011) Some recent aspects of differential game theory. Dyn Games Appl 1(1):74–114
Buckdahn R, Li J, Peng S (2009) Mean-field backward stochastic differential equations and related partial differential equations. Stoch Proc Appl 119(10):3133–3154
Buckdahn R, Djehiche B, Li J (2011) A general stochastic maximum principle for SDEs of mean-field type. Appl Math Optim 64(2):197–216
Buckdahn R, Li J, Peng S, Rainer C (2014) Mean-field stochastic differential equations and associated PDEs (preprint). arXiv:1407.1215
Carmona R, Delarue F (2013) Probabilistic analysis of mean-field games. SIAM J Control Optim 51(4):2705–2734
Carmona R, Delarue F (2013) Forward–backward stochastic differential equations and controlled Mckean–Vlasov dynamics (preprint). arXiv:1303.5835v1
Ekeland I, Lazrak A (2006) Being serious about non-commitment: subgame perfect equilibrium in continuous time. arXiv:math/0604264
Ekeland I, Pirvu TA (2008) Investment and consumption without commitment. Math Financ Econ 2:57–86
Elliott RJ, Li X, Ni Y-H (2013) Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica 49(11):3222–3233
Goldman SM (1980) Consistent plans. Rev Econ Stud 47:533–537
Gomes DA, Mohr J, Souza RR (2010) Discrete time, finite state space mean field games. J Math Pures Appl 93:308–328
Hu Y, Jin H, Zhou XY (2012) Time-inconsistent stochastic linear-quadratic control. SIAM J Control Optim 50(3):1548–1572
Huang M (2010) Large-population LQG games involving a major player: the Nash certainty equivalence principle. SIAM J Control Optim 48(5):3318–3353
Huang M, Caines PE, Malhamé RP (2003) Individual and mass behaviour in large population stochastic wireless power control problems: centralized and Nash equilibrium solutions. In: Proceedings of the 42nd IEEE CDC, Maui, HI, 98–103
Huang M, Malhamé RP, Caines PE (2006) Large population stochastic dynamic games: closed-loop McKean–Vlasov systems and the Nash certainty equivalence principle. Commun Inf Syst 6(3):221–251
Huang M, Caines PE, Malhamé RP (2007) Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized \(\varepsilon \)-Nash equilibria. IEEE Trans Autom Control 52(9):1560–1571
Karatzas I, Shreve SE (1987) Brownian motion and stochastic calculus. Springer, New York
Kolokoltsov VN, Li J, Yang W (2011) Mean field games and nonlinear Markov processes (preprint). arXiv:1112.3744
Lasry J-M, Lions P-L (2007) Mean field games. Jpn J Math 2(1):229–260
Li T, Zhang J-F (2008) Asymptotically optimal decentralized control for large population stochastic multiagent systems. IEEE Trans Autom Control 53(7):1643–1660
Nourian M, Caines PE (2013) \(\epsilon \)-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM J Control Optim 51(4):3302–3331
Peleg B, Yaari ME (1973) On the existence of a consistent course of action when tastes are changing. Rev Econ Stud 40:391–401
Peng S (1990) A general stochastic maximum principle for optimal control problems. SIAM J Control Optim 28(4):966–979
Phelps ES, Pollak RA (1968) On second-best national saving and game-equilibrium growth. Rev Econ Stud 35:185–199
Pollak RA (1968) Consistent planning. Rev Econ Stud 35:201–208
Strotz R (1955) Myopia and inconsistency in dynamic utility maximization. Rev Econ Stud 23:165–180
Tembine H, Zhu Q, Basar T (2011) Risk-sensitive mean-field stochastic differential games. In: Proceedings of the 18th IFAC world congress, Milan, Italy
Weintraub GY, Benkard CL, Van Roy B (2008) Markov perfect industry dynamics with many firms. Econometrica 76(6):1375–1411
Yong J (2013a) Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM J Control Optim 51(4):2809–2838
Yong J (2013b) Linear-quadratic optimal control problems for mean-field stochastic differential equations: time-consistent solutions (preprint). arXiv:1304.3964
Yong J, Zhou XY (1999) Stochastic controls: Hamiltonian systems and HJB equations. Springer, New York
Zaccour G (2008) Time consistency in cooperative differential games: a tutorial. INFOR 46(1):81–92
Acknowledgments
Boualem Djehiche received financial support from the Swedish Export Credit Corporation (SEK) which is gratefully acknowledged. Many thanks to Tomas Björk and Georges Zaccour for fruitful discussions and their insightful comments. Minyi Huang’s work was partially supported by Natural Sciences and Engineering Research Council (NSERC) of Canada.
Djehiche, B., Huang, M. A Characterization of Sub-game Perfect Equilibria for SDEs of Mean-Field Type. Dyn Games Appl 6, 55–81 (2016). https://doi.org/10.1007/s13235-015-0140-8