1 Introduction

In dynamic decision-making problems, a policy is time consistent if, whenever it is optimal at time \(t\), it remains optimal when implemented at a later time \(s>t\). In optimal control, this is known as the Bellman principle. A time-inconsistent policy need not be optimal at a later time \(s>t\), even if it is optimal at time \(t\). Time inconsistency occurs, for example, when a hyperbolic discount rate is preferred to an exponential discount rate, or when the performance criterion is a nonlinear function of the expected utility, such as the variance in the standard Markowitz investment problem. For a recent review of time consistency in dynamic decision-making problems, we refer to Ekeland and Lazrak [14], Ekeland and Pirvu [15], and Zaccour [39].

In his work on a deterministic Ramsey problem, Strotz [33] was the first to formulate the dynamic time-inconsistent decision problem as a game theoretic problem in which it is natural to look for sub-game perfect equilibria. Pollak [32], Phelps and Pollak [31], Peleg and Yaari [29], and Goldman [17] extended this framework to discrete and continuous time dynamics. The recent works by Ekeland and Lazrak [14] and Ekeland and Pirvu [15] apply this game theoretic approach to an optimal investment and consumption problem under hyperbolic discounting for deterministic and stochastic models. Among their achievements, they provide a precise definition of the equilibrium concept in continuous time, using a Pontryagin type “spike variation” formulation (which we recall in Sect. 2 below), and derive, among other things, an extension of the Hamilton–Jacobi–Bellman (HJB) equation along with a verification theorem that characterizes Markov (or feedback type) sub-game perfect equilibria. Their work is extended by Björk and Murgoci [6] and Björk et al. [7] to performance functions that are nonlinear functions of expected utilities, for dynamics driven by a quite general class of Markov processes. Hu et al. [19], followed by Bensoussan et al. [4], characterize sub-game perfect equilibria using a Pontryagin type stochastic maximum principle (SMP) approach to a time-inconsistent stochastic linear-quadratic control problem of mean-field type, where the performance functional is a conditional expectation with respect to the history \({\mathcal {F}}_t\) of the system up to time \(t\). They derive a general sufficient condition for equilibria through a new class of flows of forward-backward stochastic differential equations (FBSDEs). The properties of this class of flows of FBSDEs are far from being well understood and deserve further investigation. Both the extended HJB equation provided in Björk and Murgoci [6] and Björk et al. [7] and the sufficient condition suggested by Hu et al. [19] give an explicit expression of the equilibria only in very few cases. In a more recent work, Yong [37] studied a class of linear-quadratic models with very general weight matrices in the cost, in which a time-consistent equilibrium control is constructed by the stochastic maximum principle approach and Riccati equations. Yong [37] also considered closed-loop equilibrium strategies by discretization of time for the game.

In this paper, we suggest an SMP approach to time-inconsistent decision problems for dynamics driven by diffusion processes of mean-field type that are not necessarily Markov and whose performance criterion is a nonlinear function of the conditional expectation of a utility function, given the present location of the state process. We do not condition on the whole history \({\mathcal {F}}_t\) of the system as in Hu et al. [19] because, for all practical purposes, the decision-maker can at best observe the current state of the system: Except in trivial situations, she can never provide a complete and explicit description of the history \({\mathcal {F}}_t\) (which is a \(\sigma \)-algebra), simply because it is far too large a set of information. Our model generalizes the one studied in Ekeland and Pirvu [15] and Björk et al. [6, 7].

In the first main result of the paper, the sub-game perfect equilibria (not necessarily of feedback type) are fully characterized as maximizers of the Hamiltonian associated with the system, in a similar fashion as in the SMP for diffusions of mean-field type obtained in Andersson and Djehiche [1] and Buckdahn et al. [8]. This approach is illustrated by several examples, for which explicit solutions are obtained.

Next, we address the time-inconsistency issue in a mean-field game of \(N\) players. The players in such games are individually insignificant and interact via an aggregate effect generated by the population. There is a substantial literature on this class of games. Huang et al. [21–23] introduced an approach based on consistent mean-field approximations to design decentralized strategies, where each player solves a localized optimal control problem by dynamic programming. These strategies have an \(\varepsilon \)-Nash equilibrium property when applied to a large but finite population. Closely related developments were presented by Lasry and Lions [26], who introduced the name mean-field game, and by Weintraub et al. [35], who studied oblivious equilibria in a Markov decision setup. Within the linear-quadratic setup, various explicit solutions can be obtained; see, e.g., [2, 5, 23, 27]. Tembine et al. [34] introduced risk-sensitive costs for mean-field games and analyzed the linear exponential quadratic Gaussian model in detail. For games with dynamics modelled by nonlinear diffusions, Carmona and Delarue [12] developed a probabilistic approach, and Kolokoltsov et al. [25] presented a very general mean-field game modeling framework via nonlinear Markov processes. Gomes et al. [18] considered games with discrete time and discrete states. For additional information, the reader may consult the overviews of this area by Buckdahn et al. [8] and Bensoussan et al. [3].

To display an overall picture of various past developments in a mean-field context, we briefly remark on the difference between mean-field type optimal control and mean-field games. For the former (see, e.g., [1, 16, 36]), there is only a single decision-maker, who can instantly affect the mean of the underlying state process. In contrast, a player in a mean-field game with all comparably small players (called peers) has little influence on a mean-field term such as \(X^{(N)}=\frac{1}{N}\sum _{i=1}^N X_i\). An exception is games with a major player, whose control can notably affect everyone; see, e.g., [20, 28].

So far, most existing research on mean-field games deals with time-consistent cost functionals. The state feedback strategies based on consistent mean-field approximations are sub-game perfect in the infinite population limit model, so no individual has an incentive to revise its strategy as time moves forward. In a recent work, Bensoussan et al. [4] considered time-inconsistent quadratic cost functionals in a mean-field game with a continuum population and linear dynamics. A so-called time-consistent optimal strategy is derived based on spike variation, followed by a consistency condition on the mean field generated by the infinite population.

The novelty and main contributions of the paper are summarized as follows.

  1. (i)

    Under the notion of sub-game perfect equilibrium control, we present a characterization of time-consistent control via a stochastic maximum principle for general nonlinear diffusion models. The associated adjoint equations are indexed by the time-state pair which the system has just evolved to.

  2. (ii)

    The notion of \(\delta _N\)-sub-game perfect equilibrium is introduced for a mean-field game of \(N\)-players with time-inconsistent costs. By combining mean-field approximations and the SMP, we obtain strategies using only local information of a player. The performance of the set of strategies is characterized via a \(\delta _N\)-sub-game perfect equilibrium, which implies, for large \(N\), no individual player has notable incentive to revise its strategy during its execution while interacting with other players.

  3. (iii)

    The computational aspect of our approaches is illustrated by various examples.

The mean-field game which we will analyze involves nonlinear dynamics, and each player is cost coupled with others by their average state \(X^{(-i)}=\frac{1}{N-1}\sum _{k\ne i}^N X_k\). Time inconsistency arises from the conditioning in the cost functional. Our approach for strategy design is to use a freezing idea so that the coupling term is approximated by a deterministic function \(\bar{X}\). This naturally introduces an optimal control problem with a time inconsistent cost which in turn is handled by the SMP approach. After finding the equilibrium strategy for the limiting control problem, we determine \(\bar{X}\) by a consistency condition. The remaining important issue is to analyze the performance of the obtained strategies when applied by \(N\) players.

The organization of the paper is as follows. In Sect. 2, we state the SMP approach for our game problem and the associated adjoint equations. Section 3 characterizes the equilibrium point by an SMP (Theorem 1). Section 4 is devoted to some examples illustrating the main results. In Sect. 5, we extend the previous results to a system of \(N\) decision-makers (Theorem 2). Section 6 provides the proof of Theorem 2. Section 7 presents explicit computations in a mean-field LQG game with time-inconsistent costs.

To streamline the presentation, we only consider the one-dimensional case for the state. The extension to the multidimensional case is straightforward. For the reader’s convenience, we make a convention on notation: in the analysis of the mean-field game, \(C\) denotes a generic constant which may change from place to place, but depends on neither the population size \(N\) nor the parameter \(\varepsilon \) of the spike variation.

2 Notation and Statement of the Problem

Let \(T>0\) be a fixed time horizon and \((\varOmega ,{\mathcal {F}},\mathbb {F}, \mathbb {P})\) be a given filtered probability space whose filtration \(\mathbb {F}=\{{\mathcal {F}}_s,\ 0\le s \le T\}\) satisfies the usual conditions of right continuity and completeness, on which a one-dimensional standard Brownian motion \(W=\{W_s\}_{s\ge 0}\) is given. We assume that \(\mathbb {F}\) is the natural filtration of \(W\) augmented by \(\mathbb {P}\)-null sets of \({\mathcal {F}}.\)

An admissible strategy \(u\) is an \(\mathbb {F}\)-adapted and square-integrable process with values in a non-empty subset \(U\) of \(\mathbb {R}\). We denote the set of all admissible strategies over \([0,T]\) by \({\mathcal {U}}[0,T]\).

For each admissible strategy \(u\in {\mathcal {U}}[0,T]\), we consider the dynamics given by the following SDE of mean-field type, defined on \((\varOmega ,{\mathcal {F}},\mathbb {F}, \mathbb {P})\),

$$\begin{aligned} \left\{ \begin{array}{l} \hbox {d}X^{u}(s)= b(s,X^{u}(s),E[X^{u}(s)],u(s))\hbox {d}s\\ \qquad +\,\sigma (s,X^{u}(s),E[X^{u}(s)],u(s))\hbox {d}W(s),\, 0<s\le T,\\ X^{u}(0)=x_0 \,\,(\in \mathbb {R}). \end{array} \right. \end{aligned}$$
(2.1)

We consider decision problems related to the following cost functional

$$\begin{aligned} J(t,x,u)=E\!\left[ \int _t^T h\left( s,X^{u,t,x}(s),E[X^{u,t,x}(s)],u(s)\right) \hbox {d}s+g\left( X^{u,t,x}(T),E[X^{u,t,x}(T)]\right) \!\right] \!, \end{aligned}$$
(2.2)

associated with the state process \(X^{u,t,x}\), parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), whose dynamics is given by the SDE

$$\begin{aligned} \left\{ \begin{array}{l} \hbox {d}X^{u,t,x}(s)=b(s,X^{u,t,x}(s),E[X^{u,t,x}(s)],u(s))\hbox {d}s\\ \qquad +\,\sigma (s,X^{u,t,x}(s), E[X^{u,t,x}(s)],u(s))\hbox {d}W(s), \quad t<s\le T,\\ X^{u,t,x}(t) = x \,\,(\in \mathbb {R}), \end{array} \right. \end{aligned}$$
(2.3)

where

$$\begin{aligned}&b(s,y,z,v),\,\, \sigma (s,y,z,v),\,\, h(s,y,z,v): \,\,[0,T] \times \mathbb {R}\times \mathbb {R}\times U\longrightarrow \mathbb {R},\\&g(y,z): \,\,\mathbb {R}\times \mathbb {R}\longrightarrow \mathbb {R}, \quad s\in [0,T],\, y\in \mathbb {R},\, z\in \mathbb {R},\, v\in U. \end{aligned}$$

We note that \(X^{u,0,x_0}=X^{u}\). The mean of the state process appears in (2.1)–(2.3). This mean-field type model involves a single decision-maker, and a motivating example is the mean-variance portfolio optimization problem. Because of its simplicity, we adopt this model here; in Remark 5 below, we mention possible extensions to more general classes of mean-field coupling. The inclusion of the average state of a finite number of decision-makers will be considered later in Sect. 5. Under some conditions, such an average converges to a mean term as well.

The dependence of (2.2)–(2.3) on the term \(E[X^{u,t,x}(s)]\) makes the system (2.2)–(2.3) time-inconsistent in the sense that the Bellman principle of optimality does not hold, i.e., the \(t\)-optimal policy \(u^*(t,x, \cdot )\) which minimizes \(J(t,x, u)\) may not remain optimal after \(t\): the restriction of \(u^*(t,x,\cdot )\) to \([t', T]\) does not minimize \(J(t',x',u)\) for some \(t'>t\) when the state process is steered to \(x'\) by \(u^*\). Therefore, as noted by Ekeland et al. [14, 15], time-inconsistent optimal solutions, although they exist mathematically, are irrelevant in practice: the decision-maker would not implement the \(t\)-optimal policy at a later time, unless forced to do so. The review paper by Zaccour [39] gives a nice guided tour of the concept of time consistency in differential games.

Following Ekeland et al. [14, 15], and Björk and Murgoci [6], we may view the problem as a game and look for a sub-game perfect equilibrium point \(\hat{u}\) in the following sense:

  1. (i)

    Assume that all players (selves) \(s\), such that \(s>t\), use the strategy \(\hat{u}(s)\).

  2. (ii)

    Then it is optimal in a certain sense for player (self) \(t\) to also use \(\hat{u}(t)\).

When the players use feedback strategies, depending on \(t\) and on the position \(x\) in space, player \(t\) will choose a strategy of the form \(u(t){:=}\varphi (t, x)\), where \(\varphi \) is a deterministic function, so the action chosen by player \(t\) is given by the mapping \(x\longrightarrow \varphi (t, x)\). The cost to player \(t\) is given by the functional \(J(t, x, \varphi )\). It is clear that \(J(t, x, \varphi )\) does not depend on the actions taken by any player \(s\) for \(s < t\), so in fact \(J\) depends only on the restriction of the strategy \(u\) to the time interval \([t, T]\). The strategy \(\varphi \) can thus be viewed as a complete description of the chosen strategies of all players in the game.

If feedback strategies are to be used, a deterministic function \(\hat{\varphi }:\, [0,T]\times \mathbb {R}\longrightarrow U\) is a sub-game perfect equilibrium point when the following actions are performed:

  1. (i)

    Assume that all players (selves) \(s\), such that \(s>t\), use the strategy \(\hat{\varphi }(s,\cdot )\).

  2. (ii)

    Then it is optimal in a certain sense for player (self) \(t\) to also use \(\hat{\varphi }(t,\cdot )\).

Although the \(t\)-self is intuitively assigned the cost \(J(t, x, u)\) for the initial time-state pair \((t,x)\), one cannot obtain the equilibrium strategy in this continuous time model by considering the unilateral perturbation of \(u(t)\) while the controls of all s-selves, \(s\in (t, T]\), are fixed. This is due to the fact that \(J(t,x, u)\) is insensitive to the modification of \(u(\cdot )\) at a single point of time \(t\). To characterize the equilibrium strategy \({\hat{u}}\), Ekeland et al. [14, 15] suggest the following definition that uses a “local” spike variation in a natural way.

Define the admissible strategy \(u^{\varepsilon }\) as the “local” spike variation of a given admissible strategy \(\hat{u}\in {\mathcal {U}}[0,T]\) over the set \([t,t+\varepsilon ]\),

$$\begin{aligned} u^{\varepsilon }(s){:=}\left\{ \begin{array}{ll} u(s),\,\,\,\; s\in [t,t+\varepsilon ],\\ \\ {\hat{u}}(s),\,\,\,\; s\in [t,T]\setminus [t,t+\varepsilon ],\end{array}\right. \end{aligned}$$
(2.4)

where \(u\in {\mathcal {U}}[0,T]\) and \(t\in [0,T]\) are arbitrarily chosen. We view \([t, t+\varepsilon ]\) as an infinitesimal coalition \(\text{ Co }[t, t+\varepsilon ]\) of \(s\)-selves which is associated with the dynamics (2.3) and the cost \(J(t, x, u^\varepsilon (\cdot ))\) and which is able to choose its strategy \(u(s)\), \(s\in [t, t+\varepsilon ]\). All future \(s\)-selves, \(s>t+\varepsilon \) affect \(J(t, x, u^\varepsilon (\cdot ))\) by their controls on \((t+\varepsilon , T]\). Let \({\mathcal {U}}[r,s]\) denote the restriction of \({\mathcal {U}}[0,T]\) on \([r,s]\) for \(0 \le r\le s\le T\). Then the strategy space of \(\text{ Co }[t, t+\varepsilon ]\) may be denoted by \({\mathcal {U}}[t, t+\varepsilon ]\).

Hu et al. [19] suggest the following open-loop form of the local spike variation:

$$\begin{aligned} u^{\varepsilon }(s){:=} {\hat{u}}(s)+\nu 1\!\!1_{[t,t+\varepsilon ]}(s),\,\,\,\; s\in [t,T], \end{aligned}$$
(2.5)

where \(\nu \in L^2(\varOmega , {\mathcal {F}}_t, \mathbb {P}; \mathbb {R})\) is arbitrarily chosen. This form is suitable only when \(U\) is a linear space.

For either form of local spike variation, we have the following

Definition 1

The admissible strategy \(\hat{u}\) is a sub-game perfect equilibrium for the system (2.2)–(2.3) if

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J(t,x,{\hat{u}})-J(t,x, u^{\varepsilon })}{\varepsilon }\le 0 \end{aligned}$$
(2.6)

for all \(u\in {\mathcal {U}}[0,T]\), \(x\in \mathbb {R}\) and \({\mathrm{a.e.}\,}t \in [0,T]\). The corresponding equilibrium dynamics solves the SDE

$$\begin{aligned} \!\!\!\left\{ \begin{array}{l}\! \hbox {d}X^{\hat{u}}(s)= b(s,X^{\hat{u}}(s),E[X^{\hat{u}}(s)],\hat{u}(s))\hbox {d}s+\,\sigma (s,X^{\hat{u}}(s),E[X^{\hat{u}}(s)],{\hat{u}}(s))\hbox {d}W(s), \; 0<s\le T,\\ X^{\hat{u}}(0)=x_0. \end{array} \right. \end{aligned}$$
(2.7)

If feedback strategies are to be used, the previous definition reduces to the following

Definition 2

A deterministic function \({\hat{\varphi }}:\, [0,T]\times \mathbb {R}\longrightarrow U\) is a sub-game perfect equilibrium for the system (2.2)–(2.3) if

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J(t,x,{\hat{u}})-J(t,x, u^{\varepsilon })}{\varepsilon }\le 0 \end{aligned}$$
(2.8)

for all \(u\in {\mathcal {U}}[0,T], x\in \mathbb {R}\) and \({\mathrm{a.e.}\,}t \in [0,T]\), where \({\hat{u}}(s){:=}{\hat{\varphi }}(s,{\hat{X}}(s)), 0\le s\le T\) and \({\hat{X}}\) is given by (2.9). The associated equilibrium dynamics solves the SDE

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}{\hat{X}}(s)= b(s,{\hat{X}}(s),E[{\hat{X}}(s)],{\hat{\varphi }}(s,{\hat{X}}(s) ))\hbox {d}s\\ \quad \quad \quad +\sigma (s,{\hat{X}}(s),E[{\hat{X}}(s)],{\hat{\varphi }}(s,{\hat{X}}(s) ))\hbox {d}W(s),\quad 0<s\le T,\\ {\hat{X}}(0)=x_0. \end{array} \right. \end{aligned}$$
(2.9)

For brevity, sometimes we simply call \(\hat{u}\) an equilibrium point when there is no ambiguity.

The purpose of this study is to characterize sub-game perfect equilibria for the system (2.2)–(2.3) by evaluating the limit (2.6) in terms of a stochastic maximum principle criterion. We will apply the general stochastic maximum principle for SDEs of mean-field type derived in Buckdahn et al. [10].

The following assumptions (imposed in [10]) will be in force throughout Sects. 2–3. These assumptions can be weakened, but we do not pursue this here.

Assumption 1

  1. (i)

    The functions \(b, \sigma , h,g\) are continuous in \((y,z,u)\), and bounded.

  2. (ii)

    The functions \(b, \sigma , h, g\) are twice continuously differentiable with respect to \((y,z)\), and their derivatives up to the second order are continuous in \((y,z, u)\), and bounded.

Although we are interested in characterizing sub-game perfect equilibrium points by considering the action of player \(t\) at a deterministic position \(x\), we perform the analysis for the more general case where player \(t\) has a random variable \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) as a state.

For a given admissible strategy \(u\in {\mathcal {U}}[0,T]\), if player \(t\) has \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) as its state, (2.3) becomes

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}X^{u,t,\xi }(s)= b(s,X^{u,t,\xi }(s),E[X^{u,t,\xi }(s)],u(s))\hbox {d}s\\ \quad \quad \quad \quad +\,\sigma (s,X^{u,t,\xi }(s), E[X^{u,t,\xi }(s)],u(s))\hbox {d}W(s), \quad t< s\le T,\\ X^{u,t,\xi }(t)=\xi , \end{array} \right. \end{aligned}$$
(2.10)

and the associated cost functional (2.2) becomes

$$\begin{aligned} J(t,\xi ,u){=}E\left[ \int _t^T h\left( s,X^{u,t,\xi }(s),E[X^{u,t,\xi }(s)],u(s)\right) \hbox {d}s+g\left( X^{u,t,\xi }(T),E[X^{u,t,\xi }(T)]\right) \!\right] \!. \end{aligned}$$
(2.11)

Remark 1

Definitions 1 and 2 can be accordingly generalized by replacing \((t,x)\) by \((t,\xi )\) and the inequality condition takes the form

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J(t,\xi ,{\hat{u}})-J(t,\xi , u^{\varepsilon })}{\varepsilon }\le 0 \end{aligned}$$
(2.12)

for all \(u\in {\mathcal {U}}[0,T]\), \(\xi \in L^2(\varOmega , {\mathcal {F}}_t, {\mathbb {P}}; { \mathbb {R}})\) and a.e. \(t\in [0,T]\).

It is a well-known fact, see, e.g., Karatzas and Shreve ([24], pp. 289–290), that under Assumption 1, for any \(u\in {\mathcal {U}}[0,T]\), the SDE (2.10) admits a unique strong solution. Moreover, there exists a constant \(C>0\) which depends only on the bounds of \(b,\sigma \) and their first derivatives w.r.t. \(y,z\), such that, for any \(t\in [0,T], \, u\in {\mathcal {U}}[0,T]\) and \(\xi , \xi ^{\prime }\in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), we also have the following estimates, \(\mathbb {P}-{\mathrm{a.s.}\,}\)

$$\begin{aligned} \begin{array}{lll} E\left[ \sup _{t\le s\le T}|X^{u,t,\xi }(s)|^2|\mathcal {F}_t\right] \le C(1+|\xi |^2+E[|\xi |^2]),\\ E\left[ \sup _{t\le s\le T}|X^{u,t,\xi }(s)-X^{u,t,\xi ^{\prime }}(s)|^2|\mathcal {F}_t\right] \le C(|\xi -\xi ^{\prime }|^2+E[|\xi -\xi ^{\prime }|^2]). \end{array} \end{aligned}$$
(2.13)

Moreover, the performance functional (2.11) is well defined and finite.

For convenience, we will use the following notation throughout the paper. We will denote by \(X^{t,\xi }{:=}X^{u,t,\xi }\) the solution of the SDE (2.10), associated with the strategy \(u\), and accordingly, \({\hat{X}}^{t,\xi }{:=}X^{{\hat{u}},t,\xi }\) associated with \({\hat{u}}\).

For \(\varphi =b, \sigma , h, g\), we define

$$\begin{aligned} \!\!\left\{ \!\begin{array}{llll} \delta \varphi ^{t,\xi }(s)=\varphi (s,{\hat{X}}^{t,\xi }(s),E[\hat{X}^{t,\xi }(s)],u(s))-\varphi (s,{\hat{X}}^{t,\xi }(s),E[\hat{X}^{t,\xi }(s)],{\hat{u}}(s)),\\ \varphi _y^{t,\xi }(s)=\frac{\partial \varphi }{\partial y}(s,\hat{X}^{t,\xi }(s),E[{\hat{X}}^{t,\xi }(s)],{\hat{u}}(s)),\quad \varphi ^{t,\xi }_{yy}(s)=\frac{\partial ^ 2\varphi }{\partial y^ 2}(s,{\hat{X}}^{t,\xi }(s),E[{\hat{X}}^{t,\xi }(s)],{\hat{u}}(s)),\\ \varphi ^{t,\xi }_z(s)=\frac{\partial \varphi }{\partial z}(s,\hat{X}^{t,\xi }(s),E[{\hat{X}}^{t,\xi }(s)],{\hat{u}}(s)),\quad \varphi ^{t,\xi }_{zz}(s)=\frac{\partial ^ 2\varphi }{\partial z^ 2}(s,{\hat{X}}^{t,\xi }(s),E[{\hat{X}}^{t,\xi }(s)],{\hat{u}}(s)). \end{array}\right. \end{aligned}$$
(2.14)

Let us introduce the Hamiltonian associated with the r.v. \(X\in L^1(\varOmega ,\mathcal {F}, \mathbb {P})\):

$$\begin{aligned} H(s,X,u,p,q){:=}b(s,X,E[X],u)p+\sigma (s,X,E[X],u)q-h(s,X,E[X],u). \end{aligned}$$
(2.15)

3 Adjoint Equations and the Stochastic Maximum Principle

In this section, we introduce the adjoint equations involved in the SMP that characterizes the equilibrium points \({\hat{u}}\in {\mathcal {U}}[0,T]\) of our problem.

The first-order adjoint equation is the following linear backward SDE of mean-field type parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), satisfied by the processes \((p^{t,\xi }(s),q^{t,\xi }(s)),\, s\in [t,T],\)

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}p^{t,\xi }(s)=-\left\{ H^{t,\xi }_y(s)+E\left[ H^{t,\xi }_z(s)\right] \right\} \hbox {d}s+q^{t,\xi }(s)\hbox {d}W_s,\\ p^{t,\xi }(T)=-g^{t,\xi }_y(T)-E[g^{t,\xi }_z(T)], \end{array}\right. \end{aligned}$$
(3.1)

where, in view of the notation (2.14), for \(j=y,z\),

$$\begin{aligned} H^{t,\xi }_{j}(s){:=}\,b^{t,\xi }_{j}(s)p^{t,\xi }(s)+\sigma ^{t,\xi }_{j}(s)q^{t,\xi }(s)- h^{t,\xi }_{j}(s). \end{aligned}$$
(3.2)

This equation reduces to the standard one when the coefficients do not explicitly depend on the expected value (or the marginal law) of the underlying diffusion process. Under Assumption 1 on \(b,\sigma , h, g\), by an adaptation of Theorem 3.1 in Buckdahn et al. [9], keeping track of the parametrization \((t,\xi )\), Eq. (3.1) admits a unique \(\mathbb {F}\)-adapted solution \((p^{t,\xi },q^{t,\xi })\). Moreover, there exists a constant \(C>0\) such that, for all \(t\in [0,T]\) and \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), we have the following estimate, \({\mathbb {P}}-{\mathrm{a.s.}\,}\)

$$\begin{aligned} \begin{array}{lll} E\left[\sup _{s\in [t,T]}|p^{t,\xi }(s)|^2+\int _t^T |q^{t,\xi }(s)|^2\, \hbox {d}s|\mathcal {F}_t\right]\le C(1+|\xi |^2+E[\xi ^2]). \end{array} \end{aligned}$$
(3.3)

The second-order adjoint equation is the classical linear backward SDE, parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), which appears in Peng’s stochastic maximum principle (see Peng [30]):

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}P^{t,\xi }(s)=-\left\{ 2b^{t,\xi }_y(s)P^{t,\xi }(s)+\left(\sigma ^{t,\xi }_y(s)\right)^2P^{t,\xi }(s)+2\sigma ^{t,\xi }_y(s)Q^{t,\xi }(s)+H^{t,\xi }_{yy}(s)\right\} \hbox {d}s\\ \qquad \qquad +\,Q^{t,\xi }(s)\hbox {d}W_s,\\ P^{t,\xi }(T)=-g^{t,\xi }_{yy}(T), \end{array}\right. \end{aligned}$$
(3.4)

where in view of (2.14),

$$\begin{aligned} H^{t,\xi }_{yy}(s)=b^{t,\xi }_{yy}(s)p^{t,\xi }(s) +\sigma ^{t,\xi }_{yy}(s)q^{t,\xi }(s)-h^{t,\xi }_{yy}(s). \end{aligned}$$
(3.5)

This is a standard linear backward SDE, whose unique \(\mathbb {F}\)-adapted solution \((P^{t,\xi },Q^{t,\xi })\) satisfies the following estimate: there exists a constant \(C>0\) such that, for all \(t\in [0,T]\) and \(\xi \in L^2(\varOmega ,\mathcal {F}_t,\mathbb {P}; \mathbb {R})\),

$$\begin{aligned} E\left[\sup _{s\in [t,T]}|P^{t,\xi }(s)|^2+\int _t^T|Q^{t,\xi }(s)|^2\,\hbox {d}s|\mathcal {F}_t\right]\le C(1+|\xi |^2+E[\xi ^2]), \quad \mathbb {P}-{\mathrm{a.s.}\,}\end{aligned}$$
(3.6)

The backward SDEs (3.1) and (3.4) admit a unique solution for any fixed control \(u\in {\mathcal {U}}[0,T]\), and the corresponding estimates (3.3) and (3.6) hold. However, for Theorem 1 below, only the equilibrium control \({\hat{u}}\) is substituted into the two equations. The following theorem is the first main result of the paper.

Theorem 1

(Characterization of equilibrium strategies) Let Assumption 1 hold. Then \({\hat{u}}\) is an equilibrium strategy for the system (2.10)–(2.11) if and only if there are pairs of \(\mathbb {F}\)-adapted processes \(\left( p,q\right) \) and \(\left( P,Q\right) \) which satisfy (3.1)–(3.3) and (3.4)–(3.6), respectively, and for which

$$\begin{aligned}&H(t,\xi ,v,p^{t,\xi }(t),q^{t,\xi }(t))-H(t,\xi ,{\hat{u}}(t),p^{t,\xi }(t),q^{t,\xi }(t))\nonumber \\&\quad +\,\frac{1}{2}P^{t,\xi }(t)\left( \sigma (t,\xi , E[\xi ],v) -\sigma (t,\xi , E[\xi ],{\hat{u}}(t))\right) ^2 \le 0,\nonumber \\&\quad \text{ for } \text{ all }\,\,v\in U,\;\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}),\;{\mathrm{a.e.}\,}t \in [0,T],\; {\mathbb {P}}-a.s. \end{aligned}$$
(3.7)

In particular, we have

$$\begin{aligned}&H(t,x,v,p^{t,x}(t),q^{t,x}(t))-H(t,x,{\hat{u}}(t),p^{t,x}(t),q^{t,x}(t))\nonumber \\&\quad +\,\frac{1}{2}P^{t,x}(t)\left( \sigma (t,x, x,v)-\sigma (t,x, x,{\hat{u}}(t))\right) ^2 \le 0,\nonumber \\&\quad \text{ for } \text{ all }\,\, v\in U,\; x\in \mathbb {R}, \; {\mathrm{a.e.}\,}t \in [0,T],\;{\mathbb {P}}-a.s. \end{aligned}$$
(3.8)

For feedback strategies, the deterministic function \({\hat{\varphi }}: \, [0,T]\times \mathbb {R}\longrightarrow U\) is an equilibrium strategy for the system (2.10)–(2.11) if and only if there are pairs of \(\mathbb {F}\)-adapted processes \(\left( p,q\right) \) and \(\left( P,Q\right) \) which satisfy (3.1)–(3.3) and (3.4)–(3.6), respectively, and for which

$$\begin{aligned}&H(t,x,v,p^{t,x}(t),q^{t,x}(t))-H(t,x,{\hat{\varphi }}(t,x),p^{t,x}(t),q^{t,x}(t))\nonumber \\&\quad +\,\frac{1}{2}P^{t,x}(t)\left( \sigma (t,x, x,v)-\sigma (t,x, x,\hat{\varphi }(t,x ))\right) ^2 \le 0, \nonumber \\&\quad \text{ for } \text{ all }\,\, v\in U,\; x\in \mathbb {R}, \; {\mathrm{a.e.}\,}t \in [0,T],\;{\mathbb {P}}-a.s. \end{aligned}$$
(3.9)

Proof

Denote

$$\begin{aligned} \delta H^{t,\xi }(s){:=} H(s,{\hat{X}}^{t,\xi }(s), u(s), p^{t,\xi }(s),q^{t,\xi }(s))-H(s,{\hat{X}}^{t,\xi }(s),\hat{u}(s),p^{t,\xi }(s),q^{t,\xi }(s)), \end{aligned}$$
(3.10)

where the Hamiltonian \(H\) is given by (2.15). By Theorem 2.1 in Buckdahn et al. [8], keeping track of the parametrization \((t,\xi )\), the key relation between the cost functional (2.11) and the associated Hamiltonian (2.15) reads

$$\begin{aligned} J(t,\xi ,{\hat{u}})-J(t,\xi ,u^{\varepsilon })=E\left[\int _t^{t+\varepsilon }\delta H^{t,\xi }(s)+\,\frac{1}{2}P^{t,\xi }(s)(\delta \sigma ^{t,\xi }(s))^2\,\hbox {d}s\right] +R(\varepsilon ), \end{aligned}$$
(3.11)

for arbitrary \(u\in {\mathcal {U}}[0,T]\) and \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\), where

$$\begin{aligned} |R(\varepsilon )|\le \varepsilon \bar{\rho }(\varepsilon ), \end{aligned}$$

for some function \(\bar{\rho }: (0,\infty )\rightarrow (0,\infty )\) such that \(\bar{\rho }(\varepsilon )\downarrow 0\) as \(\varepsilon \downarrow 0\); see Eq. (3.53) of [8] for a similar upper bound estimate of the error term \(R(\varepsilon )\).

Dividing both sides of (3.11) by \(\varepsilon \) and then passing to the limit \(\varepsilon \downarrow 0\), in view of Assumption 1, (3.3) and (3.6), we obtain

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J(t,\xi ,{\hat{u}})-J(t,\xi , u^{\varepsilon })}{\varepsilon }=E\left[\delta H^{t,\xi }(t)+\,\frac{1}{2}P^{t,\xi }(t)(\delta \sigma ^{t,\xi }(t))^2\right]. \end{aligned}$$
(3.12)

Now, if (3.7) holds, by setting \(v{:=}u(t)\) for arbitrary \(u\in {\mathcal {U}}[0,T]\), we also have

$$\begin{aligned}&H(t,\xi ,u(t),p^{t,\xi }(t),q^{t,\xi }(t))-H(t,\xi ,\hat{u}(t),p^{t,\xi }(t),q^{t,\xi }(t))\\&\quad +\,\frac{1}{2}P^{t,\xi }(t)\left( \sigma (t,\xi ,E[\xi ],u(t)) -\sigma (t,\xi ,E[\xi ],\hat{u}(t))\right) ^2 \le 0, \quad {\mathbb {P}}-a.s. \end{aligned}$$

Therefore, by (3.12) we obtain (2.12), i.e., \({\hat{u}}\) is an equilibrium point for the system (2.10)–(2.11).

Conversely, assume that (2.12) holds. Then, in view of (3.12), we have

$$\begin{aligned} E\left[\delta H^{t,\xi }(t)+\,\frac{1}{2}P^{t,\xi }(t)(\delta \sigma ^{t,\xi }(t))^2\right]\le 0, \end{aligned}$$
(3.13)

for all \(u\in {\mathcal {U}}[0,T]\), \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) and \({\mathrm{a.e.}\,}t \in [0,T]\). Now, let \(A\) be an arbitrary set in \({\mathcal {F}}_t\) and set

$$\begin{aligned} u(s){:=}v1\!\!1_A+{\hat{u}}(s)1\!\!1_{\varOmega \setminus A}, \quad t\le s\le T, \end{aligned}$$

for an arbitrary \(v\in U\). Obviously, \(u\) is an admissible strategy. Moreover, we have, for every \( s\in [t,T]\),

$$\begin{aligned} \delta H^{t,\xi }(s)= \left(H(s,\hat{X}^{t,\xi }(s),v,p^{t,\xi }(s),q^{t,\xi }(s))-H(s,{\hat{X}}^{t,\xi }(s),\hat{u}(s),p^{t,\xi }(s),q^{t,\xi }(s))\right)1\!\!1_A, \end{aligned}$$

and

$$\begin{aligned} \delta \sigma ^{t,\xi }(s)=\left(\sigma (s,{\hat{X}}^{t,\xi }(s),E[\hat{X}^{t,\xi }(s)],v)-\sigma (s,{\hat{X}}^{t,\xi }(s),E[\hat{X}^{t,\xi }(s)],{\hat{u}}(s))\right)1\!\!1_A. \end{aligned}$$

Hence, in view of (3.13), we have

$$\begin{aligned}&E\big [\big (H(t,{\hat{X}}^{t,\xi }(t),v,p^{t,\xi }(t),q^{t,\xi }(t))-H(t, {\hat{X}}^{t,\xi }(t),{\hat{u}}(t),p^{t,\xi }(t),q^{t,\xi }(t))\big )1\!\!1_A\big ]\\&\quad +\,\frac{1}{2}E\big [ P^{t,\xi }(t)\big (\sigma (t,\hat{X}^{t,\xi }(t),E[{\hat{X}}^{t,\xi }(t)],v)-\sigma (t,\hat{X}^{t,\xi }(t),E[{\hat{X}}^{t,\xi }(t)],{\hat{u}}(t))\big )^21\!\!1_A \big ] \le 0, \end{aligned}$$

which in turn yields inequality (3.7) since \(v\in U\) and the set \(A\in {\mathcal {F}}_t\) are arbitrary.

Finally, both (3.8) and (3.9) follow from (3.7), by replacing \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R})\) with \(x\in \mathbb {R}\). \(\square \)

Remark 2

Theorem 1 does not address uniqueness. It is possible for multiple controls to satisfy (2.10) and (3.1)–(3.7), in which case each of them is an equilibrium strategy.

Remark 3

Define the so-called \({\mathcal {H}}\)-function associated with \((\hat{u}(t),p^{t,\xi }(t),q^{t,\xi }(t), P^{t,\xi }(t))\)

$$\begin{aligned} {\mathcal {H}}(t,\xi ,v)&:= H(t,\xi ,v,p^{t,\xi }(t),q^{t,\xi }(t)) -\frac{1}{2}P^{t,\xi }(t)\sigma ^2(t,\xi ,E[\xi ],{\hat{u}}(t))\nonumber \\&+\,\frac{1}{2}P^{t,\xi }(t)\left(\sigma (t,\xi ,E[\xi ],v)-\sigma (t,\xi ,E[\xi ],\hat{u}(t))\right)^2. \end{aligned}$$

Then, it is easily checked that inequality (3.7) is equivalent to

$$\begin{aligned} {\mathcal {H}}(t,\xi ,{\hat{u}}(t))=\max _{v\in U}{\mathcal H}(t,\xi ,v), \quad \text{ for } \text{ all }\quad \xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}), \; {\mathrm{a.e.}\,}\; t \in [0,T],\; {\mathbb {P}}-a.s. \end{aligned}$$
(3.14)
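Indeed, since the term \(-\frac{1}{2}P^{t,\xi }(t)\sigma ^2(t,\xi ,E[\xi ],{\hat{u}}(t))\) does not depend on \(v\), we have

$$\begin{aligned} {\mathcal {H}}(t,\xi ,v)-{\mathcal {H}}(t,\xi ,{\hat{u}}(t))&= H(t,\xi ,v,p^{t,\xi }(t),q^{t,\xi }(t))-H(t,\xi ,{\hat{u}}(t),p^{t,\xi }(t),q^{t,\xi }(t))\\&\quad +\,\frac{1}{2}P^{t,\xi }(t)\left( \sigma (t,\xi ,E[\xi ],v)-\sigma (t,\xi ,E[\xi ],{\hat{u}}(t))\right) ^2, \end{aligned}$$

which is precisely the left-hand side of (3.7); hence (3.7) holds for all \(v\in U\) if and only if \({\hat{u}}(t)\) maximizes \({\mathcal {H}}(t,\xi ,\cdot )\).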

For all practical purposes, it would be desirable to find or characterize equilibrium points through maximizing the Hamiltonian \(H\) alone, which amounts to solving only the first-order adjoint equation (3.1). In fact, this happens in the special case where the diffusion coefficient does not contain the control variable, i.e.,

$$\begin{aligned} \sigma (s,y,z,v)\equiv \sigma (s,y,z),\quad (s,y,z,v)\in [0,T]\times \mathbb {R}\times \mathbb {R}\times U, \end{aligned}$$

whence, manifestly, inequality (3.7) is equivalent to

$$\begin{aligned} H(t,\xi ,{\hat{u}}(t),p^{t,\xi }(t),q^{t,\xi }(t))=\underset{v\in U}{\max }\, H(t,\xi ,v,p^{t,\xi }(t),q^{t,\xi }(t)), \end{aligned}$$

for all \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}), \; {\mathrm{a.e.}\,}t \in [0,T],\; {\mathbb {P}}-a.s.\)

Another very useful case, which we will use in some examples below, is described in the following

Proposition 1

Assume that \(U\) is a convex subset of \(\mathbb {R}\) and that the coefficients \(b, \sigma \), and \(h\) satisfy Assumption 1 and are locally Lipschitz in \(u\). Then, the admissible strategy \({\hat{u}}\) is an equilibrium point for the system (2.10)–(2.11) if and only if there is a pair of \(\mathbb {F}\)-adapted processes \(\left( p^{t,\xi },q^{t,\xi }\right) \) that satisfies (3.1)–(3.3) and for which

$$\begin{aligned} H(t,\xi ,{\hat{u}}(t),p^{t,\xi }(t),q^{t,\xi }(t))=\underset{v\in U}{\max }\, H(t,\xi ,v,p^{t,\xi }(t),q^{t,\xi }(t)), \end{aligned}$$

for all \(\xi \in L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}),\; {\mathrm{a.e.}\,}t \in [0,T],\; {\mathbb {P}}-a.s.\)

Proof

In view of (3.14), it suffices to show that \({\mathcal {H}}\) and \(H\) have the same Clarke generalized gradient at \({\hat{u}}\). This follows, for instance, from Lemma 5.1 in Yong and Zhou [38], since \(U\) is a convex subset of \(\mathbb {R}\), the coefficients \(b, \sigma \), and \(h\) are locally Lipschitz in \(u\), and, by Assumption 1, their derivatives in \(y\) are continuous in \((y,u)\). Hence, \({\hat{u}}\) is a maximizer of \({\mathcal {H}}(t,\xi ,\cdot )\) if and only if it is a maximizer of \(H(t,\xi ,\cdot ,p^{t,\xi }(t), q^{t,\xi }(t))\). \(\square \)

Remark 4

In fact, both Theorem 1 and Proposition 1 extend to the following cost functionals parametrized by \((t,\xi )\in [0,T]\times L^2(\varOmega ,\mathcal {F}_t, \mathbb {P}; \mathbb {R}){:}\)

$$\begin{aligned} J(t,\xi ,u){=}E\!\left[ \!\int _t^T h\left( t,\xi ,s,X^{t,\xi }(s),E[X^{t,\xi }(s)],u(s)\right) \hbox {d}s {+}g\!\left( t,\xi ,X^{t,\xi }(T),E[X^{t,\xi }(T)]\right) \!\right] \!, \end{aligned}$$

where both \(h\) and \(g\) are allowed to depend explicitly on \((t,\xi )\). This is due to the fact that the spike variation and the subsequent Taylor expansions used to derive (3.11) are not affected by this extra dependence of \(h\) and \(g\) on \((t,\xi )\).

Remark 5

Theorem 1 and Proposition 1 extend to more general mean-field couplings than the mean. For couplings of the form \(E[\phi (X^{t,\xi }(s))]\) with sufficiently smooth functions \(\phi \), the SMP developed in Andersson and Djehiche [1] may be used to derive similar results. For the more general coupling involving the probability distribution \({\mathcal {L}}(X^{t,\xi }(s))\) of \(X^{t,\xi }(s)\), the SMP derived in Carmona and Delarue [13] together with the flow properties of solutions of (2.3) obtained recently by Buckdahn et al. [11] are to be used to obtain a similar characterization of the sub-game perfect equilibrium points.

4 Some Applications

In this section, we illustrate the above results through some examples discussed in Björk and Murgoci [6] and Björk et al. [7] using an extended Hamilton–Jacobi–Bellman equation. In these examples, we look for equilibrium strategies of feedback type, i.e., deterministic functions \(\hat{\varphi }: [0,T]\times \mathbb {R}\longrightarrow U\) which satisfy (3.9). The corresponding equilibrium point is \({\hat{u}}(s){:=}\hat{\varphi }(s,{\hat{X}}(s))\), where \({\hat{X}}\) is the corresponding equilibrium dynamics given by the SDE

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}{{\hat{X}}}(s)=b(s,{{\hat{X}}}(s),E[{{\hat{X}}}(s)],\hat{\varphi }(s,\hat{X}(s)))\hbox {d}s\\ \qquad +\,\sigma (s,{{\hat{X}}}(s),E[{{\hat{X}}}(s)],\hat{\varphi }(s,\hat{X}(s)))\hbox {d}W(s),\quad 0<s\le T,\\ {{\hat{X}}}(0)=x_0. \end{array} \right. \end{aligned}$$

Although Assumption 1 does not hold for the cost functionals of this section (for instance, the quadratic cost), the stochastic maximum principle in Theorem 1 can still be proved in a similar manner by exploiting the linear dynamics at hand. These details are omitted here.

4.1 Mean-Variance Portfolio Selection with Constant Risk Aversion

The dynamics over \([0,T]\) defined on \((\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})\) is given by the following SDE:

$$\begin{aligned} \hbox {d}X(s)=\left(r X(s)+\left(\alpha -r\right)u(s)\right)\hbox {d}s+\sigma u(s)\hbox {d}W(s),\quad X(0)=x_0 \, ( \in \mathbb {R}), \end{aligned}$$
(4.1)

where \(r, \alpha \) and \(\sigma \) are real constants, and \(\alpha >r\).

The cost functional is given by

$$\begin{aligned} J(t,x,u)&= \frac{\gamma }{2}Var(X^{t,x}(T))-E[X^{t,x}(T)] \nonumber \\&= E\left( \frac{\gamma }{2}\left(X^{t,x}(T)\right)^2-X^{t,x}(T) \right) -\frac{\gamma }{2}\left(E[X^{t,x}(T)]\right)^2, \end{aligned}$$
(4.2)

where the constant \(\gamma >0\) is the risk aversion coefficient. The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is

$$\begin{aligned} \hbox {d}X^{t,x}(s)=\left(r X^{t,x}(s)+\left(\alpha -r\right)u(s)\right)\hbox {d}s+\sigma u(s)\hbox {d}W(s),\,\, t<s\le T, \quad X^{t,x}(t)=x. \end{aligned}$$
(4.3)

The Hamiltonian associated to this system is

$$\begin{aligned} H(t,x,u,p,q)=\left(r x+\left(\alpha -r\right)u\right)p+\sigma uq, \end{aligned}$$

and the \({\mathcal {H}}\)-function is

$$\begin{aligned} {\mathcal {H}}(t,x,v){:=}\,H(t,x,v,p,q)-\frac{1}{2}P(\sigma {\hat{\varphi }}(t,x))^2+ \frac{1}{2}P\sigma ^2\left(v-{\hat{\varphi }}(t,x)\right)^2. \end{aligned}$$

The equation for \(P\) takes the form

$$\begin{aligned} \hbox {d}P^{t,x}(s)= -2r P^{t,x}(s)\hbox {d}s +Q^{t,x}(s)\hbox {d}W_s, \end{aligned}$$
(4.4)

where \(P^{t,x}(T)= -\gamma \). We obtain \(P^{t,x}(s)= -\gamma e^{2r(T-s)}\) for \(s\in [t, T]\).
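For completeness, this can be checked directly: the terminal condition \(P^{t,x}(T)=-\gamma \) comes from \(P^{t,x}(T)=-g^{t,x}_{yy}(T)\) with \(g(y,z)=\frac{\gamma }{2}y^2-y-\frac{\gamma }{2}z^2\) read off from (4.2), and

$$\begin{aligned} \hbox {d}\left( -\gamma e^{2r(T-s)}\right) =-2r\left( -\gamma e^{2r(T-s)}\right) \hbox {d}s, \end{aligned}$$

so that (4.4) is satisfied with \(Q^{t,x}\equiv 0\).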

In view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. Since \(P^{t,x}(t)<0\), the \({\mathcal {H}}\)-function is concave in \(v\), and \(\hat{\varphi }(t,x)\) is a maximizer if and only if the first-order condition

$$\begin{aligned} (\alpha -r)p+\sigma q=0. \end{aligned}$$
(4.5)

Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}p^{t,x}(s)=-r p^{t,x}(s)\hbox {d}s+q^{t,x}(s)\hbox {d}W(s),\\ p^{t,x}(T)=1-\gamma \left({{\hat{X}}}^{t,x}(T)-E[{{\hat{X}}}^{t,x}(T)]\right). \end{array} \right. \end{aligned}$$
(4.6)

We try a solution of the form

$$\begin{aligned} p^{t,x}(s)=C_s-A_s\left({{\hat{X}}}^{t,x}(s)-E[{{\hat{X}}}^{t,x}(s)]\right), \end{aligned}$$
(4.7)

where \(A_s\) and \(C_s\) are deterministic functions such that

$$\begin{aligned} A_T=\gamma ,\,\,\, C_T=1. \end{aligned}$$

Identifying the coefficients in (4.3) and (4.6), we get, for \(s\ge t\),

$$\begin{aligned}&(2r A_s+\dot{A}_s)\left({{\hat{X}}}^{t,x}(s)-E[\hat{X}^{t,x}(s)]\right)+(\alpha -r)A_s(\hat{\varphi }(s,\hat{X}^{t,x}(s))-E[\hat{\varphi }(s,{{\hat{X}}}^{t,x}(s))])\nonumber \\&\quad =\dot{C}_s+r C_s,\end{aligned}$$
(4.8)
$$\begin{aligned}&q^{t,x}(s)=-A_s\sigma \hat{\varphi }(s,{{\hat{X}}}^{t,x}(s)). \end{aligned}$$
(4.9)

In view of (4.5), we have

$$\begin{aligned} (\alpha -r)p^{t,x}(t)+\sigma q^{t,x}(t)=0. \end{aligned}$$
(4.10)

Now, from (4.7), we have

$$\begin{aligned} p^{t,x}(t)=C_t, \end{aligned}$$

which is deterministic and independent of \(x\).

Hence, from (4.5), we get

$$\begin{aligned} q^{t,x}(t)=-\frac{\alpha -r}{\sigma }C_t. \end{aligned}$$

In view of (4.9), the equilibrium point is the deterministic function

$$\begin{aligned} \hat{\varphi }(s){:=}\frac{\alpha -r}{\sigma ^2}\frac{C_s}{A_s}, \quad 0\le s\le T. \end{aligned}$$
(4.11)

It remains to determine \(A_s\) and \(C_s\).

Indeed, inserting (4.11) in (4.8), we obtain

$$\begin{aligned} (\dot{A}_s+2r A_s)({{\hat{X}}}^{t,x}(s)-E[{{\hat{X}}}^{t,x}(s)])=\dot{C}_s+r C_s, \end{aligned}$$

Since the left-hand side has zero mean while the right-hand side is deterministic, both sides must vanish, giving the equations satisfied by \(A_s\) and \(C_s\):

$$\begin{aligned} \left\{ \begin{array}{lll} \dot{A}_s+2r A_s=0,&{}~ A_T=\gamma ,\\ \dot{C}_s+r C_s=0,&{}~ C_T=1. \end{array} \right. \end{aligned}$$

The solutions of these equations are

$$\begin{aligned} A_s=\gamma e^{2r (T-s)},\quad C_s=e^{r (T-s)},\quad 0\le s\le T. \end{aligned}$$

Whence, we obtain the following explicit form of the equilibrium point:

$$\begin{aligned} \hat{\varphi }(s)=\frac{1}{\gamma }\frac{\alpha -r}{\sigma ^2}e^{-r (T-s)},\quad 0\le s\le T, \end{aligned}$$

which is identical to the one obtained in Björk and Murgoci [6] by solving an extended HJB equation.
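As a simple numerical illustration of this closed-form solution, the following minimal Python sketch (all parameter values are illustrative and not taken from the paper) evaluates \(\hat{\varphi }\) and simulates the equilibrium wealth dynamics (4.1) under \(u(s)=\hat{\varphi }(s)\) by an Euler–Maruyama scheme, so that \(E[X(T)]\), \(Var(X(T))\), and the criterion (4.2) at \((0,x_0)\) can be estimated by Monte Carlo:

```python
import numpy as np

# Illustrative parameters (not taken from the paper): riskless rate r,
# stock drift alpha > r, volatility sigma, risk aversion gamma, horizon T.
r, alpha, sigma, gamma, T = 0.03, 0.08, 0.2, 2.0, 1.0
x0, n_steps, n_paths = 1.0, 200, 100_000
dt = T / n_steps
rng = np.random.default_rng(0)

def phi_hat(s):
    """Equilibrium strategy of Sect. 4.1: (alpha - r)/(gamma sigma^2) * exp(-r(T - s))."""
    return (alpha - r) / (gamma * sigma**2) * np.exp(-r * (T - s))

# Euler-Maruyama simulation of dX = (rX + (alpha - r)u)ds + sigma u dW with u = phi_hat(s).
X = np.full(n_paths, x0)
for k in range(n_steps):
    u = phi_hat(k * dt)
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    X += (r * X + (alpha - r) * u) * dt + sigma * u * dW

# Monte Carlo estimate of the mean-variance criterion (4.2) at (0, x0).
print("E[X(T)]   ~", X.mean())
print("Var(X(T)) ~", X.var())
print("J(0,x0)   ~", 0.5 * gamma * X.var() - X.mean())
```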

4.2 Mean-Variance Portfolio Selection with State Dependent Risk Aversion

Consider the same state process over \([0,T]\) as in Sect. 4.1. Namely,

$$\begin{aligned} \hbox {d}X(s)=\left(r X(s)+\left(\alpha -r\right)u(s)\right)\hbox {d}s+\sigma u(s)\hbox {d}W(s),\quad X(0)=x_0, \end{aligned}$$
(4.12)

where \(r, \alpha \) and \(\sigma \) are real constants. The modified cost functional takes the form

$$\begin{aligned} J(t,x,u)=\frac{\gamma (x)}{2}Var(X^{t,x}(T))-E[X^{t,x}(T)], \end{aligned}$$

where the risk aversion coefficient \(\gamma (x)>0\) is made dependent on the current wealth \(x\). We refer to Björk et al. [7] for an economic motivation of this dependence.

The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is

$$\begin{aligned} \hbox {d}X^{t,x}(s)=\left(r X^{t,x}(s)+\left(\alpha -r\right)u(s)\right)\hbox {d}s+\sigma u(s)\hbox {d}W(s),\,\, t<s\le T, \quad X^{t,x}(t)=x. \end{aligned}$$
(4.13)

Now, since \(\gamma (x)\) is assumed strictly positive for all \(x\), the equilibrium points of \(J\) are the same as the ones of the cost functional

$$\begin{aligned} \bar{J}(t,x,u)=\frac{1}{2}Var(X^{t,x}(T))- \gamma ^{-1}(x)E[X^{t,x}(T)]. \end{aligned}$$
(4.14)

Therefore, we will find feedback equilibrium points associated with (4.14).

The Hamiltonian associated to this system is

$$\begin{aligned} H(t,x,u,p,q)=\left(r x+\left(\alpha -r\right)u\right)p+\sigma uq, \end{aligned}$$

and the \({\mathcal {H}}\)-function is

$$\begin{aligned} {\mathcal {H}}(t,x,v){:=}H(t,x,v,p,q)-\frac{1}{2}P(\sigma {\hat{\varphi }}(t,x))^2+ \frac{1}{2}P\sigma ^2\left( v-{\hat{\varphi }}(t,x)\right)^2. \end{aligned}$$

Again, in view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. As before, \(\hat{\varphi }(t,x)\) is a maximizer if and only if the first-order condition

$$\begin{aligned} (\alpha -r)p+\sigma q=0. \end{aligned}$$
(4.15)

Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}p^{t,x}(s)=-r p^{t,x}(s)\hbox {d}s+q^{t,x}(s)\hbox {d}W(s),\\ p^{t,x}(T)=\gamma ^{-1}(x)-\left({{\hat{X}}}^{t,x}(T)-E[{{\hat{X}}}^{t,x}(T)]\right). \end{array} \right. \end{aligned}$$
(4.16)

We try a solution of the form

$$\begin{aligned} p^{t,x}(s)=C_s\gamma ^{-1}(x)-A_s\left({{\hat{X}}}^{t,x}(s)-E[{{\hat{X}}}^{t,x}(s)]\right), \end{aligned}$$
(4.17)

where \(A_s\) and \(C_s\) are deterministic functions such that

$$\begin{aligned} A_T=C_T=1. \end{aligned}$$

Identifying the coefficients in (4.13) and (4.16), we get for \(s\ge t\),

$$\begin{aligned}&(\dot{A}_s+2r A_s)\left({{\hat{X}}}^{t,x}(s)-E[\hat{X}^{t,x}(s)]\right)+(\alpha -r)A_s\left( \hat{\varphi }(s,{{\hat{X}}}^{t,x}(s))-E[ \hat{\varphi }(s,{{\hat{X}}}^{t,x}(s))]\right)\nonumber \\&\quad =(\dot{C}_s+r C_s)\gamma ^{-1}(x),\end{aligned}$$
(4.18)
$$\begin{aligned}&\quad q^{t,x}(s)=-A_s\sigma \hat{\varphi }(s,{{\hat{X}}}^{t,x}(s)), \end{aligned}$$
(4.19)

and, by (4.15), we have

$$\begin{aligned} (\alpha -r)p^{t,x}(t)+\sigma q^{t,x}(t)=0, \end{aligned}$$
(4.20)

But, from (4.17), we have

$$\begin{aligned} p^{t,x}(t)=C_t\gamma ^{-1}(x). \end{aligned}$$

Therefore, we get from (4.20)

$$\begin{aligned} q^{t,x}(t)=-\frac{\alpha -r}{\sigma }C_t\gamma ^{-1}(x), \end{aligned}$$
(4.21)

which together with (4.19) suggests an equilibrium point \( \hat{\varphi }\) of the form

$$\begin{aligned} \hat{\varphi }(s,y)=\frac{\alpha -r}{\sigma ^2}\frac{C_s}{A_s} \gamma ^{-1}(y),\quad (s,y)\in [0,T]\times \mathbb {R}. \end{aligned}$$
(4.22)

It remains to determine \(A_s\) and \(C_s\).

Indeed, inserting (4.22) in (4.18), we obtain,

$$\begin{aligned}&(\dot{A}_s+2r A_s)\left({{\hat{X}}}^{t,x}(s)-E[{\hat{X}}^{t,x}(s)]\right)+\frac{(\alpha -r)^2}{\sigma ^2}C_s\left(\gamma ^{-1}({\hat{X}}^{t,x}(s))-E[\gamma ^{-1}({{\hat{X}}}^{t,x}(s))]\right) \nonumber \\&\quad =(\dot{C}_s+r C_s)\gamma ^{-1}(x). \end{aligned}$$
(4.23)

Manifestly, from (4.23), it is hard to draw any conclusion about the form of the deterministic functions \(A_s\) and \(C_s\) unless we have an explicit form of the function \(\gamma (x)\). In fact, a closer look at (4.23) suggests that a feasible identification of the coefficients is possible, for instance, when \(\gamma (x)=\frac{\gamma }{x}\). Let us examine this case.

4.2.1 The Case \(\gamma (x)=\frac{\gamma }{x}\)

Let us consider the particular case when

$$\begin{aligned} \gamma (x)=\frac{\gamma }{x}. \end{aligned}$$

In this special case, (4.23) becomes

$$\begin{aligned} \left( \dot{A}_s+2rA_s+\frac{(\alpha -r)^2}{\gamma \sigma ^2}C_s\right) \left({\hat{X}}^{t,x}(s)-E[{\hat{X}}^{t,x}(s)]\right)-(\dot{C}_s+r C_s)\frac{x}{\gamma }=0. \end{aligned}$$

This suggests that the functions \(A_s\) and \(C_s\) solve the following system of equations:

$$\begin{aligned} \left\{ \begin{array}{lll} \dot{A}_s+2rA_s+\frac{(\alpha -r)^2}{\gamma \sigma ^2}C_s=0,\\ \dot{C}_s+r C_s=0, \\ A_T=C_T=1, \end{array} \right. \end{aligned}$$
(4.24)

which admits the following explicit solution:

$$\begin{aligned} A_s=e^{2r (T-s)}+\frac{(\alpha -r)^2}{r\gamma \sigma ^2} \left(e^{2r(T-s)}-e^{r(T-s)}\right),\quad C_s=e^{r(T-s)}, \; 0\le s\le T. \end{aligned}$$
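One can verify this directly: writing \(k{:=}\frac{(\alpha -r)^2}{\gamma \sigma ^2}\), the homogeneous term \(e^{2r(T-s)}\) is annihilated by \(\frac{\hbox {d}}{\hbox {d}s}+2r\), while

$$\begin{aligned} \left( \frac{\hbox {d}}{\hbox {d}s}+2r\right) \frac{k}{r}\left( e^{2r(T-s)}-e^{r(T-s)}\right) =\frac{k}{r}\left( re^{r(T-s)}-2re^{r(T-s)}\right) =-kC_s, \end{aligned}$$

so that \(\dot{A}_s+2rA_s+kC_s=0\) and \(A_T=C_T=1\), as required by (4.24).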

Hence, the equilibrium point \(\hat{\varphi }\) is explicitly given by

$$\begin{aligned} \hat{\varphi }(s,y)&= \frac{\alpha -r}{\gamma \sigma ^2}\frac{C_s}{A_s}y\\&= \frac{\alpha -r}{\gamma \sigma ^2}\left(e^{r (T-s)}+\frac{(\alpha -r)^2}{r\gamma \sigma ^2} \left(e^{r(T-s)}-1\right)\right)^{-1}y,\quad (s,y)\in [0,T]\times \mathbb {R}. \end{aligned}$$

4.3 Time-Inconsistent Linear-Quadratic Regulator

We consider the following variant of a time-inconsistent linear-quadratic regulator discussed in Björk and Murgoci [6]. We refer to recent work by Bensoussan et al. [4], Yong [36], and Hu et al. [19], where more general models are considered. The state process over \([0,T]\) defined on \((\varOmega ,\mathcal {F},\mathbb {F},\mathbb {P})\) is a scalar with dynamics

$$\begin{aligned} \hbox {d}X(s)=\left(aX(s)+bu(s)\right)\hbox {d}s+\sigma \hbox {d}W(s),\quad X(0)=x_0, \end{aligned}$$
(4.25)

where \( a, b\), and \(\sigma \) are real constants. The cost functional is given by

$$\begin{aligned} J(t,x,u)=\frac{1}{2}E\left[\int _t^T u^2(s)\,\hbox {d}s\right]+\frac{\gamma }{2}E\left[ \left( X^{t,x}(T)-x\right) ^2\right] , \end{aligned}$$

where \(\gamma \) is a positive constant. The associated dynamics, parametrized by \((t,x)\in [0,T]\times \mathbb {R}\), is

$$\begin{aligned} \hbox {d}X^{t,x}(s)=\left(aX^{t,x}(s)+bu(s)\right)\hbox {d}s+\sigma \hbox {d}W(s),\,\, t<s\le T, \quad X^{t,x}(t)=x. \end{aligned}$$
(4.26)

As mentioned in Björk and Murgoci [6], in this time-inconsistent version of the linear-quadratic regulator, we want to control the system so that the final state \(X^{t,x}(T)\) stays as close as possible to \(X^{t,x}(t)=x\), while at the same time, we keep the control energy (expressed by the integral term) small. The time-inconsistency stems from the fact that the target point \(X^{t,x}(t)=x\) is changing with time.

The Hamiltonian associated to this system is

$$\begin{aligned} H(s,x,u,p,q){:=}\left(ax+bu\right)p+\sigma q-\frac{1}{2}u^2, \end{aligned}$$
(4.27)

and the \({\mathcal {H}}\)-function is

$$\begin{aligned} {\mathcal {H}}(t,x,v) {:=} H(t,x,v,p,q)-\frac{1}{2}P\sigma ^2. \end{aligned}$$

Again, in view of Remark 3, \(\hat{\varphi }\) is an equilibrium point if and only if it maximizes the \({\mathcal {H}}\)-function. Such a maximizer is

$$\begin{aligned} {\hat{\varphi }}=bp. \end{aligned}$$
(4.28)

Therefore, to characterize the equilibrium points, we only need to consider the first-order adjoint equation:

$$\begin{aligned} \left\{ \begin{array}{lll} \hbox {d}p^{t,x}(s)=-ap^{t,x}(s)\hbox {d}s+q^{t,x}(s)\hbox {d}W(s), \\ p^{t,x}(T)=\gamma (x-{\hat{X}}^{t,x}(T)). \end{array} \right. \end{aligned}$$
(4.29)

We try a solution of the form

$$\begin{aligned} p^{t,x}(s)=\beta _sx-\alpha _s{{\hat{X}}}^{t,x}(s), \end{aligned}$$
(4.30)

where \(\alpha _s\) and \(\beta _s\) are deterministic functions such that

$$\begin{aligned} \alpha _T=\beta _T=\gamma . \end{aligned}$$

Identifying the coefficients in (4.26) and (4.29), we get, for \(s\ge t\),

$$\begin{aligned} (\dot{\alpha }_s+2a\alpha _s){{\hat{X}}}^{t,x}(s)+b\alpha _s{\hat{\varphi }}(s,\hat{X}^{t,x}(s))=(\dot{\beta }_s+a\beta _s)x, \end{aligned}$$
(4.31)

and

$$\begin{aligned} q^{t,x}(s)=-\sigma \alpha _s. \end{aligned}$$

On the other hand, in view of (4.28)

$$\begin{aligned} {\hat{\varphi }}(t,x)=bp^{t,x}(t). \end{aligned}$$

Thus, by (4.30), the function \(\hat{\varphi }\) which yields the equilibrium point has the form

$$\begin{aligned} \hat{\varphi }(s,y)=b(\beta _s-\alpha _s)y, \quad (s,y)\in [0,T] \times \mathbb {R}. \end{aligned}$$
(4.32)

Therefore, (4.31) reduces to

$$\begin{aligned} (\dot{\alpha }_s+(2a+b^2\beta _s)\alpha _s-b^2\alpha ^2_s){{\hat{X}}}^{t,x}(s)=(\dot{\beta }_s+a\beta _s)x, \end{aligned}$$

suggesting that \((\alpha _s,\beta _s)\) solves the system of equations

$$\begin{aligned} \left\{ \begin{array}{lll} \dot{\beta }_s+a\beta _s=0,\\ \dot{\alpha }_s+(2a+b^2\beta _s)\alpha _s-b^2\alpha ^2_s=0,\\ \alpha _T=\gamma , \,\,\beta _T=\gamma . \end{array} \right. \end{aligned}$$
(4.33)

The first equation in (4.33) yields the solution

$$\begin{aligned} \beta _s=\gamma e^{a(T-s)}. \end{aligned}$$

The second equation is of Riccati type; its solution is \(\alpha _s{:=}\frac{v_s}{w_s}\), where \((v,w)\) solves the following system of linear differential equations:

$$\begin{aligned} \left( \begin{array}{lll} \dot{v}_s \\ \dot{w}_s\end{array}\right) =\left( \begin{array}{lll} -2a &{}\quad 0 \\ -b^2 &{} \quad b^2\beta _s\end{array}\right) \left( \begin{array}{lll} v_s \\ w_s\end{array}\right) ,\quad \left( \begin{array}{lll} v_T\\ w_T\end{array}\right) =\left( \begin{array}{lll} \gamma \\ 1 \end{array}\right) . \end{aligned}$$

This linear system is explicitly solvable.
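Numerically, \((v,w)\), and hence \(\alpha _s=v_s/w_s\) and the feedback gain in (4.32), can be computed by a backward sweep in time. The following minimal Python sketch (parameter values are illustrative only, not taken from the paper) uses a backward Euler step:

```python
import numpy as np

# Illustrative parameters (not taken from the paper).
a, b, gamma, T = 0.5, 1.0, 2.0, 1.0
n = 1000
dt = T / n
s = np.linspace(0.0, T, n + 1)

beta = gamma * np.exp(a * (T - s))   # closed form: beta_s = gamma * exp(a(T - s))

# Backward Euler sweep for the terminal-value linear system
#   v' = -2a v,  w' = -b^2 v + b^2 beta w,  (v_T, w_T) = (gamma, 1),
# whose ratio alpha_s = v_s / w_s solves the Riccati equation in (4.33).
v = np.empty(n + 1)
w = np.empty(n + 1)
v[-1], w[-1] = gamma, 1.0
for k in range(n, 0, -1):
    v[k - 1] = v[k] + dt * 2 * a * v[k]                          # v(s - dt) = v(s) - dt * v'(s)
    w[k - 1] = w[k] + dt * (b**2 * v[k] - b**2 * beta[k] * w[k])

alpha = v / w
gain = b * (beta - alpha)            # equilibrium feedback (4.32): u = gain(s) * y
print("feedback gain at s = 0:", gain[0])
```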

5 Extension to Mean-Field Game Models

In this section, we extend the SMP approach to an \(N\)-player stochastic differential game of mean-field type where the \(i\)th player would like to find a strategy to optimize her own cost functional regardless of the other players’ cost functionals.

Let \(X=(X_1, \ldots , X_N)\) describe the states of the \(N\) players and \(u=(u_1, \ldots , u_N)\in \prod _{i=1}^N\mathcal {U}_i[0,T]\) be the ensemble of all the individual admissible strategies. Each \(u_i\) takes values in a non-empty subset \(U_i\) of \(\mathbb {R}\), and the class of admissible strategies is given by

$$\begin{aligned} \mathcal {U}_i[0,T]=\Big \{u_i: [0,T]\times \varOmega \longrightarrow U_i;\,\, u_i\, \text{ is }\ \mathbb {F}\text{-adapted } \text{ and } \text{ square } \text{ integrable }\Big \}. \end{aligned}$$
(5.1)

To simplify the analysis, we consider a population of uniform agents, so that \(U_i=U\) and all players have the same initial state \(X_i(0)=x_0\) at time 0, for \(i\in \{1, \ldots , N\}\). In this case, the \(N\) sets \(\mathcal {U}_i[0,T]\) are identical and equal to \(\mathcal {U}[0,T]\). Let the dynamics be given by the following SDE:

$$\begin{aligned} \hbox {d}X_i(s)= b(s,X_i(s),E[X_i(s)],u_i(s))\hbox {d}s+\sigma (s,X_i(s),E[X_i(s)])\hbox {d}W_i(s), \end{aligned}$$
(5.2)

where the strategy \(u_i\) does not enter the diffusion coefficient \(\sigma \).

For notational simplicity, we do not explicitly indicate the dependence of the state on the control by writing \(X_i^{u_i}(s)\). We take \({\mathbb {F}}\) to be the natural filtration of the \(N\)-dimensional standard Brownian motion \((W_1, \ldots , W_N)\) augmented by \({\mathbb {P}}\)-null sets of \({\mathcal {F}}\).

Denote

$$\begin{aligned} (u_{-i},v){:=}\,(u_1,\ldots ,u_{i-1},v,u_{i+1},\ldots ,u_N),\quad i=1,\ldots ,N. \end{aligned}$$

Then, the \(i\)th player selects \(u_i\in \mathcal {U}[0,T]\) to evaluate her cost functional

$$\begin{aligned} v\mapsto J^{i,N}(t,x_i;u_{-i},v){:=}J^{i,N} (t,x_i;u_1,\ldots ,u_{i-1},v,u_{i+1},\ldots ,u_N), \end{aligned}$$

where

$$\begin{aligned} J^{i,N}(t,x_i;u)&= E\left[ \int _t^T h\left( s,X^{t,x_i}_i(s),E[X^{t,x_i}_i(s)], X^{(-i)}(s),u_i(s)\right) \hbox {d}s \right. \nonumber \\&+\left. g\left( X^{t,x_i}_i(T), E[X^{t,x_i}_i(T)], X^{(-i)}(T) \right) \right] . \end{aligned}$$
(5.3)

The associated dynamics, parametrized by \((t,x_i)\), is

$$\begin{aligned} \left\{ \begin{array}{l} \hbox {d}X^{t,x_i}_i(s)= b(s,X^{t,x_i}_i(s),E[X^{t,x_i}_i(s)],u_i(s))\hbox {d}s\\ \qquad +\,\sigma (s,X^{t,x_i}_i(s),E[X^{t,x_i}_i(s)])\hbox {d}W_i(s),\quad t<s\le T, \\ X^{t,x_i}_i(t)=x_i. \end{array} \right. \end{aligned}$$
(5.4)

The \(i\)th player interacts with others through the mean-field coupling term

$$\begin{aligned} X^{(-i)}=\frac{1}{N-1}\sum _{k\ne i}^N X_k,\quad i\in \{1,\ldots ,N\}, \end{aligned}$$

which models the aggregate impact of all other players.

Note that the \(i\)th player assesses her cost functional over \([t,T]\) as seen from her local state \(X_i(t)= x_i\), and she knows only the initial states \(X_k(0)=x_0\), \(k\ne i\), of all other players at time 0. Thus the game may be cast as a decision problem where each player has incomplete state information about the other players. The development of a solution framework in terms of a certain exact equilibrium notion is challenging. Our objective is to address this incomplete state information issue and design a set of individual strategies which has a meaningful interpretation. This will be achieved by using the so-called consistent mean-field approximation.

For a large \(N\), even if each player has full state information of the system, the exact characterization of the equilibrium points based on the SMP will have high complexity, since each player's problem leads to a variational inequality for the underlying Hamiltonians similar to (3.7), which is further coupled with the state processes of all other players. Therefore, we rely instead on the mean-field approximation of our system.

We note that \(J^{i,N}\) depends not only on \(u_i\), but also on all other players’ strategies \(u_{-i}\) through the mean-field coupling term \(X^{(-i)}\). This suggests that we extend Definition 1 to the \(N\)-player case as follows.

Definition 3

The admissible strategy \({\hat{u}}=({\hat{u}}_1, \ldots , {\hat{u}}_N)\) is a \(\delta _N\)-sub-game perfect equilibrium point for \(N\) players in the system (5.2)–(5.3) if for every \(i\in \{1,\ldots ,N\}\),

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J^{i,N}(t,x_i;{\hat{u}})-J^{i,N}(t,x_i;\hat{u}_{-i}, u_i^{\varepsilon })}{\varepsilon }\le O(\delta _N), \end{aligned}$$
(5.5)

for each given \(u_i\in \mathcal {U}[0,T]\), \(x_i\in \mathbb {R}\) and \(\, {\mathrm{a.e.}\,}t\in [0,T]\), where \(u_i^{\varepsilon }\) is the spike variation (2.4) of the strategy \({\hat{u}}_i\) of the \(i\)th player using \(u_i\), and \(0\le \delta _N\rightarrow 0 \) as \(N\rightarrow \infty \).

The error term \(O(\delta _N)\) is due to the mean-field approximation to be introduced below for designing \({\hat{u}}\).

5.1 The Local Limiting Decision Problem

Let \(X^{(-i)}\) be approximated by a deterministic function \(\bar{X}(s)\) on \([0,T]\). Denote the cost functional

$$\begin{aligned} \bar{J}^{i} (t, x_i; u_i)&= E\left[ \int _t^T h\left( s, X^{t,x_i}_i(s), E[ X^{t,x_i}_i(s) ], \bar{X}(s), u_i(s)\right) \hbox {d}s \right. \nonumber \\&\left. +\,g\left( X^{t,x_i}_i(T), E[ X^{t,x_i}_i(T)], \bar{X}(T)\right) \right] \end{aligned}$$
(5.6)

which is intended as an approximation of \(J^{i,N}\). Note that once \(\bar{X}\) is fixed, \(\bar{J}^{i}\) is affected only by \(u_i\). The introduction of \(\bar{X}\) as a fixed function of time is based on the freezing idea in mean-field games: \(X^{(-i)}=\frac{1}{N-1} \sum _{k\ne i}^N X_k\) is generated by many negligibly small players, and therefore, a given player has little influence on it; the numerical sketch below illustrates this concentration effect.
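As a numerical aside (not part of the argument), the following Python sketch simulates \(N\) players under assumed illustrative coefficients, \(b(s,y,z,u)=-y+0.5z\) with the control omitted and a constant \(\sigma \), and records the spread of \(X^{(-i)}(T)\) across independent runs; the spread decays roughly like \(1/\sqrt{N-1}\), which is what licenses freezing \(X^{(-i)}\) at a deterministic \(\bar{X}\).

```python
import numpy as np

rng = np.random.default_rng(0)

def x_minus_i_terminal(N, T=1.0, steps=100, x0=1.0, sigma=0.3):
    """One realization of X^{(-i)}(T) for i = 1: Euler-Maruyama for N
    players of (5.2), with the empirical mean standing in for E[X(s)]
    and an assumed drift b = -y + 0.5*z (the control term is omitted)."""
    dt = T / steps
    X = np.full(N, x0)
    for _ in range(steps):
        drift = -X + 0.5 * X.mean()
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
    return (X.sum() - X[0]) / (N - 1)   # average over players k != 1

for N in [10, 100, 1000]:
    samples = np.array([x_minus_i_terminal(N) for _ in range(200)])
    print(N, samples.std())   # spread shrinks roughly like 1/sqrt(N - 1)
```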

The \(i\)th player selects her strategy by finding a sub-game perfect equilibrium for \(\bar{J}^{i}\), to which the stochastic maximum principle of Sect. 3 [cf. (3.8)] can be applied under the following conditions:

Assumption 2

  1. (i)

    The functions \(b(s,y,z,u), \sigma (s,y,z), h(s,y,z,w,u),g(y,z,w)\) are bounded.

  2. (ii)

The functions \(b, \sigma \) are differentiable with respect to \((y,z)\). The derivatives are Lipschitz continuous in \((y,z)\) and bounded.

  3. (iii)

    The functions \( h, g\) are differentiable with respect to \((y,z,w)\), and their derivatives are continuous in \((y,z, w, u)\) and \((y,z,w)\), respectively, and bounded.

Let \({\hat{u}}_i\in \mathcal {U}[0,T]\) be a sub-game perfect equilibrium point for (5.4) and (5.6), and consider the associated backward SDE

$$\begin{aligned} \left\{ \begin{array}{ll} \hbox {d}p^{t,x_i}(s)=-\left\{ H^{t,x_i}_y(s)+E [H^{t,x_i}_z(s)] \right\} \hbox {d}s +q^{t,x_i}(s)\hbox {d}W_i(s),\\ p^{t,x_i}(T)= -g^{t,x_i}_y (T)-E[g^{t,x_i}_z(T)], \end{array} \right. \end{aligned}$$
(5.7)

where for \(\zeta =y,z\),

$$\begin{aligned} H^{t,x_i}_{\zeta }(s)&= b_{\zeta }(s,{{\hat{X}}}^{t,x_i}_i(s),E[\hat{X}^{t,x_i}_i(s)],{\hat{u}}_i(s))p^{t,x_i}(s)+ \sigma _{\zeta }(s,{{\hat{X}}}^{t,x_i}_i(s), E[{{\hat{X}}}^{t,x_i}_i(s)])q^{t,x_i}(s) \\&-\, h_{\zeta }(s,{\hat{X}}^{t,x_i}_i(s), E[{\hat{X}}^{t,x_i}_i(s)],\bar{X}(s),{\hat{u}}_i(s)), \end{aligned}$$

for which

$$\begin{aligned}&H(t,x_i,v,p^{t,x_i}(t),q^{t,x_i}(t))-H(t,x_i,\hat{u}_i(t),p^{t,x_i}(t),q^{t,x_i}(t)) \le 0, \nonumber \\&\quad \forall v\in U,\; x_i\in \mathbb {R}, \;\, {\mathrm{a.e.}\,}t\in [0,T],\;{\mathbb {P}}-a.s. \end{aligned}$$
(5.8)

The closed-loop equilibrium state process associated with \({\hat{u}}_i\) of the \(i\)th player is given by

$$\begin{aligned} \hbox {d}{{\hat{X}}}_i(s)= b(s,{{\hat{X}}}_i(s), E[{{\hat{X}}}_i(s)], {\hat{u}}_i(s))\hbox {d}s +\sigma (s, {{\hat{X}}}_i(s), E[{{\hat{X}}}_i(s)]) \hbox {d}W_i(s). \end{aligned}$$
(5.9)

We call \({\hat{u}}_i\) a decentralized strategy in that its sample paths depend only on the local Brownian motion \(W_i\). The processes \(\{{\hat{u}}_k, 1\le k\le N\}\) are therefore independent. Further, we impose

Assumption 3

All the processes \(\{{\hat{u}}_k, 1\le k\le N\}\) have the same law.

This restriction ensures that \(\{{{\hat{X}}}_i, 1\le i\le N\}\) are i.i.d. processes. Since each \({\hat{u}}_i\) is adapted to the filtration generated by \(W_i\), it can be represented as a non-anticipative functional \(\hat{F}(\{W_i(s)\}_{s\le t}) \) of \(W_i\). For a given \(\bar{X}\), if non-uniqueness of \({\hat{u}}_i\) arises, we stipulate that the same functional \({\hat{F}}\) is used by all players, applied to their respective Brownian motions, so that all the individual control processes have the same law. Thus, some coordination is necessary for the strategy selection under non-uniqueness. By the law of large numbers, the consistency condition on \(\bar{X}\) reads

$$\begin{aligned} \bar{X}(s)= E[{{\hat{X}}}_1(s)], \quad \forall s\in [0,T]. \end{aligned}$$
(5.10)
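Schematically, (5.10) is a fixed-point condition: freeze \(\bar{X}\), compute the equilibrium response, recompute the mean, and iterate. Below is a minimal Monte Carlo sketch in Python, under explicit assumptions: the feedback `u_hat` is a placeholder for an equilibrium strategy produced by (5.7)–(5.8), and the coefficients are illustrative, not from the paper.

```python
import numpy as np

T, steps, M, x0, sigma = 1.0, 100, 4000, 1.0, 0.3
dt = T / steps

def u_hat(s, x, xbar_s):
    # placeholder feedback; in the paper it comes from (5.7)-(5.8)
    return -0.5 * (x - xbar_s)

def mean_equilibrium_path(xbar):
    """Empirical mean path of M i.i.d. copies of (5.9) with xbar frozen:
    a Monte Carlo proxy for s -> E[X_hat_1(s)]. A fixed seed (common
    random numbers) makes the map xbar -> mean path deterministic."""
    rng = np.random.default_rng(1)
    X = np.full(M, x0)
    mean = np.empty(steps + 1)
    mean[0] = x0
    for k in range(steps):
        # xbar[k] also stands in for E[X(s)] inside the coefficients;
        # at the fixed point (5.10) the two coincide
        drift = -X + 0.5 * xbar[k] + u_hat(k * dt, X, xbar[k])
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(M)
        mean[k + 1] = X.mean()
    return mean

xbar = np.full(steps + 1, x0)            # initial guess for X_bar
for _ in range(20):
    new = mean_equilibrium_path(xbar)
    done = np.max(np.abs(new - xbar)) < 1e-3
    xbar = new
    if done:                             # (5.10) holds approximately
        break
```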

A question of central interest is how to characterize the performance of the set of strategies \({\hat{u}}=({\hat{u}}_1, \ldots , \hat{u}_N)\) when these strategies are implemented and assessed according to the original cost functionals \(\{J^{i,N}, 1\le i\le N\}\). An answer is provided in the following theorem, whose proof is given in the next section. This is the second main result of the paper.

Theorem 2

Under Assumptions 2 and 3, suppose there exists a solution to (5.7), (5.9) and (5.10). Then we have

$$\begin{aligned} J ^{i,N}(t,x_i;{\hat{u}} )-J^{i,N}(t,x_i;{\hat{u}}_{-i}, u_i^\varepsilon ) =\bar{J}^{i}(t,x_i; {\hat{u}}_i)-\bar{J}^{i}(t,x_i; u_i^\varepsilon )+O\left(\frac{\varepsilon }{\sqrt{N-1}}\right). \end{aligned}$$
(5.11)

Moreover, \({\hat{u}}=({\hat{u}}_1, \ldots , {\hat{u}}_N)\in \prod _{i=1}^N\mathcal {U}[0,T]\) is a \(\delta _N\)-sub-game perfect equilibrium for the system (5.2)–(5.3) where \(\delta _N\le C/\sqrt{N}\) and \(C\) depends only on \((b,\sigma , h, g,T)\). \(\square \)

If there exists a unique solution \((\bar{X}, {\hat{u}}_i)\) to (5.7), (5.9) and (5.10), each player can construct her strategy locally. When there are multiple solutions, the players need to coordinate to choose the same \(\bar{X}\) and further ensure that \(\{{\hat{u}}_i, 1\le i\le N\}\) have the same law.

6 Proof of Theorem 2

This section is devoted to the proof of Theorem 2. We first establish some performance estimates which will be used to conclude the proof of the theorem.

6.1 The Performance Estimate

We have

$$\begin{aligned} J^{i,N}(t,x_i; {\hat{u}})&= E_{t,x_i}\left[ \int _t^T h(s, \hat{X}^{t,x_i}_i(s), E [{{\hat{X}}}^{t,x_i}_i(s)] , {{\hat{X}}}^{(-i)}(s), \hat{u}_i(s))\hbox {d}s \right. \\&\quad \left. +\,g({{\hat{X}}}^{t,x_i}_i(T), E [{{\hat{X}}}^{t,x_i}_i(T)], {{\hat{X}}}^{(-i)}(T))\right] . \end{aligned}$$

Now we fix \(i\in \{1, \ldots , N\}\) and change \({\hat{u}}_i\) to \(u_i^\varepsilon \) while all other players apply \({\hat{u}}_{-i}\), where

$$\begin{aligned} u_i^{\varepsilon }(s){:=}\left\{ \begin{array}{ll} u_{i}(s), \,\, &{} s\in [t, t+\varepsilon ],\\ {\hat{u}}_i(s), \,\, &{} s\in [t,T]\backslash [t, t+\varepsilon ], \end{array}\right. \end{aligned}$$

and \(u_i\in {\mathcal {U}}[0,T]\). We have

$$\begin{aligned} J^{i,N}(t,x_i;{\hat{u}}_{-i}, u_i^\varepsilon )&= E\left[ \int _t^T h\left( s, X^{t,x_i}_i(s), E [ X^{t,x_i}_i(s)] , {{\hat{X}}}^{(-i)}(s) , u^{\varepsilon }_i(s)\right) \hbox {d}s \right. \nonumber \\&\quad \left. +g\left( X^{t,x_i}_i(T), E [ X^{t,x_i}_i(T)], {{\hat{X}}}^{(-i)}(T) \right) \right] , \end{aligned}$$
(6.1)

where \(X^{t,x_i}_i\) is the solution of (5.4) with admissible strategy \(u_i^\varepsilon \). The following estimates will be frequently used in the sequel.
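As an aside, the spike variation \(u_i^{\varepsilon }\) admits a direct transcription on a time grid; a minimal Python sketch, with illustrative names:

```python
import numpy as np

def spike_variation(u_hat, u, grid, t, eps):
    """Return u^eps as in the display above: equal to u on [t, t + eps]
    and to u_hat elsewhere, for control paths sampled on `grid`."""
    u_eps = np.array(u_hat, dtype=float)
    window = (grid >= t) & (grid <= t + eps)
    u_eps[window] = np.asarray(u, dtype=float)[window]
    return u_eps

grid = np.linspace(0.0, 1.0, 101)
u_hat_path = np.zeros_like(grid)   # stands in for the equilibrium strategy
u_path = np.ones_like(grid)        # the deviation inserted on the spike
u_eps = spike_variation(u_hat_path, u_path, grid, t=0.4, eps=0.05)
```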

Lemma 1

For the \(i\)th player, let \( X_i\) and \( {{\hat{X}}}_i\) be the state processes corresponding to \(u_i^\varepsilon \) and \({\hat{u}}_i\), respectively. Then

$$\begin{aligned} E\Big [\sup _{t\le s\le T} |X^{t,x_i}_i(s)-\hat{X}^{t,x_i}_i(s)|^2\Big ]\le C\varepsilon ^{2}, \end{aligned}$$

where \(C\) does not depend on \((t, x_i)\).

Proof

Using the SDEs (5.4) for the two state processes, we have

$$\begin{aligned}&X_i^{t,x_i}(\tau )-{{\hat{X}}}_i^{t,x_i}(\tau )\\&\quad = \int _t^\tau \left\{ b\left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)],u_i^\varepsilon (s)\right) - b\left( s, {{\hat{X}}}_i^{t,x_i}(s), E[ \hat{X}_i^{t,x_i}(s)], {\hat{u}}_i(s)\right) \right\} \hbox {d}s \\&\qquad +\int _t^\tau \left\{ \sigma \left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)]\right) -\sigma \left( s, {{\hat{X}}}_i^{t,x_i}(s), E[\hat{X}_i^{t,x_i}(s)]\right) \right\} \hbox {d}W_i(s). \end{aligned}$$

By the Burkholder–Davis–Gundy inequality, we have

$$\begin{aligned}&E\left[ \sup \nolimits _{t\le \tau \le T}|X_i^{t,x_i}(\tau )-{{\hat{X}}}_i^{t,x_i}(\tau )|^2\right] \\&\quad \le C E \left[ \Bigg (\int _t^T\left| b\left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)], u_i^\varepsilon (s)\right) \right. \right. \\&\qquad \left. \left. -\, b\left( s, {\hat{X}}_i^{t,x_i}(s), E[ {\hat{X}}_i^{t,x_i}(s)], {\hat{u}}_i(s)\right) \right| \hbox {d}s\Bigg )^2\right] \\&\qquad +\, CE\left[\int _t^T \left| \sigma \left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)]\right) -\sigma \left( s, {\hat{X}}_i^{t,x_i}(s), E[{\hat{X}}_i^{t,x_i}(s)]\right) \right| ^2 \hbox {d}s\right]\\&\quad =: C(I_b+I_\sigma ), \end{aligned}$$

where \(C\) is a positive constant.

In view of Assumption 2-(i) (boundedness of \(b\)), there is a positive constant \(C\) such that

$$\begin{aligned}&\left| b\left( s, {\hat{X}}_i^{t,x_i}(s), E[{\hat{X}}_i^{t,x_i}(s)], u_i^\varepsilon (s)\right) - b\left( s,{\hat{X}}_i^{t,x_i}(s), E[\hat{X}_i^{t,x_i}(s)],{\hat{u}}_i(s)\right) \right| \\&\quad = \left| b \left( s, \hat{X}_i^{t,x_i}(s), E[{\hat{X}}_i^{t,x_i}(s)],u_i(s)\right) - b\left( s, {\hat{X}}_i^{t,x_i}(s), E[ {\hat{X}}_i^{t,x_i}(s)], \hat{u}_i(s)\right) \right| 1\!\!1_{[t,t+\varepsilon ]}(s)\\&\quad \le C1\!\!1_{[t,t+\varepsilon ]}(s). \end{aligned}$$

Thus, since \(b\) is Lipschitz in \((y,z)\), by Assumption 2-(ii), we have

$$\begin{aligned}&\left| b\left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)], u_i^\varepsilon (s)\right) - b\left( s, {\hat{X}}_i^{t,x_i}(s), E[ \hat{X}_i^{t,x_i}(s)], {\hat{u}}_i(s)\right) \right| \nonumber \\&\quad \le \left| b\left( s, X_i^{t,x_i}(s), E[ X_i^{t,x_i}(s)], u_i^\varepsilon (s)\right) - b\left( s, {\hat{X}}_i^{t,x_i}(s), E[ {\hat{X}}_i^{t,x_i}(s)], u^\varepsilon _i(s)\right) \right| \nonumber \\&\qquad +\,\left| b \left( s,{\hat{X}}_i^{t,x_i}(s), E[\hat{X}_i^{t,x_i}(s)],u_i^\varepsilon (s)\right) -b\left( s, \hat{X}_i^{t,x_i}(s), E[{\hat{X}}_i^{t,x_i}(s)],{\hat{u}}_i(s)\right) \right| \nonumber \\&\quad \le C\left( |X_i^{t,x_i}(s)-\hat{X}_i^{t,x_i}(s)|+E[|X_i^{t,x_i}(s)-\hat{X}_i^{t,x_i}(s)|]+1\!\!1_{[t,t+\varepsilon ]}(s)\right) . \end{aligned}$$
(6.2)

The Cauchy–Schwarz inequality yields

$$\begin{aligned} I_b&\le C(T-t)\int _t^T E[|X_i^{t,x_i}(s)-{\hat{X}}_i^{t,x_i}(s)|^2] \hbox {d}s +C E\left[ \left( \int _t^T 1\!\!1_{[t,t+\varepsilon ]}(s)\hbox {d}s\right) ^2 \right] \nonumber \\&\le C\int _t^T E\left[ \sup _{t\le \eta \le s}|X_i^{t,x_i}(\eta )-\hat{X}_i^{t,x_i}(\eta )|^2\right] \hbox {d}s +C\varepsilon ^2. \end{aligned}$$
(6.3)

In a similar fashion, since \(\sigma \) is Lipschitz in \((y,z)\), by Assumption 2-(ii), we obtain

$$\begin{aligned} I_\sigma&\le C\int _t^T E\Big [\sup _{t\le \eta \le s}|X_i^{t,x_i}(\eta )-{\hat{X}}_i^{t,x_i}(\eta )|^2\Big ]\hbox {d}s. \end{aligned}$$
(6.4)

Therefore,

$$\begin{aligned} E\big [\sup \nolimits _{t\le \tau \le T}|X_i^{t,x_i}(\tau )-\hat{X}_i^{t,x_i}(\tau )|^2\big ] \le C\int _t^T E\big [\sup \nolimits _{t\le \eta \le s}|X_i^{t,x_i}(\eta )-{\hat{X}}_i^{t,x_i}(\eta )|^2\big ]\hbox {d}s+C\varepsilon ^2. \end{aligned}$$

The lemma now follows from Gronwall's lemma in its integral form: if \(\varphi (\tau )\le A+C\int _t^\tau \varphi (s)\hbox {d}s\) for \(t\le \tau \le T\), then \(\varphi (\tau )\le Ae^{C(\tau -t)}\); here it is applied with \(\varphi (\tau )=E\big [\sup _{t\le \eta \le \tau }|X_i^{t,x_i}(\eta )-{\hat{X}}_i^{t,x_i}(\eta )|^2\big ]\) and \(A=C\varepsilon ^2\). \(\square \)

Lemma 2

We have

$$\begin{aligned} E\left[ \sup _{0\le s\le T} | {\hat{X}}_i(s)|^2 \right] \le C E \Big [|\hat{X}_i(0)|^2+1\Big ]. \end{aligned}$$

Proof

We write

$$\begin{aligned} {\hat{X}}_i(s)&= {\hat{X}}_i(0) + \int _0^s b(\tau , {\hat{X}}_i(\tau ), E[\hat{X}_i(\tau )], {\hat{u}}_i(\tau ) )\hbox {d}\tau \nonumber \\&\quad + \int _0^s \sigma (\tau , \hat{X}_i(\tau ), E[{\hat{X}}_i(\tau )] )\hbox {d}W_i(\tau ). \end{aligned}$$
(6.5)

Then, by the Burkholder–Davis–Gundy inequality, we have

$$\begin{aligned} E\left[ \sup _{0\le s\le T}|{\hat{X}}_i(s)|^2\right]&\le C\left( E|{\hat{X}}_i(0)|^2+ E\left[ \left( \int _0^T|b(s, {\hat{X}}_i(s), E[{\hat{X}}_i(s)], {\hat{u}}_i(s) )|\hbox {d}s\right) ^2\right] \right) \\&+\, CE \left[ \int _0^T |\sigma (s, {\hat{X}}_i(s), E[{\hat{X}}_i(s)] )|^2\hbox {d}s\right] . \end{aligned}$$

By the Lipschitz condition on \(b\) and \(\sigma \) (their derivatives with respect to \((y,z)\) being bounded), we further obtain

$$\begin{aligned} E\left[ \sup \nolimits _{0\le s\le T}|{\hat{X}}_i(s)|^2\right] \le C\bigg (E|{\hat{X}}_i(0)|^2 +1 +\int _0^T E\big [\sup \nolimits _{0\le \eta \le s}|{\hat{X}}_i(\eta )|^2\big ]\hbox {d}s\bigg ), \end{aligned}$$

which combined with Gronwall’s lemma yields the desired estimate. \(\square \)

Corollary 1

We have, for \(N\ge 2\),

$$\begin{aligned} \sup _{0\le s\le T} E[|{\hat{X}}^{(-i)}(s)- \bar{X}(s)|^2 ]\le \frac{C}{N-1}, \end{aligned}$$

where \(C\) does not depend on \(N\).

Proof

Thanks to Assumption 3, \({\hat{X}}_1, \ldots , {\hat{X}}_N\) are i.i.d. processes, and by (5.10), \(\bar{X}(s)=E[{\hat{X}}_k(s)]\) for every \(k\). Hence \({\hat{X}}^{(-i)}(s)-\bar{X}(s)\) is the average of the \(N-1\) i.i.d. centered random variables \({\hat{X}}_k(s)-E[{\hat{X}}_k(s)]\), \(k\ne i\), so that \(E[|{\hat{X}}^{(-i)}(s)-\bar{X}(s)|^2]=\mathrm{Var}({\hat{X}}_1(s))/(N-1)\). The claimed bound then follows from the second-moment estimate of Lemma 2. \(\square \)

6.2 Proof of Theorem 2

In order to estimate \( J^{i,N}(t,x_i; {\hat{u}})-J^{i,N}(t,x_i; {\hat{u}}_{-i}, u_i^\varepsilon )\), we introduce some notation. Let

$$\begin{aligned} \varDelta _h(s)&= h\left( s, {\hat{X}}^{t,x_i}_i(s), E [{\hat{X}}^{t,x_i}_i(s)] , {\hat{X}}^{(-i)}(s), {\hat{u}}_i(s)\right) \\&\quad - h\left( s, X^{t,x_i}_i(s), E[ X^{t,x_i}_i(s)] , \hat{X}^{(-i)}(s) , u_i^\varepsilon (s)\right) , \\ \varDelta _g&= g\left( {\hat{X}}^{t,x_i}_i(T), E[{\hat{X}}^{t,x_i}_i(T)], {\hat{X}}^{(-i)}(T)\right) - g\left( X^{t,x_i}_i(T), E [X^{t,x_i}_i(T)], {\hat{X}}^{(-i)}(T) \right) . \end{aligned}$$

We have

$$\begin{aligned} \varDelta _h(s)&= \left[ h\left( s,{\hat{X}}^{t,x_i}_i(s), E[\hat{X}^{t,x_i}_i(s)], \bar{X}(s),{\hat{u}}_i(s) \right) \right. \\&\quad \left. -\,h\left( s,X^{t,x_i}_i(s), E[ X^{t,x_i}_i(s)], \bar{X}(s),u_i^\varepsilon (s) \right) \right] \\&\quad +\,\Big \{\left[ h\left( s,{\hat{X}}^{t,x_i}_i(s), E[{\hat{X}}^{t,x_i}_i(s)], {\hat{X}}^{(-i)}(s), {\hat{u}}_i(s) \right) \right. \\&\quad \left. -\,h\left( s,X^{t,x_i}_i(s), E[X^{t,x_i}_i(s)], {\hat{X}}^{(-i)}(s), u_i^\varepsilon (s) \right) \right] \\&\quad -\,\left[ h\left( s,{\hat{X}}^{t,x_i}_i(s), E[{\hat{X}}^{t,x_i}_i(s)], \bar{X}(s), {\hat{u}}_i(s) \right) \right. \\&\quad \left. -\, h\left( s, X^{t,x_i}_i(s), E[X^{t,x_i}_i(s)], \bar{X}(s), u^\varepsilon _i(s) \right) \right] \Big \}\\&=: \varDelta _{h1} +\varDelta _{h2}. \end{aligned}$$

Similarly,

$$\begin{aligned} \varDelta _g&= \left[g\left({\hat{X}}^{t,x_i}_i(T), E[{\hat{X}}^{t,x_i}_i(T)], \bar{X}(T)\right)-g\left( X^{t,x_i}_i(T), E[X^{t,x_i}_i(T)], \bar{X}(T) \right) \right]\\&+\,\Big \{ \left[g\left({\hat{X}}^{t,x_i}_i(T), E[{\hat{X}}^{t,x_i}_i(T)], {\hat{X}}^{(-i)}(T)\right)-g\left( X^{t,x_i}_i(T), E [X^{t,x_i}_i(T)], \hat{X}^{(-i)}(T) \right) \right]\\&-\,\left[ g\left({\hat{X}}^{t,x_i}_i(T), E[{\hat{X}}^{t,x_i}_i(T)], \bar{X}(T) \right)-g\left( X^{t,x_i}_i(T), E [X^{t,x_i}_i(T)], \bar{X}(T)\right) \right] \Big \}\\&=: \varDelta _{g1} +\varDelta _{g2}. \end{aligned}$$

Now, noting that

$$\begin{aligned} E\left[ \int _t^T \varDelta _{h1}(s) \hbox {d}s + \varDelta _{g1}\right] = \bar{J}^{i}(t,x_i; {\hat{u}}_i)-\bar{J}^{i}(t,x_i; u_i^\varepsilon ), \end{aligned}$$
(6.6)

the cost difference satisfies

$$\begin{aligned} J ^{i,N}(t,x_i;{\hat{u}})-J^{i,N}(t,x_i;{\hat{u}}_{-i}, u_i^\varepsilon )&= \bar{J}^{i}(t,x_i; {\hat{u}}_i)-\bar{J}^{i}(t,x_i; u_i^\varepsilon )\nonumber \\&\quad +\,E \left[\int _t^T \varDelta _{h2}(s)\hbox {d}s+\varDelta _{g2}\right]. \end{aligned}$$
(6.7)

We proceed to estimate

$$\begin{aligned} E\left[\int _t^T \varDelta _{h2}(s)\hbox {d}s +\varDelta _{g2}\right]. \end{aligned}$$

Lemma 3

We have

$$\begin{aligned} \left|E\left[\int _t^T \varDelta _{h2}(s)\hbox {d}s+\varDelta _{g2}\right]\right|\le \frac{C\varepsilon }{\sqrt{N-1}}. \end{aligned}$$
(6.8)

Proof

We will only estimate \(E\left[\int _t^T \varDelta _{h2}(s)\hbox {d}s\right]\). The second term may be handled in a similar fashion. Let

$$\begin{aligned} \alpha (w){:=}h(s,{\hat{X}}^{t,x_i}_i(s), E[{\hat{X}}^{t,x_i}_i(s)], w, \hat{u}_i(s))- h( s, X^{t,x_i}_i(s), E[ X^{t,x_i}_i(s)], w, u^\varepsilon _i(s) ) \end{aligned}$$

and

$$\begin{aligned} \lambda (s){:=}{\hat{X}}^{(-i)}(s)-\bar{X}(s). \end{aligned}$$

Then we have

$$\begin{aligned} \varDelta _{h2}(s)=\alpha ({\hat{X}}^{(-i)}(s))-\alpha (\bar{X}(s))= \lambda (s) \int _0^1\alpha _w \left(\bar{X}(s)+ \theta \left[\hat{X}^{(-i)}(s)-\bar{X}(s) \right] \right)d\theta . \end{aligned}$$
(6.9)

By Assumption 2-(iii) on \(h\), calculations similar to those leading to (6.2) yield

$$\begin{aligned} |\alpha _w(w)|\le C \left[\left| X^{t,x_i}_i(s) -{\hat{X}}^{t,x_i}_i(s)\right| + E\left[ \left| X^{t,x_i}_i(s)- {\hat{X}}^{t,x_i}_i(s)\right| \right] + 1\!\!1_{[t,t+\varepsilon ]}(s)\right]. \end{aligned}$$

Therefore,

$$\begin{aligned} |\varDelta _{h2}(s)|\le C |\lambda (s)| \left[\left| X^{t,x_i}_i(s) -\hat{X}^{t,x_i}_i(s)\right| + E\left[ \left| X^{t,x_i}_i(s)- {\hat{X}}^{t,x_i}_i(s)\right| \right] +1\!\!1_{[t,t+\varepsilon ]}(s)\right]. \end{aligned}$$

Hence, by the Cauchy–Schwarz inequality, we get

$$\begin{aligned} E\int _t^T|\varDelta _{h2}(s)|\hbox {d}s&\le C\int _t^T (E[ |\lambda (s)|^2])^{1/2}\left( (E [|X^{t,x_i}_i(s) -\hat{X}^{t,x_i}_i(s)|^2])^{1/2} + 1\!\!1_{[t,t+\varepsilon ]}(s)\right) \hbox {d}s \\&\le C \left( \sup _{0\le s\le T}E |\lambda (s)|^2\right) ^{1/2} \left( \!\left( \!E\! \left[ \sup _{t\le s\le T} |X^{t,x_i}_i(s) -{\hat{X}}^{t,x_i}_i(s)|^2\right] \right) ^{1/2}+\varepsilon \!\right) \!. \end{aligned}$$

Consequently, by Lemma 1 and Corollary 1,

$$\begin{aligned} \left| E\left[ \int _t^T \varDelta _{h2}(s)\hbox {d}s\right]\right| \le \frac{C \varepsilon }{ \sqrt{N-1}}. \end{aligned}$$

\(\square \)

Finally, we combine Lemma 3 and the relation (6.7) to conclude

$$\begin{aligned} J^{i,N}(t,x_i; {\hat{u}}) - J^{i,N}(t,x_{i}; {\hat{u}}_{-i}, u_i^\varepsilon )=\bar{J}^i(t,x_i; {\hat{u}}_i ) - \bar{J}^i(t,x_{i}; u_i^\varepsilon )+O\left(\frac{\varepsilon }{\sqrt{N-1}}\right). \end{aligned}$$

Furthermore, since \({\hat{u}}\) is determined by (5.7)–(5.10), the variational inequality (5.8) yields

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{\bar{J}^i(t,x_i; {\hat{u}}_i ) -\bar{J}^i(t,x_{i}; u_i^\varepsilon ) }{\varepsilon }\le 0, \end{aligned}$$

so that

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J^{i,N}(t,x_i; {\hat{u}}) - J^{i,N}(t,x_i; {\hat{u}}_{-i}, u_i^\varepsilon ) }{\varepsilon }\le \frac{ C}{\sqrt{N-1}}. \end{aligned}$$

This completes the proof of Theorem 2. \(\square \)

7 A Mean-Field LQG Game

Consider a system of \(N\) players. The dynamics of the \(i\)th player is given by

$$\begin{aligned} \hbox {d}X_i(s)= (a X_i(s)+bu_i(s))\hbox {d}s + \sigma \hbox {d}W_i(s), \quad 1\le i\le N. \end{aligned}$$
(7.1)

Denote \(x=(x_1, \ldots , x_N)\) and \(u=(u_1,\ldots ,u_N)\). The \(i\)th player's cost functional at time \(t\) is

$$\begin{aligned} J^{i,N}(t, x_i;u)=\frac{1}{2} E \left[ \int _t^T u_i^2(s)\hbox {d}s \right]+\frac{\gamma }{2}E\left[ X^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 X^{(-i)}(T)\right] ^2, \end{aligned}$$

where \(X^{(-i)}(t)=\frac{1}{N-1}\sum _{k\ne i}^N X_k(t)\). We take \(\varGamma _1\ne 0\) and \(\varGamma _2\ne 0\). A simple interpretation of the terminal cost is that each agent wants to adjust its terminal state based on its current state and also the mean-field term \(X^{(-i)}\) at time \(T\). The cost functional is time inconsistent. Below, we will apply a consistent mean-field approximation to construct a limiting control problem.

Following the scheme in Sect. 5, we introduce \(\bar{X}_T\) as an approximation of \(X^{(-i)}(T)\). The new cost functional is

$$\begin{aligned} \bar{J}^{i}(t, x_i;u_i)= \frac{1}{2} E\left[\int _t^T u_i^2(s)\hbox {d}s\right] +\frac{\gamma }{2} E \left[ X^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 \bar{X}_T\right] ^2. \end{aligned}$$

This is a time-inconsistent control problem. The same approach as in Sect. 4.3 can be applied. The adjoint equation now reads

$$\begin{aligned} \left\{ \begin{array}{l} \hbox {d}p^{t,x_i}(s)= -a p^{t,x_i}(s) \hbox {d}s +q^{t,x_i}(s) \hbox {d}W_i(s),\\ p^{t,x_i}(T)=\gamma (\varGamma _1 x_i+\varGamma _2 \bar{X}_T-X^{t,x_i}_i(T)). \end{array} \right. \end{aligned}$$

We look for a solution of the form

$$\begin{aligned} p^{t,x_i}(s) = \beta _s (\varGamma _1x_i + \varGamma _2 \bar{X}_T)-\alpha _s X^{t,x_i}_i(s). \end{aligned}$$

The same set of ODEs as in Sect. 4.3 is obtained. The equilibrium strategy is given in feedback form by

$$\begin{aligned} {\hat{u}}_i(t)=b p^{t,x_i}(t)= -b (\alpha _t-\beta _t\varGamma _1) x_i+b \beta _t\varGamma _2 \bar{X}_T \end{aligned}$$
(7.2)

when the current state is \(x_i\). The closed-loop equilibrium dynamics of the \(i\)th player is

$$\begin{aligned} \hbox {d}{\hat{X}}_i(s) =\left[ a-b^2(\alpha _s-\beta _s\varGamma _1)\right] {\hat{X}}_i(s)\hbox {d}s +b \beta _s \varGamma _2\bar{X}_T \hbox {d}s +\sigma \hbox {d}W_i(s). \end{aligned}$$
(7.3)

Finally, we impose the consistency requirement. Assume all players have the same initial condition \(y_0\), so that \(\bar{X}_T\) can be obtained as \(E[{\hat{X}}_i(T)]\). Taking expectations in (7.3), the mean \(m(s)=E[{\hat{X}}_i(s)]\) satisfies the ODE

$$\begin{aligned} \dot{m}(s)=[a-b^2(\alpha _s-\beta _s\varGamma _1)]m(s) +b \beta _s \varGamma _2\bar{X}_T,\quad m(0)=y_0. \end{aligned}$$

Denoting by \(\varPhi \) the transition function of the homogeneous part of this linear ODE, we write its solution as

$$\begin{aligned} m(t)= \varPhi (t,0)y_0+ \int _0^t \varPhi (t,s) b \beta _s \varGamma _2\bar{X}_T \hbox {d}s. \end{aligned}$$
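Since the ODE is scalar and linear, the transition function is explicit:

$$\begin{aligned} \varPhi (t,s)=\exp \left( \int _s^t \left[ a-b^2(\alpha _\tau -\beta _\tau \varGamma _1)\right] \hbox {d}\tau \right) . \end{aligned}$$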

Now the consistency condition \(\bar{X}_T=m(T)\) becomes

$$\begin{aligned} \bar{X}_T= \varPhi (T,0)y_0+ \int _0^T \varPhi (T,s) b \beta _s \varGamma _2\bar{X}_T \hbox {d}s. \end{aligned}$$

For this approach to have a solution for any given \(y_0\), we need

$$\begin{aligned} b \varGamma _2\int _0^T \varPhi (T,s) \beta _s\hbox {d}s \ne 1. \end{aligned}$$
(7.4)

If (7.4) holds, we can solve \(\bar{X}_T\) first and next determine the strategy (7.2).
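As a purely numerical illustration, the following Python sketch evaluates the left-hand side of (7.4) and, when the condition holds, solves for \(\bar{X}_T\) and assembles the feedback (7.2). The functions standing in for \(\alpha _s,\beta _s\) are placeholders for the solutions of the Sect. 4.3 ODEs (not reproduced here), and all constants are assumed values.

```python
import numpy as np

# assumed model constants (illustrative only)
a, b, Gamma1, Gamma2, T, y0, n = 0.2, 1.0, 0.5, 0.3, 1.0, 1.0, 500
s = np.linspace(0.0, T, n + 1)
ds = T / n

alpha = 0.4 * np.exp(-(T - s))   # placeholder for alpha_s (Sect. 4.3 ODEs)
beta = 0.2 * np.exp(-(T - s))    # placeholder for beta_s  (Sect. 4.3 ODEs)

theta = a - b**2 * (alpha - beta * Gamma1)

# Phi(T, s_j) = exp(int_{s_j}^T theta(tau) dtau), rectangle rule
tail = np.zeros(n + 1)
tail[:-1] = np.cumsum(theta[:-1][::-1])[::-1] * ds
Phi_T = np.exp(tail)

# left-hand side of condition (7.4)
k = b * Gamma2 * np.sum(Phi_T[:-1] * beta[:-1]) * ds
assert abs(1.0 - k) > 1e-8, "condition (7.4) fails"

# X_bar_T solves the linear equation X_bar_T = Phi(T,0)*y0 + k*X_bar_T
X_bar_T = Phi_T[0] * y0 / (1.0 - k)

def u_hat(j, x):
    """Feedback (7.2) at grid time s[j] and current state x."""
    return -b * (alpha[j] - beta[j] * Gamma1) * x + b * beta[j] * Gamma2 * X_bar_T
```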

7.1 The Performance Difference

Suppose (7.4) holds. For the performance estimate, we consider the following set of admissible strategies

$$\begin{aligned} {\mathcal {U}}_0[0,T]{:=}\Big \{u: [0,T]\times \varOmega \longrightarrow {\mathbb {R}}; \,\, u\, \text{ is }\,\, \mathbb {F}\text{-adapted }, E[{\hbox {ess sup}}_{0\le s\le T}|u(s)|^2]<\infty \Big \}, \end{aligned}$$

which is smaller than \({\mathcal {U}}[0,T]\). The costs associated with \({\hat{u}}\) and \((u_i, {\hat{u}}_{-i})\) are, respectively, given by

$$\begin{aligned} J^{i,N}(t,x_i; {\hat{u}})&= \frac{1}{2} E\left[ \int _t^T {\hat{u}}_i^2(s)\hbox {d}s\right] +\frac{\gamma }{2} E \left[ {\hat{X}}^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 \hat{X}^{(-i)}(T)\right] ^2, \\ J^{i,N}(t,x_i; u_i, {\hat{u}}_{-i})&= \frac{1}{2} E\left[ \int _t^T u_i^2(s)\hbox {d}s\right] +\frac{\gamma }{2} E \left[ X^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 {\hat{X}}^{(-i)}(T)\right] ^2. \end{aligned}$$

The difference can be written as

$$\begin{aligned}&J^{i,N}(t,x_i; u_i, {\hat{u}}_{-i})-J^{i,N}(t,x_i; {\hat{u}})\\&\quad = \frac{1}{2} E\left[ \int _t^T u_i^2(s)\hbox {d}s\right]+\frac{\gamma }{2} E \left[ X^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 \bar{X}_T\right] ^2\\&\qquad - \frac{1}{2} E\left[ \int _t^T {\hat{u}}_i^2(s)\hbox {d}s\right] -\frac{\gamma }{2} E \left[ {\hat{X}}^{t,x_i}_i(T)-\varGamma _1 x_i-\varGamma _2 \bar{X}_T\right] ^2+d_{N}, \end{aligned}$$

where

$$\begin{aligned} d_N =\gamma \varGamma _2 E\Big [({\hat{X}}^{t,x_i}_i(T)-X^{t,x_i}_i(T))(\hat{X}^{(-i)}(T)-\bar{X}_T)\Big ]. \end{aligned}$$

For any fixed \(u_i\in {\mathcal {U}}_0[0,T]\), Lemma 1 can still be proved. Corollary 1 also holds for \({\hat{u}}_j, 1\le j\le N\). We have

$$\begin{aligned} |d_N|&\le \gamma |\varGamma _2| (E|\hat{X}^{t,x_i}_i(T)-X^{t,x_i}_i(T)|^2)^{1/2} (E|{\hat{X}}^{(-i)}(T)-\bar{X}_T|^2)^{1/2} \\&\le \frac{C\varepsilon }{\sqrt{N-1}}, \end{aligned}$$

where \(C\) may depend on \(u_i\). If \(u_i\in {\mathcal {U}}[0,T]\) were allowed, we would be unable to obtain the second inequality above, since the drift of (7.1) is unbounded in the control and the spike-variation estimate of Lemma 1 then requires the uniform-in-time square-integrability built into \({\mathcal {U}}_0[0,T]\). Finally,

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\frac{J^{i,N}(t,x_i; \hat{u})-J^{i,N}(t,x_i; u_i, {\hat{u}}_{-i})}{\varepsilon }\le \frac{C}{\sqrt{N-1}}. \end{aligned}$$

Thus, \({\hat{u}}\) is a \(\delta _N\)-sub-game perfect equilibrium point for \(N\) players, where \(\delta _N\le C/\sqrt{N-1}\). \(\square \)