1 Introduction

Mean-field game (MFG) theory is the study of games involving a large number of agents. In the non-cooperative setting, the basic model requires agents to abide by identical dynamics and to minimize an individual cost function that is likewise the same for all. As the number of agents tends to infinity, the empirical distribution of their states becomes insensitive to the strategy of any single agent, and yet it couples their individual responses. Thus, the aggregate response of the agents (the mean field) drives individual responses, while the action of any individual agent is insignificant. In the cooperative setting, by contrast, the agents seek to jointly optimize a common performance index. Either way, the desire to minimize cost, individually or collectively, shapes the empirical distribution of the agents. The purpose of this work is to study, for both non-cooperative and cooperative MFGs, the control problem of steering the collective response of the agents over a finite window of time between two specified end-point marginal distributions, by a suitable choice of cost (i.e., incentives) in the non-cooperative case and by centralized control in the cooperative case, as well as the problem of ensuring a desired stationary distribution under similar conditions. This viewpoint is influenced by optimal mass transport (OMT) theory, which deals with the flow of time-marginal densities of a collective (agents, particles, resources) and with the corresponding control and modeling problems.

The study of MFGs was introduced into the engineering literature by Huang et al. [1] and, independently, by Lasry and Lions [2]. Earlier, in the economics literature, similar models were considered by Jovanovic and Rosenthal [3]. The importance of the subject stems from the wide range of applications, which include modeling and control of multi-agent dynamical systems, stock market dynamics, crowd dynamics, power systems, and more; see [1, 2, 4, 5], and also [6,7,8,9,10,11,12] for the special case of linear dynamics and quadratic cost. The cooperative-game counterparts of MFGs have been studied in [5, 13, 14] and are usually known as the optimal control of McKean–Vlasov dynamics.

The other important component, OMT, originates in the work of Monge [15] and aims directly at relating/transporting distributions under minimum cost. Kantorovich [16] introduced linear programming and duality theory for solving OMT resource allocation problems, and, in recent years, a fast-developing phase was spurred by a wide range of applications of OMT to probability theory, economics, weather modeling, biology, and mathematical physics [17,18,19,20,21,22,23,24]. The connection between dynamic OMT [25] and stochastic control has been explored in our work, e.g., [26, 27], where the focus has been on regulating the stochastic uncertainty of diffusion processes and of stochastically driven dynamical systems by suitable control action. These stochastic control problems, in turn, relate to a classical maximum entropy problem on path space known as the Schrödinger bridge problem; see e.g., [28,29,30,31,32,33,34].

One goal of the present work is to study density-steering problems in an MFG framework or, equivalently, to explore the role of the interaction potential and of decentralized strategies when steering an initial distribution to a terminal one. In particular, we are interested in how to design an added terminal cost so as to provide incentives for the agents, under a Nash equilibrium strategy, to move collectively as specified. To this end, we establish that the map from terminal costs to terminal probability distributions is onto, under the assumptions that the underlying dynamics is linear and the running cost is quadratic. In this way, we develop an MFG-based synthesis framework for OMT-type stochastic control problems with or without stochastic excitation.

The paper evolves along the following lines. First, we discuss the motivation and problem formulation in Sect. 2. The solution is provided in Sect. 3. In Sect. 4 we study similar problems with vanishing disturbance. Section 5 is dedicated to the special case of Gaussian marginal distributions. Similar problems in the stationary setting are investigated in Sect. 6. In Sect. 7, we develop the cooperative-game counterpart of the density-steering problem. This is followed by a simple academic example in Sect. 8 and brief concluding remarks in Sect. 9.

2 Problem Formulation

We herein investigate the collective dynamical response of a group of agents (also thought of as particles, players, and so on) that interact weakly with each other. The terminology “weakly” refers to the agents being statistically indistinguishable (anonymous) and affecting each other’s response only through their empirical distribution [35]. Thus, we consider such a system of N agents with dynamics specified by

$$\begin{aligned} \text {d}x_i(t)= & {} Ax_i(t)\text {d}t+\frac{1}{N-1}\sum _{j\ne i} {\bar{A}}x_j(t)\text {d}t+Bu_i(t)\text {d}t+B\text {d}w_i(t),\nonumber \\ x_i(0)= & {} x_0^i,~~i=1,\ldots ,N. \end{aligned}$$
(1)

Here, \(x_i, u_i, w_i\) represent the state, control input, and white noise disturbance, respectively, for the ith agent, and the model parameters are the same for all. We further assume that the initial conditions \(x_0^1, x_0^2,\ldots ,x_0^N\) are independent with the same probability density \(\rho _0\). The ith agent interacts with the rest through the averaged position. The matrices \(A,{\bar{A}}\in {\mathbb R}^{n\times n}, B\in {\mathbb R}^{n\times m}\) are continuous functions of time; for notational simplicity, we often write, e.g., A instead of A(t). It is assumed that the pair \((A,B)\) is controllable in the sense that the reachability Gramian

$$\begin{aligned} M(t,s)=\int _s^t \varPhi (t,\tau ) B(\tau )B(\tau )'\varPhi (t,\tau )'\text {d}\tau \end{aligned}$$

is invertible for all \(s<t\). Here, \(\varPhi (\cdot ,\cdot )\) denotes the state transition matrix that is defined via

$$\begin{aligned} \frac{\partial \varPhi (t,s)}{\partial t}=A\varPhi (t,s), ~~\varPhi (s,s)=I. \end{aligned}$$

Clearly, when A is time-invariant, \(\varPhi (t,s)=e^{A(t-s)}\).
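As a quick numerical illustration (a sketch with an illustrative constant pair of our own choosing, not taken from the text), \(\varPhi \) and M(t, s) can be evaluated directly and the invertibility assumption checked:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative constant matrices: a double integrator driven through the
# second state.  (Our own example, not from the paper.)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

def gramian(t, s, steps=2000):
    """Reachability Gramian M(t,s) by Riemann-sum quadrature."""
    taus, h = np.linspace(s, t, steps, retstep=True)
    M = np.zeros((2, 2))
    for tau in taus:
        Phi = expm(A * (t - tau))          # state transition matrix e^{A(t-tau)}
        M += (Phi @ B) @ (Phi @ B).T * h
    return M

M10 = gramian(1.0, 0.0)
# Invertibility of M(t,s) for s < t is precisely the controllability
# assumption on (A, B); for this example M(1,0) = [[1/3, 1/2], [1/2, 1]].
assert np.linalg.matrix_rank(M10) == 2
```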

In non-cooperative MFG [1], each agent searches for an optimal control strategy to minimize its own cost

$$\begin{aligned} J_i(u_i)={\mathbb E}\left\{ \int _0^1 f(t,x_i(t),u_i(t),\mu ^N(t)) \text {d}t+g(x_i(1),\mu ^N(1))\right\} , \end{aligned}$$
(2)

where

$$\begin{aligned} \mu ^N(t) =\frac{1}{N}\sum _{i=1}^N \delta _{x_i(t)} \end{aligned}$$
(3)

is the empirical distribution of the states of the N agents at time t. Thus, this is a non-cooperative game and the cost of the ith agent is affected by the strategies of others only through the empirical distribution. An optimal control corresponds to a Nash equilibrium for the game. We follow the arguments in [5] and restrict ourselves to equilibria that correspond to symmetric Markovian control strategies (state feedback)

$$\begin{aligned} u_i(t)=\phi (t,x_i(t)), i=1,\ldots ,N. \end{aligned}$$
(4)

When N is large, the empirical distribution \(\mu ^N\) is indifferent to small perturbations of control strategy of a single agent. This points to the following approach [5] to obtain an approximate Nash equilibrium: fix a family \((\mu (t))_{0\le t\le 1}\) of probability measures and solve the standard stochastic control problem

$$\begin{aligned} \phi ^\star ={\text {argmin}}_{\phi } {\mathbb E}\left\{ \int _0^1 f(t,x(t),\phi (t,x(t)),\mu (t)) \text {d}t+g(x(1),\mu (1))\right\} \end{aligned}$$
(5)

subject to the dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}_\mu }(t)\text {d}t +B\phi (t,x(t))\text {d}t+B\text {d}w(t), ~~x(0)=x_0 \end{aligned}$$
(6)

where

$$\begin{aligned} {\bar{x}}_\mu := \langle x, \mu (t)\rangle = \int _{{\mathbb R}^n} x\, \mu (\text {d}x) \end{aligned}$$

denotes the mean of the distribution \(\mu (t)\), and \(x_0\) is a random vector with probability density \(\rho _0\).

Viewing \((\mu (t))_{0\le t\le 1}\) as a parameter, the remaining issue is to choose this distribution flow so that the actual distribution of the solution x(t) of (6), under the optimal control strategy

$$\begin{aligned} u^\star (t)=\phi ^\star (t,x(t)) \end{aligned}$$
(7)

coincides with \(\mu (t)\). The solution to the MFG problem involves establishing the existence and uniqueness of the solution to two coupled partial differential equations (PDEs) [5]. It has been shown that a Nash equilibrium point for this mean-field game exists under rather mild assumptions on the cost function [1, 2, 4,5,6,7,8,9]. That is, there exists a family \((\mu (t))_{0\le t\le 1}\) such that the distribution flow of the solution x(t) of (6) under optimal control strategy \(\phi ^\star \) coincides with this same \(\mu \). In addition, this optimal control \(\phi ^\star \) is proven to be an \(\varepsilon \)-Nash equilibrium to the N-player-game for N large [5, 36].

Departing from previous literature, this paper deals with the density-steering problem of the N-player-game system. More specifically, we are interested in introducing a suitable cost incentive so that the system is driven to a specific distribution \(\mu _1\) at time \(t=1\) under (7). In fact, it turns out that under mild conditions, a quadratic running cost in both the control and state (i.e., group linear tracking as in the work of Huang et al. [1]), can be enhanced by a suitable terminal cost g as follows

$$\begin{aligned} J_i(u_i)={\mathbb E}\left\{ \int _0^1 \left( \frac{1}{2}\Vert u_i(t)\Vert ^2 +\frac{1}{2}\Vert x_i(t)-{\bar{x}}(t)\Vert _Q^2\right) \text {d}t+g(x_i(1),\mu ^N(1))\right\} \end{aligned}$$
(8)

so as to accomplish the task of steering the initial distribution to the desired terminal one. In other words, we show that the map from choices of g to terminal distributions \(\mu _1\) is onto, under the assumption that \(\mu _0, \mu _1\) have finite second-order moments. Formally, the problem we are interested in can be stated as follows.

Problem 2.1

Given N agents governed by (1) with initial probability density \(\rho _0\), find a terminal cost g such that, in the Nash equilibrium with cost functional (8), the agents will reach a given terminal density \(\rho _1\) at time \(t=1\), in the limit as N goes to \(\infty \).

After this work was completed, it was brought to our attention by a referee that our framework coincides with the mean-field game planning problem (MFGP), which was introduced by P.-L. Lions in his lecture notes at the Collège de France and later studied in [37, 38]. However, the precise problem we study differs from those in [37, 38] in two ways. First, the diffusion term in (1) can be degenerate, in the sense that B is not required to be square and nonsingular. The presence of a degenerate diffusion violates the assumptions in [37, 38], where the proofs rely on the regularity of the Laplacian operator \(\varDelta \). Second, Eq. (1) includes a weak interaction term that is not covered in [37, 38]. Moreover, our solution is “explicit” and does not require an iterative scheme, which makes it suitable for direct implementation.

3 General Approach and Solution

For simplicity of exposition, we focus on the case where the running cost depends only on the control actuation (i.e., we take the matrix Q in (8) to be zero). The extension to general \(Q\ge 0\) is straightforward, as can be seen in the discussion at the end of this section.

We begin by considering the optimal steering problem [32, 33, 39, 40] without terminal cost, i.e., for a fixed density flow \((\mu (t))_{0\le t\le 1}\), consider the control problem to minimize

$$\begin{aligned} J(u)={\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t\right\} \end{aligned}$$

subject to the dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}_\mu }(t)\text {d}t +Bu(t)\text {d}t+B\text {d}w(t), ~~x(0)=x_0 \sim \rho _0~ \end{aligned}$$
(9)

and the constraint that x(1) has probability density \(\rho _1\). This problem can be (formally) posed as

$$\begin{aligned}&\inf _{\rho , u}\quad \int _0^1 \int _{{\mathbb R}^n} \frac{1}{2} \rho (t,x) \Vert u(t,x)\Vert ^2 \text {d}x\text {d}t, \end{aligned}$$
(10a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t}+\nabla \cdot ((Ax+{\bar{A}}{\bar{x}_\mu }+Bu)\rho )-\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \rho )=0, \end{aligned}$$
(10b)
$$\begin{aligned}&\rho (0,\cdot )=\rho _0, \quad \rho (1,\cdot )=\rho _1. \end{aligned}$$
(10c)

Following a similar argument as in [27], we establish the following sufficient condition for optimality.

Proposition 3.1

If there exists a function \(\lambda \) such that \(\rho ^\star , \lambda \) satisfy

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{\bar{x}_\mu }+\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \lambda )+\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda =0, \end{aligned}$$
(11a)
$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot ((Ax+{\bar{A}}{\bar{x}_\mu }+BB'\nabla \lambda )\rho ^\star )-\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \rho ^\star )=0, \end{aligned}$$
(11b)

and boundary conditions

$$\begin{aligned} \rho ^\star (0,\cdot )=\rho _0, \quad \rho ^\star (1,\cdot )=\rho _1, \end{aligned}$$
(11c)

then \((\rho ^\star , u^\star =B'\nabla \lambda )\) is a solution to (10).

Replacing \(\mu \) in (11) by \(\rho ^\star \) we obtain the system of (nonlinear) PDE’s

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{\bar{x}}_{\rho ^\star }+\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \lambda )+\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda =0, \end{aligned}$$
(12a)
$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot ((Ax+{\bar{A}}{\bar{x}}_{\rho ^\star }+BB'\nabla \lambda )\rho ^\star )-\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \rho ^\star )=0, \end{aligned}$$
(12b)
$$\begin{aligned}&\rho ^\star (0,\cdot )=\rho _0, \quad \rho ^\star (1,\cdot )=\rho _1. \end{aligned}$$
(12c)

We will show below that, under mild assumptions, (12) has a solution. In fact, we will construct such a solution using standard Schrödinger bridge theory [26, 27].

Note that the coupled PDEs (12a), (12b) are the same as the PDEs that arise in classic MFG problems corresponding to (1) and (8). However, the usual boundary conditions

$$\begin{aligned} \rho ^\star (0,\cdot )=\rho _0, \quad \lambda (1,x)=-\,g(x,\rho ^\star (1,\cdot )), \end{aligned}$$

are now different and given by (12c). Evidently, the Lagrange multiplier \(-\lambda \) is the value (cost-to-go) function of the associated optimal control problem.

To solve (12), we first consider the Schrödinger bridge problem with prior dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+B\text {d}w(t). \end{aligned}$$
(13)

We remark that the Schrödinger bridge can be viewed as a regularized OMT problem [27, 29, 30]. Let

$$\begin{aligned} {\hat{\rho }}_0(x)=\rho _0(x+{\bar{x}}_{\rho _0}) \end{aligned}$$

and

$$\begin{aligned} {\hat{\rho }}_1(x)=\rho _1(x+{\bar{x}}_{\rho _1}), \end{aligned}$$

so that

$$\begin{aligned} \langle x, {\hat{\rho }}_0 \rangle =0, ~~\langle x, {\hat{\rho }}_1 \rangle =0. \end{aligned}$$

The Schrödinger bridge with prior dynamics (13) and marginal distributions \({\hat{\rho }}_0\) and \({\hat{\rho }}_1\) is [26, 27]

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+BB'\nabla {\hat{\lambda }}(t,x(t))\text {d}t+B\text {d}w(t), \end{aligned}$$
(14)

where \({\hat{\lambda }}\) satisfies

$$\begin{aligned} \frac{\partial {\hat{\lambda }}}{\partial t}+\nabla {\hat{\lambda }}\cdot Ax+\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2{\hat{\lambda }}\right) +\frac{1}{2}\nabla {\hat{\lambda }}\cdot BB'\nabla {\hat{\lambda }}=0. \end{aligned}$$
(15)

The boundary condition \({\hat{\lambda }}(1,\cdot )\) is chosen so that the resulting density flow \({\hat{\rho }}(t,x)\) of (14), which satisfies

$$\begin{aligned} \frac{\partial {\hat{\rho }}}{\partial t}+\nabla \cdot \left( \left( Ax+BB'\nabla {\hat{\lambda }}\right) {\hat{\rho }}\right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 {\hat{\rho }}\right) =0, \end{aligned}$$

matches the marginal distributions \({\hat{\rho }}_0\) and \({\hat{\rho }}_1\).

The Schrödinger bridge corresponds to an optimal control problem [26, 27] that minimizes

$$\begin{aligned} {\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)\Vert ^2\text {d}t\right\} \end{aligned}$$

subject to

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+Bu(t)\text {d}t+B\text {d}w \end{aligned}$$

and the constraints on the marginal distributions \(x(0)\sim {\hat{\rho }}_0\) and \(x(1)\sim {\hat{\rho }}_1\). The optimal control is given by \(u(t,x) =B'\nabla {\hat{\lambda }}(t,x)\). The pair \(({\hat{\lambda }}, {\hat{\rho }})\) satisfies

$$\begin{aligned} \langle \nabla {\hat{\lambda }}(t,\cdot ), {\hat{\rho }}(t,\cdot )\rangle =0 \end{aligned}$$
(16)

and therefore

$$\begin{aligned} \langle x, {\hat{\rho }}(t,\cdot )\rangle =0 \end{aligned}$$
(17)

for all \(0\le t\le 1\). The intuition for (16) is that if the mean of the control, i.e., \(\langle \nabla {\hat{\lambda }}(t,\cdot ), {\hat{\rho }}(t,\cdot )\rangle \), were not identically zero, then one could shift the control by its mean and achieve a smaller cost.
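This intuition can be made precise by separating out the mean. Writing \({\bar{u}}(t)\) for the mean of an admissible control, the cost splits as

$$\begin{aligned} {\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t\right\} ={\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)-{\bar{u}}(t)\Vert ^2 \text {d}t\right\} +\int _0^1 \frac{1}{2}\Vert {\bar{u}}(t)\Vert ^2 \text {d}t, \end{aligned}$$

while the state mean obeys \(\dot{{\bar{x}}}=A{\bar{x}}+B{\bar{u}}\) with \({\bar{x}}(0)={\bar{x}}(1)=0\) for the centered marginals \({\hat{\rho }}_0, {\hat{\rho }}_1\). Since \({\bar{u}}\equiv 0\) meets these boundary conditions at zero cost, any optimal control must have zero mean, which is (16).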

Now let

$$\begin{aligned} m(t)=\varPhi (1,t)'{\bar{M}}_{10}^{-1}({\bar{x}}_{\rho _1}-{\bar{\varPhi }}_{10}{\bar{x}}_{\rho _0}), \end{aligned}$$
(18)

y(t) the solution to

$$\begin{aligned} \dot{y}(t)=(A+{\bar{A}})y(t)+BB'm(t), ~~y(0)={\bar{x}}_{\rho _0}, \end{aligned}$$

and

$$\begin{aligned} \gamma (t)=-\,\int _0^t ({\bar{A}}y(s)\cdot m(s)+\frac{1}{2}m(s)\cdot BB'm(s))\text {d}s. \end{aligned}$$

Here \({\bar{\varPhi }}_{10}:={\bar{\varPhi }}(1,0)\), with \({\bar{\varPhi }}\) the state transition matrix of \(A+{\bar{A}}\), and the “coupled” Gramian

$$\begin{aligned} {\bar{M}}_{10}=\int _0^1 {\bar{\varPhi }}(1,\tau )BB'\varPhi (1,\tau )'\text {d}\tau \end{aligned}$$

is assumed to be invertible. Note that \(y(1)={\bar{x}}_{\rho _1}\).
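Indeed, by the variation-of-constants formula, (18), and the definition of \({\bar{M}}_{10}\),

$$\begin{aligned} y(1)&={\bar{\varPhi }}_{10}{\bar{x}}_{\rho _0}+\int _0^1 {\bar{\varPhi }}(1,s)BB'm(s)\text {d}s\\&={\bar{\varPhi }}_{10}{\bar{x}}_{\rho _0}+\left( \int _0^1 {\bar{\varPhi }}(1,s)BB'\varPhi (1,s)'\text {d}s\right) {\bar{M}}_{10}^{-1}({\bar{x}}_{\rho _1}-{\bar{\varPhi }}_{10}{\bar{x}}_{\rho _0})={\bar{x}}_{\rho _1}. \end{aligned}$$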

With these ingredients, we construct a solution to (12) as follows. Define \((\lambda , \rho ^\star )\) by

$$\begin{aligned} \lambda (t,x)={\hat{\lambda }}(t,x-y(t))+m(t)\cdot x+\gamma (t), \end{aligned}$$
(19a)

and

$$\begin{aligned} \rho ^\star (t,x)={\hat{\rho }}(t,x-y(t)). \end{aligned}$$
(19b)

Then \((\lambda , \rho ^\star )\) is a solution of (12). To see this, on the one hand, substituting (19) into (12b) and using (17), we obtain

$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{\bar{x}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) \\&\quad = \frac{\partial {\hat{\rho }}}{\partial t}-\nabla {\hat{\rho }}\cdot \left( \left( A+{\bar{A}}\right) y+BB'm\right) \\&\qquad +\,\nabla \cdot \left( \left( Ax+BB'\nabla \lambda \right) {\hat{\rho }}\right) +{\bar{A}}\langle \xi , {\hat{\rho }}\left( t,\xi -y\right) \rangle \cdot \nabla {\hat{\rho }}-\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 {\hat{\rho }}\right) \\&\quad =-\,\nabla {\hat{\rho }}\cdot \left( {\bar{A}}y+BB'm\right) +\nabla \cdot \left( BB'm{\hat{\rho }}\right) +{\bar{A}}\langle \xi , {\hat{\rho }}\left( t,\xi -y\right) \rangle \cdot \nabla {\hat{\rho }}=0, \end{aligned}$$

where we referred to (17) in the last step. The fact that \(\rho ^\star \) matches the boundary conditions (12c) follows directly from the definition (19b). On the other hand, plugging (19) into (12a) yields

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{\bar{x}}_{\rho ^\star }+\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) +\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda \\&\quad = \frac{\partial {\hat{\lambda }}}{\partial t}-\nabla {\hat{\lambda }}\cdot \left( \left( A+{\bar{A}}\right) y+BB'm\right) +\dot{m}\cdot x+\dot{\gamma }\\&\qquad +\,\left( \nabla {\hat{\lambda }}+m\right) \cdot \left( Ax+{\bar{A}}{\bar{x}}_{\rho ^\star }\right) +\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 {\hat{\lambda }}\right) \\&\qquad +\,\frac{1}{2}\left( \nabla {\hat{\lambda }}+m\right) \cdot BB'\left( \nabla {\hat{\lambda }}+m\right) \\&\quad = -\,\nabla {\hat{\lambda }}\cdot {\bar{A}}y+\dot{m}\cdot x+\dot{\gamma }+\left( \nabla {\hat{\lambda }}+m\right) \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }+m\cdot Ax+\frac{1}{2}m\cdot BB'm=0. \end{aligned}$$

Therefore \((\lambda , \rho ^\star )\) in (19) is indeed a solution to (12).

The above construction of a solution to (12) is easy to implement numerically. The only nontrivial step is to compute a Schrödinger bridge, which can be accomplished using an efficient algorithm [41]. The rest follows explicitly from (19). This construction is very different from that in [37, 38], which relies on an iterative scheme between \(\rho \) and \(\lambda \). The key factor leading to this simplification is that the mean-field effect can be decoupled when the dynamics is linear and the running cost quadratic.
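For intuition, here is a discrete-state sketch of that computation (the grid, prior kernel, and marginals are illustrative choices of ours, not from the text): the endpoint coupling of the bridge is found by iteratively rescaling the prior transition kernel until both marginals match, in the spirit of the iterative-scaling scheme cited as [41].

```python
import numpy as np

# Discretize the state space and the prior transition kernel of
# dx = B dw over [0, 1] (A = 0, B = 1 for simplicity): a Gaussian
# kernel with unit variance on a uniform grid.
n = 200
x = np.linspace(-4.0, 4.0, n)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
K /= K.sum(axis=1, keepdims=True)

def gaussian(mean, var):
    p = np.exp(-0.5 * (x - mean) ** 2 / var)
    return p / p.sum()

rho0 = gaussian(-1.0, 0.25)   # initial marginal (illustrative)
rho1 = gaussian(1.5, 0.50)    # target terminal marginal (illustrative)

# Iterative proportional fitting (Fortet/Sinkhorn iteration): find
# scalings a, b so that diag(a) K diag(b) has the prescribed marginals.
a = np.ones(n)
for _ in range(1000):
    b = rho1 / (K.T @ a)
    a = rho0 / (K @ b)

P = a[:, None] * K * b[None, :]   # endpoint coupling of the bridge
```

The scalings a and b play the role of the boundary factors in the classical Schrödinger system.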

Finally, back to Problem 2.1, we assert that with terminal cost

$$\begin{aligned} g(x,\mu )=-\,{\hat{\lambda }}(1,x-{\bar{x}_\mu })-m(1)\cdot x-\gamma (1), \end{aligned}$$
(20)

we can lead the agents to the terminal distribution \(\rho _1\). To this end, we follow the strategy in [5] mentioned earlier in Sect. 2. First fix \(\mu =\rho ^\star \) with \(\rho ^\star \) as in (19b), and then solve the optimal control problem (5). Since \(g(x,\rho ^\star (1,\cdot ))=g(x,\rho _1)=-\,\lambda (1,x)\), we have

$$\begin{aligned}&{\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t+g(x(1),\rho ^\star (1,\cdot ))\right\} \\&\quad ={\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t-\lambda (1,x(1))\right\} \\&\quad ={\mathbb E}\left\{ \int _0^1 [\frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t-\text {d}\lambda (t,x(t))]-\lambda (0,x(0))\right\} \\&\quad ={\mathbb E}\left\{ \int _0^1 [\frac{1}{2}\Vert u(t)\Vert ^2 \text {d}t-\frac{\partial \lambda }{\partial t}\text {d}t-\nabla \lambda \cdot \text {d}x(t)-\frac{1}{2}{\text {tr}}(BB'\nabla ^2 \lambda )\text {d}t]-\lambda (0,x(0))\right\} \\&\quad ={\mathbb E}\left\{ \int _0^1 \frac{1}{2}\Vert u(t)-B'\nabla \lambda (t,x(t))\Vert ^2 \text {d}t\right\} -{\mathbb E}\{\lambda (0,x(0))\}. \end{aligned}$$

Hence, the unique optimal control strategy is \(u^\star (t)=B'\nabla \lambda (t,x(t))\). It follows from (12) that the probability distribution of the controlled state x(t) is \(\rho ^\star \). Therefore, with terminal cost g as in (20), we are able to steer the system to the terminal distribution \(\rho _1\), provided that the agents follow the equilibrium strategy \(u=B'\nabla \lambda \). Thus, we have established the following result.

Theorem 3.1

Consider N agents governed by (1) with initial density \(\rho _0\). Suppose the terminal cost in (8) is as in (20). Then, in the Nash equilibrium, the agents will reach density \(\rho _1\) at time \(t=1\), in the mean-field limit.

Even though Theorem 3.1 is a result in the mean-field limit, it is straightforward to extend it to the N-agent system to establish \(\varepsilon \)-optimality. After all, once the terminal cost g is fixed, the problem becomes a standard MFG problem, and the rest follows directly from standard MFG theory [5, 36].

In fact, the dependence of g on \(\mu \) is not necessary. One can simply take \(g(x,\mu )=g(x)=-\,\lambda (1,x)\). With this terminal cost, we can still conclude that \((\lambda , \rho ^\star )\) corresponds to a Nash equilibrium, since the density flow is fixed first when the equilibrium is derived. The dependence of g on \(\mu \) might be needed to conclude uniqueness of the equilibrium; whether this is the case is unclear to us.

For general \(Q \ge 0\) in (8), the solution is almost the same. The only difference is that \({\hat{\lambda }}, {\hat{\rho }}\) now solve the equations

$$\begin{aligned}&\frac{\partial {\hat{\lambda }}}{\partial t}+\nabla {\hat{\lambda }}\cdot Ax+\frac{1}{2}{\text {tr}}(BB'\nabla ^2{\hat{\lambda }})+\frac{1}{2}\nabla {\hat{\lambda }}\cdot BB'\nabla {\hat{\lambda }}-\frac{1}{2} x'Qx=0,\\&\frac{\partial {\hat{\rho }}}{\partial t}+\nabla \cdot ((Ax+BB'\nabla {\hat{\lambda }}){\hat{\rho }}) -\frac{1}{2}{\text {tr}}(BB'\nabla ^2 {\hat{\rho }})=0,\\&{\hat{\rho }}(0,\cdot ) = {\hat{\rho }}_0,~~~~{\hat{\rho }}(1,\cdot ) = {\hat{\rho }}_1. \end{aligned}$$

This corresponds to a Schrödinger bridge for diffusion processes with killing [40]. Again, the same efficient algorithm is applicable [41].

4 Zero-Noise Limit

In this section, we study the same problem (Problem 2.1), with however vanishing disturbance. More specifically, we consider a system of N agents with dynamics

$$\begin{aligned} \text {d}x_i(t)= & {} Ax_i(t)\text {d}t+\frac{1}{N-1}\sum _{j\ne i} {\bar{A}}x_j(t)\text {d}t+Bu_i(t)\text {d}t+\sqrt{\epsilon }B\text {d}w_i(t),\nonumber \\ x_i(0)= & {} x_0^i,~~i=1,\ldots ,N, \end{aligned}$$
(21)

where \(\epsilon >0\) scales the intensity (variance) of the noise. We are especially interested in the limiting behavior of the solution to Problem 2.1 with dynamics (21) as \(\epsilon \) goes to 0.

Following the same arguments as in Sect. 3, we arrive at the coupled PDEs

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }+\frac{\epsilon }{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) +\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda =0, \qquad \end{aligned}$$
(22a)
$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) -\frac{\epsilon }{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) =0, \end{aligned}$$
(22b)
$$\begin{aligned}&\rho ^\star \left( 0,\cdot \right) =\rho _0, \quad \rho ^\star \left( 1,\cdot \right) =\rho _1. \end{aligned}$$
(22c)

The optimal control strategy is given by \(u(t)=B'\nabla \lambda (t,x(t))\), and the terminal cost g is as in (20) with the diffusivity adjusted accordingly.

Taking the limit of (22) as \(\epsilon \rightarrow 0\) gives

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }+\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda =0, \end{aligned}$$
(23a)
$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) =0, \end{aligned}$$
(23b)
$$\begin{aligned}&\rho ^\star \left( 0,\cdot \right) =\rho _0, \quad \rho ^\star \left( 1,\cdot \right) =\rho _1. \end{aligned}$$
(23c)

By an analysis similar to that in Sect. 3, we conclude that the above PDE system has a (viscosity) solution [42]. However, in this case, we need to resort to OMT instead of the Schrödinger bridge; indeed, the former is the zero-noise limit of the latter [27, 29, 30]. The solution \((\lambda ,\rho ^\star )\) to (23) has the form (19) with \({\hat{\lambda }}\) being

$$\begin{aligned} {\hat{\lambda }}\left( t,x\right) =\inf _y \left\{ {\hat{\lambda }}\left( 0,y\right) + \frac{1}{2}\left( x-\varPhi \left( t,0\right) y\right) 'M\left( t,0\right) ^{-1}\left( x-\varPhi \left( t,0\right) y\right) \right\} , \end{aligned}$$

where

$$\begin{aligned} {\hat{\lambda }}\left( 0,x\right) =\psi \left( M_{10}^{-1/2}\varPhi _{10}x\right) -\frac{1}{2}x'\varPhi _{10}'M_{10}^{-1} \varPhi _{10}x. \end{aligned}$$
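To interpret the infimum above, note that the quadratic form involved is the minimum-energy transition cost of the prior dynamics:

$$\begin{aligned} \frac{1}{2}\left( x-\varPhi \left( t,0\right) y\right) 'M\left( t,0\right) ^{-1}\left( x-\varPhi \left( t,0\right) y\right) =\min \left\{ \int _0^t \frac{1}{2}\Vert u(s)\Vert ^2\text {d}s\right\} , \end{aligned}$$

where the minimum is over controls steering \(\dot{x}=Ax+Bu\) from \(x(0)=y\) to \(x(t)=x\). Thus \({\hat{\lambda }}(t,\cdot )\) propagates \({\hat{\lambda }}(0,\cdot )\) by the Hopf–Lax principle associated with the first-order Hamilton–Jacobi equation obtained from (15) by dropping the second-order term.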

The function \(\psi \) corresponds to the optimal transport map with prior dynamics \(\dot{x}=Ax+Bu\) and marginal distributions \({\hat{\rho }}_0\) and \({\hat{\rho }}_1\), after a coordinate transformation; see [27]. It can be computed efficiently. It turns out that the solution to (23) yields the solution of the following problem.

Problem 4.1

Given N agents governed by (21) with \(\epsilon =0\) and initial probability density \(\rho _0\), find a function g such that, in the Nash equilibrium with cost functional (8), the agents will reach a specified density \(\rho _1\) at time \(t=1\), in the limit as N goes to \(\infty \).

With the solution to (23), we can choose a terminal cost as in (20). The corresponding equilibrium control strategy is \(u(t,x)=B'\nabla \lambda (t,x)\). The argument is the same as that in Sect. 3 and is therefore omitted.

Theorem 4.1

Consider N agents governed by (21) with \(\epsilon =0\) and initial density \(\rho _0\). Suppose the terminal cost g in (8) is as in (20). Then, in the Nash equilibrium, the agents will reach density \(\rho _1\) at time \(t=1\), in the mean-field limit.

5 Gaussian Case

In the special case when \(\rho _0\) and \(\rho _1\) are normal (Gaussian) distributions, the solutions have a nice linear structure. Let the two marginal distributions be

$$\begin{aligned} \rho _0 \sim {\mathcal N}[m_0, \varSigma _0],~~~ \rho _1 \sim {\mathcal N}[m_1,\varSigma _1], \end{aligned}$$

i.e., Gaussian distributions with, respectively, means \(m_0, m_1\) and covariances \(\varSigma _0, \varSigma _1\).

When \(\epsilon =1\), \({\hat{\lambda }}\) in (15) equals

$$\begin{aligned} {\hat{\lambda }}(t,x)=-\,\frac{1}{2}x'\varPi (t)x+\frac{1}{2}\int _0^t {\text {tr}}(BB'\varPi (s))\text {d}s, \end{aligned}$$

where \(\varPi (t)\) is the solution to the Riccati equation

$$\begin{aligned} \dot{\varPi }(t)=-\,A'\varPi (t)-\varPi (t)A+\varPi (t)BB'\varPi (t) \end{aligned}$$
(24)

with boundary condition

$$\begin{aligned} \varPi \left( 0\right)&=\varSigma _0^{-1/2}\left[ \frac{I}{2}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varPhi _{10} \varSigma _0^{1/2}\right. \\&\quad \left. -\left( \frac{I}{4}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varSigma _1M_{10}^{-1} \varPhi _{10}\varSigma _0^{1/2}\right) ^{1/2}\right] \varSigma _0^{-1/2}. \end{aligned}$$

where \(\varPhi _{10}=\varPhi (1,0)\) and \(M_{10}=M(1,0)\). In view of (20), one choice of terminal cost is

$$\begin{aligned} g(x,\mu )=\frac{1}{2}(x-{\bar{x}_\mu })'\varPi (1)(x-{\bar{x}_\mu })-m(1)\cdot x, \end{aligned}$$
(25)

where m(t) is as in (18). In the above we have discarded some constant terms, as they do not affect the final result. The resulting optimal strategy for each agent is \(u(t,x) = B'\nabla \lambda (t,x)\), which is a linear state feedback.
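As a sanity check (a scalar sketch with illustrative numbers of our own choosing, not from the text), take \(A=0\), \(B=1\), \({\bar{A}}=0\). Then (24) reduces to \(\dot{\varPi }=\varPi ^2\), the centered closed-loop state obeys \(\text {d}x=-\varPi x\,\text {d}t+\text {d}w\), and its variance \(\varSigma \) obeys \(\dot{\varSigma }=-2\varPi \varSigma +1\). Integrating from \(\varPi (0)\) given by the boundary formula, the terminal variance should come out to \(\varSigma _1\):

```python
import numpy as np

Sigma0, Sigma1 = 1.0, 0.5      # endpoint variances (illustrative)
Phi10, M10 = 1.0, 1.0          # Phi(1,0) and M(1,0) for A = 0, B = 1

# Boundary condition Pi(0), specializing the formula above to scalars.
Pi = (0.5 + Sigma0 * Phi10**2 / M10
      - np.sqrt(0.25 + Sigma0 * Sigma1 * Phi10**2 / M10**2)) / Sigma0

# Forward Euler on the coupled Riccati / variance equations over [0, 1].
n = 50000
h = 1.0 / n
Sigma = Sigma0
for _ in range(n):
    Pi, Sigma = Pi + h * Pi**2, Sigma + h * (1.0 - 2.0 * Pi * Sigma)

print(Sigma)   # ~ 0.5 = Sigma1, confirming the steering to the target variance
```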

Theorem 5.1

Consider N agents governed by (1) with initial distribution \(\rho _0\sim {\mathcal N}[m_0, \varSigma _0]\). Suppose the terminal cost in (8) is (25). Then, in the Nash equilibrium, the agents will reach density \(\rho _1\sim {\mathcal N}[m_1,\varSigma _1]\) at time \(t=1\), in the mean-field limit.

Following the discussion in Sect. 4, the solution to the problem with noise intensity \(\epsilon \) is almost identical to the above, except that the initial condition of the Riccati equation (24) becomes

$$\begin{aligned} \varPi _\epsilon \left( 0\right)&=\varSigma _0^{-1/2}\left[ \frac{\epsilon I}{2}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varPhi _{10}\varSigma _0^{1/2}\right. \\&\quad \left. -\left( \frac{\epsilon ^2 I}{4}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varSigma _1M_{10}^{-1}\varPhi _{10}\varSigma _0^{1/2}\right) ^{1/2}\right] \varSigma _0^{-1/2}. \end{aligned}$$

Taking the limit as \(\epsilon \rightarrow 0\) we obtain the solution to the deterministic problem, which corresponds to the initial condition

$$\begin{aligned} \varPi _0\left( 0\right)&=\varSigma _0^{-1/2}\left[ \varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varPhi _{10} \varSigma _0^{1/2}\right. \\&\quad \left. -\left( \varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varSigma _1M_{10}^{-1}\varPhi _{10} \varSigma _0^{1/2}\right) ^{1/2}\right] \varSigma _0^{-1/2}. \end{aligned}$$

Since the terminal cost g in (25) is quadratic, once g is fixed, the resulting problem falls within the class of linear-quadratic mean-field games [6,7,8,9,10,11]. The optimal policy, which is a linear state feedback, now follows from standard theory. Our contribution is to obtain the appropriate value for the terminal cost g in (25).

6 Stationary Case and Invariant Measure

We now turn to the stationary counterpart of Problem 2.1. We would like to design a cost function that leads the agents to a given invariant measure \(\rho \) when they follow the equilibrium strategy. In particular, given N agents with identical dynamics (1) that attempt to minimize their control effort

$$\begin{aligned} J_i (u_i)=\limsup _{T\rightarrow \infty } \frac{1}{T}{\mathbb E}\left\{ \int _0^T \frac{1}{2}\Vert u_i(t)\Vert ^2 \text {d}t\right\} , \end{aligned}$$

we look for an extra cost term \(g(x,\mu )\) added to the above such that, in the equilibrium state, the agents have some specified distribution. The new cost function is

$$\begin{aligned} J_i (u_i)=\limsup _{T\rightarrow \infty } \frac{1}{T}{\mathbb E}\left\{ \int _0^T \left[ \frac{1}{2}\Vert u_i(t)\Vert ^2 +g(x_i(t),\mu ^N(t))\right] \text {d}t\right\} \end{aligned}$$
(26)

where \(\mu ^N\) is the empirical distribution (3). Again we are interested in the mean-field limit of the problem, that is, the case when N goes to \(\infty \).

In this stationary problem an extra running cost is considered; this differs from the finite-horizon case discussed earlier, where a terminal cost is needed instead. Adding a running cost to dictate the stationary distribution, in an optimal control setting, has been studied in [33].

Let us first recall some relevant results on stationary mean-field game problems. Suppose the N agents with dynamics (1) attempt to minimize the cost function

$$\begin{aligned} J_i (u_i)=\limsup _{T\rightarrow \infty } \frac{1}{T}{\mathbb E}\left\{ \int _0^T f(x_i(t),u_i(t),\mu ^N(t)) \text {d}t\right\} . \end{aligned}$$

We restrict ourselves to equilibria with symmetric stationary Markovian strategies

$$\begin{aligned} u_i(t)=\phi (x_i(t)). \end{aligned}$$

In the mean-field limit, one can adopt the following steps [5]. First, fix a probability measure \(\mu \) and solve the standard stochastic control problem (parametrized by \(\mu \))

$$\begin{aligned} \phi ^\star ={\text {argmin}}_{\phi } \limsup _{T\rightarrow \infty } \frac{1}{T}{\mathbb E}\left\{ \int _0^T f(x(t),\phi (x(t)),\mu (t)) \text {d}t\right\} \end{aligned}$$
(27)

subject to the dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}_\mu }\text {d}t +B\phi (x(t))\text {d}t+B\text {d}w(t). \end{aligned}$$
(28)

Once this standard optimal control problem is solved, the remaining issue is to find the correct distribution \(\mu \) such that the stationary distribution of (28) under the optimal control strategy

$$\begin{aligned} u^\star (t)=\phi ^\star (x(t)) \end{aligned}$$

coincides with \(\mu \).

The solution to this MFG problem involves the coupled PDEs [2, 5]

$$\begin{aligned}&\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) +\eta -H^{\rho ^\star }\left( x,-\nabla \lambda \right) =0, \end{aligned}$$
(29a)
$$\begin{aligned}&\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+B\phi ^\star \right) \rho ^\star \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) =0, \end{aligned}$$
(29b)
$$\begin{aligned}&\rho ^\star \ge 0, ~~~\int \rho ^\star =1, \end{aligned}$$
(29c)

where \(\eta \) is an unknown constant and \(u^\star =\phi ^\star (x)\) is the minimizer of

$$\begin{aligned} H^\rho (x,p)=\min _{u\in {\mathbb R}^m}\left\{ p^\prime (Ax+{\bar{A}}{{\bar{x}}}_{\rho }+Bu)+f(x,u,\rho )\right\} . \end{aligned}$$

When the cost function is of the form (26), \(\phi ^\star =B'\nabla \lambda \) and the PDEs reduce to

$$\begin{aligned}&\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) +\eta +\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star } +\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda -g\left( x,\rho ^\star \right) =0, \end{aligned}$$
(30a)
$$\begin{aligned}&\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) =0, \end{aligned}$$
(30b)
$$\begin{aligned}&\rho ^\star \ge 0, ~~~\int \rho ^\star =1. \end{aligned}$$
(30c)

The existence of a solution \((\rho ^\star , \lambda )\) can be established under suitable assumptions on g; see [2, 5].

Returning to our problem, the cost function g in (26) becomes a design parameter, in contrast to the classical MFG setting. Our goal is to choose a function g such that the corresponding stationary distribution in Nash equilibrium is \(\rho ^\star \). The solution relies on the same PDEs (30), but with a different set of unknowns: given a distribution \(\rho ^\star \), we need to find \(\lambda \) and a suitable cost g that solve (30). It turns out that (30) has a solution only for a restricted class of distributions \(\rho ^\star \), which we call the feasible distributions. We next focus on Gaussian distributions, for which the feasible distributions can be characterized by algebraic equations. The case of general distributions will be investigated in future work.

Let \(\rho ^\star \) be a Gaussian distribution with mean m and covariance \(\varSigma \). Plugging the ansatz

$$\begin{aligned} \lambda (x)=-\,\frac{1}{2}x'\varPi x+n'x \end{aligned}$$

with \(\varPi =\varPi '\), into (30) yields (after discarding constant terms)

$$\begin{aligned}&-\,\frac{1}{2}{\text {tr}}\left( BB'\varPi \right) +\eta +\frac{1}{2}x'\left( -\varPi A-A'\varPi +\varPi BB'\varPi \right) x\nonumber \\&\quad +\,n'\left( A-BB'\varPi \right) x-m'{\bar{A}}'\varPi x-g\left( x,\rho ^\star \right) =0 \end{aligned}$$
(31a)
$$\begin{aligned}&\left( A-BB'\varPi \right) \varSigma +\varSigma \left( A-BB'\varPi \right) '+BB'=0 \end{aligned}$$
(31b)
$$\begin{aligned}&\left( A+{\bar{A}}-BB'\varPi \right) m+BB'n=0. \end{aligned}$$
(31c)

In order for a solution to exist, in view of (31b), it is necessary that \(\varSigma \) satisfies

$$\begin{aligned} A\varSigma +\varSigma A' \in {\text {range}}({\mathfrak f}_B), \end{aligned}$$
(32a)

where \({\mathfrak f}_B(X)=BX'+XB'\) is a map from \({\mathbb R}^{n\times m}\) to the space of symmetric matrices (see [33] for other equivalent algebraic conditions). Likewise, by (31c), the mean m has to satisfy

$$\begin{aligned} (A+{\bar{A}})m \in {\text {range}}(B). \end{aligned}$$
(32b)

Conversely, given \((m,\varSigma )\) satisfying (32) and assuming that B has full column rank, (31b) has a unique symmetric solution [33]. Therefore, the two conditions (32) are also sufficient. From (31a) it is now easy to conclude that a possible cost function is

$$\begin{aligned} g(x,\rho )=\frac{1}{2}x'Qx+n\cdot (A-BB'\varPi )x-{{\bar{x}}}_\rho \cdot {\bar{A}}'\varPi x, \end{aligned}$$
(33a)

where

$$\begin{aligned} Q=-\,\varPi A-A'\varPi +\varPi BB'\varPi , \end{aligned}$$
(33b)

with \(\varPi \) being the unique solution to (31b) and n being a solution to (31c). Therefore, we have established the following result.

Theorem 6.1

Consider N agents governed by (1) and assume that the Gaussian distribution \({\mathcal N}(m,\varSigma )\) satisfies (32). If the function g in the cost (26) is as in (33), then, in the Nash equilibrium, the agents have the stationary Gaussian distribution \({\mathcal N}(m,\varSigma )\) in the mean-field limit.
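In the scalar case the construction behind Theorem 6.1 can be verified directly: solve (31b) for \(\varPi \), (31c) for n, form Q from (33b), and check that the closed-loop process indeed has the prescribed stationary law. A minimal sketch, with assumed illustrative values (not from the paper):

```python
import numpy as np

# Scalar (n = m = 1) instance of the construction behind Theorem 6.1.
# The values below are illustrative assumptions, not from the paper.
a, abar, b = 1.0, -2.0, 1.0   # dynamics: dx = a x dt + abar*xbar dt + b u dt + b dw
mean, sigma = 0.5, 2.0        # target invariant distribution N(mean, sigma)

# (31b) scalar: 2 (a - b^2 Pi) sigma + b^2 = 0 has the unique solution
Pi = (2*a*sigma + b**2) / (2*b**2*sigma)
# (31c) scalar: (a + abar - b^2 Pi) mean + b^2 n = 0
n = -(a + abar - b**2*Pi) * mean / b**2
# (33b): quadratic coefficient of the designed running cost g
Q = -2*a*Pi + b**2 * Pi**2

# Closed loop u* = b(-Pi x + n): verify that N(mean, sigma) is stationary.
acl = a - b**2*Pi                        # closed-loop pole, equals -b^2/(2 sigma) < 0
stat_mean = -(abar*mean + b**2*n) / acl  # solves 0 = acl*xbar + abar*mean + b^2 n
stat_var = b**2 / (-2.0*acl)             # scalar Lyapunov: 2*acl*var + b^2 = 0
print(Pi, Q, n, stat_mean, stat_var)
```

The stationary mean and variance of the closed loop recover the target pair exactly, mirroring (31b) and (31c).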

7 Cooperative Game

In this section we shift to a slightly different problem. Given the same interacting agent system (1), we investigate the density-steering problem in the cooperative game setting: how should one select an optimal controller to drive the agents from a given initial distribution \(\rho _0\) to a terminal distribution \(\rho _1\)? Again, we restrict ourselves to equilibria given by symmetric Markovian strategies in closed-loop feedback form

$$\begin{aligned} u_i(t)=\phi (t,x_i(t)), i=1,\ldots ,N. \end{aligned}$$
(34)

The cost function we attempt to minimize is the average control energy

$$\begin{aligned} J(u)={\mathbb E}\left\{ \frac{1}{N}\sum _{i=1}^N\int _0^1 \frac{1}{2}\Vert u_i(t)\Vert ^2 \text {d}t\right\} . \end{aligned}$$
(35)

We are interested in the mean-field limit, namely, the asymptotic behavior of the solution as \(N\rightarrow \infty \).

Problem 7.1

Given N agents governed by (1) with initial density \(\rho _0\), find a control strategy (34) with minimum control energy (35) so that the agents will reach density \(\rho _1\) at time \(t=1\), as N goes to \(\infty \).

The above problem is closely related to the optimal control of McKean–Vlasov dynamics [5, 13, 14]. The goal in the latter is to minimize a cost function, possibly including a terminal cost, whereas in our problem the goal is to reach a specified target distribution with minimum running cost. Our problem reduces to an optimal control problem for McKean–Vlasov dynamics with a suitable terminal cost; therefore, one contribution of this section is a method to construct such a terminal cost.

The major difference between Problems 7.1 and 2.1 is that the agents are now cooperative: all agents always use the same control strategy. A small perturbation of the control affects the probability density flow, since the perturbation is applied to the controllers of all the agents; see [5, 14] for more discussion of these differences. The average control energy (35) turns out to be equivalent to the relative entropy of the controlled system with respect to the uncontrolled system [35, 43, 44]. Therefore, the above problem can also be viewed as a Schrödinger bridge problem for weakly interacting particle systems.

Problem 7.1 is formulated as an optimal control problem over the McKean–Vlasov model

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}}(t) \text {d}t +Bu(t)\text {d}t+B\text {d}w(t), ~~x(0)=x_0\sim \rho _0. \end{aligned}$$
(36)

It has the following fluid-dynamic formulation. Let \(\rho (t,\cdot )\) be the probability density of the controlled process x(t); then the optimal control problem can be stated as

$$\begin{aligned}&\inf _{\rho , u}\quad \int _0^1 \int _{{\mathbb R}^n} \frac{1}{2} \rho \left( t,x\right) \Vert u\left( t,x\right) \Vert ^2 \text {d}x\text {d}t, \end{aligned}$$
(37a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_\rho +Bu\right) \rho \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho \right) =0, \end{aligned}$$
(37b)
$$\begin{aligned}&\rho \left( 0,\cdot \right) =\rho _0, \quad \rho \left( 1,\cdot \right) =\rho _1. \end{aligned}$$
(37c)

Proposition 7.1

If there exists \((\lambda ,\rho ^\star )\) satisfying

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}\!+\nabla \lambda \cdot Ax+\!\nabla \lambda \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }\!+\!\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) \nonumber \\&\quad +\,\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda +{\bar{A}}x\cdot \langle \nabla \lambda , \rho ^\star \rangle \!=\!0, \end{aligned}$$
(38a)
$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) =0, \end{aligned}$$
(38b)

with boundary conditions

$$\begin{aligned} \rho ^\star (0,\cdot )=\rho _0, \quad \rho ^\star (1,\cdot )=\rho _1, \end{aligned}$$
(38c)

then \((\rho ^\star , u^\star =B'\nabla \lambda )\) is a solution to (37).

The optimality conditions (38) are highly coupled, and in general one may not expect a solution to exist. Interestingly, however, (38) always has a solution, as we will see below. Once again, we can construct a solution based on standard Schrödinger bridge theory.

Let \(({\hat{\rho }},{\hat{\lambda }})\) be as in (13)–(16),

$$\begin{aligned} m(t)={\bar{\varPhi }}(1,t)'\hat{M}_{10}^{-1}({{\bar{x}}}_{\rho _1}-{\bar{\varPhi }}_{10}{{\bar{x}}}_{\rho _0}) \end{aligned}$$
(39a)

and let y(t) be the solution to

$$\begin{aligned} \dot{y}(t)=(A+{\bar{A}})y(t)+BB'm(t), ~~y(0)={{\bar{x}}}_{\rho _0}, \end{aligned}$$
(39b)

where \(\hat{M}_{10}=\hat{M}(1,0)\) with

$$\begin{aligned} \hat{M}(t,s)=\int _s^t {\bar{\varPhi }}(t,\tau )BB'{\bar{\varPhi }}(t,\tau )'\text {d}\tau . \end{aligned}$$

Define

$$\begin{aligned} \gamma (t)=-\,\int _0^t ({\bar{A}}y(s)\cdot m(s)+\frac{1}{2}m(s)\cdot BB'm(s))\text {d}s, \end{aligned}$$
$$\begin{aligned} \lambda (t,x)={\hat{\lambda }}(t,x-y(t))+m(t)\cdot x+\gamma (t), \end{aligned}$$
(40a)

and

$$\begin{aligned} \rho ^\star (t,x)={\hat{\rho }}(t,x-y(t)), \end{aligned}$$
(40b)

then \((\lambda , \rho ^\star )\) solves (38).

Clearly, (40b) satisfies the boundary conditions (38c). To verify (38b), substitute (40) into (38b), which gives

$$\begin{aligned}&\frac{\partial \rho ^\star }{\partial t}+\nabla \cdot \left( \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }+BB'\nabla \lambda \right) \rho ^\star \right) -\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \rho ^\star \right) \\&\quad = \frac{\partial {\hat{\rho }}}{\partial t}-\nabla {\hat{\rho }}\cdot \left( \left( A+{\bar{A}}\right) y+BB'm\right) +\nabla \cdot \left( \left( Ax+BB'\nabla \lambda \right) {\hat{\rho }}\right) \\&\qquad +\,{\bar{A}}\langle \xi , {\hat{\rho }}\left( \xi -y\right) \rangle \cdot \nabla {\hat{\rho }}-\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 {\hat{\rho }}\right) \\&\quad =-\,\nabla {\hat{\rho }}\cdot \left( {\bar{A}}y+BB'm\right) +\nabla \cdot \left( BB'm{\hat{\rho }}\right) +{\bar{A}}\langle \xi , {\hat{\rho }}\left( \xi -y\right) \rangle \cdot \nabla {\hat{\rho }}=0. \end{aligned}$$

Similarly, combining (40) and (38a) yields

$$\begin{aligned}&\frac{\partial \lambda }{\partial t}+\nabla \lambda \cdot Ax+\nabla \lambda \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }+\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 \lambda \right) \\&\qquad +\,\frac{1}{2}\nabla \lambda \cdot BB'\nabla \lambda +{\bar{A}}x\cdot \langle \nabla \lambda , \rho ^\star \rangle \\&\quad = \frac{\partial {\hat{\lambda }}}{\partial t}-\nabla {\hat{\lambda }}\cdot \left( \left( A+{\bar{A}}\right) y+BB'm\right) \\&\qquad +\,\dot{m}\cdot x+\dot{\gamma }+\left( \nabla {\hat{\lambda }}+m\right) \cdot \left( Ax+{\bar{A}}{{\bar{x}}}_{\rho ^\star }\right) +\frac{1}{2}{\text {tr}}\left( BB'\nabla ^2 {\hat{\lambda }}\right) \\&\qquad +\,\frac{1}{2}\left( \nabla {\hat{\lambda }}+m\right) \cdot BB'\left( \nabla {\hat{\lambda }}+m\right) +{\bar{A}}x\cdot \langle \nabla {\hat{\lambda }}+m, \rho ^\star \rangle \\&\quad = -\,\nabla {\hat{\lambda }}\cdot {\bar{A}}y+\dot{m}\cdot x+\dot{\gamma }+\left( \nabla {\hat{\lambda }}+m\right) \cdot {\bar{A}}{{\bar{x}}}_{\rho ^\star }\\&\qquad +\,m\cdot Ax+\frac{1}{2}m\cdot BB'm+{\bar{A}}x\cdot m+{\bar{A}}y\cdot \langle \nabla {\hat{\lambda }}, \rho ^\star \rangle \\&\quad = {\bar{A}}y\cdot \langle \nabla {\hat{\lambda }}\left( \xi -y\right) , {\hat{\rho }}\left( \xi -y\right) \rangle =0. \end{aligned}$$

Therefore, the pair \(( \rho ^\star , u^\star =B'\nabla \lambda )\) is indeed a solution to (38).

Next we prove that the pair \((\rho ^\star , u^\star )\) provides a solution to the optimal control problem (37). Let \({\bar{u}}(t)={\mathbb E}\{u(t)\}\); then, by (36), we have

$$\begin{aligned} d{\bar{x}}(t)=(A+{\bar{A}}){\bar{x}}(t)\text {d}t+B{\bar{u}}(t)\text {d}t \end{aligned}$$
(41)

and

$$\begin{aligned} d({\tilde{x}}(t))=A{\tilde{x}}(t)\text {d}t+B{\tilde{u}}(t)\text {d}t+B\text {d}w, \end{aligned}$$
(42)

where \(\tilde{x}=x-{{\bar{x}}}\) and \({\tilde{u}}=u-{{\bar{u}}}\). The control energy can then be decomposed into two parts as

$$\begin{aligned} {\mathbb E}\left\{ \int _0^1 \frac{1}{2} \Vert u(t)\Vert ^2\text {d}t\right\} = \int _0^1 \frac{1}{2}\Vert {\bar{u}}(t)\Vert ^2 \text {d}t+{\mathbb E}\left\{ \int _0^1 \frac{1}{2} \Vert {\tilde{u}}(t)\Vert ^2\text {d}t\right\} . \end{aligned}$$
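This decomposition is the pointwise-in-time mean/fluctuation (bias-variance) splitting \({\mathbb E}\{\Vert u\Vert ^2\}=\Vert {\bar{u}}\Vert ^2+{\mathbb E}\{\Vert u-{\bar{u}}\Vert ^2\}\). A minimal numerical illustration on an arbitrary synthetic ensemble (not tied to the model):

```python
import numpy as np

# The splitting E[|u|^2]/2 = |ubar|^2/2 + E[|u - ubar|^2]/2 at a fixed time,
# illustrated on an arbitrary synthetic ensemble of control values.
rng = np.random.default_rng(0)
u = rng.normal(2.0, 3.0, size=10_000)    # controls of N agents at one instant

ubar = u.mean()
total = 0.5*np.mean(u**2)
mean_part = 0.5*ubar**2
fluct_part = 0.5*np.mean((u - ubar)**2)

print(total, mean_part + fluct_part)     # equal: the identity is exact
```

With the empirical mean in place of \({\bar{u}}\), the identity holds exactly, sample by sample.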

These two parts of control energy, corresponding to \({\bar{u}}\) and \(u-{\bar{u}}\), respectively, can be minimized independently since the dynamics (41) and (42) are decoupled. We next show that

  1. (i)

    \({\bar{u}}^\star \) minimizes

    $$\begin{aligned} \int _0^1 \frac{1}{2}\Vert {\bar{u}}(t)\Vert ^2 \text {d}t \end{aligned}$$
    (43)
  2. (ii)

    \({\tilde{u}}^\star \) minimizes

    $$\begin{aligned} {\mathbb E}\left\{ \int _0^1 \frac{1}{2} \Vert {\tilde{u}}(t)\Vert ^2\text {d}t\right\} . \end{aligned}$$
    (44)

For (i), recalling

$$\begin{aligned} u^\star (t)=B'\nabla \lambda =B'\nabla {\hat{\lambda }}(t,x(t)-y(t))+B'm(t), \end{aligned}$$

we have

$$\begin{aligned} {\bar{u}}^\star (t)&=B'{\mathbb E}\{\nabla {\hat{\lambda }}(t,x(t)-y(t))\}+B'm(t)\\&=B'\langle \nabla {\hat{\lambda }}(t,x-y(t)), \rho ^\star (t,x)\rangle +B'm(t)=B'm(t). \end{aligned}$$

By standard optimal control arguments, it is easy to see that \({\bar{u}}^\star (t)\) minimizes (43) subject to (41) and the boundary conditions

$$\begin{aligned} {\bar{x}}(0)={{\bar{x}}}_{\rho _0},~~ {\bar{x}}(1)={{\bar{x}}}_{\rho _1}. \end{aligned}$$

For (ii), we note

$$\begin{aligned} {\tilde{u}}^\star (t)=B'\nabla {\hat{\lambda }}(t,x(t)-y(t))=B'\nabla {\hat{\lambda }}(t,x(t)-{\bar{x}}(t))= B'\nabla {\hat{\lambda }}(t,\tilde{x}(t)). \end{aligned}$$

The rest is an immediate consequence of the Schrödinger bridge theory [26, 27]. Hence we have established the following result.

Theorem 7.1

The pair \((\rho ^\star , u^\star =B'\nabla \lambda )\) with \(\rho ^\star , \lambda \) as in (40) solves (37).

As discussed earlier, by adding a suitable terminal cost to (35), our problem reduces to an optimal control problem for McKean–Vlasov dynamics. One such choice of terminal cost is \(-\,\lambda (1,\cdot )\).

7.1 Gaussian Case

When both marginals \(\rho _0\) and \(\rho _1\) are normal distributions, the optimal control \(u^\star \) is a linear state-feedback control. Let the two marginal distributions be

$$\begin{aligned} \rho _0 \sim {\mathcal N}[m_0, \varSigma _0],~~\rho _1\sim {\mathcal N}[m_1, \varSigma _1]. \end{aligned}$$

By Theorem 7.1, we need only to compute \(\lambda \) as in (40a), which is

$$\begin{aligned} \lambda (t,x)={\hat{\lambda }}(t,x-y(t))+m(t)\cdot x+\gamma (t). \end{aligned}$$

The function \({\hat{\lambda }}\) corresponds to the Schrödinger bridge (14), which satisfies [32]

$$\begin{aligned} \nabla {\hat{\lambda }}(t,x)=-\,\varPi (t)x, \end{aligned}$$

where \(\varPi (t)\) is the solution to the Riccati equation

$$\begin{aligned} \dot{\varPi }(t)=-\,A'\varPi (t)-\varPi (t)A+\varPi (t)BB'\varPi (t) \end{aligned}$$

with boundary condition

$$\begin{aligned} \varPi \left( 0\right)&=\varSigma _0^{-1/2}\bigg [\frac{I}{2}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varPhi _{10} \varSigma _0^{1/2}\\&\quad -\,\left( \frac{I}{4}+\varSigma _0^{1/2}\varPhi _{10}'M_{10}^{-1}\varSigma _1M_{10}^{-1}\varPhi _{10} \varSigma _0^{1/2}\right) ^{1/2}\bigg ]\varSigma _0^{-1/2}. \end{aligned}$$
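As a numerical sanity check, in the scalar case one can integrate the Riccati equation from \(\varPi (0)\) together with the closed-loop covariance equation and confirm that the state covariance is steered from \(\varSigma _0\) to \(\varSigma _1\). A sketch with assumed illustrative values (not from the paper):

```python
import numpy as np

# Scalar sanity check of the boundary condition above: under the closed loop
# u = -b Pi(t) x, the covariance should travel from S0 at t=0 to S1 at t=1.
# The values a, b, S0, S1 are illustrative assumptions.
a, b, S0, S1 = 1.0, 1.0, 4.0, 1.0

phi = np.exp(a)                          # Phi_{10}
M = b**2 * (np.exp(2*a) - 1.0) / (2*a)   # M_{10}

t1 = 0.5 + S0 * phi**2 / M
t2 = np.sqrt(0.25 + S0 * phi**2 * S1 / M**2)
Pi = (t1 - t2) / S0                      # Pi(0), scalar form of the formula above

# Riccati:    Pidot = -2 a Pi + b^2 Pi^2        (scalar form of the equation above)
# covariance: Sdot  = 2 (a - b^2 Pi) S + b^2    (closed-loop Lyapunov equation)
S, dt = S0, 1.0e-4
for _ in range(10_000):
    Pi += dt*(-2*a*Pi + b**2 * Pi**2)
    S += dt*(2*(a - b**2*Pi)*S + b**2)

print(S)   # should be close to S1
```

A quick consistency check on the formula: if \(\varSigma _1\) equals the uncontrolled end-point covariance \(\varPhi _{10}\varSigma _0\varPhi _{10}'+M_{10}\), the square-root term collapses and \(\varPi (0)=0\), i.e., no control is needed.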

It follows that the optimal control is

$$\begin{aligned} u^\star (t)= & {} B'\nabla \lambda (t,x(t))=-\,B'\varPi (t)(x(t)-y(t))+B'm(t)\\= & {} -\,B'\varPi (t)x(t)+B' n(t), \end{aligned}$$

where

$$\begin{aligned} n(t)= & {} \varPi (t)y(t)+m(t)\\= & {} \varPi (t){\bar{\varPhi }}(t,1)\hat{M}(1,t)\hat{M}_{10}^{-1}{\bar{\varPhi }}_{10}m_0\\&+\,\varPi (t)\hat{M}(t,0){\bar{\varPhi }}(1,t)'\hat{M}_{10}^{-1}m_1 +\,{\bar{\varPhi }}(1,t)'\hat{M}_{10}^{-1}(m_1-{\bar{\varPhi }}_{10}m_0). \end{aligned}$$

7.2 Zero-Noise Limit

We study the optimal steering problem for McKean–Vlasov model (36) with reduced disturbance \(\sqrt{\epsilon } \text {d}w(t)\), namely,

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}}(t) \text {d}t +Bu(t)\text {d}t+\sqrt{\epsilon }B\text {d}w(t), ~~x(0)=x_0. \end{aligned}$$
(45)

In particular, we are interested in the zero-noise limit of this problem, that is, the optimal steering problem for the dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+{\bar{A}}{\bar{x}}(t) \text {d}t +Bu(t)\text {d}t, ~~x(0)=x_0. \end{aligned}$$
(46)

We show that the probability flow of the solution to the latter is the limit of that of the former as \(\epsilon \) goes to 0. This is achieved in a constructive manner.

Let us start with the steering problem for the dynamics

$$\begin{aligned} \text {d}x(t)=Ax(t)\text {d}t+B{\tilde{u}}(t)\text {d}t \end{aligned}$$

with marginals \({\hat{\rho }}_0\) and \({\hat{\rho }}_1\). This problem has been studied in [27]. The solution is

$$\begin{aligned} {\tilde{u}}(t,x)=B'\varPhi (1,t)'M_{10}^{-1}(T\circ T_t^{-1}(x)-\varPhi _{10}(T_t^{-1}(x))) \end{aligned}$$
(47)

where T is the generalized optimal mass transport map [27] with marginals \({\hat{\rho }}_0, {\hat{\rho }}_1\), and

$$\begin{aligned} T_t(x)=\varPhi (t,1)M(1,t)M_{10}^{-1}\varPhi _{10}x+M(t,0)\varPhi (1,t)'M_{10}^{-1}T(x). \end{aligned}$$

The corresponding distribution flow is

$$\begin{aligned} {\hat{\rho }}(t,\cdot )=(T_t)_\sharp {\hat{\rho }}_0. \end{aligned}$$

Note that \({\hat{\rho }}\) and \({\tilde{u}}\) satisfy the continuity equation

$$\begin{aligned} \frac{\partial {\hat{\rho }}}{\partial t}+\nabla \cdot ((Ax+B{\tilde{u}}){\hat{\rho }})=0. \end{aligned}$$
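To make the construction concrete, note that when \(A=0\) and \(B=I\) we have \(\varPhi \equiv I\) and \(M(t,s)=(t-s)I\), so \(T_t\) reduces to the displacement interpolation \(T_t(x)=(1-t)x+tT(x)\) of classical OMT. A scalar Gaussian sketch (the marginals below are assumed for illustration):

```python
import numpy as np

# Special case A = 0, B = I (so Phi = 1 and M(t, s) = t - s): the map T_t
# above reduces to displacement interpolation T_t(x) = (1-t) x + t T(x).
# For 1-D Gaussian marginals the Monge map T is affine; illustrative values:
m0, s0 = 1.0, 2.0       # rho_0 ~ N(m0, s0^2)
m1, s1 = -4.0, 1.0      # rho_1 ~ N(m1, s1^2)

def T(x):               # monotone (Monge) map between the two Gaussians
    return m1 + (s1/s0)*(x - m0)

def Tt(t, x):           # displacement interpolation
    return (1 - t)*x + t*T(x)

# Pushing N(m0, s0^2) through the affine map T_t gives a Gaussian with
# mean (1-t) m0 + t m1 and standard deviation (1-t) s0 + t s1.
for t in [0.0, 0.5, 1.0]:
    slope = Tt(t, 1.0) - Tt(t, 0.0)    # affine slope of T_t
    print(t, Tt(t, m0), slope*s0)      # interpolated mean and std
```

Since the map is affine, the interpolated flow stays Gaussian with linearly interpolated mean and standard deviation.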

We claim that

$$\begin{aligned} u^\star (t,x)={\tilde{u}}(t,x-y(t))+B'm(t) \end{aligned}$$
(48)

with y, m as in (39), is the optimal control strategy for the steering problem with dynamics (46), and that the corresponding distribution flow is

$$\begin{aligned} \rho ^\star (t,x)={\hat{\rho }}(t,x-y(t)). \end{aligned}$$
(49)

We shall skip the proof as it is similar to the case with disturbance (see (40)–(44)).

The solution to the density-steering problem with dynamics (45) converges weakly to \(\rho ^\star \) as \(\epsilon \) goes to 0. This follows directly from the fact that a Schrödinger bridge converges to the corresponding optimal transport solution as \(\epsilon \) goes to 0 [27].

8 Examples

Consider N agents with dynamics

$$\begin{aligned} \text {d}x_i(t)=x_i(t)\text {d}t-\frac{2}{N-1}\sum _{j\ne i}x_j(t)\text {d}t+\text {d}w_i(t), ~1\le i\le N. \end{aligned}$$

The two marginal distributions \(\rho _0\) and \(\rho _1\) are two normal distributions

$$\begin{aligned} \rho _0 \sim {\mathcal N}[1, 4],\quad \rho _1 \sim {\mathcal N}[-4, 1]. \end{aligned}$$
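Before turning to the controlled solutions, it is instructive to check the mean behavior of the uncontrolled dynamics: averaging the drift over the agents gives \(\text {d}{\bar{x}}=-{\bar{x}}\,\text {d}t\) exactly, for every N (the interaction sums telescope). A minimal Euler–Maruyama sketch of the uncontrolled system:

```python
import numpy as np

# Euler-Maruyama sketch of the (uncontrolled) interacting dynamics above:
#   dx_i = x_i dt - (2/(N-1)) sum_{j != i} x_j dt + dw_i.
# Averaging the drift over i gives exactly d(xbar) = -xbar dt for every N,
# so the empirical mean should decay like e^{-t}.
rng = np.random.default_rng(2)
N, dt, steps = 20_000, 1.0e-3, 1000
x = rng.normal(1.0, 2.0, size=N)              # samples from rho_0 ~ N(1, 4)
x0bar = x.mean()

for _ in range(steps):
    drift = x - (2.0/(N - 1))*(x.sum() - x)   # interaction excludes agent i
    x = x + drift*dt + np.sqrt(dt)*rng.standard_normal(N)

print(x.mean(), x0bar*np.exp(-1.0))           # empirical mean vs e^{-1} xbar(0)
```

Under the optimal strategies derived below, an additional feedback term steers the population toward \(\rho _1\); the sketch above only illustrates the interaction structure.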

8.1 Non-cooperative Game

One choice of terminal cost that will steer the agents from \(\rho _0\) to \(\rho _1\) is

$$\begin{aligned} g\left( x,\mu \right) =0.9805\left( x-{\bar{x}_\mu }\right) ^2+4.3679x. \end{aligned}$$
Fig. 1 Time evolution of probability densities

Fig. 2 Empirical distribution of x(0)

Fig. 3 Empirical distribution of x(1)

Fig. 4 Time evolution of mean \({\bar{x}}(t)\)

Figure 1 shows the evolution of the probability density in the Nash equilibrium. To confirm that the distribution of the agents evolves as in Fig. 1, we simulated the dynamics for a system with \(N=20{,}000\) agents under the optimal strategy. Figures 2 and 3 depict the empirical distributions of the particles at times \(t=0\) and \(t=1\); they match the theoretical distributions \(\rho _0\) and \(\rho _1\) very well. We also show the empirical mean of the particles in Fig. 4, which closely matches the theoretical result.

Fig. 5 Time evolution of probability densities

Fig. 6 Empirical distribution of x(0)

8.2 Cooperative Game

Figure 5 depicts the time evolution of the probability densities with these two marginal distributions in the cooperative game setting.

Fig. 7 Empirical distribution of x(1)

Fig. 8 Time evolution of mean \({\bar{x}}(t)\)

Similarly, we ran simulations for a particle system with \(N=20{,}000\) and obtained Figs. 6 and 7 as the empirical distributions of the agents at times \(t=0\) and \(t=1\). We also show the empirical mean of the particles in Fig. 8. Clearly, the mean differs from that of the Nash equilibrium in the non-cooperative game setting.

9 Conclusions

We have introduced a paradigm for steering a large number of agents from one distribution to another. The problem lies at the intersection of MFG, OMT and optimal control. We studied such problems for weakly interacting agents with linear dynamics and quadratic running cost, and solved them using tools from all three areas. Results for several extensions, such as the stationary and cooperative game problems, were also presented. In future work, we will extend the results of this paper to more general running cost functions. We expect this paradigm to add a new dimension to the study and applications of MFG and OMT.