1 Introduction

Self-organization in social interactions is a fascinating mechanism, which has inspired the mathematical modeling of multi-agent interactions towards the formation of coherent global behaviors, with applications to the study of biological, social, and economic phenomena. Recently there has been a vigorous development of the literature in applied mathematics and physics describing the collective behavior of multi-agent systems [41,42,43, 52, 56, 58, 82], towards modeling phenomena in biology, such as cell aggregation and motility [21, 59, 60, 73], coordinated animal motion [12, 27, 34, 37,38,39, 43, 66, 70, 71, 74, 79, 85], coordinated human behavior [40, 44, 76], and synthetic agent behavior and interactions, such as cooperative robots [35, 63, 72, 77]. As it is very hard to be exhaustive in accounting for all the developments of this fast-growing field, we refer to [28,29,30, 33, 81] for recent surveys.

Two main mechanisms are considered in such models to drive the dynamics. The first, which takes inspiration, e.g., from the laws of motion in physics, is based on binary forces encoding observed “first principles” of biological, social, or economic interactions. Most of these models start from particle-like systems, taking a leaf from Newtonian physics, by including fundamental “social interaction” forces within classical systems of 1st or 2nd order equations. In this paper we mix general principles with concrete modeling instances to meet the need both for a certain level of generality and for immediately concrete applications. Accordingly, we consider here mainly large particle/agent systems of the form:

$$\begin{aligned} dx_i = \left(\frac{1}{N}\sum _{j=1}^N P(x_i,x_j)(x_j-x_i)\right)dt + \sqrt{2\sigma } \,dB_i^t, \quad i=1,\ldots ,N, \quad t > 0, \end{aligned}$$
(1.1)

where \(P(\cdot ,\cdot )\) represents the communication function between agents \(x_i \in \mathbb R^d\) and \(B_i^t\) is a d-dimensional Brownian motion.
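For readers who want to experiment, the particle system (1.1) is straightforward to simulate with an Euler–Maruyama discretization. The following sketch is ours and not the code accompanying the paper; the distance-dependent communication function `phi` is an illustrative assumption:

```python
import numpy as np

def simulate(N=50, d=1, T=5.0, dt=0.01, sigma=0.05,
             phi=lambda r: 1.0 / (1.0 + r**2), seed=0):
    """Euler-Maruyama discretization of (1.1), with the communication
    function taken as P(x_i, x_j) = phi(|x_i - x_j|)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(N, d))          # initial agent states
    for _ in range(int(T / dt)):
        diff = x[None, :, :] - x[:, None, :]         # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)          # pairwise distances
        # drift_i = (1/N) sum_j phi(|x_i - x_j|) (x_j - x_i)
        drift = (phi(dist)[:, :, None] * diff).mean(axis=1)
        x = x + drift * dt + np.sqrt(2 * sigma * dt) * rng.standard_normal(x.shape)
    return x
```

With \(\sigma = 0\) and \(\phi \equiv 1\) the drift reduces to relaxation towards the empirical mean, so the agents contract to a single point, consistent with the consensus behavior discussed below.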

The second mechanism, which we do not address in detail here, is based on evolutive games, where the dynamics is driven by the simultaneous optimization of costs by the players, perhaps subject to selection, ranging from game-theoretic models of evolution [54] to mean field games, introduced in [62] and independently under the name Nash Certainty Equivalence (NCE) in [55], and later greatly popularized, e.g., within consensus problems [67, 68].

The common viewpoint of these branches of mathematical modeling of multi-agent systems is that the dynamics are based on the free interaction of the agents or on decentralized control. The desired phenomenon to be described is their self-organization in terms of the formation of complex macroscopic patterns.

One fundamental goal of these studies is in fact to reveal the possible relationship between the simple binary forces acting at the individual level, be they the “first principles” of social interaction or the game rules, and the potential emergence of a global behavior in the form of specific patterns.

For instance one can use the model in (1.1), for \(d=1\) and \(x_i\in I=[-1,1]\), a bounded interval, to formulate classical opinion models, where \(x_i\) represents an opinion in the continuous set between two opposite opinions \(\{-1,1\}\). Depending on the choice of the communication function \(P(\cdot ,\cdot )\), consensus may or may not emerge, and various studies have investigated how to enforce the emergence of a global consensus [5,6,7, 45, 80]. The mathematical property allowing a system to form patterns is its persistent compactness. There are several mechanisms promoting compactness that eventually yield self-organization. In the recent paper [65], for instance, the authors identify heterophilia, i.e., the tendency to bond more with those who are “different” rather than with those who are similar, as a positive mechanism in consensus models to reach accord. However, also in homophilious societies influenced by more local interactions, global self-organization towards consensus can be expected as soon as enough initial coherence is given. At this point, and perhaps reminiscent of biblical stories from the Genesis, one could enthusiastically argue “Let us give them good rules and they will find their way!” Unfortunately, this is not true at all. In fact, in homophilious regimes there are plenty of situations where patterns will not spontaneously form. In Sect. 5 below we demonstrate with a few simple numerical examples the incompleteness of the self-organization paradigm, and we refer to [18] for its systematic discussion. Consequently, we propose to amend it by allowing possible external interventions in the form of centralized controls. Human society calls them government.
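The failure of self-organization in homophilious regimes is easy to observe numerically. Below is a minimal sketch, with our own illustrative parameters anticipating the spirit of the experiments in Sect. 5, of the deterministic \(\sigma = 0\) dynamics (1.1) with a bounded-confidence kernel \(P(x,y) = 1\) if \(|x-y| \le \Delta \) and 0 otherwise: for small \(\Delta \) the opinions freeze into several clusters and no consensus forms.

```python
import numpy as np

def bounded_confidence(N=100, Delta=0.2, T=20.0, dt=0.05, seed=0):
    """Deterministic (sigma = 0) dynamics (1.1) on [-1, 1] with the
    homophilious kernel P(x, y) = 1 if |x - y| <= Delta, else 0."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, N)                   # initial opinions
    for _ in range(int(T / dt)):
        diff = x[None, :] - x[:, None]              # diff[i, j] = x_j - x_i
        P = (np.abs(diff) <= Delta).astype(float)   # interact only with near peers
        x = x + dt * (P * diff).mean(axis=1)
    return np.sort(x)
```

For \(\Delta = 0.2\) the final opinions remain spread over most of \([-1,1]\); the local interaction rule alone does not produce global consensus.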

The general idea consists in considering dynamics of the form

$$\begin{aligned} d x_i =\left(\frac{1}{N}\sum _{j=1}^N P(x_i,x_j)(x_j-x_i)\right)dt + f_i\,dt + \sqrt{2\sigma } \,dB_i^t, \quad i=1,\ldots ,N, \quad t > 0, \end{aligned}$$
(1.2)

where the control \(f = (f_1,\ldots ,f_N)\) minimizes a given functional \(J(x,f)\). As an example we can consider the following variational formulation

$$\begin{aligned} f = \arg \min _{g\in {\mathcal {U}}} J(x,g):= \mathbb E \left[ \int _0^T\frac{1}{N}\sum _{i=1}^N\left( \frac{1}{2}|x_i-\bar{x}|^2 + \gamma \Psi (g_i)\right) \,dt \right] , \end{aligned}$$
(1.3)

where \(\bar{x}\) represents a target point, \(\gamma \) is the penalization parameter of the control g, which is chosen among the admissible controls in \({\mathcal {U}}\), and \(\Psi :\mathbb R^d \rightarrow \mathbb R_+ \cup \{0 \}\) is a convex function. The choice of this particular cost function, and especially of the term \(\int _0^T \frac{1}{2}\int |x-\bar{x}|^2\mu (x,t)\,dx\,dt\), is largely arbitrary. It is consistent with our wish to mix general statements with instances of applications, and the cost function is given in this form so as to provide immediately a specific instance of application oriented to opinion consensus problems. Models similar to (1.3) have also been studied recently for flocking dynamics in [4, 17, 24, 51], and one can of course consider many more instances, as long as one ensures enough continuity of the cost, see, e.g., [51].
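Before discussing optimality, it is instructive to see that even a naive, non-optimized control in (1.2) can restore consensus. The sketch below adds the linear feedback \(f_i = -\kappa (x_i - \bar{x})\), an assumption of ours and certainly not a minimizer of (1.3), to bounded-confidence dynamics:

```python
import numpy as np

def steered(N=100, Delta=0.2, kappa=1.0, xbar=0.0, T=20.0, dt=0.05, seed=0):
    """Dynamics (1.2) with sigma = 0, a bounded-confidence kernel, and the
    naive feedback control f_i = -kappa (x_i - xbar) (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, N)
    for _ in range(int(T / dt)):
        diff = x[None, :] - x[:, None]               # diff[i, j] = x_j - x_i
        P = (np.abs(diff) <= Delta).astype(float)
        drift = (P * diff).mean(axis=1) - kappa * (x - xbar)  # interaction + control
        x = x + dt * drift
    return x
```

All opinions are driven to the target \(\bar{x}\); the point of the formulation (1.3) is to achieve this while also penalizing the cost \(\Psi \) of the intervention.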

As the number of particles \(N \rightarrow \infty \), the finite dimensional optimal control problem with ODE constraints (1.2)–(1.3) converges to the following mean field optimal control problem [1, 15, 51, 61]:

$$\begin{aligned} \partial _t \mu + \nabla \cdot \left(\left(\mathcal {P}[\mu ] + f\right)\mu \right) = \sigma \Delta \mu , \end{aligned}$$
(1.4)

where the interaction force \(\mathcal P\) is given by

$$\begin{aligned} \mathcal {P}[\mu ](x) = \int P(x,y)(y-x)\mu (y,t)\,dy \end{aligned}$$
(1.5)

and the solution \(\mu \) is controlled by the minimizer of the cost functional

$$\begin{aligned} J(\mu ,f) = \int _0^T\left( \frac{1}{2}\int |x-\bar{x}|^2\mu (x,t)\,dx + \gamma \int \Psi (f)\mu (x,t)\,dx\right) \,dt. \end{aligned}$$
(1.6)
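Once \(\mu \) and f are available on a space-time grid, the cost (1.6) can be evaluated by elementary quadrature. A trapezoidal-rule sketch, where the grid layout and the choice \(\Psi (v) = \frac{1}{2}|v|^2\) are our own assumptions:

```python
import numpy as np

def _trapz(y, x, axis=-1):
    """Composite trapezoidal rule along `axis` (kept self-contained on purpose)."""
    dx = np.diff(x)
    ys = np.moveaxis(y, axis, -1)
    return ((ys[..., :-1] + ys[..., 1:]) * 0.5 * dx).sum(axis=-1)

def cost_J(mu, f, xgrid, tgrid, xbar=0.0, gamma=0.1, Psi=lambda v: 0.5 * v**2):
    """Discrete evaluation of (1.6); mu and f have shape (len(tgrid), len(xgrid))."""
    running = (0.5 * (xgrid - xbar) ** 2 + gamma * Psi(f)) * mu
    per_time = _trapz(running, xgrid, axis=1)   # inner integrals in x
    return _trapz(per_time, tgrid)              # outer integral in t
```

As a sanity check, for the uniform density \(\mu \equiv 1/2\) on \([-1,1]\), \(f \equiv 0\), \(\bar{x} = 0\), and \(T = 1\), this returns approximately \(\int _0^1 \frac{1}{2}\int _{-1}^1 x^2 \cdot \frac{1}{2}\,dx\,dt = 1/6\).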

To a certain extent, the mean field optimal control problem (1.4)–(1.6) can be viewed as a generalization of optimal transport problems [14], for which \(P \equiv 0\), the term \(\int _0^T \frac{1}{2}\int |x-\bar{x}|^2\mu (x,t)\,dx\,dt\) does not appear in the cost, and final conditions are given. In contrast to mean field games [62], the goal here is not to derive the equilibria of a multi-player game, but rather to compute mean field optimal government strategies for a population so large that the curse of dimensionality would otherwise prohibit numerical solutions. The mean field optimal control problem (1.4)–(1.6) provides an artificial confinement vector field f, inducing the right amount of compactness to have global convergence to steady states (pattern formation). Local convergence, e.g., towards global Maxwellians, is provided for certain second order mean field-type equations in [32, 46]. Hence, our results can also be interpreted as an external model perturbation to induce global stability.

In this paper we provide a friendly introduction to mean field optimal controls of the type (1.4)–(1.6), showing their main analytical properties and furnishing a simple route to their numerical solution, which we call “the control hierarchy”. Although some of the results contained in this paper are certainly also derived elsewhere, see, e.g., [15, 51], we have made an effort to present them in a simplified form as well as to provide rigorous derivations.

In particular, in Sect. 2, we show existence of mean field optimal controls for first order models in the case of both stochastic and deterministic control problems. We also derive rigorously in Sect. 3 the corresponding first order optimality conditions, resulting in a coupled system of forward/backward time-dependent PDEs. The forward equation is given by (1.4), while the backward one is a nonlocal integro-differential advection-reaction-diffusion equation. The presence of nonlocal interaction terms in the form of integrals is another feature which distinguishes mean field optimal control problems from classical mean field games [62] and optimal transport problems [14], where usually \(P \equiv 0\). The nonlocal terms pose additional challenges in the numerical solution, which are the subject of recent studies [22].

Although mean field optimal controls are designed to be independent of the number N of agents, providing a way to circumvent the curse of dimensionality as \(N\rightarrow \infty \), their numerical computation still needs to be realized by solving the first-order optimality conditions. The complexity of their solution depends on the intrinsic dimensionality d of the agents, which is affordable only at moderate dimensions (e.g., \(d \le 3\)). For this reason, in Sect. 4 we approach the solution of the mean field optimal control by means of a novel hierarchy of suboptimal controls, computed by a Boltzmann approach: first one derives a control for a system of two representative particles, then one plugs it into a collisional operator accounting for the statistics of the interactions of a distribution of agents, and finally one performs a quasi-invariant limit to approximate the continuity-type PDE governing the dynamics of the probability distribution of the agent population. For the two-particle system considered in the first step of the Boltzmann approach above, we propose two suboptimal controls stemming from the binary Boltzmann approach: the first level is given by an instantaneous model predictive control on two interacting agents, which we shall call instantaneous control (IC), while the second stems from the solution of the binary optimal control problem by means of the Bellman dynamic programming principle, which we shall call finite horizon control (FH). These two controls have the advantage that the complexity of their computation is dramatically reduced with respect to the mean field optimal control (OC) in its full glory, while still retaining the ability to induce government of the population. We describe in detail how they can be efficiently computed numerically. In Sect. 5 we provide simple, easily implementable numerical approaches for solving one-dimensional mean field optimal control problems of the type (1.4)–(1.6).
We eventually numerically compare the control hierarchy with the mean field optimal control in a model of opinion formation and we show the quasi-optimality of the Boltzmann–Bellman (FH) control. To facilitate the reproducibility of our results and to allow other scientists to easily access this very exciting field, we provide at the link https://www-m15.ma.tum.de/Allgemeines/SoftwareSite the Matlab code used to produce our numerical experiments.

2 Existence of Mean Field Optimal Controls

2.1 Deterministic Case

In this section, we study global existence and uniqueness of weak solutions for Eq. (1.4) in \(\mathbb R^d\) without the diffusion, i.e., \(\sigma =0\), namely

$$\begin{aligned} \partial _t \mu + \nabla \cdot \left(\left(\mathcal {P}[\mu ] + f\right)\mu \right) = 0,\quad x \in \mathbb R^d, \quad t>0. \end{aligned}$$
(2.1)

We also investigate the mean field limit of the ODE constrained control problem (1.2)–(1.3) in the deterministic setting. Let us denote by \( {\mathcal {M}}(\mathbb R^d)\) and \( {\mathcal {M}}_p(\mathbb R^d)\) the sets of all probability measures and the ones with finite moments of order \(p \in [1,\infty )\) on \(\mathbb R^d\), respectively. We first define a notion of weak solution to Eq. (2.1).

Definition 2.1

For a given \(T > 0\), we call \(\mu \in \mathcal {C}([0,T]; {\mathcal {M}}_1(\mathbb R^d))\) a weak solution of (2.1) on the time-interval [0, T] if for all compactly supported test functions \(\varphi \in \mathcal {C}^\infty _c(\mathbb R^d \times [0,T])\),

$$\begin{aligned}&\int _{\mathbb R^d} \varphi (x,T)\,\mu _T(dx)-\int _0^T\int _{\mathbb R^d} \left( \partial _t \varphi + \left( \mathcal {P}[\mu _t] + f \right)\cdot \nabla \varphi \right)\mu _t(dx)dt \\&\quad = \int _{\mathbb R^d} \varphi (x,0)\,\mu _0(dx). \end{aligned}$$

We also introduce a set of admissible controls \(\mathcal {F}_\ell ([0,T])\) in the definition below.

Definition 2.2

For a given T and \(q \in [1,\infty )\), we fix a control bound function \(\ell \in L^q(0,T)\). Then \(f \in \mathcal {F}_\ell ([0,T])\) if and only if

  1. (i)

    \(f : {\mathbb R^d \times [0,T]} \rightarrow \mathbb R^d\) is a Carathéodory function.

  2. (ii)

    \(f(\cdot ,t) \in W^{1,\infty }_{loc}(\mathbb R^d)\) for almost every \(t \in [0,T]\).

  3. (iii)

    \(|f(0,t)| + \Vert f(\cdot ,t)\Vert _\mathrm{{Lip}} \le \ell (t)\) for almost every \(t \in [0,T]\).

For the existence and mean field limit, we use the topology on probability measures induced by the Wasserstein distance, which is defined by

$$\begin{aligned} \mathcal {W}_p(\mu ,\nu ) := \inf _{\pi \in \Gamma (\mu ,\nu )} \left(\int _{\mathbb R^{2d}} |x - y|^p\,\pi (dx,dy) \right)^{1/p} \; \text{ for } \; p \ge 1 \;\text{ and } \;\mu ,\nu \in {\mathcal {M}}(\mathbb R^d), \end{aligned}$$

where \(\Gamma (\mu ,\nu )\) is the set of all probability measures on \(\mathbb R^{2d}\) with first and second marginals \(\mu \) and \(\nu \), respectively. Note that \( {\mathcal {M}}_1(\mathbb R^d)\) is a complete metric space endowed with the \(\mathcal {W}_1\) distance, and \(\mathcal {W}_1\) is equivalently characterized in duality with Lipschitz continuous functions [84].
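In one space dimension the Wasserstein distance between two empirical measures with the same number of atoms is especially simple to compute: the monotone (sorted) pairing is an optimal coupling. A minimal sketch of this standard fact:

```python
import numpy as np

def w1_empirical(xs, ys):
    """W_1 between two equal-size empirical measures on the real line:
    sort both samples and average the pairwise transport costs."""
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    assert xs.shape == ys.shape, "equal numbers of atoms are required"
    return np.abs(xs - ys).mean()
```

For \(p > 1\) one averages \(|x-y|^p\) over the sorted pairing and takes the p-th root; in higher dimensions the infimum over couplings requires solving a linear program.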

The following result is a rather straightforward adaptation from [51], and we shall prove it rather concisely. For more details we refer the interested reader to [51], which has been written in a more scholastic and perhaps more accessible form.

Theorem 2.1

Let the initial data \(\mu _0 \in { {\mathcal {M}}(\mathbb R^d)}\) and assume that \(\mu _0\) is compactly supported, i.e., there exists \(R > 0\) such that

$$\begin{aligned} \textit{supp} \;\mu _0 \subset B(0,R), \end{aligned}$$

where \(B(0,R) := \{ x \in \mathbb R^d : |x| < R\}\). Furthermore, we assume that \(P \in W^{1,\infty }(\mathbb R^{2d})\). Then, for a given \(f \in \mathcal {F}_\ell ([0,T])\), there exists a unique weak solution \(\mu \in \mathcal {C}([0,T]; {\mathcal {M}}_1(\mathbb R^d))\) to Eq. (1.4) with \(\sigma =0\). Furthermore, \(\mu \) is determined as the push-forward of the initial measure \(\mu _0\) through the flow map generated by the locally Lipschitz velocity field \(\mathcal {P}[\mu ] + f\). Moreover, if \(\mu ^i,i=1,2\), are two such solutions with initial data \(\mu _0^i\) satisfying the above assumption, we have

$$\begin{aligned} \mathcal {W}_1(\mu ^1_t,\mu ^2_t) \le C\mathcal {W}_1(\mu ^1_0,\mu ^2_0) \quad \text{ for } \quad t \in [0,T], \end{aligned}$$

where \(C > 0\) depends only on \(\Vert P\Vert _{W^{1,\infty }}\), R, T, and \(\Vert \ell \Vert _{L^q}\).

Proof

\(\bullet \) Existence and Uniqueness Let \(\mu \in \mathcal {C}([0,T]; {\mathcal {M}}_1(\mathbb R^d))\) with compact support in B(0, R) for some \(R > 0\). Then we can easily show that the interaction force \(\mathcal {P}\) is locally bounded and Lipschitz:

$$\begin{aligned} |\mathcal {P}[\mu ](x)| \le C(\Vert P\Vert _{L^\infty }, R)(1 + |x|), \end{aligned}$$

and

$$\begin{aligned} |\mathcal {P}[\mu ](x) - \mathcal {P}[\mu ](y)| \le C(\Vert P\Vert _{W^{1,\infty }},R)(1 + |x|)|x- y|. \end{aligned}$$

On the other hand, since \(f \in \mathcal {F}_\ell ([0,T])\), we obtain that the vector field \(\mathcal {P}[\mu ] + f\) is also locally bounded and Lipschitz. This, together with the argument in [23, Theorem 3.10] and the existence theory for Carathéodory differential equations in [50], gives the local-in-time existence and uniqueness of weak solutions to the system (1.4) with \(\sigma =0\) in the sense of Definition 2.1. Note that these solutions exist as long as they remain compactly supported. Set

$$\begin{aligned} R(t) := \max _{x,y \in \overline{\text{ supp }(\mu _t)}}|x - y| \quad \text{ for } \quad t \in [0,T]. \end{aligned}$$

Let us consider the following characteristic \(X(t):=X(t;s,x): \mathbb R_+ \times \mathbb R_+ \times \mathbb R^d \rightarrow \mathbb R^d\):

$$\begin{aligned} \frac{d X(t;s,x)}{dt} = \mathcal {P}[\mu _t](X(t;s,x),t) + f(X(t;s,x),t) \quad \text{ for } \text{ all } \quad t,s \in [0,T], \end{aligned}$$
(2.2)

with the initial data \(X_0 = x \in \mathbb R^d\). We notice that the characteristic is well-defined on the time interval [0, T] due to the regularity of the velocity field. A straightforward computation yields that for \(x,y \in \) supp\((\mu _0)\)

$$\begin{aligned}&\frac{ d|X(t) - Y(t)|^2}{dt} \\&\quad = (X(t) - Y(t)) \cdot \frac{d \left( X(t) - Y(t)\right)}{dt}\\&\quad \le |X(t) - Y(t)|\left| \mathcal {P}[\mu _t](X(t),t) - \mathcal {P}[\mu _t](Y(t),t)\right| \\&\quad \quad +\, |X(t) - Y(t)||f(X(t),t) - f(Y(t),t)|\\&\quad \le 2\Vert P\Vert _{L^\infty }|X(t) - Y(t)| \int _{\mathbb R^d} |z - X(t)|\mu (z,t)\,dz + \Vert P\Vert _{L^\infty }|X(t) - Y(t)|^2\\&\quad \quad +\, \Vert f(\cdot ,t)\Vert _\mathrm{{Lip}}|X(t) - Y(t)|^2. \end{aligned}$$

This yields

$$\begin{aligned} \frac{d R(t)}{dt} \le \left( 3\Vert P\Vert _{L^\infty } + \Vert f(\cdot ,t)\Vert _\mathrm{{Lip}}\right)R(t) \le \left( 3\Vert P\Vert _{L^\infty } + \ell (t) \right)R(t), \end{aligned}$$

and

$$\begin{aligned} R(t) \le CR_0 \quad \text{ for } \quad t \in [0,T], \end{aligned}$$

where C depends only on T, \(\Vert P\Vert _{L^\infty }\), and \(\Vert \ell \Vert _{L^q}\); indeed, Gronwall's inequality combined with Hölder's inequality gives \(R(t) \le R_0 \exp \left( 3\Vert P\Vert _{L^\infty }T + T^{1-1/q}\Vert \ell \Vert _{L^q}\right) \). Thus, by continuity arguments, we have the global existence of weak solutions. We can also find that for \(h \in \mathcal {C}^\infty _c(\mathbb R^d)\)

$$\begin{aligned} \int _{\mathbb R^d} \mu (x,t)h(x)\,dx = \int _{\mathbb R^d} \mu _0(x) h(X(0;t,x))\,dx \quad \text{ for } \quad t \in [0,T]. \end{aligned}$$

This implies that \(\mu \) is determined as the push-forward of the initial density through the flow map (2.2).

\(\bullet \) Stability Estimate Let \(T>0\) and \(\mu ^i,i=1,2\) be the weak solutions to Eq. (1.4) with \(\sigma = 0\) obtained in the above. Let \(X_i\) be the characteristic flows defined in (2.2) generated by the velocity fields \(\mathcal {P}[\mu ^i] + f\), respectively. For a fixed \(t_0 \in [0,T]\), we choose an optimal transport map for \(\mathcal {W}_1\) denoted by \(\mathcal {T}^0(x)\) between \(\mu ^1_{t_0}\) and \(\mu ^2_{t_0}\), i.e., \(\mu ^2_{t_0} = \mathcal {T}^0 \# \mu ^1_{t_0}\). It also follows from the above that \(\mu ^i_t = X_i(t;t_0,\cdot ) \# \mu ^i_{t_0}\) for \(t \ge t_0\). Furthermore, we get \(\mathcal {T}^t \# \mu ^1_t = \mu ^2_t\) with \(\mathcal {T}^t = {X_2(t;t_0,\cdot )} \circ \mathcal {T}^0 \circ X_1(t_0;t,\cdot )\) for \(t \in [t_0,T]\). Then we obtain

$$\begin{aligned}&\frac{d^+ \mathcal {W}_1(\mu ^1_t,\mu ^2_t)}{dt}\Big |_{t = t_0+}\\&\quad \le \int _{\mathbb R^d} \left|\mathcal {P}[\mu ^1_{t_0}](X_1(t;t_0,x),t) - \mathcal {P}[\mu ^2_{t_0}](X_2(t;t_0,\mathcal {T}^0(x)),t) \right| \mu ^1_{t_0}(dx)\Big |_{t = t_0+}\\&\qquad + \int _{\mathbb R^d} \left|f(X_1(t;t_0,x),t)- f(X_2(t;t_0,\mathcal {T}^0(x)),t) \right| \mu ^1_{t_0}(dx)\Big |_{t = t_0+}\\&\quad =I_1 + I_2, \end{aligned}$$

where \(I_i,i=1,2\) are estimated as follows.

$$\begin{aligned} I_1\le & {} \int _{\mathbb R^{2d}} \left|P(x,y)(y-x) - P(\mathcal {T}^0(x),\mathcal {T}^0(y))(\mathcal {T}^0(y) - \mathcal {T}^0(x)) \right|\mu ^1_{t_0}(dx)\mu ^1_{t_0}(dy) \\\le & {} \int _{\mathbb R^{2d}}|P(x,y) - P(\mathcal {T}^0(x),\mathcal {T}^0(y))||y-x|\mu ^1_{t_0}(dx)\mu ^1_{t_0}(dy) \\&+ \int _{\mathbb R^{2d}}|P(\mathcal {T}^0(x),\mathcal {T}^0(y))|\left(|y-\mathcal {T}^0(y)| + | x - \mathcal {T}^0(x)| \right)\mu ^1_{t_0}(dx)\mu ^1_{t_0}(dy) \\\le & {} C\Vert P\Vert _{W^{1,\infty }}\mathcal {W}_1(\mu ^1_{t_0},\mu ^2_{t_0}), \\ I_2= & {} \int _{\mathbb R^d} \left|f(x,t) - f(\mathcal {T}^0(x),t) \right|\mu ^1_{t_0}(dx) \le \Vert f(\cdot ,t)\Vert _\mathrm{{Lip}}\mathcal {W}_1(\mu ^1_{t_0},\mu ^2_{t_0})\\\le & {} \ell (t)\mathcal {W}_1(\mu ^1_{t_0},\mu ^2_{t_0}), \end{aligned}$$

where we used the compact support of \(\mu \) in the estimate of \(I_1\). We now combine the above estimates, using that \(t_0\) is arbitrary in [0, T], to conclude

$$\begin{aligned} \frac{d^+ \mathcal {W}_1(\mu ^1_t,\mu ^2_t)}{dt} \le C\left(\Vert P\Vert _{W^{1,\infty }} + \ell (t) \right)\mathcal {W}_1(\mu ^1_t,\mu ^2_t), \quad \text{ for } \quad t \in [0,T]. \end{aligned}$$

An application of Gronwall's inequality then yields the claimed stability estimate, and this completes the proof. \(\square \)

In Theorem 2.1, we show the global existence and uniqueness of weak solutions \(\mu \) to Eq. (1.4) with \(\sigma = 0\) for a given control \(f \in \mathcal {F}_\ell ([0,T])\). In the rest of this part, we show the rigorous derivation of the infinite dimensional optimal control problem from the finite dimensional one as \(N \rightarrow \infty \). Let us recall the finite/infinite dimensional optimal control problems:

  • Finite dimensional optimal control problem:

    $$\begin{aligned} \min _{f \in \mathcal {F}_\ell }J(x,f):= \min _{f \in \mathcal {F}_\ell }\int _0^T\frac{1}{N}\sum _{i=1}^N\left( \frac{1}{2}|x_i-\bar{x}|^2 + \gamma \Psi ({f(x_i,t)})\right) \,dt, \end{aligned}$$
    (2.3)

    where \(x_i\) is a unique solution of

    $$\begin{aligned} \dot{x}_i =\frac{1}{N}\sum _{j=1}^N P(x_i,x_j)(x_j-x_i) + {f(x_i,t)}, \quad i=1,\ldots ,N, \quad t > 0, \end{aligned}$$
    (2.4)
  • Infinite dimensional optimal control problem:

    $$\begin{aligned} \min _{f \in \mathcal {F}_\ell }J(\mu _t,f):= \min _{f \in \mathcal {F}_\ell }\int _0^T\left( \frac{1}{2}\int _{\mathbb R^d} |x-\bar{x}|^2 \,\mu _t(dx) + \gamma \int _{\mathbb R^d} \Psi (f) \,\mu _t(dx) \right) \,dt, \end{aligned}$$
    (2.5)

    where \(\mu \in \mathcal {C}([0,T]; {\mathcal {M}}_1(\mathbb R^d))\) is a unique weak solution of

    $$\begin{aligned} \partial _t \mu _t&= \nabla \cdot \left( \left(\mathcal {P}[\mu _t] + f\right)\mu _t \right), \quad (x,t) \in \mathbb R^d \times [0,T],\nonumber \\ \mathcal {P}[\mu _t](x)&= \int _{\mathbb R^d} P(x,y)(y-x)\mu _t(dy). \end{aligned}$$
    (2.6)

For the convergence from (2.3)–(2.4) to (2.5)–(2.6), we need a weak compactness result in \(\mathcal {F}_\ell \) whose proof can be found in [51, Corollary 2.7].

Lemma 2.2

Let \(p \in (1,\infty )\). Suppose that \((f_j)_{j \in \mathbb N} \subset \mathcal {F}_\ell \) with \(\ell \in L^q(0,T)\) for \(1 \le q < \infty \). Then there exist a subsequence \((f_{j_k})_{k \in \mathbb N}\) and a function \(f \in \mathcal {F}_\ell \) such that

$$\begin{aligned} f_{j_k} \rightharpoonup f \quad \text{ weakly* } \text{ in } L^q(0,T;W^{1,p}(\mathbb R^d))\quad \text{ as } \quad k \rightarrow \infty , \end{aligned}$$
(2.7)

i.e.,

$$\begin{aligned}&\lim _{k \rightarrow \infty }\int _0^T \int _{\mathbb R^d} \phi (x,t)(f_{j_k}(x,t) - f(x,t))\,dxdt = 0\\&\quad \text{ for } \text{ all } \quad \phi \in L^{q'}(0,T;W^{-1,p'}(\mathbb R^d)). \end{aligned}$$

Define the empirical measure \(\mu ^N\) associated to the particle system (2.4) as

$$\begin{aligned} \mu ^N_t := \frac{1}{N} \sum _{i=1}^N \delta _{x_i(t)} \quad \text{ for } \quad t \ge 0. \end{aligned}$$
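The convergence assumption \(\mathcal {W}_1(\mu _0^N, \mu _0) \rightarrow 0\) appearing in the theorem below holds, for example, for i.i.d. samples from \(\mu _0\), and is easy to observe numerically. A sketch of this, where the uniform sampling law is our illustrative choice and \(\mathcal {W}_1\) on the line is computed via the sorted pairing:

```python
import numpy as np

def w1_1d(xs, ys):
    """W_1 between equal-size empirical measures on R via the sorted pairing."""
    return np.abs(np.sort(xs) - np.sort(ys)).mean()

rng = np.random.default_rng(0)
# W_1 distance between two independent empirical measures of Uniform(-1, 1);
# it is expected to shrink like O(N^{-1/2}) as the sample size N grows
dists = {N: w1_1d(rng.uniform(-1, 1, N), rng.uniform(-1, 1, N))
         for N in (10, 100, 1000)}
```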

Then we are now in a position to state our theorem on the mean field limit of the optimal control problem.

Theorem 2.3

Let \(T >0\). Suppose that \(P \in W^{1,\infty }(\mathbb R^{2d})\) and that \(\Psi \) satisfies, for some \(C \ge 0\) and \(1 \le q < \infty \),

$$\begin{aligned} Lip(\Psi , B(0,R)) \le CR^{q-1} \quad \text{ for } \text{ all } \quad R > 0. \end{aligned}$$

Let \(\ell (t)\) be a fixed function in \(L^q(0,T)\). Furthermore we assume that \(\{x_i^0\}_{i=1}^N \subset B(0,R_0)\) for some \(R_0 > 0\) independent of N. For all \(N \in \mathbb N\), let us denote by \(f_N \in \mathcal {F}_\ell \) a solution of the finite dimensional optimal control problem (2.3)–(2.4). If there exists a compactly supported initial datum \(\mu _0 \in {\mathcal {M}}_1(\mathbb R^d)\) such that \(\lim _{N \rightarrow \infty }\mathcal {W}_1(\mu _0^N, \mu _0) = 0\), then there exist a subsequence \((f^{N_k}_t)_{k \in \mathbb N}\) and a function \(f^\infty _t\) such that \(f^{N_k}_t \rightarrow f^\infty _t\) in the sense of (2.7). Moreover, \(f^\infty _t\) and the corresponding \(\mu ^\infty _t\) are solutions of the infinite dimensional optimal control problem (2.5)–(2.6).

Proof

We first notice that the existence of an optimal control \(f^N_t\) on the time interval [0, T] for the finite dimensional optimal control problem (2.3)–(2.4) can be obtained by using the weak compactness estimate in Lemma 2.2 together with the strong regularity of the velocity field \(\mathcal P + f\), see [51, Theorem 3.3]. For any \(f \in \mathcal {F}_\ell ([0,T])\), let us denote by \((\mu _f)^N_t\) the solution to Eq. (2.4) with the initial data \((\mu _f)_0^N\) satisfying \(\lim _{N \rightarrow \infty } \mathcal {W}_1((\mu _f)_0^N,\mu _0) = 0\). Let us also denote by \(\mu ^{f_t}_t\) the solution associated to (2.6) with the control \(f_t\) and the initial data \(\mu _0\), whose existence is ensured by Theorem 2.1. Moreover, by Theorem 2.1, \(\lim _{N \rightarrow \infty } \mathcal {W}_1((\mu _f)_t^N, \mu _t^{f_t}) = 0\). On the other hand, it follows from Lemma 2.2 that there exists a subsequence \(f^{N_k}_t\) such that \(f^{N_k}_t \rightharpoonup f^\infty _t\) weakly* in \(L^q(0,T;W^{1,p}(\mathbb R^d))\) as \(k \rightarrow \infty \) for some \(f^\infty _t \in \mathcal {F}_\ell \). Let \(\mu ^{\infty }_t\) be the solution to (2.6) with the control function \(f^\infty _t\). Then, by the lower semicontinuity of the cost functional, we get

$$\begin{aligned} \liminf _{k \rightarrow \infty }J\left(\mu _t^{N_k}, f^{N_k}_t\right) \ge J(\mu ^\infty _t,f^\infty _t), \end{aligned}$$

where \(\mu ^{N_k}_t\) is a solution to the particle equation (2.4) with the optimal control \(f^{N_k}_t\). Then, due to the minimality of \(f^{N_k}_t\), it is clear that

$$\begin{aligned} J\left( (\mu _f)_t^{N_k}, f_t\right) \ge J\left(\mu _t^{N_k}, f^{N_k}_t\right) \quad \text{ for } \text{ each } \quad k \in \mathbb N. \end{aligned}$$

We finally use the convergence \(\lim _{k \rightarrow \infty } \mathcal {W}_1((\mu _f)_t^{N_k}, \mu _t^{f_t}) = 0\), together with the compact support of the solutions, to obtain

$$\begin{aligned} J(\mu _t^{f_t}, f_t)= \lim _{k \rightarrow \infty } J\left((\mu _f)_t^{N_k}, f_t\right) \ge \liminf _{k \rightarrow \infty }J\left(\mu _t^{N_k}, f^{N_k}_t\right) \ge J(\mu ^\infty _t,f^\infty _t). \end{aligned}$$

Since \(f_t\) is arbitrarily chosen in \(\mathcal {F}_\ell ([0,T])\), we conclude

$$\begin{aligned} \min _{f_t \in \mathcal {F}_\ell } J(\mu _t,f_t)= J(\mu ^\infty _t,f^\infty _t), \end{aligned}$$

i.e., \(f^\infty _t\) is the optimal control for the problem (2.5)–(2.6). \(\square \)

2.2 Stochastic Case

In this section, we study the parabolic optimal control problem in a bounded domain. Here we are to a certain extent inspired by the work [20]. As we deviate from it in certain estimates, we take on the burden of presenting the results in more detail than in the previous section.

Let \(\Omega \) denote an open, bounded, smooth subset of \(\mathbb R^d\). We first introduce function spaces:

$$\begin{aligned} V:= L^2(0,T; H^1(\Omega )) \cap \dot{H}^1(0,T; H^{-1}_*(\Omega )), \quad \text{ and } \quad H^{-1}_*(\Omega ) = H^1(\Omega )', \end{aligned}$$

and the set of admissible controls

$$\begin{aligned} Q_M := \left\{ f \in L^2(0,T; L^\infty (\Omega ))\,:\, \Vert f\Vert _{L^2(0,T; L^\infty (\Omega ))} \le M \right\} , \end{aligned}$$

for a given \(M>0\). Then our optimization problem is to show the existence of

$$\begin{aligned} \min _{f \in Q_M} J(\mu ,f) := \min _{ f \in Q_M}\int _0^T\left( \frac{1}{2}\int _\Omega |x-\bar{x}|^2\mu (x,t)\,dx + \gamma \int _\Omega \Psi (f)\mu (x,t)\,dx\right) \,dt, \end{aligned}$$
(2.8)

where \(\mu \) is a weak solution to the following parabolic equation:

$$\begin{aligned} \partial _t \mu + \nabla \cdot (\mathcal {P}[\mu ]\mu + f\mu ) = \sigma \Delta \mu , \quad (x,t) \in \Omega _T:=\Omega \times [0,T], \end{aligned}$$
(2.9)

with the initial data

$$\begin{aligned} \mu (x, 0) = \mu _0(x), \quad x \in \Omega , \end{aligned}$$

and the zero-flux boundary condition

$$\begin{aligned} \left\langle \sigma \nabla \mu - (\mathcal {P}[\mu ] + f)\mu , \,n(x) \right\rangle = 0, \quad (x,t) \in \partial \Omega \times [0,T], \end{aligned}$$

where n(x) is the outward normal to \(\partial \Omega \) at the point \(x \in \partial \Omega \). Here the interaction term is given by

$$\begin{aligned} \mathcal {P}[\mu ](x,t) = \int _\Omega P(x,y)(y-x)\mu (y,t)\,dy. \end{aligned}$$

We next provide a notion of weak solution to Eq. (2.9).

Definition 2.3

For a given \(T >0\), a function \(\mu : \Omega _T \rightarrow [0,\infty )\) is a weak solution of Eq. (2.9) on the time-interval [0, T] if and only if

  1. 1.

    \(\mu \in L^2(0,T;H^1(\Omega ))\) and \(\partial _t \mu \in L^2(0,T; H^{-1}_*(\Omega ))\).

  2. 2.

    For any \(\varphi \in \mathcal {C}^1(\overline{\Omega _T})\) with \(\varphi (\cdot ,0) = \varphi (\cdot ,T) = 0\),

    $$\begin{aligned} \int _0^T \int _\Omega \mu \partial _t \varphi +\left( \mathcal {P}[\mu ] \mu + f \mu - \sigma \nabla \mu \right)\cdot \nabla \varphi \,dx dt =0. \end{aligned}$$

Theorem 2.4

For a given \(T, M >0\), let \(f \in Q_M\) and \(\mu _0 \in L^2(\Omega )\). Furthermore, we assume \(P \in L^\infty (\Omega ^2)\). Then there exists a unique weak solution \(\mu \) to Eq. (2.9) in the sense of Definition 2.3.

Proof

\(\bullet \) Existence We first employ the following iteration scheme: let \(\mu ^1(x,t) := \mu _0(x)\) for \((x,t) \in \Omega _T\), and for \(n \ge 1\), let \(\mu ^{n+1}\) be the solution of

$$\begin{aligned} \partial _t \mu ^{n+1} + \nabla \cdot (\mathcal {P}[\mu ^n] \mu ^{n+1} + f \mu ^{n+1}) = \sigma \Delta \mu ^{n+1} \end{aligned}$$
(2.10)

with the initial data \(\mu ^{n+1}(x)|_{t=0} = \mu _0(x)\) for \(x \in \Omega \) and the zero-flux boundary conditions. It is clear that \(\int _\Omega \mu ^n(x,t)\,dx = \int _\Omega \mu _0(x)\,dx\) for all \(n \ge 1\). Note that for a given \(\mu ^n \in V\) we have a unique weak solution to Eq. (2.10) since \(\mathcal {P}[\mu ^n] \in L^\infty (\Omega )\) and \(f \in L^\infty (\Omega )\). We next show that \(\mu ^{n+1} \in V\). A straightforward computation yields

$$\begin{aligned} \frac{1}{2}\frac{d}{dt}\int _\Omega (\mu ^{n+1})^2\,dx + \sigma \int _\Omega |\nabla \mu ^{n+1}|^2\,dx= & {} \int _\Omega \nabla \mu ^{n+1} \cdot \left( \mathcal {P}[\mu ^n]\mu ^{n+1} + f\mu ^{n+1} \right)\,dx\\&=: I_1 + I_2, \end{aligned}$$

where \(I_2\) can be easily estimated as

$$\begin{aligned} I_2 \le \int _\Omega |\nabla \mu ^{n+1}| |f| \mu ^{n+1}\,dx \le \frac{\epsilon }{2}\int _\Omega |\nabla \mu ^{n+1}|^2\,dx + C_\epsilon \Vert f\Vert _{L^\infty }^2 \int _\Omega (\mu ^{n+1})^2\,dx. \end{aligned}$$

For the estimate of \(I_1\), we use the fact that

$$\begin{aligned} \Vert \mathcal {P}[\mu ^n]\Vert _{L^\infty } \le {\text {diam}}(\Omega )\Vert P\Vert _{L^\infty }\Vert \mu _0\Vert _{L^1} < \infty , \end{aligned}$$
(2.11)

to obtain

$$\begin{aligned} |I_1| \le \int _\Omega |\nabla \mu ^{n+1}| |\mathcal {P}[\mu ^n]|\mu ^{n+1}\,dx \le \frac{\epsilon }{2}\int _\Omega |\nabla \mu ^{n+1}|^2\,dx + C_\epsilon \int _\Omega (\mu ^{n+1})^2\,dx. \end{aligned}$$

Combining the above estimates and choosing \(\epsilon < \sigma \), we find

$$\begin{aligned} \frac{1}{2}\frac{d}{dt}\int _\Omega (\mu ^{n+1})^2\,dx + \left(\sigma - \epsilon \right)\int _\Omega |\nabla \mu ^{n+1}|^2\,dx \le C_\epsilon \left(1 + \Vert f\Vert _{L^\infty }^2 \right)\int _\Omega (\mu ^{n+1})^2\,dx. \end{aligned}$$

Applying Gronwall’s inequality to the above differential inequality yields

$$\begin{aligned} \int _\Omega (\mu ^{n+1})^2\,dx + \int _0^t \int _\Omega |\nabla \mu ^{n+1}|^2\,dxds \le C(T,\sigma ,\Vert \mu _0\Vert _{L^2}, M). \end{aligned}$$
(2.12)

We also estimate the time derivative of \(\mu ^{n+1}\):

$$\begin{aligned} \Vert \partial _t \mu ^{n+1}\Vert _{H^{-1}_*}= & {} \sup _{\Vert \psi \Vert _{H^1} \le 1} | \langle \partial _t \mu ^{n+1}, \psi \rangle |\\\le & {} \sup _{\Vert \psi \Vert _{H^1} \le 1} \left| \left\langle \mathcal {P}[\mu ^n]\mu ^{n+1} + f \mu ^{n+1} + \sigma \nabla \mu ^{n+1}, \nabla \psi \right\rangle \right|\\\le & {} \left(\Vert \mathcal {P}[\mu ^n]\Vert _{L^\infty } + \Vert f\Vert _{L^\infty }\right)\Vert \mu ^{n+1}\Vert _{L^2} + \sigma \Vert \nabla \mu ^{n+1}\Vert _{L^2}. \end{aligned}$$

Thus we obtain \(\partial _t \mu ^{n+1}\in L^2(0,T; H^{-1}_*(\Omega ))\) due to (2.11) and (2.12). This shows that \(\mu ^n \in V\) for all \(n \ge 2\). Note that this also implies \(\mu ^{n} \in \mathcal {C}([0,T];L^2(\Omega ))\) for all \(n \ge 2\). Indeed, we have

$$\begin{aligned} \max _{0 \le t \le T}\Vert \mu ^n(t)\Vert _{L^2} \le C\left(\Vert \mu ^n\Vert _{L^2(0,T;H^1)} + \Vert \partial _t \mu ^n\Vert _{L^2(0,T;H^{-1}_*)} \right) \quad \text{ for } \text{ all } n \ge 2, \end{aligned}$$

where C only depends on T. Then, by the Aubin–Lions lemma, there exist a subsequence \(\mu ^{n_k}\) and a function \(\mu \in L^2(\Omega _T)\) such that

$$\begin{aligned} \mu ^{n_k} \rightarrow \mu \quad \text{ in } L^2(\Omega _T) \quad \text{ as } \quad k \rightarrow \infty . \end{aligned}$$
(2.13)

We next show that the above limiting function \(\mu \) solves Eq. (2.9) in the sense of Definition 2.3. For this, it suffices to take into account the interaction term \(\mathcal {P}[\mu ]\mu \) since the other terms are linear with respect to \(\mu \). Using the linearity of the functional \(\mathcal {P}\) together with (2.11) and the following fact

$$\begin{aligned} \Vert \mathcal {P}[f]\Vert _{L^\infty } \le {\text {diam}}(\Omega )\Vert P\Vert _{L^\infty } \sqrt{|\Omega |}\Vert f\Vert _{L^2}, \end{aligned}$$

we get

$$\begin{aligned}&\int _0^T \int _\Omega \left| \mu ^{n_{k+1}}\mathcal {P}[\mu ^{n_k}] - \mu \mathcal {P}[\mu ]\right|^2 dxdt \nonumber \\&\quad \le 2\int _0^T \int _\Omega \left| \mu ^{n_{k+1}} - \mu \right|^2|\mathcal {P}[\mu ^{n_k}]|^2\,dxdt + 2\int _0^T \int _\Omega \mu ^2 |\mathcal {P}[\mu ^{n_k} - \mu ]|^2\,dxdt\nonumber \\&\quad \le C_0\int _0^T \int _\Omega \left| \mu ^{n_{k+1}} - \mu \right|^2 + \left| \mu ^{n_{k}} - \mu \right|^2\,dxdt \rightarrow 0 \quad \text{ as } \quad k \rightarrow \infty , \end{aligned}$$
(2.14)

where \(C_0 > 0\) is given by

$$\begin{aligned} C_0 := 2 {\text {diam}}(\Omega )^2\Vert P\Vert _{L^\infty }^2\left(\Vert \mu _0\Vert _{L^1}^2 + |\Omega |\Vert \mu \Vert _{L^\infty (0,T;L^2)}^2\right). \end{aligned}$$

Hence we have that the limiting function \(\mu \) satisfies

$$\begin{aligned} \int _0^T \int _\Omega \mu \partial _t \varphi +\left( \mathcal {P}[\mu ] \mu + f \mu - \sigma \nabla \mu \right)\cdot \nabla \varphi \,dx dt =0. \end{aligned}$$

Uniqueness.- Let \(\mu _i,i=1,2\) be two solutions to Eq. (2.9) with initial data \(\mu _i(0) \in L^2(\Omega )\). Then, by using a similar estimate to that in (2.14), we find

$$\begin{aligned}&\frac{1}{2}\frac{d}{dt}\int _\Omega |\mu _1 - \mu _2|^2\,dx + \sigma \int _\Omega |\nabla (\mu _1 - \mu _2)|^2\,dx\\&\quad = \int _\Omega \nabla (\mu _1 - \mu _2) \cdot \left( \mathcal {P}[\mu _1 - \mu _2]\mu _1 + \mathcal {P}[\mu _2](\mu _1 - \mu _2) + f(\mu _1 - \mu _2) \right)dx\\&\quad \le \epsilon \int _\Omega |\nabla (\mu _1 - \mu _2)|^2\,dx + C_\epsilon \left(1 + \Vert f\Vert _{L^\infty }^2\right)\int _\Omega |\mu _1 - \mu _2|^2\,dx, \end{aligned}$$

where \(C_\epsilon \) depends only on \(\Omega \), \(\epsilon \), \(\Vert \mu _1\Vert _{L^\infty (0,T;L^2)}\), and \(\Vert \mu _2(0)\Vert _{L^1}\). Finally, we apply Gronwall’s inequality to the above differential inequality to get

$$\begin{aligned} \Vert \mu _1 - \mu _2\Vert _{L^\infty (0,T;L^2)}^2 + \Vert \nabla (\mu _1 - \mu _2)\Vert _{L^2(0,T;L^2)}^2 \le C_1 \Vert \mu _1(0) - \mu _2(0)\Vert _{L^2}^2 \end{aligned}$$

where \(C_1\) depends only on \(T,\sigma ,\Vert \mu _2(0)\Vert _{L^2}, M, \Omega \), and \( \Vert \mu _1\Vert _{L^\infty (0,T;L^2)}\). This completes the proof. \(\square \)
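As a side remark, the iteration scheme (2.10) also suggests a simple numerical method: freeze the nonlocal drift \(\mathcal {P}[\mu ^n]\) and solve the resulting linear equation. The following is a minimal 1D sketch, assuming \(\Omega = [0,1]\), \(P \equiv 1\), \(f = 0\), and a conservative finite-difference discretization of the zero-flux boundary conditions; all parameters are illustrative choices, not part of the analysis.

```python
import numpy as np

def nonlocal_drift(mu, x, dx, P=lambda x, y: 1.0):
    # (P[mu])(x) = \int_Omega P(x, y)(y - x) mu(y) dy, by the rectangle rule
    return np.array([np.sum(P(xi, x) * (x - xi) * mu) * dx for xi in x])

def linear_step(mu, v, dx, dt, sigma):
    # one explicit step of d_t mu + d_x(v mu) = sigma d_xx mu with zero flux:
    # interface fluxes F = v*mu - sigma*mu_x, set to zero at the boundary
    mu_face = 0.5 * (mu[1:] + mu[:-1])
    v_face = 0.5 * (v[1:] + v[:-1])
    flux = v_face * mu_face - sigma * (mu[1:] - mu[:-1]) / dx
    flux = np.concatenate(([0.0], flux, [0.0]))
    return mu - dt / dx * (flux[1:] - flux[:-1])

def iterate(mu0, x, dx, dt, nsteps, sigma, n_outer):
    mu_n = np.tile(mu0, (nsteps + 1, 1))          # mu^1(x, t) := mu_0(x)
    for _ in range(n_outer):                      # outer fixed-point loop
        mu_next = [mu0]
        for m in range(nsteps):
            v = nonlocal_drift(mu_n[m], x, dx)    # frozen drift P[mu^n]
            mu_next.append(linear_step(mu_next[-1], v, dx, dt, sigma))
        mu_n = np.array(mu_next)
    return mu_n

nx, nsteps, sigma = 64, 200, 0.05
x = np.linspace(0.0, 1.0, nx)
dx = x[1] - x[0]
dt = 0.4 * dx**2 / sigma                          # explicit stability bound
mu0 = np.exp(-40 * (x - 0.3) ** 2)
mu0 /= np.sum(mu0) * dx                           # unit mass
mu = iterate(mu0, x, dx, dt, nsteps, sigma, n_outer=3)
print(np.sum(mu[-1]) * dx)                        # mass is conserved, ~1.0
```

The conservative flux form reproduces, at the discrete level, the mass conservation \(\int _\Omega \mu ^n(x,t)\,dx = \int _\Omega \mu _0(x)\,dx\) used in the proof.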

Theorem 2.5

For a given \(T, M> 0\), let us assume \(\mu _0 \in L^2(\Omega )\). Furthermore, we assume that \(P \in L^\infty (\Omega ^2)\) and \(\Psi \) satisfies that for all \(R > 0\)

$$\begin{aligned} \Vert \Psi \Vert _{W^{1,\infty }(B(0,R))} \le CR, \end{aligned}$$

for some \(C > 0\). Then there exist \(f^\infty \in Q_M\) and the corresponding density \(\mu ^\infty \) solving the optimal control problem (2.8)–(2.9).

Proof

For \(f \in Q_M\), by Theorem 2.4, there exists a weak solution \(\mu \) in the sense of Definition 2.3. Note that \(0 \in Q_M\) and

$$\begin{aligned} J(\mu ^0,0) = \frac{1}{2}\int _0^T\int _\Omega |x-\bar{x}|^2\mu ^0(x,t)\,dxdt \le C(T,\Omega )\Vert \mu _0\Vert _{L^1(\Omega )} \le C, \end{aligned}$$

where \(\mu ^0\) is a weak solution of Eq. (2.9) with \(f=0\). Since \(J(\mu ,f) \ge 0\) for all \((\mu ,f)\in V \times Q_M\), there exist a sequence \((f^j)_{j \in \mathbb N} \in Q_M\) and the corresponding density \((\mu ^j)_{j \in \mathbb N} \in V\) solving (2.9) such that

$$\begin{aligned} \lim _{j \rightarrow \infty } J(\mu ^j, f^j) = \inf _{f \in Q_M} J(\mu ,f). \end{aligned}$$

On the other hand, since \((\mu ^j,f^j)_{j \in \mathbb N} \in V \times Q_M\), by Banach–Alaoglu theorem, there exist a subsequence \((\mu ^{j_k},f^{j_k}) \in V \times Q_M\) and \((\mu ^\infty ,f^\infty )\in V \times Q_M\) such that

$$\begin{aligned} \mu ^{j_k} \rightarrow \mu ^\infty \quad \text{ in } L^2(\Omega _T) \quad \text{ and } \quad f^{j_k} \mathop {\rightharpoonup }\limits ^{*}f^\infty \quad \text{ in } L^2(0,T;L^\infty (\Omega )). \end{aligned}$$
(2.15)

We next show that \((\mu ^\infty ,f^\infty )\) is a solution to (2.9). Since the term involving \(\mathcal {P}[\mu ]\) can be easily handled by using a similar estimate to (2.14) and the strong convergence (2.15), it is enough to show that

$$\begin{aligned} I_k:= \int _0^T \int _\Omega \left(f^{j_k}\mu ^{j_k} - f^\infty \mu ^\infty \right)\phi \,dxdt \rightarrow 0\quad \text{ as } \quad k \rightarrow \infty , \end{aligned}$$

for \(\phi \in L^2(0,T;H^1(\Omega ))\). For this, we decompose \(I_k\) into two parts as

$$\begin{aligned} I_k = \int _0^T \int _\Omega (f^{j_k} - f^\infty )\mu ^{j_k} \phi \,dxdt + \int _0^T \int _\Omega (\mu ^{j_k} - \mu ^\infty )f^\infty \phi \,dxdt =: I_k^1 + I_k^2. \end{aligned}$$

Since

$$\begin{aligned} L^2(0,T;L^\infty (\Omega )) = \left(L^2(0,T;L^1(\Omega ))\right)' \quad \text{ and } \quad \mu ^{j_k} \phi \in L^2(0,T;L^1(\Omega )), \end{aligned}$$

it is clear from (2.15) that \(I_k^1 \rightarrow 0\) as \(k \rightarrow \infty \). For the convergence of \(I_k^2\), we get

$$\begin{aligned} I_k^2\le & {} \int _0^T\Vert f^\infty \Vert _{L^\infty }\Vert \mu ^{j_k} - \mu ^\infty \Vert _{L^2}\Vert \phi \Vert _{L^2}\,dt \\\le & {} \Vert \phi \Vert _{L^\infty (0,T;L^2)}\Vert f^\infty \Vert _{L^2(0,T;L^\infty )}\Vert \mu ^{j_k} - \mu ^\infty \Vert _{L^2(0,T;L^2)} \rightarrow 0 \quad \text{ as } \quad k \rightarrow \infty . \end{aligned}$$

Thus we conclude that \((\mu ^\infty ,f^\infty )\) is a solution to (2.9). Furthermore, we obtain

$$\begin{aligned} \int _0^T \int _\Omega |x - \bar{x}|^2 \mu ^{j_k}\,dxdt \rightarrow \int _0^T \int _\Omega |x - \bar{x}|^2 \mu ^\infty \,dxdt \quad \text{ as } \quad k \rightarrow \infty , \end{aligned}$$

due to \(|\Omega | < \infty \). We also find

$$\begin{aligned} \lim _{k \rightarrow \infty }\int _0^T \int _\Omega \Psi (f^{j_k}) \mu ^{j_k}\,dxdt \ge \int _0^T \int _\Omega \Psi (f^\infty ) \mu ^\infty \,dxdt. \end{aligned}$$
(2.16)

More precisely, we estimate

$$\begin{aligned}&\int _0^T \int _\Omega \left( \Psi (f^{j_k}) \mu ^{j_k} - \Psi (f^\infty ) \mu ^\infty \right) dxdt \\&\quad = \int _0^T \int _\Omega \left( \Psi (f^{j_k}) - \Psi (f^\infty )\right)\mu ^{j_k} + \Psi (f^\infty )\left(\mu ^{j_k} - \mu ^\infty \right) dxdt\\&\quad \ge \int _0^T \int _\Omega \nabla \Psi (f^\infty ) \cdot (f^{j_k} - f^\infty )\mu ^{j_k} dxdt + \int _0^T \int _\Omega \Psi (f^\infty )\left(\mu ^{j_k} - \mu ^\infty \right) dxdt\\&\quad =: J_k^1 + J_k^2, \end{aligned}$$

where we used the convexity of \(\Psi \) and the positivity of \(\mu ^{j_k}\). We then claim that \(\lim _{k \rightarrow \infty } J_k^1 \ge 0\) and \(\lim _{k \rightarrow \infty } J_k^2 \ge 0\), and for this, we show that

$$\begin{aligned}&\nabla \Psi (f^\infty ) \cdot f^{j_k}\mu ^{j_k} \mathop {\rightharpoonup }\limits ^{*}\nabla \Psi (f^\infty ) \cdot f^\infty \mu ^{j_k}\quad \\&\text{ and } \quad \Psi (f^\infty ) \mu ^{j_k} \mathop {\rightharpoonup }\limits ^{*}\Psi (f^\infty ) \mu ^\infty \quad \text{ in } \mathcal {M}(\Omega _T) \quad \text{ as } \quad k \rightarrow \infty . \end{aligned}$$

For \(\phi \in \mathcal {C}_c(\Omega _T)\), we get

$$\begin{aligned} \int _0^T \int _\Omega \Psi (f^\infty )(\mu ^{j_k} - \mu ^\infty )\phi \,dxdt&\le C\Vert \phi \Vert _{L^\infty (\Omega _T)}\int _0^T \Vert f^\infty \Vert _{L^\infty }\Vert \mu ^{j_k} - \mu ^\infty \Vert _{L^2}\,dt \\&\le M\Vert \phi \Vert _{L^\infty (\Omega _T)}\Vert \mu ^{j_k} - \mu ^\infty \Vert _{L^2(\Omega _T)}, \end{aligned}$$

thus \(J_k^2 \rightarrow 0 \) as \(k \rightarrow \infty \). Similarly, for the estimate of \(J_k^1\), we use the fact that \(\nabla \Psi (f^\infty )\, \mu ^{j_k}\,\phi \in L^2(0,T; L^1(\Omega ))\) uniformly in k and \((L^2(0,T;L^1))' = L^2(0,T;L^\infty )\) to obtain \(J_k^1 \rightarrow 0\) as \(k \rightarrow \infty \). This, together with de la Vallée–Poussin’s theorem, provides the lower semicontinuity (2.16). This yields

$$\begin{aligned} \liminf _{k \rightarrow \infty }J(\mu ^{j_k}, f^{j_k}) \ge J(\mu ^\infty ,f^\infty ). \end{aligned}$$

Hence we conclude

$$\begin{aligned} \inf _{f \in Q_M} J(\mu ,f)=\lim _{j \rightarrow \infty } J(\mu ^j, f^j)=\liminf _{k \rightarrow \infty }J(\mu ^{j_k}, f^{j_k}) \ge J(\mu ^\infty ,f^\infty ). \square \end{aligned}$$

3 First Order Optimality Conditions

In this section, we derive first order optimality conditions for the mean field optimal control problem studied in Sect. 2:

$$\begin{aligned} \partial _t\mu + \nabla \cdot \left((\mathcal {P}[\mu ] + f)\mu \right) = \sigma \Delta \mu , \quad x \in \Omega , \quad t > 0, \end{aligned}$$
(3.1)

where the control f minimizes the following cost functional:

$$\begin{aligned} J(\mu ,f) = \int _0^T\left( \frac{1}{2}\int _\Omega |x-\bar{x}|^2\mu (x,t)\,dx + \gamma \int _\Omega \Psi (f)\mu (x,t)\,dx\right) \,dt. \end{aligned}$$
(3.2)
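For later reference, the functional (3.2) is straightforward to evaluate on space–time grid functions; a minimal sketch, assuming the quadratic penalization \(\Psi (f) = |f|^2/2\) on \(\Omega = [0,1]\) with illustrative grid parameters, \(\bar{x}\) and \(\gamma \), reads:

```python
import numpy as np

def trapezoid(y, step, axis):
    # composite trapezoid rule along the given axis
    return step * (y.sum(axis=axis)
                   - 0.5 * (y.take(0, axis=axis) + y.take(-1, axis=axis)))

def cost(mu, f, x, t, xbar, gamma):
    dx, dt = x[1] - x[0], t[1] - t[0]
    # (1/2)|x - xbar|^2 mu  +  gamma * Psi(f) mu, with Psi(f) = |f|^2/2
    integrand = 0.5 * (x - xbar) ** 2 * mu + gamma * 0.5 * f ** 2 * mu
    return trapezoid(trapezoid(integrand, dx, axis=1), dt, axis=0)

nx, nt = 101, 51
x = np.linspace(0.0, 1.0, nx)
t = np.linspace(0.0, 1.0, nt)
mu = np.ones((nt, nx))                 # uniform density of unit mass
f = np.zeros((nt, nx))                 # no control: pure tracking cost
J0 = cost(mu, f, x, t, xbar=0.5, gamma=0.1)
# for mu = 1, f = 0:  J = T * \int_0^1 (1/2)(x - 1/2)^2 dx = 1/24
print(J0)
```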

3.1 Formal Derivation of the Optimality Conditions

Let us first write the Lagrangian of the mean field optimal control defined by (3.1) and (3.2), as follows

$$\begin{aligned} \mathcal {L}(\mu ,\psi ,f) = J(\mu ,f) - \int _0^T\int _\Omega \psi \left( \partial _t\mu + \nabla \cdot \left(({\mathcal {P}}[\mu ] + f)\mu \right) - \sigma \Delta \mu \right) dxdt. \end{aligned}$$
(3.3)

Integrating by parts and taking the terminal data \(\psi (x,T) = 0\), we get

$$\begin{aligned} \mathcal {L}(\mu ,\psi ,f)= & {} \int _0^T\left( \frac{1}{2}\int _\Omega |x-\bar{x}|^2\mu \,dx +\gamma \int _\Omega \Psi (f) \mu \,dx\right) \,dt \nonumber \\&+\int _\Omega \psi (x,0)\mu (x,0)\,dx + \int _0^T\int _\Omega \partial _t\psi \,\mu \,dx dt\nonumber \\&+ \int _0^T\int _\Omega \nabla \psi \cdot ({\mathcal {P}}[\mu ] \mu )\,dxdt \nonumber \\&+ \int _0^T\int _\Omega \nabla \psi \cdot (f \mu )\,dxdt + \sigma \int _0^T \int _\Omega \mu \Delta \psi \,dxdt, \end{aligned}$$
(3.4)

where we omit the dependency on \((x,t)\) where not necessary. We compute the functional derivatives of the Lagrangian with respect to the state function \(\mu \) and the control f,

$$\begin{aligned} \frac{\delta \mathcal {L}}{\delta f}&= \gamma \nabla \Psi (f) \mu {+ \nabla \psi \,\mu } = (\gamma \nabla \Psi (f){+\nabla \psi })\mu ,\end{aligned}$$
(3.5)
$$\begin{aligned} \frac{\delta \mathcal {L}}{\delta \mu }&=\frac{1}{2}|x-\bar{x}|^2 +\gamma \Psi (f) + \partial _t\psi + \nabla \psi \cdot f + \sigma \Delta \psi \nonumber \\&\quad {+}\int _\Omega \left(P(x,y)\nabla \psi (x,t) - P(y,x)\nabla \psi (y,t)\right)\cdot (y-x)\mu (y,t)\,dy. \end{aligned}$$
(3.6)

Let \((\mu ^*,\psi ^*,f^*)\) be the solution to the optimal control problem. Then we have

$$\begin{aligned} \frac{\delta \mathcal {L}}{\delta f}\Big |_{(\mu ,\psi ,f) = (\mu ^*,\psi ^*,f^*)} = 0 \quad \text{ and } \quad \frac{\delta \mathcal {L}}{\delta \mu }\Big |_{(\mu ,\psi ,f) = (\mu ^*,\psi ^*,f^*)} = 0. \end{aligned}$$

From (3.5), this yields

$$\begin{aligned} \gamma \nabla \Psi (f^*) = {-\nabla \psi ^*} \quad \text{ on } \text{ the } \text{ support } \text{ of } \mu ^*. \end{aligned}$$
(3.7)

We also find from (3.6) that \(\psi ^*\) satisfies

$$\begin{aligned}&\partial _t \psi ^* + \frac{1}{2}|x-\bar{x}|^2 + \gamma \Psi (f^*) + \nabla \psi ^* \cdot f^* + \sigma \Delta \psi ^* \\&\quad +\int _\Omega \left(P(x,y)\nabla \psi ^*(x,t) - P(y,x)\nabla \psi ^*(y,t)\right)\cdot (y-x)\mu ^*(y,t)\,dy = 0, \end{aligned}$$

or equivalently

$$\begin{aligned}&\partial _t \psi ^* + \frac{1}{2}|x-\bar{x}|^2 + \gamma \left( \Psi (f^*) {- \nabla \Psi (f^*) \cdot f^*}\right) + \sigma \Delta \psi ^* \nonumber \\&\quad + \int _\Omega \left(P(x,y)\nabla \psi ^*(x,t) - P(y,x)\nabla \psi ^*(y,t)\right)\cdot (y-x)\mu ^*(y,t)\,dy = 0,\nonumber \\ \end{aligned}$$
(3.8)

due to (3.7), where \(\mu ^*\) satisfies

$$\begin{aligned} \partial _t\mu ^* + \nabla \cdot (({\mathcal {P}}[\mu ^*] + f^*)\mu ^*) = \sigma \Delta \mu ^*\quad \text{ with } \quad \nabla \Psi (f^*) = {-\frac{1}{\gamma }\nabla \psi ^*.} \end{aligned}$$
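As a concrete instance (a specialization added here for illustration), with the quadratic penalization \(\Psi (f) = |f|^2/2\) the condition (3.7) gives \(f^* = -\nabla \psi ^*/\gamma \) on the support of \(\mu ^*\), and \(\gamma \left( \Psi (f^*) - \nabla \Psi (f^*)\cdot f^*\right) = -|\nabla \psi ^*|^2/(2\gamma )\), so that (3.8) takes a familiar Hamilton–Jacobi form:

```latex
% Specialization of (3.8) to \Psi(f) = |f|^2/2 (illustrative):
% f^* = -\nabla\psi^*/\gamma, and
% \gamma(\Psi(f^*) - \nabla\Psi(f^*)\cdot f^*) = -\tfrac{1}{2\gamma}|\nabla\psi^*|^2,
% hence
\partial_t \psi^* + \frac{1}{2}|x-\bar{x}|^2
  - \frac{1}{2\gamma}\,|\nabla \psi^*|^2 + \sigma \Delta \psi^*
  + \int_\Omega \big( P(x,y)\nabla\psi^*(x,t) - P(y,x)\nabla\psi^*(y,t) \big)
    \cdot (y-x)\,\mu^*(y,t)\,dy = 0.
```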

3.2 Rigorous Derivation of the Optimality Conditions

The first order optimality conditions (3.10) are of utmost relevance, as they are often used for the numerical computation of mean field optimal controls; we show how to proceed in Sect. 5. Although they are very often formally derived, as we did above, and used in several contributions, see, e.g., [15], as a relatively straightforward consequence of the Lagrange multiplier theorem, we feel that presenting their rigorous derivation can be useful for a reader not familiar with such arguments. Moreover, by doing so, we highlight more precisely certain technical difficulties one may encounter along the way, which are often taken for granted. Let us recall then the Lagrange multiplier theorem in Banach spaces.

Let X and Y be Banach spaces, and let a functional \(J: U(x^*) \subseteq X \rightarrow \mathbb R\) and a mapping \(G: U(x^*)\subseteq X \rightarrow Y\) be continuously differentiable on an open neighbourhood of \(x^*\). Consider the following optimization problem:

$$\begin{aligned} J(x) \rightarrow \inf , \quad G(x) = 0. \end{aligned}$$
(3.9)

Then we recall the following first order optimality condition whose proof can be found in [86, Sect. 4.14].

Theorem 3.1

Let \(x^*\) be a solution to the problem (3.9), and let the range of the operator \(G'(x^*) : X \rightarrow Y\) be closed. Then there exists a nonzero pair \((\lambda ,p) \in \mathbb R\times Y'\) such that

$$\begin{aligned} \mathcal {L}'_x(x^*,\lambda ,p)(x) = 0 \quad \text{ for } \text{ all } x \in X, \end{aligned}$$

where

$$\begin{aligned} \mathcal {L}(x,\lambda ,p) = \lambda J(x) + G(x)(p). \end{aligned}$$

Moreover, if \({\text {Im}}\, G'(x^*) = Y\), then \(\lambda \ne 0\) in the above; thus we can assume that \(\lambda = 1\).

In order to apply the above theorem, we set

$$\begin{aligned} X= & {} V \times L^2(\Omega _T), \quad Y = L^2(0,T;H^{-1}(\Omega )),\\ J(\mu ,f)= & {} \int _0^T\left( \frac{1}{2}\int _\Omega |x-\bar{x}|^2\mu (x,t)\,dx + \gamma \int _\Omega \Psi (f)\mu (x,t)\,dx\right) dt, \end{aligned}$$

and

$$\begin{aligned} G(\mu ,f)(\psi )&= - \int _0^T\int _\Omega \partial _t\psi \,\mu \,dx dt + \int _0^T\int _\Omega \nabla \psi \cdot ({\mathcal {P}}[\mu ] \mu )\,dxdt \\&\quad + \int _0^T\int _\Omega \nabla \psi \cdot (f \mu )\,dxdt - \sigma \int _0^T \int _\Omega \nabla \mu \cdot \nabla \psi \,dxdt, \end{aligned}$$

for \(\psi \in Y' = L^2(0,T;H^1_0(\Omega ))\). Then straightforward computations yield

$$\begin{aligned} G'_\mu (\mu ,f)(\nu ,\psi )= & {} - \int _0^T\int _\Omega \partial _t\psi \,\nu \,dx dt\\&+ \int _0^T\int _\Omega \nabla \psi \cdot ({\mathcal {P}}[\nu ] \mu + {\mathcal {P}}[\mu ] \nu + f\nu )\,dxdt \\&-\, \sigma \int _0^T \int _\Omega \nabla \nu \cdot \nabla \psi \,dxdt, \end{aligned}$$

for \((\nu ,\psi ) \in V \times Y'\), and

$$\begin{aligned} G'_f(\mu ,f)(g,\psi ) = \int _0^T \int _\Omega \nabla \psi \cdot (g\mu )\,dxdt \quad \text{ for } \quad (g,\psi ) \in Q_M \times Y'. \end{aligned}$$

Note that the interaction terms on the right hand side of the equality for \(G'_\mu (\mu ,f)(\nu ,\psi )\) can be rewritten as

$$\begin{aligned}&\int _0^T\int _\Omega \nabla \psi \cdot ({\mathcal {P}}[\nu ] \mu + {\mathcal {P}}[\mu ] \nu ) dxdt \\&\quad = \int _0^T \int _{\Omega ^2} \nabla \psi (x)P(x,y)\cdot (y-x)(\nu (y)\mu (x) + \mu (y)\nu (x))\,dxdydt\\&\quad = -\int _0^T \int _{\Omega ^2} \nabla \psi (y)P(y,x)\cdot (y-x)(\nu (y)\mu (x) + \mu (y)\nu (x))\,dxdydt\\&\quad = \frac{1}{2}\int _0^T \int _{\Omega ^2} \left(P(x,y)\nabla \psi (x) - P(y,x)\nabla \psi (y) \right)\\&\qquad \cdot (y-x) \left(\nu (x)\mu (y) + \mu (x)\nu (y) \right)dxdydt. \end{aligned}$$

We now present our main result on the first order optimality condition in the theorem below.

Theorem 3.2

Let \((\mu ^*,f^*) \in V \times Q_M\) be a solution to the problem (3.1)–(3.2). Suppose that there exists a \(\mu _\ell > 0\) such that \(\mu ^* \ge \mu _\ell \) for all \((x,t) \in \Omega _T\). Then there exists \(\psi ^* \in Y'\) such that

$$\begin{aligned} G'_\mu (\mu ^*,f^*)(\nu , \psi ^* )= & {} J'_\mu (\mu ^*,f^*)(\nu ), \quad \text{ for } \text{ all } \quad \nu \in V,\nonumber \\ G'_f(\mu ^*,f^*)(g, \psi ^*)= & {} J'_f(\mu ^*,f^*)(g), \quad \text{ for } \text{ all } \quad g \in L^2(\Omega _T). \end{aligned}$$
(3.10)

Before presenting the proof of the first order optimality conditions (3.10), let us comment on the positivity assumption, namely the existence of \(\mu _\ell > 0\) such that \(\mu ^* \ge \mu _\ell \) for all \((x,t) \in \Omega _T\). If we assume that \(\mu _0, f, P \in \mathcal {C}^2\) and \(\mu _0\) is bounded from below by a positive constant, then by the Feynman–Kac formula we can show that \(\mu \) is bounded from below by some positive constant up to the fixed time T. However, we assume it a priori to avoid any further, stronger regularity assumption on the control f. We will verify this property numerically in Sect. 5.

Proof

For the proof, we show that the linear operators \(G'_\mu (\mu ^*,f^*): V \rightarrow Y\) and \(G'_f (\mu ^*,f^*): L^2(\Omega _T)\left(\supseteq Q_M\right) \rightarrow Y\) are surjective. Then, by Theorem 3.1, we conclude the desired results.

Surjectivity of \(G'_\mu (\mu ^*,f^*)\). Let \((\mu ^*,f^*) \in V \times Q_M\) be a solution to (3.1)–(3.2). We want to show that for any \(\eta \in Y\) there exists a \(\nu \in V\) such that

$$\begin{aligned} G'_\mu (\mu ^*,f^*)(\nu ) = \eta , \quad \text{ i.e., } \quad G'_\mu (\mu ^*,f^*)(\nu , \psi ) = \eta (\psi ) \quad \text{ for } \text{ all } \quad \psi \in Y'. \end{aligned}$$

Note that establishing the above equality is equivalent to showing that, for given \((\mu ^*,f^*,\eta ) \in V \times Q_M \times Y\), there exists a solution \(\nu \in V\) to the Cauchy problem:

$$\begin{aligned} \partial _t \nu + \nabla \cdot \left({\mathcal {P}}[\nu ]\mu ^* + {\mathcal {P}}[\mu ^*]\nu + f^*\nu \right) = \sigma \Delta \nu -\eta , \quad x \in \Omega , \quad t > 0, \end{aligned}$$
(3.11)

with the initial data \(\nu _0 \in L^2(\Omega )\) and the boundary condition:

$$\begin{aligned} \left\langle \sigma \nabla \nu - {\mathcal {P}}[\nu ]\mu ^* - \left( {\mathcal {P}}[\mu ^*]+ f^*\right)\nu , n(x)\right\rangle = 0, \quad (x,t) \in \partial \Omega \times \mathbb R_+. \end{aligned}$$

We notice that (3.11) is a linear parabolic equation in \(\nu \). Thus, for the existence of \(\nu \in V\), it suffices to establish the following a priori estimates, which are very similar to those in the proof of Theorem 2.4:

$$\begin{aligned}&\frac{1}{2}\frac{d}{dt}\Vert \nu \Vert _{L^2}^2 + \sigma \Vert \nabla \nu \Vert _{L^2}^2 \\&\quad \le \Vert \nabla \nu \Vert _{L^2}\left(\Vert {\mathcal {P}}[\nu ]\mu ^*\Vert _{L^2} + \Vert {\mathcal {P}}[\mu ^*]\nu \Vert _{L^2} + \Vert f^* \nu \Vert _{L^2}\right) + \Vert \eta \Vert _{H^{-1}}\Vert \nu \Vert _{H^1}\\&\quad \le \frac{\sigma }{2}\Vert \nabla \nu \Vert _{L^2}^2 + C\left(\Vert {\mathcal {P}}[\nu ]\Vert _{L^\infty }^2\Vert \mu ^*\Vert _{L^2}^2 + \left(\Vert {\mathcal {P}}[\mu ^*]\Vert _{L^\infty }^2 + \Vert f^*\Vert _{L^\infty }^2\right)\Vert \nu \Vert _{L^2}^2\right)\\&\qquad + \Vert \eta \Vert _{H^{-1}}^2 + \Vert \nu \Vert _{L^2}^2\\&\quad \le \frac{\sigma }{2}\Vert \nabla \nu \Vert _{L^2}^2 + C\left(\Vert \mu ^*\Vert _{L^2}^2 + \Vert f^*\Vert _{L^\infty }^2 + 1\right)\Vert \nu \Vert _{L^2}^2+ \Vert \eta \Vert _{H^{-1}}^2,\\&\qquad \Vert \partial _t \nu \Vert _{H^{-1}} \le \Vert {\mathcal {P}}[\nu ]\Vert _{L^\infty }\Vert \mu ^*\Vert _{L^2} + \left(\Vert {\mathcal {P}}[\mu ^*]\Vert _{L^\infty } + \Vert f^*\Vert _{L^\infty }\right)\Vert \nu \Vert _{L^2} \\&\qquad + \sigma \Vert \nabla \nu \Vert _{L^2} + \Vert \eta \Vert _{H^{-1}}\\&\quad \lesssim \left(\Vert \mu ^*\Vert _{L^2} + \Vert f^*\Vert _{L^\infty }\right)\Vert \nu \Vert _{L^2} + \sigma \Vert \nabla \nu \Vert _{L^2} + \Vert \eta \Vert _{H^{-1}}. \end{aligned}$$

Here we used

$$\begin{aligned} \Vert {\mathcal {P}}[\nu ]\Vert _{L^\infty } \le {\text {diam}}(\Omega )\sqrt{|\Omega |}\Vert P\Vert _{L^\infty }\Vert \nu \Vert _{L^2}, \end{aligned}$$

and similarly

$$\begin{aligned} \Vert {\mathcal {P}}[\mu ^*]\Vert _{L^\infty } \le {\text {diam}}(\Omega )\sqrt{|\Omega |}\Vert P\Vert _{L^\infty }\Vert \mu ^*\Vert _{L^2}. \end{aligned}$$

This yields

$$\begin{aligned}&\Vert \nu (\cdot ,t)\Vert _{L^2}^2 + \int _0^t \Vert \nabla \nu (\cdot ,s)\Vert _{L^2}^2 ds \\\le & {} \left( \Vert \nu _0\Vert _{L^2}^2 + \Vert \eta \Vert _{L^2(0,T;H^{-1})}^2 \right)\exp \left(C\int _0^T \left(\Vert \mu ^*(\cdot ,s)\Vert _{L^2}^2 + \Vert f^*(\cdot ,s)\Vert _{L^\infty }^2 + 1\right)ds \right) \end{aligned}$$

and

$$\begin{aligned} \Vert \partial _t \nu \Vert _{L^2(0,T;H^{-1})}\lesssim & {} \Vert \nu \Vert _{L^\infty (0,T;L^2)}\left(\Vert \mu ^*\Vert _{L^2(\Omega _T)} + \Vert f^*\Vert _{L^2(0,T;L^\infty )} \right) \\&+\,\sigma \Vert \nabla \nu \Vert _{L^2(\Omega _T)} + \Vert \eta \Vert _{L^2(0,T;H^{-1})}. \end{aligned}$$

Surjectivity of \(G'_f(\mu ^*,f^*)\). For \(\xi \in Y\), we first consider the following weak formulation of Poisson equation:

$$\begin{aligned} \int _0^t \int _\Omega \nabla \psi \cdot \nabla u\,dxds = \int _0^t \int _\Omega \xi \psi \,dxds, \quad \text{ for } \text{ any } \psi \in H^1_0(\Omega ), \end{aligned}$$
(3.12)

where we have already taken into account the space–time decomposition of the test function. Note that solving Eq. (3.12) is equivalent to finding \(u \in L^2(0,T;H^1_0(\Omega ))\) such that

$$\begin{aligned} a(u,v) = (\xi ,v) \quad \text{ for } \text{ all } \quad v \in L^2(0,T;H^1_0(\Omega )), \end{aligned}$$

with

$$\begin{aligned} a(u,v) := (\nabla u, \nabla v) = \int _0^T \int _\Omega \nabla u\cdot \nabla v\,dxdt, \end{aligned}$$

and \((\cdot ,\cdot )\) is the inner product in \(L^2(\Omega _T)\). Due to the Poincaré inequality, we find that \(a(\cdot ,\cdot )\) is an inner product on \(L^2(0,T;H^1_0(\Omega ))\) with the induced norm:

$$\begin{aligned} \Vert v\Vert _{L^2(0,T;H^1_0)}^2 = a(v,v). \end{aligned}$$

Define

$$\begin{aligned} F(v):= \int _0^T\int _\Omega \xi \,v\,dxdt \quad \text{ for } \quad v \in L^2(0,T;H^1_0(\Omega )). \end{aligned}$$

Then this functional is continuous on \(L^2(0,T;H^1_0(\Omega ))\) since \(|F(v)| \le \Vert \xi \Vert _{L^2(0,T;H^{-1})}\Vert v\Vert _{L^2(0,T;H^1_0)}\). Thus, by the Riesz representation theorem, there exists a unique \(u \in L^2(0,T;H^1_0(\Omega ))\) solving Eq. (3.12).

We now return to our original problem. Our goal is to show that for given \(\mu ^* \in V\) and \(\xi \in Y\), there exists a function \(g \in L^2(\Omega _T)\) such that

$$\begin{aligned} \int _0^T \int _\Omega \nabla \psi \cdot (g\mu ^*)\,dxdt = \int _0^T \int _\Omega \xi \,\psi \,dxdt \quad \text{ for } \text{ any } \psi \in Y'. \end{aligned}$$

We now construct the solution g to the above equation by setting

$$\begin{aligned} g\mu ^* = \nabla u, \quad \text{ i.e., } \quad g = \frac{\nabla u}{\mu ^*} \quad \text{ on } \text{ the } \text{ support } \text{ of } \mu ^*, \end{aligned}$$

where the existence of \(u \in L^2(0,T;H^1_0(\Omega ))\) was guaranteed at the beginning of the proof. Moreover, by the assumption \(\mu ^*(x,t) \ge \mu _\ell > 0\) in \(\Omega \times [0,T]\), we have

$$\begin{aligned} \int _0^T \int _\Omega |g(x,t)|^2\,dxdt= & {} \int _0^T \int _\Omega \left|\frac{\nabla u(x,t)}{\mu ^*(x,t)}\right|^2\,dxdt\\\le & {} \frac{1}{\mu _\ell ^2}\int _0^T \int _\Omega |\nabla u(x,t)|^2\,dxdt < \infty , \end{aligned}$$

due to \(u \in L^2(0,T;H^1_0(\Omega ))\). This completes the proof. \(\square \)
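The construction of g above is easy to reproduce numerically; the following 1D sketch (with illustrative data \(\xi \), \(\mu ^*\), and a finite-difference Poisson solve standing in for the weak problem (3.12) at a frozen time) builds \(g = \nabla u/\mu ^*\) and checks the defining identity against a test function:

```python
import numpy as np

nx = 200
x = np.linspace(0.0, 1.0, nx + 1)
h = x[1] - x[0]
xi = np.sin(np.pi * x)                        # an illustrative right-hand side
mu_star = 0.5 + 0.4 * np.cos(2 * np.pi * x)   # density bounded below: mu* >= 0.1

# finite-difference Laplacian with homogeneous Dirichlet conditions: -u'' = xi
A = (2.0 * np.eye(nx - 1) - np.eye(nx - 1, k=1) - np.eye(nx - 1, k=-1)) / h**2
u = np.zeros(nx + 1)
u[1:-1] = np.linalg.solve(A, xi[1:-1])

g = np.gradient(u, h) / mu_star               # g = u'/mu*, admissible as mu* > 0

def trap(y):
    # composite trapezoid rule on the uniform grid
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

# check the defining identity  \int (g mu*) psi' dx = \int xi psi dx
psi = x * (1.0 - x)                           # a test function in H^1_0(0, 1)
lhs = trap(g * mu_star * np.gradient(psi, h))
rhs = trap(xi * psi)
print(lhs, rhs)                               # the two sides nearly agree
```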

4 Hierarchy of Controls via Boltzmann Equation

For large values of N, the solution of finite horizon control problems of the type (1.2)–(1.3) through standard methods stumbles upon prohibitive computational costs, due to the nonlinear constraints and the lack of convexity in the cost. Although mean field optimal controls (1.4)–(1.6) are designed to be independent of the number N of agents, and thus provide a way to circumvent the curse of dimensionality as \(N\rightarrow \infty \), their numerical computation still needs to be realized by solving the first-order optimality conditions. The complexity of their solution depends on the intrinsic dimensionality d of the agents, which is affordable only in moderate dimensions (e.g., \(d \le 3\)). In order to tackle these difficulties, we introduce a novel reduced setting, based on a binary dynamics whose evolution can be described by means of a Boltzmann-type equation, [3, 69]. We will then show that this description, under a proper scaling [80, 83], converges to the mean field equation (1.4), [6, 36, 80]. This type of approach allows us to embed the control into the dynamics in two different ways:

  1. (i)

    we can assume the control f to be a given function, possibly obtained from the solution of the optimal control problem (1.2)–(1.3);

  2. (ii)

    alternatively, the control is obtained as a solution of the reduced optimal control problem associated to the dynamics of two single agents. We refer to this approach as binary control.

Similar ideas have been used in a control context in [5,6,7, 45, 49]. We devote the forthcoming sections to showing different strategies to derive such binary controls. Thus we approach the mean field optimal control problem (1.2)–(1.3) as the last step of a control hierarchy, starting from an instantaneous control strategy and moving towards a binary Hamilton–Jacobi–Bellman control.

4.1 Binary Controlled Dynamics

We consider the discrete controlled system (1.2)–(1.3) in the simplified case of only two interacting agents \((x_i(t),x_j(t))\) and in the absence of noise, i.e. \(\sigma = 0\). Hence, defining the sample time \(\Delta t\) such that \(t_m = m\Delta t\), so that \(0=t_0<\cdots<t_m<\cdots <t_M=T\), and introducing a forward Euler discretization, we write (1.2) as follows

$$\begin{aligned} x_i^{m+1} = \,&x_i^m + \frac{\Delta t}{2} P(x_i^m,x_j^m)(x_j^m-x_i^m) + {\Delta t} u^m_i, \nonumber \\ x_j^{m+1}=\,&x_j^m+ \frac{\Delta t}{2} P(x_j^m,x_i^m)(x_i^m-x_j^m) + {\Delta t} u^m_j, \end{aligned}$$
(4.1)

where from now on we denote by \(u:=(u_i,u_j)\) the control pair associated to the state variable \(x:=(x_i,x_j)\), and we use the compact notation \(x^m_i=x_i(t_m)\), \(u^m_i=u_i(t_m)\).

The discretized form for the functional (1.3) for the binary dynamics (4.1) reads

$$\begin{aligned} J_M(x,u) : = \sum _{m=0}^{M-1}\int _{t_m}^{t_{m+1}} L \left( x(t),u(t)\right) \ dt, \end{aligned}$$
(4.2)

where the stage cost is given by

$$\begin{aligned} L(x,u) =\frac{1}{2}\left( |x_i-\bar{x}|^2+|x_j-\bar{x}|^2\right) +\gamma \left( \Psi (u_i)+ \Psi (u_j)\right) . \end{aligned}$$
(4.3)

In the following we propose two alternative methods in order to characterize \(u_i,u_j\) as (sub-)optimal feedback controllers. In both cases, we consider the controlled dynamics in the deterministic setting. Nonetheless, we will show in Sect. 5.3 that such controls are robust with respect to the presence of noise (\(\sigma > 0\)), and that they can be employed in the corresponding stochastic setting as well.

4.1.1 Instantaneous Control

A first approach towards obtaining a low complexity computational realization of the solution of the optimal control problem (4.1)–(4.2) is the so-called model predictive control (MPC). This strategy furnishes a suboptimal control by an iterative solution over a sequence of finite time steps, representing the prediction horizon [6, 8, 64]. Since we are only interested in instantaneous control strategies, we limit the MPC method to a single time prediction horizon; therefore we reduce the original optimization to the minimization, on every time interval \([t_m,t_{m+1}]\), of the following functional

$$\begin{aligned} J_{\Delta t}(x^m,{u}^m)&= \Delta tL(x(t_{m+1}),u(t_m))= \Delta t\left( \frac{1}{2}\left( |x^{m+1}_i-\bar{x}|^2+|x^{m+1}_j-\bar{x}|^2\right) \right. \nonumber \\&\left. \quad +\,\gamma \left( \Psi (u^m_i)+ \Psi (u^m_j)\right) \right) . \end{aligned}$$
(4.4)

Note that from (4.1) we have that \(x^{m+1}\) depends linearly on \(u^m\), thus

$$\begin{aligned} U^m_{ij}:= U(x_i,x_j,t_m)=\underset{u^m}{\arg \min }\;J_{\Delta t}(x^m,u^m) \end{aligned}$$

can be directly computed from the following system

$$\begin{aligned} \Delta t ^2{U}^m_{ij} + 2\gamma \nabla _{{u}_i}\Psi (U^m_{ij}) + \Delta t (x^m_i-\bar{x}) + \frac{\Delta t ^2}{2} P(x_i^m,x_j^m)(x_j^m-x_i^m) = 0,\nonumber \\ \Delta t ^2{U}^m_{ji} + 2\gamma \nabla _{{u}_j}\Psi (U^m_{ji}) + \Delta t(x^m_j-\bar{x}) + \frac{\Delta t ^2}{2}P(x_j^m,x_i^m)(x_i^m-x_j^m) = 0.\nonumber \\ \end{aligned}$$
(4.5)

In the case of a quadratic penalization of the control, i.e. \(\Psi (c) := |c|^2/2\), we can furnish the following explicit expression for the minimizers

$$\begin{aligned} U_{ij}^m= \frac{\Delta t}{2\gamma +\Delta t^2}\left( (\bar{x}-x^m_i) - \frac{\Delta t}{2} P(x_i^m,x_j^m)(x_j^m-x_i^m)\right) ,\nonumber \\ U_{ji}^m = \frac{\Delta t}{2\gamma +\Delta t^2}\left( (\bar{x}-x^m_j) - \frac{\Delta t}{2} P(x_j^m,x_i^m)(x_i^m-x_j^m)\right) , \end{aligned}$$
(4.6)

hence (4.5) gives a feedback control for the full binary dynamics, which can be plugged as an instantaneous control into (4.1).
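To make the feedback concrete, the following sketch implements the explicit expression (4.6) and plugs it into one forward Euler step (4.1); the communication function \(P\), \(\bar{x}\), \(\gamma \) and \(\Delta t\) below are illustrative choices, not taken from the numerical section.

```python
import numpy as np

def P(x, y):
    # an illustrative Cucker-Smale-type communication rate
    return 1.0 / (1.0 + np.abs(x - y) ** 2)

def feedback(xi, xj, xbar, gamma, dt):
    # U_ij^m from (4.6), for the quadratic penalization Psi(c) = |c|^2/2
    return dt / (2 * gamma + dt**2) * (
        (xbar - xi) - 0.5 * dt * P(xi, xj) * (xj - xi))

def binary_step(xi, xj, xbar, gamma, dt):
    # one forward Euler step (4.1) with the instantaneous controls plugged in
    ui = feedback(xi, xj, xbar, gamma, dt)
    uj = feedback(xj, xi, xbar, gamma, dt)
    xi_new = xi + 0.5 * dt * P(xi, xj) * (xj - xi) + dt * ui
    xj_new = xj + 0.5 * dt * P(xj, xi) * (xi - xj) + dt * uj
    return xi_new, xj_new

xbar, gamma, dt = 0.0, 0.05, 0.01
xi, xj = 1.0, -2.0
for _ in range(2000):                      # integrate up to T = 20
    xi, xj = binary_step(xi, xj, xbar, gamma, dt)
print(xi, xj)                              # both agents steered towards xbar
```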

Remark 4.1

Note that the instantaneous control (4.6), embedded into the discretized dynamics (4.1), is of order \(o(\Delta t)\). To obtain an effective contribution of the control to the dynamics, we assume that the penalization parameter \(\gamma \) scales with the time discretization, so that the leading order is recovered [6, 8]; e.g., for \(\gamma = \Delta t \bar{\gamma }\) we have

$$\begin{aligned} U_{ij}^m= \frac{1}{2\bar{\gamma }+\Delta t}\left( (\bar{x}-x^m_i) - \frac{\Delta t}{2} P(x_i^m,x_j^m)(x_j^m-x_i^m)\right) . \end{aligned}$$
(4.7)

4.1.2 Finite Horizon Optimal Control

The instantaneous feedback control derived in the previous section is the optimal control action for the binary system with a single step prediction horizon. An improved, yet more complex optimal feedback synthesis can be performed by considering an extended finite horizon control problem. Let us define the value function associated to the finite horizon discrete cost (4.2) as

$$\begin{aligned} V(x_i,x_j,t_m) := \underset{u\in {\mathcal {U}}}{\inf }\sum _{k=m}^{M-1} \Delta t L(x_i(t_k),x_j(t_k),u(t_k)),\quad \text {for } m = 0,\ldots ,M-1, \end{aligned}$$
(4.8)

with terminal condition \(V(x_i,x_j,t_M)=0\). It is well-known that the application of the Dynamic Programming Principle [13] with the discrete time dynamics (4.1) characterizes the value function as the solution of the following recursive Bellman equation

$$\begin{aligned} V(x_i,x_j,t_M)= & {} 0,\nonumber \\ V(x_i,x_j,t_m)= & {} \inf _{u\in {\mathcal {U}}}\left\{ \Delta t L(x_i,x_j,u) + V(x+\Delta t (F(x_i,x_j)+u),t_{m+1}) \right\} ,\nonumber \\&m = M-1,\ldots ,0, \end{aligned}$$
(4.9)

where \(x=(x_i,x_j)\), \(u=(u_i,u_j)\), and \(F(x_i,x_j) := (P(x_i,x_j)(x_j-x_i),P(x_i,x_j)(x_j-x_i))\). Once this functional relation has been solved, for every time step the optimal control is recovered from the optimality condition as follows

$$\begin{aligned} U(x_i,x_j,t_m)=\underset{u\in {\mathcal {U}}}{\arg \min }\left\{ \Delta t L(x_i,x_j,u) + V(x+\Delta t (F(x_i,x_j)+u),t_{m+1}) \right\} . \end{aligned}$$
(4.10)

As with the control given by (4.5), this optimal control is also in feedback form, depending not only on the current state of the binary system \((x_i,x_j)\), but also on the discrete time variable \(t_m\).
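The backward recursion (4.9) and the synthesis (4.10) can be tabulated offline on a state grid with a finite set of admissible controls. The sketch below (Python/NumPy) illustrates the backward sweep for scalar agents; the grid, the discrete control set, the nearest-neighbour projection of the successor state, and the evaluation of the running cost at the current state are all simplifying assumptions made for this illustration, in place of the semi-Lagrangian interpolation used in the paper:

```python
import numpy as np

def solve_bellman(P, xbar, dt, gamma, M, xs, us):
    """Tabular backward solve of the Bellman recursion (4.9) for the
    binary system on the grid xs x xs, controls drawn from the finite
    set us for each agent. Returns V(., ., t_0)."""
    n = len(xs)
    V = np.zeros((n, n))                      # terminal condition V(., ., t_M) = 0
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    for _ in range(M):                        # m = M-1, ..., 0
        Vnew = np.full((n, n), np.inf)
        for ui in us:
            for uj in us:
                # one explicit Euler step of the binary dynamics (4.1)
                Xn = X + dt * (P(X, Y) * (Y - X) + ui)
                Yn = Y + dt * (P(Y, X) * (X - Y) + uj)
                # running cost Delta t * L, with quadratic Psi
                L = dt * (0.5 * ((X - xbar) ** 2 + (Y - xbar) ** 2)
                          + gamma * 0.5 * (ui ** 2 + uj ** 2))
                # nearest-grid projection of the successor state
                I = np.clip(np.searchsorted(xs, Xn), 0, n - 1)
                J = np.clip(np.searchsorted(xs, Yn), 0, n - 1)
                Vnew = np.minimum(Vnew, L + V[I, J])
        V = Vnew
    return V
```

The minimizing pair \((u_i,u_j)\) recorded at each grid node would then realize the feedback (4.10).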

Remark 4.2

The system (4.9) is a first-order approximation of the Hamilton–Jacobi–Bellman equation

$$\begin{aligned} \displaystyle \partial _tV(x,t) + \inf _{u \in {\mathcal {U}}} \left\{ L(x,u)+ \nabla V(x,t)\cdot \left[ F(x)+u\right] \right\} =0, \end{aligned}$$
(4.11)

related to the continuous-time optimal control problem. In fact, this latter equation corresponds to the adjoint (3.6) when the nonlocal integral terms are neglected; therefore this approach, although optimal for the binary system, cannot be expected to satisfy the optimality system (3.5)–(3.6) related to the mean field optimal control problem.

4.2 Boltzmann Description

We now introduce a Boltzmann framework in order to describe the statistical evolution of a system of agents governed by binary interactions, [8, 69].

Let \(\mu (x,t)\) denote the kinetic density of agents in position \(x\in \Omega \) at time \(t\ge 0\), such that the total mass is normalized

$$\begin{aligned} \rho (t) = \int _{\Omega } \mu (x,t) \ dx = 1, \end{aligned}$$

and the time evolution of the density \(\mu \) is given as a balance between the bilinear gain and loss of agents' positions due to the binary interaction. In a general formulation, we assume that two agents with positions \(x, y\in \Omega \) modify their positions according to the following rule

$$\begin{aligned} x^*= \,&x + \alpha P(x,y)(y-x) + \alpha U_\alpha (x,y,t) + \sqrt{2\alpha }\xi , \nonumber \\ y^* =\,&y+ \alpha P(y,x)(x-y) + \alpha U_\alpha (y,x,t) + \sqrt{2\alpha }\zeta , \end{aligned}$$
(4.12)

where \((x^*,y^*)\) are the post-interaction positions, the parameter \(\alpha \) measures the influence strength of the different terms, \((\xi ,\zeta )\) is a vector of i.i.d. random variables with a symmetric distribution \(\Theta (\cdot )\) with zero mean and variance \(\sigma \), and \(U_\alpha (x,y,t)\) indicates the forcing term due to the control dynamics.

We consider now a kinetic model for the evolution of the density \(\mu =\mu (x,t)\) of agents with \(x \in \mathbb R^d\) at time \(t\ge 0\) and ruled by the following Boltzmann-type equation

$$\begin{aligned} \partial _t \mu (x,t) = Q_{\alpha }(\mu ,\mu )(x,t), \end{aligned}$$
(4.13)

where the interaction operator \(Q_{\alpha }(\mu ,\mu )\) in (4.13) accounts for the loss and gain of agents at position x at time t, as follows

$$\begin{aligned} Q_{\alpha }(\mu ,\mu )(x,t) = \mathbb {E}\left[ \int _{\Omega }\left( \mathcal {B}_* \frac{1}{\mathcal {J}_\alpha }\mu (x_*,t) \mu (y_*,t) - \mathcal {B}\mu (x,t)\mu (y,t)\right) \,dy\right] , \end{aligned}$$
(4.14)

where \((x_*,y_*)\) are the pre-interaction positions that generate the arrival positions \((x,y)\). The bilinear operator \(Q_\alpha (\cdot ,\cdot )\) includes the expectation with respect to \(\xi ^x\) and \(\xi ^y\), while \(\mathcal {J}_\alpha \) represents the Jacobian of the transformation \((x,y)\rightarrow (x^*,y^*)\) described by (4.12). Here \(\mathcal {B}_*=\mathcal {B}_{(x_*,y_*)\rightarrow (x,y)} \) and \(\mathcal {B}=\mathcal {B}_{(x,y)\rightarrow (x^*,y^*)} \) are the transition rate functions. More precisely, we take

$$\begin{aligned} \mathcal {B}_{(x,y)\rightarrow (x^*,y^*)} = \eta \chi _\Omega (x^*)\chi _\Omega (y^*), \end{aligned}$$

with interaction rate \(\eta >0\), and where \(\chi _\Omega \) is the characteristic function of the domain \(\Omega \). Note that in this case the transition functions depend on the relative positions, similarly to [80], since we introduced a bounded domain \(\Omega \) into the dynamics. A major simplification occurs when the bounded domain is preserved by the binary interactions themselves: the transition rate is then constant, and the interaction operator (4.14) reads

$$\begin{aligned} Q_{\alpha }(\mu ,\mu )(x,t) = \eta \mathbb {E}\left[ \int _{\Omega }\left( \frac{1}{\mathcal {J}_\alpha }\mu (x_*,t) \mu (y_*,t) - \mu (x,t)\mu (y,t)\right) \,dy\right] . \end{aligned}$$
(4.15)

In [6, 80] the authors showed that in opinion dynamics binary interactions are able to preserve the boundary, provided the symmetric random variable \(\xi \) has small support and a suitable function D(x) is introduced, acting as a local weight on the noise in (4.12).

In the next section we will perform the analysis of this model in the simplified case of \(\Omega = \mathbb {R}^d\) and constant rate of interaction \(\eta \).

Remark 4.3

Note that the binary dynamics (4.12) is equivalent to the Euler–Maruyama discretization of Eq. (1.2) in the two-agent case

$$\begin{aligned} x_i^{m+1} = \,&x_i^m + \frac{\Delta t}{2} P(x_i^m,x_j^m)(x_j^m-x_i^m) + {\Delta t} U^m_{ij} + \sqrt{2\sigma }\Delta B^m_i, \nonumber \\ x_j^{m+1}=\,&x_j^m+ \frac{\Delta t}{2} P(x_j^m,x_i^m)(x_i^m-x_j^m) + {\Delta t} U^m_{ji} + \sqrt{2\sigma }\Delta B^m_j, \end{aligned}$$
(4.16)

where we impose \(\alpha = \Delta t/2\), \(\alpha U_\alpha (x_i,x_j) = \Delta t U^m_{ij}\), and \(\sqrt{2\alpha }\xi = \sqrt{2\sigma }\Delta B^m_i\), a normally distributed random variable with zero mean and variance \(\Delta t\), where \(\Delta B^m_i := B_i(t_{m+1})-B_i(t_m)\).
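For reference, a single controlled binary interaction (4.12) can be sketched as follows (Python; Gaussian noise of unit variance is an assumption of this sketch, and `P`, `U` are user-supplied placeholders):

```python
import numpy as np

def binary_interaction(x, y, alpha, P, U, rng):
    """One binary interaction (4.12): returns the post-interaction
    positions (x*, y*); rng.normal supplies the i.i.d. noise (xi, zeta)."""
    xi, zeta = rng.normal(0.0, 1.0, 2)   # unit variance assumed for the sketch
    x_new = x + alpha * P(x, y) * (y - x) + alpha * U(x, y) + np.sqrt(2 * alpha) * xi
    y_new = y + alpha * P(y, x) * (x - y) + alpha * U(y, x) + np.sqrt(2 * alpha) * zeta
    return x_new, y_new
```

With \(\alpha = \Delta t/2\) and the identifications above, iterating this map reproduces the Euler–Maruyama steps (4.16).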

4.2.1 The Quasi-Invariant Limit

We now consider the Boltzmann operator (4.15) in the case \(\Omega = \mathbb R^d\). In order to obtain a more regular description we introduce the so-called quasi-invariant interaction limit, whose basic idea is to consider a regime where the interaction strength is low and the frequency is high. This technique, analogous to the grazing collision limit in plasma physics, has been thoroughly studied in [83] and, specifically for first order models, in [36, 80]; it allows one to pass from the Boltzmann equation (4.13) to a mean field equation of Fokker–Planck type, [5, 6]. In order to state the main result we first fix some notation and terminology.

Definition 4.1

(Multi-index) For any \(a \in \mathbb {N}^d\) we set \(|a| = \sum ^d_{i = 1} a_i\), and for any function \(h \in C^q(\mathbb R^d \times \mathbb R^d,\mathbb R)\), with \(q\ge 0\) and any \(a \in \mathbb {N}^d\) such that \(|a| \le q\), we define for every \((x,v) \in \mathbb R^d \times \mathbb R^d\)

$$\begin{aligned} \partial ^{a}_x h(x) := \frac{\partial ^{|a|} h}{\partial ^{a_1}x_1 \ldots \partial ^{a_d}x_d} (x), \end{aligned}$$

with the convention that if \(a= (0, \ldots , 0)\) then \(\partial ^{a}_x h(x) := h(x)\).

Definition 4.2

(Test functions) We denote by \(\mathcal {T}_{\delta }\) the set of compactly supported functions \(\varphi \) from \(\mathbb R^{d}\) to \(\mathbb R\) such that for any multi-index \(a \in \mathbb {N}^d\) we have,

  1.

    if \(|a| < 2\), then \(\partial ^{a}_x \varphi (\cdot )\) is continuous for every \(x \in \mathbb R^d\);

  2.

    if \(|a| = 2\), then there exists \(C > 0\) such that, \(\partial ^{a}_x \varphi (\cdot )\) is uniformly Hölder continuous of order \(\delta \) for every \(x \in \mathbb R^d\) with Hölder bound C, that is for every \(x,y \in \mathbb R^d\)

    $$\begin{aligned} \left\| \partial ^{a}_x \varphi (x) - \partial ^{a}_x \varphi (y) \right\| \le C \left\| x - y \right\| ^{\delta }, \end{aligned}$$

    and \(\Vert \partial ^{a}_x \varphi (x)\Vert \le C\) for every \(x \in \mathbb R^{d}\).

Definition 4.3

(\(\delta \)-weak solution) Let \(T > 0\), \(\delta > 0\). We call \(\mu \in L^2([0,T], \mathcal {M}_0(\mathbb R^{d}))\) a \(\delta \)-weak solution of the initial value problem for Eq. (4.13), with initial datum \(\mu ^0=\mu (x,0) \in \mathcal {M}_0(\mathbb R^{d})\), in the interval [0, T] if \(\mu (x,0) = \mu ^0(x)\) for every \(x \in \mathbb R^{d}\), there exists \(R_T > 0\) such that \(\text {supp}(\mu (t)) \subset B_{R_T}(0)\) for every \(t \in [0,T]\), and \(\mu \) satisfies the weak form of Eq. (4.13), i.e.,

$$\begin{aligned} \frac{d}{d t} \left\langle \mu , \varphi \right\rangle = \left\langle Q_\alpha (\mu ,\mu ),\varphi \right\rangle , \end{aligned}$$
(4.17)

for all \(t \in (0,T]\) and all \(\varphi \in \mathcal {T}_{\delta }\), where

$$\begin{aligned} \left\langle Q_\alpha (\mu ,\mu ),\varphi \right\rangle&= \mathbb {E}\left[ \int _{\mathbb {R}^{2d}} \eta \left( \varphi (x^*) - \varphi (x) \right) \mu (x)\mu (y) \ dx \ dy\right] . \end{aligned}$$
(4.18)

Moreover, we assume that

  (a)

    the system (4.12) constitutes an invertible change of variables from \((x,y)\) to \((x^*,y^{*})\);

  (b)

    there exists an integrable function \(K(x,y,t)\) such that the following limit is well defined

    $$\begin{aligned} \lim _{\alpha \rightarrow 0}U_\alpha (x,y,t) = K(x,y,t). \end{aligned}$$
    (4.19)

    In the case of an instantaneous control of type (4.6), we can give an explicit expression for the limit, namely \(K(x,y,t) = (\bar{x}-x)/\gamma \).

We state the following theorem.

Theorem 4.4

Let us fix a control \(U_\alpha \in {\mathcal {U}}\) and \(\alpha \ge 0\), \(T > 0\), \(\delta > 0\), \(\varepsilon >0\), and assume that the density \(\Theta \in {\mathcal {M}}_{2+\delta }(\mathbb R^d)\) and that the function \(P(\cdot ,\cdot )\in L^q_{loc}\) for \(q = 2, 2+\delta \) and for every \(t\ge 0\). Consider a \(\delta \)-weak solution \(\mu \) of Eq. (4.13) with initial datum \(\mu _0(x)\). Then, introducing the scaling

$$\begin{aligned} \alpha = \varepsilon , \quad \eta ={1/\varepsilon }, \end{aligned}$$
(4.20)

for the binary interaction (4.12) and denoting by \(\mu ^\varepsilon (x,t)\) a solution of the scaled equation (4.13), as \(\varepsilon \rightarrow 0\) \(\mu ^\varepsilon (x,t)\) converges pointwise, up to a subsequence, to \(\mu (x,t)\), where \(\mu \) satisfies the following Fokker–Planck-type equation,

$$\begin{aligned} \partial _t \mu + \nabla \cdot \left( (\mathcal {P}[\mu ] + {\mathcal {K}}[\mu ])\mu \right) = \sigma \Delta \mu , \end{aligned}$$
(4.21)

with initial data \(\mu _0(x)=\mu (x,0)\), where \(\mathcal {P}\) represents the interaction kernel (1.5) and \(\mathcal {K}\) is the control operator

$$\begin{aligned} \mathcal {K}[\mu ](x,t) = \int _{\mathbb {R}^d}K(x,y,t)\mu (y,t)\,dy. \end{aligned}$$
(4.22)

with \(K(x,y,t)\) defined as in (4.19).

Proof

\(\bullet \) Taylor approximation We consider the weak formulation of the Boltzmann equation (4.17) and expand \(\varphi (x^{*})\) inside the operator (4.18) in a Taylor series in \(x^* - x\) up to second order, obtaining

$$\begin{aligned}&\left\langle Q_\alpha (\mu ,\mu ),\varphi \right\rangle = T^\varphi _1+T^\varphi _2+ R_1^{\varphi }, \end{aligned}$$
(4.23)

where the first and second order terms are

$$\begin{aligned}&T^\varphi _1:= \eta \mathbb {E}\Bigg [\int _{\mathbb {R}^{2d}} \nabla \varphi (x) \cdot \left( x^{*} - x\right) \mu (x)\mu (y) \, dxdy\Bigg ],\end{aligned}$$
(4.24)
$$\begin{aligned}&T^\varphi _2 := \frac{\eta }{2}\mathbb {E}\left[ \int _{\mathbb {R}^{2d}} \left( \sum ^{d}_{i,j = 1} \partial ^{(i,j)}_{x} \varphi (x) \left( x^{*} - x\right) _i\left( x^{*} - x\right) _j\right) \mu (x)\mu (y) \, dxdy\right] , \end{aligned}$$
(4.25)

and \(R_1^{\varphi }({\varepsilon })\) is the remainder of the Taylor expansion, of the form

$$\begin{aligned}&R_1^{\varphi }:= \frac{\eta }{2}\mathbb {E}\\&\quad \left[ \int _{\mathbb {R}^{2d}} \left( \sum ^{d}_{i,j = 1} \left( \partial ^{(i,j)}_{x} \varphi (x) - \partial ^{(i,j)}_{x} \varphi (\overline{x})\right) \left( x^{*} - x\right) _i\left( x^{*} - x\right) _j\right) \mu (x)\mu ({y}) \, dxdy \right] , \end{aligned}$$

with \(\overline{x} := (1-\theta ) x^* + \theta x\), for some \(\theta \in [0,1]\). We use the relation given by the scaled interaction rule (4.12), i.e.

$$\begin{aligned} x^* - x = \alpha F_\alpha (x,y) +\sqrt{2\alpha }\xi \end{aligned}$$

where for the sake of brevity we denote \(F_\alpha (x,y) := P(x,y)(y-x) +U_\alpha (x,y)\). Note that from the hypotheses it follows that \(F_\alpha \in L^q_{loc}\). Thus we obtain

$$\begin{aligned} T^\varphi _1&= \eta \mathbb {E}\Bigg [\alpha \int _{\mathbb {R}^{2d}} \nabla \varphi (x) \cdot \left( F_\alpha (x,y) +\sqrt{2/\alpha }\ \xi \right) \mu (x)\mu (y)\, dxdy\Bigg ]\\ {}&= \eta \alpha \int _{\mathbb {R}^{2d}} \nabla \varphi (x) \cdot F_\alpha (x,y)\mu (x)\mu (y)\, dxdy \end{aligned}$$

where the noise term \(\xi \) cancels out since it has zero mean. For the same reason, in the second order term \(T^\varphi _2\) all the mixed products between \(F_\alpha \) and \(\xi \) vanish; the same holds for all the cross terms \(\xi _{i} \xi _{j}\), since the \(\xi _{i}\) are assumed to be independent variables. Hence the only remaining contribution reads

$$\begin{aligned} T^\varphi _2&= \frac{\eta }{2}\mathbb {E}\left[ \int _{\mathbb {R}^{2d}}\alpha ^2\left( \sum ^{d}_{j = 1} \partial ^{(j,j)}_{x} \varphi (x)\left( F_\alpha (x,y)_j\right) ^2 \right) \right. \\&\left. \quad + \left( \sum ^{d}_{j = 1} \partial ^{(j,j)}_{x} \varphi (x)\left( 2\alpha \xi _{j}^2\right) \right) \mu (x)\mu (y) \, dxdy \right] \\&= \eta \alpha \int _{\mathbb {R}^{2d}} \sigma \Delta \varphi (x)\mu (x)\mu (y) \, dxdy\\ {}&\quad +\frac{\eta \alpha ^2}{2}\int _{\mathbb {R}^{2d}} \left( \sum ^{d}_{j = 1} \partial ^{(j,j)}_{x} \varphi (x)\left( F_\alpha (x,y)_j\right) ^2 \right) \mu (x)\mu (y) \, dxdy,\\&=: T^\varphi _{22} + R_{2}^\varphi . \end{aligned}$$

\(\bullet \) Quasi-invariant limit We now introduce the scaling (4.20), which allows us to substitute \(\eta \alpha =1\) and \(\eta \alpha ^2={\varepsilon }\) in the previous equations. The terms \(T_1^\varphi \) and \(T_{22}^\varphi \) represent the leading order and \(R^\varphi ({\varepsilon }) := R^\varphi _1+R^\varphi _2\) is a remainder, so we can recast the scaled expression (4.23) as follows

$$\begin{aligned} \int _{\mathbb R^{2d}}\left( \nabla \varphi \cdot F_{\varepsilon }(x,y)+\sigma \Delta \varphi (x)\right) \mu (x)\mu (y) \, dxdy+ R^\varphi ({\varepsilon }). \end{aligned}$$
(4.26)

Let us now consider the limit \(\varepsilon \rightarrow 0\), assuming that for every \(\varphi \in \mathcal {T}_{\delta }\)

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} R^\varphi ({\varepsilon }) = 0 \end{aligned}$$
(4.27)

holds true, we have thanks to (4.19) and (4.26) that the weak scaled Boltzmann equation (4.17) converges pointwise to the Fokker–Planck-type equation (4.21) as follows

$$\begin{aligned} \frac{d}{dt} \left\langle \mu , \varphi \right\rangle = \left\langle \mu , \nabla \varphi \cdot (\mathcal {P}\left[ \mu \right] +{\mathcal {K}}[\mu ]) + \sigma \Delta \varphi \right\rangle , \end{aligned}$$
(4.28)

where the operators \(\mathcal {P}[\mu ]\) and \(\mathcal {K}[\mu ]\) are defined in (1.5) and (4.22). Since \(\varphi \) has compact support, Eq. (4.28) can be reverted to strong form by means of integration by parts, and we eventually obtain system (4.21).

\(\bullet \) Estimates for the remainder In order to conclude the proof it is sufficient to show that the limit (4.27) holds, i.e. that \(R^\varphi ({\varepsilon })\) vanishes. From the definition of \(\overline{x}\) it follows that \(\left\| \overline{x} -x\right\| \le \left\| x^* - x \right\| \); then for every \(\varphi \in \mathcal {T}_{\delta }\) we have

$$\begin{aligned} \left\| \partial ^{(i,j)}_{x}\varphi (x) - \partial ^{(i,j)}_{x}\varphi (\overline{x})\right\|&\le C \left\| \overline{x} -x\right\| ^{\delta } \le C \left\| x^* - x \right\| ^{\delta }. \end{aligned}$$

Hence for \(R_1^\varphi \) we get

$$\begin{aligned} \left\| R_1^{\varphi }\right\|&\le \frac{C}{2\varepsilon } \mathbb {E}\left[ \int _{\mathbb R^{2d}} \left\| x^* - x \right\| ^{2+\delta } \mu (x) \mu (y) \, dxdy\right] \\&= \frac{C}{2} {\varepsilon }^{1+\delta } \mathbb {E}\left[ \int _{\mathbb R^{2d}} \left\| F_{\varepsilon }(x,y) + \sqrt{2/{\varepsilon }}\ \xi \right\| ^{2+\delta } \mu (x) \mu (y) \, dxdy \right] \end{aligned}$$

and from the inequality \(|a+b|^{2+\delta }\le 2^{2+2\delta }(|a|^{2+\delta }+|b|^{2+\delta })\), valid for any \(a, b\), we obtain

$$\begin{aligned}&\left\| R_1^{\varphi }\right\| \le 2^{1+2\delta }C\\&\quad \left( {\varepsilon }^{1+\delta } \int _{\mathbb R^{2d}} \left\| F_{\varepsilon }(x,y)\right\| ^{2+\delta } \mu (x) \mu (y) \, dxdy+ 2^{1+\delta /2}{\varepsilon }^{\delta /2}\mathbb {E}\left[ \left\| \xi \right\| ^{2+\delta }\right] \right) . \end{aligned}$$

An analogous computation can be carried out for \(R_2^\varphi \), for which we have the following inequality

$$\begin{aligned} \left\| R_2^{\varphi }\right\|&\le \frac{{\varepsilon }C}{2} \int _{\mathbb R^{2d}} \left\| F_{\varepsilon }(x,y)\right\| ^{2} \mu (x) \mu (y) \, dxdy. \end{aligned}$$

Since \(F_{\varepsilon }\in L^q_{loc}\) for \(q=2,2+\delta \) and \(\Theta \in {\mathcal {M}}_{2+\delta }(\mathbb R^d)\), we conclude that the limit (4.27) holds true as \({\varepsilon }\rightarrow 0\). \(\square \)

Remark 4.4

Note that in the case \(U_\alpha (x,y,t)=U_\alpha (x,t)\), namely if the feedback control depends only on the position x of the agents at time t, then the kernel \({\mathcal {K}}[\mu ](x,t)\) reduces to \(K(x,t)\). This observation also holds if we consider a sampling from the optimal control, i.e. \(U_\alpha (x,y,t) = f(x,t)\); thus Eq. (4.21) becomes exactly the original equation (1.2).

5 Numerical Methods

In this section we are concerned with the development of numerical methods for the mean field optimal control problem (1.2)–(1.3). First, we present direct simulation Monte Carlo methods for the constrained Boltzmann-type model (4.13), and discuss the implementation of the binary feedback controllers introduced in Sect. 4.1. Next, we describe a sweeping algorithm based on the iterative solution of the optimality system (3.1)–(3.8).

5.1 Asymptotic Constrained Binary Algorithms

One of the most common approaches to solve Boltzmann-type equations is based on Monte Carlo methods. For this, we consider the initial value problem given by Eq. (4.13), in the grazing interaction regime (4.20), with initial data \(\mu (x,t=0)=\mu _0(x)\), as follows

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{d}{dt}\mu (x,t) = \dfrac{1}{{\varepsilon }}\left[ {Q}_{\varepsilon }^{+}(\mu ,\mu )(x,t)-\mu (x,t)\right] , \\ \mu (x,0) =\mu _0(x). \end{array}\right. } \end{aligned}$$
(5.1)

Here we have made explicit the dependence of the interaction operator \(Q_{\varepsilon }(\cdot ,\cdot )\) on the frequency of interactions \(1/\varepsilon \), and decomposed it into its gain and loss parts according to (4.15). With \(Q^{+}_\varepsilon (\cdot ,\cdot )\) we denote the gain part, which accounts for the density of agents gained at position x after the binary interaction (4.12).

We tackle the Boltzmann-type equation (5.1) by means of a binary interaction algorithm [3, 69], where the basic idea is to solve the binary exchange of information described by (4.12), under the grazing interaction scaling (4.20), in order to obtain in the limit an approximate solution of the mean field equation (4.21). Note that the consistency of this procedure is given by Theorem 4.4.

Let us now consider a time interval [0, T] discretized in \(M_{tot}\) intervals of size \(\Delta t\). We denote by \(\mu ^m\) the approximation of \(\mu (x,m\Delta t)\), thus the first order forward scheme of the scaled Boltzmann-type equation (5.1) reads

$$\begin{aligned} \mu ^{m+1}=\left( 1-\frac{\Delta t}{\varepsilon }\right) \mu ^{m}+\frac{\Delta t}{\varepsilon }{{Q}_\varepsilon ^{+}(\mu ^m,\mu ^m)}, \end{aligned}$$
(5.2)

where, since \(\mu ^m\) is a probability density, \(Q_\varepsilon ^{+}(\mu ^m,\mu ^m)\) is also a probability density thanks to mass conservation. Under the restriction \(\Delta t\le \varepsilon \), \(\mu ^{m+1}\) is then a probability density, since it is a convex combination of probability densities.

From a Monte Carlo point of view, Eq. (5.2) can be interpreted as follows: an individual with position x will not interact with other individuals with probability \(1-\Delta t/\varepsilon \) and it will interact with others with probability \(\Delta t/\varepsilon \) according to the interaction law stated by \(Q_\varepsilon ^{+}(\mu ^m,\mu ^m)\). Note that, since we aim at small values of \(\varepsilon \) and we have to fulfill the condition \(\Delta t\le {\varepsilon }\), the natural choice is to take \(\Delta t=\varepsilon \). At every time step, this choice maximizes the number of interactions among the agents.

For the numerical treatment of the operator \(Q_\varepsilon ^{+}(\mu ^m,\mu ^m)\), we have to account for the fact that every interaction includes the action of the feedback control. In the case of instantaneous control this can be evaluated directly; for example, in the case of a quadratic functional, the scaled version of (4.7) reads

$$\begin{aligned} U_{\varepsilon }(x,y,t) = \frac{1}{\gamma +{\varepsilon }}\left( (\bar{x}-x)+\alpha P(x,y)(y-x)\right) . \end{aligned}$$

On the other hand, the realization of the optimal feedback controller in the finite horizon setting requires the numerical approximation of the Bellman equation (4.9). This approximation is performed offline and only once, prior to the simulation of the mean field model. For a state space of moderate dimension, such as in our binary model, several numerical schemes for the approximation of Hamilton–Jacobi–Bellman equations are available, and we refer the reader to [47, Chap. 8] for a comprehensive description of the different available techniques. Since the binary model is already introduced in discrete time, a natural choice is to solve Eq. (4.9) by means of a sequential semi-Lagrangian scheme, following the same guidelines as in the recent works [11, 48, 57]. Once the value function has been approximated, online feedback controllers can be implemented through the evaluation of the optimality condition (4.10).

We report in Algorithm 1 a stochastic procedure to solve (5.2), based on Nanbu’s method for plasma physics, [3, 16].

Algorithm 1

The function \(\textsc {Iround}(\cdot )\) denotes the integer stochastic rounding defined as

$$\begin{aligned} \textsc {Iround}(x)= {\left\{ \begin{array}{ll} \,[x]+1,&{} \zeta < x-[x],\\ \,[x],&{}\hbox {elsewhere} \end{array}\right. } \end{aligned}$$

with \(\zeta \) a uniform [0, 1] random number and \([\cdot ]\) the integer part.
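A minimal Python rendering of one such Monte Carlo step is given below. The pairing by random permutation, the Gaussian noise, and the function names are simplifying assumptions of this sketch; the paper's Algorithm 1 may differ in details:

```python
import numpy as np

def iround(x, rng):
    """Integer stochastic rounding: [x] + 1 with probability x - [x], else [x]."""
    fl = int(np.floor(x))
    return fl + 1 if rng.random() < x - fl else fl

def nanbu_step(X, eps, dt, P, U, sigma, rng):
    """One step of scheme (5.2): a fraction dt/eps of the N_s samples
    interacts pairwise through (4.12) with alpha = eps."""
    Ns = len(X)
    n_pairs = iround(Ns * dt / (2.0 * eps), rng)
    idx = rng.permutation(Ns)[: 2 * n_pairs]
    i, j = idx[:n_pairs], idx[n_pairs:]
    xi = rng.normal(0.0, np.sqrt(sigma), n_pairs)    # noise with variance sigma
    zeta = rng.normal(0.0, np.sqrt(sigma), n_pairs)
    Xn = X.copy()
    Xn[i] = X[i] + eps * (P(X[i], X[j]) * (X[j] - X[i]) + U(X[i], X[j])) + np.sqrt(2 * eps) * xi
    Xn[j] = X[j] + eps * (P(X[j], X[i]) * (X[i] - X[j]) + U(X[j], X[i])) + np.sqrt(2 * eps) * zeta
    return Xn
```

With the choice \(\Delta t=\varepsilon \), every sample is selected for exactly one binary interaction per step.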

Remark 5.1

(Efficiency) In general, computing the interactions in a multi-agent system is a procedure of quadratic cost with respect to the number of agents, since every agent needs to evaluate its influence on every other. Note that with the proposed algorithm this cost becomes linear in the number of samples, \(O(N_s)\), since only binary interactions are accounted for. A major difference compared to standard algorithms for Boltzmann equations is the way particles are sampled from \(Q_{\varepsilon }^+(\mu ^m,\mu ^m)\), which does not require the introduction of a space grid [16].

Remark 5.2

(Accuracy) The choice \(\Delta t=\varepsilon \) is optimal if \({\varepsilon }\) is of the order of \(O({N_s}^{-1/2})\). Indeed, the accuracy of the method will not increase for smaller values of \(\Delta t\), because the numerical error is dominated by the fluctuations of the Monte Carlo method. For further details we refer to [3, 69].

5.2 Numerical Approximation of the Optimality Conditions

As shown in Sect. 3, the solution of the mean field optimal control problem (3.1)–(3.2) satisfies the optimality system

$$\begin{aligned} \partial _t\mu&=- \nabla \cdot (({\mathcal {P}}[\mu ] + f)\mu ) +\sigma \Delta \mu ,\end{aligned}$$
(5.3)
$$\begin{aligned} - \partial _t\psi&=\frac{1}{2}|x-\bar{x}|^2 +\gamma \Psi (f) + \nabla \psi \cdot f + \sigma \Delta \psi \nonumber \\&\quad +\int _\Omega \left(P(x,y)\nabla \psi (x,t) - P(y,x)\nabla \psi (y,t)\right)\cdot (y-x)\mu (y,t)\,dy,\end{aligned}$$
(5.4)
$$\begin{aligned} \nabla \Psi (f)&= {-\frac{1}{\gamma }\nabla \psi },\quad \mu (x,0)=\mu _0(x),\quad \psi (x,T)=0. \end{aligned}$$
(5.5)

5.2.1 Forward Equation

In order to solve Eq. (5.3), we consider a first order forward scheme for the time evolution and the Chang–Cooper scheme for the space discretization, [31]. The formulation is based on a finite volume approximation of the densities \(\mu \) and f. Defining the operator \(\mathcal {G}[\mu ,f] := \mathcal {F}[\mu ,f]\mu +\sigma \nabla \mu \), with \(\mathcal {F}[\mu ,f] = \mathcal {P}[\mu ] + f\), we can write, in the one-dimensional domain \([-L,L]\), the (semi)-discretized equation (5.3) as

$$\begin{aligned} \frac{d}{dt} \mu _i(t) = \frac{\mathcal {G}_{i+1/2}[\mu ,f]-\mathcal {G}_{i-1/2}[\mu ,f]}{\delta x},\quad \text { with } \quad \mu _i(t) =\frac{1}{\delta x}\int ^{x_{i+1/2}}_{x_{i-1/2}} \mu (x,t) \ dx, \end{aligned}$$
(5.6)

where we have introduced the uniform grid \(x_{i}=-L+i\delta x\), \(i=0,\ldots ,N,\) with \(\delta x = 2L/N\), and denoted by \(x_{i \pm 1/2}=x_i \pm \delta x/2\). Thus, the operator \(\mathcal {G}_{i+1/2}[\mu ,f]\) in the case of constant diffusion \(\sigma \) reads

$$\begin{aligned} \mathcal {G}_{i+1/2}[\mu ,f]=\left( (1-\theta _{i+1/2})\mu _{i+1}+\theta _{i+1/2}\mu _i\right) \mathcal {F}[\mu _{i+1/2}, f_{i+1/2}] +\frac{\sigma (\mu _{i+1}-\mu _i)}{\delta x}, \end{aligned}$$
(5.7)

where the weights \(\theta _{i+1/2}\) in general depend on the solution and on the parameters of Eq. (5.3). The flux functions are thus defined as a combination of upwind and centered discretizations, such that for \(\sigma = 0\) the scheme reduces to an upwind scheme, i.e. \(\theta _{i+1/2} = 0\). The choice of the weights is the key point of the scheme (5.6): it allows one to preserve steady state solutions and the non-negativity of the numerical density. We refer to [9, 19, 31] for details on the properties and analysis of the Chang–Cooper scheme for similar Fokker–Planck models, and to [75], and references therein, for applications to control problems.
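One standard exponential-fitting definition of such weights (a sketch under the assumption \(\sigma>0\); the exact formula and sign convention used in [31] may differ) can be coded as:

```python
import numpy as np

def cc_weights(F_half, dx, sigma):
    """Chang-Cooper-type weights theta_{i+1/2} by exponential fitting:
    theta = 1/w - 1/(exp(w) - 1), with w = dx * F / sigma, and the
    limit theta -> 1/2 as w -> 0 (central differencing). The sign
    convention for w is an assumption of this sketch."""
    w = np.asarray(dx * F_half / sigma, dtype=float)
    small = np.abs(w) < 1e-8
    ws = np.where(small, 1.0, w)          # dummy value in the masked branch
    return np.where(small, 0.5, 1.0 / ws - 1.0 / np.expm1(ws))
```

For vanishing drift the weights recover the centered value \(1/2\), while for large \(|w|\) they approach the upwind limit.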

Alternatively, scheme (5.2) furnishes a consistent method to solve the forward equation (5.3), which we expect to be more efficient for problems in higher dimensions, since it relies on a stochastic evaluation of the nonlocal operator \(\mathcal {P}[\mu ]\).

5.2.2 Backward Equation

The main difficulty of the integro-differential advection-reaction-diffusion equation (5.4) resides in the efficient approximation of the integral term. We follow a finite difference approach, described in what follows. First, with the time parameter \(\delta t\) as in the forward problem, we consider the first-order temporal approximation

$$\begin{aligned} - \frac{\psi ^m-\psi ^{m+1}}{\delta t}&=\frac{1}{2}|x-\bar{x}|^2 +\gamma \Psi (f^{m+1})\\&\quad + \left( f^{m+1} {+}\int _\Omega P(x,y)\cdot (y-x)\mu ^{m+1}\,dy\right) \cdot \nabla \psi ^{m+1}\\ {}&\quad + \sigma \Delta \psi ^{m+1}{-}\int _\Omega \left(P(y,x)\nabla _y \psi ^{m+1}\right)\\&\quad \cdot (y-x)\mu ^{m+1}\,dy,\quad m=M-1,\ldots ,0, \end{aligned}$$

where \(\psi ^M=0\). At this level, f, \(\mu \), and \(\nabla \psi \) are treated as external data available at every discrete instance. In particular \(\nabla _y\) (inside the integral) is reconstructed by numerical differentiation. Then, the integral terms are evaluated with a Monte Carlo method generating \(M_s\) samples according to the distribution \(\mu \), and values of \(\nabla _y\psi \) are obtained by interpolation of the reconstructed variable. The advection term is approximated with a space-dependent upwind scheme, and diffusion is approximated with centered differences.

5.2.3 Optimality Condition and Sweeping Iteration

Once the forward–backward system has been discretized, what remains is to establish a coupling procedure in order to find the solution of the optimality system matching both initial and terminal conditions. For this, a first possibility is to consider the full space–time discretization of the forward–backward system, together with the optimality condition \(\nabla \Psi (f) = {-\frac{1}{\gamma }\nabla \psi }\), and cast it as a large-scale set of nonlinear equations, which can be solved via a Newton method. This idea has been already successfully applied in the context of mean field games in [2]. We pursue a different approach that has proven to be equally effective, developed in [26], where the authors apply a sweeping algorithm, which in our setting reads as follows.

Algorithm 2 (sweeping iteration)

Our numerical experience is consistent with what has already been reported in [26], in the sense that solutions satisfying the optimality system can be found after a few sweeps. Convergence of a similar sweeping iteration in the context of mean-field games has been recently proven in [25]. An alternative approach is to follow a gradient-type method, as in [20].
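Structurally, the sweeping iteration alternates solves of (5.3) and (5.4) with the update (5.5) until the control stabilizes. A schematic Python version, where the three solver callbacks stand in for the discretizations of Sects. 5.2.1–5.2.2 and the stopping rule is an illustrative choice:

```python
def sweep(solve_forward, solve_backward, control_update, f0,
          tol=1e-6, max_sweeps=50):
    """Sweeping iteration for the optimality system (5.3)-(5.5).
    Returns the control iterate and the number of sweeps performed."""
    f = f0
    for k in range(max_sweeps):
        mu = solve_forward(f)            # forward equation (5.3)
        psi = solve_backward(mu, f)      # backward equation (5.4)
        f_new = control_update(mu, psi)  # optimality condition (5.5)
        if max(abs(a - b) for a, b in zip(f_new, f)) < tol:
            return f_new, k + 1
        f = f_new
    return f, max_sweeps
```

The loop terminates once two consecutive control iterates agree up to the tolerance, i.e. once the optimality system is (approximately) satisfied.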

5.3 Numerical Experiments

In order to validate our previous analysis we focus on models for opinion dynamics, [53, 69, 78, 80]. In the one-dimensional case the state variable \(x\in [-L,L]\) represents the agent opinion with respect to two opposite opinions \(\{-L,+L\}\), and the control \(f(x,t)\) can be interpreted as the strategy of a policy maker, [5, 6].

Therefore we consider the following initial value problem

$$\begin{aligned} \partial _t \mu +\partial _x\left( \left( \int _{-L}^{+L}P(x,y)(y-x)\mu (y)dy +f\right) \mu \right) =\sigma \partial _x^2 \mu , \quad \mu (x,0) = \mu ^0(x) \end{aligned}$$
(5.8)

with no-flux boundary conditions, and where f denotes the control term, solution of

$$\begin{aligned} f = \arg \min _{g\in {\mathcal {U}}}\dfrac{1}{2}\int _{0}^{T}\int _{-L}^{+L}\left( |x-\bar{x}|^2 +\gamma g^2 \right) \mu (x,t)\ dx \ dt, \end{aligned}$$
(5.9)

where we consider a quadratic penalization of the control, i.e. \(\Psi (c) = |c|^2/2\).

For different interaction kernels \(P(\cdot ,\cdot )\), we will study the performance of the proposed controllers \(f=f(x,t)\), obtained through the following synthesis procedures: instantaneous control (IC), finite horizon (FH), and the sweeping algorithm (OC).

We report in Table 1 the algorithms and parameter choices, indicating for which methods they have been used to compute (5.8)–(5.9).

Table 1 Parameter choices for the various algorithms and optimization methods

5.3.1 Test 1: Sznajd Model

We consider the Sznajd model, [10, 78] for which the interaction operator \(P(\cdot ,\cdot )\) in (5.8) is defined as follows

$$\begin{aligned} P(x,y) = \beta (1-x^2), \end{aligned}$$
(5.10)

for \(\beta \) a constant. Note that in this case the interaction kernel \(P(\cdot ,\cdot )\) models the propensity of voters to change their opinions within the domain \(\Omega = [-1,1]\): for values close to the extremal opinions \(\{-1,1\}\) the influence is low, whereas for opinions close to zero the influence is high. The dynamics is such that for \(\beta >0\) concentration of the density profile appears, whereas for \(\beta <0\) separation occurs, namely concentration around \(x=1\) and \(x=-1\), see [10].

For our first test we fix \(\beta =-1\) and consider the time interval [0, T], \(T= 8\). We solve the control problem (5.8)–(5.9) with the bimodal initial datum \(\mu ^0(x) := \varrho _+(x+0.75;0.05,0.5)+\varrho _+(x-0.5;0.15,1),\) where \(\varrho _+(y;a,b) := \max \{-(y/b)^2+a,0\}\), with diffusion coefficient \(\sigma = 0.01\) and desired state \(\bar{x} = -0.5\).
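The initial datum is a sum of two truncated parabolas; encoding the formula makes its shape easy to inspect (a direct transcription of the definitions above, before any normalization, with our own function names):

```python
import numpy as np

def rho_plus(y, a, b):
    """Truncated parabola: rho_+(y; a, b) = max{-(y/b)^2 + a, 0}."""
    return np.maximum(-(y / b)**2 + a, 0.0)

def mu0(x):
    """Bimodal initial datum of Test 1: one bump centered at x = -0.75
    (height 0.05, half-width 0.5*sqrt(0.05)) and one at x = 0.5
    (height 0.15, half-width sqrt(0.15))."""
    return rho_plus(x + 0.75, 0.05, 0.5) + rho_plus(x - 0.5, 0.15, 1.0)
```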

In Fig. 1 we depict the final state of the dynamics (5.8) with kernel (5.10) at time \(T = 8\) for the uncontrolled and controlled cases. The simulations show concentration of the profiles around the reference position \(\bar{x}\) in the presence of the control, whereas in the uncontrolled case the density tends to concentrate around the boundary. The left figure refers to a control penalization \(\gamma = 0.5\), the right figure to \(\gamma = 0.05\). As expected, with a smaller control penalization the final state is driven closer to the desired reference.

Fig. 1
figure 1

Test #1: Final states at time \(T=8\) of the Sznajd model (5.10) for \(\beta = -1\) with initial data \(\mu ^0(x)\). Concentration around the desired state \(\bar{x}\) is observed in the presence of the controls: instantaneous control (IC), finite horizon approach (FH), optimal control (OC); separation is observed in the uncontrolled setting. Left figure \(\gamma = 0.5\), right figure \(\gamma = 0.05\)

Figure 2 illustrates the transient behavior of the density \(\mu (x,t)\) and the control f(x,t) in the frame \([-1,+1]\times [0,T]\), respectively for \(\gamma =0.5\) and \(\gamma = 0.05\), and we report the values of the cost functional \(J(\mu ,f)\) corresponding to the different methods. Note that the action of the instantaneous control is almost constant in time, steering the system toward \(\bar{x}\) but with the highest cost \(J(\mu ,f)\); on the other hand, the finite horizon control for the binary dynamics (FH) produces a control similar to the optimal control obtained by the sweeping algorithm (OC), with a small difference between the values of the cost functional.

Fig. 2
figure 2

Test #1: Transient behavior of the density \(\mu (x,t)\) and the control f(x,t) in \([-L,+L]\times [0,T]\), with \(L =1,~T=8\), for the Sznajd model (5.8)–(5.10). The top picture depicts the transient density of the unconstrained dynamics. Values of the cost functional are reported for each method and penalization parameter \(\gamma \)

5.3.2 Test 2: Hegselmann–Krause Model

In this second test we consider the mean field Hegselmann–Krause model [53], also known as bounded confidence model, whose interaction kernel reads

$$\begin{aligned} P(x,y)=\chi _{\{|x-y|\le \kappa \}}(y). \end{aligned}$$
(5.11)

This type of model describes the propensity of agents to interact only within a confidence range \(K=[x-\kappa ,x+\kappa ]\) of their opinion x; in the present experiment we fix \(\kappa = 0.15\). We study the evolution of the control problem (5.8)–(5.9) up to time \(T = 20\) with initial data \(\mu ^0(x) = C_0(0.5+\epsilon (1-x^2)),\) with \(\epsilon = 0.01\) and \(C_0\) such that \(\mu^0\) is a probability density. The diffusion coefficient is \(\sigma = 10^{-5}\), the penalization parameter \(\gamma = 2.5\), and the desired state \(\bar{x} = 0\).
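The clustering mechanism of the bounded-confidence kernel (5.11) can be illustrated on the uncontrolled deterministic particle system. The sketch below is our own minimal discretization (mean field scaling \(1/N\), \(\sigma=0\), equispaced initial opinions, and a horizon longer than the paper's \(T=20\) so the clusters fully freeze); it is not the paper's mean field PDE solver:

```python
import numpy as np

def hk_step(x, kappa, dt):
    """Euler step of the deterministic particle Hegselmann-Krause model:
    dx_i/dt = (1/N) sum_{|x_j - x_i| <= kappa} (x_j - x_i),
    i.e. agents are attracted only by opinions within their confidence range."""
    diff = x[None, :] - x[:, None]              # diff[i, j] = x_j - x_i
    mask = np.abs(diff) <= kappa                # bounded-confidence kernel (5.11)
    return x + dt * (diff * mask).mean(axis=1)

def count_clusters(x, gap):
    """Number of opinion clusters separated by more than `gap`."""
    xs = np.sort(x)
    return 1 + int(np.sum(np.diff(xs) > gap))

x = np.linspace(-1.0, 1.0, 50)                  # near-uniform initial opinions
for _ in range(2000):                           # horizon T = 100, dt = 0.05
    x = hk_step(x, kappa=0.15, dt=0.05)
```

Since agents farther apart than \(\kappa\) never interact, the opinions freeze into several clusters separated by more than \(\kappa\), consistent with the multi-cluster profile of Fig. 3.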

The uncontrolled evolution of this model shows the emergence of multiple clusters, as shown in the top picture of Fig. 3, due to the small value of \(\kappa \) and the small diffusion. Figure 3 depicts the transient behavior of the density \(\mu (x,t)\) and the control signal f(x,t) in the frame \(\Omega \times [0,T]\).

We observe in Fig. 3 that with the instantaneous control (IC) consensus is reached slowly, with a cost functional value of \(J_{IC}(\mu ,f)=0.8807\); the finite horizon control (FH) and the solution of the optimality conditions (OC) steer the system towards \(\bar{x}\) faster, with costs \(J_{FH}(\mu ,f)= 0.6079\) and \({J_{OC}=0.5080}\), respectively.

Fig. 3
figure 3

Test #2: Transient behavior of the density \(\mu (x,t)\) and the control f(x,t) in \([-L,+L]\times [0,T]\), with \(L =1,~T=20\), for the Hegselmann–Krause model (5.8)–(5.9). The top picture shows the emergence of opinion clustering in the unconstrained dynamics. Values of the cost functional are reported for each method, with penalization parameter \(\gamma = 2.5\)

These experiments show very clearly the hierarchy of the controls (IC) \(\rightarrow \) (FH) \(\rightarrow \) (OC). In particular, the quasi-optimality of (FH) is evident, to the extent that we can claim (FH) \(\approx \) (OC). The intuition is that (FH) is an optimal control for the binary dynamics of two particles and, through the Boltzmann collisional operator, its binary optimality is “smeared” over the entire population. However, we have as yet no quantitative method to assess this approximation. In fact, as commented in Remark 4.2, although (FH) fulfills a Hamilton–Jacobi–Bellman equation, its synthesis by means of (4.22) to control (4.21) unfortunately does not fulfill the backward equation (5.4) of the optimality conditions, not even approximately: by testing (4.22) within (5.4), there are a few useful cancellations, but, because of the lack of symmetry, certain terms remain whose magnitude is still hard to estimate. We expect that these terms are actually not so large, which would somehow justify the quasi-optimality of (FH). This issue remains an interesting open problem.

6 Concluding Remarks

In this paper, we have presented a hierarchy of control designs for mean field dynamics. At the bottom of the hierarchy, we have introduced optimal feedback controls derived for two-agent models, which are subsequently realized at the mean field level through a Boltzmann approach. At the top of the hierarchy, one finds the mean field optimal control problem and its corresponding optimality conditions. In both cases, we presented a theoretical and numerical analysis of the proposed designs, as well as computational implementations. From the numerical experiments presented in the last section, we observe that although the numerical realization of the mean field optimality system yields the best controller in terms of the cost functional value, the feedback controllers obtained for the binary system perform reasonably well and provide a much simpler control synthesis. We plan to proceed further along this direction of research, in particular in relation to the computation of feedback controllers via Dynamic Programming and Hamilton–Jacobi–Bellman equations for the binary system, as it provides a versatile framework to address different control problems.