Abstract
We investigate a model problem for optimal resource management. The problem is a stochastic control problem of mean-field type. We compare a Hamilton–Jacobi–Bellman fixed-point algorithm to a steepest descent method derived from calculus of variations. For mean-field type control problems, stochastic dynamic programming requires adaptation. The problem is reformulated as a distributed control problem by using the Fokker–Planck equation for the probability distribution of the stochastic process; then, an extended Bellman’s principle is derived by a different argument than the one used by P. L. Lions. Both algorithms are compared numerically.
1 Introduction
Stochastic control has been studied extensively over the past five decades [1–5], and yet there is a renewed interest in economics and finance due to mean-field games [6–9]. Mean-field games give rise to mean-field type stochastic control problems [10], which involve not only the Markov process of the state of the system but also some statistics of the process, such as means and variances, in the cost function or in the stochastic differential equation (SDE). For these problems, optimality conditions are derived either by stochastic calculus of variations [11] or by stochastic dynamic programming [12, 13], and are justified by classical arguments in the quadratic case [14, 15], but by less classical ones in the general case, for the fundamental reason that Bellman’s principle does not apply in its original form [12, 16].
Several authors have generalized dynamic programming using the Wasserstein distance to define derivatives with respect to measures. Others have studied the existence of a conceptual HJB equation [16, 17]. These results certainly overlap with and precede our analysis, but our point of view is different: it is pragmatic, in that we sacrifice mathematical rigor for explicit expressions, and numerical, in that we wish to compare solutions obtained by HJB to standard optimal control by calculus of variations. We have not tried to specify the regularity of the data needed for existence and differentiability of solutions. The results are stated only formally, but with the expectation that they could be justified later under appropriate assumptions, as in, e.g., [18, 19], if the behavior at infinity of the solution of the HJB equation is known, which is a major riddle.
Before proceeding further, note that a direct simulation of the problem with the stochastic differential equation approximated by Monte-Carlo is too costly and not competitive with the methods that we pursue below. Indeed, the cost function of the optimization problem involves means of stochastic quantities, and Monte-Carlo methods would require large numbers of evaluations of the SDE, embedded in forward–backward time loops. Faced with the same problem, Garnier et al. [7] and Chan et al. [20] came to the same conclusion.
In this article, pursuing a preliminary study published in [21], we apply the dynamic programming argument to the value functional as in [22], but instead of using the probability measure of the stochastic process, we use its probability density function (PDF). Hence, though less general, the mathematical argument will be simpler. Of course, this is at the cost of several regularity assumptions, such as the existence of a regular PDF at all times. However, our analysis here is strongly motivated by the numerical solutions of these control problems, and hence, assuming regularity is not a real practical limitation.
Once the problem is reformulated with the Fokker–Planck equation [1, 23], it becomes a somewhat standard exercise to find the necessary optimality conditions by a calculus of variations. So this article begins likewise in Sect. 3. Then, in Sect. 4, a similar result is obtained by using dynamic programming, and the connection with the previous approach and with stochastic dynamic programming is established. In Sect. 5, a problem introduced in [24] involving profit-optimizing oil producers is defined and studied for existence and optimality, and two algorithms are proposed together with a semi-analytical method based on a Riccati solution. The paper ends with a numerical section, which implements the three methods and compares them.
2 The Problem
Consider a stochastic differential system of d-equations
where u takes values in \({{{\mathbb {R}}}^d}\), \(\sigma \) in \({\mathbb {R}}^{d\times k}\), and \(W_t\) is a k-vector of independent Brownian motions. Assumptions under which a strong solution is known to exist, once the distribution of \(X_0\) is known to be in \(L^2\cap L^\infty \), are given in [25] (see also [26], Proposition 4, which applies when \(\sigma \sigma ^T\) is uniformly positive definite):
and with \(\tilde{\sigma }(x,t):=\sigma (x,t,u(x,t))\):
Then, the PDF of \(X_t\) satisfies
and is the unique solution of the Fokker–Planck equation,
Conversely, under (2)–(3), there is a unique solution to (4), which is the PDF of a Markov process satisfying (1).
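This correspondence between the SDE and the Fokker–Planck equation can be checked numerically in the simplest case. The following is a hedged sketch (parameter values are ours): with constant drift u and volatility \(\sigma \), the Fokker–Planck solution started from a point mass at \(x_0\) is the Gaussian of mean \(x_0+ut\) and variance \(\sigma ^2t\), and an Euler–Maruyama simulation of the SDE should reproduce these moments.

```python
import numpy as np

# Hedged sketch: with constant drift u and volatility sigma, the Fokker-Planck
# solution from the point mass at x0 is N(x0 + u*t, sigma^2 * t); Monte-Carlo
# moments of the Euler-Maruyama paths should match it.
def euler_maruyama_moments(x0=1.0, u=-0.3, sigma=0.5, T=2.0,
                           n_steps=400, n_paths=20000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    for _ in range(n_steps):
        X = X + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return X.mean(), X.var()

mean, var = euler_maruyama_moments()
# Fokker-Planck prediction: mean = x0 + u*T = 0.4, variance = sigma^2 * T = 0.5
```

With these (illustrative) parameters, the Monte-Carlo moments agree with the Fokker–Planck prediction up to sampling error, which is the equivalence stated above in its simplest instance.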
Remark 2.1
The assumption \(\sigma \sigma ^T>0\) can be replaced by assuming the conditions (2) for \(u-\frac{1}{2}\nabla \cdot (\tilde{\sigma }\tilde{\sigma }^T)\) rather than u, but it implies some regularity on the second derivatives of u.
Consider the stochastic optimization problem
subject to (1) with \(\rho _0\) given, and
where \(\tilde{h}, g, \tilde{H}, G\) are \(C^1\) functions taking values in \({\mathbb {R}}^r,{\mathbb {R}}^s,{\mathbb {R}}\) and \({\mathbb {R}}\), respectively. Assume also that \(\tilde{H}\) and G are bounded from below and
As a first approach to such problems, Andersson et al. [11] proposed to use a stochastic calculus of variations; the necessary optimality conditions form a forward–backward stochastic differential system, which is numerically very hard to solve because the backward volatility of the adjoint equation is one of the unknowns [11, 27].
A second approach is to use stochastic dynamic programming (SDP), but an important adaptation needs to be made. Usually, SDP uses the remaining cost function
The Bellman equation is derived by saying that \({u_t},t>\tau \) is a solution only if, together with (1) and \(X_\tau =x\),
However, the above is not true unless \(\tilde{h}=0,~g=0\); as in [22], one has to work with
where V is a pointwise function of \(\tau \in [0,T]\) and has functional dependence on \(\rho _\tau (\cdot )\), i.e., depends on \(\{x\mapsto \rho _\tau (x),~\forall x\in {{{\mathbb {R}}}^d}\}\).
A third approach is to work with the deterministic version of the problem. With sufficient regularity (see [28] for weaker assumptions), using the Fokker–Planck partial differential equation for \(\rho (x,t):=\rho _t(x)\), the problem is equivalent to the deterministic distributed control problem,
where \(\mu _{ij}=\frac{1}{2}\sum _{k}\sigma _{ik}\sigma _{jk}\) and \(\nabla ^2\) is the \(d\times d\) matrix operator of element \(\partial _{ij}\). The notation A : B stands for \(\sum _{i,j=1}^dA_{ij}B_{ij}\) and \(\nabla \cdot u\) stands for \(\sum _{i=1}^d\partial _i u_i\).
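A tiny numerical check of this notation, on arbitrary \(2\times 2\) values (a hedged illustration only):

```python
import numpy as np

# A:B is the Frobenius pairing sum_ij A_ij B_ij, and mu = (1/2) sigma sigma^T,
# i.e., mu_ij = (1/2) sum_k sigma_ik sigma_jk; values here are arbitrary.
A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
colon = np.einsum('ij,ij->', A, B)      # A:B = 1*5 + 2*6 + 3*7 + 4*8 = 70
sigma = np.array([[1., 0.], [1., 2.]])
mu = 0.5 * sigma @ sigma.T              # [[0.5, 0.5], [0.5, 2.5]]
```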
Remark 2.2
Note that the problem is equivalent to the stochastic control problem only if \(\rho _0\) is in \(\mathbf {P}\), the set of positive real-valued functions with measure 1. However, the deterministic control problem still makes sense even if it is not the case and \(\rho \in L^2([0,T],H^1({{{\mathbb {R}}}^d}))\) only. We will use this in the proof of Proposition 4.3.
Remark 2.3
Existence of solutions for (6) or (7) requires more assumptions on \(\tilde{H}, \tilde{h}, G\) and g to ensure that J is lower semi-continuous with respect to u in a norm, such as that of \(L^2\big ([0,T],H^1({{{\mathbb {R}}}^d})\big )\), which implies (2) and for which \({\mathscr {U}}_d\) is closed. However, since there is no term containing a gradient of u in J, a Tikhonov regularization seems to be needed for these problems to be well posed.
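The role of such a regularization can be illustrated on a finite-dimensional analogue; a hedged toy example, not the control problem itself:

```python
import numpy as np

# Hedged illustration of the Tikhonov idea: adding a small quadratic penalty
# eps*|u|^2 turns a rank-deficient least-squares problem (non-unique minimizer)
# into a well-posed one.  The data below are arbitrary.
A = np.array([[1., 1.], [1., 1.]])       # rank 1: plain least squares is not unique
y = np.array([2., 2.])
eps = 1e-3
# normal equations of min |A u - y|^2 + eps |u|^2:  (A^T A + eps I) u = A^T y
u = np.linalg.solve(A.T @ A + eps * np.eye(2), A.T @ y)
```

The regularized minimizer is unique (here the symmetric one), while it still nearly fits the data for small \(\varepsilon \); the same mechanism is what a gradient penalty on u would provide in (6)–(7).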
3 Calculus of Variations
Deriving optimality conditions by calculus of variations is fairly standard, as we shall show in this section.
Proposition 3.1
A control u is optimal for (7) only if, for all \(t\in ]0,T[\) and all \(v\in {\mathscr {V}}_d\),
where \(\tilde{H}_u', \tilde{H}_\rho ', \tilde{H}_\chi ',\tilde{h}_u', \tilde{h}_\rho '\) and \(G_\xi '\) are partial derivatives in the classical sense.
Proof
Recall that (7) is in \(Q=]-\infty ,+\infty [^d\times ]0,T]\). The Fokker–Planck equation (7) being set in \(\mathbf {P}\), it contains the “boundary condition” \(\forall t\in ]0,T], \lim _{|x|\rightarrow \infty }\rho (x,t)=0\).
Consider an admissible variation \(\lambda \delta u\), i.e., \(u+\lambda \delta u\in {\mathscr {U}}_d\) for all \(\lambda \in [0,1]\). Such a variation induces a variation \(\lambda \delta \rho \) of \(\rho \) given by
with \(\delta \rho |_{t=0}=0\) and where \(\mu '_u\) is evaluated at \(x,t,u+\theta \delta u\) for some \(\theta \in [0,\lambda ]\). We assume that there is enough regularity for the solution of the Fokker–Planck equation in (7) to depend continuously on the data \(u,\mu \). Then (10) at \(\lambda =0\) defines \(\delta \rho \). Also, up to higher-order terms,
Define an adjoint state \(\rho ^*\) by (9). Then (9) multiplied by \(\delta \rho \) and integrated on Q gives
The rest follows by saying that u is a minimum. \(\square \)
Remark 3.1
The “boundary conditions at infinity” on \(\rho ^*\) in (9) are problematic. The PDE is to be understood in weak form in the dual of \(\mathbf {P}\), i.e., for all \(\nu \in \mathbf {P}\) :
Then, the decay of \(\nu \) at infinity will balance the potential growth of \(\rho ^*\), and the integrations by parts in the proof above will have no terms at infinity in \(\Vert x\Vert \). However, the uniqueness of \(\rho ^*\) is not guaranteed. We could have avoided the problem by working in \([-L,L]^d\times [0,T]\) rather than \({{{\mathbb {R}}}^d}\times [0,T]\) and imposing \(\rho ^*=0\) on the boundary of \([-L,L]^d\), but then the solution depends strongly on L; this problem will be rediscussed in the numerical section (Sect. 5).
4 Dynamic Programming
For notational clarity consider the more general case, where H, G are functionals of \(\rho _t(\cdot )\). Let \(\rho \) be solution of (7) initialized at \(\tau \) by \(\rho _\tau (\cdot )\) and let J and V be defined as :
By the Markovian property, \(\rho _t(x):=\rho (x,t)\), for \(t>\tau \), is also the PDF of \(X_t\) given by (1) knowing its PDF \(\rho _\tau \) at \(\tau \).
Remark 4.1
In this section, H is a pointwise function of \(u(x,t) \in {\mathbb {R}}^d\), but the theory can be extended to the case where H is a functional of \(u(\cdot ,\cdot ) : {\mathbb {R}}^d \times {\mathbb {R}}\rightarrow ~{\mathbb {R}}^d\).
Assuming that J is bounded from below, V is finite and we prove the following version of Bellman’s principle of optimality :
Proposition 4.1
If the problem is regular, then for any \(\tau \in [0,T]\) and any \(\rho _\tau \in \mathbf {P}\), we have :
Proof
Denote the infimum of the right-hand side by \(\overline{V}(\tau ; \rho _\tau )\). For any \(\varepsilon >0\), there exists \(u \in {\mathscr {U}}_d\) such that, if \(\rho _t\) is the solution of (7) with control u :
Conversely, given \(u \in {\mathscr {U}}_d\) and \(\varepsilon >0\), there exists a control \(\tilde{u} \in {\mathscr {U}}_d\), which coincides with u on \({{{\mathbb {R}}}^d}\times [\tau , \tau +\delta \tau ]\), such that:
where \(\tilde{\rho }_t\) is the solution of (7) at t with control \(\tilde{u}\) starting with \(\rho _\tau \) at time \(\tau \). Hence,
To conclude, let \(\varepsilon \rightarrow 0\) and take the infimum over \(u \in {\mathscr {U}}_d\). \(\square \)
From now on, we assume that H and V are Fréchet differentiable with respect to \(\rho \).
Remark 4.2
The correct mathematical tool for this differentiability is the Wasserstein distance and the differentiability with respect to the probability measure rather than to its density (see, e.g., [13, 22]). Our approach is more pragmatic.
We denote the Fréchet derivatives by \(H_\rho '(x,\tau ;\rho )\) and \(V_\rho '(\tau ;\rho )\). Thus, \(H_\rho '(x,\tau ;\rho )\) (and similarly for V) denotes the linear map \(\mathbf {L}^2\rightarrow {\mathbb {R}}\) such that:
where \(\mathbf {L}^2:=L^2({{{\mathbb {R}}}^d})\). Moreover, we denote with a prime the Riesz representative of the Fréchet derivative with respect to \(\rho \). For instance, \(V' : [0,T] \times \mathbf {L}^2\rightarrow \mathbf {L}^2\) is defined by :
Proposition 4.2
(HJB minimum principle). Assuming that \(V'\) is smooth enough, the following holds :
Note: As usual, \(\nabla \) is with respect to x.
Proof
A first-order approximation of the time derivative in the Fokker–Planck equation gives
As V is assumed to be smooth, we have :
Using (15) and the mean value theorem for the time integral, Bellman’s principle yields, up to \(o(\delta \tau )\),
The terms \(V(\tau ; \rho _\tau )\) cancel; dividing by \(\delta \tau \), combining with (14), and letting \(\delta \tau \rightarrow 0\), (16) gives
To finalize the proof, we need the following proposition to relate V to \(V_\rho '\). \(\square \)
Proposition 4.3
Given \(\tau \in [0,T]\) and an initial \(\rho _\tau \in \mathbf {P}\), let \(\hat{u} \in {\mathscr {U}}_d\) and \(\hat{\rho }\) denote an optimal control and the corresponding solution of (7), then:
Proof
Note that the Fokker–Planck equation implies the existence of a semigroup operator \(\mathbf{G}\) such that, for all \(\tau \le t, \rho _t=\mathbf{G}(t-\tau )*\rho _\tau \). Let \((\hat{u}_t)_{t\in [0,T]}\) be the optimal control and \((\hat{\rho }_t)_{t\in [0,T]}\) the corresponding solution of (7) and (12). Then :
By the optimality of \(\hat{u}\) and \(\hat{\rho }\), this can be Fréchet-differentiated with respect to \(\rho \) by computing, for a given \(\nu \in \mathbf {L}^2\), \(\lim _{\lambda \rightarrow 0}\frac{1}{\lambda }\big [V\left( \tau ; \hat{\rho }_\tau +\lambda \nu \right) - V\left( \tau ; \hat{\rho }_\tau \right) \big ]\). The result is:
Taking \(\nu = \hat{\rho }_\tau \) leads to (18).
One may object, however, that such a choice for \(\nu \) is not admissible because, being a variation of \(\rho _\tau \), it must have zero total mass; but we discussed this in Remark 2.2. \(\square \)
End of proof of Proposition 4.2. Differentiating (18) with respect to \(\tau \) leads to
where \(\hat{u}_\tau \) is the optimal control at time \(\tau \). Now, let us use (17), rewritten as
Integrating by parts the last two terms leads to (13). \(\square \)
Remark 4.3
Note that (18) and (12) imply :
Remark 4.4
By taking \(\rho _\tau =\delta (x-x_0)\), the Dirac mass at \(x_0\), the usual HJB principle is found if \(h\equiv g \equiv 0\).
Proposition 4.4
(Hamilton–Jacobi–Bellman equation) Denote by \(\hat{u}\) an optimal control. When \({\mathscr {V}}_d={{{\mathbb {R}}}^d}\), (13) in Proposition 4.2 gives
where the second equation is in fact the first-order optimality condition in (13).
Remark 4.5
When the Hamiltonian depends on the distribution \(\rho _t\) only through the local value \(\rho _t(x)\) and the average of a fixed function, we can make the link with the calculus of variations explicit (see Sect. 3). More precisely, let us assume that \(H~=~\tilde{H}(x,t,u,\rho _t(x),\chi (t))\) with \(\displaystyle \chi (t)=\int _{{{\mathbb {R}}}^d}h(x,t,u(x,t),\rho _t(x))\rho _t(x){\mathrm {d}}x\); recall that in this case, \(\partial _\rho H\) and \(\partial _\rho h\) denote derivatives with respect to a real parameter. Then, for any \(\nu \in \mathbf {L}^2\) :
In particular, for \(\nu = \rho _\tau \) we have :
Then, for the optimal \(\hat{u}\) and \(\hat{\rho }\), (21) yields
The link with Sect. 3 is established : (9) and (25) coincide with \(V'=\rho ^*\).
Remark 4.6
Note that the adjoint equation, (25), is set in \({\mathbb {R}}^d\times [0,T]\) with a right-hand side which is unbounded as \(x \rightarrow \pm \infty \). Existence of solutions is doable in the finite case \(\varOmega \times [0,T]\) with \(\varOmega \) a bounded open set and \(V|_{\partial \varOmega } = 0\), but is a riddle otherwise. It is also a source of numerical difficulties because, as \({\mathbb {R}}^d\) is approximated by \(]-L,L[^d\), numerical boundary conditions compatible with the (unknown) behavior at infinity of \(V'\) need to be provided.
5 An Academic Example: Production of an Exhaustible Resource
Following [24], we consider a continuum of producers exploiting an oil field. Each producer’s goal is to maximize his profit, knowing the price of oil; however, this price is influenced by the quantity of oil available on the market, which is the sum of all that the producers have decided to extract at a given time. Hence, although no single producer affects the price of oil, all producers solve the same optimization problem, so the global problem must in the end take into account the market price as a function of oil availability. For a better understanding of the relation between the individual game and the global game, the reader is referred to [10].
5.1 Notations
Let \(X_0\) be the initial oil reserve and \(X_t\) be the quantity of oil left in the field at time t, as seen by a producer. It is modeled by
where \(a_t{\mathrm {d}}t\) is the quantity extracted by the producer in the time interval \([t,t+{\mathrm {d}}t]\) (so \(a_t\) is the extraction rate), and W is a standard real-valued Brownian motion reflecting the producer’s uncertainty about the remaining reserve; \(\sigma >0\) is a volatility parameter, assumed constant. We suppose that \(a_t := a(X_t,t)\) is a deterministic function of t and \(X_t\), meaning that the producers apply a feedback law to control their production.
We denote by C the cost of oil extraction, which is a function of a and assumed to be \(C(a):=\alpha a+\beta a^2\), for some positive constants \(\alpha \) and \(\beta \). The price of oil is assumed to be \(p_t := \kappa {\mathrm {e}}^{-b t}(\mathbf{E}(a_t))^{-c}\), with positive \(\kappa , b\) and c. This encodes the macroeconomic assumption that \(p_t\) is a decreasing function of the mean production, because scarcity of oil increases its price and conversely. It also says that in the future oil will be cheaper, because it will be slowly replaced by renewable energy.
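These model primitives are simple to encode directly; a hedged sketch in which the numerical constants are placeholders, not the paper’s calibration:

```python
import math

def extraction_cost(a, alpha=1.0, beta=1.0):
    """C(a) = alpha*a + beta*a^2, the quadratic extraction cost."""
    return alpha * a + beta * a * a

def oil_price(mean_a, t, kappa=1.0, b=0.1, c=0.5):
    """p_t = kappa * exp(-b*t) * (E[a_t])^{-c}: decreasing in the mean
    production (scarcity raises the price) and in time (renewables slowly
    make oil cheaper)."""
    return kappa * math.exp(-b * t) * mean_a ** (-c)
```

With these placeholder constants, for instance, `extraction_cost(2.0)` is 6.0 and `oil_price(1.0, 0.0)` is 1.0; raising the mean production lowers the price, as stated above.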
Note that by construction \(X_t\) takes only positive values and ought to be bounded by, say L, the maximum estimate of the reservoir content. However, nothing in the model enforces these constraints.
5.2 Model
Each producer optimizes his integrated profit up to time T, discounted at the interest rate r; however, he also wishes to drain the oil field, i.e., achieve \(X_T=0\). Thus, his goal is to maximize over \(a(\cdot ,\cdot )\ge 0\) the functional :
\(\gamma \) and \(\eta \) are penalization parameters.
Replacing p and C by their expressions gives
Since J involves the mean of a function of \(\mathbf{E}[a_t]\), this is a mean-field type stochastic control problem.
5.3 Remarks on the Existence of Solutions
Denoting \(\overline{a}_t := \mathbf{E}[a_t], J\) is:
Since \(\mathbf{E}[a_t^2]\ge \overline{a}_t^2, J\) is bounded from above,
Assume that \(c < 1\). Then, the maximum of the right-hand side is attained when a is such that \(\kappa (1-c) {\mathrm {e}}^{-b t}(\overline{a}_t)^{-c} =\alpha +2\beta \overline{a}_t,~\forall t\). Hence, the maximum value
provides an upper bound for J, as long as \(\overline{a}_t\) is bounded above on [0, T]. Furthermore, when the problem is converted into a deterministic optimal control problem via the Fokker–Planck equation, it is seen that the functional is upper semi-continuous, so a maximum exists.
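When \(c<1\), the stationarity condition above is, at each t, a scalar equation for \(\overline{a}_t\) with a unique root, which can be found by bisection. A hedged sketch (we read the leading factor as \(\kappa \), consistent with differentiating the mean revenue \(\kappa {\mathrm {e}}^{-bt}\overline{a}^{1-c}\); the constants are placeholders):

```python
import math

# Solve kappa*(1-c)*exp(-b*t)*abar^(-c) = alpha + 2*beta*abar for abar by
# bisection: the left side decreases from +infinity to 0 on (0, infinity)
# while the right side increases, so the root is unique.
def abar_star(t, kappa=1.0, b=0.1, c=0.5, alpha=1.0, beta=1.0):
    assert c < 1.0
    f = lambda abar: (kappa * (1 - c) * math.exp(-b * t) * abar ** (-c)
                      - alpha - 2 * beta * abar)
    lo, hi = 1e-9, 10.0          # f(lo) > 0 > f(hi), f strictly decreasing
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating this root over [0, T] gives the \(\overline{a}_t\) entering the upper bound; it is bounded above, as required.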
A counterexample: Assume \(c > 1\). For simplicity, suppose that \(a(t) = |\tau -t|\) for some \(\tau \in ]0,T[\). Then, \(\int _0^T a_t^{1-c}{\mathrm {d}}t = +\infty \); hence, the problem is not well posed, an obvious consequence of the fact that the model makes the price infinite too fast if nobody extracts oil.
Linear feedback: If we search for a in the class of linear feedbacks \(a(x,t)= w(t)x\) where w is a deterministic function of time only, then (26) has an analytical solution
and the first and second moments of \(a_t\) are
Then, for \(\eta =2\), the problem reduces to maximizing over \(\tilde{w}_t\ge 0\)
5.4 Dynamic Programming in Absence of Constraint
To connect to Sect. 2 let us work with \(u=-a\), the reserve depletion rate. For the time being, we shall ignore the constraints \(L\ge X_t\ge 0\) and \(u\le 0\); so \({\mathscr {V}}_d= {\mathbb {R}}\). Moreover, we shall work with \(\eta =2\) and comment on \(\eta >2\) at the end.
Recall that \(\rho (\cdot ,t)\), the PDF of \(X_t\), is given by the Fokker–Planck equation :
with initial condition : \(\rho _{|t=0} = \rho _0\) given. Now \(\overline{u}_t := \int _{\mathbb {R}}u_t\rho _t{\mathrm {d}}x=\mathbf{E}[-a_t]\) and
The goal is now to minimize \(\tilde{J}\) with respect to u. Define also
Application of the Results of Sect. 2. In this example, we have : \({\mathscr {V}}_d= {\mathbb {R}}\) and
By Proposition 4.2 and Remark 4.5, we have \(V'_{|T}=\gamma |x|^\eta \), and \(V'\) satisfies
because \(\partial _\rho H = \partial _\rho h = 0\) and \(\partial _\chi H = c \kappa {\mathrm {e}}^{-bt}\chi ^{-c-1} u\). Moreover, by (22),
giving:
Now, using (35) to eliminate \(\partial _xV'\) in (34) leads to
Finally, using (36) in (37) and the definition of \(({-\overline{u}})_t\) yields :
Note that this equation for \(V'\) depends only on \(\overline{u}\) and not on u. Nevertheless, (36)–(38) is a rather complex partial integro-differential system.
A Fixed-Point Algorithm. We can now sketch a numerical method to solve the problem:
Although it seems to work numerically in many situations, as we shall see below, nothing is known about the convergence of this fixed-point type algorithm; three points need to be clarified:
1. Equation (30) is nonlinear, and existence of a solution is unclear.
2. A relevant stopping criterion for Algorithm 1 is yet to be found.
3. Even if the Fokker–Planck equation (30) is set on \({\mathbb {R}}^+\times ]0,T[\) instead of Q, as discussed below in the numerical section (Sect. 5), there are difficulties. Because the second-order term vanishes at \(x=0\), a weak formulation ought to be posed in a weighted Sobolev space: find \(\rho \) with
$$\begin{aligned}&\rho \in \mathbf{H}=\{\nu \in L^2({\mathbb {R}}^+)~:~x\partial _x\nu \in L^2({\mathbb {R}}^+)\}\hbox { such that }\forall \nu \in \mathbf{H}\nonumber \\&\int _{{\mathbb {R}}^+}\Big [\nu \partial _t\rho + (x\sigma ^2-u)\rho \partial _x\nu +{x^2\sigma ^2}\partial _x\rho \partial _x\nu \Big ]{\mathrm {d}}x=0, \hbox { for almost all }t. \end{aligned}$$(39)
Theorem 2.2 in [29] asserts existence and uniqueness of \(\rho \), provided that there exists \(u_M\) such that \(u(x,t)<x u_M,~\forall t\). However, this oil resource model does not impose \(u(0,t)=0\), and consequently, there is a singularity at that point.
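The structure of the fixed-point method (a backward sweep for the value function, alternating with a forward Fokker–Planck solve, the control update being relaxed by a factor \(\omega \)) can be sketched on a plain linear-quadratic problem: minimize \(\mathbf{E}\int _0^T(u^2/2+X^2/2)\,{\mathrm {d}}t\) for \({\mathrm {d}}X=u\,{\mathrm {d}}t+\sigma \,{\mathrm {d}}W\). There is no mean-field term here, so the backward sweep does not depend on \(\rho \) and the fixed point converges trivially; in the paper’s problem, (38) depends on \(\overline{u}\), which is what makes the iteration nontrivial. All discretization choices below are ours:

```python
import numpy as np

def fixed_point_control(T=1.0, L=4.0, nx=81, nt=200, sigma=0.5, omega=0.5, iters=30):
    """Fixed-point loop: backward value-function sweep, relaxed control update,
    then a forward Fokker-Planck solve, on the LQ toy problem described above."""
    x = np.linspace(-L, L, nx)
    dx, dt = x[1] - x[0], T / nt
    u = np.zeros((nt + 1, nx))                       # control iterate u(t_n, x_i)
    for _ in range(iters):
        # backward sweep: -V_t = min_u [u V_x + u^2/2] + x^2/2 + (sigma^2/2) V_xx
        V = np.zeros((nt + 1, nx))                   # zero terminal cost
        for n in range(nt, 0, -1):
            Vx = np.gradient(V[n], dx)
            Vxx = np.zeros(nx)
            Vxx[1:-1] = (V[n, 2:] - 2 * V[n, 1:-1] + V[n, :-2]) / dx ** 2
            V[n - 1] = V[n] + dt * (-0.5 * Vx ** 2 + 0.5 * x ** 2
                                    + 0.5 * sigma ** 2 * Vxx)
        unew = -np.gradient(V, dx, axis=1)           # pointwise minimizer u = -V_x
        u = (1 - omega) * u + omega * unew           # relaxation by omega
    # forward Fokker-Planck: rho_t = -(u rho)_x + (sigma^2/2) rho_xx
    rho = np.exp(-(x - 1.0) ** 2)
    rho /= rho.sum() * dx                            # normalized initial PDF
    for n in range(nt):
        flux = u[n] * rho
        div = np.zeros(nx)
        div[1:-1] = (flux[2:] - flux[:-2]) / (2 * dx)
        lap = np.zeros(nx)
        lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / dx ** 2
        rho = rho + dt * (-div + 0.5 * sigma ** 2 * lap)
    return x, u, rho
```

On this decoupled toy problem the loop recovers the classical LQ feedback \(u\approx -\partial _xV\); only the alternation and the relaxation are meant to carry over to the mean-field case, where the backward equation involves \(\overline{u}\).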
5.5 Calculus of Variations on the Deterministic Control Problem
To find the optimality conditions for (31), let us introduce an adjoint \({\rho ^*}\) satisfying
in \({\mathbb {R}}\times [0,T[\) and \({\rho ^*}_{|T}=\gamma |x|^\eta \). Then,
In other words,
Algorithm. We apply the steepest descent method with varying step size:
For a convergence analysis, the situation here is somewhat better: both the state and the adjoint equations (30), (40) are linear, and the only problem is the asymptotic behavior of u. Note also that one could use a conjugate gradient algorithm at little additional computational cost.
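The outer loop of such a descent method with varying step size can be sketched generically; a hedged illustration in which J and gradJ stand for the cost evaluated through the state solve and its adjoint-computed gradient (here replaced by a toy quadratic):

```python
import numpy as np

# Steepest descent with a backtracking (varying) step: shrink the step until
# the cost decreases, then tentatively re-enlarge it for the next iteration.
def steepest_descent(J, gradJ, u0, step0=1.0, shrink=0.5, tol=1e-8, maxit=200):
    u, step = u0.astype(float), step0
    for _ in range(maxit):
        g = gradJ(u)
        if np.linalg.norm(g) < tol:
            break
        while J(u - step * g) >= J(u) and step > 1e-14:
            step *= shrink                  # vary the step until J decreases
        u = u - step * g
        step *= 2.0                         # try a larger step next time
    return u

# toy usage: J(u) = 0.5*|u - 1|^2, whose minimizer is u = (1, ..., 1)
u = steepest_descent(lambda v: 0.5 * np.sum((v - 1.0) ** 2),
                     lambda v: v - 1.0, np.zeros(3))
```

In the paper’s setting, each evaluation of J requires a Fokker–Planck solve and each gradient an adjoint solve, so the backtracking loop is the expensive part.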
After discretization by a variational method, such as finite elements, convergence to a local minimum could probably be established by the techniques of control theory (see, e.g., [30]), because the problem is finite dimensional and the gradient of the cost function is exact [29]. However, convergence of the solution of the discrete problem to the solution of the continuous problem is open and even more difficult than existence.
5.6 The Riccati Equation when \(\eta =2\)
Even though the problem is not linear quadratic, when \(\eta =2\) we can still look for \(V'\) solution of (38) in the form \(V'(x,t)=P_tx^2 + z_tx+s_t\), where \(P_t,z_t,s_t\) are functions of time only.
Identification of all terms proportional to \(x^2\) in (37) gives,
For clarity, let \(Q_t ={\mathrm {e}}^{rt}P_t\) and \(\mu =\sigma ^2-r\). Then, the above is:
As long as \({4\beta e^{rt}\mu -Q_t}>0\), it leads to
Then, u is found by (36). In particular, \(\partial _x u = - \frac{1}{8 \beta } \partial _{xx} V' = - \frac{1}{4\beta } P_t\). However, the Fokker–Planck equation must be solved numerically to compute \(\overline{u}\).
Note that \(\gamma <4\beta \mu \) implies \({4\beta e^{rt}\mu -Q_t}>0\) and also \(P_t>0\).
Remark 5.1
By identification of the terms of order 1 and 0 in x, equations for z and s are found:
Remark 5.2
Note that \(u_t=2 x P_t + z_t\) is not a linear feedback function as in (28) above.
Remark 5.3
This explicit feedback solution is smooth and adapted to the stochastic process (26), so it should be a solution of (27) if it exists (recall that (27) is not convex). Furthermore, it has a behavior at infinity which is compatible with (2).
5.7 Numerical Implementation
To implement Algorithms 1 & 2, we need to localize the PDE. As \(x<0\) makes no sense for this application, we shall work on \(Q_L=[0,L]\times [0,T]\) instead of \({\mathbb {R}}\times [0,T]\); a stopping time for the event \(X_t=0\) would be better, but too costly. At \(x=L\), we set \(\rho (L,t)=0,\forall t\), which makes sense when L is large.
Assigning boundary conditions to (38) and (40) is problematic. Our numerical tests show that the computations depend strongly on L when this is not done correctly. When \(\eta =2\), we know that \(V'\) and \(\rho ^*\) have asymptotically the same behavior as \(P_t x^2\), giving \(\frac{1}{2}\sigma ^2x^2\partial _x V'=\sigma ^2 x^3 P_t =\sigma ^2 x V'\), a relation which can be used as a boundary condition in the weak form of the equation (and similarly for \(\rho ^*\)): find \(V'\in H^1(Q_L)\) with \(V'_{T}\) given and
for all \(\nu \in H^1(Q_L)\) with \(\nu _{T}=0\).
To solve this nonlinear PDE, we use the fact that it is embedded into an iterative loop in Algorithm 1 and semi-linearize it by writing the square term in the last integral as a product in which one factor is evaluated at the previous iteration.
To discretize it, we used a space–time finite element method of degree 1 over triangles covering \(Q_L\). Admittedly, it is an unusual method. However, it is somewhat similar to a central difference method, and it is feasible because the problem is one dimensional in space and because it allows exact conservativity and exact duality with respect to time and space in the integrations by parts. It also automatically handles the storage of \(\rho ,u,V,\overline{u}\) at all times and solves the backward (and forward) equations at all time steps through a single linear system. The linear systems are solved with the library MUMPS as implemented in FreeFem++ [31].
5.8 Results with Algorithm 1
We used 50 points to discretize \((0,L), L=10\), and 50 time steps for \([0,T], T=5\). The following values were used: \(\alpha =1, \beta =1, \gamma =0.5, \kappa =1, b=0.1, r=0.05, \sigma =0.5\) and \(c=0.5\). The initial condition \(\rho _0\) is a Gaussian curve centered at \(x=5\) with volatility 1. We initialized u with \(u_0=-\alpha /(2\beta )\). A local minimum \(u_e\) is known from the Riccati equation; the error \(\Vert u-u_e\Vert \) is used as a stopping criterion in Algorithm 1. We chose \(\omega =0.5\).
Figure 1 shows the optimal control as a function of (x, t). For each t, the control is linear in x, as predicted by the Riccati equation; the quadratic part of the Riccati solution of Sect. 5.6 is also plotted, and a small difference is seen on the plot (two surfaces close to each other are displayed). Figure 2 shows the evolution in time of the PDF \(\rho \) for all \(x>0\) of the resource distribution \(X_t\). At time 0, it is a Gaussian distribution centered at \(x=5\); at time T, the distribution is concentrated around \(x=0.5\), so most producers have pumped 90% of the oil available to them (Table 1).
All above is obtained with \(L=10\), but there is almost no difference with \(L=40\).
Figures 3, 4 present the evolution of production (\(-\overline{u}_t\)) and price \(p_t = \kappa {\mathrm {e}}^{-b t}(-\overline{u}_t)^{-c}\).
5.9 Results with Algorithm 2
The performance of the descent method was disappointing. It generated many different solutions depending on the initial value of u. If \(u_0=u_e\), the algorithm decreases the cost function by introducing small oscillations, a strategy which is clearly mesh dependent.
If \(u_0=-0.5\), then the solution of Fig. 5 is found after 10 iterations of steepest descent. The convergence history is given in Table 2. The results are shown in Fig. 6. Note the oscillations of \(\rho \) near \(t=T\).
5.10 The Case \(\eta \ne 2\)
The following parameters are now changed: \(\eta =3\), and \(u_0\) is initialized with the Riccati solution. Algorithm 1 converges in a few iterations to a solution, but \(\omega =0.05\) is required for convergence. Algorithm 2 gives a different solution. Both adjoint states are shown in Fig. 7.
When \(\eta =4\), Algorithm 1 diverges, while Algorithm 2 converges to the solution shown in Fig. 7.
5.11 Linear Feedback Solution
Using automatic differentiation of computer programs by operator overloading in C++ and initializing a steepest descent method with the linear part of the Riccati solution of Sect. 5.6 and the same parameters as above, we obtained the w(t) displayed in Fig. 8, very close to the Riccati solution. To understand why the Riccati solution may not be the best solution, we plotted \(\lambda \rightarrow J^d(\lambda ):=J(w^d_t+\lambda h_t),~\lambda \in ]-0.5,+0.5[\), where \(h_t\) is an approximate \(w_t- \nabla J(w^d_t)\). Figure 8 shows that there are three local minima and two local maxima, and \(w^d_t\) is only a local minimum.
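The operator-overloading idea behind this automatic differentiation is easy to illustrate; a hedged Python sketch of forward-mode dual numbers (the paper’s C++ implementation is not reproduced here):

```python
# Forward-mode automatic differentiation by operator overloading: a Dual
# carries a value and a derivative, and the arithmetic operators propagate
# both through the sum and product rules.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

# d/dw [w*w + 3*w] at w = 2 is 2*2 + 3 = 7; seed the derivative slot with 1
w = Dual(2.0, 1.0)
f = w * w + 3 * w
```

Running a program on Dual inputs thus yields the gradient as a by-product of the evaluation, which is what makes the descent over the feedback coefficients w(t) cheap to set up.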
6 Conclusions
Stochastic mean-field type control is numerically hard; even a simple academic toy problem raises difficulties. The first difficulty this paper had to deal with is the extension of the HJB dynamic programming approach. The second is related to the well-posedness of the HJB or adjoint equation, because it is set in an infinite domain in space. The third is the lack of a proof of convergence for the two algorithms suggested here, an HJB-based fixed point and a steepest descent based on calculus of variations. When it converges, the fixed-point method is preferred, but there is no known way to guarantee convergence even to a local minimum; as for the steepest descent, we found it somewhat hard to use, mainly because it generates irregular oscillating solutions; some bounds on the derivatives of u need to be added in penalized form to the criterion. Numerically, both algorithms are cursed by the asymptotic behavior of the adjoint state at infinity. So, when possible, the Riccati semi-analytical feedback solution is the best. Finally, but this applies only to this toy problem, the pure feedback solution is nearly optimal, easy to compute and stable.
Note also that this semi-analytical solution is a validation test, since it has been recovered by both algorithms.
References
Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions. Stochastic Modelling and Applied Probability series vol 25. Springer, Berlin (2006)
Kushner, H.J.: Stochastic Stability and Control. Academic Press, New York (1967)
Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions. Springer, Berlin (2005)
Touzi, N.: Optimal Stochastic Control, Stochastic Target Problems and Backward SDE. Field Inst. Monographs 29. Springer, Berlin (2013)
Yong, J., Zhou, X.Y.: Stochastic Control. Applications of Math. Series vol 43. Springer, Berlin (1999)
Carmona, R., Fouque, J.-P., Sun, L.-H.: Mean-field games and systemic risk. Commun. Math. Sci. 13(4), 911–933 (2015)
Garnier, J., Papanicolaou, G., Yang, T.-W.: Large deviations for a mean-field model of systemic risk. SIAM J. Financ. Math. 4(1), 151–184 (2013)
Lasry, J.M., Lions, P.L.: Mean-field games. Jpn. J. Math. 2, 229–260 (2007)
Shen, M., Turinici, G.: Liquidity generated by heterogeneous beliefs and costly estimations. Netw. Heterog. Media 7(2), 349–361 (2012)
Carmona, R., Delarue, F., Lachapelle, A.: Control of McKean–Vlasov dynamics versus mean field games. Math. Financ. Econ. 7(2), 131–166 (2013)
Andersson, D., Djehiche, B.: A maximum principle for SDEs of mean-field type. Dyn. Games Appl. 3, 537–552 (2013)
Bensoussan, A., Frehse, J., Yam, S.C.P.: The Master equation in mean field theory. J. Math. Pures Appl. 103(6), 1441–1474 (2015)
Carmona, R., Delarue, F.: The master equation for large population equilibriums. In: Crisan, D., Hambly, B., Zariphopoulou, T. (eds.) Stochastic Analysis and Applications 2014. Springer, Berlin (2014)
Bensoussan, A., Frehse, J.: Control and Nash games with mean-field effect. Chin. Ann. Math. Ser. B 34B(2), 161–192 (2013)
Bensoussan, A., Frehse, J., Yam, S.C.P.: Mean-Field Games and Mean-Field Type Control. Springer Briefs in Math. Springer, Berlin (2014)
Kolokoltsov, V., Troeva, M., Yang, W.: On the rate of convergence for the mean-field approximation of controlled diffusions with large number of players. Dyn. Games Appl. 4(2), 208–230 (2014)
Kolokoltsov, V., Yang, W.: Existence of solutions to path-dependent kinetic equations and related forward-backward systems. Open J. Optim. 2, 39–44 (2013)
Achdou, Y., Laurière, M.: On the system of partial differential equations arising in mean field type control, DCDS A (2015, in review)
Gangbo, W., Swiech, A.: Optimal transport and large number of particles. Discret. Cont. Dynam. Syst. A 34, 1397–1441 (2014)
Chan, P., Sircar, R.: Bertrand & Cournot mean-field games. Appl. Math. Optim. 71, 533 (2015)
Laurière, M., Pironneau, O.: Dynamic programming for mean-field type control. C. R. Acad. Sci. Serie I, 352(9), 707–713 (2014)
Lions, P.L.: Mean-Field Games. Cours au Collège de France (2007–2008). http://www.college-de-france.fr/site/pierre-louis-lions/course-2007-2008_1.htm
Annunziato, M., Borzì, A., Nobile, F., Tempone, R.: On the connection between the Hamilton–Jacobi–Bellman and the Fokker–Planck control frameworks. Appl. Math. 5, 2476–2484 (2014)
Guéant, O., Lasry, J.M., Lions, P.L.: Mean field games and applications. In: Paris-Princeton Lectures on Mathematical Finance. Lecture Notes in Math. Springer, Berlin (2011)
Le Bris, C., Lions, P.L.: Existence and uniqueness of solutions to Fokker–Planck type equations with irregular coefficients. Commun. Partial Differ. Equ. 33, 1272–1317 (2008)
Porretta, A.: Weak solutions to Fokker–Planck equations and mean field games. Arch. Ration. Mech. Anal. 216, 1–62 (2015)
Bally, V., Pagès, G., Printems, J.: A stochastic quantization method for nonlinear problems. Monte Carlo Methods Appl. 7(12), 21–34 (2001)
Neufeld, A., Nutz, M.: Nonlinear Lévy processes and their characteristics. Trans. Am. Math. Soc., forthcoming. See also arXiv:1401.7253v1
Achdou, Y., Pironneau, O.: Computational Methods for Option Pricing. SIAM Frontiers in Math. SIAM, Philadelphia (2005)
Polak, E.: Optimization Algorithms and Consistent Approximations. Springer, New York (1997)
Hecht, F.: New Development in Freefem++. J. Numer. Math. 20(3–4), 251–265 (2012)
Acknowledgments
The authors are grateful to Yves Achdou, Alain Bensoussan and Olivier Guéant for useful discussions.
Communicated by Nizar Touzi.
Laurière, M., Pironneau, O. Dynamic Programming for Mean-Field Type Control. J Optim Theory Appl 169, 902–924 (2016). https://doi.org/10.1007/s10957-015-0785-x