Abstract
We investigate a model problem for optimal resource management. The problem is a stochastic control problem of mean-field type. We compare a Hamilton–Jacobi–Bellman fixed-point algorithm to a steepest descent method derived from calculus of variations. For mean-field type control problems, stochastic dynamic programming requires adaptation. The problem is reformulated as a distributed control problem by using the Fokker–Planck equation for the probability distribution of the stochastic process; then, an extended Bellman’s principle is derived by a different argument than the one used by P. L. Lions. Both algorithms are compared numerically.
1 Introduction
Stochastic control has been studied extensively over the past five decades [1–5], and yet there is a renewed interest in economics and finance due to mean-field games [6–9]. Mean-field games give rise to mean-field type stochastic control problems [10], which involve not only the Markov process of the state of the system but also some statistics of the process, such as means and variances, in the cost function or in the stochastic differential equation (SDE). For these problems, optimality conditions are derived either by stochastic calculus of variations [11] or by stochastic dynamic programming [12, 13], and are justified by classical arguments in the quadratic case [14, 15], but by less classical ones in the general case, for the fundamental reason that Bellman’s principle does not apply in its original form [12, 16].
Several authors have generalized dynamic programming using the Wasserstein distance to define derivatives with respect to measures. Others have studied the existence of a conceptual HJB equation [16, 17]. These results certainly overlap with and precede our analysis, but our point of view is different: it is pragmatic, in that we sacrifice mathematical rigor for explicit expressions, and numerical, in that we wish to compare solutions obtained by HJB to standard optimal control by calculus of variations. We have not tried to specify the regularity of the data needed for existence and differentiability of solutions. The results are stated only formally, but with the expectation that they could be justified later under appropriate assumptions, as in, e.g., [18, 19], if the behavior at infinity of the solution of the HJB equation is known, which is a major riddle.
Before proceeding further, note that a direct simulation of the problem with the stochastic differential equation approximated by Monte-Carlo is too costly and not competitive with the methods that we pursue below. Indeed, the cost function of the optimization problem involves means of stochastic quantities, and Monte-Carlo methods would require large numbers of evaluations of the SDE, embedded in forward–backward time loops. Faced with the same problem, Garnier et al. [7] and Chan et al. [20] came to the same conclusion.
In this article, pursuing a preliminary study published in [21], we apply the dynamic programming argument to the value functional as in [22], but instead of using the probability measure of the stochastic process, we use its probability density function (PDF). Hence, though less general, the mathematical argument will be simpler. Of course, this is at the cost of several regularity assumptions, such as the existence of a regular PDF at all times. However, our analysis here is strongly motivated by the numerical solutions of these control problems, and hence, assuming regularity is not a real practical limitation.
Once the problem is reformulated with the Fokker–Planck equation [1, 23], it becomes a somewhat standard exercise to find the necessary optimality conditions by a calculus of variations. So this article begins likewise in Sect. 3. Then, in Sect. 4, a similar result is obtained by using dynamic programming, and the connection with the previous approach and with stochastic dynamic programming is established. In Sect. 5, a problem introduced in [24] involving profit-optimizing oil producers is defined and studied for existence and optimality, and two algorithms are proposed together with a semi-analytical method based on a Riccati solution. The paper ends with a numerical section, which implements the three methods and compares them.
2 The Problem
Consider a stochastic differential system of d-equations
where u takes values in \({{{\mathbb {R}}}^d}\), \(\sigma \) in \({\mathbb {R}}^{d\times k}\), and \(W_t\) is a k-vector of independent Brownian motions. Assumptions under which a strong solution is known to exist, once the distribution of \(X_0\) is known to be in \(L^2\cap L^\infty \), are given in [25] (see also [26], Proposition 4, which applies when \(\sigma \sigma ^T\) is uniformly positive definite):
and with \(\tilde{\sigma }(x,t):=\sigma (x,t,u(x,t))\):
Then, the PDF of \(X_t\) satisfies
and is the unique solution of the Fokker–Planck equation,
Conversely, under (2)–(3), there is a unique solution to (4), which is the PDF of a Markov process satisfying (1).
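This correspondence between the SDE and the Fokker–Planck equation can be checked numerically in the simplest case. The following is a hedged sketch (parameter values are ours): with constant drift u and volatility \(\sigma \), the Fokker–Planck solution started from a point mass at \(x_0\) is the Gaussian of mean \(x_0+ut\) and variance \(\sigma ^2t\), and an Euler–Maruyama simulation of the SDE should reproduce these moments.

```python
import numpy as np

# Hedged sketch: with constant drift u and volatility sigma, the Fokker-Planck
# solution from the point mass at x0 is N(x0 + u*t, sigma^2 * t); Monte-Carlo
# moments of the Euler-Maruyama paths should match it.
def euler_maruyama_moments(x0=1.0, u=-0.3, sigma=0.5, T=2.0,
                           n_steps=400, n_paths=20000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    for _ in range(n_steps):
        X = X + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return X.mean(), X.var()

mean, var = euler_maruyama_moments()
# Fokker-Planck prediction: mean = x0 + u*T = 0.4, variance = sigma^2 * T = 0.5
```

With these (illustrative) parameters, the Monte-Carlo moments agree with the Fokker–Planck prediction up to sampling error, which is the equivalence stated above in its simplest instance.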
Remark 2.1
The assumption \(\sigma \sigma ^T>0\) can be replaced by assuming the conditions (2) for \(u-\frac{1}{2}\nabla \cdot (\tilde{\sigma }\tilde{\sigma }^T)\) rather than u, but it implies some regularity on the second derivatives of u.
Consider the stochastic optimization problem
subject to (1) with \(\rho _0\) given, and
where \(\tilde{h}, g, \tilde{H}, G\) are \(C^1\) functions taking values in \({\mathbb {R}}^r,{\mathbb {R}}^s,{\mathbb {R}}\) and \({\mathbb {R}}\), respectively. Assume also that \(\tilde{H}\) and G are bounded from below and
As a first approach to such problems, Andersson et al. [11] proposed to use a stochastic calculus of variations; the necessary optimality conditions form a forward–backward stochastic differential system, which is numerically very hard to solve because the backward volatility of the adjoint equation is one of the unknowns [11, 27].
A second approach is to use stochastic dynamic programming (SDP), but an important adaptation needs to be made. Usually, SDP uses the remaining cost function
The Bellman equation is derived by saying that \({u_t},t>\tau \) is a solution only if, together with (1) and \(X_\tau =x\),
However, the above is not true unless \(\tilde{h}=0,~g=0\); as in [22], one has to work with
where V is a pointwise function of \(\tau \in [0,T]\) and has functional dependence on \(\rho _\tau (\cdot )\), i.e., depends on \(\{x\mapsto \rho _\tau (x),~\forall x\in {{{\mathbb {R}}}^d}\}\).
A third approach is to work with the deterministic version of the problem. With sufficient regularity (see [28] for weaker assumptions), using the Fokker–Planck partial differential equation for \(\rho (x,t):=\rho _t(x)\), the problem is equivalent to the deterministic distributed control problem,
where \(\mu _{ij}=\frac{1}{2}\sum _{k}\sigma _{ik}\sigma _{jk}\) and \(\nabla ^2\) is the \(d\times d\) matrix operator of element \(\partial _{ij}\). The notation A : B stands for \(\sum _{i,j=1}^dA_{ij}B_{ij}\) and \(\nabla \cdot u\) stands for \(\sum _{i=1}^d\partial _i u_i\).
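A tiny numerical check of this notation, on arbitrary \(2\times 2\) values (a hedged illustration only):

```python
import numpy as np

# A:B is the Frobenius pairing sum_ij A_ij B_ij, and mu = (1/2) sigma sigma^T,
# i.e., mu_ij = (1/2) sum_k sigma_ik sigma_jk; values here are arbitrary.
A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
colon = np.einsum('ij,ij->', A, B)      # A:B = 1*5 + 2*6 + 3*7 + 4*8 = 70
sigma = np.array([[1., 0.], [1., 2.]])
mu = 0.5 * sigma @ sigma.T              # [[0.5, 0.5], [0.5, 2.5]]
```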
Remark 2.2
Note that the problem is equivalent to the stochastic control problem only if \(\rho _0\) is in \(\mathbf {P}\), the set of positive real-valued functions with measure 1. However, the deterministic control problem still makes sense even if it is not the case and \(\rho \in L^2([0,T],H^1({{{\mathbb {R}}}^d}))\) only. We will use this in the proof of Proposition 4.3.
Remark 2.3
Existence of solutions for (6) or (7) requires more assumptions on \(\tilde{H}, \tilde{h}, G\) and g to ensure that J is lower semi-continuous with respect to u in a norm, such as that of \(L^2\big ([0,T],H^1({{{\mathbb {R}}}^d})\big )\), which implies (2) and for which \({\mathscr {U}}_d\) is closed. However, since there is no term containing a gradient of u in J, a Tikhonov regularization seems to be needed for these problems to be well posed.
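The role of such a regularization can be illustrated on a finite-dimensional analogue; a hedged toy example, not the control problem itself:

```python
import numpy as np

# Hedged illustration of the Tikhonov idea: adding a small quadratic penalty
# eps*|u|^2 turns a rank-deficient least-squares problem (non-unique minimizer)
# into a well-posed one.  The data below are arbitrary.
A = np.array([[1., 1.], [1., 1.]])       # rank 1: plain least squares is not unique
y = np.array([2., 2.])
eps = 1e-3
# normal equations of min |A u - y|^2 + eps |u|^2:  (A^T A + eps I) u = A^T y
u = np.linalg.solve(A.T @ A + eps * np.eye(2), A.T @ y)
```

The regularized minimizer is unique (here the symmetric one), while it still nearly fits the data for small \(\varepsilon \); the same mechanism is what a gradient penalty on u would provide in (6)–(7).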
3 Calculus of Variations
Deriving optimality conditions by calculus of variations is fairly standard, as we shall show in this section.
Proposition 3.1
A control u is optimal for (7) only if, for all \(t\in ]0,T[\) and all \(v\in {\mathscr {V}}_d\),
where \(\tilde{H}_u', \tilde{H}_\rho ', \tilde{H}_\chi ',\tilde{h}_u', \tilde{h}_\rho '\) and \(G_\xi '\) are partial derivatives in the classical sense.
Proof
Recall that (7) is in \(Q=]-\infty ,+\infty [^d\times ]0,T]\). The Fokker–Planck equation (7) being set in \(\mathbf {P}\), it contains the “boundary condition” \(\forall t\in ]0,T], \lim _{|x|\rightarrow \infty }\rho (x,t)=0\).
Consider an admissible variation \(\lambda \delta u\), i.e., \(u+\lambda \delta u\in {\mathscr {U}}_d\) for all \(\lambda \in [0,1]\). Such a variation induces a variation \(\lambda \delta \rho \) of \(\rho \) given by
with \(\delta \rho |_{t=0}=0\) and where \(\mu '_u\) is evaluated at \(x,t,u+\theta \delta u\) for some \(\theta \in [0,\lambda ]\). We assume that there is enough regularity for the solution of the Fokker–Planck equation in (7) to depend continuously on the data \(u,\mu \). Then (10) at \(\lambda =0\) defines \(\delta \rho \). Also, up to higher-order terms,
Define an adjoint state \(\rho ^*\) by (9). Then (9) multiplied by \(\delta \rho \) and integrated on Q gives
The rest follows by saying that u is a minimum. \(\square \)
Remark 3.1
The “boundary conditions at infinity” on \(\rho ^*\) in (9) are problematic. The PDE is to be understood in weak form in the dual of \(\mathbf {P}\), i.e., for all \(\nu \in \mathbf {P}\) :
Then, the decay of \(\nu \) at infinity will balance the potential growth of \(\rho ^*\), and the integrations by parts in the proof above will have no terms at infinity in \(\Vert x\Vert \). However, the uniqueness of \(\rho ^*\) is not guaranteed. We could have avoided the problem by working in \([-L,L]^d\times [0,T]\) rather than \({{{\mathbb {R}}}^d}\times [0,T]\) and imposing \(\rho ^*=0\) on the boundary of \([-L,L]^d\), but then the solution depends strongly on L; this problem will be rediscussed in the numerical section (Sect. 5).
4 Dynamic Programming
For notational clarity consider the more general case, where H, G are functionals of \(\rho _t(\cdot )\). Let \(\rho \) be solution of (7) initialized at \(\tau \) by \(\rho _\tau (\cdot )\) and let J and V be defined as :
By the Markovian property, \(\rho _t(x):=\rho (x,t)\), for \(t>\tau \), is also the PDF of \(X_t\) given by (1) knowing its PDF \(\rho _\tau \) at \(\tau \).
Remark 4.1
In this section, H is a pointwise function of \(u(x,t) \in {\mathbb {R}}^d\), but the theory can be extended to the case where H is a functional of \(u(\cdot ,\cdot ) : {\mathbb {R}}^d \times {\mathbb {R}}\rightarrow ~{\mathbb {R}}^d\).
Assuming that J is bounded from below, V is finite and we prove the following version of Bellman’s principle of optimality :
Proposition 4.1
If the problem is regular, then for any \(\tau \in [0,T]\) and any \(\rho _\tau \in \mathbf {P}\), we have :
Proof
Denote the infimum of the right-hand side by \(\overline{V}(\tau ; \rho _\tau )\). For any \(\varepsilon >0\), there exists \(u \in {\mathscr {U}}_d\) such that, if \(\rho _t\) is the solution of (7) with control u :
Conversely, given \(u \in {\mathscr {U}}_d\) and \(\varepsilon >0\), there exists a control \(\tilde{u} \in {\mathscr {U}}_d\), which coincides with u on \({{{\mathbb {R}}}^d}\times [\tau , \tau +\delta \tau ]\), such that:
where \(\tilde{\rho }_t\) is the solution of (7) at t with control \(\tilde{u}\) starting with \(\rho _\tau \) at time \(\tau \). Hence,
To conclude, let \(\varepsilon \rightarrow 0\) and take the infimum over \(u \in {\mathscr {U}}_d\). \(\square \)
From now on, we assume that H and V are Fréchet differentiable with respect to \(\rho \).
Remark 4.2
The correct mathematical tool for this differentiability is the Wasserstein distance and the differentiability with respect to the probability measure rather than to its density (see, e.g., [13, 22]). Our approach is more pragmatic.
We denote the Fréchet derivatives by \(H_\rho '(x,\tau ;\rho )\) and \(V_\rho '(\tau ;\rho )\). Thus, \(H_\rho '(x,\tau ;\rho )\) (and similarly for V) denotes the linear map \(\mathbf {L}^2\rightarrow {\mathbb {R}}\) such that:
where \(\mathbf {L}^2:=L^2({{{\mathbb {R}}}^d})\). Moreover, we denote with a prime the Riesz representative of the Fréchet derivative with respect to \(\rho \). For instance, \(V' : [0,T] \times \mathbf {L}^2\rightarrow \mathbf {L}^2\) is defined by :
Proposition 4.2
(HJB minimum principle). Assuming that \(V'\) is smooth enough, the following holds :
Note: As usual, \(\nabla \) is with respect to x.
Proof
A first-order approximation of the time derivative in the Fokker–Planck equation gives
As V is assumed to be smooth, we have :
Using (15) and the mean value theorem for the time integral, Bellman’s principle yields, up to \(o(\delta \tau )\),
The terms \(V(\tau ; \rho _\tau )\) cancel; dividing by \(\delta \tau \), combining with (14), and letting \(\delta \tau \rightarrow 0\), (16) gives
To finalize the proof, we need the following proposition to relate V to \(V_\rho '\). \(\square \)
Proposition 4.3
Given \(\tau \in [0,T]\) and an initial \(\rho _\tau \in \mathbf {P}\), let \(\hat{u} \in {\mathscr {U}}_d\) and \(\hat{\rho }\) denote an optimal control and the corresponding solution of (7), then:
Proof
Note that the Fokker–Planck equation implies the existence of a semigroup operator \(\mathbf{G}\) such that, for all \(\tau \le t, \rho _t=\mathbf{G}(t-\tau )*\rho _\tau \). Let \((\hat{u}_t)_{t\in [0,T]}\) be the optimal control and \((\hat{\rho }_t)_{t\in [0,T]}\) the corresponding solution of (7) and (12). Then :
By the optimality of \(\hat{u}\) and \(\hat{\rho }\), this can be Fréchet-differentiated with respect to \(\rho \) by computing, for a given \(\nu \in \mathbf {L}^2\), \(\lim _{\lambda \rightarrow 0}\frac{1}{\lambda }\big [V\left( \tau ; \hat{\rho }_\tau +\lambda \nu \right) - V\left( \tau ; \hat{\rho }_\tau \right) \big ]\). The result is:
Taking \(\nu = \hat{\rho }_\tau \) leads to (18).
One may object, however, that such a choice for \(\nu \) is not admissible because, being a variation of \(\rho _\tau \), it must have zero total mass; but we discussed this in Remark 2.2. \(\square \)
End of proof of Proposition 4.2. Differentiating (18) with respect to \(\tau \) leads to
where \(\hat{u}_\tau \) is the optimal control at time \(\tau \). Now, let us use (17), rewritten as
Integrating by parts the last two terms leads to (13). \(\square \)
Remark 4.3
Note that (18) and (12) imply :
Remark 4.4
By taking \(\rho _\tau =\delta (x-x_0)\), the Dirac mass at \(x_0\), the usual HJB principle is found if \(h\equiv g \equiv 0\).
Proposition 4.4
(Hamilton–Jacobi–Bellman equation) Denote by \(\hat{u}\) an optimal control. When \({\mathscr {V}}_d={{{\mathbb {R}}}^d}\), (13) in Proposition 4.2 gives
where the second equation is in fact the first-order optimality condition in (13).
Remark 4.5
When the Hamiltonian depends on the distribution \(\rho _t\) only through the local value \(\rho _t(x)\) and the average of a fixed function, we can make the link with the calculus of variations explicit (see Sect. 3). More precisely, let us assume that \(H~=~\tilde{H}(x,t,u,\rho _t(x),\chi (t))\) with \(\displaystyle \chi (t)=\int _{{{\mathbb {R}}}^d}h(x,t,u(x,t),\rho _t(x))\rho _t(x){\mathrm {d}}x\); recall that in this case, \(\partial _\rho H\) and \(\partial _\rho h\) denote derivatives with respect to a real parameter. Then, for any \(\nu \in \mathbf {L}^2\) :
In particular, for \(\nu = \rho _\tau \) we have :
Then, for the optimal \(\hat{u}\) and \(\hat{\rho }\), (21) yields
The link with Sect. 3 is established : (9) and (25) coincide with \(V'=\rho ^*\).
Remark 4.6
Note that the adjoint equation, (25), is set in \({\mathbb {R}}^d\times [0,T]\) with a right-hand side which is unbounded as \(x \rightarrow \pm \infty \). Existence of solutions is doable in the finite case \(\varOmega \times [0,T]\) with \(\varOmega \) a bounded open set and \(V|_{\partial \varOmega } = 0\), but is a riddle otherwise. It is also a source of numerical difficulties because, as \({\mathbb {R}}^d\) is approximated by \(]-L,L[^d\), numerical boundary conditions compatible with the (unknown) behavior at infinity of \(V'\) need to be provided.
5 An Academic Example: Production of an Exhaustible Resource
Following [24], we consider a continuum of producers exploiting an oil field. Each producer’s goal is to maximize his profit, knowing the price of oil; however, this price is influenced by the quantity of oil available on the market, which is the sum of all that the producers have decided to extract at a given time. Hence, although no single producer affects the price of oil, all producers solve the same optimization problem, so the global problem must in the end take into account the market price as a function of oil availability. For a better understanding of the relation between the individual game and the global game, the reader is referred to [10].
5.1 Notations
Let \(X_0\) be the initial oil reserve and \(X_t\) be the quantity of oil left in the field at time t, as seen by a producer. It is modeled by
where \(a_t{\mathrm {d}}t\) is the quantity extracted by the producer in the time interval \([t,t+{\mathrm {d}}t]\) (so \(a_t\) is the extraction rate), and W is a standard real-valued Brownian motion reflecting the producer’s uncertainty about the remaining reserve; \(\sigma >0\) is a volatility parameter, assumed constant. We suppose that \(a_t := a(X_t,t)\) is a deterministic function of t and \(X_t\), meaning that the producers apply a feedback law to control their production.
We denote by C the cost of oil extraction, which is a function of a and assumed to be \(C(a):=\alpha a+\beta a^2\), for some positive constants \(\alpha \) and \(\beta \). The price of oil is assumed to be \(p_t := \kappa {\mathrm {e}}^{-b t}(\mathbf{E}(a_t))^{-c}\), with positive \(\kappa , b\) and c. This encodes the macroeconomic assumption that \(p_t\) is a decreasing function of the mean production, because scarcity of oil increases its price and conversely. It also says that in the future oil will be cheaper, because it will be slowly replaced by renewable energy.
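These model primitives are simple to encode directly; a hedged sketch in which the numerical constants are placeholders, not the paper’s calibration:

```python
import math

def extraction_cost(a, alpha=1.0, beta=1.0):
    """C(a) = alpha*a + beta*a^2, the quadratic extraction cost."""
    return alpha * a + beta * a * a

def oil_price(mean_a, t, kappa=1.0, b=0.1, c=0.5):
    """p_t = kappa * exp(-b*t) * (E[a_t])^{-c}: decreasing in the mean
    production (scarcity raises the price) and in time (renewables slowly
    make oil cheaper)."""
    return kappa * math.exp(-b * t) * mean_a ** (-c)
```

With these placeholder constants, for instance, `extraction_cost(2.0)` is 6.0 and `oil_price(1.0, 0.0)` is 1.0; raising the mean production lowers the price, as stated above.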
Note that by construction \(X_t\) takes only positive values and ought to be bounded by, say L, the maximum estimate of the reservoir content. However, nothing in the model enforces these constraints.
5.2 Model
Each producer optimizes his integrated profit up to time T, discounted at the interest rate r; however, he also wishes to drain the oil field, i.e., achieve \(X_T=0\). Thus, his goal is to maximize over \(a(\cdot ,\cdot )\ge 0\) the functional :
\(\gamma \) and \(\eta \) are penalization parameters.
Replacing p and C by their expressions gives
Since J involves the mean of a function of \(\mathbf{E}[a_t]\), this is a mean-field type stochastic control problem.
5.3 Remarks on the Existence of Solutions
Denoting \(\overline{a}_t := \mathbf{E}[a_t], J\) is:
Since \(\mathbf{E}[a_t^2]\ge \overline{a}_t^2, J\) is bounded from above,
Assume that \(c < 1\). Then, the maximum of the right-hand side is attained when a is such that \(\kappa (1-c) {\mathrm {e}}^{-b t}(\overline{a}_t)^{-c} =\alpha +2\beta \overline{a}_t,~\forall t\). Hence, the maximum value
provides an upper bound for J, as long as \(\overline{a}_t\) is bounded above on [0, T]. Furthermore, when the problem is converted into a deterministic optimal control problem via the Fokker–Planck equation, it is seen that the functional is upper semi-continuous, so a maximum exists.
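When \(c<1\), the stationarity condition above is, at each t, a scalar equation for \(\overline{a}_t\) with a unique root, which can be found by bisection. A hedged sketch (we read the leading factor as \(\kappa \), consistent with differentiating the mean revenue \(\kappa {\mathrm {e}}^{-bt}\overline{a}^{1-c}\); the constants are placeholders):

```python
import math

# Solve kappa*(1-c)*exp(-b*t)*abar^(-c) = alpha + 2*beta*abar for abar by
# bisection: the left side decreases from +infinity to 0 on (0, infinity)
# while the right side increases, so the root is unique.
def abar_star(t, kappa=1.0, b=0.1, c=0.5, alpha=1.0, beta=1.0):
    assert c < 1.0
    f = lambda abar: (kappa * (1 - c) * math.exp(-b * t) * abar ** (-c)
                      - alpha - 2 * beta * abar)
    lo, hi = 1e-9, 10.0          # f(lo) > 0 > f(hi), f strictly decreasing
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating this root over [0, T] gives the \(\overline{a}_t\) entering the upper bound; it is bounded above, as required.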
A counterexample: Assume \(c > 1\). For simplicity, suppose that \(a(t) = |\tau -t|\) for some \(\tau \in ]0,T[\). Then, \(\int _0^T a_t^{1-c}{\mathrm {d}}t = +\infty \); hence, the problem is not well posed, an obvious consequence of the fact that the model makes the price infinite too fast if nobody extracts oil.
Linear feedback: If we search for a in the class of linear feedbacks \(a(x,t)= w(t)x\) where w is a deterministic function of time only, then (26) has an analytical solution
and the first and second moments of \(a_t\) are
Then, for \(\eta =2\), the problem reduces to maximizing over \(\tilde{w}_t\ge 0\)
5.4 Dynamic Programming in Absence of Constraint
To connect to Sect. 2 let us work with \(u=-a\), the reserve depletion rate. For the time being, we shall ignore the constraints \(L\ge X_t\ge 0\) and \(u\le 0\); so \({\mathscr {V}}_d= {\mathbb {R}}\). Moreover, we shall work with \(\eta =2\) and comment on \(\eta >2\) at the end.
Recall that \(\rho (\cdot ,t)\), the PDF of \(X_t\), is given by the Fokker–Planck equation :
with initial condition : \(\rho _{|t=0} = \rho _0\) given. Now \(\overline{u}_t := \int _{\mathbb {R}}u_t\rho _t{\mathrm {d}}x=\mathbf{E}[-a_t]\) and
The goal is now to minimize \(\tilde{J}\) with respect to u. Define also
Application of the Results of Sect. 2. In this example, we have : \({\mathscr {V}}_d= {\mathbb {R}}\) and
By Proposition 4.2 and Remark 4.5, we have \(V'_{|T}=\gamma |x|^\eta \), and \(V'\) satisfies
because \(\partial _\rho H = \partial _\rho h = 0\) and \(\partial _\chi H = c \kappa {\mathrm {e}}^{-bt}\chi ^{-c-1} u\). Moreover, by (22),
giving:
Now, using (35) to eliminate \(\partial _xV'\) in (34) leads to
Finally, using (36) in (37) and the definition of \(({-\overline{u}})_t\) yields :
Note that this equation for \(V'\) depends only on \(\overline{u}\) and not on u. Nevertheless, (36)–(38) is a rather complex partial integro-differential system.
A Fixed-Point Algorithm. We can now sketch a numerical method to solve the problem:
Although it seems to work numerically in many situations, as we shall see below, nothing is known about the convergence of this fixed-point type algorithm; three points need to be clarified:
1. Equation (30) is nonlinear, and existence of a solution is unclear.
2. A relevant stopping criterion for Algorithm 1 is yet to be found.
3. Even if the Fokker–Planck equation (30) is set on \({\mathbb {R}}^+\times ]0,T[\) instead of Q, as discussed below in the numerical section (Sect. 5), there are difficulties. Because the second-order term vanishes at \(x=0\), a weak formulation ought to be posed in a weighted Sobolev space: find \(\rho \) with
$$\begin{aligned}&\rho \in \mathbf{H}=\{\nu \in L^2({\mathbb {R}}^+)~:~x\partial _x\nu \in L^2({\mathbb {R}}^+)\}\hbox { such that }\forall \nu \in \mathbf{H}\nonumber \\&\int _{{\mathbb {R}}^+}\Big [\nu \partial _t\rho + (x\sigma ^2-u)\rho \partial _x\nu +{x^2\sigma ^2}\partial _x\rho \partial _x\nu \Big ]{\mathrm {d}}x=0, \hbox { for almost all }t. \end{aligned}$$(39)
Theorem 2.2 in [29] asserts existence and uniqueness of \(\rho \), provided that there exists \(u_M\) such that \(u(x,t)<x u_M,~\forall t\). However, this oil resource model does not impose \(u(0,t)=0\), and consequently, there is a singularity at that point.
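The structure of the fixed-point method (a backward sweep for the value function, alternating with a forward Fokker–Planck solve, the control update being relaxed by a factor \(\omega \)) can be sketched on a plain linear-quadratic problem: minimize \(\mathbf{E}\int _0^T(u^2/2+X^2/2)\,{\mathrm {d}}t\) for \({\mathrm {d}}X=u\,{\mathrm {d}}t+\sigma \,{\mathrm {d}}W\). There is no mean-field term here, so the backward sweep does not depend on \(\rho \) and the fixed point converges trivially; in the paper’s problem, (38) depends on \(\overline{u}\), which is what makes the iteration nontrivial. All discretization choices below are ours:

```python
import numpy as np

def fixed_point_control(T=1.0, L=4.0, nx=81, nt=200, sigma=0.5, omega=0.5, iters=30):
    """Fixed-point loop: backward value-function sweep, relaxed control update,
    then a forward Fokker-Planck solve, on the LQ toy problem described above."""
    x = np.linspace(-L, L, nx)
    dx, dt = x[1] - x[0], T / nt
    u = np.zeros((nt + 1, nx))                       # control iterate u(t_n, x_i)
    for _ in range(iters):
        # backward sweep: -V_t = min_u [u V_x + u^2/2] + x^2/2 + (sigma^2/2) V_xx
        V = np.zeros((nt + 1, nx))                   # zero terminal cost
        for n in range(nt, 0, -1):
            Vx = np.gradient(V[n], dx)
            Vxx = np.zeros(nx)
            Vxx[1:-1] = (V[n, 2:] - 2 * V[n, 1:-1] + V[n, :-2]) / dx ** 2
            V[n - 1] = V[n] + dt * (-0.5 * Vx ** 2 + 0.5 * x ** 2
                                    + 0.5 * sigma ** 2 * Vxx)
        unew = -np.gradient(V, dx, axis=1)           # pointwise minimizer u = -V_x
        u = (1 - omega) * u + omega * unew           # relaxation by omega
    # forward Fokker-Planck: rho_t = -(u rho)_x + (sigma^2/2) rho_xx
    rho = np.exp(-(x - 1.0) ** 2)
    rho /= rho.sum() * dx                            # normalized initial PDF
    for n in range(nt):
        flux = u[n] * rho
        div = np.zeros(nx)
        div[1:-1] = (flux[2:] - flux[:-2]) / (2 * dx)
        lap = np.zeros(nx)
        lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / dx ** 2
        rho = rho + dt * (-div + 0.5 * sigma ** 2 * lap)
    return x, u, rho
```

On this decoupled toy problem the loop recovers the classical LQ feedback \(u\approx -\partial _xV\); only the alternation and the relaxation are meant to carry over to the mean-field case, where the backward equation involves \(\overline{u}\).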
5.5 Calculus of Variations on the Deterministic Control Problem
To find the optimality conditions for (31), let us introduce an adjoint \({\rho ^*}\) satisfying
in \({\mathbb {R}}\times [0,T[\) and \({\rho ^*}_{|T}=\gamma |x|^\eta \). Then,
In other words,
Algorithm. We apply the steepest descent method with varying step size:
For a convergence analysis, the situation here is somewhat better: both the state and the adjoint equations (30), (40) are linear, and the only problem is the asymptotic behavior of u. Note also that one could use a conjugate gradient algorithm at little additional computational cost.
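The outer loop of such a descent method with varying step size can be sketched generically; a hedged illustration in which J and gradJ stand for the cost evaluated through the state solve and its adjoint-computed gradient (here replaced by a toy quadratic):

```python
import numpy as np

# Steepest descent with a backtracking (varying) step: shrink the step until
# the cost decreases, then tentatively re-enlarge it for the next iteration.
def steepest_descent(J, gradJ, u0, step0=1.0, shrink=0.5, tol=1e-8, maxit=200):
    u, step = u0.astype(float), step0
    for _ in range(maxit):
        g = gradJ(u)
        if np.linalg.norm(g) < tol:
            break
        while J(u - step * g) >= J(u) and step > 1e-14:
            step *= shrink                  # vary the step until J decreases
        u = u - step * g
        step *= 2.0                         # try a larger step next time
    return u

# toy usage: J(u) = 0.5*|u - 1|^2, whose minimizer is u = (1, ..., 1)
u = steepest_descent(lambda v: 0.5 * np.sum((v - 1.0) ** 2),
                     lambda v: v - 1.0, np.zeros(3))
```

In the paper’s setting, each evaluation of J requires a Fokker–Planck solve and each gradient an adjoint solve, so the backtracking loop is the expensive part.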
After discretization by a variational method, such as finite elements, convergence to a local minimum could probably be established by the techniques of control theory (see, e.g., [30]), because the problem is finite dimensional and the gradient of the cost function is exact [29]. However, convergence of the solution of the discrete problem to the solution of the continuous problem is open and even more difficult than existence.
5.6 The Riccati Equation when \(\eta =2\)
Even though the problem is not linear quadratic, when \(\eta =2\) we can still look for \(V'\) solution of (38) in the form \(V'(x,t)=P_tx^2 + z_tx+s_t\), where \(P_t,z_t,s_t\) are functions of time only.
Identification of all terms proportional to \(x^2\) in (37) gives,
For clarity, let \(Q_t ={\mathrm {e}}^{rt}P_t\) and \(\mu =\sigma ^2-r\). Then, the above is:
As long as \({4\beta e^{rt}\mu -Q_t}>0\), it leads to
Then, u is found by (36). In particular, \(\partial _x u = - \frac{1}{8 \beta } \partial _{xx} V' = - \frac{1}{4\beta } P_t\). However, the Fokker–Planck equation must be solved numerically to compute \(\overline{u}\).
Note that \(\gamma <4\beta \mu \) implies \({4\beta e^{rt}\mu -Q_t}>0\) and also \(P_t>0\).
Remark 5.1
By identification of the terms of order 1 and 0 in x, equations for z and s are found:
Remark 5.2
Note that \(u_t=2 x P_t + z_t\) is not a linear feedback function as in (28) above.
Remark 5.3
This explicit feedback solution is smooth and adapted to the stochastic process (26), so it should be a solution of (27) if it exists (recall that (27) is not convex). Furthermore, it has a behavior at infinity which is compatible with (2).
5.7 Numerical Implementation
To implement Algorithms 1 & 2, we need to localize the PDE. As \(x<0\) makes no sense for this application, we shall work on \(Q_L=[0,L]\times [0,T]\) instead of \({\mathbb {R}}\times [0,T]\); a stopping time for the event \(X_t=0\) would be better, but too costly. At \(x=L\), we set \(\rho (L,t)=0,\forall t\), which makes sense when L is large.
Assigning boundary conditions to (38) and (40) is problematic. Our numerical tests show that the computations depend strongly on L when this is not done correctly. When \(\eta =2\), we know that \(V'\) and \(\rho ^*\) have asymptotically the same behavior as \(P_t x^2\), giving \(\frac{1}{2}\sigma ^2x^2\partial _x V'=\sigma ^2 x^3 P_t =\sigma ^2 x V'\), a relation which can be used as a boundary condition in the weak form of the equation (and similarly for \(\rho ^*\)): find \(V'\in H^1(Q_L)\) with \(V'_{T}\) given and
for all \(\nu \in H^1(Q_L)\) with \(\nu _{T}=0\).
To solve this nonlinear PDE, we use the fact that it is embedded into an iterative loop in Algorithm 1 and semi-linearize it by writing the square term in the last integral as a product in which one factor is evaluated at the previous iteration.
To discretize it, we used a space–time finite element method of degree 1 over triangles covering \(Q_L\). Admittedly, it is an unusual method. However, it is somewhat similar to a central difference method, and it is feasible because the problem is one dimensional in space and because it allows exact conservativity and exact duality with respect to time and space in the integrations by parts. It also automatically handles the storage of \(\rho ,u,V,\overline{u}\) at all times and solves the backward (and forward) equations at all time steps through a single linear system. The linear systems are solved with the library MUMPS as implemented in FreeFem++ [31].
5.8 Results with Algorithm 1
We used 50 points to discretize \((0,L), L=10\), and 50 time steps for \([0,T], T=5\). The following values were used: \(\alpha =1, \beta =1, \gamma =0.5, \kappa =1, b=0.1, r=0.05, \sigma =0.5\) and \(c=0.5\). The initial condition \(\rho _0\) is a Gaussian curve centered at \(x=5\) with volatility 1. We initialized u with \(u_0=-\alpha /(2\beta )\). A local minimum \(u_e\) is known from the Riccati equation; the error \(\Vert u-u_e\Vert \) is used as a stopping criterion in Algorithm 1. We chose \(\omega =0.5\).
Figure 1 shows the optimal control as a function of (x, t). For each t, the control is linear in x, as predicted by the Riccati equation; the quadratic part of the Riccati solution of Sect. 5.6 is also plotted, and a small difference is seen on the plot (two surfaces close to each other are displayed). Figure 2 shows the evolution in time of the PDF \(\rho \) for all \(x>0\) of the resource distribution \(X_t\). At time 0, it is a Gaussian distribution centered at \(x=5\); at time T, the distribution is concentrated around \(x=0.5\), so most producers have pumped 90% of the oil available to them (Table 1).
All above is obtained with \(L=10\), but there is almost no difference with \(L=40\).
Figures 3, 4 present the evolution of production (\(-\overline{u}_t\)) and price \(p_t = \kappa {\mathrm {e}}^{-b t}(-\overline{u}_t)^{-c}\).
5.9 Results with Algorithm 2
The performance of the descent method was disappointing. It generated many different solutions depending on the initial value of u. If \(u_0=u_e\), the algorithm decreases the cost function by introducing small oscillations, a strategy which is clearly mesh dependent.
If \(u_0=-0.5\), then the solution of Fig. 5 is found after 10 iterations of steepest descent. The convergence history is given in Table 2. The results are shown in Fig. 6. Note the oscillations of \(\rho \) near \(t=T\).
5.10 The Case \(\eta \ne 2\)
The following parameters are now changed: \(\eta =3\), and \(u_0\) is initialized with the Riccati solution. Algorithm 1 converges in a few iterations to a solution, but \(\omega =0.05\) is required for convergence. Algorithm 2 gives a different solution. Both adjoint states are shown in Fig. 7.
When \(\eta =4\), Algorithm 1 diverges, while Algorithm 2 converges to the solution shown in Fig. 7.
5.11 Linear Feedback Solution
Using automatic differentiation of computer programs by operator overloading in C++ and initializing a steepest descent method with the linear part of the Riccati solution of Sect. 5.6 and the same parameters as above, we obtained the w(t) displayed in Fig. 8, very close to the Riccati solution. To understand why the Riccati solution may not be the best solution, we plotted \(\lambda \rightarrow J^d(\lambda ):=J(w^d_t+\lambda h_t),~\lambda \in ]-0.5,+0.5[\), where \(h_t\) is an approximate \(w_t- \nabla J(w^d_t)\). Figure 8 shows that there are three local minima and two local maxima, and \(w^d_t\) is only a local minimum.
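The operator-overloading idea behind this automatic differentiation is easy to illustrate; a hedged Python sketch of forward-mode dual numbers (the paper’s C++ implementation is not reproduced here):

```python
# Forward-mode automatic differentiation by operator overloading: a Dual
# carries a value and a derivative, and the arithmetic operators propagate
# both through the sum and product rules.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

# d/dw [w*w + 3*w] at w = 2 is 2*2 + 3 = 7; seed the derivative slot with 1
w = Dual(2.0, 1.0)
f = w * w + 3 * w
```

Running a program on Dual inputs thus yields the gradient as a by-product of the evaluation, which is what makes the descent over the feedback coefficients w(t) cheap to set up.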
6 Conclusions
Stochastic mean-field type control is numerically hard; even a simple academic toy problem raises difficulties. The first difficulty this paper had to deal with is the extension of the HJB dynamic programming approach. The second is related to the well-posedness of the HJB or adjoint equation, because it is set in an infinite domain in space. The third is the lack of a proof of convergence for the two algorithms suggested here, an HJB-based fixed point and a steepest descent based on calculus of variations. When it converges, the fixed-point method is preferred, but there is no known way to guarantee convergence even to a local minimum; as for the steepest descent, we found it somewhat hard to use, mainly because it generates irregular oscillating solutions; some bounds on the derivatives of u need to be added in penalized form to the criterion. Numerically, both algorithms are cursed by the asymptotic behavior of the adjoint state at infinity. So, when possible, the Riccati semi-analytical feedback solution is the best. Finally, but this applies only to this toy problem, the pure feedback solution is nearly optimal, easy to compute and stable.
Note also that this semi-analytical solution is a validation test, since it has been recovered by both algorithms.
References
Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions. Stochastic Modelling and Applied Probability series vol 25. Springer, Berlin (2006)
Kushner, H.J.: Stochastic Stability and Control. Academic Press, New York (1967)
Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions. Springer, Berlin (2005)
Touzi, N.: Optimal Stochastic Control, Stochastic Target Problems and Backward SDE. Field Inst. Monographs 29. Springer, Berlin (2013)
Yong, J., Zhou, X.Y.: Stochastic Control. Applications of Math. Series vol 43. Springer, Berlin (1999)
Carmona, R., Fouque, J.-P., Sun, L.-H.: Mean-field games and systemic risk. Commun. Math. Sci. 13(4), 911–933 (2015)
Garnier, J., Papanicolaou, G., Yang, T.-W.: Large deviations for a mean-field model of systemic risk. SIAM J. Financ. Math. 4(1), 151–184 (2013)
Lasry, J.M., Lions, P.L.: Mean-field games. Jpn. J. Math. 2, 229–260 (2007)
Shen, M., Turinici, G.: Liquidity generated by heterogeneous beliefs and costly estimations. Netw. Heterog. Media 7(2), 349–361 (2012)
Carmona, R., Delarue, F., Lachapelle, A.: Control of McKean–Vlasov dynamics versus mean field games. Math. Financ. Econ. 7(2), 131–166 (2013)
Andersson, D., Djehiche, B.: A maximum principle for SDEs of mean-field type. Dyn. Games Appl. 3, 537–552 (2013)
Bensoussan, A., Frehse, J., Yam, S.C.P.: The Master equation in mean field theory. J. Math. Pures Appl. 103(6), 1441–1474 (2015)
Carmona, R., Delarue, F.: The master equation for large population equilibriums. In: Crisan, D., Hambly, B., Zariphopoulou, T. (eds.) Stochastic Analysis and Applications 2014. Springer, Berlin (2014)
Bensoussan, A., Frehse, J.: Control and Nash games with mean-field effect. Chin. Ann. Math. Ser. B 34B(2), 161–192 (2013)
Bensoussan, A., Frehse, J., Yam, S.C.P.: Mean-Field Games and Mean-Field Type Control. Springer Briefs in Math. Springer, Berlin (2014)
Kolokoltsov, V., Troeva, M., Yang, W.: On the rate of convergence for the mean-field approximation of controlled diffusions with large number of players. Dyn. Games Appl. 4(2), 208–230 (2014)
Kolokoltsov, V., Yang, W.: Existence of solutions to path-dependent kinetic equations and related forward-backward systems. Open J. Optim. 2, 39–44 (2013)
Achdou, Y., Laurière, M.: On the system of partial differential equations arising in mean field type control, DCDS A (2015, in review)
Gangbo, W., Swiech, A.: Optimal transport and large number of particles. Discret. Cont. Dynam. Syst. A 34, 1397–1441 (2014)
Chan, P., Sircar, R.: Bertrand & Cournot mean-field games. Appl. Math. Optim. 71, 533 (2015)
Laurière, M., Pironneau, O.: Dynamic programming for mean-field type control. C. R. Acad. Sci. Serie I, 352(9), 707–713 (2014)
Lions, P.L.: Mean-Field Games. Cours au Collège de France (2007–2008). http://www.college-de-france.fr/site/pierre-louis-lions/course-2007-2008_1.htm
Annunziato, M., Borzì, A., Nobile, F., Tempone, R.: On the connection between the Hamilton–Jacobi–Bellman and the Fokker–Planck control frameworks. Appl. Math. 5, 2476–2484 (2014)
Guéant, O., Lasry, J.M., Lions, P.L.: Mean field games and applications. In: Paris-Princeton Lectures on Mathematical Finance. Lecture Notes in Math. Springer, Berlin (2011)
Le Bris, C., Lions, P.L.: Existence and uniqueness of solutions to Fokker–Planck type equations with irregular coefficients. Commun. Partial Differ. Equ. 33, 1272–1317 (2008)
Porretta, A.: Weak solutions to Fokker–Planck equations and mean field games. Arch. Ration. Mech. Anal. 216, 1–62 (2015)
Bally, V., Pagès, G., Printems, J.: A stochastic quantization method for nonlinear problems. Monte Carlo Methods Appl. 7(12), 21–34 (2001)
Neufeld, A., Nutz, M.: Nonlinear Lévy processes and their characteristics. Trans. Am. Math. Soc., forthcoming. See also arXiv:1401.7253v1
Achdou, Y., Pironneau, O.: Computational Methods for Option Pricing. SIAM Frontiers in Math. SIAM, Philadelphia (2005)
Polak, E.: Optimization Algorithms and Consistent Approximations. Springer, New York (1997)
Hecht, F.: New Development in Freefem++. J. Numer. Math. 20(3–4), 251–265 (2012)
Acknowledgments
The authors are grateful to Yves Achdou, Alain Bensoussan and Olivier Guéant for useful discussions.
Communicated by Nizar Touzi.
Laurière, M., Pironneau, O. Dynamic Programming for Mean-Field Type Control. J Optim Theory Appl 169, 902–924 (2016). https://doi.org/10.1007/s10957-015-0785-x