1 Introduction

Model predictive control (MPC), an open-loop approach in which a receding control horizon is considered to determine an optimal constrained control input \(u_t\) at each time step t, is now a well-established control synthesis method for deterministic dynamical systems described by linear models with quadratic cost functions, in particular in several industrial fields [8]. Extensions to more realistic situations, such as nonlinear systems and general constraint settings, have given rise to a considerable amount of work over the last twenty years (an almost exhaustive survey of recent developments and future promise in MPC has recently been proposed [36]). For example, to better account for the significant nonlinearities present in many real-life systems and to improve control quality, several extensions have progressively led to a set of approaches known as nonlinear model predictive control (NMPC), which try to overcome, more or less, the loss of convexity of the related constrained optimization problems, often by successive linearizations [34]. Even if these approaches have allowed some progress in control efficiency, they still face other control difficulties, such as those linked to random effects and/or uncertainties arising from system noise and imperfect state information (indirect measurements). In the last few years these difficulties have induced the development of robust approaches (see [31, 41, 42]) or stochastic approaches (see [30] and the recent survey [37]) and, still scarcely for the unobserved-state case, the introduction of state estimation steps into the control procedure (see particularly [10, 58] and [26]).
However, this last situation had already been considered in the context of operations research and stochastic optimal control, in particular in the case of finite state, observation and control spaces, for partially observed Markov decision processes (countable POMDPs [3]), for which finite and infinite horizon problems can be solved by so-called value iteration algorithms [33]. Continuous-state POMDPs are most often approximated as finite POMDPs by discretization [23], which may however result in discrete-state POMDPs of huge dimension that are hard to solve numerically. Other approximating approaches have also been proposed to address continuous-state POMDP problems (see for example [50] and [60] for variants of particle-filtering-based approximations).

As a matter of fact, stochastic uncertainty, added to imperfect state information for discrete-time systems in which the state variables \(X_t\) to be controlled are not directly accessible but only output variables \(Y_t\) instead, sets up a critical challenge for NMPC: at each time step t the control to be applied with respect to the given receding horizon has to be determined by the constrained minimization of an expected cost-to-go function depending on the possible induced future values of the state variables over the receding horizon. These anticipated possible values are ideally summarized by their probability distributions conditional on past observations and past control values, but also conditional on possible future control values over the receding horizon. This has led to ideal open-loop feedback controllers, which perform at least as well as optimal open-loop policies (see [4, 57]). However, apart from the linear-quadratic-Gaussian (LQG) special case, this theoretical approach raises two critical issues: the determination of the anticipated conditional state distributions, and the successive expected cost-to-go estimations and minimizations. This partitioning of the imperfect-state-information stochastic NMPC problem is the starting point for the development of some tentative suboptimal NMPC controllers, but the relevant literature is still extremely limited and more practically than theoretically oriented: several works rely on approximation of the conditional state distribution through Gaussian mixtures and approximate filtering (for example [56], in the case of a finite set of control inputs and time-varying state and observation models).
Under perfect knowledge of the system noise distributions, some particle-based approaches use particle collections (see [16]) to replace the conditional probability distributions of the state variables by particle filters (see [15, 22]), in order to estimate the cost-to-go expectations when needed during the successive minimizations (see [46], and more recently [48], which combines the so-called MPC scenario approach with a particle approach). Other works use particle filters for state estimation followed by a subsequent MPC optimization (see [1, 6]) or by another particle-based procedure such as sequential Monte Carlo (SMC) in the prediction/optimization step of the control determination (see [29, 49]). However, the issue of the consistency of all these particle-based estimations remains open. Moreover, when parameter estimation has to be performed in parallel with the control of the system, particle depletion phenomena may occur, due to the introduction of the unknown parameters as a state extension, and impair the control quality [21]. Last but not least, the crucial issue of the stochastic stability of the resulting closed-loop system, desirable as it may be for ensuring that feasible states are reached, has not yet received a comprehensive response even in its weakest forms (existence of an invariant measure, positive recurrence, etc.), even for stochastic NMPC under complete state information (see the acute recent analysis of [11]), a fortiori in the case of imperfect state information just mentioned, and still less when the noise distribution is also unknown.

This paper is devoted to the presentation of the main principles of a novel stochastic NMPC approach for systems described by time-varying nonlinear state-space and observation models, with unknown parameters and unknown but simulable system random effects. This approach is based on the use of convergent particle estimators of the multi-step-ahead conditional probability density functions (pdf) of the state variables to be controlled [52]. These convergent pdf predictors themselves rely on a new generation of so-called convolution or nonparametric particle filters, free of any particle depletion risk (see [9, 44, 45]). This particle approach, combined with a simulation-based estimation of the expected cost-to-go function, allows the construction of epi-convergent estimators of the successive cost-to-go expectations as the number of simulations and the number of particles grow to infinity. According to epi-convergence theory (see [2, 54, 55, 59]), the epi-convergence property of these cost expectation estimators itself ensures the almost sure convergence of the corresponding optimal control estimators to their true counterparts under regularity conditions.

The paper is organised as follows. The nonlinear modelling context to be considered and its set of assumptions are described in Sect. 2. The relevant NMPC problem is set up in Sect. 3. The almost sure convergence of the particle estimators of the expected costs, and of their minimizers, to their true respective counterparts is established in Sect. 4. In Sect. 5 a simulated case study of NMPC control in predictive microbiology is presented, which shows the efficiency of the proposed approach. The construction principles of the nonparametric estimator of the multi-step-ahead conditional pdf of the state, used in the particle generations during the predictive control process, are then presented in Appendix 1, preceded by a brief contrasting review of Monte Carlo particle filters. Finally, Appendix 2 is devoted to the proof of a technical lemma.

2 The Modelling Context

The dynamic systems of interest are supposed to obey general state space models of the form:

$$\begin{aligned} \left\{ \begin{array}{l} X_{t+1} \ = \ f_{t+1}(X_{t}, \theta , u_{t}, \varepsilon _{t+1}) \ \ \ \ \ \ \quad \hbox {(state equation)}\\ \\ Y_{t+1} \ \sim \ G_{t+1}(.| X_{t+1} = x_{t+1}, \theta ) \ \ \ \ \ \! \hbox {(observation equation)} \end{array}\right. \end{aligned}$$
(1)

in which \(X_t \in I\!\!R^d\) is the vector of the unobserved state variables, \(u_t \in \mathcal {U} \subset I\!\!R^q\) the vector of control variables, \(Y_t \in I\!\!R^s\) the vector of the observed output variables. \(\theta \in \Theta \subset I\!\!R^p\) is a vector of p known or unknown fixed parameters. \(\varepsilon _t\) is a vector of random variables (possibly noises). For all \(t \in I\!\!N^+\), \(f_t\) is a known Borel measurable function, and \(G_t\) is a probability distribution function, absolutely continuous with respect to the Lebesgue measure, with pdf \(p_t^Y(y|x_t,\theta )\). The probability distribution function of the state \(X_t\) at \(t=0\) and the transition distribution functions \(P_{t+1}^X(x|x_t,u_t)\) for \(t \ge 0\) are also supposed to be absolutely continuous w.r.t. the Lebesgue measure. The probability distribution function \(G_t\) and that of \(\varepsilon _t\) are not necessarily known but are supposed to be at least simulable. As a particular case of model (1), the output variable model can be given by a regression equation \(Y_{t+1} = h_{t+1}(X_{t+1}, \theta , \eta _{t+1})\) in which \(h_t\) is a known Borel measurable function and \(\eta _t\) is a vector of random variables (possibly noises) supposed to be at least simulable.
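Since model (1) only requires the noise and observation distributions to be simulable, one trajectory of such a system is straightforward to generate. The sketch below is purely illustrative: the scalar logistic-type map \(f\), the noise levels and the additive Gaussian observation model are hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, theta, u, eps):
    # hypothetical nonlinear state map f_{t+1}: logistic-type growth plus control
    return x + theta * x * (1.0 - x) + 0.1 * u + eps

def simulate(T, x0, theta, controls):
    """Simulate states X_t and outputs Y_t of a model of form (1)."""
    x = np.empty(T + 1)
    y = np.empty(T + 1)
    x[0] = x0
    y[0] = x0 + 0.05 * rng.standard_normal()            # observation of the initial state
    for t in range(T):
        eps = 0.02 * rng.standard_normal()              # simulable system noise eps_{t+1}
        x[t + 1] = f(x[t], theta, controls[t], eps)     # state equation
        y[t + 1] = x[t + 1] + 0.05 * rng.standard_normal()  # observation equation
    return x, y

x, y = simulate(T=50, x0=0.1, theta=0.8, controls=np.zeros(50))
```

Only the outputs y would be available to the controller; the states x are shown here solely because the simulation generates them.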

2.1 Primary Notations

Let

  • \(p_0^X(.)\) : the probability density of the state variable vector X at time \(t=0\), supposed to be known or simulable.

  • \(p_0^{\theta }(.)\) : a given prior density for \(\theta \in \Theta \) when unknown, nonzero at \(\theta ^*\), the true value of the parameters.

  • \({\mathcal L}_{\varepsilon _t}\) : the probability distribution function of \(\varepsilon _t\), at least simulable whatever t.

  • \(x_{1:t} \! := \! (x_1,\ldots , x_t)\), some realizations of the successive state vectors \( X_1,\ldots ,X_t\).

  • \(y_{1:t} := (y_1,\ldots , y_t), \) observed values of the output variables up to time t and \(u_{0:t} := (u_0,\ldots , u_t)\), controls applied until time t.

  • \(c_t(x,u) : \ I\!\!R^d \times I\!\!R^q \ \longrightarrow \ I\!\!R^+,\) the cost function at time t (\(t \in I\!\!N^+\)) for the predictive control problem to be considered, supposed to be continuous in both x and u.

  • \(p_{t+k}^X(x|y_{1:j}, u_{0:t+k-1}), \, 1 \le j \le t, \, k \ge 0 :\) the pdf of the state variables X at time \(t+k\), conditional on the past values \(y_{1:j}\) and \(u_{0:t+k-1}\), supposed to be continuous with respect to \(u_{0:t+k-1}\), and with corresponding probability distribution function denoted \(P_{t+k}^X(x|y_{1:j}, u_{0:t+k-1})\).

    The previous probabilistic assumptions ensure the existence of the conditional pdf \(p_{t+k}^X(x|y_{1:j}, u_{0:t+k-1})\). Indeed, with obvious notations:

    \( p_{t+k}^X(x|y_{1:j}, u_{0:t+k-1}) = \int p^X(x_{1:t+k}|y_{1:j}, u_{0:t+k-1})dx_1\ldots dx_{t+k-1}.\)

    By Bayes’ rule, \(\displaystyle p^X(x_{1:t+k}|y_{1:j}, u_{0:t+k-1}) = \frac{p^Y(y_{1:j}|x_{1:j})\,p^X(x_{1:t+k}|u_{0:t+k-1})}{p^Y(y_{1:j}|u_{0:j-1})}\)

    with \(p^Y(y_{1:j}|u_{0:j-1}) = \int p^Y(y_{1:j}|x_{1:j})p^X(x_{1:j}|u_{0:j-1})dx_1 \ldots dx_j\),

    \(\displaystyle p^Y(y_{1:j}|x_{1:j}) = \Pi _{1 \le \ell \le j} \, p^Y(y_{\ell }|x_{\ell })\) and

    \(p^X(x_{1:t+k}|u_{0:t+k-1}) = \int p_0^X(x_0) \, \Pi _{1 \le \ell \le t+k} \, p^X(x_{\ell }|x_{\ell -1},u_{\ell -1})dx_0, \hbox {or simply}\)

    \(\Pi _{1 \le \ell \le t+k} \, p^X(x_{\ell }|x_{\ell -1},u_{\ell -1})\) if \(x_0\) is known.

The computation of \(p_{t+k}^X(x|y_{1:j}, u_{0:t+k-1})\) is intractable in the general case. This pdf is at the core of the predictive control process and will be consistently estimated following a nonparametric particle approach described in Appendix 1.

2.2 Assumptions

  • A1: \(\mathcal {U} \subset I\!\!R^q, \) the set of admissible controls, is supposed to be compact.

  • A2: \(\forall x, \ \forall j: 0 < j \le t\), \(\hbox {E}\Big [c_{t+1}(X_{t+1},u_t)|y_{1:j},u_{0:t}\Big ] \ = \ \displaystyle \int c_{t+1}(x,u_t)p_{t+1}^X\big (x|y_{1:j}, u_{0:t}\big )dx \ < \ \infty \).

Remark 1

As mentioned in the introduction, the closed-loop stability issue in a general stochastic NMPC procedure has not yet received a definitive treatment and must be examined on a case-by-case basis (i.e. according to each system model setting). This is all the more true as imperfect state information and unknown random effects (noise distributions) add new complexities, as in the situation considered in the present paper. Therefore the nonparametric particle NMPC approach to be developed in the following, and the results derived from it, do not deal with this issue. However, for a given system model, the stability of the closed-loop system this approach leads to can in any case be investigated through more or less severe sufficient conditions on the system model and the control settings, such as the following one (from [18]):

  • A3: \(\displaystyle \sup _{t \in I\!\!N^{+}} \sup _{u \in \mathcal {U}} \hbox {E}_{{\mathcal L}_{\varepsilon _t} }\Big [\big \Vert f_t(x,\theta ,u,\varepsilon _t) \big \Vert ^a \Big ] \ \le \ \alpha \big \Vert x \big \Vert ^a \ + \ \beta \ \ \ \hbox {with} \ \ a>1, \ \ 0 \le \alpha < 1, \ \ 0 \le \beta < \infty .\)

    which implies that the system (1) is stabilized by any admissible control strategy (and in particular an optimal one): there exists a constant \(\kappa \) such that, whatever the initial state probability distribution and whatever the admissible strategy considered, it holds:

    $$\begin{aligned} \limsup _{T \rightarrow \infty } \frac{1}{T+1} \sum _{t=0}^T \Vert X_t \Vert ^2 \le \kappa \ \ \ \ \ a.s. \end{aligned}$$

    Moreover, for all \( \xi > 0\), there exists a compact \(\Phi \) such that:

    $$\begin{aligned} \liminf _{T \rightarrow \infty } \frac{1}{T+1}\sum _{t=0}^T 1\!\!1_{[X_t \in \Phi ]} \ge 1-\xi \ \ \ \ \hbox {a.s.} \ \ (\hbox {from }[18]). \end{aligned}$$

    This sufficient condition will be considered in the proposed case study (Sect. 5).
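The drift condition A3 can be illustrated on a toy scalar system \(X_{t+1} = 0.8\,X_t + u_t + \varepsilon_{t+1}\) with \(\varepsilon_t \sim \mathcal{N}(0,1)\) and \(|u_t| \le 1\), for which a short computation shows A3 holds with \(a=2\) and suitable \(\alpha < 1\), \(\beta < \infty\). The sketch below (all numerical values are illustrative assumptions, not from the paper) checks empirically that the time average of \(\Vert X_t \Vert^2\) stays bounded along an arbitrary admissible control sequence, as the first displayed bound asserts:

```python
import numpy as np

rng = np.random.default_rng(5)

T = 20_000
x = 10.0            # deliberately large initial state
avg = 0.0
for t in range(T):
    u = np.sin(t)                          # an arbitrary admissible control, |u| <= 1
    x = 0.8 * x + u + rng.standard_normal()  # toy state equation satisfying A3
    avg += x * x
avg /= T
# the time average (1/T) * sum_t ||X_t||^2 remains below some finite kappa
```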

3 The Predictive Control Problem

Let us consider the system at time \(j \ge 1\), the instant up to which the controls \(\{u_0,u_1,\ldots ,u_{j-1}\}\) have been applied and the observations \(\{y_1,y_2,\ldots ,y_j\}\) have been recorded.

Let us denote

  • \(v_{j:t} := v_j,v_{j+1},\ldots ,v_t\), unknown future controls to be applied until a future time t, \(t~\ge ~j\).

  • \(Y_{j+1:t+1} := Y_{j+1},Y_{j+2},\ldots ,Y_t, Y_{t+1}\), the corresponding future observations.

For a given receding horizon length H, let

  • \( {\mathbf {v}} = v_{j:j+H-1}. \)

  • Define the expected cost-to-go

    $$\begin{aligned}&J_H(\mathbf {v}) := \nonumber \\&\quad \hbox {E}\left\{ \sum _{t=j}^{j+H-1} \hbox {E}\Big [c_{t+1}(X_{t+1}, v_t) \Big | y_{1:j},Y_{j+1:t+1},u_{0:j-1},v_{j:t}\Big ]\right\} \nonumber \\&\quad = \ \sum _{t=j}^{j+H-1} \hbox {E}\Big [c_ {t+1}(X_{t+1},v_t) \Big | y_{1:j},u_{0:j-1},v_{j:t}\Big ]. \end{aligned}$$
    (2)

Suppose that all expectations in (2) can be evaluated. As time goes on, a classic sliding-horizon control procedure (see [3, 8]) would proceed as follows:

  1. Find \(\displaystyle \mathbf {v}^*= v_j^*,\ldots ,v_{j+H-1}^*= \hbox {arginf}_{v_j,\ldots ,v_{j+H-1}} J_H(\mathbf {v})\), with \(v_k \in \mathcal {U}, \ k=j,\ldots ,j+H-1\).

  2. Apply control \(v_j^*\) to the system.

  3. Get the new observation \(y_{j+1}\).

  4. Let \(j=j+1\).

  5. Go back to step 1.

Given \( u_0,\ldots ,u_{j-1}\) and the observations \(y_1,\ldots , y_j\), the exact computation of \(J_H(\mathbf {v})\) is generally not feasible, nor is its exact minimization with respect to \(\mathbf {v}\). However, based on particle simulation approaches and the theory of epi-convergence, convergent estimators of \(J_H(.)\) and of its minimizers \(\mathbf {v}^*\) can be obtained, as shown in the next section.
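The sliding-horizon steps above can be sketched schematically. For simplicity the sketch assumes a perfectly observed scalar state, a small discretized control grid and a crude Monte Carlo estimate of the expected cost-to-go; in the paper's setting both simplifications are removed by the particle machinery of Sect. 4. All models and constants are hypothetical:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

U = np.linspace(-1.0, 1.0, 5)    # discretized compact admissible set
H, M = 2, 100                    # receding horizon length, Monte Carlo draws

def step(x, u, eps):
    return 0.9 * x + u + eps     # toy state equation

def cost(x, u):
    return x**2 + 0.1 * u**2     # toy stage cost c_{t+1}

def J_hat(x, v):
    """Crude Monte Carlo estimate of J_H(v) from the current state x."""
    total = 0.0
    for _ in range(M):
        xs = x
        for u in v:
            xs = step(xs, u, 0.05 * rng.standard_normal())
            total += cost(xs, u)
    return total / M

x, applied = 2.0, []
for j in range(10):
    # step 1: minimize the estimated expected cost-to-go over the control grid
    v_star = min(itertools.product(U, repeat=H), key=lambda v: J_hat(x, v))
    # step 2: apply only the first control of the optimal sequence
    x = step(x, v_star[0], 0.05 * rng.standard_normal())
    applied.append(v_star[0])    # steps 3-5: observe, shift j, repeat
```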

Remark 2

As regards the unknown system model parameters \(\theta \) :

Their convergent filtering estimation can be performed simultaneously with the filtering of the state variables by the convolution filter procedure to be used in the control operation, after a classic state extension of model (1) (see Appendix 1).

Remark 3

Particular case: tracking control

Let \(\{x_t^*\}\) be a given reference trajectory for the system dynamics. Let us suppose that the system obeys the following particular form of state equation in model (1):

$$\begin{aligned} X_{t+1} \ = \ f_{t+1}(X_{t}, \theta , u_{t}) \ + \ \varepsilon _{t+1}. \end{aligned}$$
(3)

with \(\hbox {E}(\varepsilon _{t+1}) = 0_{(d \times 1)}\) and \(\hbox {Var}(\varepsilon _{t+1}) = \Lambda _{{t+1}_{(d \times d)}}\).

An appropriate simple cost function is then given by the quadratic discrepancy

$$\begin{aligned} c_{t+1}(x_{t+1}, u_t) \ = \ \Vert x_{t+1} - x_{t+1}^*\Vert ^2. \end{aligned}$$
(4)

Let \( \ \psi _{t+1} = f_{t+1}(X_{t}, \theta , u_{t}) - x_{t+1}^*\).

Then, \( \ \Vert X_{t+1} - x_{t+1}^*\Vert ^2 \ = \ \Vert \varepsilon _{t+1} \Vert ^2 \ + \ \Vert \psi _{t+1}\Vert ^2 \ + \ 2\psi _{t+1}^T \varepsilon _{t+1},\) and

$$\begin{aligned} J_H(\mathbf {v})= & {} \sum _{t=j}^{j+H-1} \hbox {E} \Big [\Vert X_{t+1} - x_{t+1}^*\Vert ^2 | y_{1:j}, u_{0:j-1}, v_{j:t}\Big ] \nonumber \\&= \ \sum _{t=j}^{j+H-1} \hbox {E} \Big [\Vert \varepsilon _{t+1} \Vert ^2 | y_{1:j}, u_{0:j-1}, v_{j:t}\Big ] \ + \ \sum _{t=j}^{j+H-1}\hbox {E}\Big [\Vert \psi _{t+1} \Vert ^2 | y_{1:j}, u_{0:j-1}, v_{j:t}\Big ] \ \nonumber \\&\quad \ + \ 2 \sum _{t=j}^{j+H-1} \hbox {E}\Big [\psi _{t+1}^T \varepsilon _{t+1} | y_{1:j}, u_{0:j-1}, v_{j:t}\Big ] \nonumber \\&= \ \sum _{t=j}^{j+H-1} Tr\Big [\Lambda _{t+1} \Big ] \ + \sum _{t=j}^{j+H-1} \hbox {E}\Big [\Vert f_{t+1}(X_t, \theta , v_t) - x_{t+1}^*\Vert ^2 | y_{1:j}, u_{0:j-1}, v_{j:t}\Big ]. \nonumber \\ \end{aligned}$$
(5)

Applying Jensen’s inequality, it follows

$$\begin{aligned} J_H(\mathbf {v})\ge & {} \sum _{t=j}^{j+H-1} Tr\Big [\Lambda _{t+1} \Big ] + \sum _{t=j}^{j+H-1} \Big \Vert \hbox {E}\Big [f_{t+1}(X_t, \theta , v_t) | y_{1:j}, u_{0:j-1}, v_{j:t} \Big ] - x_{t+1}^*\Big \Vert ^2 \nonumber \\= & {} \sum _{t=j}^{j+H-1} Tr\Big [\Lambda _{t+1} \Big ] \ + \sum _{t=j}^{j+H-1} \Big \Vert \hbox {E}\Big [X_{t+1} | y_{1:j}, u_{0:j-1}, v_{j:t} \Big ] - x_{t+1}^*\Big \Vert ^2. \end{aligned}$$
(6)

Equation (5) gives the decomposition of \(J_H\) corresponding to the quadratic cost (4) and shows the pure expected quadratic error reduction performed by the minimization of this cost expectation. Moreover, (6) shows the link between the criterion \(J_H\) with quadratic cost and another relevant criterion of the predictive least-squares type, which is also reduced by the minimization of \(J_H\).
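The decomposition (5) can be checked numerically in a one-step scalar case. The sketch below treats \(X_t\) as known (standing in for the conditioning) and uses a hypothetical linear \(f_{t+1}\); it verifies that \(\hbox {E}\Vert X_{t+1}-x^*\Vert ^2 = \hbox {Tr}(\Lambda ) + \Vert f_{t+1}(x_t,\theta ,u)-x^*\Vert ^2\):

```python
import numpy as np

rng = np.random.default_rng(2)

M = 200_000
x_t, x_star, sigma, u = 1.0, 0.5, 0.3, 0.1   # here Lambda = sigma^2 (scalar case)

def f(x, u):
    return 0.9 * x + u                        # hypothetical f_{t+1}

eps = sigma * rng.standard_normal(M)          # centred noise, Var = Lambda
x_next = f(x_t, u) + eps                      # state equation (3)

lhs = np.mean((x_next - x_star) ** 2)         # Monte Carlo E||X_{t+1} - x*||^2
rhs = sigma**2 + (f(x_t, u) - x_star) ** 2    # Tr(Lambda) + ||psi_{t+1}||^2
# lhs and rhs agree up to Monte Carlo error, as in (5)
```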

In the sequel the cost function \(c_t(x,u)\) will be considered in its general form.

4 Convergent Estimators of the Expected Cost-to-Go Function and Its Minimizers, to Their True Counterparts

In this central section, an estimator of the expected cost-to-go \(J_H(\mathbf {v})\) is proposed (Sect. 4.2), built from a particle estimator of the conditional predictive pdf of the state variables (Sect. 4.1). This expected cost-to-go estimator, \(J_H^m(\mathbf {v})\), is then shown to converge pointwise almost surely to its true counterpart \(J_H(\mathbf {v})\) as m grows to infinity, whatever \(\mathbf {v}\) (Sect. 4.3). From that, the function \(J_H^m(.)\) is shown to converge to \(J_H(.)\) in the so-called epi-convergent mode as m grows (Sect. 4.4), which in the present case ensures the almost sure convergence of the \(J_H^m(.)\) minimizers to the set of minimizers of \(J_H(.)\), and that of the corresponding minima to their true counterparts.

The first step of this construction is the introduction of a convergent estimator, with sufficiently good properties, of any multi-step-ahead conditional predictive pdf \(p_{t+k}^X(x|y_{1:j},u_{0:t+k-1})\) of the state vector,   \(\forall j > 0, \ \forall t \ge j, \ \forall k \ge 0.\)

4.1 A Convergent Particle Estimator of the Conditional Predictive pdf of the State Variables

A convergent n-particle estimator of the multi-step ahead conditional pdf of the state vector has recently been proposed (see [52, 53]), such that under reasonable conditions: \(\forall j > 0, \ \forall t \ge j, \ \forall k \ge 0, \ \)

  • $$\begin{aligned} \lim _{n \rightarrow \infty } \Big \Vert p^{n,X}_{t+k}(x|y_{1:j},u_{0:t+k-1}) - p_{t+k}^X(x|y_{1:j},u_{0:t+k-1})\Big \Vert _{L_1} \ = \ 0 \ \ \ \ a.s. \end{aligned}$$
    (7)
  • $$\begin{aligned} \forall x, \ \lim _{n \rightarrow \infty } p^{n,X}_{t+k}(x|y_{1:j},u_{0:t+k-1}) \ = \ p_{t+k}^X(x|y_{1:j},u_{0:t+k-1}) \ \ \ \ a.s. \end{aligned}$$
    (8)

in which \(p_{t+k}^{n,X}(x|y_{1:j},u_{0:t+k-1})\) is an n-particle estimator of the true \((t+k-j)\)-step ahead conditional pdf \(p_{t+k}^X(x|y_{1:j},u_{0:t+k-1})\) of the state (see Appendix 1), with corresponding probability distribution function denoted \(P_{t+k}^{n,X}(x|y_{1:j},u_{0:t+k-1})\).
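The flavour of such a particle pdf estimator can be conveyed by a minimal one-step-ahead sketch. The linear state map, the noise levels and the Gaussian kernel below are illustrative assumptions only; the actual convolution-filter construction, including the conditioning on the observations, is detailed in Appendix 1. Particles approximating the current filtering distribution are pushed through the state equation and the resulting cloud is smoothed by a kernel:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 5_000
theta, u = 0.9, 0.2

# particles approximating the current filtering distribution (assumed given here)
particles = rng.normal(1.0, 0.2, size=n)

# one-step-ahead prediction: push each particle through the state equation
pred = theta * particles + u + 0.1 * rng.standard_normal(n)

h = 1.06 * pred.std() * n ** (-0.2)   # Silverman-type kernel bandwidth

def p_hat(x):
    """Kernel (convolution) estimate of the one-step-ahead predictive pdf."""
    z = (x - pred[:, None]) / h
    return np.mean(np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi), axis=0) / h

grid = np.linspace(0.2, 2.0, 1001)
dens = p_hat(grid)
```

In this Gaussian toy case the true predictive pdf is normal with mean 1.1, so the estimated density should integrate to one and peak near that value.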

As said previously, the possibly unknown parameters \(\theta \) can be treated in the same way as the state variables X, with equivalent results for their conditional pdf estimation, by classic state extension (Appendix 1).

4.2 A Particle Estimator of the Expected Cost-to-Go Function \(J_H(.)\)

For \(t \ge j\), let:

  • Q(x): an absolutely continuous probability distribution function with density q(x), which dominates the distribution \( P_{t+1}^X(x|y_{1:j},u_{0:j-1}, v_{j:t})\) and also the distribution \( P_{t+1}^{n,X}(x|y_{1:j},u_{0:j-1}, v_{j:t})\) for n sufficiently large. This last assumption is all the more plausible as n tends to infinity, because of (7) and (8). The pdf q(x) will be used to perform changes of probability measure that simplify the subsequent convergence proofs in Sect. 4.4 (see also [20]).

  • \( \mathbf {X} := \ X_{j+1 : j+H}. \)

  • \(\mathcal {X} := x_{j+1:j+H},\) a realization of \(\mathbf {X}\).

  • \(\displaystyle \sigma _{t+1}(x,v_{j:t}) := c_{t+1}(x,v_t)\frac{p_{t+1}^X\big (x|y_{1:j}, u_{0:j-1}, v_{j:t}\big )}{q(x)}.\)

  • \(S(\mathbf {X}, \mathbf {v}) \ := \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},v_{j:t}).\)

  • \(\hbox {E}_q[.]\), the expectation operator with respect to q(x).

  • $$\begin{aligned} \displaystyle J_H(\mathbf {v}):= & {} \sum _{t=j}^{j+H-1} \hbox {E}_{p_{t+1}^X}\Big [ c_{t+1}(X_{t+1},v_t) \Big | y_{1:j}, u_{0:j-1},v_{j:t}\Big ] \\ \displaystyle= & {} \sum _{t=j}^{j+H-1} {\hbox {E}}_q \Big [\sigma _{t+1}(X_{t+1},v_{j:t})\Big ] \ = \ {\hbox {E}}_{q^H}\Big [S(\mathbf {X},\mathbf {v})\Big ]. \end{aligned}$$

The number n of particles used in the estimation of the state conditional pdfs of interest will be taken as a chosen increasing function of m, the number of draws in the simulation procedure defined just below: \(n = n(m)\). Beyond this single growth constraint, the choice of the function n(.) is immaterial to all the convergence results to follow.

Then let :

  • $$\begin{aligned} \displaystyle \sigma ^m_{t+1}(x,v_{j:t}) := c_{t+1}(x,v_t)\frac{p_{t+1}^{n(m),X}\big (x|y_{1:j}, u_{0:j-1}, v_{j:t}\big )}{q(x)}. \end{aligned}$$
  • $$\begin{aligned} S^m(\mathbf {X},\mathbf {v}) \ := \ \sum _{t=j}^{j+H-1} \sigma ^{m}_{t+1}(X_{t+1},v_{j:t}). \end{aligned}$$
  • $$\begin{aligned} \bar{\sigma }_{t+1}^{m} \ := \ \frac{1}{m} \sum _{i=1}^m\sigma ^m_{t+1}(X^i_{t+1},v_{j:t}), \ \hbox {with} \ X_{t+1}^i \sim \, q(x), \ i=1,\ldots ,m. \end{aligned}$$
  • $$\begin{aligned} \mathbf {X}^i := X_{j+1}^i,\ldots ,X_{j+H}^i \ (\hbox {with} \ \mathcal {X}^i := x_{j+1}^i,...,x_{j+H}^i \, \hbox {a realization of}\,\,\, \mathbf {X}^i). \end{aligned}$$
  • $$\begin{aligned} J_H^{m}(\mathbf {v}):= & {} \mathop \sum \nolimits _{t=j}^{j+H-1}\bar{\sigma }_{t+1}^{m} \ = \ \frac{1}{m} \mathop \sum \nolimits _{i=1}^m \mathop \sum \nolimits _{t=j}^{j+H-1} \sigma ^m_{t+1}(X^i_{t+1},v_{j:t}) \\= & {} \frac{1}{m} \mathop \sum \nolimits _{i=1}^m S^m(\mathbf {X}^i,\mathbf {v}). \end{aligned}$$

Let us note that, given \(\mathbf {v}\), the approximated cost-to-go expectation \(J_H^{m}(\mathbf {v})\) is a random variable which depends on the set of mH draws \({X}_{t+1}^i, \, i=1,\ldots , m, \ t=j,\ldots ,j+H-1\), generated from the pdf q(x), and on the \(j+H\) sets of n(m) particles generated from \(t=0\) to obtain the predictive conditional pdf estimates \(p_{t+1}^{n(m),X}(x|y_{1:j},u_{0:j-1},v_{j:t}), \ t=j,\ldots ,j+H-1\) (see Appendix 1).
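The estimator \(J_H^m\) is, in essence, an importance-sampling average: draws from q are weighted by the ratio of the (estimated) predictive pdf to q, times the cost. The one-step (H = 1) sketch below substitutes a closed-form Gaussian for the particle estimate \(p^{n(m),X}_{t+1}\) so that the exact value is available for comparison; all densities and the cost function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

m = 100_000
mu, s = 1.1, 0.2       # stands in for the predictive pdf p_{t+1}^X (Gaussian here)
q_mu, q_s = 1.0, 0.5   # dominating proposal q, deliberately wider than p

def norm_pdf(x, loc, scale):
    return np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def c(x, v):
    return (x - 0.8) ** 2 + 0.1 * v**2     # hypothetical stage cost c_{t+1}

v = 0.3
X = q_mu + q_s * rng.standard_normal(m)    # X^i ~ q, i = 1, ..., m
# J_H^m for H = 1: average of sigma_{t+1}(X^i, v) = c * p / q over the draws
J_m = np.mean(c(X, v) * norm_pdf(X, mu, s) / norm_pdf(X, q_mu, q_s))

# exact J_H in this Gaussian case: E_p[(X - 0.8)^2] + 0.1 v^2
J_exact = (mu - 0.8) ** 2 + s**2 + 0.1 * v**2
```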

4.3 Almost Sure Convergence of the Expected Cost-to-Go Estimator

Theorem 4.1

Under the assumptions of Sect. 2,

$$\begin{aligned} \forall \mathbf {v} \in \mathcal {U}^H \subset I\!\!R^{H \times q}, \ \ \ \ \ J_H^{m}(\mathbf {v}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ J_H(\mathbf {v}) \ \ \ \ a.s. \end{aligned}$$
(9)

Proof

For \( j \le t \le j+H-1\) and fixed \(\mathbf {v}\), let us denote, for brevity and without ambiguity: \(\sigma _{x,t+1} := \sigma _{t+1}(x, v_{j:t})\), \(\sigma ^m_{x,t+1} := \sigma ^m_{t+1}(x, v_{j:t})\), \(\sigma _{i,t+1} := \sigma _{t+1}(X^i_{t+1}, v_{j:t})\)   and   \(\sigma ^m_{i,t+1} := \sigma ^m_{t+1}(X^i_{t+1}, v_{j:t})\).

From (8), \(\{\sigma ^m_{x,t+1}\}\) is a sequence of measurable functions which converges pointwise a.s. to \(\sigma _{x,t+1}\) with m for all x, then also Q-almost-everywhere a.s. As Q is a finite measure, one has by Egoroff’s theorem:

\(\forall \delta > 0, \exists E_\delta \subset I\!\!R^d\) with \(Q(E_\delta ) < \delta \), such that \(\sigma ^m_{x,t+1}\) converges to \(\sigma _{x,t+1}\) uniformly a.s. with m on \(I\!\!R^d \! \setminus \! E_\delta \) (the complement of \(E_\delta \) in \( I\!\!R^d\)). Note that for a given \(\delta \) there exist infinitely many such subsets \( E_\delta \).

Then \(\forall \delta > 0, \displaystyle \sup _{x \in I\!\!R^d \setminus E_\delta } \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ 0,\)

and \( g_m(\delta ) = \inf _{E_\delta } \sup _{x \in I\!\!R^d \setminus E_\delta } \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ 0 .\)

Now let \( 0< L_1< L_2 < \infty \) and \(\Delta = ]0, L_2]\). We have

$$\begin{aligned} \displaystyle \sup _{L_1 \le \delta \le L_2} g_m(\delta ) \ = \ \inf _{E_{L_1}} \sup _{x \in I\!\!R^d \setminus E_{L_1}} \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ 0 . \end{aligned}$$
(i):

\(g_m(\delta )\) converges uniformly a.s. to 0 as m grows to \(\infty \) on \([L_1, L_2]\) whatever \(L_1 > 0\), with \(L_1 < L_2\).

(ii):

\(\{0\}\) is adherent to \(\Delta \).

(iii):
$$\begin{aligned} g_m(\delta ) \ = \ \inf _{E_\delta }\sup _{x \in I\!\!R^d \setminus E_\delta } \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ \ {\mathop {\longrightarrow }\limits ^{\delta \rightarrow 0}} \ \ \ell _m = \lim _{\delta \rightarrow 0} \inf _{E_\delta } \sup _{x \in I\!\!R^d \setminus E_\delta } \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ \le \ \sup _{x \in I\!\!R^d} \mid \sigma ^m_{x,t+1} - \sigma _{x,t+1} \mid \ < \ \infty . \end{aligned}$$

Then, by the Moore–Osgood theorem on exchanging limits:

$$\begin{aligned} \displaystyle \lim _{m \rightarrow \infty } \lim _{\delta \rightarrow 0} g_m(\delta ) \ = \ \lim _{\delta \rightarrow 0} \lim _{m \rightarrow \infty } g_m(\delta ) \ = \ 0 \quad a.s. \end{aligned}$$
(10)

and

$$\begin{aligned}&\displaystyle \lim _{m \rightarrow \infty } \left| \frac{1}{m} \sum _{i=1}^m(\sigma ^m_{i,t+1} - \sigma _{i,t+1}) \right| \ \le \nonumber \\&\lim _{m \rightarrow \infty } \max _{i=1,\ldots ,m} \vert \sigma ^m_{i,t+1} - \sigma _{i,t+1} \vert \displaystyle \ \le \lim _{m \rightarrow \infty } \lim _{\delta \rightarrow 0} g_m(\delta ) = 0 \\&\displaystyle \Longrightarrow \ \frac{1}{m} \sum _{i=1}^m \sigma _{i,t+1}^m \nonumber \\&{\mathop {\simeq }\limits ^{m \rightarrow \infty }} \ \frac{1}{m} \sum _{i=1}^m \sigma _{i,t+1} \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \hbox {E}_q[\sigma _{t+1}(X_{t+1},v_{j:t})] \quad \hbox { a.s.} \nonumber \\&\qquad \hbox { (by the strong law of large numbers)}\nonumber \\&= \ \hbox {E}_{p_{t+1}^X}\Big [ c_{t+1}(X_{t+1},v_t) \Big | y_{1:j}, u_{0:j-1},v_{j:t}\Big ].\nonumber \end{aligned}$$
(11)

Finally,

$$\begin{aligned}&\displaystyle J_H^m(\mathbf {v}) = \! \sum _{t=j}^{j+H-1} \! \! \frac{1}{m} \sum _{i=1}^m \sigma ^m_{t+1}(X_{t+1}^i, v_{j:t}) \\&{\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \sum _{t=j}^{j+H-1} \! \hbox {E}_{p_{t+1}^X}\Big [ c_{t+1}(X_{t+1},v_t) \Big | y_{1:j}, u_{0:j-1},v_{j:t}\Big ] \\&= \ J_H(\mathbf {v}) \ \ \hbox {a.s.} \end{aligned}$$

\(\square \)

4.4 Almost Sure epi-Convergence Results

Simple pointwise convergence of a sequence of deterministic or random functions \(\{F^m(.)\}\) to a limit function F(.) does not necessarily imply the convergence of the corresponding sequence of minima to a minimum of the limit function F(.). The so-called epi-convergence (see [2, 54, 55]), which is essentially a one-sided locally uniform convergence combined with a weak pointwise convergence on the other side (see [19]), is then useful to establish this result. The sequence of functions \(\{F^m(.)\}\) is said to epi-converge to the function F(.) if and only if the sequence of corresponding epigraphs converges to the epigraph of F(.). Actually, if \(\{F^m(.)\}\) epi-converges to F(.) and \(\mathbf {v}^m\) minimizes \(F^m(.)\), then any cluster point of the sequence \(\{\mathbf {v}^m\}\) is a minimizer of F(.) (see [19]). Moreover, the corresponding optimal values also converge (see [12]).
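A minimal numerical illustration of minimizer convergence: the sequence \(F^m(v) = (v - 1 - 1/m)^2 + 1/m\) converges uniformly (hence epi-converges) to \(F(v) = (v-1)^2\), and its minimizers \(1 + 1/m\) and minima \(1/m\) indeed converge to those of F. This is a toy sketch on a grid, not the paper's \(J_H^m\), and a genuinely one-sided epi-convergence example would require more machinery:

```python
import numpy as np

grid = np.linspace(-2.0, 2.0, 4001)              # discretized search space

def F(v):
    return (v - 1.0) ** 2                        # limit function, minimizer v* = 1

def F_m(v, m):
    return (v - 1.0 - 1.0 / m) ** 2 + 1.0 / m    # converges uniformly to F

ms = (1, 10, 100, 1000)
argmins = [grid[np.argmin(F_m(grid, m))] for m in ms]
minima = [F_m(grid, m).min() for m in ms]
# minimizers 1 + 1/m approach 1, and minima 1/m approach 0
```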

Now, \(\mathbf {v} := v_{j:j+H-1} \in \mathcal {U}^H \subset I\!\!R^{H \times q}\) which is a separable metric space. Then, let

  • \(\mathcal {B}_c := \{B_1, B_2, \ldots \}\), a countable basis of open sets of \(\mathcal {U}^H\) for the topology of \(\mathcal {U}^H\) induced by the usual topology of \(I\!\!R^{H \times q}\).

  • \(\mathcal {N}(\mathbf {v})\): the set of open neighborhoods of the point \(\mathbf {v}\).

  • \(\mathcal {N}_c(\mathbf {v}) := \mathcal {B}_c \bigcap \mathcal {N}(\mathbf {v})\), the set of neighborhoods in the countable basis.

  • \(\mathbf {v}^k \in B_k \in \mathcal {B}_c\), such that

    $$\begin{aligned} \displaystyle J_H(\mathbf {v}^k) \le \inf _{\mathbf {w} \in B_k} J_H(\mathbf {w}) + \frac{1}{k}, \ \ \forall k \in I\!\!N^+. \end{aligned}$$
    (12)
  • \( \mathcal {U}^H_c := \{\mathbf {v}^1, \mathbf {v}^2, \ldots , \mathbf {v}^k,\ldots \}\).

According to the standard approach (see [12]), the epi-convergence of \(J_H^{m}\) to \(J_H\) as m grows to infinity will be established if it can be shown that the epi-limits superior and inferior of \(\{J_H^m\}\) are both equal to \(J_H\) a.s., or equivalently: \(\forall \mathbf {v} \in \mathcal {U}^H\),

$$\begin{aligned} J_H(\mathbf {v}) \ \ge \sup _{B \in \mathcal {N}_c(\mathbf {v})} \limsup _{m \rightarrow \infty } \inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}) \ \ \ \hbox {(epi-limit superior)} \ \ \end{aligned}$$
(13)

and

$$\begin{aligned} J_H(\mathbf {v}) \ \le \sup _{B \in \mathcal {N}_c(\mathbf {v})} \liminf _{m \rightarrow \infty } \inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}) \ \ \ \ \hbox {(epi-limit inferior)} \ \ \end{aligned}$$
(14)

Theorem 4.2

\(J_H^m(.)\) epi-converges to \(J_H(.)\) almost surely as m grows to \(\infty \).

The following lemmas will be needed.

Lemma 4.3

\(\displaystyle J_H(.)\) is lower semi-continuous at \(\mathbf {v}\), \(\forall \mathbf {v} \in \mathcal {U}^H\).

Proof

If \(\mathbf {v}^{\ell } {\mathop {\longrightarrow }\limits ^{{\ell } \rightarrow \infty }} \mathbf {v}\), then

$$\begin{aligned} \displaystyle J_H(\mathbf {v})= & {} \hbox {E}_{q^H}\Big [S(\mathbf {X},\mathbf {v})\Big ] \ = \ \hbox {E}_{q^H}\Big [\sum _{t=j}^{j+H-1}\sigma _{t+1}(X_{t+1},v_{j:t})\Big ] \nonumber \\&= \ {\hbox {E}}_{q^H}\Big [ \liminf _{\ell } \sum _{t=j}^{j+H-1}\sigma _{t+1}(X_{t+1},v^{\ell }_{j:t})\Big ] \ \ \ \hbox {(by continuity of }p_{t+1}) \nonumber \\&\le \ \liminf _{\ell } {\hbox {E}}_{q^H}\Big [\sum _{t=j}^{j+H-1}\sigma _{t+1}(X_{t+1},v^{\ell }_{j:t})\Big ] \ \ \ \hbox {(by Fatou's lemma)} \nonumber \\&= \ \liminf _{\ell } J_H(\mathbf {v}^{\ell }). \end{aligned}$$

\(\square \)

Lemma 4.4

$$\begin{aligned} \forall (\mathcal {X},\mathbf {v}), \ \ S^m(\mathcal {X},\mathbf {v}) \ \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ \ S(\mathcal {X},\mathbf {v}) \ \ \ \ a.s. \end{aligned}$$
(15)

Proof

From (8), \(p^{n(m),X}_{t+1}(x|y_{1:j},u_{0:j-1},v_{j:t}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ p_{t+1}^{X}(x|y_{1:j},u_{0:j-1},v_{j:t})\) a.s. (see Appendix 1), from which (15) follows. \(\square \)

Lemma 4.5

\(S^m(\mathcal {X}, . )\) epi-converges almost surely to \(S(\mathcal {X}, . )\) for all \(\mathcal {X}\) as m grows to infinity.

Proof

See Appendix 2.

Lemma 4.6

\(\forall B \in \mathcal {B}_c\),

$$\begin{aligned} \displaystyle \frac{1}{m}\sum _{i=1}^m \inf _{\mathbf {v} \in B} \sum _{t=j}^{j+H-1} \sigma ^m_{t+1}(X^i_{t+1},v_{j:t}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} {\hbox {E}}_{q^H}\Big [ \inf _{\mathbf {v} \in B} \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},v_{j:t})\Big ] \ \ \ a.s. \nonumber \\ \end{aligned}$$
(16)

or more compactly \( \displaystyle \frac{1}{m}\sum _{i=1}^m \inf _{\mathbf {v} \in B} S^m(\mathbf {X}^i,\mathbf {v}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} {\hbox {E}}_{q^H}\Big [ \inf _{\mathbf {v} \in B} S(\mathbf {X},\mathbf {v}) \Big ] \ \ \ a.s.\)

Proof

Let \(\displaystyle Z(\mathcal {X}) = \inf _{\mathbf {v} \in B}S(\mathcal {X},\mathbf {v})\) and \(\displaystyle Z^m(\mathcal {X}) = \inf _{\mathbf {v} \in B}S^m(\mathcal {X},\mathbf {v})\). As a consequence of Lemma 4.5, \(\{Z^m(\mathcal {X})\}\) is a sequence of functions which converges pointwise (a.s.), and hence also \(Q^H\)-almost-everywhere (a.s.), to \(Z(\mathcal {X})\) as m grows, \(\forall \mathcal {X} \in I\!\!R^{H \times d}\). Moreover the functions \(\{Z^m(\mathcal {X})\}\) are measurable (as are the functions \(\{S^m(\mathcal {X},\mathbf {v})\}\) for all \(\mathbf {v}\)), since the infimum operation preserves measurability here. The rationale of the proof of Theorem 4.1 can then be reused with \(Z(\mathcal {X}),\, Z^m(\mathcal {X}), \, \mathcal {X}\) and \(\mathbf {X}^i\) in place of \(\sigma _{x,t+1}, \, \sigma ^m_{x,t+1}, \, x\) and \(X^i_{t+1}\) respectively.

With the same notation, and the same applications of Egoroff’s and Moore–Osgood’s theorems as previously, we have:

$$\begin{aligned}&\displaystyle \lim _{m \rightarrow \infty } \Big | \frac{1}{m} \sum _{i=1}^m(Z^m(\mathbf {X}^i) - Z(\mathbf {X}^i))\Big | \ \le \ \lim _{m \rightarrow \infty } \frac{1}{m}\sum _{i=1}^m\Big |Z^m(\mathbf {X}^i) - Z(\mathbf {X}^i)\Big |\\&\displaystyle \le \lim _{m \rightarrow \infty } \max _{i=1,\ldots ,m} \Big | (Z^m(\mathbf {X}^i) - Z(\mathbf {X}^i))\Big | \\&\displaystyle \le \lim _{m \rightarrow \infty } \lim _{\delta \rightarrow 0} \inf _{E_\delta } \sup _{\mathcal {X} \in I\!\!R^{Hd} \setminus E_\delta } \Big | Z^m(\mathcal {X}) - Z(\mathcal {X})\Big | \\&\displaystyle = \ \lim _{\delta \rightarrow 0} \lim _{m \rightarrow \infty } \inf _{E_\delta } \sup _{\mathcal {X} \in I\!\!R^{Hd} \setminus E_\delta } \Big |Z^m(\mathcal {X}) - Z(\mathcal {X})\Big | \ = \ 0\\&\displaystyle \Longrightarrow \ \frac{1}{m} \sum _{i=1}^m Z^m(\mathbf {X}^i) \ {\mathop {\sim }\limits ^{m \rightarrow \infty }} \ \frac{1}{m} \sum _{i=1}^m Z(\mathbf {X}^i) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ {\hbox {E}}_{q^H} \Big [Z(\mathbf {X})\Big ], \ \ \ \ i.e.\\&\displaystyle \frac{1}{m} \sum _{i=1}^m \inf _{\mathbf {v} \in B} S^m(\mathbf {X}^i,\mathbf {v}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} \ {\hbox {E}}_{q^H}\Big [ \inf _{\mathbf {v} \in B} S(\mathbf {X},\mathbf {v})\Big ], \ \ \hbox {or}\\&\displaystyle \frac{1}{m}\sum _{i=1}^m \inf _{\mathbf {v} \in B} \sum _{t=j}^{j+H-1} \sigma ^m_{t+1}(X^i_{t+1},v_{j:t}) \ {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} {\hbox {E}}_{q^H}\Big [ \inf _{\mathbf {v} \in B} \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},v_{j:t})\Big ] \ \ \ a.s. \end{aligned}$$

\(\square \)

The rest of this section is inspired by Theorem 1 of Chen et al. [12].

Let us show first that (13) is satisfied:

\(\forall B \in \mathcal {B}_c\) and \(\forall \mathbf {v} \in B,\) we have by (9)

$$\begin{aligned} J_H(\mathbf {v}) \ = \ \lim _{m \rightarrow \infty } J_H^{m}(\mathbf {v}) \ \ge \ \limsup _{m \rightarrow \infty } \inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}), \ \ \ a.s. \end{aligned}$$
(17)

then

$$\begin{aligned} \inf _{\mathbf {w} \in B \bigcap \mathcal {U}_c^H} J_H(\mathbf {w}) \ \ge \ \limsup _{m \rightarrow \infty } \inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}), \end{aligned}$$
(18)

and therefore, \(\forall \mathbf {v} \in \mathcal {U}^H\),

$$\begin{aligned} \sup _{B \in \mathcal {N}_c(\mathbf {v})} \inf _{\mathbf {w} \in B \bigcap \mathcal {U}_c^H} J_H(\mathbf {w}) \ \ge \ \sup _{B \in \mathcal {N}_c(\mathbf {v})} \limsup _{m \rightarrow \infty } \inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}). \end{aligned}$$
(19)

Given \(B \in \mathcal {B}_c\) and \(\mathbf {v} \in B\), any ball S centered at \(\mathbf {v}\) with sufficiently small radius satisfies \(S \subset B\), since B is open. Moreover there exists \(B^\prime \) in the basis \(\mathcal {B}_c\) such that \(\mathbf {v} \in B^\prime \subset S\), and then \(B \supset B^\prime \). Starting from \(\mathbf {v} \in B^\prime \), a new ball \(S^\prime \) centered at \(\mathbf {v}\), with radius small enough for it to be contained in \(B^\prime \), can be found, and there is a \(B^{\prime \prime } \in \mathcal {B}_c\) such that \(\mathbf {v} \in B^{\prime \prime } \subset S^\prime \), and then \(B^\prime \supset B^{\prime \prime }\). Iterating this process, it is possible to choose a subsequence \(\{k_l\}\) such that \(B_{k_l} \supset B_{k_{l+1}}\), with \(\bigcap _l B_{k_l} = \{\mathbf {v}\}\).

Then by the lower semicontinuity of \(J_H(.)\)

$$\begin{aligned} \displaystyle \sup _{B \in \mathcal {N}_c(\mathbf {v})} \inf _{\mathbf {w} \in B} J_H(\mathbf {w}) \ = \ \lim _{l \rightarrow \infty } \inf _{\mathbf {w} \in B_{k_l}}J_H(\mathbf {w}) \ = \ J_H(\mathbf {v}), \end{aligned}$$
(20)

and by (12)

$$\begin{aligned} \displaystyle \inf _{\mathbf {w} \in B_{k_l}} J_H(\mathbf {w}) \ \le \inf _{\mathbf {w} \in B_{k_l}\bigcap \mathcal {U}_c^H} J_H(\mathbf {w}) \ \le J_H(\mathbf {v}^{k_l}) \ \le \ \inf _{\mathbf {w} \in B_{k_l}} J_H(\mathbf {w}) + \frac{1}{k_l}. \end{aligned}$$
(21)

The inequality (13) is then implied by (19), (20) and (21). \(\square \)

Let us show now that (14) is also satisfied:

\(\forall B \in \mathcal {B}_c\), let \(\mathbf {w} \in B\), with \(\mathbf {w} := w_{j:j+H-1}\).

Note first that the continuity of \(\sigma ^m_{t+1}(x,w_{j:t})\) for all t, \(j \le t \le j \! + \! H \! - \! 1\), ensures that \(\displaystyle \inf _{\mathbf {w} \in B}\sum _{t=j}^{j+H-1}\sigma ^m_{t+1}(x,w_{j:t})\) is measurable.

We have

$$\begin{aligned}&\displaystyle {\sup _{B \in \mathcal {N}_c({\mathbf {v}})} \liminf _{m \rightarrow \infty }\inf _{\mathbf {w} \in B} J_H^{m}(\mathbf {w}) } \nonumber \\&\ \ge \ \sup _{B \in \mathcal {N}_c(\mathbf {v})} \liminf _{m \rightarrow \infty } \frac{1}{m} \sum _{i=1}^m \inf _{\mathbf {w} \in B} \sum _{t=j}^{j+H-1}\sigma ^m_{t+1}(X^i_{t+1},w_{j:t}) \end{aligned}$$
(22)
$$\begin{aligned}&\ = \ \sup _{B \in \mathcal {N}_c(\mathbf {v})} {\hbox {E}}_{q^H}\Big [\inf _{\mathbf {w} \in B} \sum _{t=j}^{j+H-1}\sigma _{t+1}(X_{t+1},w_{j:t}) \Big ] \end{aligned}$$
(23)
$$\begin{aligned}&\ = \ \lim _{l \rightarrow \infty } {\hbox {E}}_{q^H}\Big [\inf _{\mathbf {w} \in B_{k_l}} \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},w_{j:t}) \Big ] \end{aligned}$$
(24)
$$\begin{aligned}&\ = \ {\hbox {E}}_{q^H} \Big [\lim _{l \rightarrow \infty } \inf _{\mathbf {w} \in B_{k_l}} \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},w_{j:t}) \Big ] \end{aligned}$$
(25)
$$\begin{aligned}&\ \ge \ {\hbox {E}}_{q^H}\Big [ \sum _{t=j}^{j+H-1} \sigma _{t+1}(X_{t+1},v_{j:t}) \Big ] \ = \ J_H(\mathbf {v}). \end{aligned}$$
(26)

Here, (22) holds by superadditivity of the infimum (\(\inf (f+g) \ge \inf f + \inf g\)), (23) is an immediate consequence of (16), (24) follows since the nested sequence \((B_{k_l})\) is decreasing, (25) is a direct application of the dominated convergence theorem, and (26) is due to the lower semicontinuity of the integrand. \(\square \)

The epi-convergence of \(J_H^{m}(.)\) to \(J_H(.)\) as m grows to infinity is then established.

Now let \(\{\mathbf {v}^m\}\) be a sequence of \(\epsilon \)-minimizers of \(\{J_H^m(.)\}\), i.e.

$$\begin{aligned} \displaystyle J_H^m(\mathbf {v}^m) \ \le \ \inf _{\mathbf {v} \in \mathcal {U}^H} J_H^m(\mathbf {v}) \ + \ \epsilon _m \ \ \ \ \hbox {in which} \ \epsilon _m >0 \ \, \hbox {and} \ \, \epsilon _m {\mathop {\longrightarrow }\limits ^{m \rightarrow \infty }} 0. \end{aligned}$$
(27)

According to Theorem 1.10 of Attouch [2], every converging subsequence \(\{\mathbf {v}^{m_k}\}\) of \(\{\mathbf {v}^m\}\) must converge to one of the minimizers \(\{\mathbf {v}^{*,i}, i=1,\ldots ,r\}\) of \(J_H(.)\), i.e. \(\{\mathbf {v}^{m_k}\} {\mathop {\longrightarrow }\limits ^{k \rightarrow \infty }} \mathbf {v}^*\) a.s. implies that \(\displaystyle J_H(\mathbf {v}^*) = \min _{\mathbf {v} \in \mathcal {U}^H}J_H(\mathbf {v})\). Moreover, according to [2], the optimal values also converge: \(J_H^{m_k}(\mathbf {v}^{m_k}) {\mathop {\longrightarrow }\limits ^{k \rightarrow \infty }} J_H(\mathbf {v}^*)\)   a.s.
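As a toy illustration of this convergence of \(\epsilon \)-minimizers (a deliberately simplified sample-average sketch, not the paper's particle estimator), consider \(J(v) = \hbox {E}[(X-v)^2]\) with \(X \sim N(1,1)\), whose unique minimizer is \(v^* = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_minimizer(m, rng):
    # Sample-average approximation J^m(v) = (1/m) sum_i (X_i - v)^2 of
    # J(v) = E[(X - v)^2]; its exact minimizer is the sample mean.
    sample = rng.normal(1.0, 1.0, size=m)
    # A perturbation epsilon_m = 1/m -> 0 mimics the tolerance in (27).
    return sample.mean() + 1.0 / m

# Distance of the epsilon-minimizers to the true minimizer v* = 1
errors = [abs(epsilon_minimizer(m, rng) - 1.0) for m in (10, 1000, 100000)]
```

As m grows, the perturbed sample-mean minimizers approach \(v^* = 1\) at the usual \(1/\sqrt{m}\) Monte Carlo rate, in line with the statement of Theorem 4.7.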

Theorem 4.7

The sequence \(\{\mathbf {v}^m\}\) of \(\epsilon \)-minimizers of \(\{J_H^m(.)\}\) defined by (27) converges with probability one into the set of minimizers of \(J_H(.)\). Moreover, the sequence \(\{J_H^m(\mathbf {v}^m)\}\) converges with probability one into the set of the corresponding minima of \(J_H(.)\).

Proof

(i):

Since the random sequence \(\{\mathbf {v}^m\}\) is bounded by definition, it is tight. Hence, by Prokhorov’s theorem, every subsequence \(\{\mathbf {v}^{m_k}\}\) has a sub-subsequence \(\{\mathbf {v}^{m_{k_l}}\}\) whose probability distribution functions converge weakly to the probability distribution function of a random variable V. Moreover, according to Skorokhod’s representation theorem, there exist random variables \(\{{W}_l\}\) and W, distributed as \(\{\mathbf {v}^{m_{k_l}}\}\) and V respectively, such that \(\{W_l\}\) converges to W almost surely. Then \(\{\mathbf {v}^{m_{k_l}}\}\) converges a.s. into \(\hbox {Supp}(W) \equiv \hbox {Supp}(V)\), as does \(\{W_l\}\) itself, and according to Th. 1.10 of Attouch [2] the corresponding limit points are minimizers of \(J_H(.)\). Prokhorov’s theorem then ensures that \(\hbox {Supp}(V) \equiv \hbox {Supp}(W)\) can only be a subset of the set of minimizers \(\{\mathbf {v}^{*,i}, i=1,\ldots ,r\}\) of \(J_H(.)\).

(ii):

Now suppose that the random sequence \(\{\mathbf {v}^m\}\) does not converge with probability one into the set \(\{\mathbf {v}^{*,i}, i=1,\ldots ,r\}\). Then there exist r open sets \(\{ \mathcal {O}_i, i=1,\ldots ,r\} \subset \mathcal {U}^H \), each containing one of the \(\{\mathbf {v}^{*,i}\}\), and a subsequence \(\{\mathbf {v}^{m_k}\}\) of \(\{\mathbf {v}^m\}\) such that for all \(k \in I\!\!N\), \(P(\mathbf {v}^{m_k} \notin \mathcal {O}_i, i=1,\ldots ,r) > 0 \), and for any of its embedded sub-subsequences \(\{\mathbf {v}^{m_{k_l}}\}\), \(P(\mathbf {v}^{m_{k_l}} \notin \mathcal {O}_i, i=1,\ldots ,r) >0\). By Prokhorov’s theorem every such subsequence \(\{\mathbf {v}^{m_k}\}\) still has a sub-subsequence \(\{\mathbf {v}^{m_{k_l}}\}\) converging in distribution to a random variable V. But the Skorokhod sequence \(\{W_l\}\) distributed as \(\{\mathbf {v}^{m_{k_l}}\}\) is then also such that \(\forall l \in I\!\!N\), \(P(W_l \notin \mathcal {O}_i, i=1,\ldots ,r) > 0 \), so \(\{W_l\}\) cannot converge a.s. into the set \(\{ \mathbf {v}^{*,i}, i=1,\ldots ,r \}\), which contradicts (i). Hence the result for the sequence \(\{\mathbf {v}^m\}\). Moreover, since the corresponding optimal values themselves converge according to Attouch’s theorem, the proof is complete. \(\square \)

5 Application: A Simulated Case Study in Predictive Microbiology

5.1 The State-Space Model Considered

5.1.1 The State Equation

One of the most useful tools in the field of food safety is the stochastic modelling of the decrease of a (pathogenic) bacterial population in a given culture medium. Indeed, under particular conditions of environmental factors such as temperature, pH and water activity, and after a lactic acid shock, a decrease of the bacteria number can be observed (the so-called growth inactivation). This decrease can be very slow if the temperature is kept constant (see curve 1 in Fig. 1), but becomes faster if the temperature is increased, due to the enhanced efficiency of the lactic acid effect on bacterial mortality. A goal of the microbiologists is then to control the temperature evolution so as to obtain a particular (a priori chosen) growth inactivation profile, hereafter called the target decreasing trajectory.

For some bacterial species (Listeria, Salmonella,\(\ldots \)) efficient mathematical models are available for describing growth inactivation. For the bacterial dynamics simulation and its subsequent predictive control in our case study, the model proposed by Coroller et al. [14] will be considered in its approximate discrete-time autoregressive form, with a multiplicative lognormal noise (as usually assumed by microbiologists for count variability):

$$\begin{aligned} X_{t+1} \ = \ \left[ X_{t} - \delta X_t\left( \frac{\lambda }{D_t}\right) \times \left( \frac{t}{D_t}\right) ^{\lambda - 1}\right] \times \varepsilon _{t+1}, \end{aligned}$$
(28)

with

  • t :  time variable, with \(t\in \left[ 0,t_{\max }\right] \), \(t_{\max }\) being a priori selected by the microbiologist (here \(t_{\max }=600\) hours).

  • \(X_{t}:\) the bacteria number per ml of culture broth at time t, which cannot be observed directly (whence the associated filtering problem).

    The initial number of bacteria \(x_0\) is chosen as \(x_0 = 10^{7}\) for the simulation and is considered as known in this simulated control processing. However it could also be considered as an additional unknown parameter to be estimated by filtering from a given initial prior density \(p_0^X(.)\) at time \(t=0\).

  • \(\lambda :\) an unknown shape parameter that must be estimated by filtering, simultaneously with the control procedure. Depending on \(\lambda \), the graph of the noise-free bacterial dynamics is convex (\(\lambda < 1\)), straight (\(\lambda =1\)) or concave (\(\lambda >1\)).

  • \(D_t:\) a function of the temperature \(T_t\) (in \({{}^\circ }\)C). \(D_t\) is the so-called decimal reduction, defined in [14] by

    $$\begin{aligned} \log _{10}D_t=D^{s}-\left[ \left( \frac{2\left( T^{c}-T^{opt}\right) }{(Z)^{2}}\right) \times \left( T_t-T^{s}\right) \right] \ \ \ \hbox { if } \ \ \ T_t \le T^{c} \end{aligned}$$
    (29)

    and

    $$\begin{aligned} \log _{10}D_t=D^{s}+\left( \frac{\left( T^{c}-T^{opt}\right) \left( 2T^{s}-T^{c}-T^{opt}\right) }{(Z)^{2}}\right) \ -\left( \frac{T_t-T^{opt}}{Z}\right) ^{2} \end{aligned}$$
    (30)

    if \(T_t>T^{c},\)

    where \(D^{s}\), \(T^{c}\), \(T^{opt}\), \(T^{s}\), Z are in general poorly known parameters, the values of which depend on the bacterial species. For the data simulation in the present case (Listeria), these parameters and the shape parameter \(\lambda \) are fixed at (see [14]):

    $$\begin{aligned} \lambda = 2, \ D^{s}=2.5, \ T^{c}=20, \ T^{opt}=10, \ T^{s}=12, \ Z = 22. \end{aligned}$$
    (31)

    All six of these parameters will have to be estimated by the filtering process, in parallel with the system control processing: in our case study the unknown \(\theta \) of model (1) then corresponds to the vector \(\{ \lambda , D^s, T^c, T^{opt}, T^s, Z\}\).

Remark 4

  • The temperature \(T_t\) is the control variable \(u_{t}\): \( 0 \le \ T_t \ \le 40\,^{\circ }\)C.

  • \(\delta \) is the model time step (here always fixed at the value of 1 hour).

  • \(\varepsilon _{t+1}\) is a noise term, taken as a lognormal random variable such that \(e_{t+1} = \ln \varepsilon _{t+1}\) is a Gaussian variable \(N(0,\rho _t^2)\), with \(\rho _t = \frac{1}{2}CV\log _{10}x_t\), CV being an approximate surrogate coefficient of variation, assumed constant according to microbiological considerations and known during the control process (the realistic values 0.01, 0.025, 0.05, 0.10 and 0.20 were considered in the different simulations performed). Note that this quantity could also be treated as a parameter to be estimated by filtering during the control process.

  • The noninteger values that model (28) provides for the state variable \(X_t\), as approximations of integer bacteria counts, are quite acceptable given the very large population sizes involved.
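The state dynamics described above can be sketched in a short simulation (a minimal illustration of Eqs. (28)-(31); the constant test temperature and the CV value used here are illustrative choices):

```python
import numpy as np

# Parameter values from Eq. (31); CV and the time step as in Sect. 5.1
LAM, DS, TC, TOPT, TS, Z = 2.0, 2.5, 20.0, 10.0, 12.0, 22.0
CV, DELTA = 0.025, 1.0  # surrogate coefficient of variation, 1 hour step

def log10_D(T):
    # Decimal reduction D_t as a function of temperature, Eqs. (29)-(30)
    if T <= TC:
        return DS - (2.0 * (TC - TOPT) / Z**2) * (T - TS)
    return DS + (TC - TOPT) * (2.0 * TS - TC - TOPT) / Z**2 - ((T - TOPT) / Z) ** 2

def step(x, t, T, rng):
    # One transition of the state equation (28) with lognormal noise
    D = 10.0 ** log10_D(T)
    rho = 0.5 * CV * np.log10(x)          # noise scale rho_t
    eps = np.exp(rng.normal(0.0, rho))    # lognormal epsilon_{t+1}
    return (x - DELTA * x * (LAM / D) * (t / D) ** (LAM - 1.0)) * eps

rng = np.random.default_rng(1)
x = 1e7                                   # known initial count x_0
for t in range(600):                      # t_max = 600 hours
    x = step(x, t, 2.0, rng)              # constant T_t = 2 °C, as for curve 1
```

Under a constant low temperature the simulated count decreases only slowly, consistently with curve 1 of Fig. 1.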

Remark 5

With this setting it can be checked that the stabilisability sufficient condition A3 is satisfied by model (28) if, for every t, the computed temperature \(T_t\) applied (the control) is such that:

$$\begin{aligned}&if \ \ 1 \ \le \ x_t \ \le \ C: \ \ \ \ \ \ \ \frac{\delta \lambda t^{\lambda -1}}{1-1/x_t} \ \le \ D_t^{\lambda } \end{aligned}$$
(32)
$$\begin{aligned}&if \ \ x_t \ > \ C: \ \ \ \ \ \ \ \frac{\delta \lambda t^{\lambda -1}}{1-1/x_t} \ \le \ D_t^{\lambda } \ \le \ \frac{\delta \lambda t^{\lambda -1}}{1-C/x_t} \end{aligned}$$
(33)

with \(C=\exp {\Big ((\ln 10)\sqrt{\beta }\Big )}\), for a given \(\beta > 0\) and \(a=2\) in A3.

5.1.2 The Observation Equation

In the simulated case study considered here, the observed variable \(Y_{t}\) at time t is the number of cells (bacteria) assumed to have been detected by flow cytometry counting [32] in the last of a series of diluted samples in successive test tubes, drawn from the original culture broth at time t. The few minutes required by this counting process can be considered negligible with respect to the slow dynamics of the growth inactivation. The probability distribution function \(G_{t}(.|X_t=x_{t},\theta )\) of \(Y_{t}\) in model (1) is here the result of the interaction of several independent random phenomena: the spatial sampling in the primary test tube at time t, a given number of successive samplings in several tubes of increasing dilution (with Poisson or aggregative assumptions for the bacteria spatial probability distributions), the successive volume sampling and dilution errors (assumed to be Gaussian) and, finally, the lognormal counting errors attributed to the flow cytometer device itself. See [40] for full details about this sophisticated sampling-dilution-numbering procedure. The probability distribution function \(G_{t}\) cannot be analytically characterised but can easily be simulated, which is the only requirement for the proposed particle predictive control procedure to be applicable, according to the particle generation algorithm considered (Appendix 1). Here, the dimension s of the \(Y_{t}\) variable corresponds to the number of repetitions of the previous bacteria sampling-dilution-counting procedure performed at each time t.
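A minimal sketch of such a simulated observation is given below; the hierarchical structure (Poisson spatial sampling, Gaussian dilution errors, lognormal cytometer error) follows the description above, but all numeric settings (number of tubes, dilution ratio, error magnitudes) are illustrative assumptions, not the actual protocol of [40]:

```python
import numpy as np

def simulate_count(x_t, n_tubes=6, dilution=10.0, cv_vol=0.02, cv_cyto=0.05,
                   rng=None):
    # One sampling-dilution-counting observation of the hidden count x_t.
    # Every numeric value here (number of tubes, nominal 1:10 dilution,
    # error CVs) is an illustrative assumption of this sketch.
    rng = np.random.default_rng() if rng is None else rng
    conc = x_t                                   # cells per ml, primary tube
    for _ in range(n_tubes):
        # Gaussian volume/dilution error around the nominal dilution ratio
        d = dilution * (1.0 + cv_vol * rng.standard_normal())
        conc /= d
    cells = rng.poisson(max(conc, 0.0))          # Poisson spatial sampling
    return cells * np.exp(cv_cyto * rng.standard_normal())  # cytometer error

rng = np.random.default_rng(2)
y_t = np.array([simulate_count(1e7, rng=rng) for _ in range(2)])  # s = 2
```

Only the ability to simulate such draws is needed by the particle procedure; no closed form for \(G_t\) is ever required.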

5.2 Settings and Results of the Predictive Control

In the present case study the goal of the successive minimizations of the approximated cost function expectation \(J_{H}^{m}(.)\) defined in Sect. 4.2 is to obtain a controlled state trajectory as close as possible to a given deterministic target trajectory \(\{x_t^*\}\), all along the selected time range \(\left[ 0,t_{\max }\right] \). The cost function considered is then taken as the quadratic discrepancy (4). The computation of \(J_H^m\), when requested by the minimization algorithm, is done by using a N(0, 1) probability density as instrumental pdf q(x), and Gaussian kernels for the particle pdf estimator \(p_{t+1}^{n(m),X}(x|y_{1:j}, u_{0,j-1}, v_{j:t})\) of the pdf \(p_{t+1}^X(x|y_{1:j}, u_{0,j-1}, v_{j:t})\) (see Appendix 1).

Let \(\delta _{v^{*}}\) be the time during which the same computed optimal control \(v^{*}\) is applied (the tested values for \(\delta _{v^{*}}\) were 1, 2, 5, 7, 10, 15 and 50 hours). Note that the minimizations must be performed every \(\delta _{v^{*}}\); this means, for example with \(\delta _{v^{*}}=\delta =1\) and a time range \(\left[ 0,t_{\max }=600h \right] \), that the minimization procedure should be performed \(600 \times \gamma \) times (\(\gamma \) being a given number of independent runs from different initial values, used to limit the risk of being trapped in a local minimum), i.e. 6000 times if \(\gamma =10\). Moreover, the dimension of the optimization space is H, with realistic values from 1 to 10.

Two minimization algorithms were compared: a global stochastic procedure based on [51] and a deterministic procedure based on the well-known Nelder and Mead simplex algorithm from the SAS/IML library [47]. The stochastic procedure, which led to prohibitive computing times (more than two weeks when \((m, n) = (100, 1000)\) and \(\delta _{v^{*}} = 1\) to 5), was abandoned. Only the simplex procedure was carried out, with still long but affordable computing times for these exploratory trials, typically several days on a Pentium IV computer, depending on the simulation conditions (much less, however, than the 600 hour duration of the virtual microbiological experiment): as said previously, our objective in these tests was not to find the fastest minimization procedure but rather to illustrate the relevance of the \(J_{H}^{m}(.)\) approximated cost function expectation under different simulation conditions. Note however that this whole simulation/minimization procedure could be parallelized, with significant time savings.
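One receding-horizon minimization with restarts can be sketched as follows, using SciPy's Nelder-Mead implementation as a stand-in for the SAS/IML simplex routine actually used; the quadratic cost below is only a toy placeholder for \(J_H^m\), and the bound handling by clipping is a simplification of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

H, GAMMA = 3, 10                      # horizon and number of restarts
rng = np.random.default_rng(3)

def J_H_m(v):
    # Toy placeholder: the real J_H^m averages the particle cost-to-go
    # over m simulated trajectories (Sect. 4.2).
    return float(np.sum((v - 20.0) ** 2))

best = None
for _ in range(GAMMA):
    v0 = rng.uniform(0.0, 40.0, size=H)          # random start in [0, 40] °C
    res = minimize(J_H_m, v0, method="Nelder-Mead")
    if best is None or res.fun < best.fun:       # keep the best restart
        best = res
v_star = np.clip(best.x, 0.0, 40.0)              # enforce temperature bounds
```

The restart loop mirrors the \(\gamma \) independent runs described above; each call is one of the \(600 \times \gamma \) minimizations performed when \(\delta _{v^*} = 1\).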

Several optional settings of the predictive control/minimization procedure were successively tested:

  • For the horizon, H: values from 1 to 10, in steps of 1.

  • For \((m, n)\) (computation of \(\bar{\sigma }_{t+1}^{m}\)): (20, 100), (50, 500), (100, 1000).

  • For \(\delta _{v^{*}}\): 1, 2, 5, 7, 10, 15, 50.

  • For s (number of bacteria counting repetitions carried out at each time t): 1, 2, 3.

  • For the filtering process itself, the initial prior distribution of the six unknown parameters, \(p_0^{\theta }(.)\), was taken as a uniform distribution over the following intervals (chosen according to microbiological considerations):

    $$\begin{aligned}&1.62 \le \lambda \le 2.42, \ \ 2.03 \le D^s \le 3.03, \ \ 16.20 \le T^c \le 24.20, \ \ \nonumber \\ \nonumber \\&8.10 \le T^{opt} \le 12.10, \ \ 9.72 \le T^s \le 14.52, \ \ 17.82 \le Z \le 26.62. \end{aligned}$$
    (34)
Fig. 1
figure 1

Evolution of the predictive optimal control process under the setting \(m = 100\), \(n = 1000\) and \(H = 10\)

By combining these settings, several simulated predictive control runs were performed for a given decreasing deterministic bi-lobed target trajectory \(x^*(t)\) (curve 3 in Fig. 1). The results of the nine most illustrative of them are reported in the following three tables (all with \(\delta _{v^{*}} = 2\), \(s=2\) and the noise surrogate coefficient of variation CV fixed at 0.025). Unsurprisingly, the best predictive control was obtained for \(m = 100, n=1000, H=10\) and is displayed in Fig. 1.

Table 1 Evolution of the SSQ with respect to the \((m, n, H)\) setting
Table 2 Evolution of the estimates of the parameters \(\lambda , D^s, T^c\) and of their 95% confidence bounds with respect to the \((m, n, H)\) setting

Results:

  • Table 1 presents the evolution, with respect to different combinations of \((m, n, H)\), of the discrepancy sum of squares between the target trajectory and the estimate of the expected controlled trajectory, after a logarithmic transformation: \(\displaystyle SSQ = \sum _{t=0}^{600} \left[ \log _{10}x^*_t - \overline{\log _{10}x_t}\right] ^2\) where \(\overline{\log _{10}x_t} = \frac{1}{n}\sum _{i=1}^{n}\log _{10}\bar{x}_t^i\) (see Appendix 1 for the generation of the particles \(\{\bar{x}_t^i\}\)). The most significant result is the major effect of the chosen horizon H on the decrease of the SSQ, once the values of m and n are sufficiently large (from 50 and 500 upwards, respectively). This behaviour reveals the good predictive capability of this nonparametric predictive control approach.

  • Tables 2 and 3 display the final filtering estimates of each of the six parameters and the lower and upper bounds of their respective particle-estimated \(95\%\) confidence intervals (see [45] for technical details). These estimates are to be compared with the true parameter values (31) and with the initial prior parameter intervals (34), respectively. Besides the good quality of these estimates, in spite of the relatively moderate m and n values used, another noticeable result is again the effect of the horizon H, whose increase seems to act more upon the widths of the parameter confidence intervals than upon the parameter estimates themselves, with again a global improvement as H increases, for m and n sufficiently large.

  • Figure 1 displays four curves related to the control processing with the setting \((m=100, n=1000, H=10)\), in \(\log _{10}\) units for the state variable \(X_t\) (bacteria number, left vertical axis) and in degree Celsius units for the control variable \(T_t\) (applied temperature, right vertical axis):

Table 3 Evolution of the estimates of the parameters \(T^{opt}, T^s, Z\) and of their 95% confidence bounds with respect to the \((m, n, H)\) setting
  • \(\circ \) Curve 1 is that of a simulation without control of the noisy state variable \(X_t\) (bacteria number per ml of culture broth (28)) under a fixed temperature (\(T_t = 2\,^{\circ }\)C).

  • \(\circ \) Curve 2 is that of the computed optimal control \(v^*_t\) (temperature \(T_t^*\)) with \(\delta _{v^*} = 2\). With some algebra one can easily check that all \(v^*_t\) satisfy (32)-(33) for \(\beta > (\log _{10}x_0)^2\), and hence that the sufficient stabilisability condition A3 is satisfied.

  • \(\circ \) Curve 3 is that of the bi-lobed target trajectory \(x^*_t\).

  • \(\circ \) Curve 4 is that of the evolution of the expected optimally controlled state trajectory \(X_t\).

One can notice how well the predictive control anticipates the change of curvature of the bi-lobed reference trajectory under this horizon setting, leading to a satisfactory computed discrepancy (\(SSQ = 1.15\)).
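For reference, the SSQ discrepancy reported in Table 1 can be computed from the particle trajectories in a few lines (the array shapes used here are assumptions of this sketch):

```python
import numpy as np

def ssq(x_target, particles):
    # SSQ of Table 1: particles[t] holds the n filtered particles xbar_t^i
    # at time t, and x_target[t] holds the target value x*_t.
    mean_log = np.log10(np.asarray(particles)).mean(axis=1)  # overline{log10 x_t}
    return float(((np.log10(np.asarray(x_target)) - mean_log) ** 2).sum())
```

For instance, when every particle sits exactly on the target at every time step, the discrepancy is zero; a constant one-decade offset over a single time step yields an SSQ of 1.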

The performance of this simulated predictive control run, and that of other unreported trials, could be improved by increasing the number m of simulations in the approximation of the cost-to-go function expectation, and correspondingly the number n of particles used for the filtering step, at the price of still heavier computing times. But, as said previously, this drawback could be drastically reduced by parallelizing the computer code, as is often done for particle procedures like this one. Moreover, with reasonable values of m and n as in the present settings, the proposed predictive control procedure already seems able to correctly anticipate the dynamic variations of the target trajectory and provides rather good control of the state space system considered, as shown by Fig. 1.

6 Conclusion

Solving stochastic NMPC problems on continuous state spaces, for imperfectly observed systems described by nonlinear non-Gaussian discrete-time state space models, is still a theoretical and practical challenge. This paper addresses both aspects of this ambitious objective: first, the estimation of the multi-step-ahead conditional pdf of the system state variables and the estimation of the subsequent cost-to-go expectation; secondly, the minimization of this expected cost estimate, providing the optimal controls to be applied at each time step. Based on the use of a recently developed nonparametric particle estimator of the multi-step-ahead conditional pdf of the state variables and on the theory of epi-convergence, a simulation-based epi-convergent estimator of the expected cost-to-go over the receding horizon is proposed. Therefrom, every sequence of approximate minimizers of the corresponding expected cost-to-go estimates converges with probability one into the set of minimizers of the true expected cost-to-go at each time step. The same holds for the sequence of the corresponding minima, which converges to the true minima.