Abstract
This paper considers an optimal control problem for a discrete-time stochastic system with the probability of first reaching the boundaries of a given domain as the optimality criterion. Dynamic programming-based sufficient conditions of optimality are formulated and proved. The isobells of levels 1 and 0 of the Bellman function are used for obtaining two-sided estimates of the right-hand side of the dynamic programming equation, two-sided estimates of the Bellman function, and two-sided estimates of the optimal-value function of the probabilistic criterion. A suboptimal control design method is proposed. The conditions of equivalence to an optimal control problem with a probabilistic terminal criterion are established. An illustrative example is given.
1 Introduction
One of the most important fields of research in stochastic optimal control concerns problems with a non-fixed terminal time. Among them, note the stochastic time-optimal control problem [1, 2], the stochastic optimal control problem with infinite horizon [3–6], the problem of optimizing the duration of stay of a stochastic system in a given tube of trajectories [1, 7], and the stochastic problem of optimizing the time of first reaching the boundaries of a given domain [1, 8, 9]. Control models with a non-fixed terminal time are widely used in aerospace [10], economic [11], biological, robotic, and energy [1] applications. In the case of continuous-time systems, methods based on dynamic programming sufficient conditions of optimality have become widespread, since they yield optimal feedback control laws. Interestingly, it is precisely the use of the probabilistic criterion [1, 7] that leads to a constructive statement of control problems with a non-fixed terminal time for which the Bellman equation can be written. Nevertheless, a number of fundamental problems remain in the numerical calculation of optimal control, and analytical solutions have been obtained only for some model problems [1]. This circumstance is due to the following difficulties. First, the solution of the Bellman equation may be nonunique. Second, even if a solution of the Bellman equation exists in the class of smooth functions, it may be inadmissible (for example, because the Itô equation of the closed-loop stochastic system with this control has no strong solution). Third, the class of admissible controls does not always contain a control for which the supremum (or infimum) of the criterion is attained. Fourth, the Bellman equation suffers from the curse of dimensionality.
In the case of discrete time, a qualitative theory of such problems was presented in [3]. Also, solutions of particular economic problems [11] and model examples [9] are known. In [9], the problem of optimizing the probability of first reaching the neighborhood of the origin by the trajectories of a linear stochastic system was considered in the Brunovsky canonical form of controllability. Its analytical solution was found through the reduction to the problem with a probabilistic terminal criterion and further use of the dynamic programming method in the form [12].
This paper studies an optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube as the optimality criterion. Sufficient conditions of optimality similar to [12] are established, and the properties of the two-sided estimates of the Bellman function [13, 14] are examined. The conditions of equivalence to the optimal control problem with the probabilistic terminal criterion [12] are obtained. As an illustrative example, the problem of investment portfolio management is considered.
2 Problem Statement
Consider a discrete-time stochastic controlled system of the form
\[{x}_{k+1}={f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right),\quad k=\overline{0,N},\qquad {x}_{0}=X,\qquad \left(1\right)\]
with the following notations: \({x}_{k}\in {{\mathbb{R}}}^{n}\) as the state vector; \({u}_{k}\in {U}_{k}\subset {{\mathbb{R}}}^{m}\) as the control vector; Uk as the set of control constraints; ξk as the vector of random perturbations with the codomain \({{\mathbb{R}}}^{s}\) and a given distribution \({{\bf{P}}}_{{\xi }_{k}};\)\({f}_{k}:{{\mathbb{R}}}^{n}\times {{\mathbb{R}}}^{m}\times {{\mathbb{R}}}^{s}\to {{\mathbb{R}}}^{n}\) as the transition function (system function); finally, \(N\in \left\{0\right\}\cup {\mathbb{N}}\) as the control horizon.
Introduce several assumptions regarding the system (1) as follows.
1. The complete information on the state vector xk is known. (Due to this fact, an appropriate control can be constructed in the class of functions \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}\left(\cdot \right)\) is some measurable function.) In this case, it is said that “the control is found in the class of full state feedback controls.”
2. The initial state x0 = X is a deterministic vector from \({{\mathbb{R}}}^{n}\).
3. The system function \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) is continuous for all k.
4. The control vector uk is generated in the form \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}:{{\mathbb{R}}}^{n}\to {{\mathbb{R}}}^{m}\) is a measurable function with values uk ∈ Uk, Uk being a compact set.
5. The state vector xk+1 is generated in the following way: at step k, the vector xk is realized; then the control vector \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\) is formed; finally, the random perturbation ξk is realized.
6. The control is the set of functions \(u\left(\cdot \right)=\left({\gamma }_{0}\left(\cdot \right),\ldots ,{\gamma }_{N}\left(\cdot \right)\right)\in {\mathcal{U}}\); the class of admissible controls is the set \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N},\) where \({{\mathcal{U}}}_{k}\) denotes the set of Borel functions \({\gamma }_{k}\left(\cdot \right)\) with values in Uk.
7. The random vector ξk is continuous, has the codomain \({{\mathbb{R}}}^{s}\) and a known distribution \({{\bf{P}}}_{{\xi }_{k}}\); moreover, the components of the vector \(\zeta =\left(X,{\xi }_{0},\ldots ,{\xi }_{N}\right)\) are independent.
Note that the system (1) is Markovian, i.e., its future behavior does not depend on the past states (history) and is completely defined by the current state.
On the trajectories of the system (1) define the probabilistic functional
in which the sets \({{\mathcal{F}}}_{k}\) have the form \({{\mathcal{F}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:{\Phi }_{k}\left(x\right)\leqslant \varphi \right\},\)
where \(\varphi \in {\mathbb{R}}\) is a given scalar; \({\Phi }_{k}:{{\mathbb{R}}}^{n}\to {\mathbb{R}}\) are continuous functions, k = 1, …, N + 1, and \({\Phi }_{N+1}\left(x\right)\) is bounded below.
Consider the optimization problem
where \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N}\).
Physically, problem (2) is to find a control maximizing the probability of first reaching the tube of trajectories described by the sequence of given sets \({\left\{{{\mathcal{F}}}_{k+1}\right\}}_{k = 0}^{N}\).
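Before turning to dynamic programming, note that for any fixed admissible feedback law the criterion in problem (2) can be estimated by straightforward simulation of the first-reach event. Below is a minimal sketch; the scalar dynamics, the sets \({{\mathcal{F}}}_{k}\), and the feedback law are invented stand-ins for (1), not the paper's system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: scalar dynamics x_{k+1} = x_k + u_k + xi_k,
# target tube F_k = {x : |x| <= phi_level}. All names here are illustrative.
N = 10                # control horizon
phi_level = 0.5       # half-width of the target set
x0 = 3.0              # deterministic initial state

def gamma(k, x):
    # simple admissible feedback: push the state toward the origin, |u| <= 1
    return float(np.clip(-x, -1.0, 1.0))

def first_reach_indicator(n_steps):
    """1 if the trajectory enters some F_{k+1}, k = 0..N, else 0."""
    x = x0
    for k in range(n_steps + 1):
        u = gamma(k, x)
        x = x + u + rng.normal(scale=0.3)
        if abs(x) <= phi_level:       # first reach of the tube: stop
            return 1
    return 0

M = 20_000
p_hat = sum(first_reach_indicator(N) for _ in range(M)) / M
print(f"estimated first-reach probability: {p_hat:.3f}")
```

Such a Monte Carlo estimate evaluates one fixed control; the dynamic programming machinery of the next section is what allows the probability to be maximized over the whole class \({\mathcal{U}}\).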
Note that the dynamic programming method in the form [12], introduced for optimal control problems with the probabilistic terminal criterion, is not generally applicable to problem (2). The dynamic programming-based sufficient conditions of optimality, similar to [12], will be established in Section 3.
3 Dynamic Programming
Define the Bellman function \({{\rm{B}}}_{k}:{{\mathbb{R}}}^{n}\to \left[0,1\right]\) in problem (2) as
In view of the assumptions accepted in Section 2, formulate a theorem on the Bellman equation for problem (2) in the n-dimensional state space.
Theorem 1
Let the following conditions be satisfied:
1) The functions \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) are continuous for all \(k=\overline{0,N}\).
2) The functions \({\Phi }_{k}\left({x}_{k}\right)\) are continuous for all \(k=\overline{1,N+1}\).
3) The function \({\Phi }_{N+1}\left({x}_{N+1}\right)\) is bounded below.
4) The random vectors X, ξ0, …, ξN are independent.
5) The sets U0, …, UN are compact.
Then the optimal control in problem (2) exists in the class of measurable functions \({u}^{* }\left(\cdot \right)\in {\mathcal{U}}\) and is determined by solving the following problems:
The proofs of Theorem 1 and of all other theorems, propositions, and the lemma below are given in the Appendix.
In Theorem 1, \({{\bf{M}}}_{{\xi }_{k}}\left[\cdot \right]\) denotes the expectation operator with respect to the distribution \({{\bf{P}}}_{{\xi }_{k}}\) of the random vector ξk, and \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) is the indicator function of the set \({{\mathcal{F}}}_{k}\). Note that Eqs. (3)–(5) differ from the classical Bellman equation [12] for the problem with the probabilistic terminal criterion by the additional term \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) and by the factor \(1-{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) multiplying the expectation in the right-hand side.
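This structure — an indicator term plus a masked expectation, \({\rm{B}}_{k}(x)={\bf{I}}_{{\mathcal{F}}_{k}}(x)+(1-{\bf{I}}_{{\mathcal{F}}_{k}}(x))\,\sup_{u}{\bf{M}}_{\xi_{k}}[{\rm{B}}_{k+1}(f_{k}(x,u,\xi_{k}))]\) — can be run numerically on a grid. The sketch below uses an invented scalar system and invented sets \({{\mathcal{F}}}_{k}\); only the shape of the recursion reflects the theorem:

```python
import numpy as np

# Toy scalar stand-in for (1): x_{k+1} = x + u + xi, xi ~ N(0, 0.3^2),
# U_k = [-1, 1], F_k = {x : |x| <= 0.5}. All of this is invented for the demo.
N = 5
grid = np.linspace(-4.0, 4.0, 161)                  # state-space grid
controls = np.linspace(-1.0, 1.0, 21)               # discretized U_k
xi_samples = np.random.default_rng(1).normal(scale=0.3, size=400)

def in_F(x):
    return (np.abs(x) <= 0.5).astype(float)         # indicator of F_k

B = in_F(grid)                     # terminal condition: B_{N+1} = I_{F_{N+1}}
for k in range(N, -1, -1):
    B_next = B
    sup_term = np.empty_like(grid)
    for i, x in enumerate(grid):
        # sup over u of M_xi[ B_{k+1}(f_k(x, u, xi)) ], expectation by sample mean
        sup_term[i] = max(np.interp(x + u + xi_samples, grid, B_next).mean()
                          for u in controls)
    # recursion with the indicator term and the masking factor
    B = in_F(grid) + (1.0 - in_F(grid)) * sup_term

print("B_0(0) =", B[np.argmin(np.abs(grid))])       # inside F_0, so equals 1
```

Note how the mask makes every point of \({{\mathcal{F}}}_{k}\) an absorbing level-1 state, which is exactly the first-reach semantics; the grid and sampling here are the crude discretization whose cost the next paragraph discusses.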
As is known, the direct integration of the Bellman equation causes the difficulties of calculating multiple integrals and solving stochastic programming problems in its right-hand side. These difficulties have repeatedly occurred even for systems of the simplest type, e.g., for the investment portfolio management system [15–17] and for the stationary satellite control system [12]. As a result, for almost twenty years, optimal control problems with a probabilistic criterion have been considered in the one-step (N = 0) and two-step (N = 1) statements; see [15–17]. In [13, 14], two-sided estimates of the Bellman function were found using the isobells of levels 1 and 0, which allowed solving particular optimal control problems with a probabilistic criterion for an arbitrary time step N.
By analogy with [13, 14], investigate the properties of the Bellman function for problem (2) using the isobells of levels 1 and 0 of the Bellman function.
4 Two-Sided Estimates of Bellman Function
Introduce the isobells of levels 1 and 0 of the Bellman function, defined as
and the set \({{\mathcal{B}}}_{k}={{\mathbb{R}}}^{n}\setminus \left({{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\right)\). For the sake of convenience, also introduce the notation \({\overline{{\mathcal{F}}}}_{k}={{\mathbb{R}}}^{n}\setminus {{\mathcal{F}}}_{k}\). Clearly, by the definition of these sets,
Theorem 2
The following statements hold:
1. The sets \({{\mathcal{I}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time
2. The sets \({{\mathcal{O}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time
3. For \(x\in {{\mathcal{I}}}_{k}\), the function \({\gamma }_{k}^{* }(x)\) takes any value from the set
4. For \(x\in {{\mathcal{O}}}_{k}\), the function \({\gamma }_{k}^{* }\left(x\right)\) takes any value from the set Uk.
5. For all \(x\in {{\mathcal{B}}}_{k}\), the Bellman equation admits the representation
6. For \(x\in {{\mathcal{B}}}_{k}\) and u ∈ Uk,
7. For \(x\in {{\mathcal{B}}}_{k}\), the Bellman equation satisfies the two-sided inequality
where
\({\underline{{\rm{B}}}}_{k}\left(x\right)\) and \({\overline{{\rm{B}}}}_{k}\left(x\right)\) are lower and upper bounds on the Bellman function, i.e.,
moreover \({\underline{{\rm{B}}}}_{N}\left(x\right)={{\rm{B}}}_{N}\left(x\right)={\overline{{\rm{B}}}}_{N}\left(x\right)\).
The right-hand side of the relations in item 1 of Theorem 2 differs from the relations for the isobell of level 1 of the Bellman function in the problem with the probabilistic terminal criterion [18] by the union operation with the set \({{\mathcal{F}}}_{k}\). For the isobell of level 0 of the Bellman function, the difference lies in the intersection operation with the set \({\overline{{\mathcal{F}}}}_{k}\). Items 3 and 4 establish expressions for determining the optimal control for \({x}_{k}\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\) that are simpler than (3) and coincide, up to the structures of the sets \({{\mathcal{I}}}_{k},\) with those in the problem with the probabilistic terminal criterion. Items 6 and 7 of Theorem 2 establish two-sided estimates for the right-hand side of the dynamic programming equation and for the Bellman function, respectively. Moreover, the expressions for the lower and upper bounds, up to the structures of the sets \({{\mathcal{I}}}_{k}\) and \({{\mathcal{O}}}_{k}\), coincide with those in the problem with the terminal criterion [13, 18]. The difference is the presence of an additional inequality in the left-hand side of (8) and, as a consequence, the inequality \({\Psi }_{k}\left(x\right)\leqslant {\underline{{\rm{B}}}}_{k}\left(x\right)\).
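Items 1 and 2 can be illustrated on a finite-state toy model where the set recursions are computed exactly. Everything below (state space, dynamics, sets \({{\mathcal{F}}}_{k}\)) is invented for the illustration; only the union/intersection structure described above is taken from the theorem:

```python
# Finite-state toy model (invented): states 0..6, dynamics
# x_{k+1} = clip(x + u + xi, 0, 6), u in {-1, 0}, xi in {0, 1} equiprobable,
# and target sets F_k = {0} for every k.
STATES = set(range(7))
U = (-1, 0)
XI = (0, 1)
F = {0}
N = 4

def next_states(x, u):
    """Support of x_{k+1} given (x, u)."""
    return {min(max(x + u + xi, 0), 6) for xi in XI}

I = set(F)              # isobell of level 1 at the terminal step
O = STATES - I          # isobell of level 0 at the terminal step
for k in range(N, -1, -1):
    # union with F_k of the states steered into I_{k+1} with probability 1
    I = F | {x for x in STATES if any(next_states(x, u) <= I for u in U)}
    # complement of F_k intersected with states that every u keeps in O_{k+1} a.s.
    O = (STATES - F) & {x for x in STATES if all(next_states(x, u) <= O for u in U)}

print("I_0 =", sorted(I), " O_0 =", sorted(O))
```

In this toy model the noise prevents any state outside \({{\mathcal{F}}}\) from reaching it with probability 1, so \({{\mathcal{I}}}_{0}\) collapses to \({{\mathcal{F}}}\) itself, while \({{\mathcal{O}}}_{0}\) shrinks backward from the far boundary; everything in between is the region \({{\mathcal{B}}}_{0}\) where the full Bellman recursion is needed.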
Next, explore in detail the properties of the control maximizing at each step the lower bound on the right-hand side of the dynamic programming equation.
5 Suboptimal Control
Consider a control \(\underline{u}\left(\cdot \right)=\left({\underline{\gamma }}_{0}\left(\cdot \right),\ldots ,{\underline{\gamma }}_{N}\left(\cdot \right)\right)\), where
This control has the following properties:
-
For \(x\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\) and for all \(k=\overline{0,N}\), \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
-
For k = N, \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
-
If \({{\mathcal{F}}}_{k}={{\mathcal{I}}}_{k}\) holds for all \(k=\overline{0,N}\), then \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
Theorem 3
Assume that the control \(\underline{u}\left(\cdot \right)\) exists in the class \({\mathcal{U}}\). Then the following statements hold:
1. The value of the probabilistic criterion under the control \(\underline{u}\left(\cdot \right)\) is given by
where \({\underline{x}}_{k}\) is the trajectory of the closed loop system with the control (11),
in which \({\underline{u}}_{k}={\underline{\gamma }}_{k}\left({\underline{x}}_{k}\right)\).
2. The optimal-value function of the probabilistic criterion has the form
where \({x}_{k}^{* }\) is the trajectory of the closed loop system with the optimal control,
in which \({u}_{k}^{* }={\gamma }_{k}^{* }\left({x}_{k}^{* }\right)\).
3. For any \(\varphi \in {\mathbb{R}}\), \(N\in {\mathbb{N}}\), and \(X\in {{\mathbb{R}}}^{n}\),
where the functions
have the form
According to Theorem 3, for the suboptimal control \(\underline{u}\left(\cdot \right)\) defined by (11), the accuracy can be estimated as
which holds for all
where the function \(\Delta :{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right]\) has the form
Now study the conditions under which problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion.
6 Equivalence to Optimal Control Problem with Probabilistic Terminal Criterion
On the trajectories of the system (1) define the probabilistic terminal criterion
and consider the optimal control problem
As was demonstrated in [12], the solution of problem (17) exists in the class \({\mathcal{U}}\) and is determined by solving the dynamic programming equations
where \({{\rm{B}}}_{k}^{\varphi }\left(x\right)\) is the Bellman function in problem (17).
The result below establishes the equivalence of problems (2) and (17).
Lemma. Assume that for all \(k=\overline{0,N}\), \({{\mathcal{F}}}_{k}\subseteq \Delta {{\mathcal{I}}}_{k}\), where
Then problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion (17) in the sense of the same optimal controls \({u}^{* }\left(\cdot \right)={u}^{\varphi }\left(\cdot \right)\), the same optimal values of the criteria \(P\left({u}^{* }\left(\cdot \right)\right)={P}_{\varphi }\left({u}^{\varphi }\left(\cdot \right)\right)\), and the same Bellman functions \({{\rm{B}}}_{k}\left(x\right)={{\rm{B}}}_{k}^{\varphi }\left(x\right)\) for all \(k=\overline{0,N+1}\) and \(x\in {{\mathbb{R}}}^{n}\).
Note that more specific conditions for the equivalence of problems (2) and (17) can be obtained from Lemma for a narrow class of systems, but this goes beyond the scope of the paper.
In the next section, the results obtained will be applied to examine additional properties of the optimal investment portfolio management problem with a probabilistic criterion.
7 Optimal Investment Portfolio Management with Non-Fixed Terminal Time
Following [13, 19], consider a discrete-time stochastic control system of the form
where n = 1, m, and s = m − 1 are the dimensions of the state vector, the control vector, and the random perturbations, respectively; X > 0, b > −1, and φ < 0 are deterministic scalars; \({\xi }_{k}={\left({\xi }_{k}^{1},\ldots ,{\xi }_{k}^{m-1}\right)}^{{\mathtt{T}}}\) are random vectors with independent components, and ξk+1 and ξk are independent for all \(k=\overline{0,N-1}\). Let the control constraints be given by
Assume that the distribution of the random vectors ξk has the support \({\rm{supp}}{\rho }_{\xi }\left(t\right)=\mathop{\bigotimes }\limits_{j = 1}^{m-1}\left[{\underline{\varepsilon }}_{j},{\overline{\varepsilon }}_{j}\right]\), and moreover the inequality \(-1\leqslant {\underline{\varepsilon }}_{j}\leqslant b\leqslant {\overline{\varepsilon }}_{j}\) holds for all \(j=\overline{1,m-1}\).
Consider an optimal control problem with a non-fixed terminal time of the form
and a problem with a probabilistic terminal criterion of the form
In the notations accepted in this paper,
Let X be the amount of initial capital, xk be the amount of capital at the beginning of the kth year, \({u}_{k}^{1}\) be the share of xk invested in a risk-free asset (for example, a reliable bank) with a rate of return b, and \({u}_{k}^{j}\) be the shares of xk invested in risky assets characterized by rates of return \({\xi }_{k}^{j-1}\), \(j=\overline{2,m}\). In this case, problem (23) is to maximize the probability that the capital first reaches the amount (−φ) within a number of time steps bounded above by N + 1 through investing in these assets. Problem (24) is to maximize the probability that the capital reaches the amount (−φ) at the terminal step N + 1.
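The first-reach probability in problem (23) is easy to estimate by simulation for a fixed portfolio strategy. The sketch below assumes a standard multiplicative capital model in place of Eq. (22), whose exact form is not reproduced here; all parameter values and the fixed shares are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed stand-in for the portfolio dynamics (22): one risk-free and
# one risky asset, multiplicative capital update. Invented parameters.
X0 = 1.0            # initial capital X
b = 0.05            # risk-free rate of return
phi = -2.0          # goal: first reach the capital level -phi = 2.0
N = 25              # horizon bound
u1, u2 = 0.5, 0.5   # fixed portfolio shares, u1 + u2 = 1

def reaches_goal():
    x = X0
    for _ in range(N + 1):
        xi = rng.uniform(-0.2, 0.4)                 # risky rate of return
        x *= u1 * (1.0 + b) + u2 * (1.0 + xi)       # capital update
        if x >= -phi:                               # first reach of the target
            return True
    return False

M = 10_000
p_hat = sum(reaches_goal() for _ in range(M)) / M
print(f"estimated first-reach probability: {p_hat:.3f}")
```

The stopping test `x >= -phi` is what distinguishes problem (23) from problem (24): for the terminal criterion one would instead check the capital only at the final step N + 1.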
Problem (24) was considered in the two-step statement (N = 1) for the case of one risky asset (m = 2) in [15, 16]. In [13], a whole class of asymptotically optimal controls (as N → ∞) was found. Problem (23) is studied for the first time.
Using the results of this paper, check the equivalence of problems (23) and (24).
Proposition 1
For all \(k=\overline{0,N},\)
where the scalars \({\varphi }_{k}^{{\mathcal{I}}}\) and \({\varphi }_{k}^{{\mathcal{O}}}\) are given by
Due to Proposition 1, the condition of the Lemma holds, \({{\mathcal{F}}}_{k}\subseteq \Delta {{\mathcal{I}}}_{k}\), and consequently problems (23) and (24) are equivalent. From Proposition 1 it follows that the optimal control in terms of the probabilistic terminal criterion also maximizes the probability of first reaching the amount (−φ) by the capital xk within a number of time steps bounded above by N.
Now apply the two-sided estimates of the optimal-value function of the probabilistic criterion to estimate a time step \({N}^{* }\in {\mathbb{N}}\) such that
where \({\left\{{x}_{k}^{* }\right\}}_{k = 0}^{N}\) are the trajectories of the closed loop system (22) with the optimal control \({u}^{* }\left(\cdot \right)\). Interestingly, the two-sided estimates of the optimal-value function of the probabilistic criterion (see Theorem 3) can be used to obtain such an estimate without finding the optimal control.
Proposition 2
Let \({\left\{{\underline{x}}_{k}\right\}}_{k = 0}^{N}\) be the trajectories of the closed loop system (22) with the control (11), where
Then there exists a time step \(\underline{N}\in {\mathbb{N}}\) such that
and moreover for any X > 0 and b > − 1, \({N}^{* }\geqslant \underline{N}\) and
Perform a series of three numerical experiments to check the adequacy of the estimate (26). Consider the case of one risky asset, for which the control (25) was found in [13]:
The values of the system parameters are given in Table 1.
To estimate the value N*, the Monte Carlo method with 50 000 samples was used. The simulation results are presented in Table 2.
According to Table 2, \(\underline{N}\) is a rather accurate estimate of the value N*.
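An experiment of this kind can be reproduced in outline as follows. Since Eq. (22) and the Table 1 parameters are not reproduced here, the multiplicative capital model and all numbers below are stand-ins, and the fixed shares merely play the role of the control (25):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in parameters (Table 1 is not reproduced here, so these are invented);
# the fixed shares u1, u2 play the role of the control (25).
X0, b, phi = 1.0, 0.05, -2.0
u1, u2 = 0.5, 0.5
MAX_STEPS = 200

def first_reach_step():
    """First step at which capital reaches -phi (inf if not within MAX_STEPS)."""
    x = X0
    for k in range(1, MAX_STEPS + 1):
        xi = rng.uniform(-0.2, 0.4)                 # risky rate of return
        x *= u1 * (1.0 + b) + u2 * (1.0 + xi)       # capital update
        if x >= -phi:
            return k
    return float("inf")

M = 50_000   # sample size matching the paper's experiment
times = np.array([first_reach_step() for _ in range(M)])
print("median first-reach step:", np.median(times))
```

Comparing empirical statistics of the first-reach step with the analytical lower bound \(\underline{N}\) is exactly the adequacy check reported in Table 2.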
8 Conclusions
This paper has considered the optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube of trajectories as the optimality criterion. Dynamic programming-based sufficient conditions of optimality have been established. Two-sided estimates of the right-hand side of the dynamic programming equation, two-sided estimates of the Bellman function, and two-sided estimates of the optimal-value function of the probabilistic criterion have been found. A suboptimal control has been designed in analytical form, and its accuracy has been estimated. The conditions under which this problem is equivalent to the optimal control problem with a probabilistic terminal criterion have been proved. These conditions were tested on an optimal management problem for an investment portfolio.
9 Funding
This work was supported by the Russian Foundation for Basic Research, project no. 18-08-00595.
References
Afanas’ev, V.N., Kolmanovskii, V.B., and Nosov, V.R., Matematicheskaya teoriya konstruirovaniya sistem upravleniya, 3rd ed., Moscow: Vysshaya Shkola, 2003. Translated under the title Mathematical Theory of Control Systems Design, Kluwer, 1996.
Smirnov, I. P. On a Time-Optimal Control Problem for a Stochastic Controlled System. Differ. Uravn. 22(no. 2), 247–254 (1986).
Bertsekas, D. P. & Shreve, S. E. Stochastic Optimal Control: The Discrete-Time Case. (Academic, New York, 1978).
Zhou, J. Infinite Horizon Optimal Control Problem for Stochastic Evolution Equations in Hilbert Spaces. J. Dyn. Control Syst. 22(no. 3), 531–554 (2016).
Agram, N., Haadem, S., Øksendal, B. & Proske, F. A Maximum Principle for Infinite Horizon Delay Equations. SIAM J. Math. Anal. 45(no. 4), 2499–2522 (2013).
Agram, N. & Øksendal, B. Infinite Horizon Optimal Control of Forward-Backward Stochastic Differential Equations with Delay. J. Comput. Appl. Math. 259(part B), 336–349 (2014).
Chernous’ko, F. L. & Kolmanovskii, V. B. Optimal’noe upravlenie pri sluchainykh vozmushcheniyakh (Optimal Control under Random Perturbations). (Nauka, Moscow, 1978).
Smirnov, I. P. Control of the Probability of Reaching a Given Domain by a System. Differ. Uravn. 26(no. 10), 1753–1758 (1990).
Azanov, V. M. Optimal Control for Linear Discrete Systems with Respect to Probabilistic Criteria. Autom. Remote Control 75(no. 10), 1743–1753 (2014).
Semakov, S. L. Vybrosy sluchainykh protsessov: prilozheniya v aviatsii. (Nauka, Moscow, 2005). Translated under the title Crossings Problems in Random Processes Theory and Their Applications in Aviation, Newcastle: Cambridge Scholars Publishing, 2019.
Khametov, V. M. & Shelemekh, E. A. Superhedging of American Options on an Incomplete Market with Discrete Time and Finite Horizon. Autom. Remote Control 76(no. 9), 1616–1634 (2015).
Malyshev, V. V. & Kibzun, A. I. Analiz i sintez vysokotochnogo upravleniya letatel’nymi apparatami (Analysis and Design of High-Precision Control of Aircrafts). (Mashinostroenie, Moscow, 1987).
Azanov, V. M. & Kan, Yu. S. Bilateral Estimation of the Bellman Function in the Problems of Optimal Stochastic Control of Discrete Systems by the Probabilistic Performance Criterion. Autom. Remote Control 79(no. 2), 203–215 (2018).
Azanov, V. M. & Kan, Yu. S. Refined Estimation of the Bellman Function for Stochastic Optimal Control Problems with Probabilistic Performance Criterion. Autom. Remote Control 80(no. 4), 634–647 (2019).
Grigor’ev, P. V. & Kan, Yu. S. Optimal Control of the Investment Portfolio with Respect to the Quantile Criterion. Autom. Remote Control 65(no. 2), 319–336 (2004).
Bunto, T. V. & Kan, Yu. S. Quantile Criterion-based Control of the Securities Portfolio with a Nonzero Ruin Probability. Autom. Remote Control 74(no. 5), 811–828 (2013).
Kibzun, A. I. & Ignatov, A. N. The Two-Step Problem of Investment Portfolio Selection from Two Risk Assets via the Probability Criterion. Autom. Remote Control 76(no. 7), 1201–1220 (2015).
Azanov, V. M. & Kan, Yu. S. Design of Optimal Strategies in the Problems of Discrete System Control by the Probabilistic Criterion. Autom. Remote Control 78(no. 6), 1006–1027 (2017).
Kibzun, A. I. & Ignatov, A. N. Reduction of the Two-Step Problem of Stochastic Optimal Control with Bilinear Model to the Problem of Mixed Integer Linear Programming. Autom. Remote Control 77(no. 12), 2175–2192 (2016).
Appendix
Proof of Theorem 1. Introduce a function \({\Phi }_{0}:{{\mathbb{R}}}^{n}\to {\mathbb{R}}\) such that \({\Phi }_{0}\left(x\right)={\Phi }_{1}\left(x\right)\). Augment the state vector of the system by adding a new variable \({y}_{k}={\rm{min}}_{i = \overline{0,k}}{\Phi }_{i}\left({x}_{i}\right)\). The augmented control system has the form
where \({\widetilde{u}}_{k}={\widetilde{\gamma }}_{k}\left({x}_{k},{y}_{k}\right)\). Consider the right-hand side \(\widetilde{f}:{{\mathbb{R}}}^{n+1}\times {{\mathbb{R}}}^{m}\times {{\mathbb{R}}}^{s}\to {{\mathbb{R}}}^{n+1}\) of the augmented system:
Then the equivalent optimal control problem can be written as
where \(\widetilde{{\mathcal{U}}}={\widetilde{{\mathcal{U}}}}_{0}\times \ldots \times {\widetilde{{\mathcal{U}}}}_{N}\) and
Here the equivalence is understood in the sense of the same criteria
The Bellman equation for the equivalent problem [12] has the form
The following result was established in [12]. Assume that:
1) The functions \({\widetilde{f}}_{k}\) are continuous for all \(k=\overline{0,N}\).
2) The function \({\rm{min}}\left\{y,{\Phi }_{N+1}\left(x\right)\right\}\) is continuous and bounded below.
3) The random vectors X, ξ0, …, ξN are independent.
4) The sets U0, …, UN are compact.
Then the optimal control exists in the class of measurable functions and is determined by solving problems (A.1)–(A.3).
At step k = N the Bellman equation takes the form
Hence, for any \(k=\overline{0,N}\), Eq. (A.2) can be written as
Therefore, the Bellman function in the equivalent problem admits the representation
As a result, the optimal control is given by
Conditions 1–5 of Theorem 1 follow from the existence conditions of an optimal control in the problem with the terminal criterion. Note that items 2 and 3 are immediate from the continuity and boundedness (from below) of the function \({\rm{min}}\left\{y,{\Phi }_{N+1}\left(x\right)\right\}\).
The proof of Theorem 1 is complete.
Proof of Theorem 2. 1. Consider the Bellman equation at some step k. For the isobell of level 1,
Using the total mathematical expectation formula, write
From the equalities
it follows that
Item 1 of this theorem is established.
2. By analogy with item 1, the isobell of level 0 of the Bellman function can be written as
Using the total mathematical expectation formula, transform the right-hand side of this expression:
From the equalities
it follows that
Item 2 of this theorem is established.
3. Item 3 of this theorem follows from its item 1.
4. Item 4 of this theorem holds, since \({{\rm{B}}}_{k}\left(x\right)=0\) for \(x\in {{\mathcal{O}}}_{k}\).
5. Item 5 of this theorem follows from its item 1.
6. For \(x\in {{\mathcal{B}}}_{k}\), the right-hand side of the dynamic programming equation can be represented as
Due to the two-sided inequality for a convex combination, this yields
The relations \(1-{{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{O}}}_{k+1})={{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{I}}}_{k+1})+{{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{B}}}_{k+1})\) and \({{\mathcal{F}}}_{k}\subseteq {{\mathcal{I}}}_{k}\) finally establish item 6 of this theorem.
7. Item 7 of this theorem is immediate from its item 6 by taking the supremum in all parts of inequality (8).
Proof of Theorem 3. 1. Introduce a system of hypotheses forming a complete group of incompatible events:
Then the total probability formula gives the chain of equalities
Analyze the second factor in the first term of the right-hand side of this expression. From the chain of equalities
it follows that
In view of this equality, the expression (A.8) takes the form
Now introduce another system of hypotheses forming a complete group of incompatible events:
Apply the total probability formula for the first term in the right-hand side of (A.9):
Similar to (A.9), transform the right-hand side of (A.10) to obtain
Substituting (A.10) into (A.9) yields
Perform the analogous transformations for the first term in (A.12), introducing the systems of hypotheses
to obtain the following expression for the value of the probabilistic criterion under the control \(\underline{u}\left(\cdot \right):\)
Note that the first term in (A.13) satisfies the chain of equalities
which implies the expression (14).
Item 1 of Theorem 3 is established.
2. For establishing item 2 of Theorem 3, it suffices to observe that for \({x}_{k}\in {{\mathcal{I}}}_{k}\), \({u}_{k}^{* }={\underline{u}}_{k}\) for all \(k=\overline{0,N}\). Following the same procedure as above, this equality can be used for deriving the expression (14) for the optimal-value function of the probabilistic criterion on the trajectories \({\left\{{x}_{k}^{* }\right\}}_{k = 1}^{N+1}\) of the closed loop system with the optimal control \({u}^{* }\left(\cdot \right)\).
Item 2 of Theorem 3 is established.
3. Item 3 of Theorem 3 directly follows from item 7 of Theorem 2 and item 1 of Theorem 3.
The proof of Theorem 3 is complete.
Proof of Lemma. Consider the dynamic programming relations (3)–(5) and (18)–(20) for problems (2) and (17), respectively. In (19), the equality
where \({{\mathcal{I}}}_{k}^{\varphi }\) is the isobell of level 1 of the Bellman function \({{\rm{B}}}_{k}^{\varphi }\left(x\right)\), holds for all \(x\in {{\mathbb{R}}}^{n}\). Hence, the equivalence conditions are true if the isobells of level 1 of the Bellman functions in problems (2) and (17) are the same, \({{\mathcal{I}}}_{k}={{\mathcal{I}}}_{k}^{\varphi }\). Due to the recurrent formula for \({{\mathcal{I}}}_{k}\) (see item 1 of Theorem 2), this is the case if and only if
The proof of Lemma is complete.
Proof of Proposition 1. In accordance with item 1 of Theorem 2, write the equations for the isobells of levels 1 and 0 of the Bellman function at step k = N:
The solutions of these equations were obtained in [13]. Using those results and the notations for the boundaries of the sets \({{\mathcal{I}}}_{k}\) and \({{\mathcal{O}}}_{k}\) (see Section 6), write:
This implies \({{\mathcal{I}}}_{N}=\Delta {{\mathcal{I}}}_{N}\). Due to item 1 of Theorem 2, at step k = N − 1 the equations for the isobells take the form
By analogy with step k = N, it follows that
Mathematical induction on k finally leads to the conclusion that, for all \(k=\overline{0,N}\),
The proof of Proposition 1 is complete.
Proof of Proposition 2. Consider the control (11), which takes the form (25); see Proposition 1. From item 3 of Theorem 3 it follows that
where the function \(\underline{F}\) is given by
due to
Clearly, the value \(\underline{N}\in {\mathbb{N}}\) can be determined as a root of the equation \(\underline{F}\left(\varphi ,N,X\right)=1\). But there exist infinitely many such roots, and the estimate \(\underline{N}\) will be therefore found in the form
The expressions (A.15) and (A.17) imply
From the definition of the isobell of level 1 of the Bellman function and the definition of the function \(\underline{F}\) it follows that the equation \(\underline{F}\left(\varphi ,N,X\right)=1\) is equivalent to the inclusion \(X\in {{\mathcal{I}}}_{0}\), which is in turn equivalent to
Taking the logarithm gives the inequality
In view of (A.17), this finally leads to (26).
The proof of Proposition 2 is complete.
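For orientation, the omitted chain of inequalities in this proof can be sketched under one natural assumption, which is not verifiable from the text reproduced here: that the inclusion \(X\in {{\mathcal{I}}}_{0}\) corresponds to the capital reaching \(-\varphi\) deterministically when invested entirely in the risk-free asset. Under this assumption, with \(X>0\), \(b>-1\), and \(\varphi <0\), one would expect

```latex
X\,(1+b)^{N+1} \geqslant -\varphi
\quad\Longleftrightarrow\quad
(N+1)\,\ln(1+b) \geqslant \ln\frac{-\varphi}{X},
```

so that the smallest admissible horizon would be \(\underline{N}=\left\lceil \ln \left(-\varphi /X\right)/\ln \left(1+b\right)\right\rceil -1\). Both the exponent N + 1 and this closed form are assumptions made only for this sketch; the precise expression is the estimate (26) of Proposition 2.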
Azanov, V. Optimal Control of a Discrete-Time Stochastic System with a Probabilistic Criterion and a Non-fixed Terminal Time. Autom Remote Control 81, 2143–2159 (2020). https://doi.org/10.1134/S0005117920120012