1 Introduction

One of the most important fields of research in stochastic optimal control concerns problems with a non-fixed terminal time. Among them, note the stochastic time-optimal control problem [1, 2], the stochastic optimal control problem with infinite horizon [3–6], the problem of optimizing the duration of stay of a stochastic system in a given tube of trajectories [1, 7], and the stochastic problem of optimizing the time of first reaching the boundaries of a given domain [1, 8, 9]. Control models with a non-fixed terminal time are widely used in aerospace [10], economic [11], biological, robotic, and energy [1] applications. In the case of continuous-time systems, methods based on dynamic programming sufficient conditions of optimality, which yield optimal feedback control laws, have become widespread. Notably, it is precisely the use of the probabilistic criterion [1, 7] that leads to a constructive statement of control problems with a non-fixed terminal time for which the Bellman equation can be written. Nevertheless, there are still a number of fundamental problems in the numerical calculation of optimal control, and analytical solutions have been obtained only for some model problems [1]. This circumstance is due to the following difficulties. First, the solution of the Bellman equation may be nonunique. Second, even if a solution of the Bellman equation exists in the class of smooth functions, it may be inadmissible (for example, because the Itô equation of the closed-loop stochastic system with this control has no strong solution). Third, the class of admissible controls does not always include the one for which the supremum (or infimum) of the criterion is achieved. Fourth, the Bellman equation suffers from the curse of dimensionality.

In the case of discrete time, a qualitative theory of such problems was presented in [3]. Solutions of particular economic problems [11] and model examples [9] are also known. In [9], the problem of maximizing the probability of first reaching a neighborhood of the origin by the trajectories of a linear stochastic system in the Brunovsky controllability canonical form was considered. Its analytical solution was found through reduction to a problem with a probabilistic terminal criterion and subsequent use of the dynamic programming method in the form [12].

This paper studies an optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube of trajectories as the optimality criterion. Sufficient conditions of optimality similar to [12] are established, and the properties of two-sided estimates of the Bellman function [13, 14] are examined. Conditions of equivalence to the optimal control problem with a probabilistic terminal criterion [12] are obtained. As an illustrative example, the problem of investment portfolio management is considered.

2 Problem Statement

Consider a discrete-time stochastic controlled system of the form

$$\left\{\begin{array}{l}{x}_{k+1}={f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\\ {x}_{0}=X,\end{array}\right.\qquad k=\overline{0,N},$$
(1)

with the following notations: \({x}_{k}\in {{\mathbb{R}}}^{n}\) as the state vector; \({u}_{k}\in {U}_{k}\subset {{\mathbb{R}}}^{m}\) as the control vector; Uk as the set of control constraints; ξk as the vector of random perturbations with the codomain \({{\mathbb{R}}}^{s}\) and a given distribution \({{\bf{P}}}_{{\xi }_{k}}\); \({f}_{k}:{{\mathbb{R}}}^{n}\times {{\mathbb{R}}}^{m}\times {{\mathbb{R}}}^{s}\to {{\mathbb{R}}}^{n}\) as the transition function (system function); finally, \(N\in \left\{0\right\}\cup {\mathbb{N}}\) as the control horizon.

Introduce several assumptions regarding the system (1) as follows.

1. The complete information on the state vector xk is available. (Due to this fact, an appropriate control can be constructed in the class of functions \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}\left(\cdot \right)\) is some measurable function.) In this case, it is said that the control is sought in the class of full state feedback controls.

2. The initial state x0 = X is a deterministic vector from \({{\mathbb{R}}}^{n}\).

3. The system function \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) is continuous for all k.

4. The control vector uk is generated in the form \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}:{{\mathbb{R}}}^{n}\to {{\mathbb{R}}}^{m}\) is a measurable function with values uk ∈ Uk, Uk being a compact set.

5. The state vector xk+1 is generated in the following way: at step k the vector xk is realized; then the control vector \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\) is formed; finally, the random perturbation ξk is realized.

6. The control is the set of functions \(u\left(\cdot \right)=\left({\gamma }_{0}\left(\cdot \right),\ldots ,{\gamma }_{N}\left(\cdot \right)\right)\in {\mathcal{U}}\); the class of admissible controls is the set \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N}\), where \({{\mathcal{U}}}_{k}\) denotes the set of Borel functions \({\gamma }_{k}\left(\cdot \right)\) with values in Uk.

7. The random vector ξk is continuous, has the codomain \({{\mathbb{R}}}^{s}\) and a known distribution \({{\bf{P}}}_{{\xi }_{k}}\); moreover, the components of the vector \(\zeta =\left(X,{\xi }_{0},\ldots ,{\xi }_{N}\right)\) are independent.

Note that the system (1) is Markovian, i.e., its future behavior does not depend on the past states (history) and is completely defined by the current state.

On the trajectories of the system (1) define the probabilistic functional

$$P\left(u\left(\cdot \right)\right)={\bf{P}}\left(\mathop{\bigcup }\limits_{k = 0}^{N}\left\{{x}_{k+1}\in {{\mathcal{F}}}_{k+1}\right\}\right),$$

in which the sets \({{\mathcal{F}}}_{k}\) have the form

$$\left\{\begin{array}{l}{{\mathcal{F}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:{\Phi }_{k}\left(x\right)\leqslant \varphi \right\},\quad k=\overline{1,N+1}\\ {{\mathcal{F}}}_{0}=\varnothing ,\end{array}\right.$$

where \(\varphi \in {\mathbb{R}}\) is a given scalar; \({\Phi }_{k}:{{\mathbb{R}}}^{n}\to {\mathbb{R}}\) are continuous functions, k = 1, …, N + 1, and \({\Phi }_{N+1}\left(x\right)\) is bounded below.

Consider the optimization problem

$$P\left(u\left(\cdot \right)\right)\to \mathop{\rm{max}}\limits_{u\left(\cdot \right)\in {\mathcal{U}}},$$
(2)

where \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N}\).

Physically, problem (2) is to find a control maximizing the probability of first reaching the tube of trajectories described by the sequence of given sets \({\left\{{{\mathcal{F}}}_{k+1}\right\}}_{k = 0}^{N}\).
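
To make the criterion concrete, the following minimal Monte Carlo sketch (not part of the formal development; f, policy, Phi, and sample_xi are hypothetical stand-ins for \({f}_{k}\), \({\gamma }_{k}\), \({\Phi }_{k}\), and the distribution \({{\bf{P}}}_{{\xi }_{k}}\)) estimates \(P\left(u\left(\cdot \right)\right)\) for a fixed feedback control by simulating the system (1) and stopping at the first step for which \({x}_{k+1}\in {{\mathcal{F}}}_{k+1}\).

```python
import numpy as np

def estimate_reach_probability(f, policy, Phi, phi, X, N, sample_xi,
                               n_mc=10_000, rng=None):
    """Monte Carlo estimate of P(u(.)), the probability that the closed
    loop trajectory enters some set F_{k+1} = {x : Phi_{k+1}(x) <= phi}.
    All callables are hypothetical stand-ins: f(k, x, u, xi) ~ f_k,
    policy(k, x) ~ gamma_k, Phi(k, x) ~ Phi_k, sample_xi(k, rng) ~ P_{xi_k}."""
    rng = rng or np.random.default_rng()
    hits = 0
    for _ in range(n_mc):
        x = X
        for k in range(N + 1):
            u = policy(k, x)
            x = f(k, x, u, sample_xi(k, rng))
            if Phi(k + 1, x) <= phi:   # first-reach event: x_{k+1} in F_{k+1}
                hits += 1
                break
    return hits / n_mc
```

The inner loop stops at the first step for which the trajectory enters the tube, which is exactly the first-reach event in the union defining \(P\left(u\left(\cdot \right)\right)\).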

Note that the dynamic programming method in the form [12], introduced for optimal control problems with the probabilistic terminal criterion, is not generally applicable to problem (2). The dynamic programming-based sufficient conditions of optimality, similar to [12], will be established in Section 3.

3 Dynamic Programming and Two-Sided Estimates of Bellman Function

Define the Bellman function \({{\rm{B}}}_{k}:{{\mathbb{R}}}^{n}\to \left[0,1\right]\) in problem (2) as

$${{\rm{B}}}_{k}\left(x\right)=\mathop{\rm{sup}}\limits_{{\gamma }_{k}\left(\cdot \right)\in {{\mathcal{U}}}_{k},\ldots ,{\gamma }_{N}\left(\cdot \right)\in {{\mathcal{U}}}_{N}}{\bf{P}}\left(\mathop{\rm{min}}\limits_{i = \overline{k,N}}\left.{\Phi }_{i+1}\left({x}_{i+1}\right)\leqslant \varphi \right|{x}_{k}=x\right).$$

In view of the assumptions accepted in Section 2, formulate a theorem on the Bellman equation for problem (2) in the n-dimensional state space.

Theorem 1

Let the following conditions be satisfied:

1) The functions \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) are continuous for all \(k=\overline{0,N}\).

2) The functions \({\Phi }_{k}\left({x}_{k}\right)\) are continuous for all \(k=\overline{1,N+1}\).

3) The function \({\Phi }_{N+1}\left({x}_{N+1}\right)\) is bounded below.

4) The random vectors X, ξ0, …, ξN are independent.

5) The sets U0, …, UN are compact.

Then the optimal control in problem (2) exists in the class of measurable functions \({u}^{* }\left(\cdot \right)\in {\mathcal{U}}\) and is determined by solving the following problems:

$${\gamma }_{k}^{* }\left(x\right)={\rm{arg}}\,\mathop{\rm{max}}\limits_{u\in {U}_{k}}\,\,{{\bf{M}}}_{{\xi }_{k}}\left[{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)+\left(1-{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\right){{\rm{B}}}_{k+1}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right],$$
(3)
$${{\rm{B}}}_{k}\left(x\right)=\mathop{\rm{max}}\limits_{u\in {U}_{k}}\,\,{{\bf{M}}}_{{\xi }_{k}}\left[{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)+\left(1-{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\right){{\rm{B}}}_{k+1}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right],\quad k=\overline{0,N},$$
(4)
$${{\rm{B}}}_{N+1}\left(x\right)={{\bf{I}}}_{{{\mathcal{F}}}_{N+1}}\left(x\right).$$
(5)

The proof of Theorem 1, as well as the proofs of all other theorems, propositions, and the lemma below, is deferred to the Appendix.

In Theorem 1, \({{\bf{M}}}_{{\xi }_{k}}\left[\cdot \right]\) denotes the expectation operator with respect to the distribution \({{\bf{P}}}_{{\xi }_{k}}\) of the random vector ξk, and \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) is the indicator function of the set \({{\mathcal{F}}}_{k}\). Note that Eqs. (3)–(5) differ from the classical Bellman equation [12] for the problem with the probabilistic terminal criterion by the additional term \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) and the factor \(1-{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) appearing under the expectation operator on the right-hand side.
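
For intuition, the recursion (4)–(5) can be discretized in the scalar case. The sketch below is an assumption-laden illustration rather than the paper's method: the expectation is replaced by a sample average over draws of ξk, the Bellman function is linearly interpolated on an ascending state grid, and each set Uk is restricted to a finite list of candidate controls.

```python
import numpy as np

def bellman_backward(grid, f, Phi, phi, U_list, xi_samples, N):
    """Grid-based sketch of the recursion (4)-(5) for a scalar state.
    Assumptions: `grid` is an ascending 1-D array; the expectation
    M_{xi_k}[.] is replaced by an average over `xi_samples[k]`; the
    control set U_k is the finite candidate list `U_list[k]`."""
    in_F = lambda k, x: Phi(k, x) <= phi                  # indicator of F_k
    B_next = np.array([1.0 if in_F(N + 1, x) else 0.0 for x in grid])  # (5)
    policies = []
    for k in range(N, -1, -1):                            # backward in time
        B_k = np.empty_like(B_next)
        gamma_k = [None] * len(grid)
        for i, x in enumerate(grid):
            if in_F(k, x):                 # x already in the tube: B_k(x) = 1,
                B_k[i] = 1.0               # any admissible control will do
                gamma_k[i] = U_list[k][0]
                continue
            best_val, best_u = -np.inf, None
            for u in U_list[k]:
                x_next = np.array([f(k, x, u, xi) for xi in xi_samples[k]])
                val = np.interp(x_next, grid, B_next).mean()
                if val > best_val:
                    best_val, best_u = val, u
            B_k[i], gamma_k[i] = best_val, best_u
        policies.append(gamma_k)
        B_next = B_k
    return B_next, policies[::-1]          # B_0 on the grid, gamma_0..gamma_N
```

The branch for \(x\in {{\mathcal{F}}}_{k}\) mirrors the structure of (3)–(4): once the state has reached the tube, \({{\rm{B}}}_{k}\left(x\right)=1\) irrespective of the control.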

As is known, the direct integration of the Bellman equation runs into the difficulties of calculating multiple integrals and solving stochastic programming problems in its right-hand side. These difficulties have repeatedly occurred even for systems of the simplest type, e.g., for the investment portfolio management system [15–17] and for the stationary satellite control system [12]. As a result, for almost twenty years, optimal control problems with a probabilistic criterion were considered only in the one-step (N = 0) and two-step (N = 1) statements; see [15–17]. In [13, 14], two-sided estimates of the Bellman function were found using the isobells of levels 1 and 0, which allowed solving particular optimal control problems with a probabilistic criterion for an arbitrary number of time steps N.

By analogy with [13, 14], investigate the properties of the Bellman function for problem (2) using the isobells of levels 1 and 0 of the Bellman function.

4 Two-Sided Estimates of Bellman Function

Introduce the isobells of levels 1 and 0 of the Bellman function, defined as

$${{\mathcal{I}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:\quad {{\rm{B}}}_{k}\left(x\right)=1\right\},\quad {{\mathcal{O}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:\quad {{\rm{B}}}_{k}\left(x\right)=0\right\},$$

and the set \({{\mathcal{B}}}_{k}={{\mathbb{R}}}^{n}\setminus \left\{{{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\right\}\). For the sake of convenience, also introduce the notation \({\overline{{\mathcal{F}}}}_{k}={{\mathbb{R}}}^{n}\setminus {{\mathcal{F}}}_{k}\). Clearly, by the definition of these sets,

$${{\mathcal{I}}}_{k}\cup {{\mathcal{B}}}_{k}\cup {{\mathcal{O}}}_{k}={{\mathbb{R}}}^{n},\quad \left\{\begin{array}{ll}{{\rm{B}}}_{k}\left(x\right)=1,&x\in {{\mathcal{I}}}_{k}\\ {{\rm{B}}}_{k}\left(x\right)\in \left(0,1\right),&x\in {{\mathcal{B}}}_{k}\\ {{\rm{B}}}_{k}\left(x\right)=0,&x\in {{\mathcal{O}}}_{k}.\end{array}\right.$$

Theorem 2

The following statements hold:

1. The sets \({{\mathcal{I}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time

$${{\mathcal{I}}}_{k}={{\mathcal{F}}}_{k}\cup \left\{x\in {{\mathbb{R}}}^{n}:\exists u\in {U}_{k}:{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right)=1\right\},\quad k=\overline{0,N},$$
$${{\mathcal{I}}}_{N+1}={{\mathcal{F}}}_{N+1}.$$

2. The sets \({{\mathcal{O}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time

$${{\mathcal{O}}}_{k}={\overline{{\mathcal{F}}}}_{k}\cap \left\{x\in {{\mathbb{R}}}^{n}:\forall u\in {U}_{k}:{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{O}}}_{k+1}\right)=1\right\},\quad k=\overline{0,N},$$
$${{\mathcal{O}}}_{N+1}={\overline{{\mathcal{F}}}}_{N+1}.$$

3. For \(x\in {{\mathcal{I}}}_{k}\), the function \({\gamma }_{k}^{* }(x)\) takes any value from the set

$${U}_{k}^{{\mathcal{I}}}\left(x\right)=\left\{u\in {U}_{k}:{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right)=1\right\}.$$
(6)

4. For \(x\in {{\mathcal{O}}}_{k}\), the function \({\gamma }_{k}^{* }\left(x\right)\) takes any value from the set Uk.

5. For all \(x\in {{\mathcal{B}}}_{k}\), the Bellman equation admits the representation

$$\begin{array}{rcl}{{\rm{B}}}_{k}\left(x\right)&=&\mathop{\rm{max}}\limits_{u\in {U}_{k}}\left\{{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right)\right.\\ &&\left.+{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{B}}}_{k+1}\right){{\bf{M}}}_{{\xi }_{k}}\left[\left.{{\rm{B}}}_{k+1}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right|{f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{B}}}_{k+1}\right]\right\}.\end{array}$$
(7)

6. For \(x\in {{\mathcal{B}}}_{k}\) and u ∈ Uk,

$$\begin{array}{cc}{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{F}}}_{k+1}\right)\leqslant {{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right)\\ \leqslant {{\bf{M}}}_{{\xi }_{k}}\left[{{\rm{B}}}_{k+1}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right]\leqslant 1-{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{O}}}_{k+1}\right).\end{array}$$
(8)

7. For \(x\in {{\mathcal{B}}}_{k}\), the Bellman function satisfies the two-sided inequality

$${\Psi }_{k}\left(x\right)\leqslant {\underline{{\rm{B}}}}_{k}\left(x\right)\leqslant {{\rm{B}}}_{k}\left(x\right)\leqslant {\overline{{\rm{B}}}}_{k}\left(x\right),$$
(9)

where

$${\Psi }_{k}\left(x\right)=\mathop{\rm{sup}}\limits_{u\in {U}_{k}}{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{F}}}_{k+1}\right),$$
(10)

\({\underline{{\rm{B}}}}_{k}\left(x\right)\) and \({\overline{{\rm{B}}}}_{k}\left(x\right)\) are lower and upper bounds on the Bellman function, i.e.,

$$\begin{array}{cc}{\underline{{\rm{B}}}}_{k}\left(x\right)=\mathop{\rm{sup}}\limits_{u\in {U}_{k}}{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right),\\ {\overline{{\rm{B}}}}_{k}\left(x\right)=\mathop{\rm{sup}}\limits_{u\in {U}_{k}}\left\{1-{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{O}}}_{k+1}\right)\right\},\end{array}$$

moreover \({\underline{{\rm{B}}}}_{N}\left(x\right)={{\rm{B}}}_{N}\left(x\right)={\overline{{\rm{B}}}}_{N}\left(x\right)\).

The right-hand side of the relations in item 1 of Theorem 2 differs from the relations for the isobell of level 1 of the Bellman function in the problem with the terminal probabilistic criterion [18] by the union operation with the set \({{\mathcal{F}}}_{k}\). For the isobell of level 0 of the Bellman function, the difference lies in the intersection operation with the set \({\overline{{\mathcal{F}}}}_{k}\). Items 3 and 4 establish simpler (compared to (3)) expressions for determining the optimal control for \({x}_{k}\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\), which coincide, up to the structures of the sets \({{\mathcal{I}}}_{k}\), with those in the problem with the terminal probabilistic criterion. Items 6 and 7 of Theorem 2 establish two-sided estimates for the right-hand side of the dynamic programming equation and the Bellman function, respectively. Moreover, the expressions for the lower and upper bounds, up to the structures of the sets \({{\mathcal{I}}}_{k}\) and \({{\mathcal{O}}}_{k}\), coincide with those in the problem with the terminal criterion [13, 18]. The difference is the presence of an additional inequality on the left-hand side of (8) and, as a consequence, the inequality \({\Psi }_{k}\left(x\right)\leqslant {\underline{{\rm{B}}}}_{k}\left(x\right)\).
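
The bounds in item 7 become directly computable once membership tests for \({{\mathcal{I}}}_{k+1}\) and \({{\mathcal{O}}}_{k+1}\) are available. Below is a hedged Monte Carlo sketch; in_I_next and in_O_next are hypothetical membership oracles, and U_k is a finite list of candidate controls.

```python
import numpy as np

def two_sided_bounds(k, x, U_k, f, in_I_next, in_O_next, xi_samples):
    """Sketch of the bounds in item 7 of Theorem 2: B_k(x) lies between
    sup_u P(f in I_{k+1}) and sup_u {1 - P(f in O_{k+1})}.  Probabilities
    are replaced by sample frequencies over `xi_samples`."""
    lower = max(np.mean([in_I_next(f(k, x, u, xi)) for xi in xi_samples])
                for u in U_k)
    upper = max(1.0 - np.mean([in_O_next(f(k, x, u, xi)) for xi in xi_samples])
                for u in U_k)
    return lower, upper
```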

Next, explore in detail the properties of the control maximizing at each step the lower bound on the right-hand side of the dynamic programming equation.

5 Suboptimal Control

Consider a control \(\underline{u}\left(\cdot \right)=\left({\underline{\gamma }}_{0}\left(\cdot \right),\ldots ,{\underline{\gamma }}_{N}\left(\cdot \right)\right)\), where

$${\underline{\gamma }}_{k}\left(x\right)={\rm{arg}}\,\mathop{\rm{max}}\limits_{u\in {U}_{k}}{{\bf{P}}}_{{\xi }_{k}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right),\quad k=\overline{0,N}.$$
(11)

This control has the following properties (a computational sketch is given after the list):

  • For \(x\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\) and for all \(k=\overline{0,N}\), \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).

  • For k = N, \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).

  • From \({{\mathcal{F}}}_{k}={{\mathcal{I}}}_{k}\) holding for all \(k=\overline{0,N}\) it follows that \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
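
The following minimal sketch of the selector (11) uses the same conventions as above (finite candidate control list, sampled perturbations, hypothetical membership oracle for \({{\mathcal{I}}}_{k+1}\)).

```python
import numpy as np

def gamma_lower(k, x, U_k, f, in_I_next, xi_samples):
    """Sketch of the selector (11): choose the candidate control that
    maximizes the sampled frequency of landing in the isobell I_{k+1}."""
    probs = [np.mean([in_I_next(f(k, x, u, xi)) for xi in xi_samples])
             for u in U_k]
    return U_k[int(np.argmax(probs))]
```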

Theorem 3

Assume that the control \(\underline{u}\left(\cdot \right)\) exists in the class \({\mathcal{U}}\). Then the following statements hold:

1. The value of the probabilistic criterion under the control \(\underline{u}\left(\cdot \right)\) is given by

$$\begin{array}{ccc}P\left(\underline{u}\left(\cdot \right)\right)=\underline{F}\left(\varphi ,N,X\right)\\ +\mathop{\sum }\limits_{l=1}^{N-1}{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{l}\left\{{\underline{x}}_{k+1}\in {{\mathcal{I}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right)\\ +{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{N}\left\{{\underline{x}}_{k+1}\in {{\mathcal{F}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right),\end{array}$$
(12)

where \({\underline{x}}_{k}\) is the trajectory of the closed loop system with the control (11),

$$\left\{\begin{array}{l}{\underline{x}}_{k+1}={f}_{k}\left({\underline{x}}_{k},{\underline{u}}_{k},{\xi }_{k}\right)\\ {\underline{x}}_{0}=X,\end{array}\right.\qquad k=\overline{0,N},$$

in which \({\underline{u}}_{k}={\underline{\gamma }}_{k}\left({\underline{x}}_{k}\right)\).

2. The optimal-value function of the probabilistic criterion has the form

$$\begin{array}{ccc}P\left({u}^{* }\left(\cdot \right)\right)=\underline{F}\left(\varphi ,N,X\right)\\ +\mathop{\sum }\limits_{l=1}^{N-1}{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{x}_{k}^{* }\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{l}\left\{{x}_{k+1}^{* }\in {{\mathcal{I}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{x}_{k}^{* }\notin {{\mathcal{I}}}_{k}\right\}\right)\\ +{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{x}_{k}^{* }\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{N}\left\{{x}_{k+1}^{* }\in {{\mathcal{F}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{x}_{k}^{* }\notin {{\mathcal{I}}}_{k}\right\}\right),\end{array}$$
(13)

where \({x}_{k}^{* }\) is the trajectory of the closed loop system with the optimal control,

$$\left\{\begin{array}{l}{x}_{k+1}^{* }={f}_{k}\left({x}_{k}^{* },{u}_{k}^{* },{\xi }_{k}\right)\\ {x}_{0}^{* }=X,\end{array}\right.\qquad k=\overline{0,N},$$

in which \({u}_{k}^{* }={\gamma }_{k}^{* }\left({x}_{k}^{* }\right)\).

3. For any \(\varphi \in {\mathbb{R}}\), \(N\in {\mathbb{N}}\), and \(X\in {{\mathbb{R}}}^{n}\),

$${F}^{{\mathcal{F}}}\left(\varphi ,N,X\right)\leqslant \underline{F}\left(\varphi ,N,X\right)\leqslant P\left(\underline{u}\left(\cdot \right)\right)\leqslant P\left({u}^{* }\left(\cdot \right)\right)\leqslant \overline{F}\left(\varphi ,N,X\right),$$
(14)

where the functions

$${F}^{{\mathcal{F}}}:{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right],\quad \underline{F}:{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right],\quad \overline{F}:{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right]$$

have the form

$${F}^{{\mathcal{F}}}\left(\varphi ,N,X\right)={\Psi }_{0}\left(X\right),\quad \underline{F}\left(\varphi ,N,X\right)={\underline{{\rm{B}}}}_{0}\left(X\right),\quad \overline{F}\left(\varphi ,N,X\right)={\overline{{\rm{B}}}}_{0}\left(X\right).$$

According to Theorem 3, the accuracy of the suboptimal control \(\underline{u}\left(\cdot \right)\) given by (11) can be estimated as

$$P\left({u}^{* }\left(\cdot \right)\right)-P\left(\underline{u}\left(\cdot \right)\right)\leqslant \Delta \left(\varphi ,N,X\right)$$

which holds for all

$$\varphi \in {\mathbb{R}},\quad N\in \left\{0\right\}\cup {\mathbb{N}},\quad X\in {{\mathbb{R}}}^{n},$$

where the function \(\Delta :{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right]\) has the form

$$\begin{array}{ccc}\Delta \left(\varphi ,N,X\right)=\overline{F}\left(\varphi ,N,X\right)-\underline{F}\left(\varphi ,N,X\right)\\ -\mathop{\sum }\limits_{l=1}^{N-1}{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{l}\left\{{\underline{x}}_{k+1}\in {{\mathcal{I}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{l}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right)\\ -{\bf{P}}\left(\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right){\bf{P}}\left(\left.\mathop{\bigcup }\limits_{k = 0}^{N}\left\{{\underline{x}}_{k+1}\in {{\mathcal{F}}}_{k+1}\right\}\right|\mathop{\bigcap}\limits_{k = 1}^{N}\left\{{\underline{x}}_{k}\notin {{\mathcal{I}}}_{k}\right\}\right).\end{array}$$
(15)
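
Each product \({\bf{P}}\left(A\right){\bf{P}}\left(\left.B\right|A\right)\) in (15) equals the joint probability \({\bf{P}}\left(A\cap B\right)\), so the correction sum can be estimated from trajectories simulated under the control (11). The sketch below exploits this observation; simulate_traj, in_I, and in_F are hypothetical callables, while F_bar and F_lower denote precomputed values of \(\overline{F}\left(\varphi ,N,X\right)\) and \(\underline{F}\left(\varphi ,N,X\right)\).

```python
import numpy as np

def estimate_gap(F_bar, F_lower, simulate_traj, in_I, in_F, N,
                 n_mc=50_000, rng=None):
    """Monte Carlo sketch of the bound (15).  Each product P(A)P(B|A)
    is estimated as the joint frequency of A and B over trajectories
    x_0..x_{N+1} drawn under the control (11) by the hypothetical
    callable `simulate_traj(rng)`."""
    rng = rng or np.random.default_rng()
    correction = np.zeros(N + 1)                 # correction[l], l = 1..N
    for _ in range(n_mc):
        x = simulate_traj(rng)
        for l in range(1, N + 1):
            if any(in_I(k, x[k]) for k in range(1, l + 1)):
                break                            # the intersection event fails
            if l < N:                            # union over the isobells I
                hit = any(in_I(k + 1, x[k + 1]) for k in range(l + 1))
            else:                                # last term: union over F
                hit = any(in_F(k + 1, x[k + 1]) for k in range(N + 1))
            correction[l] += hit
    return F_bar - F_lower - correction[1:].sum() / n_mc
```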

Now study the conditions under which problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion.

6 Equivalence to Optimal Control Problem with Probabilistic Terminal Criterion

On the trajectories of the system (1) define the probabilistic terminal criterion

$${P}_{\varphi }\left(u\left(\cdot \right)\right)={\bf{P}}\left({x}_{N+1}\in {{\mathcal{F}}}_{N+1}\right)={\bf{P}}\left({\Phi }_{N+1}\left({x}_{N+1}\right)\leqslant \varphi \right),$$
(16)

and consider the optimal control problem

$${P}_{\varphi }\left(u\left(\cdot \right)\right)\to \mathop{\rm{max}}\limits_{u\left(\cdot \right)\in {\mathcal{U}}}.$$
(17)

As was demonstrated in [12], the solution of problem (17) exists in the class \({\mathcal{U}}\) and is determined by solving the dynamic programming equations

$${\gamma }_{k}^{\varphi }\left(x\right)={\rm{arg}}\,\mathop{\rm{max}}\limits_{u\in {U}_{k}}{{\bf{M}}}_{{\xi }_{k}}\left[{{\rm{B}}}_{k+1}^{\varphi }\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right],$$
(18)
$${{\rm{B}}}_{k}^{\varphi }\left(x\right)=\mathop{\rm{max}}\limits_{u\in {U}_{k}}{{\bf{M}}}_{{\xi }_{k}}\left[{{\rm{B}}}_{k+1}^{\varphi }\left({f}_{k}\left(x,u,{\xi }_{k}\right)\right)\right],\quad k=\overline{0,N},$$
(19)
$${{\rm{B}}}_{N+1}^{\varphi }\left(x\right)={{\bf{I}}}_{{{\mathcal{F}}}_{N+1}}\left(x\right),$$
(20)

where \({{\rm{B}}}_{k}^{\varphi }\left(x\right)\) is the Bellman function in problem (17).

The result below establishes the equivalence of problems (2) and (17).

Lemma. Assume that for all \(k=\overline{0,N}\), \({{\mathcal{F}}}_{k}\subseteq \Delta {{\mathcal{I}}}_{k}\), where

$$\Delta {{\mathcal{I}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:\exists u\in {U}_{k}:{\bf{P}}\left({f}_{k}\left(x,u,{\xi }_{k}\right)\in {{\mathcal{I}}}_{k+1}\right)=1\right\}.$$
(21)

Then problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion (17) in the sense of having the same optimal controls \({u}^{* }\left(\cdot \right)={u}^{\varphi }\left(\cdot \right)\), the same optimal values of the criteria \(P\left({u}^{* }\left(\cdot \right)\right)={P}_{\varphi }\left({u}^{\varphi }\left(\cdot \right)\right)\), and the same Bellman functions \({{\rm{B}}}_{k}\left(x\right)={{\rm{B}}}_{k}^{\varphi }\left(x\right)\) for all \(k=\overline{0,N+1}\) and \(x\in {{\mathbb{R}}}^{n}\).

Note that more specific conditions for the equivalence of problems (2) and (17) can be obtained from the Lemma for narrower classes of systems, but this goes beyond the scope of the paper.

In the next section, the results obtained will be applied to examine additional properties of the optimal investment portfolio management problem with a probabilistic criterion.

7 Optimal Investment Portfolio Management with Non-Fixed Terminal Time

Following [13, 19], consider a discrete-time stochastic control system of the form

$$\left\{\begin{array}{l}{x}_{k+1}={x}_{k}\left(1+b{u}_{k}^{1}+\mathop{\sum }\limits_{j=2}^{m}{u}_{k}^{j}{\xi }_{k}^{j-1}\right)\\ {x}_{0}=X,\end{array}\right.\qquad k=\overline{0,N},$$
(22)

where n = 1, m, and s = m − 1 are the dimensions of the state vector, the control vector, and the random perturbations, respectively; X > 0, b > −1, and φ < 0 are deterministic scalars; \({\xi }_{k}={\left({\xi }_{k}^{1},\ldots ,{\xi }_{k}^{m-1}\right)}^{{\mathtt{T}}}\) are random vectors with independent components such that ξk+1 and ξk are independent for all \(k=\overline{0,N-1}\). Let the control constraints be given by

$${U}_{k}=U=\left\{u\in {{\mathbb{R}}}^{m}:\mathop{\sum }\limits_{j=1}^{m}{u}^{j}=1,\ {u}^{j}\geqslant 0,\ \forall j=\overline{1,m}\right\},\quad k=\overline{0,N}.$$

Assume that the distribution of the random vectors ξk has the support \({\rm{supp}}{\rho }_{\xi }\left(t\right)=\mathop{\bigotimes }\limits_{j = 1}^{m-1}\left[{\underline{\varepsilon }}_{j},{\overline{\varepsilon }}_{j}\right]\), and moreover the inequality \(-1\leqslant {\underline{\varepsilon }}_{j}\leqslant b\leqslant {\overline{\varepsilon }}_{j}\) holds for all \(j=\overline{1,m-1}\).

Consider an optimal control problem with a non-fixed terminal time of the form

$${\bf{P}}\left(\mathop{\rm{min}}\limits_{k = \overline{0,N}}\left\{-{x}_{k+1}\right\}\leqslant \varphi \right)\to \mathop{\rm{sup}}\limits_{u\left(\cdot \right)\in {\mathcal{U}}}$$
(23)

and a problem with a probabilistic terminal criterion of the form

$${\bf{P}}\left(-{x}_{N+1}\leqslant \varphi \right)\to \mathop{\rm{sup}}\limits_{u\left(\cdot \right)\in {\mathcal{U}}}.$$
(24)

In the notations accepted in this paper,

$${{\mathcal{F}}}_{k}={\mathcal{F}}=\left[-\varphi ,+\infty \right),\quad {\Phi }_{k}\left(x\right)=-x,\quad {f}_{k}\left(x,u,\xi \right)=x\left(1+b{u}^{1}+\mathop{\sum }\limits_{j=2}^{m}{u}^{j}{\xi }^{j-1}\right).$$

Let X be the amount of initial capital, xk be the amount of capital at the beginning of the kth year, \({u}_{k}^{1}\) be the share of xk invested in a risk-free asset (for example, a reliable bank) with a rate of return b, and \({u}_{k}^{j}\) be the shares of xk invested in risky assets characterized by rates of return \({\xi }_{k}^{j-1}\), \(j=\overline{2,m}\). In this case, problem (23) is to maximize the probability of reaching the amount of capital (−φ) within at most N + 1 time steps through investing in the assets, whereas problem (24) is to maximize the probability of reaching the amount of capital (−φ) at the terminal time step N + 1.
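
A one-trajectory simulation sketch of the capital dynamics (22) is given below; policy and sample_xi are hypothetical callables, and the policy is assumed to return a point of the simplex U.

```python
import numpy as np

def simulate_portfolio(X, b, policy, sample_xi, N, rng):
    """One trajectory of the capital dynamics (22).  `policy(k, x)` is a
    hypothetical feedback returning a point u of the simplex U (u[0] is
    the risk-free share); `sample_xi(k, rng)` draws the m-1 risky rates."""
    x = np.empty(N + 2)
    x[0] = X
    for k in range(N + 1):
        u = np.asarray(policy(k, x[k]))
        xi = sample_xi(k, rng)
        x[k + 1] = x[k] * (1.0 + b * u[0] + np.dot(u[1:], xi))
    return x
```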

Problem (24) was considered in the two-step statement (N = 1) for the case of one risky asset (m = 2) in [15, 16]. In [13], a whole class of asymptotically optimal controls (as N → ∞) was found. Problem (23) is studied here for the first time.

Using the results of this paper, check the equivalence of problems (23) and (24).

Proposition 1

For all \(k=\overline{0,N},\)

$${{\mathcal{I}}}_{k}=\Delta {{\mathcal{I}}}_{k}=\left[{\varphi }_{k}^{{\mathcal{I}}},+\infty \right),\quad {{\mathcal{B}}}_{k}=\left({\varphi }_{k}^{{\mathcal{O}}},{\varphi }_{k}^{{\mathcal{I}}}\right),\quad {{\mathcal{O}}}_{k}=\left(-\infty ,{\varphi }_{k}^{{\mathcal{O}}}\right],$$

where the scalars \({\varphi }_{k}^{{\mathcal{I}}}\) and \({\varphi }_{k}^{{\mathcal{O}}}\) are given by

$$\begin{array}{l}{\varphi }_{k}^{{\mathcal{I}}}=-\varphi {\left(1+{\rm{max}}\left\{b,\mathop{\rm{max}}\limits_{j = \overline{1,m-1}}{\underline{\varepsilon }}_{j}\right\}\right)}^{k-N-1},\\ {\varphi }_{k}^{{\mathcal{O}}}=-\varphi {\left(1+{\rm{max}}\left\{b,\mathop{\rm{max}}\limits_{j = \overline{1,m-1}}{\overline{\varepsilon }}_{j}\right\}\right)}^{k-N-1}.\end{array}$$

Due to Proposition 1, the condition of the Lemma holds, \({\mathcal{F}}\subseteq \Delta {{\mathcal{I}}}_{k}\), and consequently problems (23) and (24) are equivalent. From Proposition 1 it also follows that the optimal control in terms of the probabilistic terminal criterion maximizes the probability of first reaching the amount (−φ) by the capital xk within at most N + 1 time steps.
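
The thresholds of Proposition 1 and the Lemma condition admit a direct numerical sanity check for given parameter values (a check, not a proof; it passes whenever the guaranteed rate \({\rm{max}}\left\{b,\mathop{\rm{max}}\limits_{j}{\underline{\varepsilon }}_{j}\right\}\) is nonnegative):

```python
def isobell_thresholds(phi, b, eps_lo, eps_hi, N):
    """Thresholds of Proposition 1 for k = 0..N and a numerical check of
    the Lemma condition F <= Delta I_k (here: phi_I[k] <= -phi for all k).
    The check passes whenever the guaranteed rate r_lo is nonnegative."""
    r_lo = max(b, max(eps_lo))            # guaranteed per-step rate of return
    r_hi = max(b, max(eps_hi))            # best-case per-step rate of return
    phi_I = [-phi * (1.0 + r_lo) ** (k - N - 1) for k in range(N + 1)]
    phi_O = [-phi * (1.0 + r_hi) ** (k - N - 1) for k in range(N + 1)]
    lemma_ok = all(p <= -phi for p in phi_I)
    return phi_I, phi_O, lemma_ok
```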

Now apply the two-sided estimates of the optimal-value function of the probabilistic criterion to estimate a time step \({N}^{* }\in {\mathbb{N}}\) such that

$${\bf{P}}\left(\mathop{\rm{max}}\limits_{k\in \left\{0,\ldots ,{N}^{* }\right\}}{x}_{k}^{* }\geqslant -\varphi \right)=1,$$

where \({\left\{{x}_{k}^{* }\right\}}_{k = 0}^{N}\) are the trajectories of the closed loop system (22) with the optimal control \({u}^{* }\left(\cdot \right)\). Interestingly, the two-sided estimates of the optimal-value function of the probabilistic criterion (see Theorem 3) can be used to obtain such an estimate without finding the optimal control.

Proposition 2

Let \({\left\{{\underline{x}}_{k}\right\}}_{k = 0}^{N}\) be the trajectories of the closed loop system (22) with the control (11), where

$${\underline{\gamma }}_{k}\left(x\right)={\rm{arg}}\,\mathop{\rm{max}}\limits_{u\in U}{\bf{P}}\left(x\left(1+b{u}^{1}+\mathop{\sum }\limits_{j=2}^{m}{u}^{j}{\xi }_{k}^{j-1}\right)\geqslant {\varphi }_{k+1}^{{\mathcal{I}}}\right),\quad k=\overline{0,N}.$$
(25)

Then there exists a time step \(\underline{N}\in {\mathbb{N}}\) such that

$${\bf{P}}\left(\mathop{\rm{max}}\limits_{k\in \left\{0,\ldots ,\underline{N}\right\}}{\underline{x}}_{k+1}\geqslant -\varphi \right)=1,$$

and moreover for any X > 0 and b > − 1, \({N}^{* }\geqslant \underline{N}\) and

$$\underline{N}=\left[\frac{\mathrm{ln}\,\left(-\varphi \right)-\mathrm{ln}\,\left(X\right)}{\mathrm{ln}\,\left(1+{\rm{max}}\left\{b,\mathop{\rm{max}}\limits_{j = \overline{1,m-1}}{\underline{\varepsilon }}_{j}\right\}\right)}-1\right].$$
(26)
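
A sketch of evaluating (26), reading \(\left[\cdot \right]\) as the integer part (an assumption about the source notation):

```python
import math

def lower_step_bound(phi, X, b, eps_lo):
    """Evaluation of (26); [.] is read here as the integer part, which is
    an assumption about the source notation."""
    rate = max(b, max(eps_lo))            # guaranteed per-step rate of return
    return math.floor((math.log(-phi) - math.log(X))
                      / math.log(1.0 + rate) - 1.0)
```

For instance, with X = 1, φ = −2, and a guaranteed rate of 0.07, the expression gives ⌊ln 2/ln 1.07 − 1⌋ = ⌊9.24⌋ = 9 (illustrative values only, not those of Table 1).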

Perform a series of three numerical experiments to check the adequacy of the estimate (26). Consider the case of one risky asset, for which the control (25) was found in [13]:

$${\underline{\gamma }}_{k}\left(x\right)=\left\{\begin{array}{ll}{\left(1,0\right)}^{{\mathtt{T}}},&x\geqslant -\varphi {\left(1+b\right)}^{k-N-1}\\ {\left(0,1\right)}^{{\mathtt{T}}},&x<-\varphi {\left(1+b\right)}^{k-N-1}.\end{array}\right.$$
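
A Monte Carlo sketch of this experiment for the one-risky-asset case is given below; sample_xi1 is a hypothetical sampler of the risky rate of return \({\xi }_{k}^{1}\).

```python
import numpy as np

def estimate_N_star(phi, X, b, sample_xi1, N, n_mc=50_000, rng=None):
    """Monte Carlo sketch of the experiment: run (22) with one risky asset
    under the threshold policy above and record the first step at which
    each sampled trajectory reaches the target capital -phi."""
    rng = rng or np.random.default_rng()
    worst_hit = 0
    for _ in range(n_mc):
        x, k = float(X), 0
        while x < -phi and k <= N:
            if x >= -phi * (1.0 + b) ** (k - N - 1):
                x *= 1.0 + b                     # policy (1,0): risk-free asset
            else:
                x *= 1.0 + sample_xi1(k, rng)    # policy (0,1): risky asset
            k += 1
        if x < -phi:
            return None    # some trajectory missed the target within the horizon
        worst_hit = max(worst_hit, k)
    return worst_hit       # empirical N*: all trajectories reached -phi by here
```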

The values of the system parameters are given in Table 1.

Table 1 Values of system parameters

To simulate the value N*, the Monte Carlo method with 50 000 observations was used. The simulation results are presented in Table 2.

Table 2 Simulation results

According to Table 2, \(\underline{N}\) is a rather accurate estimate of the value N*.

8 Conclusions

This paper has considered the optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube of trajectories as the optimality criterion. Dynamic programming-based sufficient conditions of optimality have been established. Two-sided estimates of the right-hand side of the dynamic programming equation, two-sided estimates of the Bellman function, and two-sided estimates of the optimal-value function of the probabilistic criterion have been found. A suboptimal control has been designed in analytical form, and its accuracy has been estimated. The conditions under which this problem is equivalent to the optimal control problem with a probabilistic terminal criterion have been proved. These conditions have been tested on an optimal investment portfolio management problem.

9 Funding

This work was supported by the Russian Foundation for Basic Research, project no. 18-08-00595.