Abstract
This paper considers an optimal control problem for a discrete-time stochastic system with the probability of first reaching the boundaries of a given domain as the optimality criterion. Dynamic programming-based sufficient conditions of optimality are formulated and proved. The isobells of levels 1 and 0 of the Bellman function are used for obtaining two-sided estimates of the right-hand side of the dynamic programming equation, two-sided estimates of the Bellman function, and two-sided estimates of the optimal-value function of the probabilistic criterion. A suboptimal control design method is proposed. The conditions of equivalence to an optimal control problem with a probabilistic terminal criterion are established. An illustrative example is given.
1 Introduction
One of the most important fields of research in stochastic optimal control concerns problems with a non-fixed terminal time. Among them, note the stochastic time-optimal control problem [1, 2], the stochastic optimal control problem with infinite horizon [3–6], the problem of optimizing the duration of stay of a stochastic system in a given tube of trajectories [1, 7], and the stochastic problem of optimizing the time of first reaching the boundaries of a given domain [1, 8, 9]. Control models with a non-fixed terminal time are widely used in aerospace [10], economic [11], biological, robotic, and energy [1] applications. In the case of continuous-time systems, methods based on dynamic programming sufficient conditions of optimality have become widespread, since they yield optimal feedback control laws. Interestingly, it is precisely the use of the probabilistic criterion [1, 7] that leads to a constructive statement of control problems with a non-fixed terminal time for which the Bellman equation can be written. Nevertheless, a number of fundamental problems remain in the numerical calculation of optimal control, and analytical solutions have been obtained only for some model problems [1]. This circumstance is due to the following difficulties. First, the solution of the Bellman equation may be nonunique. Second, even if a solution of the Bellman equation exists in the class of smooth functions, it may be inadmissible (for example, because the Itô equation of the closed-loop stochastic system with this control has no strong solution). Third, the class of admissible controls does not always contain a control for which the supremum (or infimum) of the criterion is attained. Fourth, the Bellman equation suffers from the curse of dimensionality.
In the case of discrete time, a qualitative theory of such problems was presented in [3]. Also, solutions of particular economic problems [11] and model examples [9] are known. In [9], the problem of optimizing the probability of first reaching the neighborhood of the origin by the trajectories of a linear stochastic system was considered in the Brunovsky canonical form of controllability. Its analytical solution was found through the reduction to the problem with a probabilistic terminal criterion and further use of the dynamic programming method in the form [12].
This paper studies an optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube as the optimality criterion. Sufficient conditions of optimality similar to [12] are established, and the properties of the two-sided estimates of the Bellman function [13, 14] are examined. The conditions of equivalence to the optimal control problem with the probabilistic terminal criterion [12] are obtained. As an illustrative example, the problem of investment portfolio management is considered.
2 Problem Statement
Consider a discrete-time stochastic controlled system of the form
\[{x}_{k+1}={f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right),\quad k=\overline{0,N},\qquad {x}_{0}=X,\qquad \left(1\right)\]
with the following notations: \({x}_{k}\in {{\mathbb{R}}}^{n}\) as the state vector; \({u}_{k}\in {U}_{k}\subset {{\mathbb{R}}}^{m}\) as the control vector; Uk as the set of control constraints; ξk as the vector of random perturbations with the codomain \({{\mathbb{R}}}^{s}\) and a given distribution \({{\bf{P}}}_{{\xi }_{k}};\)\({f}_{k}:{{\mathbb{R}}}^{n}\times {{\mathbb{R}}}^{m}\times {{\mathbb{R}}}^{s}\to {{\mathbb{R}}}^{n}\) as the transition function (system function); finally, \(N\in \left\{0\right\}\cup {\mathbb{N}}\) as the control horizon.
Introduce several assumptions regarding the system (1) as follows.
1. The complete information on the state vector xk is known. (Due to this fact, an appropriate control can be constructed in the class of functions \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}\left(\cdot \right)\) is some measurable function.) In this case, it is said that “the control is found in the class of full state feedback controls.”
2. The initial state x0 = X is a deterministic vector from \({{\mathbb{R}}}^{n}\).
3. The system function \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) is continuous for all k.
4. The control vector uk is generated in the form \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\), where \({\gamma }_{k}:{{\mathbb{R}}}^{n}\to {{\mathbb{R}}}^{m}\) is a measurable function with values uk ∈ Uk, Uk being a compact set.
5. The state vector xk+1 is generated in the following way: at step k, the vector xk is realized; then the control vector \({u}_{k}={\gamma }_{k}\left({x}_{k}\right)\) is formed; finally, the random perturbation ξk is realized.
6. The control is the set of functions \(u\left(\cdot \right)=\left({\gamma }_{0}\left(\cdot \right),\ldots ,{\gamma }_{N}\left(\cdot \right)\right)\in {\mathcal{U}}\); the class of admissible controls is the set \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N},\) where \({{\mathcal{U}}}_{k}\) denotes the set of Borel functions \({\gamma }_{k}\left(\cdot \right)\) with values in Uk.
7. The random vector ξk is continuous, has the codomain \({{\mathbb{R}}}^{s}\) and a known distribution \({{\bf{P}}}_{{\xi }_{k}}\); moreover, the components of the vector \(\zeta =\left(X,{\xi }_{0},\ldots ,{\xi }_{N}\right)\) are independent.
Note that the system (1) is Markovian, i.e., its future behavior does not depend on the past states (history) and is completely defined by the current state.
On the trajectories of the system (1) define the probabilistic functional
in which the sets \({{\mathcal{F}}}_{k}\) have the form \({{\mathcal{F}}}_{k}=\left\{x\in {{\mathbb{R}}}^{n}:{\Phi }_{k}\left(x\right)\leqslant \varphi \right\},\)
where \(\varphi \in {\mathbb{R}}\) is a given scalar; \({\Phi }_{k}:{{\mathbb{R}}}^{n}\to {\mathbb{R}}\) are continuous functions, k = 1, …, N + 1, and \({\Phi }_{N+1}\left(x\right)\) is bounded below.
Consider the optimization problem
where \({\mathcal{U}}={{\mathcal{U}}}_{0}\times \ldots \times {{\mathcal{U}}}_{N}\).
Physically, problem (2) is to find a control maximizing the probability of first reaching the tube of trajectories described by the sequence of given sets \({\left\{{{\mathcal{F}}}_{k+1}\right\}}_{k = 0}^{N}\).
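Before turning to dynamic programming, note that for any fixed admissible feedback law the criterion in problem (2) can be estimated by straightforward simulation of the first-reach event. Below is a minimal sketch; the scalar dynamics, the sets \({{\mathcal{F}}}_{k}\), and the feedback law are invented stand-ins for (1), not the paper's system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: scalar dynamics x_{k+1} = x_k + u_k + xi_k,
# target tube F_k = {x : |x| <= phi_level}. All names here are illustrative.
N = 10                # control horizon
phi_level = 0.5       # half-width of the target set
x0 = 3.0              # deterministic initial state

def gamma(k, x):
    # simple admissible feedback: push the state toward the origin, |u| <= 1
    return float(np.clip(-x, -1.0, 1.0))

def first_reach_indicator(n_steps):
    """1 if the trajectory enters some F_{k+1}, k = 0..N, else 0."""
    x = x0
    for k in range(n_steps + 1):
        u = gamma(k, x)
        x = x + u + rng.normal(scale=0.3)
        if abs(x) <= phi_level:       # first reach of the tube: stop
            return 1
    return 0

M = 20_000
p_hat = sum(first_reach_indicator(N) for _ in range(M)) / M
print(f"estimated first-reach probability: {p_hat:.3f}")
```

Such a Monte Carlo estimate evaluates one fixed control; the dynamic programming machinery of the next section is what allows the probability to be maximized over the whole class \({\mathcal{U}}\).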
Note that the dynamic programming method in the form [12], introduced for optimal control problems with the probabilistic terminal criterion, is not generally applicable to problem (2). The dynamic programming-based sufficient conditions of optimality, similar to [12], will be established in Section 3.
3 Dynamic Programming
Define the Bellman function \({{\rm{B}}}_{k}:{{\mathbb{R}}}^{n}\to \left[0,1\right]\) in problem (2) as
In view of the assumptions accepted in Section 2, formulate a theorem on the Bellman equation for problem (2) in the n-dimensional state space.
Theorem 1
Let the following conditions be satisfied:
1) The functions \({f}_{k}\left({x}_{k},{u}_{k},{\xi }_{k}\right)\) are continuous for all \(k=\overline{0,N}\).
2) The functions \({\Phi }_{k}\left({x}_{k}\right)\) are continuous for all \(k=\overline{1,N+1}\).
3) The function \({\Phi }_{N+1}\left({x}_{N+1}\right)\) is bounded below.
4) The random vectors X, ξ0, …, ξN are independent.
5) The sets U0, …, UN are compact.
Then the optimal control in problem (2) exists in the class of measurable functions \({u}^{* }\left(\cdot \right)\in {\mathcal{U}}\) and is determined by solving the following problems:
The proofs of Theorem 1 and of all other theorems, propositions, and the lemma below are given in the Appendix.
In Theorem 1, \({{\bf{M}}}_{{\xi }_{k}}\left[\cdot \right]\) denotes the expectation operator with respect to the distribution \({{\bf{P}}}_{{\xi }_{k}}\) of the random vector ξk, and \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) is the indicator function of the set \({{\mathcal{F}}}_{k}\). Note that Eqs. (3)–(5) differ from the classical Bellman equation [12] for the problem with the probabilistic terminal criterion by the additional term \({{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) and by the factor \(1-{{\bf{I}}}_{{{\mathcal{F}}}_{k}}\left(x\right)\) multiplying the expectation in the right-hand side.
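This structure — an indicator term plus a masked expectation, \({\rm{B}}_{k}(x)={\bf{I}}_{{\mathcal{F}}_{k}}(x)+(1-{\bf{I}}_{{\mathcal{F}}_{k}}(x))\,\sup_{u}{\bf{M}}_{\xi_{k}}[{\rm{B}}_{k+1}(f_{k}(x,u,\xi_{k}))]\) — can be run numerically on a grid. The sketch below uses an invented scalar system and invented sets \({{\mathcal{F}}}_{k}\); only the shape of the recursion reflects the theorem:

```python
import numpy as np

# Toy scalar stand-in for (1): x_{k+1} = x + u + xi, xi ~ N(0, 0.3^2),
# U_k = [-1, 1], F_k = {x : |x| <= 0.5}. All of this is invented for the demo.
N = 5
grid = np.linspace(-4.0, 4.0, 161)                  # state-space grid
controls = np.linspace(-1.0, 1.0, 21)               # discretized U_k
xi_samples = np.random.default_rng(1).normal(scale=0.3, size=400)

def in_F(x):
    return (np.abs(x) <= 0.5).astype(float)         # indicator of F_k

B = in_F(grid)                     # terminal condition: B_{N+1} = I_{F_{N+1}}
for k in range(N, -1, -1):
    B_next = B
    sup_term = np.empty_like(grid)
    for i, x in enumerate(grid):
        # sup over u of M_xi[ B_{k+1}(f_k(x, u, xi)) ], expectation by sample mean
        sup_term[i] = max(np.interp(x + u + xi_samples, grid, B_next).mean()
                          for u in controls)
    # recursion with the indicator term and the masking factor
    B = in_F(grid) + (1.0 - in_F(grid)) * sup_term

print("B_0(0) =", B[np.argmin(np.abs(grid))])       # inside F_0, so equals 1
```

Note how the mask makes every point of \({{\mathcal{F}}}_{k}\) an absorbing level-1 state, which is exactly the first-reach semantics; the grid and sampling here are the crude discretization whose cost the next paragraph discusses.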
As is known, the direct integration of the Bellman equation causes the difficulties of calculating multiple integrals and solving stochastic programming problems in its right-hand side. These difficulties have repeatedly occurred even for systems of the simplest type, e.g., for the investment portfolio management system [15–17] and for the stationary satellite control system [12]. As a result, for almost twenty years, optimal control problems with a probabilistic criterion have been considered in the one-step (N = 0) and two-step (N = 1) statements; see [15–17]. In [13, 14], two-sided estimates of the Bellman function were found using the isobells of levels 1 and 0, which allowed solving particular optimal control problems with a probabilistic criterion for an arbitrary time step N.
By analogy with [13, 14], investigate the properties of the Bellman function for problem (2) using the isobells of levels 1 and 0 of the Bellman function.
4 Two-Sided Estimates of Bellman Function
Introduce the isobells of levels 1 and 0 of the Bellman function, defined as
and the set \({{\mathcal{B}}}_{k}={{\mathbb{R}}}^{n}\setminus \left({{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\right)\). For the sake of convenience, also introduce the notation \({\overline{{\mathcal{F}}}}_{k}={{\mathbb{R}}}^{n}\setminus {{\mathcal{F}}}_{k}\). Clearly, by the definition of these sets,
Theorem 2
The following statements hold:
1. The sets \({{\mathcal{I}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time
2. The sets \({{\mathcal{O}}}_{k}\), \(k=\overline{0,N}\), satisfy the recurrent relations in backward time
3. For \(x\in {{\mathcal{I}}}_{k}\), the function \({\gamma }_{k}^{* }(x)\) takes any value from the set
4. For \(x\in {{\mathcal{O}}}_{k}\), the function \({\gamma }_{k}^{* }\left(x\right)\) takes any value from the set Uk.
5. For all \(x\in {{\mathcal{B}}}_{k}\), the Bellman equation admits the representation
6. For \(x\in {{\mathcal{B}}}_{k}\) and u ∈ Uk,
7. For \(x\in {{\mathcal{B}}}_{k}\), the Bellman equation satisfies the two-sided inequality
where
\({\underline{{\rm{B}}}}_{k}\left(x\right)\) and \({\overline{{\rm{B}}}}_{k}\left(x\right)\) are lower and upper bounds on the Bellman function, i.e.,
moreover \({\underline{{\rm{B}}}}_{N}\left(x\right)={{\rm{B}}}_{N}\left(x\right)={\overline{{\rm{B}}}}_{N}\left(x\right)\).
The right-hand side of the relations in item 1 of Theorem 2 differs from the relations for the isobell of level 1 of the Bellman function in the problem with the probabilistic terminal criterion [18] by the union operation with the set \({{\mathcal{F}}}_{k}\). For the isobell of level 0 of the Bellman function, the difference lies in the intersection operation with the set \({\overline{{\mathcal{F}}}}_{k}\). Items 3 and 4 establish expressions for determining the optimal control for \({x}_{k}\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\) that are simpler than (3) and coincide, up to the structures of the sets \({{\mathcal{I}}}_{k},\) with those in the problem with the probabilistic terminal criterion. Items 6 and 7 of Theorem 2 establish two-sided estimates for the right-hand side of the dynamic programming equation and for the Bellman function, respectively. Moreover, the expressions for the lower and upper bounds, up to the structures of the sets \({{\mathcal{I}}}_{k}\) and \({{\mathcal{O}}}_{k}\), coincide with those in the problem with the terminal criterion [13, 18]. The difference is the presence of an additional inequality in the left-hand side of (8) and, as a consequence, the inequality \({\Psi }_{k}\left(x\right)\leqslant {\underline{{\rm{B}}}}_{k}\left(x\right)\).
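Items 1 and 2 can be illustrated on a finite-state toy model where the set recursions are computed exactly. Everything below (state space, dynamics, sets \({{\mathcal{F}}}_{k}\)) is invented for the illustration; only the union/intersection structure described above is taken from the theorem:

```python
# Finite-state toy model (invented): states 0..6, dynamics
# x_{k+1} = clip(x + u + xi, 0, 6), u in {-1, 0}, xi in {0, 1} equiprobable,
# and target sets F_k = {0} for every k.
STATES = set(range(7))
U = (-1, 0)
XI = (0, 1)
F = {0}
N = 4

def next_states(x, u):
    """Support of x_{k+1} given (x, u)."""
    return {min(max(x + u + xi, 0), 6) for xi in XI}

I = set(F)              # isobell of level 1 at the terminal step
O = STATES - I          # isobell of level 0 at the terminal step
for k in range(N, -1, -1):
    # union with F_k of the states steered into I_{k+1} with probability 1
    I = F | {x for x in STATES if any(next_states(x, u) <= I for u in U)}
    # complement of F_k intersected with states that every u keeps in O_{k+1} a.s.
    O = (STATES - F) & {x for x in STATES if all(next_states(x, u) <= O for u in U)}

print("I_0 =", sorted(I), " O_0 =", sorted(O))
```

In this toy model the noise prevents any state outside \({{\mathcal{F}}}\) from reaching it with probability 1, so \({{\mathcal{I}}}_{0}\) collapses to \({{\mathcal{F}}}\) itself, while \({{\mathcal{O}}}_{0}\) shrinks backward from the far boundary; everything in between is the region \({{\mathcal{B}}}_{0}\) where the full Bellman recursion is needed.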
Next, explore in detail the properties of the control maximizing at each step the lower bound on the right-hand side of the dynamic programming equation.
5 Suboptimal Control
Consider a control \(\underline{u}\left(\cdot \right)=\left({\underline{\gamma }}_{0}\left(\cdot \right),\ldots ,{\underline{\gamma }}_{N}\left(\cdot \right)\right)\), where
This control has the following properties:
-
For \(x\in {{\mathcal{I}}}_{k}\cup {{\mathcal{O}}}_{k}\) and for all \(k=\overline{0,N}\), \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
-
For k = N, \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
-
If \({{\mathcal{F}}}_{k}={{\mathcal{I}}}_{k}\) holds for all \(k=\overline{0,N}\), then \({\underline{\gamma }}_{k}\left(x\right)={\gamma }_{k}^{* }\left(x\right)\).
Theorem 3
Assume that the control \(\underline{u}\left(\cdot \right)\) exists in the class \({\mathcal{U}}\). Then the following statements hold:
1. The value of the probabilistic criterion under the control \(\underline{u}\left(\cdot \right)\) is given by
where \({\underline{x}}_{k}\) is the trajectory of the closed loop system with the control (11),
in which \({\underline{u}}_{k}={\underline{\gamma }}_{k}\left({\underline{x}}_{k}\right)\).
2. The optimal-value function of the probabilistic criterion has the form
where \({x}_{k}^{* }\) is the trajectory of the closed loop system with the optimal control,
in which \({u}_{k}^{* }={\gamma }_{k}^{* }\left({x}_{k}^{* }\right)\).
3. For any \(\varphi \in {\mathbb{R}}\), \(N\in {\mathbb{N}}\), and \(X\in {{\mathbb{R}}}^{n}\),
where the functions
have the form
According to Theorem 3, for the suboptimal control \(\underline{u}\left(\cdot \right)\) defined by (11), the accuracy can be estimated as
which holds for all
where the function \(\Delta :{\mathbb{R}}\times {\mathbb{N}}\times {{\mathbb{R}}}^{n}\to \left[0,1\right]\) has the form
Now study the conditions under which problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion.
6 Equivalence to Optimal Control Problem with Probabilistic Terminal Criterion
On the trajectories of the system (1) define the probabilistic terminal criterion
and consider the optimal control problem
As was demonstrated in [12], the solution of problem (17) exists in the class \({\mathcal{U}}\) and is determined by solving the dynamic programming equations
where \({{\rm{B}}}_{k}^{\varphi }\left(x\right)\) is the Bellman function in problem (17).
The result below establishes the equivalence of problems (2) and (17).
Lemma. Assume that for all \(k=\overline{0,N}\), \({{\mathcal{F}}}_{k}\subseteq \Delta {{\mathcal{I}}}_{k}\), where
Then problem (2) is equivalent to the optimal control problem with the probabilistic terminal criterion (17) in the sense of the same optimal controls \({u}^{* }\left(\cdot \right)={u}^{\varphi }\left(\cdot \right)\), the same optimal values of the criteria \(P\left({u}^{* }\left(\cdot \right)\right)={P}_{\varphi }\left({u}^{\varphi }\left(\cdot \right)\right)\), and the same Bellman functions \({{\rm{B}}}_{k}\left(x\right)={{\rm{B}}}_{k}^{\varphi }\left(x\right)\) for all \(k=\overline{0,N+1}\) and \(x\in {{\mathbb{R}}}^{n}\).
Note that more specific conditions for the equivalence of problems (2) and (17) can be obtained from Lemma for a narrow class of systems, but this goes beyond the scope of the paper.
In the next section, the results obtained will be applied to examine additional properties of the optimal investment portfolio management problem with a probabilistic criterion.
7 Optimal Investment Portfolio Management with Non-Fixed Terminal Time
Following [13, 19], consider a discrete-time stochastic control system of the form
where n = 1, m, and s = m − 1 are the dimensions of the state vector, the control vector, and the random perturbations, respectively; X > 0, b > −1, and φ < 0 are deterministic scalars; \({\xi }_{k}={\left({\xi }_{k}^{1},\ldots ,{\xi }_{k}^{m-1}\right)}^{{\mathtt{T}}}\) are random vectors with independent components, and ξk+1 and ξk are independent for all \(k=\overline{0,N-1}\). Let the control constraints be given by
Assume that the distribution of the random vectors ξk has the support \({\rm{supp}}{\rho }_{\xi }\left(t\right)=\mathop{\bigotimes }\limits_{j = 1}^{m-1}\left[{\underline{\varepsilon }}_{j},{\overline{\varepsilon }}_{j}\right]\), and moreover the inequality \(-1\leqslant {\underline{\varepsilon }}_{j}\leqslant b\leqslant {\overline{\varepsilon }}_{j}\) holds for all \(j=\overline{1,m-1}\).
Consider an optimal control problem with a non-fixed terminal time of the form
and a problem with a probabilistic terminal criterion of the form
In the notations accepted in this paper,
Let X be the amount of initial capital, xk be the amount of capital at the beginning of the kth year, \({u}_{k}^{1}\) be the share of xk invested in a risk-free asset (for example, a reliable bank) with a rate of return b, and \({u}_{k}^{j}\) be the shares of xk invested in risky assets characterized by rates of return \({\xi }_{k}^{j-1}\), \(j=\overline{2,m}\). In this case, problem (23) is to maximize the probability that the capital first reaches the amount (−φ) within a number of time steps bounded above by N + 1 through investing in these assets. Problem (24) is to maximize the probability that the capital reaches the amount (−φ) at the terminal step N + 1.
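The first-reach probability in problem (23) is easy to estimate by simulation for a fixed portfolio strategy. The sketch below assumes a standard multiplicative capital model in place of Eq. (22), whose exact form is not reproduced here; all parameter values and the fixed shares are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed stand-in for the portfolio dynamics (22): one risk-free and
# one risky asset, multiplicative capital update. Invented parameters.
X0 = 1.0            # initial capital X
b = 0.05            # risk-free rate of return
phi = -2.0          # goal: first reach the capital level -phi = 2.0
N = 25              # horizon bound
u1, u2 = 0.5, 0.5   # fixed portfolio shares, u1 + u2 = 1

def reaches_goal():
    x = X0
    for _ in range(N + 1):
        xi = rng.uniform(-0.2, 0.4)                 # risky rate of return
        x *= u1 * (1.0 + b) + u2 * (1.0 + xi)       # capital update
        if x >= -phi:                               # first reach of the target
            return True
    return False

M = 10_000
p_hat = sum(reaches_goal() for _ in range(M)) / M
print(f"estimated first-reach probability: {p_hat:.3f}")
```

The stopping test `x >= -phi` is what distinguishes problem (23) from problem (24): for the terminal criterion one would instead check the capital only at the final step N + 1.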
Problem (24) was considered in the two-step statement (N = 1) for the case of one risky asset (m = 2) in [15, 16]. In [13], a whole class of asymptotically optimal controls (as N → ∞) was found. Problem (23) is studied for the first time.
Using the results of this paper, check the equivalence of problems (23) and (24).
Proposition 1
For all \(k=\overline{0,N},\)
where the scalars \({\varphi }_{k}^{{\mathcal{I}}}\) and \({\varphi }_{k}^{{\mathcal{O}}}\) are given by
Due to Proposition 1, the condition of the Lemma holds, \({{\mathcal{F}}}_{k}\subseteq \Delta {{\mathcal{I}}}_{k}\), and consequently problems (23) and (24) are equivalent. From Proposition 1 it follows that the optimal control in terms of the probabilistic terminal criterion also maximizes the probability of first reaching the amount (−φ) by the capital xk within a number of time steps bounded above by N.
Now apply the two-sided estimates of the optimal-value function of the probabilistic criterion to estimate a time step \({N}^{* }\in {\mathbb{N}}\) such that
where \({\left\{{x}_{k}^{* }\right\}}_{k = 0}^{N}\) are the trajectories of the closed loop system (22) with the optimal control \({u}^{* }\left(\cdot \right)\). Interestingly, the two-sided estimates of the optimal-value function of the probabilistic criterion (see Theorem 3) can be used to obtain such an estimate without finding the optimal control.
Proposition 2
Let \({\left\{{\underline{x}}_{k}\right\}}_{k = 0}^{N}\) be the trajectories of the closed loop system (22) with the control (11), where
Then there exists a time step \(\underline{N}\in {\mathbb{N}}\) such that
and moreover for any X > 0 and b > − 1, \({N}^{* }\geqslant \underline{N}\) and
Perform a series of three numerical experiments to check the adequacy of the estimate (26). Consider the case of one risky asset, for which the control (25) was found in [13]:
The values of the system parameters are given in Table 1.
To estimate the value N*, the Monte Carlo method with 50 000 samples was used. The simulation results are presented in Table 2.
According to Table 2, \(\underline{N}\) is a rather accurate estimate of the value N*.
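An experiment of this kind can be reproduced in outline as follows. Since Eq. (22) and the Table 1 parameters are not reproduced here, the multiplicative capital model and all numbers below are stand-ins, and the fixed shares merely play the role of the control (25):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in parameters (Table 1 is not reproduced here, so these are invented);
# the fixed shares u1, u2 play the role of the control (25).
X0, b, phi = 1.0, 0.05, -2.0
u1, u2 = 0.5, 0.5
MAX_STEPS = 200

def first_reach_step():
    """First step at which capital reaches -phi (inf if not within MAX_STEPS)."""
    x = X0
    for k in range(1, MAX_STEPS + 1):
        xi = rng.uniform(-0.2, 0.4)                 # risky rate of return
        x *= u1 * (1.0 + b) + u2 * (1.0 + xi)       # capital update
        if x >= -phi:
            return k
    return float("inf")

M = 50_000   # sample size matching the paper's experiment
times = np.array([first_reach_step() for _ in range(M)])
print("median first-reach step:", np.median(times))
```

Comparing empirical statistics of the first-reach step with the analytical lower bound \(\underline{N}\) is exactly the adequacy check reported in Table 2.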
8 Conclusions
This paper has considered the optimal control problem for a discrete-time stochastic system with the probability of first reaching a given tube of trajectories as the optimality criterion. Dynamic programming-based sufficient conditions of optimality have been established. Two-sided estimates of the right-hand side of the dynamic programming equation, two-sided estimates of the Bellman function, and two-sided estimates of the optimal-value function of the probabilistic criterion have been found. A suboptimal control has been designed in analytical form, and its accuracy has been estimated. The conditions under which this problem is equivalent to the optimal control problem with a probabilistic terminal criterion have been proved. These conditions were tested on an optimal management problem for an investment portfolio.
9 Funding
This work was supported by the Russian Foundation for Basic Research, project no. 18-08-00595.
References
Afanas’ev, V.N., Kolmanovskii, V.B., and Nosov, V.R., Matematicheskaya teoriya konstruirovaniya sistem upravleniya, 3rd ed., Moscow: Vysshaya Shkola, 2003. Translated under the title Mathematical Theory of Control Systems Design, Kluwer, 1996.
Smirnov, I. P. On a Time-Optimal Control Problem for a Stochastic Controlled System. Differ. Uravn. 22(no. 2), 247–254 (1986).
Bertsekas, D. P. & Shreve, S. E. Stochastic Optimal Control: The Discrete-Time Case. (Academic, New York, 1978).
Zhou, J. Infinite Horizon Optimal Control Problem for Stochastic Evolution Equations in Hilbert Spaces. J. Dyn. Control Syst. 22(no. 3), 531–554 (2016).
Agram, N., Haadem, S., Øksendal, B. & Proske, F. A Maximum Principle for Infinite Horizon Delay Equations. SIAM J. Math. Anal. 45(no. 4), 2499–2522 (2013).
Agram, N. & Øksendal, B. Infinite Horizon Optimal Control of Forward-Backward Stochastic Differential Equations with Delay. J. Comput. Appl. Math. 259(part B), 336–349 (2014).
Chernous’ko, F. L. & Kolmanovskii, V. B. Optimal’noe upravlenie pri sluchainykh vozmushcheniyakh (Optimal Control under Random Perturbations). (Nauka, Moscow, 1978).
Smirnov, I. P. Control of the Probability of Reaching a Given Domain by a System. Differ. Uravn. 26(no. 10), 1753–1758 (1990).
Azanov, V. M. Optimal Control for Linear Discrete Systems with Respect to Probabilistic Criteria. Autom. Remote Control 75(no. 10), 1743–1753 (2014).
Semakov, S. L. Vybrosy sluchainykh protsessov: prilozheniya v aviatsii. (Nauka, Moscow, 2005). Translated under the title Crossings Problems in Random Processes Theory and Their Applications in Aviation, Newcastle: Cambridge Scholars Publishing, 2019.
Khametov, V. M. & Shelemekh, E. A. Superhedging of American Options on an Incomplete Market with Discrete Time and Finite Horizon. Autom. Remote Control 76(no. 9), 1616–1634 (2015).
Malyshev, V. V. & Kibzun, A. I. Analiz i sintez vysokotochnogo upravleniya letatel’nymi apparatami (Analysis and Design of High-Precision Control of Aircrafts). (Mashinostroenie, Moscow, 1987).
Azanov, V. M. & Kan, Yu. S. Bilateral Estimation of the Bellman Function in the Problems of Optimal Stochastic Control of Discrete Systems by the Probabilistic Performance Criterion. Autom. Remote Control 79(no. 2), 203–215 (2018).
Azanov, V. M. & Kan, Yu. S. Refined Estimation of the Bellman Function for Stochastic Optimal Control Problems with Probabilistic Performance Criterion. Autom. Remote Control 80(no. 4), 634–647 (2019).
Grigor’ev, P. V. & Kan, Yu. S. Optimal Control of the Investment Portfolio with Respect to the Quantile Criterion. Autom. Remote Control 65(no. 2), 319–336 (2004).
Bunto, T. V. & Kan, Yu. S. Quantile Criterion-based Control of the Securities Portfolio with a Nonzero Ruin Probability. Autom. Remote Control 74(no. 5), 811–828 (2013).
Kibzun, A. I. & Ignatov, A. N. The Two-Step Problem of Investment Portfolio Selection from Two Risk Assets via the Probability Criterion. Autom. Remote Control 76(no. 7), 1201–1220 (2015).
Azanov, V. M. & Kan, Yu. S. Design of Optimal Strategies in the Problems of Discrete System Control by the Probabilistic Criterion. Autom. Remote Control 78(no. 6), 1006–1027 (2017).
Kibzun, A. I. & Ignatov, A. N. Reduction of the Two-Step Problem of Stochastic Optimal Control with Bilinear Model to the Problem of Mixed Integer Linear Programming. Autom. Remote Control 77(no. 12), 2175–2192 (2016).
Appendix
Proof of Theorem 1. Introduce a function \({\Phi }_{0}:{{\mathbb{R}}}^{n}\to {\mathbb{R}}\) such that \({\Phi }_{0}\left(x\right)={\Phi }_{1}\left(x\right)\). Augment the state vector of the system by adding a new variable \({y}_{k}={\rm{min}}_{i = \overline{0,k}}{\Phi }_{i}\left({x}_{i}\right)\). The augmented control system has the form
where \({\widetilde{u}}_{k}={\widetilde{\gamma }}_{k}\left({x}_{k},{y}_{k}\right)\). Consider the right-hand side \(\widetilde{f}:{{\mathbb{R}}}^{n+1}\times {{\mathbb{R}}}^{m}\times {{\mathbb{R}}}^{s}\to {{\mathbb{R}}}^{n+1}\) of the augmented system:
Then the equivalent optimal control problem can be written as
where \(\widetilde{{\mathcal{U}}}={\widetilde{{\mathcal{U}}}}_{0}\times \ldots \times {\widetilde{{\mathcal{U}}}}_{N}\) and
Here the equivalence is understood in the sense of the same criteria
The Bellman equation for the equivalent problem [12] has the form
The following result was established in [12]. Assume that:
1) The functions \({\widetilde{f}}_{k}\) are continuous for all \(k=\overline{0,N}\).
2) The function \({\rm{min}}\left\{y,{\Phi }_{N+1}\left(x\right)\right\}\) is continuous and bounded below.
3) The random vectors X, ξ0, …, ξN are independent.
4) The sets U0, …, UN are compact.
Then the optimal control exists in the class of measurable functions and is determined by solving problems (A.1)–(A.3).
At step k = N the Bellman equation takes the form
Hence, for any \(k=\overline{0,N}\), Eq. (A.2) can be written as
Therefore, the Bellman function in the equivalent problem admits the representation
As a result, the optimal control is given by
Conditions 1–5 of Theorem 1 follow from the existence conditions of an optimal control in the problem with the terminal criterion. Note that items 2 and 3 are immediate from the continuity and boundedness (from below) of the function \({\rm{min}}\left\{y,{\Phi }_{N+1}\left(x\right)\right\}\).
The proof of Theorem 1 is complete.
Proof of Theorem 2. 1. Consider the Bellman equation at some step k. For the isobell of level 1,
Using the total mathematical expectation formula, write
From the equalities
it follows that
Item 1 of this theorem is established.
2. By analogy with item 1, the isobell of level 0 of the Bellman function can be written as
Using the total mathematical expectation formula, transform the right-hand side of this expression:
From the equalities
it follows that
Item 2 of this theorem is established.
3. Item 3 of this theorem follows from its item 1.
4. Item 4 of this theorem holds, since \({{\rm{B}}}_{k}\left(x\right)=0\) for \(x\in {{\mathcal{O}}}_{k}\).
5. Item 5 of this theorem follows from its item 1.
6. For \(x\in {{\mathcal{B}}}_{k}\), the right-hand side of the dynamic programming equation can be represented as
Due to the two-sided inequality for a convex combination, this yields
The relations \(1-{{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{O}}}_{k+1})={{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{I}}}_{k+1})+{{\bf{P}}}_{{\xi }_{k}}({f}_{k}(x,u,{\xi }_{k})\in {{\mathcal{B}}}_{k+1})\) and \({{\mathcal{F}}}_{k}\subseteq {{\mathcal{I}}}_{k}\) finally establish item 6 of this theorem.
7. Item 7 of this theorem is immediate from its item 6 by taking the supremum in all parts of inequality (8).
Proof of Theorem 3. 1. Introduce a system of hypotheses forming a complete group of incompatible events:
Then the total probability formula gives the chain of equalities
Analyze the second factor in the first term of the right-hand side of this expression. From the chain of equalities
it follows that
In view of this equality, the expression (A.8) takes the form
Now introduce another system of hypotheses forming a complete group of incompatible events:
Apply the total probability formula for the first term in the right-hand side of (A.9):
Similar to (A.9), transform the right-hand side of (A.10) to obtain
Substituting (A.10) into (A.9) yields
Perform the analogous transformations for the first term in (A.12), introducing the systems of hypotheses
to obtain the following expression for the value of the probabilistic criterion under the control \(\underline{u}\left(\cdot \right):\)
Note that the first term in (A.13) satisfies the chain of equalities
which implies the expression (14).
Item 1 of Theorem 3 is established.
2. For establishing item 2 of Theorem 3, it suffices to observe that for \({x}_{k}\in {{\mathcal{I}}}_{k}\), \({u}_{k}^{* }={\underline{u}}_{k}\) for all \(k=\overline{0,N}\). Following the same procedure as above, this equality can be used for deriving the expression (14) for the optimal-value function of the probabilistic criterion on the trajectories \({\left\{{x}_{k}^{* }\right\}}_{k = 1}^{N+1}\) of the closed loop system with the optimal control \({u}^{* }\left(\cdot \right)\).
Item 2 of Theorem 3 is established.
3. Item 3 of Theorem 3 directly follows from item 7 of Theorem 2 and item 1 of Theorem 3.
The proof of Theorem 3 is complete.
Proof of Lemma. Consider the dynamic programming relations (3)–(5) and (18)–(20) for problems (2) and (17), respectively. In (19), the equality
where \({{\mathcal{I}}}_{k}^{\varphi }\) is the isobell of level 1 of the Bellman function \({{\rm{B}}}_{k}^{\varphi }\left(x\right)\), holds for all \(x\in {{\mathbb{R}}}^{n}\). Hence, the equivalence conditions are true if the isobells of level 1 of the Bellman functions in problems (2) and (17) are the same, \({{\mathcal{I}}}_{k}={{\mathcal{I}}}_{k}^{\varphi }\). Due to the recurrent formula for \({{\mathcal{I}}}_{k}\) (see item 1 of Theorem 2), this is the case if and only if
The proof of Lemma is complete.
Proof of Proposition 1. In accordance with item 1 of Theorem 2, write the equations for the isobells of levels 1 and 0 of the Bellman function at step k = N:
The solutions of these equations were obtained in [13]. Using those results and the notations for the boundaries of the sets \({{\mathcal{I}}}_{k}\) and \({{\mathcal{O}}}_{k}\) (see Section 6), write:
This implies \({{\mathcal{I}}}_{N}=\Delta {{\mathcal{I}}}_{N}\). Due to item 1 of Theorem 2, at step k = N − 1 the equations for the isobells take the form
By analogy with step k = N, it follows that
Mathematical induction on k finally leads to the conclusion that, for all \(k=\overline{0,N}\),
The proof of Proposition 1 is complete.
Proof of Proposition 2. Consider the control (11), which takes the form (25); see Proposition 1. From item 3 of Theorem 3 it follows that
where the function \(\underline{F}\) is given by
due to
Clearly, the value \(\underline{N}\in {\mathbb{N}}\) can be determined as a root of the equation \(\underline{F}\left(\varphi ,N,X\right)=1\). But there exist infinitely many such roots, and the estimate \(\underline{N}\) will be therefore found in the form
The expressions (A.15) and (A.17) imply
From the definition of the isobell of level 1 of the Bellman function and the definition of the function \(\underline{F}\) it follows that the equation \(\underline{F}\left(\varphi ,N,X\right)=1\) is equivalent to the inclusion \(X\in {{\mathcal{I}}}_{0}\), which is in turn equivalent to
Taking the logarithm gives the inequality
In view of (A.17), this finally leads to (26).
The proof of Proposition 2 is complete.
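For orientation, the omitted chain of inequalities in this proof can be sketched under one natural assumption, which is not verifiable from the text reproduced here: that the inclusion \(X\in {{\mathcal{I}}}_{0}\) corresponds to the capital reaching \(-\varphi\) deterministically when invested entirely in the risk-free asset. Under this assumption, with \(X>0\), \(b>-1\), and \(\varphi <0\), one would expect

```latex
X\,(1+b)^{N+1} \geqslant -\varphi
\quad\Longleftrightarrow\quad
(N+1)\,\ln(1+b) \geqslant \ln\frac{-\varphi}{X},
```

so that the smallest admissible horizon would be \(\underline{N}=\left\lceil \ln \left(-\varphi /X\right)/\ln \left(1+b\right)\right\rceil -1\). Both the exponent N + 1 and this closed form are assumptions made only for this sketch; the precise expression is the estimate (26) of Proposition 2.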
Azanov, V. Optimal Control of a Discrete-Time Stochastic System with a Probabilistic Criterion and a Non-fixed Terminal Time. Autom Remote Control 81, 2143–2159 (2020). https://doi.org/10.1134/S0005117920120012