Abstract
In this paper we consider the large deviation problem for the method of empirical means in stochastic optimization with continuous time observations. For discrete time models this problem was studied in Knopov and Kasitskaya (Cybern Syst Anal 4:52–61, 2004; Cybern Syst Anal 5:40–45, 2010).
1 The Approach
Consider the following stochastic minimization problem: minimize the function
$$\displaystyle \begin{aligned} F\left(x\right)=\mathbb{E}\,f\left(x,\xi_0\right),\quad x\in X, \end{aligned} \qquad (1)$$
where X is a non-empty compact subset of \({\mathbb R}\), \(\left \{\xi _t,\, t\in {\mathbb R}\right \}\) is a stationary in the narrow sense stochastic process with continuous trajectories, defined on a complete probability space \(\left (\varOmega ,\mathcal {G},P\right )\) with values in some metric space \(\left (Y,\rho \right )\), and \(f:X\times Y\to {\mathbb R}\) is a continuous function.
Approximate the above problem by the following one: minimize the function
$$\displaystyle \begin{aligned} F_T\left(x\right)=\frac{1}{T}\int_{0}^{T}f\left(x,\xi_t\right)dt,\quad x\in X, \end{aligned}$$
where \(\left \{\xi _t,\,0\le t\le T\right \}\) are the observations of the process \(\xi_t\), T > 0.
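As a quick numerical illustration of the method of empirical means (my sketch, not from the paper), the code below minimizes such a time-average objective over a discretized trajectory. The loss f(x, y) = (x − y)² and the Ornstein-Uhlenbeck dynamics are hypothetical choices, made only so that the true minimizer x₀ = 𝔼ξ₀ = 0 is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectory(T, dt=0.01, theta=1.0):
    """Euler scheme for a (discretized) stationary Ornstein-Uhlenbeck process xi_t."""
    n = int(T / dt)
    xi = np.empty(n)
    xi[0] = rng.normal(scale=np.sqrt(1.0 / (2.0 * theta)))  # stationary law
    for k in range(1, n):
        xi[k] = xi[k - 1] - theta * xi[k - 1] * dt + np.sqrt(dt) * rng.normal()
    return xi

def F_T(x, xi):
    """Riemann-sum approximation of (1/T) * int_0^T f(x, xi_t) dt
    for the hypothetical loss f(x, y) = (x - y)**2."""
    return np.mean((x - xi) ** 2)

xi = trajectory(T=200.0)
grid = np.linspace(-1.0, 1.0, 201)            # compact set X = [-1, 1]
x_T = grid[int(np.argmin([F_T(x, xi) for x in grid]))]
# x_T should lie near the true minimizer x_0 = 0 when T is large
```

For growing T the empirical minimizer x_T drifts toward x₀, which is exactly the consistency statement formalized below.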
Clearly, the minimum point \(x_{T} =x_{T} \left (\omega \right )\) exists and is a measurable function.
Suppose that
Then the function (1) is continuous and has at least one minimum point \(x_0\). Let us assume that this point is unique.
Theorem 1 ([1])
Let \(\left \{\xi _t,\; t\in {\mathbb R}\right \}\) be a stationary in the narrow sense random process, defined on a probability space \(\left (\varOmega ,\mathcal {G},P\right )\), and assume that the function F(x) has a unique minimum point \(x_0 \in X\).
Then for all T > 0 and ω ∈ Ω′, \(P\left (\varOmega '\right )=1\), there exists at least one point \(x_T \in X\) at which the minimal value of the function \(F_T(x)\) is attained.
Moreover, for each T > 0 the function x T is \(\mathcal {G}^{\prime }_{T} \) -measurable, where \(\mathcal {G}^{\prime }_{T} =\mathcal {G}_{T} \bigcap \varOmega '\) , \(\mathcal {G}_{T} =\sigma \left \{\xi _t,\; 0\le t\le T\right \}\).
Then
Now we study the probabilities of large deviations of \(x_T\) and of the minimal value \(F_T\) from \(x_{0}\) and \(F\left (x_{0} \right )\), respectively.
For any y the function \(f\left (\circ ,y\right )\) belongs to the space of continuous functions \(C\left (X\right )\). Suppose that there exists a compact convex set \(K\subset C\left (X\right )\) such that
Then \(F_{T} \left (\circ \right )-F\left (\circ \right )\in K\).
In what follows \(F_T - F\) is considered as a random element on \(\left (\varOmega ,\mathcal {G},P\right )\) with values in the set K.
We use some well-known results from functional analysis.
Definition 1 ([2])
Let \(\left (V,\left \| \circ \right \| \right )\) be a normed linear space, and let \(B\left (x,r\right )\) be the closed ball of radius r centered at x. Let \(f:V\to \left (-\infty ,+\infty \right ]\) be some function, and let \(x_f\) be its minimum point on V. An improving function ψ for f at the point \(x_f\) is a monotone non-decreasing function \(\psi :\left [0,+\infty \right )\to \left [0,+\infty \right )\) with \(\psi \left (0\right )=0\), such that there exists r > 0 for which, for any \(x\in B\left (x_{f} ,r\right )\), we have
$$\displaystyle \begin{aligned} f\left(x\right)-f\left(x_{f}\right)\geq \psi\left(\left\| x-x_{f}\right\| \right). \end{aligned}$$
Let V 0 ⊂ V . Define
Theorem 2 ([2])
Let \(\left (V,\left \| \circ \right \| \right )\) be a normed linear space, let \(V_0 \subset V\) be closed, and let \(f_{0} ,g_{0} :V\to {\mathbb R}\) be functions continuous on V. Suppose that
Let \(f,g:V\to \left (-\infty ,+\infty \right ]:\)
Then
Let \(x_f\) be the minimum point of f on V, and let ψ be an improving function for f at the point \(x_f\) with constant r. If ε is small enough that
then for any \(x_{g} \in \arg \min \left \{g\left (x\right ),\,x\in B\left (x_{f} ,r\right )\right \}\) we have \(\psi \left (\left \| x_{f} -x_{g} \right \| \right )\le 2\varepsilon .\) For a function ψ that is convex and strictly increasing on \(\left [0,r\right ]\),
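A concrete hedged example (my illustration, not taken from [2]): if f grows quadratically near \(x_f\), the improving function can be taken quadratic, and the bound \(\psi \left (\left \| x_{f} -x_{g} \right \| \right )\le 2\varepsilon\) inverts to an explicit rate.

```latex
% Hypothetical example: suppose f satisfies the quadratic growth condition
%   f(x) \ge f(x_f) + c\,\|x - x_f\|^2 \quad \text{for all } x \in B(x_f, r),
% with some c > 0.  Then \psi(t) = c t^2 is monotone non-decreasing with
% \psi(0) = 0, hence an improving function; it is convex and strictly
% increasing on [0, r], so \psi(\|x_f - x_g\|) \le 2\varepsilon yields
\[
  \left\| x_f - x_g \right\| \;\le\; \sqrt{\frac{2\varepsilon}{c}}.
\]
```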
We need some statements from the large deviations theory.
Theorem 3 ([3, p. 53])
Let \(\mu _{\varepsilon }\), ε > 0, be a family of probability measures on a compact subset H of a separable Banach space E. Suppose that there exists
for any λ ∈ E ∗ —the dual space of E, where
for any probability measure μ on E, and \(\left \langle \lambda ,x\right \rangle \) is the duality relation. Define
Then Λ ∗ is non-negative, convex, lower semi-continuous, and for any compact set A ⊂ H
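To make the Legendre transform \(\varLambda^*\) in Theorem 3 concrete, here is a small numerical sketch in a toy scalar setting (my example, not the paper's): for the Gaussian log-moment generating function Λ(λ) = λ²σ²/2, the rate function Λ*(x) = sup_λ(λx − Λ(λ)) equals x²/(2σ²), and a grid maximization reproduces this closed form.

```python
import numpy as np

# Toy scalar case of the Legendre transform in Theorem 3 (hypothetical
# setting): Lambda(lam) = lam**2 * sigma**2 / 2, whose transform
# Lambda*(x) = sup_lam (lam * x - Lambda(lam)) is x**2 / (2 * sigma**2).
sigma = 1.5
lams = np.linspace(-50.0, 50.0, 200001)  # grid over which the sup is taken

def rate(x):
    """Approximate Lambda*(x) by maximizing lam*x - Lambda(lam) on the grid."""
    return np.max(lams * x - lams ** 2 * sigma ** 2 / 2)

closed_form = lambda x: x ** 2 / (2 * sigma ** 2)
errors = [abs(rate(x) - closed_form(x)) for x in (-2.0, 0.0, 1.0, 3.0)]
```

The computed rate function is non-negative and convex, as Theorem 3 asserts in general.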
Definition 2 ([3])
Let Σ be a separable Banach space, and let \(\left \{\xi _t,t\in {\mathbb R}\right \}\) be a stationary in the narrow sense stochastic process on \(\left (\varOmega ,\mathcal {G},P\right )\) with values in Σ. Denote \(B_{t_{1} t_{2} } =\sigma \left \{\xi _t,t_{1} \le t\le t_{2} \right \}.\) For τ > 0 the random variables \(\eta_1, \ldots, \eta_p\), p ≥ 2, are called τ-measurably separated if
where η j is \(B_{t_{j} s_{j} } \)–measurable.
Definition 3 ([3])
A stochastic process \(\left \{\xi _t\right \}\) from Definition 2 is said to satisfy Hypothesis (H-1) of hypermixing, if there exist \(\tau _{0} \in {\mathbb N}\cup \left \{0\right \}\) and a non-increasing \(\alpha :\left \{\tau >\tau _{0} \right \}\to \left [1,+\infty \right )\), such that
for any p ≥ 2, τ > τ 0, η 1, …, η p τ-measurably separated,
Let X be a compact subset of \({\mathbb R}\). It is known (cf. [4]) that \(\left (C\left (X\right )\right )^*=M\left (X\right )\), where M(X) is the space of signed measures on X, and for any \(g\in C\left (X\right )\), \(Q\in M \left (X\right )\)
$$\displaystyle \begin{aligned} \left\langle Q,g\right\rangle =\int_{X}g\left(x\right)Q\left(dx\right). \end{aligned}$$
We need the following auxiliary statement.
Theorem 4
Let \(\left \{\xi _t,t\in {\mathbb R}\right \}\) be a stationary in the narrow sense ergodic stochastic process with continuous trajectories, which satisfies the hyper-mixing hypothesis (H-1) on \(\left (\varOmega ,\mathcal {G},P\right ),\) with values in a compact convex set \(K\subset C\left (X\right )\), i.e., \(\xi _t(\cdot) \in K\), and \(\mathcal {G}_t\)-measurable. Then for any measure \(Q\in M \left (X\right )\) there exists
and for any closed A ⊂ K
where \(\varLambda ^*\left (g\right )=\sup \left \{\int _{X}g\left (x\right )Q\left (dx\right ) -\varLambda \left (Q\right ),Q\in M \left (X\right )\right \}\) is a non-negative convex lower semi-continuous function.
Proof
Fix \(Q\in M \left (X\right ).\) Let τ 0 be a constant from the hyper-mixing condition, τ > τ 0, S > τ, S < T. Then
Define
Denote (cf. also [4])
We have
By (2),
Therefore, for all ω
It follows from (4) that for any ω
For any ω denote
Further,
We have
Inequality (8) follows from the hyper-mixing hypothesis (H-1). Further, due to the stationarity of ξ t ,
for \(j=\overline {0,N_{T} -1}\). From (8), (9) we have
By (3),
By (10),
Then
Letting S →∞, we derive
Passing to the limit as τ →∞, we get
Therefore,
We use Theorem 3. We have
Further, \(\mu _{\varepsilon } = \mu _{1/T}\) is the probability measure on K defined by the distribution of \(\frac {1}{T} \int _{0}^{T}\xi _t\,dt\). We get
By (11), the proof follows from Theorem 3.
Let us come back to problem (1).
Theorem 5
Suppose that the process ξ t satisfies the hyper-mixing hypothesis (H-1). Then for any ε > 0
where \(I\left (z\right )=\varLambda ^*\left (z\right )=\sup \left \{\int _{X}z\left (x\right )Q\left (dx\right ) -\varLambda \left (Q\right ),\,\,Q\in M \left (X\right )\right \}\) is a non-negative lower semi-continuous convex function,
Proof
Note that \(A_{\varepsilon }\) is a closed subset of K. The process
taking values in K, is a measurable function of ξ t and hence, satisfies the conditions of Theorem 4. Therefore, the statement of the theorem follows from Theorem 4. Theorem is proved.
Theorem 6
Suppose that the conditions of Theorem 5 are satisfied. Then
where \(I\left (\circ \right ),A_{\varepsilon } \) is defined in Theorem 5.
Suppose that there exists an improving function ψ for F at the point x 0 with some constant r. Let x T be the minimum point of F T on the set \(B\left (x_{0} ,r\right ).\) If ε is small enough, such that the condition
is satisfied, then
Proof
By Theorem 2, for all ω
Then by Theorem 5 we derive (12).
Further, by Theorem 2, for any ω
and by Theorem 5 we get (13). Theorem is proved.
Remark 1
If, in addition to the conditions of Theorem 6, the function ψ is convex and strictly increasing on \(\left [0,r\right ],\) then we get
Indeed, by Theorem 2, for all ω
Then
2 Non-stationary Version
In this part we consider the non-stationary version of the method of empirical means with continuous time observations.
Let {ξ(t), t ∈ [0, T]} be a strictly stationary random process defined on a probability space \((\varOmega , \mathcal {F}, \mathbb {P})\) with values in some metric space (Y, ρ), let \(X=[a,b]\subset \mathbb {R}\), and let the function \(h(t,x,y): (0,\infty )\times X\times Y\to \mathbb {R}\) be convex with respect to the second variable and measurable with respect to the third one.
Consider the following problem:
Assume that the following conditions are satisfied.
1. \(\mathbb {E} |h(t,x,\xi (0))|<\infty \) for all t > 0, x ∈ X;
2. for all x ∈ X there exists
$$\displaystyle \begin{aligned} F(x)= \lim_{T\to\infty} F_T(x); \end{aligned}$$
3. there exists \(\bar x\in X\), c > 0 such that
$$\displaystyle \begin{aligned} F(x) \geq F(\bar x) + c|x-\bar x| \quad \text{for all} \ x\in X. \end{aligned}$$
From condition 3 it follows that there exists a unique solution to the minimization problem
and this solution is achieved at some point \(\bar x\). Moreover, for any T and ω the function \(F_T(x) = F_T(x, \omega)\) is convex, and for any T the function \(\mathbb {E} F_T(x)\) is convex.
For any function \(g: X\to \mathbb {R}\) define
Put \(g_T(x)=\mathbb {E} F_T(x)\), x ∈ X. Since by convexity of h(t, x, y) the limits in (16) and (17) exist, the following limits exist as well:
- for all t, y, for the function h(t, x, y);
- for each t, for the function \(\mathbb {E} h(t,\cdot , \xi (t))\);
- for any t, ω, for the function \(F_T(\cdot)\);
- for each t, for \(g_T(\cdot)\).
The following lemma holds true.
Lemma 1
Suppose that there exists a function \(u: X\times \varOmega \to \mathbb {R}\), convex with respect to the first argument and measurable with respect to the second one. Assume that \(\mathbb {E} |u(x,\omega )|<\infty \) for any x ∈ X. Denote \(v(x)= \mathbb {E} u(x,\omega )\). Then
Proof
We have
Since u is convex with respect to x for all ω,
the fractions on the right-hand sides of (18) and (19) decrease monotonically as Δ → +0. Then by the monotone convergence theorem
The same argument applies to \(v^{\prime }_-\). Lemma is proved.
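The monotonicity used in this proof can also be seen numerically: for a convex u, the difference quotients (u(x + Δ) − u(x))/Δ are non-decreasing in Δ, so they decrease monotonically as Δ → +0, which is what justifies the monotone convergence step. A minimal sketch with the hypothetical convex function u(x) = |x| + x², which has a kink at 0 with \(u^{\prime}_+(0) = 1\):

```python
# For convex u, the one-sided difference quotients (u(x + d) - u(x)) / d
# are non-decreasing in d, hence decrease monotonically to u'_+(x) as
# d -> +0.  Here u(x) = |x| + x**2 is a hypothetical convex example.
def u(x):
    return abs(x) + x * x

x0 = 0.0
deltas = [1.0, 0.5, 0.1, 0.01, 0.001]
quotients = [(u(x0 + d) - u(x0)) / d for d in deltas]
# quotients ~ [2.0, 1.5, 1.1, 1.01, 1.001], decreasing toward u'_+(0) = 1
```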
By Lemma 1 we have that
and for any small t ∈ [0, T], x ∈ X
Lemma 2
Suppose that conditions 1–3 and the statements a)–c) below hold true:
a) \(h^{\prime }_+(t,\bar x, \xi (t))- \mathbb {E} h^{\prime }_+(t,\bar x, \xi (t))\) and \(h^{\prime }_-(t,\bar x, \xi (t))- \mathbb {E} h^{\prime }_-(t,\bar x, \xi (t))\), t ∈ [0, T], satisfy the strong mixing condition with the mixing coefficient [1]
$$\displaystyle \begin{aligned} \alpha(\tau)\leq \frac{c_0}{ 1+ \tau^{1+\epsilon}}, \quad \epsilon>0,\quad \tau>0; \end{aligned}$$
b) there exists δ > 2∕ε such that for any t > 0
$$\displaystyle \begin{aligned} \mathbb{E} |h^{\prime}_+(t,\bar x, \xi(0))|{}^{2+\delta} <\infty, \quad \mathbb{E} |h^{\prime}_-(t,\bar x, \xi(0))|{}^{2+\delta} <\infty; \end{aligned}$$
c)
$$\displaystyle \begin{aligned} g^{\prime}_{T+} (\bar x) \to F^{\prime}_+(\bar x), \quad g^{\prime}_{T-}(\bar x) \to F^{\prime}_- (\bar x), \quad T\to \infty. \end{aligned}$$
Then
The proof is analogous to that of Lemma 2 in [5] for the discrete time setting.
Theorem 7
Suppose that conditions of Lemma 2 hold true. Then with probability 1 there exists T ∗ = T ∗(ω) such that for any T > T ∗ problem (15) has the unique solution x T and \(x_T= \bar x\).
Proof
By assumption 2,
By Lemma 2,
with probability 1, starting from some T ∗. Since the function F T (x) is convex, \(\bar x\) is the unique minimum point of the function F T (x). Theorem is proved.
Now we turn to the large deviation problem for (15).
Theorem 8
Suppose that condition 2 and the assumptions below hold true:
a) the family {ξ(t), t ∈ [0, T]} satisfies the conditions of hypothesis (H-1);
b) there exists L > 0 such that for all t ∈ [0, T] and y ∈ Y
$$\displaystyle \begin{aligned} |h^{\prime}_+(t,\bar x, y)|\leq L, \quad |h^{\prime}_-(t,\bar x, y)|\leq L. \end{aligned}$$
Then
where
The proof follows the same lines as that of Theorem 3 in [6], using Theorem 4.
Remark 2
Note that the statements of Theorems 5 and 6 also hold true for the non-stationary observation model, with analogous proofs.
References
Knopov, P.S., Kasitskaya, E.J.: Empirical Estimates in Stochastic Optimization and Identification. Kluwer, Dordrecht (2005)
Kaniovski, Y.M., King, A.J., Wets, R.J.-B.: Probabilistic bounds (via large deviations) for the solutions of stochastic programming problems. Ann. Oper. Res. 56, 189–208 (1995)
Deuschel, J.D., Stroock, D.W.: Large Deviations. Academic Press, Boston (1989)
Dunford, N., Schwartz, J.: Linear Operators. Part I: General Theory. Interscience, New York (1957)
Knopov, P.S., Kasitskaya, E.J.: Large deviations of empirical estimates in stochastic programming with non-stationary observations. Cybern. Syst. Anal. 5, 40–45 (2010)
Knopov, P.S., Kasitskaya, E.J.: On large deviations of empirical estimates in stochastic programming. Cybern. Syst. Anal. 4, 52–61 (2004)
Knopov, P.S., Kasitskaya, E.J. (2017). Large Deviations for the Method of Empirical Means in Stochastic Optimization Problems with Continuous Time Observations. In: Butenko, S., Pardalos, P., Shylo, V. (eds.) Optimization Methods and Applications. Springer Optimization and Its Applications, vol. 130. Springer, Cham. https://doi.org/10.1007/978-3-319-68640-0_13