Abstract
In a Hilbert setting we study a second order in time differential equation that combines viscous and Hessian-driven damping and contains a time scaling parameter function and a Tikhonov regularization term. The dynamical system is related to the problem of minimizing a nonsmooth convex function. In the formulation of the problem as well as in our analysis we use the Moreau envelope of the objective function and its gradient and rely heavily on their properties. We show that there is a setting in which the newly introduced system preserves, and due to the presence of time scaling even improves, the well-known fast convergence properties of the function and Moreau envelope values along the trajectories, as well as of the gradient of the Moreau envelope. Moreover, in a different setting we prove strong convergence of the trajectories to the element of minimal norm in the set of all minimizers of the objective. The manuscript concludes with various numerical results.
1 Introduction
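In a Hilbert space H, where \(\langle \cdot , \cdot \rangle \) denotes the inner product and \( \Vert \cdot \Vert = \sqrt{\langle \cdot , \cdot \rangle } \) the induced norm, we study the convergence properties of the following second order in time differential equation
\[ \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)} (x(t)) + b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t) = 0 \tag{1} \]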
with initial conditions \(x(t_0) = x_0 \in H\), \(\dot{x}(t_0) = \dot{x}_0 \in H\), where \( \alpha , \beta \text { and } t_0 > 0 \), \(\lambda : [t_0, +\infty ) \mapsto \mathbb {R}_+\) and \(b: [t_0, +\infty ) \mapsto \mathbb {R}_+\) are non-negative, non-decreasing and differentiable, \( \Phi : H \mapsto \overline{\mathbb {R}} = \mathbb {R} \cup \{ \pm \infty \} \) is a proper, convex and lower semicontinuous function and \(\Phi _\lambda \) is its Moreau envelope of the index \(\lambda > 0\) and the function \(\varepsilon : [t_0, +\infty ) \mapsto \mathbb {R}_+\) is continuously differentiable and non-increasing with the property \(\lim _{t \rightarrow +\infty } \varepsilon (t) = 0\). In addition, we assume that \(\mathop {\textrm{argmin}}\limits \Phi \), which is the set of global minimizers of \(\Phi \), is not empty and denote by \(\Phi ^*\) the optimal objective value of \(\Phi \). The system (1) has a connection to the minimization problem
\[ \min _{x \in H} \Phi (x) \]
of a proper, convex and lower semicontinuous function \(\Phi \). Studying such systems provides a better understanding of their discrete counterparts, namely optimization algorithms, since there is a strong connection between the two, and the question of passing from one to the other attracts a lot of attention in the modern literature.
One of the main goals of this research is to improve (compared to [23]) the fast rates of convergence for the Moreau envelope of the objective function and the objective function itself to \(\Phi ^*\), as well as for the gradient of the Moreau envelope of the objective function in terms of the Moreau parameter function \(\lambda \) and the time scaling function b. Moreover, we also deduce the strong convergence of the trajectory of the dynamics to the minimal norm element of \(\mathop {\textrm{argmin}}\limits \Phi \). We introduce two settings with different assumptions for each result. To conclude we provide multiple numerical results in order to illustrate our theoretical discoveries.
1.1 Nonsmooth Optimization with Time Scaling
In the smooth setting the pioneering research on second order dynamical systems was conducted by Su–Boyd–Candès [30] with the aim of obtaining faster asymptotic convergence for convex functions. They managed to deduce rates of convergence of the function values of the order \(\frac{1}{t^2}\). Later Attouch–Peypouquet–Redont [20] also established the weak (and in some particular cases the strong) convergence of the trajectories to a minimizer of the objective function. In [19] the same authors continued the development in this direction by adding a Hessian-driven damping term in order to obtain rates for the gradient of the objective function and to eliminate any possible oscillations in the dynamical behaviour of the trajectories.
Concerning the nonsmooth setting we must point out that the Moreau envelope of a proper, convex and lower semicontinuous function \(\Phi : H \rightarrow \overline{\mathbb {R}}\) has proved to be of significant importance in designing continuous-time approaches and numerical algorithms for the minimization of nonsmooth functions. It is defined as
\[ \Phi _\lambda (x) = \inf _{y \in H} \left\{ \Phi (y) + \frac{1}{2 \lambda } \Vert x - y \Vert ^2 \right\} , \]
where \(\lambda > 0\) is the parameter of the Moreau envelope (see, for instance, [21]). One of the most important properties of Moreau approximation is that for every \(\lambda > 0\), the functions \(\Phi \) and \(\Phi _{\lambda }\) share the same optimal objective value and also the same set of minimizers. Moreover, \(\Phi _\lambda \) is convex and continuously differentiable with
\[ \nabla \Phi _\lambda (x) = \frac{1}{\lambda } \left( x - \mathop {\textrm{prox}}\limits _{\lambda \Phi }(x) \right) \tag{2} \]
and \(\nabla \Phi _\lambda \) is \(\frac{1}{\lambda }\)-Lipschitz continuous, where
\[ \mathop {\textrm{prox}}\limits _{\lambda \Phi }(x) = \mathop {\textrm{argmin}}\limits _{y \in H} \left\{ \Phi (y) + \frac{1}{2 \lambda } \Vert x - y \Vert ^2 \right\} \]
denotes the proximal operator of \(\Phi \) of parameter \(\lambda \). The last fact we would like to mention is that for every \(x \in H\), the function \(\lambda \in (0, +\infty ) \mapsto \Phi _\lambda (x)\) is nonincreasing and differentiable (see [14], Lemma A1), namely,
\[ \frac{d}{d \lambda } \Phi _\lambda (x) = - \frac{1}{2} \Vert \nabla \Phi _\lambda (x) \Vert ^2 . \tag{3} \]
Our research is a logical continuation of the one conducted in [24], where the authors applied the time rescaling technique to a nonsmooth optimization problem (for more information on time scaling see also [5, 10, 11, 13]). They considered the following system
where \(\alpha \ge 1 \), \(t_0 > 0\), and \(\beta : [t_0, +\infty ) \mapsto [0, +\infty )\) and \( b, \lambda : [t_0, +\infty ) \mapsto (0, +\infty ) \) are differentiable functions. On the one hand, the presence of the Hessian damping term is believed to help reduce the oscillations in the dynamical behaviour and provides rates for the gradient of the objective function \(\Phi \). On the other hand, the time-scaling technique (which is considered to be an artificial way to speed up the convergence of values) affects the convergence rates while bringing more restrictions to the analysis. The following properties were established
from where through proximal mapping the convergence rates for the objective function \(\Phi \) itself along the trajectory were obtained
Note that by taking \(b(\cdot ) \equiv 1\) we arrive at the well-known convergence rate of the values being of the order \(o\left( \frac{1}{t^2} \right) \). In addition, the following rates for the gradient of the Moreau envelope were deduced
Finally, the weak convergence of the trajectories x(t) to a minimizer of \(\Phi \) as \(t \rightarrow +\infty \) was obtained.
In our analysis we borrow some ideas of [24] and develop them further in order to fit the new setting, namely, to accommodate an entirely new term, the Tikhonov regularization. The analysis becomes more involved and technical, and some fundamental properties of the Tikhonov regularization had to be proved in the nonsmooth setting. Its presence affects the set of conditions which we have to impose on the system parameters: even though some of the conditions are formulated in the same spirit as in [24] (for instance, (11) and (14)), others are completely new due to the presence of the Tikhonov term. Moreover, depending on how fast \(\varepsilon \) decays, two different settings arise, providing different fundamental results (Sects. 3 and 4).
1.2 Tikhonov Regularization
It turns out that having an additional term with specific properties in the system equation improves the weak convergence of the trajectories to a minimizer of the objective function \(\Phi \) to strong convergence to the element of minimal norm of \(\mathop {\textrm{argmin}}\limits \Phi \). Such systems were studied, for instance, in [4, 6, 9, 12, 17, 23, 27]. The main goal of such research is to show that these systems preserve all the typical properties of second order in time dynamical systems (fast convergence of the values, rates for the gradient, etc.) while improving weak convergence of the trajectories to an arbitrary minimizer to strong convergence to the minimal norm solution. One of the many examples of such systems is presented below (see [23])
where \( \alpha \ge 3\), \(t_0 > 0\), \(\Phi : H \mapsto \mathbb {R}\) is twice continuously differentiable and convex and for the rest of the section the function \(\varepsilon : [t_0, +\infty ) \mapsto \mathbb {R}_+\) is continuously differentiable and non-increasing with the property \(\lim _{t \rightarrow +\infty } \varepsilon (t) = 0\). In that manuscript they provided two settings: one for the fast convergence of values obtaining
and the weak convergence of the trajectories to a minimizer of \(\Phi \) and another setting for the strong convergence of x to \(x^*\), as \(t \rightarrow +\infty \).
Another fine example is given in [4]:
where \( \alpha \), \( t_0 > 0 \) and \(\Phi : H \mapsto \mathbb {R}\) is continuously differentiable and convex. In that paper the authors obtained rates for the function values \( \Phi (x(t)) - \Phi ^* \), as well as for the quantity \( \Vert x(t) - x_{\varepsilon (t)} \Vert \), as \(t \rightarrow +\infty \), where \( x_{\varepsilon (t)} = \mathop {\textrm{argmin}}\limits _H \left( \Phi (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2} \right) \). Thus, they ensured the strong convergence of the trajectories to the minimal norm solution \( x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0) \) under appropriate assumptions and with a properly chosen energy functional, using the properties of the Tikhonov regularization. The most important feature of this approach is that the authors were able to establish fast convergence of values and strong convergence of the trajectories in the very same setting.
The next step was done in [6]:
where \(\varphi _t (x) = \Phi (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2}\), \( \Phi : H \mapsto \mathbb {R}\) is twice continuously differentiable and convex and \(p \in [0, 1]\). While preserving all the properties of (5), this system additionally provides an integral estimate for the norm of the gradient of \(\varphi _t\).
1.3 Our Contribution
In this paper we will develop the ideas presented in [23] to cover the nonsmooth case with time scaling. We will obtain the fast convergence of the function values (as well as of the gradient of the Moreau envelope of the objective function \(\Phi \)) for the family of dynamical systems (1) governed by the Moreau envelope of the nonsmooth function \(\Phi \) and having the Tikhonov term in their formulation:
in terms of the function itself:
where
and finally
We will also deduce (under some appropriate conditions) the following result
which under some restrictions will be improved to the full strong convergence of the trajectories of (1) to the minimal norm solution.
The paper is organized in the following way. Section 2 is devoted to some preliminary results, which we will need later. We will establish the fast rates of convergence of function values and its Moreau envelope, as well as the gradient of Moreau envelope along the trajectories of the dynamical system (Sect. 3). We will show that under some assumptions the strong convergence of the trajectories to the element of minimal norm from the set of all minimizers of the objective function takes place (Sect. 4). We will provide two settings for the polynomial choice of parameter functions to fulfill the assumptions made through the analysis (Sect. 5) and equip this manuscript with various numerical results (Sect. 6).
2 Preparatory Results
We start with the following lemma (see [21], Proposition 12.22, for the first item of the lemma and [18], Appendix, A1, for the second one).
Lemma 1
Let \(\Phi : H \mapsto \overline{\mathbb {R}}\) be a proper, convex and lower semicontinuous function, \(\lambda , \mu > 0\). Then
1. \((\Phi _\lambda )_\mu = \Phi _{\lambda + \mu }\).
2. \( \mathop {\textrm{prox}}\limits _{\mu \Phi _\lambda } = \frac{\lambda }{\lambda + \mu } \mathop {\textrm{Id}}\limits + \frac{\mu }{\lambda + \mu } \mathop {\textrm{prox}}\limits _{(\lambda + \mu )\Phi } \).
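The two identities of Lemma 1, together with the standard formulas for the gradient of the Moreau envelope and for its derivative with respect to \(\lambda \), can be checked numerically on a simple example. The sketch below assumes \(\Phi = |\cdot |\) on the real line, for which the proximal operator is soft-thresholding; the helper names, the search interval \([-10, 10]\) and the iteration counts are illustrative choices, not part of the paper.

```python
import math


def prox_abs(x, lam):
    """Proximal operator of lam * |.| (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def env_abs(x, lam):
    """Moreau envelope of |.| with parameter lam, evaluated through the prox."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def grad_env_abs(x, lam):
    """Gradient of the envelope: (x - prox_{lam Phi}(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


def env_of_env(x, lam, mu, lo=-10.0, hi=10.0, iters=200):
    """(Phi_lam)_mu at x: minimize a strictly convex function by ternary search."""
    f = lambda y: env_abs(y, lam) + (x - y) ** 2 / (2.0 * mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def prox_of_env(x, lam, mu, lo=-10.0, hi=10.0, iters=200):
    """prox_{mu Phi_lam}(x): root of y + mu * grad Phi_lam(y) - x, by bisection."""
    g = lambda y: y + mu * grad_env_abs(y, lam) - x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def check_lemma1(x, lam, mu):
    """Residuals of items 1 and 2 of Lemma 1 at the point x (|x| <= 10)."""
    r1 = abs(env_of_env(x, lam, mu) - env_abs(x, lam + mu))
    rhs = lam / (lam + mu) * x + mu / (lam + mu) * prox_abs(x, lam + mu)
    r2 = abs(prox_of_env(x, lam, mu) - rhs)
    return r1, r2


def check_derivatives(x, lam, h=1e-6):
    """Finite-difference residuals for the gradient and the lambda-derivative."""
    dx = (env_abs(x + h, lam) - env_abs(x - h, lam)) / (2.0 * h)
    dlam = (env_abs(x, lam + h) - env_abs(x, lam - h)) / (2.0 * h)
    g = grad_env_abs(x, lam)
    return abs(dx - g), abs(dlam + 0.5 * g * g)
```

For instance, `check_lemma1(3.0, 0.5, 1.5)` compares both sides of the two identities at \(x = 3\), and `check_derivatives(3.0, 0.5)` compares central finite differences with the closed-form derivatives. Both are only sanity checks at single points and should return residuals close to zero.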
Let us mention two key properties of the Tikhonov regularization, which we will use later in the analysis (see, for instance, [2] or [21] Theorem 23.44 for its classic analogue). First let us introduce the strongly convex function \(\varphi _{\varepsilon (t), \lambda (t)}: H \mapsto \mathbb {R}\) as \(\varphi _{\varepsilon (t), \lambda (t)} (x) = \Phi _{\lambda (t)}(x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2}\) and denote the unique minimizer of \(\varphi _{\varepsilon (t), \lambda (t)}\) as \(x_{\varepsilon (t), \lambda (t)} = \mathop {\textrm{argmin}}\limits _{H} \varphi _{\varepsilon (t), \lambda (t)}\). Thus, the first order optimality condition reads as
\[ \nabla \Phi _{\lambda (t)} \left( x_{\varepsilon (t), \lambda (t)} \right) + \varepsilon (t) x_{\varepsilon (t), \lambda (t)} = 0. \tag{6} \]
Now we are ready to formulate the following result:
Lemma 2
Suppose that
Then the following properties of the mapping \(t \mapsto x_{\varepsilon (t), \lambda (t)}\) are satisfied:
and
Proof
By the monotonicity of \(\nabla \Phi _\lambda \) we deduce
By (6) we obtain
Using Cauchy–Schwarz inequality we derive
This proves the first claim. For the second one consider (6) again and note that it is equivalent to
by item 2 of Lemma 1. Note that \(\lambda (t) + \frac{1}{\varepsilon (t)} \rightarrow +\infty \), as \(t \rightarrow +\infty \). Thus, the rest of the proof follows the lines of Theorem 23.44 of [21]. \(\square \)
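The behaviour of the Tikhonov path can be illustrated on a one-dimensional example. The sketch below assumes \(\Phi (x) = \textrm{dist}(x, [1, 2])\), so that \(\mathop {\textrm{argmin}}\limits \Phi = [1, 2]\) and the minimal norm solution is \(x^* = 1\). It also assumes the closed form \(x_{\varepsilon , \lambda } = \frac{1}{1 + \varepsilon \lambda } \mathop {\textrm{prox}}\limits _{(\lambda + 1/\varepsilon ) \Phi }(0)\), which we obtain by combining the first order optimality condition with item 2 of Lemma 1. The example function and all helper names are hypothetical.

```python
def prox_dist(x, lam, lo=1.0, hi=2.0):
    """Proximal operator of lam * dist(., [lo, hi]) on the real line."""
    proj = min(max(x, lo), hi)
    d = abs(x - proj)
    if d <= lam:
        return proj                      # the step reaches the interval
    return x + lam * (proj - x) / d      # otherwise move lam towards it


def grad_env_dist(x, lam):
    """Gradient of the Moreau envelope: (x - prox_{lam Phi}(x)) / lam."""
    return (x - prox_dist(x, lam)) / lam


def tikhonov_point(eps, lam):
    """Minimizer of Phi_lam + (eps/2)|.|^2 via the assumed closed form."""
    return prox_dist(0.0, lam + 1.0 / eps) / (1.0 + eps * lam)


def residual(eps, lam):
    """Residual of the optimality condition grad Phi_lam(x) + eps * x = 0."""
    x = tikhonov_point(eps, lam)
    return abs(grad_env_dist(x, lam) + eps * x)


# The path approaches the minimal norm minimizer x* = 1 as eps decreases.
path = [tikhonov_point(eps, 1.0) for eps in (1.0, 0.1, 0.01, 0.001)]
```

In this example every point of `path` has absolute value at most \(\Vert x^* \Vert = 1\), and the values increase towards 1 as \(\varepsilon \) decreases, in the spirit of the two properties of \(t \mapsto x_{\varepsilon (t), \lambda (t)}\) discussed in Lemma 2.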
Our nearest goal is to deduce the existence and uniqueness of the solutions of the dynamical system (1). Suppose \(\beta > 0\). Let us integrate (1) from \(t_0\) to t to obtain
Denoting \(z(t):= \int _{t_0}^t \left( \frac{\alpha }{s} \dot{x}(s) + b(s) \nabla \Phi _{\lambda (s)} (x(s)) + \varepsilon (s) x(s) \right) ds - \big ( \dot{x}(t_0) + \beta \nabla \Phi _{\lambda (t_0)} (x_0) \big )\) for every \(t \ge t_0\) and noticing that \(\dot{z}(t) = \frac{\alpha }{t}\dot{x}(t) +b(t) \nabla \Phi _{\lambda (t)} (x(t)) + \varepsilon (t) x(t)\) we deduce that (1) is equivalent to
Let us multiply the first line by the function b and the second one by the constant \(\beta \) and then sum them up to get rid of the gradient of the Moreau envelope in the second equation
We denote now \(y(t) = \beta z(t) + \left( b(t) - \frac{\alpha \beta }{t} \right) x(t)\), and, after simplification, we obtain the following equivalent formulation for the dynamical system
In case \(\beta = 0\) for every \(t \ge t_0\), (1) can be equivalently written as
Based on the two reformulations of the dynamical system (1) we formulate the following existence and uniqueness result, which is a consequence of the Cauchy–Lipschitz theorem for strong global solutions. The result can be proved along the lines of the proofs of Theorem 1 in [16] or of Theorem 1.1 in [19] with some small adjustments.
Theorem 3
Suppose that there exists \(\lambda _0 > 0\) such that \(\lambda (t) \ge \lambda _0\) for all \(t \ge t_0\). Then for every \((x_0, \dot{x}_0) \in H \times H \) there exists a unique strong global solution \(x: [t_0, +\infty ) \mapsto H\) of the continuous dynamics (1) which satisfies the Cauchy initial conditions \(x(t_0) = x_0\) and \(\dot{x}(t_0) = \dot{x}_0\).
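The first order reformulation is also convenient for numerical integration, since it avoids differentiating \(\nabla \Phi _{\lambda (t)}(x(t))\) in time. The following sketch integrates the pair \(\dot{x} = -\beta \nabla \Phi _{\lambda (t)}(x) - z\), \(\dot{z} = \frac{\alpha }{t} \dot{x} + b(t) \nabla \Phi _{\lambda (t)}(x) + \varepsilon (t) x\), which follows from the definition of z above, with a classical fourth order Runge–Kutta scheme. It assumes \(\Phi = |\cdot |\) on the real line; the parameter values are illustrative and have not been checked against the assumptions of the later theorems.

```python
def grad_env_abs(x, lam):
    """Gradient of the Moreau envelope of |.|: x / lam clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x / lam))


def simulate(x0, v0, t0=1.0, T=50.0, dt=0.01, alpha=5.0, beta=1.0,
             b=lambda t: 1.0, lam=lambda t: 1.0, eps=lambda t: 1.0 / t ** 3):
    """Integrate the first order form of (1) with RK4 and return x(T)."""

    def rhs(t, x, z):
        g = grad_env_abs(x, lam(t))
        dx = -beta * g - z                          # x' + beta * g + z = 0
        dz = alpha / t * dx + b(t) * g + eps(t) * x
        return dx, dz

    t, x = t0, x0
    z = -(v0 + beta * grad_env_abs(x0, lam(t0)))    # encodes x'(t0) = v0
    for _ in range(int(round((T - t0) / dt))):
        k1 = rhs(t, x, z)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1[0], z + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, x + dt / 2 * k2[0], z + dt / 2 * k2[1])
        k4 = rhs(t + dt, x + dt * k3[0], z + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        z += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return x


# Start away from the minimizer 0 of |.| with zero initial velocity.
x_T = simulate(5.0, 0.0)
```

The right hand side only involves \(\nabla \Phi _{\lambda (t)}\), which is Lipschitz continuous when \(\lambda \) is bounded away from zero, so a standard explicit integrator is adequate for such a sketch.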
3 Fast Convergence Rates of the Function and Moreau Envelope Values
This section is devoted to obtaining the rates of convergence for the Moreau envelope values and for the values of the function \(\Phi \) itself. We will rely heavily on the tools and techniques provided by Lyapunov analysis. We introduce a slightly modified version of the energy function from [23]. For \(2 \le q \le \alpha - 1\) we define
The key assumptions which are essential to our analysis are the following: for all \(t \ge t_0\)
Theorem 4
Suppose \(\alpha \ge 3\) and assume that (11), (12), (13), (14) hold for all \(t \ge t_0\). Then
Moreover, one has for all \(a \ge 1\)
If, in addition, \(\alpha > 3\) and (15) holds, then the trajectory x is bounded and
and
Proof
Let us compute the time derivative of the energy function. For every \(t \ge t_0\) using (3) we derive
Define \(v(t) = q(x(t) - x^*) + t \left( \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)}(x(t)) \right) \!.\) Using (1) to replace \(\ddot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)} (x(t))\) we obtain
By (14) one has \(b(t) - \frac{\beta }{t} > 0\) for all \(t \ge t_0\), and thus for a strongly convex function \(\varphi _t (x) \ = \ \left( b(t) - \frac{\beta }{t} \right) \Phi _{\lambda (t)} (x) + \frac{\varepsilon (t)}{2} \Vert x \Vert ^2\) we have
or
Therefore, for every \(t \ge t_0\)
Notice that for \(a \ge 1\)
which leads to
for every \(t \ge t_0\). Note that \(b(t) - \frac{1}{a} > 0\) for all \(t \ge t_0\). Then, due to the properties of b, there exists \(t^* \ge t_0\) such that \(t^2 b(t) - \beta (q + 2 - \alpha )t \ \ge \ 0\) for all \(t \ge t^*\) and all \(q \in (2, \alpha - 1]\). Therefore, since \(\dot{\lambda }(t) \ge 0\) for all \(t \ge t_0\), there exists \(t^{**}\), namely, \(t^{**} = \max \left\{ t^*, \frac{\beta }{b(t_0) - \frac{1}{a}} \right\} \), such that
Consider now two cases with \(t \ge t^{**}\). First, take \(q = \alpha - 1\) to obtain from (16)
for every \(t \ge t_0\). Under the assumptions (11) and (12) we conclude starting from \(t^{**}\) that
Under the assumption () using the fact that \(t \mapsto E_{\alpha - 1}(t)\) is bounded from below we deduce the existence of the limit \(\lim _{t \rightarrow +\infty } E_{\alpha - 1}(t)\) due to the Lemma A.1 and, therefore, \(t \mapsto E_{\alpha - 1}(t)\) is bounded, which leads to
From the boundedness of \(t \mapsto \Vert (\alpha - 1)(x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t))) \Vert ^2\) we obtain
using the following inequality, which is true for every \(t \ge t_0\)
Moreover, integrating (17) one may obtain the integrability of \(t \varepsilon (t) \Vert x(t) - x^* \Vert ^2\) as well as the other terms in (17). Consider now \(q = \alpha - 1 - \delta \), where \(\delta \) is defined by (15). Thus, (16) becomes
Under the assumptions (12) and (15) we deduce \(\dot{E}_{\alpha - 1 - \delta }(t) \ \le \ \frac{(\alpha - 1 - \delta ) t \varepsilon (t)}{2} \Vert x^* \Vert ^2\) starting from \(t^{**}\). Repeating the same argument we derive that \(t \mapsto E_{\alpha - 1 - \delta }(t)\) is bounded. The function \(t \mapsto \Vert x(t) - x^* \Vert \) is also bounded and so is the trajectory x. Integrating (19) one may additionally obtain the integrability of \(t \Vert \dot{x}(t) \Vert ^2\). From the integrability of \(\left( (\alpha - 3) t b(t) - t^2 \dot{b}(t) - \beta (\alpha - 2) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \) and (15) we deduce
\(\square \)
The next theorem shows that we can actually improve the rates of convergence of the function values in case \(\alpha > 3\).
Theorem 5
Assume that \(\alpha > 3\) and (12), (13),(14) and (15) hold. Then
In addition, \(\lim _{t \rightarrow +\infty } \psi (t) = 0\), where for \(2 \le q \le \alpha - 1\)
which in particular means
and moreover,
and
Proof
(i) Let us first prove an auxiliary estimate (20), which will allow us to obtain the rest of the desired results. We return to
Under condition (12) we deduce starting from \(t^{**}\)
Integrating the last inequality on \([t_0, t]\) we obtain
Since the gradient \( \nabla \Phi _{\lambda } \) is monotone, we know that \(\left\langle \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \ge 0\). Moreover,
Notice that by (15) we have
or
Obviously,
Introducing \(\delta _1 = \alpha - 1 - \delta > 0\) (by the choice of \(\delta \)) we obtain
or
From Theorem 4 we know that \( t b(t) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \) is integrable and therefore so is \(\left( 2 t b(t) + t^2 \dot{b}(t) - \beta (q + 2 - \alpha ) \right) \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \). Since the function \(t \mapsto E_q(t)\) is bounded and the rest of the right hand side of (22) belongs to \(L^1 \big ( [t_0, +\infty ), \mathbb {R} \big )\) by Theorem 4 and (23), we conclude with (20) due to (14).
(ii) In order to derive the convergence rates for the quantities of our interest we require some additional results. Our nearest goal is to establish the existence of the limits
Consider (as was done in [23, 24]) for two different \(q_1, q_2 \in (2, \alpha - 1)\) and for every \(t \ge t_0\) the difference
As we have established earlier in Theorem 4 the limits of \(E_{q_1}(t) - E_{q_2}(t)\) and \(t \left( \Phi _{\lambda (t)} (x(t)) - \Phi ^* \right) \) exist (the latter is actually zero). Therefore, the limit
Let us introduce for every \(t \ge t_0\) two auxiliary functions
and
Noticing that
we may write for every \(t \ge t_0\)
From the fact that \(\lim _{t \rightarrow +\infty } k(t)\) exists using (20) we obtain that \( \lim _{t \rightarrow +\infty } (\alpha - 1) r(t) + t \dot{r}(t) \) also exists. Applying Lemma A.2 we deduce the existence of the limit \(\lim _{t \rightarrow +\infty } r(t)\). Using (20) again we obtain the existence of the limits \(\lim _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert \) and \(\lim _{t \rightarrow +\infty } t \left\langle \dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t)), x(t) - x^* \right\rangle \).
(iii) Finally, we are in a position to prove (21) and the rest of the convergence rates. The key idea is to show that the limit
exists and is actually zero. Let us return to the definition of our energy functional and rewrite it as
Since the limits
it follows that
exists as well. Denote
and consider
Let us show that the right hand side of (24) is integrable. Indeed, the first term is integrable by Theorem 4. As we have also established in Theorem 4, starting from \(t^{**}\)
where \(a \ge 1\). Then, by (14) and \(\dot{\lambda }(t) \ge 0\) for all \(t \ge t_0\), we deduce that there exists \(t_1 \ge t^{**}\) such that for all \(t \ge t_1\)
or
or
So, by Theorem 4 the right hand side of (24) belongs to \(L^1 \big ( [t_1, +\infty ), \mathbb {R} \big )\). Therefore, \(\frac{\psi (t)}{t}\) also belongs to \(L^1 \big ( [t_1, +\infty ), \mathbb {R} \big )\) and since the limit \(\lim _{t \rightarrow +\infty } \psi (t)\) exists we deduce that it should be actually zero, which gives us (21). To complete the proof notice that by the definition of the proximal mapping, we have
The conclusion follows immediately from (2) and (21). \(\square \)
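The passage from the Moreau envelope to the function values in the last step rests on two standard identities. As a sketch, writing \(\xi (t) = \mathop {\textrm{prox}}\limits _{\lambda (t) \Phi }(x(t))\) and using the definition of the Moreau envelope together with the formula for its gradient, one has for every \(t \ge t_0\):

```latex
\begin{align*}
\Phi_{\lambda(t)}(x(t)) - \Phi^*
  &= \Phi(\xi(t)) - \Phi^* + \frac{1}{2\lambda(t)} \Vert x(t) - \xi(t) \Vert^2
   \;\ge\; \Phi(\xi(t)) - \Phi^* \;\ge\; 0, \\
\Vert x(t) - \xi(t) \Vert
  &= \lambda(t) \, \Vert \nabla \Phi_{\lambda(t)}(x(t)) \Vert .
\end{align*}
```

Hence any rate obtained for \(\Phi _{\lambda (t)}(x(t)) - \Phi ^*\) carries over to \(\Phi (\xi (t)) - \Phi ^*\), and any rate for \(\Vert \nabla \Phi _{\lambda (t)}(x(t)) \Vert \) yields a corresponding rate for \(\Vert x(t) - \xi (t) \Vert \) after multiplication by \(\lambda (t)\).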
4 Strong Convergence of the Trajectories
In this section we will establish the strong convergence of the trajectories to the minimal norm element of \(\mathop {\textrm{argmin}}\limits \Phi \).
In order to do so, we will need to modify assumption (13) from the previous section:
Before moving to the main point of the section, let us prove an auxiliary result first.
Theorem 6
Suppose that \(\alpha > 3\), the function \(\lambda \) is bounded for all \(t \ge t_0\) and (11), (12), (14) and (25) hold. Then
and
Proof
Let us return to (18):
Let us integrate the last inequality on [T, t]
On the other hand, for every \(t \ge t_0\)
Thus,
We deduce due to (25) and the Lemma A.3 that
Therefore,
and clearly
Thus, we establish
By the definition of the proximal mapping
Using the fact that \(\lambda \) is bounded for all \(t \ge t_0\) we deduce
and
\(\square \)
For the remaining part of this section we will use a different energy functional. Inspired by [23] we introduce the following functional, which we will heavily rely on throughout this section
where \(v(t) = q (x(t) - x^*) + t (\dot{x}(t) + \beta \nabla \Phi _{\lambda (t)} (x(t))) \) and \(p, \ q \ge 0\).
The proof of the following theorem draws inspiration from [10, 15, 23].
Theorem 7
Suppose that \(\lambda \) is bounded for all \(t \ge t_0\), \(\alpha > 3\), \(b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0}\) and (11), (12) and (25) are fulfilled. Suppose additionally that for all \(t \ge t_0\)
and moreover that for all \(t \ge t_0\)
and
If \(x: [t_0,+\infty ) \mapsto H\) is a solution to (1) and the trajectory x(t) stays either inside or outside the ball \(B(0, \Vert x^* \Vert )\), then x(t) converges to minimal norm solution \(x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0)\), as \(t \rightarrow +\infty \). Otherwise, \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\).
Proof
As in [23] we will consider several cases with respect to the trajectory x staying either inside or outside the ball \(B \left( 0, \Vert x^* \Vert \right) \).
Case I.
Assume that the trajectory x stays in the complement of the ball B for all \(t \ge t_0\). This means nothing but \(\Vert x(t) \Vert \ge \Vert x^* \Vert \) for every \(t \ge t_0\).
(i) Our nearest goal is to obtain the upper bound for the derivative of \(E_{p, q}\). In order to do so, let us evaluate its time derivative for every \(t \ge t_0\) first.
Consider for every \(t \ge t_0\) the inner product \(\langle \dot{v}(t), v(t) \rangle \):
where above we used (1). Consider now for every \(t \ge t_0\),
The two estimates that we made above lead to (31) becoming
Let us apply the gradient inequality to the strongly convex function \(x \mapsto b(t) \Phi _{\lambda (t)} (x) + \frac{\varepsilon (t) \Vert x \Vert ^2}{2} \):
and thus
for every \(t \ge t_0\). So, noticing that
we deduce
In order to proceed further we will need the following estimates:
and
for every \(t \ge t_0\), some \(c \ge 1\) and \(a \ge 1\). Thus,
Let us fix
First of all, due to this choice
and thus we get rid of the term \(\big \langle \dot{x}(t), x(t) - x^* \big \rangle \). Secondly,
Then
So,
Obviously, for t large enough, say, \(t \ge t_2 \ge t_0\) the following expression is non-positive due to (27) and \(p + 1 = \frac{\alpha }{3} > 0\) and \(\alpha - p - q - 2 = -1\)
Moreover, from (28) it follows that for \(c = 1\)
for all \(t \ge t_0\). Furthermore,
So, under the assumption (12) and the fact that \(\Vert x(t) \Vert \ge \Vert x^* \Vert \) for all \(t \ge t_0\) we deduce due to (33)
Thus, under the assumptions (12), (27), (28) and (29) (the last of which leads to the non-positivity of the coefficient of \(\Vert \nabla \Phi _{\lambda }(x) \Vert ^2\)) we conclude due to (32) that for every \(t \ge t_2\)
(ii) Let us obtain now the lower bound for \(E_{p,q}\). Notice that for \(p = \frac{\alpha - 3}{3}\) and \(q = \frac{2 \alpha }{3}\) we have \(\alpha - p - q = 1\) and
since \(t b(t) - \beta \ge \frac{t}{2}\) for every \(t \ge t_0\) by \(b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0}\) and b being non-decreasing. On the other hand, applying the gradient inequality to the strongly convex function \(\varphi _{\varepsilon (t), \lambda (t)}(x) \ = \ \frac{\Phi _{\lambda (t)}(x)}{2} + \frac{\varepsilon (t)}{2} \Vert x \Vert ^2\) we deduce for \(x_{\varepsilon (t), \lambda (t)} = \mathop {\textrm{argmin}}\limits _{H} \varphi _{\varepsilon (t), \lambda (t)}(x)\)
By the definition of \(\varphi _{\varepsilon (t), \lambda (t)} (x)\) we deduce
We may now add the last two inequalities to obtain
Plugging (36) into (35) we conclude that for every \(t \ge t_2\)
(iii) Finally, using the lower and upper bounds for \(E_{p, q}\) we can prove the strong convergence of the trajectories to a minimal norm solution. Integrating (34) on \([t_2, t]\) we obtain
and using (37) we deduce for every \(t \ge t_2\)
Note that due to (28)
and
Since \(\alpha > 3\) we deduce
and thus
Finally, by (9) and (30) we conclude
Case II.
Assume now the opposite to the first case, namely, \(\Vert x(t) \Vert < \Vert x^* \Vert \) for every \(t \ge t_0\). According to Theorem 6
and
Denote \(\xi (t) = \mathop {\textrm{prox}}\limits \nolimits _{\lambda (t) \Phi }(x(t))\). Considering a sequence \( \{ t_k \}_{k \in {\mathbb {N}}} \) such that \( \{x(t_k)\}_{k \in {\mathbb {N}}} \) converges weakly to an element \( \hat{x} \in H \) as \( k \rightarrow \infty \), we notice that \(\{\xi (t_k)\}_{k \in {\mathbb {N}}} \) converges weakly to \({\hat{x}}\) as \( k \rightarrow \infty \). Now, the function \( \Phi \) being convex and lower semicontinuous in the weak topology, allows us to write
and hence, \( {\hat{x}} \in \mathop {\textrm{argmin}}\limits \Phi \). The norm is weakly lower semicontinuous, so
which means that \({\hat{x}} = x^*\) by the uniqueness of the element of the minimum norm in \(\mathop {\textrm{argmin}}\limits \Phi _{\lambda }\). Therefore, the trajectory x converges weakly to \(x^*\) and
and thus
From this and the weak convergence of the trajectory x follows the strong one: \(\lim _{t \rightarrow +\infty } x(t) = x^*\).
Case III.
Assume that for \(t \ge t_0\) the trajectory x finds itself both inside and outside the ball \(B(0, \Vert x^* \Vert )\). Since x is continuous, there exists a sequence \(\{ t_n \}_{n \in \mathbb {N}} \subseteq [t_0, +\infty )\) such that \(t_n \rightarrow \infty \) as \(n \rightarrow \infty \) and \(\Vert x(t_n) \Vert = \Vert x^* \Vert \) for every \(n \in \mathbb {N}\). Consider again a weak sequential cluster point \({\hat{x}}\) of the sequence \(\{ x(t_n) \}_{n \in \mathbb {N}}\). By repeating the same argument as in the previous case we deduce the weak convergence of \(\{ x(t_n) \}_{n \in \mathbb {N}}\) to \(x^*\), as \(n \rightarrow \infty \). Since \(\Vert x(t_n) \Vert \rightarrow \Vert x^* \Vert \), as \(n \rightarrow \infty \), we obtain that \(\Vert x(t_n) - x^* \Vert \rightarrow 0\), as \(n \rightarrow \infty \), which means \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\). \(\square \)
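The selection of the minimal norm solution can be observed numerically when the set of minimizers is not a singleton. The sketch below assumes \(\Phi (x) = \textrm{dist}(x, [1, 2])\) on the real line, so that \(x^* = 1\), and integrates the first order form of (1) (\(\dot{x} = -\beta \nabla \Phi _{\lambda }(x) - z\), \(\dot{z} = \frac{\alpha }{t} \dot{x} + b \nabla \Phi _{\lambda }(x) + \varepsilon (t) x\)) with constant \(b\) and \(\lambda \) and the slowly decaying choice \(\varepsilon (t) = 2/t\). All parameter values are illustrative and have not been verified against the assumptions of Theorem 7.

```python
def prox_dist(x, lam, lo=1.0, hi=2.0):
    """Proximal operator of lam * dist(., [lo, hi]) on the real line."""
    proj = min(max(x, lo), hi)
    d = abs(x - proj)
    if d <= lam:
        return proj
    return x + lam * (proj - x) / d


def grad_env(x, lam):
    """Gradient of the Moreau envelope: (x - prox_{lam Phi}(x)) / lam."""
    return (x - prox_dist(x, lam)) / lam


def simulate(x0, v0, eps, t0=1.0, T=200.0, dt=0.01,
             alpha=4.0, beta=1.0, b=1.0, lam=1.0):
    """RK4 integration of the first order form of (1); returns x(T)."""

    def rhs(t, x, z):
        g = grad_env(x, lam)
        dx = -beta * g - z
        dz = alpha / t * dx + b * g + eps(t) * x
        return dx, dz

    t, x = t0, x0
    z = -(v0 + beta * grad_env(x0, lam))   # encodes the initial velocity v0
    for _ in range(int(round((T - t0) / dt))):
        k1 = rhs(t, x, z)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1[0], z + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, x + dt / 2 * k2[0], z + dt / 2 * k2[1])
        k4 = rhs(t + dt, x + dt * k3[0], z + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        z += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return x


# Start to the right of argmin Phi = [1, 2]; the Tikhonov term pulls the
# trajectory through the interval towards its element of minimal norm.
x_T = simulate(3.0, 0.0, eps=lambda t: 2.0 / t)
```

In this example the trajectory is expected to end close to \(x^* = 1\), slightly below it, because at a finite time \(T\) it follows the Tikhonov point \(x_{\varepsilon (T), \lambda }\) rather than \(x^*\) itself.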
Remark 1
In this section the condition \(\dot{b}(t) \ge 0\) for all \(t \ge t_0\) is not necessary. Our conjecture is that we can weaken the setting by omitting this condition and thus widen the range for b, including the functions that decay not faster than \(\frac{1}{t^2}\) for the polynomial choice of parameters.
Remark 2
There is no setting which guarantees both fast rates for the values and strong convergence of the trajectories. One of the future goals would be to develop a new approach (based on [6]), which would help us deduce these two results simultaneously.
4.1 Strong Convergence of the Trajectories in Case \(\alpha = 3\)
Throughout this section we no longer require that b is non-decreasing. In this case the analogue of Theorem 6 looks as follows.
Theorem 8
Suppose that for all \(t \ge t_0\) the function \(\lambda \) is bounded, \(b(t) \equiv b > 0\) is a constant function and (12) and (14) hold. Suppose additionally that (25) holds for constant b, namely
Then
and
Proof
In this case the energy functional becomes
Relation (16) thus becomes for all \(t \ge t_0\)
Thus, repeating the same arguments as in Theorem 4 we obtain
Let us multiply this expression with \(t (b t - \beta )\) to obtain
Now, we will divide by \((b t - \beta )^2\) to conclude
or
Integrating the last inequality on [T, t], where \(T \ge t_0\), we deduce
By the definition of \(E_2\) we know
Combining these two inequalities, we deduce
Now,
Applying Lemma A.3 we deduce due to (25)
and thus
Therefore, we establish
Again, by the definition of the proximal mapping
Using the fact that \(\lambda \) is bounded for all \(t \ge t_0\) we deduce
and
\(\square \)
We are now in a position to formulate the analogue of Theorem 7.
Theorem 9
Suppose that \(\lambda \) is bounded for all \(t \ge t_0\), \(b(t) \equiv b \ge \frac{1}{2} + \frac{\beta }{t_0}\) and (12) and (25) hold. Assume, in addition, that
and
If \(x: [t_0,+\infty ) \mapsto H\) is a solution to (1) and the trajectory x(t) stays either inside or outside the ball \(B(0, \Vert x^* \Vert )\), then x(t) converges to minimal norm solution \(x^* = \mathop {\textrm{proj}}\limits _{\mathop {\textrm{argmin}}\limits \Phi }(0)\), as \(t \rightarrow +\infty \). Otherwise, \(\liminf _{t \rightarrow +\infty } \Vert x(t) - x^* \Vert = 0\).
Proof
The proof goes in line with the one of Theorem 7 by taking \(\alpha = 3\), \(b(t) \equiv b > 0\), \(q = 2\), \(p = 0\) and referring to Theorem 8 instead of Theorem 6 in the second and third cases. \(\square \)
5 Analysis of the Conditions
Since all the conditions cannot be satisfied simultaneously, let us treat them separately, namely:
1. In order to obtain the fast convergence rates of the function values we require that for all \(t \ge t_0\):
   - \(\alpha \ > \ 3\);
   - the existence of \(a \ge 1\) such that \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t) \);
   - \( b(t_0) \ \ge \ \frac{\beta }{t_0} \text { and } b(t_0) > \frac{1}{a} \);
   - \( \int _{t_0}^{+\infty } t \varepsilon (t) dt \ < \ +\infty \) and
   - the existence of \(0< \delta < \alpha - 3\) such that \( (\alpha - 3) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \ \ge \ \delta t b(t) \).
2. For the strong convergence of the trajectories we require the following for all \(t \ge t_0\):
   - \(\alpha \ > \ 3\);
   - \(\lambda \) is bounded;
   - \( \frac{\alpha - 3}{3} b(t) - t \dot{b}(t) + \frac{\alpha \beta }{3} \ \ge \ 0 \);
   - \( (\alpha - 3) t b(t) - t^2 \dot{b}(t) + \beta (2 - \alpha ) \ \ge \ 0 \);
   - the existence of \(a \ge 1\) such that \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t) \), \( b(t_0) > \frac{1}{a} \text { and } b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0} \);
   - \( \int _{t_0}^{+\infty } \frac{\varepsilon (t)}{t b(t)} dt \ < \ +\infty \);
   - \( 2 \alpha (\alpha - 3) - 9 t^2 \varepsilon (t) + 6 \alpha \beta \ \le \ 0 \);
   - \( 18 \beta t + 9 \beta \dot{\lambda }(t) - 9 t b(t) \left( \dot{\lambda }(t) + 2 \beta \right) + 3 (\alpha + 3) \beta ^2 + \alpha ^2 \beta \ \le \ 0 \);
   - \( \lim _{t \rightarrow +\infty } \frac{\beta }{t^{\frac{\alpha }{3} + 1} \varepsilon (t)} \int _{t_0}^t s^{\frac{\alpha }{3} + 1} \varepsilon ^2(s) ds \ = \ 0 \).
We will analyse these conditions in detail for the polynomial choice of the functions b and \(\varepsilon \), namely, \(b(t) = b t^n\) and \(\varepsilon (t) = \frac{\varepsilon }{t^d}\), where b is positive, \(n \ge 0\) and \(\varepsilon , d > 0\).
5.1 Setting for the Fast Convergence Rates of the Function Values
The set of the conditions becomes for all \(t \ge t_0\)
1. \(\alpha \ > \ 3\);
2. there exists \(a \ge 1\) such that \( -\frac{2 d \varepsilon }{t^{d+1}} \ \le \ - \frac{a \beta \varepsilon ^2}{t^{2d}} \);
3. \( b(t_0) \ \ge \ \frac{\beta }{t_0} \text { and } b(t_0) > \frac{1}{a} \);
4. \( \int _{t_0}^{+\infty } \frac{\varepsilon }{t^{d-1}} dt \ < \ +\infty \) and
5. there exists \(0< \delta < \alpha - 3\) such that \( (\alpha - 3) b t^{n+1} - b n t^{n+1} + \beta (2 - \alpha ) \ \ge \ \delta b t^{n+1} \).
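For a concrete choice of the parameters, the five conditions above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `check_fast_rate_conditions` is hypothetical, and the closed-form tests rest on our own elementary reading of the inequalities (conditions 2 and 5 are monotone in t for the polynomial choice, so it suffices to examine them at \(t_0\); condition 4 is the convergence of \(\int t^{1-d} dt\)). It assumes \(\beta , b, \varepsilon , t_0 > 0\) and \(n \ge 0\).

```python
def check_fast_rate_conditions(alpha, beta, b, n, eps, d, t0):
    """Check conditions 1-5 for b(t) = b * t**n and eps(t) = eps / t**d.

    Illustrative helper (not from the paper); assumes beta, b, eps, t0 > 0, n >= 0.
    Returns a dict mapping the condition number to True/False.
    """
    # Condition 1.
    c1 = alpha > 3
    # Condition 2 is equivalent to a <= 2*d*t**(d-1) / (beta*eps) for all t >= t0.
    # For d >= 1 the right-hand side is smallest at t = t0; for d < 1 it tends to 0,
    # so no a >= 1 can work.
    a_max = 2 * d * t0 ** (d - 1) / (beta * eps) if d >= 1 else 0.0
    c2 = a_max >= 1
    # Condition 3, tested with the most favourable admissible value a = a_max.
    b0 = b * t0 ** n
    c3 = c2 and b0 >= beta / t0 and b0 > 1 / a_max
    # Condition 4: the integral of eps * t**(1-d) over [t0, +inf) is finite iff d > 2.
    c4 = d > 2
    # Condition 5 reads (alpha-3-n-delta) * b * t**(n+1) >= beta*(alpha-2).
    # For alpha > 3, some delta > 0 works for all t >= t0 iff the inequality
    # with delta = 0 is strict at t = t0.
    c5 = (alpha - 3 - n) * b * t0 ** (n + 1) > beta * (alpha - 2)
    return {1: c1, 2: c2, 3: c3, 4: c4, 5: c5}


if __name__ == "__main__":
    # beta = 1 is an assumption made only for this illustration.
    print(check_fast_rate_conditions(alpha=10, beta=1, b=1, n=0, eps=1, d=3, t0=1.4))
```

A failed entry in the returned dictionary indicates which parameter (typically d, n or \(t_0\)) has to be adjusted.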
After some simple algebraic computations one finds that, in order to satisfy all the conditions simultaneously, it is enough to assume
and
since all the other inequalities can be fulfilled by taking an appropriate \(t_0\), namely,
5.2 Setting for the Strong Convergence of the Trajectories
The set of the conditions becomes for all \(t \ge t_0\)
1. \(\alpha \ > \ 3\);
2. \(\lambda \) is bounded;
3. \( \frac{\alpha - 3}{3} b t^{n+1} - b n t^{n+1} + \frac{\alpha \beta }{3} \ \ge \ 0 \);
4. \( (\alpha - 3) b t^{n+1} - b n t^{n+1} + \beta (2 - \alpha ) \ \ge \ 0 \);
5. there exists \(a \ge 1\) such that \( -\frac{2 d \varepsilon }{t^{d+1}} \ \le \ - \frac{a \beta \varepsilon ^2}{t^{2d}} \), \( b(t_0) > \frac{1}{a} \text { and } b(t_0) \ge \frac{1}{2} + \frac{\beta }{t_0} \);
6. \( \int _{t_0}^{+\infty } \frac{\varepsilon }{b t^{n+d+1}} dt \ < \ +\infty \);
7. \( 2 \alpha (\alpha - 3) - \frac{9 \varepsilon }{t^{d-2}} + 6 \alpha \beta \ \le \ 0 \);
8. \( 18 \beta t + 9 \beta \dot{\lambda }(t) - 9 b t^{n+1} \left( \dot{\lambda }(t) + 2 \beta \right) + 3(\alpha + 3) \beta ^2 + \alpha ^2 \beta \ \le \ 0 \);
9. \( \lim _{t \rightarrow +\infty } \frac{\beta }{\varepsilon t^{\frac{\alpha }{3} - d + 1}} \int _{t_0}^t \varepsilon ^2 s^{\frac{\alpha }{3} - 2d + 1} ds \ = \ 0 \).
Again, analysis of the set of conditions leads to the following conclusion:
- \(\lambda \) is bounded (condition 2);
- \(0 \ \le \ n \ \le \ \frac{\alpha - 3}{3}\) and \( \alpha \ > \ 3\) (condition 3);
- \(\max \left\{ 1, \frac{\beta \varepsilon }{2} \right\} \ \le \ d \ \le \ 2\) (conditions 5, 7, 8, 9).
As before, \(t_0\) should be chosen appropriately.
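To make the origin of the bound on n more explicit, here is a short sketch of the computation behind conditions 3, 4 and 6 for the polynomial choice. It is our own reading of the inequalities listed in this subsection, assuming \(b, \beta > 0\) and \(\alpha > 3\). Condition 3 can be rewritten as

\[ \left( \frac{\alpha - 3}{3} - n \right) b t^{n+1} + \frac{\alpha \beta }{3} \ \ge \ 0 . \]

Since \(\frac{\alpha \beta }{3} > 0\) and \(t^{n+1} \rightarrow +\infty \), this holds for all \(t \ge t_0\) and every admissible \(t_0\) precisely when \(n \le \frac{\alpha - 3}{3}\). Under this bound we have \(\alpha - 3 - n \ge \frac{2 (\alpha - 3)}{3} > 0\), so condition 4, which reads

\[ (\alpha - 3 - n) b t^{n+1} \ \ge \ \beta (\alpha - 2), \]

is satisfied for all \(t \ge t_0\) as soon as \(t_0^{n+1} \ge \frac{\beta (\alpha - 2)}{(\alpha - 3 - n) b}\). Finally, condition 6 holds for every \(n \ge 0\) and \(d > 0\), because

\[ \int _{t_0}^{+\infty } \frac{\varepsilon }{b t^{n+d+1}} dt \ = \ \frac{\varepsilon }{b (n + d) t_0^{n+d}} \ < \ +\infty . \]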
5.3 The Case \(\alpha = 3\)
In this case the following has to be assumed: there exists \(a \ge 1\) such that for all \(t \ge t_0\)
1. \( \lambda (t) \) is bounded;
2. \( 2 \dot{\varepsilon }(t) \ \le \ - a \beta \varepsilon ^2(t), \ b \ > \ \frac{1}{a}\) and \(b \ge \frac{1}{2} + \frac{\beta }{t_0} \);
3. \( \int _{t_0}^{+\infty } \frac{\varepsilon (t)}{t} dt \ < \ +\infty \);
4. \( \lim _{t \rightarrow +\infty } t^2 \varepsilon (t) \ = \ +\infty \);
5. \( 2 \beta t + \beta \dot{\lambda }(t) - b t \left( \dot{\lambda }(t) + 2 \beta \right) + 2 \beta ^2 + \beta \ \le \ 0 \);
6. \( \lim _{t \rightarrow +\infty } \frac{\beta }{t^2 \varepsilon (t)} \int _{t_0}^t s^2 \varepsilon ^2(s) ds \ = \ 0 \).
Essentially, for the polynomial choice of parameters that means \(b \ \ge \ 1\) and
- \( \lambda (t) \) is bounded (condition 1);
- \( \max \left\{ 1, \frac{\beta \varepsilon }{2} \right\} \ \le \ d \ < \ 2 \) (conditions 4, 5, 6),
so with the appropriate choice of \(t_0\) the whole set of conditions is fulfilled.
6 Numerical Examples
6.1 The Rates of Convergence of the Moreau Envelope Values
Consider the objective function \(\Phi : \mathbb {R} \rightarrow \mathbb {R}\), \(\Phi (x) = |x| + \frac{x^2}{2}\) and let us plot the values of its Moreau envelope as well as the gradient of its Moreau envelope for different polynomial functions \(\lambda \), \(\varepsilon \) and b to illustrate the theoretical results with some numerical examples. We take \(\lambda (t) = t^l\), \(\varepsilon (t) = \frac{1}{t^d}\), \(b(t) = t^n\) with \(x(t_0) = x_0 = 10\), \(\dot{x}(t_0) = 0\), \(\alpha = 10\) and \(t_0 = 1.4\).
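For this particular objective the proximal mapping, the Moreau envelope and its gradient are available in closed form, which is convenient when reproducing such plots. The sketch below is our own illustration (the function names are hypothetical and not from the paper). It uses the optimality condition \(x \in \lambda \partial |u| + (1 + \lambda ) u\) for \(u = \mathop {\textrm{prox}}\limits _{\lambda \Phi }(x)\), which gives soft-thresholding by \(\lambda \) followed by scaling with \(\frac{1}{1 + \lambda }\), together with the standard identities \(\Phi _\lambda (x) = \Phi (u) + \frac{1}{2 \lambda } (x - u)^2\) and \(\nabla \Phi _\lambda (x) = \frac{1}{\lambda } (x - u)\).

```python
import math


def phi(x):
    """Objective Phi(x) = |x| + x^2 / 2 on the real line."""
    return abs(x) + 0.5 * x * x


def prox_phi(x, lam):
    """Proximal point of lam * Phi at x: soft-threshold by lam, then divide by 1 + lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x) / (1.0 + lam)


def moreau_env(x, lam):
    """Moreau envelope Phi_lam(x) = Phi(p) + (x - p)^2 / (2 lam), with p = prox_phi(x, lam)."""
    p = prox_phi(x, lam)
    return phi(p) + (x - p) ** 2 / (2.0 * lam)


def grad_moreau_env(x, lam):
    """Gradient of the Moreau envelope: (x - prox_phi(x, lam)) / lam."""
    return (x - prox_phi(x, lam)) / lam


if __name__ == "__main__":
    x0, lam = 10.0, 1.0  # starting point of the experiment and lambda(t) = 1 (l = 0)
    print(prox_phi(x0, lam), moreau_env(x0, lam), grad_moreau_env(x0, lam))
```

For \(|x| \le \lambda \) the proximal point is 0, so the envelope reduces to the quadratic \(\frac{x^2}{2 \lambda }\) there.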
First, let us vary the time scaling parameter b with \(l = 0\) and \(d = 3\) and see how it affects the behaviour of the system (1) (see Fig. 1).
As expected, the faster b grows, the faster the convergence is.
Consider now different Moreau envelope parameters \(\lambda \) with \(d = 3\) and \(n = 0\) (see Fig. 2).
Note that the difference in the starting point comes from the fact that \(t_0 \ne 1\), so that the value \(t_0^l\) differs for different exponents l. As predicted by the theory, a faster growing function \(\lambda \) leads to faster convergence not only of the gradient of the Moreau envelope of the objective function \(\Phi \), but also of the values of the Moreau envelope themselves.
Varying the Tikhonov function \(\varepsilon \) for \(n = 0\) and \(l = 0\) has no visible effect on the behaviour of the system, as illustrated by the following plot (see Fig. 3).
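Experiments of this kind can be reproduced with a few lines of code. The sketch below is an illustration under explicit assumptions rather than the authors' implementation. We assume that (1) has the form \(\ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \beta \frac{d}{dt} \nabla \Phi _{\lambda (t)}(x(t)) + b(t) \nabla \Phi _{\lambda (t)}(x(t)) + \varepsilon (t) x(t) = 0\), we restrict ourselves to a constant \(\lambda \) (the case \(l = 0\)), and we take \(\beta = 1\), a value chosen here only for illustration. With the auxiliary variable \(y = \dot{x} + \beta \nabla \Phi _\lambda (x)\) the equation becomes a first order system, which is integrated with the classical Runge-Kutta scheme of order four. The function names, the step size and the final time are hypothetical choices.

```python
import math


def grad_env(x, lam):
    """Gradient of the Moreau envelope of Phi(x) = |x| + x^2/2 (index lam > 0)."""
    prox = math.copysign(max(abs(x) - lam, 0.0), x) / (1.0 + lam)
    return (x - prox) / lam


def simulate(n=0.0, d=3.0, alpha=10.0, beta=1.0, lam=1.0,
             x0=10.0, v0=0.0, t0=1.4, t_end=20.0, h=1e-3):
    """Integrate the ASSUMED form of (1) with b(t) = t**n, eps(t) = t**(-d), constant lam.

    With y = x' + beta * g(x), g = grad_env(., lam), the second order equation reads
        x' = y - beta * g(x),
        y' = -(alpha / t) * x' - t**n * g(x) - x / t**d.
    Returns the time grid and the trajectory values as two lists.
    """
    def rhs(t, x, y):
        g = grad_env(x, lam)
        dx = y - beta * g
        dy = -(alpha / t) * dx - t ** n * g - x / t ** d
        return dx, dy

    # The initial velocity v0 is encoded in the auxiliary variable y.
    t, x, y = t0, x0, v0 + beta * grad_env(x0, lam)
    ts, xs = [t], [x]
    for _ in range(int(round((t_end - t0) / h))):
        k1 = rhs(t, x, y)
        k2 = rhs(t + h / 2, x + h / 2 * k1[0], y + h / 2 * k1[1])
        k3 = rhs(t + h / 2, x + h / 2 * k2[0], y + h / 2 * k2[1])
        k4 = rhs(t + h, x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
        ts.append(t)
        xs.append(x)
    return ts, xs


if __name__ == "__main__":
    for n in (0.0, 1.0):
        ts, xs = simulate(n=n)
        print(f"n = {n}: x({ts[-1]:.1f}) = {xs[-1]:.3e}")
```

The first order reformulation avoids differentiating \(\nabla \Phi _\lambda \), which is only Lipschitz continuous. For a time-dependent \(\lambda \) the reformulation would have to be adapted, since \(\frac{d}{dt} \nabla \Phi _{\lambda (t)}(x(t))\) then also involves \(\dot{\lambda }\).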
6.2 Strong Convergence of the Trajectories
For a different objective function let us investigate the strong convergence of the trajectories of (1):
The set \(\mathop {\textrm{argmin}}\limits \Phi \) is nothing but the segment \([-1, 1]\) and 0 is its element of minimal norm. Let us fix \(\alpha = 6\) and \(n = 0.7\). First we take a constant \(\lambda \) (\(\lambda (t) = 1\) for all \(t \ge t_0\)) and plot the behaviour of the trajectories of (1) with and without the Tikhonov term (see Fig. 4).
As we see, when there is no Tikhonov regularization the trajectories converge to the minimizer 1 of \(\Phi \), whereas the Tikhonov term guarantees convergence towards the minimal norm solution, which is 0.
Another comparison was made for a non-constant \(\lambda \), namely \(\lambda (t) = 1 - \frac{1}{t^l}\) with \(l = 1\) (for other values of l the picture is the same), which shows similar behaviour (see Fig. 5).
Finally, for the same choice of \(\lambda \) let us take different Tikhonov terms to figure out how changing them affects the trajectories of (1) (see Fig. 6).
We see that the faster \(\varepsilon \) decays, the slower the trajectories converge.
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. J. de Mathématiques Pures et Appliquées 81(8), 747–779 (2002)
Attouch, H.: Viscosity solutions of minimization problems. SIAM J. Optim. 6(3), 769–806 (1996)
Attouch, H., Abbas, B., Svaiter, B.F.: Newton-like dynamics and forward–backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161(2), 331–360 (2014)
Attouch, H., Balhag, A., Chbani, Z., Riahi, H.: Damped inertial dynamics with vanishing Tikhonov regularization: strong asymptotic convergence towards the minimum norm solution. J. Differ. Equ. 311, 29–58 (2022)
Attouch, H., Balhag, A., Chbani, Z., Riahi, H.: Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling. Evol. Equ. Control Theory 11(2), 487–514 (2022)
Attouch, H., Balhag, A., Chbani, Z., Riahi, H.: Accelerated gradient methods combining Tikhonov regularization with geometric damping driven by the Hessian. Appl. Math. Optim. 88(29), (2023)
Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differ. Equ. 264, 7138–7182 (2018)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Combining fast inertial dynamics for convex optimization with Tikhonov regularization. J. Math. Anal. Appl. 457(2), 1065–1094 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Fast proximal methods via time scaling of damped inertial dynamics. SIAM J. Optim. 29(3), 2227–2256 (2019)
Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via time scaling of damped inertial gradient dynamics. Pure Appl. Funct. Anal. 6(6), 1081–1117 (2021)
Attouch, H., Chbani, Z., Riahi, H.: Accelerated gradient methods with strong convergence to the minimum norm minimizer: a dynamic approach combining time scaling, averaging, and Tikhonov regularization (2022). arXiv:2211.10140v1
Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: Convergence of iterates for first-order optimization algorithms with inertia and Hessian driven damping. J. Math. Program. Oper. Res. 72(5), (2023)
Attouch, H., Cominetti, R.: A dynamical approach to convex minimization coupling approximation with the steepest descent method. J. Differ. Equ. 128(2), 519–540 (1996)
Attouch, H., Czarnecki, M.-O.: Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria. J. Differ. Equ. 179, 278–310 (2002)
Attouch, H., László, S.C.: Continuous Newton-like inertial dynamics for monotone inclusions. Set-valued Variat. Anal. 29, 555–581 (2021)
Attouch, H., László, S.C.: Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution (2021). arXiv:2104.11987
Attouch, H., Peypouquet, J.: Convergence of the inertial dynamics and proximal algorithms governed by maximally monotone operators. Math. Program. 174, 391–432 (2019)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Attouch, H., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer (2016)
Boţ, R.I., Csetnek, E.R.: Second order forward-backward dynamical systems for monotone inclusion problems. SIAM J. Control. Optim. 54(3), 1423–1443 (2016)
Boţ, R.I., Csetnek, E.R., László, S.C.: Tikhonov regularization of a second order dynamical system with Hessian driven damping. Math. Program. 189, 151–186 (2021)
Boţ, R.I., Karapetyants, M.A.: A fast continuous time approach with time scaling for nonsmooth convex optimization. Adv. Contin. Discrete Models: Theory Appl. 73 (2022)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation and insights. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Cabot, A., Engler, H., Gadat, S.: Second order differential equations with asymptotically small dissipation and piecewise flat potentials. Electron. J. Differ. Equ. 17, 33–38 (2009)
László, S.C.: On the strong convergence of the trajectories of a Tikhonov regularized second order dynamical system with asymptotically vanishing damping. J. Differ. Equ. 362, 355–381 (2023)
May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
Sell, G.R.: Dynamics of Evolutionary Equations. Springer, New York (2002)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)
Acknowledgements
The authors are grateful to two anonymous reviewers for their remarks on this manuscript and for meaningful suggestions, which improved the quality of this paper.
Funding
Open access funding provided by University of Vienna.
Ethics declarations
Competing interests
The authors declare that they have no competing interests subject to the topic of this article.
Additional information
Communicated by Russell Luke.
Ernö Robert Csetnek: This work was supported by a Grant of the Ministry of Research (Romania), Innovation and Digitization, CNCS-UEFISCDI, project number PN-III-P1-1.1-TE-2021-0138, within PNCDI III. Mikhail A. Karapetyants: Research supported by the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by FWF (Austrian Science Fund), project W 1260.
Appendix A
Let us state here some auxiliary lemmas which we used in our analysis. For the proof of the following lemma we refer to [3].
Lemma A.1
Suppose that \(f: [t_0, +\infty ) \rightarrow \mathbb {R}\) is locally absolutely continuous and bounded from below and there exists \(g \in L^1([t_0, +\infty ), \mathbb {R})\) such that for almost all \(t \ge t_0\)
Then there exists \(\lim _{t \rightarrow +\infty } f(t) \in \mathbb {R}\).
For the proof of the next lemma we refer to [19].
Lemma A.2
Let H be a real Hilbert space and \(x: [t_0, +\infty ) \mapsto H\) be a continuously differentiable function satisfying \( x(t) + \frac{t}{\alpha } \dot{x}(t) \rightarrow L \) as \( t \rightarrow +\infty \), with \( \alpha > 0 \) and \( L \in H \). Then \( x(t) \rightarrow L \) as \( t \rightarrow +\infty \).
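For the reader's convenience, here is a brief outline of one standard way to see this. It is our own sketch and is not reproduced from the cited reference. Setting \(w(t) = x(t) + \frac{t}{\alpha } \dot{x}(t)\), we have

\[ \frac{d}{dt} \left( t^{\alpha } x(t) \right) \ = \ \alpha t^{\alpha - 1} w(t), \]

and integrating on \([t_0, t]\), together with \(\alpha \int _{t_0}^t s^{\alpha - 1} ds = t^{\alpha } - t_0^{\alpha }\), gives

\[ x(t) - L \ = \ \left( \frac{t_0}{t} \right) ^{\alpha } \left( x(t_0) - L \right) + \frac{\alpha }{t^{\alpha }} \int _{t_0}^t s^{\alpha - 1} \left( w(s) - L \right) ds . \]

Given \(\eta > 0\), choose \(T \ge t_0\) such that \(\Vert w(s) - L \Vert \le \eta \) for all \(s \ge T\). Splitting the integral at T, the part over \([t_0, T]\) is a constant divided by \(t^{\alpha }\), while the part over \([T, t]\) is bounded in norm by \(\eta \left( 1 - \left( \frac{T}{t} \right) ^{\alpha } \right) \le \eta \). Hence \(\limsup _{t \rightarrow +\infty } \Vert x(t) - L \Vert \le \eta \), and since \(\eta > 0\) was arbitrary, \(x(t) \rightarrow L\).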
For the proof of the final lemma we refer to [9].
Lemma A.3
Let \(\delta > 0\) and \(f \in L^1 \left( (\delta , +\infty ), \mathbb {R} \right) \) be a non-negative and continuous function. Let \(g: [\delta , +\infty ) \rightarrow [0, +\infty )\) be a non-decreasing function such that \(\lim _{t \rightarrow +\infty } g(t) \ = \ +\infty \). Then it holds
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Csetnek, E.R., Karapetyants, M.A. Second Order Dynamics Featuring Tikhonov Regularization and Time Scaling. J Optim Theory Appl 202, 1385–1420 (2024). https://doi.org/10.1007/s10957-024-02500-8
Keywords
- Nonsmooth convex optimization
- Damped inertial dynamics
- Hessian-driven damping
- Time scaling
- Moreau envelope
- Proximal operator
- Tikhonov regularization