Abstract
In a Hilbert space setting \({{\mathcal {H}}}\), we study the fast convergence properties as \(t \rightarrow + \infty \) of the trajectories of the second-order differential equation
where \(\nabla \Phi \) is the gradient of a convex continuously differentiable function \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\), \(\alpha \) is a positive parameter, and \(g: [t_0, + \infty [ \rightarrow {{\mathcal {H}}}\) is a small perturbation term. In this inertial system, the viscous damping coefficient \(\frac{\alpha }{t}\) vanishes asymptotically, but not too rapidly. For \(\alpha \ge 3\) and \(\int _{t_0}^{+\infty } t \Vert g(t)\Vert dt < + \infty \), assuming only that \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), we show that any trajectory of the above system satisfies the fast convergence property
Moreover, for \(\alpha > 3\), any trajectory converges weakly to a minimizer of \(\Phi \). The strong convergence is established in various practical situations. These results complement the \({{\mathcal {O}}}(t^{-2})\) rate of convergence for the values obtained by Su, Boyd and Candès in the unperturbed case \(g=0\). Time discretization of this system, and some of its variants, provides new fast converging algorithms, expanding the field of rapid methods for structured convex minimization introduced by Nesterov, and further developed by Beck and Teboulle with FISTA. This study also complements recent advances due to Chambolle and Dossal.
1 Introduction
Let \({{\mathcal {H}}}\) be a real Hilbert space endowed with the scalar product \(\langle \cdot ,\cdot \rangle \) and norm \(\Vert \cdot \Vert \), and let \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) be a convex differentiable function. In this paper, we first study the solution trajectories of the second-order differential equation
with \(\alpha >0\), in terms of their asymptotic behavior as \(t \rightarrow + \infty \). This will serve as a guideline for the study of the corresponding algorithms.
We take for granted the existence and uniqueness of a global solution to the Cauchy problem associated with (1). Although this is not our main concern, we point out that, given \(t_0>0\), for any \(x_0 \in {{\mathcal {H}}}, \ v_0 \in {{\mathcal {H}}} \), the existence of a unique global solution on \([t_0,+\infty [\) for the Cauchy problem with initial condition \(x(t_0)=x_0\) and \(\dot{x}(t_0)=v_0\) can be guaranteed, for instance, if \(\nabla \Phi \) is Lipschitz-continuous on bounded sets, and \(\Phi \) is bounded from below.
Throughout the paper, unless otherwise indicated, we simply assume that
As we shall see, most of the convergence properties of the trajectories are valid under this general assumption. This approach paves the way for the extension of our results to non-smooth convex potential functions (replacing the gradient by the subdifferential).
In preparation for a stability study of this system and the associated algorithms, we will also consider the following perturbed version of (1):
where the right-hand side \(g:[t_0, + \infty [ \rightarrow {{\mathcal {H}}}\) is an integrable source term, such that g(t) is small for large t.
The importance of the evolution system (1) is threefold:
1. Mechanical interpretation: It describes the motion of a particle, with mass equal to one, subject to a potential energy function \(\Phi \), and an isotropic linear damping with a viscosity parameter that vanishes asymptotically. This provides a simple model for a progressive reduction of the friction, possibly due to material fatigue.
2. Fast minimization of function \(\Phi {:}\) Equation (1) is a particular case of the inertial gradient-like system
with asymptotic vanishing damping, studied by Cabot et al. [17, 18]. As shown in [17, Corollary 3.1] (under some additional conditions on \(\Phi \)), every solution \(x(\cdot )\) of (3) satisfies \(\lim _{t\rightarrow +\infty }\Phi (x(t))=\min \Phi \), provided \(\int _0^{\infty }a(t)dt = + \infty \). The specific case (1) was studied by Su, Boyd and Candès in [35] in terms of the rate of convergence of the values. More precisely, [35, Theorem 4.1] establishes that \(\Phi (x(t))-\min \Phi =\mathcal O(t^{-2})\), whenever \(\alpha \ge 3\). Unfortunately, their analysis does not entail the convergence of the trajectory itself.
3. Relationship with fast numerical optimization methods: As pointed out in [35, Section 2], for \(\alpha =3\), (1) can be seen as a continuous version of the fast convergent method of Nesterov (see [26–29]), and its widely used successors, such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), studied in [14]. These methods have a convergence rate of \(\Phi (x_k)-\min \Phi ={{\mathcal {O}}}(k^{-2})\), where k is the number of iterations. As for the continuous-time system (1), convergence of the sequences generated by FISTA and related methods has not been established so far. This is a central and long-standing question in the study of numerical optimization methods.
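As a minimal illustration of this connection (our own sketch, not the precise algorithm studied in Sect. 5), the following Python fragment runs an inertial gradient iteration with extrapolation coefficient \((k-1)/(k+\alpha -1)\), which reduces to the classical Nesterov/FISTA choice for \(\alpha =3\), on a least-squares test problem of our own choosing:

```python
import numpy as np

def accelerated_gradient(grad, x0, step, alpha=3.0, iters=2000):
    """Inertial gradient iteration with extrapolation factor (k-1)/(k+alpha-1)."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)  # inertial extrapolation
        x_prev, x = x, y - step * grad(y)                 # gradient step at y
    return x

# Illustrative test problem: Phi(x) = 0.5 ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
L = np.linalg.norm(A.T @ A, 2)              # Lipschitz constant of grad Phi
grad = lambda x: A.T @ (A @ x - b)
Phi = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

x = accelerated_gradient(grad, np.zeros(10), 1.0 / L, alpha=3.0)
print(Phi(x) - Phi(x_star))  # small: the values decay like O(1/k^2)
```

For \(\alpha >3\), the same iteration is the setting in which one can also expect convergence of the iterates themselves, mirroring the continuous-time results established below.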
The purpose of this research is to establish the convergence of the trajectories satisfying (1), as well as the sequences generated by the corresponding numerical methods with Nesterov-type acceleration. We also complete the study with several stability properties concerning both the continuous-time system and the algorithms.
The main contributions of this work are the following:
In Sect. 2, we first establish the minimizing property in the general case where \(\alpha >0\), and \(\inf \Phi \) is not necessarily attained. As a consequence, every weak limit point of the trajectory must be a minimizer of \(\Phi \), and so, the existence of a bounded trajectory characterizes the existence of minimizers. Assuming \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \) and \(\alpha \ge 3\), we recover the \({{\mathcal {O}}}(t^{-2})\) convergence rates, and give several examples and counterexamples concerning the optimality of these results. Next, we show that every solution of (1) converges weakly to a minimizer of \(\Phi \) provided \(\alpha >3\) and \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \). For the limiting case \(\alpha =3\), which corresponds exactly to Nesterov’s method, the convergence of the trajectories is still a puzzling open question. We finish this section by providing an ergodic convergence result for the acceleration of the system in case \(\nabla \Phi \) is Lipschitz-continuous on sublevel sets of \(\Phi \).
In Sect. 3, strong convergence is established in various practical situations enjoying further geometric features, such as strong convexity, symmetry, or nonemptiness of the interior of the solution set. In the strongly convex case, we obtain a surprising result: convergence of the values occurs at a \(\mathcal O(t^{-\frac{2\alpha }{3}})\) rate.
In Sect. 4, we analyze the asymptotic behavior, as \(t\rightarrow +\infty \), of the solutions of the perturbed differential system (2), and obtain similar convergence results under integrability assumptions on the perturbation term g.
Section 5 contains the analogous results for the associated Nesterov-type algorithms (also for \(\alpha >3\)). To avoid repeating similar arguments, we state the results and develop the proofs directly for the perturbed version. As a guideline, we follow the proof of the convergence of the continuous dynamic. We provide discrete versions of the differential inequalities that we used in the Lyapunov convergence analysis. The convergence results are parallel to those for the continuous case, under summability conditions on the errors.
As we were preparing the final version of this manuscript, we discovered the preprint [19] by Chambolle and Dossal, where the weak convergence result of the algorithm in the unperturbed case is obtained by a similar, but different argument (see [19, Theorem 3]). This approach has been further developed by Aujol–Dossal [10] in the perturbed case.
2 Minimizing property, convergence rates and weak convergence of the trajectories
We begin this section by providing some preliminary estimations concerning the global energy of the system (1) and the distance to the minimizers of \(\Phi \). These allow us to show the minimizing property of the trajectories under minimal assumptions. Next, we recover the convergence rates for the values originally given in [35], and obtain further decay estimates that ultimately imply the convergence of the solutions of (1). We finish the study by proving an ergodic convergence result for the acceleration. Several examples and counterexamples are given throughout the section.
2.1 Preliminary remarks and estimations
The existence of global solutions to (1) has been examined, for instance, in [17, Proposition 2.2] in the case of a general asymptotic vanishing damping coefficient. In our setting, for any \(t_0 >0, \alpha >0\), and \((x_0,v_0)\in \mathcal H\times {{\mathcal {H}}}\), there exists a unique global solution \(x : [t_0, +\infty [ \rightarrow {{\mathcal {H}}}\) of (1), satisfying the initial condition \(x(t_0)= x_0, \dot{x}(t_0)= v_0\), under the sole assumption that \(\nabla \Phi \) is Lipschitz-continuous on bounded sets, and \(\inf \Phi > - \infty \). The restriction \(t_0>0\) comes from the singularity of the damping coefficient \(a(t) = \frac{\alpha }{t}\) at zero. Since we are only concerned with the asymptotic behavior of the trajectories, the choice of the origin of time is unimportant. If one insists on starting from \(t_0 =0\), then all the results remain valid with \(a(t)=\frac{\alpha }{t+1}\).
At different points, we shall use the global energy of the system, given by \(W:[t_0,+\infty [\rightarrow {\mathbb {R}}\)
Using (1), we immediately obtain
Lemma 2.1
Let W be defined by (4). For each \(t>t_0\), we have
Hence, W is nonincreasing, and \(W_\infty =\lim _{t\rightarrow +\infty }W(t)\) exists in \({\mathbb {R}}\cup \{-\infty \}\). If \(\Phi \) is bounded from below, \(W_\infty \) is finite.
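On a concrete instance of our own choosing (\(\Phi (x)=\frac{1}{2}x^2\) on \({\mathbb {R}}\), \(\alpha =3\)), this monotonicity can be checked numerically:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy instance: Phi(x) = 0.5 x^2, alpha = 3, so (1) reads
#   x'' + (3/t) x' + x = 0.
# Along any solution, W(t) = 0.5 x'(t)^2 + Phi(x(t)) satisfies
# W'(t) = -(alpha/t) ||x'(t)||^2 <= 0, hence W should be nonincreasing.
alpha, t0 = 3.0, 1.0
rhs = lambda t, y: [y[1], -(alpha / t) * y[1] - y[0]]  # y = (x, x')
sol = solve_ivp(rhs, (t0, 100.0), [1.0, 0.5], rtol=1e-10, atol=1e-12,
                dense_output=True)
ts = np.linspace(t0, 100.0, 400)
xs, vs = sol.sol(ts)
W = 0.5 * vs**2 + 0.5 * xs**2
print(np.all(np.diff(W) <= 1e-8))  # True: W is numerically nonincreasing
```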
Now, given \(z\in {{\mathcal {H}}}\), we define \(h_z:[t_0,+\infty [ \rightarrow {\mathbb {R}}\) by
By the Chain Rule, we have
Using (1), we obtain
The convexity of \(\Phi \) implies
and we deduce that
We have the following relationship between \(h_z\) and W:
Lemma 2.2
Take \(z\in {{\mathcal {H}}}\), and let W and \(h_z\) be defined by (4) and (5), respectively. There is a constant C such that
Proof
Divide (7) by t, and use the definition of W given in (4), to obtain
Integrate this expression from \(t_0\) to \(t>t_0\) (use integration by parts for the first term), to obtain
On the one hand, Lemma 2.1 gives
On the other hand, another integration by parts yields
Combining these inequalities with (8), we get
where C collects the constant terms. \(\square \)
2.2 Minimizing property
It turns out that the trajectories of (1) minimize \(\Phi \) in the completely general setting, where \(\alpha >0\), \({{\mathrm{argmin\,}}}\Phi \) is possibly empty, and \(\Phi \) is not necessarily bounded from below (recall that we assume the existence and uniqueness of a global solution to the Cauchy problem associated with (1), which is not guaranteed by these general assumptions). This property was obtained by Alvarez in [2, Theorem 2.1] for the heavy ball with friction (where the damping is constant). Similar results can be found in [17].
We have the following:
Theorem 2.3
Let \(\alpha >0\), and suppose \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) is a solution of (1). Then
-
(i)
\(W_\infty =\lim _{t \rightarrow +\infty }W(t)=\lim _{t \rightarrow +\infty }\Phi (x(t))=\inf \Phi \in {\mathbb {R}}\cup \{-\infty \}\).
-
(ii)
As \(t\rightarrow +\infty \), every weak limit point of x(t) lies in \({{\mathrm{argmin\,}}}\Phi \).
-
(iii)
If \({{\mathrm{argmin\,}}}\Phi =\emptyset \), then \(\lim _{t\rightarrow +\infty }\Vert x(t)\Vert =+\infty \).
-
(iv)
If x is bounded, then \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \).
-
(v)
If \(\Phi \) is bounded from below, then \(\lim _{t\rightarrow +\infty }\Vert {{\dot{x}}}(t)\Vert =0\).
-
(vi)
If \(\Phi \) is bounded from below and x is bounded, then \(\lim _{t\rightarrow +\infty }{{\dot{h}}}_z(t)=0\) for each \(z\in {{\mathcal {H}}}\). Moreover,
$$\begin{aligned} \int _{t_0}^\infty \frac{1}{t}(\Phi (x(t))-\min \Phi )dt<+\infty . \end{aligned}$$
Proof
To prove (i), first set \(z\in {{\mathcal {H}}}\) and \(\tau \ge t>t_0\). By Lemma 2.1, W is nonincreasing. Hence, Lemma 2.2 gives
which we rewrite as
and then
Integrate from \(t=t_0\) to \(t=\tau \) to obtain
But
Hence,
for suitable constants \(A, B, \widetilde{C}\) and D. This immediately yields \(W_\infty \le \Phi (z)\), and hence \(W_\infty \le \inf \Phi \). It suffices to observe that
to obtain (i).
Next, (ii) follows from (i) by the lower-semicontinuity of \(\Phi \) for the weak topology. Clearly, (iii) and (iv) are immediate consequences of (ii). We obtain (v) by using (i) and the definition of W given in (4). For (vi), since \({{\dot{h}}}_z(t)=\langle x(t) - z, \dot{x}(t) \rangle \) and x is bounded, (v) implies \(\lim \nolimits _{t\rightarrow +\infty }{{\dot{h}}}_z(t)=0\). Finally, using the definition of W together with Lemma 2.2 with \(z\in {{\mathrm{argmin\,}}}\Phi \), we get
which completes the proof. \(\square \)
Remark 2.4
We shall see in Theorem 2.14 that, for \(\alpha \ge 3\), the existence of minimizers implies that every solution of (1) is bounded. This gives a converse to part (iv) of Theorem 2.3.
If \(\Phi \) is not bounded from below, it may be the case that \(\Vert \dot{x}(t)\Vert \) does not tend to zero, as shown in the following example:
Example 2.5
Let \({{\mathcal {H}}}={\mathbb {R}}\) and \(\alpha >0\). The function \(x(t)=t^2\) satisfies (1) with \(\Phi (x)=-2(\alpha +1)x\). Then \(\lim _{t\rightarrow +\infty }\Phi (x(t))=-\infty =\inf \Phi \), and \(\lim _{t\rightarrow +\infty }\Vert {{\dot{x}}}(t)\Vert =+\infty \).
2.3 Two “anchored” energy functions
We begin by introducing two important auxiliary functions, and showing their basic properties. From now on, we assume \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \). Fix \(\lambda \ge 0, \xi \ge 0, p\ge 0\) and \(x^*\in {{\mathrm{argmin\,}}}\Phi \). Let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1). For \(t\ge t_0\) define
and notice that \({{\mathcal {E}}}_{\lambda ,\xi }\) and \({{\mathcal {E}}}_\lambda ^p\) are sums of nonnegative terms. These generalize the energy functions \({{\mathcal {E}}}\) and \(\tilde{{{\mathcal {E}}}}\) introduced in [35]. More precisely, \({{\mathcal {E}}}={{\mathcal {E}}}_{\alpha -1,0}\) and \(\tilde{\mathcal E}={{\mathcal {E}}}_{(2\alpha -3)/3}^1\).
We need some preparatory calculations prior to differentiating \({{\mathcal {E}}}_{\lambda ,\xi }\) and \({{\mathcal {E}}}_\lambda ^p\). For simplicity of notation, we do not make the dependence of x or \({{\dot{x}}}\) on t explicit. Notice that we use (1) in the second line to dispose of \({{\ddot{x}}}\).
Whence, we deduce
Remark 2.6
If \(y\in {{\mathcal {H}}}\) and \(x^*\in {{\mathrm{argmin\,}}}\Phi \), the convexity of \(\Phi \) gives \(\min \Phi =\Phi (x^*)\ge \Phi (y)+\langle \nabla \Phi (y),x^*-y\rangle \). Using this in (9) with \(y=x(t)\), we obtain
If \(\alpha \ge 3\), \(2\le \lambda \le \alpha -1\), and one chooses \(\xi ^*=\lambda (\alpha -\lambda -1)\) (which is nonnegative), then
Hence \({{\mathcal {E}}}_{\lambda ,\xi ^*}\) is nonincreasing. The extreme cases \(\lambda =2\) and \(\lambda =\alpha -1\) are of special importance, as we shall see shortly.
2.4 Rate of convergence for the values
We now recover convergence rate results for the value of \(\Phi \) along a trajectory, already established in [35, Theorem 4.1]:
Theorem 2.7
Let \(x:[t_0,+\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (1), and assume \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \). If \(\alpha \ge 3\), then
If \(\alpha >3\), then
Proof
Suppose \(\alpha \ge 3\). Choose \(\lambda =\alpha -1\) and \(\xi =0\), so that \(\xi -\lambda (\alpha -\lambda -1)=\alpha -\lambda -1=0\) and \(\lambda -2=\alpha -3\). Remark 2.6 gives
and \({{\mathcal {E}}}_{\alpha -1,0}\) is nonincreasing. Since \(t^2(\Phi (x)-\min \Phi )\le {{\mathcal {E}}}_{\alpha -1,0}(t)\) (recall the definition of \({{\mathcal {E}}}_{\lambda ,\xi }\)), we obtain
If \(\alpha >3\), integrating (11) from \(t_0\) to t we obtain
which allows us to conclude. \(\square \)
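The rate can also be observed numerically. The following sketch (a toy instance of ours: \(\Phi (x)=\frac{1}{4}x^4\) on \({\mathbb {R}}\), \(\alpha =3\), \(x(t_0)=1\), \(\dot{x}(t_0)=0\)) integrates (1) and monitors \(t^2(\Phi (x(t))-\min \Phi )\), which should remain bounded:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy instance: Phi(x) = 0.25 x^4 (so grad Phi(x) = x^3), alpha = 3.
# For this data, the Lyapunov function of the proof gives the a priori
# bound t^2 (Phi(x(t)) - min Phi) <= E(t0) = 0.25 + 0.5 * 2^2 = 2.25.
alpha, t0 = 3.0, 1.0
rhs = lambda t, y: [y[1], -(alpha / t) * y[1] - y[0]**3]  # y = (x, x')
sol = solve_ivp(rhs, (t0, 500.0), [1.0, 0.0], rtol=1e-10, atol=1e-12,
                dense_output=True)
ts = np.linspace(t0, 500.0, 1000)
vals = ts**2 * 0.25 * sol.sol(ts)[0]**4   # t^2 (Phi(x(t)) - min Phi)
print(vals.max())  # stays bounded, consistent with the O(1/t^2) rate
```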
Remark 2.8
It would be interesting to know whether \(\alpha =3\) is critical for the convergence rate given above.
Remark 2.9
For the (first-order) steepest descent dynamical system, the typical rate of convergence is \({{\mathcal {O}}}(1/t)\) (see, for instance, [33, Section 3.1]). For the second-order system (1), we have obtained a rate of \({{\mathcal {O}}}(1/t^2)\). It would be interesting to know whether higher-order systems give the corresponding rates of convergence. Another challenging question is the convergence rate of the trajectories defined by differential equations involving fractional time derivatives, as well as integro-differential equations.
Remark 2.10
The constant in the order of convergence given by Theorem 2.7 is
where \(x_0=x(t_0)\) and \(v_0={{\dot{x}}}(t_0)\). This quantity is minimized when \(x_0^*\in {{\mathrm{argmin\,}}}\Phi \) and \(v_0^*=\frac{(\alpha -1)}{t_0}(x^*-x_0^*)\), with \(\min K=0\). If \(x_0^*\ne x^*\), the trajectory will not be stationary, but the value \(\Phi (x(t))\) will be constantly equal to \(\min \Phi \). Of course, selecting \(x_0^*\in {{\mathrm{argmin\,}}}\Phi \) is not realistic, and the point \(x^*\) is unknown. Keeping \(\hat{x}_0\) fixed, the function \(v_0\mapsto K(\hat{x}_0,v_0)\) is minimized at \(\hat{v}_0=\frac{(\alpha -1)}{t_0}(x^*-\hat{x}_0)\). This suggests taking the initial velocity as a multiple of an approximation of \(x^*-\hat{x}_0\), such as the negative gradient direction \(\hat{v}_0=-\nabla \Phi (\hat{x}_0)\), the Newton or Levenberg-Marquardt direction \(\hat{v}_0=-[\varepsilon I+\nabla ^2\Phi (\hat{x}_0)]^{-1}\nabla \Phi (\hat{x}_0)\) (\(\varepsilon \ge 0\)), or the proximal point direction \(\hat{v}_0=\left[ (I+\gamma \nabla \Phi )^{-1}(\hat{x}_0)-\hat{x}_0\right] \) (\(\gamma >0\), large).
2.5 Some examples and counterexamples
A convergence rate of \({{\mathcal {O}}}(1/t^2)\) may be attained, even if \({{\mathrm{argmin\,}}}\Phi = \emptyset \) and \(\alpha <3\). This is illustrated in the following example:
Example 2.11
Let \({{\mathcal {H}}} = {{\mathbb {R}}}\) and take \(\Phi (x) = \frac{\alpha -1}{2}e^{-2x}\) with \(\alpha \ge 1\). Let us verify that \(x(t)= \ln t\) is a solution of (1). On the one hand,
On the other hand, \(\nabla \Phi (x)= -(\alpha -1)e^{-2x}\) which gives
Thus, \(x(t)= \ln t\) is a solution of (1). Let us examine the minimizing property. We have \(\inf \Phi = 0\), and \(\Phi (x(t)) = \frac{\alpha -1}{2t^{2}}\), so the values converge at rate \({{\mathcal {O}}}(1/t^2)\), even though the infimum of \(\Phi \) is not attained.
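The computations of Example 2.11 can be confirmed symbolically, for instance with sympy:

```python
import sympy as sp

# Check of Example 2.11: with Phi(u) = ((alpha-1)/2) e^{-2u},
# x(t) = ln t solves x'' + (alpha/t) x' + Phi'(x) = 0, and
# Phi(x(t)) = (alpha - 1)/(2 t^2), an O(1/t^2) decay of the values.
t, u = sp.symbols('t u', positive=True)
alpha = sp.symbols('alpha', positive=True)
Phi = (alpha - 1) / 2 * sp.exp(-2 * u)
x = sp.log(t)
residual = sp.diff(x, t, 2) + (alpha / t) * sp.diff(x, t) + Phi.diff(u).subs(u, x)
print(sp.simplify(residual))                                   # 0
print(sp.simplify(Phi.subs(u, x) - (alpha - 1) / (2 * t**2)))  # 0
```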
Therefore, one may wonder whether the rapid convergence of the values is true in general. The following example shows that this is not the case:
Example 2.12
Let \({{\mathcal {H}}} = {{\mathbb {R}}}\) and take \(\Phi (x) = \frac{c}{x^{\theta }}\), with \(\theta >0, \alpha \ge \frac{\theta }{(2+\theta )}\) and \(c= \frac{2( 2\alpha + \theta (\alpha -1))}{\theta ( 2+ \theta )^2}\). Let us verify that \(x(t)= t^{\frac{2}{2 + \theta }}\) is a solution of (1). On the one hand,
On the other hand, \(\nabla \Phi (x)= -c \theta x^{-\theta -1} \) which gives
Thus, \(x(t)= t^{\frac{2}{2 + \theta }}\) is a solution of (1). Let us examine the minimizing property. We have \(\inf \Phi = 0\), and \(\Phi (x(t)) = c\, t^{-\frac{2\theta }{2+\theta }}\).
We conclude that the order of convergence may be strictly slower than \({{\mathcal {O}}}(1/t^2)\) when \({{\mathrm{argmin\,}}}\Phi =\emptyset \). In Example 2.12, this occurs no matter how large \(\alpha \) is. The speed of convergence of \(\Phi (x(t))\) to \(\inf \Phi \) depends on the behavior of \(\Phi (x)\) as \(\Vert x\Vert \rightarrow +\infty \). The above examples suggest that, when \(\Phi (x)\) decreases rapidly toward its infimal value as \(\Vert x\Vert \rightarrow \infty \), we can expect fast convergence of \(\Phi (x(t))\).
Even when \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), the rate \({{\mathcal {O}}} (1/t^2)\) is essentially the worst possible one; it is attained as a limit in the following example:
Example 2.13
Take \({{\mathcal {H}}} = {{\mathbb {R}}}\) and \(\Phi (x) = c|x|^{\gamma }\), where c and \(\gamma \) are positive parameters. Let us look for nonnegative solutions of (1) of the form \(x(t)= \frac{1}{t^{\theta }}\), with \(\theta >0\). This means that the trajectory is not oscillating; it is completely damped. We begin by determining the values of \(c, \gamma \) and \(\theta \) that provide such solutions. On the one hand,
On the other hand, \(\nabla \Phi (x)= c \gamma |x|^{\gamma -2}x \), which gives
Thus, \(x(t)= \frac{1}{t^{\theta }}\) is a solution of (1) if, and only if,
-
(i)
\(\theta + 2 = \theta (\gamma -1)\), which is equivalent to \(\gamma >2\) and \(\theta = \frac{2}{\gamma -2}\); and
-
(ii)
\(c \gamma = \theta (\alpha -\theta -1)\), which is equivalent to \(\alpha > \frac{\gamma }{\gamma -2}\) and \(c= \frac{2}{\gamma (\gamma -2)}( \alpha - \frac{\gamma }{\gamma -2})\).
We have \(\min \Phi = 0\) and \(\Phi (x(t)) = c\, t^{-\frac{2 \gamma }{\gamma -2}}\).
The speed of convergence of \(\Phi (x(t))\) to 0 depends on the parameter \(\gamma \). As \(\gamma \) tends to infinity, the exponent \(\frac{2 \gamma }{\gamma -2 }\) tends to 2. This limiting situation is obtained by taking a function \(\Phi \) that becomes very flat around the set of its minimizers. Therefore, without other geometric assumptions on \(\Phi \), we cannot expect a convergence rate better than \({{\mathcal {O}}} (1/t^2)\). By contrast, in Sect. 3, we will show better rates of convergence under some geometrical assumptions, like strong convexity of \(\Phi \).
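The particular instance \(\gamma =4\) of Example 2.13, for which \(\theta =1\) and \(c=\frac{\alpha -2}{4}\) (assuming \(\alpha >2\)), can be confirmed symbolically:

```python
import sympy as sp

# Example 2.13 with gamma = 4: theta = 2/(gamma - 2) = 1,
# c = (2/(gamma(gamma-2))) (alpha - gamma/(gamma-2)) = (alpha - 2)/4,
# Phi(u) = c u^4 (for u > 0), and x(t) = 1/t.
t, u = sp.symbols('t u', positive=True)
alpha = sp.symbols('alpha', positive=True)
c = (alpha - 2) / 4
Phi = c * u**4
x = 1 / t
residual = sp.diff(x, t, 2) + (alpha / t) * sp.diff(x, t) + Phi.diff(u).subs(u, x)
print(sp.simplify(residual))  # 0
# The values then decay like Phi(x(t)) = c / t^4 = c / t^{2 gamma/(gamma-2)}.
```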
2.6 Weak convergence of the trajectories
In this subsection, we show the convergence of the solutions of (1), provided \(\alpha >3\). We begin by establishing some preliminary estimations that cannot be derived from the analysis carried out in [35]. The first statement improves part (v) of Theorem 2.3, while the second one is the key to proving the convergence of the trajectories of (1):
Theorem 2.14
Let \(x:[t_0,+\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (1) with \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \).
-
(i)
If \(\alpha \ge 3\) and x is bounded, then \(\Vert {{\dot{x}}}(t)\Vert ={{\mathcal {O}}}(1/t)\). More precisely,
$$\begin{aligned} \Vert {{\dot{x}}}(t)\Vert \le \frac{1}{t} \left( \sqrt{2\mathcal E_{\alpha -1,0}(t_0)} + (\alpha -1)\sup _{t\ge t_0}\Vert x(t)-x^*\Vert \right) . \end{aligned}$$(12)
-
(ii)
If \(\alpha >3\), then x is bounded and
$$\begin{aligned} \int _{t_0}^{+\infty }t\Vert {{\dot{x}}}(t)\Vert ^2\,dt\le \frac{\mathcal E_{2,2(\alpha -3)}(t_0)}{\alpha -3}<+\infty . \end{aligned}$$(13)
Proof
To prove (i), assume \(\alpha \ge 3\) and x is bounded. From the definition of \({{\mathcal {E}}}_{\lambda ,\xi }\), we have \(\frac{1}{2}\Vert \lambda (x-x^*)+t{{\dot{x}}}\Vert ^2\le {{\mathcal {E}}}_{\lambda ,\xi }(t)\), and so \(\Vert t{{\dot{x}}}\Vert \le \sqrt{2{{\mathcal {E}}}_{\lambda ,\xi }(t)}+\lambda \Vert x-x^*\Vert \). By Remark 2.6, \({{\mathcal {E}}}_{\alpha -1,0}\) is nonincreasing, and we immediately obtain (12).
In order to show (ii), suppose now that \(\alpha >3\). Choose \(\lambda =2\) and \(\xi ^*=2(\alpha -3)\). By Remark 2.6, we have
and \({{\mathcal {E}}}_{\lambda ,\xi ^*}\) is nonincreasing. From the definition of \({{\mathcal {E}}}_{\lambda ,\xi }\), we deduce that \(\Vert x(t)-x^*\Vert ^2 \le \frac{2}{\xi } {{\mathcal {E}}}_{\lambda ,\xi }(t)\), which gives
and establishes the boundedness of x. Integrating (14) from \(t_0\) to t, and recalling that \({{\mathcal {E}}}_{\lambda ,\xi ^*}\) is nonnegative, we obtain
as required. \(\square \)
Remark 2.15
In view of (12) and (15), when \(\alpha >3\), we obtain the following explicit bound for \(\Vert {{\dot{x}}}\Vert \), namely
Since \(\lim _{t\rightarrow +\infty }\Vert {{\dot{x}}}(t)\Vert =0\) by Theorem 2.3, we also have \(\lim _{t\rightarrow +\infty }t\,\Vert \dot{x}(t)\Vert ^2=0\).
We are now in a position to prove the weak convergence of the trajectories of (1), which is the main result of this section. The proof relies on a Lyapunov analysis, which was first used by Alvarez [2] in the context of the heavy ball with friction.
Theorem 2.16
Let \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) be a continuously differentiable convex function such that \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), and let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1) with \(\alpha >3\). Then x(t) converges weakly, as \(t\rightarrow +\infty \), to a point in \({{\mathrm{argmin\,}}}\Phi \).
Proof
We shall use Opial’s Lemma 5.7. To this end, let \(x^*\in {{\mathrm{argmin\,}}}\Phi \) and recall from (7) that
where \(h_z\) is given by (5). This yields
In view of Theorem 2.14, part (ii), the right-hand side is integrable on \([t_0,+\infty [\). Lemma 5.9 then implies \(\lim _{t\rightarrow +\infty }h_{x^*}(t)\) exists. This gives the first hypothesis in Opial’s Lemma. The second one was established in part (ii) of Theorem 2.3. \(\square \)
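In finite dimensions, weak and strong convergence coincide, so the theorem can be illustrated numerically on a toy problem of our own with a continuum of minimizers: \(\Phi (x_1,x_2)=\frac{1}{2}x_1^2\) on \({\mathbb {R}}^2\), whose minimizers form the \(x_2\)-axis, with \(\alpha =4\):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Phi(x1, x2) = 0.5 x1^2, grad Phi = (x1, 0), argmin Phi = the x2-axis.
alpha, t0 = 4.0, 1.0

def rhs(t, y):
    x1, x2, v1, v2 = y
    return [v1, v2, -(alpha / t) * v1 - x1, -(alpha / t) * v2]

sol = solve_ivp(rhs, (t0, 400.0), [1.0, 1.0, 0.0, 1.0], rtol=1e-10,
                atol=1e-12, dense_output=True)
x_end = sol.sol(400.0)[:2]
x_mid = sol.sol(200.0)[:2]
# The trajectory settles at a point of the x2-axis.  For this instance the
# limit is computable by hand: v2(t) = t^{-alpha}, so x2(t) -> 1 + 1/3 = 4/3.
print(x_end)
```

Note that the limit \((0,4/3)\) is a minimizer, but not the projection of the initial point onto \({{\mathrm{argmin\,}}}\Phi \); in general the limit point depends on the whole initial data.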
Remark 2.17
A puzzling question concerns the convergence of the trajectories for \(\alpha =3\), a question which is directly related to the convergence of the sequences generated by Nesterov’s method [26] (see [35, Section 3]).
2.7 Further stabilization results
Let us complement the study of Eq. (1) by examining the asymptotic behavior of the acceleration \(\ddot{x}\). To this end, we shall use an additional regularity assumption on the gradient of \(\Phi \).
Proposition 2.18
Let \(\alpha >3\) and let \(x:[t_0,+\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (1) with \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \). Assume that \(\nabla \Phi \) is Lipschitz-continuous on bounded sets. Then \({{\ddot{x}}}\) is bounded, globally Lipschitz-continuous on \([t_0,+\infty [\), and satisfies
Proof
First recall that x and \({{\dot{x}}}\) are bounded, by virtue of Theorems 2.14 and 2.3, respectively. By (1), we have
Since \(\nabla \Phi \) is Lipschitz-continuous on bounded sets, it follows from (16), and the boundedness of x and \({{\dot{x}}}\), that \(\ddot{x}\) is bounded on \([t_0, +\infty [\). As a consequence, \(\dot{x}\) is Lipschitz-continuous on \([t_0, +\infty [\). Returning to (16), we deduce that \(\ddot{x}\) is Lipschitz-continuous on \([t_0, +\infty [\).
Pick \(x^*\in {{\mathrm{argmin\,}}}\Phi \), set \(h=h_{x^*}\) (to simplify the notation) and use (6) to obtain
Let L be a Lipschitz constant for \(\nabla \Phi \) on some ball containing the minimizer \(x^*\) and the trajectory x. By virtue of the Baillon–Haddad Theorem (see, for instance, [12], [32, Theorem 3.13] or [27, Theorem 2.1.5]), \(\nabla \Phi \) is \(\frac{1}{L}\)-cocoercive on that ball, which means that
Substituting this inequality in (17), and using the fact that \(\nabla \Phi (x^{*}) =0\), we obtain
In view of (16), this gives
Developing the square on the left-hand side, and neglecting the nonnegative term \((\alpha \Vert {{\dot{x}}}(t)\Vert /t)^2/L\), we obtain
We multiply this inequality by \(t^\alpha \) to obtain
Integration from \(t_0\) to t yields
Neglecting the nonnegative term \(\alpha t^{\alpha -1}\Vert \dot{x}(t)\Vert ^2/L\), we obtain
where \(C=t_0^\alpha {{\dot{h}}}(t_0)+\alpha t_0^{\alpha -1}\Vert \dot{x}(t_0)\Vert ^2/L\).
If \(t_0<1\), we have
for all \(t\ge 1\). Since the first term on the right-hand side tends to 0 as \(t\rightarrow +\infty \), we may assume, without loss of generality, that \(t_0\ge 1\).
Observe now that \(s^{\alpha -2} \le s^{\alpha }\), whenever \(s\ge 1\). Whence, inequality (18) simplifies to
Dividing by \(t^\alpha \) and integrating again, we obtain
Setting \(C'=h(t_0)+C t_0^{-\alpha +1}/(\alpha -1)\), and neglecting the nonnegative term h(t) of the left-hand side and the nonpositive term \(-C t^{-\alpha +1}/(\alpha -1)\) of the right-hand side, we get
Set \(\displaystyle g(\tau )= \tau ^{-\alpha }\left( \int _{t_0}^\tau s^\alpha \Vert {{\ddot{x}}}(s)\Vert ^2 ds\right) \) and use Fubini’s Theorem on the second integral to get
By part (ii) of Theorem 2.14, the integral on the right-hand side is finite. We have
The derivative of g is
Let \(C''\) be an upper bound for \(\Vert {{\ddot{x}}}\Vert ^2\). We have
From (19) and (20) we deduce that \(\lim _{\tau \rightarrow +\infty }g(\tau )=0\) by virtue of Lemma 5.6. \(\square \)
Remark 2.19
Since \(\int _{t_0}^ts^\alpha ds= \frac{1}{\alpha +1}\left( t^{\alpha +1}-t_0^{\alpha +1}\right) \), Proposition 2.18 expresses a fast ergodic convergence of \(\Vert {{\ddot{x}}}(s)\Vert ^2\) to 0 with respect to the weight \(s^\alpha \) as \(t\rightarrow +\infty \), namely
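This ergodic decay can be observed numerically; the sketch below (a toy instance of ours: \(\Phi (x)=\frac{1}{2}x^2\), \(\alpha =4\)) recovers \(\ddot{x}\) from (1) and computes the weighted means:

```python
import numpy as np
from scipy.integrate import solve_ivp, cumulative_trapezoid

# Toy instance: Phi(x) = 0.5 x^2, alpha = 4.  Along a solution of (1),
# x''(t) = -(alpha/t) x'(t) - x(t), and we monitor the weighted means
#   M(t) = (int_{t0}^t s^alpha ||x''(s)||^2 ds) / (int_{t0}^t s^alpha ds),
# which should tend to 0 as t -> +infinity.
alpha, t0, T = 4.0, 1.0, 400.0
rhs = lambda t, y: [y[1], -(alpha / t) * y[1] - y[0]]
sol = solve_ivp(rhs, (t0, T), [1.0, 0.0], rtol=1e-10, atol=1e-12,
                dense_output=True)
ts = np.linspace(t0, T, 20000)
xs, vs = sol.sol(ts)
acc2 = (-(alpha / ts) * vs - xs) ** 2                    # ||x''(t)||^2
num = cumulative_trapezoid(ts**alpha * acc2, ts, initial=0.0)
den = (ts**(alpha + 1) - t0**(alpha + 1)) / (alpha + 1)  # int of s^alpha
M = num[1:] / den[1:]                                    # weighted mean
print(M[-1] < M[len(M) // 4])  # True: the weighted mean decays toward 0
```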
3 Strong convergence results
A counterexample due to Baillon [11] shows that the trajectories of the steepest descent dynamical system may converge weakly but not strongly. Nevertheless, under some additional geometrical or topological assumptions on \(\Phi \), the steepest descent trajectories do converge strongly. This has been proved in the case where the function \(\Phi \) is either even or strongly convex (see [16]), or when \(\mathrm{int}({{\mathrm{argmin\,}}}\Phi ) \ne \emptyset \) (see [15, Theorem 3.13]). Some of these results have been extended to inertial dynamics, see [2] for the heavy ball with friction, and [4] for an inertial version of Newton’s method. This suggests that convexity alone may not be sufficient for the trajectories of (1) to converge strongly, but one can reasonably expect it to be the case under some additional conditions. The purpose of this section is to establish this fact. The different types of hypotheses will be studied in independent subsections since different techniques are required.
3.1 Set of minimizers with nonempty interior
Let us begin by studying the case where \(\mathrm{int}({{\mathrm{argmin\,}}}\Phi )\ne \emptyset \).
Theorem 3.1
Let \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) be a continuously differentiable convex function. Let \(\mathrm{int}({{\mathrm{argmin\,}}}\Phi )\ne \emptyset \), and let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1) with \(\alpha >3\). Then x(t) converges strongly, as \(t\rightarrow +\infty \), to a point in \({{\mathrm{argmin\,}}}\Phi \). Moreover,
Proof
Since \(\mathrm{int}({{\mathrm{argmin\,}}}\Phi )\ne \emptyset \), there exist \(x^*\in {{\mathrm{argmin\,}}}\Phi \) and some \(\rho >0\) such that \(\nabla \Phi (z)=0\) for all \(z\in {{\mathcal {H}}}\) such that \(\Vert z-x^*\Vert <\rho \). By the monotonicity of \(\nabla \Phi \), for all \(y\in {{\mathcal {H}}}\), we have
Hence,
Taking the supremum with respect to \(z \in {{\mathcal {H}}}\) such that \(\Vert z-x^*\Vert < \rho \), we infer that
for all \( y \in {{\mathcal {H}}} \). In particular,
By using this inequality in (9) with \(\lambda =\alpha -1\) and \(\xi =0\), we obtain
whence we derive, by integrating from \(t_0\) to t
Since \({{\mathcal {E}}}_{\alpha -1,0}(t)\) is nonnegative, part (ii) of Theorem 2.7 gives
Finally, rewrite (1) as
Since the right-hand side is integrable, we conclude by applying Lemma 5.12 and Theorem 2.16. \(\square \)
3.2 Even functions
Let us recall that \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) is even if \(\Phi (-x)= \Phi (x) \) for every \(x\in {{\mathcal {H}}}\). In this case, the set \({{\mathrm{argmin\,}}}\Phi \) is nonempty, and contains the origin: by convexity and evenness, \(\Phi (0)\le \frac{1}{2}\left( \Phi (x)+\Phi (-x)\right) =\Phi (x)\) for every \(x\in {{\mathcal {H}}}\).
Theorem 3.2
Let \(\Phi :{{\mathcal {H}}}\rightarrow {{\mathbb {R}}}\) be a continuously differentiable convex even function, and let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1) with \(\alpha >3\). Then x(t) converges strongly, as \(t\rightarrow +\infty \), to a point in \({{\mathrm{argmin\,}}}\Phi \).
Proof
For \(t_0 \le \tau \le s\), set
We have
Combining these two equalities and using (1), we obtain
Recall that the energy \(W(\tau )= \frac{1}{2} \Vert \dot{x}(\tau ) \Vert ^2 + \Phi (x(\tau ))\) is nonincreasing. Therefore,
by convexity. After simplification, we obtain
Combining (22) and (23), we obtain
As in the proof of Lemma 5.9, we have
where \(C=2\Vert {{\dot{x}}}(t_0)\Vert \,\Vert x\Vert _{\infty }\). The function k does not depend on s. Moreover, using Fubini’s Theorem, we deduce that
by part (ii) of Theorem 2.14. Integrating \(\dot{q}(\tau ) \le k(\tau )\) from t to s, we obtain
Since \(\Phi \) is even, we have \(0 \in {{\mathrm{argmin\,}}}\Phi \). Hence \(\lim _{t\rightarrow +\infty }\Vert x(t) \Vert ^2\) exists (see the proof of Theorem 2.16). As a consequence, x(t) has the Cauchy property as \(t \rightarrow + \infty \), and hence converges. \(\square \)
3.3 Uniformly convex functions
Following [13], a function \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) is uniformly convex on bounded sets if, for each \(r>0\), there is an increasing function \(\omega _r:[0,+\infty [\rightarrow [0,+\infty [\) vanishing only at 0, and such that
for all \(x, y \in {{\mathcal {H}}}\) such that \(\Vert x \Vert \le r\) and \(\Vert y \Vert \le r\). Uniformly convex functions are strictly convex, so their set of minimizers contains at most one point.
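For orientation, the uniform convexity inequality takes, in one standard form (our transcription; the displayed inequality (24) may differ in normalization, but this form is consistent with the substitution \(y=x(t)\), \(x=x^*\) made in the proof of Theorem 3.3 below):

```latex
\Phi(y) \;\ge\; \Phi(x) + \langle \nabla \Phi(x),\, y-x \rangle + \omega_r\bigl( \Vert x-y \Vert \bigr),
\qquad \Vert x\Vert \le r,\ \Vert y\Vert \le r.
```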
Theorem 3.3
Let \(\Phi \) be uniformly convex on bounded sets with \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), and let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1) with \(\alpha >3\). Then x(t) converges strongly, as \(t\rightarrow +\infty \), to the unique \(x^*\in {{\mathrm{argmin\,}}}\Phi \).
Proof
Recall that the trajectory \(x(\cdot )\) is bounded, by part (ii) in Theorem 2.14. Let \(r >0\) be such that x is contained in the ball of radius r centered at the origin. This ball also contains \(x^*\), which is the weak limit of the trajectory in view of the weak lower-semicontinuity of the norm and Theorem 2.16. Writing \(y=x(t)\) and \(x=x^*\) in (24), we obtain
The right-hand side tends to 0 as \(t\rightarrow +\infty \) by virtue of Theorem 2.3. It follows that x(t) converges strongly to \(x^*\) as \(t\rightarrow +\infty \). \(\square \)
Let us recall that a function \(\Phi : {{\mathcal {H}}} \rightarrow \mathbb R\) is strongly convex if there exists a positive constant \(\mu \) such that
for all \(x, y \in {{\mathcal {H}}}\). Clearly, strongly convex functions are uniformly convex on bounded sets. Interestingly, for strongly convex functions, the convergence rate improves indefinitely as \(\alpha \) grows.
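For reference, the strong convexity inequality can be written (a standard form for continuously differentiable \(\Phi \); our transcription) as:

```latex
\Phi(y) \;\ge\; \Phi(x) + \langle \nabla \Phi(x),\, y-x \rangle + \frac{\mu}{2}\, \Vert x-y \Vert^2
\qquad \text{for all } x, y \in \mathcal{H}.
```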
Theorem 3.4
Let \(\Phi :{{\mathcal {H}}}\rightarrow {\mathbb {R}}\) be strongly convex, and let \(x:[t_0,+\infty [\rightarrow {{\mathcal {H}}}\) be a solution of (1) with \(\alpha >3\). Then x(t) converges strongly, as \(t\rightarrow +\infty \), to the unique element \(x^*\in {{\mathrm{argmin\,}}}\Phi \). Moreover
Proof
Strong convergence follows from Theorem 3.3 because strongly convex functions are uniformly convex on bounded sets. From (10) and the strong convexity of \(\Phi \), we deduce that
for any \(\lambda \ge 0\) and any \(p\ge 0\). Now fix \(p=\frac{2}{3}(\alpha -3)\) and \(\lambda =\frac{2}{3}\alpha \), so that \(p+2-\lambda =\alpha -\lambda -1-p/2=0\) and \(\alpha -\lambda -1-p=-p/2\). The above inequality becomes
Define \(t_1=\max \left\{ t_0,\sqrt{\frac{p\lambda }{\mu }}\right\} \), so that
for all \(t\ge t_1\). Integrate this inequality from \(t_1\) to t (use integration by parts on the right-hand side) to get
Hence,
in view of the strong convexity of \(\Phi \). By the definition of \({{\mathcal {E}}}_\lambda ^p\), we have
Dividing by \(t^{p+2}\) and using the definition of \(t_1\), along with the fact that \(t\ge t_1\), we obtain
Recalling that \(p=\frac{2}{3}(\alpha -3)\) and \(\lambda =\frac{2}{3}\alpha \), we deduce that
The strong convexity of \(\Phi \) then gives
Inequalities (27) and (28) settle the first two points in (25).
Now, using (26) and (27), we derive
The definition of \({{\mathcal {E}}}_\lambda ^p\) then gives
Hence
and
But using (28), we deduce that
The last two inequalities together give
Taking squares, and rearranging the terms, we obtain
which shows the last point in (25) and completes the proof. \(\square \)
The preceding theorem extends [35, Theorem 4.2], which states that if \(\alpha >9/2\), then \(\Phi (x(t))-\min \Phi =\mathcal O(1/t^{3})\).
4 Asymptotic behavior of the trajectory under perturbations
In this section, we analyze the asymptotic behavior, as \(t\rightarrow +\infty \), of the solutions of the differential equation
From the Cauchy–Lipschitz–Picard Theorem (see, for instance, [7, Theorem 17.1.2b)]), if \(\nabla \Phi \) is Lipschitz continuous on bounded sets and g is locally integrable, then for any initial condition \(x(t_0)=x_0 \in {{\mathcal {H}}}\), \(\dot{x}(t_0)= v_0 \in {{\mathcal {H}}}\), there exists a unique maximal local solution x of (29), with \({{\dot{x}}}\) locally absolutely continuous. If \(\Phi \) is bounded from below, global existence follows from the energy estimate proved in Lemma 4.1, in the next subsection.
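To make the reduction to a first-order system concrete, here is a small numerical sketch (entirely our own illustration: the choices \(\Phi (x)=x^2/2\), \(g(t)=1/t^3\), \(\alpha =4\), and all numerical values are hypothetical, not data from the text). It integrates (29) with a classical Runge–Kutta scheme and checks, in the spirit of Lemma 4.1, that the speed stays bounded and the trajectory approaches \({{\mathrm{argmin\,}}}\Phi =\{0\}\):

```python
alpha, t0 = 4.0, 1.0

def grad_phi(x):                # illustrative potential Phi(x) = x^2 / 2
    return x

def g(t):                       # illustrative perturbation, integrable on [t0, inf)
    return 1.0 / t ** 3

def rhs(t, x, v):               # first-order reduction: state (x, v), with v = dx/dt
    return v, -(alpha / t) * v - grad_phi(x) + g(t)

t, x, v, h = t0, 2.0, 0.0, 0.01
max_speed = 0.0
while t < 200.0:                # classical fourth-order Runge-Kutta integration
    k1 = rhs(t, x, v)
    k2 = rhs(t + h / 2, x + h / 2 * k1[0], v + h / 2 * k1[1])
    k3 = rhs(t + h / 2, x + h / 2 * k2[0], v + h / 2 * k2[1])
    k4 = rhs(t + h, x + h * k3[0], v + h * k3[1])
    x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    t += h
    max_speed = max(max_speed, abs(v))

print(x, max_speed)             # x(t) approaches argmin Phi; the speed stays bounded
```

With this choice of g, both \(\int _{t_0}^{\infty }\Vert g(t)\Vert dt\) and \(\int _{t_0}^{\infty } t\Vert g(t)\Vert dt\) are finite, so the hypotheses used below are met.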
This being said, our main concern here is to obtain sufficient conditions on the perturbation g in order to guarantee that the convergence properties established in the preceding sections are preserved. The analysis follows very closely the arguments given in Sects. 2 and 3. It is developed in the same general setting, just assuming that \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) is a continuously differentiable convex function. Therefore, we shall state the main results and sketch the proofs, underlining the parts where additional techniques are required.
4.1 Energy estimates and minimization property
The following result is obtained by considering the global energy of the system, and showing that it is a strict Lyapunov function:
Lemma 4.1
Let \(\Phi \) be bounded from below, and let \(x: [t_0, +\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (29) with \(\alpha >0\) and \({\int _{t_0}^{\infty } \Vert g(t)\Vert \,dt < + \infty }\). Then, \(\sup \nolimits _{t>t_0} \Vert \dot{x}(t) \Vert < + \infty \) and \({\int _{t_0}^{\infty } \frac{1}{\tau } \Vert \dot{x}(\tau ) \Vert ^2 d\tau <+\infty }\). Moreover, \(\lim \nolimits _{t\rightarrow +\infty }\Phi (x(t))=\inf \nolimits _{{{\mathcal {H}}}}\Phi \).
Proof
Set \(T >t_0\). For \(t_0 \le t \le T\), define the energy function
Since \(\dot{x}\) is continuous and g is integrable, the function \(W_T\) is well defined. Differentiating \(W_T\) with respect to time, and using (29), we obtain
Hence \(W_T(\cdot )\) is a decreasing function. In particular, \( W_T(t) \le W_T(t_0)\), that is
As a consequence,
Applying Lemma 5.13, we obtain
It follows that
As a consequence, we may define a function \(W : \,]t_0,+\infty [\rightarrow {\mathbb {R}}\) by
by (31). From the definition of \(W_T\) and W, we have
Integrating from \(t_0\) to t, and using (31), we obtain
which gives a bound for the second improper integral.
For the minimization property, consider the function \(h: \,]t_0,+\infty [\rightarrow {\mathbb {R}}\), defined by \(h(t)=\frac{1}{2}\Vert x(t)-z\Vert ^{2}\), where z is an arbitrary element of \({{\mathcal {H}}}\). We can easily verify that
By convexity of \(\Phi \), we obtain
Recall that the function W defined above is nonincreasing and bounded from below. Hence, W(t) converges, as \(t\rightarrow +\infty \), to some \(W_{\infty } \in {{\mathbb {R}}}\). Moreover, using the definition of W in (33), we deduce that
Setting \(B_{\infty }=W_{\infty } +\inf \Phi -\Phi (z) \), we may write
Multiplying this last equation by \(\frac{1}{t}\), and integrating between \(t_0\) and \(\theta >t_0\), we get
Let us estimate the integrals on the right-hand side of the last inequality:
(1) The first term is finite, by the first part of the proof.
(2) The second term is also finite, since the relation \(\Vert x(t)-z\Vert \le \Vert x(t_0)-z\Vert +\int _{t_0}^{t}\Vert \dot{x}(s)\Vert ds\) implies
$$\begin{aligned} \int _{t_{0}}^{\theta }\dfrac{\Vert g(t)\Vert \Vert x(t)-z\Vert }{t} dt\le \left( \frac{\Vert x_{0}-z\Vert }{t_{0}}+ \sup _{t\ge t_0}\Vert \dot{x}(t)\Vert \right) \int _{t_{0}}^{+\infty }\Vert g(t)\Vert dt< +\infty .\end{aligned}$$
(3) For the third term, integration by parts gives
$$\begin{aligned} \int _{t_{0}}^{\theta } \left( \frac{1}{t}\int _{t}^{\infty }\Vert g(s)\Vert ds\right) dt= & {} \ln \theta \int _{\theta }^{\infty }\Vert g(s)\Vert ds - \ln t_0 \int _{t_0}^{\infty }\Vert g(s)\Vert ds \\&+ \int _{t_{0}}^{\theta } \Vert g(t)\Vert \ln t \ dt. \end{aligned}$$
(4) For the fourth term, set \(I=\int _{t_{0}}^{\theta }\frac{1}{t^{\alpha +1}}\frac{d}{dt}(t^{\alpha }\dot{h}(t))dt\), and integrate by parts twice to obtain
$$\begin{aligned} I= & {} \left[ \frac{1}{t}\dot{h}(t)\right] _{t_{0}}^{\theta } +(\alpha +1)\int _{t_{0}}^{\theta }\frac{1}{t^{2}}\dot{h}(t)dt\\= & {} C_0 +\frac{1}{\theta }\dot{h}(\theta ) +\frac{(1+\alpha )}{\theta ^{2}}h(\theta )\\&+\,2(1+\alpha )\int _{t_{0}}^{\theta } \frac{1}{t^{3}}h(t)dt\ge C_0 +\frac{1}{\theta }\dot{h}(\theta ), \end{aligned}$$
for some constant \(C_0\), because \(h\ge 0\). Finally, notice that
$$\begin{aligned} \vert \dot{h}(\theta )\vert =\vert \langle \dot{x}(\theta ),x(\theta )-z\rangle \vert \le \sup _{t\ge t_0}\Vert \dot{x}(t)\Vert \left( \Vert x(t_0)-z\Vert +\theta \sup _{t\ge t_0}\Vert \dot{x}(t)\Vert \right) .\end{aligned}$$
Collecting the above results, we deduce that
for some other constant \(C_1\). Dividing by \(\ln (\frac{\theta }{t_{0}})\), and letting \(\theta \rightarrow +\infty \), we conclude that \(B_{\infty } \le 0\), by using Lemma 5.11 with \(\psi (t) =\ln t\). This implies that \(W_{\infty }\le \Phi (z)-\inf \Phi \) for every \(z\in {{\mathcal {H}}}\), which leads to \(W_{\infty } \le 0\).
On the other hand, it is easy to see that
Passing to the limit, as \(t\rightarrow +\infty \), we deduce that
Since we always have \(\inf \Phi \le \liminf \Phi (x(t))\), we conclude that \(\lim _{t\rightarrow +\infty }\Phi (x(t))=\inf \Phi \). \(\square \)
4.2 Fast convergence of the values
We are now in a position to prove the following:
Theorem 4.2
Assume \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), and let \(x: [t_0, +\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (29) with \(\alpha \ge 3\) and \({\int _{t_0}^{\infty } t\,\Vert g(t)\Vert \, dt < + \infty }\). Then \(\Phi (x(t))- \min _{{{\mathcal {H}}}}\Phi = \mathcal {O} \left( \frac{1}{t^2}\right) \).
Proof
The proof follows the arguments used for Theorem 2.7. Take \(x^{*} \in S={{\mathrm{argmin\,}}}\Phi \). For \(t_0 \le t \le T\), define the energy function
Let us show that
Differentiating \({{\mathcal {E}}}_{\alpha , g,T}\) gives
Using the subdifferential inequality for \(\Phi \), and rearranging the terms, we obtain
As a consequence, for \(\alpha \ge 3\), the function \({\mathcal {E}}_{\alpha ,g,T}\) is nonincreasing. In particular, \({\mathcal {E}}_{\alpha ,g,T} (t) \le {{{\mathcal {E}}}}_{\alpha ,g,T} (t_0)\), which gives
with
From (36), we infer that
Applying Lemma 5.13, we obtain
and so
Using (37) in (36), we conclude that
and the result follows. \(\square \)
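The \({{\mathcal {O}}}(1/t^2)\) estimate can be observed numerically. The sketch below is entirely our own illustration (the choices \(\Phi (x)=\log \cosh x\), \(g(t)=1/t^{3}\), \(\alpha =3\), and the step size are hypothetical): it integrates (29) and records \(\sup _t t^2\left( \Phi (x(t))-\min \Phi \right) \), which remains bounded, in agreement with Theorem 4.2:

```python
import math

alpha, t0 = 3.0, 1.0            # Theorem 4.2 allows alpha >= 3

def grad_phi(x):                # Phi(x) = log(cosh x): convex, min Phi = 0 at x = 0
    return math.tanh(x)

def phi(x):
    return math.log(math.cosh(x))

def g(t):                       # perturbation with int t*||g(t)|| dt < +infinity
    return 1.0 / t ** 3

def rhs(t, x, v):               # first-order reduction of (29)
    return v, -(alpha / t) * v - grad_phi(x) + g(t)

t, x, v, h = t0, 1.0, 0.0, 0.01
sup_scaled = 0.0
while t < 300.0:                # classical fourth-order Runge-Kutta steps
    k1 = rhs(t, x, v)
    k2 = rhs(t + h / 2, x + h / 2 * k1[0], v + h / 2 * k1[1])
    k3 = rhs(t + h / 2, x + h / 2 * k2[0], v + h / 2 * k2[1])
    k4 = rhs(t + h, x + h * k3[0], v + h * k3[1])
    x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    t += h
    sup_scaled = max(sup_scaled, t * t * phi(x))

print(sup_scaled)               # sup of t^2 (Phi(x(t)) - min Phi) stays bounded
```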
Remark 4.3
As a consequence, the energy function
is well defined on \([t_0,+\infty [\), and is a Lyapunov function for the dynamical system (29).
4.3 Convergence of the trajectories
In the case \(\alpha >3\), provided the right-hand side g(t) is sufficiently small for large t, we are going to show that the trajectories of (29) converge, as is the case for the unperturbed system studied in the previous sections.
Theorem 4.4
Assume \({{\mathrm{argmin\,}}}\Phi \ne \emptyset \), and let \(x: [t_0, +\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of (29) with \(\alpha >3\) and \({\int _{t_0}^{\infty } t\,\Vert g(t)\Vert \, dt < + \infty }\). Then, x(t) converges weakly, as \(t\rightarrow +\infty \), to a point in \({{\mathrm{argmin\,}}}\Phi \).
Proof
Step 1 Recall, from the proof of Theorem 4.2, that the energy function \({\mathcal E}_{\alpha ,g}\) defined in Remark 4.3 satisfies
Integrating this inequality, we obtain
By the definition of \({{{\mathcal {E}}}}_{\alpha ,g}\), and neglecting its nonnegative terms, we infer that
and so
by (38). Since \(\alpha >3\), we deduce that
Step 2 Let us show that
Taking the scalar product of (29) with \(t^2 \dot{x}(t)\), we get
Using the Chain Rule and the Cauchy–Schwarz inequality, we obtain
Integration by parts yields
As a consequence,
for some constant \(C_0\) depending only on the Cauchy data. Since \(\int _{t_0}^{\infty } s (\Phi (x(s))- \inf _{{{\mathcal {H}}}}\Phi )ds <+ \infty \) by (39), and \(\alpha >1\), we deduce that
for some other constant \(C_1\), which we may assume to be nonnegative. Applying Lemma 5.13, we obtain
and so
Returning to (40), we deduce that
which gives
Moreover, combining (38) and (42), we deduce that
and the trajectory x is bounded.
Step 3 As before, we prove the weak convergence by means of Opial’s Lemma 5.7. Take \(x^{*}\in {{\mathrm{argmin\,}}}\Phi \), and define \(h: [t_0, +\infty [ \rightarrow {{\mathbb {R}}}^+\) by
By the Chain Rule,
Combining these two equations, and using (29), we obtain
By the monotonicity of \(\nabla \Phi \) and the fact that \(\nabla \Phi (x^{*}) = 0\), we have
and we infer that
Equivalently
with
because the trajectory x is bounded, by (44). Recall that \(\int _{t_0}^{+\infty } t \Vert g(t)\Vert dt < + \infty \) by assumption, and \(\int _{t_0}^{\infty } t\Vert \dot{x}(t) \Vert ^2 dt < + \infty \) by (43). Hence, the function \(t \mapsto tk(t)\) belongs to \(L^1 (t_0, +\infty )\). Applying Lemma 5.9, with \(w(t)= \dot{h}(t)\), we deduce that \(\dot{h}^+(t) \in L^1 (t_0, +\infty )\), which implies that the limit of h(t) exists, as \(t \rightarrow + \infty \). This proves item (i) of Opial’s Lemma 5.7. For item (ii), observe that every weak limit point of x(t) as \(t\rightarrow +\infty \) must minimize \(\Phi \), since \(\lim _{t\rightarrow +\infty }\Phi (x(t))=\inf \Phi \). \(\square \)
Remark 4.5
Throughout the proof of Theorem 4.4, we proved that
We also proved that \(\sup \nolimits _{t\ge t_0}t\, \Vert \dot{x}(t) \Vert <+\infty \), and hence \(\lim \nolimits _{t\rightarrow \infty } \Vert \dot{x}(t) \Vert = 0\).
4.4 Strong convergence results
For strong convergence, we have the following:
Theorem 4.6
Let \(\Phi :{{\mathcal {H}}}\rightarrow {{\mathbb {R}}}\) be a continuously differentiable convex function, and let \(x:[t_0,+\infty [\rightarrow \mathcal {H}\) be a solution of (29) with \(\alpha >3\) and \({\int _{t_0}^\infty t\,\Vert g(t)\Vert \,dt<+\infty }\). Then x(t) converges strongly, as \(t\rightarrow +\infty \), in any of the following cases:
(i) The set \({{\mathrm{argmin\,}}}\Phi \) has nonempty interior;
(ii) The function \(\Phi \) is even; or
(iii) The function \(\Phi \) is uniformly convex.
In order to prove this result, it suffices to adapt the arguments given in Sect. 3 for the unperturbed case. Since the adaptation is relatively straightforward, we leave the details to the reader.
5 Convergence of the associated algorithms
In this section, we analyze the fast convergence properties of the associated Nesterov-type algorithms. To avoid repeating similar arguments, we state the results and develop the proofs directly for the perturbed version.
5.1 A dynamical introduction of the algorithm
Time discretization of dissipative gradient-based dynamical systems leads naturally to algorithms, which, under appropriate assumptions, have similar convergence properties. This approach has been followed successfully in a variety of situations. For a general abstract discussion see [5, 6]. For dynamics with inertial features, see [2–4, 9]. To cover practical situations involving constraints or non-smooth data, we need to broaden our scope. This leads us to consider the non-smooth structured convex minimization problem
where \(\Psi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}} \cup \lbrace + \infty \rbrace \) is a proper lower semicontinuous convex function; and \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) is a continuously differentiable convex function, whose gradient is Lipschitz continuous.
The optimal solutions of (47) satisfy
where \(\partial \Psi \) is the subdifferential of \(\Psi \), in the sense of convex analysis. In order to adapt our dynamic to this non-smooth situation, we will consider the corresponding differential inclusion
This dynamical system is within the following framework
where \(\Theta : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}} \cup \lbrace + \infty \rbrace \) is a proper lower semicontinuous convex function, and \(a(\cdot )\) is a positive damping parameter.
It is interesting to establish the asymptotic properties, as \(t\rightarrow +\infty \), of the solutions of the differential inclusion (48). Beyond global existence issues, one must check that the Lyapunov analysis is still valid. In view of the validity of the subdifferential inequality for convex functions, the (generalized) chain rule for derivatives over curves (see [15]), most results presented in the previous sections can be transposed to this more general context, except for the stabilization of the acceleration, which relies on the Lipschitz character of the gradient. However, a detailed study of this differential inclusion goes far beyond the scope of the present article. See [8] for some results in the case of a fixed positive damping parameter, i.e., \(a(t)= \gamma >0\) fixed, and \(g=0\). Thus, setting \(\Theta (x)= \Psi (x) + \Phi (x)\), we can reasonably assume that, for \(\alpha >3\), and \(\int _{t_0}^{+\infty } t \Vert g(t)\Vert dt < + \infty \), for each trajectory of (48), there is rapid convergence of the values
and weak convergence of the trajectory to an optimal solution.
We shall use these ideas as a guideline, in order to introduce corresponding fast converging algorithms, making the link with Nesterov [26–29] and Beck–Teboulle [14]; and so, extending the recent works of Chambolle–Dossal [19] and Su–Boyd–Candès [35] to the perturbed case.
In order to preserve the fast convergence properties of the dynamical system (48), we are going to discretize it implicitly with respect to the nonsmooth function \(\Psi \), and explicitly with respect to the smooth function \(\Phi \).
Taking a fixed time step size \(h>0\), and setting \(t_k= kh\), \(x_k = x(t_k)\), the implicit/explicit finite difference scheme for (48) gives
where \(y_k\) is a linear combination of \(x_k\) and \(x_{k-1}\), which will be made precise later on. Expanding (50), we obtain
A natural choice for \(y_k\) leading to a simple formulation of the algorithm (other choices are possible, offering new directions of research for the future) is
Using the classical proximal operator (equivalently, the resolvent of the maximal monotone operator \(\partial \Psi \))
and setting \(s=h^2\), the algorithm can be written as
For practical purposes, and in order to fit with the existing literature on the subject, it is convenient to work with the following equivalent formulation
Indeed, we have \(\frac{k -1}{k + \alpha -1} = 1- \frac{\alpha }{k + \alpha -1}\). When \(\alpha \) is an integer, we obtain the same sequences \((x_k)\) and \((y_k)\), up to the reindexation \(k\mapsto k + \alpha -1\). For general \(\alpha >0\), we can easily verify that the algorithm (55) is still associated with the dynamical system (48).
This algorithm is within the scope of the proximal-based inertial algorithms [3, 24, 25], and forward-backward methods. In the unperturbed case, \(g_k =0\), it has been recently considered by Chambolle–Dossal [19] and Su–Boyd–Candès [35]. It enjoys fast convergence properties which are very similar to that of the continuous dynamic.
For \(\alpha = 3\) and \(g_k =0\), we recover the classical FISTA algorithm developed by Beck–Teboulle [14], based on the acceleration method introduced by Nesterov [26] in the smooth case and by Güler [21] in the proximal setting:
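For the reader's convenience, the iteration can be written (our transcription, consistent with taking \(\alpha =3\) and \(g_k=0\) in (55); the original FISTA papers use an equivalent formulation based on the sequence \(t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}\)) as:

```latex
\begin{aligned}
y_k &= x_k + \frac{k-1}{k+2}\,\bigl(x_k - x_{k-1}\bigr),\\
x_{k+1} &= \mathrm{prox}_{s\Psi}\bigl(y_k - s\,\nabla\Phi(y_k)\bigr).
\end{aligned}
```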
An important open question regarding the FISTA method, as described in (56), is the convergence of the sequences \((x_k)\) and \((y_k)\). A major advantage of considering the broader class of algorithms (55) is that, for \(\alpha >3\), these sequences converge, even when inexactly computed, provided the errors or perturbations are sufficiently small.
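As a hedged illustration of this class of algorithms, consider the following toy instance (entirely our own: the choices \(\Psi =\lambda \Vert \cdot \Vert _1\), \(\Phi (x)=\frac{1}{2}\Vert x-b\Vert ^2\), and all data and parameter values are hypothetical). For this \(\Phi \), the gradient is 1-Lipschitz, the proximal operator of \(s\Psi \) is componentwise soft-thresholding, and the minimizer is known in closed form, which lets us monitor \(k^2\big (\Theta (x_k)-\min \Theta \big )\):

```python
import math

alpha, s, lam = 4.0, 0.9, 0.3       # alpha > 3 and 0 < s < 1/L, with L = 1 here
b = [1.5, -0.2, 0.7, 0.05]          # hypothetical data

def grad_phi(x):                    # Phi(x) = (1/2) ||x - b||^2
    return [xi - bi for xi, bi in zip(x, b)]

def prox(u, tau):                   # prox of tau*||.||_1: soft-thresholding
    return [math.copysign(max(abs(ui) - tau, 0.0), ui) for ui in u]

def theta(x):                       # objective Theta = Psi + Phi
    return (0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
            + lam * sum(abs(xi) for xi in x))

x_star = prox(b, lam)               # closed-form minimizer for this particular Phi
theta_star = theta(x_star)

x_prev = x = [0.0] * len(b)
sup_scaled = 0.0
for k in range(1, 2001):
    gk = [1.0 / k ** 3, 0.0, 0.0, 0.0]   # errors with sum_k k*||g_k|| < +infinity
    c = (k - 1.0) / (k + alpha - 1.0)    # extrapolation coefficient (k-1)/(k+alpha-1)
    y = [xi + c * (xi - pi) for xi, pi in zip(x, x_prev)]
    step = [yi - s * (gi - ei) for yi, gi, ei in zip(y, grad_phi(y), gk)]
    x_prev, x = x, prox(step, s * lam)
    sup_scaled = max(sup_scaled, k * k * (theta(x) - theta_star))

print(sup_scaled, theta(x) - theta_star)
```

The recorded supremum of \(k^2\left( \Theta (x_k)-\min \Theta \right) \) stays bounded, even though the iteration is perturbed by the errors \(g_k\), in agreement with the results below.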
5.2 Fast convergence of the values
We will see that the fast convergence properties of algorithm (54) can be obtained in parallel with the convergence analysis of the continuous case in Theorem 4.2.
Theorem 5.1
Let \(\Psi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}} \cup \lbrace + \infty \rbrace \) be proper, lower-semicontinuous and convex, and let \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) be convex and continuously differentiable with L-Lipschitz continuous gradient. Suppose that \(S={{\mathrm{argmin\,}}}(\Psi + \Phi )\ne \emptyset \), and let \((x_k)\) be a sequence generated by algorithm (55) with \(\alpha \ge 3, 0< s < \frac{1}{L} \), and \( \sum _{k \in {{\mathbb {N}}}} k \Vert g_k\Vert < + \infty \). Then,
where
Proof
To simplify notation, we set \(\Theta = \Psi + \Phi \), and take \(x^{*} \in {{\mathrm{argmin\,}}}\Theta \). As in the continuous case, we shall prove that the energy sequence \(({{\mathcal {E}}} (k))\) given by
with
is non-increasing (we shall justify further that it is well defined). Note that \({{\mathcal {E}}} (k)\) equals the Lyapunov function considered by Su–Boyd–Candès in [35, Theorem 4.3], plus a perturbation term. For each \(y \in {{\mathcal {H}}}\), we set \(\Phi _k (y):= \Phi (y) - \left\langle g_k, y \right\rangle \), and \(\Theta _k(y) = \Psi (y) + \Phi _k(y)\). Since \(\nabla \Phi _k (y)= \nabla \Phi (y) -g_k\), we deduce that \(\nabla \Phi _k\) is still L-Lipschitz continuous. By introducing the operator \(G_{s,k}: {{\mathcal {H}}} \rightarrow {{\mathcal {H}}}\), defined by
for each \(y \in {{\mathcal {H}}}\), we can write
and rewrite algorithm (55) as
The variable \(z_k\), defined in (59), will play an important role. It comes naturally into play as a discrete version of the term \( \frac{t}{\alpha -1} \dot{x}(t) + x(t) - x^{*} \) which enters \({{{\mathcal {E}}}}_{\alpha ,g}(t)\). Simple algebraic manipulations give
and also
The operator \(G_{s,k}\) satisfies
for all \(x, y\in {{\mathcal {H}}}\) (see [14, 19, 31, 35]), since \(s \le \frac{1}{L}\), and \(\nabla \Phi _k\) is L-Lipschitz continuous. Let us write this formula successively at \(y=y_k\) and \(x= x_k\), then at \(y=y_k\) and \(x= x^{*}\). We obtain
respectively. Multiplying the first inequality by \(\frac{k}{k + \alpha -1}\), and the second one by \(\frac{\alpha -1}{k + \alpha -1}\), then adding the two resulting inequalities, and using the fact that \(x_{k+1} = y_k - s G_{s,k} (y_k)\), we obtain
We rewrite the scalar product above as
We obtain
We shall obtain a recursion from (64). To this end, observe that (62) gives
After expanding
and multiplying the above expression by \(\frac{(\alpha -1)^2}{2s\left( k + \alpha -1\right) ^2}\), we obtain
Replacing this expression in (64), we obtain
Equivalently,
Recalling that \(\Theta (y)= \Theta _k (y) + \left\langle g_k, y \right\rangle \), we obtain
Multiplying by \( \frac{2s}{\alpha -1}\left( k + \alpha -1\right) ^2\), we obtain
which implies
in view of
Setting
we can reformulate (65) as
Equivalently,
Using (61), we deduce that
Fix an integer K, and set
so that (66) is equivalent to
and we deduce that the sequence \(({{\mathcal {E}}}_K (k))_k\) is nonincreasing. In particular, \({{\mathcal {E}}}_K (k) \le {{\mathcal {E}}}_K (0)\), which gives
As a consequence,
By the definition of \({{\mathcal {G}}} (k)\), neglecting some positive terms, and using the Cauchy–Schwarz inequality, we infer that
Applying Lemma 5.14 with \(a_k = \Vert z_{k} -x^{*} \Vert \), we deduce that
Note that M is finite, because \(\sum _{k \in {{\mathbb {N}}}} k \Vert g_k \Vert < + \infty \). Returning to (67) we obtain
By the definition of \({{\mathcal {G}}} (k)\), we finally obtain
which gives (57) and completes the proof. \(\square \)
Remark 5.2
In [22], Kim–Fessler introduce an extra inertial term in the FISTA method that allows them to reduce the constant in the complexity estimate by a factor of 2. It would be interesting to know whether this variant can be obtained from a different time discretization of our inertial dynamic, or of a related one.
5.3 Convergence of the sequence \((x_k)\)
Let us now study the convergence of the sequence \((x_k)\).
Theorem 5.3
Let \(\Psi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}} \cup \lbrace + \infty \rbrace \) be proper, lower-semicontinuous and convex, and let \(\Phi : {{\mathcal {H}}} \rightarrow {{\mathbb {R}}}\) be convex and continuously differentiable with L-Lipschitz continuous gradient. Suppose that \(S={{\mathrm{argmin\,}}}(\Psi + \Phi )\ne \emptyset \), and let \((x_k)\) be a sequence generated by algorithm (55) with \(\alpha > 3, 0< s < \frac{1}{L} \), and \( \sum _{k \in {{\mathbb {N}}}} k \Vert g_k\Vert < + \infty \). Then,
(i) \( \sum _k k \Big ((\Psi + \Phi )(x_k) - \inf (\Psi + \Phi ) \Big ) < + \infty \);
(ii) \(\sum _k k\Vert x_{k+1}-x_k\Vert ^2<+\infty \); and
(iii) \(x_k\) converges weakly, as \(k\rightarrow +\infty \), to some \(x^*\in {{\mathrm{argmin\,}}}(\Psi +\Phi )\).
Proof
We follow the same steps as those of Theorem 4.4:
Step 1 Recall from (66) that
By (68), the sequence \((z_{k})\) is bounded. Summing the above inequalities, and using \(\alpha >3\), we obtain item (i).
Step 2 Rewrite inequality (63) as
Take \(y= y_k\), and \(x=x_k\). Since \(x_{k+1} = y_k - s G_{s,k} (y_k)\), and \(y_k - x_{k} = \frac{k -1}{k + \alpha -1} ( x_{k} - x_{k-1})\), we obtain
By the definition of \( \Theta _k \), this is
Set \(\theta _k = \Theta (x_k) -\Theta (x^{*}), d_k = \frac{1}{2}\Vert x_{k} - x_{k-1} \Vert ^2, a=\alpha -1\). By the Cauchy–Schwarz inequality, (69) gives
Multiply by \((k + a)^2\) to get
Summing for \(k=1,\ldots ,K\), we obtain
Performing a computation similar to that in Chambolle–Dossal [19, Corollary 2], we can write
By item (i), we have \(\sum _k \left( 2k +2a -1 \right) \theta _k < + \infty \). Hence there exists some constant C such that
for all \( K \in {{\mathbb {N}}}\). We now proceed as in the proof of Theorem 4.4. To this end, write (71) as
where \(a_j:= (j + a)\Vert x_{j+1}-x_j \Vert \). Recalling that \(\sum _k k \Vert g_k\Vert <+\infty \), apply Lemma 5.14 with \(\beta _j = (j + a)\Vert g_j\Vert \) to deduce that
Injecting this information in (70), we obtain
Since \(a = \alpha -1 \ge 2\), item (i) and the definition of \(d_k\), together give
which is (ii).
Step 3 We finish by applying Opial’s Lemma 5.8 with \(S={{\mathrm{argmin\,}}}(\Psi + \Phi )\). By Theorem 5.1, we have \((\Psi + \Phi )(x_k) \rightarrow \min (\Psi + \Phi )\). The weak lower-semicontinuity of \(\Psi + \Phi \) gives item (ii) of Opial’s Lemma. Thus, the only point to verify is that \(\lim \Vert x_k-x^{*}\Vert \) exists for each \(x^{*} \in {{\mathrm{argmin\,}}}(\Psi + \Phi )\). Take any such \(x^*\). We shall show that \(\lim _{k\rightarrow \infty }h_k\) exists, where \(h_k := \frac{1}{2} \Vert x_k - x^{*}\Vert ^2\).
The beginning of the proof is similar to [3] or [19], and consists in establishing a discrete version of the second-order differential inequality (46). We use the identity
which holds for any \(a,b,c \in {{\mathcal {H}}}\). Taking \(b= x^{*}, a= x_{k+1}, c=x_k\), we obtain
which is equivalent to
By the definition of \(y_k\), we have
Therefore,
We now use the monotonicity of \(\partial \Psi \). Since \(- s\nabla \Phi (x^{*}) \in s\partial \Psi ( x^{*}) \), and \(y_k - x_{k+1}- s \nabla \Phi (y_k) +sg_k\in s\partial \Psi (x_{k+1})\), we have
Equivalently,
Replacing in (74), we obtain
On the other hand, the co-coercivity of \(\nabla \Phi \) gives
(vertex of the parabola). Combining (75) and (76), we deduce that
Replace k by \(k-1\) in (73) to obtain
Combine (77) with (78) to deduce that
By the definition of \(y_k = x_{k} + \frac{k -1}{k + \alpha -1} ( x_{k} - x_{k-1})\), we have \(x_{k+1}- y_k = x_{k+1} - x_{k} - \frac{k -1}{k + \alpha -1} ( x_{k} - x_{k-1})\). Hence,
Substituting this in (79), we obtain
where \(\gamma _k= \frac{k -1}{k + \alpha -1} \). Since \(0< s < \frac{1}{L} \), we have \((1- \frac{sL}{2}) >0\). On the other hand, since \(\gamma _k <1\), we have \(\gamma _k + {\gamma _k}^2 < 2 \gamma _k\). Therefore,
By (68), we know that the sequence \((z_k)\) is bounded. By (72), we know that \(\sup _k k \Vert x_{k+1}-x_k \Vert < + \infty \). Since \(x_k = z_k - \frac{k + \alpha -1}{ \alpha -1} ( x_{k+1}-x_k )\), we deduce that the sequence \((x_k)\) is bounded. Returning to (80), we have
for some constant C. Now, item (ii), combined with the assumption \(\sum _{k } k \Vert g_k\Vert < + \infty \), together give
for some nonnegative sequence \((\omega _k) \) such that \( \sum _{k \in {{\mathbb {N}}}} k\omega _k < + \infty \). Taking positive parts and applying Lemma 5.10 with \(a_k = \left( h_{k} -h_{k-1}\right) ^+\), we obtain
Since \((h_k)\) is nonnegative, and the positive parts of its increments are summable, it must converge. This completes the proof. \(\square \)
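The conclusion of Theorem 5.3 can be probed numerically on a toy problem (entirely our own illustration, with hypothetical data: \(\Psi \) is the indicator of the box \([-1,1]^3\), whose proximal operator is the projection onto the box, \(\Phi (x)=\frac{1}{2}\Vert x-b\Vert ^2\), and the errors \(g_k=k^{-3}e_1\) satisfy \(\sum _k k\Vert g_k\Vert <+\infty \)). The iterates converge to the minimizer, and \(\sup _k k\Vert x_{k+1}-x_k\Vert \) stays bounded, as in (72):

```python
alpha, s = 5.0, 0.9                 # alpha > 3 and 0 < s < 1/L, with L = 1
b = [2.0, 0.4, -3.0]                # hypothetical data

def proj(u):                        # prox of the indicator of the box [-1, 1]^3
    return [min(1.0, max(-1.0, ui)) for ui in u]

def grad_phi(x):                    # Phi(x) = (1/2) ||x - b||^2
    return [xi - bi for xi, bi in zip(x, b)]

x_star = proj(b)                    # exact minimizer: the projection of b
x_prev = x = [0.0] * 3
sup_kdx = 0.0
for k in range(1, 3001):
    gk = [1.0 / k ** 3, 0.0, 0.0]   # computational errors, sum_k k*||g_k|| < +inf
    c = (k - 1.0) / (k + alpha - 1.0)
    y = [xi + c * (xi - pi) for xi, pi in zip(x, x_prev)]
    step = [yi - s * (gi - ei) for yi, gi, ei in zip(y, grad_phi(y), gk)]
    x_prev, x = x, proj(step)
    dx = sum((ai - pi) ** 2 for ai, pi in zip(x, x_prev)) ** 0.5
    sup_kdx = max(sup_kdx, k * dx)

dist = sum((ai - qi) ** 2 for ai, qi in zip(x, x_star)) ** 0.5
print(dist, sup_kdx)                # iterates converge; sup_k k*||x_{k+1}-x_k|| bounded
```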
Remark 5.4
A reasonable conjecture is that strong convergence results can be obtained for algorithm (54), by transposing the results of Sect. 3 from the continuous to the discrete setting. We do not explore this direction here.
Remark 5.5
The analysis carried out in this section for the inertial forward-backward algorithm (55) is a reinterpretation of the proof of the corresponding results in the continuous case. In other words, we built a complete proof using the continuous setting as a guideline. It would be interesting to know whether the results in [5, 6] can be applied in order to deduce the asymptotic properties without repeating the proofs.
5.4 Final comments
In the particular case \(\alpha = 3\), for a perturbed version of the classical FISTA algorithm, Schmidt–Le Roux–Bach proved in [34] a result which is similar to Theorem 5.1, concerning the fast convergence of the values. In [36], Villa–Salzo–Baldassarre–Verri use the notion of \(\epsilon \)-subdifferential in order to compute inexact proximal points in the algorithm. In a recent article [10], Aujol–Dossal extend this study by introducing additional perturbation terms to analyze the stability of the FISTA method, and prove convergence of the sequences generated by the algorithm (in the spirit of Chambolle–Dossal [19]). Although the results obtained in these papers have some similarities to those of Theorems 5.1 and 5.3, our dynamic approach is original and opens the door to new developments. One example is the following: in the dynamical system studied here, second-order information with respect to time ultimately induces fast convergence properties. On the other hand, in Newton-type methods, second-order information with respect to space has a similar consequence. In a forthcoming paper, and based on the previous works [4] and [9], we study the solutions of the second-order (both in time and space) evolution equation
where \(\Phi \) is a smooth convex function, and \(\alpha , \beta \) are positive parameters. This inertial system combines an isotropic viscous damping which vanishes asymptotically, and a geometrical Hessian-driven damping, which makes it naturally related to Newton and Levenberg-Marquardt methods.
The rich literature on the optimal damping of oscillating systems offers interesting insight (see, for example, the recent paper by Ghisi–Gobbino–Haraux [20], which deals with periodic damping).
Notes
In fact, W decreases strictly, as long as the trajectory is not stationary.
References
Abbas, B., Attouch, H., Svaiter, B.F.: Newton-like dynamics and forward–backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161(2), 331–360 (2014)
Alvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9(1–2), 3–11 (2001)
Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
Alvarez, F., Peypouquet, J.: Asymptotic almost-equivalence of Lipschitz evolution systems in Banach spaces. Nonlinear Anal. 73(9), 3018–3033 (2010)
Alvarez, F., Peypouquet, J.: A unified approach to the asymptotic almost-equivalence of evolution systems without Lipschitz conditions. Nonlinear Anal. 74(11), 3440–3444 (2011)
Attouch, H., Buttazzo, G., Michaille, G.: Variational analysis in Sobolev and BV spaces. Applications to PDEs and optimization, 2nd edn. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA, xii+793 pp (2014)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12(1), 273–306 (2002)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)
Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward–backward algorithm, application to FISTA. SIAM J. Optim. (2015). HAL hal-01163432
Baillon, J.-B.: Un exemple concernant le comportement asymptotique de la solution du problème \(\frac{du}{dt} + \partial \phi (u) \ni 0\). J. Funct. Anal. 28, 369–376 (1978)
Baillon, J.-B., Haddad, G.: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Isr. J. Math. 26, 137–150 (1977)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland (1972)
Bruck, R.E.: Asymptotic convergence of nonlinear contraction semigroups in Hilbert spaces. J. Funct. Anal. 18, 15–26 (1975)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Cabot, A., Engler, H., Gadat, S.: Second order differential equations with asymptotically small dissipation and piecewise flat potentials. Electron. J. Differ. Equ. 17, 33–38 (2009)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of FISTA. Preprint, HAL hal-01060130v3, https://hal.inria.fr/hal-01060130v3 (2014)
Ghisi, M., Gobbino, M., Haraux, A.: The remarkable effectiveness of time-dependent damping terms for second order evolution equations (2015). arXiv:1506.06915v1 [math.AP]
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization (2015). arXiv:1406.5468v2 [math.OC]
Knopp, K.: Theory and Application of Infinite Series. Blackie & Son, Glasgow (1951)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 1–15 (2015)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155(2), 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Papers (2007)
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 123–231 (2013)
Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, Berlin (2015)
Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3–4), 1113–1163 (2010)
Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: NIPS'11—25th Annual Conference on Neural Information Processing Systems, Granada, Spain (2011). HAL inria-00618152v3
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Additional information
Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under Grant Number FA9550-14-1-0056. Also supported by Fondecyt Grant 1140829, Conicyt Anillo ACT-1106, ECOS-Conicyt Project C13E03, Millenium Nucleus ICM/FIC RC130003, Conicyt Project MATHAMSUD 15MATH-02, Conicyt Redes 140183, and Basal Project CMM Universidad de Chile.
Appendix: Some auxiliary results
In this section, we collect some auxiliary lemmas used in the body of the paper. The following result can be found in [1]:
Lemma 5.6
Let \(\delta >0, 1\le p<\infty \) and \(1\le r\le \infty \). Suppose \(F\in L^{p}([\delta ,\infty [)\) is a locally absolutely continuous nonnegative function, \(G\in L^{r}([\delta ,\infty [)\) and
for almost every \(t>\delta \). Then \(\lim _{t\rightarrow \infty }F(t)=0\).
To establish the weak convergence of the solutions of (1), we will use Opial's Lemma [30], which we recall in its continuous form. This argument was first used in [16] to establish the convergence of nonlinear contraction semigroups.
Lemma 5.7
Let S be a nonempty subset of \({{\mathcal {H}}}\) and let \(x:[0,+\infty )\rightarrow {{\mathcal {H}}}\). Assume that
- (i) for every \(z\in S\), \(\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert \) exists;
- (ii) every weak sequential limit point of x(t), as \(t\rightarrow \infty \), belongs to S.
Then x(t) converges weakly as \(t\rightarrow \infty \) to a point in S.
Its discrete version is
Lemma 5.8
Let S be a nonempty subset of \({{\mathcal {H}}}\), and let \((x_k)\) be a sequence of elements of \({{\mathcal {H}}}\). Assume that
- (i) for every \(z\in S\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists;
- (ii) every weak sequential limit point of \((x_k)\), as \(k\rightarrow \infty \), belongs to S.
Then \(x_k\) converges weakly as \(k\rightarrow \infty \) to a point in S.
The following allows us to establish the existence of a limit for a real-valued function, as \(t\rightarrow +\infty \):
Lemma 5.9
Let \(\delta >0\), and let \(w: [\delta , +\infty [ \rightarrow \mathbb R\) be a continuously differentiable function which is bounded from below. Assume
for some \(\alpha > 1\), almost every \(t>\delta \), and some nonnegative function \(g\in L^1 (\delta , +\infty )\). Then, the positive part \([{{\dot{w}}}]_+\) of \({{\dot{w}}}\) belongs to \(L^1(\delta ,+\infty )\) and \(\lim _{t\rightarrow +\infty }w(t)\) exists.
Proof
Multiply (81) by \(t^{\alpha -1}\) to obtain
By integration, we obtain
Hence,
and so,
Applying Fubini’s Theorem, we deduce that
As a consequence,
Finally, the function \(\theta :[\delta ,+\infty )\rightarrow {\mathbb {R}}\), defined by
is nonincreasing and bounded from below. It follows that
exists. \(\square \)
In the study of the corresponding algorithms, we use the following result, which is a discrete version of Lemma 5.9:
Lemma 5.10
Let \(\alpha \ge 3\), and let \((a_k)\) and \((\omega _k)\) be two sequences of nonnegative numbers such that
for all \(k\ge 1\). If \(\sum _{k\in {{\mathbb {N}}}} k\omega _k <+\infty \), then \(\sum _{k \in {{\mathbb {N}}}} a_k < +\infty \).
Proof
Since \(\alpha \ge 3\) we have \(\alpha -1\ge 2\), and hence
Multiplying this expression by \((k+1)^2\), we obtain
Summing this inequality for \(j=1,2,\ldots ,k\), we obtain
Dividing by \(k^2\), and summing for \(k=2,\ldots ,K\), we obtain
Applying Fubini’s Theorem to this last sum, and observing that \(\sum _{k=j+1}^{\infty }\frac{1}{k^2} \le \int _{j}^{\infty }\frac{1}{t^2}dt = \frac{1}{j}\), we obtain
and the result follows. \(\square \)
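The series–integral comparison invoked in the proof can be checked numerically. The following snippet (an illustration we add here, not part of the original proof) verifies the bound for a few values of j; truncating the series can only decrease the left-hand side, so the check is conservative:

```python
# Illustrative numerical check (added here, not part of the proof) of the
# series-integral comparison used above:
#     sum_{k=j+1}^{oo} 1/k^2  <=  int_j^oo dt/t^2  =  1/j.
# Truncating the series at N only decreases the left-hand side, so the
# truncated check is conservative.
N = 100_000
checks = {j: sum(1.0 / k**2 for k in range(j + 1, N + 1)) for j in (1, 2, 10, 100)}
for j, tail in checks.items():
    print(j, tail, "<=", 1.0 / j)
```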
The following is a continuous version of Kronecker’s Theorem for series (see, for example, [23, page 129]):
Lemma 5.11
Take \(\delta >0\), and let \(f \in L^1 (\delta , +\infty )\) be nonnegative and continuous. Consider a nondecreasing function \(\psi :(\delta ,+\infty )\rightarrow (0,+\infty )\) such that \(\lim \limits _{t\rightarrow +\infty }\psi (t)=+\infty \). Then,
Proof
Given \(\epsilon >0\), fix \(t_\epsilon \) sufficiently large so that
Then, for \(t \ge t_\epsilon \), split the integral \(\int _{\delta }^t \psi (s)f(s) ds\) into two parts to obtain
Now let \(t\rightarrow +\infty \) to deduce that
Since this is true for any \(\epsilon >0\), the result follows. \(\square \)
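A concrete instance of Lemma 5.11 (an example of our own choosing, added for illustration) is \(f(s)=1/s^2\) and \(\psi (s)=s\) on \((1,+\infty )\), for which the weighted average \(\frac{1}{\psi (t)}\int _1^t \psi (s)f(s)\,ds = \frac{\ln t}{t}\) indeed tends to 0:

```python
import math

# A concrete instance of the lemma (our own example): take f(s) = 1/s**2,
# which is nonnegative, continuous and integrable on (1, +oo), and the
# nondecreasing weight psi(s) = s.  Then
#     (1/psi(t)) * int_1^t psi(s) f(s) ds = (1/t) * int_1^t ds/s = ln(t)/t,
# which tends to 0 as t -> +oo.
values = [math.log(t) / t for t in (10.0, 100.0, 1000.0, 1_000_000.0)]
print(values)
```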
Using the previous result, we also obtain the following vector-valued version of Lemma 5.9:
Lemma 5.12
Take \(\delta >0\), and let \(F \in L^1 (\delta , +\infty ; {{\mathcal {H}}})\) be continuous. Let \(x:[\delta , +\infty [ \rightarrow {{\mathcal {H}}}\) be a solution of
with \(\alpha >1\). Then, x(t) converges strongly in \({{\mathcal {H}}}\) as \(t \rightarrow + \infty \).
Proof
As in the proof of Lemma 5.9, multiply (82) by \(t^{\alpha -1}\) and integrate to obtain
Integrate again to deduce that
Fubini’s Theorem applied to the last integral gives
Finally, apply Lemma 5.11 to the last integral with \(\psi (s)=s^{\alpha -1}\) and \(f(s)=\Vert F(s)\Vert \) to conclude that all the terms in the right-hand side of (83) have a limit as \(t \rightarrow + \infty \). \(\square \)
Finally, in the analysis of the solutions of the perturbed system, we shall use the following Gronwall–Bellman Lemma (see [15, Lemme A.5]):
Lemma 5.13
Let \(m:[\delta ,T]\rightarrow [0,+\infty [\) be integrable, and let \(c\ge 0\). Suppose \(w:[\delta ,T]\rightarrow {\mathbb {R}}\) is continuous and
for all \(t\in [\delta ,T]\). Then, \(\displaystyle {|w(t)| \le c + \int _{\delta }^t m(\tau ) d\tau }\) for all \(t\in [\delta ,T]\).
We shall also make use of the following discrete version of the preceding result:
Lemma 5.14
Let \((a_k)\) be a sequence of nonnegative numbers such that
for all \(k\in {{\mathbb {N}}}\), where \((\beta _j )\) is a summable sequence of nonnegative numbers, and \(c\ge 0\). Then, \(\displaystyle a_k \le c + \sum \nolimits _{j=1}^{\infty } \beta _j\) for all \(k\in {{\mathbb {N}}}\).
Proof
For \(k\in {{\mathbb {N}}}\), set \(A_k := \max _{1\le m \le k} a_m \). Then, for \(1\le m\le k\), we have
Taking the maximum over \(1\le m\le k\), we obtain
Bounding \(A_k\) by the largest root of the corresponding quadratic equation, we obtain the result. \(\square \)
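The discrete Gronwall bound can also be checked numerically. The displayed hypothesis of Lemma 5.14 is not reproduced in this extraction; in line with the quadratic-roots argument of the proof, the sketch below ASSUMES it has the form \(a_k^2 \le c^2 + \sum _{j=1}^{k} \beta _j a_j\), builds the extremal sequence that saturates this inequality, and checks the stated conclusion:

```python
# Numerical sanity check of the discrete Gronwall bound.  The displayed
# hypothesis is not reproduced in this extraction; in line with the
# quadratic-roots argument of the proof, we ASSUME it reads
#     a_k**2 <= c**2 + sum_{j=1}^{k} beta_j * a_j.
# We build the extremal sequence saturating this inequality and check the
# conclusion  a_k <= c + sum_j beta_j  for every k.
c = 2.0
beta = [1.0 / (j + 1) ** 2 for j in range(50)]   # summable, nonnegative
a = []
for k in range(50):
    # largest a_k allowed:  a_k**2 - beta_k * a_k - s <= 0,
    # where s = c**2 + sum_{j<k} beta_j * a_j
    s = c ** 2 + sum(beta[j] * a[j] for j in range(k))
    a.append((beta[k] + (beta[k] ** 2 + 4.0 * s) ** 0.5) / 2.0)
bound = c + sum(beta)
print(max(a), "<=", bound)
```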
Cite this article
Attouch, H., Chbani, Z., Peypouquet, J. et al. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018). https://doi.org/10.1007/s10107-016-0992-8
Keywords
- Convex optimization
- Fast convergent methods
- Dynamical systems
- Gradient flows
- Inertial dynamics
- Vanishing viscosity
- Nesterov method