Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators

Attouch, Hedy; Peypouquet, Juan

doi:10.1007/s10107-018-1252-x

Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators

Full Length Paper
Series B
Published: 07 March 2018

Volume 174, pages 391–432, (2019)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Mathematical Programming Submit manuscript

Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators

Download PDF

1511 Accesses
58 Citations
Explore all metrics

Abstract

We study the behavior of the trajectories of a second-order differential equation with vanishing damping, governed by the Yosida regularization of a maximally monotone operator with time-varying index, along with a new Regularized Inertial Proximal Algorithm obtained by means of a convenient finite-difference discretization. These systems are the counterpart to accelerated forward–backward algorithms in the context of maximally monotone operators. A proper tuning of the parameters allows us to prove the weak convergence of the trajectories to zeroes of the operator. Moreover, it is possible to estimate the rate at which the speed and acceleration vanish. We also study the effect of perturbations or computational errors that leave the convergence properties unchanged. We also analyze a growth condition under which strong convergence can be guaranteed. A simple example shows the criticality of the assumptions on the Yosida approximation parameter, and allows us to illustrate the behavior of these systems compared with some of their close relatives.

Fast Continuous Dynamics Inside the Graph of Maximally Monotone Operators

Article 03 February 2023

Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Article Open access 27 June 2024

Convergence Rate of Inertial Proximal Algorithms with General Extrapolation and Proximal Coefficients

Article 29 April 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Let ${\mathcal {H}}$ be a real Hilbert space endowed with the scalar product $\langle \cdot ,\cdot \rangle $ and norm $\Vert \cdot \Vert $. Given a maximally monotone operator $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$, we study the asymptotic behavior, as time goes to $+\infty $, of the trajectories of the second-order differential equation

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) +A_{\lambda (t)}(x(t)) = 0, \qquad t>t_0>0, \end{aligned}$$

(1)

where

$$\begin{aligned} A_{\lambda } = \frac{1}{\lambda } \left( I- \left( I + \lambda A \right) ^{-1} \right) \end{aligned}$$

stands for the Yosida regularization of A of index $\lambda >0$ (see “Appendix A.1” for its main properties), along with a new Regularized Inertial Proximal Algorithm obtained by means of a convenient finite-difference discretization of (1). The design of rapidly convergent dynamics and algorithms to solve monotone inclusions of the form

$$\begin{aligned} 0\in Ax \end{aligned}$$

(2)

is a difficult problem of fundamental importance in optimization, equilibrium theory, economics and game theory, partial differential equations, statistics, among other subjects. We shall come back to this point shortly. The dynamical systems studied here are closely related to Nesterov’s acceleration scheme, whose rate of convergence for the function values is worst-case optimal. We shall see that, properly tuned, these systems converge to solutions of (2), and do so in a robust manner. We hope to open a broad avenue for further studies concerning related stochastic approximation and optimization methods. Let us begin by putting this in some more context. In all that follows, the set of solutions to (2) will be denoted by S.

1.1 From the heavy ball to fast optimization

Let $\Phi : {\mathcal {H}}\rightarrow \mathbb {R}$ be a continuously differentiable convex function. The heavy ball with friction system

$$\begin{aligned} \mathrm{(HBF)} \qquad \ddot{x}(t)+\gamma \dot{x}(t)+\nabla \Phi (x(t))=0 \end{aligned}$$

(3)

was first introduced, from an optimization perspective, by Polyak [31]. The convergence of the trajectories of (HBF) in the case of a convex function $\Phi $ has been obtained by Álvarez in [1]. In recent years, several studies have been devoted to the study of the Inertial Gradient System $\mathrm{(IGS)_{\gamma }} $, with a time-dependent damping coefficient $\gamma (\cdot )$

$$\begin{aligned} \mathrm{(IGS)_{\gamma }} \qquad \ddot{x}(t)+\gamma (t) \dot{x}(t)+\nabla \Phi (x(t))=0. \end{aligned}$$

(4)

A particularly interesting situation concerns the case $\gamma (t) \rightarrow 0$ of a vanishing damping coefficient. Indeed, as pointed out by Su et al. in [37], the $\mathrm {(IGS)_{\gamma } }$ system with $\gamma (t) = \frac{\alpha }{t}$, namely:

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \nabla \Phi (x(t)) = 0, \end{aligned}$$

(5)

can be seen as a continuous version of the fast gradient method of Nesterov (see [26, 27]), and its widely used successors, such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) of Beck and Teboulle [15]. When $\alpha \ge 3$, the rate of convergence of these methods is $\Phi (x_k)-\min _{\mathcal {H}}\Phi = {\mathcal {O}} \left( \frac{1}{k^2}\right) $, where k is the number of iterations. Convergence of the trajectories generated by (5), and of the sequences generated by Nesterov’s method, has been an elusive question for decades. However, when considering (5) with $\alpha >3$, it was shown by Attouch et al. [6] and May [24], that each trajectory converges weakly to an optimal solution, with the improved rate of convergence $\Phi (x(t))-\min _{\mathcal {H}}\Phi = o (\frac{1}{t^2})$. Corresponding results for the algorithmic case have been obtained by Chambolle and Dossal [20] and Attouch and Peypouquet [8]. Independently, and mainly motivated by applications to partial differential equations and control problems, Jendoubi and May [22] and Attouch and Cabot [3, 4] consider more general time-dependent damping coefficient $\gamma (\cdot )$. The latter includes the corresponding forward–backward algorithms, and unifies previous results. In the case of a convex lower semicontinuous proper function $\Phi : {\mathcal {H}} \rightarrow {\mathbb {R}}\cup + \{\infty \}$, the $\mathrm{(IGS)_{\gamma }}$ dynamic (4) is not well-posed, see [5]. A natural idea is to replace $\Phi $ by its Moreau envelope $\Phi _\lambda $, and consider

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) + \nabla \Phi _{\lambda (t)} (x(t)) = 0 \end{aligned}$$

(6)

(see [14, 29] for details, including the fact that $\nabla \Phi _\lambda =(\partial \Phi )_\lambda $). Fast minimization and convergence properties for the trajectories of (6) and their related algorithms have been recently obtained by Attouch et al. [9]. This also furnishes a viable path towards its extension to the maximally monotone operator setting. It is both useful and natural to consider a time-dependent regularization parameter, as we shall explain now.

1.2 Inertial dynamics and cocoercive operators

In analogy with (3), Álvarez and Attouch [2] and Attouch and Maingé [7] studied the equation

$$\begin{aligned} \ddot{x}(t) + \gamma \dot{x}(t) + A(x(t)) = 0, \end{aligned}$$

(7)

where A is a cocoercive operator. Cocoercivity plays an important role, not only to ensure the existence of solutions, but also in analyzing their long-term behavior. They discovered that it was possible to prove weak convergence to a solution of (2) if the cocoercivity parameter $\lambda $ and the damping coefficient $\gamma $ satisfy $\lambda \gamma ^2 >1$. Taking into account that for $\lambda >0$, the operator $A_{\lambda }$ is $\lambda $-cocoercive and that $A_{\lambda }^{-1} (0) = A^{-1}(0)$ (see “Appendix A.1”), we immediately deduce that, under the condition $\lambda \gamma ^2 >1$, each trajectory of

$$\begin{aligned} \ddot{x}(t) + \gamma \dot{x}(t) + A_{\lambda }(x(t)) = 0 \end{aligned}$$

converges weakly to a zero of A. In the quest for a faster convergence, our analysis of Eq. (1), led us to introduce a time-dependent regularizing parameter $\lambda (\cdot )$ satisfying

$$\begin{aligned} \lambda (t)\times \frac{\alpha ^2}{t^2}>1 \end{aligned}$$

for $t\ge t_0$. A similar condition appears in the study of the corresponding algorithms. We mention that a condition of the type $\lambda (t)\sim t^2$ appears to be critical to obtain fast convergence of the values in the case $A=\partial \Phi $, according to the results in [9]. But system (1) is not just an interesting extension of the heavy ball dynamic. It also arises naturally in stochastic variational analysis.

1.3 Connections with stochastic optimization

Let us present some links bewteen our approach and stochastic optimization.

1.3.1 Stochastic approximation algorithms

A close relationship between stochastic approximation algorithms and inertial dynamics with vanishing damping was established by Cabot et al. in [19]. Let us briefly (and skipping the technicalities) comment on this link and see how it naturally extends to our setting.

When A is a sufficiently smooth operator, stochastic approximation algorithms are frequently used to approximate, with a random version of the Euler’s scheme, the behavior of the ordinary differential equation $\dot{x}(t) + A(x(t))=0$. If A is a general maximally monotone operator, a natural idea is to apply this method to the regularized equation $\dot{x}(t) + A_{\lambda }(x(t))=0$, which has the same equilibrium points, since A and $A_{\lambda }$ have the same set of zeros. Consider first the case where $ \lambda >0$ is a fixed parameter. If we denote by $(X_n)_{n\in \mathbb {N}} $ the random approximants, $(w_n)_{n\ge 1}$ and $(\eta _n)_{n\ge 1}$ two auxiliary stochastic processes, the recursive approximation is written as

$$\begin{aligned} X_{n+1} = X_n -\epsilon _{n+1}A_{\lambda }(X_n,w_{n+1}) + \epsilon _{n+1}\eta _{n+1}, \end{aligned}$$

(8)

where $\epsilon _{n}$ is a sequence of positive real numbers, and $\eta _n$ is a small residual perturbation. Under appropriate conditions (see [19]), solutions of (8) asymptotically behave like those of the deterministic differential equation $\dot{x}(t) + A_{\lambda }(x(t))=0$. A very common case occurs when $(w_n)_{n\ge 1}$ is a sequence of independent identically distributed variables with distribution $\mu $ and

$$\begin{aligned} A_{\lambda }(x) =\int A_{\lambda }(x, \omega ) \mu (d\omega ). \end{aligned}$$

When $A= \partial \Phi $ derives from a potential, this gives a stochastic optimization algorithm (see [32]). When the random variable $A_{\lambda }(X_n,\cdot )$ has a large variance, the stochastic approximation of $A_{\lambda }(X_n)$ by $A_{\lambda }(X_n,w_{n+1})$ can be numerically improved by using the following modified recursive definition:

$$\begin{aligned} X_{n+1} = X_n -\frac{\epsilon _{n+1}}{\sum \epsilon _i}\sum _i \epsilon _i A_{\lambda }(X_i,w_{i+1}) . \end{aligned}$$

(9)

As proved in [19, Appendix A], the limit ordinary differential equation has the form

$$\begin{aligned} \ddot{X}(t) + \frac{1}{t + b} \dot{X}(t) + \frac{1}{t + b}A_{\lambda } (X(t))=0, \end{aligned}$$

(10)

where b is a positive parameter. After the successive changes of the time variable $t \mapsto t-b$, $t \mapsto 2\sqrt{t}$, we finally obtain

$$\begin{aligned} \ddot{X}(t) + \frac{1}{t} \dot{X}(t) + A_{\lambda } (X(t))=0, \end{aligned}$$

(11)

which is precisely (1) with $\alpha =1$ and $\lambda (t)\equiv \lambda $.

This opens a number of possible lines of research: First, it would be interesting to see how the coefficient $ \alpha $ will appear in the stochastic case. Next, it appears important to understand the connection between (10) and (11) with a time-dependent coefficient $ \lambda (\cdot ) $. Combining these two developments, we could be able to extend the stochastic approximation method to a wide range of equilibrium problems, and expect that the fast convergence results to hold, considering that (1) is an accelerated method in the case of convex minimization when $ \alpha \ge 3$. In the case of a gradient operator, the above results also suggest a natural link with the epigraphic law of large numbers of Attouch and Wets [11]. The analysis can be enriched using the equivalence between epi-convergence of a sequence of functions and the convergence of the associated evolutionary semigroups.

1.3.2 Robust optimization, regularization

Suppose we are interested in finding a zero of an operator A, which is not exactly known (a situation that is commonly encountered in inverse problems, for instance). If the uncertainty follows a known distribution, then the stochastic approximation approach described above may be applicable. Otherwise, if A is known to be sufficiently close to some model ${\hat{A}}$ (in a sense to be precised below), an alternative is to use robust analysis techniques, interpreting A as a perturbation of ${\hat{A}}$. Regularization techniques (like Tikhonov’s, where the operator A is replaced by a regularized operator $A+ \rho B$) follow a similar logic. We recall the notion of graph-distance bewteen two operators, introduced by Attouch and Wets in [12, 13] (see also [35]). It can be formulated using Yosida approximations as

$$\begin{aligned} d_{\lambda , \rho } (A, {\hat{A}}) = \sup _{\Vert x\Vert \le \rho } \Vert A_{\lambda }(x) - {\hat{A}}_{\lambda }(x)\Vert , \end{aligned}$$

where $\rho $ is a positive parameter. These pseudo-distances are particularly well adapted to our dynamics, which is expressed using Yosida approximations. In the case of minimization problems, they are associated with the notion of epi-distance. The calculus rules developed in [12, 13] make it possible to estimate these distances in the case of operators with an additive or composite structure. In Sect. 4, the convergence of the algorithm is examined in the case of errors that go asymptotically to zero. A more general situation, where approximation and iterative methods are coupled in the presence of noise, is an ongoing research topic. See [25, 36, 38] for recent results in this direction.

1.4 Organization of the paper

In Sect. 2, we study the asymptotic convergence properties of the trajectories of the continuous dynamics (1). We prove that each trajectory converges weakly, as $t\rightarrow +\infty $, to a solution of (2), with $\Vert \dot{x}(t)\Vert =O(1/t)$ and $\Vert \ddot{x}(t)\Vert =O(1/t^2)$. Then, in Sect. 3, we consider the corresponding proximal-based inertial algorithms and prove parallel convergence results. The effect of external perturbations on both the continuous-time system and the corresponding algorithms is analyzed in Sect. 4, yielding the robustness of both methods. In Sect. 5, we describe how convergence is improved under a quadratic growth condition. Finally, in Sect. 6, we give an example that shows the sharpness of the results and illustrates the behavior of (1) compared to other related systems. Some auxiliary technical results are gathered in an “Appendix”.

2 Convergence results for the continuous-time system

In this section, we shall study the asymptotic properties of the continuous-time system:

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) +A_{\lambda (t)}(x(t)) = 0,\qquad t>t_0>0. \end{aligned}$$

(12)

Although some of the results presented in this paper are valid when $ \lambda (\cdot ) $ is locally integrable, we shall assume $ \lambda (\cdot )$ to be continuous, for simplicity. The function $(t,x)\mapsto A_{\lambda (t)}(x)$ is continuous in (t, x), and uniformly Lipschitz-continuous with respect to x. This makes (12) a classical differential equation, whose solutions are unique, and defined for all $t\ge t_0$, for every initial condition $x(t_0)=x_0$, $\dot{x}(t_0)=v_0$.^{Footnote 1} See Lemma (A.1) for the corresponding result in the case where $\lambda (\cdot )$ is just locally integrable.

The main result of this section is the following:

Theorem 2.1

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator such that $S= A^{-1} (0)\ne \emptyset $. Let $x:[t_0,+\infty [\rightarrow {\mathcal {H}}$ be a solution of (12), where

$$\begin{aligned} \alpha>2\quad \hbox {and}\quad \lambda (t) = (1+\epsilon ) \frac{t^2}{\alpha ^2} \quad \hbox {for some}\quad \epsilon > \frac{2}{\alpha -2}. \end{aligned}$$

Then, x(t) converges weakly, as $t\rightarrow +\infty $, to an element of S. Moreover $\lim _{t \rightarrow + \infty } \Vert \dot{x}(t)\Vert =\lim _{t \rightarrow + \infty } \Vert \ddot{x}(t) \Vert =0.$

2.1 Anchor

As a fundamental tool we will use the distance to equilibria in order to anchor the trajectory to the solution set $S=A^{-1}(0)$. To this end, given a solution trajectory $x:[t_0,+\infty [\rightarrow {\mathcal {H}}$ of (12), and a point $z\in {\mathcal {H}}$, we define $h_z:[t_0,+\infty [ \rightarrow {\mathbb {R}}$ by

$$\begin{aligned} h_z(t)=\frac{1}{2}\Vert x(t)-z\Vert ^2. \end{aligned}$$

(13)

We have the following:

Lemma 2.2

Given $z\in S= A^{-1} (0)$, define $h_z$ by (13). For all $t \ge t_0$, we have

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 \le \Vert \dot{x}(t)\Vert ^2. \end{aligned}$$

(14)

Proof

First observe that

$$\begin{aligned} \dot{h}_z(t) = \langle x(t) - z , \dot{x}(t) \rangle \qquad \hbox {and}\qquad \ddot{h}_z(t) = \langle x(t) - z , \ddot{x}(t) \rangle + \Vert \dot{x}(t) \Vert ^2. \end{aligned}$$

By (12), it ensues that

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \langle A_{\lambda (t)}(x(t)), x(t) - z \rangle = \Vert \dot{x}(t) \Vert ^2 . \end{aligned}$$

(15)

Since $z\in S = A^{-1} (0)$, we have $A_{\lambda (t)}(z)=0$, and the $\lambda (t)$-cocoercivity of $A_{\lambda (t)}$ gives

$$\begin{aligned} \langle A_{\lambda (t)}(x(t)), x(t) - z \rangle \ge \lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 . \end{aligned}$$

(16)

It suffices to combine (15) and (16) to conclude. $\square $

As suggested in 1.2, we are likely to need a growth assumption on $\lambda (t)$ −related to $\lambda (t) \times \left( \frac{\alpha }{t}\right) ^2>1$ − in order to go further. Whence, the following result:

Lemma 2.3

Given $z\in S= A^{-1} (0)$, define $h_z$ by (13) and let

$$\begin{aligned} \lambda (t) \frac{\alpha ^2}{t^2} \ge 1+ \epsilon \end{aligned}$$

(17)

for some $\epsilon >0$. Then, for each $z\in S= A^{-1} (0)$ and all $t \ge t_0$, we have

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \epsilon \Vert \dot{x}(t)\Vert ^2 + \frac{\alpha \,\lambda (t)}{t} \frac{d}{dt} \Vert \dot{x}(t)\Vert ^2 + \lambda (t) \Vert \ddot{x}(t)\Vert ^2 \le 0. \end{aligned}$$

(18)

Proof

By (12), we have

$$\begin{aligned} A_{\lambda (t)}(x(t)) = -\ddot{x}(t) - \frac{\alpha }{t} \dot{x}(t). \end{aligned}$$

Replacing $A_{\lambda (t)}(x(t))$ by this expression in (14), we obtain

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \lambda (t) \left\| \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) \right\| ^2 \le \Vert \dot{x}(t)\Vert ^2. \end{aligned}$$

(19)

After expanding the third term on the left-hand side of this expression, we obtain

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \left( \lambda (t) \frac{\alpha ^2}{t^2} -1 \right) \Vert \dot{x}(t)\Vert ^2 + \alpha \frac{\lambda (t)}{t} \frac{d}{dt} \Vert \dot{x}(t)\Vert ^2 + \lambda (t) \Vert \ddot{x}(t)\Vert ^2 \le 0. \end{aligned}$$

The result follows from (17). $\square $

2.2 Speed and acceleration decay

We now focus on the long-term behavior of the speed and acceleration. We have the following:

Proposition 2.4

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator such that $S= A^{-1} (0)\ne \emptyset $. Let $x:[t_0,+\infty [\rightarrow {\mathcal {H}}$ be a solution of (12), where

$$\begin{aligned} \alpha>2\quad \hbox {and}\quad \lambda (t) = (1+\epsilon ) \frac{t^2}{\alpha ^2} \quad \hbox {for some}\quad \epsilon > \frac{2}{\alpha -2}. \end{aligned}$$

Then, the trajectory $x(\cdot )$ is bounded, $\Vert \dot{x}(t)\Vert ={\mathcal {O}}(1/t)$, $\Vert \ddot{x}(t)\Vert ={\mathcal {O}}(1/t^2)$ and

$$\begin{aligned} \int _{t_0}^{+\infty } t\Vert \dot{x}(t)\Vert ^2\,dt<+\infty . \end{aligned}$$

Proof

As before, take $z\in S= A^{-1} (0)$ and define $h_z$ by (13). First we simplify the writing of Eq. (18), given in Lemma 2.3. By setting $g(t) = \Vert \dot{x}(t)\Vert ^2$, and $\beta = \frac{1 + \epsilon }{\alpha }$, we have

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \epsilon g(t) + \beta t \dot{g}(t)+ \lambda (t) \Vert \ddot{x}(t)\Vert ^2 \le 0. \end{aligned}$$

(20)

Neglecting the positive term $\lambda (t) \Vert \ddot{x}(t)\Vert ^2$ and multiplying by t, we obtain

$$\begin{aligned} t\ddot{h}_z(t) + \alpha \dot{h}_z(t) + \epsilon t g(t) + \beta t^2 \dot{g}(t) \le 0. \end{aligned}$$

(21)

Integrate this inequality from $t_0$ to t. We have

$$\begin{aligned} \int _{t_0}^t \left( s\ddot{h}_z(s) + \alpha \dot{h}_z(s)\right) ds&= t\dot{h}_z(t)-t_0\dot{h}_z(t_0) -\int _{t_0}^t \dot{h}_z(s)ds + \alpha h_z(t) - \alpha h_z(t_0) \\&= t\dot{h}_z(t) +(\alpha -1)h_z(t) -t_0\dot{h}_z(t_0)-(\alpha -1)h_z(t_0)\\ \int _{t_0}^t \left( \epsilon s g(s) + \beta s^2 \dot{g}(s) \right) ds&= \beta t^2 g(t) - \beta {t_0}^2 g(t_0) - 2 \beta \int _{t_0}^t s g(s) ds + \epsilon \int _{t_0}^t s g(s) ds\\&= \beta t^2 g(t) - \beta {t_0}^2 g(t_0) + (\epsilon - 2 \beta )\int _{t_0}^t s g(s) ds . \end{aligned}$$

Adding the two above expressions, we obtain

$$\begin{aligned} t\dot{h}_z(t) +(\alpha -1)h_z(t) + \beta t^2 g(t) + (\epsilon - 2 \beta )\int _{t_0}^t s g(s) ds \le C \end{aligned}$$

(22)

for some positive constant C that depends only on the initial data. By the definition of $\beta = \frac{1 + \epsilon }{\alpha }$, we have $\epsilon -2 \beta = \frac{\alpha -2}{\alpha }\left( \epsilon - \frac{2}{\alpha -2} \right) $. Since $\alpha >2$ and $\epsilon > \frac{2}{\alpha -2}$, we observe that $\epsilon - 2 \beta >0$. Hence, (22) gives

$$\begin{aligned} t\dot{h}_z(t) +(\alpha -1)h_z(t) \le C. \end{aligned}$$

(23)

Multiply this expression by $t^{\alpha -2}$ to obtain

$$\begin{aligned} \frac{d}{dt} t^{\alpha -1}h(t) \le Ct^{\alpha -2}. \end{aligned}$$

Integrating from $t_0$ to t, we obtain

$$\begin{aligned} h_z(t) \le \frac{C}{\alpha -1} + \frac{D}{t^{\alpha -1}}, \end{aligned}$$

for some other positive constant D. As a consequence, $h_z(\cdot )$ is bounded, and so is the trajectory $x(\cdot )$. Set

$$\begin{aligned} M := \sup _{t\ge t_0} \Vert x(t)\Vert < + \infty , \end{aligned}$$

and note that

$$\begin{aligned} |\dot{h}_z(t)| = |\langle x(t) - z , \dot{x}(t) \rangle |\le \Vert x(t) - z\Vert \Vert \dot{x}(t) \Vert \le ( M + \Vert z\Vert )\Vert \dot{x}(t)\Vert . \end{aligned}$$

(24)

Combining (22) with (24) we deduce that

$$\begin{aligned} \beta \big (t\Vert \dot{x}(t)\Vert \big )^2 \le C + ( M + \Vert z\Vert )\,\big (t\Vert \dot{x}(t)\Vert \big ). \end{aligned}$$

This immediately implies that

$$\begin{aligned} \sup _{t\ge t_0} t\Vert \dot{x}(t)\Vert < + \infty . \end{aligned}$$

(25)

This gives

$$\begin{aligned} \Vert \dot{x}(t)\Vert ={\mathcal {O}}(1/t). \end{aligned}$$

(26)

From (12), we have

$$\begin{aligned} \Vert \ddot{x}(t) \Vert \le \frac{\alpha }{t} \Vert \dot{x}(t) \Vert + \Vert A_{\lambda (t)}(x(t))\Vert \le \frac{\alpha }{t} \Vert \dot{x}(t) \Vert + \frac{M+\Vert z\Vert }{\lambda (t)}. \end{aligned}$$

(27)

Using (26) and the definition of $\lambda (t)$, we conclude that

$$\begin{aligned} \Vert \ddot{x}(t)\Vert ={\mathcal {O}}(1/t^2). \end{aligned}$$

(28)

Finally, returning to (22), and using (24) and (25) we infer that

$$\begin{aligned} \int _{t_0}^{+ \infty } t \Vert \dot{x}(t)\Vert ^2 dt < + \infty , \end{aligned}$$

(29)

which completes the proof. $\square $

2.3 Proof of Theorem 2.1

We are now in a position to prove the convergence of the trajectories to equilibria. According to Opial’s Lemma A.2, it suffices to verify that $\lim _{t\rightarrow + \infty }\Vert x(t)-z\Vert $ exists for each $z\in S$, and that every weak limit point of x(t), as $t\rightarrow +\infty $, belongs to S.

We begin by proving that $\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert $ exists for every $z\in S$. By Lemma 2.2, we have

$$\begin{aligned} t\ddot{h}_z(t) + \alpha \dot{h}_z(t) + t\lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 \le t\Vert \dot{x}(t)\Vert ^2. \end{aligned}$$

(30)

Since $ h_z$ is nonnegative, and in view of (29), Lemma A.6 shows that

$$\begin{aligned} \lim _{t\rightarrow + \infty } h_z (t) \end{aligned}$$

exists. It follows that $\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert $ exists for every $z\in S$.

From Lemma A.6 and (30), we also have

$$\begin{aligned} \int _{t_0}^{+ \infty } t \lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 dt < + \infty . \end{aligned}$$

(31)

Since $\lambda (t) = ct^2$ for some positive constant c we infer

$$\begin{aligned} \int _{t_0}^{+ \infty } \Vert \lambda (t) A_{\lambda (t)}(x(t)) \Vert ^2 \frac{1}{t}dt < + \infty . \end{aligned}$$

(32)

The central point of the proof is to show that this property implies

$$\begin{aligned} \lim _{t\rightarrow + \infty } \Vert \lambda (t) A_{\lambda (t)}(x(t)) \Vert =0. \end{aligned}$$

(33)

Suppose, for a moment, that this property holds. Then the end of the proof follows easily: let $t_n \rightarrow + \infty $ such that $x(t_n) \rightarrow \bar{x}$ weakly. We have $\lambda (t_n) A_{\lambda (t_n)}(x(t_n)) \rightarrow 0$ strongly. Since $\lambda (t_n)\rightarrow + \infty $, we also have $ A_{\lambda (t_n)}(x(t_n)) \rightarrow 0$ strongly. Passing to the limit in

$$\begin{aligned} A_{\lambda (t_n)}(x(t_n)) \in A (x(t_n) - \lambda (t_n) A_{\lambda (t_n)}(x(t_n))) \end{aligned}$$

and using the demi-closedness of the graph of A, we obtain

$$\begin{aligned} 0 \in A (\bar{x}). \end{aligned}$$

In other words, $\bar{x} \in S$, and we conclude by Opial’s Lemma.

As a consequence, it suffices to prove (33). To obtain this result, we shall estimate the variation of the function $t \mapsto \lambda (t) A_{\lambda (t)} $. By applying Lemma A.4 with $\gamma = \lambda (t)$, $\delta = \lambda (s)$, $x= x(t)$ and $y = y(s)$ with $s, t \ge t_0$, we obtain

$$\begin{aligned} \Vert \lambda (t) A_{\lambda (t)}x(t) - \lambda (s) A_{\lambda (s)}x(s)\Vert \le 2 \Vert x(t)-x(s) \Vert + 2 \Vert x(t)-z \Vert \frac{|\lambda (t) - \lambda (s) |}{\lambda (t)} \end{aligned}$$

(34)

for each fixed $z\in S$. Dividing by $t-s$ with $t\ne s$, and letting s tend to t, we deduce that

$$\begin{aligned} \left\| \frac{d}{dt}\big (\lambda (t) A_{\lambda (t)}x(t)\big )\right\| \le 2 \Vert \dot{x}(t)\Vert + 2 \Vert x(t)-z \Vert \frac{|\dot{\lambda }(t) |}{\lambda (t)} \end{aligned}$$

for almost every $t>t_0$. According to Proposition 2.4, the trajectory $x(\cdot )$ is bounded by some $M\ge 0$. In turn,

$$\begin{aligned} \Vert \dot{x}(t)\Vert \le \frac{C}{t}, \end{aligned}$$

for some $C\ge 0$ and all $t\ge t_0$, by (25). Finally, the definition of $\lambda (t)$ implies

$$\begin{aligned} \frac{|\dot{\lambda }(t)|}{\lambda (t)}=\frac{2}{t} \end{aligned}$$

for all $t\ge t_0$. As a consequence,

$$\begin{aligned} \left\| \frac{d}{dt}(\lambda (t) A_{\lambda (t)}x(t))\right\| \le \frac{2C+4(M+\Vert z\Vert )}{t}. \end{aligned}$$

This property, along with the boundedness of $\lambda (t) A_{\lambda (t)}x(t)$, and estimation (32), together imply that the nonnegative function $w(t):= \Vert \lambda (t) A_{\lambda (t)}x(t)\Vert ^2$ satisfies

$$\begin{aligned} \left| \frac{d}{dt} w(t)\right| \le \eta (t) \end{aligned}$$

for almost every $t>t_0$, and

$$\begin{aligned} \int _{t_0}^{+ \infty } w(t)\,\eta (t)\,dt < + \infty , \end{aligned}$$

where

$$\begin{aligned} \eta (t)=\frac{2C+4(M+\Vert z\Vert )}{t}. \end{aligned}$$

Noting that $\eta \notin L^1 (t_0, +\infty )$, we conclude thanks to Lemma A.5.

3 Proximal-based inertial algorithms with regularized operator

In this section, we introduce the inertial proximal algorithm which results from the discretization with respect to time of the continuous system

$$\begin{aligned} \ddot{x}(t)+\frac{\alpha }{t} \dot{x}(t)+A_{\lambda (t)}(x(t))=0, \end{aligned}$$

(35)

where $A_{\lambda (t)}$ is the Yosida approximation of index $\lambda (t)$ of the maximally monotone operator A. Further insight into the relationship between continuous- and discrete-time systems in variational analysis can be found in [30].

3.1 A regularized inertial proximal algorithm

We shall obtain an implementable algorithm by means of an implicit discretization of (35) with respect to time. Note that, in view of the Lipschitz continuity property of the operator $A_{\lambda }$, the explicit discretization might work well too. We choose to discretize it implicitely for two reasons: implicit discretizations tend to follow the continuous-time trajectories more closely; and the explicit discretization has the same iteration complexity (they each need one resolvent computation per iteration). Taking a fixed time step $h>0$, and setting $t_k= kh$, $x_k = x(t_k)$, $\lambda _k = \lambda (t_k)$, an implicit finite-difference scheme for (35) with centered second-order variation gives

$$\begin{aligned} \frac{1}{h^2}(x_{k+1} -2 x_{k} + x_{k-1} ) +\frac{\alpha }{kh^2}( x_{k} - x_{k-1}) + A_{\lambda _k} (x_{k+1}) =0. \end{aligned}$$

(36)

After expanding (36), we obtain

$$\begin{aligned} x_{k+1} + h^2 A_{\lambda _k} (x_{k+1}) = x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1}). \end{aligned}$$

(37)

Setting $s=h^2$, we have

$$\begin{aligned} x_{k+1} = \left( I + s A_{\lambda _k}\right) ^{-1} \left( x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1})\right) , \end{aligned}$$

(38)

where $\left( I + s A_{\lambda _k}\right) ^{-1} $ is the resolvent of index $s>0$ of the maximally monotone operator $A_{\lambda _k}$. This gives the following algorithm

$$\begin{aligned} \left\{ \begin{array}{ll} y_k = x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1}) \\ x_{k+1} = \left( I + s A_{\lambda _k}\right) ^{-1} \left( y_k \right) . \end{array}\right. \end{aligned}$$

(39)

Using equality (84):

$$\begin{aligned} \left( A_{\lambda }\right) _s = A_{\lambda + s}, \end{aligned}$$

we can reformulate this last equation as

$$\begin{aligned} \left( I + s A_{\lambda }\right) ^{-1} = \frac{\lambda }{\lambda +s}I + \frac{s}{\lambda +s}\left( I + (\lambda +s) A\right) ^{-1}. \end{aligned}$$

Hence, (39) can be rewritten as

$$\begin{aligned} \text{(RIPA) } \qquad \left\{ \begin{array}{ll} y_k= \displaystyle {x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1})} \\ x_{k+1} = \displaystyle {\frac{\lambda _k}{\lambda _k +s}y_k + \frac{s}{\lambda _k +s}\left( I + (\lambda _k +s) A\right) ^{-1} \left( y_k \right) }, \end{array}\right. \end{aligned}$$

where (RIPA) stands for the Regularized Inertial Proximal Algorithm. Let us reformulate (RIPA) in a compact way. Observe that

$$\begin{aligned} x_{k+1}&= \frac{\lambda _k}{\lambda _k +s}y_k + \frac{s}{\lambda _k +s}\left( I + (\lambda _k +s) A\right) ^{-1} \left( y_k \right) \\&= \left( 1- \frac{s}{\lambda _k +s} \right) y_k + \frac{s}{\lambda _k +s}\left( I + (\lambda _k +s) A\right) ^{-1} \left( y_k \right) \\&= y_k -s \frac{1}{\lambda _k +s} \left( y_k - \left( I + (\lambda _k +s) A\right) ^{-1} (y_k) \right) . \end{aligned}$$

By the definition of $A_{\lambda _k + s}$, this is

$$\begin{aligned} x_{k+1} = y_k -s A_{\lambda _k + s}\left( y_k \right) . \end{aligned}$$

(40)

Thus, setting $\alpha _k = 1- \frac{\alpha }{k}$, (RIPA) is just

$$\begin{aligned} \left\{ \begin{array}{ll} y_k= \displaystyle {x_{k} + \alpha _k( x_{k} - x_{k-1})} \\ x_{k+1} = y_k -s A_{\lambda _k + s}\left( y_k \right) , \end{array}\right. \end{aligned}$$

(41)

Remark 3.1

Letting $\lambda _k \rightarrow 0$ in (41), we obtain the classical form of the inertial proximal algorithm

$$\begin{aligned} \left\{ \begin{array}{ll} y_k= x_{k} + \alpha _k ( x_{k} - x_{k-1}) \\ x_{k+1} = (I + sA)^{-1} (y_k). \end{array}\right. \end{aligned}$$

(42)

The case $0\le \alpha _k \le \bar{\alpha }<1$ has been considered by Álvarez and Attouch in [2]. The case $\alpha _k \rightarrow 1$, which is the most interesting for obtening fast methods (in the line of Nesterov’s accelerated methods), and the one that we are concerned with, was recently studied by Attouch and Cabot [3, 4]. In these two papers, the convergence is obtained under the restrictive assumption

$$\begin{aligned} \sum _k \alpha _k \Vert x_{k+1} - x_{k} \Vert ^2 <+ \infty . \end{aligned}$$

By contrast, our approach, which supposes that $\lambda _k \rightarrow +\infty $, provides convergence of the trajectories, without any restrictive assumption on the trajectories. Let us give a geometrical interpretation of (RIPA). As a classical property of the resolvents ([14, Theorem 23.44]), for any $x\in {\mathcal {H}}$, $J_{\lambda }x \rightarrow \text{ proj }_S (x)$ as $\lambda \rightarrow + \infty $. Thus the algorithm writes

$$\begin{aligned} x_{k+1} = \theta _k y_k + (1-\theta _k) J_{\lambda _k +s }\left( y_k\right) \end{aligned}$$

with $\lambda _k \sim + \infty $, $\theta _k = \frac{\lambda _k}{\lambda _k +s} \sim 1$, and $J_{\lambda _k +s }\left( y_k\right) \sim \text{ proj }_S (y_k)$. This is illustrated in the following picture.

Remark 3.2

As a main difference with (42), (RIPA) contains an additional momentum term $\frac{\lambda _k}{\lambda _k +s}y_k $, which enters the definition of $x_{k+1}$. Although there is some similarity, this is different from the algorithm introduced by Kim and Fessler in [23], which also contains an additional momentum term, but which comes within the definition of $y_k$. It is important to mention that, in both cases, the introduction of this momentum term implies additional operations whose cost is negligible.

3.2 Preliminary estimations

Given $z\in {\mathcal {H}}$ and $k\ge 1$, write

$$\begin{aligned} h_{z,k} := \frac{1}{2} \Vert x_k - z\Vert ^2. \end{aligned}$$

(43)

Since there will be no risk of confusion, we shall write $h_k$ for $h_{z,k}$ to simplify the notation. The following result is valid for an arbitrary sequence $ \alpha _k \ge 0$:

Lemma 3.3

Let $\alpha _k \ge 0$. For any $z \in {\mathcal {H}}$, the following holds for all $k \ge 1$:

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) - \langle x_{k+1} - y_k , x_{k+1} - z \rangle \nonumber \\&\quad + \frac{1}{2} \Vert x_{k+1} - y_k \Vert ^2 = \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(44)

Proof

Since $y_k= x_{k} + \alpha _k( x_{k} - x_{k-1})$, we have

$$\begin{aligned} \Vert y_k -z\Vert ^2&= \Vert (x_{k} -z)+ \alpha _k( x_{k} - x_{k-1}) \Vert ^2 \nonumber \\&= \Vert x_{k} -z \Vert ^2 + {\alpha _k}^2 \Vert x_{k} - x_{k-1} \Vert ^2 +2 \alpha _k \langle x_{k} - z , x_{k}-x_{k-1} \rangle ,\nonumber \\&= \Vert x_{k} -z \Vert ^2 + {\alpha _k}^2 \Vert x_{k} - x_{k-1} \Vert ^2 + \alpha _k \Vert x_{k} -z \Vert ^2 \nonumber \\&\quad + \alpha _k\Vert x_{k}-x_{k-1} \Vert ^2 - \alpha _k\Vert x_{k-1} -z \Vert ^2 \nonumber \\&= \Vert x_{k} -z \Vert ^2 + \alpha _k ( \Vert x_{k} -z \Vert ^2 - \Vert x_{k-1} -z \Vert ^2 ) + (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 \nonumber \\&= 2(h_k + \alpha _k (h_{k} - h_{k-1} )) + (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(45)

Combining (45) with the elementary equality

$$\begin{aligned} (h_{k+1} -h_k) - \alpha _k (h_{k} - h_{k-1} ) = \frac{1}{2} \Vert x_{k+1} -z \Vert ^2 -(h_k + \alpha _k (h_{k} - h_{k-1} ) ), \end{aligned}$$

we deduce that

$$\begin{aligned}&(h_{k+1} -h_k) - \alpha _k (h_{k} - h_{k-1} )\\&\quad =\frac{1}{2} \Vert x_{k+1} -z \Vert ^2 - \frac{1}{2} \Vert y_{k} -z \Vert ^2 + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 \\&\quad = \langle x_{k+1} - y_k , \frac{1}{2} ( x_{k+1} +y_k) -z\rangle + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 \\&\quad = \langle x_{k+1} - y_k , ( x_{k+1}-z) + \frac{1}{2}(y_k - x_{k+1})\rangle + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 \\&\quad = \langle x_{k+1} - y_k , x_{k+1} - z \rangle - \frac{1}{2} \Vert x_{k+1} - y_k \Vert ^2 + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 \end{aligned}$$

which gives the claim. $\square $

Let us now use the specific form of the inertial algorithm (RIPA) and the cocoercivity of the operator $A_{\lambda }$. The following result is the discrete counterpart of Lemma 2.2:

Lemma 3.4

Let $S = A^{-1}(0) \ne \emptyset $, and $0 \le \alpha _k \le 1$. For any $z \in S$, the following holds for all $k \ge 1$

$$\begin{aligned} (h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +s \lambda _k \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 \le \alpha _k \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(46)

Proof

By Lemma 3.3 and the fact that $x_{k+1} = y_k -s A_{\lambda _k + s}\left( y_k \right) $ (see (40)), we have

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +s \langle A_{\lambda _k + s}\left( y_k \right) , y_k -s A_{\lambda _k + s}\left( y_k \right) - z \rangle \\&\quad + \frac{s^2}{2} \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 = \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

Since $z\in S$, we have $A_{\lambda _k + s}(z)=0 $. By the $(\lambda _k + s)$-cocoercivity property of $A_{\lambda _k + s}$, we deduce that

$$\begin{aligned} \langle A_{\lambda _k + s}\left( y_k \right) , y_k-z\rangle \ge (\lambda _k + s) \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 . \end{aligned}$$

As a consequence,

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +s \left( \lambda _k + \frac{s}{2} \right) \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 \\&\quad \le \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

But $\frac{1}{2} (\alpha _k + {\alpha _k}^2 ) \le \alpha _k$ because $0 \le \alpha _k \le 1$. Since $s\ge 0$, the result follows immediately. $\square $

It would be possible to continue the analysis assuming the right-hand side of (46) is summable. The main disadvantage of this hypothesis is that it involves the trajectory $(x_k)$, which is unknown. In the following lemma, we show that the two antagonistic terms $s(\lambda _k + \frac{s}{2} ) \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 $ and $ \alpha _k \Vert x_{k} - x_{k-1} \Vert ^2 $ can be balanced, provided $\lambda _k$ is taken large enough.

Lemma 3.5

Let $ S = A^{-1}(0) \ne \emptyset $, and take $\alpha _k =1 -\frac{\alpha }{k}$ with $\alpha >2$. For each $k \ge 1$, set

$$\begin{aligned} \lambda _k = (1+ \epsilon )\frac{s}{\alpha ^2}k^2, \end{aligned}$$

(47)

for some $\epsilon >0$, and write $\beta = \frac{1 + \epsilon }{\alpha } $. Then, for each $z \in S$ and all $k \ge 1$, we have

$$\begin{aligned} (h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \epsilon \Vert x_{k} - x_{k-1} \Vert ^2 + \beta k\left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 \right) {\!\le \,} 0. \end{aligned}$$

(48)

Proof

First, rewrite (46) as

$$\begin{aligned} (h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \frac{\lambda _k }{s} \Vert x_{k+1} - y_k \Vert ^2 \le \alpha _k \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(49)

Let us write $\Vert x_{k+1} - y_k \Vert ^2$ in a recursive form. To this end, we use the specific form of $\alpha _k = 1- \frac{\alpha }{k}$ to obtain

$$\begin{aligned} \Vert x_{k+1} - y_k \Vert ^2&= \Vert (x_{k+1} - x_{k}) - \alpha _k( x_{k} - x_{k-1}) \Vert ^2 \\&= \Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )+ (1-\alpha _k)( x_{k} - x_{k-1}) \Vert ^2 \\&= \left\| (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )+ \frac{\alpha }{k}( x_{k} - x_{k-1}) \right\| ^2 \\&=\Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )\Vert ^2 + \frac{\alpha ^2}{k^2}\Vert x_{k} - x_{k-1} \Vert ^2 \\&\quad +\,2 \frac{\alpha }{k} \langle (x_{k+1} - x_{k}) - (x_{k} - x_{k-1}), x_{k} - x_{k-1} \rangle . \end{aligned}$$

But

$$\begin{aligned} \frac{1}{2}\Vert x_{k+1} - x_{k}\Vert ^2= & {} \frac{1}{2} \Vert x_{k} - x_{k-1} \Vert ^2 + \langle (x_{k+1} - x_{k}) - (x_{k} - x_{k-1}), x_{k} - x_{k-1} \rangle \\&\quad + \frac{1}{2}\Vert (x_{k+1} - x_{k})-(x_{k} - x_{k-1}) \Vert ^2 . \end{aligned}$$

By combining the above equalities, we get

$$\begin{aligned} \Vert x_{k+1} - y_k \Vert ^2&=\left( 1- \frac{\alpha }{k}\right) \Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )\Vert ^2 + \frac{\alpha ^2}{k^2}\Vert x_{k} - x_{k-1} \Vert ^2 \\&\quad +\frac{\alpha }{k} \left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 . \right) \end{aligned}$$

Using this equality in (49), and neglecting the nonnegative term $(1- \frac{\alpha }{k})\Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )\Vert ^2$, we obtain

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) \nonumber \\&\quad +\,\frac{\lambda _k \alpha ^2}{sk^2} \Vert x_{k} - x_{k-1} \Vert ^2 + \frac{\lambda _k \alpha }{sk} \left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 \right) \le \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(50)

Using (47) and the definition $\beta = \frac{1 + \epsilon }{\alpha }$, inequality (50) becomes (48). $\square $

3.3 Main convergence result

We are now in position to prove the main result of this section, namely:

Theorem 3.6

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator such that $S= A^{-1} (0)\ne \emptyset $. Let $(x_k)$ be a sequence generated by the Regularized Inertial Proximal Algorithm

$$\begin{aligned} \text{(RIPA) } \qquad \left\{ \begin{array}{ll} y_k= \displaystyle {x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1})} \\ x_{k+1} = \displaystyle {\frac{\lambda _k}{\lambda _k +s}y_k + \frac{s}{\lambda _k +s}\left( I + (\lambda _k +s) A\right) ^{-1} \left( y_k \right) }, \end{array}\right. \end{aligned}$$

where $\alpha >2$ and

$$\begin{aligned} \lambda _k = (1+ \epsilon )\frac{s}{\alpha ^2}k^2 \end{aligned}$$

for some $\epsilon > \frac{2}{\alpha -2}$ and all $k \ge 1$. Then,

(i)
The speed tends to zero. More precisely, $\Vert x_{k+1} - x_{k} \Vert = {\mathcal {O}} (\frac{1}{k})$ and $\sum _k k\Vert x_{k} - x_{k-1} \Vert ^2 < +\infty $.
(ii)
The sequence $(x_k)$ converges weakly, as $k\rightarrow +\infty $, to some ${\hat{x}}\in S$.
(iii)
The sequence $(y_k)$ converges weakly, as $k\rightarrow +\infty $, to ${\hat{x}}$.

Proof

First, we simplify the writing of the Eq. (48) given in Lemma 3.5. Setting $g_k:= \Vert x_{k} - x_{k-1} \Vert ^2$, we have

$$\begin{aligned} (h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \epsilon g_k + \beta k\left( g_{k+1}- g_k\right) \le 0. \end{aligned}$$

Then, we multiply by k to obtain

$$\begin{aligned} k(h_{k+1} - h_k )- (k-\alpha ) (h_{k} - h_{k-1} ) + \epsilon kg_k + \beta k^2\left( g_{k+1}- g_k\right) \le 0. \end{aligned}$$

We now write these inequalities in a recursive form, in order to simplify their summation. We have

$$\begin{aligned}&k(h_{k+1} - h_k )- (k-1) (h_{k} - h_{k-1} ) +(\alpha -1) (h_{k} - h_{k-1} ) + \epsilon kg_k \\&\quad +\, \beta k^2 g_{k+1}- \beta (k-1)^2 g_{k} -\beta (2k-1)g_k \le 0. \end{aligned}$$

Summing for $p=1,\dots , k$, we obtain

$$\begin{aligned} k(h_{k+1} - h_k ) +(\alpha -1) h_{k} + (\epsilon -2 \beta ) \sum _1^k pg_p + \beta k^2 g_{k+1} +\beta \sum _1^k g_p \le C \end{aligned}$$

(51)

for some positive constant C that depends only on the initial data. Since $\beta = \frac{1 + \epsilon }{\alpha }$ with $\alpha >2$ and $\epsilon > \frac{2}{\alpha -2}$, we have $\epsilon -2 \beta = \frac{\alpha -2}{\alpha }\left( \epsilon - \frac{2}{\alpha -2} \right) >0$. From (51) we infer that

$$\begin{aligned} k(h_{k+1} - h_k ) +(\alpha -1) h_{k} \le C \end{aligned}$$

(52)

for all $k\ge 1$. Since $\alpha >2$ and $h_k\ge 0$, (52) implies

$$\begin{aligned} k(h_{k+1} - h_k ) + h_{k} \le C . \end{aligned}$$

Equivalently,

$$\begin{aligned} kh_{k+1} - (k-1)h_k \le C . \end{aligned}$$

Applying this fact recursively, we deduce that

$$\begin{aligned} kh_{k+1} \le Ck, \end{aligned}$$

which immediately gives $\sup _k \ h_k < +\infty $. Therefore, the sequence $(x_k)$ is bounded. Set

$$\begin{aligned} M := \sup _k \Vert x_k \Vert < +\infty . \end{aligned}$$

Now, (51) also implies that

$$\begin{aligned} k(h_{k+1} - h_k ) + \beta k^2 g_{k+1} \le C \end{aligned}$$

(53)

But

$$\begin{aligned} |h_{k+1} - h_k |&= \frac{1}{2} \left| \Vert x_{k+1} - z \Vert ^2- \Vert x_{k} - z \Vert ^2 \right| = \left| \left\langle x_{k+1} - x_{k}, \frac{1}{2} ( x_{k+1}+ x_{k} ) -z\right\rangle \right| \nonumber \\&\le (M+ \Vert z\Vert )\Vert x_{k+1} - x_{k} \Vert . \end{aligned}$$

(54)

Combining this inequality with (53), and recalling the definition $g_k=\Vert x_{k} - x_{k-1} \Vert ^2$, we deduce that

$$\begin{aligned} \beta \big [k \Vert x_{k+1} - x_{k} \Vert \big ]^2 -(M+ \Vert z\Vert )\,\big [k\Vert x_{k+1} - x_{k} \Vert \big ]-C \le 0. \end{aligned}$$

This immediately implies that

$$\begin{aligned} \sup _k k\Vert x_{k+1} - x_{k} \Vert < +\infty . \end{aligned}$$

(55)

In other words, $\Vert x_{k+1} - x_{k} \Vert = {\mathcal {O}} (\frac{1}{k})$. Another consequence of (51) is that

$$\begin{aligned} (\epsilon -2 \beta ) \sum _{p=1}^k p\Vert x_{p} - x_{p-1} \Vert ^2 \le C - k(h_{k+1} - h_k ). \end{aligned}$$

By (54), we deduce that

$$\begin{aligned} (\epsilon -2 \beta ) \sum _{p=1}^k p\Vert x_{p} - x_{p-1} \Vert ^2 \le C +k(M+ \Vert z\Vert )\Vert x_{k+1} - x_{k} \Vert . \end{aligned}$$

Then, (55) gives

$$\begin{aligned} \sum _k k\Vert x_{k} - x_{k-1} \Vert ^2 < +\infty , \end{aligned}$$

(56)

which completes the proof of item i).

For the convergence of the sequence $(x_k)$, we use Opial’s Lemma A.3. First, since $\alpha _k=1-\frac{\alpha }{k}\le 1$, Lemma 3.4 gives

$$\begin{aligned} (h_{k+1} - h_k)+s \lambda _k \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 \le \left( 1-\frac{\alpha }{k}\right) (h_{k} - h_{k-1}) +\Vert x_{k} - x_{k-1} \Vert ^2 \end{aligned}$$

for all $k \ge 1$. Using (56) and invoking Lemma A.7, we deduce that

$$\begin{aligned} \sum _k k\lambda _k \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 < + \infty . \end{aligned}$$

(57)

and

$$\begin{aligned} \sum _k [h_{k} - h_{k-1}]_+< + \infty . \end{aligned}$$

Since $h_k$ is nonnegative, this implies the existence of $\lim _{k\rightarrow +\infty }h_k$, and also that of $\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert $.

In order to conclude using Opial’s Lemma A.3, it remains to show that every weak limit point of the sequence $(x_k)$, as $k\rightarrow +\infty $, belongs to S. We begin by expressing (57) with respect to $x_k$, instead of $y_k$. We have

$$\begin{aligned} \Vert A_{\lambda _k +s}\left( x_k \right) \Vert ^2&\le 2\Vert A_{\lambda _k +s}\left( y_k \right) \Vert ^2 +2 \Vert A_{\lambda _k +s}\left( x_k \right) -A_{\lambda _k +s}\left( y_k \right) \Vert ^2 \\&\le 2\Vert A_{\lambda _k +s}\left( y_k \right) \Vert ^2 + \frac{2}{(\lambda _k +s)^2}\Vert y_k - x_k\Vert ^2 \\&\le 2\Vert A_{\lambda _k +s}\left( y_k \right) \Vert ^2 + \frac{2}{\lambda _k ^2}\Vert x_k - x_{k-1}\Vert ^2 , \end{aligned}$$

where the last inequality follows from the definition of $y_k$ given in (39). Using (55) and the definition of $\lambda _k$, we may find a constant $D\ge 0$ such that

$$\begin{aligned} \Vert A_{\lambda _k +s}\left( x_k \right) \Vert ^2&\le 2\Vert A_{\lambda _k +s}\left( y_k \right) \Vert ^2 + \frac{D}{k^2\lambda _k^2}. \end{aligned}$$

Hence,

$$\begin{aligned} \sum _k k\lambda _k \Vert A_{\lambda _k + s}\left( x_k \right) \Vert ^2 \le 2 \sum _k k\lambda _k\Vert A_{\lambda _k +s}\left( y_k \right) \Vert ^2 + \sum _k \frac{D}{k\lambda _k}. \end{aligned}$$

Now, using (57) and the definition of $\lambda _k$, we conclude that

$$\begin{aligned} \sum _k k\lambda _k \Vert A_{\lambda _k + s}\left( x_k \right) \Vert ^2 < +\infty . \end{aligned}$$

Since $\lambda _k$ tends to infinity, this immediately implies

$$\begin{aligned} \sum _k k(\lambda _k +s)\Vert A_{\lambda _k + s}\left( x_k \right) \Vert ^2 < +\infty . \end{aligned}$$

To simplify the notations, set $\mu _k = \lambda _k +s$, so that

$$\begin{aligned} \sum _k k\mu _k \Vert A_{\mu _k}\left( x_k \right) \Vert ^2 < +\infty . \end{aligned}$$

(58)

As we shall see, this fact implies

$$\begin{aligned} \lim _{k\rightarrow +\infty } \Vert \mu _k A_{{\mu }_k}\left( x_k \right) \Vert = 0. \end{aligned}$$

(59)

Suppose, for a moment, that this is true. Let $(x_{k_n})_n$ be a subsequence of $(x_k)$ which converges weakly to some $\bar{x}$. We want to prove that $\bar{x} \in S = A^{-1}(0)$. Since $\mu _k$ tends to infinity, we also have

$$\begin{aligned} \lim _{k\rightarrow +\infty } \Vert A_{\mu _k}\left( x_k \right) \Vert = 0. \end{aligned}$$

Passing to the limit in

$$\begin{aligned} A_{{\mu _{k}}_n}\left( x_{k_n} \right) \in A (x_{k_n} - \mu _{k_n} A_{\mu _{k_n}}\left( x_{k_n} \right) ), \end{aligned}$$

and using the demi-closedness of the graph of A, we obtain

$$\begin{aligned} 0 \in A (\bar{x}). \end{aligned}$$

In other words, $\bar{x} \in S$. As a consequence, it only remains to prove (59) in order to obtain ii) by Opial’s Lemma. To this end, define

$$\begin{aligned} \omega _k := \Vert \mu _k A_{\mu _k}\left( x_k \right) \Vert ^2. \end{aligned}$$

We intend to prove that $\lim _{k\rightarrow +\infty }\omega _k=0$. Using (58) and the definition of $\mu _k$, we deduce that

$$\begin{aligned} \sum _k \frac{1}{k} \omega _k < + \infty . \end{aligned}$$

(60)

Therefore, if $\lim _{k\rightarrow +\infty }\omega _k$ exists, it must be zero. Since

$$\begin{aligned} \Vert \mu _k A_{\mu _k}\left( x_k \right) \Vert = \Vert x_k -J_{\mu _kA}(x_k) \Vert \le \Vert x_k -z\Vert \le M+\Vert z\Vert , \end{aligned}$$

we have

$$\begin{aligned} |\omega _{k+1} -\omega _k |= & {} \left| \Vert \mu _{k+1} A_{\mu _{k+1}}\left( x_{k+1} \right) \Vert ^2 -\Vert \mu _k A_{\mu _k}\left( x_k \right) \Vert ^2 \right| \nonumber \\\le & {} 2\big (M+\Vert z\Vert \big ) \Vert \mu _{k+1} A_{\mu _{k+1}}\left( x_{k+1} \right) -\mu _k A_{\mu _k}\left( x_k \right) \Vert . \end{aligned}$$

(61)

On the other hand, by Lemma A.4 we have

$$\begin{aligned} \Vert \mu _{k+1} A_{\mu _{k+1}}\left( x_{k+1} \right) -\mu _k A_{\mu _k}\left( x_k \right) \Vert&\le 2 \Vert x_{k+1} - x_{k} \Vert + 2 \Vert x_{k+1}-z \Vert \frac{|\mu _{k+1} - \mu _{k} |}{\mu _{k+1}} \\&\le 2 \Vert x_{k+1} - x_{k} \Vert + 2 \Vert x_{k+1}-z \Vert \frac{|\lambda _{k+1} - \lambda _{k} |}{\lambda _{k+1}} \\&\le 2 \Vert x_{k+1} - x_{k} \Vert + \frac{4(M+\Vert z\Vert )}{k+1}, \end{aligned}$$

by the definition of $\lambda _k$. Using (55) and replacing the resulting inequality in (61), we deduce that there is a constant $E\ge 0$ such that

$$\begin{aligned} |\omega _{k+1} -\omega _k |\le \frac{E}{k}. \end{aligned}$$

But then

$$\begin{aligned} \sum _k |(\omega _{k+1})^2 -(\omega _k)^2 |\le E\sum _k \frac{1}{k}(\omega _{k+1}+\omega _k)<+\infty \end{aligned}$$

by (60). It follows that $\lim _{k\rightarrow +\infty }\omega _k^2$ exists and, since $\omega _k\ge 0$, $\lim _{k\rightarrow +\infty }\omega _k$ exists as well. This completes the proof of item ii). Finally, item iii) follows from the fact that $\lim _k\Vert x_{k+1}-y_k\Vert =\lim _k\Vert s A_{\lambda _k + s}\left( y_k \right) \Vert =0$. $\square $

3.4 An application to convex–concave saddle value problems

As shown by Rockafellar [33], to each closed convex–convave function $L: X\times Y \rightarrow \bar{\mathbb {R}}$ acting on the product of two Hilbert spaces X and Y is associated a maximally monotone operator $M: X\times Y \rightrightarrows X\times Y $ which is given by $M= (\partial _{x} L, -\partial _{y} L)$. This makes it possible to convert convex–concave saddle value problems into the search for the zeros of a maximally monotone operator, and thus to apply our results. Let’s illustrate it in the case of the convex constrained structured minimization problem

$$\begin{aligned} \mathrm{(P)} \quad \min \left\{ f(x) + g(y) : Ax-By =0 \right\} , \end{aligned}$$

where data satisfy the following assumptions:

X, Y, Z are real Hilbert spaces
$f : X \rightarrow \mathbb {R} \cup \{+\infty \}$ and $g: Y \rightarrow \mathbb {R} \cup \left\{ +\infty \right\} $ are closed convex proper functions.
$A : X \rightarrow Z$ and $B : Y \rightarrow Z$ are linear continuous operators.

Let us first reformulate (P) as a saddle value problem

$$\begin{aligned} \min _{(x,y)\in X \times Y} \max _{z\in Z } \left\{ f(x) + g(y) + \left\langle z, Ax - By \right\rangle \right\} . \end{aligned}$$

(62)

The Lagrangian $L : X \times Y \times Z \rightarrow \mathbb {R} \cup \{+\infty \}$ associated to (62)

$$\begin{aligned} L(x, y, z) = f(x) + g(y) + \left\langle z, Ax - By \right\rangle \end{aligned}$$

is a convex–concave extended-real-valued function. The maximal monotone operator $ M: X \times Y \times Z \rightrightarrows X \times Y \times Z$ that is associated to L is given by

$$\begin{aligned} M(x, y, z) = \left( \partial _{x,y}L, -\partial _{z}L \right) (x, y, z) = \left( \partial f(x) + A^t z, \ \partial g(y) - B^t z, \ By- Ax \right) . \end{aligned}$$

(63)

When the proximal algorithm is applied to the maximally monotone operator M, we obtain the so-called proximal method of multipliers. This method was initiated by Rockafellar [34]. By combining this method with the alternating proximal minimization algorithm for weakly coupled minimization problems, a fully split method is obtained. This approach was successfully developed by Attouch and Soueycatt in [10]. Introducing inertial terms in this algorithm, as given by (RIPA), is a subject of further study, which is part of the active research on the acceleration of the (ADMM) algorithms.

4 Stability with respect to perturbations, errors

In this section, we discuss the stability of the convergence results with respect to external perturbations. We first consider the continuous case, then the corresponding algorithmic results.

4.1 The continuous case

The continuous dynamics is now written in the following form

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) +A_{\lambda (t)}(x(t)) = f(t), \end{aligned}$$

(64)

where, depending on the context, f can be interpreted as a source term, a perturbation, or an error. We suppose that f is locally integrable to ensure existence and uniqueness for the corresponding Cauchy problem (see Lemma A.1 in the “Appendix”). Assuming that f(t) tends to zero fast enough as $t \rightarrow + \infty $, we will see that the convergence results proved in Sect. 2 are still valid for the perturbed dynamics (64). Due to its similarity to the unperturbed case, we give the main lines of the proof, just highlighting the differences.

Theorem 4.1

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator such that $S= A^{-1} (0)\ne \emptyset $. Let $x:[t_0,+\infty [\rightarrow {\mathcal {H}}$ be a solution of the continuous dynamic (64), where $\alpha >2$ and $\lambda (t) = (1+\epsilon ) \frac{t^2}{\alpha ^2}$ with $\epsilon > \frac{2}{\alpha -2}$. Assume also that $\int _{t_0}^{+\infty } t^3 \Vert f(t)\Vert ^2 dt < + \infty $ and $\int _{t_0}^{+\infty } t \Vert f(t)\Vert dt < + \infty $. Then, x(t) converges weakly, as $t\rightarrow +\infty $, to an element of S. Moreover $\Vert \dot{x}(t)\Vert = {\mathcal {O}} (1/t)$.

Proof

First, a similar computation as in Lemma 2.2 gives

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 \le \Vert \dot{x}(t)\Vert ^2 + \Vert x(t)-z\Vert \Vert f(t)\Vert . \end{aligned}$$

(65)

Following the arguments in the proof of Lemma 2.3, we use (64), then develop and simplify (65) to obtain

$$\begin{aligned} \ddot{h}_z(t) + \frac{\alpha }{t} \dot{h}_z(t) + \epsilon g(t) + \beta t \dot{g}(t) \le \Vert x(t)-z\Vert \Vert f(t)\Vert + 2\beta t\Vert \dot{x}(t)\Vert \Vert f(t)\Vert , \end{aligned}$$

(66)

where, as in the proof of Proposition 2.4, we have set $g(t) = \Vert \dot{x}(t)\Vert ^2$. Using the fact that for any $0<\theta <1$

$$\begin{aligned} 2\beta t\Vert \dot{x}(t)\Vert \Vert f(t)\Vert \le \epsilon \theta \Vert \dot{x}(t)\Vert ^2 + \frac{\beta ^2}{\epsilon \theta }t^2 \Vert f(t)\Vert ^2 , \end{aligned}$$

and multiplying by t, we obtain

$$\begin{aligned} t\ddot{h}_z(t) + \alpha \dot{h}_z(t) + \epsilon (1- \theta ) t g(t) + \beta t^2 \dot{g}(t)\le t\Vert x(t)-z\Vert \Vert f(t)\Vert + \frac{\beta ^2}{\epsilon \theta }t^3 \Vert f(t)\Vert ^2 . \end{aligned}$$

(67)

Integration from $t_0$ to t yields

$$\begin{aligned}&t\dot{h}_z(t) +(\alpha -1)h_z(t) + \beta t^2 g(t) + \left( \epsilon (1-\theta ) - 2 \beta \right) \int _{t_0}^t s g(s) ds \le C \nonumber \\&\quad + \int _{t_0}^t s\Vert x(s)-z\Vert \Vert f(s)\Vert ds + \frac{\beta ^2}{\epsilon \theta } \int _{t_0}^t s^3 \Vert f(s)\Vert ^2 ds \end{aligned}$$

(68)

for some positive constant C that depends only on the initial data. In all that follows, C is a generic notation for a constant. By the definition $\beta = \frac{1 + \epsilon }{\alpha }$ and the assumptions and the parameters $\alpha $ and $\epsilon $, we can choose $\theta $ positive small enough to get $\epsilon (1-\theta )-2\beta >0$. Taking into account also the hypothesis $\int _{t_0}^{+\infty } t^3 \Vert f(t)\Vert ^2 dt < + \infty $, we deduce that

$$\begin{aligned} t\dot{h}_z(t) +(\alpha -1)h_z(t) \le C + \int _{t_0}^t s\Vert x(s)-z\Vert \Vert f(s)\Vert ds . \end{aligned}$$

Multiply this expression by $t^{\alpha -2}$, integrate from $t_0$ to t, and use Fubini’s Theorem to obtain

$$\begin{aligned} \frac{1}{2}\Vert x(t)-z\Vert ^2 \le C + \frac{1}{\alpha -1}\int _{t_0}^t t\Vert x(t)-z\Vert \Vert f(t)\Vert dt . \end{aligned}$$

The main difference with Sect. 2 is here. We apply Gronwall’s Lemma (see [16, Lemma A.5]) to get

$$\begin{aligned} \Vert x(t)-z\Vert \le C + \frac{1}{\alpha -1}\int _{t_0}^t t \Vert f(t)\Vert dt . \end{aligned}$$

Since $\int _{t_0}^{+\infty } t \Vert f(t)\Vert dt < + \infty $, we deduce that the trajectory $x(\cdot )$ is bounded. The rest of the proof is essentially the same. First, we obtain

$$\begin{aligned} \sup _{t\ge t_0} t\Vert \dot{x}(t)\Vert < + \infty , \end{aligned}$$

by bounding that quantity between the roots of a quadratic expression. Then, we go back to (68) to get that

$$\begin{aligned} \int _{t_0}^{+ \infty } t \Vert \dot{x}(t)\Vert ^2 dt < + \infty . \end{aligned}$$

We use Lemma A.6 to deduce that $\lim _{t\rightarrow + \infty } h_z (t)$ exists and

$$\begin{aligned} \int _{t_0}^{+ \infty } t \lambda (t) \Vert A_{\lambda (t)}(x(t)) \Vert ^2 dt < + \infty . \end{aligned}$$

The latter implies $\lim _{t\rightarrow +\infty }\lambda (t)\Vert A_{\lambda (t)}(x(t))\Vert =0$, and we conclude by means of Opial’s Lemma A.2. $\square $

4.2 The algorithmic case

Let us first consider how the introduction of the external perturbation f into the continuous dynamics modifies the corresponding algorithm. Setting $f_k = f(kh)$, a discretization similar to that of the unperturbed case gives

$$\begin{aligned} \frac{1}{h^2}(x_{k+1} -2 x_{k} + x_{k-1} ) +\frac{\alpha }{kh^2}( x_{k} - x_{k-1}) + A_{\lambda _k} (x_{k+1}) = f_k. \end{aligned}$$

After expanding this expression, and setting $s=h^2$, we obtain

$$\begin{aligned} x_{k+1} + s A_{\lambda _k} (x_{k+1}) = x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1})+sf_k , \end{aligned}$$

which gives

$$\begin{aligned} \left\{ \begin{array}{ll} y_k = x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1}) \\ x_{k+1} = \left( I + s A_{\lambda _k}\right) ^{-1} \left( y_k +sf_k \right) . \end{array}\right. \end{aligned}$$

Using the resolvent Eq. (84), we obtain the Regularized Inertial Proximal Algorithm with perturbation

$$\begin{aligned} \text{(RIPA-pert) } \qquad \left\{ \begin{array}{ll} y_k= \displaystyle {x_{k} + \left( 1- \frac{\alpha }{k}\right) ( x_{k} - x_{k-1})} \\ x_{k+1} = \displaystyle {\frac{\lambda _k}{\lambda _k +s} (y_k + sf_k ) +\frac{s}{\lambda _k +s}J_{(\lambda _k +s)A} \left( y_k +sf_k\right) }. \end{array}\right. \end{aligned}$$

Setting $\alpha _k = 1- \frac{\alpha }{k}$, and with help of the Yosida approximation, this can be written in a compact way as

$$\begin{aligned} \left\{ \begin{array}{ll} y_k= \displaystyle {x_{k} + \alpha _k( x_{k} - x_{k-1})} \\ x_{k+1} = (y_k +sf_k) -s A_{\lambda _k + s}\left( y_k +sf_k\right) . \end{array}\right. \end{aligned}$$

(69)

When $f_k=0$ we recover (RIPA). The convergence of $\text{(RIPA-pert) }$ algorithm is analyzed in the following theorem.

Theorem 4.2

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator such that $S= A^{-1} (0)\ne \emptyset $. Let $(x_k)$ be a sequence generated by the algorithm $\text{(RIPA-pert) }$ where $\alpha >2$ and

$$\begin{aligned} \lambda _k = \left( 1+ \frac{s}{2}+ \epsilon \right) \frac{2s}{\alpha ^2}k^2 \end{aligned}$$

for some $\epsilon > \frac{2 +s}{\alpha -2}$ and all $k \ge 1$. Suppose that $\sum _k k\Vert f_k\Vert < + \infty $ and $\sum _k k^3\Vert f_k\Vert ^2 < + \infty $. Then,

(i)
The speed tends to zero. More precisely, $\Vert x_{k+1} - x_{k} \Vert = {\mathcal {O}} (\frac{1}{k})$ and $\sum _k k\Vert x_{k} - x_{k-1} \Vert ^2 < +\infty $.
(ii)
The sequence $(x_k)$ converges weakly, as $k\rightarrow +\infty $, to some ${\hat{x}}\in S$.
(iii)
The sequence $(y_k)$ converges weakly, as $k\rightarrow +\infty $, to ${\hat{x}}$.

Proof

Let us observe that the definitions of $y_k$ and $h_k$ are the same as in the unperturbed case. Hence, it is only when using the constitutive equation $x_{k+1} = (y_k +sf_k) -s A_{\lambda _k + s}\left( y_k +sf_k\right) $, which contains the perturbation term, that changes occur in the proof. Thus, the beginning of the proof and Lemma 3.3 is still valid, which gives

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) - \langle x_{k+1} - y_k , x_{k+1} - z \rangle \nonumber \\&\quad +\, \frac{1}{2} \Vert x_{k+1} - y_k \Vert ^2 = \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(70)

The next step, which corresponds to Lemma 3.4, uses the constitutive equation. Let us adapt it to our situation. By (70) and $x_{k+1} - y_k = s(f_k - A_{\lambda _k + s}\left( y_k +sf_k\right) )$, it follows that

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +s \langle A_{\lambda _k + s}\left( y_k +sf_k \right) \nonumber \\&\quad -\,f_k, (y_k +sf_k -z)-s A_{\lambda _k + s}\left( y_k +sf_k\right) \rangle \nonumber \\&\quad +\, \frac{s^2}{2} \Vert A_{\lambda _k + s}\left( y_k +sf_k\right) -f_k\Vert ^2 = \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(71)

Since $z\in S$, we have $A_{\lambda _k + s}(z)=0 $. By the $(\lambda _k + s)$-cocoercivity property of $A_{\lambda _k + s}$, we deduce that

$$\begin{aligned} \langle A_{\lambda _k + s}\left( y_k +sf_k\right) , y_k +sf_k-z\rangle \ge (\lambda _k + s) \Vert A_{\lambda _k + s}\left( y_k +sf_k \right) \Vert ^2 . \end{aligned}$$

Using the above inequality in (71), and after development and simplification, we obtain

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} )+s \left( \lambda _k + \frac{s}{2} \right) \Vert A_{\lambda _k + s}\left( y_k +sf_k \right) \Vert ^2 \nonumber \\&\quad \le \frac{s^2}{2}\Vert f_k \Vert ^2 + s\langle y_k -z , f_k\rangle + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(72)

From $ A_{\lambda _k + s}\left( y_k +sf_k \right) = \frac{1}{s}( y_k +sf_k -x_{k+1})$ it follows that

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +\frac{1}{s} \left( \lambda _k + \frac{s}{2} \right) \Vert (y_k -x_{k+1})+ sf_k \Vert ^2 \\&\quad \le \frac{s^2}{2}\Vert f_k \Vert ^2 + s\langle y_k -z , f_k\rangle + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

Using the elementary inequality $ \Vert y_k -x_{k+1} \Vert ^2 \le 2 \Vert (y_k -x_{k+1})+ sf_k \Vert ^2 + 2\Vert sf_k \Vert ^2$, we deduce that

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +\frac{1}{2s} \left( \lambda _k + \frac{s}{2} \right) \Vert x_{k+1}-y_k \Vert ^2 \\&\quad \le \frac{s^2}{2}\Vert f_k \Vert ^2 +s\left( \lambda _k + \frac{s}{2} \right) \Vert f_k \Vert ^2 + s\langle y_k -z , f_k\rangle + \frac{1}{2} (\alpha _k + {\alpha _k}^2 )\Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

Since $\frac{1}{2} (\alpha _k + {\alpha _k}^2 ) \le \alpha _k \le 1$, $s>0$, and by using Cauchy–Schwarz inequality, it follows that

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +\frac{\lambda _k}{2s} \Vert x_{k+1} -y_k \Vert ^2 \nonumber \\&\quad \le s(s+\lambda _k)\Vert f_k \Vert ^2 + s\Vert y_k -z\Vert \Vert f_k\Vert + \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(73)

By the definition of $y_k$ and elementary inequalities we have

$$\begin{aligned}&\Vert y_k -z\Vert \Vert f_k\Vert \le \Vert x_k -z\Vert \Vert f_k\Vert + \Vert x_k -x_{k-1}\Vert \Vert f_k\Vert \le \Vert x_k -z\Vert \Vert f_k\Vert \\&\quad +\, \frac{1}{2}\Vert f_k\Vert ^2 + \frac{1}{2}\Vert x_k -x_{k-1}\Vert ^2 . \end{aligned}$$

Combining this inequality with (73) we obtain

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) +\frac{\lambda _k}{2s} \Vert x_{k+1} -y_k \Vert ^2 \nonumber \\&\quad \le s\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + s\Vert x_k -z\Vert \Vert f_k\Vert \nonumber \\&\qquad +\, (1+\frac{s}{2}) \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(74)

Let us write $\Vert x_{k+1} - y_k \Vert ^2$ in a recursive form. The same computation as in Lemma 3.5 gives

$$\begin{aligned} \Vert x_{k+1} - y_k \Vert ^2&=\left( 1- \frac{\alpha }{k}\right) \Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )\Vert ^2 \\&\quad +\, \frac{\alpha ^2}{k^2}\Vert x_{k} - x_{k-1} \Vert ^2 + \frac{\alpha }{k} \left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 \right) . \end{aligned}$$

Using this equality in (74), and neglecting the nonnegative term $(1- \frac{\alpha }{k})\Vert (x_{k+1} - x_{k}) - (x_{k} - x_{k-1} )\Vert ^2$, we obtain

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \frac{\lambda _k \alpha ^2}{2sk^2} \Vert x_{k} - x_{k-1} \Vert ^2 \nonumber \\&\quad +\, \frac{\lambda _k \alpha }{2sk} \left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 \right) \nonumber \\&\le s\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + s\Vert x_k -z\Vert \Vert f_k\Vert + \left( 1+\frac{s}{2}\right) \Vert x_{k} - x_{k-1} \Vert ^2 . \end{aligned}$$

(75)

Using $ \lambda _k = (1+ \frac{s}{2}+ \epsilon )\frac{2s}{\alpha ^2}k^2$ and the definition $\beta = \frac{1 + \frac{s}{2} +\epsilon }{\alpha }$, inequality (75) becomes

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \epsilon \Vert x_{k} - x_{k-1} \Vert ^2 + \beta k \left( \Vert x_{k+1} - x_{k}\Vert ^2 - \Vert x_{k} - x_{k-1} \Vert ^2 \right) \nonumber \\&\quad \le s\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + s\Vert x_k -z\Vert \Vert f_k\Vert . \end{aligned}$$

(76)

Setting $g_k:= \Vert x_{k} - x_{k-1} \Vert ^2$, we have

$$\begin{aligned}&(h_{k+1} - h_k )- \alpha _k (h_{k} - h_{k-1} ) + \epsilon g_k + \beta k\left( g_{k+1}- g_k\right) \\&\quad \le s\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + s\Vert x_k -z\Vert \Vert f_k\Vert . \end{aligned}$$

Then, we multiply by k to obtain

$$\begin{aligned}&k(h_{k+1} - h_k )- (k-\alpha ) (h_{k} - h_{k-1} ) + \epsilon kg_k + \beta k^2\left( g_{k+1}- g_k\right) \\&\quad \le sk\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + sk\Vert x_k -z\Vert \Vert f_k\Vert . \end{aligned}$$

We now write these inequalities in a recursive form, in order to simplify their summation. We have

$$\begin{aligned}&k(h_{k+1} - h_k )- (k-1) (h_{k} - h_{k-1} ) +(\alpha -1) (h_{k} - h_{k-1} )\\&\quad +\, \epsilon kg_k + \beta k^2 g_{k+1}- \beta (k-1)^2 g_{k} -\beta (2k-1)g_k \\&\quad \le sk\left( \frac{1}{2} +s+\lambda _k\right) \Vert f_k \Vert ^2 + sk\Vert x_k -z\Vert \Vert f_k\Vert . \end{aligned}$$

Summing for $p=1,\dots , k$, we obtain

$$\begin{aligned}&k(h_{k+1} - h_k ) +(\alpha -1) h_{k} + (\epsilon -2 \beta ) \sum _1^k pg_p + \beta k^2 g_{k+1} +\beta \sum _1^k g_p \nonumber \\&\quad \le s\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert + s\sum _1^k p\left( \frac{1}{2} +s+\lambda _p\right) \Vert f_p \Vert ^2. \end{aligned}$$

(77)

Since $\beta = \frac{1 + \frac{s}{2} +\epsilon }{\alpha }$ with $\alpha >2$ and $\epsilon > \frac{2 +s}{\alpha -2}$, we have $\epsilon -2 \beta = \frac{\epsilon (\alpha -2) - (s+2)}{\alpha }>0$. Moreover by the definition of $\lambda _k$ and the assumption $\sum _k k^3\Vert f_k\Vert ^2 < + \infty $, we have $s\sum _1^k p(\frac{1}{2} +s+\lambda _p)\Vert f_p \Vert ^2 \le C$ for some positive constant C. Whence

$$\begin{aligned} k(h_{k+1} - h_k ) +(\alpha -1) h_{k} + \beta k^2 g_{k+1} \le C +\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert \end{aligned}$$

(78)

for all $k\ge 1$. Since $\alpha >2$ and $h_k\ge 0$, (78) implies

$$\begin{aligned} kh_{k+1} - (k-1)h_k \le C +\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert . \end{aligned}$$

By summing the above inequalities, and applying Fubini’s Theorem, we deduce that

$$\begin{aligned} kh_{k+1} \le Ck +\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert (k-p) \le Ck +k\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert . \end{aligned}$$

Hence

$$\begin{aligned} \frac{1}{2} \Vert x_{k+1} - z\Vert ^2 \le C +\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert (k-p) \le C +\sum _1^k p\Vert x_p -z \Vert \Vert f_p\Vert . \end{aligned}$$

Applying the discrete form of the Gronwall’s Lemma A.8, and $\sum _k k\Vert f_k\Vert < + \infty $, we obtain $\sup _k \ h_k < +\infty $. Therefore, the sequence $(x_k)$ is bounded. The remainder of the proof is pretty much as that of Theorem 3.6. We first derive

$$\begin{aligned} \sup _k k\Vert x_{k+1} - x_{k} \Vert < +\infty . \end{aligned}$$

(79)

Then, we combine (77) with (79) to obtain

$$\begin{aligned} \sum _k k\Vert x_{k} - x_{k-1} \Vert ^2 < +\infty . \end{aligned}$$

(80)

Since $\alpha _k=1-\frac{\alpha }{k}\le 1$, inequalities (72) and (80) give

$$\begin{aligned} (h_{k+1} - h_k)+s \left( \lambda _k + \frac{s}{2} \right) \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 \le \left( 1-\frac{\alpha }{k}\right) (h_{k} - h_{k-1}) +\theta _k \end{aligned}$$

for all $k \ge 1$, and $\sum _{k \in \mathbb {N}} k\theta _k < +\infty $. Invoking Lemma A.7, we deduce that $\lim _{k\rightarrow +\infty }h_k$ exists and

$$\begin{aligned} \sum _k k\lambda _k \Vert A_{\lambda _k + s}\left( y_k \right) \Vert ^2 < + \infty . \end{aligned}$$

(81)

We conclude using Opial’s Lemma A.3 as in the unperturbed case. $\square $

Remark 4.3

The perturbation can be interpreted either as a miscomputation of $y_k$ from the two previous iterates, or as an error due to the fact that the resolvent can be computed at a neighboring point $y_k+sf_k$, rather than $y_k$. Anyway, perturbations of order less than $\frac{1}{k^2}$ are admissible and the convergence properties are preserved.

5 Quadratic growth and strong convergence

In this section, we examine the case of a maximally monotone operator A satisfying a quadratic growth property. More precisely, we assume that there is $\nu >0$ such that

$$\begin{aligned} \langle x^*,x-z\rangle \ge \nu \,\hbox {dist}(x,S)^2 \end{aligned}$$

(82)

whenever $x^*\in Ax$ and $z\in S$. If A is strongly monotone, then (82) holds and S is a singleton. Another example is the subdifferential of a convex function $\Phi $ satisfying a quadratic error bound (see [18]). Indeed,

$$\begin{aligned} \langle x^*,x-z\rangle \ge \Phi (x)-\min (\Phi )\ge \nu \,\hbox {dist}(x,S)^2 \end{aligned}$$

if $x^*\in \partial \Phi (x)$ and $z\in S=\hbox {argmin}(\Phi )$. A particular case is when $A=M^*M$, where M is a bounded linear operator with closed range (if $\Phi (x)=\frac{1}{2}\Vert Mx\Vert ^2$, then $\nabla \Phi (x)=M^*Mx$). We have,

$$\begin{aligned} \langle Ax,x-z\rangle =\Vert M(x-z)\Vert ^2\ge \nu \,\hbox {dist}(x,\hbox {Ker}(M))^2 \end{aligned}$$

(see [17, Exercise 2.14]).

We obtain the following convergence result:

Theorem 5.1

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator satisfying (82) for some $\nu >0$ and all $x\in {\mathcal {H}}$ and $z\in S$. Let $x:[t_0,+\infty [\rightarrow {\mathcal {H}}$ be a solution of the continuous dynamic

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) +A_{\lambda (t)}(x(t)) = 0, \end{aligned}$$

where $\alpha >2$ and

$$\begin{aligned} \lambda (t) = (1+\epsilon ) \frac{t^2}{\alpha ^2} \ \text{ with } \ \epsilon > \frac{2}{\alpha -2}. \end{aligned}$$

Then, $\lim _{t\rightarrow +\infty }\hbox {dist}(x(t),S)=0$. If, moreover, $S=\{{\bar{z}}\}$, then x(t) converges strongly to ${\bar{z}}$ as $t\rightarrow +\infty $.

Proof

First, fix $t\ge t_0$ and observe that

$$\begin{aligned} \langle A_{\lambda (t)}(x(t)),x(t)-z\rangle= & {} \langle A_{\lambda (t)}(x(t)),J_{\lambda (t)A}(x(t))-z\rangle \\&\quad +\langle A_{\lambda (t)}(x(t)),x(t)-J_{\lambda (t)A}(x(t))\rangle \\\ge & {} \nu \,\hbox {dist}(J_{\lambda (t)A}(x(t)),S)^2+\frac{1}{\lambda (t)}\Vert x(t)-J_{\lambda (t)A}(x(t))\Vert ^2 \\\ge & {} \nu \,\hbox {dist}(J_{\lambda (t)A}(x(t)),S)^2\\&\quad +\,\frac{1}{\lambda (t)}\Bigg [(1-\zeta )\Vert x(t)-z\Vert ^2\\&\quad +\,\left( 1-\frac{1}{\zeta }\right) \Vert z-J_{\lambda (t)A}(x(t))\Vert ^2\Bigg ] \end{aligned}$$

for all $\zeta >0$ (we shall select a convenient value later on). In turn, the left-hand side satisfies

$$\begin{aligned} \langle A_{\lambda (t)}(x(t)),x(t)-z\rangle\le & {} \Vert A_{\lambda (t)}(x(t))\Vert \,\Vert x(t)-z\Vert \\\le & {} \lambda (t)\Vert A_{\lambda (t)}(x(t))\Vert ^2+\Vert A_{\lambda (t)}(x(t))\Vert \,\Vert J_{\lambda (t)A}(x(t))-z\Vert \\\le & {} \left( \lambda (t)+\frac{\zeta }{2\nu }\right) \Vert A_{\lambda (t)}(x(t))\Vert ^2+\frac{\nu }{2\zeta }\Vert J_{\lambda (t)A}(x(t))-z\Vert ^2. \end{aligned}$$

Since $\Vert x(t)-z\Vert \ge \hbox {dist}(x(t),S)$, by taking z as the projection of $J_{\lambda (t)A}(x(t))$ onto S, and combining the last two inequalities, we obtain

$$\begin{aligned}&\frac{(1-\zeta )}{\lambda (t)}\hbox {dist}(x(t),S)^2+\left( \nu -\frac{\nu }{2\zeta }+\frac{\zeta -1}{\zeta \lambda (t)}\right) \hbox {dist}(J_{\lambda (t)A}(x(t)),S)^2\\&\quad \le \left( \lambda (t)+\frac{\zeta }{2\nu }\right) \Vert A_{\lambda (t)}(x(t))\Vert ^2. \end{aligned}$$

whenever $0<\zeta <1$. Set

$$\begin{aligned} \zeta =\frac{2+\lambda (t)\nu }{2(1+\lambda (t)\nu )}=\frac{1}{2}+\frac{1}{2(1+\lambda (t)\nu )}, \end{aligned}$$

so that

$$\begin{aligned} 0<\frac{1}{2}<\zeta \le \frac{1}{2}+\frac{1}{2(1+\lambda (t_0)\nu )}<1, \end{aligned}$$

and

$$\begin{aligned} \nu -\frac{\nu }{2\zeta }+\frac{\zeta -1}{\zeta \lambda (t)}=\frac{2\zeta \lambda (t)\nu -\lambda (t)\nu +2\zeta -2}{2\zeta \lambda (t)}=\frac{2\zeta (1+\lambda (t)\nu )-(2+\lambda (t)\nu )}{2\zeta \lambda (t)}=0. \end{aligned}$$

It follows that

$$\begin{aligned} \frac{(1-\zeta )}{\lambda (t)}\hbox {dist}(x(t),S)^2\le \left( \lambda (t)+\frac{\zeta }{2\nu }\right) \Vert A_{\lambda (t)}(x(t))\Vert ^2, \end{aligned}$$

and so

$$\begin{aligned} \hbox {dist}(x(t),S)^2\le \left[ \frac{2(1+\lambda (t_0)\nu )}{\lambda (t_0)\nu }\right] \,\lambda (t)\left( \lambda (t)+\frac{1}{2\nu }\right) \Vert A_{\lambda (t)}(x(t))\Vert ^2. \end{aligned}$$

The right-hand side goes to zero by (33). $\square $

A similar result holds for (RIPA), namely:

Theorem 5.2

Let $A: {\mathcal {H}} \rightarrow 2^{{\mathcal {H}}}$ be a maximally monotone operator satisfying (82) for some $\nu >0$ and all $x\in {\mathcal {H}}$ and $z\in S$. Let Let $(x_k)$ be a sequence generated by the algorithm $\text{(RIPA) }$, where $\alpha >2$ and $\lambda _k= (1+\epsilon ) \frac{sk^2}{\alpha ^2}$ with $\epsilon > \frac{2}{\alpha -2}$. Then, $\lim _{k\rightarrow +\infty }\hbox {dist}(x_k,S)=0$. If, moreover, $S=\{{\bar{z}}\}$, then $x_k$ converges strongly to ${\bar{z}}$ as $k\rightarrow +\infty $.

6 Further conclusions from a keynote example

Let us illustrate our results in the case where ${\mathcal {H}} = \mathbb {R}^2$, and A is the counterclockwise rotation centered at the origin and with the angle $\frac{\pi }{2}$, namely $A(x,y)=(-y,x)$. This is a model situation for a maximally monotone operator that is not cocoercive. The linear operator A is antisymmetric, that is, $\langle A(x,y),(x,y)\rangle =0$ for all $(x,y)\in {\mathcal {H}}$.

6.1 Critical parameters

Our results are based on an appropriate tuning of the Yosida approximation parameter. Let us analyze the asymptotic behavior of the solution trajectories of the second-order differential equation

$$\begin{aligned} \ddot{u}(t) + \frac{\alpha }{t} \dot{u}(t) +A_{\lambda (t)}(u(t)) = 0, \end{aligned}$$

(83)

where $u(t)= (x(t), y(t))$. Since 0 is the unique zero of A, the question is to find the conditions on $\lambda (t)$ which ensure the convergence of u(t) to zero. An elementary computation gives

$$\begin{aligned} A_{\lambda }= \frac{1}{1+\lambda ^2} \left( \begin{array}{cc} \lambda &{} -1 \\ 1 &{} \lambda \end{array} \right) . \end{aligned}$$

For easy computation, it is convenient to set $z(t)=x(t) +i y(t)$, and work with the equivalent formulation of the problem in the Hilbert space ${\mathcal {H}} = \mathbb {C}$, equipped with the real Hilbert structure $\langle z_1, z_2\rangle = Re (z_1 \bar{z_2})$. So, the operator and its Yosida approximation are given respectively by $Az =iz$ and $A_{\lambda }z = \frac{\lambda +i}{1 + \lambda ^2}z$. Then (83) becomes

$$\begin{aligned} \ddot{z}(t) + \frac{\alpha }{t} \dot{z}(t) +\frac{\lambda +i}{1 + \lambda ^2}z(t) = 0. \end{aligned}$$

Passing to the phase space $\mathbb {C} \times \mathbb {C}$, and setting $Z(t)= \big (z(t),\dot{z}(t)\big )^T$, we obtain the first-order equivalent system

$$\begin{aligned} \dot{Z}(t) + M(t) Z(t) =0 \quad \text{ where } M(t)= \left( \begin{array}{cc} 0 &{} -1 \\ \frac{\lambda (t) +i}{1 + \lambda (t)^2} &{} \frac{\alpha }{t} \end{array} \right) . \end{aligned}$$

The asymptotic behavior of the trajectories of this system can be analyzed by examinating the eigenvalues of the matrix M(t), which are given by

$$\begin{aligned} \theta (t) = \frac{\alpha }{2t}\left\{ 1 \pm \sqrt{1- \frac{4t^2}{\alpha ^2}\frac{\lambda (t) +i}{1 + \lambda (t)^2} }\right\} . \end{aligned}$$

Let us restrict ourselves to the case $\lambda (t)\sim t^p$. If $p>2$, the eigenvalues $\theta _+$ and $\theta _-$ satisfy

$$\begin{aligned} \theta _+(t) \sim \frac{1}{t}\qquad \hbox {and}\qquad \theta _-(t)\sim \frac{1}{t^{p-1}}. \end{aligned}$$

Although the solutions of the differential equation $\dot{v}(t) + \frac{1}{t}v(t)=0$ converge to 0, those of $\dot{v}(t) + \frac{1}{t^{p-1}}v(t)=0$ do not. Thus, to obtain the convergence results of our theorem, we are not allowed to let $\lambda (t)$ tend to infinity at a rate greater than $t^2$, which shows that $t^2$ is a critical size for $\lambda (t)$.

6.2 A comparative illustration

As an illustration, we depict solutions of some first- and second-order equations involving the rotation operator A, obtained using Scilab’s ode solver. In all cases, the initial condition at $t_0=1$ is (10, 10). For second-order equations, we take the initial velocity as (0, 0) in order not to force the system in any direction. When relevant, we take $\lambda (t)=(1+\epsilon )t^2/\alpha ^2$ with $\alpha =10$ and $\epsilon =1+2(\alpha -2)^{-1}$. For the constant $\lambda $, we set $\lambda =10$. Table 1 shows the distance to the unique equilibrium $({\bar{x}},{\bar{y}})=(0,0)$ at $t=100$.

Table 1 Distance to the unique equilibrium for a solution of each equation

Full size table

Observe that the final position of the solution of (E5) is comparable to that of (E4), which is a first-order equation governed by the strongly monotone operator $A_\lambda $.

Notes

The idea consisting in regularizing with the help of the Moreau envelopes an inertial dynamic governed by a nonsmooth operator was already used in the modeling of elastic shocks in [5].

References

Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Article MathSciNet MATH Google Scholar
Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1–2), 3–11 (2001)
Article MathSciNet MATH Google Scholar
Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263(9), 5412–5458 (2017)
Article MathSciNet MATH Google Scholar
Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms, HAL-01453170 (2017)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12(1), 273–306 (2002)
MathSciNet MATH Google Scholar
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing damping, to appear in Math. Program. https://doi.org/10.1007/s10107-016-0992-8
Attouch, H., Maingé, P.E.: Asymptotic behavior of second order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 17(3), 836–857 (2011)
Article MathSciNet MATH Google Scholar
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than $\frac{1}{k^2}$. SIAM J. Optim. 26(3), 1824–1834 (2016)
Article MathSciNet MATH Google Scholar
Attouch, H., Peypouquet, J., Redont, P.: Fast convergence of regularized inertial dynamics for nonsmooth convex optimization, Working paper. (2017)
Attouch, H., Soueycatt, M.: Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE’s and control. Pac. J. Optim. 5(1), 17–37 (2009)
MathSciNet MATH Google Scholar
Attouch, H., Wets, R.: Epigraphical processes: laws of large numbers for random LSC functions. Sem. Anal. Convexe Montp. 20, 13–29 (1990)
MathSciNet MATH Google Scholar
Attouch, H., Wets, R.: Quantitative stability of variational systems: I, the epigraphical distance. Trans. Am. Math. Soc. 328(2), 695–729 (1991)
MathSciNet MATH Google Scholar
Attouch, H., Wets, R.: Quantitative stability of variational systems: II, a framework for nonlinear conditioning. SIAM J. Optim. 3, 359–381 (1993)
Article MathSciNet MATH Google Scholar
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert spaces, CMS Books in Mathematics. Springer, Berlin (2011)
Book MATH Google Scholar
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Article MathSciNet MATH Google Scholar
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972)
Brézis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, Berlin (2011)
MATH Google Scholar
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Article MathSciNet MATH Google Scholar
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Article MathSciNet MATH Google Scholar
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Article MathSciNet MATH Google Scholar
Haraux, A.: Systèmes dynamiques dissipatifs et applications, RMA 17, Masson, (1991)
Jendoubi, M.A., May, R.: Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term. Appl. Anal. 94(2), 436–444 (2015)
Article MathSciNet MATH Google Scholar
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016). Ser. A
Article MathSciNet MATH Google Scholar
May, R.: Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
Article MathSciNet Google Scholar
Matet, S., Rosasco, L., Villa, S., Vu, B.C.: Don’t relax: early stopping for convex regularization. arXiv:1707.05422v1 [math.OC] (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Sov. Math. Dokl. 27, 372–376 (1983)
MATH Google Scholar
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, of Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
MATH Google Scholar
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Article MathSciNet MATH Google Scholar
Peypouquet, J.: Convex Otimization in Normed Spaces: Theory, Methods and Examples. With a Foreword by Hedy Attouch. Springer Briefs in Optimization, p. xiv+124. Springer, Cham (2015)
Google Scholar
Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3–4), 1113–1163 (2010)
MathSciNet MATH Google Scholar
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
MATH Google Scholar
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Article MathSciNet MATH Google Scholar
Rockafellar, R.T.: Monotone operators associated with saddle-functions and mini-max problems, In: Nonlinear operators and nonlinear equations of evolution in Banach spaces 2. In: 18th Proceedings of Symposia in Pure Mathematics, F.E. Browder Ed., American Mathematical Society, pp. 241–250 (1976)
Rockafellar, R.T.: Augmented lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1, 97–116 (1976)
Article MathSciNet MATH Google Scholar
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Grundlehren der mathematischen Wissenschafte, vol. 317. Springer, Berlin (1998)
Google Scholar
Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems (NIPS), (2011)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
MATH Google Scholar
Villa, S., Salzo, S., Baldassarres, L., Verri, A.: Accelerated and inexact forward–backward. SIAM J. Optim. 23(3), 1607–1633 (2013)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

The authors thank P. Redont for his careful and constructive reading of the paper.

Author information

Authors and Affiliations

Institut Montpelliérain Alexander Grothendieck, UMR 5149 CNRS, Université Montpellier, Place Eugène Bataillon, 34095, Montpellier Cedex 5, France
Hedy Attouch
Departamento de Matemática, Universidad Técnica Federico Santa María, Av España 1680, Valparaíso, Chile
Juan Peypouquet
Departamento de Ingeniería Matemática & Centro de Modelamiento Matemático (CNRS UMI2807), FCFM, Universidad de Chile, Beauchef 851, Santiago, Chile
Juan Peypouquet

Authors

Hedy Attouch
View author publications
You can also search for this author in PubMed Google Scholar
Juan Peypouquet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hedy Attouch.

Additional information

H. Attouch: Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under Grant Number F49550-1 5-1-0500. J. Peypouquet: supported by Fondecyt Grant 1140829, Millenium Nucleus ICM/FIC RC130003 and Basal Project CMM Universidad de Chile.

Auxiliary results

1.1 Yosida regularization of an operator A

Given a maximally monotone operator A and $\lambda >0$, the resolvent of A with index $\lambda $ and the Yosida regularization of A with parameter $\lambda $ are defined by

$$\begin{aligned} J_{\lambda A} = \left( I + \lambda A \right) ^{-1}\qquad \hbox {and}\qquad A_{\lambda } = \frac{1}{\lambda } \left( I- J_{\lambda A} \right) , \end{aligned}$$

respectively. The operator $J_{\lambda A}: {\mathcal {H}}\rightarrow {\mathcal {H}}$ is nonexpansive and eveywhere defined (indeed it is firmly non-expansive). Moreover, $A_{\lambda }$ is $\lambda $-cocoercive: for all $x, y \in {\mathcal {H}}$ we have

$$\begin{aligned} \langle A_{\lambda }y - A_{\lambda }x, y-x\rangle \ge \lambda \Vert A_{\lambda }y - A_{\lambda }x \Vert ^2 . \end{aligned}$$

This property immediately implies that $A_{\lambda }: {\mathcal {H}}\rightarrow {\mathcal {H}}$ is $\frac{1}{\lambda }$-Lipschitz continuous. Another property that proves useful is the resolvent equation (see, for example, [16, Proposition 2.6] or [14, Proposition 23.6])

$$\begin{aligned} (A_\lambda )_{\mu }= A_{(\lambda +\mu )}, \end{aligned}$$

(84)

which is valid for any $\lambda , \mu >0$. This property allows to compute simply the resolvent of $A_\lambda $: for any $\lambda , \mu >0$ by

$$\begin{aligned} J_{\mu A_\lambda } = \frac{\lambda }{\lambda + \mu }I + \frac{\mu }{\lambda + \mu }J_{(\lambda + \mu )A}. \end{aligned}$$

Also note that for any $x \in {\mathcal {H}}$, and any $\lambda >0$

$$\begin{aligned} A_\lambda (x) \in A (J_{\lambda A}x)= A( x - \lambda A_\lambda (x)). \end{aligned}$$

Finally, for any $\lambda >0$, A and $A_{\lambda }$ have the same solution set $S:=A_{\lambda }^{-1} (0) = A^{-1}(0)$. For a detailed presentation of the properties of the maximally monotone operators and the Yosida approximation, the reader can consult [14] or [16].

1.2 Existence and uniqueness of solution in the presence of a source term

Let us first establish the existence and uniqueness of the solution trajectory of the Cauchy problem associated to the continuous regularized dynamic (1) with a source term.

Lemma A.1

Take $t_0>0$. Let us suppose that $\lambda : [t_0, +\infty [ \rightarrow {\mathbb {R}}^+$ is a measurable function such that $\lambda (t) \ge \underline{\lambda }$ for some $\underline{\lambda }>0$. Suppose that $f \in L^1 ([t_0, T], {\mathcal {H}})$ for all $T \ge t_0$. Then, for any $x_0 \in {\mathcal {H}}, \ v_0 \in {\mathcal {H}} $, there exists a unique strong global solution $x: [t_0,+\infty [ \rightarrow {\mathcal {H}}$ of the Cauchy problem

$$\begin{aligned} \left\{ \begin{array}{ll} \ddot{x}(t) + \frac{\alpha }{t} \dot{x}(t) +A_{\lambda (t)}(x(t)) = f(t)\\ x(t_0)= x_0, \ \dot{x}(t_0)=v_0. \end{array}\right. \end{aligned}$$

(85)

Proof

The argument is standard, and consists in writing (85) as a first-order system in the phase space. By setting

$$\begin{aligned} X(t)= \begin{pmatrix} x(t) \\ \dot{x}(t) \end{pmatrix}, \quad F (t,u,v)= \begin{pmatrix} v \\ -\frac{\alpha }{t} v - A_{\lambda (t)}u + f(t) \end{pmatrix} \text{ and } X_0= \begin{pmatrix} x_0 \\ v_0 \end{pmatrix}, \end{aligned}$$

the system can be written as

$$\begin{aligned} \left\{ \begin{array}{ll} \dot{X}(t) = F (t,X(t))\\ X(t_0)= X_0 . \end{array}\right. \end{aligned}$$

(86)

Using the $\frac{1}{\lambda }$-Lipschitz continuity property of $A_{\lambda }$, one can easily verify that the conditions of the Cauchy–Lipschitz theorem are satisfied. Precisely, we can apply the non-autonomous version of this theorem given in [21, Proposition 6.2.1]. Thus, we obtain a strong solution, that is, $t\mapsto \dot{x}(t)$ is locally absolutely continuous. If, moreover, we suppose that the functions $\lambda (\cdot )$ and f are continuous, then the solution is a classical solution of class ${\mathcal {C}}^2$. $\square $

1.3 Opial’s Lemma

The following results are often referred to as Opial’s Lemma [28]. To our knowledge, it was first written in this form in Baillon’s thesis. See [30] for a proof.

Lemma A.2

Let S be a nonempty subset of ${\mathcal {H}}$ and let $x: [0,+\infty [ \rightarrow {\mathcal {H}}$. Assume that

(i)
for every $z\in S$, $\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert $ exists;
(ii)
every weak sequential limit point of x(t), as $t\rightarrow \infty $, belongs to S.

Then x(t) converges weakly as $t\rightarrow \infty $ to a point in S.

Its discrete version is

Lemma A.3

Let S be a non empty subset of ${\mathcal {H}}$, and $(x_k)$ a sequence of elements of ${\mathcal {H}}$. Assume that

(i)
for every $z\in S$, $\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert $ exists;
(ii)
every weak sequential limit point of $(x_k)$, as $k\rightarrow \infty $, belongs to S.

Then $x_k$ converges weakly as $k\rightarrow \infty $ to a point in S.

1.4 Variation of the function $\gamma \mapsto \gamma A_{\gamma }x$

Lemma A.4

Let $\gamma , \delta >0$, and $x, y\in {\mathcal {H}}$. Then, for each $z\in S= A^{-1} (0)$, and all $t \ge t_0$, we have

$$\begin{aligned} \Vert \gamma A_{\gamma }x - \delta A_{\delta }y\Vert \le 2 \Vert x-y \Vert + 2 \Vert x-z \Vert \frac{|\gamma - \delta |}{\gamma } \end{aligned}$$

(87)

Proof

We use successively the definition of the Yosida approximation, the resolvent identity [14, Proposition 23.28 (i)], and the nonexpansive property of the resolvent, to obtain

$$\begin{aligned} \Vert \gamma A_{\gamma }x - \delta A_{\delta }y\Vert&\le \Vert x-y\Vert + \Vert J_{\gamma A}x - J_{\delta A}y \Vert \\&= \Vert x-y\Vert + \Vert J_{\delta A}\left( \frac{\delta }{\gamma }x + \left( 1- \frac{\delta }{\gamma } \right) J_{\gamma A}x\right) - J_{\delta A}y \Vert \\&\le \Vert x-y\Vert + \Vert \frac{\delta }{\gamma }x + \left( 1- \frac{\delta }{\gamma } \right) J_{\gamma A}x -y\Vert \\&\le 2\Vert x-y\Vert + |1- \frac{\delta }{\gamma } | \Vert J_{\gamma A}x -x \Vert . \end{aligned}$$

Since $J_{\gamma A}z =z$ for $z\in S$, and using again the nonexpansive property of the resolvent, we deduce that

$$\begin{aligned} \Vert \gamma A_{\gamma }x - \delta A_{\delta }y\Vert&\le 2\Vert x-y\Vert + |1- \frac{\delta }{\gamma } | \Vert (J_{\gamma A}x -J_{\gamma A}z) + (z -x \Vert )\\&\le 2\Vert x-y\Vert + 2 \Vert x-z \Vert \frac{|\gamma - \delta |}{\gamma }, \end{aligned}$$

which gives the claim. $\square $

1.5 On integration and decay

Lemma A.5

Let $w,\eta :[t_0,+\infty [\rightarrow [0,+\infty [$ be absolutely continuous functions such that $\eta \notin L^1 (t_0, +\infty )$,

$$\begin{aligned} \int _{t_0}^{+ \infty } w(t)\,\eta (t)\,dt < + \infty , \end{aligned}$$

and $|\dot{w}(t)| \le \eta (t)$ for almost every $t>t_0$. Then, $\lim _{t\rightarrow +\infty } w(t) =0$.

Proof

First, for almost every $t>t_0$, we have

$$\begin{aligned} \left| \frac{d}{dt} w^2(t)\right| = 2\left| \frac{d}{dt} w(t)\right| w(t) \le 2w(t)\,\eta (t). \end{aligned}$$

Therefore, $|\frac{d}{dt} w^2|$ belongs to $L^1$. This implies that $\lim _{t\rightarrow +\infty } w^2(t) $ exists. Since w is nonnegative, it follows that $\lim _{t\rightarrow +\infty } w(t) $ exists as well. But this limit is necessarily zero because $\eta \notin L^1$. $\square $

1.6 On boundedness and anchoring

Lemma A.6

Let $t_0>0$, and let $w: [t_0, +\infty [ \rightarrow \mathbb {R}$ be a continuously differentiable function which is bounded from below. Given a nonegative function $\theta $, let us assume that

$$\begin{aligned} t\ddot{w}(t) + \alpha \dot{w}(t) + \theta (t)\le k(t), \end{aligned}$$

(88)

for some $\alpha > 1$, almost every $t>t_0$, and some nonnegative function $k\in L^1 (t_0, +\infty )$. Then, the positive part $[\dot{w}]_+$ of $\dot{w}$ belongs to $L^1(t_0,+\infty )$, and $\lim _{t\rightarrow +\infty }w(t)$ exists. Moreover, we have $\int _{t_0}^{+\infty } \theta (t) dt < + \infty $.

Proof

Multiply (88) by $t^{\alpha -1}$ to obtain

$$\begin{aligned} \frac{d}{dt} \big (t^{\alpha } \dot{w}(t)\big )+ t^{\alpha -1} \theta (t)\le t^{\alpha -1} k (t). \end{aligned}$$

By integration, we obtain

$$\begin{aligned} \dot{w}(t) + \frac{1}{t^{\alpha } } \int _{t_0}^t s^{\alpha -1}\theta (s)ds \le \frac{{t_0}^{\alpha }|\dot{w}(t_0)|}{t^{\alpha } } + \frac{1}{t^{\alpha } } \int _{t_0}^t s^{\alpha -1}k (s)ds. \end{aligned}$$

(89)

Hence,

$$\begin{aligned}{}[\dot{w}]_{+}(t) \le \frac{{t_0}^{\alpha }|\dot{w}(t_0)|}{t^{\alpha } } + \frac{1}{t^{\alpha } } \int _{t_0}^t s^{\alpha -1} k(s)ds, \end{aligned}$$

and so,

$$\begin{aligned} \int _{t_0}^{\infty } [\dot{w}]_{+}(t) dt \le \frac{{t_0}^{\alpha }|\dot{w}(t_0)|}{(\alpha -1) t_0^{\alpha -1}} + \int _{t_0}^{\infty }\frac{1}{t^{\alpha }} \left( \int _{t_0}^t s^{\alpha -1} k(s) ds\right) dt. \end{aligned}$$

Applying Fubini’s Theorem, we deduce that

$$\begin{aligned} \int _{t_0}^{\infty }\frac{1}{t^{\alpha }} \left( \int _{t_0}^t s^{\alpha -1} k(s) ds\right) dt = \int _{t_0}^{\infty } \left( \int _{s}^{\infty } \frac{1}{t^{\alpha }} dt\right) s^{\alpha -1} k(s) ds = \frac{1}{\alpha -1} \int _{t_0}^{\infty }k(s) ds. \end{aligned}$$

As a consequence,

$$\begin{aligned} \int _{t_0}^{\infty } [\dot{w}]_{+}(t) dt \le \frac{{t_0}^{\alpha }|\dot{w}(t_0) |}{(\alpha -1) t_0^{\alpha -1}} + \frac{1}{\alpha -1} \int _{t_0}^{\infty }k(s) ds < + \infty . \end{aligned}$$

This implies $\lim _{t\rightarrow +\infty }w(t)$ exists. Back to (89), integrating from $t_0$ to t, using Fubini’s Theorem again, and then letting t tend to $+\infty $, we obtain

$$\begin{aligned} \lim _{t\rightarrow +\infty } w(t) - w(t_0) + \frac{1}{\alpha -1} \int _{t_0}^{\infty }\theta (s) ds \le \frac{{t_0}^{\alpha }|\dot{w}(t_0) |}{(\alpha -1) t_0^{\alpha -1}} + \frac{1}{\alpha -1} \int _{t_0}^{\infty }k(s) ds < + \infty . \end{aligned}$$

Hence $\int _{t_0}^{\infty }\theta (s) ds < + \infty $. $\square $

1.7 A summability result for real sequences

Lemma A.7

Let $\alpha >1$, and let $(h_k)$ be a sequence of real numbers which is bounded from below, and such that

$$\begin{aligned} (h_{k+1} - h_{k}) - \left( 1- \frac{\alpha }{k}\right) (h_{k} - h_{k-1}) + \omega _k \le \theta _k \end{aligned}$$

(90)

for all $k\ge 1$. Suppose that $(\omega _k)$, and $(\theta _k)$ are two sequences of nonnegative numbers, such that $\sum _k k\theta _{k} <+\infty $. Then

$$\begin{aligned} \sum _{k \in \mathbb {N}} [h_k - h_{k-1} ]_{+}< +\infty \quad \text{ and } \quad \sum _{k \in \mathbb {N}} k\omega _k < +\infty . \end{aligned}$$

Proof

Since $(\omega _k)$ is nonegative, we have

$$\begin{aligned} (h_{k+1} - h_{k}) - \left( 1- \frac{\alpha }{k}\right) (h_{k} - h_{k-1}) \le \theta _k. \end{aligned}$$

Setting $b_k := [h_k - h_{k-1} ]_{+}$ the positive part of $h_k - h_{k-1}$, we immediately infer that

$$\begin{aligned} b_{k+1} \le \left( 1- \frac{\alpha }{k}\right) b_k + \theta _k \end{aligned}$$

for all $k\ge 1$. Multiplying by k and rearranging the terms, we obtain

$$\begin{aligned} (\alpha -1)b_k\le (k-1)b_k-kb_{k+1}+k\theta _k. \end{aligned}$$

Summing for $k=1,\dots , K$, and using the telescopic property, along with the fact that $Kb_{K+1}\ge 0$, we deduce that

$$\begin{aligned} (\alpha -1)\sum _{k=1}^Kb_k\le \sum _{k=1}^Kk\theta _k, \end{aligned}$$

which gives

$$\begin{aligned} \sum _{k \in \mathbb {N}} [h_k - h_{k-1} ]_{+} < +\infty . \end{aligned}$$

Let us now prove that $\sum _{k \in \mathbb {N}} k\omega _k < +\infty $, which is the most delicate part of the proof. To this end, write $\delta _k= h_{k} - h_{k-1}$, and $\alpha _k =\left( 1- \frac{\alpha }{k}\right) $, so that (90) becomes

$$\begin{aligned} \delta _{k+1} + \omega _k \le \alpha _k\delta _k+\theta _k. \end{aligned}$$

An immediate recurrence (it can be easily seen by induction) shows that

$$\begin{aligned} \delta _{k+1} +\sum _{i=1}^k\left[ \left( \prod _{j=i+1}^k \alpha _j\right) \omega _i\right] \le \left( \prod _{j=1}^k \alpha _j\right) \delta _1+\sum _{i=1}^k\left[ \left( \prod _{j=i+1}^k \alpha _j\right) \theta _i\right] , \end{aligned}$$

with the convention $\prod _{j=k+1}^k \alpha _j=1$. To simplify the notation, write $A_{i}^k=\prod _{j=i}^k \alpha _j$. Sum the above inequality for $k=1,\dots , K$ to deduce that

$$\begin{aligned} h_{K+1}-h_1 +\sum _{k=1}^{K}\sum _{i=1}^kA_{i+1}^k\omega _i \le \delta _1\sum _{k=1}^{K}A_1^k +\sum _{k=1}^{K}\sum _{i=1}^kA_{i+1}^k\theta _i. \end{aligned}$$

(91)

Now, using Fubini’s Theorem, we obtain

$$\begin{aligned} h_{K+1}-h_1 +\sum _{i=1}^{K}\left[ \omega _i\sum _{k=i}^KA_{i+1}^k\right] \le \delta _1\sum _{k=1}^{K}A_{1}^k +\sum _{i=1}^{K}\left[ \theta _i\sum _{k=i}^KA_{i+1}^k\right] . \end{aligned}$$

(92)

Simple computations (using integrals in the estimations) show that

$$\begin{aligned} \left( \frac{i}{k}\right) ^{\alpha }\le A_{i+1}^k\le \left( \frac{i+1}{k+1}\right) ^{\alpha }, \end{aligned}$$

and

$$\begin{aligned} \frac{i}{\alpha -1}\le \sum _{k=i}^\infty A_{i+1}^k\le \frac{i}{\alpha -1}\left( \frac{i+1}{i}\right) ^{\alpha } \end{aligned}$$

(see also [4] for further details). Letting $K\rightarrow +\infty $ in (92), we deduce that

$$\begin{aligned} \sum _{i=1}^{\infty }i\omega _i\le C+D\sum _{i=1}^\infty i\theta _i<+\infty \end{aligned}$$

for appropriate constants C and D. $\square $

1.8 A discrete Gronwall lemma

Lemma A.8

Let $c\ge 0$ and let $(a_k)$ and $(\beta _j )$ be nonnegative sequences such that $(\beta _j )$ is summable and

$$\begin{aligned} a_k^2 \le c^2 + \sum _{j=1}^k \beta _j a_j \end{aligned}$$

for all $k\in \mathbb {N}$. Then, $\displaystyle a_k \le c + \sum _{j=1}^{\infty } \beta _j$ for all $k\in \mathbb {N}$.

Proof

For $k\in \mathbb {N}$, set $A_k := \max _{1\le m \le k} a_m $. Then, for $1\le m\le k$, we have

$$\begin{aligned} a_m^2 \le c^2 + \sum _{j=1}^m \beta _j a_j \le c^2 + A_k \sum _{j=1}^{\infty } \beta _j. \end{aligned}$$

Taking the maximum over $1\le m\le k$, we obtain

$$\begin{aligned} A_k^2 \le c^2 + A_k \sum _{j=1}^{\infty } \beta _j. \end{aligned}$$

Bounding by the roots of the corresponding quadratic equation, we obtain the result. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Attouch, H., Peypouquet, J. Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators. Math. Program. 174, 391–432 (2019). https://doi.org/10.1007/s10107-018-1252-x

Download citation

Received: 03 May 2017
Accepted: 16 February 2018
Published: 07 March 2018
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s10107-018-1252-x

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators

Abstract

Similar content being viewed by others

Fast Continuous Dynamics Inside the Graph of Maximally Monotone Operators

Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Convergence Rate of Inertial Proximal Algorithms with General Extrapolation and Proximal Coefficients

1 Introduction

1.1 From the heavy ball to fast optimization

1.2 Inertial dynamics and cocoercive operators

1.3 Connections with stochastic optimization

1.3.1 Stochastic approximation algorithms

1.3.2 Robust optimization, regularization

1.4 Organization of the paper

2 Convergence results for the continuous-time system

Theorem 2.1

2.1 Anchor

Lemma 2.2

Proof

Lemma 2.3

Proof

2.2 Speed and acceleration decay

Proposition 2.4

Proof

2.3 Proof of Theorem 2.1

3 Proximal-based inertial algorithms with regularized operator

3.1 A regularized inertial proximal algorithm

Remark 3.1

Remark 3.2

3.2 Preliminary estimations

Lemma 3.3

Proof

Lemma 3.4

Proof

Lemma 3.5

Proof

3.3 Main convergence result

Theorem 3.6

Proof

3.4 An application to convex–concave saddle value problems

4 Stability with respect to perturbations, errors

4.1 The continuous case

Theorem 4.1

Proof

4.2 The algorithmic case

Theorem 4.2

Proof

Remark 4.3

5 Quadratic growth and strong convergence

Theorem 5.1

Proof

Theorem 5.2

6 Further conclusions from a keynote example

6.1 Critical parameters

6.2 A comparative illustration

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Auxiliary results

Auxiliary results

1.1 Yosida regularization of an operator A

1.2 Existence and uniqueness of solution in the presence of a source term

Lemma A.1

Proof

1.3 Opial’s Lemma

Lemma A.2

Lemma A.3

1.4 Variation of the function \(\gamma \mapsto \gamma A_{\gamma }x\)

Lemma A.4

Proof

1.5 On integration and decay

Lemma A.5

Proof

1.6 On boundedness and anchoring

Lemma A.6

Proof

1.7 A summability result for real sequences