Abstract
In a Hilbert space \({\mathcal {H}}\), given \(A{:}\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) a maximally monotone operator, we study the convergence properties of a general class of relaxed inertial proximal algorithms. This study aims to extend to the case of the general monotone inclusion \(Ax \ni 0\) the acceleration techniques initially introduced by Nesterov in the case of convex minimization. The relaxed form of the proximal algorithms plays a central role. It comes naturally with the regularization of the operator A by its Yosida approximation with a variable parameter, a technique recently introduced by Attouch–Peypouquet (Math Program Ser B, 2018. https://doi.org/10.1007/s10107-018-1252-x) for a particular class of inertial proximal algorithms. Our study provides an algorithmic version of the convergence results obtained by Attouch–Cabot (J Differ Equ 264:7138–7182, 2018) in the case of continuous dynamical systems.
1 Introduction
Throughout this paper, \({\mathcal {H}}\) is a real Hilbert space endowed with the scalar product \(\langle .,.\rangle \) and the corresponding norm \(\Vert .\Vert \). Given a maximally monotone operator \(A:{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\), we will study the convergence properties of a general class of inertial proximal-based algorithms that aim to solve the inclusion \(Ax \ni 0\), whose solution set is denoted by \(\text{ zer }A\). Given initial data \(x_0\), \(x_1\in {\mathcal {H}}\), we consider the Relaxed Inertial Proximal Algorithm, (RIPA) for short, defined by, for \(k \ge 1\)
$$\begin{aligned} \mathrm{(RIPA)}\qquad \left\{ \begin{array}{l} y_k = x_k + \alpha _k (x_k - x_{k-1})\\ x_{k+1}= (1-\rho _k)\, y_k + \rho _k J_{\mu _k A}(y_k). \end{array}\right. \end{aligned}$$
In the above formula, \(J_{\mu A} = \left( I + \mu A \right) ^{-1} \) is the resolvent of A with index \(\mu >0\), where I is the identity operator. It plays a central role in the analysis of (RIPA), along with the Yosida regularization of A with parameter \(\mu >0\), which is defined by \( A_{\mu } = \frac{1}{\mu } \left( I- J_{\mu A} \right) ,\) (see “Appendix A” for their main properties). We assume the following set of hypotheses
If \(\rho _k=1\) for every \(k\ge 1\), then algorithm (RIPA) reduces to the Inertial Proximal Algorithm
$$\begin{aligned} \mathrm{(IPA)}\qquad x_{k+1}= J_{\mu _k A}\big (x_k + \alpha _k (x_k - x_{k-1})\big ). \end{aligned}$$
On the other hand, if \(\alpha _k=0\) for every \(k\ge 1\), then algorithm (RIPA) boils down to the Relaxed Proximal Algorithm
$$\begin{aligned} \mathrm{(RPA)}\qquad x_{k+1}= (1-\rho _k)\, x_k + \rho _k J_{\mu _k A}(x_k). \end{aligned}$$
For classical references on relaxed proximal algorithms, see for example [10, 18, 19]. An inertial version of such algorithms was first studied in [1], see also [22]. Recent studies showed the importance of the case \(\alpha _k \rightarrow ~1\). When \(A= \partial \Psi \), where \(\Psi : {\mathcal {H}}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper closed convex function, this is a key property for obtaining fast convergent methods, in line with the Nesterov and FISTA methods [9, 24]. The case of inertial methods for general maximally monotone operators remains largely to be explored. Inertial proximal splitting methods were recently considered in [12, 17, 21, 23, 27]. One important application concerns the design of inertial ADMM algorithms for linearly constrained minimization problems. Recently, a new approach was delineated by Attouch and Peypouquet [7] based on the Yosida regularization of A with a varying parameter. In this paper, we will provide a unifying approach to these problems that extends [7] and opens new perspectives. Our study is the natural extension, in the algorithmic case, of the convergence results obtained by Attouch–Cabot [5] in the case of continuous dynamical systems.
1.1 Relaxed proximal algorithms
The classical proximal algorithm is obtained by the implicit discretization of the differential inclusion
$$\begin{aligned} \dot{x}(t) + A(x(t)) \ni 0. \qquad (1) \end{aligned}$$
By contrast, the relaxed proximal algorithm (RPA) comes naturally by discretizing the regularized differential equation
$$\begin{aligned} \dot{x}(t) + A_{\lambda }(x(t)) = 0, \qquad (2) \end{aligned}$$
where \(A_{\lambda }\) is the Yosida approximation of A of index \(\lambda >0\). Since \(A_{\lambda }\) is Lipschitz continuous, (2) is well-posed owing to the Cauchy–Lipschitz theorem. Indeed, implicit time discretization of (2), with step size \(h_k>0\), gives the relaxed proximal algorithm (details of the proof are given below in the inertial case)
$$\begin{aligned} x_{k+1}= (1-\rho _k)\, x_k + \rho _k J_{\mu _k A}(x_k), \end{aligned}$$
with \(\mu _k = \lambda + h_k\) and \(\rho _k = \frac{h_k}{\lambda +h_k} \). System (2) has many advantages over the differential inclusion (1). Note that \(\text{ zer } A = \text{ zer } A_{\lambda }\), so the equilibria are the same for both systems. Since \(A_{\lambda }\) is cocoercive, the trajectories of (2) converge weakly to equilibria, in sharp contrast with the semigroup generated by A, for which, in general, only weak ergodic convergence holds. Thus, one can expect that the associated algorithms also benefit from these favorable properties.
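These properties of the Yosida approximation are easy to probe numerically. The sketch below (Python/NumPy; the linear skew-symmetric operator \(A(x)=Mx\) is an illustrative choice of a maximally monotone operator that is not a subdifferential, not an example from the text) checks the \(\lambda \)-cocoercivity of \(A_{\lambda }\):

```python
import numpy as np

# A(x) = M x with M skew-symmetric: maximally monotone, zer A = {0},
# but A itself is not cocoercive.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)

def yosida(lam):
    # A_lambda = (1/lambda) * (I - J_{lambda A}), with J_{lambda A} = (I + lambda A)^{-1}
    J = np.linalg.inv(I + lam * M)
    return (I - J) / lam

lam = 0.5
A_lam = yosida(lam)

rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    u, v = A_lam @ x, A_lam @ y
    # lambda-cocoercivity: <A_lam x - A_lam y, x - y> >= lambda * ||A_lam x - A_lam y||^2
    assert (u - v) @ (x - y) >= lam * np.linalg.norm(u - v) ** 2 - 1e-12
```

For this rotation operator the cocoercivity inequality in fact holds with equality, which is why a tight tolerance suffices.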
These aspects are even more striking when one considers inertial dynamics. The damped inertial dynamics \( \ddot{x}(t) + \gamma (t)\dot{x}(t) + A (x(t)) \ni 0\) is ill-posed, and no general convergence theory is available for this system. By contrast, the regularized dynamics
$$\begin{aligned} \ddot{x}(t) + \gamma (t)\dot{x}(t) + A_{\lambda (t)} (x(t)) = 0 \qquad (3) \end{aligned}$$
is well-posed. First results on tuning the parameters \(\gamma (t)\) and \(\lambda (t)\) so as to obtain good asymptotic convergence properties were obtained in [2, 6] and [7]. A closely connected dynamical system with variable damping and step sizes has been considered in [11].
Let us proceed with the implicit temporal discretization of (3). Indeed, implicit discretizations tend to follow the continuous-time trajectories more closely than explicit ones. Note that, owing to the Yosida regularization, the explicit discretization of (3) has the same numerical complexity as the implicit one (each needs one resolvent computation per iteration). Taking a time step \(h_k>0\), and setting \(t_k= \sum \nolimits _{i=1}^k h_i\), \(x_k = x(t_k)\), \(\lambda _k = \lambda (t_k)\), \(\gamma _k =\gamma (t_k)\), an implicit finite-difference scheme for (3) with centered second-order variation gives
$$\begin{aligned} \frac{1}{h_k^2}\left( x_{k+1}-2x_k+x_{k-1}\right) + \frac{\gamma _k}{h_k}\left( x_k-x_{k-1}\right) + A_{\lambda _k}(x_{k+1}) = 0. \qquad (4) \end{aligned}$$
After expanding (4), we obtain \(x_{k+1} + h_k^2 A_{\lambda _k} (x_{k+1}) = x_{k} + \left( 1- {\gamma _k}{h_k}\right) ( x_{k} - x_{k-1}).\) Setting \(s_k=h_k^2\), we have
$$\begin{aligned} x_{k+1}= J_{s_k A_{\lambda _k}}\big ( x_{k} + \left( 1- {\gamma _k}{h_k}\right) ( x_{k} - x_{k-1})\big ), \end{aligned}$$
where \(J_{s_k A_{\lambda _k}}\) is the resolvent of index \(s_k>0\) of the maximally monotone operator \(A_{\lambda _k}\). Setting \(\alpha _k = 1- {\gamma _k}{h_k}\), this gives the following algorithm
$$\begin{aligned} \left\{ \begin{array}{l} y_k = x_k + \alpha _k (x_k - x_{k-1})\\ x_{k+1}= J_{s_k A_{\lambda _k}}(y_k). \end{array}\right. \qquad (5) \end{aligned}$$
The resolvent equation, \(\left( A_{\lambda }\right) _s = A_{\lambda + s}\), gives \( J_{s A_{\lambda }} = \frac{\lambda }{\lambda +s}I + \frac{s}{\lambda +s}J_{(\lambda +s) A}. \) Hence, (5) is equivalent to
$$\begin{aligned} \left\{ \begin{array}{l} y_k = x_k + \alpha _k (x_k - x_{k-1})\\ x_{k+1}= \frac{\lambda _k}{\lambda _k + s_k}\, y_k + \frac{s_k}{\lambda _k + s_k}\, J_{(\lambda _k + s_k) A}(y_k). \end{array}\right. \end{aligned}$$
This is precisely algorithm (RIPA) with \(\mu _k= \lambda _k +s_k \) and \(\rho _k= \frac{s_k}{\lambda _k +s_k} \).
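The resolvent equation and the resulting equivalence between one step of (5) and one step of (RIPA) can be sanity-checked numerically. The sketch below uses an illustrative linear skew-symmetric operator and arbitrary values of \(\lambda \) and s (these choices are ours, not the text's):

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])  # A(x) = M x, maximally monotone
I = np.eye(2)

def resolvent(op, s):
    # J_{s op} = (I + s*op)^{-1} for a linear (matrix) operator
    return np.linalg.inv(I + s * op)

lam, s = 0.3, 0.7
A_lam = (I - resolvent(M, lam)) / lam          # Yosida approximation A_lambda

# resolvent equation: J_{s A_lam} = lam/(lam+s) I + s/(lam+s) J_{(lam+s) A}
lhs = resolvent(A_lam, s)
rhs = lam / (lam + s) * I + s / (lam + s) * resolvent(M, lam + s)
assert np.allclose(lhs, rhs)

# one (5)-step equals one (RIPA)-step with mu = lam + s and rho = s/(lam+s)
y = np.array([1.0, -2.0])
step5 = lhs @ y
mu, rho = lam + s, s / (lam + s)
ripa = (1 - rho) * y + rho * (resolvent(M, mu) @ y)
assert np.allclose(step5, ripa)
```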
1.2 Geometrical aspects of (RIPA)
(RIPA) has a simple geometrical interpretation. This is illustrated in Fig. 1, where, as a distinctive feature of the proximal method, the closed affine half-space \( \left\{ z\in {\mathcal {H}}: \left\langle y_k- J_{\mu _k A}(y_k), z- J_{\mu _k A}(y_k) \right\rangle \le 0 \right\} \) separates \(y_k\) from \( \text{ zer }A\).
In Fig. 1, the parameter \(\rho _k\) has been taken between zero and one, so that \(x_{k+1}\) belongs to the line segment \([y_k, J_{\mu _k A}(y_k)]\). Indeed, we will see that, under certain conditions, the parameter \(\rho _k\) can vary in the interval [0, 2]. The case \(\rho _k >1\) is particularly interesting, since it allows one to combine inertial effects with over-relaxation. These combined aspects have been little studied until now; see [20] for some recent results in the case of a fixed resolvent operator \(J_{\mu A}\).
For a maximally monotone operator A such that \(\text{ zer }A\ne \emptyset \), we have \(\lim _{\mu \rightarrow +\infty }J_{\mu A}(x)= \text{ proj }_{{\mathrm{zer}}A} x\), see [8, Theorem 23.47 \(\mathrm{{(i)}}\)]. Thus, in the case \(\mu _k \rightarrow +\infty \) (which, we will see, is important for obtaining fast methods) we have \( J_{\mu _k A}(y_k) \sim \text{ proj }_{{\mathrm{zer}}A} y_k\), as shown in Fig. 1.
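This limiting behavior is easy to visualize numerically. The sketch below uses the illustrative skew-symmetric example \(A(x)=Mx\), for which \({\mathrm{zer}}A=\{0\}\) and hence \(\text{ proj }_{{\mathrm{zer}}A}\, x=0\):

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])   # A(x) = M x, zer A = {0}
I = np.eye(2)
x = np.array([3.0, 4.0])

# As mu grows, J_{mu A}(x) approaches proj_{zer A}(x) = 0 here.
errs = [np.linalg.norm(np.linalg.inv(I + mu * M) @ x) for mu in (1.0, 10.0, 100.0, 1000.0)]
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))   # decreases along this scale of mu
assert errs[-1] < 1e-2                                  # roughly ||x|| / mu for large mu
```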
1.3 Presentation of the results
The case \(A=0\) already reveals some crucial notions. In this case, \(J_{\mu _k A}=I\) and algorithm (RIPA) becomes \(x_{k+1}=x_k+\alpha _k(x_k-x_{k-1}).\) It ensues that for every \(k\ge 1\),
$$\begin{aligned} x_{k+1}= x_1 + \Big (\sum _{l=1}^{k}\prod _{j=1}^l \alpha _j\Big )(x_1-x_0). \end{aligned}$$
Hence, \((x_k)\) converges if and only if \(x_1=x_0\) or the following condition is satisfied: \(\sum \nolimits _{l=1}^{+\infty }\prod _{j=1}^l \alpha _j <+\infty .\) When \(A=\partial \Psi \) is the subdifferential of a convex function \(\Psi :{\mathcal {H}}\rightarrow \mathbb {R}\cup \{+\infty \}\) with a continuum of minima, this condition has been identified as a necessary condition for the convergence of the iterates of (IPA), see [16]. From now on, we assume that
$$\begin{aligned} (K_0)\qquad \sum _{l=i}^{+\infty }\prod _{j=i}^l \alpha _j <+\infty \quad \text{ for } \text{ every } i\ge 1, \end{aligned}$$
and we define the sequence \((t_i)\) by
$$\begin{aligned} t_i = \sum _{l=i-1}^{+\infty }\prod _{j=i}^l \alpha _j. \qquad (6) \end{aligned}$$
The sequence \((t_i)\) plays a crucial role in the study of the asymptotic behavior of the iterates of (IPA). This was recently highlighted by Attouch and Cabot [4] in the potential case. However, most of the energy arguments used in [4] are not available in the general framework of maximally monotone operators, so we must develop different techniques.
As a model example of our results, let us give the following shortened version of Theorem 2.6.
Theorem 1.1
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \(\alpha _k\in [0,1]\) and \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Under \((K_0)\), let \((t_i)\) be the sequence defined by (6). Assume that there exists \(\varepsilon \in ]0,1[\) such that for k large enough,
$$\begin{aligned} (K_1)\qquad \alpha _k(1+\alpha _k)\, t_{k+1} \le (1-\varepsilon )\, \frac{2-\rho _{k-1}}{\rho _{k-1}}\,(1-\alpha _{k-1}). \end{aligned}$$
Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\alpha _i t_{i+1}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\rho _i (2-\rho _i)\, t_{i+1}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
Assume moreover that \(\limsup _{k\rightarrow +\infty } \rho _k<2\), and \(\liminf _{k\rightarrow +\infty } \rho _k>0.\) Then the following holds
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
These results are complemented in Theorem 2.14 so as to cover the case of a possibly vanishing parameter \(\rho _k\). Then, the assumption \(\liminf _{k\rightarrow +\infty } \rho _k>0\) is removed and replaced with an alternative set of assumptions.
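Theorem 1.1 can be illustrated with a minimal numerical run. In the sketch below, the constant parameters \(\alpha =0.2\), \(\rho =1\), \(\mu _k \equiv 1\) and the skew-symmetric operator \(A(x)=Mx\) (whose unique zero is the origin) are our illustrative choices; they fall within the constant-parameter regime analyzed later in Corollary 2.12:

```python
import numpy as np

# (RIPA) on A(x) = M x with M skew-symmetric (maximally monotone, zer A = {0}):
#   y_k     = x_k + alpha*(x_k - x_{k-1})
#   x_{k+1} = (1 - rho)*y_k + rho * J_{mu A}(y_k)
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
alpha, rho, mu = 0.2, 1.0, 1.0
J = np.linalg.inv(I + mu * M)          # resolvent J_{mu A}

x_prev = np.array([1.0, 0.0])
x = np.array([1.0, 0.0])
for _ in range(500):
    y = x + alpha * (x - x_prev)
    x_prev, x = x, (1 - rho) * y + rho * (J @ y)

assert np.linalg.norm(x) < 1e-8        # iterates converge to the unique zero of A
```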
1.4 Organization of the paper
Our main convergence results are established in Sect. 2, see Theorem 2.6 and Theorem 2.14. Based on the behavior of the sequences \((\alpha _k)\), \((\mu _k)\), \((\rho _k)\), we show the weak convergence of the sequences \((x_k)\) generated by algorithm (RIPA). Thus, in the general context of maximally monotone operators acting on Hilbert spaces, we unify and extend most of the previously known results concerning the combination of the proximal methods with relaxation and inertia. These results are illustrated in Sect. 3 which presents applications to special classes of sequences \((\alpha _k)\), \((\mu _k)\), \((\rho _k)\). In particular, we find the recent result obtained by Attouch–Peypouquet based on the accelerated method of Nesterov. Finally, in Sect. 4, we provide ergodic convergence results, extending the seminal result of Brezis–Lions. The paper is supplemented by some auxiliary technical lemmas contained in the appendix.
2 Convergence results
2.1 Equivalent forms of (RIPA)
Let us give several equivalent formulations of (RIPA). Observe that
$$\begin{aligned} J_{\mu _k A}= I - \mu _k A_{\mu _k}. \end{aligned}$$
It ensues that (RIPA) can be equivalently rewritten as
$$\begin{aligned} \left\{ \begin{array}{l} y_k = x_k + \alpha _k (x_k - x_{k-1})\\ x_{k+1}= y_k - \rho _k \mu _k A_{\mu _k}(y_k). \end{array}\right. \qquad (7) \end{aligned}$$
Recalling that \(\xi =A_\mu (y)\) if and only if \(\xi \in A(y-\mu \xi )\), we obtain the following equivalences
$$\begin{aligned} x_{k+1}= y_k - \rho _k \mu _k A_{\mu _k}(y_k) \quad \Longleftrightarrow \quad A_{\mu _k}(y_k)= \frac{1}{\rho _k \mu _k}\left( y_k - x_{k+1}\right) \quad \Longleftrightarrow \quad \frac{1}{\rho _k \mu _k}\left( y_k - x_{k+1}\right) \in A\Big ( y_k - \frac{1}{\rho _k}\left( y_k - x_{k+1}\right) \Big ). \end{aligned}$$
This gives rise to the equivalent formulation of (RIPA)
$$\begin{aligned} \left\{ \begin{array}{l} y_k = x_k + \alpha _k (x_k - x_{k-1})\\ \frac{1}{\rho _k \mu _k}\left( y_k - x_{k+1}\right) \in A\Big ( \frac{1}{\rho _k}\, x_{k+1} + \big (1- \frac{1}{\rho _k}\big )\, y_k \Big ). \end{array}\right. \qquad (8) \end{aligned}$$
Depending on the situation, we will use one of the above mentioned equivalent formulations.
2.2 The anchor sequence \((h_k)\)
Given \(z\in {\mathcal {H}}\) and a sequence \((x_k)\) generated by (RIPA), let us define the sequence \((h_k)\) by \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\). The difference \(h_{k+1}-h_k-\alpha _k(h_k-h_{k-1})\) plays a central role in the study of the asymptotic behavior of \((x_k)\) as \(k\rightarrow +\infty \). Let us start with a basic result that relies on algebraic manipulations of the terms \(h_{k-1}\), \(h_k\) and \(h_{k+1}\).
Lemma 2.1
Let \((x_k)\) be a sequence in \({\mathcal {H}}\), and let \((\alpha _k)\) be a sequence of real numbers. Given \(z\in {\mathcal {H}}\), let us define the sequence \((h_k)\) by \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\). We then have
$$\begin{aligned} h_{k+1}-h_k-\alpha _k(h_k-h_{k-1}) = \frac{1}{2}\Vert x_{k+1}-y_k\Vert ^2 + \langle x_{k+1}-y_k, y_k-z\rangle + \frac{\alpha _k(1+\alpha _k)}{2}\Vert x_k-x_{k-1}\Vert ^2, \qquad (9) \end{aligned}$$
where \(y_k=x_k+\alpha _k(x_k-x_{k-1})\).
Proof
Observe that
$$\begin{aligned} \frac{1}{2}\Vert y_k-z\Vert ^2 = h_k + \alpha _k(h_k-h_{k-1}) + \frac{\alpha _k(1+\alpha _k)}{2}\Vert x_k-x_{k-1}\Vert ^2. \end{aligned}$$
We deduce that
$$\begin{aligned} h_{k+1} = \frac{1}{2}\Vert (x_{k+1}-y_k)+(y_k-z)\Vert ^2 = \frac{1}{2}\Vert x_{k+1}-y_k\Vert ^2 + \langle x_{k+1}-y_k, y_k-z\rangle + \frac{1}{2}\Vert y_k-z\Vert ^2, \end{aligned}$$
which gives (9).
\(\square \)
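Lemma 2.1 is a purely algebraic identity: with \(y_k=x_k+\alpha _k(x_k-x_{k-1})\) and \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\), the quantity \(h_{k+1}-h_k-\alpha _k(h_k-h_{k-1})\) equals \(\frac{1}{2}\Vert x_{k+1}-y_k\Vert ^2 + \langle x_{k+1}-y_k, y_k-z\rangle + \frac{\alpha _k(1+\alpha _k)}{2}\Vert x_k-x_{k-1}\Vert ^2\). A quick check on random data (an illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    z, xm, xk, xp = rng.standard_normal((4, 3))   # z, x_{k-1}, x_k, x_{k+1}
    alpha = rng.uniform(0.0, 1.0)
    y = xk + alpha * (xk - xm)                    # y_k

    h = lambda u: 0.5 * np.linalg.norm(u - z) ** 2
    lhs = h(xp) - h(xk) - alpha * (h(xk) - h(xm))
    rhs = (0.5 * np.linalg.norm(xp - y) ** 2
           + (xp - y) @ (y - z)
           + 0.5 * alpha * (1 + alpha) * np.linalg.norm(xk - xm) ** 2)
    assert abs(lhs - rhs) < 1e-10                 # the identity holds exactly
```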
Let us particularize Lemma 2.1 to sequences generated by (RIPA). In the following statement, \({\mathrm{gph}}A\) stands for the graph of A, see “Appendix A”.
Lemma 2.2
Assume (H) and let \((z,q)\in {\mathrm{gph}}A\). Given a sequence \((x_k)\) generated by (RIPA), let \((h_k)\) be the sequence defined by \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\). Then we have for every \(k\ge 1\),
Assume moreover that \({\mathrm{zer}}A\ne \emptyset \), and let \(z\in {\mathrm{zer}}A\). The following holds true for every \(k\ge 1\)
Proof
Iteration (RIPA) can be expressed as
$$\begin{aligned} \frac{1}{\rho _k \mu _k}\left( y_k - x_{k+1}\right) \in A\Big ( \frac{1}{\rho _k}\, x_{k+1} + \big (1- \frac{1}{\rho _k}\big )\, y_k \Big ), \end{aligned}$$
see (8). Since \(q\in A(z)\), the monotonicity of A yields
Hence
From equality (9) of Lemma 2.1, we deduce immediately that
which is nothing but (10). Finally, if \(z\in {\mathrm{zer}}A\), then inequality (11) is obtained by taking \(q=0\) in (10). \(\square \)
Lemma 2.3
Under (H), assume that \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Suppose that \({\mathrm{zer}}A\ne \emptyset \) and let \(z\in {\mathrm{zer}}A\). Given a sequence \((x_k)\) generated by (RIPA), let \((h_k)\) be the sequence defined by \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\). Then we have for every \(k\ge 1\),
Proof
Let us formulate Lemma 2.2 in a recursive form. Observe that
On the other hand, we have
By combining the above equalities, we obtain
Since \(\rho _k\in ]0,2]\) by assumption, the expected inequality follows immediately from (11). \(\square \)
2.3 The sequences \((t_i)\) and \((t_{i,k})\)
Let us introduce the sequences \((t_i)\) and \((t_{i,k})\) which will play a central role in the analysis of algorithm (RIPA) (the sequence \((t_i)\) has already been briefly defined in the introduction). Throughout the paper, we use the convention \(\prod _{j=i}^{i-1}\alpha _j=1\) for \(i\ge 1\). Given i, \(k\ge 1\), we denote by \(t_{i,k}\) the quantity defined by
$$\begin{aligned} t_{i,k}= \sum _{l=i-1}^{k-1}\prod _{j=i}^l \alpha _j \quad \text{ if } i\le k, \qquad (13) \end{aligned}$$
and \(t_{i,k}=0\) if \(i>k\). Observe that for every \(i\ge 1\) and \(k\ge i+1\),
$$\begin{aligned} 1+\alpha _i t_{i+1,k}=t_{i,k}. \qquad (14) \end{aligned}$$
From now on, we assume that
$$\begin{aligned} (K_0)\qquad \sum _{l=i}^{+\infty }\prod _{j=i}^l \alpha _j <+\infty \quad \text{ for } \text{ every } i\ge 1. \end{aligned}$$
We define the sequence \((t_i)\) by
$$\begin{aligned} t_i = \sum _{l=i-1}^{+\infty }\prod _{j=i}^l \alpha _j. \qquad (15) \end{aligned}$$
For each \(i\ge 1\), the sequence \((t_{i,k})_k\) converges increasingly to \(t_i\). By letting \(k\rightarrow +\infty \) in (14), we obtain
$$\begin{aligned} 1+\alpha _i t_{i+1}= t_i \end{aligned}$$
for every \(i\ge 1\). Let us summarize the above results.
Lemma 2.4
Let \((\alpha _k)\) be a sequence of nonnegative real numbers. Then we have
-
(i)
The sequence \((t_{i,k})\) defined by (13) satisfies the recursive relation: for every \(i\ge 1\) and \(k\ge i+1\)
$$\begin{aligned} 1+\alpha _i t_{i+1,k}=t_{i,k} \end{aligned}$$ -
(ii)
Under \((K_0)\), the sequence \((t_i)\) given by (15) is well-defined and satisfies for every \(i\ge 1\)
$$\begin{aligned}1+\alpha _i t_{i+1}= t_i.\end{aligned}$$
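The backward recursion of Lemma 2.4\(\mathrm{{(i)}}\) gives a convenient way to compute \(t_{i,k}\) numerically, starting from \(t_{k+1,k}=0\). The sketch below checks it against the closed form \(t_i=1/(1-\alpha )\) valid for a constant \(\alpha <1\) (the constant choice \(\alpha =0.5\) is illustrative):

```python
import numpy as np

def t_table(alphas, k):
    # t_{i,k} by backward recursion: t_{k+1,k} = 0 and t_{i,k} = 1 + alpha_i * t_{i+1,k}
    t = np.zeros(k + 2)
    for i in range(k, 0, -1):
        t[i] = 1.0 + alphas[i] * t[i + 1]
    return t

k = 200
alpha_const = 0.5
alphas = np.full(k + 2, alpha_const)
t = t_table(alphas, k)

# For constant alpha < 1, t_{i,k} converges (increasingly in k) to t_i = 1/(1 - alpha).
assert abs(t[1] - 1.0 / (1.0 - alpha_const)) < 1e-12
```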
2.4 Weak convergence of the iterates
Our convergence results are based on Lyapunov analysis. The weak convergence of the sequences generated by (RIPA) is based on the Opial lemma [25], which we recall in its discrete form.
Lemma 2.5
(Opial). Let S be a nonempty subset of \(\mathcal {H}\), and \((x_k)\) a sequence of elements of \(\mathcal {H}\) satisfying
-
(i)
for every \(z\in S\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists;
-
(ii)
every sequential weak cluster point of \((x_k)\), as \(k\rightarrow +\infty \), belongs to S.
Then the sequence \((x_k)\) converges weakly as \(k\rightarrow +\infty \) toward some \(x_\infty \in S\).
Let us state the main result of this section.
Theorem 2.6
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \(\alpha _k\in [0,1]\) and \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Under \((K_0)\), let \((t_i)\) be the sequence defined by (15). Assume that there exists \(\varepsilon \in ]0,1[\) such that for k large enough,
$$\begin{aligned} (K_1)\qquad \alpha _k(1+\alpha _k)\, t_{k+1} \le (1-\varepsilon )\, \frac{2-\rho _{k-1}}{\rho _{k-1}}\,(1-\alpha _{k-1}). \end{aligned}$$
Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{2-\rho _{i-1}}{\rho _{i-1}}(1-\alpha _{i-1})\Vert x_i-x_{i-1}\Vert ^2<+\infty \), and as a consequence \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\alpha _i t_{i+1}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\rho _i (2-\rho _i)\, t_{i+1}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), and \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\rho _i (2-\rho _i)\, t_{i+1}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
Assume moreover that
$$\begin{aligned} (K_2)\qquad \limsup _{k\rightarrow +\infty } \rho _k<2, \qquad \text{ and } \qquad (K_3)\qquad \liminf _{k\rightarrow +\infty } \rho _k>0. \end{aligned}$$
Then the following holds
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(y_k)=0\), and \(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Proof
\(\mathrm{{(i)}}\) Let \(z\in {\mathrm{zer}}A\), and let us set \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\) for every \(k\ge 1\). Setting \(a_k=h_k-h_{k-1}\) and
we can rewrite inequality (12) of Lemma 2.3 in the condensed form \(a_{k+1}\le \alpha _k a_k +w_k\). By applying Lemma B.1\(\mathrm{{(i)}}\), we obtain for every \(k\ge 1\)
Since \(t_{1,k}\le t_1\) and \(h_k\ge 0\), we deduce that
with \(C:=2h_0+2t_{1}|h_1-h_0|\). Now observe that (we perform a discrete form of integration by parts)
Since the second-to-last term is nonnegative and since \(t_{1,k}\le t_1\), we deduce from the above equality that
Collecting the above results, we infer that
with \(C_1=2h_0+2t_{1}|h_1-h_0|+t_{1}\frac{2-\rho _{0}}{\rho _{0}}(1-\alpha _{0})\Vert x_{1}-x_{0}\Vert ^2\) and
Now recall that \(t_{i,k}=1+\alpha _it_{i+1,k}\) for every \(i\ge 1\) and \(k\ge i+1\), see Lemma 2.4\(\mathrm{{(i)}}\). It ensues that
We then infer from (16) that for every \(k\ge 2\),
By assumption, inequality \((K_1)\) holds true for k large enough. Without loss of generality, we may assume that it is satisfied for every \(k\ge 1\). In view of the above inequality, it ensues that
Taking the limit as \(k\rightarrow +\infty \), we find
By using again \((K_1)\), we deduce that
\(\mathrm{{(ii)}}\) Let us come back to inequality (11). Using that \(\alpha _k\in [0,1]\), we get
Since \(x_{k+1}-y_k=-\rho _k\mu _k A_{\mu _k}(y_k)\), this implies that
By invoking Lemma B.1\(\mathrm{{(i)}}\) with \(a_k=h_k-h_{k-1}\) and
we obtain for every \(k\ge 1\),
Since \(h_k\ge 0\) and \(t_{i+1,k}\le t_{i+1}\), we deduce that
Recalling from \(\mathrm{{(i)}}\) that \(\sum _{i=1}^{+\infty }\alpha _i t_{i+1} \Vert x_i-x_{i-1}\Vert ^2<+\infty \), we infer that for every \(k\ge 1\),
where we have set
Since \(t_{i+1,k}=0\) for \(i\ge k\), this yields in turn
Letting k tend to \(+\infty \), the monotone convergence theorem then implies that
which gives the first estimate of \(\mathrm{{(ii)}}\). Using the \(\frac{1}{\mu _i}\)-Lipschitz continuity property of \(A_{\mu _i}\), we have
It ensues that
where we have used \(\alpha _i\le 1\) and \(\rho _i(2-\rho _i)\le 1\) in the second inequality. The first (resp. second) term of the above right-hand side is summable by (19) [resp. (17)]. We deduce that the left-hand side is also summable. This proves the second estimate of \(\mathrm{{(ii)}}\).
\(\mathrm{{(iii)}}\) From (18), we derive that for every \(k\ge 1\),
Recall that, from \(\mathrm{{(i)}}\), we have \(\sum _{i=1}^{+\infty }\alpha _i t_{i+1}\Vert x_{i}-x_{i-1}\Vert ^2<+\infty \). Applying Lemma B.1\(\mathrm{{(ii)}}\) with \(a_k=h_k-h_{k-1}\) and \(w_k=\alpha _k\Vert x_k-x_{k-1}\Vert ^2\), we infer that \(\sum _{i=1}^{+\infty }(h_k-h_{k-1})_+<+\infty \). This classically implies that \(\lim _{k\rightarrow +\infty } h_k\) exists. Thus, we have obtained that \(\lim _{k\rightarrow +\infty } \Vert x_k-z\Vert \) exists for every \(z\in {\mathrm{zer}}A\), whence in particular the boundedness of the sequence \((x_k)\).
\(\mathrm{(iv)}\) From \((K_2)\) and \((K_3)\), there exist \(\underline{r}>0\) and \(\overline{r}<2\) such that \(\rho _k\in [\underline{r},\overline{r}]\) for k large enough. We deduce from the first estimate of \(\mathrm{{(ii)}}\) that \(\sum _{i=1}^{+\infty } t_{i+1}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), hence \(\lim _{i\rightarrow +\infty } t_{i+1}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2=~0\). Since \(t_i\ge 1\) for every \(i\ge 1\), this implies in turn that \(\lim _{i\rightarrow +\infty } \Vert \mu _i A_{\mu _i}(y_i)\Vert =0\). The proof of \(\lim _{i\rightarrow +\infty } \Vert \mu _i A_{\mu _i}(x_i)\Vert =0\) follows the same lines.
\(\mathrm{(v)}\) To prove the weak convergence of \((x_k)\) as \(k\rightarrow +\infty \), we use the Opial lemma with \(S={\mathrm{zer}}A\). Item \(\mathrm{{(iii)}}\) shows the first condition of the Opial lemma. For the second one, let \((x_{k_n})\) be a subsequence of \((x_k)\) which converges weakly to some \(\overline{x^{}}\). By \(\mathrm{(iv)}\), we have \(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\) strongly in \({\mathcal {H}}\). Since \(\liminf _{k\rightarrow +\infty }\mu _k>0\), we also have \(\lim _{k\rightarrow +\infty }A_{\mu _k}(x_k)=0\) strongly in \({\mathcal {H}}\). Passing to the limit in
and invoking the graph-closedness of the maximally monotone operator A for the weak–strong topology in \({\mathcal {H}}\times {\mathcal {H}}\), we find \(0\in A(\overline{x^{}})\). This shows that \(\overline{x^{}}\in {\mathrm{zer}}A\), which completes the proof. \(\square \)
Remark 2.7
The main role of assumption \((K_1)\) is to guarantee the summability condition
$$\begin{aligned} \sum _{k=1}^{+\infty }\alpha _k t_{k+1}\Vert x_k-x_{k-1}\Vert ^2<+\infty \qquad (20) \end{aligned}$$
obtained in \(\mathrm{{(i)}}\). A careful examination of the proof of Theorem 2.6 shows that conclusions \(\mathrm{{(ii)}}\), \(\mathrm{{(iii)}}\), \(\mathrm{(iv)}\) and \(\mathrm{(v)}\) hold true if we assume directly condition (20). The latter condition involves the sequence \((x_k)\), which is a priori unknown. However, in practice it is easy to enforce by means of a suitable on-line rule.
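Such an on-line safeguard can be sketched as follows. The threshold sequence \(c/k^2\), the crude bound on \(t_{k+1}\), and the clipping of \(\alpha _k\) are illustrative choices of ours, not prescribed by the text; the point is only that the monitored series is summable by construction:

```python
import numpy as np

# Run (RIPA) on the illustrative skew operator A(x) = M x, clipping the inertial
# coefficient whenever the monitored term  alpha_k * t_bound * ||x_k - x_{k-1}||^2
# would exceed the summable threshold c / k^2 (a hypothetical on-line rule).
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
rho, mu, alpha_bar, c = 1.0, 1.0, 0.9, 10.0
t_bound = 1.0 / (1.0 - alpha_bar)          # crude bound on t_{k+1} when alpha_k <= alpha_bar
J = np.linalg.inv(I + mu * M)

x_prev = np.array([1.0, 0.0]); x = x_prev.copy()
total = 0.0
for k in range(1, 2001):
    d2 = np.linalg.norm(x - x_prev) ** 2
    alpha = alpha_bar
    if d2 > 0 and alpha * t_bound * d2 > c / k**2:   # on-line safeguard
        alpha = (c / k**2) / (t_bound * d2)
    total += alpha * t_bound * d2
    y = x + alpha * (x - x_prev)
    x_prev, x = x, (1 - rho) * y + rho * (J @ y)

assert total <= c * np.pi**2 / 6 + 1e-9    # monitored series is summable by construction
```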
Let us now particularize Theorem 2.6 to the case \(\alpha _k=0\) for every \(k\ge 1\), corresponding to the absence of inertia in algorithm (RIPA). In this framework, assumptions \((K_0)\) and \((K_1)\) are automatically satisfied, and moreover \(t_i=1\) for every \(i\ge 1\). We then derive from Theorem 2.6 the following result, which is a particular case of [18, Theorem 3]. Note that the latter also takes into account the presence of errors in the computation of the resolvents.
Corollary 2.8
(Bertsekas–Eckstein [18]). Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \), and that \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Then, for any sequence \((x_k)\) generated by (RPA)
$$\begin{aligned} x_{k+1}= (1-\rho _k)\, x_k + \rho _k J_{\mu _k A}(x_k), \end{aligned}$$
we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{2-\rho _{i-1}}{\rho _{i-1}}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
Assume moreover that \(\limsup _{k\rightarrow +\infty } \rho _k<2\) and \(\liminf _{k\rightarrow +\infty } \rho _k>0.\) Then the following holds
-
(iii)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(iv)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
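The over-relaxed regime \(\rho _k>1\) allowed by Corollary 2.8 can be illustrated numerically. The sketch below (illustrative choices: \(\rho =1.5\), \(\mu _k\equiv 1\), skew operator \(A(x)=Mx\) with \({\mathrm{zer}}A=\{0\}\)) also exhibits the monotone decrease of \(\Vert x_k-z\Vert \) from item \(\mathrm{{(ii)}}\):

```python
import numpy as np

# (RPA): x_{k+1} = (1 - rho)*x_k + rho * J_{mu A}(x_k), with over-relaxation rho = 1.5.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
rho, mu = 1.5, 1.0
J = np.linalg.inv(I + mu * M)

x = np.array([1.0, 0.0])
norms = [np.linalg.norm(x)]                # here z = 0 is the unique zero of A
for _ in range(300):
    x = (1 - rho) * x + rho * (J @ x)
    norms.append(np.linalg.norm(x))

assert norms[-1] < 1e-10                                       # convergence toward zer A
assert all(b <= a + 1e-15 for a, b in zip(norms, norms[1:]))   # ||x_k - z|| nonincreasing
```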
Let us now assume that \(\rho _k=1\) for every \(k\ge 1\). In such a case, the algorithm (RIPA) boils down to the inertial proximal iteration. We obtain directly the following corollary of Theorem 2.6.
Corollary 2.9
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \), and that \(\alpha _k\in [0,1]\) for every \(k\ge 1\). Suppose \((K_0)\) and let \((t_i)\) be the sequence defined by (15). Assume that there exists \(\varepsilon \in ]0,1[\) such that for k large enough,
$$\begin{aligned} \alpha _k(1+\alpha _k)\, t_{k+1} \le (1-\varepsilon )\,(1-\alpha _{k-1}). \end{aligned}$$
Then for any sequence \((x_k)\) generated by (IPA)
$$\begin{aligned} x_{k+1}= J_{\mu _k A}\big (x_k + \alpha _k (x_k - x_{k-1})\big ), \end{aligned}$$
we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }(1-\alpha _{i-1})\Vert x_i-x_{i-1}\Vert ^2<+\infty \), and as a consequence \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\alpha _i t_{i+1}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty } t_{i+1}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), and \(\displaystyle \sum \nolimits _{i=1}^{+\infty } t_{i+1}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(y_k)=0\), and \(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Remark 2.10
Following Remark 2.7, items \(\mathrm{{(ii)}}\) to \(\mathrm{(v)}\) of Corollary 2.9 hold true if we suppose that
$$\begin{aligned} \sum _{k=1}^{+\infty }\alpha _k t_{k+1}\Vert x_k-x_{k-1}\Vert ^2<+\infty . \end{aligned}$$
Assume moreover that there exists \(\overline{\alpha }\in [0,1[\) such that \(\alpha _k\in [0,\overline{\alpha }]\) for every \(k\ge 1\). Then it is easy to show that \(t_k\le 1/(1-\overline{\alpha })\) for every \(k\ge 1\). Hence the above summability condition is ensured by the following
$$\begin{aligned} \sum _{k=1}^{+\infty }\alpha _k \Vert x_k-x_{k-1}\Vert ^2<+\infty . \qquad (21) \end{aligned}$$
To summarize, if \(\alpha _k\in [0,\overline{\alpha }]\) for every \(k\ge 1\), and if condition (21) is satisfied, then we obtain conclusions \(\mathrm{{(ii)}}\) to \(\mathrm{(v)}\) of Corollary 2.9. This is precisely the result stated in [3, Theorem 2.1].
As a consequence of Corollary 2.9, we also find the result of [3, Proposition 2.1], when \(\alpha _k\le \overline{\alpha }<\frac{1}{3}\).
Corollary 2.11
(Alvarez–Attouch [3]). Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that there exists \(\overline{\alpha }\in [0,1/3[\) such that \(\alpha _k\in [0,\overline{\alpha }]\) for every \(k\ge 1\). Then for any sequence \((x_k)\) generated by (IPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty } \Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Proof
Since \(\alpha _k\le \overline{\alpha }<1\) for every \(k\ge 1\), it is immediate to check that \((K_0)\) is satisfied and that \(t_k\le \frac{1}{1-\overline{\alpha }}\) for every \(k\ge 1\). On the one hand, observe that for every \(k\ge 1\),
On the other hand \(1-\alpha _{k-1}\ge 1-\overline{\alpha }.\) It ensues that \((K_1)\) is satisfied if there exists \(\varepsilon \in ]0,1[\) such that
The latter condition is equivalent to \((1-\overline{\alpha })^2>\overline{\alpha }(1+\overline{\alpha })\), that is, \(1-3\overline{\alpha }>0\), which in turn is equivalent to \(\overline{\alpha }<1/3\). Therefore assumption \((K_1)\) is satisfied, and it suffices to apply Corollary 2.9. \(\square \)
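The threshold \(1/3\) and the resulting convergence can be checked numerically. The sketch below verifies the sign change of \((1-\overline{\alpha })^2-\overline{\alpha }(1+\overline{\alpha })=1-3\overline{\alpha }\) and runs (IPA) with the illustrative choices \(\alpha =0.3<1/3\), \(\mu _k\equiv 1\), and the skew operator \(A(x)=Mx\):

```python
import numpy as np

# threshold check: (1 - a)^2 > a*(1 + a)  <=>  1 - 2a > a  <=>  a < 1/3
f = lambda a: (1 - a) ** 2 - a * (1 + a)      # simplifies to 1 - 3a
assert f(0.30) > 0 and abs(f(1 / 3)) < 1e-12 and f(0.34) < 0

# (IPA) with alpha = 0.3 < 1/3 on A(x) = M x (maximally monotone, zer A = {0})
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
alpha, mu = 0.3, 1.0
J = np.linalg.inv(I + mu * M)

x_prev = np.array([1.0, 0.0]); x = x_prev.copy()
for _ in range(500):
    y = x + alpha * (x - x_prev)
    x_prev, x = x, J @ y

assert np.linalg.norm(x) < 1e-8               # convergence toward the unique zero
```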
By taking constant parameters \(\alpha _k\) and \(\rho _k\), we obtain the following consequence of Theorem 2.6.
Corollary 2.12
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \(\alpha _k \equiv \alpha \in [0,1[\), \(\rho _k \equiv \rho \in ]0,2[\) for every \(k\ge 1\), and that
$$\begin{aligned} \alpha (1+\alpha ) < \frac{2-\rho }{\rho }\,(1-\alpha )^2. \qquad (22) \end{aligned}$$
Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty } \Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Proof
Since \(\alpha _k \equiv \alpha \in [0,1[\), we have for every \(i\ge 1\), \(t_i=\sum _{l=i-1}^{+\infty }\alpha ^{l-i+1}=\frac{1}{1-\alpha }<+\infty .\) Hence condition \((K_0)\) holds true. Using that \(\alpha _k\) and \(\rho _k\) are constant, condition \((K_1)\) then amounts to
$$\begin{aligned} \frac{\alpha (1+\alpha )}{1-\alpha } \le (1-\varepsilon )\, \frac{2-\rho }{\rho }\,(1-\alpha ) \quad \text{ for } \text{ some } \varepsilon \in ]0,1[, \end{aligned}$$
which is equivalent to (22). Therefore, all the assumptions of Theorem 2.6 are met, giving the result. \(\square \)
Remark 2.13
The above result gives some indication of the balance between the inertial effect and the relaxation effect. Inequality (22) is equivalent to \(\rho < \frac{2(1- \alpha )^2}{2 \alpha ^2 -\alpha +1} \). Therefore, for given \(0<\alpha <1\), the maximum value of the relaxation parameter is given by \(\rho _m (\alpha )= \frac{2(1- \alpha )^2}{2 \alpha ^2 -\alpha +1} \). Elementary differential calculus shows that the function \(\alpha \mapsto \rho _m (\alpha )\) is decreasing on [0, 1]. Thus, as expected, when the inertial effect increases (\(\alpha \nearrow \)), the admissible relaxation decreases (\(\rho _m \searrow \)), and vice versa; see also [20]. When \(\alpha \rightarrow 0\), the limiting value of \(\rho _m (\alpha ) \) is 2, in accordance with Corollary 2.8. When \(\alpha \rightarrow 1\), the limiting value of \(\rho _m (\alpha )\) is zero, in accordance with the existing results concerning the case \(\alpha _k \rightarrow 1\).
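The shape of the trade-off curve \(\rho _m\) can be checked directly:

```python
import numpy as np

# maximal relaxation as a function of inertia: rho_m(alpha) = 2(1-alpha)^2 / (2 alpha^2 - alpha + 1)
rho_m = lambda a: 2 * (1 - a) ** 2 / (2 * a**2 - a + 1)

assert abs(rho_m(0.0) - 2.0) < 1e-15      # no inertia: the full relaxation range ]0,2[ is recovered
assert abs(rho_m(1.0) - 0.0) < 1e-15      # alpha -> 1: the admissible relaxation vanishes

grid = np.linspace(0.0, 1.0, 1001)
vals = rho_m(grid)
assert np.all(np.diff(vals) < 0)          # rho_m is decreasing on [0,1]
```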
2.5 Case of a possibly vanishing parameter \(\rho _k\)
When \(\alpha _k \rightarrow 1\), which is the case of the Nesterov accelerated method, we must take \(\rho _k \rightarrow 0\) to satisfy the condition \((K_1)\). Consequently, Theorem 2.6 does not make it possible to obtain the convergence of the iterates of (RIPA) in the case \(\alpha _k \rightarrow 1\). The following result completes Theorem 2.6 by considering the case of a possibly vanishing parameter \(\rho _k\). In the upcoming statement, assumption \((K_3)\) is removed and replaced with an alternative set of assumptions, namely \((K_4)\)–\((K_5)\).
Theorem 2.14
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that the sequences \((\alpha _k)\) and \((\rho _k)\) satisfy \(\alpha _k\in [0,1]\) and \(\rho _k\in ]0,2]\) for every \(k\ge 1\), together with \((K_0)\)–\((K_1)\). Then for any sequence \((x_k)\) generated by (RIPA),
-
(i)
There exists a constant \(C\ge 0\) such that for every \(k\ge 1\),
$$\begin{aligned}\Vert x_{k+1}-x_k\Vert \le C\, \sum _{i=1}^k\left[ \left( \prod _{j=i+1}^k \alpha _j\right) \rho _i\right] .\end{aligned}$$
Assume additionally \((K_2)\), together with
Then the following holds
-
(ii)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\). If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Finally assume that condition \((K_5)\) is not satisfied, i.e. \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\rho _i t_{i+1}<+\infty \). Then we obtain
-
(iii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert <+\infty \), and hence the sequence \((x_k)\) converges strongly toward some \(x_\infty \in {\mathcal {H}}\).
Proof
\(\mathrm{{(i)}}\) Iteration (RIPA) can be rewritten as
see (7). Taking the norm of each member, we find
$$\begin{aligned} \Vert x_{k+1}-x_k\Vert \le \alpha _k \Vert x_k-x_{k-1}\Vert + \rho _k \mu _k \Vert A_{\mu _k}(y_k)\Vert . \qquad (23) \end{aligned}$$
On the other hand, for \(z\in {\mathrm{zer}}A={\mathrm{zer}}A_{\mu _k}\), the \(\frac{1}{\mu _k}\)-Lipschitz continuity of \(A_{\mu _k}\) yields
Recall from Theorem 2.6\(\mathrm{{(iii)}}\) that the sequence \((x_k)\) is bounded. Since \(\alpha _k\in [0,1]\), the sequence \((y_k)\) is also bounded. From the above inequality, we deduce the existence of \(C_3\ge 0\) such that \(\Vert A_{\mu _k}(y_k)\Vert \le \frac{C_3}{\mu _k}\) for every \(k\ge 1\). In view of (23), we infer that
An immediate induction shows that for every \(k\ge 1\),
with the convention \(\prod _{j=k+1}^k \alpha _j=1\). Since \(\alpha _k\in [0,1]\), we have \(\prod _{j=i+1}^k \alpha _j\ge \prod _{j=1}^k \alpha _j\) and hence
Setting \(C_4:=\Vert x_{1}-x_0\Vert /\rho _1+C_3\), we deduce that for every \(k\ge 1\),
\(\mathrm{{(ii)}}\) Recall the estimate of Theorem 2.6\(\mathrm{{(ii)}}\)
According to \((K_2)\), there exists \(\bar{r}\in ]0,2[\) such that \(\rho _k\le \bar{r}\) for k large enough. We deduce from (26) that
Since the operator \(\mu _k A_{\mu _k}\) is 1-Lipschitz continuous, we have
with \(C_5:=\sup _{k\ge 1}\Vert x_k-z\Vert <+\infty \). It ensues that
By applying [7, Lemma A.4] with \(\gamma =\mu _{k+1}\), \(\delta =\mu _k\), \(x=x_{k+1}\) and \(y=x_k\), we find
In view of (25), we deduce that for every \(k\ge 1\),
Recalling the assumption \((K_4)\), we obtain the existence of \(C_6\ge 0\) such that for k large enough,
Using (28), we infer that
It follows that for every \(k\ge 1\),
Given the estimate (27), together with the assumption \(\rho _{i} t_{i+1}={\mathcal {O}}(\rho _{i+1} t_{i+2})\) as \(i\rightarrow +\infty \), we deduce that
From a classical result, this implies that \(\lim _{k\rightarrow +\infty }\Vert \mu _{k} A_{\mu _{k}}(x_{k})\Vert ^4\) exists, which entails in turn that \(\lim _{k\rightarrow +\infty }\Vert \mu _{k} A_{\mu _{k}}(x_{k})\Vert \) exists. Using again the estimate (27), together with the assumption \((K_5)\), we immediately conclude that \(\lim _{k\rightarrow +\infty }\Vert \mu _{k} A_{\mu _{k}}(x_{k})\Vert =0\). The proof of the weak convergence of the sequence \((x_k)\) follows the same lines as in Theorem 2.6\(\mathrm{(v)}\).
\(\mathrm{{(iii)}}\) Let us now assume that \(\sum _{i=1}^{+\infty }\rho _it_{i+1}<+\infty \). Recall from inequality (24) that
By applying Lemma B.1\(\mathrm{{(ii)}}\) with \(a_k=\Vert x_k-x_{k-1}\Vert \) and \(w_k=C_3\rho _k\), we obtain that \(\sum _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert <~+\infty \). The last assertion is immediate. \(\square \)
3 Application to particular classes of parameters \(\alpha _k\), \(\mu _k\) and \(\rho _k\)
3.1 Some criteria for \((K_0)\) and \((K_1)\)
The following proposition provides a criterion for simply obtaining an asymptotic equivalent of \(t_k\).
Proposition 3.1
Let \((\alpha _k)\) be a sequence such that \(\alpha _k\in [0,1[\) for every \(k\ge 1\). Assume that
for some \(c\in [0,1[\). Then we have
-
(i)
The property \((K_0)\) is satisfied, and
$$\begin{aligned}t_{k+1}\sim \frac{1}{(1-c)(1-\alpha _k)}\quad \text{ as } k\rightarrow +\infty .\end{aligned}$$
-
(ii)
The equivalence \(1-\alpha _k \sim 1-\alpha _{k+1}\) holds true as \(k\rightarrow +\infty \), hence \(t_{k+1}\sim t_{k+2}\) as \(k\rightarrow +\infty \).
-
(iii)
\(\displaystyle \sum \nolimits _{k=1}^{+\infty }(1-\alpha _k)=+\infty \).
Proof
\(\mathrm{{(i)}}\) This result was proved by the authors in [4, Proposition 15].
\(\mathrm{{(ii)}}\) First assume that \(c\in ]0,1[\). By a standard summation procedure, we infer from (29) that
It ensues that \(1-\alpha _{k}\sim \frac{1}{c k}\) as \(k\rightarrow +\infty \), and hence clearly \(1-\alpha _{k}\sim 1-\alpha _{k+1}\) as \(k\rightarrow +\infty \). Now assume that \(c=0\). Multiplying (29) by \(1-\alpha _k\), we find
because \(\alpha _k\in [0,1[\). This completes the proof of the equivalence \(1-\alpha _k \sim 1-\alpha _{k+1}\) as \(k\rightarrow +\infty \). The last assertion then follows immediately from \(\mathrm{{(i)}}\).
\(\mathrm{{(iii)}}\) Fix \(\varepsilon >0\). In view of (29), there exists \(k_0\ge 1\) such that for every \(k\ge k_0\),
By summing the above inequality, we obtain \(\frac{1}{1-\alpha _{k}}\le \frac{1}{1-\alpha _{k_0}}+ (c+\varepsilon )(k-k_0)\) for every \(k\ge k_0\). Setting \(d=1/(1-\alpha _{k_0})\), we deduce immediately that \(1-\alpha _{k}\ge 1/(d+ (c+\varepsilon )(k-k_0))\), thus implying that \(\sum _{k=1}^{+\infty }(1-\alpha _k)=+\infty \). \(\square \)
Let us now analyze the condition \((K_1)\)
Following an argument parallel to the continuous case, see [5, Proposition 3.2], let us introduce the following condition:
Proposition 3.2
Assume that conditions (29) and (30) hold, with \(|c'| < 1-c\). Then \((K_1)\) is satisfied if
Proof
Setting \(\theta _{k} =\frac{2-\rho _{k}}{\rho _{k}}(1-\alpha _{k}) \), let us rewrite \((K_1)\) as a discrete differential inequality, as follows
According to Proposition 3.1 we have \(t_{k+1}\sim t_{k}\sim \frac{1}{(1-c)(1-\alpha _{k-1})}\quad \text{ as } k\rightarrow +\infty .\) Consequently, (32) can be equivalently formulated as
On the other hand, condition (30) gives
Setting \(R_k := \frac{2-\rho _{k}}{\rho _{k}}(1-\alpha _{k})^2 \), we deduce that \((K_1)\) is implied by the following condition
Rearranging the terms we obtain
Since \(\alpha _k \le 1\), the above inequality will be satisfied if
This will be fulfilled if
The right-hand side of (33) is a continuous increasing function of \(\varepsilon \). Consequently, it is equivalent to require that the above strict inequality holds for \(\varepsilon =0\), which gives the claim. \(\square \)
The next proposition brings to light a set of conditions which guarantee that condition \((K_1)\) is satisfied.
Proposition 3.3
Suppose that \(\alpha _k\in [0,1[\) and \(\rho _k\in ]0,2[\) for every \(k\ge 1\). Let us assume that there exist \(\overline{\rho }\in [0,2[\), \(c \in [0,1[ \) and \(c'' \in {\mathbb {R}}\), with \(-(1-\overline{\rho }/2)<c''\le -(1-\overline{\rho }/2) c\) such that
Then condition \((K_1)\) is satisfied.
Proof
Let us check that the conditions (30) and (31) of Proposition 3.2 are satisfied. First observe that
In view of assumption (35), we have
since \(1-\alpha _{k-1}\sim 1-\alpha _{k}\) as \(k\rightarrow +\infty \), see Proposition 3.1\(\mathrm{{(ii)}}\). Setting \(R_k := \frac{2-\rho _{k}}{\rho _{k}}(1-\alpha _{k})^2\), this leads to
On the other hand, assumption (36) yields
where we used assumption (34) in the last equality. By combining (38), (39) and (40), we obtain
It ensues that condition (30) is satisfied with \(c'=-\left( c+\frac{c''}{1-\overline{\rho }/2}\right) \). Since \(c''\le -(1-\overline{\rho }/2) c\) by assumption, we have \(c'\ge 0\). This implies that
Using that \(-(1-\overline{\rho }/2)<c''\) by assumption, we deduce that the above quantity is positive, hence \(|c'|<1-c\).
Let us finally check condition (31). Recalling that \(\lim _{k\rightarrow +\infty }\rho _k=\overline{\rho }\), condition (31) is equivalent to
In view of (41), the latter condition is in turn equivalent to
which holds true by (37). Then just use Proposition 3.2. \(\square \)
3.2 Application of the main results
Combining Theorem 2.6 with Proposition 3.3, we obtain the following result.
Theorem 3.4
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \(\alpha _k\in [0,1[\) and \(\rho _k\in ]0,2[\) for every \(k\ge 1\). Let us assume that there exist \(\overline{\rho }\in [0,2[\), \(c \in [0,1[ \) and \(c'' \in {\mathbb {R}}\), with \(-(1-\overline{\rho }/2)<c''\le -(1-\overline{\rho }/2) c\) such that
Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{1-\alpha _{i-1}}{\rho _{i-1}}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty } \frac{\rho _i}{1- \alpha _i}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), and \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{\rho _i}{1- \alpha _i}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
Assume moreover that \(\overline{\rho }>0\). Then the following holds
-
(iv)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(y_k)=0\), and \(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(v)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Proof
Proposition 3.1 shows that \((K_0)\) is satisfied and that \(t_{k+1}\sim \frac{1}{(1-c)(1-\alpha _k)}\,\text{ as } k\rightarrow +\infty .\) On the other hand, condition \((K_1)\) is fulfilled in view of Proposition 3.3. Items \(\mathrm{{(i)}}\) to \(\mathrm{(v)}\) then follow immediately from Theorem 2.6. \(\square \)
To apply Theorem 2.14, we must find suitable conditions that ensure that condition \((K_4)\) is satisfied. The following result gives an equivalent, when \(k\rightarrow +\infty \), of the first expression appearing in condition \((K_4)\).
Proposition 3.5
Let \((\alpha _k)\) and \((\rho _k)\) be sequences such that \(\alpha _k\in [0,1[\) and \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Let us assume that there exist \(\overline{\alpha }\in [0,1]\), \(c \in [0,1[ \) and \(c'' \in {\mathbb {R}}\), with \(1+c+c''\overline{\alpha }>0\) such that
Then the following equivalence holds true
Proof
Observe that for every \(i\le k\),
In view of assumption (43), we have \(\lim _{i\rightarrow +\infty }(1-\alpha _{i})/(1-\alpha _{i-1})=1\), see Proposition 3.1\(\mathrm{{(ii)}}\). By using assumptions (42), (43) and (44), we then obtain that
Recalling that \(1+c+c''\overline{\alpha }>0\) by assumption, let us fix \(\varepsilon \in \left]0,\,1+c+c''\overline{\alpha }\right[\). We infer from (46) that there exists \(i_0\ge 1\) such that for every \(i\ge i_0\),
In view of (45), this implies that for every \(i\ge i_0\) and \(k\ge i\),
Let us sum the above inequalities from \(i=i_0\) to k. We find
It ensues that
It remains now to prove that \(\prod _{j=i_0}^{k}\alpha _j=o\left( \frac{\rho _k}{1-\alpha _k} \right) \) as \(k\rightarrow +\infty \). If there exists \(k_0\ge i_0\) such that \(\alpha _{k_0}=0\), then the sequence \(\left( \prod _{j=i_0}^{k}\alpha _j\right) \) is stationary and equal to 0 for \(k\ge k_0\). Without loss of generality, we can assume that \(\alpha _k>0\) for every \(k\ge i_0\). Let us come back to the left inequality of (47), and divide each member by \(\prod _{j=i_0}^{k}\alpha _j\). We find
Since \(1+c+c''\overline{\alpha }>\varepsilon \), we infer that the sequence \(\left( \frac{\rho _i}{1-\alpha _i}\frac{1}{\prod _{j=i_0}^i\alpha _j} \right) \) is increasing. This implies that for every \(i\ge i_0\),
In view of (48), we deduce that
By summing the above inequality from \(i=i_0\) to k, we obtain
Using that \(\sum _{i=1}^{+\infty }(1-\alpha _i)=+\infty \) by Proposition 3.1\(\mathrm{{(iii)}}\), this entails that
This shows that \(\prod _{j=i_0}^{k}\alpha _j=o\left( \frac{\rho _k}{1-\alpha _k} \right) \) as \(k\rightarrow +\infty \), which completes the proof. \(\square \)
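The equivalence of Proposition 3.5 can be observed numerically. The sketch below (not from the paper) assumes, in line with the way the proposition is used in the proof of Theorem 3.6\(\mathrm{{(i)}}\) and with the bound of Theorem 2.14\(\mathrm{{(i)}}\), that the equivalence reads \(\sum _{i=1}^k\left( \prod _{j=i+1}^k \alpha _j\right) \rho _i\sim \frac{1}{1+c+c''\overline{\alpha }}\,\frac{\rho _k}{1-\alpha _k}\) as \(k\rightarrow +\infty \). The parameters of Corollary 3.8 are used, for which \(c=1/\alpha \), \(c''=-2/\alpha \) and \(\overline{\alpha }=1\).

```python
# Numerical illustration (sketch) of Proposition 3.5 with
#   alpha_k = 1 - a/k  and  rho_k = b/k**2,
# for which c = 1/a, c'' = -2/a, abar = 1, so the expected limit of the
# ratio below is 1/(1 + c + c''*abar) = 1/(1 - 1/a).

a, b = 3.0, 1.0
alpha = lambda k: max(0.0, 1.0 - a / k)   # clamped to 0 for k <= a
rho = lambda k: b / k ** 2

K = 20000
total, prod = 0.0, 1.0
# run the sum backwards: prod accumulates prod_{j=i+1}^K alpha_j
for i in range(K, 0, -1):
    total += prod * rho(i)
    prod *= alpha(i)

target = rho(K) / (1.0 - alpha(K))        # rho_K / (1 - alpha_K)
ratio = total / target
print(ratio)   # ~ 1/(1 - 1/3) = 3/2
```

With \(a=3\) the ratio approaches \(3/2\) at rate \({\mathcal {O}}(1/k)\), as the proof of the proposition suggests.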
Combining Theorem 2.14 with Proposition 3.5, we obtain the following result.
Theorem 3.6
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that the sequences \((\alpha _k)\) and \((\rho _k)\) satisfy \(\alpha _k\in [0,1[\) and \(\rho _k\in ]0,2[\) for every \(k\ge 1\). Let us assume that there exist \(\overline{\alpha }\in [0,1]\), \(\overline{\rho }\in [0,2[\), \(c \in [0,1[ \) and \(c'' \in {\mathbb {R}}\), with \(-(1-\overline{\rho }/2)<c''\le -(1-\overline{\rho }/2) c\) such that
Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\Vert x_{k+1}-x_k\Vert ={\mathcal {O}}\left( \frac{\rho _k}{1-\alpha _k}\right) \quad \text{ as } k\rightarrow +\infty .\)
Assume additionally that \(\frac{|\mu _{k+1}-\mu _k|}{\mu _{k+1}}={\mathcal {O}}\left( \frac{\rho _k}{1-\alpha _k}\right) \) as \(k\rightarrow +\infty \), together with \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{\rho _i}{1-\alpha _i}=+\infty .\)
Then the following holds
-
(ii)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\). If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in ~{\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Finally assume that \(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{\rho _i}{1-\alpha _i}<+\infty \). Then we obtain
-
(iii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert <+\infty \), and hence the sequence \((x_k)\) converges strongly toward some \(x_\infty \in {\mathcal {H}}\).
Proof
Let us check that the assumptions of Theorem 2.14 are satisfied. Condition \((K_0)\) is fulfilled owing to assumption (51) and Proposition 3.1\(\mathrm{{(i)}}\). Assumptions (49)–(50)–(51)–(52)–(53) ensure that condition \((K_1)\) is satisfied, see Proposition 3.3. Since \(\overline{\rho }\in [0,2[\), condition \((K_2)\) holds true in view of assumption (50).
\(\mathrm{{(i)}}\) Observe that
hence \(1+c+c''\overline{\alpha }>0\). Proposition 3.5 then shows that
By combining this equivalence with Theorem 2.14\(\mathrm{{(i)}}\), we obtain that \(\Vert x_{k+1}-x_k\Vert ={\mathcal {O}}\left( \frac{\rho _k}{1-\alpha _k}\right) \, \text{ as } k\rightarrow ~+\infty .\)
\(\mathrm{{(ii)}}\)–\(\mathrm{{(iii)}}\) In view of (54) and the equivalence \(t_{k+1}\sim \frac{1}{1-c}\frac{1}{1-\alpha _{k}}\, \text{ as } k\rightarrow +\infty \), we immediately see that the first condition of \((K_4)\) is satisfied. The second condition of \((K_4)\) is guaranteed by the assumption \(\frac{|\mu _{k+1}-\mu _k|}{\mu _{k+1}}={\mathcal {O}}\left( \frac{\rho _k}{1-\alpha _k}\right) \) as \(k\rightarrow +\infty \). From assumption (52) and \(\alpha _k\in [0,1[\), we get
It ensues that \(\rho _k={\mathcal {O}}(\rho _{k+1}) \text{ as } k\rightarrow +\infty \). Recalling from Proposition 3.1\(\mathrm{{(ii)}}\) that \(t_{k+1}\sim t_{k+2} \text{ as } k\rightarrow ~+\infty \), we deduce immediately that the third condition of \((K_4)\) is satisfied. Finally, condition \((K_5)\) is fulfilled owing to the assumption \(\sum _{i=1}^{+\infty }\frac{\rho _i}{1-\alpha _i}=+\infty .\) Points \(\mathrm{{(ii)}}\) and \(\mathrm{{(iii)}}\) then follow from the corresponding points of Theorem 2.14. \(\square \)
3.3 Some particular cases
Let us now particularize our results to the case \(\alpha _k=1-\alpha /k^q\) and \(\rho _k=\beta /k^r\), for some \(\alpha \), \(\beta >0\), \(q\in ]0,1[\) and \(r>0\).
Corollary 3.7
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \((q,r)\in ]0,1[\times {\mathbb {R}}_+^*\) is such that \(r\ge 2q\), and that \((\alpha , \beta )\in {\mathbb {R}}_+^*\times {\mathbb {R}}_+^*\) satisfies \(\alpha ^2/\beta >1\) if \(r=2q\) (no condition if \(r>2q\)). Assume that \(\alpha _k=1-\alpha /k^q\) and \(\rho _k=\beta /k^r\) for every \(k\ge 1\). Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }i^{r-q}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{1}{i^{r-q}}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), and \(\displaystyle \sum \nolimits _{i=1}^{+\infty } \frac{1}{i^{r-q}}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
-
(iv)
\(\Vert x_{k+1}-x_k\Vert ={\mathcal {O}}\left( \frac{1}{k^{r-q}}\right) \quad \text{ as } k\rightarrow +\infty .\)
Assume additionally that \(r\le q+1\) and that \(\frac{|\mu _{k+1}-\mu _k|}{\mu _{k+1}}={\mathcal {O}}\left( \frac{1}{k^{r-q}}\right) \) as \(k\rightarrow +\infty \). Then the following holds
-
(v)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(vi)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in ~{\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Finally assume that \(r>q+1\). Then we obtain
-
(vii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert <+\infty \), and hence the sequence \((x_k)\) converges strongly toward some \(x_\infty \in {\mathcal {H}}\).
Proof
We first check that the assumptions (49), (50), (51), (52) and (53) are fulfilled. Assumptions (49)–(50) are clearly satisfied, with \(\overline{\alpha }=1\) and \(\overline{\rho }=0\) respectively. Now observe that
where we have used \(q\in ]0,1[\). Hence assumption (51) is verified with \(c=0\). On the other hand, we have
This shows that assumption (52) is fulfilled with \(c''=0\). Finally, hypothesis (53) amounts to
We have \((1-\alpha _k)^2/\rho _k=(\alpha ^2/k^{2q})(k^r/\beta )=\frac{\alpha ^2}{\beta } k^{r-2q}\), hence
It ensues that assumption (53) is automatically satisfied if \(r>2q\), while it is equivalent to \(\alpha ^2/\beta >1\) if \(r=2q\). Therefore the assumptions of Theorem 3.6 are satisfied, which implies that the hypotheses of Theorem 3.4 are also fulfilled. Points \(\mathrm{{(i)}}\), \(\mathrm{{(ii)}}\) and \(\mathrm{{(iii)}}\) follow immediately from Theorem 3.4. Item \(\mathrm{(iv)}\) is a consequence of Theorem 3.6\(\mathrm{{(i)}}\). Condition \(\sum _{i=1}^{+\infty }\frac{\rho _i}{1-\alpha _i}=+\infty \) amounts to \(r\le q+1\). Points \(\mathrm{(v)}\), \(\mathrm{(vi)}\) and \(\mathrm{(vii)}\) can be immediately derived from the corresponding points of Theorem 3.6. \(\square \)
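The values \(c=0\) and \(c''=0\) obtained in this proof can be verified numerically. The sketch below (not from the paper) assumes that (51) and (52) are difference-quotient conditions of the same type as (29) and (30), namely \(\frac{1}{1-\alpha _{k+1}}-\frac{1}{1-\alpha _k}\rightarrow c\) and \(\frac{\rho _{k+1}-\rho _k}{(1-\alpha _k)\rho _k}\rightarrow c''\); the latter form is consistent with the value \(c''=-r/\alpha \) found for \(\alpha _k=1-\alpha /k\) in Corollary 3.8.

```python
# Quick check (sketch) of the constants in the proof of Corollary 3.7
# for alpha_k = 1 - a/k**q and rho_k = b/k**r with q in ]0,1[ and r >= 2q.

a, q, r, b = 2.0, 0.5, 1.2, 1.0
alpha = lambda k: 1.0 - a / k ** q
rho = lambda k: b / k ** r

K = 10 ** 8
c_est = 1.0 / (1.0 - alpha(K + 1)) - 1.0 / (1.0 - alpha(K))
cpp_est = (rho(K + 1) - rho(K)) / ((1.0 - alpha(K)) * rho(K))
print(c_est, cpp_est)   # both ~ 0, matching c = 0 and c'' = 0
```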
Consider finally the case \(q=1\), thus leading to a sequence \((\alpha _k)\) of the form \(\alpha _k=1-\alpha /k\). This case was recently studied by Attouch and Peypouquet [7] in connection with Nesterov’s accelerated methods.
Corollary 3.8
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Let \(r\ge 2\), \(\alpha >r\) and \(\beta >0\) be such that \(\beta <\alpha (\alpha -2)\) if \(r=2\) (no condition on \(\beta \) if \(r>2\)). Assume that \(\alpha _k=1-\alpha /k\) and \(\rho _k=\beta /k^r\) for every \(k\ge 1\). Then for any sequence \((x_k)\) generated by (RIPA), we have
-
(i)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }i^{r-1}\Vert x_i-x_{i-1}\Vert ^2<+\infty \).
-
(ii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\frac{1}{i^{r-1}}\Vert \mu _i A_{\mu _i}(y_i)\Vert ^2<+\infty \), and \(\displaystyle \sum \nolimits _{i=1}^{+\infty } \frac{1}{i^{r-1}}\Vert \mu _i A_{\mu _i}(x_i)\Vert ^2<+\infty \).
-
(iii)
For any \(z\in {\mathrm{zer}}A\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists, and hence \((x_k)\) is bounded.
-
(iv)
\(\Vert x_{k+1}-x_k\Vert ={\mathcal {O}}\left( \frac{1}{k^{r-1}}\right) \quad \text{ as } k\rightarrow +\infty .\)
Assume additionally that \(r=2\) and that \(\frac{|\mu _{k+1}-\mu _k|}{\mu _{k+1}}={\mathcal {O}}\left( \frac{1}{k}\right) \) as \(k\rightarrow +\infty \). Then the following holds
-
(v)
\(\lim _{k\rightarrow +\infty }\mu _k A_{\mu _k}(x_k)=0\).
-
(vi)
If \(\liminf _{k\rightarrow +\infty }\mu _k>0\), then there exists \(x_\infty \in ~{\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
Finally assume that \(r>2\). Then we obtain
-
(vii)
\(\displaystyle \sum \nolimits _{i=1}^{+\infty }\Vert x_i-x_{i-1}\Vert <+\infty \), and hence the sequence \((x_k)\) converges strongly toward some \(x_\infty \in {\mathcal {H}}\).
Proof
Assumptions (49)–(50) are clearly satisfied, with \(\overline{\alpha }=1\) and \(\overline{\rho }=0\) respectively. Now observe that
hence assumption (51) is verified with \(c=\frac{1}{\alpha }\). On the other hand, we have
This shows that assumption (52) is fulfilled with \(c''=-\frac{r}{\alpha }\). The hypothesis \(-(1-\overline{\rho }/2)<c''\le -(1-\overline{\rho }/2)c\) amounts to \(-1<-\frac{r}{\alpha }\le -\frac{1}{\alpha }\), which is in turn equivalent to \(1\le r<\alpha \). This holds true in view of the assumptions of Corollary 3.8. Finally, hypothesis (53) can be rewritten as
We have \((1-\alpha _k)^2/\rho _k=(\alpha ^2/k^{2})(k^r/\beta )=\frac{\alpha ^2}{\beta } k^{r-2}\), hence
It ensues that assumption (53) is automatically satisfied if \(r>2\), while it is equivalent to \(\alpha (\alpha -2)>\beta \) if \(r=2\). Points \(\mathrm{{(i)}}\), \(\mathrm{{(ii)}}\) and \(\mathrm{{(iii)}}\) follow immediately from Theorem 3.4. Item \(\mathrm{(iv)}\) is a consequence of Theorem 3.6\(\mathrm{{(i)}}\). Condition \(\sum _{i=1}^{+\infty }\frac{\rho _i}{1-\alpha _i}=+\infty \) amounts to \(r\le 2\), which boils down to \(r=2\). Points \(\mathrm{(v)}\), \(\mathrm{(vi)}\) and \(\mathrm{(vii)}\) can be immediately derived from the corresponding points of Theorem 3.6. \(\square \)
The case \(r=2\) corresponds to the situation studied by Attouch and Peypouquet [7]. More precisely, they considered the case
where \(\alpha \), \(s>0\) and \(\lambda _k=(1+ \varepsilon )\frac{s}{\alpha ^2}k^2\), for some \(\varepsilon >0\). Let us recall their result, which can be obtained as a direct consequence of Theorems 3.4 and 3.6. The details are left to the reader.
Theorem 3.9
(Attouch–Peypouquet [7]) Let \(A: \mathcal {H} \rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator such that \({\mathrm{zer}}A \ne \emptyset \). Let \((x_k)\) be a sequence generated by the Regularized Inertial Proximal Algorithm
Suppose that \(\alpha >2\), \(s>0\), \(\varepsilon > \frac{2}{\alpha -2}\), and \( \lambda _k = (1+ \varepsilon )\frac{s}{\alpha ^2}k^2\) for all \(k \ge 1\). Then,
-
(i)
\(\Vert x_{k+1} - x_{k} \Vert = \mathcal {O} (\frac{1}{k})\) as \( k\rightarrow +\infty \), and \(\sum _{k=1}^{+\infty } k \Vert x_{k} - x_{k-1} \Vert ^2 < +\infty \).
-
(ii)
There exists \(x_\infty \in {\mathrm{zer}}A\) such that \(x_k\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(k\rightarrow +\infty \).
-
(iii)
The sequence \((y_k)\) converges weakly in \({\mathcal {H}}\) to \(x_\infty \), as \(k\rightarrow +\infty \).
The following table gives a synthetic view of some of the situations studied previously (the large number of cases does not allow us to include all of them). Each column gives a joint tuning of the parameters \(\alpha _k\), \(\rho _k\) and \(\mu _k\) which provides the convergence of the iterates generated by (RIPA). For ease of reading, we recall the definition of (RIPA)
From left to right, the table is ordered according to the decreasing values of \(\alpha _k\).
|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| \(\alpha _k\) | \(\alpha _k=1-\frac{\alpha }{k}\), \(\alpha >2\) | \(\alpha _k=1-\frac{\alpha }{k^q}\), \(q \in ]0,1[\), \(\alpha >0\) | \(\alpha _k=1-\frac{\alpha }{k^q}\), \(q \in ]0,1[\), \(\alpha >0\) | \(\alpha _k \equiv \alpha \in [0,1[\) |
| \(\rho _k\) | \(\rho _k = \frac{\beta }{k^2}\), \(\beta < \alpha (\alpha -2)\) | \(\rho _k = \frac{\beta }{k^{2q}}\), \(\beta < \alpha ^2\) | \(\rho _k = \frac{\beta }{k^r}\), \(2q < r\le q+1\), \(\beta >0\) | \(\rho _k \equiv \rho < \frac{2 (1-\alpha )^2}{2\alpha ^2 -\alpha +1}\) |
| \(\mu _k\) | \(\frac{\vert \mu _{k+1}-\mu _k\vert }{\mu _{k+1}}={\mathcal {O}}\left( \frac{1}{k}\right) \), \(\liminf \mu _k >0\) | \(\frac{\vert \mu _{k+1}-\mu _k\vert }{\mu _{k+1}}={\mathcal {O}}\left( \frac{1}{k^q}\right) \), \(\liminf \mu _k >0\) | \(\frac{\vert \mu _{k+1}-\mu _k\vert }{\mu _{k+1}}={\mathcal {O}}\left( \frac{1}{k^{r-q}}\right) \), \(\liminf \mu _k >0\) | \(\liminf \mu _k >0\) |
As noticed before, we can observe the balance between the inertial effect and the relaxation effect. As \(\alpha _k\) gets closer to one, the relaxation parameter \(\rho _k\) gets closer to zero.
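As an illustration of this tuning, the following sketch (not from the paper) runs (RIPA) with the parameters of the first column on the rotation operator \(A(x_1,x_2)=(-x_2,x_1)\), identified with multiplication by \(i\) on \({\mathbb {C}}\simeq {\mathbb {R}}^2\); this operator is maximally monotone, is not a subdifferential, and \({\mathrm{zer}}A=\{0\}\). The update \(x_{k+1}=y_k-\rho _k\mu _k A_{\mu _k}(y_k)\) is written equivalently as \(x_{k+1}=(1-\rho _k)y_k+\rho _k J_{\mu _k A}(y_k)\). Since \(1-\alpha /k<0\) for \(k\le \alpha \), the code clamps \(\alpha _k\) to \(0\) for these first iterations; this is a choice made for the experiment, not a prescription of the paper.

```python
# (RIPA) on the skew operator A z = 1j*z, with alpha_k = 1 - alpha/k,
# rho_k = beta/k**2 (beta < alpha*(alpha - 2)) and constant mu_k.
# The resolvent of A is J_{mu A}(z) = z / (1 + 1j*mu).

alpha, beta, mu = 3.0, 1.5, 1.0
J = lambda z: z / (1.0 + 1j * mu)

K = 200000
x_prev, x = 1.0 + 0.0j, 1.0 + 0.0j          # x_0 = x_1 = (1, 0)
max_kdx = 0.0
for k in range(1, K):
    a_k = max(0.0, 1.0 - alpha / k)          # clamped inertial coefficient
    r_k = beta / k ** 2                      # vanishing relaxation
    y = x + a_k * (x - x_prev)               # inertial extrapolation
    x_next = (1.0 - r_k) * y + r_k * J(y)    # relaxed resolvent step
    max_kdx = max(max_kdx, k * abs(x_next - x))
    x_prev, x = x, x_next

print(abs(x))      # slow drift of the iterates toward zer A = {0}
print(max_kdx)     # k*||x_{k+1}-x_k|| stays bounded, cf. Corollary 3.8 (iv)
```

The run exhibits both the slow convergence of \(\Vert x_k\Vert \) to zero and the bound \(\Vert x_{k+1}-x_k\Vert ={\mathcal {O}}(1/k)\) predicted by Corollary 3.8.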
4 Ergodic convergence results
4.1 Ergodic variant of the Opial lemma
An ergodic version of the Opial lemma was derived by Passty [26] in the case of the averaging process defined by
where \((s_k)\) is a sequence of positive steps. In order to deal with a more general averaging process, let us consider a double sequence \((\tau _{i,k})_{i,k\ge 1}\) of nonnegative numbers satisfying the following assumptions
To each bounded sequence \((x_k)\) of \({\mathcal {H}}\), we associate the averaged sequence \((\widehat{x}_k)\) by
Lemma B.2 in the appendix shows that the sequence \((\widehat{x}_k)\) is well-defined, bounded and that convergence of \((x_k)\) implies convergence of \((\widehat{x}_k)\) as \(k\rightarrow +\infty \) toward the same limit (Cesaro property). The extension of the Opial lemma to a general averaging process satisfying (55) and (56) is given hereafter. This result can be obtained as a consequence of the generalized Opial lemma established by Brézis–Browder, see [14, Lemma 1]. For the sake of the reader, we give an independent and self-contained proof.
Proposition 4.1
Let S be a nonempty subset of \({\mathcal {H}}\) and let \((x_k)\) be a bounded sequence of \({\mathcal {H}}\). Let \((\tau _{i,k})\) be a double sequence of nonnegative numbers satisfying (55) and (56), and let \((\widehat{x}_k)\) be the averaged sequence defined by (57). Assume that
-
(i)
For every \(z\in S\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists;
-
(ii)
every weak limit point of the sequence \((\widehat{x}_k)\) belongs to S.
Then the sequence \((\widehat{x}_k)\) converges weakly as \(k\rightarrow +\infty \) toward some \(x_\infty \in S\).
Proof
From Lemma B.2\(\mathrm{{(i)}}\), the sequence \((\widehat{x}_k)\) is bounded, therefore it is enough to establish the uniqueness of weak limit points. Let \((\widehat{x}_{k_n})\) and \((\widehat{x}_{k_m})\) be two weakly converging subsequences satisfying respectively \(\widehat{x}_{k_n}\rightharpoonup \overline{x^{}}_1\) as \(n\rightarrow +\infty \) and \(\widehat{x}_{k_m}\rightharpoonup \overline{x^{}}_2\) as \(m\rightarrow +\infty \). From \(\mathrm{{(ii)}}\), the weak limit points \(\overline{x^{}}_1\) and \(\overline{x^{}}_2\) belong to S. From \(\mathrm{{(i)}}\), we deduce that \(\lim _{k\rightarrow +\infty }\Vert x_k-\overline{x^{}}_1\Vert ^2\) and \(\lim _{k\rightarrow +\infty }\Vert x_k-\overline{x^{}}_2\Vert ^2\) exist. Writing that
we infer that \(\lim _{k\rightarrow +\infty }\langle x_k,\overline{x^{}}_2-\overline{x^{}}_1\rangle \) exists. Observe that
By applying Lemma B.2\(\mathrm{{(ii)}}\) to the real sequence \(\left( \big \langle x_k,\overline{x^{}}_2-\overline{x^{}}_1\big \rangle \right) \), we deduce that \(\lim _{k\rightarrow +\infty }\langle \widehat{x}_k,\overline{x^{}}_2-\overline{x^{}}_1\rangle \) exists. This implies that
which entails that \(\langle \overline{x^{}}_1,\overline{x^{}}_2-\overline{x^{}}_1\rangle =\langle \overline{x^{}}_2,\overline{x^{}}_2-\overline{x^{}}_1\rangle .\) Therefore \(\Vert \overline{x^{}}_2-\overline{x^{}}_1\Vert ^2=0\), which ends the proof. \(\square \)
Remark 4.2
By taking \((\tau _{i,k})\) defined by
conditions (55) and (56) are trivially satisfied and we find \(\widehat{x}_k=x_k\) for every \(k\ge 1\). It ensues that the Opial lemma appears as a particular case of Proposition 4.1.
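The Cesaro property of Lemma B.2 can be illustrated with Passty's weights \(\tau _{i,k}=s_i/\sum _{j=1}^k s_j\). The sketch below (not from the paper) assumes that (55)–(56) express nonnegativity and normalization of the weights together with \(\lim _{k\rightarrow +\infty }\tau _{i,k}=0\) for fixed i, as suggested by the proof of Theorem 4.3; the latter condition holds as soon as \(\sum _i s_i=+\infty \).

```python
# Cesaro property (sketch): for Passty weights tau_{i,k} = s_i / (s_1+...+s_k)
# with divergent sum of steps, a convergent sequence (x_k) has averages
# (xhat_k) converging to the same limit.

s = lambda i: 1.0 / i                  # positive steps, sum_i s_i = +infinity
x = lambda i: 2.0 + (-1.0) ** i / i    # x_i -> 2

K = 10 ** 5
num = den = 0.0
for i in range(1, K + 1):
    num += s(i) * x(i)
    den += s(i)
xhat = num / den                       # weighted average, xhat -> 2
print(xhat)
```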
4.2 Ergodic convergence of the iterates
To each sequence \((x_k)\) generated by (RIPA), we associate a suitable averaged sequence as in (57). The weight coefficients are judiciously chosen and depend on \(\alpha _k\), \(\mu _k\) and \(\rho _k\). Under conditions \((K_0)\)–\((K_1)\)–\((K_2)\)–\((K_3)\), we show that the averaged sequence converges weakly toward some zero of the operator A.
Theorem 4.3
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose that \(\alpha _k\in [0,1]\) and \(\rho _k\in ]0,2]\) for every \(k\ge 1\). Under \((K_0)\), let \((t_{i,k})\) and \((t_i)\) be the sequences respectively defined by (13) and (15). Assume that conditions \((K_1)\)–\((K_2)\)–\((K_3)\) hold, together with
Let us define the sequence \((\tau _{i,k})\) by
Then for any sequence \((x_k)\) generated by (RIPA), there exists \(x_\infty \in {\mathrm{zer}}A\) such that
Proof
The proof relies on Proposition 4.1 applied with \(S={\mathrm{zer}}A\). Let us first check that conditions (55) and (56) are satisfied for the sequence \((\tau _{i,k})\) given by (59). Property (55) follows immediately from the definition of \((\tau _{i,k})\) (recall that \(t_{i,k}=0\) for \(i>k\)). On the other hand, observe that for every i, \(k\ge 1\),
The quantity \(t_{i} \rho _{i-1}\mu _{i-1}\) is finite and independent of k. Since \(t_{i,k}\) tends increasingly toward \(t_i\) as \(k\rightarrow +\infty \), the monotone convergence theorem implies that
where we have used the assumption (58). We then deduce from the inequality (60) that \(\lim _{k\rightarrow +\infty }\tau _{i,k}=~0\), which establishes (56).
We now have to prove that the conditions \(\mathrm{{(i)}}\) and \(\mathrm{{(ii)}}\) of Proposition 4.1 are fulfilled. Condition \(\mathrm{{(i)}}\) is realized in view of Theorem 2.6\(\mathrm{{(iii)}}\). Let us now assume that there exist \(x_\infty \in {\mathcal {H}}\) and a sequence \((k_n)\) such that \(k_n\rightarrow +\infty \) and \(\widehat{x}_{k_n}\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(n\rightarrow +\infty \). Let us fix \((z,q)\in {\mathrm{gph}}A\) and define the sequence \((h_k)\) by \(h_k=\frac{1}{2}\Vert x_k-z\Vert ^2\). From inequality (10) of Lemma 2.2, we have
because the assumptions \(\alpha _k\in [0,1]\) and \(\rho _k\in ]0,2]\) imply respectively \(\frac{1}{2}(\alpha _k+\alpha _k^2)\le \alpha _k\) and \(\frac{2-\rho _k}{2\rho _k}\ge 0\). Since \(x_{k+1}=y_k-\rho _k\mu _k A_{\mu _k}(y_k)\), the above inequality can be rewritten as
Setting \(a_k=h_k-h_{k-1}\) and
inequality (62) amounts to \(a_{k+1}\le \alpha _k a_k+ w_k.\) By applying Lemma B.1\(\mathrm{{(i)}}\), we obtain for every \(k\ge 1\),
Since \(h_k\ge 0\) and \(t_{i+1,k}\le t_{i+1}\), we deduce that
Recalling from Theorem 2.6\(\mathrm{{(i)}}\) that \(\sum _{i=1}^{+\infty }t_{i+1}\alpha _i\Vert x_i-x_{i-1}\Vert ^2<+\infty \), we infer that for every \(k\ge 1\),
where we have set \(C:=h_0+t_1|h_1-h_0|+\sum _{i=1}^{+\infty }t_{i+1}\alpha _i\Vert x_i-x_{i-1}\Vert ^2<+\infty .\) Since \(\rho _k\in ]0,2]\), according to the Cauchy–Schwarz inequality we have that
It ensues that
By shifting the index of summation, we deduce from the above inequality that
where we have set \(C':=C+t_{1}\rho _{0}\mu _{0} |\langle x_{1}-z,q\rangle |.\) This can be rewritten as
Dividing by \(\sum _{i=1}^{k}t_{i,k}\rho _{i-1}\mu _{i-1}\), we find
By Theorem 2.6\(\mathrm{(iv)}\) we have \(\lim _{k\rightarrow +\infty }\Vert \mu _{k} A_{\mu _{k}}(y_{k})\Vert =0\). From the Cesaro property, we infer that
see Lemma B.2. Using (61) and taking the upper limit as \(k\rightarrow +\infty \) in inequality (63), we then obtain
Since \(\widehat{x}_{k_n}\rightharpoonup x_\infty \) weakly in \({\mathcal {H}}\) as \(n\rightarrow +\infty \), we have \(\langle \widehat{x}_{k_n}-z,q\rangle \rightarrow \langle x_\infty -z,q\rangle \) as \(n\rightarrow +\infty \). From what precedes, we deduce that \(\langle x_\infty -z,q\rangle \le 0\). Since this is true for every \((z,q)\in {\mathrm{gph}}A\), and since the operator A is maximally monotone, we infer that \(0\in A(x_\infty )\). We have proved that \(x_\infty \in {\mathrm{zer}}A\), which shows that condition \(\mathrm{{(ii)}}\) of Proposition 4.1 is satisfied. The proof is complete. \(\square \)
Let us now apply Theorem 4.3 to the case \(\alpha _k=0\) for every \(k\ge 1\). In this case, assumptions \((K_0)\) and \((K_1)\) are trivially satisfied, and moreover \(t_i=t_{i,k}=1\) for every \(i\ge 1\) and \(k\ge i\). We then obtain the following corollary of Theorem 4.3.
Corollary 4.4
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \). Suppose moreover that \(\limsup _{k\rightarrow +\infty } \rho _k<2\) and \(\liminf _{k\rightarrow +\infty } \rho _k>0\), together with \(\sum _{i=0}^{+\infty }\rho _i\mu _i=+\infty \). Then for any sequence \((x_k)\) generated by (RPA)
there exists \(x_\infty \in {\mathrm{zer}}A\) such that
Proof
From Theorem 4.3, we obtain that
We deduce immediately that
Recall from Corollary 2.8\(\mathrm{{(i)}}\) that \(\sum _{i=1}^{+\infty }\frac{2-\rho _i}{\rho _i}\Vert x_{i+1}-x_i\Vert ^2<+\infty \). Since \(\limsup _{k\rightarrow +\infty } \rho _k<2\), this implies that \(\sum _{i=1}^{+\infty }\Vert x_{i+1}-x_i\Vert ^2<+\infty \), which entails in turn that \(\lim _{k\rightarrow +\infty }\Vert x_{k+1}-x_k\Vert =0\). From the Cesaro property, we infer that
By putting together (65) and (66), we immediately obtain (64). \(\square \)
If we assume moreover that \(\rho _k=1\) for every \(k\ge 1\), we recover a classical result of ergodic convergence for the proximal point algorithm; see the seminal paper of Brézis and Lions [15, Remarque 10].
Corollary 4.5
Under (H), assume that \({\mathrm{zer}}A\ne \emptyset \) and that \(\sum _{i=0}^{+\infty }\mu _i=+\infty \). Then for any sequence \((x_k)\) generated by the algorithm
there exists \(x_\infty \in {\mathrm{zer}}A\) such that \(\displaystyle {\frac{1}{\sum \nolimits _{i=0}^k \mu _i}\sum \limits _{i=0}^k \mu _i x_i\rightharpoonup x_\infty } \quad \text{ weakly } \text{ in }~ {\mathcal {H}}~ \text{ as }~ k\rightarrow +\infty .\)
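The construction in Corollary 4.5 can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the test operator \(A(x_1,x_2)=(-x_2,x_1)\) on \({\mathbb {R}}^2\) (a skew, hence maximally monotone, operator whose only zero is the origin), the constant step sizes, and all numerical choices are assumptions made purely for illustration.

```python
import numpy as np

# Sketch: proximal point iterates x_{k+1} = J_{mu_k A}(x_k) together with
# the mu-weighted ergodic average of Corollary 4.5, for the skew rotation
# operator A(x1, x2) = (-x2, x1) on R^2 (maximally monotone, zer A = {0}).

A = np.array([[0.0, -1.0], [1.0, 0.0]])

def resolvent(y, mu):
    # J_{mu A}(y) = (I + mu A)^{-1} y, solved directly since A is linear
    return np.linalg.solve(np.eye(2) + mu * A, y)

mu = [1.0] * 500                   # constant steps, so sum_i mu_i = +infinity
x = np.array([1.0, 0.0])
iterates = [x]
for m in mu[1:]:
    x = resolvent(x, m)            # x_{k+1} = J_{mu_k A}(x_k)
    iterates.append(x)

# mu-weighted ergodic average: (sum_i mu_i x_i) / (sum_i mu_i)
avg = sum(m * xi for m, xi in zip(mu, iterates)) / sum(mu)
print(np.linalg.norm(avg) < 0.05)  # the average approaches the zero of A
```

In this finite-dimensional example the iterates themselves already converge to the zero; the weighted average is computed only to make the statement of the corollary concrete.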
5 Conclusion, perspective
The introduction of inertial features into proximal-based algorithms for solving general monotone inclusions is a long-standing and difficult problem. The (RIPA) algorithm, which addresses these issues, involves three basic parameters, \(\alpha _k\), \(\mu _k\), \(\rho _k\), depending on the iteration index k, which account respectively for the inertia, the proximal step size, and the relaxation. (RIPA) provides a general framework for understanding the subtle tuning of these parameters required to achieve weak convergence of the iterates. In particular, we obtained convergence results based on the Nesterov acceleration method, in the context of maximally monotone operators, which extend the recent results of Attouch–Peypouquet [7]. Several basic splitting algorithms in optimization rely naturally on the maximally monotone framework, such as ADMM, primal-dual methods, and Douglas–Rachford splitting. Our results provide a general way to understand the acceleration of these algorithms via inertia. Several important questions remain to be studied, such as deriving splitting methods in this context and analyzing their convergence rates. In this respect, it would be important to study the case \(A= \partial \Psi \), where \(\Psi \) is a closed convex function, thereby recovering the convergence rate of the values for Nesterov's methods.
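The interplay of the three parameters can be conveyed in a few lines of code. The sketch below is purely illustrative: it assumes the standard inertial–relaxed form \(y_k = x_k + \alpha _k(x_k - x_{k-1})\), \(x_{k+1} = (1-\rho _k)y_k + \rho _k J_{\mu _k A}(y_k)\), and the test operator \(A=I\) (the identity, maximally monotone with \({\mathrm{zer}}A=\{0\}\)) together with all parameter choices are hypothetical.

```python
# Illustrative sketch of a relaxed inertial proximal iteration.
# Hypothetical test operator: A = I (identity), maximally monotone,
# with zer A = {0} and resolvent J_{mu A}(y) = y / (1 + mu).

def resolvent_identity(y, mu):
    return y / (1.0 + mu)

def ripa(x0, x1, alpha, mu, rho, n_iter=200):
    # y_k     = x_k + alpha_k (x_k - x_{k-1})           (inertia)
    # x_{k+1} = (1 - rho_k) y_k + rho_k J_{mu_k A}(y_k) (relaxation + prox)
    x_prev, x = x0, x1
    for k in range(1, n_iter + 1):
        y = x + alpha(k) * (x - x_prev)
        x_prev, x = x, (1 - rho(k)) * y + rho(k) * resolvent_identity(y, mu(k))
    return x

x_final = ripa(x0=5.0, x1=4.0,
               alpha=lambda k: k / (k + 3.0),    # Nesterov-like inertia
               mu=lambda k: 0.1 * (k + 1) ** 2,  # growing proximal step size
               rho=lambda k: 1.0)                # no relaxation in this run
print(abs(x_final) < 1e-6)
```

With a closed-form resolvent one can experiment with the parameter regimes discussed above, e.g. step sizes \(\mu _k\) growing like \(k^2\) in the spirit of [7].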
Notes
Note that in [4, Proposition 14], a closely related but different condition has been considered: the difference of the quotients is assumed to be less than or equal to c (and this guarantees (\(K_0\))).
References
Álvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)
Álvarez, F., Attouch, H.: The heavy ball with friction dynamical system for convex constrained minimization problems, Optimization (Namur, 1998), pp. 25–35, Lecture Notes in Economics and Mathematical Systems, vol. 481. Springer, Berlin, (2000)
Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9(1–2), 3–11 (2001)
Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28, 849–874 (2018)
Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differ. Equ. 264, 7138–7182 (2018)
Attouch, H., Maingé, P.E.: Asymptotic behavior of second-order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 17, 836–857 (2010)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. Ser. B. https://doi.org/10.1007/s10107-018-1252-x (2018)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982)
Bot, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)
Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. Lecture Notes 5. North Holland (1972)
Brézis, H., Browder, F.E.: Nonlinear ergodic theorems. Bull. Am. Math. Soc. 82(6), 959–961 (1976)
Brézis, H., Lions, P.L.: Produits infinis de résolvantes. Isr. J. Math. 29, 329–345 (1978)
Cabot, A., Frankel, P.: Asymptotics for some proximal-like method involving inertia and memory aspects. Set Valued Var. Anal. 19, 59–74 (2011)
Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27, 2356–2380 (2017)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Eckstein, J., Ferris, M.C.: Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control. Informs J. Comput. 10, 218–235 (1998)
Iutzeler, F., Hendrickx, J.M.: Generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Softw. (2018) (to appear)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Maingé, P.-E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Pesquet, J.-C., Pustelnik, N.: A parallel inertial proximal optimization method. Pac. J. Optim. 8, 273–305 (2012)
Appendices
Appendix A. Yosida regularization
A set-valued mapping A from \({\mathcal {H}}\) to \({\mathcal {H}}\) assigns to each \(x\in {\mathcal {H}}\) a set \(A(x)\subset {\mathcal {H}}\), hence it is a mapping from \({\mathcal {H}}\) to \(2^{\mathcal {H}}\). Every set-valued mapping \(A:{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) can be identified with its graph defined by
The set \(\{x\in {\mathcal {H}}:\ 0\in A(x)\}\) of the zeros of A is denoted by \({\mathrm{zer}}A\). An operator \(A:{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) is said to be monotone if for any (x, u), \((y,v)\in {\mathrm{gph}}A\), one has \(\langle y-x, v-u\rangle \ge 0\). It is maximally monotone if there exists no monotone operator whose graph strictly contains \({\mathrm{gph}}A\). If a single-valued operator \(A:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is continuous and monotone, then it is maximally monotone, cf. [13, Proposition 2.4].
Given a maximally monotone operator A and \(\lambda >0\), the resolvent of A with index \(\lambda \) and the Yosida regularization of A with parameter \(\lambda \) are defined by
respectively. The operator \(J_{\lambda A}: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is everywhere defined and nonexpansive (indeed, it is firmly nonexpansive). Moreover, \(A_{\lambda }\) is \(\lambda \)-cocoercive: for all \(x, y \in {\mathcal {H}}\) we have
This property immediately implies that \(A_{\lambda }: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\frac{1}{\lambda }\)-Lipschitz continuous. Another property that proves useful is the resolvent equation (see, for example, [13, Proposition 2.6] or [8, Proposition 23.6])
which is valid for any \(\lambda , \mu >0\). This property allows one to compute the resolvent of \(A_\lambda \) simply, by
for any \(\lambda , \mu >0\). Also note that for any \(x \in {\mathcal {H}}\), and any \(\lambda >0\) \( A_\lambda (x) \in A (J_{\lambda A}x)= A( x - \lambda A_\lambda (x)). \) Finally, for any \(\lambda >0\), \(A_{\lambda }\) and A have the same solution set, \({\mathrm{zer}}A_{\lambda }= {\mathrm{zer}}A\). For a detailed presentation of the maximally monotone operators and the Yosida approximation, the reader can consult [8] or [13].
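For concreteness, the Lipschitz and cocoercivity properties above can be checked numerically on a simple example, not taken from the paper: \(A=\partial |\cdot |\) on the real line, for which the resolvent is the soft-thresholding operator and \(A_\lambda \) is the derivative of the Huber function.

```python
import numpy as np

# Sketch: resolvent and Yosida approximation of A = d|.| on the real line.
# J_{lam A} is soft-thresholding; A_lam is the Huber-function derivative.

def resolvent(x, lam):
    # J_{lam A}(x) = argmin_u |u| + (1/(2 lam)) (u - x)^2  (soft threshold)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def yosida(x, lam):
    # A_lam = (I - J_{lam A}) / lam
    return (x - resolvent(x, lam)) / lam

xs = np.linspace(-3.0, 3.0, 601)
lam = 0.5

# A_lam is (1/lam)-Lipschitz: check the difference quotients numerically
slopes = np.abs(np.diff(yosida(xs, lam)) / np.diff(xs))
print(slopes.max() <= 1.0 / lam + 1e-9)

# lam-cocoercivity: <A_lam x - A_lam y, x - y> >= lam |A_lam x - A_lam y|^2
x, y = 2.0, -0.3
lhs = (yosida(x, lam) - yosida(y, lam)) * (x - y)
rhs = lam * (yosida(x, lam) - yosida(y, lam)) ** 2
print(lhs >= rhs)
```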
Appendix B. Some auxiliary results
In this section, we present some auxiliary lemmas that are used throughout the paper.
Lemma B.1
Let \((a_k)\), \((\alpha _k)\) and \((w_k)\) be sequences of real numbers satisfying
Assume that \(\alpha _i\ge 0\) for every \(i\ge 1\).
-
(i)
For every \(k\ge 1\), we have
$$\begin{aligned} \sum _{i=1}^k a_i\le t_{1,k}a_1+\sum _{i=1}^{k-1} t_{i+1,k} w_i, \end{aligned}$$(68)where the double sequence \((t_{i,k})\) is defined by (13).
-
(ii)
Under \((K_0)\), assume that the sequence \((t_i)\) defined by (15) satisfies \(\sum _{i=1}^{+\infty }t_{i+1}(w_i)_+<~+\infty \). Then the series \(\sum _{i\ge 1}(a_i)_+\) is convergent, and
$$\begin{aligned}\sum _{i= 1}^{+\infty }(a_i)_+\le t_1(a_1)_+ +\sum _{i=1}^{+\infty } t_{i+1} (w_i)_+.\end{aligned}$$
Proof
\(\mathrm{{(i)}}\) Recall from Lemma 2.4\(\mathrm{{(i)}}\) that \(\alpha _i t_{i+1,k}=t_{i,k}-1\) for every \(i\ge 1\) and \(k\ge i+1\). Multiplying inequality (67) by \(t_{i+1,k}\) gives
or equivalently
By summing from \(i=1\) to \(k-1\), we deduce that
Since \(t_{k,k}=1\), inequality (68) follows immediately.
\(\mathrm{{(ii)}}\) Taking the positive part of each side of (67), we find
By applying \(\mathrm{{(i)}}\) with \((a_i)_+\) (resp. \((w_i)_+\)) in place of \(a_i\) (resp. \(w_i\)), we obtain for every \(k\ge 1\)
because \(t_{i+1,k} \le t_{i+1} \) and \(\sum _{i=1}^{+\infty } t_{i+1} (w_i)_+<+\infty \) by assumption. Letting k tend to \( + \infty \) yields the claim. \(\square \)
Given a bounded sequence \((x_k)\) of a Banach space \(({\mathcal {X}},\Vert .\Vert )\), the next lemma gives basic properties of the averaged sequence \((\widehat{x}_k)\) defined by (57).
Lemma B.2
Let \(({\mathcal {X}},\Vert .\Vert )\) be a Banach space and let \((x_k)\) be a bounded sequence of \({\mathcal {X}}\). Given a sequence \((\tau _{i,k})_{i,k\ge 1}\) of nonnegative numbers satisfying (55)–(56), let \((\widehat{x}_k)\) be the averaged sequence defined by \(\widehat{x}_k=\sum _{i=1}^{+\infty }\tau _{i,k}x_i\). Then we have
-
(i)
The sequence \((\widehat{x}_k)\) is well-defined, bounded and \(\sup _{k\ge 1}\Vert \widehat{x}_k\Vert \le \sup _{k\ge 1}\Vert x_k\Vert \).
-
(ii)
If \((x_k)\) converges toward \(\overline{x}\in {\mathcal {X}}\), then the sequence \((\widehat{x}_k)\) is also convergent and \(\lim _{k\rightarrow +\infty }\widehat{x}_k=\overline{x}\).
Proof
\(\mathrm{{(i)}}\) Set \(M=\sup _{k\ge 1}\Vert x_k\Vert <+\infty \). In view of (55), observe that for every \(k\ge 1\),
Since the space \({\mathcal {X}}\) is complete, we deduce that the series \(\sum _{i\ge 1}\tau _{i,k}x_i\) converges absolutely, hence converges. From the definition of \(\widehat{x}_k\), we then have \(\Vert \widehat{x}_k\Vert \le \sum _{i=1}^{+\infty }\tau _{i,k}\Vert x_i\Vert ,\) and hence \(\Vert \widehat{x}_k\Vert \le M\) in view of (69).
\(\mathrm{{(ii)}}\) Assume that \((x_k)\) converges toward \(\overline{x}\in {\mathcal {X}}\). By using (55), we have for every \(k\ge 1\),
Fix \(\varepsilon >0\), and let \(K\ge 1\) be such that \(\Vert x_i-\overline{x}\Vert \le \varepsilon \) for every \(i\ge K\). From the above inequality, we obtain
with \(M=\sup _{i\ge 1}\Vert x_i-\overline{x}\Vert <+\infty \). Taking the upper limit as \(k\rightarrow +\infty \), we deduce from (56) that
Since this is true for every \(\varepsilon >0\), we conclude that \(\lim _{k\rightarrow +\infty }\Vert \widehat{x}_k-\overline{x}\Vert =0\). \(\square \)
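Lemma B.2\(\mathrm{{(ii)}}\) admits a quick numerical illustration. The sketch below uses the simplest plausible weights, \(\tau _{i,k}=1/k\) for \(i\le k\) (plain Cesàro averaging); that these weights satisfy conditions (55)–(56) of the paper is assumed here, not verified.

```python
# Numerical illustration of Lemma B.2 (ii) with the hypothetical choice
# tau_{i,k} = 1/k for i <= k (plain Cesaro averaging).

x = [1.0 + (-1.0) ** k / k for k in range(1, 2001)]  # x_k -> 1

def averaged(seq, k):
    # hat{x}_k = sum_{i <= k} tau_{i,k} x_i with tau_{i,k} = 1/k
    return sum(seq[:k]) / k

print(abs(averaged(x, 2000) - 1.0) < 1e-2)  # the averages converge to 1 too
```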
Attouch, H., Cabot, A. Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Math. Program. 184, 243–287 (2020). https://doi.org/10.1007/s10107-019-01412-0
Keywords
- Maximally monotone operators
- Yosida regularization
- Inertial proximal method
- Large step proximal method
- Lyapunov analysis
- (Over)Relaxation