1 Introduction

A fundamental tool for finding a root of a monotone operator is the proximal point method [46, 59]. Monotone operator theory is of particular interest since it is closely related to convex functions and convex minimization [7, 14, 60]. For example, the proximal point method is useful when solving ill-conditioned problems or dual problems. In particular, the augmented Lagrangian method (i.e., the method of multipliers) [36, 56] and the alternating direction method of multipliers (ADMM) [26, 27] are instances of the proximal point method applied to dual problems [23, 24, 58].

To improve the efficiency of the proximal point method, accelerating its worst-case rate has been of interest both in theory and in applications (see, e.g., [1, 3, 6, 15, 29, 34, 43]). Specifically, inspired by Nesterov’s fast gradient method [51, 52], Güler [34] accelerated the worst-case rate of the proximal point method for convex minimization with respect to the cost function. This yields the fast \(O(1/i^2)\) rate, where i denotes the number of iterations, compared to the O(1/i) rate of the proximal point method. However, this acceleration has not been theoretically generalized to the monotone inclusion problem, and only empirical accelerations, e.g., via relaxation and inertia (i.e., an implicit version of the heavy ball method [55], or equivalently, Nesterov’s and Güler’s accelerating technique [34, 51, 52]) in [1, 3, 6, 15, 29], have been studied. Therefore, this paper studies accelerating the worst-case rate of the proximal point method with respect to the fixed-point residual for maximally monotone operators. This provides the fast \(O(1/i^2)\) rate, which improves upon the O(1/i) rate of the proximal point method [9, 32]. The proof is computer-assisted via the performance estimation problem (PEP) approach [21] and its extensions [20, 22, 31, 32, 37, 38, 39, 40, 41, 61, 65, 66, 67].

Under the additional strong monotonicity condition, the proximal point method has a linear rate in terms of the fixed-point residual [59], while the proposed acceleration is not guaranteed to have such a linear rate. Therefore, this paper further employs a restarting technique (e.g., [50, Section 11.4], [53, Section 5.1]) under the strong monotonicity condition. This has a linear rate, and is faster than the proximal point method for some practical cases.

The proposed acceleration of the proximal point method has wide applications. It provides an acceleration to the proximal method of multipliers [58], the Douglas–Rachford splitting method [19, 44], and ADMM [26, 27]. The proposed result also applies to preconditioned proximal point methods such as the primal–dual hybrid gradient (PDHG) method [11, 13, 25, 35] (i.e., a preconditioned ADMM), yielding an accelerated PDHG method. This paper then shows that the proposed acceleration applies to a forward method for cocoercive operators. Existing works on accelerating the forward method can be found, for example, in [2, 45].

Section 2 reviews maximally monotone operators, the proximal point method, and its known accelerations. Section 3 studies the PEP with respect to the fixed-point residual for monotone inclusion problems. Section 4 proposes a new accelerated proximal point method using the PEP. Section 5 considers a restarting technique that yields a linear rate under the additional strong monotonicity assumption. Section 6 applies the proposed acceleration to well-known instances of the proximal point method, such as the proximal method of multipliers, the PDHG method, the Douglas–Rachford splitting method, and ADMM. Section 6 also provides numerical experiments. Section 7 shows that the proposed approach also accelerates the forward method for cocoercive operators, and Sect. 8 concludes.

2 Problem and method

2.1 Monotone inclusion problem

Let \({\mathcal {H}}\) be a real Hilbert space equipped with inner product \(\mathop {\langle \cdot ,\, \cdot \rangle }\nolimits \), and associated norm \(||\cdot ||\). A set-valued operator \({\varvec{M}} \;:\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) is monotone if

$$\begin{aligned} \mathop {\langle {\varvec{x}}- {\varvec{y}},\, {\varvec{u}}- {\varvec{v}} \rangle }\nolimits \ge 0 \text { for all } ({\varvec{x}},{\varvec{u}}),({\varvec{y}},{\varvec{v}})\in {\text {gra}}{\varvec{M}} , \end{aligned}$$
(1)

where \({\text {gra}}{\varvec{M}} {:}{=} \{({\varvec{x}},{\varvec{u}})\in {\mathcal {H}}\times {\mathcal {H}}\;:\,{\varvec{u}} \in {\varvec{M}} {\varvec{x}} \}\) denotes the graph of \({\varvec{M}} \). A monotone operator \({\varvec{M}} \) is maximally monotone if there exists no monotone operator \({\varvec{A}} \;:\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) such that \({\text {gra}}{\varvec{A}} \) properly contains \({\text {gra}}{\varvec{M}} \). Let \({\mathcal {M}}({\mathcal {H}})\) be the class of maximally monotone operators on \({\mathcal {H}}\). In addition, a set-valued operator \({\varvec{M}} \;:\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) is \(\mu \)-strongly monotone for \(\mu \in \mathbb {R} _{++}\), if

$$\begin{aligned} \mathop {\langle {\varvec{x}}- {\varvec{y}},\, {\varvec{u}}- {\varvec{v}} \rangle }\nolimits \ge \mu ||{\varvec{x}}- {\varvec{y}} ||^2 \text { for all } ({\varvec{x}},{\varvec{u}}),({\varvec{y}},{\varvec{v}})\in {\text {gra}}{\varvec{M}} . \end{aligned}$$
(2)

Let \({\mathcal {M}}_{\mu }({\mathcal {H}})\) be the class of maximally and \(\mu \)-strongly monotone operators on \({\mathcal {H}}\). Also, define \({\mathcal {B}}({\mathcal {H}},{\mathcal {G}}) = \{{\varvec{L}} \;:\;{\mathcal {H}}\rightarrow {\mathcal {G}}\;|\;{\varvec{L}} \text { is linear and bounded}\}\) for a real Hilbert space \({\mathcal {G}}\) equipped with inner product \(\mathop {\langle \cdot ,\, \cdot \rangle }\nolimits \), and let \({\varvec{L}} ^*\in {\mathcal {B}}({\mathcal {G}},{\mathcal {H}})\) be the adjoint of \({\varvec{L}} \in {\mathcal {B}}({\mathcal {H}},{\mathcal {G}})\) that satisfies \(\mathop {\langle {\varvec{L}} {\varvec{x}},\, {\varvec{y}} \rangle }\nolimits = \mathop {\langle {\varvec{x}},\, {\varvec{L}} ^*{\varvec{y}} \rangle }\nolimits \) for all \({\varvec{x}} \in {\mathcal {H}}\) and \({\varvec{y}} \in {\mathcal {G}}\).

This paper considers the monotone inclusion problem:

$$\begin{aligned} {\text {Find}}\;\; {\varvec{x}} \in {\mathcal {H}}\quad \text {subject to} \quad {\varvec{0}} \in {\varvec{M}} {\varvec{x}} , \end{aligned}$$
(3)

where \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) (or \({\varvec{M}} \in {\mathcal {M}}_{\mu }({\mathcal {H}})\)). This includes convex problems and convex–concave problems; a subdifferential \(\partial f\) of a closed proper convex function \(f\;:\;{\mathcal {H}}\rightarrow \mathbb {R} \cup \{\infty \}\) is maximally monotone [48]. Let \({\mathcal {F}}({\mathcal {H}})\) be the class of closed proper convex functions on \({\mathcal {H}}\).

We assume that the optimal set \(X_*({\varvec{M}}){:}{=} \{{\varvec{x}} \in {\mathcal {H}}\;:\; {\varvec{0}} \in {\varvec{M}} {\varvec{x}} \}\) is nonempty. We also assume that the distance between an initial point \({\varvec{x}} _0\) and some optimal point \({\varvec{x}} _*\in X_*({\varvec{M}})\) is bounded as

$$\begin{aligned} ||{\varvec{x}} _0 - {\varvec{x}} _*|| \le R \quad \text {for a constant } R>0 . \end{aligned}$$
(4)

2.2 Proximal point method and its worst-case rates

The proximal point method was first introduced to convex optimization by Martinet [46], building on the proximal mapping of Moreau [49]. The method was later extended to the monotone inclusion problem by Rockafellar [59]. The proximal point method for maximally monotone operators includes the augmented Lagrangian method [36, 56], the proximal method of multipliers [58], the Douglas–Rachford splitting method [19, 44], and the alternating direction method of multipliers (ADMM) [26, 27], so studying its worst-case convergence behavior and acceleration, the main interest of this paper, is important.

The proximal mapping [49] (or the resolvent operator) of an operator \({\varvec{M}} \) is defined as

$$\begin{aligned} {\varvec{J}} _{{\varvec{M}}} := ({\varvec{I}} + {\varvec{M}})^{-1} , \end{aligned}$$
(5)

where \({\varvec{I}} \;:\;{\mathcal {H}}\rightarrow {\mathcal {H}}\) is an identity operator, i.e., \({\varvec{I}} ({\varvec{x}})={\varvec{x}} \) for all \({\varvec{x}} \in {\mathcal {H}}\). The resolvent operator \({\varvec{J}} _{{\varvec{M}}}\) is single-valued and firmly nonexpansive for \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) [47]. The proximal point method [46, 59] generates a sequence \(\{{\varvec{x}} _i\}\) by iteratively applying the resolvent operator with a positive real number \(\lambda \) as below.

[Algorithm: Proximal point method]
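As a minimal illustration (not part of the paper), the iteration in the box above can be sketched in Python with a user-supplied resolvent:

```python
import numpy as np

def proximal_point_method(resolvent, x0, num_iters):
    """Minimal sketch of the proximal point method: x_{i+1} = J_{lam M}(x_i),
    where `resolvent` evaluates the resolvent J_{lam M} = (I + lam M)^{-1}."""
    x = np.asarray(x0, dtype=float)
    residuals = []
    for _ in range(num_iters):
        x_next = resolvent(x)
        residuals.append(np.linalg.norm(x_next - x) ** 2)  # fixed-point residual (6)
        x = x_next
    return x, residuals
```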

In [9, Proposition 8], the worst-case rate of the proximal point method with respect to the fixed-point residual

$$\begin{aligned} ||{\varvec{x}}- {\varvec{J}} _{\lambda {\varvec{M}}}({\varvec{x}})||^2 \end{aligned}$$
(6)

was found to satisfy

$$\begin{aligned} ||{\varvec{x}} _i - {\varvec{x}} _{i-1}||^2 \le \frac{R^2}{i} \end{aligned}$$
(7)

for \(i\ge 1\). Very recently in [32], this was improved to

$$\begin{aligned} ||{\varvec{x}} _i - {\varvec{x}} _{i-1}||^2 \le \left( 1 - \frac{1}{i}\right) ^{i-1}\frac{R^2}{i} ,\end{aligned}$$
(8)

which is exact when \(\dim {\mathcal {H}}\ge 2\). Such an exact worst-case example with \(\dim {\mathcal {H}}= 2\) given in [32] will be revisited at the end of Sect. 4. The bound (8) is asymptotically e-times smaller than (7), where e is Euler’s number. When we additionally assume \(\mu \)-strong monotonicity, the proximal point method has a linear rate [7, Example 23.40] [59]

$$\begin{aligned} ||{\varvec{x}} _{i+1} - {\varvec{x}} _i||^2 \le \left( \frac{1}{1+\lambda \mu }\right) ^2||{\varvec{x}} _i - {\varvec{x}} _{i-1}||^2 \end{aligned}$$
(9)

for \(i\ge 1\), which is exact considering the case \({\varvec{M}} {\varvec{x}} = \mu {\varvec{x}} \) with \(\dim {\mathcal {H}}= 1\).

For a convex minimization of \(f\in {\mathcal {F}}({\mathcal {H}})\), [66, Conjecture 4.2] conjectures that the proximal point method satisfies

$$\begin{aligned} ||{\varvec{x}} _i - {\varvec{x}} _{i-1}||^2 \le \frac{R^2}{i^2} \end{aligned}$$
(10)

for \(i\ge 1\), which is faster than (8) for maximally monotone operators. In addition, the O(1/i) worst-case rate of the proximal point method with respect to the cost function was studied in [33, Theorem 2.1], and this was improved by a factor of 2 in [66, Theorem 4.1]

$$\begin{aligned} f({\varvec{x}} _i) - f({\varvec{x}} _*) \le \frac{R^2}{4\lambda i} \end{aligned}$$
(11)

for \(i\ge 1\) and some \({\varvec{x}} _* \in X_*(\partial f)\) with \(||{\varvec{x}} _0 - {\varvec{x}} _*|| \le R\).

Remark 2.1

The results for the proximal point method can be applied to a preconditioned proximal point method. Let \({\varvec{L}} \in {\mathcal {B}}({\mathcal {H}},{\mathcal {H}})\) be invertible. Then, \({\varvec{L}} ^*{\varvec{M}} {\varvec{L}} \) is maximally monotone for \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) [7, Proposition 23.25], and the corresponding proximal point method is

$$\begin{aligned} \tilde{{\varvec{x}}}_{i+1} = {\varvec{J}} _{\lambda {\varvec{L}} ^*{\varvec{M}} {\varvec{L}}}(\tilde{{\varvec{x}}}_i) = ({\varvec{I}} + \lambda {\varvec{L}} ^*{\varvec{M}} {\varvec{L}})^{-1}\tilde{{\varvec{x}}}_i .\end{aligned}$$
(12)

Introducing \({\varvec{x}} _i = {\varvec{L}} \tilde{{\varvec{x}}}_i\) and \({\varvec{P}} = ({\varvec{L}} {\varvec{L}} ^*)^{-1}\) yields the following equivalent preconditioned proximal point method

$$\begin{aligned} {\varvec{x}} _{i+1} = ({\varvec{P}} + \lambda {\varvec{M}})^{-1}{\varvec{P}} {\varvec{x}} _i .\end{aligned}$$
(13)

So, for example, the inequality (7) leads to the preconditioned fixed-point residual bound for the preconditioned proximal point method

$$\begin{aligned} \mathop {\langle {\varvec{P}} ({\varvec{x}} _i - {\varvec{x}} _{i-1}),\, {\varvec{x}} _i - {\varvec{x}} _{i-1} \rangle }\nolimits \le \frac{R^2}{i} \end{aligned}$$
(14)

for \(i\ge 1\), and for some \({\varvec{x}} _*\in X_*({\varvec{M}})\) with \(\mathop {\langle {\varvec{P}} ({\varvec{x}} _0 - {\varvec{x}} _*),\, {\varvec{x}} _0 - {\varvec{x}} _* \rangle }\nolimits \le R^2\). This is particularly useful when considering the PDHG method [11, 13, 25, 35, Lemma 2.2], which is an instance of a preconditioned proximal point method. We will revisit this in Sect. 6.2.
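As a small numerical illustration (not from the paper), the equivalence between (12) and (13) under \({\varvec{x}} _i = {\varvec{L}} \tilde{{\varvec{x}}}_i\) can be checked for a random linear monotone operator; the sizes and the specific operator below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 5, 1.0
L = rng.standard_normal((d, d)) + d * np.eye(d)   # invertible almost surely
S = rng.standard_normal((d, d))
M = S - S.T + np.eye(d)                           # linear monotone operator (symmetric part = I)
P = np.linalg.inv(L @ L.T)                        # P = (L L^*)^{-1}

xt = rng.standard_normal(d)                       # tilde{x}_0
x = L @ xt                                        # x_0 = L tilde{x}_0
for _ in range(10):
    xt = np.linalg.solve(np.eye(d) + lam * L.T @ M @ L, xt)   # update (12)
    x = np.linalg.solve(P + lam * M, P @ x)                   # update (13)
print(np.allclose(L @ xt, x))                     # x_i = L tilde{x}_i, as claimed
```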

2.3 Existing accelerations for proximal point method

This section reviews existing accelerations of the proximal point method for convex minimization with respect to the cost function. To the best of our knowledge, no other type of proximal point method has a guaranteed accelerated worst-case rate.

For convex minimization, Güler [34] developed the following two accelerated versions, inspired by Nesterov’s fast gradient method [51, 52]. The following is the first accelerated version of the proximal point method in [34], which is an instance of FISTA [8]. The original version in [34] includes a variation with an iteration-dependent \(\lambda _i\), rather than a fixed constant \(\lambda \) (see also [5] for choosing \(\lambda _i\) appropriately for further acceleration). This paper focuses on a fixed constant \(\lambda \), and we leave the extension to a varying \(\lambda _i\) as future work.

[Algorithm: Güler's first accelerated proximal point method]
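For illustration only, the box above can be sketched in the standard FISTA form [8] with the resolvent in place of the proximal gradient step; the \(t_i\) recursion and indexing below follow FISTA and may differ in minor details from the paper's algorithm box:

```python
import numpy as np

def guler_first_accelerated_ppm(prox, x0, num_iters):
    """Hedged FISTA-type sketch with a fixed lambda:
        x_{i+1} = J_{lam ∂f}(y_i) = prox_{lam f}(y_i)
        t_{i+1} = (1 + sqrt(1 + 4 t_i^2)) / 2
        y_{i+1} = x_{i+1} + (t_i - 1)/t_{i+1} * (x_{i+1} - x_i)."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(num_iters):
        x_next = prox(y)
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + (t - 1) / t_next * (x_next - x)   # inertia (no correction term)
        x, t = x_next, t_next
    return x
```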

The sequence generated by Güler’s first accelerated proximal point method satisfies [34, Theorem 2.3], [8, Theorem 4.4]

$$\begin{aligned} f({\varvec{x}} _i) - f({\varvec{x}} _*) \le \frac{R^2}{2\lambda t_{i-1}^2} \le \frac{2R^2}{\lambda (i+1)^2} \end{aligned}$$
(15)

for \(i\ge 1\) and for some \({\varvec{x}} _* \in X_*(\partial f)\) with \(||{\varvec{x}} _0 - {\varvec{x}} _*|| \le R\). The following is another accelerated proximal point method by Güler [34], whose formulation is similar to that of the optimized gradient methods [37, 39, 40].

[Algorithm: Güler's second accelerated proximal point method]

The sequence generated by Güler’s second accelerated proximal point method satisfies [34, Theorem 6.1] for \(i\ge 1\)

$$\begin{aligned} f({\varvec{x}} _i) - f({\varvec{x}} _*) \le \frac{R^2}{4\lambda t_{i-1}^2} \le \frac{R^2}{\lambda (i+1)^2}, \end{aligned}$$
(16)

which is smaller than (15) by a factor of two.

2.4 Main contribution

To accelerate the worst-case rate of the proximal point method for maximally monotone operators, the relaxation and the inertia (i.e., an implicit version of the heavy ball method [55], or equivalently, Nesterov’s and Güler’s accelerating technique [34, 51, 52]) have been studied in [1, 3, 6, 15, 29]. However, none of them guarantee accelerated rates. Therefore, the main contribution of this paper is to develop a method that has a fast \(O(1/i^2)\) rate with respect to the fixed-point residual, improving upon the O(1/i) rate of the proximal point method in (7) and (8).

This paper considers the following general proximal point method with step coefficients \(\{h_{i+1,k+1}\}_{k=0}^i\) for reusing previous and current updates \(\{{\varvec{x}} _{k+1} - {\varvec{y}} _k\}_{k=0}^i\). This includes the proximal point method, the accelerated methods via the relaxation and the inertia [1, 3, 6, 15, 29], and the proposed accelerated method.

[Algorithm: General proximal point method]
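As an illustration (not part of the paper), the box above can be sketched as follows; the dictionary `h` of step coefficients is an illustrative data structure, not the paper's notation:

```python
import numpy as np

def general_ppm(resolvent, y0, h, num_iters):
    """Sketch of the general proximal point method:
        x_{i+1} = J_{lam M}(y_i),
        y_{i+1} = y_i + sum_{k=0}^{i} h_{i+1,k+1} (x_{k+1} - y_k),
    where h is a dict keyed by 1-based (i, k) as in the paper.
    Example: the plain proximal point method uses h = {(i, i): 1.0 for all i}."""
    y = [np.asarray(y0, dtype=float)]        # y[0] = y_0
    x = [None]                               # x is 1-indexed: x[1], x[2], ...
    for i in range(num_iters):
        x.append(resolvent(y[i]))            # x_{i+1}
        y.append(y[i] + sum(h.get((i + 1, k + 1), 0.0) * (x[k + 1] - y[k])
                            for k in range(i + 1)))
    return x[1:], y
```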

This paper next uses the PEP approach to find the choice of \(\{h_{i+1,k+1}\}_{k=0}^i\) that guarantees an accelerated rate. While the formulation of the general proximal point method is inefficient in general, the proposed accelerated method with the specific choice of \(\{h_{i+1,k+1}\}_{k=0}^i\) found by PEP has an efficient equivalent form. This form is similar to the other accelerated methods with the relaxation and/or the inertia [1, 3, 6, 15, 29].

3 Performance estimation problem for maximally monotone operators

This section uses the performance estimation problem (PEP) approach [21, 66, 67] to analyze the general proximal point method for maximally monotone operators, in terms of the fixed-point residual (6). This was recently studied in [32] for the proximal point method, providing the exact rate (8). The same authors [31] also used the PEP to study the exact worst-case rate for the ergodic sequence of the (relaxed) proximal point method for the variational inequalities. Similarly, [66] used PEP to analyze the worst-case rate of the proximal point method for convex minimization in terms of the fixed-point residual and the cost function, yielding (10) and (11), respectively.

Building upon [21, 31, 32, 66, 67], the worst-case rate of the general proximal point method after N iterations for decreasing the fixed-point residual (6) under the initial distance condition (4) can be computed by

$$\begin{aligned} \max _{{\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})}\max _{\begin{array}{c} {\varvec{x}} _1,\ldots ,{\varvec{x}} _N\in {\mathcal {H}}, \\ {\varvec{y}} _0,\ldots ,{\varvec{y}} _{N-1}\in {\mathcal {H}}, \\ {\varvec{x}} _*\in X_*({\varvec{M}}) \end{array}}\;&\frac{1}{R^2}||{\varvec{x}} _N - {\varvec{y}} _{N-1}||^2 \nonumber \\ \text {subject to }\;&{\varvec{x}} _{i+1} = {\varvec{J}} _{\lambda {\varvec{M}}}({\varvec{y}} _i), \quad i=0,\ldots ,N-1, \nonumber \\&{\varvec{y}} _{i+1} = {\varvec{y}} _i + \sum _{k=0}^i h_{i+1,k+1}({\varvec{x}} _{k+1} - {\varvec{y}} _k), \quad i=0,\ldots ,N-2, \nonumber \\&||{\varvec{y}} _0 - {\varvec{x}} _*||^2 \le R^2. \end{aligned}$$
(17)

This is an infinite-dimensional problem due to the constraint \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\), which is impractical to solve. The PEP in [21] further introduced a series of steps that reformulate such an impractical problem into a tractable one, which we apply to (17) step by step below.

The first step is to reformulate the problem (17) into a finite-dimensional problem. [61, Fact 1] implies that one can replace \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) in (17) by a set of inequality constraints (1) for \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) on a finite number of pairs of points in \(\{{\varvec{x}} _1,\ldots ,{\varvec{x}} _N,{\varvec{x}} _*\}\) without strictly relaxing the problem (17). Specifically, such constraints are

$$\begin{aligned} \mathop {\langle {\varvec{x}} _i - {\varvec{x}} _j,\, {\varvec{q}} _i - {\varvec{q}} _j \rangle }\nolimits \ge 0, \end{aligned}$$
(18)

for all \(i,j\in \{1,\ldots ,N,*\}\), with additional variables \({\varvec{q}} _i\in {\varvec{M}} {\varvec{x}} _i\) for \(i=1,\ldots ,N\) and \({\varvec{q}} _* = {\varvec{0}} \in {\varvec{M}} {\varvec{x}} _*\). Then the resulting equivalent problem of (17) is

$$\begin{aligned} \max _{\begin{array}{c} {\varvec{x}} _1,\ldots ,{\varvec{x}} _N,{\varvec{x}} _*\in {\mathcal {H}}, \\ {\varvec{y}} _0,\ldots ,{\varvec{y}} _{N-1}\in {\mathcal {H}}, \\ {\varvec{q}} _1,\ldots ,{\varvec{q}} _N\in {\mathcal {H}} \end{array}}\;&\frac{1}{R^2}||{\varvec{x}} _N - {\varvec{y}} _{N-1}||^2 \nonumber \\ \text {subject to }\;&\mathop {\langle {\varvec{x}} _i - {\varvec{x}} _j,\, {\varvec{q}} _i - {\varvec{q}} _j \rangle }\nolimits \ge 0, \quad i<j=1,\ldots ,N, \nonumber \\&\mathop {\langle {\varvec{x}} _i - {\varvec{x}} _*,\, {\varvec{q}} _i \rangle }\nolimits \ge 0, \quad i=1,\ldots ,N, \nonumber \\&{\varvec{x}} _{i+1} = {\varvec{y}} _i - \lambda {\varvec{q}} _{i+1}, \quad i=0,\ldots ,N-1, \nonumber \\&{\varvec{y}} _{i+1} = {\varvec{y}} _i - \lambda \sum _{k=0}^i h_{i+1,k+1}{\varvec{q}} _{k+1}, \quad i=0,\ldots ,N-2, \nonumber \\&||{\varvec{y}} _0 - {\varvec{x}} _*||^2 \le R^2. \end{aligned}$$
(19)

Further removing \({\varvec{x}} _i\) and using the change of variables

$$\begin{aligned} {\varvec{g}} _i {:}{=} \frac{\lambda }{R}{\varvec{q}} _i, \quad i=1,\ldots ,N, \end{aligned}$$
(20)

simplify the problem (19) as

$$\begin{aligned} \max _{\begin{array}{c} {\varvec{y}} _0,\ldots ,{\varvec{y}} _{N-1},{\varvec{x}} _*\in {\mathcal {H}}, \\ {\varvec{g}} _1,\ldots ,{\varvec{g}} _N\in {\mathcal {H}} \end{array}}\;&||{\varvec{g}} _N||^2 \nonumber \\ \text {subject to }\;&\frac{1}{R}\mathop {\langle {\varvec{y}} _{i-1} - R{\varvec{g}} _i - {\varvec{y}} _{j-1} + R{\varvec{g}} _j,\, {\varvec{g}} _i - {\varvec{g}} _j \rangle }\nolimits \ge 0, \quad i<j=1,\ldots ,N, \nonumber \\&\frac{1}{R}\mathop {\langle {\varvec{y}} _{i-1} - R{\varvec{g}} _i - {\varvec{x}} _*,\, {\varvec{g}} _i \rangle }\nolimits \ge 0, \quad i=1,\ldots ,N, \nonumber \\&{\varvec{y}} _{i+1} = {\varvec{y}} _i - R\sum _{k=0}^i h_{i+1,k+1}{\varvec{g}} _{k+1}, \quad i=0,\ldots ,N-2, \nonumber \\&||{\varvec{y}} _0 - {\varvec{x}} _*||^2 \le R^2. \end{aligned}$$
(21)

As in [21, 31, 32, 66, 67], we next introduce the Gram matrix

$$\begin{aligned} {\varvec{Z}} = \left[ \begin{array}{ccccc} ||{\varvec{g}} _1||^2 &{} \mathop {\langle {\varvec{g}} _1,\, {\varvec{g}} _2 \rangle }\nolimits &{} \cdots &{} \mathop {\langle {\varvec{g}} _1,\, {\varvec{g}} _N \rangle }\nolimits &{} \frac{1}{R}\mathop {\langle {\varvec{g}} _1,\, {\varvec{y}} _0 - {\varvec{x}} _* \rangle }\nolimits \\ \mathop {\langle {\varvec{g}} _1,\, {\varvec{g}} _2 \rangle }\nolimits &{} ||{\varvec{g}} _2||^2 &{} \cdots &{} \mathop {\langle {\varvec{g}} _2,\, {\varvec{g}} _N \rangle }\nolimits &{} \frac{1}{R}\mathop {\langle {\varvec{g}} _2,\, {\varvec{y}} _0 - {\varvec{x}} _* \rangle }\nolimits \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ \mathop {\langle {\varvec{g}} _1,\, {\varvec{g}} _N \rangle }\nolimits &{} \cdots &{} &{} ||{\varvec{g}} _N||^2 &{} \frac{1}{R}\mathop {\langle {\varvec{g}} _N,\, {\varvec{y}} _0 - {\varvec{x}} _* \rangle }\nolimits \\ \frac{1}{R}\mathop {\langle {\varvec{g}} _1,\, {\varvec{y}} _0 - {\varvec{x}} _* \rangle }\nolimits &{} \cdots &{} &{} \frac{1}{R}\mathop {\langle {\varvec{g}} _N,\, {\varvec{y}} _0 - {\varvec{x}} _* \rangle }\nolimits &{} \frac{1}{R^2}||{\varvec{y}} _0 - {\varvec{x}} _*||^2 \end{array}\right] \end{aligned}$$
(22)

to relax the problem as

$$\begin{aligned} \max _{{\varvec{Z}} \in \mathbb {S} _+^{N+1}}\;&\textsf {tr} \{{\varvec{u}} _N{\varvec{u}} _N^\top {\varvec{Z}} \} \nonumber \\ \text {subject to }\;&\textsf {tr} \{{\varvec{A}} _{i,j}({\varvec{h}}){\varvec{Z}} \} \le 0, \quad i<j=1,\ldots ,N, \nonumber \\&\textsf {tr} \{{\varvec{B}} _i({\varvec{h}}){\varvec{Z}} \} \le 0, \quad i=1,\ldots ,N, \nonumber \\&\textsf {tr} \{{\varvec{C}} {\varvec{Z}} \} \le 1, \end{aligned}$$
(23)

where \(\{{\varvec{u}} _i\}_{i=1}^{N+1}\) is the canonical basis of \(\mathbb {R} ^{N+1}\) and

$$\begin{aligned} {\left\{ \begin{array}{ll} {\varvec{A}} _{i,j}({\varvec{h}}) {:}{=} ({\varvec{u}} _i-{\varvec{u}} _j)\odot ({\varvec{u}} _i - {\varvec{u}} _j) - ({\varvec{u}} _i - {\varvec{u}} _j)\odot \sum _{l=i-1}^{j-2}\sum _{k=0}^lh_{l+1,k+1}{\varvec{u}} _{k+1}, &{} i<j=1,\ldots ,N, \\ {\varvec{B}} _i({\varvec{h}}) {:}{=} {\varvec{u}} _i{\varvec{u}} _i^\top - {\varvec{u}} _i\odot {\varvec{u}} _{N+1} + {\varvec{u}} _i\odot \sum _{l=0}^{i-2}\sum _{k=0}^lh_{l+1,k+1}{\varvec{u}} _{k+1}, &{} i=1,\ldots ,N, \\ {\varvec{C}} {:}{=} {\varvec{u}} _{N+1}{\varvec{u}} _{N+1}^\top \end{array}\right. } \end{aligned}$$

with the outer product operator \({\varvec{u}} \odot {\varvec{v}} {:}{=} \frac{1}{2}({\varvec{u}} {\varvec{v}} ^\top + {\varvec{v}} {\varvec{u}} ^\top )\). If \(\dim {\mathcal {H}}\ge N+1\), the problems (17) and (23) are equivalent, based on the following lemma similar to [61, Lemma 1].

Lemma 3.1

If \(\dim {\mathcal {H}}\ge N+1\), then

$$\begin{aligned}&{\varvec{Z}} \in \mathbb {S} _+^{N+1} \quad \Leftrightarrow \quad \exists \; {\varvec{g}} _1,{\varvec{g}} _2,\ldots ,{\varvec{g}} _N,\frac{1}{R}({\varvec{y}} _0-{\varvec{x}} _*)\in {\mathcal {H}}\\&\quad \text { such that } {\varvec{Z}} \text { is given by the expression in } (22) . \end{aligned}$$

For simplicity in later analysis, we discard some constraints as

$$\begin{aligned} \max _{{\varvec{Z}} \in \mathbb {S} _+^{N+1}}\;&\textsf {tr} \{{\varvec{u}} _N{\varvec{u}} _N^\top {\varvec{Z}} \} \nonumber \\ \text {subject to }\;&\textsf {tr} \{{\varvec{A}} _{i-1,i}({\varvec{h}}){\varvec{Z}} \} \le 0, \quad i=2,\ldots ,N, \nonumber \\&\textsf {tr} \{{\varvec{B}} _N({\varvec{h}}){\varvec{Z}} \} \le 0, \nonumber \\&\textsf {tr} \{{\varvec{C}} {\varvec{Z}} \} \le 1, \end{aligned}$$
(24)

which does not affect the result in the paper, i.e., the optimal values of (23) and (24) are found to be numerically equivalent for the method proposed in this paper. Finally, we construct the associated Lagrangian dual of (24)

$$\begin{aligned} {\mathcal {B}}_{D}({\varvec{h}}) {:}{=} \qquad \min _{a_2,\ldots ,a_N,b_N,c\in \mathbb {R}}\;&c\\ \text {subject to }\;&\sum _{i=2}^Na_i{\varvec{A}} _{i-1,i}({\varvec{h}}) + b_N{\varvec{B}} _N({\varvec{h}}) + c{\varvec{C}}- {\varvec{u}} _N{\varvec{u}} _N^\top \succeq {\varvec{0}}, \nonumber \\&a_2,\ldots ,a_N,b_N,c \ge 0, \end{aligned}$$
(D)

where \(a_2,\ldots ,a_N,b_N,c\) are dual variables associated with the constraints of (24), respectively. Then, for any given \({\varvec{h}} \) of the general proximal point method, one can compute (an upper bound of) its worst-case fixed-point residual by numerically solving (D) using any SDP solver. For some choices of \({\varvec{h}} \), such as that of the proximal point method in [32], it may be possible to solve (D) analytically; [32] did so for the proximal point method, yielding the rate (8). This paper provides another choice of \({\varvec{h}} \) that admits an analytical solution to (D) with an accelerated rate.
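For illustration only, the dual problem (D) can be assembled and solved numerically with, e.g., NumPy and CVXPY; the helper `pep_matrices` below is an illustrative name and builds the data matrices defined after (23):

```python
import numpy as np
import cvxpy as cp

def sym_outer(u, v):
    """Symmetric outer product u ⊙ v = (u v^T + v u^T) / 2."""
    return 0.5 * (np.outer(u, v) + np.outer(v, u))

def pep_matrices(h, N):
    """Build A_{i,j}(h), B_i(h), C and the canonical basis u_i used in (23)-(24).
    h is an (N-1) x (N-1) array with h[i-1, k-1] = h_{i,k} (zero where undefined)."""
    I = np.eye(N + 1)
    u = lambda i: I[:, i - 1]                            # u_i, 1-indexed

    def A(i, j):                                         # A_{i,j}(h), i < j
        s = sum((h[l, k] * u(k + 1) for l in range(i - 1, j - 1)
                 for k in range(l + 1)), np.zeros(N + 1))
        return sym_outer(u(i) - u(j), u(i) - u(j)) - sym_outer(u(i) - u(j), s)

    def B(i):                                            # B_i(h)
        s = sum((h[l, k] * u(k + 1) for l in range(i - 1)
                 for k in range(l + 1)), np.zeros(N + 1))
        return np.outer(u(i), u(i)) - sym_outer(u(i), u(N + 1)) + sym_outer(u(i), s)

    C = np.outer(u(N + 1), u(N + 1))
    return A, B, C, u

def dual_bound(h, N):
    """Numerically solve the dual SDP (D) for given step coefficients h."""
    A, B, C, u = pep_matrices(h, N)
    a = cp.Variable(N - 1, nonneg=True)                  # a_2, ..., a_N
    bN = cp.Variable(nonneg=True)
    c = cp.Variable(nonneg=True)
    S = sum(a[i - 2] * A(i - 1, i) for i in range(2, N + 1)) \
        + bN * B(N) + c * C - np.outer(u(N), u(N))
    cp.Problem(cp.Minimize(c), [S >> 0]).solve()
    return c.value
```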

4 Accelerating the proximal point method for maximally monotone operators

Using the dual problem (D), this section develops an accelerated version of the proximal point method via PEP:

$$\begin{aligned} \min _{{\varvec{h}}} {\mathcal {B}}_{D}({\varvec{h}}) , \end{aligned}$$
(HD)

which is studied in [20, 21, 22, 37, 38, 39, 40] for certain classes of problems and methods. The problem is non-convex, but it is convex in the variables \((a_2,\ldots ,a_N,b_N,c)\) for a given \({\varvec{h}} \), and convex in the variables \((c,{\varvec{h}})\) for given \((a_2,\ldots ,a_N,b_N)\). Therefore, we used a variant of alternating minimization that alternately optimizes over \((a_2,\ldots ,a_N,b_N,c)\) given \({\varvec{h}} \) and over \((c,{\varvec{h}})\) given \((a_2,\ldots ,a_N,b_N)\) to find a minimizer using an SDP solver [16, 30]. Inspired by the numerical results, the following lemma specifies a feasible point of (HD) analytically. We do not have a guarantee that such a point is a (unique) minimizer of (HD).

Lemma 4.1

The following

$$\begin{aligned} h_{i,k}&= {\left\{ \begin{array}{ll} -\frac{2k}{i(i+1)}, &{} i=1,\ldots ,N-1,\;k=1,\ldots ,i-1, \\ \frac{2i}{i+1}, &{} i=1,\ldots ,N-1,\;k=i, \end{array}\right. } \end{aligned}$$
(25)
$$\begin{aligned} a_i&= \frac{2(i-1)i}{N^2}, \quad i=2,\ldots ,N, \quad b_N = \frac{2}{N}, \quad c = \frac{1}{N^2} \end{aligned}$$
(26)

is a feasible point of (D) and (HD).

Proof

It is obvious that \(a_2,\ldots ,a_N,b_N,c\) are nonnegative, so we are only left to show the positive semidefinite condition in (D). Since

$$\begin{aligned}&\; \sum _{i=2}^Na_i{\varvec{A}} _{i-1,i}({\varvec{h}}) + b_N{\varvec{B}} _N({\varvec{h}}) + c{\varvec{C}}- {\varvec{u}} _N{\varvec{u}} _N^\top \\&\quad = \; \sum _{i=2}^N\frac{2(i-1)i}{N^2}\left[ ({\varvec{u}} _{i-1} - {\varvec{u}} _i)\odot ({\varvec{u}} _{i-1} - {\varvec{u}} _i)\right. \\&\left. \qquad - ({\varvec{u}} _{i-1} - {\varvec{u}} _i)\odot \left( \frac{2(i-1)}{i}{\varvec{u}} _{i-1} - \sum _{k=0}^{i-3}\frac{2(k+1)}{(i-1)i}{\varvec{u}} _{k+1}\right) \right] \\&\qquad + \frac{2}{N}\left[ {\varvec{u}} _N{\varvec{u}} _N^\top - {\varvec{u}} _N\odot {\varvec{u}} _{N+1} + {\varvec{u}} _N\odot \sum _{l=0}^{N-2}\right. \\&\qquad \left. \left( \frac{2(l+1)}{l+2}{\varvec{u}} _{l+1}- \sum _{k=0}^{l-1}\frac{2(k+1)}{(l+1)(l+2)}{\varvec{u}} _{k+1}\right) \right] \\&\qquad + \frac{1}{N^2}{\varvec{u}} _{N+1}{\varvec{u}} _{N+1}^\top - {\varvec{u}} _N{\varvec{u}} _N^\top \\&\quad = \; \sum _{i=2}^{N-1} \left[ \frac{2(i-1)i}{N^2} + \frac{2i(i+1)}{N^2}\left( 1 - \frac{2i}{i+1}\right) \right] {\varvec{u}} _i{\varvec{u}} _i^\top + \left[ \frac{2(N-1)N}{N^2} \right. \\&\left. \qquad + \frac{2}{N} - 1\right] {\varvec{u}} _N{\varvec{u}} _N^\top + \frac{1}{N^2}{\varvec{u}} _{N+1}{\varvec{u}} _{N+1}^\top \\&\qquad + \sum _{i=2}^{N-1}\left[ \frac{2(i-1)i}{N^2}\left( -2 + \frac{2(i-1)}{i}\right) + \frac{2i(i+1)}{N^2}\frac{2(i-1)}{i(i+1)}\right] {\varvec{u}} _{i-1}\odot {\varvec{u}} _i \\&\qquad + \left[ \frac{2(N-1)}{N}\left( -2 + \frac{2(N-1)}{N}\right) + \frac{2}{N}\frac{2(N-1)}{N}\right] {\varvec{u}} _{N-1}\odot {\varvec{u}} _N - \frac{2}{N}{\varvec{u}} _N\odot {\varvec{u}} _{N+1} \\&\qquad + \sum _{i=3}^{N-1}\sum _{k=0}^{i-3}\left[ -\frac{2(i-1)i}{N^2}\frac{2(k+1)}{(i-1)i} + \frac{2i(i+1)}{N^2}\frac{2(k+1)}{i(i+1)}\right] {\varvec{u}} _{k+1}\odot {\varvec{u}} _i \\&\qquad + \sum _{k=0}^{N-3}\left[ -\frac{2(N-1)N}{N^2}\frac{2(k+1)}{(N-1)N} \right. \\&\left. \qquad + \frac{2}{N}\left( \frac{2(k+1)}{k+2} - \sum _{l=k+1}^{N-2}\frac{2(k+1)}{(l+1)(l+2)}\right) \right] {\varvec{u}} _{k+1}\odot {\varvec{u}} _N \\&\quad = \; {\varvec{u}} _N{\varvec{u}} _N^\top + \frac{1}{N^2}{\varvec{u}} _{N+1}{\varvec{u}} _{N+1}^\top - \frac{2}{N}{\varvec{u}} _N\odot {\varvec{u}} _{N+1} \\&\quad =\; \left( {\varvec{u}} _N - \frac{1}{N}{\varvec{u}} _{N+1}\right) \left( {\varvec{u}} _N - \frac{1}{N}{\varvec{u}} _{N+1}\right) ^\top \succeq {\varvec{0}}, \end{aligned}$$

the given point is a feasible point of (HD). \(\square \)
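As a quick numerical sanity check (ours, not the paper's), the certificate of Lemma 4.1 can be assembled with the `pep_matrices` helper sketched at the end of Sect. 3 and compared against the rank-one factorization obtained in the proof:

```python
import numpy as np

N = 10
h = np.zeros((N - 1, N - 1))                 # h[i-1, k-1] = h_{i,k} from (25)
for i in range(1, N):
    for k in range(1, i):
        h[i - 1, k - 1] = -2 * k / (i * (i + 1))
    h[i - 1, i - 1] = 2 * i / (i + 1)

A, B, C, u = pep_matrices(h, N)              # helper from the Sect. 3 sketch
a = {i: 2 * (i - 1) * i / N**2 for i in range(2, N + 1)}          # (26)
bN, c = 2 / N, 1 / N**2
S = sum(a[i] * A(i - 1, i) for i in range(2, N + 1)) + bN * B(N) + c * C \
    - np.outer(u(N), u(N))
rank_one = np.outer(u(N) - u(N + 1) / N, u(N) - u(N + 1) / N)     # factor in the proof
print(np.allclose(S, rank_one), np.linalg.eigvalsh(S).min() >= -1e-12)
```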

Before providing the worst-case rate of the general proximal point method with \({\varvec{h}} \) in (25), we develop its efficient formulation below. This has a low computational cost per iteration, comparable to that of the proximal point method. Note that this may not be the only efficient form for \({\varvec{h}} \) in (25).

[Algorithm: Proposed accelerated proximal point method]
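As a minimal sketch (not part of the paper), the proposed accelerated proximal point method can be written as follows, using the efficient recursion established in Proposition 4.1 below:

```python
import numpy as np

def accelerated_ppm(resolvent, x0, num_iters):
    """Sketch of the proposed accelerated proximal point method:
        x_{i+1} = J_{lam M}(y_i)
        y_{i+1} = x_{i+1} + i/(i+2) (x_{i+1} - x_i) - i/(i+2) (x_i - y_{i-1}),
    with x_0 = y_0 = y_{-1}; `resolvent` evaluates J_{lam M}."""
    x_prev = np.asarray(x0, dtype=float)   # x_0
    y_prev = x_prev                        # y_{-1}
    y = x_prev                             # y_0
    residuals = []
    for i in range(num_iters):
        x = resolvent(y)                                   # x_{i+1} = J_{lam M}(y_i)
        residuals.append(np.linalg.norm(x - y) ** 2)       # ||x_{i+1} - y_i||^2, cf. (27)
        coef = i / (i + 2)
        y_next = x + coef * (x - x_prev) - coef * (x_prev - y_prev)   # inertia + correction
        x_prev, y_prev, y = x, y, y_next
    return x, residuals
```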

Proposition 4.1

The sequences \(\{{\varvec{x}} _i\}\) and \(\{{\varvec{y}} _i\}\) generated by the general proximal point method with step coefficients \(\{h_{i,k}\}\) in (25) are identical to the corresponding sequence generated by the proposed accelerated proximal point method starting from the same initial point.

Proof

We use induction, and for clarity we use the notation \({\varvec{x}} _1',{\varvec{x}} _2',\ldots \) and \({\varvec{y}} _0',{\varvec{y}} _1',\ldots \) for the general proximal point method with (25). It is obvious that \({\varvec{x}} _0={\varvec{y}} _0'={\varvec{y}} _0\), \({\varvec{x}} _1' = {\varvec{x}} _1 = {\varvec{y}} _1\), and we have

$$\begin{aligned} {\varvec{y}} _1'&= {\varvec{y}} _0' + h_{1,1}({\varvec{x}} _1' - {\varvec{y}} _0') = {\varvec{x}} _1' = {\varvec{y}} _1 . \end{aligned}$$

Similarly, it is obvious that \({\varvec{x}} _2' = {\varvec{x}} _2\), and we have

$$\begin{aligned} {\varvec{y}} _2'&= {\varvec{y}} _1' + \sum _{k=0}^1h_{2,k+1}({\varvec{x}} _{k+1}' - {\varvec{y}} _k') = {\varvec{y}} _1 + \frac{4}{3}({\varvec{x}} _2 - {\varvec{y}} _1) - \frac{1}{3}({\varvec{x}} _1 - {\varvec{y}} _0) \\&= {\varvec{x}} _2 + \frac{1}{3}({\varvec{x}} _2 - {\varvec{x}} _1) - \frac{1}{3}({\varvec{x}} _1 - {\varvec{y}} _0) = {\varvec{y}} _2. \end{aligned}$$

It is then also obvious that \({\varvec{x}} _3' = {\varvec{x}} _3\). Assuming \({\varvec{x}} _l' = {\varvec{x}} _l\) for \(l=1,\ldots ,i+1\) and \({\varvec{y}} _l' = {\varvec{y}} _l\) for \(l=0,\ldots ,i\), for some \(i\ge 2\), we have

$$\begin{aligned} {\varvec{y}} _{i+1}'&= {\varvec{y}} _i' + \sum _{k=0}^ih_{i+1,k+1}({\varvec{x}} _{k+1}' - {\varvec{y}} _k') \\&= {\varvec{y}} _i + \frac{2(i+1)}{i+2}({\varvec{x}} _{i+1} - {\varvec{y}} _i) + \sum _{k=0}^{i-1}\left( -\frac{2(k+1)}{(i+1)(i+2)}\right) ({\varvec{x}} _{k+1} - {\varvec{y}} _k) \\&= {\varvec{y}} _i + \left( 1+\frac{i}{i+2}\right) ({\varvec{x}} _{i+1} - {\varvec{y}} _i) + \frac{i}{i+2}\sum _{k=0}^{i-1}\left( -\frac{2(k+1)}{i(i+1)}\right) ({\varvec{x}} _{k+1} - {\varvec{y}} _k) \\&= {\varvec{x}} _{i+1} + \frac{i}{i+2}({\varvec{x}} _{i+1} - {\varvec{y}} _i) + \frac{i}{i+2}({\varvec{y}} _i + {\varvec{y}} _{i-1} - 2{\varvec{x}} _i) \\&= {\varvec{x}} _{i+1} + \frac{i}{i+2}({\varvec{x}} _{i+1} - {\varvec{x}} _i) - \frac{i}{i+2}({\varvec{x}} _i - {\varvec{y}} _{i-1}) = {\varvec{y}} _{i+1}, \end{aligned}$$

where the fourth equality uses

$$\begin{aligned} {\varvec{y}} _i&= {\varvec{y}} _{i-1} + \frac{2i}{i+1}({\varvec{x}} _i - {\varvec{y}} _{i-1}) + \sum _{k=0}^{i-2}\left( -\frac{2(k+1)}{i(i+1)}\right) ({\varvec{x}} _{k+1} - {\varvec{y}} _k) \\&= {\varvec{y}} _{i-1} + 2({\varvec{x}} _i - {\varvec{y}} _{i-1}) + \sum _{k=0}^{i-1}\left( -\frac{2(k+1)}{i(i+1)}\right) ({\varvec{x}} _{k+1} - {\varvec{y}} _k) .\end{aligned}$$

\(\square \)

The proposed accelerated method has the inertia term \(\frac{i}{i+2}({\varvec{x}} _{i+1} - {\varvec{x}} _i)\), similar to Nesterov’s acceleration [51, 52] and Güler’s methods [34]. However, the proposed method also has a correction term \(- \frac{i}{i+2}({\varvec{x}} _i - {\varvec{y}} _{i-1})\), which is essential to guarantee an accelerated rate. Without such a correction term, the accelerated method can diverge, for which we provide an example at the end of this section. We leave a further understanding of the role of the proposed correction term as future work, possibly via a differential equation perspective as in [64] for Nesterov’s acceleration. Note that a different correction term for Nesterov’s acceleration has been studied via differential equation analysis for convex minimization [4, 63].

The following theorem provides an accelerated rate of the proposed method in terms of the fixed-point residual.

Theorem 4.1

Let \({\varvec{M}} \in {\mathcal {M}}({\mathcal {H}})\) and let \({\varvec{x}} _0,{\varvec{y}} _0,{\varvec{x}} _1,{\varvec{y}} _1,\ldots \in {\mathcal {H}}\) be generated by the proposed accelerated proximal point method. Assume that \(||{\varvec{x}} _0 - {\varvec{x}} _*||\le R\) for a constant \(R>0\) and for some \({\varvec{x}} _*\in X_*({\varvec{M}})\). Then for any \(i\ge 1\),

$$\begin{aligned} ||{\varvec{x}} _i - {\varvec{y}} _{i-1}||^2 \le \frac{R^2}{i^2} .\end{aligned}$$
(27)

Proof

Using Lemma 4.1, the general proximal point method with \({\varvec{h}} \) (25) satisfies

$$\begin{aligned} \frac{1}{R^2}||{\varvec{x}} _N - {\varvec{y}} _{N-1}||^2 \le {\mathcal {B}}_{D}({\varvec{h}}) \le \frac{1}{N^2}. \end{aligned}$$
(28)

Since the iterates of the method are recursive and do not depend on a given N, the bound (28) generalizes to the intermediate iterates of the method. By Proposition 4.1, the proposed accelerated proximal point method also satisfies the bound (28), which concludes the proof. \(\square \)

The bound (8) of the proximal point method was found to be exact in [32] by specifying a certain operator \({\varvec{M}} \) achieving the bound (8) exactly; that is, for given \(N\ge 2\), the proximal point method exactly achieves the bound (8) for the operator

$$\begin{aligned} {\varvec{M}} \left[ \begin{array}{c} u \\ v \end{array}\right] = \frac{1}{\lambda \sqrt{N-1}}\left[ \begin{array}{cc} 0\; &{} \quad 1 \\ -1\; &{} \quad 0 \end{array}\right] \left[ \begin{array}{c} u \\ v \end{array}\right] , \end{aligned}$$
(29)

with an initial point \({\varvec{x}} _0 = [1\; 0]^\top \). Such an exact analysis is important since it reveals the worst-case behavior of the iterates of the method. However, we were not able to show that the bound (27) of the proposed method is exact, which we leave as future work. Instead, we compared the behavior of the iterates of the proximal point method and its accelerated variants on the operator \({\varvec{M}} \) in (29). Figure 1 compares the proximal point method, Güler’s first accelerated method with \({\varvec{M}} \) instead of \(\partial f\) (i.e., an instance of the inertia method), and the proposed accelerated method, with an initial point \({\varvec{x}} _0 = [1\; 0]^\top \) and the optimal point \({\varvec{x}} _* = {\varvec{0}} \). Note that Güler’s first method is almost equivalent to the proposed accelerated method without the correction term \(-\frac{i}{i+2}({\varvec{x}} _i - {\varvec{y}} _{i-1})\), and it exhibits diverging behavior in Fig. 1. The figure illustrates that the correction term greatly helps the iterates to converge rapidly by reducing the radius of the orbit of the iterates, compared to the other methods.

Fig. 1

Solving a worst-case monotone inclusion problem of the proximal point method with \({\varvec{M}} \) (29) with \(N=100\); (left) the fixed-point residual versus iteration, (right) the trajectory of the iterates \({\varvec{x}} _i = [x_{i,1},\;x_{i,2}]^\top \) (markers are displayed every 5th iteration)
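For illustration (not from the paper), running the `accelerated_ppm` sketch given earlier in this section on the operator (29) numerically confirms the bound of Theorem 4.1:

```python
import numpy as np

N, lam = 100, 1.0
M = (1.0 / (lam * np.sqrt(N - 1))) * np.array([[0.0, 1.0], [-1.0, 0.0]])  # operator (29)
resolvent = lambda y: np.linalg.solve(np.eye(2) + lam * M, y)             # J_{lam M}
x, res = accelerated_ppm(resolvent, [1.0, 0.0], N)    # x_0 = [1, 0]^T, x_* = 0, so R = 1
print(res[-1], res[-1] <= 1.0 / N**2)                 # Theorem 4.1: <= R^2 / N^2
```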

We further investigate the behavior of the proposed method for a convex–concave saddle-point problem

$$\begin{aligned} \min _{{\varvec{u}} \in {\mathcal {H}}_1}\max _{{\varvec{v}} \in {\mathcal {H}}_2}\;&\phi ({\varvec{u}},{\varvec{v}}) , \end{aligned}$$
(30)

where \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\) denote real Hilbert spaces equipped with inner product \(\mathop {\langle \cdot ,\, \cdot \rangle }\nolimits \), and \(\phi (\cdot ,{\varvec{v}})\in {\mathcal {F}}({\mathcal {H}}_1)\), \(-\phi ({\varvec{u}},\cdot )\in {\mathcal {F}}({\mathcal {H}}_2)\), which we further study in Sects. 5 and 6.1. The saddle subdifferential of \(\phi \),

$$\begin{aligned} \left[ \begin{array}{c} \partial _{{\varvec{u}}} \phi ({\varvec{u}},{\varvec{v}}) \\ \partial _{{\varvec{v}}} (-\phi ({\varvec{u}},{\varvec{v}})) \end{array}\right] , \end{aligned}$$
(31)

is monotone [57]. The proposed accelerated method applied to (31) with \({\varvec{x}} _i {:}{=} ({\varvec{u}} _i,{\varvec{v}} _i)\) and \({\varvec{x}} _* {:}{=} ({\varvec{u}} _*,{\varvec{v}} _*)\) (see Sect. 6.1 for details) satisfies

$$\begin{aligned} \phi ({\varvec{u}} _i,{\varvec{v}} _*) - \phi ({\varvec{u}} _*,{\varvec{v}} _i) \le \frac{||{\varvec{u}} _0 - {\varvec{u}} _*||^2 + ||{\varvec{v}} _0 - {\varvec{v}} _*||^2}{4\lambda i} \end{aligned}$$
(32)

for any \(i\ge 1\). This is numerically conjectured by the PEP analysis in (19) with the objective function \(\frac{1}{R}||{\varvec{x}} _N - {\varvec{y}} _{N-1}||^2\) and the inequality \(\mathop {\langle {\varvec{x}} _N - {\varvec{x}} _*,\, {\varvec{q}} _N \rangle }\nolimits \ge 0\) in (19) replaced by \(\phi ({\varvec{u}} _N,{\varvec{v}} _*) - \phi ({\varvec{u}} _*,{\varvec{v}} _N)\) and \(\mathop {\langle {\varvec{x}} _N - {\varvec{x}} _*,\, {\varvec{q}} _N \rangle }\nolimits \ge \phi ({\varvec{u}} _N,{\varvec{v}} _*) - \phi ({\varvec{u}} _*,{\varvec{v}} _N)\), respectively.Footnote 2

5 Restarting the accelerated proximal point method for strongly monotone operators

For strongly monotone operators, the proximal point method has a linear rate (9), whereas the proposed accelerated method is not guaranteed to have such a fast rate. Technically, one should be able to find an accelerated method for strongly monotone operators via PEP, as we did for monotone operators in the previous section. However, the resulting PEP problem, reminiscent of (HD), is much more difficult to solve, and we leave it as future work. Instead, we consider a fixed restarting technique in [50, Section 11.4], [53, Section 5.1] that restarts an accelerated method with a sublinear rate every certain number of iterations to yield a fast linear rate, particularly for \({\varvec{M}} \in {\mathcal {M}}_{\mu }({\mathcal {H}})\) in this section.

Suppose one restarts the proposed method every k (inner) iterations by initializing the \((j+1)\)th outer iteration as \({\varvec{x}} _{j+1,0} = {\varvec{y}} _{j+1,0} = {\varvec{y}} _{j+1,-1} = {\varvec{x}} _{j,k}\), where \({\varvec{x}} _{j,l}\) and \({\varvec{y}} _{j,l}\) denote the iterates at the jth outer iteration and lth inner iteration for \(j=0,1,\ldots \) and \(l=-1,0,1,\ldots ,k\). Using the rate (27) (with \(R = ||{\varvec{x}} _{j,0} - {\varvec{x}} _*||\)) and the strong monotonicity condition (2), we have

$$\begin{aligned} ||{\varvec{x}} _{j,k} - {\varvec{y}} _{j,k-1}||^2 \le \frac{||{\varvec{x}} _{j,0} - {\varvec{x}} _*||^2}{k^2} \le \frac{1}{\mu ^2 k^2}||{\varvec{M}} {\varvec{x}} _{j,0}||^2 \end{aligned}$$
(33)

for \(j = 0,1,\ldots \). Since \(\frac{1}{\lambda }({\varvec{x}} _{j-1,k} - {\varvec{y}} _{j-1,k-1}) \in {\varvec{M}} {\varvec{x}} _{j,0}\), we have a linear rate

$$\begin{aligned} ||{\varvec{x}} _{j,k} - {\varvec{y}} _{j,k-1}||^2 \le \frac{1}{\lambda ^2\mu ^2 k^2}||{\varvec{x}} _{j-1,k} - {\varvec{y}} _{j-1,k-1}||^2 . \end{aligned}$$
(34)

For a given \(N=jk\) total number of steps, minimizing the overall rate with respect to k yields an optimal choice of the restarting interval given by \(k_{\mathrm {opt}}{\,\approx \,} \frac{e}{\lambda \mu }\), where e is Euler’s number. The corresponding linear rate is \(O((e^{\lambda \mu /e})^{-2N})\).
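A minimal sketch of this fixed restarting scheme (not from the paper), reusing the `accelerated_ppm` sketch from Sect. 4:

```python
import numpy as np

def restarted_accelerated_ppm(resolvent, x0, k, num_restarts):
    """Restart the accelerated method every k inner iterations, re-initializing
    each outer iteration from the last inner iterate; for M in M_mu(H) the
    analysis above suggests choosing k near e / (lam * mu)."""
    x = np.asarray(x0, dtype=float)
    residuals = []
    for _ in range(num_restarts):
        x, res = accelerated_ppm(resolvent, x, k)   # sketch from Sect. 4
        residuals.extend(res)
    return x, residuals
```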

We further investigate the behavior of the restarting technique for the saddle-point problem (30) under the assumption that \(\phi \) is strongly-convex–strongly-concave, i.e., \(\phi (\cdot ,{\varvec{v}}) - \frac{\mu }{2}||\cdot ||^2\in {\mathcal {F}}({\mathcal {H}}_1)\) and \(-\phi ({\varvec{u}},\cdot ) - \frac{\mu }{2}||\cdot ||^2\in {\mathcal {F}}({\mathcal {H}}_2)\). The associated saddle subdifferential (31) is then \(\mu \)-strongly monotone. In this case, using the rate (32) and the inequalities \(\phi ({\varvec{u}} _*,{\varvec{v}} _*) + \frac{\mu }{2}||{\varvec{u}}- {\varvec{u}} _*||^2 \le \phi ({\varvec{u}},{\varvec{v}} _*)\) and \(-\phi ({\varvec{u}} _*,{\varvec{v}} _*) + \frac{\mu }{2}||{\varvec{v}}- {\varvec{v}} _*||^2 \le -\phi ({\varvec{u}} _*,{\varvec{v}})\), the proposed method with restarting every k iterations satisfies

$$\begin{aligned} \phi ({\varvec{u}} _{j,k},{\varvec{v}} _*) - \phi ({\varvec{u}} _*,{\varvec{v}} _{j,k}) \le \frac{1}{2\lambda \mu k}(\phi ({\varvec{u}} _{j,0},{\varvec{v}} _*) - \phi ({\varvec{u}} _*,{\varvec{v}} _{j,0})) \end{aligned}$$
(35)

for \(j=0,1,\ldots \). The associated optimal restarting interval is \(k_{\mathrm {opt}}^\phi {\,\approx \,} \frac{e}{2\lambda \mu }\), which is half of \(k_{\mathrm {opt}}\). The corresponding linear rate is also \(O((e^{\lambda \mu /e})^{-2N})\), whereas the proximal point method has the rate \(O((1+\lambda \mu )^{-2N})\) in (9). For any given positive \(\mu \), there is no positive \(\lambda \) that satisfies both \(k_{\mathrm {opt}}^\phi (\lambda ,\mu )\ge 1\) and \(e^{\lambda \mu /e} > 1+\lambda \mu \). This contrasts with the fact that the worst-case rate of optimally restarting the proposed method is not slower than that of the proximal point method, which implies that the bounds (34) and (35) are not exact; we leave finding their tight bounds as future work. The numerical experiment below (and those in Sect. 6) suggests that the restarting technique can perform better than the proximal point method.

We consider a toy problem that is a combination of the worst-case problems in \({\mathcal {M}}({\mathcal {H}})\) and \({\mathcal {M}}_{\mu }({\mathcal {H}})\) for the proximal point method:

$$\begin{aligned} {\varvec{M}} \left[ \begin{array}{c} u \\ v \end{array}\right] = \left( \frac{1}{\lambda \sqrt{N-1}}\left[ \begin{array}{cc} 0\; &{} \quad 1 \\ -1\; &{} \quad 0 \end{array}\right] + \left[ \begin{array}{cc} \mu \; &{} \quad 0 \\ 0\; &{} \quad \mu \end{array}\right] \right) \left[ \begin{array}{c} u \\ v \end{array}\right] , \end{aligned}$$
(36)

which is the saddle subdifferential operator of \(\phi (u,v) = \frac{\mu }{2}u^2 + \frac{1}{\lambda \sqrt{N-1}}uv - \frac{\mu }{2}v^2\). We choose \(N = 100\), \(\lambda = 1\) and \(\mu = 0.02\). The optimal restarting intervals are \(k_{\mathrm {opt}} \approx 136\) and \(k_{\mathrm {opt}}^\phi \approx 68\), and we run 200 iterations in the experiment, where restarting intervals 17, 34, 68, and 136 are considered. Figure 2 compares the proximal point method, its accelerated variants, and the proposed accelerated method with restarting, with an initial point \({\varvec{x}} _0 = [1\; 0]^\top \) and the optimal point \({\varvec{x}} _* = {\varvec{0}} \). Figure 2 shows that the proximal point method has a linear rate that is faster than the proposed method (which has a sublinear rate), while restarting greatly accelerates the proposed method to a fast linear rate. Figure 2 also illustrates that the optimal restarting intervals \(k_{\mathrm {opt}}\) and \(k_{\mathrm {opt}}^\phi \) for strongly monotone operators and strongly-convex–strongly-concave functions, respectively, are not optimal for this specific case. Examples in the next section also show that restarting can be useful even without strong monotonicity (but possibly with local strong monotonicity).

Fig. 2

Solving a strongly monotone inclusion problem with \({\varvec{M}} \) (36); (left) the fixed-point residual versus iteration, (right) the function residual versus iteration

6 Applications of the accelerated proximal point method

As mentioned earlier, the proximal point method for maximally monotone operators includes various well-known convex optimization methods, such as the augmented Lagrangian method (i.e., the method of multipliers), the proximal method of multipliers, and ADMM. The augmented Lagrangian method is equivalent to the proximal point method directly solving the dual convex minimization problem [58], so Güler’s methods [34] already provide an acceleration, whereas other instances of the proximal point method have had no known accelerations. Thus, this section introduces accelerations of well-known instances of the proximal point method, which, to the best of our knowledge, were not possible previously (under this paper’s setting).

6.1 Accelerating the proximal point method for convex–concave saddle-point problem

This section considers a convex–concave saddle-point problem (30), where the associated saddle subdifferential operator (31) is monotone. [59] applied the proximal point method to such an operator to solve the convex–concave saddle-point problem, and this section further applies the proposed acceleration to this proximal point method as below.

[Algorithm: Accelerated proximal point method for the saddle-point problem (30)]

One primary use of this accelerated method is the following convex–concave Lagrangian problem

$$\begin{aligned} \min _{{\varvec{u}} \in {\mathcal {H}}_1}\max _{{\varvec{v}} \in {\mathcal {H}}_2}\;&\left\{ L({\varvec{u}},{\varvec{v}}) {:}{=} f({\varvec{u}}) + \mathop {\langle {\varvec{v}},\, {\varvec{A}} {\varvec{u}}- {\varvec{b}} \rangle }\nolimits \right\} , \end{aligned}$$
(37)

associated with the linearly constrained problem

$$\begin{aligned} \min _{{\varvec{u}} \in {\mathcal {H}}_1}\;&f({\varvec{u}}) \nonumber \\ \text {subject to }\;&{\varvec{A}} {\varvec{u}} = {\varvec{b}}, \end{aligned}$$
(38)

where \({\varvec{A}} \in {\mathcal {B}}({\mathcal {H}}_1,{\mathcal {H}}_2)\) and \({\varvec{b}} \in {\mathcal {H}}_2\). The resulting method is called the proximal method of multipliers in [58], and applying the proposed acceleration to this method leads to the following.

[Algorithm: Accelerated proximal method of multipliers]

Note that this method without the acceleration and without the term \(\frac{1}{2\lambda }||{\varvec{u}}- {\varvec{u}} _i||^2\) reduces to the augmented Lagrangian method. This method has an advantage over the augmented Lagrangian method and its accelerated variants: the primal iterate \({\varvec{u}} _{i+1}\) is uniquely defined with better conditioning.
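As a hedged illustration (not the paper's algorithm box), the accelerated proximal method of multipliers can be sketched as follows; the subproblem solver `u_step` and all names are ours, and the update form is obtained by evaluating the resolvent of the saddle subdifferential of (37), so it may differ in minor details from the method above:

```python
import numpy as np

def accel_prox_method_of_multipliers(u_step, A, b, lam, u0, v0, num_iters):
    """Hedged sketch for (37). `u_step(yu, yv)` is assumed to return
        argmin_u { f(u) + <yv, A u - b> + (lam/2)||A u - b||^2 + 1/(2 lam) ||u - yu||^2 },
    the primal part of the resolvent of the saddle subdifferential of L(u, v);
    the dual part is then v = yv + lam (A u - b)."""
    u_prev, v_prev = np.asarray(u0, float), np.asarray(v0, float)
    yu_prev, yv_prev = u_prev, v_prev          # y_{-1}
    yu, yv = u_prev, v_prev                    # y_0
    for i in range(num_iters):
        u = u_step(yu, yv)                     # primal resolvent step
        v = yv + lam * (A @ u - b)             # dual resolvent step
        coef = i / (i + 2)
        yu_next = u + coef * (u - u_prev) - coef * (u_prev - yu_prev)
        yv_next = v + coef * (v - v_prev) - coef * (v_prev - yv_prev)
        u_prev, v_prev = u, v
        yu_prev, yv_prev = yu, yv
        yu, yv = yu_next, yv_next
    return u, v
```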

Example 6.1

We apply the accelerated proximal method of multipliers to a basis pursuit problem

$$\begin{aligned} \min _{{\varvec{u}} \in \mathbb {R} ^{d_1}}\;&||{\varvec{u}} ||_1 \nonumber \\ \text {subject to }\;&{\varvec{A}} {\varvec{u}} = {\varvec{b}}, \end{aligned}$$
(39)

where \({\varvec{A}} \in \mathbb {R} ^{d_2\times d_1}\) and \({\varvec{b}} \in \mathbb {R} ^{d_2}\). In the experiment, we choose \(d_1 = 100\), \(d_2 = 20\), and a randomly generated \({\varvec{A}} \). A true sparse \({\varvec{u}} _{\mathrm {true}}\) is randomly generated, followed by a thresholding to sparsify its entries, and \({\varvec{b}} \) is then given by \({\varvec{A}} {\varvec{u}} _{\mathrm {true}}\). We run 100 iterations of the proximal method of multipliers and its variants with \(\lambda = 0.01\) and initial \({\varvec{x}} _0 = {\varvec{0}} \). Since the \({\varvec{u}} _{i+1}\)-update does not have a closed form, we solved it using a sufficient number of iterations of the strongly convex version of FISTA [8] in [12, Theorem 4.10].

Figure 3 compares the proximal method of multipliers and its accelerated variants. Similar to Fig. 1, Güler’s first accelerated version diverges, while the proposed method accelerates compared to the non-accelerated version. The proposed method exhibits an oscillation in Fig. 3 (and a subtle oscillation in Fig. 1). This might be due to high momentum from the acceleration, as discussed in [54]. So in Fig. 3 we heuristically restarted the method every 30 iterations to avoid such oscillation and accelerate further, as suggested in [54]. Developing an approach to appropriately choose a restarting interval, or to adaptively restart the method as in [54] for such problems, is left as future work.

Fig. 3

Solving a basis pursuit problem (39); the fixed-point residual versus iteration

6.2 Accelerating the primal–dual hybrid gradient method

This section considers a linearly coupled convex–concave saddle-point problem

$$\begin{aligned} \min _{{\varvec{u}} \in {\mathcal {H}}_1}\max _{{\varvec{v}} \in {\mathcal {H}}_2} \left\{ \phi ({\varvec{u}},{\varvec{v}}) \equiv f({\varvec{u}}) + \mathop {\langle {\varvec{K}} {\varvec{u}},\, {\varvec{v}} \rangle }\nolimits - g({\varvec{v}})\right\} , \end{aligned}$$
(40)

where \(f\in {\mathcal {F}}({\mathcal {H}}_1)\), \(g\in {\mathcal {F}}({\mathcal {H}}_2)\) and \({\varvec{K}} \in {\mathcal {B}}({\mathcal {H}}_1,{\mathcal {H}}_2)\). One widely known method for such a problem is the primal–dual hybrid gradient (PDHG) method [11, 25], which is a preconditioned proximal point method (with \(\lambda = 1\)) for the saddle subdifferential operator (31) of \(\phi \) [13, 35]. The associated preconditioner is

$$\begin{aligned} {\varvec{P}} = \left[ \begin{array}{cc} \frac{1}{\tau }{\varvec{I}} \;&{}\quad -{\varvec{K}} ^* \\ -{\varvec{K}} \;&{}\quad \frac{1}{\sigma }{\varvec{I}} \end{array}\right] ,\end{aligned}$$
(41)

which is positive definite when \(\tau \sigma ||{\varvec{K}} ||^2<1\), where \(||{\varvec{K}} || = \sup _{||{\varvec{x}} ||\le 1} ||{\varvec{K}} {\varvec{x}} ||\). As mentioned in Remark 2.1, we can directly apply our results to the PDHG method as below.

[Algorithm: Accelerated PDHG method]

Corollary 6.1

Assume that \(\mathop {\langle {\varvec{P}} ({\varvec{x}} _0 - {\varvec{x}} _*),\, {\varvec{x}} _0 - {\varvec{x}} _* \rangle }\nolimits \le R^2\) for some \({\varvec{x}} _* \in X_*({\varvec{M}})\). The PDHG method satisfies

$$\begin{aligned} \mathop {\langle {\varvec{P}} ({\varvec{x}} _i - {\varvec{x}} _{i-1}),\, {\varvec{x}} _i - {\varvec{x}} _{i-1} \rangle }\nolimits \le \left( 1 - \frac{1}{i}\right) ^{i-1}\frac{R^2}{i} , \end{aligned}$$

and the proposed accelerated PDHG method satisfies

$$\begin{aligned} \mathop {\langle {\varvec{P}} ({\varvec{x}} _i - {\varvec{y}} _{i-1}),\, {\varvec{x}} _i - {\varvec{y}} _{i-1} \rangle }\nolimits \le \frac{R^2}{i^2} .\end{aligned}$$

Example 6.2

We apply the accelerated PDHG method to the bilinear game problem

$$\begin{aligned} \min _{{\varvec{u}} \in \mathbb {R} ^{d_1}}\max _{{\varvec{v}} \in \mathbb {R} ^{d_2}} \mathop {\langle {\varvec{a}},\, {\varvec{u}} \rangle }\nolimits + \mathop {\langle {\varvec{K}} {\varvec{u}},\, {\varvec{v}} \rangle }\nolimits - \mathop {\langle {\varvec{b}},\, {\varvec{v}} \rangle }\nolimits ,\end{aligned}$$
(42)

where \({\varvec{K}} \in \mathbb {R} ^{d_2\times d_1}\), \({\varvec{a}} \in \mathbb {R} ^{d_1}\) and \({\varvec{b}} \in \mathbb {R} ^{d_2}\). The main part of the corresponding method is as below:

$$\begin{aligned} {\varvec{u}} _{i+1}&= \hat{{\varvec{u}}}_i - \tau ({\varvec{K}} ^*\hat{{\varvec{v}}}_i + {\varvec{a}}) \nonumber \\ {\varvec{v}} _{i+1}&= \hat{{\varvec{v}}}_i + \sigma ({\varvec{K}} (2{\varvec{u}} _{i+1} - \hat{{\varvec{u}}}_i) - {\varvec{b}}). \end{aligned}$$
(43)
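For illustration only, the following sketch combines the PDHG step (43) with the inertia and correction steps of the proposed acceleration applied componentwise to the stacked iterate \(({\varvec{u}},{\varvec{v}})\); all names are ours, and the indexing may differ in minor details from the accelerated PDHG box above:

```python
import numpy as np

def accel_pdhg_bilinear(K, a, b, u0, v0, tau, sigma, num_iters):
    """Hedged sketch: the PDHG step (43) for the bilinear game (42), followed by
    the inertia and correction steps of the proposed acceleration."""
    u_prev, v_prev = np.asarray(u0, float), np.asarray(v0, float)
    uh_prev, vh_prev = u_prev, v_prev          # hat{u}_{-1}, hat{v}_{-1}
    uh, vh = u_prev, v_prev                    # hat{u}_0, hat{v}_0
    for i in range(num_iters):
        u = uh - tau * (K.T @ vh + a)                  # primal step of (43)
        v = vh + sigma * (K @ (2 * u - uh) - b)        # dual step of (43)
        coef = i / (i + 2)
        uh_next = u + coef * (u - u_prev) - coef * (u_prev - uh_prev)
        vh_next = v + coef * (v - v_prev) - coef * (v_prev - vh_prev)
        u_prev, v_prev = u, v
        uh_prev, vh_prev = uh, vh
        uh, vh = uh_next, vh_next
    return u, v
```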

In the experiment, we choose \(d_1 = 1000\), \(d_2 = 500\), and the matrix \({\varvec{K}} \) and the vectors \({\varvec{a}},{\varvec{b}} \) are randomly generated. We run 100 iterations of the PDHG method and its variants with initial \(\hat{{\varvec{u}}}_0 = [10\;\cdots \;10]^\top \), \(\hat{{\varvec{v}}}_0 = [10\;\cdots \;10]^\top \) and \(\tau =\sigma =\frac{0.99}{||{\varvec{K}} ||}\). Figure 4 plots the preconditioned fixed-point residual, where Güler’s first accelerated method diverges. The PDHG method and its proposed accelerated variant are comparable in this experiment, and heuristically restarting the accelerated method every 10 iterations yields a large acceleration. While [11] found restarting (reinitializing) a relaxed PDHG method not useful, our experiment suggests that restarting can be effective in some practical cases.

Fig. 4

Solving a bilinear game problem (42) by a preconditioned proximal point method; the preconditioned fixed-point residual versus iteration. \(\tilde{{\varvec{J}}}_{{\varvec{M}},{\varvec{P}}} {:}{=} ({\varvec{P}} + {\varvec{M}})^{-1}{\varvec{P}} \) denotes the preconditioned resolvent operator

6.3 Accelerating the Douglas–Rachford splitting method

This section considers a monotone inclusion problem in a form

$$\begin{aligned} {\text {Find}}\;\; {\varvec{x}} \in {\mathcal {H}}\quad \text {subject to} \quad {\varvec{0}} \in ({\varvec{M}} _1 + {\varvec{M}} _2){\varvec{x}} \end{aligned}$$
(44)

for \({\varvec{M}} _1,{\varvec{M}} _2\in {\mathcal {M}}({\mathcal {H}})\), where \({\varvec{J}} _{\rho {\varvec{M}} _1}\) and \({\varvec{J}} _{\rho {\varvec{M}} _2}\) are more efficient to evaluate than \({\varvec{J}} _{\rho ({\varvec{M}} _1 + {\varvec{M}} _2)}\) for a positive real number \(\rho \). For such a problem, the Douglas–Rachford splitting method [19, 44] that iteratively applies the operator

$$\begin{aligned} {\varvec{G}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2}{:}{=} {\varvec{J}} _{\rho {\varvec{M}} _1}{\circ }(2{\varvec{J}} _{\rho {\varvec{M}} _2} - {\varvec{I}}) + ({\varvec{I}}- {\varvec{J}} _{\rho {\varvec{M}} _2}) \end{aligned}$$
(45)

has been found to be effective in many applications including ADMM, which we discuss in the next section.

In [24, Theorem 4], the Douglas–Rachford operator (45) was found to be a resolvent \({\varvec{J}} _{{\varvec{M}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2}}\) of a maximally monotone operator

$$\begin{aligned} {\varvec{M}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2} {:}{=} {\varvec{G}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2}^{-1} - {\varvec{I}}. \end{aligned}$$
(46)

In other words, the Douglas–Rachford splitting method is an instance of the proximal point method (with \(\lambda = 1\)) as

$$\begin{aligned} {\varvec{\nu }} _{i+1} = {\varvec{J}} _{{\varvec{M}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2}}({\varvec{\nu }} _i) = {\varvec{G}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2}({\varvec{\nu }} _i) \end{aligned}$$
(47)

for \(i=0,1,\ldots \). Therefore, we can apply the proposed acceleration to the Douglas–Rachford splitting method as below.

[Algorithm: Accelerated Douglas–Rachford splitting method]
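As a hedged sketch (not the paper's algorithm box), the accelerated Douglas–Rachford splitting method can be written with user-supplied resolvents; it follows the explicit form given later in (54):

```python
import numpy as np

def accelerated_drs(J1, J2, nu0, num_iters):
    """Sketch of the accelerated Douglas-Rachford splitting method: one evaluation
    of the DR operator (45) per iteration, via the resolvents J1 = J_{rho M1} and
    J2 = J_{rho M2}, followed by the inertia and correction steps."""
    nu_prev = eta_prev = eta = np.asarray(nu0, dtype=float)   # nu_0 = eta_0 = eta_{-1}
    for i in range(num_iters):
        zeta = J2(eta)                       # J_{rho M2}(eta_i)
        xi = J1(2 * zeta - eta)              # J_{rho M1}(2 zeta_{i+1} - eta_i)
        nu = eta + (xi - zeta)               # nu_{i+1} = G_{rho,M1,M2}(eta_i), cf. (45)
        coef = i / (i + 2)
        eta_next = nu + coef * (nu - nu_prev) - coef * (nu_prev - eta_prev)
        nu_prev, eta_prev, eta = nu, eta, eta_next
    return nu
```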

Using (8) and (27), we have the following worst-case rates for the Douglas–Rachford splitting method and its accelerated variant. Finding exact bounds for the Douglas–Rachford splitting method and its variant is left as future work; [61] used PEP to analyze the exact worst-case rate of the Douglas–Rachford splitting method under some additional conditions.

Corollary 6.2

Assume that \(||{\varvec{\nu }} _0 - {\varvec{\nu }} _*||\le R\) for some \({\varvec{\nu }} _*\in X_*({\varvec{M}} _{\rho ,{\varvec{M}} _1,{\varvec{M}} _2})\). The Douglas–Rachford splitting method satisfies

$$\begin{aligned} ||{\varvec{\nu }} _i - {\varvec{\eta }} _{i-1}||^2 \le \left( 1 - \frac{1}{i}\right) ^{i-1}\frac{R^2}{i} , \end{aligned}$$
(48)

and the proposed accelerated Douglas–Rachford splitting method satisfies

$$\begin{aligned} ||{\varvec{\nu }} _i - {\varvec{\eta }} _{i-1}||^2 \le \frac{R^2}{i^2} . \end{aligned}$$
(49)

[23, 24] showed that ADMM is equivalent to the Douglas–Rachford splitting method applied to the dual problem, so we naturally develop an accelerated ADMM in the next section and provide a numerical experiment for the accelerated ADMM and thus the accelerated Douglas–Rachford splitting method.

6.4 Accelerating the alternating direction method of multipliers (ADMM)

Let \({\mathcal {H}}_1,{\mathcal {H}}_2,{\mathcal {G}}\) be real Hilbert spaces equipped with inner product \(\mathop {\langle \cdot ,\, \cdot \rangle }\nolimits \). This section considers a linearly constrained convex problem

$$\begin{aligned} \min _{{\varvec{x}} \in {\mathcal {H}}_1,{\varvec{z}} \in {\mathcal {H}}_2}\;&f({\varvec{x}}) + g({\varvec{z}}) \nonumber \\ \text {subject to }\;&{\varvec{A}} {\varvec{x}} + {\varvec{B}} {\varvec{z}} = {\varvec{c}}, \end{aligned}$$
(50)

where \(f\in {\mathcal {F}}({\mathcal {H}}_1)\), \(g\in {\mathcal {F}}({\mathcal {H}}_2)\), \({\varvec{A}} \in {\mathcal {B}}({\mathcal {H}}_1,{\mathcal {G}})\), \({\varvec{B}} \in {\mathcal {B}}({\mathcal {H}}_2,{\mathcal {G}})\) and \({\varvec{c}} \in {\mathcal {G}}\). Its dual problem is

$$\begin{aligned} \max _{{\varvec{\nu }} \in {\mathcal {G}}}\;&\left\{ - f^*(-{\varvec{A}} ^*{\varvec{\nu }}) - g^*(-{\varvec{B}} ^*{\varvec{\nu }}) + \mathop {\langle {\varvec{c}},\, {\varvec{\nu }} \rangle }\nolimits \right\} , \end{aligned}$$
(51)

where \(f^*({\varvec{y}}) {:}{=} \sup _{{\varvec{x}} \in {\mathcal {H}}_1} \{\mathop {\langle {\varvec{y}},\, {\varvec{x}} \rangle }\nolimits - f({\varvec{x}})\}\) and \(g^*({\varvec{y}}) {:}{=} \sup _{{\varvec{z}} \in {\mathcal {H}}_2} \{\mathop {\langle {\varvec{y}},\, {\varvec{z}} \rangle }\nolimits - g({\varvec{z}})\}\) are the conjugate functions of f and g, respectively. The dual problem (51) is equivalent to the following monotone inclusion problem

$$\begin{aligned} {\text {Find}}\;\; {\varvec{\nu }} \in {\mathcal {G}}\quad \text {subject to} \quad {\varvec{0}} \in -{\varvec{A}} \partial f^*(-{\varvec{A}} ^*{\varvec{\nu }}) - {\varvec{B}} \partial g^*(-{\varvec{B}} ^*{\varvec{\nu }}) - {\varvec{c}} . \end{aligned}$$
(52)

We next use the connection between ADMM for solving (50) and the Douglas–Rachford splitting method for solving (52), established in [17, Proposition 9] and [60], to develop an accelerated ADMM based on the accelerated Douglas–Rachford splitting method of the previous section.

Denoting

$$\begin{aligned} {\varvec{M}} _1&{:}{=} -{\varvec{A}} \partial f^*(-{\varvec{A}} ^*\cdot ) - {\varvec{c}} \quad \text {and} \quad {\varvec{M}} _2 {:}{=} -{\varvec{B}} \partial g^*(-{\varvec{B}} ^*\cdot ) \end{aligned}$$
(53)

converts problem (52) into an instance of the monotone inclusion problem (44). We then use the following equivalent form of the accelerated Douglas–Rachford splitting method to solve (44) with (53):

$$\begin{aligned} {\varvec{\zeta }} _{i+1}&= {\varvec{J}} _{\rho {\varvec{M}} _2}({\varvec{\eta }} _i) \nonumber \\ {\varvec{\xi }} _{i+1}&= {\varvec{J}} _{\rho {\varvec{M}} _1}(2{\varvec{\zeta }} _{i+1} - {\varvec{\eta }} _i) \nonumber \\ {\varvec{\nu }} _{i+1}&= {\varvec{\eta }} _i + ({\varvec{\xi }} _{i+1} - {\varvec{\zeta }} _{i+1}) \nonumber \\ {\varvec{\eta }} _{i+1}&= {\varvec{\nu }} _{i+1} + \frac{i}{i+2}({\varvec{\nu }} _{i+1} - {\varvec{\nu }} _i) - \frac{i}{i+2}({\varvec{\nu }} _i - {\varvec{\eta }} _{i-1}) \end{aligned}$$
(54)

for \(i=0,1,\ldots \). Replacing the resolvent operators of \({\varvec{M}} _1\) and \({\varvec{M}} _2\) in (53) by minimization steps yields

$$\begin{aligned} {\varvec{z}} _{i+1}&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{z\in H_2} \left\{ g({\varvec{z}}) + \mathop {\langle {\varvec{\eta }} _i,\, {\varvec{B}} {\varvec{z}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{B}} {\varvec{z}} ||^2\right\} \nonumber \\ {\varvec{\zeta }} _{i+1}&= {\varvec{\eta }} _i + \rho {\varvec{B}} {\varvec{z}} _{i+1} \nonumber \\ \tilde{{\varvec{x}}}_{i+1}&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{x\in H_1} \left\{ f({\varvec{x}}) + \mathop {\langle {\varvec{\eta }} _i + 2\rho {\varvec{B}} {\varvec{z}} _{i+1},\, {\varvec{A}} {\varvec{x}}- {\varvec{c}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{A}} {\varvec{x}}- {\varvec{c}} ||^2\right\} \nonumber \\ {\varvec{\xi }} _{i+1}&= {\varvec{\eta }} _i + \rho ({\varvec{A}} \tilde{{\varvec{x}}}_{i+1} - {\varvec{c}}) + 2\rho {\varvec{B}} {\varvec{z}} _{i+1} \nonumber \\ {\varvec{\nu }} _{i+1}&= {\varvec{\eta }} _i + \rho ({\varvec{A}} \tilde{{\varvec{x}}}_{i+1} + {\varvec{B}} {\varvec{z}} _{i+1} - {\varvec{c}}) \nonumber \\ {\varvec{\eta }} _{i+1}&= {\varvec{\nu }} _{i+1} + \frac{i}{i+2}({\varvec{\nu }} _{i+1} - {\varvec{\nu }} _i) - \frac{i}{i+2}({\varvec{\nu }} _i - {\varvec{\eta }} _{i-1}). \end{aligned}$$
(55)

By discarding \({\varvec{\zeta }} _i\) and \({\varvec{\xi }} _i\), and defining

$$\begin{aligned} \hat{{\varvec{\nu }}}_i {:}{=} {\varvec{\nu }} _i - \rho ({\varvec{A}} \tilde{{\varvec{x}}}_i - {\varvec{c}}) \quad \text {and}\quad \hat{{\varvec{\eta }}}_i {:}{=} {\varvec{\eta }} _i - \rho ({\varvec{A}} \tilde{{\varvec{x}}}_i - {\varvec{c}}), \end{aligned}$$
(56)

for \(i=0,1,\ldots \), we have

$$\begin{aligned} {\varvec{z}} _{i+1}&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{{z\in {\mathcal {H}}_2}} \left\{ g({\varvec{z}}) + \mathop {\langle \hat{{\varvec{\eta }}}_i + \rho ({\varvec{A}} \tilde{{\varvec{x}}}_i - {\varvec{c}}),\, {\varvec{B}} {\varvec{z}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{B}} {\varvec{z}} ||^2\right\} \nonumber \\&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{z\in {\mathcal {H}}_2} \left\{ g({\varvec{z}}) + \mathop {\langle \hat{{\varvec{\eta }}}_i,\, {\varvec{A}} \tilde{{\varvec{x}}}_i + {\varvec{B}} {\varvec{z}}- {\varvec{c}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{A}} \tilde{{\varvec{x}}}_i + {\varvec{B}} {\varvec{z}}- {\varvec{c}} ||^2\right\} \nonumber \\ \tilde{{\varvec{x}}}_{i+1}&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{x\in {\mathcal {H}}_1} \left\{ f({\varvec{x}}) + \mathop {\langle \hat{{\varvec{\nu }}}_{i+1} + \rho {\varvec{B}} {\varvec{z}} _{i+1},\, {\varvec{A}} {\varvec{x}}- {\varvec{c}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{A}} {\varvec{x}}- {\varvec{c}} ||^2\right\} \nonumber \\&= \mathop {\mathrm{arg}\, \mathrm{min}}\limits _{x\in {\mathcal {H}}_1} \left\{ f({\varvec{x}}) + \mathop {\langle \hat{{\varvec{\nu }}}_{i+1},\, {\varvec{A}} {\varvec{x}} + {\varvec{B}} {\varvec{z}} _{i+1} - {\varvec{c}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{A}} {\varvec{x}} + {\varvec{B}} {\varvec{z}} _{i+1} - {\varvec{c}} ||^2\right\} \nonumber \\ \hat{{\varvec{\nu }}}_{i+1}&= \hat{{\varvec{\eta }}}_i + \rho ({\varvec{A}} \tilde{{\varvec{x}}}_i + {\varvec{B}} {\varvec{z}} _{i+1} - {\varvec{c}}) \nonumber \\ \hat{{\varvec{\eta }}}_{i+1}&= \hat{{\varvec{\nu }}}_{i+1} + \frac{i}{i+2}(\hat{{\varvec{\nu }}}_{i+1} - \hat{{\varvec{\nu }}}_i + \rho {\varvec{A}} (\tilde{{\varvec{x}}}_{i+1} - \tilde{{\varvec{x}}}_i)) \nonumber \\&\quad - \frac{i}{i+2}(\hat{{\varvec{\nu }}}_i - \hat{{\varvec{\eta }}}_{i-1} + \rho {\varvec{A}} (\tilde{{\varvec{x}}}_i - \tilde{{\varvec{x}}}_{i-1})) . \end{aligned}$$
(57)

Then, replacing \(\tilde{{\varvec{x}}}_i\) by \({\varvec{x}} _{i+1}\) and reordering the steps appropriately yields the following accelerated version of ADMM, which reduces to the standard ADMM when \(\hat{{\varvec{\eta }}}_i = \hat{{\varvec{\nu }}}_i\) for \(i=0,1,\ldots \).

[Algorithm figure omitted: accelerated alternating direction method of multipliers (ADMM).]
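As an illustrative sketch only (not the exact pseudocode of the omitted figure), the following Python function carries out the iteration (57) with \(\tilde{{\varvec{x}}}_i\) renamed \({\varvec{x}} _{i+1}\); the subproblem solvers xmin and zmin are hypothetical user-supplied oracles, and the matrices \({\varvec{A}}\), \({\varvec{B}}\) and vector \({\varvec{c}}\) are assumed to be given explicitly as arrays.

```python
def accelerated_admm(xmin, zmin, A, B, c, rho, x0, z0, nu0, n_iter):
    """Sketch of an accelerated ADMM following (57) (with tilde-x_i renamed x_{i+1}).

    Hypothetical user-supplied oracles:
      xmin(v, z): argmin_x f(x) + <v, A x + B z - c> + (rho/2)||A x + B z - c||^2
      zmin(v, x): argmin_z g(z) + <v, A x + B z - c> + (rho/2)||A x + B z - c||^2
    A, B are matrices (NumPy arrays) and c is a vector.
    """
    x, x_prev = x0.copy(), x0.copy()              # x_i, x_{i-1}
    z = z0.copy()                                 # z_i
    nu, nu_prev = nu0.copy(), nu0.copy()          # hat-nu_i, hat-nu_{i-1}
    eta_prev, eta_prev2 = nu0.copy(), nu0.copy()  # hat-eta_{i-1}, hat-eta_{i-2}
    for i in range(n_iter):
        x_next = xmin(nu, z)                      # x_{i+1}
        if i <= 1:
            eta = nu.copy()                       # hat-eta_i = hat-nu_i for i = 0, 1
        else:
            beta = (i - 1) / (i + 1)
            eta = (nu + beta * (nu - nu_prev + rho * (A @ (x_next - x)))
                      - beta * (nu_prev - eta_prev2 + rho * (A @ (x - x_prev))))
        z = zmin(eta, x_next)                     # z_{i+1}
        nu_next = eta + rho * (A @ x_next + B @ z - c)   # hat-nu_{i+1}
        x_prev, x = x, x_next
        nu_prev, nu = nu, nu_next
        eta_prev2, eta_prev = eta_prev, eta
    return x, z, nu
```

Skipping the momentum branch, i.e., taking \(\hat{{\varvec{\eta }}}_i = \hat{{\varvec{\nu }}}_i\) for all i, recovers the standard ADMM, as noted above.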

Since

$$\begin{aligned} {\varvec{\nu }} _i - {\varvec{\eta }} _{i-1} = \hat{{\varvec{\nu }}}_i - \hat{{\varvec{\eta }}}_{i-1} + \rho ({\varvec{A}} {\varvec{x}} _{i+1} - {\varvec{A}} {\varvec{x}} _i) = \rho ({\varvec{A}} {\varvec{x}} _{i+1} + {\varvec{B}} {\varvec{z}} _i - {\varvec{c}}), \end{aligned}$$
(58)

where the first equality follows from (56) and the second from the \(\hat{{\varvec{\nu }}}\)-update in (57) (after the renaming above), we have the following worst-case rates with respect to infeasibility for ADMM and its accelerated version, using (8) and (27).

Corollary 6.3

Assume that \(||\hat{{\varvec{\nu }}}_0 + \rho ({\varvec{A}} {\varvec{x}} _0 - {\varvec{c}}) - {\varvec{\nu }} _*|| \le R\) for some \({\varvec{\nu }} _*\in X_*({\varvec{M}} _{\rho ,-{\varvec{A}} \partial f^*(-{\varvec{A}} ^*\cdot ) - {\varvec{c}},-{\varvec{B}} \partial g^*(-{\varvec{B}} ^*\cdot )})\). The alternating direction method of multipliers satisfies

$$\begin{aligned} ||{\varvec{A}} {\varvec{x}} _{i+1} + {\varvec{B}} {\varvec{z}} _i - {\varvec{c}} ||^2 \le \left( 1 - \frac{1}{i}\right) ^{i-1}\frac{R^2}{\rho ^2i} ,\end{aligned}$$
(59)

and the proposed accelerated alternating direction method of multipliers satisfies

$$\begin{aligned} ||{\varvec{A}} {\varvec{x}} _{i+1} + {\varvec{B}} {\varvec{z}} _i - {\varvec{c}} ||^2 \le \frac{R^2}{\rho ^2i^2} . \end{aligned}$$
(60)

The bound (59) is asymptotically e-times smaller than the known rate for ADMM in [17, Theorem 15], which originates from the bound (7), since \(\left( 1 - \frac{1}{i}\right) ^{i-1}\rightarrow e^{-1}\) as \(i\rightarrow \infty \). Finding exact bounds for ADMM and its proposed variant is left as future work.

Remark 6.1

Many existing rates for the (preconditioned) ADMM consider the ergodic sequences \(\{{\bar{{\varvec{x}}}}_i\}\) and \(\{{\bar{{\varvec{z}}}}_i\}\), where \({\bar{{\varvec{x}}}}_i {:}{=} \frac{1}{i}\sum _{l=1}^i{\varvec{x}} _l\) and \({\bar{{\varvec{z}}}}_i {:}{=} \frac{1}{i}\sum _{l=1}^i{\varvec{z}} _l\) (see e.g., [11, 13, 17, 18]). In particular, in [17, Theorem 15], ADMM is found to satisfy

$$\begin{aligned} ||{\varvec{A}} {\bar{{\varvec{x}}}}_{i+1} + {\varvec{B}} {\bar{{\varvec{z}}}}_i - {\varvec{c}} ||^2 \le \frac{16R^2}{\rho ^2i^2} , \end{aligned}$$
(61)

which is faster than the rate of the nonergodic sequence \(\{{\varvec{x}} _i,{\varvec{z}} _i\}\) of ADMM in (59) and is comparable to the rate of the proposed accelerated ADMM in (60). One should note that the feasibility convergence of the ergodic sequence, as in (61), does not necessarily imply the convergence of the fixed-point residual of the ergodic sequence, unlike (59) and (60) for the nonergodic sequence. In addition, some numerical experiments in [13] illustrate that the nonergodic sequence can converge faster than the ergodic sequence. We leave a further understanding of the rates of the ergodic and nonergodic sequences of (preconditioned) ADMM, and of their relationship, as future work.

Remark 6.2

Accelerated variants of (preconditioned) ADMM were proposed in [11, 13, 28] under some additional conditions, while the proposed method does not require such conditions.

Example 6.3

We apply the accelerated ADMM to the problem

$$\begin{aligned} \min _{{\varvec{x}} \in \mathbb {R} ^{d_1},{\varvec{z}} \in \mathbb {R} ^{d_2}}\;&\frac{1}{2}||{\varvec{H}} {\varvec{x}}- {\varvec{b}} ||^2 + \gamma ||{\varvec{z}} ||_1 \nonumber \\ \text {subject to }\;&{\varvec{D}} {\varvec{x}}- {\varvec{z}} = {\varvec{0}}, \end{aligned}$$
(62)

with a positive real number \(\gamma \), associated with the total-variation-regularized least-squares problem

$$\begin{aligned} \min _{{\varvec{x}} \in \mathbb {R} ^{d_1}}\;&\frac{1}{2}||{\varvec{H}} {\varvec{x}}- {\varvec{b}} ||^2 + \gamma ||{\varvec{D}} {\varvec{x}} ||_1 , \end{aligned}$$
(63)

where \({\varvec{H}} \in \mathbb {R} ^{p\times d_1}\), \({\varvec{b}} \in \mathbb {R} ^p\), and a matrix \({\varvec{D}} \in \mathbb {R} ^{d_2\times d_1}\) is given as

$$\begin{aligned} {\varvec{D}} = \left[ \begin{array}{cccccc} 1 &{} \quad -1 &{} \quad 0 &{} \quad 0 &{} \quad \cdots &{} \quad 0 \\ 0 &{} \quad 1 &{} \quad -1 &{} \quad 0 &{} \quad \cdots &{} \quad 0 \\ \vdots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ \vdots &{} \quad \ddots &{} \quad 0 &{} \quad 1 &{} \quad -1 &{} \quad 0 \\ 0 &{} \quad \cdots &{} \quad \cdots &{} \quad 0 &{} \quad 1 &{} \quad -1 \end{array}\right] .\end{aligned}$$
(64)
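For illustration, the matrix (64) can be generated as follows (a small NumPy sketch; the function name is ours).

```python
import numpy as np

def first_difference_matrix(d1: int) -> np.ndarray:
    """Return the (d1 - 1) x d1 first-difference matrix D of (64)."""
    D = np.zeros((d1 - 1, d1))
    idx = np.arange(d1 - 1)
    D[idx, idx] = 1.0       # main diagonal entries 1
    D[idx, idx + 1] = -1.0  # first superdiagonal entries -1
    return D
```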

By letting \(f({\varvec{x}}) = \frac{1}{2}||{\varvec{H}} {\varvec{x}}- {\varvec{b}} ||^2\), \(g({\varvec{z}}) = \gamma ||{\varvec{z}} ||_1\), \({\varvec{A}} = {\varvec{D}} \), \({\varvec{B}} = -{\varvec{I}} \) and \({\varvec{c}} = {\varvec{0}} \), we have the following accelerated ADMM method:

$$\begin{aligned} {\varvec{x}} _{i+1}&= \mathop {\text {arg min}}\limits _{x\in R^{d_1}} \left\{ \frac{1}{2}||{\varvec{H}} {\varvec{x}}- {\varvec{b}} ||^2 + \mathop {\langle \hat{{\varvec{\nu }}}_i,\, {\varvec{D}} {\varvec{x}}- {\varvec{z}} _i \rangle }\nolimits + \frac{\rho }{2}||{\varvec{D}} {\varvec{x}}- {\varvec{z}} _i||^2\right\} \nonumber \\&= ({\varvec{H}} ^\top {\varvec{H}} + \rho {\varvec{D}} ^\top {\varvec{D}})^{-1}({\varvec{D}} ^\top (\rho {\varvec{z}} _i - \hat{{\varvec{\nu }}}_i) + {\varvec{H}} ^\top {\varvec{b}}) \nonumber \\ \hat{{\varvec{\eta }}}_i&= {\left\{ \begin{array}{ll} \hat{{\varvec{\nu }}}_i, &{} i=0,1, \\ \hat{{\varvec{\nu }}}_i + \frac{i-1}{i+1}(\hat{{\varvec{\nu }}}_i - \hat{{\varvec{\nu }}}_{i-1} + \rho {\varvec{D}} ({\varvec{x}} _{i+1} - {\varvec{x}} _i)) - \frac{i-1}{i+1}(\hat{{\varvec{\nu }}}_{i-1} - \hat{{\varvec{\eta }}}_{i-2} + \rho {\varvec{D}} ({\varvec{x}} _i - {\varvec{x}} _{i-1})), &{} i=2,3,\ldots \end{array}\right. } \nonumber \\ {\varvec{z}} _{i+1}&= \mathop {\text {arg min}}\limits _{z\in R^{d_2}} \left\{ \gamma ||{\varvec{z}} ||_1 + \mathop {\langle \hat{{\varvec{\eta }}}_i,\, {\varvec{D}} {\varvec{x}} _{i+1} - {\varvec{z}} \rangle }\nolimits + \frac{\rho }{2}||{\varvec{D}} {\varvec{x}} _{i+1} - {\varvec{z}} ||^2\right\} \nonumber \\&= {\text {S}}_{\frac{\gamma }{\rho }}\left( {\varvec{D}} {\varvec{x}} _{i+1} + \frac{1}{\rho }\hat{{\varvec{\eta }}}_i\right) \nonumber \\ \hat{{\varvec{\nu }}}_{i+1}&= \hat{{\varvec{\eta }}}_i + \rho ({\varvec{D}} {\varvec{x}} _{i+1} - {\varvec{z}} _{i+1}), \end{aligned}$$
(65)

where the soft-thresholding operator is defined as \({\text {S}}_{\tau }({\varvec{z}}) {:}{=} \max \{|{\varvec{z}} |-\tau ,{\varvec{0}} \} \odot {\text {sign}}({\varvec{z}})\) with the element-wise absolute value, maximum and multiplication operators, \(|\cdot |\), \(\max \{\cdot ,\cdot \}\) and \(\odot \), respectively.
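As a concrete illustration, a minimal NumPy sketch of the iteration (65) follows; the zero initialization, the fixed iteration count, and the function names are our own illustrative choices, not specified in the text.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft-thresholding S_tau(v) = max(|v| - tau, 0) * sign(v)."""
    return np.maximum(np.abs(v) - tau, 0.0) * np.sign(v)

def accelerated_admm_tv(H, b, D, gamma, rho, n_iter):
    """Sketch of the accelerated ADMM (65) for the TV-regularized LS problem (62)-(63)."""
    d1, d2 = H.shape[1], D.shape[0]
    x, x_prev = np.zeros(d1), np.zeros(d1)            # x_i, x_{i-1}
    z = np.zeros(d2)                                  # z_i
    nu, nu_prev = np.zeros(d2), np.zeros(d2)          # hat-nu_i, hat-nu_{i-1}
    eta_prev, eta_prev2 = np.zeros(d2), np.zeros(d2)  # hat-eta_{i-1}, hat-eta_{i-2}
    # the x-update solves a linear system with a fixed matrix
    K = H.T @ H + rho * D.T @ D
    Hb = H.T @ b
    for i in range(n_iter):
        x_next = np.linalg.solve(K, D.T @ (rho * z - nu) + Hb)   # x_{i+1}
        if i <= 1:
            eta = nu.copy()                                      # hat-eta_i = hat-nu_i
        else:
            beta = (i - 1) / (i + 1)
            eta = (nu + beta * (nu - nu_prev + rho * D @ (x_next - x))
                      - beta * (nu_prev - eta_prev2 + rho * D @ (x - x_prev)))
        z = soft_threshold(D @ x_next + eta / rho, gamma / rho)  # z_{i+1}
        nu_next = eta + rho * (D @ x_next - z)                   # hat-nu_{i+1}
        x_prev, x = x, x_next
        nu_prev, nu = nu, nu_next
        eta_prev2, eta_prev = eta_prev, eta
    return x, z
```

This routine can be combined with the first_difference_matrix sketch above and the data-generation choices described in the experiment below.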

In the experiment, we choose \(d_1 = 100\), \(d_2 = 99\), \(p=5\), and the true vector \({\varvec{x}} _{\mathrm {true}}\) is constructed such that \({\varvec{D}} {\varvec{x}} _{\mathrm {true}}\) has few nonzero elements. The matrix \({\varvec{H}} \) is randomly generated, and the noisy vector \({\varvec{b}} \) is generated by adding a randomly generated noise vector to \({\varvec{H}} {\varvec{x}} _{\mathrm {true}}\). We choose the parameters \(\gamma = 3\) and \(\rho = 0.05\) in the experiment.

Figure 5 illustrates the fixed-point residual of ADMM and its accelerated variants. Interestingly, ADMM has a rate comparable to the \(O(1/i^2)\) rate of the proposed method. This does not contradict the theory; we leave further investigation of the worst-case rate of ADMM under the Lipschitz continuity condition on \(\nabla f\) as future work, and note that similar analyses under different conditions can be found in [17, 18]. Noticing the oscillating behavior of the proposed accelerated ADMM in Fig. 5, we heuristically restarted the proposed method every 20 iterations, which yielded a linear rate without a strong monotonicity condition. Restarting has previously been found useful for a different accelerated ADMM in [28].

[Fig. 5 omitted: solving a total-variation-regularized least-squares problem (63); the fixed-point residual versus iteration.]

7 Accelerated forward method for cocoercive operators

This section applies the proposed acceleration to a forward method, such as a gradient method, for cocoercive operators. A single-valued operator \({\varvec{M}} \;:\;{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\beta \)-cocoercive for \(\beta \in \mathbb {R} _{++}\) if

$$\begin{aligned} \mathop {\langle {\varvec{x}}- {\varvec{y}},\, {\varvec{M}} {\varvec{x}}- {\varvec{M}} {\varvec{y}} \rangle }\nolimits \ge \beta ||{\varvec{M}} {\varvec{x}}- {\varvec{M}} {\varvec{y}} ||^2\text { for all } {\varvec{x}},{\varvec{y}} \in {\mathcal {H}}. \end{aligned}$$
(66)

Let \({\mathcal {C}}_{\beta }({\mathcal {H}})\) be the class of \(\beta \)-cocoercive operators on \({\mathcal {H}}\). For example, by the Baillon–Haddad theorem, the gradient of a convex function with an \(L\)-Lipschitz continuous gradient is \(1/L\)-cocoercive. For a \(\beta \)-cocoercive operator, the following forward method (which iteratively applies the forward operator \({\varvec{I}}- \beta {\varvec{M}} \)) is guaranteed to converge weakly to a solution [7, Theorem 26.14].

[Algorithm figure omitted: forward method for cocoercive operators.]

An operator \({\varvec{T}} \;:\;{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\lambda \)-cocoercive if and only if it is the Yosida approximation of index \(\lambda \) [7, Proposition 23.21]:

$$\begin{aligned} {\varvec{M}} _\lambda {:}{=} \frac{1}{\lambda }({\varvec{I}}- {\varvec{J}} _{\lambda {\varvec{M}}}) \end{aligned}$$
(67)

of a maximally monotone operator \({\varvec{M}} \;:\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\). We thus have the following equivalence between the resolvent (backward) operator of a maximally monotone operator \({\varvec{M}} \) and a forward operator of the corresponding cocoercive operator \({\varvec{M}} _\lambda \):

$$\begin{aligned} {\varvec{J}} _{\lambda {\varvec{M}}} = ({\varvec{I}} + \lambda {\varvec{M}})^{-1} = {\varvec{I}}- \lambda {\varvec{M}} _\lambda .\end{aligned}$$
(68)

Therefore, the results on the proximal point method and its accelerated variant for monotone operators directly apply to the forward method and its accelerated variant for cocoercive operators.
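For concreteness, a minimal sketch of the resulting accelerated forward iteration is given below, obtained by combining the forward operator \({\varvec{I}}- \beta {\varvec{M}} \) with the momentum recursion used in (54) via the correspondence (68); the \(\beta \)-cocoercive operator is assumed to be supplied as a callable M (hypothetical name), and x0 as an array.

```python
def accelerated_forward(M, beta, x0, n_iter):
    """Sketch of the accelerated forward method for a beta-cocoercive operator M.

    The forward step (I - beta M) plays the role of the resolvent via
    J_{lambda M} = I - lambda M_lambda in (68); M is a callable returning M(y).
    """
    x_prev = x0.copy()          # x_i
    y = x0.copy()               # y_i (extrapolated point)
    y_prev = x0.copy()          # y_{i-1}
    for i in range(n_iter):
        x = y - beta * M(y)     # forward step at the extrapolated point
        # momentum recursion with coefficients i/(i+2), cf. (54)
        y_next = x + (i / (i + 2)) * (x - x_prev) - (i / (i + 2)) * (x_prev - y_prev)
        x_prev, y_prev, y = x, y, y_next
    return x_prev
```

Setting the momentum terms to zero recovers the plain forward method above.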

8 Conclusion

This paper developed an accelerated proximal point method for maximally monotone operators, with respect to the fixed-point residual, using the computer-assisted performance estimation problem approach. A restarting technique was further employed under the strong monotonicity condition. The proposed acceleration was applied to various instances of the proximal point method, such as the proximal method of multipliers, the primal–dual hybrid gradient method, the Douglas–Rachford splitting method, and the alternating direction method of multipliers, yielding accelerations both theoretically and practically. The acceleration was also applied to a forward method for cocoercive operators.

We leave developing accelerations for more general or more specific classes of problems or methods as future work, possibly via the performance estimation problem approach; a comprehensive understanding of accelerations for the alternating direction method of multipliers, with respect to various performance measures and under various conditions, remains open.