Abstract
This note proposes a novel approach to deriving a worst-case \(O(1/k)\) convergence rate, measured by the iteration complexity in a non-ergodic sense, for the Douglas–Rachford alternating direction method of multipliers proposed by Glowinski and Marrocco.
1 Introduction
There has been an impressive development of operator splitting methods in the area of partial differential equations, and among them are some alternating direction methods of multipliers (ADMMs for short). In this note, we focus on the Douglas–Rachford ADMM scheme proposed by Glowinski and Marrocco in [7] (see also [5]), and we restrict our discussion to the context of a convex minimization problem with linear constraints and a separable objective function:
$$\begin{aligned} \min \; \{\theta _1(x)+\theta _2(y) \;|\; Ax+By=b, \; x\in \mathcal{X}, \; y\in \mathcal{Y}\}, \end{aligned}$$(1.1)
where \(A\in \mathfrak {R}^{m\times n_1}\), \(B\in \mathfrak {R}^{m\times n_2}\), \(b\in \mathfrak {R}^m\), \(\mathcal{X}\subset \mathfrak {R}^{n_1}\) and \(\mathcal{Y}\subset \mathfrak {R}^{n_2}\) are closed convex sets, \(\theta _1: \mathfrak {R}^{n_1} \rightarrow \mathfrak {R}\) and \(\theta _2: \mathfrak {R}^{n_2} \rightarrow \mathfrak {R}\) are convex functions (not necessarily smooth). The solution set of (1.1) is assumed to be nonempty, and we refer to [4, 8] for some convergence results without this assumption.
As in [10], in order to treat the original ADMM in [7] and the split inexact Uzawa method in [13] in a uniform way, we study the following ADMM scheme for (1.1):
$$\begin{aligned} x^{k+1}&=\arg \min \Big \{\theta _1(x)-(\lambda ^k)^T(Ax+By^k-b)+\frac{\beta }{2}\Vert Ax+By^k-b\Vert ^2+\frac{1}{2}\Vert x-x^k\Vert _G^2 \;\Big |\; x\in \mathcal{X}\Big \}, \end{aligned}$$(1.2a)
$$\begin{aligned} y^{k+1}&=\arg \min \Big \{\theta _2(y)-(\lambda ^k)^T(Ax^{k+1}+By-b)+\frac{\beta }{2}\Vert Ax^{k+1}+By-b\Vert ^2 \;\Big |\; y\in \mathcal{Y}\Big \}, \end{aligned}$$(1.2b)
$$\begin{aligned} \lambda ^{k+1}&=\lambda ^k-\beta (Ax^{k+1}+By^{k+1}-b), \end{aligned}$$(1.2c)
where \(\lambda ^k\in {\mathfrak {R}}^m\) is the Lagrange multiplier, \(\beta > 0\) is a penalty parameter, and \(G \in \mathfrak {R}^{n_1 \times n_1}\) is a symmetric and positive semidefinite matrix. In fact, the original ADMM scheme in [7] and the split inexact Uzawa method in [13] are recovered by taking \(G=0\) and \(G=(r I_{n_1}-\beta A^TA)\) with \(r >\beta \Vert A^TA\Vert \) (where \(\Vert \cdot \Vert \) denotes a matrix norm such as the spectral norm) in (1.2a), respectively. We refer to the review papers [1, 3, 6] and references therein for the history of ADMMs and, in particular, for some efficient applications of ADMMs exploited recently.
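To make the scheme concrete, the following self-contained sketch runs (1.2) with \(G=0\) (i.e., the original ADMM) on a toy instance of (1.1) with \(A=B=I\) and quadratic objectives, so that both subproblems have closed-form solutions. The toy data, variable names and the closed-form updates are our illustration, not part of the paper.

```python
import numpy as np

# Toy instance of (1.1): min 0.5||x - p||^2 + 0.5||y - q||^2  s.t.  x + y = b,
# i.e. A = B = I and X = Y = R^n.  With G = 0, scheme (1.2) is the original ADMM
# and both subproblems (1.2a)-(1.2b) reduce to the closed-form updates below.
rng = np.random.default_rng(0)
n = 5
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

beta = 1.0                                    # penalty parameter
x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)

for k in range(200):
    # (1.2a): minimize 0.5||x-p||^2 - lam^T x + (beta/2)||x + y^k - b||^2
    x = (p + lam + beta * (b - y)) / (1.0 + beta)
    # (1.2b): minimize 0.5||y-q||^2 - lam^T y + (beta/2)||x^{k+1} + y - b||^2
    y = (q + lam + beta * (b - x)) / (1.0 + beta)
    # (1.2c): Lagrange multiplier update
    lam = lam - beta * (x + y - b)

# KKT solution of the toy problem, for comparison
lam_star = (b - p - q) / 2.0
x_star, y_star = p + lam_star, q + lam_star
```

On this strongly convex toy instance the iterates converge very quickly to the KKT point; the worst-case \(O(1/k)\) bound studied in this note concerns the far weaker general convex setting.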
Because of its impressive efficiency and wide applicability, it is interesting to investigate the convergence rate of the ADMM scheme (1.2). A first step was taken in [10], where we showed a worst-case \(O(1/k)\) convergence rate measured by the iteration complexity, where \(k\) denotes the iteration counter, for the ADMM scheme (1.2) (see Footnote 1). With this first-step result, it becomes possible to investigate further results on the convergence rate of the scheme (1.2). Recall that the convergence rate derived in [10] is in the ergodic sense, because the approximate solution with an accuracy of \(O(1/k)\) is found based on all \(k\) iterates generated by (1.2). One may ask whether the same convergence rate can be established directly, in a non-ergodic sense, for the sequence generated by the scheme (1.2). The main purpose of this note is to answer this question affirmatively. We expect the new technique to be useful for analyzing convergence rates of other important numerical algorithms of the same kind.
2 Preliminaries
In this section, we provide some preliminaries which are useful in later analysis.
2.1 Notations
We first define some matrices which will simplify the notation in our analysis. More specifically, let
$$\begin{aligned} H=\begin{pmatrix} G & 0 & 0\\ 0 & \beta B^TB & 0\\ 0 & 0 & \frac{1}{\beta }I_m \end{pmatrix},\quad M=\begin{pmatrix} I_{n_1} & 0 & 0\\ 0 & I_{n_2} & 0\\ 0 & -\beta B & I_m \end{pmatrix} \quad \hbox {and}\quad Q=\begin{pmatrix} G & 0 & 0\\ 0 & \beta B^TB & 0\\ 0 & -B & \frac{1}{\beta }I_m \end{pmatrix}. \end{aligned}$$(2.1)
Without further assumptions on \(B\), the matrix \(H\) defined above can only be guaranteed to be symmetric and positive semidefinite. Nevertheless, we still use the notation \(\Vert w\Vert _H^2\) to represent the non-negative number \(w^THw\) in our analysis. Based on these matrices, some relationships can easily be derived, and we summarize them in the following proposition.
Proposition 2.1
Let the matrices \(H\), \(M\) and \(Q\) be defined in (2.1). Then we have
(1) \(Q=HM\);
(2) The symmetric matrix \((Q^T+Q) - M^THM\) is positive semidefinite: \((Q^T+Q) - M^THM \succeq 0\).
Proof
The first conclusion is trivial, and we omit it. For the second one, we notice that
$$\begin{aligned} (Q^T+Q) - M^THM = \begin{pmatrix} G & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & \frac{1}{\beta }I_m \end{pmatrix} \succeq 0. \end{aligned}$$
Thus, the proposition is proved. \(\square \)
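As a numerical sanity check of Proposition 2.1, the snippet below builds \(H\), \(M\) and \(Q\) for a random instance, using the block forms \(H=\mathrm{diag}(G,\beta B^TB,\frac{1}{\beta }I_m)\) and \(M\) with a \(-\beta B\) block (our reading of (2.1)), and verifies that \((Q^T+Q)-M^THM\) is positive semidefinite:

```python
import numpy as np

# Random instance; H, M follow our reading of (2.1):
#   H = diag(G, beta*B^T B, (1/beta) I_m),  M = [[I,0,0],[0,I,0],[0,-beta*B,I]].
rng = np.random.default_rng(1)
n1, n2, m, beta = 3, 4, 2, 0.7
B = rng.standard_normal((m, n2))
C = rng.standard_normal((n1, n1))
G = C.T @ C                        # a symmetric positive semidefinite G

Z = lambda r, c: np.zeros((r, c))
H = np.block([[G, Z(n1, n2), Z(n1, m)],
              [Z(n2, n1), beta * B.T @ B, Z(n2, m)],
              [Z(m, n1), Z(m, n2), np.eye(m) / beta]])
M = np.block([[np.eye(n1), Z(n1, n2), Z(n1, m)],
              [Z(n2, n1), np.eye(n2), Z(n2, m)],
              [Z(m, n1), -beta * B, np.eye(m)]])
Q = H @ M                          # conclusion (1) of Proposition 2.1

S = (Q.T + Q) - M.T @ H @ M        # conclusion (2): S should be PSD
eigs = np.linalg.eigvalsh((S + S.T) / 2.0)
```

With these choices, \(S\) works out to \(\mathrm{diag}(G,0,\frac{1}{\beta }I_m)\), matching the computation in the proof above.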
2.2 Variational inequality characterization of (1.1)
It is easy to see that (1.1) is characterized by a variational inequality (VI) problem: Find \(w^*=(x^*,y^*,\lambda ^*) \in \Omega :=\mathcal{X} \times \mathcal{Y} \times \mathfrak {R}^m\) such that
$$\begin{aligned} \theta (u)-\theta (u^*)+(w-w^*)^TF(w^*)\ge 0 \quad \forall \, w\in \Omega , \end{aligned}$$(2.2a)
where
$$\begin{aligned} u=\begin{pmatrix} x\\ y \end{pmatrix},\quad \theta (u)=\theta _1(x)+\theta _2(y),\quad w=\begin{pmatrix} x\\ y\\ \lambda \end{pmatrix} \quad \hbox {and}\quad F(w)=\begin{pmatrix} -A^T\lambda \\ -B^T\lambda \\ Ax+By-b \end{pmatrix}. \end{aligned}$$(2.2b)
Note that the mapping \(F(w)\) is monotone because it is affine with a skew-symmetric matrix. We denote by \(\Omega ^*\) the solution set of VI\((\Omega ,F,\theta )\); then \(\Omega ^*\) is nonempty under the assumption that the solution set of (1.1) is nonempty.
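The monotonicity claim rests on skew-symmetry alone: for an affine map \(F(w)=Sw+c\) with \(S^T=-S\), one has \((w_1-w_2)^T(F(w_1)-F(w_2))=d^TSd=0\) for \(d=w_1-w_2\). The block matrix below is our rendering of the skew-symmetric matrix underlying \(F\), which can be checked numerically:

```python
import numpy as np

# F(w) = S w + c with S skew-symmetric is monotone:
# (w1-w2)^T (F(w1)-F(w2)) = d^T S d = 0, since d^T S d = -(d^T S d).
rng = np.random.default_rng(2)
n1, n2, m = 3, 4, 2
A = rng.standard_normal((m, n1))
B = rng.standard_normal((m, n2))
S = np.block([[np.zeros((n1, n1)), np.zeros((n1, n2)), -A.T],
              [np.zeros((n2, n1)), np.zeros((n2, n2)), -B.T],
              [A, B, np.zeros((m, m))]])
d = rng.standard_normal(n1 + n2 + m)
gap = d @ S @ d                    # the inner product (w1-w2)^T (F(w1)-F(w2))
```

The monotonicity inequality thus holds with equality for this \(F\); this is all the analysis below needs.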
3 Sketch of Proof
To establish a worst-case \(O(1/k)\) convergence rate for the sequence \(\{w^k\}\) generated by (1.2) in a non-ergodic sense, we need the assertion in the following lemma.
Lemma 3.1
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then, we have
where
Proof
First, deriving the optimality condition of the minimization problem (1.2a), we have
By using (1.2c), it can be written as
It follows from (1.2) that
Combining (3.3), (4.2) and (3.4) together, we get \(w^{k+1}=(x^{k+1},y^{k+1},\lambda ^{k+1}) \in \Omega \), such that
which can be rewritten as
Using the notations of \(F(w)\), \(\eta (y^k,y^{k+1})\) and \(H\), we get the assertion (3.1) immediately. \(\square \)
Lemma 3.1 indicates that the quantity \(\Vert w^k-w^{k+1}\Vert _H^2\) can be used to measure the accuracy of the iterate \(w^{k+1}\) to a solution point of VI\((\Omega ,F,\theta )\). More specifically, since \(H\) is positive semidefinite, we conclude that \(H(w^{k+1}-w^k)=0\) and \(\eta (y^k ,y^{k+1})=0\) if \(\Vert w^k-w^{k+1}\Vert _H^2=0\). In other words, because of the variational inequality characterization (2.2), when \(\Vert w^k-w^{k+1}\Vert _H^2=0\), we have
which means that \(w^{k+1}\) is a solution of VI\((\Omega ,F,\theta )\). Therefore, \(\Vert w^k - w^{k+1}\Vert _H^2\) can be viewed as an error measure after \(k\) iterations of the ADMM scheme (1.2), and it is reasonable to seek an upper bound of \(\Vert w^k - w^{k+1}\Vert _H^2\) in terms of \(O(1/k)\) for the purpose of investigating the convergence rate of ADMM. Based on this fact, our proof proceeds in the following steps:
1. To show that \(\{w^k\}\) is strictly contractive with respect to \(\Omega ^*\), i.e.,
$$\begin{aligned} \Vert w^{k+1} - w^*\Vert _H^2 \le \Vert w^k-w^*\Vert _H^2 - \Vert w^k - w^{k+1}\Vert _H^2 \quad \forall \, w^*\in \Omega ^*; \end{aligned}$$(3.5)
2. To show that \(\{\Vert w^{k} - w^{k+1}\Vert _H^2 \}\) is monotonically non-increasing, i.e.,
$$\begin{aligned} \Vert w^{k} - w^{k+1}\Vert _H^2 \le \Vert w^{k-1}-w^{k}\Vert _H^2 \quad \forall \, k\ge 1; \end{aligned}$$(3.6)
3. To derive a worst-case \(O(1/k)\) convergence rate in a non-ergodic sense based on (3.5) and (3.6), i.e.,
$$\begin{aligned} \Vert w^k-w^{k+1}\Vert _H^2 \le \frac{1}{k+1}\Vert w^0-w^*\Vert _H^2 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(3.7)
In the following, our analysis is thus divided into three sections to address these three tasks.
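The three assertions (3.5)-(3.7) can be observed numerically. The sketch below runs ADMM (\(G=0\)) on a toy instance of (1.1) with \(A=B=I\) and quadratic objectives (our construction, not from the paper), tracks \(\Vert w^k-w^{k+1}\Vert _H^2\) with the \(H\)-norm reduced, under our reading of (2.1), to \(\beta \Vert \cdot \Vert ^2\) on the \(y\)-part and \(\frac{1}{\beta }\Vert \cdot \Vert ^2\) on the \(\lambda \)-part, and checks all three inequalities along the run:

```python
import numpy as np

# Toy instance: min 0.5||x-p||^2 + 0.5||y-q||^2  s.t.  x + y = b  (A = B = I).
# With G = 0 the H-norm (our reading of (2.1)) only sees the (y, lam) blocks.
rng = np.random.default_rng(3)
n, beta = 5, 1.0
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

lam_star = (b - p - q) / 2.0            # KKT multiplier of the toy problem
y_star = q + lam_star

def Hnorm2(dy, dlam):
    return beta * dy @ dy + dlam @ dlam / beta

x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
dist0 = Hnorm2(y - y_star, lam - lam_star)        # ||w^0 - w*||_H^2
dists, steps = [dist0], []
for k in range(50):
    x = (p + lam + beta * (b - y)) / (1.0 + beta)          # (1.2a), G = 0
    y_new = (q + lam + beta * (b - x)) / (1.0 + beta)      # (1.2b)
    lam_new = lam - beta * (x + y_new - b)                 # (1.2c)
    steps.append(Hnorm2(y - y_new, lam - lam_new))         # ||w^k - w^{k+1}||_H^2
    y, lam = y_new, lam_new
    dists.append(Hnorm2(y - y_star, lam - lam_star))       # ||w^{k+1} - w*||_H^2

tol = 1e-10
contract = all(dists[k + 1] <= dists[k] - steps[k] + tol for k in range(50))   # (3.5)
monotone = all(steps[k + 1] <= steps[k] + tol for k in range(49))              # (3.6)
rate = all(steps[k] <= dist0 / (k + 1) + tol for k in range(50))               # (3.7)
```

Of course such a run only illustrates, and cannot replace, the proofs in the following sections.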
4 Strict Contraction
We prove the conclusion (3.5) in this section, and our proof is inspired by Theorem 1 in [9]. We first present several lemmas.
Lemma 4.1
Let the sequence \(\{w^k\}\) be generated by (1.2). Then, we have
$$\begin{aligned} (\lambda ^k-\lambda ^{k+1})^TB(y^k-y^{k+1}) \ge 0. \end{aligned}$$(4.1)
Proof
Deriving the optimality condition of the minimization problem (1.2b), we have
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k+1})+(y-y^{k+1})^T\{-B^T\lambda ^k+\beta B^T(Ax^{k+1}+By^{k+1}-b)\}\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$
Substituting (1.2c) into the last inequality, we obtain
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k+1})+(y-y^{k+1})^T(-B^T\lambda ^{k+1})\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$(4.2)
Obviously, analogous to (4.2), we have
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k})+(y-y^{k})^T(-B^T\lambda ^{k})\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$(4.3)
Setting \(y=y^k\) and \(y=y^{k+1}\) in (4.2) and (4.3), respectively, and then adding the two resulting inequalities, we get (4.1) immediately. \(\square \)
Lemma 4.2
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then, we have
$$\begin{aligned} (w^{k+1}-w^*)^TH(w^k-w^{k+1}) \ge 0 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(4.4)
Proof
Setting \(w=w^*\) in (3.1), we obtain
In the following we show that the right-hand side of (4.5) is non-negative. First, since \(w^{k+1}\in \Omega \) and \(w^*\in \Omega ^*\), it follows from (2.2a) that
Because of the monotonicity of \(F\), we have
Thus, we obtain
On the other hand, by using the notation of \(\eta (y^k,y^{k+1})\) [see (3.2)], \(Ax^*+By^*=b\) and (1.2c), we have
Combining with (4.1), we get
Substituting (4.6) and (4.7) into the right-hand side of (4.5), the lemma is proved. \(\square \)
With the lemmas proved above, we are now ready to show the assertion (3.5).
Theorem 4.1
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then (3.5) is satisfied for any \(k\ge 0\).
Proof
Using (4.4), we have
$$\begin{aligned} \Vert w^k-w^*\Vert _H^2&= \Vert (w^{k+1}-w^*)+(w^k-w^{k+1})\Vert _H^2\\&= \Vert w^{k+1}-w^*\Vert _H^2+2(w^{k+1}-w^*)^TH(w^k-w^{k+1})+\Vert w^k-w^{k+1}\Vert _H^2\\&\ge \Vert w^{k+1}-w^*\Vert _H^2+\Vert w^k-w^{k+1}\Vert _H^2, \end{aligned}$$
and thus the assertion (3.5) is proved. \(\square \)
5 Monotonicity
This section shows the assertion (3.6), i.e., the sequence \(\{\Vert w^{k} - w^{k+1}\Vert _H^2\}\) is monotonically non-increasing. Again, we need to prove several lemmas for this purpose.
First of all, for the convenience of analysis, we introduce an auxiliary variable \({\tilde{w}}^k\) defined as
$$\begin{aligned} {\tilde{w}}^k=\begin{pmatrix} {\tilde{x}}^k\\ {\tilde{y}}^k\\ {\tilde{\lambda }}^k \end{pmatrix} :=\begin{pmatrix} x^{k+1}\\ y^{k+1}\\ \lambda ^k-\beta (Ax^{k+1}+By^k-b) \end{pmatrix}, \end{aligned}$$(5.1)
where \(w^{k}\) is generated by (1.2). Then, we have the relationship
$$\begin{aligned} w^k-w^{k+1}=M(w^k-{\tilde{w}}^k), \end{aligned}$$(5.2)
where the matrix \(M\) is given in (2.1).
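This relationship can be verified numerically on the toy instance \(A=B=I\) with \(G=0\); here \({\tilde{w}}^k=(x^{k+1},y^{k+1},\lambda ^k-\beta (Ax^{k+1}+By^k-b))\) and the matrix \(M\) with a \(-\beta B\) block are our reading of (5.1) and (2.1):

```python
import numpy as np

# One ADMM step (1.2) with G = 0 from a random point, on the toy instance
# A = B = I (so m = n1 = n2 = n), followed by a check of (5.2):
#   w^k - w^{k+1} = M (w^k - \tilde w^k).
rng = np.random.default_rng(4)
n, beta = 4, 0.9
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
x, y, lam = (rng.standard_normal(n) for _ in range(3))

x1 = (p + lam + beta * (b - y)) / (1.0 + beta)     # (1.2a)
y1 = (q + lam + beta * (b - x1)) / (1.0 + beta)    # (1.2b)
lam1 = lam - beta * (x1 + y1 - b)                  # (1.2c)

lam_t = lam - beta * (x1 + y - b)                  # third block of \tilde w^k
I, Z = np.eye(n), np.zeros((n, n))
M = np.block([[I, Z, Z], [Z, I, Z], [Z, -beta * I, I]])

lhs = np.concatenate([x - x1, y - y1, lam - lam1])        # w^k - w^{k+1}
rhs = M @ np.concatenate([x - x1, y - y1, lam - lam_t])   # M (w^k - \tilde w^k)
```

The first two blocks agree trivially, and the third reduces to (1.2c), which is the whole content of (5.2).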
Lemma 5.1
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} \theta (u)-\theta ({\tilde{u}}^k)+(w-{\tilde{w}}^k)^TF({\tilde{w}}^k) \ge (w-{\tilde{w}}^k)^TQ(w^k-{\tilde{w}}^k) \quad \forall \, w\in \Omega . \end{aligned}$$(5.3)
Proof
By using the notation \(\tilde{w}^k\) in (5.1), and the facts
the inequality (3.1) can be rewritten as
The assertion (5.3) thus follows immediately from the definition of \(Q\). \(\square \)
Lemma 5.1 enables us to establish an important inequality in the following lemma.
Lemma 5.2
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} ({\tilde{w}}^k-{\tilde{w}}^{k+1})^TQ\{(w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\} \ge 0. \end{aligned}$$(5.4)
Proof
Setting \(w=\tilde{w}^{k+1}\) in (5.3), we have
$$\begin{aligned} \theta ({\tilde{u}}^{k+1})-\theta ({\tilde{u}}^k)+({\tilde{w}}^{k+1}-{\tilde{w}}^k)^TF({\tilde{w}}^k) \ge ({\tilde{w}}^{k+1}-{\tilde{w}}^k)^TQ(w^k-{\tilde{w}}^k). \end{aligned}$$(5.5)
Note that (5.3) is also true for \(k:=k+1\) and thus we have
Setting \(w=\tilde{w}^k\) in the above inequality, we obtain
$$\begin{aligned} \theta ({\tilde{u}}^{k})-\theta ({\tilde{u}}^{k+1})+({\tilde{w}}^{k}-{\tilde{w}}^{k+1})^TF({\tilde{w}}^{k+1}) \ge ({\tilde{w}}^{k}-{\tilde{w}}^{k+1})^TQ(w^{k+1}-{\tilde{w}}^{k+1}). \end{aligned}$$(5.6)
Adding (5.5) to (5.6) and using the monotonicity of \(F\), we get (5.4) immediately. \(\square \)
Lemma 5.3
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1), and the matrices \(H\), \(M\) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} (w^k-{\tilde{w}}^k)^TM^THM\{(w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$(5.7)
Proof
First, adding the term
$$\begin{aligned} \{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\}^TQ\{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\} \end{aligned}$$
to both sides of (5.4), and using \(w^TQw= \frac{1}{2}w^T(Q^T+Q)w\), we get
$$\begin{aligned} (w^k-w^{k+1})^TQ\{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$
Rearranging \((w^k - w^{k+1}) -(\tilde{w}^k-\tilde{w}^{k+1})\) in the above inequality as \((w^k-\tilde{w}^k) - (w^{k+1}-\tilde{w}^{k+1})\), we get
$$\begin{aligned} (w^k-w^{k+1})^TQ\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$
Substituting the term \((w^k - w^{k+1})\) into the left-hand side of the last inequality, and using the relationship in (5.2) and the fact \(Q=HM\) in Proposition 2.1, we obtain (5.7). \(\square \)
Finally, we are ready to show the assertion (3.6) in the following theorem.
Theorem 5.1
Let \(\{w^k\}\) be the sequence generated by (1.2) and \(H\) be given in (2.1). Then, (3.6) is satisfied for any \(k \ge 1\).
Proof
Setting \(a=M(w^k-\tilde{w}^k)\) and \(b=M(w^{k+1}-\tilde{w}^{k+1})\) in the identity
$$\begin{aligned} \Vert a\Vert _H^2-\Vert b\Vert _H^2 = 2a^TH(a-b)-\Vert a-b\Vert _H^2, \end{aligned}$$
we obtain
$$\begin{aligned}&\Vert M(w^k-\tilde{w}^k)\Vert _H^2-\Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2\\&\quad =2(w^k-\tilde{w}^k)^TM^THM\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\}-\Vert M\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\}\Vert _H^2. \end{aligned}$$
Inserting (5.7) into the first term of the right-hand side of the last equality, we obtain
$$\begin{aligned} \Vert M(w^k-\tilde{w}^k)\Vert _H^2-\Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2 \ge \Vert (w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\Vert _{(Q^T+Q)-M^THM}^2 \ge 0, \end{aligned}$$
where the last inequality is due to the positive semidefiniteness of the matrix \((Q^T+Q)-M^THM\) proved in Proposition 2.1. In other words, we derive
$$\begin{aligned} \Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2 \le \Vert M(w^k-\tilde{w}^k)\Vert _H^2. \end{aligned}$$(5.8)
Recall the relationship in (5.2). The assertion (3.6) follows immediately from (5.8). \(\square \)
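For completeness, the elementary identity invoked in the proof above — in its \(H\)-weighted form, which is what the choice of \(a\) and \(b\) suggests — can be verified in one line:

```latex
\begin{aligned}
2a^{T}H(a-b) - \|a-b\|_H^2
  &= 2a^{T}Ha - 2a^{T}Hb - \bigl(a^{T}Ha - 2a^{T}Hb + b^{T}Hb\bigr)\\
  &= a^{T}Ha - b^{T}Hb \;=\; \|a\|_H^2 - \|b\|_H^2 .
\end{aligned}
```

Note that only symmetry of \(H\) is used; positive semidefiniteness is not needed for the identity itself.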
6 Non-ergodic convergence rate
With Theorems 4.1 and 5.1, we can prove the assertion (3.7). That is, a worst-case \(O(1/k)\) convergence rate in a non-ergodic sense for the ADMM scheme (1.2) is established.
Theorem 6.1
Let \(\{w^k\}\) be the sequence generated by (1.2). Then, the assertion (3.7) is satisfied.
Proof
First, it follows from (3.5) that
$$\begin{aligned} \sum _{t=0}^{k}\Vert w^t-w^{t+1}\Vert _H^2 \le \Vert w^0-w^*\Vert _H^2 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(6.1)
According to Theorem 5.1, the sequence \(\{\Vert w^t-w^{t+1}\Vert _H^2 \}\) is monotonically non-increasing. Therefore, we have
$$\begin{aligned} (k+1)\Vert w^{k}-w^{k+1}\Vert _H^2 \le \sum _{t=0}^{k}\Vert w^t-w^{t+1}\Vert _H^2. \end{aligned}$$(6.2)
The assertion (3.7) follows from (6.1) and (6.2) immediately. \(\square \)
Notice that \(\Omega ^*\) is convex and closed (see Theorem 2.1 in [10]). Let \(d: = \inf \{ \Vert w^0 - w^*\Vert _H \, |\, w^*\in \Omega ^* \}\). Then, for any given \(\epsilon >0\), Theorem 6.1 shows that the ADMM scheme (1.2) needs at most \( \lfloor {d^2}/{\epsilon } \rfloor \) iterations to ensure that \(\Vert w^k- w^{k+1}\Vert _H^2\le \epsilon \). Recall that \(w^{k+1}\) is a solution of VI\((\Omega ,F,\theta )\) if \(\Vert w^k- w^{k+1}\Vert _H^2=0\) (see Lemma 3.1). A worst-case \(O(1/k)\) convergence rate in a non-ergodic sense for the ADMM scheme (1.2) is thus established in Theorem 6.1.
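As a quick illustration of this complexity bound (with illustrative values of \(d\) and \(\epsilon \) chosen by us, not from the paper):

```python
import math

# Worst-case iteration count from Theorem 6.1: to guarantee
# ||w^k - w^{k+1}||_H^2 <= eps, it suffices to run floor(d^2/eps) iterations,
# where d = inf{ ||w^0 - w*||_H : w* in Omega* }.
def max_iters(d: float, eps: float) -> int:
    return math.floor(d * d / eps)

# Halving eps doubles the worst-case count: the O(1/eps) behaviour of an
# O(1/k) rate.  (d = 8 and the dyadic tolerances are arbitrary test values.)
k1 = max_iters(8.0, 2.0 ** -10)    # 64 * 1024 iterations
k2 = max_iters(8.0, 2.0 ** -11)
```

The dyadic tolerances are chosen so that the arithmetic is exact in floating point.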
Finally, we remark that because of the monotonicity of the sequence \(\{\Vert w^k-w^{k+1}\Vert _H^2\}\) and the fact (6.1), and using Lemma 1.2 in [2], we can immediately refine the worst-case convergence rate in Theorem 6.1 from \(O(1/k)\) to \(o(1/k)\).
Notes
As in [11, 12] and many other works, a worst-case \(O(1/k)\) convergence rate measured by the iteration complexity means that the accuracy to a solution under a certain criterion is of the order \(O(1/k)\) after \(k\) iterations of an iterative scheme; equivalently, it requires at most \(O(1/\epsilon )\) iterations to achieve an approximate solution with an accuracy of \(\epsilon \).
References
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2010)
Deng, W., Lai, M.J., Peng, Z.M., Yin, W.T.: Parallel multi-block ADMM with \(o(1/k)\) convergence (2014, in press)
Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Report RRR 32–2012 (2012)
Fortin, M., Glowinski, R.: Augmented Lagrangian methods: applications to the numerical solution of boundary-value problems. Stud. Math. Appl. 15. North-Holland, Amsterdam (1983)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y.A., Neittaanmaki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology. Computational Methods in Applied Sciences, vol. 34, pp. 59–82. Springer, Dordrecht (2014)
Glowinski, R., Marroco, A.: Approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, vol. 9, R2, pp. 41–76 (1975)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. SIAM Studies in Applied Mathematics, SIAM, Philadelphia (1989)
He, B.S., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
He, B.S., Yuan, X.M.: On the O(1/n) convergence rate of Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
Nemirovsky, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Zhang, X.Q., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
Additional information
X. Yuan was supported by the General Research Fund from Hong Kong Research Grants Council: 203613.
B. He was supported by the NSFC Grant 91130007 and 11471156.
He, B., Yuan, X. On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130, 567–577 (2015). https://doi.org/10.1007/s00211-014-0673-6