1 Introduction

We consider solving the following multi-block convex minimization problem:

$$\begin{aligned} \begin{array}{ll} \min &{}f_1(x_1)+f_2(x_2)+\cdots +f_N(x_N)\\ \text {s.t.}&{}A_1 x_1+A_2 x_2+\cdots +A_N x_N=b\\ &{} x_i \in \mathcal {X}_i,\,i=1,\cdots , N, \end{array} \end{aligned}$$
(1.1)

where \(A_i \in \mathbb {R}^{p\times n_i}\), \(b\in \mathbb {R}^p\), \(\mathcal {X}_i\subset \mathbb {R}^{n_i}\) are closed convex sets and \(f_i:\mathbb {R}^{n_i}\rightarrow \mathbb {R}\) are closed convex functions. One popular way to solve (1.1), when the functions \(f_i\) have special structures, is to apply the alternating direction method of multipliers (ADMM) [1, 2]. The ADMM is closely related to the Douglas–Rachford [3] and Peaceman–Rachford [4] operator splitting methods that date back to the 1950s. These operator splitting methods were further studied later in [5–8]. The ADMM has been revisited recently due to its success in solving problems with special structures arising from compressed sensing, machine learning, image processing, and so on; see the recent survey papers [9, 10] for more information.

ADMM for solving (1.1) is based on an augmented Lagrangian method framework. The augmented Lagrangian function for (1.1) is defined as

$$\begin{aligned} \mathcal {L}_\gamma (x_1,\cdots ,x_N;\lambda ) := \sum _{j=1}^N f_j(x_j) - \left\langle \lambda , \sum _{j=1}^N A_j x_j -b\right\rangle + \frac{\gamma }{2}\left\| \sum _{j=1}^N A_j x_j -b\right\| ^2, \end{aligned}$$

where \(\lambda \) is the Lagrange multiplier and \(\gamma > 0\) is a penalty parameter. In a typical iteration of the standard ADMM for solving (1.1), the following updating procedure is implemented:

$$\begin{aligned} \left\{ \begin{array}{lcl} x_1^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_1\in \mathcal {X}_1} \ \mathcal {L}_\gamma \left( x_1,x_2^k,\cdots ,x_N^k;\lambda ^k\right) , \\ x_2^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_2\in \mathcal {X}_2} \ \mathcal {L}_\gamma \left( x_1^{k+1},x_2,x_3^k,\cdots ,x_N^k;\lambda ^k\right) , \\ &{}&{} \vdots \\ x_N^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_N\in \mathcal {X}_N} \ \mathcal {L}_\gamma \left( x_1^{k+1},x_2^{k+1},\cdots ,x_{N-1}^{k+1},x_N;\lambda ^k\right) , \\ \lambda ^{k+1} &{}&{} := \lambda ^k - \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} -b\right) . \end{array}\right. \end{aligned}$$
(1.2)

The ADMM (1.2) for solving two-block convex minimization problems (i.e., \(N=2\)) has been studied extensively in the literature. The global convergence of ADMM (1.2) when \(N=2\) was shown in [11, 12]. There are also some very recent works that study the convergence rate properties of ADMM when \(N=2\) (see, e.g., [13–18]).

However, the convergence of ADMM (1.2) when \(N\geqslant 3\) had remained unclear for a long time. In a recent work by Chen et al. [19], a counter-example was constructed that shows the failure of ADMM (1.2) when \(N\geqslant 3\). Since the ADMM (1.2) for \(N\geqslant 3\) has been successfully applied to solve many problems arising from real practice (see, e.g., [20, 21]), it is worth investigating under what sufficient conditions the ADMM (1.2) converges. Moreover, it has been observed by many researchers that the ADMM (1.2) often outperforms all its modified versions (see the observations in [22, 23]). In fact, Sun, Toh, and Yang made the following statement in [22]: “However, to the best of our knowledge, up to now the dilemma is that at least for convex conic programming, the modified versions though with convergence guarantee, often perform 2-3 times slower than the multi-block ADMM with no convergent guarantee.” There is thus a strong need to further study sufficient conditions that can guarantee the convergence of (1.2). It was shown by Han and Yuan in [24] that ADMM (1.2) globally converges if all the functions \(f_1,\cdots ,f_N\) are assumed to be strongly convex and the penalty parameter \(\gamma \) is smaller than a certain bound. Chen, Shen, and You [25] showed that the 3-block ADMM [i.e., \(N=3\) in (1.2)] globally converges if \(A_1\) is injective, \(f_2\) and \(f_3\) are strongly convex, and \(\gamma \) is smaller than a certain bound. After we released our work, Cai, Han, and Yuan [26] and Li, Sun, and Toh [27] independently proved that when \(N=3\), the ADMM (1.2) converges under the conditions that one function among \(f_1\), \(f_2\), and \(f_3\) is strongly convex and \(\gamma \) is smaller than a certain bound. Davis and Yin [28] studied a variant of the 3-block ADMM (see Algorithm 8 in [28]) which requires that \(f_1\) is strongly convex and \(\gamma \) is smaller than a certain bound to guarantee the convergence.
Recently, Lin, Ma, and Zhang [29] proposed several alternative approaches to ensure the sublinear convergence rate of (1.2) without requiring any function to be strongly convex. Furthermore, Lin, Ma, and Zhang [30] proved that the 3-block ADMM is globally convergent for any \(\gamma >0\) when it is applied to solve the so-called regularized least-squares decomposition problems. In a recent work by Hong and Luo [31], a variant of ADMM (1.2) with small step size in updating the Lagrange multiplier was studied. Specifically, [31] proposed to replace the last equation in (1.2) by

$$\begin{aligned} \lambda ^{k+1} := \lambda ^k - \alpha \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} -b\right) , \end{aligned}$$

where \(\alpha >0\) is a small step size. Linear convergence of this variant is proved under the assumption that the objective function satisfies certain error bound conditions. However, to guarantee convergence, the step size \(\alpha \) must be bounded by parameters associated with the error bound conditions, which are usually unknown a priori; it may therefore be difficult to choose \(\alpha \) in practice. There are also studies on the convergence rate of some other variants of ADMM (1.2), and we refer the interested readers to [32–36] for details of these variants. In this paper, we focus on the ADMM (1.2) that directly extends the two-block ADMM to problems with more than two block variables.

1.1 Our Contributions

The main contribution of this paper is as follows. We show that the ADMM (1.2) with \(N\geqslant 3\) converges with rate O(1 / t) in the ergodic sense and o(1 / t) in the non-ergodic sense, under the assumptions that \(f_2,\cdots ,f_N\) are strongly convex, \(f_1\) is convex but not necessarily strongly convex, and \(\gamma \) is smaller than a certain bound. It should be pointed out that our assumption is weaker than the one used in [24], in which all the functions are required to be strongly convex. Moreover, unlike the sufficient condition suggested in [19], we do not make any assumption on the matrices \(A_1,\cdots ,A_N\). To the best of our knowledge, the convergence rate results given in this paper are the first sublinear convergence rate results for the standard ADMM (1.2) when \(N\geqslant 3\). We also remark here that, under additional assumptions, we proved the global linear convergence rate of ADMM (1.2) in [37].

1.2 Organization

The rest of this paper is organized as follows. In Sect. 2, we provide some preliminaries for our convergence rate analysis. In Sect. 3, we prove the convergence rate of ADMM (1.2) in the ergodic sense. In Sect. 4, we prove the convergence rate of ADMM (1.2) in the non-ergodic sense. Section 5 draws some conclusions and points out some future directions.

2 Preliminaries

We prove the convergence results of ADMM only for \(N=3\), because the analysis extends to arbitrary N in a straightforward manner. For ease of presentation and succinctness, we therefore assume \(N=3\) in the rest of this paper and state the results for general N without proof.

We restate the problem (1.1) for \(N=3\) as

$$\begin{aligned} \begin{array}{ll} \min &{} f_1(x_1) + f_2(x_2) + f_3(x_3) \\ \text {s.t.}&{} A_1 x_1 + A_2 x_2 + A_3 x_3 = b, \\ &{} x_1\in \mathcal {X}_1, x_2\in \mathcal {X}_2, x_3\in \mathcal {X}_3. \end{array} \end{aligned}$$
(2.1)

The ADMM for solving (2.1) can be summarized as (note that some constant terms in the three subproblems are discarded):

$$\begin{aligned} x_{1}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{1} \in \mathcal {X}_{1}}f_{1}(x_{1})+\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k} -b-\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.2)
$$\begin{aligned} x_{2}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{2} \in \mathcal {X}_{2}}f_{2}(x_{2}) +\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b -\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.3)
$$\begin{aligned} x_{3}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{3} \in \mathcal {X}_{3}}f_{3}(x_{3}) +\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}-b -\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.4)
$$\begin{aligned} \lambda ^{k+1}:= & {} \lambda ^{k} - \gamma \left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1} +A_{3}x_{3}^{k+1}-b\right) . \end{aligned}$$
(2.5)
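The remark that only constant terms are discarded can be sanity-checked numerically: for fixed \(x_2^k\), \(x_3^k\), and \(\lambda ^k\), the \(x_1\)-dependent part of \(\mathcal {L}_\gamma \) and the objective of subproblem (2.2) differ by a constant (namely \(-\Vert \lambda ^k\Vert ^2/(2\gamma )\)). A small illustrative sketch with synthetic data (the choice \(f_1=\Vert \cdot \Vert _1\) is ours, for concreteness):

```python
import numpy as np

rng = np.random.default_rng(3)
A1, A2, A3 = (rng.standard_normal((4, 3)) for _ in range(3))
x2, x3 = rng.standard_normal(3), rng.standard_normal(3)
lam, b = rng.standard_normal(4), rng.standard_normal(4)
gamma = 0.7
f1 = lambda x: np.abs(x).sum()          # any closed convex f1; ||x||_1 here

def lagrangian_part(x1):
    # x1-dependent part of L_gamma(x1, x2, x3; lam)
    r = A1 @ x1 + A2 @ x2 + A3 @ x3 - b
    return f1(x1) - lam @ r + 0.5 * gamma * r @ r

def subproblem_objective(x1):
    # objective of the completed-square subproblem (2.2)
    r = A1 @ x1 + A2 @ x2 + A3 @ x3 - b - lam / gamma
    return f1(x1) + 0.5 * gamma * r @ r

# the difference is the same constant at every test point
vals = [lagrangian_part(x) - subproblem_objective(x)
        for x in rng.standard_normal((5, 3))]
assert np.ptp(vals) < 1e-9
```

Since the two objectives differ by a constant independent of \(x_1\), their minimizers over \(\mathcal {X}_1\) coincide, which justifies the completed-square form (2.2)–(2.4).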

The first-order optimality conditions for (2.2)–(2.4) are given, respectively, by \(x_{i}^{k+1}\in \mathcal {X}_{i}, i=1,2,3\), and

$$\begin{aligned}&\quad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_1\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k} +\gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(2.6)
$$\begin{aligned}&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{2}\in \mathcal {X}_{2}, \end{aligned}$$
(2.7)
$$\begin{aligned}&\quad \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_3\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k}+\gamma A_{3}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{3}\in \mathcal {X}_{3}, \end{aligned}$$
(2.8)

where \(g_i(x_i^{k+1}) \in \partial f_i(x_i^{k+1})\) is a subgradient of \(f_i\) at \(x_i^{k+1}\) for \(i=1,2,3\). Moreover, by combining with (2.5), (2.6)–(2.8) can be rewritten as

$$\begin{aligned}&\quad \left( x_{1}\,-\,x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) \,-\,A_{1}^{\mathrm{T}}\lambda ^{k+1}\,+\,\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k}\,-\,x_{2}^{k+1}\right) \right. \nonumber \\&\quad \,\,\left. +\,\gamma A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k}\,-\,x_{3}^{k+1}\right) \right] \geqslant 0,\quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(2.9)
$$\begin{aligned}&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1}+\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{2}\in \mathcal {X}_{2}, \end{aligned}$$
(2.10)
$$\begin{aligned}&\quad \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0,\quad \forall x_{3}\in \mathcal {X}_{3}. \end{aligned}$$
(2.11)

We denote \(\Omega = \mathcal {X}_1\times \mathcal {X}_2 \times \mathcal {X}_3 \times \mathbb {R}^p\) and denote the optimal set of (2.1) by \(\Omega ^*\). The following assumption is made throughout this paper.

Assumption 2.1

The optimal set \(\Omega ^*\) for problem (2.1) is non-empty.

According to the first-order optimality conditions for (2.1), solving (2.1) is equivalent to finding

$$\begin{aligned} \left( x_1^*,x_2^*,x_3^*,\lambda ^*\right) \in \Omega ^* \end{aligned}$$

such that the following holds:

$$\begin{aligned} \left\{ \begin{array}{l} \left( x_1-x_1^*\right) ^\mathrm{T}\left( g_1\left( x_1^*\right) -A_1^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_1\in \mathcal {X}_1, \\ \left( x_2-x_2^*\right) ^\mathrm{T}\left( g_2\left( x_2^*\right) -A_2^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_2\in \mathcal {X}_2, \\ \left( x_3-x_3^*\right) ^\mathrm{T}\left( g_3\left( x_3^*\right) -A_3^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_3\in \mathcal {X}_3, \\ A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{*}-b=0, \end{array}\right. \end{aligned}$$
(2.12)

where \(g_i(x_i^*)\in \partial f_i(x_i^*)\), \(i=1,2,3\).

Furthermore, the following condition is assumed in our subsequent analysis.

Assumption 2.2

The functions \(f_2\) and \(f_3\) are strongly convex with parameters \(\sigma _2>0\) and \(\sigma _3>0\), respectively; i.e., the following two inequalities hold:

$$\begin{aligned} f_2(y)\geqslant & {} f_2(x) + (y-x)^\mathrm{T}g_2(x) + \frac{\sigma _2}{2}\Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_2, \end{aligned}$$
(2.13)
$$\begin{aligned} f_3(y)\geqslant & {} f_3(x) + (y-x)^\mathrm{T}g_3(x) + \frac{\sigma _3}{2}\Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_3, \end{aligned}$$
(2.14)

or equivalently,

$$\begin{aligned} (y-x)^\mathrm{T}(g_2(y)-g_2(x))\geqslant & {} \sigma _2 \Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_2, \end{aligned}$$
(2.15)
$$\begin{aligned} (y-x)^\mathrm{T}(g_3(y)-g_3(x))\geqslant & {} \sigma _3 \Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_3, \end{aligned}$$
(2.16)

where \(g_2(x)\in \partial f_2(x)\) and \(g_3(x) \in \partial f_3(x)\) are the subgradients of \(f_2\) and \(f_3\), respectively.
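As an illustrative check (not from the paper), the monotonicity form (2.15) can be verified numerically for the \(\sigma \)-strongly convex function \(f(x)=\frac{\sigma }{2}\Vert x\Vert ^2+\Vert x\Vert _1\), with the particular subgradient choice \(g(x)=\sigma x+\mathrm{sign}(x)\):

```python
import numpy as np

# f(x) = (sigma/2)||x||^2 + ||x||_1 is sigma-strongly convex;
# g(x) = sigma*x + sign(x) picks one subgradient at each point.
sigma = 1.5
g = lambda x: sigma * x + np.sign(x)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = (y - x) @ (g(y) - g(x))
    # inequality (2.15): <y - x, g(y) - g(x)> >= sigma * ||y - x||^2
    assert lhs >= sigma * np.linalg.norm(y - x)**2 - 1e-12
```

The check passes because \((y-x)^{\mathrm{T}}(\mathrm{sign}(y)-\mathrm{sign}(x))\geqslant 0\) componentwise, so the \(\ell _1\) part only adds slack to the strong convexity of the quadratic part.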

In our analysis, the following well-known identity is used frequently:

$$\begin{aligned} (w_1-w_2)^{\mathrm{T}}(w_3-w_4)&=\frac{1}{2}\left( \Vert w_1-w_4\Vert ^{2}-\Vert w_1-w_3\Vert ^{2}\right) \nonumber \\&\quad +\,\frac{1}{2}\left( \Vert w_3-w_2\Vert ^{2}-\Vert w_4-w_2\Vert ^{2}\right) . \end{aligned}$$
(2.17)
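The identity (2.17) follows by expanding all four squared norms; a quick numerical check on random vectors:

```python
import numpy as np

# identity (2.17):
# (w1 - w2)^T (w3 - w4) = 1/2 (||w1 - w4||^2 - ||w1 - w3||^2)
#                       + 1/2 (||w3 - w2||^2 - ||w4 - w2||^2)
rng = np.random.default_rng(2)
w1, w2, w3, w4 = rng.standard_normal((4, 6))
lhs = (w1 - w2) @ (w3 - w4)
rhs = 0.5 * (np.linalg.norm(w1 - w4)**2 - np.linalg.norm(w1 - w3)**2) \
    + 0.5 * (np.linalg.norm(w3 - w2)**2 - np.linalg.norm(w4 - w2)**2)
assert abs(lhs - rhs) < 1e-10
```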

Notation

For simplicity, we use the following notation to denote the stacked vectors or tuples:

$$\begin{aligned} u = \left( \begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right) , u^k = \left( \begin{array}{c} x_{1}^k \\ x_{2}^k \\ x_{3}^k \end{array} \right) , u^* = \left( \begin{array}{c} x_{1}^* \\ x_{2}^* \\ x_{3}^* \end{array} \right) . \end{aligned}$$

We denote the objective function of problem (2.1) by \(f(u)\equiv f_1(x_{1})+f_{2}(x_{2})+f_{3}(x_{3})\); \(g_i\) is a subgradient of \(f_i\); \(\lambda _{\max }(B)\) denotes the largest eigenvalue of a real symmetric matrix B; \(\Vert x\Vert \) denotes the Euclidean norm of x.

3 Ergodic Convergence Rate of ADMM

In this section, we prove the O(1 / t) convergence rate of ADMM (2.2)–(2.5) in the ergodic sense.

Lemma 3.1

Assume that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), where \(\sigma _2\) and \(\sigma _3\) are defined in Assumption 2.2. Let \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM from given \((x_{2}^{k},x_{3}^{k},\lambda ^{k})\). Then, for any primal optimal solution \(u^*=(x_1^*,x_2^*,x_3^*)\) of (2.1) and \(\lambda \in \mathbb {R}^p\), it holds that

$$\begin{aligned}&\qquad f(u^{*})-f(u^{k+1})+\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1} \end{array}\right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array} \right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.1)

Proof

Note that combining (2.9)–(2.11) yields

$$\begin{aligned}&\left( \begin{array}{c} x_{1}-x_{1}^{k+1} \\ x_{2}-x_{2}^{k+1} \\ x_{3}-x_{3}^{k+1} \end{array} \right) ^{\mathrm{T}} \left[ \left( \begin{array}{c} g_1\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1} \end{array}\right) \right. \nonumber \\&\left. \quad + \left( \begin{array}{cc} \gamma A_{1}^{\mathrm{T}}A_{2} &{} \gamma A_{1}^{\mathrm{T}}A_{3} \\ 0 &{} \gamma A_{2}^{\mathrm{T}}A_{3} \\ 0 &{} 0 \end{array} \right) \left( \begin{array}{c} x_{2}^{k}-x_{2}^{k+1} \\ x_{3}^{k}-x_{3}^{k+1} \end{array} \right) \right] \geqslant 0. \end{aligned}$$
(3.2)

The key step in our proof is to bound the following two terms:

$$\begin{aligned}&\left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}A_{1}^{\mathrm{T}}\left( A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right) \quad \text{ and }\\&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) . \end{aligned}$$

For the first term, we have

$$\begin{aligned}&\qquad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}A_{1}^{\mathrm{T}}\left[ A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right] \\&=\left[ (A_{1}x_{1}-b)\!-\!\left( A_{1}x_{1}^{k+1}\!-\!b\right) \right] ^{\mathrm{T}} \left[ \left( \!-\!A_{2}x_{2}^{k+1}-A_{3}x_{3}^{k+1}\right) -\left( -A_{2}x_{2}^{k}-A_{3}x_{3}^{k}\right) \right] \\&= \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k} -b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left( \left\| A_{1}x_{1}^{k+1}\!+\!A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1} \!-\!b\right\| ^{2} \!-\!\left\| A_{1}x_{1}^{k+1}+\,A_{2}x_{2}^{k}+A_{3}x_{3}^{k}\!-\!b\right\| ^{2}\right) \\&= \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2\gamma ^{2}}\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2} -\frac{1}{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}, \end{aligned}$$

where in the second equality we used the identity (2.17), and the last equality follows from the updating formula for \(\lambda ^{k+1}\) in (2.5).

For the second term, we have

$$\begin{aligned}&\qquad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}A_{2}^{\mathrm{T}}A_{3} \left( x_{3}^{k}-x_{3}^{k+1}\right) \\&=\left( \left( A_{1}x_{1}+A_{2}x_{2}-b\right) - \left( A_{1}x_{1}+A_{2}x_{2}^{k+1}-b\right) \right) ^{\mathrm{T}} \left( \left( -A_{3}x_{3}^{k+1}\right) -\left( -A_{3}x_{3}^{k}\right) \right) \\&=\frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \\&\leqslant \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1} +A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}, \end{aligned}$$

where in the second equality we applied the identity (2.17).

Therefore, we have

$$\begin{aligned}&\qquad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\gamma A_{1}^{\mathrm{T}}\left( A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right) \nonumber \\&\quad +\,\left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2}+\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \nonumber \\&\quad -\,\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.3)

Combining (3.3), (3.2), and (2.5), it holds for any \(\lambda \in \mathbb {R}^p\) that

$$\begin{aligned}&\qquad \left( \begin{array}{c} x_{1}-x_{1}^{k+1} \\ x_{2}-x_{2}^{k+1} \\ x_{3}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array}\right) \nonumber \\&\quad +\, \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\mathrm{T}}\left( \lambda ^{k+1}-\lambda ^{k}\right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2} +\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.4)

Using the convexity of \(f_1\) and the identity

$$\begin{aligned} \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\mathrm{T}}(\lambda ^{k+1}-\lambda ^{k}) +\frac{1}{2\gamma }\Vert \lambda ^{k+1}\!-\!\lambda ^{k}\Vert ^{2} \!=\! \frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) , \end{aligned}$$

letting \(u=u^{*}\) in (3.4), and applying the facts that [invoking (2.13) and (2.14)]

$$\begin{aligned}&f_{2}\left( x_{2}^{*}\right) -f_{2}\left( x_{2}^{k+1}\right) -\frac{\sigma _2}{2}\left\| x_{2}^{*}-x_{2}^{k+1}\right\| ^{2} \geqslant \left( x_{2}^{*}-x_{2}^{k+1}\right) ^{\mathrm{T}} g_{2}\left( x_{2}^{k+1}\right) ,\\&f_{3}\left( x_{3}^{*}\right) -f_{3}\left( x_{3}^{k+1}\right) -\frac{\sigma _3}{2}\left\| x_{3}^{*}-x_{3}^{k+1}\right\| ^{2} \geqslant \left( x_{3}^{*}-x_{3}^{k+1}\right) ^{\mathrm{T}} g_{3}\left( x_{3}^{k+1}\right) , \end{aligned}$$

and

$$\begin{aligned}&\quad \frac{\gamma }{2}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \\&=\frac{\gamma }{2}\left\| A_{2}\left( x_{2}^{k+1}-x_{2}^{*}\right) +A_{3}\left( x_{3}^{k+1}-x_{3}^{*}\right) \right\| ^{2}\\&\leqslant \gamma \left( \lambda _{\max }\left( A_{2}^{\mathrm{T}}A_{2}\right) \left\| x_{2}^{k+1}-x_{2}^{*}\right\| ^{2}+\lambda _{\max }\left( A_{3}^{\mathrm{T}}A_{3}\right) \left\| x_{3}^{k+1}-x_{3}^{*}\right\| ^{2}\right) , \end{aligned}$$

we obtain

$$\begin{aligned}&f(u^{*})-f(u^{k+1})+\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array}\right) \\&\qquad +\,\frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) \\&\qquad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\qquad +\left( \gamma \lambda _{\max }\left( A_{2}^{\mathrm{T}}A_{2}\right) -\frac{\sigma _2}{2}\right) \left\| x_{2}^{k+1}-x_{2}^{*}\right\| ^{2}{+}\left( \gamma \lambda _{\max }\left( A_{3}^{\mathrm{T}}A_{3}\right) -\frac{\sigma _3}{2}\right) \left\| x_{3}^{k+1}-x_{3}^{*}\right\| ^{2}\\&\qquad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad \geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$

This together with the facts that \(\gamma \lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})-\frac{\sigma _2}{2} \leqslant 0\) and \(\gamma \lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})-\frac{\sigma _3}{2}\leqslant 0\) implies the desired inequality (3.1).

Now, we are ready to present the O(1 / t) ergodic convergence rate of the ADMM.

Theorem 3.2

Assume that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \). Let \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (2.2)–(2.5) from given \((x_2^{k},x_3^{k},\lambda ^{k})\). For any integer \(t>0\), let \(\bar{u}^{t}=(\bar{x}_{1}^{t}, \bar{x}_{2}^{t}, \bar{x}_{3}^{t})\) and \(\bar{\lambda }^{t}\) be defined as

$$\begin{aligned}&\bar{x}_{1}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{1}^{k+1},\quad \bar{x}_{2}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{2}^{k+1},\quad \bar{x}_{3}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{3}^{k+1},\\&\quad \bar{\lambda }^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}\lambda ^{k+1}. \end{aligned}$$

Then, for any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), we have

$$\begin{aligned} 0&\leqslant f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right\| \\&\leqslant \frac{\gamma }{2(t+1)}\left\| A_3 x_{3}^{*}-A_3 x_{3}^{0}\right\| ^{2}+\frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)}\\&\quad +\,\frac{\gamma }{2(t+1)}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{0}+A_{3}x_{3}^{0}-b\right\| ^{2}. \end{aligned}$$

Note that this also implies that both the error of the objective function value and the residual of the equality constraint converge to 0 with convergence rate O(1 / t), i.e.,

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)|=O(1/t),\quad \text{ and } \quad \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right\| =O(1/t).\nonumber \\ \end{aligned}$$
(3.5)

Proof

Because \((u^{k},\lambda ^k)\in \Omega \), it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\geqslant 0\). Using Lemma 3.1, the last equation of (2.12), and invoking the convexity of function \(f(\cdot )\), we have

$$\begin{aligned}&\qquad \,\, f(u^*)-f(\bar{u}^t) + \lambda ^\mathrm{T}\left( A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b\right) \nonumber \\&\quad =f(u^{*})-f(\bar{u}^{t})+\left( \begin{array}{c} x_{1}^{*}-\bar{x}_{1}^{t} \\ x_{2}^{*}-\bar{x}_{2}^{t} \\ x_{3}^{*}-\bar{x}_{3}^{t} \\ \lambda -\bar{\lambda }^{t}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\bar{\lambda }^{t} \\ -A_{2}^{\mathrm{T}}\bar{\lambda }^{t} \\ -A_{3}^{\mathrm{T}}\bar{\lambda }^{t} \\ A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b \end{array} \right) \nonumber \\&\quad \geqslant \frac{1}{t+1}\sum \limits _{k=0}^{t}\left[ f(u^{*})-f(u^{k+1})\right. \nonumber \\&\qquad \left. +\,\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array} \right) \right] \nonumber \\&\quad \geqslant \frac{1}{t+1}\sum \limits _{k=0}^{t}\left[ \frac{1}{2\gamma } \left( \Vert \lambda -\lambda ^{k+1}\Vert ^{2}-\Vert \lambda -\lambda ^{k}\Vert ^{2}\right) \right. \nonumber \\&\qquad \,+\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \nonumber \\&\qquad \,\left. +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \right] \nonumber \\&\quad \geqslant -\frac{1}{2\gamma (t+1)}\left\| \lambda -\lambda ^{0}\right\| ^{2}-\frac{\gamma }{2(t+1)}\Vert A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{0}-b\Vert ^{2}\nonumber \\&\qquad \,-\,\frac{\gamma }{2(t+1)}\Vert A_{1}x_{1}^{*}+A_{2}x_{2}^{0}+A_{3}x_{3}^{0}-b\Vert ^{2}. \end{aligned}$$
(3.6)

Note that this inequality holds for all \(\lambda \in \mathbb {R}^{p}\). From weak duality of (2.1), we obtain

$$\begin{aligned} 0\geqslant f(u^*)-f(\bar{u}^t)+(\lambda ^*)^\mathrm{T}\left( A_{1}\bar{x}_{1}^{t}+A_{2} \bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right) , \end{aligned}$$

which implies that

$$\begin{aligned} 0 \leqslant f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t} +A_{3}\bar{x}_{3}^{t}-b\right\| , \end{aligned}$$
(3.7)

because \(\rho =\Vert \lambda ^*\Vert +1\). Moreover, by letting \(\lambda :=-\rho (A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b)/\Vert A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b\Vert _2\) in (3.6), and using \(A_1x^*_1+A_2x^*_2+A_3x^*_3=b\), we obtain

$$\begin{aligned}&\qquad f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t} +A_{3}\bar{x}_{3}^{t}-b\right\| \nonumber \\&\leqslant \frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)} +\frac{\gamma }{2(t+1)}\left\| A_3\left( x_{3}^{*}-x_{3}^{0}\right) \right\| ^{2}\nonumber \\&\quad +\,\frac{\gamma }{2(t+1)}\left\| A_{2}\left( x_{2}^{*}-x_2^0\right) +A_3\left( x^*_3-x_{3}^{0}\right) \right\| ^{2}. \end{aligned}$$
(3.8)

We now define the function

$$\begin{aligned} v(\xi ) = \min \{ f(u) | A_1 x_1 + A_2 x_2 + A_3 x_3 - b = \xi , x_1\in \mathcal {X}_1, x_2\in \mathcal {X}_2, x_3\in \mathcal {X}_3 \}. \end{aligned}$$

It is easy to verify that v is convex, \(v(0)=f(u^*)\), and \(\lambda ^* \in \partial v(0)\). Therefore, from the convexity of v, it holds that

$$\begin{aligned} v(\xi ) \geqslant v(0) + \langle \lambda ^*, \xi \rangle \geqslant f(u^*) - \Vert \lambda ^*\Vert \Vert \xi \Vert . \end{aligned}$$
(3.9)

Letting \(\bar{\xi } = A_1 \bar{x}_1^t + A_2 \bar{x}_2^t + A_3\bar{x}_3^t - b\), we have \(f(\bar{u}^t) \geqslant v(\bar{\xi })\). Therefore, by denoting the constant

$$\begin{aligned} C:=\frac{\gamma }{2}\left\| A_3x_3^*-A_3x_3^0\right\| ^2 + \frac{\Vert \lambda ^0\Vert ^2}{\gamma } + \frac{\gamma }{2}\left\| A_1x_1^*+A_2x_2^0+A_3x_3^0-b\right\| ^2, \end{aligned}$$

and combining (3.7), (3.8), and (3.9), we get

$$\begin{aligned} \frac{C+\rho ^2/\gamma }{t+1} - \rho \Vert \bar{\xi }\Vert \geqslant f(\bar{u}^t) - f(u^*) \geqslant - \Vert \lambda ^*\Vert \Vert \bar{\xi } \Vert , \end{aligned}$$

which, by using \(\rho =\Vert \lambda ^*\Vert +1\), yields

$$\begin{aligned} \Vert A_1 \bar{x}_1^t + A_2 \bar{x}_2^t + A_3\bar{x}_3^t - b\Vert = \Vert \bar{\xi }\Vert \leqslant \frac{C+\rho ^2/\gamma }{t+1}. \end{aligned}$$
(3.10)

Moreover, by combining (3.7), (3.8), and (3.10), one obtains that

$$\begin{aligned} -\frac{\rho C+\rho ^3/\gamma }{t+1} \leqslant f(\bar{u}^t) - f(u^*) \leqslant \frac{C+\rho ^2/\gamma }{t+1}. \end{aligned}$$
(3.11)

As a result, (3.5) follows immediately from (3.10) and (3.11).
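The ergodic O(1 / t) behavior can also be observed numerically. The following sketch (not part of the paper; the scalar instance, data, and step size are our own illustrative choices, consistent with the bound of Theorem 3.2) runs the 3-block ADMM on \(\min \sum _i (x_i-c_i)^2\) subject to \(x_1+x_2+x_3=0\) and tracks the ergodic averages:

```python
import numpy as np

# Toy 3-block instance: min sum_i (x_i - c_i)^2  s.t.  x_1 + x_2 + x_3 = 0,
# with A_i = 1 and sigma_i = 2, so Theorem 3.2 permits gamma <= 1; take 0.5.
c, gamma, T = np.array([1.0, 2.0, 3.0]), 0.5, 2000
x, lam, xbar = np.zeros(3), 0.0, np.zeros(3)
fstar = 12.0                # optimum is x* = (-1, 0, 1), f* = 4 + 4 + 4
for k in range(T):
    for i in range(3):
        s = x.sum() - x[i]                   # other blocks, Gauss-Seidel order
        # argmin (x_i - c_i)^2 + (gamma/2)(x_i + s - lam/gamma)^2
        x[i] = (2.0 * c[i] + lam - gamma * s) / (2.0 + gamma)
    lam -= gamma * x.sum()                   # multiplier update (b = 0)
    xbar = (k * xbar + x) / (k + 1)          # running ergodic average
f_err = abs(np.sum((xbar - c)**2) - fstar)
resid = abs(xbar.sum())
print(f_err, resid)   # both shrink roughly like O(1/t)
```

In this run, both the ergodic objective error and the ergodic constraint residual fall within the bounds of Theorem 3.2 for \(t = T-1\).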

Therefore, we have established the O(1 / t) convergence rate of the ADMM (2.2)–(2.5) in an ergodic sense. Our proof is readily extended to the case of N-block ADMM (1.2). The following theorem shows the O(1 / t) convergence rate of N-block ADMM (1.2). We omit the proof here for the sake of succinctness.

Theorem 3.3

Assume that

$$\begin{aligned} \gamma \leqslant \min _{i=2,\cdots ,N-1}\left\{ \frac{2\sigma _{i}}{(2N-i)(i-1)\lambda _{\max }(A_{i}^{\mathrm{T}}A_{i})}, \frac{2\sigma _{N}}{(N-2)(N+1)\lambda _{\max }(A_{N}^{\mathrm{T}}A_{N})}\right\} , \end{aligned}$$

where \(\sigma _{i}\) is the strong convexity parameter of \(f_{i}\), \(i=2,\cdots ,N\). Let \((x_{1}^{k+1},x_{2}^{k+1},x_{3}^{k+1},\cdots ,x_{N}^{k+1}, \lambda ^{k+1})\in \Omega \) be generated by the N-block ADMM (1.2). For any integer \(t>0\), we define

$$\begin{aligned} \bar{x}_{i}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{i}^{k+1}, \quad 1\leqslant i \leqslant N, \quad \quad \bar{\lambda }^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}\lambda ^{k+1}. \end{aligned}$$

Then, for \(\rho :=\Vert \lambda ^*\Vert +1\), it holds that

$$\begin{aligned}&\quad \sum \limits _{i=1}^{N}\left( f_{i}\left( \bar{x}_{i}^{t}\right) \,-\,f_{i}\left( x_{i}^{*}\right) \right) \,+\,\rho \left\| \sum \limits _{i=1}^{N}A_{i}\bar{x}_{i}^{t}\,-\,b\right\| \\&\leqslant \frac{\gamma }{2(t\,+\,1)}\sum \limits _{i=1}^{N-1} \left\| \sum \limits _{m=i+1}^{N}A_{m}\left( x_{m}^{0}-x_{m}^{*}\right) \right\| ^{2} +\frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)}. \end{aligned}$$

As in Theorem 3.2, this also implies that the N-block ADMM (1.2) converges with rate O(1 / t) in terms of both the error of the objective function value and the residual of the equality constraint, i.e., it holds that

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)| = O(1/t), \quad \text{ and } \quad \left\| \sum \limits _{i=1}^{N}A_{i}\bar{x}_{i}^{t}-b\right\| =O(1/t). \end{aligned}$$

4 Non-ergodic Convergence Rate of ADMM

In this section, we prove an o(1 / k) non-ergodic convergence rate for ADMM (2.2)–(2.5).

Let us first observe the following (see also Lemma 4.1 in [24]). Suppose at the \((k+1)\)-th iteration of ADMM (2.2)–(2.5), we have

$$\begin{aligned} \left\{ \begin{array}{l} A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}=0,\\ A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}=0,\\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b=0. \end{array}\right. \end{aligned}$$
(4.1)

Then, (2.9)–(2.11) would immediately lead to

$$\begin{aligned} \left\{ \begin{array}{ll} \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \right] \geqslant 0, &{}\quad \forall x_{1}\in \mathcal {X}_{1}, \\ \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \right] \geqslant 0, &{}\quad \forall x_{2}\in \mathcal {X}_{2}, \\ \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0, &{}\quad \forall x_{3}\in \mathcal {X}_{3}. \end{array}\right. \end{aligned}$$

In other words, if (4.1) is satisfied, then \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) is already an optimal solution of (2.1). It is therefore natural to use a residual of the linear system (4.1) as an optimality measure. Such a measure, denoted by \(R_{k+1}\), is defined as follows:

$$\begin{aligned} R_{k+1}&:=\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}+2\left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}\nonumber \\&\quad +\,3\left\| A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}\right\| ^{2}. \end{aligned}$$
(4.2)

In the sequel, we show that \(R_k\) converges to 0 at the rate o(1 / k). Note that this gives the convergence rate of ADMM (2.2)–(2.5) in a non-ergodic sense.
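For a concrete stopping test, (4.2) can be evaluated directly from two consecutive iterates. The sketch below is our own (the function name and the NumPy-array interface are assumptions, not part of the paper):

```python
import numpy as np

def residual(A1, A2, A3, b, x1_new, x2_new, x3_new, x2_old, x3_old):
    """Optimality measure R_{k+1} of (4.2); it is zero exactly when
    the stopping system (4.1) is satisfied at iteration k+1."""
    feas = A1 @ x1_new + A2 @ x2_new + A3 @ x3_new - b
    d2 = A2 @ (x2_new - x2_old)
    d3 = A3 @ (x3_new - x3_old)
    return feas @ feas + 2 * (d2 @ d2) + 3 * (d3 @ d3)
```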

We first show that \(R_k\) is non-increasing.

Lemma 4.1

Assume \(\gamma \leqslant \min \{\frac{\sigma _2}{\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\}\). Let the sequence \(\{x_1^k,x_2^k,x_3^k,\lambda ^k\}\) be generated by ADMM (2.2)–(2.5). It holds that \(R_k\) defined in (4.2) is non-increasing, i.e.,

$$\begin{aligned} R_{k+1} \leqslant R_{k}, \quad k=0,1,2,\cdots . \end{aligned}$$
(4.3)

Proof

Letting \(x_{1}=x_{1}^{k}\) in (2.6) yields

$$\begin{aligned} \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k}+\gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \geqslant 0, \end{aligned}$$

with \(g_1 \in \partial f_1\), which further implies that

$$\begin{aligned}&\qquad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}g_{1}\left( x_{1}^{k+1}\right) \nonumber \\&\leqslant \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left( -A_{1}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ \gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1} +A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&=\left( A_{1}x_{1}^{k}\,-\,A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\,\,\left( \,-\,\lambda ^{k}\right) \,+\,\gamma \left( A_{1}x_{1}^{k}\right. \nonumber \\&\quad \left. -\,A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\,\left( A_{1}x_{1}^{k+1}\,+\,A_{2}x_{2}^{k}\,+\,A_{3}x_{3}^{k}\,-\,b\right) \nonumber \\&=\left( A_{1}x_{1}^{k}-A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\left( -\lambda ^{k}\right) +\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad -\,\left. \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}^{k}-A_{1}x_{1}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.4)

where the last equality is due to the identity (2.17). Letting \(x_{1}=x_{1}^{k+1}\) in (2.9) with \(k+1\) changed to k yields

$$\begin{aligned}&\quad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k}\right) -A_{1}^{\mathrm{T}}\lambda ^{k}+\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +\gamma A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \\&\geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\qquad \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}} g_{1}\left( x_{1}^{k}\right) \nonumber \\&\leqslant \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left( -A_{1}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\gamma \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&=\left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}\left( -\lambda ^{k}\right) \nonumber \\&\quad +\, \gamma \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}\left[ A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&\leqslant \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})\nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right\| ^{2} + \left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) \nonumber \\&\leqslant \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k}) \nonumber \\&\quad +\, \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right\| ^{2} + 2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2} +2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) .\nonumber \\ \end{aligned}$$
(4.5)

Combining (4.4) and (4.5) gives

$$\begin{aligned}&\quad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -g_{1}\left( x_{1}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} - \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\qquad +\left. 2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2}+2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) . \end{aligned}$$
(4.6)

Letting \(x_{2}=x_{2}^{k}\) in (2.7) yields

$$\begin{aligned} \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\qquad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}g_{2}\left( x_{2}^{k+1}\right) \nonumber \\&\leqslant \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left( -A_{2}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ \gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&=\left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}(-\lambda ^{k})\nonumber \\&\quad +\, \gamma \left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \nonumber \\&=\left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}(-\lambda ^{k})+\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad -\,\left. \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.7)

where the last equality is due to the identity (2.17). Letting \(x_{2}=x_{2}^{k+1}\) in (2.10) with \(k+1\) changed to k yields

$$\begin{aligned} \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\quad \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}} g_{2}\left( x_{2}^{k}\right) \nonumber \\&\leqslant \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left( -A_{2}^{\mathrm{T}}\lambda ^{k}\right) + \gamma \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&=\left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})+ \gamma \left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}\left( A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right) \nonumber \\&\leqslant \left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})+\frac{\gamma }{2}\left( \left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right) .\nonumber \\ \end{aligned}$$
(4.8)

Combining (4.7) and (4.8) gives

$$\begin{aligned}&\quad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} - \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad \left. +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right) . \end{aligned}$$
(4.9)

Letting \(x_3=x_3^k\) in (2.11), letting \(x_3=x_3^{k+1}\) in (2.11) with \(k+1\) changed to k, and adding the two resulting inequalities yields

$$\begin{aligned}&\qquad \left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3} \left( x_{3}^{k}\right) \right] \nonumber \\&\leqslant \left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left( A_{3}^{\mathrm{T}}\lambda ^{k+1}-A_{3}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&=\left( A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}\right) ^{\mathrm{T}}\left( \lambda ^{k+1}-\lambda ^{k}\right) \nonumber \\&=\gamma \left( A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) \nonumber \\&=\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}{-}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \right. \nonumber \\&\quad \left. -\,\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.10)

where the last equality is due to the identity (2.17).

Combining (4.6), (4.9), and (4.10) yields

$$\begin{aligned}&\qquad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -g_{1}\left( x_{1}^{k}\right) \right] + \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \nonumber \\&\quad + \,\left( x_{3}^{k+1}\,-\,x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) \,-\,g_{3}\left( x_{3}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\right. \nonumber \\&\quad \left. -\, \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2} \right. \nonumber \\&\quad +\,2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}+\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\nonumber \\&\quad -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \nonumber \\&\quad +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}+ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \nonumber \\&\quad \left. -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2} \right] \nonumber \\&= \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k}\,+\,A_{2}x_{2}^{k}\,+\,A_{3}x_{3}^{k}\,-\,b\right\| ^{2}\,+\,2\left\| A_{2}\left( x_{2}^{k-1}\,-\,x_{2}^{k}\right) \right\| ^{2}\right. \nonumber \\&\quad +\,3\left\| A_{3}\left( x_{3}^{k-1}\,-\,x_{3}^{k}\right) \right\| ^{2} \nonumber \\&\quad \left. -\,\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right] \nonumber \\&= \frac{\gamma }{2} \left[ R_{k}-R_{k+1}+2\left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}+2\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] . \end{aligned}$$
(4.11)

Note that (2.15) and (2.16) imply that

$$\begin{aligned}&\left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \geqslant \sigma _2 \Vert x_{2}^{k+1}-x_{2}^{k}\Vert ^{2},\nonumber \\&\left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3}\left( x_{3}^{k}\right) \right] \geqslant \sigma _3 \Vert x_{3}^{k+1}-x_{3}^{k}\Vert ^{2}. \end{aligned}$$
(4.12)

Combining (4.11) with (4.12) and using the fact that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), it is easy to see that \(R_{k+1} \leqslant R_k\) for \(k=0,1,2,\cdots \).

We are now ready to present the o(1 / k) non-ergodic convergence rate of the ADMM (2.2)–(2.5).

Theorem 4.2

Assume \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \). Let the sequence \(\{x_1^{k},x_2^k,x_3^k,\lambda ^k\}\) be generated by ADMM (2.2)–(2.5). Then \(\sum _{k=1}^\infty R_k < +\infty \) and \(R_k = o(1/k)\).

Proof

Combining (4.9) and (4.10) yields

$$\begin{aligned}&\qquad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] +\left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3}\left( x_{3}^{k}\right) \right] \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \\&\quad +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}+ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}\\&\quad \left. -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] \\&= \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2} -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \right] \\&=\frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^2 +\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-R_{k+1} \right. \\&\quad \left. +\,2\left\| A_2x_2^{k+1}-A_2x_2^k\right\| ^2 + 3\left\| A_3x_3^k-A_3x_3^{k+1}\right\| ^2 \right] \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^2 +\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-R_{k+1}\right] \\&\quad +\,\gamma \lambda _{\max }\left( A_2^\mathrm{T}A_2\right) \left\| x_{2}^{k+1}-x_{2}^{k}\right\| ^{2} + \frac{3\gamma }{2}\lambda _{\max }\left( A_3^\mathrm{T}A_3\right) \left\| x_{3}^{k+1}-x_{3}^{k}\right\| ^{2}. \end{aligned}$$

Using (4.12) and the assumption that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), we obtain

$$\begin{aligned} R_{k+1}\leqslant & {} \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\nonumber \\&-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}. \end{aligned}$$
(4.13)

From the optimality conditions (2.12) and the convexity of f, it follows that

$$\begin{aligned} f(u^{*})-f\left( u^{k+1}\right)\leqslant & {} \left( x_{1}^{*}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}^{\mathrm{T}}\lambda ^{*}\right) +\left( x_{2}^{*}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left( A_{2}^{\mathrm{T}}\lambda ^{*}\right) \nonumber \\&+\left( x_{3}^{*}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left( A_{3}^{\mathrm{T}}\lambda ^{*}\right) . \end{aligned}$$
(4.14)

By combining (3.1) and (4.14), we have

$$\begin{aligned}&\qquad \left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} A_{1}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \\ A_{2}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \\ A_{3}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \end{array} \right) +\frac{1}{2\gamma }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^{2}\nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(4.15)

Note that the first term in (4.15) is equal to

$$\begin{aligned}&\quad -\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) ^{\mathrm{T}} \left( \lambda ^{*}-\lambda ^{k+1}\right) \\&=\frac{1}{\gamma }\left( \lambda ^{k+1}-\lambda ^{k}\right) ^{\mathrm{T}} \left( \lambda ^{*}-\lambda ^{k+1}\right) \\&=\frac{1}{2\gamma }\left( \Vert \lambda ^{*}-\lambda ^{k}\Vert ^{2} -\Vert \lambda ^{k+1}-\lambda ^{k}\Vert ^{2}-\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^{2}\right) . \end{aligned}$$

Therefore, (4.15) can be rearranged as

$$\begin{aligned}&\qquad \frac{1}{\gamma ^{2}}\left( \Vert \lambda ^{*}-\lambda ^{k}\Vert ^{2}-\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^{2}\right) \nonumber \\&\quad +\,\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(4.16)

By (4.13) and (4.16), we get that

$$\begin{aligned} \sum \limits _{k=1}^{\infty }R_{k+1}\leqslant & {} \sum \limits _{k=1}^{\infty }\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] \\\leqslant & {} \sum \limits _{k=1}^{\infty } \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} + \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}\\\leqslant & {} \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}+\sum \limits _{k=1}^{\infty } \left[ \left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}\right. \right. \\&\left. \left. -\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \right. \\&+\left( \left\| A_{1}x_{1}^{*}{+}A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}{-}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}{+}A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\left. +\,\frac{1}{\gamma ^{2}}\left( \left\| \lambda ^{*}-\lambda ^{k} \right\| ^{2}-\left\| \lambda ^{*}-\lambda ^{k+1}\right\| ^{2}\right) \right] \\\leqslant & {} \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}+\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{1}-b\right\| ^{2}\\&+\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{1}+A_{3}x_{3}^{1}-b\right\| ^{2}+\frac{1}{\gamma ^{2}}\left\| \lambda ^{*}-\lambda ^{1}\right\| ^{2}. \end{aligned}$$

Note that we have proved that \(R_k\) is monotonically non-increasing, and \(\sum _{k=1}^\infty R_k<+\infty \). As observed in Lemma 1.2 of [32], one has

$$\begin{aligned} k R_{2k} \leqslant R_k + R_{k+1} + \cdots + R_{2k} \rightarrow 0, \quad \text{ as } k \rightarrow \infty , \end{aligned}$$

and therefore \(R_k = o(1/k)\).
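This last step can be illustrated numerically: for any non-increasing summable sequence, the tail sum \(R_k+\cdots +R_{2k}\) dominates \(k R_{2k}\) and vanishes as \(k\rightarrow \infty \). Below, \(R_k = 1/k^2\) is our own toy choice of such a sequence:

```python
# R[j-1] stores R_j for the toy sequence R_k = 1/k**2, which is
# non-increasing and summable, as guaranteed for the ADMM residuals.
R = [1.0 / k**2 for k in range(1, 4001)]

def k_times_R2k(k):
    """Return (k * R_{2k}, tail sum R_k + ... + R_{2k})."""
    tail = sum(R[k - 1:2 * k])
    return k * R[2 * k - 1], tail
```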

Remark 4.3

We remark here that, using similar arguments, it is easy to see that (4.13) and (4.16), together with the monotonicity of \(R_k\), also imply that \(R_k\) has a non-asymptotic sublinear convergence rate of O(1 / k).

Note that our analysis extends easily to the N-block ADMM (1.2). The results are summarized in the following theorem, and the proof is omitted for the sake of succinctness.

Theorem 4.4

Assume that

$$\begin{aligned} \gamma \leqslant \min _{i=2,\cdots ,N-1}\left\{ \frac{2\sigma _{i}}{(2N-i)(i-1)\lambda _{\max }\left( A_{i}^{\mathrm{T}}A_{i}\right) }, \frac{2\sigma _{N}}{(N-2)(N+1)\lambda _{\max }\left( A_{N}^{\mathrm{T}}A_{N}\right) }\right\} . \end{aligned}$$

Let \((x_{1}^{k+1},x_{2}^{k+1},x_{3}^{k+1},\cdots ,x_{N}^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (1.2). Then \(\sum _{k=1}^\infty R_k < +\infty \) and \(R_k=o(1/k)\), where \(R_k\) is defined as

$$\begin{aligned} R_{k+1} := \left\| \sum _{i=1}^N A_ix_i^{k+1}-b\right\| ^2 + \sum _{i=2}^N\frac{(2N-i)(i-1)}{2}\left\| A_ix_i^k-A_ix_i^{k+1}\right\| ^2. \end{aligned}$$
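For \(N=3\) the weights \((2N-i)(i-1)/2\) reduce to 2 and 3, recovering (4.2). A sketch of this generalized residual (our own helper, assuming NumPy arrays) is as follows:

```python
import numpy as np

def residual_N(As, b, xs_new, xs_old):
    """R_{k+1} for the N-block ADMM: constraint violation plus
    weighted successive differences, weights (2N-i)(i-1)/2, i=2..N."""
    N = len(As)
    feas = sum(A @ x for A, x in zip(As, xs_new)) - b
    val = feas @ feas
    for i in range(2, N + 1):
        d = As[i - 1] @ (xs_new[i - 1] - xs_old[i - 1])
        val += 0.5 * (2 * N - i) * (i - 1) * (d @ d)
    return val
```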

5 Conclusions

In this paper, we analyzed the sublinear convergence rate of the standard Gauss–Seidel multi-block ADMM in both the ergodic and the non-ergodic sense. These are the first sublinear convergence rate results for the standard multi-block ADMM. Using the techniques developed in this paper, we can also analyze the convergence rates of some variants of the standard multi-block ADMM, such as the ones studied in [32] and [33], where the primal variables are updated in a Jacobi manner; we plan to pursue this direction in future research.

We remark here that the techniques developed in this paper lead to a very simple proof of the O(1 / t) complexity of the two-block ADMM in terms of the objective error and constraint violation of (1.1) (\(N=2\)). Specifically, when \(N=2\), let \((x_1^k,x_2^k;\lambda ^k)\) denote the iterates generated by the two-block ADMM (1.2), and define

$$\begin{aligned} \bar{x}_1^t = \frac{1}{t+1}\sum _{k=0}^t x_1^{k+1}, \quad \bar{x}_2^t = \frac{1}{t+1}\sum _{k=0}^t x_2^{k+1}, \quad \bar{\lambda }^t = \frac{1}{t+1}\sum _{k=0}^t \lambda ^{k+1}. \end{aligned}$$

We can prove that

$$\begin{aligned}&|f_1(\bar{x}_1^t) + f_2(\bar{x}_2^t) - f_1(x_1^*) - f_2(x_2^*) | = O(1/t), \quad \nonumber \\&\quad \text{ and } \quad \Vert A_1\bar{x}_1^t+A_2\bar{x}_2^t-b\Vert =O(1/t), \end{aligned}$$
(5.1)

i.e., the convergence rate of the two-block ADMM is O(1 / t) in terms of both the objective error and the constraint violation. Note that for \(N=2\), \(\gamma \) can be any positive number, and there is no need to impose strong convexity on either \(f_1\) or \(f_2\). The proof of this result is as follows.

First, when \(N=2\), the optimality conditions (2.9)–(2.11) reduce to

$$\begin{aligned} \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1}+\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) \right] \geqslant 0, \quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(5.2)
$$\begin{aligned} \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0, \quad \forall x_{2}\in \mathcal {X}_{2}. \end{aligned}$$
(5.3)

Therefore, by letting \(x_1=x_1^*\) in (5.2), \(x_2=x_2^*\) in (5.3), and using the convexity of \(f_1\) and \(f_2\), we have

$$\begin{aligned}&f_1(x_1^{k+1}) - f_1(x_1^*) + f_2(x_2^{k+1}) - f_2(x_2^*) \\ \leqslant&g_1(x_1^{k+1})^\mathrm{T}(x_1^{k+1}-x_1^*) + g_2(x_2^{k+1})^\mathrm{T}(x_2^{k+1}-x_2^*) \\ \leqslant&(-A_1^\mathrm{T}\lambda ^{k+1}+\gamma A_1^\mathrm{T} A_2(x_2^k-x_2^{k+1}))^\mathrm{T} (x_1^*-x_1^{k+1}) + (-A_2^\mathrm{T}\lambda ^{k+1})^\mathrm{T} (x_2^*-x_2^{k+1}) \\ =&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} \!+\! \gamma \left[ (-A_2x_2^{k+1})\!-\!(-A_2x_2^k)\right] ^\mathrm{T}\left[ (A_1x_1^*-b)\!-\!(A_1x_1^{k+1}-b)\right] \\ =&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} + \frac{\gamma }{2}\left( \Vert -A_1x_1^{k+1}+b-A_2x_2^{k+1}\Vert ^2+\Vert A_1x_1^*-b\right. \\&\left. +A_2x_2^k\Vert ^2-\Vert -A_2x_2^{k+1}-A_1x_1^*+b\Vert ^2-\Vert A_1x_1^{k+1}-b+A_2x_2^k\Vert ^2\right) \\ \leqslant&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} + \frac{\gamma }{2}\left( \frac{1}{\gamma ^2}\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2+\Vert A_1x_1^*-b+A_2x_2^k\Vert ^2-\Vert \right. \\&\left. -A_2x_2^{k+1}-A_1x_1^*+b\Vert ^2\right) , \end{aligned}$$

where the second equality is due to (2.17). Thus, for any \(\lambda \in \mathbb {R}^p\), it holds that

$$\begin{aligned}&f_1(x_1^{k+1}) - f_1(x_1^*) + f_2(x_2^{k+1}) - f_2(x_2^*) -\lambda ^\mathrm{T}(A_1x_1^{k+1}+A_2x_2^{k+1}-b) \nonumber \\ \leqslant&\frac{1}{\gamma }(\lambda ^{k+1}-\lambda )^\mathrm{T}(\lambda ^k-\lambda ^{k+1}) + \frac{1}{2\gamma }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2 + \frac{\gamma }{2}(\Vert A_1x_1^*+A_2x_2^k-b\Vert ^2\nonumber \\&\quad -\Vert A_1x_1^*+A_2x_2^{k+1}-b\Vert ^2)\nonumber \\ =&\frac{1}{2\gamma } (\Vert \lambda -\lambda ^{k}\Vert ^2 - \Vert \lambda -\lambda ^{k+1}\Vert ^2) + \frac{\gamma }{2}(\Vert A_1x_1^*+A_2x_2^k-b\Vert ^2-\Vert A_1x_1^*\nonumber \\&\quad +A_2x_2^{k+1}-b\Vert ^2). \end{aligned}$$
(5.4)

Summing (5.4) over \(k=0,1,\cdots ,t\), dividing by \(t+1\), and using the convexity of \(f_1\) and \(f_2\) yields

$$\begin{aligned}&f_1(\bar{x}_1^{t}) - f_1(x_1^*) + f_2(\bar{x}_2^{t}) - f_2(x_2^*) -\lambda ^\mathrm{T}(A_1\bar{x}_1^{t}+A_2\bar{x}_2^{t}-b) \\ \leqslant&\frac{1}{2\gamma (t+1)}\Vert \lambda -\lambda ^0\Vert ^2 + \frac{\gamma }{2(t+1)}\Vert A_1x_1^*+A_2x_2^0-b\Vert ^2. \end{aligned}$$

Based on the above bound, the error analysis for both the objective and the residual follows the same line of arguments as in the proof of Theorem 3.2.
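As a sanity check of (5.1), the two-block scheme is easy to run on a toy instance. The sketch below is our own illustration with \(f_i(x_i)=\frac{1}{2}\Vert x_i-c_i\Vert ^2\) and \(A_1=A_2=I\), chosen so that both subproblems have closed-form solutions; it returns the ergodic iterates \(\bar{x}_1^{t}\) and \(\bar{x}_2^{t}\):

```python
import numpy as np

def admm_2block(c1, c2, b, gamma=1.0, iters=200):
    """Two-block ADMM for min 0.5*||x1-c1||^2 + 0.5*||x2-c2||^2
    s.t. x1 + x2 = b; returns the ergodic averages of x1 and x2."""
    x1 = np.zeros_like(c1); x2 = np.zeros_like(c2); lam = np.zeros_like(b)
    s1 = np.zeros_like(c1); s2 = np.zeros_like(c2)
    for _ in range(iters):
        # closed-form minimizers of the two augmented Lagrangian blocks
        x1 = (c1 + lam + gamma * (b - x2)) / (1 + gamma)
        x2 = (c2 + lam + gamma * (b - x1)) / (1 + gamma)
        lam = lam - gamma * (x1 + x2 - b)
        s1 += x1; s2 += x2
    return s1 / iters, s2 / iters
```

On \(c_1=c_2=0\) and \(b=2\), both ergodic averages approach the optimal value 1, consistent with the O(1 / t) rate above.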