1 Introduction

We consider solving the following multi-block convex minimization problem:

$$\begin{aligned} \begin{array}{ll} \min &{}f_1(x_1)+f_2(x_2)+\cdots +f_N(x_N)\\ \text {s.t.}&{}A_1 x_1+A_2 x_2+\cdots +A_N x_N=b\\ &{} x_i \in \mathcal {X}_i,\,i=1,\cdots , N, \end{array} \end{aligned}$$
(1.1)

where \(A_i \in \mathbb {R}^{p\times n_i}\), \(b\in \mathbb {R}^p\), \(\mathcal {X}_i\subset \mathbb {R}^{n_i}\) are closed convex sets and \(f_i:\mathbb {R}^{n_i}\rightarrow \mathbb {R}\) are closed convex functions. One popular way to solve (1.1), when the functions \(f_i\) have special structures, is to apply the alternating direction method of multipliers (ADMM) [1, 2]. The ADMM is closely related to the Douglas–Rachford [3] and Peaceman–Rachford [4] operator splitting methods that date back to the 1950s. These operator splitting methods were further studied later in [5–8]. The ADMM has been revisited recently due to its success in solving problems with special structures arising from compressed sensing, machine learning, image processing, and so on; see the recent survey papers [9, 10] for more information.

ADMM for solving (1.1) is based on an augmented Lagrangian method framework. The augmented Lagrangian function for (1.1) is defined as

$$\begin{aligned} \mathcal {L}_\gamma (x_1,\cdots ,x_N;\lambda ) := \sum _{j=1}^N f_j(x_j) - \left\langle \lambda , \sum _{j=1}^N A_j x_j -b\right\rangle + \frac{\gamma }{2}\left\| \sum _{j=1}^N A_j x_j -b\right\| ^2, \end{aligned}$$

where \(\lambda \) is the Lagrange multiplier and \(\gamma > 0\) is a penalty parameter. In a typical iteration of the standard ADMM for solving (1.1), the following updating procedure is implemented:

$$\begin{aligned} \left\{ \begin{array}{lcl} x_1^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_1\in \mathcal {X}_1} \ \mathcal {L}_\gamma \left( x_1,x_2^k,\cdots ,x_N^k;\lambda ^k\right) , \\ x_2^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_2\in \mathcal {X}_2} \ \mathcal {L}_\gamma \left( x_1^{k+1},x_2,x_3^k,\cdots ,x_N^k;\lambda ^k\right) , \\ &{}&{} \vdots \\ x_N^{k+1} &{}&{} := \mathop {\mathrm{argmin}}\nolimits _{x_N\in \mathcal {X}_N} \ \mathcal {L}_\gamma \left( x_1^{k+1},x_2^{k+1},\cdots ,x_{N-1}^{k+1},x_N;\lambda ^k\right) , \\ \lambda ^{k+1} &{}&{} := \lambda ^k - \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} -b\right) . \end{array}\right. \end{aligned}$$
(1.2)

The ADMM (1.2) for solving two-block convex minimization problems (i.e., \(N=2\)) has been studied extensively in the literature. The global convergence of ADMM (1.2) when \(N=2\) was shown in [11, 12]. There are also some very recent works that study the convergence rate properties of ADMM when \(N=2\) (see, e.g., [13–18]).

However, the convergence of ADMM (1.2) when \(N\geqslant 3\) had remained unclear for a long time. In a recent work by Chen et al. [19], a counter-example was constructed that shows the failure of ADMM (1.2) when \(N\geqslant 3\). Since the ADMM (1.2) for \(N\geqslant 3\) has been successfully applied to solve many problems arising from real practice (see, e.g., [20, 21]), it is worth investigating under what sufficient conditions the ADMM (1.2) converges. Moreover, it has been observed by many researchers that the ADMM (1.2) often outperforms all its modified versions (see the observations in [22, 23]). In fact, Sun, Toh, and Yang made the following statement in [22]: “However, to the best of our knowledge, up to now the dilemma is that at least for convex conic programming, the modified versions though with convergence guarantee, often perform 2-3 times slower than the multi-block ADMM with no convergent guarantee.” There is thus a strong need to further study sufficient conditions that can guarantee the convergence of (1.2). It was shown by Han and Yuan in [24] that ADMM (1.2) globally converges if all the functions \(f_1,\cdots ,f_N\) are assumed to be strongly convex and the penalty parameter \(\gamma \) is smaller than a certain bound. Chen, Shen, and You [25] showed that the 3-block ADMM [i.e., \(N=3\) in (1.2)] globally converges if \(A_1\) is injective, \(f_2\) and \(f_3\) are strongly convex, and \(\gamma \) is smaller than a certain bound. After we released our work, Cai, Han, and Yuan [26] and Li, Sun, and Toh [27] independently proved that when \(N=3\), the ADMM (1.2) converges under the conditions that one function among \(f_1\), \(f_2\), and \(f_3\) is strongly convex and \(\gamma \) is smaller than a certain bound. Davis and Yin [28] studied a variant of the 3-block ADMM (see Algorithm 8 in [28]) which requires that \(f_1\) is strongly convex and \(\gamma \) is smaller than a certain bound to guarantee the convergence.
Recently, Lin, Ma, and Zhang [29] proposed several alternative approaches to ensure the sublinear convergence rate of (1.2) without requiring any function to be strongly convex. Furthermore, Lin, Ma, and Zhang [30] proved that the 3-block ADMM is globally convergent for any \(\gamma >0\) when it is applied to solve the so-called regularized least-squares decomposition problems. In a recent work by Hong and Luo [31], a variant of ADMM (1.2) with small step size in updating the Lagrange multiplier was studied. Specifically, [31] proposed to replace the last equation in (1.2) by

$$\begin{aligned} \lambda ^{k+1} := \lambda ^k - \alpha \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} -b\right) , \end{aligned}$$

where \(\alpha >0\) is a small step size. Linear convergence of this variant is proved under the assumption that the objective function satisfies certain error bound conditions. However, to guarantee convergence, the step size \(\alpha \) must be bounded by parameters associated with the error bound conditions, which are usually unknown a priori; it may therefore be difficult to choose \(\alpha \) in practice. There are also studies on the convergence rate of some other variants of ADMM (1.2), and we refer the interested readers to [32–36] for details of these variants. In this paper, we focus on the ADMM (1.2) that directly extends the two-block ADMM to problems with more than two block variables.

1.1 Our Contributions

The main contribution of this paper is as follows. We show that the ADMM (1.2) with \(N\geqslant 3\) converges with rate O(1 / t) in the ergodic sense and o(1 / t) in the non-ergodic sense, under the assumptions that \(f_2,\cdots ,f_N\) are strongly convex, \(f_1\) is convex but not necessarily strongly convex, and \(\gamma \) is smaller than a certain bound. It should be pointed out that our assumption is weaker than the one used in [24], in which all the functions are required to be strongly convex. Moreover, unlike the sufficient condition suggested in [19], we do not make any assumption on the matrices \(A_1,\cdots ,A_N\). To the best of our knowledge, the convergence rate results given in this paper are the first sublinear convergence rate results for the standard ADMM (1.2) when \(N\geqslant 3\). We also remark here that, under additional assumptions, we proved the global linear convergence rate of ADMM (1.2) in [37].

1.2 Organization

The rest of this paper is organized as follows. In Sect. 2, we provide some preliminaries for our convergence rate analysis. In Sect. 3, we prove the convergence rate of ADMM (1.2) in the ergodic sense. In Sect. 4, we prove the convergence rate of ADMM (1.2) in the non-ergodic sense. Section 5 draws some conclusions and points out some future directions.

2 Preliminaries

We prove the convergence results of ADMM only for \(N=3\), because the analysis extends to arbitrary N in a straightforward manner. For ease of presentation and succinctness, we therefore assume \(N=3\) in the rest of this paper and state the results for general N without proof.

We restate the problem (1.1) for \(N=3\) as

$$\begin{aligned} \begin{array}{ll} \min &{} f_1(x_1) + f_2(x_2) + f_3(x_3) \\ \text {s.t.}&{} A_1 x_1 + A_2 x_2 + A_3 x_3 = b, \\ &{} x_1\in \mathcal {X}_1, x_2\in \mathcal {X}_2, x_3\in \mathcal {X}_3. \end{array} \end{aligned}$$
(2.1)

The ADMM for solving (2.1) can be summarized as (note that some constant terms in the three subproblems are discarded):

$$\begin{aligned} x_{1}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{1} \in \mathcal {X}_{1}}f_{1}(x_{1})+\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k} -b-\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.2)
$$\begin{aligned} x_{2}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{2} \in \mathcal {X}_{2}}f_{2}(x_{2}) +\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b -\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.3)
$$\begin{aligned} x_{3}^{k+1}:= & {} \mathop {\mathrm{argmin}}_{x_{3} \in \mathcal {X}_{3}}f_{3}(x_{3}) +\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}-b -\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(2.4)
$$\begin{aligned} \lambda ^{k+1}:= & {} \lambda ^{k} - \gamma \left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1} +A_{3}x_{3}^{k+1}-b\right) . \end{aligned}$$
(2.5)
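The remark that only constant terms are discarded can be sanity-checked numerically: for fixed \(x_2^k\), \(x_3^k\), and \(\lambda ^k\), the \(x_1\)-dependent part of \(\mathcal {L}_\gamma \) and the objective of subproblem (2.2) differ by a constant (namely \(-\Vert \lambda ^k\Vert ^2/(2\gamma )\)). A small illustrative sketch with synthetic data (the choice \(f_1=\Vert \cdot \Vert _1\) is ours, for concreteness):

```python
import numpy as np

rng = np.random.default_rng(3)
A1, A2, A3 = (rng.standard_normal((4, 3)) for _ in range(3))
x2, x3 = rng.standard_normal(3), rng.standard_normal(3)
lam, b = rng.standard_normal(4), rng.standard_normal(4)
gamma = 0.7
f1 = lambda x: np.abs(x).sum()          # any closed convex f1; ||x||_1 here

def lagrangian_part(x1):
    # x1-dependent part of L_gamma(x1, x2, x3; lam)
    r = A1 @ x1 + A2 @ x2 + A3 @ x3 - b
    return f1(x1) - lam @ r + 0.5 * gamma * r @ r

def subproblem_objective(x1):
    # objective of the completed-square subproblem (2.2)
    r = A1 @ x1 + A2 @ x2 + A3 @ x3 - b - lam / gamma
    return f1(x1) + 0.5 * gamma * r @ r

# the difference is the same constant at every test point
vals = [lagrangian_part(x) - subproblem_objective(x)
        for x in rng.standard_normal((5, 3))]
assert np.ptp(vals) < 1e-9
```

Since the two objectives differ by a constant independent of \(x_1\), their minimizers over \(\mathcal {X}_1\) coincide, which justifies the completed-square form (2.2)–(2.4).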

The first-order optimality conditions for (2.2)–(2.4) are given, respectively, by \(x_{i}^{k+1}\in \mathcal {X}_{i}, i=1,2,3\), and

$$\begin{aligned}&\quad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_1\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k} +\gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(2.6)
$$\begin{aligned}&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{2}\in \mathcal {X}_{2}, \end{aligned}$$
(2.7)
$$\begin{aligned}&\quad \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_3\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k}+\gamma A_{3}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{3}\in \mathcal {X}_{3}, \end{aligned}$$
(2.8)

where \(g_i(x_i^{k+1}) \in \partial f_i(x_i^{k+1})\) is a subgradient of \(f_i\) at \(x_i^{k+1}\) for \(i=1,2,3\). Moreover, by combining with (2.5), (2.6)–(2.8) can be rewritten as

$$\begin{aligned}&\quad \left( x_{1}\,-\,x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) \,-\,A_{1}^{\mathrm{T}}\lambda ^{k+1}\,+\,\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k}\,-\,x_{2}^{k+1}\right) \right. \nonumber \\&\quad \,\,\left. +\,\gamma A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k}\,-\,x_{3}^{k+1}\right) \right] \geqslant 0,\quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(2.9)
$$\begin{aligned}&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1}+\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right] \nonumber \\&\geqslant 0,\quad \forall x_{2}\in \mathcal {X}_{2}, \end{aligned}$$
(2.10)
$$\begin{aligned}&\quad \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0,\quad \forall x_{3}\in \mathcal {X}_{3}. \end{aligned}$$
(2.11)

We denote \(\Omega = \mathcal {X}_1\times \mathcal {X}_2 \times \mathcal {X}_3 \times \mathbb {R}^p\) and denote the optimal set of (2.1) by \(\Omega ^*\). The following assumption is made throughout this paper.

Assumption 2.1

The optimal set \(\Omega ^*\) for problem (2.1) is non-empty.

According to the first-order optimality conditions for (2.1), solving (2.1) is equivalent to finding

$$\begin{aligned} \left( x_1^*,x_2^*,x_3^*,\lambda ^*\right) \in \Omega ^* \end{aligned}$$

such that the following holds:

$$\begin{aligned} \left\{ \begin{array}{l} \left( x_1-x_1^*\right) ^\mathrm{T}\left( g_1\left( x_1^*\right) -A_1^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_1\in \mathcal {X}_1, \\ \left( x_2-x_2^*\right) ^\mathrm{T}\left( g_2\left( x_2^*\right) -A_2^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_2\in \mathcal {X}_2, \\ \left( x_3-x_3^*\right) ^\mathrm{T}\left( g_3\left( x_3^*\right) -A_3^\mathrm{T}\lambda ^*\right) \geqslant 0, \quad \forall x_3\in \mathcal {X}_3, \\ A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{*}-b=0, \end{array}\right. \end{aligned}$$
(2.12)

where \(g_i(x_i^*)\in \partial f_i(x_i^*)\), \(i=1,2,3\).

Furthermore, the following condition is assumed in our subsequent analysis.

Assumption 2.2

The functions \(f_2\) and \(f_3\) are strongly convex with parameters \(\sigma _2>0\) and \(\sigma _3>0\), respectively; i.e., the following two inequalities hold:

$$\begin{aligned} f_2(y)\geqslant & {} f_2(x) + (y-x)^\mathrm{T}g_2(x) + \frac{\sigma _2}{2}\Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_2, \end{aligned}$$
(2.13)
$$\begin{aligned} f_3(y)\geqslant & {} f_3(x) + (y-x)^\mathrm{T}g_3(x) + \frac{\sigma _3}{2}\Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_3, \end{aligned}$$
(2.14)

or equivalently,

$$\begin{aligned} (y-x)^\mathrm{T}(g_2(y)-g_2(x))\geqslant & {} \sigma _2 \Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_2, \end{aligned}$$
(2.15)
$$\begin{aligned} (y-x)^\mathrm{T}(g_3(y)-g_3(x))\geqslant & {} \sigma _3 \Vert y-x\Vert ^2, \quad \forall x,y\in \mathcal {X}_3, \end{aligned}$$
(2.16)

where \(g_2(x)\in \partial f_2(x)\) and \(g_3(x) \in \partial f_3(x)\) are the subgradients of \(f_2\) and \(f_3\), respectively.
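As an illustrative check (not from the paper), the monotonicity form (2.15) can be verified numerically for the \(\sigma \)-strongly convex function \(f(x)=\frac{\sigma }{2}\Vert x\Vert ^2+\Vert x\Vert _1\), with the particular subgradient choice \(g(x)=\sigma x+\mathrm{sign}(x)\):

```python
import numpy as np

# f(x) = (sigma/2)||x||^2 + ||x||_1 is sigma-strongly convex;
# g(x) = sigma*x + sign(x) picks one subgradient at each point.
sigma = 1.5
g = lambda x: sigma * x + np.sign(x)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = (y - x) @ (g(y) - g(x))
    # inequality (2.15): <y - x, g(y) - g(x)> >= sigma * ||y - x||^2
    assert lhs >= sigma * np.linalg.norm(y - x)**2 - 1e-12
```

The check passes because \((y-x)^{\mathrm{T}}(\mathrm{sign}(y)-\mathrm{sign}(x))\geqslant 0\) componentwise, so the \(\ell _1\) part only adds slack to the strong convexity of the quadratic part.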

In our analysis, the following well-known identity is used frequently:

$$\begin{aligned} (w_1-w_2)^{\mathrm{T}}(w_3-w_4)&=\frac{1}{2}\left( \Vert w_1-w_4\Vert ^{2}-\Vert w_1-w_3\Vert ^{2}\right) \nonumber \\&\quad +\,\frac{1}{2}\left( \Vert w_3-w_2\Vert ^{2}-\Vert w_4-w_2\Vert ^{2}\right) . \end{aligned}$$
(2.17)
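The identity (2.17) follows by expanding all four squared norms; a quick numerical check on random vectors:

```python
import numpy as np

# identity (2.17):
# (w1 - w2)^T (w3 - w4) = 1/2 (||w1 - w4||^2 - ||w1 - w3||^2)
#                       + 1/2 (||w3 - w2||^2 - ||w4 - w2||^2)
rng = np.random.default_rng(2)
w1, w2, w3, w4 = rng.standard_normal((4, 6))
lhs = (w1 - w2) @ (w3 - w4)
rhs = 0.5 * (np.linalg.norm(w1 - w4)**2 - np.linalg.norm(w1 - w3)**2) \
    + 0.5 * (np.linalg.norm(w3 - w2)**2 - np.linalg.norm(w4 - w2)**2)
assert abs(lhs - rhs) < 1e-10
```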

Notation

For simplicity, we use the following notation to denote the stacked vectors or tuples:

$$\begin{aligned} u = \left( \begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \end{array} \right) , u^k = \left( \begin{array}{c} x_{1}^k \\ x_{2}^k \\ x_{3}^k \end{array} \right) , u^* = \left( \begin{array}{c} x_{1}^* \\ x_{2}^* \\ x_{3}^* \end{array} \right) . \end{aligned}$$

We denote the objective function of problem (2.1) by \(f(u)\equiv f_1(x_{1})+f_{2}(x_{2})+f_{3}(x_{3})\); \(g_i\) is a subgradient of \(f_i\); \(\lambda _{\max }(B)\) denotes the largest eigenvalue of a real symmetric matrix B; \(\Vert x\Vert \) denotes the Euclidean norm of x.

3 Ergodic Convergence Rate of ADMM

In this section, we prove the O(1 / t) convergence rate of ADMM (2.2)–(2.5) in the ergodic sense.

Lemma 3.1

Assume that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), where \(\sigma _2\) and \(\sigma _3\) are defined in Assumption 2.2. Let \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM from given \((x_{2}^{k},x_{3}^{k},\lambda ^{k})\). Then, for any primal optimal solution \(u^*=(x_1^*,x_2^*,x_3^*)\) of (2.1) and \(\lambda \in \mathbb {R}^p\), it holds that

$$\begin{aligned}&\qquad f(u^{*})-f(u^{k+1})+\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1} \end{array}\right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array} \right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.1)

Proof

Note that combining (2.9)–(2.11) yields

$$\begin{aligned}&\left( \begin{array}{c} x_{1}-x_{1}^{k+1} \\ x_{2}-x_{2}^{k+1} \\ x_{3}-x_{3}^{k+1} \end{array} \right) ^{\mathrm{T}} \left[ \left( \begin{array}{c} g_1\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1} \end{array}\right) \right. \nonumber \\&\left. \quad + \left( \begin{array}{cc} \gamma A_{1}^{\mathrm{T}}A_{2} &{} \gamma A_{1}^{\mathrm{T}}A_{3} \\ 0 &{} \gamma A_{2}^{\mathrm{T}}A_{3} \\ 0 &{} 0 \end{array} \right) \left( \begin{array}{c} x_{2}^{k}-x_{2}^{k+1} \\ x_{3}^{k}-x_{3}^{k+1} \end{array} \right) \right] \geqslant 0. \end{aligned}$$
(3.2)

The key step in our proof is to bound the following two terms:

$$\begin{aligned}&\left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}A_{1}^{\mathrm{T}}\left( A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right) \quad \text{ and }\\&\quad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) . \end{aligned}$$

For the first term, we have

$$\begin{aligned}&\qquad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}A_{1}^{\mathrm{T}}\left[ A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right] \\&=\left[ (A_{1}x_{1}-b)\!-\!\left( A_{1}x_{1}^{k+1}\!-\!b\right) \right] ^{\mathrm{T}} \left[ \left( \!-\!A_{2}x_{2}^{k+1}-A_{3}x_{3}^{k+1}\right) -\left( -A_{2}x_{2}^{k}-A_{3}x_{3}^{k}\right) \right] \\&= \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k} -b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left( \left\| A_{1}x_{1}^{k+1}\!+\!A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1} \!-\!b\right\| ^{2} \!-\!\left\| A_{1}x_{1}^{k+1}+\,A_{2}x_{2}^{k}+A_{3}x_{3}^{k}\!-\!b\right\| ^{2}\right) \\&= \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2\gamma ^{2}}\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2} -\frac{1}{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}, \end{aligned}$$

where in the second equality we used the identity (2.17), and the last equality follows from the updating formula for \(\lambda ^{k+1}\) in (2.5).

For the second term, we have

$$\begin{aligned}&\qquad \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}A_{2}^{\mathrm{T}}A_{3} \left( x_{3}^{k}-x_{3}^{k+1}\right) \\&=\left( \left( A_{1}x_{1}+A_{2}x_{2}-b\right) - \left( A_{1}x_{1}+A_{2}x_{2}^{k+1}-b\right) \right) ^{\mathrm{T}} \left( \left( -A_{3}x_{3}^{k+1}\right) -\left( -A_{3}x_{3}^{k}\right) \right) \\&=\frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} -\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \\&\leqslant \frac{1}{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1} +A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad +\,\frac{1}{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}, \end{aligned}$$

where in the second equality we applied the identity (2.17).

Therefore, we have

$$\begin{aligned}&\qquad \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\gamma A_{1}^{\mathrm{T}}\left( A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) +A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \right) \nonumber \\&\quad +\,\left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k}-x_{3}^{k+1}\right) \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2}+\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \nonumber \\&\quad -\,\frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.3)

Combining (3.3), (3.2), and (2.5), it holds for any \(\lambda \in \mathbb {R}^p\) that

$$\begin{aligned}&\qquad \left( \begin{array}{c} x_{1}-x_{1}^{k+1} \\ x_{2}-x_{2}^{k+1} \\ x_{3}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array}\right) \nonumber \\&\quad +\, \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\mathrm{T}}\left( \lambda ^{k+1}-\lambda ^{k}\right) \nonumber \\&\quad +\,\frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^{2} +\frac{\gamma }{2}\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(3.4)

Using the convexity of \(f_1\) and the identity

$$\begin{aligned} \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\mathrm{T}}(\lambda ^{k+1}-\lambda ^{k}) +\frac{1}{2\gamma }\Vert \lambda ^{k+1}\!-\!\lambda ^{k}\Vert ^{2} \!=\! \frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) , \end{aligned}$$

letting \(u=u^{*}\) in (3.4), and applying the facts that [invoking (2.13) and (2.14)]

$$\begin{aligned}&f_{2}\left( x_{2}^{*}\right) -f_{2}\left( x_{2}^{k+1}\right) -\frac{\sigma _2}{2}\left\| x_{2}^{*}-x_{2}^{k+1}\right\| ^{2} \geqslant \left( x_{2}^{*}-x_{2}^{k+1}\right) ^{\mathrm{T}} g_{2}\left( x_{2}^{k+1}\right) ,\\&f_{3}\left( x_{3}^{*}\right) -f_{3}\left( x_{3}^{k+1}\right) -\frac{\sigma _3}{2}\left\| x_{3}^{*}-x_{3}^{k+1}\right\| ^{2} \geqslant \left( x_{3}^{*}-x_{3}^{k+1}\right) ^{\mathrm{T}} g_{3}\left( x_{3}^{k+1}\right) , \end{aligned}$$

and

$$\begin{aligned}&\quad \frac{\gamma }{2}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \\&=\frac{\gamma }{2}\left\| A_{2}\left( x_{2}^{k+1}-x_{2}^{*}\right) +A_{3}\left( x_{3}^{k+1}-x_{3}^{*}\right) \right\| ^{2}\\&\leqslant \gamma \left( \lambda _{\max }\left( A_{2}^{\mathrm{T}}A_{2}\right) \left\| x_{2}^{k+1}-x_{2}^{*}\right\| ^{2}+\lambda _{\max }\left( A_{3}^{\mathrm{T}}A_{3}\right) \left\| x_{3}^{k+1}-x_{3}^{*}\right\| ^{2}\right) , \end{aligned}$$

we obtain

$$\begin{aligned}&f(u^{*})-f(u^{k+1})+\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array}\right) \\&\qquad +\,\frac{1}{2\gamma }\left( \Vert \lambda -\lambda ^{k}\Vert ^{2}-\Vert \lambda -\lambda ^{k+1}\Vert ^{2}\right) \\&\qquad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\qquad +\left( \gamma \lambda _{\max }\left( A_{2}^{\mathrm{T}}A_{2}\right) -\frac{\sigma _2}{2}\right) \left\| x_{2}^{k+1}-x_{2}^{*}\right\| ^{2}{+}\left( \gamma \lambda _{\max }\left( A_{3}^{\mathrm{T}}A_{3}\right) -\frac{\sigma _3}{2}\right) \left\| x_{3}^{k+1}-x_{3}^{*}\right\| ^{2}\\&\qquad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\quad \geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$

This together with the facts that \(\gamma \lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})-\frac{\sigma _2}{2} \leqslant 0\) and \(\gamma \lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})-\frac{\sigma _3}{2}\leqslant 0\) implies the desired inequality (3.1).

Now, we are ready to present the O(1 / t) ergodic convergence rate of the ADMM.

Theorem 3.2

Assume that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \). Let \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (2.2)–(2.5) from given \((x_2^{k},x_3^{k},\lambda ^{k})\). For any integer \(t>0\), let \(\bar{u}^{t}=(\bar{x}_{1}^{t}, \bar{x}_{2}^{t}, \bar{x}_{3}^{t})\) and \(\bar{\lambda }^{t}\) be defined as

$$\begin{aligned}&\bar{x}_{1}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{1}^{k+1},\quad \bar{x}_{2}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{2}^{k+1},\quad \bar{x}_{3}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{3}^{k+1},\\&\quad \bar{\lambda }^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}\lambda ^{k+1}. \end{aligned}$$

Then, for any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), we have

$$\begin{aligned} 0&\leqslant f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right\| \\&\leqslant \frac{\gamma }{2(t+1)}\left\| A_3 x_{3}^{*}-A_3 x_{3}^{0}\right\| ^{2}+\frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)}\\&\quad +\,\frac{\gamma }{2(t+1)}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{0}+A_{3}x_{3}^{0}-b\right\| ^{2}. \end{aligned}$$

Note that this also implies that both the error of the objective function value and the residual of the equality constraint converge to 0 with convergence rate O(1 / t), i.e.,

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)|=O(1/t),\quad \text{ and } \quad \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right\| =O(1/t).\nonumber \\ \end{aligned}$$
(3.5)

Proof

Because \((u^{k},\lambda ^k)\in \Omega \), it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\geqslant 0\). Using Lemma 3.1, the last equation of (2.12), and invoking the convexity of function \(f(\cdot )\), we have

$$\begin{aligned}&\qquad \,\, f(u^*)-f(\bar{u}^t) + \lambda ^\mathrm{T}\left( A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b\right) \nonumber \\&\quad =f(u^{*})-f(\bar{u}^{t})+\left( \begin{array}{c} x_{1}^{*}-\bar{x}_{1}^{t} \\ x_{2}^{*}-\bar{x}_{2}^{t} \\ x_{3}^{*}-\bar{x}_{3}^{t} \\ \lambda -\bar{\lambda }^{t}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\bar{\lambda }^{t} \\ -A_{2}^{\mathrm{T}}\bar{\lambda }^{t} \\ -A_{3}^{\mathrm{T}}\bar{\lambda }^{t} \\ A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b \end{array} \right) \nonumber \\&\quad \geqslant \frac{1}{t+1}\sum \limits _{k=0}^{t}\left[ f(u^{*})-f(u^{k+1})\right. \nonumber \\&\qquad \left. +\,\left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \\ \lambda -\lambda ^{k+1}\end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} -A_{1}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{2}^{\mathrm{T}}\lambda ^{k+1} \\ -A_{3}^{\mathrm{T}}\lambda ^{k+1} \\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \end{array} \right) \right] \nonumber \\&\quad \geqslant \frac{1}{t+1}\sum \limits _{k=0}^{t}\left[ \frac{1}{2\gamma } \left( \Vert \lambda -\lambda ^{k+1}\Vert ^{2}-\Vert \lambda -\lambda ^{k}\Vert ^{2}\right) \right. \nonumber \\&\qquad \,+\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \nonumber \\&\qquad \,\left. +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\right) \right] \nonumber \\&\quad \geqslant -\frac{1}{2\gamma (t+1)}\left\| \lambda -\lambda ^{0}\right\| ^{2}-\frac{\gamma }{2(t+1)}\Vert A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{0}-b\Vert ^{2}\nonumber \\&\qquad \,-\,\frac{\gamma }{2(t+1)}\Vert A_{1}x_{1}^{*}+A_{2}x_{2}^{0}+A_{3}x_{3}^{0}-b\Vert ^{2}. \end{aligned}$$
(3.6)

Note that this inequality holds for all \(\lambda \in \mathbb {R}^{p}\). From weak duality of (2.1), we obtain

$$\begin{aligned} 0\geqslant f(u^*)-f(\bar{u}^t)+(\lambda ^*)^\mathrm{T}\left( A_{1}\bar{x}_{1}^{t}+A_{2} \bar{x}_{2}^{t}+A_{3}\bar{x}_{3}^{t}-b\right) , \end{aligned}$$

which implies that

$$\begin{aligned} 0 \leqslant f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t} +A_{3}\bar{x}_{3}^{t}-b\right\| , \end{aligned}$$
(3.7)

because \(\rho =\Vert \lambda ^*\Vert +1\). Moreover, by letting \(\lambda :=-\rho (A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b)/\Vert A_1\bar{x}_1^t+A_2\bar{x}_2^t+A_3\bar{x}_3^t-b\Vert _2\) in (3.6), and using \(A_1x^*_1+A_2x^*_2+A_3x^*_3=b\), we obtain

$$\begin{aligned}&\qquad f(\bar{u}^{t})-f(u^{*})+\rho \left\| A_{1}\bar{x}_{1}^{t}+A_{2}\bar{x}_{2}^{t} +A_{3}\bar{x}_{3}^{t}-b\right\| \nonumber \\&\leqslant \frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)} +\frac{\gamma }{2(t+1)}\left\| A_3\left( x_{3}^{*}-x_{3}^{0}\right) \right\| ^{2}\nonumber \\&\quad +\,\frac{\gamma }{2(t+1)}\left\| A_{2}\left( x_{2}^{*}-x_2^0\right) +A_3\left( x^*_3-x_{3}^{0}\right) \right\| ^{2}. \end{aligned}$$
(3.8)

We now define the function

$$\begin{aligned} v(\xi ) = \min \{ f(u) | A_1 x_1 + A_2 x_2 + A_3 x_3 - b = \xi , x_1\in \mathcal {X}_1, x_2\in \mathcal {X}_2, x_3\in \mathcal {X}_3 \}. \end{aligned}$$

It is easy to verify that v is convex, \(v(0)=f(u^*)\), and \(\lambda ^* \in \partial v(0)\). Therefore, from the convexity of v, it holds that

$$\begin{aligned} v(\xi ) \geqslant v(0) + \langle \lambda ^*, \xi \rangle \geqslant f(u^*) - \Vert \lambda ^*\Vert \Vert \xi \Vert . \end{aligned}$$
(3.9)

Letting \(\bar{\xi } = A_1 \bar{x}_1^t + A_2 \bar{x}_2^t + A_3\bar{x}_3^t - b\), we have \(f(\bar{u}^t) \geqslant v(\bar{\xi })\). Therefore, by denoting the constant

$$\begin{aligned} C:=\frac{\gamma }{2}\left\| A_3x_3^*-A_3x_3^0\right\| ^2 + \frac{\Vert \lambda ^0\Vert ^2}{\gamma } + \frac{\gamma }{2}\left\| A_1x_1^*+A_2x_2^0+A_3x_3^0-b\right\| ^2, \end{aligned}$$

and combining (3.7), (3.8), and (3.9), we get

$$\begin{aligned} \frac{C+\rho ^2/\gamma }{t+1} - \rho \Vert \bar{\xi }\Vert \geqslant f(\bar{u}^t) - f(u^*) \geqslant - \Vert \lambda ^*\Vert \Vert \bar{\xi } \Vert , \end{aligned}$$

which, by using \(\rho =\Vert \lambda ^*\Vert +1\), yields

$$\begin{aligned} \Vert A_1 \bar{x}_1^t + A_2 \bar{x}_2^t + A_3\bar{x}_3^t - b\Vert = \Vert \bar{\xi }\Vert \leqslant \frac{C+\rho ^2/\gamma }{t+1}. \end{aligned}$$
(3.10)

Moreover, by combining (3.7), (3.8), and (3.10), one obtains that

$$\begin{aligned} -\frac{\rho C+\rho ^3/\gamma }{t+1} \leqslant f(\bar{u}^t) - f(u^*) \leqslant \frac{C+\rho ^2/\gamma }{t+1}. \end{aligned}$$
(3.11)

As a result, (3.5) follows immediately from (3.10) and (3.11).
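The ergodic O(1 / t) behavior can also be observed numerically. The following sketch (not part of the paper; the scalar instance, data, and step size are our own illustrative choices, consistent with the bound of Theorem 3.2) runs the 3-block ADMM on \(\min \sum _i (x_i-c_i)^2\) subject to \(x_1+x_2+x_3=0\) and tracks the ergodic averages:

```python
import numpy as np

# Toy 3-block instance: min sum_i (x_i - c_i)^2  s.t.  x_1 + x_2 + x_3 = 0,
# with A_i = 1 and sigma_i = 2, so Theorem 3.2 permits gamma <= 1; take 0.5.
c, gamma, T = np.array([1.0, 2.0, 3.0]), 0.5, 2000
x, lam, xbar = np.zeros(3), 0.0, np.zeros(3)
fstar = 12.0                # optimum is x* = (-1, 0, 1), f* = 4 + 4 + 4
for k in range(T):
    for i in range(3):
        s = x.sum() - x[i]                   # other blocks, Gauss-Seidel order
        # argmin (x_i - c_i)^2 + (gamma/2)(x_i + s - lam/gamma)^2
        x[i] = (2.0 * c[i] + lam - gamma * s) / (2.0 + gamma)
    lam -= gamma * x.sum()                   # multiplier update (b = 0)
    xbar = (k * xbar + x) / (k + 1)          # running ergodic average
f_err = abs(np.sum((xbar - c)**2) - fstar)
resid = abs(xbar.sum())
print(f_err, resid)   # both shrink roughly like O(1/t)
```

In this run, both the ergodic objective error and the ergodic constraint residual fall within the bounds of Theorem 3.2 for \(t = T-1\).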

Therefore, we have established the O(1 / t) convergence rate of the ADMM (2.2)–(2.5) in an ergodic sense. Our proof is readily extended to the case of N-block ADMM (1.2). The following theorem shows the O(1 / t) convergence rate of N-block ADMM (1.2). We omit the proof here for the sake of succinctness.

Theorem 3.3

Assume that

$$\begin{aligned} \gamma \leqslant \min _{i=2,\cdots ,N-1}\left\{ \frac{2\sigma _{i}}{(2N-i)(i-1)\lambda _{\max }(A_{i}^{\mathrm{T}}A_{i})}, \frac{2\sigma _{N}}{(N-2)(N+1)\lambda _{\max }(A_{N}^{\mathrm{T}}A_{N})}\right\} , \end{aligned}$$

where \(\sigma _{i}\) is the strong convexity parameter of \(f_{i}\), \(i=2,\cdots ,N\). Let \((x_{1}^{k+1},x_{2}^{k+1},x_{3}^{k+1},\cdots ,x_{N}^{k+1}, \lambda ^{k+1})\in \Omega \) be generated by the N-block ADMM (1.2). For any integer \(t>0\), we define

$$\begin{aligned} \bar{x}_{i}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{i}^{k+1}, \quad 1\leqslant i \leqslant N, \quad \quad \bar{\lambda }^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}\lambda ^{k+1}. \end{aligned}$$

Then, for \(\rho :=\Vert \lambda ^*\Vert +1\), it holds that

$$\begin{aligned}&\quad \sum \limits _{i=1}^{N}\left( f_{i}\left( \bar{x}_{i}^{t}\right) \,-\,f_{i}\left( x_{i}^{*}\right) \right) \,+\,\rho \left\| \sum \limits _{i=1}^{N}A_{i}\bar{x}_{i}^{t}\,-\,b\right\| \\&\leqslant \frac{\gamma }{2(t\,+\,1)}\sum \limits _{i=1}^{N-1} \left\| \sum \limits _{m=i+1}^{N}A_{m}\left( x_{m}^{0}-x_{m}^{*}\right) \right\| ^{2} +\frac{\rho ^{2}+\Vert \lambda ^0\Vert ^2}{\gamma (t+1)}. \end{aligned}$$

As in Theorem 3.2, this also implies that the N-block ADMM (1.2) converges with rate O(1 / t) in terms of both the error of the objective function value and the residual of the equality constraint, i.e., it holds that

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)| = O(1/t), \quad \text{ and } \quad \left\| \sum \limits _{i=1}^{N}A_{i}\bar{x}_{i}^{t}-b\right\| =O(1/t). \end{aligned}$$

4 Non-ergodic Convergence Rate of ADMM

In this section, we prove an o(1 / k) non-ergodic convergence rate for ADMM (2.2)–(2.5).

Let us first observe the following (see also Lemma 4.1 in [24]). Suppose at the \((k+1)\)-th iteration of ADMM (2.2)–(2.5), we have

$$\begin{aligned} \left\{ \begin{array}{l} A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}=0,\\ A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}=0,\\ A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b=0. \end{array}\right. \end{aligned}$$
(4.1)

Then, (2.9)–(2.11) would immediately lead to

$$\begin{aligned} \left\{ \begin{array}{ll} \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1} \right] \geqslant 0, &{}\quad \forall x_{1}\in \mathcal {X}_{1}, \\ \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_2\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1} \right] \geqslant 0, &{}\quad \forall x_{2}\in \mathcal {X}_{2}, \\ \left( x_{3}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -A_{3}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0, &{}\quad \forall x_{3}\in \mathcal {X}_{3}. \end{array}\right. \end{aligned}$$

In other words, if (4.1) is satisfied, then \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) is already an optimal solution of (2.1). It is therefore natural to use a residual of the linear system (4.1) as an optimality measure. Such a measure, denoted by \(R_{k+1}\), is defined as follows:

$$\begin{aligned} R_{k+1}&:=\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}+2\left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}\nonumber \\&\quad +\,3\left\| A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}\right\| ^{2}. \end{aligned}$$
(4.2)

In the sequel, we show that \(R_k\) converges to 0 at the rate o(1 / k). Note that this gives the convergence rate of ADMM (2.2)–(2.5) in a non-ergodic sense.
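For a concrete stopping test, (4.2) can be evaluated directly from two consecutive iterates. The sketch below is our own (the function name and the NumPy-array interface are assumptions, not part of the paper):

```python
import numpy as np

def residual(A1, A2, A3, b, x1_new, x2_new, x3_new, x2_old, x3_old):
    """Optimality measure R_{k+1} of (4.2); it is zero exactly when
    the stopping system (4.1) is satisfied at iteration k+1."""
    feas = A1 @ x1_new + A2 @ x2_new + A3 @ x3_new - b
    d2 = A2 @ (x2_new - x2_old)
    d3 = A3 @ (x3_new - x3_old)
    return feas @ feas + 2 * (d2 @ d2) + 3 * (d3 @ d3)
```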

We first show that \(R_k\) is non-increasing.

Lemma 4.1

Assume \(\gamma \leqslant \min \{\frac{\sigma _2}{\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\}\). Let the sequence \(\{x_1^k,x_2^k,x_3^k,\lambda ^k\}\) be generated by ADMM (2.2)–(2.5). It holds that \(R_k\) defined in (4.2) is non-increasing, i.e.,

$$\begin{aligned} R_{k+1} \leqslant R_{k}, \quad k=0,1,2,\cdots . \end{aligned}$$
(4.3)

Proof

Letting \(x_{1}=x_{1}^{k}\) in (2.6) yields

$$\begin{aligned} \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k}+\gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \geqslant 0, \end{aligned}$$

with \(g_1 \in \partial f_1\), which further implies that

$$\begin{aligned}&\qquad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}g_{1}\left( x_{1}^{k+1}\right) \nonumber \\&\leqslant \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left( -A_{1}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ \gamma A_{1}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1} +A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&=\left( A_{1}x_{1}^{k}\,-\,A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\,\,\left( \,-\,\lambda ^{k}\right) \,+\,\gamma \left( A_{1}x_{1}^{k}\right. \nonumber \\&\quad \left. -\,A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\,\left( A_{1}x_{1}^{k+1}\,+\,A_{2}x_{2}^{k}\,+\,A_{3}x_{3}^{k}\,-\,b\right) \nonumber \\&=\left( A_{1}x_{1}^{k}-A_{1}x_{1}^{k+1}\right) ^{\mathrm{T}}\left( -\lambda ^{k}\right) +\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad -\,\left. \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{1}x_{1}^{k}-A_{1}x_{1}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.4)

where the last equality is due to the identity (2.17). Letting \(x_{1}=x_{1}^{k+1}\) in (2.9) with \(k+1\) changed to k yields

$$\begin{aligned}&\quad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k}\right) -A_{1}^{\mathrm{T}}\lambda ^{k}+\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +\gamma A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \\&\geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\qquad \left( x_{1}^{k}-x_{1}^{k+1}\right) ^{\mathrm{T}} g_{1}\left( x_{1}^{k}\right) \nonumber \\&\leqslant \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left( -A_{1}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\gamma \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{1}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&=\left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}\left( -\lambda ^{k}\right) \nonumber \\&\quad +\, \gamma \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}\left[ A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&\leqslant \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})\nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right\| ^{2} + \left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) +A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) \nonumber \\&\leqslant \left( A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k}) \nonumber \\&\quad +\, \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}-A_{1}x_{1}^{k}\right\| ^{2} + 2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2} +2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) .\nonumber \\ \end{aligned}$$
(4.5)

Combining (4.4) and (4.5) gives

$$\begin{aligned}&\quad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -g_{1}\left( x_{1}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} - \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\qquad +\left. 2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2}+2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}\right) . \end{aligned}$$
(4.6)

Letting \(x_{2}=x_{2}^{k}\) in (2.7) yields

$$\begin{aligned} \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\qquad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}g_{2}\left( x_{2}^{k+1}\right) \nonumber \\&\leqslant \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left( -A_{2}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&\quad +\,\left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ \gamma A_{2}^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \right] \nonumber \\&=\left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}(-\lambda ^{k})\nonumber \\&\quad +\, \gamma \left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right) \nonumber \\&=\left( A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right) ^{\mathrm{T}}(-\lambda ^{k})+\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad -\,\left. \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} -\left\| A_{2}x_{2}^{k}-A_{2}x_{2}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.7)

where the last equality is due to the identity (2.17). Letting \(x_{2}=x_{2}^{k+1}\) in (2.10) with \(k+1\) changed to k yields

$$\begin{aligned} \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k}\right) -A_{2}^{\mathrm{T}}\lambda ^{k}+\gamma A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \geqslant 0, \end{aligned}$$

which further implies that

$$\begin{aligned}&\quad \left( x_{2}^{k}-x_{2}^{k+1}\right) ^{\mathrm{T}} g_{2}\left( x_{2}^{k}\right) \nonumber \\&\leqslant \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left( -A_{2}^{\mathrm{T}}\lambda ^{k}\right) + \gamma \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ A_{2}^{\mathrm{T}}A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right] \nonumber \\&=\left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})+ \gamma \left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}\left( A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right) \nonumber \\&\leqslant \left( A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right) ^{\mathrm{T}}(-\lambda ^{k})+\frac{\gamma }{2}\left( \left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right) .\nonumber \\ \end{aligned}$$
(4.8)

Combining (4.7) and (4.8) gives

$$\begin{aligned}&\quad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} - \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \nonumber \\&\quad \left. +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right) . \end{aligned}$$
(4.9)

Letting \(x_3=x_3^k\) in (2.11), letting \(x_3=x_3^{k+1}\) in (2.11) with \(k+1\) changed to k, and adding the two resulting inequalities yields

$$\begin{aligned}&\qquad \left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3} \left( x_{3}^{k}\right) \right] \nonumber \\&\leqslant \left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left( A_{3}^{\mathrm{T}}\lambda ^{k+1}-A_{3}^{\mathrm{T}}\lambda ^{k}\right) \nonumber \\&=\left( A_{3}x_{3}^{k+1}-A_{3}x_{3}^{k}\right) ^{\mathrm{T}}\left( \lambda ^{k+1}-\lambda ^{k}\right) \nonumber \\&=\gamma \left( A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) \nonumber \\&=\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}{-}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \right. \nonumber \\&\quad \left. -\,\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right) , \end{aligned}$$
(4.10)

where the last equality is due to the identity (2.17).

Combining (4.6), (4.9), and (4.10) yields

$$\begin{aligned}&\qquad \left( x_{1}^{k+1}-x_{1}^{k}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -g_{1}\left( x_{1}^{k}\right) \right] + \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \nonumber \\&\quad + \,\left( x_{3}^{k+1}\,-\,x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) \,-\,g_{3}\left( x_{3}^{k}\right) \right] \nonumber \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\right. \nonumber \\&\quad \left. -\, \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+2\left\| A_{2}\left( x_{2}^{k-1}-x_{2}^{k}\right) \right\| ^{2} \right. \nonumber \\&\quad +\,2\left\| A_{3}\left( x_{3}^{k-1}-x_{3}^{k}\right) \right\| ^{2}+\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}\nonumber \\&\quad -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \nonumber \\&\quad +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}+ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \nonumber \\&\quad \left. -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2} \right] \nonumber \\&= \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k}\,+\,A_{2}x_{2}^{k}\,+\,A_{3}x_{3}^{k}\,-\,b\right\| ^{2}\,+\,2\left\| A_{2}\left( x_{2}^{k-1}\,-\,x_{2}^{k}\right) \right\| ^{2}\right. \nonumber \\&\quad +\,3\left\| A_{3}\left( x_{3}^{k-1}\,-\,x_{3}^{k}\right) \right\| ^{2} \nonumber \\&\quad \left. -\,\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right] \nonumber \\&= \frac{\gamma }{2} \left[ R_{k}-R_{k+1}+2\left\| A_{2}x_{2}^{k+1}-A_{2}x_{2}^{k}\right\| ^{2}+2\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] . \end{aligned}$$
(4.11)

Note that (2.15) and (2.16) imply that

$$\begin{aligned}&\left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] \geqslant \sigma _2 \Vert x_{2}^{k+1}-x_{2}^{k}\Vert ^{2},\nonumber \\&\left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3}\left( x_{3}^{k}\right) \right] \geqslant \sigma _3 \Vert x_{3}^{k+1}-x_{3}^{k}\Vert ^{2}. \end{aligned}$$
(4.12)

Combining (4.11) with (4.12) and using the fact that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), it is easy to see that \(R_{k+1} \leqslant R_k\) for \(k=0,1,2,\cdots \).

We are now ready to present the o(1 / k) non-ergodic convergence rate of the ADMM (2.2)–(2.5).

Theorem 4.2

Assume \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \). Let the sequence \(\{x_1^{k},x_2^k,x_3^k,\lambda ^k\}\) be generated by ADMM (2.2)–(2.5). Then \(\sum _{k=1}^\infty R_k < +\infty \) and \(R_k = o(1/k)\).

Proof

Combining (4.9) and (4.10) yields

$$\begin{aligned}&\qquad \left( x_{2}^{k+1}-x_{2}^{k}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -g_{2}\left( x_{2}^{k}\right) \right] +\left( x_{3}^{k+1}-x_{3}^{k}\right) ^{\mathrm{T}}\left[ g_{3}\left( x_{3}^{k+1}\right) -g_{3}\left( x_{3}^{k}\right) \right] \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2} \right. \\&\quad +\,\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}+ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b\right\| ^{2}\\&\quad \left. -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] \\&= \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2} -\,\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2} \right] \\&=\frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^2 +\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-R_{k+1} \right. \\&\quad \left. +\,2\left\| A_2x_2^{k+1}-A_2x_2^k\right\| ^2 + 3\left\| A_3x_3^k-A_3x_3^{k+1}\right\| ^2 \right] \\&\leqslant \frac{\gamma }{2}\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^2 +\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\quad \left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}-R_{k+1}\right] \\&\quad +\,\gamma \lambda _{\max }\left( A_2^\mathrm{T}A_2\right) \left\| x_{2}^{k+1}-x_{2}^{k}\right\| ^{2} + \frac{3\gamma }{2}\lambda _{\max }\left( A_3^\mathrm{T}A_3\right) \left\| x_{3}^{k+1}-x_{3}^{k}\right\| ^{2}. \end{aligned}$$

Using (4.12) and the assumption that \(\gamma \leqslant \min \left\{ \frac{\sigma _2}{2\lambda _{\max }(A_{2}^{\mathrm{T}}A_{2})}, \frac{\sigma _3}{2\lambda _{\max }(A_{3}^{\mathrm{T}}A_{3})}\right\} \), we obtain

$$\begin{aligned} R_{k+1}\leqslant & {} \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\nonumber \\&-\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}. \end{aligned}$$
(4.13)

From the optimality conditions (2.12) and the convexity of f, it follows that

$$\begin{aligned} f(u^{*})-f\left( u^{k+1}\right)\leqslant & {} \left( x_{1}^{*}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left( A_{1}^{\mathrm{T}}\lambda ^{*}\right) +\left( x_{2}^{*}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left( A_{2}^{\mathrm{T}}\lambda ^{*}\right) \nonumber \\&+\left( x_{3}^{*}-x_{3}^{k+1}\right) ^{\mathrm{T}}\left( A_{3}^{\mathrm{T}}\lambda ^{*}\right) . \end{aligned}$$
(4.14)

By combining (3.1) and (4.14), we have

$$\begin{aligned}&\qquad \left( \begin{array}{c} x_{1}^{*}-x_{1}^{k+1} \\ x_{2}^{*}-x_{2}^{k+1} \\ x_{3}^{*}-x_{3}^{k+1} \end{array} \right) ^{\mathrm{T}} \left( \begin{array}{c} A_{1}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \\ A_{2}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \\ A_{3}^{\mathrm{T}}\left( \lambda ^{*}-\lambda ^{k+1}\right) \end{array} \right) +\frac{1}{2\gamma }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^{2}\nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\frac{\gamma }{2}\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \frac{\gamma }{2}\left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(4.15)

Note that the first term in (4.15) is equal to

$$\begin{aligned}&\quad -\left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right) ^{\mathrm{T}} \left( \lambda ^{*}-\lambda ^{k+1}\right) \\&=\frac{1}{\gamma }\left( \lambda ^{k+1}-\lambda ^{k}\right) ^{\mathrm{T}} \left( \lambda ^{*}-\lambda ^{k+1}\right) \\&=\frac{1}{2\gamma }\left( \Vert \lambda ^{*}-\lambda ^{k}\Vert ^{2} -\Vert \lambda ^{k+1}-\lambda ^{k}\Vert ^{2}-\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^{2}\right) . \end{aligned}$$

Therefore, (4.15) can be rearranged as

$$\begin{aligned}&\qquad \frac{1}{\gamma ^{2}}\left( \Vert \lambda ^{*}-\lambda ^{k}\Vert ^{2}-\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^{2}\right) \nonumber \\&\quad +\,\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\quad +\,\left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}-\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \nonumber \\&\geqslant \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}. \end{aligned}$$
(4.16)

By (4.13) and (4.16), we get that

$$\begin{aligned} \sum \limits _{k=1}^{\infty }R_{k+1}\leqslant & {} \sum \limits _{k=1}^{\infty }\left[ \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}+\left\| A_{3}x_{3}^{k-1}-A_{3}x_{3}^{k}\right\| ^{2}\right. \\&\left. -\left\| A_{3}x_{3}^{k}-A_{3}x_{3}^{k+1}\right\| ^{2}\right] \\\leqslant & {} \sum \limits _{k=1}^{\infty } \left\| A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2} + \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}\\\leqslant & {} \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}+\sum \limits _{k=1}^{\infty } \left[ \left( \left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k}-b\right\| ^{2}\right. \right. \\&\left. \left. -\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \right. \\&+\left( \left\| A_{1}x_{1}^{*}{+}A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b\right\| ^{2}{-}\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{k+1}{+}A_{3}x_{3}^{k+1}-b\right\| ^{2}\right) \\&\left. +\,\frac{1}{\gamma ^{2}}\left( \left\| \lambda ^{*}-\lambda ^{k} \right\| ^{2}-\left\| \lambda ^{*}-\lambda ^{k+1}\right\| ^{2}\right) \right] \\\leqslant & {} \left\| A_{3}x_{3}^{0}-A_{3}x_{3}^{1}\right\| ^{2}+\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{1}-b\right\| ^{2}\\&+\left\| A_{1}x_{1}^{*}+A_{2}x_{2}^{1}+A_{3}x_{3}^{1}-b\right\| ^{2}+\frac{1}{\gamma ^{2}}\left\| \lambda ^{*}-\lambda ^{1}\right\| ^{2}. \end{aligned}$$

Note that we have proved that \(R_k\) is monotonically non-increasing, and \(\sum _{k=1}^\infty R_k<+\infty \). As observed in Lemma 1.2 of [32], one has

$$\begin{aligned} k R_{2k} \leqslant R_k + R_{k+1} + \cdots + R_{2k} \rightarrow 0, \quad \text{ as } k \rightarrow \infty , \end{aligned}$$

and therefore \(R_k = o(1/k)\).
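This last step can be illustrated numerically: for any non-increasing summable sequence, the tail sum \(R_k+\cdots +R_{2k}\) dominates \(k R_{2k}\) and vanishes as \(k\rightarrow \infty \). Below, \(R_k = 1/k^2\) is our own toy choice of such a sequence:

```python
# R[j-1] stores R_j for the toy sequence R_k = 1/k**2, which is
# non-increasing and summable, as guaranteed for the ADMM residuals.
R = [1.0 / k**2 for k in range(1, 4001)]

def k_times_R2k(k):
    """Return (k * R_{2k}, tail sum R_k + ... + R_{2k})."""
    tail = sum(R[k - 1:2 * k])
    return k * R[2 * k - 1], tail
```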

Remark 4.3

We remark here that, using similar arguments, it is easy to see that (4.13) and (4.16), together with the monotonicity of \(R_k\), also imply that \(R_k\) has a non-asymptotic sublinear convergence rate of O(1 / k).

Note that our analysis extends easily to the N-block ADMM (1.2). The results are summarized in the following theorem, and the proof is omitted for the sake of succinctness.

Theorem 4.4

Assume that

$$\begin{aligned} \gamma \leqslant \min _{i=2,\cdots ,N-1}\left\{ \frac{2\sigma _{i}}{(2N-i)(i-1)\lambda _{\max }\left( A_{i}^{\mathrm{T}}A_{i}\right) }, \frac{2\sigma _{N}}{(N-2)(N+1)\lambda _{\max }\left( A_{N}^{\mathrm{T}}A_{N}\right) }\right\} . \end{aligned}$$

Let \((x_{1}^{k+1},x_{2}^{k+1},x_{3}^{k+1},\cdots ,x_{N}^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (1.2). Then \(\sum _{k=1}^\infty R_k < +\infty \) and \(R_k=o(1/k)\), where \(R_k\) is defined as

$$\begin{aligned} R_{k+1} := \left\| \sum _{i=1}^N A_ix_i^{k+1}-b\right\| ^2 + \sum _{i=2}^N\frac{(2N-i)(i-1)}{2}\left\| A_ix_i^k-A_ix_i^{k+1}\right\| ^2. \end{aligned}$$
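For \(N=3\) the weights \((2N-i)(i-1)/2\) reduce to 2 and 3, recovering (4.2). A sketch of this generalized residual (our own helper, assuming NumPy arrays) is as follows:

```python
import numpy as np

def residual_N(As, b, xs_new, xs_old):
    """R_{k+1} for the N-block ADMM: constraint violation plus
    weighted successive differences, weights (2N-i)(i-1)/2, i=2..N."""
    N = len(As)
    feas = sum(A @ x for A, x in zip(As, xs_new)) - b
    val = feas @ feas
    for i in range(2, N + 1):
        d = As[i - 1] @ (xs_new[i - 1] - xs_old[i - 1])
        val += 0.5 * (2 * N - i) * (i - 1) * (d @ d)
    return val
```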

5 Conclusions

In this paper, we analyzed the sublinear convergence rate of the standard Gauss–Seidel multi-block ADMM in both the ergodic and the non-ergodic sense. These are the first sublinear convergence rate results for the standard multi-block ADMM. Using the techniques developed in this paper, we can also analyze the convergence rates of some variants of the standard multi-block ADMM, such as the ones studied in [32] and [33], where the primal variables are updated in a Jacobi manner; we plan to pursue this direction in future research.

We remark here that the techniques developed in this paper lead to a very simple proof of the O(1 / t) complexity of the two-block ADMM in terms of the objective error and constraint violation of (1.1) (\(N=2\)). Specifically, when \(N=2\), let \((x_1^k,x_2^k;\lambda ^k)\) denote the iterates generated by the two-block ADMM (1.2), and define

$$\begin{aligned} \bar{x}_1^t = \frac{1}{t+1}\sum _{k=0}^t x_1^{k+1}, \quad \bar{x}_2^t = \frac{1}{t+1}\sum _{k=0}^t x_2^{k+1}, \quad \bar{\lambda }^t = \frac{1}{t+1}\sum _{k=0}^t \lambda ^{k+1}. \end{aligned}$$

We can prove that

$$\begin{aligned}&|f_1(\bar{x}_1^t) + f_2(\bar{x}_2^t) - f_1(x_1^*) - f_2(x_2^*) | = O(1/t), \quad \nonumber \\&\quad \text{ and } \quad \Vert A_1\bar{x}_1^t+A_2\bar{x}_2^t-b\Vert =O(1/t), \end{aligned}$$
(5.1)

i.e., the convergence rate of the two-block ADMM is O(1 / t) in terms of both the objective error and the constraint violation. Note that for \(N=2\), \(\gamma \) can be any positive number, and there is no need to impose strong convexity on either \(f_1\) or \(f_2\). The proof of this result is as follows.

First, when \(N=2\), the optimality conditions (2.9)–(2.11) reduce to

$$\begin{aligned} \left( x_{1}-x_{1}^{k+1}\right) ^{\mathrm{T}}\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\mathrm{T}}\lambda ^{k+1}+\gamma A_{1}^{\mathrm{T}}A_{2}\left( x_{2}^{k}-x_{2}^{k+1}\right) \right] \geqslant 0, \quad \forall x_{1}\in \mathcal {X}_{1}, \end{aligned}$$
(5.2)
$$\begin{aligned} \left( x_{2}-x_{2}^{k+1}\right) ^{\mathrm{T}}\left[ g_{2}\left( x_{2}^{k+1}\right) -A_{2}^{\mathrm{T}}\lambda ^{k+1}\right] \geqslant 0, \quad \forall x_{2}\in \mathcal {X}_{2}. \end{aligned}$$
(5.3)

Therefore, by letting \(x_1=x_1^*\) in (5.2), \(x_2=x_2^*\) in (5.3), and using the convexity of \(f_1\) and \(f_2\), we have

$$\begin{aligned}&f_1(x_1^{k+1}) - f_1(x_1^*) + f_2(x_2^{k+1}) - f_2(x_2^*) \\ \leqslant&g_1(x_1^{k+1})^\mathrm{T}(x_1^{k+1}-x_1^*) + g_2(x_2^{k+1})^\mathrm{T}(x_2^{k+1}-x_2^*) \\ \leqslant&(-A_1^\mathrm{T}\lambda ^{k+1}+\gamma A_1^\mathrm{T} A_2(x_2^k-x_2^{k+1}))^\mathrm{T} (x_1^*-x_1^{k+1}) + (-A_2^\mathrm{T}\lambda ^{k+1})^\mathrm{T} (x_2^*-x_2^{k+1}) \\ =&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} \!+\! \gamma \left[ (-A_2x_2^{k+1})\!-\!(-A_2x_2^k)\right] ^\mathrm{T}\left[ (A_1x_1^*-b)\!-\!(A_1x_1^{k+1}-b)\right] \\ =&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} + \frac{\gamma }{2}\left( \Vert -A_1x_1^{k+1}+b-A_2x_2^{k+1}\Vert ^2+\Vert A_1x_1^*-b\right. \\&\left. +A_2x_2^k\Vert ^2-\Vert -A_2x_2^{k+1}-A_1x_1^*+b\Vert ^2-\Vert A_1x_1^{k+1}-b+A_2x_2^k\Vert ^2\right) \\ \leqslant&\frac{1}{\gamma }(\lambda ^k-\lambda ^{k+1})^\mathrm{T}\lambda ^{k+1} + \frac{\gamma }{2}\left( \frac{1}{\gamma ^2}\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2+\Vert A_1x_1^*-b+A_2x_2^k\Vert ^2-\Vert \right. \\&\left. -A_2x_2^{k+1}-A_1x_1^*+b\Vert ^2\right) , \end{aligned}$$

where the second equality is due to (2.17). Thus, for any \(\lambda \in \mathbb {R}^p\), it holds that

$$\begin{aligned}&f_1(x_1^{k+1}) - f_1(x_1^*) + f_2(x_2^{k+1}) - f_2(x_2^*) -\lambda ^\mathrm{T}(A_1x_1^{k+1}+A_2x_2^{k+1}-b) \nonumber \\ \leqslant&\frac{1}{\gamma }(\lambda ^{k+1}-\lambda )^\mathrm{T}(\lambda ^k-\lambda ^{k+1}) + \frac{1}{2\gamma }\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2 + \frac{\gamma }{2}(\Vert A_1x_1^*+A_2x_2^k-b\Vert ^2\nonumber \\&\quad -\Vert A_1x_1^*+A_2x_2^{k+1}-b\Vert ^2)\nonumber \\ =&\frac{1}{2\gamma } (\Vert \lambda -\lambda ^{k}\Vert ^2 - \Vert \lambda -\lambda ^{k+1}\Vert ^2) + \frac{\gamma }{2}(\Vert A_1x_1^*+A_2x_2^k-b\Vert ^2-\Vert A_1x_1^*\nonumber \\&\quad +A_2x_2^{k+1}-b\Vert ^2). \end{aligned}$$
(5.4)

Summing (5.4) over \(k=0,1,\cdots ,t\), dividing by \(t+1\), and using the convexity of \(f_1\) and \(f_2\) yields

$$\begin{aligned}&f_1(\bar{x}_1^{t}) - f_1(x_1^*) + f_2(\bar{x}_2^{t}) - f_2(x_2^*) -\lambda ^\mathrm{T}(A_1\bar{x}_1^{t}+A_2\bar{x}_2^{t}-b) \\ \leqslant&\frac{1}{2\gamma (t+1)}\Vert \lambda -\lambda ^0\Vert ^2 + \frac{\gamma }{2(t+1)}\Vert A_1x_1^*+A_2x_2^0-b\Vert ^2. \end{aligned}$$

Based on the above bound, the error analysis for both the objective and the residual follows the same line of arguments as in the proof of Theorem 3.2.
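As a sanity check of (5.1), the two-block scheme is easy to run on a toy instance. The sketch below is our own illustration with \(f_i(x_i)=\frac{1}{2}\Vert x_i-c_i\Vert ^2\) and \(A_1=A_2=I\), chosen so that both subproblems have closed-form solutions; it returns the ergodic iterates \(\bar{x}_1^{t}\) and \(\bar{x}_2^{t}\):

```python
import numpy as np

def admm_2block(c1, c2, b, gamma=1.0, iters=200):
    """Two-block ADMM for min 0.5*||x1-c1||^2 + 0.5*||x2-c2||^2
    s.t. x1 + x2 = b; returns the ergodic averages of x1 and x2."""
    x1 = np.zeros_like(c1); x2 = np.zeros_like(c2); lam = np.zeros_like(b)
    s1 = np.zeros_like(c1); s2 = np.zeros_like(c2)
    for _ in range(iters):
        # closed-form minimizers of the two augmented Lagrangian blocks
        x1 = (c1 + lam + gamma * (b - x2)) / (1 + gamma)
        x2 = (c2 + lam + gamma * (b - x1)) / (1 + gamma)
        lam = lam - gamma * (x1 + x2 - b)
        s1 += x1; s2 += x2
    return s1 / iters, s2 / iters
```

On \(c_1=c_2=0\) and \(b=2\), both ergodic averages approach the optimal value 1, consistent with the O(1 / t) rate above.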