1 Introduction

Many applications can be modeled or reformulated as convex minimization models whose objective functions are separable (i.e., contain no coupled variables) while the variables are coupled through linear constraints. When the objective function is the sum of two convex functions without coupled variables, the alternating direction method of multipliers (ADMM) proposed in [12] has been well studied in the literature. It can be regarded as a splitting version of the classical augmented Lagrangian method (ALM) in [20, 30], and it has found a variety of applications in many areas, mainly because it generates easier subproblems in its iterations. We refer to, e.g., [2, 10, 13], for some reviews on the ADMM. On the other hand, many applications can be modeled or reformulated as a multiple-block linearly constrained convex minimization model whose objective function is the sum of more than two functions without coupled variables. In fact, many applications can be captured by a three-block model with three functions in the objective; see, e.g., the image alignment problem in [29], the robust principal component analysis model with noisy and incomplete data in [34], the latent variable Gaussian graphical model selection in [4, 28], the stable principal component pursuit with nonnegative constraint in [38], the quadratic discriminant analysis model in [27], and the quadratic conic programming in [22].

Inspired by the mentioned applications, we focus on the following three-block linearly constrained separable convex minimization model. Let \({\mathcal {X}}_i\) with \(i=1,2,3\) and \({\mathcal {Z}}\) be real finite-dimensional Euclidean spaces, each equipped with an inner product \(\langle \cdot ,\cdot \rangle \) and its induced norm \(\Vert \cdot \Vert \); let \(\theta _i:{\mathcal {X}}_i\rightarrow (-\infty , +\infty ]\) with \(i=1,2,3\) be closed proper convex functions; let \(A_i:\;{\mathcal {X}}_i\rightarrow {\mathcal {Z}}\) with \(i=1,2,3\) be linear operators; and let \(b\in {\mathcal {Z}}\) be a given vector. We aim at finding a solution point of the model

$$\begin{aligned} \min \left\{ \sum \limits _{i=1}^3 \theta _i(x_i)\;\Big |\; \sum \limits _{i=1}^3A_ix_i = b, \; x_i\in {\mathcal {X}}_i,\quad i=1,2,3\right\} . \end{aligned}$$
(1.1)

The solution set of (1.1) is assumed to be nonempty throughout our discussion. In our analysis, we also use the notations \(\mathbf {x}:=(x_1,x_2,x_3)\) and \(\mathcal{X}:={\mathcal {X}}_1 \times {\mathcal {X}}_2\times {\mathcal {X}}_3\). Let the augmented Lagrangian function of (1.1) be

$$\begin{aligned} \;\;\;\;\begin{array}{c} {\mathcal {L}}_{\beta }(x_1,x_2,x_3,z):=\sum \limits _{i=1}^3\theta _i(x_i)-\left\langle z, \sum \limits _{i=1}^3A_ix_i-b\right\rangle +\frac{{\beta }}{2}\left\| \sum \limits _{i=1}^3A_ix_i-b\right\| ^2 \end{array} \end{aligned}$$

with \(z \in {\mathcal {Z}}\) the Lagrange multiplier and \(\beta >0\) a penalty parameter. Then, the direct extension of ADMM for (1.1) reads as

$$\begin{aligned} \left\{ \begin{array}{ll} x_1^{k+1}=\mathop {\hbox {arg min}}\limits _{x_1\in {\mathcal {X}}_1}\; {\mathcal {L}}_{\beta }\left( x_1,x_2^k,x_3^k,z^k\right) , &{}\quad (1.2\hbox {a})\\ x_2^{k+1}=\mathop {\hbox {arg min}}\limits _{x_2\in {\mathcal {X}}_2}\; {\mathcal {L}}_{\beta }\left( x_1^{k+1},x_2,x_3^k,z^k\right) , &{}\quad (1.2\hbox {b})\\ x_3^{k+1}=\mathop {\hbox {arg min}}\limits _{x_3\in {\mathcal {X}}_3}\; {\mathcal {L}}_{\beta }\left( x_1^{k+1},x_2^{k+1},x_3,z^k\right) , &{}\quad (1.2\hbox {c})\\ z^{k+1}=z^k-\beta \left( A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b\right) . &{}\quad (1.2\hbox {d}) \end{array}\right. \end{aligned}$$
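To make the iteration concrete, below is a minimal numerical sketch of (1.2a)–(1.2d) for a toy instance in which each \(\theta _i(x_i)=\frac{1}{2}\Vert x_i-c_i\Vert ^2\) for given vectors \(c_i\), so every subproblem has a closed-form solution; the data and the helper function solve_block are illustrative and are not part of the model description above.

```python
# Minimal sketch of the direct extension of ADMM (1.2a)-(1.2d) on a toy
# problem with theta_i(x_i) = 0.5*||x_i - c_i||^2, so each subproblem is a
# small linear system.  All names and data here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = [rng.standard_normal((m, n)) for _ in range(3)]      # A_1, A_2, A_3
c = [rng.standard_normal(n) for _ in range(3)]           # centers of theta_i
b = sum(Ai @ ci for Ai, ci in zip(A, c))                 # a consistent right-hand side
beta = 0.1                                               # penalty parameter

def solve_block(Ai, ci, z, r):
    # argmin_x  0.5*||x - ci||^2 - <z, Ai x + r> + 0.5*beta*||Ai x + r||^2
    lhs = np.eye(n) + beta * Ai.T @ Ai
    rhs = ci + Ai.T @ z - beta * Ai.T @ r
    return np.linalg.solve(lhs, rhs)

x = [np.zeros(n) for _ in range(3)]
z = np.zeros(m)
for k in range(500):
    x[0] = solve_block(A[0], c[0], z, A[1] @ x[1] + A[2] @ x[2] - b)   # (1.2a)
    x[1] = solve_block(A[1], c[1], z, A[0] @ x[0] + A[2] @ x[2] - b)   # (1.2b)
    x[2] = solve_block(A[2], c[2], z, A[0] @ x[0] + A[1] @ x[1] - b)   # (1.2c)
    z = z - beta * (sum(Ai @ xi for Ai, xi in zip(A, x)) - b)          # (1.2d)

print("primal residual:", np.linalg.norm(sum(Ai @ xi for Ai, xi in zip(A, x)) - b))
```

In general, the \(x_i\)-subproblems have to be solved by whatever means the structure of \(\theta _i\) and \(A_i\) permits, and the choice of \(\beta \) matters; the rest of the paper is concerned with when this unmodified scheme can be shown to converge.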

Although it is natural to consider the direct extension of ADMM (1.2a)–(1.2d), and this scheme does work very well for many concrete applications of (1.1) (e.g., [16, 29, 34, 38]), it has been shown in [5] that the scheme (1.2a)–(1.2d) is not necessarily convergent if no further conditions are posed on the model (1.1). Several works in the literature study conditions guaranteeing the convergence of (1.2a)–(1.2d), and they can be classified into two categories. The first category twists the scheme (1.2a)–(1.2d) slightly while assuming no additional conditions on the original model. For example, in [17, 18], it was suggested to correct the output of (1.2a)–(1.2d) to generate a new iterate, and the resulting prediction-correction schemes are guaranteed to be convergent. In [19], the \(x_i\)-subproblems are twisted slightly to ensure convergence. Numerically, the original scheme (1.2a)–(1.2d) usually performs better than all the twisted variants with provable convergence (see, e.g., [16, 34]), and it is the most convenient scheme to implement compared with its variants. In [21], the authors suggested attaching a shrinkage factor to the Lagrange-multiplier updating step (1.2d), and it was shown that the convergence of (1.2a)–(1.2d) is guaranteed when this factor is small enough to satisfy some error bound conditions. Note that these just-mentioned works treat the more general case with \(m\ge 3\) blocks in the objective function. Most recently, it was shown in [22] that if certain proximal terms are used to regularize the \(x_i\)-subproblems, then the convergence of (1.2a)–(1.2d) can be guaranteed when only one function in the objective of (1.1) is strongly convex. Indeed, a relaxation factor in \((0,\frac{1+\sqrt{5}}{2})\), which was initiated in [11], can be attached to the Lagrange-multiplier updating step (1.2d) provided that the proximal terms and the penalty parameter \(\beta \) are chosen judiciously.

The second category assumes more conditions on the model (1.1) while retaining the original iterative scheme (1.2a)–(1.2d). Indeed, prior to the work [5], in which it was shown that (1.2a)–(1.2d) is not necessarily convergent, it was proved in [14] that the scheme (1.2a)–(1.2d) is convergent if all the functions \(\theta _i\) are strongly convex and the penalty parameter \(\beta \) is chosen in a certain interval. Then, in [6, 23], this condition was relaxed so that only two or more functions in the objective are required to be strongly convex to ensure the convergence of (1.2a)–(1.2d). However, assuming strong convexity of two functions still excludes most of the applications that can be efficiently solved by the scheme (1.2a)–(1.2d). Thus, these conditions are only of theoretical interest and are usually too strict to be satisfied by the mentioned applications.

We are thus interested in understanding the gap between the empirical efficiency of (1.2a)–(1.2d) and the lack of theoretical conditions that both ensure the convergence of (1.2a)–(1.2d) and are satisfied by some applications of the abstract model (1.1). As in [22], we independently show that the convergence of (1.2a)–(1.2d) can be ensured when one function in the objective of (1.1) is strongly convex, the penalty parameter \(\beta \) is appropriately restricted, and some assumptions on the linear operators \(A_i\) hold; these conditions do hold for some concrete applications of (1.1). Note that the restriction on \(\beta \) in [22] is determined through checking the positive definiteness of some operators, and it is generally more restrictive than ours because it targets a setting more general than (1.2a)–(1.2d), with larger step sizes for updating z and semi-proximal terms for regularizing the \(x_i\)-subproblems. Moreover, it is worth mentioning that such a condition on \(\beta \) is sufficient to theoretically ensure the convergence of (1.2a)–(1.2d); it is thus usually conservative and can be enlarged in practice to obtain better numerical performance. Another purpose of this paper is establishing the convergence rate of the scheme (1.2a)–(1.2d), including the worst-case convergence rate measured by the iteration complexity and the global linear convergence rate in an asymptotic sense under some additional assumptions. We refer to [6, 21, 23, 24] for some convergence rate analysis for (1.2a)–(1.2d).

The rest of this paper is organized as follows. Some necessary preliminaries for further analysis are provided in Sect. 2. In Sect. 3, we prove the convergence of the scheme (1.2a)–(1.2d) under the assumption that one of the three functions in (1.1) is strongly convex. Then, we establish the worst-case convergence rate measured by the iteration complexity in the ergodic and nonergodic senses in Sects. 4 and 5, respectively. In Sect. 6, under some additional conditions, we show that the scheme (1.2a)–(1.2d) is globally linearly convergent. Finally, some concluding remarks are made in Sect. 7.

2 Preliminaries

In this section, we summarize some notations and preliminaries to be used for further analysis.

2.1 Notation

We use calligraphic letters (e.g., \({\mathcal {H}}\), \({\mathcal {X}}\)) to denote spaces, each equipped with an inner product \(\langle \cdot , \cdot \rangle \) and an associated norm \(\Vert \cdot \Vert \). All vectors are column vectors. For two arbitrary vectors \(x\in {\mathcal {X}}_1\) and \(y\in {\mathcal {X}}_2\), we simply use \(u=(x,y)\) to denote their adjunction. That is, \((x,y)\) denotes the vector \((x^*, y^*)^*\). For a linear operator \(A_i\), we denote by \(A_i^*\) its adjoint operator.

Let \(M:\;{\mathcal {H}}\rightarrow {\mathcal {H}}^*\) be a positive definite linear operator; we use \(\Vert x\Vert _M:=\sqrt{\langle x, Mx\rangle }\) to denote its M-norm and \(\lambda _{\min }(M)\) to denote the smallest eigenvalue of M. For an identity operator I and a scalar \(\beta >0\), we simply use \(\Vert x\Vert _{\beta }\) to denote \(\Vert x\Vert _{\beta I}\). For a given linear operator A, its norm is

$$\begin{aligned} \left\| A\right\| {:}{=}\sup _{x\ne 0}\left\{ \frac{{\left\| Ax\right\| }}{{\left\| x\right\| }}\right\} . \end{aligned}$$

Specially, for a symmetric linear operator A, \(\Vert A\Vert \) denotes its spectral norm.

We say a function \(f:{\mathcal {H}}\rightarrow (-\infty , \infty ]\) is convex if

$$\begin{aligned} f(tx+(1-t)y)\le tf(x)+(1-t)f(y),\quad \forall x,y\in {\mathcal {H}}; \end{aligned}$$

and it is strongly convex with modulus \(\mu >0\) if

$$\begin{aligned} f(tx+(1-t)y)\le tf(x)+(1-t)f(y)-\frac{\mu }{2}t(1-t)\Vert x-y\Vert ^2,\quad \forall x,y\in {\mathcal {H}}, \end{aligned}$$

where \(t \in [0,1]\).
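For instance, \(f(x)=\frac{\mu }{2}\Vert x\Vert ^2\) is strongly convex with modulus \(\mu \): by the identity \(\Vert tx+(1-t)y\Vert ^2=t\Vert x\Vert ^2+(1-t)\Vert y\Vert ^2-t(1-t)\Vert x-y\Vert ^2\), we have

$$\begin{aligned} f(tx+(1-t)y)=tf(x)+(1-t)f(y)-\frac{\mu }{2}t(1-t)\Vert x-y\Vert ^2,\quad \forall x,y\in {\mathcal {H}},\; t\in [0,1], \end{aligned}$$

so the defining inequality holds with equality.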

A multifunction \(F:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\) maps each point \(x\in {\mathcal {H}}\) to a set \(F(x)\subseteq {\mathcal {H}}\); see, e.g., [32]. For a multifunction \(F:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\), we say that F is monotone if

$$\begin{aligned} \langle x-y, \xi -\zeta \rangle \ge 0,\quad \forall \xi \in F(x),\quad \forall \zeta \in F(y); \end{aligned}$$

and strongly monotone with modulus \(\mu >0\) if

$$\begin{aligned} \langle x-y, \xi -\zeta \rangle \ge \mu \Vert x-y\Vert ^2,\quad \forall \xi \in F(x),\quad \forall \zeta \in F(y). \end{aligned}$$

It is well known, see, e.g., [32], that for a convex function f, \(\partial f\), the subdifferential of f, is a monotone multifunction; and for a strongly convex function f with modulus \(\mu >0\), \(\partial f\) is a strongly monotone multifunction with modulus \(\mu \). That is, if f is convex, then for any \(x,y\in \hbox {dom} \; f\)

$$\begin{aligned} \langle \xi -\zeta , x-y\rangle \ge 0,\quad \forall \xi \in \partial f(x),\quad \forall \zeta \in \partial f(y); \end{aligned}$$

and if f is strongly convex with modulus \(\mu >0\), then

$$\begin{aligned} \langle \xi -\zeta , x-y\rangle \ge \mu \Vert x-y\Vert ^2,\quad \forall \xi \in \partial f(x),\quad \forall \zeta \in \partial f(y), \end{aligned}$$

where

$$\begin{aligned} \hbox {dom}\; f:=\{x\in {\mathcal {H}}\; | \; f(x)<\infty \} \end{aligned}$$

is the domain of \(f:\; {\mathcal {H}}\rightarrow (-\infty ,\infty ]\). An equivalent inequality for the strong convexity of f is that

$$\begin{aligned} f(x)\ge f(y)+\langle \xi ,x-y\rangle +\frac{\mu }{2}\Vert x-y\Vert ^2,\quad \forall \xi \in \partial f(y), \, \forall x,y\in \hbox {dom}\;f. \end{aligned}$$

Last, for any two vectors a and b with the same dimensions, we have

$$\begin{aligned} 2|\langle a, b\rangle |\le \varsigma \Vert a\Vert ^2+\frac{1}{\varsigma }\Vert b\Vert ^2, \end{aligned}$$
(2.1)

for any positive scalar \(\varsigma \). Obviously, we also have

$$\begin{aligned} \Vert a+b\Vert ^2\le (1+\varsigma )\Vert a\Vert ^2+\left( 1+\frac{1}{\varsigma }\right) \Vert b\Vert ^2,\quad \forall \varsigma >0. \end{aligned}$$

These two inequalities will be frequently used in the proofs to be presented.
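Both inequalities are elementary: (2.1) follows from expanding

$$\begin{aligned} 0\le \left\| \sqrt{\varsigma }\, a\mp \frac{1}{\sqrt{\varsigma }}\, b\right\| ^2=\varsigma \Vert a\Vert ^2+\frac{1}{\varsigma }\Vert b\Vert ^2\mp 2\langle a, b\rangle , \end{aligned}$$

and the second inequality follows from \(\Vert a+b\Vert ^2=\Vert a\Vert ^2+\Vert b\Vert ^2+2\langle a,b\rangle \) together with (2.1).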

2.2 Optimality characterization

In this subsection, we present some variational forms to characterize a solution point of the model (1.1) under discussion. These characterizations will be the basis of our convergence analysis to be conducted.

Recall that the Lagrangian function of (1.1) is

$$\begin{aligned} {\mathcal {L}}(x_1,x_2,x_3,z):=\theta _1(x_1)+\theta _2(x_2)+\theta _3(x_3)-\langle z, A_1x_1+A_2x_2+A_3x_3-b\rangle , \end{aligned}$$

where z is the Lagrange multiplier. Solving (1.1) amounts to finding a saddle point of \({\mathcal {L}}(x_1,x_2,x_3,z)\). That is, finding \((\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{z}) \in {\mathcal {X}}_1 \times {\mathcal {X}}_2\times {\mathcal {X}}_3 \times {\mathcal {Z}}\) such that the following inequalities hold:

$$\begin{aligned}&{\mathcal {L}}(\hat{x}_1,\hat{x}_2,\hat{x}_3,z)\le {\mathcal {L}}(\hat{x}_1,\hat{x}_2,\hat{x}_3, \hat{z})\le {\mathcal {L}}(x_1,x_2,x_3,\hat{z}), \\&\forall (x_1,x_2,x_3, z) \in {\mathcal {X}}_1 \times {\mathcal {X}}_2\times {\mathcal {X}}_3 \times {\mathcal {Z}}. \end{aligned}$$

Let the set of all saddle points of \({\mathcal {L}}(x_1,x_2,x_3, z)\) be denoted by S. Throughout, S is assumed to be nonempty. Let \((\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{z})\in S\) be an arbitrary solution in S. Then, we have

$$\begin{aligned} \left\{ \begin{array}{ll} 0\in \partial \theta _1(\hat{x}_1)-A_1^*\hat{z}, &{}\quad (2.2\hbox {a})\\ 0\in \partial \theta _2(\hat{x}_2)-A_2^*\hat{z}, &{}\quad (2.2\hbox {b})\\ 0\in \partial \theta _3(\hat{x}_3)-A_3^*\hat{z}, &{}\quad (2.2\hbox {c})\\ A_1\hat{x}_1+A_2\hat{x}_2+A_3\hat{x}_3-b=0. &{}\quad (2.2\hbox {d}) \end{array}\right. \end{aligned}$$

It follows from (2.2a)–(2.2c) that, for \(i=1,2,3\), we have

$$\begin{aligned} \theta _i(x_i')\ge \theta _i(\hat{x}_i)+\langle \hat{z}, A_ix_i'-A_i\hat{x}_i\rangle , \quad \forall x_i'\in {\mathcal {X}}_i \end{aligned}$$

if \(\theta _i\) is convex; and

$$\begin{aligned} \theta _i(x_i')\ge \theta _i(\hat{x}_i) +\langle \hat{z}, A_{i}x_{i}'-A_{i}\hat{x}_{i}\rangle +\frac{\mu _i}{2}\Vert x_{i}'-\hat{x}_{i}\Vert ^2, \quad \forall x_{i}'\in {\mathcal {X}}_{i} \end{aligned}$$
(2.3)

if \(\theta _i\) is strongly convex with the modulus \(\mu _i>0\).

In our analysis, we also use the primal-dual gap function (see, e.g., [3, 15, 21]) defined as

$$\begin{aligned} {\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}(\mathbf {x}, z):=\sup \left\{ {\mathcal {L}}(x_1,x_2,x_3,z')-{\mathcal {L}}(x_1',x_2',x_3',z)\;\Big |\; (\mathbf {x}',z')\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\right\} , \end{aligned}$$

where \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) is a subset of \({\mathcal {X}}\times {\mathcal {Z}}\). If \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) contains a solution \((\hat{x}_1,\hat{x}_2,\hat{x}_3,\hat{z})\), then for any \((\mathbf {x},z)\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\), we have

$$\begin{aligned} {\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}(\mathbf {x}, z)\ge & {} \left( \sum _{i=1}^3\theta _i(x_i)-\left\langle \hat{z}, \sum _{i=1}^3A_ix_i-b\right\rangle \right) \nonumber \\&-\left( \sum _{i=1}^3\theta _i(\hat{x}_i)- \left\langle z, \sum _{i=1}^3A_i\hat{x}_i-b\right\rangle \right) \nonumber \\= & {} \sum _{i=1}^3(\theta _i(x_i)-\theta _i(\hat{x}_i)-\langle \hat{z}, A_ix_i-A_i\hat{x}_i\rangle )\nonumber \\\ge & {} \sum _{i=1}^3\frac{\mu _i}{2}\Vert x_i-\hat{x}_i\Vert ^2\nonumber \\\ge & {} 0, \end{aligned}$$
(2.4)

where the equality follows from (2.2d), the second inequality follows from (2.3), and the last inequality follows from the convexity of \(\theta _i\), \(i=1,2,3\).

The following lemma gives a sufficient condition for a point to be a saddle point of \({\mathcal {L}}(x_1,x_2,x_3,z)\). The assertion is an immediate conclusion of (2.4); we thus omit the proof.

Lemma 2.1

Let \(({\bar{\mathbf{x}}}, \bar{z})\) be an arbitrary point in the interior of \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\). If

$$\begin{aligned} {\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}({\bar{\mathbf{x}}},\bar{z})\le 0, \end{aligned}$$

then \(({\bar{\mathbf{x}}}, \bar{z})\) is a saddle point.

Based on this lemma, if \((\mathbf {x},z)\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) and \({\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}(\mathbf {x}, z)\le \epsilon \) for some \(\epsilon >0\), then we call \((\mathbf {x},z)\) a saddle point of \({\mathcal {L}}(x_1,x_2,x_3, z)\) with an accuracy of \(\epsilon \).

Finally, we use the notation

$$\begin{aligned} Q:=\left( \begin{array}{lll}\beta ^2 A_2^* A_2&{}\quad 0&{}\quad 0\\ 0&{}\quad \beta ^2A_3^*A_3&{}\quad 0\\ 0&{}\quad 0&{}\quad I\end{array}\right) , \end{aligned}$$
(2.5)

where the operators \(A_2\) and \(A_3\) are given in (1.1), and \(\beta \) is the penalty parameter in the direct extension of ADMM (1.2a)–(1.2d).
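For later reference, with the grouping \(\mathbf {w}:=(x_2,x_3,z)\) used in Sect. 3, the corresponding Q-norm expands as

$$\begin{aligned} \Vert \mathbf {w}-\hat{\mathbf {w}}\Vert _Q^2=\Vert A_2x_2-A_2\hat{x}_2\Vert _{\beta ^2}^2+\Vert A_3x_3-A_3\hat{x}_3\Vert _{\beta ^2}^2+\Vert z-\hat{z}\Vert ^2; \end{aligned}$$

this is the quantity that, up to a correction term, will be shown to be nonincreasing along the iterations (see (3.18)–(3.19)).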

3 Convergence

In this section, we prove the convergence of the direct extension of ADMM (1.2a)–(1.2d) under the following assumption.

Assumption 3.1

In (1.1), the functions \(\theta _1\) and \(\theta _2\) are convex; the function \(\theta _3\) is strongly convex with modulus \(\mu _3>0\); \(A_1\) and \(A_2\) in (1.1) are of full column rank.

Let us first present the variational characterization of the scheme (1.2a)–(1.2d). Indeed, similarly to (2.2a)–(2.2d), the subproblems (1.2a)–(1.2c) can be characterized as

figure c

Combining it with the step (1.2d), the \((k+1)\)th iterate generated by (1.2a)–(1.2d) satisfies that for any \( x_1'\in {\mathcal {X}}_1\), \( x_2'\in {\mathcal {X}}_2\), and \( x_3'\in {\mathcal {X}}_3\),

figure d

Now, we start to prove the convergence of (1.2a)–(1.2d) under Assumption 3.1. First, we prove several lemmas.

Lemma 3.1

Suppose Assumption 3.1 holds. For the iterative sequence \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d), we have

$$\begin{aligned} \left\langle A_3x_3^{k+1}-A_3x_3^k, z^{k+1}- z^k\right\rangle \ge \mu _3\left\| x_3^k-x_3^{k+1}\right\| ^2, \end{aligned}$$
(3.3)

and

$$\begin{aligned}&\left\langle A_2x_2^{k+1}-A_2x_2^k, z^{k+1}- z^k\right\rangle \nonumber \\&\ge -\left\{ \frac{3}{10}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta +\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \right\} . \end{aligned}$$
(3.4)

Proof

Setting \(x_3':=x_3^k\) in (3.2c) yields

$$\begin{aligned} \theta _3(x_3^k)-\theta _3(x_3^{k+1})- \langle A_3x_3^k-A_3x_3^{k+1}, z^{k+1}\rangle \ge \frac{\mu _3}{2}\Vert x_3^k-x_3^{k+1}\Vert ^2; \end{aligned}$$

and setting \(x_3':=x_3^{k+1}\) in (3.2c) written for the kth iteration yields

$$\begin{aligned} \theta _3\left( x_3^{k+1}\right) -\theta _3\left( x_3^{k}\right) - \left\langle A_3x_3^{k+1}-A_3x_3^{k}, z^{k}\right\rangle \ge \frac{\mu _3}{2}\Vert x_3^k-x_3^{k+1}\Vert ^2. \end{aligned}$$

Adding the above two inequalities, we have

$$\begin{aligned} \left\langle A_3\left( x_3^{k+1}-x_3^{k}\right) , z^{k+1}- z^k\right\rangle \ge \mu _3\Vert x_3^k-x_3^{k+1}\Vert ^2, \end{aligned}$$

and the inequality (3.3) is proved.

We now prove (3.4). It follows from (3.2d) that

$$\begin{aligned} \sum _{i=1}^3A_ix_i^{k+1}-b=\frac{1}{\beta }\left( z^k-z^{k+1}\right) . \end{aligned}$$
(3.5)

Using (3.5), we can rewrite (3.2b) as

$$\begin{aligned} \theta _2\left( x_2'\right) -\theta _2\left( x_2^{k+1}\right) -\left\langle A_2x_2'-A_2x_2^{k+1}, z^{k+1}-\beta \left( A_3x_3^k-A_3x_3^{k+1}\right) \right\rangle \ge 0,\quad \forall x_2'\in {\mathcal {X}}_2.\nonumber \\ \end{aligned}$$
(3.6)

Then, setting \(x_2':=x_2^k\) and \(x_2':=x_2^{k+1}\) in (3.6) for the kth iteration yields

$$\begin{aligned} \theta _2\left( x_2^k\right) -\theta _2\left( x_2^{k+1}\right) - \left\langle A_2x_2^k-A_2x_2^{k+1}, z^{k+1}-\beta \left( A_3x_3^k-A_3x_3^{k+1}\right) \right\rangle \ge 0, \end{aligned}$$

and

$$\begin{aligned} \theta _2\left( x_2^{k+1}\right) -\theta _2\left( x_2^{k}\right) - \left\langle A_2x_2^{k+1}-A_2x_2^{k}, z^{k}-\beta \left( A_3x_3^{k-1}-A_3x_3^{k}\right) \right\rangle \ge 0, \end{aligned}$$

respectively. Adding these two inequalities, we get

$$\begin{aligned}&\left\langle A_2x_2^{k+1}-A_2x_2^{k}, z^{k+1}- z^k-\beta \left( A_3x_3^k-A_3x_3^{k+1}\right) +\beta \left( A_3x_3^{k-1}-A_3x_3^{k}\right) \right\rangle \ge 0. \end{aligned}$$

Rearranging terms, it follows that

$$\begin{aligned}&\left\langle A_2x_2^{k+1}-A_2x_2^k, z^{k+1}- z^k\right\rangle \\&\;\ge \beta \left\langle A_2x_2^{k+1}-A_2x_2^k,\left( A_3x_3^{k}-A_3x_3^{k+1}\right) -\left( A_3x_3^{k-1}-A_3x_3^k\right) \right\rangle \\&\;\ge -\left\{ \frac{3}{20}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta \right\} \\&\;\;\qquad -\left\{ \frac{3}{20}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \right\} \\&=-\left\{ \frac{3}{10}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta +\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \right\} , \end{aligned}$$

where the second inequality follows from (2.1). The proof is complete. \(\square \)

Lemma 3.2

Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have

$$\begin{aligned}&\left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle z',\sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] -\left[ \sum _{i=1}^3\theta _i(x_i')-\left\langle {z^{k+1}}, \sum _{i=1}^3A_ix_i'-b\right\rangle \right] \nonumber \\&\;\le \left\langle z^{k+1}-z', \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \nonumber \\&\quad +\,\beta \left\langle A_1x_1'-A_1x_1^{k+1}+A_2x_2'-A_2x_2^{k+1},A_3x_3^k-A_3x_3^{k+1}\right\rangle \nonumber \\&\quad +\,\beta \left\langle A_1x_1'-A_1x_1^{k+1}, A_2x_2^k-A_2x_2^{k+1}\right\rangle -\frac{\mu _3}{2}\Vert x_3'-x_3^{k+1}\Vert ^2. \end{aligned}$$
(3.7)

Proof

Using (3.2d) and rearranging terms, it follows from (3.2a)–(3.2c) that

figure e

Adding (3.8a)–(3.8c) and rearranging terms, we obtain the assertion (3.7) immediately. \(\square \)

Now, we need to further analyze the right-hand side of (3.7) and refine the assertion (3.7). This is done in the following lemma.

Lemma 3.3

Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have

$$\begin{aligned}&2\beta \left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle z', \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] -2\beta \left[ \sum _{i=1}^3\theta _i(x_i')-\left\langle z^{k+1}, \sum _{i=1}^3A_ix_i'-b\right\rangle \right] \nonumber \\&\;\le \sum _{i=2}^3\left( \Vert A_ix_i^{k}-A_ix_i'\Vert _{\beta ^2}^2-\Vert A_ix_i^{k+1}-A_ix_i'\Vert _{\beta ^2}^2\right) +\left( \Vert z^k-z'\Vert ^2-\Vert z^{k+1}-z'\Vert ^2\right) \nonumber \\&\qquad +\,\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\qquad -\,\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2-\left( \beta \mu _3\Vert x_3^{k+1}-x_3'\Vert ^2-\frac{8}{3}\Vert A_3x_3^{k+1}-A_3x_3'\Vert ^2_{\beta ^2}\right) \nonumber \\&\qquad +\,\frac{17}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}-2\beta \mu _3\Vert x_3^k-x_3^{k+1}\Vert ^2-\Vert z^k-z^{k+1}\Vert ^2\nonumber \\&\qquad +\,2\beta ^2\left\langle \sum _{i=1}^3A_ix_i'-b,A_2x_2^k-A_2x_2^{k+1}+A_3x_3^k-A_3x_3^{k+1}\right\rangle . \end{aligned}$$
(3.9)

Proof

Because of the identity

$$\begin{aligned} \langle x, y\rangle =\frac{1}{2}\left( \Vert x\Vert ^2+\Vert y\Vert ^2-\Vert x-y\Vert ^2\right) , \end{aligned}$$
(3.10)

it follows that

$$\begin{aligned}&\beta \left\langle A_2x_2'-A_2x_2^{k+1}, A_2x_2^k-A_2x_2^{k+1}\right\rangle \\&\;=\frac{1}{2}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta }^2+ \Vert A_2x_2^{k+1}-A_2x_2'\Vert _{\beta }^2-\Vert A_2x_2^{k}-A_2x_2'\Vert _{\beta }^2\right\} \end{aligned}$$

and

$$\begin{aligned}&\beta \left\langle A_3x_3'-A_3x_3^{k+1}, A_3x_3^k-A_3x_3^{k+1}\right\rangle \\&\;=\frac{1}{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert _{\beta }^2+ \Vert A_3x_3^{k+1}-A_3x_3'\Vert _{\beta }^2-\Vert A_3x_3^{k}-A_3x_3'\Vert _{\beta }^2\right\} . \end{aligned}$$

Consequently, we have

$$\begin{aligned}&\beta \langle A_1x_1'-A_1x_1^{k+1}+A_2x_2'-A_2x_2^{k+1}, A_3x_3^k-A_3x_3^{k+1}\rangle \nonumber \\&\;=\beta \left\langle \left( \sum _{i=1}^3A_ix_i'-b\right) -\left( \sum _{i=1}^3A_ix_i^{k+1}-b\right) , A_3x_3^k-A_3x_3^{k+1}\right\rangle \nonumber \\&\qquad -\beta \left\langle A_3x_3'-A_3x_3^{k+1}, A_3x_3^k-A_3x_3^{k+1}\right\rangle \nonumber \\&\;=\beta \left\langle \left( \sum _{i=1}^3A_ix_i'-b\right) -\left( \sum _{i=1}^3A_ix_i^{k+1}-b\right) , A_3x_3^k-A_3x_3^{k+1}\right\rangle \nonumber \\&\qquad -\frac{1}{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert _{\beta }^2+ \Vert A_3x_3^{k+1}-A_3x_3'\Vert _{\beta }^2-\Vert A_3x_3^{k}-A_3x_3'\Vert _{\beta }^2\right\} \nonumber \\&\;=\beta \left\langle \sum _{i=1}^3A_ix_i'-b, A_3x_3^k-A_3x_3^{k+1}\right\rangle -\left\langle z^k-z^{k+1}, A_3x_3^k-A_3x_3^{k+1}\right\rangle \nonumber \\&\qquad -\frac{1}{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert _{\beta }^2+ \Vert A_3x_3^{k+1}-A_3x_3'\Vert _{\beta }^2-\Vert A_3x_3^{k}-A_3x_3'\Vert _{\beta }^2\right\} \nonumber \\&\;\le \beta \left\langle \sum _{i=1}^3A_ix_i'-b, A_3x_3^k-A_3x_3^{k+1}\right\rangle -\frac{1}{2}\Vert A_3x_3^k-A_3x_3^{k+1}\Vert _{\beta }^2 -\mu _3\Vert x_3^k-x_3^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}\left\{ \Vert A_3x_3^{k+1}-A_3x_3'\Vert _{\beta }^2-\Vert A_3x_3^{k}-A_3x_3'\Vert _{\beta }^2\right\} , \end{aligned}$$
(3.11)

where the third equality follows from (3.5) and the inequality follows from (3.3). On the other hand, using (3.4), we have

$$\begin{aligned}&-\beta \left\langle \left( \sum _{i=1}^3A_ix_i^{k+1}-b\right) +(A_3x_3'-A_3x_3^{k+1}), A_2x_2^k-A_2x_2^{k+1}\right\rangle \\&\le \frac{3}{10}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta +\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \\&\qquad +\,\beta \left\langle A_3x_3^{k+1}-A_3x_3', A_2x_2^k-A_2x_2^{k+1}\right\rangle \\&\le \frac{3}{10}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta +\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \\&\qquad +\,\frac{4}{3} \Vert A_3x_3^{k+1}-A_3x_3'\Vert ^2_\beta +\frac{3}{16}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _\beta ^2 \\&=\frac{39}{80}\Vert A_2x_2^{k+1}-A_2x_2^k\Vert _\beta ^2+\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta +\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \\&\qquad +\,\frac{4}{3} \Vert A_3x_3^{k+1}-A_3x_3'\Vert ^2_\beta , \end{aligned}$$

where the second inequality follows from (2.1). Then, it holds that

$$\begin{aligned}&\beta \left\langle A_1x_1'-A_1x_1^{k+1},A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&\;=\beta \left\langle \sum _{i=1}^3A_ix_i'-b, A_2x_2^k-A_2x_2^{k+1}\right\rangle -\beta \left\langle A_2x_2'-A_2x_2^{k+1}, A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&\qquad -\beta \left\langle \left( \sum _{i=1}^3A_ix_i^{k+1}-b\right) +(A_3x_3'-A_3x_3^{k+1}), A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&\;\le \beta \left\langle \sum _{i=1}^3A_ix_i'-b, A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&\qquad -\,\frac{1}{2}\left\{ \Vert A_2x_2^{k+1}-A_2x_2'\Vert _{\beta }^2-\Vert A_2x_2^{k}-A_2x_2'\Vert _{\beta }^2\right\} \nonumber \\&\qquad +\,\frac{5}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_\beta -\frac{5}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta +\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_\beta \nonumber \\&\qquad -\,\frac{1}{80}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta }^2+\frac{4}{3} \Vert A_3x_3^{k+1}-A_3x_3'\Vert ^2_\beta . \end{aligned}$$
(3.12)

Furthermore, since

$$\begin{aligned}&\left\langle z^{k+1}-z', \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle =\frac{1}{\beta }\left\langle z^{k+1}-z',z^k- z^{k+1}\right\rangle \nonumber \\&\qquad =\frac{1}{2\beta }\left\{ \Vert z^{k}-z'\Vert ^2-\Vert z^{k+1}-z'\Vert ^2-\Vert z^k- z^{k+1}\Vert ^2\right\} , \end{aligned}$$
(3.13)

we obtain the assertion (3.9) by combining (3.11), (3.12), (3.13), and (3.7). \(\square \)

Now, we are ready to prove the convergence of the direct extension of ADMM (3.2a)-(3.2d) under Assumption 3.1. The result is summarized in the following theorem.

Theorem 3.1

Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \). Then, the sequence \(\{(x_1^k, x_2^k, x_3^k, z^k)\}\) converges to a saddle point in S.

Proof

Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. Then, we have

$$\begin{aligned} \sum _{i=1}^3A_i\hat{x}_i-b=0. \end{aligned}$$
(3.14)

Setting \((x_1',x_2', x_3', z'):=(\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) in (3.9) and using (3.14), we obtain

$$\begin{aligned}&2\beta \left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle \hat{z},\sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] -2\beta \sum _{i=1}^3\theta _i(\hat{x}_i)\nonumber \\&\;\le \sum _{i=2}^3\left( \Vert A_ix_i^{k}-A_i\hat{x}_i\Vert _{\beta ^2}^2-\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert _{\beta ^2}^2\right) +\left( \Vert z^k-\hat{z}\Vert ^2-\Vert z^{k+1}-\hat{z}\Vert ^2\right) \nonumber \\&\qquad +\,\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\qquad -\,\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2-\left( \beta \mu _3-\frac{8}{3}\beta ^2\Vert A_3^* A_3\Vert \right) \Vert x_3^{k+1}-\hat{x}_3\Vert ^2\nonumber \\&\qquad -\,\left( 2\beta \mu _3-\frac{17}{3}\beta ^2\Vert A_3^* A_3\Vert \right) \Vert x_3^{k+1}-x_3^k\Vert ^2-\Vert z^k-z^{k+1}\Vert ^2. \end{aligned}$$
(3.15)

Since \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \), we have

$$\begin{aligned} \varrho _1:=\beta \mu _3-\frac{17}{6}\beta ^2\Vert A_3^* A_3\Vert >0. \end{aligned}$$

Then, (3.15) can be written as

$$\begin{aligned}&2\beta \left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle \hat{z}, \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] -2\beta \sum _{i=1}^3\theta _i(\hat{x}_i)\nonumber \\&\;\le \sum _{i=2}^3\left( \Vert A_ix_i^{k}-A_i\hat{x}_i\Vert _{\beta ^2}^2-\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert _{\beta ^2}^2\right) +\left( \Vert z^k-\hat{z}\Vert ^2-\Vert z^{k+1}-\hat{z}\Vert ^2\right) \nonumber \\&\qquad +\,\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\qquad -\,\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2-\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2\nonumber \\&\qquad -\,\varrho _1\Vert x_3^{k+1}-\hat{x}_3\Vert ^2-\Vert z^k-z^{k+1}\Vert ^2. \end{aligned}$$
(3.16)

Since \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) is a saddle point, it follows from (2.4) that the left-hand side of (3.16) is nonnegative. We thus have

$$\begin{aligned}&\sum _{i=2}^3\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert _{\beta ^2}^2+\Vert z^{k+1}-\hat{z}\Vert ^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \sum _{i=2}^3\Vert A_ix_i^{k}-A_i\hat{x}_i\Vert _{\beta ^2}^2+\Vert z^k-\hat{z}\Vert ^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\nonumber \\&\qquad -\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2-\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2\nonumber \\&\qquad -\,\varrho _1\Vert x_3^{k+1}-\hat{x}_3\Vert ^2-\Vert z^k-z^{k+1}\Vert ^2. \end{aligned}$$
(3.17)

Recall the definition of Q in (2.5). Then, (3.17) can be rewritten in the compact form

$$\begin{aligned}&\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2 +\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\nonumber \\&\qquad -\,\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2-\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2\nonumber \\&\qquad -\,\varrho _1\Vert x_3^{k+1}-\hat{x}_3\Vert ^2-\Vert z^k-z^{k+1}\Vert ^2, \end{aligned}$$
(3.18)

where \(\mathbf {w}:=(x_2,x_3, z)\).

Thus we have

$$\begin{aligned} \Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}, \end{aligned}$$
(3.19)

indicating that the sequence \(\{{\mathbf{w}}^k\}\) is bounded. Rearranging terms for (3.18) yields

$$\begin{aligned}&\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2+\varrho _1\Vert x_3^{k+1}-\hat{x}_3\Vert ^2 +\Vert z^k-z^{k+1}\Vert ^2\\&\;\le \left( \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2 +\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\right) \\&\quad -\left( \Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\right) . \end{aligned}$$

Summing both sides of the above inequality over all k, we get

$$\begin{aligned}&\sum _{k=1}^{\infty }\left\{ \frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2+\varrho _1\Vert x_3^{k+1}-\hat{x}_3\Vert ^2 +\Vert z^k-z^{k+1}\Vert ^2\right\} \\&\;\le \left( \Vert {\mathbf{w}}^{1}-{\hat{\mathbf{w}}}\Vert _Q^2 +\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}\right) , \end{aligned}$$

and hence

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert z^k-z^{k+1}\Vert= & {} \lim _{k\rightarrow \infty }\Vert A_2x_2^k-A_2x_2^{k+1}\Vert =\lim _{k\rightarrow \infty }\Vert x_3^k-x_3^{k+1}\Vert \nonumber \\= & {} \lim _{k\rightarrow \infty }\Vert x_3^{k+1}-\hat{x}_3\Vert =0. \end{aligned}$$
(3.20)

The full column rank of \(A_1\) assumed in Assumption 3.1, (3.20), and (1.2d) then imply that \(\{x_1^k\}\) is also bounded. Hence the iterative sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by ADMM (1.2a)-(1.2d) is bounded.

Then, the boundedness of the sequence indicates that it has at least one cluster point. Let \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\) be an arbitrary cluster point of \(\{({x}_1^k, {x}_2^k, {x}_3^k, {z}^k)\} \) and \(\{({x}_1^{k_j}, {x}_2^{k_j}, {x}_3^{k_j}, {z}^{k_j})\}\) be the subsequence converging to \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\). Taking the limit along the subsequence in (3.1a)-(3.1c) and using (3.20), we have

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \partial \theta _1(\tilde{x}_1)-A_1^*\tilde{z},\\ 0\in \partial \theta _2(\tilde{x}_2)-A_2^*\tilde{z},\\ 0\in \partial \theta _3(\tilde{x}_3)-A_3^*\tilde{z}. \end{array}\right. \end{aligned}$$

Moreover, since \(\Vert z^k-z^{k+1}\Vert =\beta \Vert \sum \limits _{j=1}^3 A_j x_j^{k+1}-b\Vert \), taking the limit gives \(\sum \limits _{j=1}^3 A_j \tilde{x}_j-b=0\). Hence, it follows from (2.2a)–(2.2d) that \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\) is a saddle point in S. Applying (3.19) with this saddle point in place of \((\hat{x}_1,\hat{x}_2,\hat{x}_3,\hat{z})\) then shows that the whole sequence admits no other cluster point. Consequently, the iterative sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) converges to a saddle point in S. The proof is complete. \(\square \)

Remark 3.1

For some specific applications of the abstract model (1.1), it may not be difficult to determine the range for \(\beta \). For example, for the moving object detection problem based on detecting contiguous outliers studied in [37], it is easy to see that its model corresponds to (1.1) with \(A_3=I\) and \(\mu _3=1\). Thus, choosing \(\beta \in \left( 0, \frac{6}{17}\right) \) suffices to ensure convergence when the scheme (1.2a)–(1.2d) is implemented to solve the model in [37].
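As a small illustration, the upper bound \(\frac{6\mu _3}{17\Vert A_3^*A_3\Vert }\) of Theorem 3.1 can be evaluated numerically once \(\mu _3\) and \(A_3\) are known; the snippet below is a sketch with illustrative names, and it recovers the value \(6/17\approx 0.353\) for the setting \(A_3=I\), \(\mu _3=1\) of [37].

```python
# Sketch: evaluating the beta-range of Theorem 3.1 for a given A3 and mu3.
import numpy as np

def beta_upper_bound(A3, mu3):
    spec = np.linalg.norm(A3.T @ A3, 2)      # spectral norm of A3^* A3
    return 6.0 * mu3 / (17.0 * spec)

print(beta_upper_bound(np.eye(4), 1.0))              # ~0.3529, i.e. 6/17 (Remark 3.1)
print(beta_upper_bound(np.random.randn(6, 4), 1.0))  # problem-dependent value
```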

4 Ergodic worst-case convergence rate

In this section, under Assumption 3.1, we establish the O(1 / t) worst-case convergence rate measured by the iteration complexity in the ergodic sense for the direct extension of ADMM (1.2a)–(1.2d). The main result is summarized in the following theorem.

Theorem 4.1

Suppose Assumption 3.1 holds. After t iterations of the direct extension of ADMM (1.2a)–(1.2d), we can find an approximate solution of (1.1) with an accuracy of O(1 / t).

Proof

As proved in Theorem 3.1, the sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) is convergent. Thus, there exists a constant \(M>0\) such that

$$\begin{aligned} \Vert A_ix_i^k\Vert \le M, \; i=1, 2,3,\;\hbox {and}\; \Vert z^k\Vert \le M, \quad \forall k\ge 0. \end{aligned}$$

Then, it follows from Assumption 3.1 and (3.9) that for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have

$$\begin{aligned}&2\beta \left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle z', \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] \\&\quad -2\beta \left[ \sum _{i=1}^3\theta _i(x_i')-\left\langle z^{k+1}, \sum _{i=1}^3A_ix_i'-b\right\rangle \right] \\&\;\le \sum _{i=2}^3\left( \Vert A_ix_i^{k}-A_ix_i'\Vert _{\beta ^2}^2-\Vert A_ix_i^{k+1}-A_ix_i'\Vert _{\beta ^2}^2\right) +\left( \Vert z^k-z'\Vert ^2-\Vert z^{k+1}-z'\Vert ^2\right) \\&\qquad +\,\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\\&\qquad +\,2\beta ^2\left\langle \sum _{i=1}^3A_ix_i'-b, A_2x_2^k-A_2x_2^{k+1}+A_3x_3^k-A_3x_3^{k+1}\right\rangle . \end{aligned}$$

Summing both sides of the above inequality from \(k=1\) to t yields

$$\begin{aligned}&2\beta \sum _{k=1}^{t}\left\{ \left[ \sum _{i=1}^3\theta _i(x_i^{k+1})-\left\langle z', \sum _{i=1}^3A_ix_i^{k+1}-b\right\rangle \right] \right. \nonumber \\&\quad \left. -\left[ \sum _{i=1}^3\theta _i(x_i')-\left\langle z^{k+1}, \sum _{i=1}^3A_ix_i'-b\right\rangle \right] \right\} \nonumber \\&\le \sum _{i=2}^3\Vert A_ix_i^{1}-A_ix_i'\Vert _{\beta ^2}^2+\Vert z^1-z'\Vert ^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}\nonumber \\&\qquad +2\beta ^2\left\langle \sum _{i=1}^3A_ix_i'-b, A_2x_2^0-A_2x_2^{t+1}+A_3x_3^0-A_3x_3^{t+1}\right\rangle \nonumber \\&\le \sum _{i=2}^3\Vert A_ix_i^{1}-A_ix_i'\Vert _{\beta ^2}^2+\Vert z^1-z'\Vert ^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}\nonumber \\&\qquad +2\beta ^2\left\| \sum _{i=1}^3A_ix_i'-b\right\| \left\| A_2x_2^0-A_2x_2^{t+1}+A_3x_3^0-A_3x_3^{t+1}\right\| \nonumber \\&\le \sum _{i=2}^3\Vert A_ix_i^{1}-A_ix_i'\Vert _{\beta ^2}^2+\Vert z^1-z'\Vert ^2\nonumber \\&\quad +\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}+8\beta ^2M\left\| \sum _{i=1}^3A_ix_i'-b\right\| , \end{aligned}$$
(4.1)

where the second inequality follows from the Cauchy-Schwarz inequality and the last one is because of the triangle inequality and the boundedness of \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\). Define

$$\begin{aligned} x_1^K:=\frac{1}{t}\sum _{k=1}^tx_1^k,\quad x_2^K:=\frac{1}{t}\sum _{k=1}^tx_2^k,\quad x_3^K:=\frac{1}{t}\sum _{k=1}^tx_3^k,\quad \hbox {and}\quad z^K:=\frac{1}{t}\sum _{k=1}^t z^k. \end{aligned}$$

Then, it follows from the convexity of \(\theta _1\), \(\theta _2\), and \(\theta _3\) that

$$\begin{aligned} \theta _1(x_1^K)\le \frac{1}{t}\sum _{k=1}^{t}\theta _1(x_1^k),\quad \theta _2(x_2^K)\le \frac{1}{t}\sum _{k=1}^{t}\theta _2(x_2^k),\quad \hbox {and}\quad \theta _3(x_3^K)\le \frac{1}{t}\sum _{k=1}^{t}\theta _3(x_3^k). \end{aligned}$$
(4.2)

Combining (4.1) and (4.2) yields

$$\begin{aligned}&2\beta t\left\{ \left[ \sum _{i=1}^3\theta _i(x_i^{K})-\left\langle z', \sum _{i=1}^3A_ix_i^{K}-b\right\rangle \right] -\left[ \sum _{i=1}^3\theta _i( x_i')-\left\langle z^{K}, \sum _{i=1}^3A_i x_i'-b\right\rangle \right] \right\} \nonumber \\&\le \sum _{i=2}^3\Vert A_ix_i^{1}-A_i x_i'\Vert _{\beta ^2}^2+\Vert z^1-z'\Vert ^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}+8\beta ^2M\left\| \sum _{i=1}^3A_ix_i'-b\right\| . \end{aligned}$$
(4.3)

Since \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) converges to a saddle point, \(\{(x_1^t,x_2^t, x_3^t, z^t)\}\) also converges to the same saddle point, and (4.3) implies that

$$\begin{aligned} 2\beta t{\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}\left( x_1^K,x_2^K,x_3^K, z^K\right) \le D, \end{aligned}$$

where

$$\begin{aligned} D:=\sup \left\{ \sum _{i=2}^3\Vert A_ix_i^{1}-A_ix_i'\Vert _{\beta ^2}^2+\Vert z^1-z'\Vert ^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}+8\beta ^2M\left\| \sum _{i=1}^3A_ix_i'-b\right\| \;\Big |\; (\mathbf {x}',z')\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\right\} . \end{aligned}$$

That is, \((\mathbf {x}^K,z^K)\) is an approximate solution of (1.1) with an accuracy of O(1 / t). A worst-case O(1 / t) convergence rate measured by the iteration complexity in the ergodic sense is thus established for the direct extension of ADMM (1.2a)–(1.2d). This completes the proof. \(\square \)
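In an implementation, the ergodic point \((\mathbf {x}^K,z^K)\) used in the proof above is simply the running average of the first t iterates and can be maintained on the fly without storing the history. A minimal sketch follows; the name admm_step is a hypothetical placeholder for one sweep of (1.2a)–(1.2d).

```python
# Sketch: maintaining the ergodic (averaged) iterate of Theorem 4.1 on the fly.
def update_average(avg, new, k):
    # running mean of the first k items: avg_k = avg_{k-1} + (new - avg_{k-1}) / k
    return avg + (new - avg) / k

# tiny numeric check: the running mean of [2, 4, 6] is 4
avg = 0.0
for k, v in enumerate([2.0, 4.0, 6.0], start=1):
    avg = update_average(avg, v, k)
print(avg)   # 4.0

# inside an ADMM loop one would use it as (pseudocode):
# for k in range(1, t + 1):
#     x, z = admm_step(x, z)                 # one sweep of (1.2a)-(1.2d)
#     x_avg = [update_average(a, xi, k) for a, xi in zip(x_avg, x)]
#     z_avg = update_average(z_avg, z, k)
```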

5 Non-ergodic worst-case convergence rate

In this section, we establish the worst-case convergence rate measured by the iteration complexity in a non-ergodic sense for the direct extension of ADMM (1.2a)–(1.2d). Note that a non-ergodic worst-case convergence rate is generally stronger than its ergodic counterpart. Thus, we need more assumptions to derive a non-ergodic worst-case convergence rate. We first make the following additional assumption, and refer to, e.g., [38], for some applications in which this assumption holds.

Assumption 5.1

In (1.1), \(\theta _1\) and \(\theta _2\) are convex; \(\theta _3\) is smooth, \(\nabla \theta _3\) is Lipschitz continuous with constant \(L_3>0\), and \(\theta _3\) is strongly convex with modulus \(\mu _3>L_3/2\); \(A_1\) and \(A_2\) are of full column rank.

First of all, we need to present the optimality measure for an iterate generated by (1.2a)–(1.2d). Recall the optimality conditions for the subproblems (1.2a)–(1.2c). We know that an iterate generated by (1.2a)–(1.2d) satisfies the following inequalities:

$$\begin{aligned} \left\{ \begin{array}{ll} \theta _1(x_1')-\theta _1(x_1^{k+1})-\left\langle A_1x_1'-A_1x_1^{k+1},\; z^{k+1}+\beta \left( A_2(x_2^{k+1}-x_2^k)+A_3(x_3^{k+1}-x_3^{k})\right) \right\rangle \ge 0, &{}\quad (5.1\hbox {a})\\ \theta _2(x_2')-\theta _2(x_2^{k+1})-\left\langle A_2x_2'-A_2x_2^{k+1},\; z^{k+1}+\beta A_3(x_3^{k+1}-x_3^{k})\right\rangle \ge 0, &{}\quad (5.1\hbox {b})\\ \theta _3(x_3')-\theta _3(x_3^{k+1})-\left\langle A_3x_3'-A_3x_3^{k+1},\; z^{k+1}\right\rangle \ge \frac{\mu _3}{2}\Vert x_3'-x_3^{k+1}\Vert ^2, &{}\quad (5.1\hbox {c}) \end{array}\right. \end{aligned}$$

for all \(x_i'\in {\mathcal {X}}_i\), \(i=1,2,3\).

Therefore, if we have

$$\begin{aligned} \left\{ \begin{array}{ll} A_2x_2^{k}-A_2x_2^{k+1}=0, &{}\quad (5.2\hbox {a})\\ A_3x_3^{k}-A_3x_3^{k+1}=0, &{}\quad (5.2\hbox {b})\\ z^{k}-z^{k+1}=0, &{}\quad (5.2\hbox {c}) \end{array}\right. \end{aligned}$$

then the iterate \((x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\) generated by (1.2a)–(1.2d) is a saddle point in S. Naturally, we can use the residual of (5.2a)–(5.2c) to measure the optimality of an iterate generated by (1.2a)–(1.2d).
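In practice, the residuals of (5.2a)–(5.2c) can be combined into the single scalar quantity bounded in Theorems 5.1 and 5.2 below and used as a stopping criterion. A minimal sketch with illustrative names and dense matrices:

```python
# Sketch: the non-ergodic residual of Theorem 5.1, usable as a stopping test.
import numpy as np

def nonergodic_residual(beta, A2, A3, x2_old, x2_new, x3_old, x3_new, z_old, z_new):
    return (np.linalg.norm(z_old - z_new) ** 2 / (2.0 * beta)
            + 0.5 * beta * (np.linalg.norm(A3 @ (x3_old - x3_new)) ** 2
                            + np.linalg.norm(A2 @ (x2_old - x2_new)) ** 2))

# e.g. terminate the iteration once this value drops below a chosen tolerance.
```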

Define

$$\begin{aligned} \beta _{\max }:=\min \left\{ \frac{2\mu _3-L_3}{{\left\| A_{3}^{*}A_3\right\| }},\frac{6 \mu _3}{17\left\| A_{3}^{*}A_3\right\| }\right\} . \end{aligned}$$
(5.3)

It follows from Assumption 5.1 that \(\beta _{\max }>0\). To derive the non-ergodic worst-case convergence rate, let us first prove a key inequality that will play an important role in our further analysis.

Lemma 5.1

Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then we have

$$\begin{aligned}&\frac{1}{2\beta }\Vert z^{k-1}- z^{k}\Vert ^2 +\frac{\beta }{2}\left\{ \Vert A_3x_3^{k-1}-A_3x_3^{k}\Vert ^2+\Vert A_2x_2^{k-1}-A_2x_2^{k}\Vert ^2\right\} \\&\;\ge \frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} . \end{aligned}$$

Proof

Setting \(x_1':=x_1^k\) in (5.1a) yields

$$\begin{aligned}&\theta _1\left( x_1^k\right) -\theta _1\left( x_1^{k+1}\right) - \Big \langle A_1x_1^k-A_1x_1^{k+1}, z^{k+1}\\&\quad +\beta \left( A_2\left( x_2^{k+1}-x_2^k\right) +A_3\left( x_3^{k+1}-x_3^{k}\right) \right) \Big \rangle \ge 0. \end{aligned}$$

Rearranging terms, we have

$$\begin{aligned}&\theta _1\left( x_1^k\right) -\theta _1\left( x_1^{k+1}\right) \ge \left\langle A_1x_1^k-A_1x_1^{k+1}, z^{k+1}\right\rangle \\&\quad - \beta \left\langle A_1x_1^k-A_1x_1^{k+1}, \sum \limits _{j=2}^3A_j\left( x_j^k-x_j^{k+1}\right) \right\rangle . \end{aligned}$$

Moreover, setting \(x_1':=x_1^{k+1}\) for the kth iteration, we obtain

$$\begin{aligned} \theta _1(x_1^{k+1})-\theta _1(x_1^{k})\ge & {} \left\langle A_1x_1^{k+1}-A_1x_1^{k}, z^{k}\right\rangle \\&-\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, \sum \limits _{j=2}^3A_j(x_j^{k-1}-x_j^{k})\right\rangle . \end{aligned}$$

Adding the above two inequalities and rearranging terms, we have

$$\begin{aligned} \left\langle A_1x_1^{k+1}-A_1x_1^{k}, z^{k+1}- z^{k}\right\rangle\ge & {} -\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, \sum \limits _{j=2}^3A_j(x_j^k-x_j^{k+1})\right\rangle \nonumber \\&\qquad -\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, \sum \limits _{j=2}^3A_j(x_j^{k-1}-x_j^{k})\right\rangle .\nonumber \\ \end{aligned}$$
(5.4)

Similarly, setting \(x_2'=x_2^k\) and \(x_2'=x_2^{k+1}\) in the kth iteration in (5.1b), and adding the resulting inequalities, we get the following inequality:

$$\begin{aligned}&\left\langle A_2x_2^{k+1}-A_2x_2^{k}, z^{k+1}- z^{k}\right\rangle \ge \;-\beta \left\langle A_2x_2^k-A_2x_2^{k+1}, A_3(x_3^k-x_3^{k+1})\right\rangle \nonumber \\&\quad -\beta \left\langle A_2x_2^{k+1}-A_2x_2^{k}, A_3(x_3^{k-1}-x_3^{k})\right\rangle . \end{aligned}$$
(5.5)

Then, it follows from (5.1c) that

$$\begin{aligned} \langle A_3x_3^{k+1}-A_3x_3^{k}, z^{k+1}- z^{k}\rangle \ge \mu _3\Vert x_3^{k+1}-x_3^k\Vert ^2. \end{aligned}$$
(5.6)

Adding (5.4)–(5.6), we have

$$\begin{aligned}&\left\langle \sum \limits _{j=1}^3A_j x_j^{k+1}-\sum \limits _{j=1}^3 A_j x_j^{k}, z^{k+1}- z^{k}\right\rangle \nonumber \\&\;\ge -\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, \sum \limits _{j=2}^3A_j(x_j^k-x_j^{k+1})\right\rangle \nonumber \\&\qquad -\,\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, \sum \limits _{j=2}^3A_j(x_j^{k-1}-x_j^{k})\right\rangle \nonumber \\&\qquad -\,\beta \left\langle A_2x_2^k-A_2x_2^{k+1}, A_3(x_3^k-x_3^{k+1})\right\rangle -\beta \left\langle A_2x_2^{k+1}-A_2x_2^{k}, A_3(x_3^{k-1}-x_3^{k})\right\rangle \nonumber \\&\qquad +\,\mu _3\Vert x_3^{k+1}-x_3^k\Vert ^2. \end{aligned}$$
(5.7)

Further, because of the identity (3.5), we know that

$$\begin{aligned} \sum \limits _{j=1}^3A_j x_j^{k+1}-\sum \limits _{j=1}^3 A_j x_j^{k}=\frac{1}{\beta }\left( ( z^{k}- z^{k+1})-( z^{k-1}- z^{k})\right) . \end{aligned}$$

We can thus rewrite the left-hand side of (5.7) as

$$\begin{aligned}&\left\langle \sum \limits _{j=1}^3A_j x_j^{k+1}-\sum \limits _{j=1}^3 A_j x_j^{k}, z^{k+1}- z^{k}\right\rangle \nonumber \\&\;=\frac{1}{\beta }\left\langle \left( z^{k}- z^{k+1}\right) -\left( z^{k-1}- z^{k}\right) , z^{k+1}- z^{k}\right\rangle \nonumber \\&\;=\frac{1}{2\beta }\left\{ \Vert z^{k-1}- z^{k}\Vert ^2-\Vert z^{k}- z^{k+1}\Vert ^2-\Vert \left( z^{k}- z^{k+1}\right) -\left( z^{k-1}- z^{k}\right) \Vert ^2\right\} .\nonumber \\ \end{aligned}$$
(5.8)

Now, we treat the right-hand side of (5.7). Indeed, these terms can be written as

$$\begin{aligned}&-\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, \sum \limits _{j=2}^3(A_jx_j^k-A_jx_j^{k+1})\right\rangle -\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, \sum \limits _{j=2}^3\left( A_jx_j^{k-1}-A_jx_j^{k}\right) \right\rangle \nonumber \\&-\beta \left\langle A_2x_2^k-A_2x_2^{k+1}, A_3x_3^k-A_3x_3^{k+1}\right\rangle -\beta \left\langle A_2x_2^{k+1}-A_2x_2^{k}, A_3x_3^{k-1}-A_3x_3^{k}\right\rangle \nonumber \\&=-\beta \left\langle A_3x_3^k-A_3x_3^{k+1}, \sum \limits _{j=1}^2\left( A_jx_j^k-A_jx_j^{k+1}\right) \right\rangle -\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&-\beta \left\langle A_3x_3^{k-1}-A_3x_3^{k}, \sum \limits _{j=1}^2(A_jx_j^{k+1}-A_j x_j^{k})\right\rangle -\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, A_2x_2^{k-1}-A_2x_2^{k}\right\rangle . \end{aligned}$$
(5.9)

Using the identity (3.10), we have

$$\begin{aligned}&-\beta \left\langle A_3x_3^k-A_3x_3^{k+1}, \sum \limits _{j=1}^2\left( A_jx_j^k-A_jx_j^{k+1}\right) \right\rangle \nonumber \\&\quad =\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\left\| \sum \limits _{j=1}^2A_j\left( x_j^k-x_j^{k+1}\right) \right\| ^2-\left\| \sum \limits _{j=1}^3A_j\left( x_j^k-x_j^{k+1}\right) \right\| ^2\right\} \nonumber \\ \end{aligned}$$
(5.10)

and

$$\begin{aligned}&-\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, A_2x_2^k-A_2x_2^{k+1}\right\rangle \nonumber \\&=\frac{\beta }{2}\left\{ \Vert A_1x_1^k-A_1x_1^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2-\left\| \sum \limits _{j=1}^2A_j\left( x_j^k-x_j^{k+1}\right) \right\| ^2\right\} .\nonumber \\ \end{aligned}$$
(5.11)

Then, using (2.1) and (3.3), we have that for all \(\omega >0\)

$$\begin{aligned}&-\beta \left\langle A_3x_3^{k-1}-A_3x_3^{k}, \sum \limits _{j=1}^2\left( A_jx_j^{k+1}-A_j x_j^{k}\right) \right\rangle \nonumber \\&=-\beta \left\langle A_3x_3^{k-1}-A_3x_3^{k}, \sum \limits _{j=1}^3\left( A_jx_j^{k+1}-A_j x_j^{k}\right) -\left( A_3x_3^{k+1}-A_3 x_3^{k}\right) \right\rangle \nonumber \\&=- \left\langle A_3x_3^{k-1}-A_3x_3^{k}, \left( z^k-z^{k+1}\right) -\left( z^{k-1}-z^k\right) \right\rangle \nonumber \\&\qquad +\,\beta \left\langle A_3x_3^{k-1}-A_3x_3^{k}, A_3x_3^{k+1}-A_3 x_3^{k}\right\rangle \nonumber \\&\ge - \frac{\omega }{2}\Vert x_3^{k-1}-x_3^{k}\Vert ^2-\frac{1}{2\omega }\Vert A_3^*z^k-A_3^*z^{k+1}\Vert ^2+\mu _3\Vert x_3^{k-1}-x_3^{k}\Vert ^2\nonumber \\&\qquad -\frac{\beta }{2}\Vert A_3x_3^{k-1}-A_3x_3^{k}\Vert ^2-\frac{\beta }{2}\Vert A_3x_3^{k}-A_3x_3^{k+1}\Vert ^2, \end{aligned}$$
(5.12)

and

$$\begin{aligned}&-\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, A_2x_2^{k-1}-A_2x_2^{k}\right\rangle \ge \nonumber \\&\quad -\frac{\beta }{2}\left\{ \Vert A_1x_1^{k+1}-A_1x_1^{k}\Vert ^2+\Vert A_2x_2^{k-1}-A_2x_2^{k}\Vert ^2\right\} . \end{aligned}$$
(5.13)

Substituting (5.10)–(5.13) into (5.9), we have

$$\begin{aligned}&-\beta \left\langle A_1x_1^k-A_1x_1^{k+1}, \sum \limits _{j=2}^3(A_jx_j^k-A_jx_j^{k+1})\right\rangle \nonumber \\&\quad -\beta \left\langle A_1x_1^{k+1}-A_1x_1^{k}, \sum \limits _{j=2}^3(A_jx_j^{k-1}-A_jx_j^{k})\right\rangle \nonumber \\&\quad -\beta \left\langle A_2x_2^k-A_2x_2^{k+1}, A_3x_3^k-A_3x_3^{k+1}\right\rangle -\beta \left\langle A_2x_2^{k+1}-A_2x_2^{k}, A_3x_3^{k-1}-A_3x_3^{k}\right\rangle \nonumber \\&\ge \frac{\beta }{2}\left( \Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2 -\Vert A_2x_2^{k-1}-A_2x_2^{k}\Vert ^2\right) \nonumber \\&\quad -\frac{1}{2\omega }\Vert A_3^*z^k-A_3^*z^{k+1}\Vert ^2-\frac{\beta }{2}\Vert A_3x_3^{k-1}-A_3x_3^{k}\Vert ^2\nonumber \\&\quad +\left( \mu _3-\frac{\omega }{2}\right) \Vert x_3^{k-1}-x_3^{k}\Vert ^2- \frac{\beta }{2}\left\| \sum \limits _{j=1}^3A_j(x_j^k-x_j^{k+1})\right\| ^2. \end{aligned}$$
(5.14)

Using the identities \(A_3^*z^{k+1}= \nabla \theta _3(x_3^{k+1})\) and \(A_3^*z^{k}= \nabla \theta _3(x_3^{k})\), and the Lipschitz continuity of \(\nabla \theta _3\), we obtain

$$\begin{aligned} \Vert A_3^* z^{k+1}-A_3^*z^k\Vert \le L_3\Vert x_3^{k+1}-x_3^k\Vert . \end{aligned}$$

Combining (5.8) and (5.14), and setting \(\omega =L_3\), it follows from (5.7) that

$$\begin{aligned}&\frac{1}{2\beta }\Vert z^{k-1}- z^{k}\Vert ^2 +\frac{\beta }{2}\left\{ \Vert A_3x_3^{k-1}-A_3x_3^{k}\Vert ^2+\Vert A_2x_2^{k-1}-A_2x_2^{k}\Vert ^2\right\} \\&\;\ge \frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} \\&\qquad +\left( \mu _3-\frac{L_3}{2}\right) \Vert x_3^{k}-x_3^{k+1}\Vert ^2\\&\qquad \;\;+\left( \mu _3-\frac{\beta \Vert A_3^*A_3\Vert }{2}-\frac{L_3}{2}\right) \Vert x_3^{k-1}-x_3^{k}\Vert ^2\\&\;\ge \frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} , \end{aligned}$$

where the last inequality is due to the fact that \(\beta \le \frac{2\mu _3-L_3}{\Vert A_3^*A_3\Vert }\). The proof is complete. \(\square \)

Based on the above lemma, we are now ready to establish the worst-case convergence rate measured by the iteration complexity in a non-ergodic sense. We can actually measure this convergence rate by both O(1 / t) and o(1 / t) orders. Recall that we use the residual of (5.2a)–(5.2c) to measure the accuracy of an iterate of the scheme (3.2a)–(3.2d).

Theorem 5.1

Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (3.2a)–(3.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then, we have

$$\begin{aligned} \frac{1}{2\beta }\Vert z^{t}- z^{t+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^t-A_3x_3^{t+1}\Vert ^2+\Vert A_2x_2^t-A_2x_2^{t+1}\Vert ^2\right\} =O(1/t). \end{aligned}$$

Proof

First, it follows from (3.18) that

$$\begin{aligned}&\frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2 \nonumber \\&\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2-\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}. \end{aligned}$$
(5.15)

Setting

$$\begin{aligned} \mu =\min \left\{ \frac{\beta }{20}, \frac{2\varrho _1}{\beta \Vert A_3^*A_3\Vert }\right\} , \end{aligned}$$

it follows that

$$\begin{aligned}&\mu \left\{ \frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left( \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right) \right\} \nonumber \\&\le \frac{1}{40}\Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\varrho _1\Vert x_3^{k+1}-x_3^k\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2 \nonumber \\&\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2-\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2 +\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}. \end{aligned}$$
(5.16)

Adding these inequalities from \(k=1\) to t, we have

$$\begin{aligned}&\sum _{k=1}^t\left\{ \frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left( \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right) \right\} \\&\le \frac{1}{\mu }\left\{ \Vert {\mathbf{w}}^{0}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}\right\} . \end{aligned}$$

Then, using Lemma 5.1, we obtain

$$\begin{aligned}&\frac{1}{2\beta }\Vert z^{t}- z^{t+1}\Vert ^2+\frac{\beta }{2}\left( \Vert A_3x_3^t-A_3x_3^{t+1}\Vert ^2+\Vert A_2x_2^t-A_2x_2^{t+1}\Vert ^2\right) \\&\le \frac{1}{\mu t}\left\{ \Vert {\mathbf{w}}^{0}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{1}-A_3x_3^{0}\Vert ^2_{\beta ^2}\right\} , \end{aligned}$$

which means that

$$\begin{aligned} \frac{1}{2\beta }\Vert z^{t}- z^{t+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^t-A_3x_3^{t+1}\Vert ^2+\Vert A_2x_2^t-A_2x_2^{t+1}\Vert ^2\right\} =O(1/t). \end{aligned}$$

The proof is complete. \(\square \)

In Theorem 5.1, we show the worst-case O(1 / t) convergence rate in a nonergodic sense for the direct extension of ADMM (3.2a)–(3.2d). Indeed, following the work [9], we can easily refine this O(1 / t) order to an o(1 / t) order, and thus establish a worst-case o(1 / t) convergence rate in a nonergodic sense for the scheme (3.2a)–(3.2d). We summarize it in the following theorem.

Theorem 5.2

Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (3.2a)–(3.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then, we have

$$\begin{aligned} \frac{1}{2\beta }\Vert z^{t}- z^{t+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^t-A_3x_3^{t+1}\Vert ^2+\Vert A_2x_2^t-A_2x_2^{t+1}\Vert ^2\right\} =o(1/t). \end{aligned}$$
(5.17)

Proof

Taking \(k=t\) to 2t in (5.16) and adding the resulting inequalities, we obtain

$$\begin{aligned}&t\left\{ \frac{1}{2\beta }\Vert z^{2t}- z^{2t+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^{2t}-A_3x_3^{2t+1}\Vert ^2+\Vert A_2x_2^{2t}-A_2x_2^{2t+1}\Vert ^2\right\} \right\} \\&\le \sum _{k=t}^{2t}\frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} \\&=\sum _{k=1}^{2t}\frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} \\&\qquad -\sum _{k=1}^{t}\frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}\left\{ \Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2\right\} . \end{aligned}$$

Since, by the proof of Theorem 5.1, the series \(\sum _{k=1}^{\infty }\big \{\frac{1}{2\beta }\Vert z^{k}- z^{k+1}\Vert ^2+\frac{\beta }{2}(\Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2+\Vert A_2x_2^k-A_2x_2^{k+1}\Vert ^2)\big \}\) converges, the right-hand side above, being the difference of two of its partial sums, tends to 0 as \(t\rightarrow \infty \); for the remaining indices, the monotonicity in Lemma 5.1 applies. So, the assertion (5.17) is proved. \(\square \)

6 Globally linear convergence

In this section, we show that it is possible to theoretically derive the global linear convergence of the direct extension of ADMM (1.2a)–(1.2d) provided that sufficiently strong conditions are assumed.

6.1 Linear convergence under stronger conditions

In this subsection, we prove the linear convergence of the direct extension of ADMM under the following conditions. We refer to [24] for some existing analysis under stronger conditions.

Assumption 6.1

In (1.1), \(\theta _1\) is convex; \(\theta _2\) and \(\theta _3\) are strongly convex with modulus \(\mu _2>0\) and \(\mu _3>0\), respectively; \(\nabla \theta _1\) is Lipschitz continuous with constant \(L_1\); \(A_2\) and \(A_3\) are of full column rank; and \(A_1\) is nonsingular.

First of all, under Assumptions 3.1 and 6.1, we can prove a result similar to (3.18). That is, there exists a constant \(\zeta =\min \{\frac{1}{40}, \varrho _1, \mu _2\}>0\) such that

$$\begin{aligned}&\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2} -\zeta \left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right. \nonumber \\&\qquad \;+\left. \Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} . \end{aligned}$$
(6.1)

Then, let us explain the roadmap for proving the linear convergence rate of (1.2a)–(1.2d). Indeed, according to (6.1), it is clear that we only need to bound the terms \(\Vert z^{k+1}-\hat{z}\Vert ^2\) and \(\Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\) by the quantities appearing in the subtracted term of (6.1). We first consider \(\Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\). It follows from (3.14) and (3.5) that

$$\begin{aligned} A_1x_1^{k+1}-A_1\hat{x}_1=\frac{1}{\beta }(z^k-z^{k+1})-(A_2x_2^{k+1}-A_2\hat{x}_2)-(A_3x_3^{k+1}-A_3\hat{x}_3). \end{aligned}$$

Therefore, we have

$$\begin{aligned} \Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\le 3\left\{ \frac{1}{\beta ^2}\Vert z^k-z^{k+1}\Vert ^2+\Vert A_2x_2^{k+1}-A_2\hat{x}_2\Vert ^2+\Vert A_3x_3^{k+1}-A_3\hat{x}_3\Vert ^2\right\} . \end{aligned}$$
(6.2)

Then, we need to consider how to bound the term \(\Vert z^{k+1}-\hat{z}\Vert ^2\) in order to derive the linear convergence of the sequence generated by the scheme (1.2a)–(1.2d). As we show below, this is exactly where Assumption 6.1 is needed.

Lemma 6.1

Suppose Assumption 6.1 holds. Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S; let \(\{(x_1^{k+1},x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, there exists a constant \(\sigma >0\) such that

$$\begin{aligned} \Vert z^{k+1}-\hat{z}\Vert ^2\le \sigma \left( \Vert x_1^{k+1}-\hat{x}_1\Vert ^2+\Vert x_2^{k}-x_2^{k+1}\Vert ^2+\Vert x_3^{k}-x_3^{k+1}\Vert ^2\right) . \end{aligned}$$

Proof

It follows from the optimality condition of the \(x_1\)-subproblem (combined with the multiplier update (1.2d)) and the optimality condition satisfied by \(\hat{x}_1\) that

$$\begin{aligned} A_1^* z^{k+1}= \nabla \theta _1(x_1^{k+1})+\beta A_1^*\sum _{i=2}^3\left( A_ix_i^{k}-A_ix_i^{k+1}\right) \end{aligned}$$

and

$$\begin{aligned} A_1^* \hat{z}= \nabla \theta _1(\hat{x}_1). \end{aligned}$$

Using Assumption 6.1 and the triangle inequality, we get

$$\begin{aligned} \Vert A_1^* z^{k+1}-A_1^* \hat{z}\Vert \le L_1\Vert x_1^{k+1}-\hat{x}_1\Vert +\beta \Vert A_1\Vert \sum _{i=2}^3\left( \Vert A_i\Vert \Vert x_i^{k}-x_i^{k+1}\Vert \right) , \end{aligned}$$

and the assertion follows immediately from the non-singularity of \(A_1\). \(\square \)
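For completeness, tracing the displayed bound gives one admissible (though not necessarily tight) explicit choice of the constant, namely

$$\begin{aligned} \sigma =\frac{3}{\lambda _{\min }(A_1A_1^*)}\max \left\{ L_1^2,\; \beta ^2\Vert A_1\Vert ^2\Vert A_2\Vert ^2,\; \beta ^2\Vert A_1\Vert ^2\Vert A_3\Vert ^2\right\} , \end{aligned}$$

which follows from \(\Vert A_1^*(z^{k+1}-\hat{z})\Vert ^2\ge \lambda _{\min }(A_1A_1^*)\Vert z^{k+1}-\hat{z}\Vert ^2\) (recall that \(A_1\) is nonsingular) and the elementary inequality \((a+b+c)^2\le 3(a^2+b^2+c^2)\).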

Now, we can derive the linear convergence of the sequence generated by the scheme (1.2a)–(1.2d) under Assumptions 3.1 and 6.1.

Theorem 6.1

Suppose Assumptions 3.1 and 6.1 hold. Then the sequence \(\{(x_1^{k+1},x_2^{k+1}, x_3^{k+1},z^{k+1})\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \) converges globally linearly to a saddle point in S.

Proof

Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. It follows from (6.2) that

$$\begin{aligned}&\frac{\zeta }{4}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{4}\left\{ \Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{4}\left\{ \beta ^2\frac{1}{\beta ^2}\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\frac{1}{\Vert A_i^* A_i\Vert }\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{12}\min \left\{ \beta ^2, \frac{1}{\Vert A_2^* A_2\Vert }, \frac{1}{\Vert A_3^* A_3\Vert }\right\} \Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\nonumber \\&\ge \frac{\zeta }{12}\min \left\{ \beta ^2, \frac{1}{\Vert A_2^* A_2\Vert }, \frac{1}{\Vert A_3^* A_3\Vert }\right\} \lambda _{\min }(A_1^* A_1)\Vert x_1^{k+1}-\hat{x}_1\Vert ^2\nonumber \\&=\tau _1\Vert x_1^{k+1}-\hat{x}_1\Vert ^2, \end{aligned}$$
(6.3)

where \(\tau _1=\frac{\zeta }{12}\min \left\{ \beta ^2, \frac{1}{\Vert A_2^* A_2\Vert }, \frac{1}{\Vert A_3^* A_3\Vert }\right\} \lambda _{\min }(A_1^* A_1)\). In addition, we have

$$\begin{aligned}&\frac{\zeta }{4}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{4}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{4}\left\{ \beta ^2\lambda _{\min }(A_2^* A_2)\Vert x_2^k-x_2^{k+1}\Vert ^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{4}\min \left\{ \beta ^2\lambda _{\min }(A_2^* A_2),1\right\} \left\{ \Vert x_2^k-x_2^{k+1}\Vert ^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} \nonumber \\&=\tau _2\left\{ \Vert x_2^k-x_2^{k+1}\Vert ^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} , \end{aligned}$$
(6.4)

where \(\tau _2=\frac{\zeta }{4}\min \left\{ \beta ^2\lambda _{\min }(A_2^* A_2),1\right\} \). Adding (6.3) and (6.4), and using Lemma 6.1, we have

$$\begin{aligned}&\frac{\zeta }{2}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \tau _1\Vert x_1^{k+1}-\hat{x}_1\Vert ^2+\tau _2\left\{ \Vert x_2^k-x_2^{k+1}\Vert ^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} \nonumber \\&\ge \min \left\{ \tau _1, \tau _2\right\} \left\{ \Vert x_1^{k+1}-\hat{x}_1\Vert ^2+\Vert x_2^k-x_2^{k+1}\Vert ^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right\} \nonumber \\&\ge \frac{\min \left\{ \tau _1, \tau _2\right\} }{\sigma }\Vert z^{k+1}-\hat{z}\Vert ^2\nonumber \\&=\tau \Vert z^{k+1}-\hat{z}\Vert ^2, \end{aligned}$$
(6.5)

where \(\tau =\frac{1}{\sigma }\min \left\{ \tau _1, \tau _2\right\} \). Moreover, we have

$$\begin{aligned}&\frac{\zeta }{2}\left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{2}\left\{ \Vert x_3^k-x_3^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \frac{\zeta }{2}\left\{ \frac{3}{10\beta ^2}\frac{10}{3}\Vert x_3^k-x_3^{k+1}\Vert ^2_{\beta ^2}+\sum _{i=2}^3\frac{1}{\beta ^2\Vert A_i^* A_i\Vert }\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2_{\beta ^2}\right\} \nonumber \\&\ge \frac{\zeta }{2\beta ^2}\min \left\{ \frac{1}{\Vert A_2^* A_2\Vert }, \frac{3}{10\Vert A_3^* A_3\Vert }\right\} \left\{ \frac{10}{3}\Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2_{\beta ^2}+\sum _{i=2}^3\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2_{\beta ^2}\right\} \nonumber \\&=\tau '\left\{ \frac{10}{3}\Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2_{\beta ^2}+\sum _{i=2}^3\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2_{\beta ^2}\right\} , \end{aligned}$$
(6.6)

where \(\tau '=\frac{\zeta }{2\beta ^2}\min \left\{ \frac{1}{\Vert A_2^* A_2\Vert }, \frac{3}{10\Vert A_3^* A_3\Vert }\right\} \).

Then, it follows from (6.5) and (6.6) that

$$\begin{aligned}&\zeta \left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+\Vert z^k-z^{k+1}\Vert ^2+\sum _{i=2}^3\Vert x_i^{k+1}-\hat{x}_i\Vert ^2\right\} \nonumber \\&\ge \tau \Vert z^{k+1}-\hat{z}\Vert ^2+\tau '\left\{ \frac{10}{3}\Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2_{\beta ^2}+\sum _{i=2}^3\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2_{\beta ^2}\right\} \nonumber \\&\ge \min \left\{ \tau , \tau '\right\} \left\{ \Vert z^{k+1}-\hat{z}\Vert ^2+\frac{10}{3}\Vert A_3x_3^k-A_3x_3^{k+1}\Vert ^2_{\beta ^2}+\sum _{i=2}^3\Vert A_ix_i^{k+1}-A_i\hat{x}_i\Vert ^2_{\beta ^2}\right\} \nonumber \\&=\sigma '\left\{ \Vert \mathbf {w}^{k+1}-\hat{\mathbf {w}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\right\} , \end{aligned}$$
(6.7)

where Q is defined in (2.5), whose positive definiteness is ensured by Assumption 6.1, and \(\sigma '=\min \left\{ \tau , \tau '\right\} \). Finally, combining (6.1) and (6.7), we obtain

$$\begin{aligned}&\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\\&\quad \le \frac{1}{1+\sigma '}\left\{ \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\right\} , \end{aligned}$$

which implies the linear convergence rate of the generated sequence under the Q-norm. The proof is complete. \(\square \)
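For readers who wish to trace the size of the resulting contraction factor \(\frac{1}{1+\sigma '}\), the following sketch simply transcribes the formulas for \(\tau _1\), \(\tau _2\), \(\tau \), \(\tau '\) and \(\sigma '\) from the proof above. The constants \(\zeta \) and \(\sigma \) (from Lemma 6.1) are treated as given inputs with hypothetical values, since they depend on data introduced elsewhere in the paper.

```python
import numpy as np

def contraction_constants(A1, A2, A3, beta, zeta, sigma):
    """Transcription of tau1, tau2, tau, tau', sigma' from the proof of Theorem 6.1."""
    nA2 = np.linalg.norm(A2.T @ A2, 2)           # ||A_2^* A_2||
    nA3 = np.linalg.norm(A3.T @ A3, 2)           # ||A_3^* A_3||
    lmin_A1 = np.linalg.eigvalsh(A1.T @ A1)[0]   # lambda_min(A_1^* A_1)
    lmin_A2 = np.linalg.eigvalsh(A2.T @ A2)[0]   # lambda_min(A_2^* A_2)
    tau1 = zeta / 12 * min(beta ** 2, 1 / nA2, 1 / nA3) * lmin_A1
    tau2 = zeta / 4 * min(beta ** 2 * lmin_A2, 1.0)
    tau = min(tau1, tau2) / sigma
    tau_prime = zeta / (2 * beta ** 2) * min(1 / nA2, 3 / (10 * nA3))
    sigma_prime = min(tau, tau_prime)
    return tau1, tau2, tau, tau_prime, sigma_prime, 1.0 / (1.0 + sigma_prime)

# Hypothetical data, just to exercise the formulas.
rng = np.random.default_rng(2)
n = 10
A1 = rng.standard_normal((n, n)) + n * np.eye(n)
A2, A3 = rng.standard_normal((2 * n, n)), rng.standard_normal((2 * n, n))
print(contraction_constants(A1, A2, A3, beta=0.05, zeta=0.02, sigma=50.0))
```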

6.2 Linear convergence under error bound conditions

As mentioned, the linear convergence of the scheme (1.2a)–(1.2d) has been well investigated under some error bound conditions in [21]. That work also inspired [15], which derives the linear convergence of the original two-block ADMM scheme in the quadratic programming context and which was further extended in [35] to the general case where the objective functions possess polyhedral subdifferentials. In this subsection, we provide an alternative proof for the globally linear convergence of (1.2a)–(1.2d) under some error bound conditions. The new conditions differ from Assumption 6.1 in that no smoothness or strong convexity assumption on the objective function is required.

Assumption 6.2

Let \({\mathcal {U}}:={\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\) and let \(\tilde{U}\) be a bounded set in \({\mathcal {U}}\). Then there exists \(\eta >0\) such that

$$\begin{aligned} \hbox {dist}(u,S)\le \eta \hbox {dist}(0,F(u)),\quad \forall u\in \tilde{U}, \end{aligned}$$
(6.8)

where \(u:=(x_1,x_2,x_3, z)\), and

$$\begin{aligned} F(u):=\left( \begin{array}{c}\partial \theta _1(x_1)-A_1^*z\\ \partial \theta _2(x_2)-A_2^*z\\ \partial \theta _3(x_3)-A_3^*z\\ \sum \limits _{i=1}^3A_ix_i-b\end{array}\right) . \end{aligned}$$
(6.9)

Obviously, using the definition in (6.9) and the optimality conditions (2.2a)–(2.2d), we see that a point \(u\) belongs to S if and only if \(0\in F(u)\). Using the result [36, Theorem 3.3], it is clear that the error bound condition (6.8) holds if F is a piecewise linear multifunction, which is true for the case where the \(\theta _i\)’s are, e.g., the \(l_1\)- or \(l_2\)-terms. For the definition of a piecewise linear (polyhedral) multifunction, see, e.g., [31, 36].
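As an illustration of the residual map (6.9), \(\hbox {dist}(0,F(u))\) is explicitly computable when the \(\theta _i\) are \(l_1\)-norms, using the coordinatewise description of \(\partial \Vert \cdot \Vert _1\). The following sketch, with hypothetical data, evaluates this distance; it is only meant to make the quantity on the right-hand side of (6.8) concrete.

```python
import numpy as np

def dist_to_l1_subdiff(x, a, tol=1e-12):
    """Coordinatewise distance from the point a to the subdifferential of ||.||_1 at x."""
    return np.where(np.abs(x) > tol,
                    np.abs(a - np.sign(x)),            # x_j != 0: subgradient is sign(x_j)
                    np.maximum(np.abs(a) - 1.0, 0.0))  # x_j == 0: subgradient set is [-1, 1]

def dist_0_F(x1, x2, x3, z, A1, A2, A3, b):
    """dist(0, F(u)) as in (6.9) for theta_i = ||.||_1."""
    parts = [dist_to_l1_subdiff(x, A.T @ z) for x, A in ((x1, A1), (x2, A2), (x3, A3))]
    primal = A1 @ x1 + A2 @ x2 + A3 @ x3 - b
    return np.sqrt(sum(np.sum(p ** 2) for p in parts) + np.sum(primal ** 2))

# Hypothetical data, just to exercise the formulas.
rng = np.random.default_rng(3)
m, n = 8, 5
A1, A2, A3 = (rng.standard_normal((m, n)) for _ in range(3))
b = rng.standard_normal(m)
x1, x2, x3, z = rng.standard_normal(n), rng.standard_normal(n), np.zeros(n), rng.standard_normal(m)
print(dist_0_F(x1, x2, x3, z, A1, A2, A3, b))
```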

Recall that under Assumption 3.1, we have shown that there exists a constant \(\zeta =\min \{\frac{1}{40}, \varrho _1\}>0\) such that

$$\begin{aligned}&\Vert {\mathbf{w}}^{k+1}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \Vert {\mathbf{w}}^{k}-{\hat{\mathbf{w}}}\Vert _Q^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}-\zeta \left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right. \nonumber \\&\qquad \;+\left. \Vert z^k-z^{k+1}\Vert ^2+\Vert x_3^{k+1}-\hat{x}_3\Vert ^2\right\} , \end{aligned}$$
(6.10)

where \(\mathbf {w}:=(x_2,x_3, z)\) and Q is defined by (2.5); see (3.18). To prove the linear convergence rate of the scheme (1.2a)–(1.2d) under Assumption 6.2, it follows from (6.10) that we only need to bound the term \(\Vert \mathbf {w}^{k+1}-{\hat{\mathbf{w}}}\Vert ^2\) by the quantities appearing in the subtracted term of (6.10). Then, under Assumption 6.2, the key is to find a bound of the term \(\hbox {dist}(0,F(u^{k+1}))\). Therefore, we first prove the following lemma.

Lemma 6.2

Let F(u) be defined in (6.9) and \(\{u^k:=(x_1^k,x_2^k, x_3^k, z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, there exists a constant \(\sigma >0\) such that

$$\begin{aligned} \hbox {dist}^2(0,F(u^{k+1})) \le \sigma \left\{ \left\| A_2x_2^k-A_2x_2^{k+1}\right\| _{\beta ^2}^2+\left\| x_3^k-x_3^{k+1}\right\| ^2+ \Vert z^k-z^{k+1}\Vert ^2\right\} ,\nonumber \\ \end{aligned}$$
(6.11)

where

$$\begin{aligned} \sigma =\max \left\{ \frac{1}{\beta ^2},\beta ^2(2\Vert A_1\Vert ^2+\Vert A_2\Vert ^2)\Vert A_3\Vert ^2,2\Vert A_1\Vert ^2\right\} . \end{aligned}$$
(6.12)

Proof

It follows from the optimality condition of the \(x_1\)-subproblem that

$$\begin{aligned} 0\in \partial \theta _1\left( x_1^{k+1}\right) -A_1^* z^k+\beta A_1^*\left( A_1x_1^{k+1}+A_2x_2^k+A_3x_3^k-b\right) , \end{aligned}$$

which can be alternatively written as (using (1.2d))

$$\begin{aligned} \beta A_1^*\left[ A_2\left( x_2^{k+1}-x_2^{k}\right) +A_3\left( x_3^{k+1}-x_3^{k}\right) \right] \in \partial \theta _1\left( x_1^{k+1}\right) -A_1^* z^{k+1}. \end{aligned}$$

Consequently, we have

$$\begin{aligned} \hbox {dist}^2(0, \partial \theta _1(x_1^{k+1})-A_1^* z^{k+1})&\le \Vert \beta A_1^*[A_2(x_2^{k+1}-x_2^{k})+A_3(x_3^{k+1}-x_3^{k})]\Vert ^2\nonumber \\&\le 2\beta ^2 \Vert A_1\Vert ^2\{\Vert A_2(x_2^{k+1}-x_2^{k})\Vert ^2+\Vert A_3(x_3^{k+1}-x_3^{k})\Vert ^2\}. \end{aligned}$$
(6.13)

Moreover, the optimality condition of the \(x_2\)-subproblem and (1.2d) indicate that

$$\begin{aligned} \beta A_2^*A_3(x_3^{k+1}-x_3^{k}) \in \partial \theta _2(x_2^{k+1})-A_2^* z^{k+1}. \end{aligned}$$

Hence, we obtain

$$\begin{aligned} \hbox {dist}^2(0, \partial \theta _2(x_2^{k+1})-A_2^* z^{k+1})&\le \Vert \beta A_2^*A_3(x_3^{k+1}-x_3^{k})\Vert ^2\le \beta ^2 \Vert A_2\Vert ^2\Vert A_3(x_3^{k+1}-x_3^{k})\Vert ^2. \end{aligned}$$
(6.14)

Further, the optimality condition of the \(x_3\)-subproblem and (1.2d) indicate that

$$\begin{aligned} 0 \in \partial \theta _3(x_3^{k+1})-A_3^* z^{k+1}, \end{aligned}$$

which means

$$\begin{aligned} \hbox {dist}^2(0, \partial \theta _3(x_3^{k+1})-A_3^* z^{k+1})=0. \end{aligned}$$
(6.15)

In addition, it follows from (1.2d) that

$$\begin{aligned} \frac{1}{\beta }(z^k-z^{k+1})=A_1 x_1^{k+1}+A_2 x_2^{k+1}+A_3 x_3^{k+1}-b. \end{aligned}$$
(6.16)

Finally, combining (6.13)–(6.16), we obtain

$$\begin{aligned} \hbox {dist}^2(0, F(u^{k+1}))&=\hbox {dist}^2(0, \partial \theta _1(x_1^{k+1})-A_1^* z^{k+1})+ \hbox {dist}^2(0, \partial \theta _2(x_2^{k+1})-A_2^* z^{k+1})\nonumber \\&\qquad + \hbox {dist}^2(0, \partial \theta _3(x_3^{k+1})-A_3^* z^{k+1})+ \Vert A_1 x_1^{k+1}+A_2 x_2^{k+1}+A_3 x_3^{k+1}-b\Vert ^2\nonumber \\&\le 2\beta ^2\Vert A_1\Vert ^2\Vert A_2(x_2^{k+1}-x_2^{k})\Vert ^2 +\frac{1}{\beta ^2}\Vert z^{k+1}-z^k\Vert ^2\nonumber \\&\qquad +\beta ^2(2\Vert A_1\Vert ^2+\Vert A_2\Vert ^2)\Vert A_3(x_3^{k+1}-x_3^{k})\Vert ^2. \end{aligned}$$
(6.17)

Therefore, the assertion (6.11) follows immediately from (6.17) and the definition of \(\sigma \) in (6.12). \(\square \)
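The constant (6.12) itself involves only the operator norms and the penalty parameter; a trivial sketch of its computation, with \(\Vert A_i\Vert \) evaluated as spectral norms of the given matrices and hypothetical inputs, is as follows.

```python
import numpy as np

def sigma_of_612(A1, A2, A3, beta):
    """Transcription of the constant sigma in (6.12)."""
    nA1, nA2, nA3 = (np.linalg.norm(A, 2) for A in (A1, A2, A3))
    return max(1.0 / beta ** 2,
               beta ** 2 * (2 * nA1 ** 2 + nA2 ** 2) * nA3 ** 2,
               2 * nA1 ** 2)

# Example call with hypothetical 3x2 matrices and beta = 0.05.
rng = np.random.default_rng(4)
print(sigma_of_612(*(rng.standard_normal((3, 2)) for _ in range(3)), beta=0.05))
```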

Now, we show the globally linear convergence of the direct extension of ADMM (1.2a)–(1.2d) under Assumptions 3.1 and 6.2.

Theorem 6.2

Suppose Assumptions 3.1 and 6.2 hold. Then the sequence \(\{(x_1^k,x_2^k, x_3^k,z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \) converges globally linearly to a saddle point in S.

Proof

Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. According to Theorem 3.1, the whole sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) converges to a saddle point in S. Thus it is bounded. Because of Assumption 6.2, we know that

$$\begin{aligned} \hbox {dist}(u^{k+1},S)\le \eta \hbox {dist}(0,F(u^{k+1})),\qquad \forall k\ge 0, \end{aligned}$$
(6.18)

which, together with (6.11), implies that

$$\begin{aligned} \hbox {dist}^2(u^{k+1},S) \le \eta ^2\sigma \left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2+ \Vert z^k-z^{k+1}\Vert ^2\right\} .\nonumber \\ \end{aligned}$$
(6.19)

Let us define

$$\begin{aligned} M=\left( \begin{array}{cc}0&{}0\\ 0&{}Q\end{array}\right) , \end{aligned}$$

with Q given in (2.5). Then, combining (6.19) with (6.10), we have

$$\begin{aligned}&\Vert u^{k+1}-\hat{u}\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \Vert u^{k}-\hat{u}\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2} -\zeta \left\{ \Vert A_2x_2^k-A_2x_2^{k+1}\Vert _{\beta ^2}^2+\Vert x_3^k-x_3^{k+1}\Vert ^2\right. \nonumber \\&\qquad \;+\left. \Vert z^k-z^{k+1}\Vert ^2\right\} \nonumber \\&\;\le \Vert u^{k}-\hat{u}\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2} - \frac{\zeta }{3}\Vert x_3^k-x_3^{k+1}\Vert ^2-\frac{2\zeta }{3\sigma \eta ^2}\hbox {dist}^2(u^{k+1},S).\nonumber \\ \end{aligned}$$
(6.20)

It follows from (6.18) that

$$\begin{aligned} \hbox {dist}_M^2(u^{k+1},S)\le \eta ^2\Vert M\Vert \hbox {dist}^2(0,F(u^{k+1})),\qquad \forall k\ge 0. \end{aligned}$$

Let \(u^*\in S\) be such that \(\hbox {dist}_{M}(u^k,S)=\Vert u^k-u^*\Vert _M\). Since (6.20) holds for any saddle point in S, we have

$$\begin{aligned}&\Vert u^{k+1}-u^*\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\;\le \hbox {dist}_{M}^2(u^k,S)+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2} - \frac{\zeta }{3}\Vert x_3^k-x_3^{k+1}\Vert ^2-\frac{2\zeta }{3\sigma \eta ^2}\hbox {dist}^2(u^{k+1},S). \end{aligned}$$
(6.21)

Setting

$$\begin{aligned} \bar{\mu }:=\min \left\{ \frac{\zeta }{10\beta ^2\Vert A_3\Vert ^2},\frac{\zeta }{3\sigma \eta ^2\Vert M\Vert }\right\} \;\; \hbox {and}\;\; \gamma :=\frac{\zeta }{3(1+\bar{\mu })\sigma \eta ^2}, \end{aligned}$$

we obtain the following inequality:

$$\begin{aligned}&(1+\bar{\mu })\left\{ \hbox {dist}_M^2(u^{k+1},S)+\hbox {dist}_{\gamma I}^2(u^{k+1},S)+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^{k}\Vert ^2_{\beta ^2}\right\} \nonumber \\&\le \Vert u^{k+1}-u^*\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^k\Vert ^2_{\beta ^2}\nonumber \\&\quad +\bar{\mu } \left\{ \hbox {dist}_M^2(u^{k+1},S)+\frac{10}{3}\Vert A_3x_3^{k+1}-A_3x_3^{k}\Vert ^2_{\beta ^2}\right\} \nonumber \\&\qquad +(1+\bar{\mu })\gamma \hbox {dist}^2(u^{k+1},S)\nonumber \\&\le \Vert u^{k}-u^*\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2} -\left( \frac{\zeta }{3}-\frac{10\beta ^2\Vert A_3\Vert ^2\bar{\mu }}{3}\right) \Vert x_3^{k}-x_3^{k+1}\Vert ^2\nonumber \\&\qquad -\left( \frac{\zeta }{3\sigma \eta ^2}-\bar{\mu }\Vert M\Vert \right) \hbox {dist}^2(u^{k+1},S)\nonumber \\&\le \Vert u^{k}-u^*\Vert _M^2+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\nonumber \\&= \hbox {dist}_M^2(u^{k},S)+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\nonumber \\&\le \hbox {dist}_{\gamma I+M}^2(u^{k},S)+\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}, \end{aligned}$$
(6.22)

where the second inequality follows from (6.21) and the third one from the definition of \(\bar{\mu }\). Thus, the inequality (6.22) implies the linear convergence of the sequence generated by the direct extension of ADMM, measured by the sum of the squared distance to S under the \((\gamma I+M)\)-norm and the term \(\frac{10}{3}\Vert A_3x_3^{k}-A_3x_3^{k-1}\Vert ^2_{\beta ^2}\). The proof is complete. \(\square \)

Remark 6.1

Note that the direct extension of ADMM (1.2a)–(1.2d) is a splitting, thus inexact, version of the augmented Lagrangian method in [20, 30]; and it has been proved in [33] that the augmented Lagrangian method is an application of the proximal point algorithm (PPA) in [25, 26]. Since the convergence rate of the PPA is known to be only linear in general (see [33]), a linear convergence rate is the most we can expect for the direct extension of ADMM (1.2a)–(1.2d). A linear convergence rate is indeed ideal for the generic convex model (1.1), and thus very strict conditions have to be posed to attain it. In fact, even for the original ADMM applied to a two-block linearly constrained separable convex minimization model, its linear convergence rate can only be established for special cases (e.g., [1, 15]) or for generic cases under strong assumptions on the model (e.g., [7, 8]). Usually, the conditions that ensure the linear convergence rate of (1.2a)–(1.2d) are too strict to be satisfied by the majority of concrete applications of the abstract model (1.1). In this sense, discussing the global linear convergence rate for the direct extension of ADMM (1.2a)–(1.2d) is mainly of theoretical interest.

7 Conclusions

In this paper, we discuss the convergence of the direct extension of the alternating direction method of multipliers (ADMM) for solving a three-block linearly constrained convex minimization model whose objective function is the sum of three functions without coupled variables. We show that the convergence of the direct extension of ADMM can be ensured if one function in the objective is strongly convex, the penalty parameter is appropriately restricted and some assumptions on the operators in the constraints hold. This convergence result can well explain why the direct extension of ADMM is convergent for some applications. We also establish the worst-case convergence rate measured by the iteration complexity, as well as the globally linear convergence rate in an asymptotic sense under additional assumptions. This provides a deeper study on the convergence of the direct extension of ADMM and may help us better understand its theoretical aspects. We would like to mention that, instead of the original scheme (1.2a)–(1.2d), we can study the scheme

figure h

where \(\gamma \in (0,1)\). Since the relaxation factor \(\gamma \) can be arbitrarily close to 1, this scheme is asymptotically indistinguishable from the direct extension of ADMM (1.2a)–(1.2d). Similarly, we can discuss the convergence and estimate the convergence rate of the scheme (7.1a)–(7.1c) under different conditions; indeed, the proofs can be presented with simpler notation than those in this paper.