1 Introduction

We consider the convex minimization model with linear constraints and an objective function which is the sum of three functions without coupled variables:

$$\begin{aligned} \begin{array}{ll} \min &{} \,\theta _1(x_1)+\theta _2(x_2) +\theta _3(x_3)\\ \hbox {s.t.}&{}\, A_1x_1+A_2x_2 +A_3x_3 = b ,\\ &{} \,x_1 \in \mathcal{X}_1, \,x_2 \in \mathcal{X}_2,\, x_3 \in \mathcal{X}_3, \end{array} \end{aligned}$$
(1.1)

where \(A_i\in \mathfrak {R}^{p\times n_i}\) (\(i=1,2,3\)), \(b \in \mathfrak {R}^p\), \(\mathcal{X}_i \subset \mathfrak {R}^{n_i}\) (\(i=1,2,3\)) are closed convex sets; and \(\theta _i:\mathfrak {R}^{n_i}\rightarrow \mathfrak {R}\) (\(i=1,2,3\)) are closed convex but not necessarily smooth functions. The solution set of (1.1) is assumed to be nonempty. The abstract model (1.1) captures many applications in diverse areas—see, e.g., the image alignment problem in [24], the robust principal component analysis model with noisy and incomplete data in [27], the latent variable Gaussian graphical model selection in [5, 22] and the quadratic discriminant analysis model in [21]. Our discussion is inspired by the scenario where each function \(\theta _i\) may have specific properties that deserve to be exploited in the algorithmic design. This is often the case in sparse and low-rank optimization models, such as the applications of (1.1) just mentioned. We thus do not consider the generic treatment in which the sum of the three functions is regarded as one general function, since then the advantageous properties of each individual \(\theta _i\) would be ignored or not fully exploited.

The alternating direction method of multipliers (ADMM) was originally proposed in [12] (see also [4, 9]), and it is now a benchmark for the following convex minimization model analogous to (1.1) but with only two blocks of functions and variables:

$$\begin{aligned} \begin{array}{ll} \min &{} \,\theta _1(x_1) + \theta _2(x_2)\\ \hbox {s.t.} &{} \,A_1x_1 + A_2x_2 = b ,\\ &{} x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2\,. \end{array} \end{aligned}$$
(1.2)

Let

$$\begin{aligned} \mathcal{L}_\mathcal{A} (x_1,x_2,\lambda )&= \theta _1(x_1)+\theta _2(x_2) -\lambda ^T\bigl (A_1x_1 + A_2x_2-b\bigr )\nonumber \\&+\, \frac{\beta }{2} \big \Vert A_1x_1 + A_2x_2 -b\big \Vert ^2 \end{aligned}$$
(1.3)

be the augmented Lagrangian function of (1.2), where \(\lambda \in \mathfrak {R}^p\) is the Lagrange multiplier and \(\beta >0\) is a penalty parameter. Then, the iterative scheme of the ADMM for (1.2) is

$$\begin{aligned} x_1^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1,x_2^k,\lambda ^k\bigr ) \,\big |\, x_1\in \mathcal{X}_1 \bigr \}, \end{aligned}$$
(1.4a)
$$\begin{aligned} x_2^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k+1},x_2,\lambda ^k\bigr ) \,\big |\, x_2\in \mathcal{X}_2 \bigr \}, \end{aligned}$$
(1.4b)
$$\begin{aligned} \lambda ^{k+1} = \lambda ^k -\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} -b\bigr ). \end{aligned}$$
(1.4c)

The iterative scheme of the ADMM embeds a Gauss–Seidel decomposition into each iteration of the augmented Lagrangian method (ALM) in [19, 25]; thus the functions \(\theta _1\) and \(\theta _2\) are treated individually, and easier subproblems can be generated. This feature is very advantageous for a broad spectrum of applications such as partial differential equations, mechanics, image processing, statistical learning, computer vision, and so on. In fact, the ADMM has recently witnessed a “renaissance” in many application domains after a long period of relatively little attention. We refer to [3, 6, 11] for some review papers on the ADMM.
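
To make the scheme (1.4) concrete, here is a minimal sketch (our own illustration, not taken from the references above) that applies it to a tiny instance of (1.2) with quadratic objectives \(\theta _i(x_i)=\frac{1}{2}\Vert x_i\Vert ^2\) and \(\mathcal{X}_i=\mathfrak {R}^{n_i}\), for which both subproblems reduce to linear solves; all problem data below are randomly generated assumptions.

```python
import numpy as np

# Tiny synthetic instance of (1.2) with theta_i(x_i) = 0.5*||x_i||^2 and X_i = R^{n_i}.
rng = np.random.default_rng(0)
p, n1, n2 = 8, 3, 3
A1, A2 = rng.standard_normal((p, n1)), rng.standard_normal((p, n2))
b = rng.standard_normal(p)
beta = 1.0

x1, x2, lam = np.zeros(n1), np.zeros(n2), np.zeros(p)
for k in range(500):
    # (1.4a): minimize L_A over x1; for this quadratic theta_1 it is a linear solve
    x1 = np.linalg.solve(np.eye(n1) + beta * A1.T @ A1,
                         A1.T @ lam - beta * A1.T @ (A2 @ x2 - b))
    # (1.4b): minimize L_A over x2, using the new x1
    x2 = np.linalg.solve(np.eye(n2) + beta * A2.T @ A2,
                         A2.T @ lam - beta * A2.T @ (A1 @ x1 - b))
    # (1.4c): update the multiplier
    lam = lam - beta * (A1 @ x1 + A2 @ x2 - b)

print(np.linalg.norm(A1 @ x1 + A2 @ x2 - b))  # primal residual; expected to be very small
```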

Given the ADMM’s advantage in using each \(\theta _i\)’s properties individually, it is natural to extend the original ADMM (1.4) for (1.2) directly to (1.1) and obtain the scheme

$$\begin{aligned} x_1^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1,x_2^k,x_3^k,\lambda ^k\bigr ) \,\big |\, x_1\in \mathcal{X}_1 \bigr \}, \end{aligned}$$
(1.5a)
$$\begin{aligned} x_2^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k+1},x_2,x_3^k,\lambda ^k\bigr ) \,\big |\, x_2\in \mathcal{X}_2 \bigr \}, \end{aligned}$$
(1.5b)
$$\begin{aligned} x_3^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k+1},x_2^{k+1},x_3,\lambda ^k\bigr ) \,\big |\, x_3\in \mathcal{X}_3 \bigr \}, \end{aligned}$$
(1.5c)
$$\begin{aligned} \lambda ^{k+1} = \lambda ^k -\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} -b\bigr ), \end{aligned}$$
(1.5d)

where

$$\begin{aligned} \mathcal{L}_\mathcal{A} (x_1,x_2,x_3,\lambda )&= \sum _{i=1}^3 \theta _i(x_i) -\lambda ^T\bigl (A_1x_1 + A_2x_2 + A_3x_3 -b\bigr ) \nonumber \\&+\, \frac{\beta }{2} \big \Vert A_1x_1 + A_2x_2 +A_3x_3 -b\big \Vert ^2 \end{aligned}$$
(1.6)

is the augmented Lagrangian function of (1.1). This direct extension of the ADMM is strongly desired and widely used in practice; see, e.g., [24, 27]. The convergence of (1.5), however, has remained ambiguous for a long time—there is neither an affirmative convergence proof nor an example showing its divergence in the literature. This ambiguity has inspired an active research topic of developing algorithms that are slightly twisted versions of (1.5) but with provable convergence and competitive numerical performance; see, e.g., [15, 17, 20]. Since the direct extension of ADMM (1.5) does work well for some applications (e.g., [24, 27]), users are inclined to believe that the scheme is convergent even though a rigorous proof has been elusive. In the literature there has been very little discussion of the source of difficulty in proving the convergence of (1.5); see [6] for an insightful explanation.

The main result of this paper is to answer this long-standing open question negatively: The direct extension of ADMM (1.5) is not necessarily convergent. We organize the rest of this paper as follows. In Sect. 2, we present a sufficient condition to ensure the convergence of (1.5). Then, based on the analysis in Sect. 2, we construct an example to demonstrate the divergence of the direct extension of ADMM (1.5) in Sect. 3. Some extensions of the paper’s main result are discussed in Sect. 4. Finally, some concluding remarks are given in Sect. 5.

2 A sufficient condition ensuring the convergence of (1.5)

We first study a sufficient condition that ensures the convergence of the direct extension of ADMM (1.5). This condition is mainly of theoretical interest, but the study makes clearer our idea for constructing a counter-example that shows the divergence of (1.5).

Our claim is that the convergence of (1.5) is guaranteed whenever two of the three coefficient matrices in (1.1) are orthogonal. We will thus discuss the three cases \(A_1^TA_2=0\), \(A_2^TA_3=0\) and \(A_1^TA_3=0\). This new condition does not impose any strong convexity on the objective function in (1.1); it simply requires checking the orthogonality of the coefficient matrices.

2.1 Case 1: \(A_1^TA_2=0\) or \(A_2^TA_3=0\)

We remark that if two coefficient matrices of (1.1) in consecutive order are orthogonal, i.e., \(A_1^TA_2=0\) or \(A_2^TA_3=0\), then the direct extension of ADMM (1.5) reduces to a special case of the original ADMM (1.4). Thus the convergence of (1.5) under this condition is implied by well known results in ADMM literature.

To see this, let us first assume \(A_1^TA_2=0\). According to the first-order optimality conditions of the minimization problems in (1.5), we have \(x_i^{k+1} \in \mathcal{X}_i\) (\(i=1,2,3\)) and

$$\begin{aligned} \left\{ \begin{array}{l} \theta _1(x_1) -\theta _1\bigl (x_1^{k+1}\bigr ) + \bigl (x_1-x_1^{k+1}\bigr )^T\bigl \{-A_1^T\lambda ^k +\beta A_1^T\bigl (A_1x_1^{k+1} + A_2x_2^{k} +A_3x_3^{k} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_1\in \mathcal{X}_1,\\ \theta _2(x_2) -\theta _2\bigl (x_2^{k+1}\bigr ) + \bigl (x_2-x_2^{k+1}\bigr )^T\bigl \{-A_2^T\lambda ^k +\beta A_2^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_2\in \mathcal{X}_2,\\ \theta _3(x_3) -\theta _3\bigl (x_3^{k+1}\bigr ) + \bigl (x_3-x_3^{k+1}\bigr )^T\bigl \{-A_3^T\lambda ^k +\beta A_3^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_3\in \mathcal{X}_3. \end{array} \right. \end{aligned}$$
(2.1)

Then, because of \(A_1^TA_2=0\), it follows from (2.1) that

$$\begin{aligned} \left\{ \begin{array}{l} \theta _1(x_1) -\theta _1\bigl (x_1^{k+1}\bigr ) + \bigl (x_1-x_1^{k+1}\bigr )^T\bigl \{-A_1^T\lambda ^k +\beta A_1^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_1\in \mathcal{X}_1,\\ \theta _2(x_2) -\theta _2\bigl (x_2^{k+1}\bigr ) + \bigl (x_2-x_2^{k+1}\bigr )^T\bigl \{-A_2^T\lambda ^k +\beta A_2^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_2\in \mathcal{X}_2,\\ \theta _3(x_3) -\theta _3\bigl (x_3^{k+1}\bigr ) + \bigl (x_3-x_3^{k+1}\bigr )^T\bigl \{-A_3^T\lambda ^k +\beta A_3^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_3\in \mathcal{X}_3, \end{array} \right. \end{aligned}$$
(2.2)

which is also the first-order optimality condition of the scheme

$$\begin{aligned} \bigl (x_1^{k+1},x_2^{k+1}\bigr ) = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1,x_2,x_3^k,\lambda ^k\bigr ) \,\big |\, x_1\in \mathcal{X}_1,\, x_2\in \mathcal{X}_2 \bigr \}, \end{aligned}$$
$$\begin{aligned} x_3^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k+1},x_2^{k+1},x_3,\lambda ^k\bigr ) \,\big |\, x_3\in \mathcal{X}_3 \bigr \}, \end{aligned}$$
$$\begin{aligned} \lambda ^{k+1} = \lambda ^k -\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} -b\bigr ). \end{aligned}$$
(2.3)

Clearly, (2.3) is a specific application of the original ADMM (1.4) to (1.1), obtained by regarding \((x_1,x_2)\) as one variable, \([A_1,A_2]\) as one matrix and \(\theta _1(x_1)+\theta _2(x_2)\) as one function. Note that neither \(x_1^k\) nor \(x_2^k\) is required to generate the \((k+1)\)-th iteration of (2.3) under the orthogonality condition \(A_1^TA_2=0\). Existing convergence results for the original ADMM such as those in [8, 18] thus hold for the special case of (1.5) with the orthogonality condition \(A_1^TA_2=0\).
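
A quick numerical check of this reduction (a minimal sketch of our own; null objective, \(\mathcal{X}_i=\mathfrak {R}^{n_i}\), and random data constructed so that \(A_1^TA_2=0\)) confirms that (1.5) and (2.3) generate identical iterates in this case:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 6, 2
A1 = np.vstack([rng.standard_normal((3, n)), np.zeros((3, n))])   # A1^T A2 = 0 by construction
A2 = np.vstack([np.zeros((3, n)), rng.standard_normal((3, n))])
A3 = rng.standard_normal((p, n))
b = rng.standard_normal(p)
beta = 1.0

# Null objective, X_i = R^n: every subproblem is a linear solve.
def direct_extension_step(x1, x2, x3, lam):
    # (1.5a)-(1.5d)
    x1 = np.linalg.solve(A1.T @ A1, A1.T @ (lam / beta + b - A2 @ x2 - A3 @ x3))
    x2 = np.linalg.solve(A2.T @ A2, A2.T @ (lam / beta + b - A1 @ x1 - A3 @ x3))
    x3 = np.linalg.solve(A3.T @ A3, A3.T @ (lam / beta + b - A1 @ x1 - A2 @ x2))
    lam = lam - beta * (A1 @ x1 + A2 @ x2 + A3 @ x3 - b)
    return x1, x2, x3, lam

def grouped_admm_step(x1, x2, x3, lam):
    # (2.3): joint minimization over (x1, x2), then x3, then the dual update
    A12 = np.hstack([A1, A2])
    y = np.linalg.solve(A12.T @ A12, A12.T @ (lam / beta + b - A3 @ x3))
    x1, x2 = y[:n], y[n:]
    x3 = np.linalg.solve(A3.T @ A3, A3.T @ (lam / beta + b - A1 @ x1 - A2 @ x2))
    lam = lam - beta * (A1 @ x1 + A2 @ x2 + A3 @ x3 - b)
    return x1, x2, x3, lam

za = zb = (np.zeros(n), np.zeros(n), np.zeros(n), np.zeros(p))
for k in range(20):
    za, zb = direct_extension_step(*za), grouped_admm_step(*zb)
    assert all(np.allclose(a, c) for a, c in zip(za, zb))   # identical iterates
print("with A1^T A2 = 0, (1.5) and (2.3) produce the same iterates")
```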

A similar discussion can be carried out under the orthogonality condition \(A_2^TA_3=0\).

2.2 Case 2: \(A_1^TA_3=0\)

In the last subsection, we have discussed the cases where two consecutive coefficient matrices are orthogonal. Now, we pay attention to the case where \(A_1^TA_3=0\) and show that it can also ensure the convergence of (1.5).

To prepare for the proof, we first clarify a few points. First, note that the update order of (1.5) at each iteration is \(x_1\rightarrow x_2 \rightarrow x_3 \rightarrow \lambda \), repeated cyclically. Equivalently, we can update the variables in the order \(x_2\rightarrow x_3 \rightarrow \lambda \rightarrow x_1\) and thus obtain the following iterative form:

$$\begin{aligned} x_2^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k},x_2,x_3^k,\lambda ^k\bigr ) \,\big |\, x_2\in \mathcal{X}_2 \bigr \}, \end{aligned}$$
(2.4a)
$$\begin{aligned} x_3^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1^{k},x_2^{k+1},x_3,\lambda ^k\bigr ) \,\big |\, x_3\in \mathcal{X}_3 \bigr \}, \end{aligned}$$
(2.4b)
$$\begin{aligned} \lambda ^{k+1} = \lambda ^k -\beta \bigl (A_1x_1^{k} + A_2x_2^{k+1} + A_3x_3^{k+1} -b\bigr ), \end{aligned}$$
(2.4c)
$$\begin{aligned} x_1^{k+1} = \arg \min \bigl \{ \mathcal{L}_\mathcal{A} \bigl (x_1,x_2^{k+1},x_3^{k+1},\lambda ^{k+1}\bigr ) \,\big |\, x_1\in \mathcal{X}_1 \bigr \}. \end{aligned}$$
(2.4d)

According to (2.4), there is an update for the variable \(\lambda \) between the updates for \(x_3\) and \(x_1\). Thus, the case \(A_1^TA_3=0\) requires a discussion different from that in the last subsection. Moreover, when \(x_1^{k}\) is relabeled as \(x_1^{k+1}\) and \(x_1^{k+1}\) as \(x_1^{k+2}\), the scheme (2.4) reduces exactly to the direct extension of ADMM (1.5). Therefore, the convergence analysis for the scheme (1.5) is equivalent to that for (2.4). For notational simplicity, we will focus on the representation (2.4) within this subsection.

Second, it is worth mentioning that the variable \(x_2\) is not involved in the iteration of (2.4), in the sense that the scheme (2.4) generates a new iterate based only on \((x_1^{k},x_3^k,\lambda ^k)\). We thus follow the terminology in [3] and call \(x_2\) an intermediate variable; correspondingly, we call \((x_1,x_3,\lambda )\) the essential variables because they are necessary to execute the iteration of (2.4). Accordingly, we use the notations \(w^k =(x_1^{k},x_2^k,x_3^k,\lambda ^k)\), \(u^k = w^k{\setminus }\lambda ^k=(x_1^{k},x_2^k,x_3^k)\), \(v^k=w^k{\setminus }x_2^k=(x_1^{k},x_3^k,\lambda ^k)\), \(v=w{\setminus } x_2=(x_1,x_3,\lambda )\), \(\mathcal{V} = \mathcal{X}_1\times \mathcal{X}_3\times \mathfrak {R}^p\) and

$$\begin{aligned} \mathcal{V^*} := \{ v^* =(x_1^*,x_3^*,\lambda ^*) \, |\, w^* =(x_1^*,x_2^*,x_3^*,\lambda ^*)\in \Omega ^*\}, \end{aligned}$$

where \(\Omega ^*\) is the collection of the KKT points of (1.1).

Third, it is useful to characterize the model (1.1) by a variational inequality. More specifically, finding a saddle point of the Lagrangian function of (1.1) is equivalent to solving the variational inequality problem: Finding \(w^*\in \Omega \) such that

$$\begin{aligned} \hbox {VI} (\Omega ,F,\theta ) : \quad \theta (u) - \theta (u^*) + (w-w^*)^T F(w^*) \ge 0, \quad \forall \, w\in \Omega , \end{aligned}$$
(2.5a)

where

$$\begin{aligned} u\!:=\! \left( \begin{array}{c} x_1\\ x_2 \\ x_3 \end{array} \right) , \quad w:= \left( \begin{array}{c} x_1\\ x_2\\ x_3\\ \lambda \end{array} \right) , \quad \theta (u):=\theta _1(x_1) \!+\! \theta _2(x_2)\!+\!\theta _3(x_3), \qquad \end{aligned}$$
(2.5b)
$$\begin{aligned} F(w)\!:=\! \left( \begin{array}{c} - A_1^T\lambda \\ -A_2^T\lambda \\ -A_3^T\lambda \\ A_1x_1 + A_2x_2 + A_3x_3 -b \end{array} \right) , \quad \mathrm{and}\;\; \Omega := \mathcal{X}_1 \!\times \!\mathcal{X}_2\! \times \! \mathcal{X}_3 \times \mathfrak {R}^p.\qquad \end{aligned}$$
(2.5c)

Obviously, the mapping \(F(\cdot )\) defined in (2.5c) is monotone because it is affine with a skew-symmetric matrix.
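
Indeed, writing \(F(w)-F(\bar{w}) = S(w-\bar{w})\) with the skew-symmetric matrix

$$\begin{aligned} S = \left( \begin{array}{cccc} 0 &{}\quad 0 &{}\quad 0 &{}\quad -A_1^T \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -A_2^T \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -A_3^T \\ A_1 &{}\quad A_2 &{}\quad A_3 &{}\quad 0 \end{array} \right) = -S^T, \end{aligned}$$

we have \((w-\bar{w})^T\bigl (F(w)-F(\bar{w})\bigr ) = (w-\bar{w})^TS(w-\bar{w}) = 0\) for all \(w,\bar{w}\), which confirms the monotonicity (in fact, with equality).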

Last, let us take a closer look at the output of (2.4) and investigate some of its properties. Deriving the first-order optimality conditions of the minimization problems in (2.4) and rewriting (2.4c) appropriately, we obtain

$$\begin{aligned} \theta _2(x_2) -\theta _2\bigl (x_2^{k+1}\bigr ) + \bigl (x_2-x_2^{k+1}\bigr )^T\bigl \{-A_2^T\lambda ^k +\beta A_2^T\bigl (A_1x_1^{k} + A_2x_2^{k+1} +A_3x_3^{k} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_2\in \mathcal{X}_2, \end{aligned}$$
(2.6a)
$$\begin{aligned} \theta _3(x_3) -\theta _3\bigl (x_3^{k+1}\bigr ) + \bigl (x_3-x_3^{k+1}\bigr )^T\bigl \{-A_3^T\lambda ^k +\beta A_3^T\bigl (A_1x_1^{k} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_3\in \mathcal{X}_3, \end{aligned}$$
(2.6b)
$$\begin{aligned} \bigl (A_1x_1^{k} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr ) -\frac{1}{\beta }\bigl (\lambda ^k-\lambda ^{k+1}\bigr ) = 0, \end{aligned}$$
(2.6c)
$$\begin{aligned} \theta _1(x_1) -\theta _1\bigl (x_1^{k+1}\bigr ) + \bigl (x_1-x_1^{k+1}\bigr )^T\bigl \{-A_1^T\lambda ^{k+1} +\beta A_1^T\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr )\bigr \}\ge 0, \quad \forall \, x_1\in \mathcal{X}_1. \end{aligned}$$
(2.6d)

Then, substituting (2.6c) into (2.6a), (2.6b) and (2.6d); and using \(A_1^TA_3=0\), we get

$$\begin{aligned} \theta _2(x_2) -\theta _2\bigl (x_2^{k+1}\bigr ) + \bigl (x_2-x_2^{k+1}\bigr )^T\bigl \{-A_2^T\lambda ^{k+1} +\beta A_2^TA_3\bigl (x_3^{k}-x_3^{k+1}\bigr )\bigr \}\ge 0, \quad \forall \, x_2\in \mathcal{X}_2, \end{aligned}$$
(2.7a)
$$\begin{aligned} \theta _3(x_3) -\theta _3\bigl (x_3^{k+1}\bigr ) + \bigl (x_3-x_3^{k+1}\bigr )^T\bigl \{-A_3^T\lambda ^{k+1}\bigr \}\ge 0, \quad \forall \, x_3\in \mathcal{X}_3, \end{aligned}$$
(2.7b)
$$\begin{aligned} \theta _1(x_1) -\theta _1\bigl (x_1^{k+1}\bigr ) + \bigl (x_1-x_1^{k+1}\bigr )^T\bigl \{-A_1^T\lambda ^{k+1} -\beta A_1^TA_1\bigl (x_1^{k}-x_1^{k+1}\bigr ) + A_1^T\bigl (\lambda ^k-\lambda ^{k+1}\bigr )\bigr \}\ge 0, \quad \forall \, x_1\in \mathcal{X}_1, \end{aligned}$$
(2.7c)
$$\begin{aligned} \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1} -b\bigr ) + A_1\bigl (x_1^{k}-x_1^{k+1}\bigr ) -\frac{1}{\beta }\bigl (\lambda ^k-\lambda ^{k+1}\bigr ) = 0. \end{aligned}$$
(2.7d)

With the definitions of \(\theta \), \(F\), \(\Omega \), \(u^k\) and \(v^k\), we can rewrite (2.7) in a compact form. We summarize it in the next lemma and omit the proof, as it is just a compact reformulation of (2.7).

Lemma 2.1

Let \(w^{k+1}=(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) be generated by (2.4) from given \(v^k=(x_1^{k},x_3^k,\lambda ^k)\). Then we have

$$\begin{aligned}&w^{k+1}\in \Omega , \quad \theta (u) - \theta \left( u^{k+1}\right) + \left( w- w^{k+1} \right) ^T \nonumber \\&\qquad \qquad \times \left\{ F(w^{k+1}) +Q\left( v^k-v^{k+1}\right) \right\} \ge 0, \quad \forall \, w\in \Omega , \end{aligned}$$
(2.8)

where

$$\begin{aligned} Q = \left( \begin{array}{ccc} -\beta A_1^TA_1 &{} 0 &{} A_1^T \\ 0 &{}\beta A_2^T A_3 &{} 0 \\ 0 &{} 0 &{} 0 \\ A_1 &{} 0 &{} -{1\over \beta } I \end{array} \right) . \end{aligned}$$
(2.9)

Note that the assertion (2.8) is useful for quantifying the accuracy of \(w^{k+1}\) to a solution point of VI\((\Omega ,F,\theta )\), because of the variational inequality reformulation (2.5) of (1.1).

Now, we are ready to prove the convergence for the direct extension of ADMM under the condition \(A_1^TA_3=0\). We first refine the assertion (2.8) under this additional condition.

Lemma 2.2

Let \(w^{k+1}=(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) be generated by (2.4) from given \(v^k=(x_1^{k},x_3^k,\lambda ^k)\). If \(A_1^T A_3=0\), then we have

$$\begin{aligned} w^{k+1}&\in \Omega , \theta (u) - \theta \left( u^{k+1}\right) + \left( w- w^{k+1} \right) ^T \left\{ F\left( w^{k+1}\right) +\beta PA_3\left( x_3^k-x_3^{k+1}\right) \right\} \nonumber \\&\ge \left( v-v^{k+1}\right) ^T H \left( v^k-v^{k+1}\right) , \quad \forall \, w\in \Omega , \end{aligned}$$
(2.10)

where

$$\begin{aligned} P= \left( \begin{array}{c} A_1^T \\ A_2^T \\ A_3^T \\ 0 \end{array}\right) ,\quad {v=\left( \begin{array}{c} x_1 \\ x_3 \\ \lambda \end{array}\right) } \quad \hbox {and} \quad H= \left( \begin{array}{ccc} \beta A_1^TA_1 &{}\quad 0 &{}\quad -A_1^T \\ 0 &{}\quad \beta A_3^TA_3 &{}\quad 0 \\ -A_1 &{}\quad 0 &{}\quad {1\over \beta }I \end{array} \right) . \end{aligned}$$
(2.11)

Proof

Since \(A_1^TA_3=0\), the following is an identity:

$$\begin{aligned}&\left( \begin{array}{c} x_1-x_1^{k+1}\\ x_2-x_2^{k+1}\\ x_3-x_3^{k+1}\\ \lambda - \lambda ^{k+1} \end{array}\right) ^T \left( \begin{array}{ccc} \beta A_1^TA_1 &{}\quad \beta A_1^TA_3 &{}\quad -A_1^T \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \beta A_3^TA_3 &{}\quad 0 \\ -A_1 &{}\quad 0 &{}\quad {1\over \beta }I \end{array} \right) \left( \begin{array}{c} x_1^{k}-x_1^{k+1}\\ x_3^k-x_3^{k+1}\\ \lambda ^k - \lambda ^{k+1} \end{array}\right) \nonumber \\&\quad = \left( \begin{array}{c} x_1-x_1^{k+1}\\ x_3-x_3^{k+1}\\ \lambda - \lambda ^{k+1}\end{array} \right) ^T \left( \begin{array}{ccc} \beta A_1^TA_1 &{}\quad 0 &{}\quad -A_1^T \\ 0 &{}\quad \beta A_3^TA_3 &{}\quad 0 \\ -A_1 &{}\quad 0 &{}\quad {1\over \beta }I \end{array} \right) \left( \begin{array}{c} x_1^{k}-x_1^{k+1}\\ x_3^k-x_3^{k+1}\\ \lambda ^k - \lambda ^{k+1}\end{array} \right) . \end{aligned}$$

Adding the above identity to both sides of (2.8) and using the notations of \(v\) and \(H\), we obtain

$$\begin{aligned} w^{k+1}&\in \Omega , \theta (u) - \theta \left( u^{k+1}\right) + \left( w- w^{k+1}\right) ^T \left\{ F\left( w^{k+1}\right) + Q_0\left( v^k-v^{k+1}\right) \right\} \nonumber \\&\ge \left( v-v^{k+1}\right) ^T H \left( v^k-v^{k+1}\right) , \quad \forall w\in \Omega , \end{aligned}$$
(2.12)

where (see \(Q\) in (2.9))

$$\begin{aligned} Q_0= Q+\left( \begin{array}{ccc} \beta A_1^TA_1 &{}\quad \beta A_1^TA_3 &{}\quad -A_1^T \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \beta A_3^TA_3 &{}\quad 0 \\ -A_1 &{} 0 &{} {1\over \beta } I \end{array} \right) = \left( \begin{array}{ccc} 0 &{} \beta A_1^T A_3 &{} 0 \\ 0 &{} \beta A_2^T A_3 &{} 0 \\ 0 &{} \beta A_3^T A_3 &{} 0 \\ 0 &{} 0 &{} 0 \end{array} \right) . \end{aligned}$$

By using the structures of the matrices \(Q_0\) and \(P\) (see (2.11)), and the vector \(v\), we have

$$\begin{aligned} (w- w^{k+1} )^TQ_0(v^k-v^{k+1}) = (w- w^{k+1} )^T \beta PA_3(x_3^k-x_3^{k+1}). \end{aligned}$$

The assertion (2.10) is proved. \(\square \)

Let us define two auxiliary sequences which will only serve for simplifying our notation in convergence analysis:

$$\begin{aligned} \tilde{w}^k = \left( \begin{array}{c} \tilde{x}_1^k\\ \tilde{x}_2^k\\ \tilde{x}_3^k\\ \tilde{\lambda }^k \end{array}\right) =\left( \begin{array}{c} x_1^{k+1}\\ x_2^{k+1}\\ x_3^{k+1}\\ \lambda ^{k+1} - \beta A_3(x_3^k-x_3^{k+1}) \end{array}\right) \quad \mathrm{and}\quad \tilde{u}^k = \left( \begin{array}{c} \tilde{x}_1^k\\ \tilde{x}_2^k\\ \tilde{x}_3^k\\ \end{array}\right) ,\qquad \end{aligned}$$
(2.13)

where \((x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) is generated by (2.4).

In the next lemma, we establish an important inequality based on the assertion in Lemma 2.2, which will play a vital role in convergence analysis.

Lemma 2.3

Let \(w^{k+1}=(x_1^{k+1},x_2^{k+1},x_3^{k+1},\lambda ^{k+1})\) be generated by (2.4) from given \(v^k=(x_1^{k},x_3^k,\lambda ^k)\). If \(A_1^T A_3=0\), we have \(\tilde{w}^k\in \Omega \) and

$$\begin{aligned} \theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(\tilde{w}^k)&\ge \frac{1}{2}\bigl (\Vert v-v^{k+1}\Vert _H^2 -\Vert v-v^k\Vert _H^2 \bigr )\nonumber \\&+ \frac{1}{2}\Vert v^k-v^{k+1}\Vert _H^2, \;\forall \, w\in \Omega , \end{aligned}$$
(2.14)

where \({\tilde{w}}^k\) and \({\tilde{u}}^k\) are defined in (2.13).

Proof

According to the definition of \(\tilde{w}^k\) and \(F(w)\) (see (2.13) and (2.5c), respectively), (2.10) can be rewritten as

$$\begin{aligned} \tilde{w}^{k}&\in \Omega , \theta (u) - \theta (\tilde{u}^k) + (w- w^{k+1})^T F(\tilde{w}^{k}) \nonumber \\&\ge (v-v^{k+1})^T H (v^k-v^{k+1}) , \quad \forall w\in \Omega . \end{aligned}$$
(2.15)

Noting that \( w^{k+1} - \tilde{w}^k =\beta \left( \begin{array}{c} 0 \\ 0 \\ 0\\ A_3(x_3^k-x_3^{k+1}) \end{array} \right) \), we further obtain that \(\tilde{w}^{k}\in \Omega \) and

$$\begin{aligned}&\theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(\tilde{w}^k) \nonumber \\&\quad = \theta (u) - \theta (\tilde{u}^k) + (w-w^{k+1})^T F(\tilde{w}^k) + (w^{k+1}-\tilde{w}^k)^T F(\tilde{w}^k) \nonumber \\&\quad \ge (v-v^{k+1})^T H (v^k-v^{k+1}) \nonumber \\&\qquad +\, \bigl ( A_3 (x_3^k-x_3^{k+1})\bigr )^T\bigl ( \beta (A_1x_1^{k+1}\!+\!A_2 x_2^{k+1}+A_3x_3^{k+1}\!-\!b)\bigr ) , \forall w\!\in \!\Omega .\qquad \quad \end{aligned}$$
(2.16)

Setting \(x_3 = x_3^k\) in (2.7b), we obtain

$$\begin{aligned} \theta _3(x_3^k)-\theta _3(x_3^{k+1}) +(x_3^k-x_3^{k+1})^T\{-A_3^T\lambda ^{k+1}\} \ge 0. \end{aligned}$$
(2.17)

Note that (2.7b) is also true for the \((k-1)\)th iteration. Thus, it holds that

$$\begin{aligned} \theta _3(x_3)-\theta _3\left( x_3^k\right) +\left( x_3- x_3^k\right) ^T\left\{ -A_3^T\lambda ^k\right\} \ge 0. \end{aligned}$$

Setting \(x_3=x_3^{k+1}\) in the last inequality, we obtain

$$\begin{aligned} \theta _3\left( x_3^{k+1}\right) -\theta _3\left( x_3^k\right) +\left( x_3^{k+1}- x_3^k\right) ^T\left\{ -A_3^T\lambda ^k\right\} \ge 0, \end{aligned}$$
(2.18)

which together with (2.17) yields that

$$\begin{aligned} \left( \lambda ^k - \lambda ^{k+1}\right) ^T A_3\left( x_3^k - x_3^{k+1}\right) \ge 0, \quad \forall k\ge 0. \end{aligned}$$
(2.19)

By using the fact \(\lambda ^k - \lambda ^{k+1} = \beta (A_1 x_1^{k} +A_2x_2^{k+1}+A_3x_3^{k+1}-b)\) (see (2.6c)) and the assumption \(A_1^T A_3=0\), we get immediately that

$$\begin{aligned} \beta \left( A_1x_1^{k+1}+A_2 x_2^{k+1}+A_3x_3^{k+1}-b\right) ^T A_3 \left( x_3^k-x_3^{k+1}\right) \ge 0, \end{aligned}$$
(2.20)

and hence

$$\begin{aligned} \tilde{w}^k \!\in \Omega , \theta (u) \!-\! \theta \left( \tilde{u}^k\right) + \left( w\!-\! \tilde{w}^k\right) ^T F\left( \tilde{w}^k\right) \!\ge \! \left( v\!-\!v^{k+1}\right) ^T H \left( v^k-v^{k\!+\!1}\right) , \forall \, w\in \Omega . \end{aligned}$$
(2.21)

By substituting the identity

$$\begin{aligned} \left( v-v^{k+1}\right) ^T H \left( v^k-v^{k+1}\right) =\frac{1}{2}\bigl (\Vert v-v^{k+1}\Vert _H^2 -\Vert v-v^k\Vert _H^2 \bigr ) + \frac{1}{2}\left\| v^k-v^{k+1}\right\| _H^2 \end{aligned}$$

into the right-hand side of (2.21), we obtain (2.14). \(\square \)

Now, we are able to establish the contraction property with respect to the solution set of VI\((\Omega ,F,\theta )\) for the sequence \(\{v^k\}\) generated by (2.4), from which the convergence of (2.4) can be easily established.

Theorem 2.4

Assume \(A_1^T A_3=0\) for the model (1.1). Let \(\{(x_1^{k},x_2^k,x_3^k,\lambda ^k)\}\) be the sequence generated by the direct extension of ADMM (2.4). Then, we have:

  1. (i)

    The sequence \(\{v^k:=(x_1^{k},x_3^k,\lambda ^k)\}\) is contractive with respect to the solution set of VI(\(\Omega ,F,\theta \)), i.e.,

    $$\begin{aligned} \Vert v^{k+1}-v^*\Vert _H^2 \le \Vert v^k-v^*\Vert _H^2 - \Vert v^k -v^{k+1}\Vert _H^2. \end{aligned}$$
    (2.22)
  2. (ii)

    If the matrices \([A_1,\, A_2]\) and \(A_3\) have full column rank, then the sequence \(\{w^{k}\}\) converges to a KKT point of the model (1.1).

Proof

(i) The first assertion is straightforward based on (2.14). Setting \(w=w^*\) in (2.14), we get

$$\begin{aligned} \frac{1}{2} \bigl (\Vert v^k\!-\!v^*\Vert _H^2 \!-\!\Vert v^{k+1}\!-\!v^*\Vert _H^2 \bigr ) \!-\! \frac{1}{2}\Vert v^k\!-\!v^{k\!+\!1}\Vert _H^2\!\ge \! \theta (\tilde{u}^k) \!-\! \theta (u^*) \!+\! (\tilde{w}^k\!-\!w^*)\!^T \! F(\tilde{w}^k) . \end{aligned}$$

From the monotonicity of \(F\) and (2.5), it follows that

$$\begin{aligned} \theta (\tilde{u}^k) - \theta (u^*) + (\tilde{w}^k-w^*)^T F(\tilde{w}^k) \ge \theta (\tilde{u}^k) - \theta (u^*) + (\tilde{w}^k-w^*)^T F(w^*) \ge 0, \end{aligned}$$

and thus (2.22) is proved. Clearly, (2.22) indicates that the sequence \(\{v^k\}\) is contractive with respect to the solution set of VI\((\Omega ,F,\theta )\), see e.g. [2].

(ii) To prove (ii), first note from the definitions of \(v\) and \(H\) in (2.11) that for any \(v=(x_1,x_3,\lambda )\) and \(v'=(x_1',x_3',\lambda ')\),

$$\begin{aligned} \Vert v-v'\Vert _H^2 = \beta \Big \Vert A_1(x_1-x_1') -\frac{1}{\beta } (\lambda -\lambda ')\Big \Vert ^2 + \beta \big \Vert A_3(x_3-x_3')\big \Vert ^2. \end{aligned}$$

Since (2.22) implies that \(\{\Vert v^k-v^*\Vert _H\}\) is non-increasing and hence bounded, it follows that the sequences \(\{A_1 x_1^{k} -{1\over \beta }\lambda ^k\}\) and \(\{A_3x_3^k\}\) are both bounded. Since \(A_3\) has full column rank, we deduce that \(\{x_3^k\}\) is bounded. Note that

$$\begin{aligned} A_1x_1^{k} + A_2 x_2^{k} = A_1 x_1^{k} - {1\over \beta }\lambda ^k -\left( A_1x_1^{k-1} -{1\over \beta } \lambda ^{k-1}\right) - A_3x_3^k+b. \end{aligned}$$
(2.23)

Hence, \(\{ A_1x_1^{k} + A_2 x_2^{k}\}\) is bounded. Together with the assumption that \([A_1,\,A_2]\) has full column rank, we conclude that the sequences \(\{x_1^{k}\}, \{x_2^k\}\) and \(\{\lambda ^k\}\) are all bounded. Therefore, there exists a subsequence \(\{x_1^{n_k+1},x_2^{n_k+1},x_3^{n_k+1},\lambda ^{n_k+1}\}\) that converges to a limit point, say \((x_1^\infty ,x_2^\infty ,x_3^\infty ,\lambda ^\infty )\). Moreover, from (2.22), we see immediately that

$$\begin{aligned} \begin{array}{ll}&\displaystyle \sum _{k=1}^\infty \Vert v^k-v^{k+1}\Vert _H^2 <+\infty , \end{array} \end{aligned}$$
(2.24)

which shows

$$\begin{aligned} \lim _{k\rightarrow \infty } H\left( v^k-v^{k+1}\right) =0, \end{aligned}$$
(2.25)

and thus

$$\begin{aligned} \lim _{k\rightarrow \infty } Q\left( v^k-v^{k+1}\right) =0. \end{aligned}$$
(2.26)

Then, taking the limit along this subsequence on both sides of (2.8) and using (2.26), one can immediately write

$$\begin{aligned} \,\, w^\infty \in \Omega , \,\,\,\, \theta (u)-\theta (u^\infty ) +\, \,(w-w^\infty )^T F(w^\infty ) \ge 0,\,\,\, \forall \, w\in \Omega , \end{aligned}$$
(2.27)

which means \(w^\infty = (x_1^\infty ,x_2^\infty ,x_3^\infty , \lambda ^\infty )\) is a KKT point of (1.1). Hence, the inequality (2.22) is also valid if \((x_1^*,x_2^*,x_3^*,\lambda ^*)\) is replaced by \((x_1^\infty ,x_2^\infty ,x_3^\infty , \lambda ^\infty )\). Then it holds that

$$\begin{aligned} \Vert v^{k+1}-v^{\infty }\Vert _H^2 \le \Vert v^k-v^{\infty }\Vert _H^2, \end{aligned}$$
(2.28)

which implies that

$$\begin{aligned} \lim _{k\rightarrow \infty } A_1 (x_1^{k} -x_1^\infty )-{1\over \beta } (\lambda ^k-\lambda ^\infty )=0,\quad \lim _{k\rightarrow \infty } A_3(x_3^k -x_3^\infty )=0. \end{aligned}$$
(2.29)

Taking the limit in (2.23) and using (2.29) together with the full-column-rank assumptions, we obtain

$$\begin{aligned} \lim _{k\rightarrow \infty } x_1^{k} = x_1^\infty , \quad \lim _{k\rightarrow \infty } x_2^{k} = x_2^\infty , \quad \lim _{k\rightarrow \infty } x_3^{k} = x_3^\infty ,\quad \lim _{k\rightarrow \infty }\lambda ^{k} = \lambda ^\infty . \end{aligned}$$
(2.30)

This completes the proof of the theorem. \(\square \)
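
As a numerical illustration of assertion (i) (a minimal sketch of our own, not part of the proof), the snippet below builds a random instance of (1.1) with a null objective, \(\mathcal{X}_i=\mathfrak {R}^{n_i}\) and \(A_1^TA_3=0\), runs the scheme (2.4), and checks that \(\Vert v^k-v^*\Vert _H\) is non-increasing, using the expression of the \(H\)-norm displayed in the proof above; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 6, 2
# Orthogonal column spaces ensure A1^T A3 = 0; A2 is arbitrary.
A1 = np.vstack([rng.standard_normal((3, n)), np.zeros((3, n))])
A3 = np.vstack([np.zeros((3, n)), rng.standard_normal((3, n))])
A2 = rng.standard_normal((p, n))
x_true = rng.standard_normal(3 * n)
b = np.hstack([A1, A2, A3]) @ x_true            # feasible right-hand side
beta = 1.0

# Null objective, X_i = R^n: each subproblem of (2.4) is a linear solve,
# and (x1*, x3*, lambda*) = (x_true[:n], x_true[2n:], 0) corresponds to a KKT point.
x1, x3, lam = np.zeros(n), np.zeros(n), np.zeros(p)
x1s, x3s = x_true[:n], x_true[2 * n:]

def Hnorm2(dx1, dx3, dlam):
    # ||v||_H^2 = beta*||A1 dx1 - dlam/beta||^2 + beta*||A3 dx3||^2
    return beta * np.sum((A1 @ dx1 - dlam / beta) ** 2) + beta * np.sum((A3 @ dx3) ** 2)

prev = Hnorm2(x1 - x1s, x3 - x3s, lam)
for k in range(100):
    x2 = np.linalg.solve(A2.T @ A2, A2.T @ (lam / beta + b - A1 @ x1 - A3 @ x3))   # (2.4a)
    x3 = np.linalg.solve(A3.T @ A3, A3.T @ (lam / beta + b - A1 @ x1 - A2 @ x2))   # (2.4b)
    lam = lam - beta * (A1 @ x1 + A2 @ x2 + A3 @ x3 - b)                            # (2.4c)
    x1 = np.linalg.solve(A1.T @ A1, A1.T @ (lam / beta + b - A2 @ x2 - A3 @ x3))   # (2.4d)
    cur = Hnorm2(x1 - x1s, x3 - x3s, lam)
    assert cur <= prev + 1e-9, "monotone decrease of ||v^k - v*||_H fails"
    prev = cur
print("contraction (2.22) observed; final ||v^k - v*||_H^2 =", prev)
```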

Inspired by [18], we can also establish a worst-case convergence rate measured by the iteration complexity in the ergodic sense for the direct extension of ADMM (2.4). This is summarized in the following theorem.

Theorem 2.5

Assume \(A_1^T A_3=0\) for the model (1.1). Let \(\{(x_1^k,x_2^k,x_3^k,\lambda ^k)\}\) be the sequence generated by the direct extension of ADMM (2.4) and \(\tilde{w}^k\) be defined in (2.13). After \(t\) iterations of (2.4), we take

$$\begin{aligned} \tilde{w}_t = \frac{1}{t+1} \sum _{k=0}^t \tilde{w}^k. \end{aligned}$$
(2.31)

Then, \(\tilde{w}_t \in \Omega \) and it satisfies

$$\begin{aligned} \theta (\tilde{u}_t) -\theta (u) + (\tilde{w}_{t}-w)^T F(w) \le \frac{1}{2(t+1)}\Vert v-v^0\Vert _H^2,\quad \forall w\in \Omega . \end{aligned}$$
(2.32)

Proof

By the monotonicity of \(F\) and (2.14), it follows that

$$\begin{aligned} \tilde{w}^k&\in \Omega , \;\; \theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(w) + \frac{1}{2} \Vert v-v^k\Vert _H^2\nonumber \\&\ge \frac{1}{2} \Vert v-v^{k+1}\Vert _H^2, \; \forall w\in \Omega . \end{aligned}$$
(2.33)

Together with the convexity of \(\mathcal{X}_1\), \(\mathcal{X}_2\) and \(\mathcal{X}_3\), (2.31) implies that \(\tilde{w}_t\in \Omega \). Summing the inequality (2.33) over \(k=0,1,\ldots , t\), we obtain

$$\begin{aligned} (t+1) \theta (u) \!-\! \sum _{k=0}^t \theta (\tilde{u}^k) \!+\! \left( (t+1) w \!-\! \sum _{k=0}^t \tilde{w}^k\right) ^T F(w) \!+\! \frac{1}{2}\Vert v-v^0\Vert _H^2 \!\ge \! 0, \quad \forall w\!\in \!\Omega . \end{aligned}$$

Using the notation of \(\tilde{w}_{t}\), this can be rewritten as

$$\begin{aligned} \frac{1}{t+1} \sum _{k=0}^t \theta (\tilde{u}^k) -\theta (u) + (\tilde{w}_{t}-w)^T F(w) \le \frac{1}{2(t+1)}\Vert v-v^0\Vert _H^2,\quad \forall w\in \Omega .\nonumber \\ \end{aligned}$$
(2.34)

Since \(\theta (u)\) is convex and

$$\begin{aligned} \tilde{u}_{t}= \frac{1}{t+1} \sum _{k=0}^t \tilde{u}^k, \end{aligned}$$

we have that

$$\begin{aligned}\theta (\tilde{u}_t) \le \frac{1}{t+1}\sum _{k=0}^t \theta (\tilde{u}^k). \end{aligned}$$

Substituting it into (2.34), the assertion of this theorem follows directly. \(\square \)

Remark 2.6

For an arbitrarily given compact set \(D \subset \Omega \), let \(d =\sup \{\Vert v-v^0\Vert _H^2 \,|\, v=w\!\setminus \! x_2,\, w\in D\}\), where \(v^0 =(x_1^0,x_3^0,\lambda ^0)\). Then, after \(t\) iterations of the extended ADMM (2.4), the point \(\tilde{w}_t \) defined in (2.31) satisfies

$$\begin{aligned} \sup _{w\in D}\left\{ \theta (\tilde{u}_t) -\theta (u) + (\tilde{w}_t -w)^T F(w)\right\} \le {d\over 2(t+1)}, \end{aligned}$$
(2.35)

which, according to the definition (2.5), means \(\tilde{w}_t\) is an approximate solution of VI\((\Omega ,F,\theta )\) with an accuracy of \(O(1/t)\). Thus a worst-case \(O(1/t)\) convergence rate in the ergodic sense is established for the direct extension of ADMM (2.4).

3 An example showing the divergence of (1.5)

In the last section, we have shown that if two of the coefficient matrices in (1.1) are additionally assumed to be orthogonal, then the direct extension of ADMM (1.5) is convergent. Based on this study, we now give an example showing the divergence of (1.5) when such an orthogonality condition is absent. The analysis below also suggests a strategy for constructing more such examples.

More specifically, we consider the following linear homogeneous equation with three variables:

$$\begin{aligned} A_1x_1 + A_2x_2 + A_3x_3 =0, \end{aligned}$$
(3.1)

where \(A_i \in \mathfrak {R}^3\) (\(i=1,2,3\)) are all column vectors and the matrix \([A_1,\,A_2,\,A_3]\) is assumed to be nonsingular; and \(x_i \in \mathfrak {R}\) (\(i=1,2,3\)). The unique solution of (3.1) is thus \(x_1=x_2=x_3=0\). Clearly, (3.1) is a special case of (1.1) where the objective function is null, \(b\) is the all-zero vector in \(\mathfrak {R}^3\), and \(\mathcal{X}_i =\mathfrak {R}\) for \(i=1,2,3\). The direct extension of ADMM (1.5) is thus applicable to (3.1), and the corresponding optimal Lagrange multipliers are all \(0\).

One will see next that the convergence of the direct extension of ADMM (1.5) applied to such linear equations with a null objective is independent of the choice of the penalty parameter \(\beta \); this is because, under the scaling \(\mu ^k:=\lambda ^k/\beta \) used in Sect. 3.1 below, the resulting iteration matrix does not depend on \(\beta \). That is, if the direct extension of ADMM (1.5) is convergent for one \(\beta >0\), then it is convergent for every \(\beta >0\); and if (1.5) is divergent for one \(\beta >0\), then it is divergent for every \(\beta >0\). Hence, in the specific example to be developed below, one can simply set \(\beta =1\) without loss of generality.

3.1 The iterative scheme of (1.5) for (3.1)

Now, we elucidate the iterative scheme when the direct extension of ADMM (1.5) is applied to solve the linear equation (3.1). In fact, as we will show, it can be represented as a matrix recursion.

Specializing the scheme (1.5) with any given \(\beta >0\) to the particular setting of (3.1), we obtain

$$\begin{aligned} \left\{ \begin{array}{l} A_1^T\bigl \{\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k} +A_3x_3^{k}\bigr )-\lambda ^k\bigr \} = 0,\\ A_2^T\bigl \{\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k}\bigr )-\lambda ^k\bigr \} = 0,\\ A_3^T\bigl \{\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1}\bigr )-\lambda ^k\bigr \} = 0,\\ \lambda ^{k+1} = \lambda ^k -\beta \bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1}\bigr ). \end{array} \right. \end{aligned}$$
(3.2)

By introducing a new variable \(\mu ^k:={\lambda ^k/\beta }\), we can recast the scheme (3.2) as

$$\begin{aligned} A_1^TA_1x_1^{k+1} + A_1^TA_2x_2^{k} + A_1^TA_3x_3^{k} - A_1^T\mu ^k = 0, \end{aligned}$$
(3.3a)
$$\begin{aligned} A_2^TA_1x_1^{k+1} + A_2^TA_2x_2^{k+1} + A_2^TA_3x_3^{k} - A_2^T\mu ^k = 0, \end{aligned}$$
(3.3b)
$$\begin{aligned} A_3^TA_1x_1^{k+1} + A_3^TA_2x_2^{k+1} + A_3^TA_3x_3^{k+1} - A_3^T\mu ^k = 0, \end{aligned}$$
(3.3c)
$$\begin{aligned} \mu ^{k+1} = \mu ^k -\bigl (A_1x_1^{k+1} + A_2x_2^{k+1} +A_3x_3^{k+1}\bigr ). \end{aligned}$$
(3.3d)

It follows from the first equation in (3.3) that

$$\begin{aligned} x_1^{k+1} = \frac{1}{A_1^TA_1}\bigl (-A_1^TA_2x_2^k -A_1^TA_3x_3^k+ A_1^T\mu ^k \bigr ). \end{aligned}$$
(3.4)

Substituting (3.4) into (3.3b), (3.3c) and (3.3d), we obtain a reformulation of (3.3)

$$\begin{aligned}&\left( \begin{array}{ccc} A_2^TA_2 &{}\quad 0 &{}\quad 0_{1\times 3} \\ A_3^TA_2 &{}\quad A_3^TA_3 &{}\quad 0_{1\times 3} \\ \ A_2 &{}\quad A_3 &{}\quad I_{3\times 3} \end{array} \right) \left( \begin{array}{c} x_2^{k+1} \\ x_3^{k+1} \\ \mu ^{k+1} \end{array} \right) \nonumber \\&\quad = \left[ \left( \begin{array}{ccc} 0 &{} -A_2^TA_3 &{} A_2^T\\ 0 &{} 0 &{} A_3^T \\ 0_{3\times 1} &{} 0_{3\times 1} &{} I_{3\times 3} \end{array} \right) - \frac{1}{A_1^TA_1}\left( \begin{array}{c} A_2^TA_1 \\ A_3^TA_1 \\ A_1 \end{array} \right) \bigl (-A_1^TA_2, -A_1^TA_3, A_1^T \bigr ) \right] \left( \begin{array}{c} x_2^k \\ x_3^k \\ \mu ^k \end{array} \right) .\nonumber \\ \end{aligned}$$
(3.5)

Let

$$\begin{aligned} L= \left( \begin{array}{c@{\quad }c@{\quad }c} A_2^TA_2 &{} 0 &{} 0_{1\times 3} \\ A_3^TA_2 &{} A_3^TA_3 &{} 0_{1\times 3} \\ \ A_2 &{} A_3 &{} I_{3\times 3} \end{array} \right) \,\, \end{aligned}$$
(3.6)

and

$$\begin{aligned} R \!=\! \left( \begin{array}{ccc} 0 &{}\quad -A_2^TA_3 &{}\quad A_2^T\\ 0 &{}\quad 0 &{}\quad A_3^T \\ 0_{3\times 1} &{}\quad 0_{3\times 1} &{}\quad I_{3\times 3} \end{array} \right) - \frac{1}{A_1^TA_1}\left( \begin{array}{c} A_2^TA_1 \\ A_3^TA_1 \\ A_1 \end{array} \right) \bigl (-A_1^TA_2, -A_1^TA_3, A_1^T \bigr ).\qquad \end{aligned}$$
(3.7)

Then the iterative formula (3.5) can be rewritten as the fixed matrix mapping

$$\begin{aligned} \left( \begin{array}{c} x_2^{k+1} \\ x_3^{k+1} \\ \mu ^{k+1} \end{array} \right) = M\left( \begin{array}{c} x_2^{k} \\ x_3^{k} \\ \mu ^{k} \end{array} \right) = \cdots = M^{k+1}\left( \begin{array}{c} x_2^{0} \\ x_3^{0} \\ \mu ^{0} \end{array} \right) \end{aligned}$$
(3.8)

with

$$\begin{aligned} M = L^{-1}R. \end{aligned}$$
(3.9)

Therefore, the direct extension of ADMM (1.5) is convergent for any starting point if the matrix mapping is a contraction, or equivalently, if the spectral radius of \(M\), denoted by \(\rho (M)\), is strictly less than \(1\). Thus, to construct a divergent example, we look for an \(A\) such that \(\rho (M)>1\).
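
The following minimal numpy sketch (our own illustration, with \(\beta =1\) so that \(\mu ^k=\lambda ^k\); the function names are ours) assembles \(L\), \(R\) and \(M=L^{-1}R\) from a given \(A=(A_1,A_2,A_3)\) according to (3.6)–(3.9) and reports \(\rho (M)\), which is exactly the quantity to inspect when searching for a divergent example. Feeding in the matrix \(A\) of (3.10) below should reproduce the matrix in (3.11) and a spectral radius of about 1.0278.

```python
import numpy as np

def admm_iteration_matrix(A):
    """Iteration matrix M = L^{-1} R of (3.8)-(3.9) for the system (3.1), with beta = 1."""
    A1, A2, A3 = A[:, [0]], A[:, [1]], A[:, [2]]
    z13, z31, z11, I3 = np.zeros((1, 3)), np.zeros((3, 1)), np.zeros((1, 1)), np.eye(3)
    L = np.block([[A2.T @ A2, z11,       z13],
                  [A3.T @ A2, A3.T @ A3, z13],
                  [A2,        A3,        I3 ]])                         # (3.6)
    R = (np.block([[z11, -A2.T @ A3, A2.T],
                   [z11,  z11,       A3.T],
                   [z31,  z31,       I3  ]])
         - np.vstack([A2.T @ A1, A3.T @ A1, A1])
           @ np.hstack([-A1.T @ A2, -A1.T @ A3, A1.T]) / (A1.T @ A1))   # (3.7)
    return np.linalg.solve(L, R)                                        # (3.9)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))
```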

3.2 A concrete example showing the divergence of (1.5)

Now we are ready to construct a concrete example to show the divergence of the direct extension of ADMM (1.5) for all \(\beta >0\) when it is applied to solve the model (3.1).

Our previous analysis in Sect. 2 has shown that the scheme (1.5) is convergent whenever two of the coefficient matrices are orthogonal. Thus, to show the divergence of (1.5) for (3.1), the columns \(A_1, A_2\) and \(A_3\) in (3.1) should be chosen such that no two of them are orthogonal.

Specifically, we thus construct the matrix \(A\) as follows:

$$\begin{aligned} A= \left( A_1, A_2, A_3\right) = \left( \begin{array}{ccc} 1 &{}\quad 1 &{}\quad 1\\ 1 &{}\quad 1 &{}\quad 2\\ 1 &{}\quad 2 &{}\quad 2 \end{array} \right) . \end{aligned}$$
(3.10)

Given this matrix \(A\), the system of linear equations (3.5) can be specified as

$$\begin{aligned}&\left( \begin{array}{ccccc} 6 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 7 &{}\quad 9 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad 2 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 2 &{}\quad 2 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{array} \right) \left( \begin{array}{c} x_2^{k+1}\\ x_3^{k+1}\\ \mu _1^{k+1}\\ \mu _2^{k+1}\\ \mu _3^{k+1} \end{array} \right) \\&\quad = \left[ \left( \begin{array}{ccccc} 0 &{}\quad -7 &{}\quad 1 &{}\quad 1 &{}\quad 2 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 2 &{}\quad 2 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{array} \right) -\frac{1}{3} \left( \begin{array}{c} 4 \\ 5 \\ 1 \\ 1 \\ 1 \end{array} \right) \bigl (-4,-5,1,1,1\bigr )\right] \left( \begin{array}{c} x_2^k\\ x_3^k\\ \mu _1^k\\ \mu _2^k\\ \mu _3^k \end{array} \right) . \end{aligned}$$

Note with the specification in (3.10), the matrices \(L\) in (3.6) and \(R\) in (3.7) reduce to

$$\begin{aligned} L= \left( \begin{array}{ccccc} 6 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 7 &{}\quad 9 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{} \quad 0 \\ 1 &{}\quad 2 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 2 &{}\quad 2 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{array} \right) \quad \hbox {and} \quad R= \frac{1}{3} \left( \begin{array}{ccccc} 16 &{}\quad -1 &{}\quad -1 &{}\quad -1 &{}\quad 2 \\ 20 &{}\quad 25 &{}\quad -2 &{}\quad 1 &{}\quad 1 \\ 4 &{}\quad 5 &{}\quad 2 &{}\quad -1 &{}\quad -1 \\ 4 &{}\quad 5 &{}\quad -1 &{}\quad 2 &{}\quad -1 \\ 4 &{}\quad 5 &{}\quad -1 &{}\quad -1 &{}\quad 2 \end{array}\right) . \end{aligned}$$

Thus we have

$$\begin{aligned} M= L^{-1}R = \frac{1}{162}\left( \begin{array}{ccccc} 144 &{}\quad -9 &{}\quad -9 &{}\quad -9 &{}\quad 18 \\ 8 &{}\quad 157 &{} -5 &{}\quad 13 &{}\quad -8 \\ 64 &{}\quad 122 &{}\quad 122 &{}\quad -58 &{}\quad -64 \\ 56 &{}\quad -35 &{} -35 &{}\quad 91 &{}\quad -56 \\ -88 &{}\quad -26 &{}\quad -26 &{}\quad -62 &{}\quad 88 \end{array}\right) . \end{aligned}$$
(3.11)

From direct computation, \(M\) admits the following eigenvalue decomposition

$$\begin{aligned} M = V\mathrm{Diag(d)}V^{-1}, \end{aligned}$$
(3.12)

where

$$\begin{aligned} d =\left( \begin{array}{c} 0.9836+0.2984i\\ 0.9836-0.2984i\\ 0.8744+0.2310i\\ 0.8744-0.2310i\\ 0 \end{array} \right) \end{aligned}$$

and

$$\begin{aligned} V = \left( \begin{array}{ccccc} 0.1314+0.2661i &{}\quad 0.1314- 0.2661i&{}\quad 0.1314 -0.2661i &{}\quad 0.1314+0.2661i &{}\quad 0\\ 0.0664-0.2718i&{}\quad 0.0664 +0.2718i&{}\quad 0.0664 +0.2718i&{}\quad 0.0664-0.2718i &{}\quad 0\\ -0.2847-0.4437i &{}\quad -0.2847+0.4437i &{}\quad 0.2847-0.4437i &{}\quad 0.2847+0.4437i &{}\quad 0.5774\\ 0.5694 &{}\quad 0.5694 &{}\quad -0.5694 &{}\quad -0.5694 &{}\quad 0.5774\\ -0.4270 +0.2218i &{}\quad -0.4270 -0.2218i&{}\quad 0.4270 +0.2218i &{}\quad 0.4270 -0.2218i &{}\quad 0.5774 \end{array} \right) . \end{aligned}$$

An important fact regarding \(d\) defined above is that

$$\begin{aligned} \rho (M)=|d_1 | = |d_2 |=1.0278 >1, \end{aligned}$$

from which we can construct a divergent sequence \(\{(x_2^k,x_3^k,\mu _1^k,\mu _2^k,\mu _3^k)\}\) starting from certain initial points. The questions are: can we find real-valued non-convergent starting points? And do the non-convergent starting points form a continuously dense set, that is, are they not isolated? We give affirmative answers below.
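
Both numerical facts just stated can be checked directly from the matrix in (3.11); the following minimal sketch (our own illustration; the randomly chosen real starting point is an assumption anticipating the construction below) confirms the spectral radius and exhibits the blow-up of the recursion (3.8):

```python
import numpy as np

M = np.array([[144,  -9,  -9,  -9,  18],
              [  8, 157,  -5,  13,  -8],
              [ 64, 122, 122, -58, -64],
              [ 56, -35, -35,  91, -56],
              [-88, -26, -26, -62,  88]], dtype=float) / 162.0   # the matrix M in (3.11)

print(np.max(np.abs(np.linalg.eigvals(M))))   # spectral radius, about 1.0278 > 1

# Iterate (3.8) from a randomly chosen real starting point; with probability one its
# coefficients l1, l2 in the eigenbasis (cf. (3.13)) are nonzero, so the iterates blow up.
z = np.random.default_rng(0).standard_normal(5)   # (x2^0, x3^0, mu_1^0, mu_2^0, mu_3^0)
for k in range(1000):
    z = M @ z
print(np.linalg.norm(z))                      # grows roughly like 1.0278**1000
```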

Indeed, for any initial \((x_2^0,x_3^0,\mu _1^0,\mu _2^0,\mu _3^0)\), let

$$\begin{aligned} \left( \begin{array}{c} l_1 \\ l_2\\ l_3 \\ l_4 \\ l_5 \end{array} \right) = V^{-1} \left( \begin{array}{c} x_2^0\\ x_3^0\\ \mu _1^0 \\ \mu _2^0 \\ \mu _3^0 \end{array} \right) . \end{aligned}$$
(3.13)

From (3.8) and (3.12), we know that

$$\begin{aligned} \left( \begin{array}{c} x_2^{k}\\ x_3^{k}\\ \mu _1^{k}\\ \mu _2^{k}\\ \mu _3^k \end{array} \right)&= V \mathrm{Diag}(d^{\,k})V^{-1} \left( \begin{array}{c} x_2^{0}\\ x_3^{0}\\ \mu _1^{0} \\ \mu _2^0 \\ \mu _3^0 \end{array} \right) \\&= V \mathrm{Diag}(d^{\,k}) \left( \begin{array}{c} l_1\\ l_2\\ l_3\\ l_4\\ l_5 \end{array}\right) \\&= V \left( \begin{array}{c} l_1(\,0.9836+0.2984i\,)^{k}\\ l_2(\,0.9836-0.2984i\,)^{k}\\ l_3(\,0.8744+0.2310i\,)^{k}\\ l_4(\,0.8744-0.2310i\,)^{k}\\ 0 \end{array}\right) , \end{aligned}$$

Thus, as long as \(l_1l_2\ne 0\), the sequence is divergent because \(|d_1|=|d_2|>1\), and there is no way for it to converge to the solution point of (3.1).

There are many choices of the starting point \((x_2^0,x_3^0,\mu _1^0,\mu _2^0,\mu _3^0)\) such that \((l_1l_2)\ne 0\). For example,

$$\begin{aligned} \left( \begin{array}{c} x_2^0\\ x_3^0\\ \mu _1^0 \\ \mu _2^0 \\ \mu _3^0 \end{array} \right) =V\left( \begin{array}{c} \alpha _1\\ \alpha _1\\ \alpha _2 \\ \alpha _2 \\ \alpha _3 \end{array} \right) , \end{aligned}$$
(3.14)

where the \(\alpha _{i}\) are arbitrary real numbers with \(\alpha _1\ne 0\) (which implies that \(l_1=l_2=\alpha _1\ne 0\)). Furthermore, \(V(1)\) and \(V (2)\) are a pair of complex conjugate vectors, and so are \(V (3)\) and \(V(4)\), where \(V(i)\) denotes the \(i\)-th column of \(V\). Thus the starting point given by (3.14) is real-valued.

Since the vectors \((\alpha _1,\alpha _2,\alpha _3)\in \mathfrak {R}^3\) with \(\alpha _1>0\) form a continuously dense half space, the non-convergent starting points given by (3.14) with \(\alpha _1>0\) also form a continuously dense half space. Thus, we conclude the main result of this paper as follows.

Theorem 3.1

For the three-block convex minimization problem (1.1), there is an example for which the direct extension of ADMM (1.5) is divergent for any penalty parameter \(\beta >0\) and for any starting point in a certain continuously dense half space of dimension \(3\).

Remark 3.2

The linear homogeneous equation (3.1) with the matrix given in (3.10), whose feasible region is a singleton, already suffices to show the divergence of the direct extension of ADMM (1.5). In fact, we can construct more sophisticated examples demonstrating the same divergence. For example, consider the quadratic programming model

$$\begin{aligned} \begin{array}{ll} \min &{} \,\displaystyle {1\over 2} x_1^2 \\ \hbox {s.t.}&{} \,\left( \begin{array}{cc} 1&{} 1\\ 1&{} 1\\ 1&{} 1 \end{array} \right) \left( \begin{array}{c} x_1\\ x_2 \end{array} \right) + \left( \begin{array}{c} 1\\ 1\\ 2 \end{array} \right) x_3 + \left( \begin{array}{c} 1\\ 2\\ 2 \end{array} \right) x_4=0. \end{array} \end{aligned}$$
(3.15)

Obviously, the feasible region of (3.15) is not a singleton. Following the procedure in Sect. 3, it is easy to show that when the scheme (1.5) with any \(\beta >0\) is applied to (3.15), the resulting linear iterative mapping shares the same non-zero eigenvalues as the matrix given in (3.11). Therefore, the spectral radius of the matrix involved in this linear iterative mapping is still \(1.0278\). Hence, the model (3.15) also shows the divergence of the direct extension of ADMM (1.5) for any \(\beta >0\) and a certain starting point.

4 Extensions

In this section, we extend our previous analysis to some relevant work in the literature.

4.1 Strongly convex case of (1.1)

When all the functions \(\theta _i\) in (1.1) are further assumed to be strongly convex and the penalty parameter \(\beta \) is restricted to a specific range determined by the strong convexity moduli of these functions, the direct extension of ADMM (1.5) is convergent, as proved in [14].

It is then interesting to ask whether the scheme (1.5) for a strongly convex minimization model is still convergent when the restriction on \(\beta \) in [14] is removed. In other words, does the strong convexity of the objective function alone guarantee the convergence of the direct extension of ADMM for the three-block convex minimization problem (1.1)? A by-product of this paper is a negative answer to this question.

Theorem 4.1

For the model (1.1) with a strongly convex objective function, the direct extension of ADMM (1.5) is not necessarily convergent for all \(\beta >0\).

Recall that the condition \(\rho (M)>1\) yields the divergence of the direct extension of ADMM (1.5) when it is applied to solve (3.1). Consider the following strongly convex minimization problem with three variables:

$$\begin{aligned} \begin{array}{rl} \min &{} 0.05 x_1^2+0.05 x_2^2 +0.05 x_3^2\\ \hbox {s.t.}&{} \left( \begin{array}{c@{\quad }c@{\quad }c} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 2\\ 1 &{} 2 &{} 2 \end{array} \right) \left( \begin{array}{c} x_1\\ x_2\\ x_3 \end{array} \right) = 0. \end{array} \end{aligned}$$
(4.1)

One can verify that each iteration of the direct extension of ADMM (1.5) applied to this problem is again a fixed matrix mapping. A simple calculation shows that for (4.1), the spectral radius of the matrix involved in (1.5) with \(\beta =1\) is 1.0087. Thus, by a discussion similar to that in Sect. 3.2, one can find a proper starting point such that the direct extension of ADMM (1.5) with \(\beta =1\) is divergent. The details are omitted for succinctness.
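
For illustration, the following minimal sketch computes this spectral radius. The shifted normal equations used below are our own derivation and are not spelled out in the text: for (4.1) with \(\beta =1\), each \(x_i\)-subproblem of (1.5) solves \((0.1 + A_i^TA_i)x_i^{k+1}=\cdots \), since the derivative of \(0.05x_i^2\) is \(0.1x_i\); otherwise the construction mirrors (3.6)–(3.9).

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
A1, A2, A3 = A[:, [0]], A[:, [1]], A[:, [2]]
s = 0.1          # derivative of 0.05*x^2 is 0.1*x, so each normal equation gains +0.1
z13, z31, z11, I3 = np.zeros((1, 3)), np.zeros((3, 1)), np.zeros((1, 1)), np.eye(3)

# Same construction as (3.6)-(3.9) with beta = 1, except that A_i^T A_i is shifted by s.
L = np.block([[s + A2.T @ A2, z11,            z13],
              [A3.T @ A2,     s + A3.T @ A3,  z13],
              [A2,            A3,             I3 ]])
R = (np.block([[z11, -A2.T @ A3, A2.T],
               [z11,  z11,       A3.T],
               [z31,  z31,       I3  ]])
     - np.vstack([A2.T @ A1, A3.T @ A1, A1])
       @ np.hstack([-A1.T @ A2, -A1.T @ A3, A1.T]) / (s + A1.T @ A1))
M = np.linalg.solve(L, R)
print(np.max(np.abs(np.linalg.eigvals(M))))   # about 1.0087, as reported in the text
```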

4.2 A revisit to the ADMM variant with a small step-size in [20]

To tackle the convergence ambiguity of the direct extension of ADMM (1.5), it was recently proposed in [20] to attach a relaxation factor \(\gamma >0\) to the Lagrange-multiplier updating step (1.5d). That is, the step (1.5d) is changed to

$$\begin{aligned} \lambda ^{k+1}=\lambda ^k -\gamma \beta \Big (A_1x_1^{k+1} +A_2x_2^{k+1}+A_3x_3^{k+1}-b\Big ), \end{aligned}$$
(4.2)

where the “step-size” relaxation factor \(\gamma \) is required to be sufficiently small so that a certain error-bound condition is satisfied. As proved in [20], this ADMM variant with a small step-size can even be linearly convergent provided that certain additional assumptions are imposed on the model (1.1). Indeed, the requirement that \(\gamma \) be sufficiently small plays a significant theoretical role in the convergence analysis in [20]; moreover, the analysis in [20] requires some model-dependent data in order to determine the value of \(\gamma \).

In fact, how to relax the Lagrange-multiplier updating step in ALM-based iterative schemes has been well investigated in the literature. For instance, when there is only one block of variable and function in the model (1.1), the ADMM (1.4) reduces to the standard ALM [19, 25]; and it has been demonstrated in [1, 7] that convergence is still ensured if we attach a relaxation factor \(\gamma \in (0,2)\) to the Lagrange-multiplier updating step of the ALM. The key point is the fact elucidated in [26] that the ALM is an application of the proximal point algorithm in [23], so the relaxation idea in [13] is applicable. When the model (1.2) with two blocks of variables and functions is considered, convergence is ensured if a relaxation factor \(\gamma \in (0,\,{\sqrt{5}+1\over 2})\) is attached to the Lagrange-multiplier updating step (1.4c) of the ADMM scheme (1.4). We refer the reader to [10] for the analysis and to [28] for some numerical experiments. Indeed, as demonstrated in the literature, the ALM and ADMM schemes can sometimes be numerically accelerated by a relaxation factor \(\gamma >1\) in their Lagrange-multiplier updating steps. According to these facts about the ALM and ADMM, the upper bound of the allowable range for the factor \(\gamma \) is reduced from \(2\) to \({\sqrt{5}+1\over 2}\) when the ALM is split into the ADMM, i.e., when the number of blocks of variables and functions is increased from \(1\) to \(2\); but these upper bounds do not depend on any model-dependent data. It is thus interesting to ask whether we can find such a model-data-independent range for the \(\gamma \) in (4.2) so that the convergence of the ADMM variant with a small step-size proposed in [20] can be guaranteed for any value of \(\gamma \) in this range.

Recall that we are considering three blocks of variables and functions in the model (1.1), so the ALM subproblem at each iteration is split into three subproblems. The steps (1.5a)–(1.5c) thus represent a looser approximation to the minimization of the corresponding augmented Lagrangian function than the steps (1.4a)–(1.4b) of the ADMM. Hence, based on the facts about the ALM and ADMM just mentioned, it seems reasonable to expect that even if such a range exists, its upper bound is very likely smaller than \({\sqrt{5}+1\over 2}\) (in fact, by the counter-example developed earlier, it could even be smaller than \(1\)). In the following, we construct some examples to study this question numerically. Our numerical results strongly suggest that such a constant upper bound does not exist, although a rigorous theoretical analysis is lacking.

We still consider the linear equation example (3.1) but the matrix \(A\) is given by

$$\begin{aligned} A= \left( \begin{array}{c@{\quad }c@{\quad }c} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 1+\alpha \\ 1 &{} 1+\alpha &{} 1+\alpha \end{array} \right) \end{aligned}$$
(4.3)

where \(\alpha >0\) is a positive scalar. The matrix in (3.10) used to show the divergence of the direct extension of ADMM (1.5) is thus a special case of (4.3) with \(\alpha =1\).

Now, we consider applying the ADMM variant with a small step-size in [20] to the problem (3.1), where the matrix \(A\) is given in (4.3) and the penalty parameter \(\beta \) is fixed at \(1\). In particular, we take the relaxation factor \(\gamma \) in (4.2) to be exactly the scalar \(\alpha \) in (4.3), i.e., \(\gamma =\alpha \), and test the convergence as the value of \(\alpha \) (and thus \(\gamma \)) varies from \(1\) down to some extremely small values. Let \(M(\alpha )\) be the matrix of the resulting linear iterative mapping. Recall that divergence occurs for certain initial points if the spectral radius of \(M(\alpha )\), denoted by \(\rho (M(\alpha ))\), is greater than \(1\). In Table 1, we report the numerical values of \(\rho (M(\alpha ))\) for several choices of \(\alpha \).

Table 1 Spectral radius of the implementation of the ADMM variant with a small step-size (\(\beta =1\))
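
The experiment behind Table 1 can be reproduced with the following minimal sketch (our own assembly; the tested values of \(\alpha \) below are illustrative). Relative to (3.6)–(3.7), the only change is that the relaxation factor \(\gamma \) scales the residual in the \(\mu \)-update, which multiplies the last block row of \(L\) and the corresponding part of \(R\) by \(\gamma \).

```python
import numpy as np

def gamma_admm_iteration_matrix(alpha):
    """Iteration matrix for (3.1) with A from (4.3), beta = 1 and step-size gamma = alpha."""
    A = np.array([[1., 1., 1.],
                  [1., 1., 1. + alpha],
                  [1., 1. + alpha, 1. + alpha]])
    A1, A2, A3 = A[:, [0]], A[:, [1]], A[:, [2]]
    g = alpha  # gamma = alpha, as in the text
    z13, z31, z11, I3 = np.zeros((1, 3)), np.zeros((3, 1)), np.zeros((1, 1)), np.eye(3)
    L = np.block([[A2.T @ A2, z11,       z13],
                  [A3.T @ A2, A3.T @ A3, z13],
                  [g * A2,    g * A3,    I3 ]])
    R = (np.block([[z11, -A2.T @ A3, A2.T],
                   [z11,  z11,       A3.T],
                   [z31,  z31,       I3  ]])
         - np.vstack([A2.T @ A1, A3.T @ A1, g * A1])
           @ np.hstack([-A1.T @ A2, -A1.T @ A3, A1.T]) / (A1.T @ A1))
    return np.linalg.solve(L, R)

for alpha in [1.0, 1e-2, 1e-4, 1e-6, 1e-8]:
    M = gamma_admm_iteration_matrix(alpha)
    # the text reports that each of these spectral radii exceeds 1
    print(alpha, np.max(np.abs(np.linalg.eigvals(M))))
```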

It is observed from Table 1 that for the example (3.1), where the matrix \(A\) is given in (4.3) with the tested values of \(\alpha \), the ADMM variant with a small step-size in [20] is still divergent even when \(\gamma \) is as small as \(1e-8\). In other words, even for an extremely small value of \(\gamma \) in (4.2) (e.g., \(\gamma =1e-8\)), the example (3.1) with \(\alpha =\gamma \) in the matrix \(A\) given by (4.3) already shows the divergence of the ADMM variant with a small step-size. This numerical study thus leads us to conjecture that it is not possible to find a model-data-independent allowable range for the factor \(\gamma \) in (4.2) such that the convergence of the ADMM variant with a small step-size is guaranteed for every value of \(\gamma \) in this range. This is a significant difference between the ADMM variant with a small step-size and the ALM and ADMM schemes (recall that the model-data-independent ranges \(\gamma \in (0,2)\) and \(\gamma \in (0,\,{\sqrt{5}+1\over 2})\) exist for the ALM and ADMM, respectively). Accordingly, this numerical study also empirically supports the rationale of imposing model-data-dependent conditions on the factor \(\gamma \), such as the one in [20], to ensure the convergence of such variants of the direct extension of ADMM for solving a multi-block convex minimization model.

5 Conclusions

We have shown by an example that the direct extension of the alternating direction method of multipliers (ADMM) is not necessarily convergent for solving a convex minimization model with linear constraints and an objective function that is the sum of three functions without coupled variables.

We first studied a condition that is sufficient to ensure the convergence of the direct extension of the ADMM. This condition requires the orthogonality of two of the coefficient matrices in the model, but it imposes neither strong convexity on the objective functions nor any range restriction on the penalty parameter in the algorithmic implementation. This sufficient condition is only of theoretical interest, because it is not easily satisfied by the known applications in the literature. However, the study of this condition inspires the idea and roadmap for constructing the main counter-example that shows the divergence of the direct extension of ADMM.

We then extended the analysis to some other cases, including the strongly convex case considered in [14] and the ADMM variant with a small step-size proposed in [20]. Our main counter-example can easily be extended to show that even with strong convexity of the objective functions in the model (1.1), the direct extension of ADMM is still not necessarily convergent for a given \(\beta >0\). This is a complementary result to the conclusion in [14]. Moreover, by slightly extending the main counter-example, we showed numerically that it seems impossible to find a model-data-independent allowable range for the relaxation factor \(\gamma \) in the Lagrange-multiplier updating step (4.2) such that the convergence of this ADMM variant can be guaranteed for any value of \(\gamma \) in the range. This further justifies the rationale in [20] of considering model-data-dependent conditions on the factor \(\gamma \). Besides, the result in this paper also justifies the rationale of algorithmic design in some recent work, such as the strategy of combining certain correction steps with the output of the direct extension of ADMM in [15, 17], and the idea in [16] of appropriately interchanging the order of variable updating and employing proximal regularization for some decomposed subproblems, in order to produce splitting algorithms with provable convergence under mild assumptions for multi-block convex minimization models. It has been shown that these strategies work for tackling the lack of convergence of the direct extension of ADMM.

Finally, we mention that although our discussion focuses on the model (1.1) with three blocks of variables and functions, our analysis can easily be extended to the more general case where the number of blocks is greater than \(3\). More specifically, consider the more general multi-block convex minimization model

$$\begin{aligned} \begin{array}{ll} \min &{} \,\sum _{i=1}^m\theta _i(x_i)\\ \hbox {s.t.}&{} \,\sum _{i=1}^m A_ix_i = b ,\\ &{} x_i \in \mathcal{X}_i, \end{array} \end{aligned}$$
(5.1)

where \(m>3\), \(A_i\in \mathfrak {R}^{p\times n_i}\) (\(i=1,2,\ldots ,m\)), \(b \in \mathfrak {R}^p\), \(\mathcal{X}_i \subset \mathfrak {R}^{n_i}\) (\(i=1,2,\ldots ,m\)) are closed convex sets; and \(\theta _i:\mathfrak {R}^{n_i}\rightarrow \mathfrak {R}\) (\(i=1,2,\ldots ,m\)) are closed convex but not necessarily smooth functions. Then, a sufficient condition analogous to that in Sect. 2 for ensuring the convergence of the direct extension of ADMM for (5.1) is: there exist two integers \(i\) and \(j\) such that the matrices within the set \(\{A_i,A_{i+1},\ldots ,A_{i+j}\}\) are mutually orthogonal, and so are the matrices within the set \(\{A_{i+j+1},A_{i+j+2},\ldots ,A_m,A_1,A_2,\ldots ,A_{i-1}\}\). This is an easy extension of the condition in Sect. 2; and based on this condition, examples similar to that in Sect. 3 can easily be found to show the divergence of the direct extension of ADMM for (5.1). We omit the details for succinctness.