1 Introduction

We consider the following multi-block convex minimization problem:

$$\begin{aligned} \begin{array}{ll} \min &{} f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \\ \text {s.t.}&{} A_1 x_1 + A_2 x_2 + \cdots + A_N x_N = b \\ &{} x_i \in {\mathcal {X}}_i, \quad i = 1,\ldots , N, \end{array} \end{aligned}$$
(1.1)

where \(A_i \in \mathbf R^{p\times n_i}\), \(b\in \mathbf R^p\), \({\mathcal {X}}_i\subset \mathbf R^{n_i}\) are closed convex sets, and \(f_i:\mathbf R^{n_i}\rightarrow \mathbf R\) are closed convex functions. Problem (1.1) is a general convex minimization problem with linear equality constraints, and many algorithms exist for solving it; for example, the augmented Lagrangian method can be applied. In this paper, we consider the case in which the functions \(f_i, i=1,\ldots ,N\) have certain structures, so that it is beneficial to handle them separately rather than to treat them as a whole. One effective way to solve (1.1) that takes advantage of its separable structure is the so-called Alternating Direction Method of Multipliers (ADMM). The ADMM is closely related to the Douglas–Rachford [12] and Peaceman–Rachford [34] operator splitting methods that date back to the 1950s. These operator splitting methods were further studied later in [13, 16, 18, 32]. The ADMM has been revisited recently due to its success in solving problems with special structures arising from compressed sensing, machine learning, image processing, and so on; see the recent survey papers [5, 15] for more information.

The ADMM is constructed under an augmented Lagrangian framework, where the augmented Lagrangian function for (1.1) is defined as

$$\begin{aligned} {\mathcal {L}}_\gamma (x_1,\ldots ,x_N;\lambda ) := \sum _{j=1}^N f_j(x_j) - \left\langle \lambda , \sum _{j=1}^N A_j x_j -b\right\rangle + \frac{\gamma }{2}\left\| \sum _{j=1}^N A_j x_j -b\right\| ^2, \end{aligned}$$

where \(\lambda \) is the Lagrange multiplier and \(\gamma > 0\) is a penalty parameter. In a typical iteration of the ADMM for solving (1.1), the following updating procedure is implemented:

$$\begin{aligned} \left\{ \begin{array}{lcl} x_1^{k+1} &{} := &{} \mathrm{argmin}_{x_1\in {\mathcal {X}}_1} \ {\mathcal {L}}_\gamma (x_1,x_2^k,\ldots ,x_N^k;\lambda ^k) \\ x_2^{k+1} &{} := &{} \mathrm{argmin}_{x_2\in {\mathcal {X}}_2} \ {\mathcal {L}}_\gamma (x_1^{k+1},x_2,x_3^k,\ldots ,x_N^k;\lambda ^k) \\ &{} \vdots &{} \\ x_N^{k+1} &{} := &{} \mathrm{argmin}_{x_N\in {\mathcal {X}}_N} \ {\mathcal {L}}_\gamma (x_1^{k+1},x_2^{k+1},\ldots ,x_{N-1}^{k+1},x_N;\lambda ^k)\\ \lambda ^{k+1} &{} := &{} \lambda ^k - \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} -b\right) . \end{array}\right. \end{aligned}$$
(1.2)
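To make the recursion (1.2) concrete, the following minimal Python sketch mirrors the Gauss–Seidel sweep and the dual update. This is our own illustration, not part of the original presentation; in particular, the interface subsolvers is an assumption, with subsolvers[i] assumed to return the exact minimizer of the \(x_i\)-subproblem.

```python
import numpy as np

def admm_multiblock(A, b, subsolvers, gamma, iters, x0, lam0):
    """A sketch of the multi-block ADMM recursion (1.2).

    A           : list of matrices A_1, ..., A_N
    b           : right-hand side of the linear constraint
    subsolvers  : subsolvers[i](v) is assumed to return
                  argmin_{x_i in X_i} f_i(x_i) + (gamma/2)*||A_i x_i - v||^2
    gamma       : penalty parameter of the augmented Lagrangian
    x0, lam0    : starting points
    """
    x = [np.array(xi, dtype=float) for xi in x0]
    lam = np.array(lam0, dtype=float)
    N = len(A)
    for _ in range(iters):
        for i in range(N):
            # Gauss-Seidel sweep: blocks j < i already hold their (k+1)-st values.
            r = sum(A[j] @ x[j] for j in range(N) if j != i) - b
            # x_i-update: minimize f_i(x_i) + (gamma/2)*||A_i x_i + r - lam/gamma||^2.
            x[i] = subsolvers[i](lam / gamma - r)
        # Dual update: lam^{k+1} = lam^k - gamma * (constraint residual).
        lam = lam - gamma * (sum(A[j] @ x[j] for j in range(N)) - b)
    return x, lam
```

For \(N=2\) this reduces to the classical two-block ADMM discussed next; the sketch does not check any convergence condition.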

Note that in each iteration, the ADMM (1.2) minimizes the augmented Lagrangian function with respect to \(x_1,\ldots ,x_N\) alternately in a Gauss–Seidel manner. The ADMM (1.2) for solving two-block convex minimization problems (i.e., \(N=2\)) has been studied extensively in the literature. The global convergence of ADMM (1.2) when \(N=2\) was shown in [14, 17]. There are also some recent works that study the convergence rate properties of ADMM when \(N=2\) (see, e.g., [2, 11, 23, 24, 33]).

However, the convergence of the multi-block ADMM (1.2) (we call (1.2) the multi-block ADMM when \(N\ge 3\)) remained unclear for a long time. Recently, Chen et al. [7] constructed a counterexample showing the failure of ADMM (1.2) when \(N\ge 3\). Notwithstanding the lack of a theoretical convergence guarantee, the multi-block ADMM (1.2) has been applied very successfully to solve problems with \(N\, (N\ge 3)\) block variables; see, for example, [35, 37]. It is thus of great interest to further study sufficient conditions that guarantee the convergence of the multi-block ADMM. Some recent works in this direction are described briefly as follows. Han and Yuan [19] showed that the multi-block ADMM (1.2) converges if all the functions \(f_1,\ldots ,f_N\) are strongly convex and \(\gamma \) is restricted to a certain region. This condition is relaxed in [8, 31] so that only \(N-1\) of the functions need to be strongly convex, with \(\gamma \) again restricted to a certain region. In particular, Lin et al. [31] proved a sublinear convergence rate under such conditions. Cai et al. [6] and Li et al. [28] proved that for \(N=3\), convergence of the multi-block ADMM can be guaranteed under the assumption that only one function among \(f_1\), \(f_2\) and \(f_3\) is strongly convex and \(\gamma \) lies in a certain region. In addition to strong convexity of \(f_2,\ldots ,f_N\), by assuming further smoothness conditions on the functions and some rank conditions on the matrices in the linear constraints, Lin et al. [30] proved global linear convergence of the multi-block ADMM. Davis and Yin [9] studied a variant of the 3-block ADMM (see Algorithm 8 in [9]) which requires that \(f_1\) is strongly convex and that \(\gamma \) is smaller than a certain bound to guarantee convergence. More recently, Lin, Ma and Zhang [29] proved that the 3-block ADMM is globally convergent for any \(\gamma >0\) when it is applied to solve a class of regularized least squares decomposition problems. Note that all the works mentioned above require that (part of) the objective function is strongly convex. Without assuming strong convexity, Hong and Luo [26] studied a variant of ADMM (1.2) with a small step size in updating the Lagrange multiplier. Specifically, [26] proposes to replace the last equation in (1.2) with \( \lambda ^{k+1} := \lambda ^k - \alpha \gamma (\sum _{j=1}^N A_j x_j^{k+1} -b ), \) where \(\alpha >0\) is a small step size. Linear convergence of this variant is proven under the assumption that the objective function satisfies certain error bound conditions. However, to guarantee convergence, the choice of \(\alpha \) is bounded by parameters associated with the error bound conditions, and it might therefore be difficult to choose \(\alpha \) in practice. There are also studies on the convergence and convergence rate of some other variants of ADMM (1.2); we refer the interested readers to [10, 20–22, 25, 36, 38] for the details of these variants. However, it has been observed by many researchers that modified versions of ADMM, though equipped with a convergence guarantee, often perform slower than the multi-block ADMM, which has no convergence guarantee.
The authors of [36] made the following statement (see page 885 in [36]): “However, to the best of our knowledge, up to now the dilemma is that at least for convex conic programming, the modified versions, though with a convergence guarantee, often perform two to three times slower than the multiblock ADMM with no convergent guarantee.” Wang et al. [38] compared the multi-block ADMM with several of its modifications, and their numerical experiments on basis pursuit and latent variable Gaussian graphical model selection problems show that the multi-block ADMM always outperforms the tested modifications (see Section 3 of [38]). These results motivated us to further study whether ADMM can be guaranteed to converge when it is applied to certain classes of problems, even though it is not convergent in general.

Our contributions. The main contributions of this paper are as follows. First, we show that if ADMM (1.2) is applied to an associated perturbed problem of (1.1), rather than to (1.1) itself, then it returns an \(\epsilon \)-optimal solution to (1.1) within \(O(1/\epsilon ^2)\) iterations, under the condition that \(\gamma \) depends on \(\epsilon \). Here we do not assume strong convexity of any objective function \(f_i\). This result can be seen as a perturbation approach: instead of modifying the ADMM procedure itself, as most papers in the literature propose, we modify the problem. Such an approach is not uncommon; the perturbation technique used to resolve the cycling problem in the simplex method for linear programming is one famous example. Modifying the algorithm sometimes requires additional computational effort, and, as shown in [36] and [38], modified versions of ADMM perform worse than the original ADMM on some problems. Our approach maintains the effectiveness of ADMM, at least for certain problems discussed in [36] and [38], because we do not modify the algorithm itself. Second, we show that the ADMM (1.2) returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon )\) iterations when it is applied to the so-called sharing problem, under the conditions that the augmented Lagrangian \({\mathcal {L}}_\gamma \) is a Kurdyka–Łojasiewicz (KL) function [3, 4] and that \(\gamma \) is sufficiently large. Interestingly, it turns out that ADMM possesses nice convergence properties when it is applied to sharing problems. In a recent work by Hong et al. [27], global convergence of ADMM was established even when it is applied to a nonconvex sharing problem. In this paper, we show that by combining the KL property with the convexity of the problem, we can establish a sublinear convergence rate of ADMM for solving the sharing problem.

Organization. The rest of this paper is organized as follows. In Sect. 2 we provide some preliminaries for our convergence rate analysis. In Sect. 3, we prove the \(O(1/\epsilon ^2)\) iteration complexity of ADMM (1.2) by introducing an associated perturbed problem of (1.1). In Sect. 4, we prove the \(O(1/\epsilon )\) iteration complexity of ADMM (1.2) under the Kurdyka–Łojasiewicz (KL) property.

2 Preliminaries

We denote \(\Omega = {\mathcal {X}}_1\times \cdots \times {\mathcal {X}}_N \times \mathbf R^p\) and denote the optimal set of (1.1) by \(\Omega ^*\). The following assumption is made throughout this paper.

Assumption 2.1

The optimal set \(\Omega ^*\) for problem (1.1) is non-empty.

According to the first-order optimality conditions for (1.1), solving (1.1) is equivalent to finding \((x_1^*,x_2^*,\ldots ,x_N^*,\lambda ^*)\in \Omega ^*\) such that the following holds, where \(g_i(x_i^*)\in \partial f_i(x_i^*)\) denotes a subgradient of \(f_i\) at \(x_i^*\):

$$\begin{aligned} \left\{ \begin{array}{l} (x_i-x_i^*)^\top (g_i(x_i^*)-A_i^\top \lambda ^*) \ge 0, \quad \forall x_i\in {\mathcal {X}}_i, \quad i=1,2,\ldots ,N\\ A_{1}x_{1}^{*} + \cdots +A_{N}x_{N}^{*}-b=0. \end{array} \right. \end{aligned}$$
(2.1)

In this paper, we analyze the iteration complexity of ADMM (1.2) under two scenarios. The conditions of the two scenarios are listed in Table 1. The following assumption is only used in Scenario 2.

Assumption 2.2

We assume that \({\mathcal {X}}_N=\mathbf R^{n_N}\). We also assume that \(f_i\) has a finite lower bound, i.e., \(\inf _{x_i\in {\mathcal {X}}_i} f_i(x_i)\ge f_i^*>-\infty \) for \(i=1,2,\ldots ,N\). Moreover, we assume that \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\) is a coercive function for \(i=1,2,\ldots ,N-1\), where \(\mathbf 1 _{{\mathcal {X}}_i}\) denotes the indicator function of \({\mathcal {X}}_i\), i.e.,

$$\begin{aligned} \mathbf 1 _{{\mathcal {X}}_i}(x_i) = \left\{ \begin{array}{l@{\quad }l} 0, &{} \text{ if } \; x_i\in {\mathcal {X}}_i \\ +\infty , &{} \text{ otherwise. } \end{array}\right. \end{aligned}$$

Furthermore, we assume that \({\mathcal {L}}_\gamma \) is a KL function (to be defined later).

Table 1 Two scenarios leading to sublinear convergence

Scenario 1: ADMM is applied to the associated perturbed problem (3.1), no \(f_i\) is assumed to be strongly convex, and \(\frac{\epsilon }{2}\le \gamma \le \epsilon \)

Scenario 2: Assumption 2.2 holds, \(A_N=I\), \(\nabla f_N\) is Lipschitz continuous with constant L, and \(\gamma >\sqrt{2}L\)

Remark 2.3

Some remarks are in order regarding the conditions in Scenario 2. Note that it is not very restrictive to require \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\) to be a coercive function. In fact, many functions used as regularization terms, including the \(\ell _1\)-norm, \(\ell _2\)-norm and \(\ell _\infty \)-norm for vectors and the nuclear norm for matrices, are coercive; assuming the compactness of \({\mathcal {X}}_i\) also leads to the coerciveness of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\). Moreover, the assumptions that \(A_N=I\) and that \(\nabla f_N\) is Lipschitz continuous cover many interesting applications in practice. For example, many problems arising from machine learning, statistics, image processing and so on have the following structure:

$$\begin{aligned} \min \ f_1(x_1) + \cdots + f_{N-1}(x_{N-1}) + f_N(b-A_1x_1- \cdots - A_{N-1}x_{N-1}), \end{aligned}$$
(2.2)

where \(f_N\) denotes a loss function for data fitting, which is usually smooth, and \(f_1, \ldots , f_{N-1}\) are regularization terms that promote certain structures of the solution. This problem is usually referred to as the sharing problem (see, e.g., [5, 27]). Problem (2.2) can be reformulated as

$$\begin{aligned} \begin{array}{ll} \min &{} f_1(x_1) + \cdots + f_{N-1}(x_{N-1}) + f_N(x_N) \\ \text {s.t.}&{} A_1x_1+ \cdots + A_{N-1}x_{N-1} + x_N = b, \end{array} \end{aligned}$$
(2.3)

which is in the form of (1.1) and can be solved by ADMM (see [5, 27]). Note that \(A_N=I\) in (2.3) and it is very natural to assume that \(\nabla f_N\) is Lipschitz continuous. Thus the conditions in Scenario 2 are satisfied.
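As a simple illustration (an example of ours, not taken from [5, 27]), taking \(N=3\), \(f_1=f_2=\Vert \cdot \Vert _1\) and the least-squares loss \(f_3(\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\) in (2.2) gives

$$\begin{aligned} \begin{array}{ll} \min &{} \Vert x_1\Vert _1 + \Vert x_2\Vert _1 + \frac{1}{2}\Vert x_3\Vert ^2 \\ \text {s.t.}&{} A_1x_1+ A_2x_2 + x_3 = b, \end{array} \end{aligned}$$

which is of the form (2.3) with \(A_3=I\) and \(\nabla f_3(x_3)=x_3\) Lipschitz continuous with constant \(L=1\); the same quadratic loss appears again in the reformulated counterexample of Sect. 4.1.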

Notation

For simplicity, we use u to denote the stacked vector of \(x_1,\ldots ,x_N\), i.e., \(u=[x_1;x_2;\ldots ;x_N]\), \(u^k=[x_1^k;x_2^k;\ldots ;x_N^k]\), \(u^*=[x_1^*;x_2^*;\ldots ;x_N^*]\), and denote \(w = [u;\lambda ]\). We denote by \(f(u)\equiv f_1(x_{1})+\cdots +f_{N}(x_{N})\) the objective function of problem (1.1); \(\mathbf {1}_{{\mathcal {X}}}\) is the indicator function of \({\mathcal {X}}\); \(\nabla f\) is the gradient of f; \(\Vert x\Vert \) denotes the Euclidean norm of x.

In our analysis, the following two well-known identities are used frequently,

$$\begin{aligned} (w_1-w_2)^{\top }(w_3-w_4)= & {} \frac{1}{2}\left( \Vert w_1-w_4\Vert ^{2}-\Vert w_1-w_3\Vert ^{2}\right) \nonumber \\&+\,\frac{1}{2}\left( \Vert w_3-w_2\Vert ^{2}-\Vert w_4-w_2\Vert ^{2}\right) \end{aligned}$$
(2.4)
$$\begin{aligned} (w_{1}-w_{2})^\top (w_{3}-w_{1})= & {} \frac{1}{2}\left( \Vert w_{2}-w_{3}\Vert ^{2}-\Vert w_{1}-w_{2}\Vert ^{2}-\Vert w_{1}-w_{3}\Vert ^{2}\right) . \end{aligned}$$
(2.5)
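For completeness, (2.5) follows by expanding the square

$$\begin{aligned} \Vert w_2-w_3\Vert ^{2} = \Vert (w_2-w_1)+(w_1-w_3)\Vert ^{2} = \Vert w_1-w_2\Vert ^{2}+\Vert w_1-w_3\Vert ^{2}+2(w_{1}-w_{2})^\top (w_{3}-w_{1}) \end{aligned}$$

and rearranging; (2.4) is verified in the same way by expanding the four squared norms on its right-hand side.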

3 Iteration Complexity of ADMM: Associated Perturbation

In this section, we prove the \(O(1/\epsilon ^2)\) iteration complexity of ADMM (1.2) under the conditions in Scenario 1 of Table 1. Indeed, given \(\epsilon >0\) sufficiently small and \(u^0:=(x_1^0;\ldots ;x_N^0)\), we introduce an associated perturbed problem of (1.1), i.e.,

$$\begin{aligned} \begin{array}{ll} \min &{} f_1(x_1) + \tilde{f}_2(x_2) + \cdots + \tilde{f}_N(x_N) \\ \text {s.t.}&{} A_1 x_1 + A_2 x_2 + \cdots + A_N x_N = b \\ &{} x_i\in {\mathcal {X}}_i,\quad i=1,\ldots ,N, \end{array} \end{aligned}$$
(3.1)

where \(\tilde{f}_i(x_i) = f_i(x_i) + \frac{\mu }{2}\left\| A_i x_i - A_i x_i^0\right\| ^2\) for \(i=2,\ldots ,N\), and \(\mu = \epsilon (N-2)(N+1)\). Note that the \(\tilde{f}_i\) are not necessarily strongly convex, since the matrices \(A_i\) need not have full column rank. We prove that the ADMM (1.2) applied to the associated perturbed problem (3.1) returns an \(\epsilon \)-optimal solution to the original problem (1.1), in terms of both the objective value and the constraint violation, within \(O(1/\epsilon ^2)\) iterations.
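For instance, in the three-block case \(N=3\) considered later in Sect. 3.1, the perturbation weight and the perturbed functions read

$$\begin{aligned} \mu = \epsilon (N-2)(N+1) = 4\epsilon , \qquad \tilde{f}_i(x_i) = f_i(x_i) + 2\epsilon \left\| A_i x_i - A_i x_i^0\right\| ^2, \quad i=2,3, \end{aligned}$$

so the perturbation vanishes as the target accuracy \(\epsilon \) tends to zero.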

The ADMM for solving (3.1) can be summarized as (note that some constant terms in the subproblems are discarded):

$$\begin{aligned} x_{1}^{k+1}:= & {} \mathrm{argmin}_{x_{1} \in {\mathcal {X}}_{1}}f_{1}(x_{1})+\frac{\gamma }{2}\left\| A_{1}x_{1} + \sum \limits _{j=2}^N A_{j}x_{j}^{k}-b-\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \end{aligned}$$
(3.2)
$$\begin{aligned} x_{i}^{k+1}:= & {} \mathrm{argmin}_{x_{i} \in {\mathcal {X}}_{i}}\tilde{f}_{i}(x_{i})\nonumber \\&+\,\frac{\gamma }{2}\left\| \sum \limits _{j=1}^{i-1} A_{j}x_{j}^{k+1} + A_{i}x_{i} + \sum \limits _{j=i+1}^{N} A_{j}x_{j}^{k}-b-\frac{1}{\gamma }\lambda ^{k}\right\| ^{2}, \ i = 2,\ldots ,N, \end{aligned}$$
(3.3)
$$\begin{aligned} \lambda ^{k+1}:= & {} \lambda ^{k} - \gamma \left( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+ \cdots + A_{N}x_{N}^{k+1}-b\right) . \end{aligned}$$
(3.4)

The first-order optimality conditions for (3.2)–(3.3) are given respectively by \(x_{i}^{k+1}\in {\mathcal {X}}_{i}\) and

$$\begin{aligned}&\left( x_{1}-x_{1}^{k+1}\right) ^{\top }\left[ g_1\left( x_{1}^{k+1}\right) -A_{1}^{\top }\lambda ^{k}+\gamma A_{1}^{\top }\left( A_{1}x_{1}^{k+1}+\sum \limits _{j=2}^N A_{j}x_{j}^{k}-b\right) \right] \ge 0, \end{aligned}$$
(3.5)
$$\begin{aligned}&\left( x_{i}-x_{i}^{k+1}\right) ^{\top }\left[ g_i\left( x_{i}^{k+1}\right) +\mu A_i^\top A_i\left( x_i^{k+1}-x_i^0\right) -A_{i}^{\top }\lambda ^{k} \right. \nonumber \\&\quad \left. +\,\gamma A_{i}^{\top }\left( \sum \limits _{j=1}^{i} A_{j}x_{j}^{k+1} + \sum \limits _{j=i+1}^{N} A_{j}x_{j}^{k}-b\right) \right] \ge 0, \end{aligned}$$
(3.6)

hold for any \(x_i\in {\mathcal {X}}_i\), where \(g_i(x_i^{k+1}) \in \partial f_i(x_i^{k+1})\) denotes a subgradient of \(f_i\) at \(x_i^{k+1}\), \(i=1,2,\ldots ,N\). Moreover, by combining with (3.4), (3.5)–(3.6) can be rewritten as

$$\begin{aligned}&\left( x_{1}-x_{1}^{k+1}\right) ^{\top }\left[ g_{1}\left( x_{1}^{k+1}\right) -A_{1}^{\top }\lambda ^{k+1} +\gamma A_{1}^{\top }\left( \sum \limits _{j=2}^N A_{j}\left( x_{j}^{k}-x_{j}^{k+1}\right) \right) \right] \ge 0, \end{aligned}$$
(3.7)
$$\begin{aligned}&\left( x_{i}-x_{i}^{k+1}\right) ^{\top }\left[ g_i\left( x_{i}^{k+1}\right) +\mu A_i^\top A_i\left( x_i^{k+1}- x_i^0\right) -A_{i}^{\top }\lambda ^{k+1} \right. \nonumber \\&\left. \quad +\,\gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^N A_{j}(x_{j}^{k}-x_{j}^{k+1})\right) \right] \ge 0. \end{aligned}$$
(3.8)

Lemma 3.1

Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the ADMM (3.2)–(3.4) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any \(u^*=(x_1^*,x_2^*,\ldots ,x_N^*)\in \Omega ^*\) and \(\lambda \in \mathbf R^p\), it holds true under the conditions in Scenario 1 that

$$\begin{aligned}&f(u^*)-f\left( u^{k+1}\right) +\sum _{i=1}^N \left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) + (\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^NA_ix_i^{k+1}-b\right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^{k}\right\| ^{2}-\left\| \lambda -\lambda ^{k+1}\right\| ^{2}\right) + \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^{0}\right\| ^{2} \nonumber \\&\qquad +\, \frac{\gamma }{2}\sum \limits _{i=1}^{N-1} \left( \left\| \sum \limits _{j=1}^i A_{j}x_{j}^* + \sum \limits _{j=i+1}^N A_j x_j^{k} - b\right\| ^{2}-\left\| \sum \limits _{j=1}^i A_{j}x_{j}^* + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^{2}\right) \nonumber \\&\quad \ge 0. \end{aligned}$$
(3.9)

Proof

For \(i=1,2,\ldots ,N-1\), we have,

$$\begin{aligned}&\left( x_i -x_i^{k+1}\right) ^\top A_i^\top \left( \sum \limits _{j=i+1}^N A_j \left( x_j^k - x_j^{k+1}\right) \right) \\&\quad = \left[ \left( \sum \limits _{j=1}^i A_j x_j - b\right) - \left( \sum \limits _{j=1}^{i-1} A_j x_j + A_i x_i^{k+1}-b\right) \right] ^\top \left[ \left( -\sum \limits _{j=i+1}^N A_j x_j^{k+1}\right) \right. \\&\left. \qquad -\,\left( -\sum \limits _{j=i+1}^N A_j x_j^k\right) \right] \\&\quad = \frac{1}{2}\left( \left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^{2}-\left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^{2}\right) \\&\qquad +\,\frac{1}{2}\left( \left\| \sum \limits _{j=1}^{i-1} A_j x_j + \sum \limits _{j=i}^N A_j x_j^{k+1} - b\right\| ^{2} - \left\| \sum \limits _{j=1}^{i-1} A_j x_j + A_i x_i^{k+1}+\sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^{2}\right) \\&\quad \le \frac{1}{2}\left( \left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^{2}-\left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^{2}\right) \\&\qquad +\,\frac{1}{2}\left\| \sum \limits _{j=1}^{i-1} A_j x_j + \sum \limits _{j=i}^N A_j x_j^{k+1} - b \right\| ^{2}, \end{aligned}$$

where in the second equality we applied the identity (2.4). Summing the above inequality over \(i=1,\ldots ,N-1\) and using (3.4) yields,

$$\begin{aligned}&\gamma \sum _{i=1}^{N-1}\left( x_i-x_i^{k+1}\right) ^\top A_i^\top \left( \sum _{j=i+1}^N A_j\left( x_j^k-x_j^{k+1}\right) \right) \nonumber \\&\quad \le \frac{\gamma }{2}\sum \limits _{i=1}^{N-1} \left( \left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^2 - \left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^2\right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^k\right\| ^{2}+ \frac{\gamma }{2}\sum \limits _{i=2}^{N-1}\left\| \sum \limits _{j=1}^{i-1} A_j x_j +\sum \limits _{j=i}^N A_j x_j^{k+1}-b\right\| ^2. \end{aligned}$$
(3.10)

Adding (3.7) and (3.8), and combining the resulting inequality with (3.4) and (3.10) yields that the following inequality holds for any \(\lambda \in \mathbf R^p\):

$$\begin{aligned}&\sum _{i=1}^N \left( x_i-x_i^{k+1}\right) ^\top \left( g_i(x_i^{k+1})-A_i^\top \lambda ^{k+1}\right) +(\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^NA_ix_i^{k+1}-b\right) \nonumber \\&\qquad +\, \frac{1}{\gamma }(\lambda -\lambda ^{k+1})^\top (\lambda ^{k+1}-\lambda ^k) \nonumber \\&\qquad +\,\mu \sum \limits _{i=2}^N \left( x_i - x_i^{k+1}\right) ^\top A_i^\top A_i\left( x_i^{k+1} - x_i^0\right) + \frac{1}{2\gamma }\left\| \lambda ^{k+1} - \lambda ^k \right\| ^2 \nonumber \\&\qquad +\, \frac{\gamma }{2}\sum \limits _{i=2}^{N-1}\left\| \sum \limits _{j=1}^{i-1} A_j x_j + \sum \limits _{j=i}^N A_j x_j^{k+1} - b\right\| ^2 \nonumber \\&\qquad +\, \frac{\gamma }{2}\sum \limits _{i=1}^{N-1} \left( \left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^2 -\left\| \sum \limits _{j=1}^i A_j x_j + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^2\right) \nonumber \\&\quad \ge 0. \end{aligned}$$
(3.11)

Using (2.5), we have

$$\begin{aligned} \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\top }\left( \lambda ^{k+1} - \lambda ^k \right) +\frac{1}{2\gamma }\left\| \lambda ^{k+1} - \lambda ^k \right\| ^2 = \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^k \right\| ^2 - \left\| \lambda -\lambda ^{k+1}\right\| ^2\right) , \end{aligned}$$

and

$$\begin{aligned}&\mu \left( x_i - x_i^{k+1}\right) ^\top A_i^\top A_i\left( x_i^{k+1} - x_i^0 \right) \\&\quad = \frac{\mu }{2}\left( \left\| A_i x_i - A_i x_i^0 \right\| ^2 - \left\| A_i x_i^{k+1} - A_i x_i^0 \right\| ^{2} - \left\| A_i x_i - A_i x_i^{k+1}\right\| ^2\right) \\&\quad \le \frac{\mu }{2}\left\| A_i x_i - A_i x_i^0 \right\| ^2 - \frac{\mu }{2}\left\| A_i x_i - A_i x_i^{k+1} \right\| ^2. \end{aligned}$$

Letting \(u=u^{*}\) in (3.11), invoking the convexity of \(f_i\), i.e.,

$$\begin{aligned} f_{i}\left( x_{i}^{*}\right) -f_{i}\left( x_{i}^{k+1}\right) \ge \left( x_{i}^{*}-x_{i}^{k+1}\right) ^{\top } g_{i}\left( x_{i}^{k+1}\right) ,\quad i=1,2,\ldots ,N \end{aligned}$$

and the estimate

$$\begin{aligned} \frac{\gamma }{2}\sum \limits _{i=2}^{N-1}\left\| \sum \limits _{j=1}^{i-1} A_j x_j^* + \sum \limits _{j=i}^N A_j x_j^{k+1} - b\right\| ^2= & {} \frac{\gamma }{2}\sum \limits _{i=2}^{N-1}\left\| \sum \limits _{j=i}^N A_j (x_j^{k+1} - x_j^*)\right\| ^2 \\\le & {} \frac{\gamma (N+1)(N-2)}{2}\sum \limits _{i=2}^{N} \left\| A_i x_i^{k+1} - A_i x_i^*\right\| ^2, \end{aligned}$$

we obtain,

$$\begin{aligned}&f(u^*)-f(u^{k+1})+\sum _{i=1}^N\left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) + (\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^NA_ix_i^{k+1}-b\right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^k \right\| ^2 - \left\| \lambda -\lambda ^{k+1}\right\| ^2 \right) \\&\qquad +\, \frac{\mu }{2}\sum \limits _{i=2}^N \left( \left\| A_i x_i^* - A_i x_i^0\right\| ^2 - \left\| A_i x_i^* - A_i x_i^{k+1} \right\| ^2 \right) \\&\qquad +\, \frac{\gamma }{2}\sum \limits _{i=1}^{N-1} \left( \left\| \sum \limits _{j=1}^i A_j x_j^* + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^2 - \left\| \sum \limits _{j=1}^i A_j x_j^* + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b \right\| ^{2}\right) \\&\qquad +\,\frac{\gamma (N+1)(N-2)}{2}\sum \limits _{i=2}^{N} \left\| A_i x_i^* - A_i x_i^{k+1}\right\| ^{2} \\&\quad \ge 0. \end{aligned}$$

This, together with the facts that \(\mu = \epsilon (N-2)(N+1)\) and \(\gamma \le \epsilon \), implies that

$$\begin{aligned} \frac{\gamma (N+1)(N-2)}{2}\sum \limits _{j=2}^{N} \left\| A_j x_j^* - A_j x_j^{k+1}\right\| ^{2} - \frac{\mu }{2}\sum \limits _{j=2}^{N} \left\| A_j x_j^* - A_j x_j^{k+1}\right\| ^{2}\le 0, \end{aligned}$$

which further implies the desired inequality (3.9). \(\square \)

Now we are ready to prove the \(O(1/\epsilon ^2)\) iteration complexity of the ADMM for (1.1) in the ergodic sense.

Theorem 3.2

Let \((x_1^{k+1},x_2^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (3.2)–(3.4) from given \((x_2^k,\ldots ,x_N^k,\lambda ^k)\). For any integer \(t>0\), let \(\bar{u}^t = (\bar{x}_1^t; \bar{x}_2^t; \ldots ; \bar{x}_N^t)\) and \(\bar{\lambda }^t\) be defined as

$$\begin{aligned} \bar{x}_{i}^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}x_{i}^{k+1}, \quad i=1,2,\ldots ,N, \quad \bar{\lambda }^{t}=\frac{1}{t+1}\sum \limits _{k=0}^{t}\lambda ^{k+1}. \end{aligned}$$

For any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), it holds under the conditions in Scenario 1 that,

$$\begin{aligned} 0\le & {} f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| \nonumber \\\le & {} \frac{\rho ^2 + \Vert \lambda ^0\Vert ^2}{\gamma (t+1)} + \frac{\gamma }{2(t+1)}\sum \limits _{i=1}^{N-1}\left\| \sum \limits _{j=i+1}^N A_j (x_j^0 - x_j^*)\right\| ^{2} \\&+\, \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^0 \right\| ^{2}. \end{aligned}$$

This also implies that when \(t=O(1/\epsilon ^2)\), \(\bar{u}^{t}\) is an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., both the error of the objective function value and the residual of the equality constraint satisfy that

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)| = O(\epsilon ), \quad \text{ and } \quad \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| =O(\epsilon ). \end{aligned}$$
(3.12)

Proof

Because \((u^{k},\lambda ^k)\in \Omega \) for all k and \(\Omega \) is convex, it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\ge 0\). By Lemma 3.1 and invoking the convexity of function \(f(\cdot )\), we have

$$\begin{aligned}&f(u^*)-f(\bar{u}^t) + \lambda ^\top \left( \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right) \nonumber \\&\quad = f(u^*)-f(\bar{u}^t)+ \sum _{i=1}^N\left( x_i^*-\bar{x}_i^t\right) ^\top \left( -A_i^\top \bar{\lambda }^t\right) + (\lambda -\bar{\lambda }^t)^\top \left( \sum _{i=1}^NA_i\bar{x}_i^t-b\right) \nonumber \\&\quad \ge \frac{1}{t+1}\sum \limits _{k=0}^{t}\left[ f(u^*) - f(u^{k+1})\right. \nonumber \\&\left. \qquad +\,\sum _{i=1}^N\left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) +(\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^NA_ix_i^{k+1}-b\right) \right] \nonumber \\&\quad \ge \frac{1}{t+1}\sum \limits _{k=0}^t\left[ \frac{1}{2\gamma }\left( \left\| \lambda - \lambda ^{k+1}\right\| ^2 - \left\| \lambda -\lambda ^k \right\| ^2 \right) \right. \nonumber \\&\left. \qquad -\, \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^0 \right\| ^2 \right. \nonumber \\&\qquad \left. +\,\frac{\gamma }{2}\sum \limits _{i=1}^{N-1} \left( \left\| \sum \limits _{j=1}^i A_j x_j^* + \sum \limits _{j=i+1}^N A_j x_j^{k+1} - b\right\| ^2 - \left\| \sum \limits _{j=1}^i A_j x_j^* + \sum \limits _{j=i+1}^N A_j x_j^k - b\right\| ^2\right) \right] \nonumber \\&\quad \ge -\frac{1}{2\gamma (t+1)}\left\| \lambda - \lambda ^0 \right\| ^2 -\frac{\gamma }{2(t+1)}\sum \limits _{i=1}^{N-1}\left\| \sum \limits _{j=1}^i A_j x_j^* + \sum \limits _{j=i+1}^N A_j x_j^0 - b\right\| ^2 \nonumber \\&\qquad - \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^0 \right\| ^2. \end{aligned}$$
(3.13)

Note that this inequality holds for all \(\lambda \in \mathbf R^{p}\). From the optimality condition (2.1) we obtain

$$\begin{aligned} 0 \ge f(u^*)-f(\bar{u}^t) + (\lambda ^*)^\top \left( \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right) . \end{aligned}$$

Moreover, since \(\rho :=\Vert \lambda ^*\Vert +1\), by applying the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned} 0 \le f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| . \end{aligned}$$
(3.14)

By setting \(\lambda = -\rho (\sum _{i=1}^N A_i\bar{x}_i^t-b) /\Vert \sum _{i=1}^N A_i\bar{x}_i^t-b\Vert \) in (3.13), and noting that \(\Vert \lambda \Vert = \rho \), we obtain

$$\begin{aligned}&f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| \nonumber \\&\quad \le \frac{\rho ^2 + \Vert \lambda ^0\Vert ^2}{\gamma (t+1)} + \frac{\gamma }{2(t+1)}\sum \limits _{i=1}^{N-1}\left\| \sum \limits _{j=i+1}^N A_j (x_j^0 - x_j^*)\right\| ^{2} \nonumber \\&\qquad +\, \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^{0} \right\| ^{2}. \end{aligned}$$
(3.15)

When \(t=O(1/\epsilon ^2)\), this, together with the condition that \(\frac{\epsilon }{2}\le \gamma \le \epsilon \), implies that the right-hand side of (3.15) is of order \(O(\epsilon )\).

We now define the function

$$\begin{aligned} v(\xi ) = \min \left\{ f(u) \;\middle |\; \sum _{i=1}^N A_i x_i - b = \xi ,\ x_i\in {\mathcal {X}}_i,\ i=1,2,\ldots ,N \right\} . \end{aligned}$$

It is easy to verify that v is convex, \(v(0)=f(u^*)\), and \(\lambda ^* \in \partial v(0)\). Therefore, from the convexity of v and the Cauchy–Schwarz inequality, it holds that

$$\begin{aligned} v(\xi ) \ge v(0) + \langle \lambda ^*, \xi \rangle \ge f(u^*) - \Vert \lambda ^*\Vert \Vert \xi \Vert . \end{aligned}$$
(3.16)
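For completeness, the claim \(\lambda ^* \in \partial v(0)\) can be verified as follows: by the convexity of \(f_i\) and the variational inequalities in (2.1), for any \(x_i\in {\mathcal {X}}_i\), \(i=1,\ldots ,N\),

$$\begin{aligned} f(u)-f(u^*) \ge \sum _{i=1}^N \left( x_i-x_i^*\right) ^\top g_i(x_i^*) \ge \sum _{i=1}^N \left( x_i-x_i^*\right) ^\top A_i^\top \lambda ^* = \left\langle \lambda ^*, \sum _{i=1}^N A_i x_i - b\right\rangle , \end{aligned}$$

where the last equality uses \(\sum _{i=1}^N A_i x_i^* = b\). Minimizing the left-hand side over all u with \(\sum _{i=1}^N A_i x_i - b = \xi \) and \(x_i\in {\mathcal {X}}_i\) gives \(v(\xi )\ge v(0)+\langle \lambda ^*,\xi \rangle \), i.e., \(\lambda ^*\in \partial v(0)\).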

Letting \(\bar{\xi } = \sum \nolimits _{i=1}^N A_i \bar{x}_i^t - b\), we have \(f(\bar{u}^t) \ge v(\bar{\xi })\). Therefore, by combining (3.14), (3.16), and the fact that the right-hand side (and thus the left-hand side) of (3.15) is of order \(O(\epsilon )\), we get

$$\begin{aligned}&- \Vert \lambda ^*\Vert \Vert \bar{\xi } \Vert \le f(\bar{u}^t) - f(u^*) \\&\quad \le \frac{\rho ^2 + \Vert \lambda ^0\Vert ^2}{\gamma (t+1)} + \frac{\gamma }{2(t+1)}\sum \limits _{i=1}^{N-1}\left\| \sum \limits _{j=i+1}^N A_j (x_j^0 - x_j^*)\right\| ^{2} \\&\qquad + \frac{\epsilon (N-2)(N+1)}{2}\sum \limits _{i=2}^N \left\| A_i x_i^* - A_i x_i^0 \right\| ^{2} - \rho \Vert \bar{\xi }\Vert \\&\quad \le C\epsilon - \rho \Vert \bar{\xi }\Vert , \end{aligned}$$

which, by using \(\rho =\Vert \lambda ^*\Vert +1\), yields,

$$\begin{aligned} \left\| \sum \limits _{i=1}^N A_i \bar{x}_i^t - b\right\| = \Vert \bar{\xi }\Vert \le C\epsilon , \end{aligned}$$
(3.17)

where C is a constant depending on \(\rho \), \(\lambda ^0\), \(x^0\), \(x^*\) and N. Moreover, by combining (3.14) and (3.17), one obtains that

$$\begin{aligned} -\rho C\epsilon \le -\rho \Vert \bar{\xi }\Vert \le f(\bar{u}^t) - f(u^*) \le (1-\rho )C\epsilon . \end{aligned}$$
(3.18)

Finally, we note that (3.17), (3.18) imply (3.12). \(\square \)

Remark 3.3

We remark that by applying ADMM to solve the associated perturbed problem (3.1), we can find an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., an approximate solution to (1.1). However, the assumptions required in this section do not ensure that the exact optimal solution to (1.1) can be found, because that would require \(\epsilon \rightarrow 0\), and hence \(\gamma \rightarrow 0\).

3.1 Solving the Counterexample in [7]

We now apply the approach of this section to a perturbed version of the counterexample constructed by Chen et al. [7]. As we will see, after modifying the problem by adding the perturbation terms, the multi-block ADMM is always able to find an \(\epsilon \)-optimal solution. The counterexample constructed in [7] is to find a solution to the linear system \(A_1x_1+A_2x_2+A_3x_3=0\), where \(A_1=(1,1,1)^\top \), \(A_2=(1,1,2)^\top \) and \(A_3=(1,2,2)^\top \). It is easy to see that the unique solution of this linear system is \((x_1,x_2,x_3)=(0,0,0)\). The associated perturbed problem for this example is given by

$$\begin{aligned} \begin{array}{ll} \min &{} \frac{\mu }{2}\Vert A_2x_2-A_2x_2^0\Vert ^2 + \frac{\mu }{2}\Vert A_3x_3-A_3x_3^0\Vert ^2 \\ \text {s.t.}&{} A_1x_1+A_2x_2+A_3x_3=0, \end{array} \end{aligned}$$
(3.19)

where \(\mu = 4\epsilon \). To satisfy the conditions in Scenario 1, we chose \(\gamma =\epsilon \). We randomly generated \(x^0=(x_1^0,x_2^0,x_3^0)\) using the randn function in Matlab with the random seed set to 0. We plot the residual of the equality constraint, the three components of \(x^t\), and the average of \(x^k\) for \(k=0,\ldots ,t\) in Figs. 1 and 2 for \(\epsilon =10^{-2}\) and \(\epsilon =10^{-3}\), respectively, where t denotes the iteration number. From Figs. 1 and 2 we can see that ADMM applied to the perturbed problem is able to decrease the residual of the equality constraint to the order of \(O(\epsilon )\) in \(O(1/\epsilon ^2)\) iterations, which matches our theoretical results in Theorem 3.2. We also see that both \(x^t\) and the average of \(x^k\) for \(k=0,\ldots ,t\) converge to zero, which is the unique solution to the original problem. As a result, our approach successfully eliminates the risk of divergence of the multi-block ADMM.
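For reference, the experiment can be reproduced with a short Python script; the following sketch mirrors (3.2)–(3.4) for (3.19) with \(b=0\), \(f_1=0\), \(\gamma =\epsilon \) and \(\mu =4\epsilon \). The closed-form block updates are our own derivation for this special case, and the random number generator below is a stand-in for Matlab's randn with seed 0, so the exact iterates will differ.

```python
import numpy as np

# ADMM (3.2)-(3.4) applied to the perturbed counterexample (3.19).
A1 = np.array([1., 1., 1.]); A2 = np.array([1., 1., 2.]); A3 = np.array([1., 2., 2.])
eps = 1e-2
gamma, mu = eps, 4 * eps                   # Scenario 1 choice used in this subsection

rng = np.random.default_rng(0)             # stand-in for Matlab's randn with seed 0
x1, x2, x3 = rng.standard_normal(3)        # x^0 = (x_1^0, x_2^0, x_3^0)
x2_0, x3_0 = x2, x3
lam = np.zeros(3)
xbar = np.zeros(3)                         # ergodic average of (x_1, x_2, x_3)

T = int(1 / eps**2)                        # O(1/eps^2) iterations
for k in range(T):
    v = lam / gamma - A2 * x2 - A3 * x3                  # b = 0 throughout
    x1 = (A1 @ v) / (A1 @ A1)                            # f_1 = 0: plain least squares
    v = lam / gamma - A1 * x1 - A3 * x3
    x2 = (mu * (A2 @ A2) * x2_0 + gamma * (A2 @ v)) / ((mu + gamma) * (A2 @ A2))
    v = lam / gamma - A1 * x1 - A2 * x2
    x3 = (mu * (A3 @ A3) * x3_0 + gamma * (A3 @ v)) / ((mu + gamma) * (A3 @ A3))
    lam = lam - gamma * (A1 * x1 + A2 * x2 + A3 * x3)
    xbar += np.array([x1, x2, x3]) / T                   # running average of the iterates

print(np.linalg.norm(A1 * xbar[0] + A2 * xbar[1] + A3 * xbar[2]))   # expected O(eps)
```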

Fig. 1

ADMM for solving the associated perturbed problem (3.19) with \(\epsilon =10^{-2}\). Left: the residual of the equality constraint; Right: \(x^t\) and the average of \(x^k\) for \(k=0,\ldots ,t\) (denoted as xa) as defined in Theorem 3.2

Fig. 2

ADMM for solving the associated perturbed problem (3.19) with \(\epsilon =10^{-3}\). Left: the residual of the equality constraint; Right: \(x^t\) and the average of \(x^k\) for \(k=0,\ldots ,t\) (denoted as xa) as defined in Theorem 3.2

4 Iteration Complexity of ADMM: Kurdyka–Łojasiewicz Property

In this section, we prove an \(O(1/\epsilon )\) iteration complexity of ADMM (1.2) under the conditions in Scenario 2 of Table 1. Under these conditions, since \({\mathcal {X}}_N=\mathbf R^{n_N}\), \(f_N\) is differentiable and \(A_N=I\), (1.1) becomes the so-called sharing problem. It turns out that ADMM has nice convergence properties when it is applied to solve sharing problems. In a recent work by Hong, Luo and Razaviyayn [27], it was shown that ADMM is globally convergent under certain conditions for solving a nonconvex sharing problem. For the convex sharing problem, Corollary 3.1 in [27] gives conditions that guarantee the global convergence of ADMM; these conditions are the same as ours in Scenario 2 (for example, both papers require \(\gamma > \sqrt{2}L\)). The difference is that [27] shows global convergence of ADMM for solving the (both nonconvex and convex) sharing problem, whereas we show that, by using the KL property, a sublinear convergence rate of ADMM can be established for the convex sharing problem.

The first-order optimality conditions for the subproblems in (1.2) are given by \(x_i^{k+1}\in {\mathcal {X}}_i\), \(i=1,\ldots ,N-1\), and

$$\begin{aligned}&g_i\left( x_{i}^{k+1}\right) -A_{i}^{\top }\lambda ^{k}+\gamma A_{i}^{\top }\left( \sum \limits _{j=1}^{i} A_{j}x_{j}^{k+1} + \sum \limits _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_N^k -b\right) = 0, \quad i=1,\ldots ,N-1,&\end{aligned}$$
(4.1)
$$\begin{aligned}&\nabla f_N\left( x_{N}^{k+1}\right) -\lambda ^{k}+\gamma \left( \sum \limits _{j=1}^{N-1} A_{j}x_{j}^{k+1} + x_{N}^{k+1}-b\right) = 0,&\end{aligned}$$
(4.2)

where \(g_i \in \partial (f_i+\mathbf 1 _{{\mathcal {X}}_i})\) is a subgradient of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\), \(i=1,2,\ldots ,N-1\). The last equation in (1.2) that updates \(\lambda ^k\) now becomes

$$\begin{aligned} \lambda ^{k+1} := \lambda ^k - \gamma \left( A_1 x_1^{k+1} + A_2 x_2^{k+1}+ \cdots + A_{N-1}x_{N-1}^{k+1} + x_N^{k+1} - b\right) . \end{aligned}$$
(4.3)

Moreover, by combining with (4.3), (4.1) and (4.2) can be rewritten as

$$\begin{aligned}&g_i(x_{i}^{k+1})-A_{i}^{\top }\lambda ^{k+1}+\gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}(x_{j}^{k}-x_{j}^{k+1})+(x_{N}^{k}-x_{N}^{k+1})\right) = 0, \ i=1,\ldots ,N-1,&\end{aligned}$$
(4.4)
$$\begin{aligned}&\nabla f_N(x_{N}^{k+1})-\lambda ^{k+1} = 0.&\end{aligned}$$
(4.5)

Note that in Scenario 2 we require \({\mathcal {L}}_\gamma \) to be a Kurdyka–Łojasiewicz (KL) function. Let us first introduce the notion of the KL function and the KL property, which can be found, e.g., in [3, 4]. We denote by \(\mathrm{dist}(x,S):=\inf \{\Vert y-x \Vert :y\in S\}\) the distance from x to S. Let \(\eta \in (0,+\infty ]\). We further denote by \(\Phi _\eta \) the class of all concave and continuous functions \(\varphi :[0,\eta )\rightarrow \mathbf R_{+}\) satisfying the following conditions: (1) \(\varphi (0)=0\); (2) \(\varphi \) is \(C^1\) on \((0,\eta )\) and continuous at 0; (3) \(\varphi '(s)>0\) for all \(s\in (0,\eta )\).

Definition 4.1

Let \(f:\Omega \rightarrow (-\infty ,+\infty ]\) be proper and lower semicontinuous.

  1.

    The function f has the KL property at \(w_0\in \{w\in \Omega :\partial f(w)\ne \emptyset \}\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood \(W_0\) of \(w_0\) and a function \(\varphi \in \Phi _\eta \) such that for all

    $$\begin{aligned} \bar{w}_0 \in W_0\cap \left\{ w\in \Omega : f(w_0)<f(w)<f(w_0)+\eta \right\} , \end{aligned}$$

    the following inequality holds,

    $$\begin{aligned} \varphi '(f(\bar{w}_0) - f(w_0))\mathrm{dist}(0,\partial f(\bar{w}_0))\ge 1. \end{aligned}$$
    (4.6)
  2.

    The function f is a KL function if f satisfies the KL property at each point of \(\{w\in \Omega : \partial f(w)\ne \emptyset \}\).

Remark 4.1

It is important to remark that most convex functions from practical applications satisfy the KL property; see Section 5.1 of [4]. In fact, convex functions that do not satisfy the KL property exist (see [3] for a counterexample) but they are rare and difficult to construct.

Indeed, \({\mathcal {L}}_\gamma \) is a KL function if each \(f_i\) satisfies a growth condition, or is uniformly convex, or if the \(f_i\) are general convex semialgebraic or real analytic functions. We refer the interested readers to [1, 4] for more information.
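As a simple illustration (our own example), a strongly convex quadratic satisfies the KL property globally with \(\varphi (s)=\sqrt{2s/\sigma }\in \Phi _\eta \): for \(f(w)=\frac{\sigma }{2}\Vert w-\bar{w}\Vert ^2\) with \(\sigma >0\),

$$\begin{aligned} \varphi '\left( f(w)-f(\bar{w})\right) \mathrm{dist}\left( 0,\partial f(w)\right) = \frac{1}{\sqrt{2\sigma \cdot \frac{\sigma }{2}\Vert w-\bar{w}\Vert ^2}}\cdot \sigma \Vert w-\bar{w}\Vert = 1 \ge 1 \quad \text{ for } \text{ all } w\ne \bar{w}, \end{aligned}$$

so the inequality (4.6) in Definition 4.1 holds with equality at every point.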

The following result, known as the uniformized KL property, is from Lemma 6 of [4].

Lemma 4.2

(Lemma 6 [4]) Let \(\Omega \) be a compact set and \(f: \mathbf R^n\rightarrow (-\infty ,\infty ]\) be a proper and lower semi-continuous function. Assume that f is constant on \(\Omega \) and satisfies the KL property at each point of \(\Omega \). Then, there exist \(\epsilon >0\), \(\eta >0\) and \(\varphi \in \Phi _\eta \) such that for all \(\bar{u}\) in \(\Omega \) and all u in the intersection:

$$\begin{aligned} \left\{ u\in \mathbf R^n: \mathrm{dist}(u, \Omega ) < \epsilon \right\} \cap \left\{ u\in \mathbf R^n: f(\bar{u}) < f(u) < f(\bar{u})+\eta \right\} , \end{aligned}$$

the following inequality holds,

$$\begin{aligned} \varphi '\left( f(u) - f(\bar{u})\right) \mathrm{dist}\left( 0, \partial f(u)\right) \ge 1. \end{aligned}$$

We now give a formal definition of the set of limit points. Let \(w^k = \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) \) be the sequence generated by the multi-block ADMM (1.2) from a starting point \(w^0 = \left( x_1^0,\ldots , x_N^0,\lambda ^0\right) \). The set of all limit points is denoted by \(\Omega (w^0)\), i.e.,

$$\begin{aligned}&\Omega (w^0) = \left\{ \bar{w}\in \mathbf R^{n_1}\times \cdots \times \mathbf R^{n_N}\times \mathbf R^p: \exists \text { an infinite sequence }\{k_l\}_{l=1,\ldots } \right. \\&\left. \quad \text { such that }w^{k_l}\rightarrow \bar{w}\text { as }l\rightarrow \infty \right\} . \end{aligned}$$

In the following we present the main results in this section. Specifically, Theorem 4.3 gives the convergence of the multi-block ADMM (1.2), and we include its proof in the “Appendix”. Theorem 4.5 shows that the whole sequence generated by the multi-block ADMM (1.2) converges.

Theorem 4.3

Under the conditions in Scenario 2 of Table 1, the following statements hold:

  1.

    \(\Omega (w^0)\) is a non-empty set, and any point in \(\Omega (w^0)\) is a stationary point of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\);

  2.

    \(\Omega (w^0)\) is a compact and connected set;

  3.

    The function \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\) is finite and constant on \(\Omega (w^0)\).

Remark 4.4

In Theorem 4.3, we do not require \({\mathcal {L}}_\gamma \) to be a KL function; this is only required in Theorem 4.5 below.

Theorem 4.5

Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let the sequence \(w^k =\left( x_1^k,\ldots , x_N^k,\lambda ^k\right) \) be generated by the multi-block ADMM (1.2), and let \(w^* =\left( x_1^*,\ldots , x_N^*,\lambda ^*\right) \in \Omega (w^0)\). Then the sequence \(w^k\) has finite length, i.e.,

$$\begin{aligned} \sum \limits _{k=0}^\infty \left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1} \right\| + \left\| \lambda ^k - \lambda ^{k+1}\right\| \right) \le G, \end{aligned}$$
(4.7)

where the constant G is given by

$$\begin{aligned} G:= & {} 2\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^0 - A_i x_i^1\right\| + \left\| x_N^0 - x_N^1 \right\| + \left\| \lambda ^0 - \lambda ^1\right\| \right) \\&+\, \frac{2M\gamma (1+L^2)}{\gamma ^2-2L^2}\varphi \left( {\mathcal {L}}_\gamma (w^1) - {\mathcal {L}}_\gamma (w^*)\right) , \end{aligned}$$

and

$$\begin{aligned} M = \max \left( \gamma \sum \limits _{i=1}^{N-1}\left\| A_i^\top \right\| , \frac{1}{\gamma }+1+\sum \limits _{i=1}^{N-1} \left\| A_i^\top \right\| \right) >0, \end{aligned}$$
(4.8)

and the whole sequence \(\left( A_1 x_1^k, A_2 x_2^k, \ldots , A_{N-1}x_{N-1}^k, x_N^k,\lambda ^k\right) \) converges to \(\left( A_1 x_1^*, \ldots , A_{N-1}x_{N-1}^*, x_N^*,\lambda ^*\right) \).

Proof

The proof of this theorem is almost identical to that of Theorem 1 in [4]: it utilizes the uniformized KL property (Lemma 4.2) and the facts that \(\Omega (w^0)\) is compact and that \({\mathcal {L}}_\gamma (w)\) is constant on it (proved in Theorem 4.3), with the function \(\Psi \) in [4] replaced by \({\mathcal {L}}_\gamma \) and some other minor changes. We thus omit the proof for succinctness. \(\square \)

Based on Theorem 4.5, we now prove a key lemma for analyzing the iteration complexity of the ADMM.

Lemma 4.6

Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the multi-block ADMM (1.2) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any \(u^*=(x_1^*,x_2^*,\ldots ,x_N^*)\in \Omega ^*\) and \(\lambda \in \mathbf R^p\), it holds under the conditions in Scenario 2 that

$$\begin{aligned}&f(u^*)-f(u^{k+1}) + \sum _{i=1}^{N-1}\left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) -\left( x_N^*-x_N^{k+1}\right) ^\top \lambda ^{k+1} \nonumber \\&\qquad +\, (\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^{N-1}A_ix_i^{k+1}+x_N^{k+1}-b\right) \nonumber \\&\qquad +\, \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^* + \sum \limits _{i=2}^{N-1} A_i x_i^{k} + x_N^k - b\right\| ^2 - \left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2 \right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^k \right\| ^2 - \left\| \lambda -\lambda ^{k+1} \right\| ^2 \right) \nonumber \\&\qquad +\, \gamma D(N-2)\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right) \nonumber \\&\quad \ge 0, \end{aligned}$$
(4.9)

where D is a constant that will be specified later.

Proof

Note that, similar to the proof of Lemma 3.1, we can show that, for any \(x_1\in {\mathcal {X}}_1\),

$$\begin{aligned}&\left( x_1 - x_1^{k+1}\right) ^\top A_1^\top \left[ \sum \limits _{j=2}^{N-1} A_j \left( x_j^k - x_j^{k+1}\right) + \left( x_N^k - x_N^{k+1}\right) \right] \\&\quad \le \frac{1}{2}\left( \left\| A_1 x_1 +\sum \limits _{j=2}^{N-1} A_j x_j^k + x_N^k -b\right\| ^2 - \left\| A_1 x_1 + \sum \limits _{j=2}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right\| ^2 \right) \\&\qquad +\, \frac{1}{2\gamma ^2}\left\| \lambda ^{k+1}-\lambda ^k\right\| ^2. \end{aligned}$$

For any \(x_i\in {\mathcal {X}}_i\), \(i=2,3,\ldots ,N-1\), we have,

$$\begin{aligned}&\left( x_i - x_i^{k+1}\right) ^\top A_i^\top \left[ \sum \limits _{j=i+1}^{N-1} A_j \left( x_j^k - x_j^{k+1}\right) + \left( x_N^k - x_N^{k+1}\right) \right] \\&\quad \le \left\| A_i x_i - A_i x_i^{k+1}\right\| \left[ \sum \limits _{j=i+1}^{N-1} \left\| A_j x_j^k - A_j x_j^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right] \\&\quad \le \left\| A_i x_i - A_i x_i^{k+1}\right\| \left[ \sum \limits _{j=1}^{N-1} \left\| A_j x_j^k - A_j x_j^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right] . \end{aligned}$$

Therefore, for any \(x_i\in {\mathcal {X}}_i\), \(i=1,\ldots ,N-1\), it holds that

$$\begin{aligned}&\gamma \sum _{i=1}^{N-1}\left( x_i-x_i^{k+1}\right) ^\top A_i^\top \left( \sum _{j=i+1}^{N-1}A_j\left( x_j^k-x_j^{k+1}\right) +\left( x_N^k-x_N^{k+1}\right) \right) \nonumber \\&\quad \le \frac{\gamma }{2}\left( \left\| A_1 x_1 + \sum \limits _{i=2}^{N-1} A_i x_i^{k} +x_N^k- b\right\| ^2 - \left\| A_1 x_1 + \sum \limits _{i=2}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2 \right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^2 \nonumber \\&\qquad +\, \gamma \left( \sum \limits _{i=2}^{N-1}\left\| A_i x_i - A_i x_i^{k+1}\right\| \right) \left[ \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right] . \end{aligned}$$
(4.10)

Combining (4.3), (4.4), (4.5) and (4.10), it holds for any \(\lambda \in \mathbf R^p\) that

$$\begin{aligned}&\sum _{i=1}^{N-1}\left( x_i-x_i^{k+1}\right) ^\top \left( g_i(x_i^{k+1})-A_i^\top \lambda ^{k+1}\right) +\left( x_N-x_N^{k+1}\right) ^\top \left( \nabla f_N(x_N^{k+1})-\lambda ^{k+1}\right) \nonumber \\&\qquad +\, (\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} -b\right) + \frac{1}{\gamma } (\lambda -\lambda ^{k+1} )^\top (\lambda ^{k+1} - \lambda ^k ) \nonumber \\&\qquad +\, \frac{\gamma }{2}\left( \left\| A_1 x_1 + \sum \limits _{i=2}^{N-1} A_i x_i^k + x_N^k - b\right\| ^2 - \left\| A_1 x_1 + \sum \limits _{i=2}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2\right) \nonumber \\&\qquad +\, \frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^k \right\| ^2 \nonumber \\&\qquad +\, \gamma \left( \sum \limits _{i=2}^{N-1}\left\| A_i x_i - A_i x_i^{k+1}\right\| \right) \left[ \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right] \nonumber \\&\quad \ge 0. \end{aligned}$$
(4.11)

Using (2.5), we have

$$\begin{aligned} \frac{1}{\gamma }\left( \lambda -\lambda ^{k+1}\right) ^{\top }\left( \lambda ^{k+1}-\lambda ^{k}\right) +\frac{1}{2\gamma }\left\| \lambda ^{k+1}-\lambda ^{k}\right\| ^2 = \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^k \right\| ^2 - \left\| \lambda -\lambda ^{k+1}\right\| ^2 \right) . \end{aligned}$$

Letting \(u=u^{*}\) in (4.11), and invoking the convexity of \(f_i\), we obtain

$$\begin{aligned}&f(u^*)-f(u^{k+1}) + \sum _{i=1}^{N-1}\left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) +\left( x_N^*-x_N^{k+1}\right) ^\top (-\lambda ^{k+1}) \\&\qquad +\, \left( \lambda -\lambda ^{k+1}\right) ^\top \left( \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} -b\right) + \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^k \right\| ^2 - \left\| \lambda -\lambda ^{k+1} \right\| ^2 \right) \\&\qquad +\, \frac{\gamma }{2}\left( \left\| A_{1}x_{1}^* + \sum \limits _{i=2}^{N-1} A_i x_i^k + x_N^k - b\right\| ^2 - \left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2 \right) \\&\qquad +\, \gamma \left( \sum \limits _{i=2}^{N-1}\left\| A_i x_i^* - A_i x_i^{k+1}\right\| \right) \left[ \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right] \ge 0. \end{aligned}$$

From Theorem 4.5 we know that the whole sequence \(\left( A_1 x_1^k, A_2 x_2^k, \ldots , A_{N-1}x_{N-1}^k, x_N^k,\lambda ^k\right) \) converges to \(\left( A_1 x_1^*,\ldots , A_{N-1}x_{N-1}^*, x_N^*,\lambda ^*\right) \). Therefore, there exists a constant \(D>0\) such that

$$\begin{aligned} \left\| A_i x_i^* - A_i x_i^{k+1}\right\| \le D, \end{aligned}$$

for any \(k\ge 0\) and any \(i=2,3,\ldots ,N-1\). This implies (4.9). \(\square \)

Now, we are ready to prove the \(O(1/\epsilon )\) iteration complexity of the multi-block ADMM for (1.1).

Theorem 4.7

Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the multi-block ADMM (1.2) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any integer \(t>0\), let \(\bar{u}^t = (\bar{x}_1^t; \bar{x}_2^t; \ldots ; \bar{x}_N^t)\) and \(\bar{\lambda }^{t}\) be defined as

$$\begin{aligned} \bar{x}_i^t = \frac{1}{t+1}\sum \limits _{k=0}^{t}x_i^{k+1}, \quad i=1,2,\ldots ,N, \quad \bar{\lambda }^t = \frac{1}{t+1}\sum \limits _{k=0}^t \lambda ^{k+1}. \end{aligned}$$

For any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), it holds under the conditions in Scenario 2 that,

$$\begin{aligned} 0\le & {} f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| \\\le & {} \frac{\rho ^2 + \Vert \lambda ^0\Vert ^2}{\gamma (t+1)} + \frac{\gamma }{2(t+1)}\left\| \sum \limits _{i=2}^{N-1} A_i (x_i^0 - x_i^*) + (x_N^0 - x_N^*)\right\| ^2 + \frac{\gamma D G (N-2)}{t+1}. \end{aligned}$$

Note this also implies that when \(t=O(1/\epsilon )\), \(\bar{u}^{t}\) is an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., both the error of the objective function value and the residual of the equality constraint satisfy that

$$\begin{aligned} |f(\bar{u}^t)-f(u^*)| = O(\epsilon ), \quad \text{ and } \quad \left\| \sum \limits _{i=1}^N A_i\bar{x}_i^t-b \right\| =O(\epsilon ). \end{aligned}$$
(4.12)

Proof

Because \((u^{k},\lambda ^k)\in \Omega \) for all k and \(\Omega \) is convex, it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\ge 0\). By Lemma 4.6 and invoking the convexity of function \(f(\cdot )\), we have for any \(\lambda \in \mathbf R^{p}\),

$$\begin{aligned}&f(u^*)-f(\bar{u}^t) + \lambda ^\top \left( \sum \limits _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b \right) \nonumber \\&\quad = f(u^*)-f(\bar{u}^t) + \sum _{i=1}^{N-1}\left( x_i^*-\bar{x}_i^t\right) ^\top \left( -A_i^\top \bar{\lambda }^t\right) \nonumber \\&\qquad +\, \left( x_N^*-\bar{x}_N^t\right) ^\top (-\bar{\lambda }^t) + (\lambda -\bar{\lambda }^t)^\top \left( \sum _{i=1}^{N-1} A_i\bar{x}_i^t + \bar{x}_N^t - b\right) \nonumber \\&\quad \ge \frac{1}{t+1}\sum \limits _{k=0}^t\left( f(u^*)-f(u^{k+1})+\sum _{i=1}^{N-1}\left( x_i^*-x_i^{k+1}\right) ^\top \left( -A_i^\top \lambda ^{k+1}\right) \right. \nonumber \\&\left. \qquad +\left( x_N^*-x_N^{k+1}\right) ^\top (-\lambda ^{k+1})\right. \nonumber \\&\qquad \left. +\,(\lambda -\lambda ^{k+1})^\top \left( \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) \right) \nonumber \\&\quad \ge \frac{1}{t+1}\sum \limits _{k=0}^t\left[ \frac{1}{2\gamma }\left( \left\| \lambda -\lambda ^{k+1}\right\| ^2 - \left\| \lambda -\lambda ^k \right\| ^2 \right) \right. \nonumber \\&\qquad -\, \gamma D(N-2)\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right) \nonumber \\&\qquad \left. +\,\frac{\gamma }{2}\left( \left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2 - \left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^k + x_N^k - b\right\| ^2 \right) \right] \nonumber \\&\quad \ge -\frac{1}{2\gamma (t+1)}\left\| \lambda -\lambda ^0 \right\| ^2 - \frac{\gamma }{2(t+1)}\left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^0 + x_N^0 - b\right\| ^2 \nonumber \\&\qquad - \frac{\gamma D (N-2)}{t+1}\sum \limits _{k=0}^t \left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1}\right\| \right) \nonumber \\&\quad \ge -\frac{1}{2\gamma (t+1)}\left\| \lambda -\lambda ^0\right\| ^2 - \frac{\gamma }{2(t+1)}\left\| A_1 x_1^* + \sum \limits _{i=2}^{N-1} A_i x_i^0 + x_N^0 - b\right\| ^2 - \frac{\gamma D G (N-2)}{t+1}, \end{aligned}$$
(4.13)

where the last inequality holds due to Theorem 4.5. From the optimality conditions (2.1) we obtain

$$\begin{aligned} 0 \ge f(u^*)-f(\bar{u}^t) + (\lambda ^*)^\top \left( \sum \limits _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b \right) . \end{aligned}$$

Since \(\rho :=\Vert \lambda ^*\Vert +1\), by the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned} 0 \le f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b \right\| . \end{aligned}$$
(4.14)

By setting \(\lambda = -\rho (\sum _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b)/\Vert \sum _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b\Vert \) in (4.13), and noting that \(\Vert \lambda \Vert =\rho \), we obtain

$$\begin{aligned}&f(\bar{u}^t)-f(u^*)+\rho \left\| \sum \limits _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b \right\| \nonumber \\&\quad \le \frac{\rho ^2 + \Vert \lambda ^0\Vert ^2}{\gamma (t+1)} + \frac{\gamma }{2(t+1)}\left\| \sum \limits _{i=2}^{N-1} A_i (x_i^0 - x_i^*) + (x_N^0 - x_N^*)\right\| ^{2} + \frac{\gamma D G (N-2)}{t+1}. \end{aligned}$$
(4.15)

When \(t=O(1/\epsilon )\), the right-hand side of (4.15) is of order \(O(\epsilon )\). By the same arguments as in the proof of Theorem 3.2, (4.12) follows from (4.14) and (4.15). \(\square \)

4.1 Divergence Counterexample Revisited

In this subsection, we test the same divergence counterexample as in Sect. 3.1, now equivalently reformulated so as to satisfy the conditions required in Scenario 2. The example in [7] can be formulated equivalently as:

$$\begin{aligned} \begin{array}{ll} \min &{} \frac{1}{2}\Vert y\Vert ^2 \\ \text {s.t.}&{} A_1x_1+A_2x_2+A_3x_3+y=0, \end{array} \end{aligned}$$
(4.16)

where \(A_1, A_2, A_3\) are the same as in Sect. 3.1. Note that (4.16) is in the form of the sharing problem with four block variables \(x_1\), \(x_2\), \(x_3\) and y, and it satisfies the assumptions in Scenario 2. Note that (4.16) corresponds to \(f_4(\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\), so the Lipschitz constant of \(\nabla f_4\) is \(L=1\). It is easy to see that the unique solution to (4.16) is \(x_1=x_2=x_3=0\) and \(y=(0,0,0)^\top \). We then apply ADMM (1.2) to solve (4.16) with \(\gamma = 1.01\sqrt{2}L\), so that the assumption on \(\gamma \) in Scenario 2 is satisfied. Similar to the tests in Sect. 3.1, we plot the residual of the equality constraint and the components of the averages of \(x^k\) and \(y^k\) for \(k=0,\ldots ,t\) in Fig. 3, where t denotes the iteration number. From Fig. 3 we again observe the convergence of ADMM. We then tested the same problem using a smaller \(\gamma \), set to \(0.01\sqrt{2}L\). The residual of the equality constraint is plotted in Fig. 4. Figure 4 shows that after a small number of iterations, ADMM diverges. This indicates that if the assumption \(\gamma > \sqrt{2}L\) is violated, then ADMM may diverge when solving the sharing problem (4.16).
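For reference, the following Python sketch is one way to reproduce this experiment; the closed-form block updates are our own derivation for the special case \(f_1=f_2=f_3=0\), \(f_4(\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\) and \(b=0\), and the starting point is an arbitrary choice.

```python
import numpy as np

# ADMM (1.2) applied to the sharing reformulation (4.16).
A = [np.array([1., 1., 1.]), np.array([1., 1., 2.]), np.array([1., 2., 2.])]
L = 1.0                                   # Lipschitz constant of grad f_4
gamma = 1.01 * np.sqrt(2) * L             # satisfies gamma > sqrt(2)*L

x = np.ones(3)                            # scalar blocks x_1, x_2, x_3
y = np.zeros(3)
lam = np.zeros(3)

for k in range(2000):
    for i in range(3):
        # x_i-update: least squares, since f_i = 0 and b = 0.
        v = lam / gamma - sum(A[j] * x[j] for j in range(3) if j != i) - y
        x[i] = (A[i] @ v) / (A[i] @ A[i])
    # y-update has a closed form because f_4 is quadratic and A_4 = I.
    v = lam / gamma - sum(A[j] * x[j] for j in range(3))
    y = gamma * v / (1.0 + gamma)
    lam = lam - gamma * (sum(A[j] * x[j] for j in range(3)) + y)

print(np.linalg.norm(sum(A[j] * x[j] for j in range(3)) + y))   # constraint residual
```

Replacing gamma by 0.01 * np.sqrt(2) * L corresponds to the divergent setting of Fig. 4.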

Fig. 3

ADMM for solving the sharing problem (4.16) with \(\gamma = 1.01\sqrt{2}L\). Left: the residual of the equality constraint; Right: the average of \(x^k\) and \(y^k\) for \(k=0,\ldots ,t\) (denoted as xa and ya respectively)

Fig. 4

The residual of the equality constraint of ADMM for solving the sharing problem (4.16) with \(\gamma = 0.01\sqrt{2}L\)