Abstract
The alternating direction method of multipliers (ADMM) is widely used for solving structured convex optimization problems due to its superior practical performance. On the theoretical side, however, a counterexample in Chen et al. (Math Program 155(1):57–79, 2016) shows that the multi-block ADMM for minimizing the sum of \(N\) \((N\ge 3)\) convex functions with N block variables linked by linear constraints may diverge. It is therefore of great interest to investigate further sufficient conditions on the problem side that can guarantee convergence of the multi-block ADMM. Existing results typically require strong convexity of parts of the objective. In this paper, we provide two approaches related to the multi-block ADMM that find an \(\epsilon \)-optimal solution and do not require strong convexity of the objective function. Specifically, we prove the following two results: (1) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon ^2)\) iterations when it is applied to an associated perturbation of the original problem; this case can be seen as using the multi-block ADMM to solve a modified problem; (2) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon )\) iterations when it is applied to a certain sharing problem, under the condition that the augmented Lagrangian function satisfies the Kurdyka–Łojasiewicz property, which essentially covers most convex optimization models except for some pathological cases; this case can be seen as applying the multi-block ADMM to a special class of problems.
1 Introduction
We consider the following multi-block convex minimization problem:
$$\begin{aligned} \min \ \sum _{i=1}^N f_i(x_i) \quad \mathrm {s.t.}\quad \sum _{i=1}^N A_i x_i = b, \quad x_i\in {\mathcal {X}}_i, \ i=1,\ldots ,N, \end{aligned}$$(1.1)
where \(A_i \in \mathbf R^{p\times n_i}\), \(b\in \mathbf R^p\), \({\mathcal {X}}_i\subset \mathbf R^{n_i}\) are closed convex sets, and \(f_i:\mathbf R^{n_i}\rightarrow \mathbf R\) are closed convex functions. Problem (1.1) is a general convex minimization problem with linear equality constraints, and there exist many algorithms for solving this problem. For example, the augmented Lagrangian method can be used to solve it. In this paper, we consider the case that the functions \(f_i, i=1,\ldots ,N\) have certain structures so that it is beneficial to handle them separately, rather than treat them as a whole. One effective way to solve (1.1) that can take advantage of its separable structure is the so-called Alternating Direction Method of Multipliers (ADMM). The ADMM is closely related to the Douglas–Rachford [12] and Peaceman–Rachford [34] operator splitting methods that date back to the 1950s. These operator splitting methods were further studied later in [13, 16, 18, 32]. The ADMM has been revisited recently due to its success in solving problems with special structures arising from compressed sensing, machine learning, image processing, and so on; see the recent survey papers [5, 15] for more information.
The ADMM is constructed under an augmented Lagrangian framework, where the augmented Lagrangian function for (1.1) is defined as
$$\begin{aligned} {\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda ) := \sum _{i=1}^N f_i(x_i) - \lambda ^\top \left( \sum _{i=1}^N A_i x_i - b\right) + \frac{\gamma }{2}\left\| \sum _{i=1}^N A_i x_i - b\right\| ^2, \end{aligned}$$
where \(\lambda \) is the Lagrange multiplier and \(\gamma > 0\) is a penalty parameter. In a typical iteration of the ADMM for solving (1.1), the following updating procedure is implemented:
$$\begin{aligned} \left\{ \begin{array}{l} x_1^{k+1} := \mathrm{argmin}_{x_1\in {\mathcal {X}}_1}\ {\mathcal {L}}_\gamma (x_1, x_2^k,\ldots ,x_N^k,\lambda ^k) \\ \quad \vdots \\ x_i^{k+1} := \mathrm{argmin}_{x_i\in {\mathcal {X}}_i}\ {\mathcal {L}}_\gamma (x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i,x_{i+1}^k,\ldots ,x_N^k,\lambda ^k) \\ \quad \vdots \\ x_N^{k+1} := \mathrm{argmin}_{x_N\in {\mathcal {X}}_N}\ {\mathcal {L}}_\gamma (x_1^{k+1},\ldots ,x_{N-1}^{k+1},x_N,\lambda ^k) \\ \lambda ^{k+1} := \lambda ^k - \gamma \left( \sum _{j=1}^N A_j x_j^{k+1} - b\right) . \end{array}\right. \end{aligned}$$(1.2)
Note that the ADMM (1.2) minimizes in each iteration the augmented Lagrangian function with respect to \(x_1,\ldots ,x_N\) alternatingly in a Gauss-Seidel manner. The ADMM (1.2) for solving two-block convex minimization problems (i.e., \(N=2\)) has been studied extensively in the literature. The global convergence of ADMM (1.2) when \(N=2\) has been shown in [14, 17]. There are also some recent works that study the convergence rate properties of ADMM when \(N=2\) (see, e.g., [2, 11, 23, 24, 33]).
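To make the updating procedure (1.2) concrete, here is a minimal sketch in code. The quadratic choice \(f_i(x_i)=\frac{1}{2}\Vert x_i\Vert ^2\) with \({\mathcal {X}}_i=\mathbf R^{n_i}\), and the function name `multiblock_admm`, are our own illustrative assumptions; they merely give each Gauss-Seidel subproblem a closed-form solution, while the paper's \(f_i\) are general closed convex functions.

```python
import numpy as np

def multiblock_admm(As, b, gamma=1.0, iters=500):
    """Multi-block ADMM (1.2) for: min sum_i 0.5*||x_i||^2  s.t. sum_i A_i x_i = b.

    The quadratic f_i are illustrative stand-ins for general convex f_i:
    they make each Gauss-Seidel subproblem a small linear system.
    """
    N = len(As)
    xs = [np.zeros(A.shape[1]) for A in As]
    lam = np.zeros(len(b))
    for _ in range(iters):
        for i, A in enumerate(As):
            # sum of the other blocks (newest iterates first, Gauss-Seidel), minus b
            r = sum(As[j] @ xs[j] for j in range(N) if j != i) - b
            # x_i-update: solve (I + gamma * A_i^T A_i) x_i = A_i^T (lam - gamma * r)
            H = np.eye(A.shape[1]) + gamma * (A.T @ A)
            xs[i] = np.linalg.solve(H, A.T @ (lam - gamma * r))
        # multiplier update with step gamma
        lam = lam - gamma * (sum(As[j] @ xs[j] for j in range(N)) - b)
    return xs, lam
```

With a single block the loop reduces to the classical augmented Lagrangian method; with \(N\ge 3\) blocks it is exactly the direct extension whose convergence is at issue in this paper.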
However, the convergence of multi-block ADMM (1.2) (we call (1.2) multi-block ADMM when \(N\ge 3\)) has remained unclear for a long time. Recently, Chen et al. [7] constructed a counterexample to show the failure of ADMM (1.2) when \(N\ge 3\). Notwithstanding its lack of theoretical convergence assurance, the multi-block ADMM (1.2) has been applied very successfully to solve problems with \(N\, (N\ge 3)\) block variables; for example, see [35, 37]. It is thus of great interest to further study sufficient conditions that can guarantee the convergence of multi-block ADMM. Some recent works on sufficient conditions guaranteeing the convergence of multi-block ADMM are described briefly as follows. Han and Yuan [19] showed that the multi-block ADMM (1.2) converges if all the functions \(f_1,\ldots ,f_N\) are strongly convex and \(\gamma \) is restricted to a certain region. This condition is relaxed in [8, 31] to allow only \(N-1\) functions to be strongly convex, with \(\gamma \) restricted to a certain region. In particular, Lin et al. [31] proved a sublinear convergence rate under such conditions. Cai et al. [6] and Li et al. [28] proved that for \(N=3\), convergence of multi-block ADMM can be guaranteed under the assumption that only one function among \(f_1\), \(f_2\) and \(f_3\) is strongly convex, and \(\gamma \) is restricted to be in a certain region. In addition to strong convexity of \(f_2,\ldots ,f_N\), by assuming further conditions on the smoothness of the functions and some rank conditions on the matrices in the linear constraints, Lin et al. [30] proved the global linear convergence of multi-block ADMM. Davis and Yin [9] studied a variant of the 3-block ADMM (see Algorithm 8 in [9]) which requires that \(f_1\) is strongly convex and \(\gamma \) is smaller than a certain bound to guarantee the convergence.
More recently, Lin, Ma and Zhang [29] proved that the 3-block ADMM is globally convergent for any \(\gamma >0\) when it is applied to solve a class of regularized least squares decomposition problems. Note that the above-mentioned works all require that (parts of) the objective function is strongly convex. Without assuming strong convexity, Hong and Luo [26] studied a variant of ADMM (1.2) with a small stepsize in updating the Lagrange multiplier. Specifically, [26] proposes to replace the last equation in (1.2) with \( \lambda ^{k+1} := \lambda ^k - \alpha \gamma (\sum _{j=1}^N A_j x_j^{k+1} -b ), \) where \(\alpha >0\) is a small step size. Linear convergence of this variant is proven under the assumption that the objective function satisfies certain error bound conditions. However, to guarantee convergence, the selection of \(\alpha \) is in fact bounded by some parameters associated with the error bound conditions. Therefore, it might be difficult to choose \(\alpha \) in practice. There are also studies on the convergence and convergence rate of some other variants of ADMM (1.2), and we refer the interested readers to [10, 20–22, 25, 36, 38] for the details of these variants. However, it is observed by many researchers that modified versions of ADMM, though equipped with a convergence guarantee, often perform slower than the multi-block ADMM, which has no such guarantee. The authors of [36] made the following statement (see page 885 in [36]): “However, to the best of our knowledge, up to now the dilemma is that at least for convex conic programming, the modified versions, though with a convergence guarantee, often perform two to three times slower than the multiblock ADMM with no convergent guarantee.” Wang et al.
[38] compared the multi-block ADMM with several modifications of it, and the numerical experiments on basis pursuit and latent variable Gaussian graphical model selection problems show that the multi-block ADMM always outperforms the tested modifications (see Section 3 of [38]). These results motivated us to further study whether ADMM can be guaranteed to converge when it is applied to solve a certain class of problems, although it is not necessarily convergent in general.
Our contributions. The main contributions of this paper are as follows. First, we show that if ADMM (1.2) is applied to solve an associated perturbed problem of (1.1) rather than (1.1) itself, then it returns an \(\epsilon \)-optimal solution to (1.1) within \(O(1/\epsilon ^2)\) iterations, with the condition that \(\gamma \) depends on \(\epsilon \). Here we do not assume strong convexity of any objective function \(f_i\). Note that our result in this part can be seen as a perturbation approach. Instead of modifying the ADMM procedure itself, as most papers in the literature propose, we propose to modify the problem. Such an approach is not uncommon: the perturbation technique applied to resolve the cycling problem in the simplex method for linear programming is one famous example of this type. Modifying the algorithm sometimes requires additional computational effort. Moreover, as shown in [36] and [38], modified versions of ADMM perform worse than the original ADMM for some problems. Our approach maintains the effectiveness of ADMM, at least for certain problems discussed in [36] and [38], because we do not modify the algorithm itself. Secondly, we show that the ADMM (1.2) returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon )\) iterations when it is applied to solve the so-called sharing problem, under the condition that the augmented Lagrangian \({\mathcal {L}}_\gamma \) is a Kurdyka–Łojasiewicz (KL) function [3, 4], and \(\gamma \) is sufficiently large. Interestingly, it turns out that ADMM possesses nice convergence properties when it is applied to solve sharing problems. In a recent work by Hong et al. [27], global convergence of ADMM was established even when it is applied to solve a nonconvex sharing problem. In this paper, we show that by combining the KL property and the convexity of the problem, we can establish a sublinear convergence rate of ADMM for solving the sharing problem.
Organization. The rest of this paper is organized as follows. In Sect. 2 we provide some preliminaries for our convergence rate analysis. In Sect. 3, we prove the \(O(1/\epsilon ^2)\) iteration complexity of ADMM (1.2) by introducing an associated problem of (1.1). In Sect. 4, we prove the \(O(1/\epsilon )\) iteration complexity of ADMM (1.2) with Kurdyka–Łojasiewicz (KL) property.
2 Preliminaries
We denote \(\Omega = {\mathcal {X}}_1\times \cdots \times {\mathcal {X}}_N \times \mathbf R^p\) and denote the optimal set of (1.1) as \(\Omega ^*\). The following assumption is made throughout this paper.
Assumption 2.1
The optimal set \(\Omega ^*\) for problem (1.1) is non-empty.
According to the first-order optimality conditions for (1.1), solving (1.1) is equivalent to finding \((x_1^*,x_2^*,\ldots ,x_N^*,\lambda ^*)\in \Omega ^*\) such that the following holds:
In this paper, we analyze the iteration complexity of ADMM (1.2) under two scenarios. The conditions of the two scenarios are listed in Table 1. The following assumption is only used in Scenario 2.
Assumption 2.2
We assume that \({\mathcal {X}}_N=\mathbf R^{n_N}\). We also assume that \(f_i\) has a finite lower bound, i.e., \(\inf _{x_i\in {\mathcal {X}}_i} f_i(x_i)\ge f_i^*>-\infty \) for \(i=1,2,\ldots ,N\). Moreover, we assume that \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\) is a coercive function for \(i=1,2,\ldots ,N-1\), where \(\mathbf 1 _{{\mathcal {X}}_i}\) denotes the indicator function of \({\mathcal {X}}_i\), i.e.,
$$\begin{aligned} \mathbf 1 _{{\mathcal {X}}_i}(x_i) = \left\{ \begin{array}{ll} 0, &{} \hbox {if } x_i\in {\mathcal {X}}_i, \\ +\infty , &{} \hbox {otherwise.} \end{array}\right. \end{aligned}$$
Furthermore, we assume that \({\mathcal {L}}_\gamma \) is a KL function (will be defined later).
Remark 2.3
Some remarks are in order here regarding the conditions in Scenario 2. Note that it is not very restrictive to require \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\) to be a coercive function. In fact, many functions used as regularization terms, including the \(\ell _1\)-norm, \(\ell _2\)-norm and \(\ell _\infty \)-norm for vectors and the nuclear norm for matrices, are all coercive functions; assuming the compactness of \({\mathcal {X}}_i\) also leads to the coerciveness of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\). Moreover, the assumptions that \(A_N=I\) and that \(\nabla f_N\) is Lipschitz continuous actually cover many interesting applications in practice. For example, many problems arising from machine learning, statistics, image processing and so on have the following structure:
$$\begin{aligned} \min \ \sum _{i=1}^{N-1} f_i(x_i) + f_N\left( b - \sum _{i=1}^{N-1} A_i x_i\right) \quad \mathrm {s.t.}\quad x_i \in {\mathcal {X}}_i, \ i=1,\ldots ,N-1, \end{aligned}$$(2.2)
where \(f_N\) denotes a loss function on data fitting, which is usually a smooth function, and \(f_1, \ldots , f_{N-1}\) are regularization terms to promote certain structures of the solution. This problem is usually referred to as the sharing problem (see, e.g., [5, 27]). (2.2) can be reformulated as
$$\begin{aligned} \min \ \sum _{i=1}^{N-1} f_i(x_i) + f_N(x_N) \quad \mathrm {s.t.}\quad \sum _{i=1}^{N-1} A_i x_i + x_N = b, \quad x_i \in {\mathcal {X}}_i, \ i=1,\ldots ,N-1, \end{aligned}$$(2.3)
which is in the form of (1.1) and can be solved by ADMM (see [5, 27]). Note that \(A_N=I\) in (2.3) and it is very natural to assume that \(\nabla f_N\) is Lipschitz continuous. Thus the conditions in Scenario 2 are satisfied.
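As a small numerical sanity check (our own toy instance, with \(\ell _1\) regularizers and the quadratic loss \(f_3(\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\) chosen purely for illustration), the reformulation just introduces one extra block that carries the shared quantity:

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((5, 3))
A2 = rng.standard_normal((5, 4))
b = rng.standard_normal(5)

def sharing_obj(x1, x2):
    # original sharing form: separable regularizers plus a smooth loss
    # of the coupled quantity b - A1 x1 - A2 x2
    loss = 0.5 * np.linalg.norm(b - A1 @ x1 - A2 @ x2) ** 2
    return np.abs(x1).sum() + np.abs(x2).sum() + loss

def reformulated_obj(x1, x2, x3):
    # reformulated form: f_3(x_3) = 0.5*||x_3||^2, with the linear constraint
    # A1 x1 + A2 x2 + x3 = b (the coefficient of the last block is the identity)
    return np.abs(x1).sum() + np.abs(x2).sum() + 0.5 * np.linalg.norm(x3) ** 2

x1, x2 = rng.standard_normal(3), rng.standard_normal(4)
x3 = b - A1 @ x1 - A2 @ x2  # eliminate the equality constraint
assert np.isclose(sharing_obj(x1, x2), reformulated_obj(x1, x2, x3))
```

The two objectives coincide on the constraint set, so the reformulated problem has the separable block structure that ADMM exploits.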
Notation
For simplicity, we use u to denote the stacked vectors \(x_1,\ldots ,x_N\), i.e., \(u=[x_1;x_2;\ldots ;x_N]\), \(u^k=[x_1^k;x_2^k;\ldots ;x_N^k]\), \(u^*=[x_1^*;x_2^*;\ldots ;x_N^*]\), and denote \(w = [u;\lambda ]\). We denote by \(f(u)\equiv f_1(x_{1})+\cdots +f_{N}(x_{N})\) the objective function of problem (1.1); \(\mathbf {1}_{{\mathcal {X}}}\) is the indicator function of \({\mathcal {X}}\); \(\nabla f\) is the gradient of f; \(\Vert x\Vert \) denotes the Euclidean norm of x.
In our analysis, the following two well-known identities are used frequently,
3 Iteration Complexity of ADMM: Associated Perturbation
In this section, we prove the \(O(1/\epsilon ^2)\) iteration complexity of ADMM (1.2) under the conditions in Scenario 1 of Table 1. Indeed, given \(\epsilon >0\) sufficiently small and \(u^0:=(x_1^0;\ldots ;x_N^0)\), we introduce an associated perturbed problem of (1.1), i.e.,
$$\begin{aligned} \min \ f_1(x_1) + \sum _{i=2}^{N} \tilde{f}_i(x_i) \quad \mathrm {s.t.}\quad \sum _{i=1}^N A_i x_i = b, \quad x_i\in {\mathcal {X}}_i, \ i=1,\ldots ,N, \end{aligned}$$(3.1)
where \(\tilde{f}_i(x_i) = f_i(x_i) + \frac{\mu }{2}\left\| A_i x_i - A_i x_i^0\right\| ^2\) for \(i=2,\ldots ,N\), and \(\mu = \epsilon (N-2)(N+1)\). Note that the \(\tilde{f}_i\) are not necessarily strongly convex. We prove that the ADMM (1.2) for the associated perturbed problem (3.1) returns an \(\epsilon \)-optimal solution of the original problem (1.1), in terms of both objective value and constraint violation, within \(O(1/\epsilon ^2)\) iterations.
The ADMM for solving (3.1) can be summarized as (note that some constant terms in the subproblems are discarded):
The first-order optimality conditions for (3.2)–(3.3) are given respectively by \(x_{i}^{k+1}\in {\mathcal {X}}_{i}\) and
hold for any \(x_i\in {\mathcal {X}}_i\) and \(g_i \in \partial f_i\), a subgradient of \(f_i\), for \(i=1,2,\ldots ,N\). Moreover, by combining with (3.4), (3.5)–(3.6) can be rewritten as
Lemma 3.1
Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the ADMM (3.2)–(3.4) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any \(u^*=(x_1^*,x_2^*,\ldots ,x_N^*)\in \Omega ^*\) and \(\lambda \in \mathbf R^p\), it holds true under the conditions in Scenario 1 that
Proof
For \(i=1,2,\ldots ,N-1\), we have,
where in the second equality we applied the identity (2.4). Summing the above inequality for \(i=1,\ldots ,N-1\) and using (3.4) yield,
Adding (3.7) and (3.8), and combining the resulting inequality with (3.4) and (3.10) yields that the following inequality holds for any \(\lambda \in \mathbf R^p\):
Using (2.5), we have
and
Letting \(u=u^{*}\) in (3.11), and invoking the convexity of \(f_i\) that
and
we obtain,
This together with the facts that \(\mu = \epsilon (N-2)(N+1)\) and \(\gamma \le \epsilon \) implies that
which further implies the desired inequality (3.9). \(\square \)
Now we are ready to prove the \(O(1/\epsilon ^2)\) iteration complexity of the ADMM for (1.1) in the ergodic sense.
Theorem 3.2
Let \((x_1^{k+1},x_2^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by ADMM (3.2)–(3.4) from given \((x_2^k,\ldots ,x_N^k,\lambda ^k)\). For any integer \(t>0\), let \(\bar{u}^t = (\bar{x}_1^t; \bar{x}_2^t; \ldots ; \bar{x}_N^t)\) and \(\bar{\lambda }^t\) be defined as
For any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), it holds under conditions in Scenario 1 that,
This also implies that when \(t=O(1/\epsilon ^2)\), \(\bar{u}^{t}\) is an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., both the error of the objective function value and the residual of the equality constraint satisfy that
Proof
Because \((u^{k},\lambda ^k)\in \Omega \), it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\ge 0\). By Lemma 3.1 and invoking the convexity of function \(f(\cdot )\), we have
Note that this inequality holds for all \(\lambda \in \mathbf R^{p}\). From the optimality condition (2.1) we obtain
Moreover, since \(\rho :=\Vert \lambda ^*\Vert +1\), by applying Cauchy–Schwarz inequality, we obtain
By setting \(\lambda = -\rho (\sum _{i=1}^N A_i\bar{x}_i^t-b) /\Vert \sum _{i=1}^N A_i\bar{x}_i^t-b\Vert \) in (3.13), and noting that \(\Vert \lambda \Vert = \rho \), we obtain
When \(t=O(1/\epsilon ^2)\), together with the condition that \(\frac{\epsilon }{2}\le \gamma \le \epsilon \), it is implied that the right hand side of (3.15) is of order \(O(\epsilon )\).
We now define the function
It is easy to verify that v is convex, \(v(0)=f(u^*)\), and \(\lambda ^* \in \partial v(0)\). Therefore, from the convexity of v and the Cauchy–Schwarz inequality, it holds that
Letting \(\bar{\xi } = \sum \nolimits _{i=1}^N A_i \bar{x}_i^t - b\), we have \(f(\bar{u}^t) \ge v(\bar{\xi })\). Therefore, by combining (3.14), (3.16), and the fact that the right hand side (thus the left hand side) of (3.15) is of order \(O(\epsilon )\), we get
which, by using \(\rho =\Vert \lambda ^*\Vert +1\), yields,
where C is a constant depending on \(\rho \), \(\lambda ^0\), \(x^0\), \(x^*\) and N. Moreover, by combining (3.14) and (3.17), one obtains that
Finally, we note that (3.17), (3.18) imply (3.12). \(\square \)
Remark 3.3
We remark that by applying ADMM to solve the associated perturbed problem (3.1), we can find an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., an approximate solution to (1.1). However, the assumptions required in this section do not ensure that the exact optimal solution to (1.1) can be found, because that would require \(\epsilon \rightarrow 0\) and thus \(\gamma \rightarrow 0\).
3.1 Solving the Counterexample in [7]
We now implement our algorithm in this section to solve a perturbed version of the counterexample constructed by Chen et al. [7]. As we will see, after modifying the problem by adding the perturbation terms, the multi-block ADMM is always able to find an \(\epsilon \)-optimal solution. The counterexample constructed in [7] is to find a solution to the linear system \(A_1x_1+A_2x_2+A_3x_3=0\), where \(A_1=(1,1,1)^\top \), \(A_2=(1,1,2)^\top \) and \(A_3=(1,2,2)^\top \). It is easy to see that the unique solution of this linear system is \((x_1,x_2,x_3)=(0,0,0)\). The associated perturbed problem to this example is given by
$$\begin{aligned} \min \ \sum _{i=2}^{3} \frac{\mu }{2}\left\| A_i x_i - A_i x_i^0\right\| ^2 \quad \mathrm {s.t.}\quad A_1x_1+A_2x_2+A_3x_3=0, \end{aligned}$$
where \(\mu = 4\epsilon \). To satisfy the conditions in Scenario 1, we chose \(\gamma =\epsilon \). We randomly generated \(x^0=(x_1^0,x_2^0,x_3^0)\) using the randn function in Matlab with random seed set to 0. We plot the residual of the equality constraint, the three components of \(x^t\), and the average of \(x^k\) for \(k=0,\ldots ,t\) in Figs. 1 and 2 for \(\epsilon =10^{-2}\) and \(\epsilon =10^{-3}\) respectively, where t denotes the iteration number. From Figs. 1 and 2 we can see that ADMM for solving the perturbed problem is able to decrease the residual of the equality constraint to the order of \(O(\epsilon )\) in \(O(1/\epsilon ^2)\) iterations, which matches our theoretical results in Theorem 3.2. We also see that both \(x^t\) and the average of \(x^k\) for \(k=0,\ldots ,t\) converge to zero, which is the unique solution to the original problem. As a result, our approach successfully eliminates the risk of divergence of the multi-block ADMM.
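The experiment just described can be sketched as follows. This is our own reimplementation: the closed-form scalar updates are derived by us from the augmented Lagrangian of the perturbed problem, and the Matlab seeding and plotting are omitted.

```python
import numpy as np

# Counterexample data from Chen et al. [7]
A1 = np.array([1.0, 1.0, 1.0])
A2 = np.array([1.0, 1.0, 2.0])
A3 = np.array([1.0, 2.0, 2.0])

def perturbed_admm(x0, eps, iters):
    """ADMM on: min sum_{i=2,3} (mu/2)||A_i(x_i - x_i^0)||^2
    s.t. A1 x1 + A2 x2 + A3 x3 = 0, with mu = 4*eps and gamma = eps."""
    mu, gamma = 4.0 * eps, eps
    x1, x2, x3 = x0
    lam = np.zeros(3)
    for _ in range(iters):
        # each scalar subproblem is an unconstrained quadratic with a closed form
        x1 = A1 @ (lam - gamma * (A2 * x2 + A3 * x3)) / (gamma * (A1 @ A1))
        c2 = A1 * x1 + A3 * x3
        x2 = (mu * (A2 @ A2) * x0[1] + A2 @ (lam - gamma * c2)) / ((mu + gamma) * (A2 @ A2))
        c3 = A1 * x1 + A2 * x2
        x3 = (mu * (A3 @ A3) * x0[2] + A3 @ (lam - gamma * c3)) / ((mu + gamma) * (A3 @ A3))
        res = A1 * x1 + A2 * x2 + A3 * x3
        lam = lam - gamma * res  # multiplier update
    return (x1, x2, x3), lam
```

Iterating this scheme for \(O(1/\epsilon ^2)\) steps drives the equality-constraint residual to \(O(\epsilon )\), in line with Theorem 3.2.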
4 Iteration Complexity of ADMM: Kurdyka–Łojasiewicz Property
In this section, we prove an \(O(1/\epsilon )\) iteration complexity of ADMM (1.2) under the conditions in Scenario 2 of Table 1. Under the conditions in Scenario 2, since \({\mathcal {X}}_N=\mathbf R^{n_N}\), \(f_N\) is differentiable and \(A_N=I\), (1.1) becomes the so-called sharing problem. It turns out that ADMM has nice convergence properties when it is applied to solve sharing problems. In a recent work by Hong, Luo and Razaviyayn [27], it was shown that ADMM is globally convergent under certain conditions for solving a nonconvex sharing problem. For the convex sharing problem, Corollary 3.1 in [27] gives conditions that guarantee the global convergence of ADMM. These conditions are the same as our conditions required in Scenario 2. For example, both papers require \(\gamma > \sqrt{2}L\), where L denotes the Lipschitz constant of \(\nabla f_N\). The difference is that global convergence of ADMM for solving (both nonconvex and convex) sharing problems is shown in [27], whereas we show that, by using the KL property, we are able to prove a sublinear convergence rate of ADMM for the convex sharing problem.
The first-order optimality conditions for the subproblems in (1.2) are given by \(x_i^{k+1}\in {\mathcal {X}}_i, i=1,\ldots ,N-1\), and for any \(x_i\in {\mathcal {X}}_i\),
where \(g_i \in \partial (f_i+\mathbf 1 _{{\mathcal {X}}_i})\) is a subgradient of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}\), \(i=1,2,\ldots ,N-1\). The last equation in (1.2) that updates \(\lambda ^k\) now becomes
$$\begin{aligned} \lambda ^{k+1} := \lambda ^k - \gamma \left( \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) . \end{aligned}$$(4.3)
Moreover, by combining with (4.3), (4.1) and (4.2) can be rewritten as
Note that in Scenario 2 we require that \({\mathcal {L}}_\gamma \) is a Kurdyka–Łojasiewicz (KL) function. Let us first introduce the notion of the KL function and the KL property, which can be found, e.g., in [3, 4]. We denote \(\mathrm{dist}(x,S):=\inf \{\Vert y-x \Vert :y\in S\}\) as the distance from x to S. Let \(\eta \in (0,+\infty ]\). We further denote \(\Phi _\eta \) to be the class of all concave and continuous functions \(\varphi :[0,\eta )\rightarrow \mathbf R_{+}\) satisfying the following conditions: (1) \(\varphi (0)=0\); (2) \(\varphi \) is \(C^1\) on \((0,\eta )\) and continuous at 0; (3) for all \(s\in (0,\eta ):\varphi '(s)>0\).
Definition 4.1
Let \(f:\Omega \rightarrow (-\infty ,+\infty ]\) be proper and lower semicontinuous.
1.
The function f has the KL property at \(w_0\in \{w\in \Omega :\partial f(w)\ne \emptyset \}\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood \(W_0\) of \(w_0\) and a function \(\varphi \in \Phi _\eta \) such that for all
$$\begin{aligned} \bar{w}_0 \in W_0\cap \left\{ w\in \Omega : f(w_0)<f(w)<f(w_0)+\eta \right\} , \end{aligned}$$
the following inequality holds,
$$\begin{aligned} \varphi '(f(\bar{w}_0) - f(w_0))\mathrm{dist}(0,\partial f(\bar{w}_0))\ge 1. \end{aligned}$$(4.6)
2.
The function f is a KL function if f satisfies the KL property at each point of \(\Omega \cap \{w:\partial f(w)\ne \emptyset \}\).
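As a quick illustration of Definition 4.1 (a worked example of our own, not from the original text), the strongly convex function \(f(w)=\Vert w\Vert ^2\) satisfies the KL inequality at \(w_0=0\) with the desingularizing function \(\varphi (s)=2\sqrt{s}\):

```latex
% KL inequality (4.6) for f(w) = \|w\|^2 at w_0 = 0, with \varphi(s) = 2\sqrt{s}:
\begin{aligned}
  f(w) - f(0) &= \|w\|^2, \qquad \partial f(w) = \{2w\}, \qquad
  \operatorname{dist}\bigl(0, \partial f(w)\bigr) = 2\|w\|, \\
  \varphi'\bigl(f(w) - f(0)\bigr)\,\operatorname{dist}\bigl(0, \partial f(w)\bigr)
  &= \frac{1}{\sqrt{\|w\|^{2}}} \cdot 2\|w\| = 2 \;\ge\; 1
  \qquad \text{for all } w \ne 0.
\end{aligned}
```

Since \(\varphi (0)=0\), \(\varphi \) is concave and \(\varphi '>0\) on \((0,\eta )\), we indeed have \(\varphi \in \Phi _\eta \), and the bound is uniform in w, so f is a KL function; functions of this type are said to have KL exponent 1/2.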
Remark 4.1
It is important to remark that most convex functions from practical applications satisfy the KL property; see Section 5.1 of [4]. In fact, convex functions that do not satisfy the KL property exist (see [3] for a counterexample) but they are rare and difficult to construct.
Indeed, \({\mathcal {L}}_\gamma \) will be a KL function if each \(f_i\) satisfies a growth condition, or is uniformly convex, or if the \(f_i\) are general convex semialgebraic or real analytic functions. We refer the interested readers to [1, 4] for more information.
The following result, which is called uniformized KL property, is from Lemma 6 of [4].
Lemma 4.2
(Lemma 6 [4]) Let \(\Omega \) be a compact set and \(f: \mathbf R^n\rightarrow (-\infty ,\infty ]\) be a proper and lower semi-continuous function. Assume that f is constant on \(\Omega \) and satisfies the KL property at each point of \(\Omega \). Then, there exist \(\epsilon >0\), \(\eta >0\) and \(\varphi \in \Phi _\eta \) such that for all \(\bar{u}\) in \(\Omega \) and all u in the intersection:
$$\begin{aligned} \left\{ u\in \mathbf R^n: \mathrm{dist}(u,\Omega )<\epsilon \right\} \cap \left\{ u: f(\bar{u})<f(u)<f(\bar{u})+\eta \right\} , \end{aligned}$$
the following inequality holds,
$$\begin{aligned} \varphi '(f(u)-f(\bar{u}))\mathrm{dist}(0,\partial f(u))\ge 1. \end{aligned}$$
We now give a formal definition of the limit point set. Let \(w^k = \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) \) be the sequence generated by the multi-block ADMM (1.2) from a starting point \(w^0 = \left( x_1^0,\ldots , x_N^0,\lambda ^0\right) \). The set of all limit points is denoted by \(\Omega (w^0)\), i.e.,
$$\begin{aligned} \Omega (w^0) = \left\{ \bar{w}: \exists \hbox { an increasing sequence of integers } \{k_j\} \hbox { such that } w^{k_j}\rightarrow \bar{w} \hbox { as } j\rightarrow \infty \right\} . \end{aligned}$$
In the following we present the main results in this section. Specifically, Theorem 4.3 gives the convergence of the multi-block ADMM (1.2), and we include its proof in the “Appendix”. Theorem 4.5 shows that the whole sequence generated by the multi-block ADMM (1.2) converges.
Theorem 4.3
Under the conditions in Scenario 2 of Table 1, the following statements hold:

1.
\(\Omega (w^0)\) is a non-empty set, and any point in \(\Omega (w^0)\) is a stationary point of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\);

2.
\(\Omega (w^0)\) is a compact and connected set;

3.
The function \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\) is finite and constant on \(\Omega (w^0)\).
Remark 4.4
In Theorem 4.3, we do not require \({\mathcal {L}}_\gamma \) to be a KL function, which is only required in Theorem 4.5 (see next).
Theorem 4.5
Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let the sequence \(w^k =\left( x_1^k,\ldots , x_N^k,\lambda ^k\right) \) be generated by the multi-block ADMM (1.2), and let \(w^* =\left( x_1^*,\ldots , x_N^*,\lambda ^*\right) \in \Omega (w^0)\). Then the sequence \(w^k =\left( x_1^k,\ldots , x_N^k,\lambda ^k\right) \) has finite length, i.e.,
where the constant G is given by
and
and the whole sequence \(\left( A_1 x_1^k, A_2 x_2^k, \ldots , A_{N-1}x_{N-1}^k, x_N^k,\lambda ^k\right) \) converges to \(\left( A_1 x_1^*,\ldots ,\right. \left. A_{N-1}x_{N-1}^*, x_N^*,\lambda ^*\right) \).
Proof
The proof of this theorem is almost identical to the proof of Theorem 1 in [4], by utilizing the uniformized KL property (Lemma 4.2), and the facts that \(\Omega (w^0)\) is compact, \({\mathcal {L}}_\gamma (w)\) is constant (proved in Theorem 4.3), with function \(\Psi \) replaced by \({\mathcal {L}}_\gamma \) and some other minor changes. We thus omit the proof for succinctness. \(\square \)
Based on Theorem 4.5, we prove a key lemma for analyzing the iteration complexity for the ADMM.
Lemma 4.6
Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the multi-block ADMM (1.2) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any \(u^*=(x_1^*,x_2^*,\ldots ,x_N^*)\in \Omega ^*\) and \(\lambda \in \mathbf R^p\), it holds under conditions in Scenario 2 that
where D is a constant that will be specified later.
Proof
Note that similar to the proof of Lemma 3.1, we can prove that, for any \(x_1\in {\mathcal {X}}_1\),
For any \(x_i\in {\mathcal {X}}_i\), \(i=2,3,\ldots ,N-1\), we have,
Therefore, for any \(x_i\in {\mathcal {X}}_i\), \(i=1,\ldots ,N-1\), it holds that
Combining (4.3), (4.4), (4.5) and (4.10), it holds for any \(\lambda \in \mathbf R^p\) that
Using (2.5), we have
Letting \(u=u^{*}\) in (4.11), and invoking the convexity of \(f_i\), we obtain
From Theorem 4.5 we know that the whole sequence \(\left( A_1 x_1^k, A_2 x_2^k, \ldots , A_{N-1}x_{N-1}^k, x_N^k,\lambda ^k\right) \) converges to \(\left( A_1 x_1^*,\ldots , A_{N-1}x_{N-1}^*, x_N^*,\lambda ^*\right) \). Therefore, there exists a constant \(D>0\) such that
for any \(k\ge 0\) and any \(i=2,3,\ldots ,N-1\). This implies (4.9). \(\square \)
Now, we are ready to prove the \(O(1/\epsilon )\) iteration complexity of the multi-block ADMM for (1.1).
Theorem 4.7
Suppose that \({\mathcal {L}}_\gamma (x_1,\ldots , x_N,\lambda )\) is a KL function. Let \((x_1^{k+1},x_{2}^{k+1},\ldots ,x_N^{k+1},\lambda ^{k+1})\in \Omega \) be generated by the multi-block ADMM (1.2) from given \((x_{2}^{k},\ldots ,x_{N}^{k},\lambda ^{k})\). For any integer \(t>0\), let \(\bar{u}^t = (\bar{x}_1^t; \bar{x}_2^t; \ldots ; \bar{x}_N^t)\) and \(\bar{\lambda }^{t}\) be defined as
For any \((u^*,\lambda ^*) \in \Omega ^{*}\), by defining \(\rho :=\Vert \lambda ^*\Vert +1\), it holds in Scenario 2 that,
Note this also implies that when \(t=O(1/\epsilon )\), \(\bar{u}^{t}\) is an \(\epsilon \)-optimal solution to the original problem (1.1), i.e., both the error of the objective function value and the residual of the equality constraint satisfy that
Proof
Because \((u^{k},\lambda ^k)\in \Omega \), it holds that \((\bar{u}^{t},\bar{\lambda }^t)\in \Omega \) for all \(t\ge 0\). By Lemma 4.6 and invoking the convexity of function \(f(\cdot )\), we have for any \(\lambda \in \mathbf R^{p}\),
where the last inequality holds due to Theorem 4.5. From the optimality conditions (2.1) we obtain
Since \(\rho :=\Vert \lambda ^*\Vert +1\), by the Cauchy–Schwarz inequality, we obtain
By setting \(\lambda = -\rho (\sum _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b)/\Vert \sum _{i=1}^{N-1} A_i\bar{x}_i^t+\bar{x}_N^t-b\Vert \) in (4.13), and noting that \(\Vert \lambda \Vert =\rho \), we obtain
When \(t=O(1/\epsilon )\), we have that the right hand side of (4.15) is of order \(O(\epsilon )\). By the same arguments as in the proof for Theorem 3.2, (4.12) follows from (4.14) and (4.15). \(\square \)
4.1 Divergence Counterexample Revisited
In this subsection we shall test the same divergence counterexample as considered in Sect. 3.1, but now equivalently reformulated in a way that satisfies the conditions required in Scenario 2. The example in [7] can obviously be formulated equivalently as:
$$\begin{aligned} \min \ \frac{1}{2}\Vert y\Vert ^2 \quad \mathrm {s.t.}\quad A_1x_1+A_2x_2+A_3x_3+y=0, \end{aligned}$$(4.16)
where \(A_1, A_2, A_3\) are the same as in Sect. 3.1. Note that (4.16) is in the form of the sharing problem with four block variables \(x_1\), \(x_2\), \(x_3\) and y, and it satisfies the assumptions in Scenario 2. Note that (4.16) corresponds to \(f_4(\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\) and the Lipschitz constant \(L=1\) for \(\nabla f_4\). It is easy to see that the unique solution to (4.16) is \(x_1=x_2=x_3=0\) and \(y=(0,0,0)^\top \). We then apply ADMM (1.2) to solve (4.16) with \(\gamma = 1.01\sqrt{2}L\) so that the assumption on \(\gamma \) in Scenario 2 is satisfied. Similarly to the tests in Sect. 3.1, we plot the residual of the equality constraint and the components of the average of \(x^k\) and \(y^k\) for \(k=0,\ldots ,t\) in Fig. 3, where t denotes the iteration number. From Fig. 3 we again observe the convergence of ADMM. We then tested the same problem using a smaller \(\gamma \), which was set to \(0.01\sqrt{2}L\). The residual of the equality constraint is plotted in Fig. 4. Figure 4 shows that after a small number of iterations, ADMM diverges. This indicates that if the assumption \(\gamma > \sqrt{2}L\) is violated, then it is possible that ADMM diverges for solving the sharing problem (4.16).
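A sketch of this experiment follows. This is our own reimplementation: the closed-form updates are derived by us from the augmented Lagrangian, and we write the constraint of (4.16) as \(A_1x_1+A_2x_2+A_3x_3+y=0\); plotting is omitted.

```python
import numpy as np

A1 = np.array([1.0, 1.0, 1.0])
A2 = np.array([1.0, 1.0, 2.0])
A3 = np.array([1.0, 2.0, 2.0])

def sharing_admm(x0, y0, gamma, iters):
    """ADMM (1.2) on: min 0.5*||y||^2  s.t. A1 x1 + A2 x2 + A3 x3 + y = 0."""
    x1, x2, x3 = x0
    y = y0.copy()
    lam = np.zeros(3)
    res = A1 * x1 + A2 * x2 + A3 * x3 + y
    for _ in range(iters):
        # x_i-updates (f_i = 0): unconstrained scalar quadratics
        x1 = A1 @ (lam - gamma * (A2 * x2 + A3 * x3 + y)) / (gamma * (A1 @ A1))
        x2 = A2 @ (lam - gamma * (A1 * x1 + A3 * x3 + y)) / (gamma * (A2 @ A2))
        x3 = A3 @ (lam - gamma * (A1 * x1 + A2 * x2 + y)) / (gamma * (A3 @ A3))
        s = A1 * x1 + A2 * x2 + A3 * x3
        # y-update: minimize 0.5||y||^2 - lam.(s+y) + (gamma/2)||s+y||^2
        y = (lam - gamma * s) / (1.0 + gamma)
        res = s + y
        lam = lam - gamma * res  # multiplier update
    return (x1, x2, x3), y, lam, np.linalg.norm(res)
```

With \(\gamma =1.01\sqrt{2}\) the condition \(\gamma >\sqrt{2}L\) (with \(L=1\)) holds and the residual decays toward zero; with \(\gamma =0.01\sqrt{2}\) the iteration can diverge, as in Fig. 4.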
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cai, X., Han, D., Yuan, X.: The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex. Preprint http://www.optimization-online.org/DB_FILE/2014/11/4644.pdf (2014)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)
Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. In: Abstract and Applied Analysis, Article ID 183961 (2013)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Technical report, UCLA CAM Report 15-13 (2015)
Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. Technical report, UCLA CAM 13-64 (2013)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Douglas, J., Rachford, H.H.: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11(4), 619–644 (2015)
Fortin, M., Glowinski, R.: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland Pub. Co., Amsterdam (1983)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems. North-Holland, Amsterdam (1983)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)
Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Preprint http://www.optimization-online.org/DB_FILE/2012/09/3611.pdf (2012)
He, B., Yuan, X.: On the \({O}(1/n)\) convergence rate of Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
He, B., Yuan, X.: On nonergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)
Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization (2014). Preprint arXiv:1401.7079
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers (2012). Preprint arXiv:1208.3922
Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization with one strongly convex block. Asia Pac. J. Oper. Res. 32, 1550024 (2015)
Lin, T., Ma, S., Zhang, S.: Global convergence of unmodified 3-block ADMM for a class of convex minimization problems (2015). Preprint arXiv:1505.04252
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23, 475–507 (2013)
Peaceman, D.H., Rachford, H.H.: The numerical solution of parabolic elliptic differential equations. SIAM J. Appl. Math. 3, 28–41 (1955)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type of constraints. SIAM J. Optim. 25(2), 882–915 (2015)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
Wang, X., Hong, M., Ma, S., Luo, Z.-Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. Pac. J. Optim. 11(4), 645–667 (2015)
Acknowledgments
The authors are grateful to the associate editor and two anonymous referees for their insightful comments that have improved the presentation of this paper greatly.
Shiqian Ma: Research of this author was supported in part by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Shuzhong Zhang: Research of this author was supported in part by the National Science Foundation under Grant Number CMMI-1462408.
Appendix: Proof of Theorem 4.3
We first prove the following lemma.
Lemma 4.8
The following results hold under the conditions in Scenario 2.
1.
The gap between two successive dual iterates can be bounded by the gap between the corresponding primal iterates, i.e.,
$$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^{k} \Vert ^2 \le L^2 \Vert x_N^{k+1} - x_N^k \Vert ^2, \end{aligned}$$(4.17)where L is the Lipschitz constant of \(\nabla f_N\).
2.
The augmented Lagrangian \({\mathcal {L}}_\gamma \) decreases sufficiently in each iteration, i.e.,
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k},\ldots , x_{N}^{k};\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N}^{k+1};\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$(4.18)
3.
The augmented Lagrangian \({\mathcal {L}}_\gamma (w^k)\) is uniformly lower bounded, and it holds true that
$$\begin{aligned}&\sum \limits _{k=0}^\infty \left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^{k+1} - A_i x_i^k \right\| ^2 + \left\| x_N^{k+1} - x_N^k \right\| ^2 + \left\| \lambda ^{k+1}-\lambda ^k \right\| ^2\right) \nonumber \\&\quad \le \frac{2\gamma (1+L^2)}{\gamma ^2-2L^2}\left( {\mathcal {L}}_\gamma (w^0) - L^*\right) \end{aligned}$$(4.19)where \(L^*\) is the uniform lower bound of \({\mathcal {L}}_\gamma (w^k)\), and hence
$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \left( \sum _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \right) = 0. \end{aligned}$$(4.20)Moreover, \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence.
4.
A subgradient of the augmented Lagrangian \({\mathcal {L}}_\gamma \) can be bounded in each iteration. In fact, we define
$$\begin{aligned} R_i^{k+1}:= & {} \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \\&- \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}\left( x_{j}^{k}-x_{j}^{k+1}\right) +\left( x_{N}^{k}-x_{N}^{k+1}\right) \right) \end{aligned}$$and
$$\begin{aligned} R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) , \quad R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \end{aligned}$$for each positive integer k and \(i = 1,2,\ldots , N-1\). Then \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). Moreover, it holds that
$$\begin{aligned}&\left\| \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \right\| \le \sum \limits _{i=1}^N \left\| R_i^{k+1} \right\| + \left\| R_\lambda ^{k+1} \right\| \nonumber \\&\quad \le M\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1} \right\| + \left\| \lambda ^k - \lambda ^{k+1} \right\| \right) , \quad \forall k\ge 0,\nonumber \\ \end{aligned}$$(4.21)where M is a constant defined in (4.8).
Proof of Lemma 4.8
1.
By the optimality condition of the \(x_N\)-subproblem and the update formula for \(\lambda \), we have from (4.3) that \(\nabla f_N\left( x_N^{k+1}\right) = \lambda ^{k+1}\). The Lipschitz continuity of \(\nabla f_N\) then gives
$$\begin{aligned} \left\| \lambda ^{k+1} - \lambda ^{k} \right\| = \left\| \nabla f_N\left( x_N^{k+1}\right) - \nabla f_N\left( x_N^{k}\right) \right\| \le L \left\| x_N^{k+1} - x_N^{k} \right\| , \end{aligned}$$which implies (4.17).
2.
From (4.1), by invoking the convexity of \(f_i\), we have for \(i=1,\ldots ,N-1\):
$$\begin{aligned} 0&= \left( x_i^k - x_i^{k+1}\right) ^\top \left[ g_i\left( x_{i}^{k+1}\right) -A_{i}^{\top }\lambda ^{k}+\gamma A_{i}^{\top }\left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b\right) \right] \nonumber \\&\le f_i\left( x_i^k\right) - f_i\left( x_i^{k+1}\right) - \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \lambda ^k \nonumber \\&\qquad +\, \gamma \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b \right) \nonumber \\&= {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_{i-1}^{k+1},x_i^k,x_{i+1}^k,\ldots ,x_N^k;\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_i^{k+1},x_{i+1}^k,\ldots ,x_N^k;\lambda ^k\right) \nonumber \\&\qquad -\, \frac{\gamma }{2}\left\| A_i x_i^k - A_i x_i^{k+1}\right\| ^2. \end{aligned}$$(4.22)Similarly, from (4.2) we can prove that
$$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N-1}^{k+1},x_N^k;\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_N^{k+1};\lambda ^k\right) \ge \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1}\right\| ^2.\nonumber \\ \end{aligned}$$(4.23)Summing (4.22) over \(i=1,\ldots ,N-1\) and adding (4.23), we have
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$(4.24)On the other hand, it follows from (4.2) and (4.17) that
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad = -\frac{1}{\gamma }\left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \ge - \frac{L^2}{\gamma }\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$(4.25)Combining (4.24) and (4.25) yields
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma } \left\| x_N^k - x_N^{k+1} \right\| ^2 \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) \nonumber \\&\quad \ge \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$(4.26)
3.
It follows from (4.2) and the fact that \(\nabla f_N\) is Lipschitz continuous with constant L that,
$$\begin{aligned}&f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) \\&\quad \le f_N\left( x_N^{k+1}\right) + \left\langle \nabla f_N\left( x_N^{k+1}\right) , \left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1}\right) \right\rangle \\&\qquad +\, \frac{L}{2}\left\| b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \right\| ^2 \\&\quad = f_N\left( x_N^{k+1}\right) - \left\langle \lambda ^{k+1}, \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\rangle + \frac{L}{2} \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2. \end{aligned}$$This implies that there exists \(L^*>-\infty \), such that
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1}) + f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) + \frac{\gamma -L}{2} \left\| \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} -b\right\| ^2 \nonumber \\&\quad > L^*, \end{aligned}$$(4.27)where the last inequality holds since \(\gamma >L\) and \(\inf _{{\mathcal {X}}_i}f_i>f_i^*\) for \(i=1,2,\ldots ,N\).
Therefore, it directly follows from (4.18) and \(\gamma >\sqrt{2}L\) that,
$$\begin{aligned}&\frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\sum \limits _{k=0}^K \left( \sum \limits _{i=1}^{N-1} \Vert A_i x_i^{k+1} - A_i x_i^k \Vert ^2 + \Vert x_N^{k+1} - x_N^k \Vert ^2 + \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right) \\&\quad \le {\mathcal {L}}_\gamma (w^0) - L^*. \end{aligned}$$Letting \(K\rightarrow \infty \) gives (4.19) and (4.20).
It also follows from (4.27), (4.18) and \(\gamma >\sqrt{2}L\) that \({\mathcal {L}}_\gamma (w^0) - f_N^* \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1})\). This implies that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_{N-1}^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence by using the coerciveness of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}, i=1,2,\ldots ,N-1\). The boundedness of \(\left( x_N^k, \lambda ^k\right) \) can be obtained by using (4.3), (4.17) and (4.20).
4.
From the definition of \({\mathcal {L}}_\gamma \), it is clear that for \(i=1,\ldots ,N-1\),
$$\begin{aligned} g_i\left( x_i^{k+1}\right) - A_i^\top \lambda ^{k+1} + \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$and
$$\begin{aligned} \nabla f_N\left( x_N^{k+1}\right) - \lambda ^{k+1} + \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$and
$$\begin{aligned} b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$where \(g_i\in \partial \left( f_i + \mathbf {1}_{{\mathcal {X}}_i}\right) \) for \(i=1,2,\ldots ,N-1\).
Combining these relations with (4.4) and (4.5) yields that
$$\begin{aligned}&R_i^{k+1} := \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \\&\quad - \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}(x_{j}^{k}-x_{j}^{k+1})+(x_{N}^{k}-x_{N}^{k+1})\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$for \(i=1,2,\ldots ,N-1\). Therefore, \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). We now need to bound the norms of \(R_i^{k+1}\), \(i=1,\ldots ,N-1\), \(R_N^{k+1}\) and \(R_\lambda ^{k+1}\). It holds that
$$\begin{aligned} \left\| R_i^{k+1} \right\|\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=i+1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) \\&+\, \gamma \left\| A_i^\top \right\| \left\| \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right\| \\\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) + \left\| A_i^\top \right\| \left\| \lambda ^k - \lambda ^{k+1}\right\| \end{aligned}$$and
$$\begin{aligned} \left\| R_N^{k+1} \right\| = \gamma \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| = \left\| \lambda ^k - \lambda ^{k+1}\right\| , \quad \Vert R_\lambda ^{k+1} \Vert = \frac{1}{\gamma } \left\| \lambda ^k - \lambda ^{k+1} \right\| . \end{aligned}$$These relations immediately imply (4.21). \(\Box \)
Proof of Theorem 4.3
1.
It has been proven in Lemma 4.8 that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence. Therefore, we conclude that \(\Omega (w^0)\) is non-empty by the Bolzano–Weierstrass theorem. Let \(w^* = \left( x_1^*,\ldots , x_N^*,\lambda ^*\right) \in \Omega (w^0)\) be a limit point of \(\{w^k = \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) :k=0,1,\ldots \}\). Then there exists a subsequence \(\left\{ w^{k_q} = \left( x_1^{k_q},\ldots , x_N^{k_q},\lambda ^{k_q}\right) :q=0,1,\ldots \right\} \) such that \(w^{k_q}\rightarrow w^*\) as \(q\rightarrow \infty \). Since \(f_i, i=1,\ldots ,N-1\), are lower semi-continuous and \(f_N\) is continuous, we obtain that
$$\begin{aligned} \liminf \limits _{q\rightarrow \infty } f_i(x_i^{k_q}) \ge f_i(x_i^*), \quad i=1,2,\ldots , N. \end{aligned}$$(4.28)From (1.2), we have for any integer k and any \(i=1,\ldots ,N-1\),
$$\begin{aligned} x_i^{k+1} := \mathop {\mathrm{argmin}}\limits _{x_i\in {\mathcal {X}}_i} \ {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) . \end{aligned}$$Letting \(x_i = x_i^*\) in the above, we get
$$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_i^{k+1}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) \le {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i^{*}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) , \end{aligned}$$i.e.,
$$\begin{aligned}&f_i\left( x_i^{k+1}\right) - \left\langle \lambda ^k, A_i x_i^{k+1}\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2 \\&\quad \le f_i\left( x_i^*\right) - \left\langle \lambda ^k, A_i x_i^*\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i^* + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2. \end{aligned}$$Choosing \(k=k_q-1\) in the above inequality and letting q go to \(+\infty \), we obtain
$$\begin{aligned} \limsup \limits _{q\rightarrow +\infty }f_i\left( x_i^{k_q}\right) \le \limsup \limits _{q\rightarrow +\infty } \left( \frac{\gamma }{2}\left\| A_i x_i^{k_q} - A_i x_i^* \right\| ^2 + \left\langle \lambda ^{k_q-1}, A_i x_i^{k_q}- A_i x_i^*\right\rangle \right) + f_i\left( x_i^*\right) ,\nonumber \\ \end{aligned}$$(4.29)for \(i=1,2,\ldots ,N-1\). Here we have used the facts that the sequence \(\{w^k:k=0,1,\ldots \}\) is bounded, that \(\gamma \) is finite, that the distance between two successive iterates tends to zero by (4.20), and that
$$\begin{aligned}&\sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^k + x_N^k - b = \sum \limits _{j=i+1}^{N-1} \left( A_j x_j^k - A_j x_j^{k+1}\right) + \left( x_N^k - x_N^{k+1}\right) \\&\quad +\,\frac{1}{\gamma } \left( \lambda ^k - \lambda ^{k+1}\right) . \end{aligned}$$From (4.20) we also have \(x_i^{k_q-1}\rightarrow x_i^*\) as \(q\rightarrow \infty \), hence (4.29) reduces to \(\limsup \limits _{q\rightarrow \infty }f_i(x_i^{k_q})\le f_i(x_i^*)\). Therefore, combining with (4.28), \(f_i(x_i^{k_q})\) tends to \(f_i(x_i^*)\) as \(q\rightarrow \infty \). Hence, we can conclude that
$$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q})= & {} \lim \limits _{q\rightarrow \infty }\left( \sum \limits _{i=1}^N f_i\left( x_i^{k_q}\right) - \left\langle \lambda ^{k_q}, \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b\right\rangle \right. \\&\left. +\, \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b \right\| ^2\right) \\= & {} \sum \limits _{i=1}^N f_i\left( x_i^{*}\right) - \left\langle \lambda ^{*}, \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b \right\| ^2 \\= & {} {\mathcal {L}}_\gamma (w^{*}). \end{aligned}$$On the other hand, it follows from (4.20) and (4.21) that
$$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\in & {} \partial {\mathcal {L}}_\gamma (w^{k+1}) \end{aligned}$$(4.30)$$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\rightarrow & {} (0,\ldots ,0), \quad k\rightarrow \infty . \end{aligned}$$(4.31)This implies that \((0,\ldots ,0)\in \partial {\mathcal {L}}_\gamma (x_1^*,\ldots ,x_N^*,\lambda ^*)\) due to the closedness of \(\partial {\mathcal {L}}_\gamma \). Therefore, \(w^* = \left( x_1^*,\ldots ,x_N^*,\lambda ^*\right) \) is a critical point of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\).
2.
The proof for this assertion directly follows from Lemma 5 and Remark 5 of [4]. We omit the proof here for succinctness.
3.
Let \(\tilde{L}\) denote the finite limit of \({\mathcal {L}}_\gamma (x_1^k,\ldots ,x_N^k,\lambda ^k)\) as k goes to infinity, i.e.,
$$\begin{aligned} \tilde{L} = \lim \limits _{k\rightarrow \infty } {\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) . \end{aligned}$$Take \(w^*\in \Omega (w^0)\). There exists a subsequence \(w^{k_q}\) converging to \(w^*\) as q goes to infinity. Since we have proven that
$$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q}) = {\mathcal {L}}_\gamma (w^{*}), \end{aligned}$$and \({\mathcal {L}}_\gamma (w^{k})\) is a non-increasing sequence, we conclude that \({\mathcal {L}}_\gamma (w^{*}) = \tilde{L}\), hence the restriction of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\) to \(\Omega (w^0)\) equals \(\tilde{L}\).
\(\square \)
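As a remark (not part of the original proof), the conclusion that \(w^*\) is a critical point of \({\mathcal {L}}_\gamma \) can be unpacked into the KKT conditions of the sharing problem, using only the subproblem structure displayed in part 4 of Lemma 4.8; a sketch:

```latex
% Unpacking 0 \in \partial L_gamma(w^*): the lambda-component gives primal
% feasibility, which removes the penalty terms from the remaining components.
\begin{align*}
0 = \nabla_{\lambda}{\mathcal L}_\gamma(w^*)
  &\;\Longrightarrow\; \textstyle\sum_{i=1}^{N-1} A_i x_i^* + x_N^* = b
  && \text{(primal feasibility)}\\
0 \in \partial_{x_i}{\mathcal L}_\gamma(w^*)
  &\;\Longrightarrow\; A_i^{\top}\lambda^* \in
     \partial\bigl(f_i + \mathbf 1_{{\mathcal X}_i}\bigr)(x_i^*),
     \quad i=1,\ldots,N-1,\\
0 = \nabla_{x_N}{\mathcal L}_\gamma(w^*)
  &\;\Longrightarrow\; \nabla f_N(x_N^*) = \lambda^*
  && \text{(dual stationarity)}
\end{align*}
```

That is, once feasibility annihilates the quadratic penalty, the remaining inclusions are exactly the Lagrangian optimality conditions for (1.1) in sharing form.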
Lin, T., Ma, S. & Zhang, S. Iteration Complexity Analysis of Multi-block ADMM for a Family of Convex Minimization Without Strong Convexity. J Sci Comput 69, 52–81 (2016). https://doi.org/10.1007/s10915-016-0182-0
Keywords
- Alternating direction method of multipliers (ADMM)
- Convergence rate
- Regularization
- Kurdyka–Łojasiewicz property
- Convex optimization