A note on the convergence of ADMM for linearly constrained convex optimization problems

Chen, Liang; Sun, Defeng; Toh, Kim-Chuan

doi:10.1007/s10589-016-9864-7

A note on the convergence of ADMM for linearly constrained convex optimization problems

Published: 01 August 2016

Volume 66, pages 327–343, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Computational Optimization and Applications Aims and scope Submit manuscript

A note on the convergence of ADMM for linearly constrained convex optimization problems

Download PDF

1738 Accesses
45 Citations
1 Altmetric
Explore all metrics

Abstract

This note serves two purposes. Firstly, we construct a counterexample to show that the statement on the convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex optimization problems in a highly influential paper by Boyd et al. (Found Trends Mach Learn 3(1):1–122, 2011) can be false if no prior condition on the existence of solutions to all the subproblems involved is assumed to hold. Secondly, we present fairly mild conditions to guarantee the existence of solutions to all the subproblems of the ADMM and provide a rigorous convergence analysis on the ADMM with a computationally more attractive large step-length that can even exceed the practically much preferred golden ratio of $(1+\sqrt{5})/2$.

On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers

Article 30 May 2015

Analysis of the Alternating Direction Method of Multipliers for Nonconvex Problems

Article 28 January 2021

On the Convergence Properties of a Majorized Alternating Direction Method of Multipliers for Linearly Constrained Convex Optimization Problems with Coupled Objective Functions

Article 02 February 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Let $\mathcal {X}$, $\mathcal {Y}$ and $\mathcal {Z}$ be three finite-dimensional real Euclidean spaces each endowed with an inner product $\langle \cdot ,\cdot \rangle $ and its induced norm $\Vert \cdot \Vert $. Let $f:\mathcal {Y}\rightarrow (-\infty ,+\infty ]$ and $g:\mathcal {Z}\rightarrow (-\infty ,+\infty ]$ be two closed proper convex functions and $\mathcal {A}:\mathcal {X}\rightarrow \mathcal {Y}$ and $\mathcal {B}:\mathcal {X}\rightarrow \mathcal {Z}$ be two linear maps. The effective domains of f and g are denoted by ${{\mathrm{dom}}}\, f$ and ${{\mathrm{dom}}}\, g$, respectively. Consider the following 2-block separable convex optimization problem:

$$\begin{aligned} \min _{y\in \mathcal {Y},z\in \mathcal {Z}}\big \{ f(y)+g(z) \quad \text{ s.t. }\quad \mathcal {A}^*y+\mathcal {B}^*z=c\big \}, \end{aligned}$$

(1)

where $c\in \mathcal {X}$ is the given data and the linear maps $\mathcal {A}^*$ and $\mathcal {B}^*$ are the adjoints of $\mathcal {A}$ and $\mathcal {B}$, respectively.

Let $\sigma >0$ be a given penalty parameter. The augmented Lagrangian function of problem (1) is defined by, for any $(x,y,z)\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$,

$$\begin{aligned} \begin{array}{l} \mathcal {L}_{\sigma }(y,z;x):=f(y)+g(z)+\langle x ,\mathcal {A}^* y+\mathcal {B}^* z-c\rangle +\frac{\sigma }{2}\Vert \mathcal {A}^*y+\mathcal {B}^*z-c\Vert ^2\, . \end{array} \end{aligned}$$

(2)

Choose an initial point $(x^0,y^0,z^0)\in \mathcal {X}\times {{\mathrm{dom}}}\, f\times {{\mathrm{dom}}}\, g$ and a step-length $\tau \in (0,+\infty )$. The classical alternating direction method of multipliers (ADMM) of Glowinski and Marroco [9] and Gabay and Mercier [6] then takes the following scheme for $k=0,1,\ldots $,

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle y^{k+1}=\mathop {{{\mathrm{arg\,min}}}}\limits _{y} \mathcal {L}_{\sigma }(y,z^k; x^k),\\ \displaystyle z^{k+1}=\mathop {{{\mathrm{arg\,min}}}}\limits _{z} \mathcal {L}_{\sigma }(y^{k+1},z; x^k),\\ \displaystyle x^{k+1}=x^k+\tau \sigma (\mathcal {A}^*y^{k+1}+\mathcal {B}^*z^{k+1}-c). \end{array} \right. \end{aligned}$$

(3)

The convergence analysis for the ADMM scheme (3) under certain settings was first conducted by Gabay and Mercier [6], Glowinski [7] and Fortin and Glowinski [5]. One may refer to [1] and [3] for recent surveys on this topic and to [8] for a note on the historical development of the ADMM.

In a highly influential paper^{Footnote 1} written by Boyd et al. [1], it was asserted [Section 3.2.1, Page 17] that if f and g are closed proper convex functions [1, Assumption 1] and the Lagrangian function of problem (1) has a saddle point [1, Assumption 2], then the ADMM scheme (3) converges for $\tau =1$. This, however, turns to be false without imposing the prior condition that all the subproblems involved have solutions. To demonstrate our claim, in this note we shall provide a simple example (see Sect. 3) with the following four nice properties:

(P1)
Both f and g are closed proper convex functions;
(P2)
The Lagrangian function has infinitely many saddle points;
(P3)
The Slater constraint qualification (CQ) holds; and
(P4)
The linear operator $\mathcal {B}$ is nonsingular.

Note that our example to be constructed satisfies the two assumptions made in [1], i.e., (P1) and (P2), and the two additional favorable properties (P3) and (P4). Yet, the ADMM scheme (3) even with $\tau =1$ may not be well-defined for solving problem (1). A closer examination of the proofs given in [1] reveals that the authors mistakenly took for granted the existence of solutions to all the subproblems in (3) under (P1) and (P2) only. Here we will fix this gap by presenting fairly mild conditions to guarantee the existence of solutions to all the subproblems in (3). Moreover we shall analyze the convergence of the ADMM with a computationally more attractive large step-length that can even be bigger than the golden ratio of $(1+\sqrt{5})/2$.

The remaining parts of this note are organized as follows. In Sect. 2, we first present some necessary preliminary results from convex analysis for later discussions and then provide conditions under which the subproblems in the ADMM scheme (3) are solvable, or even admit bounded solution sets, so that this scheme is well-defined. In Sect. 3, based on several results established in Sect. 2, we construct a counterexample that satisfies (P1)–(P4) to show that the conclusion on the convergence of ADMM scheme (3) in [1, Section 3.2.1] can be false without making further assumptions. In Sect. 4, we establish some satisfactory convergence properties for the ADMM scheme (3) with a computationally more attractive large step-length that can even exceed the golden ratio of $(1+\sqrt{5})/2$, under fairly weak assumptions. We conclude this note in Sect. 5.

2 Preliminaries

Let $\mathcal {U}$ be a finite dimensional real Euclidean space endowed with an inner product $\langle \cdot ,\cdot \rangle $ and its induced norm $\Vert \cdot \Vert $. Let $\mathcal {O}: \mathcal {U}\rightarrow \mathcal {U}$ be any self-adjoint positive semidefinite linear operator. For any $u,u'\in \mathcal {U}$, define $\langle u,u'\rangle _{\mathcal {O}}:=\langle u,\mathcal {O}u'\rangle $ and $\Vert u\Vert _\mathcal {O}:=\sqrt{\langle u, \mathcal {O}u\rangle }$ so that

$$\begin{aligned} \begin{array}{l} \langle u,u'\rangle _\mathcal {O}=\frac{1}{2}\left( \Vert u\Vert _\mathcal {O}^2+\Vert u'\Vert _\mathcal {O}^2-\Vert u-u'\Vert _\mathcal {O}^2\right) =\frac{1}{2}\left( \Vert u+u'\Vert _\mathcal {O}^2-\Vert u\Vert _\mathcal {O}^2-\Vert u'\Vert _\mathcal {O}^2\right) . \end{array} \end{aligned}$$

(4)

For any given set $U\subseteq \mathcal {U}$, we denote its relative interior by ${{\mathrm{ri}}}(U)$ and define its indicator function $\delta _{U}:\mathcal {U}\rightarrow (-\infty ,+\infty ]$ by

$$\begin{aligned} \delta _{U}(u):=\left\{ \begin{array}{ll} 0,&{}\quad \text{ if } \,u\in U,\\ +\infty ,&{}\quad \text{ if }\, u\not \in U. \end{array} \right. \end{aligned}$$

Let $\theta :\mathcal {U}\rightarrow (-\infty ,+\infty ]$ be a closed proper convex function. We use ${{\mathrm{dom}}}\,\theta $ and ${{\mathrm{epi}}}(\theta )$ to denote its effective domain and its epigraph, respectively. Moreover, we use $\partial \theta (\cdot )$ to denote the subdifferential mapping [11, Section 23] of $\theta (\cdot )$, which is defined by

$$\begin{aligned} \partial \theta (u):=\{v\in \mathcal {U}\,|\, \theta (u')\ge \theta (u)+\langle v,u'-u\rangle \ \forall \, u'\in \mathcal {U}\}, \quad \forall \, u\in \mathcal {U}. \end{aligned}$$

(5)

It holds that there exists a self-adjoint positive semidefinite linear operator $\varSigma _\theta :\mathcal {U}\rightarrow \mathcal {U}$ such that for any $u,u'$ with $v\in \partial \theta (u)$ and $v'\in \partial \theta (u')$,

$$\begin{aligned} \langle v-v',u-u'\rangle \ge \Vert u-u'\Vert ^{2}_{\varSigma _\theta }. \end{aligned}$$

(6)

Since $\theta $ is closed, proper and convex, by [11, Theorem 8.5] we know that the recession function [11, Section 8] of $\theta $, denoted by $\theta 0^+$, is a positively homogeneous closed proper convex function that can be written as, for an arbitrary $u'\in {{\mathrm{dom}}}\, \theta $,

$$\begin{aligned} \theta 0^+(u)=\lim _{\rho \rightarrow +\infty }\frac{\theta (u'+\rho u)-\theta (u')}{\rho }, \quad \forall \, u\in \mathcal {U}. \end{aligned}$$

The Fenchel conjugate $\theta ^{*}(\cdot )$ of $\theta $ is a closed proper convex function defined by

$$\begin{aligned} \theta ^*(v):=\sup _{u\in \mathcal {U}}\big \{\langle u,v\rangle -\theta (u)\big \}, \quad \forall \, v\in \mathcal {U}. \end{aligned}$$

Since $\theta $ is closed, by [11, Theorem 23.5] we know that

$$\begin{aligned} v\in \partial \theta (u) \Leftrightarrow u\in \partial \theta ^{*}(v). \end{aligned}$$

(7)

The dual of problem (1) takes the form of

$$\begin{aligned} \max _{x\in \mathcal {X}}\big \{h(x):=-f^*(-\mathcal {A}x)-g^*(-\mathcal {B}x)-\langle c,x\rangle \big \}. \end{aligned}$$

(8)

The Lagrangian function of problem (1) is defined by

$$\begin{aligned} \begin{array}{ll} \mathcal {L}(y,z;x):=f(y)+g(z)+\langle x,\mathcal {A}^*y+\mathcal {B}^*z-c\rangle , \quad \forall \, (y,z,x)\in \mathcal {Y}\times \mathcal {Z}\times \mathcal {X}, \end{array} \end{aligned}$$

(9)

which is convex in $(y,z)\in \mathcal {Y}\times \mathcal {Z}$ and concave in $x\in \mathcal {X}$. Recall that we say that the Slater CQ for problem (1) holds if

$$\begin{aligned} \bigl \{(y,z)\ |\ y\in {{\mathrm{ri}}}({{\mathrm{dom}}}\,f),\ z\in {{\mathrm{ri}}}({{\mathrm{dom}}}\,g),\ \mathcal {A}^{*}y+\mathcal {B}^{*}z=c\bigr \}\ne \emptyset . \end{aligned}$$

Under the above Slater CQ, from [11, Corollaries 28.2.2 and 28.3.1] we know that $(\bar{y}, \bar{z})\in {{\mathrm{dom}}}\, f\times {{\mathrm{dom}}}\, g$ is a solution to problem (1) if and only if there exists a Lagrangian multiplier $\bar{x}\in \mathcal {X}$ such that $(\bar{x}, \bar{y}, \bar{z})$ is a saddle point to the Lagrangian function (9), or, equivalently, $(\bar{x}, \bar{y},\bar{z})$ is a solution to the following Karush-Kuhn-Tucker (KKT) system

$$\begin{aligned} -\mathcal {A}x\in \partial f( y),\quad -\mathcal {B}x\in \partial g( z)\quad \text{ and } \quad \mathcal {A}^* y+\mathcal {B}^* z=c. \end{aligned}$$

(10)

Furthermore, if the solution set to the KKT system (10) is nonempty, by [11, Theorem 30.4 and Corollary 30.5.1] we know that a vector $(\bar{x},\bar{y},\bar{z})\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$ is a solution to (10) if and only if $(\bar{y}, \bar{z})$ is an optimal solution to problem (1) and $\bar{x}$ is an optimal solution to problem (8).

In the following, we shall conduct discussions on the existence of solutions to the subproblems in the ADMM scheme (3). Let the augmented Lagrangian function $\mathcal {L}_\sigma $ be defined by (2) and $(x',y',z')\in \mathcal {X}\times {{\mathrm{dom}}}\, f\times {{\mathrm{dom}}}\, g$ be an arbitrarily given point. Consider the following two auxiliary optimization problems:

$$\begin{aligned} \min _{y\in \mathcal {Y}}\big \{F(y):=\mathcal {L}_{\sigma }(y,z';x')\big \} \end{aligned}$$

(11)

and

$$\begin{aligned} \min _{z\in \mathcal {Z}}\big \{G(z):=\mathcal {L}_{\sigma }(y',z;x')\big \}. \end{aligned}$$

(12)

Note that since $z'\in {{\mathrm{dom}}}\, g$, problem (11) is equivalent to

$$\begin{aligned} \begin{array}{l} \min \limits _{y\in \mathcal {Y}} \big \{ \widehat{F}(y):=f(y)+\frac{\sigma }{2}\Vert \mathcal {A}^*y+(\mathcal {B}^*z'-c+{x'}/{\sigma })\Vert ^2\big \}. \end{array} \end{aligned}$$

(13)

We now study under what conditions the problems (11) and (12) are solvable or have bounded solution sets. For this purpose, we consider the following assumptions:

Assumption 1

$f0^+(y)>0$ for any $y\in \mathcal {M}$, where

$$\begin{aligned} \mathcal {M}:=\{y\in \mathcal {Y}\,|\, \mathcal {A}^*y=0\}\backslash \{y\in \mathcal {Y}\, |\, f0^+(-y) = -f0^+(y) = 0\}. \end{aligned}$$

Assumption 2

$g0^+(z)>0$ for any $z\in \mathcal {N}$, where

$$\begin{aligned} \mathcal {N}:=\{z\in \mathcal {Z}\,|\, \mathcal {B}^*z=0\}\backslash \{z \in \mathcal {Z}\, |\, g0^+(-z)=-g0^+(z) = 0\}. \end{aligned}$$

Assumption 3

$f0^+(y)>0$ for any $0\ne y\in \{y\in \mathcal {Y}\, |\, \mathcal {A}^*y=0\}$.

Assumption 4

$g0^+(z)>0$ for any $0\ne z\in \{z\in \mathcal {Z}\, |\, \mathcal {B}^*z=0\}$.

Note that Assumptions 1–4 are not very restrictive. For example, if both f and g are coercive, in particular if they are norm functions, all the four assumptions hold automatically without any other conditions.

Proposition 2.1

It holds that

(a):: Problem (11) is solvable if Assumption 1 holds, and problem (12) is solvable if Assumption 2 holds.
(b):: The solution set to problem (11) is nonempty and bounded if and only if Assumption 3 holds, and the solution set to problem (12) is nonempty and bounded if and only if Assumption 4 holds.

Proof

(a) We first show that when Assumption 1 holds, the solution set to problem (13) is not empty. Consider the recession function $\widehat{F}0^+$ of $\widehat{F}$. On the one hand, by using [11, Theorem 9.3] and the second example given in [11, Pages 67–68], we know that for any $y\in \mathcal {Y}$ such that $\mathcal {A}^*y\ne 0$, one must have $\widehat{F}0^+(y)=+\infty $. On the other hand, for any $y\in \mathcal {Y}$ such that $\mathcal {A}^*y=0$, by the definition of $\widehat{F}(y)$ in (13) we have

$$\begin{aligned} \widehat{F}0^+(y)=f0^+(y) + \sigma \langle \mathcal {A}(\mathcal {B}^*z'-c+{x'}/{\sigma }), y\rangle =f0^+(y). \end{aligned}$$

Hence, by Assumption 1 we know that $\widehat{F}0^+(y)>0$ for all $y\in \mathcal {Y}$ except for those satisfying $\widehat{F}0^+(-y)=-\widehat{F}0^+(y)=0$. Then, by [11, part (b) in Corollary 13.3.4] and the closedness of $\hat{F}$, it holds that $0\in {{\mathrm{ri}}}({{\mathrm{dom}}}\, \widehat{F}^*)$. Furthermore, by [11, Theorem 23.4] we know that $\partial \widehat{F}^*(0)$ is a nonempty set, i.e., there exists a $\hat{y}\in \mathcal {Y}$ such that $\hat{y}\in \partial \widehat{F}^*(0)$. By noting that $\widehat{F}$ is closed and using (7), we then have $0\in \partial \widehat{F}(\hat{y})$, which implies that $\hat{y}$ is an optimal solution to problem (13) and hence to problem (11).

By repeating the above discussions we know that problem (12) is also solvable if Assumption 2 holds.

(b) By reorganizing the proofs for part (a), we can see that Assumption 3 holds if and only if $\widehat{F} 0^+ (y) >0$ for all $0\ne y\in \mathcal {Y}$. As a result, if Assumption 3 holds, from [11, Theorem 27.2] we know that problem (13) has a nonempty and bounded solution set. Conversely, if the solution set to problem (13) is nonempty and bounded, by [11, Corollary 8.7.1] we know that there does not exist any $0\ne y\in \mathcal {Y}$ such that $\widehat{F}0^{+}(y)\le 0$, so that Assumption 3 holds. Similarly, we can prove the remaining results of part (b). This completes the proof of the proposition. $\square $

Based on Proposition 2.1 and its proof, we have the following result.

Corollary 2.1

If problem (1) has a nonempty and bounded solution set, then both problems (11) and (12) have nonempty and bounded solution sets.

Proof

Since problem (1) has a nonempty and bounded solution set, there does not exist any $0\ne y\in \mathcal {Y}$ with $\mathcal {A}^*y=0$ such that $f0^+(y)\le 0$, or $0\ne z\in \mathcal {Z}$ with $\mathcal {B}^*z=0$ such that $g0^+(z)\le 0$. Thus, Assumptions 3 and 4 hold. Then, by part (b) in Proposition 2.1 we know that the conclusion of Corollary 2.1 holds. $\square $

A function $\varphi :\mathcal {U}\rightarrow (-\infty ,+\infty ]$ is called piecewise linear-quadratic [12, Definition 10.20] if its effective domain can be represented as the union of finitely many polyhedral sets, relative to each of which this function is given by an expression of $\frac{1}{2}\langle u, \mathcal {Q}u\rangle +\langle a,u\rangle +\alpha $ for some scalar $\alpha \in \mathfrak {R}$, vector $a\in \mathcal {U}$, and symmetric linear operator $\mathcal {Q}:\mathcal {U}\rightarrow \mathcal {U}$.

Proposition 2.2

If f (or g) is a closed proper piecewise linear-quadratic convex function, especially a polyhedral convex function, we can replace the “ >” in Assumption 1 ( or 2 ) by “ $\ge $” and the corresponding sufficient condition in part (a ) of Proposition 2.1 is also necessary.

Proof

Note that when f is a closed piecewise linear-quadratic convex function, the function $\widehat{F}$ defined in problem (13) is a piecewise linear-quadratic convex function with ${{\mathrm{dom}}}\widehat{F}= {{\mathrm{dom}}}f$ being a closed convex polyhedral set. Then by [12, Theorem 11.14 (b)] we know that $\widehat{F}^*$ is also a piecewise linear-quadratic convex function whose effective domain is a closed convex polyhedral set. By repeating the discussions for proving part (a) of Proposition 2.1 and using [11, part (a) in Corollary 13.3.4] we can obtain that Assumption 1 with “>” being replaced by “ $\ge $” holds if and only if $0\in {{\mathrm{dom}}}\, \widehat{F}^*$, or $\partial \widehat{F}^*(0)$ is a nonempty set [12, Proposition 10.21], which is equivalent to saying that ${{\mathrm{arg\,min}}}\widehat{F}$ is a nonempty set. If g is piecewise linear-quadratic we can get a similar result. $\square $

Finally, we need the following easy-to-verify result on the convergence of quasi-Fejér monotone sequences.

Lemma 2.1

Let $\{a_k\}_{k\ge 0}$ be a nonnegative sequence of real numbers satisfying $a_{k+1}\le a_k+\varepsilon _k$ for all $k\ge 0$, where $\{\varepsilon _k\}_{k\ge 0}$ is a nonnegative and summable sequence of real numbers. Then the quasi-Fejér monotone sequence $\{a_k\}$ converges to a unique limit point.

3 A Counterexample

In this section, we shall provide an example that satisfies all the properties (P1)-(P4) stated in Sect. 1 to show that the solution set to a certain subproblem in the ADMM scheme (3) can be empty if no further assumptions on f, g or $\mathcal {A}$ are made. This means that the convergence analysis for the ADMM stated in [1] can be false. The construction of this example relies on Proposition 2.1. The parameter $\sigma $ and the initial point $(x^{0},y^{0},z^{0})$ in the counterexample are just selected for the convenience of computations and one can construct similar examples for arbitrary penalty parameters and initial points.

We now present this example, which is a 3-dimensional 2-block convex optimization problem.

In this example, f and g are closed proper convex functions with ${{\mathrm{ri}}}({{\mathrm{dom}}}\, f)={{\mathrm{dom}}}\, f=\mathfrak {R}^2$ and ${{\mathrm{ri}}}({{\mathrm{dom}}}\, g)=\{z\, |\, z>0\}\subset {{\mathrm{dom}}}\, g $. The vector $(0,3,1)\in \mathfrak {R}^3$ lies in ${{\mathrm{ri}}}({{\mathrm{dom}}}\, f)\times {{\mathrm{ri}}}({{\mathrm{dom}}}\, g)$ and satisfies the constraint in problem (14). Hence, for problem (14), the Slater CQ holds. It is easy to check that the optimal solution set to problem (14) is given by

$$\begin{aligned} \{(y_1,y_2,z)\in \mathfrak {R}^3\, |\, y_1\ge -\log _e 2,\ y_2=2, \ z=0\} \end{aligned}$$

and the corresponding optimal objective value is 4. The Lagrangian function of problem (14) is given by

$$\begin{aligned} \mathcal {L}(y_1,y_2,z;x)= & {} \max (e^{-y_1}+y_2,y_2^2)+\delta _{\ge 0}(z)+x(y_2-z-2),\\&\forall \, (y_1,y_2,z,x)\in \mathfrak {R}^4\, . \end{aligned}$$

We now compute the dual of problem (14) based on this Lagrangian function.

Lemma 3.1

The objective function of the dual of problem (14) is given by

$$\begin{aligned} h(x)= \left\{ \begin{array}{ll} -x^2/4-2x, &{} \quad {\mathrm{if}}\; x\in (-\infty ,-2),\\ 1-x, &{}\quad {\mathrm{if}}\; x\in [-2,-1),\\ -2x, &{}\quad {\mathrm{if}}\; x\in [-1,0],\\ -\infty , &{}\quad {\mathrm{if}}\; x\in (0+\infty ). \end{array} \right. \end{aligned}$$

Proof

By the definition of the dual objective function, we have

$$\begin{aligned} h(x)\displaystyle= & {} \inf _{y_1,y_2,z} \mathcal {L}(y_1,y_2,z;x)\\ \displaystyle= & {} \inf _{z\ge 0,y_2}\big \{\inf _{y_1}(\max (e^{-y_1}+y_2,y_2^2)+(y_2-z-2)x)\big \}\\ \displaystyle= & {} \inf _{z\ge 0,y_2}\{\max (y_2,y_2^2)+y_2x-zx-2x\}\\ \displaystyle= & {} \min _{y_2}\Big ( \inf _{y_2\in [0,1],z\ge 0}\big \{y_2\!+\!y_2x\!-\!zx\!-\!2x\big \}, \inf _{y_2\not \in [0,1],z\!\ge \!0}\big \{y_2^2+y_2x-zx-2x\big \}\Big ). \end{aligned}$$

For any given $x\in \mathfrak {R}$, we have

$$\begin{aligned} \begin{array}{l} \displaystyle \inf _{y_2\in [0,1],z\ge 0}\big \{y_2+y_2x-zx-2x\big \}\\ \quad \displaystyle =\inf _{y_2\in [0,1]}\big \{y_2(1+x)\big \}+\inf _{z\ge 0}\big \{-zx\big \}-2x =\left\{ \begin{array}{ll} 1-x, &{}\quad \text{ if }\; x< -1,\\ -2x, &{}\quad \text{ if }\; x\in [-1,0],\\ -\infty , &{}\quad \text{ if }\; x> 0. \end{array} \right. \end{array} \end{aligned}$$

Moreover, for any $x\in \mathfrak {R}$, it holds that

$$\begin{aligned} \begin{array}{l} \displaystyle \inf _{y_2\not \in [0,1],z\ge 0}\big \{y_2^2+y_2x-zx-2x\big \}\\ \quad \displaystyle =\inf _{y_2\not \in [0,1]}\big \{y_2^2+y_2x+x^2/4-x^2/4-2x\big \}+\inf _{z\ge 0}\big \{-zx\big \}\\ \quad \displaystyle =\inf _{y_2\not \in [0,1]}\big \{(y_2+x/2)^2\big \}+\inf _{z\ge 0}\big \{-zx\big \}-x^2/4-2x\\ \quad =\left\{ \begin{array}{ll} -x^2/4-2x, &{}\quad \text{ if }\; x< -2,\\ 1-x, &{}\quad \text{ if }\; x\in [-2,-1],\\ -2x, &{}\quad \text{ if }\; x\in [-1,0],\\ -\infty , &{}\quad \text{ if }\; x> 0. \end{array} \right. \end{array} \end{aligned}$$

Then by combining the above discussions on the two cases we obtain the conclusion of this lemma. $\square $

By Lemma 3.1, one can see that the optimal solution to the dual of problem (14) is $\bar{x}=-4$ and the optimal value of the dual of problem (14) is $h(-4)=4$ (see Fig. 1). Moreover, the set of solutions to the KKT system (10) for problem (14) is given by

$$\begin{aligned} \big \{(y_1,y_2,z, x)\in \mathfrak {R}^4\ |\ y_1\ge -\log _e 2,\ y_2=2,\ z=0, \ x=-4 \big \}. \end{aligned}$$

Next, we consider solving problem (14) by using the ADMM scheme (3). For convenience, let $\sigma =1$ and set the initial point $(x^0, y^0_1, y_2^0, z^0)=(0,0,0,0)$. Now, one should compute $(y_{1}^1,y_{2}^1)$ by solving

$$\begin{aligned} \min _{y_1,y_2} \mathcal {L}_\sigma (y_1,y_2,z^0;x^0). \end{aligned}$$

Define the function $I(\cdot ):\mathfrak {R}\rightarrow [-\infty , +\infty ]$ by

$$\begin{aligned} \begin{array}{ll} I(y_2):&{}=\displaystyle \inf _{y_1} \mathcal {L}_\sigma (y_1,y_2,z^0;x^0)\\ &{}= \displaystyle \inf _{y_1}\Big \{ \max \big (e^{-y_1}+y_2,y_2^2\big )+(y_2-2)^2/2\Big \}\\ &{} =\left\{ \begin{array}{ll} \frac{3}{2} y_2^2-2y_2+2&{}\quad \text{ if }\; y_{2}\not \in [0,1],\\ \frac{1}{2} y^2_2-y_2+2 &{}\quad \text{ if }\; y_{2}\in [0,1]. \end{array} \right. \end{array} \end{aligned}$$

By direct calculations we can see that the above infimum is attained at $\bar{y}_2=1$ with $I(\bar{y}_2)=1.5$ (see Fig. 1). However, we have for any $y_1\in \mathfrak {R}$,

$$\begin{aligned} \mathcal {L}_\sigma (y_1,1,0 ;0) = \max (e^{-y_1}+1,1)+0.5 = e^{-y_{1}}+1.5 >\inf _{y_1,y_2} \mathcal {L}_\sigma (y_1,y_2,z^0;x^0). \end{aligned}$$

This means that although $\inf _{y_1,y_2}\mathcal {L}_\sigma (y_1,y_2,z^0; x^0)=1.5$ is finite, it cannot be attained at any $(y_1,y_2)\in \mathfrak {R}^2$. Then the subproblem for computing $(y_{1}^{1},y_{2}^{1})$ is not solvable and hence the ADMM scheme (3) is not well-defined. Note that for problem (14), Assumption 1 fails to hold since the direction $y=(1,0)$ satisfies $\mathcal {A}^{*}y=0$ and $f0^{+}(y)=0$ but $f0^{+}(-y)=+\infty $.

Remark 3.1

The counterexample constructed here is very simple. Yet, one may still ask if the objective function f about $(y_1, y_2)$ in problem (14) can be replaced by an even simpler quadratic function. Actually, this is not possible as Assumption 1 holds if f is a quadratic function and the original problem has a solution. Specifically, suppose that $\alpha \in \mathfrak {R}$ is a given number, $\mathcal {Q}:\mathcal {Y}\rightarrow \mathcal {Y}$ is a self-adjoint positive semidefinite linear operator and $a\in \mathcal {Y}$ is a given vector while f takes the following form

$$\begin{aligned} \begin{array}{l} f(y)=\frac{1}{2}\langle y,\mathcal {Q}y\rangle +\langle a, y\rangle +\alpha ,\quad \forall \, y\in \mathcal {Y}. \end{array} \end{aligned}$$

From [11, Pages 67–68] we know that

$$\begin{aligned} f0^+(y)=\left\{ \begin{array}{ll} \langle a,y\rangle , &{}\quad \text{ if }\;\mathcal {Q}y=0,\\ +\infty ,&{}\quad \text{ if }\;\mathcal {Q}y\ne 0. \end{array} \right. \end{aligned}$$

(15)

If problem (1) has a solution, one must have $f0^+(y)\ge 0$ whenever $\mathcal {A}^*y=0$. This, together with (15), clearly implies that Assumption 1 holds.

4 Convergence properties of ADMM

The example presented in the previous section motivates us to reconsider the convergence of the ADMM scheme (3). In the following, we will revisit the convergence properties of the ADMM, with a computationally more attractive large step-length.

For convenience, we introduce some notations, which will be used throughout this section. We use $\varSigma _{f}$ and $\varSigma _{g}$ to denote the two self-adjoint positive semidefinite linear operators whose definitions, corresponding to the two functions f and g in problem (1), can be drawn from (6). Let $(\bar{x},\bar{y},\bar{z})\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$ be a given vector, whose definition will be specified latter. We denote $x_e:=x-\bar{x}$, $y_e:=y-\bar{y}$ and $z_e:=z-\bar{z}$ for any $(x,y,z)\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$. If additionally the ADMM scheme (3) generates an infinite sequence $\{(x^{k},y^{k},z^{k})\}$, for $k\ge 0$ we denote $x^k_e:=x^k-\bar{x}$, $y^k_e:=y^k-\bar{y}$ and $z^k_e:=z^k-\bar{z}$, and define the following auxiliary notations

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle u^{k}:=-\mathcal {A}[x^{k}+(1-\tau )\sigma (\mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k})+\sigma \mathcal {B}^*(z^{k-1}-z^{k})],\\ \displaystyle v^{k}:=-\mathcal {B}[x^{k}+(1-\tau )\sigma (\mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k})],\\ \varPsi _k:=\frac{1}{\tau \sigma }\Vert x^k_e\Vert ^2 +\Vert z_e^k\Vert ^2_{\sigma \mathcal {B}\mathcal {B}^*},\\ \varPhi _k: =\varPsi _{k} +\max (1-\tau , 1-\tau ^{-1})\sigma \Vert \mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k}\Vert ^2 \end{array} \right. \end{aligned}$$

(16)

with the convention $z^{-1}\equiv z^0$. Based on these notations, we have the following result.

Proposition 4.1

Suppose that $(\bar{x},\bar{y},\bar{z})\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$ is a solution to the KKT system (10), and that the ADMM scheme (3) generates an infinite sequence $\{(x^{k},y^{k},z^{k})\}$ (which is guaranteed to be true if Assumptions 1 and 2 hold, cf. Proposition 2.1). Then, for any $k\ge 1$,

$$\begin{aligned}&u^{k}\in \partial f(y^{k}), \quad v^{k}\in \partial g(z^{k}), \end{aligned}$$

(17)

$$\begin{aligned}&\varPhi _k-\varPhi _{k+1}\ge 2\Vert y_e^{k+1}\Vert _{\varSigma _f}^2 +\min (\tau ,1+\tau -\tau ^2)\sigma \Vert \mathcal {B}^*(z^{k+1}-z^k)\Vert ^2\nonumber \\&\quad +\,2\Vert z_e^{k+1}\Vert _{\varSigma _g}^2+\min (1,1-\tau +\tau ^{-1})\sigma \Vert \mathcal {A}^*y_e^{k+1}+\mathcal {B}^*z_e^{k+1}\Vert ^2 \end{aligned}$$

(18)

and

$$\begin{aligned} \varPsi _{k}-\varPsi _{k+1}\ge & {} 2\Vert y_e^{k+1}\Vert _{\varSigma _f}^2 +2\Vert z_e^{k+1}\Vert _{\varSigma _g}^2 +\sigma \Vert \mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k}_e\Vert ^{2}\nonumber \\&+\,(1-\tau )\sigma \Vert \mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k+1}_e\Vert ^2. \end{aligned}$$

(19)

Proof

For any $k\ge 1$, the inclusions in (17) directly follow from the first-order optimality condition of the subproblems in the ADMM scheme (3). The inequality (18) has been proved in Fazel et al.^{Footnote 2} [4, parts (a) and (b) in Theorem B.1]. Meanwhile, by using (B.12) in [4, Theorem B.1] and (4) we can get

$$\begin{aligned}&\frac{1}{2\tau \sigma }( \Vert x^k_e\Vert ^2-\Vert x^{k+1}_e\Vert ^2) -\frac{\sigma }{2}\Vert \mathcal {B}^*(z^{k+1}-z^k)\Vert ^2 -\frac{\sigma }{2}\Vert \mathcal {B}^*z_e^{k+1}\Vert ^2 +\frac{\sigma }{2}\Vert \mathcal {B}^*z_e^k\Vert ^2\\&\quad \quad -\frac{2-\tau }{2}\sigma \Vert \mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k+1}_e\Vert ^2 +\sigma \langle \mathcal {B}^*(z^{k+1}-z^k), \mathcal {A}^*y_e^{k+1}+\mathcal {B}^*z_e^{k+1}\rangle \\&\quad \ge \Vert y_e^{k+1}\Vert ^2_{\varSigma _f} +\Vert z_e^{k+1}\Vert ^2_{\varSigma _g}, \end{aligned}$$

which, together with the definition of $\varPsi _{k}$ in (16), implies (19). This completes the proof. $\square $

Now, we are ready to present several convergence properties of the ADMM scheme (3).

Theorem 4.1

Assume that the solution set to the KKT system (10) for problem (1) is nonempty. Suppose that the ADMM scheme (3) generates an infinite sequence $\{(x^{k},y^{k},z^{k})\}$, which is guaranteed to be true if Assumptions 1 and 2 hold. Then, if $\tau \in \big (\, 0,(1+\sqrt{5}\,)/2\, \big )$, one has the following results:

(a)
the sequence $\{x^k\}$ converges to an optimal solution to the dual problem (8), and the primal objective function value sequence $\{f(y^k)+g(z^k)\}$ converges to the optimal value;
(b)
the sequences $\{f(y^{k})\}$ and $\{g(z^{k})\}$ are bounded, and if Assumptions 3 and 4 hold, the sequences $\{y^{k}\}$ and $\{z^{k}\}$ are also bounded;
(c)
any accumulation point of the sequence $\{(x^{k},y^{k},z^{k})\}$ is a solution to the KKT system (10), and if $(x^\infty , y^\infty ,z^\infty )$ is one of its accumulation points, then $ \mathcal {A}^{*} y^{k} \rightarrow \mathcal {A}^{*} y^{\infty }$, $ \varSigma _f y^{k} \rightarrow \varSigma _f y^{\infty }$, $ \mathcal {B}^{*} z^{k} \rightarrow \mathcal {B}^{*} z^{\infty }$ and $ \varSigma _g z^{k} \rightarrow \varSigma _g z^{\infty }$ as $k\rightarrow \infty $;
(d)
if $\varSigma _f+\mathcal {A}\mathcal {A}^*\succ 0$ and $\varSigma _g+\mathcal {B}\mathcal {B}^*\succ 0$, then each of the subproblems in the ADMM scheme (3) has a unique optimal solution and the whole sequence $\{(x^{k},y^{k},z^{k})\}$ converges to a solution to the KKT system (10).

Proof

Let $(\bar{x},\bar{y},\bar{z})\in \mathcal {X}\times \mathcal {Y}\times \mathcal {Z}$ be an arbitrary solution to the KKT system (10) of problem (1). We first establish some basic results and then prove (a)–(d) one by one. In the following, the notations provided at the beginning of this section are used.

Note that $\Vert \mathcal {A}^{*}y_{e}^{k}\Vert \le \Vert \mathcal {A}^{*}y_{e}^{k}+\mathcal {B}^{*}z_{e}^{k}\Vert +\Vert \mathcal {B}^{*}z_{e}^{k}\Vert $ for any $k\ge 0$. Since $\tau \in (0,(1+\sqrt{5})/2)$, by using (16) and (18) we obtain that the sequences

$$\begin{aligned} \{\Vert x^k\Vert \},\quad \{\Vert y^{k}\Vert _{\sigma \mathcal {A}\mathcal {A}^{*}}\}\ \quad \text{ and } \quad \{\Vert z^k\Vert _{\sigma \mathcal {B}\mathcal {B}^*}\} \end{aligned}$$

(20)

are all bounded, and

$$\begin{aligned} \sum _{k=0}^{\infty }\Vert y_e^{k}\Vert _{\varSigma _f}^2,\quad \sum _{k=0}^{\infty }\Vert z_e^{k}\Vert _{\varSigma _g}^2,\quad \sum _{k=0}^{\infty }\Vert \mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k}\Vert ^2,\quad \sum _{k=0}^{\infty }\Vert \mathcal {B}^*(z^{k+1}-z^k)\Vert ^2\, < +\infty . \end{aligned}$$

(21)

This, consequently, implies that $\{u^k\}$ and $\{v^k\}$ are bounded sequences. In the following, we prove (a)–(d) separately.

(a) Since $\{x^k\}$ is a bounded sequence, for any one of its accumulation points, e.g. $x^\infty \in \mathcal {X}$, it admits a subsequence, say, $\{x^{k_{j}}\}_{j\ge 0}$, such that $\lim \limits _{j\rightarrow \infty }x^{k_{j}}=x^{\infty }$. By using the definitions of $\{u^k\}$ and $\{v^k\}$ in (16) we obtain that

$$\begin{aligned} u^\infty :=\lim _{j\rightarrow \infty } u^{k_j}=-\mathcal {A}x^{\infty } \quad \text{ and } \quad v^\infty :=\lim _{j\rightarrow \infty } v^{k_j} =-\mathcal {B}x^{\infty } . \end{aligned}$$

(22)

From (7) and (17) we know that for any $k\ge 1$, $ y^{k}\in \partial f^*(u^{k}) $ and $ z^{k}\in \partial g^*(v^{k}) $. Hence, we can get $ \mathcal {A}^*y^{k}\in \mathcal {A}^*\partial f^*(u^{k}) $ and $ \mathcal {B}^*z^{k}\in \mathcal {B}^*\partial g^*(v^{k}) $ so that

$$\begin{aligned} \mathcal {A}^*y^{k_j}+\mathcal {B}^*z^{k_j}\in \mathcal {A}^*\partial f^*(u^{k_j})+\mathcal {B}^*\partial g^*(v^{k_j}), \quad \forall \, j\ge 0. \end{aligned}$$

(23)

Then, by using (21), (22), (23) and the outer semi-continuity of subdifferential mappings of closed proper convex functions we know that

$$\begin{aligned} c\in \mathcal {A}^*\partial f^*(-\mathcal {A}x^\infty )+\mathcal {B}^*\partial g^*(-\mathcal {B}x^\infty ), \end{aligned}$$

which implies that $x^\infty $ is a solution to the dual problem (8). Therefore, we can conclude that any accumulation of $\{x^k\}$ is a solution to the dual problem (8). To finish the proof of part (a), we need to show that $\{x^k\}$ is a convergent sequence. This will be done in the following.

We define the sequence $\{\phi _k\}_{k\ge 1}$ by

$$\begin{aligned} \phi _k:= \sigma \Vert z_e^k\Vert ^2_{\mathcal {B}\mathcal {B}^*} +\max (1-\tau , 1-\tau ^{-1})\sigma \Vert \mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k}\Vert ^2. \end{aligned}$$

Since $\tau \in (0,(1+\sqrt{5})/2)$, from (18) in Proposition 4.1 and the fact that $\varPhi _k\ge \phi _k$, we know that $\{\phi _k\}$ is a nonnegative and bounded sequence. Thus, there exists a subsequence of $\{\phi _{k}\}$, say $\{\phi _{k_{l}}\}$, such that $ \lim \limits _{l\rightarrow \infty }\phi _{k_l}=\liminf \limits _{k\rightarrow \infty }\phi _k$. Since $\{x^{k_l}\}$ is bounded, it must have a convergent subsequence, say, $\{x^{k_{l_i}}\}$, such that $\tilde{x}:=\lim \limits _{i\rightarrow \infty }x^{k_{l_i}}$ exists. Note that $(\tilde{x}, \bar{y},\bar{z})$ is a solution to the KKT system (10). Therefore, without loss of generality, we can reset $\bar{x}=\tilde{x}$ from now on. By using (18) in Proposition 4.1 we know that the nonnegative sequence $\{\varPhi _k\}$ is monotonically nonincreasing, and

$$\begin{aligned} \lim _{k\rightarrow \infty }\varPhi _{k}=\lim _{i\rightarrow \infty }\varPhi _{k_{l_i}}=\lim _{i\rightarrow \infty }\big (\frac{1}{\tau \sigma }\Vert x_e^{k_{l_{i}}}\Vert ^2+\phi _{k_{l_i}}\big )=\liminf _{k\rightarrow \infty }\phi _k. \end{aligned}$$

(24)

Since $\frac{1}{\tau \sigma }\Vert x_e^{k}\Vert ^2=\varPhi _k-\phi _k$, we have

$$\begin{aligned} \limsup _{k\rightarrow \infty }\frac{1}{\tau \sigma }\Vert x_e^{k}\Vert ^2=\limsup _{k\rightarrow \infty }\{\varPhi _k-\phi _k\} \le \limsup _{k\rightarrow \infty }\, \varPhi _k-\liminf _{k\rightarrow \infty }\phi _k=0, \end{aligned}$$

(25)

which indicates that $\{x^k\}$ is a convergent sequence.

Now we study the convergence of the primal objective function values. On the one hand, since $(\bar{x},\bar{y},\bar{z})$ is a saddle point to the Lagrangian function $\mathcal {L}(\cdot )$ defined by (9), we have for any $k\ge 1$, $\mathcal {L}(\bar{y},\bar{z}; \bar{x})\le \mathcal {L}(y^{k},z^{k};\bar{x})$. This, together with $\mathcal {A}^*\bar{y}+\mathcal {B}^*\bar{z}=c$, implies that for any $k\ge 1$,

$$\begin{aligned} \begin{array}{l} f(\bar{y})+g(\bar{z})-\langle \bar{x},\mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k}\rangle \le f(y^{k})+g(z^{k}). \end{array} \end{aligned}$$

(26)

On the other hand, from (17) and (5) we know that

$$\begin{aligned} \begin{array}{l} f(y^{k})+\langle u^{k}, \bar{y}-y^{k}\rangle \le f(\bar{y}) \quad \text{ and } \quad g(z^{k})+\langle v^{k},\bar{z}-z^{k}\rangle \le g(\bar{z}). \end{array} \end{aligned}$$

By combining the above two inequalities together and using (16) we can get

$$\begin{aligned}&f(\bar{y})+g(\bar{z}) -\langle x^{k},\mathcal {A}^{*}y_e^{k}+\mathcal {B}^{*}z_e^{k}\rangle -\sigma \langle \mathcal {B}^{*}(z^{k-1}-z^{k}),\mathcal {A}^{*} y_e^{k}\rangle \nonumber \\&\quad -(1-\tau )\sigma \Vert \mathcal {A}^{*}y_e^{k}+\mathcal {B}^{*}z_e^{k}\Vert ^2 \ge f(y^{k})+g(z^{k}). \end{aligned}$$

(27)

Since the sequences in (20) are bounded, by using (21) and the fact that any nonnegative summable sequence should converge to zero we know that the left-hand-sides of both (26) and (27) converge to $f(\bar{y})+g(\bar{z})$ when $k\rightarrow \infty $. Consequently, $\lim \limits _{k\rightarrow \infty }\{f(y^{k})+g(z^{k})\}=f(\bar{y})+g(\bar{z})$ by the squeeze theorem. Thus, part (a) is proved.

(b) From (17) we konw that for any $k\ge 1$,

$$\begin{aligned} f(y^{k}) \le f(\bar{y})-\langle u^{k},\bar{y}-y^{k}\rangle = f(\bar{y})-\langle u^{k},\bar{y}\rangle +\langle u^{k},y^{k}\rangle . \end{aligned}$$

(28)

On the one hand, from the boundedness of $\{u^k\}$ we know that the sequence $\{-\langle u^{k},\bar{y}\rangle \}$ is bounded. On the other hand, from (21) and the boundedness of the sequences in (20) we can use

$$\begin{aligned} \langle u^{k},y^k\rangle= & {} -\langle x^{k},\mathcal {A}^*y^k\rangle -(1-\tau )\sigma \langle \mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k} ,\mathcal {A}^*y^k\rangle \\&-\sigma \langle \mathcal {B}^*(z^{k-1}-z^{k}),\mathcal {A}^*y^k\rangle \end{aligned}$$

to get the boundedness of the sequence $\{\langle u^{k},y^k\rangle \}$. Hence, from (28) we know the sequence $\{f(y^{k})\}$ is bounded from above. From (10) we know

$$\begin{aligned} f(y^{k})\ge f(\bar{y})+\langle -\mathcal {A}\bar{x}, y^{k}-\bar{y}\rangle = f(\bar{y})-\langle \bar{x}, \mathcal {A}^*y_e^{k}\rangle , \end{aligned}$$

which, together with the fact that the sequences in (20) are bounded, implies that $\{f(y^{k})\}$ is bounded from below. Consequently, $\{f(y^{k})\}$ is a bounded sequence. By using similar approach, we can obtain that $\{g(z^{k})\}$ is also a bounded sequence.

Next, we prove the remaining part of (b) by contradiction. Suppose that Assumption 3 holds and the sequence $\{y^{k}\}$ is unbounded. Note that the sequence $\{y^{k}/(1+\Vert y^{k}\Vert )\}$ is always bounded. Thus it must have a subsequence $\{y^{k_{j}}/(1+\Vert y^{k_{j}}\Vert )\}_{j\ge 0}$, with $\{\Vert y^{k_{j}}\Vert \}$ being unbounded and non-decreasing, converging to a certain point $\xi \in \mathcal {Y}$. From the boundedness of the sequences in (20) we know that $\{\mathcal {A}^{*} y^{k}\}$ is bounded. Then we have

$$\begin{aligned} \mathcal {A}^{*}\xi =\mathcal {A}^{*}\left( \lim _{j\rightarrow \infty } \frac{y^{k_{j}}}{1+\Vert y^{k_{j}}\Vert } \right) = \lim _{j\rightarrow \infty } \frac{\mathcal {A}^{*}y^{k_{j}}}{1+\Vert y^{k_{j}}\Vert }=0. \end{aligned}$$

By noting that $\Vert \xi \Vert =1$, one has $\xi \in \{y\in \mathcal {Y}\, |\, y\ne 0, \mathcal {A}^*y=0\}$. On the other hand, define the sequence $\{d^{k_j}\}_{j\ge 0}$ by

$$\begin{aligned} d^{k_j}:=\left( y^{k_{j}}/(1+\Vert y^{k_{j}}\Vert )\, ,\, f(y^{k_{j}})/(1+\Vert y^{k_{j}}\Vert )\right) . \end{aligned}$$

From the boundedness of the sequence $\{f(y^{k_j})\}$ and the definition of $\xi $ we know that $\lim _{j\rightarrow \infty }d^{k_j}=(\xi ,0)$. Since $(y^{k_{j}},f(y^{k_{j}}))\in {{\mathrm{epi}}}(f)$, by [11, Theorem 8.2] we know that $(\xi ,0)$ is a recession direction of ${{\mathrm{epi}}}(f)$. Then from the fact that ${{\mathrm{epi}}}(f0^+)=0^+({{\mathrm{epi}}}f)$ we know that $f0^+(\xi )\le 0$, which contradicts Assumption 3. The boundedness of $\{z^{k}\}$ under Assumption 4 can be similarly proved. Thus, part (b) is proved.

(c) Suppose that $(x^\infty ,y^\infty ,z^\infty )$ is an accumulation point of $\{(x^k,y^k,z^k)\}$. Let $\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}$ be a subsequence of $\{(x^k,y^k,z^k)\}$ which converges to $(x^\infty ,y^\infty ,z^\infty )$. By taking limits in (17) along $k_j$ for $j\rightarrow \infty $ and using (16) and (21) we can see that

$$\begin{aligned} -\mathcal {A}x^{\infty }\in \partial f(y^\infty ), \quad -\mathcal {B}x^{\infty }\in \partial g(z^{\infty }) \quad \text{ and } \quad \mathcal {A}^*y^\infty +\mathcal {B}^*z^\infty =c, \end{aligned}$$

and this implies that $(x^\infty ,y^\infty ,z^\infty )$ is a solution to the KKT system (10). Now, without lose of generality we reset $(\bar{x},\bar{y},\bar{z})=(x^\infty ,y^\infty ,z^\infty )$. Then, by part (a) we know that the sequence $\{\varPhi _{k}\}$ defined in (16) converges to zero if $\tau \in (0,(1+\sqrt{5})/2)$. Thus, we always have

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert y_{e}^{k}\Vert _{\varSigma _{f}}=0 \quad \text{ and } \quad \lim _{k\rightarrow \infty }\Vert z_{e}^{k}\Vert _{\sigma \mathcal {B}\mathcal {B}^{*}+\varSigma _{g}}=0. \end{aligned}$$

(29)

As a result, it holds that $\mathcal {B}^{*} z^{k}\rightarrow \mathcal {B}^{*} z^{\infty }$, $\varSigma _f y^{k}\rightarrow \varSigma _f y^{\infty }$ and $\varSigma _g z^{k} \rightarrow \varSigma _g z^{\infty }$ as $k\rightarrow \infty $. Moreover, by using the fact that $\mathcal {A}^{*}y^{k}=(\mathcal {A}^{*}y^{k}+\mathcal {B}^{*}z^{k})-\mathcal {B}^{*}z^{k}$ and $ \mathcal {A}^{*}y^{k}+\mathcal {B}^{*}z^{k} \rightarrow \mathcal {A}^{*}y^{\infty }+\mathcal {B}^{*}z^{\infty }=c$ as $k\rightarrow \infty $, we can get $ \mathcal {A}^{*} y^{k} \rightarrow \mathcal {A}^{*} y^{\infty }$ as $k\rightarrow \infty $. This completes the proof of part (c).

(d) If $\varSigma _f+\mathcal {A}\mathcal {A}^*\succ 0$ and $\varSigma _g+\mathcal {B}\mathcal {B}^*\succ 0$, then the subproblems in the ADMM scheme (3) are strongly convex, and hence each of them has a unique optimal solution. Then, by part (c) we know that $\{y^k\}$ and $\{z^k\}$ are convergent. Note that the sequence $\{x^k\}$ is convergent by part (a). Therefore, by part (c) we know that $\{(x^k,y^k,z^k)\}$ converges to a solution to the KKT system (10). Hence, part (d) is proved and this completes the proof of the theorem. $\square $

Before concluding this note, we make the following remarks on the convergence results presented in Theorem 4.1.

Remark 4.1

The corresponding results in part (a) of Theorem 4.1 for the ADMM scheme (3) with $\tau =1$ have been stated in Boyd et al. [1]. However, as indicated by the counterexample constructed in Sect. 3, the proofs in [1] need to be revised with proper additional assumptions. Actually, no proof on the convergence of $\{x^k\}$ has been given in [1] at all. Nevertheless, one may view the results in part (a) as extensions of those in Boyd et al. [1] for the ADMM scheme (3) with $\tau =1$ to a computationally more attractive ADMM scheme (3) with a rigorous proof.

Remark 4.2

Note that, numerically, the boundedness of the sequences generated by a certain algorithm is a desirable property and Assumptions 3 and 4 can fulfill this purpose. Assumption 3 is rather mild in the sense that it holds automatically for many practical problems where f has bounded level sets. Of course, the same comment can also be applied to Assumption 4.

Remark 4.3

All the results of Theorem 4.1 are also valid if the step-length $\tau $ and the sequence generated by the ADMM scheme (3) satisfy the condition that

$$\begin{aligned} \tau \ge (1+\sqrt{5})/2\quad \text{ but }\quad \sum _{k=1}^\infty \Vert x^{k+1}-x^{k}\Vert ^2<+\infty . \end{aligned}$$

(30)

To prove this argument, one can first use $x^{k+1}-x^{k}=\tau \sigma (\mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k+1}_e)$ to get $\sum _{k=0}^{\infty }\Vert \mathcal {A}^*y_e^{k}+\mathcal {B}^*z_e^{k}\Vert ^2 < +\infty $. Then, by using (19) and the fact that $\Vert \mathcal {A}^{*}y_{e}^{k}\Vert \le \Vert \mathcal {A}^{*}y_{e}^{k}+\mathcal {B}^{*}z_{e}^{k}\Vert +\Vert \mathcal {B}^{*}z_{e}^{k}\Vert $ we know the sequences in (20) are bounded. Note that $ \Vert \mathcal {B}^*(z^{k+1}-z^k)\Vert ^2\le 2\Vert \mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k+1}_e\Vert ^2 +2\Vert \mathcal {A}^*y^{k+1}_e+\mathcal {B}^*z^{k}_e\Vert ^{2}$. This, together with (19), implies that (21) holds. The remaining procedure is similar to that for Theorem 4.1 except the following two key steps.

The first one is to show that $\{x^k\}$ is convergent. To do this, we define the nonnegative sequence $\{\psi _k\}$ by $\psi _k:=\sigma \Vert z_e^k\Vert ^2_{\mathcal {B}\mathcal {B}^*}$. By using (19), (21), Lemma 2.1 and fact that $1-\tau <0$ one can show that the sequence $\{\varPsi _k\}$ is convergent. Hence, the sequence $\{\psi _{k}\}$ is nonnegative and bounded. Then, by similar discussions for getting (24) and (25) with $\phi _k$ and $\varPhi _k$ being replaced by $\psi _k$ and $\varPsi _k$, one can get $\lim \limits _{k\rightarrow \infty }\varPsi _{k}=\liminf \limits _{k\rightarrow \infty }\psi _k$. Hence, $\{x^k\}$ is convergent.

The second one is to show that (29) still holds for this case. This can be done by verifying that the sequence $\{\varPsi _{k}\}$ defined in (16) converges to zero if $\tau \ge (1+\sqrt{5})/2$ but $\sum _{k=0}^{\infty }\Vert x^{k+1}-x^{k}\Vert ^{2}<+\infty $.

The condition (30) simplifies the condition proposed by Sun et al. [13, Theorem 2.2], in which one need $\sum _{k=1}^\infty \{\Vert \mathcal {B}^*(z^{k+1}-z^k)\Vert ^2+\sigma \Vert x^{k+1}-x^{k}\Vert ^{2}\}<+\infty $ if $\tau \ge (1+\sqrt{5})/2$. This was used for the purpose of achieving better numerical performance. The advantage of taking the step-length $\tau \ge (1+\sqrt{5})/2$ has been observed in [2, 10, 13] for solving high-dimensional linear and convex quadratic semidefinite programming problems. In numerical computations, one can start with a larger $\tau $, e.g. $\tau =1.95$, and reset it as $\tau :=\max (\gamma \tau ,1.618)$ for some $\gamma \in (0,1)$, e.g. $\gamma =0.95$, if at the k-th iteration one observes that $\Vert x^{k+1}-x^{k}\Vert ^2>c_0/k^{1.2}$, where $c_0$ is a given positive constant. Since $\tau $ can be reset at most a finite number of times, our convergence analysis is valid for such a strategy. One may refer to [13, Remark 2.3] for more discussions on this computational issue.

5 Conclusions

In this note, we have constructed a simple example possessing several nice properties to illustrate that the convergence theorem of the ADMM scheme (3) with the unit step-length stated in Boyd et al. [1] can be false if no prior condition that guarantees the existence of solutions to all the subproblems involved is made. In order to correct this mistake we have presented fairly mild conditions under which all the subproblems are solvable by using standard knowledge in convex analysis. Based on these conditions, we have further established some satisfactory convergence properties of the ADMM with a computationally more attractive large step-length that can even exceed the golden ratio of $(1+\sqrt{5})/2$. In conclusion, this note has (i) clarified some confusions on the convergence results of the popular ADMM; (ii) opened the potential for designing computationally more efficient ADMM based solvers in the future.

Notes

It has been cited 2, 229 times as captured by Google Scholar as of July 8, 2015.
In [4] the authors studied a much general ADMM-type scheme where positive semidefinite proximal terms were added to the subproblems. The convergence properties studied in this paper can be extended to that setting with no difficulty, but in order to make this note as concise as possible we focus on ADMM only.

References

Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Article MATH Google Scholar
Chen, L., Sun, D.F., Toh, K.-C.: An effcient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. (2016). doi:10.1007/s10107-016-1007-5
Google Scholar
Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11(4), 619–644 (2015)
MathSciNet MATH Google Scholar
Fazel, M., Pong, T.K., Sun, D.F., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)
Article MathSciNet MATH Google Scholar
Fortin, M., Glowinski, R.: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary Value Problems. Studies in Mathematics and its Applications, vol. 15. Translated from French by Hunt, B. and Spicer, D.C. Elsevier Science Publishers B.V. (1983)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Article MATH Google Scholar
Glowinski, R.: Lectures on numerical methods for non-linear variational problems. Published for the Tata Institute of Fundamental Research, Bombay [by] Springer-Verlag (1980)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y.A., Neittaanmaki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology, pp. 59–82. Springer, Netherlands (2014)
Google Scholar
Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’atomatique, Informatique Recherche Opérationelle. Anal. Num. 9(2), 41–76 (1975)
MATH Google Scholar
Li, X.D., Sun, D.F., Toh, K.-C.: A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions. Math. Program. 155, 333–373 (2016)
Article MathSciNet MATH Google Scholar
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Book MATH Google Scholar
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Verlag Berlin Heidelberg (2009)
MATH Google Scholar
Sun, D.F., Toh, K.-C., Yang, L.Q.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25, 882–915 (2015)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgments

The authors would like to thank the two anonymous referees for their careful reading of this paper and their comments to improve the quality of this paper. The authors also would like to thank the corresponding editor for providing insightful suggestions.

Author information

Authors and Affiliations

College of Mathematics and Econometrics, Hunan University, Changsha, 410082, China
Liang Chen
Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, Singapore
Defeng Sun
Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, Singapore
Kim-Chuan Toh

Authors

Liang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Defeng Sun
View author publications
You can also search for this author in PubMed Google Scholar
Kim-Chuan Toh
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Liang Chen.

Additional information

The research of the first author was supported by the China Scholarship Council while visiting the National University of Singapore and the National Natural Science Foundation of China (Grant no. 11271117). The research of the second and the third authors was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant no. R-146-000-194-112).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chen, L., Sun, D. & Toh, KC. A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput Optim Appl 66, 327–343 (2017). https://doi.org/10.1007/s10589-016-9864-7

Download citation

Received: 08 July 2015
Published: 01 August 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10589-016-9864-7

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A note on the convergence of ADMM for linearly constrained convex optimization problems

Abstract

Similar content being viewed by others

On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers

Analysis of the Alternating Direction Method of Multipliers for Nonconvex Problems

On the Convergence Properties of a Majorized Alternating Direction Method of Multipliers for Linearly Constrained Convex Optimization Problems with Coupled Objective Functions

1 Introduction

2 Preliminaries

Assumption 1

Assumption 2

Assumption 3

Assumption 4

Proposition 2.1

Proof

Corollary 2.1

Proof

Proposition 2.2

Proof

Lemma 2.1

3 A Counterexample

Lemma 3.1

Proof

Remark 3.1

4 Convergence properties of ADMM

Proposition 4.1

Proof

Theorem 4.1

Proof

Remark 4.1

Remark 4.2

Remark 4.3

5 Conclusions

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation