Abstract
The alternating direction method of multipliers (ADMM) is a benchmark for solving a two-block linearly constrained convex minimization model whose objective function is the sum of two functions without coupled variables. However, it is known that convergence is not guaranteed if the ADMM is directly extended to a multiple-block convex minimization model whose objective function is the sum of more than two functions. Recently, some authors have actively studied strong convexity conditions on the objective function that suffice to ensure the convergence of the direct extension of ADMM, or the convergence of variants in which the original scheme is appropriately twisted. We focus on the three-block case of such a model, whose objective function is the sum of three functions, and discuss the convergence of the direct extension of ADMM. We show that if one function in the objective is strongly convex, and the penalty parameter and the operators in the linear equality constraint are appropriately restricted, then the convergence of the direct extension of ADMM is guaranteed. We further estimate the worst-case convergence rate measured by the iteration complexity in both the ergodic and nonergodic senses, and derive the global linear convergence in an asymptotic sense under some additional conditions.
1 Introduction
Many applications can be modeled or reformulated as convex minimization models whose objective functions are separable, i.e., sums of functions without coupled variables, while the variables are coupled by linear constraints. When the objective function is the sum of two convex functions without coupled variables, the alternating direction method of multipliers (ADMM) proposed in [12] has been well studied in the literature. It can be regarded as a splitting version of the classical augmented Lagrangian method (ALM) in [20, 30], and it has found a variety of applications in many areas, mainly because it generates easier subproblems at each iteration. We refer to, e.g., [2, 10, 13], for some reviews on the ADMM. On the other hand, many applications can be modeled or reformulated as a multiple-block linearly constrained convex minimization model whose objective function is the sum of more than two functions without coupled variables. In fact, many applications can be captured by a three-block model with three functions in the objective; see, e.g., the image alignment problem in [29], the robust principal component analysis model with noisy and incomplete data in [34], the latent variable Gaussian graphical model selection in [4, 28], the stable principal component pursuit with nonnegative constraint in [38], the quadratic discriminant analysis model in [27], and the quadratic conic programming in [22].
Inspired by these applications, we focus on the following three-block linearly constrained separable convex minimization model. Let \({\mathcal {X}}_i\) with \(i=1,2,3\) and \({\mathcal {Z}}\) be real finite-dimensional Euclidean spaces, each equipped with an inner product \(\langle \cdot ,\cdot \rangle \) and its induced norm \(\Vert \cdot \Vert \); let \(\theta _i:{\mathcal {X}}_i\rightarrow (-\infty , +\infty ]\) with \(i=1,2,3\) be closed proper convex functions; let \(A_i:\;{\mathcal {X}}_i\rightarrow {\mathcal {Z}}\) with \(i=1,2,3\) be linear operators; and let \(b\in {\mathcal {Z}}\) be a given vector. We aim at finding a solution point of the model
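$$\begin{aligned} \min \left\{ \theta _1(x_1)+\theta _2(x_2)+\theta _3(x_3) \;\big |\; A_1x_1+A_2x_2+A_3x_3=b,\;\; x_i\in {\mathcal {X}}_i,\; i=1,2,3\right\} . \end{aligned}$$
(1.1)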
The solution set of (1.1) is assumed to be nonempty throughout our discussion. In our analysis, we also use the notations \(\mathbf {x}:=(x_1,x_2,x_3)\) and \(\mathcal{X}:={\mathcal {X}}_1 \times {\mathcal {X}}_2\times {\mathcal {X}}_3\). Let the augmented Lagrangian function of (1.1) be
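$$\begin{aligned} {\mathcal {L}}_{\beta }(x_1,x_2,x_3,z):=\sum _{i=1}^{3}\theta _i(x_i)-\left\langle z,\; \sum _{i=1}^{3}A_ix_i-b\right\rangle +\frac{\beta }{2}\left\| \sum _{i=1}^{3}A_ix_i-b\right\| ^2, \end{aligned}$$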
with \(z \in {\mathcal {Z}}\) the Lagrange multiplier and \(\beta >0\) a penalty parameter. Then, the direct extension of ADMM for (1.1) reads as
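$$\begin{aligned} x_1^{k+1}\in \arg \min _{x_1\in {\mathcal {X}}_1}\; {\mathcal {L}}_{\beta }\big (x_1,x_2^{k},x_3^{k},z^{k}\big ), \end{aligned}$$
(1.2a)
$$\begin{aligned} x_2^{k+1}\in \arg \min _{x_2\in {\mathcal {X}}_2}\; {\mathcal {L}}_{\beta }\big (x_1^{k+1},x_2,x_3^{k},z^{k}\big ), \end{aligned}$$
(1.2b)
$$\begin{aligned} x_3^{k+1}\in \arg \min _{x_3\in {\mathcal {X}}_3}\; {\mathcal {L}}_{\beta }\big (x_1^{k+1},x_2^{k+1},x_3,z^{k}\big ), \end{aligned}$$
(1.2c)
$$\begin{aligned} z^{k+1}= z^{k}-\beta \left( \sum _{i=1}^{3}A_ix_i^{k+1}-b\right) . \end{aligned}$$
(1.2d)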
Although it is natural to consider the direct extension of ADMM (1.2a)–(1.2d), and this scheme does work very well for many concrete applications of (1.1) (e.g., [16, 29, 34, 38]), it has been shown in [5] that the scheme (1.2a)–(1.2d) is not necessarily convergent if no further conditions are posed on the model (1.1). Several approaches have been studied in the literature to guarantee the convergence of (1.2a)–(1.2d), and they can be classified into two categories. The first category twists the scheme (1.2a)–(1.2d) slightly while assuming no further conditions on the original model. For example, in [17, 18], it was suggested to correct the output of (1.2a)–(1.2d) to generate a new iterate, and the resulting prediction-correction schemes are guaranteed to be convergent. In [19], the \(x_i\)-subproblems are twisted slightly to ensure convergence. Numerically, the original scheme (1.2a)–(1.2d) usually performs better than all the twisted variants with provable convergence (see, e.g., [16, 34]); and it is the most convenient scheme to implement compared with its variants. In [21], the authors suggested attaching a shrinkage factor to the Lagrange-multiplier updating step (1.2d), and it was shown that the convergence of (1.2a)–(1.2d) is guaranteed when this factor is small enough to satisfy some error bound conditions. Note that these just-mentioned works cover the more general case with \(m\ge 3\) blocks in the objective function. Most recently, it was shown in [22] that if certain proximal terms are used to regularize the \(x_i\)-subproblems, then convergence can be guaranteed when only one function in the objective of (1.1) is strongly convex. Indeed, a relaxation factor in \((0,\frac{1+\sqrt{5}}{2})\), which originated in [11], can be attached to the Lagrange-multiplier updating step (1.2d), provided that the proximal terms and the penalty parameter \(\beta \) are chosen judiciously.
The second category assumes more conditions on the model (1.1) while retaining the original iterative scheme (1.2a)–(1.2d). Indeed, prior to the work [5], in which it was shown that (1.2a)–(1.2d) is not necessarily convergent, it was proved in [14] that the scheme (1.2a)–(1.2d) is convergent if all the functions \(\theta _i\) are strongly convex and the penalty parameter \(\beta \) is chosen in a certain interval. Then, in [6, 23], this condition was relaxed: only two or more functions in the objective are required to be strongly convex to ensure the convergence of (1.2a)–(1.2d). However, assuming strong convexity of two functions still excludes most of the applications that can be efficiently solved by the scheme (1.2a)–(1.2d). Thus, these conditions are only of theoretical interest and are usually too strict to be satisfied by the mentioned applications.
We are thus interested in understanding the gap between the empirical efficiency of (1.2a)–(1.2d) and the lack of theoretical conditions that can both ensure the convergence of (1.2a)–(1.2d) and be satisfied by some applications of the abstract model (1.1). As in [22], we independently show that the convergence of (1.2a)–(1.2d) can be ensured when one function in the objective of (1.1) is strongly convex, the penalty parameter \(\beta \) is appropriately restricted, and some assumptions on the linear operators \(A_i\) hold; these conditions do hold for some concrete applications of (1.1). Note that the restriction on \(\beta \) in [22] is determined by checking the positive definiteness of some operators, and it is generally more restrictive than ours because it targets a setting more general than (1.2a)–(1.2d), with larger step sizes for updating z and semi-proximal terms regularizing the \(x_i\)-subproblems. Moreover, it is worth mentioning that such a condition on \(\beta \) is merely sufficient to theoretically ensure the convergence of (1.2a)–(1.2d); it is thus usually conservative and can be enlarged in practice to obtain better numerical performance. Another purpose of this paper is to establish the convergence rate of the scheme (1.2a)–(1.2d), including the worst-case convergence rate measured by the iteration complexity and the global linear convergence rate in an asymptotic sense under some additional assumptions. We refer to [6, 21, 23, 24] for some convergence rate analysis for (1.2a)–(1.2d).
The rest of this paper is organized as follows. Some necessary preliminaries for further analysis are provided in Sect. 2. In Sect. 3, we prove the convergence of the scheme (1.2a)–(1.2d) under the assumption that one of the three functions in (1.1) is strongly convex. Then, we establish the worst-case convergence rate measured by the iteration complexity in the ergodic and nonergodic senses in Sects. 4 and 5, respectively. In Sect. 6, under some additional conditions, we show that the scheme (1.2a)–(1.2d) is globally linearly convergent. Finally, some concluding remarks are made in Sect. 7.
2 Preliminaries
In this section, we summarize some notations and preliminaries to be used for further analysis.
2.1 Notation
We use calligraphic letters (e.g., \({\mathcal {H}}\), \({\mathcal {X}}\)) to denote spaces, each equipped with an inner product \(\langle \cdot , \cdot \rangle \) and an associated norm \(\Vert \cdot \Vert \). All vectors are column vectors. For two arbitrary vectors \(x\in {\mathcal {X}}_1\) and \(y\in {\mathcal {X}}_2\), we simply use \(u=(x,y)\) to denote their concatenation; that is, (x, y) denotes the vector \((x^*, y^*)^*\). For a linear operator \(A_i\), we denote by \(A_i^*\) its adjoint operator.
Let \(M:\;{\mathcal {H}}\rightarrow {\mathcal {H}}^*\) be a positive definite linear operator; we use \(\Vert x\Vert _M:=\sqrt{\langle x, Mx\rangle }\) to denote its M-norm and \(\lambda _{\min }(M)\) to denote the smallest eigenvalue of M. For an identity operator I and a scalar \(\beta >0\), we simply use \(\Vert x\Vert _{\beta }\) to denote \(\Vert x\Vert _{\beta I}\). For a given linear operator A, its norm is
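$$\begin{aligned} \Vert A\Vert :=\max \big \{\Vert Ax\Vert \; :\; \Vert x\Vert \le 1\big \}. \end{aligned}$$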
In particular, for a symmetric linear operator A, \(\Vert A\Vert \) denotes its spectral norm.
We say a function \(f:{\mathcal {H}}\rightarrow (-\infty , \infty ]\) is convex if
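$$\begin{aligned} f(tx+(1-t)y)\le t f(x)+(1-t)f(y),\quad \forall \, x, y\in {\mathcal {H}}, \end{aligned}$$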
and it is strongly convex with modulus \(\mu >0\) if
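$$\begin{aligned} f(tx+(1-t)y)\le t f(x)+(1-t)f(y)-\frac{\mu }{2}\,t(1-t)\Vert x-y\Vert ^2,\quad \forall \, x, y\in {\mathcal {H}}, \end{aligned}$$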
for all \(t \in [0,1]\).
A multifunction \(F:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\) means that F(x) is a set in \({\mathcal {H}}\), see, e.g., [32]. For a multifunction \(F:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\), we say that F is monotone if
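$$\begin{aligned} \langle x-y,\; u-v\rangle \ge 0,\quad \forall \, u\in F(x),\; v\in F(y), \end{aligned}$$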
and strongly monotone with modulus \(\mu >0\) if
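$$\begin{aligned} \langle x-y,\; u-v\rangle \ge \mu \Vert x-y\Vert ^2,\quad \forall \, u\in F(x),\; v\in F(y). \end{aligned}$$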
It is well known, see, e.g., [32], that for a convex function f, \(\partial f\), the subdifferential of f, is a monotone multifunction; and for a strongly convex function f with modulus \(\mu >0\), \(\partial f\) is a strongly monotone multifunction with modulus \(\mu \). That is, if f is convex, then for any \(x,y\in \hbox {dom} \; f\)
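$$\begin{aligned} \langle x-y,\; \xi -\eta \rangle \ge 0,\quad \forall \, \xi \in \partial f(x),\; \eta \in \partial f(y); \end{aligned}$$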
and if f is strongly convex with modulus \(\mu >0\), then
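$$\begin{aligned} \langle x-y,\; \xi -\eta \rangle \ge \mu \Vert x-y\Vert ^2,\quad \forall \, \xi \in \partial f(x),\; \eta \in \partial f(y), \end{aligned}$$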
where
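$$\begin{aligned} \hbox {dom}\, f:=\big \{x\in {\mathcal {H}}\; :\; f(x)<+\infty \big \} \end{aligned}$$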
is the domain of \(f:\; {\mathcal {H}}\rightarrow (-\infty ,\infty ]\). An equivalent inequality for the strong convexity of f is that
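$$\begin{aligned} f(y)\ge f(x)+\langle \xi ,\; y-x\rangle +\frac{\mu }{2}\Vert y-x\Vert ^2,\quad \forall \, x,y\in \hbox {dom}\, f,\; \xi \in \partial f(x). \end{aligned}$$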
Last, for any two vectors a and b with the same dimensions, we have
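$$\begin{aligned} 2\langle a,\; b\rangle \le \varsigma \Vert a\Vert ^2+\frac{1}{\varsigma }\Vert b\Vert ^2 \end{aligned}$$
(2.1)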
for any positive scalar \(\varsigma \). Obviously, we also have
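$$\begin{aligned} \Vert a+b\Vert ^2\le 2\Vert a\Vert ^2+2\Vert b\Vert ^2. \end{aligned}$$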
These two inequalities will be frequently used in the proofs to be presented.
2.2 Optimality characterization
In this subsection, we present some variational forms to characterize a solution point of the model (1.1) under discussion. These characterizations will be the basis of our convergence analysis to be conducted.
Recall that the Lagrangian function of (1.1) is
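$$\begin{aligned} {\mathcal {L}}(x_1,x_2,x_3,z):=\sum _{i=1}^{3}\theta _i(x_i)-\left\langle z,\; \sum _{i=1}^{3}A_ix_i-b\right\rangle , \end{aligned}$$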
where z is the Lagrange multiplier. Solving (1.1) amounts to finding a saddle point of \({\mathcal {L}}(x_1,x_2,x_3,z)\). That is, finding \((\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{z}) \in {\mathcal {X}}_1 \times {\mathcal {X}}_2\times {\mathcal {X}}_3 \times {\mathcal {Z}}\) such that the following inequalities hold:
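$$\begin{aligned} {\mathcal {L}}(\hat{x}_1,\hat{x}_2,\hat{x}_3,z)\le {\mathcal {L}}(\hat{x}_1,\hat{x}_2,\hat{x}_3,\hat{z})\le {\mathcal {L}}(x_1,x_2,x_3,\hat{z}),\quad \forall \,(x_1,x_2,x_3,z)\in {\mathcal {X}}\times {\mathcal {Z}}. \end{aligned}$$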
Let the set of all saddle points of \({\mathcal {L}}(x_1,x_2,x_3, z)\) be denoted by S. Throughout, S is assumed to be nonempty. Let \((\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{z})\in S\) be an arbitrary solution in S. Then, we have
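$$\begin{aligned} A_1^*\hat{z}\in \partial \theta _1(\hat{x}_1), \end{aligned}$$
(2.2a)
$$\begin{aligned} A_2^*\hat{z}\in \partial \theta _2(\hat{x}_2), \end{aligned}$$
(2.2b)
$$\begin{aligned} A_3^*\hat{z}\in \partial \theta _3(\hat{x}_3), \end{aligned}$$
(2.2c)
$$\begin{aligned} A_1\hat{x}_1+A_2\hat{x}_2+A_3\hat{x}_3-b=0. \end{aligned}$$
(2.2d)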
It follows from (2.2a)–(2.2c) that, for \(i=1,2,3\), we have
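$$\begin{aligned} \theta _i(x_i)-\theta _i(\hat{x}_i)-\langle A_i^*\hat{z},\; x_i-\hat{x}_i\rangle \ge 0,\quad \forall \, x_i\in {\mathcal {X}}_i, \end{aligned}$$
(2.3)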
if \(\theta _i\) is convex; and
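$$\begin{aligned} \theta _i(x_i)-\theta _i(\hat{x}_i)-\langle A_i^*\hat{z},\; x_i-\hat{x}_i\rangle \ge \frac{\mu _i}{2}\Vert x_i-\hat{x}_i\Vert ^2,\quad \forall \, x_i\in {\mathcal {X}}_i, \end{aligned}$$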
if \(\theta _i\) is strongly convex with the modulus \(\mu _i>0\).
In our analysis, we also use the primal-dual gap function (see e.g., [3, 15, 21]) defined as
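$$\begin{aligned} {\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}(\mathbf {x},z):=\sup _{(\mathbf {x}',z')\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}\big \{{\mathcal {L}}(x_1,x_2,x_3,z')-{\mathcal {L}}(x_1',x_2',x_3',z)\big \}, \end{aligned}$$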
where \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) is a subset of \({\mathcal {X}}\times {\mathcal {Z}}\). If \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) contains a solution \((\hat{x}_1,\hat{x}_2,\hat{x}_3,\hat{z})\), then for any \((\mathbf {x},z)\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\), we have
where the equality follows from (2.2d), the second inequality follows from (2.3), and the last inequality follows from the convexity of \(\theta _i\), \(i=1,2,3\).
The following lemma gives a sufficient condition for a point to be a saddle point of \({\mathcal {L}}(x_1,x_2,x_3,z)\). The assertion is an immediate conclusion of (2.4); we thus omit the proof.
Lemma 2.1
Let \(({\bar{\mathbf{x}}}, \bar{z})\) be an arbitrary point in the interior of \({\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\). If
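$$\begin{aligned} {\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}({\bar{\mathbf{x}}},\bar{z})=0 \end{aligned}$$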
then \(({\bar{\mathbf{x}}}, \bar{z})\) is a saddle point.
Based on this lemma, if \((\mathbf {x},z)\in {\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}\) and \({\mathcal {G}}_{{\mathcal {B}}_{\mathbf {x}}\times {\mathcal {B}}_{z}}(\mathbf {x}, z)\le \epsilon \) for some \(\epsilon >0\), then we call \((\mathbf {x},z)\) an approximate saddle point of \({\mathcal {L}}(x_1,x_2,x_3, z)\) with an accuracy of \(\epsilon \).
Finally, we use the notation
where the operators \(A_2\) and \(A_3\) are given in (1.1), and \(\beta \) is the penalty parameter in the direct extension of ADMM (1.2a)–(1.2d).
3 Convergence
In this section, we prove the convergence of the direct extension of ADMM (1.2a)–(1.2d) under the following assumption.
Assumption 3.1
In (1.1), the functions \(\theta _1\) and \(\theta _2\) are convex; the function \(\theta _3\) is strongly convex with modulus \(\mu _3>0\); \(A_1\) and \(A_2\) in (1.1) are of full column rank.
Let us first present the variational characterization of the scheme (1.2a)–(1.2d). Indeed, similarly to (2.2a)–(2.2d), the subproblems (1.2a)–(1.2c) can be characterized as
Combining it with the step (1.2d), the \((k+1)\)th iterate generated by (1.2a)–(1.2d) satisfies that for any \( x_1'\in {\mathcal {X}}_1\), \( x_2'\in {\mathcal {X}}_2\), and \( x_3'\in {\mathcal {X}}_3\),
Now, we start to prove the convergence of (1.2a)–(1.2d) under Assumption 3.1. First, we prove several lemmas.
Lemma 3.1
Suppose Assumption 3.1 holds. For the iterative sequence \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d), we have
and
Proof
Setting \(x_3':=x_3^k\) in (3.2c) yields
and setting \(x_3':=x_3^{k+1}\) for the kth iteration yields
Adding the above two inequalities, we have
and the inequality (3.3) is proved.
We now prove (3.4). It follows from (3.2d) that
Using (3.5), we can rewrite (3.2b) as
Then, setting \(x_2':=x_2^k\) and \(x_2':=x_2^{k+1}\) in (3.6) for the kth iteration yields
and
respectively. Adding these two inequalities, we get
Rearranging terms, it follows that
where the second inequality follows from (2.1). The proof is complete. \(\square \)
Lemma 3.2
Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have
Proof
Using (3.2d) and rearranging terms, it follows from (3.2a)–(3.2c) that
Adding (3.8a)–(3.8c) and rearranging terms, we obtain the assertion (3.7) immediately. \(\square \)
Now, we need to further analyze the right-hand side of (3.7) and refine the assertion (3.7). This is done in the following lemma.
Lemma 3.3
Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have
Proof
Because of the identity
it follows that
and
Consequently, we have
where the third equality follows from (3.5) and the inequality follows from (3.3). On the other hand, using (3.4), we have
where the inequality follows from (2.1). Then, it holds that
Furthermore, since
we obtain the assertion (3.9) by combining (3.11), (3.12), (3.13), and (3.7). \(\square \)
Now, we are ready to prove the convergence of the direct extension of ADMM (3.2a)-(3.2d) under Assumption 3.1. The result is summarized in the following theorem.
Theorem 3.1
Suppose Assumption 3.1 holds. Let \(\{(x_1^k,x_2^k,x_3^k,z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \). Then, the sequence \(\{(x_1^k, x_2^k, x_3^k, z^k)\}\) converges to a saddle point in S.
Proof
Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. Then, we have
Setting \((x_1',x_2', x_3', z'):=(\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) in (3.9) and using (3.14), we obtain
Since \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \), we have
Then, (3.15) can be written as
Since \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) is a saddle point, it follows from (2.4) that the left-hand side of (3.16) is nonnegative. We thus have
Recall the definition of Q in (2.5). Then, (3.17) can be rewritten as the compact form
where \(\mathbf {w}:=(x_2,x_3, z)\).
Thus we have
indicating that the sequence \(\{{\mathbf{w}}^k\}\) is bounded. Rearranging terms for (3.18) yields
Adding both sides of the above inequality for all k, we get
and hence
The full column rank of \(A_1\) assumed in Assumption 3.1, together with (3.20) and (1.2d), then implies that \(\{x_1^k\}\) is also bounded. Hence the iterative sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by the ADMM (1.2a)–(1.2d) is bounded.
Then, the boundedness of the sequence indicates that it has at least one cluster point. Let \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\) be an arbitrary cluster point of \(\{({x}_1^k, {x}_2^k, {x}_3^k, {z}^k)\} \) and \(\{({x}_1^{k_j}, {x}_2^{k_j}, {x}_3^{k_j}, {z}^{k_j})\}\) be the subsequence converging to \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\). Taking the limit along the subsequence in (3.1a)-(3.1c) and using (3.20), we have
Moreover, since \(\Vert z^k-z^{k+1}\Vert =\beta \Vert \sum \limits _{j=1}^3 A_j x_j^{k+1}-b\Vert \), by taking the limit, we have \(\sum \limits _{j=1}^3 A_j \tilde{x}_j-b=0\). Hence, it follows from (2.2a)–(2.2d) that \((\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{z})\) is a saddle point in S and (3.19) means that the whole sequence has only one cluster point. Consequently, the iterative sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) converges to a saddle point in S. The proof is complete.
Remark 3.1
For some specific applications of the abstract model (1.1), it may not be difficult to determine the range for \(\beta \). For example, for the application of moving object detection by detecting contiguous outliers studied in [37], it is easy to see that its model corresponds to (1.1) with \(A_3=I\) and \(\mu _3=1\). Thus, choosing \(\beta \in \left( 0, \frac{6}{17}\right) \) can sufficiently ensure the convergence if the scheme (1.2a)–(1.2d) is implemented to solve the model in [37].
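To illustrate this remark, here is a minimal Python sketch (our own illustration, not part of the original analysis) that runs the scheme (1.2a)–(1.2d) on a toy instance of (1.1) with quadratic functions \(\theta _i(x_i)=\frac{1}{2}\Vert x_i-c_i\Vert ^2\), so that \(\mu _3=1\), and with \(A_i=I\); each subproblem then has a closed-form solution, and \(\beta \) is taken just below the bound \(6/17\) from Theorem 3.1. The vectors \(c_i\) and b are arbitrary data of our choosing.

```python
import numpy as np

# Toy instance of (1.1): theta_i(x_i) = 0.5*||x_i - c_i||^2 (strongly convex,
# mu_i = 1) and A_1 = A_2 = A_3 = I, with the constraint x_1 + x_2 + x_3 = b.
rng = np.random.default_rng(0)
n = 5
c = [rng.standard_normal(n) for _ in range(3)]
b = rng.standard_normal(n)

beta = 6.0 / 17.0 - 1e-3   # inside (0, 6*mu_3/(17*||A_3^*A_3||)) = (0, 6/17)
x1 = x2 = x3 = np.zeros(n)
z = np.zeros(n)

for k in range(2000):
    # (1.2a)-(1.2c): each quadratic subproblem is solved in closed form, e.g.
    # x1 minimizes 0.5*||x1 - c1||^2 - <z, x1> + (beta/2)*||x1 + x2 + x3 - b||^2.
    x1 = (c[0] + z + beta * (b - x2 - x3)) / (1.0 + beta)
    x2 = (c[1] + z + beta * (b - x1 - x3)) / (1.0 + beta)
    x3 = (c[2] + z + beta * (b - x1 - x2)) / (1.0 + beta)
    # (1.2d): Lagrange-multiplier update.
    z = z - beta * (x1 + x2 + x3 - b)

# For this toy instance the unique saddle point satisfies x_i = c_i + z_hat
# with z_hat = (b - c_1 - c_2 - c_3)/3, which lets us check the iterates.
z_hat = (b - c[0] - c[1] - c[2]) / 3.0
print("constraint residual:", np.linalg.norm(x1 + x2 + x3 - b))
print("multiplier error   :", np.linalg.norm(z - z_hat))
```

Since the restriction on \(\beta \) is a sufficient condition only, larger values of \(\beta \) may also perform well in practice, as noted in Sect. 1.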
4 Ergodic worst-case convergence rate
In this section, under Assumption 3.1, we establish the \(O(1/t)\) worst-case convergence rate measured by the iteration complexity in the ergodic sense for the direct extension of ADMM (1.2a)–(1.2d). The main result is summarized in the following theorem.
Theorem 4.1
Suppose Assumption 3.1 holds. After t iterations of the direct extension of ADMM (1.2a)–(1.2d), we can find an approximate solution of (1.1) with an accuracy of \(O(1/t)\).
Proof
As proved in Theorem 3.1, the sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) is convergent. Thus, there exists a constant \(M>0\) such that
Then, it follows from Assumption 3.1 and (3.9) that for all \((x_1',x_2',x_3',z')\in {\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\), we have
Summing both sides of the above inequality from \(k=1\) to t yields
where the second inequality follows from the Cauchy-Schwarz inequality and the last one is because of the triangle inequality and the boundedness of \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\). Define
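$$\begin{aligned} x_i^{t}:=\frac{1}{t}\sum _{k=1}^{t}x_i^{k+1},\;\; i=1,2,3,\qquad z^{t}:=\frac{1}{t}\sum _{k=1}^{t}z^{k+1}. \end{aligned}$$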
Then, it follows from the convexity of \(\theta _1\), \(\theta _2\), and \(\theta _3\) that
Combining (4.1) and (4.2) yields
Since \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) converges to a saddle point, \(\{(x_1^t,x_2^t, x_3^t, z^t)\}\) also converges to the same saddle point, and (4.3) implies that
where
That is, \((\mathbf {x}^t,z^t)\) is an approximate solution of (1.1) with an accuracy of \(O(1/t)\). A worst-case \(O(1/t)\) convergence rate measured by the iteration complexity in the ergodic sense is thus established for the direct extension of ADMM (1.2a)–(1.2d). This completes the proof. \(\square \)
5 Non-ergodic worst-case convergence rate
In this section, we establish the worst-case convergence rate measured by the iteration complexity in a non-ergodic sense for the direct extension of ADMM (1.2a)–(1.2d). Note that a non-ergodic worst-case convergence rate is generally stronger than its ergodic counterpart. Thus, we need more assumptions to derive a non-ergodic worst-case convergence rate. We first make the following additional assumption, and refer to, e.g., [38], for some applications in which this assumption holds.
Assumption 5.1
In (1.1), \(\theta _1\) and \(\theta _2\) are convex; \(\theta _3\) is smooth, \(\nabla \theta _3\) is Lipschitz continuous with constant \(L_3>0\), and \(\theta _3\) is strongly convex with modulus \(\mu _3>L_3/2\); \(A_1\) and \(A_2\) are of full column rank.
First of all, we need to present the optimality measure for an iterate generated by (1.2a)–(1.2d). Recall the optimality conditions for the subproblems (1.2a)–(1.2c). We know that an iterate generated by (1.2a)–(1.2d) satisfies the following inequalities:
Therefore, if we have
then the iterate \((x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\) generated by (1.2a)–(1.2d) is a saddle point in S. Naturally, we can use the residual of (5.2a)–(5.2c) to measure the optimality of an iterate generated by (1.2a)–(1.2d).
Define
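$$\begin{aligned} \beta _{\max }:=\frac{2\mu _3-L_3}{\Vert A_3^*A_3\Vert }. \end{aligned}$$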
It follows from Assumption 5.1 that \(\beta _{\max }>0\). To derive the non-ergodic worst-case convergence rate, let us first prove a key inequality that will play an important role in our further analysis.
Lemma 5.1
Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then we have
Proof
Setting \(x_1':=x_1^k\) in (5.1a) yields
Rearranging terms, we have
Moreover, setting \(x_1':=x_1^{k+1}\) for the kth iteration, we obtain
Adding the above two inequalities and rearranging terms, we have
Similarly, setting \(x_2'=x_2^k\) and \(x_2'=x_2^{k+1}\) in the kth iteration in (5.1b), and adding the resulting inequalities, we get the following inequality:
Then, it follows from (5.1c) that
Further, because of the identity (3.5), we know that
We thus can rewrite the left-hand term of (5.7) as
Now, we treat the right-hand side of (5.7). Indeed, these terms can be written as
Using the identity (3.10), we have
and
Then, using (2.1) and (3.3), we have that for all \(\omega >0\)
and
Substituting (5.10)–(5.13) into (5.9), we have
Using the identities \(A_3^*z^{k+1}= \nabla \theta _3(x_3^{k+1})\) and \(A_3^*z^{k}= \nabla \theta _3(x_3^{k})\), and the Lipschitz continuity of \(\nabla \theta _3\), we obtain
Combining (5.8) and (5.14), and setting \(\omega =L_3\), it follows from (5.7) that
where the last inequality is due to the fact that \(\beta \le \frac{2\mu _3-L_3}{\Vert A_3^*A_3\Vert }\). The proof is complete. \(\square \)
Based on the above lemma, we are now ready to establish the worst-case convergence rate measured by the iteration complexity in a non-ergodic sense. We can actually measure this convergence rate in both the \(O(1/t)\) and \(o(1/t)\) orders. Recall that we use the residual of (5.2a)–(5.2c) to measure the accuracy of an iterate of the scheme (3.2a)–(3.2d).
Theorem 5.1
Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (3.2a)–(3.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then, we have
Proof
First, it follows from (3.18) that
Setting
it follows that
Adding these inequalities from \(k=1\) to t, we have
Then, using Lemma 5.1, we obtain
which means that
The proof is complete. \(\square \)
In Theorem 5.1, we show the worst-case \(O(1/t)\) convergence rate in a nonergodic sense for the direct extension of ADMM (3.2a)–(3.2d). Indeed, following the work [9], we can easily refine this \(O(1/t)\) order to an \(o(1/t)\) order, and thus establish a worst-case \(o(1/t)\) convergence rate in a nonergodic sense for the scheme (3.2a)–(3.2d). We summarize it in the following theorem.
Theorem 5.2
Suppose Assumption 5.1 holds. Let \(\{(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (3.2a)–(3.2d) with the restriction \(\beta \in \left( 0, \beta _{\max }\right) \). Then, we have
Proof
Taking \(k=t\) to 2t in (5.16) and adding the resulting inequalities, we obtain
The limit of the last term on the right-hand side is 0, so the assertion (5.17) is proved. \(\square \)
6 Globally linear convergence
In this section, we show that it is possible to theoretically derive the global linear convergence of the direct extension of ADMM (1.2a)–(1.2d), provided that sufficiently strong conditions are assumed.
6.1 Linear convergence under stronger conditions
In this subsection, we prove the linear convergence of the direct extension of ADMM under the following conditions. We refer to [24] for some existing analysis under stronger conditions.
Assumption 6.1
In (1.1), \(\theta _1\) is convex; \(\theta _2\) and \(\theta _3\) are strongly convex with modulus \(\mu _2>0\) and \(\mu _3>0\), respectively; \(\nabla \theta _1\) is Lipschitz continuous with constant \(L_1\); \(A_2\) and \(A_3\) are of full column rank; and \(A_1\) is nonsingular.
First of all, under Assumption 3.1, we can prove a result similar to (3.18). That is, there exists a constant \(\zeta =\min \{\frac{1}{40}, \varrho _1, \mu _2\}>0\), such that
Then, let us explain the roadmap to prove the linear convergence rate of (1.2a)–(1.2d). Indeed, according to (6.1), it is clear that we only need to find bounds for the terms \(\Vert z^{k+1}-\hat{z}\Vert ^2\) and \(\Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\) because of the minus term in (6.1). We first consider \(\Vert A_1x_1^{k+1}-A_1\hat{x}_1\Vert ^2\). It follows from (3.14) and (3.5) that
Therefore, we have
Then, we need to consider how to bound the term \(\Vert z^{k+1}-\hat{z}\Vert ^2\) to derive the linear convergence of the sequence generated by the scheme (1.2a)–(1.2d). As we show below, this is exactly why we need to assume Assumption 6.1.
Lemma 6.1
Suppose Assumption 6.1 holds. Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S; let \(\{(x_1^{k+1},x_2^{k+1}, x_3^{k+1}, z^{k+1})\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, there exists a constant \(\sigma >0\) such that
Proof
It follows from the optimality conditions for the \(x_1\)- and \(\hat{x}_1\)-subproblems that
and
Using Assumption 6.1 and the Cauchy-Schwarz inequality, we get
and the assertion follows immediately from the non-singularity of \(A_1\). \(\square \)
Now, we can derive the linear convergence of the sequence generated by the scheme (1.2a)-(1.2d) under Assumptions 3.1 and 6.1.
Theorem 6.1
Suppose Assumptions 3.1 and 6.1 hold. Then the sequence \(\{(x_1^{k+1},x_2^{k+1}, x_3^{k+1},z^{k+1})\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \) converges globally linearly to a saddle point in S.
Proof
Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. It follows from (6.2) that
where \(\tau _1=\frac{\zeta }{12}\min \left\{ \beta ^2, \frac{1}{\Vert A_2^* A_2\Vert }, \frac{1}{\Vert A_3^* A_3\Vert }\right\} \lambda _{\min }(A_1^* A_1)\). In addition, we have
where \(\tau _2=\frac{\zeta }{4}\min \left\{ \beta ^2\lambda _{\min }(A_2^* A_2),1\right\} \). Adding (6.3) and (6.4), and using Lemma 6.1, we have
where \(\tau =\frac{1}{\sigma }\min \left\{ \tau _1, \tau _2\right\} \). Moreover, we have
where \(\tau '=\frac{\zeta }{2\beta ^2}\min \left\{ \frac{1}{\Vert A_2^* A_2\Vert }, \frac{3}{10\Vert A_3^* A_3\Vert }\right\} \).
Then, it follows from (6.5) and (6.6) that
where Q is defined in (2.5) and its positive definiteness is ensured by Assumption 6.1 and \(\sigma '=\min \left\{ \tau , \tau '\right\} \). Finally, combining (6.1) and (6.7), we obtain
which implies the linear convergence rate of the generated sequence under the Q-norm. The proof is complete. \(\square \)
6.2 Linear convergence under error bound conditions
As mentioned, the linear convergence of the scheme (1.2a)–(1.2d) has been well investigated under some error bound conditions in [21]; this has also inspired the work [15] on deriving the linear convergence of the original two-block ADMM scheme in the quadratic programming context, which was further extended to the general case in which the objective functions possess polyhedral subdifferentials [35]. In this subsection, we provide an alternative proof of the globally linear convergence of (1.2a)–(1.2d) under some error bound conditions. The new conditions differ from Assumption 6.1 in that no smoothness or strong convexity assumption on the objective function is required.
Assumption 6.2
Let \({\mathcal {U}}:={\mathcal {X}}_1\times {\mathcal {X}}_2\times {\mathcal {X}}_3\times {\mathcal {Z}}\) and let \(\tilde{U}\) be a bounded set in \({\mathcal {U}}\). Then there exists \(\eta >0\) such that
where \(u:=(x_1,x_2,x_3, z)\), and
Obviously, using the definition in (6.9) and the optimality conditions (2.2a)–(2.2d), we see that a point \(u\in S\) if and only if \(0\in F(u)\). Using the result [36, Theorem 3.3], it is clear that the error bound condition (6.8) holds if F is a piecewise linear multifunction, which is true for the case where the \(\theta _i\)'s are, e.g., \(l_1\)- or \(l_2\)-terms. For the definition of piecewise linear (or polyhedral) multifunctions, see, e.g., [31, 36].
Recall that under Assumption 3.1, we show that there exists a constant \(\zeta =\min \{\frac{1}{40}, \varrho _1\}>0\) such that
where \(\mathbf {w}:=(x_2,x_3, z)\) and Q is defined by (2.5), see (3.18). To prove the linear convergence rate of the scheme (1.2a)–(1.2d) under Assumption 6.2, it follows from (6.10) that we only need to bound the term \(\Vert \mathbf {w}^{k+1}-{\hat{\mathbf{w}}}\Vert ^2\) because of the minus term in (6.10). Then, under Assumption 6.2, the key is to find a bound of the term \(\hbox {dist}(0,F(u^{k+1}))\). Therefore, we first prove the following lemma.
Lemma 6.2
Let F(u) be defined in (6.9) and \(\{u^k:=(x_1^k,x_2^k, x_3^k, z^k)\}\) be the sequence generated by the direct extension of ADMM (1.2a)–(1.2d). Then, there exists a constant \(\sigma >0\) such that
where
Proof
It follows from the optimality condition of the \(x_1\)-subproblem that
which can be alternatively written as (using (1.2d))
Consequently, we have
Moreover, the optimality condition of the \(x_2\)-subproblem and (1.2d) indicate that
Hence, we obtain
Further, the optimality condition of the \(x_3\)-subproblem and (1.2d) indicate that
which means
In addition, it follows from (1.2d) that
Finally, combining (6.13)–(6.16), we obtain
Therefore, the assertion (6.11) follows immediately from (6.17) and the definition of \(\sigma \) in (6.12). \(\square \)
Now, we show the globally linear convergence of the direct extension of ADMM (1.2a)-(1.2d) under Assumptions 3.1 and 6.2.
Theorem 6.2
Suppose Assumptions 3.1 and 6.2 hold. Then the sequence \(\{(x_1^k,x_2^k, x_3^k,z^k)\}\) generated by the direct extension of ADMM (1.2a)–(1.2d) with \(\beta \in \left( 0,\frac{6 \mu _3}{17\Vert A_3^*A_3\Vert }\right) \) converges globally linearly to a saddle point in S.
Proof
Let \((\hat{x}_1,\hat{x}_2, \hat{x}_3,\hat{z})\) be a saddle point in S. According to Theorem 3.1, the whole sequence \(\{(x_1^k,x_2^k, x_3^k, z^k)\}\) converges to a saddle point in S. Thus it is bounded. Because of Assumption 6.2, we know that
which, together with (6.11), implies that
Let us define
with Q given in (2.5). Then, combining (6.19) with (6.10), we have
It follows from (6.18) that
Let \(u^*\in S\) be such that \(\hbox {dist}_{M}(u^k,S)=\Vert u^k-u^*\Vert _M\). Since (6.20) holds for any saddle point in S, we have
Setting
we obtain the following inequality:
where the second inequality follows from (6.21) and the third one from the definition of \(\bar{\mu }\). Thus, the inequality (6.22) implies the linear convergence rate of the sequence generated by the direct extension of ADMM measured by the sum of the distance under the \((\gamma I+M)\)-norm. The proof is complete. \(\square \)
Remark 6.1
Note that the direct extension of ADMM (1.2a)–(1.2d) is a splitting, thus inexact, version of the augmented Lagrangian method in [20, 30]; and it has been proved in [33] that the augmented Lagrangian method is an application of the proximal point algorithm (PPA) in [25, 26]. Since the convergence rate of the PPA is known to be at best linear [33], a linear convergence rate is the most we can expect for the direct extension of ADMM (1.2a)–(1.2d). A linear convergence rate is indeed ideal for the generic convex model (1.1), and thus very strict conditions must be posed to attain it. In fact, even for the original ADMM applied to a two-block linearly constrained separable convex minimization model, a linear convergence rate can only be established for special cases (e.g., [1, 15]) or for generic cases under strong assumptions on the model (e.g., [7, 8]). Usually, the conditions that ensure the linear convergence rate of (1.2a)–(1.2d) are too strict to be satisfied by the majority of the concrete applications of the abstract model (1.1). In this sense, discussing the global linear convergence rate of the direct extension of ADMM (1.2a)–(1.2d) is mainly of theoretical interest.
7 Conclusions
In this paper, we discuss the convergence of the direct extension of the alternating direction method of multipliers (ADMM) for solving a three-block linearly constrained convex minimization model whose objective function is the sum of three functions without coupled variables. We show that the convergence of the direct extension of ADMM can be ensured if one function in the objective is strongly convex, the penalty parameter is appropriately restricted, and some assumptions on the operators in the constraints hold. This convergence result helps explain why the direct extension of ADMM is convergent for some applications. We also establish the worst-case convergence rate measured by the iteration complexity and the global linear convergence rate in an asymptotic sense under additional assumptions. This deepens the study of the convergence of the direct extension of ADMM and may help us better understand its theoretical aspects. We would like to mention that, instead of the original scheme (1.2a)–(1.2d), we can study the scheme
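$$\begin{aligned} x_1^{k+1}\in \arg \min _{x_1\in {\mathcal {X}}_1}\; {\mathcal {L}}_{\beta }\big (x_1,x_2^{k},x_3^{k},z^{k}\big ),\qquad x_2^{k+1}\in \arg \min _{x_2\in {\mathcal {X}}_2}\; {\mathcal {L}}_{\beta }\big (x_1^{k+1},x_2,x_3^{k},z^{k}\big ),\\ x_3^{k+1}\in \arg \min _{x_3\in {\mathcal {X}}_3}\; {\mathcal {L}}_{\beta }\big (x_1^{k+1},x_2^{k+1},x_3,z^{k}\big ),\qquad z^{k+1}= z^{k}-\gamma \beta \left( \sum _{i=1}^{3}A_ix_i^{k+1}-b\right) , \end{aligned}$$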
where \(\gamma \in (0,1)\). Since the relaxation factor \(\gamma \) can be arbitrarily close to 1, this scheme is asymptotically no different from the direct extension of ADMM (1.2a)–(1.2d). Similarly, we can discuss the convergence and estimate the convergence rate of the scheme (7.1a)–(7.1c) under different conditions; indeed, the proofs can be presented with simpler notation than those in this paper.
References
Boley, D.: Local linear convergence of ADMM on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. In: Jordan, M. (ed.) Foundations and Trends in Machine Learning, vol. 3, pp. 1–122 (2011)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40, 1935–1967 (2012)
Chen, C.H., He, B.S., Ye, Y.Y., Yuan, X.M.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)
Chen, C.H., Shen, Y., You, Y.F.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013 (2013), Article ID 183961
Corman, E., Yuan, X.M.: A generalized proximal point algorithm and its convergence rate. SIAM J. Optim. 24(4), 1614–1638 (2014)
Deng, W., Yin, W.T.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Deng, W., Lai, M.J., Peng, Z.M., Yin, W.T.: Parallel multi-block ADMM with \(o(1/k)\) convergence. Manuscript (2014)
Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11, 619–644 (2015)
Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer, New York (1984)
Glowinski, R., Marrocco, A.: Sur l'approximation par éléments finis d'ordre un et la résolution par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Revue Fr. Autom. Inf. Rech. Opér., Anal. Numér. 2, 41–76 (1975)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. Model. Simul. Optim. Sci. Technol. Comput. Methods Appl. Sci. 34, 59–82 (2014)
Han, D.R., Yuan, X.M.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155, 227–238 (2013)
Han, D.R., Yuan, X.M.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51, 3446–3457 (2013)
Han, D.R., Yuan, X.M., Zhang, W.X.: An augmented-Lagrangian-based parallel splitting method for separable convex programming with applications to image processing. Math. Comput. 83, 2263–2291 (2014)
He, B.S., Tao, M., Yuan, X.M.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)
He, B.S., Tao, M., Yuan, X.M.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. (under revision)
He, B.S., Tao, M., Yuan, X.M.: A splitting method for separable convex programming. IMA J. Numer. Anal. 35, 394–426 (2015)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)
Hong, M.Y., Luo, Z.Q.: On the linear convergence of alternating direction method of multipliers. Math. Program. (to appear)
Li, M., Sun, D.F., Toh, K.C.: A convergent \(3\)-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia Pac. J. Oper. Res. 32, 1550024 (2015). 19 pages
Lin, T.Y., Ma, S.Q., Zhang, S.Z.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3, 251–274 (2015)
Lin, T.Y., Ma, S.Q., Zhang, S.Z.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25, 1478–1497 (2015)
Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Revue Française d'Informatique et de Recherche Opérationnelle 4, 154–159 (1970)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition, vol. 544. Wiley, New York (2004)
Mohan, K., London, P., Fazel, M., Witten, D., Lee, S.: Node-based learning of multiple Gaussian graphical models. J. Mach. Learn. Res. 15, 445–488 (2014)
Peng, Y.G., Ganesh, A., Wright, J., Xu, W.L., Ma, Y.: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1969)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Tao, M., Yuan, X.M.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
Yang, W.H., Han, D.R.: Linear convergence of alternating direction method of multipliers for a class of convex optimization problems. SIAM J. Numer. Anal. 54(2), 625–640 (2016)
Zheng, X.Y., Ng, K.F.: Metric subregularity of piecewise linear multifunctions and applications to piecewise linear multiobjective optimization. SIAM J. Optim. 24, 154–174 (2014)
Zhou, X., Yang, C., Yu, W.: Moving object detection by detecting contiguous outliers in the low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35, 597–610 (2013)
Zhou, Z., Li, X., Wright, J., Candes, E.J., Ma, Y.: Stable principal component pursuit. In: Proceedings of International Symposium on Information Theory. Austin (2010)
Acknowledgments
Xingju Cai is supported by the NSFC Grant 11401315 and the NSF of Jiangsu Province BK20140914. Deren Han is supported by a project funded by the PAPD of Jiangsu Higher Education Institutions and the NSFC Grants 11371197 and 11431002. Xiaoming Yuan is supported by the NSFC/RGC Joint Research Scheme N_PolyU504/14.