Abstract
This note proposes a novel approach to deriving a worst-case \(O(1/k)\) convergence rate, measured by the iteration complexity in a non-ergodic sense, for the Douglas–Rachford alternating direction method of multipliers proposed by Glowinski and Marrocco.
1 Introduction
There has been an impressive development of operator splitting methods in the area of partial differential equations, and among them are some alternating direction methods of multipliers (ADMMs for short). In this note, we focus on the Douglas–Rachford ADMM scheme proposed by Glowinski and Marrocco in [7] (see also [5]), and we restrict our discussion to the context of a convex minimization problem with linear constraints and a separable objective function:
$$\begin{aligned} \min \; \{\theta _1(x)+\theta _2(y) \;|\; Ax+By=b, \; x\in \mathcal{X}, \; y\in \mathcal{Y}\}, \end{aligned}$$(1.1)
where \(A\in \mathfrak {R}^{m\times n_1}\), \(B\in \mathfrak {R}^{m\times n_2}\), \(b\in \mathfrak {R}^m\), \(\mathcal{X}\subset \mathfrak {R}^{n_1}\) and \(\mathcal{Y}\subset \mathfrak {R}^{n_2}\) are closed convex sets, \(\theta _1: \mathfrak {R}^{n_1} \rightarrow \mathfrak {R}\) and \(\theta _2: \mathfrak {R}^{n_2} \rightarrow \mathfrak {R}\) are convex functions (not necessarily smooth). The solution set of (1.1) is assumed to be nonempty, and we refer to [4, 8] for some convergence results without this assumption.
As in [10], in order to treat the original ADMM in [7] and the split inexact Uzawa method in [13] in a uniform way, we study the following ADMM scheme for (1.1):
$$\begin{aligned} x^{k+1}&=\arg \min \Big \{\theta _1(x)-(\lambda ^k)^T(Ax+By^k-b)+\frac{\beta }{2}\Vert Ax+By^k-b\Vert ^2+\frac{1}{2}\Vert x-x^k\Vert _G^2 \;\Big |\; x\in \mathcal{X}\Big \}, \end{aligned}$$(1.2a)
$$\begin{aligned} y^{k+1}&=\arg \min \Big \{\theta _2(y)-(\lambda ^k)^T(Ax^{k+1}+By-b)+\frac{\beta }{2}\Vert Ax^{k+1}+By-b\Vert ^2 \;\Big |\; y\in \mathcal{Y}\Big \}, \end{aligned}$$(1.2b)
$$\begin{aligned} \lambda ^{k+1}&=\lambda ^k-\beta (Ax^{k+1}+By^{k+1}-b), \end{aligned}$$(1.2c)
where \(\lambda ^k\in {\mathfrak {R}}^m\) is the Lagrange multiplier, \(\beta > 0\) is a penalty parameter, and \(G \in \mathfrak {R}^{n_1 \times n_1}\) is a symmetric and positive semidefinite matrix. In fact, the original ADMM scheme in [7] and the split inexact Uzawa method in [13] are recovered by taking \(G=0\) and \(G=(r I_{n_1}-\beta A^TA)\) with \(r >\beta \Vert A^TA\Vert \) (where \(\Vert \cdot \Vert \) denotes a matrix norm such as the spectral norm) in (1.2a), respectively. We refer to the review papers [1, 3, 6] and references therein for the history of ADMMs and, in particular, for some efficient applications of ADMMs exploited recently.
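To make the scheme concrete, the following self-contained sketch runs (1.2) with \(G=0\) (i.e., the original ADMM) on a toy instance of (1.1) with \(A=B=I\) and quadratic objectives, so that both subproblems have closed-form solutions. The toy data, variable names and the closed-form updates are our illustration, not part of the paper.

```python
import numpy as np

# Toy instance of (1.1): min 0.5||x - p||^2 + 0.5||y - q||^2  s.t.  x + y = b,
# i.e. A = B = I and X = Y = R^n.  With G = 0, scheme (1.2) is the original ADMM
# and both subproblems (1.2a)-(1.2b) reduce to the closed-form updates below.
rng = np.random.default_rng(0)
n = 5
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

beta = 1.0                                    # penalty parameter
x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)

for k in range(200):
    # (1.2a): minimize 0.5||x-p||^2 - lam^T x + (beta/2)||x + y^k - b||^2
    x = (p + lam + beta * (b - y)) / (1.0 + beta)
    # (1.2b): minimize 0.5||y-q||^2 - lam^T y + (beta/2)||x^{k+1} + y - b||^2
    y = (q + lam + beta * (b - x)) / (1.0 + beta)
    # (1.2c): Lagrange multiplier update
    lam = lam - beta * (x + y - b)

# KKT solution of the toy problem, for comparison
lam_star = (b - p - q) / 2.0
x_star, y_star = p + lam_star, q + lam_star
```

On this strongly convex toy instance the iterates converge very quickly to the KKT point; the worst-case \(O(1/k)\) bound studied in this note concerns the far weaker general convex setting.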
Because of its impressive efficiency and wide applicability, it is interesting to investigate the convergence rate of the ADMM scheme (1.2). A first step was taken in [10], where we showed a worst-case \(O(1/k)\) convergence rate measured by the iteration complexity, where \(k\) denotes the iteration counter, for the ADMM scheme (1.2) (see Footnote 1). With this first-step result, it becomes possible to investigate further results on the convergence rate of the scheme (1.2). Recall that the convergence rate derived in [10] is in the ergodic sense, because the approximate solution with an accuracy of \(O(1/k)\) is found based on all \(k\) iterates generated by (1.2). One may ask whether the same convergence rate can be established directly, in a non-ergodic sense, for the sequence generated by the scheme (1.2). The main purpose of this note is to answer this question affirmatively. We expect the new technique to be useful for analyzing convergence rates of other important numerical algorithms of the same kind.
2 Preliminaries
In this section, we provide some preliminaries which are useful in later analysis.
2.1 Notations
We first define some matrices which will simplify the notation in our analysis. More specifically, let
$$\begin{aligned} H=\begin{pmatrix} G & 0 & 0\\ 0 & \beta B^TB & 0\\ 0 & 0 & \frac{1}{\beta }I_m \end{pmatrix},\quad M=\begin{pmatrix} I_{n_1} & 0 & 0\\ 0 & I_{n_2} & 0\\ 0 & -\beta B & I_m \end{pmatrix} \quad \hbox {and}\quad Q=\begin{pmatrix} G & 0 & 0\\ 0 & \beta B^TB & 0\\ 0 & -B & \frac{1}{\beta }I_m \end{pmatrix}. \end{aligned}$$(2.1)
Without further assumptions on \(B\), the matrix \(H\) defined above can only be guaranteed to be symmetric and positive semidefinite. Nevertheless, we still use the notation \(\Vert w\Vert _H^2\) to represent the non-negative number \(w^THw\) in our analysis. Based on these matrices, some relationships can easily be derived, and we summarize them in the following proposition.
Proposition 2.1
Let the matrices \(H\), \(M\) and \(Q\) be defined in (2.1). Then we have
(1) \(Q=HM\);
(2) The symmetric matrix \((Q^T+Q) - M^THM\) is positive semidefinite: \((Q^T+Q) - M^THM \succeq 0\).
Proof
The first conclusion is trivial, and we omit it. For the second one, we notice that
$$\begin{aligned} (Q^T+Q) - M^THM = \begin{pmatrix} G & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & \frac{1}{\beta }I_m \end{pmatrix} \succeq 0. \end{aligned}$$
Thus, the proposition is proved. \(\square \)
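As a numerical sanity check of Proposition 2.1, the snippet below builds \(H\), \(M\) and \(Q\) for a random instance, using the block forms \(H=\mathrm{diag}(G,\beta B^TB,\frac{1}{\beta }I_m)\) and \(M\) with a \(-\beta B\) block (our reading of (2.1)), and verifies that \((Q^T+Q)-M^THM\) is positive semidefinite:

```python
import numpy as np

# Random instance; H, M follow our reading of (2.1):
#   H = diag(G, beta*B^T B, (1/beta) I_m),  M = [[I,0,0],[0,I,0],[0,-beta*B,I]].
rng = np.random.default_rng(1)
n1, n2, m, beta = 3, 4, 2, 0.7
B = rng.standard_normal((m, n2))
C = rng.standard_normal((n1, n1))
G = C.T @ C                        # a symmetric positive semidefinite G

Z = lambda r, c: np.zeros((r, c))
H = np.block([[G, Z(n1, n2), Z(n1, m)],
              [Z(n2, n1), beta * B.T @ B, Z(n2, m)],
              [Z(m, n1), Z(m, n2), np.eye(m) / beta]])
M = np.block([[np.eye(n1), Z(n1, n2), Z(n1, m)],
              [Z(n2, n1), np.eye(n2), Z(n2, m)],
              [Z(m, n1), -beta * B, np.eye(m)]])
Q = H @ M                          # conclusion (1) of Proposition 2.1

S = (Q.T + Q) - M.T @ H @ M        # conclusion (2): S should be PSD
eigs = np.linalg.eigvalsh((S + S.T) / 2.0)
```

With these choices, \(S\) works out to \(\mathrm{diag}(G,0,\frac{1}{\beta }I_m)\), matching the computation in the proof above.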
2.2 Variational inequality characterization of (1.1)
It is easy to see that (1.1) is characterized by a variational inequality (VI) problem: Find \(w^*=(x^*,y^*,\lambda ^*) \in \Omega :=\mathcal{X} \times \mathcal{Y} \times \mathfrak {R}^m\) such that
$$\begin{aligned} \theta (u)-\theta (u^*)+(w-w^*)^TF(w^*)\ge 0 \quad \forall \, w\in \Omega , \end{aligned}$$(2.2a)
where
$$\begin{aligned} u=\begin{pmatrix} x\\ y \end{pmatrix},\quad \theta (u)=\theta _1(x)+\theta _2(y),\quad w=\begin{pmatrix} x\\ y\\ \lambda \end{pmatrix} \quad \hbox {and}\quad F(w)=\begin{pmatrix} -A^T\lambda \\ -B^T\lambda \\ Ax+By-b \end{pmatrix}. \end{aligned}$$(2.2b)
Note that the mapping \(F(w)\) is monotone because it is affine with a skew-symmetric matrix. We denote by \(\Omega ^*\) the solution set of VI\((\Omega ,F,\theta )\); then \(\Omega ^*\) is nonempty under the assumption that the solution set of (1.1) is nonempty.
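The monotonicity claim rests on skew-symmetry alone: for an affine map \(F(w)=Sw+c\) with \(S^T=-S\), one has \((w_1-w_2)^T(F(w_1)-F(w_2))=d^TSd=0\) for \(d=w_1-w_2\). The block matrix below is our rendering of the skew-symmetric matrix underlying \(F\), which can be checked numerically:

```python
import numpy as np

# F(w) = S w + c with S skew-symmetric is monotone:
# (w1-w2)^T (F(w1)-F(w2)) = d^T S d = 0, since d^T S d = -(d^T S d).
rng = np.random.default_rng(2)
n1, n2, m = 3, 4, 2
A = rng.standard_normal((m, n1))
B = rng.standard_normal((m, n2))
S = np.block([[np.zeros((n1, n1)), np.zeros((n1, n2)), -A.T],
              [np.zeros((n2, n1)), np.zeros((n2, n2)), -B.T],
              [A, B, np.zeros((m, m))]])
d = rng.standard_normal(n1 + n2 + m)
gap = d @ S @ d                    # the inner product (w1-w2)^T (F(w1)-F(w2))
```

The monotonicity inequality thus holds with equality for this \(F\); this is all the analysis below needs.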
3 Sketch of Proof
To establish a worst-case \(O(1/k)\) convergence rate for the sequence \(\{w^k\}\) generated by (1.2) in a non-ergodic sense, we need the assertion in the following lemma.
Lemma 3.1
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then, we have
where
Proof
First, deriving the optimality condition of the minimization problem (1.2a), we have
By using (1.2c), it can be written as
It follows from (1.2) that
Combining (3.3), (4.2) and (3.4) together, we get \(w^{k+1}=(x^{k+1},y^{k+1},\lambda ^{k+1}) \in \Omega \), such that
which can be rewritten as
Using the notations of \(F(w)\), \(\eta (y^k,y^{k+1})\) and \(H\), we get the assertion (3.1) immediately. \(\square \)
Lemma 3.1 indicates that the quantity \(\Vert w^k-w^{k+1}\Vert _H^2\) can be used to measure the accuracy of the iterate \(w^{k+1}\) to a solution point of VI\((\Omega ,F,\theta )\). More specifically, since \(H\) is positive semidefinite, we conclude that \(H(w^{k+1}-w^k)=0\) and \(\eta (y^k ,y^{k+1})=0\) if \(\Vert w^k-w^{k+1}\Vert _H^2=0\). In other words, because of the variational inequality characterization (2.2), when \(\Vert w^k-w^{k+1}\Vert _H^2=0\), we have
which means that \(w^{k+1}\) is a solution of VI\((\Omega ,F,\theta )\). Therefore, \(\Vert w^k - w^{k+1}\Vert _H^2\) can be viewed as an error measure after \(k\) iterations of the ADMM scheme (1.2), and it is reasonable to seek an upper bound of \(\Vert w^k - w^{k+1}\Vert _H^2\) in terms of \(O(1/k)\) for the purpose of investigating the convergence rate of ADMM. Based on this fact, our proof proceeds in the following steps:
1. To show that \(\{w^k\}\) is strictly contractive with respect to \(\Omega ^*\), i.e.,
$$\begin{aligned} \Vert w^{k+1} - w^*\Vert _H^2 \le \Vert w^k-w^*\Vert _H^2 - \Vert w^k - w^{k+1}\Vert _H^2 \quad \forall \, w^*\in \Omega ^*; \end{aligned}$$(3.5)
2. To show that \(\{\Vert w^{k} - w^{k+1}\Vert _H^2 \}\) is monotonically non-increasing, i.e.,
$$\begin{aligned} \Vert w^{k} - w^{k+1}\Vert _H^2 \le \Vert w^{k-1}-w^{k}\Vert _H^2 \quad \forall \, k\ge 1; \end{aligned}$$(3.6)
3. To derive a worst-case \(O(1/k)\) convergence rate in a non-ergodic sense based on (3.5) and (3.6), i.e.,
$$\begin{aligned} \Vert w^k-w^{k+1}\Vert _H^2 \le \frac{1}{k+1}\Vert w^0-w^*\Vert _H^2 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(3.7)
In the following, our analysis is thus divided into three sections to address these three tasks.
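The three assertions (3.5)-(3.7) can be observed numerically. The sketch below runs ADMM (\(G=0\)) on a toy instance of (1.1) with \(A=B=I\) and quadratic objectives (our construction, not from the paper), tracks \(\Vert w^k-w^{k+1}\Vert _H^2\) with the \(H\)-norm reduced, under our reading of (2.1), to \(\beta \Vert \cdot \Vert ^2\) on the \(y\)-part and \(\frac{1}{\beta }\Vert \cdot \Vert ^2\) on the \(\lambda \)-part, and checks all three inequalities along the run:

```python
import numpy as np

# Toy instance: min 0.5||x-p||^2 + 0.5||y-q||^2  s.t.  x + y = b  (A = B = I).
# With G = 0 the H-norm (our reading of (2.1)) only sees the (y, lam) blocks.
rng = np.random.default_rng(3)
n, beta = 5, 1.0
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

lam_star = (b - p - q) / 2.0            # KKT multiplier of the toy problem
y_star = q + lam_star

def Hnorm2(dy, dlam):
    return beta * dy @ dy + dlam @ dlam / beta

x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
dist0 = Hnorm2(y - y_star, lam - lam_star)        # ||w^0 - w*||_H^2
dists, steps = [dist0], []
for k in range(50):
    x = (p + lam + beta * (b - y)) / (1.0 + beta)          # (1.2a), G = 0
    y_new = (q + lam + beta * (b - x)) / (1.0 + beta)      # (1.2b)
    lam_new = lam - beta * (x + y_new - b)                 # (1.2c)
    steps.append(Hnorm2(y - y_new, lam - lam_new))         # ||w^k - w^{k+1}||_H^2
    y, lam = y_new, lam_new
    dists.append(Hnorm2(y - y_star, lam - lam_star))       # ||w^{k+1} - w*||_H^2

tol = 1e-10
contract = all(dists[k + 1] <= dists[k] - steps[k] + tol for k in range(50))   # (3.5)
monotone = all(steps[k + 1] <= steps[k] + tol for k in range(49))              # (3.6)
rate = all(steps[k] <= dist0 / (k + 1) + tol for k in range(50))               # (3.7)
```

Of course such a run only illustrates, and cannot replace, the proofs in the following sections.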
4 Strict Contraction
We prove the conclusion (3.5) in this section, and our proof is inspired by Theorem 1 in [9]. We first present several lemmas.
Lemma 4.1
Let the sequence \(\{w^k\}\) be generated by (1.2). Then, we have
$$\begin{aligned} (\lambda ^k-\lambda ^{k+1})^TB(y^k-y^{k+1}) \ge 0. \end{aligned}$$(4.1)
Proof
Deriving the optimality condition of the minimization problem (1.2b), we have
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k+1})+(y-y^{k+1})^T\{-B^T\lambda ^k+\beta B^T(Ax^{k+1}+By^{k+1}-b)\}\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$
Substituting (1.2c) into the last inequality, we obtain
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k+1})+(y-y^{k+1})^T(-B^T\lambda ^{k+1})\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$(4.2)
Obviously, analogous to (4.2), we have
$$\begin{aligned} \theta _2(y)-\theta _2(y^{k})+(y-y^{k})^T(-B^T\lambda ^{k})\ge 0 \quad \forall \, y\in \mathcal{Y}. \end{aligned}$$(4.3)
Setting \(y=y^k\) and \(y=y^{k+1}\) in (4.2) and (4.3), respectively, and then adding the two resulting inequalities, we get (4.1) immediately. \(\square \)
Lemma 4.2
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then, we have
$$\begin{aligned} (w^{k+1}-w^*)^TH(w^k-w^{k+1}) \ge 0 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(4.4)
Proof
Setting \(w=w^*\) in (3.1), we obtain
In the following we show that the right-hand side of (4.5) is non-negative. First, since \(w^{k+1}\in \Omega \) and \(w^*\in \Omega ^*\), it follows from (2.2a) that
Because of the monotonicity of \(F\), we have
Thus, we obtain
On the other hand, by using the notation of \(\eta (y^k,y^{k+1})\) [see (3.2)], \(Ax^*+By^*=b\) and (1.2c), we have
Combining with (4.1), we get
Substituting (4.6) and (4.7) into the right-hand side of (4.5), the lemma is proved. \(\square \)
With the lemmas proved above, we are now ready to show the assertion (3.5).
Theorem 4.1
Let the sequence \(\{w^k\}\) be generated by (1.2) and \(H\) be given in (2.1). Then (3.5) is satisfied for any \(k\ge 0\).
Proof
Using (4.4), we have
$$\begin{aligned} \Vert w^k-w^*\Vert _H^2&= \Vert (w^{k+1}-w^*)+(w^k-w^{k+1})\Vert _H^2\\&= \Vert w^{k+1}-w^*\Vert _H^2+2(w^{k+1}-w^*)^TH(w^k-w^{k+1})+\Vert w^k-w^{k+1}\Vert _H^2\\&\ge \Vert w^{k+1}-w^*\Vert _H^2+\Vert w^k-w^{k+1}\Vert _H^2, \end{aligned}$$
and thus the assertion (3.5) is proved. \(\square \)
5 Monotonicity
This section shows the assertion (3.6), i.e., the sequence \(\{\Vert w^{k} - w^{k+1}\Vert _H^2\}\) is monotonically non-increasing. Again, we need to prove several lemmas for this purpose.
First of all, for the convenience of analysis, we introduce an auxiliary variable \({\tilde{w}}^k\) defined as
$$\begin{aligned} {\tilde{w}}^k=\begin{pmatrix} {\tilde{x}}^k\\ {\tilde{y}}^k\\ {\tilde{\lambda }}^k \end{pmatrix} :=\begin{pmatrix} x^{k+1}\\ y^{k+1}\\ \lambda ^k-\beta (Ax^{k+1}+By^k-b) \end{pmatrix}, \end{aligned}$$(5.1)
where \(w^{k}\) is generated by (1.2). Then, we have the relationship
$$\begin{aligned} w^k-w^{k+1}=M(w^k-{\tilde{w}}^k), \end{aligned}$$(5.2)
where the matrix \(M\) is given in (2.1).
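This relationship can be verified numerically on the toy instance \(A=B=I\) with \(G=0\); here \({\tilde{w}}^k=(x^{k+1},y^{k+1},\lambda ^k-\beta (Ax^{k+1}+By^k-b))\) and the matrix \(M\) with a \(-\beta B\) block are our reading of (5.1) and (2.1):

```python
import numpy as np

# One ADMM step (1.2) with G = 0 from a random point, on the toy instance
# A = B = I (so m = n1 = n2 = n), followed by a check of (5.2):
#   w^k - w^{k+1} = M (w^k - \tilde w^k).
rng = np.random.default_rng(4)
n, beta = 4, 0.9
p, q, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
x, y, lam = (rng.standard_normal(n) for _ in range(3))

x1 = (p + lam + beta * (b - y)) / (1.0 + beta)     # (1.2a)
y1 = (q + lam + beta * (b - x1)) / (1.0 + beta)    # (1.2b)
lam1 = lam - beta * (x1 + y1 - b)                  # (1.2c)

lam_t = lam - beta * (x1 + y - b)                  # third block of \tilde w^k
I, Z = np.eye(n), np.zeros((n, n))
M = np.block([[I, Z, Z], [Z, I, Z], [Z, -beta * I, I]])

lhs = np.concatenate([x - x1, y - y1, lam - lam1])        # w^k - w^{k+1}
rhs = M @ np.concatenate([x - x1, y - y1, lam - lam_t])   # M (w^k - \tilde w^k)
```

The first two blocks agree trivially, and the third reduces to (1.2c), which is the whole content of (5.2).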
Lemma 5.1
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} \theta (u)-\theta ({\tilde{u}}^k)+(w-{\tilde{w}}^k)^TF({\tilde{w}}^k) \ge (w-{\tilde{w}}^k)^TQ(w^k-{\tilde{w}}^k) \quad \forall \, w\in \Omega . \end{aligned}$$(5.3)
Proof
By using the notation \(\tilde{w}^k\) in (5.1), and the facts
the inequality (3.1) can be rewritten as
The assertion (5.3) thus follows immediately from the definition of \(Q\). \(\square \)
Lemma 5.1 enables us to establish an important inequality in the following lemma.
Lemma 5.2
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} ({\tilde{w}}^k-{\tilde{w}}^{k+1})^TQ\{(w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\} \ge 0. \end{aligned}$$(5.4)
Proof
Setting \(w=\tilde{w}^{k+1}\) in (5.3), we have
$$\begin{aligned} \theta ({\tilde{u}}^{k+1})-\theta ({\tilde{u}}^k)+({\tilde{w}}^{k+1}-{\tilde{w}}^k)^TF({\tilde{w}}^k) \ge ({\tilde{w}}^{k+1}-{\tilde{w}}^k)^TQ(w^k-{\tilde{w}}^k). \end{aligned}$$(5.5)
Note that (5.3) is also true for \(k:=k+1\) and thus we have
Setting \(w=\tilde{w}^k\) in the above inequality, we obtain
$$\begin{aligned} \theta ({\tilde{u}}^{k})-\theta ({\tilde{u}}^{k+1})+({\tilde{w}}^{k}-{\tilde{w}}^{k+1})^TF({\tilde{w}}^{k+1}) \ge ({\tilde{w}}^{k}-{\tilde{w}}^{k+1})^TQ(w^{k+1}-{\tilde{w}}^{k+1}). \end{aligned}$$(5.6)
Adding (5.5) to (5.6) and using the monotonicity of \(F\), we get (5.4) immediately. \(\square \)
Lemma 5.3
Let \(\{w^k\}\) be the sequence generated by (1.2), the associated sequence \(\{{\tilde{w}}^k\}\) be defined by (5.1), and the matrices \(H\), \(M\) and \(Q\) be given in (2.1). Then, we have
$$\begin{aligned} (w^k-{\tilde{w}}^k)^TM^THM\{(w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-{\tilde{w}}^k)-(w^{k+1}-{\tilde{w}}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$(5.7)
Proof
First, adding the term
$$\begin{aligned} \{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\}^TQ\{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\} \end{aligned}$$
to both sides of (5.4), and using \(w^TQw= \frac{1}{2}w^T(Q^T+Q)w\), we get
$$\begin{aligned} (w^k-w^{k+1})^TQ\{(w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-w^{k+1})-(\tilde{w}^k-\tilde{w}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$
Rearranging \((w^k - w^{k+1}) -(\tilde{w}^k-\tilde{w}^{k+1})\) in the above inequality as \((w^k-\tilde{w}^k) - (w^{k+1}-\tilde{w}^{k+1})\), we get
$$\begin{aligned} (w^k-w^{k+1})^TQ\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\} \ge \frac{1}{2}\Vert (w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\Vert _{(Q^T+Q)}^2. \end{aligned}$$
Substituting the term \((w^k - w^{k+1})\) into the left-hand side of the last inequality, and using the relationship in (5.2) and the fact \(Q=HM\) in Proposition 2.1, we obtain (5.7). \(\square \)
Finally, we are ready to show the assertion (3.6) in the following theorem.
Theorem 5.1
Let \(\{w^k\}\) be the sequence generated by (1.2) and \(H\) be given in (2.1). Then, (3.6) is satisfied for any \(k \ge 1\).
Proof
Setting \(a=M(w^k-\tilde{w}^k)\) and \(b=M(w^{k+1}-\tilde{w}^{k+1})\) in the identity
$$\begin{aligned} \Vert a\Vert _H^2-\Vert b\Vert _H^2 = 2a^TH(a-b)-\Vert a-b\Vert _H^2, \end{aligned}$$
we obtain
$$\begin{aligned}&\Vert M(w^k-\tilde{w}^k)\Vert _H^2-\Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2\\&\quad =2(w^k-\tilde{w}^k)^TM^THM\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\}-\Vert M\{(w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\}\Vert _H^2. \end{aligned}$$
Inserting (5.7) into the first term of the right-hand side of the last equality, we obtain
$$\begin{aligned} \Vert M(w^k-\tilde{w}^k)\Vert _H^2-\Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2 \ge \Vert (w^k-\tilde{w}^k)-(w^{k+1}-\tilde{w}^{k+1})\Vert _{(Q^T+Q)-M^THM}^2 \ge 0, \end{aligned}$$
where the last inequality is due to the positive semidefiniteness of the matrix \((Q^T+Q)-M^THM\) proved in Proposition 2.1. In other words, we derive
$$\begin{aligned} \Vert M(w^{k+1}-\tilde{w}^{k+1})\Vert _H^2 \le \Vert M(w^k-\tilde{w}^k)\Vert _H^2. \end{aligned}$$(5.8)
Recall the relationship in (5.2). The assertion (3.6) follows immediately from (5.8). \(\square \)
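For completeness, the elementary identity invoked in the proof above — in its \(H\)-weighted form, which is what the choice of \(a\) and \(b\) suggests — can be verified in one line:

```latex
\begin{aligned}
2a^{T}H(a-b) - \|a-b\|_H^2
  &= 2a^{T}Ha - 2a^{T}Hb - \bigl(a^{T}Ha - 2a^{T}Hb + b^{T}Hb\bigr)\\
  &= a^{T}Ha - b^{T}Hb \;=\; \|a\|_H^2 - \|b\|_H^2 .
\end{aligned}
```

Note that only symmetry of \(H\) is used; positive semidefiniteness is not needed for the identity itself.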
6 Non-ergodic convergence rate
With Theorems 4.1 and 5.1, we can prove the assertion (3.7). That is, a worst-case \(O(1/k)\) convergence rate in a non-ergodic sense for the ADMM scheme (1.2) is established.
Theorem 6.1
Let \(\{w^k\}\) be the sequence generated by (1.2). Then, the assertion (3.7) is satisfied.
Proof
First, it follows from (3.5) that
$$\begin{aligned} \sum _{t=0}^{k}\Vert w^t-w^{t+1}\Vert _H^2 \le \Vert w^0-w^*\Vert _H^2 \quad \forall \, w^*\in \Omega ^*. \end{aligned}$$(6.1)
According to Theorem 5.1, the sequence \(\{\Vert w^t-w^{t+1}\Vert _H^2 \}\) is monotonically non-increasing. Therefore, we have
$$\begin{aligned} (k+1)\Vert w^{k}-w^{k+1}\Vert _H^2 \le \sum _{t=0}^{k}\Vert w^t-w^{t+1}\Vert _H^2. \end{aligned}$$(6.2)
The assertion (3.7) follows from (6.1) and (6.2) immediately. \(\square \)
Notice that \(\Omega ^*\) is convex and closed (see Theorem 2.1 in [10]). Let \(d: = \inf \{ \Vert w^0 - w^*\Vert _H \, |\, w^*\in \Omega ^* \}\). Then, for any given \(\epsilon >0\), Theorem 6.1 shows that the ADMM scheme (1.2) needs at most \( \lfloor {d^2}/{\epsilon } \rfloor \) iterations to ensure that \(\Vert w^k- w^{k+1}\Vert _H^2\le \epsilon \). Recall that \(w^{k+1}\) is a solution of VI\((\Omega ,F,\theta )\) if \(\Vert w^k- w^{k+1}\Vert _H^2=0\) (see Lemma 3.1). A worst-case \(O(1/k)\) convergence rate in a non-ergodic sense for the ADMM scheme (1.2) is thus established in Theorem 6.1.
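As a quick illustration of this complexity bound (with illustrative values of \(d\) and \(\epsilon \) chosen by us, not from the paper):

```python
import math

# Worst-case iteration count from Theorem 6.1: to guarantee
# ||w^k - w^{k+1}||_H^2 <= eps, it suffices to run floor(d^2/eps) iterations,
# where d = inf{ ||w^0 - w*||_H : w* in Omega* }.
def max_iters(d: float, eps: float) -> int:
    return math.floor(d * d / eps)

# Halving eps doubles the worst-case count: the O(1/eps) behaviour of an
# O(1/k) rate.  (d = 8 and the dyadic tolerances are arbitrary test values.)
k1 = max_iters(8.0, 2.0 ** -10)    # 64 * 1024 iterations
k2 = max_iters(8.0, 2.0 ** -11)
```

The dyadic tolerances are chosen so that the arithmetic is exact in floating point.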
Finally, we remark that because of the monotonicity of the sequence \(\{\Vert w^k-w^{k+1}\Vert _H^2\}\) and the fact (6.1), and using Lemma 1.2 in [2], we can immediately refine the worst-case convergence rate in Theorem 6.1 from \(O(1/k)\) to \(o(1/k)\).
Notes
As in [11, 12] and many other works, a worst-case \(O(1/k)\) convergence rate measured by the iteration complexity means that the accuracy to a solution under a certain criterion is of the order \(O(1/k)\) after \(k\) iterations of an iterative scheme; equivalently, it requires at most \(O(1/\epsilon )\) iterations to achieve an approximate solution with an accuracy of \(\epsilon \).
References
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2010)
Deng, W., Lai, M.J., Peng, Z.M., Yin, W.T.: Parallel multi-block ADMM with \(o(1/k)\) convergence (2014, in press)
Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Report RRR 32–2012 (2012)
Fortin, M., Glowinski, R.: Augmented Lagrangian methods: applications to the numerical solution of boundary-value problems. Stud. Math. Appl. 15. North-Holland, Amsterdam (1983)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. In: Fitzgibbon, W., Kuznetsov, Y.A., Neittaanmaki, P., Pironneau, O. (eds.) Modeling, Simulation and Optimization for Science and Technology. Computational Methods in Applied Sciences, vol. 34, pp. 59–82. Springer, Dordrecht (2014)
Glowinski, R., Marroco, A.: Approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, vol. 9, R2, pp. 41–76 (1975)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. SIAM Studies in Applied Mathematics, SIAM, Philadelphia (1989)
He, B.S., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
He, B.S., Yuan, X.M.: On the O(1/n) convergence rate of Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
Nemirovsky, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Zhang, X.Q., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
Additional information
X. Yuan was supported by the General Research Fund from Hong Kong Research Grants Council: 203613.
B. He was supported by the NSFC Grant 91130007 and 11471156.
He, B., Yuan, X. On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130, 567–577 (2015). https://doi.org/10.1007/s00211-014-0673-6