Abstract
Because it updates the Lagrange multiplier twice at each iteration, the symmetric alternating direction method of multipliers (S-ADMM) often performs better than other ADMM-type methods. In practical applications, proximal terms with positive definite proximal matrices are often added to its subproblems, and it is well known that a large proximal parameter often results in the 'too-small-step-size' phenomenon. In this paper, we generalize the proximal matrix from positive definite to indefinite and propose a new S-ADMM with indefinite proximal regularization (termed IPS-ADMM) for two-block separable convex programming with linear constraints. Without any additional assumption, we prove the global convergence of the IPS-ADMM and analyze its worst-case \(\mathcal{O}(1/t)\) convergence rate in the ergodic sense in terms of iteration complexity. Finally, some numerical results are included to illustrate the efficiency of the IPS-ADMM.
1 Introduction
Let \(\mathcal{R}^{n_{i}}\) stand for an \(n_{i}\)-dimensional Euclidean space, and let \(\mathcal{X}_{i}\subseteq\mathcal{R}^{n_{i}}\) be a nonempty, closed and convex set, where \(i=1,2\). For two continuous closed convex functions \(\theta_{i}(x_{i}): \mathcal{R}^{n_{i}}\rightarrow\mathcal{R}\) (\(i=1,2\)), the canonical two-block separable convex programming problem with linear equality constraints is

$$ \min\bigl\{\theta_{1}(x_{1})+\theta_{2}(x_{2}) \mid A_{1}x_{1}+A_{2}x_{2}=b,\ x_{1}\in\mathcal{X}_{1},\ x_{2}\in\mathcal{X}_{2}\bigr\}, $$(1)
where \(A_{i}\in\mathcal{R}^{m\times n_{i}}\) (\(i=1,2\)), \(b\in\mathcal{R}^{m}\). Throughout, the solution set of (1) is assumed to be nonempty. Convex programming (1) has promising applicability in modeling many concrete problems arising in a wide range of disciplines, such as statistical learning, inverse problems and image processing; see, e.g. [1–3] for more details.
Convex programming (1) has been studied extensively in the literature, and researchers have developed many numerical methods for it during the last decades. These methods are mainly based on the well-known Douglas-Rachford splitting method [4, 5] and the Peaceman-Rachford splitting method [5, 6], both of which originated in the partial differential equation (PDE) literature. Concretely, applying the Douglas-Rachford splitting method to the dual of (1) [7, 8], we get the well-known alternating direction method of multipliers (ADMM) [9, 10], whose iterative scheme reads

$$ \textstyle\begin{cases} x_{1}^{k+1}\in\operatorname{argmin}_{x_{1}\in\mathcal{X}_{1}}\{\theta_{1}(x_{1})-(\lambda^{k})^{\top}(A_{1}x_{1}+A_{2}x_{2}^{k}-b)+\frac{\beta}{2}\|A_{1}x_{1}+A_{2}x_{2}^{k}-b\|^{2}\}, \\ x_{2}^{k+1}\in\operatorname{argmin}_{x_{2}\in\mathcal{X}_{2}}\{\theta_{2}(x_{2})-(\lambda^{k})^{\top}(A_{1}x_{1}^{k+1}+A_{2}x_{2}-b)+\frac{\beta}{2}\|A_{1}x_{1}^{k+1}+A_{2}x_{2}-b\|^{2}\}, \\ \lambda^{k+1}=\lambda^{k}-s\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b), \end{cases} $$(2)
where \(\lambda\in\mathcal{R}^{m}\) is the Lagrange multiplier, \(\beta>0\) is a penalty parameter, and \(s\in(0,\frac{1+\sqrt{5}}{2})\) is a relaxation factor. Analogously, applying the Peaceman-Rachford splitting method to the dual of (1), we get the symmetric ADMM (S-ADMM) [11–13], which generates its sequence via the scheme

$$ \textstyle\begin{cases} x_{1}^{k+1}\in\operatorname{argmin}_{x_{1}\in\mathcal{X}_{1}}\{\theta_{1}(x_{1})-(\lambda^{k})^{\top}(A_{1}x_{1}+A_{2}x_{2}^{k}-b)+\frac{\beta}{2}\|A_{1}x_{1}+A_{2}x_{2}^{k}-b\|^{2}\}, \\ \lambda^{k+\frac{1}{2}}=\lambda^{k}-r\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}-b), \\ x_{2}^{k+1}\in\operatorname{argmin}_{x_{2}\in\mathcal{X}_{2}}\{\theta_{2}(x_{2})-(\lambda^{k+\frac{1}{2}})^{\top}(A_{1}x_{1}^{k+1}+A_{2}x_{2}-b)+\frac{\beta}{2}\|A_{1}x_{1}^{k+1}+A_{2}x_{2}-b\|^{2}\}, \\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b), \end{cases} $$(3)
where the feasible region of r, s is
Both methods make full use of the separable structure of (1), and minimize the primal variables \(x_{1}\) and \(x_{2}\) individually in a Gauss-Seidel fashion. As elaborated in [13], the S-ADMM updates the Lagrangian multiplier twice at each iteration and thus the variables \(x_{1}\), \(x_{2}\) are treated in a symmetric manner. The S-ADMM includes some well-known ADMM-based schemes as special cases. For example, it reduces to the original ADMM (2) when \(r=0\), and reduces to the generalized ADMM [14] when \(r\in (-1,1)\), \(s=1\). Therefore, the S-ADMM provides a unified framework to study the ADMM-type methods. The convergence results of the S-ADMM with any \((r,s)\in\mathcal{D}\), including global convergence and the worst-case \(\mathcal{O}(1/t)\) convergence rate in an ergodic sense, have been established in [13]. To the best of the authors' knowledge, the worst-case \(\mathcal{O}(1/t)\) convergence rate in some non-ergodic sense of the S-ADMM is still missing.
In practical applications, the two essential subproblems related to \(x_{1}\) and \(x_{2}\) dominate the computation of the S-ADMM, and they may be neither linear nor easily solvable. To alleviate this issue, proximal terms are often added to these subproblems so as to linearize the quadratic term \(\frac{\beta}{2}\|A_{i}x_{i}\|^{2}\) (\(i=1,2\)); as a result we have the following proximal S-ADMM (termed PS-ADMM) [15–17]:

$$ \textstyle\begin{cases} x_{1}^{k+1}\in\operatorname{argmin}_{x_{1}\in\mathcal{X}_{1}}\{\theta_{1}(x_{1})-(\lambda^{k})^{\top}(A_{1}x_{1}+A_{2}x_{2}^{k}-b)+\frac{\beta}{2}\|A_{1}x_{1}+A_{2}x_{2}^{k}-b\|^{2}\}, \\ \lambda^{k+\frac{1}{2}}=\lambda^{k}-r\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}-b), \\ x_{2}^{k+1}\in\operatorname{argmin}_{x_{2}\in\mathcal{X}_{2}}\{\theta_{2}(x_{2})-(\lambda^{k+\frac{1}{2}})^{\top}(A_{1}x_{1}^{k+1}+A_{2}x_{2}-b)+\frac{\beta}{2}\|A_{1}x_{1}^{k+1}+A_{2}x_{2}-b\|^{2}+\frac{1}{2}\|x_{2}-x_{2}^{k}\|_{G}^{2}\}, \\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b), \end{cases} $$(5)
where \(G\in\mathcal{R}^{n_{2}\times n_{2}}\) is a positive definite matrix. When we set \(G=\tau I_{n_{2}}-\beta A_{2}^{\top}A_{2}\) with \(\tau>\beta\| A_{2}^{\top}A_{2}\|\), the quadratic term \(\frac{\beta}{2}\|A_{2}x_{2}\|^{2}\) in the \(x_{2}\)-subproblem of the PS-ADMM is offset, and thus the quadratic term \(\frac{\beta}{2}\|A_{1}x_{1}^{k+1}+A_{2}x_{2}-b\|^{2}\) is linearized. Then, if \(\mathcal{X}_{2}=\mathcal{R}^{n_{2}}\), the PS-ADMM only needs to compute the proximal mapping of the involved convex function \(\theta_{2}(\cdot)\) at each iteration, which is often simple enough to have a closed-form solution in many practical applications, such as \(\theta_{2}(x_{2})=\|x_{2}\|_{1}\) in compressive sensing problems [3], or \(\theta_{2}(x_{2})=\|x_{2}\|_{*}\) (the nuclear norm of the square matrix \(x_{2}\), i.e., the sum of its singular values) in robust principal component analysis models [18].
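As a concrete illustration (our own sketch, not from the paper), the proximal mapping of \(\theta_{2}(x_{2})=\|x_{2}\|_{1}\) mentioned above is the componentwise soft-thresholding operator; the function name `prox_l1` is ours:

```python
import numpy as np

def prox_l1(v, mu):
    """Proximal mapping of mu*||.||_1, i.e. argmin_x mu*||x||_1 + 0.5*||x - v||^2.
    The minimization decouples componentwise into soft-thresholding:
    components with |v_i| <= mu are mapped to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)
```

This closed form is what makes the linearized \(x_{2}\)-subproblem essentially free to solve in the \(\ell_{1}\) case.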
The price paid for the above improvement in solvability is that the proximal parameter τ is not easy to determine for some problems in practice. A large τ increases the weight of the quadratic term \(\frac{1}{2}\|x_{2}-x_{2}^{k}\|_{G}^{2}\) in the objective function of the \(x_{2}\)-subproblem and inevitably results in the 'too-small-step-size' phenomenon: the advance of \(x_{2}\) at the kth iteration is tiny, which often slows down the convergence of the corresponding method. Therefore, it is meaningful to expand the feasible set of τ. Obviously, if we further reduce τ to \(\tau\leq\beta\|A_{2}^{\top}A_{2}\| \), the proximal matrix G becomes indefinite, and it is thus natural to ask whether the corresponding method with such G is still globally convergent. Quite recently, the authors of [19–21] partially answered this question. More specifically, for the ADMM (2) with \({s}=1\), He et al. [19] proved that the feasible set of τ can be expanded to \(\{\tau |\tau>0.8\beta\|A_{2}^{\top}A_{2}\|\}\), and for the ADMM (2) with \({s}\in(0,\frac{1+\sqrt{5}}{2})\), Sun et al. [20] proved that the feasible set of τ can be expanded to \(\{\tau|\tau>(5-\min\{{s,1+s-s^{2}}\})\beta\| A_{2}^{\top}A_{2}\|/5\}\). Then, for the S-ADMM with \(r\in(-1,1)\), \(s=1\), Gao et al. [21] proved that the feasible set of τ can be expanded to \(\{\tau|\tau>(r^{2}-r+4)\beta\|A_{2}^{\top}A_{2}\|/(r^{2}-2r+5)\}\). Other relevant studies can be found in [22, 23]. In this paper, we continue along this direction and present a new feasible set of τ, which generalizes those in [19–21] to any \((r,s)\in\mathcal{D}\). Furthermore, we show that for any \((r,s)\in\mathcal{D}\), the global convergence of the S-ADMM with suitable indefinite proximal regularization can be guaranteed.
The rest of the paper is organized as follows. In Section 2, we summarize some preliminaries which are useful for further discussion. Then, in Section 3, we list the iterative scheme of the IPS-ADMM and prove its convergence results, including the global convergence and the convergence rate. Some preliminary numerical results are reported in Section 4. Finally, some conclusions are drawn in Section 5.
2 Preliminaries
In this section, we first list some notation used in this paper, and then characterize problem (1) by a mixed variational inequality problem. Some matrices and variables to simplify the notation of our later analysis are also defined.
For any two vectors \(x,y\in\mathcal{R}^{n}\), \(\langle x,y\rangle\) or \(x^{\top}y\) denote their inner product. For any two matrices \(A\in \mathcal{R}^{s\times m}\), \(B\in\mathcal{R}^{n\times s}\), the Kronecker product of A and B is defined as \(A\otimes B=(a_{ij}B)\). We let \(\|\cdot\|_{1}\) and \(\|\cdot\|\) be the \(\ell_{1}\)-norm and \(\ell_{2}\)-norm for vector variables, respectively. \(I_{n}\) denotes the n-dimensional identity matrix. If the matrix \(G\in \mathcal{R}^{n\times n}\) is symmetric, we use the symbol \(\|x\|_{G}^{2}\) to denote \(x^{\top}Gx\) even if G is indefinite; \(G\succ0\) (resp., \(G\succeq0\)) denotes that the matrix G is positive definite (resp., semi-definite).
Let us split the feasible set \(\mathcal{D}\) of the parameters \((r,s)\) into the following five subsets:
Obviously, the set \(\{\mathcal{D}_{1},\mathcal{D}_{2},\mathcal{D}_{3},\mathcal {D}_{4},\mathcal{D}_{5}\}\) is a simplicial partition of the set \(\mathcal{D}\).
Throughout, the proximal matrix G is defined by

$$ G=\tau I_{n_{2}}-\beta A_{2}^{\top}A_{2}, $$(7)
where we set \(\tau=\alpha\tilde{\tau}\) with \(\tilde{\tau}>\beta\| A_{2}^{\top}A_{2}\|\), \(\alpha\in(c(r,s),+\infty)\), and \(c(r,s)\) is defined by
Remark 2.1
Note that \(c(r,s)\leq1\) if \((r,s)\in\mathcal{D}\); see Lemmas 3.4-3.8 in Section 3. Therefore, the feasible set of τ is expanded from \(\{\tau|\tau>\beta\|A_{2}^{\top}A_{2}\|\}\) to \(\{\tau|\tau>c(r,s)\beta\| A_{2}^{\top}A_{2}\|\}\), which provides more choices for researchers or practitioners.
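To make Remark 2.1 concrete, the following toy check (our own illustration, with an assumed matrix \(A_{2}\)) shows that choosing τ below \(\beta\|A_{2}^{\top}A_{2}\|\) renders \(G=\tau I-\beta A_{2}^{\top}A_{2}\) indefinite, so that the 'G-norm' \(\|x\|_{G}^{2}=x^{\top}Gx\) can indeed be negative:

```python
import numpy as np

beta = 1.0
A2 = np.array([[2.0, 0.0],
               [0.0, 0.5]])                  # assumed toy matrix
AtA = A2.T @ A2                              # ||A2^T A2|| = 4 (spectral norm)
tau = 0.85 * beta * np.linalg.norm(AtA, 2)   # below the classical bound beta*||A2^T A2||

G = tau * np.eye(2) - beta * AtA             # proximal matrix as in (7)
eigs = np.linalg.eigvalsh(G)                 # one negative eigenvalue -> G indefinite

x = np.array([1.0, 0.0])
norm_G_sq = x @ G @ x                        # x^T G x = -0.6 < 0
```

With \(\tau=3.4\) here, G has eigenvalues \(-0.6\) and \(3.15\); the whole point of the IPS-ADMM analysis is that convergence survives this loss of positive definiteness.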
Furthermore, we define an auxiliary matrix as follows:

$$ G_{0}=\tilde{\tau}I_{n_{2}}-\beta A_{2}^{\top}A_{2}, $$(9)
which is positive definite by \(\tilde{\tau}>\beta\|A_{2}^{\top}A_{2}\|\).
Invoking the first-order optimality condition for convex programming, we get the following equivalent form of problem (1): find a vector \(w^{*}\in\mathcal{W}\) such that

$$ \theta(x)-\theta\bigl(x^{*}\bigr)+\bigl(w-w^{*}\bigr)^{\top}F\bigl(w^{*}\bigr)\geq0,\quad \forall w\in\mathcal{W}, $$(10)
where

$$ w=\begin{pmatrix} x_{1}\\ x_{2}\\ \lambda \end{pmatrix},\qquad \theta(x)=\theta_{1}(x_{1})+\theta_{2}(x_{2}),\qquad F(w)=\begin{pmatrix} -A_{1}^{\top}\lambda\\ -A_{2}^{\top}\lambda\\ A_{1}x_{1}+A_{2}x_{2}-b \end{pmatrix},\qquad \mathcal{W}=\mathcal{X}_{1}\times\mathcal{X}_{2}\times\mathcal{R}^{m}. $$(11)
Obviously, problem (10) is a mixed variational inequality problem, which is denoted by \(\operatorname{MVI}(\theta, F,\mathcal{W})\). The mapping \(F(w)\) defined in (11) is not only monotone but, being affine with a skew-symmetric matrix, also satisfies the property

$$ (w-\bar{w})^{\top}F(w)=(w-\bar{w})^{\top}F(\bar{w}),\quad \forall w,\bar{w}\in\mathcal{R}^{n_{1}+n_{2}+m}. $$(12)
Furthermore, the solution set of \(\operatorname{MVI}(\theta, F,\mathcal{W})\), denoted by \(\mathcal{W}^{*}\), is nonempty under the nonempty assumption for the solution set of problem (1).
Now, let us define three matrices in order to make our following analysis more succinct. Set
Lemma 2.1
Suppose the matrix \(A_{2}\) has full column rank and the parameter α in (7) satisfies
Then the matrices M, Q, H defined, respectively, in (6), (7) satisfy
Proof
The proof of (16) is trivial, and we only need to prove (17). By the positive definiteness of P, we only need to prove \(H(2:3,2:3)\) is positive definite. Here \(H(2:3,2:3)\) denotes the corresponding sub-matrix formed from the rows and columns with the indices \((2:3)\) and \((2:3)\) as in Matlab. Substituting (7) into the right-hand side of (14), we get
where the relationship ⪰ comes from \(\alpha>0\) and \(\tilde{\tau }>\beta\|A_{2}^{\top}A_{2}\|\). Since the matrix \(A_{2}\) is full column rank, we only need to prove the positive definiteness of the matrix
which can be further written as
where ⊗ denotes the matrix Kronecker product. Then, we only need to show the 2-by-2 matrix
is positive definite. In fact, by (15), we have
Therefore, the matrix H is positive definite. The proof is completed. □
At the end of this section, let us summarize two criteria to measure the worst-case \(\mathcal{O}(1/t)\) convergence rate of the ADMM-type methods in an ergodic sense.
(1) For a given compact set \(\bar{\mathcal{D}}\subset\mathcal {R}^{m+n}\), let \(d=\sup\{\|w-w^{0}\| \mid w\in\bar{\mathcal{D}}\}\), where \(w^{0}\) is the initial iterate. He et al. [24] established the following criterion:
$$ \sup_{w\in\bar{\mathcal{D}}}\bigl\{ \theta (x_{t})- \theta(x)+(w_{t}-w)^{\top}F(w)\bigr\} \leq\frac{Cd^{2}}{t+1}, $$(18)
where \(w_{t}=\frac{1}{t+1}\sum_{k=0}^{t}w^{k}\), \(C>0\), and t is the iteration counter. This criterion is used in [19, 21]. Obviously, the bound (18) is only ensured for \(w\in\bar{\mathcal{D}}\); in this sense the criterion (18) is not fully satisfactory.
(2) In [25], Lin et al. proposed the following criterion:
$$ \theta(x_{t})-\theta\bigl(x^{*}\bigr)+ \bigl(x_{t}-x^{*}\bigr)^{\top}\bigl(-A^{\top}\lambda^{*} \bigr)+\frac{c}{2}\|Ax_{t}-b\|^{2}\leq \frac{C}{t+1}, $$(19)
where \(c>0\). Proposition 1 in [25] indicates that the vector \({x}_{t}\in\mathcal{X}_{1}\times\mathcal{X}_{2}\) is an optimal solution to (1) if and only if the left-hand side of (19) equals zero. Compared with (18), the criterion (19) is more reasonable. Therefore, we shall use a criterion similar to (19) to measure the \(\mathcal{O}(1/t)\) convergence rate of our new method.
3 Algorithm and convergence results
In this section, we first present the symmetric ADMM with indefinite proximal regularization (termed IPS-ADMM), and then prove the convergence results of the sequence generated by the IPS-ADMM.
Algorithm 3.1
The IPS-ADMM for problem (1)
Step 0. Input four parameters \((r,s)\in\mathcal{D}\), \(\alpha\in (c(r,s),+\infty)\), \(\beta>0\), the tolerance \(\varepsilon>0\), and the proximal matrices \(P\in\mathcal{R}^{n_{1}\times n_{1}}\) with \(P\succ0\) and \(G\in\mathcal{R}^{n_{2}\times n_{2}}\) defined by (7). Initialize \((x_{1},x_{2},\lambda):=(x_{1}^{0},x_{2}^{0},\lambda^{0})\), and set \(k:=0\).

Step 1. Compute the new iterate \(w^{k+1}=(x_{1}^{k+1},x_{2}^{k+1},\lambda^{k+1})\) by the following iterative scheme:

$$ \textstyle\begin{cases} x_{1}^{k+1}\in\operatorname{argmin}_{x_{1}\in\mathcal{X}_{1}}\{ \theta_{1}(x_{1})-(\lambda^{k})^{\top}(A_{1}x_{1}+A_{2}x_{2}^{k}-b) \\ \hphantom{x_{1}^{k+1}\in{}}{}+\frac{\beta}{2}\| A_{1}x_{1}+A_{2}x_{2}^{k}-b\|^{2}+\frac{1}{2}\|x_{1}-x_{1}^{k}\|_{P}^{2}\}, \\ \lambda^{k+\frac{1}{2}}=\lambda^{k}-r\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}-b), \\ x_{2}^{k+1}\in\operatorname{argmin}_{x_{2}\in\mathcal{X}_{2}}\{\theta _{2}(x_{2})-(\lambda^{k+\frac{1}{2}})^{\top}(A_{1}x_{1}^{k+1}+A_{2}x_{2}-b) \\ \hphantom{x_{2}^{k+1}\in{}}{}+\frac{\beta}{2}\|A_{1}x_{1}^{k+1}+A_{2}x_{2}-b\|^{2}+\frac{1}{2}\|x_{2}-x_{2}^{k}\| _{G}^{2}\}, \\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b). \end{cases} $$(20)

Step 2. If \(\|w^{k}-{w}^{k+1}\|\leq\varepsilon\), then stop; otherwise set \(k:=k+1\), and go to Step 1.
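To make scheme (20) concrete, here is a self-contained toy run of the IPS-ADMM (our own illustrative instance and parameter choices, not from the paper), in which every subproblem of (20) has a closed form since \(A_{1}=A_{2}=1\) and the objectives are scalar quadratics:

```python
# Toy instance of (1): min 0.5*(x1-a)^2 + 0.5*(x2-c)^2  s.t.  x1 + x2 = b.
# Exact solution: x1* = (a+b-c)/2, x2* = (b+c-a)/2, lambda* = (b-a-c)/2.
a, c, b = 1.0, 3.0, 2.0             # => x1* = 0, x2* = 2, lambda* = -1

beta, r, s = 1.0, 0.5, 0.8          # (r, s) assumed to lie in D
tau1 = 0.1                          # P = tau1 (positive definite proximal weight)
tau_tilde = 1.01 * beta             # tilde{tau} > beta*|A2|^2 = beta
alpha = 0.95                        # assumed to exceed c(r, s)
g = alpha * tau_tilde - beta        # scalar proximal weight G of (7); here g < 0

x1 = x2 = lam = 0.0
for _ in range(500):
    # x1-step: solve (x1-a) - lam + beta*(x1 + x2 - b) + tau1*(x1 - x1_old) = 0
    x1 = (a + lam + beta * (b - x2) + tau1 * x1) / (1.0 + beta + tau1)
    lam_half = lam - r * beta * (x1 + x2 - b)        # first multiplier update
    # x2-step: solve (x2-c) - lam_half + beta*(x1 + x2 - b) + g*(x2 - x2_old) = 0
    x2 = (c + lam_half + beta * (b - x1) + g * x2) / (1.0 + beta + g)
    lam = lam_half - s * beta * (x1 + x2 - b)        # second multiplier update
```

Note that \(g<0\), so the proximal term in the \(x_{2}\)-step is genuinely indefinite, yet the iterates approach \((0, 2, -1)\) after a few hundred iterations.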
Remark 3.1
Since the global convergence of the IPS-ADMM with \(\alpha\geq1\) has been established in the literature [16, 26–28], in the following we restrict \(\alpha\in (c(r,s),1)\).
To prove the convergence results of the IPS-ADMM, we first define a block matrix and an auxiliary variable.
Lemma 3.1
For the sequence \(\{(x^{k},\lambda^{k})\}=\{ (x_{1}^{k},x_{2}^{k},\lambda^{k})\}\) generated by the IPS-ADMM, we have
and
where \(R=Q^{\top}+Q-M^{\top}HM\).
Proof
The proof of this lemma is similar to that of Lemma 3.1 and Theorem 4.2 in [13], which is omitted. □
Remark 3.2
By the definition of \(F(\cdot)\) in (11), (12), for any \((x_{1},x_{2},\lambda)\in\mathcal{R}^{m+n_{1}+n_{2}}\) such that \(A_{1}x_{1}+A_{2}x_{2}=b\), the left-hand side of (22) can be written as
Then, substituting the above equality into the left-hand side of (22), we get
where the vector \((x_{1},x_{2},\lambda)\in\mathcal{R}^{m+n_{1}+n_{2}}\) satisfies \(A_{1}x_{1}+A_{2}x_{2}=b\).
Comparing all the terms appearing in (19) and (24), we find that the left-hand side of (24) does not yet contain the term \(\| Ax^{k+1}-b\|^{2}\), and, due to the indefiniteness of R, the term \(\|v^{k}-\tilde{v}^{k}\|_{R}^{2}\) on the right-hand side of (24) may be negative. Now let us deal with the term \(\|v^{k}-\tilde{v}^{k}\|_{R}^{2}\); by doing so, the term \(\|Ax^{k+1}-b\|^{2}\) will also appear. By a manipulation, we get the concrete expression of the matrix R as follows:
Lemma 3.2
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda^{k})\} \) be the sequence generated by the IPS-ADMM. Then we have
Proof
The proof of this lemma is similar to that of Lemma 5.1 in [13], which is omitted. □
The following lemma deals with the crossing term \((Ax^{k+1}-b)^{\top}A_{2}(x_{2}^{k+1}-x_{2}^{k})\) on the right-hand side of (26), whose proof is mainly motivated by those of Lemma 3.2 in [26] and Lemma 5.2 in [13].
Lemma 3.3
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda^{k})\} \) be the sequence generated by the IPS-ADMM. Then we have
Proof
The first-order optimality condition of \(x_{2}\)-subproblem in (5) indicates that, for any \(x_{2}\in\mathcal{X}_{2}\),
Setting \(x_{2}=x_{2}^{k}\) in (28), we get
Similarly, taking \(x_{2}=x_{2}^{k+1}\) in (28) for \(k:=k-1\), we have
Then, adding the above two inequalities, we get
From the update formula for λ in (5), we have
Substituting the above equality into the left-hand side of (29), we get
By the definitions of G and \(G_{0}\) (see (7) and (9)), we have
where the last inequality comes from the Cauchy-Schwarz inequality. Substituting the above inequality into the right-hand side of (30) and arranging terms, we get the assertion (28) immediately. □
Then, substituting (28) into the right-hand side of (26), we get the following main theorem, which provides a lower bound of \(\|w^{k}-\tilde{w}^{k}\|_{R}^{2}\); the lower bound is composed of the term \(\|Ax^{k+1}-b\|^{2}\), some terms of the form \(\|w-w^{k+1}\| ^{2}-\|w-w^{k}\|^{2}\), and some others.
Theorem 3.1
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda ^{k})\}\) be the sequence generated by the IPS-ADMM. Then we have
Now, let us rewrite all the terms on the right-hand side of (31) as quadratic terms, and mainly deal with the term \(\| x_{2}^{k}-x_{2}^{k+1}\|_{G_{0}}^{2}\) and the crossing term \((Ax^{k}-b)^{\top}A_{2}(x_{2}^{k}-x_{2}^{k+1})\). According to the simplicial partition \(\mathcal{D}_{i}\) (\(i=1,2,\ldots,5\)) of the set \(\mathcal{D}\) in (6), the following analysis is divided into five cases, which are discussed in the following five subsections.
3.1 Case 1: \((r,s)\in\mathcal{D}_{1}\)
Lemma 3.4
For any fixed \((r,s)\in\mathcal{D}_{1}\), if \(\alpha >\alpha_{1}\doteq s+\frac{(1-s)^{2}}{2-r-s}\), then there are constants \(C_{11},C_{12}>0\) such that
Furthermore, \(\alpha_{1}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal {D}_{1}\), where \(\alpha_{0}\) is defined in (15).
Proof
We prove the assertion (32) from the definition of the matrix R directly. Define an auxiliary matrix \(R_{0}\) as
By the expression of R in (25), we have
Now, let us verify the positive definiteness of the matrix
which can be written as
Obviously, when \(\alpha>\alpha_{1}\), the above matrix is positive definite. Therefore, the matrix S is positive definite, and then the matrices R and \(R_{0}\) are both positive definite by the full column rank of \(A_{2}\) and the positive definiteness of P. By a manipulation, we get
where
By the positive definiteness of the matrix S, we get the assertion (32). By the definitions of \(\alpha_{0}\) and \(\alpha_{1}\), we have
Therefore, \(\alpha_{1}>\alpha_{0}\), for any \((r,s)\in\mathcal{D}_{1}\). By some manipulations, we have
Therefore, \(\alpha_{1}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{1}\). □
3.2 Case 2: \((r,s)\in\mathcal{D}_{2}\)
Lemma 3.5
For any \((r,s)\in\mathcal{D}_{2}\), if \(\alpha>\alpha _{2}\doteq\frac{4-r-r^{2}}{5-3r}\), then we have
where \(C_{2i}\) (\(i=1,2,3,4\)) are four positive constants defined by
Furthermore, \(\alpha_{2}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{2}\).
Proof
Setting \(s=1\) in (31), we have
which proves (33). From \(\alpha>\alpha_{2}\), it is obvious that \(C_{21}>0\), and from \(r\in(-1,1)\), \(\alpha\in(\alpha_{2},1)\), we have \(C_{22},C_{23},C_{24}>0\). By the definition of \(\alpha_{2}\), we get
Therefore, \(\alpha_{2}>\alpha_{0}\), for any \((r,s)\in\mathcal{D}_{2}\). By some manipulations, we have
Therefore, \(\alpha_{2}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{2}\). □
Remark 3.3
For any \((r,s)\in\mathcal{D}_{2}\), Gao et al. [21] have proved that \(\alpha_{G}\doteq\frac{r^{2}-r+4}{r^{2}-2r+5}\) is a lower bound of α. The curves of \(\alpha_{2}\) and \(\alpha_{G}\) with \(r\in(-1,1)\) are drawn in Figure 1, from which we have \(\alpha_{2}<\alpha_{G}\) if \(r\in(-1,0)\), and \(\alpha_{2}>\alpha_{G}\) if \(r\in(0,1)\). Therefore, compared with that in [21], the feasible set of τ in this paper is expanded if \(r\in(-1,0)\), and is shrunk if \(r\in(0,1)\). However, Gao et al. only established the worst-case convergence rate of the IPS-ADMM using the criterion (18), and we shall prove the worst-case convergence rate of the IPS-ADMM using the more reasonable criterion (19); see the following Theorem 3.4.
3.3 Case 3: \((r,s)\in\mathcal{D}_{3}\)
Lemma 3.6
For any \((r,s)\in\mathcal{D}_{3}\), if \(\alpha>\alpha _{3}\doteq\frac{7s^{2}-22s+23}{5s^{2}-20s+25}\), then we have
where \(C_{3i}\) (\(i=0,1,2,3,4\)) are five positive constants defined by
Furthermore, \(\alpha_{3}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{3}\).
Proof
By the Cauchy-Schwarz inequality, we have
Then, substituting the above inequality into the right-hand side of (31), we get
which proves (34). From \(\alpha\in(\alpha_{3},1)\), it is obvious that
Furthermore, by some manipulations, \(\forall(r,s)\in\mathcal{D}_{3}\), we have
Therefore, \(\alpha_{3}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{3}\). □
3.4 Case 4: \((r,s)\in\mathcal{D}_{4}\)
Lemma 3.7
For any \((r,s)\in\mathcal{D}_{4}\), if \(\alpha>\alpha _{4}\doteq\frac{r^{3} + r^{2} - r - 5}{3r^{2} - 2r - 5}\), then we have
where \(C_{4i}\) (\(i=0,1,2,3,4\)) are five positive constants defined by
Furthermore, \(\alpha_{4}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{4}\).
Proof
By the Cauchy-Schwarz inequality, we have
Then, substituting the above inequality into the right-hand side of (31), we get
which proves (35). From the definition of \(T_{2}\), \(\alpha\in (\alpha_{4},1)\), \((r,s)\in\mathcal{D}_{4}\), it is easy to verify that \(C_{40},C_{42},C_{43},C_{44}>0\). From the definition of \(C_{41}\), we get
Furthermore, by some manipulations, \(\forall(r,s)\in\mathcal{D}_{4}\), we have
Therefore, \(\alpha_{4}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{4}\). □
3.5 Case 5: \((r,s)\in\mathcal{D}_{5}\)
Lemma 3.8
For any \((r,s)\in\mathcal{D}_{5}\), if \(\alpha>\alpha _{5}\doteq\frac{(r^{2} + r - 4)s^{2} -( r^{2} +4r - 9)s - (r-1)^{2}}{s(2-s)(5-3r)}\), then we have
where \(C_{5i}\) (\(i=0,1,2,3,4\)) are five positive constants defined by
Furthermore, \(\alpha_{5}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{5}\).
Proof
By the Cauchy-Schwarz inequality, we have
Then, substituting the above inequality into the right-hand side of (31), we get
which proves (36). From the definition of \(T_{3}\), \(\alpha\in (\alpha_{5},1)\), \((r,s)\in\mathcal{D}_{5}\), it is easy to verify that \(C_{50},C_{52},C_{53},C_{54}>0\). From the definition of \(C_{51}\), for any \((r,s)\in\mathcal{D}_{5}\), we get
By the definition of \(\alpha_{5}\), for any \((r,s)\in\mathcal{D}_{5}\), we have
where the first inequality follows from \(s^{2}<1+r+s\), and the second inequality comes from \(r<0\), \(s\in(1,\frac{1+\sqrt{5}}{2})\), \(r<\frac{3s+1}{2}\). By some manipulations, we obtain
Therefore, \(\alpha_{5}\in(\alpha_{0},1)\), for any \((r,s)\in\mathcal{D}_{5}\). □
In the remainder of this section, we shall establish the convergence results of the sequence generated by the IPS-ADMM. First, based on (24) and Lemmas 3.4-3.8, we can get the following theorem.
Theorem 3.2
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda ^{k})\}\) be the sequence generated by the IPS-ADMM. Then, for any \((r,s)\in\mathcal{D}\), \(\alpha\in(c(r,s),1)\), where \(c(r,s)\) is defined in (8), we have
where \((x_{1},x_{2},\lambda)\in\mathcal{R}^{m+n_{1}+n_{2}}\) satisfies \(A_{1}x_{1}+A_{2}x_{2}=b\), \(C_{0},C_{3},C_{4}\geq0\), \(C_{1},C_{2}>0\) with \(C_{j}=C_{ij}\) if \((r,s)\in\mathcal{D}_{i}\), \(i=1,2,3,4,5\), \(j=0,1,2,3,4\).
With the above theorems in hand, now we are ready to prove the global convergence of the IPS-ADMM.
Theorem 3.3
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda ^{k})\}\) be the sequence generated by the IPS-ADMM. Then, if \(A_{1}\), \(A_{2}\) both have full column rank, the sequence \(\{(x^{k},\lambda^{k})\}\) is bounded and converges to a point \((x^{\infty},\lambda^{\infty})\in\mathcal{W}^{*}\).
Proof
Choosing an arbitrary \((x_{1}^{*},x_{2}^{*},\lambda^{*})\in\mathcal {W}^{*}\) and setting \(x_{1}=x_{1}^{*}\), \(x_{2}=x_{2}^{*}\), \(\lambda=\lambda^{*}\) in (37), we get
Then, from \(x\in\mathcal{X}_{1}\times\mathcal{X}_{2}\), \((x_{1}^{*},x_{2}^{*},\lambda ^{*})\in\mathcal{W}^{*}\) and (23), we have
which together with \(C_{0},C_{3},C_{4}\geq0\), \(C_{1},C_{2}>0\), \(H,G_{0}\succ0\) implies that
This, the full column rank of \(A_{2}\), and the positive definiteness of P indicate that
Furthermore, it follows from (38) that the sequences \(\{w^{k}\} \) and \(\{Ax^{k}-b\}\) are both bounded. Therefore, \(\{w^{k}\}\) has at least one cluster point, say \(w^{\infty}\); suppose that the subsequence \(\{w^{k_{i}}\}\) converges to \(w^{\infty}\). Then, taking limits on both sides of (21) along the subsequence \(\{w^{k_{i}}\}\) and using (39), we have
Therefore, \(w^{\infty}\in\mathcal{W}^{*}\).
Hence, replacing \(w^{*}\) by \(w^{\infty}\) in (38), we get
From (39), we see that, for any given \(\varepsilon>0\), there exists \(l_{0}>0\), such that
Since \(w^{k_{i}}\rightarrow w^{\infty}\) for \(i\rightarrow\infty\), there exists \(k_{l}>l_{0}\), such that
Then the above three inequalities lead, for any \(k>k_{l}\), to
Therefore the whole sequence \(\{w^{k}\}\) converges to \(w^{\infty}\). The proof is completed. □
Now, we are going to prove the worst-case \(\mathcal{O}(1/t)\) convergence rate in an ergodic sense of the IPS-ADMM.
Theorem 3.4
Let \(\{(x^{k},\lambda^{k})\}=\{(x_{1}^{k},x_{2}^{k},\lambda ^{k})\}\) be the sequence generated by the IPS-ADMM, and let
where t is a positive integer. Then,
where \((x^{*},\lambda^{*})\in\mathcal{W}^{*}\), and D is a constant defined by
Proof
Setting \(x=x^{*}\), \(\lambda=\lambda^{*}\) in (37), and summing the resulting inequality over \(k=1, 2, \ldots, t\), we have
Therefore, dividing (42) by t and using the convexity of \(\theta(\cdot)\) lead to
where the constant D is defined by (41).
Comparing (43) with (19), we only need to deal with the term \(\frac{C_{2}\beta}{2t}\sum_{k=1}^{t}\|Ax^{k+1}-b\|^{2}\) on the left-hand side of (43). In fact, from the convexity of \(\|\cdot\|^{2}\), we get

$$ \|Ax_{t}-b\|^{2}=\biggl\Vert \frac{1}{t}\sum_{k=1}^{t}\bigl(Ax^{k+1}-b\bigr)\biggr\Vert ^{2}\leq\frac{1}{t}\sum_{k=1}^{t}\bigl\Vert Ax^{k+1}-b\bigr\Vert ^{2}. $$
Then, substituting the above inequality into (43), we get the desired result (40). This completes the proof. □
4 Numerical results
We have established the convergence results of the IPS-ADMM in theory. In this section, by comparing the IPS-ADMM with the PS-ADMM [15], we are going to highlight its promising numerical behavior in solving an image restoration problem: the total-variation denoising problem. All the codes were written in Matlab R2010a and all the numerical experiments were conducted on a ThinkPad notebook with a Pentium(R) Dual-Core CPU at 2.20 GHz and 4 GB of RAM.
Below, we consider the total-variation (TV) denoising problem [29]:

$$ \min_{y\in\mathcal{R}^{n}} \frac{1}{2}\|y-b\|^{2}+\eta\|Dy\|_{1}, $$(44)
where \(D=[D_{1}^{\top},D_{2}^{\top}]^{\top}\) is a discrete gradient operator, with \(D_{1}:\mathcal{R}^{n}\rightarrow\mathcal{R}^{n}\), \(D_{2}:\mathcal {R}^{n}\rightarrow\mathcal{R}^{n}\) being the finite-difference operators in the horizontal and vertical directions, respectively, and \(\eta>0\) is the regularization parameter. Here, we set \(\eta=5\).
Introducing an auxiliary variable \(x\in\mathcal{R}^{2n}\), we can reformulate (44) as

$$ \min\Bigl\{\eta\|x\|_{1}+\frac{1}{2}\|y-b\|^{2} \Bigm| Dy-x=0,\ x\in\mathcal{R}^{2n},\ y\in\mathcal{R}^{n}\Bigr\}. $$(45)
Obviously, (45) is a special case of (1), and therefore the IPS-ADMM is applicable. Now, let us elaborate on how to derive the closed-form solutions of the subproblems resulting from the IPS-ADMM.
Set \(P=\tau_{1}I_{2n}\), \(G=\alpha\tau_{2}I_{n}-\beta D^{\top}D\). For given \((x^{k},y^{k},\lambda^{k})\), the first subproblem is
which has a closed-form solution:
For given \(x^{k+1}\), \(y^{k}\), \(\lambda^{k+\frac{1}{2}}\), the third subproblem is
which has a closed-form solution:
For the IPS-ADMM, we set \(\beta=1\), \(\tau_{1}=0.001\), \(\tau_{2}=1.01\beta\| D^{\top}D\|\), \(\alpha=1.01c(r,s)\). For the PS-ADMM, we set \(G=\tau_{2} I_{n}-\beta D^{\top}D\). The initialization is chosen as \(x_{0}=0\), \(y_{0}=b\), \(\lambda_{0}=0\). The stopping criterion is the same as that in [2]:
where \(\epsilon^{\mathrm{pri}}=\sqrt{n}\epsilon^{\mathrm{abs}}+\epsilon ^{\mathrm{rel}}\max\{\|x^{k+1}\|,\|Dy^{k+1}\|\}\), and \(\epsilon^{\mathrm{dual}}=\sqrt{n}\epsilon^{\mathrm{abs}}+\epsilon ^{\mathrm{rel}}\|y^{k+1}\|\) with \(\epsilon^{\mathrm{abs}}=10^{-4}\) and \(\epsilon^{\mathrm{rel}}=10^{-3}\). We use the following Matlab scripts to generate some synthetic data for (45) [21]:
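In the same spirit, here is a self-contained Python sketch (ours, not the paper's Matlab code) that generates piecewise-constant synthetic data for a 1-D analogue of (45) and runs the closed-form updates described above, under the assumed constraint orientation \(Dy-x=0\) (so \(A_{1}=-I\), \(A_{2}=D\), and zero right-hand side in the notation of (1)):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D analogue of (44): min_y 0.5*||y - b||^2 + eta*||D y||_1,
# with D the forward-difference operator.
n = 200
truth = np.repeat([0.0, 2.0, -1.0, 1.0], n // 4)     # piecewise-constant signal
b = truth + 0.1 * rng.standard_normal(n)             # noisy observation
eta = 5.0

D = np.diff(np.eye(n), axis=0)                       # (n-1) x n difference matrix
beta, r, s = 1.0, 0.5, 0.8                           # (r, s) assumed to lie in D
tau1 = 1e-3                                          # P = tau1 * I
tau2 = 1.01 * beta * np.linalg.norm(D.T @ D, 2)      # tilde{tau} > beta*||D^T D||
alpha = 0.95                                         # assumed to exceed c(r, s)

def shrink(v, mu):                                   # soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

x, y, lam = np.zeros(n - 1), b.copy(), np.zeros(n - 1)
for _ in range(300):
    # x-subproblem: componentwise soft-thresholding
    x = shrink((beta * (D @ y) - lam + tau1 * x) / (beta + tau1),
               eta / (beta + tau1))
    lam_half = lam - r * beta * (D @ y - x)
    # y-subproblem, linearized by G = alpha*tau2*I - beta*D^T D (possibly indefinite)
    grad = (y - b) - D.T @ lam_half + beta * D.T @ (D @ y - x)
    y = y - grad / (1.0 + alpha * tau2)
    lam = lam_half - s * beta * (D @ y - x)

tv = lambda z: np.abs(np.diff(z)).sum()
obj = lambda z: 0.5 * np.sum((z - b) ** 2) + eta * tv(z)
```

After a few hundred iterations the denoised y has a much smaller objective value and total variation than the noisy input b; the 2-D image experiments of this section follow the same pattern with the stacked operator \(D=[D_{1}^{\top},D_{2}^{\top}]^{\top}\).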
We list some numerical results in Table 1, which illustrate that the IPS-ADMM often performs much better than the PS-ADMM, even though the two methods differ only in the proximal parameter. The numerical advantage of a smaller proximal parameter is thus verified.
5 Conclusions
In this paper, a symmetric ADMM with indefinite proximal regularization for two-block linearly constrained convex programming is proposed. Under mild conditions, we have established the global convergence and the worst-case \(\mathcal{O}(1/t)\) convergence rate in an ergodic sense of the new method. Some numerical results are given, which illustrate that the new method often performs better than its counterpart with positive definite proximal regularization. Note that this paper only discusses the symmetric ADMM with indefinite proximal regularization for the two-block separable convex problems. In the future, we shall study the ADMM-type method with indefinite proximal regularization for the multi-block case.
References
Wang, YL, Yang, JF, Yin, WT, Zhang, Y: A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imaging Sci. 1(3), 248-272 (2008)
Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1-122 (2011)
Yang, JF, Zhang, Y: Alternating direction algorithms for \(\ell_{1}\)-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250-278 (2011)
Douglas, J, Rachford, HH: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. Am. Math. Soc. 82, 421-439 (1956)
Lions, PL, Mercier, B: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964-979 (1979)
Peaceman, DH, Rachford, HH: The numerical solution of parabolic elliptic differential equations. J. Soc. Ind. Appl. Math. 3(1), 28-41 (1955)
Gabay, D: Applications of the method of multipliers to variational inequalities. In: Fortin, M, Glowinski, R (eds.) Augmented Lagrange Methods: Applications to the Solution of Boundary-Valued Problems, pp. 299-331. North-Holland, Amsterdam (1983)
Glowinski, R, Tallec, PL: Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM Studies in Applied Mathematics. SIAM, Philadelphia (1989)
Glowinski, R, Marrocco, A: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér. 9, 41-76 (1975)
Gabay, D, Mercier, B: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2(1), 17-40 (1976)
Kiwiel, KC, Rosa, CH, Ruszczynski, A: Proximal decomposition via alternating linearization. SIAM J. Optim. 9(3), 668-689 (1999)
He, BS, Liu, H, Wang, ZR, Yuan, XM: A strictly contractive Peaceman-Rachford splitting method for convex programming. SIAM J. Optim. 24(3), 1011-1040 (2014)
He, BS, Ma, F, Yuan, XM: Convergence study on the symmetric version of ADMM with larger step sizes. SIAM J. Imaging Sci. 9(3), 1467-1501 (2016)
Eckstein, J, Bertsekas, D: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293-318 (1992)
Li, XX, Li, XM: A proximal strictly contractive Peaceman-Rachford splitting method for convex programming with applications to imaging. SIAM J. Imaging Sci. 8(2), 1332-1365 (2015)
Sun, M, Liu, J: A proximal Peaceman-Rachford splitting method for compressive sensing. J. Appl. Math. Comput. 50(1), 349-363 (2016)
Sun, HC, Sun, M, Zhou, HC: A proximal splitting method for separable convex programming and its application to compressive sensing. J. Nonlinear Sci. Appl. 9(2), 392-403 (2016)
Candès, EJ, Li, XD, Ma, Y, Wright, J: Robust principal component analysis? J. Assoc. Comput. Mach. 58, 1-37 (2011)
He, BS, Ma, F, Yuan, XM: Linearized alternating direction method of multipliers via indefinite proximal regularization for convex programming. Manuscript (2016)
Sun, M, Liu, J: The convergence rate of the proximal alternating direction method of multipliers with indefinite proximal regularization. J. Inequal. Appl. 2017, 19 (2017)
Gao, B, Ma, F: Symmetric alternating direction method with indefinite proximal regularization for linearly constrained convex optimization. J. Optim. Theory Appl. (2017, submitted)
He, BS, Ma, F: Convergence study on the proximal alternating direction method with larger step size. Manuscript (2017)
Tao, M, Yuan, XM: On Glowinski’s open question of alternating direction method of multipliers. Manuscript (2017)
He, BS, Yuan, XM: On the \(\mathcal{O}(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700-709 (2012)
Lin, ZC, Liu, RS, Li, H: Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Mach. Learn. 99(2), 287-325 (2015)
Xu, MH, Wu, T: A class of linearized proximal alternating direction methods. J. Optim. Theory Appl. 151(2), 321-337 (2011)
Fang, EX, He, BS, Liu, H, Yuan, XM: Generalized alternating direction method of multipliers: new theoretical insights and applications. Math. Program. Comput. 7(2), 149-187 (2015)
Sun, M, Wang, YJ, Liu, J: Generalized Peaceman-Rachford splitting method for multiple-block separable convex programming with applications to robust PCA. Calcolo 54(1), 77-94 (2017)
Beck, A, Teboulle, M: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419-2434 (2009)
Acknowledgements
The authors gratefully acknowledge the valuable comments of the anonymous reviewers. This work is supported by the National Natural Science Foundation of China (Nos. 11601475, 11671228), the First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics - Statistics), the Natural Science Foundation of Shandong Province (No. ZR2016AL05) and the Scientific Research Project of Shandong Universities (No. J15LI11).
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The first author has proposed the motivations of the manuscript; the second author has proved the convergence result, and the third author has accomplished the numerical results. All authors read and approved the final manuscript.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Sun, H., Tian, M. & Sun, M. The symmetric ADMM with indefinite proximal regularization and its application. J Inequal Appl 2017, 172 (2017). https://doi.org/10.1186/s13660-017-1447-3