Abstract
In this paper, we propose a new customized proximal point algorithm for linearly constrained convex optimization problems, and further extend the proposed method to separable convex optimization problems. Unlike existing customized proximal point algorithms, the proposed algorithms do not involve a relaxation step but still ensure convergence. We derive the particular iteration schemes and a unified variational inequality perspective. The global convergence and \({\mathcal {O}}(1/k)\)-convergence rate of the proposed methods are established under mild assumptions. Numerical experiments show that the proposed methods are effective compared with some state-of-the-art methods.
1 Introduction
1.1 Proximal Point Algorithm
The proximal point algorithm (PPA for short) dates back to Moreau [1]; it was introduced to the optimization community by Martinet [5] and extended by Rockafellar [2], Bertsekas and Tseng [3], and Kaplan and Tichatschke [4], among others. The PPA possesses robust convergence for very general problems in finite and infinite dimensions (Kassay [6] and Glüer [7]), and is also the basis for splitting methods (Spingarn [8] and Han [9]).
Consider a convex minimization problem with linear constraints of the form
where \({\mathcal {X}}\subset {\mathbb {R}}^{n}\) is a nonempty bounded closed convex set, \(\theta :{\mathcal {X}}\rightarrow {\mathbb {R}}\) is a convex (not necessarily smooth) function, \(A\in {\mathbb {R}}^{m\times n}\), and \(b\in {\mathbb {R}}^{m}\). The solution set of (1.1), denoted by \({\mathcal {X}}^*\), is assumed to be nonempty.
For solving problem (1.1), the primal-dual hybrid gradient (PDHG) algorithm proposed by Zhu and Chan [10] has the following iterative scheme:
where \(\lambda \in {\mathbb {R}}^m\) is the Lagrange multiplier vector and \(r,t>0\) are iteration parameters. Let \(w:=(x,\lambda )\) and \({\mathcal {W}}:= {\mathcal {X}}\times {\mathbb {R}}^m\). Then, by the first-order optimality conditions of iteration (1.2), for \(w^{k+1}\in {\mathcal {W}}\) we have
where
and \(I_n\) denotes the \(n\times n\) identity matrix and \(0_{m\times n}\) is the \(m\times n\) null matrix. However, He [11] showed that the PDHG algorithm is not necessarily convergent and gave a counterexample; see Sect. 3 of [11].
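Despite this caveat, the structure of the PDHG iteration is easy to illustrate. The sketch below applies it to a hypothetical strongly convex toy instance of (1.1) with \(\theta (x)=\tfrac{1}{2}\Vert x-c\Vert ^2\), for which the x-subproblem has a closed form; the data, step sizes and iteration count are our own choices (the display (1.2) is not reproduced here, and parameter conventions for PDHG vary across the literature), and on this particular instance the iteration does converge.

```python
import numpy as np

# Hypothetical toy instance of (1.1): minimize 0.5*||x - c||^2  s.t.  A x = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 1.0])

r, t = 2.0, 0.4                     # illustrative primal/dual step parameters
x, lam = np.zeros(2), np.zeros(1)
for _ in range(500):
    # x-step: proximal minimization of theta plus the Lagrangian term
    # (closed form for this quadratic theta)
    x = (c + A.T @ lam + r * x) / (1.0 + r)
    # lambda-step: gradient ascent on the multiplier
    lam = lam - t * (A @ x - b)
# For this instance the iteration converges to x = (0.5, 0.5), lam = -0.5.
```

The saddle point can be checked directly: \(x^*-c-A^{\text {T}}\lambda ^*=0\) and \(Ax^*=b\).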
He, Fu and Jiang [12] proposed a PPA using a linear proximal term (LPPA). For solving (1.1), the LPPA (Procedure 4.1 in [12]) first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)\) via
By the first-order optimality conditions of (1.4), for \({\tilde{w}}^k\in {\mathcal {W}}\), we have
where
To ensure the convergence, the new iterate \(w^{k+1}=(x^{k+1},\lambda ^{k+1})\) of LPPA is generated by a simple relaxation step
where \(\rho _k=\tau \rho _k^*\) and
He, Yuan and Zhang [13] proposed a customized proximal point algorithm (CPPA) by using matrix \(P_2\) instead of matrix \(P_1\) in (1.5), where
To guarantee convergence, the matrix \(P_2\) should be positive definite. As a consequence, it requires \(rt>\rho (A^{\text {T}}A)\), where \(\rho (A^{\text {T}}A)\) is the spectral radius of matrix \(A^{\text {T}}A\). The corresponding iterative scheme of CPPA first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)\) via
and then takes a simple relaxation step
where \(\gamma \in (0, 2)\) is a relaxation factor. Various numerical experiments show that \(\gamma >1\) provides faster convergence. For distinction, in this paper the CPPA (1.7)-(1.8) with \(\gamma =1\) is called the original CPPA, or CPPA for short, while the CPPA with \(\gamma >1\) is called the relaxed CPPA, or rCPPA for short.
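To make the prediction-relaxation pattern concrete, the following sketch mimics the CPPA prediction and relaxation steps on a toy instance of (1.1) with \(\theta (x)=\tfrac{1}{2}\Vert x-c\Vert ^2\). Since the displays (1.7)-(1.8) are not reproduced above, the update formulas below are our reconstruction of the standard CPPA from [13], and the data and the safety factor 1.05 are hypothetical.

```python
import numpy as np

# Hypothetical instance: minimize 0.5*||x - c||^2  s.t.  A x = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 1.0])

rho = np.linalg.eigvalsh(A.T @ A).max()   # spectral radius of A^T A
r = 2.0
t = 1.05 * rho / r                        # so that r*t > rho(A^T A)
gamma = 1.5                               # relaxation factor in (0, 2)

x, lam = np.zeros(2), np.zeros(1)
for _ in range(300):
    # prediction: x-subproblem (closed form for this theta), then the
    # customized dual update using the extrapolated point 2*x_p - x
    x_p = (c + A.T @ lam + r * x) / (1.0 + r)
    lam_p = lam - (A @ (2 * x_p - x) - b) / t
    # relaxation step with factor gamma
    x = x - gamma * (x - x_p)
    lam = lam - gamma * (lam - lam_p)
# Converges to the saddle point x = (0.5, 0.5), lam = -0.5.
```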
The augmented Lagrangian method of multipliers (ALM) is another efficient method for convex optimization; see Hestenes [14] and Powell [15]. Let \(\beta >0\) be the penalty parameter of the ALM. The iterative scheme of the ALM for problem (1.1) is
Rockafellar [16] showed that the ALM is exactly an application of PPA to the dual problem of (1.1), owning the compact inequality (1.3) where P is replaced by
However, He, Yuan and Zhang [13] showed that the CPPA is more efficient than the ALM when the objective function \(\theta (x)\) is simple in the sense that the resolvent operator of \(\theta (x)\), defined by \((I+\tfrac{1}{r}\partial \theta )^{-1}\), has a closed-form representation, where \(\partial \theta \) is the subdifferential of \(\theta \).
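A standard closed-form instance is \(\theta (x)=\Vert x\Vert _1\), whose resolvent \((I+\tfrac{1}{r}\partial \theta )^{-1}\) is componentwise soft-thresholding with threshold \(1/r\); the function name below is ours.

```python
import numpy as np

def resolvent_l1(v, r):
    """Closed form of (I + (1/r) * d||.||_1)^{-1}(v):
    componentwise soft-thresholding with threshold 1/r."""
    return np.sign(v) * np.maximum(np.abs(v) - 1.0 / r, 0.0)
```

For instance, applying it to \((2, -0.3, 0.5)\) with \(r=2\) shrinks every entry toward zero by \(0.5\), giving \((1.5, 0, 0)\).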
1.2 Proximal Point Algorithm for Separable Problem
Consider a separable convex optimization problem of the form
where \({\mathcal {X}}\subset {\mathbb {R}}^{n_1}\) and \({\mathcal {Y}}\subset {\mathbb {R}}^{n_2}\) are nonempty bounded closed convex sets, \(\theta _1:{\mathcal {X}}\rightarrow {\mathbb {R}}\) and \(\theta _2:{\mathcal {Y}}\rightarrow {\mathbb {R}}\) are convex (not necessarily smooth) functions, \(A\in {\mathbb {R}}^{m\times n_1}\), \(B\in {\mathbb {R}}^{m\times n_2}\) and \(b\in {\mathbb {R}}^{m}\). The solution set of (1.9), denoted by \({\mathcal {X}}^*\times {\mathcal {Y}}^*\), is assumed to be nonempty.
The augmented Lagrangian-based methods, especially some splitting forms including alternating direction method of multipliers (ADMM for short) and parallel splitting algorithm [17], are verified to be very efficient for problem (1.9). The iterative scheme of the ALM to the problem (1.9) is
In the second subproblem of scheme (1.10), the coupled variable (x, y) makes it intractable. As a splitting version of the ALM, the ADMM, originally proposed by Glowinski [18] and Gabay [19], is as follows:
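As a hedged illustration of these alternating updates (the x-subproblem, then the y-subproblem, then the multiplier update), consider a toy instance of (1.9) with \(\theta _1(x)=\tfrac{1}{2}\Vert x-c_1\Vert ^2\), \(\theta _2(y)=\tfrac{1}{2}\Vert y-c_2\Vert ^2\) and \(A=B=I\); all data and parameter values below are hypothetical.

```python
import numpy as np

# Toy instance of (1.9): minimize 0.5*||x-c1||^2 + 0.5*||y-c2||^2
# subject to x + y = b  (i.e., A = B = I), with hypothetical data.
c1 = np.array([1.0, 0.0])
c2 = np.array([0.0, 1.0])
b = np.array([2.0, 2.0])

beta = 1.0                                    # penalty parameter
x, y, lam = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(200):
    # x-subproblem with y fixed (closed form for this quadratic theta_1)
    x = (c1 + beta * (b - y) + lam) / (1.0 + beta)
    # y-subproblem with the new x (closed form for theta_2)
    y = (c2 + beta * (b - x) + lam) / (1.0 + beta)
    # multiplier update
    lam = lam - beta * (x + y - b)
# Converges to x = (1.5, 0.5), y = (0.5, 1.5), lam = (0.5, 0.5).
```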
The ADMM overcomes the aforementioned drawback of the ALM by decoupling the variables x and y. Gabay [20] showed that the ADMM is essentially an application of the Douglas–Rachford splitting method [21]. Cai, Gu and He [22] provided a novel and simple PPA perspective for understanding the ADMM, and proposed a generalized ADMM (gADMM) which first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{y}}^k, {\tilde{\lambda }}^k)\) via
Let \(v=(y,\lambda )\) and \(w:=(x,v)\in {\mathcal {W}}={\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m\), and then the new iterate \(w^{k+1}=(x^{k+1},v^{k+1})\) of the gADMM is generated by a relaxation step:
The iterative scheme (1.11) can be written in the compact form: find \({\tilde{w}}^k \in {\mathcal {W}}\) such that
where \(u=(x,y)\), \(w=(u, \lambda ) \in {\mathcal {W}}\), \(\theta (u)=\theta _1(x)+\theta _2(y)\), and
and the customized matrix \(P_4\) is as follows:
The special structure of the matrix \(P_4\) implies that the variable x need not be involved in the iteration of the gADMM. The following matrix
which removes the zero elements from \(P_4\), is required to be symmetric and positive semidefinite to ensure the convergence of the gADMM. The gADMM was verified to be faster than the ADMM because some approximate computation is permitted.
Let \({P}'_4\in {\mathbb {R}}^{(n_1+n_2+m)\times (n_1+n_2+m)}\) be a matrix such that \({P}'_4(w^k-{\tilde{w}}^{k})\) is a descent direction. He and Yuan [23, 24] presented a general relaxation step as follows:
where \(\gamma \) is the steplength. Let \(H ={\bar{P}}_4 ({P'_4})^{-1}\) and \(G=({{\bar{P}}_4})^{\text {T}}+{\bar{P}}_4-\gamma ({P'_4})^{\text {T}} H({P'_4})\). One can relax the conventional assumptions on \({\bar{P}}_4\) while ensuring that G is symmetric and positive semidefinite. In [23, 24], He and Yuan established the worst-case \({\mathcal {O}}(1/k)\) convergence rate and the non-ergodic convergence rate of the Douglas–Rachford alternating direction method of multipliers.
1.3 Contributions
From the above analysis, there exist many PPAs for problems (1.1) and (1.9).
Relaxation steps are used very effectively in PPAs. For example, the relaxation step of the CPPA accelerates the algorithm for problem (1.1), while the relaxation steps of the LPPA and the gADMM ensure convergence for problems (1.1) and (1.9), respectively. However, the relaxation step might be unacceptable or even not permitted in some practical applications, including real-time control systems, decision-making in flow-line production, supply chain management and decision-making, and so on. For instance, in the decision-making framework for supply chain management [25,26,27], each firm on the chain makes its decision independently in a proper sequence, and the decision is executed immediately once it is made. Hence, the firms cannot adjust their decisions at the end; correspondingly, a relaxation step in the simulation method has no actual meaning.
To adapt to these practical circumstances, we prefer to modify the prediction step rather than take a relaxation step, and we propose two novel customized proximal point algorithms without relaxation steps. The proposed methods are generalized versions of the CPPA proposed in [13, 28] for convex and separable convex problems, respectively. The global convergence and the \({\mathcal {O}}(1/k)\)-convergence rate of the proposed methods will be established under mild assumptions. We also verify that the new methods have better numerical performance in practice compared with some state-of-the-art methods that include a relaxation step.
The rest of the paper is organized as follows. In Sect. 2, we propose two new customized proximal point algorithms without relaxation steps for problems (1.1) and (1.9), respectively, and prove their convergence. In Sect. 3, some preliminary numerical results, compared with state-of-the-art methods, are presented to show the efficiency of the new methods. Section 4 concludes the paper with some final remarks.
2 Two New Customized Proximal Point Algorithms
2.1 The Generalized Customized Proximal Point Algorithm
The Lagrangian function of problem (1.1) is
If \(x^*\in {\mathcal {X}}\) is a solution of problem (1.1), then there exists \(\lambda ^*\in {\mathbb {R}}^m\) such that the pair \((x^*,\lambda ^*)\) is a saddle point of (2.1) satisfying
Consequently,
By Lemma 2.1 in [11], the first-order optimality conditions of problem (2.2) are as follows:
Definition 2.1
The solution set of (1.1), denoted by \({\mathcal {W}}^*\subset {\mathcal {W}}\), consists of all \(w^*\in {\mathcal {W}}\) satisfying
where
The generalized customized proximal point algorithm for problem (1.1) is proposed as follows.
By the first-order optimality conditions of iteration subproblem (2.5), we have
which implies that, for all \(w\in {\mathcal {W}}\), the following variational inequality holds:
In a compact form, we have
where F(w) is defined by (2.4), and
Lemma 2.3
Let \(w^*\in {\mathcal {W}}^*\) be a solution defined by (2.3), and \(\{w^k\}\) be a sequence generated by the gCPPA method. Then we have
Proof
By setting \(u=u^*,~~w=w^*\) in (2.6), we get
By the monotonicity of F(w), we have
Since \(w^*\in {\mathcal {W}}^*\) is a solution, it follows
Adding (2.10) and (2.11) yields
Substituting (2.12) into (2.9), we obtain (2.8) and complete the proof. \(\square \)
In the current subsection, let
It is easy to show that if \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), where \(\alpha \in (0.5,1)\), then Q and M are positive semidefinite. We use the notation \(\Vert w\Vert ^2_H: =w^{\text {T}} H w\ge 0\) if H is symmetric and positive semidefinite.
Lemma 2.4
For the sequence \(\{w^k\}\) generated by the gCPPA method we have
Proof
By the identity
we have
and
Combining (2.15), (2.16) and (2.8), we obtain
By the definition of Q in (2.7), it follows
Applying \(\lambda \)-iteration of (2.5), we get
Combining (2.18) with (2.19), it follows
Substituting (2.20) into (2.17), we obtain (2.14) immediately and complete the proof. \(\square \)
Theorem 2.5
Suppose \(\alpha \in (0.5,1)\), \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), the sequence \(\{w^k\}\) is generated by the gCPPA method. Then we have
and \(\{w^k\}\) globally converges to an element of solution set \({\mathcal {W}}^*\) satisfying (2.3).
Proof
Since \(\alpha \in (0.5,1)\), \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), it holds that
where M is defined by (2.13). Summing (2.14) from 0 to \(K>1\) yields
On one hand, we get from (2.22) and (2.23) that
holds for all \(K>1\). Since the objective function of problem (1.1) is a closed convex function, the solution set \({\mathcal {W}}^*\) is a nonempty bounded closed convex set.
On the other hand, by taking limits on both sides of (2.23), we get
where \(d:= \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2. \) Thus
which implies (2.21), i.e.,
The sequence \(\{w^k\}\) has at least one cluster point. Let \(w^\infty \) be a cluster point of \(\{w^k\}\), i.e., there exists a subsequence \(\{w^{k_j}\}_{k_j\in {\mathcal {K}}}\) such that \(w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}\). Then, taking the limit in (2.6) along this subsequence and applying (2.26), we have
which implies that \(w^{\infty }\in {\mathcal {W}}^*\) is a solution satisfying (2.3). This completes the proof. \(\square \)
Theorem 2.6
If the conditions of Theorem 2.5 hold, then the sequence \(\{w^k\}\) generated by the gCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\).
Proof
Substituting \(a=w^{k-1}-w^{k}\) and \(b=w^k-w^{k+1}\) into the identity
we obtain
Letting \(w:=w^k\) in (2.6), we get
Since (2.6) also holds at the \((k-1)^{th}\) iteration, we have
Setting \(w:=w^{k+1}\) in the above inequality, we obtain
Adding (2.28) and (2.29), it yields
By the monotonicity of F(w), we get from the above inequality that
Adding \(\big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\big \Vert ^2_{Q}\) to both sides of (2.30), it follows that
It is easy to derive that
Combining (2.27) with (2.31) and (2.32), we have
By the definition of Q in (2.7), it follows
By \(\lambda \)-iteration of (2.5), we get
which implies
Substituting (2.20) and (2.34) into (2.33), we have
Adding \(\Big (\tfrac{t(2\alpha -1)}{\alpha } -\tfrac{t}{\alpha }\Big ) \Big [\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ]\) to both sides of (2.35), we obtain
Recalling that M is positive semidefinite and \(\alpha \in (0.5,1)\), it follows that
which implies that \(\Big \{\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big \}\) is monotonically nonincreasing for all \(k\ge 0\). Summing (2.36) from 1 to \(K+1\) yields
Combining this with (2.23), (2.24) and (2.1), we have
Hence, for any \(\epsilon >0\) there exists K such that
Let \(\lfloor d/\epsilon \rfloor \) denote the largest integer not greater than \(d/\epsilon \). Then inequality (2.37) implies that the gCPPA needs at most \(\lfloor d/\epsilon \rfloor \) iterations to ensure that
which implies the sequence \(\{w^k\}\) generated by the gCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense. The proof of Theorem 2.6 is completed. \(\square \)
2.2 Extended gCPPA to Separable Convex Minimization Problems
The Lagrangian function of problem (1.9) is
If a pair of \((x^*,y^*)\in {\mathcal {X}}\times {\mathcal {Y}}\) is a solution of problem (1.9), then there exists \(\lambda ^*\in {\mathbb {R}}^m\) such that \((x^*,y^*,\lambda ^*)\) is a saddle point of (2.38) satisfying
Consequently, by (2.39) it follows
According to Lemma 2.1 in [11], we have the first-order optimality conditions of problem (2.40):
By a compact form of (2.41), the solution set of problem (1.9), also denoted by \({\mathcal {W}}^*\subset {\mathcal {W}}\), consists of all \(w^*\in {\mathcal {W}}\) satisfying (2.3), where \({\mathcal {W}} := {\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m\), \(\theta (u):=\theta _1(x)+\theta _2(y)\) and
We are in the position to propose the extended gCPPA for separable convex optimization problem (1.9).
By the first-order optimality conditions of iteration subproblem (2.43), for \(w^{k+1}\in {\mathcal {W}}\) generated by the eCPPA and all \(w\in {\mathcal {W}}\), we have
By the notations in (2.42), inequality (2.44) can be rewritten in the compact form (2.6), in which
Lemma 2.8
Let \(w^*\in {\mathcal {W}}^*\) be a solution satisfying (2.3), then we have
Proof
The mapping F(w) is affine with a skew-symmetric matrix, and thus it is monotone. The rest of the proof is the same as that of Lemma 2.3. \(\square \)
In this subsection, let
If \(r, s, t>0\) and \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(\alpha \in (0.5,1)\), it is easy to verify that Q and M are positive semidefinite.
Lemma 2.9
For the sequence \(\{w^k\}\) generated by the eCPPA method, we have
Proof
Similar to Lemma 2.4, we also get (2.17). By notation Q in (2.45), it holds
From \(\lambda \)-iteration of (2.43), we get
which implies
where
Substituting (2.49) into (2.17), we obtain (2.48) immediately and complete the proof. \(\square \)
Theorem 2.10
Suppose that \(r, s, t>0\), \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A)\) and \(st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(\alpha \in (0.5,1)\). Then, for the sequence \(\{w^k\}\) generated by the eCPPA method, we have
and \(\{w^k\}\) globally converges to an element of solution set \({\mathcal {W}}^*\) satisfying (2.41).
Proof
Since \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), it follows
Noticing that \(t>0\) and \(\alpha \in (0.5,1)\), we get
Summing (2.48) from 0 to \(K>1\) yields
By (2.52) and (2.53), it is easy to derive (2.24) and
which implies that
Hence, there exists a subsequence \(\{w^{k_j}\}_{k_j\in {\mathcal {K}}}\) such that \(w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}\) is a solution satisfying (2.3), which completes the proof. \(\square \)
Theorem 2.11
If the conditions of Theorem 2.10 hold, then the sequence \(\{w^k\}\) generated by the eCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense.
Proof
Obviously, it holds that
Then, for any \(\epsilon >0\) there exists K such that
By (2.51), inequality (2.55) implies that the eCPPA needs at most \(\lfloor d/\epsilon \rfloor \) iterations to ensure that
which implies that the sequence \(\{w^k\}\) generated by the eCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense. The proof of Theorem 2.11 is completed. \(\square \)
3 Numerical Results
This section focuses on the numerical performance of the gCPPA and eCPPA compared with some state-of-the-art methods. All methods are coded in MATLAB 2012b and run on an HP desktop computer with an Intel(R) Core(TM) i5-6500 CPU at 3.20 GHz and 8 GB of RAM. The termination criterion of the gCPPA is \(\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon \), and that of the eCPPA is \(\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert y^k-y^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon \), where \(\varepsilon >0\) is a small real number. By the global convergence of the gCPPA and eCPPA methods, we have \(\underset{k\rightarrow \infty }{\mathop {\text {lim}}}\,\Vert {{w}^{k}}-{{w}^{k+1}}\Vert ^{2}=0\), which means that for any given \(\varepsilon >0\) the corresponding algorithm stops in finitely many iterations.
To adjust parameters r, s and t simultaneously, we introduce \(\alpha _r,~\alpha _t>0\) satisfying \(\alpha _r\alpha _t=\alpha ^2\). Take \(r= \alpha _r R,~t> \alpha _t \rho (A^{\text {T}}A)/R~\) in gCPPA such that \( rt\ge \alpha ^2\rho (A^{\text {T}}A)\), and take \(r= \sqrt{2}\alpha _r R,~s= \sqrt{2}\alpha _r R\rho (B^{\text {T}}B)/\rho (A^{\text {T}}A),~t> \sqrt{2}\alpha _t\rho (A^{\text {T}}A)/R\) in eCPPA such that \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(R>0\) is a selected constant.
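The parameter rule above can be expressed as a small helper. The function names and the safety factor margin below are our own hypothetical choices, introduced only to make the required inequalities \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\) (gCPPA) and \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A)\), \(st\ge 2\alpha ^2\rho (B^{\text {T}}B)\) (eCPPA) checkable; the factors 1.02 and 1.05 mirror those used in the experiments.

```python
import math

def gcppa_params(R, alpha_r, alpha_t, rho_AtA, margin=1.05):
    # r = alpha_r * R and t slightly above alpha_t * rho(A^T A) / R, so that
    # r*t >= alpha^2 * rho(A^T A) with alpha = sqrt(alpha_r * alpha_t)
    r = alpha_r * R
    t = margin * alpha_t * rho_AtA / R
    return r, t

def ecppa_params(R, alpha_r, alpha_t, rho_AtA, rho_BtB, margin=1.02):
    # r, s, t chosen as in the text so that r*t >= 2*alpha^2*rho(A^T A)
    # and s*t >= 2*alpha^2*rho(B^T B)
    r = math.sqrt(2.0) * alpha_r * R
    s = math.sqrt(2.0) * alpha_r * R * rho_BtB / rho_AtA
    t = margin * math.sqrt(2.0) * alpha_t * rho_AtA / R
    return r, s, t
```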
Example 3.1
The first test problem is a correlation matrices calibrating problem selected from [11, 29], which has the form
where \(C\in {\mathbb {R}}^{n\times n}\) is a symmetric matrix, \(e=(1,1,\dots ,1)^{\text {T}}\in {\mathbb {R}}^{n}\) and \(S^n_{+}=\{H\in {\mathbb {R}}^{n\times n}~\big |~ H^{\text {T}}=H, H\succeq 0\}\). Problem (3.1) can be reformulated as problem (1.1) with \(\rho (A^{\text {T}}A)=1\), where A is the projection matrix for the linear constraint \({\text {diag}}(X)=e\) (see Example 2 in [11]).
Let \(z\in {\mathbb {R}}^n\) be the Lagrange multiplier to the linear equality constraint of problem (3.1).
We compare the performance of the gCPPA to the CPPA and rCPPA methods. For a given \((X^k, z^k)\), the gCPPA produces the new iteration \((X^{k+1},z^{k+1})\) as follows:
The X-subproblem of (3.2) is identical to
This is in essence a projection operator computed via an eigenvalue (SVD) decomposition, which takes the main computational load in each iteration. Similarly, the iteration schemes of the CPPA and rCPPA for (3.1) can be written in the form of (1.7) and (1.8).
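The projection onto \(S^n_{+}\) that dominates each iteration can be sketched as follows; this is the standard eigenvalue-based construction of the Frobenius-norm projection, and the function name is ours.

```python
import numpy as np

def proj_psd(M):
    """Euclidean (Frobenius) projection of a symmetric matrix onto the
    positive semidefinite cone S^n_+: keep the nonnegative eigenvalues."""
    M = (M + M.T) / 2.0               # symmetrize against round-off
    w, V = np.linalg.eigh(M)          # eigendecomposition M = V diag(w) V^T
    return (V * np.maximum(w, 0.0)) @ V.T
```

For example, the projection of \(\mathrm {diag}(1,-2)\) is \(\mathrm {diag}(1,0)\).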
In our experiments, we take a random matrix \(C=(c_{ij})_{n\times n}\) satisfying \(c_{ij}\in (-1,1)\) and \(c_{ij}=c_{ji}\) for all i, j. For each given n, 20 random instances are tested. To be fair, the initial point \(X^0\) is the \(n\times n\) identity matrix and \(z^0\) is the n-dimensional zero vector. The termination criterion of all methods is set with \(\varepsilon = 10^{-5}\).
Let \(R=2.00\). The proximal parameters of the CPPA and rCPPA methods are set to \(r=R\) and \(t=1.05/R=0.53\), which satisfies \(rt=1.05>\rho (AA^{\text {T}})\). Set \(\gamma =1.50\) in the rCPPA method. The “\(\hbox {gCPPA}^{(0.53)}\)” means the gCPPA method with \(\alpha _r=0.70\), \(\alpha _t=0.40\), so \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.53\), \(r=R\alpha _r=1.40\) and \(t=1.05\alpha _t/R=0.29\). The “\(\hbox {gCPPA}^{(0.17)}\)” means the gCPPA method with \(\alpha _r=0.30\), \(\alpha _t=0.10\), \(r=R\alpha _r=0.60\), so \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.17\) and \(t=1.05\alpha _t/R=0.05\).
For easy comparison, the results of the three methods in terms of matrix dimension n, average iteration number (iter) and average CPU time (cput, in seconds) are reported in Table 1 and displayed in Fig. 1.
One can conclude from Table 1 and Fig. 1 that: (1) the \(\hbox {gCPPA}^{(0.53)}\) takes about \(80\%\) of the iterations and time of the CPPA, but more iterations and time than the rCPPA; (2) the \(\hbox {gCPPA}^{(0.17)}\) takes about \(60\%\) of the iterations of the CPPA and \(80\%\) of those of the rCPPA, and spends less CPU time than both the CPPA and the rCPPA, especially for high-dimensional problems. In other words, the \(\hbox {gCPPA}^{(0.53)}\) and \(\hbox {gCPPA}^{(0.17)}\) are better than the CPPA, and the \(\hbox {gCPPA}^{(0.17)}\) performs almost as well as the rCPPA with \(\gamma = 1.50\). However, the convergence of the \(\hbox {gCPPA}^{(0.17)}\) is not established since \(\alpha <0.5\); this is left to our future research.
Example 3.2
The second test problem is the matrix completion problem used in Tao, Yuan and He [30], which reads as follows:
where \(S_{\text {B}}=\{H\in {\mathbb {R}}^{n\times n}~\big |~H_{\text {L}}\ge H\ge H_{\text {U}}\}\).
By introducing an auxiliary variable Y such that \( X - Y=0\), problem (3.3) can be reformulated to a separable convex optimization problem of the form
Obviously, problem (3.4) is a special case of problem (1.9) in which \(\rho (A^{\text {T}}A)=\rho (B^{\text {T}}B)=1\).
For comparison, four methods are used for solving problem (3.4): the classical ADMM (ADMM for short), the gADMM, a separable version of the CPPA (i.e., Algorithm 2.7 with \(\alpha =1\), sCPPA for short) and the eCPPA (Algorithm 2.7 with \(0<\alpha _0< \alpha < 1\)). Let \(Z\in {\mathbb {R}}^{n\times n}\) be the Lagrange multiplier for the linear constraint \(X-Y=0\). For a given \((X^k,Y^k,Z^k)\), the eCPPA produces the new iterate \((X^{k+1},Y^{k+1},Z^{k+1})\) via
The X-subproblem of iteration (3.5) is in essence a projection operator computed via an eigenvalue (SVD) decomposition, and it takes the main computational load in each iteration. The Y-subproblem of (3.5) is also a projection:
where \(P_{S_{B}}\) denotes the projection onto \(S_{B}\) under the Euclidean norm.
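Unlike the eigenvalue-based X-subproblem, this projection is an entrywise clipping; a minimal sketch (the bound naming below, with the lower bound first, is our own convention) is:

```python
import numpy as np

def proj_box(H, lo, hi):
    """Entrywise Euclidean projection onto the box {H : lo <= H <= hi}."""
    return np.clip(H, lo, hi)
```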
In the experiments, we take a random matrix \(C=(c_{ij})_{n\times n}\) satisfying \(c_{ij}\in (-1,1)\) and \(c_{ij}=c_{ji}\) for all i, j. For each given n, 20 random instances are tested. To be fair, \(X^0\) and \(Y^0\) are set to the \(n\times n\) identity matrix and \(Z^0\) is set to the \(n\times n\) zero matrix.
For termination criterion of all methods, we set \( \varepsilon = 10^{-5}\Vert w^{1}-w^0\Vert _\infty \).
The parameter settings of the tested methods are as follows. The penalty parameter is set to \(\beta =1.80\) in both the ADMM and the gADMM, and the relaxation parameter is set to \(\gamma = 1.80\) in the gADMM. Let \(R=3.00\); the proximal parameters of the sCPPA are set to \(r = s = \sqrt{2} R \approx 4.24\) and \(t=1.02\sqrt{2}/R\approx 0.48\), and those of the eCPPA are set to \(r =s=\sqrt{2}\alpha _r R\approx 2.55\) and \(t=1.02 \sqrt{2} \alpha _t/R\approx 0.24\), where \(\alpha _r =0.60\), \(\alpha _t = 0.50\) and \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.55\).
We test all methods on 20 random instances for each fixed n. The average iteration number and the average cpu-time for problem (3.4) with different n are listed in Table 2 and displayed in Fig. 2.
From Table 2 and Fig. 2, one can conclude that the iteration number of the eCPPA is about \(30\%\) of that of the ADMM, and about \(70\%\) of those of the gADMM and sCPPA. Moreover, the eCPPA takes less CPU time than the others, especially for high-dimensional problems. Thus, the eCPPA has the best performance on problem (3.4).
Example 3.3
The third example, selected from [13], focuses on wavelet-based image inpainting and zooming problems. Let \(x\in {\mathbb {R}}^{l_1\times l_2}\) be a two-dimensional image and \(l=l_1\cdot l_2\). Letting \(\mathbf vec (\cdot )\) be the vectorization operator in the lexicographic order, we write \(\mathbf{x }=\mathbf vec (x)\in {\mathbb {R}}^l\). Let \(D\in {\mathbb {R}}^{l\times n}\) be a wavelet dictionary; each column of D is an element of a wavelet frame. Commonly, the image \(\mathbf{x }\) possesses a sparse representation under the dictionary D, i.e., \(\mathbf{x } = D x\) where x is a sparse vector. Wavelet-based image processing considers recovering the real image \(\mathbf{x }\) from an observation b which might have some missing pixels or convolutions. The model for wavelet-based image processing can be cast as
where the term \(\Vert \mathbf{x }\Vert _1\) induces a sparse representation under the wavelet dictionary, and B (also called the mask) is a typically ill-conditioned matrix representation of convolution or downsampling operators. For the inpainting problem, the matrix \(B\in {\mathbb {R}}^{l\times l}\) in (3.6) is a diagonal matrix whose diagonal elements are either 0 or 1, where the locations of 0 correspond to missing pixels in the image and the locations of 1 correspond to the pixels to be kept. For the image zooming problem, the matrix \(B\in {\mathbb {R}}^{m\times l}\) can be expressed as \(B = SH\), where \(S\in {\mathbb {R}}^{m\times l} \) is a downsampling matrix and \(H\in {\mathbb {R}}^{l\times l} \) is a blurring matrix. The matrix H can be diagonalized by the discrete cosine transform (DCT), i.e., \(H = C^{-1}\Lambda C\), where C represents the DCT and \(\Lambda \) is a diagonal matrix whose diagonal entries are the eigenvalues of H.
We compare the gCPPA method with the CPPA and rCPPA methods on the \(256\times 256\) images Peppers.png and Boat.png for the image inpainting and image zooming problems, respectively.
The dictionary D is chosen as the inverse discrete Haar wavelet transform with level 6. Below we give the details of how the tested images are degraded.
For the image inpainting problem, the original image Peppers is first blurred by an out-of-focus kernel with a radius of 7. Then \(60\%\) of the pixels of the blurred image are removed by applying a mask operator S; the positions of the missing pixels are located randomly.
For the image zooming problem, the original image Boat is downsampled by a downsampling matrix S with a factor of 4. Then the downsampled image is corrupted by a convolution whose kernel is generated by fspecial('gaussian',9,2.5) in MATLAB.
Let z be the Lagrange multiplier. For a given \((\mathbf{x }^k, z^k)\), the gCPPA adopts the following iteration to obtain \((\mathbf{x }^{k+1}, z^{k+1})\):
On problem (3.6), the performance of the gCPPA is tested and compared with the CPPA and rCPPA. The CPPA has the same iteration scheme as the gCPPA with \(\alpha =1\), and the rCPPA can be written in the form of (1.7) and (1.8). Since \(DD^{\text {T}} = I\), the blurring matrix H can be diagonalized by the DCT, and the binary matrix (both the mask and downsampling matrices) S satisfies \(\Vert S\Vert =1\), we have \(\Vert (BD)^{\text {T}}(BD)\Vert =1\) for the wavelet-based image inpainting and zooming problems. Therefore, the requirement \(rt\ge \Vert (BD)^{\text {T}}(BD)\Vert \) reduces to \(rt\ge 1\) for the CPPA and rCPPA, and to \(rt>\alpha ^2\) for the gCPPA. The parameters are described below.
For the image inpainting problem, we take the initial point \(({\mathbf{x }^0,\upsilon ^0}) = (D^{\text {T}}(b),\mathbf{0 })\), \(R=0.60\), \(r_0 = R = 0.60\), \(t_0 = 1.02/R=1.70\), \(\gamma = 1.00\) for the CPPA and \(\gamma = 1.90\) for the rCPPA. We take \(\alpha _r= 2.00\), \(\alpha _t= 0.13\), \(\alpha =\sqrt{0.26}\approx 0.51\), \(r = \alpha _r R=1.20\) and \(t=1.02\alpha _t/R\approx 0.22\) for the gCPPA.
For the image zooming problem, we take the initial point \((\mathbf{x }^0,\upsilon ^0) = (\mathbf{0 }, \mathbf{0 })\), \(R=0.55\), \(r_0 =R= 0.55\), \(t_0 = 1.02/R\approx 1.85\), \(\gamma = 1.00\) for the CPPA and \(\gamma = 1.20\) for the rCPPA. We take \(\alpha _r= 1.00\), \(\alpha _t= 0.26\), \(\alpha = \sqrt{0.26}\approx 0.51\), \(r = \alpha _r R=0.55\) and \(t = 1.02\alpha _t/R\approx 0.48\) for the gCPPA.
As usual, the quality of the reconstruction is measured by the signal-to-noise ratio (SNR), in decibels (dB):
where \(\bar{\mathbf{x }}\) is the reconstructed image and \(\mathbf{x }\) is the clean image.
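A common realization of this measure, \(\text {SNR}=20\log _{10}\big (\Vert \mathbf{x }\Vert /\Vert \mathbf{x }-\bar{\mathbf{x }}\Vert \big )\), can be sketched as follows; the exact convention used in [13] may differ, and the function name is ours.

```python
import numpy as np

def snr_db(x_clean, x_rec):
    """SNR in dB: 20*log10(||x|| / ||x - x_rec||), a common definition."""
    x_clean = np.asarray(x_clean, dtype=float)
    return 20.0 * np.log10(np.linalg.norm(x_clean)
                           / np.linalg.norm(x_clean - x_rec))
```

For example, a signal of norm 5 reconstructed with error norm 0.5 gives an SNR of 20 dB.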
For easy comparison, the evolution of the SNR with respect to iterations and computing time for all tested methods is displayed in Fig. 3. It shows that the rCPPA is better than the CPPA, and the gCPPA converges to nearly the same solutions on the inpainting and zooming problems as fast as the rCPPA.
The reconstructed images obtained after 300 iterations for image inpainting (or 30 iterations for image zooming) are displayed in Fig. 4. This verifies our assertion: the gCPPA can be at least as efficient as the rCPPA on the inpainting and zooming problems. We emphasize that the gCPPA does not involve any relaxation step, while the rCPPA has a relaxation step with \(\gamma >1\).
Example 3.4
The last example, selected from [13], focuses on the total variation (TV) uniform noise removal model:
where the two-dimensional image \(x=(x_{i,j})_{l_1\times l_2}\in {\mathbb {R}}^{l_1\times l_2}\) and \(l=l_1 \cdot l_2\). The vectorized version of an image is denoted by \(\mathbf{x }=\mathbf vec (x)\in {\mathbb {R}}^l\) as in Example 3.3, and \(\Vert \nabla \mathbf{x }\Vert _1:=\sum _{i=1}^{l}(|\text {vec}(\nabla _h x)|_i+|\text {vec}(\nabla _v x)|_i)\) is the TV norm with \(\nabla _h x=\big ((\nabla _h x)_{i,j}\big )_{l_1\times l_2}\) and \(\nabla _v x=\big ((\nabla _v x)_{i,j}\big )_{l_1\times l_2}\), where
The vector \(\mathbf{x }^0\in {\mathbb {R}}^l\) is an observed image corrupted by zero-mean uniform noise; \(H\in {\mathbb {R}}^{l \times l}\) is the matrix representation of a blurring operator satisfying \(\Vert HH^{\text {T}}\Vert =1\); \(\sigma \) is a parameter indicating the uniform noise level; and \(\Vert \mathbf{x }\Vert _{\infty }:=\max _{1\le i \le l}|\mathbf{x }_i|\).
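Assuming standard forward differences (the boundary convention below is our assumption, since the defining display is not reproduced here), the anisotropic TV norm \(\Vert \nabla \mathbf{x }\Vert _1\) can be computed as:

```python
import numpy as np

def tv_l1(x):
    """Anisotropic TV norm: sum of absolute forward differences in the
    horizontal and vertical directions (interior differences only;
    the boundary handling is an assumption, not taken from the paper)."""
    dh = np.diff(x, axis=1)   # horizontal differences x[i, j+1] - x[i, j]
    dv = np.diff(x, axis=0)   # vertical differences   x[i+1, j] - x[i, j]
    return np.abs(dh).sum() + np.abs(dv).sum()
```

For the 2-by-2 image \(\begin{pmatrix}0&1\\0&1\end{pmatrix}\) this gives 2 (two horizontal jumps, no vertical ones).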
Let
where \(\mathbf{e } = (1, 1,\cdots , 1)^{\text {T}} \in {\mathbb {R}}^l\). Model (3.8) can be reformulated as
We also compare the gCPPA with the CPPA and rCPPA methods on the 256-by-256 images Peppers.png and Boat.png. The clean images are degraded by either the Gaussian (fspecial('gaussian',9,2.5)) or the out-of-focus (fspecial('disk',3)) convolution.
Let \(\mathbf{z }\) be the Lagrange multiplier for the linear inequality constraint \(A\mathbf{x }\ge b\). The gCPPA produces the new iterate via:
The CPPA corresponds to the above iteration with \(\alpha =1\), and the rCPPA can be written in the form of (1.7) and (1.8). We take \(R = 0.60\), \(r_0 = R\), \(t_0 = 1.02/R=1.70\), and \(\gamma = 1.80\) for the CPPA and rCPPA methods, and take \(\alpha _r=7.00\), \(\alpha _t=0.04\), \(\alpha =\sqrt{0.28}\approx 0.53\), \(r = \alpha _r R=4.20\), and \(t = 1.02\alpha _t/R\approx 0.07\) for the gCPPA. The initial points for all methods are taken as zeros and the stopping criterion is set to
Set \(\sigma =0.5\). For easy comparisons, the original images, degraded images and the restored images are displayed in Fig. 5.
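As a sanity check on the step sizes chosen above, note that both parameter settings satisfy \(r\,t = 1.02\,\alpha ^2\) (with \(\alpha =1\) for the CPPA and rCPPA, since \(r_0 t_0 = 1.02\)). This relation is our reading of the numbers given here, not a condition stated explicitly in this excerpt. A minimal arithmetic check:

```python
import math

# Parameter choices quoted in the text for the TV denoising experiment.
R = 0.60
r0, t0 = R, 1.02 / R                  # CPPA/rCPPA: r0 * t0 = 1.02
alpha_r, alpha_t = 7.00, 0.04
alpha = math.sqrt(alpha_r * alpha_t)  # = sqrt(0.28), approx. 0.53
r = alpha_r * R                       # = 4.20
t = 1.02 * alpha_t / R                # approx. 0.07

# Both settings satisfy r * t = 1.02 * alpha^2 (alpha = 1 for the CPPA);
# this identity is inferred from the quoted numbers, not stated in the text.
assert abs(r0 * t0 - 1.02) < 1e-9
assert abs(r * t - 1.02 * alpha ** 2) < 1e-9
```

In other words, the gCPPA rescales the two step sizes by \(\alpha _r\) and \(\alpha _t\) while keeping the product \(rt\) proportional to \(\alpha ^2 = \alpha _r\alpha _t\).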
The evolution of SNR with respect to iterations and computing time is displayed in Fig. 6. It shows that both the CPPA and the gCPPA reach a high SNR in a short time. However, the CPPA converges to worse solutions than the rCPPA, whereas the gCPPA converges to better solutions than the rCPPA on the TV uniform noise removal problem.
The numerical results of further experiments with different values of \(\sigma \) are reported in Table 3 for comparison.
In Table 3, each triple \(\cdot \setminus \cdot \setminus \cdot \) represents the number of iterations, the computing time in seconds, and the restored SNR when the stopping criterion (3.10) is met. From Table 3, one can conclude that the rCPPA performs better than the CPPA in SNR but worse in the number of iterations and CPU time, while the gCPPA performs better than the CPPA in all respects: SNR, the number of iterations, and CPU time.
4 Conclusions
In this paper, we have proposed two new customized proximal point algorithms (the gCPPA and the eCPPA) by modifying the prediction step. Neither method involves a relaxation step, yet both retain the same convergence properties as the CPPA equipped with relaxation steps. Numerical results demonstrate that the proposed methods are more effective than some state-of-the-art methods.
Notes
The mapping \(F(w)\) is affine with a skew-symmetric matrix, and thus it is monotone; see He and Yuan [23].
References
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 93, 273–299 (1965)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Bertsekas, D.P., Tseng, P.: Partial proximal minimization algorithms for convex programming. SIAM J. Optim. 4(3), 551–572 (1993)
Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Glob. Optim. 13(4), 389–406 (1998)
Martinet, B.: Brève communication. Régularisation d'inéquations variationnelles par approximations successives. ESAIM Math. Model. Numer. Anal. 4(R3), 154–158 (1970)
Kassay, G.: The proximal points algorithm for reflexive Banach spaces. Stud. Math. 30, 9–17 (1985)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 646–664 (1992)
Spingarn, J.E.: Partial inverse of a monotone operator. Appl. Math. Optim. 10(10), 247–265 (1983)
Han, S.P.: A decomposition method and its application to convex programming. Math. Oper. Res. 14(2), 237–248 (1989)
Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM. Rep. 34, 1–29 (2008)
He, B.S.: PPA-like contraction methods for convex optimization: a framework using variational inequality approach. J. Oper. Res. Soc. China 3(4), 391–420 (2015)
He, B.S., Fu, X.L., Jiang, Z.K.: Proximal point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009)
He, B.S., Yuan, X.M., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559–572 (2013)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(4), 303–320 (1969)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. Optimization 5(6), 283–298 (1969)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
He, B.S.: Parallel splitting augmented Lagrangian methods for monotone structured variational inequalities. Comput. Optim. Appl. 42(2), 195–212 (2009)
Glowinski, R., Marrocco, A.: Analyse numérique du champ magnétique d'un alternateur par éléments finis et sur-relaxation ponctuelle non linéaire. Comput. Methods Appl. Mech. Eng. 3(1), 55–85 (1974)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Gabay, D.: Chapter IX: applications of the method of multipliers to variational inequalities. Stud. Math. Appl. 15, 299–331 (1983)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Cai, X.J., Gu, G.Y., He, B.S., Yuan, X.M.: A proximal point algorithm revisit on the alternating direction method of multipliers. Sci. China Math. 56(10), 2179–2186 (2013)
He, B.S., Yuan, X.M.: On the \(O(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B.S., Yuan, X.M.: On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2014)
Gu, G., He, B.S., Yuan, X.M.: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59, 135–161 (2014)
Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
Tao, M., Yuan, X.M., He, B.S.: Solving a class of matrix minimization problems by linear variational inequality approaches. Linear Alg. Appl. 434(11), 2343–2352 (2011)
Acknowledgements
The authors are very grateful to Doctor Wen-Xing Zhang at University of Electronic Science and Technology of China for his help on numerical experiments on image processing problems. This work was also supported by the Science and Technology on Communication Information Security Control Laboratory, Xiangtan University.
Communicated by Majid Soleimani-damaneh.
This work was supported by the Natural Science Foundation of China with Grants 11571074 and 11726505.
Jiang, B., Peng, Z. & Deng, K. Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization. Bull. Iran. Math. Soc. 46, 865–892 (2020). https://doi.org/10.1007/s41980-019-00298-0
Keywords
- Convex optimization
- Separable convex optimization
- Customized proximal point algorithm
- Global convergence
- \({\mathcal {O}}(1/k)\)-convergence rate