Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization

Jiang, Binqian; Peng, Zheng; Deng, Kangkang

doi:10.1007/s41980-019-00298-0

Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization

Original Paper
Published: 23 October 2019

Volume 46, pages 865–892, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Bulletin of the Iranian Mathematical Society Aims and scope Submit manuscript

Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization

Download PDF

275 Accesses
3 Citations
Explore all metrics

Abstract

In this paper, we propose a new customized proximal point algorithm for linearly constrained convex optimization problem, and further extend the proposed method to separable convex optimization problem. Unlike the existing customized proximal point algorithms, the proposed algorithms do not involve relaxation step, but still ensure the convergence. We obtain the particular iteration schemes and the unified variational inequality perspective. The global convergence and ${\mathcal {O}}(1/k)$-convergence rate of the proposed methods are investigated under some mild assumptions. Numerical experiments show that compared to some state-of-the-art methods, the proposed methods are effective.

A class of customized proximal point algorithms for linearly constrained convex optimization

Article 19 August 2016

On relaxation of some customized proximal point algorithms for convex minimization: from variational inequality perspective

Article 02 April 2019

A parameterized proximal point algorithm for separable convex optimization

Article 12 September 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

1.1 Proximal Point Algorithm

Proximal point algorithm (PPA for short) dates back to Moreau [1], which was extended by Rockafellar [2], Bertsekas and Tseng [3], and Kaplan and Tichatschke [4], etc. It was introduced to optimization community by Martinet [5]. The PPA possesses a robust convergence for very general problems in finite and infinite-dimensions (Kassay [6] and Glüer [7]), and is also the basis for the splitting methods (Spingarn [8] and Han [9]).

Consider a convex minimization problem with linear constraints of the form

$$\begin{aligned} \min \{\theta (x)| Ax=b, x\in {\mathcal {X}}\}, \end{aligned}$$

(1.1)

where ${\mathcal {X}}\subset {\mathbb {R}}^{n}$ is a bounded closed convex and nonempty set, $\theta :{\mathcal {X}}\rightarrow {\mathbb {R}}$ is a convex (not necessarily smooth) function, $A\in {\mathbb {R}}^{m\times n}$, $b\in {\mathbb {R}}^{m}$. The solution set of (1.1), denoted by ${\mathcal {X}}^*$, is assumed to be nonempty.

For solving problem (1.1), the primal-dual hybrid gradient (PDHG) algorithm proposed by Zhu and Chan [10] has the following iterative scheme:

$$\begin{aligned} \left\{ \begin{array}{lll} x^{k+1}=\arg \min \nolimits _{x\in {\mathcal {X}}}\big \{\theta (x) +\frac{r}{2}\big \Vert (x-x^k) -\frac{1}{r}A^{\text {T}}{\lambda }^{k}\big \Vert ^2\big \},\\ \lambda ^{k+1}=\lambda ^k-\frac{1}{t}(Ax^{k+1}-b), \end{array}\right. \end{aligned}$$

(1.2)

where $\lambda \in {\mathbb {R}}^m$ is the Lagrange multiplier vector, $r,t>0$ are iteration parameters. Let $w:=(x,\lambda )$, and ${\mathcal {W}}:= {\mathcal {X}}\times {\mathbb {R}}^m$. Then, by the first optimality condition of iteration (1.2), for $w^{k+1}\in {\mathcal {W}}$ we have that

$$\begin{aligned} \theta (x)-\theta (x^{k+1})+(w-w^{k+1})^{\text {T}} \Big [ F(w^{k+1})+P(w^{k+1}-w^k)\Big ] \ge 0,~~\forall ~ w\in {\mathcal {W}}, \end{aligned}$$

(1.3)

where

$$\begin{aligned} F(w)=\left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) ,\quad P=\left( \begin{array}{cc}rI_n&{} A^{\text {T}} \\ 0_{m\times n} &{} t I_m \\ \end{array}\right) , \end{aligned}$$

and $I_n$ denotes a ${n\times n}$-dimensional identity matrix, $0_{m\times n}$ is a ${m\times n}$-dimensional null matrix. However, He [11] showed that the PDHG algorithm could not be necessarily convergent, and gave a counter example, see Sect. 3 of [11].

He, Fu and Jiang [12] proposed a PPA using a linear proximal term (LPPA). For solving (1.1), the LPPA (Procedure 4.1 in [12]) first produces a predictor ${\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)$ via

$$\begin{aligned} \left\{ \begin{array}{l} {\tilde{x}}^{k}=\arg \min \nolimits _{x\in {\mathcal {X}}}\bigg \{\theta (x) +\frac{1}{2}\big \Vert A(x-x^k)-\lambda ^{k}\big \Vert ^2\bigg \}, \\ {\tilde{\lambda }}^{k}=\lambda ^k-(A{\tilde{x}}^{k}-b). \end{array}\right. \end{aligned}$$

(1.4)

By the first-order optimality conditions of (1.4), for ${\tilde{w}}^k\in {\mathcal {W}}$, we have

$$\begin{aligned} \theta (x)-\theta ({\tilde{x}}^k)+(w-{\tilde{w}}^k)^{\text {T}} \Big [ F({\tilde{w}}^k)+P_1({\tilde{w}}^k-w^k)\Big ] \ge 0,~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$

(1.5)

where

$$\begin{aligned} F(w)=\left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) ,\quad P_1 =\left( \begin{matrix} A^{\text {T}}A &{} A^{\text {T}} \\ -A &{} I_m \end{matrix}\right) . \end{aligned}$$

(1.6)

To ensure the convergence, the new iterate $w^{k+1}=(x^{k+1},\lambda ^{k+1})$ of LPPA is generated by a simple relaxation step

$$\begin{aligned} w^{k+1}=w^k-\rho _k P_1(w^k-{\tilde{w}}^k), \end{aligned}$$

where $\rho _k=\tau \rho _k^*$ and

$$\begin{aligned} \rho _k^*=\frac{(w^k-{\tilde{w}}^k)^{\text {T}}P_1(w^k-{\tilde{w}}^k)}{\Vert P_1(w^k-{\tilde{w}}^k)\Vert ^2},~ \tau \in [1,2). \end{aligned}$$

He, Yuan and Zhang [13] proposed a customized proximal point algorithm (CPPA) by using matrix $P_2$ instead of matrix $P_1$ in (1.5), where

$$\begin{aligned} P_2 =\left( \begin{matrix} rI_n &{} -A^{\text {T}} \\ -A &{} tI_m \end{matrix}\right) . \end{aligned}$$

To guarantee convergence, the matrix $P_2$ should be positive definite. As a consequence, it requires $rt>\rho (A^{\text {T}}A)$, where $\rho (A^{\text {T}}A)$ is the spectral radius of matrix $A^{\text {T}}A$. The corresponding iterative scheme of CPPA first produces a predictor ${\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)$ via

$$\begin{aligned} \left\{ \begin{array}{l} {\tilde{\lambda }}^{k}=\lambda ^k-\frac{1}{t}(A{x}^{k}-b), \\ {\tilde{x}}^{k}=\arg \min \nolimits _{x\in {\mathcal {X}}} \big \{\theta (x)+\frac{r}{2}\big \Vert (x-x^k)-\frac{1}{r}A^{\text {T}}(2{\tilde{\lambda }}^{k} -{\lambda }^{k}) \big \Vert ^2\big \}, \end{array}\right. \end{aligned}$$

(1.7)

and then takes a simple relaxation step

$$\begin{aligned} w^{k+1}=w^k-\gamma (w^k-{\tilde{w}}^{k}), \end{aligned}$$

(1.8)

where $\gamma \in (0, 2)$ is a relaxation factor. Various numerical experiments show that $\gamma >1$ provides the fast convergence. For distinction in this paper, the CPPA (1.7)-(1.8) with $\gamma =1$ is called original-CPPA or CPPA for short, while the CPPA with $\gamma >1$ is called relax-CPPA or rCPPA for short.

The augmented Lagrangian method of multipliers (ALM) is another efficient method for convex optimization, see Hestenes [14] and Powell [15]. Let $\beta >0$ be a penalty parameter of ALM, the iterative scheme of the ALM to the problem (1.9) is

$$\begin{aligned} \left\{ \begin{array}{l} {x}^{k+1} = \arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta (x)+\frac{\beta }{2}\big \Vert Ax-b -\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \}, \\ {\lambda }^{k+1} = \lambda ^k-\beta (A{x}^{k+1}-b). \end{array}\right. \end{aligned}$$

Rockafellar [16] showed that the ALM is exactly an application of PPA to the dual problem of (1.1), owning the compact inequality (1.3) where P is replaced by

$$\begin{aligned} P_3=\left( \begin{matrix}0_{n\times n}, &{}0_{n\times m}\\ 0_{m\times n}&{} \tfrac{1}{\beta }I_m\end{matrix}\right) . \end{aligned}$$

However, He, Yuan and Zhang [13] showed that the CPPA is more efficient than the ALM when the objective function $\theta (x)$ is simple in the sense that its resolvent operator of $\theta (x)$, defined by $(I+\tfrac{1}{r}\partial \theta )^{-1}$, has a close-form representation, where $\partial \theta $ is the subdifferential of $\theta $.

1.2 Proximal Point Algorithm for Separable Problem

Consider a separable convex optimization problem of the form

$$\begin{aligned} \min \Big \{\theta _1(x)+\theta _2(y)\Big | Ax+By=b, x\in {\mathcal {X}}, y\in {\mathcal {Y}}\Big \}, \end{aligned}$$

(1.9)

where ${\mathcal {X}}\subset {\mathbb {R}}^{n_1}$ and ${\mathcal {Y}}\subset {\mathbb {R}}^{n_2}$ are bounded closed and convex nonempty sets, $\theta _1:{\mathcal {X}}\rightarrow {\mathbb {R}}$ and $\theta _2:{\mathcal {Y}}\rightarrow {\mathbb {R}}$ are convex (not necessarily smooth) functions, $A\in {\mathbb {R}}^{m\times n_1}$, $B\in {\mathbb {R}}^{m\times n_2}$ and $b\in {\mathbb {R}}^{m}$. The solution set of (1.9), denoted by ${\mathcal {X}}^*\times {\mathcal {Y}}^*$, is assumed to be nonempty.

The augmented Lagrangian-based methods, especially some splitting forms including alternating direction method of multipliers (ADMM for short) and parallel splitting algorithm [17], are verified to be very efficient for problem (1.9). The iterative scheme of the ALM to the problem (1.9) is

$$\begin{aligned} \left\{ \begin{array}{lll} \lambda ^{k+1} = \lambda ^k-\beta (Ax^k+By^k-b),\\ (x^{k+1}, y^{k+1}) = \arg \min \nolimits _{x\in {\mathcal {X}}, y\in {\mathcal {Y}}} \Big \{\theta _1(x)+\theta _2(y)+\frac{\beta }{2}\big \Vert Ax+By-b -\frac{1}{\beta }\lambda ^{k+1} \big \Vert ^2\Big \}. \end{array}\right. \end{aligned}$$

(1.10)

In the second subproblem of scheme (1.10), the coupled variable (x, y) makes it intractable. As a splitting version of the ALM, the ADMM originally proposed by Glowinski [18] and Gabay [19], is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} x^{k+1} = \arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta _1(x)+\frac{\beta }{2}\big \Vert Ax+By^k-b-\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \},\\ y^{k+1} = \arg \min \nolimits _{y\in {\mathcal {Y}}} \Big \{\theta _2(y)+\frac{\beta }{2}\big \Vert Ax^{k+1}+By-b-\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \},\\ \lambda ^{k+1} = \lambda ^k-\beta (Ax^{k+1}+By^{k+1}-b). \\ \end{array}\right. \end{aligned}$$

The ADMM overcomes the drawback of ALM stated above by discoupling variable (x, y). Gabay [20] expressed the ADMM as essentially an application of the Douglas–Rachford splitting method [21]. Cai, Gu and He [22] provided a novel and simple PPA way to understand ADMM, and proposed a generalized ADMM (gADMM) which producing firstly a predictor ${\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{y}}^k, {\tilde{\lambda }}^k)$ via

$$\begin{aligned} \left\{ \begin{array}{lll} {\tilde{x}}^k=\arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta _1(x)+\frac{\beta }{2}\big \Vert Ax+By^k-b-\frac{1}{\beta }\lambda ^k \big \Vert ^2\Big \},\\ {\tilde{\lambda }}^{k}=\lambda ^k-\beta (A{\tilde{x}}^k+By^{k}-b),\\ {\tilde{y}}^{k}=\arg \min \nolimits _{y\in {\mathcal {Y}}} \Big \{\theta _2(y)+\frac{\beta }{2}\big \Vert A{\tilde{x}}^k+By-b -\frac{1}{\beta }{\tilde{\lambda }}^k \big \Vert ^2\Big \}. \end{array}\right. \end{aligned}$$

(1.11)

Let $v=(y,\lambda )$ and $w:=(x,v)\in {\mathcal {W}}={\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m$, and then the new iterate $w^{k+1}=(x^{k+1},v^{k+1})$ of the gADMM is generated by a relaxation step:

$$\begin{aligned} x^{k+1}={\tilde{x}}^k,\quad \quad v^{k+1}=v^k-\gamma (v^k-{\tilde{v}}^{k}),\quad \gamma \in (0,2). \end{aligned}$$

The iterative scheme (1.11) can be recovered by a compact form: find ${\tilde{w}}^k \in {\mathcal {W}}$ such that

$$\begin{aligned} \theta (u)-\theta ({\tilde{u}}^{k})+{{(w-{\tilde{w}}^{k})}^{\text {T}}} \Big \{F({\tilde{w}}^{k})+P_4({\tilde{w}}^{k}-w^{k})\Big \}\ge 0,~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$

(1.12)

where $u=(x,y),~~w=(u, \lambda ) \in {\mathcal {W}}$ and $\theta (u)=\theta _1(x)+\theta _2(y)$ and

$$\begin{aligned} u=\left( \begin{matrix}x\\ y\\ \end{matrix}\right) ,\quad w=\left( \begin{matrix}x\\ y\\ \lambda \\ \end{matrix}\right) ,\quad F(w)=\left( \begin{matrix}-{{A}^{\text {T}}}\lambda \\ -{{B}^{\text {T}}}\lambda \\ Ax+By-b\\ \end{matrix} \right) , \end{aligned}$$

and the customized matrix $P_4$ is as follows:

$$\begin{aligned} P_4=\left( \begin{matrix}0_{n_1\times n_1}&{}0_{n_1\times n_2}&{}0_{n_1\times m}\\ 0_{n_2\times n_1}&{}\beta B^{\text {T}}B&{} -B^{\text {T}}\\ 0_{m\times n_1}&{}-B&{} \tfrac{1}{\beta }I_m\end{matrix}\right) . \end{aligned}$$

The special structure of matrix $P_4$ implies that variable x may not be involved in the iteration of the gADMM. The following matrix

$$\begin{aligned} {\bar{P}}_4=\left( \begin{matrix} \beta B^{\text {T}}B&{} -B^{\text {T}}\\ -B&{}\tfrac{1}{\beta }I_m\end{matrix}\right) , \end{aligned}$$

which removes zero-elements from $P_4$, is required to be symmetric and positive semidefinite to ensure the convergence of the gADMM. The gADMM was verified to be faster than ADMM because some approximate computation is permitted.

Let ${P}'_4\in {\mathbb {R}}^{n_1+n_2+m}$ be a matrix such that ${P}'_4(w^k-{\tilde{w}}^{k})$ be a descent direction, He and Yuan [23, 24] presented a general relaxation step as follows:

$$\begin{aligned} w^{k+1}=w^k-\gamma {P}'_4(w^k-{\tilde{w}}^{k}), \end{aligned}$$

where $\gamma $ is the steplength. Let $H ={\bar{P}}_4 ({P'_4})^{-1}$ and $G=({{\bar{P}}_4})^{\text {T}}+{\bar{P}}_4-\gamma ({P'_4})^{\text {T}} H({P'_4})$. One can relax the conventional proposition on ${\bar{P}}_4$, but ensure symmetric and positive semidefinite property of G. In [23, 24], He and Yuan established the worst-case convergence rate ${\mathcal {O}}(1/k)$ and non-ergodic convergence rate of the Douglas–Rachford alternating direction method of multipliers.

1.3 Contributions

From above analysis, there are many PPAs for problem (1.1) and (1.9).

The relaxation steps are very effective used in PPAs. For example, the relaxation step of CPPA accelerates the algorithm for problem (1.1), the relaxation step of LPPA and gADMM ensures the convergence for problem (1.1) and (1.9), respectively. However, the relaxation step might be unacceptable or even not permitted in some practical applications. These applications include real time control system, decision-making on flow line production, supply chain management and decision-making, and so on. For an instance, in the decision-making framework for supply chain management [25,26,27], each firm on the chain makes decision independently in a proper sequence, and the decision is immediately executed whenever it is given. Hence, all firms cannot adjust their decision at the end, correspondingly the relaxation step in the simulation method has not any actual meaning.

To adapt these real circumstances, we prefer to modify the prediction step rather than to take a relaxation step, and propose two novel customized proximal point algorithms without the relaxation step. The proposed methods are generalized versions of the CPPA proposed in [13, 28] for both convex and separable convex problems, respectively. The global convergence and the ${\mathcal {O}}(1/k)$-convergence rate of the proposed methods will be established under some mild assumptions. We also verify that the new methods have a better numerical performance in practice comparing to some state-of-the-art methods within the relaxation step.

The rest of the paper is organized as follows. In Sect. 2, we propose two new customized proximal point algorithms without relaxation step for problems (1.1) and (1.9), respectively. The convergence of proposed methods will be proven in Sect. 2. In Sect. 3, some preliminary numerical results, comparing to the state-of-the-art methods, are presented to show the high efficiency of the new methods. Section 4 concludes this paper with some final remarks.

2 Two New Customized Proximal Point Algorithms

2.1 The Generalized Customized Proximal Point Algorithm

The Lagrangian function of problem (1.1) is

$$\begin{aligned} L(x,\lambda )=\theta (x)-\lambda ^{\text {T}}(Ax-b). \end{aligned}$$

(2.1)

If $x^*\in {\mathcal {X}}$ is a solution of problem (1.1), then there exists $\lambda ^*\in {\mathbb {R}}^m$ such that the pair $(x^*,\lambda ^*)$ is a saddle point of (2.1) satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} L(x,\lambda ^*)- L(x^*,\lambda ^*)\ge 0,&{}\quad \forall ~x\in {\mathcal {X}}, \\ L(x^*,\lambda ^*)- L(x^*,\lambda )\ge 0,&{}\quad \forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$

Consequently,

$$\begin{aligned} x^* = \arg \min \limits _{x\in {\mathcal {X}}} L(x,\lambda ^*),\quad \lambda ^* = \arg \max \limits _{\lambda \in {\mathbb {R}}^m} L(x^*,\lambda ). \end{aligned}$$

(2.2)

By Lemma 2.1 in [11], the first-order optimality conditions of problem (2.2) are as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} \theta (x)-\theta (x^*) -(x-x^*)^{\text {T}} (A^{\text {T}} \lambda ^*)\ge 0, &{}~~\forall ~x\in {\mathcal {X}}, \\ (\lambda -\lambda ^*)^{\text {T}} (Ax^*-b) \ge 0, &{}~~\forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$

Definition 2.1

The solution set of (1.1), denoted by ${\mathcal {W}}^*\subset {\mathcal {W}}$, consists of all $w^*\in {\mathcal {W}}$ satisfying

$$\begin{aligned} \theta (u)-\theta (u^*) +(w-w^*)^{\text {T}} F(w^*)\ge 0, ~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$

(2.3)

where

$$\begin{aligned} u=x,~~ w=\left( \begin{matrix}x\\ \lambda \end{matrix}\right) ,~~ F (w) = \left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) . \end{aligned}$$

(2.4)

The generalized customized proximal point algorithm for problem (1.1) is proposed as follows.

By the first-order optimality conditions of iteration subproblem (2.5), we have

$$\begin{aligned} \left\{ \begin{array}{l} \theta (x)-\theta (x^{k+1})+(x-x^{k+1})^{\text {T}}\Big \{-A^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+r(x^{k+1}-x^k)\Big \}\ge 0, \\ (\lambda -\lambda ^{k+1})^{\text {T}}~\Big \{\alpha (Ax^k-b) +t(\lambda ^{k+1}-\lambda ^k)\Big \}= 0, \end{array}\right. \end{aligned}$$

which implies that $\forall ~w\in {\mathcal {W}}$ the following variational inequality holds:

$$\begin{aligned}&\theta (u)-\theta (u^{k+1}) +(\alpha -1)(\lambda -\lambda ^{k+1})^{\text {T}} (Ax^{k+1}-b)\\&\quad +{{ \left( \begin{array}{c}x-x^{k+1}\\ \lambda -\lambda ^{k+1}\end{array}\right) }^{\text {T}}} \left[ \left( \begin{array}{c}-A^{\text {T}}\lambda ^{k+1}\\ Ax^{k+1}-b\end{array}\right) +\left( \begin{array}{cc}rI_n &{} -\alpha A^{\text {T}}\\ alpha A &{} tI_m\end{array}\right) \left( \begin{array}{c}x^{k+1}-x^{k}\\ \lambda ^{k+1}-\lambda ^{k}\end{array}\right) \right] \ge 0. \end{aligned}$$

In a compact form, we have

$$\begin{aligned}&\theta (u)-\theta (u^{k+1}) +~\frac{t(\alpha -1)}{\alpha }(\lambda -\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&\quad +~{{(w-w^{k+1})}^{\text {T}}} \Big [F(w^{k+1})+Q(w^{k+1}-w^{k})\Big ]\ge 0,~~\text {for any}~w\in {\mathcal {W}},\qquad \end{aligned}$$

(2.6)

where F(w) is defined by (2.4), and

$$\begin{aligned} Q = \left( \begin{array}{cc}rI_n &{} -\alpha A^{\text {T}} \\ alpha A &{} tI_m\end{array}\right) . \end{aligned}$$

(2.7)

Lemma 2.3

Let $w^*\in {\mathcal {W}}^*$ be a solution defined by (2.3), and $\{w^k\}$ be a sequence generated by the gCPPA method. Then we have

$$\begin{aligned} (w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k}) \ge \tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$

(2.8)

Proof

By setting $u=u^*,~~w=w^*$ in (2.6), we get

$$\begin{aligned}&(w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k})\nonumber \\&\quad \ge \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^*)^{\text {T}}F (w^{k+1})\nonumber \\&\quad +\tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$

(2.9)

By the monotonicity of F(w),^{Footnote 1}. we have

$$\begin{aligned} (w^{k+1}-w^*)^{\text {T}} F(w^{k+1})\ge (w^{k+1}-w^*)^{\text {T}} F(w^{*}). \end{aligned}$$

(2.10)

Since $w^*\in {\mathcal {W}}^*$ is a solution, it follows

$$\begin{aligned} \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^*)^{\text {T}} F(w^*)\ge 0. \end{aligned}$$

(2.11)

Adding (2.10) and (2.11) yields

$$\begin{aligned} \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^{\text {*}})^{\text {T}}F (w^{k+1})\ge 0.\end{aligned}$$

(2.12)

Substituting (2.12) into (2.9), we obtain (2.8) and complete the proof. $\square $

In the current subsection, let

$$\begin{aligned} M=rI_n-\frac{\alpha ^2}{t} A^{\text {T}}A. \end{aligned}$$

(2.13)

It is easy to show that if $r, t>0$ and $rt\ge \alpha ^2\rho (A^{\text {T}}A)$, where $\alpha \in (0.5,1)$, then Q and M are positive semidefinite. We use the notation $\Vert w\Vert ^2_H: =w^{\text {T}} H w\ge 0$ if H is symmetric and positive semidefinite.

Lemma 2.4

For the sequence $\{w^k\}$ generated by the gCPPA method we have

$$\begin{aligned}&\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\frac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\nonumber \\&\quad \le \big \Vert w^{*}-w^{k}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{k+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2\Big ]. \end{aligned}$$

(2.14)

Proof

By the identity

$$\begin{aligned} (a-b)^{\text {T}}Q(c-d)=\tfrac{1}{2}\big (\Vert a-d\Vert _{Q}^{2}-\Vert a-c\Vert _{Q}^{2}+\Vert c-b\Vert _{Q}^{2}-\Vert d-b\Vert _{Q}^{2}\big ), \end{aligned}$$

we have

$$\begin{aligned}&(w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^k)\nonumber \\&=\frac{1}{2}\Big ( \big \Vert w^*-w^k\big \Vert _Q^2- \big \Vert w^*-w^{k+1}\big \Vert _Q^2- \big \Vert w^k-w^{k+1}\big \Vert _Q^2\Big ), \end{aligned}$$

(2.15)

and

$$\begin{aligned}&(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&=\frac{1}{2}\Big ( \Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2 -\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2- \Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ). \end{aligned}$$

(2.16)

Combining (2.15), (2.16) and (2.8), we obtain

$$\begin{aligned}&\big \Vert w^k-w^{k+1}\big \Vert _Q^2 -\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2\nonumber \\&\quad \le \big \Vert w^*-w^k\big \Vert _Q^2-\big \Vert w^*-w^{k+1}\big \Vert _Q^2 +\tfrac{t(1-\alpha )}{\alpha }\Big (\big \Vert \lambda ^{*}-\lambda ^{k+1}\big \Vert ^2 -\big \Vert \lambda ^{*}-\lambda ^{k+2}\big \Vert ^2\Big ). \end{aligned}$$

(2.17)

By the definition of Q in (2.7), it follows

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2&= r\big \Vert x^k-x^{k+1}\big \Vert ^2 -2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})\nonumber \\&+t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$

(2.18)

Applying $\lambda $-iteration of (2.5), we get

$$\begin{aligned}&-2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2\nonumber \\&\quad =t\Big \Vert \big [\tfrac{\alpha }{t}(Ax^k-b)-\tfrac{\alpha }{t}(Ax^{k+1}-b) \big ] -(\lambda ^{k}-\lambda ^{k+1})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}\nonumber \\&\quad =t\Big \Vert \big [(\lambda ^{k}-\lambda ^{k+1})-(\lambda ^{k+1}-\lambda ^{k+2})\big ] -(\lambda ^{k}-\lambda ^{k+1})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}\nonumber \\&\quad =t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}. \end{aligned}$$

(2.19)

Combining (2.18) with (2.19), it follows

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2 =\big \Vert x^k-x^{k+1}\big \Vert ^2_M +t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 . \end{aligned}$$

(2.20)

Substituting (2.20) into (2.17), we obtain (2.14) immediately and complete the proof. $\square $

Theorem 2.5

Suppose $\alpha \in (0.5,1)$, $r, t>0$ and $rt\ge \alpha ^2\rho (A^{\text {T}}A)$, the sequence $\{w^k\}$ is generated by the gCPPA method. Then we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert w^{k+1}-w^k\Vert =0, \end{aligned}$$

(2.21)

and $\{w^k\}$ globally converges to an element of solution set ${\mathcal {W}}^*$ satisfying (2.3).

Proof

Since $\alpha \in (0.5,1)$, $r, t>0$ and $rt\ge \alpha ^2\rho (A^{\text {T}}A)$, it holds that

$$\begin{aligned}&\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \nonumber \\&\quad \ge \Big (r-\tfrac{\alpha ^2\rho (A^{\text {T}}A)}{t}\Big ) \Vert x^{k}-x^{k+1}\Vert ^2 +\tfrac{t(2\alpha -1)}{\alpha } \Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\ge 0, \end{aligned}$$

(2.22)

where M is defined by (2.13). Summing (2.14) from 0 to $K>1$, it yields

$$\begin{aligned}&\sum \limits _{k=0}^{K}\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\nonumber \\&\quad \le \big \Vert w^{*}{-}w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}{-}\lambda ^{1}\Vert ^2 {-}\Big [\big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2\Big ]. \end{aligned}$$

(2.23)

On one hand, we get from (2.22) and (2.23) that

$$\begin{aligned} \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2 \ge \big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2 \end{aligned}$$

(2.24)

holds for $\forall K>1$. Since the objective function of problem (1.1) is a closed convex function, the solution set ${\mathcal {W}}^*$ is a bounded closed convex nonempty set.

On the other hand, by taking limits on the both side of (2.23), we get

$$\begin{aligned} \sum \limits _{k=0}^{\infty }\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\le d, \end{aligned}$$

(2.25)

where $d:= \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2. $ Thus

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}_M=0, \quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k+1}}-{{\lambda }^{k+2}}\Vert ^{2}=0, \end{aligned}$$

which implies (2.21), i.e.,

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k}}-{{\lambda }^{k+1}}\Vert ^{2}=0, \end{aligned}$$

(2.26)

The sequence $\{w^k\}$ has at least one cluster point. Let $w^\infty $ be a cluster point of $\{w^k\}$, i.e., there exists a subsequence $\{w^{k_j}\}_{k_j\in {\mathcal {K}}}$ such that $w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}$. Then by taking limit on (2.6) as $k \rightarrow \infty $, and applying (2.26) we have

$$\begin{aligned} \theta (u)-\theta (u^\infty ) +(w-w^\infty )^{\text {T}} F(w^\infty )\ge 0, \forall ~w\in {\mathcal {W}}, \end{aligned}$$

which implies that $w^{\infty }\in {\mathcal {W}}^*$ is a solution satisfying (2.3) and completes the proof. $\square $

Theorem 2.6

If the conditions of Theorem 2.5 hold, then the sequence $\{w^k\}$ generated by the gCPPA method has a worst-case convergence rate ${\mathcal {O}}(1/k)$.

Proof

Substituting $a=w^{k-1}-w^{k}$ and $b=w^k-w^{k+1}$ to the identity

$$\begin{aligned} \Vert a\Vert _Q^2-\Vert b\Vert _Q^2=2a^{\text {T}} Q(a-b)-\Vert a-b\Vert _Q^2, \end{aligned}$$

we obtain

$$\begin{aligned}&\big \Vert w^{k-1}-w^{k}\big \Vert _{Q}^2-\big \Vert w^k-w^{k+1}\big \Vert _{Q}^2\nonumber \\&\quad =2(w^{k-1}-w^{k})^{\text {T}}{Q}\Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad -\Big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \Vert ^2_{Q}. \end{aligned}$$

(2.27)

Letting $w:=w^k$ in (2.6), we get

$$\begin{aligned}&\theta (u^k)-\theta (u^{k+1}) + \frac{t(\alpha -1)}{\alpha }(\lambda ^k-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&\quad + {{(w^k-w^{k+1})}^{\text {T}}} \Big [F(w^{k+1})+Q(w^{k+1}-w^{k})\Big ]\ge 0. \end{aligned}$$

(2.28)

Since (2.6) also holds at the $(k-1)^{th}$ iteration, we have

$$\begin{aligned}&\theta (u)-\theta (u^{k}) +\frac{t(\alpha -1)}{\alpha }(\lambda -\lambda ^{k})^{\text {T}} (\lambda ^{k}-\lambda ^{k+1})\\&\quad +{{(w-w^{k})}^{\text {T}}} \Big [F(w^{k})+Q(w^{k}-w^{k-1})\Big ]\ge 0, \quad \forall ~w\in {\mathcal {W}}. \end{aligned}$$

Setting $w:=w^{k+1}$ in the above inequality, we obtain

$$\begin{aligned}&\theta (u^{k+1})-\theta (u^{k}) +\frac{t(\alpha -1)}{\alpha }(\lambda ^{k+1}-\lambda ^{k})^{\text {T}} (\lambda ^{k}-\lambda ^{k+1})\nonumber \\&\quad +{{(w^{k+1}-w^{k})}^{\text {T}}} \Big [F(w^{k})+Q(w^{k}-w^{k-1})\Big ]\ge 0. \end{aligned}$$

(2.29)

Adding (2.28) and (2.29), it yields

$$\begin{aligned}&(w^{k}-w^{k+1})^{\text {T}}Q \Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\\&\quad \ge (w^{k+1}-w^k)^{\text {T}}\Big [F(w^{k+1})-F(w^k)\Big ]\\&\quad +\tfrac{t(1-\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$

By the monotonicity of F(w), we get from the above inequality that

$$\begin{aligned}&(w^{k}-w^{k+1})^{\text {T}}Q \Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad \ge \tfrac{t(1-\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}} \Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$

(2.30)

Adding $\big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\big \Vert ^2_{Q}$ to the both sides of (2.30), it follows

$$\begin{aligned}&(w^{k-1}-w^{k})^{\text {T}}{Q}\Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad \ge \Big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \Vert ^2_Q\nonumber \\&\quad +\tfrac{t(1 -\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1} -\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$

(2.31)

It is easy to derive that

$$\begin{aligned}&2(\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]\nonumber \\&\quad =-\Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2 +\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$

(2.32)

Combining (2.27) with (2.31) and (2.32), we have

$$\begin{aligned}&\big \Vert w^{k-1}-w^{k}\big \Vert _{Q}^2 +\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2 -\left( \big \Vert w^k-w^{k+1}\big \Vert _{Q}^2 +\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2\right) \nonumber \\&\quad \ge \Big \Vert w^{k-1}\!-\!w^{k}-(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q -\tfrac{t(1-\alpha )}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$

(2.33)

By the definition of Q in (2.7), it follows

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q =~&r\Big \Vert x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big \Vert ^2 +t\Big \Vert \lambda ^{k-1}\!-\!\lambda ^{k}\!-\! (\lambda ^{k}\!-\!\lambda ^{k+1})\Big \Vert ^2\\&-2\alpha \Big [x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big ]^{\text {T}}A^{\text {T}} \Big [\lambda ^{k-1}\!-\!\lambda ^{k}\!-\! (\lambda ^{k}\!-\!\lambda ^{k+1})\Big ]. \end{aligned}$$

By $\lambda $-iteration of (2.5), we get

$$\begin{aligned}&-2\alpha \Big [x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big ]^{\text {T}}A^{\text {T}} \Big [\lambda ^{k-1}\!-\!\lambda ^{k}\!-\!(\lambda ^{k}\!-\!\lambda ^{k+1})\Big ]\\&+t\Big \Vert \lambda ^{k-1}\!-\!\lambda ^{k}\! -\!(\lambda ^{k}\!-\!\lambda ^{k+1})\Big \Vert ^2 \\&\quad =t\Big \Vert \tfrac{\alpha }{t}A\big [x^{k\!-\!1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k\!+\!1})\big ] \!-\!\big [ \lambda ^{k\!-\!1}\!-\!\lambda ^{k}\!-\!(\lambda ^{k}\!-\!\lambda ^{k\!+\!1}) \big ]\Big \Vert ^2\\&\quad -\tfrac{\alpha ^2}{t}\Big \Vert x^{k\!-\!1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k\!+\!1}) \Big \Vert ^2_{A^{\text {T}}A}\\&\quad =t\Big \Vert \lambda ^{k}-\lambda ^{k+1}-(\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_{A^{\text {T}}A}, \end{aligned}$$

which implies

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q&= \!\Big \Vert x^{k-1}\!-\!x^{k}-(x^{k}\!-\!x^{k+1})\Big \Vert ^2_M\nonumber \\&+t\Big \Vert \lambda ^{k}\!-\!\lambda ^{k+1}\! -\!(\lambda ^{k+1}\!-\!\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$

(2.34)

Substituting (2.20) and (2.34) into (2.33), we have

$$\begin{aligned}&\big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \nonumber \\&\quad \ge \Big \Vert x^{k-1}\!-\!x^{k}-(x^{k}\!-\!x^{k+1})\Big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2.\nonumber \\ \end{aligned}$$

(2.35)

Adding $\Big (\tfrac{t(2\alpha -1)}{\alpha } -\tfrac{t}{\alpha }\Big ) \Big [\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ]$ to the both side of (2.35), we obtain

$$\begin{aligned}&\big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \\&\quad \ge \Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad -\tfrac{2t(1-\alpha )}{\alpha } \Big [\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ]\\&\quad = \Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_M +\tfrac{t}{\alpha } \Big \Vert (\small {2\alpha -1})(\lambda ^{k}-\lambda ^{k+1})- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad +4(\small {2-\alpha })t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

Recall that M is positive semidefinite and $\alpha \in (0.5,1)$, it follows

$$\begin{aligned} \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\le \big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

(2.36)

Which implies $\Big \{\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big \}$ is monotonically nonincreasing whenever $k\ge 0$. Summing (2.36) from 1 to $K+1$, it yields

$$\begin{aligned}&(1+K)\Big (\big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \Big ) \\&\quad \le \sum \limits _{k=1}^{K+1}\Big ( \big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 \Big )\\&\quad =\sum \limits _{k=0}^{K}\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big ). \end{aligned}$$

Combing with (2.23), (2.24) and (2.1), we have

$$\begin{aligned} (1+K)\Big (\big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \Big )\le d. \end{aligned}$$

Hence, for any $\epsilon >0$ there exists K such that

$$\begin{aligned} \big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \le \frac{d}{1+K}<\epsilon . \end{aligned}$$

(2.37)

Let $\lfloor d/\epsilon \rfloor $ means the maximal integer not greater that $d/\epsilon $. Then, inequality (2.37) implies that the gCPPA needs at most $\lfloor d/\epsilon \rfloor $ iterations to ensure that

$$\begin{aligned} \Big (r-\tfrac{\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{K}-x^{K+1}\big \Vert ^2< \epsilon ,\quad \tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2< \epsilon , \end{aligned}$$

which implies the sequence $\{w^k\}$ generated by the gCPPA method has a worst-case convergence rate ${\mathcal {O}}(1/k)$ in a non-ergodic sense. The proof of Theorem 2.6 is completed. $\square $

2.2 Extended gCPPA to Separable Convex Minimization Problems

The Lagrangian function of problem (1.9) is

$$\begin{aligned} L(x,y,\lambda )=\theta _1(x)+\theta _2(y) -\lambda ^{\text {T}}(Ax+By-b). \end{aligned}$$

(2.38)

If a pair of $(x^*,y^*)\in {\mathcal {X}}\times {\mathcal {Y}}$ is a solution of problem (1.9), then there exists $\lambda ^*\in {\mathbb {R}}^m$ such that $(x^*,y^*,\lambda ^*)$ is a saddle point of (2.38) satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} L(x,y^*,\lambda ^*)- L(x^*,y^*,\lambda ^*)\ge 0, \quad \forall ~x\in {\mathcal {X}}, \\ L(x^*,y,\lambda ^*)- L(x^*,y^*,\lambda ^*)\ge 0, \quad \forall ~y\in {\mathcal {Y}}, \\ L(x^*,y^*,\lambda ^*)- L(x^*,y^*,\lambda )\ge 0, \quad \forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$

(2.39)

Consequently, by (2.39) it follows

$$\begin{aligned}&x^* = \arg \min \limits _{x\in {\mathcal {X}}} L(x,y^*,\lambda ^*), \quad y^* = \arg \min \limits _{y\in {\mathcal {Y}}} L(x^*,y,\lambda ^*),\nonumber \\&\quad \lambda ^* = \arg \max \limits _{\lambda \in {\mathbb {R}}^m} L(x^*,y^*,\lambda ). \end{aligned}$$

(2.40)

According to Lemma 2.1 in [11], we have the first-order optimality conditions of problem (2.40):

$$\begin{aligned} \left\{ \begin{array}{ll} \theta _1(x)-\theta _1(x^*) -(x-x^*)^{\text {T}} (A^{\text {T}} \lambda ^*)\ge 0, &{}\quad \forall ~x\in {\mathcal {X}}, \\ \theta _2(y)-\theta _2(y^*) -(y-y^*)^{\text {T}} (B^{\text {T}} \lambda ^*)\ge 0, &{}\quad \forall ~y\in {\mathcal {Y}}, \\ (\lambda -\lambda ^*)^{\text {T}} (Ax^*+By^*-b) \ge 0, &{}\quad \forall ~\lambda \in {\mathbb {R}}^m. \\ \end{array}\right. \end{aligned}$$

(2.41)

By a compact form of (2.41), the solution set of problem (1.9), also denoted by ${\mathcal {W}}^*\subset {\mathcal {W}}$, consists of all $w^*\in {\mathcal {W}}$ satisfying (2.3), where ${\mathcal {W}} := {\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m$, $\theta (u):=\theta _1(x)+\theta _2(y)$ and

$$\begin{aligned} u:=\left( \begin{matrix}x \\ y \\ \end{matrix}\right) ,\quad w:=\left( \begin{matrix}x \\ y \\ \lambda \end{matrix}\right) ,\quad F(w):=\left( \begin{matrix}-{{A}^{\text {T}}}\lambda \\ -{{B}^{\text {T}}}\lambda \\ Ax+By-b\\ \end{matrix} \right) . \end{aligned}$$

(2.42)

We are in the position to propose the extended gCPPA for separable convex optimization problem (1.9).

By the first-order optimality conditions of iteration subproblem (2.43), we have that, for $w^{k+1}\in {\mathcal {W}}$ generated by the eCPPA and $\forall ~w\in {\mathcal {W}}$, we have

$$\begin{aligned} \left\{ \begin{array}{l} \theta (x)-\theta (x^{k+1})+(x-x^{k+1})^{\text {T}}\Big \{-A^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+r(x^{k+1}-x^k)\Big \}\ge 0, \\ \theta (y)-\theta (y^{k+1})+(y-y^{k+1})^{\text {T}}\Big \{-B^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+s(y^{k+1}-y^k)\Big \}\ge 0, \\ (\lambda -\lambda ^{k+1})^{\text {T}}~\Big \{\alpha (Ax^k+By^k-b) +t(\lambda ^{k+1}-\lambda ^k)\Big \}= 0. \end{array}\right. \end{aligned}$$

(2.44)

By notations (2.42), inequality (2.44) can be rewritten to the compact form (2.6) in which

$$\begin{aligned} Q=\left( \begin{matrix}rI_{n_1}&{}0_{n_1\times n_2}&{} -\alpha A^{\text {T}}\\ 0_{n_2\times n_1}&{}sI_{n_2}&{}-\alpha B^{\text {T}}\\ alpha A&{}- \alpha B&{}tI_{m}\end{matrix}\right) . \end{aligned}$$

(2.45)

Lemma 2.8

Let $w^*\in {\mathcal {W}}^*$ be a solution satisfying (2.3), then we have

$$\begin{aligned} (w^\text {*}-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k}) \ge \tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$

(2.46)

Proof

The mapping F(w) is affine with a skew-symmetric matrix, and thus it is monotone. The rest of the proof is as same as that of Lemma 2.3. $\square $

In this subsection, let

$$\begin{aligned} M=\left( \begin{array}{cc} rI_{n_1}-\tfrac{\alpha ^2A^{\text {T}}A}{t}&{} -\tfrac{\alpha ^2A^{\text {T}}B}{t}\\ -\tfrac{\alpha ^2B^{\text {T}}A}{t}&{} sI_{n_2}-\tfrac{\alpha ^2B^{\text {T}}B}{t} \end{array}\right) . \end{aligned}$$

(2.47)

If $r, s, t>0$ and $rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)$, where $\alpha \in (0.5,1)$, it is easy to verify that Q and M are positive semidefinite.

Lemma 2.9

For the sequence $\{w^k\}$ generated by the gCPPA method, we have

$$\begin{aligned}&\big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\frac{t(2\alpha -1)}{\alpha }\big \Vert \lambda ^{k+1}- \lambda ^{k+2}\big \Vert ^2 \nonumber \\&\quad \le \big \Vert w^{*}-w^{k}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{k+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2\Big ]. \end{aligned}$$

(2.48)

Proof

Similar to Lemma 2.4, we also get (2.17). By notation Q in (2.45), it holds

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2\!=\! & {} r\big \Vert x^k-x^{k+1}\big \Vert ^2 +s\big \Vert y^k-y^{k+1}\big \Vert ^2 -2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})\\&-2\alpha (y^k-y^{k+1})^{\text {T}}B^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$

From $\lambda $-iteration of (2.43), we get

$$\begin{aligned}&-2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})- 2\alpha (y^k-y^{k+1})^{\text {T}}B^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2 \\&\quad =t\Big \Vert \big [\tfrac{\alpha }{t}(Ax^k+By^k-b)-\tfrac{\alpha }{t}(Ax^{k+1}+By^{k+1}-b) \big ] -(\lambda ^k-\lambda ^{k+1})\Big \Vert ^2 \\&\qquad -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 \\&\quad =t\Big \Vert \big [(\lambda ^k-\lambda ^{k+1})-(\lambda ^{k+1}-\lambda ^{k+2})\big ] -(\lambda ^k-\lambda ^{k+1})\Big \Vert ^2\\&\quad -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 \\&\quad =t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 , \end{aligned}$$

which implies

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2=\big \Vert u^k-u^{k+1}\big \Vert ^2_M +t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2, \end{aligned}$$

(2.49)

where

$$\begin{aligned}&\big \Vert u^k-u^{k+1}\big \Vert ^2_M =~r\big \Vert x^{k}-x^{k+1}\big \Vert ^2 + s\big \Vert y^{k}-y^{k+1}\big \Vert ^2\\&-\tfrac{\alpha ^2}{t}\Big \Vert A(x^k\!-\!x^{k+1})\! +\!B(y^k\!-\!y^{k+1})\Big \Vert ^2. \end{aligned}$$

Substituting (2.49) into (2.17), we obtain (2.14) immediately and complete the proof. $\square $

Theorem 2.10

Suppose that $r, s, t>0$ and $rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)$, where $\alpha \in (0.5,1)$. Then, for the sequence $\{w^k\}$ is generated by the eCPPA method, we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert w^{k+1}-w^k\Vert =0, \end{aligned}$$

(2.50)

and $\{w^k\}$ globally converges to an element of solution set ${\mathcal {W}}^*$ satisfying (2.41).

Proof

Since $rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)$, it follows

$$\begin{aligned} \big \Vert u^{k}\!-\!u^{k+1}\big \Vert ^2_M&\ge \Big (r\!-\!\tfrac{2\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{k+1}\!-\! x^{k+2}\big \Vert ^2\nonumber \\&+ \Big (s\!-\!\tfrac{2\alpha ^2\rho (B^{\text {T}}B)}{t}\Big ) \big \Vert y^{k+1}\!-\! y^{k+2}\big \Vert ^2\ge 0. \end{aligned}$$

(2.51)

Notice that $t>0,~ \alpha \in (0.5,1)$, we get

$$\begin{aligned} \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \ge 0. \end{aligned}$$

(2.52)

Summing (2.48) from 0 to $K>1$, it yields

$$\begin{aligned}&\sum \limits _{k=0}^{K}\Big ( \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big ) \nonumber \\&\quad \le \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2\Big ].\nonumber \\ \end{aligned}$$

(2.53)

By (2.52) and (2.53), it is easy to derive (2.24) and

$$\begin{aligned} \sum \limits _{k=0}^{\infty }\Big ( \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\le d, \end{aligned}$$

which implies that

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{y}^{k}}-{{y}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k}}-{{\lambda }^{k+1}}\Vert ^{2}=0. \end{aligned}$$

Hence, there exists a subsequence $\{w^{k_j}\}_{k_j\in {\mathcal {K}}}$ such that $w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}$ is a solution satisfying (2.3) and completes the proof. $\square $

Theorem 2.11

If the conditions of Theorem 2.10 hold, then the sequence $\{w^k\}$ generated by the eCPPA method has a worst-case convergence rate ${\mathcal {O}}(1/k)$ in a non-ergodic sense.

Proof

It holds obviously that

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q&=\Big \Vert u^{k-1}\!-\!u^{k}\!-\!(u^{k}\!-\!u^{k+1})\Big \Vert ^2_M\nonumber \\&\quad +t\Big \Vert \lambda ^{k}\!-\!\lambda ^{k+1}\!-\!(\lambda ^{k+1} \!-\!\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$

(2.54)

By (2.49) and (2.54), we have

$$\begin{aligned}&\big \Vert u^{k-1}-u^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \\&\quad \ge \Big \Vert u^{k-1}-u^{k}-(u^{k}-u^{k+1})\Big \Vert ^2_M +\tfrac{t}{\alpha } \Big \Vert (\small {2\alpha -1})(\lambda ^{k}-\lambda ^{k+1})- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad +4(\small {2-\alpha })t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

Then, for any $\epsilon >0$ there exists K such that

$$\begin{aligned} \big \Vert u^{K}-u^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \le \frac{d}{1+K}<\epsilon . \end{aligned}$$

(2.55)

By (2.51), inequality (2.55) implies that the eCPPA needs at most $\lfloor d/\epsilon \rfloor $ iterations to ensure that

$$\begin{aligned}&\Big (r-\tfrac{2\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{K}-x^{K+1}\big \Vert ^2< \epsilon , ~\Big (s-\tfrac{2\alpha ^2\rho (B^{\text {T}}B)}{t}\Big )\big \Vert y^{K}-y^{K+1}\big \Vert ^2< \epsilon ,\\&\quad \tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2< \epsilon , \end{aligned}$$

which implies that the sequence $\{w^k\}$ generated by the eCPPA method has a worst-case convergence rate ${\mathcal {O}}(1/k)$ in a non-ergodic sense. The proof of Theorem 2.11 is completed. $\square $

3 Numerical Results

The section focuses on the numerical performance of the gCPPA and eCPPA comparing with some state-of-the-art methods. All methods are coded in MATLAB 2012b and run on an HP desktop computer with Intel(R) Core(TM) i5-6500 CPU 3.20GHz, RAM 8G. The termination criterion of gCPPA is set to $\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon $, and that of eCPPA is set to $\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert y^k-y^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon $, where $\varepsilon >0$ is a small real number. By the global convergence of the gCPPA and eCPPA methods, we have $\underset{k\rightarrow \infty }{\mathop {\text {lim}}}\,\Vert {{w}^{k}}-{{w}^{k+1}}\Vert ^{2}=0$, which means that for any given $\varepsilon >0$ the corresponding algorithm will stop in finite iterations.

To adjust parameters r, s and t simultaneously, we introduce $\alpha _r,~\alpha _t>0$ satisfying $\alpha _r\alpha _t=\alpha ^2$. Take $r= \alpha _r R,~t> \alpha _t \rho (A^{\text {T}}A)/R~$ in gCPPA such that $ rt\ge \alpha ^2\rho (A^{\text {T}}A)$, and take $r= \sqrt{2}\alpha _r R,~s= \sqrt{2}\alpha _r R\rho (B^{\text {T}}B)/\rho (A^{\text {T}}A),~t> \sqrt{2}\alpha _t\rho (A^{\text {T}}A)/R$ in eCPPA such that $rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)$, where $R>0$ is a selected constant.

Example 3.1

The first test problem is a correlation matrices calibrating problem selected from [11, 29], which has the form

$$\begin{aligned} \min \Bigg \{~ \frac{1}{2}\Vert X-C\Vert ^2_F ~\big | ~\text{ diag }(X) = e, ~ X\in S^n_{+} \Bigg \}, \end{aligned}$$

(3.1)

where $C\in {\mathbb {R}}^{n\times n}$ is a symmetrical matrix, $e=(1,1,\dots ,1)^{\text {T}}\in {\mathbb {R}}^{n \times 1}$ and $S^n_{+}=\{H\in {\mathbb {R}}^{n\times n}~\big |~ H^{\text {T}}=H, H\succeq 0\}$. Problem (3.1) can be reformulated to problem (1.1) with $\rho (A^{\text {T}}A)=1$, where A is a projection matrix for the linear constraint ${\text {diag}}(X)=e$ (see Example 2 in [11]).

Let $z\in {\mathbb {R}}^n$ be the Lagrange multiplier to the linear equality constraint of problem (3.1).

We compare the performance of the gCPPA to the CPPA and rCPPA methods. For a given $(X^k, z^k)$, the gCPPA produces the new iteration $(X^{k+1},z^{k+1})$ as follows:

$$\begin{aligned} \left\{ \begin{array}{l} z^{k+1}= z^k-\frac{\alpha }{t}({\text {diag}}(X^k)-e), \\ X^{k+1} = \arg \min \nolimits _{X\in S^n_{+}} \Big \{~\frac{1}{2}\Vert X-C\Vert ^2_F+\frac{r}{2}\big \Vert (X-X^k)- \frac{1}{r} {\text {diag}}[(1+\alpha )z^{k+1}-\alpha z^k]\big \Vert ^2_F~ \Big \}. \end{array}\right. \end{aligned}$$

(3.2)

The X-subproblem of (3.2) is identical to

$$\begin{aligned} X^{k+1} = \arg \min \nolimits _{X\in S^n_{+}} \Bigg \{~\frac{1}{2}\big \Vert X-\frac{1}{r+1}\{rX^k+ {\text {diag}} [(1+\alpha )z^{k+1}-\alpha z^k]+C\}\big \Vert ^2_F~\Bigg \}. \end{aligned}$$

It is inherent a projection operator via SVD decomposition. It takes the main computation load in each iteration. Similarly, the iteration schemes of CPPA and rCPPA for (3.1) can be written in the form of (1.7) and (1.8).

In our experiments, we take a random matrix $C=(c_{ij})_{n\times n}$ satisfying $c_{ij}\in (-1,1)$ and $c_{ij}=c_{ji}$ for all i, j. For each given n, 20 random instances are tested. To be fair, the initial point $X^0$ is an $n\times n$ identity matrix and $z^0$ is n-dimensional zero vector. For the termination criterion of all methods, we set $\varepsilon = 10^{-5}$.

Let $R=2.00$. The proximal parameters of the CPPA and rCPPA methods are set to $r=R$ and $t=1.05/R=0.53$, which satisfies $rt=1.05>\rho (AA^{\text {T}})$. Set $\gamma =1.50$ in the rCPPA method. The “$\hbox {gCPPA}^{(0.53)}$” means the gCPPA method with $\alpha _r=0.70$, $\alpha _t=0.40$, so $\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.53$, $r=R\alpha _r=1.40$ and $t=1.05\alpha _t/R=0.29$. The “$\hbox {gCPPA}^{(0.17)}$” means the gCPPA method with $\alpha _r=0.30$, $\alpha _t=0.10$, $r=R\alpha _r=0.60$, so $\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.17$ and $t=1.05\alpha _t/R=0.05$.

For easy comparisons, the results of three methods in terms of matrix dimension n, average iteration number (iter) and average cpu-time (cput in seconds) are put into Table 1, and displayed in Fig. 1.

Table 1 Numerical results of CPPA, rCPPA, $\hbox {gCPPA}^{(0.53)}$ and $\hbox {gCPPA}^{(0.17)}$

Full size table

One can conclude from Table 1 and Fig. 1 that: (1) the $\hbox {gCPPA}^{(0.53)}$ takes $80\%$ iterations and time of the CPPA, but takes more iterations and time than the rCPPA; (2) the $\hbox {gCPPA}^{(0.17)}$ takes $60\%$ iterations of the CPPA and $80\%$ iterations of the rCPPA, and spends less cpu-time than the CPPA and rCPPA, especially for the high dimension problem. In other word, the $\hbox {gCPPA}^{(0.53)}$ and $\hbox {gCPPA}^{(0.17)}$ are better than CPPA, and $\hbox {gCPPA}^{(0.17)}$ has almost better numerical performance as the rCPPA with $\gamma = 1.50$. However, the convergence of $\hbox {gCPPA}^{(0.17)}$ is not established since $\alpha <0.5$. It will be left to our future research.

Example 3.2

The second test problem is the matrix completion problem utilized in Tao, Yuan and He [30], which is as follows:

$$\begin{aligned} \min \Bigg \{~ \frac{1}{2}\Vert X-C\Vert ^2_F ~\Big |~ X\in S^n_{+}\cap S_{\text {B}} \Bigg \}, \end{aligned}$$

(3.3)

where $S_{\text {B}}=\{H\in {\mathbb {R}}^{n\times n}~\big |~H_{\text {L}}\ge H\ge H_{\text {U}}\}$.

By introducing an auxiliary variable Y such that $ X - Y=0$, problem (3.3) can be reformulated to a separable convex optimization problem of the form

$$\begin{aligned} \min \left\{ ~\frac{1}{2}\Vert X-C\Vert ^2_F + \frac{1}{2}\Vert Y-C\Vert ^2_F~ \Big |~ X = Y ,~ X\in S^n_{+},~ Y\in S_{\text {B}}\right\} . \end{aligned}$$

(3.4)

Obviously, problem (3.4) is a special case of problem (1.9) in which $\rho (A^{\text {T}}A)=\rho (B^{\text {T}}B)=1$.

For comparisons, four methods are used for solving problem (3.4): classical ADMM (ADMM for short), gADMM, separable version of CPPA (i.e., Algorithm 2.7 with $\alpha =1$, sCPPA) and eCPPA ( Algorithm 2.7 with $0<\alpha _0< \alpha < 1$). Let $Z\in {\mathbb {R}}^{n\times n}$ be the Lagrange multiplier for linear constraint $X-Y=0$. For a given $(X^k,Y^k,Z^k)$, the eCPPA produces new iterate $(X^{k+1},Y^{k+1},Z^{k+1})$ via

$$\begin{aligned} \left\{ \begin{array}{l} Z^{k+1}=Z^k-\frac{\alpha }{t}(X^k-Y^k),\\ X^{k+1}=\arg \min \nolimits _{X\in S^n_{+}} \Big \{ \frac{1}{2}\big \Vert X-\frac{1}{r+1}[rX^k+ (1+\alpha )Z^{k+1}-\alpha Z^k+C]\big \Vert ^2_F \Big \}, \\ Y^{k+1}=\arg \min \nolimits _{Y\in S_{B}}\Big \{ \frac{1}{2}\big \Vert Y-\frac{1}{s+1}[sY^k- (1+\alpha )Z^{k+1}+\alpha Z^k+C]\big \Vert ^2_F \Big \}. \end{array}\right. \qquad \end{aligned}$$

(3.5)

The X-subproblem of iteration (3.5) is inherent a projection operator via SVD decomposition, and it takes the main computation load in each iteration. The Y-subproblem of (3.5) is also a projection

$$\begin{aligned} Y^{k+1}=P_{S_{B}} \Bigg \{\frac{1}{s+1}\big [sY^k -(1+\alpha )Z^{k+1}+\alpha Z^k+C\big ]\Bigg \}, \end{aligned}$$

where $P_{S_{B}}$ denotes the projection onto $S_{B}$ under the Euclidean norm.

In the experiments, we take a random matrix $C=(c_{ij})_{n\times n}$ satisfying $c_{ij}\in (-1,1)$ and $c_{ij}=c_{ji}$ for all i, j. For each given n, 20 random instances are tested. To be fair, $X^0$ and $Y^0$ are set to $n\times n$ identity matrice and $z^0$ is set to n-dimensional zero vector.

For termination criterion of all methods, we set $ \varepsilon = 10^{-5}\Vert w^{1}-w^0\Vert _\infty $.

The parameter settings of the used methods are stated as follows: The penalty parameter is set to $\beta =1.80$ in both ADMM and gADMM, the relaxation parameter is set to $\gamma = 1.80$ in the gADMM. Let $R=3.00$, the proximal parameters of the sCPPA are set to $r = s = \sqrt{2} R \approx 4.24$ and $t=1.02\sqrt{2}/R\approx 0.48$, and those of the eCPPA are set to $r =s=\sqrt{2}\alpha _r R\approx 2.55$, $t=1.02 \sqrt{2} \alpha _t/R\approx 0.24$, where $\alpha _r =0.60$ and $\alpha _t = 0.50$, and $\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.55$.

We test all methods on 20 random instances for each fixed n. The average iteration number and the average cpu-time for problem (3.4) with different n are listed in Table 2 and displayed in Fig. 2.

Table 2 Numerical results of ADMM, gADMM, sCPPA and eCPPA

Full size table

From Table 2 and Fig. 2, one can conclude that the iteration number of eCPPA is about $30\%$ of ADMM, and about $70\%$ of gADMM and sCPPA. Moreover, eCPPA takes less cpu-time than others especially for the high-dimensional problems. Thus, eCPPA has the best performance for problem (3.4).

Example 3.3

The third example selected from [13] focuses on the wavelet-based image inpainting and zooming problems. Let a two-dimensional image $x\in {\mathbb {R}}^{l_1\times l_2}$ and $l=l_1\cdot l_2$. Letting $\mathbf vec (\cdot )$ be the vectorization operator in the lexicographic order, $\mathbf{x }=\mathbf vec (x)\in {\mathbb {R}}^l$ is tackled by vectorizing x as an one-dimensional vector in the lexicographic order. Let $D\in {\mathbb {R}}^{l\times n}$ be a wavelet dictionary, each column of D is the elements of a wavelet frame. Commonly, the image $\mathbf{x }$ possesses a sparse representation under dictionary D, i.e., $\mathbf{x } = D x$ where x is a sparse vector. The wavelet-based image processing considers recovering the real image $\mathbf{x }$ from an observation b which might have some missing pixels or convolutions. The model for wavelet-based image processing can be casted as

$$\begin{aligned} \min \Big \{ \Vert \mathbf{x }\Vert _1 \Big | BD\mathbf{x }=b \Big \}, \end{aligned}$$

(3.6)

where $\Vert \mathbf{x }\Vert _1$ is to deduce a sparse representation under the wavelet dictionary, and B (also called mask) is a typically ill-conditioned matrix representation of convolution or downsampling operators. For the inpainting problem, matrix $B\in {\mathbb {R}}^{l\times l}$ in (3.6) is a diagonal matrix whose diagonal elements are either 0 or 1, where the locations of 0 correspond to missing pixels in the image and locations of 1 correspond to the pixels to be kept. For the image zooming problem, matrix $B\in {\mathbb {R}}^{m\times l}$ can be expressed as $B = SH$ where $S\in {\mathbb {R}}^{m\times l} $ is a downsampling matrix and $H\in {\mathbb {R}}^{l\times l} $ is a blurry matrix. Matrix H can be diagonalized by the discrete consine transform (DCT), i.e., $H = C^{-1}\Lambda C$ where C represents the DCT and $\Lambda $ is a diagonal matrix whose diagonal entries are eigenvalues of H.

We compare the gCPPA method to the CPPA and rCPPA methods on $256\times 256$ images of Peppers.png and Boat.png for the image inpainting and image zooming problems, respectively.

The dictionary D is chosen as the inverse discrete Haar wavelet transform with level 6. Below we give the detail of how the tested images are degraded.

For the image inpainting problem, the original image Peppers is first blurred by the out-of-focus kernel with a radius of 7. Then 60% pixels of the blurred images are lost by implementing a mask operator S. The positions of missing pixels are located randomly.
For the image zooming problem, the original image Boat is downsampled by a downsampling matrix S with a factor of 4. Then, the downsampled image is corrupted by a convolution whose kernel is generated by fspecial(gaussian,9,2.5) of MATLAB.

Let z be the Lagrange multiplier. For a given $(\mathbf{x }^k, z^k)$, the gCPPA adopts the following iteration to obtain $(\mathbf{x }^{k+1}, z^{k+1})$:

$$\begin{aligned} \left\{ \begin{array}{l} z^{k+1} = z^k-\frac{\alpha }{t}(BD{\mathbf{x }}^{k}-b),\\ \mathbf{x }^{k+1} = \arg \min \nolimits _{\mathbf{x } \in {\mathcal {X}}} \big \{ \Vert \mathbf{x }\Vert _1 + \frac{r}{2}\big \Vert \mathbf{x }-\mathbf{x }^k- \frac{1}{r}BD\big [(\alpha +1)z^{k+1}-\alpha z^k\big ]\big \Vert ^2 \big \}. \end{array}\right. \end{aligned}$$

On problem (3.6), the performance of the gCPPA (compared with CPPA and rCPPA) is tested. Similarly, the iteration scheme of the CPPA has the same iteration as the gCPPA with $\alpha =1$, and the rCPPA can also be written in the form of (1.7) and (1.8). Since $DD^{\text {T}} = I$, the blurry matrix H can be diagonalized by DCT and the binary matrix (both mask and downsampling matrices) S satisfies $\Vert S\Vert =1$, we have $\Vert (BD)^{\text {T}}(BD)\Vert =1$ for the wavelet-based image inpainting and zooming problems. Therefore, the requirement $rt\ge \Vert (BD)^{\text {T}}(BD)\Vert $ reduces to $rt\ge 1$ for the CPPA or rCPPA, and $rt>\alpha ^2$ for the gCPPA. The parameters are described as below.

For the image inpainting problem, we take initial point $({\mathbf{x }^0,\upsilon ^0}) = (D^{\text {T}}(b),\mathbf{0 })$, and $R=0.60$, $r_0 = R = 0.60$, $t_0 = 1.02/R=1.70$, $\gamma = 1.00$ for the CPPA and $\gamma = 1.90$ for the rCPPA. Take $\alpha _r= 2.00$, $\alpha _t= 0.13$, $\alpha =\sqrt{0.26}\approx 0.51$, $r = \alpha _r R=1.20$, $t=1.02\alpha _t/R\approx 0.22$ for the gCPPA;
For the image zooming problem, we take initial point $(\mathbf{x }^0,\upsilon ^0) = (\mathbf{0 }, \mathbf{0 })$, and $R=0.55$, $r_0 =R= 0.55$, $t_0 = 1.02/R\approx 1.85$, $\gamma = 1.00$ for the CPPA and $\gamma = 1.20$ for the rCPPA. Take $\alpha _r= 1.00$, $\alpha _t= 0.26$, $a = \sqrt{0.26}\approx 0.51$, $r = \alpha _r R=0.55$, $t = 1.02\alpha _t/R\approx 0.48$ for the gCPPA.

As usual, the quality of the reconstruction is measured by the signal-to-noise ratio (SNR) in decibel (dB)

$$\begin{aligned} {\text {SNR}}:=20\log _{10} \frac{\Vert \mathbf{x }\Vert }{\Vert \bar{\mathbf{x }}-\mathbf{x }\Vert }, \end{aligned}$$

(3.7)

where $\bar{\mathbf{x }}$ is a reconstructed image and $\mathbf{x }$ is a clean image.

For easy comparisons, the evolutions of SNR with respect to iterations and computing time for all tested methods are displayed in Fig. 3. It shows that the rCPPA is better than the CPPA, and the gCPPA converges to the nearly same solutions in inpainting and zooming problems as fast as the rCPPA.

The reconstructed images are displayed in Fig. 4 by executing 300 iterations in image inpainting (or 30 iterations in image zooming). It verifies our assertion: the gCPPA could be at least as efficient as the rCPPA on the inpainting and zooming problem. We have to emphasize that the gCPPA does not involve any relaxation step while the rCPPA has a relaxation step with $\gamma >1$.

Example 3.4

The last example selected from [13] focuses on the total variation (TV) uniform noise removal model:

$$\begin{aligned} \min \Big \{\Vert \nabla \mathbf{x }\Vert _1 \Big | \Vert H\mathbf{x }-\mathbf{x }^0\Vert _{\infty }\le \sigma \Big \}, \end{aligned}$$

(3.8)

where the two-dimensional image $x=(x_{i,j})_{l_1\times l_2}\in {\mathbb {R}}^{l_1\times l_2}$ and $l=l_1 \cdot l_2$. The vectorized version of an image can be denoted by $\mathbf{x }=\mathbf vec (x)\in {\mathbb {R}}^l$ as Example 3.3, and $\Vert \nabla \mathbf{x }\Vert _1:=\sum _{i=1}^{l}(|\text {vec}(\nabla _h x)|_i+|\text {vec}(\nabla _v x)|_i)$ is the TV norm with $\nabla _h x=\big ((\nabla _h x)_{i,j}\big )_{l_1\times l_2}$, $\nabla _v x=\big ((\nabla _v x)_{i,j}\big )_{l_1\times l_2}$ where

$$\begin{aligned} (\nabla _h x)_{i,j}= {\left\{ \begin{array}{ll} x_{i+1,j}-x_{i,j},&{} i\ne l_1 \\ x_{1,j}-x_{l_1,j},&{} i= l_1 \end{array}\right. } ,\quad (\nabla _v x)_{i,j}= {\left\{ \begin{array}{ll} x_{i,j+1}-x_{i,j},&{} j\ne l_2 \\ x_{i,1}-x_{i,l_2},&{} j= l_2 \end{array}\right. }. \end{aligned}$$

The vector $\mathbf{x }^0\in {\mathbb {R}}^l$ is an observed image corrupted by a zero-mean uniform noise; $H\in {\mathbb {R}}^{l \times l}$ is the matrix representation of a blurry operator satisfying $\Vert HH^{\text {T}}\Vert =1$, $\sigma $ is a parameter indicating the uniform noise level and $\Vert \mathbf{x }\Vert _{\infty }:=\max _{1\le i \le l}\Vert \mathbf{x }_i\Vert $.

Let

$$\begin{aligned} A: = \left( \begin{array}{l} H \\ -H \end{array}\right) , \quad b: = \left( \begin{array}{l} \mathbf{x }^0-\sigma \mathbf{e } \\ -\mathbf{x }^0-\sigma \mathbf{e } \end{array}\right) , \end{aligned}$$

where $\mathbf{e } = (1, 1,\cdots , 1)^{\text {T}} \in {\mathbb {R}}^l$. Model (3.8) can be reformulated to

$$\begin{aligned} \min \Big \{\Vert \nabla \mathbf{x }\Vert _1 ~\Big |~A\mathbf{x }\ge b \Big \}. \end{aligned}$$

(3.9)

We also compare the gCPPA to the CPPA and rCPPA methods on 256-by-256 images of Peppers.png and Boat.png. The clean images are degraded by either the Gaussian (fspecial(gaussian,9,2.5)”) or the out-of-focus (fspecial(disk,3)”) convolution.

Let $\mathbf{z }$ be the Lagrange multiplier for the linear inequality constraint $A\mathbf{x }\ge b$. The gCPPA produces the new iterate via:

$$\begin{aligned} \left\{ \begin{array}{l} \mathbf{z }^{k+1}=P_{R_+}\Big \{ \mathbf{z }^k-\frac{\alpha }{t}(A{\mathbf{x }}^{k}-b) \Big \}, \\ \mathbf{x }^{k+1} = \arg \min \left\{ \Vert \nabla \mathbf{x }\Vert _1 + \frac{r}{2}\big \Vert \mathbf{x }-\mathbf{x }^k-\frac{1}{r}A^{\text {T}} \big [(\alpha +1)\mathbf{z }^{k+1}-\alpha \mathbf{z }^k\big ] \big \Vert ^2 ~\big |~ \mathbf{x }\in X\right\} . \end{array}\right. \end{aligned}$$

The CPPA method also owns above iteration with $\alpha =1$, and the rCPPA can be written in the form of (1.7) and (1.8). We take $R = 0.60$, $r_0 = R$, $t_0 = 1.02/R=1.70$ and $\gamma = 1.80$ for the CPPA and rCPPA method, and take $\alpha _r=7.00$, $\alpha _t=0.04$, $\alpha =\sqrt{0.28}\approx 0.53$, $r = \alpha _r R=4.20$, $t = 1.02\alpha _t/R\approx 0.07$ for the gCPPA. The initial points for all methods are taken as zeros and the stopping criterion is set to

$$\begin{aligned} Tol=\frac{\Vert \mathbf{x }^{k+1}-\mathbf{x }^k\Vert }{\Vert \mathbf{x }^k\Vert }<10^{-6}. \end{aligned}$$

(3.10)

Set $\sigma =0.5$. For easy comparisons, the original images, degraded images and the restored images are displayed in Fig. 5.

The evolutions of SNR with respect to iterations and computing time are displayed in Fig. 6. It shows that both the CPPA and gCPPA methods reach the higher SNR in a short time. However, the CPPA converges to the solutions worse than that of the rCPPA, while the gCPPA converges to better solutions than the rCPPA in TV uniform noise removal problem.

The numerical results on more experiments with different $\sigma $ are put into Table 3 for comparisons.

Table 3 Results of the CPPA, rCPPA and gCPPA with different $\sigma $

Full size table

In Table 3, each set of $\cdot \setminus \cdot \setminus \cdot $ represents the number of iterations, the computing time in seconds and the restored SNR when the stopping criterion (3.10) is met. From Table 3, one can conclude that: the rCPPA has better performance comparing with the CPPA in SNR, but worse in the number of iterations and cpu-time, while the gCPPA has better performance comparing with the CPPA in all terms including SNR, the number of iterations and cpu-time.

4 Conclusions

In this paper, we propose two customized proximal point algorithm by modifying the prediction step involving no relaxation step. We have proposed two new CPPAs (i.e., gCPPA and eCPPA), both do not involve any relaxation step but still keep the same convergence properties as the CPPA within some relaxation steps. Numerical results have demonstrated that the proposed methods are more effective than some state-of-the-art methods.

Notes

The mapping F(w) is affine with a skew-symmetric matrix, and thus it is monotone; see He and Yuan [23]

References

Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 93, 273–299 (1965)
Article MATH Google Scholar
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Article MathSciNet MATH Google Scholar
Bertsekas, D.P., Tseng, P.: Partial proximal minimization algorithms for convex programming. SIAM J. Optim. 4(3), 551–572 (1993)
Article MathSciNet MATH Google Scholar
Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Glob. Optim. 13(4), 389–406 (1998)
Article MathSciNet MATH Google Scholar
Martinet, B.: Brève communication. Regularisation, d’iné quations variationelles par approximations succesives. ESAIM-Math. Model. Numer. Anal.-Model. Math. Anal. Numer 4(R3), 154–158 (1970)
Google Scholar
Kassay, G.: The proximal points algorithm for reflexive Banach spaces. Stud. Math. 30, 9–17 (1985)
MathSciNet MATH Google Scholar
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 646–664 (1992)
Article MathSciNet MATH Google Scholar
Spingarn, J.E.: Partial inverse of a monotone operator. Appl. Math. Optim. 10(10), 247–265 (1983)
Article MathSciNet MATH Google Scholar
Han, S.P.: A decomposition method and its application to convex programming. Math. Oper. Res. 14(2), 237–248 (1989)
Article MathSciNet MATH Google Scholar
Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM. Rep. 34, 1–29 (2008)
Google Scholar
He, B.S.: PPA-like contraction methods for convex optimization: a framework using variational inequality approach. J. Oper. Res. Soc. China 3(4), 391–420 (2015)
Article MathSciNet MATH Google Scholar
He, B.S., Fu, X.L., Jiang, Z.K.: Proximal point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009)
Article MathSciNet MATH Google Scholar
He, B.S., Yuan, X.M., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559–572 (2013)
Article MathSciNet MATH Google Scholar
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(4), 303–320 (1969)
Article MathSciNet MATH Google Scholar
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. Optimization 5(6), 283–298 (1969)
MathSciNet Google Scholar
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Article MathSciNet MATH Google Scholar
He, B.S.: Parallel splitting augmented Lagrangian methods for monotone structured variational inequalities. Comput. Optim. Appl. 42(2), 195–212 (2009)
Article MathSciNet MATH Google Scholar
Glowinski, R., Marrocco, A.: Analyse numerique du champ magnetique d’un alternateur par elements finis et sur-relaxation ponctuelle non lineaire. Comput. Meth. Appl. Mech. Eng. 3(1), 55–85 (1974)
Article MATH Google Scholar
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Article MATH Google Scholar
Gabay, D.: Chapter IX: applications of the method of multipliers to variational inequalities. Stud. Math. Appl. 15, 299–331 (1983)
Google Scholar
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Article MathSciNet MATH Google Scholar
Cai, X.J., Gu, G.Y., He, B.S., Yuan, X.M.: A proximal point algorithm revisit on the alternating direction method of multipliers. Sci. China Math. 56(10), 2179–2186 (2013)
Article MathSciNet MATH Google Scholar
He, B.S., Yuan, X.M.: On the $O(1/n)$ convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Article MathSciNet MATH Google Scholar
He, B.S., Yuan, X.M.: On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2014)
Article MathSciNet MATH Google Scholar
Sarkis, J.: A strategic decision framework for green supply chain management. J. Clean. Prod. 11(4), 397–409 (2003)
Article Google Scholar
Wu, Z., Pagell, M.: Balancing priorities: Decision-making in sustainable supply chain management. J. Oper. Manag. 29(6), 577–590 (2011)
Article Google Scholar
Yazdani, M., Zarate, P., Coulibaly, A., Zavadskas, E.K.: A group decision making support system in logistics and supply chain management. Expert Syst. Appl. 88, 376–392 (2017)
Article Google Scholar
Gu, G., He, B.S., Yuan, X.M.: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59, 135–161 (2014)
Article MathSciNet MATH Google Scholar
Cai, J.F., Candès, Emmanuel J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2008)
Article MathSciNet MATH Google Scholar
Tao, M., Yuan, X.M., He, B.S.: Solving a class of matrix minimization problems by linear variational inequality approaches. Linear Alg. Appl. 434(11), 2343–2352 (2011)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

The authors are very grateful to Doctor Wen-Xing Zhang at University of Electronic Science and Technology of China for his help on numerical experiments on image processing problems. This work was also supported by the Science and Technology on Communication Information Security Control Laboratory, Xiangtan University.

Author information

Authors and Affiliations

School of Economics and Management, Fuzhou University, Fuzhou, 350108, China
Binqian Jiang
School of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China
Zheng Peng & Kangkang Deng

Authors

Binqian Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Zheng Peng
View author publications
You can also search for this author in PubMed Google Scholar
Kangkang Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zheng Peng.

Additional information

Communicated by Majid Soleimani-damaneh.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Natural Science Foundation of China with Grants 11571074 and 11726505.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jiang, B., Peng, Z. & Deng, K. Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization. Bull. Iran. Math. Soc. 46, 865–892 (2020). https://doi.org/10.1007/s41980-019-00298-0

Download citation

Received: 04 May 2018
Revised: 31 August 2019
Accepted: 14 September 2019
Published: 23 October 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s41980-019-00298-0

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Two New Customized Proximal Point Algorithms Without Relaxation for Linearly Constrained Convex Optimization

Abstract

Similar content being viewed by others

A class of customized proximal point algorithms for linearly constrained convex optimization

On relaxation of some customized proximal point algorithms for convex minimization: from variational inequality perspective

A parameterized proximal point algorithm for separable convex optimization

1 Introduction

1.1 Proximal Point Algorithm

1.2 Proximal Point Algorithm for Separable Problem

1.3 Contributions

2 Two New Customized Proximal Point Algorithms

2.1 The Generalized Customized Proximal Point Algorithm

Definition 2.1

Lemma 2.3

Proof

Lemma 2.4

Proof

Theorem 2.5

Proof

Theorem 2.6

Proof

2.2 Extended gCPPA to Separable Convex Minimization Problems

Lemma 2.8

Proof

Lemma 2.9

Proof

Theorem 2.10

Proof

Theorem 2.11

Proof

3 Numerical Results

Example 3.1

Example 3.2

Example 3.3

Example 3.4

4 Conclusions

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation