1 Introduction

1.1 Proximal Point Algorithm

The proximal point algorithm (PPA for short) dates back to Moreau [1] and was introduced to the optimization community by Martinet [5]; it was further developed by Rockafellar [2], Bertsekas and Tseng [3], and Kaplan and Tichatschke [4], among others. The PPA possesses robust convergence for very general problems in finite and infinite dimensions (Kassay [6] and Glüer [7]), and it is also the basis of splitting methods (Spingarn [8] and Han [9]).

Consider a convex minimization problem with linear constraints of the form

$$\begin{aligned} \min \{\theta (x)| Ax=b, x\in {\mathcal {X}}\}, \end{aligned}$$
(1.1)

where \({\mathcal {X}}\subset {\mathbb {R}}^{n}\) is a bounded closed convex and nonempty set, \(\theta :{\mathcal {X}}\rightarrow {\mathbb {R}}\) is a convex (not necessarily smooth) function, \(A\in {\mathbb {R}}^{m\times n}\), \(b\in {\mathbb {R}}^{m}\). The solution set of (1.1), denoted by \({\mathcal {X}}^*\), is assumed to be nonempty.

For solving problem (1.1), the primal-dual hybrid gradient (PDHG) algorithm proposed by Zhu and Chan [10] has the following iterative scheme:

$$\begin{aligned} \left\{ \begin{array}{lll} x^{k+1}=\arg \min \nolimits _{x\in {\mathcal {X}}}\big \{\theta (x) +\frac{r}{2}\big \Vert (x-x^k) -\frac{1}{r}A^{\text {T}}{\lambda }^{k}\big \Vert ^2\big \},\\ \lambda ^{k+1}=\lambda ^k-\frac{1}{t}(Ax^{k+1}-b), \end{array}\right. \end{aligned}$$
(1.2)

where \(\lambda \in {\mathbb {R}}^m\) is the Lagrange multiplier vector and \(r,t>0\) are iteration parameters. Let \(w:=(x,\lambda )\) and \({\mathcal {W}}:= {\mathcal {X}}\times {\mathbb {R}}^m\). Then, by the first-order optimality conditions of iteration (1.2), for \(w^{k+1}\in {\mathcal {W}}\) we have that

$$\begin{aligned} \theta (x)-\theta (x^{k+1})+(w-w^{k+1})^{\text {T}} \Big [ F(w^{k+1})+P(w^{k+1}-w^k)\Big ] \ge 0,~~\forall ~ w\in {\mathcal {W}}, \end{aligned}$$
(1.3)

where

$$\begin{aligned} F(w)=\left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) ,\quad P=\left( \begin{array}{cc}rI_n&{} A^{\text {T}} \\ 0_{m\times n} &{} t I_m \\ \end{array}\right) , \end{aligned}$$

and \(I_n\) denotes the \(n\times n\) identity matrix, \(0_{m\times n}\) the \(m\times n\) zero matrix. However, He [11] showed that the PDHG algorithm is not necessarily convergent and gave a counterexample; see Sect. 3 of [11].
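To make the scheme concrete, the following minimal Python/NumPy sketch performs one PDHG sweep (1.2); the routine `prox_theta_X(v, r)`, which returns \(\arg \min \nolimits _{x\in {\mathcal {X}}}\{\theta (x)+\frac{r}{2}\Vert x-v\Vert ^2\}\), is a hypothetical user-supplied solver, since \(\theta \) and \({\mathcal {X}}\) are problem dependent.

```python
import numpy as np

def pdhg_step(x, lam, A, b, r, t, prox_theta_X):
    """One sweep of the PDHG scheme (1.2).

    prox_theta_X(v, r) is assumed to return
        argmin_{x in X} { theta(x) + (r/2)||x - v||^2 }.
    """
    # x-update: proximal step at the shifted point x^k + (1/r) A^T lambda^k
    x_new = prox_theta_X(x + A.T @ lam / r, r)
    # lambda-update: dual step on the residual of the new primal iterate
    lam_new = lam - (A @ x_new - b) / t
    return x_new, lam_new
```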

He, Fu and Jiang [12] proposed a PPA using a linear proximal term (LPPA). For solving (1.1), the LPPA (Procedure 4.1 in [12]) first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)\) via

$$\begin{aligned} \left\{ \begin{array}{l} {\tilde{x}}^{k}=\arg \min \nolimits _{x\in {\mathcal {X}}}\bigg \{\theta (x) +\frac{1}{2}\big \Vert A(x-x^k)-\lambda ^{k}\big \Vert ^2\bigg \}, \\ {\tilde{\lambda }}^{k}=\lambda ^k-(A{\tilde{x}}^{k}-b). \end{array}\right. \end{aligned}$$
(1.4)

By the first-order optimality conditions of (1.4), for \({\tilde{w}}^k\in {\mathcal {W}}\), we have

$$\begin{aligned} \theta (x)-\theta ({\tilde{x}}^k)+(w-{\tilde{w}}^k)^{\text {T}} \Big [ F({\tilde{w}}^k)+P_1({\tilde{w}}^k-w^k)\Big ] \ge 0,~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$
(1.5)

where

$$\begin{aligned} F(w)=\left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) ,\quad P_1 =\left( \begin{matrix} A^{\text {T}}A &{} A^{\text {T}} \\ -A &{} I_m \end{matrix}\right) . \end{aligned}$$
(1.6)

To ensure the convergence, the new iterate \(w^{k+1}=(x^{k+1},\lambda ^{k+1})\) of LPPA is generated by a simple relaxation step

$$\begin{aligned} w^{k+1}=w^k-\rho _k P_1(w^k-{\tilde{w}}^k), \end{aligned}$$

where \(\rho _k=\tau \rho _k^*\) and

$$\begin{aligned} \rho _k^*=\frac{(w^k-{\tilde{w}}^k)^{\text {T}}P_1(w^k-{\tilde{w}}^k)}{\Vert P_1(w^k-{\tilde{w}}^k)\Vert ^2},~ \tau \in [1,2). \end{aligned}$$
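For illustration, a minimal sketch of this correction step is given below, assuming that \(w^k\) and \({\tilde{w}}^k\) are stored as stacked vectors \((x;\lambda )\); the function name `lppa_correction` is ours.

```python
import numpy as np

def lppa_correction(w, w_tilde, A, tau=1.5):
    """Relaxation step of the LPPA: w^{k+1} = w^k - rho_k P_1 (w^k - w_tilde^k)."""
    m = A.shape[0]
    P1 = np.block([[A.T @ A, A.T],               # P_1 as in (1.6)
                   [-A,      np.eye(m)]])
    d = w - w_tilde                              # w^k - \tilde{w}^k
    P1d = P1 @ d
    rho_star = (d @ P1d) / (P1d @ P1d)           # optimal steplength rho_k^*
    return w - tau * rho_star * P1d              # rho_k = tau * rho_k^*, tau in [1, 2)
```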

He, Yuan and Zhang [13] proposed a customized proximal point algorithm (CPPA) by using matrix \(P_2\) instead of matrix \(P_1\) in (1.5), where

$$\begin{aligned} P_2 =\left( \begin{matrix} rI_n &{} -A^{\text {T}} \\ -A &{} tI_m \end{matrix}\right) . \end{aligned}$$

To guarantee convergence, the matrix \(P_2\) should be positive definite. As a consequence, it requires \(rt>\rho (A^{\text {T}}A)\), where \(\rho (A^{\text {T}}A)\) is the spectral radius of matrix \(A^{\text {T}}A\). The corresponding iterative scheme of CPPA first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{\lambda }}^k)\) via

$$\begin{aligned} \left\{ \begin{array}{l} {\tilde{\lambda }}^{k}=\lambda ^k-\frac{1}{t}(A{x}^{k}-b), \\ {\tilde{x}}^{k}=\arg \min \nolimits _{x\in {\mathcal {X}}} \big \{\theta (x)+\frac{r}{2}\big \Vert (x-x^k)-\frac{1}{r}A^{\text {T}}(2{\tilde{\lambda }}^{k} -{\lambda }^{k}) \big \Vert ^2\big \}, \end{array}\right. \end{aligned}$$
(1.7)

and then takes a simple relaxation step

$$\begin{aligned} w^{k+1}=w^k-\gamma (w^k-{\tilde{w}}^{k}), \end{aligned}$$
(1.8)

where \(\gamma \in (0, 2)\) is a relaxation factor. Various numerical experiments show that \(\gamma >1\) yields faster convergence. To distinguish them in this paper, the CPPA (1.7)-(1.8) with \(\gamma =1\) is called the original-CPPA, or CPPA for short, while the CPPA with \(\gamma >1\) is called the relax-CPPA, or rCPPA for short.
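A minimal sketch of one CPPA/rCPPA sweep, combining the predictor (1.7) with the relaxation step (1.8), reads as follows; `prox_theta_X(v, r)` is the same hypothetical subproblem solver assumed in the PDHG sketch above.

```python
import numpy as np

def cppa_step(x, lam, A, b, r, t, prox_theta_X, gamma=1.0):
    """Predictor (1.7) followed by relaxation (1.8); gamma = 1 gives the
    original CPPA, gamma in (1, 2) the relaxed rCPPA.

    prox_theta_X(v, r) returns argmin_{x in X} { theta(x) + (r/2)||x - v||^2 }.
    """
    lam_tilde = lam - (A @ x - b) / t                               # dual predictor
    x_tilde = prox_theta_X(x + A.T @ (2 * lam_tilde - lam) / r, r)  # primal predictor
    # relaxation step (1.8)
    return x - gamma * (x - x_tilde), lam - gamma * (lam - lam_tilde)
```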

The augmented Lagrangian method of multipliers (ALM) is another efficient method for convex optimization; see Hestenes [14] and Powell [15]. Let \(\beta >0\) be the penalty parameter of the ALM. The iterative scheme of the ALM for problem (1.1) is

$$\begin{aligned} \left\{ \begin{array}{l} {x}^{k+1} = \arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta (x)+\frac{\beta }{2}\big \Vert Ax-b -\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \}, \\ {\lambda }^{k+1} = \lambda ^k-\beta (A{x}^{k+1}-b). \end{array}\right. \end{aligned}$$
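A sketch of one ALM sweep follows; since the augmented-Lagrangian subproblem has no closed form in general, a hypothetical solver `solve_x_subproblem` is assumed.

```python
import numpy as np

def alm_step(lam, A, b, beta, solve_x_subproblem):
    """One ALM sweep for problem (1.1).

    solve_x_subproblem(lam, beta) is assumed to return
        argmin_{x in X} { theta(x) + (beta/2)||Ax - b - lam/beta||^2 }.
    """
    x_new = solve_x_subproblem(lam, beta)      # primal subproblem
    lam_new = lam - beta * (A @ x_new - b)     # multiplier update
    return x_new, lam_new
```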

Rockafellar [16] showed that the ALM is exactly an application of the PPA to the dual problem of (1.1), and it satisfies the compact inequality (1.3) with P replaced by

$$\begin{aligned} P_3=\left( \begin{matrix}0_{n\times n} &{}0_{n\times m}\\ 0_{m\times n}&{} \tfrac{1}{\beta }I_m\end{matrix}\right) . \end{aligned}$$

However, He, Yuan and Zhang [13] showed that the CPPA is more efficient than the ALM when the objective function \(\theta (x)\) is simple in the sense that its resolvent operator, defined by \((I+\tfrac{1}{r}\partial \theta )^{-1}\), has a closed-form representation, where \(\partial \theta \) is the subdifferential of \(\theta \).

1.2 Proximal Point Algorithm for Separable Problem

Consider a separable convex optimization problem of the form

$$\begin{aligned} \min \Big \{\theta _1(x)+\theta _2(y)\Big | Ax+By=b, x\in {\mathcal {X}}, y\in {\mathcal {Y}}\Big \}, \end{aligned}$$
(1.9)

where \({\mathcal {X}}\subset {\mathbb {R}}^{n_1}\) and \({\mathcal {Y}}\subset {\mathbb {R}}^{n_2}\) are bounded closed and convex nonempty sets, \(\theta _1:{\mathcal {X}}\rightarrow {\mathbb {R}}\) and \(\theta _2:{\mathcal {Y}}\rightarrow {\mathbb {R}}\) are convex (not necessarily smooth) functions, \(A\in {\mathbb {R}}^{m\times n_1}\), \(B\in {\mathbb {R}}^{m\times n_2}\) and \(b\in {\mathbb {R}}^{m}\). The solution set of (1.9), denoted by \({\mathcal {X}}^*\times {\mathcal {Y}}^*\), is assumed to be nonempty.

The augmented Lagrangian-based methods, especially splitting variants such as the alternating direction method of multipliers (ADMM for short) and the parallel splitting algorithm [17], have been verified to be very efficient for problem (1.9). The iterative scheme of the ALM for problem (1.9) is

$$\begin{aligned} \left\{ \begin{array}{lll} \lambda ^{k+1} = \lambda ^k-\beta (Ax^k+By^k-b),\\ (x^{k+1}, y^{k+1}) = \arg \min \nolimits _{x\in {\mathcal {X}}, y\in {\mathcal {Y}}} \Big \{\theta _1(x)+\theta _2(y)+\frac{\beta }{2}\big \Vert Ax+By-b -\frac{1}{\beta }\lambda ^{k+1} \big \Vert ^2\Big \}. \end{array}\right. \end{aligned}$$
(1.10)

In the second subproblem of scheme (1.10), the coupled variable \((x,y)\) makes it intractable. As a splitting version of the ALM, the ADMM, originally proposed by Glowinski [18] and Gabay [19], is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} x^{k+1} = \arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta _1(x)+\frac{\beta }{2}\big \Vert Ax+By^k-b-\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \},\\ y^{k+1} = \arg \min \nolimits _{y\in {\mathcal {Y}}} \Big \{\theta _2(y)+\frac{\beta }{2}\big \Vert Ax^{k+1}+By-b-\frac{1}{\beta }\lambda ^{k} \big \Vert ^2\Big \},\\ \lambda ^{k+1} = \lambda ^k-\beta (Ax^{k+1}+By^{k+1}-b). \\ \end{array}\right. \end{aligned}$$
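A minimal sketch of one ADMM sweep is given below; `prox1` and `prox2`, which solve the x- and y-subproblems, are hypothetical user-supplied routines.

```python
import numpy as np

def admm_step(y, lam, A, B, b, beta, prox1, prox2):
    """One ADMM sweep for the separable problem (1.9).

    prox1(y, lam) = argmin_{x in X} { theta_1(x) + (beta/2)||Ax + By - b - lam/beta||^2 },
    prox2(x, lam) = argmin_{y in Y} { theta_2(y) + (beta/2)||Ax + By - b - lam/beta||^2 }.
    """
    x_new = prox1(y, lam)                                  # x-subproblem with y^k fixed
    y_new = prox2(x_new, lam)                              # y-subproblem with x^{k+1} fixed
    lam_new = lam - beta * (A @ x_new + B @ y_new - b)     # multiplier update
    return x_new, y_new, lam_new
```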

The ADMM overcomes the drawback of the ALM stated above by decoupling the variables x and y. Gabay [20] showed that the ADMM is essentially an application of the Douglas–Rachford splitting method [21]. Cai, Gu and He [22] provided a novel and simple PPA perspective to understand the ADMM, and proposed a generalized ADMM (gADMM) which first produces a predictor \({\tilde{w}}^k = ({\tilde{x}}^k, {\tilde{y}}^k, {\tilde{\lambda }}^k)\) via

$$\begin{aligned} \left\{ \begin{array}{lll} {\tilde{x}}^k=\arg \min \nolimits _{x\in {\mathcal {X}}} \Big \{\theta _1(x)+\frac{\beta }{2}\big \Vert Ax+By^k-b-\frac{1}{\beta }\lambda ^k \big \Vert ^2\Big \},\\ {\tilde{\lambda }}^{k}=\lambda ^k-\beta (A{\tilde{x}}^k+By^{k}-b),\\ {\tilde{y}}^{k}=\arg \min \nolimits _{y\in {\mathcal {Y}}} \Big \{\theta _2(y)+\frac{\beta }{2}\big \Vert A{\tilde{x}}^k+By-b -\frac{1}{\beta }{\tilde{\lambda }}^k \big \Vert ^2\Big \}. \end{array}\right. \end{aligned}$$
(1.11)

Let \(v=(y,\lambda )\) and \(w:=(x,v)\in {\mathcal {W}}={\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m\), and then the new iterate \(w^{k+1}=(x^{k+1},v^{k+1})\) of the gADMM is generated by a relaxation step:

$$\begin{aligned} x^{k+1}={\tilde{x}}^k,\quad \quad v^{k+1}=v^k-\gamma (v^k-{\tilde{v}}^{k}),\quad \gamma \in (0,2). \end{aligned}$$
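In code, one gADMM sweep may be sketched as follows, with `prox1` and `prox2` the same hypothetical subproblem solvers assumed in the ADMM sketch above; note that only \(v=(y,\lambda )\) is relaxed.

```python
import numpy as np

def gadmm_step(y, lam, A, B, b, beta, prox1, prox2, gamma=1.5):
    """One gADMM sweep: predictor (1.11), then relaxation of v = (y, lambda)."""
    x_t = prox1(y, lam)                              # \tilde{x}^k
    lam_t = lam - beta * (A @ x_t + B @ y - b)       # \tilde{lambda}^k
    y_t = prox2(x_t, lam_t)                          # \tilde{y}^k
    # x^{k+1} is taken directly from the predictor; only v = (y, lambda) is relaxed
    return x_t, y - gamma * (y - y_t), lam - gamma * (lam - lam_t)
```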

The iterative scheme (1.11) can be written in the compact form: find \({\tilde{w}}^k \in {\mathcal {W}}\) such that

$$\begin{aligned} \theta (u)-\theta ({\tilde{u}}^{k})+{{(w-{\tilde{w}}^{k})}^{\text {T}}} \Big \{F({\tilde{w}}^{k})+P_4({\tilde{w}}^{k}-w^{k})\Big \}\ge 0,~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$
(1.12)

where \(u=(x,y)\), \(w=(u, \lambda ) \in {\mathcal {W}}\), \(\theta (u)=\theta _1(x)+\theta _2(y)\), and

$$\begin{aligned} u=\left( \begin{matrix}x\\ y\\ \end{matrix}\right) ,\quad w=\left( \begin{matrix}x\\ y\\ \lambda \\ \end{matrix}\right) ,\quad F(w)=\left( \begin{matrix}-{{A}^{\text {T}}}\lambda \\ -{{B}^{\text {T}}}\lambda \\ Ax+By-b\\ \end{matrix} \right) , \end{aligned}$$

and the customized matrix \(P_4\) is as follows:

$$\begin{aligned} P_4=\left( \begin{matrix}0_{n_1\times n_1}&{}0_{n_1\times n_2}&{}0_{n_1\times m}\\ 0_{n_2\times n_1}&{}\beta B^{\text {T}}B&{} -B^{\text {T}}\\ 0_{m\times n_1}&{}-B&{} \tfrac{1}{\beta }I_m\end{matrix}\right) . \end{aligned}$$

The special structure of matrix \(P_4\) implies that the variable x need not be involved in the correction (relaxation) step of the gADMM. The following matrix

$$\begin{aligned} {\bar{P}}_4=\left( \begin{matrix} \beta B^{\text {T}}B&{} -B^{\text {T}}\\ -B&{}\tfrac{1}{\beta }I_m\end{matrix}\right) , \end{aligned}$$

which is obtained by removing the zero row and column blocks of \(P_4\), is required to be symmetric and positive semidefinite to ensure the convergence of the gADMM. The gADMM was verified to be faster than the ADMM because some approximate computation is permitted.

Let \({P}'_4\in {\mathbb {R}}^{(n_1+n_2+m)\times (n_1+n_2+m)}\) be a matrix such that \({P}'_4(w^k-{\tilde{w}}^{k})\) is a descent direction. He and Yuan [23, 24] presented a general relaxation step as follows:

$$\begin{aligned} w^{k+1}=w^k-\gamma {P}'_4(w^k-{\tilde{w}}^{k}), \end{aligned}$$

where \(\gamma \) is the steplength. Let \(H ={\bar{P}}_4 ({P'_4})^{-1}\) and \(G=({{\bar{P}}_4})^{\text {T}}+{\bar{P}}_4-\gamma ({P'_4})^{\text {T}} H({P'_4})\). One can relax the conventional requirement on \({\bar{P}}_4\), provided that G is kept symmetric and positive semidefinite. In [23, 24], He and Yuan established the worst-case \({\mathcal {O}}(1/k)\) convergence rate of the Douglas–Rachford alternating direction method of multipliers in the ergodic and non-ergodic senses, respectively.

1.3 Contributions

From the above analysis, there exist many PPAs for problems (1.1) and (1.9).

Relaxation steps are used very effectively in PPAs. For example, the relaxation step of the CPPA accelerates the algorithm for problem (1.1), while the relaxation steps of the LPPA and the gADMM ensure convergence for problems (1.1) and (1.9), respectively. However, a relaxation step might be unacceptable, or even not permitted, in some practical applications, such as real-time control systems, decision-making in flow-line production, supply chain management and decision-making, and so on. For instance, in the decision-making framework for supply chain management [25,26,27], each firm on the chain makes its decision independently in a proper sequence, and the decision is executed immediately once it is made. Hence, the firms cannot adjust their decisions at the end, and correspondingly a relaxation step in the simulation method has no practical meaning.

To adapt to these practical circumstances, we prefer to modify the prediction step rather than to take a relaxation step, and we propose two novel customized proximal point algorithms without a relaxation step. The proposed methods are generalized versions of the CPPA proposed in [13, 28] for convex and separable convex problems, respectively. The global convergence and the \({\mathcal {O}}(1/k)\) convergence rate of the proposed methods are established under some mild assumptions. We also verify that the new methods have better numerical performance in practice compared with some state-of-the-art methods that use a relaxation step.

The rest of the paper is organized as follows. In Sect. 2, we propose two new customized proximal point algorithms without a relaxation step for problems (1.1) and (1.9), respectively, and prove their convergence. In Sect. 3, some preliminary numerical results, in comparison with state-of-the-art methods, are presented to show the high efficiency of the new methods. Section 4 concludes the paper with some final remarks.

2 Two New Customized Proximal Point Algorithms

2.1 The Generalized Customized Proximal Point Algorithm

The Lagrangian function of problem (1.1) is

$$\begin{aligned} L(x,\lambda )=\theta (x)-\lambda ^{\text {T}}(Ax-b). \end{aligned}$$
(2.1)

If \(x^*\in {\mathcal {X}}\) is a solution of problem (1.1), then there exists \(\lambda ^*\in {\mathbb {R}}^m\) such that the pair \((x^*,\lambda ^*)\) is a saddle point of (2.1) satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} L(x,\lambda ^*)- L(x^*,\lambda ^*)\ge 0,&{}\quad \forall ~x\in {\mathcal {X}}, \\ L(x^*,\lambda ^*)- L(x^*,\lambda )\ge 0,&{}\quad \forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$

Consequently,

$$\begin{aligned} x^* = \arg \min \limits _{x\in {\mathcal {X}}} L(x,\lambda ^*),\quad \lambda ^* = \arg \max \limits _{\lambda \in {\mathbb {R}}^m} L(x^*,\lambda ). \end{aligned}$$
(2.2)

By Lemma 2.1 in [11], the first-order optimality conditions of problem (2.2) are as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} \theta (x)-\theta (x^*) -(x-x^*)^{\text {T}} (A^{\text {T}} \lambda ^*)\ge 0, &{}~~\forall ~x\in {\mathcal {X}}, \\ (\lambda -\lambda ^*)^{\text {T}} (Ax^*-b) \ge 0, &{}~~\forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$

Definition 2.1

The solution set of (1.1), denoted by \({\mathcal {W}}^*\subset {\mathcal {W}}\), consists of all \(w^*\in {\mathcal {W}}\) satisfying

$$\begin{aligned} \theta (u)-\theta (u^*) +(w-w^*)^{\text {T}} F(w^*)\ge 0, ~~\forall ~w\in {\mathcal {W}}, \end{aligned}$$
(2.3)

where

$$\begin{aligned} u=x,~~ w=\left( \begin{matrix}x\\ \lambda \end{matrix}\right) ,~~ F (w) = \left( \begin{array}{c}-A^{\text {T}}\lambda \\ Ax-b\end{array}\right) . \end{aligned}$$
(2.4)

The generalized customized proximal point algorithm for problem (1.1) is proposed as follows.

Algorithm 2.2 (gCPPA)

Let \(r,t>0\) and \(\alpha >0\) be given parameters and \(w^0=(x^0,\lambda ^0)\in {\mathcal {W}}\) an initial point. For \(k=0,1,2,\ldots \), generate the new iterate \(w^{k+1}=(x^{k+1},\lambda ^{k+1})\) via

$$\begin{aligned} \left\{ \begin{array}{l} \lambda ^{k+1}=\lambda ^k-\frac{\alpha }{t}(Ax^{k}-b), \\ x^{k+1}=\arg \min \nolimits _{x\in {\mathcal {X}}}\big \{\theta (x) +\frac{r}{2}\big \Vert (x-x^k)-\frac{1}{r}A^{\text {T}}\big [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k\big ]\big \Vert ^2\big \}. \end{array}\right. \end{aligned}$$
(2.5)
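A minimal Python/NumPy sketch of one gCPPA sweep (2.5) follows; as before, `prox_theta_X(v, r)` is a hypothetical routine returning \(\arg \min \nolimits _{x\in {\mathcal {X}}}\{\theta (x)+\frac{r}{2}\Vert x-v\Vert ^2\}\).

```python
import numpy as np

def gcppa_step(x, lam, A, b, r, t, alpha, prox_theta_X):
    """One gCPPA sweep, i.e., scheme (2.5); no relaxation step is taken.

    prox_theta_X(v, r) returns argmin_{x in X} { theta(x) + (r/2)||x - v||^2 }.
    """
    lam_new = lam - alpha / t * (A @ x - b)                    # multiplier update
    shift = A.T @ ((1 + alpha) * lam_new - alpha * lam) / r    # extrapolated multiplier
    x_new = prox_theta_X(x + shift, r)                         # primal proximal step
    return x_new, lam_new
```

Setting \(\alpha =1\) recovers the CPPA predictor (1.7) with \(\gamma =1\) in (1.8).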

By the first-order optimality conditions of iteration subproblem (2.5), we have

$$\begin{aligned} \left\{ \begin{array}{l} \theta (x)-\theta (x^{k+1})+(x-x^{k+1})^{\text {T}}\Big \{-A^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+r(x^{k+1}-x^k)\Big \}\ge 0, \\ (\lambda -\lambda ^{k+1})^{\text {T}}~\Big \{\alpha (Ax^k-b) +t(\lambda ^{k+1}-\lambda ^k)\Big \}= 0, \end{array}\right. \end{aligned}$$

which implies that \(\forall ~w\in {\mathcal {W}}\) the following variational inequality holds:

$$\begin{aligned}&\theta (u)-\theta (u^{k+1}) +(\alpha -1)(\lambda -\lambda ^{k+1})^{\text {T}} (Ax^{k+1}-b)\\&\quad +{{ \left( \begin{array}{c}x-x^{k+1}\\ \lambda -\lambda ^{k+1}\end{array}\right) }^{\text {T}}} \left[ \left( \begin{array}{c}-A^{\text {T}}\lambda ^{k+1}\\ Ax^{k+1}-b\end{array}\right) +\left( \begin{array}{cc}rI_n &{} -\alpha A^{\text {T}}\\ -\alpha A &{} tI_m\end{array}\right) \left( \begin{array}{c}x^{k+1}-x^{k}\\ \lambda ^{k+1}-\lambda ^{k}\end{array}\right) \right] \ge 0. \end{aligned}$$

In a compact form, we have

$$\begin{aligned}&\theta (u)-\theta (u^{k+1}) +~\frac{t(\alpha -1)}{\alpha }(\lambda -\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&\quad +~{{(w-w^{k+1})}^{\text {T}}} \Big [F(w^{k+1})+Q(w^{k+1}-w^{k})\Big ]\ge 0,~~\text {for any}~w\in {\mathcal {W}},\qquad \end{aligned}$$
(2.6)

where F(w) is defined by (2.4), and

$$\begin{aligned} Q = \left( \begin{array}{cc}rI_n &{} -\alpha A^{\text {T}} \\ -\alpha A &{} tI_m\end{array}\right) . \end{aligned}$$
(2.7)

Lemma 2.3

Let \(w^*\in {\mathcal {W}}^*\) be a solution defined by (2.3), and \(\{w^k\}\) be a sequence generated by the gCPPA method. Then we have

$$\begin{aligned} (w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k}) \ge \tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$
(2.8)

Proof

By setting \(u=u^*,~~w=w^*\) in (2.6), we get

$$\begin{aligned}&(w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k})\nonumber \\&\quad \ge \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^*)^{\text {T}}F (w^{k+1})\nonumber \\&\quad +\tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$
(2.9)

By the monotonicity of F(w) (the mapping F is affine with a skew-symmetric matrix and is therefore monotone), we have

$$\begin{aligned} (w^{k+1}-w^*)^{\text {T}} F(w^{k+1})\ge (w^{k+1}-w^*)^{\text {T}} F(w^{*}). \end{aligned}$$
(2.10)

Since \(w^*\in {\mathcal {W}}^*\) is a solution, it follows

$$\begin{aligned} \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^*)^{\text {T}} F(w^*)\ge 0. \end{aligned}$$
(2.11)

Adding (2.10) and (2.11) yields

$$\begin{aligned} \theta (u^{k+1})-\theta (u^*)+(w^{k+1}-w^{\text {*}})^{\text {T}}F (w^{k+1})\ge 0.\end{aligned}$$
(2.12)

Substituting (2.12) into (2.9), we obtain (2.8) and complete the proof. \(\square \)

In the current subsection, let

$$\begin{aligned} M=rI_n-\frac{\alpha ^2}{t} A^{\text {T}}A. \end{aligned}$$
(2.13)

It is easy to show that if \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), where \(\alpha \in (0.5,1)\), then Q and M are positive semidefinite. We use the notation \(\Vert w\Vert ^2_H: =w^{\text {T}} H w\ge 0\) if H is symmetric and positive semidefinite.
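This condition can be checked numerically, as in the following sketch (the variable names are ours): for a random A and parameters on the boundary \(rt=\alpha ^2\rho (A^{\text {T}}A)\), the smallest eigenvalues of Q and M are nonnegative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 5, 8, 0.6
A = rng.standard_normal((m, n))
rho = np.linalg.eigvalsh(A.T @ A).max()       # spectral radius of A^T A
r = 2.0
t = alpha**2 * rho / r                        # boundary case: r t = alpha^2 rho(A^T A)

Q = np.block([[r * np.eye(n), -alpha * A.T],
              [-alpha * A,    t * np.eye(m)]])          # Q as in (2.7)
M = r * np.eye(n) - alpha**2 / t * (A.T @ A)            # M as in (2.13)

print(np.linalg.eigvalsh(Q).min() >= -1e-8)   # True: Q is positive semidefinite
print(np.linalg.eigvalsh(M).min() >= -1e-8)   # True: M is positive semidefinite
```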

Lemma 2.4

For the sequence \(\{w^k\}\) generated by the gCPPA method we have

$$\begin{aligned}&\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\frac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\nonumber \\&\quad \le \big \Vert w^{*}-w^{k}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{k+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2\Big ]. \end{aligned}$$
(2.14)

Proof

By the identity

$$\begin{aligned} (a-b)^{\text {T}}Q(c-d)=\tfrac{1}{2}\big (\Vert a-d\Vert _{Q}^{2}-\Vert a-c\Vert _{Q}^{2}+\Vert c-b\Vert _{Q}^{2}-\Vert d-b\Vert _{Q}^{2}\big ), \end{aligned}$$

we have

$$\begin{aligned}&(w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^k)\nonumber \\&=\frac{1}{2}\Big ( \big \Vert w^*-w^k\big \Vert _Q^2- \big \Vert w^*-w^{k+1}\big \Vert _Q^2- \big \Vert w^k-w^{k+1}\big \Vert _Q^2\Big ), \end{aligned}$$
(2.15)

and

$$\begin{aligned}&(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&=\frac{1}{2}\Big ( \Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2 -\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2- \Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ). \end{aligned}$$
(2.16)

Combining (2.15), (2.16) and (2.8), we obtain

$$\begin{aligned}&\big \Vert w^k-w^{k+1}\big \Vert _Q^2 -\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2\nonumber \\&\quad \le \big \Vert w^*-w^k\big \Vert _Q^2-\big \Vert w^*-w^{k+1}\big \Vert _Q^2 +\tfrac{t(1-\alpha )}{\alpha }\Big (\big \Vert \lambda ^{*}-\lambda ^{k+1}\big \Vert ^2 -\big \Vert \lambda ^{*}-\lambda ^{k+2}\big \Vert ^2\Big ). \end{aligned}$$
(2.17)

By the definition of Q in (2.7), it follows

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2&= r\big \Vert x^k-x^{k+1}\big \Vert ^2 -2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})\nonumber \\&+t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$
(2.18)

Applying \(\lambda \)-iteration of (2.5), we get

$$\begin{aligned}&-2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2\nonumber \\&\quad =t\Big \Vert \big [\tfrac{\alpha }{t}(Ax^k-b)-\tfrac{\alpha }{t}(Ax^{k+1}-b) \big ] -(\lambda ^{k}-\lambda ^{k+1})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}\nonumber \\&\quad =t\Big \Vert \big [(\lambda ^{k}-\lambda ^{k+1})-(\lambda ^{k+1}-\lambda ^{k+2})\big ] -(\lambda ^{k}-\lambda ^{k+1})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}\nonumber \\&\quad =t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\tfrac{\alpha ^2}{t}\big \Vert x^k-x^{k+1}\big \Vert ^2_{A^{\text {T}}A}. \end{aligned}$$
(2.19)

Combining (2.18) with (2.19), it follows

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2 =\big \Vert x^k-x^{k+1}\big \Vert ^2_M +t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 . \end{aligned}$$
(2.20)

Substituting (2.20) into (2.17), we obtain (2.14) immediately and complete the proof. \(\square \)

Theorem 2.5

Suppose that \(\alpha \in (0.5,1)\), \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), and let the sequence \(\{w^k\}\) be generated by the gCPPA method. Then we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert w^{k+1}-w^k\Vert =0, \end{aligned}$$
(2.21)

and \(\{w^k\}\) globally converges to an element of solution set \({\mathcal {W}}^*\) satisfying (2.3).

Proof

Since \(\alpha \in (0.5,1)\), \(r, t>0\) and \(rt\ge \alpha ^2\rho (A^{\text {T}}A)\), it holds that

$$\begin{aligned}&\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \nonumber \\&\quad \ge \Big (r-\tfrac{\alpha ^2\rho (A^{\text {T}}A)}{t}\Big ) \Vert x^{k}-x^{k+1}\Vert ^2 +\tfrac{t(2\alpha -1)}{\alpha } \Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\ge 0, \end{aligned}$$
(2.22)

where M is defined by (2.13). Summing (2.14) over \(k=0,1,\ldots ,K\) yields

$$\begin{aligned}&\sum \limits _{k=0}^{K}\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\nonumber \\&\quad \le \big \Vert w^{*}{-}w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}{-}\lambda ^{1}\Vert ^2 {-}\Big [\big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2\Big ]. \end{aligned}$$
(2.23)

On one hand, we get from (2.22) and (2.23) that

$$\begin{aligned} \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2 \ge \big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2 \end{aligned}$$
(2.24)

holds for all \(K>1\). Since the objective function of problem (1.1) is a closed convex function, the solution set \({\mathcal {W}}^*\) is a nonempty bounded closed convex set.

On the other hand, by taking the limit as \(K\rightarrow \infty \) on both sides of (2.23), we get

$$\begin{aligned} \sum \limits _{k=0}^{\infty }\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\le d, \end{aligned}$$
(2.25)

where \(d:= \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2. \) Thus

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}_M=0, \quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k+1}}-{{\lambda }^{k+2}}\Vert ^{2}=0, \end{aligned}$$

which implies (2.21), i.e.,

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k}}-{{\lambda }^{k+1}}\Vert ^{2}=0, \end{aligned}$$
(2.26)

The sequence \(\{w^k\}\) has at least one cluster point. Let \(w^\infty \) be a cluster point of \(\{w^k\}\), i.e., there exists a subsequence \(\{w^{k_j}\}_{k_j\in {\mathcal {K}}}\) such that \(w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}\). Then, by taking the limit in (2.6) along this subsequence and applying (2.26), we have

$$\begin{aligned} \theta (u)-\theta (u^\infty ) +(w-w^\infty )^{\text {T}} F(w^\infty )\ge 0, \forall ~w\in {\mathcal {W}}, \end{aligned}$$

which implies that \(w^{\infty }\in {\mathcal {W}}^*\) is a solution satisfying (2.3) and completes the proof. \(\square \)

Theorem 2.6

If the conditions of Theorem 2.5 hold, then the sequence \(\{w^k\}\) generated by the gCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\).

Proof

Substituting \(a=w^{k-1}-w^{k}\) and \(b=w^k-w^{k+1}\) into the identity

$$\begin{aligned} \Vert a\Vert _Q^2-\Vert b\Vert _Q^2=2a^{\text {T}} Q(a-b)-\Vert a-b\Vert _Q^2, \end{aligned}$$

we obtain

$$\begin{aligned}&\big \Vert w^{k-1}-w^{k}\big \Vert _{Q}^2-\big \Vert w^k-w^{k+1}\big \Vert _{Q}^2\nonumber \\&\quad =2(w^{k-1}-w^{k})^{\text {T}}{Q}\Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad -\Big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \Vert ^2_{Q}. \end{aligned}$$
(2.27)

Letting \(w:=w^k\) in (2.6), we get

$$\begin{aligned}&\theta (u^k)-\theta (u^{k+1}) + \frac{t(\alpha -1)}{\alpha }(\lambda ^k-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2})\nonumber \\&\quad + {{(w^k-w^{k+1})}^{\text {T}}} \Big [F(w^{k+1})+Q(w^{k+1}-w^{k})\Big ]\ge 0. \end{aligned}$$
(2.28)

Since (2.6) also holds at the \((k-1)^{th}\) iteration, we have

$$\begin{aligned}&\theta (u)-\theta (u^{k}) +\frac{t(\alpha -1)}{\alpha }(\lambda -\lambda ^{k})^{\text {T}} (\lambda ^{k}-\lambda ^{k+1})\\&\quad +{{(w-w^{k})}^{\text {T}}} \Big [F(w^{k})+Q(w^{k}-w^{k-1})\Big ]\ge 0, \quad \forall ~w\in {\mathcal {W}}. \end{aligned}$$

Setting \(w:=w^{k+1}\) in the above inequality, we obtain

$$\begin{aligned}&\theta (u^{k+1})-\theta (u^{k}) +\frac{t(\alpha -1)}{\alpha }(\lambda ^{k+1}-\lambda ^{k})^{\text {T}} (\lambda ^{k}-\lambda ^{k+1})\nonumber \\&\quad +{{(w^{k+1}-w^{k})}^{\text {T}}} \Big [F(w^{k})+Q(w^{k}-w^{k-1})\Big ]\ge 0. \end{aligned}$$
(2.29)

Adding (2.28) and (2.29), it yields

$$\begin{aligned}&(w^{k}-w^{k+1})^{\text {T}}Q \Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\\&\quad \ge (w^{k+1}-w^k)^{\text {T}}\Big [F(w^{k+1})-F(w^k)\Big ]\\&\quad +\tfrac{t(1-\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$

By the monotonicity of F(w), we get from the above inequality that

$$\begin{aligned}&(w^{k}-w^{k+1})^{\text {T}}Q \Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad \ge \tfrac{t(1-\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}} \Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$
(2.30)

Adding \(\big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\big \Vert ^2_{Q}\) to both sides of (2.30), it follows

$$\begin{aligned}&(w^{k-1}-w^{k})^{\text {T}}{Q}\Big \{w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \}\nonumber \\&\quad \ge \Big \Vert w^{k-1}-w^{k}-(w^{k}-w^{k+1})\Big \Vert ^2_Q\nonumber \\&\quad +\tfrac{t(1 -\alpha )}{\alpha } (\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1} -\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]. \end{aligned}$$
(2.31)

It is easy to derive that

$$\begin{aligned}&2(\lambda ^{k}-\lambda ^{k+1})^{\text {T}}\Big [\lambda ^{k+1}-\lambda ^{k+2} -(\lambda ^{k}-\lambda ^{k+1})\Big ]\nonumber \\&\quad =-\Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2 +\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$
(2.32)

Combining (2.27) with (2.31) and (2.32), we have

$$\begin{aligned}&\big \Vert w^{k-1}-w^{k}\big \Vert _{Q}^2 +\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2 -\left( \big \Vert w^k-w^{k+1}\big \Vert _{Q}^2 +\tfrac{t(1-\alpha )}{\alpha }\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2\right) \nonumber \\&\quad \ge \Big \Vert w^{k-1}\!-\!w^{k}-(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q -\tfrac{t(1-\alpha )}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$
(2.33)

By the definition of Q in (2.7), it follows

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q =~&r\Big \Vert x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big \Vert ^2 +t\Big \Vert \lambda ^{k-1}\!-\!\lambda ^{k}\!-\! (\lambda ^{k}\!-\!\lambda ^{k+1})\Big \Vert ^2\\&-2\alpha \Big [x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big ]^{\text {T}}A^{\text {T}} \Big [\lambda ^{k-1}\!-\!\lambda ^{k}\!-\! (\lambda ^{k}\!-\!\lambda ^{k+1})\Big ]. \end{aligned}$$

By \(\lambda \)-iteration of (2.5), we get

$$\begin{aligned}&-2\alpha \Big [x^{k-1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k+1})\Big ]^{\text {T}}A^{\text {T}} \Big [\lambda ^{k-1}\!-\!\lambda ^{k}\!-\!(\lambda ^{k}\!-\!\lambda ^{k+1})\Big ]\\&+t\Big \Vert \lambda ^{k-1}\!-\!\lambda ^{k}\! -\!(\lambda ^{k}\!-\!\lambda ^{k+1})\Big \Vert ^2 \\&\quad =t\Big \Vert \tfrac{\alpha }{t}A\big [x^{k\!-\!1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k\!+\!1})\big ] \!-\!\big [ \lambda ^{k\!-\!1}\!-\!\lambda ^{k}\!-\!(\lambda ^{k}\!-\!\lambda ^{k\!+\!1}) \big ]\Big \Vert ^2\\&\quad -\tfrac{\alpha ^2}{t}\Big \Vert x^{k\!-\!1}\!-\!x^{k}\!-\!(x^{k}\!-\!x^{k\!+\!1}) \Big \Vert ^2_{A^{\text {T}}A}\\&\quad =t\Big \Vert \lambda ^{k}-\lambda ^{k+1}-(\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2 -\tfrac{\alpha ^2}{t}\Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_{A^{\text {T}}A}, \end{aligned}$$

which implies

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q&= \!\Big \Vert x^{k-1}\!-\!x^{k}-(x^{k}\!-\!x^{k+1})\Big \Vert ^2_M\nonumber \\&+t\Big \Vert \lambda ^{k}\!-\!\lambda ^{k+1}\! -\!(\lambda ^{k+1}\!-\!\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$
(2.34)

Substituting (2.20) and (2.34) into (2.33), we have

$$\begin{aligned}&\big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \nonumber \\&\quad \ge \Big \Vert x^{k-1}\!-\!x^{k}-(x^{k}\!-\!x^{k+1})\Big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2.\nonumber \\ \end{aligned}$$
(2.35)

Adding \(\Big (\tfrac{t(2\alpha -1)}{\alpha } -\tfrac{t}{\alpha }\Big ) \Big [\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ]\) to both sides of (2.35), we obtain

$$\begin{aligned}&\big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \\&\quad \ge \Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha } \Big \Vert \lambda ^{k}-\lambda ^{k+1}- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad -\tfrac{2t(1-\alpha )}{\alpha } \Big [\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ]\\&\quad = \Big \Vert x^{k-1}-x^{k}-(x^{k}-x^{k+1})\Big \Vert ^2_M +\tfrac{t}{\alpha } \Big \Vert (\small {2\alpha -1})(\lambda ^{k}-\lambda ^{k+1})- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad +4(\small {2-\alpha })t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

Recall that M is positive semidefinite and \(\alpha \in (0.5,1)\), it follows

$$\begin{aligned} \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\le \big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$
(2.36)

This implies that the sequence \(\Big \{\big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big \}\) is monotonically nonincreasing for \(k\ge 0\). Summing (2.36) from \(k=1\) to \(K+1\) yields

$$\begin{aligned}&(1+K)\Big (\big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \Big ) \\&\quad \le \sum \limits _{k=1}^{K+1}\Big ( \big \Vert x^{k-1}-x^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 \Big )\\&\quad =\sum \limits _{k=0}^{K}\Big ( \big \Vert x^{k}-x^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big ). \end{aligned}$$

Combining this with (2.23) and (2.25), we have

$$\begin{aligned} (1+K)\Big (\big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \Big )\le d. \end{aligned}$$

Hence, for any \(\epsilon >0\) there exists K such that

$$\begin{aligned} \big \Vert x^{K}-x^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \le \frac{d}{1+K}<\epsilon . \end{aligned}$$
(2.37)

Let \(\lfloor d/\epsilon \rfloor \) denote the largest integer not greater than \(d/\epsilon \). Then, inequality (2.37) implies that the gCPPA needs at most \(\lfloor d/\epsilon \rfloor \) iterations to ensure that

$$\begin{aligned} \Big (r-\tfrac{\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{K}-x^{K+1}\big \Vert ^2< \epsilon ,\quad \tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2< \epsilon , \end{aligned}$$

which implies the sequence \(\{w^k\}\) generated by the gCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense. The proof of Theorem 2.6 is completed. \(\square \)

2.2 Extended gCPPA to Separable Convex Minimization Problems

The Lagrangian function of problem (1.9) is

$$\begin{aligned} L(x,y,\lambda )=\theta _1(x)+\theta _2(y) -\lambda ^{\text {T}}(Ax+By-b). \end{aligned}$$
(2.38)

If a pair of \((x^*,y^*)\in {\mathcal {X}}\times {\mathcal {Y}}\) is a solution of problem (1.9), then there exists \(\lambda ^*\in {\mathbb {R}}^m\) such that \((x^*,y^*,\lambda ^*)\) is a saddle point of (2.38) satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} L(x,y^*,\lambda ^*)- L(x^*,y^*,\lambda ^*)\ge 0, \quad \forall ~x\in {\mathcal {X}}, \\ L(x^*,y,\lambda ^*)- L(x^*,y^*,\lambda ^*)\ge 0, \quad \forall ~y\in {\mathcal {Y}}, \\ L(x^*,y^*,\lambda ^*)- L(x^*,y^*,\lambda )\ge 0, \quad \forall ~\lambda \in {\mathbb {R}}^m. \end{array}\right. \end{aligned}$$
(2.39)

Consequently, by (2.39) it follows

$$\begin{aligned}&x^* = \arg \min \limits _{x\in {\mathcal {X}}} L(x,y^*,\lambda ^*), \quad y^* = \arg \min \limits _{y\in {\mathcal {Y}}} L(x^*,y,\lambda ^*),\nonumber \\&\quad \lambda ^* = \arg \max \limits _{\lambda \in {\mathbb {R}}^m} L(x^*,y^*,\lambda ). \end{aligned}$$
(2.40)

According to Lemma 2.1 in [11], we have the first-order optimality conditions of problem (2.40):

$$\begin{aligned} \left\{ \begin{array}{ll} \theta _1(x)-\theta _1(x^*) -(x-x^*)^{\text {T}} (A^{\text {T}} \lambda ^*)\ge 0, &{}\quad \forall ~x\in {\mathcal {X}}, \\ \theta _2(y)-\theta _2(y^*) -(y-y^*)^{\text {T}} (B^{\text {T}} \lambda ^*)\ge 0, &{}\quad \forall ~y\in {\mathcal {Y}}, \\ (\lambda -\lambda ^*)^{\text {T}} (Ax^*+By^*-b) \ge 0, &{}\quad \forall ~\lambda \in {\mathbb {R}}^m. \\ \end{array}\right. \end{aligned}$$
(2.41)

By a compact form of (2.41), the solution set of problem (1.9), also denoted by \({\mathcal {W}}^*\subset {\mathcal {W}}\), consists of all \(w^*\in {\mathcal {W}}\) satisfying (2.3), where \({\mathcal {W}} := {\mathcal {X}}\times {\mathcal {Y}}\times {\mathbb {R}}^m\), \(\theta (u):=\theta _1(x)+\theta _2(y)\) and

$$\begin{aligned} u:=\left( \begin{matrix}x \\ y \\ \end{matrix}\right) ,\quad w:=\left( \begin{matrix}x \\ y \\ \lambda \end{matrix}\right) ,\quad F(w):=\left( \begin{matrix}-{{A}^{\text {T}}}\lambda \\ -{{B}^{\text {T}}}\lambda \\ Ax+By-b\\ \end{matrix} \right) . \end{aligned}$$
(2.42)

We are now in a position to propose the extended gCPPA for the separable convex optimization problem (1.9).

Algorithm 2.7 (eCPPA)

Let \(r,s,t>0\) and \(\alpha >0\) be given parameters and \(w^0=(x^0,y^0,\lambda ^0)\in {\mathcal {W}}\) an initial point. For \(k=0,1,2,\ldots \), generate the new iterate \(w^{k+1}=(x^{k+1},y^{k+1},\lambda ^{k+1})\) via

$$\begin{aligned} \left\{ \begin{array}{l} \lambda ^{k+1}=\lambda ^k-\frac{\alpha }{t}(Ax^{k}+By^{k}-b), \\ x^{k+1}=\arg \min \nolimits _{x\in {\mathcal {X}}}\big \{\theta _1(x) +\frac{r}{2}\big \Vert (x-x^k)-\frac{1}{r}A^{\text {T}}\big [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k\big ]\big \Vert ^2\big \}, \\ y^{k+1}=\arg \min \nolimits _{y\in {\mathcal {Y}}}\big \{\theta _2(y) +\frac{s}{2}\big \Vert (y-y^k)-\frac{1}{s}B^{\text {T}}\big [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k\big ]\big \Vert ^2\big \}. \end{array}\right. \end{aligned}$$
(2.43)
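A minimal sketch of one eCPPA sweep (2.43) follows; `prox1(v, r)` and `prox2(v, s)` are hypothetical routines returning \(\arg \min \nolimits _{x\in {\mathcal {X}}}\{\theta _1(x)+\frac{r}{2}\Vert x-v\Vert ^2\}\) and \(\arg \min \nolimits _{y\in {\mathcal {Y}}}\{\theta _2(y)+\frac{s}{2}\Vert y-v\Vert ^2\}\), respectively.

```python
import numpy as np

def ecppa_step(x, y, lam, A, B, b, r, s, t, alpha, prox1, prox2):
    """One eCPPA sweep, i.e., scheme (2.43)."""
    lam_new = lam - alpha / t * (A @ x + B @ y - b)   # multiplier update
    d = (1 + alpha) * lam_new - alpha * lam           # extrapolated multiplier
    x_new = prox1(x + A.T @ d / r, r)                 # x-subproblem
    y_new = prox2(y + B.T @ d / s, s)                 # y-subproblem
    return x_new, y_new, lam_new
```

Note that, once \(\lambda ^{k+1}\) is available, the x- and y-subproblems are decoupled and can be solved in parallel.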

By the first-order optimality conditions of iteration subproblem (2.43), for \(w^{k+1}\in {\mathcal {W}}\) generated by the eCPPA and \(\forall ~w\in {\mathcal {W}}\), we have

$$\begin{aligned} \left\{ \begin{array}{l} \theta _1(x)-\theta _1(x^{k+1})+(x-x^{k+1})^{\text {T}}\Big \{-A^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+r(x^{k+1}-x^k)\Big \}\ge 0, \\ \theta _2(y)-\theta _2(y^{k+1})+(y-y^{k+1})^{\text {T}}\Big \{-B^{\text {T}} [(1+\alpha )\lambda ^{k+1}-\alpha \lambda ^k]+s(y^{k+1}-y^k)\Big \}\ge 0, \\ (\lambda -\lambda ^{k+1})^{\text {T}}~\Big \{\alpha (Ax^k+By^k-b) +t(\lambda ^{k+1}-\lambda ^k)\Big \}= 0. \end{array}\right. \end{aligned}$$
(2.44)

Using the notation (2.42), the system (2.44) can be rewritten in the compact form (2.6), in which

$$\begin{aligned} Q=\left( \begin{matrix}rI_{n_1}&{}0_{n_1\times n_2}&{} -\alpha A^{\text {T}}\\ 0_{n_2\times n_1}&{}sI_{n_2}&{}-\alpha B^{\text {T}}\\ -\alpha A&{}-\alpha B&{}tI_{m}\end{matrix}\right) . \end{aligned}$$
(2.45)

Lemma 2.8

Let \(w^*\in {\mathcal {W}}^*\) be a solution satisfying (2.3), and let \(\{w^k\}\) be the sequence generated by the eCPPA method. Then we have

$$\begin{aligned} (w^*-w^{k+1})^{\text {T}}Q(w^{k+1}-w^{k}) \ge \tfrac{t(1-\alpha )}{\alpha }(\lambda ^*-\lambda ^{k+1})^{\text {T}} (\lambda ^{k+1}-\lambda ^{k+2}). \end{aligned}$$
(2.46)

Proof

The mapping F(w) is affine with a skew-symmetric matrix, and thus it is monotone. The rest of the proof is the same as that of Lemma 2.3. \(\square \)

In this subsection, let

$$\begin{aligned} M=\left( \begin{array}{cc} rI_{n_1}-\tfrac{\alpha ^2A^{\text {T}}A}{t}&{} -\tfrac{\alpha ^2A^{\text {T}}B}{t}\\ -\tfrac{\alpha ^2B^{\text {T}}A}{t}&{} sI_{n_2}-\tfrac{\alpha ^2B^{\text {T}}B}{t} \end{array}\right) . \end{aligned}$$
(2.47)

If \(r, s, t>0\) and \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(\alpha \in (0.5,1)\), it is easy to verify that Q and M are positive semidefinite.

Lemma 2.9

For the sequence \(\{w^k\}\) generated by the eCPPA method, we have

$$\begin{aligned}&\big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\frac{t(2\alpha -1)}{\alpha }\big \Vert \lambda ^{k+1}- \lambda ^{k+2}\big \Vert ^2 \nonumber \\&\quad \le \big \Vert w^{*}-w^{k}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{k+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{k+2}\Vert ^2\Big ]. \end{aligned}$$
(2.48)

Proof

Similarly to Lemma 2.4, we also obtain (2.17). By the definition of Q in (2.45), it holds that

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2 =~&r\big \Vert x^k-x^{k+1}\big \Vert ^2 +s\big \Vert y^k-y^{k+1}\big \Vert ^2 -2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})\\&-2\alpha (y^k-y^{k+1})^{\text {T}}B^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2. \end{aligned}$$

From \(\lambda \)-iteration of (2.43), we get

$$\begin{aligned}&-2\alpha (x^k-x^{k+1})^{\text {T}}A^{\text {T}}(\lambda ^{k}-\lambda ^{k+1})- 2\alpha (y^k-y^{k+1})^{\text {T}}B^{\text {T}}(\lambda ^{k}-\lambda ^{k+1}) +t\big \Vert \lambda ^{k}-\lambda ^{k+1}\big \Vert ^2 \\&\quad =t\Big \Vert \big [\tfrac{\alpha }{t}(Ax^k+By^k-b)-\tfrac{\alpha }{t}(Ax^{k+1}+By^{k+1}-b) \big ] -(\lambda ^k-\lambda ^{k+1})\Big \Vert ^2 \\&\qquad -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 \\&\quad =t\Big \Vert \big [(\lambda ^k-\lambda ^{k+1})-(\lambda ^{k+1}-\lambda ^{k+2})\big ] -(\lambda ^k-\lambda ^{k+1})\Big \Vert ^2\\&\quad -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 \\&\quad =t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2 -\tfrac{\alpha ^2}{t}\Big \Vert A(x^k-x^{k+1})+B(y^k-y^{k+1})\Big \Vert ^2 , \end{aligned}$$

which implies

$$\begin{aligned} \big \Vert w^k-w^{k+1}\big \Vert _Q^2=\big \Vert u^k-u^{k+1}\big \Vert ^2_M +t\big \Vert \lambda ^{k+1}-\lambda ^{k+2}\big \Vert ^2, \end{aligned}$$
(2.49)

where

$$\begin{aligned}&\big \Vert u^k-u^{k+1}\big \Vert ^2_M =~r\big \Vert x^{k}-x^{k+1}\big \Vert ^2 + s\big \Vert y^{k}-y^{k+1}\big \Vert ^2\\&-\tfrac{\alpha ^2}{t}\Big \Vert A(x^k\!-\!x^{k+1})\! +\!B(y^k\!-\!y^{k+1})\Big \Vert ^2. \end{aligned}$$

Substituting (2.49) into (2.17), we obtain (2.48) immediately and complete the proof. \(\square \)

Theorem 2.10

Suppose that \(r, s, t>0\) and \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(\alpha \in (0.5,1)\). Then, for the sequence \(\{w^k\}\) generated by the eCPPA method, we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert w^{k+1}-w^k\Vert =0, \end{aligned}$$
(2.50)

and \(\{w^k\}\) globally converges to an element of solution set \({\mathcal {W}}^*\) satisfying (2.41).

Proof

Since \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), it follows

$$\begin{aligned} \big \Vert u^{k}\!-\!u^{k+1}\big \Vert ^2_M&\ge \Big (r\!-\!\tfrac{2\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{k}\!-\! x^{k+1}\big \Vert ^2\nonumber \\&+ \Big (s\!-\!\tfrac{2\alpha ^2\rho (B^{\text {T}}B)}{t}\Big ) \big \Vert y^{k}\!-\! y^{k+1}\big \Vert ^2\ge 0. \end{aligned}$$
(2.51)

Noticing that \(t>0\) and \(\alpha \in (0.5,1)\), we get

$$\begin{aligned} \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \ge 0. \end{aligned}$$
(2.52)

Summing (2.48) over \(k=0,1,\ldots ,K\) yields

$$\begin{aligned}&\sum \limits _{k=0}^{K}\Big ( \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big ) \nonumber \\&\quad \le \big \Vert w^{*}-w^{0}\big \Vert ^2 _Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{1}\Vert ^2 -\Big [\big \Vert w^{*}-w^{K+1}\big \Vert ^2_Q +\tfrac{t(1-\alpha )}{\alpha }\Vert \lambda ^{*}-\lambda ^{K+2}\Vert ^2\Big ].\nonumber \\ \end{aligned}$$
(2.53)

By (2.52) and (2.53), it is easy to derive (2.24) and

$$\begin{aligned} \sum \limits _{k=0}^{\infty }\Big ( \big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2 \Big )\le d, \end{aligned}$$

which implies that

$$\begin{aligned} \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{x}^{k}}-{{x}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{y}^{k}}-{{y}^{k+1}}\Vert ^{2}=0,\quad \underset{k\rightarrow \infty }{\mathop {\text {lim}}}\, \Vert {{\lambda }^{k}}-{{\lambda }^{k+1}}\Vert ^{2}=0. \end{aligned}$$

Hence, \(\{w^k\}\) has at least one cluster point. Let \(w^\infty = \lim \nolimits _{k_j\rightarrow \infty , k_j\in {\mathcal {K}}} w^{k_j}\) be the limit of a convergent subsequence \(\{w^{k_j}\}_{k_j\in {\mathcal {K}}}\). By the same argument as in the proof of Theorem 2.5, \(w^{\infty }\in {\mathcal {W}}^*\) is a solution satisfying (2.3), which completes the proof. \(\square \)

Theorem 2.11

If the conditions of Theorem 2.10 hold, then the sequence \(\{w^k\}\) generated by the eCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense.

Proof

Analogously to (2.34), it holds that

$$\begin{aligned} \Big \Vert w^{k-1}\!-\!w^{k}\!-\!(w^{k}\!-\!w^{k+1})\Big \Vert ^2_Q&=\Big \Vert u^{k-1}\!-\!u^{k}\!-\!(u^{k}\!-\!u^{k+1})\Big \Vert ^2_M\nonumber \\&\quad +t\Big \Vert \lambda ^{k}\!-\!\lambda ^{k+1}\!-\!(\lambda ^{k+1} \!-\!\lambda ^{k+2})\Big \Vert ^2. \end{aligned}$$
(2.54)

By (2.49) and (2.54), we have

$$\begin{aligned}&\big \Vert u^{k-1}-u^{k}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2 -\Big (\big \Vert u^{k}-u^{k+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{k+1}-\lambda ^{k+2}\Vert ^2\Big ) \\&\quad \ge \Big \Vert u^{k-1}-u^{k}-(u^{k}-u^{k+1})\Big \Vert ^2_M +\tfrac{t}{\alpha } \Big \Vert (\small {2\alpha -1})(\lambda ^{k}-\lambda ^{k+1})- (\lambda ^{k+1}-\lambda ^{k+2})\Big \Vert ^2\\&\qquad +4(\small {2-\alpha })t\Vert \lambda ^{k}-\lambda ^{k+1}\Vert ^2. \end{aligned}$$

Then, following the same arguments as in the proof of Theorem 2.6, for any \(\epsilon >0\) there exists K such that

$$\begin{aligned} \big \Vert u^{K}-u^{K+1}\big \Vert ^2_M +\tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2 \le \frac{d}{1+K}<\epsilon . \end{aligned}$$
(2.55)

By (2.51), inequality (2.55) implies that the eCPPA needs at most \(\lfloor d/\epsilon \rfloor \) iterations to ensure that

$$\begin{aligned}&\Big (r-\tfrac{2\alpha ^2\rho (A^{\text {T}}A)}{t}\Big )\big \Vert x^{K}-x^{K+1}\big \Vert ^2< \epsilon , ~\Big (s-\tfrac{2\alpha ^2\rho (B^{\text {T}}B)}{t}\Big )\big \Vert y^{K}-y^{K+1}\big \Vert ^2< \epsilon ,\\&\quad \tfrac{t(2\alpha -1)}{\alpha }\Vert \lambda ^{K+1}-\lambda ^{K+2}\Vert ^2< \epsilon , \end{aligned}$$

which implies that the sequence \(\{w^k\}\) generated by the eCPPA method has a worst-case convergence rate \({\mathcal {O}}(1/k)\) in a non-ergodic sense. The proof of Theorem 2.11 is completed. \(\square \)

3 Numerical Results

This section focuses on the numerical performance of the gCPPA and eCPPA in comparison with some state-of-the-art methods. All methods are coded in MATLAB 2012b and run on an HP desktop computer with an Intel(R) Core(TM) i5-6500 CPU at 3.20 GHz and 8 GB of RAM. The termination criterion of the gCPPA is set to \(\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon \), and that of the eCPPA is set to \(\max \left\{ \Vert x^k-x^{k+1}\Vert ,\Vert y^k-y^{k+1}\Vert ,\Vert \lambda ^k-\lambda ^{k+1}\Vert \right\} \le \varepsilon \), where \(\varepsilon >0\) is a small real number. By the global convergence of the gCPPA and eCPPA methods, we have \(\underset{k\rightarrow \infty }{\mathop {\text {lim}}}\,\Vert {{w}^{k}}-{{w}^{k+1}}\Vert ^{2}=0\), which means that for any given \(\varepsilon >0\) the corresponding algorithm stops after finitely many iterations.

To adjust parameters r,  s and t simultaneously, we introduce \(\alpha _r,~\alpha _t>0\) satisfying \(\alpha _r\alpha _t=\alpha ^2\). Take \(r= \alpha _r R,~t> \alpha _t \rho (A^{\text {T}}A)/R~\) in gCPPA such that \( rt\ge \alpha ^2\rho (A^{\text {T}}A)\), and take \(r= \sqrt{2}\alpha _r R,~s= \sqrt{2}\alpha _r R\rho (B^{\text {T}}B)/\rho (A^{\text {T}}A),~t> \sqrt{2}\alpha _t\rho (A^{\text {T}}A)/R\) in eCPPA such that \(rt\ge 2\alpha ^2\rho (A^{\text {T}}A), ~st\ge 2\alpha ^2\rho (B^{\text {T}}B)\), where \(R>0\) is a selected constant.
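The following sketch implements this parameter rule (the helper names and the `safety` factor, which stands for the constants 1.05 and 1.02 used in the experiments below, are ours).

```python
import numpy as np

def gcppa_params(R, alpha_r, alpha_t, rhoA, safety=1.02):
    """r = alpha_r * R and t slightly above alpha_t * rho(A^T A)/R, so that r t >= alpha^2 rho(A^T A)."""
    return alpha_r * R, safety * alpha_t * rhoA / R            # (r, t)

def ecppa_params(R, alpha_r, alpha_t, rhoA, rhoB, safety=1.02):
    """Parameters for the eCPPA so that r t >= 2 alpha^2 rho(A^T A) and s t >= 2 alpha^2 rho(B^T B)."""
    r = np.sqrt(2) * alpha_r * R
    s = np.sqrt(2) * alpha_r * R * rhoB / rhoA
    t = safety * np.sqrt(2) * alpha_t * rhoA / R
    return r, s, t

# Example 3.1, gCPPA^(0.17): R = 2, alpha_r = 0.3, alpha_t = 0.1, rho(A^T A) = 1
print(gcppa_params(2.0, 0.3, 0.1, 1.0, safety=1.05))           # approximately (0.60, 0.05)
```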

Example 3.1

The first test problem is a correlation matrix calibration problem selected from [11, 29], which has the form

$$\begin{aligned} \min \Bigg \{~ \frac{1}{2}\Vert X-C\Vert ^2_F ~\big | ~\text{ diag }(X) = e, ~ X\in S^n_{+} \Bigg \}, \end{aligned}$$
(3.1)

where \(C\in {\mathbb {R}}^{n\times n}\) is a symmetric matrix, \(e=(1,1,\dots ,1)^{\text {T}}\in {\mathbb {R}}^{n}\) and \(S^n_{+}=\{H\in {\mathbb {R}}^{n\times n}~\big |~ H^{\text {T}}=H, H\succeq 0\}\). Problem (3.1) can be reformulated as problem (1.1) with \(\rho (A^{\text {T}}A)=1\), where A is a projection matrix for the linear constraint \({\text {diag}}(X)=e\) (see Example 2 in [11]).

Let \(z\in {\mathbb {R}}^n\) be the Lagrange multiplier to the linear equality constraint of problem (3.1).

We compare the performance of the gCPPA with that of the CPPA and rCPPA methods. For a given \((X^k, z^k)\), the gCPPA produces the new iterate \((X^{k+1},z^{k+1})\) as follows:

$$\begin{aligned} \left\{ \begin{array}{l} z^{k+1}= z^k-\frac{\alpha }{t}({\text {diag}}(X^k)-e), \\ X^{k+1} = \arg \min \nolimits _{X\in S^n_{+}} \Big \{~\frac{1}{2}\Vert X-C\Vert ^2_F+\frac{r}{2}\big \Vert (X-X^k)- \frac{1}{r} {\text {diag}}[(1+\alpha )z^{k+1}-\alpha z^k]\big \Vert ^2_F~ \Big \}. \end{array}\right. \end{aligned}$$
(3.2)

The X-subproblem of (3.2) is identical to

$$\begin{aligned} X^{k+1} = \arg \min \nolimits _{X\in S^n_{+}} \Bigg \{~\frac{1}{2}\big \Vert X-\frac{1}{r+1}\{rX^k+ {\text {diag}} [(1+\alpha )z^{k+1}-\alpha z^k]+C\}\big \Vert ^2_F~\Bigg \}. \end{aligned}$$

This is inherently a projection onto \(S^n_{+}\), computed via an SVD, and it takes the main computational load in each iteration. Similarly, the iteration schemes of the CPPA and rCPPA for (3.1) can be written in the form of (1.7) and (1.8).
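A minimal sketch of one gCPPA sweep (3.2) is given below; `project_psd` and `gcppa_step_calibration` are our names, and the projection onto \(S^n_{+}\) is computed here from a spectral decomposition of the symmetrized argument.

```python
import numpy as np

def project_psd(W):
    """Projection of a symmetric matrix W onto S^n_+ (clip negative eigenvalues)."""
    vals, vecs = np.linalg.eigh((W + W.T) / 2)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def gcppa_step_calibration(X, z, C, r, t, alpha):
    """One gCPPA sweep (3.2) for the calibration problem (3.1)."""
    z_new = z - alpha / t * (np.diag(X) - 1.0)                      # multiplier update
    W = (r * X + np.diag((1 + alpha) * z_new - alpha * z) + C) / (r + 1)
    return project_psd(W), z_new                                     # X-subproblem = PSD projection
```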

In our experiments, we take a random matrix \(C=(c_{ij})_{n\times n}\) satisfying \(c_{ij}\in (-1,1)\) and \(c_{ij}=c_{ji}\) for all i, j. For each given n, 20 random instances are tested. To be fair, the initial point \(X^0\) is the \(n\times n\) identity matrix and \(z^0\) is the n-dimensional zero vector. For the termination criterion of all methods, we set \(\varepsilon = 10^{-5}\).

Let \(R=2.00\). The proximal parameters of the CPPA and rCPPA methods are set to \(r=R\) and \(t=1.05/R\approx 0.53\), which satisfies \(rt=1.05>\rho (A^{\text {T}}A)\). We set \(\gamma =1.50\) in the rCPPA method. Here “\(\hbox {gCPPA}^{(0.53)}\)” denotes the gCPPA method with \(\alpha _r=0.70\) and \(\alpha _t=0.40\), so that \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.53\), \(r=R\alpha _r=1.40\) and \(t=1.05\alpha _t/R=0.21\); “\(\hbox {gCPPA}^{(0.17)}\)” denotes the gCPPA method with \(\alpha _r=0.30\) and \(\alpha _t=0.10\), so that \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.17\), \(r=R\alpha _r=0.60\) and \(t=1.05\alpha _t/R\approx 0.05\).

For easy comparison, the results of the tested methods in terms of the matrix dimension n, the average iteration number (iter) and the average cpu time (cput, in seconds) are reported in Table 1 and displayed in Fig. 1.

Table 1 Numerical results of CPPA, rCPPA, \(\hbox {gCPPA}^{(0.53)}\) and \(\hbox {gCPPA}^{(0.17)}\)

Fig. 1 Comparisons of CPPA, rCPPA, \(\hbox {gCPPA}^{(0.53)}\) and \(\hbox {gCPPA}^{(0.17)}\). Left: iteration number. Right: cpu time

One can conclude from Table 1 and Fig. 1 that: (1) the \(\hbox {gCPPA}^{(0.53)}\) takes about \(80\%\) of the iterations and cpu time of the CPPA, but takes more iterations and time than the rCPPA; (2) the \(\hbox {gCPPA}^{(0.17)}\) takes about \(60\%\) of the iterations of the CPPA and \(80\%\) of those of the rCPPA, and spends less cpu time than both the CPPA and the rCPPA, especially for high-dimensional problems. In other words, the \(\hbox {gCPPA}^{(0.53)}\) and \(\hbox {gCPPA}^{(0.17)}\) are better than the CPPA, and the \(\hbox {gCPPA}^{(0.17)}\) performs at least as well as the rCPPA with \(\gamma = 1.50\). However, the convergence of the \(\hbox {gCPPA}^{(0.17)}\) is not established since \(\alpha <0.5\); this is left to our future research.

Example 3.2

The second test problem is the matrix completion problem utilized in Tao, Yuan and He [30], which is as follows:

$$\begin{aligned} \min \Bigg \{~ \frac{1}{2}\Vert X-C\Vert ^2_F ~\Big |~ X\in S^n_{+}\cap S_{\text {B}} \Bigg \}, \end{aligned}$$
(3.3)

where \(S_{\text {B}}=\{H\in {\mathbb {R}}^{n\times n}~\big |~H_{\text {L}}\le H\le H_{\text {U}}\}\) for given bound matrices \(H_{\text {L}}\) and \(H_{\text {U}}\).

By introducing an auxiliary variable Y such that \( X - Y=0\), problem (3.3) can be reformulated to a separable convex optimization problem of the form

$$\begin{aligned} \min \left\{ ~\frac{1}{2}\Vert X-C\Vert ^2_F + \frac{1}{2}\Vert Y-C\Vert ^2_F~ \Big |~ X = Y ,~ X\in S^n_{+},~ Y\in S_{\text {B}}\right\} . \end{aligned}$$
(3.4)

Obviously, problem (3.4) is a special case of problem (1.9) in which \(\rho (A^{\text {T}}A)=\rho (B^{\text {T}}B)=1\).

For comparison, four methods are used for solving problem (3.4): the classical ADMM (ADMM for short), the gADMM, the separable version of the CPPA (i.e., Algorithm 2.7 with \(\alpha =1\); sCPPA for short) and the eCPPA (Algorithm 2.7 with \(\alpha \in (0.5, 1)\)). Let \(Z\in {\mathbb {R}}^{n\times n}\) be the Lagrange multiplier for the linear constraint \(X-Y=0\). For a given \((X^k,Y^k,Z^k)\), the eCPPA produces the new iterate \((X^{k+1},Y^{k+1},Z^{k+1})\) via

$$\begin{aligned} \left\{ \begin{array}{l} Z^{k+1}=Z^k-\frac{\alpha }{t}(X^k-Y^k),\\ X^{k+1}=\arg \min \nolimits _{X\in S^n_{+}} \Big \{ \frac{1}{2}\big \Vert X-\frac{1}{r+1}[rX^k+ (1+\alpha )Z^{k+1}-\alpha Z^k+C]\big \Vert ^2_F \Big \}, \\ Y^{k+1}=\arg \min \nolimits _{Y\in S_{B}}\Big \{ \frac{1}{2}\big \Vert Y-\frac{1}{s+1}[sY^k- (1+\alpha )Z^{k+1}+\alpha Z^k+C]\big \Vert ^2_F \Big \}. \end{array}\right. \qquad \end{aligned}$$
(3.5)

The X-subproblem of iteration (3.5) is inherently a projection onto \(S^n_{+}\) computed via an SVD, and it takes the main computational load in each iteration. The Y-subproblem of (3.5) is also a projection:

$$\begin{aligned} Y^{k+1}=P_{S_{B}} \Bigg \{\frac{1}{s+1}\big [sY^k -(1+\alpha )Z^{k+1}+\alpha Z^k+C\big ]\Bigg \}, \end{aligned}$$

where \(P_{S_{B}}\) denotes the projection onto \(S_{B}\) under the Euclidean norm.
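A minimal sketch of one eCPPA sweep (3.5) follows, where the box projection is implemented by componentwise clipping between \(H_{\text {L}}\) and \(H_{\text {U}}\); the function names are ours.

```python
import numpy as np

def project_psd(W):
    """Projection of a symmetric matrix W onto S^n_+ (clip negative eigenvalues)."""
    vals, vecs = np.linalg.eigh((W + W.T) / 2)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def ecppa_step_completion(X, Y, Z, C, H_L, H_U, r, s, t, alpha):
    """One eCPPA sweep (3.5) for the reformulated problem (3.4)."""
    Z_new = Z - alpha / t * (X - Y)                            # multiplier update for X - Y = 0
    D = (1 + alpha) * Z_new - alpha * Z                        # extrapolated multiplier
    X_new = project_psd((r * X + D + C) / (r + 1))             # projection onto S^n_+
    Y_new = np.clip((s * Y - D + C) / (s + 1), H_L, H_U)       # projection onto the box S_B
    return X_new, Y_new, Z_new
```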

In the experiments, we take a random matrix \(C=(c_{ij})_{n\times n}\) satisfying \(c_{ij}\in (-1,1)\) and \(c_{ij}=c_{ji}\) for all i, j. For each given n, 20 random instances are tested. To be fair, \(X^0\) and \(Y^0\) are set to the \(n\times n\) identity matrix and \(Z^0\) is set to the \(n\times n\) zero matrix.

For termination criterion of all methods, we set \( \varepsilon = 10^{-5}\Vert w^{1}-w^0\Vert _\infty \).

The parameter settings of the used methods are stated as follows: The penalty parameter is set to \(\beta =1.80\) in both ADMM and gADMM, the relaxation parameter is set to \(\gamma = 1.80\) in the gADMM. Let \(R=3.00\), the proximal parameters of the sCPPA are set to \(r = s = \sqrt{2} R \approx 4.24\) and \(t=1.02\sqrt{2}/R\approx 0.48\), and those of the eCPPA are set to \(r =s=\sqrt{2}\alpha _r R\approx 2.55\), \(t=1.02 \sqrt{2} \alpha _t/R\approx 0.24\), where \(\alpha _r =0.60\) and \(\alpha _t = 0.50\), and \(\alpha =\sqrt{\alpha _r\alpha _t}\approx 0.55\).

We test all methods on 20 random instances for each fixed n. The average iteration number and the average cpu-time for problem (3.4) with different n are listed in Table  2 and displayed in Fig. 2.

Table 2 Numerical results of ADMM, gADMM, sCPPA and eCPPA

Fig. 2 Comparisons of ADMM, gADMM, sCPPA and eCPPA. Left: iteration number. Right: cpu time

From Table 2 and Fig. 2, one can conclude that the iteration number of the eCPPA is about \(30\%\) of that of the ADMM, and about \(70\%\) of those of the gADMM and sCPPA. Moreover, the eCPPA takes less cpu time than the others, especially for high-dimensional problems. Thus, the eCPPA has the best performance for problem (3.4).

Example 3.3

The third example, selected from [13], focuses on the wavelet-based image inpainting and zooming problems. Let \(x\in {\mathbb {R}}^{l_1\times l_2}\) be a two-dimensional image and \(l=l_1\cdot l_2\). Let \(\mathbf vec (\cdot )\) denote the vectorization operator in the lexicographic order, so that \(\mathbf vec (x)\in {\mathbb {R}}^l\) is the image stacked as a one-dimensional vector. Let \(D\in {\mathbb {R}}^{l\times n}\) be a wavelet dictionary; each column of D is an element of a wavelet frame. Commonly, the image possesses a sparse representation under the dictionary D, i.e., \(\mathbf vec (x) = D \mathbf{x }\), where \(\mathbf{x }\in {\mathbb {R}}^{n}\) is a sparse coefficient vector. Wavelet-based image processing considers recovering the real image from an observation b, which might have some missing pixels or be blurred. The model for wavelet-based image processing can be cast as

$$\begin{aligned} \min \Big \{ \Vert \mathbf{x }\Vert _1 \Big | BD\mathbf{x }=b \Big \}, \end{aligned}$$
(3.6)

where the term \(\Vert \mathbf{x }\Vert _1\) induces a sparse representation under the wavelet dictionary, and B (also called the mask) is a typically ill-conditioned matrix representing convolution or downsampling operators. For the inpainting problem, the matrix \(B\in {\mathbb {R}}^{l\times l}\) in (3.6) is a diagonal matrix whose diagonal elements are either 0 or 1, where the locations of 0 correspond to missing pixels in the image and the locations of 1 correspond to the pixels to be kept. For the image zooming problem, the matrix \(B\in {\mathbb {R}}^{m\times l}\) can be expressed as \(B = SH\), where \(S\in {\mathbb {R}}^{m\times l} \) is a downsampling matrix and \(H\in {\mathbb {R}}^{l\times l} \) is a blurring matrix. The matrix H can be diagonalized by the discrete cosine transform (DCT), i.e., \(H = C^{-1}\Lambda C\), where C represents the DCT and \(\Lambda \) is a diagonal matrix whose diagonal entries are the eigenvalues of H.

We compare the gCPPA method to the CPPA and rCPPA methods on \(256\times 256\) images of Peppers.png and Boat.png for the image inpainting and image zooming problems, respectively.

The dictionary D is chosen as the inverse discrete Haar wavelet transform with level 6. Below we give the details of how the tested images are degraded.

  • For the image inpainting problem, the original image Peppers is first blurred by the out-of-focus kernel with a radius of 7. Then \(60\%\) of the pixels of the blurred image are discarded by applying a mask operator; the positions of the missing pixels are chosen at random.

  • For the image zooming problem, the original image Boat is downsampled by a downsampling matrix S with a factor of 4. Then, the downsampled image is corrupted by a convolution whose kernel is generated by fspecial('gaussian',9,2.5) in MATLAB. (A sketch of both degradation pipelines is given after this list.)
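The following is a minimal Python/SciPy sketch of the two degradation pipelines described above. The disk and Gaussian kernels only approximate MATLAB's fspecial outputs, and the random seed, boundary handling and helper names are our own assumptions.

import numpy as np
from scipy.ndimage import convolve, gaussian_filter

rng = np.random.default_rng(0)  # illustrative seed

def disk_kernel(radius):
    """Normalized out-of-focus kernel, approximating fspecial('disk', radius)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def degrade_inpainting(img, radius=7, missing=0.6):
    """Blur with a radius-7 out-of-focus kernel, then drop 60% of the pixels."""
    blurred = convolve(img, disk_kernel(radius), mode='nearest')
    mask = (rng.random(img.shape) >= missing).astype(float)  # 1 = kept, 0 = missing
    return mask * blurred, mask

def degrade_zooming(img, factor=4, sigma=2.5):
    """Downsample by a factor of 4, then blur with an (approximate) 9x9 Gaussian."""
    down = img[::factor, ::factor]
    return gaussian_filter(down, sigma=sigma, truncate=1.6)  # radius 4 -> 9x9 support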

Let z be the Lagrange multiplier. For a given \((\mathbf{x }^k, z^k)\), the gCPPA adopts the following iteration to obtain \((\mathbf{x }^{k+1}, z^{k+1})\):

$$\begin{aligned} \left\{ \begin{array}{l} z^{k+1} = z^k-\frac{\alpha }{t}(BD{\mathbf{x }}^{k}-b),\\ \mathbf{x }^{k+1} = \arg \min \nolimits _{\mathbf{x } \in {\mathcal {X}}} \big \{ \Vert \mathbf{x }\Vert _1 + \frac{r}{2}\big \Vert \mathbf{x }-\mathbf{x }^k- \frac{1}{r}(BD)^{\text {T}}\big [(\alpha +1)z^{k+1}-\alpha z^k\big ]\big \Vert ^2 \big \}. \end{array}\right. \end{aligned}$$
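For concreteness, a minimal Python sketch of this iteration is given below. It assumes that \({\mathcal {X}}\) is the whole space, so that the \(\mathbf{x }\)-subproblem reduces to a soft-thresholding step, and that \(BD\) and its transpose are supplied as callables; these helper names and the iteration count are illustrative and not part of the original description.

import numpy as np

def soft_threshold(v, tau):
    """Closed-form solution of min_x ||x||_1 + (1/(2*tau)) * ||x - v||^2."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def gcppa_l1(BD, BDt, b, r, t, alpha, x0, z0, max_iter=300):
    """Sketch of the gCPPA iteration above for problem (3.6).

    BD, BDt : callables applying B*D and (B*D)^T to a vector.
    Assumes an unconstrained x-subproblem (soft-thresholding step).
    """
    assert r * t > alpha ** 2, "gCPPA requires r*t > alpha^2 when ||(BD)^T(BD)|| = 1"
    x, z = x0.copy(), z0.copy()
    for _ in range(max_iter):
        z_new = z - (alpha / t) * (BD(x) - b)                       # multiplier update
        v = x + (1.0 / r) * BDt((alpha + 1.0) * z_new - alpha * z)  # proximal center
        x = soft_threshold(v, 1.0 / r)                              # x-subproblem
        z = z_new
    return x, z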

On problem (3.6), the performance of the gCPPA is tested against the CPPA and rCPPA. The CPPA uses the same iteration as the gCPPA with \(\alpha =1\), and the rCPPA can also be written in the form of (1.7) and (1.8). Since \(DD^{\text {T}} = I\), the blurry matrix H can be diagonalized by the DCT, and the binary matrix S (the mask or the downsampling matrix) satisfies \(\Vert S\Vert =1\), we have \(\Vert (BD)^{\text {T}}(BD)\Vert =1\) for the wavelet-based image inpainting and zooming problems. Therefore, the requirement \(rt\ge \Vert (BD)^{\text {T}}(BD)\Vert \) reduces to \(rt\ge 1\) for the CPPA or rCPPA, and to \(rt>\alpha ^2\) for the gCPPA. The parameters are described below.

  • For the image inpainting problem, we take the initial point \((\mathbf{x }^0, z^0) = (D^{\text {T}}b,\mathbf{0 })\), and set \(R=0.60\), \(r_0 = R = 0.60\), \(t_0 = 1.02/R=1.70\), with \(\gamma = 1.00\) for the CPPA and \(\gamma = 1.90\) for the rCPPA. We take \(\alpha _r= 2.00\), \(\alpha _t= 0.13\), \(\alpha =\sqrt{0.26}\approx 0.51\), \(r = \alpha _r R=1.20\) and \(t=1.02\alpha _t/R\approx 0.22\) for the gCPPA;

  • For the image zooming problem, we take the initial point \((\mathbf{x }^0, z^0) = (\mathbf{0 }, \mathbf{0 })\), and set \(R=0.55\), \(r_0 =R= 0.55\), \(t_0 = 1.02/R\approx 1.85\), with \(\gamma = 1.00\) for the CPPA and \(\gamma = 1.20\) for the rCPPA. We take \(\alpha _r= 1.00\), \(\alpha _t= 0.26\), \(\alpha = \sqrt{0.26}\approx 0.51\), \(r = \alpha _r R=0.55\) and \(t = 1.02\alpha _t/R\approx 0.48\) for the gCPPA.

As usual, the quality of the reconstruction is measured by the signal-to-noise ratio (SNR) in decibels (dB):

$$\begin{aligned} {\text {SNR}}:=20\log _{10} \frac{\Vert \mathbf{x }\Vert }{\Vert \bar{\mathbf{x }}-\mathbf{x }\Vert }, \end{aligned}$$
(3.7)

where \(\bar{\mathbf{x }}\) is the reconstructed image and \(\mathbf{x }\) is the clean image.
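In code, (3.7) amounts to the following one-liner (a sketch assuming both images are NumPy arrays of the same shape).

import numpy as np

def snr_db(x_clean, x_rec):
    """Signal-to-noise ratio (3.7) in decibels."""
    return 20.0 * np.log10(np.linalg.norm(x_clean) / np.linalg.norm(x_rec - x_clean))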

For easy comparison, the evolutions of SNR with respect to the number of iterations and the computing time for all tested methods are displayed in Fig. 3. It shows that the rCPPA performs better than the CPPA, and that the gCPPA converges nearly as fast as the rCPPA to essentially the same solutions on both the inpainting and zooming problems.

The reconstructed images after 300 iterations for image inpainting (and 30 iterations for image zooming) are displayed in Fig. 4. This verifies our assertion that the gCPPA is at least as efficient as the rCPPA on the inpainting and zooming problems. We emphasize that the gCPPA does not involve any relaxation step, whereas the rCPPA requires a relaxation step with \(\gamma >1\).

Fig. 3 Evolutions of SNR with respect to the number of iterations (Iteration No.) and CPU time (top row: image inpainting; bottom row: image zooming)

Fig. 4 Original images, degraded images and reconstructed images by the CPPA, rCPPA and gCPPA (top row: image inpainting; bottom row: image zooming)

Example 3.4

The last example, selected from [13], focuses on the total variation (TV) uniform noise removal model:

$$\begin{aligned} \min \Big \{\Vert \nabla \mathbf{x }\Vert _1 \Big | \Vert H\mathbf{x }-\mathbf{x }^0\Vert _{\infty }\le \sigma \Big \}, \end{aligned}$$
(3.8)

where \(x=(x_{i,j})_{l_1\times l_2}\in {\mathbb {R}}^{l_1\times l_2}\) is a two-dimensional image and \(l=l_1 \cdot l_2\). As in Example 3.3, the vectorized image is denoted by \(\mathbf{x }=\mathrm {vec}(x)\in {\mathbb {R}}^l\), and \(\Vert \nabla \mathbf{x }\Vert _1:=\sum _{i=1}^{l}\big (|\mathrm {vec}(\nabla _h x)|_i+|\mathrm {vec}(\nabla _v x)|_i\big )\) is the TV norm with \(\nabla _h x=\big ((\nabla _h x)_{i,j}\big )_{l_1\times l_2}\) and \(\nabla _v x=\big ((\nabla _v x)_{i,j}\big )_{l_1\times l_2}\), where

$$\begin{aligned} (\nabla _h x)_{i,j}= {\left\{ \begin{array}{ll} x_{i+1,j}-x_{i,j},&{} i\ne l_1 \\ x_{1,j}-x_{l_1,j},&{} i= l_1 \end{array}\right. } ,\quad (\nabla _v x)_{i,j}= {\left\{ \begin{array}{ll} x_{i,j+1}-x_{i,j},&{} j\ne l_2 \\ x_{i,1}-x_{i,l_2},&{} j= l_2 \end{array}\right. }. \end{aligned}$$
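These periodic finite differences can be implemented with circular shifts; a minimal NumPy sketch is given below (the function names are ours).

import numpy as np

def tv_gradients(x):
    """Periodic forward differences (grad_h, grad_v) of a 2-D image x, as defined above."""
    grad_h = np.roll(x, -1, axis=0) - x  # x_{i+1,j} - x_{i,j}, wrapping at i = l1
    grad_v = np.roll(x, -1, axis=1) - x  # x_{i,j+1} - x_{i,j}, wrapping at j = l2
    return grad_h, grad_v

def tv_norm(x):
    """Discrete TV norm ||grad x||_1 used in (3.8)."""
    gh, gv = tv_gradients(x)
    return np.abs(gh).sum() + np.abs(gv).sum()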

The vector \(\mathbf{x }^0\in {\mathbb {R}}^l\) is an observed image corrupted by zero-mean uniform noise; \(H\in {\mathbb {R}}^{l \times l}\) is the matrix representation of a blurry operator satisfying \(\Vert HH^{\text {T}}\Vert =1\); \(\sigma \) is a parameter indicating the uniform noise level; and \(\Vert \mathbf{x }\Vert _{\infty }:=\max _{1\le i \le l}|\mathbf{x }_i|\).

Let

$$\begin{aligned} A: = \left( \begin{array}{l} H \\ -H \end{array}\right) , \quad b: = \left( \begin{array}{l} \mathbf{x }^0-\sigma \mathbf{e } \\ -\mathbf{x }^0-\sigma \mathbf{e } \end{array}\right) , \end{aligned}$$

where \(\mathbf{e } = (1, 1,\cdots , 1)^{\text {T}} \in {\mathbb {R}}^l\). Since \(\Vert H\mathbf{x }-\mathbf{x }^0\Vert _{\infty }\le \sigma \) is equivalent to \(\mathbf{x }^0-\sigma \mathbf{e }\le H\mathbf{x }\le \mathbf{x }^0+\sigma \mathbf{e }\), model (3.8) can be reformulated as

$$\begin{aligned} \min \Big \{\Vert \nabla \mathbf{x }\Vert _1 ~\Big |~A\mathbf{x }\ge b \Big \}. \end{aligned}$$
(3.9)

We also compare the gCPPA with the CPPA and rCPPA methods on \(256\times 256\) images Peppers.png and Boat.png. The clean images are degraded by either the Gaussian (fspecial('gaussian',9,2.5)) or the out-of-focus (fspecial('disk',3)) convolution of MATLAB.

Let \(\mathbf{z }\) be the Lagrange multiplier for the linear inequality constraint \(A\mathbf{x }\ge b\). The gCPPA produces the new iterate via:

$$\begin{aligned} \left\{ \begin{array}{l} \mathbf{z }^{k+1}=P_{{\mathbb {R}}^{2l}_{+}}\Big \{ \mathbf{z }^k-\frac{\alpha }{t}(A{\mathbf{x }}^{k}-b) \Big \}, \\ \mathbf{x }^{k+1} = \arg \min \left\{ \Vert \nabla \mathbf{x }\Vert _1 + \frac{r}{2}\big \Vert \mathbf{x }-\mathbf{x }^k-\frac{1}{r}A^{\text {T}} \big [(\alpha +1)\mathbf{z }^{k+1}-\alpha \mathbf{z }^k\big ] \big \Vert ^2 ~\Big |~ \mathbf{x }\in {\mathbb {R}}^{l}\right\} . \end{array}\right. \end{aligned}$$
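Since forming A explicitly would be prohibitively large for a \(256\times 256\) image, a practical implementation applies A and \(A^{\text {T}}\) as operators. A minimal sketch of this construction together with the projected multiplier update above is given below; the blur callables H and Ht are assumed to be available, and the TV-regularized \(\mathbf{x }\)-subproblem, which typically requires an inner solver, is omitted.

import numpy as np

def make_operators(H, Ht, x_obs, sigma):
    """Operator form of A = [H; -H] and b = [x0 - sigma*e; -x0 - sigma*e] in (3.9).

    H, Ht : callables applying the blur matrix and its transpose to a vector.
    """
    l = x_obs.size
    b = np.concatenate([x_obs - sigma, -x_obs - sigma])
    A = lambda x: np.concatenate([H(x), -H(x)])
    At = lambda z: Ht(z[:l]) - Ht(z[l:])
    return A, At, b

def gcppa_multiplier_step(z, x, A, b, alpha, t):
    """gCPPA z-update above: a gradient step, then projection onto the nonnegative orthant."""
    return np.maximum(z - (alpha / t) * (A(x) - b), 0.0)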

The CPPA method also uses the above iteration with \(\alpha =1\), and the rCPPA can be written in the form of (1.7) and (1.8). We take \(R = 0.60\), \(r_0 = R\), \(t_0 = 1.02/R=1.70\) and \(\gamma = 1.80\) for the CPPA and rCPPA methods, and take \(\alpha _r=7.00\), \(\alpha _t=0.04\), \(\alpha =\sqrt{0.28}\approx 0.53\), \(r = \alpha _r R=4.20\) and \(t = 1.02\alpha _t/R\approx 0.07\) for the gCPPA. The initial points of all methods are taken as zeros, and the stopping criterion is set to

$$\begin{aligned} Tol=\frac{\Vert \mathbf{x }^{k+1}-\mathbf{x }^k\Vert }{\Vert \mathbf{x }^k\Vert }<10^{-6}. \end{aligned}$$
(3.10)

We set \(\sigma =0.5\). For easy comparison, the original images, the degraded images and the restored images are displayed in Fig. 5.

Fig. 5 Original images, degraded images with \(\sigma =0.5\) and reconstructed images by the CPPA, rCPPA and gCPPA (top row: the Gaussian convolution; bottom row: the out-of-focus convolution)

Fig. 6 Evolutions of SNR with respect to the number of iterations (Iteration No.) and computing time (CPU time) (top row: the Gaussian convolution with \(\sigma =0.5\); bottom row: the out-of-focus convolution with \(\sigma =0.5\))

The evolutions of SNR with respect to the number of iterations and the computing time are displayed in Fig. 6. It shows that both the CPPA and the gCPPA reach a high SNR in a short time. However, the CPPA converges to worse solutions than the rCPPA, while the gCPPA converges to better solutions than the rCPPA on the TV uniform noise removal problem.

The numerical results of further experiments with different \(\sigma \) are reported in Table 3 for comparison.

Table 3 Results of the CPPA, rCPPA and gCPPA with different \(\sigma \)

In Table 3, each triple \(\cdot \setminus \cdot \setminus \cdot \) reports the number of iterations, the computing time in seconds and the restored SNR when the stopping criterion (3.10) is met. From Table 3, one can conclude that the rCPPA performs better than the CPPA in SNR but worse in the number of iterations and cpu-time, while the gCPPA outperforms the CPPA in all respects: SNR, the number of iterations and cpu-time.

4 Conclusions

In this paper, we have proposed two new customized proximal point algorithms, the gCPPA and the eCPPA, obtained by modifying the prediction step. Neither algorithm involves a relaxation step, yet both retain the same convergence properties as the CPPA equipped with a relaxation step. Numerical results demonstrate that the proposed methods are more efficient than some state-of-the-art methods.