
1 Introduction

Our purpose is to find the optimal (smallest) proximal parameter for the splitting method in [11] for separable convex programming models. To expose our main idea and technique more clearly, we focus on the special convex minimization problem with linear constraints and a separable objective function that can be represented as the sum of three functions without coupled variables:

$$\begin{aligned} \min \{ \theta _1(x) + \theta _2(y) + \theta _3(z)\; | \; Ax + By +Cz=b, \; x\in \mathcal{X}, y\in \mathcal{Y}, z\in \mathcal{Z}\}, \end{aligned}$$
(1)

where \(A\in \mathfrak {R}^{m\times n_1}\), \(B\in \mathfrak {R}^{m\times n_2}\), \(C\in \mathfrak {R}^{m\times n_3}\); \(b \in \mathfrak {R}^m\); \(\mathcal{X}\subset \mathfrak {R}^{n_1}\), \(\mathcal{Y}\subset \mathfrak {R}^{n_2}\) and \(\mathcal{Z}\subset \mathfrak {R}^{n_3}\) are closed convex sets; and \(\theta _i:\mathfrak {R}^{n_i}\rightarrow \mathfrak {R}\) (\(i=1,2,3\)) are closed convex but not necessarily smooth functions. Such a model may arise from a concrete application in which one of the functions represents a data-fidelity term while the other two account for various regularization terms. We refer to, e.g., [16, 20,21,22,23], for some applications of (1). The solution set of (1) is assumed to be nonempty throughout.

To recall the splitting method in [11] for the model (1), we start from the augmented Lagrangian method (ALM) that was originally proposed in [15, 18]. Let the Lagrangian and augmented Lagrangian functions of (1) be given, respectively, by

$$\begin{aligned} L(x,y,z,\lambda )= \theta _1(x) + \theta _2(y) + \theta _3(z)- \lambda ^T(Ax +By +Cz -b), \end{aligned}$$
(2)

and

$$\begin{aligned} \mathcal{L}_{\beta }(x,y,z,\lambda )&= \theta _1(x) + \theta _2(y) \nonumber \\&\quad + \theta _3(z) - \lambda ^T(Ax +By +Cz -b) + \frac{\beta }{2} \Vert Ax + By +Cz -b\Vert ^2. \end{aligned}$$
(3)

In (2) and (3), \(\lambda \in \mathfrak {R}^m\) is the Lagrange multiplier; and in (3), \(\beta >0\) is the penalty parameter. When the three-block separable convex minimization model (1) is purposively regarded as a generic convex minimization model and its objective function is treated as a whole, the ALM in [15, 18] can be applied directly and the resulting iterative scheme is

$$\begin{aligned} \left\{ \begin{array}{rcl} (x^{k+1},y^{k+1},z^{k+1}) &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y,z,\lambda ^k) \; | \; x\in \mathcal{X}, y\in \mathcal{Y}, z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b). \end{array} \right. \end{aligned}$$
(4)

If two functions in the objective are treated together and two variables in the constraints are grouped accordingly, the alternating direction method of multipliers (ADMM) in [5] can also be directly applied to (1). The resulting iterative scheme reads as

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ (y^{k+1},z^{k+1}) &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k+1},y,z,\lambda ^k) \; | \; y\in \mathcal{Y}, z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b). \end{array} \right. \end{aligned}$$
(5)

Unless the functions and/or coefficient matrices in (1) are special enough, direct applications of the ALM (1.4) and the ADMM (1.5) are usually not preferred, because the (x, y, z)-subproblem in (1.4a) and the (y, z)-subproblem in (1.5b) may still be too difficult (even when the functions \(\theta _i\) per se are relatively easy). Therefore, the three-block model (1) should generally not be treated as a one-block or two-block case, and the ALM (1.4) or the ADMM (1.5) should not be applied directly.

On the other hand, for specific applications of the model (1), functions in its objective usually have their own physical explanations and mathematical properties. Thus, it is usually necessary to treat them individually to design more efficient algorithms. More accurately, we are interested in such an algorithm that handles these functions \(\theta _i\) individually in its iterative scheme. A natural idea is to split the subproblem in the original ALM (1.4) in the Jacobian or Gaussian manner; the corresponding schemes are as follows:

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ y^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k},y,z^k,\lambda ^k) \; | \; y\in \mathcal{Y} \bigr \},\\ z^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k},y^{k},z,\lambda ^k) \; | \; z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b), \end{array} \right. \end{aligned}$$
(6)

and

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y^k,z^k,\lambda ^k) \; \big | \; x\in \mathcal{X} \bigr \},\\ y^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k+1},y,z^k,\lambda ^k) \; \big | \; y\in \mathcal{Y} \bigr \},\\ z^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k+1},y^{k+1},z,\lambda ^k) \; \big | \; z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b). \end{array} \right. \end{aligned}$$
(7)

All the subproblems in (6) and (7) are easier than the original problem (1); each of the x-, y-, and z-subproblems involves only one function from the objective of (1) plus a quadratic term. However, as shown in [1, 8], neither (6) nor (7) is necessarily convergent. Therefore, although schemes such as (6) and (7) are easy to generate, the lack of convergence calls for more meticulous theoretical study and algorithmic design techniques for the three-block case (1). The results in [1, 8] also show that designing augmented-Lagrangian-based splitting algorithms for the three-block case (1) differs significantly from the one- or two-block case, and the three-block case needs to be discussed separately despite the rich literature on the ALM and ADMM.
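To make the updates in (6) concrete, the following is a minimal numerical sketch of one Jacobian sweep, under assumptions made only here (not in the paper): \(\theta _i(u)=\frac{1}{2}\Vert u\Vert ^2\) and \(\mathcal{X}=\mathcal{Y}=\mathcal{Z}=\mathfrak {R}^n\), so each subproblem has a closed-form solution. It only illustrates the update formulas; it says nothing about convergence.

```python
import numpy as np

# One Jacobian sweep of scheme (6) for the toy instance
# theta_i(u) = ||u||^2/2, X = Y = Z = R^n (our assumption),
# so each subproblem is an unconstrained quadratic program.
rng = np.random.default_rng(0)
m = n = 4
A, B, C = (rng.standard_normal((m, n)) for _ in range(3))
b = rng.standard_normal(m)
beta = 1.0

x, y, z, lam = np.zeros(n), np.zeros(n), np.zeros(n), np.zeros(m)

def block_min(M, rest):
    # argmin_u { ||u||^2/2 - lam^T M u + (beta/2) ||M u + rest - b||^2 }
    return np.linalg.solve(np.eye(n) + beta * M.T @ M,
                           M.T @ (lam - beta * (rest - b)))

# Jacobian manner: every block uses the OLD iterates of the other blocks
x_new = block_min(A, B @ y + C @ z)
y_new = block_min(B, A @ x + C @ z)
z_new = block_min(C, A @ x + B @ y)
lam_new = lam - beta * (A @ x_new + B @ y_new + C @ z_new - b)

# stationarity residuals of the x- and y-subproblems at the new points
grad_x = x_new - A.T @ lam + beta * A.T @ (A @ x_new + B @ y + C @ z - b)
grad_y = y_new - B.T @ lam + beta * B.T @ (A @ x + B @ y_new + C @ z - b)
```

The Gaussian sweep (7) differs only in that `y_new` and `z_new` would use `x_new` (and `z_new` additionally `y_new`) in place of the old iterates.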

Despite their lack of convergence, the schemes (6) and (7) may work well empirically; see, e.g., [20, 22, 23]. It is thus interesting to design an augmented-Lagrangian-based splitting method whose iterative scheme is analogous to (6), (7), or a fusion of both, while both its theoretical convergence and empirical efficiency can be ensured. The method in [11] is such a method; its iterative scheme for (1) reads as

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ y^{k+1} &{} = &{}\arg \min \bigl \{ \theta _2(y) - y^TB^T\lambda ^k + \frac{\mu \beta }{2}\bigl \Vert B(y-y^k) + \frac{1}{\mu }(Ax^{k+1}+By^k+Cz^k-b)\bigr \Vert ^2 \; | \; y\in \mathcal{Y} \bigr \},\\ z^{k+1} &{} = &{}\arg \min \bigl \{ \theta _3(z) - z^TC^T\lambda ^k + \frac{\mu \beta }{2}\bigl \Vert C(z-z^k) + \frac{1}{\mu }(Ax^{k+1}+By^k+Cz^k-b)\bigr \Vert ^2 \; | \; z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b), \end{array} \right. \end{aligned}$$
(8)

where the parameter \(\mu \) is required to satisfy \(\mu \ge 2\) in [11]. The scheme (1.8) is simple in the sense that each of the x-, y-, and z-subproblems involves just one function from (1) in its objective. Its efficiency has been verified in [11] on some sparse and low-rank models and image inpainting problems. It was also used in [2] for solving a dimensionality reduction problem on physical space.

It is easy to see that the scheme (1.8) can be rewritten as

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ y^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k+1},y,z^k,\lambda ^k) + \frac{\tau \beta }{2}\Vert B(y-y^k)\Vert ^2 \; | \; y\in \mathcal{Y} \bigr \},\\ z^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}_{\beta }(x^{k+1},y^k,z,\lambda ^k) + \frac{\tau \beta }{2}\Vert C(z-z^k)\Vert ^2 \; | \; z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - \beta (A x^{k+1} + By^{k+1} + Cz^{k+1} -b), \end{array} \right. \end{aligned}$$
(9)

with \(\tau =\mu -1\) and thus \(\tau \ge 1\), as shown in [11]. The scheme (1.9) shows more clearly that it is a mixture of the augmented-Lagrangian-based splitting schemes (6) and (7): the x- and (y, z)-subproblems are updated in alternating order, while the (y, z)-subproblem is further split so that the resulting y- and z-subproblems can be computed in parallel. Recalling the lack of convergence of (6) and (7), it is necessary to regularize the split y- and z-subproblems in (1.9) appropriately to ensure convergence. Indeed, the terms \(\frac{\tau \beta }{2}\Vert B(y-y^k)\Vert ^2\) and \(\frac{\tau \beta }{2}\Vert C(z-z^k)\Vert ^2\) in (1.9) can be regarded as proximal regularization terms with \(\tau \) as the proximal parameter.

On the other hand, with \(\beta \) fixed, the proximal parameter \(\tau \) determines the weight of the proximal terms in the subproblems (1.9b), and its reciprocal plays the role of a step size for an algorithm implemented internally to solve the subproblems (1.9b). We hence prefer smaller values of \(\tau \) whenever the convergence of (1.9) can be theoretically guaranteed. As mentioned, we have shown in [11] that the condition \(\tau \ge 1\) is sufficient to ensure the convergence of (1.9). Numerically, however, as shown in [11] and also in [2] (see Section V, Part B, Pages 3247–3248), values of \(\tau \) very close to 1 are preferred. For example, \(\mu =2.01\), i.e., \(\tau =1.01\), was recommended in [11] and used in [2], resulting in faster convergence. This raises the question of seeking the optimal (smallest) value of \(\tau \) that can ensure the convergence of (1.9). The main purpose of this paper is to prove rigorously that the optimal value of \(\tau \) for the method (1.9) is 0.5. That is, any \(\tau >0.5\) ensures the convergence of (1.9), yet any \(\tau \in (0,0.5)\) yields divergence.

Note that, because of our analysis in [1], we can assume \(\beta \equiv 1\) without loss of generality. That is, the augmented Lagrangian function defined in (3) is reduced to

$$\begin{aligned} {\mathcal {L}}(x,y,z,\lambda )&= \theta _1(x) + \theta _2(y) +\theta _3(z) \nonumber \\&\quad - \lambda ^T(Ax +By +Cz-b) + \frac{1}{2} \Vert Ax + By +Cz -b\Vert ^2; \end{aligned}$$
(10)

and the iterative scheme of (1.9) is now simplified as

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ y^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}(x^{k+1},y,z^k,\lambda ^k) + \frac{\tau }{2}\Vert B(y-y^k)\Vert ^2 \; | \; y\in \mathcal{Y} \bigr \},\\ z^{k+1} &{} = &{}\arg \min \bigl \{\mathcal{L}(x^{k+1},y^k,z,\lambda ^k) + \frac{\tau }{2}\Vert C(z-z^k)\Vert ^2 \; | \; z\in \mathcal{Z} \bigr \},\\ {\lambda }^{k+1} &{} = &{} \lambda ^k - (A x^{k+1} + By^{k+1} + Cz^{k+1} -b). \end{array} \right. \end{aligned}$$
(11)
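The scheme (1.11) can be sketched numerically. The following minimal example assumes (only for illustration, not from the paper) \(\theta _i(u)=\frac{1}{2}\Vert u\Vert ^2\) and \(\mathcal{X}=\mathcal{Y}=\mathcal{Z}=\mathfrak {R}^n\), so that every subproblem has a closed-form solution; on such a small random instance the iterates are expected to approach feasibility.

```python
import numpy as np

# Toy run of scheme (1.11) with theta_i(u) = ||u||^2/2 and
# X = Y = Z = R^n (our assumption); beta = 1 as in (10).
rng = np.random.default_rng(1)
m, n = 5, 3
A, B, C = (rng.standard_normal((m, n)) for _ in range(3))
b = rng.standard_normal(m)
tau = 0.51                              # any tau > 0.5, per the main result

y, z, lam = np.zeros(n), np.zeros(n), np.zeros(m)
for _ in range(2000):
    # (1.11a): x-subproblem with (y^k, z^k) fixed
    x = np.linalg.solve(np.eye(n) + A.T @ A,
                        A.T @ (lam + b - B @ y - C @ z))
    # (1.11b): y- and z-subproblems in parallel, each with its proximal term
    y_new = np.linalg.solve(np.eye(n) + (1 + tau) * B.T @ B,
                            B.T @ (lam + b - A @ x - C @ z) + tau * B.T @ B @ y)
    z_new = np.linalg.solve(np.eye(n) + (1 + tau) * C.T @ C,
                            C.T @ (lam + b - A @ x - B @ y) + tau * C.T @ C @ z)
    y, z = y_new, z_new
    # (1.11c): multiplier update
    lam = lam - (A @ x + B @ y + C @ z - b)

residual = np.linalg.norm(A @ x + B @ y + C @ z - b)
```

At a limit point, the first-order conditions \(x = A^T\lambda \), \(y = B^T\lambda \), \(z = C^T\lambda \) of this toy instance should hold approximately, which serves as a sanity check of the closed-form updates.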

The rest of this paper is organized as follows. We recall some preliminaries in Sect. 2. In Sect. 3, we show why positive indefiniteness occurs in the proximal regularization for the scheme (1.11) when \(\tau > 0.5\). Then, we provide an explanation of (1.11) in the prediction-correction framework in Sect. 4, and focus in Sect. 5 on analyzing an important quadratic term that is the key to the convergence analysis for (1.11). The convergence of (1.11) with \(\tau > 0.5\) is proved in Sect. 6, and the divergence of (1.11) with \(\tau \in (0,0.5)\) is shown by an example in Sect. 7. We estimate the worst-case convergence rate in terms of iteration complexity for the scheme (1.11) in Sect. 8. Finally, we draw some conclusions in Sect. 9.

2 Preliminaries

In this section, we recall some preliminary results for further analysis. First of all, a pair \(\bigl ((x^*,y^*,z^*),\lambda ^*\bigr )\) is called a saddle point of the Lagrangian function defined in (2) if it satisfies the inequalities

$$ L(x^*,y^*, z^*,\lambda ) \le L(x^*,y^*,z^*,\lambda ^*) \le L(x,y,z,\lambda ^*), \quad \forall \, \lambda \in \mathfrak {R}^m, \; (x,y,z)\in \mathcal{X}\times \mathcal{Y}\times \mathcal{Z}. $$

Or, we can rewrite these inequalities as

$$\begin{aligned} \left\{ \begin{array}{l} x^* =\arg \min \{ L(x,y^*,z^*,\lambda ^*) \, |\, x\in {\mathcal {X}} \}, \\ y^* =\arg \min \{ L(x^*,y,z^*,\lambda ^*) \, |\, y\in {\mathcal {Y}} \}, \\ z^* =\arg \min \{ L(x^*,y^*,z,\lambda ^*) \, |\, z\in {\mathcal {Z}} \}, \\ \lambda ^*= \arg \max \{ L(x^*,y^*,z^*,\lambda ) \, |\, \lambda \in \mathfrak {R}^m \}. \end{array} \right. \end{aligned}$$
(12)

Indeed, a saddle point of the Lagrangian function defined in (2) can also be characterized by the following variational inequality:

$$\begin{aligned} \left\{ \begin{array}{lrl} x^*\in \mathcal{X}, &{} \theta _1(x) - \theta _1(x^*) + (x-x^*)^T(- A^T\lambda ^*) \ge 0, &{} \forall \, x\in \mathcal{X}, \\ y^*\in \mathcal{Y}, &{} \theta _2(y) - \theta _2(y^*) + (y-y^*)^T(- B^T\lambda ^*) \ge 0, &{} \forall \, y\in \mathcal{Y}, \\ z^*\in \mathcal{Z}, &{} \theta _3(z) - \theta _3(z^*) + (z-z^*)^T(- C^T\lambda ^*) \ge 0, &{} \forall \, z\in \mathcal{Z}, \\ \lambda ^*\in \mathfrak {R}^m, &{}(\lambda -\lambda ^*)^T(Ax^*+ By^* + Cz^*-b)\ge 0, &{} \forall \; \lambda \in \mathfrak {R}^m. \end{array} \right. \end{aligned}$$
(13)

We call (x, y, z) and \(\lambda \) the primal and dual variables, respectively.

The optimality condition of the model (1) can be characterized by the monotone variational inequality:

$$\begin{aligned} w^*\in \Omega , \quad \theta (u) -\theta (u^*) + (w-w^*)^T F(w^*) \ge 0, \quad \forall \, w\in \Omega , \end{aligned}$$
(14a)

where

$$\begin{aligned} u = \left( \begin{array}{c} x\\ y \\ z \end{array} \right) , \; \theta (u) =\theta _1(x)+\theta _2(y) + \theta _3(z), \; w = \left( \begin{array}{c} x\\ y \\ z\\ \lambda \end{array} \right) , \; F(w) =\left( \begin{array}{c} - A^T\lambda \\ - B^T\lambda \\ - C^T\lambda \\ Ax + By +Cz-b \end{array} \right) \quad \end{aligned}$$
(14b)

and

$$ \Omega = \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \times \mathfrak {R}^m. $$

We denote by \(\Omega ^*\) the solution set of (14). Note that the operator F in (14b) is affine with a skew-symmetric matrix and thus we have

$$\begin{aligned} (w-\bar{w})^T(F(w)-F(\bar{w}))=0, \quad \forall \, w,\bar{w}. \end{aligned}$$
(15)
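The skew-symmetry identity (15) can be verified numerically; the sketch below builds F from (14b) for random data (all dimensions and matrices are arbitrary placeholders) and checks that the inner product vanishes.

```python
import numpy as np

# Numerical check of (15): the linear part of F in (14b) is
# skew-symmetric, so (w - w_bar)^T (F(w) - F(w_bar)) = 0.
rng = np.random.default_rng(2)
m, n1, n2, n3 = 4, 3, 2, 5
A = rng.standard_normal((m, n1))
B = rng.standard_normal((m, n2))
C = rng.standard_normal((m, n3))
b = rng.standard_normal(m)

def F(x, y, z, lam):
    # the operator F defined in (14b), stacked as one vector
    return np.concatenate([-A.T @ lam, -B.T @ lam, -C.T @ lam,
                           A @ x + B @ y + C @ z - b])

w     = [rng.standard_normal(k) for k in (n1, n2, n3, m)]
w_bar = [rng.standard_normal(k) for k in (n1, n2, n3, m)]
diff  = np.concatenate(w) - np.concatenate(w_bar)
inner = diff @ (F(*w) - F(*w_bar))   # should be (numerically) zero
```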

3 The Positive Indefiniteness of (1.11) with \(\tau >0.5\)

In this section, we revisit the scheme (1.11) from the variational inequality perspective and show that it can be represented as a proximal version of the direct application of the ADMM (1.5), but with a proximal regularization term that is not positive semidefinite for the case of \(\tau \in (0.5,1)\). The positive indefiniteness of the proximal regularization excludes the application of a vast set of known convergence results in the literature on the ADMM and its proximal versions, because they all require positive definiteness or semidefiniteness (plus additional assumptions on the model (1)) of the proximal regularization term to validate the convergence analysis.

Let us first take a look at the optimality conditions of the subproblems in (1.11). Note that the subproblems in (1.11b) are specified as

$$\begin{aligned} y^{k+1}&=\arg \min \bigl \{ \theta _2(y) - y^TB^T\lambda ^k \nonumber \\&+ \frac{1}{2}\Vert Ax^{k+1} +By + Cz^k-b\Vert ^2 + \frac{\tau }{2} \Vert B(y-y^k)\Vert ^2 \;|\; y\in \mathcal{Y}\bigr \}, \end{aligned}$$
(16a)

and

$$\begin{aligned} z^{k+1}&= \arg \min \bigl \{ \theta _3(z) - z^TC^T\lambda ^k \nonumber \\&+ \frac{1}{2}\Vert Ax^{k+1} +By^k + Cz-b\Vert ^2 + \frac{\tau }{2} \Vert C(z-z^k)\Vert ^2 \;|\; z\in \mathcal{Z}\bigr \}. \end{aligned}$$
(16b)

Thus, the optimality condition of the y-subproblem in (1.11b) can be written as \(y^{k+1}\in \mathcal{Y}\) and

$$ \theta _2(y) - \theta _2(y^{k+1})+(y-y^{k+1})^T \left( \!\!\begin{array}{c} -B^T\lambda ^k + B^T(Ax^{k+1} + By^{k+1} + Cz^k -b) \\ + \tau B^TB (y^{k+1}-y^k) \end{array} \!\! \right) \ge 0, \;\forall y\in \mathcal{Y}; $$

or equivalently: \(y^{k+1}\in \mathcal{Y}\) and

$$\begin{aligned} \theta _2(y) - \theta _2(y^{k+1})+(y-y^{k+1})^T \left( \!\!\begin{array}{c} -B^T\lambda ^k + B^T(Ax^{k+1} + By^{k+1} + Cz^{k+1} -b) \\ \tau B^TB (y^{k+1}-y^k) - B^TC(z^{k+1} - z^k) \end{array} \!\! \right) \ge 0, \;\forall y\in \mathcal{Y}. \end{aligned}$$
(17a)

Similarly, the optimality condition of the z-subproblem in (1.11b) can be written as \(z^{k+1}\in \mathcal{Z}\) and

$$\begin{aligned} \theta _3(z) - \theta _3(z^{k+1})+(z-z^{k+1})^T \left( \!\!\begin{array}{c} -C^T\lambda ^k + C^T(Ax^{k+1} + By^{k+1} + Cz^{k+1} -b) \\ - C^TB(y^{k+1} - y^k) + \tau C^TC (z^{k+1}-z^k) \end{array} \!\! \right) \ge 0, \;\forall z\in \mathcal{Z}. \end{aligned}$$
(17b)

Then, with (1.11c), we can rewrite the inequalities (17a) and (17b) as \( (y^{k+1}, z^{k+1})\in \mathcal{Y}\times \mathcal{Z}\) and

$$\begin{aligned}&\left( \!\!\begin{array}{c} \theta _2(y) - \theta _2(y^{k+1})\\ \theta _3(z) - \theta _3(z^{k+1}) \end{array} \!\!\right) + \left( \!\!\begin{array}{c} y -y^{k+1}\\ z - z^{k+1} \end{array} \!\!\right) ^T\!\! \left\{ \left( \!\! \begin{array}{c} - B^T\lambda ^{k+1} \\ -C^T\lambda ^{k+1} \end{array}\!\! \right) + {{D_0}} \left( \begin{array}{c} y^{k+1}-y^k \\ z^{k+1} - z^k \end{array}\right) \right\} \nonumber \\&\,\, \quad \ge \,\, 0, \;\forall \, (y, z)\in \mathcal{Y}\times \mathcal{Z}, \end{aligned}$$
(18)

where

$$\begin{aligned} {{D_0}} = \left( \! \begin{array}{cc} \tau B^TB &{}- B^TC \\ -C^TB &{} \tau C^TC \end{array}\! \right) . \end{aligned}$$
(19)

Obviously, \(D_0\) is positive semidefinite when \(\tau \ge 1\) and indefinite when \(\tau \in (0,1)\).
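This definiteness claim is easy to spot-check numerically; the sketch below assembles \(D_0\) from (19) for random square B and C (an arbitrary choice made here) and inspects its eigenvalues for \(\tau =1\) and \(\tau =0.6\).

```python
import numpy as np

# Spot-check of the definiteness of D_0 in (19): positive semidefinite
# at tau = 1 (the form equals ||By - Cz||^2), indefinite at tau = 0.6.
rng = np.random.default_rng(3)
m = 4
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, m))

def D0(tau):
    return np.block([[tau * B.T @ B, -B.T @ C],
                     [-C.T @ B, tau * C.T @ C]])

eigs_semidef = np.linalg.eigvalsh(D0(1.0))   # all >= 0 (up to round-off)
eigs_indef   = np.linalg.eigvalsh(D0(0.6))   # mixed signs
```

For generic square B and C the direction \(By = Cz\) exists, along which the quadratic form equals \(2(\tau -1)\Vert Cz\Vert ^2 < 0\) for \(\tau < 1\), which is exactly the negative direction the eigenvalue check detects.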

Then, it is easy to see that the scheme (1.11) can be rewritten as

$$\begin{aligned} \left\{ \begin{array}{l} x^{k+1} = \arg \min \bigl \{\mathcal{L}(x,y^k,z^k,\lambda ^k) \; | \; x\in \mathcal{X} \bigr \},\\ (y^{k+1},z^{k+1}) = \arg \min \Bigl \{\mathcal{L}(x^{k+1},y,z,\lambda ^k) + \frac{1}{2} \left\| \begin{array}{c} y-y^k \\ z-z^k \end{array} \right\| _{D_0}^2 \; \Big | \; y\in \mathcal{Y}, z\in \mathcal{Z} \Bigr \},\\ {\lambda }^{k+1} = \lambda ^k - (A x^{k+1} + By^{k+1} + Cz^{k+1} -b). \end{array} \right. \end{aligned}$$
(20)

Comparing (3.5b) with (1.5b) (note that \(\beta =1\)), we see that the scheme (1.11) can be symbolically represented as a proximal version of (1.5) in which the (y, z)-subproblem is proximally regularized. The difficulty is that \(D_0\) defined in (19) is not positive semidefinite when \(\tau \in (0.5,1)\). Indeed, our analysis in [11] requires \(\tau \ge 1\), which ensures the positive semidefiniteness of \(D_0\). In that case, the convergence analysis is relatively easy because it can follow techniques used for the proximal point algorithm, which originated in [17, 19]. When \(\tau \) is relaxed to \(\tau >0.5\), so that the matrix \(D_0\) in (19) may be positive indefinite, the analysis in [11] and elsewhere in the literature is not applicable, and more sophisticated techniques are needed to prove the convergence of the scheme (1.11) with \(\tau >0.5\).

4 A Prediction-Correction Explanation of (1.11)

In this section, we show that the scheme (1.11) can be expressed in a prediction-correction framework. This explanation is only for the convenience of theoretical analysis; there is no need to follow the prediction-correction framework when implementing the scheme (1.11).

From the scheme (1.11), we see that \(x^{k}\) is not needed to generate the \((k+1)\)-th iterate; only \((y^k, z^k,\lambda ^k)\) is needed. Thus, we call x the intermediate variable and \((y,z,\lambda )\) the essential variables. To distinguish their roles, in addition to the notation in (14b), we define the notation

$$\begin{aligned} v= \left( \begin{array}{c} y\\ z\\ \lambda \end{array} \right) , \quad {\mathcal {V}}= {\mathcal {Y}}\times {\mathcal {Z}}\times \mathfrak {R}^m \quad \hbox {and} \quad {\mathcal {V}}^* = \{ (y^*,z^*,\lambda ^*) \, |\, (x^*,y^*, z^*,\lambda ^*) \in \Omega ^*\} . \end{aligned}$$
(21)

Moreover, we introduce the auxiliary variables \(\tilde{w}^k= (\tilde{x}^k,\tilde{y}^k,\tilde{z}^k,\tilde{\lambda }^k)\) defined by

$$\begin{aligned} \tilde{x}^k =x^{k+1} , \quad \tilde{y}^k =y^{k+1}, \quad \tilde{z}^k= z^{k+1} \quad \hbox {and} \quad \tilde{\lambda }^k = \lambda ^k- (Ax^{k+1} + By^k +Cz^k-b), \end{aligned}$$
(22)

where \((x^{k+1},y^{k+1},z^{k+1})\) is the iterate generated by the scheme (1.11) from the given one \((y^k,z^k,\lambda ^k)\). Using these notations, we have

$$\begin{aligned} \lambda ^{k+1}= & {} \lambda ^k - (A x^{k+1} + By^{k+1} + Cz^{k+1} -b) \nonumber \\= & {} [\lambda ^k - (A x^{k+1} + By^{k} + Cz^{k} -b) ] +B (y^k-y^{k+1}) + C(z^k - z^{k+1}) \nonumber \\= & {} \tilde{\lambda }^k + B (y^k-\tilde{y}^k) + C(z^k - \tilde{z}^k). \end{aligned}$$
(23)

Now, we interpret the optimality conditions of the subproblems in (1.11) using the auxiliary variables \(\tilde{w}^k\). First, ignoring some constant terms, the subproblem (1.11a) is equivalent to

$$ x^{k+1} =\arg \min \bigl \{ \theta _1(x) -x^TA^T\lambda ^k +\frac{1}{2}\Vert Ax +By^k +Cz^k-b\Vert ^2 \;|\; x\in \mathcal{X} \bigr \}; $$

and its optimality condition can be rewritten as

$$\begin{aligned} \tilde{x}^k\in \mathcal{X}, \;\, \theta _1(x) - \theta _1(\tilde{x}^k) + (x-\tilde{x}^k)^T( -A^T\tilde{\lambda }^k)\ge 0, \;\forall \; x\in \mathcal{X}.\end{aligned}$$
(24a)

Using (23), \(y^{k+1}=\tilde{y}^k\) and \(z^{k+1}=\tilde{z}^k\), the inequalities (17a) and (17b) can be written as

$$ \tilde{y}^k\in \mathcal{Y}, \;\; \theta _2(y) - \theta _2(\tilde{y}^k) + (y - \tilde{y}^k)^T\{ -B^T\tilde{\lambda }^k + (1+\tau ) B^TB (\tilde{y}^k-y^k) \} \ge 0, \;\; \forall \, y\in \mathcal{Y}$$

and

$$ \tilde{z}^k\in \mathcal{Z}, \;\; \theta _3(z) - \theta _3(\tilde{z}^k) + (z - \tilde{z}^k)^T\{ -C^T\tilde{\lambda }^k + (1+\tau ) C^TC (\tilde{z}^k-z^k) \} \ge 0, \;\; \forall \, z\in \mathcal{Z},$$

respectively. Thus, the inequality (18) becomes \((\tilde{y}^k,\tilde{z}^k)\in \mathcal{Y}\times \mathcal{Z}\) and

$$\begin{aligned}&\left( \!\!\begin{array}{c} \theta _2(y) - \theta _2(\tilde{y}^{k})\\ \theta _3(z) - \theta _3(\tilde{z}^{k}) \end{array} \!\!\right) + \left( \!\!\begin{array}{c} y -\tilde{y}^{k}\\ z - \tilde{z}^{k} \end{array} \!\!\right) ^T\!\! \left\{ \left( \!\! \begin{array}{c} - B^T \\ -C^T \end{array}\!\! \right) \tilde{\lambda }^{k} \right. + \qquad \quad \nonumber \\&\qquad \qquad \qquad \qquad \quad + \left. (1 +\tau ) \left( \! \begin{array}{cc} B^TB &{} 0 \\ 0 &{} C^TC \end{array}\! \right) \! \! \left( \begin{array}{c} \tilde{y}^{k} - y^k \\ \tilde{z}^{k} - z^k \end{array}\right) \right\} \ge 0, \; \forall (y, z)\in \mathcal{Y}\times \mathcal{Z} .\qquad \quad \end{aligned}$$
(24b)

Note that the equality \(\tilde{\lambda }^k = \lambda ^k- (Ax^{k+1} + By^k + Cz^k -b)\) in (22) can be written in the variational inequality form

$$\begin{aligned} \tilde{\lambda }^k \in {\mathfrak {R}^m}, \;\;(\lambda - \tilde{\lambda }^k)^T\{(A\tilde{x}^k + B\tilde{y}^k +C\tilde{z}^k-b) - B(\tilde{y}^k - y^k) - C(\tilde{z}^k - z^k) + (\tilde{\lambda }^k-\lambda ^k)\}\ge 0, \;\;\forall \lambda \in {\mathfrak {R}^m}. \end{aligned}$$
(24c)

Therefore, it follows from the inequalities (24a), (24b) and (24c) that the auxiliary variable \(\tilde{w}^k= (\tilde{x}^k,\tilde{y}^k,\tilde{z}^k,\tilde{\lambda }^k)\) defined in (22) satisfies the following variational inequality.

$$\begin{aligned} \tilde{w}^k\in \Omega , \quad \theta (u) -\theta (\tilde{u}^k) + (w-\tilde{w}^k)^T F(\tilde{w}^k) \ge (v-\tilde{v}^k)^T Q (v^k -\tilde{v}^k), \quad \forall \, w\in \Omega , \end{aligned}$$
(25a)

where

$$\begin{aligned} Q = \left( \begin{array}{ccc} (1+\tau ) B^TB &{} 0 &{} 0 \\ 0 &{} (1+\tau ) C^TC &{} 0 \\ -B &{} -C &{} I_m \end{array} \right) . \end{aligned}$$
(25b)

We call the auxiliary variable \(\tilde{w}^k= (\tilde{x}^k,\tilde{y}^k,\tilde{z}^k,\tilde{\lambda }^k)\) the predictor. Using (23), the update (1.11c) can be represented as

$$ \lambda ^{k+1}=\lambda ^k - (A x^{k+1}+ By^{k+1}+ Cz^{k+1} -b)= \lambda ^k -[- B(y^k-\tilde{y}^{k}) - C(z^k-\tilde{z}^{k}) + (\lambda ^k-\tilde{\lambda }^k)]. $$

Recall that v defined in (21) collects the essential variables of the scheme (1.11). The new essential variables \(v^{k+1} =(y^{k+1}, z^{k+1},\lambda ^{k+1})\) are updated by the following scheme:

$$\begin{aligned} v^{k+1} = v^k - M(v^k - \tilde{v}^k), \end{aligned}$$
(26a)

where

$$\begin{aligned} M = \left( \begin{array}{ccc} I &{} 0 &{} 0 \\ 0 &{} I &{} 0 \\ -B &{} -C &{} I_m \end{array} \right) . \end{aligned}$$
(26b)

Overall, the scheme (1.11) can be explained as a prediction-correction framework which generates a predictor characterized by the step (4.5) and then corrects it by the step (4.6). As we shall show, the inequality (4.5) indicates the discrepancy between \({\tilde{w}}^k\) and a solution point of the variational inequality (14), and it plays an important role in the convergence analysis for the scheme (1.11). Indeed, we can investigate the inequality (4.5) further and derive a new right-hand side that is better suited for establishing the convergence. For this purpose, we define the matrix

$$\begin{aligned} H = \left( \begin{array}{ccc} (1+\tau ) B^TB &{} 0 &{} 0 \\ 0 &{} (1+\tau ) C^TC &{} 0 \\ 0 &{} 0&{} I_m \end{array} \right) , \end{aligned}$$
(27)

which is positive definite for any \(\tau >0\) when B and C both have full column rank. Then, for the matrices Q and M defined in (4.5b) and (4.6b), respectively, it obviously holds that

$$\begin{aligned} Q = HM. \end{aligned}$$
(28)
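The relation (28), together with the reduction of \(Q^T+Q-M^THM\) to the block-diagonal matrix in (35) below, can be checked numerically. The sketch uses H from (27) and the correction matrix M with blocks \([I,0,0;\,0,I,0;\,-B,-C,I_m]\), as read off from the computation of \(M^THM\) in this section; dimensions and data are arbitrary.

```python
import numpy as np

# Numerical check of (28) and (35): Q = H M, and
# G = Q^T + Q - M^T H M is block diagonal with blocks
# [tau B^T B, -B^T C; -C^T B, tau C^T C] and I_m.
rng = np.random.default_rng(4)
m, n = 4, 3
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))
tau = 0.7
I_n, I_m, Z = np.eye(n), np.eye(m), np.zeros

H = np.block([[(1 + tau) * B.T @ B, Z((n, n)), Z((n, m))],
              [Z((n, n)), (1 + tau) * C.T @ C, Z((n, m))],
              [Z((m, n)), Z((m, n)), I_m]])
M = np.block([[I_n, Z((n, n)), Z((n, m))],
              [Z((n, n)), I_n, Z((n, m))],
              [-B, -C, I_m]])
Q = H @ M
G = Q.T + Q - M.T @ H @ M
G_expected = np.block([[tau * B.T @ B, -B.T @ C, Z((n, m))],
                       [-C.T @ B, tau * C.T @ C, Z((n, m))],
                       [Z((m, n)), Z((m, n)), I_m]])
```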

In the following theorem, we further analyze the right-hand side of (4.5) and show more explicitly how the proof of the convergence of (1.11) with \(\tau > 0.5\) differs from that with \(\tau \ge 1\) in [11].

Theorem 25.1

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then, \(\tilde{w}^k\in \Omega \) and

$$\begin{aligned} \theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(w) \ge \frac{1}{2} \bigl (\Vert v-v^{k+1}\Vert _H^2 -\Vert v-v^k\Vert _H^2 \bigr ) + \frac{1}{2}(v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k), \; \forall w\in \Omega , \end{aligned}$$
(29)

where

$$\begin{aligned} G= Q^T + Q - M^THM. \end{aligned}$$
(30)

Proof

Using \(Q=HM\) (see (28)) and the relation (4.6a), the right-hand side of (4.5a) can be written as

$$ (v- \tilde{v}^k)^T H(v^k - v^{k+1}),$$

and hence we have

$$\begin{aligned} \theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k )^T F(\tilde{w}^k) \ge (v- \tilde{v}^k)^T H(v^k - v^{k+1}), \; \forall w\in \Omega . \end{aligned}$$
(31)

Applying the identity

$$ (a-b)^TH(c-d) = \frac{1}{2} \{\Vert a-d\Vert _H^2 -\Vert a-c\Vert _H^2\} +\frac{1}{2} \{\Vert c-b\Vert _H^2 -\Vert d-b\Vert _H^2\}, $$

to the right-hand side of (31) with

$$ a=v, \quad b=\tilde{v}^k, \quad c=v^k , \quad \hbox {and} \quad d=v^{k+1}, $$

we obtain

$$\begin{aligned} (v- \tilde{v}^k)^T H(v^k - v^{k+1}) = \frac{1}{2} \bigl ( \Vert v-v^{k+1}\Vert _H^2 - \Vert v-v^k\Vert _H^2 \bigr ) + \frac{1}{2}(\Vert v^k-\tilde{v}^k\Vert _H^2\!\! -\!\! \Vert v^{k+1}-\tilde{v}^k\Vert _H^2) . \end{aligned}$$
(32)

For the last term of (32), we have

$$\begin{aligned}&\Vert v^k-\tilde{v}^k\Vert _H^2 - \Vert v^{k+1}-\tilde{v}^k\Vert _H^2 \nonumber \\&\quad \,\,\, = \quad \,\,\, \Vert v^k-\tilde{v}^k\Vert _H^2 - \Vert (v^k-\tilde{v}^k) -(v^k- v^{k+1})\Vert _H^2 \nonumber \\&\qquad {\mathop {=}\limits ^{4.6a}} \,\,\,\,\,\, \Vert v^k-\tilde{v}^k\Vert _H^2 - \Vert (v^k-\tilde{v}^k) - M (v^k-\tilde{v}^k)\Vert _H^2 \nonumber \\&\quad \,\,\,\, = \,\,\,\,\, 2 (v^k-\tilde{v}^k)^TH M (v^k-\tilde{v}^k) -(v^k-\tilde{v}^k)^T M ^THM (v^k-\tilde{v}^k) \nonumber \\&\quad \,\,\,\, = \,\,\,\,(v^k-\tilde{v}^k)^T ( Q^T +Q - M^THM)(v^k-\tilde{v}^k) \nonumber \\&\quad \,\,\, {\mathop {=}\limits ^{4.10}} \quad \,\,\,\,\, (v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k). \end{aligned}$$
(33)

Substituting (33) into (32), we get

$$\begin{aligned} (v- \tilde{v}^k)^T H(v^k - v^{k+1})&= \frac{1}{2} \bigl ( \Vert v-v^{k+1}\Vert _H^2 - \Vert v-v^k\Vert _H^2 \bigr ) \nonumber \\&+ \frac{1}{2} (v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k). \end{aligned}$$
(34)

Recall that \((w- \tilde{w}^k)^T F(\tilde{w}^k) = (w- \tilde{w}^k)^T F(w)\) (see (15)). Using this fact, the assertion of this theorem follows directly from (31) and (34).   \(\square \)

When G given in (30) is positive definite, as shown in [11], it is relatively easy to use the assertion (29) to prove global convergence and to estimate the worst-case convergence rate in terms of iteration complexity; see, e.g., [7, 14] for details and [6] (Sections 4 and 5 therein) for a tutorial proof. For the matrix G given in (30), since \(HM=Q\) and \(M^THM = M^TQ\), we have

$$\begin{aligned} M^THM= & {} \left( \begin{array}{ccc} I &{} 0 &{} - B^T\\ 0 &{} I &{} -C^T \\ 0 &{} 0 &{} I_m \end{array}\right) \left( \begin{array}{ccc} (1+\tau ) B^TB &{} 0 &{} 0\\ 0 &{} (1+\tau ) C^TC &{} 0 \\ -B &{} -C &{}I_m \end{array} \right) \\= & {} \left( \begin{array}{ccc} (2+\tau ) B^TB &{} B^TC &{} -B^T\\ C^TB &{} (2+\tau ) C^TC &{} - C^T \\ -B &{} -C &{}I_m \end{array} \right) . \end{aligned}$$

Then, using (4.5b) and the above equation, we have

$$\begin{aligned} G= & {} (Q^T+ Q)- M ^THM \nonumber \\= & {} \left( \begin{array}{ccc} (2+2\tau ) B^TB &{} 0 &{} -B^T\\ 0 &{} (2+2\tau ) C^TC &{} -C^T \\ -B &{} -C &{} 2I_m \end{array} \right) \nonumber \\&\quad - \left( \begin{array}{ccc} (2+\tau ) B^TB &{} B^TC &{} -B^T\\ C^TB &{} (2+\tau ) C^TC &{} - C^T \\ -B &{} -C &{}I_m \end{array} \right) \nonumber \\= & {} \left( \begin{array}{ccc} \tau B^TB &{} -B^TC &{} 0 \\ -C^TB &{} \tau C^TC &{} 0 \\ 0 &{} 0 &{} I_m \end{array} \right) . \end{aligned}$$
(35)

By using the notation \(D_0\) (see (19)), the matrix G can be rewritten as

$$\begin{aligned} G = \left( \begin{array}{cc} D_0 &{} 0 \\ 0 &{} I_m \end{array} \right) . \end{aligned}$$

Moreover, the proximal matrix \(D_0\) in (19) can be rewritten as

$$\begin{aligned} {{D_0}}= & {} (\tau -1) \left( \! \begin{array}{cc} B^TB &{} 0\\ 0 &{} C^TC \end{array}\! \right) + \left( \! \begin{array}{c} B^T \\ -C^T \end{array}\! \right) \bigl ( B, -C\bigr ). \end{aligned}$$
(36)

Therefore, for \(\tau \in (\frac{1}{2}, 1)\), G is positive indefinite because the matrix \(D_0\) is not positive semidefinite. This positive indefiniteness of G is indeed the main difficulty in proving the convergence of the scheme (1.11) with \(\tau > 0.5\), and we need to examine the quadratic term \((v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)\) more intensively.
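The decomposition (36) of the proximal matrix \(D_0\) is a purely algebraic identity; the following sketch verifies it on random data (dimensions chosen arbitrarily here).

```python
import numpy as np

# Check of (36): D_0 = (tau - 1) blkdiag(B^T B, C^T C) + (B, -C)^T (B, -C).
rng = np.random.default_rng(5)
m, n = 4, 3
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))
tau = 0.7

D0 = np.block([[tau * B.T @ B, -B.T @ C],
               [-C.T @ B, tau * C.T @ C]])
blkdiag = np.block([[B.T @ B, np.zeros((n, n))],
                    [np.zeros((n, n)), C.T @ C]])
stack = np.hstack([B, -C])              # the matrix (B, -C)
decomp = (tau - 1) * blkdiag + stack.T @ stack
```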

5 Investigation of the Quadratic Term \({\boldsymbol{ (v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)}}\)

As mentioned, the key to proving the convergence of the scheme (1.11) with \(\tau > 0.5\) is to analyze the quadratic term \((v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)\), which is not guaranteed to be nonnegative. In this section, we investigate this term and show that

$$\begin{aligned} (v^k-\tilde{v}^k)^T G (v^k-\tilde{v}^k) \ge \psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k) + \varphi (v^k, v^{k+1}), \end{aligned}$$
(37)

where \(\psi (\cdot ,\cdot )\) and \(\varphi (\cdot ,\cdot )\) are both non-negative functions. The first two terms \(\psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k)\) in the right-hand side of (37) telescope across consecutive iterates, and the last term \(\varphi (v^k, v^{k+1})\) should serve as an error bound measuring how much \(w^{k+1}\) fails to be a solution point of (14). If we find such functions that guarantee the assertion (37), then we can substitute it into (29) and get the inequality

$$\begin{aligned}&\theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(w) \nonumber \\&\,\, \ge \frac{1}{2} \bigl (\Vert v-v^{k+1}\Vert _H^2 + \psi (v^k,v^{k+1}) \bigr ) - \frac{1}{2} \bigl ( \Vert v-v^k\Vert _H^2 + \psi (v^{k-1},v^k) \bigr ) \nonumber \\&\quad + \frac{1}{2}\varphi (v^k, v^{k+1}), \;\; \forall w\in \Omega . \end{aligned}$$
(38)

As we shall show, all the components in parentheses in the right-hand side of (38) should be positive to establish the convergence and convergence rate of (1.11). It is precisely this requirement that leads to our restriction \(\tau > 0.5\). We show the details in Theorem 26.5, preceded by several lemmas. Similar techniques for the convergence analysis of the ADMM can be found in, e.g., [4, 9, 10, 12].

Lemma 26.1

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then we have

$$\begin{aligned}&(v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k) \nonumber \\&\,\, = (1+ \tau )\Vert B(y^k- y^{k+1})\Vert ^2 + (1+ \tau )\Vert C(z^k-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \nonumber \\&\quad +2(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) . \end{aligned}$$
(39)

Proof

First, according to (35), we have

$$\begin{aligned} G&=\left( \!\begin{array}{ccc} \tau B^TB &{} -B^TC &{} 0 \\ -C^TB &{} \tau C^TC &{} 0 \\ 0 &{} 0 &{} I_m \end{array}\!\right) =\left( \!\begin{array}{ccc} (1+ \tau ) B^TB &{} 0 &{} 0 \\ 0 &{} (1+\tau ) C^TC &{} 0 \\ 0 &{} 0 &{} I_m \end{array}\! \right) \quad -\left( \!\begin{array}{ccc} B^TB &{} B^TC &{} 0 \\ C^TB &{} C^TC &{} 0 \\ 0 &{} 0 &{} 0 \end{array}\!\right) \end{aligned}$$

and thus

$$\begin{aligned} (v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)= & {} (1+ \tau ) \Vert B(y^k-\tilde{y}^k)\Vert ^2 + (1+\tau ) \Vert C(z^k-\tilde{z}^k)\Vert ^2 + \Vert \lambda ^k-\tilde{\lambda }^k\Vert ^2 \nonumber \\&- \Vert B(y^k-\tilde{y}^k)+C(z^k-\tilde{z}^k)\Vert ^2 . \end{aligned}$$

For the term \(\Vert \lambda ^k - \tilde{\lambda }^k\Vert ^2\) in the right-hand side of the above equation, because \(\tilde{x}^k = x^{k+1}\),

$$ \lambda ^k - \tilde{\lambda }^k= Ax^{k+1} + By^k +Cz^k -b \quad \hbox {and}\quad Ax^{k+1} + By^{k+1} + Cz^{k+1} -b= \lambda ^k-\lambda ^{k+1}, $$

we have

$$ \lambda ^k-\tilde{\lambda }^k = B(y^k-y^{k+1})+ C(z^k-z^{k+1}) + (\lambda ^k -\lambda ^{k+1}). $$

Finally, by a manipulation, we get

$$\begin{aligned}&(v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k) \\&\, = (1+ \tau )\Vert B(y^k- y^{k+1})\Vert ^2 + (1+ \tau )\Vert C(z^k-z^{k+1})\Vert ^2 \nonumber \\&\quad \,\, - \Vert B(y^k-y^{k+1})+C(z^k- z^{k+1})\Vert ^2 \nonumber \\&\quad \,\, +\Vert B(y^k-y^{k+1})+ C(z^k-z^{k+1}) + (\lambda ^k -\lambda ^{k+1})\Vert ^2 \nonumber \\&\, = (1+ \tau ) \Vert B(y^k- y^{k+1})\Vert ^2 + (1+ \tau ) \Vert C(z^k-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \nonumber \\&\quad \,\, +2(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) . \end{aligned}$$

The lemma is proved.   \(\square \)
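The final manipulation in the proof above is again a pointwise algebraic identity; the sketch below checks (39) for arbitrary differences \(dy = y^k - y^{k+1}\), \(dz = z^k - z^{k+1}\), \(d\lambda = \lambda ^k - \lambda ^{k+1}\), using the relation \(\lambda ^k - \tilde{\lambda }^k = B\,dy + C\,dz + d\lambda \) derived in the proof and the quadratic form of G from (35).

```python
import numpy as np

# Numerical check of identity (39): the G-quadratic form of
# (v^k - v~^k) equals the right-hand side of (39).
rng = np.random.default_rng(6)
m, n = 4, 3
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))
tau = 0.7
dy, dz = rng.standard_normal(n), rng.standard_normal(n)
dlam = rng.standard_normal(m)

# lam^k - lam~^k in terms of the iterate differences (from the proof)
dlam_tilde = B @ dy + C @ dz + dlam

# left-hand side: quadratic form of G from (35), with y-, z-, lambda-blocks
lhs = (tau * dy @ B.T @ B @ dy + tau * dz @ C.T @ C @ dz
       - 2 * dy @ B.T @ C @ dz + dlam_tilde @ dlam_tilde)

# right-hand side of (39)
rhs = ((1 + tau) * np.dot(B @ dy, B @ dy)
       + (1 + tau) * np.dot(C @ dz, C @ dz)
       + dlam @ dlam + 2 * dlam @ (B @ dy + C @ dz))
```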

For further analysis, we divide the cross term \(2(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr )\) in the right-hand side of (39) into two parts and bound them from below by quadratic terms.

Lemma 26.2

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then we have

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr )\qquad \quad \nonumber \\&\,\,\ge \bigl (\psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k)\bigr ) - 2(1-\tau ) \bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \bigr ),\qquad \quad \end{aligned}$$
(40)

where

$$\begin{aligned} \psi (v^k,v^{k+1}) = \frac{1}{2}\left( \left\| \begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1}\end{array} \right\| _D^2 + (1-\tau ) \Bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \Bigr ) \right) \end{aligned}$$
(41)

with

$$\begin{aligned} D= \left( \! \begin{array}{c} B^T \\ -C^T \end{array}\! \right) \bigl ( B, -C\bigr ). \end{aligned}$$
(42)

Proof

Recall (18). It holds that

$$\begin{aligned}&(y^{k+1}, z^{k+1})\in \mathcal{Y}\times \mathcal{Z}, \; \left( \!\!\begin{array}{c} \theta _2(y) - \theta _2(y^{k+1})\\ \theta _3(z) - \theta _3(z^{k+1}) \end{array} \!\!\right) + \left( \!\!\begin{array}{c} y -y^{k+1}\\ z - z^{k+1} \end{array} \!\!\right) ^T \qquad \quad \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \left\{ \left( \!\! \begin{array}{c} - B^T \\ -C^T \end{array}\!\! \right) \lambda ^{k+1} \right. + \left. D_0 \left( \begin{array}{c} y^{k+1}-y^k \\ z^{k+1} - z^k \end{array}\right) \right\} \ge 0, \;\; \forall (y, z)\in \mathcal{Y}\times \mathcal{Z} .\qquad \quad \end{aligned}$$
(43)

Analogously, for the previous iteration, we have

$$\begin{aligned}&(y^{k}, z^{k})\in \mathcal{Y}\times \mathcal{Z}, \; \; \left( \!\!\begin{array}{c} \theta _2(y) - \theta _2(y^{k})\\ \theta _3(z) - \theta _3(z^{k}) \end{array} \!\!\right) + \left( \!\!\begin{array}{c} y -y^{k}\\ z - z^{k} \end{array} \!\!\right) ^T\!\! \qquad \quad \nonumber \\&\qquad \qquad \qquad \qquad \qquad \left\{ \left( \!\! \begin{array}{c} - B^T \\ -C^T \end{array}\!\! \right) \lambda ^{k} + D_0 \left( \begin{array}{c} y^{k}-y^{k-1} \\ z^{k} - z^{k-1} \end{array}\right) \right\} \ge 0, \;\; \forall (y, z)\in \mathcal{Y}\times \mathcal{Z} .\qquad \quad \end{aligned}$$
(44)

Setting \((y,z)=(y^k,z^k)\) and \((y,z)=(y^{k+1},z^{k+1})\) in (43) and (44), respectively, and adding them, we get

$$ \left( \!\!\begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \!\!\right) ^T\!\! \left\{ \! \left( \!\! \begin{array}{c} B^T \\ C^T \end{array}\!\! \right) (\lambda ^k -\lambda ^{k+1}) + D_0\left[ \! \left( \begin{array}{c} y^{k+1}-y^k \\ z^{k+1} - z^k \end{array}\right) - \left( \begin{array}{c} y^{k} - y^{k-1} \\ z^{k} - z^{k-1} \end{array}\right) \!\right] \!\right\} \ge 0. $$

Consequently, we have

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) \nonumber \\&\,\, \ge \left( \!\!\begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \!\!\right) ^T D_0\left[ \! \left( \begin{array}{c} y^k - y^{k+1}\\ z^k - z^{k+1} \end{array}\right) - \left( \begin{array}{c} y^{k-1} - y^k \\ z^{k-1} - z^k \end{array}\right) \!\right] . \end{aligned}$$
(45)

From (19) and (42) we get

$$ D_0 = D - (1-\tau ) \left( \! \begin{array}{cc} B^TB &{} 0 \\ 0 &{} C^TC \end{array}\! \right) . $$

Thus, using the Cauchy-Schwarz inequality, from (45) we obtain

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) \nonumber \\&\,\, \ge \left( \!\!\begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \!\!\right) ^T \left\{ D - (1-\tau ) \left( \! \begin{array}{cc} B^TB &{} 0 \\ 0 &{} C^TC \end{array}\! \right) \right\} \! \!\left[ \! \left( \!\! \begin{array}{c} y^k - y^{k+1}\\ z^k - z^{k+1} \end{array}\!\!\right) - \left( \!\! \begin{array}{c} y^{k-1} - y^k \\ z^{k-1} - z^k \end{array}\!\!\right) \!\right] \nonumber \\&\,\, = \left\| \begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \right\| _D^2 - \left( \begin{array}{c} y^k - y^{k+1}\\ z^k - z^{k+1} \end{array}\right) ^T D \left( \begin{array}{c} y^{k-1} - y^k \\ z^{k-1} - z^k \end{array}\right) \nonumber \\&\qquad \, - (1-\tau ) \Bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \Bigr ) \nonumber \\&\qquad \quad + (1-\tau ) \left( \!\!\begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \!\!\right) ^T \left( \! \begin{array}{cc} B^TB &{} 0 \\ 0 &{} C^TC \end{array}\! \right) \left( \!\! \begin{array}{c} y^{k-1} - y^k \\ z^{k-1} - z^k \end{array}\!\!\right) \nonumber \\&\,\, \ge \frac{1}{2} \left\| \begin{array}{c} y^k -y^{k+1}\\ z^k - z^{k+1} \end{array} \right\| _D^2 - \frac{1}{2} \left\| \begin{array}{c} y^{k-1} -y^{k}\\ z^{k-1} - z^{k} \end{array} \right\| _D^2 \nonumber \\&\qquad \, - \frac{3}{2}(1-\tau ) \Bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \Bigr ) \nonumber \\&\qquad \, - \frac{1}{2}(1-\tau ) \Bigl (\Vert B(y^{k-1}-y^{k})\Vert ^2 + \Vert C(z^{k-1}-z^{k})\Vert ^2 \Bigr ), \end{aligned}$$
(46)

where the last inequality follows from the Cauchy-Schwarz inequality. Rearranging the right-hand side of (46) and using the notation \(\psi (\cdot ,\cdot )\) defined in (41), we get (40) and the lemma is proved.   \(\square \)
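The Cauchy-Schwarz steps above rely on the generic fact that \(a^TDb\le \frac{1}{2}\Vert a\Vert _D^2+\frac{1}{2}\Vert b\Vert _D^2\) for any positive semidefinite \(D\). A minimal Python sketch, with a random Gram matrix \(D=E^TE\) standing in for the structured matrix in (42):

```python
import random

def matvec(M, v):
    return [sum(Mij * vj for Mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(1)
n, m = 4, 3
# D = E^T E is positive semidefinite by construction, mimicking D = (B,-C)^T (B,-C)
E = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

def D_quad(u, v):
    # u^T D v computed as (E u)^T (E v)
    return dot(matvec(E, u), matvec(E, v))

ok = True
for _ in range(200):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Cauchy-Schwarz in the D-seminorm: a^T D b <= (1/2)||a||_D^2 + (1/2)||b||_D^2
    ok = ok and D_quad(a, b) <= 0.5 * D_quad(a, a) + 0.5 * D_quad(b, b) + 1e-12
print(ok)
```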

In addition to (40), we also need to bound the term \((\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr )\) from below by another set of quadratic terms. This is done by the following lemma.

Lemma 26.3

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then, for \(\tau \in (0.5,1)\), we have

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) \nonumber \\&\,\,\ge - \tau \Bigl (\Vert B(y^k-y^{k+1})\Vert ^2+ \Vert C(z^k-z^{k+1})\Vert ^2 \Bigr ) -\Bigl ( \frac{3}{2} - \tau \Bigr ) \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 . \end{aligned}$$
(47)

Proof

Set \(\delta = \tau -\frac{1}{2}\). Because \(\tau \in (0.5,1)\), we have \(\delta \in (0,0.5)\). Using the Cauchy-Schwarz inequality twice, we get

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) \nonumber \\&\,\,\ge -\frac{1}{4(1-\delta )} \Vert B(y^k-y^{k+1})+ C(z^k-z^{k+1})\Vert ^2 - (1-\delta )\Vert \lambda ^k -\lambda ^{k+1}\Vert ^2\nonumber \\&\,\, \ge -\frac{1}{2(1-\delta )}\bigl ( \Vert B(y^k-y^{k+1})\Vert ^2 +\Vert C(z^k-z^{k+1})\Vert ^2\bigr ) - (1-\delta )\Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 . \end{aligned}$$

Since \(\delta \in (0,0.5)\), we have

$$\frac{1}{2(1-\delta )}< \frac{1}{2} +\delta ,$$

and thus

$$\begin{aligned}&(\lambda ^k -\lambda ^{k+1})^T \bigl ( B(y^k-y^{k+1}) + C(z^k - z^{k+1}) \bigr ) \nonumber \\&\,\, \ge -\bigl (\frac{1}{2} +\delta \bigr )\bigl ( \Vert B(y^k-y^{k+1})\Vert ^2 +\Vert C(z^k-z^{k+1})\Vert ^2\bigr ) - (1-\delta )\Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 . \end{aligned}$$

Substituting \(\delta =\tau - \frac{1}{2}\) in the above inequality, we get (47) and the lemma is proved.   \(\square \)
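The two elementary inequalities used in this proof, the weighted Cauchy-Schwarz bound and \(\frac{1}{2(1-\delta )}<\frac{1}{2}+\delta \) for \(\delta \in (0,0.5)\), can be checked numerically; the following Python sketch sweeps \(\delta \) over a grid and tests random vectors (the grid size and sample count are arbitrary):

```python
import random

random.seed(2)

def check_scalar(delta):
    # the scalar inequality 1/(2(1-delta)) < 1/2 + delta for delta in (0, 0.5)
    return 1.0 / (2.0 * (1.0 - delta)) < 0.5 + delta

grid_ok = all(check_scalar(0.5 * (i + 1) / 1001.0) for i in range(1000))

def check_cs(delta, d, s):
    # d^T s >= -(1/(4(1-delta))) ||s||^2 - (1-delta) ||d||^2,
    # which follows from 0 <= || s/(2 sqrt(1-delta)) + sqrt(1-delta) d ||^2
    lhs = sum(di * si for di, si in zip(d, s))
    rhs = -(1.0 / (4.0 * (1.0 - delta))) * sum(si * si for si in s) \
          - (1.0 - delta) * sum(di * di for di in d)
    return lhs >= rhs - 1e-12

cs_ok = all(
    check_cs(random.uniform(0.01, 0.49),
             [random.gauss(0, 1) for _ in range(6)],
             [random.gauss(0, 1) for _ in range(6)])
    for _ in range(200)
)
print(grid_ok and cs_ok)
```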

Recall that we want to bound the quadratic term \((v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)\) in the form of (38). Our previous analysis enables us to achieve this; it is the basis of the convergence analysis to be presented below.

Lemma 26.4

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then, for \(\tau \in (0.5, 1)\), we have

$$\begin{aligned} (v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k)\ge \bigl (\psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k)\bigr ) + \varphi (v^k,v^{k+1}), \end{aligned}$$
(48)

where \(\psi (v^k,v^{k+1})\) is defined in (41) and

$$\begin{aligned} \varphi (v^k,v^{k+1}) = \bigl (\tau - \frac{1}{2}\bigr ) \bigl ( 2\Vert B(y^k-y^{k+1})\Vert ^2 + 2\Vert C(z^{k}-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \bigr ) .\end{aligned}$$
(49)

Proof

Substituting (40) and (47) into (39), we get

$$\begin{aligned}&(v^k-\tilde{v}^k)^T G(v^k-\tilde{v}^k) \nonumber \\&\,\,\, \ge (1+ \tau )\Vert B(y^k- y^{k+1})\Vert ^2 + (1+ \tau )\Vert C(z^k-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \nonumber \\&\qquad \,\, + \bigl (\psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k)\bigr ) - 2(1-\tau ) \bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \bigr ) \nonumber \\&\qquad \,\, - \tau \bigl (\Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 \bigr ) -\bigl (\frac{3}{2} -\tau \bigr ) \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \nonumber \\&\quad \,\,\, = \bigl (\psi (v^k,v^{k+1}) - \psi (v^{k-1},v^k)\bigr ) \nonumber \\&\qquad \,\, + (2\tau - 1) \bigl (\Vert B(y^{k}-y^{k+1})\Vert ^2 + \Vert C(z^{k}-z^{k+1})\Vert ^2 \bigr ) + \bigl ( \tau -\frac{1}{2} \bigr ) \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2. \end{aligned}$$

The assertion of this lemma follows from the definition of \(\varphi (v^k,v^{k+1})\) directly.   \(\square \)
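The coefficient bookkeeping in the last equality can be double-checked numerically by treating the three squared norms as free nonnegative scalars; a small Python sketch (the names Bq, Cq, Lq are placeholders for the squared norms, not symbols from the text):

```python
import random

random.seed(3)
ok = True
for _ in range(100):
    tau = random.uniform(0.5, 1.0)
    Bq = random.uniform(0.0, 10.0)   # stands for ||B(y^k - y^{k+1})||^2
    Cq = random.uniform(0.0, 10.0)   # stands for ||C(z^k - z^{k+1})||^2
    Lq = random.uniform(0.0, 10.0)   # stands for ||lambda^k - lambda^{k+1}||^2
    # right-hand side of (39) with the bounds (40) and (47) substituted,
    # omitting the telescoping psi-difference common to both sides
    combined = ((1 + tau) * Bq + (1 + tau) * Cq + Lq
                - 2 * (1 - tau) * (Bq + Cq)
                - tau * (Bq + Cq)
                - (1.5 - tau) * Lq)
    # claimed collapsed form with coefficients (2*tau - 1) and (tau - 1/2)
    target = (2 * tau - 1) * (Bq + Cq) + (tau - 0.5) * Lq
    ok = ok and abs(combined - target) < 1e-9
print(ok)
```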

Finally, substituting (48) into (29), we obtain the following theorem directly. This theorem plays a fundamental role in proving the convergence of (1.11) with \(\tau >0.5\).

Theorem 26.5

Let \(\{w^k\}\) be the sequence generated by (1.11) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then we have

$$\begin{aligned} \theta (u) - \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(w)\ge & {} \frac{1}{2}\bigl (\Vert v-v^{k+1}\Vert _H^2 +\psi (v^k,v^{k+1})\bigr ) -\frac{1}{2} \bigl (\Vert v-v^k\Vert _H^2\nonumber \\&+ \psi (v^{k-1},v^k)\bigr ) + \frac{1}{2} \varphi (v^k,v^{k+1}), \;\; \forall \, w\in \Omega , \end{aligned}$$
(50)

where \(\psi (v^k,v^{k+1})\) and \(\varphi (v^k,v^{k+1})\) are defined in (41) and (49), respectively.

6 Convergence

As mentioned, proving the convergence of the scheme (1.11) with \(\tau >0.5\) essentially relies on Theorem 26.5. With Theorem 26.5 in hand, the remaining part of the proof is routine. In this section, we present the convergence of the scheme (1.11) with \(\tau >0.5\); a lemma is first proved to show the contraction property of the sequence generated by (1.11).

Lemma 27.1

Let \(\{w^k\}\) be the sequence generated by (1.11) with \(\tau >0.5\) for the problem (1). Then we have

$$\begin{aligned} \bigl ( \Vert v^{k+1}-v^*\Vert _H^2 + \psi (v^k, v^{k+1}) \bigr ) \le \bigl ( \Vert v^k - v^*\Vert _H^2 + \psi (v^{k-1}, v^k) \bigr ) - \varphi (v^k,v^{k+1}), \end{aligned}$$
(51)

where \(\psi (v^k,v^{k+1})\) and \(\varphi (v^k,v^{k+1})\) are defined in (41) and (49), respectively.

Proof

Setting \(w=w^*\) in (50) and using

$$ \theta (\tilde{u}^k) - \theta (u^*) + (\tilde{w}^k-w^*)^T F(w^*)\ge 0 ,$$

we obtain the assertion (51) immediately.   \(\square \)

Theorem 27.2

Let \(\{w^k\}\) be the sequence generated by (1.11) with \(\tau >0.5\) for the problem (1). Then the sequence \(\{v^k\}\) converges to a \(v^{\infty } \in \mathcal{V}^*\) when B and C are both of full column rank.

Proof

First, it follows from (51) and (49) that

$$\begin{aligned}&(\tau - \frac{1}{2}) \Bigl (2\Vert B(y^k-y^{k+1})\Vert ^2 +2 \Vert C(z^{k}-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \Bigr ) \nonumber \\&\,\,\, \le \bigl (\Vert v^k - v^*\Vert _H^2 + \psi (v^{k-1}, v^{k})\bigr ) - \bigl (\Vert v^{k+1}-v^*\Vert _H^2 + \psi (v^k, v^{k+1})\bigr ) . \end{aligned}$$

Summing the last inequality over \(k=1,2,\ldots \), we obtain

$$\begin{aligned}&\sum _{k=1}^{\infty } \Bigl \{ (\tau - \frac{1}{2}) \Bigl (2\Vert B(y^k-y^{k+1})\Vert ^2 +2 \Vert C(z^{k}-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \Bigr ) \Bigr \}\\&\qquad \qquad \qquad \le \Vert v^1 - v^*\Vert _H^2 +\psi (v^0, v^1) \end{aligned}$$

and thus

$$\begin{aligned} \lim _{k\rightarrow \infty } \Bigl ( \Vert B(y^k-y^{k+1})\Vert ^2 + \Vert C(z^k-z^{k+1})\Vert ^2 + \Vert \lambda ^k -\lambda ^{k+1}\Vert ^2 \Bigr ) =0. \end{aligned}$$
(52)

For an arbitrarily fixed \(v^*\in \mathcal{V}^*\), it follows from (51) that, for any \(k>1\), we have

$$\begin{aligned} \Vert v^{k+1} - v^*\Vert _H^2 \, \le \, \Vert v^k - v^*\Vert _H^2 + \psi (v^{k-1}, v^{k}) \, \le \, \Vert v^1 - v^*\Vert _H^2 + \psi (v^{0}, v^{1}). \end{aligned}$$
(53)

Thus the sequence \(\{v^k\}\) is bounded. Because M is non-singular, according to (4.6), \(\{\tilde{v}^k\}\) is also bounded. Let \(v^{\infty }\) be a cluster point of \(\{\tilde{v}^k\}\) and let \(\{\tilde{v}^{k_j}\}\) be a subsequence of \(\{\tilde{v}^k\}\) converging to \(v^{\infty }\). Let \(x^{\infty }\) be the vector induced by the given \((y^{\infty }, z^{\infty }, \lambda ^{\infty }) \in \mathcal{V}\). Then, it follows from (31) that

$$ w^\infty \in \Omega , \quad \theta (u) - \theta ({u}^\infty ) + (w- {w}^\infty )^T F({w}^\infty ) \ge 0, \quad \forall w\in \Omega , $$

which means \({w}^{\infty }\) is a solution point of (14) and its essential part \(v^\infty \in \mathcal{V}^*\). Since \(v^{\infty }\in \mathcal{V}^*\), it follows from (53) that

$$\begin{aligned} \Vert v^{k+1} - v^\infty \Vert _H^2 \le \Vert v^k - v^\infty \Vert _H^2 + \psi (v^{k-1}, v^{k}) . \end{aligned}$$
(54)

Together with (52), this implies that the sequence \(\{v^k\}\) cannot have more than one cluster point. Thus \(\{v^k\}\) converges to \(v^{\infty }\) and the proof is complete.   \(\square \)

Remark 27.3

Note that the convergence of (1.11) with \(\tau >0.5\) in terms of the sequence \(\{v^k\}\) is proved in Theorem 27.2 under the assumption that both B and C have full column rank. Without this assumption, weaker convergence results in terms of \(\{By^k,Cz^k\}\) can be derived. We refer to Sect. 6 in [11] for details.

7 The Optimality of \({\boldsymbol{ \tau =0.5}}\)

We have proved the convergence of (1.11) with \(\tau > 0.5\); the key is ensuring the nonnegativity of the coefficients on the right-hand side of (50). In this section, we show by an example that any \(\tau \in (0,0.5)\) may yield divergence of (1.11). Hence, \(\tau =0.5\) is the watershed, i.e., the optimal value, for ensuring the convergence of (1.11).

For any given \(\tau <0.5\), we take \(\epsilon =0.5-\tau >0\) and consider the problem

$$\begin{aligned} \min \{ x + \frac{\epsilon }{2} y^2 +\frac{\epsilon }{2} z^2 \; | \; x + y + z=0, \; x\in \{ 0\}, y\in {\mathfrak {R}}, z\in {\mathfrak {R}}\}, \end{aligned}$$
(55)

which is a special case of the model (1). Obviously, the solution of this problem is \(x=y=z=0\).

The augmented Lagrangian function of the problem (55) with a penalty parameter of 1 is

$$ \mathcal{L}(x, y,z, \lambda ) = x + \frac{\epsilon }{2} y^2 +\frac{\epsilon }{2} z^2 - \lambda ^T(x+y+z) + \frac{1}{2} \Vert x+y +z\Vert ^2; $$

and the iterative scheme (1.11) for (55) is

$$\begin{aligned} \left\{ \begin{array} {rcl} x^{k+1} &{}= &{}\arg \min \bigl \{\mathcal{L}(x, y^k,z^k,\lambda ^k) \; \big | \; x\in \{0\} \bigr \},\\ y^{k+1} &{}= &{}\arg \min \bigl \{\mathcal{L}(x^{k+1}, y,z^k,\lambda ^k) +\frac{\tau }{2}\Vert y-y^k\Vert ^2 \; \big | \; y\in {\mathfrak {R}} \bigr \},\\ z^{k+1} &{}= &{}\arg \min \bigl \{\mathcal{L}(x^{k+1}, y^k,z,\lambda ^k) +\frac{\tau }{2}\Vert z-z^k\Vert ^2 \; \big | \; z\in {\mathfrak {R}} \bigr \},\\ {\lambda }^{k+1} &{}=&{} \lambda ^k -(x^{k+1} +y^{k+1} + z^{k+1}). \end{array} \right. \end{aligned}$$
(56)

Since \(\mathcal{X} = \{0\}\), we have \(x^{k+1} \equiv 0\). Ignoring constant terms in the objective functions of the subproblems, the recursion (56) becomes

$$\begin{aligned} \left\{ \begin{array}{rcl} x^{k+1} &{}\equiv &{} 0,\\ y^{k+1} &{}=&{}\arg \min \bigl \{ \frac{\epsilon }{2} y^2 - y^T\lambda ^k + \frac{1}{2} \Vert y +z^k\Vert ^2 +\frac{\tau }{2}\Vert y-y^k\Vert ^2 \; \big | \; y\in {\mathfrak {R}} \bigr \},\\ z^{k+1} &{} =&{}\arg \min \bigl \{ \frac{\epsilon }{2} z^2 - z^T\lambda ^k + \frac{1}{2} \Vert y^k +z\Vert ^2 +\frac{\tau }{2}\Vert z-z^k\Vert ^2 \; \big | \; z\in {\mathfrak {R}} \bigr \},\\ {\lambda }^{k+1} &{}= &{}\lambda ^k -(y^{k+1} + z^{k+1}). \end{array} \right. \end{aligned}$$
(57)

Further, it follows from (57) that

$$ \left\{ \begin{array}{l} \epsilon y^{k+1} -\lambda ^k + (y^{k+1} + z^k) + \tau (y^{k+1} - y^k) = 0, \\ \epsilon z^{k+1} -\lambda ^k + (z^{k+1} + y^k) + \tau (z^{k+1} - z^k) = 0, \\ \lambda ^{k+1} = \lambda ^k -(y^{k+1}+z^{k+1}) . \end{array} \right. $$

Thus, the iterative scheme for \(v=(y,z,\lambda )\) can be written as

$$\begin{aligned} \left\{ \begin{array}{l} (\tau + 1+ \epsilon ) y^{k+1} = \; \tau y^k - z^k + \lambda ^k, \\ (\tau + 1+ \epsilon ) z^{k+1} = - y^k + \tau z^k + \lambda ^k,\\ \lambda ^{k+1} = \lambda ^k - (y^{k+1} + z^{k+1}) . \end{array} \right. \end{aligned}$$
(58)
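The linear recursion (58) can be simulated directly to observe the divergence discussed below; a minimal Python sketch with the (arbitrary) choice \(\tau =0.3\), so that \(\epsilon =0.5-\tau =0.2\), and the starting point \(y^0=z^0=1\), \(\lambda ^0=0\) (the iteration count and blow-up threshold are also arbitrary):

```python
# Simulate the recursion (58) for the counterexample (55) with tau = 0.3 < 0.5
# and epsilon = 0.5 - tau, and watch the iterates blow up.
tau = 0.3
eps = 0.5 - tau
y, z, lam = 1.0, 1.0, 0.0          # y^0 = z^0, as in the text
norms = []
for k in range(200):
    y_new = (tau * y - z + lam) / (tau + 1.0 + eps)
    z_new = (-y + tau * z + lam) / (tau + 1.0 + eps)
    lam = lam - (y_new + z_new)
    y, z = y_new, z_new
    norms.append(abs(y) + abs(z) + abs(lam))
# the iterates diverge, and the symmetry y^k = z^k is preserved throughout
print(norms[-1] > 1e6, abs(y - z) < 1e-6)
```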

Without loss of generality, we can take \(y^0=z^0\) and thus \(y^k\equiv z^k\), for all \(k>0\). Using this fact and \(\tau +\epsilon =0.5\), we get

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle {\frac{3}{2}} y^{k+1} = (\tau -1) y^k + \lambda ^k,\\ \lambda ^{k+1} = \lambda ^k - 2y^{k+1} . \end{array} \right. \end{aligned}$$
(59)

With elementary manipulations, we obtain

$$\begin{aligned} \left\{ \begin{array}{l} y^{k+1} = {\displaystyle {\frac{-2(1-\tau )}{3}}} y^k + {\displaystyle { \frac{2}{3}}} \lambda ^k, \\ \lambda ^{k+1} = {\displaystyle { \frac{4(1-\tau )}{3} }} y^k + {\displaystyle { \frac{ -1 }{3} }} \lambda ^k, \end{array} \right. \end{aligned}$$
(60)

which can be written as

$$\begin{aligned} \left( \begin{array}{l} y^{k+1} \\ \lambda ^{k+1} \end{array} \right) =P(\tau ) \left( \begin{array}{l} y^{k} \\ \lambda ^{k} \end{array} \right) \qquad \hbox {with} \qquad P(\tau ) =\frac{1}{3} \left( \begin{array}{cc} -2(1-\tau ) &{} 2\\ 4(1- \tau ) &{} -1 \end{array}\right) . \end{aligned}$$
(61)

Let \(f_1(\tau )\) and \(f_2(\tau ) \) be the two eigenvalues of the matrix \(P(\tau )\). Then we have

$$ f_1(\tau ) =\frac{1}{6} \Bigl ( (2\tau -3) + \sqrt{ (3-2\tau )^2 + 24 (1-\tau )}\Bigr ) , $$

and

$$ f_2(\tau ) = \frac{1}{6} \Bigl ( (2\tau -3) - \sqrt{ (3-2\tau )^2 + 24 (1-\tau )}\Bigr ). $$

Certainly, the scheme (60) diverges if the absolute value of one of the eigenvalues of the matrix \( P(\tau )\) is greater than 1. Indeed, it holds that \(f_2(\tau )<-1\) for any \(\tau \in (0,0.5)\). To see this, notice that

$$\begin{array}{rcrcl} f_2(\tau )<-1 &{} \Leftrightarrow &{} (2\tau -3) - \sqrt{ (3-2\tau )^2 + 24 (1-\tau )} &{}< &{} -6 \\ &{} \Leftrightarrow &{} 2 \tau + 3 &{}< &{} \sqrt{4\tau ^2 - 36\tau + 33} \\ &{} \Leftrightarrow &{} 4\tau ^2 + 12 \tau +9 &{}< &{} 4\tau ^2 - 36\tau + 33 \\ &{} \Leftrightarrow &{} \tau &{} < &{} 0.5. \end{array} $$

Hence, the scheme (1.11) is not necessarily convergent for any \(\tau \in (0,0.5)\).
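The eigenvalue analysis above admits a quick numerical confirmation: the closed forms \(f_1(\tau )\) and \(f_2(\tau )\) agree with the eigenvalues of \(P(\tau )\) computed from its trace and determinant, and \(f_2(\tau )<-1\) precisely for \(\tau <0.5\), with \(f_2(0.5)=-1\) at the watershed. A Python sketch over a grid of \(\tau \):

```python
import math

def eigs_P(tau):
    """Closed-form eigenvalues f1(tau), f2(tau) from the text."""
    r = math.sqrt((3.0 - 2.0 * tau) ** 2 + 24.0 * (1.0 - tau))
    return ((2.0 * tau - 3.0) + r) / 6.0, ((2.0 * tau - 3.0) - r) / 6.0

def eigs_2x2(a, b, c, d):
    # eigenvalues of [[a, b], [c, d]] via trace and determinant
    tr, det = a + d, a * d - b * c
    r = math.sqrt(tr * tr - 4.0 * det)   # discriminant is positive here
    return (tr + r) / 2.0, (tr - r) / 2.0

ok = True
for i in range(1, 100):
    tau = i / 100.0                      # tau in (0, 1)
    f1, f2 = eigs_P(tau)
    g1, g2 = eigs_2x2(-2.0 * (1.0 - tau) / 3.0, 2.0 / 3.0,
                      4.0 * (1.0 - tau) / 3.0, -1.0 / 3.0)
    ok = ok and abs(f1 - g1) < 1e-10 and abs(f2 - g2) < 1e-10
    # f2 < -1 exactly when tau < 0.5
    if tau < 0.5:
        ok = ok and f2 < -1.0
    elif tau > 0.5:
        ok = ok and f2 > -1.0
print(ok)
```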

8 Convergence Rate

In this section, we derive a worst-case O(1/t) convergence rate in terms of iteration complexity for the scheme (1.11) with \(\tau >0.5\), where t is the iteration counter. Hence, although the condition \(\tau \ge 1\) in [11] is now relaxed to \(\tau >0.5\), the convergence rate result in [11] remains valid for the scheme (1.11). We refer to [11, 13] for similar analysis.

First of all, recall (14). If we find \(\tilde{w}\) satisfying the inequality

$$ \tilde{w}\in \Omega , \;\; \theta (u) - \theta (\tilde{u}) + (w-\tilde{w})^T F(\tilde{w}) \ge 0, \;\; \forall \, w\in \Omega , $$

then \(\tilde{w}\) is a solution point of (14). As mentioned in (15), we have \( (w-\tilde{w})^T F(w) = (w-\tilde{w})^T F(\tilde{w})\). Thus, a solution point \(\tilde{w}\) of (14) can also be characterized by

$$ \tilde{w}\in \Omega , \;\; \theta (u) - \theta (\tilde{u}) + (w-\tilde{w})^T F(w) \ge 0, \;\; \forall \, w\in \Omega . $$

Therefore, as in [3], for given \(\epsilon >0\), \(\tilde{w} \in {\Omega }\) is called an \(\epsilon \)-approximate solution of VI\(({\Omega },F,\theta )\) if it satisfies

$$ \tilde{w}\in \Omega , \;\; \theta (u) - \theta (\tilde{u}) + (w-\tilde{w})^TF(w)\ge -\epsilon , \;\;\forall \; w \in \mathcal{D}_{(\tilde{w})}, $$

where

$$ \mathcal{D}_{(\tilde{w})} =\{ w\in \Omega \, |\, \Vert w-\tilde{w}\Vert \le 1 \}. $$

In the following, we show that, based on the first t iterates generated by the scheme (1.11) with \(\tau >0.5\), we can find an approximate solution of (14), denoted by \(\tilde{w} \in \Omega \), such that

$$\begin{aligned} \tilde{w}\in \Omega \quad \hbox {and} \quad \sup _{w\in \mathcal{D}_{(\tilde{w})}} \bigl \{ \theta (\tilde{u}) - \theta (u) + (\tilde{w}-w)^T F(w) \bigr \}\le \epsilon , \end{aligned}$$
(62)

where \(\epsilon =O(1/t)\). That is, a worst-case O(1/t) convergence rate is established for the scheme (1.11) with \(\tau >0.5\). Theorem 26.5 is still the basis for the analysis in this section.

Theorem 29.1

Let \(\{w^k\}\) be the sequence generated by (1.11) with \(\tau >0.5\) for the problem (1) and \(\tilde{w}^k\) be defined by (22). Then, for any integer \(t\ge 1\) and any \(w\in \Omega \), we have

$$\begin{aligned} \theta (\tilde{u}_t) - \theta (u) + (\tilde{w}_t-w )^T F(w) \le \frac{1}{2t} \left\{ \Vert v-v^1\Vert _H^2 + \psi (v^0,v^1) \right\} , \end{aligned}$$
(63)

where

$$\begin{aligned} \tilde{w}_t = \frac{1}{t} \bigl (\sum _{k=1}^t \tilde{w}^k\bigr ) \end{aligned}$$
(64)

and \(\psi (v^0,v^{1})\) is defined in (41), i.e.,

$$\psi (v^0,v^{1}) = \frac{1}{2}\left( \left\| \begin{array}{c} y^0 -y^{1}\\ z^0 - z^{1}\end{array} \right\| _D^2 + (1-\tau ) \Bigl (\Vert B(y^0-y^{1})\Vert ^2 + \Vert C(z^0-z^{1})\Vert ^2 \Bigr )\right) . $$

Proof

First, it follows from (50) that

$$\begin{aligned} \theta (u)&- \theta (\tilde{u}^k) + (w- \tilde{w}^k)^T F(w) \ge \frac{1}{2} \bigl ( \Vert v-v^{k+1}\Vert _H^2\\&+ \psi (v^k,v^{k+1}) \bigr ) - \frac{1}{2} \bigl ( \Vert v-v^k\Vert _H^2 +\psi (v^{k-1},v^k) \bigr ) . \end{aligned}$$

Thus, we have

$$\begin{aligned} \theta (\tilde{u}^k)&-\theta (u) + (\tilde{w}^k-w)^T F(w) +\frac{1}{2} \bigl ( \Vert v-v^{k+1}\Vert _H^2 \nonumber \\&+ \psi (v^k,v^{k+1}) \bigr ) \le \frac{1}{2} \bigl ( \Vert v-v^k\Vert _H^2 +\psi (v^{k-1},v^k) \bigr ). \end{aligned}$$
(65)

Summing the inequality (65) over \(k=1,2, \ldots , t\), we obtain

$$ \sum _{k=1}^t \theta (\tilde{u}^k) - t \theta (u) + (\sum _{k=1}^t \tilde{w}^k- t w )^T F(w) \le \frac{1}{2} \bigl ( \Vert v-v^1\Vert _H^2 +\psi (v^0,v^1) \bigr ) $$

and thus

$$\begin{aligned} \frac{1}{t}\bigl ( \sum _{k=1}^t \theta (\tilde{u}^k)\bigr ) - \theta (u) + (\tilde{w}_t-w )^T F(w) \le \frac{1}{2t} \bigl ( \Vert v-v^1\Vert _H^2 +\psi (v^0,v^1) \bigr ).\end{aligned}$$
(66)

Since \(\theta (u)\) is convex and

$$ \tilde{u}_t = \frac{1}{t} \bigl (\sum _{k=1}^t \tilde{u}^k\bigr ), $$

we have that

$$ \theta (\tilde{u}_t) \le \frac{1}{t}\bigl ( \sum _{k=1}^t \theta (\tilde{u}^k)\bigr ). $$

Substituting it into (66), the assertion of this theorem follows directly.   \(\square \)
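The convexity step \(\theta (\tilde{u}_t) \le \frac{1}{t}\sum _{k=1}^t \theta (\tilde{u}^k)\) (Jensen's inequality) can be illustrated numerically; in the following Python sketch, \(\theta (u)=\Vert u\Vert _1\) is merely an illustrative convex stand-in, not the objective of (1):

```python
import random

random.seed(4)

def theta(u):
    # an illustrative convex function: the l1 norm
    return sum(abs(x) for x in u)

ok = True
for _ in range(50):
    t, n = 20, 5
    pts = [[random.gauss(0, 1) for _ in range(n)] for _ in range(t)]
    # ergodic average of the t points, as in (64)
    avg = [sum(p[j] for p in pts) / t for j in range(n)]
    # Jensen: theta(average) <= average of theta
    ok = ok and theta(avg) <= sum(theta(p) for p in pts) / t + 1e-12
print(ok)
```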

For a given compact set \( \mathcal{D}_{(\tilde{w})} \subset \Omega \), let

$$ d:=\sup \biggl \{ \Vert v-v^1\Vert _H^2 + \frac{1}{2} \bigg \Vert \begin{array}{c} y^0 -y^{1}\\ z^0 - z^{1}\end{array} \bigg \Vert _D^2 + \frac{1-\tau }{2} \Bigl (\Vert B(y^0-y^{1})\Vert ^2 + \Vert C(z^0-z^{1})\Vert ^2 \Bigr ) \, \Big |\, w\in \mathcal{D}_{(\tilde{w})} \biggr \}$$

where \(v^0=(y^0,z^0,\lambda ^0)\) and \(v^1=(y^1,z^1,\lambda ^1)\) are the initial and the first generated iterates, respectively. Then, after t iterations of the scheme (1.11), the point \(\tilde{w}_t \in \Omega \) defined in (64) satisfies

$$ \tilde{w}_t\in \Omega \quad \hbox {and} \quad \sup _{w\in \mathcal{D}_{(\tilde{w}_t)}} \bigl \{ \theta (\tilde{u}_t) - \theta (u) + (\tilde{w}_t-w)^T F(w) \bigr \}\le \frac{d}{2t}=O\Bigl (\frac{1}{t}\Bigr ), $$

which means \({\tilde{w}}_t\) is an approximate solution of VI(\(\Omega ,F,\theta \)) with an accuracy of O(1/t) (recall (62)). That is, a worst-case O(1/t) convergence rate is established for the scheme (1.11) with \(\tau >0.5\). Since \({\tilde{w}}_t\) defined in (64) is the average of the first t iterates of (1.11), this convergence rate is in the ergodic sense.

9 Conclusions

We revisit the splitting method proposed in [11] for solving separable convex minimization models, and show that its optimal proximal parameter is 0.5 when the objective function is the sum of three functions. This optimal proximal parameter offers the possibility of immediate numerical acceleration, which can be easily verified on the examples tested in [2, 11] and others; for succinctness, we omit the presentation of numerical results. Meanwhile, more sophisticated techniques are required for the convergence analysis, because the optimal proximal parameter also renders the proximal regularization term positive-indefinite. We establish the convergence and estimate the worst-case convergence rate in terms of iteration complexity for the improved version of the method in [11] with the optimal proximal parameter. This work is inspired by the analysis in our recent work [9, 10] on the augmented Lagrangian method and the alternating direction method of multipliers.