1 Introduction

In this paper, we consider the linearly constrained multi-block separable convex programming problem, whose objective function is the sum of m functions with decoupled variables:

$$\begin{aligned} \min \left\{ \begin{aligned} \sum _{i=1}^m\theta _i(x_i)|\sum _{i=1}^mA_ix_i=b,\;x_i\in \mathcal {X}_i,\;i=1,2,\ldots ,m \end{aligned}\right\} , \end{aligned}$$
(1.1)

where \(\theta _i:\mathcal {R}^{n_i}\rightarrow \mathcal {R}\;(i=1,2,\ldots ,m)\) are closed convex functions (not necessarily smooth); \(A_i\in \mathcal {R}^{l\times n_i}\;(i=1,2,\ldots ,m)\) are given matrices; \(b\in \mathcal {R}^l\); and \(\mathcal {X}_i\subseteq \mathcal {R}^{n_i}\;(i=1,2,\ldots ,m)\) are nonempty closed convex sets. Throughout this paper, the solution set of (1.1) is assumed to be nonempty. Many problems encountered in machine learning and statistical inference can be cast in the form of model (1.1) [2, 20,21,22, 27].

In recent decades, splitting methods for solving the large-scale problem (1.1) have been investigated in depth, e.g., the split-Bregman iteration method [18, 30], the fixed-point proximity methods [6, 19], and the alternating direction method of multipliers [13, 14, 26, 28]. The relationships among these methods and numerical comparisons can be found in [19]. In this paper, we study the alternating direction method of multipliers (ADMM), which was originally developed in [10] and has been revisited recently due to its success in applications of separable convex programming [2, 22, 23, 27]. The kth iteration of ADMM for (1.1) with \(m=2\) reads as

$$\begin{aligned} \left\{ \begin{array}{l} x_1^{k+1}=\mathop {{\mathrm {argmin}}}\limits _{x_1\in \mathcal {X}_1} \left\{ \mathcal {L}_\beta \left( x_1,x_{2}^k;\lambda ^k\right) \right\} ,\\ x_2^{k+1}=\mathop {{\mathrm {argmin}}}\limits _{x_2\in \mathcal {X}_2} \left\{ \mathcal {L}_\beta \left( x_1^{k+1},x_{2};\lambda ^k\right) \right\} ,\\ \lambda ^{k+1}=\lambda ^{k}-\beta \left( \begin{aligned} \sum _{i=1}^2A_ix_i^{k+1}-b\end{aligned}\right) , \end{array}\right. \end{aligned}$$
(1.2)

where \(\beta >0\), and

$$\begin{aligned} \mathcal {L}_\beta (x_1,x_2;\lambda )=\sum _{i=1}^2\theta _i(x_i) -\left\langle \begin{aligned} \lambda , \sum _{i=1}^2A_ix_i-b\end{aligned}\right\rangle +\frac{\beta }{2}\left\| \begin{aligned} \sum _{i=1}^2A_ix_i-b\end{aligned} \right\| ^2 \end{aligned}$$

is the augmented Lagrangian function of the model (1.1) with \(m=2\), and \(\lambda \in \mathcal {R}^l\) is the Lagrangian multiplier. Based on the most recent value of the other block, the iterative scheme (1.2) updates the primal variables \(x_i\;(i=1,2)\) in a sequential way. However, ADMM is designed for the two-block case, and its global convergence cannot be guaranteed if it is directly extended to the m-block case (\(m\ge 3\)) [5]. Furthermore, its sequential updates make it unsuitable for distributed computing, which is particularly desirable for the large-scale problem (1.1) with big data.

To address the above two issues, in recent years many researchers have incorporated the regularization technique and the parallel iterative technique into the iterative scheme (1.2), and proposed some proximal ADMM-type methods with fully parallel splitting [7, 20] and some ADMM-type methods with partially parallel splitting [15, 28], which not only have global convergence under mild conditions but are also suitable for distributed computing. In [7], Deng, Lai, Peng and Yin proposed a proximal Jacobian ADMM, which regularizes each subproblem by a proximal term \(\frac{1}{2}\Vert x_i-x_i^k\Vert ^2_{P_i}\), where \(P_i\) is the proximal matrix and is often required to be positive semi-definite. Many choices of \(P_i\) are presented in [8], e.g., \(P_i=\tau _i I_{n_i}-\beta A_i^\top A_i\), which linearizes the third term of the augmented Lagrangian function \( {\mathcal {L}_\beta (x_1,\ldots ,x_m;\lambda ):=\sum _{i=1}^m\theta _i(x_i)-\left\langle \begin{aligned} \lambda , \sum \nolimits _{i=1}^mA_ix_i-b\end{aligned}\right\rangle +\frac{\beta }{2}\left\| \begin{aligned}\sum \nolimits _{i=1}^mA_ix_i-b\end{aligned}\right\| ^2}\), so that the corresponding subproblem in (1.2) often admits a closed-form solution in practice [30]. To ensure the global convergence of the method in [7], the proximal parameter \(\tau _i\) in \(P_i\) needs to satisfy the condition \(\tau _i>\frac{m\beta }{2-\gamma }\Vert A_i\Vert ^2\), where \(\gamma \in (0,2)\). Similarly, Lin, Liu and Li [20] presented a linearized alternating direction method with parallel splitting for the model (1.1); to ensure its global convergence, the proximal parameter \(\tau _i\) in \(P_i\) needs to satisfy \(\tau _i>m\beta \Vert A_i\Vert ^2\). Wang and Song [28] developed a twisted proximal ADMM (denoted by TPADMM) with partially parallel splitting, which updates \(x_1\) and \(x_i\;(i=2,3,\ldots ,m)\) in a sequential way, but updates the variables \(x_i\;(i=2,3,\ldots ,m)\) in a parallel way. Furthermore, if we set \(P_i=\tau _i I_{n_i}-\beta A_i^\top A_i\) in [28], then the proximal parameter \(\tau _i\) must satisfy \(\tau _i>(m-1)\beta \Vert A_i\Vert ^2\), and the lower bound \((m-1)\beta \Vert A_i\Vert ^2\) is obviously smaller than the \(m\beta \Vert A_i\Vert ^2\) required in [20].
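
To make the effect of the choice \(P_i=\tau _i I_{n_i}-\beta A_i^\top A_i\) explicit, the following elementary completion of squares may help (this is our illustration; here \(r_i^k:=\sum _{j\ne i}A_jx_j-b-\lambda /\beta \) is our shorthand, evaluated at the latest available values, for everything in the coupling term that is fixed in the \(x_i\)-subproblem):

$$\begin{aligned} \frac{\beta }{2}\Vert A_ix_i+r_i^k\Vert ^2+\frac{1}{2}\Vert x_i-x_i^k\Vert ^2_{P_i} =\frac{\tau _i}{2}\left\| x_i-\Big (x_i^k-\frac{\beta }{\tau _i}A_i^\top \big (A_ix_i^k+r_i^k\big )\Big )\right\| ^2+{\mathrm {const}}, \end{aligned}$$

so the quadratic coupling \(\beta A_i^\top A_i\) cancels and the \(x_i\)-subproblem reduces to a proximal step on \(\theta _i\) alone, which is why it often admits a closed-form solution.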

Fazel et al. [9] have pointed out that the choice of the proximal matrix \(P_i\) is very much problem dependent, and the general principle is that \(P_i\) should be as small as possible while the subproblem related to the variable \(x_i\) remains relatively easy to solve. Therefore, the feasible region of the proximal parameter \(\tau _i\) deserves further study. In this paper, we study the twisted proximal ADMM (TPADMM) in [28], and show that the greatest lower bound of its proximal parameter can be substantially reduced. Specifically, a sharper lower bound of the proximal parameter \(\tau _i\) is \(\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1)\beta \Vert A_i\Vert ^2\), where \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\). If \(\gamma =1\), this lower bound becomes \(\frac{4(m-1)}{5}\beta \Vert A_i\Vert ^2\), which is obviously smaller than \(m\beta \Vert A_i\Vert ^2\) in [20] and \((m-1)\beta \Vert A_i\Vert ^2\) in [28]. Furthermore, it is worth noting that the relaxation parameter \(\gamma \) is attached to the update formulas of all variables of TPADMM [28]. However, this relaxation strategy may destroy the inherent structure of the primal variables, which stems from the subproblems. Therefore, in our new method we use the following relaxation technique: only the dual variable \(\lambda \) is relaxed. That is, we attach the relaxation factor \(\gamma \) only to the update of the dual variable \(\lambda \), while the updating formulas of the primal variables \(x_i\;(i=1,2,\ldots ,m)\) are independent of \(\gamma \).

The rest of this paper is organized as follows. Section 2 offers some notations and basic results that will be used in the subsequent discussions. In Sect. 3, we present the proximal ADMM with smaller proximal parameter for the model (1.1) and establish its global convergence and convergence rate. In Sect. 4, we report some numerical results to demonstrate the numerical advantage of smaller proximal parameter and relaxing technique. Finally, a brief conclusion ends this paper in Sect. 5.

2 Preliminaries

In this section, we define some notations and present the necessary assumptions on the model (1.1) under which our convergence analysis can be conducted.

Throughout this paper, we set

$$\begin{aligned} y=\left[ x_2;\ldots ;x_m\right] ,~~ x=\left[ x_1;y\right] ,~~ u=\left[ x;\lambda \right] , ~~v=\left[ x_2;x_3;\ldots ;x_m;\lambda \right] , \end{aligned}$$

and

$$\begin{aligned} \bar{\theta }(y)= & {} \sum _{i=2}^m\theta _i(x_i),~~ \theta (x)=\theta _1(x_1)+\bar{\theta }(y),~~ \mathcal {B}=[A_2,A_3,\ldots ,A_m], ~~\mathcal {A}=[A_1,\mathcal {B}]\\ \bar{\mathcal {X}}= & {} \mathcal {X}_2\times \cdots \times \mathcal {X}_m,~~ \mathcal {X}=\mathcal {X}_1\times \bar{\mathcal {X}}. \end{aligned}$$

Furthermore, the domain of a function \(f(\cdot ):\tilde{\mathcal {X}}\rightarrow (-\infty ,+\infty ]\) is defined as dom\((f)=\{x\in \tilde{\mathcal {X}}\,|\,f(x)<+\infty \}\); the set of all relative interior points of a given nonempty convex set \(\Omega \) is denoted by ri\((\Omega )\); \(G\succ 0\) (or \(G\succeq 0\)) denotes that the symmetric matrix G is positive definite (or positive semi-definite); for any vector x and any symmetric matrix G with compatible dimensionality, we denote \(\Vert x\Vert _G^2=x^\top Gx\), even though G may not be positive definite; Diag\(\{A_1,A_2,\ldots ,A_m\}\) denotes the block-diagonal matrix with \(A_i\) as its \(i\)-th diagonal block.

Throughout, we make the following standard assumptions.

Assumption 2.1

The functions \(\theta _i(\cdot ) (i=1,2,\ldots ,m)\) are all convex.

Assumption 2.2

There is a point \((\hat{x}_1,\hat{x}_2,\ldots ,\hat{x}_m)\in {\mathrm {ri}}({\mathrm {dom}}\theta _1\times {\mathrm {dom}}\theta _2\times \cdots \times {\mathrm {dom}}\theta _m)\) such that \(A_1\hat{x}_1+A_2\hat{x}_2+\cdots +A_m\hat{x}_m=b\), and \(\hat{x}_i\in \mathcal {X}_i, i=1,2,\ldots ,m\).

Assumption 2.3

The matrices \(A_i(i=2,3,\ldots ,m)\) are full column rank.

Using the first-order optimality condition for convex minimization and Assumptions 2.1 and 2.2, the model (1.1) can be recast as the following mixed variational inequality problem, denoted by VI(\(\mathcal {U},F,\theta )\): find a vector \(u^*\in \mathcal {U}\) such that

$$\begin{aligned} \theta (x)-\theta (x^*)+(u-u^*)^\top F(u^*)\ge 0,~~\forall u\in \mathcal {U}, \end{aligned}$$
(2.1)

where \(\mathcal {U}=\mathcal {X}_1\times \mathcal {X}_2\times \cdots \times \mathcal {X}_m\times \mathcal {R}^l\), and

$$\begin{aligned} F(u):=\left( \begin{array}{c}-A_1^\top \lambda \\ -A_2^\top \lambda \\ \vdots \\ -A_m^\top \lambda \\ \mathcal {A}x- b\end{array}\right) . \end{aligned}$$
(2.2)

We denote by \(\mathcal {U}^*\) the solution set of (2.1), which is nonempty under Assumption 2.2 and the assumption that the solution set of the model (1.1) is nonempty. The mapping F(u) defined in (2.2) is affine and satisfies the following simple property

$$\begin{aligned} (u'-u)^\top (F(u')-F(u))=0, \quad \forall u',u\in \mathcal {U}. \end{aligned}$$
(2.3)

The following proposition provides a criterion to measure the convergence rate of the iterative methods for the model (1.1).

Proposition 2.1

[20] Let \(\hat{x}\in \mathcal {X}\). Then \(\hat{x}\) is an optimal solution of the model (1.1) if and only if there exists \(r>0\) such that

$$\begin{aligned} \theta (\hat{x})-\theta (x^*)+\left( \hat{x}-x^*\right) ^\top \left( - {\mathcal {A}}^\top \lambda ^*\right) +\frac{r}{2}\Vert {\mathcal {A}}\hat{x}-b\Vert ^2=0, \end{aligned}$$
(2.4)

where \((x^*,\lambda ^*)\in \mathcal {U}^*\).

3 The algorithm and its convergence result

In this section, we first present a new version of the TPADMM in [28] for the model (1.1), which we call the proximal ADMM with smaller proximal parameter, and then prove its global convergence and convergence rate under Assumptions 2.1–2.3. As stated earlier, the augmented Lagrangian function for the model (1.1) is defined as follows:

$$\begin{aligned} {\mathcal {L}_\beta (x_1,\ldots ,x_m;\lambda ):=\sum _{i=1}^m\theta _i(x_i)-\left\langle \begin{aligned} \lambda , \sum _{i=1}^mA_ix_i-b\end{aligned}\right\rangle +\frac{\beta }{2}\left\| \begin{aligned}\sum _{i=1}^mA_ix_i-b\end{aligned}\right\| ^2}, \end{aligned}$$

based on which, now we describe the proximal ADMM with smaller proximal parameter for solving (1.1) as follows.

Algorithm 1.

Step 0. Choose parameters \(\beta >0\), \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\), \(\tau >\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1)\), a tolerance \(\varepsilon >0\), and matrices \(\bar{G}_i\in \mathcal {R}^{n_i\times n_i}\) with \(\bar{G}_i\succeq 0\;(i=2,3,\ldots ,m)\). Choose the initial point \(u^0=[x_1^0;x_2^0;\ldots ;x_m^0;\lambda ^0]\in \mathcal {U}\), set \(k=0\), and denote \(G_i=\bar{G}_i-(1-\tau )\beta A_i^\top A_i\;(i=2,3,\ldots ,m)\).

Step 1. Compute the new iterate \({u}^{k+1}=[{x}_1^{k+1};{x}_2^{k+1};\ldots ;{x}_m^{k+1};{\lambda }^{k+1}]\) via

$$\begin{aligned} \left\{ \begin{array}{l} {x}_1^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{x_1\in \mathcal {X}_1} \left\{ \begin{aligned}\mathcal {L}_\beta (x_1,x_2^k,\ldots ,x_m^k;\lambda ^k) \end{aligned}\right\} ,\\ {x}_i^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{x_i\in \mathcal {X}_i} \left\{ \begin{aligned}\mathcal {L}_\beta ({x}_1^{k+1},x_2^k,\ldots , x_{i-1}^{k},x_i,x_{i+1}^k,\ldots ,x_m^k;\lambda ^k)+\frac{1}{2}\Vert x_i-x_i^k \Vert ^2_{G_i}\end{aligned}\right\} ,\\ i=2,3,\ldots ,m,\\ {\lambda }^{k+1}:=\lambda ^{k}-\left. \begin{aligned}\gamma \beta \end{aligned}\right. \left( \begin{aligned}\sum _{i=1}^mA_i{x}_i^{k+1}-b \end{aligned}\right) . \end{array}\right. \nonumber \\ \end{aligned}$$
(3.1)

Step 2. If \(\Vert u^k-{u}^{k+1}\Vert \le \varepsilon \), then stop; otherwise, set \(k=k+1\), and go to Step 1.

Remark 3.1

Since the matrices \(\bar{G}_i(i=2,3,\ldots ,m)\) can be any positive semi-definite matrices, we can set \(\bar{G}_i=\tau _iI_{n_i}-\tau \beta A_i^\top A_i\) with \(\tau _i\ge \tau \beta \Vert A_i\Vert ^2\). Then, \(\bar{G}_i\succeq 0\), \(G_i=\tau _iI_{n_i}-\beta A_i^\top A_i(i=2,3,\ldots ,m)\), and the subproblem with respect to the primal variable \(x_i\) in (3.1) can be rewritten as

$$\begin{aligned} {x}_i^{k+1}={\mathrm {argmin}}_{x_i\in \mathcal {X}_i} \left\{ \theta _i(x_i)+\frac{\tau _i}{2}\Vert x_i-\nu _i^k\Vert ^2\right\} , \end{aligned}$$

where \(\nu _i^k=\frac{1}{\tau _i}[\tau _ix_i^k-\beta A_i^\top A_1x_1^{k+1}-\beta A_i^\top (\sum \limits _{j=2}^mA_jx_j^k-b)+A_i^\top \lambda ^k]\) is a known vector. Therefore, when \(\mathcal {X}_i=\mathcal {R}^{n_i}\) and \(\theta _i(x_i)\) is some special function, such as \(\Vert x_i\Vert _1\) or \(\Vert x_i\Vert _2\), the above subproblem has a closed-form solution [3, 17]. Furthermore, from \(\tau _i\ge \tau \beta \Vert A_i\Vert ^2\) and \(\tau >\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1)\), we have

$$\begin{aligned} \tau _i>\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1)\beta \Vert A_i\Vert ^2. \end{aligned}$$

Therefore, the feasible region of the proximal parameter \(\tau _i\) is generally larger than those in [20, 28].
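
For instance, when \(\mathcal {X}_i=\mathcal {R}^{n_i}\) and \(\theta _i(x_i)=\Vert x_i\Vert _1\), the above subproblem is solved by componentwise soft-thresholding. The following MATLAB sketch of one such \(x_i\)-update is ours and only illustrative (all variable names are our own; the inputs are assumed to be available at the current iteration):

% Illustrative sketch (ours): one linearized x_i-update of Algorithm 1 when
% theta_i(x_i) = ||x_i||_1 and X_i = R^{n_i}.
function xi_new = l1_block_update(Ai, A1x1new, sumAjxk, b, xik, lam, beta, taui)
% Ai       : constraint matrix A_i of the i-th block
% A1x1new  : A_1*x_1^{k+1}, already available from the first (serial) step
% sumAjxk  : sum_{j=2}^m A_j*x_j^k (all "old" parallel blocks, including block i)
% b        : right-hand side of the linear constraint
% xik, lam : current iterates x_i^k and lambda^k
% beta     : penalty parameter; taui is the proximal parameter tau_i
r  = A1x1new + sumAjxk - b;                  % residual built from the old parallel blocks
nu = xik - (beta*(Ai'*r) - Ai'*lam)/taui;    % the point nu_i^k of Remark 3.1
xi_new = sign(nu).*max(abs(nu) - 1/taui, 0); % prox of ||.||_1 with weight 1/taui
end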

Remark 3.2

Different from \(\gamma =1\) in [13], the feasible set of \(\gamma \) in Algorithm 1 is the interval \((0,\frac{1+\sqrt{5}}{2})\), and larger values of \(\gamma \) can often speed up the convergence of Algorithm 1 (see Sect. 4).

When Assumptions 2.1–2.3 hold and the constant \(\tau \) satisfies the following condition

$$\begin{aligned} \tau >\max \{m-1-c_0,4c_0+(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}\}, \end{aligned}$$
(3.2)

where \(c_0\) is a positive constant that will be specified later, Algorithm 1 is globally convergent and has the worst-case \(\mathcal {O}(1/t)\) convergence rate in an ergodic sense.

Theorem 3.1

Let \(\{u^k\}\) be the sequence generated by Algorithm 1, where \(\tau \) satisfies (3.2) and \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\). Then the whole sequence \(\{u^k\}\) converges globally to some \(\hat{u}\in \mathcal {U}^*\).

Theorem 3.2

Let \(\{u^k\}\) be the sequence generated by Algorithm 1, and set

$$\begin{aligned} {\hat{x}}_{t}=\frac{1}{t}\sum _{k=1}^t\hat{x}^{k}, \end{aligned}$$

where t is a positive integer. Then, \( {\hat{x}}_{t}\in \mathcal {X}\), and

$$\begin{aligned} \theta ( {\hat{x}}_t)-\theta (x^*)+( {\hat{x}}_t-x^*)^\top (- {\mathcal {A}}^\top \lambda ^*)+\frac{\beta }{2}\min \left\{ 1, \frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert {\mathcal {A}}\hat{x}_t-b\Vert ^2\le \frac{D}{t}, \end{aligned}$$
(3.3)

where \((x^*,\lambda ^*)\in \mathcal {U}^*\), and D is a constant defined by

$$\begin{aligned} D= & {} \frac{1}{2}\left( \Vert v^1-v^*\Vert _H^2+\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} \Vert \mathcal {A}x^{1}-b\Vert ^2+\Vert y^{0}-y^{1}\Vert ^2_N\right. \nonumber \\&\left. +\,c_0\beta \sum _{i=2}^m\Vert A_i\left( x_i^{0}-x_i^{1}\right) \Vert ^2\right) . \end{aligned}$$
(3.4)

To prove Theorems 3.1 and 3.2, we need the inequality (3.21) below, which is established step by step by the following seven lemmas. Based on the first-order optimality conditions of the subproblems in (3.1), we present Lemmas 3.1 and 3.2, which provide a lower bound for the variational-inequality characterization of an auxiliary vector as a solution of VI(\(\mathcal {U},F,\theta )\); Lemma 3.3 then rewrites this lower bound in terms of quadratic terms. Lemma 3.4 provides a lower bound for some of the quadratic terms appearing in Lemma 3.3, and Lemma 3.5 gives a condition on \(\tau \) that guarantees the positive definiteness of an auxiliary matrix N defined later. Lemma 3.6 further bounds a combination of quadratic and cross terms from below. Based on the conclusions established in the first six lemmas, the key inequality (3.21) is proved in Lemma 3.7.

For convenience, we now define two matrices to make the following analysis more compact.

$$\begin{aligned} G_0=\left( \begin{array}{ccccccc}G_2&{}-\beta A_2^\top A_3&{}\cdots &{}-\beta A_2^\top A_m\\ -\beta A_3^\top A_2&{}G_3&{}\cdots &{}-\beta A_3^\top A_m\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\beta A_m^\top A_2&{}-\beta A_m^\top A_3&{}\cdots &{}G_m\end{array}\right) , \end{aligned}$$
(3.5)

and

$$\begin{aligned} Q=\left( \begin{array}{ccccccc}\bar{G}_2+\tau \beta A_2^\top A_2&{}0&{}\cdots &{}0&{}0\\ 0&{}\bar{G}_3+\tau \beta A_3^\top A_3&{}\cdots &{}0&{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}\bar{G}_m+\tau \beta A_m^\top A_m&{}0\\ -A_2&{}-A_3&{}\cdots &{}-A_m&{}I_l/\beta \end{array}\right) . \end{aligned}$$
(3.6)

Lemma 3.1

Let \(\{u^k\}\) be the sequence generated by Algorithm 1. Then, we have \(x_i^k\in \mathcal {X}_i\) and

$$\begin{aligned} \begin{aligned}&\beta \left( \mathcal {A}x^{k+1}-b\right) ^\top \mathcal {B} \left( y^k-y^{k+1}\right) \\&\quad \ge (1-\gamma )\beta \left( \mathcal {A}x^{k}-b\right) ^\top \mathcal {B}\left( y^k-y^{k+1}\right) +\left\| y^k-y^{k+1}\right\| ^2_{G_0}\\&\qquad -\left( y^k-y^{k+1}\right) ^\top G_0\left( y^{k-1}-y^{k}\right) . \end{aligned} \end{aligned}$$
(3.7)

Proof

By the first-order optimality condition for the \(x_i\)-subproblem in (3.1), we have \({x}_i^{k+1}\in \mathcal {X}_i\;(i=1,2,\ldots ,m)\) and

$$\begin{aligned}&\theta _i(x_i)-\theta _i\left( {x}_i^{k+1}\right) +\left( x_i-{x}_i^{k+1}\right) ^\top \\&\quad \left\{ -A_i^\top \lambda ^k+\beta A_i^\top \left( A_1{x}_1^{k+1}+\sum _{j=2,j\ne i}^mA_jx_j^k+A_i{x}_i^{k+1}-b\right) \right. \\&\quad \left. +G_i\left( {x}_i^{k+1}-x_i^k\right) \right\} \ge 0, \quad \forall x_i\in \mathcal {X}_i, i=2,3,\ldots ,m, \end{aligned}$$

i.e.,

$$\begin{aligned} \begin{aligned}&\theta _i(x_i)-\theta _i\left( {x}_i^{k+1}\right) +\left( x_i-{x}_i^{k+1}\right) ^\top \left\{ -A_i^\top \lambda ^k+\beta A_i^\top \left( \sum _{j=1}^mA_j{x}_j^{k+1}-b \right) \right. \\&\quad \left. -\beta A_i^\top \sum _{j=2,j\ne i}^mA_j\left( {x}_j^{k+1}-x_j^k\right) +G_i\left( {x}_i^{k+1}-x_i^k\right) \right\} \ge 0, \quad \forall x_i\in \mathcal {X}_i, \\&\quad i=2,3,\ldots ,m. \end{aligned} \end{aligned}$$
(3.8)

Then, summing the above inequality over \(i=2,3,\ldots ,m\), and using the definition of \(G_0\) in (3.5), we get

$$\begin{aligned}&\bar{\mathcal {\theta }}(y)-\bar{\mathcal {\theta }}\left( y^{k+1}\right) +\left( y-y^{k+1}\right) ^\top \nonumber \\&\quad \times \left[ -\mathcal {B}^\top {\lambda }^{k}+\beta \mathcal {B}^\top \left( \mathcal {A}x^{k+1}-b\right) +G_0\left( y^{k+1}-y^k\right) \right] \ge 0. \end{aligned}$$

Note that the above inequality is also true for \(k:=k-1\) and thus

$$\begin{aligned} \bar{\mathcal {\theta }}(y)-\bar{\mathcal {\theta }}\left( y^{k}\right) +\left( y-y^{k}\right) ^\top \left[ -\mathcal {B}^\top {\lambda }^{k-1}+\beta \mathcal {B}^\top \left( \mathcal {A}x^{k}-b\right) +G_0\left( y^{k}-y^{k-1}\right) \right] \ge 0. \end{aligned}$$

Setting \(y=y^k\) and \(y=y^{k+1}\) in the above inequalities, respectively, and then adding them, we get

$$\begin{aligned}&\left( y^k-y^{k+1}\right) ^\top \mathcal {B}^\top \left[ \left( {\lambda }^{k-1}-{\lambda }^{k}\right) +\beta \left( \mathcal {A}x^{k+1}-b\right) -\beta \left( \mathcal {A}x^{k}-b\right) \right] \\&\quad \ge \Vert y^k-y^{k+1}\Vert ^2_{G_0}-\left( y^k-y^{k+1}\right) ^\top G_0\left( y^{k-1}-y^{k}\right) . \end{aligned}$$

By the updating formula for \(\lambda \) in (3.1), we have \({\lambda }^{k-1}-{\lambda }^{k}=\gamma \beta (\mathcal {A}x^{k}-b)\). Substituting this into the left-hand side of the above inequality, we get the assertion (3.7) immediately. This completes the proof. \(\square \)

Now, let us define an auxiliary sequence \(\{\hat{u}^k\}=\{[\hat{x}_1^k;\hat{x}_2^k;\ldots ;\hat{x}_m^k;\hat{\lambda }^k]\}\), whose components are defined by

$$\begin{aligned} \hat{x}_i^k={x}_i^{k+1}(i=1,2,\ldots ,m), \quad \hat{\lambda }^k=\lambda ^k-\beta \left( A_1{x}_1^{k+1} +\sum _{i=2}^mA_i{x}_i^{k}-b\right) , \end{aligned}$$
(3.9)

Lemma 3.2

The auxiliary sequence \(\{\hat{u}^k\}\) satisfies \(\hat{u}^k\in \mathcal {U}\) and

$$\begin{aligned} \theta (x)-\theta \left( \hat{x}^k\right) +\left( u-\hat{u}^k\right) ^\top F\left( \hat{u}^k\right) \ge \left( v-\hat{v}^k\right) ^\top Q\left( v^k-\hat{v}^k\right) , \quad \forall u\in \mathcal {U},\qquad \end{aligned}$$
(3.10)

where the matrix Q is defined by (3.6).

Proof

From \(\hat{x}^k=x^{k+1}\), it is obvious that \(\hat{u}^k\in \mathcal {U}\). The first-order optimality condition for \(x_1\)-subproblem in (3.1) and the definition of \(\hat{\lambda }^k\) in (3.9) imply

$$\begin{aligned} \theta _1(x_1)-\theta _1\left( \hat{x}_1^{k}\right) +\left( x_1-\hat{x}_1^{k}\right) ^\top \left( -A_1^\top \hat{\lambda }^k\right) \ge 0, \quad \forall x_1\in \mathcal {X}_1. \end{aligned}$$
(3.11)

By the definitions of \(\hat{\lambda }^k\) and \(G_i\) (\(i=2,3,\ldots ,m\)), we can get

$$\begin{aligned}&-A_i^\top \left[ \lambda ^k-\beta \left( \sum _{j=1}^mA_j{x}_j^{k+1}-b\right) \right] -\beta A_i^\top \sum _{j=2,j\ne i}^mA_j\left( {x}_j^{k+1}-x_j^k\right) +G_i\left( {x}_i^{k+1}-x_i^k\right) \\&\quad =-A_i^\top \left[ \lambda ^k-\beta \left( A_1{x}_1^{k+1}+\sum _{j=2}^m A_j{x}_j^{k}-b\right) \right] +\left( G_i+\beta A_i^\top A_i\right) \left( {x}_i^{k+1}-x_i^k\right) \\&\quad =-A_i^\top \hat{\lambda }^k+\left( G_i+\beta A_i^\top A_i\right) \left( \hat{x}_i^{k}-x_i^k\right) \\&\quad =-A_i^\top \hat{\lambda }^k+\left( \bar{G}_i+\tau \beta A_i^\top A_i\right) \left( \hat{x}_i^{k}-x_i^k\right) . \end{aligned}$$

Then, substituting the above equality into the left-hand side of (3.8), we thus derive

$$\begin{aligned}&\theta _i(x_i)-\theta _i\left( \hat{x}_i^{k}\right) +\left( x_i-\hat{x}_i^{k}\right) ^\top \left\{ -A_i^\top \hat{\lambda }^k+\left( \bar{G}_i+\tau \beta A_i^\top A_i\right) \left( \hat{x}_i^{k}-x_i^k\right) \right\} \ge 0, \quad \nonumber \\&\quad \forall x_i \in \mathcal {X}_i, i=2,3,\ldots ,m. \end{aligned}$$
(3.12)

Furthermore, from the definition of \(\hat{\lambda }^k\), we get the following inequality

$$\begin{aligned} \left( \lambda -\hat{\lambda }^{k}\right) ^\top \left\{ \left( \sum _{i=1}^mA_i\hat{x}_i^{k}-b\right) -\sum _{j=2}^mA_j \left( \hat{x}_j^k-x_j^k\right) +\frac{1}{\beta }\left( \hat{\lambda }^k-\lambda ^k\right) \right\} \ge 0, \quad \forall \lambda \in \mathcal {R}^l.\nonumber \\ \end{aligned}$$
(3.13)

Then, adding (3.11), the inequalities (3.12) for \(i=2,3,\ldots ,m\), and (3.13), and performing some simple manipulations, we obtain the assertion (3.10) immediately. This completes the proof. \(\square \)

Remark 3.3

If \(u^k={u}^{k+1}\), then \(x^{k+1}=x^k\) and \(\lambda ^{k+1}=\lambda ^k\); the \(\lambda \)-update in (3.1) then gives \(\mathcal {A}x^{k+1}=b\), and hence, by (3.9), we have

$$\begin{aligned} x^k=\hat{x}^{k}, \lambda ^k=\hat{\lambda }^{k}. \end{aligned}$$

Substituting the above two equalities into the right-hand side of (3.10), we obtain

$$\begin{aligned} \theta (x)-\theta (\hat{x}^k)+(u-\hat{u}^k)^\top F(\hat{u}^k)\ge 0, \quad \forall u\in \mathcal {U}, \end{aligned}$$

i.e.,

$$\begin{aligned} \theta (x)-\theta ({x}^k)+(u-{u}^k)^\top F({u}^k)\ge 0, \quad \forall u\in \mathcal {U}, \end{aligned}$$

which implies that \(u^k\) is a solution of VI(\(\mathcal {U},F,\theta )\), and thus \(x^k\) is a solution of the model (1.1). This indicates that the stopping criterion of Algorithm 1 is reasonable.

The updating formula for \({\lambda }^{k+1}\) in (3.1) can be rewritten as

$$\begin{aligned}&{\lambda }^{k+1}\\&\quad =\lambda ^{k}-\left( -\gamma \beta \sum _{i=2}^mA_i\left( {x}_i^{k} -{x}_i^{k+1}\right) \right) -\gamma \beta \left( A_1{x}_1^{k+1}+\sum _{i=2}^mA_i{x}_i^{k}-b\right) \\&\quad =\lambda ^{k}-\left[ -\gamma \beta \sum _{i=2}^mA_i\left( {x}_i^{k} -\hat{x}_i^{k}\right) +\gamma (\lambda ^k-\hat{\lambda }^k)\right] . \end{aligned}$$

Combining the above equality and (3.9), we get

$$\begin{aligned} {v}^{k+1}=v^k-{M}(v^k-\hat{v}^k), \end{aligned}$$
(3.14)

where

$$\begin{aligned} {M}=\left( \begin{array}{ccccccc}I_{n_2}&{}0&{}\cdots &{}0&{}0\\ 0&{}I_{n_3}&{}\cdots &{}0&{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}I_{n_m}&{}0\\ -\gamma \beta A_2&{}-\gamma \beta A_3&{}\cdots &{}-\gamma \beta A_m&{}\gamma I_l\end{array}\right) . \end{aligned}$$

Now we define two auxiliary matrices as

$$\begin{aligned} H:=QM^{-1} \quad {\mathrm {and}} \quad G:=Q^\top +Q-M^\top HM. \end{aligned}$$

By simple calculation, we have

$$\begin{aligned} H=\left( \begin{array}{ccccccc}\bar{G}_2+\tau \beta A_2^\top A_2&{}0&{}\cdots &{}0&{}0\\ 0&{}\bar{G}_3+\tau \beta A_3^\top A_3&{}\cdots &{}0&{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}\bar{G}_m+\tau \beta A_m^\top A_m&{}0\\ 0&{}0&{}\cdots &{}0&{}I_l/(\gamma \beta )\end{array}\right) , \end{aligned}$$

and

$$\begin{aligned} G=\left( \begin{array}{ccccccc}\tilde{G}_2&{}-\gamma \beta A_2^\top A_3&{}\cdots &{}-\gamma \beta A_2^\top A_m&{}(\gamma -1)A_2^\top \\ -\gamma \beta A_3^\top A_2&{}\tilde{G}_3&{}\cdots &{}-\gamma \beta A_3^\top A_m&{}(\gamma -1)A_3^\top \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\gamma \beta A_m^\top A_2&{} -\gamma \beta A_m^\top A_3&{}\cdots &{}\tilde{G}_m&{}(\gamma -1)A_m^\top \\ (\gamma -1)A_2&{}(\gamma -1)A_3&{}\cdots &{}(\gamma -1)A_m&{}(2-\gamma )I_l/\beta \end{array}\right) , \end{aligned}$$

where \(\tilde{G}_i=\bar{G}_i+(\tau -\gamma )\beta A_i^\top A_i\; (i=2,3,\ldots ,m)\). Obviously, under Assumption 2.3, the matrix H is positive definite, and the following relationship among the matrices Q, M, H holds

$$\begin{aligned} Q=HM. \end{aligned}$$
(3.15)
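
For a small random instance, the block-diagonal structure of H, the relation (3.15) and the stated form of G can be checked numerically; the following MATLAB snippet (ours, with arbitrary toy dimensions) is only a sanity check, not part of the analysis.

% Verify H = Q*M^{-1} is block diagonal, Q = H*M, and G has the stated form (m = 3).
m = 3; l = 4; n2 = 3; n3 = 2; beta = 1; gamma = 1.2; tau = 0.8*(m-1);
A2 = randn(l,n2); A3 = randn(l,n3); G2b = zeros(n2); G3b = zeros(n3);
D2 = G2b + tau*beta*(A2'*A2); D3 = G3b + tau*beta*(A3'*A3);
Q = [D2 zeros(n2,n3) zeros(n2,l); zeros(n3,n2) D3 zeros(n3,l); -A2 -A3 eye(l)/beta];
M = [eye(n2) zeros(n2,n3) zeros(n2,l); zeros(n3,n2) eye(n3) zeros(n3,l); ...
     -gamma*beta*A2 -gamma*beta*A3 gamma*eye(l)];
H = Q/M;                                                   % H = Q*M^{-1}
G = Q' + Q - M'*H*M;
disp(norm(H - blkdiag(D2,D3,eye(l)/(gamma*beta)),'fro'))   % ~0: H is block diagonal
disp(norm(Q - H*M,'fro'))                                  % ~0: relation (3.15)
Gref = [G2b+(tau-gamma)*beta*(A2'*A2) -gamma*beta*(A2'*A3) (gamma-1)*A2'; ...
        -gamma*beta*(A3'*A2) G3b+(tau-gamma)*beta*(A3'*A3) (gamma-1)*A3'; ...
        (gamma-1)*A2 (gamma-1)*A3 (2-gamma)*eye(l)/beta];
disp(norm(G - Gref,'fro'))                                 % ~0: matches the stated G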

Based on (3.10) and (3.14), it seems that Algorithm 1 can be categorized into the prototype algorithm proposed in [12]. However, the convergence results of Algorithm 1 cannot be guaranteed by the results in [12], because we cannot ensure that the matrix G is positive semi-definite, which is a necessary condition for the global convergence of the prototype algorithm in [12]. To show that the matrix G may be indefinite, we only need to show that its sub-matrix \(G(1:m-1,1:m-1)\) may be indefinite. Here \(G(1:m-1,1:m-1)\) denotes the sub-matrix formed from the rows and columns with indices \((1:m-1)\) in the block sense. In fact, setting \(\bar{G}_i=0\,(i=2,3,\ldots ,m)\), one has

$$\begin{aligned}&G(1:m-1,1:m-1)=\left( \begin{array}{ccccccc}(\tau -\gamma )\beta A_2^\top A_2&{}-\gamma \beta A_2^\top A_3&{}\cdots &{}-\gamma \beta A_2^\top A_m\\ -\gamma \beta A_3^\top A_2&{}(\tau -\gamma )\beta A_3^\top A_3&{}\cdots &{}-\gamma \beta A_3^\top A_m\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\gamma \beta A_m^\top A_2&{}-\gamma \beta A_m^\top A_3&{}\cdots &{}(\tau -\gamma )\beta A_m^\top A_m\end{array}\right) \\&=\beta \left( \begin{array}{ccccccc} A_2^\top &{}0&{}\cdots &{}0\\ 0&{} A_3^\top &{}\cdots &{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}A_m^\top \end{array}\right) \left( \begin{array}{ccccccc} (\tau -\gamma )I_l&{}-\gamma I_l&{}\cdots &{}-\gamma I_l\\ -\gamma I_l&{}(\tau -\gamma )I_l&{}\cdots &{}-\gamma I_l\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\gamma I_l&{}-\gamma I_l&{}\cdots &{}(\tau -\gamma )I_l\end{array}\right) \left( \begin{array}{ccccccc} A_2&{}0&{}\cdots &{}0\\ 0&{} A_3&{}\cdots &{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}A_m\end{array}\right) . \end{aligned}$$

The middle matrix in the above expression can be further written as

$$\begin{aligned} \left( \begin{array}{ccccccc} \tau -\gamma &{}-\gamma &{}\cdots &{}-\gamma \\ -\gamma &{}\tau -\gamma &{}\cdots &{}-\gamma \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\gamma &{}-\gamma &{}\ldots &{}\tau -\gamma \end{array}\right) \otimes I_l \end{aligned}$$

Then, we only need to show that the \((m-1)\)-by-\((m-1)\) matrix

$$\begin{aligned} \left( \begin{array}{ccccccc} \tau -\gamma &{}-\gamma &{}\cdots &{}-\gamma \\ -\gamma &{}\tau -\gamma &{}\cdots &{}-\gamma \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ -\gamma &{}-\gamma &{}\cdots &{}\tau -\gamma \end{array}\right) \end{aligned}$$

may be indefinite. In fact, its eigenvalues are

$$\begin{aligned} \lambda _1=\lambda _2=\cdots =\lambda _{m-2}=\tau , \lambda _{m-1}=\tau -(m-1)\gamma . \end{aligned}$$

Then, the matrix \(G(1:m-1,1:m-1)\) is positive semi-definite if and only if \(\tau \ge (m-1)\gamma \). Obviously, we cannot ensure that \(G(1:m-1,1:m-1)\) is positive semi-definite for every \(\tau >\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1)\). Therefore, the matrix G may be indefinite.
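
As a quick numerical illustration (ours, not part of the analysis), take \(m=4\), \(\gamma =1\) and \(\tau =2.5\), which exceeds the lower bound \(\frac{4(m-1)}{5}=2.4\); then \(\tau -(m-1)\gamma =-0.5<0\), and the following MATLAB lines confirm the negative eigenvalue:

% Middle matrix with diagonal entries tau-gamma and off-diagonal entries -gamma
m = 4; gamma = 1; tau = 2.5;          % tau exceeds the lower bound 4*(m-1)/5 = 2.4
T = tau*eye(m-1) - gamma*ones(m-1);   % eigenvalues: tau (m-2 times) and tau-(m-1)*gamma
disp(sort(eig(T))')                   % displays  -0.5000   2.5000   2.5000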

Now, let us rewrite the crossing term \((v-\hat{v}^k)^\top Q(v^k-\hat{v}^k)\) on the right-hand side of (3.10) as a combination of quadratic terms.

Lemma 3.3

Let \(\{u^k\}\) be the sequence generated by Algorithm 1. Then, for any \(u\in \mathcal {U}\), we have

$$\begin{aligned} (v-\hat{v}^k)^\top Q(v^k-\hat{v}^k)=\frac{1}{2}(\Vert v-v^{k+1}\Vert ^2_H-\Vert v-v^k\Vert _H^2) +\frac{1}{2}(\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2). \end{aligned}$$
(3.16)

Proof

By \(H\succ 0\), (3.14) and (3.15), we have

$$\begin{aligned}&(v-\hat{v}^k)^\top Q(v^k-\hat{v}^k)\\&\quad =(v-\hat{v}^k)^\top HM(v^k-\hat{v}^k)\\&\quad =(v-\hat{v}^k)^\top H(v^k-{v}^{k+1})\\&\quad =\frac{1}{2}(\Vert v-v^{k+1}\Vert ^2_H-\Vert v-v^k\Vert _H^2)+\frac{1}{2}(\Vert v^k- \hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2), \end{aligned}$$

where the last equality is obtained by setting \(a=v, b=\hat{v}^k, c=v^k, d=v^{k+1}\) in the identity

$$\begin{aligned} (a-b)^\top H(c-d)=\frac{1}{2}(\Vert a-d\Vert _H^2-\Vert a-c\Vert _H^2)+\frac{1}{2}(\Vert c-b\Vert _H^2-\Vert d-b\Vert _H^2). \end{aligned}$$

This completes the proof. \(\square \)

Substituting (3.16) into the right-hand side of (3.10), for any \(u\in \mathcal {U}\), we get

$$\begin{aligned}&\theta (x)-\theta (\hat{x}^k)+(u-\hat{u}^k)^\top F(\hat{u}^k)\ge \frac{1}{2}(\Vert v-v^{k+1}\Vert ^2_H-\Vert v-v^k\Vert _H^2)\nonumber \\&\quad +\,\frac{1}{2}(\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2). \end{aligned}$$
(3.17)

The first group of terms, \(\frac{1}{2}(\Vert v-v^{k+1}\Vert ^2_H-\Vert v-v^k\Vert _H^2)\), on the right-hand side of (3.17) can be summed telescopically over the iterations. Therefore, we only need to deal with the second group of terms, \(\frac{1}{2}(\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2)\).

Lemma 3.4

Let \(\{u^k\}\) be the sequence generated by Algorithm 1. Then, we have

$$\begin{aligned} \begin{aligned}&\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad \ge \sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i} +\beta (\tau -(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \})\\&\qquad \sum _{i=2}^m\Vert A_i(x_i^k -{x}_i^{k+1})\Vert ^2+\beta \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2\\&\qquad +\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} (\Vert \mathcal {A}x^{k+1}-b\Vert ^2-\Vert \mathcal {A}x^{k}-b\Vert ^2)\\&\qquad +2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top G_0(y^{k-1}-y^{k}). \end{aligned} \end{aligned}$$
(3.18)

Proof

Obviously, \(\hat{\lambda }^k\) defined by (3.9) can be rewritten as

$$\begin{aligned} \hat{\lambda }^k=\lambda ^k-\beta \left( A_1{x}_1^{k+1}+\mathcal {B}y^k-b\right) . \end{aligned}$$

Then, from (3.14) and the definition of H, we can expand \(\Vert v^{k+1}-\hat{v}^k\Vert _H^2\) as

$$\begin{aligned}&\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad =\Vert (I-M)(v^{k}-\hat{v}^k)\Vert _H^2\\&\quad =\frac{1}{\gamma \beta }\Vert (\lambda ^k-\hat{\lambda }^k) -[-\gamma \beta \mathcal {B}(y^k-\hat{y}^k)+\gamma (\lambda ^k-\hat{\lambda }^k)]\Vert ^2\\&\quad =\frac{1}{\gamma \beta }\Vert (\lambda ^k-\hat{\lambda }^k) -[-\gamma \beta \mathcal {B}(y^k-{y}^{k+1})+\gamma \beta (A_1{x}_1^{k+1} +\mathcal {B}y^k-b)]\Vert ^2\\&\quad =\frac{1}{\gamma \beta }\Vert (\lambda ^k-\hat{\lambda }^k) -\gamma \beta (\mathcal {A}x^{k+1}-b)\Vert ^2\\&\quad =\frac{1}{\gamma \beta }\Vert \lambda ^k-\hat{\lambda }^k \Vert ^2-2(\lambda ^k-\hat{\lambda }^k)^\top (\mathcal {A}x^{k+1}-b)+\gamma \beta \Vert \mathcal {A}x^{k+1}-b\Vert ^2, \end{aligned}$$

and thus, we have

$$\begin{aligned}&\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad =\sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}+\tau \beta \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2+\frac{1}{\gamma \beta }\Vert \lambda ^k-\hat{\lambda }^{k}\Vert ^2\\&\qquad -\left( \frac{1}{\gamma \beta }\Vert \lambda ^k-\hat{\lambda }^k\Vert ^2 -2(\lambda ^k-\hat{\lambda }^k)^\top (\mathcal {A}x^{k+1}-b)+\gamma \beta \Vert \mathcal {A}x^{k+1}-b\Vert ^2\right) \\&\quad =\sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}+\tau \beta \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2\\&\qquad +2(\lambda ^k-\hat{\lambda }^k)^\top (\mathcal {A}x^{k+1}-b) -\gamma \beta \Vert \mathcal {A}x^{k+1}-b\Vert ^2. \end{aligned}$$

From (3.9) again, we have \(\lambda ^k-\hat{\lambda }^k=\beta \mathcal {B}(y^k-y^{k+1})+\beta (\mathcal {A}x^{k+1}-b)\), and then \(2(\lambda ^k-\hat{\lambda }^k)^\top (\mathcal {A}x^{k+1}-b)=2\beta (\mathcal {A}x^{k+1}-b)^\top \mathcal {B}(y^k-y^{k+1})+2\beta \Vert \mathcal {A}x^{k+1}-b\Vert ^2\). Substituting this into the right-hand side of the above equality, we obtain

$$\begin{aligned}&\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad =\sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}+\tau \beta \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2\\&\qquad +2\beta (\mathcal {A}x^{k+1}-b)^\top \mathcal {B}(y^k-y^{k+1}) +(2-\gamma )\beta \Vert \mathcal {A}x^{k+1}-b\Vert ^2\\&\quad \ge \sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}+\tau \beta \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2+\beta (2-\gamma )\Vert \mathcal {A}x^{k+1}-b\Vert ^2\\&\qquad +2\beta (1-\gamma )(\mathcal {A}x^{k}-b)^\top \mathcal {B}(y^k-y^{k+1}) +2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top \\&\qquad G_0(y^{k-1}-y^{k}), \end{aligned}$$

where the last inequality comes from (3.7). By applying the Cauchy–Schwarz inequality, we can get

$$\begin{aligned} \left\{ \begin{array}{l} 2\beta (1-\gamma ) (\mathcal {A}x^{k}-b)^\top \mathcal {B}(y^k-y^{k+1}) \ge -\beta (1-\gamma )(\Vert \mathcal {A}x^{k}-b\Vert ^2\\ \quad +\Vert \mathcal {B}(y^{k}-y^{k+1})\Vert ^2), \quad {\mathrm {if}}~\gamma \in (0,1],\\ 2\beta (1-\gamma ) (\mathcal {A}x^{k}-b)^\top \mathcal {B}(y^k-y^{k+1}) \ge -\beta (\gamma -1)(\frac{1}{\gamma }\Vert \mathcal {A}x^{k}-b\Vert ^2\\ \quad +\gamma \Vert \mathcal {B}(y^{k}-y^{k+1})\Vert ^2), \quad {\mathrm {if}}~\gamma \in (1,+\infty ).\end{array}\right. \end{aligned}$$

Substituting this into the right-hand side of the above inequality, we obtain

$$\begin{aligned}&\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad \ge \sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}+\tau \beta \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2\\&\qquad +\beta \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2\\&\qquad +\beta \max \{1-\gamma ,1-\frac{1}{\gamma }\}(\Vert \mathcal {A}x^{k+1}-b \Vert ^2-\Vert \mathcal {A}x^{k}-b\Vert ^2)\\&\qquad -\beta \max \{1-\gamma ,\gamma ^2-\gamma \}\Vert \mathcal {B}(y^k-y^{k+1})\Vert ^2\\&\qquad +2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top G_0(y^{k-1}-y^{k}), \end{aligned}$$

which together with \(\Vert \mathcal {B}(y^k-y^{k+1})\Vert ^2\le (m-1)\sum \limits _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2\) (a direct consequence of the Cauchy–Schwarz inequality applied to the \(m-1\) summands of \(\mathcal {B}(y^k-y^{k+1})\)) implies that (3.18) holds. This completes the proof. \(\square \)

Now, let us deal with the last two terms \(2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top G_0(y^{k-1}-y^{k})\) on the right-hand side of (3.18). For this purpose, we define

$$\begin{aligned} N=G_0+c_0\beta {\mathrm {Diag}}\left\{ A_2^\top A_2,A_3^\top A_3,\ldots ,A_m^\top A_m\right\} , \end{aligned}$$
(3.19)

where \(c_0\) is a positive constant that will be specified later.

Lemma 3.5

The matrix N is positive definite if \(\tau >m-1-c_0\).

Proof

Since \(G_i=\bar{G}_i-(1-\tau )\beta A_i^\top A_i\;(i=2,3,\ldots ,m)\), for any \(y=[x_2;x_3;\ldots ;x_m]\ne 0\) we have

$$\begin{aligned}&y^\top Ny\\&\quad =\sum _{i=2}^m\Vert x_i\Vert ^2_{\bar{G}_i}+\beta (\tau -1+c_0)\sum _{i=2}^m\Vert A_ix_i\Vert ^2-\beta \sum _{i\ne j}(A_ix_i)^\top (A_jx_j)\\&\quad \ge \beta (\tau +c_0-m+1)\sum _{i=2}^m\Vert A_ix_i\Vert ^2>0, \end{aligned}$$

where the first inequality follows from \(\bar{G}_i\succeq 0\;(i=2,3,\ldots ,m)\) and the Cauchy–Schwarz inequality, and the second inequality comes from \(\tau >m-1-c_0\) and Assumption 2.3. Therefore, the matrix N is positive definite. The proof is completed. \(\square \)

Lemma 3.6

Let \(\{u^k\}\) be the sequence generated by Algorithm 1. Then,

$$\begin{aligned} \begin{aligned}&2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top G_0(y^{k-1}-y^{k})\\&\quad \ge \Vert y^k-y^{k+1}\Vert ^2_N-\Vert y^{k-1}-y^{k}\Vert ^2_N-3c_0\beta \sum _{i=2}^m\Vert A_i(x_i^k-x_i^{k+1})\Vert ^2\\&\qquad -c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k})\Vert ^2. \end{aligned} \end{aligned}$$
(3.20)

Proof

From (3.19), we have \(G_0=N-c_0\beta {{\mathrm {Diag}}\{A_2^\top A_2,A_3^\top A_3,\ldots ,A_m^\top A_m\}}\). Therefore,

$$\begin{aligned}&2\Vert y^k-y^{k+1}\Vert ^2_{G_0}-2(y^k-y^{k+1})^\top G_0(y^{k-1}-y^{k})\\&\quad =2(y^k-y^{k+1})^\top G_0((y^k-y^{k+1})-(y^{k-1}-y^{k}))\\&\quad =2(y^k-y^{k+1})^\top [N-c_0\beta {\mathrm {Diag}}\{A_2^\top A_2,A_3^\top A_3,\ldots ,A_m^\top A_m\}]((y^k-y^{k+1})\\&\qquad -(y^{k-1}-y^{k}))\\&\quad =2\Vert y^k-y^{k+1}\Vert ^2_N-2(y^k-y^{k+1})^\top N(y^{k-1}-y^{k})\\&\qquad -2c_0\beta (y^k-y^{k+1})^\top {\mathrm {Diag}}\{A_2^\top A_2,A_3^\top A_3, \ldots ,A_m^\top A_m\}(y^k-y^{k+1})\\&\qquad +2c_0\beta (y^k-y^{k+1})^\top {\mathrm {Diag}}\{A_2^\top A_2,A_3^\top A_3, \ldots ,A_m^\top A_m\}(y^{k-1}-y^{k})\\&\quad \ge \Vert y^k-y^{k+1}\Vert ^2_N-\Vert y^{k-1}-y^{k}\Vert ^2_N-3c_0\beta \sum _{i=2}^m \Vert A_i(x_i^k-x_i^{k+1})\Vert ^2\\&\qquad -c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k})\Vert ^2, \end{aligned}$$

where the last inequality follows from the Cauchy–Schwarz inequality. The proof is completed. \(\square \)

Based on the above three lemmas, we can deduce the recurrence relation of the sequence generated by Algorithm 1 as follows.

Lemma 3.7

Let \(\{u^k\}\) be the sequence generated by Algorithm 1. Then,

$$\begin{aligned} \begin{aligned}&\Vert v^{k+1}-v^*\Vert ^2_H+\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2+\Vert y^k-y^{k+1}\Vert ^2_N\\&\qquad +c_0\beta \sum _{i=2}^m\Vert A_i(x_i^k-x_i^{k+1})\Vert ^2\\&\quad \le \Vert v^k-v^*\Vert _H^2+\beta \max \left\{ 1-\gamma , 1-\frac{1}{\gamma }\right\} \Vert \mathcal {A}x^{k}-b\Vert ^2+\Vert y^{k-1}-y^{k}\Vert ^2_N\\&\qquad +c_0 \beta \sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k})\Vert ^2\\&\qquad -\sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i} -\beta [\tau -4c_0-(m-1) \max \{1-\gamma ,\gamma ^2-\gamma \}]\\&\quad \qquad \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2 -\beta \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2. \end{aligned} \end{aligned}$$
(3.21)

Proof

Substituting (3.20) into (3.18), one has

$$\begin{aligned}&\Vert v^k-\hat{v}^{k}\Vert ^2_H-\Vert v^{k+1}-\hat{v}^k\Vert _H^2\\&\quad \ge \sum _{i=2}^m\Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i} +\beta [\tau -4c_0-(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}]\\&\quad \qquad \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2+\beta \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2\\&\qquad +\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} (\Vert \mathcal {A} x^{k+1}-b\Vert ^2-\Vert \mathcal {A}x^{k}-b\Vert ^2)\\&\qquad +(\Vert y^k-y^{k+1}\Vert ^2_N-\Vert y^{k-1}-y^{k}\Vert ^2_N) +c_0\beta \left( \sum _{i=2}^m\Vert A_i(x_i^k-x_i^{k+1})\Vert ^2\right. \\&\left. \qquad -\sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k}) \Vert ^2\right) . \end{aligned}$$

This together with (3.17) implies

$$\begin{aligned} \begin{aligned}&2\theta (x)-2\theta (\hat{x}^k)+2(u-\hat{u}^k)^\top F(\hat{u}^k)\\&\quad \ge (\Vert v-v^{k+1}\Vert ^2_H-\Vert v-v^k\Vert _H^2)+\sum _{i=2}^m \Vert x_i^k-{x}_i^{k+1}\Vert ^2_{\bar{G}_i}\\&\qquad +\beta [\tau -4c_0-(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}] \sum _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2\\&\qquad +\beta \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}x^{k+1}-b\Vert ^2+\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} \\&\quad \qquad (\Vert \mathcal {A} x^{k+1}-b\Vert ^2-\Vert \mathcal {A}x^{k}-b\Vert ^2)+(\Vert y^k-y^{k+1}\Vert ^2_N-\Vert y^{k-1}-y^{k}\Vert ^2_N)\\&\qquad +c_0\beta \left( \sum _{i=2}^m\Vert A_i(x_i^k-x_i^{k+1})\Vert ^2 -\sum _{i=2}^m\Vert A_i(x_i^{k-1} -x_i^{k})\Vert ^2\right) . \end{aligned} \end{aligned}$$
(3.22)

Setting \(u=u^*\in \mathcal {U}^*\) in the above inequality and using (2.3), we get (3.21). This completes the proof. \(\square \)

With the above conclusions in hand, we can now prove Theorems 3.1 and 3.2.

Proof of Theorem 3.1

By (3.21), (3.2) and \(\gamma \in \Big (0,\frac{1+\sqrt{5}}{2}\Big )\), one has

$$\begin{aligned} \begin{aligned}&\sum \limits _{k=1}^\infty \left( \sum \limits _{i=2}^m\Vert x_i^k-{x}_i^{k+1} \Vert ^2_{\bar{G}_i}+c_1\sum \limits _{i=2}^m\Vert A_i(x_i^k-{x}_i^{k+1})\Vert ^2+c_2\Vert \mathcal {A}x^{k+1}-b\Vert ^2\right) \\&\quad \le \Vert v^1-v^*\Vert _H^2+c_3\Vert \mathcal {A}x^{1}-b\Vert ^2 +\Vert y^{0}-y^{1}\Vert ^2_N+c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{0}-x_i^{1})\Vert ^2\\&\quad <+\infty , \end{aligned} \end{aligned}$$
(3.23)

where \(c_1=\beta [\tau -4c_0-(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}], c_2=\beta \min \{1,\frac{1+\gamma -\gamma ^2}{\gamma }\}, c_3=\beta \max \{1-\gamma ,1-\frac{1}{\gamma }\}\). This, together with Assumption 2.3 and \(\bar{G}_i\succeq 0\;(i=2,3,\ldots ,m)\), indicates that

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert y^k-y^{k+1}\Vert =\lim _{k\rightarrow \infty } \Vert \mathcal {A}x^{k+1}-b\Vert =0. \end{aligned}$$
(3.24)

Furthermore, by the iterative scheme (3.1) and (3.24), one has

$$\begin{aligned}&\Vert A_1(x_1^{k}-x_1^{k+1})\Vert \\&\quad =\Vert \frac{1}{\gamma \beta }(\lambda ^{k-1}-\lambda ^k) -\frac{1}{\gamma \beta }(\lambda ^{k}-\lambda ^{k+1})-\sum _{i=2}^m A_i(x_i^{k}-x_i^{k+1})\Vert \\&\quad \le \frac{1}{\gamma \beta }\Vert \lambda ^{k-1}-\lambda ^k\Vert +\frac{1}{\gamma \beta }\Vert \lambda ^{k}-\lambda ^{k+1}\Vert +\sum _{i=2}^m\Vert A_i(x_i^{k}-x_i^{k+1})\Vert \rightarrow 0 ({\mathrm {as}}~k\rightarrow \infty ), \end{aligned}$$

which together with Assumption 2.3 implies that

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert x_1^{k}-x_1^{k+1}\Vert =0. \end{aligned}$$
(3.25)

On the other hand, by \(H\succ 0\) and (3.21) again, the sequences \(\{\Vert v^{k+1}-v^*\Vert \}\) and \(\{\Vert \mathcal {A}x^{k+1}-b\Vert \}\) are both bounded. Then, by

$$\begin{aligned}&\Vert A_1(x_1^{k+1}-x_1^*)\Vert \\&\quad =\Vert \mathcal {A}x^{k+1}-b-\sum _{i=2}^mA_i(x_i^{k+1}-x_i^*)\Vert \\&\quad \le \Vert \mathcal {A}x^{k+1}-b\Vert +\sum _{i=2}^m \Vert A_i(x_i^{k+1}-x_i^*)\Vert , \end{aligned}$$

and Assumption 2.3, we have that the sequence \(\{\Vert x_1^{k+1}-x_1^*\Vert \}\) is also bounded. In conclusion, the sequence \(\{u^k\}\) generated by Algorithm 1 is bounded. Hence, it has at least one cluster point, say \(u^\infty \); suppose that the subsequence \(\{u^{k_i}\}\) converges to \(u^\infty \). Then, taking limits on both sides of (3.10) along the subsequence \(\{u^{k_i}\}\) and using (3.24) and (3.25), one has

$$\begin{aligned} \theta (x)-\theta ({x}^\infty )+(u-{u}^\infty )^\top F({u}^\infty )\ge 0, \quad \forall u\in \mathcal {U}. \end{aligned}$$

Therefore, \(u^\infty \in \mathcal {U}^*\).

Hence, replacing \(u^*\) by \(u^\infty \) in (3.21), we get

$$\begin{aligned}&\Vert v^{k+1}-v^\infty \Vert ^2_H+c_3\Vert \mathcal {A}x^{k+1}-b\Vert ^2+\Vert y^k-y^{k+1}\Vert ^2_N +c_0\beta \sum _{i=2}^m\Vert A_i(x_i^k-x_i^{k+1})\Vert ^2\\&\quad \le \Vert v^k-v^\infty \Vert _H^2+c_3\Vert \mathcal {A}x^{k}-b\Vert ^2+\Vert y^{k-1}-y^{k}\Vert ^2_N +c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k})\Vert ^2. \end{aligned}$$

From (3.24), (3.25), we have that for any given \(\varepsilon >0\), there exists \(l_0>0\), such that

$$\begin{aligned} c_3\Vert \mathcal {A}x^{k}-b\Vert ^2+\Vert y^{k-1}-y^{k}\Vert ^2_N+c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{k-1}-x_i^{k})\Vert ^2<\frac{\varepsilon }{2}, \quad \forall k\ge l_0. \end{aligned}$$

Since \(v^{k_i}\rightarrow v^\infty \) as \(i\rightarrow \infty \), there exists \(k_l>l_0\) such that

$$\begin{aligned} \Vert v^{k_l}-v^\infty \Vert ^2_H<\frac{\varepsilon }{2}. \end{aligned}$$

Then, the above three inequalities imply that, for any \(k>k_l\), we have

$$\begin{aligned}&\Vert v^{k}-v^\infty \Vert ^2_H\\&\quad \le \Vert v^{k_l}-v^\infty \Vert ^2_H+c_3\Vert \mathcal {A}x^{k_l}-b\Vert ^2 +\Vert y^{k_l-1}-y^{k_l}\Vert ^2_N+c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{k_l-1}-x_i^{k_l})\Vert ^2\\&\quad <\varepsilon . \end{aligned}$$

Therefore the whole sequence \(\{u^k\}\) converges to the solution \(u^\infty \). The proof is completed. \(\square \)

Obviously, to obtain a large feasible region for \(\tau \) from the condition (3.2), we can set \(m-1-c_0=4c_0+(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}\); solving this linear equation gives \(c_0=\frac{m-1-(m-1)\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}\), which is obviously greater than zero if \(\gamma \in (0,\frac{1+\sqrt{5}}{2})\). Thus, the constant \(\tau \) only needs to satisfy the following condition

$$\begin{aligned} \tau >\frac{4+\max \{1-\gamma ,\gamma ^2-\gamma \}}{5}(m-1). \end{aligned}$$
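
For a concrete comparison (our illustration only), the following MATLAB lines list the resulting lower bounds on \(\tau _i/(\beta \Vert A_i\Vert ^2)\) for \(\gamma =1\) and several values of m, for Algorithm 1, TPADMM [28] and the method of [20], respectively:

gamma = 1; m = 2:5;
bound_alg1   = (4 + max(1-gamma, gamma^2-gamma))/5*(m-1);  % this paper: 0.8*(m-1)
bound_tpadmm = m - 1;                                      % [28]
bound_lin    = m;                                          % [20]
disp([m; bound_alg1; bound_tpadmm; bound_lin])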

Remark 3.4

In the recent work [14], He and Yuan proposed an ADMM-like splitting method for the model (1.1) with \(m=3\), which belongs to the class of mixed Gauss–Seidel and Jacobian ADMMs. More specifically, the iterative scheme of the method proposed in [14] is given by

$$\begin{aligned} \left\{ \begin{array}{l} {x}_1^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{x_1\in \mathcal {X}_1} \left\{ \mathcal {L}_\beta (x_1,x_2^k,x_3^k;\lambda ^k)\right\} ,\\ \left\{ \begin{array}{l}{x}_2^{k+1}:=\mathop {{\mathrm {argmin}}} \limits _{x_2\in \mathcal {X}_2}\left\{ \mathcal {L}_\beta ({x}_1^{k+1}, x_2,x_3^k;\lambda ^k)+\frac{\tau }{2}\Vert A_2(x_2-x_2^k)\Vert ^2\right\} ,\\ {x}_3^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{x_3\in \mathcal {X}_3} \left\{ \mathcal {L}_\beta ({x}_1^{k+1},x_2^k,x_3;\lambda ^k)+\frac{\tau }{2} \Vert A_3(x_3-x_3^k)\Vert ^2\right\} ,\\ \end{array}\right. \\ {\lambda }^{k+1}:=\lambda ^{k}-\beta \left( \sum _{i=1}^3A_i{x}_i^{k+1}-b\right) , \end{array}\right. \end{aligned}$$
(3.26)

where \(\tau \ge 0.6\). Though Algorithm 1 and (3.26) both belong to the class of mixed Gauss–Seidel and Jacobian ADMMs, Algorithm 1 has an important advantage over (3.26): the matrices \(\bar{G}_i (i=2,3,\ldots ,m)\) can be any positive semi-definite matrices, and when they take certain special forms, the subproblems in Algorithm 1 often admit closed-form solutions; see Remark 3.1. However, when \(A_i^\top A_i\ne I_{n_i} (i=2,3)\), the subproblems with respect to \(x_i(i=2,3)\) in (3.26) do not have closed-form solutions even if \(\theta _i(x_i)=\Vert x_i\Vert _1\) or \(\theta _i(x_i)=\Vert x_i\Vert _2 (i=2,3)\).

Remark 3.5

Algorithm 1 contains the methods in [13, 14] as special cases. More precisely:

  1.

    For Algorithm 1, set \(m=2\), \(\gamma =1\), and \(\bar{G}_2=\tau _2I_{n_2}-\tau \beta A_2^\top A_2\) with \(\tau _2\ge \tau \beta \Vert A_2\Vert ^2\) and \(\tau >0.8\). Then \(\bar{G}_2\succeq 0\) and \(G_2=\tau _2I_{n_2}-\beta A_2^\top A_2\), and it is easy to check that Algorithm 1 reduces to the method in [13].

  2.

    For Algorithm 1, set \(m=3\), \(\gamma =1\), and \(\bar{G}_i=0\;(i=2,3)\); then \(G_i=(\tau -1)\beta A_i^\top A_i\;(i=2,3)\) with \(\tau >1.6\), and it is easy to check that Algorithm 1 reduces to the method (3.26) in [14].

At the end of this section, let us prove the worst-case \(\mathcal {O}(1/t)\) convergence rate of Algorithm 1 in an ergodic sense, according to the criterion established in Proposition 2.1.

Proof of Theorem 3.2

From \(\hat{x}^{k}\in \mathcal {X}\) and the convexity of \(\mathcal {X}\), we have \(\hat{x}_{t}\in \mathcal {X}\). Setting \(x=x^*,\lambda =\lambda ^*\) in (3.22), and summing the resulting inequality over \(k=1, 2, \ldots , t\), we have

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^{t}\left[ \theta (\hat{x}^{k})-\theta (x^*)+(\hat{u}^{k}-u^*)^\top F(\hat{u}^k)+\frac{\beta }{2}\min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}\hat{x}^{k}-b\Vert ^2\right] \\&\quad \le \frac{1}{2}(\Vert v^1-v^*\Vert _H^2+\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} \Vert \mathcal {A}x^{1}-b\Vert ^2+\Vert y^{0}-y^{1}\Vert ^2_N\\&\qquad +c_0\beta \sum _{i=2}^m\Vert A_i(x_i^{0}-x_i^{1})\Vert ^2) \end{aligned} \end{aligned}$$
(3.27)

Furthermore, the crossing term \((\hat{u}^{k}-u^*)^\top F(\hat{u}^k)\) on the left-hand side of (3.27) can be rewritten as

$$\begin{aligned} \begin{aligned}&(\hat{u}^{k}-u^*)^\top F(\hat{u}^k)\\&\quad =(\hat{u}^{k}-u^*)^\top F({u}^*)\\&\quad =(\hat{x}^k-x^*)^\top (-\mathcal {A}^\top \hat{\lambda }^k) +(\hat{\lambda }^k-\lambda ^*)^\top (\mathcal {A}\hat{x}^k-b)\\&\quad =(b-\mathcal {A}\hat{x}^k)^\top \hat{\lambda }^k+(\hat{\lambda }^k -\lambda ^*)^\top (\mathcal {A}\hat{x}^k-b)\\&\quad =(-\lambda ^*)^\top (\mathcal {A}\hat{x}^k-b)\\&\quad =(\hat{x}^{k}-x^*)^\top (-\mathcal {A}^\top \lambda ^*), \end{aligned} \end{aligned}$$
(3.28)

where the first equality comes from (2.3). Substituting (3.28) into the left-hand side of (3.27), we get

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^{t}\left[ \theta (\hat{x}^{k})-\theta (x^*)+(\hat{x}^{k}-x^*)^\top (-\mathcal {A}^\top \lambda ^*)+\frac{\beta }{2}\min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A}\hat{x}^{k}-b\Vert ^2\right] \\&\quad \le \frac{1}{2}(\Vert v^1-v^*\Vert _H^2+\beta \max \left\{ 1-\gamma ,1-\frac{1}{\gamma }\right\} \Vert \mathcal {A}x^{1}-b\Vert ^2+\Vert y^{0}-y^{1}\Vert ^2_N\\&\qquad +c_0\beta \sum _{i=2}^m \Vert A_i(x_i^{0}-x_i^{1})\Vert ^2) \end{aligned} \end{aligned}$$
(3.29)

Then, dividing (3.29) by t and using the convexity of \(\theta (\cdot )\) and \(\Vert \cdot \Vert ^2\) leads to

$$\begin{aligned} \theta (\hat{x}_t)-\theta (x^*)+(\hat{x}_t-x^*)^\top (- {\mathcal {A}}^\top \lambda ^*)+\frac{\beta }{2} \min \left\{ 1,\frac{1+\gamma -\gamma ^2}{\gamma }\right\} \Vert \mathcal {A} \hat{x}_{t}-b\Vert ^2\le \frac{D}{t}, \end{aligned}$$

where the constant D is defined by (3.4). Then, we get the result (3.3). This completes the proof. \(\square \)

4 Numerical results

In this section, we apply Algorithm 1 to three concrete instances of (1.1): the Lasso model (\(m=2\)), the latent variable Gaussian graphical model selection on some synthetic data sets (\(m=3\)), and a system of linear homogeneous equations (\(m=4\)). We mainly compare the performance of Algorithm 1 with the method in [24], denoted by SPADMM, and the method in [28], denoted by TPADMM. All codes were written in MATLAB and run in MATLAB R2010a on a laptop with a Pentium(R) Dual-Core T4400 CPU (2.2 GHz) and 4 GB of memory.

Problem 1

The Lasso model

The Lasso model is

$$\begin{aligned} \min _{y\in \mathcal {R}^n}\mu \Vert y\Vert _1+\frac{1}{2}\Vert Ay-b\Vert ^2, \end{aligned}$$
(4.1)

where \(A\in \mathcal {R}^{m\times n}\), \(b\in \mathcal {R}^m\), \(\mu >0\) is a regularization parameter. By introducing a new variable x, we can rewrite (4.1) as

$$\begin{aligned} \begin{aligned}&\min \frac{1}{2}\Vert x-b\Vert ^2+\mu \Vert y\Vert _1\\&{\mathrm {s.t.}} ~x-Ay=0,\\&\quad \quad \quad x\in \mathcal {R}^m, y\in \mathcal {R}^n, \end{aligned} \end{aligned}$$
(4.2)

which is a special case of problem (1.1) with the following specifications:

$$\begin{aligned} \theta _1(x_1)=\frac{1}{2}\Vert x-b\Vert ^2, \,\,\theta _2(x_2)=\mu \Vert y\Vert _1,\,\, A_1=I_m,\,\, A_2=-A, \,\,b=0. \end{aligned}$$

Then, Algorithm 1 with \(\bar{G}_2=\tau _2 I_n-\beta \tau A^\top A\) can be used to solve (4.2), where \(\tau _2=\tau \beta \Vert A^\top A\Vert \); the closed-form solutions of the subproblems resulting from Algorithm 1 are similar to those in [2, 26] and are not listed here for succinctness. In this experiment, the matrix A is generated by \({\texttt {A=randn(m,n)}}\); \({\texttt {A=A*spdiags(1./sqrt(sum(A.^2))',0,n,n)}}\), which normalizes the columns of A. The sparse vector \(x^*\) is generated by \({\texttt {p=100/n}}\); \({\texttt {x}}^*{\texttt {= sprandn(n,1,p)}}\), and the observed vector b is generated via \({\texttt {b=A*x}}^*{{\texttt {+sqrt(0.001)*randn(m,1)}}}\). We set the regularization parameter \(\mu =0.1 {\Vert {\texttt {A}}^\top {\texttt {b}}\Vert _\infty }\). For Algorithm 1, we set \(\beta =1, \tau =(4+\max \{1-\gamma ,\gamma ^2-\gamma \})/5\). For SPADMM, we set \(\beta =1\), and the \(\gamma \) in Table 1 corresponds to the parameter r in SPADMM. For TPADMM, we set \(\beta =1\) and utilize the same technique to linearize its second subproblem. The initial points are \(y=0,\lambda =0\), and the stopping criterion is [2]:

$$\begin{aligned} \Vert x^{k+1}-Ay^{k+1}\Vert \le \epsilon ^{{\mathrm {pri}}} \quad {\mathrm {and}} \quad \Vert \beta A(y^{k+1}-y^k)\Vert \le \epsilon ^{{\mathrm {dual}}}, \end{aligned}$$

where \(\epsilon ^{{\mathrm {pri}}}=\sqrt{n}\epsilon ^{{\mathrm {abs}}}+\epsilon ^{{\mathrm {rel}}}\max \{\Vert x^{k+1}\Vert ,\Vert Ay^{k+1}\Vert \}\) and \(\epsilon ^{{\mathrm {dual}}}=\sqrt{n}\epsilon ^{{\mathrm {abs}}}+\epsilon ^{{\mathrm {rel}}}\Vert y^{k+1}\Vert \), with \(\epsilon ^{{\mathrm {abs}}}=10^{-4}\) and \(\epsilon ^{{\mathrm {rel}}}=10^{-3}\). The numerical results generated by the tested methods are reported in Table 1, in which we only report the number of iterations, because the tested methods have the same structure.
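
For completeness, the following minimal MATLAB sketch (ours) shows the resulting closed-form iteration of Algorithm 1 on (4.2) with the linearized choice of \(\bar{G}_2\) above; it follows the data generation described here, replaces the stopping criterion by a fixed number of iterations, and is only meant to illustrate the updates, not to reproduce the reported experiments.

% Algorithm 1 for the reformulated Lasso model (4.2), illustrative sketch only.
m = 150; n = 500;
A = randn(m,n);  A = full(A*spdiags(1./sqrt(sum(A.^2))',0,n,n));  % normalized columns
xs = sprandn(n,1,100/n);  b = A*xs + sqrt(0.001)*randn(m,1);
mu = 0.1*norm(A'*b,inf);
beta = 1; gamma = 1; tau = (4 + max(1-gamma,gamma^2-gamma))/5;
tau2 = tau*beta*norm(A)^2;                       % tau_2 = tau*beta*||A_2||^2
y = zeros(n,1); lam = zeros(m,1);
for k = 1:500
    x   = (b + lam + beta*(A*y))/(1 + beta);     % x-subproblem (strongly convex quadratic)
    nu  = y + (A'*(beta*(x - A*y) - lam))/tau2;  % linearization point nu^k (here A_2 = -A)
    y   = sign(nu).*max(abs(nu) - mu/tau2, 0);   % soft-thresholding (l1 prox)
    lam = lam - gamma*beta*(x - A*y);            % relaxed multiplier update
end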

Table 1 Comparison between the number of iterations taken by the tested methods for Lasso model

The numerical results in Table 1 indicate that the performance of Algorithm 1 is obviously better than that of the other two tested methods, in the sense that the number of iterations taken by Algorithm 1 is at most 91\(\%\) of that taken by SPADMM and at most 82\(\%\) of that taken by TPADMM. We believe that this improvement should be attributed to the smaller proximal parameter of Algorithm 1; thus, the advantage of the smaller proximal parameter is verified. Furthermore, although a larger relaxation factor \(\gamma \) can often speed up the corresponding iterative method, we observe that Algorithm 1 with \(\gamma =1.0\) almost always performs better than Algorithm 1 with \(\gamma =0.8\) or \(\gamma =1.2\). The reason may be that \(\tau \) has more influence on the performance of Algorithm 1 than \(\gamma \) for this problem.

Problem 2

The latent variable Gaussian graphical model selection

First, let us review the latent variable Gaussian graphical model selection (LVGGMS) problem [4, 21]. Let \(X_{p\times 1}\) be the observed variables and \(Y_{r\times 1}\;(r\ll p)\) be the hidden variables, such that (X, Y) jointly follow a multivariate normal distribution in \(\mathcal {R}^{p+r}\), where the covariance matrix \(\Sigma _{(X,Y)}=[\Sigma _X,\Sigma _{XY};\Sigma _{YX},\Sigma _Y]\) and the precision matrix \(\Theta _{(X,Y)}=[\Theta _X,\Theta _{XY};\Theta _{YX},\Theta _Y]\) are unknown. Under the prior assumption that the conditional precision matrix of the observed variables, \(\Theta _X\), is sparse, the marginal precision matrix of the observed variables, \(\Sigma _X^{-1}=\Theta _X-\Theta _{XY}\Theta _Y^{-1}\Theta _{YX}\), is the difference between the sparse term \(\Theta _X\) and the low-rank term \(\Theta _{XY}\Theta _Y^{-1}\Theta _{YX}\). Therefore, the problem of interest is to recover the sparse term \(\Theta _X\) and the low-rank term \(\Theta _{XY}\Theta _Y^{-1}\Theta _{YX}\) from the observed variables X. Setting \(\Sigma _X^{-1}=S-L\), Chandrasekaran et al. [4] introduced the following latent variable graphical model selection problem

$$\begin{aligned} \begin{aligned}&\min _{S,L} \langle S-L,\hat{\Sigma }_X\rangle -{\mathrm {logdet}}(S-L)+\alpha _1\Vert S\Vert _1 +\alpha _2\mathbf {Tr}(L),\\&{\mathrm {s.t.}} ~S-L\succ 0, L\succeq 0, \end{aligned} \end{aligned}$$
(4.3)

where \(\hat{\Sigma }_X\) is the sample covariance matrix of X; \(\Vert S\Vert _1\) is the \(\ell _1\)-norm of the matrix S defined by \(\sum _{i,j=1}^p|S_{ij}|\); and \(\mathbf {Tr}(L)\) denotes the trace of the matrix L; \(\alpha _1>0\) and \(\alpha _2>0\) are given scalars controlling the sparsity and the low-rankness of the solution. Obviously, the model (4.3) can be rewritten as

$$\begin{aligned}&\min \langle R,\hat{\Sigma }_X\rangle -{\mathrm {logdet}}R+\alpha _1\Vert S\Vert _1 +\alpha _2\mathbf {Tr}(L)\\&{\mathrm {s.t.}} ~R=S-L, R\succ 0, L\succeq 0, \end{aligned}$$

which can be further cast as

$$\begin{aligned} \begin{aligned}&\min \langle R,\hat{\Sigma }_X\rangle -{\mathrm {logdet}}R+\alpha _1\Vert S\Vert _1 +\alpha _2\mathbf {Tr}(L)+\mathcal {I}(L\succeq 0)\\&{\mathrm {s.t.}} ~R-S+L=0, \end{aligned} \end{aligned}$$
(4.4)

where the indicator function \(\mathcal {I}(L\succeq 0)\) is defined as

$$\begin{aligned} \mathcal {I}(L\succeq 0):=\left\{ \begin{array}{l} 0, \quad {\mathrm {if}}\quad L\succeq 0,\\ +\infty ,\quad {\mathrm {otherwise}}. \end{array}\right. \end{aligned}$$

The constraint \(R\succ 0\) is removed since it is implicitly enforced by the \({\mathrm {logdet}}\,R\) term. The convex minimization (4.4) is a special case of the model (1.1) with \(x_1=R, x_2=S, x_3=L\), \(\theta _1(x_1)=\langle R,\hat{\Sigma }_X\rangle -{\mathrm {logdet}}R\), \(\theta _2(x_2)=\alpha _1\Vert S\Vert _1\), \(\theta _3(x_3)=\alpha _2\mathbf {Tr}(L)+\mathcal {I}(L\succeq 0)\), \(A_1=I_p, A_2=-I_p, A_3=I_p\) and \(b=0\). Then, Algorithm 1 with \(\bar{G}_i=0\,(i=2,3)\) can be used to solve (4.4), and its iterative scheme reads as follows.

$$\begin{aligned} \left\{ \begin{array}{l} R^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{R} \left\{ \mathcal {L}_\beta (R,S^k,L^k;\Lambda ^k)\right\} ,\\ S^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{S} \left\{ \mathcal {L}_\beta (R^{k+1},S,L^k;\Lambda ^k) +\frac{\beta (\tau -1)}{2}\Vert S-S^k\Vert ^2_F\right\} ,\\ L^{k+1}:=\mathop {{\mathrm {argmin}}}\limits _{L} \left\{ \mathcal {L}_\beta (R^{k+1},S^k,L;\Lambda ^k) +\frac{\beta (\tau -1)}{2}\Vert L-L^k\Vert ^2_F\right\} ,\\ \Lambda ^{k+1}:=\Lambda ^{k}-\gamma \beta \left( R^{k+1}-S^{k+1}+L^{k+1} \right) , \end{array}\right. \end{aligned}$$
(4.5)

where the augmented Lagrangian function \(\mathcal {L}_\beta (R,S,L;\Lambda )\) is defined as

$$\begin{aligned} \mathcal {L}_\beta (R,S,L;\Lambda )= & {} \langle R,\hat{\Sigma }_X\rangle -{\mathrm {logdet}}R+\alpha _1\Vert S\Vert _1 +\alpha _2\mathbf {Tr}(L)+\mathcal {I}(L\succeq 0)\\&-\langle \Lambda , R-S+L\rangle +\frac{\beta }{2}\Vert R-S+L\Vert _F^2. \end{aligned}$$

In [21], the authors elaborated on subproblems similar to those in (4.5). Based on that discussion, we give the closed-form solutions of the subproblems in (4.5) below; a code sketch of one full iteration follows the list.

  • For the given \(R^{k}\), \(S^k\), \(L^k\) and \(\Lambda ^{k}\), the R subproblem in (4.5) admits a closed-form solution as

    $$\begin{aligned} {R}^{k+1}=U\hat{\wedge }U^\top , \end{aligned}$$
    (4.6)

    where U is obtained by the eigenvalue decomposition: \(U{\mathrm {Diag}}(\sigma )U^\top =(\hat{\Sigma }_X-\Lambda ^k)/ (\beta \tau )-[(\tau -1)R^k+S^k-L^k]/\tau \), and \(\hat{\wedge }={\mathrm {Diag}}(\hat{\sigma })\) is obtained by:

    $$\begin{aligned} \hat{\sigma }_j=\frac{ {-}\sigma _j+\sqrt{\sigma _j^2+4/(\tau \beta )}}{2}, \quad j=1,2,\ldots ,p. \end{aligned}$$
  • For the given \(R^{k+1}\), \(S^k\), \(L^k\) and \(\Lambda ^{k}\), the S subproblem in (4.5) admits a closed-form solution as

    $$\begin{aligned} {S}^{k+1}={\mathrm {Shrink}}(Z^k,\alpha _1/(\tau \beta )), \end{aligned}$$
    (4.7)

    where \(Z^k=-\Lambda ^k/(\beta \tau )+[(\tau -1)S^k+R^{k+1}+L^k]/\tau \) and the shrinkage operator is applied entrywise: \([{\mathrm {Shrink}}(Z,\mu )]_{ij}={\mathrm {sign}}(Z_{ij})\cdot \max \{0,|Z_{ij}|-\mu \}\).

  • For the given \(R^{k+1}\), \(S^k\), \(L^k\) and \(\Lambda ^{k}\), the L subproblem in (4.5) admits a closed-form solution as

    $$\begin{aligned} {L}^{k+1}=U\tilde{\wedge }U^\top , \end{aligned}$$
    (4.8)

    where U and \(\sigma \) are given by the eigenvalue decomposition \(U{\mathrm {Diag}}(\sigma )U^\top =\Lambda ^k/(\beta \tau )+[(\tau -1)L^k+S^k-R^{k+1}]/\tau \), and \(\tilde{\wedge }={\mathrm {Diag}}(\tilde{\sigma })\) is obtained by:

    $$\begin{aligned} \tilde{\sigma }_j=\max \{\sigma _j-\alpha _2/(\tau \beta ),0\}, \quad j=1,2,\ldots ,p. \end{aligned}$$
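To illustrate the per-iteration cost of scheme (4.5), the following Python (NumPy) sketch performs one iteration using the closed-form updates (4.6)–(4.8) exactly as stated above; the function name lvggms_iteration and the argument ordering are ours, and the sketch is meant only to illustrate the two eigenvalue decompositions and the entrywise shrinkage, not as a verbatim restatement of Algorithm 1.

import numpy as np

def lvggms_iteration(R, S, L, Lam, Sigma_hat, alpha1, alpha2, beta, tau, gamma):
    # R-update via eigenvalue decomposition, cf. (4.6)
    M_R = (Sigma_hat - Lam) / (beta * tau) - ((tau - 1) * R + S - L) / tau
    sig, U = np.linalg.eigh(M_R)
    sig_hat = (-sig + np.sqrt(sig ** 2 + 4.0 / (tau * beta))) / 2.0
    R_new = U @ np.diag(sig_hat) @ U.T
    # S-update via entrywise soft-thresholding, cf. (4.7); note that it uses L^k, not L^{k+1}
    Z = -Lam / (beta * tau) + ((tau - 1) * S + R_new + L) / tau
    thr = alpha1 / (tau * beta)
    S_new = np.sign(Z) * np.maximum(np.abs(Z) - thr, 0.0)
    # L-update via eigenvalue thresholding onto the positive semidefinite cone, cf. (4.8); uses S^k
    M_L = Lam / (beta * tau) + ((tau - 1) * L + S - R_new) / tau
    sig, U = np.linalg.eigh(M_L)
    sig_til = np.maximum(sig - alpha2 / (tau * beta), 0.0)
    L_new = U @ np.diag(sig_til) @ U.T
    # relaxed dual update from (4.5)
    Lam_new = Lam - gamma * beta * (R_new - S_new + L_new)
    return R_new, S_new, L_new, Lam_new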

The synthetic data of this experiment are generated by the following procedure [21]. Given the number of observed variables p and the number of latent variables r, we create a sparse matrix \(\mathcal {W}\in \mathcal {R}^{(p+r)\times (p+r)}\) with sparsity around 10\(\%\), in which the nonzero entries are set to \(-1\) or 1 with equal probability. From \(\mathcal {W}\), we obtain the true precision matrix \(K=(\mathcal {W}\mathcal {W}^\top )^{-1}\) and then two submatrices of K, \(\hat{S}=K(1:p,1:p)\in \mathcal {R}^{p\times p}\) and \(\hat{L}=K(1:p,p+1:p+r)K(p+1:p+r,p+1:p+r)^{-1}K(p+1:p+r,1:p)\in \mathcal {R}^{p\times p}\), which are the ground truth of the sparse matrix S and the low-rank matrix L, respectively. The sample covariance matrix of the observed variables is computed as \(\hat{\Sigma }_X=\frac{1}{N}\sum _{i=1}^NY_iY_i^\top \), where \(N=5p\) and the i.i.d. vectors \(Y_1, Y_2, \ldots , Y_N\) are drawn from the Gaussian distribution \(\mathcal {N}(0,(\hat{S}-\hat{L})^{-1})\); a Python sketch of this data-generation procedure is given after the stopping criterion below. Throughout this experiment, we use the following stopping criterion:

$$\begin{aligned} \max \left\{ \begin{aligned}\frac{\Vert {R}^{k+1}-R^k\Vert _F}{1+\Vert R^k\Vert _F}, \frac{\Vert {S}^{k+1}-S^k\Vert _F}{1+\Vert S^k\Vert _F},\frac{\Vert {L}^{k+1}-L^k\Vert _F}{1+\Vert L^k\Vert _F} \end{aligned}\right\} <10^{-5}. \end{aligned}$$
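The Python (NumPy) sketch of the data-generation procedure described above is given next; the function name, the seed handling, and the symmetrization of the marginal covariance are our own choices, and the sketch assumes that \(\mathcal {W}\mathcal {W}^\top \) is nonsingular, which holds with overwhelming probability at the stated density.

import numpy as np

def generate_lvggms_data(p, r, seed=0):
    rng = np.random.default_rng(seed)
    d = p + r
    # sparse (p+r) x (p+r) matrix with about 10% nonzeros equal to -1 or 1 with equal probability
    W = np.zeros((d, d))
    mask = rng.random((d, d)) < 0.10
    W[mask] = rng.choice([-1.0, 1.0], size=int(mask.sum()))
    # true precision matrix K = (W W^T)^{-1} and its sparse / low-rank ground-truth parts
    K = np.linalg.inv(W @ W.T)          # assumes W W^T is nonsingular
    S_hat = K[:p, :p]
    L_hat = K[:p, p:] @ np.linalg.inv(K[p:, p:]) @ K[p:, :p]
    # sample covariance from N = 5p i.i.d. draws of N(0, (S_hat - L_hat)^{-1})
    N = 5 * p
    cov = np.linalg.inv(S_hat - L_hat)
    cov = (cov + cov.T) / 2.0           # symmetrize against round-off before sampling
    Y = rng.multivariate_normal(np.zeros(p), cov, size=N)
    Sigma_hat = (Y.T @ Y) / N
    return Sigma_hat, S_hat, L_hat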

In the following, we compare Algorithm 1 with TPADMM [28] by solving (4.4). For Algorithm 1, we set \(\beta =0.5, \tau =2.02(4+\max \{1-\gamma ,\gamma ^2-\gamma \})/5\). For TPADMM, we set \(\beta =v=0.5, M_2=M_3=vI_p\). The initial points are \(R^0=\mathtt {eye(p,p)}, S^0=R^0, L^0=\mathtt {zeros(p,p)}, \Lambda ^0=\mathtt {zeros(p,p)}\). The numerical results generated by Algorithm 1 and TPADMM are reported in Tables 2, 3 and 4.

Table 2 Comparison between the number of iterations (time in seconds) taken by Algorithm 1 and TPADMM for LVGGMS model with \(p=200, r=10\)
Table 3 Comparison between the number of iterations (time in seconds) taken by Algorithm 1 and TPADMM for LVGGMS model with \(p=500, r=25\)
Table 4 Comparison between the number of iterations (time in seconds) taken by Algorithm 1 and TPADMM for LVGGMS model with \(p=700, r=35\)

From Tables 2, 3 and 4, several conclusions can be drawn: (i) both methods successfully solved all the tested problems and can handle medium-scale LVGGMS; (ii) Algorithm 1 performs better than TPADMM in the sense that it always requires fewer iterations and less CPU time; in fact, Algorithm 1 achieves a reduction of at least 20% in the number of iterations (16% in CPU time) over TPADMM; (iii) as the parameters \(\alpha _1\) and \(\alpha _2\) decrease, the advantage of Algorithm 1 over TPADMM becomes more pronounced; (iv) in contrast to Problem 1, the numerical results in the three tables also show that, for a fixed \((\alpha _1,\alpha _2)\), both Algorithm 1 and TPADMM perform better as the parameter \(\gamma \) increases.

Problem 3

Linear homogeneous equations

Consider a system of linear homogeneous equations in four variables, which is a special case of (1.1) with a null objective function and takes the form of

$$\begin{aligned} A_1x_1+A_2x_2+A_3x_3+A_4x_4=0, \end{aligned}$$

where

$$\begin{aligned} A_1=\left( \begin{array}{c} 1\\ 3\\ 8\\ 2\end{array}\right) ,\quad A_2=\left( \begin{array}{c} 2\\ 5\\ 7\\ 1\end{array}\right) ,\quad A_3=\left( \begin{array}{c} 5\\ 3\\ 3\\ 9\end{array}\right) ,\quad A_4=\left( \begin{array}{c} 2\\ 9\\ 10\\ 8\end{array}\right) . \end{aligned}$$
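Since the objective function is zero, every subproblem of a scheme analogous to (4.5) reduces to a one-dimensional least-squares problem with a closed-form solution. The following Python (NumPy) sketch mirrors that structure for this instance: the first block is updated alone, the remaining blocks are updated in parallel with the proximal term \(\frac{\beta (\tau -1)}{2}(x_i-x_i^k)^2\), and the multiplier is updated with the relaxation factor \(\gamma \). It is an illustrative reading of Algorithm 1 for this example rather than a verbatim restatement; the function name, the default parameter values (taken from the setting below), and the simplified stopping test are ours.

import numpy as np

def solve_homogeneous(A_cols, beta=0.1, gamma=1.6, tau=2.98, max_iter=1000, tol=1e-5):
    # A_cols: the column vectors A_1, ..., A_4 above; the right-hand side is b = 0
    m = len(A_cols)
    x = np.ones(m)                         # scalar blocks x_1, ..., x_m, initialized to 1
    lam = np.ones(len(A_cols[0]))          # multiplier, initialized to the all-ones vector
    for k in range(max_iter):
        x_old, lam_old = x.copy(), lam.copy()
        # first block: minimize the augmented Lagrangian with the other blocks fixed at x^k
        A1 = A_cols[0]
        c1 = sum(A_cols[i] * x_old[i] for i in range(1, m))
        x[0] = float(A1 @ (lam / beta - c1)) / float(A1 @ A1)
        # remaining blocks in parallel, each regularized by beta*(tau-1)/2*(x_i - x_i^k)^2
        for i in range(1, m):
            Ai = A_cols[i]
            ci = A_cols[0] * x[0] + sum(A_cols[j] * x_old[j] for j in range(1, m) if j != i)
            x[i] = (float(Ai @ (lam / beta - ci)) + (tau - 1.0) * x_old[i]) / (float(Ai @ Ai) + tau - 1.0)
        # relaxed dual update
        lam = lam - gamma * beta * sum(A_cols[i] * x[i] for i in range(m))
        # simplified stopping test in the spirit of the RelErr criterion below
        change = max(max(np.max(np.abs(A_cols[i] * (x_old[i] - x[i]))) for i in range(m)),
                     np.max(np.abs(lam_old - lam)))
        if change < tol:
            break
    return x, lam, k + 1

For the data above, one would call, for instance, solve_homogeneous([np.array([1., 3., 8., 2.]), np.array([2., 5., 7., 1.]), np.array([5., 3., 3., 9.]), np.array([2., 9., 10., 8.])]).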

We once again compare Algorithm 1 with TPADMM, where the parameters of the two methods are set as follows: \(\beta =0.1, \gamma =1.6, \tau =2.98\), and \(\bar{G}_i=0\,(i=2,3,4)\) for Algorithm 1; \(\beta =0.1, v=5\), and \(M_i=vI_4\,(i=2,3,4)\) for TPADMM. The initial points are \(x_1^0=x_2^0=x_3^0=x_4^0=\lambda ^0=1\). The stopping criterion is set as

$$\begin{aligned} {\mathrm {RelErr}}:={\mathrm {log}}\Big (\max \Big \{\max _{i=1,2,3,4}\Vert A_i (x_i^k-x_i^{k+1})\Vert _\infty ,\;\Vert \lambda ^k-\lambda ^{k+1}\Vert _\infty \Big \}\Big )<10^{-5}. \end{aligned}$$

The maximum number of iterations is set to 1000. In Figs. 1 and 2, the evolution of the primal variables and of the relative error is displayed, respectively.

Fig. 1
figure 1

Evolution of the values of primal variables with respect to the number of iterations

Fig. 2
figure 2

Evolution of the values of relative error with respect to the number of iterations

The simulation results for this simple problem show that both methods converge quickly for \(k\le 58\). After \(k=58\), the relative error of TPADMM declines at a slower rate, whereas the descent rate of the relative error of Algorithm 1 remains quite stable. The numbers of iterations of Algorithm 1 and TPADMM are 94 and 593, respectively. In fact, we observe that the performance of TPADMM is quite sensitive to the parameter \(v\), and for \(v=1,2,10,11,12\), TPADMM does not satisfy the above stopping criterion even after \(k=1000\) iterations.

5 Conclusion

This paper provides a sharper lower bound on the proximal parameter in the proximal ADMM-type method for multi-block separable convex minimization. The smaller proximal parameter often alleviates the over-regularization effect on the corresponding subproblems and thus may speed up the convergence of the resulting method. The numerical results verify the advantage of the smaller proximal parameter.