1 Introduction

Consider the linear equality constrained convex optimization problem:

$$ \min_{x} \quad F(x), \quad s.t. \ Ax = b, $$
(1)

where \(F: \mathbb {R}^{n}\to \mathbb {R}\) is a convex but possibly nonsmooth function, \(A\in \mathbb {R}^{m\times n}\) and \(b\in \mathbb {R}^{m}\). The problem (1) captures a number of important applications arising in various areas, and the following are three concrete examples.

Example 1.1

The basis pursuit problem (see, e.g., [9, 10]):

$$ \min_{x} \quad \|x\|_{1}, \quad s.t. \ Ax = b, $$
(2)

where \(A\in \mathbb {R}^{m\times n}\) with m < n, and ∥⋅∥1 is the ℓ1-norm on \(\mathbb {R}^{n}\) defined by \(\|x\|_{1} ={\sum }^{n}_{i=1} |x_{i}|\). Algorithms for the basis pursuit problem can be found in [31] and [33].

Example 1.2

The linearly constrained ℓ1-ℓ2 minimization problem [17]:

$$ \min_{x} \quad \|x\|_{1}+\frac{\beta}{2}\|x\|^{2}_{2}, \quad s.t. \ Ax = b, $$
(3)

where β > 0 and ∥⋅∥2 is the ℓ2-norm of \(\mathbb {R}^{n}\) defined by \(\|x\|^{2}_{2} ={\sum }^{n}_{i=1} {x_{i}^{2}}\). When β is small enough, a solution of the problem (3) is also a solution of the basis pursuit problem (2). Since the problem (3) contains the regularization term \(\frac {\beta }{2}\|x\|^{2}_{2}\), it is less sensitive to noise than the basis pursuit problem (2).

Example 1.3

The global consensus problem [8]:

$$ \min_{X\in\mathbb{R}^{n\times N}} \quad F(X) = \sum\limits^{N}_{i=1}f_{i}(X_{i}), \quad s.t. \ X_{i} = X_{j},\quad \forall i, j\in\{1,2,\cdots,N\}, $$

where \(f_{i}:\mathbb {R}^{n}\to \mathbb {R}\) is convex, i = 1,2,⋯ ,N. The global consensus problem is a widely investigated model that has important applications in signal processing [23], routing of wireless sensor networks [22] and optimal consensus of agents [28].
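
The consensus constraints can be written in the form Ax = b of the problem (1) by stacking the blocks and chaining the pairwise constraints; for instance (a small illustration for N = 3, not taken from [8]), one may take

$$ A = \begin{pmatrix} Id & -Id & 0\\ 0 & Id & -Id \end{pmatrix}, \qquad x = \begin{pmatrix} X_{1}\\ X_{2}\\ X_{3} \end{pmatrix}, \qquad b = 0, $$

so that Ax = b encodes X1 = X2 = X3.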

Recall that \((x^{*}, \lambda ^{*})\in \mathbb {R}^{n} \times \mathbb {R}^{m}\) is a KKT point of the problem (1) if

$$ \begin{cases} -A^{T}\lambda^{*} \in \partial F(x^{*}),\\ Ax^{*} -b =0, \end{cases} $$
(4)

where ∂F is the classical subdifferential of F defined by

$$\partial F(x) = \{v\in\mathbb{R}^{n} | F(y)\geq F(x)+\langle v,y-x \rangle,\quad \forall y\in\mathbb{R}^{n}\}.$$
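
For instance, in the one-dimensional case F(x) = |x|, the definition above gives

$$ \partial |x| = \begin{cases} \{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0, \end{cases} $$

and ∂F(x) reduces to the singleton {∇F(x)} whenever F is differentiable at x.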

Let Ω be the KKT point set of the problem (1). Then, for any \((x^{*},\lambda^{*})\in {\Omega}\), from (4) we have

$$ \mathcal{L}(x^{*},\lambda)\leq \mathcal{L}(x^{*},\lambda^{*})\leq \mathcal{L}(x,\lambda^{*}), \qquad \forall (x,\lambda)\in \mathbb{R}^{n}\times\mathbb{R}^{m}, $$

where \({\mathscr{L}}:\mathbb {R}^{n}\times \mathbb {R}^{m}\to \mathbb {R}\) is the Lagrangian function associated with the problem (1) defined by

$$ \mathcal{L}(x,\lambda) = F(x)+\langle \lambda,Ax-b\rangle.$$

A classical method for solving the problem (1) is the augmented Lagrangian method (ALM) [6]:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}\in {\arg\min}_{x} \mathcal{L}(x,\lambda_{k})+\frac{\sigma}{2}\|Ax-b\|^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b). \end{cases} \end{array} $$
(5)

In general, since \({\mathscr{L}}(x,\lambda _{k})+\frac {\sigma }{2}\|Ax-b\|^{2}\) is not strictly convex, the subproblem may have more than one solution and may be difficult to solve. To overcome this disadvantage, the proximal ALM [11] has been proposed:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}={\arg\min}_{x} F(x)+\langle A^{T}\lambda_{k},x\rangle +\frac{\sigma}{2}\|Ax-b\|^{2}+\frac{1}{2}\|x-x_{k}\|_{P}^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b), \end{cases} \end{array} $$
(6)

where \(\|x\|^{2}_{P} = x^{T}Px\), P is a positive semidefinite matrix, and P + σATA is positive definite.
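
For intuition, the following is a minimal sketch of the iteration (6) in the special case F(x) = ½∥x∥², where the x-subproblem reduces to a linear system; the data, sizes and parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A minimal sketch of the proximal ALM (6) for the special case F(x) = 0.5*||x||^2,
# where the x-subproblem reduces to the linear system
#   (Id + sigma*A^T A + P) x_{k+1} = P x_k - A^T lambda_k + sigma*A^T b.
rng = np.random.default_rng(0)            # illustrative data
m, n, sigma = 20, 50, 1.0
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
P = 0.1 * np.eye(n)                       # positive semidefinite proximal weight
H = np.eye(n) + sigma * A.T @ A + P       # here P + sigma*A^T A is positive definite
x, lam = np.zeros(n), np.zeros(m)
for k in range(200):
    x = np.linalg.solve(H, P @ x - A.T @ lam + sigma * A.T @ b)
    lam = lam + sigma * (A @ x - b)
print(np.linalg.norm(A @ x - b))          # feasibility residual
```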

In some practical situations, the objective function F has the composite structure: F(x) = f(x) + g(x), where f is a convex but possibly nonsmooth function and g is a convex smooth function. Then, the problem (1) becomes the linearly constrained composite convex optimization problem:

$$ \min_{x} \quad f(x)+g(x), \quad s.t. \ Ax = b. $$
(7)

Applying the method (6) to the problem (7) and linearizing the smooth function g leads to the linearized ALM [32]:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}\in{\arg\min}_{x} f(x)+\langle \nabla g(x_{k})+A^{T}\lambda_{k},x\rangle +\frac{\sigma}{2}\|Ax-b\|^{2}+\frac{1}{2}\|x-x_{k}\|_{P}^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b). \end{cases} \end{array} $$
(8)
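
A common way to make the x-subproblem of (8) explicit is to choose P = τId − σATA with τ ≥ σ∥A∥², so that the quadratic terms in x collapse and the update becomes a proximal step on f. The sketch below applies this choice to f(x) = ∥x∥1 and a quadratic g; all concrete data and parameter values are illustrative assumptions, and this particular choice of P is a standard device rather than a prescription from the paper.

```python
import numpy as np

def soft(z, t):
    # proximal mapping of t*||.||_1 (componentwise soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(1)            # illustrative data
m, n = 30, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = rng.standard_normal(5)
b = A @ x_true
beta, sigma = 0.1, 1.0                    # g(x) = (beta/2)*||x||^2, so grad g(x) = beta*x
tau = sigma * np.linalg.norm(A, 2) ** 2 + 1.0   # tau >= sigma*||A||^2 keeps P = tau*Id - sigma*A^T A PSD
x, lam = np.zeros(n), np.zeros(m)
for k in range(500):
    z = x - (beta * x + A.T @ lam + sigma * A.T @ (A @ x - b)) / tau
    x = soft(z, 1.0 / tau)                # x-update of (8) with f = ||.||_1 and P = tau*Id - sigma*A^T A
    lam = lam + sigma * (A @ x - b)
print(np.linalg.norm(A @ x - b))          # feasibility residual
```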

1.1 Related works

Under the assumption that F is smooth, He and Yuan [15] showed that the iteration complexity of the method (5) is \(\mathcal {O}(1/k)\) in terms of the objective residual of the associated Lagrangian \({\mathscr{L}}(x,\lambda )\). When F is nonsmooth, Gu et al. [13] proved that the method (5) enjoys a worst-case \(\mathcal {O}(1/k)\) convergence rate in the ergodic sense. A worst-case \(\mathcal {O}(1/k)\) convergence rate in the non-ergodic sense of the method (6) was shown in [21]. When g has a Lipschitz continuous gradient with constant Lg and P ≽ LgId, Xu [32] proved that the method (8) achieves an \(\mathcal {O}(1/k)\) convergence rate in the ergodic sense. Tran-Dinh and Zhu [30] proposed a modified version of the method (8) and proved that the objective residual and feasibility violation sequences generated by the method both enjoy an \(\mathcal {O}(1/k)\) non-ergodic convergence rate. Liu et al. [24] investigated the non-ergodic convergence rate of an inexact augmented Lagrangian method for the problem (7).

Generally, plain first-order methods converge slowly, and much effort has been devoted in the past decades to accelerating them. Nesterov [25] first proposed an accelerated version of the classical gradient method for smooth convex optimization problems and proved that the accelerated inertial gradient method enjoys an \(\mathcal {O}(1/k^{2})\) convergence rate. Beck and Teboulle [5] proposed an iterative shrinkage-thresholding algorithm for solving the linear inverse problem, which also achieves an \(\mathcal {O}(1/k^{2})\) convergence rate. The acceleration idea of [25] was further applied in Nesterov [26] to design accelerated methods for unconstrained convex composite optimization problems. Su et al. [29] first studied accelerated methods from a continuous-time perspective. Since then, new accelerated inertial methods based on second-order dynamical systems have been proposed for unconstrained optimization problems (see, e.g., [1, 3, 4]). For more results on inertial methods for unconstrained optimization problems, we refer the reader to [2, 12, 27].

Meanwhile, inertial accelerated methods for linearly constrained optimization problems have also been well developed. When F is differentiable, He and Yuan [15] proposed an accelerated inertial ALM for the problem (1) and proved that its convergence rate is \(\mathcal {O}(1/k^{2})\) by using an extrapolation technique similar to [5]. When F is differentiable with a Lipschitz continuous gradient, Boţ et al. [7] proposed, by time discretization of a dynamical system, an inertial ALM with \(\mathcal {O}(1/k^{2})\) convergence rate and proved the convergence of the sequence of iterates. Kang et al. [18] presented an inexact version of the accelerated ALM, with inexact calculations of the subproblems, and showed that the convergence rate remains \(\mathcal {O}(1/k^{2})\) under the assumption that F is strongly convex. Kang et al. [17] further presented an accelerated Bregman method for the linearly constrained ℓ1-ℓ2 minimization problem, and a convergence rate of \(\mathcal {O}(1/k^{2})\) was proved when the accelerated Bregman method is applied to solve the problem (1). To linearize the augmented term of the Bregman method, Huang et al. [16] proposed an accelerated linearized Bregman algorithm with \(\mathcal {O}(1/k^{2})\) convergence rate. For the problem (7), Tran-Dinh and Zhu [30] proposed an inertial primal-dual method which enjoys an \(o(1/k\sqrt {\log k})\) convergence rate. Xu [32] proposed an accelerated version of the linearized ALM (8), named the accelerated linearized augmented Lagrangian method, which is formulated as follows:

$$ \begin{array}{@{}rcl@{}} \begin{cases} \hat{x}_{k}=(1-\alpha_{k})\bar{x}_{k}+\alpha_{k} x_{k},\\ x_{k+1} \in {\arg\min}_{x} f(x)+ \frac{\beta_{k}}{2}\|Ax - b\|^{2}+\frac{1}{2}\|x - x_{k}\|_{P_{k}}^{2}+\langle \nabla g(\hat{x}_{k})+A^{T}{\lambda}_{k}, x\rangle,\\ \bar{x}_{k+1}=(1-\alpha_{k})\bar{x}_{k}+\alpha_{k} x_{k+1},\\ \lambda_{k+1} =\lambda_{k}+ \gamma_{k}(Ax_{k+1}-b). \end{cases} \end{array} $$
(9)

It was shown in Xu [32] that the algorithm (9) enjoys an \(\mathcal {O}(1/k^{2})\) convergence rate under specific parameter settings. It is worth mentioning that, to achieve the \(\mathcal {O}(1/k^{2})\) rate, the augmented term in the algorithm (9) cannot be linearized, which may make the subproblems difficult to solve. Moreover, Xu [32] did not discuss the convergence of the method when the subproblem is solved inexactly.

1.2 Inertial primal-dual methods

We first propose Algorithm 1, an inertial version of the proximal ALM (6), for solving the problem (1). Algorithm 1 is inspired by the second-order primal-dual dynamical systems in [14, 34] and the Nesterov accelerated methods for unconstrained optimization problems [2, 5, 25]. When the objective has the composite structure F(x) = f(x) + g(x), by linearizing the smooth function g and introducing the perturbation sequence {𝜖k}k≥ 1 in Step 2 of Algorithm 1, we propose an inexact inertial accelerated primal-dual method (Algorithm 2) for the problem (7). In contrast to Algorithm 1, the subproblem is solved inexactly by computing an approximate solution instead of an exact one.

[Algorithm 1 Inertial accelerated primal-dual method for the problem (1)]
[Algorithm 2 Inexact inertial accelerated primal-dual method for the problem (7)]
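
For the reader's convenience, the updates of Algorithm 1 can be read off from the optimality condition derived in Lemma 1 and the relations (14)–(17) below: with α ≥ 3, s > 0, positive semidefinite weights Mk, and initial points x1, λ1 with \(\bar{x}_{1}=x_{1}\), \(\bar{\lambda}_{1}=\lambda_{1}\), one step of the method takes the form (a reconstruction from the proofs, stated only for orientation; the authoritative statements are the original algorithm environments)

$$ \begin{cases} \hat{\lambda}_{k} = \frac{k+\alpha-2}{\alpha-1}\bar{\lambda}_{k}-\frac{k-1}{\alpha-1}\lambda_{k},\qquad \eta_{k} = \frac{(k-1)Ax_{k}+(\alpha-1)b}{k+\alpha-2},\\ x_{k+1}\in{\arg\min}_{x}\ F(x)+\langle A^{T}\hat{\lambda}_{k},x\rangle+\frac{sk(k+\alpha-2)}{2(\alpha-1)^{2}}\|Ax-\eta_{k}\|^{2}+\frac{k+\alpha-2}{2sk}\|x-\bar{x}_{k}\|_{M_{k}}^{2},\\ \lambda_{k+1} = \bar{\lambda}_{k}+\frac{sk}{k+\alpha-2}\left(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k})\right),\\ \bar{x}_{k+1} = x_{k+1}+\frac{k-1}{k+\alpha-1}(x_{k+1}-x_{k}),\qquad \bar{\lambda}_{k+1} = \lambda_{k+1}+\frac{k-1}{k+\alpha-1}(\lambda_{k+1}-\lambda_{k}), \end{cases} $$

where the x-update and the λ-update correspond to Step 2 and Step 3 referenced in the proofs. Algorithm 2 appears to modify the x-update by replacing F(x) with \(f(x)+\langle \nabla g(\bar{x}_{k}),x\rangle\) and adding the perturbation term \(-\langle \epsilon_{k},x\rangle\) (cf. the optimality condition in the proof of Lemma 4).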

1.3 Outline

The rest of the paper is organized as follows. In Section 2, we investigate the convergence rates of the proposed methods. In Section 3, we perform numerical experiments. Finally, we give a concluding remark in Section 4.

2 Convergence analysis

In this section, we analyze the convergence rates of Algorithm 1 and Algorithm 2. Assuming merely convexity, we show that both of them enjoy \(\mathcal {O}(1/k^{2})\) convergence rates in terms of the objective residual and the feasibility violation.

To do so, we first recall some standard notations and results which will be used in the paper. In what follows, we always use ∥⋅∥ to denote the 2-norm.

Let \(\mathbb {S}_{+}(n)\) denote the set of all symmetric positive semidefinite matrices in \(\mathbb {R}^{n\times n}\) and Id be the identity matrix. For \(M\in \mathbb {S}_{+}(n)\), we introduce the semi-norm on \(\mathbb {R}^{n}\): \(\|x\|_{M} =\sqrt {x^{T}Mx}\) for any \(x\in \mathbb {R}^{n}\). This induces on \(\mathbb {S}_{+}(n)\) the following partial ordering: for any \(M_{1}, M_{2} \in \mathbb {S}_{+}(n)\),

$$M_{1} \succcurlyeq M_{2} \Longleftrightarrow \|x\|_{M_{1}}\geq \|x\|_{M_{2}},\quad \forall x\in \mathbb{R}^{n}.$$

For any \(x,y\in \mathbb {R}^{n}\), the following equality holds:

$$ \begin{array}{@{}rcl@{}} \frac{1}{2}\|x\|_{M}^{2}-\frac{1}{2}\|y\|_{M}^{2} = \langle x, M(x-y)\rangle-\frac{1}{2}\|x-y\|_{M}^{2}, \quad \forall M\in\mathbb{S}_{+}(n). \end{array} $$
(10)
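
The identity (10) is the standard expansion of the semi-norm; expanding the right-hand side and using the symmetry of M confirms it:

$$ \langle x, M(x-y)\rangle-\frac{1}{2}\|x-y\|_{M}^{2} = x^{T}Mx-x^{T}My-\frac{1}{2}x^{T}Mx+x^{T}My-\frac{1}{2}y^{T}My = \frac{1}{2}\|x\|_{M}^{2}-\frac{1}{2}\|y\|_{M}^{2}. $$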

Now, we start to analyze Algorithm 1.

Lemma 1

Let \(\{(x_{k},\lambda _{k},\bar {x}_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1. Then,

$$ \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k}) \in -s\left(\partial F(x_{k+1})+A^{T}\left(\lambda_{k+1}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k})\right)\right). $$
(11)

Proof

From Step 2 of Algorithm 1, we have

$$ 0\in \partial F(x_{k+1})+\frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})+\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A^{T}(Ax_{k+1}-\eta_{k})+A^{T}\hat{\lambda}_{k}.$$

This yields

$$ \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})\in -s\left(\partial F(x_{k+1}) + A^{T}\left(\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}}(Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\right)\right). $$
(12)

It follows from Step 2 and Step 3 that

$$ \begin{array}{@{}rcl@{}} && \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}}(Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\\ && = \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A x_{k+1}-\frac{sk(k-1)}{(\alpha-1)^{2}}Ax_{k}-\frac{sk}{\alpha-1}b+\hat{\lambda}_{k} \\ && = \frac{sk}{\alpha-1}(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k}))+\hat{\lambda}_{k}\\ && = \frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k})+ \frac{k+\alpha-2}{\alpha-1}\bar{\lambda}_{k}-\frac{k-1}{\alpha-1}\lambda_{k} \\ && = \lambda_{k+1} +\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). \end{array} $$

This together with (12) yields (11). □

Lemma 2

Suppose that F is a convex function, Ω ≠ ∅ and \(M_{k-1}\succcurlyeq M_{k}\) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k}, \bar {x}_{k}, \hat {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1 and \((x^{*},\lambda^{*})\in {\Omega}\). Define

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k} = \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))+ \frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}+\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2} \end{array} $$
(13)

with

$$ \hat{x}_{k} = \frac{k+\alpha-2}{\alpha-1}\bar{x}_{k}-\frac{k-1}{\alpha-1}x_{k}. $$
(14)

Then, for any k ≥ 1, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k+1} &\leq& \mathcal{E}_{k}-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}). \end{array} $$

Proof

By computation,

$$ \begin{array}{@{}rcl@{}} \hat{x}_{k+1} &=& \frac{k+\alpha-1}{\alpha-1}(x_{k+1}+\frac{k-1}{k+\alpha-1}(x_{k+1}-x_{k}))-\frac{k}{\alpha-1}x_{k+1} \\ &=& \frac{k+\alpha-2}{\alpha-1}x_{k+1}-\frac{k-1}{\alpha-1}x_{k}\\ &=& \frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k})+ \frac{k+\alpha-2}{\alpha-1}\bar{x}_{k}-\frac{k-1}{\alpha-1}x_{k}\\ &=& \hat{x}_{k}+\frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k}) \end{array} $$
(15)

and

$$ \hat{x}_{k+1}-x^{*} = x_{k+1}-x^{*}+\frac{k-1}{\alpha-1}(x_{k+1}-x_{k}) . $$
(16)

Similarly, we have

$$ \hat{\lambda}_{k+1} = \hat{\lambda}_{k}+\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k}) $$
(17)

and

$$ \hat{\lambda}_{k+1}-\lambda^{*} = \lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). $$
(18)

By the definition of \({\mathscr{L}}(x,\lambda )\), we get \(\partial _{x} {\mathscr{L}}(x,\lambda ) = \partial F(x)+A^{T}\lambda \). Combining this and equality (18), we can rewrite (11) as

$$ \begin{array}{@{}rcl@{}} \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})&\in& - s(\partial F(x_{k+1})+A^{T}\lambda^{*}+A^{T}(\lambda_{k+1}\\&&-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k})))\\ & =& - s\partial_{x} \mathcal{L}(x_{k+1},\lambda^{*})-sA^{T}(\hat{\lambda}_{k+1}-\lambda^{*}), \end{array} $$

which implies

$$ \xi_{k} :=-\frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})- A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\in \partial_{x} \mathcal{L}(x_{k+1},\lambda^{*}). $$
(19)

Since \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq 0\), it follows from (10) and (15) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2} \\ &=&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}-M_{k}}^{2} \\ &\leq& \langle \hat{x}_{k+1}-x^{*}, M_{k}(\hat{x}_{k+1}-\hat{x}_{k})\rangle-\frac{1}{2}\|\hat{x}_{k+1}-\hat{x}_{k}\|_{M_{k}}^{2} \\ &=& \frac{k+\alpha-2}{\alpha-1}\langle \hat{x}_{k+1}-x^{*}, M_{k}(x_{k+1}-\bar{x}_{k})\rangle\\&&-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2} \\ &=& -\frac{sk}{\alpha-1}(\langle \hat{x}_{k+1}-x^{*}, \xi_{k}\rangle +\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle)\\ &&-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(20)

Since \({\mathscr{L}}(x,\lambda ^{*})\) is a convex function with respect to x, from (16) and (19) we get

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1} - x^{*}, \xi_{k} \rangle &=&\langle x_{k+1}-x^{*}, \xi_{k}\rangle +\frac{k-1}{\alpha-1}\langle x_{k+1}-x_{k}, \xi_{k}\rangle\\ &\geq& \mathcal{L}(x_{k+1},\lambda^{*}) - \mathcal{L}(x^{*},\lambda^{*})+\frac{k - 1}{\alpha - 1}(\mathcal{L}(x_{k+1},\lambda^{*}) - \mathcal{L}(x_{k},\lambda^{*})). \end{array} $$
(21)

Combining (20) and (21) together, we have

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})) - \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x_{k},\lambda^{*})) \\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(22)

Since Ax = b, it follows from Step 3 of Algorithm 1 and (16) that

$$ \lambda_{k+1}-\bar{\lambda}_{k}=\frac{sk}{k + \alpha - 2}(Ax_{k+1}-Ax^{*}+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k})) = \frac{sk}{k + \alpha-2}A(\hat{x}_{k+1}-x^{*}). $$

This together with (10) and (17) yields

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{\lambda}_{k+1}-\lambda^{*}\|^{2}-\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2} = \langle \hat{\lambda}_{k+1}-\lambda^{*}, \hat{\lambda}_{k+1}-\hat{\lambda}_{k}\rangle-\frac{1}{2}\|\hat{\lambda}_{k+1}-\hat{\lambda}_{k}\|^{2} \\ &&= \frac{k+\alpha-2}{\alpha-1}\langle \hat{\lambda}_{k+1}-\lambda^{*},\lambda_{k+1}-\bar{\lambda}_{k} \rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}\\ &&= \frac{sk}{\alpha-1}\langle \hat{\lambda}_{k+1}-\lambda^{*},A(\hat{x}_{k+1}-x^{*})\rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}. \end{array} $$
(23)

It follows from (22) and (23) that

$$ \begin{array}{@{}rcl@{}} &&\mathcal{E}_{k+1}-\mathcal{E}_{k} \\ &=&\frac{s(k^{2}+k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))- \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))\\ &&+\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}+\frac{1}{2}\|\hat{\lambda}_{k+1}-\lambda^{*}\|^{2}-\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2}\\ &\leq& \frac{(3-\alpha)sk}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\\ &\leq& -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}), \end{array} $$

where the last inequality follows from α ≥ 3 and \((x^{*},\lambda^{*})\in {\Omega}\). This yields the desired result. □

To obtain the fast convergence rates, we need the following lemma.

Lemma 3

([20, Lemma 2], [19, Lemma 3.18]) Let \(\{a_{k}\}_{k=1}^{+\infty }\) be a sequence of vectors in \(\mathbb {R}^{n}\) such that

$$\|(\tau+(\tau-1)K)a_{K+1}+\sum\limits^{K}_{k=1}a_{k}\|\leq C,\qquad \forall K\geq 1, $$

where τ > 1 and C ≥ 0. Then, \(\|{\sum }^{K}_{k=1}a_{k}\|\leq C\) for all K ≥ 1.

Now, we discuss the \(\mathcal {O}(1/k^{2})\) convergence rate of Algorithm 1.

Theorem 1

Suppose that F is a convex function, Ω ≠ ∅ and \(M_{k-1}\succcurlyeq M_{k}\) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k},\bar {x}_{k},\bar {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1 and \((x^{*},\lambda^{*})\in {\Omega}\). The following conclusions hold:

  (i) \({\sum }_{k=1}^{+\infty }k^{2}(\|x_{k+1}-\bar {x}_{k}\|_{M_{k}}^{2}+\|\lambda _{k+1}-\bar {\lambda }_{k}\|^{2}) < +\infty \).

  (ii) For all k > 1,

    $$ \begin{array}{@{}rcl@{}} &&\|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s(k-1)(k+\alpha-3)},\\ &&|F(x_{k})-F(x^{*})|\leq \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)}+ \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)}, \end{array} $$

    where \(\mathcal {E}_{1}=\frac {1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac {1}{2}\|\lambda _{1}-\lambda ^{*}\|^{2}\).

Proof

From Lemma 2, we have

$$ \mathcal{E}_{k+1}- \mathcal{E}_{k} \leq -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\leq 0, \quad \forall k\geq 1. $$
(24)

By the definition of \(\mathcal {E}_{k}\) and (24), \(\{\mathcal {E}_{k}\}_{k\geq 1}\) is a nonincreasing and nonnegative sequence. As a consequence, \(\mathcal {E}_{k}\) converges to a finite limit. It follows from (24) that

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{k=1}^{+\infty} \frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\\ && \leq \lim_{K\to+\infty} \sum\limits_{k=1}^{K} (\mathcal{E}_{k}-\mathcal{E}_{k+1}) \\ &&= \mathcal{E}_{1}-\lim_{K\to+\infty}\mathcal{E}_{K+1}\\ &&< +\infty, \end{array} $$
(25)

which is (i).

Combining (13) and (24), we have

$$ \begin{array}{@{}rcl@{}} \|\hat{\lambda}_{k}-\lambda^{*}\| \leq \sqrt{2\mathcal{E}_{k}}\leq \sqrt{2\mathcal{E}_{1}}, \qquad \forall k\geq 1. \end{array} $$

This yields

$$ \begin{array}{@{}rcl@{}} \left\|\sum\limits^{K}_{k=1}(\hat{\lambda}_{k+1}-\hat{\lambda}_{k})\right\| &=& \left\|\hat{\lambda}_{K+1}-\hat{\lambda}_{1}\right\| \\ &\leq& \left\|\hat{\lambda}_{K+1}-{\lambda}^{*}\right\|+\left\|\hat{\lambda}_{1}-{\lambda}^{*}\right\|\\ &\leq& 2\sqrt{2\mathcal{E}_{1}} \end{array} $$
(26)

for all K ≥ 1. It follows from (17) and Step 3 of Algorithm 1 that

$$ \begin{array}{@{}rcl@{}} &&\left\|\sum\limits^{K}_{k=1}(\hat{\lambda}_{k+1}-\hat{\lambda}_{k})\right\| = \left\|\sum\limits^{K}_{k=1}\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k})\right\|\\ &=& \frac{s}{\alpha-1}\left\|\sum\limits^{K}_{k=1}k(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k}))\right\|\\ & =&\frac{s}{(\alpha-1)^{2}}\left\|\sum\limits^{K}_{k=1}\left( k(k+\alpha-2)(Ax_{k+1}-b) -k(k-1)(Ax_{k}-b) \right)\right\|\\ & =&\frac{s}{(\alpha-1)^{2}}\left\|K(K+\alpha-2)(Ax_{K+1}-b)+ \sum\limits^{K}_{k=1}(\alpha-3)(k-1)(Ax_{k}-b)\right\|. \end{array} $$

This together with (26) implies

$$ \left\|K(K+\alpha-2)(Ax_{K+1}-b)+\sum\limits^{K}_{k=1}\left((\alpha-3)(k-1)(Ax_{k}-b)\right)\right\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}. $$
(27)

When α = 3, it follows from (27) that

$$ \begin{array}{@{}rcl@{}} \|K(K+\alpha-2)(Ax_{K+1}-b)\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}. \end{array} $$

When α > 3, applying Lemma 3 to (27) with ak = (α − 3)(k − 1)(Axk−b), \(\tau = \frac {\alpha -2}{\alpha -3}\) and \(C = \frac {2(\alpha -1)^{2}\sqrt {2\mathcal {E}_{1}}}{s}\), we obtain

$$\left\|\sum\limits^{K}_{k=1}\left( (\alpha-3)(k-1)(Ax_{k}-b)\right)\right\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s},$$

which together with (27) yields

$$ \|K(K+\alpha-2)(Ax_{K+1}-b)\|\leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}.$$

From the above discussion, for α ≥ 3 we have

$$ \begin{array}{@{}rcl@{}} \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s(k-1)(k+\alpha-3)}, \quad \forall k>1. \end{array} $$
(28)

It follows from the definition of \(\mathcal {E}_{k}\) and (24) that

$$ \begin{array}{@{}rcl@{}} \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) \leq \frac{(\alpha-1)^{2}\mathcal{E}_{k}}{s(k^{2}-k)}\leq \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)} \end{array} $$

for all k > 1. This together with (28) implies that

$$ \begin{array}{@{}rcl@{}} |F(x_{k})-F(x^{*})| &=& |\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})-\langle \lambda^{*}, Ax_{k}-b\rangle|\\ &\leq & \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) +\|\lambda^{*}\|\|Ax_{k}-b\|\\ &\leq & \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)}+ \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)} \end{array} $$

for all k > 1. The proof is complete. □

To investigate the convergence of Algorithm 2, we need the following assumption.

Assumption (H)

Ω ≠ ∅, f is a convex function, and g is a convex smooth function whose gradient is Lipschitz continuous with constant Lg > 0, i.e.,

$$ \|\nabla g(x)-\nabla g(y)\|\leq L_{g}\|x-y\|,\qquad \forall x,y\in\mathbb{R}^{n}, $$

equivalently,

$$ g(x)\leq g(y) + \langle \nabla g(y),x-y\rangle+\frac{L_{g}}{2}\|x-y\|^{2},\qquad \forall x,y\in\mathbb{R}^{n}. $$
(29)

Lemma 4

Let \(\{(x_{k},\lambda _{k},\bar {x}_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 2. Then,

$$ \begin{array}{@{}rcl@{}} &&\frac{k+\alpha-2}{k} M_{k}({x_{k+1}-\bar{x}_{k}})\\ &&\in- s\left( \partial f(x_{k+1})+\nabla g(\bar{x}_{k})+ A^{T}(\lambda_{k+1}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}))-\epsilon_{k}\right). \end{array} $$
(30)

Proof

From Step 2 of Algorithm 2, we have

$$ \begin{array}{@{}rcl@{}} &&0\in \partial f(x_{k+1})+\nabla g(\bar{x}_{k})+ \frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})\\&&+\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A^{T}(Ax_{k+1}-\eta_{k})+A^{T}\hat{\lambda}_{k}-\epsilon_{k} , \end{array} $$

which yields

$$ \begin{array}{@{}rcl@{}} &&\frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})\\ &&\in-s\left( \partial f(x_{k+1})+\nabla g(\bar{x}_{k}) +A^{T}\left( \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} (Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\right)-\epsilon_{k}\right). \end{array} $$

The rest of the proof is similar to that of Lemma 1, and so we omit it. □

Lemma 5

Assume that Assumption (H) holds and \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq sL_{g} Id \) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k}, \hat {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 2 and \((x^{*},\lambda^{*})\in {\Omega}\). Define

$$ \mathcal{E}^{\epsilon}_{k} = \mathcal{E}_{k}-{\sum}^{k}_{j=1} \frac{s(j-1)}{\alpha-1} \langle \hat{x}_{j}-x^{*}, \epsilon_{j-1}\rangle, $$
(31)

where \(\mathcal {E}_{k}\) is defined in (13) and \(\hat {x}_{k}\) is defined in (14). Then, for any k ≥ 1,

$$ \begin{array}{@{}rcl@{}} \mathcal{E}^{\epsilon}_{k+1} &\leq& \mathcal{E}^{\epsilon}_{k}. \end{array} $$

Proof

By the same arguments as in the proof of Lemma 2, we get

$$ \begin{array}{@{}rcl@{}} && \hat{x}_{k+1}-\hat{x}_{k} = \frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k}), \end{array} $$
(32)
$$ \begin{array}{@{}rcl@{}} && \hat{x}_{k+1}-x^{*} = x_{k+1}-x^{*}+\frac{k-1}{\alpha-1}(x_{k+1}-x_{k}), \end{array} $$
(33)
$$ \begin{array}{@{}rcl@{}} && \hat{\lambda}_{k+1}-\hat{\lambda}_{k}=\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k}), \end{array} $$
(34)
$$ \begin{array}{@{}rcl@{}} && \hat{\lambda}_{k+1}-\lambda^{*} = \lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). \end{array} $$
(35)

For notation simplicity, we denote

$$ \mathcal{L}^{f}(x) = f(x)+\langle \lambda^{*},Ax-b\rangle. $$
(36)

Then, \({\mathscr{L}}^{f}\) is a convex function, \(\partial {\mathscr{L}}^{f}(x) = \partial f(x)+A^{T}\lambda ^{*}\), and

$$ \mathcal{L}(x,\lambda^{*}) = \mathcal{L}^{f}(x)+g(x). $$
(37)

It follows from (30) and (35) that

$$ \begin{array}{@{}rcl@{}} \frac{k+\alpha-2}{k} M_{k}(x_{k+1}-\bar{x}_{k})&\in& - s(\partial f(x_{k+1})+A^{T}\lambda^{*})-s\nabla g(\bar{x}_{k})\\ &&-s A^{T}(\lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}))+s\epsilon_{k} \\ & =& - s\partial \mathcal{L}^{f}(x_{k+1})-s\nabla g(\bar{x}_{k})-sA^{T}(\hat{\lambda}_{k+1}-\lambda^{*})+s\epsilon_{k}, \end{array} $$

which yields

$$ \xi_{k} :=-\frac{k+\alpha-2}{ks}M_{k}(x_{k+1}-\bar{x}_{k})-\nabla g(\bar{x}_{k})- A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})+\epsilon_{k} \in \partial\mathcal{L}^{f}(x_{k+1}). $$
(38)

Since \(M_{k-1}\succcurlyeq M_{k}\), it follows from (10), (32) and (38) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\&\leq& \langle \hat{x}_{k+1}-x^{*}, M_{k}(\hat{x}_{k+1}-\hat{x}_{k})\rangle-\frac{1}{2}\|\hat{x}_{k+1}-\hat{x}_{k}\|^{2}_{M_{k}} \\ &=& -\frac{sk}{\alpha-1}(\langle \hat{x}_{k+1}-x^{*},\xi_{k}\rangle+\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle\\ &&+\langle \hat{x}_{k+1}-x^{*},\nabla g(\bar{x}_{k})\rangle-\langle \hat{x}_{k+1}-x^{*},\epsilon_{k}\rangle) -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(39)

From (33) and (38), we have

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1}-x^{*}, \xi_{k} \rangle& =&\langle x_{k+1}-x^{*}, \xi_{k}\rangle +\frac{k-1}{\alpha-1}\langle x_{k+1}-x_{k}, \xi_{k}\rangle\\ &\geq& \mathcal{L}^{f}(x_{k+1})-\mathcal{L}^{f}(x^{*})+\frac{k-1}{\alpha-1}(\mathcal{L}^{f}(x_{k+1})-\mathcal{L}^{f}(x_{k})), \end{array} $$
(40)

where the inequality follows from the convexity of \({\mathscr{L}}^{f}\). Since g has a Lipschitz continuous gradient, from (29) we get

$$ g(x_{k+1})\leq g(\bar{x}_{k})+\langle \nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle+\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2}. $$
(41)

By the convexity of g, we have

$$ \begin{array}{@{}rcl@{}} \langle\nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle+\langle\nabla g(\bar{x}_{k}),x^{*}-\bar{x}_{k}\rangle\\ &\leq&\langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle + g(x^{*})-g(\bar{x}_{k}) \end{array} $$
(42)

and

$$ \begin{array}{@{}rcl@{}} \langle\nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle+\langle\nabla g(\bar{x}_{k}),x_{k}-\bar{x}_{k}\rangle\\ &\leq&\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle + g(x_{k})-g(\bar{x}_{k}). \end{array} $$
(43)

It follows from (41)–(43) that

$$\langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle\geq g(x_{k+1})-g(x^{*})-\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2},$$

and

$$\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle\geq g(x_{k+1})-g(x_{k})-\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2}.$$

This together with (33) yields

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1}-x^{*},\nabla g(\bar{x}_{k}) \rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle+\frac{k-1}{\alpha-1}\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle\\ & \geq& g(x_{k+1})-g(x^{*})+\frac{k-1}{\alpha-1}(g(x_{k+1})-g(x_{k}))\\ &&-\frac{(k+\alpha-2)L_{g}}{2(\alpha-1)}\|x_{k+1}-\bar{x}_{k}\|^{2}. \end{array} $$
(44)

It follows from (39), (40) and (44) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}^{f}(x_{k+1})+g(x_{k+1})-(\mathcal{L}^{f}(x^{*})+g(x^{*}))) \\ &&- \frac{sk(k-1)}{(\alpha-1)^{2}}(\mathcal{L}^{f}(x_{k+1})+g(x_{k+1})-(\mathcal{L}^{f}(x_{k})+g(x_{k})))\\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle+ \frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},\epsilon_{k} \rangle\\ &&-\frac{k+\alpha-2}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{(k+\alpha-2)M_{k}-sL_{g}kId}^{2} \\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))- \frac{sk(k-1)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x_{k},\lambda^{*}))\\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle+\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},\epsilon_{k} \rangle, \end{array} $$
(45)

where the second inequality follows from the assumption \(M_{k}\succcurlyeq sL_{g} Id\succcurlyeq \frac {sL_{g}k}{k+\alpha -2}Id\).

It follows from (23), (31) and (45) that

$$ \begin{array}{@{}rcl@{}} \mathcal{E}^{\epsilon}_{k+1}-\mathcal{E}^{\epsilon}_{k}&=& \mathcal{E}_{k+1}-\mathcal{E}_{k}- \frac{sk}{\alpha-1} \langle \hat{x}_{k+1}-x^{*}, \epsilon_{k}\rangle \\ & \leq& \frac{(3-\alpha)sk}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})) -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2} \\ & \leq& 0, \end{array} $$
(46)

where the last inequality follows from α ≥ 3 and \((x^{*},\lambda^{*})\in {\Omega}\). □

To analyze the convergence of Algorithm 2, we need the following discrete version of the Gronwall-Bellman lemma.

Lemma 6

([2, Lemma 5.14]) Let {ak}k≥ 1 and {bk}k≥ 1 be two nonnegative sequences such that \({\sum }^{+\infty }_{k=1} b_{k}< +\infty \) and

$$ {a_{k}^{2}} \leq c^{2} +\sum\limits^{k}_{j=1}b_{j} a_{j} $$

for all k ≥ 1, where c ≥ 0. Then,

$$a_{k}\leq c+\sum\limits^{+\infty}_{j=1} b_{j}$$

for all k ≥ 1.

Theorem 2

Assume that Assumption (H) holds, \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq sL_{g}Id\) for all k ≥ 1, and

$$\sum\limits^{+\infty}_{k=1}k\|\epsilon_{k}\|< +\infty. $$

Let {(xk,λk)}k≥ 1 be the sequence generated by Algorithm 2 and \((x^{*},\lambda^{*})\in {\Omega}\). Then, for all k > 1,

$$ \begin{array}{@{}rcl@{}} && \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2C}}{s(k-1)(k+\alpha-3)},\\ && |f(x_{k})+g(x_{k})-f(x^{*})-g(x^{*})| \leq \frac{(\alpha-1)^{2}C}{s(k^{2}-k)}+\frac{4(\alpha-1)^{2}\sqrt{2C}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)}, \end{array} $$

where

$$ C := \frac{1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac{1}{2}\|\lambda_{1}-\lambda^{*}\|^{2}+\frac{s}{\alpha-1}\left(\sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right)\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|. $$

Proof

From Lemma 5 we have

$$\mathcal{E}^{\epsilon}_{k+1} \leq \mathcal{E}^{\epsilon}_{k}\leq \mathcal{E}^{\epsilon}_{1},$$

which yields

$$ \mathcal{E}_{k}\leq \mathcal{E}_{1}+\sum\limits^{k}_{j=1}\frac{s(j-1)}{\alpha-1} \langle \hat{x}_{j}-x^{*}, \epsilon_{j-1}\rangle. $$
(47)

This together with (13) and the Cauchy-Schwarz inequality implies

$$\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\leq 2\mathcal{E}_{1}+\frac{2s}{\alpha-1}\sum\limits^{k}_{j=1}(j-1) \|\hat{x}_{j}-x^{*}\|\cdot\|\epsilon_{j-1}\| . $$

Since \(M_{k-1}\succcurlyeq sL_{g} Id \),

$$ \|\hat{x}_{k}-x^{*}\|^{2}\leq \frac{1}{sL_{g}} \|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\leq \frac{2\mathcal{E}_{1}}{sL_{g}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{k}_{j=1}(j-1) \|\hat{x}_{j}-x^{*}\|\cdot\|\epsilon_{j-1}\| . $$
(48)

Since \({\sum }^{+\infty }_{j=1} j\|\epsilon _{j}\|< +\infty \), applying Lemma 6 to (48) with \(a_{k}=\|\hat {x}_{k}-x^{*}\|\), \(b_{j}=\frac{2(j-1)\|\epsilon_{j-1}\|}{(\alpha-1)L_{g}}\) and \(c^{2}=\frac{2\mathcal{E}_{1}}{sL_{g}}\), we obtain

$$ \|\hat{x}_{k}-x^{*}\|\leq \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|<+\infty, \quad\forall k\geq 1. $$
(49)

This together with (47) yields

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k}&\leq & \mathcal{E}_{1}+\frac{s}{\alpha-1}\sup_{k\geq 1}\|\hat{x}_{k}-x^{*}\|\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\\ &\leq& \mathcal{E}_{1}+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\| \end{array} $$
(50)

for any k ≥ 1. Denote

$$ \begin{array}{@{}rcl@{}} C&:=& \mathcal{E}_{1}+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\\ &=& \frac{1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac{1}{2}\|\lambda_{1}-\lambda^{*}\|^{2}\\&&+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|. \end{array} $$

By the definition of \(\mathcal {E}_{k}\) and (50), we have

$$\frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))\leq \mathcal{E}_{k}\leq C$$

and

$$ \begin{array}{@{}rcl@{}} \|\hat{\lambda}_{k}-\lambda^{*}\| \leq \sqrt{2\mathcal{E}_{k}}\leq \sqrt{2C}, \qquad \forall k\geq 1. \end{array} $$

By arguments similar to those in the proof of Theorem 1, we obtain

$$ \begin{array}{@{}rcl@{}} \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2C}}{s(k-1)(k+\alpha-3)} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} &&|f(x_{k})+g(x_{k})-f(x^{*})-g(x^{*})| \\ &&\leq \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) +\|\lambda^{*}\|\|Ax_{k}-b\|\\ &&\leq \frac{(\alpha-1)^{2}C}{s(k^{2}-k)}+\frac{4(\alpha-1)^{2}\sqrt{2C}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)} \end{array} $$

for all k > 1. □

Remark 1

It was shown in [32, Theorem 2.9] that the algorithm (9) with adaptive parameters enjoys an \(\mathcal {O}(1/k^{2})\) rate, but Xu [32] did not discuss whether this rate is preserved when the subproblem is solved inexactly. By Theorem 2, the \(\mathcal {O}(1/k^{2})\) convergence rate of Algorithm 2 is preserved even if the subproblem is solved inexactly, provided the errors satisfy ∑k‖𝜖k‖ < +∞. The numerical experiments in Section 3 also show the effectiveness of the inexact algorithm.

3 Numerical experiments

In this section, we present numerical experiments to illustrate the efficiency of the proposed algorithms. All codes are run on a PC (with 2.3 GHz Quad-Core Intel Core i5 and 8 GB memory) under MATLAB Version 9.4.0.813654 (R2018a).

3.1 The quadratic programming problem

In this subsection, we test the algorithms on the nonnegative linearly constrained quadratic programming problem (NLCQP):

$$ \min \ \frac{1}{2}x^{T}Qx+q^{T}x, \quad s.t. \ Ax = b, x\geq 0, $$

where \( q \in \mathbb {R}^{n}\), \(Q\in \mathbb {R}^{n\times n}\) is a positive semidefinite matrix, \(A\in \mathbb {R}^{m\times n}\), \( b \in \mathbb {R}^{m}\). Here, we compare Algorithm 2 (Al2) with the accelerated linearized augmented Lagrangian method (AALM [32, Algorithm 1]), which enjoys \(\mathcal {O}(1/k^{2})\) convergence rate with adaptive parameters.

Set m = 100 and n = 500. Let q be generated by the standard Gaussian distribution, b by the uniform distribution, A = [B,Id] with \(B\in \mathbb {R}^{m\times (n-m)}\) generated by the standard Gaussian distribution, and \(Q = 2H^{T}H\) with \(H\in \mathbb {R}^{n\times n}\) generated by the standard Gaussian distribution. Then, Q may not be positive definite. The optimal value F(x∗) is obtained by the MATLAB function quadprog with tolerance 10−15. In this case, F(x) = f(x) + g(x) with \(f(x) = \mathcal {I}_{y\geq 0}(x)\) and \(g(x)= \frac {1}{2}x^{T}Qx+q^{T}x\), where \(\mathcal {I}_{y\geq 0}\) is the indicator function of the set {y|y ≥ 0}, i.e.,

$$ \mathcal{I}_{y\geq 0}(x)= \begin{cases} 0,\qquad &x\geq 0,\\ +\infty \qquad &otherwise. \end{cases} $$

Set the parameters of Algorithm 2 as s = ∥Q∥ and Mk = s∥Q∥Id. Set the parameters of AALM ([32, Algorithm 1]) to the adaptive choices \(\alpha _{k}=\frac {2}{k+1}\), βk = γk = ∥Q∥k, \(P_{k}=\frac {2\|Q\|}{k}Id\). Subproblems of both algorithms are solved by interior-point algorithms to a tolerance subtol. Figure 1 reports the objective error |F(xk) − F(x∗)| and the feasibility violation ∥Axk − b∥ for Al2 with α = 10, 20, 30 and for AALM over the first 500 iterations. As shown in Fig. 1, Algorithm 2 performs better and more stably than AALM under the different values of subtol.
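
For reproducibility, the test data described above can be generated as in the following sketch (numpy is used in place of MATLAB, and the seed is an illustrative choice; the reference value F(x∗) would still come from a QP solver such as MATLAB's quadprog).

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative seed
m, n = 100, 500
q = rng.standard_normal(n)                # q: standard Gaussian
b = rng.uniform(size=m)                   # b: uniform distribution
B = rng.standard_normal((m, n - m))
A = np.hstack([B, np.eye(m)])             # A = [B, Id]
H = rng.standard_normal((n, n))
Q = 2 * H.T @ H                           # symmetric positive semidefinite, possibly singular

def g(x):
    # smooth part of the objective; the constraint x >= 0 is carried by f = indicator of {y >= 0}
    return 0.5 * x @ Q @ x + q @ x
```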

Fig. 1 Error of objective function and constraint of Al2 and AALM with different subtol

3.2 The basis pursuit problem

Consider the following basis pursuit problem:

$$ \min_{x} \quad \|x\|_{1}, \quad s.t. \ Ax = b, $$

where \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^{m}\) and m < n. Let A be generated by the standard Gaussian distribution. The number of nonzero elements of the original solution x∗ is fixed at 0.1 ∗ n, and the nonzero elements are selected randomly in [− 2,2]. Set b = Ax∗. We compare Algorithm 1 with the inexact augmented Lagrangian method (IAL [24, Algorithm 1]). Here, subproblems of both algorithms are solved by the fast iterative shrinkage-thresholding algorithm (FISTA [5], [24, Algorithm 2]), and FISTA is stopped when

$$ \frac{\|x_{k}-x_{k-1}\|^{2}}{\max\{\|x_{k-1}\|,1\}}\leq subtol $$

is satisfied or the number of iterations exceeds 100, where the accuracy subtol = 1e − 4, 1e − 6, 1e − 8. In each test, we calculate the residual error ∥Axk − b∥ (Res) and the relative error of the solution \(\frac {\|x_{k}-x^{*}\|}{\|{x}^{*}\|}\) (Rel), with the stopping condition Res + Rel ≤ 1e − 8. Set the parameters of Algorithm 1 as α = n, s = 100, Mk = 0, and the parameter of IAL as β = 1. Let Init and Time denote the number of iterations and the CPU time in seconds, respectively. For different tolerances subtol of the subproblem, Tables 1, 2 and 3 report the results for the basis pursuit problem with different dimensions. We observe that, when the subproblem is solved with different accuracies, Algorithm 1 is faster than IAL in terms of both the number of iterations and the CPU time.
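
The instance generation and the two error measures Res and Rel described above can be sketched in numpy as follows (function names and the seed are our own).

```python
import numpy as np

def make_instance(m, n, seed=0):
    # test data as described above: Gaussian A, sparse x* with 0.1*n nonzeros in [-2, 2], b = A x*
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x_star = np.zeros(n)
    support = rng.choice(n, size=int(0.1 * n), replace=False)
    x_star[support] = rng.uniform(-2, 2, size=support.size)
    return A, x_star, A @ x_star

def errors(A, b, x_star, x_k):
    res = np.linalg.norm(A @ x_k - b)                              # Res = ||A x_k - b||
    rel = np.linalg.norm(x_k - x_star) / np.linalg.norm(x_star)    # Rel = ||x_k - x*|| / ||x*||
    return res, rel

def fista_stopped(x_k, x_prev, subtol):
    # inner stopping rule used for the FISTA subproblem solver
    return np.linalg.norm(x_k - x_prev) ** 2 / max(np.linalg.norm(x_prev), 1.0) <= subtol
```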

Table 1 Numerical results of Algorithm 1 and IAL with subtol = 1e − 4
Table 2 Numerical results of Algorithm 1 and IAL with subtol = 1e − 6
Table 3 Numerical results of Algorithm 1 and IAL with subtol = 1e − 8

3.3 The linearly constrained ℓ1-ℓ2 minimization problem

Consider the following problem:

$$ \min_{x}\ \|x\|_{1} +\frac{\beta}{2}\|x\|^{2}_{2} \quad s.t.\ Ax = b, $$

where \(A\in \mathbb {R}^{m\times n}\) and \(b\in \mathbb {R}^{m}\). Let m = 1500,n = 3000, and A be generated by standard Gaussian distribution. Suppose that the original solution (signal) \({x}^{*}\in \mathbb {R}^{n}\) has only 150 non-zero elements which are generated by the Gaussian distribution \(\mathcal {N}(0,4)\) in the interval [− 2,2] and that the noise ω is selected randomly with ∥ω∥ = 10− 4,

$$ b= A{x}^{*}+\omega.$$

Set the parameters of Algorithm 2 (Al2) as α = 20, s = 1, Mk = sβId, and the parameter of IAALM ([18, Algorithm 1]) as γ = 1. Subproblems are solved by FISTA, and the stopping condition is when

$$ \frac{\|x_{k}-x_{k-1}\|^{2}}{\max\{\|x_{k-1}\|,1\}}\leq subtol $$

is satisfied or the number of iterations exceeds 100, where accuracy subtol = 1e − 6, 1e − 8. We terminate all the methods when ∥Axkb∥≤ 5 ∗ 10− 4. In each test, we calculate the residual error res = ∥Axb∥, the relative error \(rel = \frac {\|x-x^{*}\|}{\|{x}^{*}\|}\) and the signal-to-noise ratio

$$SNR=10\log_{10} \frac{\|x^{*}-\text{mean}(x^{*})\|^{2}}{\|x-{x}^{*}\|^{2}},$$

where x is the recovery signal.
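
The quality measures reported in Tables 4 and 5 can be computed as below (a small numpy sketch; the variable names are ours).

```python
import numpy as np

def snr(x_star, x):
    # signal-to-noise ratio of the recovered signal x relative to the original signal x*
    num = np.linalg.norm(x_star - np.mean(x_star)) ** 2
    return 10.0 * np.log10(num / np.linalg.norm(x - x_star) ** 2)
```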

In Table 4, we present the numerical results of Algorithm 2 and IAALM for various β. When subtol = 1e − 6, IAALM does not work well, so we only list the numerical results of Algorithm 2 in Table 5. Based on Rel and SNR, it is seen that the sparse original signal is well restored when β ≤ 1. This is also shown in Fig. 2.

Table 4 Numerical results of Algorithm 2 and IAALM with subtol = 1e − 8
Table 5 Numerical results of Algorithm 2 with subtol = 1e − 6
Fig. 2 Original sparse signal and the final estimated solution of Algorithm 2 with subtol = 1e − 8

4 Conclusion

In this paper, we propose two inertial accelerated primal-dual methods for solving linearly constrained convex optimization problems. Assuming merely convexity, we show that the proposed methods enjoy \(\mathcal {O}(1/k^{2})\) convergence rates, even if the subproblem is solved inexactly. The numerical results demonstrate the effectiveness of our methods and their superior performance compared with some existing methods.