1 Introduction

Consider the linear equality constrained convex optimization problem:

$$ \min_{x} \quad F(x), \quad s.t. \ Ax = b, $$
(1)

where \(F: \mathbb {R}^{n}\to \mathbb {R}\) is a convex but possibly nonsmooth function, \(A\in \mathbb {R}^{m\times n}\) and \(b\in \mathbb {R}^{m}\). The problem (1) captures a number of important applications arising in various areas, and the following are three concrete examples.

Example 1.1

The basis pursuit problem (see, e.g., [9, 10]):

$$ \min_{x} \quad \|x\|_{1}, \quad s.t. \ Ax = b, $$
(2)

where \(A\in \mathbb {R}^{m\times n}\) with m < n, and ∥⋅∥1 is the ℓ1-norm on \(\mathbb {R}^{n}\) defined by \(\|x\|_{1} ={\sum }^{n}_{i=1} |x_{i}|\). Algorithms for the basis pursuit problem can be found in [31] and [33].

Example 1.2

The linearly constrained ℓ1-ℓ2 minimization problem [17]:

$$ \min_{x} \quad \|x\|_{1}+\frac{\beta}{2}\|x\|^{2}_{2}, \quad s.t. \ Ax = b, $$
(3)

where β > 0 and ∥⋅∥2 is the ℓ2-norm of \(\mathbb {R}^{n}\) defined by \(\|x\|^{2}_{2} ={\sum }^{n}_{i=1} {x_{i}^{2}}\). When β is small enough, a solution of the problem (3) is also a solution of the basis pursuit problem (2). Since the problem (3) contains the regularization term \(\frac {\beta }{2}\|x\|^{2}_{2}\), it is less sensitive to noise than the basis pursuit problem (2).

Example 1.3

The global consensus problem [8]:

$$ \min_{X\in\mathbb{R}^{n\times N}} \quad F(X) = \sum\limits^{N}_{i=1}f_{i}(X_{i}), \quad s.t. \ X_{i} = X_{j},\quad \forall i, j\in\{1,2,\cdots,N\}, $$

where \(f_{i}:\mathbb {R}^{n}\to \mathbb {R}\) is convex, i = 1,2,⋯ ,N. The global consensus problem is a widely investigated model that has important applications in signal processing [23], routing of wireless sensor networks [22] and optimal consensus of agents [28].
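
The consensus constraints can be written in the form Ax = b of the problem (1) by stacking the blocks and chaining the pairwise constraints; for instance (a small illustration for N = 3, not taken from [8]), one may take

$$ A = \begin{pmatrix} Id & -Id & 0\\ 0 & Id & -Id \end{pmatrix}, \qquad x = \begin{pmatrix} X_{1}\\ X_{2}\\ X_{3} \end{pmatrix}, \qquad b = 0, $$

so that Ax = b encodes X1 = X2 = X3.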

Recall that \((x^{*}, \lambda ^{*})\in \mathbb {R}^{n} \times \mathbb {R}^{m}\) is a KKT point of the problem (1) if

$$ \begin{cases} -A^{T}\lambda^{*} \in \partial F(x^{*}),\\ Ax^{*} -b =0, \end{cases} $$
(4)

where ∂F is the classical subdifferential of F defined by

$$\partial F(x) = \{v\in\mathbb{R}^{n} | F(y)\geq F(x)+\langle v,y-x \rangle,\quad \forall y\in\mathbb{R}^{n}\}.$$
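
For instance, in the one-dimensional case F(x) = |x|, the definition above gives

$$ \partial |x| = \begin{cases} \{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0, \end{cases} $$

and ∂F(x) reduces to the singleton {∇F(x)} whenever F is differentiable at x.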

Let Ω be the KKT point set of the problem (1). Then, for any \((x^{*},\lambda^{*})\in {\Omega}\), from (4) we have

$$ \mathcal{L}(x^{*},\lambda)\leq \mathcal{L}(x^{*},\lambda^{*})\leq \mathcal{L}(x,\lambda^{*}), \qquad \forall (x,\lambda)\in \mathbb{R}^{n}\times\mathbb{R}^{m}, $$

where \({\mathscr{L}}:\mathbb {R}^{n}\times \mathbb {R}^{m}\to \mathbb {R}\) is the Lagrangian function associated with the problem (1) defined by

$$ \mathcal{L}(x,\lambda) = F(x)+\langle \lambda,Ax-b\rangle.$$

A classical method for solving the problem (1) is the augmented Lagrangian method (ALM) [6]:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}\in {\arg\min}_{x} \mathcal{L}(x,\lambda_{k})+\frac{\sigma}{2}\|Ax-b\|^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b). \end{cases} \end{array} $$
(5)

In general, since \({\mathscr{L}}(x,\lambda _{k})+\frac {\sigma }{2}\|Ax-b\|^{2}\) is not strictly convex, the subproblem may have more than one solution and may be difficult to solve. To overcome this disadvantage, the proximal ALM [11] has been proposed:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}={\arg\min}_{x} F(x)+\langle A^{T}\lambda_{k},x\rangle +\frac{\sigma}{2}\|Ax-b\|^{2}+\frac{1}{2}\|x-x_{k}\|_{P}^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b), \end{cases} \end{array} $$
(6)

where \(\|x\|^{2}_{P} = x^{T}Px\), P is a positive semidefinite matrix, and P + σATA is positive definite.
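
For intuition, the following is a minimal sketch of the iteration (6) in the special case F(x) = ½∥x∥², where the x-subproblem reduces to a linear system; the data, sizes and parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A minimal sketch of the proximal ALM (6) for the special case F(x) = 0.5*||x||^2,
# where the x-subproblem reduces to the linear system
#   (Id + sigma*A^T A + P) x_{k+1} = P x_k - A^T lambda_k + sigma*A^T b.
rng = np.random.default_rng(0)            # illustrative data
m, n, sigma = 20, 50, 1.0
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
P = 0.1 * np.eye(n)                       # positive semidefinite proximal weight
H = np.eye(n) + sigma * A.T @ A + P       # here P + sigma*A^T A is positive definite
x, lam = np.zeros(n), np.zeros(m)
for k in range(200):
    x = np.linalg.solve(H, P @ x - A.T @ lam + sigma * A.T @ b)
    lam = lam + sigma * (A @ x - b)
print(np.linalg.norm(A @ x - b))          # feasibility residual
```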

In some practical situations, the objective function F has the composite structure: F(x) = f(x) + g(x), where f is a convex but possibly nonsmooth function and g is a convex smooth function. Then, the problem (1) becomes the linearly constrained composite convex optimization problem:

$$ \min_{x} \quad f(x)+g(x), \quad s.t. \ Ax = b. $$
(7)

Applying the method (6) to the problem (7) and linearizing the smooth function g leads to the linearized ALM [32]:

$$ \begin{array}{@{}rcl@{}} \begin{cases} x_{k+1}\in{\arg\min}_{x} f(x)+\langle \nabla g(x_{k})+A^{T}\lambda_{k},x\rangle +\frac{\sigma}{2}\|Ax-b\|^{2}+\frac{1}{2}\|x-x_{k}\|_{P}^{2},\\ \lambda_{k+1} = \lambda_{k}+\sigma(Ax_{k+1}-b). \end{cases} \end{array} $$
(8)
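
A common way to make the x-subproblem of (8) explicit is to choose P = τId − σATA with τ ≥ σ∥A∥², so that the quadratic terms in x collapse and the update becomes a proximal step on f. The sketch below applies this choice to f(x) = ∥x∥1 and a quadratic g; all concrete data and parameter values are illustrative assumptions, and this particular choice of P is a standard device rather than a prescription from the paper.

```python
import numpy as np

def soft(z, t):
    # proximal mapping of t*||.||_1 (componentwise soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(1)            # illustrative data
m, n = 30, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = rng.standard_normal(5)
b = A @ x_true
beta, sigma = 0.1, 1.0                    # g(x) = (beta/2)*||x||^2, so grad g(x) = beta*x
tau = sigma * np.linalg.norm(A, 2) ** 2 + 1.0   # tau >= sigma*||A||^2 keeps P = tau*Id - sigma*A^T A PSD
x, lam = np.zeros(n), np.zeros(m)
for k in range(500):
    z = x - (beta * x + A.T @ lam + sigma * A.T @ (A @ x - b)) / tau
    x = soft(z, 1.0 / tau)                # x-update of (8) with f = ||.||_1 and P = tau*Id - sigma*A^T A
    lam = lam + sigma * (A @ x - b)
print(np.linalg.norm(A @ x - b))          # feasibility residual
```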

1.1 Related works

Under the assumption that F is smooth, He and Yuan [15] showed that the iteration complexity of the method (5) is \(\mathcal {O}(1/k)\) in terms of the objective residual of the associated Lagrangian \({\mathscr{L}}(x,\lambda )\). When F is nonsmooth, Gu et al. [13] proved that the method (5) enjoys a worst-case \(\mathcal {O}(1/k)\) convergence rate in the ergodic sense. A worst-case \(\mathcal {O}(1/k)\) convergence rate in the non-ergodic sense of the method (6) was shown in [21]. When g has a Lipschitz continuous gradient with constant Lg and P ≽ LgId, Xu [32] proved that the method (8) achieves an \(\mathcal {O}(1/k)\) convergence rate in the ergodic sense. Tran-Dinh and Zhu [30] proposed a modified version of the method (8) and proved that the objective residual and feasibility violation sequences generated by the method both enjoy an \(\mathcal {O}(1/k)\) non-ergodic convergence rate. Liu et al. [24] investigated the non-ergodic convergence rate of an inexact augmented Lagrangian method for the problem (7).

Generally, plain first-order methods converge slowly, and much effort has been devoted in the past decades to accelerating them. Nesterov [25] first proposed an accelerated version of the classical gradient method for smooth convex optimization problems and proved that the accelerated inertial gradient method enjoys an \(\mathcal {O}(1/k^{2})\) convergence rate. Beck and Teboulle [5] proposed an iterative shrinkage-thresholding algorithm for solving the linear inverse problem, which also achieves an \(\mathcal {O}(1/k^{2})\) convergence rate. The acceleration idea of [25] was further applied in Nesterov [26] to design accelerated methods for unconstrained convex composite optimization problems. Su et al. [29] first studied accelerated methods from a continuous-time perspective. Since then, new accelerated inertial methods based on second-order dynamical systems have been proposed for unconstrained optimization problems (see, e.g., [1, 3, 4]). For more results on inertial methods for unconstrained optimization problems, we refer the reader to [2, 12, 27].

Meanwhile, inertial accelerated methods for linearly constrained optimization problems have also been well developed. When F is differentiable, He and Yuan [15] proposed an accelerated inertial ALM for the problem (1) and proved that its convergence rate is \(\mathcal {O}(1/k^{2})\) by using an extrapolation technique similar to [5]. When F is differentiable with a Lipschitz continuous gradient, Boţ et al. [7] proposed, by time discretization of a dynamical system, an inertial ALM with \(\mathcal {O}(1/k^{2})\) convergence rate and proved the convergence of the sequence of iterates. Kang et al. [18] presented an inexact version of the accelerated ALM, with inexact calculations of the subproblems, and showed that the convergence rate remains \(\mathcal {O}(1/k^{2})\) under the assumption that F is strongly convex. Kang et al. [17] further presented an accelerated Bregman method for the linearly constrained ℓ1-ℓ2 minimization problem, and a convergence rate of \(\mathcal {O}(1/k^{2})\) was proved when the accelerated Bregman method is applied to solve the problem (1). To linearize the augmented term of the Bregman method, Huang et al. [16] proposed an accelerated linearized Bregman algorithm with \(\mathcal {O}(1/k^{2})\) convergence rate. For the problem (7), Tran-Dinh and Zhu [30] proposed an inertial primal-dual method which enjoys an \(o(1/k\sqrt {\log k})\) convergence rate. Xu [32] proposed an accelerated version of the linearized ALM (8), named the accelerated linearized augmented Lagrangian method, which is formulated as follows:

$$ \begin{array}{@{}rcl@{}} \begin{cases} \hat{x}_{k}=(1-\alpha_{k})\bar{x}_{k}+\alpha_{k} x_{k},\\ x_{k+1} \in {\arg\min}_{x} f(x)+ \frac{\beta_{k}}{2}\|Ax - b\|^{2}+\frac{1}{2}\|x - x_{k}\|_{P_{k}}^{2}+\langle \nabla g(\hat{x}_{k})+A^{T}{\lambda}_{k}, x\rangle,\\ \bar{x}_{k+1}=(1-\alpha_{k})\bar{x}_{k}+\alpha_{k} x_{k+1},\\ \lambda_{k+1} =\lambda_{k}+ \gamma_{k}(Ax_{k+1}-b). \end{cases} \end{array} $$
(9)

It was shown in Xu [32] that the algorithm (9) enjoys an \(\mathcal {O}(1/k^{2})\) convergence rate under specific parameter settings. It is worth mentioning that, to achieve the \(\mathcal {O}(1/k^{2})\) rate, the augmented term in the algorithm (9) cannot be linearized, which may make the subproblems difficult to solve. Moreover, Xu [32] did not discuss the convergence of the method when the subproblem is solved inexactly.

1.2 Inertial primal-dual methods

We first propose Algorithm 1, an inertial version of the proximal ALM (6), for solving the problem (1). Algorithm 1 is inspired by the second-order primal-dual dynamical systems in [14, 34] and the Nesterov accelerated methods for unconstrained optimization problems [2, 5, 25]. When the objective has the composite structure F(x) = f(x) + g(x), by linearizing the smooth function g and introducing the perturbation sequence {𝜖k}k≥ 1 in Step 2 of Algorithm 1, we propose an inexact inertial accelerated primal-dual method (Algorithm 2) for the problem (7). In contrast to Algorithm 1, the subproblem is solved inexactly by computing an approximate solution instead of an exact one.

[Algorithm 1 Inertial accelerated primal-dual method for the problem (1)]
[Algorithm 2 Inexact inertial accelerated primal-dual method for the problem (7)]
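
For the reader's convenience, the updates of Algorithm 1 can be read off from the optimality condition derived in Lemma 1 and the relations (14)–(17) below: with α ≥ 3, s > 0, positive semidefinite weights Mk, and initial points x1, λ1 with \(\bar{x}_{1}=x_{1}\), \(\bar{\lambda}_{1}=\lambda_{1}\), one step of the method takes the form (a reconstruction from the proofs, stated only for orientation; the authoritative statements are the original algorithm environments)

$$ \begin{cases} \hat{\lambda}_{k} = \frac{k+\alpha-2}{\alpha-1}\bar{\lambda}_{k}-\frac{k-1}{\alpha-1}\lambda_{k},\qquad \eta_{k} = \frac{(k-1)Ax_{k}+(\alpha-1)b}{k+\alpha-2},\\ x_{k+1}\in{\arg\min}_{x}\ F(x)+\langle A^{T}\hat{\lambda}_{k},x\rangle+\frac{sk(k+\alpha-2)}{2(\alpha-1)^{2}}\|Ax-\eta_{k}\|^{2}+\frac{k+\alpha-2}{2sk}\|x-\bar{x}_{k}\|_{M_{k}}^{2},\\ \lambda_{k+1} = \bar{\lambda}_{k}+\frac{sk}{k+\alpha-2}\left(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k})\right),\\ \bar{x}_{k+1} = x_{k+1}+\frac{k-1}{k+\alpha-1}(x_{k+1}-x_{k}),\qquad \bar{\lambda}_{k+1} = \lambda_{k+1}+\frac{k-1}{k+\alpha-1}(\lambda_{k+1}-\lambda_{k}), \end{cases} $$

where the x-update and the λ-update correspond to Step 2 and Step 3 referenced in the proofs. Algorithm 2 appears to modify the x-update by replacing F(x) with \(f(x)+\langle \nabla g(\bar{x}_{k}),x\rangle\) and adding the perturbation term \(-\langle \epsilon_{k},x\rangle\) (cf. the optimality condition in the proof of Lemma 4).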

1.3 Outline

The rest of the paper is organized as follows. In Section 2, we investigate the convergence rates of the proposed methods. In Section 3, we perform numerical experiments. Finally, we give a concluding remark in Section 4.

2 Convergence analysis

In this section, we analyze the convergence rates of Algorithm 1 and Algorithm 2. Assuming merely convexity, we show that both of them enjoy \(\mathcal {O}(1/k^{2})\) convergence rates in terms of the objective residual and the feasibility violation.

To do so, we first recall some standard notations and results which will be used in the paper. In what follows, we always use ∥⋅∥ to denote the 2-norm.

Let \(\mathbb {S}_{+}(n)\) denote the set of all symmetric positive semidefinite matrices in \(\mathbb {R}^{n\times n}\) and Id be the identity matrix. For \(M\in \mathbb {S}_{+}(n)\), we introduce the semi-norm on \(\mathbb {R}^{n}\): \(\|x\|_{M} =\sqrt {x^{T}Mx}\) for any \(x\in \mathbb {R}^{n}\). This induces on \(\mathbb {S}_{+}(n)\) the following partial ordering: for any \(M_{1}, M_{2} \in \mathbb {S}_{+}(n)\),

$$M_{1} \succcurlyeq M_{2} \Longleftrightarrow \|x\|_{M_{1}}\geq \|x\|_{M_{2}},\quad \forall x\in \mathbb{R}^{n}.$$

For any \(x,y\in \mathbb {R}^{n}\), the following equality holds:

$$ \begin{array}{@{}rcl@{}} \frac{1}{2}\|x\|_{M}^{2}-\frac{1}{2}\|y\|_{M}^{2} = \langle x, M(x-y)\rangle-\frac{1}{2}\|x-y\|_{M}^{2}, \quad \forall M\in\mathbb{S}_{+}(n). \end{array} $$
(10)
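
The identity (10) is the standard expansion of the semi-norm; expanding the right-hand side and using the symmetry of M confirms it:

$$ \langle x, M(x-y)\rangle-\frac{1}{2}\|x-y\|_{M}^{2} = x^{T}Mx-x^{T}My-\frac{1}{2}x^{T}Mx+x^{T}My-\frac{1}{2}y^{T}My = \frac{1}{2}\|x\|_{M}^{2}-\frac{1}{2}\|y\|_{M}^{2}. $$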

Now, we start to analyze Algorithm 1.

Lemma 1

Let \(\{(x_{k},\lambda _{k},\bar {x}_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1. Then,

$$ \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k}) \in -s\left(\partial F(x_{k+1})+A^{T}\left(\lambda_{k+1}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k})\right)\right). $$
(11)

Proof

From Step 2 of Algorithm 1, we have

$$ 0\in \partial F(x_{k+1})+\frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})+\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A^{T}(Ax_{k+1}-\eta_{k})+A^{T}\hat{\lambda}_{k}.$$

This yields

$$ \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})\in -s\left(\partial F(x_{k+1}) + A^{T}\left(\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}}(Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\right)\right). $$
(12)

It follows from Step 2 and Step 3 that

$$ \begin{array}{@{}rcl@{}} && \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}}(Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\\ && = \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A x_{k+1}-\frac{sk(k-1)}{(\alpha-1)^{2}}Ax_{k}-\frac{sk}{\alpha-1}b+\hat{\lambda}_{k} \\ && = \frac{sk}{\alpha-1}(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k}))+\hat{\lambda}_{k}\\ && = \frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k})+ \frac{k+\alpha-2}{\alpha-1}\bar{\lambda}_{k}-\frac{k-1}{\alpha-1}\lambda_{k} \\ && = \lambda_{k+1} +\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). \end{array} $$

This together with (12) yields (11). □

Lemma 2

Suppose that F is a convex function, Ω ≠ ∅ and \(M_{k-1}\succcurlyeq M_{k}\) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k}, \bar {x}_{k}, \hat {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1 and \((x^{*},\lambda^{*})\in {\Omega}\). Define

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k} = \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))+ \frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}+\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2} \end{array} $$
(13)

with

$$ \hat{x}_{k} = \frac{k+\alpha-2}{\alpha-1}\bar{x}_{k}-\frac{k-1}{\alpha-1}x_{k}. $$
(14)

Then, for any k ≥ 1, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k+1} &\leq& \mathcal{E}_{k}-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}). \end{array} $$

Proof

By computation,

$$ \begin{array}{@{}rcl@{}} \hat{x}_{k+1} &=& \frac{k+\alpha-1}{\alpha-1}(x_{k+1}+\frac{k-1}{k+\alpha-1}(x_{k+1}-x_{k}))-\frac{k}{\alpha-1}x_{k+1} \\ &=& \frac{k+\alpha-2}{\alpha-1}x_{k+1}-\frac{k-1}{\alpha-1}x_{k}\\ &=& \frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k})+ \frac{k+\alpha-2}{\alpha-1}\bar{x}_{k}-\frac{k-1}{\alpha-1}x_{k}\\ &=& \hat{x}_{k}+\frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k}) \end{array} $$
(15)

and

$$ \hat{x}_{k+1}-x^{*} = x_{k+1}-x^{*}+\frac{k-1}{\alpha-1}(x_{k+1}-x_{k}) . $$
(16)

Similarly, we have

$$ \hat{\lambda}_{k+1} = \hat{\lambda}_{k}+\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k}) $$
(17)

and

$$ \hat{\lambda}_{k+1}-\lambda^{*} = \lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). $$
(18)

By the definition of \({\mathscr{L}}(x,\lambda )\), we get \(\partial _{x} {\mathscr{L}}(x,\lambda ) = \partial F(x)+A^{T}\lambda \). Combining this and equality (18), we can rewrite (11) as

$$ \begin{array}{@{}rcl@{}} \frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})&\in& - s(\partial F(x_{k+1})+A^{T}\lambda^{*}+A^{T}(\lambda_{k+1}\\&&-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k})))\\ & =& - s\partial_{x} \mathcal{L}(x_{k+1},\lambda^{*})-sA^{T}(\hat{\lambda}_{k+1}-\lambda^{*}), \end{array} $$

which implies

$$ \xi_{k} :=-\frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})- A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\in \partial_{x} \mathcal{L}(x_{k+1},\lambda^{*}). $$
(19)

Since \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq 0\), it follows from (10) and (15) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2} \\ &=&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}-M_{k}}^{2} \\ &\leq& \langle \hat{x}_{k+1}-x^{*}, M_{k}(\hat{x}_{k+1}-\hat{x}_{k})\rangle-\frac{1}{2}\|\hat{x}_{k+1}-\hat{x}_{k}\|_{M_{k}}^{2} \\ &=& \frac{k+\alpha-2}{\alpha-1}\langle \hat{x}_{k+1}-x^{*}, M_{k}(x_{k+1}-\bar{x}_{k})\rangle\\&&-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2} \\ &=& -\frac{sk}{\alpha-1}(\langle \hat{x}_{k+1}-x^{*}, \xi_{k}\rangle +\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle)\\ &&-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(20)

Since \({\mathscr{L}}(x,\lambda ^{*})\) is a convex function with respect to x, from (16) and (19) we get

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1} - x^{*}, \xi_{k} \rangle &=&\langle x_{k+1}-x^{*}, \xi_{k}\rangle +\frac{k-1}{\alpha-1}\langle x_{k+1}-x_{k}, \xi_{k}\rangle\\ &\geq& \mathcal{L}(x_{k+1},\lambda^{*}) - \mathcal{L}(x^{*},\lambda^{*})+\frac{k - 1}{\alpha - 1}(\mathcal{L}(x_{k+1},\lambda^{*}) - \mathcal{L}(x_{k},\lambda^{*})). \end{array} $$
(21)

Combining (20) and (21) together, we have

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})) - \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x_{k},\lambda^{*})) \\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(22)

Since Ax = b, it follows from Step 3 of Algorithm 1 and (16) that

$$ \lambda_{k+1}-\bar{\lambda}_{k}=\frac{sk}{k + \alpha - 2}(Ax_{k+1}-Ax^{*}+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k})) = \frac{sk}{k + \alpha-2}A(\hat{x}_{k+1}-x^{*}). $$

This together with (10) and (17) yields

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{\lambda}_{k+1}-\lambda^{*}\|^{2}-\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2} = \langle \hat{\lambda}_{k+1}-\lambda^{*}, \hat{\lambda}_{k+1}-\hat{\lambda}_{k}\rangle-\frac{1}{2}\|\hat{\lambda}_{k+1}-\hat{\lambda}_{k}\|^{2} \\ &&= \frac{k+\alpha-2}{\alpha-1}\langle \hat{\lambda}_{k+1}-\lambda^{*},\lambda_{k+1}-\bar{\lambda}_{k} \rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}\\ &&= \frac{sk}{\alpha-1}\langle \hat{\lambda}_{k+1}-\lambda^{*},A(\hat{x}_{k+1}-x^{*})\rangle-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}. \end{array} $$
(23)

It follows from (22) and (23) that

$$ \begin{array}{@{}rcl@{}} &&\mathcal{E}_{k+1}-\mathcal{E}_{k} \\ &=&\frac{s(k^{2}+k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))- \frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))\\ &&+\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}+\frac{1}{2}\|\hat{\lambda}_{k+1}-\lambda^{*}\|^{2}-\frac{1}{2}\|\hat{\lambda}_{k}-\lambda^{*}\|^{2}\\ &\leq& \frac{(3-\alpha)sk}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))-\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\\ &\leq& -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2}), \end{array} $$

where the last inequality follows from α ≥ 3 and \((x^{*},\lambda^{*})\in {\Omega}\). This yields the desired result. □

To obtain the fast convergence rates, we need the following lemma.

Lemma 3

([20, Lemma 2], [19, Lemma 3.18]) Let \(\{a_{k}\}_{k=1}^{+\infty }\) be a sequence of vectors in \(\mathbb {R}^{n}\) such that

$$\|(\tau+(\tau-1)K)a_{K+1}+\sum\limits^{K}_{k=1}a_{k}\|\leq C,\qquad \forall K\geq 1, $$

where τ > 1 and C ≥ 0. Then, \(\|{\sum }^{K}_{k=1}a_{k}\|\leq C\) for all K ≥ 1.

Now, we discuss the \(\mathcal {O}(1/k^{2})\) convergence rate of Algorithm 1.

Theorem 1

Suppose that F is a convex function, Ω ≠ ∅ and \(M_{k-1}\succcurlyeq M_{k}\) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k},\bar {x}_{k},\bar {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 1 and \((x^{*},\lambda^{*})\in {\Omega}\). The following conclusions hold:

  (i) \({\sum }_{k=1}^{+\infty }k^{2}(\|x_{k+1}-\bar {x}_{k}\|_{M_{k}}^{2}+\|\lambda _{k+1}-\bar {\lambda }_{k}\|^{2}) < +\infty \).

  (ii) For all k > 1,

    $$ \begin{array}{@{}rcl@{}} &&\|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s(k-1)(k+\alpha-3)},\\ &&|F(x_{k})-F(x^{*})|\leq \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)}+ \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)}, \end{array} $$

    where \(\mathcal {E}_{1}=\frac {1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac {1}{2}\|\lambda _{1}-\lambda ^{*}\|^{2}\).

Proof

From Lemma 2, we have

$$ \mathcal{E}_{k+1}- \mathcal{E}_{k} \leq -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\leq 0, \quad \forall k\geq 1. $$
(24)

By the definition of \(\mathcal {E}_{k}\) and (24), \(\{\mathcal {E}_{k}\}_{k\geq 1}\) is a nonincreasing and nonnegative sequence. As a consequence, \(\mathcal {E}_{k}\) converges to a finite limit. It follows from (24) that

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{k=1}^{+\infty} \frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}(\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}+\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2})\\ && \leq \lim_{K\to+\infty} \sum\limits_{k=1}^{K} (\mathcal{E}_{k}-\mathcal{E}_{k+1}) \\ &&= \mathcal{E}_{1}-\lim_{K\to+\infty}\mathcal{E}_{K+1}\\ &&< +\infty, \end{array} $$
(25)

which is (i).

Combining (13) and (24), we have

$$ \begin{array}{@{}rcl@{}} \|\hat{\lambda}_{k}-\lambda^{*}\| \leq \sqrt{2\mathcal{E}_{k}}\leq \sqrt{2\mathcal{E}_{1}}, \qquad \forall k\geq 1. \end{array} $$

This yields

$$ \begin{array}{@{}rcl@{}} \left\|\sum\limits^{K}_{k=1}(\hat{\lambda}_{k+1}-\hat{\lambda}_{k})\right\| &=& \left\|\hat{\lambda}_{K+1}-\hat{\lambda}_{1}\right\| \\ &\leq& \left\|\hat{\lambda}_{K+1}-{\lambda}^{*}\right\|+\left\|\hat{\lambda}_{1}-{\lambda}^{*}\right\|\\ &\leq& 2\sqrt{2\mathcal{E}_{1}} \end{array} $$
(26)

for all K ≥ 1. It follows from (17) and Step 3 of Algorithm 1 that

$$ \begin{array}{@{}rcl@{}} &&\left\|\sum\limits^{K}_{k=1}(\hat{\lambda}_{k+1}-\hat{\lambda}_{k})\right\| = \left\|\sum\limits^{K}_{k=1}\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k})\right\|\\ &=& \frac{s}{\alpha-1}\left\|\sum\limits^{K}_{k=1}k(Ax_{k+1}-b+\frac{k-1}{\alpha-1}A(x_{k+1}-x_{k}))\right\|\\ & =&\frac{s}{(\alpha-1)^{2}}\left\|\sum\limits^{K}_{k=1}\left( k(k+\alpha-2)(Ax_{k+1}-b) -k(k-1)(Ax_{k}-b) \right)\right\|\\ & =&\frac{s}{(\alpha-1)^{2}}\left\|K(K+\alpha-2)(Ax_{K+1}-b)+ \sum\limits^{K}_{k=1}(\alpha-3)(k-1)(Ax_{k}-b)\right\|. \end{array} $$

This together with (26) implies

$$ \left\|K(K+\alpha-2)(Ax_{K+1}-b)+\sum\limits^{K}_{k=1}\left((\alpha-3)(k-1)(Ax_{k}-b)\right)\right\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}. $$
(27)

When α = 3, it follows from (27) that

$$ \begin{array}{@{}rcl@{}} \|K(K+\alpha-2)(Ax_{K+1}-b)\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}. \end{array} $$

When α > 3, applying Lemma 3 to (27) with ak = (α − 3)(k − 1)(Axk−b), \(\tau = \frac {\alpha -2}{\alpha -3}\) and \(C = \frac {2(\alpha -1)^{2}\sqrt {2\mathcal {E}_{1}}}{s}\), we obtain

$$\left\|\sum\limits^{K}_{k=1}\left( (\alpha-3)(k-1)(Ax_{k}-b)\right)\right\| \leq \frac{2(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s},$$

which together with (27) yields

$$ \|K(K+\alpha-2)(Ax_{K+1}-b)\|\leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s}.$$

From the above discussion, for α ≥ 3 we have

$$ \begin{array}{@{}rcl@{}} \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}}{s(k-1)(k+\alpha-3)}, \quad \forall k>1. \end{array} $$
(28)

It follows from the definition of \(\mathcal {E}_{k}\) and (24) that

$$ \begin{array}{@{}rcl@{}} \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) \leq \frac{(\alpha-1)^{2}\mathcal{E}_{k}}{s(k^{2}-k)}\leq \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)} \end{array} $$

for all k > 1. This together with (28) implies that

$$ \begin{array}{@{}rcl@{}} |F(x_{k})-F(x^{*})| &=& |\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})-\langle \lambda^{*}, Ax_{k}-b\rangle|\\ &\leq & \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) +\|\lambda^{*}\|\|Ax_{k}-b\|\\ &\leq & \frac{(\alpha-1)^{2}\mathcal{E}_{1}}{s(k^{2}-k)}+ \frac{4(\alpha-1)^{2}\sqrt{2\mathcal{E}_{1}}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)} \end{array} $$

for all k > 1. The proof is complete. □

To investigate the convergence of Algorithm 2, we need the following assumption.

Assumption (H)

Ω ≠ ∅, f is a convex function, and g is a convex smooth function whose gradient is Lipschitz continuous with constant Lg > 0, i.e.,

$$ \|\nabla g(x)-\nabla g(y)\|\leq L_{g}\|x-y\|,\qquad \forall x,y\in\mathbb{R}^{n}, $$

equivalently,

$$ g(x)\leq g(y) + \langle \nabla g(y),x-y\rangle+\frac{L_{g}}{2}\|x-y\|^{2},\qquad \forall x,y\in\mathbb{R}^{n}. $$
(29)

Lemma 4

Let \(\{(x_{k},\lambda _{k},\bar {x}_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 2. Then,

$$ \begin{array}{@{}rcl@{}} &&\frac{k+\alpha-2}{k} M_{k}({x_{k+1}-\bar{x}_{k}})\\ &&\in- s\left( \partial f(x_{k+1})+\nabla g(\bar{x}_{k})+ A^{T}(\lambda_{k+1}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}))-\epsilon_{k}\right). \end{array} $$
(30)

Proof

From Step 2 of Algorithm 2, we have

$$ \begin{array}{@{}rcl@{}} &&0\in \partial f(x_{k+1})+\nabla g(\bar{x}_{k})+ \frac{k+\alpha-2}{sk}M_{k}(x_{k+1}-\bar{x}_{k})\\&&+\frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} A^{T}(Ax_{k+1}-\eta_{k})+A^{T}\hat{\lambda}_{k}-\epsilon_{k} , \end{array} $$

which yields

$$ \begin{array}{@{}rcl@{}} &&\frac{k+\alpha-2}{k}M_{k}(x_{k+1}-\bar{x}_{k})\\ &&\in-s\left( \partial f(x_{k+1})+\nabla g(\bar{x}_{k}) +A^{T}\left( \frac{sk(k+\alpha-2)}{(\alpha-1)^{2}} (Ax_{k+1}-\eta_{k})+\hat{\lambda}_{k}\right)-\epsilon_{k}\right). \end{array} $$

The rest of the proof is similar to that of Lemma 1, and so we omit it. □

Lemma 5

Assume that Assumption (H) holds and \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq sL_{g} Id \) for all k ≥ 1. Let \(\{(x_{k},\lambda _{k}, \hat {\lambda }_{k})\}_{k\geq 1}\) be the sequence generated by Algorithm 2 and \((x^{*},\lambda^{*})\in {\Omega}\). Define

$$ \mathcal{E}^{\epsilon}_{k} = \mathcal{E}_{k}-{\sum}^{k}_{j=1} \frac{s(j-1)}{\alpha-1} \langle \hat{x}_{j}-x^{*}, \epsilon_{j-1}\rangle, $$
(31)

where \(\mathcal {E}_{k}\) is defined in (13) and \(\hat {x}_{k}\) is defined in (14). Then, for any k ≥ 1,

$$ \begin{array}{@{}rcl@{}} \mathcal{E}^{\epsilon}_{k+1} &\leq& \mathcal{E}^{\epsilon}_{k}. \end{array} $$

Proof

By the same arguments as in the proof of Lemma 2, we get

$$ \begin{array}{@{}rcl@{}} && \hat{x}_{k+1}-\hat{x}_{k} = \frac{k+\alpha-2}{\alpha-1}(x_{k+1}-\bar{x}_{k}), \end{array} $$
(32)
$$ \begin{array}{@{}rcl@{}} && \hat{x}_{k+1}-x^{*} = x_{k+1}-x^{*}+\frac{k-1}{\alpha-1}(x_{k+1}-x_{k}), \end{array} $$
(33)
$$ \begin{array}{@{}rcl@{}} && \hat{\lambda}_{k+1}-\hat{\lambda}_{k}=\frac{k+\alpha-2}{\alpha-1}(\lambda_{k+1}-\bar{\lambda}_{k}), \end{array} $$
(34)
$$ \begin{array}{@{}rcl@{}} && \hat{\lambda}_{k+1}-\lambda^{*} = \lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}). \end{array} $$
(35)

For notation simplicity, we denote

$$ \mathcal{L}^{f}(x) = f(x)+\langle \lambda^{*},Ax-b\rangle. $$
(36)

Then, \({\mathscr{L}}^{f}\) is a convex function, \(\partial {\mathscr{L}}^{f}(x) = \partial f(x)+A^{T}\lambda ^{*}\), and

$$ \mathcal{L}(x,\lambda^{*}) = \mathcal{L}^{f}(x)+g(x). $$
(37)

It follows from (30) and (35) that

$$ \begin{array}{@{}rcl@{}} \frac{k+\alpha-2}{k} M_{k}(x_{k+1}-\bar{x}_{k})&\in& - s(\partial f(x_{k+1})+A^{T}\lambda^{*})-s\nabla g(\bar{x}_{k})\\ &&-s A^{T}(\lambda_{k+1}-\lambda^{*}+\frac{k-1}{\alpha-1}(\lambda_{k+1}-\lambda_{k}))+s\epsilon_{k} \\ & =& - s\partial \mathcal{L}^{f}(x_{k+1})-s\nabla g(\bar{x}_{k})-sA^{T}(\hat{\lambda}_{k+1}-\lambda^{*})+s\epsilon_{k}, \end{array} $$

which yields

$$ \xi_{k} :=-\frac{k+\alpha-2}{ks}M_{k}(x_{k+1}-\bar{x}_{k})-\nabla g(\bar{x}_{k})- A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})+\epsilon_{k} \in \partial\mathcal{L}^{f}(x_{k+1}). $$
(38)

Since \(M_{k-1}\succcurlyeq M_{k}\), it follows from (10), (32) and (38) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\&\leq& \langle \hat{x}_{k+1}-x^{*}, M_{k}(\hat{x}_{k+1}-\hat{x}_{k})\rangle-\frac{1}{2}\|\hat{x}_{k+1}-\hat{x}_{k}\|^{2}_{M_{k}} \\ &=& -\frac{sk}{\alpha-1}(\langle \hat{x}_{k+1}-x^{*},\xi_{k}\rangle+\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle\\ &&+\langle \hat{x}_{k+1}-x^{*},\nabla g(\bar{x}_{k})\rangle-\langle \hat{x}_{k+1}-x^{*},\epsilon_{k}\rangle) -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{M_{k}}^{2}. \end{array} $$
(39)

From (33) and (38), we have

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1}-x^{*}, \xi_{k} \rangle& =&\langle x_{k+1}-x^{*}, \xi_{k}\rangle +\frac{k-1}{\alpha-1}\langle x_{k+1}-x_{k}, \xi_{k}\rangle\\ &\geq& \mathcal{L}^{f}(x_{k+1})-\mathcal{L}^{f}(x^{*})+\frac{k-1}{\alpha-1}(\mathcal{L}^{f}(x_{k+1})-\mathcal{L}^{f}(x_{k})), \end{array} $$
(40)

where the inequality follows from the convexity of \({\mathscr{L}}^{f}\). Since g has a Lipschitz continuous gradient, from (29) we get

$$ g(x_{k+1})\leq g(\bar{x}_{k})+\langle \nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle+\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2}. $$
(41)

By the convexity of g, we have

$$ \begin{array}{@{}rcl@{}} \langle\nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle+\langle\nabla g(\bar{x}_{k}),x^{*}-\bar{x}_{k}\rangle\\ &\leq&\langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle + g(x^{*})-g(\bar{x}_{k}) \end{array} $$
(42)

and

$$ \begin{array}{@{}rcl@{}} \langle\nabla g(\bar{x}_{k}),x_{k+1}-\bar{x}_{k}\rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle+\langle\nabla g(\bar{x}_{k}),x_{k}-\bar{x}_{k}\rangle\\ &\leq&\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle + g(x_{k})-g(\bar{x}_{k}). \end{array} $$
(43)

It follows from (41)–(43) that

$$\langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle\geq g(x_{k+1})-g(x^{*})-\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2},$$

and

$$\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle\geq g(x_{k+1})-g(x_{k})-\frac{L_{g}}{2}\|x_{k+1}-\bar{x}_{k}\|^{2}.$$

This together with (33) yields

$$ \begin{array}{@{}rcl@{}} \langle \hat{x}_{k+1}-x^{*},\nabla g(\bar{x}_{k}) \rangle &=& \langle\nabla g(\bar{x}_{k}),x_{k+1}-x^{*}\rangle+\frac{k-1}{\alpha-1}\langle\nabla g(\bar{x}_{k}),x_{k+1}-x_{k}\rangle\\ & \geq& g(x_{k+1})-g(x^{*})+\frac{k-1}{\alpha-1}(g(x_{k+1})-g(x_{k}))\\ &&-\frac{(k+\alpha-2)L_{g}}{2(\alpha-1)}\|x_{k+1}-\bar{x}_{k}\|^{2}. \end{array} $$
(44)

It follows from (39), (40) and (44) that

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{2}\|\hat{x}_{k+1}-x^{*}\|_{M_{k}}^{2}-\frac{1}{2}\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}^{f}(x_{k+1})+g(x_{k+1})-(\mathcal{L}^{f}(x^{*})+g(x^{*}))) \\ &&- \frac{sk(k-1)}{(\alpha-1)^{2}}(\mathcal{L}^{f}(x_{k+1})+g(x_{k+1})-(\mathcal{L}^{f}(x_{k})+g(x_{k})))\\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle+ \frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},\epsilon_{k} \rangle\\ &&-\frac{k+\alpha-2}{2(\alpha-1)^{2}}\|x_{k+1}-\bar{x}_{k}\|_{(k+\alpha-2)M_{k}-sL_{g}kId}^{2} \\ &\leq& -\frac{sk}{\alpha-1}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))- \frac{sk(k-1)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x_{k},\lambda^{*}))\\ &&-\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},A^{T}(\hat{\lambda}_{k+1}-\lambda^{*})\rangle+\frac{sk}{\alpha-1}\langle \hat{x}_{k+1}-x^{*},\epsilon_{k} \rangle, \end{array} $$
(45)

where the second inequality follows from the assumption \(M_{k}\succcurlyeq sL_{g} Id\succcurlyeq \frac {sL_{g}k}{k+\alpha -2}Id\).

It follows from (23), (31) and (45) that

$$ \begin{array}{@{}rcl@{}} \mathcal{E}^{\epsilon}_{k+1}-\mathcal{E}^{\epsilon}_{k}&=& \mathcal{E}_{k+1}-\mathcal{E}_{k}- \frac{sk}{\alpha-1} \langle \hat{x}_{k+1}-x^{*}, \epsilon_{k}\rangle \\ & \leq& \frac{(3-\alpha)sk}{(\alpha-1)^{2}}(\mathcal{L}(x_{k+1},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*})) -\frac{(k+\alpha-2)^{2}}{2(\alpha-1)^{2}}\|\lambda_{k+1}-\bar{\lambda}_{k}\|^{2} \\ & \leq& 0, \end{array} $$
(46)

where the last inequality follows from α ≥ 3 and \((x^{*},\lambda^{*})\in {\Omega}\). □

To analyze the convergence of Algorithm 2, we need the following discrete version of the Gronwall-Bellman lemma.

Lemma 6

([2, Lemma 5.14]) Let {ak}k≥ 1 and {bk}k≥ 1 be two nonnegative sequences such that \({\sum }^{+\infty }_{k=1} b_{k}< +\infty \) and

$$ {a_{k}^{2}} \leq c^{2} +\sum\limits^{k}_{j=1}b_{j} a_{j} $$

for all k ≥ 1, where c ≥ 0. Then,

$$a_{k}\leq c+\sum\limits^{+\infty}_{j=1} b_{j}$$

for all k ≥ 1.

Theorem 2

Assume that Assumption (H) holds, \(M_{k-1}\succcurlyeq M_{k}\succcurlyeq sL_{g}Id\) for all k ≥ 1, and

$$\sum\limits^{+\infty}_{k=1}k\|\epsilon_{k}\|< +\infty. $$

Let {(xk,λk)}k≥ 1 be the sequence generated by Algorithm 2 and \((x^{*},\lambda^{*})\in {\Omega}\). Then, for all k > 1,

$$ \begin{array}{@{}rcl@{}} && \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2C}}{s(k-1)(k+\alpha-3)},\\ && |f(x_{k})+g(x_{k})-f(x^{*})-g(x^{*})| \leq \frac{(\alpha-1)^{2}C}{s(k^{2}-k)}+\frac{4(\alpha-1)^{2}\sqrt{2C}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)}, \end{array} $$

where

$$ C := \frac{1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac{1}{2}\|\lambda_{1}-\lambda^{*}\|^{2}+\frac{s}{\alpha-1}\left(\sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right)\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|. $$

Proof

From Lemma 5 we have

$$\mathcal{E}^{\epsilon}_{k+1} \leq \mathcal{E}^{\epsilon}_{k}\leq \mathcal{E}^{\epsilon}_{1},$$

which yields

$$ \mathcal{E}_{k}\leq \mathcal{E}_{1}+\sum\limits^{k}_{j=1}\frac{s(j-1)}{\alpha-1} \langle \hat{x}_{j}-x^{*}, \epsilon_{j-1}\rangle. $$
(47)

This together with (13) and the Cauchy-Schwarz inequality implies

$$\|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\leq 2\mathcal{E}_{1}+\frac{2s}{\alpha-1}\sum\limits^{k}_{j=1}(j-1) \|\hat{x}_{j}-x^{*}\|\cdot\|\epsilon_{j-1}\| . $$

Since \(M_{k-1}\succcurlyeq sL_{g} Id \),

$$ \|\hat{x}_{k}-x^{*}\|^{2}\leq \frac{1}{sL_{g}} \|\hat{x}_{k}-x^{*}\|_{M_{k-1}}^{2}\leq \frac{2\mathcal{E}_{1}}{sL_{g}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{k}_{j=1}(j-1) \|\hat{x}_{j}-x^{*}\|\cdot\|\epsilon_{j-1}\| . $$
(48)

Since \({\sum }^{+\infty }_{j=1} j\|\epsilon _{j}\|< +\infty \), applying Lemma 6 to (48) with \(a_{k}=\|\hat {x}_{k}-x^{*}\|\), \(b_{j}=\frac{2(j-1)\|\epsilon_{j-1}\|}{(\alpha-1)L_{g}}\) and \(c^{2}=\frac{2\mathcal{E}_{1}}{sL_{g}}\), we obtain

$$ \|\hat{x}_{k}-x^{*}\|\leq \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|<+\infty, \quad\forall k\geq 1. $$
(49)

This together with (47) yields

$$ \begin{array}{@{}rcl@{}} \mathcal{E}_{k}&\leq & \mathcal{E}_{1}+\frac{s}{\alpha-1}\sup_{k\geq 1}\|\hat{x}_{k}-x^{*}\|\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\\ &\leq& \mathcal{E}_{1}+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\| \end{array} $$
(50)

for any k ≥ 1. Denote

$$ \begin{array}{@{}rcl@{}} C&:=& \mathcal{E}_{1}+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\\ &=& \frac{1}{2}\|x_{1}-x^{*}\|_{M_{0}}^{2}+\frac{1}{2}\|\lambda_{1}-\lambda^{*}\|^{2}\\&&+\frac{s}{\alpha-1}\left( \sqrt{\frac{2\mathcal{E}_{1}}{sL_{g}}}+\frac{2}{(\alpha-1)L_{g}}\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|\right )\times\sum\limits^{+\infty}_{j=1}j\|\epsilon_{j}\|. \end{array} $$

By the definition of \(\mathcal {E}_{k}\) and (50), we have

$$\frac{s(k^{2}-k)}{(\alpha-1)^{2}}(\mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}))\leq \mathcal{E}_{k}\leq C$$

and

$$ \begin{array}{@{}rcl@{}} \|\hat{\lambda}_{k}-\lambda^{*}\| \leq \sqrt{2\mathcal{E}_{k}}\leq \sqrt{2C}, \qquad \forall k\geq 1. \end{array} $$

By arguments similar to those in the proof of Theorem 1, we obtain

$$ \begin{array}{@{}rcl@{}} \|Ax_{k}-b\| \leq \frac{4(\alpha-1)^{2}\sqrt{2C}}{s(k-1)(k+\alpha-3)} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} &&|f(x_{k})+g(x_{k})-f(x^{*})-g(x^{*})| \\ &&\leq \mathcal{L}(x_{k},\lambda^{*})-\mathcal{L}(x^{*},\lambda^{*}) +\|\lambda^{*}\|\|Ax_{k}-b\|\\ &&\leq \frac{(\alpha-1)^{2}C}{s(k^{2}-k)}+\frac{4(\alpha-1)^{2}\sqrt{2C}\|\lambda^{*}\|}{s(k-1)(k+\alpha-3)} \end{array} $$

for all k > 1. □

Remark 1

It was shown in [32, Theorem 2.9] that the algorithm (9) with adaptive parameters enjoys an \(\mathcal {O}(1/k^{2})\) rate, but Xu [32] did not discuss whether this rate is preserved when the subproblem is solved inexactly. By Theorem 2, the \(\mathcal {O}(1/k^{2})\) convergence rate of Algorithm 2 is preserved even if the subproblem is solved inexactly, provided the errors satisfy ∑k‖𝜖k‖ < +∞. The numerical experiments in Section 3 also show the effectiveness of the inexact algorithm.

3 Numerical experiments

In this section, we present numerical experiments to illustrate the efficiency of the proposed algorithms. All codes are run on a PC (with 2.3 GHz Quad-Core Intel Core i5 and 8 GB memory) under MATLAB Version 9.4.0.813654 (R2018a).

3.1 The quadratic programming problem

In this subsection, we test the algorithms on the nonnegative linearly constrained quadratic programming problem (NLCQP):

$$ \min \ \frac{1}{2}x^{T}Qx+q^{T}x, \quad s.t. \ Ax = b, x\geq 0, $$

where \( q \in \mathbb {R}^{n}\), \(Q\in \mathbb {R}^{n\times n}\) is a positive semidefinite matrix, \(A\in \mathbb {R}^{m\times n}\), \( b \in \mathbb {R}^{m}\). Here, we compare Algorithm 2 (Al2) with the accelerated linearized augmented Lagrangian method (AALM [32, Algorithm 1]), which enjoys \(\mathcal {O}(1/k^{2})\) convergence rate with adaptive parameters.

Set m = 100 and n = 500. Let q be generated by the standard Gaussian distribution, b by the uniform distribution, A = [B,Id] with \(B\in \mathbb {R}^{m\times (n-m)}\) generated by the standard Gaussian distribution, and \(Q = 2H^{T}H\) with \(H\in \mathbb {R}^{n\times n}\) generated by the standard Gaussian distribution. Then, Q may not be positive definite. The optimal value F(x∗) is obtained by the MATLAB function quadprog with tolerance 10−15. In this case, F(x) = f(x) + g(x) with \(f(x) = \mathcal {I}_{y\geq 0}(x)\) and \(g(x)= \frac {1}{2}x^{T}Qx+q^{T}x\), where \(\mathcal {I}_{y\geq 0}\) is the indicator function of the set {y|y ≥ 0}, i.e.,

$$ \mathcal{I}_{y\geq 0}(x)= \begin{cases} 0,\qquad &x\geq 0,\\ +\infty \qquad &otherwise. \end{cases} $$

Set the parameters of Algorithm 2 as s = ∥Q∥ and Mk = s∥Q∥Id. Set the parameters of AALM ([32, Algorithm 1]) to the adaptive choices \(\alpha _{k}=\frac {2}{k+1}\), βk = γk = ∥Q∥k, \(P_{k}=\frac {2\|Q\|}{k}Id\). Subproblems of both algorithms are solved by interior-point algorithms to a tolerance subtol. Figure 1 reports the objective error |F(xk) − F(x∗)| and the feasibility violation ∥Axk − b∥ for Al2 with α = 10, 20, 30 and for AALM over the first 500 iterations. As shown in Fig. 1, Algorithm 2 performs better and more stably than AALM under the different values of subtol.
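
For reproducibility, the test data described above can be generated as in the following sketch (numpy is used in place of MATLAB, and the seed is an illustrative choice; the reference value F(x∗) would still come from a QP solver such as MATLAB's quadprog).

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative seed
m, n = 100, 500
q = rng.standard_normal(n)                # q: standard Gaussian
b = rng.uniform(size=m)                   # b: uniform distribution
B = rng.standard_normal((m, n - m))
A = np.hstack([B, np.eye(m)])             # A = [B, Id]
H = rng.standard_normal((n, n))
Q = 2 * H.T @ H                           # symmetric positive semidefinite, possibly singular

def g(x):
    # smooth part of the objective; the constraint x >= 0 is carried by f = indicator of {y >= 0}
    return 0.5 * x @ Q @ x + q @ x
```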

Fig. 1 Error of objective function and constraint of Al2 and AALM with different subtol

3.2 The basis pursuit problem

Consider the following basis pursuit problem:

$$ \min_{x} \quad \|x\|_{1}, \quad s.t. \ Ax = b, $$

where \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^{m}\) and m < n. Let A be generated by the standard Gaussian distribution. The number of nonzero elements of the original solution x∗ is fixed at 0.1 ∗ n, and the nonzero elements are selected randomly in [− 2,2]. Set b = Ax∗. We compare Algorithm 1 with the inexact augmented Lagrangian method (IAL [24, Algorithm 1]). Here, subproblems of both algorithms are solved by the fast iterative shrinkage-thresholding algorithm (FISTA [5], [24, Algorithm 2]), and FISTA is stopped when

$$ \frac{\|x_{k}-x_{k-1}\|^{2}}{\max\{\|x_{k-1}\|,1\}}\leq subtol $$

is satisfied or the number of iterations exceeds 100, where the accuracy subtol = 1e − 4, 1e − 6, 1e − 8. In each test, we calculate the residual error ∥Axk − b∥ (Res) and the relative error of the solution \(\frac {\|x_{k}-x^{*}\|}{\|{x}^{*}\|}\) (Rel), with the stopping condition Res + Rel ≤ 1e − 8. Set the parameters of Algorithm 1 as α = n, s = 100, Mk = 0, and the parameter of IAL as β = 1. Let Init and Time denote the number of iterations and the CPU time in seconds, respectively. For different tolerances subtol of the subproblem, Tables 1, 2 and 3 report the results for the basis pursuit problem with different dimensions. We observe that, when the subproblem is solved with different accuracies, Algorithm 1 is faster than IAL in terms of both the number of iterations and the CPU time.
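
The instance generation and the two error measures Res and Rel described above can be sketched in numpy as follows (function names and the seed are our own).

```python
import numpy as np

def make_instance(m, n, seed=0):
    # test data as described above: Gaussian A, sparse x* with 0.1*n nonzeros in [-2, 2], b = A x*
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x_star = np.zeros(n)
    support = rng.choice(n, size=int(0.1 * n), replace=False)
    x_star[support] = rng.uniform(-2, 2, size=support.size)
    return A, x_star, A @ x_star

def errors(A, b, x_star, x_k):
    res = np.linalg.norm(A @ x_k - b)                              # Res = ||A x_k - b||
    rel = np.linalg.norm(x_k - x_star) / np.linalg.norm(x_star)    # Rel = ||x_k - x*|| / ||x*||
    return res, rel

def fista_stopped(x_k, x_prev, subtol):
    # inner stopping rule used for the FISTA subproblem solver
    return np.linalg.norm(x_k - x_prev) ** 2 / max(np.linalg.norm(x_prev), 1.0) <= subtol
```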

Table 1 Numerical results of Algorithm 1 and IAL with subtol = 1e − 4
Table 2 Numerical results of Algorithm 1 and IAL with subtol = 1e − 6
Table 3 Numerical results of Algorithm 1 and IAL with subtol = 1e − 8

3.3 The linearly constrained ℓ1-ℓ2 minimization problem

Consider the following problem:

$$ \min_{x}\ \|x\|_{1} +\frac{\beta}{2}\|x\|^{2}_{2} \quad s.t.\ Ax = b, $$

where \(A\in \mathbb {R}^{m\times n}\) and \(b\in \mathbb {R}^{m}\). Let m = 1500,n = 3000, and A be generated by standard Gaussian distribution. Suppose that the original solution (signal) \({x}^{*}\in \mathbb {R}^{n}\) has only 150 non-zero elements which are generated by the Gaussian distribution \(\mathcal {N}(0,4)\) in the interval [− 2,2] and that the noise ω is selected randomly with ∥ω∥ = 10− 4,

$$ b= A{x}^{*}+\omega.$$

Set the parameters of Algorithm 2 (Al2) as α = 20, s = 1, Mk = sβId, and the parameter of IAALM ([18, Algorithm 1]) as γ = 1. Subproblems are solved by FISTA, and the stopping condition is when

$$ \frac{\|x_{k}-x_{k-1}\|^{2}}{\max\{\|x_{k-1}\|,1\}}\leq subtol $$

is satisfied or the number of iterations exceeds 100, where accuracy subtol = 1e − 6, 1e − 8. We terminate all the methods when ∥Axkb∥≤ 5 ∗ 10− 4. In each test, we calculate the residual error res = ∥Axb∥, the relative error \(rel = \frac {\|x-x^{*}\|}{\|{x}^{*}\|}\) and the signal-to-noise ratio

$$SNR=10\log_{10} \frac{\|x^{*}-\text{mean}(x^{*})\|^{2}}{\|x-{x}^{*}\|^{2}},$$

where x is the recovery signal.
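
The quality measures reported in Tables 4 and 5 can be computed as below (a small numpy sketch; the variable names are ours).

```python
import numpy as np

def snr(x_star, x):
    # signal-to-noise ratio of the recovered signal x relative to the original signal x*
    num = np.linalg.norm(x_star - np.mean(x_star)) ** 2
    return 10.0 * np.log10(num / np.linalg.norm(x - x_star) ** 2)
```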

In Table 4, we present the numerical results of Algorithm 2 and IAALM for various β. When subtol = 1e − 6, IAALM does not work well, so we only list the numerical results of Algorithm 2 in Table 5. Based on Rel and SNR, it is seen that the sparse original signal is well restored when β ≤ 1. This is also shown in Fig. 2.

Table 4 Numerical results of Algorithm 2 and IAALM with subtol = 1e − 8
Table 5 Numerical results of Algorithm 2 with subtol = 1e − 6
Fig. 2 Original sparse signal and the final estimated solution of Algorithm 2 with subtol = 1e − 8

4 Conclusion

In this paper, we propose two inertial accelerated primal-dual methods for solving linearly constrained convex optimization problems. Assuming merely convexity, we show that the proposed methods enjoy \(\mathcal {O}(1/k^{2})\) convergence rates, even if the subproblem is solved inexactly. The numerical results demonstrate the effectiveness of our methods and their superior performance compared with some existing methods.