1 Introduction

In this paper, we consider the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem

$$\begin{aligned} \min _x \Vert x\Vert _1 + \frac{\beta }{2}\Vert x\Vert ^2_2 \quad \text{subject to}\quad Ax=b, \end{aligned}$$
(1)

where \(\beta \ge 0\), \(A\in \mathbb{R }^{m\times n}\), \(x\in \mathbb{R }^n\), and \(b\in \mathbb{R }^m\). The penalized form of (1), in which the linear constraint is penalized by the square of its \(\ell _2\)-norm,

$$\begin{aligned} \min _x \frac{\gamma }{2}\Vert Ax-b\Vert ^2_2+ \Vert x\Vert _1 + \frac{\beta }{2}\Vert x\Vert ^2_2, \end{aligned}$$
(2)

where \(\gamma >0\), has been used in the statistical analysis of variable selection problems [23, 35], neuroimaging [5], and genome analysis [14]. It is called the elastic net problem. In the variable selection problem, \(A\) is a modeling matrix whose columns are predictors [35], and De Mol et al. [15] proposed constructing good predictors by dictionary learning based on a training set. Several algorithms exist for solving the elastic net problem: Zou and Hastie [35] proposed a least angle regression algorithm and Friedman et al. [17] proposed a coordinate descent method.

In compressive sensing [1, 12], one considers the linearly constrained \(\ell _1\) minimization problem, i.e., problem (1) with \(\beta = 0\):

$$\begin{aligned} \min _x \Vert x\Vert _1 \quad \text{subject to}\quad Ax=b, \end{aligned}$$
(3)

where \(m \ll n\), i.e., the system of equations is underdetermined. This linearly constrained \(\ell _1\) minimization problem (3) is often called the basis pursuit problem [13]. The central result of compressive sensing is that a large sparse signal can be recovered from highly incomplete information. Many real-world signals are well approximated by sparse or compressible representations under a suitable basis, so the matrix \(A\) can be regarded as the product of a sensing matrix and a basis transformation matrix, with the number of rows of \(A\) (i.e., \(m\)) much smaller than the number of columns of \(A\) (i.e., \(n\)). The theoretical results in [9–11] related to compressive sensing show that, under certain conditions on \(A\), the solution of basis pursuit (3) equals the sparsest solution with overwhelming probability. The basis pursuit problem can be recast as a linear program (LP), so LP solvers, such as \(\ell _1\)-magic [8] and PDCO [29], have been used to solve it. Recently, Yin et al. [34] proposed the Bregman method, which is equivalent to an augmented Lagrangian method, to solve the basis pursuit problem. Several other algorithms can also be applied to the basis pursuit problem [30, 31].

Adding the \(\Vert \cdot \Vert ^2_2\) term to the basis pursuit problem yields the objective function \(\Vert x\Vert _1 + \frac{\beta }{2}\Vert x\Vert ^2_2\), which is strictly convex. Thus, the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem (1) has a unique solution and its dual problem is smooth. When \(\beta \) is sufficiently small, the solution of (1) is also a solution of the basis pursuit problem (3). This exact regularization property of \(\ell _1\)-\(\ell _2\) has been studied in [16, 33]. Problem (1) has an \(\ell _2\) norm in the regularization term, so it is less sensitive to noise than the basis pursuit problem (3).

The linearized Bregman method [6, 7] is a well-known optimization algorithm for compressive sensing. This method actually solves the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem (1) [6, 27] instead of the basis pursuit problem (3). However, the linearized Bregman method converges slowly, so many accelerated schemes have been proposed. The kicking technique was introduced in [27]. Yin [33] proved that the linearized Bregman method is equivalent to a gradient descent method applied to the Lagrangian dual problem of (1). Based on this study, Yin [33] improved the linearized Bregman method using the Barzilai-Borwein line search [2], the limited memory BFGS method [24], and nonlinear conjugate gradient methods. Huang et al. [22] developed an accelerated linearized Bregman method based on the fact that the linearized Bregman method is equivalent to a gradient descent method applied to the Lagrangian dual of problem (1), combined with the extrapolation technique used in the accelerated proximal gradient methods [3, 25] studied by Nesterov and others. Yang et al. [32] applied an alternating splitting technique to the Lagrangian dual of problem (1) for solving the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem.

Recently, He et al. [19] developed an accelerated augmented Lagrangian method for the linearly constrained minimization problem whose objective function is differentiable. Since problem (1) has only linear constraints, in this paper we extend the algorithm in [19] to solve the linearly constrained minimization problem in which the objective function is convex and continuous but not differentiable. By using the equivalence between the Bregman method and the augmented Lagrangian method (ALM), together with a generalization of the accelerated ALM [19], we propose an accelerated Bregman method for solving the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem (1). Our numerical simulations show that the performance of the proposed method is similar to that of the accelerated linearized Bregman method when solving the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem (1). In [22], numerical tests showed that the accelerated linearized Bregman method is faster than the algorithms discussed in [33]. Hence, we can expect that our proposed method is faster than the methods discussed in [33].

The remainder of this paper is organized as follows. In Sect. 2, we introduce the Bregman method and the linearized Bregman method. In Sect. 3, we describe the accelerated ALM, propose our accelerated Bregman method, and establish the \(O(\frac{1}{k^2})\) complexity, where \(k\) is the iteration count, of the accelerated Bregman method and the accelerated ALM. In Sect. 4, we provide numerical tests on the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem and compare the performance of our accelerated Bregman method with that of other methods.

2 Bregman Methods

In this section, we introduce the Bregman method and the linearized Bregman method. The Bregman method [26] was proposed for solving total variation-based image restoration. The Bregman distance with respect to the \(\ell _1\)-\(\ell _2\) function \(f(\cdot ) = \Vert \cdot \Vert _1 + \frac{\beta }{2}\Vert \cdot \Vert ^2_2\) between points \(u\) and \(v\) is defined as

$$\begin{aligned} D_f^p(u,v) = f(u) - f(v) - p^T(u-v), \end{aligned}$$

where \(p\) is an element of \(\partial f(v)\), i.e., a subgradient of \(f\) at \(v\). The Bregman method for solving (1) can be expressed as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg {\min _x} \quad D_f^{p^k}(x,x^k) + \frac{\gamma }{2}\Vert A x - b\Vert ^2_2; \\ p^{k+1} = p^k - \gamma A^T (A x^{k+1}-b), \end{array} \right. \end{aligned}$$
(4)

starting with \(x^0 = \mathbf 0 \) and \(p^0 = \mathbf 0 .\)

By Fermat’s rule [28, Theorem 10.1] and the first step in (4), we have

$$\begin{aligned} \mathbf 0 \in \partial f(x^{k+1}) - p^k + \gamma A^T (Ax^{k+1}-b). \end{aligned}$$

Hence \(p^{k+1}\) in the second step of (4) is a subgradient of \(f\) at \(x^{k+1}.\) Note that the original Bregman method does not have the parameter \(\gamma \) (i.e., \(\gamma \) is set to \(1\)); instead, a scaling parameter \(\mu \) is used, i.e., \(f(x) = \mu (\Vert x\Vert _1+\frac{\beta }{2}\Vert x\Vert ^2_2)\). We can obtain the same solution by setting \(\gamma =\frac{1}{\mu }\). Now, we slightly modify the original Bregman method as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad f(x) + \frac{\gamma }{2}\Vert Ax-b^k\Vert ^2_2\\ b^{k+1} = b^k - (Ax^{k+1}-b) \end{array} \right. \end{aligned}$$
(5)

starting with \(x^0 = \mathbf 0 \) and \(b^0 = b.\)
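For illustration, a minimal MATLAB sketch of the modified Bregman iteration (5) might look as follows. The routine name solve_subproblem and its argument list are placeholders (not part of an actual implementation) for any solver of the subproblem in the first step of (5), for instance the fixed point method described in Sect. 4.1.

```matlab
% Minimal sketch of the modified Bregman iteration (5) for problem (1).
% solve_subproblem(A, bk, beta, gamma, x0) is a placeholder for any routine that
% (approximately) minimizes ||x||_1 + (beta/2)*||x||_2^2 + (gamma/2)*||A*x - bk||_2^2.
function x = bregman(A, b, beta, gamma, solve_subproblem, maxit, tol)
  x  = zeros(size(A,2), 1);                         % x^0 = 0
  bk = b;                                           % b^0 = b
  for k = 1:maxit
    x  = solve_subproblem(A, bk, beta, gamma, x);   % x^{k+1}
    bk = bk - (A*x - b);                            % b^{k+1} = b^k - (A x^{k+1} - b)
    if norm(A*x - b) < tol, break; end              % stopping test as in Sect. 4.2
  end
end
```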

The following lemma gives the condition under which the first step of the original Bregman method (4) is equivalent to that of the modified Bregman method (5).

Lemma 1

\(x^{k+1}\) computed by the Bregman method (4) equals \(x^{k+1}\) computed by (5) if and only if

$$\begin{aligned} p^k = \gamma A^T(b^k-b). \end{aligned}$$
(6)

Proof

Expanding \(\frac{\gamma }{2}\Vert Ax-b^k\Vert ^2_2 = \frac{\gamma }{2}\Vert Ax-b\Vert ^2_2 - \gamma (b^k-b)^T(Ax-b) + \frac{\gamma }{2}\Vert b^k-b\Vert ^2_2\) and dropping the terms that do not depend on \(x\), the subproblems for \(x^{k+1}\) in (4) and (5) differ only in their linear terms, \(-(p^k)^Tx\) and \(-\gamma (b^k-b)^TAx\), which coincide exactly when (6) holds. \(\square \)

By using Lemma 1 and mathematical induction, we can establish the following theorem.

Theorem 1

The modified Bregman method (5) is equivalent to the original Bregman method (4).

It was proved in [34] that the Bregman method is equivalent to the ALM which was proposed in [20]. The Lagrangian function of (1) is

$$\begin{aligned} L(x,\lambda ) = f(x) -\lambda ^T (Ax-b), \end{aligned}$$

where \(\lambda \in \mathbb{R }^m\) is the Lagrange multiplier. The augmented Lagrangian function

$$\begin{aligned} L(x,\lambda ,\gamma ) = f(x) - \lambda ^T(Ax-b) + \frac{\gamma }{2}\Vert Ax-b\Vert ^2_2 \end{aligned}$$

combines the Lagrangian function with the penalty term \(\Vert Ax-b\Vert ^2_2.\) The ALM for solving (1) is

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad f(x) - (\lambda ^{k})^T(Ax-b) + \frac{\gamma }{2}\Vert Ax-b\Vert ^2_2,\\ \lambda ^{k+1} = \lambda ^k - \gamma (Ax^{k+1}-b). \end{array} \right. \end{aligned}$$
(7)

For completeness, we provide a proof of the equivalence between the Bregman method and the ALM. By comparing \(x^{k+1}\) in (5) and \(x^{k+1}\) in (7), we obtain the following technical lemma to prove the equivalence. The proof is simple, so we omit it.

Lemma 2

\(x^{k+1}\) of the first step in (5) is equal to that of the first step of the ALM (7) if and only if

$$\begin{aligned} b^k = b + \frac{\lambda ^k}{\gamma }. \end{aligned}$$
(8)

Theorem 2

The Bregman method (4) is equivalent to the augmented Lagrangian method (7) starting with \(\lambda ^0 = \mathbf 0 \).

Proof

We show that (8) holds for all integers \(k \ge 0\) by induction. If \(k =0\), (8) holds by the initial conditions \(b^0 = b\) and \(\lambda ^0 = \mathbf 0 .\) Suppose that (8) holds for \(k\). Then \(x^{k+1}\) in (5) equals \(x^{k+1}\) in the ALM according to Lemma 2. Thus, we have

$$\begin{aligned} b^{k+1}&= b^k - (Ax^{k+1} - b) \\&= b + \frac{\lambda ^k}{\gamma } - (Ax^{k+1} -b)\\&= b + \frac{\lambda ^k}{\gamma } + \frac{\lambda ^{k+1} - \lambda ^k}{\gamma }\\&= b + \frac{\lambda ^{k+1}}{\gamma }, \end{aligned}$$

where the first equality is from \(b^{k+1}\) in (5) and the third equality is from \(\lambda ^{k+1}\) in (7). Thus, the Bregman method is equivalent to the ALM according to Theorem 1 and Lemma 2. \(\square \)

In general, the subproblem in the first step of (4) does not have a closed-form solution. Hence, we have to use another iterative method to solve the subproblem of the Bregman method, which can be time-consuming. To overcome this difficulty, the linearized Bregman method [7] replaces \(\frac{1}{2}\Vert Ax-b\Vert ^2_2\) with its linearization \((A^T(Ax^k - b))^T x\) and adds a proximal term:

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad D_f^{p^k}(x,x^k) + (A^T(Ax^k - b))^T x +\frac{1}{2\delta } \Vert x-x^k\Vert ^2_2 \\ p^{k+1} = p^k - A^T(A x^{k}-b) - \frac{1}{\delta }(x^{k+1}-x^k). \end{array} \right. \end{aligned}$$
(9)

Similar to the Bregman method,

$$\begin{aligned} \mathbf 0 \in \partial f(x^{k+1}) - p^k + A^T(Ax^k - b) + \frac{1}{\delta }(x^{k+1}-x^k) \end{aligned}$$

by the optimality condition of \(x^{k+1}\). Hence \(p^{k+1}\) is in the subdifferential \(\partial f(x^{k+1}).\)

In [7, 27], it was proved that if \(0 < \delta < \frac{2}{\Vert AA^T\Vert _2},\) then \(x^k\) in the linearized Bregman method converges to the solution of

$$\begin{aligned} \min _x \quad f(x)+\frac{1}{2\delta }\Vert x\Vert ^2_2 \quad \text{subject to}\quad Ax=b. \end{aligned}$$

Recently, an accelerated version of the linearized Bregman method was proposed in [22].

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad D_f^{\tilde{p}^k}(x,\tilde{x}^k) + \tau (A^T(A\tilde{x}^k-b))^Tx + \frac{1}{2\delta }||x - \tilde{x}^k||^2_2,\\ p^{k+1} = \tilde{p}^k - \frac{1}{\delta }({x}^{k+1} - \tilde{x}^{k}) - \tau A^T(A \tilde{x}^k - b),\\ \tilde{x}^{k+1} = \alpha _k x^{k+1} + (1-\alpha _k)x^k,\\ \tilde{p}^{k+1} = \alpha _k p^{k+1} + (1-\alpha _k)p^k, \end{array} \right. \end{aligned}$$
(10)

where \(\alpha _k = 1+\theta _k(\theta _{k-1}^{-1} -1) \) with \( \theta _{-1} = 1\) and \(\theta _k = \frac{2}{k+2}\) for all \(k\ge 0.\) Huang et al. [22] showed that the convergence rate of the accelerated linearized Bregman method is \(\mathcal{O }(\frac{1}{k^2})\), where \(k\) is the iteration count, based on the equivalence between the linearized Bregman method and the gradient descent method, and the extrapolation scheme used in Nesterov’s accelerated gradient descent method [3, 25].
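The same closed form applies to the first step of (10), with \(\tau A^T(A\tilde{x}^k-b)\) in place of \(A^T(Ax^k-b)\) and \((\tilde{x}^k,\tilde{p}^k)\) in place of \((x^k,p^k)\). A minimal MATLAB sketch of (10) is given below; the initialization \(\tilde{x}^0 = x^0\), \(\tilde{p}^0 = p^0\) is our assumption, since (10) leaves it unspecified, and the choice of \(\tau \) and \(\delta \) is left to the caller.

```matlab
% Minimal sketch of the accelerated linearized Bregman iteration (10)
% for f(x) = ||x||_1 + (beta/2)*||x||_2^2; the x-subproblem is solved in
% closed form by soft-thresholding (see the formula after (9)).
function x = acc_linearized_bregman(A, b, beta, delta, tau, maxit)
  n  = size(A,2);
  x  = zeros(n,1);  p  = zeros(n,1);        % x^0, p^0
  xt = x;           pt = p;                 % assumed \tilde{x}^0 = x^0, \tilde{p}^0 = p^0
  th_old = 1;                               % theta_{-1} = 1
  a = beta + 1/delta;
  for k = 0:maxit-1
    g  = A'*(A*xt - b);
    c  = pt - tau*g + xt/delta;
    xn = sign(c).*max(abs(c) - 1, 0)/a;     % x^{k+1} (soft-thresholding)
    pn = pt - (xn - xt)/delta - tau*g;      % p^{k+1}
    th    = 2/(k+2);                        % theta_k
    alpha = 1 + th*(1/th_old - 1);          % alpha_k
    xt = alpha*xn + (1-alpha)*x;            % \tilde{x}^{k+1}
    pt = alpha*pn + (1-alpha)*p;            % \tilde{p}^{k+1}
    x  = xn;  p = pn;  th_old = th;
  end
end
```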

3 Accelerated Bregman Method

In this section, we propose an accelerated Bregman method which has an \(O(\frac{1}{k^2})\) global convergence rate.

In Sect. 2, we showed that the Bregman method is equivalent to the ALM. It was shown in [19] that if \(f(x)\) is a differentiable convex function, the augmented Lagrangian method can be accelerated using the extrapolation scheme of Nesterov’s accelerated method. Based on this equivalence and the results in [19], we propose the accelerated Bregman method:

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad D_f^{p^k}(x,x^k) + \frac{\gamma }{2}\Vert A x - b\Vert ^2_2\\ \tilde{p}^{k+1} = p^k - \gamma A^T (A x^{k+1}-b)\\ t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2},\\ p^{k+1} = \tilde{p}^{k+1} + \big (\frac{t_k -1}{t_{k+1}}\big )(\tilde{p}^{k+1} - \tilde{p}^{k}) + \big (\frac{t_k}{t_{k+1}}\big )(\tilde{p}^{k+1} - p^k), \end{array} \right. \end{aligned}$$
(11)

where \(t_0 = 1\).

3.1 Equivalence to the Accelerated ALM

In this subsection, we prove the equivalence between the proposed accelerated Bregman method and the accelerated augmented Lagrangian method, which is a generalization of the method proposed in [19] in the sense that \(f\) can be nondifferentiable. The following lemma is similar to Theorem 2.

Lemma 3

The accelerated Bregman method (11) starting with \(p^0 = \mathbf 0 \) is equivalent to the following method starting with \(b^0 = b\):

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad f(x) + \frac{\gamma }{2}\Vert Ax-b^k\Vert ^2_2\\ \tilde{b}^{k+1} = b^k - (Ax^{k+1} -b )\\ b^{k+1} = \tilde{b}^{k+1} + \big (\frac{t_k -1}{t_{k+1}}\big )(\tilde{b}^{k+1} - \tilde{b}^{k}) + \big (\frac{t_k}{t_{k+1}}\big )(\tilde{b}^{k+1} - b^k), \end{array} \right. \end{aligned}$$
(12)

Proof

We show that (6) is satisfied for all \(k\) by induction. Based on the initial condition, (6) holds for \(k=0.\) We assume that (6) holds for all \(k \le n\). For all \(k\le n\), we get

$$\begin{aligned} \tilde{p}^{k+1}&= p^k - \gamma A^T(Ax^{k+1}-b)\\&= \gamma A^T(b^k-b) - \gamma A^T(Ax^{k+1}-b)\\&= \gamma A^T(b^k -b -(Ax^{k+1}-b))\\&= \gamma A^T(\tilde{b}^{k+1} -b), \end{aligned}$$

where the first equality uses the second step of (11), the second equality is derived from the induction hypothesis, and the fourth equality uses the second step of (12). Hence

$$\begin{aligned} \tilde{p}^{n+1} = \gamma A^T(\tilde{b}^{n+1} -b). \end{aligned}$$

By using the above equality and the third step of (12), we have the following equalities

$$\begin{aligned} p^{n+1}&= \tilde{p}^{n+1} + \left(\frac{t_n -1}{t_{n+1}}\right)(\tilde{p}^{n+1} - \tilde{p}^{n}) + \left(\frac{t_n}{t_{n+1}}\right)(\tilde{p}^{n+1} - p^n)\\&= \gamma A^T(\tilde{b}^{n+1} -b) + \left(\frac{t_n -1}{t_{n+1}}\right)\left(\gamma A^T(\tilde{b}^{n+1} -b) - \gamma A^T(\tilde{b}^{n} -b)\right)\\&\quad + \left(\frac{t_n}{t_{n+1}}\right) \left(\gamma A^T(\tilde{b}^{n+1} -b) - \gamma A^T(b^{n} -b)\right)\\&= \gamma A^T \left(\tilde{b}^{n+1} -b + \left(\frac{t_n -1}{t_{n+1}}\right) (\tilde{b}^{n+1} -\tilde{b}^{n}) + \left(\frac{t_n}{t_{n+1}}\right)(\tilde{b}^{n+1}-b^{n}) \right)\\&= \gamma A^T(b^{n+1} - b). \end{aligned}$$

By induction,

$$\begin{aligned} p^k = \gamma A^T(b^{k} - b) \end{aligned}$$

is satisfied for all integers \(k\ge 0.\) Thus, the accelerated Bregman method (11) is equivalent to the method (12). \(\square \)

The next lemma shows that the method (12) starting with \(b^0 = b\) is equivalent to the accelerated augmented Lagrangian method starting with \(\lambda ^0 = \mathbf 0 \).

Lemma 4

The method (12) starting with \(b^0 = b\) is equivalent to the following accelerated ALM starting with \(\lambda ^0 = \mathbf 0 \):

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x \quad f(x) -(\lambda ^k)^T(Ax-b) + \frac{\gamma }{2}\Vert Ax-b\Vert ^2_2\\ \tilde{\lambda }^{k+1} = \lambda ^k - \gamma (Ax^{k+1} -b )\\ \lambda ^{k+1} = \tilde{\lambda }^{k+1} + \big (\frac{t_k -1}{t_{k+1}}\big )(\tilde{\lambda }^{k+1} - \tilde{\lambda }^{k}) + \big (\frac{t_k}{t_{k+1}}\big )(\tilde{\lambda }^{k+1} - \lambda ^k), \end{array} \right. \end{aligned}$$
(13)

Proof

It is sufficient to show that (8) is satisfied for all \(k\) according to Lemma 2. We will prove this by induction. When \(k=0\), (8) is satisfied by the initial condition \(b^0=b, \lambda ^0 = \mathbf 0 .\) We assume that

$$\begin{aligned} b^k = b + \frac{\lambda ^k}{\gamma } \end{aligned}$$

is satisfied for \(k \le n\). This implies that, for all \(k\le n\), we have

$$\begin{aligned} \tilde{b}^{k+1}&= b^k - (Ax^{k+1}-b)\\&= b + \frac{\lambda ^k}{\gamma } - (Ax^{k+1}-b)\\&= b + \frac{1}{\gamma }(\lambda ^k - \gamma (Ax^{k+1} - b))\\&= b + \frac{\tilde{\lambda }^{k+1}}{\gamma }, \end{aligned}$$

where the first equality is from \(\tilde{b}^{k+1}\) in (12) and the final equality is from \(\tilde{\lambda }^{k+1}\) in (13). We find the following equalities using the induction hypothesis and the previous equality

$$\begin{aligned} b^{n+1}&= \tilde{b}^{n+1} + \left(\frac{t_n -1}{t_{n+1}}\right)(\tilde{b}^{n+1} - \tilde{b}^{n}) + \left(\frac{t_n}{t_{n+1}}\right)(\tilde{b}^{n+1} - b^n) \\&= b + \frac{\tilde{\lambda }^{n+1}}{\gamma } + \left(\frac{t_n -1}{t_{n+1}}\right)\left(\frac{\tilde{\lambda }^{n+1}}{\gamma } - \frac{\tilde{\lambda }^{n}}{\gamma }\right) + \left(\frac{t_n}{t_{n+1}}\right)\left(\frac{\tilde{\lambda }^{n+1}}{\gamma } - \frac{\lambda ^{n}}{\gamma }\right)\\&= b + \frac{1}{\gamma }\left( \tilde{\lambda }^{n+1} + \left(\frac{t_n-1}{t_{n+1}}\right)(\tilde{\lambda }^{n+1} - \tilde{\lambda }^{n}) + \left(\frac{t_n}{t_{n+1}}\right)(\tilde{\lambda }^{n+1} - \lambda ^n)\right)\\&= b + \frac{\lambda ^{n+1}}{\gamma }. \end{aligned}$$

By induction,

$$\begin{aligned} b^k = b + \frac{\lambda ^k}{\gamma } \end{aligned}$$

is satisfied for all integers \(k\ge 0.\) \(\square \)

By Lemmas 3 and 4, we get the following theorem for the equivalence between the accelerated Bregman method (11) and the accelerated ALM.

Theorem 3

The accelerated Bregman method (11) starting with \(p^0 = \mathbf 0 \) is equivalent to the accelerated augmented Lagrangian method (13) starting with \(\lambda ^0 = \mathbf 0 \).
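In the form (12), which is the one used in the numerical tests of Sect. 4, the accelerated Bregman method can be sketched in MATLAB as follows. As in the sketch after (5), solve_subproblem is a placeholder for an inner solver of the first step, and the initialization \(\tilde{b}^0 = b\) is our assumption, since (12) leaves \(\tilde{b}^0\) unspecified.

```matlab
% Minimal sketch of the accelerated Bregman iteration (12) for problem (1).
function x = acc_bregman(A, b, beta, gamma, solve_subproblem, maxit, tol)
  x      = zeros(size(A,2), 1);
  bk     = b;                               % b^0 = b
  bt_old = b;                               % \tilde{b}^0 (assumed equal to b^0)
  t_old  = 1;                               % t_0 = 1
  for k = 1:maxit
    x  = solve_subproblem(A, bk, beta, gamma, x);                 % x^{k+1}
    bt = bk - (A*x - b);                                          % \tilde{b}^{k+1}
    t  = (1 + sqrt(1 + 4*t_old^2))/2;                             % t_{k+1}
    bk = bt + ((t_old-1)/t)*(bt - bt_old) + (t_old/t)*(bt - bk);  % b^{k+1}
    bt_old = bt;  t_old = t;
    if norm(A*x - b) < tol, break; end
  end
end
```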

3.2 Complexity of the Accelerated Bregman Method

In the previous subsection, we proved the equivalence of the accelerated ALM and the accelerated Bregman method. Therefore, the convergence rate of the (accelerated) Bregman method is equal to the convergence rate of the (accelerated) ALM. In this subsection, we show that the convergence rate of the ALM is \(\mathcal{O }(\frac{1}{k})\) and the convergence rate of the accelerated augmented Lagrangian method (AALM) is \(\mathcal{O }(\frac{1}{k^2})\).

In [19], the authors considered an ALM (7) for the linearly constrained smooth minimization problem:

$$\begin{aligned} \min _{x\in X,\;Ax=b}\;h(x), \end{aligned}$$

where \(h\) is differentiable and the set \(X\) is a closed convex set. They showed that (7) can be accelerated and that the convergence rate of the accelerated version (13) is \(\mathcal{O }(\frac{1}{k^2}).\) In this paper, we extend the accelerated augmented Lagrangian method in [19] to solve the linearly constrained nonsmooth minimization problem, especially the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem (1). The key lemma is Lemma 6, and the proof of the \(\mathcal{O }(\frac{1}{k^2})\) convergence rate is similar to that in [19]. The dual problem of (1) is

$$\begin{aligned} \max _\lambda \{\min _x L(x,\lambda ) = f(x) -\lambda ^T(Ax-b)\}, \end{aligned}$$
(14)

where \(L(x,\lambda )\) is the Lagrangian function. Problem (1) is a convex optimization problem whose only constraints are linear equations, so there is no duality gap according to Slater’s condition [4]. Let \((x^*,\lambda ^*)\) be a saddle point of the Lagrangian function. Several lemmas are required to prove the \(\mathcal{O }(\frac{1}{k^2})\) convergence rate.

The following lemma gives a bound, in terms of the dual variables, on the difference between the Lagrangian values at the current iterate and at any point \((x,\lambda )\) that satisfies \(A^T\lambda \in \partial f(x)\).

Lemma 5

Let \((x^{k+1},\lambda ^{k+1})\) be generated by the augmented Lagrangian method (7). For any \((x,\lambda )\) that satisfies

$$\begin{aligned} f(x^{k+1}) - f(x) \ge \lambda ^T A(x^{k+1}-x), \end{aligned}$$
(15)

we get the inequality

$$\begin{aligned} L(x^{k+1},\lambda ^{k+1}) - L(x,\lambda ) \ge \frac{1}{\gamma }\Vert \lambda ^k - \lambda ^{k+1}\Vert ^2_2 + \frac{1}{\gamma }(\lambda - \lambda ^k)^T (\lambda ^k - \lambda ^{k+1}). \end{aligned}$$

Proof

By using (15) and the definition of the Lagrangian function, we have the following inequalities

$$\begin{aligned} L(x^{k+1},\lambda ^{k+1}) - L(x,\lambda )&= f(x^{k+1}) -f(x) + \lambda ^T (Ax-b) - (\lambda ^{k+1})^T(Ax^{k+1} -b)\\&\ge \lambda ^T A(x^{k+1}-x) + \lambda ^T(Ax-b) - (\lambda ^{k+1})^T(Ax^{k+1} -b) \\&= \lambda ^T (Ax^{k+1} -b) - (\lambda ^{k+1})^T(Ax^{k+1} -b)\\&= (\lambda - \lambda ^{k+1})^T(Ax^{k+1} -b)\\&= \frac{1}{\gamma }(\lambda - \lambda ^{k+1})^T (\lambda ^k -\lambda ^{k+1})\\&= \frac{1}{\gamma }\Vert \lambda ^k - \lambda ^{k+1}\Vert ^2_2 + \frac{1}{\gamma }(\lambda - \lambda ^k)^T(\lambda ^k -\lambda ^{k+1}), \end{aligned}$$

where the fourth equality is based on the second step of (7). \(\square \)

The next lemma shows that the condition (15) in Lemma 5 is satisfied with \((x,\lambda ) = (x^{n+1},\lambda ^{n+1})\) or \((x^*,\lambda ^*)\). In what follows, a point \((x^*,\lambda ^*)\) is a Karush–Kuhn–Tucker (KKT) [21] point of the problem:

$$\begin{aligned} \min _{Ax=b}\;f(x) \end{aligned}$$

if the following conditions are satisfied:

$$\begin{aligned} \partial f(x^*) - A^T\lambda ^*&\ni\, \mathbf 0 \nonumber \\ Ax^*&= b. \end{aligned}$$
(16)

Lemma 6

The pair \((x^{n+1},\lambda ^{n+1})\) generated by (7) satisfies the condition (15) with \((x,\lambda ) = (x^{n+1},\lambda ^{n+1})\), and the KKT point \((x^*,\lambda ^*)\) also satisfies the same condition, i.e.,

$$\begin{aligned} f(x^{k+1}) - f(x^*) \ge (\lambda ^*)^T A(x^{k+1}-x^*). \end{aligned}$$

Proof

By Fermat’s rule [28, Theorem 10.1] and the first step in (7), we have

$$\begin{aligned} \mathbf 0 \in \partial f(x^{n+1}) - A^T \lambda ^n + \gamma A^T(Ax^{n+1} -b), \end{aligned}$$

i.e.,

$$\begin{aligned} A^T(\lambda ^n - \gamma (Ax^{n+1} -b)) \in \partial f(x^{n+1}). \end{aligned}$$

Based on \(\lambda ^{n+1}\) in (7),

$$\begin{aligned} A^T \lambda ^{n+1} \in \partial f(x^{n+1}). \end{aligned}$$

According to the definition of the subdifferential, we get

$$\begin{aligned} f(x^{k+1}) - f(x^{n+1})&\ge (A^T\lambda ^{n+1})^T(x^{k+1}-x^{n+1})\\&= (\lambda ^{n+1})^T A(x^{k+1}-x^{n+1}). \end{aligned}$$

Since \((x^*,\lambda ^*)\) satisfies the KKT condition,

$$\begin{aligned} \partial f(x^*) - A^T\lambda ^*&\ni\, \mathbf 0 \\ Ax^*&= b. \end{aligned}$$

From the first condition, we get

$$\begin{aligned} A^T\lambda ^* \in \partial f(x^*). \end{aligned}$$

Thus, based on the definition of the subdifferential, we obtain

$$\begin{aligned} f(x^{k+1}) - f(x^{*}) \ge (\lambda ^{*})^T A(x^{k+1}-x^{*}). \end{aligned}$$

\(\square \)

By the proof of Lemma 6, we have

$$\begin{aligned} A^T \lambda ^{k+1} \in \partial f(x^{k+1}), \quad A^T\lambda ^* \in \partial f(x^*) \quad \,\text{ and} \quad Ax^* = b. \end{aligned}$$

By the definition of the subdifferential, we get

$$\begin{aligned} f(x^{*}) - f(x^{k+1})&\ge (\lambda ^{k+1})^T A(x^{*}-x^{k+1})\\&= (\lambda ^{k+1})^T (b - Ax^{k+1}). \end{aligned}$$

Hence, we also have

$$\begin{aligned} L(x^{k+1},\lambda ^{k+1}) \le L(x^*,\lambda ^*). \end{aligned}$$
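Indeed, since \(Ax^* = b\) gives \(L(x^*,\lambda ^*) = f(x^*)\), the preceding inequality yields

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^{k+1},\lambda ^{k+1}) = f(x^{*}) - f(x^{k+1}) + (\lambda ^{k+1})^T(Ax^{k+1} -b) \ge 0. \end{aligned}$$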

The following lemma is key to establishing the \(\mathcal{O }(\frac{1}{k})\) convergence rate of the ALM.

Lemma 7

Let \((x^{k+1},\lambda ^{k+1})\) be generated by the ALM (7). We have

$$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^*\Vert _2^2 \le \Vert \lambda ^{k} - \lambda ^*\Vert _2^2 - \Vert \lambda ^k - \lambda ^{k+1}\Vert _2^2 - 2\gamma (L(x^*,\lambda ^*) - L(x^{k+1},\lambda ^{k+1})). \end{aligned}$$

Proof

Lemma 5 with \((x,\lambda ) = (x^*,\lambda ^*)\) implies that

$$\begin{aligned} (\lambda ^k - \lambda ^*)^T (\lambda ^k-\lambda ^{k+1}) \ge \Vert \lambda ^k - \lambda ^{k+1}\Vert ^2_2 + \gamma (L(x^*,\lambda ^*) - L(x^{k+1},\lambda ^{k+1})). \end{aligned}$$

The above inequality yields that

$$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^*\Vert _2^2&= \Vert \lambda ^{k+1} - \lambda ^k + \lambda ^k - \lambda ^*\Vert _2^2\\&= \Vert \lambda ^{k+1} - \lambda ^k\Vert _2^2 -2(\lambda ^k - \lambda ^*)^T(\lambda ^{k} - \lambda ^{k+1}) + \Vert \lambda ^k - \lambda ^*\Vert _2^2\\&\le \Vert \lambda ^{k+1} - \lambda ^k \Vert _2^2 + \Vert \lambda ^k - \lambda ^*\Vert _2^2 -2\Vert \lambda ^k -\lambda ^{k+1}\Vert ^2_2\\&- 2\gamma (L(x^*,\lambda ^*) - L(x^{k+1},\lambda ^{k+1}))\\&= \Vert \lambda ^{k} - \lambda ^*\Vert _2^2 - \Vert \lambda ^k - \lambda ^{k+1}\Vert _2^2 - 2\gamma (L(x^*,\lambda ^*) - L(x^{k+1},\lambda ^{k+1})). \end{aligned}$$

\(\square \)

We have the inequality

$$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^*\Vert ^2_2 \le \Vert \lambda ^k-\lambda ^*\Vert ^2_2 - \Vert \lambda ^k - \lambda ^{k+1}\Vert ^2_2 \end{aligned}$$
(17)

from Lemma 7 and \(L(x^{k+1},\lambda ^{k+1}) \le L(x^*,\lambda ^*).\) The inequality (17) implies the global convergence of the ALM (7). By summing (17) over \(k = 1,\cdots , n\), we have

$$\begin{aligned} \sum _{k=1}^n \Vert \lambda ^k - \lambda ^{k+1}\Vert _2^2 \le \Vert \lambda ^1 - \lambda ^*\Vert _2^2, \end{aligned}$$

which implies that

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert \lambda ^k - \lambda ^{k+1}\Vert _2^2 = 0. \end{aligned}$$

In the next theorem, we prove that the convergence rate of ALM is \(\mathcal{O }(\frac{1}{k})\).

Theorem 4

Let \((x^k,\lambda ^k)\) be generated by (7). We obtain

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^k,\lambda ^k) \le \frac{\Vert \lambda ^0-\lambda ^*\Vert _{2}^2}{2k\gamma } \quad \text{for all } k \ge 1. \end{aligned}$$

Proof

We get

$$\begin{aligned} \Vert \lambda ^{n+1} - \lambda ^*\Vert _2^2 \le \Vert \lambda ^n - \lambda ^*\Vert _2^2 - \Vert \lambda ^n - \lambda ^{n+1}\Vert ^2_2 -2\gamma (L(x^*,\lambda ^*)-L(x^{n+1},\lambda ^{n+1})) \end{aligned}$$

from Lemma 7. Thus, we have

$$\begin{aligned} L(x^{n+1},\lambda ^{n+1}) - L(x^*,\lambda ^*) \ge \frac{1}{2\gamma }\left\{ \Vert \lambda ^{n+1}-\lambda ^*\Vert _2^2 - \Vert \lambda ^n-\lambda ^*\Vert ^2_2 + \Vert \lambda ^n - \lambda ^{n+1}\Vert ^2_2\right\} . \end{aligned}$$

Summing this inequality over \(n = 0,\cdots ,k-1\), we have

$$\begin{aligned} \sum _{n=1}^k L(x^n, \lambda ^n) \!-\!k L(x^*,\lambda ^*) \!\ge \! \frac{1}{2\gamma } \left\{ \Vert \lambda ^k\!-\! \lambda ^*\Vert _2^2 - \Vert \lambda ^0 \!-\! \lambda ^*\Vert _2^2 \!+\! \sum _{n=0}^{k-1} \Vert \lambda ^n - \lambda ^{n+1}\Vert ^2_2\right\} .\quad \end{aligned}$$
(18)

Based on Lemma 5 for \(k=n\) and setting \((x,\lambda ) = (x^n,\lambda ^n),\) we obtain

$$\begin{aligned} L(x^{n+1},\lambda ^{n+1}) - L(x^n,\lambda ^n) \ge \frac{1}{\gamma }\Vert \lambda ^n - \lambda ^{n+1}\Vert _2^2. \end{aligned}$$

By multiplying this inequality with \(n\) and summing it over \(n=0,\cdots ,k-1\), we have the following inequalities

$$\begin{aligned}&\sum _{n=0}^{k-1} n(L(x^{n+1},\lambda ^{n+1})-L(x^n,\lambda ^n)) \ge \frac{1}{\gamma }\sum _{n=0}^{k-1} n\Vert \lambda ^n - \lambda ^{n+1}\Vert _2^2\\&\quad \Leftrightarrow \sum _{n=0}^{k-1}((n+1)L(x^{n+1},\lambda ^{n+1})-n L(x^n,\lambda ^n) - L(x^{n+1},\lambda ^{n+1})) \ge \frac{1}{\gamma }\sum _{n=0}^{k-1} n\Vert \lambda ^n - \lambda ^{n+1}\Vert _2^2\\&\quad \Leftrightarrow k L(x^k,\lambda ^k) - \sum _{n=1}^k L(x^n,\lambda ^n) \ge \frac{1}{\gamma }\sum _{n=0}^{k-1} n\Vert \lambda ^n - \lambda ^{n+1}\Vert _2^2. \end{aligned}$$

It follows from adding (18) and the above inequality that

$$\begin{aligned} k(L(x^k,\lambda ^k) \!-\! L(x^*,\lambda ^*))\ge \frac{1}{2\gamma }\left\{ \Vert \lambda ^k-\lambda ^*\Vert ^2_2 \!-\! \Vert \lambda ^0 \!-\! \lambda ^*\Vert _2^2 + \sum _{n=0}^{k-1}(2n+1)\Vert \lambda ^n-\lambda ^{n+1}\Vert ^2_2\right\} . \end{aligned}$$

Thus, we have

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^k,\lambda ^k)&\le \frac{1}{2k\gamma }\Big \{\Vert \lambda ^0 - \lambda ^*\Vert _2^2- \Vert \lambda ^k - \lambda ^*\Vert _2^2 - \sum _{n=0}^{k-1}(2n+1)\Vert \lambda ^n - \lambda ^{n+1}\Vert ^2_2\Big \}\\&\le \frac{1}{2k\gamma }\Vert \lambda ^0-\lambda ^*\Vert ^2_2. \end{aligned}$$

\(\square \)

The iterate \((x^{n+1},\tilde{\lambda }^{n+1})\) generated by the AALM (13) satisfies

$$\begin{aligned} f(x^{k+1}) - f(x^{n+1}) \ge (\tilde{\lambda }^{n+1})^T A(x^{k+1}-x^{n+1}) \end{aligned}$$

by replacing \((x^{n+1},\lambda ^{n+1})\) with \((x^{n+1},\tilde{\lambda }^{n+1})\) in the proof of Lemma 6. Therefore, we get the following lemmas by simply changing the notation.

Lemma 8

Let \((x^{k+1},\tilde{\lambda }^{k+1})\) be generated by the AALM (13). For \((x,\lambda ) = (x^*,\lambda ^*)\) or \((x^{n+1},\tilde{\lambda }^{n+1})\) generated by the AALM, we have the inequality

$$\begin{aligned} L(x^{k+1},\tilde{\lambda }^{k+1}) - L(x,\lambda ) \ge \frac{1}{\gamma }\Vert \lambda ^k - \tilde{\lambda }^{k+1}\Vert ^2_2 + \frac{1}{\gamma }(\lambda - \lambda ^k)^T (\lambda ^k - \tilde{\lambda }^{k+1}). \end{aligned}$$

Lemma 9

Let \((x^{k+1},\tilde{\lambda }^{k+1})\) be generated by the AALM (13). We obtain

$$\begin{aligned} \Vert \tilde{\lambda }^{k+1} - \lambda ^*\Vert _2^2 \le \Vert \lambda ^{k} - \lambda ^*\Vert _2^2 - \Vert \lambda ^k - \tilde{\lambda }^{k+1}\Vert _2^2 - 2\gamma (L(x^*,\lambda ^*) - L(x^{k+1},\tilde{\lambda }^{k+1})). \end{aligned}$$

Several lemmas are required to obtain our main result.

Lemma 10

The sequence \(\{t_k\}\) generated by (11) satisfies

$$\begin{aligned} t_k \ge \frac{k+2}{2} \end{aligned}$$

and

$$\begin{aligned} t^2_k = t^{2}_{k+1} - t_{k+1} \quad \text{for all } k\ge 0. \end{aligned}$$

Proof

The proof is based on Lemma 3.3 in [19] and the definition of \(t_k.\) \(\square \)
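In particular, the identity \(t_k^2 = t_{k+1}^2 - t_{k+1}\) follows directly from the definition of \(t_{k+1}\) in (11):

$$\begin{aligned} t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2} \;\Longrightarrow \; (2t_{k+1}-1)^2 = 1+4t_k^2 \;\Longrightarrow \; t_{k+1}^2 - t_{k+1} = t_k^2. \end{aligned}$$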

Lemma 11

The inequality

$$\begin{aligned} 4t_k^2v_{k+1} \le 4t_0^2v_1 + \frac{1}{\gamma }\Vert u_1\Vert ^2_2, \end{aligned}$$
(19)

where \(v_{k} = L(x^*,\lambda ^*) -L({x}^{k}, \tilde{\lambda }^{k})\) and \(u_k = t_{k-1}(2\tilde{\lambda }^{k} - \lambda ^{k-1} - \tilde{\lambda }^{k-1}) + \tilde{\lambda }^{k-1} - \lambda ^*\), is satisfied for all \(k\ge 0\).

Proof

When \(k=0\), this is trivial. By Lemma 8 with \((x,\lambda ) = (x^n,\tilde{\lambda }^n), (x^*,\lambda ^*)\) and using the definition of \(v_n\), we have

$$\begin{aligned} v_n&-\, v_{n+1} \ge \frac{1}{\gamma }\Vert \lambda ^n - \tilde{\lambda }^{n+1}\Vert _2^2 + \frac{1}{\gamma }(\tilde{\lambda }^n - \lambda ^n)^T (\lambda ^n - \tilde{\lambda }^{n+1}) \end{aligned}$$
(20)
$$\begin{aligned}&-\, v_{n+1} \ge \frac{1}{\gamma }\Vert \lambda ^n - \tilde{\lambda }^{n+1}\Vert _2^2 + \frac{1}{\gamma }(\lambda ^* - \lambda ^n)^T (\lambda ^n -\tilde{\lambda }^{n+1}). \end{aligned}$$
(21)

By multiplying (20) by \(t_n-1\) and adding (21), we get

$$\begin{aligned} (t_n-1)v_n - t_n v_{n+1} \ge \frac{t_n}{\gamma }\Vert \lambda ^n - \tilde{\lambda }^{n+1}\Vert _2^2 + \frac{1}{\gamma }(\lambda ^* + (t_n-1)\tilde{\lambda }^n - t_n \lambda ^n)^T (\lambda ^n - \tilde{\lambda }^{n+1}). \end{aligned}$$

By multiplying the above inequality by \(t_n\) and applying Lemma 10, we have

$$\begin{aligned} t_{n-1}^2 v_n - t_n^2v_{n+1}&\ge \frac{1}{\gamma }(\lambda ^* - t_n\lambda ^n + (t_n - 1)\tilde{\lambda }^n )^T (t_n(\lambda ^n-\tilde{\lambda }^{n+1}))\\&+ \frac{1}{\gamma }\Vert t_n(\lambda ^n- \tilde{\lambda }^{n+1})\Vert ^2_2 \\&= \frac{1}{\gamma }(\lambda ^* +(t_n-1)\tilde{\lambda }^n -t_n\tilde{\lambda }^{n+1})^T(t_n(\lambda ^n-\tilde{\lambda }^{n+1}))\\&= \frac{1}{4\gamma }\Vert \lambda ^* + t_n(\tilde{\lambda }^n + \lambda ^n -2\tilde{\lambda }^{n+1})-\tilde{\lambda }^n\Vert ^2_2\\&\quad - \frac{1}{4\gamma }\Vert \lambda ^*+ (t_n-1)\tilde{\lambda }^n - t_n\lambda ^n\Vert ^2_2\\&= \frac{1}{4\gamma }\Vert u_{n+1}\Vert ^2_2 - \frac{1}{4\gamma }\Vert \lambda ^* +(t_n-1)\tilde{\lambda }^n - t_n\lambda ^n\Vert ^2_2 \end{aligned}$$

where the second equality is from the elementary identity \(x^T y = \frac{1}{4}\Vert x+y\Vert ^2_2 -\frac{1}{4}\Vert x-y\Vert ^2_2\). Based on the definition of \(\lambda ^{k}\) in AALM (13), we get

$$\begin{aligned} -\lambda ^* -(t_n-1)\tilde{\lambda }^n + t_n\lambda ^n = u_n. \end{aligned}$$

Thus, it follows that

$$\begin{aligned} t_{n-1}^2 v_n - t_n^2 v_{n+1} \ge \frac{1}{4\gamma }\Vert u_{n+1}\Vert ^2_2 - \frac{1}{4\gamma }\Vert u_n\Vert ^2_2 \quad \text{for all } n\ge 1. \end{aligned}$$
(22)

By multiplying (22) by 4 and summing it over \(n = 1,\cdots ,k\), we have

$$\begin{aligned} 4(-t_k^2 v_{k+1} + t_0^2 v_1) \ge \frac{1}{\gamma }\Vert u_{k+1}\Vert ^2_2 - \frac{1}{\gamma }\Vert u_1\Vert ^2_2. \end{aligned}$$

Since \(\Vert u_{k+1}\Vert ^2_2 \ge 0,\) we get

$$\begin{aligned} 4t_k^2v_{k+1} \le 4t_0^2 v_1 + \frac{1}{\gamma }\Vert u_1\Vert ^2_2 \quad \text{for all } k \ge 1. \end{aligned}$$

\(\square \)

Now, we establish our main theorem.

Theorem 5

Let \((x^{k+1},\tilde{\lambda }^{k+1},\lambda ^{k+1})\) be generated by the AALM. For any \(k\ge 1,\) we have

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^{k},\tilde{\lambda }^k) \le \frac{\Vert \lambda ^0-\lambda ^*\Vert ^2_{2}}{\gamma (k+1)^2}. \end{aligned}$$

Proof

Based on Eq. (19), we get

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^k,\tilde{\lambda }^k) \le \frac{4t_0^2 v_1 + \frac{1}{\gamma }\Vert u_1\Vert ^2_{2}}{4t_{k-1}^2} \quad \text{for any } k\ge 1. \end{aligned}$$

This together with Lemma 10 implies that

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^k,\tilde{\lambda }^k) \le \frac{4t_0^2 v_1 + \frac{1}{\gamma }\Vert u_1\Vert ^2_{2}}{(k+1)^2}. \end{aligned}$$
(23)

By simple calculation and \(t_0 = 1\), we have

$$\begin{aligned} 4t_0^2 v_1 + \frac{1}{\gamma }\Vert u_1\Vert ^2_{2} = 4(L(x^*,\lambda ^*) - L(x^1,\tilde{\lambda }^1)) + \frac{1}{\gamma }\Vert 2\tilde{\lambda }^1 - \lambda ^0 -\lambda ^*\Vert ^2_{2}. \end{aligned}$$

Lemma 9 with \(k=0\) yields that

$$\begin{aligned} \Vert \tilde{\lambda }^1 - \lambda ^*\Vert ^2_{2} \le \Vert \lambda ^0-\lambda ^*\Vert ^2_{2} - \Vert \lambda ^0 - \tilde{\lambda }^1\Vert ^2_2 -2\gamma (L(x^*,\lambda ^*) - L(x^1,\tilde{\lambda }^1)). \end{aligned}$$

Thus, we have

$$\begin{aligned} 4t_0^2 v_1&+ \frac{1}{\gamma }\Vert u_1\Vert ^2_{2}\nonumber \\&\le \frac{2}{\gamma }\Vert \lambda ^0 -\lambda ^*\Vert ^2_{2} -\frac{2}{\gamma }\Vert \lambda ^0 - \tilde{\lambda }^1\Vert ^2_{2} - \frac{2}{\gamma }\Vert \tilde{\lambda }^1 - \lambda ^*\Vert ^2_{2} + \frac{1}{\gamma }\Vert 2\tilde{\lambda }^1 -\lambda ^0 -\lambda ^*\Vert ^2_{2}\end{aligned}$$
(24)
$$\begin{aligned}&= \frac{1}{\gamma }\Vert \lambda ^0-\lambda ^*\Vert ^2_{2}. \end{aligned}$$
(25)

where the equality is from the identity \(2\Vert a-c\Vert ^2_{2} -2\Vert b-c\Vert ^2_{2}-2\Vert b-a\Vert ^2_{2} = \Vert a-c\Vert ^2_{2} - \Vert b-a +b-c\Vert ^2_{2}\) with \(a = \lambda ^0, b= \tilde{\lambda }^1, c=\lambda ^*\). Thus, (23) and (25) imply

$$\begin{aligned} L(x^*,\lambda ^*) - L(x^{k},\tilde{\lambda }^k) \le \frac{\Vert \lambda ^0-\lambda ^*\Vert ^2_{2}}{\gamma (k+1)^2}. \end{aligned}$$

\(\square \)

Remark 1

In [19], the authors considered a generalized augmented Lagrangian method (26) with a symmetric positive definite penalty matrix \(H_k\) that satisfies

$$\begin{aligned} H_k \preceq H_{k+1},\quad \forall k \ge 0 \end{aligned}$$

when the objective function \(f\) is differentiable:

$$\begin{aligned} \left\{ \begin{array}{ll} x^{k+1} = \arg \min _x\quad f(x) - (\lambda ^{k})^T(Ax-b) + \frac{1}{2}\Vert Ax-b\Vert ^2_{H_k},\\ \lambda ^{k+1} = \lambda ^k - H_k(Ax^{k+1}-b). \end{array} \right. \end{aligned}$$
(26)

We can also extend the \(\mathcal{O }(\frac{1}{k^2})\) convergence rate result for this generalized method when \(f(x)\) is not necessarily differentiable.

4 Numerical Simulation

In this section, we provide numerical tests comparing the performance of our accelerated Bregman method with that of other methods, such as the Bregman method and the accelerated linearized Bregman method, for the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem. We use the Bregman method (5) and the accelerated Bregman method (12). The subproblems for \(x^{k+1}\) in the Bregman method (5) and the accelerated Bregman method (12) are solved using the fixed point (FP) method [18]. We have implemented our accelerated Bregman method in MATLAB. All runs are performed on a Windows 7 PC with an Intel Core i7-2700K 3.4 GHz CPU and 16 GB of memory, using MATLAB (Version R2011b).

4.1 FP Method

In this subsection, we briefly describe the fixed point method [18] for the following minimization problem:

$$\begin{aligned} \min _{x\in \mathbb{R }^n} \Vert x\Vert _1 + \bar{\mu }g(x), \end{aligned}$$
(27)

where \(g:\mathbb{R }^n \rightarrow \mathbb{R }\) is differentiable and convex. Note that minimizing a convex function is equivalent to finding a zero of its subdifferential. The optimality condition for (27) is

$$\begin{aligned} \bar{\mu }\, (\nabla g(x^*))_i \left\{ \begin{array}{ll} = -1,&x_i^* > 0; \\ \in [-1,1],&x_i^* = 0;\\ = 1,&x_i^* < 0, \end{array} \right. \end{aligned}$$

where \(x^*\) is an optimal solution of (27), \(x_i^*\) denotes the \(i\)th component of \(x^*\), and \((\nabla g(x^*))_i\) denotes the \(i\)th component of \(\nabla g(x^*)\). From the optimality condition, it is easy to show that

$$\begin{aligned} x^*_i = \text{sgn}\big(x^*_i - \tau (\nabla g(x^*))_i\big)\max \left\{ |x^*_i-\tau (\nabla g(x^*))_i| - \frac{\tau }{\bar{\mu }},0 \right\} , \end{aligned}$$
(28)

for any \(\tau >0.\) To solve Eq. (28), we consider the fixed point iteration

$$\begin{aligned} x^{k+1}_i = \text{sgn}\big(x^k_i - \tau (\nabla g(x^k))_i\big)\max \left\{ |x^k_i - \tau (\nabla g(x^k))_i| - \frac{\tau }{\bar{\mu }},0 \right\} . \end{aligned}$$

For the subproblem of the Bregman method applied to the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem, we set

$$\begin{aligned} g(x) = \frac{\beta \Vert x\Vert ^2_2}{2\bar{\mu }} +\frac{1}{2}\Vert Ax-b\Vert ^2_2 \end{aligned}$$

with \(\bar{\mu } = \gamma \) and

$$\begin{aligned} \nabla g(x) = \frac{\beta x}{\bar{\mu }} + A^T(Ax-b). \end{aligned}$$
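A minimal MATLAB sketch of this FP (iterative shrinkage) solver for the Bregman subproblem is given below. The step size \(\tau = 1/(\Vert A\Vert _2^2 + \beta /\gamma )\) is an illustrative choice (the reciprocal of a Lipschitz constant of \(\nabla g\)) and is not prescribed here, and the simplified stopping test keeps only the first of the two conditions used in Sect. 4.2. This routine can serve as the solve_subproblem placeholder of the sketches in Sects. 2 and 3.

```matlab
% Minimal sketch of the fixed point (shrinkage) iteration applied to the Bregman
% subproblem  min_x ||x||_1 + (beta/2)*||x||_2^2 + (gamma/2)*||A*x - bk||_2^2,
% i.e. (27) with mu_bar = gamma and g as defined above.
function x = fp_subproblem(A, bk, beta, gamma, x0)
  tau = 1/(norm(A)^2 + beta/gamma);     % illustrative step size, not from the paper
  x = x0;
  for j = 1:100                         % inner iteration cap as in Sect. 4.2
    grad = (beta/gamma)*x + A'*(A*x - bk);        % gradient of g at x
    y    = x - tau*grad;
    xnew = sign(y).*max(abs(y) - tau/gamma, 0);   % shrinkage with threshold tau/mu_bar
    done = norm(xnew - x) < 1e-4*max(norm(x), 1); % simplified stopping test
    x = xnew;
    if done, break; end
  end
end
```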

4.2 Comparison of the Bregman Method with the Accelerated Bregman Method

In this section, we compare the performance of the Bregman method with that of the accelerated Bregman method for solving the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem. For the experiments, we set \(n = 5000\) and \(m=2500\) for the size of the Gaussian measurement matrix \(A\), whose entries are drawn from the standard Gaussian distribution \(\mathcal{N }(0,1)\). The sparsity \(k\), i.e., the number of nonzero elements of the original solution, is fixed at 250. The locations of the nonzero elements in the original solution (signal) \(\bar{x}\) are selected randomly, and the nonzero elements of \(\bar{x}\) are drawn from the Gaussian distribution \(\mathcal{N }(0,2^2)\) (MATLAB code: 2*randn(k,1)). The noise \(\bar{n}\) in

$$\begin{aligned} b = A\bar{x} + \bar{n} \end{aligned}$$

is generated from a standard Gaussian distribution \(\mathcal{N } (0,1)\) and then normalized so that its norm equals \(\sigma = 0.01.\) We terminate all the methods when

$$\begin{aligned} \Vert Ax^k-b\Vert _2 < \sigma \end{aligned}$$

is satisfied or the number of iterations exceeds 1000. The coefficient \(\gamma \) in the penalty term of the subproblem of the (accelerated) Bregman method is fixed at \(\gamma = \frac{2}{\Vert A^Tb\Vert _2}\). The fixed point method is stopped when

$$\begin{aligned} \frac{\Vert x^{k} - x^{k-1}\Vert _2}{\max \{\Vert x^{k-1}\Vert _2,1\}} < 10^{-4} \quad \text{and}\quad \gamma \Vert \nabla g\Vert _\infty - 1 < 10^{-3} \end{aligned}$$

is satisfied or the number of iterations in FP exceeds 100.
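For reference, the test problem described above can be generated as follows; the numerical values are those stated in the text, and only the variable names are ours.

```matlab
% Generation of the test problem of Sect. 4.2.
n = 5000; m = 2500; k = 250; sigma = 0.01;
A = randn(m, n);                      % Gaussian measurement matrix, entries ~ N(0,1)
xbar = zeros(n, 1);
idx = randperm(n);
xbar(idx(1:k)) = 2*randn(k, 1);       % k nonzeros at random locations, ~ N(0,2^2)
nbar = randn(m, 1);
nbar = sigma*nbar/norm(nbar);         % noise normalized so that ||nbar||_2 = sigma
b = A*xbar + nbar;
```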

In each test, we calculate the residual error \(\Vert Ax-b\Vert _2\), the relative error

$$\begin{aligned} \frac{\Vert x-\bar{x}\Vert _2}{\Vert \bar{x}\Vert _2}, \end{aligned}$$

and the signal-to-noise ratio (SNR)

$$\begin{aligned} 10\log _{10} \frac{\Vert \bar{x} - \,\text{ mean}\,(\bar{x})\Vert ^2_2}{\Vert \bar{x} - x\Vert ^2_2}, \end{aligned}$$

where \(x\) is the recovered signal.
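In MATLAB, these quantities can be computed, for a recovered signal x (with xbar as in the generation sketch above), for example as:

```matlab
res_err = norm(A*x - b);                                           % residual error
rel_err = norm(x - xbar)/norm(xbar);                               % relative error
snr_db  = 10*log10(norm(xbar - mean(xbar))^2/norm(xbar - x)^2);    % SNR
```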

In Table 1, we report the number of iterations (and the total number of inner iterations), the CPU time, the residual error, the relative error, and SNR of the Bregman method and the accelerated Bregman method for various \(\beta \).

Table 1 Comparison of the Bregman method with the accelerated Bregman method with various \(\beta \)

Both methods converge more slowly for \(\beta = 1, 1.5\) than for \(\beta < 1\). Based on the relative error and SNR, we observe that the sparse original signal is well restored when \(\beta < 1\). This is also shown in Figs. 1 and 2. The accelerated Bregman method is faster than the Bregman method. In Fig. 3, we show the relation between the relative error and the number of iterations for both methods.

Fig. 1

Original sparse signal and the final estimated solution of the (accelerated) Bregman method. \(\beta = 0.1\)

Fig. 2

Original sparse signal and the final estimated solution of the (accelerated) Bregman method. \(\beta = 1.5\)

Fig. 3

Plot of the relative error and residual error of the Bregman and accelerated Bregman methods. \(\beta = 0.1\)

4.3 Comparison of the Accelerated Linearized Bregman Method with the Accelerated Bregman Method

In this subsection, we compare the performance of the accelerated Bregman method and the accelerated linearized Bregman method for solving the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem, with the same setup for \(A\) and \(k\) as in the previous subsection but with \(\sigma =0.01,\;0.1\). The nonzero elements of \(\bar{x}\) are drawn from the Gaussian distribution \(\mathcal{N }(0,2^2).\) When noise is present, we terminate all the methods when the residual error \(\Vert Ax^k-b\Vert _2\le \sigma \) or the number of iterations exceeds \(1000\). For the noise-free case, i.e., \(b=A\bar{x}\), we stop all the methods when the relative residual error

$$\begin{aligned} \frac{\Vert Ax^k-b\Vert _2}{\Vert b\Vert _2} < 10^{-5} \end{aligned}$$

is satisfied. The coefficient \(\gamma \) in the penalty term of the subproblem of the accelerated Bregman method is fixed at \(\gamma = \frac{2}{\Vert A^Tb\Vert _2}\). When \(\beta = 1.5\), for \( \sigma = 0.01\) and the noise-free case, we use the coefficient \(\tau =\frac{1.8\beta }{\Vert A\Vert ^2_2}\) in the accelerated linearized Bregman method; in the other cases, we use \(\tau =\frac{2\beta }{\Vert A\Vert ^2_2}\). We observed that two FP iterations are enough to ensure the convergence of the accelerated Bregman method. Thus, we stop the FP method when

$$\begin{aligned} \frac{\Vert x^{k} - x^{k-1}\Vert _2}{\max \{\Vert x^{k-1}\Vert _2,1\}} < 10^{-4} \quad \text{and}\quad \gamma \Vert \nabla g\Vert _\infty - 1 < 10^{-3} \end{aligned}$$

are satisfied or the number of iterations of FP exceeds \(2\).

In Tables 2, 3 and 4, we report the total number of iterations, the CPU time, the residual error, the relative error, and SNR of the accelerated linearized Bregman method and the accelerated Bregman method for various \(\beta \). Based on the relative error and SNR, we observe that the sparse original signal is well restored when \(\beta \le 0.5\).

Table 2 Comparison of the accelerated Bregman method with the accelerated linearized Bregman method with various \(\beta \) for Noise-free case
Table 3 Comparison of the accelerated Bregman method with the accelerated linearized Bregman method with various \(\beta \) for \(\sigma = 0.01\)
Table 4 Comparison of the accelerated Bregman method with the accelerated linearized Bregman method with various \(\beta \) for \(\sigma = 0.1\)

For all \(\sigma \), the accelerated Bregman method is faster than the accelerated linearized Bregman method when \(\beta \le 0.1\). In these experiments, we observe that the accelerated Bregman method overall performs better than the accelerated linearized Bregman method when \(\beta \) is relatively small. Both accelerated methods find better solutions when \(\beta \) is relatively small.

5 Conclusion

In this paper, we proposed an accelerated algorithm for the Bregman method to solve the linearly constrained \(\ell _1\)-\(\ell _2\) minimization problem. We have shown that the convergence rate of the original Bregman method is \(\mathcal{O }(\frac{1}{k})\) and that of the accelerated Bregman method is \(\mathcal{O }(\frac{1}{k^2})\) for general linearly constrained nonsmooth minimization, based on the equivalence between the Bregman method and the augmented Lagrangian method. The numerical simulation results show that the proposed method is faster than the original Bregman method and, when the regularization parameter \(\beta \) for the \(\ell _2\) term is relatively small, is even faster than the accelerated linearized Bregman method.