1 Introduction

Let \(A\in \mathbb {R}^{m\times n}\) with m < n (typically m ≪ n in compressed sensing), \(b\in \mathbb {R}^{m}\), and \(u\in \mathbb {R}^{n}\). The basis pursuit (BP) problem is the following constrained minimization problem [11]:

$$ \min_{u\in \mathbb{R}^{n} }\left\lbrace ||u||_{1}: Au=b \right\rbrace, $$
(1)

which yields the solution of the underdetermined linear system with minimal L1 norm. This formulation was suggested by Chen and Donoho [11] to find a sparse solution among the many possible solutions of Au = b. The basis pursuit problem arises in many applications, such as statistics [15], image compression [22], spectral decomposition [23], compressed sensing [9, 14, 33, 38], and image reconstruction [39], especially for medical images [36, 37]. Instead of problem (1), one can consider the unconstrained problem:

$$ \min_{u\in \mathbb{R}^{n} } \frac{1}{2}\Vert Au-b \Vert_{2}^{2} + \lambda ||u||_{1} , $$
(2)

for λ ≥ 0, known as LASSO (Least Absolute Shrinkage And Selection Operator), which is closely related to (1). Indeed, in [32], the authors proved that a solution of (1) is either u = 0 or a solution of problem (2) for some λ > 0. In (1), one can use another norm instead of the L1 norm. In particular, when a sparse solution is sought, the L1 norm could be replaced with the L0 norm, since the latter gives the sparsest solution of the problem at hand (note that L0 is not actually a norm) [24, 26]. However, the resulting problem is NP-hard and therefore difficult to solve [24, 26]. In [29], it is mentioned that, under suitable circumstances, the solution of problem (2) gives the sparsest solution satisfying Au = b. Indeed, problem (2) is a convex approximation of the L0-regularized problem [17]:

$$ \min_{u\in \mathbb{R}^{n} } \frac{1}{2}\Vert Au-b \Vert_{2}^{2} + \lambda ||u||_{0} , $$
(3)

where \(\Vert u \Vert _{0}\) denotes the total number of nonzero elements of u. Problem (2) can be cast as a second-order cone programming problem [17] and can therefore be solved by conventional methods such as interior point methods [5]. However, because of their computational complexity, these conventional methods are not suitable for the large-scale data [29] arising in real applications such as CS reconstruction of Magnetic Resonance Imaging (MRI). Many algorithms solve problem (2) efficiently; some of them are: the infeasible-point subgradient algorithm (ISA) [25], homotopy-based methods [27, 28], fixed-point continuation (FPC) [19, 40], the L1-Magic algorithm [6], linearized Bregman and its generalizations [21, 29, 41, 42], and proximal-gradient-based algorithms such as the alternating direction method (ADM) [30], the alternating direction method of multipliers (ADMM) [30], and the iterative shrinkage thresholding algorithm (ISTA) together with its modification, the fast iterative shrinkage thresholding algorithm (FISTA) [2, 3, 30, 34, 35]. Compared to the traditional methods, the methods above have been shown to be suitable for problems with large-scale data and sparse solutions and have been used successfully for the LASSO problem. Most of these methods are designed to solve LASSO (2), which is distinct from the BP problem. Among them, fixed-point continuation and an active set method can also solve the BP problem directly, but only if λ is set to a tiny value (e.g., \(10^{-10}\)).

In this paper, we propose a method based on FISTA, which we call AFISTA, that solves the BP problem directly. Indeed, we modify FISTA using a continuation strategy, which leads to fewer iterations than FISTA and solves the BP problem instead of the LASSO problem. Since the only difference between FISTA and AFISTA is the continuation strategy added to AFISTA, whose cost is negligible compared to the matrix-vector multiplications, the computational cost of the two algorithms at each iteration is almost the same, while the number of iterations for AFISTA is considerably smaller than for FISTA. We also present a convergence analysis of the proposed method. To further accelerate the proposed method, we use the Barzilai-Borwein (BB) method [4]. Moreover, for applications with an orthogonal sensing matrix A, we propose a specialized version of the AFISTA method. The efficiency of the proposed method is illustrated through a variety of numerical experiments, including sparse signal recovery, recovery of a noisy sinusoidal wave, compressive sensing for MRI images, and image deconvolution.

This paper is organized as follows. Section 2 recalls the ISTA and FISTA methods, which are the core of the proposed method. In Section 3, we propose the AFISTA method and its generalizations and present its convergence analysis. Section 4 presents numerical experiments. Finally, the conclusion section summarizes the paper.

2 ISTA and FISTA methods

In this section, the ideas of the ISTA and FISTA algorithms, the core of our proposed method, are briefly described. In [2], the authors considered the following general minimization problem:

$$ \begin{array}{@{}rcl@{}} \min_{u\in \mathbb{R}^{n} } \quad f(u)+g(u) , \end{array} $$
(4)

where \(g:\mathbb {R}^{n} \rightarrow \mathbb {R} \) is a continuous convex function, which may be smooth or non-smooth, and \(f: \mathbb {R}^{n} \rightarrow \mathbb {R} \) is a convex, continuously differentiable function with Lipschitz continuous gradient, i.e.,

$$ \begin{array}{@{}rcl@{}} \Vert \nabla f(u_{1}) - \nabla f(u_{2}) \Vert \leq L \Vert u_{1} - u_{2} \Vert , \quad \forall u_{1}, u_{2} \in \mathbb{R}^{n}, \end{array} $$

where ∥.∥ denotes the Euclidean norm in the space \(\mathbb {R}^{n}\). The basic iterative scheme of the ISTA for solving the convex optimization problem (4) is as follows

$$ \begin{array}{@{}rcl@{}} u^{k} &=& \underset{ u \in \mathbb{R}^{n} }{\operatorname{argmin}} \quad f(u^{k-1})+\left\langle u - u^{k-1}, \nabla f(u^{k-1}) \right\rangle + \frac{L}{2}\big\Vert u - u^{k-1}\big\Vert^{2}+ g(u)\\ &=& \underset{ u \in \mathbb{R}^{n} }{\operatorname{argmin}} \quad g(u) + \frac{L}{2}\Big\Vert u - \left( u^{k-1} - \frac{1}{L}\nabla f(u^{k-1}) \right) \Big\Vert^{2}. \end{array} $$

Now, setting \(f(u) = \frac {1}{2}\Vert Au-b \Vert ^{2}\) and \(g(u) = \lambda \Vert u \Vert _{1}\), we have \(\nabla f(u) = A^{T}(Au-b)\) and \(L(f) = \Vert A^{T}A \Vert \), and therefore [2]

$$ \begin{array}{@{}rcl@{}} u^{k} = \mathcal{T}_{\lambda /L} \left( u^{k-1} - \frac{1}{L}A^{T}(Au^{k-1}-b) \right), \end{array} $$

where \(\mathcal {T}_{\alpha }:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) is the shrinkage (soft thresholding) operator defined by

$$ \begin{array}{@{}rcl@{}} \mathcal{T}_{\alpha}(\textbf{x})_{i}=\text{sgn}(x_{i})\max\left\lbrace |x_{i}|-\alpha , 0\right\rbrace . \end{array} $$
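For concreteness, the shrinkage operator amounts to one line of MATLAB; the following minimal sketch (the function name shrink is ours) also shows its effect on a small vector:

```matlab
% Componentwise soft thresholding T_alpha
shrink = @(x, alpha) sign(x) .* max(abs(x) - alpha, 0);

% Example: shrink([1.2; -0.3; 0.7], 0.5) returns [0.7; 0; 0.2]
```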

Finally, the ISTA method for solving the LASSO problem can be written as follows [2]

[Algorithm 1: ISTA for the LASSO problem]

Algorithm 1 shows that each ISTA iteration uses exactly two matrix-vector multiplications, involving A and AT, followed by a shrinkage step, so it is suitable for large-scale applications. Note that ISTA is a special case of the proximal forward-backward method proposed in [8] and [31]. General convergence results for the proximal forward-backward method can be found in [13]. A sufficient but not necessary condition ensuring the convergence of the sequence \(\left \lbrace u^{k} \right \rbrace \) generated by ISTA to an optimal point of the LASSO problem is that the stepsize parameter L satisfies \(\frac {1}{L}\in \left (0, \frac {1}{\Vert A^{T}A \Vert }\right ] \) [16], or equivalently L ≥∥ATA∥. The authors in [7] proved that the sequence \(\left \lbrace u^{k} \right \rbrace \) converges to a minimizer of the LASSO problem when \(L>\frac {1}{2}\Vert A^{T}A \Vert \); this condition is not necessary either.
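As a concrete illustration, a minimal MATLAB sketch of Algorithm 1 might look as follows, assuming A, b, lambda, and an iteration budget maxit are given (variable names are ours):

```matlab
% ISTA for the LASSO problem (sketch)
shrink = @(x, alpha) sign(x) .* max(abs(x) - alpha, 0);
L = norm(A' * A);                       % Lipschitz constant of the gradient of f
u = zeros(size(A, 2), 1);               % initial point u^0
for k = 1:maxit
    g = A' * (A * u - b);               % gradient: the two matrix-vector products
    u = shrink(u - g / L, lambda / L);  % shrinkage (proximal) step
end
```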

The main drawback of the ISTA algorithm is its rate of convergence: it converges slowly to the optimal objective function value. To fix this drawback, a faster algorithm called FISTA was proposed in [2]. The main difference between the FISTA and ISTA algorithms is that FISTA uses an extrapolated point \(y^{k}\) composed of \(u^{k-1}\) and \(u^{k-2}\), whereas ISTA is based only on \(u^{k-1}\). Algorithm 2 shows the FISTA method for the LASSO problem [2]

[Algorithm 2: FISTA for the LASSO problem]
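A corresponding MATLAB sketch of Algorithm 2, following the standard FISTA updates of [2] and again assuming A, b, lambda, and maxit are given (variable names are ours):

```matlab
% FISTA for the LASSO problem (sketch)
shrink = @(x, alpha) sign(x) .* max(abs(x) - alpha, 0);
L = norm(A' * A);
u = zeros(size(A, 2), 1);  y = u;  t = 1;
for k = 1:maxit
    u_old = u;
    u = shrink(y - A' * (A * y - b) / L, lambda / L);  % proximal gradient step at y^k
    t_old = t;
    t = (1 + sqrt(1 + 4 * t_old^2)) / 2;               % momentum parameter update
    y = u + ((t_old - 1) / t) * (u - u_old);           % extrapolation using u^{k-1}, u^{k-2}
end
```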

As is apparent, the computational costs of Algorithms 1 and 2 are almost the same, because the execution times of lines 4 and 5 are marginal. We now recall two theorems on the convergence rates of the ISTA and FISTA methods presented in [2]. Suppose that \(F(u) = \frac {1}{2}\Vert Au-b \Vert ^{2} + \lambda \Vert u \Vert _{1}\) and that problem (2) is solvable, i.e., \(U^{*} = \operatorname{argmin} F \neq \emptyset \), and let \(F^{*}\) be the optimal value of F at an optimal point \(u^{*}\).

Theorem 1

[2] Let \(\left \lbrace u^{k} \right \rbrace \) be the sequence generated by the ISTA method. Then, for any k ≥ 1,

$$ \begin{array}{@{}rcl@{}} F(u^{k})-F(u^{*})\leq \frac{C}{k}, \quad \forall u^{*} \in U^{*}, \end{array} $$

where C is a positive constant.

Theorem 2

[2] Let \(\left \lbrace u^{k} \right \rbrace \) be the sequence generated by the FISTA method. Then, for any k ≥ 1,

$$ \begin{array}{@{}rcl@{}} F(u^{k})-F(u^{*})\leq \frac{C}{(k+1)^{2}}, \quad \forall u^{*} \in U^{*}, \end{array} $$

where C is a positive constant.

In [35], the authors note that the ISTA and FISTA methods have a local linear convergence rate, i.e., these methods converge linearly once they are close to the optimal point.

3 AFISTA method

Considering the importance of the convergence rate in large-scale problems, we advocate using a continuation strategy together with the BB stepsize to accelerate the FISTA method.

3.1 Accelerating FISTA with continuation

Here, the AFISTA method is presented. In [19], the authors introduced a fixed-point continuation method that solves a sequence of LASSO problems with different values of the regularization parameter λ. Indeed, for a decreasing sequence \(\left \lbrace \lambda _{k} \right \rbrace \), they used the solution of the k-th LASSO problem, associated with the regularization parameter \(\lambda _{k}\) and obtained by the fixed-point algorithm, as a warm start for the next LASSO problem. Similar continuation strategies have also been used in [1, 17, 20, 28].

Here, we use the continuation technique to accelerate the FISTA method in a different way. In contrast to previous works, which produce a decreasing sequence \(\left \lbrace \lambda _{k} \right \rbrace \) with a fixed descent rate 0 < β < 1, we use the convergence of the sequence {uk} generated by the FISTA method to produce a decreasing sequence of regularization parameters that converges to zero. Indeed, we produce the decreasing sequence \(\left \lbrace \lambda _{k} \right \rbrace \) by first setting \( \lambda _{1}=\tilde {\lambda } \Vert b\Vert _{2}\Vert A^{T}b\Vert _{2} /L\) and then, at iteration k, setting \(\lambda _{k+1}\) as follows

$$ \begin{array}{@{}rcl@{}} &&\lambda^{\prime}=\tilde{\lambda}\times\Vert b\Vert_{2}\times\Vert u^{k}-u^{k-1}\Vert_{2},\\ &&\textbf{if} \quad \lambda^{\prime} < \gamma \lambda_{k}, \\ && \quad \lambda_{k+1} = \lambda^{\prime}, \\ &&\textbf{else} \\ && \quad \lambda_{k+1} = \lambda_{k}, \end{array} $$
(5)

where \(\tilde {\lambda }\) and γ are positive constants. Note that ISTA and FISTA were proposed to solve the LASSO problem, not the BP problem, whereas the proposed method, by driving \(\lambda _{k}\) to 0, solves the BP problem. To ensure the convergence of the sequence \(\lambda _{k}\) to 0, we need to take γ < 1. The numerical results below show how successful this way of generating the sequence of regularization parameters in the continuation technique can be. We are now ready to present the AFISTA algorithm for the BP problem

[Algorithm 3: AFISTA for the BP problem]
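Combining the FISTA iteration with the continuation rule (5) gives the following MATLAB sketch of AFISTA; it already uses \(t_{k}=\frac {k+a-1}{a}\) with a = 3, as in Section 3.2, and the variable names (lambda_tilde and gamma for \(\tilde {\lambda }\) and γ, plus maxit) are ours:

```matlab
% AFISTA for the BP problem (sketch): FISTA plus continuation rule (5)
shrink = @(x, alpha) sign(x) .* max(abs(x) - alpha, 0);
L = norm(A' * A);  a = 3;
u = zeros(size(A, 2), 1);  y = u;
lambda = lambda_tilde * norm(b) * norm(A' * b) / L;    % lambda_1
for k = 1:maxit
    u_old = u;
    u = shrink(y - A' * (A * y - b) / L, lambda / L);  % FISTA step with current lambda_k
    lambda_new = lambda_tilde * norm(b) * norm(u - u_old);
    if lambda_new < gamma * lambda                     % continuation rule (5)
        lambda = lambda_new;
    end
    t_k  = (k + a - 1) / a;                            % t_k with a = 3
    t_k1 = (k + a) / a;                                % t_{k+1}
    y = u + ((t_k - 1) / t_k1) * (u - u_old);          % extrapolation step
end
```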

Given that the only difference between the FISTA and AFISTA methods lies in the fifth and seventh lines of the AFISTA algorithm, whose computational cost is small compared to the matrix-vector multiplications, the computational cost of the two algorithms at each iteration is almost the same; however, the number of iterations for AFISTA is considerably smaller than for FISTA. This fact is clearly shown in Fig. 1, where the total numbers of iterations of the FISTA and AFISTA methods needed to reach the same stopping criterion are 7651 and 155, respectively. In this experiment, we take the sensing matrix A to be a 300 × 1000 Gaussian random matrix and the total number of nonzeros of u to be 50, with λ = 0.005 for the FISTA method and \(\tilde {\lambda } = 2 \) and γ = 0.9 for the AFISTA method, and we use the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<10^{-5}\) for both methods.

Fig. 1: Plot of the residual (left) and the relative error (right) as a function of the iteration count. Solid line: AFISTA. Dashed line: FISTA

Table 1 shows the difference between the convergence rates of the AFISTA and FISTA methods in terms of the number of iterations (NoIter for short) needed to reach the stopping criterion \(\frac {\Vert u^{k}-u^{*} \Vert }{\Vert u^{*} \Vert }\leq \varepsilon \) for different values of ε. Here, Func. Err. denotes the objective function error \(F(u^{k}) - F(u^{*})\). Observe the big difference between AFISTA and FISTA: the former needs 715 iterations with γ = 0.5, 733 iterations with γ = 0.7, and 708 iterations with γ = 0.9 for \(\varepsilon = 10^{-14}\), whereas the latter requires 3282 iterations for \(\varepsilon = 10^{-1}\). In this test, we take the sensing matrix A to be a 300 × 1000 Gaussian random matrix, the total number of nonzeros of u to be 50, and \(\tilde {\lambda } = 2.5 \). The numerical results are averaged over 10 random samples.

Table 1 Iteration numbers of AFISTA vs. FISTA to reach the stopping criterion \(\frac {\Vert u^{k}-u^{*} \Vert }{\Vert u^{*} \Vert }\leq \varepsilon \)

3.2 Convergence analysis of the AFISTA

In this section, we first recall the convergence result for FISTA and then state the convergence theorem for AFISTA. Note that the authors in [2] proved the convergence of the objective function values; however, the convergence of the sequence \(\left \lbrace u_{k}\right \rbrace \) generated by FISTA itself was not known. In [10], the authors show that, with a small modification, one can ensure the convergence of the objective function values as well as the convergence of the sequence \(\left \lbrace u_{k}\right \rbrace \) itself. Here, we restate the theorem proved in [10].

Theorem 3

[10] Let a > 2 be a real number and, for all \(k\in \mathbb {N}\), let \(t_{k}=\frac {k+a-1}{a}\). Then, the sequence \(\left \lbrace u_{k}\right \rbrace \) produced by FISTA converges to a minimizer of the objective function F.

Note that here we set \(F(u) = \frac {1}{2}\Vert Au-b \Vert ^{2} + \lambda \Vert u \Vert _{1}\), the objective function of the LASSO problem. We are now ready to present the following theorem on the convergence of the sequence \(\left \lbrace u_{k}\right \rbrace \) generated by AFISTA.

Theorem 4

Suppose that \(\left \lbrace u_{k}\right \rbrace \) is the sequence generated by AFISTA. Then, the sequence \(\left \lbrace u_{k}\right \rbrace \) converges to a minimizer of the Basis Pursuit problem.

Proof

Based on the updating rule (5), the sequence \(\left \lbrace \lambda _{k}\right \rbrace \) is clearly non-increasing. Moreover, it cannot remain constant forever. Suppose that at some iteration k ≥ 1 we have \(\lambda _{k} = \lambda _{k+1} = \cdots \). This means that, from iteration k on, we are solving the LASSO problem with regularization parameter \(\lambda _{k}\) using FISTA, starting from the initial point \(u_{k}\). Thus, the convergence of the FISTA method yields that, for the given 0 < γ < 1,

$$ \begin{array}{@{}rcl@{}} &&\exists N^{\prime}\in \mathbb{N} \text{ s. t. } \Vert u_{k+N^{\prime}-1} - u_{k+N^{\prime}-2} \Vert < \gamma \Vert u_{k-1} - u_{k-2} \Vert \\ &&\Rightarrow \tilde{\lambda}\Vert b\Vert \Vert u_{k+N^{\prime}-1} - u_{k+N^{\prime}-2} \Vert < \gamma \tilde{\lambda}\Vert b\Vert \Vert u_{k-1} - u_{k-2} \Vert \\ &&\Rightarrow \lambda_{k+N^{\prime}} < \gamma \lambda_{k}. \end{array} $$
(6)

Hence, the sequence \(\left \lbrace \lambda _{k}\right \rbrace \) is a positive non-increasing sequence and therefore converges. Also, from (6), one can conclude that the sequence \(\left \lbrace \lambda _{k}\right \rbrace \) has a subsequence satisfying \(\lambda _{k_{i+1}} < \gamma \lambda _{k_{i}} \quad \forall i\in \mathbb {N}\). This means that \(\left \lbrace \lambda _{k}\right \rbrace \) is a convergent sequence with a subsequence converging to zero. Thus, we have \(\left \lbrace \lambda _{k}\right \rbrace \rightarrow 0\), which forces the constraint Au = b in the limit. This means that eventually AFISTA solves the Basis Pursuit problem. □

Note that, throughout this paper we set a = 3 in formula \(t_{k}=\frac {k+a-1}{a}\).

3.3 AFISTA with BB stepsize

The Barzilai-Borwein (BB) method was proposed in [4]. Its motivation is the same idea that underlies quasi-Newton methods: the BB method provides an approximation to the inverse of the Hessian matrix of the problem at hand, and this approximation is simply a multiple of the identity matrix; that is, the stepsize \(\alpha _{k}\) is chosen so that \(\alpha _{k}I\) approximates the inverse of the Hessian at iteration k.

Let \(S^{k-1} = u^{k}-u^{k-1}\) and \(G^{k-1} = g^{k}-g^{k-1}\), where \(g^{k} = \nabla f(u^{k}) = A^{T}(Au^{k}-b)\); then the BB stepsize can be computed as follows

$$\alpha_{k} = \underset{ \alpha \in \mathbb{R} }{\operatorname{argmin}}\quad \Vert S^{k-1}-\alpha G^{k-1}\Vert^{2},$$

and so, by differentiating with respect to α and setting the derivative to zero, we obtain

$$ \begin{array}{@{}rcl@{}} \alpha_{k} = \frac{(G^{k-1})^{T}S^{k-1}}{(G^{k-1})^{T}G^{k-1}}. \end{array} $$
(7)
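In MATLAB, with the gradient \(g^{k} = A^{T}(Au^{k}-b)\), the BB stepsize (7) amounts to three lines (a sketch; variable names are ours):

```matlab
% BB stepsize (7); u, u_old, g, g_old are consecutive iterates and gradients
S = u - u_old;                  % S^{k-1}
G = g - g_old;                  % G^{k-1}
alpha = (G' * S) / (G' * G);    % alpha_k, so that alpha_k*I approximates the inverse Hessian
```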

Now, in order to further accelerate the AFISTA method, we use the BB method, which has been successfully employed in similar works on l1-minimization [12, 17, 40, 41]. The AFISTA algorithm contains two basic steps in each iteration. The first step is one iteration of the gradient descent method with fixed stepsize \(\frac {1}{L}\), which performs the minimization of the function \(f(u) = \frac {1}{2}\Vert Au-b \Vert ^{2}\). The second step applies the shrinkage operator, which imposes the sparsity assumption on the obtained solution. Therefore, to improve the convergence rate of the first step, we replace the fixed stepsize with the BB stepsizes. The above discussion leads to an accelerated version of AFISTA, namely AFISTA-BB (see Algorithm 4).

[Algorithm 4: AFISTA-BB]
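Relative to the AFISTA sketch above, the only change is in the gradient step, where the fixed stepsize 1/L is replaced by the BB stepsize \(\alpha _{k}\); scaling the shrinkage threshold by \(\alpha _{k}\) as well is our assumption about the proximal step:

```matlab
% AFISTA-BB gradient/shrinkage step (sketch)
g = A' * (A * y - b);
u = shrink(y - alpha * g, lambda * alpha);  % BB stepsize alpha_k instead of 1/L;
                                            % threshold scaled by alpha (our assumption)
```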

Each iteration of the AFISTA-BB algorithm uses two matrix-vector multiplications, as in the previous methods, but in practice its convergence rate is better than that of AFISTA. The left- and right-hand sides of Fig. 2 show the residual and the relative error as a function of the iteration count, respectively. In this figure, to reach the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<10^{-5}\), the total numbers of iterations of the AFISTA and AFISTA-BB methods are 122 and 71, respectively. Figure 2 thus shows that the AFISTA-BB method performs significantly better than AFISTA. The sensing matrix A is a 600 × 2000 Gaussian matrix; all other parameters are the same as in Fig. 1.

Fig. 2: Plot of the residual (left) and the relative error (right) as a function of the iteration count. Solid line: AFISTA-BB. Dashed line: AFISTA

3.4 Orthogonal case

Let us consider the specific case AAT = I, which appears in many applications. In this case, all of the stepsizes are equal to one. For the steepest descent stepsize, we have

$$ \begin{array}{@{}rcl@{}} \alpha_{k} = \frac{(g^{k})^{T}g^{k}}{(g^{k})^{T}A^{T}Ag^{k}} = \frac{ (Au^{k}-b)^{T}\underbrace{AA^{T}}_{=I}(Au^{k}-b) }{ (Au^{k}-b)^{T}\underbrace{AA^{T}AA^{T}}_{=I}(Au^{k}-b) } = 1. \end{array} $$

And for the BB stepsize, we have

$$ \begin{array}{@{}rcl@{}} &S^{k-1}=u^{k}-u^{k-1}\\ &G^{k-1}=g^{k}-g^{k-1}=A^{T}A(u^{k}-u^{k-1}), \end{array} $$

thus

$$ \begin{array}{@{}rcl@{}} \alpha_{k} = \frac{(G^{k-1})^{T}S^{k-1}}{(G^{k-1})^{T}G^{k-1}} =\frac{ (u^{k}-u^{k-1})^{T}A^{T}A(u^{k}-u^{k-1}) }{ (u^{k}-u^{k-1})^{T}A^{T}\underbrace{AA^{T}}_{=I}A(u^{k}-u^{k-1}) }=1. \end{array} $$

Therefore, for this particular case, the AFISTA algorithm can be rewritten as follows

[Algorithm 5: SAFISTA, the AFISTA algorithm for the orthogonal case]
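Since all stepsizes equal one when AAT = I, the gradient/shrinkage step of the earlier sketches simplifies accordingly (a sketch):

```matlab
% SAFISTA step (sketch): unit stepsize in the orthogonal case AA' = I
u = shrink(y - A' * (A * y - b), lambda);
```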

4 Numerical experiments

In this section, the efficiency of the AFISTA algorithm for solving the BP problem is demonstrated on several examples: first on sparse signal recovery, second on noisy sinusoidal wave recovery, third on CS for sparse MRI, and finally on an image deconvolution problem. All of the discussed algorithms were implemented in MATLAB on a Lenovo B590 laptop with an Intel(R) Celeron(R) 1005M CPU @ 1.90GHz. Throughout this section, the accuracy of the methods is measured by the relative error \(\frac {\Vert u^{k}-\bar {u} \Vert }{\Vert \bar {u} \Vert }\), where \({\bar {\mathbf {u}}}\) and \(u^{k}\) stand for the original signal and the one reconstructed at iteration k, respectively.

4.1 Sparse signal recovery

In this subsection, our goal is to reconstruct the original sparse signal \({\bar {\mathbf {u}}}\) from the sensing matrix A and the observation b. To generate the original signal \({\bar {\mathbf {u}}}\), the positions of its nonzero components were randomly selected using the randsample command in MATLAB, and each of these nonzero components was then sampled from the standard Gaussian distribution (randn in MATLAB), uniformly at random from [− 1,1] (2 ∗rand − 1 in MATLAB), or uniformly at random from \(\left \lbrace -1,1\right \rbrace \) (sign(randn) in MATLAB). We use three types of measurement matrices: orthogonalized Gaussian matrices, whose entries were generated from a Gaussian distribution and whose rows were orthogonalized by QR decomposition; standard Gaussian matrices; and partial discrete cosine transform (DCT) matrices, whose m rows were randomly selected from the n × n DCT matrix. Note that when we deal with the partial DCT matrix, we can compute matrix-vector multiplications involving A and AT using the discrete cosine transform and the inverse discrete cosine transform, respectively. Here, we have performed two types of tests: a noise-free test, whose results are given in Fig. 3 and Table 2, and a noisy test, whose results are given in Figs. 4, 5, 6 and Table 3. In the noisy case, we take the measurement vector b as follows

$$b=A\bar{u}+\eta , $$

where η is white Gaussian noise of variance σ2. To quantify the noise level, instead of σ we use the signal-to-noise ratio (SNR), defined as follows

$$ \begin{array}{@{}rcl@{}} \text{SNR}=20\log_{10}\left( \frac{\Vert \bar{u} \Vert}{\Vert \eta \Vert } \right) \end{array} $$
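The test data described above can be generated as in the following MATLAB sketch (the sizes and the noise level sigma are illustrative):

```matlab
% Sparse test signal and noisy Gaussian measurements (sketch)
n = 2000;  m = 600;  s = 50;  sigma = 0.1;   % illustrative values
u_bar = zeros(n, 1);
idx = randsample(n, s);                      % positions of the nonzeros
u_bar(idx) = randn(s, 1);                    % or 2*rand(s,1)-1, or sign(randn(s,1))
A = randn(m, n);                             % standard Gaussian sensing matrix
eta = sigma * randn(m, 1);                   % white Gaussian noise
b = A * u_bar + eta;
SNR = 20 * log10(norm(u_bar) / norm(eta));   % signal-to-noise ratio
```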
Fig. 3: Noise-free test. Top: original signal with \(\Vert \bar {u}\Vert _{0}=50\). Middle: signal reconstructed by the AFISTA-BB algorithm, with \(\Vert u^{k}\Vert _{0} = 51\), after 346 iterations. Bottom left: residual versus iteration count. Bottom right: relative error versus iteration count

Table 2 Noise-free test: comparison of SAFISTA for the orthogonal case with the KB and LB methods [41]. Numerical results are averaged over 20 random samples
Fig. 4: Noisy test for three different noise levels, with SNR = 16.35, 6.67, − 3.74, for a standard Gaussian signal with \(\Vert \bar {u}\Vert _{0}=50\). Original signal: red dots. Reconstructed signal: blue circles

Fig. 5: Noisy test for three different noise levels, with SNR = 8.30, 2.41, − 4.23, for a uniformly random [− 1,1] signal with \(\Vert \bar {u}\Vert _{0}=50\). Original signal: red dots. Reconstructed signal: blue circles

Fig. 6: Noisy test for three different noise levels, with SNR = 15.06, 5.59, − 4.91, for a uniformly random \(\left \lbrace -1,1\right \rbrace \) signal with \(\Vert \bar {u}\Vert _{0}=50\). Original signal: red dots. Reconstructed signal: blue circles

Table 3 Noisy test: comparison of AFISTA and KO [29]. Standard Gaussian and partial DCT matrices are used for the AFISTA-BB and SAFISTA algorithms, respectively. Numerical results are averaged over 10 random samples

Here, we used the AFISTA-BB method with the following parameters: the sensing matrix A is a 600 × 2000 standard Gaussian matrix, \(\Vert \bar {u}\Vert _{0}=50\), and \(\tilde {\lambda } =2 \). Figure 3 shows the residual and the relative error: the AFISTA-BB algorithm converges to a solution with residual and relative error of \(10^{-15}\) in fewer than 350 iterations. The stopping criterion is \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<10^{-15}\). Figures 4, 5 and 6 show the results of the noisy tests with three different noise levels. These figures show that, even for large noise levels with negative SNRs, the AFISTA-BB algorithm works well.

In Table 2, we compare the specialized AFISTA for the orthogonal case, termed SAFISTA, with two methods from the linearized Bregman family [41], namely kicking + BB line search (KB) and L-BFGS (LB). In this table, following [41], we use orthogonalized Gaussian matrices as sensing matrices and report average results obtained over 20 random instances. We set \(\tilde {\lambda }=0.1\) for Gaussian signals and \(\tilde {\lambda }=0.05\) for uniformly random [− 1,1] signals, α = 0.95, and the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<10^{-5}\). From Table 2, one can observe that the number of matrix-vector multiplications required by SAFISTA is smaller than for the linearized Bregman methods, so SAFISTA is faster than these two methods. Notably, the reconstruction quality of all three methods is the same.

Table 3 shows the comparison of AFISTA with the linearized Bregman method with kicking (KO) [29]. Let std denote the standard deviation. For standard Gaussian matrices we use AFISTA-BB with stopping criterion std\(\left (Au^{k}-b\right ) <\sigma \), and for partial DCT matrices we use SAFISTA with stopping criterion std\(\left (Au^{k}-b\right ) < 0.5\sigma \). We report mean results obtained over 10 instances. The numerical results show that the AFISTA method outperforms the KO method in both convergence rate and accuracy.

Table 4 shows the comparison of SAFISTA with FPC-AS (FPC Active Set [40]). We used the latest version of the FPC-AS Matlab code, available at http://www.caam.rice.edu/~optimization/L1/FPC_AS/. For a fair comparison, we first run the FPC-AS code and then, having the FPC-AS solution \(u_{F}\), we terminate the SAFISTA method at the first iteration k whose iterate \(u^{k}\) satisfies

$$ \begin{array}{@{}rcl@{}} 10^{-5}\Vert u^{k} \Vert_{1} + \frac{1}{2}\Vert Au^{k}-b \Vert_{2}^{2}\leq 10^{-5}\Vert u_{F} \Vert_{1} + \frac{1}{2}\Vert Au_{F}-b \Vert_{2}^{2}. \end{array} $$
Table 4 Comparison of SAFISTA and the FPC active set method [40]. Numerical results are averaged over 20 random samples

Table 4 also reports the average results, obtained over 20 random instances using orthogonalized Gaussian sensing matrices. We take α = 0.95 and \(\tilde {\lambda } = 0.04\). Note that this table contains both noise-free and noisy cases. One can see that in the noise-free cases FPC-AS works better than SAFISTA, whereas in the noisy cases SAFISTA works better and requires fewer matrix-vector multiplications than FPC-AS. The last column of this table presents the time ratio of FPC-AS to the proposed method; its average shows that in the noise-free cases the stopping times of the two methods are approximately the same, while in the noisy cases the proposed method is approximately 2 times faster than FPC-AS.

4.2 Recovery of noisy sinusoidal wave

For the first compressive sensing application in the frequency domain, we consider a sinusoidal wave that is sparse in the frequency domain:

$$ \begin{array}{@{}rcl@{}} \bar{x}(t) = a_{1}\sin(\beta_{1}t) + a_{2}\cos(\beta_{2}t), \end{array} $$

where \(a_{i}\) and \(\beta _{i}\) denote the magnitudes and frequencies of the signal. The observed signal \(\tilde {x}\) is a noisy signal of the form \(\tilde {x} = \bar {x} + \eta \), where η is white Gaussian noise of variance σ2. Note that in the CS application, only a random measurement of the observed signal \(\tilde {x}\) is available, so the problem is to reconstruct the original signal \(\bar {x}\) from a random measurement of \(\tilde {x}\). The problem can then be written as Au = b, where \(b=\tilde {x}(I)\), I is a random subset of \( \left \lbrace 1,2,{\ldots } ,n \right \rbrace \), n is the total signal length, and the sensing matrix A is the partial inverse Fourier transform matrix. Note that the signal u lives in the Fourier domain, so we need to take an inverse Fourier transform to obtain the reconstructed signal in the physical domain. In the numerical experiments, to generate the original signal \(\bar {x}\), the magnitudes \(a_{i}\) are generated uniformly at random from [− 1,1] and the \(\beta _{i}\) are taken to be random multiples of \(\frac {2\pi }{n}\), i.e., \(\beta _{i} = k_{i}\frac {2\pi }{n}\), where \(k_{i}\) is randomly selected from \(\left \lbrace 0,1,{\ldots } ,n-1 \right \rbrace \).
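A MATLAB sketch of the test signal and the random measurements described above (the signal length and the noise level sigma are illustrative):

```matlab
% Sparse-in-frequency test signal and random time-domain samples (sketch)
n = 512;  t = (0:n-1)';  sigma = 0.1;       % illustrative values
k_i = randi([0, n-1], 2, 1);                % random frequency indices
beta = k_i * 2 * pi / n;                    % beta_i = k_i * 2*pi / n
a = 2 * rand(2, 1) - 1;                     % magnitudes uniform in [-1, 1]
x_bar = a(1) * sin(beta(1) * t) + a(2) * cos(beta(2) * t);
x_tilde = x_bar + sigma * randn(n, 1);      % noisy observed signal
I = randsample(n, round(0.3 * n));          % e.g., 30% random measurements
b = x_tilde(I);                             % b = x_tilde(I)
```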

In this test, the AFISTA-BB algorithm is used with parameters \( \tilde {\lambda }=0.01\), L = 1, and the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<0.01\). Numerical results for three different noise levels, with SNR = 9.5369, 4.0767, and − 0.9420, are shown in Figs. 7, 8 and 9. Details of the results obtained in this test are presented in Table 5. These results show that, to obtain a reliable reconstruction at a higher noise level, we need to use a greater number of measurements.

Fig. 7: Reconstruction using 30% of the noisy signal. Top left: original (red) and noisy (blue) signals. Top right: absolute magnitudes of the original (red dots) and reconstructed (blue circles) signals in the frequency domain. Bottom left: original (red) and reconstructed (blue) signals. Bottom right: a close-up view of the bottom-left panel

Fig. 8: Reconstruction using 50% of the noisy signal. Top left: original (red) and noisy (blue) signals. Top right: absolute magnitudes of the original (red dots) and reconstructed (blue circles) signals in the frequency domain. Bottom left: original (red) and reconstructed (blue) signals. Bottom right: a close-up view of the bottom-left panel

Fig. 9: Reconstruction using 70% of the noisy signal. Top left: original (red) and noisy (blue) signals. Top right: absolute magnitudes of the original (red dots) and reconstructed (blue circles) signals in the frequency domain. Bottom left: original (red) and reconstructed (blue) signals. Bottom right: a close-up view of the bottom-left panel

Table 5 Details of the results obtained in Figs. 7, 8 and 9

4.3 CS for sparse MRI

To illustrate the efficiency of our method in practical applications, we tested it on MRI images. All images used in this section are of size 720 × 960 and are available at https://usa.healthcare.siemens.com/magnetic-resonance-imaging/. After vectorization of these images, the original signal \({\bar {\mathbf {u}}}\) is of length n = 691200. In the CS application for sparse MRI, we take the sensing matrix A to be a partial discrete Fourier transform matrix. Note that we compute matrix-vector multiplications involving A and AT using the fast Fourier transform and the inverse fast Fourier transform, respectively. Throughout this part, we use the AFISTA-BB algorithm with parameters \( \tilde {\lambda }=1\times 10^{-10}\), L = 1, and the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<5\times 10^{-3}\).
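The products with the partial DFT matrix and its adjoint can be implemented with the FFT; the following sketch uses the orthonormal scaling, so that the sensing matrix satisfies the orthogonality assumption of Section 3.4 (the function names are ours):

```matlab
% Matrix-vector products with the partial DFT matrix via the FFT (sketch)
% mask is a logical n-vector marking the m sampled frequencies
function y = A_mul(u, mask, n)
    Fu = fft(u) / sqrt(n);       % orthonormal DFT, so that A*A' = I
    y  = Fu(mask);               % keep only the sampled frequencies
end

function u = At_mul(y, mask, n)
    Fu = zeros(n, 1);
    Fu(mask) = y;                % zero-fill the missing frequencies
    u = ifft(Fu) * sqrt(n);      % orthonormal inverse DFT (the adjoint)
end
```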

Considering the importance of the sparsity assumption, we first start with angiogram images, which are usually sparse in the spatial domain. We used four angiogram images with different sparsity levels, called angio1, angio2, angio3, and angio4; the first three are used, from top to bottom, in Fig. 10 and the last one in Fig. 11. Figure 10 shows the results of our first test on angiography images. In this figure, we compare the AFISTA-BB algorithm with the split Bregman method [18]; the Matlab code for the split Bregman method can be downloaded from http://tag7.web.rice.edu/Split_Bregman.html. The obtained results show that, for sufficiently sparse images, AFISTA-BB works better than split Bregman in both convergence rate and solution accuracy. In other words, AFISTA-BB is faster than split Bregman, and the quality of the images reconstructed by AFISTA-BB is better. Details of the results obtained in this test are presented in Table 6. Figure 11 shows the results of our second test on angiography images. In this test, we use the image angio4, which is 52% sparse. Compared to the images used in Fig. 10, which are about 70% sparse, this image is not sparse enough to be reconstructed from 30% of its frequency-domain data. From Fig. 11, we can see that the AFISTA-BB algorithm failed to produce a proper reconstruction using 30% of the frequency-domain data.

Table 6 Details of the results obtained in Fig. 10
Fig. 10: CS reconstruction of MR angiography images using 30% of the frequency-domain data. Columns from left to right: original images; images reconstructed from 30% of the frequency-domain data, with the missing samples filled with zeros; images reconstructed by the split Bregman algorithm after 8 inner iterations (3 outer iterations); and images reconstructed by AFISTA-BB using the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }\leq 5\times 10^{-3}\)

Fig. 11: CS reconstruction of a 50% sparse MR angiography image. Top: original image. Middle left: image reconstructed from 30% of the frequency-domain data, with the missing samples filled with zeros. Middle right: result of the AFISTA-BB algorithm using 30% of the frequency-domain data. Bottom left: image reconstructed from 50% of the frequency-domain data, with the missing samples filled with zeros. Bottom right: result of the AFISTA-BB algorithm using 50% of the frequency-domain data

This figure also shows that the AFISTA-BB algorithm works well in reconstructing the image angio4 using 50% of its frequency-domain data. Details of the results obtained in this test are presented in Table 7. Figure 11 illustrates the importance of the sparsity assumption in the BP problem; it is clear that the AFISTA method will not work for other MRI images with significantly lower sparsity levels.

Table 7 Details of the results obtained in Fig. 11

As the last test, we applied AFISTA-BB to MRI images in the wavelet domain. Here, the original signal is taken to be the wavelet coefficients of the MRI images, where the Haar wavelet is used to decompose the images up to five resolution levels. The results of this test are presented in Fig. 12 and its details in Table 8.

Fig. 12: CS reconstruction of MRI images in the wavelet domain (the original signal is taken to be the wavelet coefficients of the MRI images) using 25% of the frequency-domain data. Columns from left to right: original images; images reconstructed from 25% of the frequency-domain data, with the missing samples filled with zeros; and images reconstructed by the AFISTA-BB algorithm using the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }\leq 5\times 10^{-3}\)

Table 8 Details of the results presented in Fig. 12

4.4 Image deconvolution

For our final experiment, we test the proposed AFISTA-BB on the image deconvolution problem. In other words, we set the matrix A to be RW, in which R is a matrix representation of the blur operation and W denotes the inverse wavelet transform. Following [17], the blur kernel is set to \(h_{ij} = \frac {1}{i^{2} + j^{2}}\). Here, we use two well-known test images, Lena and Cameraman, and we compare the proposed method with the GPSR-BB algorithm [17]; both the paper and the code of the GPSR-BB algorithm are available at http://www.lx.it.pt/~mtf/GPSR/. For the GPSR algorithm, we use the demo "demo_image_deblur.m" with default settings for its parameters, except for the parameter "tolA". For the implementation of the discrete wavelet transform, both algorithms use the Rice wavelet toolbox, which can be freely downloaded from http://www-dsp.rice.edu/software/rwt.shtml. Throughout this part, we use the AFISTA-BB algorithm with parameters \( \tilde {\lambda }=1\times 10^{-7}\), \(L = 1\times 10^{3}\), and the stopping criterion \(\frac {\Vert Au^{k}-b\Vert }{\Vert b\Vert }<\epsilon \). The results of the image deconvolution experiments are reported in Table 9 and Figs. 13 and 14, which show the efficiency of the proposed method. Note that in this test the quality of the images reconstructed by the GPSR method is lower than that of the proposed method, as the PSNR values of the reconstructed images show. For example, from Table 9, we can see that for the Lena test image GPSR-BB reached PSNR = 33.69 in 15.28 seconds, whereas AFISTA-BB reached PSNR = 39.50 in 12.55 seconds. So AFISTA-BB reaches a better reconstruction quality in less time than GPSR-BB. Thus, this test shows that the AFISTA method outperforms the GPSR-BB method in both convergence rate and accuracy.
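A sketch of the blur kernel construction; the value at the center (i = j = 0), where the formula is undefined, and the normalization are our assumptions:

```matlab
% Blur kernel h_ij = 1/(i^2 + j^2) of size [9,9] (sketch)
r = 4;                           % kernel radius: size (2r+1) x (2r+1)
[i, j] = meshgrid(-r:r, -r:r);
h = 1 ./ (i.^2 + j.^2);
h(r+1, r+1) = 1;                 % center value where the formula is undefined: our assumption
h = h / sum(h(:));               % normalize so that the kernel sums to one
```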

Table 9 Image deconvolution test: Comparison of the AFISTA-BB with the GPSR-BB [17] for different tolerances
Fig. 13: Image deconvolution test with a blur kernel of size [9,9]. Top left: original image. Top right: blurred image. Bottom left: image reconstructed by the GPSR-BB algorithm, with PSNR 28.09. Bottom right: image reconstructed by AFISTA-BB, with PSNR 57.60

Fig. 14: Image deconvolution test with a blur kernel of size [17,17]. Top left: original image. Top right: blurred image. Bottom left: image reconstructed by the GPSR-BB algorithm, with PSNR 32.49. Bottom right: image reconstructed by AFISTA-BB, with PSNR 50.54

5 Conclusion

This paper proposed AFISTA, which accelerates the FISTA method using a specific continuation strategy, and presented a convergence analysis of the proposed method. Although the main computational cost of the two methods at each iteration remains almost the same, thanks to the continuation strategy the AFISTA method is significantly faster than the FISTA method. Furthermore, to improve its convergence rate, we furnished the AFISTA method with the BB stepsize. AFISTA was then specialized for the orthogonal case, i.e., AAT = I. A fundamental difference between the proposed method and the FISTA method is that the proposed method solves the BP problem while FISTA solves the LASSO problem. As described above, AFISTA was applied to various signal and image processing tasks and compared with other methods, including generalizations of linearized Bregman, fixed-point continuation (FPC), split Bregman, and gradient projection for sparse reconstruction (GPSR). Numerical results show that in most cases the proposed method is better than the other methods in both convergence rate and reconstruction quality. The main limitation of the proposed method is that its parameters need to be set beforehand. The application of AFISTA to medical image processing is our future work.