1 Introduction

In this paper, H denotes a real separable Hilbert space, and 〈⋅,⋅〉 and ∥⋅∥ are the scalar product and the norm in H, respectively. We assume that {φi}i∈Λ \(({\Lambda } \subset \mathbb {N})\) is an orthonormal basis of the Hilbert space H. For each u ∈ H, we write u = (ui)i∈Λ, where ui = 〈u,φi〉 is the i-th component of u with respect to the basis {φi}i∈Λ. Note that all results of this paper hold not only for an orthonormal basis but also for a frame in the real separable Hilbert space H; however, for simplicity, we assume that {φi}i∈Λ \(({\Lambda } \subset \mathbb {N})\) is an orthonormal basis of H. We study the proximal method for the minimization problem

$$ \min_{u\in H} {\Theta}(u):= F(u)+\alpha{\Phi}(u), $$
(1)

where \(F:H \rightarrow \mathbb {R}\) is a Fréchet differentiable functional and \({\Phi }:H \rightarrow \bar {\mathbb {R}}:= \mathbb {R}\cup {\{+\infty \}}\) is defined by

$$ {\Phi}(u):=\| u\|_{0}={\sum}_{i\in {\Lambda}} |u_{i}|_{0}, $$
(2)

with

$$ |x|_{0}=\begin{cases}1 , & \text{ if } x\not= 0\\ 0,& \text{ if } x=0. \end{cases} $$
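When u is represented by its coefficient vector, Φ(u) simply counts the nonzero coefficients. A minimal Python sketch, assuming the coefficients are stored in a NumPy array (the tolerance argument is our addition for floating-point use; the exact definition corresponds to tol = 0):

```python
import numpy as np

def phi_l0(u, tol=0.0):
    """Count the nonzero coefficients of u, i.e., Phi(u) = ||u||_0.

    `tol` is a hypothetical tolerance for treating tiny entries as zero
    in floating-point arithmetic; the exact definition uses tol = 0.
    """
    u = np.asarray(u, dtype=float)
    return int(np.sum(np.abs(u) > tol))
```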

Problem (1) arises in the l0-regularization of inverse problems, where we want to find a solution u of the operator equation

$$ K(u)=y, $$
(3)

where \(K: H\rightarrow H\) is a Fréchet differentiable operator (linear or nonlinear), y is the exact data, which is unknown, and we only have access to noisy data yδ of y with

$$ \| y-y^{\delta} \|_{H} \leq \delta. $$
(4)

Further, we assume that problem (3) is ill-posed, i.e., the solutions to the problem do not depend continuously on the data [5]. Therefore, the practical problem is to find an approximation of the solution given the operator K and the noisy data yδ. Applying l0-regularization to this problem, the regularized approximate solutions to (3)–(4) are obtained as minimizers of the minimization problem

$$ \min_{u\in H} {\Theta}(u):= F(u;y^{\delta})+\alpha{\Phi}(u), $$
(5)

where \(F(\cdot ;y^{\delta }):H \rightarrow \mathbb {R}\), measuring the difference between K(u) and yδ, is a Fréchet differentiable functional. Note that problem (5) is an instance of problem (1).

The advantage of l0-regularization is that it preserves the edges between homogeneous and nonhomogeneous regions. Thus, l0-regularization is used in many applications such as signal and image analysis, wavelet frames, compressive sensing, dictionary building, machine learning, and actuator location problems. For linear inverse problems, Blumensath and Davies in [2, 3] have proposed an iterative thresholding scheme and proven the local convergence of this scheme. Needell and Tropp have presented compressive sampling matching pursuit (CoSaMP). Foucart [7] has improved CoSaMP by introducing hard thresholding pursuit, based on a combination of iterative hard thresholding and the compressive sampling matching pursuit algorithm. Another method has been suggested by Lu et al. in [9]. Nikolova and Soubies have studied global solutions in detail, see [13, 14]. Recently, Wachsmuth has studied hard iterative thresholding for optimal control problems with L0(Ω) control cost in the space L2(Ω) in [15]. For nonlinear inverse problems, however, there are few algorithms for solving problem (1) in a real separable Hilbert space.

In this paper, we propose a proximal method for problem (5) in a real separable Hilbert space based on the ideas of the proximal method in [8, 11]. Note that since the functional Φ is nonconvex, problem (1) differs from the minimization problem arising from l1-regularization in [8, 11]. Therefore, the results and the convergence proofs for the proximal method in our paper are different from those in [8, 11]. Furthermore, the iterative thresholding [2] and the iterative hard thresholding [3] are special cases of our proposed method, since they apply only to linear inverse problems, i.e., to problem (5) with F(u;yδ) = ∥Ku − yδ∥2 and K a linear operator. Thus, the results in this paper extend those in [2, 3] to nonlinear inverse problems.

The main results in our paper are

  • The existence and the optimality condition for solutions to problem (1).

  • The convergence of the proximal method for problem (1) and the practical choice for step sizes.

The paper is organized as follows: In Section 2 we give an explicit formula for the minimizers of the quadratic approximation of the cost functional. In Section 3, we study the existence and the optimality condition for solutions to problem (1). Section 4 is devoted to proving the convergence of the proximal method. A method of choosing step sizes is given in Section 5. Finally, we illustrate the performance of the proximal method by applying it to an identification problem in elliptic equations in Section 6.

2 First-order condition for global minimizers of the quadratic approximation of the cost functional

The proximal methods developed in [2,3,4, 11] are based on quadratic approximate functionals and proximal operators. In order to present the proximal method as well as the first-order optimality condition for problem (1), we first introduce the proximal operator for l0-regularization and the quadratic approximate functionals.

The proximal operator (also known as the hard-thresholding operator) is defined by

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{\lambda}(u)=\left( {P}_{\lambda}(u_{1}),{P}_{\lambda}(u_{2}),\dots,{P}_{\lambda}(u_{n}), \ldots\right), u\in H, \end{array} $$
(6)

where

$$ \begin{array}{@{}rcl@{}} {P}_{\lambda}(x)=\begin{cases} 0 &\text{ if } |x|< \sqrt{\lambda}\\ \{0,x\} &\text{ if } |x|= \sqrt{\lambda}\\ x&\text{ if } |x| > \sqrt{\lambda},\end{cases}\quad (x\in \mathbb{R}). \end{array} $$
(7)

Note that \(\mathbb {P}_{\lambda }: H \rightrightarrows H\) is a multi-valued mapping.
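For illustration, the componentwise operator (7) can be realized as follows; this is a minimal Python sketch, assuming an element of H is represented by the NumPy array of its coefficients (the function name hard_threshold is ours). At the tie |x| = √λ both 0 and x are minimizers, so the sketch keeps 0 by convention and flags the tied components:

```python
import numpy as np

def hard_threshold(x, lam):
    """Componentwise hard-thresholding operator P_lambda from (7).

    Returns x_i where |x_i| > sqrt(lam) and 0 where |x_i| < sqrt(lam).
    At the tie |x_i| = sqrt(lam) both 0 and x_i are admissible; we keep 0
    by convention and report the tied components in a boolean mask.
    """
    x = np.asarray(x, dtype=float)
    thr = np.sqrt(lam)
    out = np.where(np.abs(x) > thr, x, 0.0)
    ties = np.isclose(np.abs(x), thr)   # components with two admissible values
    return out, ties
```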

For each fixed s > 0, the so-called quadratic approximate functional of Θ(v) at a given point u is defined as follows:

$$ \begin{array}{@{}rcl@{}} {\Theta}_{s}(v,u)&:=F(u)+\langle \nabla F(u),v-u\rangle+\frac{s}{2}\|v-u\|^{2}+\alpha{\Phi}(v). \end{array} $$
(8)

Using the proximal operator, we can show that the minimization problem, \(\min \limits _{v\in H } {\Theta }_{s}(v,u)\), has at least one solution. This result is given in the following theorem.

Theorem 2.1

Let F ∈ C1(H). Then, for any fixed u ∈ H, the problem

$$\min_{v\in H} {\Theta}_{s}(v,u)$$

has at least one solution and u+ is a solution if and only if

$$u^{+} \in \mathbb{P}_{\frac{2\alpha}{s}}\left( u-\frac{1}{s}\nabla F(u)\right).$$

Proof

By a simple calculation, ignoring terms that do not depend on v, the minimization problem, \(\min \limits _{v\in H } {\Theta }_{s}(v,u)\), is equivalent to

$$\min_{v\in H }\left\{\frac{1}{2}\left\|v-\left( u-\frac{1}{s}\nabla F(u)\right)\right\|^{2}+\frac{\alpha}{s}{\Phi}(v)\right\}.$$

This problem can be rewritten as

$$ \begin{array}{@{}rcl@{}}\min_{v\in H }\left\{{\sum}_{i\in {\Lambda}}\frac{1}{2}\left( v_{i}-\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)\right)^{2}+\frac{\alpha}{s}|v_{i}|_{0}\right\}.\end{array} $$
(9)

Therefore, the minimum of the problem, \(\min \limits _{v\in H } {\Theta }_{s}(v,u)\), can be computed by minimizing with respect to each vi individually. For each i we distinguish two cases, vi = 0 and vi ≠ 0. In the first case, by the definition of |⋅|0, we have

$$\frac{1}{2}\left( v_{i}-\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)\right)^{2}+\frac{\alpha}{s}|v_{i}|_{0}=\frac{1}{2}\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)^{2}.$$

In the second case, we have

$$\frac{1}{2}\left( v_{i}-\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)\right)^{2}+\frac{\alpha}{s}|v_{i}|_{0}\ge\frac{\alpha}{s},$$

where the equality sign is obtained when \(v_{i}=u_{i}-\frac {1}{s}(\nabla F(u))_{i}\).

Comparing the values for both cases, the minimum of component i of problem (9) is

$$\min\left( \frac{1}{2}\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)^{2},\frac{\alpha}{s}\right)$$

that is equal to

$$\begin{cases}\frac{1}{2}\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)^{2}, & \text{ if }|u_{i}-\frac{1}{s}(\nabla F(u))_{i}|< \sqrt{\frac{2\alpha}{s}}, \\ \frac{1}{2}\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right)^{2}=\frac{\alpha}{s}, & \text{ if }|u_{i}-\frac{1}{s}(\nabla F(u))_{i}|= \sqrt{\frac{2\alpha}{s}}, \\ \frac{\alpha}{s},&\text{ if } |u_{i}-\frac{1}{s}(\nabla F(u))_{i}|> \sqrt{\frac{2\alpha}{s}}. \end{cases}$$

Note that the first value is obtained when vi = 0, the second one is obtained when vi = 0 or \(v_{i}= u_{i}-\frac {1}{s}(\nabla F(u))_{i}\), and the last value is obtained when \(v_{i}= u_{i}-\frac {1}{s}(\nabla F(u))_{i}\not =0\), i.e., the minimum of component i is attained at

$$v_{i} \in {P}_{\frac{2\alpha}{s}}\left( u_{i}-\frac{1}{s}(\nabla F(u))_{i}\right).$$

Therefore, the problem, \(\min \limits _{v\in H } {\Theta }_{s}(v,u)\), has a solution

$$u^{+}=(v_{1},v_{2},\ldots,v_{n},\ldots),$$

or

$$u^{+} \in \mathbb{P}_{\frac{2\alpha}{s}}\left( u-\frac{1}{s}\nabla F(u)\right).$$

□
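In computations, Theorem 2.1 says that minimizing Θs(⋅,u) amounts to a gradient step followed by hard thresholding with level 2α/s. The following is a small Python sketch of this step, selecting the value 0 at ties for definiteness; the example with F(u) = ½∥u − b∥² (so ∇F(u) = u − b) is ours, purely for illustration:

```python
import numpy as np

def prox_step(u, grad_F, s, alpha):
    """One step u_plus in P_{2*alpha/s}(u - grad_F(u)/s), cf. Theorem 2.1."""
    z = u - grad_F(u) / s
    return np.where(np.abs(z) > np.sqrt(2.0 * alpha / s), z, 0.0)

# Illustrative example with F(u) = 0.5*||u - b||^2, so grad F(u) = u - b and L = 1.
b = np.array([0.9, 0.05, -0.4, 0.0])
u = np.zeros_like(b)
u_plus = prox_step(u, lambda v: v - b, s=1.0, alpha=0.1)
print(u_plus)   # components of b below sqrt(2*alpha/s) are set to zero
```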

3 Existence of optimal solutions and optimality conditions

Before presenting the results on the existence of optimal solutions and the optimality conditions for problem (1), we recall some basic notions that are not frequently used. Other basic notions such as convexity, Lipschitz continuity, and Lipschitz Fréchet differentiability are well known in calculus or convex analysis, so we do not recall them here.

Definition 3.1

Let \(G: H \rightarrow \mathbb {R}\) be a functional.

  • G is called bounded from below if there exists a constant C such that G(u) ≥ C for all u ∈ H.

  • G is called coercive if \(G(u) \rightarrow \infty \) as \(\|u\|\rightarrow \infty \).

  • G is called lower semi-continuous if for every u ∈ H and every sequence \(\{u^{k}\}, u^{k}\rightarrow u\) as \(k\rightarrow \infty \), we have

    $$G(u)\leq \liminf_{k\rightarrow\infty} G(u^{k}).$$

In our paper, the functionals F and Φ in problem (1) possess some of these properties. In fact, F is assumed to be bounded from below, coercive, and lower semi-continuous (see Assumption 1), and Φ is lower semi-continuous (see Lemma 3.1). In order to prove the existence of solutions and give the optimality conditions for problem (1), as well as the convergence of the proximal method, we need the following assumption.

Assumption 1

Assume that

  • a) F is coercive, bounded from below, and lower semi-continuous.

  • b) F is Lipschitz Fréchet differentiable with the Lipschitz constant L, i.e.,

    $$\| \nabla F(u)-\nabla F(v)\|\leq L\|u-v\|, \forall u,v \in H .$$

Note that in Assumption 1, Condition a) is used to ensure the existence of a solution to problem (1), and Condition b) is used to obtain the optimality condition for solutions to problem (1) as well as the convergence of the proximal method for this problem. In fact, by Condition b) we have for \(s\geqslant L\)

$$| F(v)-F(u)-\langle \nabla F(u),v-u\rangle | \leqslant \frac{s}{2}\|v-u\|^{2}.$$

This is the key inequality used to prove the main results of the paper.
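For completeness, this inequality follows from Condition b) by applying the fundamental theorem of calculus to t ↦ F(u + t(v − u)) together with the Cauchy–Schwarz inequality:

$$ |F(v)-F(u)-\langle \nabla F(u),v-u\rangle| = \left|{\int}_{0}^{1}\langle \nabla F(u+t(v-u))-\nabla F(u),v-u\rangle dt\right| \leq {\int}_{0}^{1} Lt\|v-u\|^{2} dt = \frac{L}{2}\|v-u\|^{2}\leq \frac{s}{2}\|v-u\|^{2}. $$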

Lemma 3.1

The functional Φ defined by (2) is non-negative, nonconvex, and lower semi-continuous.

Proof

The first two properties are clear. The last property follows from the lower semi-continuity of |x|0. □

Note that the functional Φ in l0-regularization (given by (2)) is nonconvex and not coercive. Thus, problem (1) is nonconvex as well. To ensure the existence of a solution to problem (1), we need some properties of the functional F. Sufficient conditions on F are given in the following lemma.

Lemma 3.2

Assume that F satisfies Condition a) of Assumption 1. Then, problem (1) has at least one solution (global minimizer).

Proof

The proof is standard and is left to the reader. □

Theorem 3.1

Assume that F satisfies Condition b) of Assumption 1. Then, u∗ is a solution (global minimizer) of problem (1) if and only if

$$ \begin{array}{@{}rcl@{}} u^{*} \in \mathbb{P}_{\frac{2\alpha}{s^{*}}}\left( u^{*}-\frac{1}{s^{*}}\nabla F(u^{*})\right), \end{array} $$
(10)

for any fixed s∗ ≥ L.

Proof

Since u∗ is a global minimizer of Θ, we have Θ(u∗) ≤ Θ(u∗ + h) for any h ∈ H. For s∗ ≥ L, we have

$$ \begin{array}{@{}rcl@{}}{\Theta}_{s^{*}}(u^{*}+h,u^{*})-{\Theta}_{s^{*}}(u^{*},u^{*})&\geq&{\Theta}_{s^{*}}(u^{*}+h,u^{*})-{\Theta}(u^{*}+h)\\ &=&F(u^{*})-F(u^{*}+h)+\langle \nabla F(u^{*}),h\rangle+\frac{s^{*}}{2}\|h\|^{2}\geq 0. \end{array} $$

Therefore, \({\Theta }_{s^{*}}(u^{*}+h,u^{*})\geq {\Theta }_{s^{*}}(u^{*},u^{*})\) for any h ∈ H. This means that u∗ is a global minimizer of \({\Theta }_{s^{*}}(\cdot ,u^{*})\). The result then follows from Theorem 2.1. □

Definition 3.2

Any point u∗ satisfying (10) with some s∗ ≥ L is called a stationary point of Θ.

Remark 3.1

The condition s∗ ≥ L is necessary to obtain the last inequality in the proof of Theorem 3.1. For s∗ < L we cannot prove such a result. This is different from l1-regularization, for which a similar result can be proved for any s > 0, see, e.g., [1, 10]. Furthermore, Condition (10) is established only for global minimizers of problem (1); we cannot prove such a result for local minimizers. This is another difference compared with l1-regularization, see, e.g., [10].

4 Proximal method

The main idea of the proximal method is based on the quadratic approximation method. In fact, at the k-th iteration, uk+1 is determined by solving the quadratic approximation of Θ at uk, i.e.,

$$u^{k+1}\in \text{argmin}_{v\in H } {\Theta}_{s^{k}}(v,u^{k})$$

and then we propose conditions on sk > 0 such that {uk} (along a subsequence) converges to a stationary point of the original problem. Here, the functional Θs(v,u) is defined by (8).

By Theorem 2.1, the sequence \(\{u^{k+1} \in \text {argmin}_{v\in H } {\Theta }_{s^{k}}(v,u^{k})\}\) is then given by

$$ \begin{array}{@{}rcl@{}} u^{k+1}\in \mathbb{P}_{\frac{2\alpha}{s^{k}}}\big(u^{k}-\frac{1}{s^{k}}\nabla F(u^{k})\big). \end{array} $$
(11)

Now we prove the convergence of this sequence. We split the result into the following lemmas and theorems.

Lemma 4.1

Let Assumption 1 hold and let {uk} be a sequence defined by (11), where the sequence {sk} satisfies sk ≥ L + 𝜖 with 𝜖 > 0 for all \(k\in \mathbb {N}\). Then, the sequence {Θ(uk)} is monotonically decreasing, the sequence {uk} is bounded, and \(\lim _{k\rightarrow \infty } \|u^{k+1}-u^{k}\|=0\).

Proof

Since F is Lipschitz differentiable with Lipschitz constant L, for \(s\geqslant L\) we have

$$| F(v)-F(u)-\langle \nabla F(u),v-u\rangle | \leqslant \frac{s}{2}\|v-u\|^{2}.$$

Hence, with \(s\geqslant L\) we obtain

$${\Theta}(v)=F(v)+\alpha{\Phi}(v)\leqslant F(u)+\langle \nabla F(u),v-u\rangle +\frac{s}{2}\|v-u\|^{2}+\alpha{\Phi}(v)={\Theta}_{s}(v,u).$$

Consequently, we have (since sk > L)

$${\Theta}(u^{k+1})\leqslant {\Theta}_{s^{k}}(u^{k+1},u^{k}).$$

In addition, since uk+1 is a minimizer of \({\Theta }_{s^{k}}(\cdot ,u^{k})\) (see (11)), we have

$${\Theta}(u^{k+1})\leq{\Theta}_{s^{k}}(u^{k+1},u^{k})\le{\Theta}_{s^{k}}(u^{k},u^{k})={\Theta}(u^{k}).$$

Thus, the sequence {Θ(uk)} is monotonically decreasing.

The boundedness of {uk} is a consequence of the monotonic decrease of {Θ(uk)} together with the boundedness from below and the coercivity of Θ.

For each k, since L is the Lipschitz constant of ∇F and sk > L, we have

$$ \begin{array}{@{}rcl@{}} {\Theta}(u^{k+1})&=&F(u^{k+1})+\alpha{\Phi}(u^{k+1})\\ &\le& F(u^{k})+\langle\nabla F(u^{k}),u^{k+1}-u^{k}\rangle+\frac{L}{2}\|u^{k+1}-u^{k}\|^{2}+\alpha{\Phi}(u^{k+1})\\ &\leq& F(u^{k})+\langle\nabla F(u^{k}),u^{k+1}-u^{k}\rangle+\frac{s^{k}}{2}\|u^{k+1}-u^{k}\|^{2}+\alpha{\Phi}(u^{k+1})\\ &=&\min_{u\in H } {\Theta}_{s^{k}}(u,u^{k})\\ &\leq& F(u^{k})+\alpha{\Phi}(u^{k})={\Theta}(u^{k}). \end{array} $$

This implies that

$${\Theta}(u^{k})-{\Theta}(u^{k+1})\geq\frac{s^{k}-L}{2}\|u^{k+1}-u^{k}\|^{2}\ge\frac{\epsilon}{2}\|u^{k+1}-u^{k}\|^{2}$$

or

$$\frac{2}{\epsilon}\left( {\Theta}(u^{k})-{\Theta}(u^{k+1})\right)\geq\|u^{k+1}-u^{k}\|^{2}.$$

Summing the above inequalities over k = 0,1,...,m yields

$$\frac{2}{\epsilon}\left( {\Theta}(u^{0})-{\Theta}(u^{m+1})\right)\geq{\sum}_{k=0}^{m}\|u^{k+1}-u^{k}\|^{2},\quad\quad\forall m.$$

Since Θ is bounded from below, the partial sums on the right-hand side are bounded, so the series \({\sum }_{k=0}^{\infty }\|u^{k+1}-u^{k}\|^{2}\) converges. Therefore, \(\lim _{k\rightarrow \infty }\|u^{k+1}-u^{k}\|=0\). □

Theorem 4.1 (Convergence)

Let F satisfy Assumption 1 and let {uk} be a sequence defined by (11), where \(s^{k}\in [L+\epsilon ,\overline {s}], \forall k\), with 𝜖 > 0 and \(\overline {s}\geq L+\epsilon \). Then, there exists a subsequence \(\{u^{k_{j}}\}\) of {uk} weakly converging to some u∗ ∈ H as \(j\rightarrow \infty \), and u∗ is a stationary point of Θ.

Proof

By Lemma 4.1, the sequence {Θ(uk)} is decreasing and the sequence {uk} is bounded. Since ∇F is Lipschitz continuous, the sequence \(\{\nabla F(u^{k})\}_{k}\) is bounded as well. Therefore, there exists a subsequence \(\{u^{k_{j}}\}\) of {uk} such that \(\{u^{k_{j}}\}\) converges weakly to some u∗ ∈ H, \(\{\nabla F(u^{k_{j}})\}\) converges weakly to ∇F(u∗) ∈ H, and \(\{s^{k_{j}}\}\) converges to \(s^{*}\in [L+\epsilon ,\overline {s}]\) as \(j\rightarrow \infty \). By Lemma 4.1, \(\{u^{k_{j}+1}\}\) also converges weakly to u∗. We define

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{0}&=&\left\{i| |u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}|<\sqrt{\frac{2\alpha}{s^{*}}}\right\},\\ {\Gamma}_{1}&=&\left\{i| |u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}|>\sqrt{\frac{2\alpha}{s^{*}}}\right\}, {\Gamma}={\Lambda}\backslash ({\Gamma}_{0}\cup{\Gamma}_{1}) \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} {{\Gamma}^{j}_{0}}&=&\left\{i| |u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}|<\sqrt{\frac{2\alpha}{s^{k_{j}}}}\right\},\\ {{\Gamma}^{j}_{1}}&=&\left\{i| |u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}|>\sqrt{\frac{2\alpha}{s^{k_{j}}}}\right\}, {\Gamma}^{j}={\Lambda}\backslash ({{\Gamma}^{j}_{0}}\cup {{\Gamma}^{j}_{1}}). \end{array} $$

We consider three cases:

Case 1 (for each i ∈ Γ0): Since \(\{u^{k_{j}}-\frac {1}{s^{k_{j}}}\nabla F(u^{k_{j}})\}\) converges componentwise to \(u^{*}-\frac {1}{s^{*}}\nabla F(u^{*})\) and \(\{s^{k_{j}}\}\) converges to s∗, there exists \(j^{*}\in \mathbb {N}\) such that for all j ≥ j∗ we have \(i\in {{\Gamma }^{j}_{0}}\). Therefore, if j ≥ j∗, then

$$u^{k_{j}+1}_{i}=P_{\frac{2\alpha}{s^{k_{j}}}}\left( u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}\right)=0.$$

Letting \(j\rightarrow \infty \), we obtain

$$u^{*}_{i}=0.$$

This implies that

$$ 0=u^{*}_{i}=P_{\frac{2\alpha}{s^{*}}}\left( u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}\right).$$

Case 2 (for each i ∈ Γ1): Since \(\{u^{k_{j}}-\frac {1}{s^{k_{j}}}\nabla F(u^{k_{j}})\}\) converges componentwise to \(u^{*}-\frac {1}{s^{*}}\nabla F(u^{*})\) and \(\{s^{k_{j}}\}\) converges to s∗, there exists \(j^{*}\in \mathbb {N}\) such that for all j ≥ j∗ we have \(i\in {{\Gamma }^{j}_{1}}\). Therefore, if j ≥ j∗, then

$$u^{k_{j}+1}_{i}=P_{\frac{2\alpha}{s^{k_{j}}}}\left( u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}\right)=u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}.$$

Letting \(j\rightarrow \infty \), we obtain

$$u^{*}_{i}=u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}.$$

This implies that

$$(\nabla F(u^{*}))_{i}=0, |u^{*}_{i}|>\sqrt{\frac{2\alpha}{s^{*}}}\text{ and } u^{*}_{i}=P_{\frac{2\alpha}{s^{*}}}\left( u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}\right).$$

Case 3 (for each i ∈Γ): We have

$$u^{k_{j}+1}_{i} \in P_{\frac{2\alpha}{s^{k_{j}}}}\left( u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}\right)\subset\left\{0,u^{k_{j}}_{i}-\frac{1}{s^{k_{j}}}(\nabla F(u^{k_{j}}))_{i}\right\}.$$

Letting \(j\rightarrow \infty \), we obtain

$$u^{*}_{i}\in \left\{0,u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}\right\}.$$

Thus

$$u^{*}_{i}\in P_{\frac{2\alpha}{s^{*}}}\left( u^{*}_{i}-\frac{1}{s^{*}}(\nabla F(u^{*}))_{i}\right).$$

Combining the three cases, we obtain

$$u^{*}\in \mathbb{P}_{\frac{2\alpha}{s^{*}}}\left( u^{*}-\frac{1}{s^{*}}\nabla F(u^{*})\right).$$

□

5 Step size choice and realization of the proximal method

By Lemma 4.1, the condition for the convergence of the proximal method is that the parameters sk satisfy sk ≥ L + 𝜖 for some fixed 𝜖 > 0. In practice, there are the following two cases:

  (i) In the first case, the Lipschitz constant L is known.

  (ii) In the second one, the Lipschitz constant L is unknown or very hard to find.

To ensure the convergence of the proximal method, in the first case we can choose sk = L + 𝜖 with 𝜖 > 0 for all k. However, the bigger the Lipschitz constant is, the smaller the step size is, and as a result the algorithm converges more slowly. Thus, we do not want to choose sk too large for all k. To overcome this situation, and for the second case, we propose a practical way to select the step sizes as follows: we choose s0 that is not too large and then choose the parameter sk satisfying the condition \({\Theta }(u^{k+1})\leq {\Theta }_{s^{k}}(u^{k+1},u^{k})\) by a backtracking rule. With this choice, the sequence {Θ(uk)} is monotonically decreasing (see the proof of Lemma 4.1). Two cases may occur: in the first case we have sk ≤ L for all k and the conditions of Lemma 4.1 are not satisfied, i.e., the proximal method may not converge; in the second case there exists a k∗ such that \(s^{k}>L, \forall k\ge k^{*}\), i.e., the proximal method converges. In practice, with the backtracking rule the first case rarely happens, especially when the number of iterations is large, since the sequence {sk} is increasing. Thus, the above method of choosing step sizes is practical.

Since the proximal operator is multi-valued, there are many sequences {uk} generated by the proximal method. In practice we only need one of them. Thus, we use one of the following two single-valued modified proximal operators:

$$ \begin{array}{@{}rcl@{}} \mathbb{P}^{l}_{\lambda}(u)=\left( {P}^{l}_{\lambda}(u_{1}),{P}^{l}_{\lambda}(u_{2}),\dots,{P}^{l}_{\lambda}(u_{n}), \ldots\right), u\in H, l\in \{1,2\} \end{array} $$
(12)

where

$$ \begin{array}{@{}rcl@{}} {P}^{1}_{\lambda}(x)=\begin{cases} 0 &\text{ if } |x|\leq \sqrt{\lambda}\\ x&\text{ if } |x| > \sqrt{\lambda}\end{cases}, {P}^{2}_{\lambda}(x)=\begin{cases} 0 &\text{ if } |x|< \sqrt{\lambda}\\ x&\text{ if } |x| \geq \sqrt{\lambda}\end{cases} \quad (x\in \mathbb{R}). \end{array} $$
(13)
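As an illustration, the two single-valued selections in (13) can be sketched in Python as follows, assuming an element of H is represented by the NumPy array of its coefficients (the function name prox_hard is ours); the only difference between the two variants is how the tie |x| = √λ is resolved:

```python
import numpy as np

def prox_hard(x, lam, variant=1):
    """Single-valued hard-thresholding selections P^1_lambda and P^2_lambda from (13)."""
    x = np.asarray(x, dtype=float)
    thr = np.sqrt(lam)
    if variant == 1:      # P^1: the tie |x| = sqrt(lam) is set to 0
        return np.where(np.abs(x) > thr, x, 0.0)
    else:                 # P^2: the tie |x| = sqrt(lam) is kept as x
        return np.where(np.abs(x) >= thr, x, 0.0)
```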

Combining the choice of step sizes and the modified proximal operator, the iteration (11) is presented in Algorithm 1. We will illustrate Algorithm 1 with the different modified proximal operators in Section 6.

Algorithm 1
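To make the realization concrete, the following is a minimal Python sketch of the iteration described above: the proximal step (11) combined with the backtracking acceptance test Θ(uk+1) ≤ Θsk(uk+1,uk). The function name, the role of q as the backtracking factor, the safeguard s_max, and the stopping test on ∥uk+1 − uk∥ are our assumptions, not the authors' exact implementation.

```python
import numpy as np

def proximal_method(F, grad_F, u0, alpha, s0=1.0, q=0.5,
                    s_max=5e4, max_iter=500, tol=1e-8, variant=1):
    """Proximal method for min F(u) + alpha*||u||_0 with backtracking step sizes.

    The trial step size s is increased (divided by q in (0,1)) until the
    acceptance test Theta(u_new) <= Theta_s(u_new, u) of Lemma 4.1 holds.
    `variant` chooses between the selections P^1 and P^2 in (13).
    """
    def prox(x, lam):
        thr = np.sqrt(lam)
        return np.where(np.abs(x) > thr, x, 0.0) if variant == 1 \
            else np.where(np.abs(x) >= thr, x, 0.0)

    def theta(u):
        return F(u) + alpha * np.count_nonzero(u)

    u, s = np.asarray(u0, dtype=float), s0
    for _ in range(max_iter):
        g = grad_F(u)
        while True:
            u_new = prox(u - g / s, 2.0 * alpha / s)
            d = u_new - u
            # surrogate value Theta_s(u_new, u) from (8)
            surrogate = F(u) + g @ d + 0.5 * s * d @ d + alpha * np.count_nonzero(u_new)
            if theta(u_new) <= surrogate or s >= s_max:
                break
            s = s / q          # q in (0,1), so s increases
        if np.linalg.norm(u_new - u) < tol:
            return u_new
        u = u_new
    return u

# Illustrative call with a simple quadratic F (not the PDE example of Section 6):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.05])
u_hat = proximal_method(lambda u: 0.5 * (A @ u - b) @ (A @ u - b),
                        lambda u: A.T @ (A @ u - b),
                        u0=np.zeros(2), alpha=1e-2)
print(u_hat)
```

Note that, as in the discussion above, the accepted step size s is carried over to the next iteration, so the sequence {sk} is nondecreasing.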

6 Application to an inverse source problem

As a numerical example we consider the inverse problem of estimating the source y ∈ L2(Ω) in the elliptic boundary value problem

$$ \begin{array}{@{}rcl@{}} -\text{div} (a\nabla u)&=y \text{ in } {\Omega}\subset \mathbb{R}^{2}, u= u_{0} \text{ on } \partial {\Omega} \end{array} $$
(14)

from noisy data uδ of the exact solution.

We assume that Ω is a Lipschitz domain and \(a\in \mathcal {A}\), where

$$\mathcal{A}=\{a\in L^{\infty}({\Omega}): 0<\underline{a}< a < \overline{a}\}.$$

Then, for each \(y\in L^{2}({\Omega })\), \(a\in \mathcal {A}\), and u0 ∈ H1/2(∂Ω), problem (14) has a unique weak solution u ∈ H1(Ω), see, e.g., [6].

Let y∗ be the parameter to be recovered and u∗ the solution to (14) with the fixed parameters u0, a, and y∗. Let \(K:L^{2}({\Omega })\rightarrow H^{1}({\Omega }), y\mapsto u\), be the solution operator of (14) with fixed a and u0. The inverse problem is equivalent to solving the equation

$$ \begin{array}{@{}rcl@{}} K(y)=u^{*},\end{array} $$
(15)

where only a noisy approximation uδ of u∗ is available with

$$\|u^{*} - u^{\delta}\|_{L^{2}{({\Omega})}}\le \delta.$$

Using the least-squares approach together with l0-regularization, an approximation to y∗ is obtained as a solution to the following problem

$$ \begin{array}{@{}rcl@{}} \min_{y\in L^{2}({\Omega})}{\Theta}(y):=F(y)+\alpha {\sum}_{i=1}^{\infty} |y_{i}|_{0}, \end{array} $$
(16)

where

$$ \begin{array}{@{}rcl@{}} F(y)=\frac{1}{2}\|K(y)-u^{\delta}\|^{2}_{L^{2}({\Omega})}, \end{array} $$
(17)

α > 0 is a regularization parameter and yi = 〈y,φi〉, where \(\{\varphi _{i}\}_{i=1}^{\infty }\) is an orthonormal basis of L2(Ω).

The well-posedness of problem (16) and the convergence and convergence rates of its solutions to solutions of equation (15) are, to the best of our knowledge, still open, but they can be obtained by standard approaches in the theory of regularization for inverse problems. For example, Wang et al. have proved such results relating to l0-regularization for general operator equations in [16, 17]. Here, as an example, we solve problem (16) by our proximal method. The numerical results will illustrate the well-posedness of problem (16) and the convergence of its solutions to solutions of equation (15). Note that the proximal method can be applied to problem (16) since F satisfies Assumption 1, where the Lipschitz differentiability of F is given in the following lemma.

Lemma 6.1

The functional \(F:L^{2}({\Omega })\rightarrow \mathbb {R}\) is Fréchet differentiable at any point y ∈ L2(Ω), its Fréchet derivative \(F^{\prime }\) is Lipschitz continuous, and \(F^{\prime }(y)=p\), where p is the weak solution to the following problem

$$ \begin{array}{@{}rcl@{}} -\text{div}(a\nabla p)=u-u^{\delta} \text{ in } {\Omega}, p=0 \text{ on } \partial{\Omega}, \end{array} $$
(18)

where u = K(y).

Proof

We first recall the following well-known results:

  (a) \({H^{1}_{0}}({\Omega })\) is compactly embedded into L2(Ω). This implies that there exists a constant c > 0 such that

    $$ \|v\|_{L^{2}({\Omega})}\leq c\|v\|_{{H^{1}_{0}}({\Omega})}, \forall v\in {H^{1}_{0}}({\Omega}). $$
    (19)

  (b) In \({H^{1}_{0}}({\Omega })\) the norm \({\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert v \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }:=\sqrt {{\int \limits }_{\Omega } a \nabla v. \nabla v dx}\) is equivalent to \(\|v\|_{{H^{1}_{0}}({\Omega })}\), i.e., there exist constants c1,c2 > 0 such that

    $$ c_{1}\|v\|_{{H^{1}_{0}}({\Omega})}\leq {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert v \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}\leq c_{2}\|v\|_{{H^{1}_{0}}({\Omega})}, \forall v\in {H^{1}_{0}}({\Omega}). $$
    (20)

Using the above results, we now prove the Fréchet differentiability of F at y ∈ L2(Ω). Indeed, for δy ∈ L2(Ω) we have

$$ \begin{array}{@{}rcl@{}} 2(F(y+\delta y)-F(y))&=\langle K(y+\delta y)-K(y),K(y+\delta y)+K(y)-2u^{\delta} \rangle_{L^{2}({\Omega})} \\ &=2\langle u^{1}-u,u-u^{\delta} \rangle_{L^{2}({\Omega})}+\| u^{1}-u\|^{2}_{L^{2}({\Omega})} \\ &\stackrel{(19)}{\leq} 2\langle u^{1}-u,u-u^{\delta} \rangle_{L^{2}({\Omega})}+c^{2}\|u^{1}-u\|^{2}_{{H^{1}_{0}}({\Omega})} \\ &\stackrel{(20)}{\leq} 2\langle u^{1}-u,u-u^{\delta} \rangle_{L^{2}({\Omega})}+\frac{c^{2}}{{c_{1}^{2}}}{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u^{1}-u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}^{2}, \end{array} $$
(21)

where u = K(y),u1 = K(y + δy). Furthermore, by the formula of weak solutions we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega}a\nabla u. \nabla v dx&={\int}_{\Omega} y vdx, \forall v\in {H^{1}_{0}}({\Omega}), \end{array} $$
(22)
$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega}a\nabla u^{1}. \nabla v dx&={\int}_{\Omega} (y+\delta y) vdx, \forall v\in {H^{1}_{0}}({\Omega}). \end{array} $$
(23)

Subtracting both sides of equations (22) and (23) we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega}a\nabla (u^{1}-u). \nabla v dx&={\int}_{\Omega} \delta y vdx, \forall v\in {H^{1}_{0}}({\Omega}). \end{array} $$
(24)

Inserting v = u1 − u and v = p into (24) we obtain

$$ \begin{array}{@{}rcl@{}} {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u^{1}-u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}^{2}&={\int}_{\Omega} \delta y (u^{1}-u)dx, \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla (u^{1}-u).\nabla p dx&={\int}_{\Omega} \delta y pdx. \end{array} $$
(26)

By the formula of weak solution to (18) we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega}a\nabla p. \nabla v dx&={\int}_{\Omega} (u-u^{\delta}) vdx, \forall v\in {H^{1}_{0}}({\Omega}). \end{array} $$
(27)

Inserting \(v=u^{1}-u\in {H^{1}_{0}}({\Omega })\) into (27) we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega}a\nabla p. \nabla (u^{1}-u) dx&={\int}_{\Omega} (u-u^{\delta}) (u^{1}-u)dx. \end{array} $$
(28)

From (21), (25), (26) and (28) we have

$$ \begin{array}{@{}rcl@{}} 2(F(y+\delta y)-F(y))&\leq& 2{\int}_{\Omega} a\nabla p. \nabla (u^{1}-u) dx+\frac{c^{2}}{{c_{1}^{2}}}{\int}_{\Omega} \delta y (u^{1}-u)dx \\ &=&2{\int}_{\Omega} \delta y p dx+\frac{c^{2}}{{c_{1}^{2}}}{\int}_{\Omega} \delta y (u^{1}-u)dx \\ &\leq& 2{\int}_{\Omega} \delta y p dx+\frac{c^{2}}{{c_{1}^{2}}}\|\delta y\|_{L^{2}({\Omega})}\|u^{1}-u\|_{L^{2}({\Omega})} \\ &=&2{\int}_{\Omega} \delta y p dx+o(\|\delta y\|_{L^{2}({\Omega})}) \end{array} $$
(29)

since \(\|u^{1}-u\|_{L^{2}({\Omega })}\rightarrow 0\) as \(\delta y\rightarrow 0\) (due to the continuity of the operator K). On the other hand, since \(\| u^{1}-u\|^{2}_{L^{2}({\Omega })}\geq 0\), we also have \(2(F(y+\delta y)-F(y))\geq 2\langle u^{1}-u,u-u^{\delta } \rangle _{L^{2}({\Omega })}=2{\int \limits }_{\Omega } \delta y p dx\). Together with (29), this implies that F is Fréchet differentiable and \(F^{\prime }(y)=p\), where p is the weak solution to (18).

To finish the proof, we need to prove the Lipschitz continuity of \(F^{\prime }\). Indeed, let \(p_{1}=F^{\prime }(y_{1})\) and \(p_{2}=F^{\prime }(y_{2})\). Then, \(p_{1}, p_{2}\in {H^{1}_{0}}({\Omega })\) and they respectively satisfy

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla p_{1}.\nabla vdx={\int}_{\Omega} (u_{1}-u^{\delta})vdx, \forall v\in {H^{1}_{0}}({\Omega}) \end{array} $$
(30)

and

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla p_{2}.\nabla vdx={\int}_{\Omega} (u_{2}-u^{\delta})vdx, \forall v\in {H^{1}_{0}}({\Omega}), \end{array} $$
(31)

where u1 = K(y1) and u2 = K(y2). Subtracting both sides of (30) and (31) we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla (p_{1}-p_{2}).\nabla vdx={\int}_{\Omega} (u_{1}-u_{2})vdx, \forall v\in {H^{1}_{0}}({\Omega}). \end{array} $$
(32)

Inserting v = p1 − p2 into (32) we obtain

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla (p_{1}-p_{2}).\nabla (p_{1}-p_{2})dx={\int}_{\Omega} (u_{1}-u_{2})(p_{1}-p_{2})dx. \end{array} $$
(33)

Similarly, subtracting both sides of the formula of the weak solutions u1 and u2 we have

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla (u_{1}-u_{2}).\nabla vdx={\int}_{\Omega} (y_{1}-y_{2})vdx, \forall v\in {H^{1}_{0}}({\Omega}). \end{array} $$
(34)

Inserting v = u1 − u2 into (34) we obtain

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla (u_{1}-u_{2}).\nabla (u_{1}-u_{2})dx={\int}_{\Omega} (y_{1}-y_{2})(u_{1}-u_{2})dx. \end{array} $$
(35)

From (19), (20) and (33) we have

$$ \begin{array}{@{}rcl@{}} \|p_{1}-p_{2}\|^{2}_{L^{2}({\Omega})}&\stackrel{(19)}{\leq} c^{2}\|p_{1}-p_{2}\|^{2}_{{H^{1}_{0}}({\Omega})} \stackrel{(20)}{\leq}\frac{c^{2}}{{c_{1}^{2}}}\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert {p_{1}-p_{2}} \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert^{2} \\ &{\kern-38.5pt}\stackrel{(33)}{=}\frac{c^{2}}{{c_{1}^{2}}}{\int}_{\Omega} (u_{1}-u_{2})(p_{1}-p_{2})dx \\ &{}\leq \frac{c^{2}}{{c_{1}^{2}}}\|u_{1}-u_{2}\|_{L^{2}({\Omega})}\|p_{1}-p_{2}\|_{L^{2}({\Omega})} \\ \Rightarrow \|p_{1}-p_{2}\|_{L^{2}({\Omega})}&{}\leq \frac{c^{2}}{{c_{1}^{2}}}\|u_{1}-u_{2}\|_{L^{2}({\Omega})}. \end{array} $$
(36)

Similarly, from (19), (20) and (35) we have

$$ \begin{array}{@{}rcl@{}} \|u_{1}-u_{2}\|_{L^{2}({\Omega})}&\leq \frac{c^{2}}{{c_{1}^{2}}}\|y_{1}-y_{2}\|_{L^{2}({\Omega})}. \end{array} $$
(37)

Finally, from (36) and (37) we deduce

$$ \begin{array}{@{}rcl@{}} \|p_{1}-p_{2}\|_{L^{2}({\Omega})}&\leq \frac{c^{4}}{{c_{1}^{4}}}\|y_{1}-y_{2}\|_{L^{2}({\Omega})}. \end{array} $$
(38)

This implies that \(F^{\prime }\) is Lipschitz continuous with Lipschitz constant \(\frac {c^{4}}{{c_{1}^{4}}}\). □
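In a computation, Lemma 6.1 says that one evaluation of F and of its gradient costs one state solve of (14) and one adjoint solve of (18). A schematic Python sketch, where solve_state, solve_adjoint, and l2_norm_sq are hypothetical wrappers around a discretized solver (for instance the FEM discretization of the next section), not functions from any particular library:

```python
def objective_and_gradient(y, u_delta, solve_state, solve_adjoint, l2_norm_sq):
    """Evaluate F(y) = 0.5*||K(y) - u_delta||^2 and its L^2 gradient F'(y) = p.

    solve_state(y)    : returns u = K(y), the weak solution of (14),
    solve_adjoint(r)  : returns p, the weak solution of (18) with right-hand side r,
    l2_norm_sq(v)     : returns ||v||^2 in L^2(Omega) for the chosen discretization.
    All three are assumed to be supplied by the user.
    """
    u = solve_state(y)                 # state solve: -div(a grad u) = y
    residual = u - u_delta
    p = solve_adjoint(residual)        # adjoint solve: -div(a grad p) = u - u_delta
    F_val = 0.5 * l2_norm_sq(residual)
    return F_val, p                    # p is the gradient F'(y) by Lemma 6.1
```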

7 Numerical solution

In this section we illustrate the performance of the proximal method for the inverse source problem of the previous section. We assume that Ω = [−1,1]2 and

$$a(x_{1},x_{2})=1+{x^{2}_{1}}+{x^{2}_{2}}, u_{0}=1, y^{*}(x_{1},x_{2})=e^{-10({x_{1}^{2}}+{x_{2}^{2}})}. $$

We discretize problems (14) and (16) by the finite element method (FEM). The domain Ω is divided into triangles and we define the following finite element spaces:

$$ \begin{array}{@{}rcl@{}} V^{N}=\{u\in H^{1}({\Omega}):u|_{K}\in \mathcal{P}_{1}(K), \forall K\in \mathcal{T}\},\\ {V^{N}_{0}}=\{u\in {H^{1}_{0}}({\Omega}):u|_{K}\in \mathcal{P}_{1}(K), \forall K\in \mathcal{T}\}, \end{array} $$

where \(\mathcal {P}_{1}(K)\) denotes the set of linear polynomials on K, and N and \(\mathcal {T}\) are respectively the number of nodes and the set of triangles of the mesh. Let {φk}k=1,…,N be the basis consisting of piecewise linear finite elements of the discretized space VN. Then, the discretization of problem (14) is: find u ∈ VN such that

$$ \begin{array}{@{}rcl@{}} {\int}_{\Omega} a\nabla u. \nabla v dx={\int}_{\Omega} yvdx, \forall v\in {V^{N}_{0}} \text{ and } u=Pu_{0} \text{ on } \partial{\Omega}, \end{array} $$
(39)

where P is the FEM interpolation operator.

Similarly, the discretization of problem (16) is

$$ \min_{y\in V^{N}} {\Theta}(y)=F(y)+\alpha {\Phi}(y), $$
(40)

where

$$ \begin{array}{@{}rcl@{}} F(y)=\frac{1}{2}\|K(y)-u^{\delta}\|^{2}_{L^{2}({\Omega})}, {\Phi}(y)={\sum}_{i=1}^{N}\left| y_{i}\right|_{0} \end{array} $$
(41)

with \(y_{i}=\langle y,\varphi _{i} \rangle _{L^{2}({\Omega })}\).

To obtain u∗ and uδ, we solve (39) on a mesh with N = 887 nodes and 1680 triangles. The solution to (39) and all the parameters a, u0, and y∗ are represented by piecewise linear finite elements. Problem (40) is solved by the proximal method, Algorithm 1.

We measure the convergence of the computed minimizers to the true parameter y∗ by the mean square error

$$MSE(y^{k})={\int}_{\Omega} (y^{k}-y^{*})^{2} dx,$$

and

$$\|D(y^{k})\|=\|y^{k}-\mathbb{P}^{l}_{\frac{2\alpha}{s^{*}}}\left( y^{k}-\frac{1}{s^{*}}\nabla F(y^{k})\right)\|_{L^{2}({\Omega})}, l=1,2$$

with s∗ = 100.
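These two monitoring quantities can be computed from the coefficient vectors of yk; a small Python sketch, where the L2(Ω) norm is approximated by the Euclidean norm of the coefficient vector (an assumption for simplicity; with a FEM basis one would use the mass matrix):

```python
import numpy as np

def monitors(y_k, y_star, grad_F, s_star=100.0, alpha=1e-6, variant=1):
    """Compute MSE(y^k) and ||D(y^k)|| for the coefficient vectors y_k and y_star."""
    thr = np.sqrt(2.0 * alpha / s_star)
    z = y_k - grad_F(y_k) / s_star
    if variant == 1:
        prox_z = np.where(np.abs(z) > thr, z, 0.0)
    else:
        prox_z = np.where(np.abs(z) >= thr, z, 0.0)
    mse = np.sum((y_k - y_star) ** 2)          # Euclidean surrogate for the L^2 integral
    d_norm = np.linalg.norm(y_k - prox_z)      # stationarity measure ||D(y^k)||
    return mse, d_norm
```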

7.1 Numerical experiments with δ = 0

We first discuss results without noise, i.e., uδ = u∗. Let Prox.Al.1 and Prox.Al.2 be Algorithm 1 with l = 1 and l = 2, respectively. In these algorithms we set q = 0.5, \([\underline {s},\overline {s}]:=[5 \cdot 10^{-4},5 \cdot 10^{4}]\) and α = 10−6.

Convergence and convergence rate

In Fig. 1 we see that the sequences \(\{\|D(y^{k})\|_{L^{2}({\Omega })}\}\) generated by Prox.Al.1 and Prox.Al.2 decrease to zero, i.e., the sequences {yk} converge to stationary points of Θ(⋅). The behavior of MSE(yk) indicates that the sequences {yk} in the two algorithms converge to the unknown parameter y∗. We also see that the objective functionals {Θ(yk)} decrease monotonically in both algorithms. These observations are consistent with the theoretical results in Lemma 4.1. The computational times of the two algorithms are similar. Figure 3 shows that after the first iteration, the step sizes \(\frac {1}{s^{n}}\) do not change. Thus, we believe that they are smaller than \(\frac {1}{L}\), where L is the Lipschitz constant of \(F^{\prime }\).

Fig. 1

The value of \(\|D(y^{n})\|_{L^{2}({\Omega })}, MSE(y^{n}), {\Theta }(y^{n})\) and computational time in Prox.Al.1 and Prox.Al.2

Note that we do not have any result about the convergence rate of the proximal algorithm in this paper. Since problem (16) is nonconvex and nondifferentiable, such a result is very hard to prove. However, in [12], for the convex problem of the same form as (16), the convergence rate is proven to be \(O(\frac {1}{n})\), where n is the number of iterations of the proximal method. This suggests that the convergence rate of the proximal algorithm for problem (16) is no better than \(O(\frac {1}{n})\). Figure 1 supports this statement, since the sequences Θ(yn) in the two algorithms decrease more slowly than Θ(y0)/n.

Quality of recovered solutions

Figure 2 illustrates y∗ and yn with n = 500 for the two algorithms. The parameter y∗ has been reconstructed very well, and the mean square errors of the two algorithms are the same. Note that y∗ is always greater than zero (with many components close to zero), but the two reconstructions yn with n = 500 have many zero components, as shown in Fig. 3.

Fig. 2

3D plots and contour plots: exact y∗ (top); yn with n = 500 in Prox.Al.1 (middle) and in Prox.Al.2 (bottom)

Fig. 3

Number of nonzero elements of yn (left) and step size sn (right) in Prox.Al.1 and Prox.Al.2

7.2 Numerical experiments with noisy data

We now consider noisy data. To obtain uδ ∈ L2(Ω), we compute \(u^{\delta }=u^{*}+\epsilon \frac {R}{\|R\|_{L^{2}({\Omega })}}\), where R is computed with the MATLAB routine randn(size(y)) with the setting randn(state,0). Here, we choose 𝜖 = 10−2. Figure 4 illustrates the graph of uδ − u∗, showing that the data is noisy.
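An analogous noise construction in Python, with NumPy's random generator playing the role of MATLAB's randn and a fixed seed standing in for the randn(state,0) setting; the Euclidean norm of the coefficient vector is used as a stand-in for the L2(Ω) norm (exact for an orthonormal basis, an approximation for a FEM basis):

```python
import numpy as np

def add_noise(u_star, eps=1e-2, seed=0):
    """Return u_delta = u_star + eps * R / ||R||, with R standard Gaussian noise."""
    rng = np.random.default_rng(seed)          # fixed seed, analogous to randn(state,0)
    R = rng.standard_normal(np.shape(u_star))
    return u_star + eps * R / np.linalg.norm(R)
```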

Fig. 4

3D graph of uδ − u∗

With noisy data we use the same settings in Prox.Al.1 and Prox.Al.2, i.e., all parameters are the same as those in the noise-free case. From Fig. 5 we make similar observations on convergence, convergence rate, and computational time. However, MSE(yn) decreases during the first iterations and then increases. This semi-convergence behavior is typical in the regularization of inverse problems. With noisy data one therefore needs a rule for stopping the algorithms; such a rule is beyond the scope of this paper. Here, we choose yk such that MSE(yk) is the smallest value in the sequence of MSE(yn) generated by each algorithm. These approximate solutions are illustrated in Fig. 6.

Fig. 5

The value of \(\|D(y^{n})\|_{L^{2}({\Omega })}, MSE(y^{n}), {\Theta }(y^{n})\) and computational time in Prox.Al.1 and Prox.Al.2

Fig. 6

3D plots and contour plots: exact y∗ (top); yn with n = 139 in Prox.Al.1 (middle) and in Prox.Al.2 (bottom)

Fig. 7

The value of \(\|D(y^{n})\|_{L^{2}({\Omega })}, MSE(y^{n}), {\Theta }(y^{n})\) and computational time in Prox.Al.1 and Prox.Al.2

7.3 Numerical experiments with a nonsmooth source function

We close this section by considering an inverse source problem with a nonsmooth source function. Here, we use the same setting as in the two previous subsections, but replace the source function by a nonsmooth one, i.e., we assume that Ω = [−1,1]2 and

$$a(x_{1},x_{2})=1+{x^{2}_{1}}+{x^{2}_{2}}, u_{0}=1, $$
$$y^{*}(x_{1},x_{2})=\begin{cases} 2, \text{ if } (x_{1},x_{2})\in [-0.6, -0.2]\times [-0.6, -0.2],\\ 2, \text{ if } (x_{1},x_{2})\in [0.2, 0.6]\times [0.2, 0.6],\\ 0, \text{ otherwise.} \end{cases} $$

We generate noisy data as in the previous subsection with 𝜖 = 10−2. For Prox.Al.1 and Prox.Al.2, we set q = 0.5, \([\underline {s},\overline {s}]:=[5 \cdot 10^{-4},5 \cdot 10^{4}]\) and α = 10−6. In Fig. 7, the sequences {∥D(yn)∥} show that the sequences {yn} converge to stationary points of Θ; the sequences {yn} also approach the exact parameter y∗, but the sequences MSE(yn) have higher values than those in the smooth case (see Fig. 5). Figure 8 shows the sparsity of the recovered solutions yn and the behavior of the step sizes \(\frac {1}{s_{n}}\), which is consistent with the theoretical analysis. It is harder to recover a nonsmooth parameter than a smooth one, and the quality of the reconstructions is worse in this situation (see Fig. 9).

Fig. 8

Number of nonzero elements of yn (left) and step size sn (right) in Prox.Al.1 and Prox.Al.2

Fig. 9

3D plots and contour plots: exact y∗ (top); yn with n = 139 in Prox.Al.1 (middle) and in Prox.Al.2 (bottom)

From the three experiments, we see that Prox.Al.1 and Prox.Al.2 converge to stationary points of Θ and the sequences of the objective functional decrease monotonically. Prox.Al.1 is faster than Prox.Al.2, but they have the same order of convergence rate. Furthermore, the computational time per iteration seems to be constant in both algorithms. Finally, at the same noise level, the reconstructions of smooth unknown parameters are better than those of nonsmooth unknown parameters.

8 Conclusion and future work

We have considered the minimization problem

$$ \min_{u\in H } F(u)+\alpha {\Phi}(u), $$

where \(F:H \rightarrow \mathbb {R}\) is Fréchet differentiable, \(F^{\prime }\) is Lipschitz continuous, and Φ(u) = ∥u∥0.

Under Assumption 1 and the condition on the step sizes sk, Theorem 4.1 shows that the iteration

$$ \begin{array}{@{}rcl@{}} u^{k+1}\in\mathbb{P}_{\frac{2\alpha}{s^{k}}}\big(u^{k}-\frac{1}{s^{k}}\nabla F(u^{k})\big) \end{array} $$
(42)

converges (along a subsequence) to a stationary point u∗, i.e., u∗ satisfies

$$u^{*}\in \mathbb{P}_{\frac{2\alpha}{s^{*}}}\left( u^{*}-\frac{1}{s^{*}}\nabla F(u^{*})\right),$$

for some s∗ ≥ L.

Note that the proximal method falls into the class of gradient-type methods. Thus, it converges rather slowly and has a linear convergence rate. Therefore, in a next step we will study faster methods, e.g., algorithms of Newton type. To make this point clearer, we note that instead of solving the above minimization problem, one can concentrate on solving its first-order optimality condition:

$$0 \in D(u):=u-\mathbb{P}_{\frac{2\alpha}{s}}\left( u-\frac{1}{s}\nabla F(u)\right),$$

for some fixed s ≥ L. Therefore, we can try to extend Newton or quasi-Newton methods to solve this equation. It is worth noting that the mapping D is not continuous; this is the most difficult part of extending Newton and quasi-Newton methods to this situation. We will propose ideas to overcome this issue in future work.

There are some other issues that need to be studied, such as rules for choosing the regularization parameter with respect to the noise level, rules for stopping the algorithms in the case of noisy data, and algorithms for the constrained minimization problem

$$ \min_{u\in D} F(u)+\alpha {\Phi}(u), $$

where \(F: D\subset H \rightarrow \mathbb {R}\) is Fréchet differentiable, \(F^{\prime }\) is Lipschitz continuous, and Φ(u) = ∥u∥0.