1 Introduction

Consider ill-posed problems of the form

$$\begin{aligned} F(x)=y^{\dagger }\end{aligned}$$
(1.1)

where \(F:{\mathcal {D}}(F) \subset {\mathcal {X}}\rightarrow {\mathcal {Y}}\) is a linear or a nonlinear operator with domain \({\mathcal {D}}(F)\). We assume (1.1) has a solution. To find a solution of (1.1) with specific properties, we may use a proper, lower semi-continuous, uniformly convex function \({\mathcal {R}}:{\mathcal {X}}\rightarrow (-\infty ,\infty ]\). Let \(\partial {\mathcal {R}}\) denote the subdifferential of \({\mathcal {R}}\), i.e.

$$\begin{aligned} \partial {\mathcal {R}}(x):= \left\{ \xi \in {\mathcal {X}}^{*}: {\mathcal {R}}(\bar{x}) \ge {\mathcal {R}}(x) + \left\langle \xi ,\bar{x} - x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}} \text { for all } \bar{x} \in {\mathcal {X}} \right\} \end{aligned}$$

for all x in \({\mathcal {X}}\), with \({\mathcal {X}}^{*}\) denoting the dual space of a Banach space \({\mathcal {X}}\) and \(\left\langle \cdot ,\cdot \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}}\) the duality pairing between \({\mathcal {X}}^{*}\) and \({\mathcal {X}}\). Also we denote by \(\left\| \cdot \right\| _{{\mathcal {X}}}\) the norm on \({\mathcal {X}}\). We use

$$\begin{aligned} D_{\xi }{\mathcal {R}}(\bar{x},x):= {\mathcal {R}}(\bar{x}) - {\mathcal {R}}(x) - \left\langle \xi ,\bar{x} - x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}}, \quad \bar{x} \in {\mathcal {X}}\end{aligned}$$

to denote the Bregman distance induced by \({\mathcal {R}}\) at x in the direction \(\xi \). By picking \(x_{0} \in \mathcal {D}(\partial {\mathcal {R}}) \) and \(\xi _{0} \in \partial {\mathcal {R}}(x_{0})\) as initial guesses, we define \(x^{\dagger }\) to be a solution of (1.1) with the property

$$\begin{aligned} D_{\xi _{0}}{\mathcal {R}}(x^{\dagger },x_{0}):= \min _{x \in \mathcal {D}(F)} \left\{ D_{\xi _{0}}{\mathcal {R}}(x,x_{0}) \,: \, F(x)=y^{\dagger } \right\} . \end{aligned}$$
(1.2)

The exact data \(y^{\dagger }\) is either inaccessible or not precisely available. Instead, a noisy measurement \(y^{\delta }\in {\mathcal {Y}}\) is available that satisfies

$$\begin{aligned} \left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}} \le \delta , \end{aligned}$$
(1.3)

where \(\delta \) is the noise level. Then the Landweber iteration in Banach spaces has the form

$$\begin{aligned} \begin{aligned} \xi ^{\delta }_{n+1}&= \xi ^{\delta }_{n} - \mu ^{\delta }_{n} L(x^{\delta }_{n})^{*} j_{r}^{\mathcal {Y}}\left( F(x^{\delta }_{n}) - y^{\delta } \right) , \\ x^{\delta }_{n+1}&= \arg \min _{x\in {\mathcal {X}}} \left\{ {\mathcal {R}}(x) - \left\langle \xi ^{\delta }_{n+1}, x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}} \right\} . \end{aligned} \end{aligned}$$
(1.4)

where \(\xi ^{\delta }_{0} = \xi _{0}\), \(x^{\delta }_{0}=x_{0}\), the step size is given by

$$\begin{aligned} \mu ^{\delta }_n:=\tilde{\mu }^{\delta }_{n}\left\| F(x^{\delta }_{n})-y^{\delta } \right\| _{{\mathcal {Y}}}^{p-r}, \text { where } \tilde{\mu }^{\delta }_{n}:= \min \left\{ \frac{\mu _{0}\left\| F(x^{\delta }_{n} )-y^{\delta } \right\| _{{\mathcal {Y}}}^{p(r-1)}}{\left\| L(x^{\delta }_{n})^{*} j_{r}^{\mathcal {Y}}\left( F(x^{\delta }_{n}) - y^{\delta } \right) \right\| _{{\mathcal {X}}^{*}}^{p} }, \mu _{1} \right\} , \end{aligned}$$
(1.5)

for some positive constants \(\mu _{0}\) and \(\mu _{1}\), p is a positive constant depending on the convexity of \({\mathcal {R}}\), \(\left\{ L(x): x \in {\mathcal {D}}(F) \right\} \) is a family of bounded linear operators from \({\mathcal {X}}\) to \({\mathcal {Y}}\) satisfying certain properties, and \(j_{r}^{{\mathcal {Y}}}: {\mathcal {Y}}\rightarrow {\mathcal {Y}}^{*}\), with \(1 \le r <\infty \), is a selection of the (possibly) multi-valued mapping \(J_{r}^{{\mathcal {Y}}}:{\mathcal {Y}}\rightarrow 2^{{\mathcal {Y}}^{*}}\), defined as the subdifferential of the convex function \(y \mapsto \left\| y \right\| _{{\mathcal {Y}}}^{r}/r\).
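
To make the structure of one update of (1.4)–(1.5) concrete, the following is a minimal finite-dimensional sketch; Euclidean norms stand in for the Banach space norms, the callables F, L_adj_jr and grad_R_star are hypothetical user-supplied routines (evaluating \(F\), \(v \mapsto L(x)^{*} j_{r}^{{\mathcal {Y}}}(v)\), and the minimizer in the second line of (1.4), respectively), and it is assumed that \(F(x^{\delta }_{n}) \ne y^{\delta }\).

```python
import numpy as np

def landweber_step(xi, x, y_delta, F, L_adj_jr, grad_R_star, p, r, mu0, mu1):
    """One update of (1.4)-(1.5) in a finite-dimensional setting (illustrative sketch)."""
    residual = F(x) - y_delta
    res_norm = np.linalg.norm(residual)            # stands in for the Y-norm
    g = L_adj_jr(x, residual)                      # L(x)^* j_r^Y(F(x) - y^delta)
    g_norm = np.linalg.norm(g)                     # stands in for the X^*-norm
    # step size (1.5); assumes residual != 0 so that no division by zero occurs
    mu_tilde = min(mu0 * res_norm**(p * (r - 1)) / g_norm**p, mu1)
    mu = mu_tilde * res_norm**(p - r)
    xi_new = xi - mu * g                           # first line of (1.4)
    x_new = grad_R_star(xi_new)                    # second line of (1.4)
    return xi_new, x_new
```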

The iteration must be terminated appropriately to produce a useful approximate solution. The authors in [19] used the discrepancy principle, i.e., the iteration (1.4) is terminated after \(n_{\delta }:= n(\delta ,y^{\delta })\) steps with

$$\begin{aligned} \left\| y^{\delta }- F(x^{\delta }_{n_{\delta }}) \right\| _{{\mathcal {Y}}} \le \tau \delta< \left\| y^{\delta }- F(x^{\delta }_{n}) \right\| _{{\mathcal {Y}}}, \quad 0 \le n < n_{\delta }, \end{aligned}$$
(1.6)

where \(\tau \) is an appropriately chosen constant; the regularization property of \(x^{\delta }_{n_{\delta }} \) as \(\delta \rightarrow 0\) has been extensively studied [3, 15, 19, 20, 26]. All previous convergence analyses of the Landweber iteration (under the discrepancy principle) require standard assumptions on the forward operator F, as well as the following key conditions:

  1. (a)

    the Banach space \({\mathcal {Y}}\) is uniformly smooth and \(1<r<\infty \);

  2. (b)

    the mapping \(x \rightarrow L(x)\) from \({\mathcal {X}}\) to \({\mathscr {L}}({\mathcal {X}}, {\mathcal {Y}})\) is continuous,

where \({\mathscr {L}}({\mathcal {X}}, {\mathcal {Y}})\) denotes the Banach space of all bounded linear operators from the Banach space \({\mathcal {X}}\) to the Banach space \({\mathcal {Y}}\). The paper [24] relaxes conditions (a) and (b) to address situations where the noisy data is contaminated by non-Gaussian noise, such as impulsive noise or uniformly distributed noise.

The discrepancy principle (1.6) requires knowledge of the noise level, which is not always available or reliable in real-world applications. Overestimation or underestimation of the noise level may lead to a significant loss of reconstruction accuracy when using the discrepancy principle. Therefore, it is necessary to develop heuristic rules for Landweber iteration that do not use any knowledge of the noise level. Based on the discrepancy principle, we will propose a heuristic rule for Landweber iteration in the spirit of the work of Hanke and Raus [10]. Our heuristic rule determines an integer \(n_*:=n_*(y^\delta )\) by minimizing

$$\begin{aligned} \Theta (n, y^\delta ):= (n+a) \left\| F(x^{\delta }_n)-y^{\delta } \right\| _{{\mathcal {Y}}} ^p \end{aligned}$$

over the iteration indices, where \(a\ge 1\) is a fixed number. Bakushinskii's veto [2] states that heuristic rules cannot lead to convergence in the worst case scenario for any regularization method. Despite this veto, by imposing certain conditions on the noisy data, convergence results under heuristic rules have been obtained for various regularization methods [8, 12, 16, 18, 22, 30]. The Landweber iteration is computationally cheap to implement, so it is worth analyzing under heuristic rules.

In this paper we will present a convergence analysis of the Landweber iteration in Banach spaces under the Hanke–Raus heuristic rule. To do this, we first present the assumptions taken from previous convergence analyses of the iteration, including a compactness condition. Then we discuss the Hanke–Raus rule for the iteration and the special assumption needed for our analysis, the noise condition. Next, we present the main result on the convergence of the Landweber iteration under the Hanke–Raus rule. Finally, numerical examples are presented to illustrate the theoretical results.

2 Landweber iteration in Banach spaces

In analyzing the Landweber iteration in Banach spaces defined by (1.4)–(1.5), we will assume the following standard assumptions concerning the regularization functional \({\mathcal {R}}\) and the forward operator F, as also used in [19, 24].

Assumption 2.1

\({\mathcal {R}}: {\mathcal {X}}\rightarrow (-\infty , \infty ]\) is proper, lower semi-continuous and p-convex with \(p>1\) in the sense that there is a constant \(c_0>0\) such that

$$\begin{aligned} {\mathcal {R}}(t {\bar{x}} + (1-t) x) + c_0 t(1-t) \Vert {\bar{x}} - x\Vert _{{\mathcal {X}}}^p \le t {\mathcal {R}}({\bar{x}}) + (1-t) {\mathcal {R}}(x) \end{aligned}$$

for all \({\bar{x}}, x \in {\mathcal {X}}\) and \(0\le t \le 1\).

Assumption 2.2

  1. (a)

    F is weakly closed over \({\mathcal {D}}(F)\).

  2. (b)

    There exist \(\rho >0\), \(x_0\in {\mathcal {X}}\) and \(\xi _0 \in \partial {\mathcal {R}}(x_0)\) such that \(B_{3\rho }(x_0)\subset {\mathcal {D}}(F)\) and (1.1) has a solution \(x^*\) satisfying \(D_{\xi _0} {\mathcal {R}}(x^*, x_0) \le c_0 \rho ^p\).

  3. (c)

    There is a family of bounded linear operators \(\{L(x): {\mathcal {X}}\rightarrow {\mathcal {Y}}\}_{x\in B_{3\rho }(x_0)}\) such that \(\Vert L(x)\Vert \le B\) for all \(x\in B_{3\rho }(x_0)\) for some constant B, and there is a constant \(0\le \eta <1\) such that

    $$\begin{aligned} \Vert F(\bar{x})-F(x)-L(x)(\bar{x}-x)\Vert _{{\mathcal {Y}}} \le \eta \Vert F(\bar{x})-F(x)\Vert _{{\mathcal {Y}}} \end{aligned}$$

    for all \(\bar{x}, x\in B_{3\rho }(x_0)\).

It can be easily checked that the p-convexity of \({\mathcal {R}}\) in Assumption 2.1 implies that

$$\begin{aligned} D_{\xi }{\mathcal {R}}(\bar{x},x) \ge c_{0}\left\| \bar{x} - x \right\| _{{\mathcal {X}}}^{p} \end{aligned}$$

for all \(\bar{x},x \in {\mathcal {X}}\) and \(\xi \in \partial {\mathcal {R}}(x)\). Let \({\mathcal {R}}^{*}\) denote the Legendre-Fenchel conjugate of \({\mathcal {R}}\), i.e.

$$\begin{aligned} {\mathcal {R}}^{*}(\xi ):=\sup _{x \in {\mathcal {X}}}\left\{ \left\langle \xi ,x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}} - {\mathcal {R}}(x) \right\} , \quad \xi \in {\mathcal {X}}^{*}, \end{aligned}$$

then \({\mathcal {R}}^{*}\) is Fréchet differentiable over \({\mathcal {X}}^{*}\) and

$$\begin{aligned} \left\| \nabla {\mathcal {R}}^{*}(\bar{\xi }) - \nabla {\mathcal {R}}^{*}(\xi ) \right\| _{{\mathcal {X}}} \le \left( \frac{\left\| \bar{\xi } - \xi \right\| _{{\mathcal {X}}^{*}}}{2c_{0}} \right) ^{p^{*} - 1}, \end{aligned}$$

where \(p^{*}\) is the number conjugate to p, i.e., \(1/p + 1/p^{*} = 1\), see [33]. Using the definition of \(x^{\delta }_{n}\) and the subdifferential calculus, we have \(x^{\delta }_{n} = \nabla {\mathcal {R}}^{*}(\xi ^{\delta }_{n})\).
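
As a concrete illustration (not taken from the cited works), for the common choice \({\mathcal {R}}(x)=\tfrac{1}{2}\left\| x \right\| _{2}^{2} + \alpha \left\| x \right\| _{1}\) on \({\mathcal {X}}=\ell ^{2}\), which is 2-convex with \(c_{0}=1/2\), the map \(\nabla {\mathcal {R}}^{*}\) is given componentwise by soft-thresholding, so the second step of (1.4) is explicit. A minimal sketch:

```python
import numpy as np

def grad_R_star_soft(xi, alpha):
    """grad R^* for R(x) = 0.5*||x||_2^2 + alpha*||x||_1 (an illustrative choice).

    The minimizer of R(x) - <xi, x> is computed componentwise:
    x_i = sign(xi_i) * max(|xi_i| - alpha, 0), i.e. soft-thresholding of xi.
    """
    return np.sign(xi) * np.maximum(np.abs(xi) - alpha, 0.0)
```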

Assumption 2.2 (a) means that for any sequence \(\left\{ x_{n} \right\} \subset \mathcal {D}(F)\) satisfying \(x_{n} \rightharpoonup x \in \mathcal {X}\) and \(F(x_{n}) \rightarrow y \in {\mathcal {Y}}\) there hold \(x \in \mathcal {D}(F)\) and \(F(x)=y\) (here we denote weak convergence and strong convergence by "\(\rightharpoonup \)" and "\(\rightarrow \)" respectively). Assumption 2.2 (c) is a version of the tangential cone condition, widely used in convergence analysis of iterative regularization methods for nonlinear inverse problems [9]. Notice that it is expressed in terms of a bounded linear operator L(x) which does not have to be the Fréchet derivative of F. It is easy to see that Assumption 2.2 (c) implies that F is continuous over \(B_{3\rho }(x_{0})\).

When \(\mathcal {X}\) is a reflexive Banach space, by using the p-convexity and the weak lower semi-continuity of \({\mathcal {R}}\) together with the weak closedness of F, \(x^{\dagger }\) exists and is uniquely defined [19, Lemma 3.2]. Moreover, Assumption 2.2 (b) implies that

$$\begin{aligned} D_{\xi _{0}}{\mathcal {R}}(x^{\dagger },x_{0}) \le c_{0} \rho ^{p}, \end{aligned}$$

which, together with Assumption 2.1, implies that \(\left\| x_{0}-x^{\dagger } \right\| _{{\mathcal {X}}} \le \rho \).

In order to make the method more transparent, it is necessary to give more explanation on the mapping \(j_r^{\mathcal {Y}}: {\mathcal {Y}}\rightarrow {\mathcal {Y}}^*\) for \(1\le r<\infty \). We use \(J_r^{\mathcal {Y}}: {\mathcal {Y}}\rightarrow 2^{{\mathcal {Y}}^*}\) to denote the subdifferential of the convex functional \(y\rightarrow \Vert y\Vert _{{\mathcal {Y}}}^r/r\) over \({\mathcal {Y}}\). By Asplund's theorem [5], \(J_r^{\mathcal {Y}}\) with \(1<r<\infty \) is exactly the duality mapping on \({\mathcal {Y}}\) with the gauge function \(t\rightarrow t^{r-1}/r\), i.e.

$$\begin{aligned} J_r^{\mathcal {Y}}(y) = \{ y^* \in {\mathcal {Y}}^*: \Vert y^*\Vert _{{\mathcal {Y}}^{*}} = \Vert y\Vert _{{\mathcal {Y}}}^{r-1} \text{ and } \langle y^*, y\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}} = \Vert y\Vert _{{\mathcal {Y}}}^r\}. \end{aligned}$$
(2.1)

For \(r=1\), one has

$$\begin{aligned} J_1(y) = \left\{ \begin{array}{lll} \{y^*\in {\mathcal {Y}}^*: \Vert y^*\Vert _{{\mathcal {Y}}^{*}}=1 \text{ and } \langle y^*, y\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}} = \Vert y\Vert _{{\mathcal {Y}}} \} &{} \text{ if } y \ne 0,\\ \{y^*\in {\mathcal {Y}}^*: \Vert y^*\Vert _{{\mathcal {Y}}^{*}} \le 1\} &{} \text{ if } y = 0. \end{array}\right. \end{aligned}$$

Note that \(J_r^{\mathcal {Y}}\) in general is multi-valued. The mapping \(j_r^{\mathcal {Y}}\) in (1.4) denotes a selection of \(J_r^{\mathcal {Y}}\), i.e. a single-valued mapping from \({\mathcal {Y}}\) to \({\mathcal {Y}}^*\) with the property \(j_r^{\mathcal {Y}}(y) \in J_r^{\mathcal {Y}}(y)\) for each \(y\in {\mathcal {Y}}\). It is easily seen that

$$\begin{aligned} \langle j_r^{\mathcal {Y}}(y), y\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}} = \Vert y\Vert _{{\mathcal {Y}}}^r \quad \text{ and } \quad \Vert j_r^{\mathcal {Y}}(y)\Vert _{{\mathcal {Y}}^{*}} \le \Vert y\Vert _{{\mathcal {Y}}}^{r-1} \end{aligned}$$
(2.2)

for all \(y\in {\mathcal {Y}}\) and \(1\le r<\infty \). Here we adopt the convention \(0^0=1\). For a bounded linear operator \(A: {\mathcal {X}}\rightarrow {\mathcal {Y}}\), we use \({{\mathcal {N}}}(A)\) to denote its null space and we also use

$$\begin{aligned} {{\mathcal {N}}}(A)^\perp := \{\xi \in {\mathcal {X}}^*: \langle \xi , x\rangle _{{\mathcal {X}}^{*}, {\mathcal {X}}} =0 \text{ for } \text{ all } x\in {{\mathcal {N}}}(A)\} \end{aligned}$$

to denote the annihilator of \({{\mathcal {N}}}(A)\).
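
As a concrete illustration of such a selection (an assumption made here for illustration only, not needed for the analysis), consider \({\mathcal {Y}}=\ell ^{s}\) with \(1<s<\infty \). Then \(J_{r}^{{\mathcal {Y}}}\) is single-valued and given componentwise by \(J_{r}^{{\mathcal {Y}}}(y)=\left\| y \right\| _{s}^{r-s}\,|y|^{s-1}\,\mathrm {sign}(y)\), which satisfies (2.2). A minimal sketch:

```python
import numpy as np

def j_r_ell_s(y, r, s):
    """A selection j_r of J_r on ell^s (1 < s < infty), using
    J_r(y) = ||y||_s^(r-s) * |y|^(s-1) * sign(y); returns 0 for y = 0,
    which is a valid selection for every 1 <= r < infty."""
    norm = np.linalg.norm(y, ord=s)
    if norm == 0.0:
        return np.zeros_like(y)
    return norm**(r - s) * np.abs(y)**(s - 1) * np.sign(y)
```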

When F is a linear operator, the Landweber iteration (1.4)–(1.5) was analyzed in [3], where the domain of \({\mathcal {R}}\) is required to have a nonempty interior. This condition excludes many important choices of the regularization functional \({\mathcal {R}}\); it was removed in [19], which also extended the method to nonlinear inverse problems.

In order to show that the Landweber iteration (1.4)–(1.5) under the discrepancy principle (1.6) is a regularization method for solving (1.1), besides the standard conditions specified in Assumptions 2.1 and 2.2, the convergence analysis in [19] requires the following additional conditions.

Assumption 2.3

  1. (a)

    The Banach space \({\mathcal {Y}}\) is uniformly smooth and \(1<r<\infty \);

  2. (b)

    The mapping \(x \rightarrow L(x)\) from \({\mathcal {X}}\) to \({\mathscr {L}}({\mathcal {X}}, {\mathcal {Y}})\) is continuous.

Here a Banach space \({\mathcal {Y}}\) is called uniformly smooth if its modulus of smoothness

$$\begin{aligned} \rho _{\mathcal {Y}}(t): = \sup \{ \Vert \bar{y} + y\Vert _{{\mathcal {Y}}} + \Vert \bar{y} -y\Vert _{{\mathcal {Y}}} -2: \Vert \bar{y}\Vert _{{\mathcal {Y}}}=1 \text{ and } \Vert y\Vert _{{\mathcal {Y}}}\le t\} \end{aligned}$$

satisfies \(\lim _{t\searrow 0} \rho _{\mathcal {Y}}(t)/t =0\). The uniform smoothness of \({\mathcal {Y}}\) in Assumption 2.3 (a) guarantees that the duality mapping \(J_r^{\mathcal {Y}}\), for each \(1<r<\infty \), is single-valued and uniformly continuous on bounded sets [5]. Therefore, Assumption 2.3 is crucial for establishing the stability of the Landweber iteration in [19, Lemma 3.8].

Under the discrepancy principle, the convergence result for the Landweber iteration (1.4)–(1.6) with \(1< r <\infty \) is established in [19, Theorem 3.9]. However, this result depends heavily on the uniform smoothness of \({\mathcal {Y}}\) and the continuity of the mapping \({\mathcal {X}}\ni x \mapsto L(x) \in {\mathscr {L}}({\mathcal {X}}, {\mathcal {Y}})\). There are situations where the noisy data is contaminated by non-Gaussian noise, such as impulsive noise or uniformly distributed noise; in such situations one may choose \({\mathcal {Y}}\) to be the \(L^1\)-space or the \(L^\infty \)-space to effectively remove the effects of noise [1, 6]. Note that neither the \(L^1\)-space nor the \(L^\infty \)-space is uniformly smooth. On the other hand, there exist ill-posed inverse problems where the forward operator F is non-smooth, i.e. not necessarily Gâteaux differentiable [7]. For such non-smooth ill-posed problems, one needs to choose L(x) carefully as a replacement of the Gâteaux derivative; for instance, one may choose L(x) to be a Bouligand subderivative of F at x, defined as a limit of Fréchet derivatives of F at points of differentiability. The Bouligand subderivative mapping in general is not continuous unless the forward operator is Gâteaux differentiable. The paper [24] revisits the iteration (1.4)–(1.6) to provide a new convergence result that requires neither the reflexivity of \({\mathcal {Y}}\) nor the continuity of the mapping \(x \rightarrow L(x)\). To replace Assumption 2.3, the following compactness condition was used.

Assumption 2.4

There exists a Banach space Z such that

$$\begin{aligned} \bigcup _{x\in B_{3\rho }(x_0)} \text{ Range }(L(x)^*) \subset Z \end{aligned}$$

and Z is compactly embedded into \({\mathcal {X}}^*\). Moreover, there is a constant \({\hat{B}}\) such that \(\Vert L(x)^*\Vert _{{\mathscr {L}}(Y^*, Z)} \le {\hat{B}}\) for all \(x\in B_{3\rho }(x_0)\).

Assumption 2.4 places a condition on the smoothing properties of \(L(x)^*\) for \(x\in B_{3\rho }(x_0)\) in a uniform sense. This assumption is inspired by the work [7] and adapts their assumption in Hilbert spaces to the Banach space setting. Moreover, this assumption does not require the reflexivity of the Banach space \({\mathcal {Y}}\), as illustrated in [24, Example 2.2], which uses a nonlinear inverse problem.

In order to show the convergence of \(x_{n_\delta }^\delta \) to a solution of (1.1), an important step is to investigate, for each n, the behaviour of \((\xi _n^\delta , x_n^\delta )\) as \(\delta \rightarrow 0\). Since the mapping \(x \rightarrow L(x)\) is not assumed to be continuous, and the mapping \(y\in {\mathcal {Y}}\rightarrow j_{r}^{\mathcal {Y}}(y) \in {\mathcal {Y}}^*\) is no longer necessarily continuous, one cannot expect the convergence of \((\xi ^{\delta }_n, x^{\delta }_n)\) to a single point as \(\delta \rightarrow 0\): for each n, \((\xi ^{\delta }_n,x^{\delta }_n)\) may have many limit points as \(\delta \rightarrow 0\). By picking one of these limit points for each n, we may form an iterative sequence \(\{(\xi _n, x_n)\}\) in \({\mathcal {X}}^*\times {\mathcal {X}}\) corresponding to the noise-free case. It turns out that all such sequences obey certain properties, which we specify in the following. We use \(\Gamma _{\mu _0, \mu _1}(\xi _0, x_0)\) to denote the set of all possible sequences \(\{(\xi _n, x_n)\}\) in \({\mathcal {X}}^*\times {\mathcal {X}}\) constructed from \((\xi _0, x_0)\) by

$$\begin{aligned} \xi _{n+1} = \xi _{n} - \mu _{n}\psi _{n}, \quad \text { and } \quad x_{n+1} = \arg \min _{x \in {\mathcal {X}}}\left\{ {\mathcal {R}}(x)-\left\langle \xi _{n+1},x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}} \right\} , \end{aligned}$$
(2.3)

where

$$\begin{aligned} \mu _{n} = {\left\{ \begin{array}{ll} \min \left\{ \frac{\mu _{0}\left\| F(x_{n})-y^{\dagger } \right\| _{{\mathcal {Y}}}^{p(r-1)} }{\left\| \psi _{n} \right\| _{{\mathcal {X}}^{*}}^{p}}, \mu _{1} \right\} \left\| F(x_{n})-y^{\dagger } \right\| _{{\mathcal {Y}}}^{p-r} \quad &{}\text {if } F(x_{n}) \ne y^{\dagger }\\ 0 \quad &{}\text {if } F(x_{n}) = y^{\dagger }\\ \end{array}\right. } \end{aligned}$$
(2.4)

and \(\psi _{n}\in {\mathcal {X}}^{*}\) satisfies the properties

$$\begin{aligned} \begin{aligned} \left\langle \psi _{n},\hat{x}-x_{n} \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}}&\le -(1-\eta )\left\| F(x_{n})-y^{\dagger } \right\| _{{\mathcal {Y}}}^{r}, \\ \left| \left\langle \psi _{n},\hat{x}-x \right\rangle _{{\mathcal {X}}^{*},{\mathcal {X}}} \right|&\le (1+\eta )\left\| F(x_{n})-y^{\dagger } \right\| _{{\mathcal {Y}}}^{r-1}\left( 2\left\| F(x_{n})-y^{\dagger } \right\| _{{\mathcal {Y}}} + \left\| F(x)-y^{\dagger } \right\| _{{\mathcal {Y}}} \right) , \end{aligned} \end{aligned}$$
(2.5)

for any integer n, any \(x \in B_{3\rho }(x_{0})\) and any solution \(\hat{x}\) of (1.1) in \(B_{3\rho }(x_{0})\). As shown in [24], (2.5) holds whenever \(\psi _{n}\in L(x_{n})^{*}J_{r}^{\mathcal {Y}}(F(x_{n})-y^{\dagger } ) \), ensuring the well-definedness of (2.3) and (2.4). Indeed, for this choice of \(\psi _n\), one may use Assumption 2.1, Assumption 2.2 and the argument in [19] to show that \(x_n \in B_{3\rho }(x_0)\) for all \(n \ge 0\). Hence \(\Gamma _{\mu _{0},\mu _{1} }(\xi _{0},x_{0})\) is non-empty.

The performance of the discrepancy principle depends heavily on accurate knowledge of the noise level. Such noise level information, however, is not always available or reliable in applications. Incorrect estimation of the noise level may lead to a significant loss of reconstruction accuracy when using the discrepancy principle. Therefore, it is necessary to develop heuristic rules for Landweber iteration that do not use any knowledge of the noise level, for the case when reliable noise level information is unavailable.

Based on modifying the discrepancy principle, we propose the following heuristic rule for Landweber iteration in Banach spaces in the spirit of [10]. The Hanke–Raus rule has been studied in variational regularization in Banach spaces [13], and has been generalized for variational regularization in general topological spaces [22].

Rule 2.1

Let \(a \ge 1\) be a fixed number and let

$$\begin{aligned} \Theta (n,y^\delta ):=(n+a)\left\| F(x^{\delta }_{n})-y^{\delta } \right\| _{{\mathcal {Y}}}^{p}. \end{aligned}$$

We define \(n_{*}:=n_{*}(y^{\delta })\) to be an integer such that

$$\begin{aligned} n_{*} \in \textrm{argmin}\left\{ \Theta (n,y^{\delta }): 0\le n \le n_{\infty } \right\} , \end{aligned}$$

where \(n_{\infty }:=n_{\infty }(y^{\delta })\) is the largest integer such that \(x_n^{\delta } \in \mathcal {D}(F)\) for all \(0\le n\le n_\infty \).

Rule 2.1 can be easily implemented. During the iteration, the value of \(\Theta (n, y^\delta )\) is recorded for each n. After performing a large number of iterations, we stop and choose \(n_*\) to be the integer that minimizes \(\Theta (n, y^\delta )\). Setting an upper limit on the number of iterations is an issue for the nonlinear Landweber iteration due to its local convergence [11]. This upper limit must also be large enough so that \(n_{*}\) is not badly estimated by the first local minimum of \(\Theta (\cdot , y^\delta )\), which is rarely the global minimum [11].
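
The following is a minimal sketch of this procedure (the function name and arguments are illustrative); residual_norms is assumed to contain the recorded values \(\left\| F(x^{\delta }_{n})-y^{\delta } \right\| _{{\mathcal {Y}}}\) for \(n=0,\dots ,n_{\max }\), where \(n_{\max }\) is the chosen upper limit on the number of iterations.

```python
import numpy as np

def hanke_raus_index(residual_norms, p, a=1.0):
    """Rule 2.1: return n_* minimizing Theta(n, y^delta) = (n + a) * residual^p
    over the recorded iterations."""
    theta = [(n + a) * res**p for n, res in enumerate(residual_norms)]
    return int(np.argmin(theta))
```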

Using Rule 2.1 requires the choice of a fixed number a. This scheme was first introduced in [16], where a rule of the same form was used for the augmented Lagrangian method for solving linear inverse problems. When using Rule 2.1, we suggest choosing a suitably large in order to obtain an accurate solution.

Using a fixed constant similar to a in Rule 2.1 prevents the regularization parameter \(n_{*}\) from becoming too small. This was first illustrated in the numerical simulations in [16] for the augmented Lagrangian method. Meanwhile, the paper [30] reported a convergence analysis of the Hanke–Raus rule for non-stationary iterated Tikhonov regularization using a fixed constant \(a \ge 0\). The lower bound required for this fixed constant may depend on the regularization method in order to guarantee convergence. For the Landweber iteration, the requirement \(a \ge 1\) in Rule 2.1 is crucial, as illustrated later in the proof of Lemma 2.1.

With the integer \(n_*:=n_*(y^\delta )\) determined by Rule 2.1, we will use \(x_{n_*(y^\delta )}^\delta \) as an approximate solution. A natural question is whether, for a family of noisy data \(\{y^\delta \}\) with \(y^\delta \rightarrow y^{\dagger }\) as \(\delta \rightarrow 0\), it is possible to guarantee the convergence of \(x_{n_*(y^\delta )}^\delta \) to a solution of (1.1) as \(\delta \rightarrow 0\). The answer in general is no, according to Bakushinskii's veto, which states that heuristic rules cannot lead to convergence in the worst case scenario for any regularization method [2]. Therefore, to guarantee a convergence result, one must impose certain conditions on the noisy data. We will use the following noise condition.

Assumption 2.5

\(\left\{ y^{\delta } \right\} \) is a family of noisy data satisfying \(0<\left\| y^{\delta }-y^{\dagger } \right\| _{{\mathcal {Y}}}\rightarrow 0\) as \(\delta \rightarrow 0\) and there is a constant \(\kappa > 0\) such that

$$\begin{aligned} \left\| y^{\delta }- F(x) \right\| _{{\mathcal {Y}}} \ge \kappa \left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}} \end{aligned}$$

for every \(y^{\delta }\) and every \(x \in S(y^{\delta })\), where \(S(y^{\delta }):=\left\{ x_{n}^{\delta }:0 \le n \le n_{\infty } \right\} \) is the set of iterates generated by (1.4).

Under Assumption 2.5 it is immediate to see that Rule 2.1 defines a finite integer \(n_{*}\). Indeed, if \(n_{\infty }\) is finite, then there is nothing to prove, so we need only consider \(n_{\infty }=\infty \). It then follows from Assumption 2.5 that

$$\begin{aligned} \Theta (n,y^{\delta }) = (n+a)\left\| F(x_{n}^{\delta })-y^{\delta } \right\| _{{\mathcal {Y}}}^{p} \ge (n+a)\kappa ^{p} \left\| y^{\dagger }-y^{\delta } \right\| _{{\mathcal {Y}}}^{p} \rightarrow \infty \end{aligned}$$

as \(n \rightarrow \infty \). This implies that there must exist a finite integer \(n_{*}\) achieving the minimum of \(\Theta (n,y^{\delta })\).

Assumption 2.5 is a rather abstract condition that is difficult to verify; however, it appears to be the best available tool for carrying out a convergence analysis. The noise condition was introduced in the seminal paper [10] for linear problems in Hilbert spaces. Under the Hanke–Raus rule, this condition was extended in the convergence analyses of variational regularization [12, 13, 24], the augmented Lagrangian method [16], and non-stationary iterated Tikhonov regularization [30]. We can only illustrate this condition via numerical examples; see the sketch below. Meanwhile, a very comprehensive numerical study of heuristic rules, in addition to the Hanke–Raus rule, for the nonlinear Landweber iteration can be found in [11].
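
In synthetic experiments where \(y^{\dagger }\) is known, Assumption 2.5 can at least be checked a posteriori by computing the largest admissible constant \(\kappa \) over the recorded iterates; a minimal sketch (the function name and arguments are illustrative):

```python
def empirical_kappa(residual_norms, delta):
    """Largest kappa with ||F(x_n^delta) - y^delta|| >= kappa * ||y^delta - y^dagger||
    over the recorded iterates, where delta = ||y^delta - y^dagger||_Y (known only
    in synthetic experiments)."""
    return min(residual_norms) / delta
```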

2.1 Analysis of Rule 2.1

In this section, we will show that

$$\begin{aligned} \Theta (n_{*}(y^{\delta }),y^{\delta }) \rightarrow 0 \quad \text {as } \delta \rightarrow 0 \end{aligned}$$
(2.6)

To achieve this result, we introduce an auxiliary integer \(\hat{n}_{\delta }\) defined by the stopping rule

$$\begin{aligned} \left\| F(x^{\delta }_{\hat{n}_{\delta }})-y^{\delta } \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(\hat{n}_{\delta }+a)^{2} } \le \tau ^{p}\left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} < \left\| F(x^{\delta }_{n})-y^{\delta } \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(n +a)^{2} } \end{aligned}$$
(2.7)

for \(0 \le n < \hat{n}_{\delta }\), where \(\tau >1\) is a large number (not to be confused with the one in (1.6)). This is a slight modification of the discrepancy principle (1.6) with an extra term \(\tfrac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(n +a)^{2} }\). This extra term ensures that \(\hat{n}_{\delta }\rightarrow \infty \) as \(\delta \rightarrow 0\), an important step in establishing (2.6).

Lemma 2.1

Let Assumptions 2.1 and 2.2 hold. Let \(\mu _{0}\) be chosen such that

$$\begin{aligned} c_{1}:=1 - \eta - \frac{1}{p^{*}}\left( \frac{\mu _{0} }{2c_{0}} \right) ^{p^{*}-1} > 0. \end{aligned}$$
(2.8)

If \(\tau > (1 + \eta )/c_{1}\), then (2.7) defines a finite integer \(\hat{n}_{\delta }\) with \(\hat{n}_{\delta }\rightarrow \infty \) as \(\delta \rightarrow 0\). Moreover \(x_n^\delta \in B_{3\rho }(x_{0})\) for all \(0 \le n \le \hat{n}_{\delta }\).

Proof

Using induction, we show first that \(x^{\delta }_{n} \in B_{3\rho }(x_{0})\) for \(0 \le n \le \hat{n}_{\delta }\). This is trivial for \(n=0\). Now suppose that \(x_k^\delta \in B_{3\rho }(x_0)\) for all \(0\le k\le n\) for some \(n < \hat{n}_{\delta }\); we will show that \(x^{\delta }_{n+1} \in B_{3\rho }(x_{0})\). Since we are considering the noisy case, an argument similar to that of [19, Lemma 3.4] shows for \(0\le k\le n\) that

$$\begin{aligned} \begin{aligned}&D_{\xi ^{\delta }_{k+1}}{\mathcal {R}}(x^{\dagger },x^{\delta }_{k+1}) - D_{\xi ^{\delta }_{k}}{\mathcal {R}}(x^{\dagger },x^{\delta }_{k})\\&\quad \le \frac{1}{p^{*}(2c_{0})^{p^{*} - 1}}\left\| \xi ^{\delta }_{k+1} - \xi ^{\delta }_{k} \right\| _{{\mathcal {X}}^{*}}^{p^{*}} - \mu _k^\delta \left\langle j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k})-y^{\delta }),L(x^{\delta }_{k})(x^{\delta }_{k} - x^{\dagger }) \right\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}} \end{aligned} \end{aligned}$$
(2.9)

Based on (1.4) and the definition of \(\mu ^{\delta }_{k}\) in (1.5), we have

$$\begin{aligned} \left\| \xi ^{\delta }_{k+1} - \xi ^{\delta }_{k} \right\| _{{\mathcal {X}}^{*}}^{p^{*}} =(\mu ^{\delta }_{k})^{p^{*}} \left\| L(x^{\delta }_{k})^{*}j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k}) - y^{\delta }) \right\| _{{\mathcal {X}}^{*}}^{p^{*}} \le \mu _{0}^{p^{*}-1}\mu ^{\delta }_{k}\left\| F(x^{\delta }_{k}) - y^{\delta } \right\| _{{\mathcal {Y}}}^{r}. \end{aligned}$$

Moreover, by using (2.2), Assumption 2.2 (c) and the fact that \(F(x^{\dagger })=y^{\dagger }\), we have

$$\begin{aligned}&-\big \langle j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k})-y^{\delta }), L(x^{\delta }_{k})(x^{\delta }_{k} - x^{\dagger }) \big \rangle _{{\mathcal {Y}}^{*}, {\mathcal {Y}}} \\&\quad = -\left\langle j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k})-y^{\delta }), F(x^{\delta }_{k})-y^{\delta } \right\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}} + \left\langle j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k})-y^{\delta }), y^{\dagger }-y^{\delta } \right\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}}\\&\qquad -\left\langle j_{r}^{\mathcal {Y}}(F(x^{\delta }_{k})-y^{\delta }), F(x^{\dagger }) - F(x^{\delta }_{k}) - L(x^{\delta }_{k})(x^{\dagger }- x^{\delta }_{k}) \right\rangle _{{\mathcal {Y}}^{*},{\mathcal {Y}}}\\&\quad \le -\left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r}_{{\mathcal {Y}}} + \left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r-1}_{{\mathcal {Y}}}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}}\\&\qquad +\eta \left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r-1}_{{\mathcal {Y}}}\left\| F(x^{\dagger }) - F(x^{\delta }_{k}) \right\| _{{\mathcal {Y}}} \\&\quad \le -\left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r}_{{\mathcal {Y}}} + \left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r-1}_{{\mathcal {Y}}}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}} \\&\qquad +\eta \left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r-1}_{{\mathcal {Y}}}(\left\| F(x^{\dagger }) - y^{\delta } \right\| _{{\mathcal {Y}}} + \left\| y^{\delta }- F(x^{\delta }_{k}) \right\| _{{\mathcal {Y}}} ) \\&\quad =-(1 - \eta )\left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r}_{{\mathcal {Y}}} + (1 + \eta )\left\| F(x^{\delta }_{k})-y^{\delta } \right\| ^{r-1}_{{\mathcal {Y}}}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}} \end{aligned}$$

Therefore, with the definition of \(\mu ^{\delta }_{k}\) in (1.5), we have

$$\begin{aligned} \begin{aligned}&D_{\xi ^{\delta }_{k+1}} {\mathcal {R}}(x^{\dagger },x^{\delta }_{k+1}) - D_{\xi ^{\delta }_{k}}{\mathcal {R}}(x^{\dagger },x^{\delta }_{k}) \\&\quad \le -c_{1} \tilde{\mu }^{\delta }_k \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + (1+\eta )\tilde{\mu }^{\delta }_k \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p-1}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}}. \end{aligned} \end{aligned}$$
(2.10)

According to the definition of \(\hat{n}_{\delta }\), for \(k < \hat{n}_{\delta }\) we have

$$\begin{aligned} \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p-1}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}} \le \frac{1}{\tau } \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p-1} \left( \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \right) ^{1/p}, \end{aligned}$$

and by Young’s inequality, we further arrive at

$$\begin{aligned}&\left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p-1}\left\| y^{\dagger }- y^{\delta } \right\| _{{\mathcal {Y}}} \\&\quad \le \frac{1}{\tau }\left( \frac{1}{p}\left( \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \right) + \frac{1}{p^{*}}\left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \right) \\&\quad \le \frac{1}{\tau }\left( \frac{1}{p}\left( \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \right) + \frac{1}{p^{*}}\left( \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \right) \right) \\&\quad =\frac{1}{\tau }\left( \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \right) . \end{aligned}$$

Since \(\tilde{\mu }^{\delta }_k \le \mu _{1}\), we can conclude that

$$\begin{aligned}&D_{\xi ^{\delta }_{k+1}} {\mathcal {R}}(x^{\dagger },x^{\delta }_{k+1}) - D_{\xi ^{\delta }_{k}}{\mathcal {R}}(x^{\dagger },x^{\delta }_{k})\\&\quad \le -\left( c_{1} -\frac{1+\eta }{\tau } \right) \tilde{\mu }^{\delta }_k \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} + \frac{ (1+\eta )(2^{p}-1)c_{0}\rho ^{p}}{2\tau (k +a)^{2} }. \end{aligned}$$

Then by taking the sum from \(k=0\) to \(k=n<\hat{n}_{\delta }\) on both sides, we further obtain

$$\begin{aligned}&D_{\xi ^{\delta }_{n+1}} {\mathcal {R}}(x^{\dagger },x^{\delta }_{n+1}) + \left( c_{1} -\frac{1+\eta }{\tau } \right) \sum _{k=0}^{n}\tilde{\mu }^{\delta }_k\left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \\&\quad \le D_{\xi _{0}}{\mathcal {R}}(x^{\dagger },x_{0}) + \frac{ (1+\eta )(2^{p}-1)c_{0}\rho ^{p}}{2\tau }\sum _{k=0}^{n}\frac{1}{(k+a)^{2}} \\&\quad \le c_{0}\rho ^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2}\sum _{k=0}^{\infty }\frac{1}{(k+a)^{2}}, \end{aligned}$$

where we used Assumption 2.2 (b) and the condition \(\tau > (1+\eta )/c_1\) which implies that \((1+\eta )/\tau< c_1<1\). Noting that \(a \ge 1\), and since \(\sum _{k=0}^{\infty }\frac{1}{(k+1)^{2}} = \pi ^{2}/6 <2\), we can obtain

$$\begin{aligned} D_{\xi ^{\delta }_{n+1}} {\mathcal {R}}(x^{\dagger },x^{\delta }_{n+1}) + \left( c_{1} -\frac{1+\eta }{\tau } \right) \sum _{k=0}^{n}\tilde{\mu }^{\delta }_{k} \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \le c_{0}(2\rho )^{p}. \end{aligned}$$
(2.11)

By using the p-convexity of \({\mathcal {R}}\), it follows that

$$\begin{aligned} c_{0}\left\| x^{\delta }_{n+1}-x^{\dagger } \right\| _{{\mathcal {X}}}^{p} \le D_{\xi ^{\delta }_{n+1}} {\mathcal {R}}(x^{\dagger },x^{\delta }_{n+1}) \le c_{0}(2\rho )^{p} \end{aligned}$$

which shows that \(\left\| x^{\delta }_{n+1}-x^{\dagger } \right\| _{{\mathcal {X}}} \le 2\rho \). Since \(\left\| x_{0}-x^{\dagger } \right\| _{{\mathcal {X}}} \le \rho \), we therefore have \(x^{\delta }_{n+1} \in B_{3\rho }(x_{0})\).

Next, we prove that \(\hat{n}_{\delta }\) is finite. By contradiction, suppose \(\hat{n}_{\delta }\) is infinite. Then the previous argument shows that \(x^{\delta }_{n} \in B_{3\rho }(x_{0})\) for all \(n \ge 0\). Consequently it follows from (2.11) that

$$\begin{aligned} \left( c_{1} -\frac{1+\eta }{\tau } \right) \sum _{k=0}^{n}\tilde{\mu }^{\delta }_{k} \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \le c_{0}(2\rho )^{p} \end{aligned}$$

for all \(n \ge 0\). From the definition of \(\tilde{\mu }^{\delta }_{n}\), we have

$$\begin{aligned} \tilde{\mu }^{\delta }_{n} \ge \min \left\{ \mu _{0}B^{-p},\mu _{1} \right\} . \end{aligned}$$

Then, for all \(n \ge 0\) we have

$$\begin{aligned} c_{3}\sum _{k=0}^{n}\left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \le c_{0}(2\rho )^{p}, \end{aligned}$$
(2.12)

where \(c_{3}:= \left( c_{1} -\frac{1+\eta }{\tau } \right) \min \left\{ \mu _{0}B^{-p},\mu _{1} \right\} >0\). In addition, the definition of \(\hat{n}_{\delta }\) tells us that

$$\begin{aligned} \left\| F(x_k^\delta )-y^\delta \right\| _{{\mathcal {Y}}}^{p} \ge \tau ^{p}\left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} - \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(k +a)^{2} } \end{aligned}$$

for all \(k \ge 0\). Summing up both sides of the previous inequality, and then using (2.12), we further obtain

$$\begin{aligned} c_{3}\tau ^{p}\left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p}(n+1)&\le 2^{p}c_{0}\rho ^{p} + \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}} \sum _{k=0}^{n}\frac{1}{(k+a)^{2}} \\&\le 2^{p}c_{0}\rho ^{p} + \frac{\pi ^{2}(2^{p}-1)c_{0}\rho ^{p}}{12\mu _{1}} < \infty , \end{aligned}$$

and by taking \(n \rightarrow \infty \), we obtain a contradiction. Finally, from the definition of \(\hat{n}_{\delta }\) in (2.7),

$$\begin{aligned} \frac{(2^{p}-1)c_{0}\rho ^{p}}{2\mu _{1}(\hat{n}_{\delta }+a)^{2} } \le \tau ^{p}\left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \rightarrow 0 \quad \text {as } \delta \rightarrow 0. \end{aligned}$$

Therefore we must have \(\hat{n}_{\delta }\rightarrow \infty \) as \(\delta \rightarrow 0\). \(\square \)

Lemma 2.2

Let Assumptions 2.1 and 2.2 hold. Let \(\mu _{0}>0\) be chosen such that (2.8) holds, and let \(\left\{ y^{\delta } \right\} \) be a family of noisy data satisfying Assumption 2.5. Let \(n_{*}:=n_{*}(y^{\delta })\) be the integer defined by Rule 2.1. Then \(\Theta (n_{*}(y^{\delta }),y^{\delta }) \rightarrow 0\) as \(\delta \rightarrow 0\). Consequently, \(\left\| F(x^{\delta }_{n_{*}(y^{\delta })}) - y^{\delta } \right\| _{{\mathcal {Y}}} \rightarrow 0\) and \(n_{*}(y^{\delta })\left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \rightarrow 0\) as \(\delta \rightarrow 0\).

Proof

Let \(\hat{n}_{\delta }\) be the integer defined by (2.7). From the proof of Lemma 2.1 we have

$$\begin{aligned} \sum _{n=0}^{\hat{n}_{\delta }- 1}\left\| F(x^{\delta }_{n}) - y^{\delta } \right\| _{{\mathcal {Y}}}^{p} \le \frac{2^{p}c_{0}\rho ^{p}}{c_{3}} \end{aligned}$$

By the minimality of \(\Theta (n_{*}(y^{\delta }),y^{\delta })\), it follows that

$$\begin{aligned} \left\| F(x^{\delta }_{n}) - y^{\delta } \right\| _{{\mathcal {Y}}}^{p} = \frac{\Theta (n,y^{\delta })}{n+a} \ge \frac{\Theta (n_{*}(y^{\delta }),y^{\delta })}{n+a}. \end{aligned}$$

Hence,

$$\begin{aligned} \Theta (n_{*}(y^{\delta }),y^{\delta }) \sum _{n=0}^{\hat{n}_{\delta }- 1}\frac{1}{n+a} \le \sum _{n=0}^{\hat{n}_{\delta }- 1}\left\| F(x^{\delta }_{n}) - y^{\delta } \right\| _{{\mathcal {Y}}}^{p} \le \frac{2^{p}c_{0}\rho ^{p}}{c_{3}}. \end{aligned}$$

Note that

$$\begin{aligned} \sum _{n=0}^{{\hat{n}}_\delta -1} \frac{1}{n+a} \ge \sum _{n=0}^{\hat{n}_\delta -1} \int _n^{n+1} \frac{1}{t+a} dt = \int _0^{\hat{n}_\delta } \frac{1}{t+a} dt = \log \frac{\hat{n}_\delta +a}{a}. \end{aligned}$$

Thus

$$\begin{aligned} \Theta (n_{*}(y^{\delta }),y^{\delta }) \le \frac{2^{p}c_{0}\rho ^{p}}{c_{3}\log \left( (\hat{n}_{\delta }+a)/a\right) }. \end{aligned}$$

Since \(\hat{n}_{\delta }\rightarrow \infty \) we must have \(\Theta (n_{*}(y^{\delta }),y^{\delta }) \rightarrow 0\) as \(\delta \rightarrow 0\). Note that

$$\begin{aligned} a \left\| F(x^{\delta }_{n_{*}(y^{\delta })}) - y^{\delta } \right\| _{{\mathcal {Y}}}^{p} \le \Theta (n_{*}(y^{\delta }),y^{\delta }), \end{aligned}$$

and Assumption 2.5 implies that

$$\begin{aligned} \left( n_{*}(y^{\delta }) + a \right) \kappa ^{p} \left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \le \Theta (n_{*}(y^{\delta }),y^{\delta }). \end{aligned}$$

Hence, since \(\Theta (n_{*}(y^{\delta }),y^{\delta }) \rightarrow 0\), we obtain \(\left\| F(x^{\delta }_{n_{*}(y^{\delta })}) - y^{\delta } \right\| _{{\mathcal {Y}}} \rightarrow 0\) and \(n_{*}(y^{\delta }) \left\| y^{\delta }- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \rightarrow 0\) as \(\delta \rightarrow 0\). \(\square \)

Lemma 2.3

Let Assumptions 2.1 and 2.2 hold. Let \(\mu _{0}>0\) be chosen such that (2.8) holds, and let \(\left\{ y^{\delta } \right\} \) be a family of noisy data satisfying Assumption 2.5. Let \(n_{*}:=n_{*}(y^{\delta })\) be the integer defined by Rule 2.1. Then \(x^{\delta }_{n} \in B_{3\rho }(x_0)\) for all \(0 \le n \le n_{*}\) if \(\delta \) is sufficiently small.

Proof

We use an induction argument. The result is trivial for \(n=0\). Now we assume that \(x^{\delta }_{n} \in B_{3\rho }(x_0)\) for some \(n < n_{*}(y^{\delta })\). We will use (2.10) to prove that \(x^{\delta }_{n+1} \in B_{3\rho }(x_0)\). By Young's inequality we have

$$\begin{aligned} (1+\eta ) \Vert F(x_n^\delta )-y^\delta \Vert _{{\mathcal {Y}}}^{p-1} \Vert y^\delta -y^{\dagger }\Vert _{{\mathcal {Y}}} \le \frac{c_1}{p^*} \Vert F(x_n^\delta ) - y^{\delta }\Vert _{{\mathcal {Y}}}^p + \frac{(1+\eta )^p}{p c_1^{p-1}}\Vert y^\delta -y^{\dagger }\Vert _{{\mathcal {Y}}}^p. \end{aligned}$$

Combining this with (2.10), we have

$$\begin{aligned} D_{\xi _{n+1}^\delta } {\mathcal {R}}(x^\dag , x_{n+1}^\delta ) - D_{\xi _n^\delta }{\mathcal {R}}(x^\dag , x_n^\delta )&\le -\frac{c_1}{p} \tilde{\mu }^{\delta }_n \Vert F(x_n^\delta ) -y^\delta \Vert _{{\mathcal {Y}}}^p + c_4 \Vert y^\delta -y^{\dagger }\Vert _{{\mathcal {Y}}}^p, \end{aligned}$$

where \(c_4:= (1+\eta )^p\mu _1/(p c_1^{p-1})\). Therefore

$$\begin{aligned} D_{\xi _{n+1}^\delta } {\mathcal {R}}(x^\dag , x_{n+1}^\delta ) + \frac{c_1}{p}\sum _{k=0}^n \tilde{\mu }^{\delta }_k \Vert F(x_k^\delta ) - y^\delta \Vert _{{\mathcal {Y}}}^p&\le D_{\xi _0} {\mathcal {R}}(x^\dag , x_0) + c_4 (n+1) \Vert y^\delta - y^{\dagger }\Vert _{{\mathcal {Y}}}^p \\&\le c_0 \rho ^p + c_4 n_*(y^\delta ) \Vert y^\delta - y^{\dagger }\Vert _{{\mathcal {Y}}}^p \end{aligned}$$

By Lemma 2.2, we can guarantee that

$$\begin{aligned} c_4 n_*(y^\delta ) \Vert y^\delta -y^{\dagger }\Vert _{{\mathcal {Y}}}^p \le (2^p-1) c_0\rho ^p \end{aligned}$$

for sufficiently small \(\delta \). Then

$$\begin{aligned} D_{\xi _{n+1}^\delta } {\mathcal {R}}(x^\dag , x_{n+1}^\delta ) \le c_0 \rho ^p + (2^p-1) c_0 \rho ^p = c_0 (2 \rho )^p. \end{aligned}$$

By the p-convexity of \({\mathcal {R}}\) we have \(c_0 \Vert x_{n+1}^\delta -x^\dag \Vert _{{\mathcal {X}}}^p \le c_0 (2\rho )^p\), which shows that \(\Vert x_{n+1}^\delta -x^\dag \Vert _{{\mathcal {X}}} \le 2\rho \). Since \(\Vert x_0-x^\dag \Vert _{{\mathcal {X}}}\le \rho \), we therefore have \(x_{n+1}^\delta \in B_{3\rho }(x_0)\). \(\square \)

We next recall a result from [24] concerning the noise-free iteration (2.3)–(2.5), which will be used later.

Lemma 2.4

Let Assumptions 2.1 and 2.2 hold. Let \(\mu _{0}>0\) be chosen such that (2.8) holds. Then for any sequence \(\left\{ (\xi _{n},x_{n}) \right\} \in \Gamma _{\mu _{0},\mu _{1} }(\xi _{0},x_{0})\) there exists a solution \(x_{*}\) of (1.1) such that \(D_{\xi _{n}}{\mathcal {R}}(x_{*},x_{n}) \rightarrow 0\) as \(n \rightarrow \infty \). If, in addition, \(\xi _{n+1} - \xi _{n} \in \mathcal {N}(L(x^{\dagger }))^{\perp }\) for all n, then \(x_{*}=x^{\dagger }\).

Proof

This is [24, Lemma 2.5]. \(\square \)

2.2 Main result

We are now ready to prove the main result of this paper. We first need two stability results, based on Assumption 2.3 and Assumption 2.4 respectively. These stability results link the method (1.4) to the noise-free iteration (2.3)–(2.5) so that Lemma 2.4 can be applied.

Lemma 2.5

Let Assumptions 2.1 and 2.2 hold. Let \(\mu _{0}>0\) be chosen such that (2.8) holds, and let \(\left\{ y^{\delta } \right\} \) be a family of noisy data satisfying Assumption 2.5. Let \(n_{*}:=n_{*}(y^{\delta })\) be the integer defined by Rule 2.1.

  1. (a)

    If Assumption 2.3 holds and \(1< r <\infty \), then there is a sequence \(\left\{ (\xi _{n},x_{n}) \right\} \in \Gamma _{\mu _{0},\mu _{1} }(\xi _{0},x_{0})\) such that

    $$\begin{aligned} x^{\delta }_{n} \rightarrow x_{n} \quad \text { and } \quad \xi ^{\delta }_{n} \rightarrow \xi _{n} \quad \text {as } \delta \rightarrow 0 \end{aligned}$$

    for all \(0 \le n \le \liminf _{\delta \rightarrow 0}n_{*}(y^{\delta })\).

  2. (b)

    If Assumption 2.4 holds and \(1 \le r <\infty \), then for any subsequence \(\left\{ y^{\delta _{l}} \right\} \), with \(\delta _{l}\rightarrow 0\) as \(l \rightarrow \infty \), of \(\left\{ y^{\delta } \right\} \), by taking a subsequence if necessary, there is a sequence \(\left\{ (\xi _{n},x_{n}) \right\} \in \Gamma _{\mu _{0},\mu _{1} }(\xi _{0},x_{0})\) such that

    $$\begin{aligned} x^{\delta _{l}}_{n} \rightarrow x_{n} \quad \text { and } \quad \xi ^{\delta _{l}}_{n} \rightarrow \xi _{n} \quad \text {as } l \rightarrow \infty \end{aligned}$$

    for all \(0 \le n \le \liminf _{l \rightarrow \infty }n_{*}(y^{\delta _{l}})\).

If in addition \(\mathcal {N}(L(x^{\dagger })) \subset \mathcal {N}(L(x))\) for all \(x \in B_{3\rho }(x_0)\), then there also holds \(\xi _{n+1} - \xi _{n} \in \mathcal {N}(L(x^{\dagger }))^{\perp }\) for all n.

Proof

  1. (a)

    The proof is similar to that of [19, Lemma 3.8]. Note that since Assumption 2.3 holds, the mappings \(y \mapsto j_{r}^{\mathcal {Y}}(y)\) and \(x \mapsto L(x)\) are continuous [5].

  2. (b)

    Since Assumption 2.4 holds, the proof follows from that of [24, Lemma 2.3] with \(N:=\liminf _{l \rightarrow \infty }n_{*}(y^{\delta _{l}})\).

\(\square \)

Now we are ready to prove the main result of this paper concerning the method (1.4)–(1.5).

Theorem 2.6

Let Assumptions 2.1 and 2.2 hold, and let \(\mu _{0}\) be chosen such that (2.8) holds. Let \(\left\{ y^{\delta } \right\} \) be a family of noisy data satisfying Assumption 2.5 and let \(n_{*}(y^{\delta })\) be the integer determined by Rule 2.1.

  1. (a)

    If Assumption 2.3 holds and \(1< r <\infty \), then there is a solution \(x_{*}\) of (1.1) such that

    $$\begin{aligned} \left\| x^{\delta }_{n_{*}(y^{\delta })} - x_{*} \right\| _{{\mathcal {X}}} \rightarrow 0 \quad \text {and} \quad D_{\xi ^{\delta }_{n_{*}(y^{\delta })}}{\mathcal {R}}(x_{*}, x^{\delta }_{n_{*}(y^{\delta })} ) \rightarrow 0 \end{aligned}$$

    as \(\delta \rightarrow 0\). If in addition \(\mathcal {N}(L(x^{\dagger })) \subset \mathcal {N}(L(x))\) for all \(x \in B_{3\rho }(x_0)\), then \(x_{*}=x^{\dagger }\).

  2. (b)

    If Assumption 2.4 holds and \(1 \le r <\infty \), then for any subsequence \(\left\{ y^{\delta _{l}} \right\} \), with \(\delta _{l}\rightarrow 0\) as \(l \rightarrow \infty \), of \(\left\{ y^{\delta } \right\} \), by taking a subsequence of \(\left\{ y^{\delta _{l}} \right\} \) if necessary, there hold

    $$\begin{aligned} \left\| x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})} - x_{*} \right\| _{{\mathcal {X}}} \rightarrow 0 \quad \text {and} \quad D_{\xi ^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}}{\mathcal {R}}(x_{*}, x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})} ) \rightarrow 0 \end{aligned}$$

    as \(l \rightarrow \infty \) for some solution \(x_{*}\) of (1.1) in \(B_{3\rho }(x_{0})\). If in addition \(\mathcal {N}(L(x^{\dagger })) \subset \mathcal {N}(L(x))\) for all \(x \in B_{3\rho }(x_0)\), then

    $$\begin{aligned} \left\| x^{\delta }_{n_{*}(y^{\delta })} - x^{\dagger } \right\| _{{\mathcal {X}}} \rightarrow 0 \quad \text {and} \quad D_{\xi ^{\delta }_{n_{*}(y^{\delta })}}{\mathcal {R}}(x^{\dagger }, x^{\delta }_{n_{*}(y^{\delta })} ) \rightarrow 0 \end{aligned}$$

    as \(\delta \rightarrow 0\).

Proof

We will only prove (b) since (a) can be proved similarly. Let \(\left\{ y^{\delta _{l}} \right\} \) be a subsequence of \(\left\{ y^{\delta } \right\} \), and let \(N:=\liminf _{l \rightarrow \infty }n_{*}(y^{\delta _{l}})\). By taking a subsequence of \(\left\{ y^{\delta _{l}} \right\} \) if necessary, we may assume \(N=\lim _{l \rightarrow \infty }n_{*}(y^{\delta _{l}})\) and, according to Lemma 2.5, we can find a sequence \(\left\{ (\xi _{n},x_{n}) \right\} \in \Gamma _{\mu _{0},\mu _{1} }(\xi _{0},x_{0})\) such that

$$\begin{aligned} \xi ^{\delta _{l}}_{n} \rightarrow \xi _{n} \quad \text {and} \quad x^{\delta _{l}}_{n} \rightarrow x_{n} \quad \text {as } \, l \rightarrow \infty \end{aligned}$$
(2.13)

for all \(0 \le n \le N\). Let \(x_{*}\) be the solution of (1.1) in \(B_{3\rho }(x_0)\) given by Lemma 2.4, which, by the p-convexity of \({\mathcal {R}}\), is the limit of \(\left\{ x_{n} \right\} \). We show that

$$\begin{aligned} \lim _{l \rightarrow \infty } D_{\xi ^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}}{\mathcal {R}}(x_{*}, x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})} ) = 0. \end{aligned}$$
(2.14)

If \(N<\infty \), then (2.14) can be proven using the argument for the corresponding case in the proof of [24, Theorem 2.6], since that argument only requires \(\left\| F(x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}) - y^{\delta _{l}} \right\| _{{\mathcal {Y}}} \rightarrow 0\) as \(l \rightarrow \infty \), which is guaranteed by Lemma 2.2. If \(N=\infty \), then for any fixed integer \(n \ge 1\) we have \(n_{*}(y^{\delta _{l}})> n\) for large l. According to the proof of Lemma 2.3 we have

$$\begin{aligned} D_{\xi ^{\delta _{l}}_{k+1}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{k+1}) - D_{\xi ^{\delta _{l}}_{k}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{k}) \le C\left\| y^{\delta _{l}}- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \end{aligned}$$

for \(0 \le k <n_{*}(y^{\delta _{l}})\), where C is a positive constant independent of l. Choose n such that \(0 \le n <n_{*}(y^{\delta _{l}})\). Summing both sides of the previous inequality from \(k=n\) to \(k=n_{*}(y^{\delta _{l}})- 1\), we obtain

$$\begin{aligned} D_{\xi ^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})})&\le D_{\xi ^{\delta _{l}}_{n}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{n}) + C(n_{*}(y^{\delta _{l}})- n + 1)\left\| y^{\delta _{l}}- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p} \\&\le D_{\xi ^{\delta _{l}}_{n}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{n}) + Cn_{*}(y^{\delta _{l}})\left\| y^{\delta _{l}}- y^{\dagger } \right\| _{{\mathcal {Y}}}^{p}. \end{aligned}$$

We follow the argument for the corresponding case in the proof of [24, Theorem 2.6]. From the proof of Lemma 2.3, the monotonicity of \(\left\{ D_{\xi ^{\delta }_{n}}{\mathcal {R}}(x_{*},x^{\delta }_{n}) \right\} \) also holds. Using this, (2.13), and the lower semi-continuity of \({\mathcal {R}}\), we obtain

$$\begin{aligned} 0&\le \limsup _{l \rightarrow \infty }D_{\xi ^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{n_{*}(y^{\delta _{l}})}) \le \limsup _{l \rightarrow \infty }D_{\xi ^{\delta _{l}}_{n}}{\mathcal {R}}(x_{*},x^{\delta _{l}}_{n}) \\&\le {\mathcal {R}}(x_{*}) - \liminf _{l \rightarrow \infty } {\mathcal {R}}(x^{\delta _{l}}_{n}) - \lim _{l \rightarrow \infty }\left\langle \xi ^{\delta _{l}}_{n}, x_{*} - x^{\delta _{l}}_{n} \right\rangle \\&\le {\mathcal {R}}(x_{*}) - {\mathcal {R}}(x_{n}) - \left\langle \xi _{n}, x_{*} - x_{n} \right\rangle \\&\le D_{\xi _{n}}{\mathcal {R}}(x_{*},x_{n}). \end{aligned}$$

Since Lemma 2.4 implies that \(D_{\xi _{n}}{\mathcal {R}}(x_{*},x_{n}) \rightarrow 0\) as \(n \rightarrow \infty \), we can conclude (2.14) again. \(\square \)

For equation (1.1) with a bounded linear operator \(F:{\mathcal {X}}\rightarrow {\mathcal {Y}}\), Assumption 2.4 can be replaced by the compactness of F. This leads to the following convergence result.

Corollary 2.7

Consider equation (1.1), where \(F:{\mathcal {X}}\rightarrow {\mathcal {Y}}\) is a compact bounded linear operator. Let \({\mathcal {R}}\) satisfy Assumption 2.1 and let \(\mu _{0} >0\) be chosen such that

$$\begin{aligned} 1 - \frac{1}{p^{*}}\left( \frac{\mu _{0} }{2c_{0}} \right) ^{p^{*}-1} > 0. \end{aligned}$$

Then for the Landweber iteration (1.4) with the integer \(n_{*}(y^{\delta })\) determined by Rule 2.1, there hold

$$\begin{aligned} \left\| x^{\delta }_{n_{*}(y^{\delta })} - x^{\dagger } \right\| _{{\mathcal {X}}} \rightarrow 0 \quad \text {and} \quad D_{\xi ^{\delta }_{n_{*}(y^{\delta })}}{\mathcal {R}}(x^{\dagger }, x^{\delta }_{n_{*}(y^{\delta })} ) \rightarrow 0 \end{aligned}$$

as \(\delta \rightarrow 0\).

3 Numerical examples

Example 3.1

Now we consider an example with a nonsmooth forward operator. To start, consider an open bounded subset \(\Omega \subset \mathbb {R}^{d}\), \(d \in \left\{ 1,2 \right\} \), with a Lipschitz boundary \(\partial \Omega \), and consider the nonsmooth semilinear equation

$$\begin{aligned} \left\{ \begin{array}{ll} -\Delta y + y^{+} = u &{} \text {in } \Omega , \\ y = 0 &{} \text {on } \partial \Omega , \end{array} \right. \end{aligned}$$
(3.1)

with \(u \in L^{2}(\Omega )\) and \(y^{+}(x):=\max (y(x),0)\) for almost every \(x \in \Omega \). Equation (3.1) arises in a number of applications, such as modelling the deflection of a stretched thin membrane partially covered by water (see [21]). It also appears in free boundary problems for a confined plasma; see [21, 23, 28] for some examples. For each \(u \in L^{2}(\Omega )\), a unique solution \(y_{u}\) in \(H_{0}^{1}(\Omega ) \cap C(\Omega )\) exists [29, Theorem 4.7], so we consider the inverse problem of estimating the source term u from noisy measurements of \(y_{u}\).

Define the forward operator \(F:L^{2}(\Omega ) \rightarrow H_{0}^{1}(\Omega ) \cap C(\Omega )\) by \(F(u)=y_{u}\) for \(u \in L^{2}(\Omega )\). As shown in [4, Proposition 2.1, Theorem 2.2], F is globally Lipschitz continuous and has a directional derivative \(F'(u;h)\) in the direction \(h \in L^{2}(\Omega )\) given by the function \(\nu \in H_{0}^{1}(\Omega )\) which solves

$$\begin{aligned} \left\{ \begin{array}{ll} -\Delta \nu + \mathbb {1}_{\left\{ y_{u}=0 \right\} }\nu ^{+} + \mathbb {1}_{\left\{ y_{u}>0 \right\} }\nu = h &{} \text {in } \Omega , \\ \nu = 0 &{} \text {on } \partial \Omega . \end{array} \right. \end{aligned}$$

The operator F is Gâteaux differentiable at \(u \in L^2(\Omega )\) if and only if the Lebesgue measure of the set \(\left\{ x \in \Omega : y_{u}(x)=0 \right\} \) is zero [7, Proposition 3.4]. Hence, in general, F is not Gâteaux differentiable. Moreover, computing the directional derivative of F can be difficult, and a more convenient alternative is to use the Bouligand subdifferential [4, Proposition 3.16]. Given \(u \in L^{2}(\Omega )\), a specific Bouligand subderivative of F is given by the solution operator \(G_{u}:L^{2}(\Omega ) \rightarrow H_{0}^{1}(\Omega )\hookrightarrow L^{2}(\Omega )\) which maps \(h \in L^{2}(\Omega )\) to the unique solution \(\nu \in H_{0}^{1}(\Omega )\) of

$$\begin{aligned} \left\{ \begin{array}{ll} -\Delta \nu + \mathbb {1}_{\left\{ y_{u}>0 \right\} }\nu = h &{} \text {in } \Omega , \\ \nu = 0 &{} \text {on } \partial \Omega , \end{array} \right. \end{aligned}$$
(3.2)

where \(y_{u}=F(u)\). In fact \(G_{u}\) is uniformly bounded over \(u \in L^{2}(\Omega )\) and satisfies the generalized tangential cone condition [7, Lemmas 3.7 and 3.9]. Hence Assumption 2.2(c) holds with \(L(u)=G_{u}\).

Now we can use the Landweber iteration (1.4) to solve the inverse problem of recovering \(u \in L^{2}(\Omega )\) from \(y_{u}=F(u)\) satisfying (3.1). Since F is a mapping between Hilbert spaces, we take \(r=2\), so that \(J_{2} \equiv I\). Since the convex quadratic penalty \({\mathcal {R}}(u)=\left\| u \right\| _{L^{2}(\Omega )}^{2}/2\) is 2-convex, we set \(p=2\). By choosing \(L(u) = G_{u}\), the iterative equation in (1.4) reduces to the Bouligand–Landweber iteration [7] given by

$$\begin{aligned} u_{n+1}^\delta = u_{n}^\delta - \mu _{n}^\delta G_{u_{n}^\delta }^{*}\left( F(u_{n}^\delta )-y^\delta \right) , \end{aligned}$$
(3.3)

where \(G_{u_{n}^\delta }\) is the Bouligand subderivative (3.2) at \(u_{n}^\delta \), defined via the state \(y_{u_{n}^\delta }=F(u_{n}^\delta )\) solving (3.1) with source term \(u_{n}^\delta \), and \(\mu _{n}^{\delta } \) is a given step size. Under the discrepancy principle (1.6), the authors in [7] established the convergence of (3.3) using the constant step size

$$\begin{aligned} \mu _n^\delta = \frac{2-2\eta }{\bar{L}^2}, \text { where } \bar{L} = 5 \cdot 10^{-2}, \end{aligned}$$
(3.4)

where \(\eta =0.1\) is an estimate of the tangential cone constant in Assumption 2.2(c).

For the computation of the nonsmooth forward operator F, we discretize the non-smooth semilinear elliptic problem (3.1) using standard continuous piecewise linear finite elements and then solve the resulting non-smooth nonlinear equation using a semi-smooth Newton method. The same discretization is also used for computing the Bouligand subderivative \(G_{u}\) via (3.2). For more details, we refer the reader to [7] and the references therein. A rough sketch of the nonlinear solver is given below.
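
As an illustration of this solver, the following sketch applies a semi-smooth Newton method to a simple finite-difference discretization of (3.1) on a uniform interior grid of the unit square; the actual experiments use the piecewise linear finite element discretization of [7], so this is only indicative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplacian_2d(m, h):
    """5-point finite-difference Laplacian on an m x m interior grid (homogeneous Dirichlet BC)."""
    I = sp.identity(m, format="csr")
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr")
    return (sp.kron(I, T) + sp.kron(T, I)) / h**2

def solve_state(u, m, h, tol=1e-10, maxit=50):
    """Semi-smooth Newton iteration for the discretized equation A y + max(y, 0) = u."""
    A = laplacian_2d(m, h)
    y = np.zeros_like(u)
    for _ in range(maxit):
        res = A @ y + np.maximum(y, 0.0) - u
        if np.linalg.norm(res) <= tol * max(1.0, np.linalg.norm(u)):
            break
        # a generalized derivative of y -> max(y, 0) is the indicator of {y > 0}
        J = A + sp.diags((y > 0).astype(float))
        y = y + spla.spsolve(J.tocsc(), -res)
    return y
```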

We consider the two-dimensional problem with \(\Omega = (0,1) \times (0,1) \subset \mathbb {R}^2\) and use a uniform triangular Friedrichs–Keller triangulation with \(128^2\) vertices. The operators F and \(G_{u}\) are computed with the finite element scheme described above (see [7] for more details). We used the Python implementation available at https://github.com/clason/bouligandlandweber. The exact solution to be reconstructed is defined as

$$\begin{aligned} u^{\dagger }(x_{1},x_{2}) =&\max (y^{\dagger }(x_{1},x_{2}),0) \\&+\left[ 4\pi ^{2}y^{\dagger }(x_{1},x_{2})-2\left( (2x_{1}-1)^2+2(x_{1}-1+\beta )(x_{1}-\beta ) \right) \sin (2\pi x_{2}) \right] \\&\times \mathbb {1}_{(\beta ,1-\beta ] }(x_{1}) \end{aligned}$$

where

$$\begin{aligned} y^{\dagger }(x_{1},x_{2}) = \left[ (x_{1}-\beta )^{2}(x_{1}-1+\beta )^{2}\sin (2\pi x_{2}) \right] \mathbb {1}_{(\beta ,1-\beta ] }(x_{1}) \end{aligned}$$

with \(\beta =0.005\). Figure 2i shows a plot of \(u^{\dagger }\). The function \(y^\dagger \in H^{2}(\Omega ) \cap H_{0}^{1}(\Omega )\) satisfies (3.1) with right-hand side \(u^{\dagger }\) and vanishes on a set of measure \(2\beta \), so the forward operator F is not Gâteaux differentiable at \(u^\dagger \) [7, proposition 3.4]. Hence we can use the Bouligand–Landweber iteration (3.3).
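For reference, the formulas for \(u^{\dagger }\) and \(y^{\dagger }\) translate directly into vectorized Python; the following sketch is nothing more than a transcription of the expressions above.

```python
import numpy as np

beta = 0.005

def y_dagger(x1, x2):
    chi = ((x1 > beta) & (x1 <= 1.0 - beta)).astype(float)   # indicator of (beta, 1-beta]
    return (x1 - beta) ** 2 * (x1 - 1.0 + beta) ** 2 * np.sin(2.0 * np.pi * x2) * chi

def u_dagger(x1, x2):
    chi = ((x1 > beta) & (x1 <= 1.0 - beta)).astype(float)
    y = y_dagger(x1, x2)
    bracket = (4.0 * np.pi ** 2 * y
               - 2.0 * ((2.0 * x1 - 1.0) ** 2
                        + 2.0 * (x1 - 1.0 + beta) * (x1 - beta)) * np.sin(2.0 * np.pi * x2))
    return np.maximum(y, 0.0) + bracket * chi
```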

Fig. 1

a plots the noisy data \(y_{h}^{\delta }\) with \(\delta =3.875 \times 10^{-3}\); b plots the initial guess \(u_{0}=\bar{u}\)

Given the projection of \(y^{\dagger }\) onto the finite element space, denoted by \(y_{h}^{\dagger }\), random Gaussian noise is added to obtain the noisy data \(y_{h}^\delta \), so that \(\delta = \left\| y_{h}^\delta -y_{h}^{\dagger } \right\| _{L^{2}(\Omega )}\). Figure 1b shows noisy data with \(\delta =1.043 \times 10^{-4}\).
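Such data can be generated, for instance, by scaling a Gaussian perturbation to the prescribed noise level; a minimal sketch, with the Euclidean norm of the nodal values standing in for the \(L^{2}(\Omega )\) norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(y_h_exact, delta):
    """Return noisy data y_h^delta with ||y_h^delta - y_h^dagger|| = delta."""
    noise = rng.standard_normal(y_h_exact.shape)
    noise *= delta / np.linalg.norm(noise)
    return y_h_exact + noise
```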

Now we test the Landweber iteration on this problem under the discrepancy principle (1.6) and the Hanke–Raus rule (Rule 2.1). We use two initial guesses: the trivial point \(u_{0} \equiv 0\) and the function

$$\begin{aligned} \bar{u}:=u^{\dagger } - 10\sin (\pi x_{1})\sin (2\pi x_{2}), \end{aligned}$$

plotted in Fig. 1a.

We test the iteration under Rule 2.1 with \(\mu _{0}=0.6\) and \(\mu _{1}=5.0 \times 10^5\). For comparison, we also test it under the discrepancy principle by choosing an appropriate parameter \(\tau \) (see [19, theorem 3.9]). We terminate the iteration either when (1.6) is satisfied or when the number of iterations exceeds 5000.

Fig. 2

The reconstruction for Example 3.1 with noise level \(\delta =3.875 \times 10^{-3}\) and the trivial initial guess \(u_{0} \equiv 0\): a, b, and c plot \(\Theta (n,u^{\delta })\) versus n for \(a=1\), 100, and 10000, respectively; d, e, and f plot the reconstructed solutions via Rule 2.1 for \(a=1\), 100, and 10000, respectively; g plots the residual \(\left\| F(u^{\delta }_{n}) - y^{\delta } \right\| _{L^{2}(\Omega )}\) versus n; h plots the reconstructed solution via the discrepancy principle; and i plots the exact solution

Fig. 3

The reconstruction for Example 3.1 with noise level \(\delta =3.875 \times 10^{-3}\) and \(u_{0} =\bar{u}\): a, b, and c plot \(\Theta (n,u^{\delta })\) versus n for \(a=1\), 100, and 10000, respectively; d, e, and f plot the reconstructed solutions via the Hanke–Raus rule for \(a=1\), 100, and 10000, respectively; g plots the residual \(\left\| F(u^{\delta }_{n}) - y^{\delta } \right\| _{L^{2}(\Omega )}\) versus n; and h plots the solution via the discrepancy principle

Figures 2 and 3 report the numerical results obtained with Rule 2.1 and with the discrepancy principle. For various choices of the parameter a in Rule 2.1, Figs. 2d–f and 3d–f show the reconstructions using \(u_{0} \equiv 0\) and \(u_{0} = \bar{u}\), respectively. To illustrate that the noise condition in Assumption 2.5 is satisfied, we plot the residual against the iteration number in Figs. 2g and 3g. Figure 2i plots the exact solution \(u^{\dagger }\). Notice that the iteration (3.3) converges faster and produces a better reconstruction from the initial point \(\bar{u}\), since \(\bar{u}\) satisfies the general source condition with respect to the exact solution \(u^{\dagger }\) [7]. The faster convergence is visible in the plots of \(\Theta (n,u^{\delta })\) versus n in Fig. 2a, b for \(u_{0} \equiv 0\) and in Fig. 3a, b for \(u_{0} = \bar{u}\). Since our convergence analysis does not assume any source condition, the trivial initial point \(u_{0} \equiv 0\) can still produce an accurate reconstruction under Rule 2.1. For comparison, Figs. 2h and 3h plot the reconstructions obtained with the discrepancy principle for the two initial points.

Tables 1 and 2 give more details of the numerical results for Rule 2.1 at various noise levels using \(u_{0} \equiv 0\) and \(u_{0}=\bar{u}\), respectively. The regularization parameter \(n_{*}\) and the relative errors

$$\begin{aligned} E_{n}^{\delta }:=\frac{\left\| u_{h}^\delta -u^{\dagger } \right\| _{L^{2}(\Omega )}}{\left\| u^{\dagger } \right\| _{L^{2}(\Omega )}} \end{aligned}$$

are shown. For consistency we also report the values \(\delta _{rel} = \left\| y_{h}^\delta -y_{h}^{\dagger } \right\| _{L^{2}(\Omega )}/\left\| y_{h}^{\dagger } \right\| _{L^{2}(\Omega )}\). For comparison, Table 3 reports the stopping indices \(n_{\delta }\) obtained via the discrepancy principle for both \(u_{0} \equiv 0\) and \(u_{0}=\bar{u}\), together with the relative errors \(E_{n_{\delta }}\). As expected, the variable stepsize (1.5) requires fewer iterations to achieve convergence than the constant stepsize (3.4). The tabulated results further confirm the effect of the initial guess \(u_{0}=\bar{u}\) described above. More importantly, the figures and tables show that, even without information on the noise level, Rule 2.1 can still produce accurate reconstructions.
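A minimal sketch of how these two quantities can be evaluated, again with the Euclidean norm of the coefficient vectors standing in for the \(L^{2}(\Omega )\) norm:

```python
import numpy as np

def relative_errors(u_rec, u_exact, y_h_delta, y_h_exact):
    """Relative reconstruction error E_n^delta and relative data perturbation delta_rel."""
    E = np.linalg.norm(u_rec - u_exact) / np.linalg.norm(u_exact)
    delta_rel = np.linalg.norm(y_h_delta - y_h_exact) / np.linalg.norm(y_h_exact)
    return E, delta_rel
```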

Example 3.2

The problem of identifying source or coefficient terms in partial differential equations arises in a number of applications. Here we consider a well-known benchmark problem for the regularization of nonlinear inverse problems.

Suppose we want to recover the space-dependent coefficient c in the following elliptic boundary value problem

$$\begin{aligned} \left\{ \begin{array}{rl} -\Delta u + cu = f & \text {in } \Omega , \\ u = g & \text {on } \partial \Omega , \end{array} \right. \end{aligned}$$
(3.5)

given a measurement u in \(\Omega \), where \(\Omega \subset \mathbb {R}^m\) with \(m \in \mathbb {N}\) is a smooth bounded domain. Given spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), which are to be specified later, the forward operator F

$$\begin{aligned} F:{\mathcal {D}}(F) \subseteq {\mathcal {X}}\rightarrow {\mathcal {Y}}, \end{aligned}$$
(3.6)

its derivative \(F'\), and the adjoint of \(F'\), can be formally written as

$$\begin{aligned} F(c)= A(c)^{-1}f, \quad F'(c)h = -A(c)^{-1}(h\cdot F(c)), \quad F'(c)^{*}w = -u(c)A(c)^{-1}w \end{aligned}$$

for \(h,w \in L^{2}(\Omega )\), where

$$\begin{aligned} A(c):H^{2}(\Omega ) \cap H_{0}^{1}(\Omega )&\rightarrow L^{2}(\Omega ) \\ u&\mapsto -\Delta u + cu. \end{aligned}$$
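The following is a minimal finite-difference sketch of these three maps on the unit square, assuming homogeneous boundary data \(g = 0\) and grid functions restricted to interior nodes; it only illustrates the formulas above and does not reproduce the cited implementations.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def neg_laplacian(N):
    """-Delta on the interior nodes of (0,1)^2, 5-point stencil, mesh size 1/N."""
    n = N - 1
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)) * N ** 2

def F_forward(c, f, N):
    """F(c) = A(c)^{-1} f: solve (-Delta + c) u = f with u = 0 on the boundary."""
    A = (neg_laplacian(N) + sp.diags(c.ravel())).tocsc()
    return spla.spsolve(A, f.ravel()).reshape(c.shape)

def F_prime(c, h, u_c, N):
    """F'(c) h = -A(c)^{-1} (h * F(c)), with u_c = F(c) precomputed."""
    A = (neg_laplacian(N) + sp.diags(c.ravel())).tocsc()
    return -spla.spsolve(A, (h * u_c).ravel()).reshape(c.shape)

def F_prime_adj(c, w, u_c, N):
    """F'(c)^* w = -F(c) * A(c)^{-1} w (A(c) is self-adjoint)."""
    A = (neg_laplacian(N) + sp.diags(c.ravel())).tocsc()
    return -u_c * spla.spsolve(A, w.ravel()).reshape(c.shape)
```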

To preserve ellipticity, a straightforward choice of the domain \({\mathcal {D}}(F)\) is

$$\begin{aligned} {\mathcal {D}}(F) = \left\{ c \in {\mathcal {X}}\, \vert \, c \ge 0 \ \text {a.e.}, \ \left\| c \right\| _{{\mathcal {X}}} \le \rho \right\} \end{aligned}$$
(3.7)

for some sufficiently small \(\rho >0\). For situations requiring a nonempty interior of \({\mathcal {D}}(F)\) in \({\mathcal {X}}\), the choice

$$\begin{aligned} {\mathcal {D}}(F) = \left\{ c \in {\mathcal {X}}\, \vert \, \exists \hat{c} \in L^{\infty }(\Omega ) \text { with } \hat{c} \ge 0 \ \text {a.e.}, \ \left\| c - \hat{c} \right\| _{{\mathcal {X}}} \le \rho \right\} \end{aligned}$$
(3.8)

for some sufficiently small \(\rho >0\) has been devised [9].

Table 1 Results for the Bouligand–Landweber iteration with variable stepsize for Example 3.1 under Rule 2.1 with \(u_{0} \equiv 0\)
Table 2 Results for the Bouligand–Landweber iteration with variable stepsize for Example 3.1 under Rule 2.1 with \(u_{0}=\bar{u}\)
Table 3 The reconstruction for Example 3.1 using the Bouligand–Landweber iteration (3.3) under the discrepancy principle

So far, the problem has been studied in the context of regularization in Hilbert spaces by setting both the preimage space \({\mathcal {X}}\) and the image space \({\mathcal {Y}}\) to be \(L^{2}(\Omega )\) [9, 17, 18]. Treating \({\mathcal {X}}\) and \({\mathcal {Y}}\) as Banach spaces, the problem has also been studied in a more general setting by incorporating a non-smooth convex penalty functional to recover a non-smooth solution [13, 19, 30, 31]. However, as previously argued, \({\mathcal {Y}}=L^{\infty }(\Omega )\) or \({\mathcal {Y}}=L^{1}(\Omega )\) are more suitable choices, especially in the practically relevant situation of impulsive noise. Hence, we treat this example in a more general setting by using

$$\begin{aligned} {\mathcal {X}}= L^{p}(\Omega ), \quad {\mathcal {Y}}= L^{r}(\Omega ) \end{aligned}$$

for \(p,r \in [1,\infty ]\). The result in [27, proposition 1.2] guarantees that in this more general setting the forward operator F and its derivative \(F'\) are still well-defined.

In addition, item (c) of Assumption 2.2 holds for small \(\rho >0\) [9]. Hence, we consider \({\mathcal {X}}= L^{2}(\Omega )\) and \({\mathcal {Y}} = L^{1}(\Omega )\) for the numerical simulation. We use finite differences to discretize the problem. We consider the two-dimensional problem with \(\Omega = \left[ 0,1 \right] \times \left[ 0,1 \right] \) divided into \(N \times N\) small squares of equal size. This results in a grid with \(N+1\) grid points in both the x- and y-directions and step size \(h = (N+1)^{-1}\). We define the sought parameter \(c^\dagger \) as

$$\begin{aligned} c^{\dagger }(x,y) = {\left\{ \begin{array}{ll} 1, &{} \text{ if } (x-0.65)^{2} + (y-0.36)^{2} \le 0.18^{2}, \\ 0.5, &{} \text{ if } (x-0.35)^{2} + 4(y-0.75)^{2} \le 0.2^{2}, \\ 0, &{} \text{ elsewhere. } \end{array}\right. } \end{aligned}$$
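On the finite-difference grid, \(c^{\dagger }\) can be evaluated directly from this formula; a short sketch, where the value \(N=64\) is only an illustrative choice since N is not fixed above:

```python
import numpy as np

N = 64                                          # illustrative choice; not fixed above
xs = np.linspace(0.0, 1.0, N + 1)
X, Y = np.meshgrid(xs, xs, indexing="ij")

c_exact = np.zeros_like(X)
c_exact[(X - 0.65) ** 2 + (Y - 0.36) ** 2 <= 0.18 ** 2] = 1.0
c_exact[(X - 0.35) ** 2 + 4.0 * (Y - 0.75) ** 2 <= 0.2 ** 2] = 0.5
```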

Since \(c^{\dagger }\) is piecewise constant, we use the Landweber iteration in (1.4)–(1.5) with the 2-convex TV-like functional

$$\begin{aligned} {\mathcal {R}}(x):=\frac{1}{2\beta }\left\| x \right\| _{L^{2}(\Omega )}^{2} + \left| x \right| _{TV} \end{aligned}$$
(3.9)

where \(\beta >0\) and

$$\begin{aligned} \left| x \right| _{TV}:= \sup \left\{ \int _{\Omega }x \, \text {div}\, g \, d\omega \,: \, g \in C_{0}^{1}(\Omega ;\mathbb {R}^{m}) \text { and } \left\| g \right\| _{L^{\infty }(\Omega )} \le 1 \right\} \end{aligned}$$

denotes the total variation of x over \(\Omega \). For our chosen preimage and image spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), implementing the iteration (1.4) requires solving a minimization problem of the form

$$\begin{aligned} x = \arg \min _{z \in L^{2}(\Omega )}\left\{ {\mathcal {R}}(z) - \left\langle \xi ,z \right\rangle _{L^{2}(\Omega )} \right\} \end{aligned}$$

for any \(\xi \in L^{2}(\Omega )\). For our choice of \({\mathcal {R}}\) given by (3.9), this minimization problem is equivalent to solving

$$\begin{aligned} x = \arg \min _{z \in L^{2}(\Omega )} \left\{ \frac{1}{2\beta }\left\| z - \beta \xi \right\| _{L^{2}(\Omega )}^{2} + \left| z \right| _{TV} \right\} , \end{aligned}$$
(3.10)

which is the total variation denoising problem, also known as the ROF model [25]. Since there is no explicit formula for the minimization step in (1.4), numerical solvers are used; we use the primal-dual hybrid gradient (PDHG) algorithm [32]. The penalty functional (3.9) is discretized before applying PDHG, as illustrated in [14]. After setting the exact data to be \(u(c^{\dagger })(x,y)=x+y\), we add random uniform noise to produce noisy data \(u^\delta \) with noise level \(\delta := \left\| u^\delta -u(c^{\dagger }) \right\| _{L^{2}}\).
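As an illustration of how the denoising step (3.10) can be solved, here is a minimal PDHG (Chambolle–Pock) sketch for \(\min _{z} \frac{1}{2\beta }\left\| z-f \right\| _{L^{2}}^{2}+\left| z \right| _{TV}\) with a standard forward-difference discretization of the gradient; the step sizes and iteration count are illustrative and not taken from the cited implementation.

```python
import numpy as np

def grad(u):
    """Forward differences with homogeneous Neumann boundary."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv_denoise_pdhg(f, beta, n_iter=300):
    """Solve min_z 1/(2*beta) ||z - f||^2 + |z|_TV by PDHG (Chambolle-Pock)."""
    tau = sigma = 1.0 / np.sqrt(8.0)              # tau * sigma * ||grad||^2 <= 1
    z = f.copy(); z_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(z_bar)                      # dual ascent step ...
        px += sigma * gx; py += sigma * gy
        scale = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px /= scale; py /= scale                  # ... and projection onto {|p| <= 1}
        z_old = z                                 # primal step: prox of the quadratic term
        z = (z + tau * div(px, py) + (tau / beta) * f) / (1.0 + tau / beta)
        z_bar = 2.0 * z - z_old                   # over-relaxation
    return z
```

With \(f=\beta \xi \), the call tv_denoise_pdhg(beta * xi, beta) realizes the minimization step (3.10).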

Since our analysis does not require \({\mathcal {Y}}\) to be uniformly smooth, we can use \(r=1.0\) in the duality mapping \(j_{r}^{{\mathcal {Y}}}\), in contrast to the implementation in [19]. For \(r \ge 1\), the duality mapping \(j_{r}^{{\mathcal {Y}}}(y)\) of \(y \in {\mathcal {Y}}= L^{r}(\Omega )\) has the pointwise expression

$$\begin{aligned} \left[ j_{r}^{{\mathcal {Y}}}(y) \right] (t):= \left| y(t) \right| ^{r-1} \text { sign}\,(y(t)), \quad t \in \Omega . \end{aligned}$$
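A pointwise selection of this duality mapping is a single line of code; for \(r=1\) it reduces to the sign function (with the value 0 chosen where y vanishes):

```python
import numpy as np

def duality_map(y, r=1.0):
    """Pointwise selection of j_r: |y|^(r-1) * sign(y)."""
    return np.abs(y) ** (r - 1.0) * np.sign(y)
```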

Next we apply the Landweber iteration in (1.4)–(1.5) under the discrepancy principle (1.6) and the Hanke–Raus rule (Rule 2.1). We used the parameters

$$\begin{aligned} \beta = 12.0, \quad r = 1.0, \quad p=2.0, \quad \eta = 0.01, \quad \tau = 1.001 \cdot \frac{1 + \eta }{1-\eta }, \end{aligned}$$

and \(\xi _0 = x_0=0\) as initial guess. For the step size (1.5), we choose

$$\begin{aligned} \mu _{0}= 3.9\left( 1 - \eta - \frac{1+\eta }{\tau } \right) \text{ and } \mu _{1}=10000 \end{aligned}$$

for both parameter choice rules, which ensures that \(\mu _{0}>0\) and that (2.8) holds. We pick a large \(\mu _{1}\) to speed up convergence. Since the Fréchet derivative of F satisfies Assumption 2.2(c) for small \(\rho >0\) (see [9]), we take \(L = F'\).
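For completeness, with these parameters the variable stepsize (1.5) amounts to the following computation, where the residual and gradient norms are scalars supplied by the discretization:

```python
def step_size(res_norm, grad_norm, mu0, mu1, p=2.0, r=1.0):
    """Variable stepsize (1.5):
    mu_n = min(mu0 * ||res||^{p(r-1)} / ||grad||^p, mu1) * ||res||^{p-r},
    where grad = L(x_n)^* j_r(F(x_n) - y^delta)."""
    mu_tilde = min(mu0 * res_norm ** (p * (r - 1.0)) / grad_norm ** p, mu1)
    return mu_tilde * res_norm ** (p - r)
```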

Fig. 4

The reconstruction for Example 3.2 from Gaussian noisy data with \(\delta =0.0005\): a plots the exact solution \(c^{\dagger }\); b plots the reconstruction via the discrepancy principle; c plots the residual \(\left\| F(x^{\delta }_{n}) - y^{\delta } \right\| _{L^{2}}\) versus n; d plots \(\Theta (n,y^{\delta })\) versus n for \(a=5000\); and e plots the reconstruction via the Hanke–Raus rule for \(a=5000\)

Figure 4 shows the numerical results. The solution obtained via the discrepancy principle is plotted in Fig. 4b. The plot in Fig. 4c illustrates that the noise condition in Assumption 2.5 is satisfied. The solution in Fig. 4e shows that Rule 2.1 can provide an accurate reconstruction in the absence of information on the noise level.

Fig. 5

The reconstruction for Example 3.2 from noisy data with impulse noise: a plots the noisy data \(y^{\delta }\); b plots \(\Theta (n,y^{\delta })\) versus n for \(a=50000\); c and d plot the reconstructions via the discrepancy principle and Rule 2.1, respectively

We also apply the Landweber iteration to data corrupted by impulse noise. Figure 5 shows the reconstruction results. The noisy data in Fig. 5a contains outliers due to the impulse noise, which makes the noise level difficult to estimate. We allowed a large number of iterations to ensure that \(n_{*}\) is attained, as shown by the plot in Fig. 5b.

The inaccurate reconstruction in Fig. 5c shows that the discrepancy principle becomes ineffective when the noise level is poorly estimated. The reconstruction in Fig. 5d, however, shows that Rule 2.1 overcomes this difficulty, since it does not depend on the noise level.

Example 3.3

Now we consider an image deblurring problem, where an unknown image \(x^{\dagger }\in \mathbb {R}^{M \times N}\) is to be reconstructed from an observed image \(\tilde{y}=Fx^{\dagger }+ \nu \) degraded by a linear convolution blurring operator F and salt-and-pepper noise \(\nu \).

Assuming periodic boundary conditions for the blurring operator, we use total variation (TV) deblurring. The penalty \({\mathcal {R}}(x)\) is chosen as

$$\begin{aligned} {\mathcal {R}}(x)=\left| x \right| _{TV} + \frac{\beta }{2}\left\| x \right\| _{F}^{2} \end{aligned}$$

with \(\beta =0.001\), which is a small perturbation of the TV functional. Here \(\left| x \right| _{TV}\) and \(\left\| x \right\| _{F}\) denote the total variation and the Frobenius norm of x, respectively. To remove the salt-and-pepper noise efficiently, we use the data fidelity term

$$\begin{aligned} \frac{1}{r}\left\| Fx-\tilde{y} \right\| _{r}^{r} \end{aligned}$$

with \(r=1.0\).

Next we apply the Landweber iteration (1.4)–(1.5) with \(r=1.0\), \(p=2\), \(\mu _{0}= 0.008\), and \(\mu _{1}=100\), and choose \(\xi _0 = x_0=0\) as initial guesses. The minimization problem of the form (3.10) arising in (1.4) is again solved by the primal-dual hybrid gradient method.
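One step of this iteration can be sketched as follows, assuming the periodic convolution is implemented by FFT (otf denotes the discrete Fourier transform of the point spread function, which we take as given) and reusing the TV-denoising solver sketched in Example 3.2; the function names are ours.

```python
import numpy as np

def blur(x, otf):
    """Periodic convolution blur via the FFT (otf = fft2 of the point spread function)."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * otf))

def blur_adjoint(v, otf):
    """Adjoint blur: convolution with the conjugate transfer function."""
    return np.real(np.fft.ifft2(np.fft.fft2(v) * np.conj(otf)))

def landweber_deblur_step(xi, x, y_obs, otf, beta, mu):
    """One step of (1.4) for Example 3.3 with r = 1, i.e. j_1 = sign."""
    xi_new = xi - mu * blur_adjoint(np.sign(blur(x, otf) - y_obs), otf)
    # With R(x) = |x|_TV + (beta/2)||x||_F^2 the minimization step in (1.4) is a
    # TV-denoising problem of the form (3.10), here with data xi/beta and weight 1/beta;
    # tv_denoise_pdhg is the PDHG solver sketched in Example 3.2.
    x_new = tv_denoise_pdhg(xi_new / beta, 1.0 / beta)
    return xi_new, x_new
```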

Fig. 6

a The original image; b observed image with 40% salt-and-pepper noise; c reconstruction result via the discrepancy principle with \(\tau =1.01\); d–f are the reconstruction results via Rule 2.1 with \(a=10000\), 50000, and 100000, respectively; g plots the residual \(\left\| Fx^{\delta }_{n} - \tilde{y} \right\| _{r}^{r}\) versus n; h plots \(\Theta (n,\tilde{y})\) versus n for \(a=10000\), 50000, and 100000

Figure 6 shows the reconstructions for the Boats image \((576 \times 720)\) degraded by motion blur (fspecial('motion',35,50) in MATLAB). The noisy observed image was obtained by adding salt-and-pepper noise with noise density 0.4. To measure the quality of the reconstructed image, we report the PSNR (peak signal-to-noise ratio), defined by

$$\begin{aligned} \text {PSNR} = 10 \cdot \log _{10}\frac{255^2}{\text {MSE}}(\text {dB}) \end{aligned}$$

where MSE denotes the mean-squared error per pixel. We present the reconstructions of the Landweber iteration via the discrepancy principle and via Rule 2.1. Figure 6g illustrates that Assumption 2.5 is satisfied. Moreover, Fig. 6h shows an eventual rise of \(\Theta (n,\tilde{y})\), illustrating the existence of \(n_{*}\) under Rule 2.1. Both parameter choice rules give satisfactory reconstructions, and Rule 2.1 gives a better reconstruction for a sufficiently large fixed constant a. For this type of noisy data, the Hanke–Raus rule can indeed provide accurate reconstructions.
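For completeness, the PSNR above can be computed as follows, assuming 8-bit images with pixel range 0–255:

```python
import numpy as np

def psnr(x_rec, x_true):
    """Peak signal-to-noise ratio in dB, per the formula above."""
    mse = np.mean((x_rec.astype(float) - x_true.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```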

4 Conclusion

A general convergence analysis of the Landweber iteration for inverse problems under the Hanke–Raus rule was established. Using the so-called noise condition and a compactness condition, we obtained the convergence result. More importantly, since the Hanke–Raus rule does not rely on any information about the noise level, it makes the Landweber iteration purely data driven, while expanding its range of application to inverse problems with a nonsmooth forward operator and to problems whose data are corrupted by various types of noise.