1 Introduction

We are interested in solving inverse problems which can be formulated as the operator equation

$$\begin{aligned} F(x)=y, \end{aligned}$$
(1.1)

where \(F: D(F)\subset \mathcal {X}\rightarrow \mathcal {Y}\) is an operator between two Banach spaces \(\mathcal {X}\) and \(\mathcal {Y}\) with domain \(D(F)\subset \mathcal {X}\); the norms in \(\mathcal {X}\) and \(\mathcal {Y}\) are both denoted by \(\Vert \cdot \Vert \), and the meaning will be clear from the context. A characteristic property of inverse problems is their ill-posedness in the sense that their solutions do not depend continuously on the data. Due to errors in the measurements, one never has the exact data in practical applications; instead only noisy data are available. If algorithms developed for well-posed problems are applied directly, they usually fail to produce any useful information, since the noise can be amplified by an arbitrarily large factor. Let \(y^\delta \) be the only available noisy data for \(y\), satisfying

$$\begin{aligned} \Vert y^\delta -y\Vert \le \delta \end{aligned}$$
(1.2)

with a given small noise level \(\delta > 0\). How to use \(y^\delta \) to produce a stable approximate solution of (1.1) is a central topic, and regularization methods have to be employed.

When both \(\mathcal {X}\) and \(\mathcal {Y}\) are Hilbert spaces, many regularization methods have been proposed to solve inverse problems in the Hilbert space framework [4, 15]. When \(F:\mathcal {X}\rightarrow \mathcal {Y}\) is a bounded linear operator, nonstationary iterated Tikhonov regularization is an attractive iterative method in which a sequence \(\{x_n^\delta \}\) of regularized solutions is defined successively by

$$\begin{aligned} x_n^\delta :=\arg \min _{x\in \mathcal {X}} \left\{ \frac{1}{2} \Vert F x-y^\delta \Vert ^2 + \frac{\alpha _n}{2} \Vert x-x_{n-1}^\delta \Vert ^2 \right\} , \end{aligned}$$

where \(x_0^\delta :=x_0\in \mathcal {X}\) is an initial guess and \(\{\alpha _n\}\) is a preassigned sequence of positive numbers. Since \(\{x_n^\delta \}\) can be written explicitly as

$$\begin{aligned} x_n^\delta =x_{n-1}^\delta -(\alpha _n I +F^* F)^{-1} F^* (F x_{n-1}^\delta -y^\delta ), \end{aligned}$$

where \(F^*: \mathcal {Y}\rightarrow \mathcal {X}\) denotes the adjoint of \(F:\mathcal {X}\rightarrow \mathcal {Y}\), a complete analysis of the regularization property has been established (see [8] and references therein) when \(\{\alpha _n\}\) satisfies suitable conditions and the discrepancy principle is used to terminate the iteration. This method has been extended in [12, 13] to solve nonlinear inverse problems in Hilbert spaces.
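
To illustrate this Hilbert space iteration together with the discrepancy principle stopping rule used below, a minimal numerical sketch could look as follows; the test operator, the noise model and the geometrically decaying choice of \(\{\alpha _n\}\) are hypothetical and serve only to make the example self-contained.

```python
import numpy as np

def iterated_tikhonov(F, y_delta, delta, x0, alphas, tau=1.1):
    """Sketch of nonstationary iterated Tikhonov for a linear operator F
    (given as a matrix): x_n = x_{n-1} - (alpha_n I + F^T F)^{-1} F^T (F x_{n-1} - y^delta),
    stopped by the discrepancy principle ||F x_n - y^delta|| <= tau * delta."""
    x = x0.copy()
    I = np.eye(F.shape[1])
    for alpha in alphas:
        residual = F @ x - y_delta
        if np.linalg.norm(residual) <= tau * delta:
            break
        x = x - np.linalg.solve(alpha * I + F.T @ F, F.T @ residual)
    return x

# hypothetical test problem: random matrix, noise of norm exactly delta
rng = np.random.default_rng(0)
F = rng.standard_normal((50, 30))
y = F @ rng.standard_normal(30)
delta = 1e-2
noise = rng.standard_normal(50)
y_delta = y + delta * noise / np.linalg.norm(noise)
x_rec = iterated_tikhonov(F, y_delta, delta, np.zeros(30),
                          alphas=[2.0 * 0.5 ** n for n in range(100)])
```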

Regularization methods in Hilbert spaces can produce good results when the sought solution is smooth. However, because such methods have a tendency to over-smooth solutions, they may not produce good results in applications where the sought solution has special features such as sparsity or discontinuities. In order to capture these special features, the methods in Hilbert spaces should be modified by incorporating suitably adapted penalty functionals, for which the theory in the Hilbert space setting is no longer applicable.

The nonstationary iterated Tikhonov regularization has been extended in [14] to solve linear inverse problems in the Banach space setting by defining \(x_n^\delta \) as the minimizer of the convex minimization problem

$$\begin{aligned} \min _{x\in \mathcal {X}} \left\{ \frac{1}{r}\Vert F x-y^\delta \Vert ^r +\alpha _n \varDelta _p(x, x_{n-1}^\delta ) \right\} \end{aligned}$$

for \(n\ge 1\) successively, where \(1\le r<\infty ,\,1<p<\infty \) and \(\varDelta _p(\cdot , \cdot )\) denotes the Bregman distance on \(\mathcal {X}\) induced by the convex function \(x\rightarrow \Vert x\Vert ^p/p\). When \(\mathcal {X}\) is uniformly smooth and uniformly convex, and when the method is terminated by the discrepancy principle, the regularization property has been established provided \(\{\alpha _n\}\) satisfies \(\sum _{n=1}^\infty \alpha _n^{-1}=\infty \). The numerical simulations in [14] indicate that the method is efficient for sparsity reconstruction when \(\mathcal {X}=L^p\) with \(p>1\) close to \(1\), and that it provides a robust estimator in the presence of outliers in the noisy data when \(\mathcal {Y}=L^1\). However, since \(\mathcal {X}\) is required to be uniformly smooth and uniformly convex and since \(\varDelta _p(\cdot , \cdot )\) is induced by a power of the norm in \(\mathcal {X}\), the result in [14] does not apply to regularization methods with \(L^1\) and total variation-like penalty terms, which are important for reconstructing sparsity and discontinuities of the sought solutions.

Total variation regularization was introduced in [18]; its importance was recognized immediately and many subsequent works have been carried out over the last two decades. In [16] an iterative regularization method based on the Bregman distance and total variation was introduced to enhance the multi-scale nature of the reconstruction. The method solves (1.1) with \(F:\mathcal {X}\rightarrow \mathcal {Y}\) linear and \(\mathcal {Y}\) a Hilbert space by defining \(\{x_n^\delta \}\) in the primal space \(\mathcal {X}\) and \(\{\xi _n^\delta \}\) in the dual space \(\mathcal {X}^*\) via

$$\begin{aligned} \begin{aligned} x_n^\delta&:=\arg \min _{x\in \mathcal {X}} \left\{ \frac{1}{2} \Vert F x-y^\delta \Vert ^2 +\alpha _n D_{\xi _{n-1}^\delta } \varTheta (x, x_{n-1}^\delta )\right\} ,\\ \xi _n^\delta&:= \xi _{n-1}^\delta - \frac{1}{\alpha _n} F^* \left( F x_n^\delta -y^\delta \right) , \end{aligned} \end{aligned}$$
(1.3)

where \(\varTheta : \mathcal {X}\rightarrow (-\infty , \infty ]\) is a proper convex function, \(x_0^\delta \in \mathcal {X}\) is an initial guess, \(\xi _0^\delta \in \mathcal {X}^*\) is a subgradient of \(\varTheta \) at \(x_0^\delta \), and \(D_\xi \varTheta (\cdot , \cdot )\) denotes the Bregman distance induced by \(\varTheta \). This method was extended in [2] to solve nonlinear inverse problems. Extensive numerical simulations were reported in [2, 16] and a convergence analysis was given, with special attention to the case where \(\mathcal {X}=L^2(\varOmega )\) and \(\varTheta (x)=a \Vert x\Vert _{L^2}^2 +\int _\varOmega |D x|\), where \(\int _\varOmega |D x|\) denotes the total variation, when the iteration is terminated by a discrepancy principle and \(\{\alpha _n\}\) satisfies the condition \(\underline{\alpha }\le \alpha _n \le \overline{\alpha }\) for two positive constants \(\overline{\alpha }\ge \underline{\alpha }>0\). The analysis in [2, 16], however, is somewhat preliminary since it provides only the boundedness of \(\{\varTheta (x_{n_\delta }^\delta )\}\), which guarantees only weak convergence of a subsequence of \(\{x_{n_\delta }^\delta \}\), where \(n_\delta \) denotes the stopping index determined by the discrepancy principle. It is natural to ask whether the whole sequence converges strongly and in the Bregman distance.

We point out that the method (1.3) is equivalent to the augmented Lagrangian method introduced originally in [10, 17] and developed further in various directions; see [11] and references therein. One may refer to [6] for some results on convergence and convergence rates of the augmented Lagrangian method applied to linear inverse problems in Hilbert spaces with a general convex penalty term. When \(\mathcal {X}\) and \(\mathcal {Y}\) are Hilbert spaces and \(\varTheta (x)=\Vert x\Vert ^2\), (1.3) is exactly the nonstationary iterated Tikhonov regularization. In this paper we formulate an extension of the nonstationary iterated Tikhonov regularization in the spirit of (1.3) to solve (1.1) with both \(\mathcal {X}\) and \(\mathcal {Y}\) being Banach spaces, and we present a detailed convergence analysis when the method is terminated by the discrepancy principle. In our method we allow \(\{\alpha _n\}\) to vary in various ways so that geometrically decreasing sequences are included; this makes it possible to terminate the method after fewer iterations. Moreover, we allow the penalty term \(\varTheta \) to be a general uniformly convex function on \(\mathcal {X}\) so that the method can be used for sparsity reconstruction and discontinuity detection. Most importantly, we obtain

$$\begin{aligned} x_{n_\delta }^\delta \rightarrow x^\dag , \quad \varTheta (x_{n_\delta }^\delta ) \rightarrow \varTheta (x^\dag ) \quad \text{ and } \quad D_{\xi _{n_{\delta }}^{\delta }} \varTheta (x^\dag , x_{n_{\delta }}^{\delta }) \rightarrow 0 \end{aligned}$$

and give a characterization of the limit \(x^\dag \), which significantly improve the known convergence results.

This paper is organized as follows. In Sect. 2 we give some preliminary results on Banach spaces and convex analysis. In Sect. 3 we formulate the method in Banach spaces with a uniformly convex penalty term for solving linear and nonlinear inverse problems, and present the main convergence results. In Sect. 4 we first prove a convergence result for the method when the data is given exactly; we then show that, if the data contains noise, the method is well-defined and enjoys a stability property; by combining these results we finally obtain the proof of the main convergence theorems. Finally, in Sect. 5 we present some numerical simulations on linear integral equations of the first kind and parameter identification problems in partial differential equations to test the performance of the method.

2 Preliminaries

Let \(\mathcal {X}\) be a Banach space with norm \(\Vert \cdot \Vert \). We use \(\mathcal {X}^*\) to denote its dual space. Given \(x\in \mathcal {X}\) and \(\xi \in \mathcal {X}^*\) we write \(\langle \xi , x\rangle =\xi (x)\) for the duality pairing. We use “\(\rightarrow \)” and “\(\rightharpoonup \)” to denote the strong convergence and weak convergence respectively. If \(\mathcal {Y}\) is another Banach space and \(A: \mathcal {X}\rightarrow \mathcal {Y}\) is a bounded linear operator, we use \(A^*: \mathcal {Y}^*\rightarrow \mathcal {X}^*\) to denote its adjoint, i.e. \(\langle A^* \zeta , x\rangle =\langle \zeta , A x\rangle \) for any \(x\in \mathcal {X}\) and \(\zeta \in \mathcal {Y}^*\). We use \(\mathcal {N}(A)=\{x\in \mathcal {X}: A x=0\}\) to denote the null space of \(A\) and define

$$\begin{aligned} \mathcal {N}(A)^\perp := \{ \xi \in \mathcal {X}^*: \langle \xi , x\rangle =0 \text{ for } \text{ all } x\in \mathcal {N}(A)\}. \end{aligned}$$

When \(\mathcal {X}\) is reflexive, there holds

$$\begin{aligned} \mathcal {N}(A)^\perp =\overline{\mathcal {R}(A^*)}, \end{aligned}$$
(2.1)

where \(\mathcal {R}(A^*)\) denotes the range space of \(A^*\) and \(\overline{\mathcal {R}(A^*)}\) denotes the closure of \(\mathcal {R}(A^*)\) in \(\mathcal {X}^*\).

For a convex function \(\varTheta : \mathcal {X}\rightarrow (-\infty , \infty ]\), we use \(D(\varTheta ):=\{x\in \mathcal {X}: \varTheta (x)<+\infty \}\) to denote its effective domain. We call \(\varTheta \) proper if \(D(\varTheta )\ne \emptyset \). Given \(x\in \mathcal {X}\) we define

$$\begin{aligned} \partial \varTheta (x):=\{\xi \in \mathcal {X}^*: \varTheta (\bar{x})-\varTheta (x)-\langle \xi , \bar{x}-x\rangle \ge 0 \text{ for } \text{ all } \bar{x}\in \mathcal {X}\}. \end{aligned}$$

Any element \(\xi \in \partial \varTheta (x)\) is called a subgradient of \(\varTheta \) at \(x\). The multi-valued mapping \(\partial \varTheta : \mathcal {X}\rightarrow 2^{\mathcal {X}^*}\) is called the subdifferential of \(\varTheta \). It could happen that \(\partial \varTheta (x)=\emptyset \) for some \(x\in D(\varTheta )\). Let

$$\begin{aligned} D(\partial \varTheta ):=\{x\in D(\varTheta ): \partial \varTheta (x)\ne \emptyset \}. \end{aligned}$$

For \(x \in D(\partial \varTheta )\) and \(\xi \in \partial \varTheta (x)\) we define

$$\begin{aligned} D_\xi \varTheta (\bar{x},x):=\varTheta (\bar{x})-\varTheta (x)-\langle \xi , \bar{x}-x\rangle , \quad \forall \bar{x}\in \mathcal {X}\end{aligned}$$

which is called the Bregman distance induced by \(\varTheta \) at \(x\) in the direction \(\xi \). Clearly \(D_\xi \varTheta (\bar{x},x)\ge 0\). By straightforward calculation one can see that

$$\begin{aligned} D_\xi \varTheta (x_2,x)-D_\xi \varTheta (x_1, x) =D_{\xi _1} \varTheta (x_2,x_1) +\langle \xi _1-\xi , x_2-x_1\rangle \end{aligned}$$
(2.2)

for all \(x, x_1\in D(\partial \varTheta ),\,\xi \in \partial \varTheta (x),\,\xi _1\in \partial \varTheta (x_1)\) and \(x_2\in \mathcal {X}\).
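
Indeed, (2.2) follows by expanding both sides according to the definition of the Bregman distance: the terms involving \(\varTheta (x)\) and \(\langle \xi , x\rangle \) cancel, so that

$$\begin{aligned} D_\xi \varTheta (x_2,x)-D_\xi \varTheta (x_1, x)&= \varTheta (x_2)-\varTheta (x_1)-\langle \xi , x_2-x_1\rangle \\&= \varTheta (x_2)-\varTheta (x_1)-\langle \xi _1, x_2-x_1\rangle +\langle \xi _1-\xi , x_2-x_1\rangle \\&= D_{\xi _1} \varTheta (x_2,x_1) +\langle \xi _1-\xi , x_2-x_1\rangle . \end{aligned}$$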

A proper convex function \(\varTheta : \mathcal {X}\rightarrow (-\infty , \infty ]\) is called uniformly convex if there is a continuous function \(h:[0, \infty ) \rightarrow [0, \infty )\), with the property that \(h(t)=0\) implies \(t=0\), such that

$$\begin{aligned} \varTheta (\lambda \bar{x} +(1-\lambda ) x) +\lambda (1-\lambda ) h(\Vert \bar{x}-x\Vert ) \le \lambda \varTheta (\bar{x}) +(1-\lambda ) \varTheta (x) \end{aligned}$$
(2.3)

for all \(\bar{x}, x\in \mathcal {X}\) and \(\lambda \in (0,1)\). If \(h\) in (2.3) can be taken as \(h(t)=c t^p\) for some \(c>0\) and \(p\ge 2\), then \(\varTheta \) is called \(p\)-uniformly convex. It can be shown [20, Theorem 3.5.10] that \(\varTheta \) is uniformly convex if and only if there is a strictly increasing continuous function \(\varphi : [0, \infty )\rightarrow [0, \infty )\) with \(\varphi (0)=0\) such that

$$\begin{aligned} D_\xi \varTheta (\bar{x},x) \ge \varphi (\Vert \bar{x}-x\Vert ) \end{aligned}$$
(2.4)

for all \(\bar{x} \in \mathcal {X},\,x\in D(\partial \varTheta )\) and \(\xi \in \partial \varTheta (x)\).
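
In particular, if \(\varTheta \) is \(p\)-uniformly convex with \(h(t)=c t^p\), then dividing (2.3) by \(\lambda \), using the subgradient inequality \(\varTheta (\lambda \bar{x}+(1-\lambda ) x)-\varTheta (x)\ge \lambda \langle \xi , \bar{x}-x\rangle \) and letting \(\lambda \rightarrow 0\), one may take \(\varphi (t)=c t^p\) in (2.4), i.e.

$$\begin{aligned} D_\xi \varTheta (\bar{x},x) \ge c \Vert \bar{x}-x\Vert ^p \end{aligned}$$

for all \(\bar{x} \in \mathcal {X},\,x\in D(\partial \varTheta )\) and \(\xi \in \partial \varTheta (x)\).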

On a Banach space \(\mathcal {X}\), we consider for \(1<r<\infty \) the convex function \(x\rightarrow \Vert x\Vert ^r/r\). Its subdifferential at \(x\) is given by

$$\begin{aligned} J_r(x):=\{\xi \in \mathcal {X}^*: \Vert \xi \Vert =\Vert x\Vert ^{r-1} \text{ and } \langle \xi , x\rangle =\Vert x\Vert ^r\} \end{aligned}$$

which gives the duality mapping \(J_r: \mathcal {X}\rightarrow 2^{\mathcal {X}^*}\) with gauge function \(t\rightarrow t^{r-1}\). We call \(\mathcal {X}\) uniformly convex if its modulus of convexity

$$\begin{aligned} \delta _{\mathcal {X}}(t) := \inf \{2 -\Vert \bar{x} +x\Vert : \Vert \bar{x}\Vert =\Vert x\Vert =1, \Vert \bar{x}-x\Vert \ge t\} \end{aligned}$$

satisfies \(\delta _{\mathcal {X}}(t)>0\) for all \(0 < t\le 2\). If there are \(c>0\) and \(r>1\) such that \(\delta _{\mathcal {X}}(t) \ge c t^r\) for all \(0<t\le 2\), then \(\mathcal {X}\) is called \(r\)-uniformly convex. We call \(\mathcal {X}\) uniformly smooth if its modulus of smoothness

$$\begin{aligned} \rho _{\mathcal {X}}(s) := \sup \{\Vert \bar{x}+ x\Vert +\Vert \bar{x}-x\Vert - 2 : \Vert \bar{x}\Vert = 1, \Vert x\Vert \le s\} \end{aligned}$$

satisfies \(\lim _{s\searrow 0} \frac{\rho _{\mathcal {X}}(s)}{s} =0\). One can refer to [1, 3] for many examples of Banach spaces, including the sequence spaces \(l^r\), the Lebesgue spaces \(L^r\), the Sobolev spaces \(W^{k,r}\) and the Besov spaces \(B^{s,r}\) with \(1<r<\infty \), that are both uniformly convex and uniformly smooth.

It is well known that any uniformly convex or uniformly smooth Banach space is reflexive. On a uniformly smooth Banach space \(\mathcal {X}\), every duality mapping \(J_r\) with \(1<r<\infty \) is single valued and uniformly continuous on bounded sets; for each \(1<r<\infty \) we use

$$\begin{aligned} \varDelta _r(\bar{x},x) = \frac{1}{r}\Vert \bar{x}\Vert ^r - \frac{1}{r}\Vert x\Vert ^r - \langle J_r(x),\bar{x}-x\rangle ,\quad \forall \bar{x}, x\in \mathcal {X}\end{aligned}$$

to denote the Bregman distance induced by the convex function \(\varTheta (x) = \Vert x\Vert ^r/r\).
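
For orientation, when \(\mathcal {X}\) is a Hilbert space and \(r=2\), the duality mapping \(J_2\) is the identity and this Bregman distance reduces to half the squared norm distance:

$$\begin{aligned} \varDelta _2(\bar{x},x) = \frac{1}{2}\Vert \bar{x}\Vert ^2 - \frac{1}{2}\Vert x\Vert ^2 - \langle x, \bar{x}-x\rangle = \frac{1}{2} \Vert \bar{x}-x\Vert ^2. \end{aligned}$$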

Furthermore, on a uniformly convex Banach space, any sequence \(\{x_n\}\) satisfying \(x_n \rightharpoonup x\) and \(\Vert x_n\Vert \rightarrow \Vert x\Vert \) must satisfy \(x_n\rightarrow x\) as \(n\rightarrow \infty \). This property can be generalized to uniformly convex functions, as stated in the following result.

Lemma 2.1

Let \(\varTheta :\mathcal {X}\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous, and uniformly convex function. Then \(\varTheta \) admits the Kadec property, i.e. for any sequence \(\{x_n\}\subset \mathcal {X}\) satisfying \(x_n \rightharpoonup x\in \mathcal {X}\) and \(\varTheta (x_n) \rightarrow \varTheta (x)<\infty \) there holds \(x_n\rightarrow x\) as \(n\rightarrow \infty \).

Proof

Assume the result is not true. Then, by taking a subsequence if necessary, there is an \(\epsilon >0\) such that \(\Vert x_n-x\Vert \ge \epsilon \) for all \(n\). In view of the uniform convexity of \(\varTheta \), there is a \(\gamma >0\) such that \( \varTheta ((x_n+x)/2) \le (\varTheta (x_n)+\varTheta (x))/2 -\gamma . \) Using \(\varTheta (x_n)\rightarrow \varTheta (x)\) we then obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty } \varTheta \left( \frac{x_n+x}{2}\right) \le \varTheta (x) -\gamma . \end{aligned}$$

On the other hand, observing that \((x_n+x)/2 \rightharpoonup x\), we have from the weak lower semi-continuity of \(\varTheta \) that

$$\begin{aligned} \varTheta (x) \le \liminf _{n\rightarrow \infty } \varTheta \left( \frac{x_n+x}{2}\right) . \end{aligned}$$

Therefore \(\varTheta (x) \le \varTheta (x)-\gamma \), which is a contradiction. \(\square \)

In many practical applications, proper, weakly lower semi-continuous, uniformly convex functions can be easily constructed. For instance, consider \(\mathcal {X}=L^p(\varOmega )\), where \(2\le p<\infty \) and \(\varOmega \) is a bounded domain in \({\mathbb {R}}^d\). It is known that the functional \( \varTheta _0(x) := \int _\varOmega |x(\omega )|^p d\omega \) is uniformly convex on \(L^p(\varOmega )\) (it is in fact \(p\)-uniformly convex). Consequently we obtain on \(L^p(\varOmega )\) the uniformly convex functions

$$\begin{aligned} \varTheta (x):=\mu \int _\varOmega |x(\omega )|^p d\omega + a \int _\varOmega |x(\omega )| d\omega +b \int _\varOmega |D x|, \end{aligned}$$
(2.5)

where \(\mu >0,\,a, b\ge 0\), and \(\int _\varOmega |D x|\) denotes the total variation of \(x\) over \(\varOmega \) that is defined by [7]

$$\begin{aligned} \int _\varOmega |D x| :=\sup \left\{ \, \int _\varOmega x \, \text{ div } f \, d\omega : f\in C_0^1(\varOmega ; {\mathbb {R}}^d) \text{ and } \Vert f\Vert _{L^\infty (\varOmega )}\le 1\right\} . \end{aligned}$$

For \(a=1\) and \(b=0\) the corresponding function is useful for sparsity reconstruction [19], while for \(a=0\) and \(b=1\) it is useful for detecting discontinuities, in particular when the sought solution is piecewise constant [18].
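
To make the penalty (2.5) concrete, a discrete analogue on a uniform one-dimensional grid can be evaluated as in the following sketch; the grid spacing \(h\) and the forward-difference approximation of the total variation term are choices made only for this illustration.

```python
import numpy as np

def penalty(x, h, p=2.0, mu=1.0, a=0.0, b=1.0):
    """Discrete analogue of Theta(x) = mu*int|x|^p + a*int|x| + b*int|Dx|
    on a uniform 1-D grid with spacing h; the total variation term is
    approximated by the sum of absolute forward differences."""
    lp_term = mu * h * np.sum(np.abs(x) ** p)
    l1_term = a * h * np.sum(np.abs(x))
    tv_term = b * np.sum(np.abs(np.diff(x)))
    return lp_term + l1_term + tv_term

# hypothetical usage: a piecewise constant signal has small total variation
x = np.concatenate([np.zeros(50), np.ones(50)])
print(penalty(x, h=1.0 / x.size))
```

Such penalties are uniformly convex but in general non-smooth, which is exactly the setting treated in the next section.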

3 The method and main results

We now return to (1.1), where \(F:\mathcal {X}\rightarrow \mathcal {Y}\) is an operator between two Banach spaces \(\mathcal {X}\) and \(\mathcal {Y}\). We will always assume that \(\mathcal {X}\) is reflexive, \(\mathcal {Y}\) is uniformly smooth, and (1.1) has a solution. In general, Eq. (1.1) may have many solutions. In order to find the desired one, a selection criterion should be enforced. Choosing a proper convex function \(\varTheta \), we pick \(x_0\in D(\partial \varTheta )\) and \(\xi _0\in \partial \varTheta (x_0)\) as the initial guess, which may incorporate some available information on the sought solution. We define \(x^{\dag }\) to be a solution of (1.1) with the property

$$\begin{aligned} D_{\xi _0} \varTheta (x^{\dag },x_0) := \min _{x\in D(\varTheta )\cap D(F) }\{D_{\xi _0} \varTheta (x,x_0) : F(x) = y\}. \end{aligned}$$
(3.1)

We will work under the following conditions on the convex function \(\varTheta \) and the operator \(F\).

Assumption 3.1

\(\varTheta \) is a proper, weakly lower semi-continuous and uniformly convex function such that (2.4) holds, i.e. there is a strictly increasing continuous function \(\varphi : [0, \infty )\rightarrow [0, \infty )\) with \(\varphi (0)=0\) such that

$$\begin{aligned} D_\xi \varTheta (\bar{x}, x) \ge \varphi (\Vert \bar{x}-x\Vert ) \end{aligned}$$

for \(\bar{x} \in \mathcal {X},\,x\in D(\partial \varTheta )\) and \(\xi \in \partial \varTheta (x)\).

Assumption 3.2

  1. (a)

    \(D(F)\) is convex, and \(F\) is weakly closed, i.e. for any sequence \(\{x_n\} \subset D(F)\) satisfying \(x_n\rightharpoonup x\in \mathcal {X}\) and \(F(x_n)\rightharpoonup v \in \mathcal {Y}\) there hold \(x\in D(F)\) and \(F(x) = v\);

  2. (b)

    There is \(\rho >0\) such that (1.1) has a solution in \(B_\rho (x_0)\cap D(F)\cap D(\varTheta )\), where \( B_{\rho }(x_0) : = \{x\in \mathcal {X}:\ \Vert x-x_0\Vert \le \rho \}\);

  3. (c)

    \(F\) is Fréchet differentiable on \(D(F)\), and \(x\rightarrow F'(x)\) is continuous on \(D(F)\), where \(F'(x)\) denotes the Fréchet derivative of \(F\) at \(x\);

  4. (d)

    There exists \(0\le \eta <1\) such that

    $$\begin{aligned} \Vert F(\bar{x})-F(x)-F'(x)(\bar{x}-x)\Vert \le \eta \Vert F(\bar{x})-F(x)\Vert \end{aligned}$$

    for all \(\bar{x}, x\in B_{3\rho }(x_0)\cap D(F)\).

When \(\mathcal {X}\) is a reflexive Banach space, by using the weak closedness of \(F\) and the weak lower semi-continuity and uniform convexity of \(\varTheta \), it is standard to show that \(x^\dag \) exists. The following result shows that \(x^\dag \) is in fact uniquely defined.

Lemma 3.1

Let \(\mathcal {X}\) be reflexive, \(\varTheta \) satisfy Assumption 3.1, and \(F\) satisfy Assumption 3.2. If \(x^\dag \) is a solution of \(F(x)=y\) satisfying (3.1) with

$$\begin{aligned} D_{\xi _0} \varTheta (x^\dag , x_0) \le \varphi (\rho ), \end{aligned}$$
(3.2)

then \(x^\dag \) is uniquely defined.

Proof

Assume that (1.1) has two distinct solutions \(\hat{x}\) and \(x^\dag \) satisfying (3.1). Then it follows from (3.2) that

$$\begin{aligned} D_{\xi _0} \varTheta (\hat{x}, x_0) =D_{\xi _0}\varTheta (x^\dag , x_0) \le \varphi (\rho ). \end{aligned}$$

By using Assumption 3.1 on \(\varTheta \) we obtain \(\Vert \hat{x}-x_0\Vert \le \rho \) and \(\Vert x^\dag -x_0\Vert \le \rho \). Since \(F(\hat{x})=F(x^\dag )\), we can use Assumption 3.2 (d) to derive that \(F'(x^\dag ) (\hat{x}-x^\dag )=0\). Let \(x_\lambda =\lambda \hat{x}+(1-\lambda ) x^\dag \) for \(0<\lambda <1\). Then \(x_\lambda \in B_{\rho }(x_0) \cap D(\varTheta ) \cap D(F)\) and \(F'(x^\dag ) (x_\lambda -x^\dag )=0\). Thus we can use Assumption 3.2 (d) to conclude that

$$\begin{aligned} \Vert F(x_\lambda )-F(x^\dag )\Vert \le \eta \Vert F(x_\lambda )-F(x^\dag )\Vert . \end{aligned}$$

Since \(0\le \eta <1\), this implies that \(F(x_\lambda )=F(x^\dag )=y\). Consequently, by the minimal property of \(x^\dag \) we have

$$\begin{aligned} D_{\xi _0} \varTheta (x_\lambda , x_0) \ge D_{\xi _0} \varTheta (x^\dag , x_0). \end{aligned}$$
(3.3)

On the other hand, it follows from the strict convexity of \(\varTheta \) that

$$\begin{aligned} D_{\xi _0} \varTheta (x_\lambda , x_0)&< \lambda D_{\xi _0} \varTheta (\hat{x}, x_0) +(1-\lambda ) D_{\xi _0} \varTheta (x^\dag , x_0)= D_{\xi _0} \varTheta (x^{\dag }, x_0) \end{aligned}$$

for \(0<\lambda <1\) which is a contradiction to (3.3). \(\square \)

We are now ready to formulate the nonstationary iterated Tikhonov regularization with penalty term induced by the uniformly convex function \(\varTheta \). For the initial guess \(x_0^{\delta }: = x_0\in D(\partial \varTheta )\cap D(F)\) and \(\xi _0^{\delta } := \xi _0\in \partial \varTheta (x_0)\), we take a sequence of positive numbers \(\{\alpha _n\}\) and define the iterative sequences \(\{x_n^{\delta }\}\) and \(\{\xi _n^{\delta }\}\) successively by

$$\begin{aligned} \begin{aligned}&x_n^\delta \in \arg \min _{x\in D(F)} \left\{ \frac{1}{r} \Vert F(x)-y^\delta \Vert ^r + \alpha _n D_{\xi _{n-1}^\delta } \varTheta (x, x_{n-1}^\delta )\right\} ,\\&\xi _n^\delta = \xi _{n-1}^\delta -\frac{1}{\alpha _n} F'\left( x_n^\delta \right) ^* J_r\left( F(x_n^\delta )-y^\delta \right) \end{aligned} \end{aligned}$$
(3.4)

for \(n\ge 1\), where \(1<r <\infty \) and \(J_r: \mathcal {Y}\rightarrow \mathcal {Y}^*\) denotes the duality mapping of \(\mathcal {Y}\) with gauge function \(t\rightarrow t^{r-1}\), which is single-valued and continuous because \(\mathcal {Y}\) is assumed to be uniformly smooth. At each step, the existence of \(x_n^{\delta }\) is guaranteed by the reflexivity of \(\mathcal {X}\) and \(\mathcal {Y}\), the weak lower semi-continuity and uniform convexity of \(\varTheta \), and the weak closedness of \(F\). However, \(x_n^\delta \) might not be unique when \(F\) is nonlinear; we will take \(x_n^\delta \) to be any one of the minimizers. In view of the minimality of \(x_n^\delta \), we have \(\xi _n^\delta \in \partial \varTheta (x_n^\delta )\). From the definition of \(x_n^{\delta }\), it is straightforward to see that

$$\begin{aligned} \Vert F(x_n^\delta )-y^\delta \Vert \le \Vert F(x_{n-1}^\delta )-y^\delta \Vert ,\quad n=1,2,\ldots . \end{aligned}$$
(3.5)
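
Indeed, comparing the value of the functional in (3.4) at its minimizer \(x_n^\delta \) with its value at the admissible competitor \(x=x_{n-1}^\delta \), for which the Bregman term vanishes, and using \(D_{\xi _{n-1}^\delta } \varTheta (x_n^\delta , x_{n-1}^\delta )\ge 0\), we get

$$\begin{aligned} \frac{1}{r} \Vert F(x_n^\delta )-y^\delta \Vert ^r \le \frac{1}{r} \Vert F(x_n^\delta )-y^\delta \Vert ^r + \alpha _n D_{\xi _{n-1}^\delta } \varTheta (x_n^\delta , x_{n-1}^\delta ) \le \frac{1}{r} \Vert F(x_{n-1}^\delta )-y^\delta \Vert ^r, \end{aligned}$$

which is (3.5).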

We will terminate the iteration by the discrepancy principle

$$\begin{aligned} \Vert F(x_{n_\delta }^\delta )-y^\delta \Vert \le \tau \delta <\Vert F(x_n^\delta )-y^\delta \Vert ,\quad 0\le n<n_\delta \end{aligned}$$
(3.6)

with a given constant \(\tau >1\). The output \(x_{n_{\delta }}^{\delta }\) will be used to approximate a solution of (1.1).
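
For a general uniformly convex penalty the minimization subproblem in (3.4) has no closed-form solution, so each step requires an inner convex minimization. The following sketch shows only the structure of the resulting outer loop for a linear \(F\), \(r=2\) and a smoothed total-variation-type penalty; the smoothed penalty, its smoothing parameter, the choice of \(\xi _0\) as the gradient of the smoothed penalty at \(x_0\), and the use of a generic quasi-Newton inner solver are assumptions made solely to keep the sketch runnable, not part of the method itself.

```python
import numpy as np
from scipy.optimize import minimize

def theta(x, mu=1.0, eps=1e-3):
    """Smoothed version of mu/2*||x||^2 + TV(x); eps keeps it differentiable."""
    return 0.5 * mu * np.sum(x ** 2) + np.sum(np.sqrt(np.diff(x) ** 2 + eps ** 2))

def grad_theta(x, mu=1.0, eps=1e-3):
    """Gradient of the smoothed penalty above."""
    d = np.diff(x)
    w = d / np.sqrt(d ** 2 + eps ** 2)
    g = mu * x
    g[:-1] -= w
    g[1:] += w
    return g

def nit_uniformly_convex(F, y_delta, delta, x0, alphas, tau=1.5):
    """Sketch of method (3.4) for linear F, r = 2 and the smoothed penalty
    above: x_n minimizes the data-fit plus Bregman-distance functional,
    xi_n is updated as in (3.4) (J_2 is the identity on a Hilbert space),
    and the discrepancy principle (3.6) stops the iteration."""
    x, xi = x0.copy(), grad_theta(x0)   # xi_0 in the subdifferential of theta at x_0
    for alpha in alphas:
        if np.linalg.norm(F @ x - y_delta) <= tau * delta:
            break
        bregman = lambda z, x=x, xi=xi: theta(z) - theta(x) - xi @ (z - x)
        obj = lambda z: 0.5 * np.sum((F @ z - y_delta) ** 2) + alpha * bregman(z)
        x = minimize(obj, x, method="L-BFGS-B").x
        xi = xi - F.T @ (F @ x - y_delta) / alpha
    return x
```

In practice one would replace the generic inner solver by a method tailored to the non-smooth penalty, e.g. a proximal or primal-dual type algorithm.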

In order to understand the convergence property of \(x_{n_\delta }^\delta \), it is necessary to consider the noise-free iterative sequences \(\{x_n\}\) and \(\{\xi _n\}\), where \(x_n\) and \(\xi _n\) for \(n\ge 1\) are defined by (3.4) with \(y^\delta \) replaced by \(y\), i.e.,

$$\begin{aligned} \begin{aligned} x_n\in&\,\,\arg \min _{x\in D(F)} \left\{ \frac{1}{r} \Vert F(x)-y\Vert ^r +\alpha _n D_{\xi _{n-1}}\varTheta (x,x_{n-1})\right\} , \\ \xi _n=&\,\,\xi _{n-1}-\frac{1}{\alpha _n} F'(x_n)^* J_r(F(x_n)-y) \in \partial \varTheta (x_n). \end{aligned} \end{aligned}$$
(3.7)

In Sect. 4.1 we will give a detailed convergence analysis of \(\{x_n\}\); in particular, we will show that \(\{x_n\}\) converges strongly to a solution of (1.1). In order to connect this result with the convergence property of \(x_{n_\delta }^\delta \), we will make the following assumption.

Assumption 3.3

\(x_n\) is uniquely defined for each \(n\).

We will give a sufficient condition for the validity of Assumption 3.3. This assumption enables us to establish stability results connecting \(x_n^\delta \) and \(x_n\), so that we can finally obtain the convergence property of \(x_{n_\delta }^\delta \) stated in the following result.

Theorem 3.1

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(\varTheta \) satisfy Assumption 3.1, and let \(F\) satisfy Assumptions 3.2 and 3.3. Assume that \(1<r<\infty ,\,\tau >(1+\eta )/(1-\eta )\) and that \(\{\alpha _n\}\) is a sequence of positive numbers satisfying \(\sum _{n=1}^{\infty }\alpha _n^{-1} = \infty \) and \(\alpha _n\le c_0\alpha _{n+1}\) for all \(n\) with some constant \(c_0>0\). Assume further that

$$\begin{aligned} D_{\xi _0} \varTheta (x^\dag , x_0) \le \frac{\tau ^r-1}{\tau ^r-1+c_0} \varphi (\rho ). \end{aligned}$$
(3.8)

Then, the discrepancy principle (3.6) terminates the method (3.4) after \(n_{\delta }<\infty \) steps. Moreover, there is a solution \(x_*\in D(\varTheta )\) of (1.1) such that

$$\begin{aligned} x_{n_\delta }^\delta \rightarrow x_*, \quad \varTheta (x_{n_\delta }^\delta ) \rightarrow \varTheta (x_*) \quad \text{ and } \quad D_{\xi _{n_{\delta }}^{\delta }} \varTheta (x_*, x_{n_{\delta }}^{\delta }) \rightarrow 0 \end{aligned}$$
(3.9)

as \(\delta \rightarrow 0\). If, in addition, \(\mathcal {N}(F'(x^\dag ))\subset \mathcal {N}(F'(x))\) for all \(x\in B_{3\rho }(x_0)\cap D(F)\), then \(x_*=x^\dag \).

In this result, the closeness condition (3.8) is used to guarantee that \(x_n^\delta \) is in \(B_{3\rho }(x_0)\) for \(0\le n\le n_\delta \) so that Assumption 3.2 (d) can be applied. This issue does not appear when \(F: \mathcal {X}\rightarrow \mathcal {Y}\) is a bounded linear operator. Furthermore, Assumption 3.3 holds automatically for linear problems when \(\varTheta \) is strictly convex. Consequently, we have the following convergence result for linear inverse problems.

Theorem 3.2

Let \(F: \mathcal {X}\rightarrow \mathcal {Y}\) be a bounded linear operator with \(\mathcal {X}\) being reflexive and \(\mathcal {Y}\) being uniformly smooth, let \(\varTheta \) be proper, weakly lower semi-continuous, and uniformly convex, let \(1<r<\infty \), and let \(\{\alpha _n\}\) be such that \(\sum _{n=1}^{\infty }\alpha _n^{-1} = \infty \) and \(\alpha _n\le c_0\alpha _{n+1}\) for all \(n\) with \(c_0>0\). Then, the discrepancy principle (3.6) with \(\tau >1\) terminates the method after \(n_{\delta }<\infty \) steps. Moreover, there hold

$$\begin{aligned} x_{n_\delta }^\delta \rightarrow x^\dag , \quad \varTheta (x_{n_\delta }^\delta ) \rightarrow \varTheta (x^\dag ) \quad \text{ and } \quad D_{\xi _{n_{\delta }}^{\delta }} \varTheta (x^\dag , x_{n_{\delta }}^{\delta }) \rightarrow 0 \end{aligned}$$

as \(\delta \rightarrow 0\).

In the next section, we will give the detailed proof of Theorem 3.1. It should be pointed out that the convergence \(x_{n_\delta }^\delta \rightarrow x_*\) does not imply \(\varTheta (x_{n_\delta }^\delta ) \rightarrow \varTheta (x_*)\) directly since \(\varTheta \) is not necessarily continuous. The proof of \(\varTheta (x_{n_\delta }^\delta )\rightarrow \varTheta (x_*)\) relies on additional observations.

When applying our convergence result to the situation where \(\mathcal {X}=L^2(\varOmega )\) and \(\varTheta (x)=\mu \int _\varOmega |x(\omega )|^2 d\omega +\int _\varOmega |Dx|\) with \(\mu >0\), we obtain

$$\begin{aligned} \Vert x_{n_\delta }^\delta -x^\dag \Vert _{L^2(\varOmega )} \rightarrow 0 \quad \text{ and } \quad \int _\varOmega |D x_{n_\delta }^\delta | \rightarrow \int _\varOmega |Dx^\dag | \quad \text{ as } \delta \rightarrow 0. \end{aligned}$$

This significantly improves the result in [2] in which only the boundedness of \(\varTheta (x_{n_\delta }^\delta )\) was derived and hence only weak convergence for a subsequence of \(\{x_{n_\delta }^\delta \}\) can be guaranteed.

We conclude this section with a sufficient condition guaranteeing the validity of Assumption 3.3.

Assumption 3.4

There exist \(C_0 \ge 0\) and \(1/r \le \kappa <1\) such that

$$\begin{aligned} \Vert F(\bar{x})-F(x) - F'(x)(\bar{x}-x)\Vert \le C_0 [D_{\xi } \varTheta (\bar{x},x)]^{1-\kappa } [\varDelta _r(F(\bar{x})-y, F(x)-y)]^\kappa \end{aligned}$$

for all \(\bar{x}, x\in B_{3\rho }(x_0)\cap D(\varTheta ) \cap D(F)\) with \(x\in D(\partial \varTheta )\) and \(\xi \in \partial \varTheta (x)\), where \(\varDelta _r(\cdot , \cdot )\) denotes the Bregman distance on \(\mathcal {Y}\) induced by the convex function \(\Vert y\Vert ^r/r\).

When \(\mathcal {Y}\) is an \(r\)-uniformly convex Banach space, \(\varTheta \) is a \(p\)-uniformly convex function on \(\mathcal {X}\) with \(p\ge 2\), and \(1/p+1/r\le 1\), Assumption 3.4 holds with \(\kappa =1-1/p\) if there is a constant \(C_1\ge 0\) such that

$$\begin{aligned} \Vert F(\bar{x})-F(x)-F'(x) (\bar{x}-x) \Vert \le C_1 \Vert \bar{x}-x\Vert \Vert F(\bar{x}) -F(x)\Vert \end{aligned}$$
(3.10)

for \(\bar{x}, x\in B_{3\rho }(x_0)\cap D(F)\), which is a slightly strengthened version of Assumption 3.2 (d).

Lemma 3.2

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(1<r<\infty \), let \(\varTheta \) satisfy Assumption 3.1, let \(F\) satisfy Assumptions 3.2 and 3.4, and let \(\{\alpha _n\}\) satisfy \(\sum _{n=1}^\infty \alpha _n^{-1}=\infty \). Assume that

$$\begin{aligned} D_{\xi _0} \varTheta (x^\dag , x_0)\le \varphi (\rho ) \quad \text{ and } \quad \bar{C}_0 [D_{\xi _0} \varTheta (x^\dag , x_0)]^{1-\frac{1}{r}} <1 \end{aligned}$$
(3.11)

with \(\bar{C}_0:=C_0 \kappa ^\kappa (1-\kappa )^{1-\kappa } (1-\eta )^{\frac{1-r}{r}} \alpha _1^{\kappa -\frac{1}{r}}\). Then Assumption 3.3 holds, i.e. \(x_n\) is uniquely defined for each \(n\).

We will prove Lemma 3.2 at the end of Sect. 4.1 by using some useful estimates that will be derived during the proof of the convergence of \(\{x_n\}\).

4 Convergence analysis

We prove Theorem 3.1 in this section. We first obtain a convergence result for the noise-free iterative sequences \(\{x_n\}\) and \(\{\xi _n\}\). We then consider the sequences \(\{x_n^\delta \}\) and \(\{\xi _n^\delta \}\) corresponding to noisy data, and show that the discrepancy principle indeed terminates the iteration after finitely many steps. We further establish a stability result which in particular implies that \(x_n^\delta \rightarrow x_n\) as \(\delta \rightarrow 0\) for each fixed \(n\). Combining all these results we finally obtain the proof of Theorem 3.1.

4.1 Convergence result for noise-free case

We first consider the noise-free iterative sequences \(\{x_n\}\) and \(\{\xi _n\}\) defined by (3.7) and obtain a convergence result that is crucial for proving Theorem 3.1. Our proof is inspired by [9, 14].

Theorem 4.1

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(1<r<\infty \), let \(\varTheta \) satisfy Assumption 3.1, let \(F\) satisfy Assumption 3.2, and let \(\{\alpha _n\}\) satisfy \(\sum _{n=1}^\infty \alpha _n^{-1}=\infty \). Assume that

$$\begin{aligned} D_{\xi _0} \varTheta (x^\dag , x_0)\le \varphi (\rho ). \end{aligned}$$
(4.1)

Then there exists a solution \(x_*\) of (1.1) in \(B_{3\rho }(x_0)\cap D(\varTheta )\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } \Vert x_n-x_*\Vert =0, \quad \lim _{n\rightarrow \infty } \varTheta (x_n)=\varTheta (x_*) \quad \text{ and } \quad \lim _{n\rightarrow \infty } D_{\xi _n} \varTheta (x_*, x_n)=0. \end{aligned}$$

If in addition \(\mathcal {N}(F'(x^\dag ))\subset \mathcal {N}(F'(x))\) for all \(x\in B_{3\rho }(x_0)\cap D(F)\), then \(x_*=x^\dag \).

Proof

We first show by induction that for any solution \(\hat{x}\) of (1.1) in \(B_{3\rho }(x_0)\cap D(\varTheta )\) there holds

$$\begin{aligned} D_{\xi _n}\varTheta (\hat{x}, x_n)\le D_{\xi _0}\varTheta (\hat{x}, x_0), \quad n=0, 1, \ldots . \end{aligned}$$
(4.2)

This is trivial for \(n=0\). Assume that it is true for \(n=m-1\) for some \(m\ge 1\); we will show that it is also true for \(n=m\). From (2.2) we have

$$\begin{aligned} D_{\xi _m}\varTheta (\hat{x}, x_m)-D_{\xi _{m-1}}\varTheta (\hat{x}, x_{m-1}) =-D_{\xi _{m-1}}\varTheta (x_m, x_{m-1}) +\langle \xi _{m-1}-\xi _m, \hat{x}-x_m\rangle . \end{aligned}$$

By dropping the first term on the right, which is non-positive, and using the definition of \(\xi _m\) we obtain

$$\begin{aligned} D_{\xi _m}\varTheta (\hat{x}, x_m)-D_{\xi _{m-1}}&\varTheta (\hat{x}, x_{m-1}) \le \dfrac{1}{\alpha _m} \langle J_r(F(x_m)-y), F'(x_m)(\hat{x}-x_m)\rangle . \end{aligned}$$

In view of the properties of the duality mapping \(J_r\) it follows that

$$\begin{aligned}&D_{\xi _m}\varTheta (\hat{x}, x_m)-D_{\xi _{m-1}}\varTheta (\hat{x}, x_{m-1})\nonumber \\&\quad \le -\frac{1}{\alpha _m} \Vert F(x_m)\!-\!y\Vert ^r \!+\!\frac{1}{\alpha _m} \Vert F(x_m)\!-\!y\Vert ^{r-1} \Vert F(x_m)\!-\!y+F'(x_m)(\hat{x}-x_m)\Vert .\nonumber \\ \end{aligned}$$
(4.3)

In order to proceed further, we need to show that \(x_m \in B_{3\rho }(x_0)\) so that Assumption 3.2 (d) on \(F\) can be employed. Using the minimizing property of \(x_m\), the induction hypothesis, and (4.1) we obtain

$$\begin{aligned} D_{\xi _{m-1}}\varTheta (x_m, x_{m-1}) \le D_{\xi _{m-1}}\varTheta (x^\dag , x_{m-1}) \le D_{\xi _0} \varTheta (x^\dag , x_0) \le \varphi (\rho ). \end{aligned}$$

With the help of Assumption 3.1 on \(\varTheta \), we have

$$\begin{aligned} \Vert x_m-x_{m-1}\Vert \le \rho , \quad \Vert x^\dag -x_{m-1}\Vert \le \rho \quad \text{ and } \quad \Vert x^\dag -x_0\Vert \le \rho . \end{aligned}$$

Therefore \(x_m\in B_{3\rho }(x_0)\). Thus we may use Assumption 3.2 (d) to obtain from (4.3) that

$$\begin{aligned} D_{\xi _m}\varTheta (\hat{x}, x_m)-D_{\xi _{m-1}}\varTheta (\hat{x}, x_{m-1}) \le -\frac{1-\eta }{\alpha _m} \Vert F(x_m)-y\Vert ^r. \end{aligned}$$
(4.4)

This and the induction hypothesis imply (4.2) with \(n=m\).

As an immediate consequence of (4.2), we know that (4.4) is true for all \(m\). Consequently

$$\begin{aligned} D_{\xi _n}\varTheta (\hat{x}, x_n)\le D_{\xi _{n-1}}\varTheta (\hat{x}, x_{n-1}), \quad n=1,2, \ldots \end{aligned}$$
(4.5)

and

$$\begin{aligned} \frac{1-\eta }{\alpha _n}\Vert F(x_n)-y\Vert ^r \le D_{\xi _{n-1}}\varTheta (\hat{x}, x_{n-1})-D_{\xi _n}\varTheta (\hat{x}, x_n). \end{aligned}$$
(4.6)

By using the monotonicity of \(\Vert F(x_n)-y\Vert \) with respect to \(n\), we obtain

$$\begin{aligned} \Vert F(x_n)-y\Vert ^r \sum _{j=1}^n \frac{1}{\alpha _j} \le \sum _{j=1}^n \frac{1}{\alpha _j} \Vert F(x_j)-y\Vert ^r \le \frac{1}{1-\eta } D_{\xi _0} \varTheta (\hat{x}, x_0). \end{aligned}$$

Since \(\sum _{j=1}^n \alpha _j^{-1}\rightarrow \infty \) as \(n\rightarrow \infty \), we have \(\Vert F(x_n)-y\Vert \rightarrow 0\) as \(n\rightarrow \infty \).

Next we show that \(\{x_n\}\) converges to a solution of (1.1). To this end, we show that \(\{x_n\}\) is a Cauchy sequence in \(\mathcal {X}\). For \(0\le l<m<\infty \) we have from (2.2) that

$$\begin{aligned} D_{\xi _l}\varTheta (x_m, x_l)=D_{\xi _l}\varTheta (\hat{x}, x_l)-D_{\xi _m}\varTheta (\hat{x}, x_m)+\langle \xi _m-\xi _l, x_m-\hat{x}\rangle . \end{aligned}$$

By the definition of \(\xi _n\) we have

$$\begin{aligned} |\langle \xi _m-\xi _l, x_m- \hat{x}\rangle |&= \left| \sum _{n=l+1}^m \langle \xi _n-\xi _{n-1}, x_m-\hat{x}\rangle \right| \nonumber \\&= \left| \sum _{n=l+1}^m \frac{1}{\alpha _n} \langle J_r(F(x_n)-y), F'(x_n)(x_m-\hat{x})\rangle \right| \nonumber \\&\le \sum _{n=l+1}^m \frac{1}{\alpha _n} \Vert F(x_n)-y\Vert ^{r-1} \Vert F'(x_n)(x_m-\hat{x})\Vert .\qquad \end{aligned}$$
(4.7)

By using Assumption 3.2 (d) on \(F\) and the monotonicity of \(\Vert F(x_n)-y\Vert \) we can obtain

$$\begin{aligned} \Vert F'(x_n)(x_m-\hat{x})\Vert&\le \Vert F'(x_n)(x_n-\hat{x})\Vert +\Vert F'(x_n)(x_m-x_n)\Vert \nonumber \\&\le (1+\eta ) (\Vert F(x_n)-y\Vert +\Vert F(x_m)-F(x_n)\Vert )\nonumber \\&\le 3(1+\eta ) \Vert F(x_n)-y\Vert . \end{aligned}$$
(4.8)

Therefore, by using (4.6), we have with \(c_0:=3(1+\eta )/(1-\eta )\) that

$$\begin{aligned} |\langle \xi _m-\xi _l, x_m-\hat{x}\rangle |&\le 3(1+\eta ) \sum _{n=l+1}^m \frac{1}{\alpha _n} \Vert F(x_n)-y\Vert ^r \nonumber \\&\le c_0 (D_{\xi _l}\varTheta (\hat{x}, x_l)-D_{\xi _m}\varTheta (\hat{x}, x_m)). \end{aligned}$$
(4.9)

Consequently

$$\begin{aligned} D_{\xi _l}\varTheta (x_m, x_l) \le (1+c_0) (D_{\xi _l}\varTheta (\hat{x}, x_l)-D_{\xi _m}\varTheta (\hat{x}, x_m)). \end{aligned}$$

Since \(\{D_{\xi _n}\varTheta (\hat{x}, x_n)\}\) is monotonically decreasing, we obtain \(D_{\xi _l}\varTheta (x_m, x_l)\rightarrow 0\) as \(l,m\rightarrow \infty \). In view of the uniform convexity of \(\varTheta \), we can conclude that \(\{x_n\}\) is a Cauchy sequence in \(\mathcal {X}\). Thus \(x_n\rightarrow x_*\) for some \(x_*\in \mathcal {X}\) as \(n\rightarrow \infty \). Since \(\Vert F(x_n)-y\Vert \rightarrow 0\) as \(n\rightarrow \infty \), we may use the weak closedness of \(F\) to conclude that \(x_*\in D(F)\) and \(F(x_*)=y\). We remark that \(x_*\in B_{3\rho }(x_0)\) because \(x_n\in B_{3\rho }(x_0)\).

Next we show that

$$\begin{aligned} x_*\in D(\varTheta ), \quad \lim _{n\rightarrow \infty } \varTheta (x_n)=\varTheta (x_*) \quad \text{ and } \quad \lim _{n\rightarrow \infty } D_{\xi _n} \varTheta (x_*, x_n)=0. \end{aligned}$$

From the convexity of \(\varTheta \) and \(\xi _n\in \partial \varTheta (x_n)\) it follows that

$$\begin{aligned} \varTheta (x_n)\le \varTheta (\hat{x}) +\langle \xi _n, x_n-\hat{x}\rangle . \end{aligned}$$
(4.10)

In view of (4.9) we have

$$\begin{aligned} \varTheta (x_n)\le \varTheta (\hat{x}) +\langle \xi _0, x_n-\hat{x}\rangle + c_0 D_{\xi _0} \varTheta (\hat{x}, x_0). \end{aligned}$$

Since \(x_n\rightarrow x_*\) as \(n\rightarrow \infty \), by using the weak lower semi-continuity of \(\varTheta \) we obtain

$$\begin{aligned} \varTheta (x_*)\le \liminf _{n\rightarrow \infty } \varTheta (x_n)\le \varTheta (\hat{x}) +\langle \xi _0, x_*-\hat{x}\rangle + c_0 D_{\xi _0} \varTheta (\hat{x}, x_0)<\infty .\qquad \end{aligned}$$
(4.11)

This implies that \(x_*\in D(\varTheta )\). We next use (4.9) to derive for \(l<n\) that

$$\begin{aligned} |\langle \xi _n, x_n-x_*\rangle | \le c_0 (D_{\xi _l} \varTheta (x_*, x_l) -D_{\xi _n} \varTheta (x_*, x_n)) +|\langle \xi _l, x_n-x_*\rangle |. \end{aligned}$$

By taking \(n\rightarrow \infty \) and using \(x_n\rightarrow x_*\) we can derive that

$$\begin{aligned} \limsup _{n\rightarrow \infty } |\langle \xi _n, x_n-x_*\rangle | \le c_0 (D_{\xi _l} \varTheta (x_*, x_l) - \varepsilon _0), \end{aligned}$$

where \(\varepsilon _0:=\lim _{n\rightarrow \infty } D_{\xi _n}\varTheta (x_*, x_n)\), whose existence is guaranteed by the monotonicity of \(\{D_{\xi _n}\varTheta (x_*, x_n)\}\). Since the above inequality holds for all \(l\), by taking \(l\rightarrow \infty \) we obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty } |\langle \xi _n, x_n-x_*\rangle | \le c_0 (\varepsilon _0 - \varepsilon _0)=0. \end{aligned}$$
(4.12)

Using (4.10) with \(\hat{x}\) replaced by \(x_*\) we thus obtain \(\limsup _{n\rightarrow \infty } \varTheta (x_n)\le \varTheta (x_*)\). Combining this with (4.11) we therefore obtain \(\lim _{n\rightarrow \infty } \varTheta (x_n)=\varTheta (x_*)\). This together with (4.12) then implies that \(\lim _{n\rightarrow \infty } D_{\xi _n}\varTheta (x_*, x_n)=0\).

Finally we prove \(x_*=x^\dag \) under the additional condition \(\mathcal {N}(F'(x^\dag ))\subset \mathcal {N}(F'(x))\) for \(x\in B_{3\rho }(x_0)\cap D(F)\). We use (4.10) with \(\hat{x}\) replaced by \(x^\dag \) to obtain

$$\begin{aligned} D_{\xi _0}\varTheta (x_n, x_0)\le D_{\xi _0}\varTheta (x^\dag , x_0)+\langle \xi _n-\xi _0, x_n-x^\dag \rangle . \end{aligned}$$
(4.13)

By using (4.9), for any \(\varepsilon >0\) we can find \(l_0\) such that

$$\begin{aligned} |\langle \xi _n-\xi _{l_0}, x_n-x^\dag \rangle | <\frac{\varepsilon }{2}, \quad n\ge l_0. \end{aligned}$$

We next consider \(\langle \xi _{l_0}-\xi _0, x_n-x^\dag \rangle \). According to the definition of \(\xi _n\) we have \(\xi _j-\xi _{j-1}\in \mathcal {R}(F'(x_j)^*)\). Since \(\mathcal {X}\) is reflexive and \(\mathcal {N}(F'(x^\dag )) \subset \mathcal {N}(F'(x_j))\), we have from (2.1) that \(\overline{\mathcal {R}(F'(x_j)^*)}\subset \overline{\mathcal {R}(F'(x^\dag )^*)}\). Thus we can find \(v_j\in \mathcal {Y}^*\) and \(\beta _j \in \mathcal {X}^*\) such that

$$\begin{aligned} \xi _j-\xi _{j-1}=F'(x^\dag )^* v_j +\beta _j \quad \text{ and } \quad \Vert \beta _j\Vert \le \frac{\varepsilon }{3 l_0 M}, \quad 1\le j\le l_0, \end{aligned}$$

where \(M>0\) is a constant such that \(\Vert x_n-x^\dag \Vert \le M\) for all \(n\). Consequently

$$\begin{aligned} |\langle \xi _{l_0}-\xi _0, x_n-x^\dag \rangle |&= \left| \sum _{j=1}^{l_0} \langle \xi _j-\xi _{j-1}, x_n-x^\dag \rangle \right| \\&= \left| \sum _{j=1}^{l_0} [\langle v_j, F'(x^\dag ) (x_n-x^\dag )\rangle +\langle \beta _j, x_n-x^\dag \rangle ]\right| \\&\le \sum _{j=1}^{l_0} \left( \Vert v_j\Vert \Vert F'(x^\dag ) (x_n-x^\dag )\Vert +\Vert \beta _j\Vert \Vert x_n-x^\dag \Vert \right) \\&\le (1+\eta ) \sum _{j=1}^{l_0} \Vert v_j\Vert \Vert F(x_n)-y\Vert +\frac{\varepsilon }{3}. \end{aligned}$$

Since \(\Vert F(x_n)-y\Vert \rightarrow 0\) as \(n\rightarrow \infty \), we can find \(n_0\ge l_0\) such that

$$\begin{aligned} |\langle \xi _{l_0}-\xi _0, x_n-x^\dag \rangle |<\frac{\varepsilon }{2}, \quad \forall n\ge n_0. \end{aligned}$$

Therefore \(|\langle \xi _n-\xi _0, x_n-x^\dag \rangle |<\varepsilon \) for all \(n\ge n_0\). Since \(\varepsilon >0\) is arbitrary, we obtain \(\lim _{n\rightarrow \infty } \langle \xi _n-\xi _0, x_n-x^\dag \rangle =0\). By taking \(n\rightarrow \infty \) in (4.13) and using \(\varTheta (x_n)\rightarrow \varTheta (x_*)\) we obtain

$$\begin{aligned} D_{\xi _0}\varTheta (x_*, x_0)\le D_{\xi _0}\varTheta (x^\dag , x_0). \end{aligned}$$

According to the definition of \(x^\dag \) we must have \(D_{\xi _0}\varTheta (x_*, x_0)=D_{\xi _0}\varTheta (x^\dag , x_0)\). A direct application of Lemma 3.1 gives \(x_*=x^\dag \). \(\square \)

As a byproduct, we can now use some estimates established in the proof of Theorem 4.1 to prove Lemma 3.2.

Proof of Lemma 3.2

We assume that the minimization problem in (3.7) has two minimizers \(x_n\) and \(\hat{x}_n\). Then it follows that

$$\begin{aligned} 0&= \frac{1}{r} \Vert F(\hat{x}_n)-y\Vert ^r + \alpha _n D_{\xi _{n-1}} \varTheta (\hat{x}_n,x_{n-1}) - \frac{1}{r} \Vert F(x_n)-y\Vert ^r \\&- \,\alpha _n D_{\xi _{n-1}} \varTheta (x_n,x_{n-1})\\&= \varDelta _r(F(\hat{x}_n)-y, F(x_n)-y)+\langle J_r(F(x_n)-y), F(\hat{x}_n)-F(x_n)\rangle \\&+\, \alpha _n(\varTheta (\hat{x}_n)-\varTheta (x_n)-\langle \xi _{n-1}, \hat{x}_n-x_n\rangle ). \end{aligned}$$

With the help of the definition of \(\xi _n\) we can write

$$\begin{aligned}&\varTheta (\hat{x}_n)-\varTheta (x_n)-\langle \xi _{n-1}, \hat{x}_n- x_n \rangle \\&\quad = \varTheta (\hat{x}_n) - \varTheta (x_n) - \langle \xi _n, \hat{x}_n- x_n\rangle + \langle \xi _n-\xi _{n-1}, \hat{x}_n- x_n\rangle \\&\quad = D_{\xi _n} \varTheta (\hat{x}_n, x_n) -\frac{1}{\alpha _n}\langle J_r(F(x_n)-y), F'(x_n)(\hat{x}_n- x_n)\rangle . \end{aligned}$$

Therefore

$$\begin{aligned} 0&= \varDelta _r(F(\hat{x}_n)-y,F(x_n)-y) +\alpha _n D_{\xi _n} \varTheta (\hat{x}_n, x_n) \\&+ \,\langle J_r(F(x_n)-y), F(\hat{x}_n)-F(x_n)-F'(x_n)(\hat{x}_n-x_n)\rangle . \end{aligned}$$

Since \(x_n, \hat{x}_n\in B_{3\rho }(x_0)\) as shown in the proof of Theorem 4.1, we may use Assumption 3.4 and Young's inequality to obtain

$$\begin{aligned} 0&\ge \varDelta _r(F(\hat{x}_n)-y,F(x_n)-y) +\alpha _n D_{\xi _n} \varTheta (\hat{x}_n, x_n) \\&-\, C_0 \Vert F(x_n)-y\Vert ^{r-1} [D_{\xi _n} \varTheta (\hat{x}_n, x_n)]^{1-\kappa } [\varDelta _r (F(\hat{x}_n)-y,F(x_n)-y)]^\kappa \\&\ge \alpha _n D_{\xi _n} \varTheta (\hat{x}_n, x_n) - (1-\kappa ) \kappa ^{\frac{\kappa }{1-\kappa }} C_0^{\frac{1}{1-\kappa }} \Vert F(x_n)-y\Vert ^{\frac{r-1}{1-\kappa }} D_{\xi _n} \varTheta (\hat{x}_n,x_n). \end{aligned}$$

Recall that in the proof of Theorem 4.1 we have established

$$\begin{aligned} \Vert F(x_n)-y\Vert ^r \le \frac{1}{1-\eta } s_n^{-1} D_{\xi _0} \varTheta (x^\dag , x_0) \quad \text{ with } s_n:=\sum _{j=1}^n \alpha _j^{-1}. \end{aligned}$$

Since \(s_n^{-1} \le \min \{\alpha _1, \alpha _n\}\) and \(\kappa \ge 1/r\), we therefore obtain

$$\begin{aligned} 0&\ge \left( 1-\bar{C}_0^{\frac{1}{1-\kappa }} D_{\xi _0} \varTheta (x^\dag , x_0)^{\frac{r-1}{r(1-\kappa )}}\right) \alpha _n D_{\xi _n} \varTheta (\hat{x}_n, x_n) \end{aligned}$$

with \(\bar{C}_0:=C_0 \kappa ^\kappa (1-\kappa )^{1-\kappa } (1-\eta )^{\frac{1-r}{r}} \alpha _1^{\kappa -\frac{1}{r}}\). Thus we may use the second condition in (3.11) to conclude that \(D_{\xi _n} \varTheta (\hat{x}_n, x_n)=0\) and hence \(\hat{x}_n = x_n\). \(\square \)

4.2 Justification of the method

In this subsection we show that the method is well-defined; in particular, we prove that, when the data contains noise, the discrepancy principle (3.6) terminates the iteration after finitely many steps, i.e. \(n_\delta <\infty \).

Lemma 4.1

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(\varTheta \) satisfy Assumption 3.1, and let \(F\) satisfy Assumption 3.2. Let \(1<r<\infty \) and \(\tau >(1+\eta )/(1-\eta )\), and let \(\{\alpha _n\}\) be such that \(\sum _{n=1}^\infty \alpha _n^{-1} =\infty \). Assume that (4.1) holds.

Then the discrepancy principle (3.6) terminates the iteration after \(n_{\delta }<\infty \) steps. If \(n_\delta \ge 2\), then for \(1\le n<n_\delta \) there hold

$$\begin{aligned} D_{\xi _{n}^{\delta }} \varTheta (\hat{x}, x_n^\delta )&\le D_{\xi _{n-1}^\delta } \varTheta (\hat{x}, x_{n-1}^\delta ), \end{aligned}$$
(4.14)
$$\begin{aligned} \frac{1}{\alpha _n}\Vert F(x_n^\delta )-y^\delta \Vert ^r&\le C_1 \left( D_{\xi _{n-1}^\delta } \varTheta (\hat{x}, x_{n-1}^\delta )-D_{\xi _{n}^\delta } \varTheta (\hat{x},x_n^\delta )\right) . \end{aligned}$$
(4.15)

If, in addition, \(\alpha _n \le c_0 \alpha _{n+1}\) for all \(n\) with some constant \(c_0>0\) and

$$\begin{aligned} D_{\xi _0}\varTheta (x^\dag , x_0) \le \frac{\tau ^r-1}{\tau ^r-1+c_0} \varphi (\rho ), \end{aligned}$$
(4.16)

then there holds

$$\begin{aligned} D_{\xi _{n_\delta }^\delta } \varTheta (\hat{x}, x_{n_\delta }^\delta )\le D_{\xi _{n_\delta -1}^\delta } \varTheta (\hat{x}, x_{n_\delta -1}^\delta ) + (1+\eta ) \tau ^{r-1} \frac{\delta ^r}{\alpha _{n_\delta }}, \end{aligned}$$
(4.17)

where \(C_{1}:= \tau /[(1-\eta ) \tau -1-\eta ]\) and \(\hat{x}\) denotes any solution of (1.1) in \(B_{3\rho }(x_0)\cap D(\varTheta )\).

Proof

To prove the first part, we first show by induction that

$$\begin{aligned} x_n^\delta \in B_{2\rho }(x_0) \quad \text{ and } \quad D_{\xi _n^\delta }\varTheta (x^\dag , x_n^\delta ) \le D_{\xi _0}\varTheta (x^\dag , x_0), \quad 0\le n<n_\delta .\qquad \end{aligned}$$
(4.18)

This is trivial for \(n=0\). Next we assume that (4.18) is true for \(n=m-1\) for some \(m<n_\delta \) and show that (4.18) is also true for \(n=m\). By the minimizing property of \(x_m^\delta \) and the induction hypothesis we have

$$\begin{aligned} \frac{1}{r} \Vert F(x_m^\delta )-y^\delta \Vert ^r +\alpha _m D_{\xi _{m-1}^\delta } \varTheta (x_m^\delta , x_{m-1}^\delta )&\le \frac{1}{r} \delta ^r + \alpha _m D_{\xi _{m-1}^\delta } \varTheta (x^\dag , x_{m-1}^\delta ) \nonumber \\&\le \frac{1}{r} \delta ^r + \alpha _m D_{\xi _0}\varTheta (x^\dag , x_0).\qquad \end{aligned}$$
(4.19)

Since \(\Vert F(x_m^\delta )-y^\delta \Vert >\tau \delta \), we can obtain

$$\begin{aligned} \frac{\tau ^r}{r} \delta ^r +\alpha _m D_{\xi _{m-1}^\delta } \varTheta (x_m^\delta , x_{m-1}^\delta ) \le \frac{1}{r} \delta ^r + \alpha _m D_{\xi _0}\varTheta (x^\dag , x_0). \end{aligned}$$

Because \(\tau >1\), this implies that

$$\begin{aligned} \alpha _m\ge \frac{(\tau ^r-1) \delta ^r}{r D_{\xi _0} \varTheta (x^\dag , x_0)} \quad \text{ and } \quad D_{\xi _{m-1}^\delta } \varTheta (x_m^\delta , x_{m-1}^\delta ) \le D_{\xi _0}\varTheta (x^\dag , x_0). \end{aligned}$$
(4.20)

By Assumption 3.1 and the condition (4.1), we can derive that \(\Vert x_m^\delta -x_{m-1}^\delta \Vert \le \rho \). In view of the induction hypothesis we also have \(\Vert x_{m-1}^\delta -x_0\Vert \le 2\rho \). Thus \(x_m^\delta \in B_{3\rho }(x_0)\).

We are now able to use Assumption 3.2 (d) and an argument similar to the derivation of (4.3) to obtain

$$\begin{aligned}&D_{\xi _m^\delta } \varTheta (\hat{x},x_m^\delta )- D_{\xi _{m-1}^\delta } \varTheta (\hat{x},x_{m-1}^\delta ) \nonumber \\&\quad \le \langle \xi _m^\delta -\xi _{m-1}^\delta , x_m^\delta -\hat{x}\rangle = -\frac{1}{\alpha _m} \langle J_r(F(x_m^\delta )-y^\delta ), F'(x_m^\delta )(x_m^\delta -\hat{x}) \rangle \nonumber \\&\quad \le -\frac{1}{\alpha _m}\Vert F(x_m^{\delta })-y^{\delta }\Vert ^{r} +\frac{1}{\alpha _m} \Vert F(x_m^\delta )-y^\delta \Vert ^{r-1} (\delta + \eta \Vert F(x_m^\delta ) -y\Vert ) \nonumber \\&\quad \le -\frac{1-\eta }{\alpha _m} \Vert F(x_m^\delta )-y^\delta \Vert ^{r} +\frac{1+\eta }{\alpha _m} \Vert F(x_m^\delta )-y^\delta \Vert ^{r-1} \delta . \end{aligned}$$
(4.21)

Using again \(\Vert F(x_m^\delta )-y^\delta \Vert >\tau \delta \), we can conclude that

$$\begin{aligned} D_{\xi _m^\delta } \varTheta (\hat{x}, x_m^\delta )\!-\! D_{\xi _{m-1}^\delta } \varTheta (\hat{x}, x_{m-1}^\delta ) \!\le \! -\frac{1}{\alpha _m} \left( 1\!-\!\eta \!-\!\frac{1+\eta }{\tau }\right) \Vert F(x_m^\delta )\!-\!y^\delta \Vert ^r.\nonumber \\ \end{aligned}$$
(4.22)

Since \(\tau >(1+\eta )/(1-\eta )\), we obtain

$$\begin{aligned} D_{\xi _m^\delta }\varTheta (\hat{x}, x_m^\delta ) \le D_{\xi _{m-1}^\delta }\varTheta (\hat{x}, x_{m-1}^\delta ). \end{aligned}$$

In view of this inequality with \(\hat{x}=x^\dag \) and the induction hypothesis, we obtain the second result in (4.18) with \(n=m\). By using again Assumption 3.1 and (4.1) we have \(\Vert x_m^\delta -x^\dag \Vert \le \rho \) and \(\Vert x^\dag -x_0\Vert \le \rho \), which imply that \(x_m^\delta \in B_{2\rho }(x_0)\). This completes the proof of (4.18). As a direct consequence, we see that (4.22) holds for all \(1\le m<n_\delta \), which implies (4.14) and (4.15).

In view of (4.15) and the monotonicity (3.5) of \(\Vert F(x_n^\delta )-y^\delta \Vert \) with respect to \(n\), it follows that

$$\begin{aligned} \Vert F(x_n^\delta )-y^\delta \Vert ^r \sum _{j=1}^n\frac{1}{\alpha _j} \le \sum _{j=1}^n\frac{1}{\alpha _j} \Vert F(x_j^\delta )-y^\delta \Vert ^r \le \frac{\tau }{(1-\eta )\tau -1-\eta } D_{\xi _0} \varTheta (\hat{x}, x_0). \end{aligned}$$

Since \(\Vert F(x_n^\delta )-y^\delta \Vert >\tau \delta \) for \(1\le n<n_\delta \) and \(\sum _{j=1}^n\alpha _j^{-1}\rightarrow \infty \) as \(n\rightarrow \infty \), we can conclude that \(n_\delta \) is a finite integer.

Finally we prove the second part, i.e. the inequality (4.17). Since (4.19) is true for \(m=n_\delta \), we have

$$\begin{aligned} D_{\xi _{n_\delta -1}^\delta } \varTheta \left( x_{n_\delta }^\delta , x_{n_\delta -1}^\delta \right) \le \frac{\delta ^r}{r \alpha _{n_\delta }} +D_{\xi _0} \varTheta (x^\dag , x_0). \end{aligned}$$

Recall from (4.20) that \(\alpha _{n_\delta -1} \ge (\tau ^r-1) \delta ^r /(r D_{\xi _0}\varTheta (x^\dag , x_0))\). Since \(\alpha _{n_\delta -1}\le c_0 \alpha _{n_\delta }\), we can derive that

$$\begin{aligned} D_{\xi _{n_\delta -1}^\delta } \varTheta \left( x_{n_\delta }^\delta , x_{n_\delta -1}^\delta \right) \le \frac{\tau ^r-1+c_0}{\tau ^r-1} D_{\xi _0} \varTheta (x^\dag , x_0). \end{aligned}$$

It then follows from Assumption 3.1 and (4.16) that \(\Vert x_{n_\delta }^\delta -x_{n_\delta -1}^\delta \Vert \le \rho \). Since \(x_{n_\delta -1}^\delta \in B_{2\rho }(x_0)\) we obtain \(x_{n_\delta }^\delta \in B_{3\rho }(x_0)\). Thus we can employ Assumption 3.2 (d) to conclude that (4.21) is also true for \(m=n_\delta \). By setting \(m=n_\delta \) in (4.21) and using \(\Vert F(x_{n_\delta }^\delta )-y^\delta \Vert \le \tau \delta \), we can obtain (4.17). \(\square \)

As a byproduct of the proof of Lemma 4.1, we have the following result which will be used to show \(\lim _{\delta \rightarrow 0} \varTheta (x_{n_\delta }^\delta ) =\varTheta (x_*)\) in the proof of Theorem 3.1.

Lemma 4.2

Let all the conditions in Lemma 4.1 hold, and let \(\hat{x}\) be any solution of (1.1) in \(B_{3\rho }(x_0)\cap D(\varTheta )\). Then for all \(0\le l< n_\delta \) there holds

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , \hat{x}-x_{n_\delta }^\delta \rangle | \le C_2 \frac{\delta ^r}{\alpha _{n_\delta }} + C_3 D_{\xi _l^\delta } \varTheta (\hat{x}, x_l^\delta ), \end{aligned}$$
(4.23)

where \(C_2:= 3(1+\eta ) \tau ^{r-1} (1+\tau )\) and \(C_3:=3(1+\eta ) (1+\tau )/[(1-\eta ) \tau -1-\eta ]\).

Proof

By the definition of \(\xi _n^\delta \) and the properties of the duality mapping \(J_r\), using an argument similar to the derivation of (4.7), we obtain

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , \hat{x}-x_{n_\delta }^\delta \rangle | \le \sum _{n=l+1}^{n_\delta } \frac{1}{\alpha _n} \Vert F(x_n^\delta )-y^\delta \Vert ^{r-1}\Vert F'(x_n^\delta ) (\hat{x}-x_{n_\delta }^\delta )\Vert . \end{aligned}$$

With the help of Assumption 3.2 (d) and the monotonicity (3.5) of \(\Vert F(x_n^\delta )-y^\delta \Vert \) with respect to \(n\), similar to the derivation of (4.8) we have for \(n\le n_\delta \) that

$$\begin{aligned} \Vert F'(x_n^\delta )(\hat{x}-x_{n_\delta }^\delta )\Vert \le 3(1+\eta ) (\Vert F(x_n^\delta )-y^\delta \Vert +\delta ). \end{aligned}$$

Therefore

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , \hat{x}-x_{n_\delta }^\delta \rangle | \le 3(1+\eta ) \sum _{n=l+1}^{n_\delta } \frac{1}{\alpha _n} \Vert F(x_n^\delta )-y^\delta \Vert ^{r-1} (\Vert F(x_n^\delta )-y^\delta \Vert +\delta ). \end{aligned}$$

Since \(\Vert F(x_{n_\delta }^\delta )-y^\delta \Vert \le \tau \delta \) and \(\Vert F(x_n^\delta )-y^\delta \Vert >\tau \delta \) for \(0\le n <n_\delta \), we thus obtain

$$\begin{aligned}&|\langle \xi _{n_\delta }^\delta -\xi _l^\delta , \hat{x}-x_{n_\delta }^\delta \rangle | \nonumber \\&\quad \le 3(1+\eta )\tau ^{r-1} (1+\tau ) \frac{\delta ^r}{\alpha _{n_\delta }} + \frac{3(1+\eta )(1+\tau )}{\tau } \sum _{n=l+1}^{n_\delta -1} \frac{1}{\alpha _n} \Vert F(x_n^\delta )-y^\delta \Vert ^r.\nonumber \\ \end{aligned}$$
(4.24)

In view of (4.15) in Lemma 4.1, we can see that

$$\begin{aligned} \sum _{n=l+1}^{n_\delta -1} \frac{1}{\alpha _n} \Vert F(x_n^\delta )-y^\delta \Vert ^r \le \frac{\tau }{(1-\eta )\tau -1-\eta } D_{\xi _l^\delta }\varTheta (\hat{x}, x_l^\delta ). \end{aligned}$$

Combining this inequality with (4.24) gives the desired estimate. \(\square \)

4.3 Stability

We now prove some stability results for the method that connect \(\{x_n^\delta \}\) with \(\{x_n\}\). These results enable us to use Theorem 4.1 to complete the proof of Theorem 3.1.

Lemma 4.3

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(\varTheta \) satisfy Assumption 3.1, and let \(F\) satisfy Assumptions 3.2 and 3.3. Then for each fixed \(n\) there hold

$$\begin{aligned} x_n^\delta \rightarrow x_n,\quad \varTheta (x_n^\delta )\rightarrow \varTheta (x_n)\quad \text{ and } \quad \xi _n^\delta \rightarrow \xi _n \end{aligned}$$
(4.25)

as \(y^\delta \rightarrow y\).

Proof

We prove this result by induction. It is trivial when \(n=0\) since \(x_0^\delta = x_0\) and \(\xi _0^\delta = \xi _0\). In the following we assume that the result has been proved for \(n=m-1\) and show that it also holds for \(n=m\).

We will adapt the argument from [5]. Let \(\{y^{\delta _i}\}\) be a sequence of data satisfying \(\Vert y^{\delta _i}-y\Vert \le \delta _i\) with \(\delta _i\rightarrow 0\). By the minimizing property of \(x_m^{\delta _i}\) we have

$$\begin{aligned} \frac{1}{r}\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert ^r + \alpha _m D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i}) \le \frac{1}{r}\Vert F(x_{m-1}^{\delta _i})-y^{\delta _i}\Vert ^r. \end{aligned}$$

By the induction hypothesis, we can see that the right hand side of the above inequality is uniformly bounded with respect to \(i\). Therefore both \(\{\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert \}\) and \(\{D_{\xi _{m-1}^{\delta _i}}\varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})\}\) are uniformly bounded with respect to \(i\). Consequently \(\{F(x_m^{\delta _i})\}\) is bounded in \(\mathcal {Y}\) and \(\{x_m^{\delta _i}\}\) is bounded in \(\mathcal {X}\); here we used the uniform convexity of \(\varTheta \). Since both \(\mathcal {X}\) and \(\mathcal {Y}\) are reflexive, by taking a subsequence if necessary, we may assume that \(x_m^{\delta _i}\rightharpoonup \bar{x}_m\in \mathcal {X}\) and \(F(x_m^{\delta _i})\rightharpoonup \bar{y}_m\in \mathcal {Y}\) as \(i\rightarrow \infty \). Since \(F\) is weakly closed, we have \(\bar{x}_m \in D(F)\) and \(F(\bar{x}_m)=\bar{y}_m\). In view of the weak lower semi-continuity of the norm in a Banach space we have

$$\begin{aligned} \Vert F(\bar{x}_m)-y\Vert \le \liminf _{i\rightarrow \infty }\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert . \end{aligned}$$
(4.26)

Moreover, by using \(x_m^{\delta _i}\rightharpoonup \bar{x}_m\), the weak lower semi-continuity of \(\varTheta \), and the induction hypothesis, we have

$$\begin{aligned} \liminf _{i\rightarrow \infty } D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})&= \liminf _{i\rightarrow \infty } \varTheta (x_m^{\delta _i}) - \varTheta (x_{m-1}) - \langle \xi _{m-1},\bar{x}_m-x_{m-1}\rangle \nonumber \\&\ge \varTheta (\bar{x}_m) - \varTheta (x_{m-1}) - \langle \xi _{m-1},\bar{x}_m-x_{m-1}\rangle \nonumber \\&= D_{\xi _{m-1}}\varTheta (\bar{x}_m,x_{m-1}). \end{aligned}$$
(4.27)

The inequalities (4.26) and (4.27) together with the minimizing property of \(x_m^{\delta _i}\) and the induction hypothesis imply

$$\begin{aligned}&\frac{1}{r} \Vert F(\bar{x}_m) -y\Vert ^r + \alpha _m D_{\xi _{m-1}} \varTheta (\bar{x}_m,x_{m-1})\\&\quad \le \liminf _{i\rightarrow \infty } \left\{ \frac{1}{r}\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert ^r + \alpha _m D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})\right\} \\&\quad \le \limsup _{i\rightarrow \infty } \left\{ \frac{1}{r}\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert ^r + \alpha _m D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})\right\} \\&\quad \le \limsup _{i\rightarrow \infty } \left\{ \frac{1}{r}\Vert F(x_m)-y^{\delta _i}\Vert ^r + \alpha _m D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m,x_{m-1}^{\delta _i})\right\} \\&\quad = \frac{1}{r}\Vert F(x_m)-y\Vert ^r + \alpha _m D_{\xi _{m-1}} \varTheta (x_m,x_{m-1}). \end{aligned}$$

According to the definition of \(x_m\) and Assumption 3.3, we must have \(\bar{x}_m = x_m\). Therefore \(x_m^{\delta _i}\rightharpoonup x_m,\,F(x_m^{\delta _i})\rightharpoonup F(x_m)\), and

$$\begin{aligned}&\lim _{i\rightarrow \infty } \left\{ \frac{1}{r}\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert ^r + \alpha _m D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})\right\} \nonumber \\&\quad = \frac{1}{r}\Vert F(x_m)-y\Vert ^r + \alpha _m D_{\xi _{m-1}} \varTheta (x_m,x_{m-1}). \end{aligned}$$
(4.28)

Next we will show that

$$\begin{aligned} \lim _{i\rightarrow \infty } D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i}) = D_{\xi _{m-1}} \varTheta (x_m,x_{m-1}). \end{aligned}$$
(4.29)

Let

$$\begin{aligned} a: = \limsup _{i\rightarrow \infty } D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i})\quad \mathrm{and }\quad b: = D_{\xi _{m-1}} \varTheta (x_m,x_{m-1}). \end{aligned}$$

In view of (4.27), it suffices to show \(a\le b\). Assume to the contrary that \(a>b\). By taking a subsequence if necessary, we may assume that

$$\begin{aligned} a= \lim _{i\rightarrow \infty } D_{\xi _{m-1}^{\delta _i}} \varTheta (x_m^{\delta _i},x_{m-1}^{\delta _i}). \end{aligned}$$

It then follows from (4.28) that

$$\begin{aligned} \frac{1}{r} \lim _{i\rightarrow \infty }\Vert F(x_m^{\delta _i})-y^{\delta _i}\Vert ^r =\frac{1}{r} \Vert F(x_m)-y\Vert ^r +\alpha _m (b-a) < \frac{1}{r}\Vert F(x_m)-y\Vert ^r \end{aligned}$$

which is a contradiction to (4.26). We therefore obtain (4.29).

By using the induction hypothesis and \(x_m^{\delta _i}\rightharpoonup x_m\), we obtain from (4.29) that

$$\begin{aligned} \lim _{i\rightarrow \infty } \varTheta (x_m^{\delta _i}) = \varTheta (x_m). \end{aligned}$$

Since \(x_m^{\delta _i}\rightharpoonup x_m\) and since \(\varTheta \) has the Kadec property, see Lemma 2.1, we obtain that \(x_m^{\delta _i}\rightarrow x_m\) as \(i\rightarrow \infty \). Finally, from the definition of \(\xi _m^{\delta _i}\), the induction hypothesis, the continuity of the map \(x\mapsto F'(x)\), and the continuity of the duality mapping \(J_r\), it follows that \(\xi _m^{\delta _i}\rightarrow \xi _m\) as \(i\rightarrow \infty \).

The above argument shows that, for any sequence \(\{y^{\delta _i}\}\) converging to \(y\), every subsequence of \(\{x_m^{\delta _i}\}\) has a further subsequence, still denoted by \(\{x_m^{\delta _i}\}\), such that \(x_m^{\delta _i}\rightarrow x_m,\,\varTheta (x_m^{\delta _i}) \rightarrow \varTheta (x_m)\) and \(\xi _m^{\delta _i} \rightarrow \xi _m\) as \(i\rightarrow \infty \). Since the limits do not depend on the subsequences, the whole sequences converge, and we obtain (4.25) with \(n=m\) as \(y^\delta \rightarrow y\). The proof is complete. \(\square \)

4.4 Proof of Theorem 3.1

Since the other assertions have been proved in Lemma 4.1, it remains only to show the convergence result (3.9), where \(x_*\) is the limit of \(\{x_n\}\), which exists by Theorem 4.1.

Assume first that \(\{y^{\delta _i}\}\) is a sequence satisfying \(\Vert y^{\delta _i}-y\Vert \le \delta _i\) with \(\delta _i\rightarrow 0\) such that \(n_{\delta _i}\rightarrow n_0\) as \(i\rightarrow \infty \) for some integer \(n_0\). We may assume \(n_{\delta _i} =n_0\) for all \(i\). From the definition of \(n_{\delta _i} = n_0\), we have

$$\begin{aligned} \Vert F(x_{n_0}^{\delta _i}) - y^{\delta _i}\Vert \le \tau \delta _i. \end{aligned}$$

Since Lemma 4.3 implies \(x_{n_0}^{\delta _i}\rightarrow x_{n_0}\), by letting \(i\rightarrow \infty \) we have \(F(x_{n_0}) = y\). This together with the definition of \(x_n\) implies that \(x_n=x_{n_0}\) for all \(n\ge n_0\). Since Theorem 4.1 implies \(x_n\rightarrow x_*\) as \(n\rightarrow \infty \), we must have \(x_{n_0} = x_*\). Consequently, we have from Lemma 4.3 that \(x_{n_{\delta _i}}^{\delta _i} \rightarrow x_*,\,\varTheta (x_{n_{\delta _i}}^{\delta _i})=\varTheta (x_{n_0}^{\delta _i})\rightarrow \varTheta (x_{n_0})=\varTheta (x_*)\) and

$$\begin{aligned} D_{\xi _{n_{\delta _i}}^{\delta _i}} \varTheta (x_*, x_{n_{\delta _i}}^{\delta _i}) = D_{\xi _{n_0}^{\delta _i}} \varTheta (x_{n_0},x_{n_0}^{\delta _i})\rightarrow 0 \ \mathrm{as} \ i\rightarrow \infty . \end{aligned}$$

Assume next that \(\{y^{\delta _i}\}\) is a sequence satisfying \(\Vert y^{\delta _i}-y\Vert \le \delta _i\) with \(\delta _i\rightarrow 0\) such that \(n_i:=n_{\delta _i} \rightarrow \infty \) as \(i\rightarrow \infty \). We first show that

$$\begin{aligned} D_{\xi _{n_i-2}^{\delta _i}} \varTheta (x_*, x_{n_i-2}^{\delta _i})\rightarrow 0\quad \mathrm{as}\,\,i\rightarrow \infty . \end{aligned}$$
(4.30)

Let \(\epsilon >0\) be an arbitrary number. Since Theorem 4.1 implies \(D_{\xi _n} \varTheta (x_*,x_n)\rightarrow 0\) as \(n\rightarrow \infty \), there exists an integer \(n(\epsilon )\) such that \(D_{\xi _{n(\epsilon )}}\varTheta (x_*, x_{n(\epsilon )})<\epsilon /2\). On the other hand, since Lemma 4.3 implies \(x_{n(\epsilon )}^{\delta _i}\rightarrow x_{n(\epsilon )},\,\varTheta (x_{n(\epsilon )}^{\delta _i}) \rightarrow \varTheta (x_{n(\epsilon )})\) and \(\xi _{n(\epsilon )}^{\delta _i}\rightarrow \xi _{n(\epsilon )}\) as \(i\rightarrow \infty \), we can pick an integer \(i(\epsilon )\) large enough such that for all \(i\ge i(\epsilon )\) there hold \(n_i-2\ge n(\epsilon )\) and

$$\begin{aligned} |D_{\xi _{n(\epsilon )}^{\delta _i}} \varTheta (x_*, x_{n(\epsilon )}^{\delta _i}) - D_{\xi _{n(\epsilon )}} \varTheta (x_*, x_{n(\epsilon )})|<\frac{\epsilon }{2}. \end{aligned}$$

Therefore, it follows from Lemma 4.1 that

$$\begin{aligned} D_{\xi _{n_i-2}^{\delta _i}} \varTheta (x_*,x_{n_i-2}^{\delta _i})&\le D_{\xi _{n(\epsilon )}^{\delta _i}} \varTheta (x_*, x_{n(\epsilon )}^{\delta _i}) \le D_{\xi _{n(\epsilon )}} \varTheta (x_*, x_{n(\epsilon )}) + \frac{\epsilon }{2}<\epsilon \end{aligned}$$

for all \(i\ge i(\epsilon )\). Since \(\epsilon >0\) is arbitrary, we thus obtain (4.30). With the help of (4.14), we then obtain

$$\begin{aligned} D_{\xi _{n_i-1}^{\delta _i}} \varTheta (x_*, x_{n_i-1}^{\delta _i})\rightarrow 0 \quad \mathrm{as}\,\, i\rightarrow \infty . \end{aligned}$$
(4.31)

In view of (4.15) we have

$$\begin{aligned} \frac{1}{\alpha _{n_i-1}}\Vert F(x_{n_i-1}^{\delta _i})-y^{\delta _i}\Vert ^r \le \frac{\tau }{(1-\eta )\tau -1-\eta } D_{\xi _{n_i-2}^{\delta _i}} \varTheta (x_*, x_{n_i-2}^{\delta _i}). \end{aligned}$$

Since \(\Vert F(x_{n_i-1}^{\delta _i})-y^{\delta _i}\Vert >\tau \delta _i\), we can conclude from (4.30) that \(\delta _i^r/\alpha _{n_i-1}\rightarrow 0\). Since \(\alpha _{n_i-1}\le c_0\alpha _{n_i}\), we must have \(\delta _i^r/\alpha _{n_i}\rightarrow 0\) as \(i\rightarrow \infty \). In view of (4.17) and (4.31), we can obtain

$$\begin{aligned} D_{\xi _{n_i}^{\delta _i}} \varTheta (x_*, x_{n_i}^{\delta _i})\rightarrow 0 \quad \text{ as } i\rightarrow \infty , \end{aligned}$$
(4.32)

which together with the uniform convexity of \(\varTheta \) implies that \(x_{n_i}^{\delta _i} \rightarrow x_*\) as \(i\rightarrow \infty \).

Finally we show that \(\varTheta (x_{n_i}^{\delta _i}) \rightarrow \varTheta (x_*)\) as \(i \rightarrow \infty \). In view of (4.32), it suffices to show that

$$\begin{aligned} \langle \xi _{n_i}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle \rightarrow 0 \quad \text{ as } i \rightarrow \infty . \end{aligned}$$
(4.33)

Recall that \(\varTheta (x_n)\rightarrow \varTheta (x_*)\) and \(\langle \xi _n, x_*-x_n\rangle \rightarrow 0\) as \(n \rightarrow \infty \), which were established in Theorem 4.1 and its proof. Thus, for any \(\epsilon >0\), we can pick an integer \(l_0\) such that

$$\begin{aligned} |\varTheta (x_{l_0})-\varTheta (x_*)| <\epsilon \quad \text{ and } \quad |\langle \xi _{l_0}, x_*-x_{l_0}\rangle | <\epsilon . \end{aligned}$$
(4.34)

Then, using (4.23) in Lemma 4.2, we can derive

$$\begin{aligned} |\langle \xi _{n_i}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle |&\le |\langle \xi _{l_0}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle | +|\langle \xi _{n_i}^{\delta _i} -\xi _{l_0}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle | \\&\le |\langle \xi _{l_0}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle | + C_2 \frac{\delta _i^r}{\alpha _{n_i}} + C_3 D_{\xi _{l_0}^{\delta _i}} \varTheta (x_*, x_{l_0}^{\delta _i}). \end{aligned}$$

By using the definition of Bregman distance and (4.34) we have

$$\begin{aligned} D_{\xi _{l_0}^{\delta _i}} \varTheta (x_*, x_{l_0}^{\delta _i})&= [ \varTheta (x_*)-\varTheta (x_{l_0})] + [\varTheta (x_{l_0}) -\varTheta (x_{l_0}^{\delta _i})] - \langle \xi _{l_0}, x_*-x_{l_0}\rangle \\&\quad -\langle \xi _{l_0}, x_{l_0} -x_{l_0}^{\delta _i}\rangle - \langle \xi _{l_0}^{\delta _i}-\xi _{l_0}, x_*-x_{l_0}^{\delta _i}\rangle \\&\le 2\epsilon + | \varTheta (x_{l_0}) -\varTheta (x_{l_0}^{\delta _i})| + |\langle \xi _{l_0}, x_{l_0} -x_{l_0}^{\delta _i}\rangle | + | \langle \xi _{l_0}^{\delta _i}-\xi _{l_0}, x_*-x_{l_0}^{\delta _i}\rangle |. \end{aligned}$$

Therefore

$$\begin{aligned} |\langle \xi _{n_i}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle |&\le 2C_3 \epsilon + C_2 \frac{\delta _i^r}{\alpha _{n_i}} + |\langle \xi _{l_0}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle | + C_3 | \varTheta (x_{l_0}) -\varTheta (x_{l_0}^{\delta _i})| \\&\quad \, + \,\,C_3 |\langle \xi _{l_0}, x_{l_0} -x_{l_0}^{\delta _i}\rangle | + C_3 | \langle \xi _{l_0}^{\delta _i}-\xi _{l_0}, x_*-x_{l_0}^{\delta _i}\rangle |. \end{aligned}$$

In view of Lemma 4.3 and the facts, established above, that \(\delta _i^r/\alpha _{n_i}\rightarrow 0\) and \(x_{n_i}^{\delta _i} \rightarrow x_*\) as \(i\rightarrow \infty \), we can conclude that there is an integer \(i_0(\epsilon )\) such that for all \(i>i_0(\epsilon )\) there hold \(n_i>l_0\) and \(|\langle \xi _{n_i}^{\delta _i}, x_*-x_{n_i}^{\delta _i}\rangle | \le 3 C_3 \epsilon \). Since \(\epsilon >0\) is arbitrary, we thus obtain (4.33). \(\square \)

4.5 A variant of the discrepancy principle

When \(n_\delta \) denotes the integer determined by the discrepancy principle (3.6), from Lemma 4.1 we can see that the Bregman distance \(D_{\xi _n^\delta }\varTheta (x^\dag , x_n^\delta )\) is decreasing up to \(n=n_\delta -1\). This monotonicity, however, may not hold at \(n=n_\delta \). Therefore, it seems reasonable to consider the following variant of the discrepancy principle.

Rule 4.1

Let \(\tau >1\) be a given number. If \(\Vert F(x_0)-y^\delta \Vert \le \tau \delta \), we define \(n_\delta :=0\); otherwise we define

$$\begin{aligned} n_\delta := \max \{ n: \Vert F(x_n^\delta )-y^\delta \Vert \ge \tau \delta \}, \end{aligned}$$

i.e., \(n_\delta \) is the integer such that

$$\begin{aligned} \Vert F(x_{n_\delta +1}^\delta )-y^\delta \Vert <\tau \delta \le \Vert F(x_n^\delta )-y^\delta \Vert , \quad 0\le n\le n_\delta . \end{aligned}$$
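To make the difference between the discrepancy principle (3.6) and Rule 4.1 concrete, the following sketch compares the two stopping indices. The routine `residual_norm(n)`, returning \(\Vert F(x_n^\delta )-y^\delta \Vert \), is a hypothetical placeholder for the iteration and is not part of the method's specification; the residuals are assumed to be nonincreasing, as guaranteed by (3.5).

```python
# A minimal sketch of the two stopping rules; `residual_norm(n)` is an assumed
# placeholder returning ||F(x_n^delta) - y^delta|| for the iterates of the method.

def stop_discrepancy_principle(residual_norm, tau, delta, n_max):
    """Discrepancy principle (3.6): first n with residual <= tau * delta."""
    for n in range(n_max + 1):
        if residual_norm(n) <= tau * delta:
            return n
    raise RuntimeError("stopping index not reached within n_max iterations")

def stop_rule_4_1(residual_norm, tau, delta, n_max):
    """Rule 4.1: n_delta = 0 if ||F(x_0) - y^delta|| <= tau * delta; otherwise
    the largest n with residual >= tau * delta."""
    if residual_norm(0) <= tau * delta:
        return 0
    n = 0
    while n + 1 <= n_max and residual_norm(n + 1) >= tau * delta:
        n += 1
    return n
```

In effect, Rule 4.1 returns the last iterate whose residual is still at least \(\tau \delta \), so it typically stops one step earlier than (3.6); determining it requires computing \(x_{n_\delta +1}^\delta \) and then discarding it.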

We point out that the argument for proving Theorem 3.1 can be used to prove the convergence property of \(x_{n_\delta }^\delta \) for \(n_\delta \) determined by Rule 4.1; moreover, the condition \(\alpha _n\le c_0\alpha _{n+1}\) on \(\{\alpha _n\}\) in Theorem 3.1 can even be dropped. In fact we have the following result.

Theorem 4.2

Let \(\mathcal {X}\) be reflexive and \(\mathcal {Y}\) be uniformly smooth, let \(\varTheta \) satisfy Assumption 3.1, and let \(F\) satisfy Assumptions 3.2 and 3.3. Let \(1<r<\infty \) and \(\tau >(1+\eta )/(1-\eta )\), and let \(\{\alpha _n\}\) be such that \(\sum _{n=1}^{\infty }\alpha _n^{-1} = \infty \). Assume further that

$$\begin{aligned} D_{\xi _0} \varTheta (x^\dag , x_0) \le \varphi (\rho ). \end{aligned}$$

Then, the integer \(n_\delta \) defined by Rule 4.1 is finite. Moreover, there is a solution \(x_*\in D(\varTheta )\) of (1.1) such that

$$\begin{aligned} x_{n_\delta }^\delta \rightarrow x_*, \quad \varTheta (x_{n_\delta }^\delta ) \rightarrow \varTheta (x_*) \quad \text{ and } \quad D_{\xi _{n_{\delta }}^{\delta }} \varTheta (x_*, x_{n_{\delta }}^{\delta }) \rightarrow 0 \end{aligned}$$
(4.35)

as \(\delta \rightarrow 0\). If, in addition, \(\mathcal {N}(F'(x^\dag ))\subset \mathcal {N}(F'(x))\) for all \(x\in B_{3\rho }(x_0)\cap D(F)\), then \(x_*=x^\dag \).

Proof

The proof of Lemma 4.1 can be used without change to show that \(n_\delta <\infty \) and that (4.14) and (4.15) hold for \(1\le n\le n_\delta \). Consequently, (4.23) in Lemma 4.2 becomes

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , x_*-x_{n_\delta }^\delta \rangle | \le \frac{3(1+\eta )(1+\tau )}{(1-\eta )\tau -1-\eta } D_{\xi _l^\delta } \varTheta (x_*, x_l^\delta ), \quad 0\le l<n_\delta . \end{aligned}$$
(4.36)

In order to prove the convergence result (4.35), as in the proof of Theorem 3.1 we consider two cases.

Assume first that \(\{y^{\delta _i}\}\) is a sequence satisfying \(\Vert y^{\delta _i}-y\Vert \le \delta _i\) with \(\delta _i\rightarrow 0\) such that \(n_{\delta _i}\rightarrow n_0\) as \(i\rightarrow \infty \) for some integer \(n_0\). We may assume \(n_{\delta _i} =n_0\) for all \(i\). By Rule 4.1 we always have \(\Vert F(x_{n_0+1}^{\delta _i})-y^{\delta _i}\Vert \le \tau \delta _i\). Since Lemma 4.3 implies \(x_{n_0+1}^{\delta _i}\rightarrow x_{n_0+1}\), by letting \(i\rightarrow \infty \) we obtain \(F(x_{n_0+1}) = y\). This together with the definition of \(x_n\) implies that \(x_n = x_{n_0+1}\) for all \(n\ge n_0+1\). It then follows from Theorem 4.1 that \(x_* = x_{n_0+1}\). We claim that \(x_{n_0+1} = x_{n_0}\). To see this, by using the definition of \(\xi _{n_0+1}\), we have

$$\begin{aligned} \xi _{n_0+1} = \xi _{n_0} -\frac{1}{\alpha _{n_0+1}} F'(x_{n_0+1})^* J_r(F(x_{n_0+1})-y) = \xi _{n_0}. \end{aligned}$$

Therefore

$$\begin{aligned} D_{\xi _{n_0}} \varTheta (x_{n_0+1},x_{n_0})&\le D_{\xi _{n_0}} \varTheta (x_{n_0+1},x_{n_0}) +D_{\xi _{n_0+1}} \varTheta (x_{n_0},x_{n_0+1}) \\&= \langle \xi _{n_0+1} -\xi _{n_0}, x_{n_0+1}-x_{n_0}\rangle =0. \end{aligned}$$

This and the strict convexity of \(\varTheta \) imply that \(x_{n_0+1} = x_{n_0}\). Consequently \(x_{n_0} = x_*\). A simple application of Lemma 4.3 then gives the desired conclusion.

Assume next that \(\{y^{\delta _i}\}\) is a sequence satisfying \(\Vert y^{\delta _i}-y\Vert \le \delta _i\) with \(\delta _i\rightarrow 0\) such that \(n_i:=n_{\delta _i}\rightarrow \infty \) as \(i\rightarrow \infty \). We can follow the argument for deriving (4.30) to show that \(D_{\xi _{n_i}^{\delta _i}} \varTheta (x_*, x_{n_i}^{\delta _i})\rightarrow 0\), which in turn implies that \(x_{n_i}^{\delta _i} \rightarrow x_*\) by the uniform convexity of \(\varTheta \). Then we can use (4.36) and follow the same procedure as in the proof of Theorem 3.1 to obtain \(\varTheta (x_{n_i}^{\delta _i})\rightarrow \varTheta (x_*)\) as \(i\rightarrow \infty \). \(\square \)

5 Numerical examples

In this section we present some numerical simulations to test the performance of our method by considering a linear integral equation of the first kind and a nonlinear problem arising from parameter identification in partial differential equations.

Example 5.1

We consider the linear integral equation of the form

$$\begin{aligned} Ax(s) := \int _0^1 K(s,t) x(t)dt = y(s) \quad \mathrm{on}\ [0,1], \end{aligned}$$
(5.1)

where

$$\begin{aligned} K(s,t) = \left\{ \begin{array}{lll} 40s(1-t), &{} \quad s\le t, \\ 40t(1-s), &{} \quad s\ge t. \end{array} \right. \end{aligned}$$

It is clear that \(A: \mathcal {X}:=L^2[0,1]\rightarrow \mathcal {Y}:=L^2[0,1]\) is a compact operator. Our goal is to find the solution of (5.1) by using some noisy data \(y^{\delta }\) instead of \(y\). We assume that the exact solution is

$$\begin{aligned} x^{\dag }(t) = \left\{ \begin{array}{l@{\quad }l} 0.5, \quad &{} t\in [0.292,0.300], \\ 1, &{} t\in [0.500,0.508], \\ 0.7, &{} t\in [0.700, 0.708],\\ 0, &{} \text{ elsewhere } \end{array} \right. \end{aligned}$$

Let \(y = Ax^{\dag }\) be the exact data. For a given noise level \(\delta >0\), we add random Gaussian noise to \(y\) to obtain \(y^\delta \) satisfying \(\Vert y-y^\delta \Vert _{L^2[0,1]} = \delta \); this noisy data is then used to reconstruct \(x^\dag \), with the iteration terminated by the discrepancy principle (3.6).

In our numerical simulations we take \(x_0=0\) and \(\xi _0=0\), divide \([0,1]\) into \(N=400\) subintervals of equal length, approximate all integrals by the trapezoidal rule, and solve the involved minimization problems by the modified Fletcher-Reeves CG method in [21]. In Fig. 1 we present the reconstruction results obtained with \(\delta = 0.5\times 10^{-3}\) and \(\alpha _n = 2^{-n}\), using \(\tau =1.02\) in the discrepancy principle (3.6). Figure 1a reports the result of the method with \(\varTheta (x) = \Vert x\Vert _{L^2}^2\). It is clear that the reconstructed solution is rather oscillatory and fails to capture the sparsity of the exact solution \(x^{\dag }\). Figure 1b gives the result of the method with \(\varTheta (x) = \mu \Vert x\Vert _{L^2}^2+\Vert x\Vert _{L^1}\) and \(\mu = 0.01\); during the computation, \(\Vert x\Vert _{L^1}\) is replaced by the smooth approximation \(\int _0^1\sqrt{|x|^2+\epsilon }\) with \(\epsilon = 10^{-6}\). The sparsity reconstruction is significantly improved.
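For completeness, the following sketch indicates one way to set up the discretized problem: the kernel matrix with trapezoidal weights, the exact solution \(x^\dag \), noisy data with prescribed noise level, and the smoothed \(L^1\) penalty. The variable names and the particular noise construction are illustrative assumptions, not a description of our implementation.

```python
# A sketch of the discretization for Example 5.1 (assumptions: nodal values on a
# uniform grid, trapezoidal quadrature, Gaussian noise rescaled to the exact level).
import numpy as np

N = 400
t = np.linspace(0.0, 1.0, N + 1)                  # grid nodes on [0, 1]
w = np.full(N + 1, 1.0 / N)                       # trapezoidal weights
w[0] = 0.5 / N
w[-1] = 0.5 / N

S, T = np.meshgrid(t, t, indexing="ij")
K = np.where(S <= T, 40 * S * (1 - T), 40 * T * (1 - S))
A = K * w                                         # (A x)_i ~ sum_j K(s_i, t_j) w_j x_j

x_dag = np.zeros(N + 1)                           # piecewise constant exact solution
x_dag[(t >= 0.292) & (t <= 0.300)] = 0.5
x_dag[(t >= 0.500) & (t <= 0.508)] = 1.0
x_dag[(t >= 0.700) & (t <= 0.708)] = 0.7

y = A @ x_dag                                     # exact data
delta = 0.5e-3
noise = np.random.randn(N + 1)
noise *= delta / np.sqrt(np.sum(w * noise**2))    # enforce ||y - y^delta||_{L^2} = delta
y_delta = y + noise

def smoothed_l1(x, eps=1e-6):
    """Smoothed penalty int_0^1 sqrt(|x|^2 + eps) used in place of ||x||_{L^1}."""
    return np.sum(w * np.sqrt(x**2 + eps))
```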

Fig. 1 Reconstruction results for Example 5.1: a \(\varTheta (x) = \Vert x\Vert _{L^2}^2\); b \(\varTheta (x) = \mu \Vert x\Vert _{L^2}^2+\Vert x\Vert _{L^1}\) with \(\mu =0.01\)

Example 5.2

We next consider the identification of the parameter \(c\) in the boundary value problem

$$\begin{aligned} \left\{ \begin{array}{ll} -\Delta u + cu = f &{}\quad \text{ in } \varOmega , \\ u = g &{}\quad \text{ on } \partial \varOmega \end{array} \right. \end{aligned}$$
(5.2)

from an \(L^2(\varOmega )\)-measurement of the state \(u\), where \(\varOmega \subset \mathbb {R}^d,\,d\le 3\), is a bounded domain with Lipschitz boundary, \(f\in L^2(\varOmega )\) and \(g\in H^{3/2}(\partial \varOmega )\). We assume that the sought solution \(c^{\dag }\) is in \(L^2(\varOmega )\). This problem reduces to solving an equation of the form (1.1) if we define the nonlinear operator \(F: L^2(\varOmega )\rightarrow L^2(\varOmega )\) by \(F(c): = u(c)\), where \(u(c)\in H^2(\varOmega )\subset L^2(\varOmega )\) denotes the unique solution of (5.2). This operator \(F\) is well defined on

$$\begin{aligned} D(F): = \left\{ c\in L^2(\varOmega ) : \Vert c-\hat{c}\Vert _{ L^2(\varOmega )}\le \gamma _0 \text{ for } \text{ some } \hat{c}\ge 0 \text{ a.e. }\right\} \end{aligned}$$

for some constant \(\gamma _0>0\). It is well known that \(F\) is Fréchet differentiable; the Fréchet derivative of \(F\) and its adjoint are given by

$$\begin{aligned} F'(c)h = -A(c)^{-1}(hF(c)) \quad \text{ and } \quad F'(c)^* w = -u(c) A(c)^{-1}w \end{aligned}$$

for \(h, w\in L^2(\varOmega )\), where \(A(c): H^2(\varOmega )\cap H_0^1(\varOmega )\rightarrow L^2(\varOmega )\) is defined by \(A(c) u = -\Delta u +cu\), which is an isomorphism uniformly in the ball \(B_{\rho }(c_0) \cap D(F)\) for any \(c_0\in D(F)\) with small \(\rho >0\). It has been shown (see [4]) that for any \(\bar{c}, c\in B_{\rho }(c_0)\) there holds

$$\begin{aligned} \Vert F(\bar{c})-F(c)- F'(c)(\bar{c}-c)\Vert _{L^2(\varOmega )}\le C\Vert \bar{c}-c\Vert _{L^2(\varOmega )}\Vert F(\bar{c})-F(c)\Vert _{L^2(\varOmega )}. \end{aligned}$$

Therefore, Assumption 3.2 and the condition (3.10) hold if \(\rho >0\) is small enough.

In our numerical simulation, we consider the two-dimensional problem with \(\varOmega = [0,1]\times [0,1]\) and

$$\begin{aligned} c^{\dag }(x,y) = \left\{ \begin{array}{l@{\quad }l} 1, \quad &{} \hbox {if } (x-0.3)^2+(y-0.7)^2\le 0.2^2; \\ 0.5, &{} \hbox {if } (x,y)\in [0.6,0.8]\times [0.2,0.5]; \\ 0, &{} \hbox {elsewhere.} \end{array} \right. \end{aligned}$$

We assume \(u(c^{\dag }) = x+y\) and add noise to produce the noisy data \(u^{\delta }\) satisfying \(\Vert u^{\delta } - u(c^{\dag })\Vert _{L^2(\varOmega )} = \delta \). We take \(\delta = 0.1\times 10^{-3}\) and \(\alpha _n = 2^{-n}\). The partial differential equations involved are solved approximately by a finite difference method, dividing \(\varOmega \) into \(40\times 40\) small squares of equal size, and the involved minimization problems are solved by the modified nonlinear CG method in [21]. We take the initial guess \(c_0=0\) and \(\xi _0=0\), and terminate the iteration by the discrepancy principle (3.6) with \(\tau = 1.05\).
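To fix ideas, the following sketch shows how the forward map \(F(c)=u(c)\) and the adjoint \(F'(c)^*\) from the formulas above can be realized with the five-point finite difference discretization on a uniform \(40\times 40\) grid. Note that for \(u(c^\dag )=x+y\) the corresponding right hand side is \(f=c^\dag (x+y)\) with boundary data \(g=x+y\) on \(\partial \varOmega \). The boundary handling, variable names, and choice of sparse solver are illustrative assumptions rather than the exact code behind Fig. 2.

```python
# A sketch of the discretized forward map and adjoint for Example 5.2 (assumption:
# five-point Laplacian with homogeneous Dirichlet conditions; nonzero boundary data
# is folded into the right hand side).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 40                                   # 40 x 40 squares -> mesh size h = 1/40
h = 1.0 / n
m = n - 1                                # interior grid points per direction

def neg_laplacian(m, h):
    """Five-point discrete -Laplacian with homogeneous Dirichlet conditions."""
    main = 2.0 * np.ones(m)
    off = -1.0 * np.ones(m - 1)
    L1 = sp.diags([off, main, off], [-1, 0, 1]) / h**2
    I = sp.identity(m)
    return sp.kron(I, L1) + sp.kron(L1, I)

L = neg_laplacian(m, h)

xg = np.linspace(h, 1 - h, m)            # interior grid points
X, Y = np.meshgrid(xg, xg, indexing="ij")
c_dag = np.zeros((m, m))                 # the exact parameter c^dagger from above
c_dag[(X - 0.3)**2 + (Y - 0.7)**2 <= 0.2**2] = 1.0
c_dag[(X >= 0.6) & (X <= 0.8) & (Y >= 0.2) & (Y <= 0.5)] = 0.5

def solve_state(c, rhs):
    """Solve A(c) u = -Delta u + c u = rhs on the interior grid; this is F(c) = u(c)
    once the boundary contribution has been added to rhs."""
    A = (L + sp.diags(c.ravel())).tocsc()
    return spla.spsolve(A, rhs.ravel()).reshape(c.shape)

def Fprime_adjoint(c, u_c, w):
    """Adjoint F'(c)^* w = -u(c) * A(c)^{-1} w, cf. the formulas above
    (A(c) is self-adjoint for real-valued c)."""
    return -u_c * solve_state(c, w)
```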

Fig. 2 Reconstruction results for Example 5.2: a exact solution; b \(\varTheta (c) = \Vert c\Vert _{L^2}^2\); c, d \(\varTheta (c) = \mu \Vert c\Vert _{L^2}^2+\int _\varOmega |Dc|\) with \(\mu =0.01\) and \(\mu =1\) respectively

Figure 2a plots the exact solution \(c^{\dag }(x,y)\). Figure 2b shows the result for the method with \(\varTheta (c) = \Vert c\Vert ^2_{L^2}\). Figure 2c, d report the reconstruction results for the method with \(\varTheta (c) = \mu \Vert c\Vert _{L^2}^2+\int _\varOmega |Dc|\) for \(\mu = 0.01\) and \(\mu =1.0\) respectively; the term \(\int _\varOmega |Dc|\) is replaced by the smooth approximation \(\int _{\varOmega }\sqrt{|Dc|^2+\epsilon }\) with \(\epsilon = 10^{-6}\) during the computation. The reconstruction results in (c) and (d) significantly improve on the one in (b) by efficiently removing the notorious oscillatory effect, and they indicate that the method is robust with respect to \(\mu \). We remark that, due to the smaller value of \(\mu \), the reconstruction result in (c) is slightly better than the one in (d), as can be seen from the plots; the computational time for (c), however, is longer.
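As a final illustration, the smoothed total variation term \(\int _\varOmega \sqrt{|Dc|^2+\epsilon }\) can be evaluated on the grid, for instance, with one-sided differences as in the following sketch; this particular discretization of the gradient is an illustrative choice and not necessarily the one used for Fig. 2.

```python
# A sketch of the smoothed total variation penalty int_Omega sqrt(|Dc|^2 + eps)
# with forward differences on a uniform grid (assumption: values are replicated at
# the last row/column, so the boundary differences vanish).
import numpy as np

def smoothed_tv(c, h, eps=1e-6):
    dx = np.diff(c, axis=0, append=c[-1:, :]) / h
    dy = np.diff(c, axis=1, append=c[:, -1:]) / h
    return h**2 * np.sum(np.sqrt(dx**2 + dy**2 + eps))
```

The small \(\epsilon >0\) keeps the penalty differentiable, so gradient-based solvers such as the CG method mentioned above can be applied.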