1 Introduction

Inverse problems arise from many applications in the natural sciences. Most inverse problems are ill-posed in the sense that their solutions do not depend continuously on the data. Thus, the stable reconstruction of solutions of inverse problems requires regularization techniques [2, 8, 25].

For solving nonlinear inverse problems in Hilbert spaces, Hanke introduced in [11] his regularizing Levenberg–Marquardt scheme, a stable iterative procedure in which the Lagrange multipliers of the Levenberg–Marquardt method are updated by an adaptive strategy. To describe this method more precisely, consider nonlinear inverse problems that can be formulated in the form

$$\begin{aligned} F(x)=y, \end{aligned}$$
(1.1)

where \(F: \mathcal X\rightarrow \mathcal Y\) is a nonlinear Fréchet differentiable operator between two Hilbert spaces \(\mathcal X\) and \(\mathcal Y\) whose Fréchet derivative at x is denoted by \(F'(x)\). Assuming that only a noisy counterpart \(y^\delta \) of the data is available, the regularizing Levenberg–Marquardt scheme in [11] constructs the next iterate \(x_{n+1}\) from a current iterate \(x_n\) by first regularizing the linearized equation

$$\begin{aligned} F'(x_n) (x-x_n) = y^\delta -F(x_n) \end{aligned}$$

at \(x_n\) via the Tikhonov regularization

$$\begin{aligned} x_n(\alpha , y^\delta ) := \arg \min _{x\in \mathcal X} \left\{ \frac{1}{2} \Vert y^\delta -F(x_n) -F'(x_n) (x-x_n)\Vert ^2 + \frac{\alpha }{2} \Vert x-x_n\Vert ^2\right\} \end{aligned}$$
(1.2)

using a quadratic penalty term and then defining \(x_{n+1}:= x_n(\alpha _n, y^\delta )\), where \(\alpha _n>0\) is a parameter satisfying

$$\begin{aligned} \Vert y^\delta -F(x_n)-F'(x_n) (x_n(\alpha _n, y^\delta )-x_n)\Vert = \mu \Vert y^\delta -F(x_n)\Vert , \end{aligned}$$
(1.3)

where \(0<\mu <1\) is a preassigned number. It has been shown in [11] that this defines a regularization method as long as the iteration is terminated by the discrepancy principle. The regularizing Levenberg–Marquardt scheme has had far-reaching impact on the development of iterative regularization methods for solving nonlinear inverse problems and has stimulated considerable subsequent work; see [13, 17, 18, 27, 30] and the references therein.
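In the Hilbert space setting the minimizer of (1.2) is available in closed form, and since the linearized residual is monotonically increasing in \(\alpha \), the parameter \(\alpha _n\) solving (1.3) can be located by bisection. The following is a minimal finite-dimensional sketch of one such step; the names F, dF, y_delta and mu are illustrative and not fixed by [11], and the existence of a bracket for (1.3) is assumed (which holds as long as the discrepancy principle has not yet stopped the iteration).

```python
import numpy as np

def lm_step_hilbert(F, dF, x, y_delta, mu=0.7, n_bisect=60):
    """One regularizing Levenberg-Marquardt step (Hilbert setting, sketch).

    F: R^d -> R^m and dF(x) returns the Jacobian F'(x); all names are
    illustrative placeholders, not notation from the paper.
    """
    J = dF(x)                                # F'(x_n)
    res = y_delta - F(x)                     # y^delta - F(x_n)
    target = mu * np.linalg.norm(res)

    def lin_res(alpha):
        # minimizer of (1.2): h = (J^T J + alpha I)^{-1} J^T res
        h = np.linalg.solve(J.T @ J + alpha * np.eye(J.shape[1]), J.T @ res)
        return np.linalg.norm(res - J @ h), h

    lo = hi = 1.0
    while lin_res(lo)[0] > target:           # the residual increases with alpha,
        lo *= 0.5                            # so shrink alpha to get below the target
    while lin_res(hi)[0] < target:
        hi *= 2.0
    for _ in range(n_bisect):                # bisection for the root of (1.3)
        alpha = 0.5 * (lo + hi)
        val, h = lin_res(alpha)
        lo, hi = (alpha, hi) if val < target else (lo, alpha)
    return x + h, alpha
```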

In order to deal with situations where the sought solutions are sparse or piecewise constant and where the data are contaminated by general noise, one has to use sparsity-promoting functionals or total variation-like functionals as regularization terms and general fidelity terms to fit the data. This leads to the consideration of regularization methods in Banach spaces with general regularization terms, which has emerged as a highly active research field in recent years. The monograph [31] collects some of these research works, including variational regularization of Tikhonov type and some iterative regularization methods in Banach spaces. One may further refer to [14, 18, 20–24, 26, 28] for more recent works.

The purpose of the present paper is to extend the regularizing Levenberg–Marquardt scheme of Hanke to solve nonlinear inverse problems in Banach spaces using general convex regularization terms. Thus, we will consider nonlinear inverse problems modelled by (1.1) with F being a nonlinear operator between two Banach spaces \(\mathcal X\) and \(\mathcal Y\) whose dual spaces are denoted as \(\mathcal X^*\) and \(\mathcal Y^*\) respectively. Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous, uniformly convex function which is chosen according to the a priori information on the features of the sought solution. Using a current iterate \((x_n, \xi _n)\in \mathcal X\times \mathcal X^*\) with \(\xi _n\in \partial \Theta (x_n)\), the subdifferential of \(\Theta \) at \(x_n\), we define

$$\begin{aligned} x_n(\alpha , y^\delta ) :=\arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta -F(x_n) -F'(x_n)(x-x_n)\Vert ^r +\alpha D_{\xi _n} \Theta (x, x_n)\right\} , \end{aligned}$$
(1.4)

where \(1<r<\infty \) is a given number and \(D_{\xi _n}\Theta (x, x_n)\) denotes the Bregman distance induced by \(\Theta \) at \(x_n\) in the direction \(\xi _n\). We then define the next iterate \(x_{n+1}\) by \(x_{n+1}= x_n(\alpha _n, y^\delta )\), where \(\alpha _n>0\) is chosen such that

$$\begin{aligned} \mu _0\Vert y^\delta -F(x_n)\Vert \le \Vert y^\delta -F(x_n)-F'(x_n) (x_n(\alpha _n, y^\delta )-x_n)\Vert \le \mu _1 \Vert y^\delta -F(x_n)\Vert \end{aligned}$$

with preassigned numbers \(0<\mu _0\le \mu _1<1\). This choice of \(\alpha _n\) is a relaxation of (1.3) and offers more flexibility when the root of (1.3) is difficult to determine exactly. We also need to update the subgradient \(\xi _n\) to \(\xi _{n+1}\in \partial \Theta (x_{n+1})\), which, according to the minimization property of \(x_{n+1}\), can be taken as

$$\begin{aligned} \xi _{n+1} = \xi _n + \frac{1}{\alpha _n} F'(x_n)^* J_r^{\mathcal Y} (y^\delta -F(x_n) -F'(x_n)(x_{n+1}-x_n)), \end{aligned}$$

where \(F'(x_n)^*: \mathcal Y^*\rightarrow \mathcal X^*\) denotes the adjoint of \(F'(x_n)\) and \(J_r^{\mathcal Y}: \mathcal Y\rightarrow \mathcal Y^*\) denotes the duality mapping on \(\mathcal Y\) with gauge function \(t\rightarrow t^{r-1}\). We repeat the above procedure until a discrepancy principle is satisfied. The precise description of the above extension will be given in Algorithm 1. The main result of this paper shows that the above extension of the Levenberg–Marquardt scheme is well-defined and exhibits the regularization property when used to solve ill-posed problems.

The introduction of a general convex function \(\Theta \) into the algorithm presents many challenging issues in its convergence analysis. Unlike (1.2), whose minimizer can be determined explicitly, the minimizer of (1.4) does not have a closed form. The possible non-smoothness of \(\Theta \) and the non-Hilbertian structure of \(\mathcal X\) and \(\mathcal Y\) prevent us from using the classical techniques. Instead we have to utilize tools from convex optimization and non-smooth analysis, including the subdifferential calculus and the Bregman distance. The convergence analysis becomes even more subtle when considering the regularization property. The main obstacle comes from the stability issue: an iterative sequence constructed using noisy data can split into many possible noise-free iterative sequences as the noise level tends to zero, due to the non-unique determination of \(\alpha _n\). We overcome this difficulty by borrowing an idea from [10], based on a diagonal sequence argument, to show that all these noise-free sequences actually converge uniformly in a certain sense. On the other hand, unlike the variational regularization of Tikhonov type [31] and the non-stationary iterated Tikhonov regularization [22], whose numerical implementation requires solving several non-convex minimization problems, our method involves only convex minimization problems and therefore has the advantage of being implemented efficiently by convex optimization techniques.

Under an a priori choice of \(\{\alpha _n\}\), a Levenberg–Marquardt method was considered in [1] to solve (1.1) with \(\mathcal X= L^2(\Omega )\) and \(\mathcal Y\) a Hilbert space using the convex penalty function \(\Theta (x)= a\Vert x\Vert _{L^2}^2 + \int _\Omega |Dx|\), where \(\int _\Omega |Dx|\) denotes the total variation of x. The analysis in [1], however, is somewhat preliminary since it provides only weak convergence along subsequences. In contrast, our method chooses \(\{\alpha _n\}\) adaptively, and the whole iterative sequence converges to a solution of (1.1) strongly.

In recent years the iteratively regularized Gauss–Newton method has been extended to solve nonlinear inverse problems in Banach spaces [21, 24, 26], where the iterates \(\{x_n\}\) are defined by

$$\begin{aligned} x_{n+1} = \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta - F(x_n)-F'(x_n)(x-x_n)\Vert ^r + \alpha _n D_{\xi _0} \Theta (x, x_0)\right\} \end{aligned}$$

This looks similar to (1.4), but the two methods are essentially different. The iteratively regularized Gauss–Newton method always defines \(x_{n+1}\) in a neighborhood of the initial guess \(x_0\), while the Levenberg–Marquardt method defines \(x_{n+1}\) in a region around \(x_n\) for each \(n\ge 0\). From the optimization point of view, the Levenberg–Marquardt method is more favorable.

The rest of this paper is organized as follows. In Sect. 2 we collect some basic results from convex analysis in Banach spaces and prove a continuous dependence result of minimizers of uniformly convex functionals on various parameters. We then present in Sect. 3 the Levenberg–Marquardt method in Banach spaces with general convex regularization terms and show that the method is well-defined. The detailed convergence analysis is given in Sect. 4. Finally in Sect. 5 we report various numerical results to indicate the performance of the method.

2 Preliminaries

Let \(\mathcal X\) and \(\mathcal Y\) be two Banach spaces whose norms are denoted by \(\Vert \cdot \Vert \). We use \(\mathcal X^*\) and \(\mathcal Y^*\) to denote their dual spaces respectively. Given \(x\in \mathcal X\) and \(x^*\in \mathcal X^*\) we write \(\langle x^*, x\rangle =x^*(x)\) for the duality pairing. We use “\(\rightarrow \)” and “\(\rightharpoonup \)” to denote the strong convergence and the weak convergence respectively. By \({\mathscr {L}}(\mathcal X, \mathcal Y)\) we denote the space of all continuous linear operators from \(\mathcal X\) to \(\mathcal Y\). For any \(A\in {\mathscr {L}}(\mathcal X, \mathcal Y)\), we use \(A^*: \mathcal Y^*\rightarrow \mathcal X^*\) to denote its adjoint, i.e. \(\langle A^* y^*, x\rangle =\langle y^*, A x\rangle \) for any \(x\in \mathcal X\) and \(y^*\in \mathcal Y^*\). We use \({\mathscr {N}}(A)=\{x\in \mathcal X: A x=0\}\) to denote the null space of A and define

$$\begin{aligned} {\mathscr {N}}(A)^\perp := \{ \xi \in \mathcal X^*: \langle \xi , x\rangle =0 \text{ for } \text{ all } x\in {\mathscr {N}}(A)\}. \end{aligned}$$

When \(\mathcal X\) is reflexive, there holds \({\mathscr {N}} (A)^\perp =\overline{{\mathscr {R}}(A^*)}\), where \({\mathscr {R}}(A^*)\) denotes the range space of \(A^*\) and \(\overline{{\mathscr {R}}(A^*)}\) denotes the closure of \({\mathscr {R}}(A^*)\) in \(\mathcal X^*\).

For a convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\), we use \({\mathscr {D}}(\Theta ):=\{x\in \mathcal X: \Theta (x)<+\infty \}\) to denote its effective domain. We call \(\Theta \) proper if \({\mathscr {D}}(\Theta )\ne \emptyset \). Given \(x\in \mathcal X\) we define

$$\begin{aligned} \partial \Theta (x):=\{\xi \in \mathcal X^*: \Theta (\bar{x})-\Theta (x)-\langle \xi , \bar{x}-x\rangle \ge 0 \text{ for } \text{ all } \bar{x}\in \mathcal X\}. \end{aligned}$$

Any element \(\xi \in \partial \Theta (x)\) is called a subgradient of \(\Theta \) at x. The multi-valued mapping \(\partial \Theta : \mathcal X\rightarrow 2^{\mathcal X^*}\) is called the subdifferential of \(\Theta \). It could happen that \(\partial \Theta (x)=\emptyset \) for some \(x\in {\mathscr {D}}(\Theta )\). Let

$$\begin{aligned} {\mathscr {D}}(\partial \Theta ):=\{x\in {\mathscr {D}}(\Theta ): \partial \Theta (x)\ne \emptyset \}. \end{aligned}$$

For \(x \in {\mathscr {D}}(\partial \Theta )\) and \(\xi \in \partial \Theta (x)\) we define

$$\begin{aligned} D_\xi \Theta (\bar{x},x):=\Theta (\bar{x})-\Theta (x)-\langle \xi , \bar{x}-x\rangle , \quad \forall \bar{x}\in \mathcal X\end{aligned}$$

which is called the Bregman distance induced by \(\Theta \) at x in the direction \(\xi \). Clearly \(D_\xi \Theta (\bar{x},x)\ge 0\). By straightforward calculation one can see that

$$\begin{aligned} D_{\xi _2} \Theta (x,x_2)-D_{\xi _1} \Theta (x, x_1) = -D_{\xi _1} \Theta (x_2,x_1) +\langle \xi _2-\xi _1, x_2-x\rangle \end{aligned}$$
(2.1)

for all \(x_1, x_2\in {\mathscr {D}}(\partial \Theta )\), \(\xi _1\in \partial \Theta (x_1)\), \(\xi _2\in \partial \Theta (x_2)\) and \(x\in \mathcal X\).
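For instance, if \(\mathcal X\) is a Hilbert space and \(\Theta (x)=\Vert x\Vert ^2/2\), then \(\partial \Theta (x)=\{x\}\) and

$$\begin{aligned} D_x \Theta (\bar{x}, x)=\frac{1}{2}\Vert \bar{x}\Vert ^2-\frac{1}{2}\Vert x\Vert ^2 -\langle x, \bar{x}-x\rangle = \frac{1}{2} \Vert \bar{x}-x\Vert ^2, \end{aligned}$$

so the Bregman distance penalty in (1.4) reduces to the quadratic penalty appearing in (1.2).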

The Bregman distance can be used to obtain information in terms of the norm of the Banach space when \(\Theta \) has stronger convexity. A proper convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) is called uniformly convex if there is a strictly increasing function \(\varphi :[0, \infty ) \rightarrow [0, \infty )\) with \(\varphi (0)=0\) such that

$$\begin{aligned} \Theta (\lambda \bar{x} +(1-\lambda ) x) +\lambda (1-\lambda ) \varphi (\Vert \bar{x}-x\Vert ) \le \lambda \Theta (\bar{x}) +(1-\lambda ) \Theta (x) \end{aligned}$$
(2.2)

for all \(\bar{x}, x\in \mathcal X\) and \(\lambda \in [0,1]\). It is easily seen that if \(\Theta \) is uniformly convex in the sense of (2.2) then

$$\begin{aligned} D_\xi \Theta (\bar{x},x) \ge \varphi (\Vert \bar{x}-x\Vert ) \end{aligned}$$
(2.3)

for all \(\bar{x} \in \mathcal X\), \(x\in {\mathscr {D}}(\partial \Theta )\) and \(\xi \in \partial \Theta (x)\). Moreover, it follows from [32, Proposition 3.5.8] that any proper, weakly lower semi-continuous, uniformly convex function \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) is coercive in the sense that

$$\begin{aligned} \liminf _{\Vert x\Vert \rightarrow \infty } \frac{\Theta (x)}{\Vert x\Vert ^2} >0. \end{aligned}$$
(2.4)
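As an illustration, on a Hilbert space \(\mathcal X\) the function \(\Theta (x)=\Vert x\Vert ^2/2\) satisfies (2.2) with \(\varphi (t)=t^2/2\), since

$$\begin{aligned} \lambda \Theta (\bar{x}) +(1-\lambda ) \Theta (x) -\Theta (\lambda \bar{x} +(1-\lambda ) x) = \frac{1}{2}\lambda (1-\lambda ) \Vert \bar{x}-x\Vert ^2. \end{aligned}$$

Moreover, adding a convex function to a uniformly convex one preserves (2.2) with the same \(\varphi \); in particular, sparsity-promoting choices such as \(\Theta (x)=\frac{1}{2}\Vert x\Vert _{L^2}^2 + \beta \Vert x\Vert _{L^1}\) with \(\beta >0\) are uniformly convex.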

On a Banach space \(\mathcal X\), we consider for \(1<r<\infty \) the convex function \(x\rightarrow \Vert x\Vert ^r/r\). Its subdifferential at x is given by

$$\begin{aligned} J_r^{\mathcal X}(x):=\{\xi \in \mathcal X^*: \Vert \xi \Vert =\Vert x\Vert ^{r-1} \text{ and } \langle \xi , x\rangle =\Vert x\Vert ^r\} \end{aligned}$$

which gives the duality mapping \(J_r^{\mathcal X}: \mathcal X\rightarrow 2^{\mathcal X^*}\) with gauge function \(t\rightarrow t^{r-1}\). We call \(\mathcal X\) uniformly convex if its modulus of convexity

$$\begin{aligned} \delta _{\mathcal X}(t) := \inf \{2 -\Vert \bar{x} +x\Vert : \Vert \bar{x}\Vert =\Vert x\Vert =1, \quad \Vert \bar{x}-x\Vert \ge t\} \end{aligned}$$

satisfies \(\delta _{\mathcal X}(t)>0\) for all \(0 < t\le 2\). We call \(\mathcal X\) uniformly smooth if its modulus of smoothness

$$\begin{aligned} \rho _{\mathcal X}(s) := \sup \{\Vert \bar{x}+ x\Vert +\Vert \bar{x}-x\Vert - 2 : \Vert \bar{x}\Vert = 1, \quad \Vert x\Vert \le s\} \end{aligned}$$

satisfies \(\lim _{s\searrow 0} \frac{\rho _{\mathcal X}(s)}{s} =0\). It is well known [6] that any uniformly convex or uniformly smooth Banach space is reflexive. On a uniformly smooth Banach space \(\mathcal X\), every duality mapping \(J_r^{\mathcal X}\) with \(1<r<\infty \) is single valued and uniformly continuous on bounded sets. Furthermore, on a uniformly convex Banach space, any sequence \(\{x_n\}\) satisfying \(x_n \rightharpoonup x\) and \(\Vert x_n\Vert \rightarrow \Vert x\Vert \) must satisfy \(x_n\rightarrow x\) as \(n\rightarrow \infty \). This property can be generalized to uniformly convex functions, as stated in the following result.
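As a concrete illustration, on \(\mathcal X=\mathbb R^d\) equipped with the s-norm, \(1<s<\infty \), the duality mapping with gauge function \(t\rightarrow t^{r-1}\) is single valued and is given componentwise by \(\big (J_r^{\mathcal X}(x)\big )_i = \Vert x\Vert _s^{r-s}\, |x_i|^{s-1} \mathrm{sign}(x_i)\). The following small numerical check of the two defining properties is a sketch with illustrative names only.

```python
import numpy as np

def duality_map(x, s=1.5, r=2.0):
    """Duality mapping J_r on (R^d, ||.||_s) with gauge t -> t^(r-1) (sketch).

    For x != 0:  J_r(x) = ||x||_s^(r-s) * (|x_i|^(s-1) * sign(x_i))_i.
    """
    norm_s = np.linalg.norm(x, ord=s)
    if norm_s == 0.0:
        return np.zeros_like(x)
    return norm_s ** (r - s) * np.abs(x) ** (s - 1) * np.sign(x)

# verify ||xi||_{s'} = ||x||_s^(r-1) and <xi, x> = ||x||_s^r, with 1/s + 1/s' = 1
rng = np.random.default_rng(0)
x, s, r = rng.standard_normal(5), 1.5, 2.0
xi = duality_map(x, s, r)
s_dual = s / (s - 1.0)
print(np.allclose(np.linalg.norm(xi, ord=s_dual), np.linalg.norm(x, ord=s) ** (r - 1)))
print(np.allclose(xi @ x, np.linalg.norm(x, ord=s) ** r))
```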

Lemma 2.1

[22] Let \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous, and uniformly convex function. Then \(\Theta \) admits the Kadec property, i.e. for any sequence \(\{x_n\}\subset \mathcal X\) satisfying \(x_n \rightharpoonup x\in \mathcal X\) and \(\Theta (x_n) \rightarrow \Theta (x)<\infty \) there holds \(x_n\rightarrow x\) as \(n\rightarrow \infty \).

We conclude this section by providing a result on the continuous dependence of minimizers of uniformly convex cost functionals on various parameters. This result is crucial in the forthcoming sections.

Lemma 2.2

Let \(\mathcal X\) and \(\mathcal Y\) be Banach spaces with \(\mathcal X\) being reflexive. Let \(\{A(x): \mathcal X\rightarrow \mathcal Y\}_{x\in {\mathcal D}}\subset {\mathscr {L}}(\mathcal X, \mathcal Y)\) be such that \(x\rightarrow A(x)\) is continuous on \({\mathcal D}\subset \mathcal X\). Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous, uniformly convex function. Assume that the sequences \(\{\alpha ^{(\ell )}\}\subset (0, \infty )\), \(\{b^{(\ell )}\}\subset \mathcal Y\), \(\{x^{(\ell )}\}\subset {\mathcal D}\) and \(\{\xi ^{(\ell )}\}\subset \mathcal X^*\) with \(\xi ^{(\ell )}\in \partial \Theta (x^{(\ell )})\) satisfy

$$\begin{aligned} \alpha ^{(\ell )}\rightarrow \bar{\alpha }, \quad b^{(\ell )}\rightarrow \bar{b}, \quad x^{(\ell )}\rightarrow \bar{x}, \quad \xi ^{(\ell )}\rightarrow \bar{\xi }\quad \text{ as } \quad \ell \rightarrow \infty \end{aligned}$$
(2.5)

for some \(\bar{\alpha }>0\), \(\bar{b} \in \mathcal Y\), \(\bar{x} \in {\mathcal D}\) and \(\bar{\xi }\in \mathcal X^*\) with \(\bar{\xi }\in \partial \Theta (\bar{x})\). For \(1\le r<\infty \) let

$$\begin{aligned} z^{(\ell )} := \arg \min _{z\in \mathcal X} \left\{ \frac{1}{r} \Vert b^{(\ell )}- A(x^{(\ell )}) z\Vert ^r + \alpha ^{(\ell )} D_{\xi ^{(\ell )}} \Theta (z, x^{(\ell )})\right\} . \end{aligned}$$

Then \(z^{(\ell )} \rightarrow \bar{z}\) and \(\Theta (z^{(\ell )}) \rightarrow \Theta (\bar{z})\) as \(\ell \rightarrow \infty \), where

$$\begin{aligned} \bar{z} :=\arg \min _{z\in \mathcal X} \left\{ \frac{1}{r} \Vert \bar{b}- A(\bar{x}) z\Vert ^r + \bar{\alpha }D_{\bar{\xi }} \Theta (z, \bar{x})\right\} . \end{aligned}$$

Proof

It is easy to see that \(z^{(\ell )}\) and \(\bar{z}\) are uniquely defined since the corresponding cost functionals are weakly lower semi-continuous and uniformly convex and hence coercive, see (2.4). Because \(\xi ^{(\ell )}\in \partial \Theta (x^{(\ell )})\) and \(\bar{\xi }\in \partial \Theta (\bar{x})\), we may use the condition (2.5) to obtain

$$\begin{aligned} \liminf _{\ell \rightarrow \infty } \Theta (x^{(\ell )})&\ge \liminf _{\ell \rightarrow \infty } (\Theta (\bar{x})+\langle \bar{\xi }, x^{(\ell )}-\bar{x}\rangle ) = \Theta (\bar{x}),\\ \limsup _{\ell \rightarrow \infty } \Theta (x^{(\ell )})&\le \limsup _{\ell \rightarrow \infty } (\Theta (\bar{x}) -\langle \xi ^{(\ell )}, \bar{x}-x^{(\ell )}\rangle ) = \Theta (\bar{x}). \end{aligned}$$

Therefore

$$\begin{aligned} \lim _{\ell \rightarrow \infty } \Theta (x^{(\ell )}) = \Theta (\bar{x}). \end{aligned}$$
(2.6)

To show \(z^{(\ell )} \rightarrow \bar{z}\) we will adapt the argument in [9, 16]. By the definition of \(z^{(\ell )}\) we have

$$\begin{aligned} \frac{1}{r} \Vert b^{(\ell )}- A(x^{(\ell )}) z^{(\ell )}\Vert ^r + \alpha ^{(\ell )} D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )}) \le \frac{1}{r} \Vert b^{(\ell )} - A(x^{(\ell )}) x^{(\ell )}\Vert ^r. \end{aligned}$$

By the given conditions, the right hand side is bounded and thus \(\{D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )})\}\) is bounded. Consequently, \(\{z^{(\ell )}\}\) is bounded in \(\mathcal X\) by the uniform convexity of \(\Theta \). Since \(\mathcal X\) is reflexive, \(\{z^{(\ell )}\}\) has a subsequence, denoted by the same notation, such that \(z^{(\ell )} \rightharpoonup z_*\) as \(\ell \rightarrow \infty \) for some \(z_*\in \mathcal X\). Since \(x\rightarrow A(x)\) is continuous and \(A(\bar{x})\in {\mathscr {L}}(\mathcal X, \mathcal Y)\), we have \(A(x^{(\ell )}) z^{(\ell )} \rightharpoonup A(\bar{x}) z_*\) as \(\ell \rightarrow \infty \). By using (2.6) and the weak lower semi-continuity of \(\Theta \) and the Banach space norm, it follows that

$$\begin{aligned} \Vert \bar{b}- A(\bar{x}) z_*\Vert\le & {} \liminf _{\ell \rightarrow \infty } \Vert b^{(\ell )} -A(x^{(\ell )}) z^{(\ell )}\Vert ,\end{aligned}$$
(2.7)
$$\begin{aligned} D_{\bar{\xi }} \Theta (z_*, \bar{x})\le & {} \liminf _{\ell \rightarrow \infty } D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )}). \end{aligned}$$
(2.8)

Consequently

$$\begin{aligned}&\frac{1}{r} \Vert {\bar{b}}- A(\bar{x}) z_*\Vert ^r +{\bar{\alpha }} D_{\bar{\xi }} \Theta (z_*, \bar{x})\\&\quad \le \frac{1}{r} \liminf _{\ell \rightarrow \infty } \Vert b^{(\ell )}-A(x^{(\ell )}) z^{(\ell )}\Vert ^r + \liminf _{\ell \rightarrow \infty } \alpha ^{(\ell )} D_{\xi ^{(\ell )}} \Theta (z^{(\ell )}, x^{(\ell )}) \\&\quad \le \liminf _{\ell \rightarrow \infty } \left( \frac{1}{r} \Vert b^{(\ell )}-A(x^{(\ell )}) z^{(\ell )}\Vert ^r +\alpha ^{(\ell )} D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )})\right) \\&\quad \le \limsup _{\ell \rightarrow \infty } \left( \frac{1}{r} \Vert b^{(\ell )}-A(x^{(\ell )}) z^{(\ell )}\Vert ^r +\alpha ^{(\ell )} D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )})\right) \\&\quad \le \limsup _{\ell \rightarrow \infty } \left( \frac{1}{r} \Vert b^{(\ell )}-A(x^{(\ell )}) {\bar{z}}\Vert ^r +\alpha ^{(\ell )} D_{\xi ^{(\ell )}}\Theta ({\bar{z}}, x^{(\ell )})\right) \\&\quad = \frac{1}{r} \Vert {\bar{b}}-A(\bar{x}) \bar{z}\Vert ^r +{\bar{\alpha }} D_{\bar{\xi }}\Theta ({\bar{z}}, {\bar{x}}), \end{aligned}$$

where, for the last inequality we used the minimality of \(z^{(\ell )}\) and for the last equality we used (2.5) and (2.6). By using the minimality and uniqueness of \(\bar{z}\), we can conclude \(z_*=\bar{z}\) and

$$\begin{aligned}&\lim _{\ell \rightarrow \infty } \left( \frac{1}{r} \Vert b^{(\ell )}-A(x^{(\ell )}) z^{(\ell )}\Vert ^r +\alpha ^{(\ell )} D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )})\right) \nonumber \\&\quad =\frac{1}{r} \Vert \bar{b}-A(\bar{x}) \bar{z}\Vert ^r +\bar{\alpha }D_{\bar{\xi }}\Theta (\bar{z}, \bar{x}). \end{aligned}$$
(2.9)

Next we show that

$$\begin{aligned} \lim _{\ell \rightarrow \infty } D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )}) = D_{\bar{\xi }} \Theta (\bar{z}, \bar{x}). \end{aligned}$$
(2.10)

According to (2.8) and \(z_*=\bar{z}\), it suffices to show that

$$\begin{aligned} \gamma _0:= \limsup _{\ell \rightarrow \infty } D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )}) \le D_{\bar{\xi }} \Theta (\bar{z}, \bar{x})=:\gamma _1. \end{aligned}$$

Suppose this is not true, i.e. \(\gamma _0>\gamma _1\). By taking a subsequence if necessary, we may assume that

$$\begin{aligned} \gamma _0 = \lim _{\ell \rightarrow \infty } D_{\xi ^{(\ell )}} \Theta (z^{(\ell )}, x^{(\ell )}). \end{aligned}$$

Then from (2.9) it follows that

$$\begin{aligned} \lim _{\ell \rightarrow \infty } \frac{1}{r} \Vert b^{(\ell )}-A(x^{(\ell )}) z^{(\ell )}\Vert ^r = \frac{1}{r} \Vert \bar{b}-A(\bar{x}) \bar{z}\Vert ^r + \bar{\alpha }(\gamma _1-\gamma _0) < \frac{1}{r} \Vert \bar{b}-A(\bar{x}) \bar{z}\Vert ^r \end{aligned}$$

which is a contradiction to (2.7). We thus obtain (2.10).

In view of (2.10), (2.5) and \(z^{(\ell )} \rightharpoonup \bar{z}\) as \(\ell \rightarrow \infty \) we have

$$\begin{aligned} \lim _{\ell \rightarrow \infty } (\Theta (z^{(\ell )})-\Theta (x^{(\ell )}))= \Theta (\bar{z}) -\Theta (\bar{x}). \end{aligned}$$

This together with (2.6) implies that \(\lim _{\ell \rightarrow \infty } \Theta (z^{(\ell )}) =\Theta (\bar{z})\). Since \(z^{(\ell )} \rightharpoonup \bar{z}\), we may use Lemma 2.1 to conclude that \(z^{(\ell )}\rightarrow \bar{z}\) as \(\ell \rightarrow \infty \). \(\square \)

3 The method

We consider equation (1.1) arising from nonlinear inverse problems, where \(F: {\mathscr {D}}(F)\subset \mathcal X\rightarrow \mathcal Y\) is a nonlinear operator between two Banach spaces \(\mathcal X\) and \(\mathcal Y\) with domain \({\mathscr {D}}(F)\). We will assume that (1.1) has a solution. In general (1.1) may have many solutions. In order to find the desired one, some selection criterion should be enforced. According to the a priori information on the sought solution, we choose a proper, weakly lower semi-continuous, uniformly convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\). By taking \(x_0\in {\mathscr {D}}(\Theta ) \cap {\mathscr {D}}(F)\) and \(\xi _0\in \partial \Theta (x_0)\) as the initial guess, we define \(x^\dag \) to be the solution of (1.1) with the property

$$\begin{aligned} D_{\xi _0}\Theta (x^\dag , x_0) = \min _{x\in {\mathscr {D}}(\Theta )\cap {\mathscr {D}}(F)} \{D_{\xi _0}\Theta (x, x_0): F(x) =y\}. \end{aligned}$$
(3.1)

We are interested in developing algorithms to find the solution \(x^\dag \) of (1.1). We will work under the following standard conditions.

Assumption 1

   

  1. (a)

    \(\mathcal X\) is a reflexive Banach space and \(\mathcal Y\) is a uniformly smooth Banach space;

  2. (b)

    F is weakly closed on \({\mathscr {D}}(F)\), i.e. for any sequence \(\{x_n\} \subset {\mathscr {D}}(F)\) satisfying \(x_n\rightharpoonup x\in \mathcal X\) and \(F(x_n)\rightharpoonup v \in \mathcal Y\) there hold \(x\in {\mathscr {D}}(F)\) and \(F(x) = v\);

  3. (c)

    There is \(\rho >0\) such that \(B_{2\rho }(x_0)\subset {\mathscr {D}}(F)\) and (1.1) has a solution in \(B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\), where \(B_{\rho }(x_0): = \{x\in \mathcal X:\ \Vert x-x_0\Vert \le \rho \}\);

  4. (d)

    There exists \(\{L(x): \mathcal X\rightarrow \mathcal Y\}_{x\in B_{2\rho }(x_0)}\subset {\mathscr {L}}(\mathcal X, \mathcal Y)\) such that \(x\rightarrow L(x)\) is continuous on \(B_{2\rho }(x_0)\) and there is \(0\le \eta <1\) such that

    $$\begin{aligned} \Vert F(\bar{x})-F(x)-L(x)(\bar{x}-x)\Vert \le \eta \Vert F(\bar{x})-F(x)\Vert \end{aligned}$$

    for all \(\bar{x}, x\in B_{2\rho }(x_0)\).

In Assumption 1, the uniform smoothness of \(\mathcal Y\) in (a) is used to guarantee that the duality mapping \(J_r^{\mathcal Y}\) is single-valued and continuous for each \(1<r<\infty \). We do not require F to be Fréchet differentiable; in case F is Fréchet differentiable, we may take \(L(x) =F'(x)\), where \(F'(x)\) denotes the Fréchet derivative of F at x. The condition in (d) is the so-called tangential cone condition, which has been widely used in the analysis of regularization methods for nonlinear inverse problems and has been verified for several important applications [11–13, 20, 22–24, 26, 30]. How to replace the tangential cone condition by a weaker condition is a challenging issue. Under certain smoothness conditions on the solution, a class of Newton methods in Hilbert spaces has been shown in [19] to be order optimal under merely the Lipschitz condition on the Fréchet derivative of F when the methods are terminated by a discrepancy principle. How to extend such a result to the Banach space setting remains an open problem. In view of (d) in Assumption 1, it is easily seen that

$$\begin{aligned} \Vert F(\bar{x})-F(x)\Vert \le \frac{1}{1-\eta } \Vert L(x) (\bar{x}-x)\Vert , \quad \bar{x}, x\in B_{2\rho }(x_0) \end{aligned}$$

which shows that \(x\rightarrow F(x)\) is continuous on \(B_{2\rho }(x_0)\). When \(\mathcal X\) is a reflexive Banach space, by using the uniform convexity of \(\Theta \) and weak closedness of F, it is standard to show that \(x^\dag \) exists. Moreover, [20, Lemma 3.2] gives the following local uniqueness result.

Lemma 3.1

Let Assumption 1 hold. If \(x^\dag \in B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\), then \(x^\dag \) is the unique solution of (1.1) in \(B_{2\rho }(x_0) \cap {\mathscr {D}}(\Theta )\) satisfying (3.1).

In practical applications, instead of y we only have noisy data \(y^\delta \) satisfying

$$\begin{aligned} \Vert y^\delta -y\Vert \le \delta \end{aligned}$$
(3.2)

with a small noise level \(\delta >0\). We will use \(y^\delta \) to construct an approximate solution to (1.1). To formulate our Levenberg–Marquardt algorithm, assuming that \(x_n^\delta \) has been constructed, we consider the linearized equation

$$\begin{aligned} L(x_n^\delta ) (x-x_n^\delta ) = y^\delta -F(x_n^\delta ) \end{aligned}$$

and apply to it the Tikhonov regularization whose regularizing term is the Bregman distance induced by \(\Theta \) at \(x_n^\delta \). The next iterate \(x_{n+1}^\delta \) is then constructed by choosing the regularization parameter adaptively. This leads to the following Levenberg–Marquardt algorithm.

Algorithm 1

(Levenberg–Marquardt method with noisy data)      

  1. 1.

    Take \(x_0\in \mathcal X\) and \(\xi _0\in \mathcal X^*\) such that \(\xi _0 \in \partial \Theta (x_0)\). Pick \(0<\mu _0\le \mu _1<1\) and \(\tau >1\).

  2. 2.

Let \(x_0^\delta :=x_0\) and \(\xi _0^\delta := \xi _0\). Assuming that \(x_n^\delta \) and \(\xi _n^\delta \) are well-defined, we define \(x_{n+1}^\delta \) and \(\xi _{n+1}^\delta \) as follows:

    1. (a)

      For each \(\alpha >0\) we define \(x_n(\alpha , y^\delta )\) and \(\xi _n(\alpha , y^\delta )\) as

      $$\begin{aligned} x_n(\alpha , y^\delta )= & {} \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x-x_n^\delta )\Vert ^r+ \alpha D_{\xi _n^\delta }\Theta (x,x_n^\delta )\right\} ,\\ \xi _n(\alpha , y^\delta )= & {} \xi _n^\delta +\frac{1}{\alpha } L(x_n^\delta )^* J_r^{\mathcal Y}\left( y^\delta -F(x_n^\delta )-L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\right) ; \end{aligned}$$
    2. (b)

      Take \(\alpha _n(y^\delta ) >0\) to be a number \(\alpha \) such that

      $$\begin{aligned} \mu _0 \Vert y^\delta - F(x_n^\delta )\Vert \le \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\Vert \le \mu _1 \Vert y^\delta -F(x_n^\delta )\Vert ; \end{aligned}$$
    3. (c)

      Define \(x_{n+1}^\delta :=x_n(\alpha _n(y^\delta ), y^\delta )\) and \(\xi _{n+1}^\delta := \xi _n(\alpha _n(y^\delta ), y^\delta )\).

  3. 3.

    Let \(n_\delta \) be the first integer such that \(\Vert y^\delta -F(x_{n_\delta }^\delta )\Vert \le \tau \delta \) and use \(x_{n_\delta }^\delta \) as an approximate solution.

When \(\mathcal X\) and \(\mathcal Y\) are Hilbert spaces, \(r=2\) and \(\Theta (x) = \Vert x\Vert ^2/2\), Algorithm 1 reduces to the regularizing Levenberg–Marquardt scheme in [11] and each minimizer \(x_n(\alpha , y^\delta )\) can be written explicitly. In the setting of Algorithm 1 in Banach spaces with general convex regularization terms, \(x_n(\alpha , y^\delta )\) does not have an explicit formula. This increases the difficulty of the convergence analysis. By making use of tools from convex analysis, in this section we will show that Algorithm 1 is well-defined, and in Sect. 4 we will show that \(x_{n_\delta }^\delta \) indeed converges to a solution of (1.1) as \(\delta \rightarrow 0\).
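To illustrate how the minimization in step 2(a) can nevertheless be handled by convex optimization, consider the model setting \(\mathcal X=\mathbb R^d\) and \(\mathcal Y=\mathbb R^m\) with Euclidean norms, \(r=2\) and \(\Theta (x)=\frac{1}{2}\Vert x\Vert _2^2+\beta \Vert x\Vert _1\). The subproblem then reduces, up to constants, to an \(\ell ^1\)-regularized least-squares problem and can be solved, for example, by proximal gradient (ISTA) iterations. The following is a minimal sketch under these assumptions; the names lm_subproblem, A, b and beta are illustrative and not part of Algorithm 1.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lm_subproblem(A, b, x_n, xi_n, alpha, beta=1.0, iters=500):
    """Approximate the minimizer x_n(alpha, y^delta) of step 2(a) by ISTA.

    Model setting only: X = R^d, Y = R^m, r = 2 and
    Theta(x) = 0.5*||x||_2^2 + beta*||x||_1.  Here A = L(x_n^delta) and
    b = y^delta - F(x_n^delta) + A x_n^delta, so the data-fit term is
    0.5*||b - A x||^2.  All names are illustrative.
    """
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + alpha)    # 1/Lipschitz constant
    x = x_n.copy()
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + alpha * (x - xi_n)   # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * alpha * beta)
    # subgradient update of step 2(a); J_2 is the identity on a Hilbert space.
    # At an exact minimizer this xi lies in the subdifferential of Theta at x.
    xi = xi_n + A.T @ (b - A @ x) / alpha
    return x, xi
```

With such a sub-problem solver at hand, step 2(b) reduces to a one-dimensional search over \(\alpha \); a sketch of that search is given after Lemma 3.2 below.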

In Algorithm 1, we need to pick \(\xi _0\in \mathcal X^*\) and \(x_0\in \mathcal X\) such that \(\xi _0 \in \partial \Theta (x_0)\). This can be achieved as follows: pick \(\xi _0\in \mathcal X^*\) and define

$$\begin{aligned} x_0 =\arg \min _{x\in \mathcal X} \left\{ \Theta (x)-\langle \xi _0, x\rangle \right\} , \end{aligned}$$

then \(\xi _0\in \partial \Theta (x_0)\) holds automatically; in applications we usually have \(\Theta \ge 0\) and \(\Theta (0)=0\), in which case we can simply take \(x_0=0\) and \(\xi _0=0\).

From the definition of \(x_n(\alpha , y^\delta )\) in Algorithm 1 we can see that

$$\begin{aligned} 0\in -L(x_n^\delta )^* J_r^{\mathcal Y}\left( y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_n(\alpha , y^\delta )-x_n^\delta )\right) +\alpha \left( \partial \Theta (x_n(\alpha , y^\delta ))-\xi _n^\delta \right) . \end{aligned}$$

The definition of \(\xi _n(\alpha , y^\delta )\) is exactly motivated by this fact so that

$$\begin{aligned} \xi _n(\alpha , y^\delta ) \in \partial \Theta (x_n(\alpha , y^\delta )) \quad \text{ for } \text{ all } \alpha >0. \end{aligned}$$

Moreover, by the minimality of \(x_n(\alpha , y^\delta )\), we always have

$$\begin{aligned} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\Vert \le \Vert y^\delta -F(x_n^\delta )\Vert , \quad \forall \alpha >0. \end{aligned}$$
(3.3)

In order to prove that Algorithm 1 is well-defined, we need to show that the number \(\alpha _n(y^\delta )\) used to define \(x_{n+1}^\delta \) from \(x_n^\delta \) exists, each \(x_n^\delta \) is in \({\mathscr {D}}(F)\), and the iteration terminates after \(n_\delta <\infty \) steps. We achieve these via a series of results.

Lemma 3.2

Let \(\mathcal X\) be reflexive and let \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) be proper, weakly lower semi-continuous and uniformly convex. Then, for each \(\alpha >0\), \(x_n(\alpha , y^\delta )\) is uniquely determined. Moreover, the mapping \(\alpha \rightarrow x_n(\alpha , y^\delta )\) is continuous over \((0, \infty )\), and the function

$$\begin{aligned} \alpha \rightarrow \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_n(\alpha , y^\delta )-x_n^\delta )\Vert \end{aligned}$$
(3.4)

is continuous and monotonically increasing over \((0, \infty )\).

Proof

All assertions, except the monotonicity of (3.4), follow from Lemma 2.2. The monotonicity of (3.4) can be proved by a standard argument, see [3, Lemma 9.2.1] or [7, Lemma 6.1] for instance. \(\square \)
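Since the residual (3.4) is continuous and monotonically increasing in \(\alpha \) by Lemma 3.2, the parameter \(\alpha _n(y^\delta )\) in step 2(b) of Algorithm 1 can be located by a simple bracketing and bisection search; the relaxed criterion with \(\mu _0<\mu _1\) ensures that the search terminates after finitely many trials. The following outer-loop sketch assumes a sub-problem solver such as the hypothetical lm_subproblem above; all names are illustrative.

```python
import numpy as np

def lm_banach(F, L_op, x0, xi0, y_delta, delta, subproblem,
              mu0=0.5, mu1=0.7, tau=2.5, max_iter=100):
    """Outer loop of Algorithm 1 (finite-dimensional sketch, r = 2).

    subproblem(A, b, x, xi, alpha) should return (x(alpha), xi(alpha)) as in
    step 2(a).  F and L_op (returning L(x)) are illustrative placeholders.
    A bracket for step 2(b) is assumed to exist, cf. Proposition 3.4.
    """
    x, xi = x0.copy(), xi0.copy()
    for _ in range(max_iter):
        res = y_delta - F(x)
        nres = np.linalg.norm(res)
        if nres <= tau * delta:                # discrepancy principle, step 3
            break
        A = L_op(x)
        b = res + A @ x                        # then b - A z = y^delta - F(x) - A(z - x)

        def lin_res(alpha):
            z, _ = subproblem(A, b, x, xi, alpha)
            return np.linalg.norm(b - A @ z)

        lo = hi = 1.0                          # bracket for step 2(b)
        while lin_res(lo) > mu1 * nres:
            lo *= 0.5
        while lin_res(hi) < mu0 * nres:
            hi *= 2.0
        alpha = hi
        for _ in range(60):                    # bisection; stops once inside [mu0, mu1]
            val = lin_res(alpha)
            if mu0 * nres <= val <= mu1 * nres:
                break
            if val < mu0 * nres:
                lo = alpha
            else:
                hi = alpha
            alpha = 0.5 * (lo + hi)
        x, xi = subproblem(A, b, x, xi, alpha)  # step 2(c)
    return x, xi
```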

Lemma 3.3

Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous function that is uniformly convex in the sense of (2.2). Let Assumption 1 hold with \(0\le \eta <1\). Let \(\eta <\mu _0<1\) and \(\tau >(1+\eta )/(\mu _0-\eta )\). Assume that \(x_n^\delta \) and \(\xi _n^\delta \) are well-defined for some \(0\le n<n_\delta \) with

$$\begin{aligned} D_{\xi _n^\delta } \Theta (x^\dag , x_n^\delta ) \le D_{\xi _0} \Theta (x^\dag , x_0) \le \varphi (\rho ). \end{aligned}$$
(3.5)

Then for any \(\alpha >0\) such that

$$\begin{aligned} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_n(\alpha , y^\delta )-x_n^\delta )\Vert \ge \mu _0 \Vert y^\delta -F(x_n^\delta )\Vert \end{aligned}$$
(3.6)

there hold \(x_n(\alpha , y^\delta ) \in B_{2\rho }(x_0)\) and

$$\begin{aligned} D_{\xi _n(\alpha , y^\delta )} \Theta (\hat{x}, x_n(\alpha , y^\delta ))- D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \le -\frac{c_0 \mu _0^r}{\alpha } \Vert y^\delta -F(x_n^\delta )\Vert ^r \end{aligned}$$
(3.7)

for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\), where \(c_0:=1-(1+\eta +\tau \eta )/(\tau \mu _0)\).

Proof

For simplicity of exposition, we write

$$\begin{aligned} x_n(\alpha ):=x_n(\alpha , y^\delta ), \quad \xi _n(\alpha ):=\xi _n(\alpha , y^\delta ) \quad \text{ and } \quad L_n:=L(x_n^\delta ). \end{aligned}$$

By using the identity (2.1) and the nonnegativity of the Bregman distance, we obtain

$$\begin{aligned} D_{\xi _n(\alpha )} \Theta (\hat{x}, x_n(\alpha ))-D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta )\le \langle \xi _n(\alpha )-\xi _n^\delta , x_n(\alpha )-\hat{x}\rangle . \end{aligned}$$

By the definition of \(\xi _n(\alpha )\) we then have

$$\begin{aligned}&D_{\xi _n(\alpha )} \Theta (\hat{x}, x_n(\alpha ))-D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \\&\quad \le \frac{1}{\alpha } \langle J_r^{\mathcal Y}(y^\delta -F(x_n^\delta )-L_n(x_n(\alpha )-x_n^\delta )), L_n(x_n(\alpha )-\hat{x})\rangle . \end{aligned}$$

We can write

$$\begin{aligned} L_n (x_n(\alpha )-\hat{x})&=[y^\delta -F(x_n^\delta )-L_n(\hat{x}-x_n^\delta )]-[y^\delta -F(x_n^\delta )-L_n(x_n(\alpha )-x_n^\delta )]. \end{aligned}$$

Then, by virtue of the property of the duality mapping \(J_r^{\mathcal Y}\), we obtain

$$\begin{aligned}&D_{\xi _n(\alpha )} \Theta (\hat{x}, x_n(\alpha ))-D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \le -\frac{1}{\alpha } \Vert y^\delta -F(x_n^\delta )- L_n (x_n(\alpha )-x_n^\delta )\Vert ^r\\&\quad + \frac{1}{\alpha } \Vert y^\delta -F(x_n^\delta )- L_n(x_n(\alpha )-x_n^\delta )\Vert ^{r-1} \Vert y^\delta -F(x_n^\delta )- L_n (\hat{x}-x_n^\delta )\Vert . \end{aligned}$$

In view of (3.5) and (2.3), we have \(\Vert x_n^\delta -x^\dag \Vert \le \rho \) and \(\Vert x^\dag -x_0\Vert \le \rho \) which implies that \(x_n^\delta \in B_{2\rho }(x_0)\). Thus we may use (3.2) and Assumption 1(d) to derive that

$$\begin{aligned} \Vert y^\delta -F(x_n^\delta )- L_n (\hat{x}-x_n^\delta )\Vert \le (1+\eta ) \delta +\eta \Vert y^\delta -F(x_n^\delta )\Vert . \end{aligned}$$

Since \(n<n_\delta \) we have \(\Vert F(x_n^\delta ) -y^\delta \Vert \ge \tau \delta \). Thus

$$\begin{aligned} \Vert y^\delta -F(x_n^\delta )- L_n (\hat{x}-x_n^\delta )\Vert \le \frac{1+\eta +\tau \eta }{\tau } \Vert y^\delta -F(x_n^\delta )\Vert . \end{aligned}$$
(3.8)

Therefore

$$\begin{aligned}&D_{\xi _n(\alpha )} \Theta (\hat{x}, x_n(\alpha ))-D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \le -\frac{1}{\alpha } \Vert y^\delta -F(x_n^\delta )- L_n (x_n(\alpha )-x_n^\delta )\Vert ^r\\&\quad + \frac{1+\eta +\tau \eta }{\tau \alpha } \Vert y^\delta -F(x_n^\delta )- L_n (x_n(\alpha )-x_n^\delta )\Vert ^{r-1} \Vert y^\delta -F(x_n^\delta )\Vert . \end{aligned}$$

In view of the inequality (3.6), we thus obtain

$$\begin{aligned} D_{\xi _n(\alpha )}&\Theta (\hat{x}, x_n(\alpha ))-D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \le - \frac{c_0}{\alpha } \Vert y^\delta -F(x_n^\delta )- L_n (x_n(\alpha )-x_n^\delta )\Vert ^r, \end{aligned}$$

where \(c_0:=1-(1+\eta +\tau \eta )/(\tau \mu _0)\). According to the conditions on \(\mu _0\) and \(\tau \), we have \(c_0>0\). Thus, in view of (3.6) again, we obtain (3.7).

Finally, by using (3.7) with \(\hat{x} = x^\dag \) and (3.5) we have

$$\begin{aligned} D_{\xi _n(\alpha )} \Theta (x^\dag , x_n(\alpha ))\le D_{\xi _n^\delta } \Theta (x^\dag , x_n^\delta )\le D_{\xi _0}\Theta (x^\dag , x_0)\le \varphi (\rho ). \end{aligned}$$

This together with (2.3) and \(\Vert x_0-x^\dag \Vert \le \rho \) implies that \(x_n(\alpha ) \in B_{2\rho }(x_0)\). \(\square \)

Proposition 3.4

Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be proper, lower semi-continuous and uniformly convex in the sense of (2.2). Let Assumption 1 hold with \(0\le \eta <1/3\) and let \(\eta <\mu _0\le \mu _1<1-2\eta \) and \(\tau >(1+\eta )/(\mu _0-\eta )\). Assume that

$$\begin{aligned} D_{\xi _0}\Theta (x^\dag , x_0) \le \varphi (\rho ). \end{aligned}$$
(3.9)

Then \(x_n^\delta \) are well-defined for all \(0\le n\le n_\delta \) and Algorithm 1 terminates after \(n_\delta <\infty \) iterations with \(n_\delta =O(1+|\log \delta |)\). Moreover, for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there hold

$$\begin{aligned} D_{\xi _{n+1}^\delta } \Theta (\hat{x}, x_{n+1}^\delta )\le D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta ) \end{aligned}$$
(3.10)

and

$$\begin{aligned} \frac{1}{\alpha _n(y^\delta )} \Vert y^\delta -F(x_n^\delta )\Vert ^r \le C_0(D_{\xi _n^\delta } \Theta (\hat{x}, x_n^\delta )-D_{\xi _{n+1}^\delta } \Theta (\hat{x}, x_{n+1}^\delta )) \end{aligned}$$
(3.11)

for all \(0\le n<n_\delta \), where \(C_0=1/(c_0 \mu _0^r)\).

Proof

We first show that if

$$\begin{aligned} D_{\xi _n^\delta }\Theta (x^\dag , x_n^\delta ) \le D_{\xi _0} \Theta (x^\dag , x_0)\le \varphi (\rho ) \end{aligned}$$
(3.12)

for some \(0\le n<n_\delta \), then there exists \(\alpha _n(y^\delta )>0\) such that

$$\begin{aligned} \mu _0\Vert y^\delta -F(x_n^\delta )\Vert \le f(\alpha _n(y^\delta )) \le \mu _1 \Vert y^\delta -F(x_n^\delta )\Vert , \end{aligned}$$
(3.13)

where

$$\begin{aligned} f(\alpha ) = \Vert y^\delta - F(x_n^\delta ) - L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\Vert \end{aligned}$$

which is continuous and monotonically increasing, see Lemma 3.2. To see this, we may use the minimality of \(x_n(\alpha , y^\delta )\) to obtain

$$\begin{aligned} \alpha D_{\xi _n^\delta }\Theta (x_n(\alpha , y^\delta ), x_n^\delta ) \le \frac{1}{r} \Vert y^\delta -F(x_n^\delta )\Vert ^r, \quad \forall \alpha >0. \end{aligned}$$

This implies that \(\lim _{\alpha \rightarrow \infty } D_{\xi _n^\delta }\Theta (x_n(\alpha , y^\delta ), x_n^\delta ) =0\) and hence \(\lim _{\alpha \rightarrow \infty } \Vert x_n(\alpha , y^\delta )-x_n^\delta \Vert =0\) by the uniform convexity of \(\Theta \). Consequently

$$\begin{aligned} \lim _{\alpha \rightarrow \infty } f(\alpha ) =\Vert y^\delta -F(x_n^\delta )\Vert > \mu _1\Vert y^\delta -F(x_n^\delta )\Vert . \end{aligned}$$

To show the existence of a finite \(\alpha _n(y^\delta )\) satisfying (3.13), it suffices to show that

$$\begin{aligned} \lim _{\alpha \rightarrow 0} f(\alpha ) < \mu _0\Vert y^\delta -F(x_n^\delta )\Vert . \end{aligned}$$

Suppose this is not true, then

$$\begin{aligned} f(\alpha ) \ge \mu _0\Vert y^\delta -F(x_n^\delta )\Vert , \quad \forall \alpha >0. \end{aligned}$$

Thus, we may use Lemma 3.3 to obtain

$$\begin{aligned} \frac{c_0 \mu _0^r}{\alpha } \Vert y^\delta -F(x_n^\delta )\Vert ^r \le D_{\xi _n^\delta }\Theta (x^\dag , x_n^\delta ), \quad \forall \alpha >0. \end{aligned}$$

Taking \(\alpha \rightarrow 0\) gives \(y^\delta =F(x_n^\delta )\) which is absurd since \(\Vert y^\delta -F(x_n^\delta )\Vert >\tau \delta \) for \(n<n_\delta \).

We next show (3.12) by induction on n. It is trivial for \(n=0\). Assume that it is true for \(n=m\) with \(m<n_\delta \). Since \(x_{m+1}^\delta = x_m(\alpha _m(y^\delta ), y^\delta )\) and \(\xi _{m+1}^\delta = \xi _m(\alpha _m(y^\delta ), y^\delta )\), we may use Lemma 3.3 to conclude that

$$\begin{aligned} D_{\xi _{m+1}^\delta } \Theta (x^\dag , x_{m+1}^\delta ) \le D_{\xi _m^\delta } \Theta (x^\dag , x_m^\delta ) \end{aligned}$$

which together with the induction hypothesis shows (3.12) for \(n=m+1\).

Since (3.12) holds true for all \(0\le n<n_\delta \), we may use Lemma 3.3 to obtain (3.10) and (3.11) immediately.

Finally, the finiteness of \(n_\delta \) can be proved by a standard argument from [11]. For completeness we include the argument here. By Assumption 1(d) and the definition of \(x_{n+1}^\delta \) we have for all \(0\le n<n_\delta \) that

$$\begin{aligned} \Vert y^\delta -F(x_{n+1}^\delta )\Vert&\le \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_{n+1}^\delta -x_n^\delta )\Vert \\&\quad \, +\Vert F(x_{n+1}^\delta )-F(x_n^\delta )-L(x_n^\delta )(x_{n+1}^\delta -x_n^\delta )\Vert \\&\le \mu _1 \Vert y^\delta -F(x_n^\delta )\Vert +\eta \Vert F(x_{n+1}^\delta )-F(x_n^\delta )\Vert \\&\le (\mu _1+\eta ) \Vert y^\delta -F(x_n^\delta )\Vert + \eta \Vert y^\delta -F(x_{n+1}^\delta )\Vert . \end{aligned}$$

This implies that

$$\begin{aligned} \Vert y^\delta -F(x_{n+1}^\delta )\Vert \le q \Vert y^\delta -F(x_n^\delta )\Vert \quad \text{ with } q=\frac{\mu _1+\eta }{1-\eta }<1 \end{aligned}$$
(3.14)

and hence

$$\begin{aligned} \Vert y^\delta -F(x_n^\delta )\Vert \le q^n \Vert y^\delta -F(x_0)\Vert , \quad 0\le n< n_\delta . \end{aligned}$$
(3.15)

If \(n_\delta =\infty \), then we must have \(\Vert y^\delta -F(x_n^\delta )\Vert > \tau \delta \) for all n. But the inequality (3.15) implies \(\Vert y^\delta -F(x_n^\delta )\Vert \rightarrow 0\) as \(n\rightarrow \infty \). Therefore \(n_\delta <\infty \). Now we take \(n=n_\delta -1\) in (3.15) and obtain \(q^{n_\delta -1}\Vert y^\delta -F(x_0)\Vert >\tau \delta \), i.e. \(n_\delta < 1+ \log (\Vert y^\delta -F(x_0)\Vert /(\tau \delta ))/\log (1/q)\). This implies \(n_\delta =O(1+|\log \delta |)\). \(\square \)

4 Convergence analysis

In this section we will show that Algorithm 1 is a regularization method for solving (1.1), that is, we will show that if \((x_n^\delta , \xi _n^\delta )\in \mathcal X\times \mathcal X^*\), \(0\le n\le n_\delta \), are defined by Algorithm 1 using noisy data \(y^\delta \), then \(x_{n_\delta }^\delta \) converges to \(x^\dag \) as \(y^\delta \rightarrow y\). To this end, it is necessary to investigate for each fixed n the behavior of \(x_n^\delta \) as \(y^\delta \rightarrow y\). This leads us to consider the counterpart of Algorithm 1 with exact data, which is formulated as Algorithm 2 below. Due to the non-unique determination of \(\alpha _n\), this algorithm actually defines many distinct iterative sequences. We will show that every sequence defined by Algorithm 2 is convergent. This convergence result, however, is not enough to show the regularization property of Algorithm 1. Indeed, for each fixed \(n\ge 1\), when taking \(y^\delta \rightarrow y\), the sequence \(\{\alpha _{n-1}(y^\delta )\}\) used to define \(x_n^\delta \) from \(x_{n-1}^\delta \) may split into many convergent subsequences and so does \(\{(x_n^\delta , \xi _n^\delta )\}\), with limits defined by Algorithm 2. This forces us to establish a uniform convergence result for all the possible sequences defined by Algorithm 2. The regularization property of Algorithm 1 is then established by using a stability result which connects Algorithms 1 and 2.

4.1 The method with exact data

We start with the formulation of the counterpart of Algorithm 1 when the exact data is used.

Algorithm 2

(Levenberg–Marquardt method with exact data)

  1. 1.

    Let \(0<\mu _0\le \mu _1<1\) and \((x_0, \xi _0) \in \mathcal X\times \mathcal X^*\) be the same as in Algorithm 1.

  2. 2.

    Assume that \(x_n\) and \(\xi _n\) are defined. If \(F(x_n)=y\), we define \(x_{n+1} = x_n\) and \(\xi _{n+1}=\xi _n\); otherwise, we define \(x_{n+1}\) and \(\xi _{n+1}\) as follows:

    1. (a)

      For each \(\alpha >0\) we define \(x_n(\alpha , y)\) and \(\xi _n(\alpha , y)\) as

      $$\begin{aligned} x_n(\alpha , y)= & {} \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y-F(x_n)-L(x_n)(x-x_n)\Vert ^r+ \alpha D_{\xi _n}\Theta (x,x_n)\right\} ,\\ \xi _n(\alpha , y)= & {} \xi _n +\frac{1}{\alpha } L(x_n)^* J_r^{\mathcal Y}\left( y-F(x_n)-L(x_n) (x_n(\alpha , y)-x_n)\right) ; \end{aligned}$$
    2. (b)

      Take \(\alpha _n>0\) to be a number such that

      $$\begin{aligned} \mu _0 \Vert y - F(x_n)\Vert \le \Vert y-F(x_n)-L(x_n) (x_n(\alpha _n, y)-x_n)\Vert \le \mu _1 \Vert y-F(x_n)\Vert ; \end{aligned}$$
    3. (c)

      Define \(x_{n+1}:=x_n(\alpha _n, y)\) and \(\xi _{n+1} := \xi _n(\alpha _n, y)\).

In the formulation of Algorithm 2, we take \(\alpha _n>0\) to be any number satisfying (b) when defining \(x_{n+1}, \xi _{n+1}\) from \(x_n, \xi _n\). There may be many possible choices of \(\alpha _n\); different choices of \(\{\alpha _n\}\) may lead to different iterative sequences. We will use \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) to denote the set of all possible sequences \(\{(x_n, \xi _n)\}\) in \(\mathcal X\times \mathcal X^*\) constructed from \((x_0, \xi _0)\) by Algorithm 2 with \(\alpha _n>0\) chosen to be any number satisfying (b). By using the same argument as in the proof of Proposition 3.4, we can obtain the following result, which shows that each sequence in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) is well-defined and admits a certain monotonicity property.

Lemma 4.1

Let Assumption 1 hold with \(0\le \eta <1\) and let \(\eta <\mu _0\le \mu _1<1\). Then any sequence \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) is well-defined and for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there hold

$$\begin{aligned} D_{\xi _{n+1}}\Theta (\hat{x}, x_{n+1})&\le D_{\xi _n}\Theta (\hat{x}, x_n), \end{aligned}$$
(4.1)
$$\begin{aligned} \frac{1}{\alpha _n} \Vert y-F(x_n)\Vert ^r&\le \frac{1}{\mu _0-\eta } (D_{\xi _n}\Theta (\hat{x}, x_n) - D_{\xi _{n+1}}\Theta (\hat{x}, x_{n+1})) \end{aligned}$$
(4.2)

for all \(n\ge 0\).

In order to derive the convergence of every sequence \(\{(x_n, \xi _n)\}\) in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\), we will use the following result which gives a general convergence criterion.

Proposition 4.2

Consider equation (1.1) for which Assumption 1 holds. Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous and uniformly convex function. Let \(\{x_n\}\subset B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) and \(\{\xi _n\}\subset \mathcal X^*\) be such that

  1. (i)

    \(\xi _n\in \partial \Theta (x_n)\) for all n;

  2. (ii)

    for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) the sequence \(\{D_{\xi _n}\Theta (\hat{x}, x_n)\}\) is monotonically decreasing;

  3. (iii)

    \(\lim _{n\rightarrow \infty } \Vert F(x_n)-y\Vert =0\).

  4. (iv)

    there is a constant C such that for all \(k>n\) and any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there holds

    $$\begin{aligned} |\langle \xi _k-\xi _n, x_k-\hat{x}\rangle | \le C (D_{\xi _n}\Theta (\hat{x}, x_n) -D_{\xi _k}\Theta (\hat{x}, x_k)). \end{aligned}$$
    (4.3)

Then there exists a solution \(x_*\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } \Vert x_n-x_*\Vert =0, \quad \lim _{n\rightarrow \infty } \Theta (x_n) = \Theta (x_*) \quad \text{ and } \quad \lim _{n\rightarrow \infty } D_{\xi _n}\Theta (x_*, x_n)=0. \end{aligned}$$

If, in addition, \(x^\dag \in B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\) and \(\xi _{n+1}-\xi _n \in \overline{{\mathscr {R}}(L(x^\dag )^*)}\) for all n, then \(x_*=x^\dag \).

Proof

This result follows from [20, Proposition 3.6] and its proof. \(\square \)

Now we can prove the main convergence result on Algorithm 2.

Theorem 4.3

Let Assumption 1 hold with \(0\le \eta <1/3\) and let \(\eta <\mu _0\le \mu _1<1-2\eta \). Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be proper, lower semi-continuous and uniformly convex in the sense of (2.2), and let (3.9) be satisfied. Then for any \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) there exists a solution \(x_*\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } \Vert x_n-x_*\Vert =0, \quad \lim _{n\rightarrow \infty } \Theta (x_n) = \Theta (x_*) \quad \text{ and } \quad \lim _{n\rightarrow \infty } D_{\xi _n}\Theta (x_*, x_n)=0. \end{aligned}$$

If in addition \({\mathscr {N}}(L(x^\dag )) \subset {\mathscr {N}}(L(x))\) for all \(x \in B_{2\rho }(x_0)\cap {\mathscr {D}}(F)\), then \(x_*=x^\dag \).

Proof

We will use Proposition 4.2. By the definition of \(\xi _n\) in Algorithm 2 we always have \(\xi _n\in \partial \Theta (x_n)\) for all \(n\ge 0\), which shows (i) in Proposition 4.2. Lemma 4.1 shows (ii) in Proposition 4.2. By an argument similar to that used for deriving (3.14) we can show that

$$\begin{aligned} \Vert y-F(x_{n+1})\Vert \le q\Vert y-F(x_n)\Vert , \quad \forall n\ge 0 \end{aligned}$$
(4.4)

with \(q=(\mu _1+\eta )/(1-\eta )<1\). This implies (iii) in Proposition 4.2. In order to show the convergence result, it remains only to show (iv) in Proposition 4.2.

To this end, for \(0\le l <m<\infty \) we may use the definition of \(\xi _n\) and the property of the duality mapping \(J_r^{\mathcal Y}\) to obtain

$$\begin{aligned} |\langle \xi _m-\xi _l, x_m-x^\dag \rangle |&=\left| \sum _{n=l}^{m-1} \langle \xi _{n+1}-\xi _n, x_m-x^\dag \rangle \right| \\&=\left| \sum _{n=l}^{m-1} \frac{1}{\alpha _n} \langle J_r^{\mathcal Y}(y-F(x_n)-L(x_n)(x_{n+1}-x_n)), L(x_n)(x_m-x^\dag )\rangle \right| \\&\le \sum _{n=l}^{m-1} \frac{1}{\alpha _n} \Vert y-F(x_n)-L(x_n)(x_{n+1}-x_n)\Vert ^{r-1} \Vert L(x_n)(x_m-x^\dag )\Vert . \end{aligned}$$

By the triangle inequality \(\Vert L(x_n)(x_m-x^\dag )\Vert \le \Vert L(x_n)(x_n-x^\dag )\Vert +\Vert L(x_n)(x_m-x_n)\Vert \) and Assumption 1(d), we have

$$\begin{aligned} \Vert L(x_n)(x_m-x^\dag )\Vert&\le (1+\eta )\left( \Vert y-F(x_n)\Vert +\Vert F(x_n)-F(x_m)\Vert \right) \\&\le (1+\eta ) \left( 2 \Vert y-F(x_n)\Vert +\Vert y-F(x_m)\Vert \right) . \end{aligned}$$

Since the inequality (4.4) implies that \(\{\Vert y-F(x_n)\Vert \}\) is monotonically decreasing, we have

$$\begin{aligned} \Vert L(x_n)(x_m-x^\dag )\Vert \le 3(1+\eta ) \Vert y-F(x_n)\Vert , \quad 0\le n\le m. \end{aligned}$$

Therefore

$$\begin{aligned}&|\langle \xi _m -\xi _l, x_m-x^\dag \rangle | \\&\quad \le 3(1+\eta ) \sum _{n=l}^{m-1} \frac{1}{\alpha _n} \Vert y-F(x_n)-L(x_n) (x_{n+1}-x_n)\Vert ^{r-1} \Vert y-F(x_n)\Vert \\&\quad \le 3(1+\eta ) \sum _{n=l}^{m-1} \frac{1}{\alpha _n} \Vert y-F(x_n)\Vert ^r. \end{aligned}$$

In view of (4.2) in Lemma 4.1, we obtain with \(C_1:=3(1+\eta )/(\mu _0-\eta )\) that

$$\begin{aligned} |\langle \xi _m-\xi _l, x_m-x^\dag \rangle | \le C_1 (D_{\xi _l} \Theta (x^\dag , x_l)-D_{\xi _m}\Theta (x^\dag , x_m)) \end{aligned}$$

which shows (iv) in Proposition 4.2.

To show the last part under the condition \({\mathscr {N}}(L(x^\dag ))\subset {\mathscr {N}}(L(x))\) for \(x\in B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\), we observe from the definition of \(\xi _n\) that

$$\begin{aligned} \xi _{n+1}-\xi _n \in {\mathscr {R}}(L(x_n)^*) \subset {\mathscr {N}}(L(x_n))^\perp \subset {\mathscr {N}}(L(x^\dag ))^\perp = \overline{{\mathscr {R}}(L(x^\dag )^*)}. \end{aligned}$$

Thus, we may use the second part of Proposition 4.2 to conclude the proof. \(\square \)

4.2 A uniform convergence result

In Theorem 4.3 we have shown the convergence of every sequence in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\). In this subsection we will strengthen this result by showing the following uniform convergence result for all sequences in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) which will be crucial in establishing the regularization property of Algorithm 1.

Proposition 4.4

Assume all the conditions in Theorem 4.3 hold. Assume also that

$$\begin{aligned} {\mathscr {N}}(L(x^\dag ))\subset {\mathscr {N}}(L(x)), \quad \forall x\in B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta ). \end{aligned}$$
(4.5)

Then, for any \(\varepsilon >0\), there is an integer \(n(\varepsilon )\) such that for any sequence \(\{(x_n, \xi _n)\} \in \Gamma _{\mu _0, \mu _1} (x_0, \xi _0)\) there holds \(D_{\xi _n} \Theta (x^\dag , x_n) < \varepsilon \) for all \(n\ge n(\varepsilon )\).

The proof of Proposition 4.4 is based on some preliminary results. It is easily seen that, to complete the proof, we only need to consider the case \(F(x_0)\ne y\). The following result shows that in this case we always have \(F(x_n)\ne y\) for all \(n\ge 0\) for any sequence \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\). Thus, when defining \(x_{n+1}\) from \(x_n\) in Algorithm 2 we always use a finite number \(\alpha _n>0\).

Lemma 4.5

Let all the conditions in Proposition 4.4 hold. For any sequence \(\{(x_n, \xi _n)\} \in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\), if \(F(x_n)=y\) for some \(n\ge 1\), then \(F(x_0) = y\).

Proof

It suffices to show that if \(F(x_k)=y\) for some \(1\le k\le n\), then \(F(x_{k-1})=y\). By using Assumption 1(d) and \(F(x_k)=y\) we have \(L(x^\dag ) (x_k-x^\dag ) = 0\). Thus, in view of (4.5), we have

$$\begin{aligned} x_k-x^\dag \in {\mathscr {N}}(L(x^\dag )) \subset {\mathscr {N}}(L(x_{k-1})). \end{aligned}$$

Consequently \(L(x_{k-1}) (x_k-x^\dag )=0\). If \(F(x_{k-1})\ne y\), then by the definition of \(x_k\) and Assumption 1(d) we have

$$\begin{aligned} \mu _0\Vert y-F(x_{k-1})\Vert&\le \Vert y-F(x_{k-1})-L(x_{k-1})(x_k-x_{k-1})\Vert \\&= \Vert y-F(x_{k-1})-L(x_{k-1}) (x^\dag -x_{k-1})\Vert \\&\le \eta \Vert y-F(x_{k-1})\Vert \end{aligned}$$

which is impossible since \(\mu _0>\eta \). The proof is thus complete. \(\square \)

The next result shows that, if \(F(x_n)\ne y\), then we can give upper and lower bounds on the number \(\alpha _n\) used to define \(x_{n+1}\) from \(x_n\).

Lemma 4.6

Let all the conditions in Theorem 4.3 hold. If \(F(x_n)\ne y\), then for any \(\alpha \) satisfying

$$\begin{aligned} \mu _0\Vert y-F(x_n)\Vert \le \Vert y-F(x_n)-L(x_n) (x_n(\alpha , y)-x_n)\Vert \le \mu _1 \Vert y-F(x_n)\Vert \end{aligned}$$
(4.6)

there holds \(0<\underline{\alpha }_n \le \alpha \le \overline{\alpha }_n<\infty \), where

$$\begin{aligned} \underline{\alpha }_n:= \frac{(\mu _0^r-\eta ^r)\Vert y-F(x_n)\Vert ^r}{r \varphi (\rho )} \quad \text{ and } \quad \overline{\alpha }_n := \frac{\Vert y-F(x_n)\Vert ^r}{r \varphi ((1-\mu _1)\Vert y-F(x_n)\Vert /\Vert L(x_n)\Vert )}. \end{aligned}$$

Proof

By the definition of \(x_n(\alpha , y)\) and the uniform convexity of \(\Theta \), we have

$$\begin{aligned} \alpha \varphi (\Vert x_n(\alpha , y)-x_n\Vert ) \le \alpha D_{\xi _n}\Theta (x_n(\alpha , y), x_n) \le \frac{1}{r} \Vert y-F(x_n)\Vert ^r. \end{aligned}$$

In view of the second inequality in (4.6) we can obtain

$$\begin{aligned} \Vert L(x_n)\Vert \Vert x_n(\alpha , y)-x_n\Vert \ge \Vert L(x_n) (x_n(\alpha , y)-x_n)\Vert \ge (1-\mu _1) \Vert y-F(x_n)\Vert . \end{aligned}$$

Consequently

$$\begin{aligned} \alpha \varphi \left( \frac{(1-\mu _1)\Vert y-F(x_n)\Vert }{\Vert L(x_n)\Vert }\right) \le \frac{1}{r} \Vert y-F(x_n)\Vert ^r \end{aligned}$$
(4.7)

which implies that \(\alpha \le \overline{\alpha }_n\). On the other hand, by the definition of \(x_n(\alpha , y)\) we have

$$\begin{aligned}&\frac{1}{r} \Vert y -F(x_n) -L(x_n) (x_n(\alpha ,y)-x_n)\Vert ^r \\&\quad \le \frac{1}{r} \Vert y-F(x_n) -L(x_n)(x^\dag -x_n)\Vert ^r +\alpha D_{\xi _n}\Theta (x^\dag , x_n). \end{aligned}$$

In view of the first inequality in (4.6), Assumption 1(d), and the inequality \(D_{\xi _n}\Theta (x^\dag , x_n) \le D_{\xi _0}\Theta (x^\dag , x_0) \le \varphi (\rho )\) from Lemma 4.1, it follows that

$$\begin{aligned} \mu _0^r \Vert y-F(x_n)\Vert ^r \le \eta ^r \Vert y-F(x_n)\Vert ^r + r\alpha \varphi (\rho ). \end{aligned}$$

This implies that \(\alpha \ge \underline{\alpha }_n\). \(\square \)

Now we are ready to give the proof of Proposition 4.4. We will use an idea from [10] which is based on the well-known diagonal sequence argument.

Proof of Proposition 4.4

We may assume that \(F(x_0)\ne y\). We will use a contradiction argument. Assume that the result is not true. Then there is an \(\varepsilon _0>0\) such that for any \(\ell \ge 1\) there exist \(\{(x_n^{(\ell )}, \xi _n^{(\ell )})\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) and \(n_\ell >\ell \) such that

$$\begin{aligned} D_{\xi _{n_\ell }^{(\ell )}}\Theta (x^\dag , x_{n_\ell }^{(\ell )}) \ge \varepsilon _0. \end{aligned}$$
(4.8)

We will construct, for each \(n=0, 1, \ldots \), a strictly increasing subsequence \(\{\ell _{n,k}\}\) of positive integers and \((\bar{x}_n, \bar{\xi }_n) \in \mathcal X\times \mathcal X^*\) such that

  1. (i)

    \(\{(\bar{x}_n, \bar{\xi }_n)\} \in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\);

  2. (ii)

    for each fixed n there hold \(x_n^{(\ell _{n,k})}\rightarrow \bar{x}_n\) and \(\xi _n^{(\ell _{n,k})}\rightarrow \bar{\xi }_n\) as \(k\rightarrow \infty \). Moreover, for all k there hold

    $$\begin{aligned} \quad D_{\xi _n^{(\ell _{n,k})}} \Theta (\bar{x}_n, x_n^{(\ell _{n,k})}) \le \varepsilon _0/4 \quad \text{ and } \quad |\langle \xi _n^{(\ell _{n,k})}-\bar{\xi }_n, \bar{x}_n-x^\dag \rangle | \le \varepsilon _0/4. \end{aligned}$$

Assuming for the moment that the above construction is available, we derive a contradiction. According to (i), we may use Theorem 4.3 to conclude that \(D_{\bar{\xi }_n} \Theta (x^\dag , \bar{x}_n) \rightarrow 0\) as \(n\rightarrow \infty \). Thus we can pick a large integer \(\hat{n}\) such that

$$\begin{aligned} D_{\bar{\xi }_{\hat{n}}}\Theta (x^\dag , \bar{x}_{\hat{n}})< \varepsilon _0/2. \end{aligned}$$

Let \(\hat{\ell }: =\ell _{\hat{n}, \hat{n}}\) and consider the sequence \(\{(x_n^{(\hat{\ell })}, \xi _n^{(\hat{\ell })})\}\). According to (ii), we have

$$\begin{aligned} \varepsilon _0/2&> (D_{\bar{\xi }_{\hat{n}}} \Theta (x^\dag , \bar{x}_{\hat{n}}) -D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (x^\dag , x_{\hat{n}}^{(\hat{\ell })})) +D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (x^\dag , x_{\hat{n}}^{(\hat{\ell })})\\&= -D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (\bar{x}_{\hat{n}}, x_{\hat{n}}^{(\hat{\ell })}) + \langle \bar{\xi }_{\hat{n}} -\xi _{\hat{n}}^{(\hat{\ell })}, \bar{x}_{\hat{n}} -x^\dag \rangle + D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (x^\dag , x_{\hat{n}}^{(\hat{\ell })})\\&\ge -\varepsilon _0/4 -\varepsilon _0/4 + D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (x^\dag , x_{\hat{n}}^{(\hat{\ell })}). \end{aligned}$$

Since \(\{\ell _{\hat{n},k}\}\) is strictly increasing, we have \(n_{\hat{\ell }}>\hat{\ell }= \ell _{\hat{n}, \hat{n}} \ge \hat{n}\). Therefore, we may use Lemma 4.1 to obtain

$$\begin{aligned} D_{\xi _{n_{\hat{\ell }}}^{(\hat{\ell })}} \Theta (x^\dag , x_{n_{\hat{\ell }}}^{(\hat{\ell })}) \le D_{\xi _{\hat{n}}^{(\hat{\ell })}} \Theta (x^\dag , x_{\hat{n}}^{(\hat{\ell })}) < \varepsilon _0 \end{aligned}$$

which is a contradiction to (4.8) with \(\ell =\hat{\ell }\).

We turn to the construction of \(\{\ell _{n,k}\}\) and \((\bar{x}_n, \bar{\xi }_n)\), for each \(n=0, 1, \ldots \), such that (i) and (ii) hold. For \(n=0\), we take \((\bar{x}_0, \bar{\xi }_0)=(x_0, \xi _0)\) and \(\ell _{0,k}=k\) for all k. Since \(\xi _0^{(k)}=\xi _0\) and \(x_0^{(k)}=x_0\), (ii) holds automatically for \(n=0\).

Next, assume that we have constructed \(\{\ell _{n,k}\}\) and \((\bar{x}_n, \bar{\xi }_n)\) for all \(0\le n\le m\). We will construct \(\{\ell _{m+1, k}\}\) and \((\bar{x}_{m+1}, \bar{\xi }_{m+1})\). Since \(F(x_0)\ne y\), we have from Lemma 4.5 that \(F(\bar{x}_m)\ne y\) and \(F(x_m^{(\ell )})\ne y\) for all \(\ell \). Let \(\alpha _m^{(\ell )}>0\) be the number used to define \((x_{m+1}^{(\ell )}, \xi _{m+1}^{(\ell )})\) from \((x_m^{(\ell )}, \xi _m^{(\ell )})\). From Lemma 4.6 and the induction hypothesis \(x_m^{(\ell _{m,k})}\rightarrow \bar{x}_m\) we can conclude that there are two positive numbers \(\underline{\alpha }_m\) and \(\bar{\alpha }_m\) independent of k such that \(\underline{\alpha }_m \le \alpha _m^{(\ell _{m,k})}\le \bar{\alpha }_m\) for all k. Thus \(\{\ell _{m,k}\}\) must have a subsequence, denoted as \(\{\ell _{m+1, k}\}\), such that \(\{\alpha _m^{(\ell _{m+1,k})}\}\) converges to some number \(\alpha _m \in (0, \infty )\) as \(k\rightarrow \infty \). We define

$$\begin{aligned} \bar{x}_{m+1}&= \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y-F(\bar{x}_m)-L(\bar{x}_m) (x-\bar{x}_m)\Vert ^r + \alpha _m D_{\bar{\xi }_m} \Theta (x, \bar{x}_m)\right\} ,\\ \bar{\xi }_{m+1}&= \bar{\xi }_m +\frac{1}{\alpha _m} L(\bar{x}_m)^* J_r^{\mathcal Y}(y-F(\bar{x}_m) -L(\bar{x}_m) (\bar{x}_{m+1}-\bar{x}_m)). \end{aligned}$$

It is clear that \(\bar{x}_{m+1}= \bar{x}_m(\alpha _m, y)\) and \(\bar{\xi }_{m+1} = \bar{\xi }_m(\alpha _m, y)\). In view of the induction hypotheses \(x_m^{(\ell _{m+1,k})}\rightarrow \bar{x}_m\) and \(\xi _m^{(\ell _{m+1,k})}\rightarrow \bar{\xi }_m\), the continuity of \(x\rightarrow F(x)\) and \(x\rightarrow L(x)\), and \(\alpha _m^{(\ell _{m+1,k})}\rightarrow \alpha _m\), we may use Lemma 2.2 and the continuity of the duality mapping \(J_r^{\mathcal Y}\) to conclude that

$$\begin{aligned} x_{m+1}^{(\ell _{m+1,k})} \rightarrow \bar{x}_{m+1}, \quad \Theta (x_{m+1}^{(\ell _{m+1,k})}) \rightarrow \Theta (\bar{x}_{m+1}) \quad \text{ and } \quad \xi _{m+1}^{(\ell _{m+1,k})} \rightarrow \bar{\xi }_{m+1} \end{aligned}$$
(4.9)

as \(k\rightarrow \infty \). According to the choice of \(\alpha _m^{(\ell _{m+1,k})}\) and thus the definition of \(x_{m+1}^{(\ell _{m+1,k})}\), we have

$$\begin{aligned} \mu _0\Vert y-F(x_m^{(\ell _{m+1,k})})\Vert&\le \Vert y-F(x_m^{(\ell _{m+1,k})}) - L(x_m^{(\ell _{m+1,k})}) (x_{m+1}^{(\ell _{m+1,k})}-x_m^{(\ell _{m+1,k})})\Vert \\&\le \mu _1 \Vert y - F(x_m^{(\ell _{m+1,k})})\Vert . \end{aligned}$$

Letting \(k\rightarrow \infty \) gives

$$\begin{aligned} \mu _0\Vert y-F(\bar{x}_m)\Vert \le \Vert y-F(\bar{x}_m) - L(\bar{x}_m) (\bar{x}_{m+1}-\bar{x}_m)\Vert \le \mu _1 \Vert y - F(\bar{x}_m)\Vert . \end{aligned}$$

Thus \(\bar{x}_{m+1} = \bar{x}_m(\alpha _m, y)\) satisfies the desired requirement. This completes the construction of \(\{\ell _{m+1,k}\}\) and \((\bar{x}_{m+1}, \bar{\xi }_{m+1})\). It remains to show that \(\bar{x}_{m+1}\) and \(\bar{\xi }_{m+1}\) satisfy the estimates in (ii) for \(n=m+1\). We may use (4.9) to obtain

$$\begin{aligned} \lim _{k\rightarrow \infty } \langle \xi _{m+1}^{(\ell _{m+1,k})}-\bar{\xi }_{m+1}, \bar{x}_{m+1}-x^\dag \rangle = 0, \quad \lim _{k\rightarrow \infty } D_{\xi _{m+1}^{(\ell _{m+1,k})}} \Theta (\bar{x}_{m+1}, x_{m+1}^{(\ell _{m+1,k})}) =0. \end{aligned}$$

Consequently, by taking a subsequence of \(\{\ell _{m+1,k}\}\) if necessary, which is still denoted by the same notation, we can guarantee (ii) for \(n=m+1\). \(\square \)

4.3 Regularization property

In this section we will establish the regularization property of Algorithm 1 which is stated in the following result.

Theorem 4.7

Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous function that is uniformly convex in the sense of (2.2), and let Assumption 1 hold with \(0\le \eta <1/3\). Let \(\eta <\mu _0\le \mu _1<1-2\eta \) and \(\tau >(1+\eta )/(\mu _0-\eta )\), and let (3.9) be satisfied. Assume further that

$$\begin{aligned} {\mathscr {N}}(L(x^\dag )) \subset {\mathscr {N}}(L(x)), \quad \forall x\in B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta ). \end{aligned}$$

Then for \(x_{n_\delta }^\delta \in \mathcal X\) and \(\xi _{n_\delta }^\delta \in \mathcal X^*\) defined by Algorithm 1 there hold

$$\begin{aligned} \lim _{\delta \rightarrow 0} \Vert x_{n_\delta }^\delta -x^\dag \Vert =0, \quad \lim _{\delta \rightarrow 0} \Theta (x_{n_\delta }^\delta ) =\Theta (x^\dag ) \quad \text{ and } \quad \lim _{\delta \rightarrow 0} D_{\xi _{n_\delta }^\delta }\Theta (x^\dag , x_{n_\delta }^\delta ) =0. \end{aligned}$$

In order to prove Theorem 4.7, we will need to establish a certain stability result connecting Algorithms 1 and 2 so that Proposition 4.4 can be used. The following stability result is sufficient for our purpose.

Lemma 4.8

Let \(F(x_0)\ne y\) and let all the conditions in Theorem 4.7 hold. Let \(\{y^{\delta _l}\}\) be a sequence of noisy data satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \). Let \(x_n^{\delta _l}\) and \(\xi _n^{\delta _l}\), \(0\le n\le n_{\delta _l}\), be defined by Algorithm 1. Then for any finite \(n\le \liminf _{l\rightarrow \infty } n_{\delta _l}\), by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, there is a sequence \(\{(x_m, \xi _m)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that

$$\begin{aligned} x_m^{\delta _l}\rightarrow x_m, \quad \xi _m^{\delta _l} \rightarrow \xi _m \quad \text{ and }\quad \Theta (x_m^{\delta _l})\rightarrow \Theta (x_m) \quad \text{ as } l\rightarrow \infty \end{aligned}$$

for all \(0\le m\le n\).

Proof

Since \(F(x_0)\ne y\), we must have \(\liminf _{l\rightarrow \infty } n_{\delta _l}\ge 1\). We will use an induction argument on n.

When \(n=0\), there is nothing to prove since \(x_0^{\delta _l}=x_0\) and \(\xi _0^{\delta _l}=\xi _0\). Assume next that, for some \(0\le n<\liminf _{l\rightarrow \infty } n_{\delta _l}\), the result is true for some sequence \(\{(x_m, \xi _m)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) with \(0\le m\le n\). In order to show that the result is also true for \(n+1\), we will obtain a sequence from \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) by retaining the first \(n+1\) terms in \(\{(x_m, \xi _m)\}\) and modifying the remaining terms. It suffices to redefine \(x_{n+1}\) and \(\xi _{n+1}\) since then we can apply Algorithm 2 to produce the remaining terms.

Since \(F(x_0)\ne y\), we have from Lemma 4.5 that \(F(x_n)\ne y\). Let \(\alpha _n(y^{\delta _l})\) be the number used to define \(x_{n+1}^{\delta _l}\) and \(\xi _{n+1}^{\delta _l}\). Since the induction hypothesis \(x_n^{\delta _l}\rightarrow x_n\) and the fact \(y^{\delta _l}\rightarrow y\) imply \(\Vert F(x_n^{\delta _l})-y^{\delta _l}\Vert \rightarrow \Vert F(x_n)-y\Vert >0\) as \(l\rightarrow \infty \), we may use an argument similar to the one in the proof of Lemma 4.6 to conclude that

$$\begin{aligned} \underline{\alpha }_n \le \alpha _n(y^{\delta _l}) \le \bar{\alpha }_n \end{aligned}$$

for two numbers \(0<\underline{\alpha }_n\le \bar{\alpha }_n<\infty \) independent of l. Therefore, by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we may assume that

$$\begin{aligned} \alpha _n(y^{\delta _l}) \rightarrow \alpha _n \quad \text{ as } \quad l\rightarrow \infty \end{aligned}$$

for some number \(\alpha _n\in (0, \infty )\). We define

$$\begin{aligned} x_{n+1}&= \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y-F(x_n) -L(x_n) (x-x_n)\Vert ^r + \alpha _n D_{\xi _n}\Theta (x, x_n)\right\} ,\\ \xi _{n+1}&= \xi _n + \frac{1}{\alpha _n} L(x_n)^* J_r^{\mathcal Y}(y-F(x_n) -L(x_n) (x_{n+1}-x_n)). \end{aligned}$$

In view of the induction hypotheses and the continuity of \(x\rightarrow F(x)\) and \(x\rightarrow L(x)\), we may use Lemma 2.2 and the continuity of \(J_r^{\mathcal Y}\) to conclude that

$$\begin{aligned} x_{n+1}^{\delta _l} \rightarrow x_{n+1}, \quad \Theta (x_{n+1}^{\delta _l}) \rightarrow \Theta (x_{n+1}) \quad \text{ and } \quad \xi _{n+1}^{\delta _l}\rightarrow \xi _{n+1} \quad \text{ as } l\rightarrow \infty . \end{aligned}$$

Moreover, since

$$\begin{aligned} \mu _0 \Vert y^{\delta _l}-F(x_n^{\delta _l})\Vert \le \Vert y^{\delta _l} -F(x_n^{\delta _l})-L(x_n^{\delta _l}) (x_{n+1}^{\delta _l}-x_n^{\delta _l})\Vert \le \mu _1 \Vert y^{\delta _l}-F(x_n^{\delta _l})\Vert , \end{aligned}$$

by taking \(l\rightarrow \infty \) we can conclude that

$$\begin{aligned} \mu _0 \Vert y-F(x_n)\Vert \le \Vert y -F(x_n)-L(x_n) (x_{n+1}-x_n)\Vert \le \mu _1 \Vert y-F(x_n)\Vert . \end{aligned}$$

Therefore \(x_{n+1}=x_n(\alpha _n, y)\) and \(\xi _{n+1}=\xi _n(\alpha _n, y)\) are the desired elements to be defined. The proof is thus complete. \(\square \)

The next result will be used to prove \(\lim _{\delta \rightarrow 0} \Theta (x_{n_\delta }^\delta ) =\Theta (x^\dag )\) in Theorem 4.7.

Lemma 4.9

Let all the conditions in Theorem 4.7 hold and let \(\{(x_n^\delta , \xi _n^\delta )\}_{0\le n\le n_\delta }\) be defined by Algorithm 1. Then for all \(0\le l\le n_\delta \) there holds

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , x^\dag -x_{n_\delta }^\delta \rangle | \le C_2 D_{\xi _l^\delta } \Theta (x^\dag , x_l^\delta ), \end{aligned}$$

where \(C_2=(1+\eta )(3\tau +1)/(\tau c_0 \mu _0^r)\).

Proof

By the definition of \(\xi _n^\delta \), the property of \(J_r^{\mathcal Y}\) and (3.3), we can obtain

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , x^\dag -x_{n_\delta }^\delta \rangle |&\le \sum _{n=l}^{n_\delta -1} |\langle \xi _{n+1}^\delta -\xi _n^\delta , x^\dag -x_{n_\delta }^\delta \rangle |\\&\le \sum _{n=l}^{n_\delta -1} \frac{1}{\alpha _n(y^\delta )} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_{n+1}^\delta -x_n^\delta )\Vert ^{r-1} \Vert L(x_n^\delta ) (x^\dag -x_{n_\delta }^\delta )\Vert \\&\le \sum _{n=l}^{n_\delta -1} \frac{1}{\alpha _n(y^\delta )} \Vert y^\delta -F(x_n^\delta )\Vert ^{r-1} \Vert L(x_n^\delta )(x^\dag -x_{n_\delta }^\delta )\Vert . \end{aligned}$$

By Assumption 1(d) and (3.14) we can derive that

$$\begin{aligned} \Vert L(x_n^\delta ) (x^\dag -x_{n_\delta }^\delta )\Vert \le (1+\eta ) (\delta + 3 \Vert y^\delta -F(x_n^\delta )\Vert ), \quad 0\le n< n_\delta . \end{aligned}$$

Since \(\Vert y^\delta -F(x_n^\delta )\Vert >\tau \delta \) for \(0\le n<n_\delta \), we therefore have

$$\begin{aligned} \Vert L(x_n^\delta ) (x^\dag -x_{n_\delta }^\delta )\Vert \le \frac{(1+\eta )(3\tau +1)}{\tau } \Vert y^\delta -F(x_n^\delta )\Vert , \quad 0\le n<n_\delta . \end{aligned}$$

Consequently

$$\begin{aligned} |\langle \xi _{n_\delta }^\delta -\xi _l^\delta , x^\dag -x_{n_\delta }^\delta \rangle | \le \frac{(1+\eta )(3\tau +1)}{\tau } \sum _{n=l}^{n_\delta -1} \frac{1}{\alpha _n(y^\delta )}\Vert y^\delta -F(x_n^\delta )\Vert ^r. \end{aligned}$$

This together with (3.11) in Proposition 3.4 implies the desired estimate. \(\square \)

We are now ready to prove Theorem 4.7, the main result of this paper.

Proof of Theorem 4.7

We may assume that \(F(x_0)\ne y\). We first claim that

$$\begin{aligned} \lim _{\delta \rightarrow 0} n_\delta = \infty . \end{aligned}$$
(4.10)

Suppose that this is not true. Then there exists \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) such that \(n_{\delta _l}\rightarrow \hat{n}\) as \(l\rightarrow \infty \) for some finite integer \(\hat{n}\). Thus \(n_{\delta _l} = \hat{n}\) for large l. By the definition of \(n_{\delta _l}\) we have

$$\begin{aligned} \Vert F(x_{\hat{n}}^{\delta _l})-y^{\delta _l}\Vert \le \tau \delta _l. \end{aligned}$$
(4.11)

In view of Lemma 4.8, by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we can find \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that \(x_{\hat{n}}^{\delta _l}\rightarrow x_{\hat{n}}\) as \(l\rightarrow \infty \). Letting \(l\rightarrow \infty \) in (4.11) gives \(F(x_{\hat{n}})=y\). Consequently, by Lemma 4.5, we must have \(F(x_0)=y\) which is a contradiction.

We next show the convergence result. We first prove that

$$\begin{aligned} \lim _{\delta \rightarrow 0} D_{\xi _{n_\delta }^\delta } \Theta (x^\dag , x_{n_\delta }^\delta ) =0 \end{aligned}$$
(4.12)

by a contradiction argument. Suppose that (4.12) is not true. Then there exist a number \(\varepsilon >0\) and a sequence \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \) such that

$$\begin{aligned} D_{\xi _{n_l}^{\delta _l}}\Theta (x^\dag , x_{n_l}^{\delta _l}) \ge \varepsilon \quad \text{ for } \text{ all } l, \end{aligned}$$
(4.13)

where \(n_l:= n_{\delta _l}\). According to Proposition 4.4, there is an integer \(n(\varepsilon )\) such that

$$\begin{aligned} D_{\xi _{n(\varepsilon )}} \Theta (x^\dag , x_{n(\varepsilon )}) <\varepsilon , \qquad \forall \{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0). \end{aligned}$$
(4.14)

For this \(n(\varepsilon )\), by using Lemma 4.8 and by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we can find \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that

$$\begin{aligned} x_n^{\delta _l} \rightarrow x_n, \quad \xi _n^{\delta _l} \rightarrow \xi _n \quad \text{ and } \quad \Theta (x_n^{\delta _l}) \rightarrow \Theta (x_n) \quad \text{ as } l \rightarrow \infty \end{aligned}$$
(4.15)

for \(0\le n\le n(\varepsilon )\). Since (4.10) implies that \(n_l>n(\varepsilon )\) for large l, by using Proposition 3.4 we have

$$\begin{aligned} D_{\xi _{n_l}^{\delta _l}}\Theta (x^\dag , x_{n_l}^{\delta _l})\le D_{\xi _{n(\varepsilon )}^{\delta _l}}\Theta (x^\dag , x_{n(\varepsilon )}^{\delta _l}) =\Theta (x^\dag ) -\Theta (x_{n(\varepsilon )}^{\delta _l}) -\langle \xi _{n(\varepsilon )}^{\delta _l}, x^\dag -x_{n(\varepsilon )}^{\delta _l}\rangle . \end{aligned}$$

In view of (4.15) and (4.14), we therefore obtain

$$\begin{aligned} \limsup _{l\rightarrow \infty } D_{\xi _{n_l}^{\delta _l}} \Theta (x^\dag , x_{n_l}^{\delta _l})&\le \Theta (x^\dag ) -\lim _{l\rightarrow \infty } \Theta (x_{n(\varepsilon )}^{\delta _l}) -\lim _{l\rightarrow \infty } \langle \xi _{n(\varepsilon )}^{\delta _l}, x^\dag -x_{n(\varepsilon )}^{\delta _l}\rangle \\&= \Theta (x^\dag ) -\Theta (x_{n(\varepsilon )}) -\langle \xi _{n(\varepsilon )}, x^\dag -x_{n(\varepsilon )}\rangle \\&= D_{\xi _{n(\varepsilon )}} \Theta (x^\dag , x_{n(\varepsilon )}) <\varepsilon . \end{aligned}$$

This is a contradiction to (4.13). We thus obtain (4.12). By virtue of the uniform convexity of \(\Theta \), we then have \(\lim _{\delta \rightarrow 0} \Vert x_{n_\delta }^{\delta } -x^\dag \Vert =0\).

It remains only to show that \(\lim _{\delta \rightarrow 0} \Theta (x_{n_\delta }^{\delta }) = \Theta (x^\dag )\). In view of (4.12), it suffices to show that

$$\begin{aligned} \lim _{\delta \rightarrow 0} \langle \xi _{n_\delta }^{\delta }, x^\dag -x_{n_\delta }^{\delta }\rangle =0. \end{aligned}$$
(4.16)

We again use a contradiction argument by assuming that there is a number \(\varepsilon >0\) and a sequence \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \) such that

$$\begin{aligned} |\langle \xi _{n_l}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle | \ge C_2\varepsilon \quad \text{ for } \text{ all } l, \end{aligned}$$
(4.17)

where \(C_2\) is the constant defined in Lemma 4.9. For this \(\varepsilon \), we may use Proposition 4.4 and Lemma 4.8 to find an integer \(n(\varepsilon )\) such that (4.14) and (4.15) hold. In view of Lemma 4.9, we have

$$\begin{aligned} |\langle \xi _{n_l}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle |&\le |\langle \xi _{n_l}^{\delta _l}-\xi _{n(\varepsilon )}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle | + |\langle \xi _{n(\varepsilon )}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle |\\&\le C_2 D_{\xi _{n(\varepsilon )}^{\delta _l}} \Theta (x^\dag , x_{n(\varepsilon )}^{\delta _l}) +|\langle \xi _{n(\varepsilon )}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle |. \end{aligned}$$

By taking \(l\rightarrow \infty \), using \(x_{n_l}^{\delta _l}\rightarrow x^\dag \), (4.15) and (4.14), we can obtain

$$\begin{aligned} \limsup _{l\rightarrow \infty } |\langle \xi _{n_l}^{\delta _l}, x^\dag -x_{n_l}^{\delta _l}\rangle |&\le C_2 \lim _{l\rightarrow \infty } D_{\xi _{n(\varepsilon )}^{\delta _l}}\Theta (x^\dag , x_{n(\varepsilon )}^{\delta _l}) {\,=\,} C_2 D_{\xi _{n(\varepsilon )}}\Theta (x^\dag , x_{n(\varepsilon )}) {\,<\,} C_2 \varepsilon \end{aligned}$$

which contradicts (4.17). We therefore obtain (4.16) and complete the proof of Theorem 4.7. \(\square \)

5 Numerical results

We consider the identification of the parameter c in the boundary value problem

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\triangle u + cu = f &{} \text{ in } \Omega , \\ u = g &{} \text{ on } \partial \Omega \end{array} \right. \end{aligned}$$
(5.1)

from the measurements of the state variable u in \(\Omega \), where \(\Omega \subset \mathbb {R}^d\), \(d\le 2\), is a bounded domain with Lipschitz boundary \(\partial \Omega \), \(f\in H^{-1}(\Omega )\) and \(g\in H^{1/2}(\partial \Omega )\). This is a benchmark example of nonlinear inverse problems. It is well known that (5.1) has a unique solution \(u = u(c) \in H^1(\Omega )\) for each c in the domain

$$\begin{aligned} {\mathscr {D}}: = \{ c\in L^2(\Omega ) : \Vert c-\hat{c}\Vert _{ L^2(\Omega )}\le \gamma _0 \text{ for } \text{ some } \hat{c}\ge 0,\ \text {a.e.}\} \end{aligned}$$

with some \(\gamma _0>0\). By the Sobolev embedding \(H^1(\Omega )\hookrightarrow L^r(\Omega )\), it makes sense to define the parameter-to-solution map \(F: {\mathscr {D}} \subset L^2(\Omega ) \rightarrow L^r(\Omega )\) with \(F(c) = u(c)\) for any \(1<r<\infty \). We consider the problem of identifying \(c\in L^2(\Omega )\) from an \(L^r(\Omega )\)-measurement of u. This amounts to solving \(F(c)=u\). It is known that F is Fréchet differentiable; the Fréchet derivative and its Banach space adjoint are given respectively by

$$\begin{aligned} \begin{array}{ll} F'(c)h &{} = -A(c)^{-1}(hu(c)), \quad \,\, h\in L^2(\Omega ),\\ F'(c)^* w &{}= -u(c) A(c)^{-1}w, \qquad w \in L^{r^*}(\Omega ), \end{array} \end{aligned}$$
(5.2)

where \(r^*\) is the number conjugate to r, i.e. \(1/r^*+1/r=1\), and \(A(c): H^2(\Omega )\cap H_0^1(\Omega )\rightarrow L^2(\Omega )\) is defined by \(A(c) u = -\triangle u +cu\). Recall that in the space \(L^r(\Omega )\) with \(1<r<\infty \) the duality mapping \(J_r: L^r(\Omega ) \rightarrow L^{r^*}(\Omega )\) is given by

$$\begin{aligned} J_r(\varphi ) := |\varphi |^{r-1} \text{ sign }(\varphi ), \quad \varphi \in L^r(\Omega ). \end{aligned}$$

For this parameter identification problem, the tangential cone condition has been verified for \(r\ge 2\) in [12, 26]; the verification for \(1<r<2\), however, is not yet available.
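
To make the operators in (5.2) more concrete, the following is a minimal one-dimensional finite-difference sketch of the forward map \(F(c)=u(c)\) and of the derivative action \(F'(c)h = -A(c)^{-1}(h u(c))\). The uniform grid, the SciPy sparse solver and all variable names are choices of this sketch, not of the paper, which only specifies that a finite difference discretization is used in the experiments below.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def assemble_and_solve(c, f, g0, g1):
    """Solve -u'' + c u = f on (0,1) with u(0)=g0, u(1)=g1 at the interior nodes."""
    N = len(c)                       # number of interior grid points
    h = 1.0 / (N + 1)
    A = (diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h**2
         + diags(c)).tocsc()         # discrete A(c) with homogeneous Dirichlet BC
    rhs = np.asarray(f, dtype=float).copy()
    rhs[0] += g0 / h**2              # fold the Dirichlet data into the right-hand side
    rhs[-1] += g1 / h**2
    return A, spsolve(A, rhs)

def F(c, f, g0, g1):
    """Parameter-to-solution map F(c) = u(c)."""
    return assemble_and_solve(c, f, g0, g1)[1]

def dF(c, h_dir, f, g0, g1):
    """Derivative action F'(c) h = -A(c)^{-1}(h u(c)), cf. (5.2)."""
    A, u = assemble_and_solve(c, f, g0, g1)
    return -spsolve(A, h_dir * u)
```

For instance, the one-dimensional setting of Example 5.1 below would correspond to 400 subintervals (399 interior nodes), \(f(t) = 100 e^{-10(t-0.5)^2}\), \(g0=1\) and \(g1=2\).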

In the following we report some numerical results for this inverse problem to indicate the performance of Algorithm 1 with various choices of the convex function \(\Theta \) and the Banach spaces \(\mathcal X\) and \(\mathcal Y\). The main computational cost stems from solving the convex minimization problems involved in the algorithm, which requires the numerical solution of differential equations related to evaluating the Fréchet derivatives and their adjoints. We use the BFGS method, one of the most popular quasi-Newton methods, for Example 5.1 and a restarted nonlinear CG method for Examples 5.2 and 5.3 below; see [29]. Several fast algorithms for solving convex optimization problems have been developed in recent years, including the fast proximal gradient method [4] and the primal-dual hybrid gradient methods [5, 33]. These methods are powerful for problems in which fast solvers such as the FFT are applicable; when such fast solvers are not available, as in our computation, they may offer little advantage over other types of methods.
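
The structure of one such convex minimization subproblem can be illustrated for the simplest choice \(\Theta (c)=\frac{1}{2}\Vert c\Vert ^2\), in which case the Bregman distance reduces to \(\frac{1}{2}\Vert x-x_n\Vert ^2\). The sketch below is a hedged illustration only: it assumes the linearization is available as a matrix, uses an unweighted discrete \(L^r\) norm, and replaces the BFGS/CG codes used in the paper by SciPy's L-BFGS-B routine.

```python
import numpy as np
from scipy.optimize import minimize

def duality_map(phi, r):
    """Discrete analogue of J_r(phi) = |phi|^{r-1} sign(phi)."""
    return np.abs(phi) ** (r - 1) * np.sign(phi)

def inner_step(L, b, alpha, r):
    """Minimize (1/r)||b - L h||_r^r + (alpha/2)||h||_2^2 over h = x - x_n,
    where b = y^delta - F(x_n) and L is the linearized forward operator."""
    def objective(h):
        res = b - L @ h
        val = np.sum(np.abs(res) ** r) / r + 0.5 * alpha * h.dot(h)
        grad = -L.T @ duality_map(res, r) + alpha * h
        return val, grad

    out = minimize(objective, np.zeros(L.shape[1]), jac=True, method="L-BFGS-B")
    return out.x        # the increment x_{n+1} - x_n
```

For a general convex \(\Theta \), the quadratic penalty above is replaced by the Bregman distance \(D_{\xi _n}\Theta (x, x_n)\), and the new subgradient is obtained from the update of \(\xi _{n+1}\) as written out in the proof of Lemma 4.8.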

Example 5.1

Consider the one-dimensional problem on the interval \(\Omega =(0,1)\) with the source term \(f(t) = 100 e^{-10(t-0.5)^2}\) and the boundary data \(u(0) = 1\) and \(u(1) =2\). We will identify the sought solution

$$\begin{aligned} c^\dag (t) = 5t^2(1-t) + \sin ^2(2\pi t) \end{aligned}$$

using noisy data that contains a few data points, called outliers, which are highly inconsistent with the remaining data. Such outliers may arise, for instance, from procedural measurement errors.

In Fig. 1 we present the numerical results obtained by Algorithm 1 with \(\tau = 1.1\), \(\mu _0=0.90\) and \(\mu _1 = 0.96\), using the initial guess \(c_0=0\) and \(\xi _0=0\). To carry out the computation, the differential equations involved are solved by a finite difference method obtained by dividing \(\Omega =(0,1)\) into 400 subintervals of equal length. The minimization problems involved in the algorithm are solved by performing 50 iterations of the BFGS method. Figure 1a and d show the noisy data; the data in (a) contain only Gaussian noise, while the data in (d) contain Gaussian noise together with 10% outliers. Figure 1b and e present the reconstruction results by the regularizing Levenberg–Marquardt scheme of Hanke, i.e. Algorithm 1 with \(\mathcal X=\mathcal Y=L^2[0,1]\) and \(\Theta (c) = \frac{1}{2} \Vert c\Vert _{L^2}^2\); these results show that the method is highly susceptible to outliers. In Fig. 1c and f we present the reconstruction results by Algorithm 1 with \(\mathcal X= L^2[0,1]\), \(\mathcal Y=L^r[0,1]\) with \(r=1.1\) and \(\Theta (c) = \frac{1}{2} \Vert c\Vert _{L^2}^2\); the method is essentially unaffected by the outliers. Using \(L^r\) data misfit terms, with \(r\ge 1\) close to 1, to suppress the effect of outliers has been investigated for several other regularization methods; see [14, 15, 18, 23].

Fig. 1

Numerical results for Example 5.1: a, d data with noise; b, e reconstruction results by Algorithm 1 with \(\mathcal X=\mathcal Y=L^2[0,1]\) and \(\Theta (c) =\frac{1}{2} \Vert c\Vert _2^2\); c, f reconstruction results by Algorithm 1 with \(\mathcal X=L^2[0,1]\), \(\mathcal Y=L^{1.1}[0,1]\) and \(\Theta (c) =\frac{1}{2} \Vert c\Vert _2^2\)
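
The following is a hypothetical recipe for generating data of the kind shown in Fig. 1a and d: Gaussian noise on all points, plus gross errors on roughly 10% of them for the outlier case. The noise magnitudes, the outlier mechanism and the random seed are illustrative only; the paper does not specify them.

```python
import numpy as np

def add_noise_with_outliers(u, sigma=0.01, outlier_fraction=0.10,
                            outlier_scale=0.5, seed=0):
    """Add Gaussian noise to u; if outlier_fraction > 0, additionally perturb a
    random subset of points by much larger gross errors (the outliers)."""
    rng = np.random.default_rng(seed)
    noisy = u + sigma * rng.standard_normal(u.shape)
    idx = rng.choice(u.size, size=int(outlier_fraction * u.size), replace=False)
    noisy.flat[idx] += outlier_scale * rng.standard_normal(idx.size)
    return noisy
```

Calling this routine with `outlier_fraction=0` would mimic the data in Fig. 1a, and with the default 10% fraction the data in Fig. 1d.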

Example 5.2

We next consider the two-dimensional problem on \(\Omega = [0,1]\times [0,1]\) with the source term

$$\begin{aligned} f(x,y) = 200 e^{-10(x-0.5)^2-10(y-0.5)^2} \end{aligned}$$

and the boundary data \(g \equiv 1\) on \(\partial \Omega \). The sought solution is a piecewise constant function as shown in Fig. 2a. We reconstruct the sought solution using Algorithm 1 with \(\mathcal X=\mathcal Y=L^2(\Omega )\) and different choices of \(\Theta \). In order to carry out the computation, we divide \(\Omega \) into \(100\times 100\) small squares of equal size. All partial differential equations involved are solved approximately by a finite difference method. When using Algorithm 1, we use \(\tau =1.01\), \(\mu _0=0.90\), \(\mu _1 = 0.96\) and take \(\xi _0=c_0=0\) as an initial guess. The minimization problem determining \(c_n^\delta \) for each \(n\ge 1\) is solved by a restarted CG method terminated after 1000 iterations.

Fig. 2

Reconstruction results for Example 5.2: a exact solution; b Algorithm 1 with \(\Theta (c) = \frac{1}{2}\Vert c\Vert _2^2\); c Algorithm 1 with \(\Theta (c) =\frac{1}{2}\Vert c\Vert _2^2 +|c|_{TV, \varepsilon }\) and \(\varepsilon = 10^{-4}\)

In Fig. 2 we report the numerical results using measurement data corrupted by Gaussian noise with noise level \(\delta =0.001\). Figure 2b presents the reconstruction result using \(\Theta (c) =\frac{1}{2} \Vert c\Vert _{L^2}^2\); due to the over-smoothing effect of this quadratic penalty, the reconstruction contains unsatisfactory artifacts. Figure 2c reports the reconstruction result using \(\Theta (c) = \lambda \Vert c\Vert _{L^2}^2 +|c|_{TV, \varepsilon }\) with \(\lambda = 1/2\), where \(\varepsilon = 10^{-4}\) and \(|c|_{TV, \varepsilon }= \int _\Omega \sqrt{|\nabla c|^2+\varepsilon }\), which can be regarded as a smoothed approximation of the total variation functional \(\int _\Omega |\nabla c|\). Clearly, the result in (c) significantly improves on the one in (b) by removing the undesired artifacts.
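
For readers who wish to experiment with the smoothed total variation term \(|c|_{TV, \varepsilon }\), the following is a minimal sketch of how its value and gradient might be evaluated on a uniform grid using forward differences with a zero difference at the last row and column. The discretization, the boundary handling and the grid spacing `h` are choices of this sketch; the paper does not specify them.

```python
import numpy as np

def smoothed_tv(c, eps=1e-4, h=1.0):
    """Value and gradient of the Riemann sum h^2 * sum sqrt(|grad c|^2 + eps)."""
    c = np.asarray(c, dtype=float)
    dx = np.zeros_like(c)
    dy = np.zeros_like(c)
    dx[:-1, :] = (c[1:, :] - c[:-1, :]) / h          # forward differences in x
    dy[:, :-1] = (c[:, 1:] - c[:, :-1]) / h          # forward differences in y
    w = np.sqrt(dx**2 + dy**2 + eps)
    value = h**2 * w.sum()
    px, py = dx / w, dy / w
    grad = -np.diff(px, axis=0, prepend=0) / h       # adjoint of the x-difference
    grad -= np.diff(py, axis=1, prepend=0) / h       # adjoint of the y-difference
    return value, h**2 * grad
```

The returned gradient is simply the Euclidean gradient with respect to the grid values of c, which is what a BFGS or CG routine for the inner minimization would require.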

Example 5.3

We use the same setup as in Example 5.2, but now the sought solution is sparse. The domain \(\Omega \) is divided into \(120\times 120\) small squares of equal size in order to solve the associated partial differential equations. In Fig. 3 we report the reconstruction results of Algorithm 1 using measurement data contaminated by Gaussian noise with noise level \(\delta =0.001\). We use \(\tau =1.001\), \(\mu _0 =0.90\), \(\mu _1=0.96\) and take \(c_0=\xi _0=0\) as an initial guess. The minimization problems involved in the algorithm are again solved by a restarted CG method terminated after 300 iterations. The true solution is plotted in Fig. 3a. Figure 3b presents the numerical result of Algorithm 1 using \(\Theta (c) =\frac{1}{2}\Vert c\Vert _{L^2}^2\). Figure 3c reports the numerical result of Algorithm 1 using \(\Theta (c) =\lambda \Vert c\Vert _{L^2}^2 +\int _\Omega \sqrt{|c|^2+\varepsilon }\) with \(\lambda = 0.01\) and \(\varepsilon =10^{-4}\), where the term \(\int _\Omega \sqrt{|c|^2+\varepsilon }\) can be regarded as a smoothed approximation of the \(L^1\) norm. A comparison of the results in (b) and (c) clearly shows that the sparsity of the sought solution is captured much better in (c). Therefore, a proper use of a convex penalty close to the \(L^1\) norm can dramatically improve the reconstruction of sparse solutions.
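
The sparsity-promoting term used here differs from the smoothed total variation term of Example 5.2 only in that the gradient of c is replaced by c itself. A minimal sketch of its value and gradient, with an illustrative grid weight `h`, is as follows.

```python
import numpy as np

def smoothed_l1(c, eps=1e-4, h=1.0):
    """Value and gradient of the Riemann sum h^2 * sum sqrt(|c|^2 + eps)."""
    c = np.asarray(c, dtype=float)
    w = np.sqrt(c**2 + eps)
    return h**2 * w.sum(), h**2 * (c / w)
```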

Fig. 3

Reconstruction results for Example 5.3: a exact solution; b Algorithm 1 with \(\Theta (c) = \frac{1}{2}\Vert c\Vert _{L^2}^2\); c Algorithm 1 with \(\Theta (c) =\lambda \Vert c\Vert _{L^2}^2 +\int _\Omega \sqrt{|c|^2+\varepsilon }\), where \(\lambda =0.01\) and \(\varepsilon = 10^{-4}\)