Abstract
We propose a Levenberg–Marquardt method with general uniformly convex regularization terms for solving nonlinear inverse problems in Banach spaces, which extends the scheme proposed by Hanke (Inverse Probl 13:79–95, 1997) in the Hilbert space setting. The method is designed so that it can capture features of the sought solutions such as sparsity or piecewise constancy. It can also handle data contaminated by noise containing outliers. Using tools from convex analysis in Banach spaces, we establish the convergence of the method. Numerical simulations are reported to test its performance.
1 Introduction
Inverse problems arise in many applications in the natural sciences. Most inverse problems are ill-posed in the sense that their solutions do not depend continuously on the data. Thus, the stable reconstruction of solutions of inverse problems requires regularization techniques [2, 8, 25].
For solving nonlinear inverse problems in Hilbert spaces, Hanke introduced in [11] his regularizing Levenberg–Marquardt scheme, a stable iterative procedure in which the Lagrangian multipliers of the Levenberg–Marquardt method are updated by an adaptive strategy. To describe this method more precisely, consider nonlinear inverse problems that can be formulated in the form
$$\begin{aligned} F(x) = y, \end{aligned}$$
(1.1)
where \(F: \mathcal X\rightarrow \mathcal Y\) is a nonlinear Fréchet differentiable operator between two Hilbert spaces \(\mathcal X\) and \(\mathcal Y\) whose Fréchet derivative at x is denoted by \(F'(x)\). Assuming that \(y^\delta \) is the only available noisy data, the regularizing Levenberg–Marquardt scheme in [11] constructs the next iterate \(x_{n+1}\) from a current iterate \(x_n\) by first regularizing the linearized equation
$$\begin{aligned} F'(x_n)(x-x_n) = y^\delta - F(x_n) \end{aligned}$$
at \(x_n\) via the Tikhonov regularization
$$\begin{aligned} x_n(\alpha , y^\delta ) = \arg \min _{x\in \mathcal X} \left\{ \Vert y^\delta -F(x_n)-F'(x_n)(x-x_n)\Vert ^2 + \alpha \Vert x-x_n\Vert ^2\right\} \end{aligned}$$
(1.2)
using a quadratic penalty term, and then defining \(x_{n+1}: = x_n(\alpha _n, y^\delta )\) with \(\alpha _n>0\) being a parameter satisfying
$$\begin{aligned} \Vert y^\delta -F(x_n)-F'(x_n)(x_n(\alpha _n, y^\delta )-x_n)\Vert = \mu \Vert y^\delta -F(x_n)\Vert , \end{aligned}$$
(1.3)
where \(0<\mu <1\) is a preassigned number. It has been shown in [11] that this defines a regularization method as long as the iteration is terminated by the discrepancy principle. The regularizing Levenberg–Marquardt scheme has had a far-reaching impact on the development of iterative regularization methods for solving nonlinear inverse problems and has stimulated considerable subsequent work, see [13, 17, 18, 27, 30] and the references therein.
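In finite dimensions, one step of this scheme can be sketched as follows: the quadratic subproblem of type (1.2) is solved in closed form and the parameter is found by bisection, exploiting the monotonicity of the linearized residual in the regularization parameter. The toy operator F, its Jacobian J, and all numerical values below are illustrative assumptions, not examples from the paper.

```python
import numpy as np

def lm_step(F, J, x, y_delta, mu=0.7, tol=1e-10):
    """One step of the regularizing Levenberg-Marquardt scheme in the
    Hilbert space setting: minimize the Tikhonov subproblem (1.2) in
    closed form and pick alpha by the adaptive condition (1.3)."""
    r = y_delta - F(x)                  # residual y^delta - F(x_n)
    A = J(x)                            # Jacobian F'(x_n)
    n = x.size

    def step(alpha):
        # closed-form minimizer of ||r - A h||^2 + alpha ||h||^2
        return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ r)

    def lin_res(alpha):
        # linearized residual ||y^delta - F(x_n) - A h(alpha)||,
        # monotonically increasing in alpha
        return np.linalg.norm(r - A @ step(alpha))

    target = mu * np.linalg.norm(r)     # right-hand side of (1.3)
    lo, hi = 1e-12, 1.0
    while lin_res(hi) < target:         # bracket the root of (1.3)
        hi *= 2.0
    while hi - lo > tol * hi:           # bisection on the monotone residual
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lin_res(mid) < target else (lo, mid)
    alpha = 0.5 * (lo + hi)
    return x + step(alpha), alpha

# toy two-dimensional operator (an assumption, for illustration only)
F = lambda x: np.array([x[0] ** 2 + x[1], x[0] * x[1]])
J = lambda x: np.array([[2 * x[0], 1.0], [x[1], x[0]]])
x_new, alpha = lm_step(F, J, np.array([1.0, 1.0]), np.array([2.1, 0.9]))
```

After the step, the linearized residual at the returned alpha sits (up to the bisection tolerance) at the fraction mu of the current residual, which is exactly the condition (1.3).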
In order to deal with situations where the sought solutions are sparse or piecewise constant and where the data are contaminated by general noise, one has to use sparsity-promoting functionals or total variation-like functionals as regularization terms and general fidelity terms to fit the data. This leads to the study of regularization methods in Banach spaces with general regularization terms, which has emerged as a highly active research field in recent years. The monograph [31] collects some of this research, including variational regularization of Tikhonov type and some iterative regularization methods in Banach spaces. One may further refer to [14, 18, 20–24, 26, 28] for more recent works.
The purpose of the present paper is to extend the regularizing Levenberg–Marquardt scheme of Hanke to solve nonlinear inverse problems in Banach spaces using general convex regularization terms. Thus, we will consider nonlinear inverse problems modelled by (1.1) with F being a nonlinear operator between two Banach spaces \(\mathcal X\) and \(\mathcal Y\) whose dual spaces are denoted by \(\mathcal X^*\) and \(\mathcal Y^*\) respectively. Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous, uniformly convex function chosen according to the a priori information on the features of the sought solution. Using a current iterate \((x_n, \xi _n)\in \mathcal X\times \mathcal X^*\) with \(\xi _n\in \partial \Theta (x_n)\), the subdifferential of \(\Theta \) at \(x_n\), we define
$$\begin{aligned} x_n(\alpha , y^\delta ) = \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta -F(x_n)-F'(x_n)(x-x_n)\Vert ^r+ \alpha D_{\xi _n}\Theta (x,x_n)\right\} , \end{aligned}$$
(1.4)
where \(1<r<\infty \) is a given number and \(D_{\xi _n}\Theta (x, x_n)\) denotes the Bregman distance induced by \(\Theta \) at \(x_n\) in the direction \(\xi _n\). We then define the next iterate \(x_{n+1}\) by \(x_{n+1}= x_n(\alpha _n, y^\delta )\), where \(\alpha _n>0\) is chosen such that
$$\begin{aligned} \mu _0 \Vert y^\delta - F(x_n)\Vert \le \Vert y^\delta -F(x_n)-F'(x_n) (x_n(\alpha _n, y^\delta )-x_n)\Vert \le \mu _1 \Vert y^\delta -F(x_n)\Vert \end{aligned}$$
(1.5)
with preassigned numbers \(0<\mu _0\le \mu _1<1\). This choice of \(\alpha _n\) is a relaxation of (1.3) and offers more flexibility when the root of Eq. (1.3) is difficult to determine exactly. We then update the subgradient \(\xi _n\) to \(\xi _{n+1}\in \partial \Theta (x_{n+1})\) which, according to the minimization property of \(x_{n+1}\), can be taken as
$$\begin{aligned} \xi _{n+1} = \xi _n + \frac{1}{\alpha _n} F'(x_n)^* J_r^{\mathcal Y}\left( y^\delta -F(x_n)-F'(x_n)(x_{n+1}-x_n)\right) , \end{aligned}$$
(1.6)
where \(F'(x_n)^*: \mathcal Y^*\rightarrow \mathcal X^*\) denotes the adjoint of \(F'(x_n)\) and \(J_r^{\mathcal Y}: \mathcal Y\rightarrow \mathcal Y^*\) denotes the duality mapping on \(\mathcal Y\) with gauge function \(t\rightarrow t^{r-1}\). We repeat the above procedure until a discrepancy principle is satisfied. The precise description of the above extension will be given in Algorithm 1. The main result of this paper shows that the above extension of the Levenberg–Marquardt scheme is well-defined and exhibits the regularization property when used to solve ill-posed problems.
The introduction of a general convex function \(\Theta \) into the algorithm raises many challenging issues for its convergence analysis. Unlike (1.2), whose minimizer can be determined explicitly, the minimizer of (1.4) does not have a closed form. The possible non-smoothness of \(\Theta \) and the non-Hilbertian structure of \(\mathcal X\) and \(\mathcal Y\) prevent us from using the classical techniques. Instead we have to utilize tools from convex optimization and non-smooth analysis, including the subdifferential calculus and the Bregman distance. The convergence analysis becomes even more subtle when considering the regularization property. The main obstacle comes from the stability issue: an iterative sequence constructed from noisy data can split into many possible noise-free iterative sequences as the noise level tends to zero, due to the non-unique determination of \(\alpha _n\). We overcome this difficulty by borrowing an idea from [10], based on the diagonal sequence argument, to show that all these noise-free sequences actually converge uniformly in a certain sense. On the other hand, unlike the variational regularization of Tikhonov type [31] and the non-stationary iterated Tikhonov regularization [22], whose numerical implementation requires solving several non-convex minimization problems, our method involves only convex minimization problems and therefore has the advantage of being implemented efficiently by convex optimization techniques.
Under an a priori choice of \(\{\alpha _n\}\), a Levenberg–Marquardt method was considered in [1] to solve (1.1) with \(\mathcal X=L^2(\Omega )\) and \(\mathcal Y\) a Hilbert space using the convex penalty function \(\Theta (x)= a\Vert x\Vert _{L^2}^2 + \int _\Omega |Dx|\), where \(\int _\Omega |Dx|\) denotes the total variation of x. The analysis in [1], however, is somewhat preliminary since it provides only weak convergence along subsequences. In contrast, our method chooses \(\{\alpha _n\}\) adaptively, and the whole iterative sequence converges strongly to a solution of (1.1).
In recent years the iteratively regularized Gauss–Newton method has been extended to solve nonlinear inverse problems in Banach spaces [21, 24, 26]; it defines \(\{x_n\}\) by
$$\begin{aligned} x_{n+1} = \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta -F(x_n)-F'(x_n)(x-x_n)\Vert ^r+ \alpha _n D_{\xi _0}\Theta (x,x_0)\right\} . \end{aligned}$$
This looks similar to (1.4), but the two methods are essentially different. The iteratively regularized Gauss–Newton method always defines \(x_{n+1}\) in a neighborhood of the initial guess \(x_0\), while the Levenberg–Marquardt method defines \(x_{n+1}\) in a region around \(x_n\) for each \(n\ge 0\). From the optimization point of view, the Levenberg–Marquardt method is more natural.
The rest of this paper is organized as follows. In Sect. 2 we collect some basic results from convex analysis in Banach spaces and prove a continuous dependence result of minimizers of uniformly convex functionals on various parameters. We then present in Sect. 3 the Levenberg–Marquardt method in Banach spaces with general convex regularization terms and show that the method is well-defined. The detailed convergence analysis is given in Sect. 4. Finally in Sect. 5 we report various numerical results to indicate the performance of the method.
2 Preliminaries
Let \(\mathcal X\) and \(\mathcal Y\) be two Banach spaces whose norms are denoted by \(\Vert \cdot \Vert \). We use \(\mathcal X^*\) and \(\mathcal Y^*\) to denote their dual spaces respectively. Given \(x\in \mathcal X\) and \(x^*\in \mathcal X^*\) we write \(\langle x^*, x\rangle =x^*(x)\) for the duality pair. We use “\(\rightarrow \)” and “\(\rightharpoonup \)” to denote the strong convergence and the weak convergence respectively. By \({\mathscr {L}}(\mathcal X, \mathcal Y)\) we denote the space of all continuous linear operators from \(\mathcal X\) to \(\mathcal Y\). For any \(A\in {\mathscr {L}}(\mathcal X, \mathcal Y)\), we use \(A^*: \mathcal Y^*\rightarrow \mathcal X^*\) to denote its adjoint, i.e. \(\langle A^* y^*, x\rangle =\langle y^*, A x\rangle \) for any \(x\in \mathcal X\) and \(y^*\in \mathcal Y^*\). We use \({\mathscr {N}}(A)=\{x\in \mathcal X: A x=0\}\) to denote the null space of A and define
$$\begin{aligned} {\mathscr {N}}(A)^\perp := \{x^*\in \mathcal X^*: \langle x^*, x\rangle =0 \text{ for all } x\in {\mathscr {N}}(A)\}. \end{aligned}$$
When \(\mathcal X\) is reflexive, there holds \({\mathscr {N}} (A)^\perp =\overline{{\mathscr {R}}(A^*)}\), where \({\mathscr {R}}(A^*)\) denotes the range space of \(A^*\) and \(\overline{{\mathscr {R}}(A^*)}\) denotes the closure of \({\mathscr {R}}(A^*)\) in \(\mathcal X^*\).
For a convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\), we use \({\mathscr {D}}(\Theta ):=\{x\in \mathcal X: \Theta (x)<+\infty \}\) to denote its effective domain. We call \(\Theta \) proper if \({\mathscr {D}}(\Theta )\ne \emptyset \). Given \(x\in \mathcal X\) we define
$$\begin{aligned} \partial \Theta (x):=\{\xi \in \mathcal X^*: \Theta (\bar{x}) \ge \Theta (x) + \langle \xi , \bar{x}-x\rangle \text{ for all } \bar{x}\in \mathcal X\}. \end{aligned}$$
Any element \(\xi \in \partial \Theta (x)\) is called a subgradient of \(\Theta \) at x. The multi-valued mapping \(\partial \Theta : \mathcal X\rightarrow 2^{\mathcal X^*}\) is called the subdifferential of \(\Theta \). It could happen that \(\partial \Theta (x)=\emptyset \) for some \(x\in {\mathscr {D}}(\Theta )\). Let
$$\begin{aligned} {\mathscr {D}}(\partial \Theta ):=\{x\in {\mathscr {D}}(\Theta ): \partial \Theta (x)\ne \emptyset \}. \end{aligned}$$
For \(x \in {\mathscr {D}}(\partial \Theta )\) and \(\xi \in \partial \Theta (x)\) we define
$$\begin{aligned} D_\xi \Theta (\bar{x}, x):=\Theta (\bar{x})-\Theta (x)-\langle \xi , \bar{x}-x\rangle , \quad \bar{x}\in \mathcal X, \end{aligned}$$
which is called the Bregman distance induced by \(\Theta \) at x in the direction \(\xi \). Clearly \(D_\xi \Theta (\bar{x},x)\ge 0\). By straightforward calculation one can see that
$$\begin{aligned} D_{\xi _2}\Theta (x, x_2) - D_{\xi _1}\Theta (x, x_1) = D_{\xi _2}\Theta (x_1, x_2) + \langle \xi _2-\xi _1, x_1-x\rangle \end{aligned}$$
(2.1)
for all \(x_1, x_2\in {\mathscr {D}}(\partial \Theta )\), \(\xi _1\in \partial \Theta (x_1)\), \(\xi _2\in \partial \Theta (x_2)\) and \(x\in \mathcal X\).
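For the smooth model case \(\Theta (x)=\Vert x\Vert ^2/2\), the Bregman distance and a three-point identity of the type of (2.1) can be checked numerically. The quadratic choice of \(\Theta \) (whose only subgradient at x is x itself) and the random vectors are illustrative assumptions; this is a sketch of the definitions, not part of the paper's analysis.

```python
import numpy as np

# Bregman distance D_xi Theta(xbar, x) = Theta(xbar) - Theta(x) - <xi, xbar - x>,
# sketched for Theta(x) = ||x||^2 / 2, a convenient special case.
theta = lambda x: 0.5 * np.dot(x, x)

def bregman(xbar, x, xi):
    return theta(xbar) - theta(x) - np.dot(xi, xbar - x)

rng = np.random.default_rng(0)
x, x1, x2 = rng.standard_normal((3, 4))
xi1, xi2 = x1, x2                 # gradients of ||.||^2/2 at x1, x2

# for this Theta the Bregman distance is half the squared norm distance
d = bregman(x, x1, xi1)
assert np.isclose(d, 0.5 * np.linalg.norm(x - x1) ** 2)

# a standard three-point identity relating the distances at x1 and x2
lhs = bregman(x, x2, xi2)
rhs = bregman(x, x1, xi1) + bregman(x1, x2, xi2) + np.dot(xi2 - xi1, x1 - x)
assert np.isclose(lhs, rhs)
```

The identity holds exactly for any proper convex \(\Theta \) and admissible subgradients; the quadratic case merely makes it easy to evaluate.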
The Bregman distance can be used to obtain information in terms of the Banach space norm when \(\Theta \) enjoys stronger convexity. A proper convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) is called uniformly convex if there is a strictly increasing function \(\varphi :[0, \infty ) \rightarrow [0, \infty )\) with \(\varphi (0)=0\) such that
$$\begin{aligned} \Theta (\lambda \bar{x} + (1-\lambda ) x) + \lambda (1-\lambda ) \varphi (\Vert \bar{x}-x\Vert ) \le \lambda \Theta (\bar{x}) + (1-\lambda ) \Theta (x) \end{aligned}$$
(2.2)
for all \(\bar{x}, x\in \mathcal X\) and \(\lambda \in [0,1]\). It is easily seen that if \(\Theta \) is uniformly convex in the sense of (2.2) then
$$\begin{aligned} D_\xi \Theta (\bar{x}, x) \ge \varphi (\Vert \bar{x}-x\Vert ) \end{aligned}$$
(2.3)
for all \(\bar{x} \in \mathcal X\), \(x\in {\mathscr {D}}(\partial \Theta )\) and \(\xi \in \partial \Theta (x)\). Moreover, it follows from [32, Proposition 3.5.8] that any proper, weakly lower semi-continuous, uniformly convex function \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) is coercive in the sense that
$$\begin{aligned} \lim _{\Vert x\Vert \rightarrow \infty } \frac{\Theta (x)}{\Vert x\Vert } = \infty . \end{aligned}$$
(2.4)
On a Banach space \(\mathcal X\), we consider for \(1<r<\infty \) the convex function \(x\rightarrow \Vert x\Vert ^r/r\). Its subdifferential at x is given by
$$\begin{aligned} J_r^{\mathcal X}(x):=\left\{ \xi \in \mathcal X^*: \Vert \xi \Vert = \Vert x\Vert ^{r-1} \text{ and } \langle \xi , x\rangle =\Vert x\Vert ^r\right\} , \end{aligned}$$
which gives the duality mapping \(J_r^{\mathcal X}: \mathcal X\rightarrow 2^{\mathcal X^*}\) with gauge function \(t\rightarrow t^{r-1}\). We call \(\mathcal X\) uniformly convex if its modulus of convexity
$$\begin{aligned} \delta _{\mathcal X}(t):=\inf \left\{ 2-\Vert \bar{x}+x\Vert : \Vert \bar{x}\Vert =\Vert x\Vert =1, \ \Vert \bar{x}-x\Vert \ge t\right\} \end{aligned}$$
satisfies \(\delta _{\mathcal X}(t)>0\) for all \(0 < t\le 2\). We call \(\mathcal X\) uniformly smooth if its modulus of smoothness
$$\begin{aligned} \rho _{\mathcal X}(s):=\sup \left\{ \Vert \bar{x}+x\Vert +\Vert \bar{x}-x\Vert -2: \Vert \bar{x}\Vert =1, \ \Vert x\Vert \le s\right\} \end{aligned}$$
satisfies \(\lim _{s\searrow 0} \frac{\rho _{\mathcal X}(s)}{s} =0\). It is well known [6] that any uniformly convex or uniformly smooth Banach space is reflexive. On a uniformly smooth Banach space \(\mathcal X\), every duality mapping \(J_r^{\mathcal X}\) with \(1<r<\infty \) is single valued and uniformly continuous on bounded sets. Furthermore, on a uniformly convex Banach space, any sequence \(\{x_n\}\) satisfying \(x_n \rightharpoonup x\) and \(\Vert x_n\Vert \rightarrow \Vert x\Vert \) must satisfy \(x_n\rightarrow x\) as \(n\rightarrow \infty \). This property can be generalized for uniformly convex functions which we state in the following result.
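On \(\mathbb {R}^n\) equipped with the \(\ell ^p\) norm, a finite-dimensional stand-in for \(L^p\) with \(1<p<\infty \), the duality mapping with gauge \(t\rightarrow t^{r-1}\) has an explicit formula. The following sketch (the specific p, r and the test vector are illustrative assumptions) evaluates it and checks it against the two defining properties of \(J_r^{\mathcal X}\).

```python
import numpy as np

def duality_map(x, p=1.5, r=2.0):
    """Duality mapping J_r with gauge t -> t^{r-1} on R^n with the l^p
    norm (p in (1, inf), so the map is single valued).  It satisfies
    <J_r x, x> = ||x||_p^r and ||J_r x||_q = ||x||_p^{r-1}, q = p/(p-1)."""
    norm = np.linalg.norm(x, ord=p)
    if norm == 0.0:
        return np.zeros_like(x)
    # componentwise: J_p(x)_i = sign(x_i)|x_i|^{p-1}, then rescale the gauge
    return norm ** (r - p) * np.sign(x) * np.abs(x) ** (p - 1)

x = np.array([1.0, -2.0, 0.5])
p, r = 1.5, 2.0
q = p / (p - 1)                      # conjugate exponent
jx = duality_map(x, p, r)
nx = np.linalg.norm(x, ord=p)
assert np.isclose(np.dot(jx, x), nx ** r)                 # <J_r x, x> = ||x||^r
assert np.isclose(np.linalg.norm(jx, ord=q), nx ** (r - 1))
```

For p = r the gauge rescaling drops out and the mapping reduces to the familiar componentwise formula \(x_i \mapsto \mathrm {sign}(x_i)|x_i|^{p-1}\).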
Lemma 2.1
[22] Let \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous, and uniformly convex function. Then \(\Theta \) admits the Kadec property, i.e. for any sequence \(\{x_n\}\subset \mathcal X\) satisfying \(x_n \rightharpoonup x\in \mathcal X\) and \(\Theta (x_n) \rightarrow \Theta (x)<\infty \) there holds \(x_n\rightarrow x\) as \(n\rightarrow \infty \).
We conclude this section with a result on the continuous dependence of minimizers of uniformly convex cost functionals on various parameters. This result is crucial in the forthcoming sections.
Lemma 2.2
Let \(\mathcal X\) and \(\mathcal Y\) be Banach spaces with \(\mathcal X\) being reflexive. Let \(\{A(x): \mathcal X\rightarrow \mathcal Y\}_{x\in {\mathcal D}}\subset {\mathscr {L}}(\mathcal X, \mathcal Y)\) be such that \(x\rightarrow A(x)\) is continuous on \({\mathcal D}\subset \mathcal X\). Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous, uniformly convex function. Assume that the sequences \(\{\alpha ^{(\ell )}\}\subset (0, \infty )\), \(\{b^{(\ell )}\}\subset \mathcal Y\), \(\{x^{(\ell )}\}\subset {\mathcal D}\) and \(\{\xi ^{(\ell )}\}\subset \mathcal X^*\) with \(\xi ^{(\ell )}\in \partial \Theta (x^{(\ell )})\) satisfy
for some \(\bar{\alpha }>0\), \(\bar{b} \in \mathcal Y\), \(\bar{x} \in {\mathcal D}\) and \(\bar{\xi }\in \mathcal X^*\) with \(\bar{\xi }\in \partial \Theta (\bar{x})\). For \(1\le r<\infty \) let
Then \(z^{(\ell )} \rightarrow \bar{z}\) and \(\Theta (z^{(\ell )}) \rightarrow \Theta (\bar{z})\) as \(\ell \rightarrow \infty \), where
Proof
It is easy to see that \(z^{(\ell )}\) and \(\bar{z}\) are uniquely defined since the corresponding cost functionals are weakly lower semi-continuous and uniformly convex and hence coercive, see (2.4). Because \(\xi ^{(\ell )}\in \partial \Theta (x^{(\ell )})\) and \(\bar{\xi }\in \partial \Theta (\bar{x})\), we may use the condition (2.5) to obtain
Therefore
To show \(z^{(\ell )} \rightarrow \bar{z}\) we will adapt the argument in [9, 16]. By the definition of \(z^{(\ell )}\) we have
By the given conditions, the right hand side is bounded and thus \(\{D_{\xi ^{(\ell )}}\Theta (z^{(\ell )}, x^{(\ell )})\}\) is bounded. Consequently, \(\{z^{(\ell )}\}\) is bounded in \(\mathcal X\) by the uniform convexity of \(\Theta \). Since \(\mathcal X\) is reflexive, \(\{z^{(\ell )}\}\) has a subsequence, denoted by the same notation, such that \(z^{(\ell )} \rightharpoonup z_*\) as \(\ell \rightarrow \infty \) for some \(z_*\in \mathcal X\). Since \(x\rightarrow A(x)\) is continuous and \(A(\bar{x})\in {\mathscr {L}}(\mathcal X, \mathcal Y)\), we have \(A(x^{(\ell )}) z^{(\ell )} \rightharpoonup A(\bar{x}) z_*\) as \(\ell \rightarrow \infty \). By using (2.6) and the weak lower semi-continuity of \(\Theta \) and the Banach space norm, it follows that
Consequently
where the last inequality uses the minimality of \(z^{(\ell )}\) and the last equality uses (2.5) and (2.6). By using the minimality and uniqueness of \(\bar{z}\), we can conclude \(z_*=\bar{z}\) and
Next we show that
According to (2.8) and \(z_*=\bar{z}\), it suffices to show that
Suppose this is not true, i.e. \(\gamma _0>\gamma _1\). By taking a subsequence if necessary, we may assume that
Then from (2.9) it follows that
which is a contradiction to (2.7). We thus obtain (2.10).
In view of (2.10), (2.5) and \(z^{(\ell )} \rightharpoonup \bar{z}\) as \(\ell \rightarrow \infty \) we have
This together with (2.6) implies that \(\lim _{\ell \rightarrow \infty } \Theta (z^{(\ell )}) =\Theta (\bar{z})\). Since \(z^{(\ell )} \rightharpoonup \bar{z}\), we may use Lemma 2.1 to conclude that \(z^{(\ell )}\rightarrow \bar{z}\) as \(\ell \rightarrow \infty \). \(\square \)
3 The method
We consider Eq. (1.1) arising from nonlinear inverse problems, where \(F: {\mathscr {D}}(F)\subset \mathcal X\rightarrow \mathcal Y\) is a nonlinear operator between two Banach spaces \(\mathcal X\) and \(\mathcal Y\) with domain \({\mathscr {D}}(F)\). We will assume that (1.1) has a solution. In general (1.1) may have many solutions; in order to find the desired one, some selection criterion should be enforced. According to the a priori information on the sought solution, we choose a proper, weakly lower semi-continuous, uniformly convex function \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\). Taking \(x_0\in {\mathscr {D}}(\Theta ) \cap {\mathscr {D}}(F)\) and \(\xi _0\in \partial \Theta (x_0)\) as the initial guess, we define \(x^\dag \) to be the solution of (1.1) with the property
$$\begin{aligned} D_{\xi _0}\Theta (x^\dag , x_0) = \min \left\{ D_{\xi _0}\Theta (x, x_0): x\in {\mathscr {D}}(F)\cap {\mathscr {D}}(\Theta ) \text{ and } F(x)=y\right\} . \end{aligned}$$
(3.1)
We are interested in developing algorithms to find the solution \(x^\dag \) of (1.1). We will work under the following standard conditions.
Assumption 1
-
(a)
\(\mathcal X\) is a reflexive Banach space and \(\mathcal Y\) is a uniformly smooth Banach space;
-
(b)
F is weakly closed on \({\mathscr {D}}(F)\), i.e. for any sequence \(\{x_n\} \subset {\mathscr {D}}(F)\) satisfying \(x_n\rightharpoonup x\in \mathcal X\) and \(F(x_n)\rightharpoonup v \in \mathcal Y\) there hold \(x\in {\mathscr {D}}(F)\) and \(F(x) = v\);
-
(c)
There is \(\rho >0\) such that \(B_{2\rho }(x_0)\subset {\mathscr {D}}(F)\) and (1.1) has a solution in \(B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\), where \(B_{\rho }(x_0): = \{x\in \mathcal X:\ \Vert x-x_0\Vert \le \rho \}\);
-
(d)
There exists \(\{L(x): \mathcal X\rightarrow \mathcal Y\}_{x\in B_{2\rho }(x_0)}\subset {\mathscr {L}}(\mathcal X, \mathcal Y)\) such that \(x\rightarrow L(x)\) is continuous on \(B_{2\rho }(x_0)\) and there is \(0\le \eta <1\) such that
$$\begin{aligned} \Vert F(\bar{x})-F(x)-L(x)(\bar{x}-x)\Vert \le \eta \Vert F(\bar{x})-F(x)\Vert \end{aligned}$$
for all \(\bar{x}, x\in B_{2\rho }(x_0)\).
In Assumption 1, the uniform smoothness of \(\mathcal Y\) in (a) guarantees that the duality mapping \(J_r^{\mathcal Y}\) is single-valued and continuous for each \(1<r<\infty \). We do not require F to be Fréchet differentiable; in case F is Fréchet differentiable, we may take \(L(x) =F'(x)\), where \(F'(x)\) denotes the Fréchet derivative of F at x. The condition in (d) is the so-called tangential cone condition which has been widely used in the analysis of regularization methods for nonlinear inverse problems and has been verified for several important applications [11–13, 20, 22–24, 26, 30]. How to replace the tangential cone condition by a weaker condition is a challenging issue. Under certain smoothness conditions on the solution, a class of Newton methods in Hilbert spaces has been shown in [19] to be order optimal under merely a Lipschitz condition on the Fréchet derivative of F when the methods are terminated by a discrepancy principle. How to extend such a result to the Banach space setting remains an open problem. In view of (d) in Assumption 1, it is easily seen that
$$\begin{aligned} \frac{1}{1+\eta } \Vert L(x)(\bar{x}-x)\Vert \le \Vert F(\bar{x})-F(x)\Vert \le \frac{1}{1-\eta } \Vert L(x)(\bar{x}-x)\Vert , \quad \bar{x}, x\in B_{2\rho }(x_0), \end{aligned}$$
which shows that \(x\rightarrow F(x)\) is continuous on \(B_{2\rho }(x_0)\). When \(\mathcal X\) is a reflexive Banach space, by using the uniform convexity of \(\Theta \) and weak closedness of F, it is standard to show that \(x^\dag \) exists. Moreover, [20, Lemma 3.2] gives the following local uniqueness result.
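The tangential cone condition can be verified numerically for simple model operators. The following sketch uses the toy operator \(F(x)_i = x_i^2\) with \(L(x)=\mathrm {diag}(2x_i)\) (an illustrative assumption, not an application from the paper): componentwise, the linearization error is \((\bar{x}_i-x_i)^2\) while \(F(\bar{x})_i-F(x)_i=(\bar{x}_i+x_i)(\bar{x}_i-x_i)\), so on a ball where all components stay in [0.75, 1.25] the condition holds with \(\eta =1/3\).

```python
import numpy as np

# Monte-Carlo check of the tangential cone condition (d) for F(x)_i = x_i^2
# near x0 = (1, ..., 1); eta = 1/3 works since |xb_i - x_i| <= 0.5 and
# |xb_i + x_i| >= 1.5 on the sampled ball.
rng = np.random.default_rng(1)
x0 = np.ones(5)
eta = 1.0 / 3.0
F = lambda x: x ** 2
L = lambda x, h: 2.0 * x * h        # action of the derivative at x on h

for _ in range(1000):
    xb = x0 + rng.uniform(-0.25, 0.25, size=5)
    x = x0 + rng.uniform(-0.25, 0.25, size=5)
    lhs = np.linalg.norm(F(xb) - F(x) - L(x, xb - x))
    rhs = eta * np.linalg.norm(F(xb) - F(x))
    assert lhs <= rhs + 1e-12
```

Such a check is of course no proof; it only illustrates that the condition is a local restriction on how far F may deviate from its linearization relative to the change in F itself.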
Lemma 3.1
Let Assumption 1 hold. If \(x^\dag \in B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\), then \(x^\dag \) is the unique solution of (1.1) in \(B_{2\rho }(x_0) \cap {\mathscr {D}}(\Theta )\) satisfying (3.1).
In practical applications, instead of y we only have noisy data \(y^\delta \) satisfying
$$\begin{aligned} \Vert y^\delta - y\Vert \le \delta \end{aligned}$$
(3.2)
with a small noise level \(\delta >0\). We will use \(y^\delta \) to construct an approximate solution to (1.1). To formulate our Levenberg–Marquardt algorithm, assuming \(x_n^\delta \) has been constructed we consider the linearized equation
$$\begin{aligned} L(x_n^\delta )(x - x_n^\delta ) = y^\delta - F(x_n^\delta ) \end{aligned}$$
and apply to it the Tikhonov regularization whose regularization term is the Bregman distance induced by \(\Theta \) at \(x_n^\delta \). The next iterate \(x_{n+1}^\delta \) is then constructed by choosing the regularization parameter adaptively. This leads to the following Levenberg–Marquardt algorithm.
Algorithm 1
(Levenberg–Marquardt method with noisy data)
-
1.
Take \(x_0\in \mathcal X\) and \(\xi _0\in \mathcal X^*\) such that \(\xi _0 \in \partial \Theta (x_0)\). Pick \(0<\mu _0\le \mu _1<1\) and \(\tau >1\).
-
2.
Let \(x_0^\delta :=x_0\) and \(\xi _0^\delta := \xi _0\). Assuming that \(x_n^\delta \) and \(\xi _n^\delta \) are well-defined, we define \(x_{n+1}^\delta \) and \(\xi _{n+1}^\delta \) as follows:
-
(a)
For each \(\alpha >0\) we define \(x_n(\alpha , y^\delta )\) and \(\xi _n(\alpha , y^\delta )\) as
$$\begin{aligned} x_n(\alpha , y^\delta )= & {} \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x-x_n^\delta )\Vert ^r+ \alpha D_{\xi _n^\delta }\Theta (x,x_n^\delta )\right\} ,\\ \xi _n(\alpha , y^\delta )= & {} \xi _n^\delta +\frac{1}{\alpha } L(x_n^\delta )^* J_r^{\mathcal Y}\left( y^\delta -F(x_n^\delta )-L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\right) ; \end{aligned}$$ -
(b)
Take \(\alpha _n(y^\delta ) >0\) to be a number \(\alpha \) such that
$$\begin{aligned} \mu _0 \Vert y^\delta - F(x_n^\delta )\Vert \le \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta ) (x_n(\alpha , y^\delta )-x_n^\delta )\Vert \le \mu _1 \Vert y^\delta -F(x_n^\delta )\Vert ; \end{aligned}$$ -
(c)
Define \(x_{n+1}^\delta :=x_n(\alpha _n(y^\delta ), y^\delta )\) and \(\xi _{n+1}^\delta := \xi _n(\alpha _n(y^\delta ), y^\delta )\).
-
3.
Let \(n_\delta \) be the first integer such that \(\Vert y^\delta -F(x_{n_\delta }^\delta )\Vert \le \tau \delta \) and use \(x_{n_\delta }^\delta \) as an approximate solution.
When \(\mathcal X\) and \(\mathcal Y\) are Hilbert spaces, \(r=2\) and \(\Theta (x) = \Vert x\Vert ^2/2\), Algorithm 1 reduces to the regularizing Levenberg–Marquardt scheme in [11] and each minimizer \(x_n(\alpha , y^\delta )\) can be written explicitly. In the setting of Algorithm 1 in Banach spaces with general convex regularization terms, \(x_n(\alpha , y^\delta )\) does not have an explicit formula. This increases the difficulty of the convergence analysis. By making use of tools from convex analysis, we show in this section that Algorithm 1 is well-defined, and in Sect. 4 we show that \(x_{n_\delta }^\delta \) indeed converges to a solution of (1.1) as \(\delta \rightarrow 0\).
In Algorithm 1, we need to pick \(\xi _0\in \mathcal X^*\) and \(x_0\in \mathcal X\) such that \(\xi _0 \in \partial \Theta (x_0)\). This can be achieved as follows: pick \(\xi _0\in \mathcal X^*\) and define
$$\begin{aligned} x_0 = \arg \min _{x\in \mathcal X} \left\{ \Theta (x) - \langle \xi _0, x\rangle \right\} ; \end{aligned}$$
then \(\xi _0\in \partial \Theta (x_0)\) holds automatically; in applications, we usually have \(\Theta \ge 0\) and \(\Theta (0)=0\), then we can simply take \(x_0=0\) and \(\xi _0=0\).
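For a concrete illustration of this initialization, consider the sparsity-promoting choice \(\Theta (x)=\beta \Vert x\Vert _1+\Vert x\Vert ^2/2\) (an assumed example penalty, in the spirit of the features discussed in the introduction; \(\beta \) and \(\xi _0\) below are illustrative values): minimizing \(\Theta (x)-\langle \xi _0, x\rangle \) reduces to soft-thresholding of \(\xi _0\), and the relation \(\xi _0\in \partial \Theta (x_0)\) can be verified componentwise.

```python
import numpy as np

# Initialization x0 = argmin { Theta(x) - <xi0, x> } for the example
# penalty Theta(x) = beta*||x||_1 + ||x||^2/2: the minimizer is the
# soft-thresholding of xi0 at level beta.
beta = 0.3
xi0 = np.array([0.9, -0.1, -0.5, 0.2])
x0 = np.sign(xi0) * np.maximum(np.abs(xi0) - beta, 0.0)  # soft threshold

# verify xi0 in dTheta(x0), i.e. xi0 - x0 lies in beta * d||.||_1 at x0
for xi, xc in zip(xi0, x0):
    if xc != 0.0:
        assert np.isclose(xi - xc, beta * np.sign(xc))
    else:
        assert abs(xi) <= beta
```

Components of \(\xi _0\) with magnitude at most \(\beta \) are mapped to zero, so this initialization already produces a sparse \(x_0\).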
From the definition of \(x_n(\alpha , y^\delta )\) in Algorithm 1 we can see that
The definition of \(\xi _n(\alpha , y^\delta )\) is exactly motivated by this fact so that
$$\begin{aligned} \xi _n(\alpha , y^\delta )\in \partial \Theta (x_n(\alpha , y^\delta )). \end{aligned}$$
Moreover, by the minimality of \(x_n(\alpha , y^\delta )\), we always have
$$\begin{aligned} \frac{1}{r} \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_n(\alpha , y^\delta )-x_n^\delta )\Vert ^r + \alpha D_{\xi _n^\delta }\Theta (x_n(\alpha , y^\delta ), x_n^\delta ) \le \frac{1}{r} \Vert y^\delta -F(x_n^\delta )\Vert ^r. \end{aligned}$$
In order to prove that Algorithm 1 is well-defined, we need to show that the number \(\alpha _n(y^\delta )\) used to define \(x_{n+1}^\delta \) from \(x_n^\delta \) exists, each \(x_n^\delta \) is in \({\mathscr {D}}(F)\), and the iteration terminates after \(n_\delta <\infty \) steps. We achieve these via a series of results.
Lemma 3.2
Let \(\mathcal X\) be reflexive and let \(\Theta :\mathcal X\rightarrow (-\infty , \infty ]\) be proper, weakly lower semi-continuous and uniformly convex. Then, for each \(\alpha >0\), \(x_n(\alpha , y^\delta )\) is uniquely determined. Moreover, the mapping \(\alpha \rightarrow x_n(\alpha , y^\delta )\) is continuous over \((0, \infty )\), and the function
$$\begin{aligned} \alpha \rightarrow \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_n(\alpha , y^\delta )-x_n^\delta )\Vert \end{aligned}$$
(3.4)
is continuous and monotonically increasing over \((0, \infty )\).
Proof
All assertions, except the monotonicity of (3.4), follow from Lemma 2.2. The monotonicity of (3.4) can be proved by a standard argument, see [3, Lemma 9.2.1] or [7, Lemma 6.1] for instance. \(\square \)
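Since the function (3.4) is continuous and monotonically increasing in \(\alpha \), the relaxed condition in step 2(b) of Algorithm 1 can be located by a simple bracketing search. The following sketch assumes only a callable returning the linearized residual; the concrete residual model at the end is an illustrative assumption, not one arising from the paper.

```python
def find_alpha(lin_res, res_norm, mu0=0.5, mu1=0.9, alpha0=1.0, max_iter=200):
    """Bracketing search for alpha in step 2(b): lin_res is the
    continuous, monotonically increasing linearized residual (3.4),
    res_norm is ||y^delta - F(x_n^delta)||.  Any alpha whose residual
    lands in the band [mu0, mu1] * res_norm is acceptable."""
    lo = hi = alpha0
    while lin_res(lo) > mu1 * res_norm:   # shrink until at or below the band
        lo /= 2.0
    while lin_res(hi) < mu0 * res_norm:   # grow until at or above the band
        hi *= 2.0
    for _ in range(max_iter):
        mid = (lo * hi) ** 0.5            # geometric mean: alpha spans scales
        v = lin_res(mid)
        if mu0 * res_norm <= v <= mu1 * res_norm:
            return mid
        lo, hi = (mid, hi) if v < mu0 * res_norm else (lo, mid)
    raise RuntimeError("no admissible alpha found")

# toy monotone residual model (an illustrative assumption):
# lin_res(alpha) = R*alpha/(1+alpha) increases from 0 to R
R = 2.0
alpha_n = find_alpha(lambda a: R * a / (1 + a), R)
```

The band \([\mu _0, \mu _1]\) is precisely what makes such an inexact search sufficient: unlike the equality condition (1.3), no root needs to be computed exactly.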
Lemma 3.3
Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, weakly lower semi-continuous function that is uniformly convex in the sense of (2.2). Let Assumption 1 hold with \(0\le \eta <1\). Let \(\eta <\mu _0<1\) and \(\tau >(1+\eta )/(\mu _0-\eta )\). Assume that \(x_n^\delta \) and \(\xi _n^\delta \) are well-defined for some \(0\le n<n_\delta \) with
Then for any \(\alpha >0\) such that
there hold \(x_n(\alpha , y^\delta ) \in B_{2\rho }(x_0)\) and
for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\), where \(c_0:=1-(1+\eta +\tau \eta )/(\tau \mu _0)\).
Proof
For simplicity of exposition, we write
By using the identity (2.1) and the nonnegativity of the Bregman distance, we obtain
By the definition of \(\xi _n(\alpha )\) we then have
We can write
Then, by virtue of the property of the duality mapping \(J_r^{\mathcal Y}\), we obtain
In view of (3.5) and (2.3), we have \(\Vert x_n^\delta -x^\dag \Vert \le \rho \) and \(\Vert x^\dag -x_0\Vert \le \rho \) which implies that \(x_n^\delta \in B_{2\rho }(x_0)\). Thus we may use (3.2) and Assumption 1(d) to derive that
Since \(n<n_\delta \) we have \(\Vert F(x_n^\delta ) -y^\delta \Vert \ge \tau \delta \). Thus
Therefore
In view of the inequality (3.6), we thus obtain
where \(c_0:=1-(1+\eta +\tau \eta )/(\tau \mu _0)\). According to the conditions on \(\mu _0\) and \(\tau \), we have \(c_0>0\). Thus, in view of (3.6) again, we obtain (3.7).
Finally, by using (3.7) with \(\hat{x} = x^\dag \) and (3.5) we have
This together with (2.3) and \(\Vert x_0-x^\dag \Vert \le \rho \) implies that \(x_n(\alpha ) \in B_{2\rho }(x_0)\). \(\square \)
Proposition 3.4
Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be proper, lower semi-continuous and uniformly convex in the sense of (2.2). Let Assumption 1 hold with \(0\le \eta <1/3\) and let \(\eta <\mu _0\le \mu _1<1-2\eta \) and \(\tau >(1+\eta )/(\mu _0-\eta )\). Assume that
Then the iterates \(x_n^\delta \) are well-defined for all \(0\le n\le n_\delta \) and Algorithm 1 terminates after \(n_\delta <\infty \) iterations with \(n_\delta =O(1+|\log \delta |)\). Moreover, for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there hold
and
for all \(0\le n<n_\delta \), where \(C_0=1/(c_0 \mu _0^r)\).
Proof
We first show that if
for some \(0\le n<n_\delta \), then there exists \(\alpha _n(y^\delta )>0\) such that
where
which is continuous and monotonically increasing, see Lemma 3.2. To see this, we may use the minimality of \(x_n(\alpha , y^\delta )\) to obtain
This implies that \(\lim _{\alpha \rightarrow \infty } D_{\xi _n^\delta }\Theta (x_n(\alpha , y^\delta ), x_n^\delta ) =0\) and hence \(\lim _{\alpha \rightarrow \infty } \Vert x_n(\alpha , y^\delta )-x_n^\delta \Vert =0\) by the uniform convexity of \(\Theta \). Consequently
To show the existence of a finite \(\alpha _n(y^\delta )\) satisfying (3.13), it suffices to show that
Suppose this is not true, then
Thus, we may use Lemma 3.3 to obtain
Taking \(\alpha \rightarrow 0\) gives \(y^\delta =F(x_n^\delta )\) which is absurd since \(\Vert y^\delta -F(x_n^\delta )\Vert >\tau \delta \) for \(n<n_\delta \).
We next show (3.12) by induction on n. It is trivial for \(n=0\). Assume that it is true for \(n=m\) with \(m<n_\delta \). Since \(x_{m+1}^\delta = x_m(\alpha _m(y^\delta ), y^\delta )\) and \(\xi _{m+1}^\delta = \xi _m(\alpha _m(y^\delta ), y^\delta )\), we may use Lemma 3.3 to conclude that
which together with the induction hypothesis shows (3.12) for \(n=m+1\).
Since (3.12) holds true for all \(0\le n<n_\delta \), we may use Lemma 3.3 to obtain (3.10) and (3.11) immediately.
Finally, the finiteness of \(n_\delta \) can be proved by a standard argument from [11]. For completeness we include the argument here. By Assumption 1(d) and the definition of \(x_{n+1}^\delta \) we have for all \(0\le n<n_\delta \) that
This implies that
and hence
If \(n_\delta =\infty \), then we must have \(\Vert y^\delta -F(x_n^\delta )\Vert > \tau \delta \) for all n. But the inequality (3.15) implies \(\Vert y^\delta -F(x_n^\delta )\Vert \rightarrow 0\) as \(n\rightarrow \infty \). Therefore \(n_\delta <\infty \). Now we take \(n=n_\delta -1\) in (3.15) and obtain \(q^{n_\delta -1}\Vert y^\delta -F(x_0)\Vert >\tau \delta \). This implies \(n_\delta =O(1+|\log \delta |)\). \(\square \)
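The geometric decay used at the end of this proof can be sketched as follows; this is a reconstruction of the standard estimate from Assumption 1(d) and the upper bound in step 2(b), not a verbatim quotation of (3.14) and (3.15). By (d) and the definition of \(x_{n+1}^\delta \),

$$\begin{aligned} \Vert y^\delta -F(x_{n+1}^\delta )\Vert&\le \Vert y^\delta -F(x_n^\delta )-L(x_n^\delta )(x_{n+1}^\delta -x_n^\delta )\Vert + \eta \Vert F(x_{n+1}^\delta )-F(x_n^\delta )\Vert \\&\le \mu _1 \Vert y^\delta -F(x_n^\delta )\Vert + \eta \left( \Vert y^\delta -F(x_{n+1}^\delta )\Vert + \Vert y^\delta -F(x_n^\delta )\Vert \right) , \end{aligned}$$

so that \(\Vert y^\delta -F(x_{n+1}^\delta )\Vert \le q \Vert y^\delta -F(x_n^\delta )\Vert \) with \(q:=(\mu _1+\eta )/(1-\eta )<1\) whenever \(\mu _1<1-2\eta \). Iterating gives \(\Vert y^\delta -F(x_n^\delta )\Vert \le q^n \Vert y^\delta -F(x_0)\Vert \), and combining this with \(\Vert y^\delta -F(x_{n_\delta -1}^\delta )\Vert >\tau \delta \) yields \(n_\delta \le 1+ \log \left( \Vert y^\delta -F(x_0)\Vert /(\tau \delta )\right) /\log (1/q)\), which is the bound \(n_\delta =O(1+|\log \delta |)\).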
4 Convergence analysis
In this section we will show that Algorithm 1 is a regularization method for solving (1.1), that is, we will show that if \((x_n^\delta , \xi _n^\delta )\in \mathcal X\times \mathcal X^*\), \(0\le n\le n_\delta \), are defined by Algorithm 1 using noisy data \(y^\delta \), then \(x_{n_\delta }^\delta \) converges to \(x^\dag \) as \(y^\delta \rightarrow y\). To this end, it is necessary to investigate for each fixed n the behavior of \(x_n^\delta \) as \(y^\delta \rightarrow y\). This leads us to consider the counterpart of Algorithm 1 with exact data, which is formulated as Algorithm 2 below. Due to the non-unique determination of \(\alpha _n\), this algorithm actually defines many distinct iterative sequences. We will show that every sequence defined by Algorithm 2 is convergent. This convergence result, however, is not enough for showing the regularization property of Algorithm 1. Indeed, for each fixed \(n\ge 1\), when taking \(y^\delta \rightarrow y\), the sequence \(\{\alpha _{n-1}(y^\delta )\}\) used to define \(x_n^\delta \) from \(x_{n-1}^\delta \) may split into many convergent subsequences, and so may \(\{(x_n^\delta , \xi _n^\delta )\}\), with limits defined by Algorithm 2. This forces us to establish a uniform convergence result for all the possible sequences defined by Algorithm 2. The regularization property of Algorithm 1 is then established by using a stability result which connects Algorithms 1 and 2.
4.1 The method with exact data
We start with the formulation of the counterpart of Algorithm 1 when the exact data is used.
Algorithm 2
(Levenberg–Marquardt method with exact data)
-
1.
Let \(0<\mu _0\le \mu _1<1\) and \((x_0, \xi _0) \in \mathcal X\times \mathcal X^*\) be the same as in Algorithm 1.
-
2.
Assume that \(x_n\) and \(\xi _n\) are defined. If \(F(x_n)=y\), we define \(x_{n+1} = x_n\) and \(\xi _{n+1}=\xi _n\); otherwise, we define \(x_{n+1}\) and \(\xi _{n+1}\) as follows:
-
(a)
For each \(\alpha >0\) we define \(x_n(\alpha , y)\) and \(\xi _n(\alpha , y)\) as
$$\begin{aligned} x_n(\alpha , y)= & {} \arg \min _{x\in \mathcal X} \left\{ \frac{1}{r} \Vert y-F(x_n)-L(x_n)(x-x_n)\Vert ^r+ \alpha D_{\xi _n}\Theta (x,x_n)\right\} ,\\ \xi _n(\alpha , y)= & {} \xi _n +\frac{1}{\alpha } L(x_n)^* J_r^{\mathcal Y}\left( y-F(x_n)-L(x_n) (x_n(\alpha , y)-x_n)\right) ; \end{aligned}$$ -
(b)
Take \(\alpha _n>0\) to be a number such that
$$\begin{aligned} \mu _0 \Vert y - F(x_n)\Vert \le \Vert y-F(x_n)-L(x_n) (x_n(\alpha _n, y)-x_n)\Vert \le \mu _1 \Vert y-F(x_n)\Vert ; \end{aligned}$$ -
(c)
Define \(x_{n+1}:=x_n(\alpha _n, y)\) and \(\xi _{n+1} := \xi _n(\alpha _n, y)\).
In the formulation of Algorithm 2, we take \(\alpha _n>0\) to be any number satisfying (b) when defining \(x_{n+1}, \xi _{n+1}\) from \(x_n, \xi _n\). There may be many possible choices of \(\alpha _n\), and different choices of \(\{\alpha _n\}\) may lead to different iterative sequences. We will use \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) to denote the set of all possible sequences \(\{(x_n, \xi _n)\}\) in \(\mathcal X\times \mathcal X^*\) constructed from \((x_0, \xi _0)\) by Algorithm 2 with \(\alpha _n>0\) chosen to be any number satisfying (b). By using the same argument as in the proof of Proposition 3.4, we can obtain the following result, which shows that each sequence in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) is well-defined and admits a certain monotonicity property.
Lemma 4.1
Let Assumption 1 hold with \(0\le \eta <1\) and let \(\eta <\mu _0\le \mu _1<1\). Then any sequence \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) is well-defined and for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there hold
for all \(n\ge 0\).
In order to derive the convergence of every sequence \(\{(x_n, \xi _n)\}\) in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\), we will use the following result which gives a general convergence criterion.
Proposition 4.2
Consider the Eq. (1.1) for which Assumption 1 holds. Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous and uniformly convex function. Let \(\{x_n\}\subset B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) and \(\{\xi _n\}\subset \mathcal X^*\) be such that
(i) \(\xi _n\in \partial \Theta (x_n)\) for all n;
(ii) for any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) the sequence \(\{D_{\xi _n}\Theta (\hat{x}, x_n)\}\) is monotonically decreasing;
(iii) \(\lim _{n\rightarrow \infty } \Vert F(x_n)-y\Vert =0\);
(iv) there is a constant C such that for all \(k>n\) and any solution \(\hat{x}\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) there holds
$$\begin{aligned} |\langle \xi _k-\xi _n, x_k-\hat{x}\rangle | \le C (D_{\xi _n}\Theta (\hat{x}, x_n) -D_{\xi _k}\Theta (\hat{x}, x_k)). \end{aligned}$$(4.3)
Then there exists a solution \(x_*\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) such that
If, in addition, \(x^\dag \in B_\rho (x_0)\cap {\mathscr {D}}(\Theta )\) and \(\xi _{n+1}-\xi _n \in \overline{{\mathscr {R}}(L(x^\dag )^*)}\) for all n, then \(x_*=x^\dag \).
Proof
This result follows from [20, Proposition 3.6] and its proof. \(\square \)
Now we can prove the main convergence result on Algorithm 2.
Theorem 4.3
Let Assumption 1 hold with \(0\le \eta <1/3\) and let \(\eta <\mu _0\le \mu _1<1-2\eta \). Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be proper, lower semi-continuous and uniformly convex in the sense of (2.2), and let (3.9) be satisfied. Then for any \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) there exists a solution \(x_*\) of (1.1) in \(B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\) such that
If in addition \({\mathscr {N}}(L(x^\dag )) \subset {\mathscr {N}}(L(x))\) for all \(x \in B_{2\rho }(x_0)\cap {\mathscr {D}}(F)\), then \(x_*=x^\dag \).
Proof
We will use Proposition 4.2. By the definition of \(\xi _n\) in Algorithm 2 we always have \(\xi _n\in \partial \Theta (x_n)\) for all \(n\ge 0\), which shows (i) in Proposition 4.2. Lemma 4.1 shows (ii) in Proposition 4.2. By an argument similar to that used for deriving (3.14) we can show that
with \(q=(\mu _1+\eta )/(1-\eta )<1\). This implies (iii) in Proposition 4.2. In order to show the convergence result, it remains only to show (iv) in Proposition 4.2.
To this end, for \(0\le l <m<\infty \) we may use the definition of \(\xi _n\) and the property of the duality mapping \(J_r^{\mathcal Y}\) to obtain
By the triangle inequality \(\Vert L(x_n)(x_m-x^\dag )\Vert \le \Vert L(x_n)(x_n-x^\dag )\Vert +\Vert L(x_n)(x_m-x_n)\Vert \) and Assumption 1(d), we have
Since the inequality (4.4) implies that \(\{\Vert y-F(x_n)\Vert \}\) is monotonically decreasing, we have
Therefore
In view of (4.2) in Lemma 4.1, we obtain with \(C_1:=3(1+\eta )/(\mu _0-\eta )\) that
which shows (iv) in Proposition 4.2.
To show the last part under the condition \({\mathscr {N}}(L(x^\dag ))\subset {\mathscr {N}}(L(x))\) for \(x\in B_{2\rho }(x_0)\cap {\mathscr {D}}(\Theta )\), we observe from the definition of \(\xi _n\) that
Thus, we may use the second part of Proposition 4.2 to conclude the proof. \(\square \)
4.2 A uniform convergence result
In Theorem 4.3 we have shown the convergence of every sequence in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\). In this subsection we will strengthen this result by showing the following uniform convergence result for all sequences in \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) which will be crucial in establishing the regularization property of Algorithm 1.
Proposition 4.4
Assume all the conditions in Theorem 4.3 hold. Assume also that
Then, for any \(\varepsilon >0\), there is an integer \(n(\varepsilon )\) such that for any sequence \(\{(x_n, \xi _n)\} \in \Gamma _{\mu _0, \mu _1} (x_0, \xi _0)\) there holds \(D_{\xi _n} \Theta (x^\dag , x_n) < \varepsilon \) for all \(n\ge n(\varepsilon )\).
The proof of Proposition 4.4 is based on some preliminary results. It is easily seen that, to complete the proof, we only need to consider the case \(F(x_0)\ne y\). The following result shows that in this case we always have \(F(x_n)\ne y\) for all \(n\ge 0\) for any sequence \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\). Thus, when defining \(x_{n+1}\) from \(x_n\) in Algorithm 2 we always use a finite number \(\alpha _n>0\).
Lemma 4.5
Let all the conditions in Proposition 4.4 hold. For any sequence \(\{(x_n, \xi _n)\} \in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\), if \(F(x_n)=y\) for some \(n\ge 1\), then \(F(x_0) = y\).
Proof
It suffices to show that if \(F(x_k)=y\) for some \(1\le k\le n\), then \(F(x_{k-1})=y\). By using Assumption 1(d) and \(F(x_k)=y\) we have \(L(x^\dag ) (x_k-x^\dag ) = 0\). Thus, in view of (4.5), we have
Consequently \(L(x_{k-1}) (x_k-x^\dag )=0\). If \(F(x_{k-1})\ne y\), then by the definition of \(x_k\) and Assumption 1(d) we have
which is impossible since \(\mu _0>\eta \). The proof is thus complete. \(\square \)
The next result shows that, if \(F(x_n)\ne y\), then we can give the upper and lower bounds on the number \(\alpha _n\) used to define \(x_{n+1}\) from \(x_n\).
Lemma 4.6
Let all the conditions in Theorem 4.3 hold. If \(F(x_n)\ne y\), then for any \(\alpha \) satisfying
there holds \(0<\underline{\alpha }_n \le \alpha \le \overline{\alpha }_n<\infty \), where
Proof
By the definition of \(x_n(\alpha , y)\) and the uniform convexity of \(\Theta \), we have
In view of the second inequality in (4.6) we can obtain
Consequently
which implies that \(\alpha \le \overline{\alpha }_n\). On the other hand, by the definition of \(x_n(\alpha , y)\) we have
In view of the first inequality in (4.6), Assumption 1(d), and the inequality \(D_{\xi _n}\Theta (x^\dag , x_n) \le D_{\xi _0}\Theta (x^\dag , x_0) \le \varphi (\rho )\) from Lemma 4.1, it follows that
This implies that \(\alpha \ge \underline{\alpha }_n\). \(\square \)
Now we are ready to give the proof of Proposition 4.4. We will use an idea from [10] which is based on the well-known diagonal sequence argument.
Proof of Proposition 4.4
We may assume that \(F(x_0)\ne y\). We will use a contradiction argument. Assume that the result is not true. Then there is an \(\varepsilon _0>0\) such that for any \(\ell \ge 1\) there exist \(\{(x_n^{(\ell )}, \xi _n^{(\ell )})\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) and \(n_\ell >\ell \) such that
We will construct, for each \(n=0, 1, \ldots \), a strictly increasing subsequence \(\{\ell _{n,k}\}\) of positive integers and \((\bar{x}_n, \bar{\xi }_n) \in \mathcal X\times \mathcal X^*\) such that
(i) \(\{(\bar{x}_n, \bar{\xi }_n)\} \in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\);
(ii) for each fixed n there hold \(x_n^{(\ell _{n,k})}\rightarrow \bar{x}_n\) and \(\xi _n^{(\ell _{n,k})}\rightarrow \bar{\xi }_n\) as \(k\rightarrow \infty \). Moreover, for all k there hold
$$\begin{aligned} D_{\xi _n^{(\ell _{n,k})}} \Theta (\bar{x}_n, x_n^{(\ell _{n,k})}) \le \varepsilon _0/4 \quad \text{ and } \quad |\langle \xi _n^{(\ell _{n,k})}-\bar{\xi }_n, \bar{x}_n-x^\dag \rangle | \le \varepsilon _0/4. \end{aligned}$$
Assuming the above construction is available, we now derive a contradiction. According to (i), we may use Theorem 4.3 to conclude that \(D_{\bar{\xi }_n} \Theta (x^\dag , \bar{x}_n) \rightarrow 0\) as \(n\rightarrow \infty \). Thus we can pick a large integer \(\hat{n}\) such that
Let \(\hat{\ell }: =\ell _{\hat{n}, \hat{n}}\) and consider the sequence \(\{(x_n^{(\hat{\ell })}, \xi _n^{(\hat{\ell })})\}\). According to (ii), we have
Since \(\{\ell _{\hat{n},k}\}\) is strictly increasing, we have \(n_{\hat{\ell }}>\hat{\ell }= \ell _{\hat{n}, \hat{n}} \ge \hat{n}\). Therefore, we may use Lemma 4.1 to obtain
which is a contradiction to (4.8) with \(\ell =\hat{\ell }\).
We turn to the construction of \(\{\ell _{n,k}\}\) and \((\bar{x}_n, \bar{\xi }_n)\), for each \(n=0, 1, \ldots \), such that (i) and (ii) hold. For \(n=0\), we take \((\bar{x}_0, \bar{\xi }_0)=(x_0, \xi _0)\) and \(\ell _{0,k}=k\) for all k. Since \(\xi _0^{(k)}=\xi _0\) and \(x_0^{(k)}=x_0\), (ii) holds automatically for \(n=0\).
Next, assume that we have constructed \(\{\ell _{n,k}\}\) and \((\bar{x}_n, \bar{\xi }_n)\) for all \(0\le n\le m\). We will construct \(\{\ell _{m+1, k}\}\) and \((\bar{x}_{m+1}, \bar{\xi }_{m+1})\). Since \(F(x_0)\ne y\), we have from Lemma 4.5 that \(F(\bar{x}_m)\ne y\) and \(F(x_m^{(\ell )})\ne y\) for all \(\ell \). Let \(\alpha _m^{(\ell )}>0\) be the number used to define \((x_{m+1}^{(\ell )}, \xi _{m+1}^{(\ell )})\) from \((x_m^{(\ell )}, \xi _m^{(\ell )})\). From Lemma 4.6 and the induction hypothesis \(x_m^{(\ell _{m,k})}\rightarrow \bar{x}_m\) we can conclude that there are two positive numbers \(\underline{\alpha }_m\) and \(\bar{\alpha }_m\) independent of k such that \(\underline{\alpha }_m \le \alpha _m^{(\ell _{m,k})}\le \bar{\alpha }_m\) for all k. Thus \(\{\ell _{m,k}\}\) must have a subsequence, denoted as \(\{\ell _{m+1, k}\}\), such that \(\{\alpha _m^{(\ell _{m+1,k})}\}\) converges to some number \(\alpha _m \in (0, \infty )\) as \(k\rightarrow \infty \). We define
It is clear that \(\bar{x}_{m+1}= \bar{x}_m(\alpha _m, y)\) and \(\bar{\xi }_{m+1} = \bar{\xi }_m(\alpha _m, y)\). In view of the induction hypotheses \(x_m^{(\ell _{m+1,k})}\rightarrow \bar{x}_m\) and \(\xi _m^{(\ell _{m+1,k})}\rightarrow \bar{\xi }_m\), the continuity of \(x\rightarrow F(x)\) and \(x\rightarrow L(x)\), and \(\alpha _m^{(\ell _{m+1,k})}\rightarrow \alpha _m\), we may use Lemma 2.2 and the continuity of the duality mapping \(J_r^{\mathcal Y}\) to conclude that
as \(k\rightarrow \infty \). According to the choice of \(\alpha _m^{(\ell _{m+1,k})}\) and thus the definition of \(x_{m+1}^{(\ell _{m+1,k})}\), we have
Letting \(k\rightarrow \infty \) gives
Thus \(\bar{x}_{m+1} = \bar{x}_m(\alpha _m, y)\) satisfies the desired requirement. We therefore complete the construction of \(\{\ell _{m+1,k}\}\) and \((\bar{x}_{m+1}, \bar{\xi }_{m+1})\). We need to show that \(\bar{x}_{m+1}\) and \(\bar{\xi }_{m+1}\) satisfy the estimates in (ii) for \(n=m+1\). We may use (4.9) to obtain
Consequently, by taking a subsequence of \(\{\ell _{m+1,k}\}\) if necessary, which is still denoted by the same notation, we can guarantee (ii) for \(n=m+1\). \(\square \)
4.3 Regularization property
In this section we will establish the regularization property of Algorithm 1 which is stated in the following result.
Theorem 4.7
Let \(\Theta : \mathcal X\rightarrow (-\infty , \infty ]\) be a proper, lower semi-continuous function that is uniformly convex in the sense of (2.2), and let Assumption 1 hold with \(0\le \eta <1/3\). Let \(\eta <\mu _0\le \mu _1<1-2\eta \) and \(\tau >(1+\eta )/(\mu _0-\eta )\), and let (3.9) be satisfied. Assume further that
Then for \(x_{n_\delta }^\delta \in \mathcal X\) and \(\xi _{n_\delta }^\delta \in \mathcal X^*\) defined by Algorithm 1 there hold
In order to prove Theorem 4.7, we will need to establish certain stability result to connect Algorithms 1 and 2 so that Proposition 4.4 can be used. The following stability result is sufficient for our purpose.
Lemma 4.8
Let \(F(x_0)\ne y\) and let all the conditions in Theorem 4.7 hold. Let \(\{y^{\delta _l}\}\) be a sequence of noisy data satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \). Let \(x_n^{\delta _l}\) and \(\xi _n^{\delta _l}\), \(0\le n\le n_{\delta _l}\), be defined by Algorithm 1. Then for any finite \(n\le \liminf _{l\rightarrow \infty } n_{\delta _l}\), by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, there is a sequence \(\{(x_m, \xi _m)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that
for all \(0\le m\le n\).
Proof
Since \(F(x_0)\ne y\), we must have \(\liminf _{l\rightarrow \infty } n_{\delta _l}\ge 1\). We will use an induction argument on n.
When \(n=0\), there is nothing to prove since \(x_0^{\delta _l}=x_0\) and \(\xi _0^{\delta _l}=\xi _0\). Assume next that, for some \(0\le n<\liminf _{l\rightarrow \infty } n_{\delta _l}\), the result is true for some sequence \(\{(x_m, \xi _m)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) with \(0\le m\le n\). In order to show that the result is also true for \(n+1\), we will obtain a sequence from \(\Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) by retaining the first \(n+1\) terms in \(\{(x_m, \xi _m)\}\) and modifying the remaining terms. It suffices to redefine \(x_{n+1}\) and \(\xi _{n+1}\) since then we can apply Algorithm 2 to produce the remaining terms.
Since \(F(x_0)\ne y\), we have from Lemma 4.5 that \(F(x_n)\ne y\). Let \(\alpha _n(y^{\delta _l})\) be the number used to define \(x_{n+1}^{\delta _l}\) and \(\xi _{n+1}^{\delta _l}\). Since the induction hypothesis \(x_n^{\delta _l}\rightarrow x_n\) and the fact \(y^{\delta _l}\rightarrow y\) imply \(\Vert F(x_n^{\delta _l})-y^{\delta _l}\Vert \rightarrow \Vert F(x_n)-y\Vert >0\) as \(l\rightarrow \infty \), we may use an argument similar to that in the proof of Lemma 4.6 to conclude that
for two numbers \(0<\underline{\alpha }_n\le \bar{\alpha }_n<\infty \) independent of l. Therefore, by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we may assume that
for some number \(\alpha _n\in (0, \infty )\). We define
In view of the induction hypotheses and the continuity of \(x\rightarrow F(x)\) and \(x\rightarrow L(x)\), we may use Lemma 2.2 and the continuity of \(J_r^{\mathcal Y}\) to conclude that
Moreover, since
by taking \(l\rightarrow \infty \) we can conclude that
Therefore \(x_{n+1}=x_n(\alpha _n, y)\) and \(\xi _{n+1}=\xi _n(\alpha _n, y)\) are the desired elements to be defined. The proof is thus complete. \(\square \)
The next result will be used to prove \(\lim _{\delta \rightarrow 0} \Theta (x_{n_\delta }^\delta ) =\Theta (x^\dag )\) in Theorem 4.7.
Lemma 4.9
Let all the conditions in Theorem 4.7 hold and let \(\{(x_n^\delta , \xi _n^\delta )\}_{0\le n\le n_\delta }\) be defined by Algorithm 1. Then for all \(0\le l\le n_\delta \) there holds
where \(C_2=(1+\eta )(3\tau +1)/(\tau c_0 \mu _0^r)\).
Proof
By the definition of \(\xi _n^\delta \), the property of \(J_r^{\mathcal Y}\) and (3.3), we can obtain
By Assumption 1(d) and (3.14) we can derive that
Since \(\Vert y^\delta -F(x_n^\delta )\Vert >\tau \delta \) for \(0\le n<n_\delta \), we therefore have
Consequently
This together with (3.11) in Proposition 3.4 implies the desired estimate. \(\square \)
We are now ready to prove Theorem 4.7, the main result of this paper.
Proof of Theorem 4.7
We may assume that \(F(x_0)\ne y\). We first claim that
Suppose that this is not true. Then there exists \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) such that \(n_{\delta _l}\rightarrow \hat{n}\) as \(l\rightarrow \infty \) for some finite integer \(\hat{n}\). Thus \(n_{\delta _l} = \hat{n}\) for large l. By the definition of \(n_{\delta _l}\) we have
In view of Lemma 4.8, by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we can find \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that \(x_{\hat{n}}^{\delta _l}\rightarrow x_{\hat{n}}\) as \(l\rightarrow \infty \). Letting \(l\rightarrow \infty \) in (4.11) gives \(F(x_{\hat{n}})=y\). Consequently, by Lemma 4.5, we must have \(F(x_0)=y\) which is a contradiction.
We next show the convergence result. We first prove that
by a contradiction argument. Suppose that (4.12) is not true. Then there exist a number \(\varepsilon >0\) and a sequence \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \) such that
where \(n_l:= n_{\delta _l}\). According to Proposition 4.4, there is an integer \(n(\varepsilon )\) such that
For this \(n(\varepsilon )\), by using Lemma 4.8 and by taking a subsequence of \(\{y^{\delta _l}\}\) if necessary, we can find \(\{(x_n, \xi _n)\}\in \Gamma _{\mu _0, \mu _1}(x_0, \xi _0)\) such that
for \(0\le n\le n(\varepsilon )\). Since (4.10) implies that \(n_l>n(\varepsilon )\) for large l, by using Proposition 3.4 we have
In view of (4.15) and (4.14), we therefore obtain
This is a contradiction to (4.13). We thus obtain (4.12). By virtue of the uniform convexity of \(\Theta \), we then have \(\lim _{\delta \rightarrow 0} \Vert x_{n_\delta }^{\delta } -x^\dag \Vert =0\).
It remains only to show that \(\lim _{\delta \rightarrow 0} \Theta (x_{n_\delta }^{\delta }) = \Theta (x^\dag )\). In view of (4.12), it suffices to show that
We again use a contradiction argument by assuming that there is a number \(\varepsilon >0\) and a sequence \(\{y^{\delta _l}\}\) satisfying \(\Vert y^{\delta _l}-y\Vert \le \delta _l\) with \(\delta _l\rightarrow 0\) as \(l\rightarrow \infty \) such that
where \(C_2\) is the constant defined in Lemma 4.9. For this \(\varepsilon \), we may use Proposition 4.4 and Lemma 4.8 to find an integer \(n(\varepsilon )\) such that (4.14) and (4.15) hold. In view of Lemma 4.9, we have
By taking \(l\rightarrow \infty \), using \(x_{n_l}^{\delta _l}\rightarrow x^\dag \), (4.15) and (4.14), we can obtain
which contradicts (4.17). We therefore obtain (4.16) and complete the proof of Theorem 4.7. \(\square \)
5 Numerical results
We consider the identification of the parameter c in the boundary value problem
from the measurements of the state variable u in \(\Omega \), where \(\Omega \subset \mathbb {R}^d\), \(d\le 2\), is a bounded domain with Lipschitz boundary \(\partial \Omega \), \(f\in H^{-1}(\Omega )\) and \(g\in H^{1/2}(\partial \Omega )\). This is a benchmark example of nonlinear inverse problems. It is well known that (5.1) has a unique solution \(u = u(c) \in H^1(\Omega )\) for each c in the domain
with some \(\gamma _0>0\). By the Sobolev embedding \(H^1(\Omega )\hookrightarrow L^r(\Omega )\), it makes sense to define the parameter-to-solution map \(F: {\mathscr {D}} \subset L^2(\Omega ) \rightarrow L^r(\Omega )\) with \(F(c) = u(c)\) for any \(1<r<\infty \). We consider the problem of identifying \(c\in L^2(\Omega )\) from an \(L^r(\Omega )\)-measurement of u. This amounts to solving \(F(c)=u\). It is known that F is Fréchet differentiable; the Fréchet derivative and its Banach space adjoint are given respectively by
where \(r^*\) is the number conjugate to r, i.e. \(1/r^*+1/r=1\), and \(A(c): H^2(\Omega )\cap H_0^1(\Omega )\rightarrow L^2(\Omega )\) is defined by \(A(c) u = -\triangle u +cu\). Recall that in the space \(L^r(\Omega )\) with \(1<r<\infty \) the duality mapping \(J_r: L^r(\Omega ) \rightarrow L^{r^*}(\Omega )\) is given by
$$\begin{aligned} J_r(u) = |u|^{r-1} \, \text{sign}(u), \quad u\in L^r(\Omega ). \end{aligned}$$
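In a discretization, this duality mapping acts pointwise on the grid values; a minimal numpy sketch (the uniform-grid quadrature with spacing h in the usage below is our own convention):

```python
import numpy as np

def J_r(u, r):
    """Duality mapping J_r on (a grid discretization of) L^r:
    applied pointwise, J_r(u) = |u|^{r-1} * sign(u)."""
    return np.abs(u) ** (r - 1.0) * np.sign(u)
```

It satisfies the characteristic identities \(\langle J_r(u), u\rangle =\Vert u\Vert _{L^r}^r\) and \(\Vert J_r(u)\Vert _{L^{r^*}}=\Vert u\Vert _{L^r}^{r-1}\), which can be checked numerically with grid sums in place of integrals.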
For this parameter identification problem, the tangential cone condition has been verified for \(r\ge 2\) in [12, 26]; the verification for \(1<r<2\), however, is not yet available.
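For Example 5.1 below, evaluating the forward map \(F(c)=u(c)\) in one dimension reduces to solving the two-point boundary value problem \(-u''+cu=f\) on (0,1); a minimal finite-difference sketch (grid size and the manufactured test data in the usage below are our own choices, not those of the experiments):

```python
import numpy as np

def solve_bvp(c, f, a, b, n):
    """Second-order finite-difference solve of -u'' + c*u = f on (0,1)
    with u(0)=a, u(1)=b; c and f are callables, n is the number of
    interior grid points. Returns the grid t and the approximation u."""
    h = 1.0 / (n + 1)
    t = np.linspace(0.0, 1.0, n + 2)
    ti = t[1:-1]
    main = 2.0 / h**2 + c(ti)                 # tridiagonal system
    off = -np.ones(n - 1) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = f(ti)
    rhs[0] += a / h**2                        # fold boundary values into rhs
    rhs[-1] += b / h**2
    u = np.empty(n + 2)
    u[0], u[-1] = a, b
    u[1:-1] = np.linalg.solve(A, rhs)
    return t, u
```

The scheme can be verified against a manufactured solution, e.g. \(u(t)=\sin (\pi t)+1+t\) with \(c\equiv 1\), for which \(f=(\pi ^2+1)\sin (\pi t)+1+t\).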
In the following we report some numerical results for this inverse problem to indicate the performance of Algorithm 1 with various choices of the convex function \(\Theta \) and the Banach spaces \(\mathcal X\) and \(\mathcal Y\). The main computational cost stems from solving the convex minimization problems involved in the algorithm, which requires numerical solutions of differential equations related to calculating the Fréchet derivatives and their adjoints. We use BFGS, one of the most popular quasi-Newton methods, for Example 5.1 and a restarted nonlinear CG method for Examples 5.2 and 5.3 below; see [29]. Some fast algorithms for solving convex optimization problems have been developed in recent years, including the fast proximal gradient method [4] and the primal-dual hybrid gradient methods [5, 33]. These methods are powerful for problems to which fast solvers such as the FFT are applicable; when such fast solvers are not applicable, as in our computation, they may not have much advantage over other types of methods.
Example 5.1
Consider the one-dimensional problem on the interval \(\Omega =(0,1)\) with the source term \(f(t) = 100 e^{-10(t-0.5)^2}\) and the boundary data \(u(0) = 1\) and \(u(1) =2\). We will identify the sought solution
using noisy data that contains a few data points, called outliers, which are highly inconsistent with other data points. The appearance of outliers may arise from procedural measurement errors.
In Fig. 1 we present the numerical results by Algorithm 1 with \(\tau = 1.1\), \(\mu _0=0.90\) and \(\mu _1 = 0.96\) using the initial guess \(c_0=0\) and \(\xi _0=0\). In order to carry out the computation, the differential equations involved are solved by a finite difference method by dividing \(\Omega =(0,1)\) into 400 subintervals of equal length. The minimization problems involved in the algorithm are solved by performing 50 iterations of the BFGS method. Figure 1a and d show the plots of the noisy data; the one in (a) contains only Gaussian noise, while the one in (d) contains not only Gaussian noise but also 10% outliers. Figure 1b and e present the reconstruction results by the regularizing Levenberg–Marquardt scheme of Hanke, i.e. Algorithm 1 with \(\mathcal X=\mathcal Y=L^2[0,1]\) and \(\Theta (c) = \frac{1}{2} \Vert c\Vert _{L^2}^2\); they show that the method is highly susceptible to the presence of outliers. In Fig. 1c and f we present the reconstruction results by Algorithm 1 with \(\mathcal X= L^2[0,1]\), \(\mathcal Y=L^r[0,1]\) with \(r=1.1\) and \(\Theta (c) = \frac{1}{2} \Vert c\Vert _{L^2}^2\). It can be seen that the method is robust enough to remain unaffected by the outliers. Using \(L^r\) data misfit terms, with \(r\ge 1\) close to 1, to exclude the outliers has been investigated for several other regularization methods; see [14, 15, 18, 23].
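The robustness of the \(L^r\) misfit with \(r\) close to 1 can be seen already in a toy problem: fitting a constant to data containing outliers. The sketch below is our own illustration (the constants, noise model, and grid search are ad hoc, not part of the paper's method); it compares the \(L^2\) and \(L^{1.1}\) fits:

```python
import numpy as np

rng = np.random.default_rng(1)
c_true = 2.0
y = c_true + 0.01 * rng.standard_normal(200)
y[:20] += 5.0  # 10% of the data points are outliers

def lr_fit(y, r):
    # minimize sum_i |c - y_i|^r over a dense grid of candidate constants c
    grid = np.linspace(y.min(), y.max(), 20001)
    cost = (np.abs(grid[:, None] - y[None, :]) ** r).sum(axis=1)
    return grid[np.argmin(cost)]

c_l2 = lr_fit(y, 2.0)  # essentially the mean: dragged toward the outliers
c_lr = lr_fit(y, 1.1)  # nearly unaffected by the outliers
```

The \(L^2\) fit is pulled far from \(c_{\text{true}}=2\) by the outliers, while the \(L^{1.1}\) fit behaves almost like a median and stays close to the true value, mirroring the behavior seen in Fig. 1.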
Example 5.2
We next consider the two-dimensional problem on \(\Omega = [0,1]\times [0,1]\) with the source term
and the boundary data \(g \equiv 1\) on \(\partial \Omega \). The sought solution is a piecewise constant function as shown in Fig. 2a. We reconstruct the sought solution using Algorithm 1 with \(\mathcal X=\mathcal Y=L^2(\Omega )\) and different choices of \(\Theta \). In order to carry out the computation, we divide \(\Omega \) into \(100\times 100\) small squares of equal size. All partial differential equations involved are solved approximately by a finite difference method. When using Algorithm 1, we use \(\tau =1.01\), \(\mu _0=0.90\), \(\mu _1 = 0.96\) and take \(\xi _0=c_0=0\) as an initial guess. The minimization problem to determine \(c_n^\delta \) for each \(n\ge 1\) is solved by a restarted CG method with 1000 iterations.
In Fig. 2 we report the numerical results using measurement data that is corrupted by Gaussian noise with noise level \(\delta =0.001\). Figure 2b presents the reconstruction result using \(\Theta (c) =\frac{1}{2} \Vert c\Vert _{L^2}^2\). Due to the over-smoothing effect, the reconstruction result turns out to contain unsatisfactory artifacts. Figure 2c reports the reconstruction result using \(\Theta (c) = \lambda \Vert c\Vert _{L^2}^2 +|c|_{TV, \varepsilon }\) with \(\lambda = 1/2\), where \(\varepsilon = 10^{-4}\) and \(|c|_{TV, \varepsilon }= \int _\Omega \sqrt{|\nabla c|^2+\varepsilon }\) which can be considered as a smoothed approximation of the total variation functional \(\int _\Omega |\nabla c|\). Clearly the result in (c) significantly improves the one in (b) by efficiently removing the undesired artifacts.
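The smoothed total variation \(|c|_{TV,\varepsilon }=\int _\Omega \sqrt{|\nabla c|^2+\varepsilon }\) used above is straightforward to evaluate on a pixel grid with forward differences; a minimal sketch (unit grid spacing and the replicated-boundary handling are our own choices):

```python
import numpy as np

def tv_eps(c, eps=1e-4):
    """Smoothed TV of a 2D array: sum of sqrt(|grad c|^2 + eps),
    with forward differences and the last row/column replicated
    (so the boundary contributes zero gradient)."""
    gx = np.diff(c, axis=0, append=c[-1:, :])
    gy = np.diff(c, axis=1, append=c[:, -1:])
    return np.sqrt(gx**2 + gy**2 + eps).sum()
```

As \(\varepsilon \rightarrow 0\) this tends to the discrete total variation; for a 0/1 image with a single vertical jump, for instance, the value approaches the length of the jump, while remaining differentiable in c for \(\varepsilon >0\).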
Example 5.3
We use the same setup as in Example 5.2, but now the sought solution is sparse. The domain \(\Omega \) is divided into \(120\times 120\) small squares of equal size in order to solve the associated partial differential equations. In Fig. 3 we report the reconstruction results of Algorithm 1 using measurement data contaminated by Gaussian noise with noise level \(\delta =0.001\). We use \(\tau =1.001\), \(\mu _0 =0.90\), \(\mu _1=0.96\) and take \(c_0=\xi _0=0\) as an initial guess. The minimization problems involved in the algorithm are again solved by a restarted CG method with 300 iterations. The true solution is plotted in Fig. 3a. Figure 3b presents the numerical result of Algorithm 1 using \(\Theta (c) =\frac{1}{2}\Vert c\Vert _{L^2}^2\). Figure 3c reports the numerical result of Algorithm 1 using \(\Theta (c) =\lambda \Vert c\Vert _{L^2}^2 +\int _\Omega \sqrt{|c|^2+\varepsilon }\) with \(\lambda = 0.01\) and \(\varepsilon =10^{-4}\), which can be regarded as a smoothed approximation of the \(L^1\) norm. A comparison of the results in (b) and (c) clearly shows that the sparsity of the sought solution is captured much better in (c). Therefore, a proper use of a convex function close to the \(L^1\) norm can dramatically improve the reconstruction of sparse functions.
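The sparsity-promoting effect of the smoothed \(L^1\) term can be seen on a toy denoising problem: minimizing \(\frac{1}{2}\Vert c-b\Vert ^2+\lambda \sum _i\sqrt{c_i^2+\varepsilon }\) drives small entries almost to zero while reducing large entries by roughly \(\lambda \), mimicking soft thresholding. A minimal sketch with plain gradient descent (step size and iteration count are ad-hoc choices of ours, not the CG solver used in the experiments):

```python
import numpy as np

def denoise_smooth_l1(b, lam=0.5, eps=1e-4, steps=3000, lr=0.02):
    # minimize 0.5*||c - b||^2 + lam * sum_i sqrt(c_i^2 + eps)
    # by gradient descent; the penalty gradient is lam*c/sqrt(c^2+eps)
    c = b.copy()
    for _ in range(steps):
        grad = (c - b) + lam * c / np.sqrt(c**2 + eps)
        c -= lr * grad
    return c

b = np.array([3.0, 0.2, -0.1, 2.0, 0.05])
c = denoise_smooth_l1(b)
# small entries are driven (almost) to zero; large ones shrink by about lam
```

The step size must respect the curvature of the smoothed penalty near zero, which is of order \(\lambda /\sqrt{\varepsilon }\); this is why the sketch uses a small lr.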
References
Bachmayr, M., Burger, M.: Iterative total variation schemes for nonlinear inverse problems. Inverse Probl. 25(10), 105004 (2009)
Bakushinsky, A.B., Kokurin, M.Y.: Iterative Methods for Approximate Solutions of Inverse Problems. Mathematics and Its Applications (New York), vol. 577. Springer, Dordrecht (2004)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming, Theory and Algorithms, 3rd edn. Wiley-Interscience, New York (2006)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Cioranescu, I.: Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems. Kluwer, Dordrecht (1990)
Colonius, F., Kunisch, K.: Stability for parameter estimation in two point boundary value problems. J. Reine Angew. Math. 370, 1–29 (1986)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996)
Engl, H.W., Kunisch, K., Neubauer, A.: Convergence rates of Tikhonov regularization of nonlinear ill-posed problems. Inverse Probl. 5, 523–540 (1989)
Hanke, M.: Regularizing properties of a truncated Newton-CG algorithm for nonlinear inverse problems. Numer. Funct. Anal. Optim. 18(9–10), 971–993 (1997)
Hanke, M.: A regularizing Levenberg–Marquardt scheme with applications to inverse groundwater filtration problems. Inverse Probl. 13, 79–95 (1997)
Hanke, M., Neubauer, A., Scherzer, O.: A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math. 72, 21–37 (1995)
Hettlich, F., Rundell, W.: A second degree method for nonlinear inverse problems. SIAM J. Numer. Anal. 37(2), 587–620 (2000)
Hohage, T., Werner, F.: Iteratively regularized Newton-type methods for general data misfit functionals and applications to Poisson data. Numer. Math. 123(4), 745–779 (2013)
Hohage, T., Werner, F.: Convergence rates for inverse problems with impulsive noise. SIAM J. Numer. Anal. 52(3), 1203–1221 (2014)
Jin, Q.: Applications of the modified discrepancy principle to Tikhonov regularization of nonlinear ill-posed problems. SIAM J. Numer. Anal. 36(2), 475–490 (1999)
Jin, Q.: On the order optimality of the regularization via inexact Newton iterations. Numer. Math. 121, 237–260 (2012)
Jin, Q.: Inexact Newton–Landweber iteration for solving nonlinear inverse problems in Banach spaces. Inverse Probl. 28, 065002 (15pp) (2012)
Jin, Q., Tautenhahn, U.: On the discrepancy principle for some Newton type methods for solving nonlinear inverse problems. Numer. Math. 111, 509–558 (2009)
Jin, Q., Wang, W.: Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals. Inverse Probl. 29, 085011 (22pp) (2013)
Jin, Q., Zhong, M.: On the iteratively regularized Gauss–Newton method in Banach spaces with applications to parameter identification problems. Numer. Math. 124, 647–683 (2013)
Jin, Q., Zhong, M.: Nonstationary iterated Tikhonov regularization in Banach spaces with uniformly convex penalty terms. Numer. Math. 127, 485–513 (2014)
Kaltenbacher, B.: A convergence rates result for an iteratively regularized Gauss–Newton–Halley method in Banach space. Inverse Probl. 31, 015007 (20pp) (2015)
Kaltenbacher, B., Hofmann, B.: Convergence rates for the iteratively regularized Gauss–Newton method in Banach spaces. Inverse Probl. 26, 035007 (21pp) (2010)
Kaltenbacher, B., Neubauer, A., Scherzer, O.: Iterative Regularization Methods for Nonlinear Ill-Posed Problems. de Gruyter, Berlin (2008)
Kaltenbacher, B., Schöpfer, F., Schuster, T.: Iterative methods for nonlinear ill-posed problems in Banach spaces: convergence and applications to parameter identification problems. Inverse Probl. 25, 065003 (19pp) (2009)
Lechleiter, A., Rieder, A.: Towards a general convergence theory for inexact Newton regularizations. Numer. Math. 114, 521–548 (2010)
Margotti, F., Rieder, A., Leitäo, A.: A Kaczmarz version of the REGINN-Landweber iteration for ill-posed problems in Banach spaces. SIAM J. Numer. Anal. 52, 1439–1465 (2014)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Rieder, A.: On the regularization of nonlinear ill-posed problems via inexact Newton iterations. Inverse Probl. 15, 309–327 (1999)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization Methods in Banach Spaces. Walter de Gruyter, Berlin (2012)
Zălinscu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing Co., Inc., River Edge (2002)
Zhu, M., Chan, T.F.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. CAM Report 08-34, UCLA (2008)
Acknowledgments
Q. Jin is partly supported by the ARC Discovery Project grant DP150102345 and H. Yang is partly supported by the Natural Science Foundation of China under grants 11071264 and 11571386.
Jin, Q., Yang, H. Levenberg–Marquardt method in Banach spaces with general convex regularization terms. Numer. Math. 133, 655–684 (2016). https://doi.org/10.1007/s00211-015-0764-z