Abstract
This paper presents a class of Levenberg–Marquardt methods for solving the nonlinear least-squares problem. Explicit algebraic rules for computing the regularization parameter are devised. In addition, convergence properties of this class of methods are analyzed. We prove that all accumulation points of the generated sequence are stationary. Moreover, q-quadratic convergence for the zero-residual problem is obtained under an error bound condition. Illustrative numerical experiments with encouraging results are presented.
1 Introduction
Given \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\), the nonlinear least-squares (NLS) problem is as follows:

\(\min _{x \in \mathbb {R}^n} \Vert F(x)\Vert ^2,\)
where \(\Vert \cdot \Vert \) is the Euclidean norm. This problem is important from both theoretical and practical viewpoints [3, 18]. The seminal works of Levenberg [12], Morrison [15], and Marquardt [13] provided a regularization strategy for improving the convergence of the Gauss–Newton (GN) method, the first and simplest method to address the NLS problem. In the GN method, the nonlinear residual function F is replaced by its linear approximation at the current iterate, so that the NLS problem is solved by means of a sequence of quadratic problems, without any additional parameter. In the Levenberg–Marquardt (LM) method, a sequence of quadratic problems is also generated, and a regularization term is introduced that essentially depends on the so-called LM parameter.
For very large problems, such as those arising in data assimilation, it may be necessary to make further approximations in the GN method in order to reduce computational costs. Gratton et al. [10] point out conditions that ensure the convergence of truncated and perturbed GN methods, deriving rates of convergence for the iterations. Inexact versions of the LM method have also been considered under an error bound condition. Such a condition is weaker than assuming nonsingularity of the appropriate matrix at a solution, namely the Jacobian \(JF\in \mathbb {R}^{m \times n}\) if \(m=n\) or the square matrix \(JF^T JF\), otherwise [2, 8]. The local convergence of the LM method has also been studied under an error bound condition. Yamashita and Fukushima [17] proved q-quadratic convergence for the LM method with the LM parameter sequence set as \(\mu _k=\Vert F(x_k)\Vert ^2\). Later, this q-quadratic rate was extended by Fan and Yuan [6] to the setting \(\mu _k=\Vert F(x_k)\Vert ^\delta \), with \( \delta \in [1,2]\). Fan and Pan [5] extended the analysis of this update of the LM parameter to exponents \(\delta \in (0,1)\), showing local superlinear convergence. Accelerated versions of the LM method have been proposed recently by Fan [4], achieving local cubic convergence. When it comes to complexity analysis, Ueda and Yamashita [16] have investigated a global complexity bound for the LM method.
In this work we devise algebraic rules for computing the LM parameter, so that \(\mu _k\) belongs to an interval proportional to \(\Vert F(x_k)\Vert \). Under Lipschitz continuity of the Jacobian, the rules are designed so that the full LM step is accepted by the Armijo sufficient decrease condition. On the one hand, when the Lipschitz constant is available, we prove that the full steps are always acceptable. On the other hand, for an unknown Lipschitz constant, we propose a scheme for dynamically updating an estimate of this constant within a well-defined and globally convergent algorithm for solving the NLS problem. The q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition. Numerical results illustrate the performance of the proposed algorithm on a set of test problems from the CUTEst collection, with promising results in terms of both efficiency and robustness.
The text is organized as follows. The basic results that have generated the proposed algorithm are presented in Sect. 2. The algorithm and its underlying properties are given in Sect. 3. The global convergence analysis is provided in Sect. 4, whereas the local convergence result is developed in Sect. 5. The numerical experiments are presented and examined in Sect. 6. A summary of the work is given in Sect. 7 and the tables with the complete numerical results compose the Appendix.
2 Technical results
We consider \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\) of class \({\mathcal {C}}^1\) with an L-Lipschitz continuous Jacobian \(JF\in \mathbb {R}^{m\times n}\), that is, for all \(x,y\in \mathbb {R}^n\),

\(\Vert JF(x)-JF(y)\Vert \le L\Vert x-y\Vert . \qquad (1)\)
On the left-hand side of the inequality above we have the operator norm induced by the canonical norms in \(\mathbb {R}^n\) and \(\mathbb {R}^m\). More specifically, we are concerned with choosing the regularization parameter \(\mu \) in the Levenberg–Marquardt method,

\(x_+=x-t\left( JF(x)^TJF(x)+\mu I\right) ^{-1}JF(x)^TF(x),\)
where \(x_+\) is the new iterate and t is the step length.
Initially, a classical result is recalled [3, Lemma 4.1.12].
Lemma 2.1
(Linearization error) For any x, s in \(\mathbb {R}^n\),

\(\left\Vert F(x+s)-F(x)-JF(x)s\right\Vert \le \dfrac{L}{2}\Vert s\Vert ^2.\)
From now on, in this section, s is the Levenberg–Marquardt step at \(x\in \mathbb {R}^n\), with regularization parameter \(\mu >0\), that is,

\(s=-\left( JF(x)^TJF(x)+\mu I\right) ^{-1}JF(x)^TF(x). \qquad (2)\)
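In matrix terms, the step (2) is the solution of a shifted normal-equations system. A minimal numpy sketch (the function name is ours; Sect. 6.2 discusses Cholesky- and QR-based alternatives):

```python
import numpy as np

def lm_step(J, Fx, mu):
    """Levenberg-Marquardt step s = -(J^T J + mu I)^{-1} J^T F(x), as in (2)."""
    n = J.shape[1]
    # For mu > 0 the matrix J^T J + mu I is symmetric positive definite.
    return np.linalg.solve(J.T @ J + mu * np.eye(n), -J.T @ Fx)
```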
Let \(\phi :\mathbb {R}\rightarrow \mathbb {R}\) be
Note that
First we analyze the norm of the linearization of F along the direction s.
Lemma 2.2
(Linearization’s norm) For any \(t\in [0,1]\),
Proof
Using (4) and the fact that \(t\in [0,1]\), we conclude the proof. \(\square \)
Now we analyze the norm of F along the direction s.
Lemma 2.3
For any \(t\in [0,1]\),
Proof
Let
Since \(F(x+ts)=F(x)+tJF(x)s+R(t)\),
where the first inequality follows from Cauchy–Schwarz inequality and the second one from Lemma 2.2. From the definition of R(t) and the assumption of \(JF\) being L-Lipschitz continuous, we have that \(\Vert R(t)\Vert \le Lt^2\Vert s\Vert ^2/2 \). Therefore,
To end the proof, use Lemma 2.2 again. \(\square \)
In view of Lemma 2.3 we need a bound for \(\Vert s\Vert \) in order to choose, at a non-stationary point, a regularization parameter \(\mu \). The next technical lemma will be useful for obtaining such a bound.
Lemma 2.4
For any \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^m\) and \(\mu >0\),

\(\left\Vert \left( A^TA+\mu I\right) ^{-1}A^Tb\right\Vert \le \dfrac{\Vert P_{{\mathcal {R}}(A)}b\Vert }{2\sqrt{\mu }},\)
where \({\mathcal {R}}(A)\) is the range of A and \(P_{{\mathcal {R}}(A)}\) is the orthogonal projection onto this subspace.
Proof
Let \(b'=P_{{\mathcal {R}}(A)}b\) and \(s=(A^T A+\mu I)^{-1} A^T b\). Observe that
We will use an SVD of A: there exist orthogonal matrices \(U\in \mathbb {R}^{m\times m}\) and \(V\in \mathbb {R}^{n\times n}\) such that
where \(\sigma _i\ge 0\) for all \(i=1,\ldots ,\min \{m, n\}\). Note that \(V^T=V^{-1}\), \(U^T=U^{-1}\) and
Therefore, pre-multiplying both sides of (5) by \(V^T\) and using the substitutions
in (5) we conclude that
It follows from this equation that if \(n \le m\) then
while if \(n{>}m\) then
Since \(t/(t^2+\mu )\le 1/(2\sqrt{\mu })\) for \(\mu {>}0\) and \(t\ge 0\), we have
To end the proof, note that \(\Vert {\tilde{s}}\Vert =\Vert s\Vert \) and \(\Vert {\tilde{b}}\Vert =\Vert b'\Vert \). \(\square \)
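The bound of Lemma 2.4 is easy to check numerically. A small sketch with random data (all names ours), in which the projection \(P_{{\mathcal {R}}(A)}b\) is formed from a least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, mu = 7, 4, 0.3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# s = (A^T A + mu I)^{-1} A^T b, as in the proof of Lemma 2.4.
s = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)
# Orthogonal projection of b onto the range of A.
Pb = A @ np.linalg.lstsq(A, b, rcond=None)[0]

assert np.linalg.norm(s) <= np.linalg.norm(Pb) / (2.0 * np.sqrt(mu))
```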
In the following result, the main ingredients for the devised algebraic rules are established.
Theorem 2.5
For any \(\mu >0\),

\(\Vert s\Vert \le \dfrac{\Vert P(F(x))\Vert }{2\sqrt{\mu }}\le \dfrac{\Vert F(x)\Vert }{2\sqrt{\mu }},\)

where P is the orthogonal projection onto the range of \(JF(x)\). Moreover, if

\(\mu \ge \dfrac{L}{2}\left( \Vert F(x)\Vert +\sqrt{\Vert F(x)\Vert ^2+\dfrac{1}{4}\Vert P(F(x))\Vert ^2}\right) , \qquad (6)\)

then

\(\Vert F(x+s)\Vert ^2\le \Vert F(x)\Vert ^2+\langle {s},{JF(x)^TF(x)}\rangle .\)
Proof
The first part of the theorem, which are the bounds for \(\Vert s\Vert \), follows from (2), Lemma 2.4 and the metric properties of the orthogonal projection. Combining the first part of the theorem with Lemma 2.3, we conclude that
To end the proof, note that the right-hand side of inequality (6) is the largest root of the concave quadratic \(\mu \mapsto \dfrac{L^2}{16}\Vert P(F(x))\Vert ^2+L\Vert F(x)\Vert \mu -\mu ^2\). \(\square \)
Remark 2.6
In view of Theorem 2.5, possible choices for \(\mu \) at a point x are
or
or
The first value is the smallest one and the first two values require the computation of \(\Vert P(F(x))\Vert \), although an upper bound for such a norm can also be used.
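The displayed formulas for these choices are not reproduced above, so the following numpy sketch rests on two labeled assumptions: that the smallest choice is the largest root of the quadratic in the proof of Theorem 2.5, i.e. the right-hand side of (6), and that the projection-free choice coincides with the value \(\mu ^+=\frac{2+\sqrt{5}}{4}L \Vert F(x)\Vert \) used in Sect. 6.1 (obtained from (6) by bounding \(\Vert P(F(x))\Vert \le \Vert F(x)\Vert \)):

```python
import numpy as np

def mu_choices(J, Fx, L):
    """Two candidate regularization parameters at x (a hedged sketch).

    mu_small: assumed to be the largest root of the quadratic from the
              proof of Theorem 2.5, i.e. the right-hand side of (6);
    mu_free:  mu+ = (2+sqrt(5))/4 * L * ||F(x)||, needing no projection.
    """
    nF = np.linalg.norm(Fx)
    # ||P(F(x))||: norm of the projection of F(x) onto the range of JF(x).
    nPF = np.linalg.norm(J @ np.linalg.lstsq(J, Fx, rcond=None)[0])
    mu_small = 0.5 * L * (nF + np.sqrt(nF**2 + 0.25 * nPF**2))
    mu_free = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF
    return mu_small, mu_free
```

Any \(\mu \) at or above either value satisfies (6), so by Theorem 2.5 the full step is accepted.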
3 The algorithm
Now we propose and analyze the Levenberg–Marquardt algorithm with algebraic rules for computing the regularization parameter.
Concerning Algorithm 1, it is worth noticing that
-
(i)
Iteration \(\ell \) begins with \(k=\ell -1\), and ends with \(k=\ell \) if \(JF(x_{\ell -1})^T F(x_{\ell -1}) \ne 0\).
-
(ii)
If the algorithm does not end at iteration \(k+1\), then \(\mu _k>0\), \(s_k\) is well defined and is a descent direction for \(\Vert F(\cdot ) \Vert ^2\). Therefore, the Armijo line search in Steps 6–10 of Algorithm 1 has finite termination. Altogether, Algorithm 1 is well defined and either it terminates after \(\ell \) steps with \(JF(x_{\ell -1})^T F(x_{\ell -1})=0\), or it generates infinite sequences \((x_k)\), \((s_k)\), \((t_k)\), \((\mu _k)\), \((L_k)\).
-
(iii)
\(L_k\) plays the role of an estimate of the Lipschitz constant of \(JF\). If \(\delta >0\), this parameter acts as a safeguard which prevents \(L_k\) from becoming too small.
From now on, we assume that Algorithm 1 with input \(x_0\in \mathbb {R}^n\), \(\beta \in (0,1)\), \(\eta \in [0,1)\), \(L_0{>}0\) and \(\delta \ge 0\) does not stop at Step 2 and that \((x_k)\), \((s_k)\), \((t_k)\), \((\mu _k)\), \((L_k)\) are the (infinite) sequences generated by it.
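For concreteness, the following Python sketch assembles the whole method. Since the verbatim listing of Algorithm 1 is not reproduced above, two ingredients are explicit assumptions on our part: Step 4 is taken as the projection-free choice \(\mu _k=\frac{2+\sqrt{5}}{4}L_k\Vert F(x_k)\Vert \) (cf. Remark 2.6 and Sect. 6.1), and Steps 12–22 are condensed into "halve \(L_k\), safeguarded by \(\delta \), after a full step; double it after backtracking", which is consistent with Propositions 3.1 and 3.2. Here \(\beta \) is assumed to be the Armijo constant, and the precise role of the input \(\eta \in [0,1)\) is not reproduced.

```python
import numpy as np

def lm_solve(F, JF, x0, beta=1e-4, L0=1e-6, delta=1e-8, tol=1e-10, max_iter=500):
    """Schematic sketch of Algorithm 1 (not the verbatim listing)."""
    x, L = np.asarray(x0, dtype=float), max(L0, delta)
    for _ in range(max_iter):
        Fx, J = F(x), JF(x)
        g = J.T @ Fx                          # note: grad ||F||^2 = 2 g
        if np.linalg.norm(g) <= tol:          # Step 2: stationarity
            break
        # Step 4 (assumed form): mu_k proportional to L_k * ||F(x_k)||.
        mu = (2.0 + np.sqrt(5.0)) / 4.0 * L * np.linalg.norm(Fx)
        s = np.linalg.solve(J.T @ J + mu * np.eye(x.size), -g)   # Step 5
        t = 1.0                               # Steps 6-10: Armijo bisection
        while np.linalg.norm(F(x + t * s))**2 > \
              np.linalg.norm(Fx)**2 + beta * t * (s @ g):
            t *= 0.5
        # Steps 12-22 (assumed form): update the Lipschitz estimate.
        L = max(L / 2.0, delta) if t == 1.0 else 2.0 * L
        x = x + t * s
    return x
```

A usage example on \(F(x)=(x_1^2+x_2^2-1,\,x_1-x_2)\) would read `lm_solve(F, JF, np.array([2.0, 0.5]))`, with F and JF returning a 2-vector and a \(2\times 2\) array, respectively.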
Next we analyze the basic properties of Algorithm 1.
Proposition 3.1
If \(L_k\ge L\), then \(t_k=1\) and \(L_{k+1}=\max \{L_k/2,\delta \}\).
Proof
Suppose that \(L_k\ge L\). From this assumption and the definition of \(\mu _k\) (in Step 4) we have that
where P is the orthogonal projection onto the range of \(JF(x_k)\). The first equality of the proposition follows from the above inequality; the definition of \(s_k\) (in Step 5); Theorem 2.5 with \(\mu =\mu _k\), \(x=x_k\), \(s=s_k\); and Steps 6–10. The second equality comes from the first one and Steps 12–22. \(\square \)
Proposition 3.2
For all k,
and, for infinitely many k, \(t_k=1\).
Proof
Since \(L_0\ge \delta \) and \(L_{k+1}\ge \max \{L_k/2,\delta \}\) for all k, the first inequality in (10) also holds for all k.
We will prove the second inequality by induction on k. This inequality holds trivially for \(k=0\). Assume that it holds for some k. Steps 12–22 of the algorithm imply that if \(t_k=1\), then \(L_{k+1}=L_k\) or \(L_{k+1}=\max \{\delta ,L_k/2 \}\le L_k\) and in both cases the inequality holds for \(k+1\). If \(t_k<1\), it follows from Proposition 3.1 that \(L_k< L\) and then \(L_{k+1}=2L_k\le 2L\). So the inequality holds for \(k+1\) and the induction proof is complete.
To prove the second part of the proposition, suppose that \(t_k<1\) for all \(k\ge k_0\). Then
in contradiction with (10). \(\square \)
From Proposition 3.2 and Step 4 of the algorithm, we have
for all k.
Proposition 3.3
For each k
As a consequence, the sequence \(\left( \Vert F(x_k)\Vert ^2\right) \) is strictly decreasing and
Proof
The first inequality follows from the stopping condition for the Armijo line search (Steps 6–10). In view of the definition of \(s_k\) and \(\mu _k\), and the fact that \(\mu _k{>}0\),
which trivially implies the second inequality. The last statement of the proposition follows directly from (12). \(\square \)
4 General convergence analysis
The convergence of Algorithm 1 is examined in the following results.
Proposition 4.1
If the sequence \((x_k)\) is bounded, then it has a stationary accumulation point.
Proof
By Proposition 3.2, \(t_k=1\) for infinitely many k. Since \((x_k)\) is bounded, there exists a subsequence \((x_{k_j})\) convergent to some \({\bar{x}}\), such that \(t_{k_j}=1\) for all j. Thus, by Proposition 3.3,
Moreover, since F and \(JF\) are continuous, \(\Vert F(x_k)\Vert \), \(\Vert JF(x_k)\Vert \) and \(\mu _k\) are bounded. Hence the sequence \((JF(x_{k_j})^TF(x_{k_j}))\) converges to 0. To end the proof, use again the continuity of F and \(JF\) to conclude that \(JF({\bar{x}})^TF({\bar{x}})=0\). \(\square \)
From the bound for the norm of F along the direction s, obtained in Lemma 2.3, we prove next that the step length is bounded away from zero.
Proposition 4.2
If \(\delta {>}0\), then
for all k.
Proof
For any \(t\in [0,1]\) it holds
where the first inequality comes from the bound on t, the second from Theorem 2.5 and the third from the fact that \(\mu _k\ge \delta \Vert F(x_k)\Vert \), as stated in (11).
From inequality (13) and Lemma 2.3, we have that if
then \(\Vert F(x_k+ts_k)\Vert ^2\le \Vert F(x_k)\Vert ^2+t\langle {s_k},{JF(x_k)^TF(x_k)}\rangle \). So, the result follows from Steps 6–10 of Algorithm 1. \(\square \)
We now present the global convergence result of Algorithm 1.
Proposition 4.3
If \(\delta {>} 0\), then all accumulation points of the sequence \((x_k)\) are stationary for the function \(\Vert F(x)\Vert ^2\).
Proof
Suppose that \((x_{k_j})\) converges to some \({\bar{x}}\). From Propositions 3.3 and 4.2, we have that
It follows from (11) and from the continuity of F and \(JF\) that \( \Vert JF(x_{k_j})\Vert ^2+\mu _{k_j}\) is bounded. Therefore \(JF(x_{k_j})^TF(x_{k_j})\) converges to 0. \(\square \)
A complexity result concerning the stationarity measure of the Algorithm 1 is given next.
Proposition 4.4
If \(\delta {>} 0\) and \((x_k)\) is bounded, then
Proof
Define
Then, by (11) and Propositions 3.3 and 4.2, for any k
and the conclusion follows. \(\square \)
5 Quadratic convergence under an error bound condition
Given \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\), consider the system

\(F(x)=0, \qquad (14)\)
and let \(X^*\) be its solution set, that is,

\(X^*=\left\{ x\in \mathbb {R}^n : F(x)=0\right\} . \qquad (15)\)
Our aim is to prove that \((x_k)\) converges quadratically near solutions of (14) where, locally, \(\Vert F(x)\Vert \) provides an error bound for this system, in the sense of [17, Definition 1]. For completeness, we present this definition in the sequel (see Definition 5.3).
Define, for \(\gamma >0\), \(\mathbf {S}_\gamma :\mathbb {R}^n\setminus X^*\rightarrow \mathbb {R}^n\),

\(\mathbf {S}_\gamma (x)=-\left( JF(x)^TJF(x)+\gamma \Vert F(x)\Vert I\right) ^{-1}JF(x)^TF(x). \qquad (16)\)
Auxiliary bounds are established in the next two results.
Proposition 5.1
If \(x\in \mathbb {R}^n\setminus X^*\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\), and \(x_+=x+s\) then
Proof
Direct algebraic manipulations yield
It follows from (16) that \(JF(x)^T(F(x)+JF(x)s)=-\gamma \Vert F(x)\Vert s\). Combining this result with the above equality, and using the triangle inequality, we have
where the last inequality follows from Assumption (1), Lemmas 2.1 and 2.2. \(\square \)
Proposition 5.2
If \(x\in \mathbb {R}^n\setminus X^*\), \({\bar{x}}\in X^*\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\), and \({\bar{s}}={\bar{x}}-x\), then
-
1.
\(\Vert F(x)+JF(x){\bar{s}}\Vert \le \dfrac{L}{2}\Vert {\bar{s}}\Vert ^2\);
-
2.
\(\Vert F(x)+JF(x)s\Vert ^2+\Vert JF(x)({\bar{s}}-s)\Vert ^2 +\gamma \Vert F(x)\Vert (\Vert s\Vert ^2+\Vert {\bar{s}}-s\Vert ^2) \le \dfrac{L^2}{4}\Vert {\bar{s}}\Vert ^4+\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2\).
Proof
Since \(F({\bar{x}})=0\),
and the first inequality follows from this equality and Lemma 2.1.
Define
From item 1 we have
Observe that \(s=\hbox {argmin}_{u\in \mathbb {R}^n} \psi _{\gamma ,x}(u)\). Since \(\psi _{\gamma ,x}\) is a quadratic with Hessian \(2(JF(x)^TJF(x)+ \gamma \Vert F(x)\Vert I)\) and it is minimized by s,
The second inequality of the proposition follows from the two above relations. \(\square \)
We will analyze the local convergence of the sequence \((x_k)\) under the local error bound condition, as defined next. Such a condition is weaker than assuming nonsingularity of \(JF(x)^T JF(x)\) for x in the solution set \(X^*\). For instance, for \(F:\mathbb {R}^2\rightarrow \mathbb {R}\) given by \(F(x)=x_1\), the solution set is the \(x_2\)-axis, \(\Vert F(x)\Vert ={{\mathrm{dist}}}(x,X^*)\) provides a global error bound, and yet \(JF(x)^TJF(x)\) is singular everywhere.
Definition 5.3
([17, Definition 1]) Let V be an open subset of \(\mathbb {R}^n\) such that \(V\cap X^*\ne \emptyset \), where \(X^*\) is as in (15). We say that \(\Vert F(x)\Vert \) provides a local error bound on V for the system (14) if there exists a positive constant c such that

\(c\,{{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert \)
for all \(x\in V\).
The next lemma, which will be instrumental in the main result of this section, was proved in [7, Corollary 2]. A proof is provided here for the sake of completeness; it requires only that the function F be continuously differentiable.
Lemma 5.4
If \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is continuously differentiable and \(\Vert F(x)\Vert \) provides an error bound for (14) in a neighborhood V of \(x^*\in X^*\) as in Definition 5.3, then \(\Vert JF(x)^TF(x)\Vert \) also provides an error bound for (14) in some neighborhood of \(x^*\).
Proof
Suppose that \(V\subset \mathbb {R}^n\) is a neighborhood of \(x^*\) where \(\Vert F(x)\Vert \) provides an error bound for (14), that is,

\(c\,{{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert \quad \text {for all } x\in V.\)
Define, for \(x,y\in \mathbb {R}^n\),

\(R(y,x)=F(y)-F(x)-JF(x)(y-x).\)
Since F is continuously differentiable, there exists \(r>0\) such that \(B(x^*,r)\subseteq V\) and

\(\Vert R(y,x)\Vert \le \dfrac{c}{2}\Vert y-x\Vert \quad \text {for all } x,y\in B(x^*,r). \qquad (17)\)
Take \(x\in B(x^*,r/2)\) and \({\bar{x}}\in \displaystyle {\arg \min _{z\in X^*}\Vert z-x\Vert }\). Then \({{\mathrm{dist}}}(x,X^*)=\Vert {\bar{x}}-x\Vert <r/2\), \({\bar{x}}\in B(x^*,r)\) and, in view of the above assumptions,
Since \(F({\bar{x}})=0\), \( -JF(x)({\bar{x}}-x)=F(x)+R({\bar{x}},x)\) and
Using the above equalities, Cauchy-Schwarz inequality and (17) we conclude that
Therefore, \(\Vert JF(x)^TF(x)\Vert \ge (c^2/2)\,{{\mathrm{dist}}}(x,X^*)\) for any \(x\in B(x^*,r/2)\). \(\square \)
From now on, in this section, \(x^*\in \mathbb {R}^n\), \(r,c>0\) are such that

\(x^*\in X^*\quad \text {and}\quad c\,{{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert \ \text { for all } x\in B(x^*,r). \qquad (18)\)
Another auxiliary result is provided next.
Lemma 5.5
Consider \(x^*\in \mathbb {R}^n\), \(r,c{>}0\) satisfying (18). If \(x\in B(x^*,r)\setminus X^*\), \({\bar{x}}\in \displaystyle {\arg \min _{u\in X^*}\Vert u-x\Vert }\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\) and \({\bar{s}}={\bar{x}}-x\), then
-
1.
\(\Vert {\bar{s}}\Vert ={{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert /c\);
-
2.
\(\Vert s\Vert \le \left( 1+\dfrac{L^2}{4c \gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}\Vert {\bar{s}}\Vert \);
-
3.
\(\Vert F(x)+JF(x)s\Vert \le \left( \dfrac{L^2}{4c^2\gamma ^2}\Vert {\bar{s}}\Vert ^2+\dfrac{1}{c\gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}\gamma \Vert F(x)\Vert \).
Moreover, if \(\Vert {\bar{s}}\Vert \le \dfrac{c \gamma }{2L^2}\) then \(\Vert F(x+s)\Vert ^2 \le \Vert F(x)\Vert ^2+\langle {s},{JF(x)^TF(x)}\rangle \).
Proof
The first relation in item 1 follows trivially from the definition of \({\bar{x}}\) while the second relation comes from (18).
From item 2 of Proposition 5.2 and item 1 of this lemma, we have that
and
which trivially imply items 2 and 3, respectively.
To prove the last part of the lemma, suppose that \(\Vert {\bar{s}}\Vert \le \dfrac{c \gamma }{2L^2}\) and define
From items 2 and 3, we have
where the second inequality follows from item 1, and the equality comes from the definition of w. Observe that since \(w\le 1/2\) it follows that \(a<0\). To end the proof, use the first inequality in Lemma 2.3 with \(t=1\) and \(\mu =\gamma \Vert F(x)\Vert \). \(\square \)
Note that in Algorithm 1, \(s_k=\mathbf {S}_\gamma (x_k)\) for \(\gamma =\mu _k/\Vert F(x_k)\Vert \). In order to simplify the proofs, define
In view of (11) and the definition of \(s_k\) in Algorithm 1, for all \(k\in \mathbb {N}\),
Assuming that the residual function F provides an error bound for the solution set of the NLS zero-residual problem, the local convergence of Algorithm 1 is established as follows.
Theorem 5.6
Consider \(x^*\in \mathbb {R}^n\), \(r,c{>}0\) satisfying (18). There exists \({\tilde{r}}{>}0\) such that if \(x_{k_0}\in B(x^*,{\tilde{r}})\) for some \(k_0\), then either Algorithm 1 stops at some \(x_k\) solution of (14) or \((x_k)\) converges q-quadratically to some \({\hat{x}}\) solution of (14).
Proof
In view of Lemma 5.4, there exist \(0<r_1\le r\) and \(c_1>0\) such that

\(c_1\,{{\mathrm{dist}}}(x,X^*)\le \Vert JF(x)^TF(x)\Vert \quad \text {for all } x\in B(x^*,r_1). \qquad (21)\)
Define
and
We claim that if
then
The first inequality in (23) follows from item 2 of Lemma 5.5 and the definitions of \({\bar{s}}\) and \(\rho \). The second inequality comes from the inclusions \(x^*\in X^*\) and \(x\in B(x^*,\rho )\).
To prove (24) first observe that
which comes from the definition of \(x_+\), the triangle inequality, (23) and the definition of \(\rho \). Consequently, from (21) for \(x_+\) and Proposition 5.1,
Since \(JF\) is continuous, \(F(x)=F({\bar{x}}) - \int _0^1 JF({\bar{x}}-t {\bar{s}}) \,{\bar{s}} \, dt\). As \( F({\bar{x}})=0\), we have
Using the two relations displayed above, the bounds for \(\gamma \) in (22), (23), and the definitions of \(M_1\) and \(M_2\) we have
which proves the first inequality of (24). The second inequality in (24) comes from the fact that \({{\mathrm{dist}}}(x,X^*) \le \rho \) and from the definition of \(\rho \).
Our third claim, (25), follows directly from the last result of Lemma 5.5.
Next we define a family of sets on \(\mathbb {R}^n\) which are, in some sense, well behaved with respect to Algorithm 1. Define
and let
Since \(x^*\in X^*\), we have \(W_0=B(x^*,{\tilde{r}})\). Hence
Let \(x_k\in W\). If \(JF(x_k)^TF(x_k)=0\) then the algorithm stops at \(x_k\) and, in view of (21), \(F(x_k)=0\). Next we analyze the case \(JF(x_k)^TF(x_k)\ne 0\). It follows from (20) and (25) that
As \(x_k\in W\), we have \(x_k\in W_\tau \) for some \(0\le \tau <1\). Therefore, from the triangle inequality, the definition of \(W_\tau \), and (23)
Additionally, from (24) and the definition of \(W_\tau \) we also have
Altogether we proved that
Suppose that \(x_{k_0}\in B(x^*,{\tilde{r}})\). We have just proved that in this case either the algorithm stops at some \(x_k\in X^*\) or an infinite sequence is generated and \(x_k\in B(x^*,\rho )\) for \(k\ge k_0\). Assume that an infinite sequence is generated and define
It follows from (23) and (24) that for \(k \ge k_0\),
Hence, \(\sum _{j=k_0}^\infty \Vert x_{j+1}-x_j\Vert \le \frac{3}{2\sqrt{2}}\sum _{j=k_0}^\infty d_j\le \frac{3}{\sqrt{2}}d_{k_0} \). As \((x_k)\) is a Cauchy sequence, it converges to some \({\hat{x}}\). Since \(d_k={{\mathrm{dist}}}(x_k,X^*)\) converges to 0, \(F({\hat{x}})=0\), that is, \({\hat{x}}\in X^*\). By the triangle inequality and the first inequality in (26),
From the last two inequalities in (26) we have
Therefore, from the two relations displayed above,
where the first inequality comes from the inclusion \({\hat{x}}\in X^*\). Consequently, using (26) again and the definition of \(d_k\), we conclude that
ensuring the q-quadratic convergence and completing the proof. \(\square \)
6 Numerical experiments
In this section we describe numerical experiments to illustrate the practical performance of the algorithm. We start with the algorithmic and computational choices for obtaining the numerical results.
6.1 On the choice of \(\mu _k\)
Similarly to the analysis developed for the unconstrained minimization problem (cf. [11, Prop. 1.2]), as the positive scalar \(\mu \) is a lower bound for the smallest eigenvalue of \(JF(x)^TJF(x) + \mu I\), it follows that

\(\mu \Vert s\Vert \le \left\Vert \left( JF(x)^TJF(x)+\mu I\right) s\right\Vert =\Vert JF(x)^TF(x)\Vert ,\)
and so \(\Vert s \Vert \le \frac{\Vert JF(x)^TF(x)\Vert }{\mu }\).
By Lemma 2.3, with \(t=1\),
For obtaining \(\mu \) that ensures \(\dfrac{L^2}{4}\Vert s\Vert ^2+L\Vert F(x)\Vert -\mu \le 0,\) the previous condition concerning the step gives

\(\dfrac{L^2}{4}\,\dfrac{\Vert JF(x)^TF(x)\Vert ^2}{\mu ^2}+L\Vert F(x)\Vert -\mu \le 0. \qquad (27)\)
Let \(\mu ^2=z\), and consider the concave function \(\psi :\mathbb {R}_+\rightarrow \mathbb {R}\) given by

\(\psi (z)=L^2\Vert JF(x)^TF(x)\Vert ^2+4L\Vert F(x)\Vert z-4z^{3/2},\)

obtained by multiplying (27) by \(4\mu ^2\), so that (27) becomes \(\psi (z)\le 0\).
Consider \(z_0=L^2\Vert F(x)\Vert ^2\). Note that \( \psi (z_0)=L^2\Vert JF(x)^T F(x)\Vert ^2 \ge 0\).
An iteration of Newton's method from \(z_0\) gives us

\(z_1=z_0-\dfrac{\psi (z_0)}{\psi '(z_0)}=L^2\Vert F(x)\Vert ^2+\dfrac{L\Vert JF(x)^TF(x)\Vert ^2}{2\Vert F(x)\Vert },\qquad \mu _J=\sqrt{z_1}.\)
Since \(\psi \) is concave, its tangent line at \(z_0\) majorizes it, so \(\psi (z)\le 0\) for all \(z\ge z_1\). Then (27) is guaranteed for all

\(\mu \ge \mu _J.\)
Observe that \(L \Vert F(x)\Vert \le \mu _J\), \( \mu ^+=\frac{2+\sqrt{5}}{4}L \Vert F(x)\Vert \) and so \( \frac{4}{2+\sqrt{5}} \,\mu ^+ \le \mu _J\). Nevertheless, there is no guarantee that \(\mu _J \le \mu ^+\) holds, so we set \( \mu _k=\min \{\mu _k^+, \mu _J\}\).
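A numpy sketch of this rule (the formulas for \(\psi \) and \(\mu _J\) are taken as reconstructed above, an assumption consistent with the stated value of \(\psi (z_0)\)):

```python
import numpy as np

def mu_rule(J, Fx, L):
    """Sketch of the Sect. 6.1 rule mu_k = min(mu_plus, mu_J).

    Assumes psi(z) = L^2 ||J^T F||^2 + 4 L ||F|| z - 4 z^(3/2) with z = mu^2,
    which matches the stated psi(z0) = L^2 ||J^T F||^2 at z0 = L^2 ||F||^2.
    """
    nF, ng = np.linalg.norm(Fx), np.linalg.norm(J.T @ Fx)
    z0 = (L * nF) ** 2
    psi0 = (L * ng) ** 2                      # psi(z0)
    dpsi0 = 4.0 * L * nF - 6.0 * np.sqrt(z0)  # psi'(z) = 4 L ||F|| - 6 sqrt(z)
    mu_J = np.sqrt(z0 - psi0 / dpsi0)         # one Newton step, then sqrt
    mu_plus = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF
    return min(mu_plus, mu_J)
```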
6.2 On the computation of the step \(s_k\)
The equivalence between the problems

\(\min _{s\in \mathbb {R}^n}\Vert J_ks+F(x_k)\Vert ^2+\mu _k\Vert s\Vert ^2 \quad \text {and}\quad \min _{s\in \mathbb {R}^n}\left\Vert \left( \begin{array}{c} J_k \\ \sqrt{\mu _k}\, I \end{array} \right) s+\left( \begin{array}{c} F(x_k) \\ 0 \end{array} \right) \right\Vert ^2\)

implies that the system

\(\left( \begin{array}{c} J_k \\ \sqrt{\mu _k}\, I \end{array} \right) s=-\left( \begin{array}{c} F(x_k) \\ 0 \end{array} \right) ,\)

understood in the least-squares sense, may be handled by the normal equations as follows (cf. [14]):

\(\left( J_k^TJ_k+\mu _kI\right) s=-J_k^TF(x_k). \qquad (28)\)
As an alternative to the Cholesky factorization of \(J_k^T J_k + \mu _k I\), the QR factorization of \(\left( \begin{array}{c} J_k \\ \sqrt{\mu _k} I \end{array} \right) \) avoids performing the product \(J_k^TJ_k\):

\(\left( \begin{array}{c} J_k \\ \sqrt{\mu _k}\, I \end{array} \right) =Q\left( \begin{array}{c} R_{\mu } \\ 0 \end{array} \right) ,\)
where \(Q \in \mathbb {R}^{(m+n) \times (m+n)}\) is orthogonal and \(R_{\mu } \in \mathbb {R}^{n \times n}\) is upper triangular. Indeed, the economic version of the factorization may be used, and just the first n columns of the orthogonal matrix Q are computed. The direction s is thus obtained from the system (28) by means of the relationship

\(J_k^TJ_k+\mu _kI=R_{\mu }^TR_{\mu },\)

so that s solves the pair of triangular systems \(R_{\mu }^TR_{\mu }s=-J_k^TF(x_k)\).
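A compact numpy sketch of this QR-based step (the function name is ours; np.linalg.qr with mode='reduced' yields the economic factorization):

```python
import numpy as np

def lm_step_qr(J, Fx, mu):
    """Solve min_s ||[J; sqrt(mu) I] s + [F(x); 0]||^2 via an economic QR,
    avoiding the explicit product J^T J (sketch of the Sect. 6.2 approach)."""
    m, n = J.shape
    A = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    b = np.concatenate([Fx, np.zeros(n)])
    Q, R = np.linalg.qr(A, mode='reduced')  # Q: (m+n) x n, R = R_mu
    return np.linalg.solve(R, -Q.T @ b)     # R s = -Q^T b (back substitution)
```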
6.3 About the backtracking scheme
Besides the simple bisection of Steps 6–10 of Algorithm 1, we have implemented a quadratic–cubic interpolation scheme (cf. [3, §6.3.2]). As this scheme performed slightly better in preliminary experiments, it was used in the reported results.
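For reference, a simplified quadratic-only variant of such a backtracking (a sketch; the cubic stage of the quadratic–cubic scheme of [3, §6.3.2], used for the second and later backtracks in the reported results, is omitted):

```python
def backtrack_quadratic(phi, phi0, dphi0, t=1.0, beta=1e-4, lo=0.1, hi=0.5):
    """Backtracking by quadratic interpolation.

    phi(t) = ||F(x + t s)||^2, phi0 = phi(0), dphi0 = phi'(0) < 0.
    """
    while phi(t) > phi0 + beta * t * dphi0:
        # Minimizer of the quadratic interpolating phi(0), phi'(0), phi(t).
        t_new = -dphi0 * t ** 2 / (2.0 * (phi(t) - phi0 - dphi0 * t))
        t = min(max(t_new, lo * t), hi * t)  # safeguard into [0.1 t, 0.5 t]
    return t
```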
6.4 About the initialization of the sequence \((L_k)\)
We set \(L_0=10^{-6}\) and \(L_{\min }=10^{-12}\), compute \(x_1\) and define \(L_1=\max \left\{ \frac{\Vert JF(x_1)-JF(x_0)\Vert _F}{\sqrt{n} \Vert x_1-x_0\Vert }, L_{\min } \right\} \). After that, the updating scheme follows Steps 12–22 of Algorithm 1.
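In code, this initialization reads as follows (a fragment; JF, x0, x1 and n are assumed available from the solver state):

```python
import numpy as np

L0, L_min = 1e-6, 1e-12
# ... compute the first iterate x1 from x0 using L0 ...
L1 = max(np.linalg.norm(JF(x1) - JF(x0), 'fro')
         / (np.sqrt(n) * np.linalg.norm(x1 - x0)), L_min)
```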
6.5 The results
The tests were performed on a DELL notebook (Intel Core i7-4510U, CPU @2.00GHz \(\times \) 4, 16GB RAM, Inspiron 5000 - i15 5547-A30, 64-bit), using Matlab 2014a, v. 8.3.
The set of test problems consists of all 53 problems from the CUTEst collection [9] such that the nonlinear equations \(F(x)=0\) are recast from feasibility instances, i.e., problems without objective function, with nonlinear equality constraints and without fixed variables, for which the Jacobian matrix is available. The constraint bodies and Jacobians were evaluated in sparse format. The initial point \(x_0\) was always the default of the collection.
To put our approach in perspective, the same problems were addressed by two distinct approaches. The first one is the Self-adaptive Levenberg–Marquardt Algorithm of Fan and Pan in [5], which is close to the scheme proposed in this paper. The second one is the modular code lsqnonlin (available within the Matlab software), based on the Levenberg–Marquardt Method [12, 13, 15]. The remaining parameters of Algorithm 1 were set to \(\beta =10^{-4}\), \( \eta =10^{-4}\) and \(\delta =10^{-8}\), the default values suggested in [5] for the parameters that play a similar role to ours: the choices of Fan and Pan denoted by \(p_0\) and m correspond to our \(\eta \) and \(\delta \), respectively. Moreover, the parameter \(\delta \) of Fan and Pan was set to 1.
Summing up, there are four strategies under analysis:
-
CH: Algorithm 1 with Cholesky factorization for computing \(s_k\);
-
QR: Algorithm 1 with QR factorization for computing \(s_k\);
-
FP: Fan and Pan’s algorithm [5, Alg. 2.1], with Cholesky factorization for computing the step and the aforementioned parameters and values;
-
LM: The Levenberg–Marquardt algorithm (an option of the routine lsqnonlin of Matlab).
Concerning the implemented stopping criteria, we have adopted the same numbering of the exit flags as the routine lsqnonlin. Setting \(\varepsilon _\mathrm{mac}\) as the machine precision, the stopping criteria common to the four strategies were the following (a schematic encoding of all the tests is sketched after the listing of the criteria):
-
(1)
Convergence to a solution with relative stationarity: \(\Vert JF(x_k)^TF(x_k)\Vert _\infty \le 10^{-10} \max \{\Vert JF(x_0)^TF(x_0)\Vert _\infty ,\sqrt{\varepsilon _\mathrm{mac}}\}\);
-
(2)
Change in x (too small): \(\Vert x_{k+1}-x_k\Vert \le 10^{-9}\left( \sqrt{\varepsilon _{\mathrm{mac}}}+\Vert x_{k+1}\Vert \right) \);
-
(3)
Change in the norm of the residue (too small): \(| \Vert F(x_{k+1})\Vert ^2 - \Vert F(x_{k})\Vert ^2| \le 10^{-6} \Vert F(x_{k})\Vert ^2\);
-
(4)
Computed search direction (too small): \(\Vert s_k\Vert \le 10^{-9}\);
-
(0)
Maximum number of function evaluations exceeded: \(\max \{2000,10n\}\).
Besides, as strategy \(\mathtt{FP}\) does not compute the step length \(t_k\), the next criterion was included only for the strategies \(\mathtt{CH}\), \(\mathtt{QR}\) and \(\mathtt{LM}\):
-
(-4)
Line search failed, as the step length is too small: \(|t_k | \le 10^{-15}\).
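A sketch of these tests as a single dispatch function (flag numbering as in lsqnonlin; all argument names are ours):

```python
import numpy as np

def exit_flag(g, g0, x_new, x_old, F_new, F_old, s, t, nfev, n):
    """Stopping tests listed above; g = JF(x_k)^T F(x_k), g0 its value at x_0."""
    eps_mac = np.finfo(float).eps
    if np.linalg.norm(g, np.inf) <= 1e-10 * max(np.linalg.norm(g0, np.inf),
                                                np.sqrt(eps_mac)):
        return 1    # relative stationarity
    if np.linalg.norm(x_new - x_old) <= 1e-9 * (np.sqrt(eps_mac)
                                                + np.linalg.norm(x_new)):
        return 2    # change in x too small
    r_new, r_old = np.linalg.norm(F_new)**2, np.linalg.norm(F_old)**2
    if abs(r_new - r_old) <= 1e-6 * r_old:
        return 3    # change in the squared residual too small
    if np.linalg.norm(s) <= 1e-9:
        return 4    # computed search direction too small
    if nfev >= max(2000, 10 * n):
        return 0    # evaluation budget exceeded
    if abs(t) <= 1e-15:
        return -4   # step length too small (CH, QR and LM only)
    return None     # keep iterating
```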
Let \(f^*\) be the objective function value obtained by a strategy S when this strategy is applied to a given problem with the default initial point. We consider that the strategy S has found a solution (cf. [1]) if
where \(f_{\min }\) is the smallest function value found among all the strategies under comparison.
All outputs are reported in the Appendix. Table 1 lists the 53 test problems. The table columns display the name of the problem; the dimensions (n and m); the number of iterations (\(\#\)Iter); the number of function evaluations (\(\#\)Fun); the function value at the last iterate (\(f^*\)); the CPU time in seconds (CPU); and the reason for stopping (exit). We observe that the stopping criteria 2 and -4 were never activated during the tests. The results of the four solvers CH, QR, FP and LM, respectively, are displayed row by row for each problem.
As it is usual to have some variation of the CPU time from one execution of an algorithm to another, for each problem we ran all the solvers nine times and considered the average CPU time of the last eight runs, discarding the CPU time of the first one. The symbol \(\dagger \) indicates that the obtained solution does not satisfy (29). It is worth mentioning that \(\mathtt{CH}\) and \(\mathtt{FP}\) performed one Cholesky factorization per iteration. The only exceptions occurred at a single iteration of the problem 10FOLDTR for \(\mathtt{CH}\) and of the problems 10FOLDTR and ARGLBLE for \(\mathtt{FP}\). In these three instances, both strategies required an additional Cholesky factorization with a slight increase in \(\mu _k\) to ensure the numerically safe positive definiteness of \(J_k^TJ_k + \mu _kI\).
The results corresponding to the solved problems are depicted in the performance profiles of Fig. 1 for the number of iterations, the number of function evaluations and the required CPU time. A logarithmic scale was used in the horizontal axis for better visualization of the results. In terms of efficiency, our strategies slightly outperformed \(\mathtt{FP}\) and \(\mathtt{LM}\) with regard to the number of iterations and function evaluations. Indeed, \(52.8\,\%\) of the problems were solved with the fewest number of iterations by both \(\mathtt{CH}\) and \(\mathtt{QR}\), against \(35.8\,\%\) for \(\mathtt{FP}\) and \(49.1\,\%\) for \(\mathtt{LM}\). Moreover, \(49.1\,\%\) of the problems were solved with the fewest number of function evaluations by both \(\mathtt{CH}\) and \(\mathtt{QR}\), and \(43.4\,\%\) by \(\mathtt{FP}\) and \(\mathtt{LM}\). Concerning the CPU time, the shortest time was never reached by \(\mathtt{QR}\), whereas strategies \(\mathtt{CH}\), \(\mathtt{FP}\) and \(\mathtt{LM}\) solved respectively 35.8, 41.5 and \(22.6\,\%\) of the problems in the shortest time. Furthermore, \(\mathtt{CH}\) and \(\mathtt{FP}\) solved \(70\,\%\) of the problems using no more than twice the best CPU time. Strategies \(\mathtt{CH}\) and \(\mathtt{QR}\) are the most robust, solving 51 of the 53 problems. Problems EIGENB and YATP2SQ were considered not solved by \(\mathtt{CH}\), and 10FOLDTR and YATP2SQ were not solved by strategy \(\mathtt{QR}\). On the other hand, \(\mathtt{LM}\) did not solve three problems (10FOLDTR, CYCLIC3, YATP2SQ), while \(\mathtt{FP}\) did not solve four problems (10FOLDTR, ARWHDNE, CYCLIC3, EIGENB), showing that our approach is competitive.
To illustrate the rate of convergence induced by the proposed algebraic rules, Fig. 2 shows the logarithm (base 10) of the squared residual value against the iterations for two typical problems.
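Data of this kind can be generated with a few lines; the following toy sketch (our example, not a CUTEst instance) runs full LM steps on a zero-residual problem, which Theorem 2.5 accepts since the true Lipschitz constant is supplied:

```python
import numpy as np

# Toy zero-residual problem: F(x) = (x1^2 + x2^2 - 1, x1 - x2),
# with solution (1, 1)/sqrt(2); the Lipschitz constant of JF is L = 2.
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
JF = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x, L = np.array([2.0, 0.5]), 2.0
for k in range(10):
    Fx, J = F(x), JF(x)
    print(k, np.log10(np.linalg.norm(Fx)**2 + 1e-300))  # Fig. 2-style data
    mu = (2.0 + np.sqrt(5.0)) / 4.0 * L * np.linalg.norm(Fx)
    x = x + np.linalg.solve(J.T @ J + mu * np.eye(2), -J.T @ Fx)
```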
7 Final remarks
We have proposed and analyzed a class of Levenberg–Marquardt methods in which algebraic rules for computing the regularization parameter were devised. Under the Lipschitz continuity of the Jacobian of the residual function, the algebraic rules were designed to allow the full LM step to be accepted by the Armijo sufficient decrease condition. In terms of global convergence, all the accumulation points of the sequence generated by the algorithm are stationary. As for the local convergence for the zero-residual problem, we have proved a q-quadratic rate under an error bound condition, which is less restrictive than assuming nonsingularity at the solution. A set of numerical experiments was prepared to illustrate the practical performance of the proposed algorithm. Our approach has shown both efficiency and robustness for the 53 feasibility instances from the CUTEst collection, performing slightly better than both the algorithm proposed by Fan and Pan in [5] and the routine lsqnonlin of Matlab. Obtaining inexact solutions for the linear systems with adequately matching stopping criteria, as well as considering nonzero-residual problems in the local convergence analysis, are topics for future investigation.
References
Birgin, E.G., Gentil, J.M.: Evaluating bound-constrained minimization software. Comput. Optim. Appl. 53(2), 347–373 (2012)
Dan, H., Yamashita, N., Fukushima, M.: Convergence properties of the inexact Levenberg-Marquardt method under local error bound conditions. Optim. Methods Softw. 17(4), 605–626 (2002)
Dennis Jr., J.E., Schnabel, R.B.: Numerical methods for unconstrained optimization and nonlinear equations. In: Classics in Applied Mathematics, vol 16. Society for Industrial and Applied Mathematics (SIAM). Philadelphia, PA (1996). Corrected reprint of the 1983 original
Fan, J.: Accelerating the modified Levenberg-Marquardt method for nonlinear equations. Math. Comput. 83(287), 1173–1187 (2014)
Fan, J., Pan, J.: Convergence properties of a self-adaptive Levenberg-Marquardt algorithm under local error bound condition. Comput. Optim. Appl. 34(1), 47–62 (2006)
Fan, J., Yuan, Y.-X.: On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption. Computing 74(1), 23–39 (2005)
Fischer, A.: Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program. 94(1, Ser. A), 91–124 (2002)
Fischer, A., Shukla, P.K., Wang, M.: On the inexactness level of robust Levenberg-Marquardt methods. Optimization 59(2), 273–287 (2010)
Gould, N., Orban, D., Toint, P.: CUTEst: a constrained and unconstrained testing environment with safe threads. Technical Report RAL-TR-2013-005. Rutherford Appleton Laboratory, Oxfordshire (2013)
Gratton, S., Lawless, A.S., Nichols, N.K.: Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim. 18(1), 106–132 (2007)
Karas, E.W., Santos, S.A., Svaiter, B.F.: Algebraic rules for quadratic regularization of Newton’s method. Comput. Optim. Appl. 60, 343–376 (2014)
Levenberg, K.: A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math. 2, 164–168 (1944)
Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441 (1963)
Moré, J.J.: The Levenberg-Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.), Lecture Notes in Mathematics 630: Numerical Analysis, pp. 105–116. Springer, New York (1978)
Morrison, D.D.: Methods for nonlinear least squares problems and convergence proofs. In: Lorell, J., Yagi, F. (eds.), Proceedings of the Seminar on Tracking Programs and Orbit Determination, pp. 1–9. Jet Propulsion Laboratory, Pasadena (1960)
Ueda, K., Yamashita, N.: On a global complexity bound on the Levenberg-Marquardt method. J. Optim. Theory Appl. 147, 443–453 (2010)
Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg-Marquardt method. In: Alefeld, G., Chen, X. (eds.) Topics in Numerical Analysis: With Special Emphasis on Nonlinear Problems, Computing Supplement, vol. 15, pp. 239–249. Springer, Vienna (2001)
Yuan, Y.-X.: Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer. Algebra Control Optim. 1(1), 15–34 (2011)
Acknowledgments
The authors are thankful to Abel S. Siqueira for installing the CUTEst and preparing our machine to run it within Matlab. We are also grateful to the anonymous referees whose suggestions led to improvements in the paper. Elizabeth W. Karas was partially supported by CNPq Grants 477611/2013-3 and 308957/2014-8. Sandra A. Santos was partially supported by CNPq Grant 304032/2010-7, FAPESP Grants 2013/05475-7 and 2013/07375-0 and PRONEX Optimization. Benar F. Svaiter was partially supported by CNPq Grants 302962/2011-5, 474944/2010-7, FAPERJ Grant E-26/102.940/2011 and PRONEX Optimization.
Appendix
The complete computational results are presented next. The outcomes of the four solvers CH, QR, FP and LM, respectively, are displayed row by row for each problem.
Keywords
- Nonlinear least-squares problems
- Levenberg–Marquardt method
- Regularization
- Global convergence
- Local convergence
- Computational results