Abstract
We propose an adaptive multi-step Levenberg–Marquardt (LM) method for nonlinear equations. The adaptive scheme can decide automatically whether an iteration should evaluate the Jacobian matrix at the current iterate to compute an LM step, or use the latest evaluated Jacobian to compute an approximate LM step, so that not only the Jacobian evaluation but also the linear algebra work can be saved. It is shown that the adaptive multi-step LM method converges superlinearly under the local error bound condition, which does not require the full column rank of the Jacobian at the solution. Numerical experiments demonstrate the efficiency of the adaptive multi-step LM method.
1 Introduction
We consider the system of nonlinear equations
\[ F(x)=0, \qquad (1.1) \]
where \(F(x):R^{n}\rightarrow R^{m}\) is a continuously differentiable function. Nonlinear equations have wide applications in technology, mechanics, economics and so on. For example, physical models expressed as nonlinear partial differential equations become systems of nonlinear equations when discretized [5].
It is natural to transform the nonlinear equations (1.1) to the nonlinear least squares problem
\[ \min _{x\in R^{n}}\ \Vert F(x)\Vert ^{2}, \qquad (1.2) \]
where \(\Vert \cdot \Vert \) refers to the 2-norm. Obviously, (1.1) has a solution if and only if the minimal value of (1.2) is zero. The Gauss–Newton method is the most well-known method for (1.2). At the k-th iteration, it computes the Gauss–Newton step
\[ d_{k}^{GN}=-(J_{k}^{T}J_{k})^{-1}J_{k}^{T}F_{k}, \]
where \(F_{k}=F(x_{k})\) and \(J_{k}=F'(x_{k})\) is the Jacobian at \(x_k\). However, when the Jacobian is not of full column rank, the Gauss–Newton step is not well-defined. To overcome this difficulty, the Levenberg–Marquardt (LM) method introduces a positive parameter \(\lambda _{k}>0\) and computes the LM step
\[ d_{k}^{LM}=-(J_{k}^{T}J_{k}+\lambda _{k}I)^{-1}J_{k}^{T}F_{k}. \]
Both the Gauss–Newton method and the Levenberg–Marquardt method have quadratic convergence for (1.1) under the assumptions that the Jacobian J(x) is Lipschitz continuous, \(J(x)^TJ(x)\) is nonsingular at the solution and the LM parameter is chosen suitably. Yamashita and Fukushima [13] proved that if the LM parameter is chosen as \(\lambda _{k}=\Vert F_{k}\Vert ^{2}\), then the LM method converges quadratically for (1.1) under the local error bound condition, which is weaker than the nonsingularity condition. Fan and Yuan [4] took \(\lambda _{k}=\Vert F_{k}\Vert ^{\delta }\) and showed that the LM method preserves the quadratic convergence for any \(\delta \in [1,2]\) under the local error bound condition.
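As an illustration, one iteration of the basic LM method with \(\lambda _{k}=\Vert F_{k}\Vert ^{\delta }\) can be sketched as follows. The function names and the toy problem are our own choices, and this is a minimal sketch without the acceptance and safeguarding rules discussed later in the paper.

```python
import numpy as np

def lm_step(F, J, x, delta=2.0):
    """One Levenberg-Marquardt step with lambda_k = ||F_k||^delta."""
    Fk, Jk = F(x), J(x)
    lam = np.linalg.norm(Fk) ** delta            # LM parameter, delta=2 as in [13]
    n = x.size
    # Solve (J^T J + lam I) d = -J^T F; the matrix is positive definite
    # whenever lam > 0, even if J is rank deficient.
    d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(n), -Jk.T @ Fk)
    return x + d

# Toy problem: F(x) = (x1^2 + x2^2 - 2, x1 - x2), root x* = (1, 1).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = np.array([2.0, 0.5])
for _ in range(30):
    x = lm_step(F, J, x)
```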
The LM method evaluates the Jacobian at every iteration. When F(x) is complicated or n is large, the cost of Jacobian evaluations may be expensive. To save Jacobian evaluations as well as linear algebra work, Fan [2] proposed a modified LM method. At every iteration, it uses the evaluated Jacobian to compute not only an LM step but also an approximate LM step. More generally, a multi-step LM method was given in [3]. At the k-th iteration, it computes one LM step and \(p-1\) approximate LM steps as follows:
\[ (J_{k}^{T}J_{k}+\lambda _{k}I)d_{k,i}=-J_{k}^{T}F(x_{k,i}), \quad i=0,1,\ldots ,p-1, \]
where \(p\ge 1\) is a given integer and \(x_{k,i}=x_{k,i-1}+d_{k,i-1}\) with \(x_{k,0}=x_{k}\), then set the trial step as
\[ d_{k}=\sum _{i=0}^{p-1}d_{k,i}. \]
That is, the multi-step LM method uses \(J_k\) as an approximation of \(J(x_{k,i})\) to compute the approximate LM step \(d_{k,i}\). Hence, the Jacobian evaluation and the matrix factorization are done once per p computations of the step. Counting the full p steps between two consecutive Jacobian evaluations as one complete iteration, the multi-step LM method converges with Q-order \(p+1\) under the local error bound condition.
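The reuse pattern above can be sketched as follows, assuming a dense Cholesky factorization of \(J_{k}^{T}J_{k}+\lambda _{k}I\) that is computed once and reused for all p inner steps. The function names and the toy problem are ours, and no step-acceptance test is included.

```python
import numpy as np

def multistep_lm(F, J, x, p=3, delta=2.0, outer_iters=6):
    """Multi-step LM sketch: evaluate and factorize the Jacobian once,
    then take one LM step and p-1 approximate LM steps with it."""
    n = x.size
    for _ in range(outer_iters):
        Fx, Jk = F(x), J(x)                      # one Jacobian evaluation per outer iteration
        lam = np.linalg.norm(Fx) ** delta
        L = np.linalg.cholesky(Jk.T @ Jk + lam * np.eye(n))  # one factorization
        for _ in range(p):
            rhs = -Jk.T @ F(x)                   # fresh residual, frozen Jacobian
            y = np.linalg.solve(L, rhs)          # solve L y = rhs
            d = np.linalg.solve(L.T, y)          # solve L^T d = y
            x = x + d
    return x

# Toy problem: F(x) = (x1^2 + x2^2 - 2, x1 - x2), root x* = (1, 1).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
sol = multistep_lm(F, J, np.array([1.2, 0.8]))
```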
It would seem that the bigger p is, the more Jacobian evaluations can be saved. However, a bigger p also means that \(J_k\) may be a worse approximation of the Jacobian at the later inner iterates. Moreover, different problems may prefer different values of p, and the best p for a given problem is usually not known in advance. These observations motivate us to develop an adaptive scheme that automatically decides whether an iteration should evaluate the Jacobian at the current iterate to compute an LM step, or use the latest evaluated Jacobian to compute an approximate LM step. That is, the number of approximate LM steps may vary from iteration to iteration, but it is at most \(t-1\) to avoid a poor approximation of the Jacobian, where \(t\ge 1\) is a given integer.
Counting every LM step and every approximate LM step as a single iteration, in this paper, we compute the trial step \(d_k\) by solving
\[ (G_{k}^{T}G_{k}+\lambda _{k}I)d_{k}=-G_{k}^{T}F_{k}, \qquad (1.7) \]
where \(G_k\) is the Jacobian \(J_k\) or the Jacobian used at the last iteration. If \(d_k\) is satisfactory and fewer than \(t-1\) approximate LM steps have been computed, we regard \(G_k\) as a good approximation of the Jacobian at the current iterate, and use it to compute another approximate LM step at the next iteration.
Define the ratio of the actual reduction to the predicted reduction of the merit function \(\Vert F(x)\Vert ^2\) at the k-th iteration as
\[ r_{k}=\frac{\Vert F_{k}\Vert ^{2}-\Vert F(x_{k}+d_{k})\Vert ^{2}}{\Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2}}. \]
The ratio \(r_k\) exploits the most relevant information on the step's quality at the iterate. It plays an important role in deciding whether \(d_k\) is acceptable. If \(r_k\) is large enough (i.e., the iteration is successful), we accept \(d_k\). Usually, we set
\[ x_{k+1}=\left\{ \begin{array}{ll} x_{k}+d_{k}, &{} \quad \text{ if } r_{k}\ge p_{0}, \\ x_{k}, &{} \quad \text{ otherwise, } \end{array}\right. \]
where \(p_0\) is a small positive constant. Suppose \(s-1\) approximate LM steps that used the latest evaluated Jacobian have been computed. If the iteration is very successful and s is less than t, we regard the latest evaluated Jacobian as a good approximation at the iterate and keep it; otherwise, we evaluate the exact one. That is, we set
\[ G_{k+1}=\left\{ \begin{array}{ll} G_{k}, &{} \quad \text{ if } r_{k}\ge p_{1} \text{ and } s<t, \\ J_{k+1}, &{} \quad \text{ otherwise, } \end{array}\right. \]
where \(0<p_0<p_1<1\). Meanwhile, we update the LM parameter as
\[ \lambda _{k}=\mu _{k}\Vert F_{k}\Vert ^{\delta }, \]
where
\[ \mu _{k+1}=\left\{ \begin{array}{ll} c_{1}\mu _{k}, &{} \quad \text{ if } r_{k}<p_{2}, \\ \mu _{k}, &{} \quad \text{ if } p_{2}\le r_{k}\le p_{3}, \\ \max \{c_{2}\mu _{k},\ \mu _{\min }\}, &{} \quad \text{ if } r_{k}>p_{3}. \end{array}\right. \]
Here \(0<c_2<1<c_1\), \(0<p_0<p_2<p_1<p_3<1\), \(1\le \delta \le 2\) and \(\mu _{\min }>0\) are positive constants. Based on (1.7)–(1.12), we propose an adaptive multi-step LM method for nonlinear equations (1.1) and show it converges superlinearly under the local error bound condition.
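The overall adaptive scheme can be sketched as follows. The parameter names match the text, but the precise form of the \(\mu _k\) update and the stopping rule are our reading of the description above, not a verbatim transcription of (1.7)–(1.12); the toy problem is our own.

```python
import numpy as np

def adaptive_lm(F, J, x, t=10, delta=2.0, p0=1e-4, p1=0.5, p2=0.25, p3=0.75,
                c1=4.0, c2=0.25, mu=1e-5, mu_min=1e-8, tol=1e-8, max_iter=200):
    """Schematic adaptive multi-step LM iteration (our reading of the scheme)."""
    n = x.size
    G = J(x)                                     # latest evaluated Jacobian
    s = 1                                        # steps computed with G so far
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(G.T @ Fx) <= tol:      # approximate stationarity test
            break
        lam = mu * np.linalg.norm(Fx) ** delta
        d = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fx)
        pred = np.linalg.norm(Fx) ** 2 - np.linalg.norm(Fx + G @ d) ** 2
        ared = np.linalg.norm(Fx) ** 2 - np.linalg.norm(F(x + d)) ** 2
        r = ared / pred if pred > 0 else -1.0
        if r >= p0:                              # successful iteration: accept the step
            x = x + d
        if r >= p1 and s < t:                    # very successful: keep the old Jacobian
            s += 1
        else:                                    # evaluate the Jacobian at the iterate
            G, s = J(x), 1
        if r < p2:                               # trust-region-style update of mu
            mu = c1 * mu
        elif r > p3:
            mu = max(c2 * mu, mu_min)
    return x

# Toy problem: F(x) = (x1^2 + x2^2 - 2, x1 - x2), root x* = (1, 1).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
sol = adaptive_lm(F, J, np.array([2.0, 0.5]))
```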
The paper is organized as follows. In Sect. 2, we propose an adaptive multi-step LM algorithm for (1.1). It is shown that the algorithm converges globally under certain assumptions. In Sect. 3, we discuss the convergence rate of the algorithm under the local error bound condition. Some numerical results are given in Sect. 4. Finally, we conclude the paper in Sect. 5.
2 An Adaptive Multi-step LM Algorithm and Global Convergence
In this section, we first propose the adaptive multi-step LM algorithm, then show it converges globally under certain assumptions.
The adaptive multi-step LM algorithm is presented as follows.
We denote by \({{\bar{S}}}=\{k_i : i=1,2,\ldots \}\) the set of iteration indices at which the Jacobians \(J(x_{k_i})\ (i=1,2,\ldots )\) are evaluated and used to compute LM steps. Let
\[ s_{i}=k_{i+1}-k_{i}. \]
Since at most \(t-1\) approximate LM steps are computed between two Jacobian evaluations, we have \(s_i\le t\). For any k, there exist \(k_i\) and \(0\le q\le s_i-1\) such that
\[ k=k_{i}+q. \]
Note that
\[ G_{k}=J_{k_{i}}, \quad k=k_{i},\ldots ,k_{i}+s_{i}-1, \]
thus the linear equations (1.7) can also be written as
\[ (J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k}I)d_{k}=-J_{k_{i}}^{T}F_{k} \]
for \(k=k_i, \ldots , k_i+s_i-1\).
It can be easily checked that \(d_{k}\) is not only the minimizer of the convex minimization problem
\[ \min _{d\in R^{n}}\ \varphi _{k}(d):=\Vert F_{k}+G_{k}d\Vert ^{2}+\lambda _{k}\Vert d\Vert ^{2}, \]
but also the solution of the trust region problem
\[ \min _{d\in R^{n}}\ \Vert F_{k}+G_{k}d\Vert ^{2} \quad \text{ s.t. } \quad \Vert d\Vert \le \Vert d_{k}\Vert . \]
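The first characterization can be checked numerically: the solution of the damped normal equations (1.7) is the stationary point of the strictly convex function \(\Vert F_{k}+G_{k}d\Vert ^{2}+\lambda _{k}\Vert d\Vert ^{2}\), and so beats any nearby perturbation. The random data below are our own sanity check, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
G = rng.standard_normal((m, n))                  # plays the role of G_k
Fk = rng.standard_normal(m)                      # plays the role of F_k
lam = 0.3                                        # plays the role of lambda_k > 0

phi = lambda d: np.linalg.norm(Fk + G @ d) ** 2 + lam * np.linalg.norm(d) ** 2

# d solves (G^T G + lam I) d = -G^T F, the normal equations of phi
d = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fk)

# phi is strictly convex, so d must not be beaten by any perturbed point
for _ in range(100):
    v = 1e-3 * rng.standard_normal(n)
    assert phi(d) <= phi(d + v) + 1e-12
```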
Due to Powell’s result [10, Theorem 4], we have the following lemma.
Lemma 2.1
Let \(d_{k}\) be computed by (1.7); then
\[ \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2}\ \ge \ \Vert G_{k}^{T}F_{k}\Vert \min \left\{ \Vert d_{k}\Vert ,\ \frac{\Vert G_{k}^{T}F_{k}\Vert }{\Vert G_{k}^{T}G_{k}\Vert }\right\} . \]
Lemma 2.1 indicates that the predicted reduction of the merit function is always nonnegative. It plays a crucial role in guaranteeing the global convergence of Algorithm 1. In the following, we show that, from the optimization point of view, at least one of the accumulation points of the sequence generated by Algorithm 1 is a stationary point of the merit function \(\Vert F(x)\Vert ^2\).
Theorem 2.2
Suppose F(x) is continuously differentiable, and both F(x) and J(x) are Lipschitz continuous, i.e., there exist positive constants \(\kappa _{lf}\) and \(\kappa _{lj}\) such that
\[ \Vert F(x)-F(y)\Vert \le \kappa _{lf}\Vert x-y\Vert , \quad \forall x,y\in R^{n}, \]
and
\[ \Vert J(x)-J(y)\Vert \le \kappa _{lj}\Vert x-y\Vert , \quad \forall x,y\in R^{n}. \]
Then, we have
\[ \liminf _{k\rightarrow \infty }\Vert J_{k}^{T}F_{k}\Vert =0. \qquad (2.10) \]
Proof
Suppose that (2.10) is not true. Then, there exists a constant \(\varepsilon >0\) such that
and
Denote the index set of successful iterations by
\[ S=\{k : r_{k}\ge p_{0}\}. \]
We consider S in two cases.
Case 1: S is finite. Then, there exists \({\tilde{k}}\) such that \(r_{k}<p_0<p_1\) for all \(k\ge {\tilde{k}}\). Hence, \(G_k = J_k\) for all \(k\ge {\tilde{k}}\). Since \(\{\Vert F_k\Vert \}\) is nonincreasing, by (1.12), \(\mu _{k+1}=c_1\mu _k\) with \(c_1>1\). So, \(\mu _{k}\rightarrow +\,\infty \).
By (2.11) and (2.12), for \(k\ge {\tilde{k}}\),
and
This, together with \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \) and \(\mu _k \rightarrow +\,\infty \), gives \(\lambda _{k}\rightarrow +\,\infty .\) Hence, by the definition of \(d_k\), we get \(d_{k}\rightarrow 0\).
Case 2: S is infinite. It follows from (2.12) and Lemma 2.1 that
We claim that the set \(S\cap {{\bar{S}}}\) is infinite. Suppose to the contrary that it is finite, and let \(k_{{{\bar{i}}}}\) be the largest index in it. Then, for \(i>{{\bar{i}}}\), \(r_{k_i}< p_0 < p_1\). By (1.10), \(G_{k_i+1} = J_{k_i+1}\). Thus, \(k_{i+1}=k_i+1\). Moreover, \(k_{i+1} \notin S\). By induction, we obtain that \(r_k<p_0\) for all sufficiently large k, which contradicts the fact that S is infinite. Hence, \(S\cap {{\bar{S}}}\) is infinite.
Note that \(k_i\in {{\bar{S}}}\) according to the definition of \({{\bar{S}}}\). By (2.17), we obtain \(d_{k_i}\rightarrow 0\) for \(k_i\in S\). Since \(d_{k_i}=0\) for \(k_i\notin S\), we have \(d_{k_i}\rightarrow 0\). This, together with \(\Vert G_{k_i}\Vert \le \kappa _{lf}\), \(\Vert G_{k_i}^TF_{k_i}\Vert =\Vert J_{k_i}^TF_{k_i}\Vert \ge \varepsilon \) and (1.7), gives \(\lambda _{k_i} \rightarrow +\,\infty \).
By (2.12) and (2.13), for \(k=k_i+1,\ldots , k_i+s_i-1\),
Since \(\lambda _{k_i} \rightarrow + \infty \), we have
and
for sufficiently large \(k_i\). By induction, there exists a positive constant \({{\hat{c}}}\) such that, for \(k=k_i+1,\ldots ,k_i+s_i-1\),
holds for all sufficiently large \(k_i\). Since \(s_i\le t\) for all i, we have \(d_k\rightarrow 0\) and
Hence, by (2.11),
holds for sufficiently large k. Since \(\Vert F_k\Vert \ge \left\| G_{k}^{T}F_{k}\right\| / \Vert G_{k}\Vert \ge \frac{\varepsilon }{2 \kappa _{lf}}\), by the definition of \(d_k\) and \(\lambda _k\), we have \(\lambda _{k}\rightarrow +\,\infty \) and \(\mu _k\rightarrow +\,\infty \).
Therefore, whether S is finite or infinite, we obtain
moreover, (2.22) and (2.23) hold true. It then follows from (2.12), (2.13), (2.22), (2.23), Lemma 2.1 and \(\Vert F_k\Vert \le \Vert F_1\Vert \) that
Thus, \(r_{k}\rightarrow 1\). So, by (1.10)–(1.12), there exists a positive \({\hat{\mu }}\) such that \( \mu _{k}\le {\hat{\mu }} \) holds for all sufficiently large k, contradicting (2.24). Hence, (2.11) cannot hold, and we obtain (2.10). \(\square \)
3 Local Convergence
In this section, we first investigate the properties of the trial step, then show that the LM parameter is bounded, finally we prove that Algorithm 1 converges superlinearly under the local error bound condition. We make the following assumptions.
Assumption 3.1
The sequence \(\{x_k\}\) generated by Algorithm 1 satisfies \({\text {dist}}(x_k, X^*)\rightarrow 0\), and there exist \(x^*\in X^*\) and \(0<r<1\) such that \( \Vert x_k-x^*\Vert \le r/2\) for all large k.
Assumption 3.2
F(x) is continuously differentiable, and J(x) is Lipschitz continuous on \(N(x^*, r)=\{ x: \Vert x-x^*\Vert \le r\}\), i.e., there exists a constant \(\kappa _{lj}>0\) such that
\[ \Vert J(x)-J(y)\Vert \le \kappa _{lj}\Vert x-y\Vert , \quad \forall x,y\in N(x^*,r). \]
Assumption 3.3
\(\Vert F(x)\Vert \) provides a local error bound on \(N(x^*, r)\) for (1.1), i.e., there exists a constant \(\kappa _{leb}>0\) such that
\[ \Vert F(x)\Vert \ge \kappa _{leb}\,{\text {dist}}(x, X^{*}), \quad \forall x\in N(x^*,r). \]
By Assumption 3.2,
\[ \Vert F(y)-F(x)-J(x)(y-x)\Vert \le \kappa _{lj}\Vert y-x\Vert ^{2}, \quad \forall x,y\in N(x^*,r), \]
and there exists a constant \(\kappa _{lf}>0\) such that
\[ \Vert F(x)-F(y)\Vert \le \kappa _{lf}\Vert x-y\Vert , \quad \forall x,y\in N(x^*,r). \]
Denote by \({\bar{x}}_k\) a vector in \(X^*\) that satisfies
\[ \Vert {\bar{x}}_{k}-x_{k}\Vert ={\text {dist}}(x_{k},X^{*}). \]
Then, for all large k,
\[ \Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert x_{k}-x^{*}\Vert \le r/2 \]
and
\[ \Vert {\bar{x}}_{k}-x^{*}\Vert \le \Vert {\bar{x}}_{k}-x_{k}\Vert +\Vert x_{k}-x^{*}\Vert \le r, \]
so \({{\bar{x}}}_k\in N(x^*,r)\). In the following, we assume that the iterations k under consideration are sufficiently large.
3.1 Properties of the Step \(d_k\)
In this subsection, we show the relationship between the size of \(d_k\) and the distance from \(x_{k_i}\) to the solution set.
Lemma 3.4
Under Assumptions 3.1–3.3, there exists a constant \(c>0\) such that
\[ \Vert d_{k}\Vert \le c\,\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert , \quad k=k_{i},\ldots ,k_{i}+s_{i}-1. \qquad (3.8) \]
Proof
It follows from (1.12) and (3.2) that
Since \(d_{k_i}\) is the minimizer of \(\varphi _{k_{i}}(d)\), by (3.6), (3.3), \(0<r<1\) and \(1\le \delta \le 2\), we have
where \(c_0=\sqrt{\mu _{\min }^{-1}\kappa _{leb} ^{-\delta }\kappa _{lj}^{2}+1}\).
For \(k=k_i+1, \ldots , k_i+s_i-1\), by (3.3),
By (3.9),
So, it follows from (3.6), (3.10) and \(1\le \delta \le 2\) that
where \({{\bar{c}}}= \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0\) and \(c_1=2+{{\bar{c}}}\) are positive constants. Similarly,
where \(c_2=2+c_1+{{\bar{c}}}(1+c_1)^2\). Let
Then, by induction, we have
Since \(s_i\le t\) for all i, \(c_{t-1}>c_{t-2}>\cdots>c_2>c_1>1\), by (3.10), we obtain (3.8), where \(c=c_0c_{t-1}\). \(\square \)
3.2 Boundedness of the LM Parameter
The updating rule of \(\mu _k\) indicates that \(\mu _k\) is bounded below. In the following, we show that \(\mu _k\) is bounded above.
Lemma 3.5
Under Assumptions 3.1–3.3, there exists a constant \(\mu _{\max }>0\) such that
\[ \mu _{k}\le \mu _{\max } \qquad (3.17) \]
holds for all sufficiently large k.
Proof
We first prove that
holds for sufficiently large k.
It follows from (3.1), (3.3), Lemma 3.4 and \(s_i\le t\) that, for \(k=k_i, k_i+1,\ldots , k_i+s_i-1\),
Since \(\Vert {\bar{x}}_{k}-x_{k}\Vert \rightarrow 0\) and \(\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \rightarrow 0\), we obtain that
holds for sufficiently large k.
We consider two cases.
Case 1: \(\Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert d_k\Vert \). Since \(d_k\) is a solution of the trust region problem (2.6), by (3.2) and (3.20),
holds for sufficiently large k.
Case 2: \(\Vert {\bar{x}}_{k}-x_{k}\Vert >\Vert d_k\Vert \). Similarly, by (3.2) and (3.20),
Combining (3.21) with (3.22), we get
which yields (3.18).
It then follows from (3.2), (3.3), Lemma 3.4 and \(\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert \le \Vert F_{k_i}\Vert \) that
This implies that \(r_{k_i}\rightarrow 1\). Note that for \(k\notin {{\bar{S}}}\), \(G_{k+1}=G_k\) and \(r_k\ge p_1>p_2\), so \(\mu _{k+1}\le \mu _{k}\). By (1.10)–(1.12), there exists a constant \(\mu _{\max }>0\) such that (3.17) holds true. \(\square \)
3.3 Convergence Rate of Algorithm 1
In view of the results given by Behling and Iusem in [1], we may assume that \({\text {rank}}(J({\bar{x}}))=r\) for all \({\bar{x}}\in N(x^{*},b )\cap X^{*}\). Suppose the SVD of \(J({\bar{x}}_{k_{i}})\) is
where \({\bar{\Sigma }}_{k_{i},1}={\text {diag}}({\bar{\sigma }}_{k_{i},1},{\bar{\sigma }}_{k_{i},2},\ldots ,{\bar{\sigma }}_{k_{i},r})\) with \({\bar{\sigma }}_{k_{i},1}\ge \cdots \ge {\bar{\sigma }}_{k_{i},r}>0\). Suppose the SVD of \(J(x_{k_{i}})\) is
where \(\Sigma _{k_{i},1}=\text{ diag }(\sigma _{k_{i},1},\sigma _{k_{i},2},\ldots ,\sigma _{k_{i},r})\) with \(\sigma _{k_{i},1}\ge \cdots \ge \sigma _{k_{i},r}>0\) and \(\Sigma _{k_{i},2}=\text{ diag }(\sigma _{k_{i},r+1},\sigma _{k_{i},r+2},\ldots ,\sigma _{k_{i},n})\) with \(\sigma _{k_{i},r+1}\ge \cdots \ge \sigma _{k_{i},n}\ge 0\).
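The matrix perturbation fact from [12] that underlies the analysis, Weyl's inequality for singular values, \(|\sigma _i(A)-\sigma _i(B)|\le \Vert A-B\Vert _2\) for every i, can be illustrated numerically (the random matrices below are our own example):

```python
import numpy as np

rng = np.random.default_rng(1)
J1 = rng.standard_normal((5, 4))                 # plays the role of J(x_bar)
E = 1e-3 * rng.standard_normal((5, 4))           # a small perturbation
J2 = J1 + E                                      # plays the role of J(x)

s1 = np.linalg.svd(J1, compute_uv=False)         # singular values, descending
s2 = np.linalg.svd(J2, compute_uv=False)

# Weyl's inequality: matched singular values differ by at most ||E||_2
gap = np.abs(s1 - s2).max()
assert gap <= np.linalg.norm(E, 2) + 1e-12
```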
By (3.1) and the theory of matrix perturbation [12], we obtain
Lemma 3.6
Under Assumptions 3.1–3.3, there exist positive constants \(l_1\) and \(l_2\) such that
Proof
We prove by induction. It was shown in [2] that the results hold true for \(k=k_i\) and \(k=k_i+1\).
Suppose the results hold true for \(k-1 (k_i+2< k < k_i+s_i-1)\), that is, there exist constants \({{\bar{l}}}_1\) and \({{\bar{l}}}_2\) such that
It then follows from (3.1), (3.3) and Lemma 3.4 that
where \({{\bar{l}}}_3={{\bar{l}}}_2+\kappa _{lj}ct{{\bar{l}}}_1+\kappa _{lj}\bar{l}_1^2\). Hence,
Moreover, by (3.2),
Let \({\tilde{J}}_{k_{i}}=U_{k_i, 1}\Sigma _{k_i, 1}V_{k_i, 1}^{T}\) and \({\tilde{u}}_{k}=-{\tilde{J}}_{k_i}^{+}F_{k}\). Then, \({\tilde{u}}_{k}\) is the least squares solution of the problem \({\displaystyle \min _{u\in R^{n}}\Vert F_{k}+{\tilde{J}}_{k_{i}}u\Vert }\). By (3.1), (3.3) and (3.27),
where \({{\bar{l}}}_4=\kappa _{lj}\kappa _{leb}^{-1} \bar{l}_3(\kappa _{leb}^{-1} {{\bar{l}}}_3+ct+1)\).
Note that \(\{x_k\}\) converges to \(X^{*}\). By (3.27), we know
holds for sufficiently large \(k_i\). By (3.9),
It then follows from (3.33), (3.35)–(3.37), Lemma 3.5 and \(1\le \delta \le 2\) that
where \(l_1=2{\bar{\sigma }}_{k_i, r}^{-1} {{\bar{l}}}_3+ \kappa _{lj}\mu _{\min }^{-1} \kappa _{leb} ^{-\delta } {{\bar{l}}}_4\), and
where \(l_2= 4\mu _{\max }\kappa _{lf}^{\delta }{\bar{\sigma }}_{k_i, r}^{-2}{{\bar{l}}}_3+{{\bar{l}}}_4\). The proof is completed. \(\square \)
Based on Lemma 3.6, we have the following main result.
Theorem 3.7
Under Assumptions 3.1–3.3, there exists a constant \(l_3>0\) such that
Consequently, Algorithm 1 converges q-superlinearly to some solution of (1.1).
Proof
It follows from (3.2), (3.3), Lemma 3.4 and Lemma 3.6 that
Since
holds for sufficiently large k.
Let \({{\bar{l}}}_5=\kappa _{leb}^{-1}(l_2+tcl_1+\kappa _{lj}l_1^2)\). It then follows from (3.41), Lemmas 3.4 and 3.6 that
Letting \(l_3=c{{\bar{l}}}_5(2tc_{t-1})^{t+1}\), we obtain (3.40).
By Lemma 3.4, we have \(\Vert d_{k_i}\Vert \rightarrow 0\). Hence, there exist N and \(0<q<1\) such that \(\max \{\Vert d_{k_i}\Vert , l_3\Vert d_{k_i}\Vert \}\le q<1\) for all \(i\ge N\). Since \( s_i\ge 1\), we have
Then,
This implies that \(\sum _{i=N}^\infty \Vert d_{k_i}\Vert \) converges, so does \(\sum _{i=1}^\infty \Vert d_{k_i}\Vert \).
By (3.15) and \(s_i\le t\), we have
So, \(\sum _{i=1}^\infty \Vert x_{k_{i+1}}-x_{k_i}\Vert \) converges, which implies that \(\sum _{i=1}^\infty (x_{k_{i+1}}-x_{k_i})\) converges. Thus, \(\{x_{k_i}\}\) converges to some \({{\hat{x}}}\in X^*\). By (3.40), we have
for some \({{\tilde{q}}}>0\). Therefore, Algorithm 1 converges q-superlinearly. \(\square \)
4 Numerical Results
We tested Algorithm 1 on some singular problems and compared it with the LM algorithm, the modified LM algorithm [2], and the multi-step LM algorithm with \(m=3\) [3]. The experiments were run on a laptop with an Intel Core i7-7500U CPU and 8 GB of RAM, using Matlab R2015b.
The test problems are created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in [8]. They have the same form as in [11],
\[ {\hat{F}}(x)=F(x)-J(x^{*})A(A^{T}A)^{-1}A^{T}(x-x^{*}), \]
where F(x) is the standard nonsingular test function, \(x^{*}\) is its root, and \(A\in R^{n\times k}\) has full column rank with \(1\le k\le n\). Obviously, \({\hat{F}}(x^{*})=0\) and
\[ {\hat{J}}(x^{*})=J(x^{*})\left( I-A(A^{T}A)^{-1}A^{T}\right) \]
has rank \(n-k\).
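Assuming the modification takes the form \({\hat{F}}(x)=F(x)-J(x^{*})A(A^{T}A)^{-1}A^{T}(x-x^{*})\) used in [11], the construction can be sketched as follows; the base problem here is our own toy choice, not one of the test functions from [8].

```python
import numpy as np

# Nonsingular base problem: F(x) = (x1^2 - 1, x2^2 - 1) with root x* = (1, 1).
Fbase = lambda x: np.array([x[0] ** 2 - 1.0, x[1] ** 2 - 1.0])
Jbase = lambda x: np.diag([2.0 * x[0], 2.0 * x[1]])

xstar = np.array([1.0, 1.0])
A = np.array([[1.0], [1.0]])                     # full column rank, k = 1
P = A @ np.linalg.inv(A.T @ A) @ A.T             # orthogonal projector onto range(A)

# Modified problem: same root, but singular Jacobian at the root.
Fhat = lambda x: Fbase(x) - Jbase(xstar) @ P @ (x - xstar)
Jhat = lambda x: Jbase(x) - Jbase(xstar) @ P

assert np.allclose(Fhat(xstar), 0.0)             # x* remains a root
assert np.linalg.matrix_rank(Jhat(xstar)) == 1   # rank drops to n - k = 1
```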
We set \(p_{0}=0.0001,p_{1}=0.50,p_{2}=0.25,p_{3}=0.75\), \(c_1=4, c_2=0.25\), \(\mu _{1}=10^{-5}\), \(\mu _{\min }=10^{-8}\) and \(t=10\) for the tests. Algorithm 1 stops when \(\Vert J_{k_i}^{T}F_{k_i}\Vert \le 10^{-5}\) or the iteration number exceeds \(100\times (n+1)\).
We chose the rank of \({\hat{J}}(x^{*})\) to be \(n-1\) by using
\[ A=(1,1,\ldots ,1)^{T}\in R^{n\times 1}. \]
The results are given in Tables 1 and 2. In the tables, \(x_{0}\), \(10x_{0}\) and \(100x_{0}\) in the third column are starting points, where \(x_{0}\) is suggested in [8]. "NF" and "NJ" represent the numbers of function evaluations and Jacobian evaluations, respectively. "—" indicates that the algorithm fails to find a solution within \(100\times (n+1)\) iterations. Note that one evaluation of the Jacobian generally costs about n times as much as one function evaluation, so we also present the values "\(NF+n* NJ\)" to compare the total evaluation cost. However, if the Jacobian is sparse, this measure is less meaningful. For large scale problems, we present the running time instead of the total evaluations.
We also chose the rank of \({\hat{J}}(x^{*})\) to be 1 and 10, where \(A\in R^{n\times (n-1)}\) and \(A\in R^{n\times (n-10)}\) are generated randomly. The results are given in Tables 3 and 4, respectively.
From the tables, we can see that the adaptive LM algorithm usually takes the fewest Jacobian evaluations as well as the fewest total evaluations to find a solution of small scale nonlinear equations, and usually takes the least time for large scale nonlinear equations, regardless of the rank of the problem. The reason is that it makes full use of the available evaluated Jacobians and matrix factorizations to compute approximate LM steps. Since computing approximate LM steps takes much less effort, the overall cost of the adaptive LM algorithm is usually much lower.
5 Conclusion
We proposed an adaptive multi-step Levenberg–Marquardt algorithm for nonlinear equations. It uses the latest evaluated Jacobian to calculate an approximate LM step, if the ratio of the actual reduction to the predicted reduction of the merit function at the iterate is good. Under the local error bound condition, which is weaker than the nonsingularity condition, the adaptive LM algorithm converges superlinearly to some solution of (1.1). Compared to the LM algorithm, the Jacobian calculations, the total calculations, as well as the running time of the adaptive multi-step LM algorithm are significantly reduced.
References
Behling, R., Iusem, A.: The effect of calmness on the solution set of systems of nonlinear equations. Math. Program. 137, 155–165 (2013)
Fan, J.Y.: The modified Levenberg–Marquardt method for nonlinear equations with cubic convergence. Math. Comput. 81, 447–466 (2012)
Fan, J.Y.: A Shamanskii-like Levenberg–Marquardt method for nonlinear equations. Comput. Optim. Appl. 56, 63–80 (2013)
Fan, J.Y., Yuan, Y.X.: On the quadratic convergence of the Levenberg–Marquardt method without nonsingularity assumption. Computing 74, 23–39 (2005)
Kelley, C.T.: Solving Nonlinear Equations with Newton’s Method. Fundamentals of Algorithms. SIAM, Philadelphia (2003)
Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Q. Appl. Math. 2, 164–166 (1944)
Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. (TOMS) 7, 17–41 (1981)
Moré, J.J.: The Levenberg–Marquardt Algorithm: Implementation and Theory. In: Watson, G.A. (ed.) Lecture Notes in Mathematics 630: Numerical Analysis, pp. 105–116. Springer, Berlin (1978)
Powell, M.J.D.: Convergence properties of a class of minimization algorithms. Nonlinear Program. 2, 1–27 (1975)
Schnabel, R.B., Frank, P.D.: Tensor methods for nonlinear equations. SIAM J. Numer. Anal. 21, 815–843 (1984)
Stewart, G.W., Sun, J.-G.: Matrix Perturbation Theory, (Computer Science and Scientific Computing). Academic Press, Boston (1990)
Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg–Marquardt method. Computing (Suppl.) 15, 237–249 (2001)
Yuan, Y.X.: Recent advances in trust region algorithms. Math. Program. Ser. B 151, 249–281 (2015)
The first author was supported in part by NSFC Grant 11571234. The third author was supported in part by NSFC Grant 11371145 and 11771148, and Science and Technology Commission of Shanghai Municipality Grant 13dz2260400.
Fan, J., Huang, J. & Pan, J. An Adaptive Multi-step Levenberg–Marquardt Method. J Sci Comput 78, 531–548 (2019). https://doi.org/10.1007/s10915-018-0777-8