Abstract
A new Levenberg–Marquardt (LM) algorithm is proposed for nonlinear equations, where the iterate is updated according to the ratio of the actual reduction to the predicted reduction as usual, but the update of the LM parameter is no longer just based on that ratio. When the iteration is unsuccessful, the LM parameter is increased; but when the iteration is successful, it is updated based on the value of the gradient norm of the merit function. The algorithm converges globally under certain conditions. It also converges quadratically under the local error bound condition, which does not require the nonsingularity of the Jacobian at the solution.
1 Introduction
We consider the system of nonlinear equations
$$\begin{aligned} F(x)=0, \end{aligned}$$(1.1)
where \(F(x): R^n\rightarrow R^n\) is continuously differentiable.
The Levenberg–Marquardt (LM) method is one of the best-known iterative methods for nonlinear equations [5, 6, 15]. At the k-th iteration, it computes the trial step \(d_k\) from
$$\begin{aligned} (J_k^TJ_k+\lambda _kI)d=-J_k^TF_k, \end{aligned}$$
where \(F_k=F(x_k)\), \(J_k=J(x_k)\) is the Jacobian at \(x_k\), and \(\lambda _k\) is the LM parameter, introduced to overcome the difficulties caused by the singularity or near singularity of \(J_k\).
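For concreteness, the trial step above amounts to a few lines of linear algebra. The following sketch (our illustration, not the authors' code) solves the LM linear system for a given parameter \(\lambda \):

```python
import numpy as np

def lm_step(J, F, lam):
    """Solve (J^T J + lam * I) d = -J^T F for the LM trial step d.

    A positive lam keeps the coefficient matrix positive definite,
    so the system is solvable even when J is singular.
    """
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -(J.T @ F))
```

With \(\lambda =0\) and a nonsingular Jacobian, this reduces to the Gauss–Newton step.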
Let
$$\begin{aligned} \phi (x)=\Vert F(x)\Vert ^2 \end{aligned}$$
be the merit function of (1.1). Define the actual reduction of the merit function as
$$\begin{aligned} Ared_k=\Vert F_k\Vert ^2-\Vert F(x_k+d_k)\Vert ^2, \end{aligned}$$
the predicted reduction as
$$\begin{aligned} Pred_k=\Vert F_k\Vert ^2-\Vert F_k+J_kd_k\Vert ^2, \end{aligned}$$
and the ratio of the actual reduction to the predicted reduction as
$$\begin{aligned} r_k=\frac{Ared_k}{Pred_k}. \end{aligned}$$
In classical LM methods, one sets
$$\begin{aligned} x_{k+1}= {\left\{ \begin{array}{ll} x_k+d_k, &{} \hbox {if}~ r_k\ge p_0,\\ x_k, &{} \hbox {otherwise}, \end{array}\right. } \end{aligned}$$
where \(p_0\ge 0\) is a constant, and updates the LM parameter as
$$\begin{aligned} \lambda _{k+1}= {\left\{ \begin{array}{ll} c_0\lambda _k, &{} \hbox {if}~ r_k<p_1,\\ \lambda _k, &{} \hbox {if}~ p_1\le r_k\le p_2,\\ c_1\lambda _k, &{} \hbox {if}~ r_k>p_2, \end{array}\right. } \end{aligned}$$
where \(p_0<p_1<p_2<1\), \(0<c_1<1<c_0\) are positive constants (cf. [7, 9, 13, 16, 17]).
It was shown in [14] that, if the LM parameter is chosen as \(\lambda _k=\Vert F_k\Vert ^2\), then the LM method converges quadratically under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution. It was further proved in [4] that the LM method converges quadratically for all \(\lambda _k=\Vert F_k\Vert ^{\delta }\ (\delta \in [1,2])\) under the local error bound condition. In [1], Fan chose
$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert \end{aligned}$$
and updated \(\mu _k\) according to the ratio \(r_k\) as follows:
$$\begin{aligned} \mu _{k+1}= {\left\{ \begin{array}{ll} c_0\mu _k, &{} \hbox {if}~ r_k<p_1,\\ \mu _k, &{} \hbox {if}~ p_1\le r_k\le p_2,\\ \max \{c_1\mu _k, m\}, &{} \hbox {if}~ r_k>p_2, \end{array}\right. } \end{aligned}$$(1.7)
where \(m>0\) is a small constant to prevent the LM parameter from being too small.
Recently, Zhao and Fan [18] took the LM parameter as
$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert , \end{aligned}$$
where the update of \(\mu _k\) is no longer just based on the ratio \(r_k\). When the iteration is unsuccessful (i.e., \(r_k< p_0\)), \(\mu _k\) is increased; but when the iteration is successful (i.e., \(r_k\ge p_0\)), \(\mu _{k+1}\) is updated as
$$\begin{aligned} \mu _{k+1}= {\left\{ \begin{array}{ll} c_0\mu _k, &{} \hbox {if}~ \Vert J_k^TF_k\Vert <p_1/\mu _k,\\ \mu _k, &{} \hbox {if}~ p_1/\mu _k\le \Vert J_k^TF_k\Vert \le p_2/\mu _k,\\ \max \{c_1\mu _k, m\}, &{} \hbox {if}~ \Vert J_k^TF_k\Vert >p_2/\mu _k. \end{array}\right. } \end{aligned}$$(1.9)
It was shown that the global complexity bound of the above LM algorithm is \(O(\varepsilon ^{-2})\), that is, it takes at most \(O(\varepsilon ^{-2})\) iterations to drive the norm of the gradient of the merit function below the desired accuracy \(\varepsilon \).
The logic behind the updating rule (1.9) rests on the fact that the LM step is actually the solution of the trust region subproblem
$$\begin{aligned} \min _{d\in R^n}~\Vert F_k+J_kd\Vert ^2, \quad \hbox {s.t.}~ \Vert d\Vert \le \Delta _k:=\Vert (J_k^TJ_k+\lambda _kI)^{-1}J_k^TF_k\Vert . \end{aligned}$$(1.10)
The size of the step computed by solving (1.10) is proportional to the norm of the model gradient \(\Vert J^T_kF_k\Vert \). Hence the trust region radius, which is of the order of the inverse of \(\mu _k\), should also be of comparable size.
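This inverse relationship is easy to check numerically: for fixed \(J_k\) and \(F_k\), the step norm \(\Vert d_k\Vert \) decreases monotonically as \(\mu _k\) grows. A small sketch (the helper name is ours):

```python
import numpy as np

def lm_step_norm(J, F, mu):
    """Norm of the LM step with parameter lambda = mu * ||F||^2."""
    lam = mu * np.linalg.norm(F) ** 2
    n = J.shape[1]
    d = np.linalg.solve(J.T @ J + lam * np.eye(n), -(J.T @ F))
    return np.linalg.norm(d)
```

Since \(\Vert d(\lambda )\Vert \) is nonincreasing in \(\lambda \), a larger \(\mu _k\) corresponds to a smaller implicit trust region.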
In this paper, we present a new LM algorithm for (1.1), where the LM parameter is computed as
$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert ^2. \end{aligned}$$
We update the iterate \(x_k\) according to the ratio \(r_k\) as in classical LM algorithms. When the iteration is unsuccessful, we increase \(\mu _k\); otherwise, we update \(\mu _{k+1}\) by (1.9). We show that the new LM algorithm preserves the global convergence of classical LM algorithms. We also prove that the algorithm converges quadratically under the local error bound condition.
The paper is organized as follows. In Sect. 2, we present the new LM algorithm for (1.1). The global convergence of the algorithm is also proved. In Sect. 3, we study the convergence rate of the algorithm under the local error bound condition. Some numerical results are given in Sect. 4. Finally, we conclude the paper in Sect. 5.
2 The LM Algorithm and Global Convergence
In this section, we first give the new LM algorithm, then show that the algorithm converges globally under certain conditions.
The LM algorithm is presented as follows.
Algorithm 2.1
(A Levenberg–Marquardt algorithm for nonlinear equations)
Step 1. Given \(x_0 \in R^n\), \(\mu _0>m>0\), \(0<p_0< p_1< p_2<1\), \(c_0>1\), \(0<c_1<1\), \(\varepsilon \ge 0\). Set \(k:=0\).
Step 2. If \(\Vert J_k^TF_k\Vert \le \varepsilon \), then stop. Otherwise, solve
$$\begin{aligned} (J_k^TJ_k+\lambda _kI)d=-J_k^TF_k ~\hbox {with}~ \lambda _k=\mu _k\Vert F_k\Vert ^{2} \end{aligned}$$(2.1)
to obtain \(d_k\).
Step 3. Compute \(r_k=\frac{Ared_k}{Pred_k}\). If \(r_k\ge p_0\), set \(x_{k+1}=x_k+d_k\) and compute \(\mu _{k+1}\) by (1.9); otherwise, set \(x_{k+1}=x_k\) and \(\mu _{k+1}=c_0\mu _k\). Set \(k:=k+1\) and go to Step 2.
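The steps above can be sketched in code as follows. The successful-iteration update of \(\mu _k\) follows our reading of rule (1.9), comparing \(\Vert J_k^TF_k\Vert \) with \(p_1/\mu _k\) and \(p_2/\mu _k\); treat that branch structure as an assumption of this sketch, not the authors' implementation.

```python
import numpy as np

def lm_algorithm(F, J, x0, mu0=1e-8, m=1e-8, p0=1e-4, p1=0.25, p2=0.75,
                 c0=4.0, c1=0.25, eps=1e-6, max_iter=1000):
    """Sketch of Algorithm 2.1 with lambda_k = mu_k * ||F_k||^2.

    The successful-iteration update of mu (rule (1.9)) is assumed to
    compare ||J_k^T F_k|| against p1/mu_k and p2/mu_k.
    """
    x, mu = np.asarray(x0, dtype=float), mu0
    for _ in range(max_iter):
        Fx, Jx = F(x), J(x)
        g = Jx.T @ Fx
        gn = np.linalg.norm(g)
        if gn <= eps:                       # Step 2: stopping test
            break
        lam = mu * np.linalg.norm(Fx) ** 2
        d = np.linalg.solve(Jx.T @ Jx + lam * np.eye(x.size), -g)
        ared = np.linalg.norm(Fx) ** 2 - np.linalg.norm(F(x + d)) ** 2
        pred = np.linalg.norm(Fx) ** 2 - np.linalg.norm(Fx + Jx @ d) ** 2
        r = ared / pred if pred > 0 else 0.0
        if r >= p0:                         # successful: accept the step
            x = x + d
            if gn < p1 / mu:                # assumed form of (1.9)
                mu = c0 * mu
            elif gn > p2 / mu:
                mu = max(c1 * mu, m)
        else:                               # unsuccessful: increase mu
            mu = c0 * mu
    return x
```

On the toy system \(F(x)=(x_1^2-1,\, x_2-2)^T\), this sketch converges from \(x_0=(2,0)^T\) to the root \((1,2)^T\) in a handful of iterations.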
To study the global convergence of Algorithm 2.1, we make the following assumption.
Assumption 2.1
F(x) is continuously differentiable, and both F(x) and its Jacobian J(x) are Lipschitz continuous, i.e., there exist positive constants \(L_1\) and \(L_2\) such that
$$\begin{aligned} \Vert J(x)-J(y)\Vert \le L_1\Vert x-y\Vert , \quad \forall x, y\in R^n, \end{aligned}$$(2.2)
and
$$\begin{aligned} \Vert F(x)-F(y)\Vert \le L_2\Vert x-y\Vert , \quad \forall x, y\in R^n. \end{aligned}$$(2.3)
By a result of Powell [10], we have the following lemma.
Lemma 2.1
The predicted reduction satisfies
$$\begin{aligned} Pred_k\ge \Vert J_k^TF_k\Vert \min \left\{ \Vert d_k\Vert , \frac{\Vert J_k^TF_k\Vert }{\Vert J_k^TJ_k\Vert }\right\} \end{aligned}$$(2.4)
for all k.
Lemma 2.1 implies that the predicted reduction is always nonnegative.
In the following, we first prove the weak global convergence of Algorithm 2.1, that is, at least one accumulation point of the sequence generated by Algorithm 2.1 is a stationary point of the merit function \(\phi (x)\).
Theorem 2.1
Under Assumption 2.1, Algorithm 2.1 terminates in finitely many iterations or satisfies
$$\begin{aligned} \liminf _{k\rightarrow \infty }\Vert J_k^TF_k\Vert =0. \end{aligned}$$
Proof
We prove by contradiction. Suppose that there exists a constant \(\tau >0\) such that
$$\begin{aligned} \Vert J_k^TF_k\Vert \ge \tau , \quad \forall k. \end{aligned}$$
Define the index set of successful iterations:
$$\begin{aligned} S=\{k: r_k\ge p_0\}. \end{aligned}$$
We discuss in two cases.
Case I. S is infinite. Since \(\{\Vert F_k\Vert \}\) is nonincreasing and bounded below, it follows from (2.3) and (2.4) that
So,
Since \(d_k=0\) for \(k\notin S\), we have
Since \(\Vert J_k\Vert \le L_2\) and \(\Vert F_k\Vert \le \Vert F_0\Vert \), by (2.1), we have
On the other hand, it follows from (2.3) and (2.4) that
So, \(r_k\rightarrow 1\). Thus, \(\mu _k\) is updated by (1.9) and \(\Vert J_k^TF_k\Vert >p_2/\mu _k\) for all sufficiently large k. Hence, \(\mu _{k+1}=\max \left\{ c_1\mu _k,m\right\} \) for all large k. Since \(0<c_1<1\), there exists a positive constant \(\tilde{c}\) such that
for all large k. This is a contradiction to (2.9).
Case II. S is finite. Then there exists a \(\tilde{k}\) such that
According to the updating rule of \(x_k\) in Algorithm 2.1, we have \(d_k\rightarrow 0\). By the same arguments as (2.10), we get \(r_k\rightarrow 1\), which contradicts (2.11). The proof is completed.\(\square \)
Based on Theorem 2.1, we can further prove the strong global convergence of Algorithm 2.1, that is, all limit points of the sequence generated by Algorithm 2.1 are stationary points of the merit function \(\phi (x)\). We first give an auxiliary result (cf. [3, Lemma 2.7]).
Lemma 2.2
Let \(b, a_1, \ldots , a_N>0\). Then,
Theorem 2.2
Under Assumption 2.1, Algorithm 2.1 terminates in finitely many iterations or satisfies
$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert J_k^TF_k\Vert =0. \end{aligned}$$(2.13)
Proof
Suppose by contradiction that there exists \(\tau >0\) such that the set
$$\begin{aligned} \Omega =\{k: \Vert J_k^TF_k\Vert \ge \tau \} \end{aligned}$$(2.14)
is infinite. Given \(k\in \Omega \), consider the first index \(l_k>k\) such that \(\Vert J_{l_k}^TF_{l_k}\Vert \le \frac{\tau }{2}\). The existence of such \(l_k\) is guaranteed by Theorem 2.1. By (2.2), (2.3) and \(\Vert F_k\Vert \le \Vert F_0\Vert \),
which yields
Define the set
Then,
It now follows from (2.4), (2.15) and Lemma 2.2 that, for all \(k\in \Omega \),
However, since \(\{\Vert F_k\Vert ^2\}\) is nonincreasing and bounded below, \(\Vert F_k\Vert ^2-\Vert F_{l_k}\Vert ^2\rightarrow 0\). This contradicts (2.16). So, the set \(\Omega \) defined by (2.14) is finite. Therefore, (2.13) holds true. The proof is completed.\(\square \)
3 Local Convergence
We assume that the sequence \(\{x_k\}\) generated by Algorithm 2.1 converges to the solution set \(X^*\) of (1.1) and lies in some neighbourhood of \(x^*\in X^*\). We first give some important properties of the algorithm, then show that the algorithm converges quadratically under the local error bound condition.
We make the following assumption.
Assumption 3.1
(a) F(x) is continuously differentiable, and \(\Vert F(x)\Vert \) provides a local error bound on some neighbourhood of \(x^*\in X^*\), i.e., there exist positive constants \(c>0\) and \(b_1<1\) such that
$$\begin{aligned} \Vert F(x)\Vert \ge c\, {\text {dist}}(x, X^*), \quad \forall x\in N(x^*, b_1). \end{aligned}$$(3.1)
(b) The Jacobian J(x) is Lipschitz continuous on \(N(x^*, b_1)\), i.e., there exists a positive constant \(L_1\) such that
$$\begin{aligned} \Vert J(x)-J(y)\Vert \le L_1\Vert x-y\Vert , \quad \forall x, y\in N(x^*, b_1). \end{aligned}$$(3.2)
Note that, if J(x) is nonsingular at a solution of (1.1), then the solution is isolated, so \(\Vert F(x)\Vert \) provides a local error bound on its neighbourhood. The converse, however, is not necessarily true; see [14] for examples. Thus, the local error bound condition is weaker than nonsingularity.
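A tiny illustration of this gap (our example, not from the paper): \(F(x)=(x_1, x_1)^T\) on \(R^2\) has a Jacobian that is singular everywhere, yet \(\Vert F(x)\Vert =\sqrt{2}\,|x_1|\ge {\text {dist}}(x, X^*)\) with \(X^*=\{x: x_1=0\}\), so a local error bound holds with \(c=1\):

```python
import numpy as np

# Illustrative example (ours): F(x) = (x_1, x_1)^T on R^2.
F = lambda x: np.array([x[0], x[0]])
J = lambda x: np.array([[1.0, 0.0], [1.0, 0.0]])   # singular everywhere
dist = lambda x: abs(x[0])                          # distance to X* = {x_1 = 0}

def error_bound_holds(pts, c=1.0):
    """Check c * dist(x, X*) <= ||F(x)|| at the sample points."""
    return all(c * dist(x) <= np.linalg.norm(F(x)) for x in pts)
```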
By (3.2), we have
$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le L_1\Vert y-x\Vert ^2, \quad \forall x, y\in N(x^*, b_1). \end{aligned}$$(3.3)
Moreover, there exists a constant \(L_2>0\) such that
$$\begin{aligned} \Vert F(x)-F(y)\Vert \le L_2\Vert x-y\Vert , \quad \forall x, y\in N(x^*, b_1). \end{aligned}$$(3.4)
Throughout the paper, we denote by \(\bar{x}_k\) the vector in \(X^*\) that satisfies
$$\begin{aligned} \Vert \bar{x}_k-x_k\Vert ={\text {dist}}(x_k, X^*). \end{aligned}$$
3.1 Some Properties
In the following, we first show the relationship between the length of the trial step \(d_k\) and the distance from \(x_k\) to the solution set.
Lemma 3.1
Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then
$$\begin{aligned} \Vert d_k\Vert \le c_2\Vert \bar{x}_k-x_k\Vert \end{aligned}$$(3.5)
holds for all sufficiently large k, where \( c_2=\sqrt{L_1^2c^{-2}m^{-1}+1} \) is a positive constant.
Proof
Since \(x_k\in N(x^*, b_1/2)\), we have
$$\begin{aligned} \Vert \bar{x}_k-x^*\Vert \le \Vert \bar{x}_k-x_k\Vert +\Vert x_k-x^*\Vert \le 2\Vert x_k-x^*\Vert \le b_1. \end{aligned}$$
So, \(\bar{x}_k\in N(x^*, b_1)\). Thus, it follows from (1.9) and (3.1) that the LM parameter \(\lambda _k\) satisfies
$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert ^2\ge mc^2\Vert \bar{x}_k-x_k\Vert ^2. \end{aligned}$$
Note that \(d_k\) is also a minimizer of
$$\begin{aligned} \min _{d\in R^n}~\Vert F_k+J_kd\Vert ^2+\lambda _k\Vert d\Vert ^2. \end{aligned}$$
So, we obtain (3.5).\(\square \)
Next we show that the gradient of the merit function also provides a local error bound on some neighbourhood of \(x^*\in X^*\).
Lemma 3.2
Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then there exists a constant \(c_3>0\) such that
$$\begin{aligned} \Vert J_k^TF_k\Vert \ge c_3\Vert \bar{x}_k-x_k\Vert \end{aligned}$$(3.7)
holds for all sufficiently large k.
Proof
It follows from (3.3) that
Thus,
So,
By (3.1),
Hence, (3.7) holds for sufficiently large k. The proof is completed.\(\square \)
Lemma 3.3
Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then there exists a positive integer K such that
$$\begin{aligned} r_k\ge p_0, \quad \forall k\ge K. \end{aligned}$$
That is, \(\mu _k\) is updated by (1.9) when \(k\ge K\).
Proof
It follows from (3.4), (3.5) and (3.7) that
This, together with (3.3), (3.4) and \(\Vert F_k+J_kd_k\Vert \le \Vert F_k\Vert \), gives
So, \(r_k\rightarrow 1\). Therefore, we obtain the result.\(\square \)
Let
be two positive constants.
Lemma 3.4
Under Assumption 3.1 and \(c_1\le (1+c_4c_2c_3^{-1})^{-1}\), if \(k\ge K\) and \(\mu _k\Vert J_k^TF_k\Vert >C_1\), then
Proof
It then follows from Lemmas 3.1 and 3.2 that
Since \(\mu _k\Vert J_k^TF_k\Vert >C_1\), by (3.4) and \(\Vert F_k\Vert \le \Vert F_0\Vert \), we have
So, \(\mu _k\ge \frac{m}{c_1}\). It then follows from \(k\ge K\), Lemma 3.3 and the updating rule (1.9) that
By (3.11) and \(c_1\le (1+c_4c_2c_3^{-1})^{-1}\), we have
The proof is completed.\(\square \)
Let
be a positive constant.
The next lemma shows that \(\mu _k\Vert J^T_kF_k\Vert \) is upper bounded.
Lemma 3.5
Under conditions of Lemma 3.4,
Proof
We discuss in two cases.
Case 1 \(\mu _K\Vert J^T_KF_K\Vert \le c_0(1+c_4c_2c_3^{-1})C_1\). Then, we must have
Otherwise, suppose
It follows from (3.11) and \(\mu _{K+1}\le c_0\mu _K\) that
This gives
By Lemma 3.4, we obtain
This is a contradiction to (3.14). So (3.13) holds true.
By induction, we can obtain
Case 2 \(\mu _K\Vert J^T_KF_K\Vert > c_0(1+c_4c_2c_3^{-1})C_1\). Since \(c_0>1\), we have
So, by Lemma 3.4,
If \(\mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert > c_0(1+c_4c_2c_3^{-1})C_1\), then by Lemma 3.4 and (3.17),
Otherwise, if \(\mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert \le c_0(1+c_4c_2c_3^{-1})C_1\), then by the same arguments as in case 1, we have
In view of (3.17)–(3.19), we obtain
By induction, we can prove that, for all \(k> K\),
The proof is completed.\(\square \)
Let
The following lemma shows that \(\mu _k\Vert F_k\Vert \) is bounded by \(C_3\).
Lemma 3.6
Under conditions of Lemma 3.4,
$$\begin{aligned} \mu _k\Vert F_k\Vert \le C_3 \end{aligned}$$(3.21)
holds for all sufficiently large k.
Proof
It follows from (3.4) that
This, together with (3.7), gives
Thus, by (3.12), we obtain (3.21). The proof is completed.\(\square \)
3.2 Quadratic Convergence
Based on the above lemmas, we study the quadratic convergence of Algorithm 2.1 under the local error bound condition, by using the singular value decomposition (SVD) technique.
Suppose the SVD of \(J(\bar{x}_k)\) is
$$\begin{aligned} J(\bar{x}_k)=\bar{U}_{k,1}\bar{\Sigma }_{k,1}\bar{V}_{k,1}^T, \end{aligned}$$
where \(\bar{\Sigma }_{k,1}={\text {diag}}(\bar{\sigma }_{k,1},\ldots , \bar{\sigma }_{k,r})\) with \(\bar{\sigma }_{k,1} \ge \bar{\sigma }_{k,2}\ge \cdots \ge \bar{\sigma }_{k,r}>0\), and the corresponding SVD of \(J_k\) is
$$\begin{aligned} J_k=U_{k,1}\Sigma _{k,1}V_{k,1}^T+U_{k,2}\Sigma _{k,2}V_{k,2}^T, \end{aligned}$$
where \(\Sigma _{k,1}={\text {diag}}(\sigma _{k,1},\ldots , \sigma _{k,r})\) with \(\sigma _{k,1} \ge \cdots \ge \sigma _{k,r}>0\), and \(\Sigma _{k,2}={\text {diag}}(\sigma _{k,r+1},\ldots , \sigma _{k,n})\) with \(\sigma _{k,r}\ge \cdots \ge \sigma _{k,n}\ge 0\). In the following, if the context is clear, we omit the subscript k in \(\Sigma _{k,i}\) and \(U_{k,i}, V_{k,i}\, (i=1,2)\), and write \(J_k\) as
$$\begin{aligned} J_k=U_1\Sigma _1V_1^T+U_2\Sigma _2V_2^T. \end{aligned}$$
By the theory of matrix perturbation [12] and the Lipschitz continuity of J,
$$\begin{aligned} \Vert {\text {diag}}(\Sigma _1-\bar{\Sigma }_1, \Sigma _2)\Vert \le \Vert J_k-J(\bar{x}_k)\Vert \le L_1\Vert \bar{x}_k-x_k\Vert . \end{aligned}$$(3.22)
So,
$$\begin{aligned} \Vert \Sigma _1-\bar{\Sigma }_1\Vert \le L_1\Vert \bar{x}_k-x_k\Vert \quad \hbox {and}\quad \Vert \Sigma _2\Vert \le L_1\Vert \bar{x}_k-x_k\Vert . \end{aligned}$$
Since \(\{x_k\}\) converges to the solution set \(X^*\), we assume that \(L_1\Vert \bar{x}_k-x_k\Vert \le \bar{\sigma }_r/2\) holds for all sufficiently large k. Then, it follows from (3.22) that
$$\begin{aligned} \sigma _r\ge \bar{\sigma }_r-L_1\Vert \bar{x}_k-x_k\Vert \ge \bar{\sigma }_r/2. \end{aligned}$$(3.23)
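The perturbation bound invoked here is Mirsky's theorem: singular values are 1-Lipschitz with respect to the spectral norm. A quick numerical check (the helper name is ours):

```python
import numpy as np

def max_sv_gap(J, Jbar):
    """Largest gap between corresponding singular values of J and Jbar."""
    s = np.linalg.svd(J, compute_uv=False)
    sbar = np.linalg.svd(Jbar, compute_uv=False)
    return np.max(np.abs(s - sbar))
```

Mirsky's theorem guarantees `max_sv_gap(J, Jbar) <= ||J - Jbar||_2`, which is what the bound above uses with \(\bar{J}=J(\bar{x}_k)\) and the Lipschitz estimate \(\Vert J_k-J(\bar{x}_k)\Vert \le L_1\Vert \bar{x}_k-x_k\Vert \).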
Lemma 3.7
Under Assumption 3.1, if \(x_k\in N(x^*, b_1/2)\), then we have
(a) \(\Vert U_1U_1^TF_k\Vert \le L_2\Vert \bar{x}_k-x_k\Vert \);
(b) \(\Vert U_2U_2^TF_k\Vert \le 2L_1\Vert \bar{x}_k-x_k\Vert ^2\);
where \(L_1, L_2\) are given in (3.2) and (3.4) respectively.
Proof
(a) follows from (3.4) directly.
Denote \(F(\bar{x}_k)\) by \(\bar{F}_k\). By (3.3) and (3.22),
The proof is completed.\(\square \)
Now we can give the main result of this section.
Theorem 3.1
Under Assumption 3.1, the sequence generated by Algorithm 2.1 converges to some solution of (1.1) quadratically.
Proof
By the SVD of \(J_k\),
and
It follows from (3.4), (3.23), Lemmas 3.6 and 3.7 that
where \(c_5=\frac{4L_2^{2}C_3}{\bar{\sigma }_r^2}+2L_1\) is a positive constant. So, by (3.1), (3.3) and Lemma 3.1,
where \(c_6=c_5+c_2^2 L_1\) is a positive constant.
Note that
By (3.24),
holds for all sufficiently large k. Combining (3.5), (3.24) and (3.25), we obtain
The proof is completed.\(\square \)
Remark 3.1
If the Levenberg–Marquardt parameter is chosen as \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \), where \(\mu _k\) is updated by (1.9) and \(\delta \in (1,2]\), the algorithm converges superlinearly to some solution of the nonlinear equations with the order \(\delta \). The proof is almost the same as above, except that we have \(\Vert F_k+J_kd_k\Vert \le c_5\Vert \bar{x}_k- x_k\Vert ^\delta \) instead of \(\Vert F_k+J_kd_k\Vert \le c_5\Vert \bar{x}_k- x_k\Vert ^2\) in the proof of Theorem 3.1, which then yields \(\Vert d_{k+1}\Vert \le O(\Vert d_k\Vert ^\delta )\).
4 Numerical Results
We test Algorithm 2.1, where the LM parameter is computed by \(\lambda _k=\mu _k\Vert F_k\Vert ^2\) with \(\mu _k\) updated by (1.9), on some singular nonlinear equations, and compare it with two other LM algorithms, which take \(\lambda _k=\mu _k\Vert F_k\Vert \) and \(\lambda _k=\mu _k\Vert F_k\Vert ^2\) with \(\mu _k\) updated by (1.7), respectively.
The test problems are created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in [8], and have the same form as in [11],
$$\begin{aligned} \hat{F}(x)=F(x)-J(x^*)A(A^TA)^{-1}A^T(x-x^*), \end{aligned}$$
where F(x) is the standard nonsingular test function, \(x^*\) is its root, and \(A\in R^{n\times k}\) has full column rank with \(1\le k\le n\). Obviously, \(\hat{F}(x^*)=0\) and
$$\begin{aligned} \hat{J}(x^*)=J(x^*)\left( I-A(A^TA)^{-1}A^T\right) \end{aligned}$$
has rank \(n-k\). A disadvantage of these problems is that \(\hat{F}(x)\) may have roots that are not roots of F(x). We create two sets of singular problems, with \(\hat{J}(x^*)\) having rank \(n-1\) and \(n-2\), by using
$$\begin{aligned} A=(1,1,\ldots ,1)^T\in R^{n\times 1} \end{aligned}$$
and
$$\begin{aligned} A=\left( \begin{array}{cccc} 1 &{} 1 &{} \cdots &{} 1\\ 1 &{} -1 &{} \cdots &{} (-1)^{n-1} \end{array}\right) ^T\in R^{n\times 2}, \end{aligned}$$
respectively. We also make a slight alteration to the variable dimension problem, which has \(n+2\) equations in n unknowns: we eliminate the \((n-1)\)-th and n-th equations. (The first n equations in the standard problem are linear.)
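The construction of the singular test problems can be sketched in code; the explicit formula for \(\hat{F}\) below is our reconstruction of the Schnabel–Frank modification described above, so treat it as a sketch:

```python
import numpy as np

def make_singular_problem(F, J, x_star, A):
    """Schnabel--Frank style modification (reconstructed from the text):
    Fhat(x) = F(x) - J(x*) A (A^T A)^{-1} A^T (x - x*),
    so Fhat(x*) = 0 and Jhat(x*) has rank n - k for A in R^{n x k}."""
    P = A @ np.linalg.solve(A.T @ A, A.T)   # projector onto range(A)
    J_star = J(x_star)
    Fhat = lambda x: F(x) - J_star @ (P @ (x - x_star))
    Jhat = lambda x: J(x) - J_star @ P
    return Fhat, Jhat
```

For instance, modifying the identity map \(F(x)=x\) with \(A=e_1\) yields \(\hat{F}(x)=(0, x_2)^T\), whose Jacobian at the root has rank \(n-1\).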
We set \(p_0=0.0001, p_1=0.25, p_2=0.75, c_0=4, c_1=0.25, \mu _0=10^{-8}, m=10^{-8}, \varepsilon =10^{-6}\) for all the tests. The algorithm stops when \(\Vert J^T_kF_k\Vert \le \varepsilon \) or the number of iterations exceeds \(100(n+1)\). The results for the first set of problems (rank \(n-1\)) at small scale are listed in Table 1, and those for the second set (rank \(n-2\)) in Table 2. We also test the algorithms on some large scale problems; the results are given in Tables 3 and 4.
The third column of each table indicates whether the starting point is \(x_0\), \(10x_0\) or \(100x_0\), where \(x_0\) is suggested by Moré, Garbow and Hillstrom in [8]; "NF" and "NJ" denote the numbers of function and Jacobian evaluations, respectively. If an algorithm fails to find the solution within \(100(n+1)\) iterations, we denote it by "−"; if it has underflows or overflows, we denote it by "OF". Note that, for general nonlinear equations, a Jacobian evaluation usually costs about n times as much as a function evaluation. So, for small scale problems, we also present the values "NF+n*NJ" to compare the total costs. However, if the Jacobian is sparse, this value is less meaningful. For the large scale problems, the computing time is also given.
From Tables 1 and 2, we can see that Algorithm 2.1 performs about the same as the two other LM algorithms on small scale problems. From Tables 3 and 4, we can see that Algorithm 2.1 outperforms them on most large scale problems.
5 Conclusion and Discussion
In traditional LM algorithms for nonlinear equations, both the iterate and the LM parameter are updated according to the ratio of the actual reduction to the predicted reduction of the merit function (cf. [1, 2]). In this paper, we proposed a new LM algorithm for nonlinear equations, where the LM parameter is taken as \(\lambda _k=\mu _k\Vert F_k\Vert ^2\) with \(\mu _k\) being updated by (1.9). Though the iterate is still updated according to the ratio of the actual reduction to the predicted reduction, the update of \(\mu _k\) is no longer based on it. When the iteration is unsuccessful, \(\mu _k\) is increased; otherwise it is updated based on the value of the gradient norm of the merit function as in (1.9). We proved that all limit points of the sequence generated by the algorithm are stationary points of the merit function under standard conditions. Since the updating rule of \(\mu _k\) changes, the analysis of the convergence rate in this paper is quite different from those in [1, 3]. We developed new techniques to prove the quadratic convergence of the algorithm under the local error bound condition.
We also discussed the choice of the LM parameter \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \), where \(\mu _k\) is updated by (1.9) and \(\delta \in [1,2)\). By a similar analysis, the algorithm converges with order \(\delta \). We conjecture that the convergence rate is in fact quadratic for any \(\delta \in [1,2)\); this will be our future study.
References
Fan, J.Y.: A modified Levenberg–Marquardt algorithm for singular system of nonlinear equations. J. Comput. Math. 21, 625–636 (2003)
Fan, J.Y.: The modified Levenberg–Marquardt method for nonlinear equations with cubic convergence. Math. Comput. 81, 447–466 (2012)
Fan, J.Y., Pan, J.Y., Song, H.Y.: A retrospective trust region algorithm with trust region converging to zero. J. Comput. Math. 34, 421–436 (2016)
Fan, J.Y., Yuan, Y.X.: On the quadratic convergence of the Levenberg–Marquardt method without nonsingularity assumption. Computing 74, 23–39 (2005)
Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Quart. Appl. Math. 2, 164–166 (1944)
Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)
Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.) Numerical Analysis, Lecture Notes in Mathematics, vol. 630, pp. 105–116. Springer, Berlin (1978)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7, 17–41 (1981)
Osborne, M.R.: Nonlinear least squares-the Levenberg–Marquardt algorithm revisited. J. Aust. Math. Soc. 19, 343–357 (1976)
Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 2, pp. 1–27. Academic Press, New York (1975)
Schnabel, R.B., Frank, P.D.: Tensor methods for nonlinear equations. SIAM J. Numer. Anal. 21, 815–843 (1984)
Stewart, G.W., Sun, J.G.: Matrix Perturbation Theory. Academic Press, San Diego (1990)
Wright, S.J., Holt, J.N.: An inexact Levenberg–Marquardt method for large sparse nonlinear least squares. J. Aust. Math. Soc. Ser. B 26, 387–403 (1985)
Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg–Marquardt method. Computing (Supplement) 15, 237–249 (2001)
Yuan, Y.X.: Trust region algorithms for nonlinear equations. Information 1, 7–20 (1998)
Yuan, Y.X.: Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer. Algebra Control Optim. 1, 15–34 (2011)
Yuan, Y.X.: Recent advances in trust region algorithms. Math. Program. Ser. B 151, 249–281 (2015)
Zhao, R.X., Fan, J.Y.: Global complexity bound of the Levenberg–Marquardt method. Optim. Methods Softw. 31, 805–814 (2016)
Additional information
Jinyan Fan is partially supported by the NSFC Grant 11571234.
Zhao, R., Fan, J. On a New Updating Rule of the Levenberg–Marquardt Parameter. J Sci Comput 74, 1146–1162 (2018). https://doi.org/10.1007/s10915-017-0488-6