1 Introduction

We consider the system of nonlinear equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1.1)

where \(F(x): R^n\rightarrow R^n\) is continuously differentiable.

The Levenberg–Marquardt method (LM) is one of the most well-known iterative methods for nonlinear equations [5, 6, 15]. At the k-th iteration, it computes the trial step

$$\begin{aligned} d_k=-(J^T_kJ_k+\lambda _kI)^{-1}J_k^TF_k, \end{aligned}$$
(1.2)

where \(F_k=F(x_k), J_k=J(x_k)\) is the Jacobian at \(x_k\), and \(\lambda _k\) is the LM parameter introduced to overcome the difficulties caused by the singularity or near singularity of \(J_k\).
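
For illustration, a minimal NumPy sketch of computing the trial step (1.2) is given below; the function name and the direct dense solve are our own choices for exposition, not part of any particular LM code.

```python
import numpy as np

def lm_trial_step(J, F, lam):
    """Compute the LM trial step (1.2): solve (J^T J + lam I) d = -J^T F."""
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)        # symmetric positive definite whenever lam > 0
    return np.linalg.solve(A, -J.T @ F)
```

In practice one would rather solve the equivalent linear least-squares problem with the stacked matrix \((J_k^T,\ \sqrt{\lambda _k}\,I)^T\) to avoid forming \(J_k^TJ_k\) explicitly; the dense solve above is only for exposition.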

Let

$$\begin{aligned} \min \limits _{x\in R^n}~~\phi (x):=\Vert F(x)\Vert ^2 \end{aligned}$$
(1.3)

be the merit function of (1.1). Define the actual reduction of the merit function as

$$\begin{aligned} Ared_k=\Vert F_k\Vert ^2-\Vert F(x_k+d_k)\Vert ^2, \end{aligned}$$

the predicted reduction as

$$\begin{aligned} Pred_k=\Vert F_k\Vert ^2-\Vert F_k+J_kd_k\Vert ^2, \end{aligned}$$

and the ratio of the actual reduction to the predicted reduction

$$\begin{aligned} r_k=\frac{Ared_k}{Pred_k}. \end{aligned}$$

In classical LM methods, one sets

$$\begin{aligned} x_{k+1}=\left\{ \begin{array}{ll} x_k+d_k, &{} \text{ if }\ r_k\ge p_0, \\ x_k, &{} \text{ otherwise }, \end{array} \right. \end{aligned}$$
(1.4)

where \(p_0\ge 0\) is a constant, and updates the LM parameter as

$$\begin{aligned} \lambda _{k+1}=\left\{ \begin{array}{ll} c_0\lambda _k, &{} \hbox {if} \ r_k< p_1,\\ \lambda _k, &{} \hbox {if} \ r_k\in [p_1, p_2], \\ c_1\lambda _k,&{} \hbox {if}\ r_k>p_2, \end{array}\right. \end{aligned}$$
(1.5)

where \(p_0<p_1<p_2<1\), \(0<c_1<1<c_0\) are positive constants (cf. [7, 9, 13, 16, 17]).
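
In code, the classical acceptance test (1.4) and parameter update (1.5) can be sketched as follows; the function name is ours, and the default constants are the illustrative values used later in Sect. 4.

```python
def classical_lm_update(x, d, r, lam, p0=1e-4, p1=0.25, p2=0.75, c0=4.0, c1=0.25):
    """Accept or reject the trial step by (1.4) and rescale lambda by (1.5)."""
    x_new = x + d if r >= p0 else x      # (1.4): accept the step only if the ratio is large enough
    if r < p1:
        lam_new = c0 * lam               # poor agreement between model and function: enlarge lambda
    elif r <= p2:
        lam_new = lam                    # acceptable agreement: keep lambda
    else:
        lam_new = c1 * lam               # very good agreement: reduce lambda
    return x_new, lam_new
```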

It was shown in [14] that, if the LM parameter is chosen as \(\lambda _k=\Vert F_k\Vert ^2\), then the LM method converges quadratically under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution. It was further proved in [4] that the LM method converges quadratically for all \(\lambda _k=\Vert F_k\Vert ^{\delta } (\delta \in [1,2])\) under the local error bound condition. In [1], Fan chose

$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert , \end{aligned}$$
(1.6)

and updated \(\mu _k\) according to the ratio \(r_k\) as follows:

$$\begin{aligned} \mu _{k+1}=\left\{ \begin{array}{ll} c_0\mu _k, &{} \hbox {if} \ r_k< p_1,\\ \mu _k, &{} \hbox {if} \ r_k\in [p_1, p_2], \\ \max \left\{ c_1\mu _k, m\right\} ,&{} \hbox {if}\ r_k>p_2, \end{array}\right. \end{aligned}$$
(1.7)

where \(m>0\) is a small constant to prevent the LM parameter from being too small.

Recently, Zhao and Fan [18] took the LM parameter as

$$\begin{aligned} \lambda _k=\mu _k\Vert J^T_kF_k\Vert , \end{aligned}$$
(1.8)

where the update of \(\mu _k\) is no longer just based on the ratio \(r_k\). When the iteration is unsuccessful (i.e., \(r_k< p_0\)), \(\mu _k\) is increased; but when the iteration is successful (i.e., \(r_k\ge p_0\)), \(\mu _{k+1}\) is updated as

$$\begin{aligned} \mu _{k+1}=\left\{ \begin{array}{ll} c_0\mu _k, &{} \hbox {if} \ \Vert J_k^TF_k\Vert < \dfrac{p_1}{\mu _k},\\ \mu _k, &{} \hbox {if} \ \Vert J_k^TF_k\Vert \in [\dfrac{p_1}{\mu _k},~~\dfrac{p_2}{\mu _k}],\\ \max \left\{ c_1\mu _k,m\right\} ,&{} \hbox {if} \ \Vert J_k^TF_k\Vert >\dfrac{p_2}{\mu _k}. \end{array}\right. \end{aligned}$$
(1.9)

It was shown that the global complexity bound of the above LM algorithm is \(O(\varepsilon ^{-2})\), that is, it takes at most \(O(\varepsilon ^{-2})\) iterations to drive the norm of the gradient of the merit function below the desired accuracy \(\varepsilon \).
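
As a small sketch of the rule (1.9) on a successful iteration (with illustrative parameter values taken from Sect. 4 and `gnorm` standing for \(\Vert J_k^TF_k\Vert \)):

```python
def update_mu_on_success(mu, gnorm, p1=0.25, p2=0.75, c0=4.0, c1=0.25, m=1e-8):
    """Update (1.9) of mu after a successful iteration; gnorm = ||J_k^T F_k||."""
    if gnorm < p1 / mu:
        return c0 * mu                   # gradient small relative to 1/mu: enlarge mu
    elif gnorm <= p2 / mu:
        return mu                        # comparable magnitudes: keep mu
    else:
        return max(c1 * mu, m)           # gradient large relative to 1/mu: reduce mu, never below m
```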

The logic behind the updating rule (1.9) follows from the fact that the LM step is actually the solution of the trust region subproblem

$$\begin{aligned} \min \limits _{d\in R^n}&\ \Vert F_k+J_kd\Vert ^{2}\nonumber \\ s.t.&\ \Vert d\Vert \le \Delta _k:=\Vert d_k\Vert . \end{aligned}$$
(1.10)

So the size of the step obtained by solving (1.10) is proportional to the norm of the model gradient \(\Vert J^T_kF_k\Vert \). Hence the trust region radius, which is of the order of the inverse of \(\mu _k\), should be of comparable size; this is precisely what the comparison of \(\Vert J_k^TF_k\Vert \) with \(p_1/\mu _k\) and \(p_2/\mu _k\) in (1.9) enforces.

In this paper, we present a new LM algorithm for (1.1), where the LM parameter is computed as

$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert ^{2}. \end{aligned}$$
(1.11)

We update the iterate \(x_k\) according to the ratio \(r_k\) as in classical LM algorithms. When the iteration is unsuccessful, we increase \(\mu _k\); otherwise, we update \(\mu _{k+1}\) by (1.9). We show that the new LM algorithm preserves the global convergence of classical LM algorithms. We also prove that the algorithm converges quadratically under the local error bound condition.

The paper is organized as follows. In Sect. 2, we present the new LM algorithm for (1.1). The global convergence of the algorithm is also proved. In Sect. 3, we study the convergence rate of the algorithm under the local error bound condition. Some numerical results are given in Sect. 4. Finally, we conclude the paper in Sect. 5.

2 The LM Algorithm and Global Convergence

In this section, we first give the new LM algorithm, then show that the algorithm converges globally under certain conditions.

The LM algorithm is presented as follows.

Algorithm 2.1

(A Levenberg–Marquardt algorithm for nonlinear equations)

  1. Step 1.

    Given \(x_0 \in R^n,\mu _0>m>0, 0<p_0< p_1< p_2<1, c_0>1, 0<c_1<1, \varepsilon \ge 0, k:=0\).

  2. Step 2.

    If \(\Vert J_k^TF_k\Vert \le \varepsilon \), then stop. Otherwise, solve

    $$\begin{aligned} (J_k^TJ_k+\lambda _kI)d=-J_k^TF_k ~\hbox {with}~ \lambda _k=\mu _k\Vert F_k\Vert ^{2} \end{aligned}$$
    (2.1)

    to obtain \(d_k\).

  3. Step 3.

    Compute \(r_k=\frac{Ared_k}{Pred_k}\). If \(r_k\ge p_0\), set \( x_{k+1}=x_k+d_k\) and compute \(\mu _{k+1}\) by (1.9); otherwise, set \(x_{k+1}=x_k\) and compute \(\mu _{k+1}=c_0\mu _k\). Set \(k:=k+1\) and go to Step 2.
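
For illustration, a minimal, self-contained NumPy sketch of Algorithm 2.1 is given below. The parameter values follow those used in Sect. 4, and `F` and `J` are user-supplied callables returning the residual vector and the Jacobian; this is a plain dense-algebra sketch, not an efficient implementation.

```python
import numpy as np

def algorithm_2_1(F, J, x0, mu0=1e-8, m=1e-8, p0=1e-4, p1=0.25, p2=0.75,
                  c0=4.0, c1=0.25, eps=1e-6, max_iter=None):
    """LM iteration with lambda_k = mu_k ||F_k||^2 and mu_k updated by (1.9)."""
    x = np.asarray(x0, dtype=float)
    mu = mu0
    n = x.size
    if max_iter is None:
        max_iter = 100 * (n + 1)                     # iteration cap used in Sect. 4
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk                                # J_k^T F_k (half the gradient of ||F||^2)
        if np.linalg.norm(g) <= eps:                 # Step 2: stopping test
            break
        lam = mu * np.dot(Fk, Fk)                    # lambda_k = mu_k ||F_k||^2, cf. (2.1)
        d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(n), -g)
        Fnew = F(x + d)
        ared = np.dot(Fk, Fk) - np.dot(Fnew, Fnew)                  # actual reduction
        pred = np.dot(Fk, Fk) - np.linalg.norm(Fk + Jk @ d) ** 2    # predicted reduction
        r = ared / pred                              # pred > 0 when g != 0 (see Lemma 2.1 below)
        if r >= p0:                                  # Step 3: successful iteration
            x = x + d
            gnorm = np.linalg.norm(g)                # ||J_k^T F_k|| at the old iterate, as in (1.9)
            if gnorm < p1 / mu:
                mu = c0 * mu
            elif gnorm > p2 / mu:
                mu = max(c1 * mu, m)
            # otherwise mu is unchanged
        else:                                        # unsuccessful iteration: keep x, enlarge mu
            mu = c0 * mu
    return x
```

As a quick check, calling this sketch on a simple system such as \(F(x)=(x_1^2-x_2,\ x_2^2-1)\) with Jacobian \(J(x)=\bigl(\begin{smallmatrix}2x_1&-1\\0&2x_2\end{smallmatrix}\bigr)\) and starting point \((3,2)\) should return an approximation to the root \((1,1)\).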

To study the global convergence of Algorithm 2.1, we make the following assumption.

Assumption 2.1

F(x) is continuously differentiable, and both F(x) and its Jacobian J(x) are Lipschitz continuous, i.e., there exist positive constants \(L_1\) and \(L_2\) such that

$$\begin{aligned} \Vert J(x)-J(y)\Vert \le L_1\Vert y-x\Vert ,\quad \forall x,y\in R^n \end{aligned}$$
(2.2)

and

$$\begin{aligned} \Vert F(x)-F(y)\Vert \le L_2\Vert y-x\Vert ,\quad \forall x,y\in R^n. \end{aligned}$$
(2.3)

By a result of Powell [10], we have the following lemma.

Lemma 2.1

The predicted reduction satisfies

$$\begin{aligned} \Vert F_k\Vert ^2-\Vert F_k+J_kd_k\Vert ^2\ge \Vert J^T_kF_k\Vert \min \left\{ \Vert d_k\Vert ,\frac{\Vert J_k^TF_k\Vert }{\Vert J^T_kJ_k\Vert }\right\} \end{aligned}$$
(2.4)

for all k.

Lemma 2.1 implies that the predicted reduction is always nonnegative.

In the following, we first prove the weak global convergence of Algorithm 2.1, that is, at least one accumulation point of the sequence generated by Algorithm 2.1 is a stationary point of the merit function \(\phi (x)\).

Theorem 2.1

Under Assumption 2.1, Algorithm 2.1 either terminates in finitely many iterations or satisfies

$$\begin{aligned} \liminf _{k\rightarrow \infty }\Vert J_k^TF_k\Vert =0. \end{aligned}$$

Proof

We prove by contradiction. Suppose that there exists a constant \(\tau >0\) such that

$$\begin{aligned} \Vert J_k^TF_k \Vert \ge \tau , \quad \forall k. \end{aligned}$$
(2.5)

Define the index set of successful iterations:

$$\begin{aligned} S=\{k: r_k \ge p_0 \}. \end{aligned}$$

We discuss in two cases.

Case I. S is infinite. Since \(\{\Vert F_k\Vert \}\) is nonincreasing and bounded below, it follows from (2.3) and (2.4) that

$$\begin{aligned} +\infty >&\sum _{k\in S} (\Vert F_k\Vert ^2-\Vert F_{k+1}\Vert ^2)\nonumber \\ \ge&\sum _{k\in S} p_0 (\Vert F_k\Vert ^2-\Vert F_k+J_kd_k\Vert ^2)\nonumber \\ \ge&\sum _{k\in S} p_0\Vert J^T_kF_k\Vert \min \left\{ \Vert d_k\Vert ,\frac{\Vert J_k^TF_k\Vert }{\Vert J^T_kJ_k\Vert }\right\} \nonumber \\ \ge&\sum _{k\in S} p_0\tau \min \left\{ \Vert d_k\Vert , \frac{\tau }{L_2^2}\right\} . \end{aligned}$$
(2.6)

So,

$$\begin{aligned} \lim _{k\in S, k\rightarrow \infty } d_k=0. \end{aligned}$$
(2.7)

Noting that \(d_k=0\) for \(k\notin S\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } d_k=0. \end{aligned}$$
(2.8)

Since \(\Vert J_k\Vert \le L_2\), it follows from (2.1) and (2.5) that \(\lambda _k\Vert d_k\Vert \ge \Vert J_k^TF_k\Vert -\Vert J_k^TJ_k\Vert \Vert d_k\Vert \ge \tau -L_2^2\Vert d_k\Vert \). Together with (2.8), this gives \(\lambda _k=\mu _k\Vert F_k\Vert ^2\rightarrow +\infty \). Since \(\Vert F_k\Vert \le \Vert F_0\Vert \), we have

$$\begin{aligned} \mu _k\rightarrow +\infty . \end{aligned}$$
(2.9)

On the other hand, it follows from (2.3) and (2.4) that

$$\begin{aligned} |r_k-1|=&\left| \frac{Ared_k-Pred_k}{Pred_k}\right| \nonumber \\ =&\frac{|\Vert F(x_k+d_k)\Vert ^2-\Vert F_k+J_kd_k\Vert ^2|}{Pred_k} \nonumber \\ \le&\frac{\Vert F_k+J_kd_k\Vert O(\Vert d_k\Vert ^2)+O(\Vert d_k\Vert ^4)}{\tau \min \{\Vert d_k\Vert ,\tau /L_2^2\}} \nonumber \\ \rightarrow&0. \end{aligned}$$
(2.10)

So, \(r_k\rightarrow 1\). Thus, \(\mu _k\) is updated by (1.9), and by (2.5) and (2.9) we have \(\Vert J_k^TF_k\Vert \ge \tau >p_2/\mu _k\) for all sufficiently large k. Hence, \(\mu _{k+1}=\max \left\{ c_1\mu _k,m\right\} \) for all large k. Since \(0<c_1<1\), there exists a positive constant \(\tilde{c}\) such that

$$\begin{aligned} \mu _k< \tilde{c} \end{aligned}$$

for all large k. This is a contradiction to (2.9).

Case II. S is finite. Then there exists a \(\tilde{k}\) such that

$$\begin{aligned} r_k<p_0, \quad k\ge \tilde{k}. \end{aligned}$$
(2.11)

According to the updating rules in Algorithm 2.1, we have \(x_{k+1}=x_k\) and \(\mu _{k+1}=c_0\mu _k\) for all \(k\ge \tilde{k}\), so \(F_k\) and \(J_k\) remain unchanged while \(\lambda _k=\mu _k\Vert F_k\Vert ^2\rightarrow +\infty \) (note that \(F_k\ne 0\) by (2.5)); hence \(d_k\rightarrow 0\) by (2.1). By the same arguments as in (2.10), we get \(r_k\rightarrow 1\), which contradicts (2.11). The proof is completed.\(\square \)

Based on Theorem 2.1, we can further prove the strong global convergence of Algorithm 2.1, that is, all limit points of the sequence generated by Algorithm 2.1 are stationary points of the merit function \(\phi (x)\). We first give an auxiliary result (cf. [3, Lemma 2.7]).

Lemma 2.2

Let \(b, a_1, \ldots , a_N>0\). Then,

$$\begin{aligned} \sum _{j=1}^N \min \{a_j,b\}\ge \min \left\{ \sum _{j=1}^Na_j, b\right\} . \end{aligned}$$
(2.12)

Theorem 2.2

Under Assumption 2.1, Algorithm 2.1 either terminates in finitely many iterations or satisfies

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }~~\Vert J_k^TF_k\Vert =0. \end{aligned}$$
(2.13)

Proof

Suppose by contradiction that there exists \(\tau >0\) such that the set

$$\begin{aligned} \Omega =\{k: \Vert J_k^TF_k\Vert \ge \tau \} \end{aligned}$$
(2.14)

is infinite. Given \(k\in \Omega \), consider the first index \(l_k>k\) such that \(\Vert J_{l_k}^TF_{l_k}\Vert \le \frac{\tau }{2}\). The existence of such \(l_k\) is guaranteed by Theorem 2.1. By (2.2), (2.3) and \(\Vert F_k\Vert \le \Vert F_0\Vert \),

$$\begin{aligned} \frac{\tau }{2}\le&\Vert J_k^TF_k\Vert -\Vert J_{l_k}^TF_{l_k}\Vert \le \Vert J_k^TF_k-J_{l_k}^TF_{l_k}\Vert \nonumber \\ \le&\Vert J_k^TF_k-J_{l_k}^TF_k\Vert +\Vert J_{l_k}^TF_k-J_{l_k}^TF_{l_k}\Vert \le (L_1\Vert F_0\Vert +L_2^2)\Vert x_k-x_{l_k}\Vert , \end{aligned}$$

which yields

$$\begin{aligned} \Vert x_k-x_{l_k}\Vert \ge \frac{\tau }{2(L_1\Vert F_0\Vert +L_2^2)}. \end{aligned}$$

Define the set

$$\begin{aligned} S_k=\{j: k\le j<l_k, x_{j+1}\not =x_j\}. \end{aligned}$$

Then,

$$\begin{aligned} \frac{\tau }{2(L_1\Vert F_0\Vert +L_2^2)}\le \Vert x_k-x_{l_k}\Vert \le \sum _{j\in S_k} \Vert x_j-x_{j+1}\Vert \le \sum _{j\in S_k}\Vert d_j\Vert . \end{aligned}$$
(2.15)

It now follows from (2.4), (2.15) and Lemma 2.2 that, for all \(k\in \Omega \),

$$\begin{aligned} \Vert F_k\Vert ^2-\Vert F_{l_k}\Vert ^2=&\sum _{j\in S_k} (\Vert F_j\Vert ^2-\Vert F_{j+1}\Vert ^2) \nonumber \\ \ge&\sum _{j\in S_k} p_0 \Vert J_j^TF_j\Vert \min \left\{ \Vert d_j\Vert , \frac{\Vert J_j^TF_j\Vert }{\Vert J_j^TJ_j\Vert }\right\} \nonumber \\ \ge&\sum _{j\in S_k} \frac{p_0\tau }{2} \min \left\{ \Vert d_j\Vert , \frac{\tau }{2L_2^2}\right\} \nonumber \\ \ge&\frac{p_0\tau }{2} \min \left\{ \sum _{j\in S_k}\Vert d_j\Vert , \frac{\tau }{2L_2^2}\right\} \nonumber \\ \ge&\frac{p_0\tau }{2} \min \left\{ \frac{\tau }{2(L_1\Vert F_0\Vert +L_2^2)}, \frac{\tau }{2L_2^2}\right\} \nonumber \\ =&\frac{p_0\tau ^2}{4(L_1\Vert F_0\Vert +L_2^2)}\nonumber \\ >&0. \end{aligned}$$
(2.16)

However, since \(\{\Vert F_k\Vert ^2\}\) is nonincreasing and bounded below, \(\Vert F_k\Vert ^2-\Vert F_{l_k}\Vert ^2\rightarrow 0\). This contradicts (2.16). So, the set \(\Omega \) defined by (2.14) is finite. Therefore, (2.13) holds true. The proof is completed.\(\square \)

3 Local Convergence

We assume that the sequence \(\{x_k\}\) generated by Algorithm 2.1 converges to the solution set \(X^*\) of (1.1) and lies in some neighbourhood of \(x^*\in X^*\). We first give some important properties of the algorithm, then show that the algorithm converges quadratically under the local error bound condition.

We make the following assumption.

Assumption 3.1

(a) F(x) is continuously differentiable, and \(\Vert F(x)\Vert \) provides a local error bound on some neighbourhood of \(x^*\in X^*\), i.e., there exist positive constants \(c>0\) and \(b_1<1\) such that

$$\begin{aligned} \Vert F(x)\Vert \ge c \ \text{ dist }(x, X^*), \qquad \forall x\in N(x^*, b_1)=\{x: \Vert x-x^*\Vert \le b_1 \}. \end{aligned}$$
(3.1)

(b) The Jacobian J(x) is Lipschitz continuous on \(N(x^*, b_1)\), i.e., there exists a positive constant \(L_1\) such that

$$\begin{aligned} \quad \Vert J(y)-J(x)\Vert \le L_1 \Vert y-x\Vert , \quad \forall x, y\in N(x^*, b_1). \end{aligned}$$
(3.2)

Note that, if J(x) is nonsingular at a solution of (1.1), then that solution is isolated, so \(\Vert F(x)\Vert \) provides a local error bound on some neighbourhood of it. However, the converse is not necessarily true; see the examples in [14]. Thus, the local error bound condition is weaker than the nonsingularity condition.

By (3.2), we have

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le L_1\Vert y-x\Vert ^2,\quad \forall x, y\in N(x^*,b_1). \end{aligned}$$
(3.3)

Moreover, there exists a constant \(L_2>0\) such that

$$\begin{aligned} \Vert F(y)-F(x)\Vert \le L_2\Vert y-x\Vert ,\quad \forall x, y\in N(x^*,b_1). \end{aligned}$$
(3.4)

Throughout the paper, we denote by \(\bar{x}_k\) the vector in \(X^*\) that satisfies

$$\begin{aligned} \Vert \bar{x}_k-x_k\Vert =\text{ dist }(x_k, X^*). \end{aligned}$$

3.1 Some Properties

In the following, we first show the relationship between the length of the trial step \(d_k\) and the distance from \(x_k\) to the solution set.

Lemma 3.1

Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then

$$\begin{aligned} \Vert d_k\Vert \le c_2\Vert \bar{x}_k-x_k\Vert \end{aligned}$$
(3.5)

holds for all sufficiently large k, where \( c_2=\sqrt{L_1^2c^{-2}m^{-1}+1} \) is a positive constant.

Proof

Since \(x_k\in N(x^*, b_1/2)\), we have

$$\begin{aligned} \Vert \bar{x}_k-x^*\Vert \le \Vert \bar{x}_k-x_k\Vert +\Vert x_k-x^*\Vert \le 2\Vert x_k-x^*\Vert \le b_1. \end{aligned}$$

So, \(\bar{x}_k\in N(x^*, b_1)\). Since \(\mu _0>m\) and the updating rule (1.9) guarantee \(\mu _k\ge m\) for all k, it follows from (3.1) that the LM parameter \(\lambda _k\) satisfies

$$\begin{aligned} \lambda _k=\mu _k\Vert F_k\Vert ^{2}\ge c^{2} m\Vert \bar{x}_k-x_k\Vert ^{2}. \end{aligned}$$
(3.6)

Since \(d_k\) is also a minimizer of

$$\begin{aligned} \min \limits _{d\in R^n}\Vert F_k+J_kd\Vert ^2+\lambda _k\Vert d\Vert ^2\triangleq \varphi _k(d), \end{aligned}$$

by (3.3) and (3.6), we have

$$\begin{aligned} \Vert d_k\Vert ^2\le & {} \displaystyle \frac{\varphi _{k}(d_k)}{\lambda _k}\\\le & {} \displaystyle \frac{\varphi _{k}(\bar{x}_k-x_k)}{\lambda _k}\\= & {} \displaystyle \frac{\Vert F_k+J_k(\bar{x}_k-x_k)\Vert ^2}{\lambda _k}+\Vert \bar{x}_k-x_k\Vert ^2\\\le & {} \displaystyle \frac{L_1^2\Vert \bar{x}_k-x_k\Vert ^4}{\lambda _k}+\Vert \bar{x}_k-x_k\Vert ^2\\\le & {} (L_1^2c^{-2}m^{-1}+1)\Vert \bar{x}_k-x_k\Vert ^{2}. \end{aligned}$$

So, we obtain (3.5).\(\square \)

Next we show that the gradient of the merit function also provides a local error bound on some neighbourhood of \(x^*\in X^*\).

Lemma 3.2

Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then there exists a constant \(c_3>0\) such that

$$\begin{aligned} \Vert J_k^TF_k\Vert \ge c_3 \Vert \bar{x}_k- x_k\Vert \end{aligned}$$
(3.7)

holds for all sufficiently large k.

Proof

It follows from (3.3) that

$$\begin{aligned} \Vert F_k+J_k(\bar{x}_k- x_k)\Vert \le L_1\Vert \bar{x}_k- x_k\Vert ^2. \end{aligned}$$

Thus,

$$\begin{aligned} \Vert F_k\Vert ^2+2(\bar{x}_k- x_k)^TJ^T_kF_k+(\bar{x}_k- x_k)^TJ^T_kJ_k(\bar{x}_k- x_k)\le L_1^2\Vert \bar{x}_k- x_k\Vert ^4. \end{aligned}$$

So,

$$\begin{aligned} \Vert F_k\Vert ^2+2(\bar{x}_k- x_k)^TJ^T_kF_k\le L_1^2\Vert \bar{x}_k- x_k\Vert ^4. \end{aligned}$$

By (3.1),

$$\begin{aligned} c^2\Vert \bar{x}_k- x_k\Vert ^2-L_1^2\Vert \bar{x}_k- x_k\Vert ^4\le 2\Vert \bar{x}_k- x_k\Vert \Vert J^T_kF_k\Vert . \end{aligned}$$

Dividing both sides by \(2\Vert \bar{x}_k- x_k\Vert \) and noting that \(\Vert \bar{x}_k- x_k\Vert \rightarrow 0\), we see that (3.7) holds for all sufficiently large k, for instance with \(c_3=c^2/4\). The proof is completed.\(\square \)

Lemma 3.3

Under Assumption 3.1, if \(x_k\in N(x^*,b_1/2)\), then there exists a positive integer K such that

$$\begin{aligned} r_k\ge p_0,\quad \forall k\ge K. \end{aligned}$$

That is, \(\mu _k\) is updated by (1.9) when \(k\ge K\).

Proof

It follows from (2.4), (3.4), (3.5) and (3.7) that

$$\begin{aligned} Pred_k&\ge \Vert J^T_kF_k\Vert \min \left\{ \Vert d_k\Vert ,\frac{\Vert J_k^TF_k\Vert }{\Vert J^T_kJ_k\Vert }\right\} \\&\ge c_3 \Vert \bar{x}_k-x_k\Vert \min \{\Vert d_k\Vert ,\frac{c_2^{-1}c_3}{L_2^2}\Vert d_k\Vert \}\\&=\Vert d_k\Vert O(\Vert \bar{x}_k-x_k\Vert ). \end{aligned}$$

This, together with (3.3), (3.4) and \(\Vert F_k+J_kd_k\Vert \le \Vert F_k\Vert \), gives

$$\begin{aligned} |r_k-1|&=|\frac{Ared_k-Pred_k}{Pred_k}|\\&\le \frac{\Vert F_k+J_kd_k\Vert O(\Vert d_k\Vert ^2)+O(\Vert d_k\Vert ^4)}{Pred_k}\\&\le \frac{O(\Vert \bar{x}_k-x_k\Vert )O(\Vert d_k\Vert ^2)+O(\Vert d_k\Vert ^4)}{\Vert d_k\Vert O(\Vert \bar{x}_k-x_k\Vert )}\\&=O(\Vert d_k\Vert )\\&\rightarrow 0. \end{aligned}$$

So, \(r_k\rightarrow 1\). Therefore, we obtain the result.\(\square \)

Let

$$\begin{aligned} C_1&=\max \{p_2,c_1^{-1}mL_2\Vert F_0\Vert \}, \end{aligned}$$
(3.8)
$$\begin{aligned} c_4&= L_2^2+L_1\Vert F_0\Vert \end{aligned}$$
(3.9)

be two positive constants.

Lemma 3.4

Under Assumption 3.1 and \(c_1\le (1+c_4c_2c_3^{-1})^{-1}\), if \(k\ge K\) and \(\mu _k\Vert J_k^TF_k\Vert >C_1\), then

$$\begin{aligned} \mu _{k+1}\Vert J_{k+1}^TF_{k+1}\Vert \le \mu _k\Vert J_{k}^TF_{k}\Vert . \end{aligned}$$
(3.10)

Proof

By (3.2) and (3.4),

$$\begin{aligned} |\Vert J_{k+1}^TF_{k+1}\Vert -\Vert J_k^TF_k\Vert | \le&|\Vert J_{k+1}^TF_{k+1}\Vert -\Vert J_{k+1}^TF_{k}\Vert |+|\Vert J_{k+1}^TF_{k}\Vert -\Vert J_k^TF_k\Vert | \\ \le&\Vert J_{k+1}\Vert \Vert F_{k+1}-F_k\Vert +\Vert F_k\Vert \Vert J_{k+1}-J_k\Vert \\ \le&(L_2^2+L_1\Vert F_0\Vert )\Vert d_k\Vert \\ =&c_4\Vert d_k\Vert . \end{aligned}$$

It then follows from Lemmas 3.1 and 3.2 that

$$\begin{aligned} \Vert J_{k+1}^TF_{k+1}\Vert \le \Vert J_k^TF_k\Vert +c_4\Vert d_k\Vert \le (1+c_4c_2c_3^{-1})\Vert J_k^TF_k\Vert . \end{aligned}$$
(3.11)

Since \(\mu _k\Vert J_k^TF_k\Vert >C_1\), by (3.4) and \(\Vert F_k\Vert \le \Vert F_0\Vert \), we have

$$\begin{aligned} \mu _k> \frac{p_2}{\Vert J_k^TF_k\Vert }, \quad \mu _k\Vert J_k^TF_k\Vert \ge \frac{mL_2}{c_1}\Vert F_0\Vert \ge \frac{m}{c_1}\Vert J_k^TF_k\Vert . \end{aligned}$$

So, \(\mu _k\ge \frac{m}{c_1}\). It then follows from \(k\ge K\), Lemma 3.3 and the updating rule (1.9) that

$$\begin{aligned}\mu _{k+1}=c_1\mu _k. \end{aligned}$$

By (3.11) and \(c_1\le (1+c_4c_2c_3^{-1})^{-1}\), we have

$$\begin{aligned}\mu _{k+1}\Vert J_{k+1}^TF_{k+1}\Vert =&c_1\mu _k\Vert J_{k+1}^TF_{k+1}\Vert \nonumber \\ \le&c_1(1+c_4c_2c_3^{-1})\mu _k\Vert J_{k}^TF_{k}\Vert \nonumber \\ \le&\mu _k\Vert J_{k}^TF_{k}\Vert . \end{aligned}$$

The proof is completed.\(\square \)

Let

$$\begin{aligned} C_2=&\max \{\mu _K\Vert J_K^TF_K\Vert ,c_0(1+c_4c_2c_3^{-1})C_1\} \end{aligned}$$

be a positive constant.

The next lemma shows that \(\mu _k\Vert J^T_kF_k\Vert \) is upper bounded.

Lemma 3.5

Under conditions of Lemma 3.4,

$$\begin{aligned} \mu _k\Vert J^T_kF_k\Vert \le C_2, \quad \forall k\ge K. \end{aligned}$$
(3.12)

Proof

We discuss in two cases.

Case 1 \(\mu _K\Vert J^T_KF_K\Vert \le c_0(1+c_4c_2c_3^{-1})C_1\). Then, we must have

$$\begin{aligned} \mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert \le c_0(1+c_4c_2c_3^{-1})C_1. \end{aligned}$$
(3.13)

Otherwise, suppose

$$\begin{aligned} \mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert > c_0(1+c_4c_2c_3^{-1})C_1. \end{aligned}$$
(3.14)

It follows from (3.11) and \(\mu _{K+1}\le c_0\mu _K\) that

$$\begin{aligned} (1+c_4c_2c_3^{-1})C_1<\mu _K\Vert J^T_{K+1}F_{K+1}\Vert \le (1+c_4c_2c_3^{-1})\mu _K\Vert J^T_KF_K\Vert . \end{aligned}$$
(3.15)

This gives

$$\begin{aligned} \mu _K\Vert J^T_KF_K\Vert > C_1. \end{aligned}$$

By Lemma 3.4, we obtain

$$\begin{aligned} \mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert \le \mu _K\Vert J^T_KF_K\Vert \le c_0(1+c_4c_2c_3^{-1})C_1.\end{aligned}$$

This is a contradiction to (3.14). So (3.13) holds true.

By induction, we can obtain

$$\begin{aligned} \mu _k\Vert J^T_kF_k\Vert \le c_0(1+c_4c_2c_3^{-1})C_1, \quad \forall k\ge K. \end{aligned}$$
(3.16)

Case 2 \(\mu _K\Vert J^T_KF_K\Vert > c_0(1+c_4c_2c_3^{-1})C_1\). Since \(c_0>1\), we have

$$\begin{aligned} \mu _K\Vert J^T_KF_K\Vert >C_1. \end{aligned}$$

So, by Lemma 3.4,

$$\begin{aligned} \mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert \le \mu _K\Vert J^T_KF_K\Vert . \end{aligned}$$
(3.17)

If \(\mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert > c_0(1+c_4c_2c_3^{-1})C_1\), then by Lemma 3.4 and (3.17),

$$\begin{aligned} \mu _{K+2}\Vert J^T_{K+2}F_{K+2}\Vert \le \mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert . \end{aligned}$$
(3.18)

Otherwise, if \(\mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert \le c_0(1+c_4c_2c_3^{-1})C_1\), then by the same arguments as in case 1, we have

$$\begin{aligned} \mu _{K+2}\Vert J^T_{K+2}F_{K+2}\Vert \le c_0(1+c_4c_2c_3^{-1})C_1. \end{aligned}$$
(3.19)

In view of (3.17)–(3.19), we obtain

$$\begin{aligned} \mu _{K+2}\Vert J^T_{K+2}F_{K+2}\Vert \le&\max \{\mu _{K+1}\Vert J^T_{K+1}F_{K+1}\Vert , c_0(1+c_4c_2c_3^{-1})C_1\} \nonumber \\ \le&\max \{\mu _K\Vert J^T_KF_K\Vert , c_0(1+c_4c_2c_3^{-1})C_1\}. \end{aligned}$$
(3.20)

By induction, we can prove that, for all \(k> K\),

$$\begin{aligned}\mu _k\Vert J^T_kF_k\Vert \le&\max \{\mu _{k-1}\Vert J^T_{k-1}F_{k-1}\Vert , c_0(1+c_4c_2c_3^{-1})C_1\} \nonumber \\ \le&\cdots \nonumber \\ \le&\max \{\mu _K\Vert J^T_KF_K\Vert , c_0(1+c_4c_2c_3^{-1})C_1\}\nonumber \\ =&C_2. \end{aligned}$$

The proof is completed.\(\square \)

Let

$$\begin{aligned} C_3=c_3^{-1}L_2C_2. \end{aligned}$$

The following lemma shows that \(\mu _k\Vert F_k\Vert \) is bounded by \(C_3\).

Lemma 3.6

Under conditions of Lemma 3.4,

$$\begin{aligned} \mu _k\Vert F_k\Vert \le C_3 \end{aligned}$$
(3.21)

holds for all sufficiently large k.

Proof

It follows from (3.4) that

$$\begin{aligned} \Vert F_k\Vert \le L_2\Vert \bar{x}_k- x_k\Vert . \end{aligned}$$

This, together with (3.7), gives

$$\begin{aligned} \Vert F_k\Vert \le c_3^{-1}L_2 \Vert J^T_kF_k\Vert . \end{aligned}$$

Thus, by (3.12), we obtain (3.21). The proof is completed.\(\square \)

3.2 Quadratic Convergence

Based on the above lemmas, we study the quadratic convergence of Algorithm 2.1 under the local error bound condition, by using the singular value decomposition (SVD) technique.

Suppose the SVD of \(J(\bar{x}_k)\) is

$$\begin{aligned} \bar{J}_k= & {} \bar{U}_k\bar{\Sigma }_k\bar{V}_k^{T}\\= & {} (\bar{U}_{k,1}, \bar{U}_{k,2})\left( \begin{array}{cc}\bar{\Sigma }_{k,1}&{}\\ {} &{} 0\end{array}\right) \left( \begin{array}{c} \bar{V}_{k,1}^{T}\\ \bar{V}_{k,2}^{T}\end{array}\right) \\= & {} \bar{U}_{k,1}\bar{\Sigma }_{k,1}\bar{V}_{k,1}^{T}, \end{aligned}$$

where \(\bar{\Sigma }_{k,1}={\text {diag}}(\bar{\sigma }_{k,1},\ldots , \bar{\sigma }_{k,r})\) with \(\bar{\sigma }_{k,1} \ge \bar{\sigma }_{k,2}\ge \cdots \ge \bar{\sigma }_{k,r}>0\), and the corresponding SVD of \(J_k\) is

$$\begin{aligned} J_k= & {} U_k\Sigma _kV_k^T\\= & {} (U_{k,1},U_{k,2})\left( \begin{array}{cc}\Sigma _{k,1}&{}\\ {} &{}\Sigma _{k,2}\end{array}\right) \left( \begin{array}{c}V_{k,1}^T\\ V_{k,2}^T \end{array}\right) \\= & {} U_{k,1}\Sigma _{k,1}V_{k,1}^T+U_{k,2}\Sigma _{k,2}V_{k,2}^T, \end{aligned}$$

where \(\Sigma _{k,1}={\text {diag}}(\sigma _{k,1},\ldots , \sigma _{k,r})\) with \(\sigma _{k,1} \ge \cdots \ge \sigma _{k,r}>0\), and \(\Sigma _{k,2}={\text {diag}}(\sigma _{k,r+1},\ldots , \sigma _{k,n})\) with \(\sigma _{k,r}\ge \cdots \ge \sigma _{k,n}\ge 0\). In the following, if the context is clear, we omit the subscript k in \(\Sigma _{k,i}\), \(U_{k,i}\) and \(V_{k,i}\) \((i=1,2)\), and write \(J_k\) as

$$\begin{aligned} J_k=U_1\Sigma _1V_1^T+U_2\Sigma _2V_2^T. \end{aligned}$$

By matrix perturbation theory [12] and the Lipschitz continuity of J(x),

$$\begin{aligned} \Vert \text{ diag }(\Sigma _1-\bar{\Sigma }_1, \Sigma _2)\Vert \le \Vert J_k-\bar{J}_k\Vert \le L_1\Vert \bar{x}_k-x_k\Vert . \end{aligned}$$

So,

$$\begin{aligned} \Vert \Sigma _1-\bar{\Sigma }_1\Vert \le L_1\Vert \bar{x}_k-x_k\Vert \qquad \text{ and }\qquad \Vert \Sigma _2\Vert \le L_1\Vert \bar{x}_k-x_k\Vert . \end{aligned}$$
(3.22)

Since \(\{x_k\}\) converges to the solution set \(X^*\), we assume that \(L_1\Vert \bar{x}_k-x_k\Vert \le \bar{\sigma }_r/2\) holds for all sufficiently large k. Then, it follows from (3.22) that

$$\begin{aligned} \Vert \Sigma _1^{-1}\Vert \le \dfrac{1}{\bar{\sigma }_r-L_1\Vert \bar{x}_k-x_k\Vert }\le \dfrac{2}{\bar{\sigma }_r}. \end{aligned}$$
(3.23)
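
As a numerical illustration (the helper below and the rank argument `r` are ours and serve only to check the decomposition), the SVD of \(J_k\) can be split into the two blocks used in this analysis:

```python
import numpy as np

def split_svd(Jk, r):
    """Split the SVD of J_k at index r into the blocks (U1, S1, V1) and (U2, S2, V2)."""
    U, s, Vt = np.linalg.svd(Jk)         # NumPy returns the singular values in decreasing order
    U1, U2 = U[:, :r], U[:, r:]
    S1, S2 = np.diag(s[:r]), np.diag(s[r:])
    V1, V2 = Vt[:r].T, Vt[r:].T
    return (U1, S1, V1), (U2, S2, V2)

# With lam = mu * ||F||^2, the LM step and the linearized residual can then be written
# block-wise (as used later in the proof of Theorem 3.1) as
#   d       = -V1 (S1^2 + lam I)^{-1} S1 U1^T F - V2 (S2^2 + lam I)^{-1} S2 U2^T F,
#   F + J d =  lam U1 (S1^2 + lam I)^{-1} U1^T F + lam U2 (S2^2 + lam I)^{-1} U2^T F.
```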

Lemma 3.7

Under Assumption 3.1, if \(x_k\in N(x^*, b_1/2)\), then we have

  1. (a)

    \(\Vert U_1U_1^TF_k\Vert \le L_2\Vert \bar{x}_k-x_k\Vert \);

  2. (b)

    \(\Vert U_2U_2^TF_k\Vert \le 2L_1\Vert \bar{x}_k-x_k\Vert ^2\);

where \(L_1, L_2\) are given in (3.2) and (3.4) respectively.

Proof

(a) follows from (3.4) directly.

Denote \(F(\bar{x}_k)\) by \(\bar{F}_k\). By (3.3) and (3.22),

$$\begin{aligned} \Vert U_2U_2^TF_k\Vert =&\Vert U_2U_2^T(\bar{F}_k- F_k)\Vert \\ \le&\Vert U_2U_2^TJ_k(\bar{x}_k- x_k)\Vert +L_1\Vert U_2U_2^T\Vert \Vert \bar{x}_k- x_k\Vert ^2\\ \le&\Vert U_2U_2^T(U_1\Sigma _1V_1^T+U_2\Sigma _2V_2^T)\Vert \Vert \bar{x}_k- x_k\Vert +L_1\Vert \bar{x}_k- x_k\Vert ^2\\ \le&\Vert \Sigma _2\Vert \Vert \bar{x}_k- x_k\Vert +L_1\Vert \bar{x}_k- x_k\Vert ^2\\ \le&2L_1\Vert \bar{x}_k- x_k\Vert ^2. \end{aligned}$$

The proof is completed.\(\square \)

Now we can give the main result of this section.

Theorem 3.1

Under Assumption 3.1, the sequence generated by Algorithm 2.1 converges to some solution of (1.1) quadratically.

Proof

By the SVD of \(J_k\),

$$\begin{aligned} d_k=-V_1(\Sigma _1^2+\lambda _kI)^{-1}\Sigma _1U_1^TF_k-V_2(\Sigma _2^2+\lambda _kI)^{-1}\Sigma _2U_2^TF_k, \end{aligned}$$

and

$$\begin{aligned} F_k+J_kd_k=\lambda _kU_1(\Sigma _1^2\!+\!\lambda _kI)^{-1}U_1^TF_k \!+\!\lambda _kU_2(\Sigma _2^2\!+\!\lambda _kI)^{-1}U_2^TF_k. \end{aligned}$$

It follows from (3.4), (3.23), Lemmas 3.6 and 3.7 that

$$\begin{aligned} \Vert F_k+J_kd_k\Vert \le&\mu _k\Vert F_k\Vert ^{2}\Vert \Sigma _1^{-2}\Vert \Vert U_1U_1^TF_k\Vert +\Vert U_2U_2^TF_k\Vert \\ \le&\frac{4L_2^{2}C_3}{\bar{\sigma }_r^2}\Vert \bar{x}_k- x_k\Vert ^{2}+2L_1\Vert \bar{x}_k- x_k\Vert ^2\\ \le&c_5\Vert \bar{x}_k- x_k\Vert ^{2}, \end{aligned}$$

where \(c_5=\frac{4L_2^{2}C_3}{\bar{\sigma }_r^2}+2L_1\) is a positive constant. So, by (3.1), (3.3) and Lemma 3.1,

$$\begin{aligned} c\Vert \bar{x}_{k+1}- x_{k+1}\Vert&\le \Vert F_{k+1}\Vert \nonumber \\&\le \Vert F_k+J_kd_k\Vert +L_1\Vert d_k\Vert ^2\nonumber \\&\le c_5\Vert \bar{x}_k- x_k\Vert ^{2}+L_1c_2^2\Vert \bar{x}_k- x_k\Vert ^2\nonumber \\&\le c_6\Vert \bar{x}_k- x_k\Vert ^{2}, \end{aligned}$$
(3.24)

where \(c_6=c_5+c_2^2 L_1\) is a positive constant.

Note that

$$\begin{aligned} \Vert \bar{x}_k-x_k\Vert \le \Vert \bar{x}_{k+1}-x_{k+1}\Vert +\Vert d_k\Vert . \end{aligned}$$
(3.25)

By (3.24),

$$\begin{aligned} \Vert \bar{x}_k-x_k\Vert \le 2\Vert d_k\Vert \end{aligned}$$

holds for all sufficiently large k. Combining (3.5), (3.24) and (3.25), we obtain

$$\begin{aligned} \Vert d_{k+1}\Vert \le O(\Vert d_k\Vert ^{2}). \end{aligned}$$

The proof is completed.\(\square \)

Remark 3.1

If the Levenberg–Marquardt parameter is chosen as \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \), where \(\mu _k\) is updated by (1.9) and \(\delta \in (1,2]\), the algorithm converges superlinearly to some solution of the nonlinear equations with the order \(\delta \). The proof is almost the same as above, except that we have \(\Vert F_k+J_kd_k\Vert \le c_5\Vert \bar{x}_k- x_k\Vert ^\delta \) instead of \(\Vert F_k+J_kd_k\Vert \le c_5\Vert \bar{x}_k- x_k\Vert ^2\) in the proof of Theorem 3.1, which then yields \(\Vert d_{k+1}\Vert \le O(\Vert d_k\Vert ^\delta )\).

Table 1 Results on the first singular test set with \(\text{ rank }(F'(x^*))=n-1\)
Table 2 Results on the second singular test set with \(\text{ rank }(F'(x^*))=n-2\)
Table 3 Results on the first singular test set with \(\text{ rank }(F'(x^*))=n-1\) (large scale problems)
Table 4 Results on the second singular test set with \(\text{ rank }(F'(x^*))=n-2\) (large scale problems)

4 Numerical Results

We test Algorithm 2.1, where the LM parameter is computed by \(\lambda _k=\mu _k\Vert F_k\Vert ^2\) with \(\mu _k\) updated by (1.9), on some singular nonlinear equations, and compare it with two other LM algorithms, in which \(\lambda _k=\mu _k\Vert F_k\Vert \) and \(\lambda _k=\mu _k\Vert F_k\Vert ^2\), respectively, with \(\mu _k\) updated by (1.7).

The test problems are created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in [8], and have the same form as in [11],

$$\begin{aligned} \hat{F}(x)=F(x)-J(x^*)A(A^TA)^{-1}A^T(x-x^*), \end{aligned}$$

where F(x) is the standard nonsingular test function, \(x^*\) is its root, and \(A\in R^{n\times k}\) has full column rank with \(1\le k\le n\). Obviously, \(\hat{F}(x^*)=0\) and

$$\begin{aligned} \hat{J}(x^*)=J(x^*)(I-A(A^TA)^{-1}A^T) \end{aligned}$$

has rank \(n-k\). A disadvantage of these problems is that \(\hat{F}(x)\) may have roots that are not roots of F(x). We create two sets of singular problems, with \(\hat{J}(x^*)\) having rank \(n-1\) and \(n-2\), by using

$$\begin{aligned} A\in R^{n\times 1},\qquad A^T=(1,1,\ldots , 1) \end{aligned}$$

and

$$\begin{aligned} A\in R^{n\times 2},\qquad A^T=\left( \begin{array}{cccccc} 1 &{} 1 &{} 1 &{} 1 &{} \cdots &{} 1\\ 1 &{} -1&{} 1 &{} -1&{} \cdots &{} \pm 1 \end{array}\right) , \end{aligned}$$

respectively. Meanwhile, we make a slight alteration on the variable dimension problem, which has \(n+2\) equations in n unknowns; we eliminate the \((n-1)\)-th and n-th equations. (The first n equations in the standard problem are linear.)

We set \(p_0=0.0001, p_1=0.25, p_2=0.75, c_0=4, c_1=0.25, \mu _0=10^{-8}, m=10^{-8}, \varepsilon =10^{-6}\) for all the tests. The algorithm stops when \(\Vert J^T_kF_k\Vert \le \varepsilon \) or when the number of iterations exceeds \(100(n+1)\). The results for the small scale problems of the first set (rank \(n - 1\)) are listed in Table 1, and those of the second set (rank \(n-2\)) in Table 2. We also test the algorithms on some large scale problems; the results are given in Tables 3 and 4.

The third column of each table indicates whether the starting point is \(x_0\), \(10x_0\) or \(100x_0\), where \(x_0\) is the point suggested by Moré, Garbow and Hillstrom in [8]; "NF" and "NJ" denote the numbers of function and Jacobian evaluations, respectively. If an algorithm fails to find the solution within \(100(n+1)\) iterations, we mark it by "−", and if it underflows or overflows, we mark it by "OF". Note that, for general nonlinear equations, one Jacobian evaluation usually costs about n times as much as one function evaluation. So, for the small scale problems, we also report the value "NF+n*NJ" to compare the total cost. However, if the Jacobian is sparse, this value is less meaningful. For the large scale problems, the computing time is also given.

From Tables 1 and 2, we can see that Algorithm 2.1 performs almost the same as the other two LM algorithms on the small scale problems. From Tables 3 and 4, we can see that Algorithm 2.1 outperforms the other two algorithms on most of the large scale problems.

5 Conclusion and Discussion

In traditional LM algorithms for nonlinear equations, both the iterate and the LM parameter are updated according to the ratio of the actual reduction to the predicted reduction of the merit function (cf. [1, 2]). In this paper, we proposed a new LM algorithm for nonlinear equations, where the LM parameter is taken as \(\lambda _k=\mu _k\Vert F_k\Vert ^2\) with \(\mu _k\) being updated by (1.9). Though the iterate is still updated according to the ratio of the actual reduction to the predicted reduction, the update of \(\mu _k\) is no longer based on it. When the iteration is unsuccessful, \(\mu _k\) is increased; otherwise it is updated based on the value of the gradient norm of the merit function as in (1.9). We proved that all limit points of the sequence generated by the algorithm are stationary points of the merit function under standard conditions. Since the updating rule of \(\mu _k\) changes, the analysis of the convergence rate in this paper is quite different from those in [1, 3]. We developed new techniques to prove the quadratic convergence of the algorithm under the local error bound condition.

We also considered the LM parameter \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \), where \(\mu _k\) is updated by (1.9) and \(\delta \in [1,2)\). Using an analysis similar to that in this paper, we found that the algorithm converges with order \(\delta \). We conjecture that the convergence rate is in fact quadratic for any \(\delta \in [1,2)\); this will be our future study.