1 Introduction

We consider the system of nonlinear equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1.1)

where \(F:R^{n}\rightarrow R^{m}\) is a continuously differentiable function. Nonlinear equations have wide applications in technology, mechanics, economics and so on. For example, physical models that are expressed as nonlinear partial differential equations become systems of nonlinear equations when discretized [5].

It is natural to transform the nonlinear equations (1.1) to the nonlinear least squares problem

$$\begin{aligned} \min _{x\in R^n} \Vert F(x)\Vert ^2, \end{aligned}$$
(1.2)

where \(\Vert \cdot \Vert \) refers to the 2-norm. Obviously, (1.1) has a solution if and only if the minimal value of (1.2) is zero. The Gauss–Newton method is the best-known method for (1.2). At the k-th iteration, it computes the Gauss–Newton step

$$\begin{aligned} d_k^{GN}=-\,(J_k^TJ_k)^{-1}J_k^TF_k, \end{aligned}$$
(1.3)

where \(F_{k}=F(x_{k})\) and \(J_{k}=F'(x_{k})\) is the Jacobian at \(x_k\). However, when the Jacobian is not of full rank, the Gauss–Newton step is not well-defined. To overcome this difficulty, the Levenberg–Marquardt (LM) method introduces a positive parameter \(\lambda _{k}>0\) and computes the LM step

$$\begin{aligned} d_{k}=-\,(J_{k}^{T}J_{k}+\lambda _{k}I)^{-1}J_{k}^{T}F_{k}. \end{aligned}$$
(1.4)
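
As a minimal numerical sketch (assuming user-supplied callables F and J that return the residual vector and the Jacobian matrix), one LM step can be computed as follows; with \(\lambda =0\) it reduces to the Gauss–Newton step (1.3), which requires \(J_k^TJ_k\) to be nonsingular.

```python
import numpy as np

def lm_step(F, J, x, lam):
    """One Levenberg-Marquardt step (1.4): solve (J^T J + lam*I) d = -J^T F."""
    Fx, Jx = F(x), J(x)
    A = Jx.T @ Jx + lam * np.eye(x.size)
    return np.linalg.solve(A, -Jx.T @ Fx)
```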

Both the Gauss–Newton method and the Levenberg–Marquardt method have quadratic convergence for (1.1) under the assumptions that the Jacobian J(x) is Lipschitz continuous, \(J(x)^TJ(x)\) is nonsingular at the solution and the LM parameter is chosen suitably. Yamashita and Fukushima [13] proved that if the LM parameter is chosen as \(\lambda _{k}=\Vert F_{k}\Vert ^{2}\), then the LM method converges quadratically for (1.1) under the local error bound condition, which is weaker than the nonsingularity condition. Fan and Yuan [4] took \(\lambda _{k}=\Vert F_{k}\Vert ^{\delta }\) and showed that the LM method preserves the quadratic convergence for any \(\delta \in [1,2]\) under the local error bound condition.

The LM method evaluates the Jacobian at every iteration. When F(x) is complicated or n is large, the cost of Jacobian evaluations may be high. To save Jacobian evaluations as well as linear algebra work, Fan [2] proposed a modified LM method. At every iteration, it uses the evaluated Jacobian to compute not only an LM step but also an approximate LM step. More generally, a multi-step LM method was given in [3]. At the k-th iteration, it computes one LM step and \(p-1\) approximate LM steps as follows:

$$\begin{aligned} d_{k,i}&=-\,(J_{k}^{T}J_{k}+\lambda _{k}I)^{-1}J_{k}^{T}F(x_{k,i}),\quad i=0,\ldots ,p-1, \end{aligned}$$
(1.5)

where \(p\ge 1\) is a given integer and \(x_{k,i}=x_{k,i-1}+d_{k,i-1}\) with \(x_{k,0}=x_{k}\). The trial step is then set as

$$\begin{aligned} d_k=\sum _{i=0}^{p-1}d_{k,i}. \end{aligned}$$
(1.6)

That is, the multi-step LM method uses \(J_k\) as an approximation of \(J(x_{k,i})\) to compute the approximate LM step \(d_{k,i}\). Hence, the Jacobian evaluation and the matrix factorization are done only once for every p computations of the step. If the full p steps computed between two consecutive Jacobian evaluations are counted as one complete iteration, the multi-step LM method converges with Q-order \(p+1\) under the local error bound condition.
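
For illustration, the following is a minimal sketch of one outer iteration of the multi-step LM method (1.5)–(1.6); it factorizes \(J_k^TJ_k+\lambda _kI\) once (here via a Cholesky factorization, assuming SciPy is available) and reuses the factorization for all p inner steps.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def multistep_lm_trial_step(F, J, x_k, lam_k, p):
    """One outer iteration of the multi-step LM method (1.5)-(1.6)."""
    Jk = J(x_k)
    # Factorize J_k^T J_k + lam_k*I once and reuse it for the p inner solves.
    factor = cho_factor(Jk.T @ Jk + lam_k * np.eye(x_k.size))
    x, d_total = x_k.copy(), np.zeros(x_k.size)
    for _ in range(p):                           # i = 0, ..., p-1
        d = cho_solve(factor, -Jk.T @ F(x))      # approximate LM step (1.5)
        x, d_total = x + d, d_total + d
    return d_total                               # trial step d_k in (1.6)
```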

It seems that the larger p is, the more Jacobian evaluations can be saved. However, a larger p also implies a possibly worse approximation of the Jacobian at the iterate. Moreover, different problems may prefer different p, and the best p for a problem is usually not known. These observations motivate us to develop an adaptive scheme to automatically decide whether an iteration should evaluate the Jacobian at the current iterate to compute an LM step or use the latest evaluated Jacobian to compute an approximate LM step. That is, the number of approximate LM steps may vary at different iterations, but it is at most \(t-1\) to avoid a poor approximation of the Jacobian, where \(t\ge 1\) is a given integer.

Counting every LM step and every approximate LM step as a single iteration, in this paper, we compute the trial step \(d_k\) by solving

$$\begin{aligned} \left( G_{k}^{T}G_{k}+\lambda _{k}I\right) d=-\,G_{k}^{T}F_{k}, \end{aligned}$$
(1.7)

where \(G_k\) is either the Jacobian \(J_k\) or the Jacobian used at the last iteration. If \(d_k\) is satisfactory and fewer than \(t-1\) approximate LM steps have been computed with \(G_k\), we regard \(G_k\) as a good approximation of the Jacobian at the current iterate, and use it to compute another approximate LM step at the next iteration.

Define the ratio of the actual reduction to the predicted reduction of the merit function \(\Vert F(x)\Vert ^2\) at the k-th iteration as

$$\begin{aligned} r_k=\frac{Ared_k}{Pred_k}=\frac{ \Vert F_{k}\Vert ^{2}-\Vert F(x_{k}+d_{k})\Vert ^{2}}{ \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} }. \end{aligned}$$
(1.8)

The ratio \(r_k\) exploits the most relevant information on the step’s quality at the iterate. It plays an important role in deciding whether \(d_k\) is acceptable. If \(r_k\) is not too small (i.e., the iteration is successful), we accept \(d_k\). Usually, we set

$$\begin{aligned} x_{k+1} = {\left\{ \begin{array}{ll} x_{k}+d_{k}, &{} \quad \hbox {if}\quad r_{k}\ge p_{0},\\ x_{k}, &{} \quad \hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(1.9)

where \(p_0\) is a small positive constant. Suppose \(s-1\) approximate LM steps that used the latest evaluated Jacobian have been computed. If the iteration is very successful and s is less than t, we regard the latest evaluated Jacobian as a good approximation at the iterate and keep it, otherwise, we evaluate the exact one. That is, we set

$$\begin{aligned} G_{k+1} = {\left\{ \begin{array}{ll} G_k, &{} \quad \text {if } \quad r_k\ge p_1 \text { and }s<t,\\ J_{k+1}, &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(1.10)

where \(0<p_0<p_1<1\). Meanwhile, we update the LM parameter as follows:

$$\begin{aligned} \lambda _{k+1} = {\left\{ \begin{array}{ll} \lambda _k, &{} \quad \text {if} \quad r_k\ge p_1\text { and }s<t,\\ \mu _{k+1}\Vert F_{k+1}\Vert ^\delta , &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(1.11)

where

$$\begin{aligned} \mu _{k+1} = {\left\{ \begin{array}{ll} c_{1}\mu _{k}, &{} \quad \hbox {if}\quad r_{k}<p_{2},\\ \mu _{k}, &{} \quad \hbox {if}\quad p_{2}\le r_{k} \le p_3,\\ \max \{c_{2}\mu _{k},\mu _{\min }\}, &{} \quad \hbox {if}\quad r_{k}> p_3. \end{array}\right. } \end{aligned}$$
(1.12)

Here \(0<c_2<1<c_1\), \(0<p_0<p_2<p_1<p_3<1\), \(1\le \delta \le 2\) and \(\mu _{\min }>0\) are positive constants. Based on (1.7)–(1.12), we propose an adaptive multi-step LM method for nonlinear equations (1.1) and show that it converges superlinearly under the local error bound condition.

The paper is organized as follows. In Sect. 2, we propose an adaptive multi-step LM algorithm for (1.1). It is shown that the algorithm converges globally under certain assumptions. In Sect. 3, we discuss the convergence rate of the algorithm under the local error bound condition. Some numerical results are given in Sect. 4. Finally, we conclude the paper in Sect. 5.

2 An Adaptive Multi-step LM Algorithm and Global Convergence

In this section, we first propose the adaptive multi-step LM algorithm, then show it converges globally under certain assumptions.

The adaptive multi-step LM algorithm is presented as follows.

[Algorithm 1: the adaptive multi-step LM algorithm]
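
Since the listing is rendered as an image in the published version, the following is a minimal Python sketch of Algorithm 1 assembled from (1.7)–(1.12); the parameter values follow Sect. 4, while the function names and the simplified stopping test (on \(\Vert G_k^TF_k\Vert \) rather than \(\Vert J_{k_i}^TF_{k_i}\Vert \)) are illustrative only.

```python
import numpy as np

def adaptive_multistep_lm(F, J, x, mu=1e-5, delta=1.0, t=10,
                          p0=1e-4, p1=0.5, p2=0.25, p3=0.75,
                          c1=4.0, c2=0.25, mu_min=1e-8,
                          tol=1e-5, max_iter=1000):
    """Sketch of the adaptive multi-step LM method, following (1.7)-(1.12)."""
    n = x.size
    G = J(x)                                   # latest evaluated Jacobian
    s = 1                                      # steps computed with this Jacobian
    lam = mu * np.linalg.norm(F(x)) ** delta
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(G.T @ Fx) <= tol:    # simplified stopping test
            break
        d = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fx)      # (1.7)
        ared = np.linalg.norm(Fx) ** 2 - np.linalg.norm(F(x + d)) ** 2
        pred = np.linalg.norm(Fx) ** 2 - np.linalg.norm(Fx + G @ d) ** 2
        r = ared / pred                                                 # (1.8)
        if r >= p0:                                                     # (1.9)
            x = x + d
        if r < p2:                                                      # (1.12)
            mu = c1 * mu
        elif r > p3:
            mu = max(c2 * mu, mu_min)
        if r >= p1 and s < t:                  # keep G_k and lambda_k, (1.10)-(1.11)
            s += 1
        else:                                  # evaluate the exact Jacobian
            G, s = J(x), 1
            lam = mu * np.linalg.norm(F(x)) ** delta
    return x
```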

We denote by \({{\bar{S}}}=\{k_i : i=1,2,\ldots \}\) the set of iteration indices at which the Jacobians \(J(x_{k_i})\ (i=1,2,\ldots )\) are evaluated and used to compute LM steps. Let

$$\begin{aligned} s_{i}=k_{i+1}-k_{i}. \end{aligned}$$
(2.1)

Since at most \(t-1\) approximate LM steps are computed, we have \(s_i\le t\).

For any k, there exist \(k_i\) and \(0\le q\le s_i-1\) such that

$$\begin{aligned} k=k_{i}+q. \end{aligned}$$
(2.2)

Note that

$$\begin{aligned} G_{k_{i}}=G_{k_{i}+1}=\cdots =G_{k_{i}+s_i-1}=J_{k_{i}}, \end{aligned}$$
(2.3)

thus the linear equations (1.7) can also be written as

$$\begin{aligned} \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) d=-\,J_{k_{i}}^{T}F_k \end{aligned}$$
(2.4)

for \(k=k_i, \ldots , k_i+s_i-1\).

It can be easily checked that \(d_{k}\) is not only the minimizer of the convex minimization problem

$$\begin{aligned} \min _{d\in R^{n}}\Vert F_{k}+G_{k}d\Vert ^{2}+\lambda _{k}\Vert d\Vert ^{2}\triangleq \varphi _{k}(d), \end{aligned}$$
(2.5)

but also the solution of the trust region problem

$$\begin{aligned} \min _{d\in R^{n}}&\ \Vert F_{k}+G_{k}d\Vert ^{2}\nonumber \\ \ s.t.&\ \ \Vert d\Vert \le \Delta _{k}\triangleq \Vert d_{k}\Vert . \end{aligned}$$
(2.6)
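
This equivalence is straightforward to check numerically; the sketch below uses randomly generated data (for illustration only) to verify that the solution of (1.7) minimizes (2.5) and is optimal for the trust-region problem (2.6) with \(\Delta _k=\Vert d_k\Vert \).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 8, 5, 0.3
G, Fk = rng.standard_normal((m, n)), rng.standard_normal(m)

dk = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fk)      # step (1.7)
phi = lambda d: np.linalg.norm(Fk + G @ d) ** 2 + lam * np.linalg.norm(d) ** 2

# d_k minimizes the regularized problem (2.5) ...
assert all(phi(dk) <= phi(dk + 1e-3 * rng.standard_normal(n)) for _ in range(100))

# ... and no d with ||d|| <= ||d_k|| gives a smaller linearized residual (2.6).
for _ in range(100):
    d = rng.standard_normal(n)
    d *= rng.uniform() * np.linalg.norm(dk) / np.linalg.norm(d)
    assert np.linalg.norm(Fk + G @ d) >= np.linalg.norm(Fk + G @ dk) - 1e-12
```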

Due to Powell’s result [10, Theorem 4], we have the following lemma.

Lemma 2.1

Let \(d_{k}\) be computed by (1.7), then

$$\begin{aligned} \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} \ge \Vert G_{k}^{T}F_{k}\Vert \min \left\{ \Vert d_{k}\Vert ,\dfrac{\Vert G_{k}^{T}F_{k}\Vert }{\Vert G_{k}^{T}G_{k}\Vert }\right\} . \end{aligned}$$
(2.7)

Lemma 2.1 indicates that the predicted reduction of the merit function is always nonnegative. It plays a crucial role in guaranteeing the global convergence of Algorithm 1. In the following, we show that, from the optimization point of view, at least one accumulation point of the sequence generated by Algorithm 1 is a stationary point of the merit function \(\Vert F(x)\Vert ^2\).

Theorem 2.2

Suppose F(x) is continuously differentiable, both F(x) and J(x) are Lipschitz continuous, i.e., there exist positive constants \(\kappa _{lf}\) and \(\kappa _{lj}\) such that

$$\begin{aligned} \Vert F(y)-F(x)\Vert \le \kappa _{lf}\Vert y-x\Vert ,\quad \forall x,y\in R^{n}, \end{aligned}$$
(2.8)

and

$$\begin{aligned} \Vert J(y)-J(x)\Vert \le \kappa _{lj}\Vert y-x\Vert ,\quad \forall x,y\in R^{n}. \end{aligned}$$
(2.9)

Then, we have

$$\begin{aligned} \liminf _{k\rightarrow \infty }\Vert J_{k}^{T}F_{k}\Vert =0. \end{aligned}$$
(2.10)

Proof

Suppose that (2.10) is not true. Then, there exists a constant \(\varepsilon >0\) such that

$$\begin{aligned} \left\| J_{k}^{T}F_{k}\right\| \ge \varepsilon , \quad \forall k. \end{aligned}$$
(2.11)

By (2.8) and (2.9), we have

$$\begin{aligned} \Vert J(x)\Vert \le \kappa _{lf}, \quad \forall x\in R^n \end{aligned}$$
(2.12)

and

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le \kappa _{lj}\Vert y-x\Vert ^{2},\quad \forall x,y\in R^{n}. \end{aligned}$$
(2.13)

Denote the index set of successful iterations by

$$\begin{aligned} S=\left\{ k: r_{k}\ge p_0\right\} . \end{aligned}$$
(2.14)

We consider S in two cases.

Case 1: S is finite. Then, there exists \({\tilde{k}}\) such that \(r_{k}<p_0\) for all \(k\ge {\tilde{k}}\). Since \(p_0<p_1\), by (1.10), \(G_k = J_k\) holds for all sufficiently large \(k\). Note that \(\{\Vert F_k\Vert \}\) is nonincreasing. Since \(r_k<p_0<p_2\), by (1.12), \(\mu _{k+1}=c_1\mu _k\) with \(c_1>1\). So, \(\mu _{k}\rightarrow +\,\infty \).

By (2.11) and (2.12), for \(k\ge {\tilde{k}}\),

$$\begin{aligned} \Vert G_{k}^{T}F_{k}\Vert \le \Vert G_{k}\Vert \Vert F_k\Vert \le \kappa _{lf}\Vert F_1\Vert \end{aligned}$$
(2.15)

and

$$\begin{aligned} \left\| F_{k}\right\| \ge \frac{\Vert G_{k}^{T}F_{k}\Vert }{\Vert G_{k}\Vert } \ge \frac{\Vert J_{k}^{T}F_{k}\Vert }{\kappa _{lf}} \ge \frac{\varepsilon }{\kappa _{lf}}. \end{aligned}$$
(2.16)

This, together with \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \) and \(\mu _k \rightarrow +\,\infty \), gives \(\lambda _{k}\rightarrow +\,\infty .\) Hence, by the definition of \(d_k\), we get \(d_{k}\rightarrow 0\).

Case 2: S is infinite. It follows from (2.12) and Lemma 2.1 that

$$\begin{aligned} \left\| F_{1}\right\| ^{2}&\ge \sum _{k\in S}\left( \left\| F_{k}\right\| ^{2}-\left\| F_{k+1}\right\| ^{2}\right) \nonumber \\&\ge \sum _{k\in S}p_{0} Pred_{k} \ge \sum _{k\in S\cap {{\bar{S}}}}p_{0} Pred_{k}\nonumber \\&\ge \sum _{k\in S\cap {{\bar{S}}}}p_{0}\Vert J_{k}^{T}F_{k}\Vert \min \left\{ \Vert d_{k}\Vert ,\dfrac{\Vert J_{k}^{T}F_{k}\Vert }{\Vert J_{k}^{T}J_{k}\Vert }\right\} \nonumber \\&\ge \sum _{k\in S\cap {{\bar{S}}}}p_{0}\varepsilon \min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{\kappa _{lf}^{2}}\right\} . \end{aligned}$$
(2.17)

We claim that the set \(S\cap {{\bar{S}}}\) is infinite. Suppose to the contrary that it is finite, and let \(k_{{{\bar{i}}}}\) be the largest index in it. Then, for \(i>{{\bar{i}}}\), \(r_{k_i}< p_0 < p_1\). By (1.10), \(G_{k_i+1} = J_{k_i+1}\). Thus, \(k_{i+1}=k_i+1\). Moreover, \(k_{i+1} \notin S\). By induction, we obtain that \(r_k<p_0\) holds for all sufficiently large k, which contradicts the fact that S is infinite. Hence, \(S\cap {{\bar{S}}}\) is infinite.

Note that \(k_i\in {{\bar{S}}}\) according to the definition of \({{\bar{S}}}\). By (2.17), we obtain \(d_{k_i}\rightarrow 0\) for \(k_i\in S\). Since \(d_{k_i}=0\) for \(k_i\notin S\), we have \(d_{k_i}\rightarrow 0\). This, together with \(\Vert G_{k_i}\Vert \le \kappa _{lf}\), \(\Vert G_{k_i}^TF_{k_i}\Vert =\Vert J_{k_i}^TF_{k_i}\Vert \ge \varepsilon \) and (1.7), gives \(\lambda _{k_i} \rightarrow +\,\infty \).

By (2.12) and (2.13), for \(k=k_i+1,\ldots , k_i+s_i-1\),

$$\begin{aligned} \Vert d_k \Vert&= \left\| -(G_k^T G_k + \lambda _{k} I)^{-1}G_k^T F_k \right\| \nonumber \\&\le \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T F_{k_i} \right\| + \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T J_{k_i} \left( \sum _{j = k_i}^{k-1} d_j \right) \right\| \nonumber \\&\qquad +\, \kappa _{lj} \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T \right\| \left\| \left( \sum _{j = k_i}^{k-1} d_j \right) \right\| ^2 \nonumber \\&\le \Vert d_{k_i}\Vert + \sum _{j = k_i}^{k-1} \Vert d_j\Vert + \frac{\kappa _{lf}\kappa _{lj}}{\lambda _{k_i}} \left( \sum _{j = k_i}^{k-1} \Vert d_j\Vert \right) ^2. \end{aligned}$$
(2.18)

Since \(\lambda _{k_i} \rightarrow + \infty \) and \(d_{k_i}\rightarrow 0\), we have \(\kappa _{lf}\kappa _{lj}\Vert d_{k_i}\Vert /\lambda _{k_i}\le 1\) for all sufficiently large \(k_i\). Hence, by (2.18),

$$\begin{aligned} \Vert d_{k_i+1} \Vert \le 3 \Vert d_{k_i}\Vert \end{aligned}$$
(2.19)

and

$$\begin{aligned} \Vert d_{k_i+2} \Vert&\le \Vert d_{k_i}\Vert +\Vert d_{k_i}\Vert +\Vert d_{k_i+1} \Vert + \frac{\kappa _{lf}\kappa _{lj}}{\lambda _{k_i}}\left( \Vert d_{k_i}\Vert +\Vert d_{k_i+1}\Vert \right) ^2\nonumber \\&\le 21\Vert d_{k_i}\Vert \end{aligned}$$
(2.20)

for sufficiently large \(k_i\). By induction, there exists a positive constant \({{\hat{c}}}\) such that, for \(k=k_i+1,\ldots ,k_i+s_i-1\),

$$\begin{aligned} \Vert d_k \Vert \le {{\hat{c}}}\Vert d_{k_i} \Vert \end{aligned}$$
(2.21)

holds for all sufficiently large \(k_i\). Since \(s_i\le t\) for all i, we have \(d_k\rightarrow 0\) and

$$\begin{aligned} \Vert J_k-G_k\Vert =\Vert J_k-J_{k_i}\Vert \le \kappa _{lj} \sum _{j=k_i}^{k-1} \Vert d_j\Vert \rightarrow 0. \end{aligned}$$
(2.22)

Hence, by (2.11),

$$\begin{aligned} \left\| G_{k}^T F_k\right\| \ge \left\| J_{k}^T F_{k}\right\| - \left\| (J_{k}-G_k)^T F_k\right\| \ge \left\| J_{k}^T F_{k}\right\| -\Vert J_k-G_k\Vert \Vert F_1\Vert \ge \varepsilon /2 \end{aligned}$$
(2.23)

holds for sufficiently large k. Since \(\Vert F_k\Vert \ge \left\| G_{k}^{T}F_{k}\right\| / \Vert G_{k}\Vert \ge \frac{\varepsilon }{2 \kappa _{lf}}\), by the definition of \(d_k\) and \(\lambda _k\), we have \(\lambda _{k}\rightarrow +\,\infty \) and \(\mu _k\rightarrow +\,\infty \).

Therefore, whether S is finite or infinite, we obtain

$$\begin{aligned} d_k\rightarrow 0, \quad \lambda _{k}\rightarrow +\,\infty , \quad \mu _k\rightarrow +\,\infty , \end{aligned}$$
(2.24)

moreover, (2.22) and (2.23) hold true. It then follows from (2.12), (2.13), (2.22), (2.23), Lemma 2.1 and \(\Vert F_k\Vert \le \Vert F_1\Vert \) that

$$\begin{aligned} \left| r_{k}-1\right|&=\left| \frac{Ared_{k}-Pred_{k}}{Pred_{k}}\right| =\left| \frac{ \left\| F_{k+1}\right\| ^{2}-\left\| F_{k}+G_{k}d_{k}\right\| ^{2} }{ \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} }\right| \nonumber \\&\le \frac{ \left| \left\| F_{k}+J_{k}d_{k}\right\| ^{2}-\left\| F_{k}+G_{k}d_{k}\right\| ^{2}\right| + 2\kappa _{lj}\left\| F_{k}+J_{k}d_{k}\right\| \left\| d_{k}\right\| ^{2}+\kappa _{lj}^2\left\| d_{k}\right\| ^{4}}{ \left\| G_{k}^{T}F_{k}\right\| \min \left\{ \Vert d_{k}\Vert ,\dfrac{\left\| G_{k}^{T}F_{k}\right\| }{\left\| G_{k}^{T}G_{k}\right\| }\right\} }\nonumber \\&\le \frac{ \left( \left\| J_{k}^{T}J_{k}\right\| +\left\| G_{k}^{T}G_{k}\right\| \right) \left\| d_{k}\right\| ^{2}+2\left\| J_{k}-G_{k}\right\| \left\| F_{k}\right\| \left\| d_{k}\right\| }{ \frac{\varepsilon }{2}\min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{2\kappa _{lf}^{2}}\right\} }\nonumber \\&\qquad +\,\frac{2\kappa _{lj}\Vert F_1\Vert \Vert d_k\Vert ^2+2\kappa _{lj}\kappa _{lf}\Vert d_k\Vert ^3+\kappa _{lj}^2\Vert d_k\Vert ^4}{\frac{\varepsilon }{2}\min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{2\kappa _{lf}^{2}}\right\} }\nonumber \\&\rightarrow 0. \end{aligned}$$
(2.25)

Thus, \(r_{k}\rightarrow 1\). So, by (1.10)–(1.12), there exists a positive \({\hat{\mu }}\) such that \( \mu _{k}\le {\hat{\mu }} \) holds for all sufficiently large k, which contradicts (2.24). Hence, (2.11) cannot be true. Therefore, we obtain (2.10). \(\square \)

3 Local Convergence

In this section, we first investigate the properties of the trial step, then show that the LM parameter is bounded, and finally prove that Algorithm 1 converges superlinearly under the local error bound condition. We make the following assumptions.

Assumption 3.1

The sequence \(\{x_k\}\) generated by Algorithm 1 satisfies \({\text {dist}}(x_k, X^*)\rightarrow 0\), and there exist \(x^*\in X^*\) and \(0<r<1\) such that \( \Vert x_k-x^*\Vert \le r/2\) for all large k.

Assumption 3.2

F(x) is continuously differentiable, and J(x) is Lipschitz continuous on \(N(x^*, r)=\{ x: \Vert x-x^*\Vert \le r\}\), i.e., there exists a constant \(\kappa _{lj}>0\) such that

$$\begin{aligned} \Vert J(y)-J(x)\Vert \le \kappa _{lj}\Vert y-x\Vert ,\quad \forall x,y\in N(x^*, r). \end{aligned}$$
(3.1)

Assumption 3.3

\(\Vert F(x)\Vert \) provides a local error bound on \(N(x^*, r)\), i.e., there exists a constant \(\kappa _{leb}>0\) such that

$$\begin{aligned} \Vert F(x)\Vert \ge \kappa _{leb} {\text {dist}}(x,X^{*}),\quad \forall x\in N (x^{*},r). \end{aligned}$$
(3.2)

By Assumption 3.2,

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le \kappa _{lj}\Vert y-x\Vert ^{2},\quad \forall x,y\in N(x^{*}, r), \end{aligned}$$
(3.3)

and there exists a constant \(\kappa _{lf}>0\) such that

$$\begin{aligned} \Vert F(y)-F(x)\Vert \le \kappa _{lf}\Vert y-x\Vert ,\quad \forall x,y\in N(x^{*}, r). \end{aligned}$$
(3.4)

Denote by \({\bar{x}}_k\) the vector in \(X^*\) that satisfies

$$\begin{aligned} \Vert {{\bar{x}}}_k-{x}_k\Vert =\text{ dist }(x_k, X^*). \end{aligned}$$
(3.5)

Then, for all large k,

$$\begin{aligned} \Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert x^*-x_k\Vert \le \frac{r}{2} \end{aligned}$$
(3.6)

and

$$\begin{aligned} \Vert {{\bar{x}}}_k-x^*\Vert \le \Vert {\bar{x}}_{k}-x_{k}\Vert +\Vert x^*-x_k\Vert \le r, \end{aligned}$$
(3.7)

so \({{\bar{x}}}_k\in N(x^*,r)\). In the following, we always consider sufficiently large iterations k.

3.1 Properties of the Step \(d_k\)

In this subsection, we show the relationship between the size of \(d_k\) and the distance from \(x_{k_i}\) to the solution set.

Lemma 3.4

Under Assumptions 3.13.3, there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert d_k\Vert \le c\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert , \quad k=k_i, \ldots , k_{i}+s_i-1. \end{aligned}$$
(3.8)

Proof

It follows from (1.12) and (3.2) that

$$\begin{aligned} \lambda _{k_{i}}=\mu _{k_{i}}\Vert F_{k_{i}}\Vert ^{\delta }\ge \mu _{\min } \kappa _{leb} ^{\delta }\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{\delta }. \end{aligned}$$
(3.9)

Since \(d_{k_i}\) is the minimizer of \(\varphi _{k_{i}}(d)\), by (3.6), (3.3), \(0<r<1\) and \(1\le \delta \le 2\), we have

$$\begin{aligned} \Vert d_{k_{i}}\Vert ^{2}&\le {\displaystyle \frac{\varphi _{k_{i}}(d_{k_{i}})}{\lambda _{k_{i}}}}\nonumber \\&\le {\displaystyle \frac{\varphi _{k_{i}}({\bar{x}}_{k_{i}}-x_{k_{i}})}{\lambda _{k_{i}}}}\nonumber \\&= {\displaystyle \frac{\Vert F_{k_{i}}+J_{k_{i}}({\bar{x}}_{k_{i}}-x_{k_{i}})\Vert ^{2}}{\lambda _{k_{i}}}+\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2}}\nonumber \\&\le \dfrac{\kappa _{lj}^{2}}{\mu _{\min } \kappa _{leb} ^{\delta }}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{4-\delta }+\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2}\nonumber \\&\le c_0^2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^2, \end{aligned}$$
(3.10)

where \(c_0=\sqrt{\mu _{\min }^{-1}\kappa _{leb} ^{-\delta }\kappa _{lj}^{2}+1}\), so \(\Vert d_{k_{i}}\Vert \le c_0\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \).

For \(k=k_i+1, \ldots , k_i+s_i-1\), by (3.3),

$$\begin{aligned} \Vert d_k\Vert&= \left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}F_{k}\right\| \nonumber \\&\le \left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}F_{k_{i}}\right\| +\left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}J_{k_{i}}\left( \sum _{j=k_i}^{k-1}d_j\right) \right\| \nonumber \\&\qquad +\,\kappa _{lj}\left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\| \left\| \sum _{j=k_i}^{k-1}d_j\right\| ^{2}\nonumber \\&\le \Vert d_{k_{i}}\Vert +\sum _{j=k_i}^{k-1}\Vert d_j\Vert +\kappa _{lj}\left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\| \left( \sum _{j=k_i}^{k-1}\Vert d_j\Vert \right) ^{2}. \end{aligned}$$
(3.11)

By (3.9),

$$\begin{aligned} \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\|&=\left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}J_{k_{i}}\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&\le \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_i}I\right) \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&= \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&\le \frac{1}{\sqrt{\lambda _{k_i}}} \le \frac{1}{\mu _{\min }^{\frac{1}{2}}\kappa _{leb} ^{\frac{\delta }{2}}}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{-\frac{\delta }{2}}. \end{aligned}$$
(3.12)

So, it follows from (3.6), (3.10) and \(1\le \delta \le 2\) that

$$\begin{aligned} \Vert d_{k_i+1}\Vert&\le \Vert d_{k_{i}}\Vert +\Vert d_{k_{i}}\Vert + \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0\Vert d_{k_{i}}\Vert \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\frac{\delta }{2}}\nonumber \\&\le 2 \Vert d_{k_{i}}\Vert + {{\bar{c}}} \Vert d_{k_{i}}\Vert \nonumber \\&= c_1 \Vert d_{k_{i}}\Vert , \end{aligned}$$
(3.13)

where \({{\bar{c}}}= \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0\) and \(c_1=2+{{\bar{c}}}\) are positive constants. Similarly,

$$\begin{aligned} \Vert d_{k_i+2}\Vert&\le \Vert d_{k_{i}}\Vert +\Vert d_{k_{i}}\Vert + \Vert d_{k_i+1}\Vert + \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0(1+c_1)^2\Vert d_{k_{i}}\Vert \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\frac{\delta }{2}}\nonumber \\&\le \left( 2+c_1+{{\bar{c}}}(1+c_1)^2\right) \Vert d_{k_{i}}\Vert \nonumber \\&= c_2 \Vert d_{k_{i}}\Vert , \end{aligned}$$
(3.14)

where \(c_2=2+c_1+{{\bar{c}}}(1+c_1)^2\). Let

$$\begin{aligned} c_j= 2+c_1+\cdots + c_{j-1}+{{\bar{c}}}(1+c_1+\cdots +c_{j-1})^2, \quad j=3, \ldots , t-1. \end{aligned}$$
(3.15)

Then, by induction, we have

$$\begin{aligned} \Vert d_{k_i+q}\Vert \le&c_q \Vert d_{k_{i}}\Vert , \quad q=3, \ldots , s_i-1. \end{aligned}$$
(3.16)

Since \(s_i\le t\) for all i, \(c_{t-1}>c_{t-2}>\cdots>c_2>c_1>1\), by (3.10), we obtain (3.8), where \(c=c_0c_{t-1}\). \(\square \)

3.2 Boundedness of the LM Parameter

The updating rule of \(\mu _k\) indicates that \(\mu _k\) is bounded below. In the following, we show that \(\mu _k\) is bounded above.

Lemma 3.5

Under Assumptions 3.13.3, there exists a constant \(\mu _{\max }>0\) such that

$$\begin{aligned} \mu _{k}\le \mu _{\max } \end{aligned}$$
(3.17)

holds for all sufficiently large k.

Proof

We first prove that

$$\begin{aligned}&\Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_kd_k\Vert ^{2}\ge \frac{\kappa _{leb}}{2}\Vert F_{k}\Vert \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} \end{aligned}$$
(3.18)

holds for sufficiently large k.

It follows from (3.1), (3.3), Lemma 3.4 and \(s_i\le t\) that, for \(k=k_i, k_i+1,\ldots , k_i+s_i-1\),

$$\begin{aligned} \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert&\le \Vert F_{k}+J_k({\bar{x}}_{k}-x_{k})\Vert +\Vert G_k-J_k\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}\left( \sum _{j=k_i}^{k-1}\Vert d_j\Vert \right) \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}ct\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert . \end{aligned}$$
(3.19)

Since \(\Vert {\bar{x}}_{k}-x_{k}\Vert \rightarrow 0\) and \(\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \rightarrow 0\), we obtain that

$$\begin{aligned} \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \le \frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \end{aligned}$$
(3.20)

holds for sufficiently large k.

We consider two cases.

Case 1: \(\Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert d_k\Vert \). Since \(d_k\) is a solution of the trust region problem (2.6), by (3.2) and (3.20),

$$\begin{aligned} \Vert F_{k}\Vert -\Vert F_{k}+G_kd_k\Vert&\ge \Vert F_{k}\Vert - \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \nonumber \\&\ge \kappa _{leb}\Vert {\bar{x}}_{k}-x_{k}\Vert -\frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \ge \frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \end{aligned}$$
(3.21)

holds for sufficiently large k.

Case 2: \(\Vert {\bar{x}}_{k}-x_{k}\Vert >\Vert d_k\Vert \). Similarly, by (3.2) and (3.20),

$$\begin{aligned} \Vert F_{k}\Vert -\Vert F_{k}+G_kd_k\Vert&\ge \Vert F_{k}\Vert -\left\| F_{k}+\frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }G_k({\bar{x}}_{k}-x_{k})\right\| \nonumber \\&\ge \frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }\left( \Vert F_{k}\Vert -\Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \right) \nonumber \\&\ge \frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }\frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\ge \frac{\kappa _{leb}}{2}\Vert d_k\Vert . \end{aligned}$$
(3.22)

Combining (3.21) with (3.22), we get

$$\begin{aligned} \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_kd_k\Vert ^{2}&\ge \frac{\kappa _{leb}}{2}\left( \Vert F_{k}\Vert +\Vert F_{k}+G_kd_k\Vert \right) \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} \nonumber \\&\ge \frac{\kappa _{leb}}{2}\Vert F_{k}\Vert \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} , \end{aligned}$$
(3.23)

which yields (3.18).

It then follows from (3.2), (3.3), Lemma 3.4 and \(\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert \le \Vert F_{k_i}\Vert \) that

$$\begin{aligned} |r_{k_i}-1|&= \left| \dfrac{Ared_{k_i}-Pred_{k_i}}{Pred_{k_i}}\right| =\left| \dfrac{\Vert F_{k_i+1}\Vert ^{2}-\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert ^{2}}{\Vert F_{k_i}\Vert ^{2}-\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert ^{2}}\right| \nonumber \\&\le \left| \dfrac{2\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert \Vert d_{k_i}\Vert ^2+\kappa _{lj}^2\Vert d_{k_i}\Vert ^{4}}{\frac{\kappa _{leb}}{2}\Vert F_{k_i}\Vert \min \{\Vert d_{k_i}\Vert ,\Vert {\bar{x}}_{k_i}-x_{k_i}\Vert \}}\right| \nonumber \\&\rightarrow 0. \end{aligned}$$
(3.24)

This implies that \(r_{k_i}\rightarrow 1\). Note that if \(k+1\notin {{\bar{S}}}\), then \(G_{k+1}=G_k\) and \(r_k\ge p_1>p_2\), so \(\mu _{k+1}\le \mu _{k}\). By (1.10)–(1.12), there exists a constant \(\mu _{\max }>0\) such that (3.17) holds true. \(\square \)

3.3 Convergence Rate of Algorithm 1

Due to the results given by Behling and Iusem in [1], we assume that \({\text {rank}}(J({\bar{x}}))=r\) for all \({\bar{x}}\in N(x^{*},b )\cap X^{*}\). Suppose the SVD of \(J({\bar{x}}_{k_{i}})\) is

$$\begin{aligned} J({\bar{x}}_{k_{i}})&= {\bar{U}}_{k_{i}}{\bar{\Sigma }}_{k_{i}}{\bar{V}}_{k_{i}}^{T}\nonumber \\&= \left( {\bar{U}}_{k_{i},1},{\bar{U}}_{k_{i},2}\right) \left( \begin{array}{cc} {\bar{\Sigma }}_{k_{i},1}\\ &{}\quad 0 \end{array}\right) \left( \begin{array}{c} {\bar{V}}_{k_{i},1}^{T}\\ {\bar{V}}_{k_{i},2}^{T} \end{array}\right) \nonumber \\&= {\bar{U}}_{k_{i},1}{\bar{\Sigma }}_{k_{i},1}{\bar{V}}_{k_{i},1}^{T}, \end{aligned}$$
(3.25)

where \({\bar{\Sigma }}_{k_{i},1}={\text {diag}}({\bar{\sigma }}_{k_{i},1},{\bar{\sigma }}_{k_{i},2},\ldots ,{\bar{\sigma }}_{k_{i},r})\) with \({\bar{\sigma }}_{k_{i},1}\ge \cdots \ge {\bar{\sigma }}_{k_{i},r}>0\). Suppose the SVD of \(J(x_{k_{i}})\) is

$$\begin{aligned} J_{k_{i}}&= U_{k_{i}}\Sigma _{k_{i}}V_{k_{i}}^{T}\nonumber \\&= \left( U_{k_{i},1},U_{k_{i},2}\right) \left( \begin{array}{cc} \Sigma _{k_{i},1}\\ &{} \Sigma _{k_{i},2}\\ \end{array}\right) \left( \begin{array}{c} V_{k_{i},1}^{T}\\ V_{k_{i},2}^{T} \end{array}\right) \nonumber \\&= U_{k_{i},1}\Sigma _{k_{i},1}V_{k_{i},1}^{T}+U_{k_{i},2}\Sigma _{k_{i},2}V_{k_{i},2}^{T}, \end{aligned}$$
(3.26)

where \(\Sigma _{k_{i},1}=\text{ diag }(\sigma _{k_{i},1},\sigma _{k_{i},2},\ldots ,\sigma _{k_{i},r})\) with \(\sigma _{k_{i},1}\ge \cdots \ge \sigma _{k_{i},r}>0\) and \(\Sigma _{k_{i},2}=\text{ diag }(\sigma _{k_{i},r+1},\sigma _{k_{i},r+2},\ldots ,\sigma _{k_{i},n})\) with \(\sigma _{k_{i},r+1}\ge \cdots \ge \sigma _{k_{i},n}\ge 0\).

By (3.1) and the theory of matrix perturbation [12], we obtain

$$\begin{aligned} \left\| \text{ diag }\left( \Sigma _{k_i, 1}-{\bar{\Sigma }}_{k_i,1},\Sigma _{k_i, 2}\right) \right\| \le \Vert J_{k_{i}}-J({{\bar{x}}}_{k_{i}})\Vert \le \kappa _{lj}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert . \end{aligned}$$
(3.27)
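
For later use, note that by (2.3) and \(\lambda _k=\lambda _{k_i}\), substituting the SVD (3.26) into (2.4) splits the trial step and the linearized residual into two orthogonal parts,

$$\begin{aligned} d_{k}&=-\,V_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 1}U_{k_i, 1}^{T}F_{k}-V_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 2}U_{k_i, 2}^{T}F_{k},\\ F_{k}+G_{k}d_{k}&=\lambda _{k_{i}}U_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 1}^{T}F_{k}+\lambda _{k_{i}}U_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 2}^{T}F_{k}, \end{aligned}$$

so \(d_k\) and \(F_k+G_kd_k\) can be bounded by estimating \(\Vert U_{k_i, 1}^{T}F_{k}\Vert \) and \(\Vert U_{k_i, 2}^{T}F_{k}\Vert \) separately. This is the strategy of Lemma 3.6 below.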

Lemma 3.6

Under Assumptions 3.13.3, there exist positive constants \(l_1\) and \(l_2\) such that

$$\begin{aligned} \Vert d_{k}\Vert&\le l_1\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \quad k=k_i,\ldots , k_{i}+s_i-1, \end{aligned}$$
(3.28)
$$\begin{aligned} \Vert F_{k}+G_{k}d_{k}\Vert&\le l_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \quad k=k_i,\ldots , k_{i}+s_i-1. \end{aligned}$$
(3.29)

Proof

We prove by induction. It was shown in [2] that the results hold true for \(k=k_i\) and \(k=k_i+1\).

Suppose the results hold true for \(k-1\) \((k_i+2\le k \le k_i+s_i-1)\), that is, there exist constants \({{\bar{l}}}_1\) and \({{\bar{l}}}_2\) such that

$$\begin{aligned} \Vert d_{k-1}\Vert&\le {{\bar{l}}}_1\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i}, \end{aligned}$$
(3.30)
$$\begin{aligned} \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert&\le \bar{l}_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.31)

It then follows from (3.1), (3.3) and Lemma 3.4 that

$$\begin{aligned} \Vert F_{k}\Vert&=\Vert F(x_{k-1}+d_{k-1})\Vert \nonumber \\&\le \Vert F_{k-1}+J_{k-1}d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert +\Vert \left( J_{k-1}-G_{k-1}\right) d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert +\left\| J_{k-1}-J_{k_{i}}\right\| \Vert d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le {{\bar{l}}}_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1} +\kappa _{lj}\sum _{j=k_i}^{k-2}\Vert d_{j}\Vert \Vert d_{k-1}\Vert +\kappa _{lj}{{\bar{l}}}_1^2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2(k-k_i)}\nonumber \\&\le {{\bar{l}}}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \end{aligned}$$
(3.32)

where \({{\bar{l}}}_3={{\bar{l}}}_2+\kappa _{lj}ct{{\bar{l}}}_1+\kappa _{lj}\bar{l}_1^2\). Hence,

$$\begin{aligned} \left\| U_{k_i, 1}U_{k_i, 1}^{T}F_{k}\right\| \le \Vert F_{k}\Vert \le \bar{l}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.33)

Moreover, by (3.2),

$$\begin{aligned} \Vert {\bar{x}}_{k}-x_{k}\Vert \le \kappa _{leb}^{-1}\Vert F_{k}\Vert \le \kappa _{leb}^{-1} {{\bar{l}}}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.34)

Let \({\tilde{J}}_{k_{i}}=U_{k_i, 1}\Sigma _{k_i, 1}V_{k_i, 1}^{T}\) and \({\tilde{u}}_{k}=-{\tilde{J}}_{k_i}^{+}F_{k}\). Then, \({\tilde{u}}_{k}\) is the least squares solution of the problem \({\displaystyle \min _{u\in R^{n}}\Vert F_{k}+{\tilde{J}}_{k_{i}}u\Vert }\). By (3.1), (3.3) and (3.27),

$$\begin{aligned} \left\| U_{k_i, 2}U_{k_i, 2}^{T}F_{k}\right\|&= \Vert F_{k}+{\tilde{J}}_{k_{i}}{\tilde{u}}_{k}\Vert \le \Vert F_{k}+{\tilde{J}}_{k_{i}}({\bar{x}}_{k}-x_{k})\Vert \nonumber \\&\le \Vert F_{k}+J_{k}({\bar{x}}_{k}-x_{k})\Vert +\Vert J_{k_{i}}-J_{k}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert + \Vert {\tilde{J}}_{k_{i}}-J_{k_{i}}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}\sum _{j=k_i}^{k-1}\Vert d_{j}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert +\left\| U_{k_i, 2}\Sigma _{k_i, 2}V_{k_i, 2}^{T}\right\| \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le {{\bar{l}}}_4\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \end{aligned}$$
(3.35)

where \({{\bar{l}}}_4=\kappa _{lj}\kappa _{leb}^{-1} \bar{l}_3(\kappa _{leb}^{-1} {{\bar{l}}}_3+ct+1)\).

Note that \(\{x_k\}\) converges to \(X^{*}\). By (3.27), we know

$$\begin{aligned} \left\| \Sigma _{k_i, 1}^{-1}\right\| =\frac{1}{\sigma _{k_i, r}}\le \left| \frac{1}{{\bar{\sigma }}_{k_i, r}-\kappa _{lj}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert }\right| \le \frac{2}{{\bar{\sigma }}_{k_i, r}} \end{aligned}$$
(3.36)

holds for sufficiently large \(k_i\). By (3.9),

$$\begin{aligned} \left\| \lambda _{k_{i}}^{-1}\Sigma _{k_i, 2}\right\| =\frac{\Vert \Sigma _{k_i, 2}\Vert }{\lambda _{k_i}} \le \frac{\kappa _{lj}}{\mu _{\min } \kappa _{leb} ^{\delta }}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\delta }. \end{aligned}$$
(3.37)

It then follows from (3.33), (3.35)–(3.37), Lemma 3.5 and \(1\le \delta \le 2\) that

$$\begin{aligned} \Vert d_{k}\Vert&= \left\| -V_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 1}U_{k_i, 1}^{T}F_{k}-V_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 2}U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\le \left\| \Sigma _{k_i, 1}^{-1}\right\| \left\| U_{k_i, 1}^{T}F_{k}\right\| +\left\| \lambda _{k_{i}}^{-1}\Sigma _{k_i, 2}\right\| \left\| U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\le l_1 \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \end{aligned}$$
(3.38)

where \(l_1=2{\bar{\sigma }}_{k_i, r}^{-1} {{\bar{l}}}_3+ \kappa _{lj}\mu _{\min }^{-1} \kappa _{leb} ^{-\delta } {{\bar{l}}}_4\), and

$$\begin{aligned}&\Vert F_{k}+G_{k}d_{k}\Vert \nonumber \\&\qquad = \left\| \lambda _{k_{i}}U_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 1}^{T}F_{k}+\lambda _{k_{i}}U_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 2}^{T}F_{k} \right\| \nonumber \\&\qquad \le \mu _{k_{i}}\Vert F_{k_{i}}\Vert ^{\delta }\left\| \Sigma _{k_i, 1}^{-2}\right\| \left\| U_{k_i, 1}^{T}F_{k}\right\| +\left\| U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\qquad \le l_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \end{aligned}$$
(3.39)

where \(l_2= 4\mu _{\max }\kappa _{lf}^{\delta }{\bar{\sigma }}_{k_i, r}^{-2}{{\bar{l}}}_3+{{\bar{l}}}_4\). The proof is completed. \(\square \)

Based on Lemma 3.6, we have the following main result.

Theorem 3.7

Under Assumptions 3.13.3, there exists a constant \(l_3>0\) such that

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert \le l_3\Vert d_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.40)

Consequently, Algorithm 1 converges q-superlinearly to some solution of (1.1).

Proof

It follows from (3.2), (3.3), Lemma 3.4 and Lemma 3.6 that

$$\begin{aligned} \kappa _{leb}\Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert&\le \Vert F(x_{k_{i+1}})\Vert = \Vert F(x_{k_i+s_i-1}+d_{k_i+s_i-1})\Vert \nonumber \\&\le \Vert F_{k_i+s_i-1}+J_{k_i+s_i-1}d_{k_i+s_i-1}\Vert + \kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^2\nonumber \\&\le \Vert F_{k_i+s_i-1}+G_{k_i+s_i-1}d_{k_i+s_i-1}\Vert +\Vert (J_{k_i+s_i-1}-G_{k_i+s_i-1})d_{k_i+s_i-1}\Vert \nonumber \\&\quad +\kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k_i+s_i-1}+G_{k_i+s_i-1}d_{k_i+s_i-1}\Vert + \kappa _{lj}\sum _{j=k_i}^{k_i+s_i-2}\Vert d_{j}\Vert \Vert d_{k_i+s_i-1}\Vert \nonumber \\&\quad +\kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^{2}\nonumber \\&\le \left( l_2+\kappa _{lj}tcl_1+\kappa _{lj}l_1^2\right) \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.41)

Since

$$\begin{aligned} \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \le \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i}}\Vert \le \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert +\sum _{j=k_i}^{k_{i+1}-1}\Vert d_j\Vert , \end{aligned}$$
(3.42)

by (3.16) and (3.41),

$$\begin{aligned} \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \le 2 \sum _{j=k_i}^{k_{i+1}-1}\Vert d_j\Vert \le 2s_ic_{s_i-1}\Vert d_{k_i}\Vert \end{aligned}$$
(3.43)

holds for all sufficiently large \(i\).

Let \({{\bar{l}}}_5=\kappa _{leb}^{-1}(l_2+\kappa _{lj}tcl_1+\kappa _{lj}l_1^2)\). It then follows from (3.41), Lemmas 3.4 and 3.6 that

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert&\le c \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert \le c{{\bar{l}}}_5 \Vert {{\bar{x}}}_{k_i}-x_{k_i}\Vert ^{s_i+1}\nonumber \\&\le c{{\bar{l}}}_5(2s_ic_{s_i-1})^{s_i+1} \Vert d_{k_{i}}\Vert ^{s_i+1}\nonumber \\&\le c{{\bar{l}}}_5(2tc_{t-1})^{t+1}\Vert d_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.44)

Letting \(l_3=c{{\bar{l}}}_5(2tc_{t-1})^{t+1}\), we obtain (3.40).

By Lemma 3.4, we have \(\Vert d_{k_i}\Vert \rightarrow 0\). Hence, there exist N and \(0<q<1\) such that \(\max \{\Vert d_{k_i}\Vert , l_3\Vert d_{k_i}\Vert \}\le q<1\) for all \(i\ge N\). Since \( s_i\ge 1\), we have

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert \le l_3 \Vert d_{k_{i}}\Vert ^{s_i+1} \le l_3 \Vert d_{k_{i}}\Vert ^2\le q\Vert d_{k_i}\Vert , \qquad \forall i\ge N. \end{aligned}$$
(3.45)

Then,

$$\begin{aligned} \sum _{i=N}^\infty \Vert d_{k_i}\Vert&\le \Vert d_{k_N}\Vert + q \Vert d_{k_N}\Vert + q^2\Vert d_{k_N}\Vert +\cdots \nonumber \\&= \frac{\Vert d_{k_N}\Vert }{1-q}. \end{aligned}$$
(3.46)

This implies that \(\sum _{i=N}^\infty \Vert d_{k_i}\Vert \) converges, so does \(\sum _{i=1}^\infty \Vert d_{k_i}\Vert \).

By (3.15) and \(s_i\le t\), we have

$$\begin{aligned} \Vert x_{k_{i+1}}-x_{k_i}\Vert =\left\| \sum _{j=0}^{s_i-1} d_{k_i+j}\right\| \le t c_{t-1} \Vert d_{k_i}\Vert . \end{aligned}$$
(3.47)

So, \(\sum _{i=1}^\infty \Vert x_{k_{i+1}}-x_{k_i}\Vert \) converges, which implies that \(\sum _{i=1}^\infty (x_{k_{i+1}}-x_{k_i})\) converges. Thus, \(\{x_{k_i}\}\) converges to some \({{\hat{x}}}\in X^*\). By (3.40), we have

$$\begin{aligned} \Vert x_{k_{i+1}}-{{\hat{x}}}\Vert \le {{\tilde{q}}} \Vert x_{k_{i}}-{{\hat{x}}}\Vert ^{s_i+1} \end{aligned}$$
(3.48)

for some \({{\tilde{q}}}>0\). Therefore, Algorithm 1 converges q-superlinearly. \(\square \)

4 Numerical Results

We tested Algorithm 1 on some singular problems and compared it with the LM algorithm, the modified LM algorithm [2] and the multi-step LM algorithm with \(m=3\) [3]. The experiments were run on a laptop with an Intel Core i7-7500U CPU and 8 GB of RAM, using Matlab R2015b.

The test problems are created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in [8]. They have the same form as in [11]

$$\begin{aligned} {\hat{F}}(x)=F(x)-J(x^{*})A\left( A^{T}A\right) ^{-1}A^{T}(x-x^{*}), \end{aligned}$$
(4.1)

where F(x) is the standard nonsingular test function, \(x^{*}\) is its root, and \(A\in R^{n\times k}\) has full column rank with \(1\le k\le n\). Obviously, \({\hat{F}}(x^{*})=0\) and

$$\begin{aligned} {\hat{J}}(x^{*})=J(x^{*})\left( I-A\left( A^{T}A\right) ^{-1}A^{T}\right) \end{aligned}$$

has rank \(n-k\).
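
For concreteness, a minimal sketch of this construction is given below (assuming callables standard_F and standard_J for the chosen test function from [8] and its known root x_star; all names are illustrative).

```python
import numpy as np

def make_singular_problem(standard_F, standard_J, x_star, A):
    """Build the singular test problem (4.1) from a nonsingular one."""
    P = A @ np.linalg.solve(A.T @ A, A.T)       # projection onto range(A)
    JP = standard_J(x_star) @ P                 # J(x*) A (A^T A)^{-1} A^T
    F_hat = lambda x: standard_F(x) - JP @ (x - x_star)
    J_hat = lambda x: standard_J(x) - JP        # rank of J_hat(x*) is n - rank(A)
    return F_hat, J_hat

# Rank n-1 at x*: A = np.ones((n, 1)); rank 1 or 10: random A with n-1 or n-10 columns.
```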

We set \(p_{0}=0.0001,p_{1}=0.50,p_{2}=0.25,p_{3}=0.75\), \(c_1=4, c_2=0.25\), \(\mu _{1}=10^{-5}\), \(\mu _{\min }=10^{-8}\) and \(t=10\) for the tests. Algorithm 1 stops when \(\Vert J_{k_i}^{T}F_{k_i}\Vert \le 10^{-5}\) or the iteration number exceeds \(100\times (n+1)\).

Table 1 Results on the singular problems with rank \(n-1\) in small scale
Table 2 Results on the singular problems with rank \(n-1\) in large scale
Table 3 Results on the singular problems with rank 1 in small scale

We chose the rank of \({\hat{J}}(x^{*})\) to be \(n-1\) by using

$$\begin{aligned} A\in R^{n\times 1},\qquad A^{T}=(1,1,\ldots ,1). \end{aligned}$$

The results are given in Tables 1 and 2. In the tables, \(x_{0}\), \(10x_{0}\) and \(100x_{0}\) in the third column are the starting points, where \(x_{0}\) is the one suggested in [8]. “NF” and “NJ” denote the numbers of function evaluations and Jacobian evaluations, respectively. “—” indicates that the algorithm fails to find a solution within \(100\times (n+1)\) iterations. Note that one Jacobian evaluation generally costs about n times as much as one function evaluation, so we also report the values “\(NF+n* NJ\)” to compare the total evaluation cost; if the Jacobian is sparse, however, this measure is less meaningful. For the large scale problems, we report the running time instead of the total evaluations.

We also chose the rank of \({\hat{J}}(x^{*})\) to be 1 and 10, where \(A\in R^{n\times (n-1)}\) and \(A\in R^{n\times (n-10)}\) are generated randomly. The results are given in Tables 3 and 4, respectively.

Table 4 Results on the singular problems with rank 10 in large scale

From the tables, we can see that the adaptive LM algorithm usually requires the fewest Jacobian evaluations as well as the fewest total evaluations to find a solution of small scale nonlinear equations, and it usually takes the least time to find a solution of large scale nonlinear equations, regardless of the rank of the problem. The reason is that it makes full use of the available evaluated Jacobians and matrix factorizations to compute approximate LM steps. Since computing approximate LM steps takes much less effort, the overall cost of the adaptive LM algorithm is usually much lower.

5 Conclusion

We proposed an adaptive multi-step Levenberg–Marquardt algorithm for nonlinear equations. It uses the latest evaluated Jacobian to calculate an approximate LM step if the ratio of the actual reduction to the predicted reduction of the merit function at the iterate is sufficiently good. Under the local error bound condition, which is weaker than the nonsingularity condition, the adaptive LM algorithm converges superlinearly to some solution of (1.1). Compared with the LM algorithm, the adaptive multi-step LM algorithm significantly reduces the number of Jacobian evaluations, the total number of evaluations, and the running time.