1 Introduction

We consider the system of nonlinear equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1.1)

where \(F:R^{n}\rightarrow R^{m}\) is a continuously differentiable function. Nonlinear equations have wide applications in technology, mechanics, economics and so on. For example, physical models that are expressed as nonlinear partial differential equations become systems of nonlinear equations when discretized [5].

It is natural to transform the nonlinear equations (1.1) to the nonlinear least squares problem

$$\begin{aligned} \min _{x\in R^n} \Vert F(x)\Vert ^2, \end{aligned}$$
(1.2)

where \(\Vert \cdot \Vert \) refers to the 2-norm. Obviously, (1.1) has a solution if and only if the minimal value of (1.2) is zero. The Gauss–Newton method is the best-known method for (1.2). At the k-th iteration, it computes the Gauss–Newton step

$$\begin{aligned} d_k^{GN}=-\,(J_k^TJ_k)^{-1}J_k^TF_k, \end{aligned}$$
(1.3)

where \(F_{k}=F(x_{k})\) and \(J_{k}=F'(x_{k})\) is the Jacobian at \(x_k\). However, when the Jacobian is not of full rank, the Gauss–Newton step is not well-defined. To overcome this difficulty, the Levenberg–Marquardt (LM) method introduces a positive parameter \(\lambda _{k}>0\) and computes the LM step

$$\begin{aligned} d_{k}=-\,(J_{k}^{T}J_{k}+\lambda _{k}I)^{-1}J_{k}^{T}F_{k}. \end{aligned}$$
(1.4)
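
As a minimal numerical sketch (assuming user-supplied callables F and J that return the residual vector and the Jacobian matrix), one LM step can be computed as follows; with \(\lambda =0\) it reduces to the Gauss–Newton step (1.3), which requires \(J_k^TJ_k\) to be nonsingular.

```python
import numpy as np

def lm_step(F, J, x, lam):
    """One Levenberg-Marquardt step (1.4): solve (J^T J + lam*I) d = -J^T F."""
    Fx, Jx = F(x), J(x)
    A = Jx.T @ Jx + lam * np.eye(x.size)
    return np.linalg.solve(A, -Jx.T @ Fx)
```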

Both the Gauss–Newton method and the Levenberg–Marquardt method have quadratic convergence for (1.1) under the assumptions that the Jacobian J(x) is Lipschitz continuous, \(J(x)^TJ(x)\) is nonsingular at the solution and the LM parameter is chosen suitably. Yamashita and Fukushima [13] proved that if the LM parameter is chosen as \(\lambda _{k}=\Vert F_{k}\Vert ^{2}\), then the LM method converges quadratically for (1.1) under the local error bound condition, which is weaker than the nonsingularity condition. Fan and Yuan [4] took \(\lambda _{k}=\Vert F_{k}\Vert ^{\delta }\) and showed that the LM method preserves the quadratic convergence for any \(\delta \in [1,2]\) under the local error bound condition.

The LM method evaluates the Jacobian at every iteration. When F(x) is complicated or n is large, the cost of Jacobian evaluations may be high. To save Jacobian evaluations as well as linear algebra work, Fan [2] proposed a modified LM method. At every iteration, it uses the evaluated Jacobian to compute not only an LM step but also an approximate LM step. More generally, a multi-step LM method was given in [3]. At the k-th iteration, it computes one LM step and \(p-1\) approximate LM steps as follows:

$$\begin{aligned} d_{k,i}&=-\,(J_{k}^{T}J_{k}+\lambda _{k}I)^{-1}J_{k}^{T}F(x_{k,i}),\quad i=0,\ldots ,p-1, \end{aligned}$$
(1.5)

where \(p\ge 1\) is a given integer and \(x_{k,i}=x_{k,i-1}+d_{k,i-1}\) with \(x_{k,0}=x_{k}\). The trial step is then set as

$$\begin{aligned} d_k=\sum _{i=0}^{p-1}d_{k,i}. \end{aligned}$$
(1.6)

That is, the multi-step LM method uses \(J_k\) as an approximation of \(J(x_{k,i})\) to compute the approximate LM step \(d_{k,i}\). Hence, the Jacobian evaluation and the matrix factorization are done only once for every p computations of the step. If the full p steps computed between two consecutive Jacobian evaluations are counted as one complete iteration, the multi-step LM method converges with Q-order \(p+1\) under the local error bound condition.
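
For illustration, the following is a minimal sketch of one outer iteration of the multi-step LM method (1.5)–(1.6); it factorizes \(J_k^TJ_k+\lambda _kI\) once (here via a Cholesky factorization, assuming SciPy is available) and reuses the factorization for all p inner steps.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def multistep_lm_trial_step(F, J, x_k, lam_k, p):
    """One outer iteration of the multi-step LM method (1.5)-(1.6)."""
    Jk = J(x_k)
    # Factorize J_k^T J_k + lam_k*I once and reuse it for the p inner solves.
    factor = cho_factor(Jk.T @ Jk + lam_k * np.eye(x_k.size))
    x, d_total = x_k.copy(), np.zeros(x_k.size)
    for _ in range(p):                           # i = 0, ..., p-1
        d = cho_solve(factor, -Jk.T @ F(x))      # approximate LM step (1.5)
        x, d_total = x + d, d_total + d
    return d_total                               # trial step d_k in (1.6)
```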

It seems that the larger p is, the more Jacobian evaluations can be saved. However, a larger p also implies a possibly worse approximation of the Jacobian at the iterate. Moreover, different problems may prefer different p, and the best p for a problem is usually not known. These observations motivate us to develop an adaptive scheme to automatically decide whether an iteration should evaluate the Jacobian at the current iterate to compute an LM step or use the latest evaluated Jacobian to compute an approximate LM step. That is, the number of approximate LM steps may vary at different iterations, but it is at most \(t-1\) to avoid a poor approximation of the Jacobian, where \(t\ge 1\) is a given integer.

Counting every LM step and every approximate LM step as a single iteration, in this paper, we compute the trial step \(d_k\) by solving

$$\begin{aligned} \left( G_{k}^{T}G_{k}+\lambda _{k}I\right) d=-\,G_{k}^{T}F_{k}, \end{aligned}$$
(1.7)

where \(G_k\) is either the Jacobian \(J_k\) or the Jacobian used at the last iteration. If \(d_k\) is satisfactory and fewer than \(t-1\) approximate LM steps have been computed with \(G_k\), we regard \(G_k\) as a good approximation of the Jacobian at the current iterate, and use it to compute another approximate LM step at the next iteration.

Define the ratio of the actual reduction to the predicted reduction of the merit function \(\Vert F(x)\Vert ^2\) at the k-th iteration as

$$\begin{aligned} r_k=\frac{Ared_k}{Pred_k}=\frac{ \Vert F_{k}\Vert ^{2}-\Vert F(x_{k}+d_{k})\Vert ^{2}}{ \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} }. \end{aligned}$$
(1.8)

The ratio \(r_k\) exploits the most relevant information on the step’s quality at the iterate. It plays an important role in deciding whether \(d_k\) is acceptable. If \(r_k\) is not too small (i.e., the iteration is successful), we accept \(d_k\). Usually, we set

$$\begin{aligned} x_{k+1} = {\left\{ \begin{array}{ll} x_{k}+d_{k}, &{} \quad \hbox {if}\quad r_{k}\ge p_{0},\\ x_{k}, &{} \quad \hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(1.9)

where \(p_0\) is a small positive constant. Suppose \(s-1\) approximate LM steps that used the latest evaluated Jacobian have been computed. If the iteration is very successful and s is less than t, we regard the latest evaluated Jacobian as a good approximation at the iterate and keep it, otherwise, we evaluate the exact one. That is, we set

$$\begin{aligned} G_{k+1} = {\left\{ \begin{array}{ll} G_k, &{} \quad \text {if } \quad r_k\ge p_1 \text { and }s<t,\\ J_{k+1}, &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(1.10)

where \(0<p_0<p_1<1\). Meanwhile, we update the LM parameter as follows:

$$\begin{aligned} \lambda _{k+1} = {\left\{ \begin{array}{ll} \lambda _k, &{} \quad \text {if} \quad r_k\ge p_1\text { and }s<t,\\ \mu _{k+1}\Vert F_{k+1}\Vert ^\delta , &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(1.11)

where

$$\begin{aligned} \mu _{k+1} = {\left\{ \begin{array}{ll} c_{1}\mu _{k}, &{} \quad \hbox {if}\quad r_{k}<p_{2},\\ \mu _{k}, &{} \quad \hbox {if}\quad p_{2}\le r_{k} \le p_3,\\ \max \{c_{2}\mu _{k},\mu _{\min }\}, &{} \quad \hbox {if}\quad r_{k}> p_3. \end{array}\right. } \end{aligned}$$
(1.12)

Here \(0<c_2<1<c_1\), \(0<p_0<p_2<p_1<p_3<1\), \(1\le \delta \le 2\) and \(\mu _{\min }>0\) are positive constants. Based on (1.7)–(1.12), we propose an adaptive multi-step LM method for nonlinear equations (1.1) and show that it converges superlinearly under the local error bound condition.

The paper is organized as follows. In Sect. 2, we propose an adaptive multi-step LM algorithm for (1.1). It is shown that the algorithm converges globally under certain assumptions. In Sect. 3, we discuss the convergence rate of the algorithm under the local error bound condition. Some numerical results are given in Sect. 4. Finally, we conclude the paper in Sect. 5.

2 An Adaptive Multi-step LM Algorithm and Global Convergence

In this section, we first propose the adaptive multi-step LM algorithm, then show it converges globally under certain assumptions.

The adaptive multi-step LM algorithm is presented as follows.

[Algorithm 1: the adaptive multi-step LM algorithm]
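
Since the listing is rendered as an image in the published version, the following is a minimal Python sketch of Algorithm 1 assembled from (1.7)–(1.12); the parameter values follow Sect. 4, while the function names and the simplified stopping test (on \(\Vert G_k^TF_k\Vert \) rather than \(\Vert J_{k_i}^TF_{k_i}\Vert \)) are illustrative only.

```python
import numpy as np

def adaptive_multistep_lm(F, J, x, mu=1e-5, delta=1.0, t=10,
                          p0=1e-4, p1=0.5, p2=0.25, p3=0.75,
                          c1=4.0, c2=0.25, mu_min=1e-8,
                          tol=1e-5, max_iter=1000):
    """Sketch of the adaptive multi-step LM method, following (1.7)-(1.12)."""
    n = x.size
    G = J(x)                                   # latest evaluated Jacobian
    s = 1                                      # steps computed with this Jacobian
    lam = mu * np.linalg.norm(F(x)) ** delta
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(G.T @ Fx) <= tol:    # simplified stopping test
            break
        d = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fx)      # (1.7)
        ared = np.linalg.norm(Fx) ** 2 - np.linalg.norm(F(x + d)) ** 2
        pred = np.linalg.norm(Fx) ** 2 - np.linalg.norm(Fx + G @ d) ** 2
        r = ared / pred                                                 # (1.8)
        if r >= p0:                                                     # (1.9)
            x = x + d
        if r < p2:                                                      # (1.12)
            mu = c1 * mu
        elif r > p3:
            mu = max(c2 * mu, mu_min)
        if r >= p1 and s < t:                  # keep G_k and lambda_k, (1.10)-(1.11)
            s += 1
        else:                                  # evaluate the exact Jacobian
            G, s = J(x), 1
            lam = mu * np.linalg.norm(F(x)) ** delta
    return x
```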

We denote by \({{\bar{S}}}=\{k_i : i=1,2,\ldots \}\) the set of iteration indices at which the Jacobians \(J(x_{k_i})\ (i=1,2,\ldots )\) are evaluated and used to compute LM steps. Let

$$\begin{aligned} s_{i}=k_{i+1}-k_{i}. \end{aligned}$$
(2.1)

Since at most \(t-1\) approximate LM steps are computed, we have \(s_i\le t\).

For any k, there exist \(k_i\) and \(0\le q\le s_i-1\) such that

$$\begin{aligned} k=k_{i}+q. \end{aligned}$$
(2.2)

Note that

$$\begin{aligned} G_{k_{i}}=G_{k_{i}+1}=\cdots =G_{k_{i}+s_i-1}=J_{k_{i}}, \end{aligned}$$
(2.3)

thus the linear equations (1.7) can also be written as

$$\begin{aligned} \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) d=-\,J_{k_{i}}^{T}F_k \end{aligned}$$
(2.4)

for \(k=k_i, \ldots , k_i+s_i-1\).

It can be easily checked that \(d_{k}\) is not only the minimizer of the convex minimization problem

$$\begin{aligned} \min _{d\in R^{n}}\Vert F_{k}+G_{k}d\Vert ^{2}+\lambda _{k}\Vert d\Vert ^{2}\triangleq \varphi _{k}(d), \end{aligned}$$
(2.5)

but also the solution of the trust region problem

$$\begin{aligned} \min _{d\in R^{n}}&\ \Vert F_{k}+G_{k}d\Vert ^{2}\nonumber \\ \ s.t.&\ \ \Vert d\Vert \le \Delta _{k}\triangleq \Vert d_{k}\Vert . \end{aligned}$$
(2.6)
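
This equivalence is straightforward to check numerically; the sketch below uses randomly generated data (for illustration only) to verify that the solution of (1.7) minimizes (2.5) and is optimal for the trust-region problem (2.6) with \(\Delta _k=\Vert d_k\Vert \).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 8, 5, 0.3
G, Fk = rng.standard_normal((m, n)), rng.standard_normal(m)

dk = np.linalg.solve(G.T @ G + lam * np.eye(n), -G.T @ Fk)      # step (1.7)
phi = lambda d: np.linalg.norm(Fk + G @ d) ** 2 + lam * np.linalg.norm(d) ** 2

# d_k minimizes the regularized problem (2.5) ...
assert all(phi(dk) <= phi(dk + 1e-3 * rng.standard_normal(n)) for _ in range(100))

# ... and no d with ||d|| <= ||d_k|| gives a smaller linearized residual (2.6).
for _ in range(100):
    d = rng.standard_normal(n)
    d *= rng.uniform() * np.linalg.norm(dk) / np.linalg.norm(d)
    assert np.linalg.norm(Fk + G @ d) >= np.linalg.norm(Fk + G @ dk) - 1e-12
```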

Due to Powell’s result [10, Theorem 4], we have the following lemma.

Lemma 2.1

Let \(d_{k}\) be computed by (1.7), then

$$\begin{aligned} \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} \ge \Vert G_{k}^{T}F_{k}\Vert \min \left\{ \Vert d_{k}\Vert ,\dfrac{\Vert G_{k}^{T}F_{k}\Vert }{\Vert G_{k}^{T}G_{k}\Vert }\right\} . \end{aligned}$$
(2.7)

Lemma 2.1 indicates that the predicted reduction of the merit function is always nonnegative. It plays a crucial role in guaranteeing the global convergence of Algorithm 1. In the following, we show that, from the optimization point of view, at least one accumulation point of the sequence generated by Algorithm 1 is a stationary point of the merit function \(\Vert F(x)\Vert ^2\).

Theorem 2.2

Suppose F(x) is continuously differentiable, both F(x) and J(x) are Lipschitz continuous, i.e., there exist positive constants \(\kappa _{lf}\) and \(\kappa _{lj}\) such that

$$\begin{aligned} \Vert F(y)-F(x)\Vert \le \kappa _{lf}\Vert y-x\Vert ,\quad \forall x,y\in R^{n}, \end{aligned}$$
(2.8)

and

$$\begin{aligned} \Vert J(y)-J(x)\Vert \le \kappa _{lj}\Vert y-x\Vert ,\quad \forall x,y\in R^{n}. \end{aligned}$$
(2.9)

Then, we have

$$\begin{aligned} \liminf _{k\rightarrow \infty }\Vert J_{k}^{T}F_{k}\Vert =0. \end{aligned}$$
(2.10)

Proof

Suppose that (2.10) is not true. Then, there exists a constant \(\varepsilon >0\) such that

$$\begin{aligned} \left\| J_{k}^{T}F_{k}\right\| \ge \varepsilon , \quad \forall k. \end{aligned}$$
(2.11)

By (2.8) and (2.9), we have

$$\begin{aligned} \Vert J(x)\Vert \le \kappa _{lf}, \quad \forall x\in R^n \end{aligned}$$
(2.12)

and

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le \kappa _{lj}\Vert y-x\Vert ^{2},\quad \forall x,y\in R^{n}. \end{aligned}$$
(2.13)

Denote the index set of successful iterations by

$$\begin{aligned} S=\left\{ k: r_{k}\ge p_0\right\} . \end{aligned}$$
(2.14)

We consider S in two cases.

Case 1: S is finite. Then, there exists \({\tilde{k}}\) such that \(r_{k}<p_0\) for all \(k\ge {\tilde{k}}\). Since \(p_0<p_1\), by (1.10), \(G_k = J_k\) holds for all sufficiently large \(k\). Note that \(\{\Vert F_k\Vert \}\) is nonincreasing. Since \(r_k<p_0<p_2\), by (1.12), \(\mu _{k+1}=c_1\mu _k\) with \(c_1>1\). So, \(\mu _{k}\rightarrow +\,\infty \).

By (2.11) and (2.12), for \(k\ge {\tilde{k}}\),

$$\begin{aligned} \Vert G_{k}^{T}F_{k}\Vert \le \Vert G_{k}\Vert \Vert F_k\Vert \le \kappa _{lf}\Vert F_1\Vert \end{aligned}$$
(2.15)

and

$$\begin{aligned} \left\| F_{k}\right\| \ge \frac{\Vert G_{k}^{T}F_{k}\Vert }{\Vert G_{k}\Vert } \ge \frac{\Vert J_{k}^{T}F_{k}\Vert }{\kappa _{lf}} \ge \frac{\varepsilon }{\kappa _{lf}}. \end{aligned}$$
(2.16)

This, together with \(\lambda _k=\mu _k\Vert F_k\Vert ^\delta \) and \(\mu _k \rightarrow +\,\infty \), gives \(\lambda _{k}\rightarrow +\,\infty .\) Hence, by the definition of \(d_k\), we get \(d_{k}\rightarrow 0\).

Case 2: S is infinite. It follows from (2.12) and Lemma 2.1 that

$$\begin{aligned} \left\| F_{1}\right\| ^{2}&\ge \sum _{k\in S}\left( \left\| F_{k}\right\| ^{2}-\left\| F_{k+1}\right\| ^{2}\right) \nonumber \\&\ge \sum _{k\in S}p_{0} Pred_{k} \ge \sum _{k\in S\cap {{\bar{S}}}}p_{0} Pred_{k}\nonumber \\&\ge \sum _{k\in S\cap {{\bar{S}}}}p_{0}\Vert J_{k}^{T}F_{k}\Vert \min \left\{ \Vert d_{k}\Vert ,\dfrac{\Vert J_{k}^{T}F_{k}\Vert }{\Vert J_{k}^{T}J_{k}\Vert }\right\} \nonumber \\&\ge \sum _{k\in S\cap {{\bar{S}}}}p_{0}\varepsilon \min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{\kappa _{lf}^{2}}\right\} . \end{aligned}$$
(2.17)

We claim that the set \(S\cap {{\bar{S}}}\) is infinite. Suppose to the contrary that it is finite, and let \(k_{{{\bar{i}}}}\) be the largest index in it. Then, for \(i>{{\bar{i}}}\), \(r_{k_i}< p_0 < p_1\). By (1.10), \(G_{k_i+1} = J_{k_i+1}\). Thus, \(k_{i+1}=k_i+1\). Moreover, \(k_{i+1} \notin S\). By induction, we obtain that \(r_k<p_0\) holds for all sufficiently large k, which contradicts the fact that S is infinite. Hence, \(S\cap {{\bar{S}}}\) is infinite.

Note that \(k_i\in {{\bar{S}}}\) according to the definition of \({{\bar{S}}}\). By (2.17), we obtain \(d_{k_i}\rightarrow 0\) for \(k_i\in S\). Since \(d_{k_i}=0\) for \(k_i\notin S\), we have \(d_{k_i}\rightarrow 0\). This, together with \(\Vert G_{k_i}\Vert \le \kappa _{lf}\), \(\Vert G_{k_i}^TF_{k_i}\Vert =\Vert J_{k_i}^TF_{k_i}\Vert \ge \varepsilon \) and (1.7), gives \(\lambda _{k_i} \rightarrow +\,\infty \).

By (2.12) and (2.13), for \(k=k_i+1,\ldots , k_i+s_i-1\),

$$\begin{aligned} \Vert d_k \Vert&= \left\| -(G_k^T G_k + \lambda _{k} I)^{-1}G_k^T F_k \right\| \nonumber \\&\le \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T F_{k_i} \right\| + \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T J_{k_i} \left( \sum _{j = k_i}^{k-1} d_j \right) \right\| \nonumber \\&\qquad +\, \kappa _{lj} \left\| (J_{k_i}^T J_{k_i} + \lambda _{k_i} I)^{-1}J_{k_i}^T \right\| \left\| \left( \sum _{j = k_i}^{k-1} d_j \right) \right\| ^2 \nonumber \\&\le \Vert d_{k_i}\Vert + \sum _{j = k_i}^{k-1} \Vert d_j\Vert + \frac{\kappa _{lf}\kappa _{lj}}{\lambda _{k_i}} \left( \sum _{j = k_i}^{k-1} \Vert d_j\Vert \right) ^2. \end{aligned}$$
(2.18)

Since \(\lambda _{k_i} \rightarrow + \infty \) and \(d_{k_i}\rightarrow 0\), we have \(\kappa _{lf}\kappa _{lj}\Vert d_{k_i}\Vert /\lambda _{k_i}\le 1\) for all sufficiently large \(k_i\). Hence, by (2.18),

$$\begin{aligned} \Vert d_{k_i+1} \Vert \le 3 \Vert d_{k_i}\Vert \end{aligned}$$
(2.19)

and

$$\begin{aligned} \Vert d_{k_i+2} \Vert&\le \Vert d_{k_i}\Vert +\Vert d_{k_i}\Vert +\Vert d_{k_i+1} \Vert + \frac{\kappa _{lf}\kappa _{lj}}{\lambda _{k_i}}\left( \Vert d_{k_i}\Vert +\Vert d_{k_i+1}\Vert \right) ^2\nonumber \\&\le 21\Vert d_{k_i}\Vert \end{aligned}$$
(2.20)

for sufficiently large \(k_i\). By induction, there exists a positive constant \({{\hat{c}}}\) such that, for \(k=k_i+1,\ldots ,k_i+s_i-1\),

$$\begin{aligned} \Vert d_k \Vert \le {{\hat{c}}}\Vert d_{k_i} \Vert \end{aligned}$$
(2.21)

holds for all sufficiently large \(k_i\). Since \(s_i\le t\) for all i, we have \(d_k\rightarrow 0\) and

$$\begin{aligned} \Vert J_k-G_k\Vert =\Vert J_k-J_{k_i}\Vert \le \kappa _{lj} \sum _{j=k_i}^{k-1} \Vert d_j\Vert \rightarrow 0. \end{aligned}$$
(2.22)

Hence, by (2.11),

$$\begin{aligned} \left\| G_{k}^T F_k\right\| \ge \left\| J_{k}^T F_{k}\right\| - \left\| (J_{k}-G_k)^T F_k\right\| \ge \left\| J_{k}^T F_{k}\right\| -\Vert J_k-G_k\Vert \Vert F_1\Vert \ge \varepsilon /2 \end{aligned}$$
(2.23)

holds for sufficiently large k. Since \(\Vert F_k\Vert \ge \left\| G_{k}^{T}F_{k}\right\| / \Vert G_{k}\Vert \ge \frac{\varepsilon }{2 \kappa _{lf}}\), by the definition of \(d_k\) and \(\lambda _k\), we have \(\lambda _{k}\rightarrow +\,\infty \) and \(\mu _k\rightarrow +\,\infty \).

Therefore, whether S is finite or infinite, we obtain

$$\begin{aligned} d_k\rightarrow 0, \quad \lambda _{k}\rightarrow +\,\infty , \quad \mu _k\rightarrow +\,\infty , \end{aligned}$$
(2.24)

moreover, (2.22) and (2.23) hold true. It then follows from (2.12), (2.13), (2.22), (2.23), Lemma 2.1 and \(\Vert F_k\Vert \le \Vert F_1\Vert \) that

$$\begin{aligned} \left| r_{k}-1\right|&=\left| \frac{Ared_{k}-Pred_{k}}{Pred_{k}}\right| =\left| \frac{ \left\| F_{k+1}\right\| ^{2}-\left\| F_{k}+G_{k}d_{k}\right\| ^{2} }{ \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_{k}d_{k}\Vert ^{2} }\right| \nonumber \\&\le \frac{ \left| \left\| F_{k}+J_{k}d_{k}\right\| ^{2}-\left\| F_{k}+G_{k}d_{k}\right\| ^{2}\right| + 2\kappa _{lj}\left\| F_{k}+J_{k}d_{k}\right\| \left\| d_{k}\right\| ^{2}+\kappa _{lj}^2\left\| d_{k}\right\| ^{4}}{ \left\| G_{k}^{T}F_{k}\right\| \min \left\{ \Vert d_{k}\Vert ,\dfrac{\left\| G_{k}^{T}F_{k}\right\| }{\left\| G_{k}^{T}G_{k}\right\| }\right\} }\nonumber \\&\le \frac{ \left( \left\| J_{k}^{T}J_{k}\right\| +\left\| G_{k}^{T}G_{k}\right\| \right) \left\| d_{k}\right\| ^{2}+2\left\| J_{k}-G_{k}\right\| \left\| F_{k}\right\| \left\| d_{k}\right\| }{ \frac{\varepsilon }{2}\min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{2\kappa _{lf}^{2}}\right\} }\nonumber \\&\qquad +\,\frac{2\kappa _{lj}\Vert F_1\Vert \Vert d_k\Vert ^2+2\kappa _{lj}\kappa _{lf}\Vert d_k\Vert ^3+\kappa _{lj}^2\Vert d_k\Vert ^4}{\frac{\varepsilon }{2}\min \left\{ \left\| d_{k}\right\| ,\frac{\varepsilon }{2\kappa _{lf}^{2}}\right\} }\nonumber \\&\rightarrow 0. \end{aligned}$$
(2.25)

Thus, \(r_{k}\rightarrow 1\). So, by (1.10)–(1.12), there exists a positive \({\hat{\mu }}\) such that \( \mu _{k}\le {\hat{\mu }} \) holds for all sufficiently large k, which contradicts (2.24). Hence, (2.11) cannot be true. Therefore, we obtain (2.10). \(\square \)

3 Local Convergence

In this section, we first investigate the properties of the trial step, then show that the LM parameter is bounded, and finally prove that Algorithm 1 converges superlinearly under the local error bound condition. We make the following assumptions.

Assumption 3.1

The sequence \(\{x_k\}\) generated by Algorithm 1 satisfies \({\text {dist}}(x_k, X^*)\rightarrow 0\), and there exist \(x^*\in X^*\) and \(0<r<1\) such that \( \Vert x_k-x^*\Vert \le r/2\) for all large k.

Assumption 3.2

F(x) is continuously differentiable, and J(x) is Lipschitz continuous on \(N(x^*, r)=\{ x: \Vert x-x^*\Vert \le r\}\), i.e., there exists a constant \(\kappa _{lj}>0\) such that

$$\begin{aligned} \Vert J(y)-J(x)\Vert \le \kappa _{lj}\Vert y-x\Vert ,\quad \forall x,y\in N(x^*, r). \end{aligned}$$
(3.1)

Assumption 3.3

\(\Vert F(x)\Vert \) provides a local error bound on \(N(x^*, r)\), i.e., there exists a constant \(\kappa _{leb}>0\) such that

$$\begin{aligned} \Vert F(x)\Vert \ge \kappa _{leb} {\text {dist}}(x,X^{*}),\quad \forall x\in N (x^{*},r). \end{aligned}$$
(3.2)

By Assumption 3.2,

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert \le \kappa _{lj}\Vert y-x\Vert ^{2},\quad \forall x,y\in N(x^{*}, r), \end{aligned}$$
(3.3)

and there exists a constant \(\kappa _{lf}>0\) such that

$$\begin{aligned} \Vert F(y)-F(x)\Vert \le \kappa _{lf}\Vert y-x\Vert ,\quad \forall x,y\in N(x^{*}, r). \end{aligned}$$
(3.4)

Denote by \({\bar{x}}_k\) the vector in \(X^*\) that satisfies

$$\begin{aligned} \Vert {{\bar{x}}}_k-{x}_k\Vert =\text{ dist }(x_k, X^*). \end{aligned}$$
(3.5)

Then, for all large k,

$$\begin{aligned} \Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert x^*-x_k\Vert \le \frac{r}{2} \end{aligned}$$
(3.6)

and

$$\begin{aligned} \Vert {{\bar{x}}}_k-x^*\Vert \le \Vert {\bar{x}}_{k}-x_{k}\Vert +\Vert x^*-x_k\Vert \le r, \end{aligned}$$
(3.7)

so \({{\bar{x}}}_k\in N(x^*,r)\). In the following, we always consider sufficiently large iterations k.

3.1 Properties of the Step \(d_k\)

In this subsection, we show the relationship between the size of \(d_k\) and the distance from \(x_{k_i}\) to the solution set.

Lemma 3.4

Under Assumptions 3.13.3, there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert d_k\Vert \le c\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert , \quad k=k_i, \ldots , k_{i}+s_i-1. \end{aligned}$$
(3.8)

Proof

It follows from (1.12) and (3.2) that

$$\begin{aligned} \lambda _{k_{i}}=\mu _{k_{i}}\Vert F_{k_{i}}\Vert ^{\delta }\ge \mu _{\min } \kappa _{leb} ^{\delta }\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{\delta }. \end{aligned}$$
(3.9)

Since \(d_{k_i}\) is the minimizer of \(\varphi _{k_{i}}(d)\), by (3.6), (3.3), \(0<r<1\) and \(1\le \delta \le 2\), we have

$$\begin{aligned} \Vert d_{k_{i}}\Vert ^{2}&\le {\displaystyle \frac{\varphi _{k_{i}}(d_{k_{i}})}{\lambda _{k_{i}}}}\nonumber \\&\le {\displaystyle \frac{\varphi _{k_{i}}({\bar{x}}_{k_{i}}-x_{k_{i}})}{\lambda _{k_{i}}}}\nonumber \\&= {\displaystyle \frac{\Vert F_{k_{i}}+J_{k_{i}}({\bar{x}}_{k_{i}}-x_{k_{i}})\Vert ^{2}}{\lambda _{k_{i}}}+\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2}}\nonumber \\&\le \dfrac{\kappa _{lj}^{2}}{\mu _{\min } \kappa _{leb} ^{\delta }}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{4-\delta }+\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2}\nonumber \\&\le c_0^2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^2, \end{aligned}$$
(3.10)

where \(c_0=\sqrt{\mu _{\min }^{-1}\kappa _{leb} ^{-\delta }\kappa _{lj}^{2}+1}\), so \(\Vert d_{k_{i}}\Vert \le c_0\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \).

For \(k=k_i+1, \ldots , k_i+s_i-1\), by (3.3),

$$\begin{aligned} \Vert d_k\Vert&= \left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}F_{k}\right\| \nonumber \\&\le \left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}F_{k_{i}}\right\| +\left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}J_{k_{i}}\left( \sum _{j=k_i}^{k-1}d_j\right) \right\| \nonumber \\&\qquad +\,\kappa _{lj}\left\| -\,\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\| \left\| \sum _{j=k_i}^{k-1}d_j\right\| ^{2}\nonumber \\&\le \Vert d_{k_{i}}\Vert +\sum _{j=k_i}^{k-1}\Vert d_j\Vert +\kappa _{lj}\left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\| \left( \sum _{j=k_i}^{k-1}\Vert d_j\Vert \right) ^{2}. \end{aligned}$$
(3.11)

By (3.9),

$$\begin{aligned} \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}\right\|&=\left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}J_{k_{i}}^{T}J_{k_{i}}\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&\le \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_i}I\right) \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&= \left\| \left( J_{k_{i}}^{T}J_{k_{i}}+\lambda _{k_{i}}I\right) ^{-1}\right\| ^{1/2}\nonumber \\&\le \frac{1}{\sqrt{\lambda _{k_i}}} \le \frac{1}{\mu _{\min }^{\frac{1}{2}}\kappa _{leb} ^{\frac{\delta }{2}}}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{-\frac{\delta }{2}}. \end{aligned}$$
(3.12)

So, it follows from (3.6), (3.10) and \(1\le \delta \le 2\) that

$$\begin{aligned} \Vert d_{k_i+1}\Vert&\le \Vert d_{k_{i}}\Vert +\Vert d_{k_{i}}\Vert + \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0\Vert d_{k_{i}}\Vert \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\frac{\delta }{2}}\nonumber \\&\le 2 \Vert d_{k_{i}}\Vert + {{\bar{c}}} \Vert d_{k_{i}}\Vert \nonumber \\&= c_1 \Vert d_{k_{i}}\Vert , \end{aligned}$$
(3.13)

where \({{\bar{c}}}= \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0\) and \(c_1=2+{{\bar{c}}}\) are positive constants. Similarly,

$$\begin{aligned} \Vert d_{k_i+2}\Vert&\le \Vert d_{k_{i}}\Vert +\Vert d_{k_{i}}\Vert + \Vert d_{k_i+1}\Vert + \kappa _{lj}\mu _{\min }^{-\frac{1}{2}}\kappa _{leb}^{-\frac{\delta }{2}}c_0(1+c_1)^2\Vert d_{k_{i}}\Vert \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\frac{\delta }{2}}\nonumber \\&\le \left( 2+c_1+{{\bar{c}}}(1+c_1)^2\right) \Vert d_{k_{i}}\Vert \nonumber \\&= c_2 \Vert d_{k_{i}}\Vert , \end{aligned}$$
(3.14)

where \(c_2=2+c_1+{{\bar{c}}}(1+c_1)^2\). Let

$$\begin{aligned} c_j= 2+c_1+\cdots + c_{j-1}+{{\bar{c}}}(1+c_1+\cdots +c_{j-1})^2, \quad j=3, \ldots , t-1. \end{aligned}$$
(3.15)

Then, by induction, we have

$$\begin{aligned} \Vert d_{k_i+q}\Vert \le&c_q \Vert d_{k_{i}}\Vert , \quad q=3, \ldots , s_i-1. \end{aligned}$$
(3.16)

Since \(s_i\le t\) for all i, \(c_{t-1}>c_{t-2}>\cdots>c_2>c_1>1\), by (3.10), we obtain (3.8), where \(c=c_0c_{t-1}\). \(\square \)

3.2 Boundedness of the LM Parameter

The updating rule of \(\mu _k\) indicates that \(\mu _k\) is bounded below. In the following, we show that \(\mu _k\) is bounded above.

Lemma 3.5

Under Assumptions 3.13.3, there exists a constant \(\mu _{\max }>0\) such that

$$\begin{aligned} \mu _{k}\le \mu _{\max } \end{aligned}$$
(3.17)

holds for all sufficiently large k.

Proof

We first prove that

$$\begin{aligned}&\Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_kd_k\Vert ^{2}\ge \frac{\kappa _{leb}}{2}\Vert F_{k}\Vert \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} \end{aligned}$$
(3.18)

holds for sufficiently large k.

It follows from (3.1), (3.3), Lemma 3.4 and \(s_i\le t\) that, for \(k=k_i, k_i+1,\ldots , k_i+s_i-1\),

$$\begin{aligned} \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert&\le \Vert F_{k}+J_k({\bar{x}}_{k}-x_{k})\Vert +\Vert G_k-J_k\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}\left( \sum _{j=k_i}^{k-1}\Vert d_j\Vert \right) \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}ct\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert . \end{aligned}$$
(3.19)

Since \(\Vert {\bar{x}}_{k}-x_{k}\Vert \rightarrow 0\) and \(\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \rightarrow 0\), we obtain that

$$\begin{aligned} \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \le \frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \end{aligned}$$
(3.20)

holds for sufficiently large k.

We consider two cases.

Case 1: \(\Vert {\bar{x}}_{k}-x_{k}\Vert \le \Vert d_k\Vert \). Since \(d_k\) is a solution of the trust region problem (2.6), by (3.2) and (3.20),

$$\begin{aligned} \Vert F_{k}\Vert -\Vert F_{k}+G_kd_k\Vert&\ge \Vert F_{k}\Vert - \Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \nonumber \\&\ge \kappa _{leb}\Vert {\bar{x}}_{k}-x_{k}\Vert -\frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \ge \frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \end{aligned}$$
(3.21)

holds for sufficiently large k.

Case 2: \(\Vert {\bar{x}}_{k}-x_{k}\Vert >\Vert d_k\Vert \). Similarly, by (3.2) and (3.20),

$$\begin{aligned} \Vert F_{k}\Vert -\Vert F_{k}+G_kd_k\Vert&\ge \Vert F_{k}\Vert -\left\| F_{k}+\frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }G_k({\bar{x}}_{k}-x_{k})\right\| \nonumber \\&\ge \frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }\left( \Vert F_{k}\Vert -\Vert F_{k}+G_k({\bar{x}}_{k}-x_{k})\Vert \right) \nonumber \\&\ge \frac{\Vert d_k\Vert }{\Vert {\bar{x}}_{k}-x_{k}\Vert }\frac{\kappa _{leb}}{2}\Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\ge \frac{\kappa _{leb}}{2}\Vert d_k\Vert . \end{aligned}$$
(3.22)

Combining (3.21) with (3.22), we get

$$\begin{aligned} \Vert F_{k}\Vert ^{2}-\Vert F_{k}+G_kd_k\Vert ^{2}&\ge \frac{\kappa _{leb}}{2}\left( \Vert F_{k}\Vert +\Vert F_{k}+G_kd_k\Vert \right) \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} \nonumber \\&\ge \frac{\kappa _{leb}}{2}\Vert F_{k}\Vert \min \left\{ \Vert d_k\Vert ,\Vert {\bar{x}}_{k}-x_{k}\Vert \right\} , \end{aligned}$$
(3.23)

which yields (3.18).

It then follows from (3.2), (3.3), Lemma 3.4 and \(\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert \le \Vert F_{k_i}\Vert \) that

$$\begin{aligned} |r_{k_i}-1|&= \left| \dfrac{Ared_{k_i}-Pred_{k_i}}{Pred_{k_i}}\right| =\left| \dfrac{\Vert F_{k_i+1}\Vert ^{2}-\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert ^{2}}{\Vert F_{k_i}\Vert ^{2}-\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert ^{2}}\right| \nonumber \\&\le \left| \dfrac{2\Vert F_{k_i}+J_{k_i}d_{k_i}\Vert \Vert d_{k_i}\Vert ^2+\kappa _{lj}^2\Vert d_{k_i}\Vert ^{4}}{\frac{\kappa _{leb}}{2}\Vert F_{k_i}\Vert \min \{\Vert d_{k_i}\Vert ,\Vert {\bar{x}}_{k_i}-x_{k_i}\Vert \}}\right| \nonumber \\&\rightarrow 0. \end{aligned}$$
(3.24)

This implies that \(r_{k_i}\rightarrow 1\). Note that if \(k+1\notin {{\bar{S}}}\), then \(G_{k+1}=G_k\) and \(r_k\ge p_1>p_2\), so \(\mu _{k+1}\le \mu _{k}\). By (1.10)–(1.12), there exists a constant \(\mu _{\max }>0\) such that (3.17) holds true. \(\square \)

3.3 Convergence Rate of Algorithm 1

Due to the results given by Behling and Iusem in [1], we assume that \({\text {rank}}(J({\bar{x}}))=r\) for all \({\bar{x}}\in N(x^{*},b )\cap X^{*}\). Suppose the SVD of \(J({\bar{x}}_{k_{i}})\) is

$$\begin{aligned} J({\bar{x}}_{k_{i}})&= {\bar{U}}_{k_{i}}{\bar{\Sigma }}_{k_{i}}{\bar{V}}_{k_{i}}^{T}\nonumber \\&= \left( {\bar{U}}_{k_{i},1},{\bar{U}}_{k_{i},2}\right) \left( \begin{array}{cc} {\bar{\Sigma }}_{k_{i},1}\\ &{}\quad 0 \end{array}\right) \left( \begin{array}{c} {\bar{V}}_{k_{i},1}^{T}\\ {\bar{V}}_{k_{i},2}^{T} \end{array}\right) \nonumber \\&= {\bar{U}}_{k_{i},1}{\bar{\Sigma }}_{k_{i},1}{\bar{V}}_{k_{i},1}^{T}, \end{aligned}$$
(3.25)

where \({\bar{\Sigma }}_{k_{i},1}={\text {diag}}({\bar{\sigma }}_{k_{i},1},{\bar{\sigma }}_{k_{i},2},\ldots ,{\bar{\sigma }}_{k_{i},r})\) with \({\bar{\sigma }}_{k_{i},1}\ge \cdots \ge {\bar{\sigma }}_{k_{i},r}>0\). Suppose the SVD of \(J(x_{k_{i}})\) is

$$\begin{aligned} J_{k_{i}}&= U_{k_{i}}\Sigma _{k_{i}}V_{k_{i}}^{T}\nonumber \\&= \left( U_{k_{i},1},U_{k_{i},2}\right) \left( \begin{array}{cc} \Sigma _{k_{i},1}\\ &{} \Sigma _{k_{i},2}\\ \end{array}\right) \left( \begin{array}{c} V_{k_{i},1}^{T}\\ V_{k_{i},2}^{T} \end{array}\right) \nonumber \\&= U_{k_{i},1}\Sigma _{k_{i},1}V_{k_{i},1}^{T}+U_{k_{i},2}\Sigma _{k_{i},2}V_{k_{i},2}^{T}, \end{aligned}$$
(3.26)

where \(\Sigma _{k_{i},1}=\text{ diag }(\sigma _{k_{i},1},\sigma _{k_{i},2},\ldots ,\sigma _{k_{i},r})\) with \(\sigma _{k_{i},1}\ge \cdots \ge \sigma _{k_{i},r}>0\) and \(\Sigma _{k_{i},2}=\text{ diag }(\sigma _{k_{i},r+1},\sigma _{k_{i},r+2},\ldots ,\sigma _{k_{i},n})\) with \(\sigma _{k_{i},r+1}\ge \cdots \ge \sigma _{k_{i},n}\ge 0\).

By (3.1) and the theory of matrix perturbation [12], we obtain

$$\begin{aligned} \left\| \text{ diag }\left( \Sigma _{k_i, 1}-{\bar{\Sigma }}_{k_i,1},\Sigma _{k_i, 2}\right) \right\| \le \Vert J_{k_{i}}-J({{\bar{x}}}_{k_{i}})\Vert \le \kappa _{lj}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert . \end{aligned}$$
(3.27)
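
For later use, note that by (2.3) and \(\lambda _k=\lambda _{k_i}\), substituting the SVD (3.26) into (2.4) splits the trial step and the linearized residual into two orthogonal parts,

$$\begin{aligned} d_{k}&=-\,V_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 1}U_{k_i, 1}^{T}F_{k}-V_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 2}U_{k_i, 2}^{T}F_{k},\\ F_{k}+G_{k}d_{k}&=\lambda _{k_{i}}U_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 1}^{T}F_{k}+\lambda _{k_{i}}U_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 2}^{T}F_{k}, \end{aligned}$$

so \(d_k\) and \(F_k+G_kd_k\) can be bounded by estimating \(\Vert U_{k_i, 1}^{T}F_{k}\Vert \) and \(\Vert U_{k_i, 2}^{T}F_{k}\Vert \) separately. This is the strategy of Lemma 3.6 below.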

Lemma 3.6

Under Assumptions 3.13.3, there exist positive constants \(l_1\) and \(l_2\) such that

$$\begin{aligned} \Vert d_{k}\Vert&\le l_1\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \quad k=k_i,\ldots , k_{i}+s_i-1, \end{aligned}$$
(3.28)
$$\begin{aligned} \Vert F_{k}+G_{k}d_{k}\Vert&\le l_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \quad k=k_i,\ldots , k_{i}+s_i-1. \end{aligned}$$
(3.29)

Proof

We prove by induction. It was shown in [2] that the results hold true for \(k=k_i\) and \(k=k_i+1\).

Suppose the results hold true for \(k-1\) \((k_i+2\le k \le k_i+s_i-1)\), that is, there exist constants \({{\bar{l}}}_1\) and \({{\bar{l}}}_2\) such that

$$\begin{aligned} \Vert d_{k-1}\Vert&\le {{\bar{l}}}_1\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i}, \end{aligned}$$
(3.30)
$$\begin{aligned} \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert&\le \bar{l}_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.31)

It then follows from (3.1), (3.3) and Lemma 3.4 that

$$\begin{aligned} \Vert F_{k}\Vert&=\Vert F(x_{k-1}+d_{k-1})\Vert \nonumber \\&\le \Vert F_{k-1}+J_{k-1}d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert +\Vert \left( J_{k-1}-G_{k-1}\right) d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k-1}+G_{k-1}d_{k-1}\Vert +\left\| J_{k-1}-J_{k_{i}}\right\| \Vert d_{k-1}\Vert +\kappa _{lj}\Vert d_{k-1}\Vert ^{2}\nonumber \\&\le {{\bar{l}}}_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1} +\kappa _{lj}\sum _{j=k_i}^{k-2}\Vert d_{j}\Vert \Vert d_{k-1}\Vert +\kappa _{lj}{{\bar{l}}}_1^2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{2(k-k_i)}\nonumber \\&\le {{\bar{l}}}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \end{aligned}$$
(3.32)

where \({{\bar{l}}}_3={{\bar{l}}}_2+\kappa _{lj}ct{{\bar{l}}}_1+\kappa _{lj}\bar{l}_1^2\). Hence,

$$\begin{aligned} \left\| U_{k_i, 1}U_{k_i, 1}^{T}F_{k}\right\| \le \Vert F_{k}\Vert \le \bar{l}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.33)

Moreover, by (3.2),

$$\begin{aligned} \Vert {\bar{x}}_{k}-x_{k}\Vert \le \kappa _{leb}^{-1}\Vert F_{k}\Vert \le \kappa _{leb}^{-1} {{\bar{l}}}_3\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}. \end{aligned}$$
(3.34)

Let \({\tilde{J}}_{k_{i}}=U_{k_i, 1}\Sigma _{k_i, 1}V_{k_i, 1}^{T}\) and \({\tilde{u}}_{k}=-{\tilde{J}}_{k_i}^{+}F_{k}\). Then, \({\tilde{u}}_{k}\) is the least squares solution of the problem \({\displaystyle \min _{u\in R^{n}}\Vert F_{k}+{\tilde{J}}_{k_{i}}u\Vert }\). By (3.1), (3.3) and (3.27),

$$\begin{aligned} \left\| U_{k_i, 2}U_{k_i, 2}^{T}F_{k}\right\|&= \Vert F_{k}+{\tilde{J}}_{k_{i}}{\tilde{u}}_{k}\Vert \le \Vert F_{k}+{\tilde{J}}_{k_{i}}({\bar{x}}_{k}-x_{k})\Vert \nonumber \\&\le \Vert F_{k}+J_{k}({\bar{x}}_{k}-x_{k})\Vert +\Vert J_{k_{i}}-J_{k}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert + \Vert {\tilde{J}}_{k_{i}}-J_{k_{i}}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le \kappa _{lj}\Vert {\bar{x}}_{k}-x_{k}\Vert ^{2}+\kappa _{lj}\sum _{j=k_i}^{k-1}\Vert d_{j}\Vert \Vert {\bar{x}}_{k}-x_{k}\Vert +\left\| U_{k_i, 2}\Sigma _{k_i, 2}V_{k_i, 2}^{T}\right\| \Vert {\bar{x}}_{k}-x_{k}\Vert \nonumber \\&\le {{\bar{l}}}_4\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \end{aligned}$$
(3.35)

where \({{\bar{l}}}_4=\kappa _{lj}\kappa _{leb}^{-1} \bar{l}_3(\kappa _{leb}^{-1} {{\bar{l}}}_3+ct+1)\).

Note that \(\{x_k\}\) converges to \(X^{*}\). By (3.27), we know

$$\begin{aligned} \left\| \Sigma _{k_i, 1}^{-1}\right\| =\frac{1}{\sigma _{k_i, r}}\le \left| \frac{1}{{\bar{\sigma }}_{k_i, r}-\kappa _{lj}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert }\right| \le \frac{2}{{\bar{\sigma }}_{k_i, r}} \end{aligned}$$
(3.36)

holds for sufficiently large \(k_i\). By (3.9),

$$\begin{aligned} \left\| \lambda _{k_{i}}^{-1}\Sigma _{k_i, 2}\right\| =\frac{\Vert \Sigma _{k_i, 2}\Vert }{\lambda _{k_i}} \le \frac{\kappa _{lj}}{\mu _{\min } \kappa _{leb} ^{\delta }}\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{1-\delta }. \end{aligned}$$
(3.37)

It then follows from (3.33), (3.35)–(3.37), Lemma 3.5 and \(1\le \delta \le 2\) that

$$\begin{aligned} \Vert d_{k}\Vert&= \left\| -V_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 1}U_{k_i, 1}^{T}F_{k}-V_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}\Sigma _{k_i, 2}U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\le \left\| \Sigma _{k_i, 1}^{-1}\right\| \left\| U_{k_i, 1}^{T}F_{k}\right\| +\left\| \lambda _{k_{i}}^{-1}\Sigma _{k_i, 2}\right\| \left\| U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\le l_1 \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+1}, \end{aligned}$$
(3.38)

where \(l_1=2{\bar{\sigma }}_{k_i, r}^{-1} {{\bar{l}}}_3+ \kappa _{lj}\mu _{\min }^{-1} \kappa _{leb} ^{-\delta } {{\bar{l}}}_4\), and

$$\begin{aligned}&\Vert F_{k}+G_{k}d_{k}\Vert \nonumber \\&\qquad = \left\| \lambda _{k_{i}}U_{k_i, 1}\left( \Sigma _{k_i, 1}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 1}^{T}F_{k}+\lambda _{k_{i}}U_{k_i, 2}\left( \Sigma _{k_i, 2}^{2}+\lambda _{k_{i}}I\right) ^{-1}U_{k_i, 2}^{T}F_{k} \right\| \nonumber \\&\qquad \le \mu _{k_{i}}\Vert F_{k_{i}}\Vert ^{\delta }\left\| \Sigma _{k_i, 1}^{-2}\right\| \left\| U_{k_i, 1}^{T}F_{k}\right\| +\left\| U_{k_i, 2}^{T}F_{k}\right\| \nonumber \\&\qquad \le l_2\Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{k-k_i+2}, \end{aligned}$$
(3.39)

where \(l_2= 4\mu _{\max }\kappa _{lf}^{\delta }{\bar{\sigma }}_{k_i, r}^{-2}{{\bar{l}}}_3+{{\bar{l}}}_4\). The proof is completed. \(\square \)

Based on Lemma 3.6, we have the following main result.

Theorem 3.7

Under Assumptions 3.13.3, there exists a constant \(l_3>0\) such that

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert \le l_3\Vert d_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.40)

Consequently, Algorithm 1 converges q-superlinearly to some solution of (1.1).

Proof

It follows from (3.2), (3.3), Lemma 3.4 and Lemma 3.6 that

$$\begin{aligned} \kappa _{leb}\Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert&\le \Vert F(x_{k_{i+1}})\Vert = \Vert F(x_{k_i+s_i-1}+d_{k_i+s_i-1})\Vert \nonumber \\&\le \Vert F_{k_i+s_i-1}+J_{k_i+s_i-1}d_{k_i+s_i-1}\Vert + \kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^2\nonumber \\&\le \Vert F_{k_i+s_i-1}+G_{k_i+s_i-1}d_{k_i+s_i-1}\Vert +\Vert (J_{k_i+s_i-1}-G_{k_i+s_i-1})d_{k_i+s_i-1}\Vert \nonumber \\&\quad +\kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^{2}\nonumber \\&\le \Vert F_{k_i+s_i-1}+G_{k_i+s_i-1}d_{k_i+s_i-1}\Vert + \kappa _{lj}\sum _{j=k_i}^{k_i+s_i-2}\Vert d_{j}\Vert \Vert d_{k_i+s_i-1}\Vert \nonumber \\&\quad +\kappa _{lj}\Vert d_{k_i+s_i-1}\Vert ^{2}\nonumber \\&\le \left( l_2+\kappa _{lj}tcl_1+\kappa _{lj}l_1^2\right) \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.41)

Since

$$\begin{aligned} \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \le \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i}}\Vert \le \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert +\sum _{j=k_i}^{k_{i+1}-1}\Vert d_j\Vert , \end{aligned}$$
(3.42)

by (3.16) and (3.41),

$$\begin{aligned} \Vert {\bar{x}}_{k_{i}}-x_{k_{i}}\Vert \le 2 \sum _{j=k_i}^{k_{i+1}-1}\Vert d_j\Vert \le 2s_ic_{s_i-1}\Vert d_{k_i}\Vert \end{aligned}$$
(3.43)

holds for all sufficiently large \(i\).

Let \({{\bar{l}}}_5=\kappa _{leb}^{-1}(l_2+\kappa _{lj}tcl_1+\kappa _{lj}l_1^2)\). It then follows from (3.41), Lemmas 3.4 and 3.6 that

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert&\le c \Vert {\bar{x}}_{k_{i+1}}-x_{k_{i+1}}\Vert \le c{{\bar{l}}}_5 \Vert {{\bar{x}}}_{k_i}-x_{k_i}\Vert ^{s_i+1}\nonumber \\&\le c{{\bar{l}}}_5(2s_ic_{s_i-1})^{s_i+1} \Vert d_{k_{i}}\Vert ^{s_i+1}\nonumber \\&\le c{{\bar{l}}}_5(2tc_{t-1})^{t+1}\Vert d_{k_{i}}\Vert ^{s_i+1}. \end{aligned}$$
(3.44)

Letting \(l_3=c{{\bar{l}}}_5(2tc_{t-1})^{t+1}\), we obtain (3.40).

By Lemma 3.4, we have \(\Vert d_{k_i}\Vert \rightarrow 0\). Hence, there exist N and \(0<q<1\) such that \(\max \{\Vert d_{k_i}\Vert , l_3\Vert d_{k_i}\Vert \}\le q<1\) for all \(i\ge N\). Since \( s_i\ge 1\), we have

$$\begin{aligned} \Vert d_{k_{i+1}}\Vert \le l_3 \Vert d_{k_{i}}\Vert ^{s_i+1} \le l_3 \Vert d_{k_{i}}\Vert ^2\le q\Vert d_{k_i}\Vert , \qquad \forall i\ge N. \end{aligned}$$
(3.45)

Then,

$$\begin{aligned} \sum _{i=N}^\infty \Vert d_{k_i}\Vert&\le \Vert d_{k_N}\Vert + q \Vert d_{k_N}\Vert + q^2\Vert d_{k_N}\Vert +\cdots \nonumber \\&= \frac{\Vert d_{k_N}\Vert }{1-q}. \end{aligned}$$
(3.46)

This implies that \(\sum _{i=N}^\infty \Vert d_{k_i}\Vert \) converges, so does \(\sum _{i=1}^\infty \Vert d_{k_i}\Vert \).

By (3.15) and \(s_i\le t\), we have

$$\begin{aligned} \Vert x_{k_{i+1}}-x_{k_i}\Vert =\left\| \sum _{j=0}^{s_i-1} d_{k_i+j}\right\| \le t c_{t-1} \Vert d_{k_i}\Vert . \end{aligned}$$
(3.47)

So, \(\sum _{i=1}^\infty \Vert x_{k_{i+1}}-x_{k_i}\Vert \) converges, which implies that \(\sum _{i=1}^\infty (x_{k_{i+1}}-x_{k_i})\) converges. Thus, \(\{x_{k_i}\}\) converges to some \({{\hat{x}}}\in X^*\). By (3.40), we have

$$\begin{aligned} \Vert x_{k_{i+1}}-{{\hat{x}}}\Vert \le {{\tilde{q}}} \Vert x_{k_{i}}-{{\hat{x}}}\Vert ^{s_i+1} \end{aligned}$$
(3.48)

for some \({{\tilde{q}}}>0\). Therefore, Algorithm 1 converges q-superlinearly. \(\square \)

4 Numerical Results

We tested Algorithm 1 on some singular problems and compared it with the LM algorithm, the modified LM algorithm [2] and the multi-step LM algorithm with \(m=3\) [3]. The experiments were run on a laptop with an Intel Core i7-7500U CPU and 8 GB of RAM, using Matlab R2015b.

The test problems are created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in [8]. They have the same form as in [11]

$$\begin{aligned} {\hat{F}}(x)=F(x)-J(x^{*})A\left( A^{T}A\right) ^{-1}A^{T}(x-x^{*}), \end{aligned}$$
(4.1)

where F(x) is the standard nonsingular test function, \(x^{*}\) is its root, and \(A\in R^{n\times k}\) has full column rank with \(1\le k\le n\). Obviously, \({\hat{F}}(x^{*})=0\) and

$$\begin{aligned} {\hat{J}}(x^{*})=J(x^{*})\left( I-A\left( A^{T}A\right) ^{-1}A^{T}\right) \end{aligned}$$

has rank \(n-k\).
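
For concreteness, a minimal sketch of this construction is given below (assuming callables standard_F and standard_J for the chosen test function from [8] and its known root x_star; all names are illustrative).

```python
import numpy as np

def make_singular_problem(standard_F, standard_J, x_star, A):
    """Build the singular test problem (4.1) from a nonsingular one."""
    P = A @ np.linalg.solve(A.T @ A, A.T)       # projection onto range(A)
    JP = standard_J(x_star) @ P                 # J(x*) A (A^T A)^{-1} A^T
    F_hat = lambda x: standard_F(x) - JP @ (x - x_star)
    J_hat = lambda x: standard_J(x) - JP        # rank of J_hat(x*) is n - rank(A)
    return F_hat, J_hat

# Rank n-1 at x*: A = np.ones((n, 1)); rank 1 or 10: random A with n-1 or n-10 columns.
```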

We set \(p_{0}=0.0001,p_{1}=0.50,p_{2}=0.25,p_{3}=0.75\), \(c_1=4, c_2=0.25\), \(\mu _{1}=10^{-5}\), \(\mu _{\min }=10^{-8}\) and \(t=10\) for the tests. Algorithm 1 stops when \(\Vert J_{k_i}^{T}F_{k_i}\Vert \le 10^{-5}\) or the iteration number exceeds \(100\times (n+1)\).

Table 1 Results on the singular problems with rank \(n-1\) in small scale
Table 2 Results on the singular problems with rank \(n-1\) in large scale
Table 3 Results on the singular problems with rank 1 in small scale

We chose the rank of \({\hat{J}}(x^{*})\) to be \(n-1\) by using

$$\begin{aligned} A\in R^{n\times 1},\qquad A^{T}=(1,1,\ldots ,1). \end{aligned}$$

The results are given in Tables 1 and 2. In the tables, \(x_{0}\), \(10x_{0}\) and \(100x_{0}\) in the third column are the starting points, where \(x_{0}\) is the one suggested in [8]. “NF” and “NJ” denote the numbers of function evaluations and Jacobian evaluations, respectively. “—” indicates that the algorithm fails to find a solution within \(100\times (n+1)\) iterations. Note that one Jacobian evaluation generally costs about n times as much as one function evaluation, so we also report the values “\(NF+n* NJ\)” to compare the total evaluation cost; if the Jacobian is sparse, however, this measure is less meaningful. For the large scale problems, we report the running time instead of the total evaluations.

We also chose the rank of \({\hat{J}}(x^{*})\) to be 1 and 10, where \(A\in R^{n\times (n-1)}\) and \(A\in R^{n\times (n-10)}\) are generated randomly. The results are given in Tables 3 and 4, respectively.

Table 4 Results on the singular problems with rank 10 in large scale

From the tables, we can see that the adaptive LM algorithm usually requires the fewest Jacobian evaluations as well as the fewest total evaluations to find a solution of small scale nonlinear equations, and it usually takes the least time to find a solution of large scale nonlinear equations, regardless of the rank of the problem. The reason is that it makes full use of the available evaluated Jacobians and matrix factorizations to compute approximate LM steps. Since computing approximate LM steps takes much less effort, the overall cost of the adaptive LM algorithm is usually much lower.

5 Conclusion

We proposed an adaptive multi-step Levenberg–Marquardt algorithm for nonlinear equations. It uses the latest evaluated Jacobian to calculate an approximate LM step if the ratio of the actual reduction to the predicted reduction of the merit function at the iterate is sufficiently good. Under the local error bound condition, which is weaker than the nonsingularity condition, the adaptive LM algorithm converges superlinearly to some solution of (1.1). Compared with the LM algorithm, the adaptive multi-step LM algorithm significantly reduces the number of Jacobian evaluations, the total number of evaluations, and the running time.