1 Introduction

Given \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\), the nonlinear least-squares (NLS) problem is as follows:

$$\begin{aligned} \min _{x \in \mathbb {R}^n} \Vert F(x)\Vert ^2, \end{aligned}$$

where \(\Vert \cdot \Vert \) is the Euclidean norm. This problem is important from both a theoretical and practical viewpoint [3, 18]. The seminal works of Levenberg [12], Morrison [15], and Marquardt [13] provided a regularization strategy for improving the convergence of the Gauss–Newton (GN) method. The latter is the first and simplest method to address the NLS problem. In the GN method, the nonlinear residual function F is replaced by its linear approximation at the current iterate, so that the NLS problem is solved by means of a sequence of quadratic problems, without any additional parameter. In the Levenberg–Marquardt (LM) method, a sequence of quadratic problems is also generated, and a regularization term is introduced that essentially depends on the so-called LM parameter.

For very large problems, such as those arising in data assimilation, it may be necessary to make further approximations in the GN method in order to reduce computational costs. Gratton et al. [10] point out conditions that ensure the convergence of truncated and perturbed GN methods, deriving convergence rates for the iterations. Inexact versions of the LM method have also been considered under an error bound condition. Such a condition is weaker than assuming nonsingularity of the appropriate matrix at a solution, namely the Jacobian \(JF\in \mathbb {R}^{m \times n}\) if \(m=n\), or the square matrix \(JF^T JF\) otherwise [2, 8]. The local convergence of the LM method has also been studied under an error bound condition. Yamashita and Fukushima [17] proved q-quadratic convergence of the LM method with the LM parameter sequence set as \(\mu _k=\Vert F(x_k)\Vert ^2\). Later, this q-quadratic rate was extended by Fan and Yuan [6] to the setting \(\mu _k=\Vert F(x_k)\Vert ^\delta \), with \( \delta \in [1,2]\). Fan and Pan [5] further extended the analysis to exponents \(\delta \in (0,1)\) in the aforementioned update of the LM parameter, showing local superlinear convergence. Accelerated versions of the LM method have been proposed recently by Fan [4], in which cubic local convergence is achieved. When it comes to complexity analysis, Ueda and Yamashita [16] have investigated a global complexity bound for the LM method.

In this work we devise algebraic rules for computing the LM parameter, so that \(\mu _k\) belongs to an interval proportional to \(\Vert F(x_k)\Vert \). Under the Lipschitz continuity of the Jacobian, the rules are designed so that the full LM step is accepted by the Armijo sufficient decrease condition. On the one hand, when the Lipschitz constant is available, we prove that the full steps are indeed accepted. On the other hand, when the Lipschitz constant is unknown, we propose a scheme for dynamically updating an estimate of such a constant within a well-defined and globally convergent algorithm for solving the NLS problem. The q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition. Numerical results illustrate the performance of the proposed algorithm on a set of test problems from the CUTEst collection, with promising results in terms of both efficiency and robustness.

The text is organized as follows. The basic results that have generated the proposed algorithm are presented in Sect. 2. The algorithm and its underlying properties are given in Sect. 3. The global convergence analysis is provided in Sect. 4, whereas the local convergence result is developed in Sect. 5. The numerical experiments are presented and examined in Sect. 6. A summary of the work is given in Sect. 7 and the tables with the complete numerical results compose the Appendix.

2 Technical results

We consider \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\) of class \({\mathcal {C}}^1\) with an L-Lipschitz continuous Jacobian \(JF\in \mathbb {R}^{m\times n}\), that is, for all \(x,y\in \mathbb {R}^n\),

$$\begin{aligned} \Vert JF(x)-JF(y)\Vert \le L\Vert x-y \Vert . \end{aligned}$$
(1)

On the left-hand side of the inequality above we have the operator norm induced by the canonical norms in \(\mathbb {R}^n\) and \(\mathbb {R}^m\). More specifically, we are concerned with choosing the regularization parameter \(\mu \) in the Levenberg–Marquardt method,

$$\begin{aligned} s=-\big (JF(x)^T JF(x)+\mu I \big )^{-1} JF(x)^T F(x), \quad x_+=x+ts, \quad 0<t\le 1 \end{aligned}$$

where \(x_+\) is the new iterate and t is the step length.
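For illustration only, a minimal numerical sketch of one such damped LM update is given below (Python/NumPy; the residual map and all values are hypothetical, not part of the method's specification).

```python
import numpy as np

def lm_update(F, JF, x, mu, t=1.0):
    """One damped LM update: s solves (JF^T JF + mu I) s = -JF^T F,
    and the new iterate is x_+ = x + t s with 0 < t <= 1."""
    J, r = JF(x), F(x)
    s = np.linalg.solve(J.T @ J + mu * np.eye(x.size), -J.T @ r)
    return x + t * s, s

# Hypothetical residual: F(x) = (x1^2 + x2^2 - 1, x1 - x2).
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
JF = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
x_plus, s = lm_update(F, JF, np.array([2.0, 0.5]), mu=1e-2)
```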

Initially, a classical result is recalled [3, Lemma 4.1.12].

Lemma 2.1

(Linearization error) For any \(x,s\in \mathbb {R}^n\),

$$\begin{aligned} \Vert F(x+s)-(F(x)+JF(x)s)\Vert \le \dfrac{L}{2} \Vert s\Vert ^2. \end{aligned}$$

From now on, in this section, s is the Levenberg–Marquardt step at \(x\in \mathbb {R}^n\), with regularization parameter \(\mu {>}0\), that is,

$$\begin{aligned} s&=\hbox {argmin} \Vert F(x)+JF(x)s\Vert ^2+\mu \Vert s\Vert ^2=-\big (JF(x)^TJF(x)+\mu I \big )^{-1}JF(x)^TF(x). \end{aligned}$$
(2)

Let \(\phi :\mathbb {R}\rightarrow \mathbb {R}\) be

$$\begin{aligned} \phi (t)=\Vert F(x+ts) \Vert ^2. \end{aligned}$$
(3)

Note that

$$\begin{aligned} \phi '(0)&=2\langle {F(x)},{JF(x)s}\rangle \nonumber \\&=-2\langle {JF(x)^T F(x)},{\big (JF(x)^TJF(x)+\mu I \big )^{-1}JF(x)^TF(x)}\rangle \le 0. \end{aligned}$$
(4)

First we analyze the norm of the linearization of F along the direction s.

Lemma 2.2

(Linearization’s norm) For any \(t\in [0,1]\),

$$\begin{aligned} \Vert F(x)+tJF(x)s\Vert ^2&=\Vert F(x)\Vert ^2+t\langle {JF(x)^TF(x)},{s}\rangle +\big (t^2-t\big )\Vert JF(x)s\Vert ^2-t\mu \Vert s\Vert ^2\\&\le \Vert F(x)\Vert ^2. \end{aligned}$$

Proof

$$\begin{aligned} \Vert F(x)+tJF(x)s\Vert ^2&=\Vert F(x)\Vert ^2+2t\langle {JF(x)^TF(x)},{s}\rangle +{t^2}\Vert JF(x)s\Vert ^2\\&=\Vert F(x)\Vert ^2+t \langle {F(x)},{JF(x)s}\rangle \\&\quad + t \langle {JF(x)^TF(x)},{s}\rangle +t^2\Vert JF(x)s\Vert ^2\\&=\Vert F(x)\Vert ^2+t \langle {F(x)},{JF(x)s}\rangle \\&\quad + t \langle {-(JF(x)^TJF(x)+\mu I)s},{s}\rangle +t^2\Vert JF(x)s\Vert ^2\\&=\Vert F(x)\Vert ^2+t\langle {F(x)},{JF(x)s}\rangle \\&\quad +(t^2-t)\Vert JF(x)s\Vert ^2-t\mu \Vert s\Vert ^2. \end{aligned}$$

Using (4) and the fact that \(t\in [0,1]\), we conclude the proof. \(\square \)

Now we analyze the norm of F along the direction s.

Lemma 2.3

For any \(t\in [0,1]\),

$$\begin{aligned} \Vert F(x+ts)\Vert ^2&\le \Vert F(x)\Vert ^2+t\langle {F(x)},{JF(x)s}\rangle +(t^2-t)\Vert JF(x)s\Vert ^2\\&\quad + t\Vert s\Vert ^2\left[ \dfrac{L^2}{4}t^3\Vert s\Vert ^2+Lt\Vert F(x)+tJF(x)s\Vert -\mu \right] \\&\le \Vert F(x)\Vert ^2+t\langle {JF(x)^TF(x)},{s}\rangle +(t^2-t)\Vert JF(x)s\Vert ^2\\&\quad + t\Vert s\Vert ^2\left[ \dfrac{L^2}{4}t^3\Vert s\Vert ^2+Lt\Vert F(x)\Vert -\mu \right] . \end{aligned}$$

Proof

Let

$$\begin{aligned} R(t)=F(x+ts)-(F(x)+tJF(x)s). \end{aligned}$$

Since \(F(x+ts)=F(x)+tJF(x)s+R(t)\),

$$\begin{aligned} \Vert F(x+ts)\Vert ^2&=\Vert F(x)+tJF(x)s\Vert ^2+\Vert R(t)\Vert ^2+2\langle {F(x)+tJF(x)s},{R(t)}\rangle \\&\le \Vert F(x)+tJF(x)s\Vert ^2+\Vert R(t)\Vert ^2+2\Vert F(x)+tJF(x)s\Vert \Vert R(t)\Vert \\&\le \Vert F(x)+tJF(x)s\Vert ^2+\Vert R(t)\Vert ^2+2\Vert F(x)\Vert \Vert R(t)\Vert \end{aligned}$$

where the first inequality follows from the Cauchy–Schwarz inequality and the second one from Lemma 2.2. From the definition of R(t) and the L-Lipschitz continuity of \(JF\), we have that \(\Vert R(t)\Vert \le Lt^2\Vert s\Vert ^2/2 \). Therefore,

$$\begin{aligned} \Vert F(x+ts)\Vert ^2&\le \Vert F(x)+tJF(x)s\Vert ^2+\dfrac{L^2}{4}t^4\Vert s\Vert ^4+Lt^2\Vert F(x)\Vert \Vert s\Vert ^2. \end{aligned}$$

To end the proof, use Lemma 2.2 again. \(\square \)

In view of Lemma 2.3 we need a bound for \(\Vert s\Vert \) in order to choose, at a non-stationary point, a regularization parameter \(\mu \). The next technical lemma will be useful for obtaining such a bound.

Lemma 2.4

For any \(A\in \mathbb {R}^{m\times n}\), \(b\in \mathbb {R}^m\) and \(\mu {>}0\)

$$\begin{aligned} \big \Vert \big (A^T A+\mu I\big )^{-1} A^T b\big \Vert \le \dfrac{1}{2\sqrt{\mu }}\Vert P_{{\mathcal {R}}(A)}b\Vert , \end{aligned}$$

where \({\mathcal {R}}(A)\) is the range of A and \(P_{{\mathcal {R}}(A)}\) is the orthogonal projection onto this subspace.

Proof

Let \(b'=P_{{\mathcal {R}}(A)}b\) and \(s=(A^T A+\mu I)^{-1} A^T b\). Observe that

$$\begin{aligned} \big (A^T A+\mu I\big ) s=A^T b'. \end{aligned}$$
(5)

We will use an SVD of A: there exist orthogonal matrices \(U\in \mathbb {R}^{m \times m}\) and \(V\in \mathbb {R}^{n \times n}\) such that

$$\begin{aligned} U^TAV=D,\quad d_{i,j}={\left\{ \begin{array}{ll} \sigma _i,&{} i=j\\ 0,&{}\text {otherwise,}\end{array}\right. } \end{aligned}$$

where \(\sigma _i\ge 0\) for all \(i=1,\ldots ,\min \{m, n\}\). Note that \(V^T=V^{-1}\), \(U^T=U^{-1}\) and

$$\begin{aligned} V^TA^TAV=(U^TAV)^TU^TAV=D^TD. \end{aligned}$$

Therefore, pre-multiplying both sides of (5) by \(V^T\) and using the substitutions

$$\begin{aligned} {\tilde{s}}=V^Ts,\quad {\tilde{b}}=U^Tb' \end{aligned}$$

in (5) we conclude that

$$\begin{aligned} D^T{\tilde{b}}=V^T\big (A^TA + \mu I\big )V{\tilde{s}}=\big (D^TD+\mu I\big ){\tilde{s}}. \end{aligned}$$

It follows from this equation that if \(n \le m\) then

$$\begin{aligned} {\tilde{s}}_i=\dfrac{\sigma _i}{\sigma _i^2+\mu }{\tilde{b}}_i,\quad i=1,\ldots ,n; \end{aligned}$$

while if \(n{>}m\) then

$$\begin{aligned} {\tilde{s}}_i={\left\{ \begin{array}{ll} \dfrac{\sigma _i}{\sigma _i^2+\mu }{\tilde{b}}_i,&{} 1\le i\le m\\ 0,&{} m<i\le n.\end{array}\right. } \end{aligned}$$

Since \(t/(t^2+\mu )\le 1/(2\sqrt{\mu })\) for \(\mu {>}0\) and \(t\ge 0\), we have

$$\begin{aligned} \Vert {\tilde{s}}\Vert \le \dfrac{1}{2\sqrt{\mu }}\Vert {\tilde{b}}\Vert . \end{aligned}$$

To end the proof, note that \(\Vert {\tilde{s}}\Vert =\Vert s\Vert \) and \(\Vert {\tilde{b}}\Vert =\Vert b'\Vert \). \(\square \)
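The bound of Lemma 2.4 is easy to check numerically. The sketch below (Python/NumPy with random data, purely illustrative) computes the projection \(P_{{\mathcal {R}}(A)}b\) through a least-squares solve and compares both sides of the inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, mu = 7, 4, 0.3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

lhs = np.linalg.norm(np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b))
# Orthogonal projection of b onto the range of A: P b = A A^+ b.
Pb = A @ np.linalg.lstsq(A, b, rcond=None)[0]
rhs = np.linalg.norm(Pb) / (2.0 * np.sqrt(mu))
assert lhs <= rhs + 1e-12   # the inequality of Lemma 2.4
```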

In the following result, the main ingredients for the devised algebraic rules are established.

Theorem 2.5

For any \(\mu {>}0\),

$$\begin{aligned} \Vert s\Vert \le \dfrac{1}{2\sqrt{\mu }}\Vert P(F(x))\Vert \le \dfrac{1}{2\sqrt{\mu }}\Vert F(x)\Vert \end{aligned}$$

where P is the orthogonal projection onto the range of \(JF(x)\). Moreover, if

$$\begin{aligned} \mu \ge \dfrac{L}{4}\Big (2\Vert F(x)\Vert +\sqrt{4\Vert F(x)\Vert ^2+\Vert P(F(x))\Vert ^2}\Big ), \end{aligned}$$
(6)

then

$$\begin{aligned} \Vert F(x+s)\Vert ^2\le&\Vert F(x)\Vert ^2+\langle {F(x)},{JF(x)s}\rangle . \end{aligned}$$

Proof

The first part of the theorem, namely the bounds for \(\Vert s\Vert \), follows from (2), Lemma 2.4 and the metric properties of the orthogonal projection. Combining the first part of the theorem with Lemma 2.3, we conclude that

$$\begin{aligned} \Vert F(x+s)\Vert ^2\le&\Vert F(x)\Vert ^2+\langle {F(x)},{JF(x)s}\rangle \\&+\dfrac{\Vert s\Vert ^2}{\mu } \left[ \dfrac{L^2}{16}\Vert P(F(x))\Vert ^2+L\Vert F(x)\Vert \mu -\mu ^2\right] . \end{aligned}$$

To end the proof, note that the right-hand side of inequality (6) is the largest root of the concave quadratic \(\mu \mapsto \dfrac{L^2}{16}\Vert P(F(x))\Vert ^2+L\Vert F(x)\Vert \mu -\mu ^2\). \(\square \)

Remark 2.6

In view of Theorem 2.5, possible choices for \(\mu \) at a point x are

$$\begin{aligned} \mu =\dfrac{L}{4}\left( 2\Vert F(x)\Vert +\sqrt{4\Vert F(x)\Vert ^2+\Vert P(F(x))\Vert ^2}\right) , \end{aligned}$$
(7)

or

$$\begin{aligned} \mu =\dfrac{L}{4}(4\Vert F(x)\Vert +\Vert P(F(x))\Vert ), \end{aligned}$$
(8)

or

$$\begin{aligned} \mu =\dfrac{2+\sqrt{5}}{4}L\Vert F(x)\Vert . \end{aligned}$$
(9)

The first value is the smallest of the three, and the first two values require the computation of \(\Vert P(F(x))\Vert \), although an upper bound for such a norm can also be used; using \(\Vert P(F(x))\Vert \le \Vert F(x)\Vert \) in (7), for instance, yields (9).
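For concreteness, the three rules can be evaluated from \(F(x)\), \(JF(x)\) and an estimate of L as in the sketch below (Python/NumPy, illustrative only); the projection norm is obtained through a least-squares solve.

```python
import numpy as np

def mu_rules(Fx, Jx, L):
    """Regularization parameters given by the rules (7), (8) and (9)."""
    nF = np.linalg.norm(Fx)
    # ||P(F(x))||, with P the orthogonal projection onto range(JF(x)):
    nPF = np.linalg.norm(Jx @ np.linalg.lstsq(Jx, Fx, rcond=None)[0])
    mu7 = L / 4.0 * (2.0 * nF + np.sqrt(4.0 * nF**2 + nPF**2))
    mu8 = L / 4.0 * (4.0 * nF + nPF)
    mu9 = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF
    return mu7, mu8, mu9
```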

3 The algorithm

Now we propose and analyze the Levenberg–Marquardt algorithm with algebraic rules for computing the regularization parameter.

Algorithm 1 (pseudocode displayed as a figure)
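Since the pseudocode of Algorithm 1 is displayed only as a figure, the following Python sketch reconstructs its structure from the steps referenced below and in Sect. 6 (Step 2: stationarity test; Step 4: algebraic rule with \(L_k\) in place of L; Step 5: computation of \(s_k\); Steps 6–10: Armijo backtracking by halving; Steps 12–22: update of \(L_k\)). The test governed by \(\eta \) in the update of \(L_k\) is not reproduced in the text, so the corresponding branch below is an assumption, not the authors' rule.

```python
import numpy as np

def lm_algebraic_rules(F, JF, x0, beta=1e-4, eta=1e-4, L0=1e-6, delta=1e-8,
                       max_iter=200, tol=0.0):
    """Hypothetical sketch of Algorithm 1 (a zero tolerance mimics the
    exact stationarity test of Step 2)."""
    x, L = np.asarray(x0, dtype=float), L0
    for _ in range(max_iter):
        Fx, Jx = F(x), JF(x)
        g = Jx.T @ Fx
        if np.linalg.norm(g) <= tol:                          # Step 2
            break
        nF = np.linalg.norm(Fx)
        mu = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF              # Step 4, rule (9)
        s = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -g)   # Step 5
        t = 1.0                                               # Steps 6-10
        while (np.linalg.norm(F(x + t * s))**2
               > nF**2 + beta * t * (g @ s)) and t > 1e-20:
            t *= 0.5
        x_next = x + t * s
        if t < 1.0:                                           # Steps 12-22
            L = 2.0 * L
        elif np.linalg.norm(F(x_next))**2 <= nF**2 + eta * (g @ s):
            L = max(L / 2.0, delta)                           # assumed eta test
        x = x_next
    return x
```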

Concerning Algorithm 1, it is worth noticing that

  1. (i)

    Iteration \(\ell \) begins with \(k=\ell -1\), and ends with \(k=\ell \) if \(JF(x_{\ell -1})^T F(x_{\ell -1}) \ne 0\).

  2. (ii)

    If the algorithm does not end at iteration \(k+1\), then \(\mu _k{>}0\), \(s_k\) is well defined and is a descent direction for \(\Vert F(\cdot ) \Vert ^2\). Therefore, the Armijo line search in Steps 6–10 of Algorithm 1 terminates after finitely many steps. Altogether, Algorithm 1 is well defined and either it terminates after \(\ell \) steps with \(JF(x_{\ell -1})^T F(x_{\ell -1})=0\), or it generates infinite sequences \((x_k)\), \((s_k)\), \((t_k)\), \((\mu _k)\), \((L_k)\).

  3. (iii)

    \(L_k\) plays the role of the Lipschitz constant of \(JF\). If \(\delta {>}0\), this parameter also acts as a safeguard that prevents \(L_k\) from becoming too small.

From now on, we assume that Algorithm 1 with input \(x_0\in \mathbb {R}^n\), \(\beta \in (0,1)\), \(\eta \in [0,1)\), \(L_0{>}0\) and \(\delta \ge 0\) does not stop at Step 2 and that \((x_k)\), \((s_k)\), \((t_k)\), \((\mu _k)\), \((L_k)\) are the (infinite) sequences generated by it.

Next we analyze the basic properties of Algorithm 1.

Proposition 3.1

If \(L_k\ge L\), then \(t_k=1\) and \(L_{k+1}=\max \{L_k/2,\delta \}\).

Proof

Suppose that \(L_k\ge L\). From this assumption and the definition of \(\mu _k\) (in Step 4) we have that

$$\begin{aligned} \mu _k\ge \dfrac{L}{4}\Big (2\Vert F(x_k)\Vert +\sqrt{4\Vert F(x_k)\Vert ^2+\Vert P(F(x_k))\Vert ^2}\Big ), \end{aligned}$$

where P is the orthogonal projection onto the range of \(JF(x_k)\). The first equality of the proposition follows from the above inequality; the definition of \(s_k\) (in Step 5); Theorem 2.5 with \(\mu =\mu _k\), \(x=x_k\), \(s=s_k\); and Steps 6–10. The second equality comes from the first one and Steps 12–22. \(\square \)

Proposition 3.2

For all k,

$$\begin{aligned} \delta \le L_k\le \max \{L_0,2L\}; \end{aligned}$$
(10)

and, for infinitely many k, \(t_k=1\).

Proof

Since \(L_0\ge \delta \) and \(L_{k+1}\ge \max \{L_k/2,\delta \}\) for all k, the first inequality in (10) also holds for all k.

We will prove the second inequality by induction on k. This inequality holds trivially for \(k=0\). Assume that it holds for some k. Steps 12–22 of the algorithm imply that if \(t_k=1\), then \(L_{k+1}=L_k\) or \(L_{k+1}=\max \{\delta ,L_k/2 \}\le L_k\), and in both cases the inequality holds for \(k+1\). If \(t_k<1\), it follows from Proposition 3.1 that \(L_k< L\) and then \(L_{k+1}=2L_k\le 2L\). So the inequality holds for \(k+1\) and the induction proof is complete.

To prove the second part of the proposition, suppose by contradiction that \(t_k<1\) for all \(k\ge k_0\). Then

$$\begin{aligned} L_k=2^{k-k_0}L_{k_0}, \ k=k_0, \ k_0+1,\ldots \end{aligned}$$

in contradiction with (10). \(\square \)

From Proposition 3.2 and Step 4 of the algorithm, we have

$$\begin{aligned} \delta \Vert F(x_k)\Vert \le \mu _k \le \frac{2+\sqrt{5}}{4}\max \{L_0,2L\}\Vert F(x_k)\Vert \end{aligned}$$
(11)

for all k.

Proposition 3.3

For each k

$$\begin{aligned} \Vert F(x_{k+1})\Vert ^2&\le \Vert F(x_k)\Vert ^2+\beta t_k\langle {JF(x_k)^T F(x_k)},{s_k}\rangle \nonumber \\&\le \Vert F(x_k)\Vert ^2 -\beta t_k\frac{\Vert JF(x_k)^T F(x_k)\Vert ^2}{\Vert JF(x_k)\Vert ^2+\mu _k}. \end{aligned}$$
(12)

As a consequence, the sequence \(\left( \Vert F(x_k)\Vert ^2\right) \) is strictly decreasing and

$$\begin{aligned} \sum _{k=0}^\infty \beta t_k\frac{\Vert JF(x_k)^T F(x_k)\Vert ^2}{\Vert JF(x_k)\Vert ^2+\mu _k}\le \Vert F(x_0)\Vert ^2. \end{aligned}$$

Proof

The first inequality follows from the stopping condition for the Armijo line search (Steps 6–10). In view of the definition of \(s_k\) and \(\mu _k\), and the fact that \(\mu _k{>}0\),

$$\begin{aligned} -\langle {JF(x_k)^T F(x_k)},{s_k}\rangle&=\langle {JF(x_k)^T F(x_k)},{(JF(x_k)^TJF(x_k)+\mu _k I)^{-1}JF(x_k)^T F(x_k)}\rangle \\&\ge \dfrac{\Vert JF(x_k)^TF(x_k)\Vert ^2}{\Vert JF(x_k)^TJF(x_k)\Vert +\mu _k}, \end{aligned}$$

which trivially implies the second inequality. The last statement of the proposition follows directly from (12). \(\square \)

4 General convergence analysis

The convergence of Algorithm 1 is examined in the following results.

Proposition 4.1

If the sequence \((x_k)\) is bounded, then it has a stationary accumulation point.

Proof

By Proposition 3.2, \(t_k=1\) for infinitely many k. Since \((x_k)\) is bounded, there exists a subsequence \((x_{k_j})\) convergent to some \({\bar{x}}\), such that \(t_{k_j}=1\) for all j. Thus, by Proposition 3.3,

$$\begin{aligned} \sum _{j=1}^\infty \beta \frac{\Vert JF(x_{k_j})^T F(x_{k_j})\Vert ^2}{\Vert JF(x_{k_j})\Vert ^2+\mu _{k_j}}\le \Vert F(x_0)\Vert ^2. \end{aligned}$$

Moreover, since F and \(JF\) are continuous, \(\Vert F(x_k)\Vert \), \(\Vert JF(x_k)\Vert \) and \(\mu _k\) are bounded. Hence the sequence \((JF(x_{k_j})^TF(x_{k_j}))\) converges to 0. To end the proof, use again the continuity of F and \(JF\) to conclude that \(JF({\bar{x}})^TF({\bar{x}})=0\). \(\square \)

From the bound for the norm of F along the direction s, obtained in Lemma 2.3, we prove next that the step length is bounded away from zero.

Proposition 4.2

If \(\delta {>}0\), then

$$\begin{aligned} t_k \ge \dfrac{8\delta ^2/L^2}{1+16\delta /L} \end{aligned}$$

for all k.

Proof

For any \(t\in [0,1]\) it holds

$$\begin{aligned} \dfrac{L^2}{4}t^3\Vert s_k\Vert ^2+Lt\Vert F(x_k)\Vert -\mu _k&\le t\left( \dfrac{L^2}{4}\Vert s_k\Vert ^2+L\Vert F(x_k)\Vert \right) -\mu _k \nonumber \\&\le t\left( \dfrac{L^2}{16\mu _k}\Vert F(x_k)\Vert ^2+L\Vert F(x_k)\Vert \right) -\mu _k \nonumber \\&\le t\left( \dfrac{L^2}{16\delta }\Vert F(x_k)\Vert +L\Vert F(x_k)\Vert \right) -\delta \Vert F(x_k)\Vert \nonumber \\&=\delta \Vert F(x_k)\Vert \left[ t\left( \dfrac{L^2}{16\delta ^2}+\dfrac{L}{\delta }\right) -1\right] , \end{aligned}$$
(13)

where the first inequality comes from the bound on t, the second from Theorem 2.5 and the third from the fact that \(\mu _k\ge \delta \Vert F(x_k)\Vert \), as stated in (11).

From inequality (13) and Lemma 2.3, we have that if

$$\begin{aligned} 0\le t\le \dfrac{1}{\dfrac{L^2}{16\delta ^2}+\dfrac{L}{\delta }}=\dfrac{16\delta ^2/L^2}{1+16\delta /L}, \end{aligned}$$

then \(\Vert F(x_k+ts_k)\Vert ^2\le \Vert F(x_k)\Vert ^2+t\langle {s_k},{JF(x_k)^TF(x_k)}\rangle \). Since \(\langle {s_k},{JF(x_k)^TF(x_k)}\rangle \le 0\) and \(\beta \in (0,1)\), the Armijo condition of Steps 6–10 holds for every such t. As the backtracking of Steps 6–10 halves the trial step length starting from \(t=1\), the accepted \(t_k\) is either equal to 1 or at least half of the threshold above, which gives the stated bound. \(\square \)

We now present the global convergence result of Algorithm 1.

Proposition 4.3

If \(\delta {>} 0\), then all accumulation points of the sequence \((x_k)\) are stationary for the function \(\Vert F(x)\Vert ^2\).

Proof

Suppose that \((x_{k_j})\) converges to some \({\bar{x}}\). From Propositions 3.3 and 4.2, we have that

$$\begin{aligned} \sum _{j=1}^\infty \frac{\Vert JF(x_{k_j})^T F(x_{k_j})\Vert ^2}{\Vert JF(x_{k_j})\Vert ^2+\mu _{k_j}}<\infty . \end{aligned}$$

It follows from (11) and from the continuity of F and \(JF\) that \( \Vert JF(x_{k_j})\Vert ^2+\mu _{k_j}\) is bounded. Therefore \(JF(x_{k_j})^TF(x_{k_j})\) converges to 0. \(\square \)

A complexity result concerning the stationarity measure of Algorithm 1 is given next.

Proposition 4.4

If \(\delta {>} 0\) and \((x_k)\) is bounded, then

$$\begin{aligned} \min _{i=1,\ldots ,k} \Vert JF(x_i)^T F(x_i)\Vert ={\mathcal {O}} \left( \dfrac{1}{\sqrt{k}}\right) . \end{aligned}$$

Proof

Define

$$\begin{aligned} M=\sup _{k}\left\{ \Vert JF(x_k)\Vert ^2+\frac{2+\sqrt{5}}{4}\max \{L_0,2L\}\Vert F(x_k)\Vert \right\} . \end{aligned}$$

Then, by (11) and Propositions 3.3 and 4.2, for any k

$$\begin{aligned}&k\;\dfrac{\beta }{M} \left( \dfrac{8\delta ^2/L^2}{1+16\delta /L} \right) \min _{i=1,\ldots ,k}{\Vert JF(x_i)^TF(x_i)\Vert ^2}\\&\qquad \le \sum _{i=1}^k\beta t_i\frac{\Vert JF(x_i)^T F(x_i)\Vert ^2}{\Vert JF(x_i)\Vert ^2+\mu _i}\le \Vert F(x_0)\Vert ^2 \end{aligned}$$

and the conclusion follows. \(\square \)

5 Quadratic convergence under an error bound condition

Given \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\), consider the system

$$\begin{aligned} F(x)=0 \end{aligned}$$
(14)

and let \(X^*\) be its solution set, that is,

$$\begin{aligned} X^*=\{x\in \mathbb {R}^n\;|\; F(x)=0\}\ne \emptyset . \end{aligned}$$
(15)

Our aim is to prove that \((x_k)\) converges quadratically near solutions of (14) where, locally, \(\Vert F(x)\Vert \) provides an error bound for this system, in the sense of [17, Definition 1]. For completeness, we present this definition in the sequel (see Definition 5.3).

Define, for \(\gamma {>}0\), \(\mathbf {S}_\gamma :\mathbb {R}^n\setminus X^*\rightarrow \mathbb {R}^n\) by

$$\begin{aligned} \mathbf {S}_\gamma (x)&=-\bigg (JF(x)^TJF(x)+\gamma \Vert F(x)\Vert I\bigg )^{-1}JF(x)^TF(x). \end{aligned}$$
(16)

Auxiliary bounds are established in the next two results.

Proposition 5.1

If \(x\in \mathbb {R}^n\setminus X^*\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\), and \(x_+=x+s\) then

$$\begin{aligned} \Vert JF(x_+)^TF(x_+)\Vert \le (\gamma + L)\Vert F(x)\Vert \Vert s\Vert +\dfrac{L}{2}\Vert JF(x_+)\Vert \Vert s\Vert ^2. \end{aligned}$$

Proof

Direct algebraic manipulations yield

$$\begin{aligned} JF(x_+)^TF(x_+)&=JF(x)^T(F(x)+JF(x)s)+(JF(x_+)-JF(x))^T(F(x)+JF(x)s)\\&\quad +JF(x_+)^T[F(x_+)-(F(x)+JF(x)s)]. \end{aligned}$$

It follows from (16) that \(JF(x)^T(F(x)+JF(x)s)=-\gamma \Vert F(x)\Vert s\). Combining this result with the above equality, and using the triangle inequality, we have

$$\begin{aligned} \Vert JF(x_+)^TF(x_+) \Vert&\le \gamma \Vert F(x)\Vert \Vert s\Vert + \Vert JF(x_+)-JF(x)\Vert \Vert F(x)+JF(x)s\Vert \\&\quad + \Vert JF(x_+)\Vert \Vert F(x_+)-(F(x)+JF(x)s)\Vert \\&\le \gamma \Vert F(x)\Vert \Vert s\Vert + L\Vert F(x)\Vert \Vert s\Vert + \dfrac{L}{2}\Vert JF(x_+)\Vert \Vert s\Vert ^2, \end{aligned}$$

where the last inequality follows from the Lipschitz condition (1), Lemma 2.2 (with \(t=1\)) and Lemma 2.1. \(\square \)

Proposition 5.2

If \(x\in \mathbb {R}^n\setminus X^*\), \({\bar{x}}\in X^*\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\), and \({\bar{s}}={\bar{x}}-x\), then

  1. 1.

    \(\Vert F(x)+JF(x){\bar{s}}\Vert \le \dfrac{L}{2}\Vert {\bar{s}}\Vert ^2\);

  2. 2.

    \(\Vert F(x)+JF(x)s\Vert ^2+\Vert JF(x)({\bar{s}}-s)\Vert ^2 +\gamma \Vert F(x)\Vert (\Vert s\Vert ^2+\Vert {\bar{s}}-s\Vert ^2) \le \dfrac{L^2}{4}\Vert {\bar{s}}\Vert ^4+\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2\).

Proof

Since \(F({\bar{x}})=0\),

$$\begin{aligned} F(x)+JF(x){\bar{s}}=F(x)+JF(x)({\bar{x}}-x)-F({\bar{x}}) \end{aligned}$$

and the first inequality follows from this equality and Lemma 2.1.

Define

$$\begin{aligned} \psi _{\gamma ,x}:\mathbb {R}^n\rightarrow \mathbb {R},\quad \psi _{\gamma ,x}(u)=\Vert F(x)+JF(x)u\Vert ^2+\gamma \Vert F(x)\Vert \Vert u\Vert ^2. \end{aligned}$$

From item 1 we have

$$\begin{aligned} \psi _{\gamma ,x}({\bar{s}})\le \dfrac{L^2}{4}\Vert {\bar{s}}\Vert ^4+\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2. \end{aligned}$$

Observe that \(s=\hbox {argmin}_{u\in \mathbb {R}^n} \psi _{\gamma ,x}(u)\). Since \(\psi _{\gamma ,x}\) is a quadratic with Hessian \(2(JF(x)^TJF(x)+ \gamma \Vert F(x)\Vert I)\) and it is minimized by s,

$$\begin{aligned} \psi _{\gamma ,x}({\bar{s}})&=\psi _{\gamma ,x}(s)+\Vert JF(x)({\bar{s}}-s)\Vert ^2+\gamma \Vert F(x)\Vert \Vert {\bar{s}}-s\Vert ^2\\&=\Vert F(x)+JF(x)s\Vert ^2+\Vert JF(x)({\bar{s}}-s)\Vert ^2+\gamma \Vert F(x)\Vert \big (\Vert s\Vert ^2+\Vert {\bar{s}}-s\Vert ^2\big ) . \end{aligned}$$

The second inequality of the proposition follows from the two above relations. \(\square \)

We will analyze the local convergence of the sequence \((x_k)\) under the local error bound condition, as defined next. Such a condition is weaker than assuming nonsingularity of \(JF(x)^T JF(x)\) at solutions \(x\in X^*\).

Definition 5.3

([17, Definition 1]) Let V be an open subset of \(\mathbb {R}^n\) such that \(V\cap X^*\ne \emptyset \), where \(X^*\) is as in (15). We say that \(\Vert F(x)\Vert \) provides a local error bound on V for the system (14) if there exists a positive constant c such that

$$\begin{aligned} c\;{{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert \end{aligned}$$

for all \(x\in V\).

The next lemma, which will be instrumental in the main result of this section, was proved in [7, Corollary 2]. A proof, requiring only that the function F be continuously differentiable, is provided here for the sake of completeness.

Lemma 5.4

If \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is continuously differentiable and \(\Vert F(x)\Vert \) provides an error bound for (14) in a neighborhood V of \(x^*\in X^*\) as in Definition 5.3, then \(\Vert JF(x)^TF(x)\Vert \) also provides an error bound for (14) in some neighborhood of \(x^*\).

Proof

Suppose that \(V\subset \mathbb {R}^n\) is a neighborhood of \(x^*\) where \(\Vert F(x)\Vert \) provides an error bound for (14), that is,

$$\begin{aligned} c\,{{\mathrm{dist}}}(x,X^*) \le \Vert F(x)\Vert \quad \forall x\in V. \end{aligned}$$

Define, for \(x,y\in \mathbb {R}^n\)

$$\begin{aligned} R(y,x)=F(y)-\left( F(x)+JF(x)(y-x)\right) . \end{aligned}$$

Since F is continuously differentiable, there exists \(r{>}0\) such that \(B(x^*,r)\subseteq V\) and

$$\begin{aligned} \Vert R(y,x)\Vert \le c\Vert y-x\Vert /2\quad \forall y,x\in B(x^*,r). \end{aligned}$$

Take \(x\in B(x^*,r/2)\) and \({\bar{x}}\in \displaystyle {\arg \min _{z\in X^*}\Vert z-x\Vert }\). Then \({{\mathrm{dist}}}(x,X^*)=\Vert {\bar{x}}-x\Vert <r/2\), \({\bar{x}}\in B(x^*,r)\) and, in view of the above assumptions,

$$\begin{aligned} c\,\Vert {\bar{x}}-x\Vert \le \Vert F(x)\Vert ,\quad \Vert R({\bar{x}},x)\Vert \le c\Vert {\bar{x}}-x\Vert /2. \end{aligned}$$
(17)

Since \(F({\bar{x}})=0\), \( -JF(x)({\bar{x}}-x)=F(x)+R({\bar{x}},x)\) and

$$\begin{aligned} -\langle {JF(x)^TF(x)},{{\bar{x}}-x}\rangle =-\langle {F(x)},{JF(x)({\bar{x}}-x)}\rangle =\Vert F(x)\Vert ^2+\langle {F(x)},{R({\bar{x}},x)}\rangle . \end{aligned}$$

Using the above equalities, the Cauchy–Schwarz inequality and (17), we conclude that

$$\begin{aligned} \Vert JF(x)^TF(x)\Vert \Vert {\bar{x}}-x\Vert&\ge \Vert F(x)\Vert \big (\Vert F(x)\Vert -\Vert R({\bar{x}},x)\Vert ) \\&\ge \Vert F(x)\Vert \big (c\,\Vert {\bar{x}}-x\Vert -(c/2)\Vert {\bar{x}}-x\Vert \big ) \ge c^2\Vert {\bar{x}}-x\Vert ^2/2. \end{aligned}$$

Therefore, \(\Vert JF(x)^TF(x)\Vert \ge (c^2/2)\,{{\mathrm{dist}}}(x,X^*)\) for any \(x\in B(x^*,r/2)\). \(\square \)

From now on, in this section, \(x^*\in \mathbb {R}^n\), \(r,c{>}0\) are such that

$$\begin{aligned} F(x^*)=0,\qquad c\,{{\mathrm{dist}}}(x,X^*) \le \Vert F(x)\Vert \qquad \forall x\in B(x^*,r). \end{aligned}$$
(18)

Another auxiliary result is provided next.

Lemma 5.5

Consider \(x^*\in \mathbb {R}^n\), \(r,c{>}0\) satisfying (18). If \(x\in B(x^*,r)\setminus X^*\), \({\bar{x}}\in \displaystyle {\arg \min _{u\in X^*}\Vert u-x\Vert }\), \(\gamma {>}0\), \(s=\mathbf {S}_\gamma (x)\) and \({\bar{s}}={\bar{x}}-x\), then

  1. 1.

    \(\Vert {\bar{s}}\Vert ={{\mathrm{dist}}}(x,X^*)\le \Vert F(x)\Vert /c\);

  2. 2.

    \(\Vert s\Vert \le \left( 1+\dfrac{L^2}{4c \gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}\Vert {\bar{s}}\Vert \);

  3. 3.

    \(\Vert F(x)+JF(x)s\Vert \le \left( \dfrac{L^2}{4c^2\gamma ^2}\Vert {\bar{s}}\Vert ^2+\dfrac{1}{c\gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}\gamma \Vert F(x)\Vert \).

Moreover, if \(\Vert {\bar{s}}\Vert \le \dfrac{c \gamma }{2L^2}\) then \(\Vert F(x+s)\Vert ^2 \le \Vert F(x)\Vert ^2+\langle {s},{JF(x)^TF(x)}\rangle \).

Proof

The first relation in item 1 follows trivially from the definition of \({\bar{x}}\) while the second relation comes from (18).

From item 2 of Proposition 5.2 and item 1 of this lemma, we have that

$$\begin{aligned} \gamma \Vert F(x)\Vert \Vert s\Vert ^2&\le \dfrac{L^2}{4}\Vert {\bar{s}}\Vert ^4+\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2 \le \dfrac{L^2}{4c}\Vert F(x)\Vert \Vert {\bar{s}}\Vert ^3 +\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2 \end{aligned}$$

and

$$\begin{aligned} \Vert F(x)+JF(x)s\Vert ^2&\le \dfrac{L^2}{4}\Vert {\bar{s}}\Vert ^4+\gamma \Vert F(x)\Vert \Vert {\bar{s}}\Vert ^2\\&\le \left( \dfrac{L^2}{4c^2 \gamma ^2}\Vert {\bar{s}}\Vert ^2+\dfrac{1}{c \gamma }\Vert {\bar{s}}\Vert \right) \gamma ^2\Vert F(x)\Vert ^2, \end{aligned}$$

which trivially imply items 2 and 3, respectively.

To prove the last part of the lemma, suppose that \(\Vert {\bar{s}}\Vert \le \dfrac{c \gamma }{2L^2}\) and define

$$\begin{aligned} a=\dfrac{L^2}{4}\Vert s\Vert ^2+L\Vert F(x)+JF(x)s\Vert -\gamma \Vert F(x)\Vert ,\qquad w=\dfrac{L^2}{c \gamma }\Vert {\bar{s}}\Vert . \end{aligned}$$

From items 2 and 3, we have

$$\begin{aligned} a&\le \dfrac{L^2}{4}\left( 1+\dfrac{L^2}{4 c \gamma }\Vert {\bar{s}}\Vert \right) \Vert {\bar{s}}\Vert ^2 + L\left( \dfrac{L^2}{4 c^2\gamma ^2}\Vert {\bar{s}}\Vert ^2+\dfrac{1}{c \gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}\gamma \Vert F(x)\Vert -\gamma \Vert F(x)\Vert \\&\le \left[ \dfrac{L^2}{4c \gamma }\Vert {\bar{s}}\Vert \left( 1+\dfrac{L^2}{4 c \gamma }\Vert {\bar{s}}\Vert \right) + L\left( \dfrac{L^2}{4c^2 \gamma ^2}\Vert {\bar{s}}\Vert ^2 +\dfrac{1}{c \gamma }\Vert {\bar{s}}\Vert \right) ^{1/2}-1\right] \gamma \Vert F(x)\Vert \\&=\left[ \dfrac{w}{4}\left( 1+\dfrac{w}{4}\right) +w^{1/2}\left( 1+\dfrac{w}{4}\right) ^{1/2}-1\right] \gamma \Vert F(x)\Vert , \end{aligned}$$

where the second inequality follows from item 1, and the equality comes from the definition of w. Observe that since \(w\le 1/2\) it follows that \(a<0\). To end the proof, use the first inequality in Lemma 2.3 with \(t=1\) and \(\mu =\gamma \Vert F(x)\Vert \). \(\square \)

Note that in Algorithm 1, \(s_k=\mathbf {S}_\gamma (x_k)\) for \(\gamma =\mu _k/\Vert F(x_k)\Vert \). In order to simplify the proofs, define

$$\begin{aligned} D=\frac{2+\sqrt{5}}{4}\max \{L_0,2L\},\qquad \gamma _k=\dfrac{\mu _k}{\Vert F(x_k)\Vert }, \qquad k\in \mathbb {N}. \end{aligned}$$
(19)

In view of (11) and the definition of \(s_k\) in Algorithm 1, for all \(k\in \mathbb {N}\),

$$\begin{aligned} \delta \le \gamma _k \le D,\qquad s_k=\mathbf {S}_{\gamma _k}(x_k). \end{aligned}$$
(20)

Assuming that \(\Vert F(x)\Vert \) provides a local error bound for the zero-residual problem (14), the local convergence of Algorithm 1 is established as follows.

Theorem 5.6

Consider \(x^*\in \mathbb {R}^n\), \(r,c{>}0\) satisfying (18). There exists \({\tilde{r}}{>}0\) such that if \(x_{k_0}\in B(x^*,{\tilde{r}})\) for some \(k_0\), then either Algorithm 1 stops at some \(x_k\) that solves (14), or \((x_k)\) converges q-quadratically to some solution \({\hat{x}}\) of (14).

Proof

In view of Lemma 5.4, there exist \(0<r_1\le r\) and \(c_1{>}0\) such that

$$\begin{aligned} c_1{{\mathrm{dist}}}(x,X^*)\le \Vert JF(x)^TF(x)\Vert ,\;\; \qquad \forall x\in B(x^*,r_1). \end{aligned}$$
(21)

Define

$$\begin{aligned} M_1=\max \left\{ \left\| JF(x)\right\| \;|\; x\in B(x^*,r_1)\right\} ,\quad M_2=\dfrac{3M_1}{2 \sqrt{2}c_1} \left( D+L\left( 1+\dfrac{3}{4\sqrt{2}}\right) \right) \end{aligned}$$

and

$$\begin{aligned} \rho =\min \left\{ \dfrac{c \,\delta }{2L^2},\dfrac{\sqrt{2}}{3}r_1,\dfrac{1}{2M_2}\right\} . \end{aligned}$$

We claim that if

$$\begin{aligned}&x\in B(x^*,\rho )\setminus X^*,\;\delta \le \gamma \le D,\; s=\mathbf {S}_\gamma (x), \; x_+=x+s, \;\nonumber \\&{\bar{x}}\in \displaystyle {\arg \min _{u\in X^*}\Vert u-x\Vert }, \; {\bar{s}}={\bar{x}}-x, \end{aligned}$$
(22)

then

$$\begin{aligned}&\Vert s\Vert \le \dfrac{3}{2\sqrt{2}}{{\mathrm{dist}}}(x,X^*)\le \dfrac{3}{2\sqrt{2}}\rho ,\end{aligned}$$
(23)
$$\begin{aligned}&{{\mathrm{dist}}}(x_+,X^*)\le M_2 {{\mathrm{dist}}}(x,X^*)^2\le {{\mathrm{dist}}}(x,X^*)/2,\end{aligned}$$
(24)
$$\begin{aligned}&\Vert F(x_+)\Vert ^2\le \Vert F(x)\Vert ^2+\langle {s},{JF(x)^TF(x)}\rangle . \end{aligned}$$
(25)

The first inequality in (23) follows from item 2 of Lemma 5.5 and the definitions of \({\bar{s}}\) and \(\rho \). The second inequality comes from the inclusions \(x^*\in X^*\) and \(x\in B(x^*,\rho )\).

To prove (24) first observe that

$$\begin{aligned} \Vert x_+ - x^*\Vert \le \Vert x - x^*\Vert +\Vert s\Vert \le \rho + \dfrac{3}{2\sqrt{2}}\rho < r_1 \end{aligned}$$

which comes from the definition of \(x_+\), the triangle inequality, (23) and the definition of \(\rho \). Consequently, from (21) for \(x_+\) and Proposition 5.1,

$$\begin{aligned} {{\mathrm{dist}}}(x_+,X^*)&\le \dfrac{1}{c_1} \Vert JF(x_+)^TF(x_+)\Vert \\&\le \dfrac{1}{c_1} \left( (\gamma +L)\Vert F(x)\Vert + \dfrac{L}{2} \Vert JF(x_+)\Vert \Vert s\Vert \right) \Vert s\Vert . \end{aligned}$$

Since \(JF\) is continuous, \(F(x)=F({\bar{x}}) - \int _0^1 JF({\bar{x}}-t {\bar{s}}) \,{\bar{s}} \, dt\). As \( F({\bar{x}})=0\), we have

$$\begin{aligned} \Vert F(x)\Vert \le \left( \max _{x\in B(x^*,\rho )}\left\| JF(x)\right\| \right) \Vert \bar{s}\Vert \le M_1\Vert \bar{s}\Vert . \end{aligned}$$

Using the two relations displayed above, the bounds for \(\gamma \) in (22), (23), and the definitions of \(M_1\) and \(M_2\) we have

$$\begin{aligned} {{\mathrm{dist}}}(x_+,X^*)&\le \dfrac{3 M_1}{2 \sqrt{2} c_1} \left( D +L \left( 1 + \dfrac{3}{4 \sqrt{2}} \right) \right) \Vert {\bar{s}}\Vert ^2=M_2 {{\mathrm{dist}}}(x,X^*)^2, \end{aligned}$$

which proves the first inequality of (24). The second inequality in (24) comes from the fact that \({{\mathrm{dist}}}(x,X^*) \le \rho \) and from the definition of \(\rho \).

Our third claim, (25), follows directly from the last result of Lemma 5.5.

Next we define a family of sets on \(\mathbb {R}^n\) which are, in some sense, well behaved with respect to Algorithm 1. Define

$$\begin{aligned} W_{\tau }&=\left\{ x\in \mathbb {R}^n\;\left| \; \Vert x-x^*\Vert \le \dfrac{3 \tau +\sqrt{2}}{3+\sqrt{2}}\rho , \; {{\mathrm{dist}}}(x,X^*)\le \dfrac{(1-\tau )\sqrt{2}}{3+\sqrt{2}}\rho \right\} \right. ,\;\;\\ W&=\bigcup _{0\le \tau <1}W_\tau , \end{aligned}$$

and let

$$\begin{aligned} {\tilde{r}}=\dfrac{\sqrt{2}}{3+\sqrt{2}}\,\rho . \end{aligned}$$

Since \(x^*\in X^*\), we have \(W_0=B(x^*,{\tilde{r}})\). Hence

$$\begin{aligned} B\left( x^*,{\tilde{r}}\right) \subset W\subset B(x^*,\rho )\subset B(x^*,r_1). \end{aligned}$$

Let \(x_k\in W\). If \(JF(x_k)^TF(x_k)=0\) then the algorithm stops at \(x_k\) and, in view of (21), \(F(x_k)=0\). Next we analyze the case \(JF(x_k)^TF(x_k)\ne 0\). It follows from (20) and (25) that

$$\begin{aligned} x_{k+1}=x_k+\mathbf {S}_{\gamma _k}(x_k)=x_k+s_k. \end{aligned}$$

As \(x_k\in W\), we have \(x_k\in W_\tau \) for some \(0\le \tau <1\). Therefore, from the triangle inequality, the definition of \(W_\tau \), and (23)

$$\begin{aligned} \Vert x_{k+1}-x^*\Vert&\le \Vert x_k-x^*\Vert +\Vert s_k\Vert \\&\le \dfrac{3 \tau +\sqrt{2}}{3+\sqrt{2}}\rho +\dfrac{3}{2\sqrt{2}}{{\mathrm{dist}}}(x_k,X^*)\\&\le \dfrac{3 \tau +\sqrt{2}}{3+\sqrt{2}}\rho +\dfrac{3}{2\sqrt{2}} \dfrac{(1-\tau )\sqrt{2}}{3+\sqrt{2}}\rho =\dfrac{3\left( \frac{1+\tau }{2}\right) +\sqrt{2}}{3+\sqrt{2}}\;\rho . \end{aligned}$$

Additionally, from (24) and the definition of \(W_\tau \) we also have

$$\begin{aligned} {{\mathrm{dist}}}(x_{k+1}, X^*)&\le \dfrac{1}{2}{{\mathrm{dist}}}(x_k,X^*) \le \dfrac{(1-\tau )\sqrt{2}}{2(3+\sqrt{2})}\rho =\dfrac{\left( 1-\frac{1+\tau }{2}\right) \sqrt{2}}{3+\sqrt{2}}\rho . \end{aligned}$$

Altogether we proved that

$$\begin{aligned} 0\le \tau <1,\;x_k\in W_\tau ,JF(x_k)^TF(x_k)\ne 0 \Rightarrow \;x_{k+1}\in W_{\frac{1+\tau }{2}}. \end{aligned}$$

Suppose that \(x_{k_0}\in B(x^*,{\tilde{r}})\). We have just proved that in this case either the algorithm stops at some \(x_k\in X^*\) or an infinite sequence is generated and \(x_k\in B(x^*,\rho )\) for \(k\ge k_0\). Assume that an infinite sequence is generated and define

$$\begin{aligned} d_k={{\mathrm{dist}}}(x_k,X^*). \end{aligned}$$

It follows from (23) and (24) that for \(k \ge k_0\),

$$\begin{aligned} \Vert s_k\Vert \le \dfrac{3}{2\sqrt{2}}d_k, \quad d_{k+1}\le M_2d_k^2\le \dfrac{d_k}{2}. \end{aligned}$$
(26)

Hence, \(\sum _{j=k_0}^\infty \Vert x_{j+1}-x_j\Vert \le \frac{3}{2\sqrt{2}}\sum _{j=k_0}^\infty d_j\le \frac{3}{\sqrt{2}}d_{k_0} \), so \((x_k)\) is a Cauchy sequence and converges to some \({\hat{x}}\). Since \(d_k={{\mathrm{dist}}}(x_k,X^*)\) converges to 0, \(F({\hat{x}})=0\), that is, \({\hat{x}}\in X^*\). By the triangle inequality and the first inequality in (26),

$$\begin{aligned} \Vert x_k-{\hat{x}}\Vert \le \Vert s_k\Vert +\sum _{j=k+1}^\infty \Vert s_j\Vert \le \dfrac{3}{2\sqrt{2}}\left[ d_k+\sum _{j=k+1}^\infty d_j\right] . \end{aligned}$$

From the last two inequalities in (26) we have

$$\begin{aligned} \sum _{j=k+1}^\infty d_j \le \sum _{j=k}^\infty M_2d_j^2\le M_2d_k\sum _{j=k}^\infty d_j \le 2M_2d_k^2\le d_k. \end{aligned}$$

Therefore, from the two relations displayed above,

$$\begin{aligned} d_k\le \Vert x_k-{\hat{x}}\Vert \le \dfrac{3}{\sqrt{2}} d_k, \end{aligned}$$

where the first inequality comes from the inclusion \({\hat{x}}\in X^*\). Consequently, using (26) again and the definition of \(d_k\), we conclude that

$$\begin{aligned} \Vert x_{k+1}-{\hat{x}}\Vert \le \dfrac{3}{\sqrt{2}}d_{k+1} \le \dfrac{3M_2}{\sqrt{2}}d_k^2 \le \dfrac{3M_2}{\sqrt{2}}\Vert x_k-{\hat{x}}\Vert ^2, \end{aligned}$$

ensuring the q-quadratic convergence and completing the proof. \(\square \)

6 Numerical experiments

In this section we describe numerical experiments to illustrate the practical performance of the algorithm. We start with the algorithmic and computational choices for obtaining the numerical results.

6.1 On the choice of \(\mu _k\)

Similarly to the analysis developed for the unconstrained minimization problem (cf. [11, Prop. 1.2]), since the positive scalar \(\mu \) is a lower bound for the smallest eigenvalue of \(JF(x)^TJF(x) + \mu I\), it follows that

$$\begin{aligned} \mu \Vert s\Vert ^2 \le \langle s, \big (JF(x)^TJF(x) + \mu I\big )s \rangle =- \langle s, JF(x)^TF(x) \rangle \le \Vert s\Vert \Vert JF(x)^TF(x)\Vert \end{aligned}$$

and so \(\Vert s \Vert \le \frac{\Vert JF(x)^TF(x)\Vert }{\mu }\).

By Lemma 2.3, with \(t=1\),

$$\begin{aligned} \Vert F(x+s)\Vert ^2\le \Vert F(x)\Vert ^2+\langle {F(x)},{JF(x)s}\rangle + \Vert s\Vert ^2\left[ \dfrac{L^2}{4}\Vert s\Vert ^2+L\Vert F(x)\Vert -\mu \right] . \end{aligned}$$

In order to obtain \(\mu \) ensuring \(\dfrac{L^2}{4}\Vert s\Vert ^2+L\Vert F(x)\Vert -\mu \le 0,\) the previous bound on the step provides the sufficient condition

$$\begin{aligned} L^2 \Vert JF(x)^T F(x)\Vert ^2+4L\Vert F(x)\Vert \mu ^2-4\mu ^3\le 0. \end{aligned}$$
(27)

Let \(\mu ^2=z\) and consider the concave function \(\psi :\mathbb {R}_+\rightarrow \mathbb {R}\) given by

$$\begin{aligned} \psi (z)=L^2 \Vert JF(x)^T F(x)\Vert ^2+4L\Vert F(x)\Vert z-4z^{3/2}. \end{aligned}$$

Consider \(z_0=L^2\Vert F(x)\Vert ^2\). Note that \( \psi (z_0)=L^2\Vert JF(x)^T F(x)\Vert ^2 \ge 0\).

An iteration of Newton’s method from \(z_0\) gives us

$$\begin{aligned} z_1=z_0-\dfrac{\psi (z_0)}{\psi '(z_0)} =L^2\Vert F(x)\Vert ^2 + \dfrac{L\Vert JF(x)^T F(x)\Vert ^2}{2\Vert F(x)\Vert }. \end{aligned}$$

Since \(\psi \) is concave, it lies below its tangent line at \(z_0\), and this tangent is nonpositive for all \(z\ge z_1\). Then (27) is guaranteed for all

$$\begin{aligned} \mu \ge \sqrt{L^2\Vert F(x)\Vert ^2+ \dfrac{L\Vert JF(x)^T F(x)\Vert ^2}{2\Vert F(x)\Vert }} \;{\mathop {=}\limits ^{\mathrm{def}}} \; \mu _J. \end{aligned}$$

Observe that \(L \Vert F(x)\Vert \le \mu _J\) and, for \( \mu ^+=\frac{2+\sqrt{5}}{4}L \Vert F(x)\Vert \) as in (9), \( \frac{4}{2+\sqrt{5}} \,\mu ^+ \le \mu _J\). Nevertheless, there is no guarantee that \(\mu _J \le \mu ^+\) holds, so we set \( \mu _k=\min \{\mu ^+, \mu _J\}\), with the right-hand sides evaluated at \(x=x_k\) and \(L=L_k\).
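A small sketch of this choice follows (Python/NumPy, illustrative only); here L stands for the current estimate \(L_k\), and the point x is assumed to be non-stationary with \(F(x)\ne 0\).

```python
import numpy as np

def mu_section_6_1(Fx, Jx, L):
    """mu = min{mu_plus, mu_J}, as described above (a sketch)."""
    nF = np.linalg.norm(Fx)                 # ||F(x)||, assumed nonzero
    ng = np.linalg.norm(Jx.T @ Fx)          # ||JF(x)^T F(x)||
    mu_plus = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF       # rule (9)
    mu_J = np.sqrt(L**2 * nF**2 + L * ng**2 / (2.0 * nF))
    return min(mu_plus, mu_J)
```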

6.2 On the computation of the step \(s_k\)

The equivalence between the problems

$$\begin{aligned} \min _s \frac{1}{2} \Vert J_k s + F_k \Vert ^2 + \frac{\mu _k}{2} \Vert s\Vert ^2 \quad \hbox {and} \quad \min _s \frac{1}{2} \left\| \left( \begin{array}{c} J_k \\ \sqrt{\mu _k} I \end{array}\right) s + \left( \begin{array}{c} F_k \\ 0 \end{array}\right) \right\| ^2 \end{aligned}$$

implies that the system

$$\begin{aligned} \left( J_k^T J_k + \mu _k I\right) s + J_k^TF_k=0 \end{aligned}$$
(28)

may be handled by the normal equations as follows (cf. [14])

$$\begin{aligned} \big (J_k^T \; \sqrt{\mu _k} I\big ) \left[ \left( \begin{array}{c} J_k \\ \sqrt{\mu _k} I \end{array}\right) s + \left( \begin{array}{c} F_k \\ 0 \end{array}\right) \right] =0. \end{aligned}$$

As an alternative to the Cholesky factorization of \(J_k^T J_k + \mu _k I\), the QR factorization of \(\left( \begin{array}{c} J_k \\ \sqrt{\mu _k} I \end{array} \right) \) avoids performing the product \(J_k^TJ_k\):

$$\begin{aligned} \left( \begin{array}{c} J_k \\ \sqrt{\mu _k} I \end{array} \right) =Q \left( \begin{array}{c} R_{\mu } \\ 0 \end{array} \right) , \end{aligned}$$

where \(Q \in \mathbb {R}^{(m+n) \times (m+n)}\) is orthogonal and \(R_{\mu } \in \mathbb {R}^{n \times n}\) is upper triangular. Indeed, the economic version of the factorization may be used, and just the first n columns of the orthogonal matrix Q are computed. The direction s is thus obtained from the system (28) by means of the relationship

$$\begin{aligned} J_k^T J_k + \mu _k I=R_{\mu }^T R_{\mu }. \end{aligned}$$
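The two alternatives discussed above can be sketched as follows (Python with NumPy/SciPy, illustrative only): the first routine solves the normal equations (28) through a Cholesky factorization of \(J_k^TJ_k+\mu _k I\), while the second uses the economic QR factorization of the stacked matrix, avoiding the product \(J_k^TJ_k\).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, qr, solve_triangular

def lm_step_cholesky(J, Fv, mu):
    """Solve (J^T J + mu I) s = -J^T F via a Cholesky factorization."""
    c_and_lower = cho_factor(J.T @ J + mu * np.eye(J.shape[1]))
    return cho_solve(c_and_lower, -J.T @ Fv)

def lm_step_qr(J, Fv, mu):
    """Same step via the economic QR of [J; sqrt(mu) I]."""
    m, n = J.shape
    A = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    b = np.concatenate([Fv, np.zeros(n)])
    Q, R = qr(A, mode='economic')         # only the first n columns of Q
    return solve_triangular(R, -Q.T @ b)  # R^T R = J^T J + mu I
```

Both routines return the same step up to rounding errors; the QR variant tends to be better conditioned when \(J_k\) is ill-conditioned and \(\mu _k\) is small.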

6.3 About the backtracking scheme

Besides the simple bisection of Steps 6–10 of Algorithm 1, we have implemented a quadratic–cubic interpolation scheme (cf. [3, §6.3.2]). As this scheme performed slightly better in preliminary experiments, it was used in the reported results.

6.4 About the initialization of the sequence \((L_k)\)

We set \(L_0=10^{-6}\) and \(L_{\min }=10^{-12}\), compute \(x_1\) and define \(L_1=\max \left\{ \frac{\Vert JF(x_1)-JF(x_0)\Vert _F}{\sqrt{n} \Vert x_1-x_0\Vert }, L_{\min } \right\} \). After that, the updating scheme follows Steps 12–22 of Algorithm 1.
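In code, this initialization reads as follows (a sketch; JF0 and JF1 denote the Jacobian matrices at \(x_0\) and at the computed \(x_1\)).

```python
import numpy as np

def initial_L1(JF0, JF1, x0, x1, L_min=1e-12):
    """Finite-difference estimate of the Lipschitz constant of JF,
    scaled by 1/sqrt(n) and safeguarded from below by L_min."""
    n = x0.size
    num = np.linalg.norm(JF1 - JF0, ord='fro')
    return max(num / (np.sqrt(n) * np.linalg.norm(x1 - x0)), L_min)
```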

6.5 The results

The tests were performed on a DELL Inspiron 5000 (i15 5547-A30) notebook, with an Intel Core i7-4510U CPU @2.00GHz \(\times \) 4 and 16GB of RAM, 64-bit, using Matlab 2014a (v. 8.3).

The set of test problems consists of all 53 problems from the CUTEst collection [9] for which the nonlinear equations \(F(x)=0\) are recast from feasibility instances, i.e., problems without an objective function, with nonlinear equality constraints, without fixed variables, and with an available Jacobian matrix. The constraint bodies and Jacobians were evaluated in sparse format. The initial point \(x_0\) was always the default of the collection.

To put our approach in perspective, the same problems were addressed by two other solvers. The first is the self-adaptive Levenberg–Marquardt algorithm of Fan and Pan [5], which is close to the scheme proposed in this paper. The second is the modular code lsqnonlin (available within Matlab), based on the Levenberg–Marquardt method [12, 13, 15]. The remaining parameters of Algorithm 1 were set to \(\beta =10^{-4}\), \( \eta =10^{-4}\) and \(\delta =10^{-8}\); these are the default values suggested in [5] for the parameters that play a similar role to ours. The choices of Fan and Pan denoted by \(p_0\) and m correspond to our \(\eta \) and \(\delta \), respectively. Moreover, the parameter \(\delta \) of Fan and Pan was set to 1.

Summing up, there are four strategies under analysis:

  • CH: Algorithm 1 with Cholesky factorization for computing \(s_k\);

  • QR: Algorithm 1 with QR factorization for computing \(s_k\);

  • FP: Fan and Pan’s algorithm [5, Alg. 2.1], with Cholesky factorization for computing the step and the aforementioned parameters and values;

  • LM: The Levenberg–Marquardt algorithm (an option of the routine lsqnonlin of Matlab).

Concerning the implemented stopping criteria, we have adopted the same numbering of the exit flags as the routine lsqnonlin. Setting \(\varepsilon _\mathrm{mac}\) as the machine precision, the stopping criteria common to the four strategies were the following:

  1. (1)

    Convergence to a solution with relative stationarity: \(\Vert JF(x_k)^TF(x_k)\Vert _\infty \le 10^{-10} \max \{\Vert JF(x_0)^TF(x_0)\Vert _\infty ,\sqrt{\varepsilon _\mathrm{mac}}\}\);

  2. (2)

    Change in x (too small): \(\Vert x_{k+1}-x_k\Vert \le 10^{-9}\left( \sqrt{\varepsilon _{\mathrm{mac}}}+\Vert x_{k+1}\Vert \right) \);

  3. (3)

    Change in the norm of the residual (too small): \(| \Vert F(x_{k+1})\Vert ^2 - \Vert F(x_{k})\Vert ^2| \le 10^{-6} \Vert F(x_{k})\Vert ^2\);

  4. (4)

    Computed search direction (too small): \(\Vert s_k\Vert \le 10^{-9}\);

  5. (0)

    Maximum number of function evaluations exceeded: \(\max \{2000,10n\}\).

Besides, as strategy \(\mathtt{FP}\) does not compute the step length \(t_k\), the next criterion was included only for the strategies \(\mathtt{CH}\), \(\mathtt{QR}\) and \(\mathtt{LM}\):

(-4):

Line search failed, as the step length is too small: \(|t_k | \le 10^{-15}\).

Let \(f^*\) be the objective function value obtained by a strategy S when this strategy is applied to a given problem with the default initial point. We consider that the strategy S has found a solution (cf. [1]) if

$$\begin{aligned} \frac{f^* - f_{\min }}{\max \{1, f_{\min } \}} \le 10^{-4}, \end{aligned}$$
(29)

where \(f_{\min }\) is the smallest function value found among all the strategies under comparison.
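For reference, the test (29) amounts to the following check (a sketch; f_star and f_min are the values defined above).

```python
def is_solved(f_star, f_min, tol=1e-4):
    """Criterion (29): the strategy is considered to have found a solution."""
    return (f_star - f_min) / max(1.0, f_min) <= tol
```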

All outputs are reported in the Appendix. Table 1 lists the 53 test problems. The table columns display the name of the problem; the dimensions (n and m); the number of iterations (\(\#\)Iter); the number of function evaluations (\(\#\)Fun); the function value at the last iterate (\(f^*\)); the CPU time in seconds (CPU); and the reason for stopping (exit). We observe that the stopping criteria 2 and -4 were never activated during the tests. The results of the four solvers CH, QR, FP and LM, respectively, are displayed row by row for each problem.

As the CPU time usually varies from one execution of an algorithm to another, for each problem we ran all the solvers nine times and considered the average CPU time of the last eight runs, discarding that of the first one. The symbol \(\dagger \) indicates that the obtained solution does not satisfy (29). It is worth mentioning that \(\mathtt{CH}\) and \(\mathtt{FP}\) performed one Cholesky factorization per iteration. The only exceptions occurred at a single iteration of problem 10FOLDTR for \(\mathtt{CH}\) and of problems 10FOLDTR and ARGLBLE for \(\mathtt{FP}\). In these three instances, both strategies required an additional Cholesky factorization with a slight increase in \(\mu _k\) to ensure the numerically safe positive definiteness of \(J_k^TJ_k + \mu _kI\).

Fig. 1 Performance profiles for the number of iterations, the number of function evaluations and the required CPU time

The results corresponding to the solved problems are depicted in the performance profiles of Fig. 1 for the number of iterations, the number of function evaluations and the required CPU time. A logarithmic scale was used on the horizontal axis for better visualization of the results. In terms of efficiency, our strategies slightly outperformed \(\mathtt{FP}\) and \(\mathtt{LM}\) with regard to the number of iterations and function evaluations. Indeed, \(52.8\,\%\) of the problems were solved with the fewest iterations by both \(\mathtt{CH}\) and \(\mathtt{QR}\), against \(35.8\,\%\) for \(\mathtt{FP}\) and \(49.1\,\%\) for \(\mathtt{LM}\). Moreover, \(49.1\,\%\) of the problems were solved with the fewest function evaluations by both \(\mathtt{CH}\) and \(\mathtt{QR}\), against \(43.4\,\%\) for \(\mathtt{FP}\) and \(\mathtt{LM}\). Concerning the CPU time, the shortest time was never attained by \(\mathtt{QR}\), whereas the strategies \(\mathtt{CH}\), \(\mathtt{FP}\) and \(\mathtt{LM}\) solved, respectively, 35.8, 41.5 and \(22.6\,\%\) of the problems in the shortest time. Furthermore, \(\mathtt{CH}\) and \(\mathtt{FP}\) solved \(70\,\%\) of the problems using no more than twice the best CPU time. Strategies \(\mathtt{CH}\) and \(\mathtt{QR}\) are the most robust, solving 51 of the 53 problems. Problems EIGENB and YATP2SQ were considered not solved by \(\mathtt{CH}\), whereas 10FOLDTR and YATP2SQ were not solved by \(\mathtt{QR}\). On the other hand, \(\mathtt{LM}\) did not solve three problems (10FOLDTR, CYCLIC3, YATP2SQ), while \(\mathtt{FP}\) did not solve four problems (10FOLDTR, ARWHDNE, CYCLIC3, EIGENB), showing that our approach is competitive.

To illustrate the rate of convergence attained with the proposed algebraic rules, Fig. 2 shows the logarithm (base 10) of the squared residual value against the iterations for two typical problems.

Fig. 2 The residual value \(\Vert F(x_k)\Vert ^2\) against the iterations, for the problems EIGENC (left) and POWELLBS (right)

7 Final remarks

We have proposed and analyzed a class of Levenberg–Marquardt methods in which algebraic rules for computing the regularization parameter are devised. Under the Lipschitz continuity of the Jacobian of the residual function, the algebraic rules are designed so that the full LM step is accepted by the Armijo sufficient decrease condition. In terms of global convergence, all the accumulation points of the sequence generated by the algorithm are stationary. As for the local convergence for the zero-residual problem, we have proved a q-quadratic rate under an error bound condition, which is less restrictive than assuming nonsingularity at the solution. A set of numerical experiments illustrates the practical performance of the proposed algorithm. Our approach has shown both efficiency and robustness on the 53 feasibility instances from the CUTEst collection, performing slightly better than both the algorithm proposed by Fan and Pan in [5] and the routine lsqnonlin of Matlab. Obtaining inexact solutions for the linear systems, with adequately matching stopping criteria, as well as considering nonzero-residual problems in the local convergence analysis, are topics for future investigation.