1 Introduction

In this paper, we consider the system of nonlinear equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1)

where \(F(x):{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) is a continuously differentiable function. We assume that the system of nonlinear Eq. (1) has a nonempty solution set \(X_*\); throughout, \(\Vert \cdot \Vert \) denotes the Euclidean 2-norm.

The system of nonlinear equations is a cornerstone of many fields of science, including engineering, finance, and machine learning. Many well-known iterative methods have been developed to find one or more solutions of the system of nonlinear Eq. (1); they may be summarized by two strategies: the line search and the trust region paradigms. A line search strategy computes a descent direction and a suitable stepsize to obtain a new estimate of the minimum point, while a trust region method chooses a trial step within a region around the current iterate in which the model is trusted. Both strategies include globally convergent line search and trust region methods, Newton-based methods, conjugate gradient methods, and the quasi-Newton BFGS, DFP and SR1 methods, which are applicable to systems of nonlinear equations and unconstrained optimization [1,2,3]. Moreover, iterative techniques are widely used not only for solving systems of nonlinear equations and unconstrained optimization problems, but also for solving matrix equations. Some gradient-based and least-squares-based iterative methods have been used to find the solutions of matrix equations that are often encountered in systems and control [4, 5].

Although there are many efficient iterative methods for finding a solution of a system of nonlinear equations, none of them is guaranteed to obtain solutions for all such systems; many iterative methods are specialized for particular classes of problems. Therefore, how to apply numerical methods appropriately is an important problem that has attracted many researchers. According to the characteristics of the nonlinear problem at hand, a suitable iterative method should be chosen for the class of systems of nonlinear equations arising in applied science, and there have been many works on this recently. For instance, in system identification, many identification methods and parameter estimation approaches have been developed for linear systems, bilinear systems [6] and nonlinear systems. Among them, many gradient-based and least-squares-based methods have been widely used for stochastic systems [7, 8], parameter identification [9], dual-rate systems [10, 11], and state space systems [12], as well as many identification methods for Hammerstein nonlinear systems [13].

In this paper, we focus on numerical methods for solving the system of nonlinear equations. Due to the characteristics of different nonlinear problems, many iterative algorithms cannot yet be utilized efficiently, and many improved modifications of the classical iterative methods have therefore been proposed [14,15,16]. Here, we improve the classical Levenberg–Marquardt method for solving the system of nonlinear equations and discuss its convergence under some conditions.

The rest of this paper is organized as follows. In the next section, we propose the modified Levenberg–Marquardt algorithm for the system of nonlinear Eq. (1). Then, in Sect. 3, the convergence rate of the algorithm is discussed under the \(\gamma \)-Hölderian local error bound condition and the v-Hölderian continuity of the Jacobian. In Sect. 4, the global convergence result is given when a line search is used. Some numerical results are reported in Sect. 5. Finally, we conclude the paper in Sect. 6.

2 The modified Levenberg–Marquardt method

One of the most well-known methods for solving the system of nonlinear Eq. (1) is the Levenberg–Marquardt method (hereinafter referred to as the LM method) [17, 18], which computes the following trial step at every iteration

$$\begin{aligned} d_k=-\left( J_k^TJ_k+\lambda _kI\right) ^{-1}J_k^TF_k, \end{aligned}$$
(2)

where \(F_k=F(x_k)\), \(J_k=J(x_k)=F'(x_k)\) is the Jacobian, and \(\lambda _k\) is the LM parameter updated by an appropriate rule. The LM method converges quadratically when the Jacobian J(x) is Lipschitz continuous and nonsingular at a solution. To improve the computational efficiency and theoretical results of the LM method, the choices of the LM parameter and the convergence conditions have been discussed intensively in the recent literature [19].
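For readers who wish to experiment, the trial step (2) amounts to solving the regularized normal equations \(\left( J_k^TJ_k+\lambda _kI\right) d_k=-J_k^TF_k\). The following minimal sketch (our illustration, not part of any published code; all names are ours) computes one such step with NumPy.

```python
import numpy as np

def lm_trial_step(F_k, J_k, lam_k):
    """Solve (J_k^T J_k + lam_k * I) d_k = -J_k^T F_k, i.e. the LM trial step (2)."""
    n = J_k.shape[1]
    A = J_k.T @ J_k + lam_k * np.eye(n)   # symmetric positive definite for lam_k > 0
    return np.linalg.solve(A, -(J_k.T @ F_k))
```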

Let \(N(x_*,b)=\left\{ x\mid \Vert x-x_*\Vert \le b\right\} \subset {\mathbb {R}}^n\), where \(0<b<1\) is a constant, be a neighbourhood of \(x_*\in X_*\). By choosing the LM parameter as \(\lambda _k=\Vert F_k\Vert ^2\), Yamashita and Fukushima [20] showed that the LM method has quadratic convergence under the following local error bound condition, which is weaker than nonsingularity:

$$\begin{aligned} c~\textrm{dist}\left( x,X_*\right) \le \Vert F(x)\Vert ,\quad \forall x\in {N(x_*,b)}, \end{aligned}$$
(3)

where c is a positive constant, and \(\textrm{dist}\left( x,X_*\right) =\inf _{{\bar{x}}\in X_*}\Vert x-{\bar{x}}\Vert \) means the distance from x to \(X_*\). It is obvious that, if \(J(x_*)\) is nonsingular at a solution \(x_*\) of (1), then \(x_*\) is an isolated solution and \(\Vert F(x)\Vert \) provides a local error bound on some neighbourhood of \(x_*\).

Although the local error bound condition (3) is weaker than nonsingularity of the Jacobian \(J_k\), it still fails for some ill-conditioned nonlinear equations arising in application fields such as biochemical systems. The following condition, called the \(\gamma \)-Hölderian local error bound, has therefore been proposed:

$$\begin{aligned} c~\textrm{dist}(x,X_*)\le \Vert F(x)\Vert ^\gamma ,\quad \forall x\in N(x_*,b), \end{aligned}$$
(4)

where \(\gamma \in (0,1]\) and \(c>0\) is a constant. For example, for the Powell singular function [21], the local error bound condition (3) is not valid, but the \(\gamma \)-Hölderian local error bound condition (4) holds with \(\gamma =1/2\) around the origin [22].
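A simpler one-dimensional illustration of the difference between the two conditions: for \(F(x)=x^2\) we have \(X_*=\{0\}\) and \(\textrm{dist}(x,X_*)=|x|\), so no constant \(c>0\) satisfies \(c|x|\le \Vert F(x)\Vert =x^2\) near the origin, i.e. (3) fails, whereas \(\Vert F(x)\Vert ^{1/2}=|x|\) shows that (4) holds with \(\gamma =1/2\) and \(c=1\).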

The \(\gamma \)-Hölderian local error bound condition (4) is obviously a generalization of the local error bound condition (3), which relaxes the exponent \(\gamma \) from 1 to the interval (0, 1]. Recently, under the \(\gamma \)-Hölderian local error bound condition, the convergence of some modified LM methods has been discussed with several choices of the LM parameter [23,24,25,26].

From the definition of the LM trial step (2), it can be seen that the choice of the LM parameter \(\lambda _k\) has a great influence on the computational efficiency of the LM method. Therefore, in order to improve the computational efficiency, many choices of the LM parameter, besides \(\lambda _k=\Vert F_k\Vert ^2\) mentioned above, have been proposed. For example, Fan and Yuan [27] chose \(\lambda _k=\Vert F_k\Vert \), Dennis and Schnabel [28] employed \(\lambda _k=O(\Vert J_k^TF_k\Vert )\), Fischer [29] took \(\lambda _k=\Vert J_k^TF_k\Vert \), and Ma and Jiang [30] proposed the convex combination of the choices \(\Vert F_k\Vert \) and \(\Vert J_k^TF_k\Vert \), i.e. \(\lambda _k=\theta \Vert F_k\Vert +(1-\theta )\Vert J_k^TF_k\Vert \) with \(\theta \in [0,1]\). Other choices have also been given [31,32,33]. To make the LM parameter belong to an interval proportional to \(\Vert F_k\Vert \), some algebraic rules for updating it have also been given [34, 35].

Since a fixed choice of the LM parameter may behave poorly when the sequence \(\{x_k\}\) generated by the LM method is very close to, or very far away from, the solution set, Fan et al. [36] and Chen [37] extended the choice \(\lambda _k=\Vert F_k\Vert ^2\) to \(\lambda _k=\Vert F_k\Vert ^\delta \) with \(\delta \in [1,2]\). Inspired by these observations, we consider the choice

$$\begin{aligned} \lambda _k=\theta \Vert F_k\Vert ^\delta +(1-\theta )\Vert J_k^TF_k\Vert ^\delta , \end{aligned}$$
(5)

where \(\delta \in [1,2]\) and \(\theta \in [0,1]\). It is clear that, if we take \(\delta =1\), the choice (5) reduces to that of Ma and Jiang [30], and if \(\delta =\theta =1\), it reduces to that of Fan and Yuan [27], etc. It can also be viewed as a convex combination of \(\Vert F_k\Vert ^\delta \) and \(\Vert J_k^TF_k\Vert ^\delta \). Therefore, the choice (5) can be regarded as an extension of many other choices.

Now, we consider the following modified LM algorithm, with the LM parameter chosen as in (5), for the system of nonlinear Eq. (1).

[Algorithm 1: the modified Levenberg–Marquardt method with LM parameter (5) and unit step size]
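Since the algorithm box is not reproduced above, the following Python sketch illustrates one way Algorithm 1 can be realised: the LM parameter is computed from (5), the trial step from (2), and the full (unit) step is taken at every iteration, which is the variant analysed in Sect. 3. The function names, the stopping test and the iteration limit are our own illustrative assumptions, not the exact statement of Algorithm 1.

```python
import numpy as np

def modified_lm(F, J, x0, theta=0.5, delta=1.0, tol=1e-6, max_iter=1000):
    """Sketch of Algorithm 1: modified LM iteration with unit step size.

    F : callable, F(x) -> 1-D array (the residual)
    J : callable, J(x) -> 2-D array (the Jacobian)
    The LM parameter follows (5): theta*||F_k||^delta + (1-theta)*||J_k^T F_k||^delta.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk                                   # J_k^T F_k
        if np.linalg.norm(g) <= tol:                    # stopping rule used in Sect. 5
            break
        lam = (theta * np.linalg.norm(Fk) ** delta
               + (1.0 - theta) * np.linalg.norm(g) ** delta)      # LM parameter (5)
        d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(x.size), -g)  # trial step (2)
        x = x + d                                       # unit step size
    return x, k
```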

To broaden the theoretical applicability of the modified LM method (Algorithm 1), we analyse its convergence under the \(\gamma \)-Hölderian local error bound condition (4) instead of the commonly used local error bound condition (3).

3 Convergence rate

The following assumptions are needed throughout the paper to establish the convergence of Algorithm 1.

Assumption 1

The following conditions hold for some \(x_*\in X_*\).

  1. (a)

    F(x) admits a \(\gamma \)-Hölderian local error bound property in the neighbourhood of \(x_*\in X_*\), where \(\gamma \in (0,1]\), i.e., there exists a positive constant \(c>0\) such that

    $$\begin{aligned} c~\textrm{dist}(x,X_*)\le \Vert F(x)\Vert ^\gamma ,\quad \forall x\in N(x_*,b). \end{aligned}$$
    (6)
  2. (b)

    J(x) is v-Hölderian continuous, where \(v\in (0,1]\), i.e., there exists a constant \(\kappa _1>0\) such that

    $$\begin{aligned} \Vert J(x)-J(y)\Vert \le \kappa _1\Vert x-y\Vert ^v,\quad \forall x,y\in N(x_*,b). \end{aligned}$$
    (7)

Remark 1

If \(\gamma =1\), then the \(\gamma \)-Hölderian local error bound condition (6) will be the local error bound condition (3). If \(v=1\), then the v-Hölderian continuity of the Jacobian (7) will be Lipschitz continuity of the Jacobian.

It follows from Assumption 1 and the mean value theorem that

$$\begin{aligned} \Vert F(y)-F(x)-J(x)(y-x)\Vert&=\left\| \int _0^1J\left( x+t(y-x)\right) (y-x)dt-J(x)(y-x)\right\| \nonumber \\&\le \Vert y-x\Vert \int _0^1\left\| J\left( x+t(y-x)\right) -J(x)\right\| dt\nonumber \\&\le \frac{\kappa _1}{1+v}\Vert y-x\Vert ^{1+v} \quad \hbox { for all}\ x,y\in N(x_*,b). \end{aligned}$$
(8)

It can also be seen that there is a constant \(\kappa _2>0\) such that

$$\begin{aligned} \Vert J(x)\Vert \le \kappa _2,\quad \forall x\in N(x_*,b). \nonumber \\ \Vert F(y)-F(x)\Vert \le \kappa _2\Vert y-x\Vert ,\quad \forall x,y\in N(x_*,b). \end{aligned}$$
(9)

Suppose that Algorithm 1 starts with an initial point \(x_0\) that is sufficiently close to \(x_*\) so that the sequence \(\{x_k\}_{k=0}^{\infty }\) stays in the neighbourhood \(N(x_*,b)\). It is worth emphasizing that \(x_*\) may not be an isolated zero of F(x), and \(x_k\), if it converges, may converge to some other point in \(X_*\). For \(x\in N(x_*,b)\), denote by \({\tilde{x}}\in X_*\) (also written \({\bar{x}}\) in the sequel) a point that satisfies

$$\begin{aligned} \Vert {\tilde{x}}-x\Vert =\textrm{dist}\left( x,X_*\right) . \end{aligned}$$
(10)

Now, we will study the local convergence of our modified LM method, i.e., Algorithm 1.

Define

$$\begin{aligned} d(x)=-\left( J(x)^TJ(x)+\lambda (x)I\right) ^{-1}J(x)^TF(x) \end{aligned}$$
(11)

and

$$\begin{aligned} \lambda (x)=\theta \Vert F(x)\Vert ^{\delta }+(1-\theta )\Vert J(x)^TF(x)\Vert ^{\delta }. \end{aligned}$$
(12)

Lemma 2

Let Assumption 1 hold. If \(v\ge 2(1/\gamma -1)\), then there exist some positive constants \(c_1, c_2\) such that

$$\begin{aligned} c_1\textrm{dist}(x,X^*)^{\delta /\gamma }\le \lambda (x)\le c_2\textrm{dist}(x,X^*)^\delta ,\quad \forall x\in N(x^*,b). \end{aligned}$$
(13)

Proof

First we will show the second inequality in (13). From (9), we have

$$\begin{aligned} \lambda (x)&=\theta \Vert F(x)\Vert ^\delta +(1-\theta )\Vert J(x)^TF(x)\Vert ^\delta \le \left( \theta +(1-\theta )\Vert J(x)\Vert ^\delta \right) \Vert F(x)\Vert ^\delta \nonumber \\&=\left( \theta +(1-\theta )\Vert J(x)\Vert ^\delta \right) \left\| F({\bar{x}})-F(x)\right\| ^\delta \le \left( \theta +(1-\theta )\Vert J(x)\Vert ^\delta \right) \kappa _2^\delta \Vert {\bar{x}}-x\Vert ^\delta \nonumber \\&\le \kappa _2^\delta \left( \theta +(1-\theta )\kappa _2^\delta \right) \Vert {\bar{x}}-x\Vert ^\delta . \end{aligned}$$

This proves the second inequality in (13) with \(c_2=\kappa _2^\delta \left( \theta +(1-\theta )\kappa _2^\delta \right) \).

Now, we show the first inequality in (13). From

$$\begin{aligned} \Vert F(x)\Vert ^2&=F(x)^TF(x)\nonumber \\&=F(x)^T\left( F({\bar{x}})+J(x)(x-{\bar{x}})\right) +F(x)^T\left( F(x)-F({\bar{x}})-J(x)(x-{\bar{x}})\right) , \end{aligned}$$

we have

$$\begin{aligned} F(x)^TJ(x)(x-{\bar{x}})= \Vert F(x)\Vert ^2-F(x)^T\left( F(x)-F({\bar{x}})-J(x)(x-{\bar{x}})\right) . \end{aligned}$$

From the properties of the norm, we get

$$\begin{aligned} \Vert F(x)^TJ(x)(x-{\bar{x}})\Vert \ge \Vert F(x)\Vert ^2-\Vert F(x)^T\Vert ~\Vert F(x)-F({\bar{x}})-J(x)(x-{\bar{x}})\Vert .\nonumber \\ \end{aligned}$$
(14)

Since \(v\ge 2(1/\gamma -1)\), it follows from (6), (8), (9) and (14) that

$$\begin{aligned} \Vert J(x)^TF(x)\Vert ~\Vert {\bar{x}}-x\Vert \ge c^{2/\gamma }\Vert {\bar{x}}-x\Vert ^{2/\gamma }-\frac{\kappa _1\kappa _2}{1+v}\Vert {\bar{x}}-x\Vert ^{2+v}. \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert J(x)^TF(x)\Vert \ge&c^{2/\gamma }\Vert {\bar{x}}-x\Vert ^{2/\gamma -1}-\frac{\kappa _1\kappa _2}{1+v}\Vert {\bar{x}}-x\Vert ^{1+v} \ge c_3\Vert {\bar{x}}-x\Vert ^{2/\gamma -1} \end{aligned}$$

where \(c_3>0\) is some positive constant. Hence

$$\begin{aligned} \lambda (x)&=\theta \Vert F(x)\Vert ^\delta +(1-\theta )\Vert J(x)^TF(x)\Vert ^\delta \nonumber \\&\ge \theta c^{\delta /\gamma }\Vert {\bar{x}}-x\Vert ^{\delta /\gamma }+c_3(1-\theta )\Vert {\bar{x}}-x\Vert ^{\delta (2/\gamma -1)}\nonumber \\&\ge c_1\Vert {\bar{x}}-x\Vert ^{\delta /\gamma }. \end{aligned}$$

This proves the first inequality in (13) with \(c_1\) being a positive constant. \(\square \)

Lemma 3

Let the assumptions of Lemma 2 hold. Then the following inequality holds with a positive constant \(c_4>0\):

$$\begin{aligned} \Vert d(x)\Vert \le c_4\textrm{dist}(x,X^*)^{\min \{1,1+v-\delta /2\gamma \}},\quad \forall x\in N(x^*,b/2). \end{aligned}$$
(15)

Proof

For all \(x\in N(x^*,b/2)\), we get

$$\begin{aligned} \Vert {\bar{x}}-x^*\Vert \le \Vert {\bar{x}}-x\Vert +\Vert x-x^*\Vert \le b. \end{aligned}$$
(16)

Thus, \({\bar{x}}\in N(x^*,b)\).

Define the convex quadratic function \(\varphi (d)\) as

$$\begin{aligned} \varphi (d)=\Vert F(x)+J(x)d\Vert ^2+\lambda (x)\Vert d\Vert ^2. \end{aligned}$$
(17)

Since \(\nabla \varphi (d)=2\left( J(x)^TJ(x)+\lambda (x)I\right) d+2J(x)^TF(x)\), the trial step d(x) defined in (11) is a minimizer of \(\varphi (d)\). By Assumption 1, Lemma 2, (16) and (17), we get

$$\begin{aligned} \Vert d(x)\Vert ^2&\le \frac{\varphi (d(x))}{\lambda (x)}\le \frac{\varphi ({\bar{x}}-x)}{\lambda (x)} =\frac{\Vert F(x)+J(x)({\bar{x}}-x)\Vert ^2+\lambda (x)\Vert {\bar{x}}-x\Vert ^2}{\lambda (x)}\nonumber \\&=\frac{\Vert F({\bar{x}})-F(x)-J(x)({\bar{x}}-x)\Vert ^2}{\lambda (x)}+\Vert {\bar{x}}-x\Vert ^2\nonumber \\&\le \frac{\kappa _1^2}{c_1(1+v)^2}\Vert {\bar{x}}-x\Vert ^{2(1+v)-\delta /\gamma }+\Vert {\bar{x}}-x\Vert ^2\nonumber \\&\le \left( \frac{\kappa _1^2}{c_1(1+v)^2}+1\right) \Vert {\bar{x}}-x\Vert ^{2\min \{1,1+v-\delta /2\gamma \}}. \end{aligned}$$

Letting \(c_4=\sqrt{\kappa _1^2/\left( c_1(1+v)^2\right) +1}\), we obtain (15). \(\square \)

Lemma 4

Let the assumptions of Lemma 2 hold and assume \(x,x+d(x)\in N(x_*,b/2)\). Then the following inequality holds with a positive constant \(c_5\):

$$\begin{aligned} \textrm{dist}(x+d(x),X_*)\le c_5\textrm{dist}(x,X_*)^{\gamma \min \{1+v,1+\delta /2,2(1+v-\delta /2\gamma )\}}. \end{aligned}$$
(18)

Proof

From the definition (17) of \(\varphi (d)\) and Lemma 2, we have

$$\begin{aligned} \Vert F(x)+J(x)d(x)\Vert ^2&\le \varphi (d(x))\le \varphi ({\tilde{x}}-x)\le \Vert F(x)+J(x)({\tilde{x}}-x)\Vert ^2 +\lambda (x)\Vert {\tilde{x}}-x\Vert ^2\nonumber \\&\le \frac{\kappa _1^2}{(1+v)^2}\Vert {\tilde{x}}-x\Vert ^{2(1+v)}+c_2\Vert {\tilde{x}}-x\Vert ^{2+\delta }\nonumber \\&\le \left( \frac{\kappa _1^2}{(1+v)^2}+c_2\right) \Vert {\tilde{x}}-x\Vert ^{2\min \{1+v,1+\delta /2\}}. \end{aligned}$$

Then we have

$$\begin{aligned} \Vert F(x)+J(x)d(x)\Vert \le c_6\Vert {\tilde{x}}-x\Vert ^{\min \{1+v,1+\delta /2\}}, \end{aligned}$$
(19)

where \(c_6=\sqrt{\kappa _1^2/(1+v)^2+c_2}\) is a positive constant.

It follows from Assumption 1, Lemma 3 and (19) that

$$\begin{aligned} c~\textrm{dist}(x+d(x),X_*)&\le \Vert F(x+d(x))\Vert ^\gamma \le \left( \Vert F(x)+J(x)d(x)\Vert +\kappa _1\Vert d(x)\Vert ^2\right) ^\gamma \\&\le \left( c_6\Vert {\tilde{x}}-x\Vert ^{\min \{1+v,1+\delta /2\}} +\kappa _1c_4^2\Vert {\tilde{x}}-x\Vert ^{2\min \{1,1+v-\delta /2\gamma \}}\right) ^\gamma \\&\le \left( c_6+\kappa _1c_4^2\right) ^\gamma \Vert {\tilde{x}}-x\Vert ^{\gamma \min \{1+v,1+\delta /2,2,2(1+v-\delta /2\gamma )\}}. \end{aligned}$$

Hence the inequality (18) holds with \(c_5=c^{-1}\left( c_6+\kappa _1c_4^2\right) ^\gamma \). \(\square \)

It is clear that, from (11) and (12), we have \(d_k=d(x_k)\), \(\lambda _k=\lambda (x_k)\) and \(x_{k+1}=x_k+d_k\). Then it follows from Lemmas 2, 3 and 4 that, at every iteration,

$$\begin{aligned} c_1\textrm{dist}(x_k,X^*)^{\delta /\gamma }&\le \lambda _k\le c_2\textrm{dist}(x_k,X^*)^\delta ,\\ \Vert d_k\Vert&\le c_4\textrm{dist}(x_k,X^*)^{\min \{1,1+v-\delta /2\gamma \}},\\ \textrm{dist}(x_{k+1},X^*)&\le c_5\textrm{dist}(x_k,X^*)^{\gamma \min \{1+v,1+\delta /2,2(1+v-\delta /2\gamma )\}} \end{aligned}$$

hold provided that \(x_k,x_{k+1}\in N(x_*,b/2)\).

Remark 2

Let

$$\begin{aligned} \alpha =&\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}, \\ \beta =&\min \{1,1+v-\delta /2\gamma \}. \end{aligned}$$

To analyse the convergence rate of the sequence \(\{x_k\}\) generated by Algorithm 1, the values of \(\alpha \) and \(\beta \) appearing in Lemmas 3 and 4 should be discussed. Obviously, a larger value of \(\alpha \) leads to a better convergence rate. Moreover, to deduce a convergence result, one needs \(\alpha >1\), which holds if and only if

$$\begin{aligned} \gamma >\frac{1}{1+v}\quad \text{ and }\quad 2\left( \frac{1}{\gamma }-1\right)<\delta <2\gamma (1+v)-1, \end{aligned}$$
(20)

which imposes an additional requirement on the values of \(\gamma \), v and \(\delta \).
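For illustration, in the Lipschitz case \(\gamma =v=1\), condition (20) reduces to \(0<\delta <3\), so every \(\delta \in [1,2]\) is admissible, and

$$\begin{aligned} \alpha =\min \{2,\ 1+\delta /2,\ 4-\delta \}=1+\delta /2\in [3/2,2],\qquad \beta =\min \{1,\ 2-\delta /2\}=1, \end{aligned}$$

which is consistent with Remark 3 below.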

Since \(\alpha >1\) (which also implies \(\beta >0\)), we have \(\beta \alpha ^l>l\) for all sufficiently large l. As \(\sum _{l=0}^{\infty }\left( \frac{1}{2}\right) ^l=2<\infty \), it follows by comparison that

$$\begin{aligned} c_7=\sum _{l=0}^{\infty }\left( \frac{1}{2}\right) ^{\beta \alpha ^l}<\infty . \end{aligned}$$

Let

$$\begin{aligned} r=\min \left\{ \frac{1}{2c_5^{1/(\alpha -1)}},\frac{b}{2\left( 1+2c_4c_5^{(1-\beta )/(\alpha -1)}c_7\right) }\right\} , \end{aligned}$$
(21)

where \(c_4, c_5, c_7, \alpha , \beta \) are defined above.

Now, we show that if the initial point \(x_0\) of Algorithm 1 is close enough to \(x_*\), then the assumption of Lemma 4 is satisfied at every iteration.

Lemma 5

Suppose that Assumption 1 and (20) hold. If the initial point \(x_0\in N(x_*,r)\), where r is given by (21), then \(x_k\in N(x_*,b/2)\) for every k.

Proof

The proof is by induction on k. Since \(x_0\in N(x_*,r)\) and \(r\le b/2\), the claim holds for \(k=0\). Suppose now that \(x_l\in N(x_*,b/2)\) for all \(l=0,1,\cdots ,k\). In order to show that \(x_{k+1}\in N(x_*,b/2)\), we first note from Lemma 3 that

$$\begin{aligned} \Vert x_{k+1}-x_*\Vert&=\Vert x_k+d_k-x_*\Vert \le \Vert x_k-x_*\Vert +\Vert d_k\Vert \le \Vert x_{k-1}-x_*\Vert +\Vert d_{k-1}\Vert +\Vert d_k\Vert \nonumber \\&\le \cdots \le \Vert x_0-x_*\Vert +\sum _{l=0}^{k}\Vert d_l\Vert \le r+c_4\sum _{l=0}^{k}\textrm{dist}(x_l,X_*)^{\beta }. \end{aligned}$$
(22)

By Lemma 4 and since \(r\le 1/\left( 2c_5^{1/(\alpha -1)}\right) \) by (21), we get

$$\begin{aligned} \textrm{dist}(x_l,X_*)&\le c_5\textrm{dist}(x_{l-1},X_*)^{\alpha }\le c_5c_5^{\alpha }\textrm{dist}(x_{l-2},X_*)^{\alpha ^2}\nonumber \\&\le \cdots \nonumber \\&\le c_5c_5^{\alpha }\cdots c_5^{\alpha ^{l-1}}\textrm{dist}(x_0,X_*)^{\alpha ^l} =c_5^{1+\alpha +\cdots +\alpha ^{l-1}}\textrm{dist}(x_0,X_*)^{\alpha ^l}\nonumber \\&\le c_5^{\frac{\alpha ^{l}-1}{\alpha -1}}\Vert x_0-x_*\Vert ^{\alpha ^l}\le c_5^{\frac{\alpha ^{l}-1}{\alpha -1}}r^{\alpha ^l}\le r\left( c_5^{\frac{1}{\alpha -1}}r\right) ^{\alpha ^l-1}\nonumber \\&\le r\left( \frac{1}{2}\right) ^{\alpha ^l-1}, \end{aligned}$$
(23)

where \(l=0,1,\cdots ,k\).

By (21), (22) and (23), we have

$$\begin{aligned} \Vert x_{k+1}-x_*\Vert&\le r+c_4\sum _{l=0}^{k}\textrm{dist}(x_l,X_*)^\beta \le r+c_4\sum _{l=0}^{k}\left( r\left( \frac{1}{2}\right) ^{\alpha ^l-1}\right) ^{\beta }\nonumber \\&=r+c_4(2r)^\beta \sum _{l=0}^{k}\left( \frac{1}{2}\right) ^{\beta \alpha ^l} \le r\left( 1+2^\beta c_4r^{\beta -1}\sum _{l=0}^{\infty }\left( \frac{1}{2}\right) ^{\beta \alpha ^l}\right) \nonumber \\&\le r\left( 1+2c_4c_5^{(1-\beta )/(\alpha -1)}c_7\right) \le \frac{b}{2}. \end{aligned}$$

The above inequality indicates that \(x_{k+1}\in N\left( x_*,b/2\right) \). \(\square \)

From Lemmas 4 and 5 we obtain the following main theorem.

Theorem 6

Under Assumption 1, if (20) holds and the initial point \(x_0\in N(x_*,r)\), then the sequence \(\{\textrm{dist}(x_k,X_*)\}\) converges to zero superlinearly with order \(\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}\). Moreover, the sequence \(\{x_k\}\) generated by Algorithm 1 converges to a solution \({\tilde{x}}\in X_*\bigcap N(x_*,b/2)\) of the nonlinear Eq. (1).

Proof

The first part of the theorem, namely the convergence rate of the sequence \(\{\textrm{dist}(x_k,X_*)\}\), follows directly from Lemmas 4 and 5. So, only the second part needs to be proved.

By Lemma 5, we have \(x_k\in N(x_*,b/2)\) for every k. Hence, we just have to show that \(\{x_k\}\) converges. From Lemma 3 and (23), we have, for each l,

$$\begin{aligned} \Vert d_l\Vert \le c_4\textrm{dist}(x_l,X_*)^\beta \le c_4r^\beta \left( \frac{1}{2}\right) ^{\beta (\alpha ^l-1)}. \end{aligned}$$
(24)

Setting

$$\begin{aligned} s_k=&\sum _{l=0}^{k}\Vert d_l\Vert \le c_4(2r)^\beta \sum _{l=0}^{k}\left( \frac{1}{2}\right) ^{\beta \alpha ^l} \le c_4c_5^{\frac{\beta }{1-\alpha }}\sum _{l=0}^{k}\left( \frac{1}{2}\right) ^{\beta \alpha ^l}\le c_4c_5^{\frac{\beta }{1-\alpha }}c_7<\infty , \end{aligned}$$

we obtain that \(\{s_k\}\) is a Cauchy sequence. Thus, we have

$$\begin{aligned} \Vert x_{k+p}-x_k\Vert&\le \Vert d_{k+p-1}\Vert +\Vert x_{k+p-1}-x_k\Vert \nonumber \\&\le \cdots \le \sum _{l=k}^{k+p-1}\Vert d_l\Vert \nonumber \\&= \mid s_{k+p-1}-s_{k-1}\mid , \end{aligned}$$
(25)

for any \(k,p\in {\mathbb {N}}\), which indicates that the sequence \(\{x_k\}\) is also a Cauchy sequence. Then \(\{x_k\}\) converges to some \({\tilde{x}}\). Since \(x_{k}\in N(x_*,b/2)\) for all k and \(\{\textrm{dist}(x_k,X_*)\}\) converges to zero, we have \({\tilde{x}}\in X_*\bigcap N(x_*,b/2)\). \(\square \)

Theorem 7

Under the assumptions of Theorem 6, if \(v\ge \delta /2\gamma \), then the sequence \(\{x_k\}\) generated by Algorithm 1 with \(x_0\in N(x_*,r)\) converges to a solution \({\tilde{x}}\) of the nonlinear Eq. (1) with order \(\gamma (1+\delta /2)\).

Proof

It follows from \(v\ge \delta /2\gamma \) that

$$\begin{aligned}&\gamma (1+v)-\gamma \left( 1+\frac{\delta }{2}\right) \\&=\gamma \left( v-\frac{\delta }{2}\right) \ge \gamma \left( \frac{\delta }{2\gamma }-\frac{\delta }{2}\right) \\&=\frac{\gamma \delta }{2}\left( \frac{1}{\gamma }-1\right) \ge 0 \end{aligned}$$

and

$$\begin{aligned}&\left( 2\gamma (1+v)-\delta \right) -\gamma \left( 1+\frac{\delta }{2}\right) \\&=\gamma +2\gamma v-\delta -\frac{\gamma \delta }{2}\ge \gamma -\frac{\gamma \delta }{2}\\&=\frac{\gamma }{2}(2-\delta )\ge 0. \end{aligned}$$

So, from the assumptions of Theorem 6, we have

$$\begin{aligned} \alpha =\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}=\gamma (1+\delta /2)>1. \end{aligned}$$
(26)

Moreover, it follows from (26) and Lemma 4 that \(\textrm{dist}(x_{k+1},X_*)\le \frac{1}{2}\textrm{dist}(x_k,X_*)\) holds for all sufficiently large k. From \(v\ge \delta /2\gamma \), we have \(\beta =1\) in Lemma 3. Letting \(p\rightarrow \infty \) in (25), we deduce

$$\begin{aligned} \Vert {\tilde{x}}-x_k\Vert&\le \sum _{l=k}^{\infty }\Vert d_l\Vert \le c_4\sum _{l=k}^{\infty }\textrm{dist}(x_l,X_*)\le c_4\sum _{l=k}^{\infty }\left( \frac{1}{2}\right) ^{l-k}\textrm{dist}(x_k,X_*)\nonumber \\&\le 2c_4\textrm{dist}(x_k,X_*)\le 2c_4c_5\textrm{dist}(x_{k-1},X_*)^\alpha \le 2c_4c_5\Vert {\tilde{x}}-x_{k-1}\Vert ^\alpha , \end{aligned}$$
(27)

which implies that \(\{x_k\}\) converges with rate \(\gamma (1+\delta /2)\). \(\square \)

Remark 3

  1. (1)

    As shown in Theorem 7, if \(\gamma =v=1\), we have \(\alpha =1+\delta /2\). From (27), we obtain

    $$\begin{aligned} \Vert {\tilde{x}}-x_k\Vert \le 2c_4c_5\Vert {\tilde{x}}-x_{k-1}\Vert ^{1+\delta /2} \end{aligned}$$

    which shows that the sequence \(\{x_k\}\) converges at least superlinearly, and quadratically when \(\delta =2\). With \(\delta =2\) and \(\theta =1\), this result coincides with the result in [20].

  2. (2)

    Under Assumption 1, let \(\gamma =v=1\). Then Algorithm 1 reduces to the algorithm proposed by Fan and Yuan in [27] when \(\theta =1\), \(\delta =1\), to the algorithm proposed by Ma and Jiang in [30, Algorithm 2.1] when \(\delta =1\), and to the algorithm proposed by Fan and Yuan in [36] when \(\theta =1\), \(\delta \in [1,2]\). The result of Theorem 7 is also the same as theirs.

Remark 4

To obtain a faster convergence order, the singular value decomposition (SVD) technique was employed in some papers [27, 30, 36]; we omit it here.

4 Global convergence

In this section, we discuss the global convergence properties of the modified LM method. To guarantee global convergence, the Armijo line search technique is incorporated into the modified LM method.

Take

$$\begin{aligned} \Phi (x)=\frac{1}{2}\Vert F(x)\Vert ^2 \end{aligned}$$

as the merit function for nonlinear equations (1).

We consider the following algorithm.

[Algorithm 2: the modified Levenberg–Marquardt method with Armijo line search]
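As with Algorithm 1, the algorithm box is not reproduced above. The following sketch shows one possible realisation of Algorithm 2: the trial step \(d_k\) is computed as in Algorithm 1, the full step is accepted whenever \(\Vert F(x_k+d_k)\Vert \le {\tilde{\gamma }}\Vert F_k\Vert \), and otherwise the Armijo rule from the proof of Theorem 8 below is applied to \(\Phi \). The parameter values and all names are illustrative assumptions of ours, not the exact constants of Algorithm 2.

```python
import numpy as np

def modified_lm_armijo(F, J, x0, theta=0.5, delta=1.0,
                       alpha_t=1e-4, beta_t=0.5, gamma_t=0.9,
                       tol=1e-6, max_iter=1000):
    """Sketch of Algorithm 2: modified LM step safeguarded by an Armijo line search.

    Phi(x) = 0.5*||F(x)||^2 is the merit function; alpha_t, beta_t, gamma_t play the
    roles of alpha~, beta~, gamma~ (illustrative values).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk                                   # gradient of Phi at x_k
        if np.linalg.norm(g) <= tol:
            break
        lam = (theta * np.linalg.norm(Fk) ** delta
               + (1.0 - theta) * np.linalg.norm(g) ** delta)
        d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(x.size), -g)
        if np.linalg.norm(F(x + d)) <= gamma_t * np.linalg.norm(Fk):
            x = x + d                                   # accept the unit LM step
            continue
        phi_k, slope = 0.5 * (Fk @ Fk), g @ d           # Phi(x_k), grad Phi(x_k)^T d_k
        t = 1.0                                         # t = beta_t**m, m = 0, 1, 2, ...
        while (0.5 * np.linalg.norm(F(x + t * d)) ** 2 - phi_k
               > alpha_t * t * slope and t > 1e-12):
            t *= beta_t                                 # backtrack
        x = x + t * d
    return x, k
```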

Theorem 8

Suppose Assumption 1 holds. Then the sequence \(\{x_k\}\) generated by Algorithm 2 converges to a stationary point of \(\Phi (x)\). Moreover, if the stationary point \(x_*\) is a solution of the nonlinear Eq. (1), then \(\{x_k\}\) converges to the solution with convergence rate \(\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}\).

Proof

The Armijo line search technique is employed in Algorithm 2. That is, the next iterate is computed as

$$\begin{aligned} x_{k+1}=x_k+{\tilde{\beta }}^md_k, \end{aligned}$$

where \({\tilde{\beta }}^m\) is the step size and m is the smallest nonnegative integer satisfying the following inequality:

$$\begin{aligned} \Phi \left( x_k+{\tilde{\beta }}^md_k\right) -\Phi (x_k)\le {\tilde{\alpha }}{\tilde{\beta }}^m\nabla \Phi (x_k)^Td_k. \end{aligned}$$

Combining this with [38, Eq. (2.10)], we get

$$\begin{aligned} \Vert F(x_{k+1})\Vert ^2\le \Vert F_k\Vert ^2-{\tilde{\alpha }}{\hat{\alpha }}\frac{\left( F_k^TJ_kd_k\right) ^2}{\Vert d_k\Vert ^2}, \end{aligned}$$
(28)

where \({\hat{\alpha }}\) is some positive constant.

Therefore, from the condition \(\Vert F(x_k+d_k)\Vert \le {\tilde{\gamma }}\Vert F_k\Vert \) and (28), we obtain that \(\left\{ \Vert F(x_k)\Vert \right\} \) is monotonically decreasing. Hence, the sequence \(\{x_k\}\) converges to a stationary point of \(\Phi (x)\).

Next we prove the second part. We deduce from the first part that \(\Vert F(x_k+d_k)\Vert \le {\tilde{\gamma }}\Vert F_k\Vert \) holds for all sufficiently large k, that is, \({\tilde{\beta }}^m=1\) holds for all sufficiently large k. Since \(\{x_k\}\) converges to a stationary point \(x_*\) which is a solution of the nonlinear Eq. (1), there exists a large K such that

$$\begin{aligned} \Vert F(x_K)\Vert ^{\alpha \gamma -1}\le \frac{c^\alpha {\tilde{\gamma }}}{\kappa _2c_5}, \end{aligned}$$
(29)

and

$$\begin{aligned} \Vert x_K-x_*\Vert \le r \end{aligned}$$

where c, \(\kappa _2\) and \(c_5\) are defined in Sect. 3, and r is defined by (21).

Let the sequence \(\{y_l\}\) be generated by Algorithm 1 with unit step size and \(y_0=x_K\). Then, by Theorem 6, the sequence \(\{\textrm{dist}(y_l,X_*)\}\) converges to zero with convergence rate \(\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}\). So, we just have to prove that \(x_{K+l}=y_l\) holds for each l, i.e. that \(\{y_l\}\) satisfies

$$\begin{aligned} \Vert F(y_{l+1})\Vert \le {\tilde{\gamma }}\Vert F(y_l)\Vert . \end{aligned}$$

Let \({\bar{y}}_{l+1}\in X_*\) be such that \(\Vert y_{l+1}-{\bar{y}}_{l+1}\Vert =\textrm{dist}(y_{l+1},X_*)\). It follows from Assumption 1 (a), Lemma 4, (9) and (29) that

$$\begin{aligned} \Vert F(y_{l+1})\Vert&=\Vert F(y_{l+1})-F({\bar{y}}_{l+1})\Vert \nonumber \\&\le \kappa _2\textrm{dist}(y_{l+1},X_*) \le \kappa _2c_5\textrm{dist}(y_l,X_*)^\alpha \nonumber \\&\le \frac{\kappa _2c_5\Vert F(y_l)\Vert ^{\alpha \gamma -1}}{c^\alpha }\Vert F(y_l)\Vert \nonumber \\&\le {\tilde{\gamma }}\Vert F(y_l)\Vert \end{aligned}$$
(30)

holds for \({\tilde{\gamma }}<1\) and each l. Inequality (30) implies that \({\tilde{\beta }}^m=1\) holds for all sufficiently large k in Algorithm 2. Thus, \(\{x_k\}\) converges to the solution with convergence rate \(\min \{\gamma (1+v),\gamma (1+\delta /2),2\gamma (1+v)-\delta \}\). \(\square \)

5 Numerical example

In this section, some numerical experiments are carried out to verify the efficiency of Algorithm 1. We first test the following Powell singular function [21] mentioned in Sect. 1:

$$\begin{aligned} F(x)=\left( x_1+10x_2,\sqrt{5}(x_3-x_4),(x_2-2x_3)^2,\sqrt{10}(x_1-x_4)^2\right) ^T, \end{aligned}$$
(31)

where \(x\in {\mathbb {R}}^4\). As mentioned in Sect. 1, the function (31) does not satisfy the local error bound condition (3), but it satisfies the \(\gamma \)-Hölderian local error bound condition (4) with \(\gamma =1/2\).
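As a quick numerical illustration of the two conditions (this check is ours and is only indicative; it examines a single direction, whereas the statements in [22] hold on a full neighbourhood), one can compare \(\Vert F(x)\Vert \) and \(\Vert F(x)\Vert ^{1/2}\) with \(\textrm{dist}(x,X_*)=\Vert x\Vert \) along a null direction of J(0) for the Powell singular function, whose only zero is the origin.

```python
import numpy as np

def powell_singular(x):
    """Powell singular function (31); its unique zero is the origin."""
    x1, x2, x3, x4 = x
    return np.array([x1 + 10.0 * x2,
                     np.sqrt(5.0) * (x3 - x4),
                     (x2 - 2.0 * x3) ** 2,
                     np.sqrt(10.0) * (x1 - x4) ** 2])

u = np.array([-10.0, 1.0, 0.0, 0.0])   # direction in the null space of J(0)
u /= np.linalg.norm(u)

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = t * u                          # dist(x, X_*) = ||x|| = t since X_* = {0}
    nF = np.linalg.norm(powell_singular(x))
    # ||F(x)||/dist -> 0 along this direction, so (3) cannot hold,
    # while ||F(x)||^(1/2)/dist stays bounded away from zero (gamma = 1/2 in (4)).
    print(f"t={t:.0e}  ratio(3)={nF / t:.3e}  ratio(4)={np.sqrt(nF) / t:.3e}")
```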

The other test problems are singular systems of nonlinear equations created by modifying the standard nonsingular problems, as proposed in [21, 39]:

$$\begin{aligned} {\hat{F}}(x)=F(x)-J(x_*)A\left( A^TA\right) ^{-1}A^T(x-x_*), \end{aligned}$$
(32)

where \(x_*\) is a root of the standard nonsingular test function F(x), and the matrix \(A\in {\mathbb {R}}^{n\times k}\) has full column rank with \(1\le k\le n\). It is easy to check that \({\hat{F}}(x_*)=0\) and \({\hat{J}}(x_*)=J(x_*)\left( I-A\left( A^TA\right) ^{-1}A^T\right) \) has rank \(n-k\).

We chose the rank of \({\hat{J}}(x_*)\) to be \(n-1\) by using

$$\begin{aligned} A\in {\mathbb {R}}^{n\times 1},\quad A^T=(1,1,\cdots ,1) \end{aligned}$$
(33)

and \(n-2\) by using

$$\begin{aligned} A\in {\mathbb {R}}^{n\times 2},\quad A^T=\left( \begin{array}{rrrrrr} 1&1&1&1&\cdots &1\\ 1&-1&1&-1&\cdots &\pm 1 \end{array}\right) \end{aligned}$$
(34)

respectively.
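The singular test problems (32) with A as in (33) or (34) can be generated mechanically from any standard nonsingular test function. The following sketch (our illustration; the names and the interface are assumptions) returns \({\hat{F}}\) and its Jacobian \({\hat{J}}\) for a given F, J, root \(x_*\) and rank deficiency k.

```python
import numpy as np

def make_singular_problem(F, J, x_star, k):
    """Build F_hat and J_hat from (32), with A chosen as in (33) (k = 1) or (34) (k = 2)."""
    n = x_star.size
    if k == 1:
        A = np.ones((n, 1))                                                # (33)
    elif k == 2:
        A = np.column_stack([np.ones(n),
                             np.array([(-1.0) ** i for i in range(n)])])   # (34)
    else:
        raise ValueError("only k = 1 or k = 2 are used here")
    P = A @ np.linalg.inv(A.T @ A) @ A.T        # projector A (A^T A)^{-1} A^T
    J_star = J(x_star)

    def F_hat(x):                               # (32): F_hat(x_star) = 0
        return F(x) - J_star @ (P @ (x - x_star))

    def J_hat(x):                               # rank(J_hat(x_star)) = n - k
        return J(x) - J_star @ P

    return F_hat, J_hat
```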

We test several choices of the LM parameter in Algorithm 1. Based on the ranges of \(\theta \) and \(\delta \) in (5), we use \(\theta =0\), 0.5 and 1 and \(\delta =1\), 1.5 and 2 for the LM method with unit step size described in Sect. 3. The algorithm is terminated when \(\Vert J^T_kF_k\Vert \leqslant 10^{-6}\), or when the number of iterations exceeds \(100(n+1)\); for the Powell singular function, we allow up to \(10^6\) iterations instead of \(100(n+1)\). The results for the Powell singular function are tabulated in Table 1, while the results for the systems of nonlinear equations with rank \(n-1\) and \(n-2\) are tabulated in Tables 2 and 3, respectively. We test three starting points \(x_0\), \(10x_0\) and \(100x_0\), where \(x_0\) is suggested by Moré, Garbow and Hillstrom in [21]. The notation “–” is used if the method fails to find the solution within the maximum number of iterations.

Table 1 Results on Powell singular function
Table 2 Results on the first singular test set with rank\((F'(x_*))=n-1\)
Table 3 Results on the second singular test set with rank\((F'(x_*))=n-2\)

The results in Table 1 show that Algorithm 1 is efficient for the Powell singular function even though it does not satisfy the local error bound condition (3). The performance is not perfect, however, because the number of iterations becomes large when the initial point is not close enough to \(x_*\). From the results in all the tables, we can see that Algorithm 1 always performs best when \(\delta =1\), no matter what the value of \(\theta \) is. In fact, when \(\{x_k\}\) is close to the solution set \(X_*\), the LM parameter \(\lambda _k\) with \(\delta =2\) may be very small, even smaller than the machine precision, so it loses its regularizing role. Conversely, when \(\{x_k\}\) is far away from the solution set \(X_*\), the LM parameter \(\lambda _k\) with \(\delta =2\) may be very large, the trial step \(d_k\) becomes small, and this prevents \(\{x_k\}\) from converging rapidly. Also, Algorithm 1 with \(\lambda _k=\Vert F_k\Vert ^\delta \) always outperforms, or at least performs as well as, Algorithm 1 with \(\lambda _k=\Vert J_k^TF_k\Vert ^\delta \).

6 Conclusion

We have presented a modified Levenberg–Marquardt method for solving systems of nonlinear equations with an adaptive choice of the LM parameter, namely a convex combination of \(\Vert F_k\Vert ^\delta \) and \(\Vert J_k^TF_k\Vert ^\delta \). Under the \(\gamma \)-Hölderian local error bound condition on the underlying function and the v-Hölderian continuity of its Jacobian, its local and global convergence has been analysed. These conditions are weaker than the commonly used local error bound condition and the Lipschitz continuity of the Jacobian, so the convergence results apply to a wider class of problems. The numerical experiments clearly show the good performance of our choice of the LM parameter.