1 Introduction

In this paper we consider the generalized equation

$$ f(x) + F(x) \ni0, $$
(1)

where \(f: X \to Y\) is a function and \(F: X \rightrightarrows Y\) is a set-valued mapping. Throughout, unless stated otherwise, X and Y are (real) Banach spaces. To simplify some of the arguments, we make the standing assumption that f is continuously Fréchet differentiable everywhere with derivative Df and that F has closed graph.

If F is the zero mapping, then (1) reduces to the equation f(x)=0 for which the standard Newton iteration takes the form

$$f(x_k) + Df(x_k) (x_{k+1}-x_k) = 0. $$
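For a single smooth equation, the iteration above can be run directly; the following minimal sketch uses the toy problem \(f(x)=x^2-2\) (an illustrative choice, not an example from the paper):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Classical Newton iteration: solve f(x_k) + Df(x_k)(x_{k+1} - x_k) = 0."""
    x = x0
    for _ in range(max_iter):
        step = -f(x) / df(x)   # Newton step from the linearized equation
        x += step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)  # -> ~1.41421356
```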

When F is nonzero, the Newton iteration is extended in a natural way to

$$ f(x_k) + Df(x_k) (x_{k+1}-x_k) +F(x_{k+1})\ni0, $$
(2)

that is, at each iteration a partially linearized inclusion is to be solved. In a path-breaking work N.H. Josephy [16] was the first to consider a Newton iteration of the kind (2) specialized to the case where F is the normal cone mapping in finite dimensions; then (1) describes a variational inequality. Most importantly, he employed the property of strong regularity coined by his Ph.D. advisor S.M. Robinson [21]. In this paper, we adopt the definition given in [10]:
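To make Josephy's setting concrete: when F is the normal cone to \(C=[0,+\infty)\) in one dimension, (1) is the complementarity problem \(0\le x\), \(f(x)\ge 0\), \(x f(x)=0\), and, when \(Df(x_k)>0\), each subproblem (2) has the closed-form solution \(x_{k+1}=\max(0,\,x_k-f(x_k)/Df(x_k))\). A minimal sketch with the toy data \(f(x)=x^2-1\) (illustration only, not from the paper):

```python
# Josephy-Newton iteration (2) with F the normal cone to [0, +inf):
# each step solves the linearized complementarity problem
# x >= 0, g(x) := f(x_k) + Df(x_k)(x - x_k) >= 0, x * g(x) = 0,
# whose solution (for Df(x_k) > 0) is max(0, x_k - f(x_k)/Df(x_k)).
def josephy_newton(f, df, x0, iters=30):
    x = x0
    for _ in range(iters):
        x = max(0.0, x - f(x) / df(x))
    return x

# Toy data: f(x) = x^2 - 1, whose complementarity solution is x = 1.
x_star = josephy_newton(lambda x: x * x - 1.0, lambda x: 2.0 * x, 2.0)
```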

Definition 1.1

(Strong metric regularity)

A mapping \(H: X \rightrightarrows Y\) is said to be strongly metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there are neighborhoods U of \(\bar{x}\) and V of \(\bar{y}\) such that the mapping \(y \mapsto H^{-1}(y)\cap U\) is a Lipschitz continuous function on V.

The result of Josephy [16] adapted to the generalized equation (1) essentially says that if \(\bar{x}\) is a solution of (1), the function f is twice continuously differentiable around \(\bar{x}\), and the mapping f+F is strongly metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) the iteration (2) generates a unique sequence in O, and this sequence converges q-quadratically to \(\bar{x}\).

A linear and bounded mapping \(A: X \to Y\) is strongly metrically regular (everywhere) whenever its inverse \(A^{-1}\) is single-valued. If the mapping A is not necessarily invertible but only surjective, then it is metrically regular. Metric regularity has played a major role in nonlinear analysis since the 1960s. Its formal definition follows:

Definition 1.2

(Metric regularity)

A mapping \(H: X \rightrightarrows Y\) is said to be metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant κ>0 together with neighborhoods U of \(\bar{x}\) and V of \(\bar{y}\) such that

$$ d \bigl(x,H^{-1}(y) \bigr)\leq\kappa d\bigl(y,H(x)\bigr) \quad\text{for all } (x,y)\in U\times V. $$
(3)
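To build intuition for (3), the estimate can be tested numerically on a toy mapping; here \(H(x)=2x\) on the real line, where \(H^{-1}(y)=y/2\) and \(\kappa=1/2\) works globally (choices made purely for illustration, not taken from the paper):

```python
import random

# Toy check of d(x, H^{-1}(y)) <= kappa * d(y, H(x)) for H(x) = 2x, kappa = 1/2.
H = lambda x: 2.0 * x
H_inv = lambda y: 0.5 * y
kappa = 0.5

worst = 0.0  # largest observed ratio d(x, H^{-1}(y)) / d(y, H(x))
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    lhs = abs(x - H_inv(y))   # d(x, H^{-1}(y))
    rhs = abs(y - H(x))       # d(y, H(x))
    if rhs > 0:
        worst = max(worst, lhs / rhs)
```

For this linear map the ratio is identically 1/2, so the bound (3) holds with equality scaled by κ.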

A central result in the theory of metric regularity is the Lyusternik–Graves theorem which says that if a function f:XY is continuously Fréchet differentiable around \(\bar{x}\), then it is metrically regular at \(\bar{x}\) (for \(f(\bar{x})\)) if and only if the derivative \(Df(\bar{x})\) is surjective. In parallel, the standard inverse function theorem can be stated as follows: if a function f:XY is continuously Fréchet differentiable around \(\bar{x}\) then it is strongly metrically regular at \(\bar{x}\) (for \(f(\bar{x})\)) if and only if the derivative \(Df(\bar{x})\) is invertible. The inverse function theorem was extended to variational inequalities by Robinson in his seminal paper [21]. Adapted to the generalized equation (1), it says that the mapping f+F is (strongly) metrically regular at \(\bar{x}\) for 0 if and only if the “partial linearization” \(f(\bar{x}) + Df(\bar{x})(\cdot- \bar{x}) + F(\cdot)\) has the same property. This general pattern culminates in the inverse function theorem paradigm to which the recent book [10] is dedicated.

The third author of the present paper proved in [7] (see also Theorem 6C.6 in [10]) the following result, which complements that of Josephy: if the derivative Df is Lipschitz continuous around \(\bar{x}\) and the mapping f+F is metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) the method (2) is executable; that is, it generates a sequence which converges q-quadratically to \(\bar{x}\). Under strong metric regularity this sequence is locally unique, as in Josephy's theorem. The result in [7] has opened the way to developing a broader perspective on set-valued extensions of Newton's method. A number of results in this direction are presented in the books [1, 10] and [19].

There is a third regularity property that plays an important role in establishing convergence of Newton’s method.

Definition 1.3

(Strong metric subregularity)

Consider a mapping \(H: X \rightrightarrows Y\) and a point \((\bar{x}, \bar{y}) \in X\times Y\). Then H is said to be strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant κ>0 together with a neighborhood U of \(\bar{x}\) such that

$$\|x-\bar{x}\| \leq\kappa d\bigl(\bar{y}, H(x)\bigr)\quad \text{for all } x \in U. $$

Strong metric subregularity of H at \(\bar{x}\) for \(\bar{y}\) implies that \(\bar{x}\) is an isolated point of \(H^{-1}(\bar{y})\); moreover, it is equivalent to the so-called isolated calmness of the inverse \(H^{-1}\), meaning that there is a neighborhood U of \(\bar{x}\) such that \(H^{-1}(y)\cap U \subset\bar{x}+\kappa\|y-\bar{y}\|\mathbb{B}\) for all \(y \in Y\). Isolated calmness was introduced independently in [4] under the name semistability and in [6] under the name local upper Lipschitz continuity. A mapping H acting in finite dimensions whose graph is the union of finitely many convex polyhedral sets is strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) if and only if \(\bar{x}\) is an isolated point of \(H^{-1}(\bar{y})\). Most importantly, strong metric subregularity obeys the paradigm of the inverse function theorem in the same way as metric regularity and strong metric regularity do. In particular, a smooth function f is strongly metrically subregular at \(\bar{x}\) if and only if its derivative \(Df(\bar{x})\) is injective.
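As a one-dimensional illustration of the last statement (a toy function, not from the paper): \(f(x)=x^3\) has \(Df(0)=0\), which is not injective, and strong metric subregularity at 0 for 0 indeed fails, since no constant κ can satisfy \(|x|\leq\kappa|x^3|\) near 0:

```python
# The subregularity ratio |x - 0| / |f(x) - 0| = 1 / x**2 for f(x) = x**3
# blows up as x -> 0, so no finite kappa works in the estimate.
ratios = [abs(x) / abs(x ** 3) for x in (0.1, 0.01, 0.001)]
```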

Consider the Newton method (2) for the generalized equation (1) with a function f whose derivative mapping Df is Lipschitz continuous around a reference solution \(\bar{x}\). If a sequence \(\{x_k\}\) generated by (2) converges to \(\bar{x}\), then, as shown in [4, Corollary 2.1], strong metric subregularity implies that this sequence converges quadratically. Note that strong metric subregularity by itself does not guarantee the existence of a Newton sequence. In order to ensure that Newton's method (2) is executable, Bonnans [4] introduced a property he called hemistability, which basically postulates the existence of a Newton iterate in (2). Specifically, in our notation this property requires that for any ε>0 there exists δ>0 such that for any x and M with \(\|x-\bar{x}\|+\|M-Df(\bar{x})\|\leq\delta\) there exists \(\hat{x}\) with \(\|\hat{x} - \bar{x}\|\leq\varepsilon\) satisfying \(f(x)+M(\hat{x} - x) + F(\hat{x}) \ni 0\). Observe that for the mapping f+F with a smooth function f, hemistability is implied by metric regularity; this is a simple consequence of the Lyusternik–Graves theorem. On the other hand, the combination of semistability and hemistability does not follow from metric regularity, since metric regularity does not imply local uniqueness of the reference solution. Also, this combination is a weaker property than strong metric regularity but does not guarantee local uniqueness of a Newton sequence.

In this paper we first prove a result parallel to the one in [7] concerning the convergence of the following quasi-Newton method for solving (1): given \(x_0\), compute \(x_{k+1}\) to satisfy

$$ f(x_k)+ B_k(x_{k+1}-x_k) + F(x_{k+1}) \ni0, \quad\text{for } k=0,1,\ldots, $$
(4)

where \(\{B_k\}\) is a sequence of linear and bounded mappings from X to Y. The specific way \(B_k\) is constructed determines the quasi-Newton method; in this paper we focus on Broyden's update. In Sect. 3, Theorem 3.1 shows that if the mapping f+F of (1) is metrically regular at \(\bar{x}\) for 0 and the initial mapping \(B_0\) is close to \(Df(\bar{x})\), then, under a certain condition on the sequence of mappings \(B_k\), there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) there exists a sequence generated by (4) which stays in O and either reaches a solution of (1) in O after finitely many steps or converges q-linearly to \(\bar{x}\). If in addition the mapping f+F is strongly metrically regular, then for every starting point \(x_0 \in O\) there exists a sequence generated by (4) that is unique in O, and this sequence converges q-linearly to \(\bar{x}\).

In Sect. 4 we first prove, in the Hilbert space setting, that the Broyden method satisfies the conditions of Theorem 3.1. In Theorem 4.9, under the condition that \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator, we establish q-superlinear convergence in three cases depending on the properties of the mapping f+F: (i) if f+F is strongly metrically subregular at \(\bar{x}\) for 0, then every sequence generated by (4) which converges to \(\bar{x}\) is actually q-superlinearly convergent; (ii) if in addition f+F is metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) there exists a sequence generated by (4) which either reaches a solution of (1) in O in finitely many steps or converges q-superlinearly to \(\bar{x}\); (iii) if f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every \(x_0 \in O\) the iteration (4) generates a sequence that is unique in O, and this sequence converges q-superlinearly to \(\bar{x}\). A key step in proving this result is Theorem 4.8, where we show that in the case considered the Broyden update satisfies the Dennis–Moré condition, which allows us to employ the generalization of the Dennis–Moré theorem obtained in [8]. Theorem 4.9 sharpens, in the setting of (strong) metric (sub)regularity, and extends to infinite dimensions, the results in [4] developed for variational inequalities in finite dimensions. Specifically, for a class of quasi-Newton methods including the Broyden update, [4, Theorem 2.3] shows the existence of a q-superlinearly convergent sequence under hemistability and semistability of the reference solution. In this paper we extend the latter result to infinite dimensions under the stronger condition that both metric regularity and strong metric subregularity hold.

Section 2 contains an auxiliary result concerning perturbed metric regularity, which is used as a tool for proving local convergence of the method (4). In Sect. 5 we present two numerical examples, the second of which is based on a model of economic equilibrium recently developed in [11].

Throughout, any norm is denoted by ∥⋅∥ and any metric by ρ(⋅,⋅). The distance from a point x to a set C is denoted by d(x,C) and the excess from a set D to a set C by \(e(D,C)=\sup_{x\in D} d(x,C)\). The closed ball centered at x with radius a is denoted by \(\mathbb{B}_{a}(x)\), and \(\mathcal{L}(X,Y)\) denotes the Banach space of linear and bounded mappings acting from X to Y.

2 Preliminaries

We utilize the following generalization of Nadler's fixed point theorem, originally proved in [9]; for more, see [10, Theorem 5E.2]:

Theorem 2.1

(Contraction mapping principle)

Let (X,ρ) be a complete metric space, and consider a set-valued mapping \(\varPhi: X \rightrightarrows X\), a point \(\bar{x}\in X\), and positive scalars a and θ such that θ<1, the set \(\operatorname{gph}\varPhi\cap (\mathbb{B}_{a}(\bar{x})\times\mathbb{B}_{a}(\bar{x}) ) \) is closed, and the following conditions hold:

  1. (i)

    \(d (\bar{x}, \varPhi(\bar{x}) ) < a(1 - \theta)\);

  2. (ii)

    \(e (\varPhi(u)\cap\mathbb{B}_{a}(\bar{x}), \varPhi(v) ) \leq \theta\rho(u,v)\) for all \(u, v \in\mathbb{B}_{a}(\bar{x})\).

Then Φ has a fixed point in \(\mathbb{B}_{a}(\bar{x})\); that is, there exists \(x \in\mathbb{B}_{a}(\bar{x})\) such that \(x \in \varPhi(x)\). In addition, if Φ is single-valued, then Φ has a unique fixed point in \(\mathbb{B}_{a}(\bar{x})\).
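For a single-valued Φ the theorem reduces to the classical Banach contraction principle; a quick numerical illustration (the map \(\varPhi=\cos\) on [0,1], with contraction constant \(\theta=\sin 1<1\), is a toy choice, not from the paper):

```python
import math

# Fixed-point iteration for a single-valued contraction Phi.
# Phi(x) = cos(x) is a contraction on [0, 1], so the iteration converges
# to its unique fixed point x = cos(x) ~ 0.739085 (the Dottie number).
def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

fp = fixed_point(math.cos, 0.5)
```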

Recall that a metric ρ in a linear space X is said to be shift-invariant when ρ(x+y,x+y′)=ρ(y,y′) for all x,y,y′∈X.

Theorem 2.2

(Perturbed metric regularity)

Let (X,ρ) be a complete metric space and (Y,ρ) a linear metric space with shift-invariant metric. Consider a mapping \(H: X \rightrightarrows Y\) with closed graph and a point \((\bar{x}, \bar{y}) \in\operatorname{gph}H\) at which H is metrically regular; that is, there exist positive constants a, b, and κ such that

$$ d\bigl(x,H^{-1}(y)\bigr)\leq\kappa d\bigl(y,H(x)\bigr)\quad \textit{for all } (x,y)\in \mathbb{B}_a(\bar{x}) \times \mathbb{B}_b(\bar{y}). $$
(5)

Let μ>0 be such that κμ<1 and let κ′>κ. Then for every positive α and β such that

$$ \alpha\leq a/2, \quad \mu\alpha+2\beta\leq b \quad \textit{and} \quad 2\kappa'\beta\leq \alpha(1-\kappa\mu) $$
(6)

and for every function h:XY satisfying

$$ \rho\bigl(h(\bar{x}), 0\bigr) \leq\beta $$
(7)

and

$$ \rho\bigl(h(x), h\bigl(x'\bigr)\bigr) \leq\mu\rho \bigl(x, x'\bigr) \quad\textit{for every } x, x' \in \mathbb{B}_\alpha(\bar{x}), $$
(8)

the mapping h+H has the following property: for every \(y, y' \in \mathbb{B}_{\beta}(\bar{y})\) and every \(x \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) there exists \(x' \in (h+H)^{-1}(y')\) such that

$$ \rho\bigl(x, x'\bigr) \leq\frac{\kappa'}{1-\kappa\mu}\rho\bigl(y, y'\bigr). $$
(9)

In addition, if the mapping H is strongly metrically regular at \(\bar{x}\) for \(\bar{y}\); specifically, the mapping \(y \mapsto H^{-1}(y)\cap\mathbb{B}_{a}(\bar{x})\) is single-valued and Lipschitz continuous on \(\mathbb{B}_{b}(\bar{y})\) with a Lipschitz constant κ, then for μ, κ′, α and β as above and any function h satisfying (7) and (8), the mapping \(y \mapsto(h+H)^{-1}(y)\cap\mathbb {B}_{\alpha}(\bar{x})\) is a Lipschitz continuous function on \(\mathbb{B}_{\beta}(\bar{y})\) with a Lipschitz constant κ′/(1−κμ).

Before proving the theorem we make some comments. If we assume \(h(\bar{x}) = 0\), then Theorem 2.2 simply says that if H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\) and h has a sufficiently small Lipschitz constant, then the perturbed mapping h+H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\); indeed, in this case the claim involving (9) means that (h+H)−1 has the Aubin property at \(\bar{y}\) for \(\bar{x}\), which is equivalent to metric regularity of h+H at \(\bar{x}\) for \(\bar{y}\), and then we obtain the (extended) Lyusternik–Graves theorem as stated in [10, Theorem 5E.1]. For the strong regularity part we get a version of Robinson's theorem, see [10, Theorem 5F.1]. However, if \(h(\bar{x}) \neq0\), then \((\bar{x}, \bar{y})\) may not be in the graph of h+H, and then we cannot claim that h+H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\). Of course, this could be handled by choosing a new function \(\tilde{h}\) with \(\tilde{h}(x) = h(x) - h(\bar{x})\) and then applying [10, Theorem 5F.1], but the latter does not specify how the constants (e.g., the radii of the balls involved) depend on the data of the problem, which is the crux of the matter in obtaining the needed estimates. Clearly, the result in Theorem 2.2 is parallel to [10, Theorem 5F.1] and can be recovered from the latter; but we believe that giving a complete proof is beneficial to the reader.

Proof

Choose μ and κ′ as required and then α and β to satisfy (6). For any \(x \in\mathbb{B}_{\alpha}(\bar{x})\) and \(y \in\mathbb {B}_{\beta}(\bar{y})\), using the shift-invariance of the metric in Y, (7), (8) and the triangle inequality, we obtain

$$\begin{aligned} \rho\bigl(-h(x)+y, \bar{y}\bigr) \leq&\rho\bigl(0,h(\bar{x})\bigr)+\rho \bigl(h(\bar{x}), h(x)\bigr) + \rho(y,\bar{y}) \\ \leq&\beta+ \mu\rho(x, \bar{x}) + \beta\leq2\beta+ \mu\alpha \leq b, \end{aligned}$$
(10)

where the last inequality follows from the second inequality in (6). Fix \(y' \in\mathbb{B}_{\beta}(\bar{y})\) and consider the mapping

$$\varPhi_{y'}: x \mapsto H^{-1}\bigl(-h(x)+y' \bigr) \quad\text{for } x \in \mathbb{B}_\alpha(\bar{x}). $$

Clearly, \(\operatorname{gph}\varPhi_{y'}\) is closed. Let \(y \in\mathbb{B}_{\beta}(\bar{y})\), \(y \neq y'\), and let \(x \in(h+H)^{-1}(y) \cap\mathbb{B}_{\alpha}(\bar{x})\). We will apply Theorem 2.1, with the complete metric space X identified with the closed ball \(\mathbb{B}_{\alpha}(\bar{x})\), to show that there is a fixed point \(x' \in \varPhi_{y'}(x')\) in the closed ball centered at x with radius

$$ \varepsilon:=\frac{\kappa'\rho(y,y')}{1-\kappa\mu}. $$
(11)

From the third inequality in (6), we obtain

$$\varepsilon\leq \frac{\kappa'(2\beta)}{1-\kappa\mu} \leq\alpha. $$

Hence, from the first inequality in (6) we get \(\mathbb{B}_{\varepsilon}(x) \subset \mathbb{B}_{a}(\bar{x})\). Since yh(x)+H(x) and (x,y) satisfies (10), from the assumed metric regularity of H we get

$$\begin{aligned} d\bigl(x,\varPhi_{y'}(x)\bigr) = & d\bigl(x,H^{-1} \bigl(-h(x)+y'\bigr)\bigr) \leq\kappa d\bigl(-h(x)+y', H(x)\bigr) \\ =& \kappa d\bigl(y', h(x)+H(x)\bigr) \leq \kappa\rho \bigl(y,y'\bigr) \\ < & \kappa' \rho\bigl(y,y'\bigr)= \varepsilon(1- \kappa\mu). \end{aligned}$$

For any \(u,v \in\mathbb{B}_{\varepsilon}(x)\), using (8), we have

$$\begin{aligned} e\bigl(\varPhi_{y'}(u)\cap\mathbb{B}_{\varepsilon}(x), \varPhi_{y'}(v)\bigr) \leq& e\bigl(H^{-1} \bigl(-h(u)+y'\bigr)\cap\mathbb{B}_a(\bar{x}), H^{-1}\bigl(-h(v)+y'\bigr)\bigr) \\ \leq& \kappa\rho\bigl(h(u),h(v)\bigr) \leq\kappa\mu\rho(u,v). \end{aligned}$$

Applying Theorem 2.1 to the mapping \(\varPhi_{y'}\), with \(\bar{x}\) identified with x and constants a=ε and θ=κμ, we obtain the existence of a fixed point \(x' \in \varPhi_{y'}(x')=H^{-1}(-h(x')+y')\), which is equivalent to \(x' \in (h+H)^{-1}(y')\), within distance ε, given by (11), from x.

For the second part of the theorem, suppose that \(y \mapsto s(y):=H^{-1}(y)\cap\mathbb{B}_{a}(\bar{x})\) is a Lipschitz continuous function on \(\mathbb{B}_{b}(\bar{y})\) with Lipschitz constant κ. Choose μ, κ′, α and β as in the statement and let h satisfy (7) and (8). For any \(y \in\mathbb{B}_{\beta}(\bar{y})\), since \(\bar{x}\in (h+H)^{-1}(\bar{y}+h(\bar{x}))\cap\mathbb{B}_{\alpha}(\bar{x})\), from (9) we obtain that there exists \(x \in (h+H)^{-1}(y)\) such that

$$\rho(x, \bar{x}) \leq\frac{\kappa'}{1-\kappa\mu}\rho\bigl(y, \bar{y}+h(\bar{x})\bigr). $$

Since \(\rho(y, \bar{y}+h(\bar{x}))\leq2\beta\), by (6) we get \(\rho(x, \bar{x}) \leq\alpha\), that is, \((h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x}) \neq\emptyset\). Hence the domain of the mapping \(y \mapsto (h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) contains \(\mathbb{B}_{\beta}(\bar{y})\).

If \(x \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\), then \(x \in H^{-1}(y-h(x))\cap\mathbb{B}_{\alpha}(\bar{x}) \subset H^{-1}(y-h(x))\cap\mathbb{B}_{a}(\bar{x})= s(y-h(x))\) since \(y - h(x) \in\mathbb{B}_{b}(\bar{y})\) according to (10). Hence,

$$ H^{-1}\bigl(y-h(x)\bigr)\cap\mathbb{B}_\alpha(\bar{x})=s\bigl(y-h(x)\bigr)=x. $$
(12)

Assume that there exist \(y \in\mathbb{B}_{\beta}(\bar{y})\) and \(x, x' \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) such that \(x \neq x'\). From (10) we have that both yh(x) and yh(x′) are in \(\mathbb{B}_{b}(\bar{y})\). Then from (12) we get

$$\begin{aligned} \rho\bigl(x', x\bigr) = & \rho\bigl(s\bigl(-h \bigl(x'\bigr)+y\bigr), s\bigl(-h(x)+y\bigr)\bigr) \\ \leq& \kappa\rho\bigl(-h\bigl(x'\bigr)+y, -h(x)+y\bigr) = \kappa \rho\bigl(h\bigl(x'\bigr), h(x)\bigr) \\ \leq& \kappa\mu\rho\bigl(x',x\bigr) < \rho \bigl(x', x\bigr), \end{aligned}$$

which is a contradiction. Hence, the mapping \(y\mapsto g(y):=(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) is single-valued, that is, a function, defined on \(\mathbb{B}_{\beta}(\bar{y})\). Let \(y,y' \in\mathbb{B}_{\beta}(\bar{y})\). Utilizing the equality g(y)=s(−h(g(y))+y), see (12), we have

$$\begin{aligned} \rho\bigl(g(y),g\bigl(y'\bigr)\bigr) = & \rho\bigl(s\bigl(-h \bigl(g(y)\bigr)+y\bigr), s\bigl(-h\bigl(g\bigl(y'\bigr) \bigr)+y'\bigr)\bigr) \\ \leq&\kappa\rho\bigl(h\bigl(g(y)\bigr),h\bigl(g\bigl(y'\bigr) \bigr)\bigr) +\kappa\rho\bigl(y,y'\bigr) \\ \leq& \kappa\mu\rho\bigl(g(y),g\bigl(y'\bigr)\bigr) + \kappa\rho \bigl(y,y'\bigr). \end{aligned}$$

Thus,

$$\rho\bigl(g(y),g\bigl(y'\bigr)\bigr) \leq\frac{\kappa'}{1-\kappa\mu}\rho \bigl(y,y'\bigr); $$

that is, g is Lipschitz continuous with Lipschitz constant κ′/(1−κμ). The proof is complete. □

3 Convergence under metric regularity

In this section we give conditions under which the quasi-Newton iteration (4) is locally q-linearly convergent. Recall that a sequence \(\{x_k\}\) converges q-linearly to \(\bar{x}\) when there exist a natural number K and a real α∈[0,1) such that

$$\|x_{k+1} - \bar{x}\| \leq\alpha\|x_k -\bar{x}\| \quad \text{for } k=K, K+1,\ldots. $$

A sequence \(\{x_k\}\) is q-superlinearly convergent to \(\bar{x}\) when there exist a natural number K and a sequence of reals \(\{\alpha_k\}\) with \(\alpha_k \searrow 0\) such that

$$\|x_{k+1} - \bar{x}\| \leq\alpha_k \|x_k -\bar{x} \| \quad\text{for } k=K, K+1,\ldots. $$
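These two rates can be checked empirically on model sequences; the following sketch (toy sequences, not from the paper) computes the error ratios that the definitions bound:

```python
# For a sequence x_k -> xbar, the ratios ||x_{k+1}-xbar|| / ||x_k-xbar||
# stay below some alpha < 1 under q-linear convergence and tend to 0
# under q-superlinear convergence.
def convergence_ratios(seq, xbar):
    errs = [abs(x - xbar) for x in seq]
    return [errs[k + 1] / errs[k] for k in range(len(errs) - 1) if errs[k] > 0]

# q-linear model: x_k = xbar + 0.5**k (ratios constant at 0.5).
lin = [1.0 + 0.5 ** k for k in range(20)]
# q-superlinear model: x_k = xbar + 0.5**(2**k) (ratios 0.5**(2**k) -> 0).
sup = [1.0 + 0.5 ** (2 ** k) for k in range(6)]

r_lin = convergence_ratios(lin, 1.0)
r_sup = convergence_ratios(sup, 1.0)
```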

Theorem 3.1

Suppose that f+F is metrically regular at \(\bar{x}\) for 0 with constant λ. Then, in particular, \(\bar{x}\) is a solution of (1). Consider the quasi-Newton method (4) and assume that

$$ \bigl\| B_0 - Df(\bar{x})\bigr\| < 1/(2\lambda). $$
(13)

Furthermore, assume that there exist a constant c>0 and a neighborhood U of \(\bar{x}\) such that, for \(k=0,1,\ldots\), and for any \(B_k\) and any \(x_k, x_{k+1} \in U\), \(x_k \neq x_{k+1}\), all satisfying (4), the operator \(B_{k+1}\) is chosen in such a way that

$$ \bigl\| B_{k+1} - Df(\bar{x})\bigr\| \leq\bigl\| B_{k}-Df(\bar{x}) \bigr\| + c\bigl(\|x_k-\bar{x}\|+\| x_{k+1}-\bar{x}\|\bigr). $$
(14)

Then there exists a neighborhood O of \(\bar{x}\) such that for every \(x_0 \in O\) there exists a sequence \(\{x_k\}\) starting at \(x_0\) and generated by (4) which stays in O and either reaches a solution of (1) in O after finitely many steps or converges q-linearly to \(\bar{x}\). If in addition f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every \(x_0 \in O\) there is a sequence \(\{x_k\}\), unique in O, starting at \(x_0\) and generated by (4), and this sequence converges q-linearly to \(\bar{x}\).

Proof

Choose κ>λ such that

$$ \delta:=\bigl\| B_0 - Df(\bar{x})\bigr\| < 1/(2\kappa). $$
(15)

Let κ′>κ be such that

$$\frac{\kappa'\delta}{1-\kappa\delta} < 1 $$

and then fix γ>0 to satisfy

$$\frac{\kappa'\delta}{1-\kappa\delta} < \gamma<1. $$

Choose ε>0 such that

$$ \frac{\kappa'}{1-\kappa(\delta+\varepsilon)}(\varepsilon+\delta ) < \gamma. $$
(16)

Let

$$ H(x) = f(\bar{x}) + Df(\bar{x}) (x-\bar{x}) + F(x), $$
(17)

for \(x \in X\). From Theorem 2.2 applied to f+F and \(h(\cdot)=f(\bar{x})+Df(\bar{x})(\,\cdot\,-\bar{x})-f(\cdot)\) (or simply by the standard Lyusternik–Graves theorem, see, e.g., [10, Theorem 5E.1]), since \(h(\bar{x})=0\), κ>λ, and (f+F)+h=H, inequality (9) yields metric regularity of H at \(\bar{x}\) for 0 with constant κ. Hence there exist positive constants a and b such that

$$d\bigl(x,H^{-1}(y)\bigr)\leq\kappa d\bigl(y,H(x)\bigr)\quad\text{for all }x\in\mathbb{B} _a(\bar{x}),y\in\mathbb{B}_b(0). $$

Using (16), make a smaller if necessary so that \(\mathbb{B} _{a}(\bar{x})\subset U\),

$$ \begin{aligned} &\bigl\| f(u)-f(v)-Df(\bar{x}) (u-v)\bigr\| \leq\varepsilon\|u-v\| \quad\text{for all }u,v\in\mathbb{B}_a(\bar{x}), \\ &\kappa \biggl(\delta+\frac{ca}{1-\gamma} \biggr)< 1, \end{aligned} $$
(18)

and

$$ \frac{\kappa'}{1-\kappa (\delta+ \frac{ca}{1-\gamma} )} \biggl(\varepsilon+\delta+\frac{ca}{1-\gamma} \biggr) < \gamma, $$
(19)

where c is from (14). Set

$$\mu:=\delta+\frac{ca}{1-\gamma} $$

and choose positive α and β to satisfy the inequalities (6) in Theorem 2.2. Choose τ∈(0,α) such that

$$ \biggl(\varepsilon+\delta+\frac{ca}{1-\gamma} \biggr)\tau\leq \beta. $$
(20)

Denote \(O:=\mathbb{B}_{\tau}(\bar{x})\) and choose any \(x_{0}\in O\setminus\{ \bar{x}\}\). Consider the function

$$h_0(x): = f(x_0) + B_0(x-x_0)-f( \bar{x})-Df(\bar{x}) (x-\bar{x}). $$

Then we have

$$ \bar{y}_0 \in h_0(\bar{x})+H(\bar{x}) \quad \text{where } \bar{y}_0 := f(x_0)-f(\bar{x})+B_0(\bar{x}-x_0). $$
(21)

Further,

$$\begin{aligned} \bigl\| h_0(\bar{x})\bigr\| = & \bigl\| f(x_0) + B_0(\bar{x}-x_0)-f(\bar{x})\bigr\| \\ \leq& \bigl\| f(x_0)-f(\bar{x}) - Df(\bar{x}) (x_0-\bar{x})\bigr\| + \bigl\| \bigl(B_0-Df(\bar{x})\bigr) (\bar{x}-x_0)\bigr\| \\ \leq& (\varepsilon+ \delta)\tau\leq\beta, \end{aligned}$$
(22)

where we use (20). For any \(x,x' \in\mathbb{B}_{\alpha}(\bar{x})\) from (15) we have

$$\bigl\| h_0(x)-h_0\bigl(x'\bigr)\bigr\| = \bigl\| \bigl(B_0-Df(\bar{x})\bigr) \bigl(x-x'\bigr)\bigr\| \leq\delta \bigl\| x-x'\bigr\| \leq \mu\bigl\| x-x'\bigr\| . $$

Also, observe that \(\bar{y}_{0} = h_{0}(\bar{x})\); hence from (22) we get \(\bar{y}_{0} \in\mathbb{B}_{\beta}(0)\). Finally, from (21) we have \(\bar{x}\in(h_{0}+H)^{-1}(\bar{y}_{0})\). We are now ready to apply Theorem 2.2, with H defined in (17), \(h=h_0\), and κ, μ, κ′, a, b, α, β having the values defined above, to obtain that there exists \(x_1 \in (h_0+H)^{-1}(0)\), that is, \(x_1\) satisfies (4) for k=0, and also

$$\|x_1 - \bar{x}\| \leq\frac{\kappa'}{1-\kappa\mu}\|\bar{y}_0\| \leq \frac{\kappa'}{1-\kappa\mu}(\varepsilon+\delta)\|x_0-\bar{x}\| \leq\gamma \|x_0 - \bar{x}\|, $$

where we use the estimates (19) and (22). Since γ<1, this yields \(x_{1} \in O=\mathbb{B}_{\tau}(\bar{x})\).

By induction, suppose that there exist an integer n>1 and points \(x_1, \ldots, x_n\) with \(x_{k} \in\mathbb{B}_{\tau}(\bar{x})\), and

$$ \|x_k-\bar{x}\|\leq\gamma\|x_{k-1}-\bar{x}\| \quad\text{for } k = 1, \dots, n. $$
(23)

If for some k∈{1,…,n} we have \(x_{k}=\bar{x}\) or \(x_{k-1}=x_k\), then \(x_k\) is a solution of (1). Otherwise, we have \(x_{k-1}\neq x_{k} \neq\bar{x}\) for all k=1,…,n. From condition (14), taking into account that \(\tau\leq\alpha\leq a/2\), we get

$$\begin{aligned} \bigl\| Df(\bar{x})-B_n\bigr\| \leq{}& \bigl\| Df(\bar{x})-B_0 \bigr\| +c\sum_{k=1}^n \bigl(\| x_k-\bar{x} \|+\|\bar{x}-x_{k-1}\| \bigr) \\ \leq{}& \delta+c\sum_{k=1}^n \bigl( \|x_k-\bar{x}\|+\|\bar{x}-x_{k-1}\| \bigr) \\ \leq{}& \delta+ 2c\sum_{k=0}^n \|x_{k}-\bar{x}\| \leq\delta+2c\sum_{k=0}^{\infty} \gamma^k\|x_0-\bar{x}\| \leq\delta+\frac{ca}{1-\gamma}. \end{aligned}$$
(24)

Define

$$h_n(x): = f(x_n) + B_n(x-x_n)-f( \bar{x})-Df(\bar{x}) (x-\bar{x}) $$

and

$$\bar{y}_n := f(x_n)-f(\bar{x})+B_n(\bar{x}-x_n). $$

Then, using (24), we obtain

$$\begin{aligned} \|\bar{y}_n\| \leq& \bigl\| f(x_n)-f( \bar{x}) - Df(\bar{x}) (x_n-\bar{x})\bigr\| + \bigl\| \bigl(B_n-Df(\bar{x})\bigr) (\bar{x}-x_n)\bigr\| \\ \leq& \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma} \biggr)\| x_n- \bar{x}\| \leq \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr) \|x_0-\bar{x}\| . \end{aligned}$$
(25)

Hence, by (20),

$$ \|\bar{y}_n\| \leq \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr)\tau\leq\beta. $$
(26)

Since \(h_{n}(\bar{x}) = \bar{y}_{n}\), we get \(\|h_{n}(\bar{x})\| \leq\beta\). Also, for any \(x,x' \in\mathbb{B}_{\alpha}(\bar{x})\) we obtain

$$\begin{aligned} \bigl\| h_n(x)-h_n\bigl(x'\bigr)\bigr\| =& \bigl\| \bigl(B_n-Df(\bar{x})\bigr) \bigl(x-x'\bigr)\bigr\| \\ \leq& \biggl(\delta +\frac{ca}{1-\gamma} \biggr)\bigl\| x-x'\bigr\| =\mu \bigl\| x-x'\bigr\| . \end{aligned}$$

The assumptions of Theorem 2.2 are then satisfied; hence, taking into account that \(\bar{x}\in(h_{n}+H)^{-1}(\bar{y}_{n})\), we conclude that there exists \(x_{n+1}\in(h_{n}+H)^{-1}(0)\), that is, \(x_{n+1}\) satisfying (4) for k=n, such that

$$\|x_{n+1}-\bar{x}\| \leq\frac{\kappa'}{1-\kappa\mu}\|\bar{y}_n\|. $$

Then, utilizing (19) and (25) we obtain

$$ \|x_{n+1}-\bar{x}\| \leq\frac{\kappa '}{1-\kappa\mu} \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr)\|x_n-\bar{x}\| \leq\gamma \|x_n-\bar{x}\|. $$
(27)

Hence, \(x_{n+1} \in\mathbb{B}_{\tau}(\bar{x})\) and the induction step is complete. If \(x_{n+1}=\bar{x}\) or \(x_{n+1}=x_n\), then \(x_{n+1}\) is a solution of (1). Otherwise, we have an infinite sequence \(\{x_n\}\) with \(x_n \neq x_{n+1}\) for all n which satisfies (27). Since γ<1, (27) yields that the sequence \(\{x_k\}\) converges q-linearly to \(\bar{x}\).

For the final statement, suppose that f+F is strongly metrically regular with the same constant κ and neighborhoods \(\mathbb{B}_{a}(\bar{x})\) and \(\mathbb{B}_{b}(0)\). According to the second part of Theorem 2.2, the point \(x_{n+1}\in(h_{n}+H)^{-1}(0)\) is unique in O. Furthermore, \((f+F)^{-1}(0)\cap O=\{\bar{x}\}\), and hence the sequence must converge q-linearly to the only solution \(\bar{x}\) in O. The proof is complete. □

4 Convergence of the Broyden update

In this section X and Y are real Hilbert spaces with scalar products denoted by 〈⋅,⋅〉. We consider the following well-known Broyden update:

$$ B_{k+1}:=B_{k}+\frac{(y_{k}-B_{k}s_{k}) \langle s_{k},\cdot \rangle}{ \Vert s_{k}\Vert ^{2}}, $$
(28)

where \(y_k := f(x_{k+1})-f(x_k)\) and \(s_k := x_{k+1}-x_k\). Usually, \(B_0\) is taken to be \(Df(x_0)\).
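For the equation case F ≡ 0, the update (28) reduces in finite dimensions to the familiar rank-one matrix update, which the following minimal sketch implements; the test system and starting data are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Quasi-Newton iteration (4) with Broyden's update (28), specialized to
# F = 0, i.e. the equation f(x) = 0.
def broyden(f, x0, B0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -fx)          # solve f(x_k) + B_k s = 0
        x_new = x + s
        fx_new = f(x_new)
        if np.linalg.norm(fx_new) < tol:
            return x_new
        y = fx_new - fx                      # y_k = f(x_{k+1}) - f(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)   # Broyden update (28)
        x, fx = x_new, fx_new
    return x

# Toy system: x^2 + y^2 = 2 and x = y, with solution (1, 1).
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 2.0, v[0] - v[1]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
x0 = np.array([1.2, 0.8])
sol = broyden(f, x0, jac(x0))   # B_0 = Df(x_0), as is customary
```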

There are a large number of papers dealing with quasi-Newton methods for solving nonlinear equations in the infinite-dimensional setting, and some of them deal with the Broyden update; see, e.g., [3, 12, 15, 18, 22, 23, 25]. An overview of the Broyden update, together with historical remarks and recent works, is given in [13]. We will apply Theorem 3.1 to the Broyden update (28), showing that it satisfies condition (14) and hence is locally q-linearly convergent.

We start with an elementary lemma.

Lemma 4.1

Let \(A \in\mathcal{L}(X, Y)\). If \(x \in X\setminus\{0\}\), then

$$ \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert =0 \quad\textit{if }\dim X=1, \qquad \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert \leq\|A\| \quad\textit{if }\dim X>1. $$
(29)

Proof

Let \(z\in\operatorname{span}(x)=\{\lambda x\mid\lambda\in\mathbb{R}\}\). Then there is some \(\lambda_{0}\in\mathbb{R}\) such that \(z=\lambda_0 x\), whence

$$\biggl\Vert Az-\frac{\langle x,z\rangle Ax}{\|x\|^2}\biggr\Vert =\biggl\Vert \lambda_0 Ax-\frac{\lambda_0\langle x,x\rangle Ax}{\|x\|^2}\biggr\Vert =0. $$

If dimX=1, then \(X=\operatorname{span}(x)\), and from the above equality we obtain (29). Otherwise, assume that dimX>1. For any \(z\in\mathbb{B}_{X}\), one has

$$\biggl\Vert z-\frac{\langle x,z\rangle x}{\|x\|^2}\biggr\Vert ^2=\|z\| ^2-\frac{\langle x,z\rangle^2}{\|x\|^2}\leq1. $$

Hence,

$$\begin{aligned} \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert &= \sup_{w\in\mathbb{B}_X} \biggl\Vert A \biggl(w-\frac{\langle x,w\rangle x}{\| x\|^2} \biggr)\biggr\Vert \leq\|A\|\sup _{w\in\mathbb{B}_X}\biggl\Vert w-\frac {\langle x,w\rangle x}{\|x\|^2}\biggr\Vert \leq\|A\|. \end{aligned}$$

Finally, for any \(z\in\{x\}^{\perp}=\{w\in X\mid\langle w,x\rangle=0\}\), one has

$$\biggl\Vert Az-\frac{\langle x,z\rangle Ax}{\|x\|^2}\biggr\Vert =\|Az\|, $$

so the norm on the left-hand side of (29) equals the norm of the restriction of A to \(\{x\}^{\perp}\), and (29) follows. □

The following result is a generalization to Hilbert spaces of a statement included in the first part of the proof of [24, Theorem 5.4.13].

Proposition 4.2

Suppose that the Fréchet derivative mapping Df is Lipschitz continuous with constant L in a convex neighborhood U of a point \(\bar{x}\). Given \(B_{k} \in\mathcal{L}(X, Y)\) and \(x_k, x_{k+1} \in U\) with \(x_{k+1} \neq x_k\), if \(B_{k+1}\) is defined as in (28), then

$$ \bigl\| B_{k+1}-Df(\bar{x})\bigr\| \leq \bigl\| B_k-Df(\bar{x})\bigr\| + \frac{L}{2} \bigl(\| x_{k+1}-\bar{x}\|+ \|\bar{x}-x_k\| \bigr). $$
(30)

Proof

By assumption,

$$\bigl\| Df(u)-Df(v)\bigr\| \leq L\|u-v\|\quad\text{for all }u,v\in U. $$

Let \(x_{k+1}, x_k \in U\), \(x_k \neq x_{k+1}\), and let \(B_{k+1}\) be defined as in (28). Then

$$\begin{aligned} B_{k+1}-Df(\bar{x})&=B_k-Df(\bar{x})+\frac{(y_k-B_k s_k)\langle s_k,\cdot\,\rangle}{\|s_k\|^2} \\ &=B_k-Df(\bar{x})-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\,\rangle }{\|s_k\|^2}+\frac{(y_k-Df(\bar{x}) s_k)\langle s_k,\cdot\,\rangle}{\| s_k\|^2}. \end{aligned}$$

Thus,

$$\bigl\| B_{k+1}-Df(\bar{x})\bigr\| \leq\biggl\Vert \bigl(B_k-Df(\bar{x})\bigr)-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\,\rangle}{\|s_k\|^2}\biggr\Vert +\frac{\| y_k-Df(\bar{x}) s_k\|}{\|s_k\|}. $$

By Lemma 4.1,

$$\biggl\Vert \bigl(B_k-Df(\bar{x})\bigr)-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\, \rangle}{\|s_k\|^2}\biggr\Vert \leq\bigl\| B_k-Df(\bar{x})\bigr\| . $$

Utilizing the mean value theorem, we obtain

$$\begin{aligned} \bigl\| y_k-Df(\bar{x}) s_k\bigr\| &=\bigl\| f(x_{k+1})-f(x_k)-Df( \bar{x})s_k\bigr\| \\ &=\biggl\Vert \int_0^1 \bigl[Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr) (x_{k+1}-x_{k})-Df(\bar{x})s_k\bigr] dt\biggr\Vert \\ &\leq\|s_k\|\int_0^1\bigl\| Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df(\bar{x})\bigr\| dt \\ &\leq L\|s_k\|\int_0^1 \bigl((1-t) \|x_{k}-\bar{x}\|+t\|x_{k+1}-\bar{x}\| \bigr)dt \\ &=\frac{L}{2}\|s_k\| \bigl(\|x_{k+1}-\bar{x}\|+ \|x_{k}-\bar{x}\| \bigr). \end{aligned}$$

This yields (30). □
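In finite dimensions the update appearing in the first display of the proof can be checked directly. The following minimal Python sketch (assuming, consistently with that display, that (28) denotes the rank-one update B k+1 =B k +(y k −B k s k )〈s k ,·〉/∥s k ∥²) verifies the secant condition B k+1 s k =y k , which is the property of the update exploited throughout this section:

```python
def broyden_update(B, s, y):
    """Rank-one update B_{k+1} = B_k + (y_k - B_k s_k) <s_k, .> / ||s_k||^2,
    as in the first display of the proof above (finite-dimensional case)."""
    ns2 = sum(si * si for si in s)                      # ||s_k||^2
    Bs = [sum(B[i][j] * s[j] for j in range(len(s))) for i in range(len(B))]
    return [[B[i][j] + (y[i] - Bs[i]) * s[j] / ns2 for j in range(len(s))]
            for i in range(len(B))]

# arbitrary illustrative data
B = [[2.0, 0.5], [0.0, 1.0]]
s = [1.0, -1.0]
y = [3.0, 2.0]
B1 = broyden_update(B, s, y)
B1s = [sum(B1[i][j] * s[j] for j in range(2)) for i in range(2)]
# the secant condition B_{k+1} s_k = y_k holds exactly
assert all(abs(B1s[i] - y[i]) < 1e-12 for i in range(2))
```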

We apply Theorem 3.1 to obtain the following result:

Theorem 4.3

Consider the generalized equation (1) in the setting of Hilbert spaces X and Y with a solution \(\bar{x}\) and suppose that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\). Also, suppose that f+F is metrically regular at \(\bar{x}\) for 0 with constant λ. Consider the quasi-Newton method (4) applied to (1) with the Broyden update (28) and with B 0 satisfying (13). Then there exists a neighborhood O of \(\bar{x}\) such that for any x 0∈O there exists a sequence {x k } starting from x 0 and generated by (4) which stays in O and either reaches a solution of (1) in finitely many steps or converges q-linearly to \(\bar{x}\). If in addition f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every x 0∈O there is a unique sequence {x k } in O starting from x 0 and generated by (4), and this sequence converges q-linearly to \(\bar{x}\).

We devote the remainder of this section to the q-superlinear convergence of the Broyden update. Recall that the Hilbert–Schmidt norm of an operator \(A\in\mathcal {L}(X,Y)\) is defined as

$$\|A\|_{HS}=\sqrt{\sum_{i\in I} \|Ae_i\|^2}, $$

where {e i ,i∈I} is an orthonormal basis of X. Denote by \(\mathcal{H}(X,Y):= \{ A \in\mathcal{L}(X,Y) \mid\|A\|_{HS} < +\infty\}\) the set of Hilbert–Schmidt operators. Endowed with the inner product

$$\langle A,B\rangle_{HS}=\sum_{i\in I}\langle Ae_i,B e_i\rangle, $$

\(\mathcal{H}(X,Y)\) becomes a Hilbert space, see [20]. In Euclidean spaces this norm coincides with the Frobenius norm.
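In the matrix case the basis-free definitions above reduce to familiar entrywise formulas. The following plain-Python sketch (standard basis of \(\mathbb{R}^{2}\), arbitrary illustrative matrices) checks that 〈A,B〉 HS computed over the basis agrees with the entrywise (Frobenius) inner product:

```python
def hs_inner(A, B):
    """<A,B>_HS = sum_i <A e_i, B e_i> over the standard basis e_i of R^n."""
    total = 0.0
    for i in range(len(A[0])):          # the columns of A and B are A e_i, B e_i
        total += sum(A[r][i] * B[r][i] for r in range(len(A)))
    return total

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
# the basis definition agrees with the entrywise (Frobenius) inner product
assert abs(hs_inner(A, B) - sum(A[r][c] * B[r][c]
                                for r in range(2) for c in range(2))) < 1e-12
# ||A||_HS^2 is the sum of squared entries, i.e. the squared Frobenius norm
assert abs(hs_inner(A, A) - (1.0 + 4.0 + 9.0 + 16.0)) < 1e-12
```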

We start with a lemma which echoes Lemma 4.1.

Lemma 4.4

Let \(A \in\mathcal{H}(X,Y)\). If 0≠x∈X, then

$$ \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert _{HS}^2=\| A\|_{HS}^2- \frac{\|Ax\|^2}{\|x\|^2}. $$
(31)

Proof

Note that

$$\begin{aligned} \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert _{HS}^2&= \|A\|_{HS}^2+\biggl\Vert \frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert ^2_{HS}-2 \biggl\langle A,\frac{\langle x,\cdot\rangle Ax}{\|x\|^2} \biggr\rangle _{HS}. \end{aligned}$$

Further, by the Parseval identity,

$$\bigl\Vert {\langle x,\cdot\rangle Ax}\bigr\Vert ^2_{HS}= \sum_{i\in I}\bigl\| \langle x,e_i\rangle Ax \bigr\| ^2= \|Ax\|^2\sum_{i\in I} \langle x,e_i\rangle^2=\|Ax\|^2\|x \|^2, $$

and

$$\bigl\langle A,\langle x,\cdot\rangle Ax \bigr\rangle _{HS}=\sum _{i\in I} \bigl\langle Ae_i,\langle x,e_i\rangle Ax \bigr\rangle =\sum_{i\in I} \bigl\langle A\langle x,e_i\rangle e_i,Ax \bigr\rangle = \|Ax\|^2, $$

where to get the last equality we apply Remark 1.2.1(c) in [20]. This yields (31). □
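Identity (31) is easy to confirm numerically in the matrix case. A small Python check with the Frobenius norm (the particular A and x are arbitrary choices):

```python
def frob2(M):
    """Squared Hilbert-Schmidt (Frobenius) norm of a matrix."""
    return sum(v * v for row in M for v in row)

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1.0, -2.0], [0.5, 3.0]]
x = [2.0, 1.0]
nx2 = sum(v * v for v in x)                 # ||x||^2
Ax = matvec(A, x)
# entrywise, (A - <x,.>Ax/||x||^2)[i][j] = A[i][j] - Ax[i]*x[j]/||x||^2
M = [[A[i][j] - Ax[i] * x[j] / nx2 for j in range(2)] for i in range(2)]
lhs = frob2(M)
rhs = frob2(A) - sum(v * v for v in Ax) / nx2
assert abs(lhs - rhs) < 1e-12               # identity (31)
```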

Lemma 4.4 implies that Proposition 4.2 is also valid for the Hilbert–Schmidt norm.

Proposition 4.5

Consider a function f:X→Y and a point \(\bar{x}\in X\) such that the derivative mapping Df is Lipschitz continuous with respect to the Hilbert–Schmidt norm with constant L in a convex neighborhood U of \(\bar{x}\). Given \(B_{k} \in \mathcal{H}(X,Y)\) and x k ,x k+1∈U, with x k+1≠x k , if B k+1 is defined as in (28), then

$$ \bigl\| B_{k+1}-Df(\bar{x}) \bigr\| _{HS}\leq\bigl\| B_k-Df(\bar{x})\bigr\| _{HS} + \frac {L}{2} \bigl(\|x_{k+1}-\bar{x}\|+\|\bar{x}-x_k\| \bigr). $$
(32)

Proof

This can be obtained by applying the same argument as in Proposition 4.2 but using Lemma 4.4 instead of Lemma 4.1. □

Corollary 4.6

Under the assumptions of Proposition 4.5, if \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator, then \(B_{k}-Df(\bar{x})\) is a Hilbert–Schmidt operator for all \(k\in\mathbb{N}\).

Proof

This follows from (32). □

In the remainder of this section we link the analysis presented so far with a central result in the theory of quasi-Newton methods: the Dennis–Moré theorem. This theorem, first published in [5], gives a characterization of the q-superlinear convergence of a quasi-Newton method applied to a smooth equation f(x)=0 with a zero at \(\bar{x}\) at which the derivative mapping \(Df(\bar{x})\) is invertible. Namely, if a quasi-Newton method generates a sequence {x k } which stays near \(\bar{x}\) and x k+1≠x k for all k, then {x k } converges q-superlinearly if and only if it is convergent and, in addition,

$$ \lim_{k\rightarrow\infty}\frac{\Vert E_{k}s_{k}\Vert }{ \Vert s_{k}\Vert }=0, $$
(33)

where \(E_{k}:=B_{k}-Df(\bar{x})\).

It is well known that the Broyden update (28) applied to a smooth equation in finite dimensions with a nonsingular Jacobian at the reference solution \(\bar{x}\) satisfies condition (33), see e.g. [17, Theorem 7.2.4]. Proofs of this claim in infinite-dimensional Hilbert spaces are given in [18] and [23], both of which explicitly use the fact that they deal with equations. We will now show that (33) holds as well by relying only on the formula (28), without assuming that this update is applied to solving an equation. This allows us to apply the Dennis–Moré theorem for generalized equations in Banach spaces proved in [8, Theorem 3]. For completeness we state next the sufficiency part of the latter result, which is used in what follows.
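For a scalar equation the Broyden update reduces to the secant slope B k+1 =y k /s k , and the decay of the Dennis–Moré quotient can be observed directly. A short illustrative sketch (the equation t²−2=0 and the starting data are our own choices, not taken from the text):

```python
import math

f = lambda t: t * t - 2.0                   # illustrative smooth equation
xbar = math.sqrt(2.0)                       # its positive zero
dfbar = 2.0 * xbar                          # Df(xbar)

x, B = 1.0, 2.0                             # x_0 and B_0 = Df(x_0)
ratios = []
for _ in range(6):
    x_new = x - f(x) / B                    # quasi-Newton step
    s, y = x_new - x, f(x_new) - f(x)
    if s == 0.0:
        break
    ratios.append(abs((B - dfbar) * s) / abs(s))   # ||E_k s_k|| / ||s_k||
    B = B + (y - B * s) * s / (s * s)       # scalar Broyden update: B = y_k / s_k
    x = x_new

# the Dennis-More quotient decays along the iteration
assert ratios[-1] < 1e-4 and ratios[-1] < ratios[0]
```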

Theorem 4.7

Consider the generalized equation (1) with a solution \(\bar{x}\) and suppose that f is Fréchet differentiable in a neighborhood U of \(\bar{x}\) and the derivative mapping Df is continuous at \(\bar{x}\). Also, suppose that the mapping \(x\mapsto f(\bar{x})+D f(\bar{x})(x-\bar{x}) + F(x)\) is strongly metrically subregular at \(\bar{x}\) for 0. If a sequence {x k } generated by (4) is convergent to \(\bar{x}\) and satisfies (33), then it is convergent q-superlinearly.

The following theorem gives conditions under which the Broyden update satisfies the Dennis–Moré condition (33).

Theorem 4.8

Consider a function f:X→Y and a point \(\bar{x}\in X\) such that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\) with respect to the Hilbert–Schmidt norm. Consider also the Broyden update (28) such that \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator. If a sequence {x k } is linearly convergent to \(\bar{x}\), then it satisfies the Dennis–Moré condition (33).

Proof

We first show that

$$ \lim_{k\to\infty}\|B_{k+1}-B_k\|_{HS} = 0. $$
(34)

The proof of (34) parallels the analysis in [14], similarly to the proof of [4, Theorem 2.3]. Consider the convex set

$$\mathcal{C}_k:=\bigl\{ A\in\mathcal{L}(X,Y)\mid\bigl\| A-Df(\bar{x})\bigr\| _{HS}<\infty\text{ and } As_k=y_k \bigr\} , $$

where s k :=x k+1 −x k and y k :=f(x k+1 )−f(x k ). Observe that \(\mathcal{C}_{k}\) is closed: if \(A_{n}\in\mathcal{C}_{k}\) converges to \(A\in\mathcal{L}(X,Y)\) (with respect to the Hilbert–Schmidt norm), then

$$\bigl\| A-Df(\bar{x})\bigr\| _{HS}\leq\|A-A_n\|_{HS}+ \bigl\| A_n-Df(\bar{x})\bigr\| _{HS}<\infty, $$

and

$$\|As_k-y_k\|=\|As_k-A_ns_k \|\leq\|A-A_n\|_{HS}\|s_k\|, $$

where we employ the inequality ∥A−A n ∥≤∥A−A n ∥ HS (see e.g. [20, Corollary 16.9]). Taking the limit as n→∞, we get As k =y k .

Let {e i ,i∈I} be an orthonormal basis of X for an index set I. By Corollary 4.6, \(B_{k+1}\in\mathcal{C}_{k}\). Moreover, for all \(A\in\mathcal{C}_{k}\), one has

$$\begin{aligned} \|B_{k+1}-B_k\|_{HS}^2&=\biggl\Vert \frac{(y_k-B_ks_k)\langle s_k,\cdot \rangle}{\|s_k\|^2}\biggr\Vert _{HS}^2=\frac{\|(A-B_k)s_k \langle s_k,\cdot\rangle\|_{HS}^2}{\|s_k\|^4} \\ & =\frac{\sum_{i\in I}\|(A-B_k)s_k\langle s_k,e_i\rangle\|^2}{\|s_k\|^4} =\frac{\sum_{i\in I}\langle s_k,e_i\rangle^2\|(A-B_k)s_k\|^2}{\|s_k\| ^4} \\ &=\frac{\|(A-B_k)s_k\|^2}{\|s_k\|^2} \leq\|A-B_k\|_{HS}^2, \end{aligned}$$

where we again use the inequality ∥A−B k ∥≤∥A−B k ∥ HS . Hence the Broyden update (28) is the (unique) solution to the minimization problem

$$\min_{A\in\mathcal{C}_k}\|A-B_k\|_{HS}. $$

Thus, B k+1 is the projection of B k onto the closed convex set \(\mathcal{C}_{k}\). The projection mapping onto \(\mathcal{C}_{k}\), denoted by \(P_{\mathcal{C}_{k}}\), is firmly nonexpansive (see e.g. [2, Proposition 4.8]), meaning in our case that for every \(A\in\mathcal{C}_{k}\) one has

$$\bigl\| P_{\mathcal{C}_k}(B_k)-P_{\mathcal{C}_k}(A)\bigr\| _{HS}^2+ \bigl\| (I-P_{\mathcal{C}_k}) (B_k)-(I-P_{\mathcal{C}_k}) (A) \bigr\| _{HS}^2\leq\| B_k-A\|_{HS}^2, $$

where I denotes the identity mapping. (Firmly nonexpansive mappings can be defined in several equivalent ways; here we use one of the possible definitions, see also [2, Definition 4.1(i)].) Hence, for all \(A\in\mathcal{C}_{k}\),

$$ \|B_{k+1}-A \|_{HS}^2+\|B_{k+1}-B_k \|_{HS}^2\leq\|B_k-A\|_{HS}^2. $$
(35)
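Since the constraint As k =y k is affine, \(\mathcal{C}_{k}\) is an affine set and (35) in fact holds with equality (Pythagoras). This is easy to check numerically for matrices; in the sketch below both members of \(\mathcal{C}_{k}\) are produced by rank-one secant corrections, an arbitrary construction chosen only to guarantee As k =y k :

```python
def frob2(M):
    return sum(v * v for row in M for v in row)

def mat_sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def broyden(B, s, y):
    """Rank-one secant correction of B, so that the result maps s to y."""
    ns2 = sum(v * v for v in s)
    Bs = [sum(B[i][j] * s[j] for j in range(len(s))) for i in range(len(B))]
    return [[B[i][j] + (y[i] - Bs[i]) * s[j] / ns2 for j in range(len(s))]
            for i in range(len(B))]

s, y = [1.0, 2.0], [0.5, -1.0]
Bk = [[1.0, 0.0], [0.0, 1.0]]
Bk1 = broyden(Bk, s, y)                       # the projection of B_k onto C_k
A = broyden([[3.0, -1.0], [2.0, 0.0]], s, y)  # another member of C_k: A s = y
lhs = frob2(mat_sub(Bk1, A)) + frob2(mat_sub(Bk1, Bk))
rhs = frob2(mat_sub(Bk, A))
assert lhs <= rhs + 1e-9                      # inequality (35); here an equality
```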

For

$$ A_k:=\int_0^1 Df \bigl(x_k+t(x_{k+1}-x_k)\bigr)dt $$
(36)

we have

$$ A_ks_k=\int_0^1 Df \bigl(x_k+t(x_{k+1}-x_k)\bigr) (x_{k+1}-x_k)dt=f(x_{k+1})-f(x_k)=y_k . $$
(37)

Furthermore, since Df is Lipschitz continuous with respect to the Hilbert–Schmidt norm, there is a constant L≥0 such that, eventually,

$$\begin{aligned} \bigl\| A_k-Df(\bar{x})\bigr\| _{HS}&=\biggl\Vert \int _0^1 \bigl(Df\bigl(x_k+t(x_{k+1}-x_k) \bigr)-Df(\bar{x}) \bigr)dt\biggr\Vert _{HS} \\ &\leq\int_0^1\bigl\Vert Df \bigl(tx_{k+1}+(1-t)x_k\bigr)-Df(\bar{x})\bigr\Vert _{HS}dt \\ &\leq L \int_0^1\bigl\Vert t(x_{k+1}-\bar{x})+(1-t) (x_k-\bar{x})\bigr\Vert dt \\ &\leq\frac{L}{2} \bigl(\|x_{k+1}-\bar{x}\|+\|x_k-\bar{x}\| \bigr) <\infty . \end{aligned}$$
(38)

Thus, \(A_{k}\in\mathcal{C}_{k}\). Since x k converges to \(\bar{x}\), we deduce from (38) that \(\|A_{k}-Df(\bar{x})\|_{HS}\) converges to zero. Moreover, (32) together with the linear convergence of x k to \(\bar{x}\) implies that \(\|B_{k}-Df(\bar{x})\|_{HS}\) is convergent. Indeed, let 0<γ<1 be such that

$$\|x_{k+1}-\bar{x}\|\leq\gamma\|x_k-\bar{x}\|\quad\text{for all }k=0,1,\ldots. $$

Then, for all m>n, one has by (32)

$$\begin{aligned} \bigl\| B_m-Df(\bar{x})\bigr\| _{HS} &\leq\bigl\| B_n-Df(\bar{x}) \bigr\| _{HS}+\frac{L}{2}\sum_{k=n+1}^m \bigl(\|x_k-\bar{x}\|+\|x_{k-1}-\bar{x}\| \bigr) \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+L\sum _{k=n}^{m-1}\|x_{k}-\bar{x}\| \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+L\sum _{k=n}^{\infty}\gamma^k\| x_{0}- \bar{x}\| \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+\frac{L\gamma^n}{1-\gamma}\| x_{0}-\bar{x}\|. \end{aligned}$$

This implies that \(\|B_{k}-Df(\bar{x})\|_{HS}\) is a Cauchy sequence, and thus it is convergent. Therefore, since A k defined in (36) converges to \(Df(\bar{x})\), we get that ∥B k −A k ∥ HS and ∥B k+1 −A k ∥ HS converge to the same limit. Furthermore, (35) implies

$$ \|B_{k+1}-A_k \|^2_{HS}+\|B_{k+1}-B_k \|^2_{HS}\leq\|B_k-A_k \|^2_{HS} $$
(39)

which in turn yields (34).

We are now ready to prove that (33) is satisfied. Since ∥B k+1 −B k ∥≤∥B k+1 −B k ∥ HS , by the triangle inequality we have

$$\begin{aligned} \Vert E_{k}s_{k}\Vert =&\bigl\Vert \bigl(B_{k}-Df(\overline{x})\bigr)s_{k}\bigr\Vert \\ \leq&\bigl\Vert \bigl(B_{k+1}-Df(\overline {x})\bigr)s_{k} \bigr\Vert +\Vert B_{k+1}-B_{k}\Vert _{HS} \Vert s_{k}\Vert . \end{aligned}$$
(40)

The next steps mimic the proof of Proposition 4.2. Taking into account that

$$\bigl\Vert \bigl(B_{k+1}-Df(\overline{x})\bigr)s_{k}\bigr\Vert =\bigl\Vert y_{k}-Df(\overline{x})s_{k}\bigr\Vert =\bigl\Vert f(x_{k+1})-f(x_{k} )-Df( \overline{x})s_{k}\bigr\Vert $$

and

$$f(x_{k+1})-f(x_{k})={ \displaystyle\int\nolimits_{0}^{1}} Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)s_{k}dt, $$

we get

$$\begin{aligned} \bigl\Vert \bigl(B_{k+1}-Df(\overline{x})\bigr)s_{k}\bigr\Vert & \leq \Vert s_{k}\Vert { \displaystyle\int\nolimits_{0}^{1}} \bigl\Vert Df\bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df( \overline{x})\bigr\Vert dt \\ &\leq \Vert s_{k}\Vert {\displaystyle\int \nolimits_{0}^{1}} \bigl\Vert Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df(\overline{x}) \bigr\Vert _{HS}dt \\ & \leq L\Vert s_{k}\Vert {\displaystyle \int\nolimits_{0}^{1}} \bigl\Vert t(x_{k+1}-\overline{x})+(1-t) (x_{k}-\overline{x})\bigr\Vert dt \\ & \leq\frac{L\Vert s_{k}\Vert }{2}\bigl(\Vert x_{k+1}-\overline {x}\Vert + \Vert x_{k}-\overline{x}\Vert \bigr). \end{aligned}$$

Thus, from (40),

$$\frac{\Vert E_{k}s_{k}\Vert }{\Vert s_{k}\Vert }\leq\frac{L}{2}\bigl(\Vert x_{k+1}-\overline{x}\Vert +\Vert x_{k}-\overline{x} \Vert \bigr)+\Vert B_{k+1}-B_{k}\Vert _{HS}. $$

Since ∥B k+1B k HS →0 by (34) and \(x_{k}\rightarrow\overline{x}\), we come to (33). □

The following theorem presents the main result of this section.

Theorem 4.9

Consider the generalized equation (1) with a solution \(\bar{x}\) and suppose that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\) with respect to the Hilbert–Schmidt norm. Consider the quasi-Newton method (4) applied to (1) with the Broyden update (28) such that B 0 satisfies (13) and \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator.

  1. (i)

    If f+F is strongly metrically subregular at \(\bar{x}\) for 0, then every sequence {x k } generated by (4) which converges to \(\bar{x}\) is q-superlinearly convergent;

  2. (ii)

    If f+F is both strongly metrically subregular and metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point x 0∈O there exists a sequence {x k } generated by (4) which either reaches a solution of (1) in finitely many steps or converges q-superlinearly to \(\bar{x}\);

  3. (iii)

    If f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every x 0∈O, where O is the neighborhood from (ii), there exists a unique sequence {x k } in O starting from x 0 and generated by (4), and this sequence converges q-superlinearly to \(\bar{x}\).

Proof

To prove (i) it is sufficient to combine Theorem 4.8 with Theorem 4.7. Then (ii) follows from (i) and Theorem 4.3. Since strong metric regularity implies strong metric subregularity, in order to prove (iii) it is sufficient to combine (i) with the last part of Theorem 4.3. □

We note that the condition that \(E_{0}:=B_{0}-Df(\overline{x})\) be a Hilbert–Schmidt operator is used in [23, Theorem 3.5] to prove q-superlinear convergence of the Broyden method for equations. Thus, Theorem 4.9 also extends [23, Theorem 3.5] to generalized equations.

5 Two numerical examples

Our first example is one-dimensional. Let \(f:\mathbb{R}\to\mathbb {R}\) and \(F:\mathbb{R}\rightrightarrows\mathbb{R}\) be given by

$$\begin{aligned} f(x)&:=3x^3-2x^2,\quad\text{for }x\in\mathbb{R}; \\ F(x)&:=\left \{ \begin{array}{l@{\quad}l} \{x,-x\},&x\geq0;\\ \emptyset,&x<0. \end{array} \right . \end{aligned}$$

The graph of f+F is plotted in Figure 1. The generalized equation 0∈f(x)+F(x) has two solutions: 0 and 1.

Fig. 1 The graph of f+F in the first example

Observe that the mapping f+F is strongly metrically regular at any point of its graph, in particular at 0 for 0 and at 1 for 0. Hence the assumptions of Theorem 4.9 are satisfied, and the quasi-Newton method (4) with the Broyden update (28) generates a locally unique q-superlinearly convergent sequence when started within a neighborhood of each of the solutions. The numerical results with B 0:=Df(x 0) are shown in Table 1 for two starting points: x 0=0.1 (left) and x 0=0.3 (right). The absolute error at the kth iteration is denoted by ∥e k ∥. Note that the obtained convergence is indeed q-superlinear in each case.

Table 1 Numerical results for the first example with x 0=0.1 (left) and x 0=0.3 (right)
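This one-dimensional example can be reproduced in a few lines of Python. The sketch below assumes, consistently with (2), that the quasi-Newton method (4) reads f(x k )+B k (x k+1 −x k )+F(x k+1 )∋0; each step then amounts to solving the two linear equations obtained from the branches ±x of F and keeping a nonnegative root (selecting the root closest to the current iterate is our own tie-breaking rule, not prescribed by the text):

```python
f = lambda t: 3.0 * t**3 - 2.0 * t**2
df = lambda t: 9.0 * t**2 - 4.0 * t

x, B = 0.3, df(0.3)                         # starting point x_0 and B_0 = Df(x_0)
for _ in range(25):
    # solve f(x_k) + B_k (x_new - x_k) + F(x_new) ∋ 0: the branches ±x_new of F
    # give the linear equations x_new (B_k ± 1) = B_k x_k - f(x_k)
    r = B * x - f(x)
    cands = [r / (B + sgn) for sgn in (1.0, -1.0)
             if abs(B + sgn) > 1e-14 and r / (B + sgn) >= 0.0]
    if not cands:
        break
    x_new = min(cands, key=lambda c: abs(c - x))   # the step closest to x_k
    s, y = x_new - x, f(x_new) - f(x)
    if s == 0.0:
        break
    B = B + (y - B * s) / s                 # scalar Broyden update (28)
    x = x_new

# the iterates approach one of the two solutions, 0 and 1
assert min(abs(x), abs(x - 1.0)) < 1e-8
```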

In the paper [11] the following model of economic equilibrium was introduced. Consider a group of r agents, each of which starts with a vector \(x_{i}^{0}\in\mathbb{R}^{n}\) of goods and trades it for another goods vector \(x_{i}\in\mathbb{R}^{n}\). Each good has a price to be determined by the market, and the price vector is \(p \in\mathbb{R}^{n}_{+}\). Agent i has an initial amount of money \(m_{i}^{0}\in\mathbb{R}_{+}\) and ends up, after trading, with an amount of money \(m_{i}\in\mathbb{R}_{+}\). Agent i aims at maximizing a utility function u i (m i ,x i ) over the set \(\mathbb{R}_{+}\times U_{i}\) subject to the budget constraint

$$ m_i - m_i^0 + \bigl\langle p, x_i - x_i^0\bigr\rangle \leq0, $$
(41)

where the sets \(U_{i}\subset\mathbb{R}^{n}\) are nonempty, closed and convex and the functions u i are continuously differentiable, concave and nondecreasing over \(\mathbb{R}_{+}\times U_{i}\). In addition to the budget constraints (41) there are supply-demand requirements for money and goods, of the form

$$ \sum_{i=1}^r \bigl[m_i-m^0_i\bigr] \leq0,\qquad \sum _{i=1}^r \bigl[x_i-x^0_i \bigr] \leq0. $$
(42)

The problem is to find an equilibrium value of the vector variable (p,m,x) such that each utility function attains its maximum subject to the budget and the supply-demand constraints. It is shown in [11, Theorem 1] that, under some mild conditions that are satisfied in the example displayed below, an equilibrium always exists; moreover, it satisfies a first-order optimality condition for each agent involving the Lagrange functions

$$L_i(p,m_i, x_i, \lambda_i) = - u_i(m_i, x_i) + \lambda_i \bigl(m_i - m_i^0 + \bigl\langle p, x_i - x_i^0\bigr\rangle \bigr) $$

with a Lagrange multiplier λ i ≥0, i=1,…,r, associated with the budget constraint (41). Adding the supply-demand constraints (42) written as complementarity conditions, we obtain a variational inequality for the vectors \(p \in\mathbb{R}^{n}_{+}\), \(m=(m_{1},\ldots,m_{r})^{\sf T} \in\mathbb {R}^{r}_{+}\), \(x=(x_{1},\ldots,x_{r})^{\sf T} \in U_{1} \times U_{2} \times\cdots\times U_{r}\), and \(\lambda= (\lambda_{1}, \ldots, \lambda_{r})^{\sf T} \in\mathbb{R} _{+}^{r}\) of the form

$$ -g\bigl(p,m,x,\lambda,m^0,x^0\bigr) \in N_C(p,m,x,\lambda), $$
(43)

where

$$ C= \mathbb{R}^n_+\times\mathbb{R}_+^r \times U_1 \times\cdots \times U_r \times\mathbb{R}^r_+, $$
(44)

and

$$ g\bigl(p,m,x,\lambda,m^0,x^0\bigr) = \left ( \begin{array}{c} \sum_{i=1}^r[x_i^0-x_i] \\ \ldots\\ \lambda_i - \nabla_{m_i} u_i(m_i,x_i) \\ \ldots\\ \lambda_i p - \nabla_{x_i} u_i(m_i,x_i)\\ \ldots\\ m_i^0-m_i +\langle p, x_i^0-x_i\rangle\\ \ldots\\ \end{array} \right ). $$
(45)

The initial endowments are represented by the vectors \(m^{0}=(m^{0}_{1},\ldots,m^{0}_{r})^{\sf T}\in\mathbb{R}^{r}_{+}\) and \(x^{0}=(x^{0}_{1},\ldots,x^{0}_{r})^{\sf T} \in U_{1} \times U_{2} \times\cdots\times U_{r}\). In [11, Theorem 3] it is shown that the equilibrium mapping associated with (43) is strongly regular provided that for each agent i the initial goods \(x^{0}_{i}\) are sufficiently close to the equilibrium vector \(\bar{x}_{i}\); in other words, when the trade starts with amounts of goods not too far from the equilibrium. Note that the first inequality in (42) does not appear in (43) since at equilibrium it automatically becomes an equality.

We consider a specific example where there are two agents with utility functions

$$ u_i(m_i, x_i) = \alpha_i \ln(m_i) + \beta_i\ln(x_i), \quad i = 1, 2, $$

and a single good subject to the constraints

$$ x_i \in U_i=[ \xi_i, \eta_i], \quad i = 1, 2 $$

for some positive ξ i and η i . The variational inequality (43) for the vector (p,m 1,m 2,x 1,x 2,λ 1,λ 2) has the following specific form:

$$ - \left ( \begin{array}{c} \sum_{i=1}^2[x_i^0-x_i] \\ \lambda_1 - \frac{\alpha_1}{m_1} \\ \lambda_2 - \frac{\alpha_2}{m_2}\\ \lambda_1 p - \frac{\beta_1}{x_1}\\ \lambda_2 p - \frac{\beta_2}{x_2}\\ m_1^0-m_1 +\langle p, x_1^0-x_1\rangle\\ m_2^0-m_2 +\langle p, x_2^0-x_2\rangle\\ \end{array} \right ) \in \left ( \begin{array}{c} N_{\mathbb{R}_+}(p) \\ N_{\mathbb{R}_+}(m_1) \\ N_{\mathbb{R}_+}(m_2)\\ N_{U_1}(x_1) \\ N_{U_2}(x_2)\\ N_{\mathbb{R}_+}(\lambda_1)\\ N_{\mathbb{R}_+}(\lambda_2)\\ \end{array} \right ). $$

The numerical implementation of Broyden’s update (28) for this variational inequality has been done in Matlab. Each step of the method reduces to solving a linear complementarity problem (LCP). The Matlab function LCP by Yuval, available at http://www.mathworks.com/matlabcentral/fileexchange/20952, has been used for solving these problems. The computations are done for the following data. For the parameters α i =β i =0.1 we consider the first agent with endowment 0.9 of the good and 1.3 of money, and the second agent with unit endowments: \(x^{0} = (0.9, 1)^{\sf T}\), \(m^{0} = (1.3, 1)^{\sf T}\). The survival interval of consumption for each agent is [0.94,1.08]. Then the solution is: p=1.2745, \(m = (1.2235, 1.0765)^{\sf T}\), \(x = (0.96, 0.94)^{\sf T}\), \(\lambda= (0.0817, 0.0929)^{\sf T}\).
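As a sanity check, the reported solution can be tested in Python against the first-order conditions read off from the displayed variational inequality (all data as above; which conditions hold as equalities versus inequalities follows from which components are interior, and the tolerance reflects the four-digit rounding of the reported values):

```python
p = 1.2745
m = [1.2235, 1.0765]
x = [0.96, 0.94]
lam = [0.0817, 0.0929]
m0, x0 = [1.3, 1.0], [0.9, 1.0]
alpha = beta = [0.1, 0.1]
lo = 0.94                      # lower end of the survival interval [0.94, 1.08]
tol = 1e-3                     # tolerance matching the four-digit rounding

# money stationarity at interior m_i > 0: lambda_i = alpha_i / m_i
assert all(abs(lam[i] - alpha[i] / m[i]) < tol for i in range(2))
# active budget constraints (lambda_i > 0): m_i - m_i^0 + p (x_i - x_i^0) = 0
assert all(abs(m[i] - m0[i] + p * (x[i] - x0[i])) < tol for i in range(2))
# goods balance (p > 0): x_1 + x_2 = x_1^0 + x_2^0
assert abs(sum(x) - sum(x0)) < tol
# goods stationarity: lambda_i p - beta_i / x_i = 0 at interior x_i,
# and >= 0 when x_i sits at the lower bound of the survival interval
for i in range(2):
    g = lam[i] * p - beta[i] / x[i]
    assert abs(g) < tol if x[i] > lo + tol else g > -tol
```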

We did numerical testing with various starting points and starting updates, and obtained rather similar results. The result of one of these tests is presented below for the starting point p 0=1.3745, \(m_{0} = (1.3235, 1.1765)^{\sf T}\), \(x_{0} = (1.06, 1.04)^{\sf T}\), \(\lambda_{0} = (0.1817, 0.1929)^{\sf T}\), and the initial update B 0 equal to the value of the Jacobian at the starting point. The results of the computations are given in Table 2. We obtain q-superlinear convergence in this case as well, as proved theoretically.

Table 2 Numerical results for the second example