1 Introduction

In this paper we consider the generalized equation

$$ f(x) + F(x) \ni0, $$
(1)

where \(f: X \to Y\) is a function and \(F: X \rightrightarrows Y\) is a set-valued mapping. Throughout, unless stated otherwise, X and Y are (real) Banach spaces. To simplify some of the arguments, we make the standing assumption that f is continuously Fréchet differentiable everywhere with derivative Df and that F has closed graph.

If F is the zero mapping, then (1) reduces to the equation f(x)=0 for which the standard Newton iteration takes the form

$$f(x_k) + Df(x_k) (x_{k+1}-x_k) = 0. $$
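For a single smooth equation, the iteration above can be run directly; the following minimal sketch uses the toy problem \(f(x)=x^2-2\) (an illustrative choice, not an example from the paper):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Classical Newton iteration: solve f(x_k) + Df(x_k)(x_{k+1} - x_k) = 0."""
    x = x0
    for _ in range(max_iter):
        step = -f(x) / df(x)   # Newton step from the linearized equation
        x += step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)  # -> ~1.41421356
```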

When F is nonzero, the Newton iteration is extended in a natural way to

$$ f(x_k) + Df(x_k) (x_{k+1}-x_k) +F(x_{k+1})\ni0, $$
(2)

that is, at each iteration a partially linearized inclusion is to be solved. In a path-breaking work N.H. Josephy [16] was the first to consider a Newton iteration of the kind (2) specialized to the case where F is the normal cone mapping in finite dimensions; then (1) describes a variational inequality. Most importantly, he employed the property of strong regularity coined by his Ph.D. advisor S.M. Robinson [21]. In this paper, we adopt the definition given in [10]:
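To make Josephy's setting concrete: when F is the normal cone to \(C=[0,+\infty)\) in one dimension, (1) is the complementarity problem \(0\le x\), \(f(x)\ge 0\), \(x f(x)=0\), and, when \(Df(x_k)>0\), each subproblem (2) has the closed-form solution \(x_{k+1}=\max(0,\,x_k-f(x_k)/Df(x_k))\). A minimal sketch with the toy data \(f(x)=x^2-1\) (illustration only, not from the paper):

```python
# Josephy-Newton iteration (2) with F the normal cone to [0, +inf):
# each step solves the linearized complementarity problem
# x >= 0, g(x) := f(x_k) + Df(x_k)(x - x_k) >= 0, x * g(x) = 0,
# whose solution (for Df(x_k) > 0) is max(0, x_k - f(x_k)/Df(x_k)).
def josephy_newton(f, df, x0, iters=30):
    x = x0
    for _ in range(iters):
        x = max(0.0, x - f(x) / df(x))
    return x

# Toy data: f(x) = x^2 - 1, whose complementarity solution is x = 1.
x_star = josephy_newton(lambda x: x * x - 1.0, lambda x: 2.0 * x, 2.0)
```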

Definition 1.1

(Strong metric regularity)

A mapping \(H: X \rightrightarrows Y\) is said to be strongly metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there are neighborhoods U of \(\bar{x}\) and V of \(\bar{y}\) such that the mapping \(y \mapsto H^{-1}(y)\cap U\) is a Lipschitz continuous function on V.

The result of Josephy [16] adapted to the generalized equation (1) essentially says that if \(\bar{x}\) is a solution of (1), the function f is twice continuously differentiable around \(\bar{x}\), and the mapping f+F is strongly metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) the iteration (2) generates a unique sequence in O, and this sequence converges q-quadratically to \(\bar{x}\).

A linear and bounded mapping \(A: X \to Y\) is strongly metrically regular (everywhere) whenever its inverse \(A^{-1}\) is single-valued. If the mapping A is not necessarily invertible but only surjective, then it is metrically regular. Metric regularity has played a major role in nonlinear analysis since the 1960s. Its formal definition follows:

Definition 1.2

(Metric regularity)

A mapping \(H: X \rightrightarrows Y\) is said to be metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant κ>0 together with neighborhoods U of \(\bar{x}\) and V of \(\bar{y}\) such that

$$ d \bigl(x,H^{-1}(y) \bigr)\leq\kappa d\bigl(y,H(x)\bigr) \quad\text{for all } (x,y)\in U\times V. $$
(3)
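To build intuition for (3), the estimate can be tested numerically on a toy mapping; here \(H(x)=2x\) on the real line, where \(H^{-1}(y)=y/2\) and \(\kappa=1/2\) works globally (choices made purely for illustration, not taken from the paper):

```python
import random

# Toy check of d(x, H^{-1}(y)) <= kappa * d(y, H(x)) for H(x) = 2x, kappa = 1/2.
H = lambda x: 2.0 * x
H_inv = lambda y: 0.5 * y
kappa = 0.5

worst = 0.0  # largest observed ratio d(x, H^{-1}(y)) / d(y, H(x))
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    lhs = abs(x - H_inv(y))   # d(x, H^{-1}(y))
    rhs = abs(y - H(x))       # d(y, H(x))
    if rhs > 0:
        worst = max(worst, lhs / rhs)
```

For this linear map the ratio is identically 1/2, so the bound (3) holds with equality scaled by κ.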

A central result in the theory of metric regularity is the Lyusternik–Graves theorem which says that if a function f:XY is continuously Fréchet differentiable around \(\bar{x}\), then it is metrically regular at \(\bar{x}\) (for \(f(\bar{x})\)) if and only if the derivative \(Df(\bar{x})\) is surjective. In parallel, the standard inverse function theorem can be stated as follows: if a function f:XY is continuously Fréchet differentiable around \(\bar{x}\) then it is strongly metrically regular at \(\bar{x}\) (for \(f(\bar{x})\)) if and only if the derivative \(Df(\bar{x})\) is invertible. The inverse function theorem was extended to variational inequalities by Robinson in his seminal paper [21]. Adapted to the generalized equation (1), it says that the mapping f+F is (strongly) metrically regular at \(\bar{x}\) for 0 if and only if the “partial linearization” \(f(\bar{x}) + Df(\bar{x})(\cdot- \bar{x}) + F(\cdot)\) has the same property. This general pattern culminates in the inverse function theorem paradigm to which the recent book [10] is dedicated.

The third author of the present paper proved in [7] (see also Theorem 6C.6 in [10]) the following result, which complements that of Josephy: if the derivative Df is Lipschitz continuous around \(\bar{x}\) and the mapping f+F is metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) the method (2) is executable; that is, it generates a sequence which converges q-quadratically to \(\bar{x}\). Under strong metric regularity this sequence is locally unique, as in Josephy's theorem. The result in [7] has opened the way to developing a broader perspective on set-valued extensions of Newton's method. A number of results in this direction are presented in the books [1, 10] and [19].

There is a third regularity property that plays an important role in establishing convergence of Newton’s method.

Definition 1.3

(Strong metric subregularity)

Consider a mapping \(H: X \rightrightarrows Y\) and a point \((\bar{x}, \bar{y}) \in X\times Y\). Then H is said to be strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant κ>0 together with a neighborhood U of \(\bar{x}\) such that

$$\|x-\bar{x}\| \leq\kappa d\bigl(\bar{y}, H(x)\bigr)\quad \text{for all } x \in U. $$

Strong metric subregularity of H at \(\bar{x}\) for \(\bar{y}\) implies that \(\bar{x}\) is an isolated point of \(H^{-1}(\bar{y})\); moreover, it is equivalent to the so-called isolated calmness of the inverse \(H^{-1}\), meaning that there is a neighborhood U of \(\bar{x}\) such that \(H^{-1}(y)\cap U \subset\bar{x}+\kappa\|y-\bar{y}\|\mathbb{B}\) for all \(y \in Y\). Isolated calmness was introduced independently in [4] under the name semistability and in [6] under the name local upper Lipschitz continuity. A mapping H acting in finite dimensions whose graph is the union of finitely many convex polyhedral sets is strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) if and only if \(\bar{x}\) is an isolated point of \(H^{-1}(\bar{y})\). Most importantly, strong metric subregularity obeys the paradigm of the inverse function theorem in the same way as metric regularity and strong metric regularity do. In particular, a smooth function f is strongly metrically subregular at \(\bar{x}\) if and only if its derivative \(Df(\bar{x})\) is injective.
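As a one-dimensional illustration of the last statement (a toy function, not from the paper): \(f(x)=x^3\) has \(Df(0)=0\), which is not injective, and strong metric subregularity at 0 for 0 indeed fails, since no constant κ can satisfy \(|x|\leq\kappa|x^3|\) near 0:

```python
# The subregularity ratio |x - 0| / |f(x) - 0| = 1 / x**2 for f(x) = x**3
# blows up as x -> 0, so no finite kappa works in the estimate.
ratios = [abs(x) / abs(x ** 3) for x in (0.1, 0.01, 0.001)]
```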

Consider the Newton method (2) for the generalized equation (1) with a function f whose derivative mapping Df is Lipschitz continuous around a reference solution \(\bar{x}\). If a sequence \(\{x_k\}\) generated by (2) converges to \(\bar{x}\), then, as shown in [4, Corollary 2.1], strong metric subregularity implies that this sequence converges quadratically. Note that strong metric subregularity by itself does not guarantee the existence of a Newton sequence. In order to ensure that Newton's method (2) is executable, Bonnans [4] introduced a property he called hemistability, which basically postulates the existence of a Newton iterate in (2). Specifically, in our notation this property requires that for any ε>0 there exists δ>0 such that for any x and M with \(\|x-\bar{x}\|+\|M-Df(\bar{x})\|\leq\delta\) there exists \(\hat{x}\) with \(\|\hat{x} - \bar{x}\|\leq\varepsilon\) satisfying \(f(x)+M(\hat{x} - x) + F(\hat{x}) \ni 0\). Observe that for the mapping f+F with a smooth function f, hemistability is implied by metric regularity; this is a simple consequence of the Lyusternik–Graves theorem. On the other hand, the combination of semistability and hemistability does not follow from metric regularity, since metric regularity does not imply local uniqueness of the reference solution. Also, this combination is a weaker property than strong metric regularity but does not guarantee local uniqueness of a Newton sequence.

In this paper we first prove a result parallel to the one in [7] concerning the convergence of the following quasi-Newton method for solving (1): given \(x_0\), compute \(x_{k+1}\) to satisfy

$$ f(x_k)+ B_k(x_{k+1}-x_k) + F(x_{k+1}) \ni0, \quad\text{for } k=0,1,\ldots, $$
(4)

where \(\{B_k\}\) is a sequence of linear and bounded mappings from X to Y. The specific way \(B_k\) is constructed determines the quasi-Newton method; in this paper we focus on Broyden's update. In Sect. 3, Theorem 3.1 shows that if the mapping f+F of (1) is metrically regular at \(\bar{x}\) for 0 and the initial mapping \(B_0\) is close to \(Df(\bar{x})\), then, under a certain condition on the sequence of mappings \(B_k\), there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) there exists a sequence generated by (4) which stays in O and either reaches a solution of (1) in O after finitely many steps or converges q-linearly to \(\bar{x}\). If in addition the mapping f+F is strongly metrically regular, then for every starting point \(x_0 \in O\) there exists a sequence generated by (4) that is unique in O, and this sequence converges q-linearly to \(\bar{x}\).

In Sect. 4 we first prove, in the Hilbert space setting, that the Broyden method satisfies the conditions of Theorem 3.1. In Theorem 4.9, under the condition that \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator, we establish q-superlinear convergence in three cases depending on the properties of the mapping f+F: (i) if f+F is strongly metrically subregular at \(\bar{x}\) for 0, then every sequence generated by (4) which converges to \(\bar{x}\) is actually q-superlinearly convergent; (ii) if in addition f+F is metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point \(x_0 \in O\) there exists a sequence generated by (4) which either reaches a solution of (1) in O in finitely many steps or converges q-superlinearly to \(\bar{x}\); (iii) if f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every \(x_0 \in O\) the iteration (4) generates a sequence that is unique in O, and this sequence converges q-superlinearly to \(\bar{x}\). A key step in proving this result is Theorem 4.8, where we show that in the case considered the Broyden update satisfies the Dennis–Moré condition, which allows us to employ the generalization of the Dennis–Moré theorem obtained in [8]. Theorem 4.9 sharpens, in the setting of (strong) metric (sub)regularity, and extends to infinite dimensions, the results in [4] developed for variational inequalities in finite dimensions. Specifically, for a class of quasi-Newton methods including the Broyden update, [4, Theorem 2.3] shows the existence of a q-superlinearly convergent sequence under hemistability and semistability of the reference solution. In this paper we extend the latter result to infinite dimensions under the stronger condition that both metric regularity and strong metric subregularity hold.

Section 2 contains an auxiliary result concerning perturbed metric regularity, which is used as a tool for proving local convergence of the method (4). In Sect. 5 we present two numerical examples, the second of which is based on a model of economic equilibrium recently developed in [11].

Throughout, any norm is denoted by ∥⋅∥ and any metric by ρ(⋅,⋅). The distance from a point x to a set C is denoted by d(x,C) and the excess from a set D to a set C by \(e(D,C)=\sup_{x\in D} d(x,C)\). The closed ball centered at x with radius a is denoted by \(\mathbb{B}_{a}(x)\), and \(\mathcal{L}(X,Y)\) denotes the Banach space of linear and bounded mappings acting from X to Y.

2 Preliminaries

We utilize the following generalization of Nadler's fixed point theorem, originally proved in [9]; for more, see [10, Theorem 5E.2]:

Theorem 2.1

(Contraction mapping principle)

Let (X,ρ) be a complete metric space, and consider a set-valued mapping \(\varPhi: X \rightrightarrows X\), a point \(\bar{x}\in X\), and positive scalars a and θ such that θ<1, the set \(\operatorname{gph}\varPhi\cap (\mathbb{B}_{a}(\bar{x})\times\mathbb{B}_{a}(\bar{x}) ) \) is closed, and the following conditions hold:

  1. (i)

    \(d (\bar{x}, \varPhi(\bar{x}) ) < a(1 - \theta)\);

  2. (ii)

    \(e (\varPhi(u)\cap\mathbb{B}_{a}(\bar{x}), \varPhi(v) ) \leq \theta\rho(u,v)\) for all \(u, v \in\mathbb{B}_{a}(\bar{x})\).

Then Φ has a fixed point in \(\mathbb{B}_{a}(\bar{x})\); that is, there exists \(x \in\mathbb{B}_{a}(\bar{x})\) such that \(x \in \varPhi(x)\). In addition, if Φ is single-valued, then Φ has a unique fixed point in \(\mathbb{B}_{a}(\bar{x})\).
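For a single-valued Φ the theorem reduces to the classical Banach contraction principle; a quick numerical illustration (the map \(\varPhi=\cos\) on [0,1], with contraction constant \(\theta=\sin 1<1\), is a toy choice, not from the paper):

```python
import math

# Fixed-point iteration for a single-valued contraction Phi.
# Phi(x) = cos(x) is a contraction on [0, 1], so the iteration converges
# to its unique fixed point x = cos(x) ~ 0.739085 (the Dottie number).
def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

fp = fixed_point(math.cos, 0.5)
```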

Recall that a metric ρ in a linear space X is said to be shift-invariant when ρ(x+y,x+y′)=ρ(y,y′) for all x,y,y′∈X.

Theorem 2.2

(Perturbed metric regularity)

Let (X,ρ) be a complete metric space and (Y,ρ) a linear metric space with shift-invariant metric. Consider a mapping \(H: X \rightrightarrows Y\) with closed graph and a point \((\bar{x}, \bar{y}) \in\operatorname{gph}H\) at which H is metrically regular; that is, there exist positive constants a, b, and κ such that

$$ d\bigl(x,H^{-1}(y)\bigr)\leq\kappa d\bigl(y,H(x)\bigr)\quad \textit{for all } (x,y)\in \mathbb{B}_a(\bar{x}) \times \mathbb{B}_b(\bar{y}). $$
(5)

Let μ>0 be such that κμ<1 and let κ′>κ. Then for every positive α and β such that

$$ \alpha\leq a/2, \quad \mu\alpha+2\beta\leq b \quad \textit{and} \quad 2\kappa'\beta\leq \alpha(1-\kappa\mu) $$
(6)

and for every function h:XY satisfying

$$ \rho\bigl(h(\bar{x}), 0\bigr) \leq\beta $$
(7)

and

$$ \rho\bigl(h(x), h\bigl(x'\bigr)\bigr) \leq\mu\rho \bigl(x, x'\bigr) \quad\textit{for every } x, x' \in \mathbb{B}_\alpha(\bar{x}), $$
(8)

the mapping h+H has the following property: for every \(y, y' \in \mathbb{B}_{\beta}(\bar{y})\) and every \(x \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) there exists \(x' \in (h+H)^{-1}(y')\) such that

$$ \rho\bigl(x, x'\bigr) \leq\frac{\kappa'}{1-\kappa\mu}\rho\bigl(y, y'\bigr). $$
(9)

In addition, if the mapping H is strongly metrically regular at \(\bar{x}\) for \(\bar{y}\); specifically, the mapping \(y \mapsto H^{-1}(y)\cap\mathbb{B}_{a}(\bar{x})\) is single-valued and Lipschitz continuous on \(\mathbb{B}_{b}(\bar{y})\) with a Lipschitz constant κ, then for μ, κ′, α and β as above and any function h satisfying (7) and (8), the mapping \(y \mapsto(h+H)^{-1}(y)\cap\mathbb {B}_{\alpha}(\bar{x})\) is a Lipschitz continuous function on \(\mathbb{B}_{\beta}(\bar{y})\) with a Lipschitz constant κ′/(1−κμ).

Before proving the theorem we make some comments. If we assume \(h(\bar{x}) = 0\), then Theorem 2.2 simply says that if H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\) and h has a sufficiently small Lipschitz constant, then the perturbed mapping h+H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\); indeed, in this case the claim involving (9) means that (h+H)−1 has the Aubin property at \(\bar{y}\) for \(\bar{x}\), which is equivalent to metric regularity of h+H at \(\bar{x}\) for \(\bar{y}\), and then we obtain the (extended) Lyusternik–Graves theorem as stated in [10, Theorem 5E.1]. For the strong regularity part we get a version of Robinson's theorem, see [10, Theorem 5F.1]. However, if \(h(\bar{x}) \neq0\), then \((\bar{x}, \bar{y})\) may not be in the graph of h+H, and then we cannot claim that h+H is (strongly) metrically regular at \(\bar{x}\) for \(\bar{y}\). Of course, this could be handled by choosing a new function \(\tilde{h}\) with \(\tilde{h}(x) = h(x) - h(\bar{x})\) and then applying [10, Theorem 5F.1], but the latter does not specify how the constants (e.g., the radii of the balls involved) depend on the data of the problem, which is the crux of the matter in obtaining the needed estimates. Clearly, the result in Theorem 2.2 is parallel to [10, Theorem 5F.1] and can be recovered from the latter; but we believe that giving a complete proof is beneficial to the reader.

Proof

Choose μ and κ′ as required and then α and β to satisfy (6). For any \(x \in\mathbb{B}_{\alpha}(\bar{x})\) and \(y \in\mathbb {B}_{\beta}(\bar{y})\), using the shift-invariance of the metric in Y, (7), (8) and the triangle inequality, we obtain

$$\begin{aligned} \rho\bigl(-h(x)+y, \bar{y}\bigr) \leq&\rho\bigl(0,h(\bar{x})\bigr)+\rho \bigl(h(\bar{x}), h(x)\bigr) + \rho(y,\bar{y}) \\ \leq&\beta+ \mu\rho(x, \bar{x}) + \beta\leq2\beta+ \mu\alpha \leq b, \end{aligned}$$
(10)

where the last inequality follows from the second inequality in (6). Fix \(y' \in\mathbb{B}_{\beta}(\bar{y})\) and consider the mapping

$$\varPhi_{y'}: x \mapsto H^{-1}\bigl(-h(x)+y' \bigr) \quad\text{for } x \in \mathbb{B}_\alpha(\bar{x}). $$

Clearly, \(\operatorname{gph}\varPhi_{y'}\) is closed. Let \(y \in\mathbb{B}_{\beta}(\bar{y})\), \(y \neq y'\), and let \(x \in(h+H)^{-1}(y) \cap\mathbb{B}_{\alpha}(\bar{x})\). We will apply Theorem 2.1, with the complete metric space X identified with the closed ball \(\mathbb{B}_{\alpha}(\bar{x})\), to show that there is a fixed point \(x' \in \varPhi_{y'}(x')\) in the closed ball centered at x with radius

$$ \varepsilon:=\frac{\kappa'\rho(y,y')}{1-\kappa\mu}. $$
(11)

From the third inequality in (6), we obtain

$$\varepsilon\leq \frac{\kappa'(2\beta)}{1-\kappa\mu} \leq\alpha. $$

Hence, from the first inequality in (6) we get \(\mathbb{B}_{\varepsilon}(x) \subset \mathbb{B}_{a}(\bar{x})\). Since yh(x)+H(x) and (x,y) satisfies (10), from the assumed metric regularity of H we get

$$\begin{aligned} d\bigl(x,\varPhi_{y'}(x)\bigr) = & d\bigl(x,H^{-1} \bigl(-h(x)+y'\bigr)\bigr) \leq\kappa d\bigl(-h(x)+y', H(x)\bigr) \\ =& \kappa d\bigl(y', h(x)+H(x)\bigr) \leq \kappa\rho \bigl(y,y'\bigr) \\ < & \kappa' \rho\bigl(y,y'\bigr)= \varepsilon(1- \kappa\mu). \end{aligned}$$

For any \(u,v \in\mathbb{B}_{\varepsilon}(x)\), using (8), we have

$$\begin{aligned} e\bigl(\varPhi_{y'}(u)\cap\mathbb{B}_{\varepsilon}(x), \varPhi_{y'}(v)\bigr) \leq& e\bigl(H^{-1} \bigl(-h(u)+y'\bigr)\cap\mathbb{B}_a(\bar{x}), H^{-1}\bigl(-h(v)+y'\bigr)\bigr) \\ \leq& \kappa\rho\bigl(h(u),h(v)\bigr) \leq\kappa\mu\rho(u,v). \end{aligned}$$

Applying Theorem 2.1 to the mapping \(\varPhi_{y'}\), with \(\bar{x}\) identified with x and constants a=ε and θ=κμ, we obtain the existence of a fixed point \(x' \in \varPhi_{y'}(x')=H^{-1}(-h(x')+y')\), which is equivalent to \(x' \in (h+H)^{-1}(y')\), within distance ε, given by (11), from x.

For the second part of the theorem, suppose that \(y \mapsto s(y):=H^{-1}(y)\cap\mathbb{B}_{a}(\bar{x})\) is a Lipschitz continuous function on \(\mathbb{B}_{b}(\bar{y})\) with Lipschitz constant κ. Choose μ, κ′, α and β as in the statement and let h satisfy (7) and (8). For any \(y \in\mathbb{B}_{\beta}(\bar{y})\), since \(\bar{x}\in (h+H)^{-1}(\bar{y}+h(\bar{x}))\cap\mathbb{B}_{\alpha}(\bar{x})\), from (9) we obtain that there exists \(x \in (h+H)^{-1}(y)\) such that

$$\rho(x, \bar{x}) \leq\frac{\kappa'}{1-\kappa\mu}\rho\bigl(y, \bar{y}+h(\bar{x})\bigr). $$

Since \(\rho(y, \bar{y}+h(\bar{x}))\leq2\beta\), by (6) we get \(\rho(x, \bar{x}) \leq\alpha\), that is, \((h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x}) \neq\emptyset\). Hence the domain of the mapping \(y \mapsto (h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) contains \(\mathbb{B}_{\beta}(\bar{y})\).

If \(x \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\), then \(x \in H^{-1}(y-h(x))\cap\mathbb{B}_{\alpha}(\bar{x}) \subset H^{-1}(y-h(x))\cap\mathbb{B}_{a}(\bar{x})= s(y-h(x))\) since \(y - h(x) \in\mathbb{B}_{b}(\bar{y})\) according to (10). Hence,

$$ H^{-1}\bigl(y-h(x)\bigr)\cap\mathbb{B}_\alpha(\bar{x})=s\bigl(y-h(x)\bigr)=x. $$
(12)

Assume that there exist \(y \in\mathbb{B}_{\beta}(\bar{y})\) and \(x, x' \in(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) such that \(x \neq x'\). From (10) we have that both yh(x) and yh(x′) are in \(\mathbb{B}_{b}(\bar{y})\). Then from (12) we get

$$\begin{aligned} \rho\bigl(x', x\bigr) = & \rho\bigl(s\bigl(-h \bigl(x'\bigr)+y\bigr), s\bigl(-h(x)+y\bigr)\bigr) \\ \leq& \kappa\rho\bigl(-h\bigl(x'\bigr)+y, -h(x)+y\bigr) = \kappa \rho\bigl(h\bigl(x'\bigr), h(x)\bigr) \\ \leq& \kappa\mu\rho\bigl(x',x\bigr) < \rho \bigl(x', x\bigr), \end{aligned}$$

which is a contradiction. Hence, the mapping \(y\mapsto g(y):=(h+H)^{-1}(y)\cap\mathbb{B}_{\alpha}(\bar{x})\) is single-valued, that is, a function, defined on \(\mathbb{B}_{\beta}(\bar{y})\). Let \(y,y' \in\mathbb{B}_{\beta}(\bar{y})\). Utilizing the equality g(y)=s(−h(g(y))+y), see (12), we have

$$\begin{aligned} \rho\bigl(g(y),g\bigl(y'\bigr)\bigr) = & \rho\bigl(s\bigl(-h \bigl(g(y)\bigr)+y\bigr), s\bigl(-h\bigl(g\bigl(y'\bigr) \bigr)+y'\bigr)\bigr) \\ \leq&\kappa\rho\bigl(h\bigl(g(y)\bigr),h\bigl(g\bigl(y'\bigr) \bigr)\bigr) +\kappa\rho\bigl(y,y'\bigr) \\ \leq& \kappa\mu\rho\bigl(g(y),g\bigl(y'\bigr)\bigr) + \kappa\rho \bigl(y,y'\bigr). \end{aligned}$$

Thus,

$$\rho\bigl(g(y),g\bigl(y'\bigr)\bigr) \leq\frac{\kappa'}{1-\kappa\mu}\rho \bigl(y,y'\bigr); $$

that is, g is Lipschitz continuous with Lipschitz constant κ′/(1−κμ). The proof is complete. □

3 Convergence under metric regularity

In this section we give conditions under which the quasi-Newton iteration (4) is locally q-linearly convergent. Recall that a sequence \(\{x_k\}\) converges q-linearly to \(\bar{x}\) when there exist a natural number K and a real α∈[0,1) such that

$$\|x_{k+1} - \bar{x}\| \leq\alpha\|x_k -\bar{x}\| \quad \text{for } k=K, K+1,\ldots. $$

A sequence \(\{x_k\}\) is q-superlinearly convergent to \(\bar{x}\) when there exist a natural number K and a sequence of reals \(\{\alpha_k\}\) with \(\alpha_k \searrow 0\) such that

$$\|x_{k+1} - \bar{x}\| \leq\alpha_k \|x_k -\bar{x} \| \quad\text{for } k=K, K+1,\ldots. $$
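These two rates can be checked empirically on model sequences; the following sketch (toy sequences, not from the paper) computes the error ratios that the definitions bound:

```python
# For a sequence x_k -> xbar, the ratios ||x_{k+1}-xbar|| / ||x_k-xbar||
# stay below some alpha < 1 under q-linear convergence and tend to 0
# under q-superlinear convergence.
def convergence_ratios(seq, xbar):
    errs = [abs(x - xbar) for x in seq]
    return [errs[k + 1] / errs[k] for k in range(len(errs) - 1) if errs[k] > 0]

# q-linear model: x_k = xbar + 0.5**k (ratios constant at 0.5).
lin = [1.0 + 0.5 ** k for k in range(20)]
# q-superlinear model: x_k = xbar + 0.5**(2**k) (ratios 0.5**(2**k) -> 0).
sup = [1.0 + 0.5 ** (2 ** k) for k in range(6)]

r_lin = convergence_ratios(lin, 1.0)
r_sup = convergence_ratios(sup, 1.0)
```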

Theorem 3.1

Suppose that f+F is metrically regular at \(\bar{x}\) for 0 with constant λ. Then, in particular, \(\bar{x}\) is a solution of (1). Consider the quasi-Newton method (4) and assume that

$$ \bigl\| B_0 - Df(\bar{x})\bigr\| < 1/(2\lambda). $$
(13)

Furthermore, assume that there exist a constant c>0 and a neighborhood U of \(\bar{x}\) such that, for \(k=0,1,\ldots\), and for any \(B_k\) and any \(x_k, x_{k+1} \in U\), \(x_k \neq x_{k+1}\), all satisfying (4), the operator \(B_{k+1}\) is chosen in such a way that

$$ \bigl\| B_{k+1} - Df(\bar{x})\bigr\| \leq\bigl\| B_{k}-Df(\bar{x}) \bigr\| + c\bigl(\|x_k-\bar{x}\|+\| x_{k+1}-\bar{x}\|\bigr). $$
(14)

Then there exists a neighborhood O of \(\bar{x}\) such that for every \(x_0 \in O\) there exists a sequence \(\{x_k\}\) starting at \(x_0\) and generated by (4) which stays in O and either reaches a solution of (1) in O after finitely many steps or converges q-linearly to \(\bar{x}\). If in addition f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every \(x_0 \in O\) there is a sequence \(\{x_k\}\), unique in O, starting at \(x_0\) and generated by (4), and this sequence converges q-linearly to \(\bar{x}\).

Proof

Choose κ>λ such that

$$ \delta:=\bigl\| B_0 - Df(\bar{x})\bigr\| < 1/(2\kappa). $$
(15)

Let κ′>κ be such that

$$\frac{\kappa'\delta}{1-\kappa\delta} < 1 $$

and then fix γ>0 to satisfy

$$\frac{\kappa'\delta}{1-\kappa\delta} < \gamma<1. $$

Choose ε>0 such that

$$ \frac{\kappa'}{1-\kappa(\delta+\varepsilon)}(\varepsilon+\delta ) < \gamma. $$
(16)

Let

$$ H(x) = f(\bar{x}) + Df(\bar{x}) (x-\bar{x}) + F(x), $$
(17)

for \(x \in X\). From Theorem 2.2 applied to f+F and \(h(\cdot)=f(\bar{x})+Df(\bar{x})(\,\cdot\,-\bar{x})-f(\cdot)\) (or simply by the standard Lyusternik–Graves theorem, see, e.g., [10, Theorem 5E.1]), since \(h(\bar{x})=0\), κ>λ, and (f+F)+h=H, inequality (9) yields metric regularity of H at \(\bar{x}\) for 0 with constant κ. Hence there exist positive constants a and b such that

$$d\bigl(x,H^{-1}(y)\bigr)\leq\kappa d\bigl(y,H(x)\bigr)\quad\text{for all }x\in\mathbb{B} _a(\bar{x}),y\in\mathbb{B}_b(0). $$

Using (16), make a smaller if necessary so that \(\mathbb{B} _{a}(\bar{x})\subset U\),

$$ \begin{aligned} &\bigl\| f(u)-f(v)-Df(\bar{x}) (u-v)\bigr\| \leq\varepsilon\|u-v\| \quad\text{for all }u,v\in\mathbb{B}_a(\bar{x}), \\ &\kappa \biggl(\delta+\frac{ca}{1-\gamma} \biggr)< 1, \end{aligned} $$
(18)

and

$$ \frac{\kappa'}{1-\kappa (\delta+ \frac{ca}{1-\gamma} )} \biggl(\varepsilon+\delta+\frac{ca}{1-\gamma} \biggr) < \gamma, $$
(19)

where c is from (14). Set

$$\mu:=\delta+\frac{ca}{1-\gamma} $$

and choose positive α and β to satisfy the inequalities (6) in Theorem 2.2. Choose τ∈(0,α) such that

$$ \biggl(\varepsilon+\delta+\frac{ca}{1-\gamma} \biggr)\tau\leq \beta. $$
(20)

Denote \(O:=\mathbb{B}_{\tau}(\bar{x})\) and choose any \(x_{0}\in O\setminus\{ \bar{x}\}\). Consider the function

$$h_0(x): = f(x_0) + B_0(x-x_0)-f( \bar{x})-Df(\bar{x}) (x-\bar{x}). $$

Then we have

$$ \bar{y}_0 \in h_0(\bar{x})+H(\bar{x}) \quad \text{where } \bar{y}_0 := f(x_0)-f(\bar{x})+B_0(\bar{x}-x_0). $$
(21)

Further,

$$\begin{aligned} \bigl\| h_0(\bar{x})\bigr\| = & \bigl\| f(x_0) + B_0(\bar{x}-x_0)-f(\bar{x})\bigr\| \\ \leq& \bigl\| f(x_0)-f(\bar{x}) - Df(\bar{x}) (x_0-\bar{x})\bigr\| + \bigl\| \bigl(B_0-Df(\bar{x})\bigr) (\bar{x}-x_0)\bigr\| \\ \leq& (\varepsilon+ \delta)\tau\leq\beta, \end{aligned}$$
(22)

where we use (20). For any \(x,x' \in\mathbb{B}_{\alpha}(\bar{x})\) from (15) we have

$$\bigl\| h_0(x)-h_0\bigl(x'\bigr)\bigr\| = \bigl\| \bigl(B_0-Df(\bar{x})\bigr) \bigl(x-x'\bigr)\bigr\| \leq\delta \bigl\| x-x'\bigr\| \leq \mu\bigl\| x-x'\bigr\| . $$

Also, observe that \(\bar{y}_{0} = h_{0}(\bar{x})\); hence from (22) we get \(\bar{y}_{0} \in\mathbb{B}_{\beta}(0)\). Finally, from (21) we have \(\bar{x}\in(h_{0}+H)^{-1}(\bar{y}_{0})\). We are now ready to apply Theorem 2.2, with H defined in (17), \(h=h_0\), and κ, μ, κ′, a, b, α, β having the values defined above, to obtain that there exists \(x_1 \in (h_0+H)^{-1}(0)\), that is, \(x_1\) satisfies (4) for k=0, and also

$$\|x_1 - \bar{x}\| \leq\frac{\kappa'}{1-\kappa\mu}\|\bar{y}_0\| \leq \frac{\kappa'}{1-\kappa\mu}(\varepsilon+\delta)\|x_0-\bar{x}\| \leq\gamma \|x_0 - \bar{x}\|, $$

where we use the estimates (19) and (22). Since γ<1, this yields \(x_{1} \in O=\mathbb{B}_{\tau}(\bar{x})\).

By induction, suppose that there exist an integer n>1 and points \(x_1, \ldots, x_n\) with \(x_{k} \in\mathbb{B}_{\tau}(\bar{x})\), and

$$ \|x_k-\bar{x}\|\leq\gamma\|x_{k-1}-\bar{x}\| \quad\text{for } k = 1, \dots, n. $$
(23)

If for some k∈{1,…,n} we have \(x_{k}=\bar{x}\) or \(x_{k-1}=x_k\), then \(x_k\) is a solution of (1). Otherwise, we have \(x_{k-1}\neq x_{k} \neq\bar{x}\) for all k=1,…,n. From condition (14), taking into account that \(\tau\leq\alpha\leq a/2\), we get

$$\begin{aligned} \bigl\| Df(\bar{x})-B_n\bigr\| \leq{}& \bigl\| Df(\bar{x})-B_0 \bigr\| +c\sum_{k=1}^n \bigl(\| x_k-\bar{x} \|+\|\bar{x}-x_{k-1}\| \bigr) \\ \leq{}& \delta+c\sum_{k=1}^n \bigl( \|x_k-\bar{x}\|+\|\bar{x}-x_{k-1}\| \bigr) \\ \leq{}& \delta+ 2c\sum_{k=0}^n \|x_{k}-\bar{x}\| \leq\delta+2c\sum_{k=0}^{\infty} \gamma^k\|x_0-\bar{x}\| \leq\delta+\frac{ca}{1-\gamma}. \end{aligned}$$
(24)

Define

$$h_n(x): = f(x_n) + B_n(x-x_n)-f( \bar{x})-Df(\bar{x}) (x-\bar{x}) $$

and

$$\bar{y}_n := f(x_n)-f(\bar{x})+B_n(\bar{x}-x_n). $$

Then, using (24), we obtain

$$\begin{aligned} \|\bar{y}_n\| \leq& \bigl\| f(x_n)-f( \bar{x}) - Df(\bar{x}) (x_n-\bar{x})\bigr\| + \bigl\| \bigl(B_n-Df(\bar{x})\bigr) (\bar{x}-x_n)\bigr\| \\ \leq& \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma} \biggr)\| x_n- \bar{x}\| \leq \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr) \|x_0-\bar{x}\| . \end{aligned}$$
(25)

Hence, by (20),

$$ \|\bar{y}_n\| \leq \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr)\tau\leq\beta. $$
(26)

Since \(h_{n}(\bar{x}) = \bar{y}_{n}\), we get \(\|h_{n}(\bar{x})\| \leq\beta\). Also, for any \(x,x' \in\mathbb{B}_{\alpha}(\bar{x})\) we obtain

$$\begin{aligned} \bigl\| h_n(x)-h_n\bigl(x'\bigr)\bigr\| =& \bigl\| \bigl(B_n-Df(\bar{x})\bigr) \bigl(x-x'\bigr)\bigr\| \\ \leq& \biggl(\delta +\frac{ca}{1-\gamma} \biggr)\bigl\| x-x'\bigr\| =\mu \bigl\| x-x'\bigr\| . \end{aligned}$$

The assumptions of Theorem 2.2 are then satisfied; hence, taking into account that \(\bar{x}\in(h_{n}+H)^{-1}(\bar{y}_{n})\), we conclude that there exists \(x_{n+1}\in(h_{n}+H)^{-1}(0)\), that is, \(x_{n+1}\) satisfying (4) for k=n, such that

$$\|x_{n+1}-\bar{x}\| \leq\frac{\kappa'}{1-\kappa\mu}\|\bar{y}_n\|. $$

Then, utilizing (19) and (25) we obtain

$$ \|x_{n+1}-\bar{x}\| \leq\frac{\kappa '}{1-\kappa\mu} \biggl(\varepsilon+ \delta+ \frac{ca}{1-\gamma } \biggr)\|x_n-\bar{x}\| \leq\gamma \|x_n-\bar{x}\|. $$
(27)

Hence, \(x_{n+1} \in\mathbb{B}_{\tau}(\bar{x})\) and the induction step is complete. If \(x_{n+1}=\bar{x}\) or \(x_{n+1}=x_n\), then \(x_{n+1}\) is a solution of (1). Otherwise, we have an infinite sequence \(\{x_n\}\) with \(x_n \neq x_{n+1}\) for all n which satisfies (27). Since γ<1, (27) yields that the sequence \(\{x_k\}\) converges q-linearly to \(\bar{x}\).

For the final statement, suppose that f+F is strongly metrically regular with the same constant κ and neighborhoods \(\mathbb{B}_{a}(\bar{x})\) and \(\mathbb{B}_{b}(0)\). According to the second part of Theorem 2.2, the point \(x_{n+1}\in(h_{n}+H)^{-1}(0)\) is unique in O. Furthermore, \((f+F)^{-1}(0)\cap O=\{\bar{x}\}\), and hence the sequence must converge q-linearly to the only solution \(\bar{x}\) in O. The proof is complete. □

4 Convergence of the Broyden update

In this section X and Y are real Hilbert spaces with scalar products denoted by 〈⋅,⋅〉. We consider the following well-known Broyden update:

$$ B_{k+1}:=B_{k}+\frac{(y_{k}-B_{k}s_{k}) \langle s_{k},\cdot \rangle}{ \Vert s_{k}\Vert ^{2}}, $$
(28)

where \(y_k := f(x_{k+1})-f(x_k)\) and \(s_k := x_{k+1}-x_k\). Usually, \(B_0\) is taken to be \(Df(x_0)\).
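For the equation case F ≡ 0, the update (28) reduces in finite dimensions to the familiar rank-one matrix update, which the following minimal sketch implements; the test system and starting data are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Quasi-Newton iteration (4) with Broyden's update (28), specialized to
# F = 0, i.e. the equation f(x) = 0.
def broyden(f, x0, B0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -fx)          # solve f(x_k) + B_k s = 0
        x_new = x + s
        fx_new = f(x_new)
        if np.linalg.norm(fx_new) < tol:
            return x_new
        y = fx_new - fx                      # y_k = f(x_{k+1}) - f(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)   # Broyden update (28)
        x, fx = x_new, fx_new
    return x

# Toy system: x^2 + y^2 = 2 and x = y, with solution (1, 1).
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 2.0, v[0] - v[1]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
x0 = np.array([1.2, 0.8])
sol = broyden(f, x0, jac(x0))   # B_0 = Df(x_0), as is customary
```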

There are a large number of papers dealing with quasi-Newton methods for solving nonlinear equations in the infinite-dimensional setting, and some of them deal with the Broyden update; see, e.g., [3, 12, 15, 18, 22, 23, 25]. An overview of the Broyden update, together with historical remarks and recent works, is given in [13]. We will apply Theorem 3.1 to the Broyden update (28), showing that it satisfies condition (14) and hence is locally q-linearly convergent.

We start with an elementary lemma.

Lemma 4.1

Let \(A \in\mathcal{L}(X, Y)\). If \(x \in X\setminus\{0\}\), then

$$ \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert =0 \quad\textit{if }\dim X=1, \qquad \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert \leq\|A\| \quad\textit{if }\dim X>1. $$
(29)

Proof

Let \(z\in\operatorname{span}(x)=\{\lambda x\mid\lambda\in\mathbb{R}\}\). Then there is some \(\lambda_{0}\in\mathbb{R}\) such that \(z=\lambda_0 x\), whence

$$\biggl\Vert Az-\frac{\langle x,z\rangle Ax}{\|x\|^2}\biggr\Vert =\biggl\Vert \lambda_0 Ax-\frac{\lambda_0\langle x,x\rangle Ax}{\|x\|^2}\biggr\Vert =0. $$

If dimX=1, then \(X=\operatorname{span}(x)\), and from the above equality we obtain (29). Otherwise, assume that dimX>1. For any \(z\in\mathbb{B}_{X}\), one has

$$\biggl\Vert z-\frac{\langle x,z\rangle x}{\|x\|^2}\biggr\Vert ^2=\|z\| ^2-\frac{\langle x,z\rangle^2}{\|x\|^2}\leq1. $$

Hence,

$$\begin{aligned} \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert &= \sup_{w\in\mathbb{B}_X} \biggl\Vert A \biggl(w-\frac{\langle x,w\rangle x}{\| x\|^2} \biggr)\biggr\Vert \leq\|A\|\sup _{w\in\mathbb{B}_X}\biggl\Vert w-\frac {\langle x,w\rangle x}{\|x\|^2}\biggr\Vert \leq\|A\|. \end{aligned}$$

Finally, for any \(z\in\{x\}^{\perp}=\{w\in X\mid\langle w,x\rangle=0\}\), one has

$$\biggl\Vert Az-\frac{\langle x,z\rangle Ax}{\|x\|^2}\biggr\Vert =\|Az\|, $$

so the norm on the left-hand side of (29) equals the norm of the restriction of A to \(\{x\}^{\perp}\), and (29) follows. □

The following result is a generalization to Hilbert spaces of a statement included in the first part of the proof of [24, Theorem 5.4.13].

Proposition 4.2

Suppose that the Fréchet derivative mapping Df is Lipschitz continuous with constant L in a convex neighborhood U of a point \(\bar{x}\). Given \(B_{k} \in\mathcal{L}(X, Y)\) and \(x_k, x_{k+1} \in U\) with \(x_{k+1} \neq x_k\), if \(B_{k+1}\) is defined as in (28), then

$$ \bigl\| B_{k+1}-Df(\bar{x})\bigr\| \leq \bigl\| B_k-Df(\bar{x})\bigr\| + \frac{L}{2} \bigl(\| x_{k+1}-\bar{x}\|+ \|\bar{x}-x_k\| \bigr). $$
(30)

Proof

By assumption,

$$\bigl\| Df(u)-Df(v)\bigr\| \leq L\|u-v\|\quad\text{for all }u,v\in U. $$

Let \(x_{k+1}, x_k \in U\), \(x_k \neq x_{k+1}\), and let \(B_{k+1}\) be defined as in (28). Then

$$\begin{aligned} B_{k+1}-Df(\bar{x})&=B_k-Df(\bar{x})+\frac{(y_k-B_k s_k)\langle s_k,\cdot\,\rangle}{\|s_k\|^2} \\ &=B_k-Df(\bar{x})-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\,\rangle }{\|s_k\|^2}+\frac{(y_k-Df(\bar{x}) s_k)\langle s_k,\cdot\,\rangle}{\| s_k\|^2}. \end{aligned}$$

Thus,

$$\bigl\| B_{k+1}-Df(\bar{x})\bigr\| \leq\biggl\Vert \bigl(B_k-Df(\bar{x})\bigr)-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\,\rangle}{\|s_k\|^2}\biggr\Vert +\frac{\| y_k-Df(\bar{x}) s_k\|}{\|s_k\|}. $$

By Lemma 4.1,

$$\biggl\Vert \bigl(B_k-Df(\bar{x})\bigr)-\frac{(B_k-Df(\bar{x}))s_k \langle s_k,\cdot\, \rangle}{\|s_k\|^2}\biggr\Vert \leq\bigl\| B_k-Df(\bar{x})\bigr\| . $$

Utilizing the mean value theorem, we obtain

$$\begin{aligned} \bigl\| y_k-Df(\bar{x}) s_k\bigr\| &=\bigl\| f(x_{k+1})-f(x_k)-Df( \bar{x})s_k\bigr\| \\ &=\biggl\Vert \int_0^1 \bigl[Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr) (x_{k+1}-x_{k})-Df(\bar{x})s_k\bigr] dt\biggr\Vert \\ &\leq\|s_k\|\int_0^1\bigl\| Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df(\bar{x})\bigr\| dt \\ &\leq L\|s_k\|\int_0^1 \bigl((1-t) \|x_{k}-\bar{x}\|+t\|x_{k+1}-\bar{x}\| \bigr)dt \\ &=\frac{L}{2}\|s_k\| \bigl(\|x_{k+1}-\bar{x}\|+ \|x_{k}-\bar{x}\| \bigr). \end{aligned}$$

This yields (30). □
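In finite dimensions the update appearing in the first display of the proof can be checked directly. The following minimal Python sketch (assuming, consistently with that display, that (28) denotes the rank-one update B k+1 =B k +(y k −B k s k )〈s k ,·〉/∥s k ∥²) verifies the secant condition B k+1 s k =y k , which is the property of the update exploited throughout this section:

```python
def broyden_update(B, s, y):
    """Rank-one update B_{k+1} = B_k + (y_k - B_k s_k) <s_k, .> / ||s_k||^2,
    as in the first display of the proof above (finite-dimensional case)."""
    ns2 = sum(si * si for si in s)                      # ||s_k||^2
    Bs = [sum(B[i][j] * s[j] for j in range(len(s))) for i in range(len(B))]
    return [[B[i][j] + (y[i] - Bs[i]) * s[j] / ns2 for j in range(len(s))]
            for i in range(len(B))]

# arbitrary illustrative data
B = [[2.0, 0.5], [0.0, 1.0]]
s = [1.0, -1.0]
y = [3.0, 2.0]
B1 = broyden_update(B, s, y)
B1s = [sum(B1[i][j] * s[j] for j in range(2)) for i in range(2)]
# the secant condition B_{k+1} s_k = y_k holds exactly
assert all(abs(B1s[i] - y[i]) < 1e-12 for i in range(2))
```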

We apply Theorem 3.1 to obtain the following result:

Theorem 4.3

Consider the generalized equation (1) in the setting of Hilbert spaces X and Y with a solution \(\bar{x}\) and suppose that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\). Also, suppose that f+F is metrically regular at \(\bar{x}\) for 0 with constant λ. Consider the quasi-Newton method (4) applied to (1) with the Broyden update (28) and with B 0 satisfying (13). Then there exists a neighborhood O of \(\bar{x}\) such that for any x 0∈O there exists a sequence {x k } starting from x 0 and generated by (4) which stays in O and either reaches a solution of (1) in finitely many steps or converges q-linearly to \(\bar{x}\). If in addition f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every x 0∈O there is a unique sequence {x k } in O starting from x 0 and generated by (4), and this sequence converges q-linearly to \(\bar{x}\).

We devote the remainder of this section to the q-superlinear convergence of the Broyden update. Recall that the Hilbert–Schmidt norm of an operator \(A\in\mathcal {L}(X,Y)\) is defined as

$$\|A\|_{HS}=\sqrt{\sum_{i\in I} \|Ae_i\|^2}, $$

where {e i ,i∈I} is an orthonormal basis of X. Denote by \(\mathcal{H}(X,Y):= \{ A \in\mathcal{L}(X,Y) \mid\|A\|_{HS} < +\infty\}\) the set of Hilbert–Schmidt operators. Endowed with the inner product

$$\langle A,B\rangle_{HS}=\sum_{i\in I}\langle Ae_i,B e_i\rangle, $$

\(\mathcal{H}(X,Y)\) becomes a Hilbert space, see [20]. In Euclidean spaces this norm coincides with the Frobenius norm.
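In the matrix case the basis-free definitions above reduce to familiar entrywise formulas. The following plain-Python sketch (standard basis of \(\mathbb{R}^{2}\), arbitrary illustrative matrices) checks that 〈A,B〉 HS computed over the basis agrees with the entrywise (Frobenius) inner product:

```python
def hs_inner(A, B):
    """<A,B>_HS = sum_i <A e_i, B e_i> over the standard basis e_i of R^n."""
    total = 0.0
    for i in range(len(A[0])):          # the columns of A and B are A e_i, B e_i
        total += sum(A[r][i] * B[r][i] for r in range(len(A)))
    return total

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
# the basis definition agrees with the entrywise (Frobenius) inner product
assert abs(hs_inner(A, B) - sum(A[r][c] * B[r][c]
                                for r in range(2) for c in range(2))) < 1e-12
# ||A||_HS^2 is the sum of squared entries, i.e. the squared Frobenius norm
assert abs(hs_inner(A, A) - (1.0 + 4.0 + 9.0 + 16.0)) < 1e-12
```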

We start with a lemma which echoes Lemma 4.1.

Lemma 4.4

Let \(A \in\mathcal{H}(X,Y)\). If 0≠x∈X, then

$$ \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert _{HS}^2=\| A\|_{HS}^2- \frac{\|Ax\|^2}{\|x\|^2}. $$
(31)

Proof

Note that

$$\begin{aligned} \biggl\Vert A-\frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert _{HS}^2&= \|A\|_{HS}^2+\biggl\Vert \frac{\langle x,\cdot\rangle Ax}{\|x\|^2}\biggr\Vert ^2_{HS}-2 \biggl\langle A,\frac{\langle x,\cdot\rangle Ax}{\|x\|^2} \biggr\rangle _{HS}. \end{aligned}$$

Further, by the Parseval identity,

$$\bigl\Vert {\langle x,\cdot\rangle Ax}\bigr\Vert ^2_{HS}= \sum_{i\in I}\bigl\| \langle x,e_i\rangle Ax \bigr\| ^2= \|Ax\|^2\sum_{i\in I} \langle x,e_i\rangle^2=\|Ax\|^2\|x \|^2, $$

and

$$\bigl\langle A,\langle x,\cdot\rangle Ax \bigr\rangle _{HS}=\sum _{i\in I} \bigl\langle Ae_i,\langle x,e_i\rangle Ax \bigr\rangle =\sum_{i\in I} \bigl\langle A\langle x,e_i\rangle e_i,Ax \bigr\rangle = \|Ax\|^2, $$

where to get the last equality we apply Remark 1.2.1(c) in [20]. This yields (31). □
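Identity (31) is easy to confirm numerically in the matrix case. A small Python check with the Frobenius norm (the particular A and x are arbitrary choices):

```python
def frob2(M):
    """Squared Hilbert-Schmidt (Frobenius) norm of a matrix."""
    return sum(v * v for row in M for v in row)

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1.0, -2.0], [0.5, 3.0]]
x = [2.0, 1.0]
nx2 = sum(v * v for v in x)                 # ||x||^2
Ax = matvec(A, x)
# entrywise, (A - <x,.>Ax/||x||^2)[i][j] = A[i][j] - Ax[i]*x[j]/||x||^2
M = [[A[i][j] - Ax[i] * x[j] / nx2 for j in range(2)] for i in range(2)]
lhs = frob2(M)
rhs = frob2(A) - sum(v * v for v in Ax) / nx2
assert abs(lhs - rhs) < 1e-12               # identity (31)
```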

Lemma 4.4 implies that Proposition 4.2 is also valid for the Hilbert–Schmidt norm.

Proposition 4.5

Consider a function f:X→Y and a point \(\bar{x}\in X\) such that the derivative mapping Df is Lipschitz continuous with respect to the Hilbert–Schmidt norm with constant L in a convex neighborhood U of \(\bar{x}\). Given \(B_{k} \in \mathcal{H}(X,Y)\) and x k ,x k+1∈U, with x k+1≠x k , if B k+1 is defined as in (28), then

$$ \bigl\| B_{k+1}-Df(\bar{x}) \bigr\| _{HS}\leq\bigl\| B_k-Df(\bar{x})\bigr\| _{HS} + \frac {L}{2} \bigl(\|x_{k+1}-\bar{x}\|+\|\bar{x}-x_k\| \bigr). $$
(32)

Proof

This can be obtained by applying the same argument as in Proposition 4.2 but using Lemma 4.4 instead of Lemma 4.1. □

Corollary 4.6

Under the assumptions of Proposition 4.5, if \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator, then \(B_{k}-Df(\bar{x})\) is a Hilbert–Schmidt operator for all \(k\in\mathbb{N}\).

Proof

This follows from (32). □

In the remainder of this section we link the analysis presented so far with a central result in the theory of quasi-Newton methods: the Dennis–Moré theorem. This theorem, first published in [5], gives a characterization of the q-superlinear convergence of a quasi-Newton method applied to a smooth equation f(x)=0 with a zero at \(\bar{x}\) at which the derivative mapping \(Df(\bar{x})\) is invertible. Namely, if a quasi-Newton method generates a sequence {x k } which stays near \(\bar{x}\) and x k+1≠x k for all k, then {x k } converges q-superlinearly if and only if it is convergent and, in addition,

$$ \lim_{k\rightarrow\infty}\frac{\Vert E_{k}s_{k}\Vert }{ \Vert s_{k}\Vert }=0, $$
(33)

where \(E_{k}:=B_{k}-Df(\bar{x})\).

It is well known that the Broyden update (28) applied to a smooth equation in finite dimensions with a nonsingular Jacobian at the reference solution \(\bar{x}\) satisfies condition (33), see e.g. [17, Theorem 7.2.4]. Proofs of this claim in infinite-dimensional Hilbert spaces are given in [18] and [23], both of which explicitly use the fact that they deal with equations. We will now show that (33) holds as well by relying only on the formula (28), without assuming that this update is applied to solving an equation. This allows us to apply the Dennis–Moré theorem for generalized equations in Banach spaces proved in [8, Theorem 3]. For completeness we state next the sufficiency part of the latter result, which is used in what follows.
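For a scalar equation the Broyden update reduces to the secant slope B k+1 =y k /s k , and the decay of the Dennis–Moré quotient can be observed directly. A short illustrative sketch (the equation t²−2=0 and the starting data are our own choices, not taken from the text):

```python
import math

f = lambda t: t * t - 2.0                   # illustrative smooth equation
xbar = math.sqrt(2.0)                       # its positive zero
dfbar = 2.0 * xbar                          # Df(xbar)

x, B = 1.0, 2.0                             # x_0 and B_0 = Df(x_0)
ratios = []
for _ in range(6):
    x_new = x - f(x) / B                    # quasi-Newton step
    s, y = x_new - x, f(x_new) - f(x)
    if s == 0.0:
        break
    ratios.append(abs((B - dfbar) * s) / abs(s))   # ||E_k s_k|| / ||s_k||
    B = B + (y - B * s) * s / (s * s)       # scalar Broyden update: B = y_k / s_k
    x = x_new

# the Dennis-More quotient decays along the iteration
assert ratios[-1] < 1e-4 and ratios[-1] < ratios[0]
```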

Theorem 4.7

Consider the generalized equation (1) with a solution \(\bar{x}\) and suppose that f is Fréchet differentiable in a neighborhood U of \(\bar{x}\) and the derivative mapping Df is continuous at \(\bar{x}\). Also, suppose that the mapping \(x\mapsto f(\bar{x})+D f(\bar{x})(x-\bar{x}) + F(x)\) is strongly metrically subregular at \(\bar{x}\) for 0. If a sequence {x k } generated by (4) is convergent to \(\bar{x}\) and satisfies (33), then it is convergent q-superlinearly.

The following theorem gives conditions under which the Broyden update satisfies the Dennis–Moré condition (33).

Theorem 4.8

Consider a function f:X→Y and a point \(\bar{x}\in X\) such that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\) with respect to the Hilbert–Schmidt norm. Consider also the Broyden update (28) such that \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator. If a sequence {x k } is linearly convergent to \(\bar{x}\), then it satisfies the Dennis–Moré condition (33).

Proof

We first show that

$$ \lim_{k\to\infty}\|B_{k+1}-B_k\|_{HS} = 0. $$
(34)

The proof of (34) parallels the analysis in [14], similarly to the proof of [4, Theorem 2.3]. Consider the convex set

$$\mathcal{C}_k:=\bigl\{ A\in\mathcal{L}(X,Y)\mid\bigl\| A-Df(\bar{x})\bigr\| _{HS}<\infty\text{ and } As_k=y_k \bigr\} , $$

where s k :=x k+1 −x k and y k :=f(x k+1 )−f(x k ). Observe that \(\mathcal{C}_{k}\) is closed: if \(A_{n}\in\mathcal{C}_{k}\) converges to \(A\in\mathcal{L}(X,Y)\) (with respect to the Hilbert–Schmidt norm), then

$$\bigl\| A-Df(\bar{x})\bigr\| _{HS}\leq\|A-A_n\|_{HS}+ \bigl\| A_n-Df(\bar{x})\bigr\| _{HS}<\infty, $$

and

$$\|As_k-y_k\|=\|As_k-A_ns_k \|\leq\|A-A_n\|_{HS}\|s_k\|, $$

where we employ the inequality ∥A−A n ∥≤∥A−A n ∥ HS (see e.g. [20, Corollary 16.9]). Taking the limit as n→∞, we get As k =y k .

Let {e i ,i∈I} be an orthonormal basis of X for an index set I. By Corollary 4.6, \(B_{k+1}\in\mathcal{C}_{k}\). Moreover, for all \(A\in\mathcal{C}_{k}\), one has

$$\begin{aligned} \|B_{k+1}-B_k\|_{HS}^2&=\biggl\Vert \frac{(y_k-B_ks_k)\langle s_k,\cdot \rangle}{\|s_k\|^2}\biggr\Vert _{HS}^2=\frac{\|(A-B_k)s_k \langle s_k,\cdot\rangle\|_{HS}^2}{\|s_k\|^4} \\ & =\frac{\sum_{i\in I}\|(A-B_k)s_k\langle s_k,e_i\rangle\|^2}{\|s_k\|^4} =\frac{\sum_{i\in I}\langle s_k,e_i\rangle^2\|(A-B_k)s_k\|^2}{\|s_k\| ^4} \\ &=\frac{\|(A-B_k)s_k\|^2}{\|s_k\|^2} \leq\|A-B_k\|_{HS}^2, \end{aligned}$$

where we again use the inequality ∥A−B k ∥≤∥A−B k ∥ HS . Hence the Broyden update (28) is the (unique) solution to the minimization problem

$$\min_{A\in\mathcal{C}_k}\|A-B_k\|_{HS}. $$

Thus, B k+1 is the projection of B k onto the closed convex set \(\mathcal{C}_{k}\). The projection mapping onto \(\mathcal{C}_{k}\), denoted by \(P_{\mathcal{C}_{k}}\), is firmly nonexpansive (see e.g. [2, Proposition 4.8]), meaning in our case that for every \(A\in\mathcal{C}_{k}\) one has

$$\bigl\| P_{\mathcal{C}_k}(B_k)-P_{\mathcal{C}_k}(A)\bigr\| _{HS}^2+ \bigl\| (I-P_{\mathcal{C}_k}) (B_k)-(I-P_{\mathcal{C}_k}) (A) \bigr\| _{HS}^2\leq\| B_k-A\|_{HS}^2, $$

where I denotes the identity mapping. (Firmly nonexpansive mappings can be defined in several equivalent ways; here we use one of the possible definitions, see also [2, Definition 4.1(i)].) Hence, for all \(A\in\mathcal{C}_{k}\),

$$ \|B_{k+1}-A \|_{HS}^2+\|B_{k+1}-B_k \|_{HS}^2\leq\|B_k-A\|_{HS}^2. $$
(35)
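Since the constraint As k =y k is affine, \(\mathcal{C}_{k}\) is an affine set and (35) in fact holds with equality (Pythagoras). This is easy to check numerically for matrices; in the sketch below both members of \(\mathcal{C}_{k}\) are produced by rank-one secant corrections, an arbitrary construction chosen only to guarantee As k =y k :

```python
def frob2(M):
    return sum(v * v for row in M for v in row)

def mat_sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def broyden(B, s, y):
    """Rank-one secant correction of B, so that the result maps s to y."""
    ns2 = sum(v * v for v in s)
    Bs = [sum(B[i][j] * s[j] for j in range(len(s))) for i in range(len(B))]
    return [[B[i][j] + (y[i] - Bs[i]) * s[j] / ns2 for j in range(len(s))]
            for i in range(len(B))]

s, y = [1.0, 2.0], [0.5, -1.0]
Bk = [[1.0, 0.0], [0.0, 1.0]]
Bk1 = broyden(Bk, s, y)                       # the projection of B_k onto C_k
A = broyden([[3.0, -1.0], [2.0, 0.0]], s, y)  # another member of C_k: A s = y
lhs = frob2(mat_sub(Bk1, A)) + frob2(mat_sub(Bk1, Bk))
rhs = frob2(mat_sub(Bk, A))
assert lhs <= rhs + 1e-9                      # inequality (35); here an equality
```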

For

$$ A_k:=\int_0^1 Df \bigl(x_k+t(x_{k+1}-x_k)\bigr)dt $$
(36)

we have

$$ A_ks_k=\int_0^1 Df \bigl(x_k+t(x_{k+1}-x_k)\bigr) (x_{k+1}-x_k)dt=f(x_{k+1})-f(x_k)=y_k . $$
(37)

Furthermore, since Df is Lipschitz continuous with respect to the Hilbert–Schmidt norm, there is a constant L≥0 such that, eventually,

$$\begin{aligned} \bigl\| A_k-Df(\bar{x})\bigr\| _{HS}&=\biggl\Vert \int _0^1 \bigl(Df\bigl(x_k+t(x_{k+1}-x_k) \bigr)-Df(\bar{x}) \bigr)dt\biggr\Vert _{HS} \\ &\leq\int_0^1\bigl\Vert Df \bigl(tx_{k+1}+(1-t)x_k\bigr)-Df(\bar{x})\bigr\Vert _{HS}dt \\ &\leq L \int_0^1\bigl\Vert t(x_{k+1}-\bar{x})+(1-t) (x_k-\bar{x})\bigr\Vert dt \\ &\leq\frac{L}{2} \bigl(\|x_{k+1}-\bar{x}\|+\|x_k-\bar{x}\| \bigr) <\infty . \end{aligned}$$
(38)

Thus, \(A_{k}\in\mathcal{C}_{k}\). Since x k converges to \(\bar{x}\), we deduce from (38) that \(\|A_{k}-Df(\bar{x})\|_{HS}\) converges to zero. Moreover, (32) together with the linear convergence of x k to \(\bar{x}\) implies that \(\|B_{k}-Df(\bar{x})\|_{HS}\) is convergent. Indeed, let 0<γ<1 be such that

$$\|x_{k+1}-\bar{x}\|\leq\gamma\|x_k-\bar{x}\|\quad\text{for all }k=0,1,\ldots. $$

Then, for all m>n, one has by (32)

$$\begin{aligned} \bigl\| B_m-Df(\bar{x})\bigr\| _{HS} &\leq\bigl\| B_n-Df(\bar{x}) \bigr\| _{HS}+\frac{L}{2}\sum_{k=n+1}^m \bigl(\|x_k-\bar{x}\|+\|x_{k-1}-\bar{x}\| \bigr) \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+L\sum _{k=n}^{m-1}\|x_{k}-\bar{x}\| \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+L\sum _{k=n}^{\infty}\gamma^k\| x_{0}- \bar{x}\| \\ &\leq\bigl\| B_n-Df(\bar{x})\bigr\| _{HS}+\frac{L\gamma^n}{1-\gamma}\| x_{0}-\bar{x}\|. \end{aligned}$$

This implies that \(\|B_{k}-Df(\bar{x})\|_{HS}\) is a Cauchy sequence, and thus it is convergent. Therefore, since A k defined in (36) converges to \(Df(\bar{x})\), we get that ∥B k −A k ∥ HS and ∥B k+1 −A k ∥ HS converge to the same limit. Furthermore, (35) implies

$$ \|B_{k+1}-A_k \|^2_{HS}+\|B_{k+1}-B_k \|^2_{HS}\leq\|B_k-A_k \|^2_{HS} $$
(39)

which in turn yields (34).

We are now ready to prove that (33) is satisfied. Since ∥B k+1 −B k ∥≤∥B k+1 −B k ∥ HS , by the triangle inequality we have

$$\begin{aligned} \Vert E_{k}s_{k}\Vert =&\bigl\Vert \bigl(B_{k}-Df(\overline{x})\bigr)s_{k}\bigr\Vert \\ \leq&\bigl\Vert \bigl(B_{k+1}-Df(\overline {x})\bigr)s_{k} \bigr\Vert +\Vert B_{k+1}-B_{k}\Vert _{HS} \Vert s_{k}\Vert . \end{aligned}$$
(40)

The next steps mimic the proof of Proposition 4.2. Taking into account that

$$\bigl\Vert \bigl(B_{k+1}-Df(\overline{x})\bigr)s_{k}\bigr\Vert =\bigl\Vert y_{k}-Df(\overline{x})s_{k}\bigr\Vert =\bigl\Vert f(x_{k+1})-f(x_{k} )-Df( \overline{x})s_{k}\bigr\Vert $$

and

$$f(x_{k+1})-f(x_{k})={ \displaystyle\int\nolimits_{0}^{1}} Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)s_{k}dt, $$

we get

$$\begin{aligned} \bigl\Vert \bigl(B_{k+1}-Df(\overline{x})\bigr)s_{k}\bigr\Vert & \leq \Vert s_{k}\Vert { \displaystyle\int\nolimits_{0}^{1}} \bigl\Vert Df\bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df( \overline{x})\bigr\Vert dt \\ &\leq \Vert s_{k}\Vert {\displaystyle\int \nolimits_{0}^{1}} \bigl\Vert Df \bigl(x_{k}+t(x_{k+1}-x_{k})\bigr)-Df(\overline{x}) \bigr\Vert _{HS}dt \\ & \leq L\Vert s_{k}\Vert {\displaystyle \int\nolimits_{0}^{1}} \bigl\Vert t(x_{k+1}-\overline{x})+(1-t) (x_{k}-\overline{x})\bigr\Vert dt \\ & \leq\frac{L\Vert s_{k}\Vert }{2}\bigl(\Vert x_{k+1}-\overline {x}\Vert + \Vert x_{k}-\overline{x}\Vert \bigr). \end{aligned}$$

Thus, from (40),

$$\frac{\Vert E_{k}s_{k}\Vert }{\Vert s_{k}\Vert }\leq\frac{L}{2}\bigl(\Vert x_{k+1}-\overline{x}\Vert +\Vert x_{k}-\overline{x} \Vert \bigr)+\Vert B_{k+1}-B_{k}\Vert _{HS}. $$

Since ∥B k+1B k HS →0 by (34) and \(x_{k}\rightarrow\overline{x}\), we come to (33). □

The following theorem presents the main result of this section.

Theorem 4.9

Consider the generalized equation (1) with a solution \(\bar{x}\) and suppose that the derivative mapping Df is Lipschitz continuous around \(\bar{x}\) with respect to the Hilbert–Schmidt norm. Consider the quasi-Newton method (4) applied to (1) with the Broyden update (28) such that B 0 satisfies (13) and \(B_{0}-Df(\bar{x})\) is a Hilbert–Schmidt operator.

  1. (i)

    If f+F is strongly metrically subregular at \(\bar{x}\) for 0, then every sequence {x k } generated by (4) which converges to \(\bar{x}\) is q-superlinearly convergent;

  2. (ii)

    If f+F is both strongly metrically subregular and metrically regular at \(\bar{x}\) for 0, then there exists a neighborhood O of \(\bar{x}\) such that for every starting point x 0∈O there exists a sequence {x k } generated by (4) which either reaches a solution of (1) in finitely many steps or converges q-superlinearly to \(\bar{x}\);

  3. (iii)

    If f+F is strongly metrically regular at \(\bar{x}\) for 0, then for every x 0∈O, where O is the neighborhood from (ii), there exists a unique sequence {x k } in O starting from x 0 and generated by (4), and this sequence converges q-superlinearly to \(\bar{x}\).

Proof

To prove (i) it is sufficient to combine Theorem 4.8 with Theorem 4.7. Then (ii) follows from (i) and Theorem 4.3. Since strong metric regularity implies strong metric subregularity, in order to prove (iii) it is sufficient to combine (i) with the last part of Theorem 4.3. □

We note that the condition that \(E_{0}:=B_{0}-Df(\overline{x})\) be a Hilbert–Schmidt operator is used in [23, Theorem 3.5] to prove q-superlinear convergence of the Broyden method for equations. Thus, Theorem 4.9 also extends [23, Theorem 3.5] to generalized equations.

5 Two numerical examples

Our first example is one-dimensional. Let \(f:\mathbb{R}\to\mathbb {R}\) and \(F:\mathbb{R}\rightrightarrows\mathbb{R}\) be given by

$$\begin{aligned} f(x)&:=3x^3-2x^2,\quad\text{for }x\in\mathbb{R}; \\ F(x)&:=\left \{ \begin{array}{l@{\quad}l} \{x,-x\},&x\geq0;\\ \emptyset,&x<0. \end{array} \right . \end{aligned}$$

The graph of f+F is plotted in Figure 1. The generalized equation 0∈f(x)+F(x) has two solutions: 0 and 1.

Fig. 1 The graph of f+F in the first example

Observe that the mapping f+F is strongly metrically regular at any point of its graph, in particular at 0 for 0 and at 1 for 0. Hence the assumptions of Theorem 4.9 are satisfied, and the quasi-Newton method (4) with the Broyden update (28) generates a locally unique q-superlinearly convergent sequence when started within a neighborhood of each of the solutions. The numerical results with B 0:=Df(x 0) are shown in Table 1 for two starting points: x 0=0.1 (left) and x 0=0.3 (right). The absolute error at the kth iteration is denoted by ∥e k ∥. Note that the obtained convergence is indeed q-superlinear in each case.

Table 1 Numerical results for the first example with x 0=0.1 (left) and x 0=0.3 (right)
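This one-dimensional example can be reproduced in a few lines of Python. The sketch below assumes, consistently with (2), that the quasi-Newton method (4) reads f(x k )+B k (x k+1 −x k )+F(x k+1 )∋0; each step then amounts to solving the two linear equations obtained from the branches ±x of F and keeping a nonnegative root (selecting the root closest to the current iterate is our own tie-breaking rule, not prescribed by the text):

```python
f = lambda t: 3.0 * t**3 - 2.0 * t**2
df = lambda t: 9.0 * t**2 - 4.0 * t

x, B = 0.3, df(0.3)                         # starting point x_0 and B_0 = Df(x_0)
for _ in range(25):
    # solve f(x_k) + B_k (x_new - x_k) + F(x_new) ∋ 0: the branches ±x_new of F
    # give the linear equations x_new (B_k ± 1) = B_k x_k - f(x_k)
    r = B * x - f(x)
    cands = [r / (B + sgn) for sgn in (1.0, -1.0)
             if abs(B + sgn) > 1e-14 and r / (B + sgn) >= 0.0]
    if not cands:
        break
    x_new = min(cands, key=lambda c: abs(c - x))   # the step closest to x_k
    s, y = x_new - x, f(x_new) - f(x)
    if s == 0.0:
        break
    B = B + (y - B * s) / s                 # scalar Broyden update (28)
    x = x_new

# the iterates approach one of the two solutions, 0 and 1
assert min(abs(x), abs(x - 1.0)) < 1e-8
```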

In the paper [11] the following model of economic equilibrium was introduced. Consider a group of r agents, each of which starts with a vector \(x_{i}^{0}\in\mathbb{R}^{n}\) of goods and trades it for another goods vector \(x_{i}\in\mathbb{R}^{n}\). Each good has a price to be determined by the market, and the price vector is \(p \in\mathbb{R}^{n}_{+}\). Agent i has an initial amount of money \(m_{i}^{0}\in\mathbb{R}_{+}\) and ends up, after trading, with an amount of money \(m_{i}\in\mathbb{R}_{+}\). Agent i aims at maximizing a utility function u i (m i ,x i ) over the set \(\mathbb{R}_{+}\times U_{i}\) subject to the budget constraint

$$ m_i - m_i^0 + \bigl\langle p, x_i - x_i^0\bigr\rangle \leq0, $$
(41)

where the sets \(U_{i}\subset\mathbb{R}^{n}\) are nonempty, closed and convex and the functions u i are continuously differentiable, concave and nondecreasing over \(\mathbb{R}_{+}\times U_{i}\). In addition to the budget constraints (41) there are supply-demand requirements for money and goods, of the form

$$ \sum_{i=1}^r \bigl[m_i-m^0_i\bigr] \leq0,\qquad \sum _{i=1}^r \bigl[x_i-x^0_i \bigr] \leq0. $$
(42)

The problem is to find an equilibrium value of the vector variable (p,m,x) such that each utility function attains its maximum subject to the budget and the supply-demand constraints. It is shown in [11, Theorem 1] that, under some mild conditions that are satisfied in the example displayed below, an equilibrium always exists; moreover, it satisfies a first-order optimality condition for each agent involving the Lagrange functions

$$L_i(p,m_i, x_i, \lambda_i) = - u_i(m_i, x_i) + \lambda_i \bigl(m_i - m_i^0 + \bigl\langle p, x_i - x_i^0\bigr\rangle \bigr) $$

with a Lagrange multiplier λ i ≥0, i=1,…,r, associated with the budget constraint (41). Adding the supply-demand constraints (42) written as complementarity conditions, we obtain a variational inequality for the vectors \(p \in\mathbb{R}^{n}_{+}\), \(m=(m_{1},\ldots,m_{r})^{\sf T} \in\mathbb {R}^{r}_{+}\), \(x=(x_{1},\ldots,x_{r})^{\sf T} \in U_{1} \times U_{2} \times\cdots\times U_{r}\), and \(\lambda= (\lambda_{1}, \ldots, \lambda_{r})^{\sf T} \in\mathbb{R} _{+}^{r}\) of the form

$$ -g\bigl(p,m,x,\lambda,m^0,x^0\bigr) \in N_C(p,m,x,\lambda), $$
(43)

where

$$ C= \mathbb{R}^n_+\times\mathbb{R}_+^r \times U_1 \times\cdots \times U_r \times\mathbb{R}^r_+, $$
(44)

and

$$ g\bigl(p,m,x,\lambda,m^0,x^0\bigr) = \left ( \begin{array}{c} \sum_{i=1}^r[x_i^0-x_i] \\ \ldots\\ \lambda_i - \nabla_{m_i} u_i(m_i,x_i) \\ \ldots\\ \lambda_i p - \nabla_{x_i} u_i(m_i,x_i)\\ \ldots\\ m_i^0-m_i +\langle p, x_i^0-x_i\rangle\\ \ldots\\ \end{array} \right ). $$
(45)

The initial endowments are represented by the vectors \(m^{0}=(m^{0}_{1},\ldots,m^{0}_{r})^{\sf T}\in\mathbb{R}^{r}_{+}\) and \(x^{0}=(x^{0}_{1},\ldots,x^{0}_{r})^{\sf T} \in U_{1} \times U_{2} \times\cdots\times U_{r}\). In [11, Theorem 3] it is shown that the equilibrium mapping associated with (43) is strongly regular provided that for each agent i the initial goods \(x^{0}_{i}\) are sufficiently close to the equilibrium vector \(\bar{x}_{i}\); in other words, when the trade starts with amounts of goods not too far from the equilibrium. Note that the first inequality in (42) does not appear in (43) since at equilibrium it automatically becomes an equality.

We consider a specific example where there are two agents with utility functions

$$ u_i(m_i, x_i) = \alpha_i \ln(m_i) + \beta_i\ln(x_i), \quad i = 1, 2, $$

and a single good subject to the constraints

$$ x_i \in U_i=[ \xi_i, \eta_i], \quad i = 1, 2 $$

for some positive ξ i and η i . The variational inequality (43) for the vector (p,m 1,m 2,x 1,x 2,λ 1,λ 2) has the following specific form:

$$ - \left ( \begin{array}{c} \sum_{i=1}^2[x_i^0-x_i] \\ \lambda_1 - \frac{\alpha_1}{m_1} \\ \lambda_2 - \frac{\alpha_2}{m_2}\\ \lambda_1 p - \frac{\beta_1}{x_1}\\ \lambda_2 p - \frac{\beta_2}{x_2}\\ m_1^0-m_1 +\langle p, x_1^0-x_1\rangle\\ m_2^0-m_2 +\langle p, x_2^0-x_2\rangle\\ \end{array} \right ) \in \left ( \begin{array}{c} N_{\mathbb{R}_+}(p) \\ N_{\mathbb{R}_+}(m_1) \\ N_{\mathbb{R}_+}(m_2)\\ N_{U_1}(x_1) \\ N_{U_2}(x_2)\\ N_{\mathbb{R}_+}(\lambda_1)\\ N_{\mathbb{R}_+}(\lambda_2)\\ \end{array} \right ). $$

The numerical implementation of Broyden’s update (28) for this variational inequality has been done in Matlab. Each step of the method reduces to solving a linear complementarity problem (LCP). The Matlab function LCP by Yuval, available at http://www.mathworks.com/matlabcentral/fileexchange/20952, has been used for solving these problems. The computations are done for the following data. For the parameters α i =β i =0.1 we consider the first agent with endowment 0.9 of the good and 1.3 of money, and the second agent with unit endowments: \(x^{0} = (0.9, 1)^{\sf T}\), \(m^{0} = (1.3, 1)^{\sf T}\). The survival interval of consumption for each agent is [0.94,1.08]. Then the solution is: p=1.2745, \(m = (1.2235, 1.0765)^{\sf T}\), \(x = (0.96, 0.94)^{\sf T}\), \(\lambda= (0.0817, 0.0929)^{\sf T}\).
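As a sanity check, the reported solution can be tested in Python against the first-order conditions read off from the displayed variational inequality (all data as above; which conditions hold as equalities versus inequalities follows from which components are interior, and the tolerance reflects the four-digit rounding of the reported values):

```python
p = 1.2745
m = [1.2235, 1.0765]
x = [0.96, 0.94]
lam = [0.0817, 0.0929]
m0, x0 = [1.3, 1.0], [0.9, 1.0]
alpha = beta = [0.1, 0.1]
lo = 0.94                      # lower end of the survival interval [0.94, 1.08]
tol = 1e-3                     # tolerance matching the four-digit rounding

# money stationarity at interior m_i > 0: lambda_i = alpha_i / m_i
assert all(abs(lam[i] - alpha[i] / m[i]) < tol for i in range(2))
# active budget constraints (lambda_i > 0): m_i - m_i^0 + p (x_i - x_i^0) = 0
assert all(abs(m[i] - m0[i] + p * (x[i] - x0[i])) < tol for i in range(2))
# goods balance (p > 0): x_1 + x_2 = x_1^0 + x_2^0
assert abs(sum(x) - sum(x0)) < tol
# goods stationarity: lambda_i p - beta_i / x_i = 0 at interior x_i,
# and >= 0 when x_i sits at the lower bound of the survival interval
for i in range(2):
    g = lam[i] * p - beta[i] / x[i]
    assert abs(g) < tol if x[i] > lo + tol else g > -tol
```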

We did numerical testing with various starting points and starting updates, and obtained rather similar results. The result of one of these tests is presented below for the starting point p 0=1.3745, \(m_{0} = (1.3235, 1.1765)^{\sf T}\), \(x_{0} = (1.06, 1.04)^{\sf T}\), \(\lambda_{0} = (0.1817, 0.1929)^{\sf T}\), and the initial update B 0 equal to the value of the Jacobian at the starting point. The results of the computations are given in Table 2. We obtain q-superlinear convergence in this case as well, as proved theoretically.

Table 2 Numerical results for the second example