
Prerequisites. Undergraduate-level real analysis and linear algebra. The basics of metric spaces: continuity and completeness.

1 Contraction Mappings

The theorem we’re going to apply to Newton’s Method, Initial-Value Problems, and the Internet was proved by the Polish mathematician Stefan Banach as part of his 1922 doctoral dissertation. Although the setting of Banach’s theorem is far more general than that of Brouwer’s, the restricted nature of the mappings involved makes its proof a lot simpler.

Banach’s theorem is set in a metric space: a pair (S, d) where S is a set and d is a “metric” on S, i.e., a function \(d: S \times S \rightarrow \mathbb{R}_{+}\) such that for all \(x,y,z \in S\)

(m1): d(x, y) = 0 iff x = y,

(m2): d(x, y) = d(y, x), and

(m3): d(x, z) ≤ d(x, y) + d(y, z).

The last property is called, for obvious reasons, “the triangle inequality.”

Example.

Let S be \(\mathbb{R}^{N}\), or a subset thereof, and take d(x, y) to be the Euclidean distance between x and y: \(d(x,y) =\| x - y\|\). Alternatively d could be the distance on \(\mathbb{R}^{N}\) induced in the same way by the one-norm introduced in the proof of Theorem 1.6. As we pointed out there, the two metrics are equivalent in that they have the same convergent sequences.
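To make the example concrete, here is a small Python sketch (illustrative only, and not part of the text) that implements both metrics and spot-checks axioms (m2) and (m3) on a few vectors in \(\mathbb{R}^{2}\):

```python
import math

def d_euclid(x, y):
    """Euclidean metric on R^N: the two-norm of x - y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_one(x, y):
    """Metric induced by the one-norm: sum of coordinatewise distances."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

x, y, z = (1.0, 2.0), (4.0, 6.0), (0.0, -1.0)
for d in (d_euclid, d_one):
    assert d(x, y) == d(y, x)            # (m2): symmetry
    assert d(x, y) <= d(x, z) + d(z, y)  # (m3): triangle inequality
```

Both functions accept any pair of same-length tuples, so the same code illustrates the metrics on \(\mathbb{R}^{N}\) for any N.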

The mappings addressed by Banach’s Principle are called strict contractions. To say \(F: S \rightarrow S\) is one of these means that there is a positive “contraction constant” c < 1 for which

$$\displaystyle{ d(F(x),F(y)) \leq cd(x,y)\qquad \forall \ x,y \in S. }$$
(3.1)

Clearly every strict contraction is continuous on S.

Definition 3.1.

A Cauchy sequence in a space with metric d is a sequence \((x_{n})\) such that: for each \(\varepsilon> 0\) there is a positive integer \(N = N(\varepsilon )\) such that \(d(x_{n},x_{m}) <\varepsilon\) whenever the indices m and n are larger than N. A complete metric space is one in which every Cauchy sequence converges.

Theorem 3.2 (The Banach Contraction-Mapping Principle).

Suppose (S,d) is a complete metric space and \(F: S \rightarrow S\) is a strict contraction. Then F has a unique fixed point, and every iterate sequence converges to this point.

We’ll prove this shortly; first, a few comments.

Iterate sequence. Recall that, for a mapping F taking a set S into itself, the iterate sequence starting at \(x_{0} \in S\) is \((x_{n})\) where \(x_{n+1} = F(x_{n})\) for n = 0, 1, 2, ….

Uniqueness. If (S, d) is a metric space on which F is a strict contraction and pS is a fixed point of F, then there can be no other fixed point.

Proof.

If \(q \in S\) is also a fixed point of F then

$$\displaystyle{d(p,q) = d(F(p),F(q)) \leq cd(p,q).}$$

Since 0 < c < 1 we must have d(p, q) = 0, whereupon condition (m1) in the definition of “metric” guarantees that p = q. □

“Non-strict” contractions. If in Theorem 3.2 we merely assume that (3.1) holds with contraction constant c = 1, then:

  • Existence can fail. Example: \(F(x) = x + 1\) defined on the real line.

  • Uniqueness can also fail. Example: the identity map on a metric space with more than one point.

Exercise 3.1 (Necessity of completeness).

Give an example of an incomplete metric space on which there is a strict contraction with no fixed point.

Fixed points and iterate sequences. We contended in Sect. 1.2 (page 4) that if the iteration of Newton’s method for an appropriate function f were to converge, then that limit had to be a root of f (i.e., a fixed point of the Newton function of f). The next result justifies this contention in a far more general setting.

Proposition 3.3.

If (S,d) is a metric space, \(F: S \rightarrow S\) is continuous, and x 0 is a point of S for which the iterate sequence \(\{x_{0},F(x_{0}),F(F(x_{0})),\,\ldots \}\) converges, then the limit of that sequence has to be a fixed point of F.

Proof.

Suppose the iterate sequence (x n ) of x 0 converges to pS, i.e., \(\lim _{n}d(x_{n},p) = 0\). Then the continuity of F insures that \(x_{n+1} = F(x_{n}) \rightarrow F(p)\). Also \(\lim _{n}d(x_{n+1},p) = 0\), i.e., \(x_{n+1} \rightarrow p\), so (because limits in metric spaces are unique) p = F(p). □

If we assume further that F is a strict contraction, then there results a very strong converse.

Proposition 3.4.

Suppose F is a strict contraction on a metric space. If p is a fixed point of F then every iterate sequence converges to p.

Proof.

Let c denote the contraction constant of the mapping F, so 0 < c < 1 and F satisfies (3.1) above. Fix x 0S and define the iterate sequence (x n ) in the usual way: \(x_{1} = F(x_{0}),\,\ldots \,,x_{n} = F(x_{n-1}),\,\ldots \ .\) Then

$$\displaystyle{d(x_{n},p) = d(F(x_{n-1}),F(p)) \leq cd(x_{n-1},p) \leq \ \ldots \ \leq c^{n}d(x_{ 0},p),}$$

so \(d(x_{n},p) \rightarrow 0\) as \(n \rightarrow \infty\), i.e., (x n ) converges to p. □

Exercise 3.2 (Lessons from a simple initial-value problem).

For the initial-value problem (IVP) \(y' = y,\ y(0) = 1\), write down the integral operator T on \(C(\mathbb{R})\) defined on page 4 by Eq. (IE), and compute explicitly the iterate sequence that has \(y_{0} \equiv 1\) as its initial function. On which intervals [−a, a] does this iterate sequence converge uniformly to a solution of the IVP? For which of these intervals does the Contraction-Mapping Principle guarantee such convergence?

Proof of the Contraction-Mapping Principle. In view of Proposition 3.4 only one strategy will work: fix a point x 0S and prove that its iterate sequence (x n ) converges. By Proposition 3.3 this limit must be a fixed point.

Since our metric space is complete it’s enough to show that (x n ) is a Cauchy sequence. To this end, consider a pair of indices m < n and use the triangle inequality to observe that

$$\displaystyle{d(x_{n},x_{m}) \leq \sum _{j=m}^{n-1}d(x_{ j+1},x_{j}).}$$

From the strict contractiveness of F:

$$\displaystyle{d(x_{j+1},x_{j}) = d(F(x_{j}),F(x_{j-1})) \leq cd(x_{j},x_{j-1}) \leq \ \ldots \ \leq \ c^{j}d(x_{1},x_{0}),}$$

whereupon (since c < 1)

$$\displaystyle{d(x_{m},x_{n}) \leq \sum _{j=m}^{n-1}c^{j}d(x_{1},x_{0}) \leq d(x_{1},x_{0})\sum _{j=m}^{\infty }c^{j} = \frac{d(x_{1},x_{0})} {1 - c} \,c^{m}\,.}$$

Now given \(\varepsilon> 0\), we may choose N so that \(\frac{d(x_{1},x_{0})} {1-c} \,c^{N} <\varepsilon\), which insures, by the above chain of inequalities, that \(N \leq m <n \Rightarrow d(x_{m},x_{n}) <\varepsilon\), hence our iterate sequence (x n ) is indeed Cauchy. □

The Contraction-Mapping Principle seems to be a perfect theorem: easy to prove and widely applicable. However there is a catch: proving a given mapping to be a strict contraction usually requires some work—as you’ll see in the next few sections.
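The iteration scheme built into Banach’s theorem is easy to run in code. The sketch below is an illustration with a hypothetical choice of F: the cosine function maps [0, 1] into itself and has |cos′| = |sin| ≤ sin 1 < 1 there, so it is a strict contraction and the theorem guarantees convergence of every iterate sequence.

```python
import math

def iterate_to_fixed_point(F, x0, tol=1e-10, max_iter=1000):
    """Run the iterate sequence x_{n+1} = F(x_n) until successive
    terms agree to within tol; Banach's theorem guarantees this
    terminates when F is a strict contraction on a complete space."""
    x = x0
    for _ in range(max_iter):
        x_next = F(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# cos is a strict contraction on [0, 1] with constant sin(1) < 1.
p = iterate_to_fixed_point(math.cos, 0.5)
```

The proof’s estimate \(d(x_{m},x_{n}) \leq c^{m}d(x_{1},x_{0})/(1-c)\) also tells us in advance how many iterations a given tolerance requires.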

2 Application: Stochastic Matrices/Google

In Sect. 1.9 we introduced the “Google matrix” G, a stochastic matrix with entries all positive, and observed with the help of the Brouwer Fixed-Point Theorem (a key step in our proof of Perron’s Theorem) that G has an essentially unique positive fixed point whose coordinates rank internet web pages.

There remains, however, the problem of proposing an algorithm for actually finding this fixed point. Recall that the application of Brouwer/Perron to the Google matrix ultimately rested on the stochasticity of that matrix, which implied that G (indeed each N × N stochastic matrix) maps the standard N-simplex \(\Pi _{N}\) continuously into itself. The positivity of G then guaranteed the uniqueness of its fixed point.

The generalization to \(\mathbb{R}^{N}\) of the “walking-through-rooms” proof of Brouwer’s theorem set out for N = 2 in Sect. 2.3 could provide the basis for an algorithm that approximates the desired fixed point. On the other hand, Banach’s theorem has built into it a scheme that—at least in theory—is easily implemented: Use iterate sequences to approximate fixed points. However to be certain that this will work we need each positive stochastic matrix to induce a strict contraction on its standard simplex (so far we’ve established only “non-strict” contractivity: Exercise 1.5).

Does stochasticity imply strict contractivity? Is this too much to ask? Read on!

Theorem 3.5.

Every N × N, positive, stochastic matrix induces a strict contraction on the standard simplex \(\Pi _{N}\) , taken in the metric induced by the one-norm.

Proof.

Suppose A is a positive, stochastic, N × N matrix. We already know (proof of Theorem 1.6) that A takes \(\Pi _{N}\) into itself. We’re claiming that there exists a positive number c strictly less than 1 such that

$$\displaystyle{ \|Ax - Ay\|_{1} \leq c\|x - y\|_{1}\qquad (x,y \in \Pi _{N})\,. }$$
(3.2)

Let \(a_{i,j}\) denote the matrix A’s entry in the i-th row and j-th column. Since each of these numbers is positive we may choose a positive number \(\varepsilon\) that is strictly less than all of them. Since each column of A sums to 1 we know that \(N\varepsilon <1\) (Proof: for j an index, \(1 =\sum _{i}a_{i,j}> N\varepsilon\)). Thus we may form the new N × N matrix B, whose (i, j)-entry is

$$\displaystyle{b_{i,j} = \frac{a_{i,j}-\varepsilon } {1 - N\varepsilon }\,.}$$

Clearly B is a positive matrix, and it’s easy to check that B is stochastic. Now

$$\displaystyle{A = (1 - N\varepsilon )B +\varepsilon E}$$

where E is the N × N matrix, all of whose entries are 1.

Claim. A satisfies (3.2) with \(c = (1 - N\varepsilon )\).

Proof of Claim. Since \(N\varepsilon\) lies strictly between 0 and 1, so does c. What makes this argument work is the fact that if \(x \in \Pi _{N}\) then Ex is the vector in \(\mathbb{R}^{N}\), each of whose coordinates is the sum of the coordinates of x, namely 1. In particular if x and y belong to \(\Pi _{N}\) then Ex = Ey, whereupon

$$\displaystyle{Ax - Ay = c(Bx - By) +\varepsilon (Ex - Ey) = c(Bx - By).}$$

By Exercise 1.5, every N × N stochastic matrix induces, in the 1-norm, a (possibly non-strict) contraction on \(\mathbb{R}^{N}\), so from the last equation and the linearity of B:

$$\displaystyle{\|Ax - Ay\|_{1} =\| c(Bx - By)\|_{1} = c\|B(x - y)\|_{1} \leq c\|x - y\|_{1}\quad (x,y \in \Pi _{N}),}$$

which proves the Claim, and with it the theorem. □
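The decomposition in the proof is easy to spot-check numerically. The sketch below (with a hypothetical 3 × 3 positive column-stochastic matrix of our own choosing) forms B from A and \(\varepsilon\) and verifies the Claim’s inequality for a pair of simplex points:

```python
# Spot-check of the proof's decomposition A = (1 - N*eps)B + eps*E
# for a hypothetical positive stochastic 3x3 matrix (columns sum to 1).
N = 3
A = [[0.5, 0.2, 0.3],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.5]]
eps = 0.05          # strictly below every entry of A, and N*eps < 1
c = 1 - N * eps     # the contraction constant from the Claim

B = [[(A[i][j] - eps) / (1 - N * eps) for j in range(N)] for i in range(N)]

def one_norm(v):
    return sum(abs(t) for t in v)

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(N)) for i in range(N)]

# Two points of the standard simplex:
x = [0.7, 0.2, 0.1]
y = [0.1, 0.3, 0.6]
lhs = one_norm([a - b for a, b in zip(apply(A, x), apply(A, y))])
```

Here B is again positive and stochastic (each column sums to \((1 - N\varepsilon)/(1 - N\varepsilon) = 1\)), and `lhs` comes out \(\leq c\,\|x-y\|_{1}\) as (3.2) requires.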

Corollary 3.6.

If A is an N × N positive stochastic matrix, then its (unique) Perron eigenvector is the limit of the iterate sequence of each initial point \(x_{0} \in \Pi _{N}\).

In particular, the unique ranking of web pages produced by the Google matrix can be computed by iteration. For \(x_{0} \in \Pi _{N}\) the iterate sequence of Corollary 3.6 is (x n ), where

$$\displaystyle{x_{n} = Ax_{n-1} = A^{2}x_{ n-2} =\,\ldots = A^{n}x_{ 0}\qquad (n = 1,2,\,\ldots ).}$$

For this reason the approximation scheme of the Corollary is called power iteration; it is widely used in numerical linear algebra for eigenvalue and eigenvector approximation.
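In code, power iteration is just repeated matrix–vector multiplication. The sketch below uses a hypothetical 3 × 3 positive stochastic matrix (not the Google matrix itself, which is far too large to write down) to approximate the Perron eigenvector:

```python
def power_iterate(A, x0, n_steps=200):
    """Apply the column-stochastic matrix A (a list of rows) repeatedly
    to x0; by Theorem 3.5 the iterates converge geometrically to the
    Perron eigenvector when A is positive and stochastic."""
    n = len(x0)
    x = list(x0)
    for _ in range(n_steps):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# A hypothetical positive stochastic matrix (each column sums to 1).
A = [[0.5, 0.2, 0.3],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.5]]
x0 = [1/3, 1/3, 1/3]   # a point of the standard simplex
p = power_iterate(A, x0)
```

Because A maps the simplex into itself, every iterate stays in \(\Pi _{N}\): the coordinates of `p` remain nonnegative and sum to 1.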

3 Application: Newton’s Method

Suppose f is a real-valued function defined on a finite, closed interval I = [a, b] of the real line, and that we know f has a root somewhere in the open interval (a, b). We’re going to use the Contraction-Mapping Principle to show that, under suitable hypotheses on f, Newton’s method for each appropriate starting point converges to this root.

More precisely, suppose \(f \in C^{2}(I)\) with f′ never zero on I, and suppose f has different signs at the endpoints of I; say (without loss of generality) f(a) < 0 and f(b) > 0. Then f has a unique root \(x^{{\ast}}\) in the interior (a, b) of I. Under these hypotheses we have

Theorem 3.7.

There exists δ > 0 such that for every \(x_{0}\) in \([x^{{\ast}}-\delta,x^{{\ast}}+\delta ]\), Newton’s method with starting point \(x_{0}\) converges to \(x^{{\ast}}\).

In other words, under reasonable hypotheses on f: for starting points close enough to a root of f the iterate sequence for the Newton function

$$\displaystyle{F(x) = x -\frac{f(x)} {f'(x)}\qquad (x \in I)}$$

will converge to that root.

Proof.

Let M denote the maximum of | f″(x) | as x ranges through I, and let m denote the corresponding minimum of | f′(x) | . By the continuity of f″, and the hypothesis that f′ never vanishes on I, we know that M is finite and m > 0.

Differentiation of F via the quotient rule yields

$$\displaystyle{F'(x) = \frac{f(x)\,f''(x)} {f'(x)^{2}} \qquad (x \in I)}$$

which, along with our bounds on f′ and f″, provides the estimate

$$\displaystyle{\vert F'(x)\vert \leq \frac{M} {m^{2}}\vert f(x)\vert \qquad (x \in I_{\delta }).}$$

Thus, upon shrinking δ enough to insure that

$$\displaystyle{\vert f(x)\vert \leq \frac{m^{2}} {2M}\qquad \text{for}\qquad \vert x - x^{{\ast}}\vert <\delta }$$

(possible because f is continuous at x and takes the value zero there) we see that | F′(x) | ≤ 1∕2 for each \(x \in I_{\delta } = [x^{{\ast}}-\delta,x^{{\ast}}+\delta ]\). This estimate on F′ does the trick! For starters, if x, yI δ then, along with the Mean-Value Theorem of differential calculus, it shows that

$$\displaystyle{\vert F(x) - F(y)\vert = \vert F'(\overline{x})(x - y)\vert \leq \frac{1} {2}\,\vert x - y\vert \qquad \forall x,y \in I_{\delta }}$$

where on the right-hand side of the equality, \(\overline{x}\) lies between x and y. Thus F is a strict contraction on I δ —once we know F maps that interval into itself. But it does, since the same inequality shows that for each xI δ (upon recalling that the root x of f is a fixed point of F):

$$\displaystyle{\vert F(x) - x^{{\ast}}\vert = \vert F(x) - F(x^{{\ast}})\vert \leq \frac{1} {2}\,\vert x - x^{{\ast}}\vert \leq \frac{1} {2}\,\delta <\delta }$$

so F(x) ∈ I δ , as desired.

Thus Banach’s Contraction-Mapping Principle applies to the strict contraction F acting on the complete metric space \(I_{\delta } = [x^{{\ast}}-\delta,x^{{\ast}}+\delta ]\), and guarantees that for every starting point in I δ the corresponding F-iteration sequence converges to the fixed point of F, which must necessarily be the unique root of f in I δ . □
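The iteration itself is a few lines of code. Here is an illustrative Python sketch, using the hypothetical example f(x) = x² − 2 on [1, 2], which satisfies the theorem’s hypotheses (f′ = 2x never vanishes there, f(1) < 0 < f(2)) and has \(x^{{\ast}} = \sqrt{2}\):

```python
def newton(f, fprime, x0, n_steps=50):
    """Iterate the Newton function F(x) = x - f(x)/f'(x); by Theorem 3.7
    the iterates converge for starting points close enough to the root."""
    x = x0
    for _ in range(n_steps):
        x = x - f(x) / fprime(x)
    return x

# f(x) = x^2 - 2 on [1, 2]; the unique root in (1, 2) is sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
```

In practice the convergence is much faster than the geometric rate the Contraction-Mapping Principle guarantees: since \(F'(x^{{\ast}}) = 0\), Newton’s method converges quadratically near the root.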

In the course of this proof we had to overcome a problem that occurs frequently when one seeks to apply Banach’s Principle:

The metric space for which the problem is originally defined is often not the one to which you apply Banach’s Principle!

For example, the hypotheses of Theorem 3.7 refer to the Newton function F defined on the compact interval (i.e., the complete metric space) I, but the theorem’s proof depended on cutting this space down to the smaller one I δ on which F acted as a strict contraction.

We’ll see this scenario play out again in the next section, where we’ll have to shrink an entire metric space of continuous functions!

4 Application: Initial-Value Problems

It’s time for a careful treatment of the initial-value problem (IVP) of Sect. 1.3. Recall its form: There is a differential equation plus initial condition

$$\displaystyle{ y' = f(x,y),\quad y(x_{0}) = y_{0} }$$
(IVP)

with \((x_{0},y_{0}) \in \mathbb{R}^{2}\) and f assumed initially to be continuous on all of \(\mathbb{R}^{2}\). Here we’ll just assume that f is continuous on a closed rectangle R = I × H, where I and H are compact intervals of the real line, I having radius r and center x 0 and H having radius h and center y 0. Thus R is a compact “r by h” rectangle in the plane, centered at the point \((x_{0},y_{0})\).

We’ll operate in the metric space C(I) consisting of real-valued functions that are continuous on I. In the course of our work we’ll need to shrink the radius r of I. To keep the notation simple we’ll re-assign the original symbols I, r, and R to the newly shrunken objects, taking care to be sure that what we’ve accomplished in one setting transfers intact to the new one.

Since continuous functions are bounded on compact sets and attain their maxima thereon, we can define on C(I) the “max-norm”

$$\displaystyle{\|u\| =\max _{x\in I}\vert u(x)\vert \qquad (u \in C(I))}$$

and use this to define a metric d by:

$$\displaystyle{d(u,v) =\| u - v\|\qquad (u,v \in C(I)).}$$

In this metric a sequence converges (resp. is Cauchy) if and only if it converges (resp. is Cauchy) uniformly on I. A fundamental property of uniform convergence is that every sequence of functions in C(I) that is uniformly Cauchy on I converges uniformly on I to a function in C(I). Thus the metric space (C(I), d) is complete. As in our treatment of Newton’s Method, we’ll have to find an appropriate subset of C(I) in which to apply Banach’s Theorem. We’ll break this quest into several steps.

Step I. C(I) is too large. For (IVP) to make sense for a prospective solution y = u(x) we have to make sure that for every xI the point (x, u(x)) lies in the domain of the function f on the right-hand side of the differential equation in (IVP). We must therefore restrict attention to functions uC(I) having graph y = u(x) contained in R, i.e., for which | u(x) − y 0 | ≤ h for every xI. In metric-space language this means that in order for (IVP) to make sense, our prospective solutions must lie in

$$\displaystyle{\overline{B} = \overline{B}(y_{0},h) =\{ u \in C(I): \|u - y_{0}\| \leq h\},}$$

the closed ball in C(I) of radius h, centered at the constant function y 0.

Step II. The integral equation. As we observed in Sect. 1.3, a real-valued function y defined on the interval I satisfies (IVP) if and only if it satisfies the integral equation

$$\displaystyle{ y(x) = y_{0} +\int _{ t=x_{0}}^{x}f(t,y(t))\,dt\qquad (x \in I). }$$
(IE)

The right-hand side of this equation makes sense for every \(u \in \overline{B}\), and defines an integral transformation T on C(I) by

$$\displaystyle{ (Tu)(x) = y_{0} +\int _{ t=x_{0}}^{x}f(t,u(t))\,dt\qquad (u \in \overline{B},x \in I). }$$
(IT)

By an argument entirely similar to the one used in Sect. 1.3 (pages 4 and 5) to prove that (IVP) is equivalent to the problem of finding a fixed point for (IT), we have

Lemma 3.8.

If \(u \in \overline{B}\) then Tu is differentiable on I and (Tu)′(x) = f(x,u(x)) for every x ∈ I.

In particular, T maps \(\overline{B}\) into C(I).

Step III. Insuring that \(T(\overline{B}) \subset \overline{B}\). To use the Banach Contraction-Mapping Principle we must at the very least insure that T maps \(\overline{B}\) into itself. For the moment, let’s continue to assume only that f is continuous on the rectangle R, and set

$$\displaystyle{M =\max \{ \vert f(x,y)\vert: (x,y) \in R\}.}$$

Fix this value of M for the rest of the proof. Although we’ll allow ourselves to shrink the horizontal dimension of the rectangle R, we won’t be changing the value of M.

Lemma 3.9.

For M as above: if we redefine the interval I to have radius r ≤ h∕M then \(T(\overline{B}) \subset \overline{B}\).

Proof.

For \(\vert x - x_{0}\vert \leq h/M\) we have for each \(u \in \overline{B}\):

$$\displaystyle{\vert Tu(x) - y_{0}\vert = \left \vert \int _{t=x_{0}}^{x}f(t,u(t))\,dt\right \vert \leq M\vert x - x_{ 0}\vert \leq Mh/M = h.}$$

Thus redefining I to have radius ≤ h∕M insures that \(\|Tu - y_{0}\| \leq h\) for each \(u \in \overline{B}\), i.e., that T maps \(\overline{B}\) into itself. □

Step IV. Strict contractivity. So far we’ve found how to shrink the original interval I so that the closed ball \(\overline{B}\) of radius h in C(I) is mapped into itself by the integral operator T. This ball, being a closed subset of the complete metric space C(I), is itself complete in the metric inherited from C(I). However to apply Banach’s Principle we need to know that T is a strict contraction on \(\overline{B}\). For this we’ll assume that the function f, in addition to being continuous on the rectangle R, is also differentiable there with respect to its second variable, and that this partial derivative (call it \(f_{2}\)) is continuous on R.

Our goal now is to show that T is a strict contraction mapping on \(\overline{B}\) for some positive r ≤ h∕M. Then Banach’s Contraction-Mapping Principle will guarantee a fixed point for T in \(\overline{B}\), hence a unique solution therein to the integral equation (IE), and therefore to the initial-value problem (IVP) on the interval \(I = [x_{0} - r,x_{0} + r]\). Once done we’ll have proved

Theorem 3.10 (The Picard–Lindelöf Theorem).

Suppose \((x_{0},y_{0}) \in \mathbb{R}^{2}\) , U is an open subset of \(\mathbb{R}^{2}\) that contains \((x_{0},y_{0})\) , and f is a real-valued function that is continuous on U and has thereon a continuous partial derivative with respect to the second variable. Then the initial-value problem (IVP) has a unique solution on some nontrivial interval centered at x 0.

Proof.

By the work above we may choose a compact rectangle R = I × H in U, centered at \((x_{0},y_{0})\), such that \(T(\overline{B}) \subset \overline{B}\) whenever the length of I is sufficiently small. It remains to see how much further we must shrink I in order to achieve strict contractivity for T on \(\overline{B}\). To this end let \(M':=\max \{ \vert f_{2}(x,y)\vert: (x,y) \in R\},\) where the compactness of R and the continuity of f 2 on R guarantee that the maximum exists. Note first that if y 1 and y 2 belong to the interval H with \(y_{1} \leq y_{2}\) then the Mean-Value Theorem of differential calculus guarantees for each xI that

$$\displaystyle{ \vert f(x,y_{2}) - f(x,y_{1})\vert = \vert f_{2}(x,\eta )(y_{2} - y_{1})\vert \leq M'\vert y_{2} - y_{1}\vert }$$
(3.3)

where on the right-hand side of the equality, η lies between y 1 and y 2. Thus if u and v are functions in \(\overline{B}\) and xI, we have upon letting J(x) denote the closed interval between x and x 0:

$$\displaystyle\begin{array}{rcl} \vert Tu(x) - Tv(x)\vert & =& \left \vert \int _{J(x)}[f(t,u(t)) - f(t,v(t))]\,dt\right \vert \leq \int _{J(x)}\vert f(t,u(t)) - f(t,v(t))\vert \,dt {}\\ & \leq & M'\int _{J(x)}\vert u(t) - v(t)\vert \,dt \leq M'\|u - v\| \cdot \mbox{ length of }J(x) {}\\ & =& M'\|u - v\| \cdot \vert x - x_{0}\vert \leq M'r\|u - v\| {}\\ \end{array}$$

where the second inequality follows from estimate (3.3). Thus

$$\displaystyle{\|Tu - Tv\| \leq M'r\|u - v\|\qquad (u,v \in \overline{B}),}$$

so we can insure that T is a strictly contractive self-map of \(\overline{B}\) simply by demanding that, in addition to the restriction r ≤ h∕M already placed on the radius of I, we insure that r be < 1∕M′. □
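For the concrete IVP y′ = y, y(0) = 1 (compare Exercise 3.2), Picard iteration can be carried out numerically: represent u on a grid over [0, 1∕2] and apply a trapezoid-rule approximation of the operator T of (IT). On this interval r M′ = 1∕2, so each sweep at least halves the distance to the fixed point. The grid size and sweep count below are choices of ours, made only for illustration; the exact solution is \(e^{x}\).

```python
import math

def picard_step(u, xs, y0=1.0):
    """One application of (Tu)(x) = y0 + integral_0^x u(t) dt for the
    IVP y' = y (so f(t, y) = y), approximated on the grid xs by the
    trapezoid rule."""
    Tu = [y0]
    for k in range(1, len(xs)):
        h = xs[k] - xs[k - 1]
        Tu.append(Tu[-1] + 0.5 * h * (u[k - 1] + u[k]))
    return Tu

# Grid on [0, 1/2]; initial function u_0 == 1, as in Exercise 3.2.
xs = [k / 1000 * 0.5 for k in range(1001)]
u = [1.0] * len(xs)
for _ in range(30):
    u = picard_step(u, xs)
```

After 30 sweeps the residual of the contraction is below \(2^{-30}\), so the remaining discrepancy from \(e^{x}\) is essentially the trapezoid-rule discretization error.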

Note that the proof given above will still work if the differentiability of f in the second variable is replaced by a “Lipschitz condition”

$$\displaystyle{\vert f(x,y_{1}) - f(x,y_{2})\vert \leq M'\vert y_{2} - y_{1}\vert \qquad ((x,y_{1}),(x,y_{2}) \in R).}$$

For initial-value problems, the interval of existence/uniqueness promised us by Banach’s Principle could be very small (see Exercise 3.2 for an example of this). There is, however, always a maximal such interval, and this interval has the property that the solution’s graph over this interval continues out to the boundary of the region on which the function f is defined and satisfies the Picard–Lindelöf hypotheses. For details see, e.g., [93, Sect. 2.4].

As an illustration of this phenomenon, consider the simple initial-value problem \(y' = a(1 + y^{2}),y(0) = 0\), where a > 0. One checks easily that \(y =\tan (ax)\) is a solution for which the maximal interval of existence is \((- \frac{\pi }{2a}, \frac{\pi } {2a})\), and a separation-of-variables argument shows that this is the only solution. Thus, even though the right-hand side \(f(x,y) = a(1 + y^{2})\) of this IVP’s differential equation is infinitely differentiable (even real-analytic) on the entire plane, the solution exists only on a finite interval, which for large a is very small.

Conclusion: In nonlinear situations, singularities can arise “unexpectedly.”

Notes

Banach’s doctoral dissertation. This is [5]; the Contraction-Mapping Principle is Theorem 6 on page 160 of that paper.

Stochastic matrices. The proof that every positive stochastic matrix induces a strict contraction on its standard simplex (Theorem 3.5) is from [66], where the result is attributed to Krylov and Bogoliubov. The same proof is in [20, Sect. 4, pp. 578–9].

We mentioned that the “power iteration” method of Corollary 3.6 works in more generality. For more on this, see, e.g., [117, Lecture 27].

For the Google matrix G, revisited in Sect. 3.2, there is still the issue of its enormous size. A preliminary discussion of how to handle this can be found in [17].

Initial-value problems. The Picard–Lindelöf Theorem originates in Lindelöf’s 1894 paper [70], in which he generalizes earlier work of Picard. In our special case the iteration associated with Banach’s Principle is often called “Picard Iteration.”

Higher orders, higher dimensions. The restriction of our discussion of initial-value problems to first order differential equations is not as severe as it seems. Consider, for example, the second order problem for an open interval I containing the point x 0:

$$\displaystyle{y'' = f(x,y,y'),\quad y(x_{0}) = y_{0},\qquad y'(x_{0}) = y_{1}\qquad (x \in I).}$$

This problem can be rewritten as: \(Y ' = F(x,Y ),\ Y (x_{0}) = Y _{0}\) for xI, where Y = (y, y′) is a function taking I into \(\mathbb{R}^{2}\), \(Y _{0} = (y_{0},y_{1})\) is a vector in \(\mathbb{R}^{2}\) (now thought of as a space of row vectors), and \(F(x,Y ) = (y',f(x,y,y'))\) maps the original domain of f (a subset of \(\mathbb{R}^{3}\)) into \(\mathbb{R}^{2}\).

It’s not difficult to check that the proof given above for our original “scalar-valued” IVP works almost verbatim in the new setting, with the absolute-value norm on the real line replaced in the higher dimensions by the Euclidean one, thus producing a unique solution for the second order IVP. Of course the idea generalizes readily to initial-value problems of order larger than 2.
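The reduction is easy to exercise numerically. The sketch below (an illustration; it uses Euler’s method on the reduced system rather than Picard iteration, and the test problem y″ = −y, y(0) = 0, y′(0) = 1 with exact solution y = sin x is our own choice) rewrites the second order problem as Y′ = F(x, Y) with Y = (y, y′):

```python
import math

# y'' = -y, y(0) = 0, y'(0) = 1, rewritten as Y' = F(x, Y), Y = (y, y').
def F(x, Y):
    y, yp = Y
    return (yp, -y)   # (y', f(x, y, y')) with f(x, y, y') = -y

# Euler's method on the first-order system (illustration only).
h, n = 1e-4, 5000     # integrate out to x = 0.5
Y = (0.0, 1.0)        # Y_0 = (y_0, y_1)
for k in range(n):
    dy, dyp = F(k * h, Y)
    Y = (Y[0] + h * dy, Y[1] + h * dyp)
```

At x = 0.5 the two components of Y track sin x and cos x, as the reduction predicts.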

Newton’s Method again. In a similar vein, our analysis of Newton’s Method can be generalized to higher dimensions. Suppose the function f maps some open subset G of \(\mathbb{R}^{N}\) into \(\mathbb{R}^{N}\), and that f(p) = 0 for some point pG. If we assume that all first and second order partial derivatives of the components of f are continuous, and that the derivative f′, which is now a linear transformation on \(\mathbb{R}^{N}\), is nonsingular at every point of G, then, just as in the single-variable case, we can form the “Newton function” \(F(x) = x - f'(x)^{-1}\,f(x)\), where on the right-hand side we see the inverse of the linear transformation f′(x) acting on the vector f(x). A bit more work than before shows that, when restricted to a suitable closed rectangle centered at p, the function F is a strict contraction, so for every point in that rectangle the Newton iteration converges to p.
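A minimal numerical sketch of this higher-dimensional iteration, with a hypothetical two-variable system of our own choosing (the circle x² + y² = 1 intersected with the line y = x, whose root in the first quadrant is \((1/\sqrt{2},\,1/\sqrt{2})\)); the 2 × 2 Jacobian is inverted via Cramer’s rule:

```python
def newton2d(f, jac, x0, n_steps=50):
    """Newton's method in R^2: x_{n+1} = x_n - Jf(x_n)^{-1} f(x_n),
    with the 2x2 Jacobian inverted explicitly."""
    x, y = x0
    for _ in range(n_steps):
        (a, b), (c, d) = jac(x, y)
        det = a * d - b * c
        fx, fy = f(x, y)
        # Solve J * delta = f(x, y) by Cramer's rule, then subtract delta.
        x -= (d * fx - b * fy) / det
        y -= (a * fy - c * fx) / det
    return x, y

# Hypothetical system: x^2 + y^2 = 1 and y = x; root (1/sqrt 2, 1/sqrt 2).
f = lambda x, y: (x * x + y * y - 1, y - x)
jac = lambda x, y: ((2 * x, 2 * y), (-1, 1))
root = newton2d(f, jac, (1.0, 0.5))
```

For larger N one would replace the explicit inverse by a linear solve of \(f'(x)\,\delta = f(x)\) at each step; the structure of the iteration is otherwise unchanged.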