
In computational physics, roots of a function, i.e. solutions of an equation like

$$\begin{aligned} f(x_{1}\cdots x_{N})=0 \end{aligned}$$
(6.1)

very often have to be determined. A related problem is the search for local extrema (Fig. 6.1)

$$\begin{aligned} \max f(x_{1}\cdots x_{N})\quad\min f(x_{1}\cdots x_{N}) \end{aligned}$$
(6.2)

which for a smooth function are solutions of the equations

$$\begin{aligned} \frac{\partial f(x_{1}\cdots x_{N})}{\partial x_{i}}=0, \quad i=1\dots N. \end{aligned}$$
(6.3)

In one dimension, bisection is a very robust but rather inefficient root finding method. If a good starting point close to the root is available and the function is smooth enough, the Newton-Raphson method converges much faster. Special strategies are necessary to find roots of not so well behaved functions or higher order roots. The combination of bisection and interpolation, as in Dekker's and Brent's methods, provides generally applicable algorithms. In higher dimensions, calculation of the Jacobian matrix is not always possible and Quasi-Newton methods are a good choice. Whereas local extrema can, at least in principle, be found as the roots of the gradient, direct optimization can be more efficient. In one dimension the ternary search method or Brent's more efficient golden section search method can be used. In higher dimensions the popular class of direction set search methods includes the methods of steepest descent and conjugate gradients, the Newton-Raphson method and, if calculation of the full Hessian matrix is too expensive, the Quasi-Newton methods.

Fig. 6.1 Roots and local extrema of a function

1 Root Finding

If there is exactly one root in the interval \(a_{0}<x<b_{0}\), then one of the following methods can be used to locate its position with sufficient accuracy. If there are multiple roots, these methods will find one of them, and special care has to be taken to locate the other roots.

1.1 Bisection

The simplest method [45] to solve

$$\begin{aligned} f(x)=0 \end{aligned}$$
(6.4)

uses the following algorithm (Fig. 6.2):

  1. Determine an interval \([a_{0},b_{0}]\), which contains a sign change of f(x). If no such interval can be found then f(x) does not have any zero crossings.

  2. Divide the interval into \([a_{0},a_{0}+\frac{b_{0}-a_{0}}{2}]\) and \([a_{0}+\frac{b_{0}-a_{0}}{2},b_{0}]\) and choose the interval \([a_{1},b_{1}]\) in which f(x) changes its sign.

  3. Repeat until the width \(b_{n}-a_{n}<\varepsilon\) is small enough.

Fig. 6.2 Root finding by bisection

The bisection method needs two starting points which bracket a sign change of the function. It converges but only slowly since each step reduces the uncertainty by a factor of 2.
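As an illustration, a minimal Python sketch of this bisection loop could look as follows (the function name, tolerance handling and the example call are illustrative, not part of the text):

```python
def bisect(f, a, b, eps=1e-12):
    """Locate a root of f in [a, b]; f(a) and f(b) must differ in sign."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > eps:
        m = a + 0.5 * (b - a)      # midpoint of the current interval
        fm = f(m)
        if fa * fm <= 0:           # sign change in the left half
            b, fb = m, fm
        else:                      # sign change in the right half
            a, fa = m, fm
    return 0.5 * (a + b)

# example: root of x**2 - 2 in [1, 2]
print(bisect(lambda x: x**2 - 2, 1.0, 2.0))
```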

1.2 Regula Falsi (False Position) Method

The regula falsi method [92] (Fig. 6.3) is similar to the bisection method [45]. However, linear interpolation is used to divide the interval \([x_{r},a_{r}]\) with \(f(x_{r})f(a_{r})<0\). The root of the linear polynomial

$$\begin{aligned} p(x)=f(x_{r})+(x-x_{r})\frac{f(a_{r})-f(x_{r})}{a_{r}-x_{r}} \end{aligned}$$
(6.5)

is given by

$$\begin{aligned} \xi_{r}=x_{r}-f(x_{r})\frac{a_{r}-x_{r}}{f(a_{r})-f(x_{r})}= \frac{a_{r}f(x_{r})-x_{r}f(a_{r})}{f(x_{r})-f(a_{r})} \end{aligned}$$
(6.6)

which is inside the interval \([x_{r},a_{r}]\). Choose the sub-interval which contains the sign change:

$$\begin{aligned} \begin{aligned}[c] &f(x_{r})f(\xi_{r})<0\rightarrow[x_{r+1},a_{r+1}]=[x_{r}, \xi _{r}] \\ &f(x_{r})f(\xi_{r})>0\rightarrow[x_{r+1},a_{r+1}]=[ \xi_{r},a_{r}]. \end{aligned} \end{aligned}$$
(6.7)

Then \(\xi_{r}\) provides a series of approximations with increasing precision to the root of f(x)=0.
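A hedged Python sketch of this iteration (function name and iteration limit are illustrative); the bracketing points are updated according to (6.7):

```python
def regula_falsi(f, x, a, eps=1e-12, maxit=200):
    """Regula falsi: f(x) and f(a) must have opposite signs."""
    fx, fa = f(x), f(a)
    xi = x
    for _ in range(maxit):
        xi = (a * fx - x * fa) / (fx - fa)   # root of the linear interpolant (6.6)
        fxi = f(xi)
        if abs(fxi) < eps:
            break
        if fx * fxi < 0:          # sign change in [x, xi]: replace a
            a, fa = xi, fxi
        else:                     # sign change in [xi, a]: replace x
            x, fx = xi, fxi
    return xi

# example: root of x**2 - 2 with x=1, a=2
print(regula_falsi(lambda x: x**2 - 2, 1.0, 2.0))
```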

Fig. 6.3 Regula falsi method

1.3 Newton-Raphson Method

Consider a function which is at least two times differentiable around the root ξ. Taylor series expansion around a point \(x_{0}\) in the vicinity of the root

$$\begin{aligned} f(x)=f(x_{0})+(x-x_{0})f'(x_{0})+ \frac {1}{2}(x-x_{0})^{2}f''(x_{0})+\cdots \end{aligned}$$
(6.8)

gives for x=ξ

$$\begin{aligned} 0=f(x_{0})+(\xi-x_{0})f'(x_{0})+ \frac{1}{2}(\xi -x_{0})^{2}f''(x_{0})+\cdots. \end{aligned}$$
(6.9)

Truncation of the series and solving for ξ gives the first order Newton-Raphson [45, 252] method (Fig. 6.4)

$$\begin{aligned} x_{r+1}=x_{r}-\frac{f(x_{r})}{f'(x_{r})} \end{aligned}$$
(6.10)

and the second order Newton-Raphson method (Fig. 6.4)

$$\begin{aligned} x_{r+1}=x_{r}-\frac{f'(x_{r})\pm\sqrt {f'(x_{r})^{2}-2f(x_{r})f''(x_{r})}}{f''(x_{r})}. \end{aligned}$$
(6.11)
Fig. 6.4 Newton-Raphson method

The Newton-Raphson method converges fast if the starting point is close enough to the root. Analytic derivatives are needed. It may fail if two or more roots are close by.
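A minimal sketch of the first order iteration (6.10), assuming the analytic derivative is supplied as a second function (names and tolerances are illustrative):

```python
def newton(f, fprime, x0, eps=1e-12, maxit=50):
    """First order Newton-Raphson iteration (6.10)."""
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) < eps:
            break
        x = x - fx / fprime(x)    # Newton step
    return x

# example: root of x**2 - 2 starting from x0 = 2
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 2.0))
```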

1.4 Secant Method

Replacing the derivative in the first order Newton-Raphson method by a finite difference quotient gives the secant method [45] (Fig. 6.5), which has been known for thousands of years [196]

$$\begin{aligned} x_{r+1}=x_{r}-f(x_{r})\frac{x_{r}-x_{r-1}}{f(x_{r})-f(x_{r-1})}. \end{aligned}$$
(6.12)

Round-off errors can become important as \(|f(x_{r})-f(x_{r-1})|\) gets small. At the beginning choose a starting point \(x_{0}\) and determine

$$\begin{aligned} x_{1}=x_{0}-f(x_{0})\frac{2h}{f(x_{0}+h)-f(x_{0}-h)} \end{aligned}$$
(6.13)

using a symmetrical difference quotient.
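A sketch of the secant iteration with the symmetric starting step (6.13); step width, tolerance and iteration limit are illustrative choices:

```python
def secant(f, x0, h=1e-4, eps=1e-12, maxit=100):
    """Secant iteration (6.12); the first step uses the symmetric quotient (6.13)."""
    x1 = x0 - f(x0) * 2 * h / (f(x0 + h) - f(x0 - h))
    f0, f1 = f(x0), f(x1)
    for _ in range(maxit):
        if abs(f1) < eps or f1 == f0:    # stop when converged or the quotient degenerates
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)
    return x1

# example: root of x**2 - 2 starting from x0 = 2
print(secant(lambda x: x**2 - 2, 2.0))
```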

Fig. 6.5 Secant method

1.5 Interpolation

The secant method is also obtained by linear interpolation

$$\begin{aligned} p(x)=\frac{x-x_{r}}{x_{r-1}-x_{r}}f_{r-1}+\frac{x-x_{r-1}}{x_{r}-x_{r-1}}f_{r}. \end{aligned}$$
(6.14)

The root of the polynomial, \(p(x_{r+1})=0\), determines the next iterate \(x_{r+1}\)

$$\begin{aligned} x_{r+1}=\frac {1}{f_{r-1}-f_{r}}(x_{r}f_{r-1}-x_{r-1}f_{r})=x_{r}-f_{r} \frac{x_{r}-x_{r-1}}{f_{r}-f_{r-1}}. \end{aligned}$$
(6.15)

Quadratic interpolation of three function values is known as Muller’s method [177]. Newton’s form of the interpolating polynomial is

$$\begin{aligned} p(x)=f_{r}+(x-x_{r})f[x_{r},x_{r-1}]+(x-x_{r}) (x-x_{r-1})f[x_{r},x_{r-1},x_{r-2}] \end{aligned}$$
(6.16)

which can be rewritten

$$\begin{aligned} p(x) =& f_{r}+(x-x_{r})f[x_{r},x_{r-1}]+(x-x_{r})^{2}f[x_{r},x_{r-1},x_{r-2}] \\ &{} +(x_{r}-x_{r-1}) (x-x_{r})f[x_{r},x_{r-1},x_{r-2}] \\ =&f_{r}+(x-x_{r})^{2}f[x_{r},x_{r-1},x_{r-2}] \\ &{} +(x-x_{r}) \bigl(f[x_{r},x_{r-1}] +f[x_{r},x_{r-2}]-f[x_{r-1},x_{r-2}] \bigr) \\ =&f_{r}+A(x-x_{r})+B(x-x_{r})^{2} \end{aligned}$$
(6.17)

and has the roots

$$\begin{aligned} x_{r+1}=x_{r}-\frac{A}{2B}\pm\sqrt{\frac{A^{2}}{4B^{2}}- \frac{f_{r}}{B}}. \end{aligned}$$
(6.18)

To avoid numerical cancellation, this is rewritten

$$\begin{aligned} x_{r+1} =& x_{r}+\frac{1}{2B} \bigl(-A\pm \sqrt{A^{2}-4Bf_{r}} \bigr) \\ =& x_{r}+\frac{-2f_{r}}{A^{2}-(A^{2}-4Bf_{r})} \bigl(A\mp\sqrt {A^{2}-4Bf_{r}} \bigr) \\ =& x_{r}+\frac{-2f_{r}}{A\pm\sqrt{A^{2}-4Bf_{r}}}. \end{aligned}$$
(6.19)

The sign in the denominator is chosen such that \(x_{r+1}\) is the root closer to \(x_{r}\). The roots of the polynomial can become complex valued, therefore this method is also useful to find complex roots.
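A sketch of Muller's iteration in the cancellation-free form (6.19), using complex arithmetic so that complex roots are accessible (the ordering of the three points and the stopping criterion are illustrative):

```python
import cmath

def muller(f, x0, x1, x2, eps=1e-12, maxit=50):
    """Muller iteration (6.19); x2 plays the role of the latest point x_r."""
    for _ in range(maxit):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        d01 = (f1 - f0) / (x1 - x0)              # divided differences
        d12 = (f2 - f1) / (x2 - x1)
        d02 = (f2 - f0) / (x2 - x0)
        A = d12 + d02 - d01
        B = (d12 - d01) / (x2 - x0)              # f[x_r, x_{r-1}, x_{r-2}]
        s = cmath.sqrt(A * A - 4 * B * f2)
        den = A + s if abs(A + s) > abs(A - s) else A - s   # pick the root closer to x2
        x0, x1, x2 = x1, x2, x2 - 2 * f2 / den
        if abs(x2 - x1) < eps:
            break
    return x2

# example: converges to one of the complex roots of x**2 + 1
print(muller(lambda x: x * x + 1, 0.0, 1.0, 2.0))
```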

1.6 Inverse Interpolation

Complex values of \(x_{r}\) can be avoided by interpolation of the inverse function instead

$$\begin{aligned} x=f^{-1}(y). \end{aligned}$$
(6.20)

Using the two points \(x_{r},x_{r-1}\) the Lagrange method gives

$$\begin{aligned} p(y)=x_{r-1}\frac{y-f_{r}}{f_{r-1}-f_{r}}+x_{r}\frac{y-f_{r-1}}{f_{r}-f_{r-1}} \end{aligned}$$
(6.21)

and the next approximation of the root corresponds again to the secant method (6.12)

$$\begin{aligned} x_{r+1}=p(0)=\frac{x_{r-1}f_{r}-x_{r}f_{r-1}}{f_{r}-f_{r-1}}=x_{r}+\frac {(x_{r-1}-x_{r})}{f_{r}-f_{r-1}}f_{r}. \end{aligned}$$
(6.22)

Inverse quadratic interpolation needs three starting points \(x_{r},x_{r-1},x_{r-2}\) together with the function values \(f_{r},f_{r-1},f_{r-2}\) (Fig. 6.6). The inverse function \(x=f^{-1}(y)\) is interpolated with the Lagrange method

$$\begin{aligned} p(y) =& \frac {(y-f_{r-1})(y-f_{r})}{(f_{r-2}-f_{r-1})(f_{r-2}-f_{r})}x_{r-2}+\frac {(y-f_{r-2})(y-f_{r})}{(f_{r-1}-f_{r-2})(f_{r-1}-f_{r})}x_{r-1} \\ &{} +\frac{(y-f_{r-1})(y-f_{r-2})}{(f_{r}-f_{r-1})(f_{r}-f_{r-2})}x_{r}. \end{aligned}$$
(6.23)

For y=0 we find the next iterate

$$\begin{aligned} x_{r+1} =& p(0)=\frac {f_{r-1}f_{r}}{(f_{r-2}-f_{r-1})(f_{r-2}-f_{r})}x_{r-2}+\frac {f_{r-2}f_{r}}{(f_{r-1}-f_{r-2})(f_{r-1}-f_{r})}x_{r-1} \\ &{} +\frac{f_{r-1}f_{r-2}}{(f_{r}-f_{r-1})(f_{r}-f_{r-2})}x_{r}. \end{aligned}$$
(6.24)
Fig. 6.6 Root finding by interpolation of the inverse function

Inverse quadratic interpolation is only a good approximation if the interpolating parabola is single valued and hence a monotonic function in the range of \(f_{r},f_{r-1},f_{r-2}\). For the following discussion we assume that the three values of x are renamed such that \(x_{1}<x_{2}<x_{3}\).

Consider the limiting case (a) in Fig. 6.7 where the polynomial has a horizontal tangent at \(y=f_{3}\) and can be written as

$$\begin{aligned} p(y)=x_{3}+(x_{1}-x_{3})\frac{(y-f_{3})^{2}}{(f_{1}-f_{3})^{2}}. \end{aligned}$$
(6.25)

Its value at y=0 is

$$\begin{aligned} p(0) =&x_{3}+(x_{1}-x_{3})\frac {f_{3}^{2}}{(f_{1}-f_{3})^{2}} \\ =&x_{1}+(x_{3}-x_{1}) \biggl(1-\frac{f_{3}^{2}}{(f_{1}-f_{3})^{2}} \biggr). \end{aligned}$$
(6.26)

If \(f_{1}\) and \(f_{3}\) have different sign and \(|f_{1}|<|f_{3}|\) (Sect. 6.1.7.2) we find

$$\begin{aligned} 1-\frac{f_{3}^{2}}{(f_{1}-f_{3})^{2}}<\frac{3}{4}. \end{aligned}$$
(6.27)
Fig. 6.7 (Validity of inverse quadratic interpolation)  Inverse quadratic interpolation is only applicable if the interpolating polynomial p(y) is monotonic in the range of the interpolated function values \(f_{1}\dots f_{3}\). (a) and (b) show the limiting cases where the polynomial has a horizontal tangent at \(f_{1}\) or \(f_{3}\). (c) shows the case where the extremum of the parabola is inside the interpolation range and interpolation is not feasible

Brent [36] used this as a criterion for the applicability of the inverse quadratic interpolation. However, this does not include all possible cases where interpolation is applicable. Chandrupatla [54] gave a more general discussion. The limiting condition is that the polynomial p(y) has a horizontal tangent at one of the boundaries \(x_{1,3}\). The derivative values are

$$\begin{aligned} \frac{dp}{dy}(y=f_{1}) =& \frac {x_{2}(f_{1}-f_{3})}{(f_{2}-f_{1})(f_{2}-f_{3})}+\frac {x_{3}(f_{1}-f_{2})}{(f_{3}-f_{1})(f_{3}-f_{2})}+ \frac {x_{1}}{f_{1}-f_{2}}+\frac{x_{1}}{f_{1}-f_{3}} \\ =&\frac{(f_{2}-f_{1})}{(f_{3}-f_{1})(f_{3}-f_{2})} \biggl[\frac {x_{2}(f_{3}-f_{1})^{2}}{(f_{2}-f_{1})^{2}}-x_{3} \\ &{} -\frac {x_{1}(f_{3}-f_{1})^{2}-x_{1}(f_{2}-f_{1})^{2}}{(f_{2}-f_{1})^{2}} \biggr] \\ =& \frac{(f_{2}-f_{1})(x_{2}-x_{1})}{(f_{3}-f_{1})(f_{3}-f_{2})} \bigl[\varPhi ^{-2}-\xi^{-1} \bigr] \end{aligned}$$
(6.28)
$$\begin{aligned} \frac{dp}{dy}(y=f_{3}) =&\frac {x_{2}(f_{3}-f_{1})}{(f_{2}-f_{1})(f_{2}-f_{3})}+\frac {x_{1}(f_{3}-f_{2})}{(f_{1}-f_{2})(f_{1}-f_{3})}+ \frac {x_{3}}{f_{3}-f_{2}}+\frac{x_{3}}{f_{3}-f_{1}} \\ =&\frac{(f_{3}-f_{2})}{(f_{2}-f_{1})(f_{3}-f_{1})} \biggl[-\frac {x_{2}(f_{3}-f_{1})^{2}}{(f_{3}-f_{2})^{2}}+x_{3} \frac{(f_{3}-f_{1})^{2}}{(f_{3}-f_{2})^{2}} \\ &{} -x_{3}\frac {(f_{3}-f_{2})^{2}}{(f_{3}-f_{2})^{2}}+x_{1} \biggr] \\ =& \frac{(f_{3}-f_{2})(x_{3}-x_{2})}{(f_{2}-f_{1})(f_{3}-f_{1})} \biggl[ \biggl(\frac{1}{\varPhi-1} \biggr)^{2}- \frac{1}{1-\xi} \biggr] \end{aligned}$$
(6.29)

with [54]

$$\begin{aligned} &\xi=\frac{x_{2}-x_{1}}{x_{3}-x_{1}} \qquad \varPhi=\frac{f_{2}-f_{1}}{f_{3}-f_{1}} \end{aligned}$$
(6.30)
$$\begin{aligned} &\xi-1=\frac{x_{2}-x_{3}}{x_{3}-x_{1}} \qquad \varPhi-1=\frac {f_{2}-f_{3}}{f_{3}-f_{1}}. \end{aligned}$$
(6.31)

Since for a parabola either \(f_{1}<f_{2}<f_{3}\) or \(f_{1}>f_{2}>f_{3}\), the conditions for applicability of inverse interpolation finally become

$$\begin{aligned} \varPhi^{2}<\xi& \end{aligned}$$
(6.32)
$$\begin{aligned} 1-\xi>(1-\varPhi)^{2}& \end{aligned}$$
(6.33)

which can be combined into

$$\begin{aligned} 1-\sqrt{1-\xi}<|\varPhi|<\sqrt{\xi}. \end{aligned}$$
(6.34)

This method is usually used in combination with other methods (Sect. 6.1.7.2).
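A small helper function, as a sketch, which returns the inverse quadratic iterate (6.24) only if criterion (6.34) is fulfilled and otherwise signals that the caller should fall back to another step (the points are assumed to be ordered x1 < x2 < x3):

```python
from math import sqrt

def inverse_quadratic_step(x1, x2, x3, f1, f2, f3):
    """Inverse quadratic iterate (6.24), guarded by the criterion (6.34).
    Returns None if interpolation is not applicable."""
    xi = (x2 - x1) / (x3 - x1)            # relative positions (6.30)
    phi = (f2 - f1) / (f3 - f1)
    if not (1 - sqrt(1 - xi) < abs(phi) < sqrt(xi)):
        return None                       # caller should use e.g. bisection instead
    return (f2 * f3 / ((f1 - f2) * (f1 - f3)) * x1
            + f1 * f3 / ((f2 - f1) * (f2 - f3)) * x2
            + f1 * f2 / ((f3 - f1) * (f3 - f2)) * x3)
```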

1.7 Combined Methods

Bisection converges slowly. The interpolation methods converge faster but are less reliable. The combination of both gives methods which are reliable and converge faster than pure bisection.

1.7.1 Dekker’s Method

Dekker’s method [47, 70] combines bisection and the secant method. The root is bracketed by intervals \([c_{r},b_{r}]\) with decreasing width, where \(b_{r}\) is the best approximation to the root found so far and \(c_{r}\) is an earlier guess for which \(f(c_{r})\) and \(f(b_{r})\) have different sign. First an attempt is made to use linear interpolation between the points \((b_{r},f(b_{r}))\) and \((a_{r},f(a_{r}))\), where \(a_{r}\) is usually the preceding approximation \(a_{r}=b_{r-1}\) and is set to the second interval boundary \(a_{r}=c_{r-1}\) if the last iteration did not lower the function value (Fig. 6.8).

Fig. 6.8 Dekker's method

Starting from an initial interval \([x_{0},x_{1}]\) with \(\operatorname{sign}(f(x_{0}))\ne\operatorname{sign}(f(x_{1}))\) the method proceeds as follows [47]:

  • initialization

$$\begin{aligned} &f_{1}=f(x_{1})\quad f_{0}=f(x_{0}) \\ &\mbox{if }|f_{1}|<|f_{0}|\mbox{ then \{} \\ &b=x_{1}\quad c=a=x_{0} \\ &f_{b}=f_{1}\quad f_{c}=f_{a}=f_{0}\} \\ &\mbox{else }\{ \\ &b=x_{0}\quad c=a=x_{1} \\ &f_{b}=f_{0}\quad f_{c}=f_{a}=f_{1}\} \end{aligned}$$
  • iteration

$$ \begin{aligned}[b] &x_{s}=b-f_{b}\frac{b-a}{f_{b}-f_{a}} \\ &x_{m}=\frac{c+b}{2}. \end{aligned} $$

If \(x_{s}\) is very close to the last b then increase the distance to avoid too small steps. Otherwise choose \(x_{s}\) if it is between b and \(x_{m}\), else choose \(x_{m}\) (thus choosing the smaller interval)

$$\begin{aligned} x_{r}=\left \{ \begin{array}{l@{\quad}l} b+\delta\operatorname{sign}(c-b)&\mbox{if abs}(x_{s}-b)<\delta \\ x_{s}&\mbox{if }b+\delta<x_{s}<x_{m}\mbox{ or }b-\delta>x_{s}>x_{m}\\ x_{m}&\mbox{else}. \end{array} \right . \end{aligned}$$

Determine \(x_{k}\) as the latest of the previous iterates \(x_{0}\cdots x_{r-1}\) for which \(\operatorname{sign}(f(x_{k}))\ne\operatorname{sign}(f(x_{r}))\).

If the new function value is lower update the approximation to the root

$$\begin{aligned} \begin{aligned}[c] & f_{r}=f(x_{r}) \\ &\mbox{if }|f_{r}|<|f_{k}|\mbox{ then }\{ \\ &a=b\quad b=x_{r}\quad c=x_{k} \\ &f_{a}=f_{b}\quad f_{b}=f_{r}\quad f_{c}=f_{k}\} \end{aligned} \end{aligned}$$

otherwise keep the old approximation and update the second interval boundary

$$\begin{aligned} \begin{aligned}[c] &\mbox{if }|f_{r}|\ge|f_{k}|\mbox{ then }\{ \\ &b=x_{k}\quad a=c=x_{r} \\ &f_{b}=f_{k}\quad f_{a}=f_{c}=f_{r}\} \end{aligned} \end{aligned}$$

Repeat until \(|c-b|<\varepsilon\) or \(f_{r}=0\).

1.7.2 Brent’s Method

In certain cases Dekker’s method converges very slowly, making many small steps of the order of ε. Brent [36, 37, 47] introduced some modifications to reduce such problems and tried to speed up convergence by the use of inverse quadratic interpolation (Sect. 6.1.6). To avoid numerical problems the iterate (6.24) is written with the help of a quotient

$$\begin{aligned} x_{r+1} =& \frac{f_{b}f_{c}}{(f_{a}-f_{b})(f_{a}-f_{c})}a+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}b \\ & {} +\frac{f_{b}f_{a}}{(f_{c}-f_{b})(f_{c}-f_{a})}c \\ =&b+\frac{p}{q} \end{aligned}$$
(6.35)

with

$$\begin{aligned} & \begin{aligned}[b] p&=\frac{f_{b}}{f_{a}} \biggl((c-b)\frac{f_{a}}{f_{c}} \biggl(\frac {f_{a}}{f_{c}}- \frac{f_{b}}{f_{c}} \biggr)-(b-a) \biggl(\frac {f_{b}}{f_{c}}-1 \biggr) \biggr) \\ & =(c-b)\frac{f_{b}(f_{a}-f_{b})}{f_{c}^{2}}-(b-a)\frac {f_{b}(f_{b}-f_{c})}{f_{a}f_{c}} \\ &=\frac{af_{b}f_{c}(f_{b}-f_{c})+b [f_{a}f_{b}(f_{b}-f_{a})+f_{b}f_{c}(f_{c}-f_{b})]+cf_{a}f_{b}(f_{a}-f_{b})}{f_{a}f_{c}^{2}} \end{aligned} \end{aligned}$$
(6.36)
$$\begin{aligned} &q=- \biggl(\frac{f_{a}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{a}}-1 \biggr)=-\frac {(f_{a}-f_{c})(f_{b}-f_{c})(f_{b}-f_{a})}{f_{a}f_{c}^{2}}. \end{aligned}$$
(6.37)

If only two points are available, linear interpolation is used. The iterate (6.22) is then written as

$$\begin{aligned} x_{r+1}=b+\frac{(a-b)}{f_{b}-f_{a}}f_{b}=b+\frac{p}{q} \end{aligned}$$
(6.38)

with

$$\begin{aligned} p=(a-b)\frac{f_{b}}{f_{a}} \qquad q= \biggl(\frac{f_{b}}{f_{a}}-1 \biggr). \end{aligned}$$
(6.39)

The division is only performed if interpolation is appropriate and division by zero cannot happen. Brent's method is fast and robust at the same time and is often recommended by textbooks; it is also available in standard numerical libraries (a usage sketch follows the listing below). The algorithm is summarized in the following [209].

Start with an initial interval \([x_{0},x_{1}]\) with \(f(x_{0})f(x_{1})\le0\)

  • initialization

$$\begin{aligned} &a=x_{0}\quad b=x_{1}\quad c=a \\ &f_{a}=f(a)\quad f_{b}=f(b)\quad f_{c}=f_{a} \\ &e=d=b-a \end{aligned}$$
  • iteration

If c is a better approximation than b exchange values

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }|f_{c}|<|f_{b}|\mbox{ then }\{ \\ &a=b\quad b=c\quad c=a \\ &f_{a}=f_{b}\quad f_{b}=f_{c}\quad f_{c}=f_{a}\} \end{aligned} \end{aligned}$$

calculate midpoint relative to b

$$\begin{aligned} x_{m}=0.5(c-b) \end{aligned}$$

stop if accuracy is sufficient

$$\begin{aligned} \mbox{if }|x_{m}|<\varepsilon\mbox{ or }f_{b}=0\mbox{ then exit} \end{aligned}$$

use bisection if the previous step width e was too small or the last step did not improve

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }|e|<\varepsilon\mbox{ or }|f_{a}|\le|f_{b}|\mbox{ then \{} \\ &e=d=x_{m}\} \end{aligned} \end{aligned}$$

otherwise try interpolation

$$\begin{aligned} \begin{aligned}[b] &\mbox{else }\{ \\ &\mbox{if }a=c\,\mbox{ then }\biggl\{ \\ &p=2x_{m}\frac{f_{b}}{f_{a}}\quad q=\frac{f_{b}-f_{a}}{f_{a}}\biggr\} \\ &\mbox{else }\biggl\{ \\ &p=2x_{m}\frac{f_{b}(f_{a}-f_{b})}{f_{c}^{2}}-(b-a)\frac {f_{b}(f_{b}-f_{c})}{f_{a}f_{c}} \\ & q= \biggl(\frac{f_{a}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{a}}-1 \biggr)\biggr\} \end{aligned} \end{aligned}$$

make p a positive quantity

$$\begin{aligned} \mbox{if }p>0\mbox{ then }\{q=-q\}\mbox{ else }\{p=-p\} \end{aligned}$$

update previous step width

$$\begin{aligned} s=e\quad e=d \end{aligned}$$

use interpolation if applicable, otherwise use bisection

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }2p<3x_{m}q-|\varepsilon q|\mbox{ and }p<|0.5\, sq|\mbox{ then }\biggl\{ \\ &d=\frac{p}{q}\biggr\} \\ &\mbox{else }\{e=d=x_{m}\} \\ &a=b\quad f_{a}=f_{b} \\ &\mbox{if }|d|>\varepsilon\mbox{ then }\{ \\ &b=b+d\} \\ &\mbox{else }\bigl\{b=b+\varepsilon\operatorname{sign}(x_{m})\bigr\} \end{aligned} \end{aligned}$$

calculate new function value

$$\begin{aligned} f_{b}=f(b) \end{aligned}$$

be sure to bracket the root

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }\operatorname{sign}(f_{b})=\operatorname{sign}(f_{c})\mbox{ then }\{ \\ &c=a\quad f_{c}=f_{a} \\ &e=d=b-a\} \end{aligned} \end{aligned}$$
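For practical work a library implementation is usually preferable; SciPy, for example, provides Brent's method as scipy.optimize.brentq. A short usage sketch (assuming SciPy is installed):

```python
from scipy.optimize import brentq

# root of x**2 - 2 in the bracketing interval [1, 2]
root = brentq(lambda x: x**2 - 2, 1.0, 2.0, xtol=1e-12)
print(root)
```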

1.7.3 Chandrupatla’s Method

In 1997 Chandrupatla [54] published a method which tries to use inverse quadratic interpolation whenever possible according to (6.34). He calculates the relative position of the new iterate as (Fig. 6.9)

$$\begin{aligned} t =&\frac{x-c}{b-c} \\ =&\frac{1}{b-c} \biggl[\frac{f_{c}f_{b}}{(f_{a}-f_{b})(f_{a}-f_{c})}a+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}b \\ &{}+ \frac {f_{b}f_{a}}{(f_{c}-f_{b})(f_{c}-f_{a})}c-c \biggr] \\ =&\frac{a-c}{b-c}\frac{f_{c}}{f_{c}-f_{a}}\frac{f_{b}}{f_{b}-f_{a}}+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}. \end{aligned}$$
(6.40)
Fig. 6.9
figure 9

Chandrupatla’s method

The algorithm proceeds as follows:

Start with an initial interval \([x_{0},x_{1}]\) with \(f(x_{0})f(x_{1})\le0\)

  • initialization

$$\begin{aligned} \begin{aligned}[b] &b=x_{0}\quad a=c=x_{1} \\ &f_{b}=f(b)\quad f_{a}=f_{c}=f(c) \\ &t=0.5 \end{aligned} \end{aligned}$$
  • iteration

$$\begin{aligned} \begin{aligned}[b] &x_{t}=a+t(b-a) \\ &f_{t}=f(x_{t}) \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned}[b] &\mbox{if }\operatorname{sign}(f_{t})=\operatorname{sign}(f_{a})\ \{ \\ &c=a\quad f_{c}=f_{a} \\ &a=x_{t}\quad f_{a}=f_{t}\} \\ &\mbox{else}\ \{ \\ & c=b\quad b=a\quad a=x_{t} \\ &f_{c}=f_{b}\quad f_{b}=f_{a}\quad f_{a}=f_{t}\} \\ &x_{m}=a\quad f_{m}=f_{a} \\ &\mbox{if }\mathrm{abs}(f_{b})<\mathrm{abs}(f_{a})\ \{ \\ &x_{m}=b\quad f_{m}=f_{b}\} \\ &\mathit{tol}=2\varepsilon_{M}|x_{m}|+\varepsilon_{a} \\ &t_{l}=\frac{\mathit{tol}}{|b-c|} \\ & \mbox{if }t_{l}>0.5\mbox{ or }f_{m}=0\mbox{ exit} \\ &\xi=\frac{a-b}{c-b}\quad\varPhi=\frac{f_{a}-f_{b}}{f_{c}-f_{b}} \\ &\mbox{if }1-\sqrt{1-\xi}<\varPhi<\sqrt{\xi}\ \biggl\{ \\ &t=\frac{f_{a}}{f_{b}-f_{a}}\frac{f_{c}}{f_{b}-f_{c}}+\frac {c-a}{b-a}\frac{f_{a}}{f_{c}-f_{a}} \frac{f_{b}}{f_{c}-f_{b}}\biggr\} \\ &\mbox{else }\{t=0.5\} \\ &\mbox{if }t<t_{l}\ \{t=t_{l}\} \\ &\mbox{if }t>(1-t_{l})\ \{t=1-t_{l}\} \end{aligned} \end{aligned}$$

Chandrupatla’s method is more efficient than Dekker’s and Brent’s, especially for higher order roots (Figs. 6.10, 6.11, 6.12).
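The following Python sketch follows the pseudocode above rather literally (the iteration limit is an added safeguard and the sign test by comparison is an implementation choice):

```python
from math import sqrt

def chandrupatla(f, x0, x1, eps_m=1e-15, eps_a=1e-15, maxit=100):
    """Chandrupatla's root finder; f(x0) and f(x1) must have opposite signs."""
    b, a = x0, x1
    c = a
    fb, fa = f(b), f(a)
    fc = fa
    t = 0.5
    xm, fm = a, fa
    for _ in range(maxit):
        xt = a + t * (b - a)
        ft = f(xt)
        if (ft > 0) == (fa > 0):       # same sign: root lies between xt and b
            c, fc = a, fa
            a, fa = xt, ft
        else:                          # sign change between xt and a
            c, b = b, a
            fc, fb = fb, fa
            a, fa = xt, ft
        xm, fm = (b, fb) if abs(fb) < abs(fa) else (a, fa)
        tol = 2 * eps_m * abs(xm) + eps_a
        tl = tol / abs(b - c)
        if tl > 0.5 or fm == 0.0:
            break
        xi = (a - b) / (c - b)
        phi = (fa - fb) / (fc - fb)
        if 1 - sqrt(1 - xi) < phi < sqrt(xi):     # inverse quadratic interpolation
            t = (fa / (fb - fa) * fc / (fb - fc)
                 + (c - a) / (b - a) * fa / (fc - fa) * fb / (fc - fb))
        else:                                     # bisection
            t = 0.5
        t = min(max(t, tl), 1.0 - tl)             # keep xt away from the endpoints
    return xm

# example: root of x**2 - 2 with the bracketing interval [1, 2]
print(chandrupatla(lambda x: x**2 - 2, 1.0, 2.0))
```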

Fig. 6.10 (Comparison of different solvers)  The root of the equation \(f(x)=x^{2}-2\) is determined with different methods: Newton-Raphson (a, squares), Chandrupatla (b, circles), Brent (c, triangle up), Dekker (d, diamonds), regula falsi (e, stars), pure bisection (f, dots). Starting values are \(x_{1}=-1\), \(x_{2}=2\). The absolute error is shown as a function of the number of iterations. For \(x_{1}=-1\), the Newton-Raphson method converges to \(-\sqrt{2}\)

Fig. 6.11 (Comparison of different solvers for a third order root)  The root of the equation \(f(x)=(x-1)^{3}\) is determined with different methods: Newton-Raphson (a), Chandrupatla (b), Brent (c), Dekker (d), regula falsi (e), pure bisection (f). Starting values are \(x_{1}=0\), \(x_{2}=1.8\). The absolute error is shown as a function of the number of iterations

Fig. 6.12 (Comparison of different solvers for a high order root)  The root of the equation \(f(x)=x^{25}\) is determined with different methods: Newton-Raphson (a), Chandrupatla (b, circles), Brent (c), Dekker (d), regula falsi (e), pure bisection (f, dots). Starting values are \(x_{1}=-1\), \(x_{2}=2\). The absolute error is shown as a function of the number of iterations

1.8 Multidimensional Root Finding

The Newton-Raphson method can be easily generalized for functions of more than one variable. We search for the solution of a system of n nonlinear equations in n variables \(x_{i}\)

$$\begin{aligned} \mathbf {f}(\mathbf {x})=\left ( \begin{array}{c} f_{1}(x_{1}\cdots x_{n})\\ \vdots\\ f_{n}(x_{1}\cdots x_{n}) \end{array} \right )=0. \end{aligned}$$
(6.41)

The first order Newton-Raphson method results from linearization of

$$\begin{aligned} 0=\mathbf {f}(\mathbf {x})=\mathbf {f}\bigl(\mathbf {x}^{0}\bigr)+J\bigl( \mathbf {x}^{0}\bigr) \bigl(\mathbf {x}-\mathbf {x}^{0}\bigr)+\cdots \end{aligned}$$
(6.42)

with the Jacobian matrix

$$\begin{aligned} J=\left ( \begin{array}{c@{\quad}c@{\quad}c} \frac{\partial f_{1}}{\partial x_{1}} & \cdots& \frac{\partial f_{1}}{\partial x_{n}}\\ \vdots& \ddots& \vdots\\ \frac{\partial f_{n}}{\partial x_{1}} & \cdots& \frac{\partial f_{n}}{\partial x_{n}} \end{array} \right ). \end{aligned}$$
(6.43)

If the Jacobian matrix is not singular the equation

$$\begin{aligned} 0=\mathbf {f}\bigl(\mathbf {x}^{0}\bigr)+J\bigl(\mathbf {x}^{0}\bigr) \bigl(\mathbf {x}-\mathbf {x}^{0}\bigr) \end{aligned}$$
(6.44)

can be solved by

$$\begin{aligned} \mathbf {x}=\mathbf {x}^{0}- \bigl(J\bigl(\mathbf {x}^{0}\bigr) \bigr)^{-1}\mathbf {f}\bigl(\mathbf {x}^{0}\bigr). \end{aligned}$$
(6.45)

This can be repeated iteratively

$$\begin{aligned} \mathbf {x}^{(r+1)}=\mathbf {x}^{(r)}-\bigl(J\bigl(\mathbf {x}^{(r)} \bigr)\bigr)^{-1}\mathbf {f}\bigl(\mathbf {x}^{(r)}\bigr). \end{aligned}$$
(6.46)
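A sketch of iteration (6.46); instead of inverting the Jacobian, the linear system (6.44) is solved in every step. The example system and the starting point are illustrative:

```python
import numpy as np

def newton_system(f, jac, x0, eps=1e-12, maxit=50):
    """Multidimensional Newton-Raphson iteration (6.46)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) < eps:
            break
        x = x + np.linalg.solve(jac(x), -fx)   # solve J dx = -f and step
    return x

# example: intersection of the circle x^2 + y^2 = 4 with the line y = x
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [-1.0, 1.0]])
print(newton_system(f, jac, [1.0, 0.5]))
```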

1.9 Quasi-Newton Methods

Calculation of the Jacobian matrix can be very time consuming. Quasi-Newton methods use instead an approximation to the Jacobian which is updated during each iteration. Defining the differences

$$\begin{aligned} \mathbf {d}^{(r)}=\mathbf {x}^{(r+1)}-\mathbf {x}^{(r)}& \end{aligned}$$
(6.47)
$$\begin{aligned} \mathbf {y}^{(r)}=\mathbf {f}\bigl(\mathbf {x}^{(r+1)}\bigr)-\mathbf {f}\bigl( \mathbf {x}^{(r)}\bigr)& \end{aligned}$$
(6.48)

we obtain from the truncated Taylor series

$$\begin{aligned} \mathbf {f}\bigl(\mathbf {x}^{(r+1)}\bigr)=\mathbf {f}\bigl(\mathbf {x}^{(r)} \bigr)+J\bigl(\mathbf {x}^{(r)}\bigr) \bigl(\mathbf {x}^{(r+1)}- \mathbf {x}^{(r)}\bigr) \end{aligned}$$
(6.49)

the so-called Quasi-Newton or secant condition

$$\begin{aligned} \mathbf {y}^{(r)}=J\bigl(\mathbf {x}^{(r)}\bigr)\mathbf {d}^{(r)}. \end{aligned}$$
(6.50)

We attempt to construct a family of successive approximation matrices \(J_{r}\) so that, if J were a constant, the procedure would become consistent with the Quasi-Newton condition. Then for the new update \(J_{r+1}\) we have

$$\begin{aligned} J_{r+1}\mathbf {d}^{(r)}=\mathbf {y}^{(r)}. \end{aligned}$$
(6.51)

Since \(\mathbf{d}^{(r)},\mathbf{y}^{(r)}\) are already known, these are only n equations for the \(n^{2}\) elements of \(J_{r+1}\). To specify \(J_{r+1}\) uniquely, additional conditions are required. For instance, it is reasonable to assume that

$$\begin{aligned} J_{r+1}\mathbf {u}=J_{r}\mathbf {u}\quad\mbox{for all }\mathbf {u}\perp \mathbf {d}^{(r)}. \end{aligned}$$
(6.52)

Then \(J_{r+1}\) differs from \(J_{r}\) only by a rank one updating matrix

$$\begin{aligned} J_{r+1}=J_{r}+\mathbf {u}\,\mathbf {d}^{(r)T}. \end{aligned}$$
(6.53)

From the secant condition we obtain

$$\begin{aligned} J_{r+1}\mathbf {d}^{(r)}=J_{r}\mathbf {d}^{(r)}+ \mathbf {u}\bigl(\mathbf {d}^{(r)T}\mathbf {d}^{(r)}\bigr)= \mathbf {y}^{(r)} \end{aligned}$$
(6.54)

hence

$$\begin{aligned} \mathbf {u}=\frac{1}{|d^{(r)}|^{2}} \bigl(\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} \bigr). \end{aligned}$$
(6.55)

This gives Broyden’s update formula [42]

$$\begin{aligned} J_{r+1}=J_{r}+\frac{1}{|d^{(r)}|^{2}} \bigl(\mathbf {y}^{(r)}-J_{r} \mathbf {d}^{(r)} \bigr)\mathbf {d}^{(r)T}. \end{aligned}$$
(6.56)

To update the inverse Jacobian matrix, the Sherman-Morrison formula [236]

$$\begin{aligned} \bigl(A+\mathbf {u}\mathbf {v}^{T}\bigr)^{-1}=A^{-1}- \frac{A^{-1}\mathbf {u}\mathbf {v}^{T}A^{-1}}{1+\mathbf {v}^{T}A^{-1}\mathbf {u}} \end{aligned}$$
(6.57)

can be applied to have

$$\begin{aligned} J_{r+1}^{-1} =&J_{r}^{-1}-\frac{J_{r}^{-1}\frac{1}{|d^{(r)}|^{2}} (\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} )\mathbf {d}^{(r)T}J_{r}^{-1}}{1+\frac{1}{|d^{(r)}|^{2}}\mathbf {d}^{(r)T}J_{r}^{-1} (\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} )} \\ =&J_{r}^{-1}-\frac{ (J_{r}^{-1}\mathbf {y}^{(r)}-\mathbf {d}^{(r)} )\mathbf {d}^{(r)T}J_{r}^{-1}}{\mathbf {d}^{(r)T}J_{r}^{-1}\mathbf {y}^{(r)}}. \end{aligned}$$
(6.58)
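A sketch of Broyden's method which iterates the inverse Jacobian with the Sherman-Morrison update (6.58). Starting from the identity matrix as initial inverse Jacobian is an assumption of this sketch; in practice a line search or damping of the step may be necessary:

```python
import numpy as np

def broyden(f, x0, eps=1e-10, maxit=200):
    """Quasi-Newton (Broyden) iteration with the inverse Jacobian update (6.58)."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    B = np.eye(len(x))                 # approximate inverse Jacobian (assumption)
    for _ in range(maxit):
        if np.linalg.norm(fx) < eps:
            break
        d = -B @ fx                    # quasi-Newton step
        x_new = x + d
        f_new = f(x_new)
        y = f_new - fx
        By = B @ y
        # Sherman-Morrison update of the inverse Jacobian, cf. (6.58)
        B = B + np.outer(d - By, d @ B) / (d @ By)
        x, fx = x_new, f_new
    return x
```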

2 Function Minimization

Minimization or maximization of a function is a fundamental task in numerical mathematics and is closely related to root finding. If the function f(x) is continuously differentiable, then at the extremal points the derivative is zero

$$\begin{aligned} \frac{\mathrm {d}f}{\mathrm {d}x}=0. \end{aligned}$$
(6.59)

Hence, in principle root finding methods can be applied to locate local extrema of a function. However, in some cases the derivative cannot be easily calculated or the function is not even differentiable. Then derivative-free methods similar to bisection for root finding have to be used.

2.1 The Ternary Search Method

Ternary search is a simple method to determine the minimum of a unimodal function f(x). Initially we have to find an interval \([a_{0},b_{0}]\) which is certain to contain the minimum. Then the interval is divided into three equal parts \([a_{0},c_{0}]\), \([c_{0},d_{0}]\), \([d_{0},b_{0}]\) and either the first or the last of the three intervals is excluded (Fig. 6.13). The procedure is repeated with the remaining interval \([a_{1},b_{1}]=[a_{0},d_{0}]\) or \([a_{1},b_{1}]=[c_{0},b_{0}]\).

Fig. 6.13 Ternary search method

Each step needs two function evaluations and reduces the interval width by a factor of 2/3 until the maximum possible precision is obtained. This precision can be estimated by considering a differentiable function which can be expanded around the minimum \(x_{0}\) as

$$\begin{aligned} f(x)=f(x_{0})+\frac{(x-x_{0})^{2}}{2}f''(x_{0})+ \cdots. \end{aligned}$$
(6.60)

Numerically calculated function values f(x) and \(f(x_{0})\) only differ if

$$\begin{aligned} \frac{(x-x_{0})^{2}}{2}f''(x_{0})> \varepsilon_{M}f(x_{0}) \end{aligned}$$
(6.61)

which limits the possible numerical accuracy to

$$\begin{aligned} \varepsilon(x_{0})=\mbox{min}|x-x_{0}|=\sqrt{ \frac {2f(x_{0})}{f''(x_{0})}\varepsilon_{M}} \end{aligned}$$
(6.62)

and for reasonably well behaved functions (Fig. 6.14) we have the rule of thumb [207]

$$\begin{aligned} \varepsilon(x_{0})\approx\sqrt{\varepsilon_{M}}. \end{aligned}$$
(6.63)
Fig. 6.14 (Ternary search method)  The minimum of the function \(f(x)=1+0.01x^{2}+0.1x^{4}\) is determined with the ternary search method. Each iteration needs two function evaluations. After 50 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined with higher precision than \(\sqrt{\varepsilon_{M}}\approx10^{-8}\) (6.63)

However, it may be impossible to reach even this precision, if the quadratic term of the Taylor series vanishes (Fig. 6.15).

Fig. 6.15 (Ternary search method for a higher order minimum)  The minimum of the function \(f(x)=1+0.1x^{4}\) is determined with the ternary search method. Each iteration needs two function evaluations. After 30 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined with higher precision than \(\sqrt[4]{\varepsilon_{M}}\approx10^{-4}\)

The algorithm can be formulated as follows:

  • iteration

$$\begin{aligned} \begin{aligned}[c] &\mbox{if }(b-a)<\delta\mbox{ then exit} \\ &c=a+\frac{1}{3}(b-a)\quad d=a+\frac{2}{3}(b-a) \\ &f_{c}=f(c)\quad f_{d}=f(d) \\ &\mbox{if }f_{c}<f_{d}\mbox{ then }b=d\mbox{ else }a=c \end{aligned} \end{aligned}$$
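A compact Python sketch of this loop (tolerance and the returned midpoint are illustrative choices):

```python
def ternary_search(f, a, b, delta=1e-8):
    """Ternary search for the minimum of a unimodal function on [a, b]."""
    while b - a > delta:
        c = a + (b - a) / 3
        d = a + 2 * (b - a) / 3
        if f(c) < f(d):
            b = d          # the minimum cannot lie in [d, b]
        else:
            a = c          # the minimum cannot lie in [a, c]
    return 0.5 * (a + b)

# example: minimum of 1 + 0.01 x**2 + 0.1 x**4 (cf. Fig. 6.14)
print(ternary_search(lambda x: 1 + 0.01 * x**2 + 0.1 * x**4, -1.0, 1.0))
```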

2.2 The Golden Section Search Method (Brent’s Method)

To bracket a local minimum of a unimodal function f(x) three points a,b,c are necessary (Fig. 6.16) with

$$\begin{aligned} f(a)>f(b)\quad f(c)>f(b). \end{aligned}$$
(6.64)
Fig. 6.16 (Golden section search method)  A local minimum of the function f(x) is bracketed by three points a, b, c. To reduce the uncertainty of the minimum position a new point ξ is chosen in the interval a<ξ<c and either a or c is dropped according to the relation of the function values. For the example shown a has to be replaced by ξ

The position of the minimum can be determined iteratively by choosing a new value ξ in the interval a<ξ<c and dropping either a or c, depending on the ratio of the function values. A reasonable choice for ξ can be found as follows (Fig. 6.17) [147, 207]. Let us denote the relative positions of the middle point and the trial point as

$$\begin{aligned} \begin{aligned}[c] & \frac{b-a}{c-a}=\beta\quad\frac{c-b}{c-a}=1-\beta\quad \frac{b-a}{c-b}=\frac{\beta}{1-\beta}\quad\frac{\xi-b}{c-a}=t \\ &\frac{\xi-a}{c-a}=\frac{\xi-b+b-a}{c-a}=t+\beta. \end{aligned} \end{aligned}$$
(6.65)

The relative width of the new interval will be

$$\begin{aligned} \frac{c-\xi}{c-a}=(1-\beta-t)\quad\mbox{or}\quad\frac{b-a}{c-a}=\beta \quad\mbox{if }a<\xi<b& \end{aligned}$$
(6.66)
$$\begin{aligned} \frac{\xi-a}{c-a}=(t+\beta)\quad\mbox{or}\quad\frac{c-b}{c-a}=(1-\beta ) \quad\mbox{if }b<\xi<c.& \end{aligned}$$
(6.67)

The golden section search method requires that

$$\begin{aligned} t=1-2\beta=\frac{c+a-2b}{c-a}=\frac{(c-b)-(b-a)}{c-a}. \end{aligned}$$
(6.68)

Otherwise it would be possible that the larger interval width is selected many times, slowing down the convergence. The value of t is positive if \(c-b>b-a\) and negative if \(c-b<b-a\), hence the trial point is always in the larger of the two intervals. In addition the golden section search method requires that the ratio of the spacings remains constant. Therefore we set

$$\begin{aligned} \frac{\beta}{1-\beta}=-\frac{t+\beta}{t}=-\frac{1-\beta}{t}\quad\mbox{if }a< \xi<b& \end{aligned}$$
(6.69)
$$\begin{aligned} \frac{\beta}{1-\beta}=\frac{t}{1-\beta-t}=\frac{t}{\beta}\quad\mbox{if }b< \xi<c.& \end{aligned}$$
(6.70)

Eliminating t we obtain for a<ξ<b the equation

$$\begin{aligned} \frac{(\beta-1)}{\beta}\bigl(\beta^{2}+\beta-1\bigr)=0. \end{aligned}$$
(6.71)

Besides the trivial solution β=1 there is only one positive solution

$$\begin{aligned} \beta=\frac{\sqrt{5}-1}{2}\approx0.618. \end{aligned}$$
(6.72)

For b<ξ<c we end up with

$$\begin{aligned} \frac{\beta}{\beta-1}\bigl(\beta^{2}-3\beta+1\bigr)=0 \end{aligned}$$
(6.73)

which has the nontrivial positive solution

$$\begin{aligned} \beta=\frac{3-\sqrt{5}}{2}\approx0.382. \end{aligned}$$
(6.74)

Hence the lengths of the two intervals [a,b], [b,c] have to be in the golden ratio \(\varphi=\frac{1+\sqrt{5}}{2}\approx1.618\) which gives the method its name. Using the golden ratio the width of the interval bracketing the minimum reduces by a factor of 0.618 (Figs. 6.18, 6.19).

Fig. 6.17 Golden section search method

Fig. 6.18 (Golden section search method)  The minimum of the function \(f(x)=1+0.01x^{2}+0.1x^{4}\) is determined with the golden section search method. Each iteration needs only one function evaluation. After 40 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined to higher precision than \(\sqrt{\varepsilon_{M}}\approx10^{-8}\) (6.63)

Fig. 6.19 (Golden section search for a higher order minimum)  The minimum of the function \(f(x)=1+0.1x^{4}\) is determined with the golden section search method. Each iteration needs only one function evaluation. After 28 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined to higher precision than \(\sqrt[4]{\varepsilon_{M}}\approx10^{-4}\)

The algorithm can be formulated as follows:

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }c-a<\delta\mbox{ then exit} \\ &\mbox{if }(b-a)\ge(c-b)\mbox{ then }\{ \\ &x=0.618 b+0.382 a \\ &f_{x}=f(x) \\ &\mbox{if }f_{x}<f_{b}\mbox{ then }\{c=b\quad b=x\quad f_{c}=f_{b}\quad f_{b}=f_{x}\} \\ &\mbox{else }a=x\quad f_{a}=f_{x}\} \\ & \mbox{if }(b-a)<(c-b)\mbox{ then }\{ \\ &x=0.618 b+0.382 c \\ &f_{x}=f(x) \\ & \mbox{if }f_{x}<f_{b}\mbox{ then }\{a=b\quad b=x\quad f_{a}=f_{b}\quad f_{b}=f_{x}\} \\ &\mbox{else }c=x\quad f_{c}=f_{x}\} \end{aligned} \end{aligned}$$
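The same algorithm as a Python sketch; the bracketing triple a < b < c with f(b) < f(a), f(c) has to be supplied, for instance by the exponential search described next (names and tolerance are illustrative):

```python
def golden_section(f, a, b, c, delta=1e-8):
    """Golden section search for a minimum bracketed by a < b < c."""
    fb = f(b)
    while c - a > delta:
        if (b - a) >= (c - b):             # trial point in the larger interval [a, b]
            x = 0.618 * b + 0.382 * a
            fx = f(x)
            if fx < fb:
                c, b, fb = b, x, fx
            else:
                a = x
        else:                              # trial point in the larger interval [b, c]
            x = 0.618 * b + 0.382 * c
            fx = f(x)
            if fx < fb:
                a, b, fb = b, x, fx
            else:
                c = x
    return b

# example: minimum of 1 + 0.01 x**2 + 0.1 x**4 (cf. Fig. 6.18)
print(golden_section(lambda x: 1 + 0.01 * x**2 + 0.1 * x**4, -1.0, 0.3, 1.0))
```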

To start the method we need three initial points which can be found by Brent’s exponential search method (Fig. 6.20). Begin with three points

$$\begin{aligned} a_{0},\quad b_{0}=a_{0}+h,\quad c_{0}=b_{0}+1.618 h \end{aligned}$$
(6.75)

where \(h\) is a suitable initial step width which depends on the problem. If the minimum is not already bracketed then, if necessary, exchange \(a_{0}\) and \(b_{0}\) to have

$$\begin{aligned} f(a_{0})>f(b_{0})>f(c_{0}). \end{aligned}$$
(6.76)

Then replace the three points by

$$\begin{aligned} a_{1}=b_{0}\quad b_{1}=c_{0}\quad c_{1}=c_{0}+1.618(c_{0}-b_{0}) \end{aligned}$$
(6.77)

and repeat this step until

$$\begin{aligned} f(b_{n})<f(c_{n}) \end{aligned}$$
(6.78)

or n exceeds a given maximum number. In this case no minimum can be found and we should check if the initial step width was too large.

Fig. 6.20 Brent's exponential search

Brent’s method can be improved by making use of derivatives and by combining the golden section search with parabolic interpolation [207].

2.3 Minimization in Multidimensions

We search for local minima (or maxima) of a function

$$\begin{aligned} h(\mathbf{x}) \end{aligned}$$

which is at least two times differentiable. In the following we denote the gradient vector by

$$\begin{aligned} \mathbf{g}^{T}(\mathbf{x})=\biggl(\frac{\partial h}{\partial x_{1}},\ldots, \frac{\partial h}{\partial x_{n}}\biggr) \end{aligned}$$
(6.79)

and the matrix of second derivatives (Hessian) by

$$\begin{aligned} H= \biggl(\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}h \biggr). \end{aligned}$$
(6.80)

The very popular class of direction set methods proceeds as follows (Fig. 6.21). Starting from an initial guess \(\mathbf{x}_{0}\), a set of direction vectors \(\mathbf{s}_{r}\) and step lengths \(\lambda_{r}\) is determined such that the series of vectors

$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}+\lambda_{r} \mathbf{s}_{r} \end{aligned}$$
(6.81)

approaches the minimum of h(x). The method stops if the norm of the gradient becomes sufficiently small or if no lower function value can be found.

Fig. 6.21 (Direction set minimization)  Starting from an initial guess \(\mathbf{x}_{0}\) a local minimum is approached by making steps along a set of direction vectors \(\mathbf{s}_{r}\)

2.4 Steepest Descent Method

The simplest choice, which is known as the method of gradient descent or steepest descent, is to go in the direction of the negative gradient

$$\begin{aligned} \mathbf{s}_{r}=-\mathbf{g}_{r} \end{aligned}$$
(6.82)

and to determine the step length by minimizing h along this direction

$$\begin{aligned} h(\lambda)=h(\mathbf{x}_{r}-\lambda\mathbf{g}_{r})= \mbox{min}. \end{aligned}$$
(6.83)

Obviously two consecutive steps are orthogonal to each other since

$$\begin{aligned} 0=\frac{\partial}{\partial\lambda}h(\mathbf{x}_{r+1}-\lambda \mathbf{g}_{r})_{|\lambda=0}= -\mathbf{g}_{r+1}^{T} \mathbf{g}_{r}. \end{aligned}$$
(6.84)

This can lead to a zig-zagging behavior and a very slow convergence of this method (Figs. 6.22, 6.23).
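A sketch of the steepest descent iteration; delegating the one-dimensional minimization (6.83) to SciPy's scalar minimizer is an implementation choice, not part of the text:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(h, grad, x0, eps=1e-8, maxit=10000):
    """Gradient descent with a line search along the negative gradient (6.82)-(6.83)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        lam = minimize_scalar(lambda l: h(x - l * g)).x   # minimize along -g
        x = x - lam * g
    return x
```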

Fig. 6.22 (Function minimization)  The minimum of the Rosenbrock function \(h(x,y)=100(y-x^{2})^{2}+(1-x)^{2}\) is determined with different methods. Conjugate gradients converge much faster than steepest descent. Starting at (x,y)=(0,2), Newton-Raphson reaches the minimum at x=y=1 within only 5 iterations to machine precision

Fig. 6.23 (Minimization of the Rosenbrock function)  Left: Newton-Raphson finds the minimum after 5 steps within machine precision. Middle: conjugate gradients reduce the gradient to \(4\times10^{-14}\) after 265 steps. Right: the steepest descent method needs 20000 steps to reduce the gradient to \(5\times10^{-14}\). Red lines show the minimization pathway. Colored areas indicate the function value (light blue: <0.1, grey: 0.1…0.5, green: 5…50, pink: >100). Screenshots taken from Problem 6.2

2.5 Conjugate Gradient Method

This method is similar to the steepest descent method but the search direction is iterated according to

$$\begin{aligned} \mathbf{s}_{0}=-\mathbf{g}_{0}& \end{aligned}$$
(6.85)
$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}+\lambda_{r} \mathbf{s}_{r}& \end{aligned}$$
(6.86)
$$\begin{aligned} \mathbf{s}_{r+1}=-\mathbf{g}_{r+1}+\beta_{r+1} \mathbf{s}_{r}& \end{aligned}$$
(6.87)

where \(\lambda_{r}\) is chosen to minimize \(h(\mathbf{x}_{r+1})\) and the simplest choice for β is made by Fletcher and Reeves [89]

$$\begin{aligned} \beta_{r+1}=\frac{g_{r+1}^{2}}{g_{r}^{2}}. \end{aligned}$$
(6.88)
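A sketch of the Fletcher-Reeves scheme (6.85)-(6.88), again with a SciPy scalar minimizer for the line search; the Rosenbrock example corresponds to (6.105):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(h, grad, x0, eps=1e-8, maxit=1000):
    """Conjugate gradient minimization with the Fletcher-Reeves choice (6.88)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                                # first direction (6.85)
    for _ in range(maxit):
        if np.linalg.norm(g) < eps:
            break
        lam = minimize_scalar(lambda l: h(x + l * s)).x   # line minimization
        x = x + lam * s
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)                  # Fletcher-Reeves (6.88)
        s = -g_new + beta * s                             # new direction (6.87)
        g = g_new
    return x

# example: Rosenbrock function (6.105), started at (0, 2)
h = lambda v: 100 * (v[1] - v[0]**2)**2 + (1 - v[0])**2
grad = lambda v: np.array([-400 * v[0] * (v[1] - v[0]**2) - 2 * (1 - v[0]),
                           200 * (v[1] - v[0]**2)])
print(conjugate_gradient(h, grad, [0.0, 2.0]))
```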

2.6 Newton-Raphson Method

The first order Newton-Raphson method uses the iteration

$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}-H(\mathbf{x}_{r})^{-1} \mathbf{g}(\mathbf {x}_{r}). \end{aligned}$$
(6.89)

The search direction is

$$\begin{aligned} \mathbf{s}=H^{-1}\mathbf{g} \end{aligned}$$
(6.90)

and the step length is λ=1. This method converges fast if the starting point is close to the minimum. However, calculation of the Hessian can be very time consuming (Fig. 6.22).
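A sketch of iteration (6.89); as in the root finding case, the linear system H s = g is solved instead of forming the inverse Hessian explicitly:

```python
import numpy as np

def newton_minimize(grad, hess, x0, eps=1e-12, maxit=100):
    """Newton-Raphson minimization (6.89) with step length lambda = 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - np.linalg.solve(hess(x), g)   # x -> x - H^{-1} g
    return x
```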

2.7 Quasi-Newton Methods

Calculation of the full Hessian matrix as needed for the Newton-Raphson method can be very time consuming. Quasi-Newton methods use instead an approximation to the Hessian which is updated during each iteration. From the Taylor series

$$\begin{aligned} h(\mathbf{x})=h_{0}+\mathbf{b}^{T}\mathbf{x}+ \frac{1}{2}\mathbf {x}^{T}H\mathbf{x}+\cdots \end{aligned}$$
(6.91)

we obtain the gradient

$$\begin{aligned} \mathbf{g}(\mathbf{x}_{r})=\mathbf{b}+H\mathbf{x}_{r}+ \cdots=\mathbf {g}(\mathbf{x}_{r-1})+H(\mathbf{x}_{r}- \mathbf{x}_{r-1})+\cdots. \end{aligned}$$
(6.92)

Defining the differences

$$\begin{aligned} \mathbf{d}_{r}=\mathbf{x}_{r+1}-\mathbf{x}_{r}& \end{aligned}$$
(6.93)
$$\begin{aligned} \mathbf{y}_{r}=\mathbf{g}_{r+1}-\mathbf{g}_{r}& \end{aligned}$$
(6.94)

and neglecting higher order terms we obtain the Quasi-Newton or secant condition

$$\begin{aligned} H\mathbf{d}_{r}=\mathbf{y}_{r}. \end{aligned}$$
(6.95)

We want to approximate the true Hessian by a series of matrices \(H_{r}\) which are updated during each iteration to sum up all the information gathered so far. Since the Hessian is symmetric and positive definite, this also has to be demanded for the \(H_{r}\). This cannot be achieved by a rank one update matrix. Popular methods use a symmetric rank two update of the form

$$\begin{aligned} H_{r+1}=H_{r}+\alpha \mathbf {u}\mathbf {u}^{T}+\beta \mathbf {v}\mathbf {v}^{T}. \end{aligned}$$
(6.96)

The Quasi-Newton condition then gives

$$\begin{aligned} H_{r+1}\mathbf {d}_{r}=H_{r}\mathbf {d}_{r}+ \alpha\bigl(\mathbf {u}^{T}\mathbf {d}_{r}\bigr)\mathbf {u}+\beta\bigl( \mathbf {v}^{T}\mathbf {d}_{r}\bigr)\mathbf {v}=\mathbf {y}_{r} \end{aligned}$$
(6.97)

hence \(H_{r}\mathbf{d}_{r}-\mathbf{y}_{r}\) must be a linear combination of u and v. Making the simple choice

$$\begin{aligned} \mathbf {u}=\mathbf {y}_{r}\quad \mathbf {v}=H_{r}\mathbf {d}_{r} \end{aligned}$$
(6.98)

and assuming that these two vectors are linearly independent, we find

$$\begin{aligned} \beta=-\frac{1}{(\mathbf {v}^{T}\mathbf {d}_{r})}\quad=-\frac{1}{(\mathbf {d}_{r}^{T}H_{r}\mathbf {d}_{r})}& \end{aligned}$$
(6.99)
$$\begin{aligned} \alpha=\frac{1}{(\mathbf {u}^{T}\mathbf {d}_{r})}=\frac{1}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})}& \end{aligned}$$
(6.100)

which together defines the very popular BFGS (Broyden, Fletcher, Goldfarb, Shanno) method [43, 86, 105, 235]

$$\begin{aligned} H_{r+1}=H_{r}+\frac{\mathbf{y}_{r}\mathbf{y}_{r}^{T}}{\mathbf {y}_{r}^{T}\mathbf{d}_{r}}-\frac{(H_r\mathbf{d}_{r})(H_{r}\mathbf {d}_{r})^{T}}{\mathbf{d}_{r}^{T}H_{r}\mathbf{d}_{r}}. \end{aligned}$$
(6.101)

Alternatively, the DFP method of Davidon, Fletcher and Powell directly updates the inverse Hessian matrix \(B=H^{-1}\) according to

$$\begin{aligned} B_{r+1}=B_{r}+\frac{\mathbf {d}_{r}\mathbf {d}_{r}^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac{(B_{r}\mathbf {y}_{r})(B_{r}\mathbf {y}_{r})^{T}}{\mathbf {y}_{r}^{T}B_{r}\mathbf {y}_{r}}. \end{aligned}$$
(6.102)

Both of these methods can be inverted with the help of the Sherman-Morrison formula to give

$$\begin{aligned} B_{r+1}=B_{r}+\frac{(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})\mathbf {d}_{r}^{T}+\mathbf {d}_{r}(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac {(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})^{T}\mathbf {y}_{r}}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})^{2}}\mathbf {d}_{r} \mathbf {d}_{r}^{T}& \\ \end{aligned}$$
(6.103)
$$\begin{aligned} H_{r+1}=H_{r}+\frac{(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})\mathbf {y}_{r}^{T}+\mathbf {y}_{r}(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac{(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})^{T}\mathbf {d}_{r}}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})^{2}} \mathbf {y}_{r}\mathbf {y}_{r}^{T}.& \end{aligned}$$
(6.104)
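A sketch of a quasi-Newton minimizer based on the inverse-Hessian update (6.103); the identity matrix as initial inverse Hessian and the SciPy line search are assumptions of this sketch (alternatively, scipy.optimize.minimize with method='BFGS' provides a ready-made implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def quasi_newton(h, grad, x0, eps=1e-8, maxit=500):
    """Quasi-Newton minimization using the inverse-Hessian update (6.103)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    B = np.eye(len(x))                        # approximate inverse Hessian (assumption)
    for _ in range(maxit):
        if np.linalg.norm(g) < eps:
            break
        s = -B @ g                            # quasi-Newton search direction
        lam = minimize_scalar(lambda l: h(x + l * s)).x
        d = lam * s
        x_new = x + d
        g_new = grad(x_new)
        y = g_new - g
        By = B @ y
        yd = y @ d                            # yd > 0 preserves positive definiteness
        B = (B + (np.outer(d - By, d) + np.outer(d, d - By)) / yd
             - ((d - By) @ y / yd**2) * np.outer(d, d))   # update (6.103)
        x, g = x_new, g_new
    return x
```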

3 Problems

Problem 6.1

(Root finding methods)

This computer experiment searches roots of several test functions:

$$\begin{aligned} &f(x)=x^{n}-2\quad n=1,2,3,4\quad(\mbox{Fig.~6.10}) \\ &f(x)=5\sin(5x) \\ &f(x)= \bigl(\cos(2x) \bigr)^{2}-x^{2} \\ &f(x)=5 \bigl(\sqrt{|x+2|}-1 \bigr) \\ &f(x)=\mathrm {e}^{-x}\ln x \\ &f(x)=(x-1)^{3}\quad\mbox{(Fig.~6.11)} \\ &f(x)=x^{25}\quad\mbox{(Fig.~6.12)} \end{aligned}$$

You can vary the initial interval or starting value and compare the behavior of different methods:

  • bisection

  • regula falsi

  • Dekker’s method

  • Brent’s method

  • Chandrupatla’s method

  • Newton-Raphson method

Problem 6.2

(Stationary points)

This computer experiment searches a local minimum of the Rosenbrock function

$$\begin{aligned} h(x,y)=100\bigl(y-x^{2}\bigr)^{2}+(1-x)^{2}. \end{aligned}$$
(6.105)
  • The method of steepest descent minimizes h(x,y) along the search direction

    $$\begin{aligned} s_{x}^{(n)}=-g_{x}^{(n)}=-400x_{n} \bigl(x_{n}^{2}-y_{n}\bigr)-2(x_{n}-1)& \end{aligned}$$
    (6.106)
    $$\begin{aligned} s_{y}^{(n)}=-g_{y}^{(n)}=-200 \bigl(y_{n}-x_{n}^{2}\bigr).& \end{aligned}$$
    (6.107)
  • Conjugate gradients make use of the search direction

    $$\begin{aligned} s_{x}^{(n)}=-g_{x}^{(n)}+ \beta_{n}s_{x}^{(n-1)} \end{aligned}$$
    (6.108)
$$\begin{aligned} s_{y}^{(n)}=-g_{y}^{(n)}+ \beta_{n}s_{y}^{(n-1)}. \end{aligned}$$
(6.109)
  • The Newton-Raphson method needs the inverse Hessian

    $$\begin{aligned} H^{-1}=\frac{1}{\det(H)}\left ( \begin{array}{cc} h_{yy} & -h_{xy}\\ -h_{xy} & h_{xx} \end{array} \right ) \end{aligned}$$
    (6.110)
$$\begin{aligned} \det(H)=h_{xx}h_{yy}-h_{xy}^{2}& \end{aligned}$$
(6.111)
$$\begin{aligned} h_{xx}=1200x^{2}-400y+2\quad h_{yy}=200\quad h_{xy}=-400x& \end{aligned}$$
(6.112)

and iterates according to

$$\begin{aligned} \left ( \begin{array}{c} x_{n+1}\\ y_{n+1} \end{array} \right )=\left ( \begin{array}{c} x_{n}\\ y_{n} \end{array} \right )-H^{-1}\left ( \begin{array}{c} g_{x}^{n}\\ g_{y}^{n} \end{array} \right ). \end{aligned}$$
(6.113)

You can choose an initial point \((x_{0},y_{0})\). The iteration stops if the gradient norm falls below \(10^{-14}\) or if the line search fails to find a lower function value.