
In computational physics, roots of a function, i.e. solutions of an equation like

$$\begin{aligned} f(x_{1}\cdots x_{N})=0 \end{aligned}$$
(6.1)

very often have to be determined. A related problem is the search for local extrema (Fig. 6.1)

$$\begin{aligned} \max f(x_{1}\cdots x_{N})\quad\min f(x_{1}\cdots x_{N}) \end{aligned}$$
(6.2)

which for a smooth function are solutions of the equations

$$\begin{aligned} \frac{\partial f(x_{1}\cdots x_{N})}{\partial x_{i}}=0, \quad i=1\dots N. \end{aligned}$$
(6.3)

In one dimension, bisection is a very robust but rather inefficient root finding method. If a good starting point close to the root is available and the function is smooth enough, the Newton-Raphson method converges much faster. Special strategies are necessary to find roots of not so well behaved functions or higher order roots. The combination of bisection and interpolation, as in Dekker's and Brent's methods, provides generally applicable algorithms. In higher dimensions, calculation of the Jacobian matrix is not always possible and Quasi-Newton methods are a good choice. Whereas local extrema can, at least in principle, be found as the roots of the gradient, direct optimization can be more efficient. In one dimension the ternary search method or Brent's more efficient golden section search method can be used. In higher dimensions the popular class of direction set search methods includes the methods of steepest descent and conjugate gradients, the Newton-Raphson method and, if calculation of the full Hessian matrix is too expensive, the Quasi-Newton methods.

Fig. 6.1 Roots and local extrema of a function

1 Root Finding

If there is exactly one root in the interval \(a_{0}<x<b_{0}\), then one of the following methods can be used to locate its position with sufficient accuracy. If there are multiple roots, these methods will find one of them, and special care has to be taken to locate the other roots.

1.1 Bisection

The simplest method [45] to solve

$$\begin{aligned} f(x)=0 \end{aligned}$$
(6.4)

uses the following algorithm (Fig. 6.2):

  1. Determine an interval \([a_{0},b_{0}]\), which contains a sign change of f(x). If no such interval can be found then f(x) does not have any zero crossings.

  2. Divide the interval into \([a_{0},a_{0}+\frac{b_{0}-a_{0}}{2}]\) and \([a_{0}+\frac{b_{0}-a_{0}}{2},b_{0}]\) and choose the interval \([a_{1},b_{1}]\) in which f(x) changes its sign.

  3. Repeat until the width \(b_{n}-a_{n}<\varepsilon\) is small enough.

Fig. 6.2 Root finding by bisection

The bisection method needs two starting points which bracket a sign change of the function. It converges but only slowly since each step reduces the uncertainty by a factor of 2.
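As an illustration, a minimal Python sketch of this bisection loop could look as follows (the function name, tolerance handling and the example call are illustrative, not part of the text):

```python
def bisect(f, a, b, eps=1e-12):
    """Locate a root of f in [a, b]; f(a) and f(b) must differ in sign."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > eps:
        m = a + 0.5 * (b - a)      # midpoint of the current interval
        fm = f(m)
        if fa * fm <= 0:           # sign change in the left half
            b, fb = m, fm
        else:                      # sign change in the right half
            a, fa = m, fm
    return 0.5 * (a + b)

# example: root of x**2 - 2 in [1, 2]
print(bisect(lambda x: x**2 - 2, 1.0, 2.0))
```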

1.2 Regula Falsi (False Position) Method

The regula falsi method [92] (Fig. 6.3) is similar to the bisection method [45]. However, linear interpolation is used to divide the interval \([x_{r},a_{r}]\) with \(f(x_{r})f(a_{r})<0\). The root of the linear polynomial

$$\begin{aligned} p(x)=f(x_{r})+(x-x_{r})\frac{f(a_{r})-f(x_{r})}{a_{r}-x_{r}} \end{aligned}$$
(6.5)

is given by

$$\begin{aligned} \xi_{r}=x_{r}-f(x_{r})\frac{a_{r}-x_{r}}{f(a_{r})-f(x_{r})}= \frac{a_{r}f(x_{r})-x_{r}f(a_{r})}{f(x_{r})-f(a_{r})} \end{aligned}$$
(6.6)

which is inside the interval \([x_{r},a_{r}]\). Choose the sub-interval which contains the sign change:

$$\begin{aligned} \begin{aligned}[c] &f(x_{r})f(\xi_{r})<0\rightarrow[x_{r+1},a_{r+1}]=[x_{r}, \xi _{r}] \\ &f(x_{r})f(\xi_{r})>0\rightarrow[x_{r+1},a_{r+1}]=[ \xi_{r},a_{r}]. \end{aligned} \end{aligned}$$
(6.7)

Then \(\xi_{r}\) provides a series of approximations with increasing precision to the root of f(x)=0.
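A hedged Python sketch of this iteration (function name and iteration limit are illustrative); the bracketing points are updated according to (6.7):

```python
def regula_falsi(f, x, a, eps=1e-12, maxit=200):
    """Regula falsi: f(x) and f(a) must have opposite signs."""
    fx, fa = f(x), f(a)
    xi = x
    for _ in range(maxit):
        xi = (a * fx - x * fa) / (fx - fa)   # root of the linear interpolant (6.6)
        fxi = f(xi)
        if abs(fxi) < eps:
            break
        if fx * fxi < 0:          # sign change in [x, xi]: replace a
            a, fa = xi, fxi
        else:                     # sign change in [xi, a]: replace x
            x, fx = xi, fxi
    return xi

# example: root of x**2 - 2 with x=1, a=2
print(regula_falsi(lambda x: x**2 - 2, 1.0, 2.0))
```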

Fig. 6.3 Regula falsi method

1.3 Newton-Raphson Method

Consider a function which is at least two times differentiable around the root ξ. Taylor series expansion around a point \(x_{0}\) in the vicinity of the root

$$\begin{aligned} f(x)=f(x_{0})+(x-x_{0})f'(x_{0})+ \frac {1}{2}(x-x_{0})^{2}f''(x_{0})+\cdots \end{aligned}$$
(6.8)

gives for x=ξ

$$\begin{aligned} 0=f(x_{0})+(\xi-x_{0})f'(x_{0})+ \frac{1}{2}(\xi -x_{0})^{2}f''(x_{0})+\cdots. \end{aligned}$$
(6.9)

Truncation of the series and solving for ξ gives the first order Newton-Raphson [45, 252] method (Fig. 6.4)

$$\begin{aligned} x_{r+1}=x_{r}-\frac{f(x_{r})}{f'(x_{r})} \end{aligned}$$
(6.10)

and the second order Newton-Raphson method (Fig. 6.4)

$$\begin{aligned} x_{r+1}=x_{r}-\frac{f'(x_{r})\pm\sqrt {f'(x_{r})^{2}-2f(x_{r})f''(x_{r})}}{f''(x_{r})}. \end{aligned}$$
(6.11)
Fig. 6.4 Newton-Raphson method

The Newton-Raphson method converges fast if the starting point is close enough to the root. Analytic derivatives are needed. It may fail if two or more roots are close by.
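A minimal sketch of the first order iteration (6.10), assuming the analytic derivative is supplied as a second function (names and tolerances are illustrative):

```python
def newton(f, fprime, x0, eps=1e-12, maxit=50):
    """First order Newton-Raphson iteration (6.10)."""
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) < eps:
            break
        x = x - fx / fprime(x)    # Newton step
    return x

# example: root of x**2 - 2 starting from x0 = 2
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 2.0))
```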

1.4 Secant Method

Replacing the derivative in the first order Newton-Raphson method by a finite difference quotient gives the secant method [45] (Fig. 6.5), which has been known for thousands of years [196]

$$\begin{aligned} x_{r+1}=x_{r}-f(x_{r})\frac{x_{r}-x_{r-1}}{f(x_{r})-f(x_{r-1})}. \end{aligned}$$
(6.12)

Round-off errors can become important as \(|f(x_{r})-f(x_{r-1})|\) gets small. At the beginning choose a starting point \(x_{0}\) and determine

$$\begin{aligned} x_{1}=x_{0}-f(x_{0})\frac{2h}{f(x_{0}+h)-f(x_{0}-h)} \end{aligned}$$
(6.13)

using a symmetrical difference quotient.
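A sketch of the secant iteration with the symmetric starting step (6.13); step width, tolerance and iteration limit are illustrative choices:

```python
def secant(f, x0, h=1e-4, eps=1e-12, maxit=100):
    """Secant iteration (6.12); the first step uses the symmetric quotient (6.13)."""
    x1 = x0 - f(x0) * 2 * h / (f(x0 + h) - f(x0 - h))
    f0, f1 = f(x0), f(x1)
    for _ in range(maxit):
        if abs(f1) < eps or f1 == f0:    # stop when converged or the quotient degenerates
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)
    return x1

# example: root of x**2 - 2 starting from x0 = 2
print(secant(lambda x: x**2 - 2, 2.0))
```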

Fig. 6.5 Secant method

1.5 Interpolation

The secant method is also obtained by linear interpolation

$$\begin{aligned} p(x)=\frac{x-x_{r}}{x_{r-1}-x_{r}}f_{r-1}+\frac{x-x_{r-1}}{x_{r}-x_{r-1}}f_{r}. \end{aligned}$$
(6.14)

The root of the polynomial, \(p(x_{r+1})=0\), determines the next iterate \(x_{r+1}\)

$$\begin{aligned} x_{r+1}=\frac {1}{f_{r-1}-f_{r}}(x_{r}f_{r-1}-x_{r-1}f_{r})=x_{r}-f_{r} \frac{x_{r}-x_{r-1}}{f_{r}-f_{r-1}}. \end{aligned}$$
(6.15)

Quadratic interpolation of three function values is known as Muller’s method [177]. Newton’s form of the interpolating polynomial is

$$\begin{aligned} p(x)=f_{r}+(x-x_{r})f[x_{r},x_{r-1}]+(x-x_{r}) (x-x_{r-1})f[x_{r},x_{r-1},x_{r-2}] \end{aligned}$$
(6.16)

which can be rewritten

$$\begin{aligned} p(x) =& f_{r}+(x-x_{r})f[x_{r},x_{r-1}]+(x-x_{r})^{2}f[x_{r},x_{r-1},x_{r-2}] \\ &{} +(x_{r}-x_{r-1}) (x-x_{r})f[x_{r},x_{r-1},x_{r-2}] \\ =&f_{r}+(x-x_{r})^{2}f[x_{r},x_{r-1},x_{r-2}] \\ &{} +(x-x_{r}) \bigl(f[x_{r},x_{r-1}] +f[x_{r},x_{r-2}]-f[x_{r-1},x_{r-2}] \bigr) \\ =&f_{r}+A(x-x_{r})+B(x-x_{r})^{2} \end{aligned}$$
(6.17)

and has the roots

$$\begin{aligned} x_{r+1}=x_{r}-\frac{A}{2B}\pm\sqrt{\frac{A^{2}}{4B^{2}}- \frac{f_{r}}{B}}. \end{aligned}$$
(6.18)

To avoid numerical cancellation, this is rewritten

$$\begin{aligned} x_{r+1} =& x_{r}+\frac{1}{2B} \bigl(-A\pm \sqrt{A^{2}-4Bf_{r}} \bigr) \\ =& x_{r}+\frac{-2f_{r}}{A^{2}-(A^{2}-4Bf_{r})} \bigl(A\mp\sqrt {A^{2}-4Bf_{r}} \bigr) \\ =& x_{r}+\frac{-2f_{r}}{A\pm\sqrt{A^{2}-4Bf_{r}}}. \end{aligned}$$
(6.19)

The sign in the denominator is chosen such that \(x_{r+1}\) is the root closer to \(x_{r}\). The roots of the polynomial can become complex valued, therefore this method is also useful to find complex roots.
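A sketch of Muller's iteration in the cancellation-free form (6.19), using complex arithmetic so that complex roots are accessible (the ordering of the three points and the stopping criterion are illustrative):

```python
import cmath

def muller(f, x0, x1, x2, eps=1e-12, maxit=50):
    """Muller iteration (6.19); x2 plays the role of the latest point x_r."""
    for _ in range(maxit):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        d01 = (f1 - f0) / (x1 - x0)              # divided differences
        d12 = (f2 - f1) / (x2 - x1)
        d02 = (f2 - f0) / (x2 - x0)
        A = d12 + d02 - d01
        B = (d12 - d01) / (x2 - x0)              # f[x_r, x_{r-1}, x_{r-2}]
        s = cmath.sqrt(A * A - 4 * B * f2)
        den = A + s if abs(A + s) > abs(A - s) else A - s   # pick the root closer to x2
        x0, x1, x2 = x1, x2, x2 - 2 * f2 / den
        if abs(x2 - x1) < eps:
            break
    return x2

# example: converges to one of the complex roots of x**2 + 1
print(muller(lambda x: x * x + 1, 0.0, 1.0, 2.0))
```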

1.6 Inverse Interpolation

Complex values of \(x_{r}\) can be avoided by interpolation of the inverse function instead

$$\begin{aligned} x=f^{-1}(y). \end{aligned}$$
(6.20)

Using the two points \(x_{r},x_{r-1}\) the Lagrange method gives

$$\begin{aligned} p(y)=x_{r-1}\frac{y-f_{r}}{f_{r-1}-f_{r}}+x_{r}\frac{y-f_{r-1}}{f_{r}-f_{r-1}} \end{aligned}$$
(6.21)

and the next approximation of the root corresponds again to the secant method (6.12)

$$\begin{aligned} x_{r+1}=p(0)=\frac{x_{r-1}f_{r}-x_{r}f_{r-1}}{f_{r}-f_{r-1}}=x_{r}+\frac {(x_{r-1}-x_{r})}{f_{r}-f_{r-1}}f_{r}. \end{aligned}$$
(6.22)

Inverse quadratic interpolation needs three starting points \(x_{r},x_{r-1},x_{r-2}\) together with the function values \(f_{r},f_{r-1},f_{r-2}\) (Fig. 6.6). The inverse function \(x=f^{-1}(y)\) is interpolated with the Lagrange method

$$\begin{aligned} p(y) =& \frac {(y-f_{r-1})(y-f_{r})}{(f_{r-2}-f_{r-1})(f_{r-2}-f_{r})}x_{r-2}+\frac {(y-f_{r-2})(y-f_{r})}{(f_{r-1}-f_{r-2})(f_{r-1}-f_{r})}x_{r-1} \\ &{} +\frac{(y-f_{r-1})(y-f_{r-2})}{(f_{r}-f_{r-1})(f_{r}-f_{r-2})}x_{r}. \end{aligned}$$
(6.23)

For y=0 we find the next iterate

$$\begin{aligned} x_{r+1} =& p(0)=\frac {f_{r-1}f_{r}}{(f_{r-2}-f_{r-1})(f_{r-2}-f_{r})}x_{r-2}+\frac {f_{r-2}f_{r}}{(f_{r-1}-f_{r-2})(f_{r-1}-f_{r})}x_{r-1} \\ &{} +\frac{f_{r-1}f_{r-2}}{(f_{r}-f_{r-1})(f_{r}-f_{r-2})}x_{r}. \end{aligned}$$
(6.24)
Fig. 6.6 Root finding by interpolation of the inverse function

Inverse quadratic interpolation is only a good approximation if the interpolating parabola is single valued and hence a monotonic function in the range of \(f_{r},f_{r-1},f_{r-2}\). For the following discussion we assume that the three values of x are renamed such that \(x_{1}<x_{2}<x_{3}\).

Consider the limiting case (a) in Fig. 6.7 where the polynomial has a horizontal tangent at \(y=f_{3}\) and can be written as

$$\begin{aligned} p(y)=x_{3}+(x_{1}-x_{3})\frac{(y-f_{3})^{2}}{(f_{1}-f_{3})^{2}}. \end{aligned}$$
(6.25)

Its value at y=0 is

$$\begin{aligned} p(0) =&x_{3}+(x_{1}-x_{3})\frac {f_{3}^{2}}{(f_{1}-f_{3})^{2}} \\ =&x_{1}+(x_{3}-x_{1}) \biggl(1-\frac{f_{3}^{2}}{(f_{1}-f_{3})^{2}} \biggr). \end{aligned}$$
(6.26)

If \(f_{1}\) and \(f_{3}\) have different sign and \(|f_{1}|<|f_{3}|\) (Sect. 6.1.7.2) we find

$$\begin{aligned} 1-\frac{f_{3}^{2}}{(f_{1}-f_{3})^{2}}<\frac{3}{4}. \end{aligned}$$
(6.27)
Fig. 6.7 (Validity of inverse quadratic interpolation)  Inverse quadratic interpolation is only applicable if the interpolating polynomial p(y) is monotonic in the range of the interpolated function values \(f_{1}\dots f_{3}\). (a) and (b) show the limiting cases where the polynomial has a horizontal tangent at \(f_{1}\) or \(f_{3}\). (c) shows the case where the extremum of the parabola is inside the interpolation range and interpolation is not feasible

Brent [36] used this as a criterion for the applicability of the inverse quadratic interpolation. However, this does not include all possible cases where interpolation is applicable. Chandrupatla [54] gave a more general discussion. The limiting condition is that the polynomial p(y) has a horizontal tangent at one of the boundaries \(x_{1,3}\). The derivative values are

$$\begin{aligned} \frac{dp}{dy}(y=f_{1}) =& \frac {x_{2}(f_{1}-f_{3})}{(f_{2}-f_{1})(f_{2}-f_{3})}+\frac {x_{3}(f_{1}-f_{2})}{(f_{3}-f_{1})(f_{3}-f_{2})}+ \frac {x_{1}}{f_{1}-f_{2}}+\frac{x_{1}}{f_{1}-f_{3}} \\ =&\frac{(f_{2}-f_{1})}{(f_{3}-f_{1})(f_{3}-f_{2})} \biggl[\frac {x_{2}(f_{3}-f_{1})^{2}}{(f_{2}-f_{1})^{2}}-x_{3} \\ &{} -\frac {x_{1}(f_{3}-f_{1})^{2}-x_{1}(f_{2}-f_{1})^{2}}{(f_{2}-f_{1})^{2}} \biggr] \\ =& \frac{(f_{2}-f_{1})(x_{2}-x_{1})}{(f_{3}-f_{1})(f_{3}-f_{2})} \bigl[\varPhi ^{-2}-\xi^{-1} \bigr] \end{aligned}$$
(6.28)
$$\begin{aligned} \frac{dp}{dy}(y=f_{3}) =&\frac {x_{2}(f_{3}-f_{1})}{(f_{2}-f_{1})(f_{2}-f_{3})}+\frac {x_{1}(f_{3}-f_{2})}{(f_{1}-f_{2})(f_{1}-f_{3})}+ \frac {x_{3}}{f_{3}-f_{2}}+\frac{x_{3}}{f_{3}-f_{1}} \\ =&\frac{(f_{3}-f_{2})}{(f_{2}-f_{1})(f_{3}-f_{1})} \biggl[-\frac {x_{2}(f_{3}-f_{1})^{2}}{(f_{3}-f_{2})^{2}}+x_{3} \frac{(f_{3}-f_{1})^{2}}{(f_{3}-f_{2})^{2}} \\ &{} -x_{3}\frac {(f_{3}-f_{2})^{2}}{(f_{3}-f_{2})^{2}}+x_{1} \biggr] \\ =& \frac{(f_{3}-f_{2})(x_{3}-x_{2})}{(f_{2}-f_{1})(f_{3}-f_{1})} \biggl[ \biggl(\frac{1}{\varPhi-1} \biggr)^{2}- \frac{1}{1-\xi} \biggr] \end{aligned}$$
(6.29)

with [54]

$$\begin{aligned} &\xi=\frac{x_{2}-x_{1}}{x_{3}-x_{1}} \qquad \varPhi=\frac{f_{2}-f_{1}}{f_{3}-f_{1}} \end{aligned}$$
(6.30)
$$\begin{aligned} &\xi-1=\frac{x_{2}-x_{3}}{x_{3}-x_{1}} \qquad \varPhi-1=\frac {f_{2}-f_{3}}{f_{3}-f_{1}}. \end{aligned}$$
(6.31)

Since for a parabola either \(f_{1}<f_{2}<f_{3}\) or \(f_{1}>f_{2}>f_{3}\), the conditions for applicability of inverse interpolation finally become

$$\begin{aligned} \varPhi^{2}<\xi& \end{aligned}$$
(6.32)
$$\begin{aligned} 1-\xi>(1-\varPhi)^{2}& \end{aligned}$$
(6.33)

which can be combined into

$$\begin{aligned} 1-\sqrt{1-\xi}<|\varPhi|<\sqrt{\xi}. \end{aligned}$$
(6.34)

This method is usually used in combination with other methods (Sect. 6.1.7.2).
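A small helper function, as a sketch, which returns the inverse quadratic iterate (6.24) only if criterion (6.34) is fulfilled and otherwise signals that the caller should fall back to another step (the points are assumed to be ordered x1 < x2 < x3):

```python
from math import sqrt

def inverse_quadratic_step(x1, x2, x3, f1, f2, f3):
    """Inverse quadratic iterate (6.24), guarded by the criterion (6.34).
    Returns None if interpolation is not applicable."""
    xi = (x2 - x1) / (x3 - x1)            # relative positions (6.30)
    phi = (f2 - f1) / (f3 - f1)
    if not (1 - sqrt(1 - xi) < abs(phi) < sqrt(xi)):
        return None                       # caller should use e.g. bisection instead
    return (f2 * f3 / ((f1 - f2) * (f1 - f3)) * x1
            + f1 * f3 / ((f2 - f1) * (f2 - f3)) * x2
            + f1 * f2 / ((f3 - f1) * (f3 - f2)) * x3)
```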

1.7 Combined Methods

Bisection converges slowly. The interpolation methods converge faster but are less reliable. The combination of both gives methods which are reliable and converge faster than pure bisection.

1.7.1 Dekker’s Method

Dekker’s method [47, 70] combines bisection and the secant method. The root is bracketed by intervals \([c_{r},b_{r}]\) with decreasing width, where \(b_{r}\) is the best approximation to the root found so far and \(c_{r}\) is an earlier guess for which \(f(c_{r})\) and \(f(b_{r})\) have different sign. First an attempt is made to use linear interpolation between the points \((b_{r},f(b_{r}))\) and \((a_{r},f(a_{r}))\), where \(a_{r}\) is usually the preceding approximation \(a_{r}=b_{r-1}\) and is set to the second interval boundary \(a_{r}=c_{r-1}\) if the last iteration did not lower the function value (Fig. 6.8).

Fig. 6.8 Dekker's method

Starting from an initial interval \([x_{0},x_{1}]\) with \(\operatorname{sign}(f(x_{0}))\ne\operatorname{sign}(f(x_{1}))\) the method proceeds as follows [47]:

  • initialization

$$\begin{aligned} &f_{1}=f(x_{1})\quad f_{0}=f(x_{0}) \\ &\mbox{if }|f_{1}|<|f_{0}|\mbox{ then \{} \\ &b=x_{1}\quad c=a=x_{0} \\ &f_{b}=f_{1}\quad f_{c}=f_{a}=f_{0}\} \\ &\mbox{else }\{ \\ &b=x_{0}\quad c=a=x_{1} \\ &f_{b}=f_{0}\quad f_{c}=f_{a}=f_{1}\} \end{aligned}$$
  • iteration

$$ \begin{aligned}[b] &x_{s}=b-f_{b}\frac{b-a}{f_{b}-f_{a}} \\ &x_{m}=\frac{c+b}{2}. \end{aligned} $$

If \(x_{s}\) is very close to the last b then increase the distance to avoid too small steps. Otherwise choose \(x_{s}\) if it is between b and \(x_{m}\), else choose \(x_{m}\) (thus choosing the smaller interval)

$$\begin{aligned} x_{r}=\left \{ \begin{array}{l@{\quad}l} b+\delta\operatorname{sign}(c-b)&\mbox{if abs}(x_{s}-b)<\delta \\ x_{s}&\mbox{if }b+\delta<x_{s}<x_{m}\mbox{ or }b-\delta>x_{s}>x_{m}\\ x_{m}&\mbox{else}. \end{array} \right . \end{aligned}$$

Determine \(x_{k}\) as the latest of the previous iterates \(x_{0}\cdots x_{r-1}\) for which \(\operatorname{sign}(f(x_{k}))\ne\operatorname{sign}(f(x_{r}))\).

If the new function value is lower update the approximation to the root

$$\begin{aligned} \begin{aligned}[c] & f_{r}=f(x_{r}) \\ &\mbox{if }|f_{r}|<|f_{k}|\mbox{ then }\{ \\ &a=b\quad b=x_{r}\quad c=x_{k} \\ &f_{a}=f_{b}\quad f_{b}=f_{r}\quad f_{c}=f_{k}\} \end{aligned} \end{aligned}$$

otherwise keep the old approximation and update the second interval boundary

$$\begin{aligned} \begin{aligned}[c] &\mbox{if }|f_{r}|\ge|f_{k}|\mbox{ then }\{ \\ &b=x_{k}\quad a=c=x_{r} \\ &f_{b}=f_{k}\quad f_{a}=f_{c}=f_{r}\} \end{aligned} \end{aligned}$$

Repeat until \(|c-b|<\varepsilon\) or \(f_{r}=0\).

1.7.2 Brent’s Method

In certain cases Dekker’s method converges very slowly, making many small steps of the order of ε. Brent [36, 37, 47] introduced some modifications to reduce such problems and tried to speed up convergence by the use of inverse quadratic interpolation (Sect. 6.1.6). To avoid numerical problems the iterate (6.24) is written with the help of a quotient

$$\begin{aligned} x_{r+1} =& \frac{f_{b}f_{c}}{(f_{a}-f_{b})(f_{a}-f_{c})}a+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}b \\ & {} +\frac{f_{b}f_{a}}{(f_{c}-f_{b})(f_{c}-f_{a})}c \\ =&b+\frac{p}{q} \end{aligned}$$
(6.35)

with

$$\begin{aligned} & \begin{aligned}[b] p&=\frac{f_{b}}{f_{a}} \biggl((c-b)\frac{f_{a}}{f_{c}} \biggl(\frac {f_{a}}{f_{c}}- \frac{f_{b}}{f_{c}} \biggr)-(b-a) \biggl(\frac {f_{b}}{f_{c}}-1 \biggr) \biggr) \\ & =(c-b)\frac{f_{b}(f_{a}-f_{b})}{f_{c}^{2}}-(b-a)\frac {f_{b}(f_{b}-f_{c})}{f_{a}f_{c}} \\ &=\frac{af_{b}f_{c}(f_{b}-f_{c})+b [f_{a}f_{b}(f_{b}-f_{a})+f_{b}f_{c}(f_{c}-f_{b})]+cf_{a}f_{b}(f_{a}-f_{b})}{f_{a}f_{c}^{2}} \end{aligned} \end{aligned}$$
(6.36)
$$\begin{aligned} &q=- \biggl(\frac{f_{a}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{a}}-1 \biggr)=-\frac {(f_{a}-f_{c})(f_{b}-f_{c})(f_{b}-f_{a})}{f_{a}f_{c}^{2}}. \end{aligned}$$
(6.37)

If only two points are available, linear interpolation is used. The iterate (6.22) is then written as

$$\begin{aligned} x_{r+1}=b+\frac{(a-b)}{f_{b}-f_{a}}f_{b}=b+\frac{p}{q} \end{aligned}$$
(6.38)

with

$$\begin{aligned} p=(a-b)\frac{f_{b}}{f_{a}} \qquad q= \biggl(\frac{f_{b}}{f_{a}}-1 \biggr). \end{aligned}$$
(6.39)

The division is only performed if interpolation is appropriate and division by zero cannot happen. Brent's method is fast and robust at the same time and is often recommended by textbooks; it is also available in standard numerical libraries (a usage sketch follows the listing below). The algorithm is summarized in the following [209].

Start with an initial interval \([x_{0},x_{1}]\) with \(f(x_{0})f(x_{1})\le0\)

  • initialization

$$\begin{aligned} &a=x_{0}\quad b=x_{1}\quad c=a \\ &f_{a}=f(a)\quad f_{b}=f(b)\quad f_{c}=f_{a} \\ &e=d=b-a \end{aligned}$$
  • iteration

If c is a better approximation than b exchange values

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }|f_{c}|<|f_{b}|\mbox{ then }\{ \\ &a=b\quad b=c\quad c=a \\ &f_{a}=f_{b}\quad f_{b}=f_{c}\quad f_{c}=f_{a}\} \end{aligned} \end{aligned}$$

calculate midpoint relative to b

$$\begin{aligned} x_{m}=0.5(c-b) \end{aligned}$$

stop if accuracy is sufficient

$$\begin{aligned} \mbox{if }|x_{m}|<\varepsilon\mbox{ or }f_{b}=0\mbox{ then exit} \end{aligned}$$

use bisection if the previous step width e was too small or the last step did not improve

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }|e|<\varepsilon\mbox{ or }|f_{a}|\le|f_{b}|\mbox{ then \{} \\ &e=d=x_{m}\} \end{aligned} \end{aligned}$$

otherwise try interpolation

$$\begin{aligned} \begin{aligned}[b] &\mbox{else }\{ \\ &\mbox{if }a=c\,\mbox{ then }\biggl\{ \\ &p=2x_{m}\frac{f_{b}}{f_{a}}\quad q=\frac{f_{b}-f_{a}}{f_{a}}\biggr\} \\ &\mbox{else }\biggl\{ \\ &p=2x_{m}\frac{f_{b}(f_{a}-f_{b})}{f_{c}^{2}}-(b-a)\frac {f_{b}(f_{b}-f_{c})}{f_{a}f_{c}} \\ & q= \biggl(\frac{f_{a}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{c}}-1 \biggr) \biggl(\frac{f_{b}}{f_{a}}-1 \biggr)\biggr\} \end{aligned} \end{aligned}$$

make p a positive quantity

$$\begin{aligned} \mbox{if }p>0\mbox{ then }\{q=-q\}\mbox{ else }\{p=-p\} \end{aligned}$$

update previous step width

$$\begin{aligned} s=e\quad e=d \end{aligned}$$

use interpolation if applicable, otherwise use bisection

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }2p<3x_{m}q-|\varepsilon q|\mbox{ and }p<|0.5\, sq|\mbox{ then }\biggl\{ \\ &d=\frac{p}{q}\biggr\} \\ &\mbox{else }\{e=d=x_{m}\} \\ &a=b\quad f_{a}=f_{b} \\ &\mbox{if }|d|>\varepsilon\mbox{ then }\{ \\ &b=b+d\} \\ &\mbox{else }\bigl\{b=b+\varepsilon\operatorname{sign}(x_{m})\bigr\} \end{aligned} \end{aligned}$$

calculate new function value

$$\begin{aligned} f_{b}=f(b) \end{aligned}$$

be sure to bracket the root

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }\operatorname{sign}(f_{b})=\operatorname{sign}(f_{c})\mbox{ then }\{ \\ &c=a\quad f_{c}=f_{a} \\ &e=d=b-a\} \end{aligned} \end{aligned}$$
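For practical work a library implementation is usually preferable; SciPy, for example, provides Brent's method as scipy.optimize.brentq. A short usage sketch (assuming SciPy is installed):

```python
from scipy.optimize import brentq

# root of x**2 - 2 in the bracketing interval [1, 2]
root = brentq(lambda x: x**2 - 2, 1.0, 2.0, xtol=1e-12)
print(root)
```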

1.7.3 Chandrupatla’s Method

In 1997 Chandrupatla [54] published a method which tries to use inverse quadratic interpolation whenever possible according to (6.34). He calculates the relative position of the new iterate as (Fig. 6.9)

$$\begin{aligned} t =&\frac{x-c}{b-c} \\ =&\frac{1}{b-c} \biggl[\frac{f_{c}f_{b}}{(f_{a}-f_{b})(f_{a}-f_{c})}a+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}b \\ &{}+ \frac {f_{b}f_{a}}{(f_{c}-f_{b})(f_{c}-f_{a})}c-c \biggr] \\ =&\frac{a-c}{b-c}\frac{f_{c}}{f_{c}-f_{a}}\frac{f_{b}}{f_{b}-f_{a}}+\frac {f_{a}f_{c}}{(f_{b}-f_{a})(f_{b}-f_{c})}. \end{aligned}$$
(6.40)
Fig. 6.9
figure 9

Chandrupatla’s method

The algorithm proceeds as follows:

Start with an initial interval \([x_{0},x_{1}]\) with \(f(x_{0})f(x_{1})\le0\)

  • initialization

$$\begin{aligned} \begin{aligned}[b] &b=x_{0}\quad a=c=x_{1} \\ &f_{b}=f(b)\quad f_{a}=f_{c}=f(c) \\ &t=0.5 \end{aligned} \end{aligned}$$
  • iteration

$$\begin{aligned} \begin{aligned}[b] &x_{t}=a+t(b-a) \\ &f_{t}=f(x_{t}) \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned}[b] &\mbox{if }\operatorname{sign}(f_{t})=\operatorname{sign}(f_{a})\ \{ \\ &c=a\quad f_{c}=f_{a} \\ &a=x_{t}\quad f_{a}=f_{t}\} \\ &\mbox{else}\ \{ \\ & c=b\quad b=a\quad a=x_{t} \\ &f_{c}=f_{b}\quad f_{b}=f_{a}\quad f_{a}=f_{t}\} \\ &x_{m}=a\quad f_{m}=f_{a} \\ &\mbox{if }\mathrm{abs}(f_{b})<\mathrm{abs}(f_{a})\ \{ \\ &x_{m}=b\quad f_{m}=f_{b}\} \\ &\mathit{tol}=2\varepsilon_{M}|x_{m}|+\varepsilon_{a} \\ &t_{l}=\frac{\mathit{tol}}{|b-c|} \\ & \mbox{if }t_{l}>0.5\mbox{ or }f_{m}=0\mbox{ exit} \\ &\xi=\frac{a-b}{c-b}\quad\varPhi=\frac{f_{a}-f_{b}}{f_{c}-f_{b}} \\ &\mbox{if }1-\sqrt{1-\xi}<\varPhi<\sqrt{\xi}\ \biggl\{ \\ &t=\frac{f_{a}}{f_{b}-f_{a}}\frac{f_{c}}{f_{b}-f_{c}}+\frac {c-a}{b-a}\frac{f_{a}}{f_{c}-f_{a}} \frac{f_{b}}{f_{c}-f_{b}}\biggr\} \\ &\mbox{else }\{t=0.5\} \\ &\mbox{if }t<t_{l}\ \{t=t_{l}\} \\ &\mbox{if }t>(1-t_{l})\ \{t=1-t_{l}\} \end{aligned} \end{aligned}$$

Chandrupatla’s method is more efficient than Dekker’s and Brent’s, especially for higher order roots (Figs. 6.10, 6.11, 6.12).
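The following Python sketch follows the pseudocode above rather literally (the iteration limit is an added safeguard and the sign test by comparison is an implementation choice):

```python
from math import sqrt

def chandrupatla(f, x0, x1, eps_m=1e-15, eps_a=1e-15, maxit=100):
    """Chandrupatla's root finder; f(x0) and f(x1) must have opposite signs."""
    b, a = x0, x1
    c = a
    fb, fa = f(b), f(a)
    fc = fa
    t = 0.5
    xm, fm = a, fa
    for _ in range(maxit):
        xt = a + t * (b - a)
        ft = f(xt)
        if (ft > 0) == (fa > 0):       # same sign: root lies between xt and b
            c, fc = a, fa
            a, fa = xt, ft
        else:                          # sign change between xt and a
            c, b = b, a
            fc, fb = fb, fa
            a, fa = xt, ft
        xm, fm = (b, fb) if abs(fb) < abs(fa) else (a, fa)
        tol = 2 * eps_m * abs(xm) + eps_a
        tl = tol / abs(b - c)
        if tl > 0.5 or fm == 0.0:
            break
        xi = (a - b) / (c - b)
        phi = (fa - fb) / (fc - fb)
        if 1 - sqrt(1 - xi) < phi < sqrt(xi):     # inverse quadratic interpolation
            t = (fa / (fb - fa) * fc / (fb - fc)
                 + (c - a) / (b - a) * fa / (fc - fa) * fb / (fc - fb))
        else:                                     # bisection
            t = 0.5
        t = min(max(t, tl), 1.0 - tl)             # keep xt away from the endpoints
    return xm

# example: root of x**2 - 2 with the bracketing interval [1, 2]
print(chandrupatla(lambda x: x**2 - 2, 1.0, 2.0))
```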

Fig. 6.10 (Comparison of different solvers)  The root of the equation \(f(x)=x^{2}-2\) is determined with different methods: Newton-Raphson (a, squares), Chandrupatla (b, circles), Brent (c, triangle up), Dekker (d, diamonds), regula falsi (e, stars), pure bisection (f, dots). Starting values are \(x_{1}=-1\), \(x_{2}=2\). The absolute error is shown as a function of the number of iterations. For \(x_{1}=-1\), the Newton-Raphson method converges to \(-\sqrt{2}\)

Fig. 6.11 (Comparison of different solvers for a third order root)  The root of the equation \(f(x)=(x-1)^{3}\) is determined with different methods: Newton-Raphson (a), Chandrupatla (b), Brent (c), Dekker (d), regula falsi (e), pure bisection (f). Starting values are \(x_{1}=0\), \(x_{2}=1.8\). The absolute error is shown as a function of the number of iterations

Fig. 6.12 (Comparison of different solvers for a high order root)  The root of the equation \(f(x)=x^{25}\) is determined with different methods: Newton-Raphson (a), Chandrupatla (b, circles), Brent (c), Dekker (d), regula falsi (e), pure bisection (f, dots). Starting values are \(x_{1}=-1\), \(x_{2}=2\). The absolute error is shown as a function of the number of iterations

1.8 Multidimensional Root Finding

The Newton-Raphson method can be easily generalized for functions of more than one variable. We search for the solution of a system of n nonlinear equations in n variables \(x_{i}\)

$$\begin{aligned} \mathbf {f}(\mathbf {x})=\left ( \begin{array}{c} f_{1}(x_{1}\cdots x_{n})\\ \vdots\\ f_{n}(x_{1}\cdots x_{n}) \end{array} \right )=0. \end{aligned}$$
(6.41)

The first order Newton-Raphson method results from linearization of

$$\begin{aligned} 0=\mathbf {f}(\mathbf {x})=\mathbf {f}\bigl(\mathbf {x}^{0}\bigr)+J\bigl( \mathbf {x}^{0}\bigr) \bigl(\mathbf {x}-\mathbf {x}^{0}\bigr)+\cdots \end{aligned}$$
(6.42)

with the Jacobian matrix

$$\begin{aligned} J=\left ( \begin{array}{c@{\quad}c@{\quad}c} \frac{\partial f_{1}}{\partial x_{1}} & \cdots& \frac{\partial f_{1}}{\partial x_{n}}\\ \vdots& \ddots& \vdots\\ \frac{\partial f_{n}}{\partial x_{1}} & \cdots& \frac{\partial f_{n}}{\partial x_{n}} \end{array} \right ). \end{aligned}$$
(6.43)

If the Jacobian matrix is not singular the equation

$$\begin{aligned} 0=\mathbf {f}\bigl(\mathbf {x}^{0}\bigr)+J\bigl(\mathbf {x}^{0}\bigr) \bigl(\mathbf {x}-\mathbf {x}^{0}\bigr) \end{aligned}$$
(6.44)

can be solved by

$$\begin{aligned} \mathbf {x}=\mathbf {x}^{0}- \bigl(J\bigl(\mathbf {x}^{0}\bigr) \bigr)^{-1}\mathbf {f}\bigl(\mathbf {x}^{0}\bigr). \end{aligned}$$
(6.45)

This can be repeated iteratively

$$\begin{aligned} \mathbf {x}^{(r+1)}=\mathbf {x}^{(r)}-\bigl(J\bigl(\mathbf {x}^{(r)} \bigr)\bigr)^{-1}\mathbf {f}\bigl(\mathbf {x}^{(r)}\bigr). \end{aligned}$$
(6.46)
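A sketch of iteration (6.46); instead of inverting the Jacobian, the linear system (6.44) is solved in every step. The example system and the starting point are illustrative:

```python
import numpy as np

def newton_system(f, jac, x0, eps=1e-12, maxit=50):
    """Multidimensional Newton-Raphson iteration (6.46)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) < eps:
            break
        x = x + np.linalg.solve(jac(x), -fx)   # solve J dx = -f and step
    return x

# example: intersection of the circle x^2 + y^2 = 4 with the line y = x
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [-1.0, 1.0]])
print(newton_system(f, jac, [1.0, 0.5]))
```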

1.9 Quasi-Newton Methods

Calculation of the Jacobian matrix can be very time consuming. Quasi-Newton methods use instead an approximation to the Jacobian which is updated during each iteration. Defining the differences

$$\begin{aligned} \mathbf {d}^{(r)}=\mathbf {x}^{(r+1)}-\mathbf {x}^{(r)}& \end{aligned}$$
(6.47)
$$\begin{aligned} \mathbf {y}^{(r)}=\mathbf {f}\bigl(\mathbf {x}^{(r+1)}\bigr)-\mathbf {f}\bigl( \mathbf {x}^{(r)}\bigr)& \end{aligned}$$
(6.48)

we obtain from the truncated Taylor series

$$\begin{aligned} \mathbf {f}\bigl(\mathbf {x}^{(r+1)}\bigr)=\mathbf {f}\bigl(\mathbf {x}^{(r)} \bigr)+J\bigl(\mathbf {x}^{(r)}\bigr) \bigl(\mathbf {x}^{(r+1)}- \mathbf {x}^{(r)}\bigr) \end{aligned}$$
(6.49)

the so-called Quasi-Newton or secant condition

$$\begin{aligned} \mathbf {y}^{(r)}=J\bigl(\mathbf {x}^{(r)}\bigr)\mathbf {d}^{(r)}. \end{aligned}$$
(6.50)

We attempt to construct a family of successive approximation matrices \(J_{r}\) so that, if J were a constant, the procedure would become consistent with the Quasi-Newton condition. Then for the new update \(J_{r+1}\) we have

$$\begin{aligned} J_{r+1}\mathbf {d}^{(r)}=\mathbf {y}^{(r)}. \end{aligned}$$
(6.51)

Since \(\mathbf{d}^{(r)},\mathbf{y}^{(r)}\) are already known, these are only n equations for the \(n^{2}\) elements of \(J_{r+1}\). To specify \(J_{r+1}\) uniquely, additional conditions are required. For instance, it is reasonable to assume that

$$\begin{aligned} J_{r+1}\mathbf {u}=J_{r}\mathbf {u}\quad\mbox{for all }\mathbf {u}\perp \mathbf {d}^{(r)}. \end{aligned}$$
(6.52)

Then \(J_{r+1}\) differs from \(J_{r}\) only by a rank one updating matrix

$$\begin{aligned} J_{r+1}=J_{r}+\mathbf {u}\,\mathbf {d}^{(r)T}. \end{aligned}$$
(6.53)

From the secant condition we obtain

$$\begin{aligned} J_{r+1}\mathbf {d}^{(r)}=J_{r}\mathbf {d}^{(r)}+ \mathbf {u}\bigl(\mathbf {d}^{(r)T}\mathbf {d}^{(r)}\bigr)= \mathbf {y}^{(r)} \end{aligned}$$
(6.54)

hence

$$\begin{aligned} \mathbf {u}=\frac{1}{|d^{(r)}|^{2}} \bigl(\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} \bigr). \end{aligned}$$
(6.55)

This gives Broyden’s update formula [42]

$$\begin{aligned} J_{r+1}=J_{r}+\frac{1}{|d^{(r)}|^{2}} \bigl(\mathbf {y}^{(r)}-J_{r} \mathbf {d}^{(r)} \bigr)\mathbf {d}^{(r)T}. \end{aligned}$$
(6.56)

To update the inverse Jacobian matrix, the Sherman-Morrison formula [236]

$$\begin{aligned} \bigl(A+\mathbf {u}\mathbf {v}^{T}\bigr)^{-1}=A^{-1}- \frac{A^{-1}\mathbf {u}\mathbf {v}^{T}A^{-1}}{1+\mathbf {v}^{T}A^{-1}\mathbf {u}} \end{aligned}$$
(6.57)

can be applied to have

$$\begin{aligned} J_{r+1}^{-1} =&J_{r}^{-1}-\frac{J_{r}^{-1}\frac{1}{|d^{(r)}|^{2}} (\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} )\mathbf {d}^{(r)T}J_{r}^{-1}}{1+\frac{1}{|d^{(r)}|^{2}}\mathbf {d}^{(r)T}J_{r}^{-1} (\mathbf {y}^{(r)}-J_{r}\mathbf {d}^{(r)} )} \\ =&J_{r}^{-1}-\frac{ (J_{r}^{-1}\mathbf {y}^{(r)}-\mathbf {d}^{(r)} )\mathbf {d}^{(r)T}J_{r}^{-1}}{\mathbf {d}^{(r)T}J_{r}^{-1}\mathbf {y}^{(r)}}. \end{aligned}$$
(6.58)
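A sketch of Broyden's method which iterates the inverse Jacobian with the Sherman-Morrison update (6.58). Starting from the identity matrix as initial inverse Jacobian is an assumption of this sketch; in practice a line search or damping of the step may be necessary:

```python
import numpy as np

def broyden(f, x0, eps=1e-10, maxit=200):
    """Quasi-Newton (Broyden) iteration with the inverse Jacobian update (6.58)."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    B = np.eye(len(x))                 # approximate inverse Jacobian (assumption)
    for _ in range(maxit):
        if np.linalg.norm(fx) < eps:
            break
        d = -B @ fx                    # quasi-Newton step
        x_new = x + d
        f_new = f(x_new)
        y = f_new - fx
        By = B @ y
        # Sherman-Morrison update of the inverse Jacobian, cf. (6.58)
        B = B + np.outer(d - By, d @ B) / (d @ By)
        x, fx = x_new, f_new
    return x
```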

2 Function Minimization

Minimization or maximization of a function is a fundamental task in numerical mathematics and is closely related to root finding. If the function f(x) is continuously differentiable, then at the extremal points the derivative is zero

$$\begin{aligned} \frac{\mathrm {d}f}{\mathrm {d}x}=0. \end{aligned}$$
(6.59)

Hence, in principle root finding methods can be applied to locate local extrema of a function. However, in some cases the derivative cannot be easily calculated or the function is not even differentiable. Then derivative-free methods similar to bisection for root finding have to be used.

2.1 The Ternary Search Method

Ternary search is a simple method to determine the minimum of a unimodal function f(x). Initially we have to find an interval \([a_{0},b_{0}]\) which is certain to contain the minimum. Then the interval is divided into three equal parts \([a_{0},c_{0}]\), \([c_{0},d_{0}]\), \([d_{0},b_{0}]\) and either the first or the last of the three intervals is excluded (Fig. 6.13). The procedure is repeated with the remaining interval \([a_{1},b_{1}]=[a_{0},d_{0}]\) or \([a_{1},b_{1}]=[c_{0},b_{0}]\).

Fig. 6.13 Ternary search method

Each step needs two function evaluations and reduces the interval width by a factor of 2/3 until the maximum possible precision is obtained. This precision can be estimated by considering a differentiable function which can be expanded around the minimum \(x_{0}\) as

$$\begin{aligned} f(x)=f(x_{0})+\frac{(x-x_{0})^{2}}{2}f''(x_{0})+ \cdots. \end{aligned}$$
(6.60)

Numerically calculated function values f(x) and \(f(x_{0})\) only differ if

$$\begin{aligned} \frac{(x-x_{0})^{2}}{2}f''(x_{0})> \varepsilon_{M}f(x_{0}) \end{aligned}$$
(6.61)

which limits the possible numerical accuracy to

$$\begin{aligned} \varepsilon(x_{0})=\mbox{min}|x-x_{0}|=\sqrt{ \frac {2f(x_{0})}{f''(x_{0})}\varepsilon_{M}} \end{aligned}$$
(6.62)

and for reasonably well behaved functions (Fig. 6.14) we have the rule of thumb [207]

$$\begin{aligned} \varepsilon(x_{0})\approx\sqrt{\varepsilon_{M}}. \end{aligned}$$
(6.63)
Fig. 6.14 (Ternary search method)  The minimum of the function \(f(x)=1+0.01x^{2}+0.1x^{4}\) is determined with the ternary search method. Each iteration needs two function evaluations. After 50 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined with higher precision than \(\sqrt{\varepsilon_{M}}\approx10^{-8}\) (6.63)

However, it may be impossible to reach even this precision, if the quadratic term of the Taylor series vanishes (Fig. 6.15).

Fig. 6.15 (Ternary search method for a higher order minimum)  The minimum of the function \(f(x)=1+0.1x^{4}\) is determined with the ternary search method. Each iteration needs two function evaluations. After 30 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined with higher precision than \(\sqrt[4]{\varepsilon_{M}}\approx10^{-4}\)

The algorithm can be formulated as follows:

  • iteration

$$\begin{aligned} \begin{aligned}[c] &\mbox{if }(b-a)<\delta\mbox{ then exit} \\ &c=a+\frac{1}{3}(b-a)\quad d=a+\frac{2}{3}(b-a) \\ &f_{c}=f(c)\quad f_{d}=f(d) \\ &\mbox{if }f_{c}<f_{d}\mbox{ then }b=d\mbox{ else }a=c \end{aligned} \end{aligned}$$
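A compact Python sketch of this loop (tolerance and the returned midpoint are illustrative choices):

```python
def ternary_search(f, a, b, delta=1e-8):
    """Ternary search for the minimum of a unimodal function on [a, b]."""
    while b - a > delta:
        c = a + (b - a) / 3
        d = a + 2 * (b - a) / 3
        if f(c) < f(d):
            b = d          # the minimum cannot lie in [d, b]
        else:
            a = c          # the minimum cannot lie in [a, c]
    return 0.5 * (a + b)

# example: minimum of 1 + 0.01 x**2 + 0.1 x**4 (cf. Fig. 6.14)
print(ternary_search(lambda x: 1 + 0.01 * x**2 + 0.1 * x**4, -1.0, 1.0))
```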

2.2 The Golden Section Search Method (Brent’s Method)

To bracket a local minimum of a unimodal function f(x) three points a,b,c are necessary (Fig. 6.16) with

$$\begin{aligned} f(a)>f(b)\quad f(c)>f(b). \end{aligned}$$
(6.64)
Fig. 6.16 (Golden section search method)  A local minimum of the function f(x) is bracketed by three points a, b, c. To reduce the uncertainty of the minimum position a new point ξ is chosen in the interval a<ξ<c and either a or c is dropped according to the relation of the function values. For the example shown a has to be replaced by ξ

The position of the minimum can be determined iteratively by choosing a new value ξ in the interval a<ξ<c and dropping either a or c, depending on the ratio of the function values. A reasonable choice for ξ can be found as follows (Fig. 6.17) [147, 207]. Let us denote the relative positions of the middle point and the trial point as

$$\begin{aligned} \begin{aligned}[c] & \frac{b-a}{c-a}=\beta\quad\frac{c-b}{c-a}=1-\beta\quad \frac{b-a}{c-b}=\frac{\beta}{1-\beta}\quad\frac{\xi-b}{c-a}=t \\ &\frac{\xi-a}{c-a}=\frac{\xi-b+b-a}{c-a}=t+\beta. \end{aligned} \end{aligned}$$
(6.65)

The relative width of the new interval will be

$$\begin{aligned} \frac{c-\xi}{c-a}=(1-\beta-t)\quad\mbox{or}\quad\frac{b-a}{c-a}=\beta \quad\mbox{if }a<\xi<b& \end{aligned}$$
(6.66)
$$\begin{aligned} \frac{\xi-a}{c-a}=(t+\beta)\quad\mbox{or}\quad\frac{c-b}{c-a}=(1-\beta ) \quad\mbox{if }b<\xi<c.& \end{aligned}$$
(6.67)

The golden section search method requires that

$$\begin{aligned} t=1-2\beta=\frac{c+a-2b}{c-a}=\frac{(c-b)-(b-a)}{c-a}. \end{aligned}$$
(6.68)

Otherwise it would be possible that the larger interval width is selected many times, slowing down the convergence. The value of t is positive if \(c-b>b-a\) and negative if \(c-b<b-a\), hence the trial point is always in the larger of the two intervals. In addition the golden section search method requires that the ratio of the spacings remains constant. Therefore we set

$$\begin{aligned} \frac{\beta}{1-\beta}=-\frac{t+\beta}{t}=-\frac{1-\beta}{t}\quad\mbox{if }a< \xi<b& \end{aligned}$$
(6.69)
$$\begin{aligned} \frac{\beta}{1-\beta}=\frac{t}{1-\beta-t}=\frac{t}{\beta}\quad\mbox{if }b< \xi<c.& \end{aligned}$$
(6.70)

Eliminating t we obtain for a<ξ<b the equation

$$\begin{aligned} \frac{(\beta-1)}{\beta}\bigl(\beta^{2}+\beta-1\bigr)=0. \end{aligned}$$
(6.71)

Besides the trivial solution β=1 there is only one positive solution

$$\begin{aligned} \beta=\frac{\sqrt{5}-1}{2}\approx0.618. \end{aligned}$$
(6.72)

For b<ξ<c we end up with

$$\begin{aligned} \frac{\beta}{\beta-1}\bigl(\beta^{2}-3\beta+1\bigr)=0 \end{aligned}$$
(6.73)

which has the nontrivial positive solution

$$\begin{aligned} \beta=\frac{3-\sqrt{5}}{2}\approx0.382. \end{aligned}$$
(6.74)

Hence the lengths of the two intervals [a,b], [b,c] have to be in the golden ratio \(\varphi=\frac{1+\sqrt{5}}{2}\approx1.618\) which gives the method its name. Using the golden ratio the width of the interval bracketing the minimum reduces by a factor of 0.618 (Figs. 6.18, 6.19).

Fig. 6.17 Golden section search method

Fig. 6.18 (Golden section search method)  The minimum of the function \(f(x)=1+0.01x^{2}+0.1x^{4}\) is determined with the golden section search method. Each iteration needs only one function evaluation. After 40 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined to higher precision than \(\sqrt{\varepsilon_{M}}\approx10^{-8}\) (6.63)

Fig. 6.19 (Golden section search for a higher order minimum)  The minimum of the function \(f(x)=1+0.1x^{4}\) is determined with the golden section search method. Each iteration needs only one function evaluation. After 28 iterations the function minimum \(f_{\min}=1\) is reached to machine precision \(\varepsilon_{M}\approx10^{-16}\). The position of the minimum \(x_{\min}\) cannot be determined to higher precision than \(\sqrt[4]{\varepsilon_{M}}\approx10^{-4}\)

The algorithm can be formulated as follows:

$$\begin{aligned} \begin{aligned}[b] &\mbox{if }c-a<\delta\mbox{ then exit} \\ &\mbox{if }(b-a)\ge(c-b)\mbox{ then }\{ \\ &x=0.618 b+0.382 a \\ &f_{x}=f(x) \\ &\mbox{if }f_{x}<f_{b}\mbox{ then }\{c=b\quad b=x\quad f_{c}=f_{b}\quad f_{b}=f_{x}\} \\ &\mbox{else }a=x\quad f_{a}=f_{x}\} \\ & \mbox{if }(b-a)<(c-b)\mbox{ then }\{ \\ &x=0.618 b+0.382 c \\ &f_{x}=f(x) \\ & \mbox{if }f_{x}<f_{b}\mbox{ then }\{a=b\quad b=x\quad f_{a}=f_{b}\quad f_{b}=f_{x}\} \\ &\mbox{else }c=x\quad f_{c}=f_{x}\} \end{aligned} \end{aligned}$$
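The same algorithm as a Python sketch; the bracketing triple a < b < c with f(b) < f(a), f(c) has to be supplied, for instance by the exponential search described next (names and tolerance are illustrative):

```python
def golden_section(f, a, b, c, delta=1e-8):
    """Golden section search for a minimum bracketed by a < b < c."""
    fb = f(b)
    while c - a > delta:
        if (b - a) >= (c - b):             # trial point in the larger interval [a, b]
            x = 0.618 * b + 0.382 * a
            fx = f(x)
            if fx < fb:
                c, b, fb = b, x, fx
            else:
                a = x
        else:                              # trial point in the larger interval [b, c]
            x = 0.618 * b + 0.382 * c
            fx = f(x)
            if fx < fb:
                a, b, fb = b, x, fx
            else:
                c = x
    return b

# example: minimum of 1 + 0.01 x**2 + 0.1 x**4 (cf. Fig. 6.18)
print(golden_section(lambda x: 1 + 0.01 * x**2 + 0.1 * x**4, -1.0, 0.3, 1.0))
```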

To start the method we need three initial points which can be found by Brent’s exponential search method (Fig. 6.20). Begin with three points

$$\begin{aligned} a_{0},\quad b_{0}=a_{0}+h,\quad c_{0}=b_{0}+1.618 h \end{aligned}$$
(6.75)

where \(h\) is a suitable initial step width which depends on the problem. If the minimum is not already bracketed then, if necessary, exchange \(a_{0}\) and \(b_{0}\) to have

$$\begin{aligned} f(a_{0})>f(b_{0})>f(c_{0}). \end{aligned}$$
(6.76)

Then replace the three points by

$$\begin{aligned} a_{1}=b_{0}\quad b_{1}=c_{0}\quad c_{1}=c_{0}+1.618(c_{0}-b_{0}) \end{aligned}$$
(6.77)

and repeat this step until

$$\begin{aligned} f(b_{n})<f(c_{n}) \end{aligned}$$
(6.78)

or n exceeds a given maximum number. In this case no minimum can be found and we should check if the initial step width was too large.

Fig. 6.20 Brent's exponential search

Brent’s method can be improved by making use of derivatives and by combining the golden section search with parabolic interpolation [207].

2.3 Minimization in Multidimensions

We search for local minima (or maxima) of a function

$$\begin{aligned} h(\mathbf{x}) \end{aligned}$$

which is at least two times differentiable. In the following we denote the gradient vector by

$$\begin{aligned} \mathbf{g}^{T}(\mathbf{x})=\biggl(\frac{\partial h}{\partial x_{1}},\ldots, \frac{\partial h}{\partial x_{n}}\biggr) \end{aligned}$$
(6.79)

and the matrix of second derivatives (Hessian) by

$$\begin{aligned} H= \biggl(\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}h \biggr). \end{aligned}$$
(6.80)

The very popular class of direction set methods proceeds as follows (Fig. 6.21). Starting from an initial guess \(\mathbf{x}_{0}\), a set of direction vectors \(\mathbf{s}_{r}\) and step lengths \(\lambda_{r}\) is determined such that the series of vectors

$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}+\lambda_{r} \mathbf{s}_{r} \end{aligned}$$
(6.81)

approaches the minimum of h(x). The method stops if the norm of the gradient becomes sufficiently small or if no lower function value can be found.

Fig. 6.21 (Direction set minimization)  Starting from an initial guess \(\mathbf{x}_{0}\) a local minimum is approached by making steps along a set of direction vectors \(\mathbf{s}_{r}\)

2.4 Steepest Descent Method

The simplest choice, which is known as the method of gradient descent or steepest descent, is to go in the direction of the negative gradient

$$\begin{aligned} \mathbf{s}_{r}=-\mathbf{g}_{r} \end{aligned}$$
(6.82)

and to determine the step length by minimizing h along this direction

$$\begin{aligned} h(\lambda)=h(\mathbf{x}_{r}-\lambda\mathbf{g}_{r})= \mbox{min}. \end{aligned}$$
(6.83)

Obviously two consecutive steps are orthogonal to each other since

$$\begin{aligned} 0=\frac{\partial}{\partial\lambda}h(\mathbf{x}_{r+1}-\lambda \mathbf{g}_{r})_{|\lambda=0}= -\mathbf{g}_{r+1}^{T} \mathbf{g}_{r}. \end{aligned}$$
(6.84)

This can lead to a zig-zagging behavior and a very slow convergence of this method (Figs. 6.22, 6.23).
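A sketch of the steepest descent iteration; delegating the one-dimensional minimization (6.83) to SciPy's scalar minimizer is an implementation choice, not part of the text:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(h, grad, x0, eps=1e-8, maxit=10000):
    """Gradient descent with a line search along the negative gradient (6.82)-(6.83)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        lam = minimize_scalar(lambda l: h(x - l * g)).x   # minimize along -g
        x = x - lam * g
    return x
```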

Fig. 6.22 (Function minimization)  The minimum of the Rosenbrock function \(h(x,y)=100(y-x^{2})^{2}+(1-x)^{2}\) is determined with different methods. Conjugate gradients converge much faster than steepest descent. Starting at (x,y)=(0,2), Newton-Raphson reaches the minimum at x=y=1 within only 5 iterations to machine precision

Fig. 6.23 (Minimization of the Rosenbrock function)  Left: Newton-Raphson finds the minimum after 5 steps within machine precision. Middle: conjugate gradients reduce the gradient to \(4\times10^{-14}\) after 265 steps. Right: the steepest descent method needs 20000 steps to reduce the gradient to \(5\times10^{-14}\). Red lines show the minimization pathway. Colored areas indicate the function value (light blue: <0.1, grey: 0.1…0.5, green: 5…50, pink: >100). Screenshots taken from Problem 6.2

2.5 Conjugate Gradient Method

This method is similar to the steepest descent method but the search direction is iterated according to

$$\begin{aligned} \mathbf{s}_{0}=-\mathbf{g}_{0}& \end{aligned}$$
(6.85)
$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}+\lambda_{r} \mathbf{s}_{r}& \end{aligned}$$
(6.86)
$$\begin{aligned} \mathbf{s}_{r+1}=-\mathbf{g}_{r+1}+\beta_{r+1} \mathbf{s}_{r}& \end{aligned}$$
(6.87)

where \(\lambda_{r}\) is chosen to minimize \(h(\mathbf{x}_{r+1})\) and the simplest choice for β is made by Fletcher and Reeves [89]

$$\begin{aligned} \beta_{r+1}=\frac{g_{r+1}^{2}}{g_{r}^{2}}. \end{aligned}$$
(6.88)
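A sketch of the Fletcher-Reeves scheme (6.85)-(6.88), again with a SciPy scalar minimizer for the line search; the Rosenbrock example corresponds to (6.105):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(h, grad, x0, eps=1e-8, maxit=1000):
    """Conjugate gradient minimization with the Fletcher-Reeves choice (6.88)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                                # first direction (6.85)
    for _ in range(maxit):
        if np.linalg.norm(g) < eps:
            break
        lam = minimize_scalar(lambda l: h(x + l * s)).x   # line minimization
        x = x + lam * s
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)                  # Fletcher-Reeves (6.88)
        s = -g_new + beta * s                             # new direction (6.87)
        g = g_new
    return x

# example: Rosenbrock function (6.105), started at (0, 2)
h = lambda v: 100 * (v[1] - v[0]**2)**2 + (1 - v[0])**2
grad = lambda v: np.array([-400 * v[0] * (v[1] - v[0]**2) - 2 * (1 - v[0]),
                           200 * (v[1] - v[0]**2)])
print(conjugate_gradient(h, grad, [0.0, 2.0]))
```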

2.6 Newton-Raphson Method

The first order Newton-Raphson method uses the iteration

$$\begin{aligned} \mathbf{x}_{r+1}=\mathbf{x}_{r}-H(\mathbf{x}_{r})^{-1} \mathbf{g}(\mathbf {x}_{r}). \end{aligned}$$
(6.89)

The search direction is

$$\begin{aligned} \mathbf{s}=H^{-1}\mathbf{g} \end{aligned}$$
(6.90)

and the step length is λ=1. This method converges fast if the starting point is close to the minimum. However, calculation of the Hessian can be very time consuming (Fig. 6.22).
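A sketch of iteration (6.89); as in the root finding case, the linear system H s = g is solved instead of forming the inverse Hessian explicitly:

```python
import numpy as np

def newton_minimize(grad, hess, x0, eps=1e-12, maxit=100):
    """Newton-Raphson minimization (6.89) with step length lambda = 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - np.linalg.solve(hess(x), g)   # x -> x - H^{-1} g
    return x
```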

2.7 Quasi-Newton Methods

Calculation of the full Hessian matrix as needed for the Newton-Raphson method can be very time consuming. Quasi-Newton methods use instead an approximation to the Hessian which is updated during each iteration. From the Taylor series

$$\begin{aligned} h(\mathbf{x})=h_{0}+\mathbf{b}^{T}\mathbf{x}+ \frac{1}{2}\mathbf {x}^{T}H\mathbf{x}+\cdots \end{aligned}$$
(6.91)

we obtain the gradient

$$\begin{aligned} \mathbf{g}(\mathbf{x}_{r})=\mathbf{b}+H\mathbf{x}_{r}+ \cdots=\mathbf {g}(\mathbf{x}_{r-1})+H(\mathbf{x}_{r}- \mathbf{x}_{r-1})+\cdots. \end{aligned}$$
(6.92)

Defining the differences

$$\begin{aligned} \mathbf{d}_{r}=\mathbf{x}_{r+1}-\mathbf{x}_{r}& \end{aligned}$$
(6.93)
$$\begin{aligned} \mathbf{y}_{r}=\mathbf{g}_{r+1}-\mathbf{g}_{r}& \end{aligned}$$
(6.94)

and neglecting higher order terms we obtain the Quasi-Newton or secant condition

$$\begin{aligned} H\mathbf{d}_{r}=\mathbf{y}_{r}. \end{aligned}$$
(6.95)

We want to approximate the true Hessian by a series of matrices \(H_{r}\) which are updated during each iteration to sum up all the information gathered so far. Since the Hessian is symmetric and positive definite, this also has to be demanded for the \(H_{r}\). This cannot be achieved by a rank one update matrix. Popular methods use a symmetric rank two update of the form

$$\begin{aligned} H_{r+1}=H_{r}+\alpha \mathbf {u}\mathbf {u}^{T}+\beta \mathbf {v}\mathbf {v}^{T}. \end{aligned}$$
(6.96)

The Quasi-Newton condition then gives

$$\begin{aligned} H_{r+1}\mathbf {d}_{r}=H_{r}\mathbf {d}_{r}+ \alpha\bigl(\mathbf {u}^{T}\mathbf {d}_{r}\bigr)\mathbf {u}+\beta\bigl( \mathbf {v}^{T}\mathbf {d}_{r}\bigr)\mathbf {v}=\mathbf {y}_{r} \end{aligned}$$
(6.97)

hence \(H_{r}\mathbf{d}_{r}-\mathbf{y}_{r}\) must be a linear combination of u and v. Making the simple choice

$$\begin{aligned} \mathbf {u}=\mathbf {y}_{r}\quad \mathbf {v}=H_{r}\mathbf {d}_{r} \end{aligned}$$
(6.98)

and assuming that these two vectors are linearly independent, we find

$$\begin{aligned} \beta=-\frac{1}{(\mathbf {v}^{T}\mathbf {d}_{r})}\quad=-\frac{1}{(\mathbf {d}_{r}^{T}H_{r}\mathbf {d}_{r})}& \end{aligned}$$
(6.99)
$$\begin{aligned} \alpha=\frac{1}{(\mathbf {u}^{T}\mathbf {d}_{r})}=\frac{1}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})}& \end{aligned}$$
(6.100)

which together defines the very popular BFGS (Broyden, Fletcher, Goldfarb, Shanno) method [43, 86, 105, 235]

$$\begin{aligned} H_{r+1}=H_{r}+\frac{\mathbf{y}_{r}\mathbf{y}_{r}^{T}}{\mathbf {y}_{r}^{T}\mathbf{d}_{r}}-\frac{(H_r\mathbf{d}_{r})(H_{r}\mathbf {d}_{r})^{T}}{\mathbf{d}_{r}^{T}H_{r}\mathbf{d}_{r}}. \end{aligned}$$
(6.101)

Alternatively, the DFP method of Davidon, Fletcher and Powell directly updates the inverse Hessian matrix \(B=H^{-1}\) according to

$$\begin{aligned} B_{r+1}=B_{r}+\frac{\mathbf {d}_{r}\mathbf {d}_{r}^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac{(B_{r}\mathbf {y}_{r})(B_{r}\mathbf {y}_{r})^{T}}{\mathbf {y}_{r}^{T}B_{r}\mathbf {y}_{r}}. \end{aligned}$$
(6.102)

Both of these methods can be inverted with the help of the Sherman-Morrison formula to give

$$\begin{aligned} B_{r+1}=B_{r}+\frac{(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})\mathbf {d}_{r}^{T}+\mathbf {d}_{r}(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac {(\mathbf {d}_{r}-B_{r}\mathbf {y}_{r})^{T}\mathbf {y}_{r}}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})^{2}}\mathbf {d}_{r} \mathbf {d}_{r}^{T}& \\ \end{aligned}$$
(6.103)
$$\begin{aligned} H_{r+1}=H_{r}+\frac{(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})\mathbf {y}_{r}^{T}+\mathbf {y}_{r}(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})^{T}}{\mathbf {y}_{r}^{T}\mathbf {d}_{r}}-\frac{(\mathbf {y}_{r}-H_{r}\mathbf {d}_{r})^{T}\mathbf {d}_{r}}{(\mathbf {y}_{r}^{T}\mathbf {d}_{r})^{2}} \mathbf {y}_{r}\mathbf {y}_{r}^{T}.& \end{aligned}$$
(6.104)
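A sketch of a quasi-Newton minimizer based on the inverse-Hessian update (6.103); the identity matrix as initial inverse Hessian and the SciPy line search are assumptions of this sketch (alternatively, scipy.optimize.minimize with method='BFGS' provides a ready-made implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def quasi_newton(h, grad, x0, eps=1e-8, maxit=500):
    """Quasi-Newton minimization using the inverse-Hessian update (6.103)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    B = np.eye(len(x))                        # approximate inverse Hessian (assumption)
    for _ in range(maxit):
        if np.linalg.norm(g) < eps:
            break
        s = -B @ g                            # quasi-Newton search direction
        lam = minimize_scalar(lambda l: h(x + l * s)).x
        d = lam * s
        x_new = x + d
        g_new = grad(x_new)
        y = g_new - g
        By = B @ y
        yd = y @ d                            # yd > 0 preserves positive definiteness
        B = (B + (np.outer(d - By, d) + np.outer(d, d - By)) / yd
             - ((d - By) @ y / yd**2) * np.outer(d, d))   # update (6.103)
        x, g = x_new, g_new
    return x
```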

3 Problems

Problem 6.1

(Root finding methods)

This computer experiment searches roots of several test functions:

$$\begin{aligned} &f(x)=x^{n}-2\quad n=1,2,3,4\quad(\mbox{Fig.~6.10}) \\ &f(x)=5\sin(5x) \\ &f(x)= \bigl(\cos(2x) \bigr)^{2}-x^{2} \\ &f(x)=5 \bigl(\sqrt{|x+2|}-1 \bigr) \\ &f(x)=\mathrm {e}^{-x}\ln x \\ &f(x)=(x-1)^{3}\quad\mbox{(Fig.~6.11)} \\ &f(x)=x^{25}\quad\mbox{(Fig.~6.12)} \end{aligned}$$

You can vary the initial interval or starting value and compare the behavior of different methods:

  • bisection

  • regula falsi

  • Dekker’s method

  • Brent’s method

  • Chandrupatla’s method

  • Newton-Raphson method

Problem 6.2

(Stationary points)

This computer experiment searches a local minimum of the Rosenbrock function

$$\begin{aligned} h(x,y)=100\bigl(y-x^{2}\bigr)^{2}+(1-x)^{2}. \end{aligned}$$
(6.105)
  • The method of steepest descent minimizes h(x,y) along the search direction

    $$\begin{aligned} s_{x}^{(n)}=-g_{x}^{(n)}=-400x_{n} \bigl(x_{n}^{2}-y_{n}\bigr)-2(x_{n}-1)& \end{aligned}$$
    (6.106)
    $$\begin{aligned} s_{y}^{(n)}=-g_{y}^{(n)}=-200 \bigl(y_{n}-x_{n}^{2}\bigr).& \end{aligned}$$
    (6.107)
  • Conjugate gradients make use of the search direction

    $$\begin{aligned} s_{x}^{(n)}=-g_{x}^{(n)}+ \beta_{n}s_{x}^{(n-1)} \end{aligned}$$
    (6.108)
$$\begin{aligned} s_{y}^{(n)}=-g_{y}^{(n)}+ \beta_{n}s_{y}^{(n-1)}. \end{aligned}$$
(6.109)
  • The Newton-Raphson method needs the inverse Hessian

    $$\begin{aligned} H^{-1}=\frac{1}{\det(H)}\left ( \begin{array}{cc} h_{yy} & -h_{xy}\\ -h_{xy} & h_{xx} \end{array} \right ) \end{aligned}$$
    (6.110)
$$\begin{aligned} \det(H)=h_{xx}h_{yy}-h_{xy}^{2}& \end{aligned}$$
(6.111)
$$\begin{aligned} h_{xx}=1200x^{2}-400y+2\quad h_{yy}=200\quad h_{xy}=-400x& \end{aligned}$$
(6.112)

and iterates according to

$$\begin{aligned} \left ( \begin{array}{c} x_{n+1}\\ y_{n+1} \end{array} \right )=\left ( \begin{array}{c} x_{n}\\ y_{n} \end{array} \right )-H^{-1}\left ( \begin{array}{c} g_{x}^{n}\\ g_{y}^{n} \end{array} \right ). \end{aligned}$$
(6.113)

You can choose an initial point \((x_{0},y_{0})\). The iteration stops if the gradient norm falls below \(10^{-14}\) or if the line search fails to find a lower function value.