How Accurate Does Newton Have to Be?

Kjelgaard Mikkelsen, Carl Christian; López-Villellas, Lorién; García-Risueño, Pablo

doi:10.1007/978-3-031-30442-2_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13826)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. We briefly discuss implications for parallel solvers.

P. García-Risueño: Independent scholar.

1 Introduction

Let $\varOmega \subseteq \mathbb {R}^n$ be open, let $F \in C^1(\varOmega , \mathbb {R}^n)$ and consider the problem of solving

$$\begin{aligned} F(x) = 0. \end{aligned}$$

If the Jacobian $F'$ of F is nonsingular, then Newton’s method is given by

$$\begin{aligned} x_{k+1} = x_k - s_k, \quad F'(x_k)s_k = F(x_k). \end{aligned}$$

(1)

A quasi-Newton method is any iteration of the form

$$\begin{aligned} y_{k+1} = y_k - t_k, \quad F'(y_k)t_k \approx F(y_k). \end{aligned}$$

(2)
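The following sketch makes the distinction between (1) and (2) concrete. It compares Newton's method with the chord method, a simple quasi-Newton method that reuses $F'(y_0)$ in every step. The two-dimensional test system and all function names are hypothetical choices made for illustration; they do not appear in the paper.

```python
import math

def F(x):
    """Hypothetical test system; its zero in the first quadrant is (sqrt(2), sqrt(2))."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

def jacobian(x):
    """F'(x) for the test system."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]

def solve2(J, b):
    """Solve the 2x2 linear system J s = b by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - b[0] * J[1][0]) / det]

def iterate(x0, steps, freeze_jacobian):
    """x_{k+1} = x_k - s_k with F'(x_k) s_k = F(x_k) (Newton), or with
    F'(x_0) in place of F'(x_k) (chord method, a simple quasi-Newton method)."""
    z = math.sqrt(2.0)
    x, J0 = list(x0), jacobian(x0)
    errors = []
    for _ in range(steps):
        J = J0 if freeze_jacobian else jacobian(x)
        s = solve2(J, F(x))
        x = [x[0] - s[0], x[1] - s[1]]
        errors.append(max(abs(x[0] - z), abs(x[1] - z)))
    return x, errors

_, newton_err = iterate([1.5, 1.0], 6, freeze_jacobian=False)
_, chord_err = iterate([1.5, 1.0], 40, freeze_jacobian=True)
print(["%.1e" % e for e in newton_err[:4]])   # error roughly squares each step
print(["%.1e" % e for e in chord_err[:4]])    # error shrinks by a roughly constant factor
```

Both iterations compute the correction from a Jacobian. Only the matrix that is inverted differs, and that difference alone turns quadratic convergence into linear convergence.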

In exact arithmetic, we expect local quadratic convergence from Newton’s method [7]. Quasi-Newton methods normally converge locally and at least linearly, and some methods, such as the secant method, converge superlinearly [5, 8]. In finite precision arithmetic, we cannot expect convergence in the strict mathematical sense, and we must settle for stagnation near a zero [11]. In this paper we analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics.

2 Auxiliary Results

The line segment l(x, y) between x and y is defined as follows:

$$\begin{aligned} l(x,y) = \{ tx + (1-t) y \, : \, t \in [0,1] \}. \end{aligned}$$

The following lemma is a standard result that bounds the difference between F(x) and F(y) if the line segment l(x, y) is contained in the domain of F.

Lemma 1

Let $\varOmega \subseteq \mathbb {R}^n$ be open and let $F \in C^1(\varOmega ,\mathbb {R}^n)$. If $l(x,y) \subset \varOmega $, then

$$\begin{aligned} F(x) - F(y) = \int _0^1\,F'(tx + (1-t)y)(x-y) dt \end{aligned}$$

and

$$\begin{aligned} \Vert F(x) - F(y)\Vert \le M \Vert x-y\Vert . \end{aligned}$$

where

$$\begin{aligned} M = \sup \{ \Vert F'(tx + (1-t)y) \Vert \, : \, t \in [0,1] \}. \end{aligned}$$

It is convenient to phrase Newton’s method as the functional iteration:

$$\begin{aligned} x_{k+1} = g(x_k), \quad g(x) = x - F'(x)^{-1}F(x). \end{aligned}$$

and to express the analysis of quasi-Newton methods in terms of the function g. The next lemma can be used to establish local quadratic convergence of Newton’s method.

Lemma 2

Let $\varOmega \subseteq \mathbb {R}^n$ be open and let $F \in C^1(\varOmega , \mathbb {R}^n)$. Let z denote a zero of F and let $x \in \varOmega $. If $F'(x)$ is nonsingular and if $l(x,z) \subset \varOmega $, then

$$\begin{aligned} g(x) - z = C(x)(x-z) \end{aligned}$$

where

$$\begin{aligned} C(x) = F'(x)^{-1} \left( \int _0^1 \left[ F'(x) - F'(tx + (1-t)z) \right] dt \right) \end{aligned}$$

Moreover, if $F'$ is Lipschitz continuous with Lipschitz constant $L>0$, then

$$\begin{aligned} \Vert g(x) - z \Vert \le \frac{1}{2} \Vert F'(x)^{-1}\Vert L \Vert x - z \Vert ^2. \end{aligned}$$
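For a scalar illustration, take $F(x) = x^2 - \alpha$, the example used in Sect. 4.1. Here $F'$ is Lipschitz continuous with $L = 2$, and $g(x) - z = (x - z)^2/(2x)$, so for $x > 0$ the bound of Lemma 2 holds with equality. The check below uses exact rational arithmetic with the assumed value $\alpha = 9/4$, so that the zero $z = 3/2$ is representable.

```python
from fractions import Fraction

def g(x, alpha):
    """One Newton step for F(x) = x**2 - alpha, i.e. g(x) = x - F(x)/F'(x)."""
    return x - (x * x - alpha) / (2 * x)

def lemma2_bound(x, z, lipschitz=2):
    """Right-hand side of Lemma 2: (1/2) |F'(x)^{-1}| L |x - z|^2 with F'(x) = 2x."""
    return Fraction(1, 2) * abs(1 / (2 * x)) * lipschitz * (x - z) ** 2

alpha, z = Fraction(9, 4), Fraction(3, 2)
for x in [Fraction(1), Fraction(2), Fraction(7, 5), Fraction(31, 20)]:
    assert abs(g(x, alpha) - z) == lemma2_bound(x, z)   # equality for x > 0
```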

The following lemma allows us to write any approximation y of a nonzero vector x as $y = (I + E)x$, where the size of E is controlled by the relative error of y.

Lemma 3

Let $x \in \mathbb {R}^n$ be nonzero, let $y \in \mathbb {R}^n$ be an approximation of x and let $E \in \mathbb {R}^{n \times n}$ be given by

$$\begin{aligned} E = \frac{1}{x^Tx} (y-x)x ^T. \end{aligned}$$

Then

$$\begin{aligned} y = (I + E) x, \quad \Vert E\Vert = O\left( \frac{\Vert x-y\Vert }{\Vert x\Vert } \right) , \quad y \rightarrow x, \quad y \not = x. \end{aligned}$$

In the special case of the 2-norm we have

$$\begin{aligned} \Vert E\Vert _2 = \frac{\Vert x-y\Vert _2}{\Vert x\Vert _2}. \end{aligned}$$

Proof

It is straightforward to verify that

$$\begin{aligned} (I + E)x = x + \frac{1}{x^Tx} (y-x) x^T x = x + (y-x) = y. \end{aligned}$$

Moreover, if z is any vector, then

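$$\begin{aligned} \Vert Ez\Vert \le \frac{1}{\Vert x\Vert _2^2} \Vert y-x\Vert \, |x^Tz| \le \left( \frac{\Vert x^T\Vert \Vert x\Vert }{\Vert x\Vert _2^2} \right) \left( \frac{\Vert x-y\Vert }{\Vert x\Vert } \right) \Vert z\Vert . \end{aligned}$$

By the equivalence of norms on $\mathbb {R}^n$, the first factor is bounded by a constant that does not depend on x, which establishes the estimate for $\Vert E\Vert $.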

In the case of the 2-norm, we have

$$\begin{aligned} \Vert Ez\Vert _2 \le \frac{\Vert x-y\Vert _2}{\Vert x\Vert _2} \Vert z\Vert _2 \end{aligned}$$

for all $z \not = 0$ and equality holds for $z=x$. This completes the proof.
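Lemma 3 is easy to check numerically. The sketch below uses NumPy to build E for a random x and a nearby y. The function name `rank_one_error`, the seed and the size of the perturbation are our own choices. In Sect. 3, x plays the role of Newton's correction $s_k$ and y the role of the computed correction $t_k$.

```python
import numpy as np

def rank_one_error(x, y):
    """E = (y - x) x^T / (x^T x), so that y = (I + E) x (Lemma 3)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.outer(y - x, x) / (x @ x)

rng = np.random.default_rng(1)          # arbitrary seed for the illustration
x = rng.standard_normal(5)
y = x + 1e-3 * rng.standard_normal(5)   # a hypothetical approximation of x
E = rank_one_error(x, y)

print(np.allclose((np.eye(5) + E) @ x, y))   # True
# In the 2-norm, ||E||_2 equals the relative error of y up to rounding.
print(np.linalg.norm(E, 2), np.linalg.norm(x - y) / np.linalg.norm(x))
```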

3 Main Results

In the presence of rounding errors, any quasi-Newton method can be written as

$$\begin{aligned} x_{k+1} = (I + D_k) \Big ( x_k - (I + E_k) F'(x_k)^{-1} F(x_k) \Big ). \end{aligned}$$

(3)

Here $D_k \in \mathbb {R}^{n \times n}$ is a diagonal matrix which represents the rounding error in the subtraction and $E_k \in \mathbb {R}^{n \times n}$ measures the difference between the computed correction and the correction used by Newton’s method. We simply treat the update $t_k$ needed for the quasi-Newton method (2) as an approximation of the update $s_k = F'(x_k)^{-1} F(x_k)$ needed for Newton’s method (1) and define $E_k$ using Lemma 3. It is practical to restate iteration (3) in terms of the function g, i.e.,

$$\begin{aligned} x_{k+1} = (I + D_k) \Big ( g(x_k) - E_k F'(x_k)^{-1} F(x_k) \Big ). \end{aligned}$$

(4)

We shall now analyze the behavior of iteration (4). For the sake of simplicity, we will assume that there exist nonnegative numbers K, L, and M such that

$$\begin{aligned} \forall x, y { \,} : \, \Vert F'(x)^{-1}\Vert \le K, \quad \Vert F'(x) - F'(y) \Vert \le L \Vert x-y\Vert , \quad \Vert F'(x)\Vert \le M. \end{aligned}$$

In reality, we only require that these inequalities are satisfied in a neighborhood of a zero. We have the following generalization of Lemma 2.

Theorem 1

The functional iteration given by Eq. (4) satisfies

$$\begin{aligned}{} & {} x_{k+1} - z = g(x_k) - z - E_k F'(x_k)^{-1} F(x_k) \nonumber \\{} & {} \qquad \qquad \qquad \qquad \qquad + D_k \big [ g(x_k) - E_k F'(x_k)^{-1} F(x_k) \big ] \end{aligned}$$

(5)

and

$$\begin{aligned}{} & {} \Vert x_{k+1} - z \Vert \le \frac{1}{2} L K \Vert x_k - z \Vert ^2 + \Vert E_k\Vert K M \Vert x_k-z\Vert \nonumber \\{} & {} \qquad \qquad \qquad \quad + \Vert D_k\Vert \left( \Vert z\Vert + \frac{1}{2} L K \Vert x_k - z\Vert ^2 + \Vert E_k\Vert K M \Vert x_k-z\Vert \right) . \end{aligned}$$

(6)

Proof

It is straightforward to verify that Eq. (5) is correct. Inequality (6) follows from Eq. (5) using the triangle inequality, Lemma 2, and Lemma 1 in the form $\Vert F(x_k)\Vert = \Vert F(x_k) - F(z)\Vert \le M \Vert x_k - z\Vert $. The norm of $g(x_k)$ in the term multiplied by $D_k$ can be bounded using the inequality

$$\begin{aligned} \Vert g(x_k)\Vert \le \Vert z\Vert + \Vert g(x_k) - z\Vert . \end{aligned}$$

This completes the proof.

It is convenient to focus on the case $z \not = 0$. Dividing inequality (6) by $\Vert z\Vert $ gives

$$\begin{aligned} r_{k+1} \le \frac{1}{2} L K (1 + \Vert D_k\Vert ) \Vert z\Vert r_k^2 + \Vert E_k\Vert K M (1 + \Vert D_k\Vert ) r_k + \Vert D_k\Vert \end{aligned}$$

(7)

where $r_k$ is the normwise relative forward error given by

$$\begin{aligned} r_k = \Vert z - x_k\Vert /\Vert z\Vert . \end{aligned}$$

3.1 Stagnation

We assume that the sequences $\{D_k\}$ and $\{E_k\}$ are bounded. Let D and E be nonnegative numbers that satisfy

$$\begin{aligned} \Vert D_k\Vert \le D, \quad \Vert E_k \Vert \le E. \end{aligned}$$

(8)

In this case, inequality (7) implies

$$\begin{aligned} r_{k+1} \le \frac{1}{2} LK(1+D) \Vert z\Vert r_k^2 + EMK(1+D) r_k + D. \end{aligned}$$

The error is certain to be reduced, i.e., $r_{k+1} < r_k$, when

$$\begin{aligned} D&< r_k - \left( \frac{1}{2} LK(1+D) \Vert z\Vert r_k^2 + EMK(1+D) r_k \right) \\&= \left( 1 - EMK(1+D) \right) r_k - \frac{1}{2} LK(1+D) \Vert z\Vert r_k^2. \end{aligned}$$

This condition is equivalent to the following inequality:

$$\begin{aligned} D - \left[ 1 - EMK(1+D) \right] r_k + \frac{1}{2} LK(1+D)\Vert z\Vert r_k^2 < 0. \end{aligned}$$

This is a quadratic inequality in $r_k$. The roots of the corresponding quadratic equation are

$$\begin{aligned} \lambda _{\pm } = \frac{\left[ 1 - EMK(1+D) \right] \pm \sqrt{\left[ 1 - EMK(1+D) \right] ^2 - 2 LK(1+D)D\Vert z\Vert }}{LK(1+D)\Vert z\Vert }. \end{aligned}$$

If D and E are sufficiently small, then the roots are positive real numbers, and the error is certain to be reduced provided

$$\begin{aligned} \lambda _{-}< r_k < \lambda _{+}. \end{aligned}$$

It follows that we cannot expect to do better than

$$\begin{aligned} r_k = \frac{\Vert z-x_k\Vert }{\Vert z\Vert } \approx \lambda _{-}. \end{aligned}$$

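If D and E are sufficiently small, then a first-order Taylor expansion of the square root shows that

$$\begin{aligned} \lambda _{-} \approx \frac{D}{1-EMK(1+D)} \end{aligned}$$

is a good approximation. Indeed, $\lambda _{-} = 2D/\big (b + \sqrt{b^2 - 2LK(1+D)D\Vert z\Vert }\big )$ with $b = 1 - EMK(1+D)$, and the square root is approximately b when D is small. We cannot expect to do better than $r_k \approx \lambda _-$. However, this threshold of stagnation is not particularly sensitive to the size of E as long as $EMK(1+D)$ remains well below 1.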
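The stagnation threshold can be explored numerically. The sketch below computes $\lambda _\pm $ for the quadratic above, evaluating the small root in a cancellation-free form. It also iterates the worst case of inequality (7), i.e., (7) with equality, $\Vert D_k\Vert = D$ and $\Vert E_k\Vert = E$. A fixed point of this recurrence is a root of the same quadratic, so the iterates should settle at $\lambda _-$. The values $LK\Vert z\Vert = 1$, $MK = 1$ and $D = 2^{-53}$ are assumptions chosen only for illustration.

```python
import math

def stagnation_roots(D, E, LKz, MK):
    """Roots lambda_-, lambda_+ of (1/2)LK(1+D)||z|| r^2 - [1 - EMK(1+D)] r + D = 0."""
    a = 0.5 * LKz * (1 + D)
    b = 1 - E * MK * (1 + D)
    disc = math.sqrt(b * b - 4 * a * D)
    # The small root is evaluated in a cancellation-free form.
    return 2 * D / (b + disc), (b + disc) / (2 * a)

def worst_case_errors(r0, D, E, LKz, MK, steps):
    """Iterate inequality (7) as an equality with ||D_k|| = D and ||E_k|| = E."""
    errors = [r0]
    for _ in range(steps):
        r = errors[-1]
        errors.append(0.5 * LKz * (1 + D) * r * r + E * MK * (1 + D) * r + D)
    return errors

u = 2.0 ** -53                         # unit roundoff, used as D
for E in (1e-12, 1e-8, 1e-2, 0.5):
    lam_minus, lam_plus = stagnation_roots(u, E, LKz=1.0, MK=1.0)
    r_final = worst_case_errors(0.1, u, E, 1.0, 1.0, 200)[-1]
    print(f"E={E:g}: lambda_-/u = {lam_minus / u:.3f}, r_200/u = {r_final / u:.3f}")
```

The ratio $\lambda _-/u$ stays close to 1 for small E and only doubles at $E = 1/2$. The stagnation level therefore depends far more on D than on E.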

3.2 The Decay of the Error

We assume that the sequences $\{D_k\}$ and $\{E_k\}$ are bounded. Let D and E be upper bounds that satisfy (8). Suppose that we are not near the threshold of stagnation in the sense that

$$\begin{aligned} D \le C r_k. \end{aligned}$$

(9)

for a (modest) constant $C>0$. In this case, inequality (7) implies

$$\begin{aligned} r_{k+1} \le \rho _k r_k, \quad \rho _k = \frac{1}{2} L K (1 + D) \Vert z\Vert r_k + E K M (1 + D) + C. \end{aligned}$$

(10)

If $C<1$, then $\rho _k < 1$ is possible when $r_k$ and E are sufficiently small. This explains when and why local linear decay is possible. We now strengthen our assumptions. Suppose that there exist $\lambda \in (0,1]$ and $C_1 > 0$ such that

$$\begin{aligned} \Vert E_k \Vert \le C_1 r_k^{\lambda } \end{aligned}$$

(11)

and that we are far from the threshold of stagnation in the sense that

$$\begin{aligned} D \le C_2 r_k^{1 + \lambda } \end{aligned}$$

(12)

for a (modest) constant $C_2 > 0$. In this case, inequality (7) implies

$$\begin{aligned} r_{k+1} \le \left[ \frac{1}{2} L K (1 + D) \Vert z\Vert r_k^{1-\lambda } + C_1 K M (1 + D) + C_2 \right] r_k^{1 + \lambda }. \end{aligned}$$

(13)

This explains when and why local superlinear decay is possible.

3.3 Convergence

We cannot expect a quasi-Newton method to converge unless the subtraction $y_{k+1} = y_k - t_k$ is exact. In that case $D_k = 0$, and inequality (7) implies

$$\begin{aligned} r_{k+1} \le \eta _k r_k, \quad \eta _k = \left( \frac{1}{2} L K \Vert z\Vert r_k + \Vert E_k\Vert K M \right) . \end{aligned}$$

We may have $\eta _k < 1$ for all k, provided $E = \sup \Vert E_k\Vert $ and $r_0$ are sufficiently small. This explains when and why local linear convergence is possible. We now strengthen our assumptions. Suppose that there is a $\lambda \in (0,1]$ and a $C>0$ such that

$$\begin{aligned} \forall k \in \mathbb {N} \, : \, \Vert E_k\Vert \le C r_k^{\lambda }. \end{aligned}$$

In this case, inequality (7) implies

$$\begin{aligned} r_{k+1} \le \left( \frac{1}{2} L K \Vert z\Vert r_k^{1-\lambda } + C K M \right) r_k^{1 + \lambda }. \end{aligned}$$

This inequality allows us to establish local convergence of order at least $1+\lambda $.
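The order $1+\lambda $ can be observed in high-precision arithmetic, where rounding errors do not interfere. The sketch below applies $x_{k+1} = x_k - (1+e_k)f(x_k)/f'(x_k)$ to $f(x) = x^2 - 2$. It uses the artificial choice $e_k = r_k^{\lambda }$, which satisfies $\Vert E_k\Vert \le C r_k^{\lambda }$ with $C = 1$. Building $e_k$ from the exact error is only possible in a test problem. The 400-digit precision, the starting point and $\lambda = 1/2$ are assumptions.

```python
from decimal import Decimal, localcontext

def perturbed_newton_errors(alpha, x0, lam, steps, digits=400):
    """Relative errors r_k of x_{k+1} = x_k - (1 + e_k) f(x_k)/f'(x_k) with e_k = r_k**lam."""
    errors = []
    with localcontext() as ctx:
        ctx.prec = digits
        alpha = Decimal(alpha)
        z = alpha.sqrt()                   # reference zero of f(x) = x^2 - alpha
        x = Decimal(x0)
        exponent = Decimal(str(lam))
        for _ in range(steps):
            r = abs(x - z) / z
            errors.append(r)
            if r == 0:
                break
            e = r ** exponent              # relative step error, as in ||E_k|| <= C r_k^lam with C = 1
            x = x - (1 + e) * (x * x - alpha) / (2 * x)
    return errors

def observed_orders(errors):
    """Estimate the order p from log(r_{k+1}/r_k) / log(r_k/r_{k-1})."""
    logs = [float(r.ln()) for r in errors]
    return [(logs[k + 1] - logs[k]) / (logs[k] - logs[k - 1])
            for k in range(1, len(logs) - 1)]

errs = perturbed_newton_errors(2, "1.5", lam=0.5, steps=12)
print([round(p, 3) for p in observed_orders(errs)])   # approaches 1 + lam = 1.5
```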

3.4 How Accurate Does Newton Have to Be?

We will assume the use of normalized IEEE floating-point numbers, and we will apply the analysis given in Sect. 3.2. Each diagonal entry of $D_k$ is bounded by u in magnitude, where u is the unit roundoff. Hence, if we use the 1-norm, the 2-norm or the $\infty $-norm, we may choose $D=u$. Suppose that Eqs. (11) and (12) are satisfied with $\lambda = 1$. Then inequality (13) reduces to

$$\begin{aligned} r_{k+1} \le \left[ \frac{1}{2} L K (1 + u) \Vert z\Vert + C_1 K M (1 + u) + C_2 \right] r_k^2. \end{aligned}$$

Due to the basic limitations of IEEE floating point arithmetic we cannot expect to do better than

$$\begin{aligned} r_{k+1} = O(u), \quad u \rightarrow 0, \quad u > 0. \end{aligned}$$

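A quadratically convergent step maps $r_k \approx \sqrt{u}$ to $r_{k+1} = O(u)$. Condition (11) with $\lambda = 1$ therefore never has to be enforced once $r_k$ has dropped below about $\sqrt{u}$. It follows that we never need to do better than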

$$\begin{aligned} \Vert E_k\Vert = O(\sqrt{u}), \quad u \rightarrow 0, \quad u > 0. \end{aligned}$$
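The two thresholds in this argument are easy to evaluate for IEEE double precision. The snippet assumes that Python floats are IEEE binary64 numbers, as they are on all mainstream platforms.

```python
import math
import sys

u = sys.float_info.epsilon / 2     # unit roundoff of IEEE double precision
print(u == 2.0 ** -53)             # True
print(math.sqrt(u))                # about 1.05e-8: relative step accuracy that suffices
```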

4 Numerical Experiments

4.1 Computing Square Roots

Let $\alpha >0$ and consider the problem of solving the nonlinear equation

$$\begin{aligned} f(x) = x^2 - \alpha = 0 \end{aligned}$$

with respect to $x>0$ using Newton’s method. Let $r_k$ denote the relative error after k Newton steps. A simple calculation based on Lemma 2 yields

$$\begin{aligned} |r_{k+1}| \le r_k^2/2, \quad |r_k| \le 2 \left( |r_0|/2 \right) ^{2^k}. \end{aligned}$$

We see that convergence is certain when $|r_0| < 2$. The general case of $\alpha > 0$ can be reduced to the special case of $\alpha \in [1,4)$ by accessing and manipulating the binary representation directly. Let $x_0 : [1,4] \rightarrow \mathbb {R}$ denote the best uniform linear approximation of the square root function on the interval [1, 4]. Then

$$\begin{aligned} x_0(\alpha ) = \alpha /3 + 17/24, \quad |r_0(\alpha )| \le 1/24. \end{aligned}$$
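One way to perform the reduction to $[1,4)$ is to use `math.frexp` and `math.ldexp`, which expose the binary exponent of a Python float. The sketch below is our illustration, not the authors' code. It assumes that $\alpha $ is a positive normal double, and the helper names `reduce_to_1_4` and `newton_sqrt` are hypothetical. Since $|r_0| \le 1/24$, the bound $2(|r_0|/2)^{2^k}$ drops below $2^{-53}$ at $k = 4$, so four Newton steps suffice.

```python
import math

def reduce_to_1_4(alpha):
    """Write a positive normal float as alpha = beta * 4**k with beta in [1, 4)."""
    m, e = math.frexp(alpha)          # alpha = m * 2**e with m in [0.5, 1)
    m, e = 2.0 * m, e - 1             # now m in [1, 2)
    if e % 2:                         # odd exponent: move one factor of 2 into beta
        m, e = 2.0 * m, e - 1
    return m, e // 2

def newton_sqrt(alpha, steps=4):
    """sqrt(alpha) from Newton's method on f(x) = x**2 - beta with beta in [1, 4)."""
    beta, k = reduce_to_1_4(alpha)
    x = beta / 3 + 17 / 24            # best uniform linear approximation on [1, 4]
    for _ in range(steps):
        x = x - (x * x - beta) / (2 * x)
    return math.ldexp(x, k)           # sqrt(alpha) = sqrt(beta) * 2**k

grid = [1 + 3 * i / 10000 for i in range(10001)]
worst = max(abs(b / 3 + 17 / 24 - math.sqrt(b)) / math.sqrt(b) for b in grid)
print(round(24 * worst, 6))           # 1.0: the bound |r_0| <= 1/24 is attained at beta = 1
print(newton_sqrt(2.0), newton_sqrt(1e-300))
```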

In order to illustrate Theorem 1 we execute the iteration

$$\begin{aligned} x_{k+1} = x_k - (1+e_k)f(x_k)/f'(x_k) \end{aligned}$$

where $e_k$ is a randomly generated number. Specifically, given $\epsilon > 0$ we choose $e_k$ such that $|e_k|$ is uniformly distributed in the interval $[\frac{1}{2} \epsilon , \epsilon ]$ and the sign of $e_k$ is positive or negative with equal probability. Three choices, namely $\epsilon = 10^{-2}$ (left), $\epsilon = 10^{-8}$ (center) and $\epsilon = 10^{-12}$ (right) are illustrated in Fig. 1.

In each case, the perturbed iteration eventually either reproduces the floating-point representation of the square root or stagnates with a relative error that is essentially the unit roundoff $u=2^{-53} \approx 10^{-16}$. When $\epsilon = 10^{-2}$, quadratic convergence is lost, but the relative error decreases by a factor of approximately $\epsilon = 10^{-2}$ from one iteration to the next, i.e., we observe extremely rapid linear convergence. Quadratic convergence is restored when $\epsilon $ is reduced to $\epsilon = 10^{-8} \approx \sqrt{u}$. Further reductions of $\epsilon $ have no effect on the convergence, as demonstrated by the case of $\epsilon = 10^{-12}$. We shall now explain exactly how far this experiment supports the theory that is presented in this paper.

Stagnation. By Sect. 3.1 we expect that the level of stagnation is essentially independent of the size of E, the upper bound on the relative error between the computed step and the step needed for Newton’s method. This is clearly confirmed by the experiment.

Error Decay. We are always very close to the positive zero $z = \sqrt{\alpha }$ of $f(x) = x^2 - \alpha $, and we have $f''(x) = 2$ and $f'(x) = 2x \approx 2z$. We may therefore choose

$$\begin{aligned} L \approx 2, \quad K|z| \approx 1/2, \quad MK \approx 1. \end{aligned}$$

In the case of $\epsilon = 10^{-2}$, Fig. 1 (left) shows that we satisfy inequality (9) with $D = u$ and $C = \epsilon < 1$, i.e.,

$$\begin{aligned} u \le \epsilon r_k, \quad 0 \le k < 5. \end{aligned}$$

By Eq. (10) we must have

$$\begin{aligned} r_{k+1} \le \rho _k r_k, \quad \rho _k \approx 2 \epsilon , \quad 0< k < 5. \end{aligned}$$

This is exactly the linear convergence that we have observed. In the case of $\epsilon = 10^{-8}$, Fig. 1 (center) shows that we satisfy inequality (12) with $C_2 = 1$ and $\lambda = 1$, i.e.,

$$\begin{aligned} u \le r_k^2, \quad k = 0,1. \end{aligned}$$

By inequality (13) we must have quadratic decay in the sense that

$$\begin{aligned} r_{k+1} \le C r_k^2, \quad C \approx \frac{3}{2}, \quad k = 0,1. \end{aligned}$$

Manual inspection of Fig. 1 reveals that the actual constant is close to 1 and certainly smaller than $C \approx \frac{3}{2}$. By Sect. 3.4 we do not expect any benefits from using an $\epsilon $ that is substantially smaller than $\sqrt{u}$. This is also supported by the experiment.
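The experiment can be approximated in a few lines of Python. The sketch below is a hypothetical re-creation, not the authors' script. It fixes $\alpha = 2$, draws $e_k$ as described above from Python's `random` module with an arbitrary seed, and prints the first relative errors for the three values of $\epsilon $. The random numbers differ from those behind Fig. 1, so only the qualitative behavior should be expected to match: linear decay by roughly a factor $\epsilon $ per step for $\epsilon = 10^{-2}$, quadratic decay for the two smaller values, and stagnation near u.

```python
import math
import random

def perturbed_sqrt_errors(alpha, eps, steps, rng):
    """Relative errors of x_{k+1} = x_k - (1 + e_k) f(x_k)/f'(x_k) for f(x) = x^2 - alpha.

    alpha is assumed to lie in [1, 4); |e_k| is uniform on [eps/2, eps] with a random sign.
    """
    z = math.sqrt(alpha)                      # correctly rounded reference value
    x = alpha / 3 + 17 / 24                   # initial guess from the linear approximation
    errors = [abs(x - z) / z]
    for _ in range(steps):
        e = rng.uniform(eps / 2, eps) * rng.choice((-1.0, 1.0))
        x = x - (1 + e) * (x * x - alpha) / (2 * x)
        errors.append(abs(x - z) / z)
    return errors

rng = random.Random(2023)                     # arbitrary seed
for eps in (1e-2, 1e-8, 1e-12):
    errs = perturbed_sqrt_errors(2.0, eps, 12, rng)
    print(f"eps={eps:g}: " + " ".join(f"{r:.1e}" for r in errs[:7]))
```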

4.2 Constrained Molecular Dynamics

The objective is to solve a system of differential algebraic equations

$$\begin{aligned} q'(t)&= v(t), \\ M v'(t)&= f(q(t)) - g'(q(t))^T \lambda (t), \\ g(q(t))&=0. \end{aligned}$$

Here q and v are vectors that represent the position and velocity of all atoms, M is a nonsingular diagonal mass matrix, f represents the external forces acting on the atoms and $-g'(q)^T \lambda $ represents the constraint forces. Here $g'$ is the Jacobian of the constraint function g. The standard algorithm for this problem is the SHAKE algorithm [10]. It uses a pair of staggered uniform grids and takes the form

$$\begin{aligned} v_{n+1/2}&= v_{n-1/2} + h M^{-1} \left( f(q_n) - g'(q_n)^T \lambda _n \right) , \nonumber \\ q_{n+1}&= q_n + h v_{n + 1/2}, \nonumber \\ g(q_{n+1})&= 0, \end{aligned}$$

(14)

where $h>0$ is the fixed time step and $q_n \approx q(t_n)$, $v_{n+\frac{1}{2}} \approx v(t_{n+\frac{1}{2}})$, where $t_n = nh$ and $t_{n+\frac{1}{2}} = (n+1/2)h$. Equation (14) is really a nonlinear equation for the unknown Lagrange multiplier $\lambda _n$, specifically

$$\begin{aligned} g(\phi _n(\lambda )) = 0, \quad \phi _n(\lambda )= q_n + h(v_{n-\frac{1}{2}} + h M^{-1}( f(q_n) - g'(q_n)^T \lambda )). \end{aligned}$$

Up to the constant factor $-h^2$, which can be absorbed into the Lagrange multiplier, the relevant Jacobian is the matrix

$$\begin{aligned} A_n(\lambda ) = \left( g(\phi _n(\lambda )) \right) ' = g'(\phi _n(\lambda ))M^{-1} g'(q_n)^T. \end{aligned}$$

The matrix $A_n(\lambda )$ is close to the constant symmetric matrix $S_n$ given by

$$\begin{aligned} S_n = g'(q_n)M^{-1} g'(q_n)^T \end{aligned}$$

simply because $\phi _n(\lambda ) = q_n + O(h)$ as $h \rightarrow 0$ and $h>0$. It is therefore natural to investigate whether the constant matrix $S_n^{-1}$ is a good approximation of $A_n^{-1}(\lambda )$.

For this experiment, we executed a production molecular dynamics run using the GROMACS [1] package. We replaced the constraint solver used by GROMACS’s SHAKE function with a quasi-Newton method based on the matrix $S_n$. Our experiment was based on GROMACS’s Lysozyme in Water Tutorial [6]. We simulated a hen egg white lysozyme [9] molecule submerged in water inside a cubic box. Lysozyme is a protein that consists of a single polypeptide chain of 129 amino acid residues, cross-linked at 4 places by disulfide bonds between cysteine side-chains in different parts of the molecule. Lysozyme has 1960 atoms and 1984 bond length constraints.

Before executing the production run, we added ions to the system to make it electrically neutral. The energy of the system was minimized using the steepest descent algorithm until the maximum force of the system was below 1000.0 kJ/(mol$\cdot $nm). Then, we executed 100 ps of a temperature equilibration step using a V-Rescale thermostat in an NVT ensemble to stabilize the temperature of the system at 310 K. To finish, we stabilized the pressure of the system at 1 bar for another 100 ps using a V-Rescale thermostat and a Parrinello-Rahman barostat in an NPT ensemble. We executed a 100 ps production run with a 2 fs time step using an NPT ensemble with a V-Rescale thermostat and a Parrinello-Rahman barostat with time constants of 0.1 and 2 ps, respectively.

We collected the results of the constraint solver every 5 ps starting at 5 ps, for a total of 20 sample points. Specifically, we recorded the normwise relative error $r_k = \Vert \lambda _n-x_k\Vert _2/\Vert \lambda _n\Vert _2$ as a function of the number k of quasi-Newton steps, using the symmetric matrix $S_n$ instead of the nonsymmetric matrix $A_n$. We also recorded $\Vert E_k\Vert _2 = \Vert s_k - t_k\Vert _2/\Vert s_k\Vert _2$, where $t_k$ is the correction used by the quasi-Newton step and $s_k$ is the correction needed for a Newton step.

By (10), we have $r_{k+1} \le \rho _k r_k$, and we cannot hope for more than $r_{k+1} \approx \rho _k r_k$ with $\rho _k = O(\Vert E_k\Vert _2)$. This is indeed what we find in Fig. 2c until we hit the level of stagnation, where the impact of rounding errors is keenly felt.
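The following sketch mimics this experiment on a toy problem rather than on GROMACS. It uses a short chain of particles with squared bond-length constraints $g_i(q) = \Vert q_a - q_b\Vert ^2 - d_i^2$ and a hypothetical unconstrained update. The multiplier is rescaled as $\mu = h^2\lambda $, so that the time step drops out and $\phi (\mu ) = q_{\text {free}} - M^{-1}g'(q_n)^T\mu $. Each quasi-Newton step solves with the frozen matrix $-S_n$ instead of the exact Jacobian and records $\Vert E_k\Vert _2 = \Vert s_k - t_k\Vert _2/\Vert s_k\Vert _2$. The chain geometry, masses, perturbation size and function names are all assumptions for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def constraints(q, bonds, d):
    """g(q): squared bond-length constraints, one per row of bonds."""
    diff = q[bonds[:, 0]] - q[bonds[:, 1]]
    return np.einsum("ij,ij->i", diff, diff) - d**2

def constraint_jacobian(q, bonds):
    """g'(q) as an (m, 3N) matrix."""
    G = np.zeros((len(bonds), q.size))
    diff = q[bonds[:, 0]] - q[bonds[:, 1]]
    for i, (a, b) in enumerate(bonds):
        G[i, 3 * a:3 * a + 3] = 2 * diff[i]
        G[i, 3 * b:3 * b + 3] = -2 * diff[i]
    return G

def solve_multipliers(q_n, q_free, bonds, d, inv_mass, steps, exact_jacobian):
    """Solve g(phi(mu)) = 0 with phi(mu) = q_free - M^{-1} g'(q_n)^T mu."""
    G_n = constraint_jacobian(q_n, bonds)
    B = inv_mass[:, None] * G_n.T                  # M^{-1} g'(q_n)^T
    S = G_n @ B                                    # S_n = g'(q_n) M^{-1} g'(q_n)^T
    mu = np.zeros(len(bonds))
    iterates, e_norms = [mu.copy()], []
    for _ in range(steps):
        x = q_free - (B @ mu).reshape(-1, 3)       # phi(mu)
        R = constraints(x, bonds, d)
        s = np.linalg.solve(-constraint_jacobian(x, bonds) @ B, R)  # Newton correction
        t = s if exact_jacobian else np.linalg.solve(-S, R)         # quasi-Newton correction
        norm_s = np.linalg.norm(s)
        e_norms.append(np.linalg.norm(s - t) / norm_s if norm_s > 0 else 0.0)
        mu = mu - t
        iterates.append(mu.copy())
    return iterates, e_norms

rng = np.random.default_rng(0)                        # arbitrary seed
n_atoms = 6
q_n = np.column_stack([np.arange(n_atoms, dtype=float), np.zeros(n_atoms), np.zeros(n_atoms)])
bonds = np.array([[i, i + 1] for i in range(n_atoms - 1)])
d = np.ones(len(bonds))                               # bond lengths satisfied at q_n
inv_mass = np.repeat(1.0 / rng.uniform(1.0, 4.0, n_atoms), 3)
q_free = q_n + 0.01 * rng.standard_normal(q_n.shape)  # hypothetical unconstrained update

mu_ref = solve_multipliers(q_n, q_free, bonds, d, inv_mass, 30, True)[0][-1]
iterates, e_norms = solve_multipliers(q_n, q_free, bonds, d, inv_mass, 60, False)
r = [np.linalg.norm(mu_ref - mu) / np.linalg.norm(mu_ref) for mu in iterates]
for k in range(8):
    print(f"k={k}: r_k={r[k]:.2e}, ||E_k||={e_norms[k]:.2e}")
```

In this toy setting, the ratio $r_{k+1}/r_k$ should track $\Vert E_k\Vert _2$ until $r_k$ reaches rounding level. That is the qualitative behavior described above for Fig. 2c.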

5 Related Work

It is well-known that Newton’s method has local quadratic convergence subject to certain regularity conditions. The simplest proof known to us is due to Mysovskii [7]. Dembo et al. [2] analyzed the convergence of quasi-Newton methods in terms of the ratio between the norm of the linear residual $F(x_k) - F'(x_k)t_k$ and the norm of the nonlinear residual $F(x_k)$. Tisseur [11] studied the impact of rounding errors in terms of the backward error associated with approximating the Jacobians and computing the corrections, as well as the errors associated with computing the residuals. Here we have pursued a third option by viewing the correction $t_k$ as an approximation of the correction $s_k$ needed for an exact Newton step. Tisseur found that Newton’s method stagnates at a level that is essentially independent of the stability of the solver, and we have confirmed that this is true for quasi-Newton methods in general. It is clear to us from reading Theorem 3.1 of Dennis and Moré’s paper [3] that they would instantly recognize Lemma 3, but we cannot find the result stated explicitly anywhere. Forsgren [4] uses a stationary method for solving linear systems to construct a quasi-Newton method that is so exact that the convergence is quadratic. Section 4.1 contains a simple illustration of this phenomenon.

6 Conclusions

Quasi-Newton methods can also be analyzed in terms of the relative error between Newton’s correction and the computed correction. We achieve quadratic convergence when this error is $O(\sqrt{u})$. This fact represents an opportunity for improving the time-to-solution for nonlinear equations. General-purpose libraries for solving sparse linear systems apply pivoting for the sake of numerical accuracy and stability. In the context of quasi-Newton methods, we do not need maximum accuracy; rather, there is some freedom to pivot for the sake of parallelism. If we fail to achieve quadratic convergence, then we are still likely to converge rapidly. It is therefore worthwhile to develop sparse solvers that pivot mainly for the sake of parallelism.

References

[1] Berendsen, H., van der Spoel, D., van Drunen, R.: GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91(1), 43–56 (1995)
[2] Dembo, R.S., Eisenstat, S.C., Steihaug, T.: Inexact Newton methods. SIAM J. Numer. Anal. 19(2), 400–408 (1982)
[3] Dennis, J.E., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977)
[4] Forsgren, A.: A sufficiently exact inexact Newton step based on reusing matrix information. TRITA-MAT OS7, Department of Mathematics, KTH, Stockholm, Sweden (2009)
[5] Kelley, C.T.: Iterative Methods for Linear and Nonlinear Equations. No. 16 in Frontiers in Applied Mathematics. SIAM, Philadelphia (1995)
[6] Lemkul, J.A.: GROMACS Tutorial: Lysozyme in Water. https://www.mdtutorials.com/gmx/lysozyme/index.html
[7] Mysovskii, I.P.: On the convergence of Newton’s method. Trudy Mat. Inst. Steklova 28, 145–147 (1949). (In Russian)
[8] Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Computer Science and Applied Mathematics, Academic Press, New York (1970)
[9] RCSB: Protein Data Bank. https://www.rcsb.org/structure/1AKI
[10] Ryckaert, J.P., Ciccotti, G., Berendsen, H.J.: Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23(3), 327–341 (1977)
[11] Tisseur, F.: Newton’s method in floating point arithmetic and iterative refinement of generalized eigenvalue problems. SIAM J. Matrix Anal. Appl. 22(4), 1038–1057 (2001)

Acknowledgments

Prof. I. Argyros commented on an early draft of this paper and provided the reference to the work of I. P. Mysovskii. The first author is supported by eSSENCE, a collaborative e-Science programme funded by the Swedish Research Council within the framework of the strategic research areas designated by the Swedish Government. This work has been partially supported by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C21/AEI/10.13039/501100011033), by the Generalitat de Catalunya (contract 2017-SGR-1328), and by Lenovo-BSC Contract-Framework Contract (2020).

Author information

Authors and Affiliations

Department of Computing Science, Umeå University, 90187, Umeå, Sweden
Carl Christian Kjelgaard Mikkelsen
Barcelona Supercomputing Center, Barcelona, Spain
Lorién López-Villellas
Zaragoza, Spain
Pablo García-Risueño

Corresponding author

Correspondence to Carl Christian Kjelgaard Mikkelsen.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Kjelgaard Mikkelsen, C.C., López-Villellas, L., García-Risueño, P. (2023). How Accurate Does Newton Have to Be?. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_1

DOI: https://doi.org/10.1007/978-3-031-30442-2_1
Published: 28 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30441-5
Online ISBN: 978-3-031-30442-2
eBook Packages: Computer Science, Computer Science (R0)
