1 Introduction

Geometry optimization is an important tool in the computational chemistry toolbox and has become ubiquitous in modern studies of chemical properties and reactions. A wide variety of algorithms exist for optimization (see Ref. [1] for a recent review of methods), with the most common utilizing a combination of quasi-Newton steps in redundant internal coordinates [2, 3], with an approximate Hessian stored and updated using the gradient information computed at each step of the optimization. Methods also exist that avoid the storage and update of the Hessian matrix, such as the conjugate gradient [4] and limited memory quasi-Newton [5] optimizers. These approaches typically require more gradient evaluations to achieve convergence than full storage quasi-Newton methods and are frequently used to optimize very large molecules where the memory costs associated with storing the Hessian are prohibitive. For a recent benchmark of such methods, see Ref. [6].

For quasi-Newton optimizations, approximate, positive definite Hessian matrices [7] are typically used to avoid the cost of computing the full Hessian exactly. These approximate Hessians are then updated by the method of Broyden, Fletcher, Goldfarb and Shanno (BFGS [8–11]) when seeking a minimum structure, while either Powell’s symmetric Broyden (PSB [12]) or the symmetric rank 1 (SR1 [13]) update, or some combination of the two [14], is used when attempting to locate a transition state. Additionally, sequence acceleration methods such as line searches and geometry optimization by direct inversion in the iterative subspace (GDIIS [15]) are also used to reduce the number of potential energy surface (PES) calculations necessary to converge to the desired minimum or transition-state structure.

Since geometry optimization using quasi-Newton methods is a relatively mature field, improvements are likely to be modest. Nevertheless, it is worthwhile to explore the effects of various modifications to existing optimization algorithms. Described below are three refinements that may offer additional benefits over existing methodologies for locating minimum energy structures using quasi-Newton optimizers.

  • Flowchart update—This approach seeks to improve Hessian updating by applying the different update methods only when they are expected to be well behaved, and falling back to more reliable but less ideal updates when necessary. Additionally, a new modification of the PSB method is introduced that uses scaled displacements to compute the update.

  • Scaled RFO method—This approach seeks to improve the use of the rational function optimization method for controlling step size and direction by modifying the shift matrix to better represent the expected relative stiffness of the bond stretches versus the other coordinates.

  • Quasi-rotation method—This is an alternative approach to handling the redundancy in an internal coordinate system. Rather than store the approximate Hessian in the full redundant space, a quasi-rotation matrix is used to rotate the approximate Hessian from the non-redundant space at one point, to the non-redundant space at another. In addition to reducing the memory requirements for storing the Hessian for large systems, this could also lead to a more consistent approximation to the Hessian even when the non-redundant space changes over the course of the optimization and may help improve Hessian updating since the change in the gradient can be expressed entirely in the non-redundant space at one set of coordinates.

In the appendix, we summarize the procedures that have become standard over the past few decades for the optimization of molecular geometries using the quasi-Newton method. In Sects. 2–4, we present some new contributions to improve the stability and efficiency of such optimizations. The standard method will serve as the benchmark for evaluating the performance of each of these modifications.

To test the performance of the geometry optimization methods, we have compiled a new set of 20 molecules with between 10 and 50 heavy atoms (Fig. 1). This test set is more representative of molecules typically being optimized using ab initio and DFT methods on modern architectures than the older test sets that were developed according to the computer time available in previous decades [16, 17].

2 Flowchart update

2.1 Motivation

Three of the most commonly used Hessian update formulae are the BFGS, SR1 and PSB updates. They use the displacement (\(\mathbf {s}=\varvec{\Delta }\mathbf {q}\)), change in gradient (\(\mathbf {y}=\varvec{\Delta }\mathbf {g}_{q}\)), and quadratic error (\(\mathbf {z}=\mathbf {y}-\mathbf {Hs}\)) to correct the approximation to the Hessian according to data computed at two separate geometries.

$$\begin{aligned} \varvec{\Delta }\mathbf {H}_{BFGS}= &\frac{\mathbf {yy}^{T}}{\mathbf {y}^{T}\mathbf {s}}-\frac{\mathbf {Hs}\left( \mathbf {Hs}\right) ^{T}}{\mathbf {s}^{T}\mathbf {Hs}} \end{aligned}$$
(1)
$$\begin{aligned} \varvec{\Delta }\mathbf {H}_{SR1}= &\frac{\mathbf {zz}^{T}}{\mathbf {z}^{T}\mathbf {s}}\end{aligned}$$
(2)
$$\begin{aligned} \varvec{\Delta }\mathbf {H}_{PSB}= &\frac{\mathbf {sz}^{T}+\mathbf {zs}^{T}}{\mathbf {s}^{T}\mathbf {s}}-\mathbf {s}^{T}\mathbf {z}\frac{\mathbf {ss}^{T}}{\left( \mathbf {s}^{T}\mathbf {s}\right) ^{2}} \end{aligned}$$
(3)

It is fairly well understood that the BFGS (Eq. 1) and SR1 (Eq. 2) updates can become numerically unstable when \(\mathbf {y}^{T}\mathbf {s}\) and \(\mathbf {z}^{T}\mathbf {s}\) are small, respectively. The PSB update (Eq. 3), on the other hand, is always numerically stable since the only quantity in the denominator (\(\mathbf {s}^{T}\mathbf {s}\)) is always nonzero for a finite step, but may have undesirable properties when the quadratic error is large. After observing that, in general, the SR1 update produced the most reasonable results when \(\mathbf {z}^{T}\mathbf {s}\) is less than 0, the following flowchart method was developed.
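As a concrete sketch of Eqs. 1–3 (written here in Python with NumPy; the implementation discussed later in this paper is in Mathematica, and all function names below are ours), each update is a symmetric correction built from \(\mathbf {s}\), \(\mathbf {y}\) and \(\mathbf {z}\):

```python
import numpy as np

def bfgs_update(H, s, y):
    # Eq. 1; assumes y.s is not vanishingly small
    Hs = H @ s
    return np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)

def sr1_update(H, s, y):
    # Eq. 2; assumes z.s is not vanishingly small
    z = y - H @ s
    return np.outer(z, z) / (z @ s)

def psb_update(H, s, y):
    # Eq. 3; the denominator s.s is always nonzero for a finite step
    z = y - H @ s
    ss = s @ s
    return (np.outer(s, z) + np.outer(z, s)) / ss \
        - (s @ z) * np.outer(s, s) / ss**2
```

All three corrections are symmetric and satisfy the secant condition \(\left( \mathbf {H}+\varvec{\Delta }\mathbf {H}\right) \mathbf {s}=\mathbf {y}\), since \(\varvec{\Delta }\mathbf {H}\,\mathbf {s}=\mathbf {z}\) in each case; they differ only in how they treat the space not probed by \(\mathbf {s}\).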

2.2 Method

2.2.1 Flowchart method

  1. If \(\frac{\mathbf {z}^{T}\mathbf {s}}{\left| \mathbf {z}\right| \left| \mathbf {s}\right| }<-0.1\), use the SR1 update.

  2. If \(\frac{\mathbf {y}^{T}\mathbf {s}}{\left| \mathbf {y}\right| \left| \mathbf {s}\right| }>0.1\), use the BFGS update.

  3. Otherwise, use the PSB update.

This has the benefit of attempting to use the SR1 and BFGS methods, which are often far superior to the PSB method for minimization, as much as possible, but relying on the numerical stability of PSB when necessary.
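The three-step flowchart above can be sketched directly (a simplified Python/NumPy illustration rather than the production code, with the 0.1 cosine thresholds taken from the text):

```python
import numpy as np

def flowchart_update(H, s, y):
    """Return (name, dH) for the flowchart choice among SR1, BFGS and PSB."""
    z = y - H @ s
    # 1. Strongly negative overlap of z with s: SR1 is well behaved
    if z @ s < -0.1 * np.linalg.norm(z) * np.linalg.norm(s):
        return "SR1", np.outer(z, z) / (z @ s)
    # 2. Sufficiently positive overlap of y with s: BFGS is well behaved
    if y @ s > 0.1 * np.linalg.norm(y) * np.linalg.norm(s):
        Hs = H @ s
        return "BFGS", np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
    # 3. Otherwise fall back on the always-stable PSB update
    ss = s @ s
    return "PSB", (np.outer(s, z) + np.outer(z, s)) / ss \
        - (s @ z) * np.outer(s, s) / ss**2
```

Whichever branch is taken, the returned correction satisfies the secant condition, so the flowchart only changes how the remaining space is treated.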

2.2.2 SSB method

All Hessian update methods are based upon computing a correction to the Hessian which satisfies the secant equation

$$\begin{aligned} \left( \mathbf {H+\varvec{\Delta }}\mathbf {H}\right) \mathbf {s} =\mathbf {y} \end{aligned}$$
(4)

Whenever the dimensionality of the PES is greater than 1, this equation is under-determined for the correction (\(\varvec{\Delta }\mathbf {H}\)) and the imposition of different constraints, such as the requirement that the correction be symmetric, and that it has a minimum size according to some metric, leads to the different update formulae. Since all update methods satisfy Eq. 4, the difference in performance between the methods must depend on how they treat the remaining space. For SR1 and BFGS, the numerator is constructed from the outer products of vectors that resemble gradient change terms (\(\mathbf {y}\), \(\mathbf {Hs}\) or \(\mathbf {z}\)), while the numerator for the PSB formula is constructed from outer products of displacements or displacements and gradient change terms. Similarly, the denominators in the SR1/BFGS updates are scalar products of the displacement and a gradient change term, while the denominators in the PSB update are scalar products of the displacement alone. If these features play a role in the improved behavior observed with the SR1/BFGS updates, then perhaps the PSB update can be modified to produce more reasonable updates as well.

A more general symmetric rank 2 update can be constructed using any vector \(\mathbf {v}\) as follows

$$\begin{aligned} \varvec{\Delta }\mathbf {H}_{SR2}=\frac{\mathbf {vz}^{T} +\mathbf {zv}^{T}}{\mathbf {v}^{T}\mathbf {s}} -\mathbf {s}^{T}\mathbf {z}\frac{\mathbf {vv}^{T}}{\left( \mathbf {v}^{T} \mathbf {s}\right) ^{2}} \end{aligned}$$
(5)

This update is valid for any choice of \(\mathbf {v}\) that has a nonzero overlap with \(\mathbf {s}\), with \(\mathbf {v=s}\) giving the PSB update, and \(\mathbf {v=z}\) giving the SR1 update. Greenstadt [18] observed that the choice of \(\mathbf {v=Ms}\), where \(\mathbf {M}\) is a positive definite weighting matrix, minimizes the magnitude of the update matrix relative to \(\mathbf {M}\). By letting \(\mathbf {M}\) be the approximate Hessian that is used to initialize a quasi-Newton optimization, we obtain a sensible \(\mathbf {v}\) that has characteristics of \(\mathbf {y}\), the change in the gradient, while also having a nonzero overlap with \(\mathbf {s}\), the displacement. This modified Greenstadt formula will be referred to as the scaled symmetric Broyden (SSB) update and is also tested in the flowchart method.
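The family of updates in Eq. 5 can be expressed in a single routine (a Python/NumPy sketch, names ours); the choice of \(\mathbf {v}\) then selects PSB, SR1 or the SSB variant:

```python
import numpy as np

def sr2_update(H, s, y, v):
    """General symmetric rank-2 update of Eq. 5; requires v.s != 0.

    v = s reproduces PSB, v = z reproduces SR1, and v = M s with M a
    positive definite weighting matrix (e.g., the initial model Hessian)
    gives the scaled symmetric Broyden (SSB) update.
    """
    z = y - H @ s
    vs = v @ s
    return (np.outer(v, z) + np.outer(z, v)) / vs \
        - (s @ z) * np.outer(v, v) / vs**2
```

Setting \(\mathbf {v=z}\) collapses the two rank-2 terms into the single rank-1 SR1 term, which is why SR1 appears as a special case.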

3 Scaled RFO method

3.1 Motivation

The shifted (quasi-)Newton methods control the direction and magnitude of a (quasi-)Newton step by a shift of the Hessian eigenvalues.

$$\varvec{\Delta }\mathbf {q}_{r}=-\left( \mathbf {H}_{r} +\xi \mathbf {I}\right) ^{-1}\mathbf {g}_{r}$$
(6)

where \(\xi\) is computed in such a way that the shifted quantity \({\mathbf {H}}_{r}+\xi {\mathbf {I}}\) has the correct number of negative eigenvalues (zero for minima, one for transition states) for the structure being optimized, and the subscript r indicates that the quantities are projected into the reduced, non-redundant space (see “Projection into non-redundant space” section in the Appendix for further details). The RFO method computes this shift factor as the negative of an eigenvalue (i.e., \(\xi _{RFO}=-\lambda _{\mathrm{aug}}\)) of the Hessian augmented with the gradient (a more detailed discussion of the RFO method may be found in the “RFO” section in the Appendix).

$$\begin{aligned} \mathbf {H}_{\mathrm{aug}}=\left[ \begin{array}{ll} \mathbf {H}_{r} &\quad \mathbf {g}_{r}\\ \mathbf {g}_{r}^{T} &\quad 0 \end{array}\right] \end{aligned}$$
(7)

When the shift factor is large enough to dominate the Hessian, the result is a scaled steepest descent (or shallowest ascent, depending on the sign of the shift factor) step. This tends to be a poor step on a chemical PES, since the vibrational modes in a molecule tend to be a combination of both soft angular/torsional and stiff bond stretch coordinates, which can lead to oscillations and poor convergence behavior. Standard methods to compute the shift factor often result in a shift that is small enough that the Hessian remains dominant, and the vibrational modes are scaled appropriately to avoid the oscillatory behavior. Consideration of better ways to account for the stiffness of the vibrational modes when computing the direction and magnitude of the shifted (quasi-)Newton step has resulted in the scaled RFO method.
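For a minimization, the RFO prescription of Eqs. 6–7 amounts to shifting the Hessian by the lowest eigenvalue of the augmented matrix. A minimal sketch (Python/NumPy, names ours):

```python
import numpy as np

def rfo_step(H, g):
    """Shifted step of Eq. 6 with xi = -lambda_aug from Eq. 7 (minimum search)."""
    n = len(g)
    H_aug = np.zeros((n + 1, n + 1))
    H_aug[:n, :n] = H                    # Hessian block
    H_aug[:n, n] = g                     # gradient border
    H_aug[n, :n] = g
    lam = np.linalg.eigvalsh(H_aug)[0]   # lowest augmented eigenvalue
    # By eigenvalue interlacing, lam lies below the lowest eigenvalue of H,
    # so the shifted Hessian H - lam*I is positive definite.
    return -np.linalg.solve(H - lam * np.eye(n), g), lam
```

Because the shifted Hessian is positive definite, the resulting step is always a descent direction, even when \(\mathbf {H}_{r}\) itself has negative eigenvalues.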

3.2 Method

While the RFO method is normally derived with a general matrix \(\mathbf {S}\) (see “RFO” section in the Appendix), it is usually implemented with \(\mathbf {S}=\mathbf {I}\). In the scaled RFO method, \(\mathbf {S}\) is a positive definite matrix that is chosen to better account for the difference in stiffness between the coordinates. Additionally, for behavior similar to the RFO method, \(\mathbf {S}\) should be normalized so that the magnitude of \(\mathbf {S}\) is comparable to the magnitude of the identity matrix. This ensures that the only function of the \(\mathbf {S}\) matrix is to change the contribution from each coordinate in the shifted matrix relative to one another without increasing or decreasing the magnitude of the shift itself. These requirements are met by using an approximate, positive definite Hessian \(\tilde{\mathbf {H}}\) used to initialize the Hessian in a quasi-Newton optimization, projected into the active space at the current geometry (\(\tilde{\mathbf {H}}_{r}=\mathbf {U}_{\mathrm{act}}\tilde{\mathbf {H}}\mathbf {U}_{\mathrm{act}}^{T}\)), with the normalization factor computed using the inverse of the determinant.

$$\begin{aligned} \mathbf {S}=\frac{\widetilde{\mathbf {H}}_{r}}{| \widetilde{\mathbf {H}}_{r}| ^{{1}/{n_{\mathrm{act}}}}} \end{aligned}$$
(8)

Löwdin orthogonalization [19] can be used to scale the Hessian, gradient and computed step, allowing for the computation of the shift parameter in the standard way.

$$\begin{aligned} \varvec{\Delta }\mathbf {q}_{SRFO}=-\mathbf {S}^{-{1}/{2}}\left( \mathbf {S}^{-{1}/{2}}\mathbf {H}_{r}\mathbf {S}^{-{1}/{2}}-\lambda _{RFO}\mathbf {I}\right) ^{-1}\mathbf {S}^{-{1}/{2}}\mathbf {g}_{r} \end{aligned}$$
(9)
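Putting Eqs. 8–9 together gives a short sketch of the scaled RFO step (Python/NumPy, names ours; `H_model` stands in for the projected model Hessian \(\tilde{\mathbf {H}}_{r}\), and the shift is taken here from the Löwdin-scaled augmented Hessian, which is one natural reading of “the standard way”):

```python
import numpy as np

def srfo_step(H, g, H_model):
    """Scaled RFO step: S from Eq. 8 (unit determinant), Loewdin scaling, Eq. 9."""
    n = len(g)
    S = H_model / np.linalg.det(H_model) ** (1.0 / n)   # Eq. 8: det(S) = 1
    w, V = np.linalg.eigh(S)
    S_mhalf = V @ np.diag(w ** -0.5) @ V.T              # symmetric (Loewdin) S^(-1/2)
    Hs, gs = S_mhalf @ H @ S_mhalf, S_mhalf @ g         # scaled Hessian and gradient
    H_aug = np.zeros((n + 1, n + 1))
    H_aug[:n, :n] = Hs
    H_aug[:n, n] = gs
    H_aug[n, :n] = gs
    lam = np.linalg.eigvalsh(H_aug)[0]                  # RFO shift in the scaled space
    return -S_mhalf @ np.linalg.solve(Hs - lam * np.eye(n), gs)
```

With \(\mathbf {S}=\mathbf {I}\) this reduces exactly to the ordinary RFO step, so the only effect of \(\mathbf {S}\) is to reweight the coordinates relative to one another.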

4 Quasi-rotation internal coordinate propagation

4.1 Motivation

Regardless of the coordinate system chosen, the number of internal degrees of freedom defining a molecular geometry is fixed at \(n_{\mathrm{act}}=n_{\mathrm{cart}}-6\) for nonlinear molecules. Whenever more than \(n_{\mathrm{act}}\) primitive coordinates are used to define the molecular geometry, a local basis of \(n_{\mathrm{act}}\) linear combinations of the primitive coordinates completely defines the allowed internal motions of the molecule, and the remaining coordinates orthogonal to that basis are redundant. This basis is considered local because it depends on the corresponding geometry: in the case of Cartesian coordinates, the rotational orientation defines the direction of the overall infinitesimal rotations, while with redundant internal coordinates, the local basis is given by the left singular vectors of the B-matrix (see “Transformation of gradient/Hessian” section in the Appendix), which is a function of the Cartesian coordinates.

A method for transforming the derivatives in the local basis for one geometry into the local basis for another geometry is outlined below. This reduces the amount of space required to store approximate Hessian information down to the amount required for a symmetric \(n_{\mathrm{act}}\times n_{\mathrm{act}}\) matrix and may help keep approximate Hessian information relevant over longer optimizations when the local space changes dramatically. The transformation is constructed by projecting the B-matrix of a previous geometry into the local basis at the current geometry, and the resulting \(n_{\mathrm{act}}\,\times\,n_{\mathrm{act}}\) matrix will have an unsigned determinant of approximately, but not less than, 1. For this reason, the transformation is referred to as a quasi-rotation coordinate propagation method.

4.2 Method

Instead of updating the Hessian in the full redundant space and then projecting into the non-redundant space at every step of the optimization, the Hessian may be stored and updated in only the active space. To account for the changes in the active space from one point to the next, a quasi-rotation scheme is used. If \(\mathbf {B}^{\mathrm{old}}\) and \(\mathbf {B}^{\mathrm{new}}\) are the B-matrices computed at the previous and current points, respectively, while \(\mathbf {U}_{\mathrm{act}}^{\mathrm{old}}\) and \(\mathbf {U}_{\mathrm{act}}^{\mathrm{new}}\) are the active spaces at the previous and current points, the quasi-rotation matrix that transforms a matrix or vector from the active space at the previous point to the active space at the current point is given by

$$\begin{aligned} \mathbf {M}^{\mathrm{rot}}=\mathbf {U}_{\mathrm{act}}^{\mathrm{old}}\mathbf {B}^{\mathrm{old}}\left( \mathbf {U}_{\mathrm{act}}^{\mathrm{new}}\mathbf {B}^{\mathrm{old}}\right) ^{-1} \end{aligned}$$
(10)

The rotated Hessian at the new point may be computed as follows

$$\begin{aligned} \mathbf {H}_{\mathrm{act}}^{\mathrm{rot}}=\left( \mathbf {M}^{\mathrm{rot}}\right) ^{T} \mathbf {H}_{\mathrm{act}}^{\mathrm{old}}\mathbf {M}^{\mathrm{rot}} \end{aligned}$$
(11)

This Hessian may then be updated by using modified \(\varvec{\Delta }\mathbf {q}\) and \(\varvec{\Delta }\mathbf {g}\) vectors

$$\begin{aligned} \varvec{\Delta }\mathbf {q}^{prj}= &\mathbf {U}_{\mathrm{act}}^{\mathrm{new}} \left( \mathbf {q}^{\mathrm{new}}-\mathbf {q}^{\mathrm{old}}\right) \end{aligned}$$
(12)
$$\begin{aligned} \varvec{\Delta }\mathbf {g}^{\mathrm{rot}}= &\mathbf {g}_{r}^{\mathrm{new}} -\left( \mathbf {M}^{\mathrm{rot}}\right) ^{T}\mathbf {g}_{r}^{\mathrm{old}} \end{aligned}$$
(13)

The projected displacement will not be exact, as some of the displacement may exist in the redundant space at the new point. However, this amount is typically small relative to the magnitude of the displacement vector, and the BFGS and SR1 updates only modify the space spanned by \(\varvec{\Delta }\mathbf {g}\) and \(\mathbf {H}\varvec{\Delta }\mathbf {q}\).
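A toy sketch of the propagation of Eqs. 10–11 (Python/NumPy; names ours, with two stated assumptions: the active space is taken as the leading left singular vectors of the B-matrix, and the inverse in Eq. 10 is applied as a pseudoinverse because the product \(\mathbf {U}_{\mathrm{act}}^{\mathrm{new}}\mathbf {B}^{\mathrm{old}}\) is rectangular in this construction):

```python
import numpy as np

def active_space(B, n_act):
    # Rows span the non-redundant (active) space at a geometry,
    # taken as the leading left singular vectors of the B-matrix
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :n_act].T

def quasi_rotation(U_old, B_old, U_new):
    # Eq. 10; a pseudoinverse handles the rectangular product (our assumption)
    return (U_old @ B_old) @ np.linalg.pinv(U_new @ B_old)

def propagate_hessian(H_act_old, M_rot):
    # Eq. 11: rotate the stored active-space Hessian to the new point
    return M_rot.T @ H_act_old @ M_rot
```

When the geometry (and hence the B-matrix and active space) is unchanged, the quasi-rotation reduces to the identity, and for small steps its unsigned determinant stays close to 1, consistent with the behavior described above.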

5 Results and discussion

The performance of the modified algorithms is compared to the standard algorithm outlined in the appendix: optimization in redundant internal coordinates using the BFGS formula to update the Hessian and RFO to compute the step. An adaptive trust region is also used to control the maximum step size, with the RFO step scaled back whenever the computed step would exceed the trust region. Six modified algorithms were constructed by replacing one component of the standard algorithm:

  • Cartesian—Optimize using Cartesian coordinates rather than redundant internal coordinates

  • QN—Use a quasi-Newton step (i.e., Eq. 27 with \(\lambda _{m}=0\)) rather than the RFO step

  • FlowPSB—Replace BFGS updating with the flowchart update, using the PSB update as the fallback option

  • FlowSSB—Replace BFGS updating with the flowchart update, using the SSB update as the fallback option

  • SRFO—Compute the step using the scaled RFO approach rather than the standard RFO step

  • RotCrd—Store only the non-redundant \(n_{\mathrm{act}} \times n_{\mathrm{act}}\) Hessian, using the quasi-rotation matrix to propagate the Hessian and gradient at the previous point prior to applying the update

In each case, the initial Hessian was estimated as a diagonal matrix in redundant internal coordinates, and the initial trust region was set to 0.5 bohr or radian. The trust region was limited in practice to a maximum of 2 bohr, although no ceiling on the maximum step size was found to be necessary, as steps longer than approximately 1 bohr were uncommon, particularly when RFO was employed. For the Cartesian optimization, the initial Hessian was transformed to Cartesian coordinates using the B-matrix, \(\mathbf {\widetilde{H}}_{\mathrm{cart}}=\mathbf {B} \mathbf {\widetilde{H}}_{\mathrm{int}}\mathbf {B}^{T}\), and the reduced space at each step is defined as the \(3n_{\mathrm{cart}}-6\) vectors orthogonal to the translation and rotation vectors defined in Eqs. 16 and 17. For the QN algorithm, if the Hessian is found to have a negative eigenvalue following the update, the optimization is terminated since the Hessian is not shifted prior to computing the step.

These algorithms were implemented in a Mathematica [20] program, using Gaussian 09 [21] to evaluate the potential energy surface. Table 1 shows the number of iterations, each consisting of an energy and gradient calculation, that each method required to achieve the convergence criteria listed in Sect. 1 for the examples in the test set shown in Fig. 1 (initial coordinates of the structures are provided in the supporting information). So that the different algorithms could be surveyed quickly, Hartree–Fock with the 3-21G basis set was used to compute the energies for most cases. The behavior for B3LYP [22–25] with a 6-31G(d,p) basis set should be similar, with the latter used in the EASC and vitamin C optimizations. For comparison, the total number of iterations required to converge all of the structures in the test set is listed in Table 1 for each algorithm that successfully optimized the complete test set. Additionally, since the QN method fails to converge on five of the structures, the total number of iterations required to converge only the remaining 15 structures is also listed.

Immediately apparent from the table is that optimization in redundant internal coordinates is vastly superior to optimization in Cartesian coordinates. Simple quasi-Newton steps perform slightly better, in general, than the steps shifted by the RFO method; however, the RFO method benefits from greater stability due to the ability to produce a good optimization step even when the Hessian has negative eigenvalues. It is common in optimization programs using the quasi-Newton method to employ additional fail-safes, such as restarting the optimization with a new approximate Hessian or rejecting updates that result in an indefinite Hessian, but the performance and merits of these approaches were not studied in the present work. The SRFO method offers a level of efficiency that is closer to the unshifted quasi-Newton method, while still providing the reliability of the RFO method.

For the new methods introduced in this paper, the RotCrd method was the only one that required more iterations (701) than the standard method (689) to optimize the test set. In general, the unsigned determinant of the quasi-rotation matrix was <1.05 even for large steps; however, when the BFGS update produced an indefinite Hessian, the subsequent step not only tended to increase the energy, but the resulting quasi-rotation matrix had a determinant much larger than previous or subsequent steps, in some cases being as large as 1.2. Since corresponding poor updates were observed in the standard algorithm, it suggests that changes to the local basis may play some role in these poor updates, and that a large determinant of the quasi-rotation matrix may have some use in rejecting what is likely to be a poor step prior to evaluating the energy/gradient. Additionally, the results here indicate that the RotCrd method adequately handles changes in the active coordinates during the course of the optimization, suggesting that it may be worthwhile to use this approach in future investigations of new strategies that involve the evolution of the redundant internal coordinate set during the optimization.

Both of the flowchart methods (FlowPSB and FlowSSB) and the SRFO method required fewer iterations to optimize all of the structures in the test set than the standard method. Only 8 of the 20 optimizations needed to fall back to the symmetric Broyden update in the flowchart method, so the differences in performance between the FlowPSB and FlowSSB methods are limited to the differences in those 8 structures with neither approach doing consistently better than the other. Even when the symmetric Broyden update was needed, it was only used once or twice during the course of the optimization, and so the BFGS and SR1 updates were well suited to these problems. It is possible that greater differences between the BFGS and the flowchart updates would be observed when the surface is less regular than ab initio or DFT surfaces (for example, using semi-empirical energies, or solvation models), and future investigation is warranted.

The SRFO method had the best overall performance of the seven methods in Table 1, requiring 45 fewer iterations to converge the test set than the standard approach. The SRFO method was also the only new method that converged to different structures, in six of the cases. When the initial structure is high in energy and far from a minimum, there may be multiple minimum energy wells that are accessible, each containing a different stable conformer. Since the SRFO effectively emphasizes changes to dihedral and valence angles early in the optimization, it is reasonable to expect that it will occasionally find a different minimum than the standard RFO approach, even when using the same initial Hessian and same Hessian update formula. For the molecules where the SRFO method had the largest improvement over the standard method (Azadirachtin, Bisphenol A, EASC and Inosine), the SRFO and Standard methods converged to the same structure. Likewise, in the only case where the SRFO method needed a significant number of additional iterations to converge (Ochratoxin A), it produced a lower energy structure.

6 Summary

The present paper discusses various components of optimization methods, including coordinate systems, transformations, control of the step size and direction, and Hessian updating, and provides a diverse set of 20 medium-sized molecules for testing optimization methods. The best current geometry optimization methods typically use redundant internal coordinates, BFGS updating of the Hessian, and RFO with trust region updating for step size control. A number of new refinements have been proposed and tested: updating that switches between SR1, BFGS and PSB or SSB to obtain the most reliable update (FlowPSB and FlowSSB), scaled RFO (SRFO) that varies control of the step size and direction depending on the softness or stiffness of the modes, and quasi-rotation propagation of the internal coordinates (RotCrd). The latter is an initial approach to working in only the active space of the redundant internal coordinates, which necessarily changes during the course of the optimization. The results for optimization of the molecules in the test set clearly show that redundant internal coordinates are superior to Cartesian coordinates. RFO is better than quasi-Newton since it can reach a minimum even when the Hessian has negative eigenvalues. The FlowPSB and FlowSSB methods are comparable to the standard BFGS updating in most cases. The SRFO approach takes about 7 % fewer steps for optimization of the test set.