Introduction

Computational chemistry has now become an essential part of drug discovery. Although the approximate nature of computational models is well understood, it is usually implicitly assumed that the individual computational steps are both reproducible and ‘accurate’, at least within the level of approximations applied. Numerical errors are rarely discussed in computational chemistry, although it is well known that these errors are unavoidable when finite-precision computers are used to represent the infinite aspects of mathematics [1, 2]. In many cases, numerical errors are small and have little effect on calculated properties. However, as the length and complexity of a calculation increase, numerical errors can accumulate to a point where both the calculated property and the reproducibility of the calculation can be drastically affected. As a result, many algorithms that are adequate for small systems can become unstable and fail on larger systems [3]. Computations that exhibit instabilities when faced with small input perturbations are known as being sensitive to initial conditions (sometimes also referred to as ‘chaotic’) [4, 5]: each iterative cycle of such algorithms accumulates and magnifies input errors to the point where the magnitude of those errors overwhelms the desired signal. Well-known examples of such systems include meteorological [4], seismological [6] and celestial trajectory simulations [7]. Numerical errors in timing calculations have also been blamed for failures in missile guidance systems [8]. Many dissipative dynamic systems such as minimizations can exhibit final state sensitivity [9], a condition characterized by fractal boundaries between attractive basins. Very small input perturbations near the basin boundaries of such systems will have drastic effects on the trajectory and final outcome of the calculation.

In computational chemistry, the chaotic nature of molecular dynamics (MD) trajectories and the resulting instabilities are well documented [10–13]. In MD simulations, numerical instabilities often manifest themselves most strongly as a drift of the total energy of the system, which leads to overheating. In addition, even in 1–2 ps simulations, very small perturbations of the atomic positions can lead to an exponential divergence of trajectories and to large differences in the resulting conformations. Other examples of numerical effects in computational chemistry include noisy regions produced by superposition of small exponent basis functions [14] and the tendency for self-consistent field (SCF) iterations to oscillate between solutions [15].

Recent years have seen the routine application of molecular mechanics geometry optimizations to increasingly large and more complex systems. The effects of numerical error are usually ignored in these calculations, and geometry optimizations are often assumed to be completely reproducible, as these computations typically contain no stochastic elements. However, as with any other complex calculation, propagation of numerical error through many iterative cycles of a minimization could cause the calculation to exhibit initial condition sensitivities, ultimately leading to divergent results when subjected to input perturbations or computer platform differences. Despite the well-known examples of numerical instability discussed above, and the fact that many computational chemistry applications could potentially exhibit initial state sensitivity, we currently know of no study that examines the effect of numerical errors on the reproducibility of molecular mechanics geometry optimizations. It is within this context that the authors began to study how small atomic coordinate perturbations and differences between computer platforms can affect the reproducibility of such optimizations.

Numerical error effects in molecular mechanics geometry optimizations should be especially apparent when optimizing large molecules with highly complex potential energy surfaces, such as proteins. As the number of particles (N) in the system increases, the spatial density of saddle points and local minima on the potential energy surface also increases, and the MM potential energy surface (PES) becomes more complex and irregular than may be expected. It has been estimated [16] that, in the case of a Lennard-Jones gas (i.e., van der Waals forces only), the number of possible minima ($N_{\mathrm{MIN}}$) scales exponentially with the number of particles: $N_{\mathrm{MIN}} \sim e^{\alpha N}$. Furthermore, the number of first order transition states ($N_{\mathrm{TS1}}$) scales as $N_{\mathrm{TS1}} \sim N e^{\alpha N}$ [17]. Here, α is a system-dependent parameter, estimates of which range from 0.02 to 13. Even with low values of α, the sheer number of possible minima and higher order extrema suggests their spatial density must be quite high. As a result, even very small differences in starting coordinates can place structures on different sides of transition state cusps, and in the vicinity of different local minima. Although a protein is not a Lennard-Jones gas, it is conceivable that the number of possible minima for a protein structure follows a similar exponential relationship with the number of atoms, leading to a large number of closely spaced extrema on the potential energy surface.
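To give a feel for the scaling relations above, the following sketch evaluates the base-10 logarithm of the estimated number of minima and first order transition states. It is an illustration only: using the heavy-atom counts of the smallest and largest structures in this study (44 and 3556) as N, and taking the lower quoted value α = 0.02, are assumptions made here, and the Lennard-Jones estimate is not claimed to hold quantitatively for proteins. The function names are hypothetical.

```python
import math


def log10_n_min(n_particles, alpha):
    """log10 of the estimated number of minima, N_MIN ~ exp(alpha * N)."""
    return alpha * n_particles / math.log(10)


def log10_n_ts1(n_particles, alpha):
    """log10 of the estimated number of first order saddles, N_TS1 ~ N * exp(alpha * N)."""
    return math.log10(n_particles) + log10_n_min(n_particles, alpha)


if __name__ == "__main__":
    alpha = 0.02  # assumed: lower end of the quoted range for alpha
    for n in (44, 3556):  # assumed: heavy-atom counts used in place of N
        print(
            f"N = {n}: log10(N_MIN) ~ {log10_n_min(n, alpha):.1f}, "
            f"log10(N_TS1) ~ {log10_n_ts1(n, alpha):.1f}"
        )
```

Working with logarithms avoids floating-point overflow, which a direct call to `math.exp` would hit for the larger values of α. Under these assumed inputs, even the lowest α gives on the order of 10^30 to 10^31 minima for the largest system.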

Small numerical differences in the starting atomic coordinates can be introduced from a number of simple and routine data manipulations, including the following sources:

  • Coordinate errors due to inaccuracies in coordinate transformation (translation, rotation);

  • Errors arising from finite field size (e.g. in PDB files) when saving/retrieving files;

  • Errors arising from differences in hydrogen addition (e.g. when working from PDB files with no hydrogens).
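The finite field size effect listed above can be reproduced in a few lines. The PDB format stores each Cartesian coordinate in a fixed-width field with three decimal places, so a write/read cycle rounds every coordinate to the nearest 0.001 Å. The sketch below mimics only that rounding step; the function name and the sample coordinate are hypothetical, and no real PDB parser is involved.

```python
def pdb_round_trip(value):
    """Write a coordinate into an 8.3f fixed-width field and read it back."""
    field = f"{value:8.3f}"  # three decimals, as in the PDB coordinate columns
    return float(field)


if __name__ == "__main__":
    x = 12.3456789  # hypothetical coordinate in Angstrom
    x_back = pdb_round_trip(x)
    print(f"{x!r} -> {x_back!r}, error = {abs(x_back - x):.2e} Angstrom")
```

The error introduced per coordinate is at most half of the last retained digit, that is, 0.0005 Å.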

In addition to input errors, digital calculations can be sensitive to the following sources of numerical error during run time:

  • Errors arising from finite accuracy in molecular mechanics optimizations and finite final gradient sizes.

  • The IEEE computing standard does not specify the order of all mathematical operations, so the compiler is free to determine this. Different operation order can cause very small changes to individual results.

  • Simple changes to assembly level instructions will cause slightly different results. For instance, some computers have a multiply-add instruction that retains more accuracy than a multiply followed by an addition.

  • When using multiple CPUs, numerical differences can also be introduced when making data-dependent branches (i.e., the flow of an algorithm is such that small data variations lead to different processors executing different pieces of code) or when race conditions between different calculation threads lead to changes in the sequence of operations [18].
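The dependence on operation order mentioned in the list above follows from the fact that floating-point addition is not associative. The short sketch below shows this in standard double precision: the same terms, summed in a different order, give different totals. The sample values are chosen here purely for illustration, and `math.fsum` is used only as a correctly rounded reference.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation, as a simple loop would perform it."""
    total = 0.0
    for v in values:
        total += v
    return total


# Regrouping three terms changes the last bit of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Reordering terms of very different magnitude changes the result completely.
terms = [1.0e16, 1.0, -1.0e16, 1.0]
forward = naive_sum(terms)
reordered = naive_sum(sorted(terms))
reference = math.fsum(terms)  # correctly rounded sum of the same terms

if __name__ == "__main__":
    print(left == right)
    print(forward, reordered, reference)
```

A compiler, a vectorized math library or a multi-threaded reduction is free to choose any of these orderings, which is one way in which identical source code can give slightly different numbers on different platforms.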

Although it is quite difficult to separate out these errors, the aim of this work is to merely show that it is important to consider possible effects of these types of errors when working with large molecular systems.

In general it is difficult to know a priori whether or not a calculation will exhibit numerical sensitivity, but the behaviour can be probed empirically by testing the effects of input perturbations and computer platform differences on the results of sample calculations [9]. Input error effects can be simulated by performing repeated runs on the same computer platform, each time using slightly perturbed versions of the input. Errors introduced by computer platform differences can be studied by running the exact same program and input on different computer systems. Calculations that do not exhibit sensitivity to initial conditions will produce identical or very similar results regardless of computer platform or input errors, while results produced from calculations that do exhibit sensitivity to initial conditions will vary substantially between computer platforms, and when subjected to small input perturbations. Molecular mechanics (MM) geometry optimizations are convenient for the empirical study of initial condition sensitivity because:

  1. Optimization algorithms used by MM, such as Steepest Descent and Conjugate Gradient, are typically deterministic (no stochastic elements), so reproducibility is typically expected.

  2. Data input errors can be introduced by small Cartesian coordinate perturbations or by saving to and retrieving from low precision file formats.

  3. There exist programs that have been compiled on multiple platforms, so the effects of computer platform differences can be investigated.

  4. Differences in geometry optimization trajectories accumulate because the coordinates in step (n + 1) of the algorithm ($X_{n+1}$) are computed using coordinates from the previous step(s), $X_n$. In addition, the step size used in the optimization will also affect the results.

  5. The complexity of the calculation can be increased by simply increasing the number of atoms in the system and/or the number of cycles in the simulation and/or the number of energy terms switched on in the potential.

  6. Energy and gradient calculations involve spline approximations, trigonometric functions, $1/r^n$ terms and other ill-behaved mathematical forms that are sufficiently complex to exhibit sensitivity to numerical error.

  7. Differences in results can be assessed quantitatively by comparing energies, gradients and coordinates, and qualitatively by visual inspection of structures.
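The first and fourth points in the list above can be illustrated with a toy problem. The sketch below applies fixed-step steepest descent to a one-dimensional double-well potential, f(x) = (x² − 1)², chosen here only as a minimal stand-in for a potential energy surface with two basins; it is not one of the force fields used in this work. Repeated runs from an identical start are bit-for-bit identical, while two starts that differ by a tiny perturbation across the barrier end in different minima.

```python
def gradient(x):
    """Derivative of the double-well potential f(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)


def steepest_descent(x0, step=0.05, n_steps=500):
    """Fixed-step steepest descent; fully deterministic for a given x0."""
    x = x0
    for _ in range(n_steps):
        x = x - step * gradient(x)  # each new position depends only on the previous one
    return x


if __name__ == "__main__":
    delta = 1.0e-6  # hypothetical perturbation on either side of the barrier at x = 0
    print(steepest_descent(+delta), steepest_descent(-delta))
    print(steepest_descent(delta) == steepest_descent(delta))
```

The step size and iteration count are arbitrary choices for this sketch. The point is only that a deterministic algorithm can still map nearly identical inputs onto clearly different outputs when the inputs lie close to a basin boundary.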

With the above points in mind, we performed some simple tests to independently study the effects of input perturbations and computer platform differences on geometry optimizations. Input errors were explored by examining the effects of small atomic coordinate perturbations on the results of repeated optimizations using the same computer platform. Computer platform differences were explored by examining the results of optimizing the same molecular input using different computer platforms.

Experimental

Input errors from transformations and file I/O

To study input errors, perturbed input structures of six sample peptide systems (structures 1–6, Table 1), varying in size from 44 to 3556 heavy atoms, were considered. Structure 1 is a small peptide constructed using the MOE protein builder [19], minimized using the AMBER94 force field [20] in MOE, and saved to disk in PDB format. Structures 2–6 were generated from the raw PDB files by deleting bound waters and selected ligands. Heavy-atom only and hydrogen-added versions of each of structures 2–6 were used as starting points for generating structures with small precision errors in the input coordinates. The MOE package was used to add the hydrogens in all explicit hydrogen versions of the structures. For each of the six starting structures (heavy-atom only and hydrogen-added), ten coordinate-perturbed versions of the structure (a–j) were generated by subjecting each structure to a ‘random transformation’: a 10 Å translation pulse in a random direction followed by a random rotation. The random rotations employed quaternions rounded to the sixth decimal place, helping to introduce small but real differences in relative atomic coordinates. These new coordinates were then superposed back onto the original coordinates using the MOE pro_Superpose function, which determines the optimum superposition transformation between point sets by minimizing the mean square distance between the corresponding points. Because the superposition transforms were not exactly the reverse of the random transformations, the process yields structures with absolute coordinates almost identical to the starting structures, except for differences of ∼0.0001 Å in the atomic coordinates. Further loss in coordinate precision was introduced by writing the new coordinates to disk in the PDB format, which supports only a limited precision for the coordinates. The new PDB structure files thus produced are identical to the original structures except for minute differences (∼0.001 Å) in the atomic coordinates. The ten perturbed versions of structures 1–6 were used for subsequent numerical sensitivity tests. In the heavy-atom only versions of structures 2–6, hydrogens were added to fill valences after the PDB files were read into the molecular modeling packages; this was meant to mimic a typical workflow scenario, and it adds further starting coordinate differences to these structures that arise from hydrogen placement. Details on structures 2–6 are given in Table 1.
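The effect of rounding the rotation quaternion can be sketched as follows. The code builds a random unit quaternion, rounds its components to six decimal places, and applies the usual quaternion-derived rotation matrix to two points. Because the rounded quaternion is no longer exactly of unit length, the transform is only approximately rigid and the distance between the points changes slightly. This is a hedged illustration, not the procedure used in MOE: the absence of renormalization after rounding, the sample points and all function names are assumptions, and the translation and superposition steps are omitted.

```python
import math
import random


def random_unit_quaternion(rng):
    """Random rotation as a unit quaternion (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def rotate(q, p):
    """Apply the rotation matrix built from quaternion q to point p.

    The matrix formula assumes |q| = 1; a rounded q is only approximately
    unit length, so the result is only approximately a rigid rotation.
    """
    w, x, y, z = q
    px, py, pz = p
    return (
        (1 - 2 * (y * y + z * z)) * px + 2 * (x * y - w * z) * py + 2 * (x * z + w * y) * pz,
        2 * (x * y + w * z) * px + (1 - 2 * (x * x + z * z)) * py + 2 * (y * z - w * x) * pz,
        2 * (x * z - w * y) * px + 2 * (y * z + w * x) * py + (1 - 2 * (x * x + y * y)) * pz,
    )


def distance_change(seed=0):
    """Change in an interatomic distance after rotation with a rounded quaternion."""
    rng = random.Random(seed)
    q = tuple(round(c, 6) for c in random_unit_quaternion(rng))  # assumed: no renormalization
    a, b = (0.0, 0.0, 0.0), (3.8, 1.2, -2.5)  # hypothetical atom positions in Angstrom
    return math.dist(rotate(q, a), rotate(q, b)) - math.dist(a, b)


if __name__ == "__main__":
    print(f"distance change: {distance_change():.2e} Angstrom")
```

The size of the distortion depends on the random quaternion drawn, so no specific value is quoted here; it is bounded by the rounding error of the quaternion components multiplied by the size of the structure.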

Table 1 Structures considered in this study

Effect of input errors on MM optimizations: structure (1)

Each of the 10 perturbed versions of structure 1 was read into the HyperChem [21], ChemX [22], MOE and Cerius2 [23] packages, and subjected to a molecular mechanics minimization down to a root mean square (RMS) gradient of 10⁻⁵ kcal/mol Å². (The packages MacroModel and Discovery Studio were not involved in these tests because hydrogen-filled PDB structures would have required some manual atom typing after reading in the structures, and this could have introduced additional errors into the results.) Geometry optimizations were performed using the conjugate gradient method, except in MOE, where the default minimization protocol (a cascade of minimization routines starting with steepest descent, followed by conjugate gradient and truncated Newton) was applied. The following MM force fields were applied: the MMFF94s [24] force field in MOE with the default settings, MM+ in HyperChem, the ChemX force field in ChemX and the Dreiding [25] force field in Cerius2.

Effect of input errors on MM optimizations: heavy-atom only structures (2–6)

Geometry optimizations on heavy-atom only structures 2–6 were performed using the Cerius2, MOE, Discovery Studio (DS) [26] and MacroModel [27] software packages. The Cerius2 minimizations were performed using stringent ‘high convergence’ settings (RMS force on atoms 10⁻³ kcal/mol Å, maximum force on atoms 0.005 kcal/mol Å, overall energy difference between steps 10⁻⁴ kcal/mol, overall RMS displacement 10⁻⁵ Å, maximum displacement 5 × 10⁻⁵ Å). It must be noted that the results did not differ substantially from those obtained using the less stringent ‘normal precision’ settings. The MOE optimizations were performed with the default optimization settings except for the convergence criterion, which was lowered to an RMS gradient of 10⁻⁵ kcal/mol Å². The Discovery Studio minimizations were performed using the CHARMm force field [28] with the ‘Adopted Basis NR’ algorithm to an RMS gradient of 10⁻⁸. No implicit solvent model was used and the dielectric constant was set to 1; otherwise default conditions were applied. In MacroModel, geometry optimizations were carried out with the OPLS2005 force field [29] using the Polak-Ribière conjugate gradient (PRCG) method with default parameters, a constant dielectric and no solvent. The calculations were terminated when the gradient was below 10⁻⁴.
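A composite convergence test of the kind quoted above for the ‘high convergence’ settings could be expressed as in the following sketch. The default thresholds are the values given in the text; everything else is an assumption for illustration, including the names, the use of per-atom magnitudes as input, and the rule that all five criteria must hold at once. None of this is taken from the actual implementation in any of the packages.

```python
import math
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Convergence thresholds; defaults are the 'high convergence' values from the text."""
    rms_force: float = 1.0e-3      # kcal/mol A
    max_force: float = 5.0e-3      # kcal/mol A
    energy_change: float = 1.0e-4  # kcal/mol
    rms_disp: float = 1.0e-5       # A
    max_disp: float = 5.0e-5       # A


def rms(values):
    """Root mean square of a sequence of magnitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def is_converged(forces, displacements, delta_energy, thresholds=None):
    """Return True if every criterion is met (assumed logic: all must hold).

    forces, displacements: per-atom magnitudes for the current step.
    delta_energy: energy difference between the last two steps.
    """
    t = thresholds or Thresholds()
    return (
        rms(forces) <= t.rms_force
        and max(forces) <= t.max_force
        and abs(delta_energy) <= t.energy_change
        and rms(displacements) <= t.rms_disp
        and max(displacements) <= t.max_disp
    )
```

For example, `is_converged([1e-4, 2e-4], [1e-6, 2e-6], 1e-5)` passes all five tests, while a single atom with a residual force of 0.01 kcal/mol Å is enough to fail the maximum-force criterion.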

Effect of input errors on MM optimizations: hydrogen-added structures (2–6)

Geometry optimizations on hydrogen-added structures 2–6 were performed using the MOE, Discovery Studio (DS) [26] and MacroModel [27] software packages. The minimization criteria and force field settings were the same as those used for the heavy-atom only structures.

Effects of platform differences on MM optimizations

The reproducibility of MM calculations across computer platforms was studied with MOE and Discovery Studio, because both programs have been compiled on different computer platforms. Furthermore, the MOE software uses the same underlying C code for the potential calculation on all supported platforms. With these tests, differences in optimizations on different platforms can be attributed strictly to software and architecture differences between the computer platforms. The ten perturbed versions of structures 1 and 3 were minimized in MOE on three different computer platforms: a 1 GHz Pentium4 Intel clone with 256 MB of RAM running the Windows 2000 Service Pack 4 operating system (henceforth referred to as the “MOE-Windows” system); a Silicon Graphics Octane2 with a 400 MHz IP30 processor (CPU: MIPS R12000 Processor Chip revision 3.5; FPU: MIPS R12010 Floating point chip revision 0.0) and 512 MB of RAM running IRIX 6.5 (henceforth referred to as the “MOE-SGI” system); and an IBM PowerPC 9113–550 with a quad processor and 4 GB of RAM running AIX 5.2 (henceforth referred to as the “MOE-IBM AIX” system). The ten perturbed versions of structure 3 were minimized in Discovery Studio on an HP xw8200 with two Intel Pentium4 CPUs at 3.2 GHz and 3 GB of RAM running Windows XP Service Pack 2 (henceforth referred to as the “DS-Windows” system) and on a Sun Fire V40z server with 8 dual-core AMD Opteron processors running Red Hat Enterprise Linux, release 4 (henceforth referred to as the “DS-Linux” system).

Results and discussion

The initial and final molecular mechanics energies for ten coordinate-perturbed versions of structure 1 are listed in Table 2. Tables 3a and 3b provide summaries of the corresponding results for heavy-atom only and hydrogen-added structures 2–6. The heavy-atom only starting structures were minimized using the MOE, Cerius2, Discovery Studio and MacroModel programs. The hydrogen-added structures were minimized with all of the packages except Cerius2.

Table 2 MM optimizations of structures 1a–1j
Table 3 MM optimizations of structures 2–6: (a) from heavy-atom only PDB structures, with hydrogens added in the respective packages; (b) from PDB files with hydrogens previously added

Input errors from transformations and file I/O

The average RMSDs arising from translation/rotation operations before writing to disk are small but real (<10⁻⁵ Å), and reflect truncation and rounding errors incurred during the transform calculations. The average coordinate RMSD after reading back from the PDB disk files is substantially larger (∼0.001 Å) and is comparable in magnitude to the allowed precision in the PDB format. Either way, the input errors introduced by these operations seem trivial, and they are much smaller than the maximum precision that can be hoped for from experimental coordinates. The RMSD errors show little size effect and appear to maintain the same order of magnitude when going from 44 to 3556 heavy atoms (results not shown).
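The order of magnitude of the file I/O error can be rationalized with a back-of-the-envelope estimate. Assume, as a simplification that is not tested here, that rounding each coordinate to a grid of spacing δ = 0.001 Å produces independent errors ε that are uniformly distributed on [−δ/2, δ/2]. The per-coordinate variance and the expected RMSD between an exact structure and its rounded copy are then:

```latex
\sigma^2 = \frac{1}{\delta}\int_{-\delta/2}^{\delta/2} \epsilon^2 \, d\epsilon = \frac{\delta^2}{12},
\qquad
\mathrm{RMSD} \approx \sqrt{3\sigma^2} = \frac{\delta}{2} = 0.0005\ \text{\AA}
\quad (\delta = 0.001\ \text{\AA}).
```

For two independently rounded copies of the same structure the variance doubles, giving RMSD ≈ δ/√2 ≈ 0.0007 Å. Both estimates are independent of the number of atoms and of the same order as the ∼0.001 Å reported above.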

Effect of input errors on MM optimizations

Since the MM optimization routines employed here contain no stochastic elements, repeated minimization runs of an identical starting structure performed on one computer platform with one CPU are expected to follow the exact same optimization trajectory to the exact same nearest local minimum. We tested this hypothesis by performing the optimization of structure 1 ten times in both the HyperChem and MOE programs, and structure 2 ten times in MOE alone; as expected, the energies and gradients at each optimization step were identical to all decimal points of precision in each of the repeated runs. Thus, the optimizations all followed exactly the same trajectory to exactly the same local minimum. This procedure was repeated for structures 3–6 with the same results; this suggests that within the confines of one computer platform and one CPU, minimizations on identical starting structures are completely reproducible.

Differences between the MM trajectories begin to appear as small perturbations are introduced into the starting structure. In Table 2, the initial energies of structures 1a–1j show that the small coordinate errors introduced by the transformations and I/O operations can have small effects on the initial MM energies (E initial). The variation in sensitivity to numerical error exhibited by the single-point E initial energies of different forcefields depends primarily on the gradient of the forcefield at that point in phase space. Calculations at potential energy surface (PES) points with large gradients will be significantly affected by small coordinate perturbations, while small coordinate perturbations will have little effect in regions of the PES where the gradient is small.

Subsequent minimizations of structures 1a–1j show an interesting result: the differences in the minimized energies (E final) can be greater than the differences in the starting energies (E initial). Also, the individual software packages behave quite differently. ChemX and HyperChem show identical and near-identical initial MM energies despite the small coordinate perturbations, but subsequent MM optimizations with these packages lead to optimized structures with a range of final energies larger than the range of initial energies. In contrast, MOE shows larger differences in the initial (unoptimized) energy values than either ChemX or HyperChem, but somewhat smaller energy differences after optimization. Using the Cerius2 package, the range of energies is the same before and after the MM optimization. A possible explanation for the behaviour of Cerius2 might be that the results reflect the simplicity of the Dreiding potential energy surface compared to the other more complex force fields studied here. In general, the sensitivity of a forcefield to numerical error is expected to depend on its complexity, and more precisely, the roughness of its potential energy surface and the density of local minima it produces in phase space. Forcefields with smooth potential energy surfaces and relatively sparse local minima will show less numerical sensitivity than forcefields with rough potential energy surfaces and densely packed local minima. Factors that could also affect the numerical sensitivity of a minimization include the numerical sensitivity of the optimization routine(s) employed and possibly even the compiler options and math libraries used to compile the program. However, the effect of these components is more speculative, and the underlying sensitivity to numerical error probably arises mainly from a rough potential energy surface and a high density of local minima.

The ChemX and HyperChem programs both produce a different final energy for each perturbed version of structure 1, suggesting that each input structure optimizes to a unique minimum which is similar, but not identical to, the minima found by the other starting structures. In contrast, the MOE and Cerius2 optimizations produce final structures that converge to a small number of distinct minima: two when using MOE (E final = 17.6657 or 17.5596 kcal/mol) and three when using Cerius2 (E final = 58.0338, 58.0566 or 58.0648 kcal/mol). One could argue that deeper minimization of these structures to even smaller gradients may result in convergence towards a single energy and structure. However, this is not the case; even if the final convergence criteria are made 2–3 orders of magnitude stricter, the range in the final energies hardly changes. The interesting observation here is the unexpected sensitivity of the minimizations to small starting position perturbations, ultimately causing the process to end up in different local minima.

The peptide structure 1 is a relatively small system, and the energy differences in the final structures are inconsequential when compared to protein-ligand binding energies, which typically range from ∼2–15 kcal/mol [30]. However, as the molecular systems get larger, the variations in the optimized energies and geometries become substantial. The results for the proteins 2–6 are summarized in Table 3a (heavy-atom only source structures) and Table 3b (hydrogen-added source structures). The tables contain the average initial and final energies, as well as their corresponding average errors and ranges. These tables illustrate that for larger systems the range of obtained energies generally increases with system size, and can become quite substantial. In the case of structure 6, the final energies calculated from different starting coordinates can differ by up to 100 kcal/mol. Also, in the case of structures 2–6, it is rare that within the ten examples a given final energy is repeated. This behaviour is quite different from the ‘expected’ reproducibility. It must be noted that we observed similar effects when optimizing molecular geometries using semi-empirical quantum chemistry at the AM1 and PM3 levels (results not shown).

Qualitatively, the RMSDs between the non-optimized structures are so small as to not be visible in line mode rendering of the overlaid structures, as shown for structure 5 in Fig. 1. A similar superposition of the Cerius2, MOE, Discovery Studio and MacroModel optimized versions of structure 5 (Fig.  2) reveals obvious differences between the optimized structures, with significant structural variation in some regions in the protein.

Fig. 1 Overlay of 10 randomly perturbed protein structures before minimization. (The structure is 1KPI in the PDB, shown as structure 5 in Tables 1 and 3a–b). Only the backbone atoms are shown. See text for details

Fig. 2 Overlay of 10 randomly perturbed protein structures after minimization using low final gradient settings in four different computer programs. (The structure is 1KPI in the PDB, shown as structure 5 in Tables 1 and 3a–b). Only the backbone atoms are shown. See text for details

Detailed RMSD analysis of the optimized structures 2–6 shows that on average the sidechains show greater deviation in the optimized structures than the backbone. In Table 4 the average pairwise RMSD between the optimized structure 5 geometries is broken down into contributions from the sidechain atoms, backbone atoms, helices, sheets and loops/disordered regions. The results show that in general the sidechains, loops and disordered regions show the greatest deviations between the optimized structures. This is expected because these atoms are in more disordered and peripheral regions of the protein, where the minima on the potential energy surface are broader and shallower.
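The average pairwise RMSD used in this analysis can be computed along the following lines. This is a minimal sketch under stated assumptions: the structures are taken to be already superposed and to list the same atoms in the same order, and the selection of a subset (sidechain, backbone, helix and so on) is left to the caller. The function names are hypothetical and do not correspond to any of the packages used here.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) tuples, without refitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def average_pairwise_rmsd(structures):
    """Mean RMSD over all unordered pairs of structures."""
    pairs = list(combinations(structures, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)
```

For the ten optimized versions of a structure this averages over 45 pairs. Restricting the coordinate lists to, for example, backbone atoms before the call would give the corresponding per-region contribution.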

Table 4 Breakdown of the average pairwise RMSD for structure 5 (mycolic acid cyclopropane synthase 1KPI)

The variation in final energies and geometries of structures 2–6 is somewhat unexpected, considering that the coordinate errors introduced into the starting structures are all quite small. Although these input errors were purposefully created, it is important to recognize that these input differences are realistic, and can be inadvertently introduced during commonplace computational chemistry structural manipulations and structure output/retrieval from physical disk. It is also important to recognize that the degree of variation in both final energy and RMSD between optimized structures increases with system size. Plots of the range of final energies as a function of heavy atom count (Fig. 3a, b) show that, in general, the final energy range increases with heavy atom count for all the packages, with both the heavy-atom only and hydrogen-added source structures. The plots in Fig. 4(a, b) show that the final pairwise RMSD also increases with heavy atom count, although the rate of increase is not completely uniform.

Fig. 3 Plot of the final energy range after minimization versus the number of heavy atoms for different protein structures. (a) For each of the proteins ten perturbed versions were generated using coordinate transformations. The molecules were read into the respective packages from PDB files, hydrogens were added and the structures minimized to a low gradient. (b) For each of the proteins ten perturbed versions were generated using coordinate transformations and then hydrogens were added. The molecules were read into the respective packages from PDB files and minimized to a low gradient. See text for further details

Fig. 4 Plot of the average pairwise RMSD (Å) after minimization versus the number of heavy atoms for different protein structures. (a) For each of the proteins ten perturbed versions were generated using coordinate transformations. The molecules were read into the respective packages from PDB files, hydrogens were added and the structures minimized to a low gradient. (b) For each of the proteins ten perturbed versions were generated using coordinate transformations and then hydrogens were added. The molecules were read into the respective packages from PDB files and minimized to a low gradient. See text for further details

The initial energy differences between the heavy-atom only and the hydrogen-added source structures (Table  3a, b) reflect differences in hydrogen placement between the software packages. Since MOE was used to add hydrogens to all the hydrogen-added structures, very little difference is seen between the initial MOE energies of the heavy-atom only and hydrogen-added structures. In contrast, the difference in initial energy of the heavy-atom only and hydrogen-added structures is larger for the other packages, reflecting small differences between the hydrogen addition routine in MOE and the hydrogen addition routine in each respective package. These hydrogen placement differences could produce additional energy variations during optimization, but in most cases there is little difference between the variation in final energies of the heavy-atom only and hydrogen-added source structures. The only exception to this trend is the Discovery Studio optimization of structure 6, where the variation in final energy of the optimized hydrogen-added source structures is four-fold the energy variation of the heavy-atom only source structures. A possible explanation for this behavior is that the hydrogen positions produced by MOE are sufficiently different from those produced by Discovery Studio that the hydrogen-added starting structures are in a much higher gradient region of the potential energy surface than the heavy-atom only starting structures where the hydrogens were added in Discovery Studio. Larger gradients in the beginning of the calculation would increase the sensitivity of the optimization to chaotic effects.

Effects of platform differences on MM optimizations

The initial and final energies obtained on three platforms for structure 1 using MOE are listed in Table 5. For a given perturbed structure (e.g. 1b), the initial energy using MOE varies by less than 0.001 kcal/mol across platforms, a much smaller variation than was introduced by the coordinate perturbations. This lack of variation in initial energy across computer platforms is to be expected, because the exact same input file is used in each case, and only one single-point energy calculation is performed to compute the initial energy. Thus, energy differences at this point are solely the result of differences in mathematical function evaluation between the computer platforms. Upon minimization of structure 1 with MOE, two final energy states were produced on all three platforms, i.e. the sensitivity to the starting geometry was reproduced, but no major platform dependence was seen. Furthermore, the two final energies are identical to those produced by the coordinate-perturbed structures on a single platform, adding further support to the observation that these structures minimize to two distinct minima.

Table 5 MOE minimizations of structure 1 on different computer platforms

The initial and final energies of structure 3 obtained on three platforms using MOE and two platforms using Discovery Studio are listed in Tables 6a and 6b. As with structure 1, the initial MOE energies of a given perturbed version of structure 3 vary by less than 0.001 kcal/mol across platforms. Using Discovery Studio, the initial energy of a given perturbed structure is identical across platforms to the five decimal places reported in the Discovery Studio log file. However, in contrast with structure 1, the final energy of a given perturbed version of structure 3 varies substantially between platforms. With MOE, the final energy of a given starting structure can vary by 10 kcal/mol across the computer platforms; with Discovery Studio, the cross-platform variation in final energies can be as high as 100 kcal/mol. It should be noted that these energy differences are much larger than typical protein-ligand binding energies. The results for structures 1 and 3 suggest that the final results from geometry optimizations will be increasingly platform dependent as the size of the molecular system increases.

Table 6 (a) MOE minimizations of structure 3 (bovine pancreatic RNase A, 1KF3) on different computer platforms; (b) Discovery Studio minimizations of structure 3 (bovine pancreatic RNase A, 1KF3) on two computer platforms

Conclusions

The behaviour of geometry optimization calculations when faced with perturbed input structures, different computer platforms and different programs suggests that sensitivity to initial conditions might be a common problem in molecular mechanics minimizations. With large systems, the errors produced as a result of this sensitivity can be of sufficient magnitude to affect the qualitative and quantitative conclusions drawn from the results. The sensitivity of these geometry optimizations calls into question what one really means by the “nearest local minimum” in an MM optimization. In this regard, the situation is somewhat similar to molecular dynamics, where “it has been a common frustrating experience that when a computer or compiler has been slightly changed, trajectories of MD cannot be reproduced” [31]. It now appears that simple MM optimizations have similar properties. The root cause of this effect, the non-linear interactions inherent in molecular mechanics force fields, has been identified and discussed in the context of molecular dynamics simulations [32]. We have now shown that in practice, under certain circumstances, even a straightforward energy optimization cannot be relied upon to give reproducible results (even though the process itself is fully deterministic), owing to its high sensitivity to input precision. Geometry optimizations are often viewed as a marble rolling down a bowl, a picture arising from the harmonic representation of the diatomic potential energy curve. In this scenario, small differences in starting position result in at most small differences in final energies, and these differences can be made infinitesimally small by improving the precision of the calculation. However, as system size increases, minimizations become more akin to a pebble falling down a rocky hillside; small perturbations in initial positions can lead to large differences in the path taken and the final resting place on the valley floor. In this case, no improvement in the precision of the calculations will substantially reduce the large differences in final energy.
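
The contrast between the marble and the pebble can be illustrated with a one-dimensional toy model. The sketch below is not a molecular mechanics calculation: the two potentials, the fixed-step steepest descent routine and all parameter values are assumptions chosen for illustration, and a one-dimensional basin boundary is a single point rather than the fractal boundary found in high-dimensional systems.

```python
def minimize(gradient, x, step=0.05, tol=1e-12, max_steps=10_000):
    """Fixed-step steepest descent in one dimension (illustrative only)."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def bowl(x):
    """Harmonic 'marble in a bowl' potential with a single minimum at x = 0."""
    return x * x


def bowl_gradient(x):
    return 2.0 * x


def rocky(x):
    """Asymmetric double well: a maximum at x = 0 separates two unequal minima."""
    return x**4 - x**2 + 0.3 * x**3


def rocky_gradient(x):
    return 4.0 * x**3 - 2.0 * x + 0.9 * x**2


delta = 1e-9  # size of the input perturbation (arbitrary units)

# Two starts on the bowl that differ by 2 * delta reach the same minimum.
bowl_a = bowl(minimize(bowl_gradient, 1.0 - delta))
bowl_b = bowl(minimize(bowl_gradient, 1.0 + delta))

# Two starts straddling the basin boundary at x = 0 reach different minima.
rocky_a = rocky(minimize(rocky_gradient, -delta))
rocky_b = rocky(minimize(rocky_gradient, +delta))

print(f"bowl : final energy difference = {abs(bowl_a - bowl_b):.3e}")
print(f"rocky: final energy difference = {abs(rocky_a - rocky_b):.3e}")
```

In this toy model the bowl gives final energies that agree to well below the perturbation size, whereas the two starts on the double well end in minima whose energies differ by about 0.22 in the arbitrary units of the model, regardless of how small `delta` is made.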

Although it is important to recognize input sensitivity as a potential source of error and ambiguity, it is difficult to propose a practical solution. Perhaps most akin to the issues described in this work are studies of the chaotic nature of molecular dynamics trajectories in proteins [32]. It was shown that in molecular dynamics simulations very small perturbations of the atomic positions (10⁻³ to 10⁻⁹ Å) led to an exponential divergence of trajectories and to conformations differing by as much as 1 Å RMSD within an elapsed time of 1–2 ps [31]. The effect of these issues on protein folding calculations has also been demonstrated [33]. It was somewhat pessimistically concluded that “individual MD trajectories of folding are too sensitive to small perturbations to have significant predictive quality” [31]. Luckily, the situation appears to be much less serious in geometry optimizations. To deal with numerical instability, one could ideally map the potential surface in the region of interest. As this is impractical for large molecules such as proteins, one possibility is to deliberately generate a small number of closely lying starting points (e.g. by coordinate perturbation), perform the minimization for all of them, and select the most appropriate solution (e.g. the lowest energy one or perhaps some kind of average). When multiple starting points are not used, however, it is important to bear in mind that these errors might be quite significant. This will especially be the case when calculating small differences in large energies (e.g. when determining protein-ligand binding energies from the energy difference between the complex and the free ligand and protein), or when the initial gradient is substantial (e.g. when minimizing a docked structure in the field of the receptor, or especially when a ligand is placed inside a receptor with no prior bound ligand). In such cases, the errors might be of the same order of magnitude as the calculated quantity, and researchers need to be aware that performing complex calculations under slightly different conditions or on different platforms can lead to significantly different results.
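
The multiple-starting-point strategy suggested above can be sketched on a one-dimensional toy potential. Everything in this example is an assumption made for illustration: the double-well `energy` function stands in for a force field, the steepest descent routine stands in for a production minimizer, and evenly spaced offsets replace random coordinate perturbations so that the sketch is reproducible.

```python
def energy(x):
    """Toy double-well potential (arbitrary units), not a real force field."""
    return x**4 - x**2 + 0.3 * x**3


def gradient(x):
    return 4.0 * x**3 - 2.0 * x + 0.9 * x**2


def minimize(x, step=0.05, tol=1e-10, max_steps=10_000):
    """Fixed-step steepest descent from the starting coordinate x."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def multi_start(x0, delta, n_starts):
    """Minimize from n_starts points spread over [x0 - delta, x0 + delta].

    Returns (final_energy, start, final_x) tuples sorted by final energy.
    """
    results = []
    for i in range(n_starts):
        start = x0 - delta + 2.0 * delta * i / (n_starts - 1)
        final_x = minimize(start)
        results.append((energy(final_x), start, final_x))
    return sorted(results)


results = multi_start(x0=0.0, delta=1e-3, n_starts=6)
best_energy = results[0][0]
spread = results[-1][0] - results[0][0]

for final_energy, start, final_x in results:
    print(f"start {start:+.1e} -> x = {final_x:+.4f}, E = {final_energy:+.5f}")
print(f"lowest energy: {best_energy:+.5f}, spread across starts: {spread:.5f}")
```

Reporting the spread alongside the selected energy gives a rough indication of how much a single minimization from this starting region could be in error; choosing the lowest energy is only one of the selection rules mentioned above.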