Introduction

In this paper, we discuss our submissions to the solvation energy portion of the SAMPL-09 challenge [1]. This is a blind challenge where contributors do not have access to the experimental data until after all of the submissions have been compiled. As Richard Feynman eloquently stated, “The first principle is that you must not fool yourself, and you are the easiest person to fool” [2]. It is easy for scientists, after seeing initial results, to convince themselves that they should obviously have done things a certain way, whereas a blind challenge prevents cyclical tuning of a method to a specific data set by allowing comparison to experimental results only at the very end. In addition to the official submissions to the challenge, several retrospective calculations are presented to help explain the unexpected results. The purpose of these calculations is not to do better, which would violate the spirit of a blind challenge, but rather to clearly establish trends that were not known a priori.

The solvation methods discussed in this paper use an implicit solvent model and a single conformer. Zap TK [3] is a Poisson–Boltzmann (PB) solver that models solvent as a dielectric and solute as low dielectric with point charges. Zap TK differs from other PB solvers by using atom-centered Gaussians to create a smooth transition between dielectric regions, which greatly diminishes the numerical instabilities inherent in a grid-based approach [3]. SM8 [4] is a Generalized Born (GB) method, which uses descriptors such as refractive index and bulk surface tension in addition to the dielectric to model a solvent. The GB method does not explicitly solve the Poisson equation and thus is ostensibly faster; however, when combined with conformation and charge calculations, the difference in performance between well-implemented PB and GB is trivial.

A single conformer model assumes that the geometric conformation is the same for both gas phase and solution phase. This approximation is known not to be physical; for all but the most rigid molecules, the lowest energy conformations in gas phase and solution phase are rarely similar. However, assuming a single conformation allows the vibrational–rotational partition function to reasonably be ignored, which simplifies the problem and appears to yield a fortuitous cancellation of error. The single conformation model has been shown to do well versus a simple multi-conformer model [5].
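To make the approximation concrete, the sketch below contrasts a single-conformer estimate with one common ensemble treatment, in which each phase is assigned a free energy of -RT ln Σ exp(-E_i/RT) over its conformers. This is an illustration only and is not necessarily the multi-conformer model of ref. [5]; the function names and the conformer energies are hypothetical.

```python
import math

RT = 0.001987204 * 298.15  # kcal/mol at 298.15 K

def ensemble_free_energy(energies):
    """Return -RT ln sum_i exp(-E_i/RT), evaluated relative to the minimum for stability."""
    e_min = min(energies)
    z = sum(math.exp(-(e - e_min) / RT) for e in energies)
    return e_min - RT * math.log(z)

def single_conformer_dg(e_gas, e_soln):
    """Solvation energy when the same conformer is used in both phases."""
    return e_soln - e_gas

def multi_conformer_dg(gas_energies, soln_energies):
    """Solvation energy from separate conformer ensembles in each phase."""
    return ensemble_free_energy(soln_energies) - ensemble_free_energy(gas_energies)

# Hypothetical energies (kcal/mol) for two conformers of one molecule.
gas = [0.0, 1.5]      # conformer 0 is the gas-phase minimum
soln = [-5.0, -8.0]   # conformer 1 is better solvated

dg_single = single_conformer_dg(gas[0], soln[0])
dg_multi = multi_conformer_dg(gas, soln)
```

With only one conformer in each list the two estimates coincide, which is the sense in which the single conformer model is a limiting case of the ensemble treatment.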

The SM8 [4] method has done well retrospectively on data from previous SAMPL challenges [6, 7] and the authors hoped to replicate the success prospectively in this challenge. The authors submitted SM8 results for a variety of calculations using the M06 and M06-2X functionals [8]. These were done at MMFF94s [9, 10] gas phase global minima and also at geometries further optimized with the functionals. We found that the more computationally expensive geometries did quite poorly. Also, the SM8 submission from the Minnesota group [11] performed better than our submissions. Further analysis reveals a strong geometric sensitivity of the model and that the use of OMEGA [12] geometries was key to their success.

For a baseline calculation, a workflow was created that was extremely fast and required minimal human interaction with the data. This workflow took the lowest energy conformer from the OMEGA application, minimized with SZYBKI [13] using the MMFF94s forcefield, used QUACPAC [14] to assign AM1BCC [15, 16] partial charges, and calculated the solvation energy with Zap TK [3]. This workflow is able to calculate the entire SAMPL dataset in a few seconds on commodity hardware. Surprisingly, this was the best submission from the authors. Retrospective Zap TK calculations were done at a variety of geometries in an attempt to explain this result and it was found that the Zap TK model has the same strong geometric sensitivity as SM8.

CM4M [17] charges commonly used with the SM8 method were extracted from the output, and these charges were used for Zap TK calculations to see what sort of effect they would have. The previous attempt at using high-level DFT charges for SAMPL was very successful [5]. The results presented below show reason to be hopeful that high-level charges yield better results. Unfortunately, this experiment was done at what is now known to be a poor set of geometries, which prevents us from comparing these results to other methods in a meaningful way.

Methods

Geometries

Several sets of geometries are used in this paper. The geometries distributed with the SAMPL data set were generated with the OMEGA version 2.3.2 application from OpenEye Scientific Software. With default settings, this application uses a modified version of the MMFF94s forcefield where Coulomb terms have been removed. For small molecules, such as those in the SAMPL data set, complete sampling of all the torsion rules is possible. This set of geometries is referred to throughout the paper as the OMEGA set.

The SM8 submission from Ribeiro et al. [11] used the OMEGA geometries except for a further optimization of glycerol as discussed in their paper found in this issue. This set of geometries is referred to as OMEGA*. Also, an optimization of the OMEGA geometries was done using the MMFF94s forcefield as implemented in the SZYBKI version 1.3.2 application from OpenEye; this set is referred to as OMEGA plus SZYBKI.

Another set of geometries was calculated using 1000 trials of random coordinate embedding, distance geometry relaxation and then MMFF94s optimization. These geometries are assumed to be at the global MMFF94s minimum and are referred to as the MMFF set throughout the paper.
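The idea of many randomized starts followed by local optimization, keeping the lowest minimum found, can be sketched on a toy problem. The example below uses a hypothetical one-dimensional torsional potential and plain gradient descent; it stands in for neither the coordinate embedding and distance geometry steps nor the MMFF94s forcefield, and all names and parameters are illustrative assumptions.

```python
import math
import random

def torsion_energy(phi):
    """Toy torsional potential in kcal/mol; the terms are illustrative only."""
    return 1.0 * (1 + math.cos(3 * phi)) + 0.6 * (1 + math.cos(phi - 1.0))

def local_minimize(f, x, step=0.05, tol=1e-8, max_iter=5000):
    """Gradient descent using a central-difference estimate of the gradient."""
    h = 1e-6
    for _ in range(max_iter):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x_new = x - step * grad
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def multistart_minimum(f, n_trials=1000, seed=0):
    """Run n_trials random starts and keep the lowest local minimum found."""
    rng = random.Random(seed)
    best_x, best_e = None, float("inf")
    for _ in range(n_trials):
        x = local_minimize(f, rng.uniform(-math.pi, math.pi))
        e = f(x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

best_phi, best_energy = multistart_minimum(torsion_energy)
```

As with the MMFF set, the result is only assumed to be the global minimum; the assumption becomes safer as the number of trials grows relative to the number of distinct wells.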

Further optimization was done at both the M06/6-31+G** and M06-2X/6-31+G** levels using the MMFF geometries for input. The 6-31+G** basis set [18] was chosen for optimization in order to replicate the method used in the CM4M paper [17]. All M06 suite calculations were done using the GAMESSPLUS v2008-2 program [19]. GAMESSPLUS v2008-2 is a modified version of the April 11, 2008 (R1) release of GAMESS [20], which adds M06 suite and SM8 functionality as well as other features. The GAMESSPLUS distribution from the University of Minnesota modifies the source code of GAMESS, which is then compiled by the end user.

Cartesian coordinates for the MMFF, M06, and M06-2X sets of geometries are reported in Electronic Supplementary Material.

Charges

AM1BCC charges were calculated using QUACPAC 1.3.1, which includes OpenEye’s implementation of the work of Bayly and colleagues. CM4M charges were calculated for the M06 and M06-2X functionals in GAMESSPLUS using the 6-31G* basis set [21]. Charges were calculated at the listed geometry for each data set without further optimization.

Solvation models

Zap TK from the OpenEye version 1.7.0-3 toolkit release was used for Poisson–Boltzmann calculations. The ZAP9 radii [5] were used for Zap TK SAMPL submissions unless otherwise noted. Also discussed in this paper is a submission from Nicholls [22] where the radii have been parameterized for CM4M charges. Default Zap TK parameters were used, which include an internal dielectric of 1 and an external dielectric of 80. SM8 calculations were done with the GAMESSPLUS v2008-2 program using default parameters.
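For intuition about how the two dielectric constants enter, the Born model gives the closed-form electrostatic solvation energy of a single spherical ion, a limiting case that a Poisson–Boltzmann solver should reproduce. The sketch below is not the numerical procedure used by Zap TK; the function name, the unit charge, and the 2.0 Å radius are hypothetical choices.

```python
COULOMB_KCAL = 332.0637  # Coulomb constant in kcal*Angstrom/(mol*e^2)

def born_solvation_energy(charge, radius, eps_in=1.0, eps_out=80.0):
    """Born energy (kcal/mol) for moving a charged sphere from eps_in to eps_out.

    charge is in units of e and radius is in Angstrom.
    """
    return -0.5 * COULOMB_KCAL * charge ** 2 / radius * (1.0 / eps_in - 1.0 / eps_out)

# Hypothetical monovalent ion with a 2.0 Angstrom radius.
dg_ion = born_solvation_energy(charge=1.0, radius=2.0)
```

Because the energy scales with (1/eps_in - 1/eps_out), raising the external dielectric beyond 80 changes the result very little, whereas the choice of radius has a large effect.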

Statistics

Root mean square error (RMSE), mean unsigned error (MUE), and mean signed error (MSE) are presented for all methods discussed. These statistics were calculated using the amended values for cyanuric acid and glycerol. Due to missing basis sets for iodine, 5-iodouracil is not discussed in this paper and is left out of all calculated statistics despite being part of the official SAMPL set.
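A minimal sketch of these three statistics follows. The error is taken as calculated minus experimental, which is inferred from the discussion of positive MSE as less favorable calculated solvation energy; the function name and the sample values are hypothetical.

```python
import math

def error_statistics(calculated, experimental):
    """Return RMSE, MUE, and MSE (same units as the inputs) for paired values."""
    if len(calculated) != len(experimental) or not calculated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [c - e for c, e in zip(calculated, experimental)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(err ** 2 for err in errors) / n),
        "MUE": sum(abs(err) for err in errors) / n,
        "MSE": sum(errors) / n,
    }

# Hypothetical solvation energies in kcal/mol, not taken from the SAMPL data.
stats = error_statistics([-3.0, -5.0, -1.0], [-4.0, -4.0, -2.0])
```

MSE keeps the sign of the errors and so exposes a systematic shift, while RMSE and MUE measure magnitude only; RMSE weights large outliers more heavily than MUE.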

Results and discussion

SM8 calculations

The results for the SM8 calculations along with the given experimental measurements are presented in Table 1. The first column, containing SM8 calculations done at the OMEGA* geometries, is an official SAMPL submission from Ribeiro et al. [11]. The other sets of SM8 calculations are official SAMPL submissions from the authors.

Table 1 Solvation energies for SM8 calculations along with experimental results in units of kcal/mol

Columns 2 and 4 are calculated at MMFF globally optimized geometries using CM4M charges at the M06-2X/6-31G* and M06/6-31G* levels, respectively. Comparing these two sets of calculations shows that the M06-2X and M06 functionals yield similar results when used for this purpose. The calculations done with M06 had a slightly higher MSE, 0.68 versus 0.56 kcal/mol, but the two submissions had nearly identical results of 2.20 and 2.21 kcal/mol for RMSE. The solvation energies from these submissions are plotted in Fig. 1, where the systematically higher solvation energies from M06 calculations can be seen.

Fig. 1 Comparison of experimental energy with predicted energy of M06 and M06-2X using SM8 solvation model in kcal/mol with CM4M charges at MMFF geometries. M06 calculations are represented by blue diamonds and M06-2X calculations are represented by red squares

Results for further geometry optimizations at the M06-2X/6-31+G** level are given in column 3 and optimizations at the M06/6-31+G** level are presented in column 5. Both of these sets perform significantly worse than the calculations done at the MMFF geometries, with an RMSE of 2.51 kcal/mol for the M06-2X/6-31+G** geometries and 2.78 kcal/mol for the M06/6-31+G** geometries. Also, the MSE rose significantly for both functionals following optimization, with a rise from 0.56 to 1.25 kcal/mol for M06-2X and a rise from 0.68 to 1.54 kcal/mol for M06.

Comparison of SM8 calculations at the OMEGA*, MMFF, and M06-2X/6-31+G** geometries shows that the RMSE worsens and MSE increases with increasing optimization. The signed errors of these methods for individual molecules are plotted in Fig. 2. This figure shows that the calculated solvation energy becomes systematically less favorable (higher energy) as optimization increases.

Fig. 2 Signed error in kcal/mol of SM8 with CM4M charges using M06-2X/6-31G* at various geometries. Yellow line denotes OMEGA geometries with reoptimized glycerol, blue line denotes MMFF geometries, and red line denotes M06-2X/6-31+G** geometries. Molecules ordered by increasing solvation energy from left to right

Zap TK calculations

The Zap TK calculations are presented in Table 2. The best submission from the authors is presented in column 2, with an RMSE of 2.38 kcal/mol. This submission used OMEGA geometries further optimized with SZYBKI, AM1BCC charges assigned by QUACPAC, and solvation energies calculated with Zap TK. Calculating the entire SAMPL set required only a few seconds of computer time on a single core of a Q6600 processor.

Table 2 Solvation energies for Zap TK calculations in units of kcal/mol

Additional Zap TK calculations were performed retrospectively at several readily available geometries used for other submissions from the authors. These include the OMEGA geometries (column 1), MMFF globally optimized geometries (column 3), and M06-2X geometries (column 4). Columns 1 through 4 were all done using the Zap TK solvation model with AM1BCC charges at increasing levels of geometry optimization. Comparing the MSE of these calculations shows that the solvation model predicts less favorable solvation energies with increasing levels of geometry optimization. This is the same geometric sensitivity seen with the SM8 model. The signed errors of individual molecules for columns 1, 3, and 4 are plotted in Fig. 3.

Fig. 3 Signed error in kcal/mol of Zap TK using AM1BCC charges at various geometries. Yellow line denotes OMEGA geometries, blue line denotes MMFF geometries, and red line denotes M06-2X/6-31+G** geometries. Molecules ordered by increasing solvation energy from left to right

CM4M charges were extracted from the SM8 M06-2X/6-31G* calculation done at the M06-2X/6-31+G** geometry. A Zap TK calculation was done using these charges and is presented in column 5 of Table 2. As expected, this combination performed very poorly, likely because the ZAP9 radii are parameterized for AM1BCC charges. A reparameterization of 11 atomic radii [22] specifically for the CM4M charges coupled with the Zap TK method led to significantly better results, as presented in column 6. The Zap TK calculation with reparameterized radii did not perform well compared to the best methods; however, this calculation is now known to have been done at a poor set of geometries, as discussed in the following section. This calculation had a lower RMSE than the calculation with AM1BCC charges and ZAP9 radii at the same geometries, so improvement was seen, and further study is required to determine how this method will perform with better geometries.

Calculated solvation energy less favorable as optimization increases

The OMEGA application, using default options, generates conformations with a version of the MMFF94s forcefield where electrostatic contributions are removed. This removal causes the molecule to not be concerned with self-attraction, which typically yields a conformation that is able to easily interact with its exterior environment. This sort of conformation works well for the primary use of OMEGA, ligand conformations in protein active sites, but also works well for solvent. The best SM8 submission was performed at this geometry.

The second level of optimization (OMEGA plus SZYBKI) adds back in the electrostatic contributions but uses the first OMEGA geometry as the starting point. This allows the molecule to adjust into a better gas phase geometry. However, the molecule may be trapped in the original solvent-friendly well and not allowed to fully relax, which can be seen in the differences between columns 2 and 3 in Table 2. This second level of optimization proved to be a useful compromise and the best Zap TK submission was performed at this geometry. It is noteworthy that no SM8 submissions were performed at this geometry and the performance of such a calculation is worthy of further study.

The third level of optimization involved a global search for the lowest energy conformation using the full MMFF94s forcefield. This can be thought of as the best gas phase geometry according to the MMFF94s forcefield. The gas phase geometry assumes that there are no exterior forces and causes self-attraction to play a large role in the optimal geometry.

The fourth level of optimization attempted by the authors was a full DFT optimization from the MMFF starting point using both the M06 and M06-2X functionals. It is not required in general that DFT will optimize to a geometry that is even less favorable to solvation than a forcefield, since they are different models and the results could be reversed depending on parameterization. However, for the case of the M06 suite and the MMFF94s forcefield, we find that the M06 and M06-2X functionals yield the least favorable geometries for both the SM8 and Zap TK solvation models. In particular, low barrier torsions such as hydroxyl rotations can point towards a negatively charged region of the molecule, which is an electrostatic advantage in the gas phase but is a disadvantage in solution. Additionally, ring systems with multiple low energy conformations can adjust to accommodate the optimal orientation of polar groups. This is highlighted in Figs. 4 and 5, where the OMEGA and M06-2X geometries are compared for d-glucose. The largest differences between the OMEGA and M06-2X solvation energies are in d-glucose, d-xylose, glycerol, and diflunisal, all of which have multiple hydroxyl rotors.
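The competing gas-phase and solution-phase effects of a hydroxyl rotor can be illustrated with a two-site toy model: a partial positive charge (the hydroxyl hydrogen) placed either close to or far from a partial negative site. The sketch below uses the standard Still-type generalized Born expression with fixed Born radii as a stand-in solvent model; it is not the SM8 or Zap TK implementation, and the charges, radii, and distances are hypothetical.

```python
import math

COULOMB_KCAL = 332.0637  # Coulomb constant in kcal*Angstrom/(mol*e^2)

def coulomb_energy(q1, q2, r):
    """Gas-phase Coulomb interaction (kcal/mol) between two point charges."""
    return COULOMB_KCAL * q1 * q2 / r

def gb_polarization_energy(q1, q2, r, born_radius=1.5, eps_out=80.0):
    """Still-type GB polarization energy for two sites with equal, fixed Born radii."""
    f_self = born_radius  # f_GB reduces to the Born radius when r = 0
    f_cross = math.sqrt(
        r ** 2 + born_radius ** 2 * math.exp(-r ** 2 / (4 * born_radius ** 2))
    )
    pair_sum = q1 ** 2 / f_self + q2 ** 2 / f_self + 2 * q1 * q2 / f_cross
    return -0.5 * COULOMB_KCAL * (1.0 - 1.0 / eps_out) * pair_sum

q_h, q_acc = 0.4, -0.4       # hypothetical partial charges in units of e
r_close, r_far = 2.0, 3.5    # hydrogen pointing towards or away, in Angstrom

gas_close = coulomb_energy(q_h, q_acc, r_close)
gas_far = coulomb_energy(q_h, q_acc, r_far)
solv_close = gb_polarization_energy(q_h, q_acc, r_close)
solv_far = gb_polarization_energy(q_h, q_acc, r_far)
```

In this toy model the close arrangement has the lower gas-phase energy but the less negative polarization energy, so a geometry selected on gas-phase energy alone is predicted to be less favorably solvated.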

Fig. 4 Geometry for d-glucose generated by OMEGA

Fig. 5 Geometry for d-glucose optimized with M06-2X/6-31+G**

The differences in calculated solvation energy with regard to geometry are consistent with the findings of Mobley et al. [23]. That study found that the best vacuum phase geometry had a positive MSE when calculating solvation energy, whereas the best solution phase geometry had a small negative MSE. We did not do a rigorous search for the best solution phase geometry; however, the set of OMEGA geometries also yields a small negative MSE, which is consistent with a good geometry in solution phase. Mobley et al. did not study the effect of increasing levels of quantum mechanical geometry optimization, which we find to yield increasingly large positive errors in calculated solvation energy.

Conclusions

The best submissions for both SM8 and Zap TK were found in the lowest energy well from OMEGA. The best Zap TK submission benefited from a further MMFF94s optimization of this well, while similar optimization was not attempted for any of the SM8 submissions. Global optimization of MMFF94s and expensive DFT geometry optimizations were detrimental to performance of single conformer solvation energy calculations with the SM8 and Zap TK methods.

The combination of CM4M charges with the ZAP9 radii produced unsurprisingly poor results; however, the CM4M charges with reparameterized radii outperformed AM1BCC charges with ZAP9 radii at the same geometries. Additionally, other DFT charge models, which have done well in previous SAMPL challenges, were not attempted at the same geometry. Therefore, further studies involving better geometries and direct comparison to other DFT charge models are required in order to properly judge the performance of CM4M charges with Zap TK.

For solvation energy calculations, as is often the case in the field of computational chemistry, increased computational cost quickly leads to diminishing returns. While high-level gas-phase geometry optimizations are shown to be detrimental, high-level charges do in fact yield improved results. However, these high-level charges are several orders of magnitude more expensive than AM1BCC charges. The best SM8 calculation with DFT charges had an RMSE of 1.89 kcal/mol and the best Zap TK calculation with AM1BCC charges had an RMSE of 2.38 kcal/mol.