FreeSolv: a database of experimental and calculated hydration free energies, with input files

Mobley, David L.; Guthrie, J. Peter

doi:10.1007/s10822-014-9747-x

Published: 14 June 2014

Volume 28, pages 711–720, (2014)

Journal of Computer-Aided Molecular Design


Abstract

This work provides a curated database of experimental and calculated hydration free energies for small neutral molecules in water, along with molecular structures, input files, references, and annotations. We call this the Free Solvation Database, or FreeSolv. Experimental values were taken from prior literature and will continue to be curated, with updated experimental references and data added as they become available. Calculated values are based on alchemical free energy calculations using molecular dynamics simulations. These used the GAFF small molecule force field in TIP3P water with AM1-BCC charges. Values were calculated with the GROMACS simulation package, with full details given in references cited within the database itself. This database builds in part on a previous, 504-molecule database containing similar information. However, additional curation of both experimental data and calculated values has been done here, and the total number of molecules is now up to 643. Additional information is now included in the database, such as SMILES strings, PubChem compound IDs, accurate reference DOIs, and others. One version of the database is provided in the Supporting Information of this article, but as ongoing updates are envisioned, the database is now versioned and hosted online. In addition to providing the database, this work describes its construction process. The database is available free-of-charge via http://www.escholarship.org/uc/item/6sd403pz.

Introduction

Hydration free energies have been of substantial interest to the molecular simulations and computer-aided drug discovery communities for many years. These free energies describe the transfer of small molecules from gas to water, or equivalently their relative populations in gas and water at equilibrium. This interest stems from both practical and scientific reasons. Water is of considerable interest as a solvent, and these free energies can be used to probe aspects of solvation we do not yet understand [2, 6, 9, 10, 30]. Furthermore, since biomolecular binding interactions involve at least partial transfer of a molecular ligand from solution into a binding site, our ability to accurately model solvation and desolvation is thought to provide insight into the level of accuracy we could expect under ideal circumstances in a binding free energy calculation. That is, we should not expect substantially higher accuracy in binding calculations than we can achieve when computing hydration free energies. At a more practical level, hydration free energies are interesting in part simply because they can be calculated extremely precisely from molecular simulations for many small molecules [32, 49], enabling quantitative comparison to experiment. This comparison can provide insight into where and how to improve our underlying solvation models and force fields [11, 21, 23, 24, 32, 35, 37, 50–52].

For these reasons, the Mobley lab has spent a good deal of effort on hydration free energy calculations. Our approach to calculating these typically involves alchemical free energy calculations based on classical molecular dynamics (MD) simulations [5, 7, 29, 48], usually with a fixed-charge force field in explicit solvent. While other methods such as implicit-solvent calculations [33, 40, 45, 50] and MD simulations based on polarizable force fields [42, 43] or QM-MM approaches [60] are also of considerable interest, these have not been a major emphasis of our work.

Because of our interest in all-atom MD simulations, we previously compiled a database of roughly 504 neutral small molecules with experimental hydration free energies, and we computed hydration free energies of all of these compounds in both implicit solvent [33] and explicit solvent [32] using the GAFF small molecule force field [58, 59], AM1-BCC partial charges [17, 18], and the AMBER (implicit solvent case) [3] and GROMACS (explicit solvent case) [54] simulation packages. This dataset, typically called the “504 molecule set” or the “Mobley set”, has seen substantial use as a benchmark and test set in a reasonably wide variety of applications. We attribute this use partly to the substantial size of the set, but also partly because it includes both experimental and calculated values for all of the compounds, as well as input files. So, for example, it has been used to test and/or train implicit solvent models to reproduce explicit solvent results with the same parameters, as well as for direct comparisons of new or existing force fields against experiment [1, 8, 11, 13, 24, 25, 27, 28, 38, 55, 57].

While this previous set, which we here call “the 2008 set”, has been useful, it has several deficiencies. First, there are several errors in the set itself, in terms of duplicate compounds, incorrect values, and so on. While these issues are being corrected via an erratum, it seems likely that further updates will be needed in the future (especially if new experiments are performed), and there is no obvious mechanism for keeping the database updated when its main repository is the Supporting Information of a particular paper. Second, the format is less than ideal (in that much of the key information is embedded in PDF files within the Supporting Information), making it difficult to deal with in an automated manner. While we have provided this information in alternate formats such as plain text to individual researchers, this is hardly an ideal solution. Third, we now have additional experimental and calculated values^{Footnote 1} and we would like to extend the set to include these. Fourth, an ideal database would also include additional information to improve ease-of-use, such as additional compound identifiers like SMILES strings or identifiers from other databases such as PubChem, and better handling of experimental sources. Finally, an ideal database should be extensible in a straightforward manner.

To improve on the current situation, we have moved our database online to a permanent, cite-able URL (http://www.escholarship.org/uc/item/6sd403pz) and simultaneously updated, expanded, and curated the set, also adding additional, smaller sets we have studied previously and since. This paper reports on the update and curation process. The final product includes a variety of changes described below, to deal with limitations of the previous database. Additionally, the database is now versioned. While one specific version of the database is deposited in the Supporting Information associated with this paper, the full database now has a permanent, cite-able repository online which will allow further updates. Here, we describe our curation and construction process for this database, which we call the “Free Solvation Database” or FreeSolv.

Database construction

Starting points

The starting point in constructing the FreeSolv database was to pull together all of the lead author’s previous work calculating hydration free energies in explicit solvent. This included calculated values, experimental values, and structures and input files^{Footnote 2} from several previous studies [22, 31–36, 39]. To simplify the following discussion, we will refer to the set represented in each study by one of the authors’ names^{Footnote 3}, except for the large 2008 set [32, 33] as noted above. Specifically, we drew on the Dumont set [34], the Nicholls set [39], the 2008 set [32, 33], the Mobley set [31], the Klimovich set [22], the Liu set [35], and the Wymer set [36].

For all of these sets except one, we had retained not only calculated and experimental hydration free energies and original coordinate files (.mol2 format) containing geometries and partial charges, but also input files in the form of GROMACS topology and coordinate files. However, for the Nicholls set, we no longer had topology and coordinate files, so these were re-generated using Antechamber and ACPYPE [53].

After pulling together all these files, we found we had source files for 736 compounds. However, no cross-checking had been done at this point to ensure uniqueness of compounds. Uniqueness will be addressed below.

It is worth highlighting that this database contains only neutral solutes^{Footnote 4}. This is driven by two main considerations. First, a variety of technical issues make alchemical free energy calculations for charged solutes extremely challenging [19, 20, 46] and we have only recently begun to understand the necessary corrections. Second, direct experimental measurements of ionic hydration free energies are generally not possible; values must typically be obtained by decomposing the solvation of ion pairs into the solvation of the individual ions. This step can involve assumptions which are controversial. Hence, here, our focus has been on hydration free energies of neutral compounds. It is worth noting, however, that the Rizzo lab database [45] (http://ringo.ams.sunysb.edu/index.php/Rizzo_Lab_Downloads) contains in excess of 50 ions, including monoatomic and polyatomic ions, so the interested reader is referred there.

Error correction

We were already aware of several errors which we corrected in construction of the FreeSolv set. These will also be addressed in errata to the relevant individual studies. Specifically:

A human error had resulted in an incorrect structure and name (triacetyl glycerol) of the molecule which was intended to be triacetin/glycerol triacetate, in the 2008 set [33]. This compound had originated from the Nicholls set [39], where it was correct. The incorrect structure/name is now removed but the correct molecule from the Nicholls set is retained.
The experimental value for hexafluoropropene was corrected from −3.76 to 2.31 kcal/mol; it had incorrectly been assigned the value for hexafluoro-propan-2-ol due to human error interpreting abbreviations in reference [45], as per personal communication [44].
Several duplicates within the 2008 set [33] were removed, including 2-methylbut-2-ene under slight variants of the same name, 3-methylbut-1-ene in similar circumstances, and benzonitrile which is equivalent to cyanobenzene.
From the 2008 set [33], we removed a duplicate butanal entry which had an incorrect experimental value.
The molecule labeled pentan-2-one in the Dumont set [34] was actually pentan-3-one, so the name and experimental value were updated to reflect the correct compound.
The molecules labeled “lindane” and “prometryn” from the Mobley set were removed because of incorrect stereochemistry in the former case, and a swap between a dimethyl and an ethyl in the latter case. This issue appears to have originated in conversion of .xyz format files to 3D structures when the organizers were preparing for the statistical assessment of modeling of proteins and ligands challenge [14], and will likely require errata to several papers utilizing the relevant set [14]. This was caught during the curation process discussed below.

Initial construction process

While ideally each compound might be identified by its IUPAC name or SMILES string, different schemes for constructing these can lead to different names or strings. Every compound in the set needs a unique identifier, however, so our first step in updating the set was to assign each compound a compound identifier, consisting of the prefix “mobley_” followed by a unique random integer between 0 and 1 billion. These compound IDs serve as the basic identifiers of compounds in the set, and also serve as file names for structures and molecule files. These IDs were assigned automatically via a Python script.
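The ID scheme described above could be sketched roughly as follows. This is a minimal illustration, not the script used to build the database: the function name `assign_compound_ids`, the `seed` argument, and the retry-on-collision loop are all assumptions made for the example.

```python
import random


def assign_compound_ids(names, seed=None, upper=1_000_000_000):
    """Map hypothetical compound IDs ("mobley_<integer>") to compound names.

    Each ID uses a random integer between 0 and `upper`; a draw that collides
    with an already-used ID is simply repeated, so all IDs are unique.
    """
    rng = random.Random(seed)
    ids = {}
    for name in names:
        compound_id = f"mobley_{rng.randint(0, upper)}"
        while compound_id in ids:  # extremely rare, but guarantees uniqueness
            compound_id = f"mobley_{rng.randint(0, upper)}"
        ids[compound_id] = name
    return ids


if __name__ == "__main__":
    for compound_id, name in assign_compound_ids(["methane", "decane", "nerol"]).items():
        print(compound_id, name)
```

Because the integers are random rather than sequential, an ID carries no information about a compound's origin or position in any earlier set.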

Once compound identifiers were assigned, we pulled experimental and calculated values, as well as their uncertainties (when applicable—experimental uncertainties were not always available) and names (some followed IUPAC conventions; others did not) from the sets studied previously via custom Python scripts, with one script handling each prior database separately (since data formats differed). The resulting data was stored into a Python dictionary, keyed by compound ID, along with separate digital object identifiers (DOIs) for the sources of the experimental and calculated values. Our Python scripts also organized the supporting files (3D structures and parameter files), ensuring we had .mol2 files with both SYBYL and GAFF atom naming conventions for each molecule, and organizing the appropriate GROMACS topology and coordinate files. As noted above, in the case of the Nicholls set [39], the relevant script also re-generated topology files. A note of this was added to the ’notes’ field in the database for each of the affected compounds.

Curation process

Following initial construction of the database, we used a Python script drawing on OpenEye software’s Python toolkits [41] to curate the database.

Before doing anything else, this script removed the entry corresponding to 4-nitroaniline from the 2008 set [33], since the Mobley set [31] had this as well with an experimental value which had been more carefully curated [14].

After this, we used OpenEye tools to attempt to parse all of the compound names. Any names which did not parse correctly at this stage were flagged for attention, and these were typically dealt with in one of two ways. First, some of the failures were because stereochemistry information was unspecified by the compound name, but specified in our existing 3D structures. In these cases (1,2-dichloroethylene, nerol) we re-generated IUPAC names from the 3D structure using OpenEye tools. Second, the remaining cases were dealt with manually. There seemed to be several major sources of problems. There were a handful of typos (5-flurouracil rather than 5-fluorouracil, for example), and a variety of other cases where a common name had been used for the compound which was not recognized by the OpenEye toolkits (carbaryl, trifluralin, pirimor, etc.). The Mobley set [14, 31] was the origin of many of these. These were typically resolved by finding alternate names. Our default procedure was to generate the compound from its common name in MarvinSketch [4], and then compute an IUPAC name within MarvinSketch and check if the OpenEye toolkit could parse it back into the correct structure. When this procedure failed, we resorted to searching Wikipedia or PubChem for alternate compound names and checking that we obtained one which the OpenEye toolkits could parse back into the correct structure. In any case where the IUPAC name was edited as described here, a note to this effect was added in the ‘notes’ field of the database. All compound names were stored to the ‘iupac’ field in the database, though not all of these are technically IUPAC names. Additionally, alternate IUPAC names were assigned manually in two additional cases when PubChem lookup by the name (discussed in the Annotation section, below) failed. Specifically, mobley_2636578, 1,3-bis-(nitrooxy)propane, was renamed as 3-nitrooxypropyl nitrate, and mobley_819018, trans-3,7-dimethylocta-2,6-dien-1-ol, was renamed as (2E)-3,7-dimethylocta-2,6-dien-1-ol.

Following this check of compound names, we then generated canonical isomeric SMILES strings for each compound from the 3D structure and stored this to the database. We also then generated an analogous SMILES string for each compound from its stored name. In any case where SMILES generation from the name failed, a new name was generated from the 3D structure and stored, with the ‘notes’ field updated accordingly. In cases where SMILES were generated from both the name and the 3D structure (the vast majority of cases), we cross-checked these and ensured that they matched. This was the step where we caught the errors relating to lindane and prometryn noted above. Aside from that, no errors were found at this step.

Since for the vast majority of compounds we now had two isomeric SMILES strings (one generated from the name, and one from the 3D structure), this provided an ideal opportunity to check for redundancy in the set. Many compounds at this point appeared multiple times. For example, almost all of the compounds from the Dumont set [34] also appeared in the 2008 set [32, 33]. Some of the compounds from the 2008 set appeared in later sets as well. Thus, our next step was to remove duplicate compounds. This was made slightly more difficult by the fact that in some cases, the experimental data had a different origin (typically because an alternate name for the compound had led us to overlook the duplication initially), and thus the experimental values were potentially different. We dealt with this by identifying compounds which were identical (i.e. their canonical isomeric SMILES strings or chemical names were equivalent) and cross-checking their experimental values. In any case where the difference in experimental values was larger than the tabulated experimental uncertainty, the case was flagged for further investigation. This was not true for any of the compounds in the set except 4-nitroaniline, which occurred in both the 2008 and Mobley sets [14, 31]. After investigation, it was concluded that the later value is probably superior and this was retained. The remaining duplicates (approximately 72), where differences were not statistically significant, were removed from the set automatically.
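A rough sketch of the duplicate check described above is given below. It is an illustration under stated assumptions rather than the curation script itself: the field names (`smiles`, `expt`, `d_expt`), the choice to compare against the larger of the two tabulated uncertainties, and the sample IDs and values are all hypothetical, and only SMILES equivalence (not name equivalence) is checked.

```python
from collections import defaultdict


def find_duplicates(db):
    """Group entries by canonical isomeric SMILES and classify duplicates.

    Returns (removable, flagged): IDs that can be dropped automatically, and
    pairs whose experimental values differ by more than the tabulated
    uncertainty and therefore need manual investigation.
    """
    by_smiles = defaultdict(list)
    for compound_id, entry in db.items():
        by_smiles[entry["smiles"]].append(compound_id)

    removable, flagged = [], []
    for compound_ids in by_smiles.values():
        keep = compound_ids[0]
        for other in compound_ids[1:]:
            difference = abs(db[keep]["expt"] - db[other]["expt"])
            # Assumption: use the larger of the two uncertainties as tolerance.
            tolerance = max(db[keep]["d_expt"], db[other]["d_expt"])
            if difference > tolerance:
                flagged.append((keep, other))
            else:
                removable.append(other)
    return removable, flagged


# Placeholder entries for illustration only; these are not database values.
example_db = {
    "mobley_1": {"smiles": "CCO", "expt": -5.0, "d_expt": 0.6},
    "mobley_2": {"smiles": "CCO", "expt": -5.1, "d_expt": 0.6},
    "mobley_3": {"smiles": "CCC", "expt": 2.0, "d_expt": 0.2},
    "mobley_4": {"smiles": "CCC", "expt": 3.0, "d_expt": 0.2},
}
print(find_duplicates(example_db))
```

Comparing canonical strings only works if both strings come from the same canonicalization scheme, which is why the text generates both from one toolkit.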

In separate work, J. Peter Guthrie is compiling an extensive, carefully curated database of experimental hydration free energies. We cross-compared experimental values in our set to a pre-release version of the Guthrie database, and flagged discrepancies above 1 kcal/mol. (There were over 100 discrepancies below 1 kcal/mol; these fall within the scope of Guthrie’s database curation work rather than the scope of this paper.) For the flagged cases we obtained details of the data from Guthrie and in some cases updated experimental values and references. When we did so, this is shown in the ‘notes’ field of the database. This was true for 4-propylphenol, 4-bromophenol, 3-hydroxybenzaldehyde, 2-methoxyethanol, (2E)-hex-2-enal, and dimethyl sulfoxide/methylsulfinylmethane.

Additionally, after consultation with Guthrie, we removed a series of sulfonylurea compounds from the Mobley set [14, 31], because of concerns about the quality of the underlying vapor pressure measurements, especially Figs. 2–5 of reference [47]. Specifically, we removed the compounds called sulfometuron-methyl, metsulfuronmethyl, chlorimuronethyl, thifensulfuron, and bensulfuron. Unfortunately this means that we now only have two sulfones in our set, and in general have far too few sulfur-containing compounds, as we discuss below.

We also updated the experimental details for 1,3-butadiene. Specifically, we updated the reference to point to the original experimental data of Hine and Mookerjee [16], and updated our previous hydration free energy from 0.6 to 0.65 kcal/mol. As pointed out by Christopher I. Bayly in personal correspondence, the raw data there for activity coefficients in gas and water (\(-\log c_g = 1.39\) and \(-\log c_w = 1.87\)) lead to a difference of −0.48 rather than the stated value of −0.41, which is apparently a typo. The former leads to a hydration free energy of 0.65 kcal/mol, the correct value, while the latter would yield 0.56 kcal/mol.
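The arithmetic behind these two numbers can be checked with a short calculation. The sketch below assumes the usual relation ΔG = −RT ln(c_w/c_g) and a temperature of 298.15 K; the temperature and the function names are assumptions for illustration, chosen because they reproduce the values quoted above.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def free_energy_from_log_ratio(log10_ratio, temperature=298.15):
    """Gas-to-water transfer free energy (kcal/mol) from log10(c_w / c_g)."""
    return -R_KCAL * temperature * math.log(10) * log10_ratio


def hydration_free_energy(neg_log_c_gas, neg_log_c_water, temperature=298.15):
    """Same quantity, starting from the tabulated -log10 concentrations."""
    # log10(c_w / c_g) = (-log c_g) - (-log c_w)
    log10_ratio = neg_log_c_gas - neg_log_c_water
    return free_energy_from_log_ratio(log10_ratio, temperature)


print(round(hydration_free_energy(1.39, 1.87), 2))   # 0.65, from the raw data
print(round(free_energy_from_log_ratio(-0.41), 2))   # 0.56, from the typo value
```

At this temperature one log unit corresponds to about 1.36 kcal/mol, so the 0.07 log-unit typo shifts the result by roughly 0.1 kcal/mol.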

As a final step, we also generated SDF format files for all of the molecules in the set using the OpenEye toolkits. These supplement the .mol2 files we already had available.

Any further curation done will be documented in the database documentation distributed with each database version.

Annotation

In the past, we have found it useful to focus analysis on just a fraction of the database, such as by examining systematic errors organized by functional group [32]. To aid further such analysis, we used Checkmol [15] to assign functional groups to all of the compounds in the set. The resulting functional group identifiers were stored to the database in the ‘groups’ field.

We also decided to link compounds in our set to alternate databases to simplify future work relating to compound identification, so we chose PubChem compound identifiers as an alternate way of referencing compounds. We assigned PubChem compound IDs to all of the compounds in our set using PubChemPy [56] automatically. Our script first attempted lookup by the assigned compound name (usually IUPAC name) and in cases where this did not result in a match in PubChem, it fell back to lookup via SMILES string. In several cases, typically due to unspecified stereochemistry in PubChem, we had to assign a PubChem ID manually. This was the case for mobley_6843802 ([(1R)-1,2,2-trifluoroethoxy]benzene); mobley_7869158, [(2S)-butan-2-yl] nitrate; and mobley_9741965, 1,3-bis-(nitrooxy)butane. PubChem IDs are thus stored in the database for all compounds in the set.
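The name-then-SMILES fallback described above might look roughly like the following. This is only a sketch: `assign_pubchem_cid` and the `lookup` callable are hypothetical, and the lookup is passed in as an argument so that the logic can be shown without network access. In practice `lookup` could wrap a PubChemPy call such as `pubchempy.get_compounds(identifier, namespace)` and return the `cid` of each hit, but that wiring is an assumption and not the authors' script.

```python
def assign_pubchem_cid(name, smiles, lookup):
    """Return a PubChem compound ID, trying the name first and then the SMILES.

    `lookup(identifier, namespace)` is a hypothetical callable returning a
    (possibly empty) list of integer CIDs. None means that neither lookup
    matched and the ID has to be assigned manually.
    """
    for identifier, namespace in ((name, "name"), (smiles, "smiles")):
        cids = lookup(identifier, namespace)
        if cids:
            return cids[0]
    return None


# Offline stand-in for PubChem, used only to exercise the fallback logic.
_FAKE_PUBCHEM = {("methane", "name"): [297], ("CCO", "smiles"): [702]}


def fake_lookup(identifier, namespace):
    return _FAKE_PUBCHEM.get((identifier, namespace), [])


print(assign_pubchem_cid("methane", "C", fake_lookup))          # matched by name
print(assign_pubchem_cid("not a real name", "CCO", fake_lookup))  # matched by SMILES
```

Entries for which the function returns None correspond to the cases that, as noted above, had to be resolved by hand.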

Database format

Currently, the database is stored within Python as a dictionary, keyed by compound ID, with each compound having keys for the various entries (SMILES string, experimental value and uncertainty, calculated value and uncertainty, (IUPAC) name, functional groups, PubChem ID, and notes). This database is then stored as a Python pickle file, and in a semicolon delimited text file. In the latter format, functional groups are stored to a separate file, groups.txt, to ensure the number of fields in the database text file is manageable. The semicolon delimited format was chosen because other common delimiters (spaces, commas) often occur in compound names making them unsuitable as delimiters.
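As an illustration of this layout, the sketch below writes a dictionary keyed by compound ID to a semicolon-delimited text stream and a pickle, then reads the text back. The field names, their order, and the example entry are assumptions for the example and may not match the distributed files.

```python
import csv
import io
import pickle

# Hypothetical column order; the first column is the compound ID.
FIELDS = ["compound_id", "smiles", "iupac", "expt", "d_expt",
          "calc", "d_calc", "pubchem_id", "notes"]


def write_database(db, handle):
    """Write one semicolon-delimited row per compound."""
    writer = csv.writer(handle, delimiter=";")
    for compound_id, entry in db.items():
        writer.writerow([compound_id] + [entry[field] for field in FIELDS[1:]])


def read_database(handle):
    """Read the text format back; all values are returned as strings."""
    db = {}
    for row in csv.reader(handle, delimiter=";"):
        entry = dict(zip(FIELDS, row))
        db[entry.pop("compound_id")] = entry
    return db


# Placeholder entry for illustration only; the numbers are not database values.
example_db = {
    "mobley_example": {
        "smiles": "ClCCCl", "iupac": "1,2-dichloroethane",
        "expt": -1.0, "d_expt": 0.5, "calc": -1.5, "d_calc": 0.1,
        "pubchem_id": 0, "notes": "placeholder values",
    }
}

buffer = io.StringIO()
write_database(example_db, buffer)
restored = read_database(io.StringIO(buffer.getvalue()))
pickled = pickle.loads(pickle.dumps(example_db))  # pickle keeps Python types
print(restored["mobley_example"]["iupac"])
```

The comma inside the compound name survives the round trip unquoted, which is the practical reason for choosing the semicolon as delimiter.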

Database contents

Currently, the database contains 643 neutral compounds which can mostly be considered fragment-like from a drug discovery perspective. They range in molecular weight from methane (16.04 Daltons, compound mobley_9055303) to 1,2,3,4,5-pentachloro-6-(2,3,4,5,6-pentachlorophenyl)benzene (that is, decachlorobiphenyl, at 498.66 Daltons, compound mobley_5456566) (Fig. 1). The compounds also span a range of polarities. While experimental dipole moments are not part of our data set, we can compute dipole moments based on the AM1-BCC partial charges assigned to molecules, and we find that dipole moments range from 0.0 (methane and many others) to 7.14 for 4-nitroaniline (mobley_6082662). Experimental hydration free energies cover a range of approximately 29 kcal/mol, from 3.43 kcal/mol for octafluorocyclobutane (mobley_1723043) to −25.47 kcal/mol for (2R,3R,4S,5S,6R)-6-(hydroxymethyl)tetrahydropyran-2,3,4,5-tetrol^{Footnote 5} (mobley_9534740). Calculated hydration free energies range from 3.43 kcal/mol for decane (mobley_2197088) to −21.71 kcal/mol for cyanuric acid (mobley_6239320). The distribution of these properties is shown in Fig. 2.

While calculated and experimental hydration free energies for the compounds in this set have been compared before, this analysis is spread across several studies and aggregate statistics are not available. Figure 3 compares calculated and experimental values for the set. Here, we find an overall average error of 0.47 ± 0.06 kcal/mol, an RMS error of 1.51 ± 0.07 kcal/mol, an average unsigned error of kcal/mol, a Kendall τ of 0.80 ± 0.01, and a Pearson R of 0.94 ± 0.01.
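For readers who want to recompute statistics of this kind from the database, a minimal pure-Python sketch follows. Several choices here are assumptions: errors are taken as calculated minus experimental, Kendall τ is computed in its simple tau-a form (no tie correction), and no uncertainty estimates are produced. The input values are placeholders, not database entries.

```python
import math


def error_statistics(calc, expt):
    """Aggregate comparison statistics for paired calculated/experimental values."""
    n = len(calc)
    diffs = [c - e for c, e in zip(calc, expt)]  # assumed sign: calc - expt
    mean_c, mean_e = sum(calc) / n, sum(expt) / n
    covariance = sum((c - mean_c) * (e - mean_e) for c, e in zip(calc, expt))
    var_c = sum((c - mean_c) ** 2 for c in calc)
    var_e = sum((e - mean_e) ** 2 for e in expt)

    # Kendall tau-a: (concordant - discordant) pairs over all pairs.
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (calc[i] - calc[j]) * (expt[i] - expt[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1

    return {
        "average_error": sum(diffs) / n,
        "rms_error": math.sqrt(sum(d * d for d in diffs) / n),
        "average_unsigned_error": sum(abs(d) for d in diffs) / n,
        "kendall_tau": (concordant - discordant) / (n * (n - 1) / 2),
        "pearson_r": covariance / math.sqrt(var_c * var_e),
    }


# Placeholder values for illustration only.
stats = error_statistics(calc=[1.0, 2.0, 3.0], expt=[1.0, 2.0, 4.0])
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```

The pairwise loop scales quadratically with the number of compounds, which is unproblematic for a set of a few hundred molecules.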

As noted previously [25, 32], having such a large set of data makes it possible to look for systematic errors in the force field description of particular functional groups. This can also be seen in Fig. 4, where we look at the average unsigned error by functional group (as assigned by Checkmol)^{Footnote 6}. Previously, we have used information from similar tests to isolate systematic errors for alkynes [32] and alcohols [12] and taken some steps towards addressing these issues. However, further work in this direction is needed, as it seems fairly clear that some functional groups tend to have particularly large errors.
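A per-group breakdown of this kind can be sketched as follows. The data layout (a `groups` mapping from compound ID to a list of functional group names, and `calc`/`expt` fields) is assumed for illustration; the default cutoff mirrors the "more than 5 compounds" filter mentioned in the notes, and the example values are placeholders.

```python
from collections import defaultdict


def unsigned_error_by_group(db, groups, more_than=5):
    """Average unsigned error per functional group.

    Only groups occurring in more than `more_than` compounds are reported.
    """
    errors = defaultdict(list)
    for compound_id, entry in db.items():
        for group in groups.get(compound_id, []):
            errors[group].append(abs(entry["calc"] - entry["expt"]))
    return {
        group: sum(values) / len(values)
        for group, values in errors.items()
        if len(values) > more_than
    }


# Placeholder data for illustration only.
example_db = {
    "a": {"calc": -1.0, "expt": -2.0},
    "b": {"calc": 0.5, "expt": 0.0},
    "c": {"calc": 2.0, "expt": 2.0},
}
example_groups = {"a": ["alcohol"], "b": ["alcohol", "alkene"], "c": ["alkane"]}
print(unsigned_error_by_group(example_db, example_groups, more_than=1))
```

A compound carrying several functional groups contributes to each of them, so the per-group averages are not independent of one another.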

One reason hydration free energies are of such interest is that they provide a test of potential relevance to binding affinity calculations for drug discovery. But is this set relevant to drug discovery? The typical size of molecules in the set is substantially smaller than typical small-molecule drugs. As noted, many of these molecules are more like “fragments” than drugs. But this may not be a problem as long as we cover all the common chemical functionalities found in drug molecules. For example, if we know that each hydroxyl group typically leads to a systematic error of just over 1 kcal/mol in fragment-like molecules [12], there is no reason to assume the error should be more or less in larger, drug-like molecules. But if there are some functional groups which frequently occur in drug-like molecules but are missing from the present set, then we have very little insight into what level of performance to expect on compounds containing these functional groups.

To compare functional group representation in typical drugs with that in our set, we downloaded the set of small molecule drugs from DrugBank 3.0 [26]. This contains over 1,500 approved small-molecule drugs and a larger number of experimental drugs, with some 6,583 molecules in total. We then compared the functional group distribution seen in these molecules with that represented in our set (Fig. 5)^{Footnote 7}. On the whole, results are mixed. The present set does cover a reasonably broad range of functional groups, and even has more of some functional groups than in typical drugs (chlorinated compounds are a good example of this). But some functional groups are underrepresented by far or do not appear at all, such as aminals/hemiaminals, boronic acid and boronic acid esters, enamines, enols, enol ethers, hemithioaminals, and many sulfur-containing compounds, especially sulfonamides, sulfonic acids, sulfuric acid monoesters, and thiocarboxylic acid esters. If we want to truly understand how well our methods can predict thermodynamic properties for molecules containing these functional groups, we will need more data. These classes of compounds are also particularly concerning in that they are further away from the region of chemical space we have studied the most. Specifically, current biomolecular force fields have typically started with proteins and sometimes nucleic acids and branched out from there. As we move further from that region of chemical space, we know less about how well we can expect our force fields to work. And thus we particularly need more data for these types of compounds.

Conclusions

Here, we provide FreeSolv, an updated database of calculated and experimental hydration free energies for a large set of 643 neutral molecules which are mostly fragment-like. This database is freely available at http://www.escholarship.org/uc/item/6sd403pz and updates will be posted there when available.

While this database builds on our previously published work, it corrects a number of errors and redundancies and is more carefully curated. It is also designed to allow easy automated use via programs and scripts, and contains a variety of supporting files including molecular structures, topology and coordinate files, parameter files, and so on. We also provide SMILES strings and PubChem compound IDs for all the compounds in the set to allow easier cross-linking to other sources of chemical information.

We hope that the availability of the FreeSolv dataset will drive future force field development, development and testing of new methods, and potentially even new experimental work to fill in gaps in the available data. For example, we have highlighted functional groups which are common in drugs, and which are underrepresented or not present in this set.

Supporting Information

In the Supporting Information, we provide version 0.3 of the FreeSolv database, released Feb. 3, 2014, and a PDF file detailing changes leading up to this database.

Notes

1. Obtained using essentially the same protocols.
2. With one exception described below.
3. The author selected is usually one of those involved in running the calculations represented; for most of these sets, J. Peter Guthrie was key in determining the composition of the set.
4. It does contain a variety of carboxylic acids which would be expected to be charged in solution at neutral pH, but hydration free energies of these are typically reported for the neutral form of the molecule.
5. Tetrahydropyran numbering is used here.
6. Various groups used extremely long names and were abbreviated, while some other groups which were underrepresented were filtered out. We provide statistics only for groups occurring in more than 5 compounds, and we renamed “tertiary aliphatic amine (trialkylamine)” to “trialkylamine”, “halogen derivative” to “halogenated”, “tertiary aliphatic/aromatic amine (alkylarylamine)” to “alkylarylamine (3rd)”, “primary aliphatic amine (alkylamine)” to “alkyl amine”, “phenol or hydroxyhetarene” to “phenolic”, “secondary aliphatic/aromatic amine (alkylarylamine)” to “alkylarylamine (2nd)”, “secondary aliphatic amine (dialkylamine)” to “dialkylamine”, “orthocarboxylic acid derivative” to “ca-ortho”, and “carboxylic acid ester” to “ca-ester”.
7. As was the case when we examined the average error in our set by functional group, we simplified and shortened a variety of group names, as well as merging some groups and passing over others which contained too few or too many compounds. Specifically, every “carboxylic acid” was abbreviated “ca”, so “carboxylic acid amidine” became “ca-amidine”, etc. Other names were simplified to aid alphabetizing, such as “primary aliphatic amine (alkylamine)” being replaced by “amine, alkyl”, and similar changes for other alcohols and amines. “carbamic acid ester (urethane)” became “urethane”, and “halogen derivative” became “halogenated”. We otherwise retained only groups which occurred in at least 30 compounds in DrugBank, and passed over groups labeled “aromatic”, “heterocyclic”, “anion”, “cation”, and “alkene” because they tended to hit too many compounds or (in the case of “anion” and “cation”) were assigned in error. Other groups were merged to save space, either because they involved sub-categories (i.e. “carboxylic acid imide, N-unsubstituted” and “carboxylic acid imide, N-substituted” just became “carboxylic acid imide”) or to reduce the number of categories (“acetal” and “hemiacetal” became “acetal or hemiacetal”).

Acknowledgments

We thank Robert C. Rizzo (Stony Brook University) for help tracking down an issue with hexafluoropropene, and many others who have been involved in work on the experimental and calculated values represented in this database, including Élise Dumont, John D. Chodera, Ken A. Dill, Alan E. Barber, II, Anthony Nicholls, Christopher I. Bayly, Matthew D. Cooper, Vijay S. Pande, Michael R. Shirts, Pavel V. Klimovich, Shuai Liu, David S. Cerutti, William C. Swope, Julia E. Rice, Christopher J. Fennell, Nathan M. Lim, and Karisa L. Wymer. We also appreciate work done by Karisa Wymer and Jessica Fuselier towards initial curation of the set. DLM appreciates financial support from the National Institutes of Health (1R15GM096257-01A1), and computing support from the UCI GreenPlanet cluster, supported in part by NSF Grant CHE-0840513.

Author information

Authors and Affiliations

Department of Pharmaceutical Sciences and Department of Chemistry, University of California, 147 Bison Modular, Irvine, CA, 92697, USA
David L. Mobley
Department of Chemistry, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA, 70148, USA
David L. Mobley
Department of Chemistry, University of Western Ontario, London, ON, Canada
J. Peter Guthrie

Corresponding author

Correspondence to David L. Mobley.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 102 KB)

About this article

Cite this article

Mobley, D.L., Guthrie, J.P. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J Comput Aided Mol Des 28, 711–720 (2014). https://doi.org/10.1007/s10822-014-9747-x

Received: 29 April 2014
Accepted: 03 May 2014
Published: 14 June 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10822-014-9747-x


Introduction