Blinded predictions of host-guest standard free energies of binding in the SAMPL5 challenge

Bosisio, Stefano; Mey, Antonia S. J. S.; Michel, Julien

doi:10.1007/s10822-016-9933-0


Published: 08 August 2016

Volume 31, pages 61–70, (2017)

Journal of Computer-Aided Molecular Design


Stefano Bosisio¹,
Antonia S. J. S. Mey¹ &
Julien Michel¹


Abstract

In the context of the SAMPL5 blinded challenge, standard free energies of binding were predicted for a dataset of 22 small guest molecules and three different host molecules: two octa-acids (OAH and OAMe) and a cucurbituril (CBC). Three sets of predictions were submitted, each based on a different variation of a classical molecular dynamics alchemical free energy calculation protocol that uses the double annihilation method. The first model (model A) yields a free energy of binding based on computed free energy changes in solvated and host-guest complex phases; the second (model B) adds long range dispersion corrections to the previous result; the third (model C) uses an additional standard state correction term to account for the use of distance restraints during the molecular dynamics simulations. Model C performs the best in terms of mean unsigned error for the whole data set (MUE $3.2\,<\,3.4\,<\,3.6\,\text{kcal}\,\text{mol}^{-1}$, 95 % confidence interval) and in particular for the octa-acid systems (MUE $1.7\,<\,1.9\,<\,2.1\,\text{kcal}\,\text{mol}^{-1}$). The overall correlation with experimental data for all models is encouraging ($R^2\, 0.65\,<\,0.70\,<\,0.75$). The correlation between experimental and computed free energies of binding ranks as one of the highest among the entries in the challenge. Nonetheless, the large MUE for the best performing model highlights systematic errors, and submissions from other groups fared better with respect to this metric.


Introduction

Accurate and reliable computational prediction of the binding affinities of small molecules for larger molecules, such as proteins, remains a major objective of computer simulations for molecular design [1]. To assess state-of-the-art tools for computational predictions of thermodynamic properties of binding, the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) challenge was formulated almost 10 years ago [2–4]. The goal of SAMPL5, as in previous years, was to compare different computational approaches in blinded challenges for different properties. This report is concerned with host-guest standard binding free energies, as predicted by our group.

Host-guest systems can be regarded as a toy model for protein-ligand systems and represent a good playground for testing the accuracy of thermodynamic property predictions. Various computational methods are available to compute free energies of binding of host-guest systems [5–8]. Previous SAMPL challenges have featured potential energy functions ranging from quantum chemical [9–11] to molecular mechanical approaches [12, 13]. Molecular dynamics (MD) or Monte Carlo (MC) simulations are frequently carried out to estimate the ensemble averages that yield standard free energies of binding. These methodologies still face three major problems: the sampling problem [14], the translation of host-guest systems into force field terms [15] and the presence of finite-size effects [16]. Different approximations lead to various ways of estimating free energies of binding from molecular simulation trajectories, e.g. free energy perturbation (FEP) [17], finite difference thermodynamic integration (FDTI) [18], or end-state-only variants such as MM-PBSA [19].

In this study a trajectory-based alchemical free energy approach was used to predict standard free energies of binding for 22 host-guest complexes. The dataset consists of 16 guests that bind to three different hosts: two octa-acid hosts (OAH and OAMe) and a cucurbituril clip (CBC), as shown in Fig. 1. The octa-acid systems are basket shaped. OAH [20] has four flexible propionate side chains bearing two rotatable single bonds each, whereas OAMe contains four methyl groups, which alter the shape and depth of the hydrophobic cavity. CBC [21, 22] is a more flexible host, which has shown a high binding affinity for ferrocene, adamantane and bicyclooctane guests [23]. The aim of this paper is to illustrate the accuracy and agreement with experiment that can be reached by means of standard free energy of binding calculations using a molecular mechanics approach with the general Amber force field (GAFF) [24]. In recent studies by Mishra et al. and Aldeghi et al. [25, 26] GAFF has not systematically been the most accurate force field, but it remains an attractive choice due to the ease of parameter generation, especially given the limited time available in SAMPL between dataset release and the deadline for prediction submissions. The performance of three different variants of a double annihilation methodology for binding free energy predictions is critically assessed, as well as their overall standing with respect to other SAMPL5 submissions.

Theory and methods

Several approaches have been proposed to compute standard free energies of binding from molecular simulations.

Computing free energies of binding: models A, B, and C

One way of estimating a free energy of binding ${\varDelta }G_{\mathrm {bind}}$ from MD simulations is by means of a double annihilation technique proposed originally by Jorgensen et al. [27] and discussed extensively by Gilson et al. [28]. The free energy of binding ${\varDelta }G_{\mathrm {bind}}$ is given by:

$$\begin{aligned} {\varDelta }G_{\mathrm {bind}} = -k_BT\ln \frac{Z_{\mathrm {HG,solv}}Z_{\mathrm {solv}}}{Z_{\mathrm {G,solv}}Z_{\mathrm {H,solv}}}, \end{aligned}$$

(1)

where $k_{\mathrm {B}}$ is the Boltzmann constant, T the temperature, $Z_{\mathrm {HG,solv}}$, $Z_{\mathrm {G,solv}}$, $Z_{\mathrm {H,solv}}$ and $Z_{\mathrm {solv}}$ are the configuration integrals for host-guest complex, the guest, the host and the solvent molecules respectively. Figure 2 depicts how the double annihilation approach may be used to evaluate ${\varDelta }G_{\mathrm {bind}}$ by means of thermodynamic cycles. First the guest’s partial charges are turned off both in water and in the host-guest-complex phase (discharging step), giving the discharging free energy change ${\varDelta }G_{\mathrm {elec}}^{\mathrm {solv}}$ and ${\varDelta }G_{\mathrm {elec}}^{\mathrm {host}}$ respectively. Secondly, the guest is fully decoupled from the solvent or host, switching off the van der Waals terms (vanishing step), ${\varDelta }G_{\mathrm {vdW}}^{\mathrm {solv}}$ and ${\varDelta }G_{\mathrm {vdW}}^{\mathrm {host}}$. The discharging and vanishing steps are usually performed with a series of intermediate simulations that depend on a coupling parameter $\lambda \in [0,1]$. In the double annihilation method the term ${\varDelta }G_{\mathrm {rest}}$ shown in Fig. 2 is zero (see details below). Closure of the thermodynamic cycle in Fig. 2 shows that in the double annihilation technique the free energy of binding ${\varDelta }G_{\mathrm {bind}}$ is given as [29]:

$$\begin{aligned} {\varDelta }G_{\mathrm {bind}}= \left( {\varDelta }G_{\mathrm {elec}}^{\mathrm {solv}} + {\varDelta }G_{\mathrm {vdW}}^{\mathrm {solv}}\right) -\left( {\varDelta }G_{\mathrm {elec}}^{\mathrm {host}} + {\varDelta }G_{\mathrm {vdW}}^{\mathrm {host}}\right) . \end{aligned}$$

(2)

Free energies of binding computed according to Eq. (2) will be referred to as model A. In the actual MD simulation an empirical distance-restraint term is added to the potential energy function. This is done to prevent the non-interacting guest from drifting out of the host cavity. A flat-bottom restraining potential is used between one atom of the guest, chosen to be the one closest to the centre of mass, and four equivalent carbon atoms of the host. The restraint potential for atom j of the guest is based on work presented in Ref. [5] and takes the following form:

$$\begin{aligned} U^{\mathrm {restr}}(d_{j1},\ldots ,d_{jN_{\mathrm {host}}}) =\sum _{i=1}^{N_{\mathrm {host}}} {\left\{ \begin{array}{ll} 0 &{} \quad \mathrm {if}\; | d_{ji} - R_{ji} | \le D_{ji} \\ \kappa _{ji}\left( | d_{ji} - R_{ji} | - D_{ji} \right) ^2 &{}\quad \mathrm {if}\; | d_{ji}- R_{ji} | > D_{ji} \end{array}\right. }, \end{aligned}$$

(3)

where $U^{\mathrm {restr}}(d_{j1},...,d_{jN_{host}})$ is the potential energy of the restraint as a function of the distance between a guest atom j and a set of host atoms i, $d_{ji}=|| \mathbf {r}_{i}-\mathbf {r}_{j} ||$ where $||\circ ||$ denotes a 2-norm, $D_{ji}$ is the restraint deviation tolerance, $R_{ji}$ the reference distance between host and guest atom, $\kappa _{ji}$ the restraint force constant, and $N_{\mathrm {host}}$ the number of host atoms that contribute to the restraint.
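As an illustration of Eq. (3), the flat-bottom restraint can be sketched in a few lines of Python. This is a minimal sketch and not the SOMD implementation: the function name `flat_bottom_restraint` and the use of one shared $R$, $D$ and $\kappa$ for all host atoms are assumptions made for brevity, with default values taken from the parameters quoted in the production protocol.

```python
import numpy as np

def flat_bottom_restraint(r_guest, r_hosts, R=5.0, D=2.0, kappa=10.0):
    """Flat-bottom restraint energy of Eq. (3), in kcal mol^-1.

    r_guest : (3,) coordinates of the restrained guest atom j, in Å
    r_hosts : (N_host, 3) coordinates of the host atoms i, in Å
    R, D    : reference distance and deviation tolerance, in Å (assumed
              identical for every host atom in this sketch)
    kappa   : force constant, in kcal mol^-1 Å^-2
    """
    r_guest = np.asarray(r_guest, dtype=float)
    r_hosts = np.asarray(r_hosts, dtype=float)
    d = np.linalg.norm(r_hosts - r_guest, axis=1)   # d_ji for every host atom
    excess = np.maximum(np.abs(d - R) - D, 0.0)     # zero inside the flat region
    return float(np.sum(kappa * excess**2))
```

The penalty is zero whenever every guest-host distance lies within $R \pm D$, and grows quadratically outside that window, so the decoupled guest can move freely inside the cavity without being able to drift away.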

Model A neglects, among other things, the contribution of long range dispersion interactions, since the cutoff for the Lennard Jones interactions was set to 12 Å to speed up simulations (see protocols). Following work from Shirts et al. [30], it is possible to introduce a long range dispersion correction term to the free energy of binding as a post processing step of the simulation trajectories. This leads to a corrected free energy of binding ${\varDelta }G_{\mathrm {bind,LJLRC}}$ given by:

$$\begin{aligned} {\varDelta }G_{\mathrm {bind,LJLRC}}= & {} \left( {\varDelta }G^{\mathrm {solv}}_{\mathrm {elec}} + {\varDelta }G^{\mathrm {solv}}_{\mathrm {vdW}}\right) \nonumber \\&\quad -\,\left( {\varDelta }G^{\mathrm {host}}_{\mathrm {elec}}+{\varDelta }G^{\mathrm {host}}_{\mathrm {vdW}}\right) \nonumber \\&\quad +\,\left( {\varDelta }G^{\mathrm {host}}_{\mathrm {LJLRC}}-{\varDelta }G^{\mathrm {solv}}_{\mathrm {LJLRC}}\right) . \end{aligned}$$

(4)

Equation (4) gives the free energy of binding for model B, where the Lennard Jones dispersion correction ${\varDelta }G^{\mathrm {X}}_{\mathrm {LJLRC}}$ can be computed making use of the Zwanzig relation [31] in the following way:

$$\begin{aligned} {\varDelta }G^{\mathrm {X}}_{\mathrm {LJLRC}}=k_BT\ln \langle \exp [-\beta (U_{\mathrm {LJ,long}}(\mathbf {r})-U_{\mathrm {LJ,sim}}(\mathbf {r}))]\rangle _{\mathrm {X}}+U_{\mathrm {LJ,ana}}, \end{aligned}$$

(5)

where X = host or solv, $U_{\mathrm {LJ,long}}$ is the Lennard Jones energy computed with an increased long range cutoff, $U_{\mathrm {LJ,sim}}$ is the Lennard Jones energy computed with the cutoff used in the simulation, and $U_{\mathrm {LJ,ana}}$ is an analytical correction from the increased long range cutoff to infinity. $U_{\mathrm {LJ,long}}$ is evaluated in a post processing step of the ‘vanishing’ trajectories generated at $\lambda =0$ and $\lambda =1$, by extending the Lennard Jones cutoff radius from the 12 Å used in the simulation to cover almost the entire box. To define this long cutoff, the minimum box length in all directions in the input coordinates is calculated, and the new cutoff radius is set to $r_{c,\mathrm {long}}=0.95 \min (L_x,L_y,L_z)/2$ to allow for some fluctuations in box size. This allows the additional contribution of the long range potential $U_{\mathrm {LJ,long}}$, with respect to the simulated Lennard Jones term $U_{\mathrm {LJ,sim}}$, to be averaged over the whole trajectory. This correction, however, is limited by the finite box size. The remaining contribution, from the long cutoff to infinity, is accounted for by an analytical correction, which is the additive constant given below:

$$\begin{aligned} U_{\mathrm {LJ,ana}} = 8\pi \rho \sum _i^{N_{\mathrm {sol}}}\sum _{j}^{N_{\mathrm {solv}}}\left[ \frac{\epsilon _{ij}\sigma _{ij}^{12}}{9r_{\mathrm {c}}^9} - \frac{\epsilon _{ij}\sigma _{ij}^6}{3r_{\mathrm {c}}^3}\right] , \end{aligned}$$

(6)

where $\rho$ is the solvent density in mol Å⁻³, $N_{\mathrm {sol}}$ is the total number of atoms in the guest, $N_{\mathrm {solv}}$ the number of solvent molecules, $\epsilon _{ij}$ is the Lennard Jones well depth, expressed in kcal mol⁻¹, and $\sigma _{ij}$ is the Lennard Jones distance, in Å, calculated with the Lorentz–Berthelot combining rule [32]. Lennard Jones parameters for the solvent are those of the oxygen atom of the TIP3P water model [33]. It is implicitly assumed that the radial distribution function g(r) = 1 for distances greater than $r_{\mathrm {c}}$. Both model A and model B lack a well defined reference state in their definition of the free energy change upon binding of the guest molecules. Therefore a third model is proposed to enable a standard state definition. For this purpose a standard state correction is added to the free energy of binding given by Eq. (4). The standard free energy of binding is given by:

$$\begin{aligned} {\varDelta }G^{\circ }_{\mathrm {bind}}= & {} \left( {\varDelta }G^{\mathrm {solv}}_{\mathrm {elec}} + {\varDelta }G^{\mathrm {solv}}_{\mathrm {vdW}}\right) \nonumber \\&\quad -\,\left( {\varDelta }G^{\mathrm {host}}_{\mathrm {elec}}+{\varDelta }G^{\mathrm {host}}_{\mathrm {vdW}}\right) \nonumber \\&\quad +\,\left( {\varDelta }G^{\mathrm {host}}_{\mathrm {LJLRC}}-{\varDelta }G^{\mathrm {solv}}_{\mathrm {LJLRC}}\right) + {\varDelta }G^{\circ }_{\mathrm {restr}}, \end{aligned}$$

(7)

where ${\varDelta }G^{\circ }_{\mathrm {restr}}$ accounts for the introduced flat-bottom restraint. Considering the cycle in Fig. 2, the restraint free energy change can be computed as:

$$\begin{aligned} {\varDelta }G_{\rm {restr}}^{\circ } = -k_{\rm {B}}T\ln \left( \frac{Z_{{{\rm H}\circ \circ{\rm G}^{\rm{ideal}}}, {\rm solv}}}{Z_{\rm H,solv} Z_{\rm {G,gas}}} \right) , \end{aligned}$$

(8)

where $Z_{{{\rm H}\circ \circ {\rm G}^{\rm{ideal}}}, {\hbox{solv}}}$ is the configuration integral for the restrained decoupled guest bound to the host, $Z_{\mathrm {H, solv}}$ is the configuration integral for the solvated host and $Z_{\mathrm {G,gas}}$ is the configuration integral for the guest in an ideal thermodynamic state (i.e. no non-bonded interactions). Assuming that the restraint potential is decoupled from the solvent and host degrees of freedom, Eq. (8) simplifies to:

$$\begin{aligned} {\varDelta }G_{\rm restr}^{\circ } = -k_{\rm B}T\ln \left( \frac{Z_{{\circ \circ {\rm G}^{\rm{ideal}}} , {\hbox{solv}}}}{Z_{{\rm G},{\rm gas}}}\right) , \end{aligned}$$

(9)

where $Z_{{\circ \circ {\rm G}^{\rm{ideal}}} , {\hbox{solv}}}$ is the configuration integral for the decoupled guest. Because the guest has no intermolecular interactions in either of the thermodynamic states defined in Eq. (9), and because the restraint does not hinder rotational motions, internal and rotational contributions to the configuration integrals cancel out and the only term left is the translational contribution to the configuration integral. For $Z_{\mathrm {G,gas}}$ a standard volume of measurement $V^{\circ }$ is used, with the 1 M dilute solute convention corresponding to $V^{\circ } = 1660\,{\AA}^3$ per molecule. Therefore Eq. (9) simplifies further to:

$$\begin{aligned} {\varDelta }G_{\mathrm {restr}}^{\circ } = -k_{\mathrm {B}}T\ln \left( \frac{V^{\mathrm {restr}}}{V^{\circ }}\right) , \end{aligned}$$

(10)

where the restraint volume $V^{\mathrm {restr}}$ is given by:

$$\begin{aligned} V^{\mathrm {restr}} = \int _{-\infty }^{+\infty }\int _{-\infty }^{+\infty }\int _{-\infty }^{+\infty }dx_{j}\,dy_{j}\,dz_{j} \exp (-\beta U^{restr}(d_{j1},\ldots ,d_{jN_{host}})). \end{aligned}$$

(11)

$V^{\mathrm {restr}}$ can be calculated by numerically integrating Eq. (11). The following procedure was used. First, the coordinates of the host-guest complex in the trajectory generated at $\lambda =1$ of the vanishing step were aligned to the first frame of the trajectory. Then, the average coordinates of each of the four host atoms used for the restraint were computed. Next, a grid spacing and an integration domain were defined. The grid spacing was set to 0.1 Å and the integration domain was defined as the rectangular cuboid given by the minimum and maximum coordinates of the four host atoms, extended by a buffer of ±5 Å in each direction. Numerical integration was then performed via the multidimensional trapezoidal rule.
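The integration procedure for Eq. (11) and the resulting correction of Eq. (10) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the post-processing script from the repository: the function names, the value used for $k_{\mathrm{B}}$, and the use of a single $R$, $D$ and $\kappa$ for all host atoms are choices made here. The host atom positions are assumed to be the trajectory-averaged coordinates obtained after alignment.

```python
import numpy as np

KB = 0.0019872041   # Boltzmann constant in kcal mol^-1 K^-1 (assumed value)
V_STD = 1660.0      # standard volume in Å^3 for the 1 M convention

def trapezoid_weights(n, h):
    """1D trapezoidal-rule weights for n equally spaced points with spacing h."""
    w = np.full(n, h)
    w[0] = w[-1] = h / 2.0
    return w

def restraint_volume(hosts, R=5.0, D=2.0, kappa=10.0, T=298.0,
                     spacing=0.1, buffer=5.0):
    """Numerically integrate exp(-beta U_restr) over a cuboid (Eq. 11), in Å^3."""
    hosts = np.asarray(hosts, dtype=float)
    lo = hosts.min(axis=0) - buffer          # lower corner of the cuboid
    hi = hosts.max(axis=0) + buffer          # upper corner of the cuboid
    axes = [np.arange(a, b + 0.5 * spacing, spacing) for a, b in zip(lo, hi)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    U = np.zeros_like(X)
    for hx, hy, hz in hosts:                 # accumulate the restraint per host atom
        d = np.sqrt((X - hx) ** 2 + (Y - hy) ** 2 + (Z - hz) ** 2)
        U += kappa * np.maximum(np.abs(d - R) - D, 0.0) ** 2
    boltz = np.exp(-U / (KB * T))
    wx, wy, wz = (trapezoid_weights(len(a), spacing) for a in axes)
    return float(np.einsum("i,j,k,ijk->", wx, wy, wz, boltz))

def standard_state_correction(v_restr, T=298.0):
    """Eq. (10): -k_B T ln(V_restr / V_standard), in kcal mol^-1."""
    return -KB * T * np.log(v_restr / V_STD)
```

With the 0.1 Å spacing quoted in the text, the grid holds several million points for a typical host, so the loop over host atoms keeps only one distance array in memory at a time. A restraint volume smaller than $V^{\circ}$ gives a positive correction, which makes the standard free energy of binding less favourable than the uncorrected value.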

Host-guest simulation set-up

Host-guest input files were used as provided by the challenge organizers. For the simulations of the solvated guests, guest force field parameters and coordinates were extracted from the provided topologies and the guests were solvated in a rectangular box of TIP3P water molecules [33], with a minimum distance between the solute and the box edge of 12 Å, using the software tleap. Ions were added to neutralize the overall charge of the box. The system was energy minimized with 100 steps of the steepest descent algorithm. The following equilibration protocol was used: solute molecules were position restrained with a force constant of 10 kcal mol⁻¹ Å⁻², while the water was allowed to equilibrate in an NVT ensemble for 200 ps at 298 K, followed by an NPT equilibration for a further 200 ps at a pressure of 1 atm using the Amber module Sander [34]. Lastly, 2 ns of NPT simulation was run with the Sire/OpenMM6.3 (SOMD) software (revision 2015.0.0) [35, 36], using a timestep of 1 fs, to reach a final density of about 1 $\mathrm {g/cm}^3$. The final coordinate files were retrieved with CPPTRAJ [37]. The same protocol for preparation and equilibration was used for the host-guest complex.

Additionally, the reference system OAH-O1, taken from the SAMPL4 challenge [38], was set up from scratch. Guest O1 was obtained by modifying compound G6 using Maestro (v.10.1.012, rel 2015-1, Schrödinger) [39] and was parametrized with AM1-BCC charges [40] using Antechamber 14 [34]. Complex and water phase systems were created with tleap, according to the above protocol, using the same binding mode as the one provided for G6.

Alchemical free energy production simulations

For the discharging steps nine equidistant $\lambda$ windows were selected for the host-guest complex and the guest in water phases. For the vanishing step 12 and 18 equidistant windows were used for octa-acid guests and CBC guests respectively. The reasoning behind these different choices was that the CBC guests were larger, therefore a larger number of $\lambda$ windows was deemed necessary in order to guarantee good overlap of the potential energy distributions of neighbouring $\lambda$ windows, which is essential for free energy estimation via the multistate Bennett acceptance ratio (MBAR) [41]. All simulations were run for 8 ns. A velocity-Verlet integrator was used with a time step of 4 fs using a hydrogen mass repartitioning (HMR) scheme [42]. All bonds were constrained. All simulations were performed in an NPT ensemble and temperature control was achieved with an Andersen thermostat with a coupling constant of 10 ps$^{-1}$ [43]. Pressure was maintained by a Monte Carlo barostat that attempted isotropic box edge scaling every 100 fs. Periodic boundary conditions were imposed with a 12 Å atom-based cutoff distance for the non-bonded interactions, using a Barker–Watts reaction field with a dielectric constant of 78.3 [44]. In the host-guest complex the guest molecules were restrained according to Eq. (3). The parameters were $R_{ji} = 5$ Å, $D_{ji} = 2$ Å and $\kappa _{ji} = 10\;{\text{kcal}}\;{\text{mol}}^{-1}\,{\AA}^{-2}$.
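The window layout described above can be written down compactly. The sketch below only restates the protocol numbers; the names `lambda_schedule` and `simulated_time_ns` are hypothetical, and it assumes that the same number of windows is used in the solvated and complex phases and that each window corresponds to one 8 ns simulation.

```python
import numpy as np

def lambda_schedule(n_windows):
    """Equidistant coupling-parameter values on [0, 1]."""
    return np.linspace(0.0, 1.0, n_windows)

# Window counts taken from the protocol text
discharge = lambda_schedule(9)
vanish = {"octa-acid": lambda_schedule(12), "CBC": lambda_schedule(18)}

def simulated_time_ns(host_type, ns_per_window=8.0, phases=2):
    """Aggregate sampling per guest and per repeat, assuming one
    simulation per window in each of the two phases."""
    n_windows = len(discharge) + len(vanish[host_type])
    return phases * n_windows * ns_per_window
```

Under these assumptions a single repeat for a CBC guest requires noticeably more aggregate sampling than one for an octa-acid guest, which is the cost of the denser vanishing schedule.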

Estimation of free energy changes for models A, B, and C

Individual free energy contributions from the discharging and vanishing steps were estimated by using MBAR [41]. To estimate the accuracy and consistency of the computed binding free energy from Eq. (2), each simulation was run twice using different initial assignments of velocities drawn from the Maxwell–Boltzmann distribution. Final binding free energies are reported as the average of both runs and statistical uncertainties were calculated according to:

$$\begin{aligned} {\mathrm {err}}({\varDelta }G) = \frac{\sigma }{\sqrt{n}}, \end{aligned}$$

(12)

where $\sigma$ is the standard deviation of both runs and n=2 unless otherwise mentioned.
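To make the bookkeeping concrete, the sketch below combines per-phase free energy changes into models A, B and C following Eqs. (2), (4) and (7), and evaluates Eq. (12) for a set of repeats. The function names and the dictionary layout are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator of $\sigma$ was used.

```python
import numpy as np

def mean_and_error(values):
    """Average over repeats and Eq. (12): sigma / sqrt(n).

    Assumes the sample standard deviation (ddof=1)."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1) / np.sqrt(len(v)))

def binding_models(solv, host, lrc_solv=0.0, lrc_host=0.0, dg_restr=0.0):
    """Return (model A, model B, model C) in kcal mol^-1.

    solv, host : dicts with the 'elec' and 'vdw' free energy changes
                 of the solvated and host-guest phases
    lrc_*      : long range Lennard Jones corrections of Eq. (5)
    dg_restr   : standard state restraint correction of Eq. (10)
    """
    model_a = (solv["elec"] + solv["vdw"]) - (host["elec"] + host["vdw"])  # Eq. (2)
    model_b = model_a + (lrc_host - lrc_solv)                              # Eq. (4)
    model_c = model_b + dg_restr                                           # Eq. (7)
    return model_a, model_b, model_c
```

Applying `binding_models` to each repeat separately and then passing the resulting values to `mean_and_error` gives the reported average and its uncertainty for each model.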

The binding free energies computed with each model are then compared to experimental values using two different measures: the determination coefficient $R^2$ and the mean unsigned error (MUE). To gain insight into the distribution of these two measures, a bootstrapping scheme is used in which each computed free energy is taken to parameterize a normal distribution whose mean is the computed free energy and whose standard deviation $\sigma$ is the associated computed error. Ten thousand samples are then drawn from these normal distributions for each data point and correlated with the experimental values, giving rise to a distribution of $R^2$ and MUE. The resulting distributions are typically not symmetric around the mean, and uncertainties in the dataset metrics are reported with a 95 % confidence interval. All simulation input files and post processing scripts needed for reproducing the results, as well as results files, can be found in a GitHub repository: https://github.com/michellab/Sire-SAMPL5.
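The bootstrapping scheme can be sketched as follows. This is a minimal illustration rather than the analysis script from the repository: the function name `bootstrap_metrics` is hypothetical, $R^2$ is taken here to be the squared Pearson correlation coefficient, and the central value is reported as the mean of the bootstrap distribution; all three are assumptions.

```python
import numpy as np

def bootstrap_metrics(calc, err, expt, n_samples=10_000, seed=0):
    """Return (lower, mean, upper) triples for R^2 and MUE.

    calc, err : computed free energies and their uncertainties
    expt      : experimental free energies (treated as exact here)
    The bounds are the 2.5th and 97.5th percentiles (95 % interval).
    """
    rng = np.random.default_rng(seed)
    calc, err, expt = (np.asarray(x, dtype=float) for x in (calc, err, expt))
    # One row per bootstrap sample, one column per host-guest system
    samples = rng.normal(calc, err, size=(n_samples, calc.size))

    mue = np.abs(samples - expt).mean(axis=1)

    # Squared Pearson correlation between each sample row and experiment
    sc = samples - samples.mean(axis=1, keepdims=True)
    ec = expt - expt.mean()
    r = (sc @ ec) / np.sqrt((sc ** 2).sum(axis=1) * (ec ** 2).sum())
    r2 = r ** 2

    def summary(x):
        lower, upper = np.percentile(x, [2.5, 97.5])
        return float(lower), float(x.mean()), float(upper)

    return summary(r2), summary(mue)
```

Each returned triple maps directly onto the "lower < central < upper" notation used for the metrics in this article.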

Experimental data

Experimental data for the host-guest complexes of the octa-acids were obtained by a mixture of NMR and ITC measurements. Standard free energies of binding for the CBC host-guest complexes were obtained using UV, visible, and fluorescence spectroscopic measurements. All data were measured in the laboratory of Bruce Gibb (Tulane University), with a detailed description of the two octa-acid hosts found in [20, 45]. Details on the experimental procedures and error analysis of the experimental data will be described elsewhere [46]. A summary of the experimental data used for the analysis in this work, as provided by the organizers, can be found in Table 1.

Results

Host-guest binding free energy predictions

To test the precision and accuracy of the protocols implementing the three models A, B, and C, the free energy of binding of guest O1 to host OAH (used in SAMPL4) was retrospectively predicted. Figure 3 compares the results with experimental data [38]. Models A and B yield a similar free energy of binding, ${\varDelta }G_{\mathrm {bind}} = -6.1 \pm 0.5\,\hbox {kcal}\,\hbox {mol}^{-1}$. This is because the long range corrections for Lennard Jones interactions implemented in model B produce a negligible correction term of $0.03\;{\text{kcal}}\;{\text{mol}}^{-1}$. By contrast, the addition of a standard state correction in model C leads to a standard free energy of binding of ${\varDelta }G^{\circ }_{\mathrm {bind}}=-4.4 \pm 0.5\,\hbox {kcal}\,\hbox {mol}^{-1}$, which is in good agreement with the experimental value of ${\varDelta }G^{\circ }_{\mathrm {bind}} =-3.7\pm 0.1\,\hbox {kcal}\,\hbox {mol}^{-1}$ [38].
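For reference, the model B term discussed here comes from the post-processing estimate of Eq. (5), which can be sketched as below. This is an illustration only: the function names and the value of $k_{\mathrm{B}}$ are assumptions, and the per-frame energies `u_long` and `u_sim` are assumed to have been recomputed beforehand with the long and the simulation cutoff respectively.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1 (assumed value)

def long_cutoff(box_lengths):
    """r_c,long = 0.95 * min(Lx, Ly, Lz) / 2, in the units of the box lengths."""
    return 0.95 * min(box_lengths) / 2.0

def lj_long_range_correction(u_long, u_sim, u_ana=0.0, T=298.0):
    """Eq. (5): k_B T ln < exp[-beta (U_long - U_sim)] > + U_ana.

    u_long, u_sim : per-frame Lennard Jones energies in kcal mol^-1
    u_ana         : analytical tail correction of Eq. (6)
    """
    beta = 1.0 / (KB * T)
    x = -beta * (np.asarray(u_long, dtype=float) - np.asarray(u_sim, dtype=float))
    # Log of the mean exponential, shifted by the maximum for numerical stability
    log_avg = x.max() + np.log(np.mean(np.exp(x - x.max())))
    return float(KB * T * log_avg + u_ana)
```

When the energy difference between the two cutoffs barely fluctuates from frame to frame, the exponential average reduces to minus the mean difference, so a small correction such as the one reported above indicates that very little dispersion energy lies beyond the 12 Å cutoff for this system.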

Next, blinded predictions were performed for each SAMPL5 host-guest system. Figure 4 contrasts the predictive power of the different models against the experimental data that were released after submission of the predictions. Figure 4a shows the results for model A, Fig. 4b for model B and Fig. 4c for model C. Results for each host-guest system are also reported in Table 1. Taking the full dataset into account, all three models yield a similar $\hbox {R}^2$ value of ca. $0.65\,<\,0.70\,<\,0.75$. Models A and B have a similar MUE of ca. $4.3<4.5<4.7\,\hbox {kcal}\,\hbox {mol}^{-1}$, whereas model C is statistically more accurate, with a MUE of ca. $3.2<3.4<3.6\,\hbox {kcal}\,\hbox {mol}^{-1}$. The accuracy of the predictions for the three different hosts was also considered individually and is summarised in Table 1. As judged by the MUE measures, the models perform better for the octa-acid systems than for CBC. In particular, model C gives the best predictions compared to A and B for the octa-acid systems, with a MUE of $1.7<2.1<2.4\,\hbox {kcal}\,\hbox {mol}^{-1}$ and $1.4<1.7<2.0\,\hbox {kcal}\,\hbox {mol}^{-1}$ for OAH and OAMe respectively. $R^2$ is on average slightly higher for both octa-acid hosts ($R^2 \,0.77\,<\,0.87\,<\,0.93$, model C for OAH and $R^2 \,0.52\,<\,0.74\,<\,0.93$, model C for OAMe) than for CBC ($R^2\,0.70\,<\,0.76\,<\,0.82$) for models A, B and C, but the trend is not strong given statistical uncertainties.

Next, attention was focussed on the guests for which predictions showed the largest discrepancy with respect to experimental data. For instance, the standard free energy of binding of guest G5 in complex with OAH and OAMe is overestimated by $-3.6\,\hbox {and}\,-2.8\,\hbox {kcal}\,\hbox {mol}^{-1}$ respectively. Guest G4 is also significantly over-stabilised in complex with OAH. G4 is arguably the most hydrophobic guest in the series studied, and evidence from our accompanying distribution coefficient article suggests that the GAFF force field appears to favor the transfer of hydrophobic solutes into hydrophobic environments [47]. Since G5 is the largest outlier in both octa-acid hosts, the validity of its simulated binding mode was evaluated. For this purpose, G5 in the host-guest complexes was rotated by approximately 180$^\circ$ about its centre of mass such that the amine group pointed towards the bottom of the host cavities, and calculations were repeated using these new coordinates after solvent equilibration. Binding free energy predictions from model C obtained for this alternative binding mode were poor (${\varDelta }G^{\circ }_{\mathrm {bind}}= +1.8\pm 0.1\,\hbox {and}\,+16.7\pm 0.1\,\hbox {kcal}\,\hbox {mol}^{-1}$ for OAH and OAMe respectively), suggesting that the original binding mode is more likely.

For the CBC host, the best MUE is about $4.8\,<\,5.1\,<\,5.4\,\hbox {kcal}\,\hbox {mol}^{-1}$ for model C, with no significant difference over models A and B. This is surprising since the determination coefficient $R^2\,0.69\,<\,0.76\,<\,0.82$ is quite reasonable. In particular, model C performs better than A and B, but large errors are present for a series of guests. Guests G2 and G3 are predicted to bind substantially worse than observed in experiments. The main difference with other guests in this dataset is that these two molecules are made up of linear flexible alkyl chains, and contain several (presumably) positively charged ammonium groups. By contrast, G4–G7, G9 and G10 are predicted to bind significantly better than experimentally observed. These compounds present a variety of net charges, but are all made up of conjugated aromatic rings. Additionally, empirical p$K_a$ estimations [48] suggest that G5, G6, G9 and G10 could adopt multiple protonation states at the pH where binding constants were measured. Hence, it is unclear whether the discrepancies are due to force field errors or finite-size effects.

As a separate issue, the reproducibility of standard free energies of binding was evaluated by comparing the results from model C with those reported by the Gilson lab (UCSD) for the octa-acid hosts [49]. The same input files were used, but the free energy calculations were performed with the pmemd.cuda program from AMBER 14 [50], and a different, potential of mean force based ‘attach-pull-release’ (APR) methodology [51]. Figure 5 shows that good agreement is observed between the two methodologies for both the OAH (Fig. 5a) and OAMe (Fig. 5b) hosts, with mean unsigned differences of about 0.4 $\hbox {kcal}\,\hbox {mol}^{-1}$ in the former case and 0.6 $\hbox {kcal}\,\hbox {mol}^{-1}$ in the latter. At first glance this level of variability seems reasonable given the typical statistical uncertainties of each methodology. Nonetheless, closer inspection indicates that OAH-G5, OAMe-G5 and OAMe-G4 show significant discrepancies. Since the model C standard free energies of binding were only estimated from two repeats, a concern was that the error estimates were not reliable. To test this, two additional repeats were performed for these systems. The standard free energies of binding obtained from four repeats of model C are ${\varDelta }G^{\circ }_{\mathrm {bind}}$ (SOMD, OAH-G5) $=\,-6.9\pm 0.1$, ${\varDelta }G^{\circ }_{\mathrm {bind}}$ (SOMD, OAMe-G4) $=\,-3.4\pm 0.2$ and ${\varDelta }G^{\circ }_{\mathrm {bind}}$ (SOMD, OAMe-G5) $=\,-6.5\pm 0.3\, \hbox {kcal}\,\hbox {mol}^{-1}$. The results were statistically identical to those obtained from two repeats (Table 1) for OAMe-G4 and OAMe-G5, but not for OAH-G5. Personal discussions with the Gilson lab prompted additional APR calculations, which produced revised values of ${\varDelta }G^{\circ }_{\mathrm {bind}}$ (APR, OAMe-G4) $=\,-4.3\pm 0.3\, \hbox {kcal}\,\hbox {mol}^{-1}$ and ${\varDelta }G^{\circ }_{\mathrm {bind}}$ (APR, OAH-G5) $=\,-4.5\pm 0.5\, \hbox {kcal}\,\hbox {mol}^{-1}$. However, discrepancies remain and further work is needed to establish the protocol variations that introduced this variability in the computed standard free energies of binding.

Conclusions

The present alchemical free energy calculation protocols proved reasonably reproducible (Table 1). This was unexpected: the guests were deemed large for absolute binding free energy calculations, which suggested that longer per-$\lambda$ simulation times than those used here would be necessary. Factors that may have contributed to this outcome include the relative rigidity of some of the guests, the rapid relaxation of the hosts upon guest decoupling, the symmetry of the hosts, and the use of distance restraints to limit translational motions of the decoupled guests. Encouragingly, the results were also reasonably predictive, at least when judged by correlation with experiment ($\hbox {R}^2\, 0.65\,<\, 0.70 \,<\, 0.75$ for model C). Indeed, the SAMPL5 submissions for models A, B and C were among the top-performing protocols of the entire competition with respect to this metric. Nevertheless, systematic errors are present and the same models do not fare as well when ranked according to a mean unsigned error metric. Model B yields results that are essentially identical to model A since the long range correction for missing dispersion interactions is negligible. This was unexpected given previous reports where this term was found to be a significant contribution to standard free energies of binding [30]. For the systems considered here it seems that the simulation cutoffs used were sufficient to include most of the guest-host dispersion interactions. Satisfactorily, addition of a standard state correction term in model C systematically improves agreement with experimental data. In addition, the computed standard free energies of binding for model C agree well with those produced independently by members of the Gilson lab (UCSD) using a different code and methodology. However, it is not currently understood why a few compounds show more significant deviations between the double decoupling and APR methodologies, and this should receive further attention.

It is well known that the computation of free energies of solvation of charged solutes via molecular simulations is typically affected by significant finite-size effects [52–55]. Given the broad range of net charges in the guests considered here, it is perhaps surprising that encouraging $R^2$ values were obtained. For the host-guest binding energies reported here, errors due to finite-size electrostatics are mitigated since partial error compensation occurs between the simulations of the solvated guest and the host-guest complex. However, it seems reasonable to anticipate that the significant MUE values could be decreased with the use of suitable schemes to reduce or eliminate finite-size errors (see Notes). Other areas where further improvement could be sought for this dataset include the explicit consideration of multiple tautomeric forms of the guests, as well as a more systematic evaluation of alternative potential energy functions.

Table 1 Binding free energies, in $\hbox {kcal}\,\hbox {mol}^{-1}$, coefficient of determination $\hbox {R}^2$, and mean unsigned error (MUE), in $\hbox {kcal}\,\hbox {mol}^{-1}$, for models A, B and C

Notes

Footnote 1: A fourth model that corrects for finite-size effects on electrostatic interactions was also submitted by our group, but the results are not discussed here as it was subsequently established that a coding error led to incorrect evaluation of the correction terms.

References

1. Michel J (2014) Phys Chem Chem Phys 16(10):4465–4477
2. Geballe MT, Skillman AG, Nicholls A, Guthrie JP, Taylor PJ (2010) J Comput Aided Mol Des 24(4):259–279
3. Skillman AG (2012) J Comput Aided Mol Des 26(5):473–474
4. Guthrie JP (2009) J Phys Chem B 113(14):4501–4507
5. Michel J, Henchman RH, Gerogiokas G, Southey MWY, Mazanetz MP, Law RJ (2014) J Chem Theory Comput 10(9):4055–4068
6. Jorgensen WL, Thomas LL (2008) J Chem Theory Comput 4(6):869–876
7. Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ (2011) J Chem Phys. doi:10.1063/1.3519057
8. Chang C-E, Gilson MK (2004) J Am Chem Soc 126(40):13156–13164
9. Muddana HS, Gilson MK (2012) J Chem Theory Comput 8(6):2023–2033
10. Mikulskis P, Cioloboc D, Andrejić M, Khare S, Brorsson J, Genheden S, Mata RA, Söderhjelm P, Ryde U (2014) J Comput Aided Mol Des 28(4):375–400
11. König G, Pickard IV FC, Mei Y, Brooks BR (2014) J Comput Aided Mol Des 28(3):245–257
12. Beckstein O, Fourrier A, Iorga BI (2014) J Comput Aided Mol Des 28(3):265–276
13. Monroe JI, Shirts MR (2014) J Comput Aided Mol Des 28(4):401–415
14. Chen I-J, Foloppe N (2011) Drug Develop Res 72(1):85–94
15. Halgren TA, Damm W (2001) Curr Opin Struct Biol 11(2):236–242
16. Kastenholz MA, Hünenberger PH (2004) J Phys Chem B 108(2):774–788
17. Jorgensen WL, Ravimohan C (1985) J Chem Phys 83(6):3050–3054
18. Mezei M (1987) J Chem Phys 86(12):7084–7088
19. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y, Wang W, Donini O, Cieplak P, Srinivasan J, Case DA, Cheatham TE (2000) Acc Chem Res 33(12):889–897
20. Gibb CLD, Gibb BC (2013) J Comput Aided Mol Des 28(4):319–325
21. Gilberg L, Zhang B, Zavalij PY, Sindelar V, Isaacs L (2015) Org Biomol Chem 13:4041–4050
22. Zhang B, Isaacs L (2014) J Med Chem 57(22):9554–9563
23. Sure R, Antony J, Grimme S (2014) J Phys Chem B 118(12):3431–3440
24. Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA (2004) J Comput Chem 25(9):1157–1174
25. Mishra SK, Calabrò G, Loeffler HH, Michel J, Koča J (2015) J Chem Theory Comput 11(7):3333–3345
26. Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC (2016) Chem Sci 7(1):207–218
27. Jorgensen WL, Buckner JK, Boudon S, Tirado-Rives J (1988) J Chem Phys 89(6):3742–3746
28. Gilson MK, Given JA, Bush BL, McCammon JA (1997) Biophys J 72(3):1047
29. Michel J, Essex JW (2010) J Comput Aided Mol Des 24(8):639–658
30. Shirts MR, Mobley DL, Chodera JD, Pande VS (2007) J Phys Chem B 111(45):13052–13063
31. Zwanzig RW (1954) J Chem Phys 22(8):1420–1426
32. Frenkel D, Smit B (2001) Understanding molecular simulation, 2nd edn. Academic Press Inc, Orlando
33. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) J Chem Phys 79(2):926–935
34. Case DA, Babin V, Berryman JT, Betz RM, Cai Q, Cerutti DS, Cheatham TE III, Darden TA, Duke RE, Gohlke H, Goetz AW, Gusarov S, Homeyer N, Janowski P, Kaus J, Kolossvary I, Kovalenko A, Lee TS, LeGrand S, Luchko T, Luo R, Madej B, Merz KM, Paesani F, Roe DR, Roitberg A, Sagui C, Salomon-Ferrer R, Seabra G, Simmerling CL, Smith W, Swails J, Walker RC, Wang J, Wolf RM, Wu X, Kollman PA (2014) AMBER 14, University of California, San Francisco
35. Woods C, Mey ASJ, Calabro G, Michel J (2016) Sire molecular simulations framework. http://siremol.org. Accessed May 31
36. Eastman P, Friedrichs MS, Chodera JD, Radmer RJ, Bruns CM, Ku JP, Beauchamp KA, Lane TJ, Wang L-P, Shukla D, Tye T, Houston M, Stich T, Klein C, Shirts MR, Pande VS (2013) J Chem Theory Comput 9(1):461–469
37. Roe DR, Cheatham TE III (2013) J Chem Theory Comput 9(7):3084–3095
38. Muddana HS, Fenley AT, Mobley DL, Gilson MK (2014) J Comput Aided Mol Des 28(4):305–317
39. Schrödinger Release 2015-2: Maestro, version 10.2, Schrödinger, LLC, New York, NY, 2015
40. Jakalian A, Bush BL, Jack DB, Bayly CI (2000) J Comput Chem 21(2):132–146
41. Shirts MR, Chodera JD (2008) J Chem Phys. doi:10.1063/1.2978177
42. Hopkins CW, Le Grand S, Walker RC, Roitberg AE (2015) J Chem Theory Comput 11(4):1864–1874
43. Andersen HC (1980) J Chem Phys 72:2384–2393
44. Tironi IG, Sperb R, Smith PE, van Gunsteren WF (1995) J Chem Phys 102(13):5451–5459
45. Gan H, Benjamin CJ, Gibb BC (2011) J Am Chem Soc 133(13):4770–4773
46. Yin J, Henriksen NM, Slochower DR, Chiu MW, Mobley DL, Gilson MK (2016) Overview of the SAMPL5 host-guest challenge: are we doing better? J Comput Aided Mol Des (under review)
47. Bosisio S, Mey ASJS, Michel J (2016) Blinded predictions of distribution coefficients in the SAMPL5 challenge. J Comput Aided Mol Des (under review)
48. Chemaxon. www.chemicalize.org
49. Yin J, Henriksen NM, Slochower DR, Gilson MK (2016) The SAMPL5 host-guest challenge: binding free energies and enthalpies from explicit solvent simulations. J Comput Aided Mol Des (under review)
50. Goetz AW, Poole D, Le Grand S, Walker RC, Salomon-Ferrer R (2013) J Chem Theory Comput 9:3878–3888
51. Velez-Vega C, Gilson MK (2013) J Comput Chem 34(27):2360–2371
52. Kastenholz MA, Hünenberger PH (2006) J Chem Phys 124(12):124106
53. Kastenholz MA, Hünenberger PH (2006) J Chem Phys 124(22):224501
54. Reif MM, Oostenbrink C (2014) J Comput Chem 35(3):227–243
55. Rocklin GJ, Boyce SE, Fischer M, Fish I, Mobley DL, Shoichet BK, Dill KA (2013) J Mol Biol 425(22):4569–4583

Acknowledgments

J. M. is supported by a Royal Society University Research Fellowship. The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 336289.

Author information

Authors and Affiliations

EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh, EH9 3FJ, UK
Stefano Bosisio, Antonia S. J. S. Mey & Julien Michel

Corresponding author

Correspondence to Julien Michel.

About this article

Cite this article

Bosisio, S., Mey, A.S.J.S. & Michel, J. Blinded predictions of host-guest standard free energies of binding in the SAMPL5 challenge. J Comput Aided Mol Des 31, 61–70 (2017). https://doi.org/10.1007/s10822-016-9933-0

Received: 19 June 2016
Accepted: 27 July 2016
Published: 08 August 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10822-016-9933-0
