Introduction

The accurate prediction of the strength of molecular association is an important and largely unsolved problem from both chemical [1] and medicinal [2] perspectives. Conventional approaches, such as docking, have reached a high level of maturity as high-throughput virtual screening [3–6] and structure prediction tools [7, 8]. However, methods based on interaction-energy scoring alone [9, 10] are often not optimally suited to pick out trends at the level of resolution necessary to address finer aspects of drug development such as lead optimization, specificity, toxicity, and resistance. Atomistic physics-based free energy models, which take into account dynamical aspects of molecular recognition [2, 11–18], have the potential to bridge this gap. However, the reliability and general applicability of free energy models of binding remain to be fully established [19–22].

Most of the work with physics-based free energy models reported in the literature has focused on small retrospective datasets, which do not give an accurate and unbiased picture of the state of the field. The SAMPL series of community blind challenges [23–25] and related efforts [26] have played a key role in giving an unbiased view of the advantages as well as the challenges related to the application of free energy models of binding. In the recent SAMPL4 experiment, for example, our group employed our free energy methodology to screen a large set of HIV integrase inhibitor candidates [27, 28], where full treatment of conformational dynamics and entropic effects was found to be key to reaching the observed level of prediction accuracy.

While, with the help of experiments such as SAMPL, theories, models and practices continue to improve, one key obstacle towards wider adoption of free energy models is the scarcity of automated and easy-to-use software tools. For example, although automated tools are beginning to appear [29], it is notoriously laborious to plan free energy transformations to compute the relative binding free energies of a set of compounds. In many circumstances, such as in virtual screening, differences in ligand scaffolds are too great to accommodate conventional free energy transformations. In this respect absolute rather than relative binding free energy methods offer some advantages. Additional obstacles towards adoption are due to learning barriers posed by molecular dynamics engines, each with its own set of parameters and settings (topology construction, force field parameter assignment, soft-core potentials, restraints, long-range electrostatic treatments, etc.) [20], often incompatible with those of other molecular dynamics engines. Addressing some of these usability issues and making binding free energy tools more user-friendly would enable a wide community of non-specialists to access these tools and to apply them in a variety of contexts, ultimately leading to new insights and discoveries.

As part of the octa-acid SAMPL4 affinity challenge, in this work we apply the binding energy distribution analysis method (BEDAM), an absolute binding free energy protocol [30, 31], to the blind prediction of the binding free energies of a set of host–guest systems [32, 33]. The bulk of the computational work reported here has been conducted by the students of the Statistical Thermodynamics class at the department of Chemistry at Rutgers University. The BEDAM method has been successfully applied to a variety of systems including protein–ligand binding complexes [21, 28, 30, 34] and host–guest complexes [35], including the challenging ones presented as part of the previous SAMPL3 edition [36].

In addition to a further opportunity for an unbiased validation of the methodology, the primary aim of the work has been to involve a group of students from various disciplines in a classroom project reflective of applied collaborative research. The BEDAM/SAMPL4 host–guest exercise was particularly suited for this. It allowed a direct application to molecular systems of the statistical thermodynamics concepts covered in the course. As in actual research, outcomes were not known or guaranteed. In addition, given the relatively small size of the host–guest systems, the computational load was expected to be compatible with the time and computational resources available to the class. The work also involved studying literature material about the available laboratory measurements [37] in order to prepare the molecular systems appropriately and validate the computational protocol before applying it to obtain predictions.

One of the challenges with the introduction of advanced computational modeling tools in the classroom is that a significant amount of time is required to familiarize the students with the usage of the modeling software, the format for inputs and outputs, algorithmic details, etc. Besides consuming valuable class time, this process is often of limited utility to the majority of students who either are not directly engaged in computational research or whose home laboratories utilize a different suite of modeling software. This complication was largely bypassed here by using an easy-to-use graphical front-end (Maestro, by Schrödinger, Inc.) combined with the BEDAM automatic workflow tool developed in our laboratory [38]. This was essentially the same protocol we used to automate the free energy calculations for the SAMPL4 HIV integrase screening challenge [28]. The project was set up in such a way that students prepared the molecular systems using the graphical front-end and provided these to the BEDAM workflow, which in turn produced, without further intervention, all of the inputs required by the molecular modeling package. The same workflow was used to process the simulation data to provide binding free energy estimates and to streamline structural and other thermodynamic analyses.

This study confirms that it is valuable from multiple perspectives to package complex free energy simulation protocols into a form that allows the automated processing of large datasets and at the same time is accessible to non-specialists. The features of the BEDAM methodology, which does not require explicit solvation, multiple complex free energy transformations and elaborate conformational restraining steps, are conducive to a high degree of automation.

Methods

Overall organization of the project

Our group focused primarily on the SAMPL4 HIV Integrase screening challenge [25, 28]. Participation in the octa-acid host–guest challenge was organized as a classroom experiment as part of the Statistical Thermodynamics graduate class that the senior authors (E. G. and R. M. L.) were teaching at the time. The aim of the experiment was both to recruit the help of students and to expose them to a realistic applied research study. Contrary to most classroom experiences, but not unlike actual research scenarios, neither the students nor their instructors had knowledge of the “right” answers. However, also similar to most research scenarios, literature data were available to validate the model and gain confidence in the predictions.

Each student was assigned a small set of host–guest complexes to investigate. The molecular simulation software and related scripts and force field data were provided by the instructors. Students were responsible for building the molecular structures of the guests (either from scratch and/or starting from PubChem sources or using files provided by the SAMPL4 organizers) using the Maestro program ensuring correct protonation, Lewis structure and initial conformation. The students were also responsible for building the initial conformation of the complex by placing the ligand in a reasonable binding mode within the cavity of the host. Students submitted the prepared files for the host and the guests to the automated BEDAM workflow [38] to generate input files for the parallel calculation with the IMPACT program [39]. Students were also responsible for submitting the corresponding parallel jobs to a computing cluster and for retrieving and analyzing the resulting outputs.

Student reports on the host–guest experiment counted towards their final class grade. Students were asked to describe not only their calculations but also to observe overall binding affinity trends by retrieving and discussing the results obtained by other students. Conversely students were asked to complete their calculations and analysis within assigned deadlines so as to be able to promptly address requests from others. Again, this organization reflects actual collaborative research scenarios. At completion of the class the instructor collected the student predictions and submitted them to SAMPL.

The binding energy distribution analysis method

The binding energy distribution analysis method [30] computes the absolute binding free energy \(\Delta G_{b}^{\circ }\) between a receptor \(A\) and a ligand \(B\) employing a \(\lambda\)-dependent effective potential energy function with implicit solvation [40] (see below) of the form

$$\begin{aligned} U_{\lambda }({\mathbf {r}})=U_{0}({\mathbf {r}})+\lambda u({\mathbf {r}}), \end{aligned}$$
(1)

where \({\mathbf {r}}=({\mathbf {r}}_{A},{\mathbf {r}}_{B})\) denotes the atomic coordinates of the complex, with \({\mathbf {r}}_{A}\) and \({\mathbf {r}}_{B}\) denoting those of the receptor and ligand, respectively,

$$\begin{aligned} U_{0}({\mathbf {r}})=U({\mathbf {r}}_{A})+U({\mathbf {r}}_{B}) \end{aligned}$$
(2)

is the effective potential energy of the complex when receptor and ligand are dissociated, and

$$\begin{aligned} u({\mathbf {r}})=u({\mathbf {r}}_{A},{\mathbf {r}}_{B})=U({\mathbf {r}}_{A}, {\mathbf {r}}_{B})-U({\mathbf {r}}_{A})-U({\mathbf {r}}_{B}) \end{aligned}$$
(3)

is the binding energy function defined for each conformation \({\mathbf {r}}=({\mathbf {r}}_{A},{\mathbf {r}}_{B})\) of the complex as the difference between the effective potential energies \(U({\mathbf {r}})\) of the bound and dissociated conformations of the complex without internal conformational rearrangements. To improve convergence of the free energy near \(\lambda =0\), a modified binding energy function is employed of the form

$$\begin{aligned} u^{\prime}({\mathbf {r}})={\left\{ \begin{array}{ll} u_{\mathrm{max}}\tanh \left( \frac{u({\mathbf {r}})}{u_{\mathrm{max}}}\right), &{} u({\mathbf {r}})>0\\ u({\mathbf {r}}), &{} u({\mathbf {r}})\le 0 \end{array}\right. }, \end{aligned}$$
(4)

where \(u_{\mathrm{max}}\) is some large positive value (set in this work as 1,000 kcal/mol). This modified binding energy function, which is used in place of the actual binding energy function [Eq. (3)] wherever it appears, caps the maximum unfavorable value of the binding energy while leaving unchanged the value of favorable binding energies [31].
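The capping in Eq. (4) can be illustrated with a minimal Python sketch. The function name `capped_binding_energy` and the sample energies are illustrative assumptions, not part of the BEDAM or IMPACT code; only the functional form and the value of \(u_{\mathrm{max}}\) are taken from the text.

```python
import math

U_MAX = 1000.0  # kcal/mol, the cap quoted in the text


def capped_binding_energy(u, u_max=U_MAX):
    """Return u' of Eq. (4): tanh-capped for u > 0, unchanged for u <= 0."""
    if u > 0.0:
        return u_max * math.tanh(u / u_max)
    return u


if __name__ == "__main__":
    # Illustrative binding energies in kcal/mol (not simulation data).
    for u in (-12.0, 0.0, 10.0, 1.0e3, 1.0e6):
        print(f"u = {u:>10.1f}  ->  u' = {capped_binding_energy(u):.3f}")
```

Favorable (negative) energies pass through unchanged, small unfavorable energies are almost unaffected, and very large clash energies saturate at \(u_{\mathrm{max}}\).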

The binding free energy \(\Delta G_{b}\) is by definition the difference in free energy between the states at \(\lambda =1\) and \(\lambda =0\). The standard free energy of binding \(\Delta G_{b}^{\circ }\) is related to this by the relation [11]

$$\begin{aligned} \Delta G_{b}^{\circ }=-k_{\mathrm{B}}T\ln C^{\circ }V_{\mathrm{site}}+\Delta G_{b}, \end{aligned}$$
(5)

where \(C^{\circ }\) is the standard concentration of ligand molecules (\(C^{\circ }=1\) M, or equivalently one molecule per \(1,668\) Å\(^{3}\)) and \(V_{\mathrm{site}}\) is the volume of the binding site (see below). The multistate Bennett acceptance ratio estimator (MBAR) [41, 42] is used here to compute the binding free energy \(\Delta G_{b}\) from a set of binding energies, \(u\), sampled from molecular dynamics simulations at a series of \(\lambda\) values. For later use we introduce here the reorganization free energy for binding \(\Delta G_{reorg}^{\circ }\) defined by the expression [14]

$$\begin{aligned} \Delta G_{b}^{\circ }=\Delta E_{b}+\Delta G_{reorg}^{\circ } \end{aligned}$$
(6)

where \(\Delta E_{b}=\langle u\rangle _{1}\) is the average binding energy of the complex and \(\Delta G_{b}^{\circ }\) is the standard binding free energy. The former is computed from the ensemble of conformations of the complex collected at \(\lambda =1\) and \(\Delta G_{reorg}^{\circ }\) is computed by difference using Eq. (6).
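To make the MBAR step of this section more concrete, the following is a minimal pure-Python sketch of the MBAR self-consistent equations specialized to the linear potential of Eq. (1). It is not the implementation used in this work. The function names, the convergence settings and the synthetic one-dimensional test system (a harmonic \(U_{0}\) with a linear \(u\), for which the free energy is known analytically) are all assumptions made for illustration.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mbar_free_energies(lambdas, u_samples, n_per_state, beta, tol=1e-7, max_iter=500):
    """Iterate the MBAR equations for U_lambda = U_0 + lambda * u.

    u_samples pools the binding energies from every lambda state. U_0 is
    common to all states and cancels, so the reduced potential of sample n
    in state k is beta * lambda_k * u_n. Returns f_k (units of kT), f_0 = 0.
    """
    states = range(len(lambdas))
    log_n = [math.log(n) for n in n_per_state]
    f = [0.0 for _ in states]
    for _ in range(max_iter):
        # Log of the mixture denominator, one value per pooled sample.
        log_denom = [
            logsumexp([log_n[k] + f[k] - beta * lambdas[k] * u for k in states])
            for u in u_samples
        ]
        f_new = [
            -logsumexp([-beta * lam * u - d for u, d in zip(u_samples, log_denom)])
            for lam in lambdas
        ]
        f_new = [value - f_new[0] for value in f_new]
        converged = max(abs(a - b) for a, b in zip(f, f_new)) < tol
        f = f_new
        if converged:
            break
    return f


def toy_binding_energies(coupling, lambdas, n_samples, seed=0):
    """Synthetic data (NOT BEDAM output): U_0 = x^2/2, u = coupling * x, beta = 1,
    so x sampled at a given lambda is Gaussian with mean -lambda * coupling."""
    rng = random.Random(seed)
    samples = []
    for lam in lambdas:
        samples.extend(coupling * rng.gauss(-lam * coupling, 1.0) for _ in range(n_samples))
    return samples


if __name__ == "__main__":
    lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
    u = toy_binding_energies(2.0, lambdas, 300)
    f = mbar_free_energies(lambdas, u, [300] * len(lambdas), beta=1.0)
    # For this toy model the exact result is -coupling**2 / 2 = -2 kT.
    print(f"MBAR estimate: {f[-1]:.3f} kT (analytic value: -2.000 kT)")
```

In an actual calculation the pooled samples would be the (capped) binding energies collected at each \(\lambda\), \(\beta =1/k_{\mathrm{B}}T\) at the simulation temperature, and \(\Delta G_{b}\) would be \(k_{\mathrm{B}}T\) times the last element of the returned list.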

The AGBNP2 solvation model

The potential energy of the system is described by the OPLS-AA/AGBNP2 effective potential in which the OPLS-AA [39, 43, 44] force field accounts for covalent and non-bonded interatomic interactions and the effect of the solvent is represented implicitly by means of the Analytic Generalized Born plus non-polar (AGBNP2) implicit solvent model [40]. A full description of the AGBNP2 model is available elsewhere [40]. Here we give a brief summary of the elements that have been tuned for the present application (see below).

The AGBNP2 model computes the solvation free energy of the solute, \(\Delta G_{\mathrm{solv}}\), as the sum of electrostatic, \(\Delta G_{\mathrm{elec}}\), non-polar, \(\Delta G_{\mathrm{np}}\), and short-range solute-water hydrogen bonding, \(\Delta G_{\mathrm{hb}}\), contributions:

$$\begin{aligned} \Delta G_{\mathrm{solv}}=\Delta G_{\mathrm{elec}}+\Delta G_{\mathrm{np}}+\Delta G_{\mathrm{hb}}. \end{aligned}$$
(7)

The electrostatic term is described by means of a variation of the continuum dielectric Generalized Born model [45, 46]. The non-polar term is further decomposed into a cavity hydration free energy \(\Delta G_{\mathrm{cav}}\), expressed in terms of solute surface areas, and a solute–solvent average dispersion interaction energy \(\Delta G_{\mathrm{vdW}}\) given by the expression

$$\begin{aligned} \Delta G_{\mathrm{vdW}}=\sum _{i}\alpha _{i}\frac{a_{i}}{(B_{i}+R_{w})^{3}}, \end{aligned}$$
(8)

where \(B_{i}\) is the Born radius of atom \(i\), \(R_{w}=1.4\) Å represents the radius of a water molecule, \(a_{i}\) is a van der Waals energy integration factor solely dependent on the Lennard-Jones parameters of the solute atom and the water model [47, 48], and \(\alpha _{i}\simeq 1\) is an atom type-dependent dimensionless adjustable parameter [46].
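As an illustration of Eq. (8), the sum can be written as a short Python function. This is only a sketch: the function name and the numerical values of \(a_{i}\) and \(B_{i}\) below are hypothetical and are not AGBNP2 parameters.

```python
R_W = 1.4  # Å, radius of a water molecule as given in the text


def vdw_solvation_energy(alphas, a_factors, born_radii, r_w=R_W):
    """Eq. (8): sum_i alpha_i * a_i / (B_i + R_w)^3.

    alphas: dimensionless scaling parameters.
    a_factors: integration factors (energy times length cubed).
    born_radii: Born radii in Å.
    """
    if not (len(alphas) == len(a_factors) == len(born_radii)):
        raise ValueError("all per-atom lists must have the same length")
    return sum(
        alpha * a / (b + r_w) ** 3
        for alpha, a, b in zip(alphas, a_factors, born_radii)
    )


if __name__ == "__main__":
    # HYPOTHETICAL atoms: same a_i, increasingly buried (larger Born radius).
    a_i = [-150.0, -150.0, -150.0]   # kcal/mol Å^3, illustrative only
    b_i = [2.0, 3.0, 5.0]            # Å, illustrative only
    for alpha in (0.7, 0.5):
        energy = vdw_solvation_energy([alpha] * 3, a_i, b_i)
        print(f"alpha = {alpha}: dG_vdW = {energy:.3f} kcal/mol")
```

Because the term is linear in \(\alpha _{i}\), lowering \(\alpha _{i}\) for a group of atoms scales their solute–solvent dispersion contribution by the same ratio, and atoms with larger Born radii contribute less.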

The hydrogen bonding term,

$$\begin{aligned} \Delta G_{\mathrm{hb}}=\sum _{w}h_{w}p_{w} \end{aligned}$$
(9)

is computed in terms of spherical hydration volumes \(w\), typically located around hydrogen bonding donor and acceptor sites [40]. The geometrical parameter \(p_{w}\), expressed as the fraction of the hydration site not occupied by solute atoms, measures the effective water occupancy of the site and the adjustable parameter \(h_{w}\), which depends on the type of hydrogen bonding site, controls the strength of the solute–solvent interaction (or more precisely the portion of it not captured by the continuum model) [40]. While normally used for hydrogen bonding sites contributing favorably to the solute hydration free energy, here and elsewhere [35, 36] we have also employed this same functional form to describe hydration sites contributing unfavorably to the hydration free energy (see below); the distinction being the sign of the \(h_{w}\) parameter, negative for hydrogen bonding sites and positive for the unfavorable solvation free energy sites.

System preparation and tuning

The octa-acid host was prepared with the Maestro program (Schrödinger, Inc.), starting from the structure file provided by the SAMPL organizers and using standard OPLS2005 parameters. The guests were prepared similarly. All carboxylates of the host and the guests were modeled as unprotonated, giving the host an overall net charge of \(-8\). Both axial and equatorial conformations of cyclic alkyl rings of the guests were investigated separately. The axial conformations led to significantly less favorable binding free energies and were not considered in the analysis.

A preliminary binding free energy calculation for guest 1 with default AGBNP2 parameters resulted in an unstable complex, which was regarded as unreasonable. Accordingly, steps were taken to correct this defect. Given the hydrophobicity and depth of the binding cavity of the octa-acid host, it was reasoned that the discrepancy was due to water enclosure effects [49, 50] not well represented by our continuum solvent model. Two scenarios are possible. In the first, the cavity is hydrated by restricted low-entropy and/or high-energy water molecules which, when released into the bulk upon guest binding, contribute favorably to binding. In the second, the cavity is partially dewetted, resulting in weak interactions of the host atoms in the interior of the cavity with the solvent. In the complex these are replaced by interactions between the host and the guest, again contributing favorably to binding. As indicated by explicit solvent simulations in which the binding cavity of the octa-acid host was observed to fluctuate from empty to completely filled with water [51, 52], the two effects (low water entropy and low water occupancy in the cavity) may, in fact, occur concomitantly. Nevertheless, both effects contribute favorably to host–guest binding and, as described below, can be modeled similarly in the context of the implicit solvation model we have employed.

As illustrated in Fig. 1 the interior of the host is composed of an outer larger cavity and an inner smaller cavity. Four alkyl hydrogen atoms of the host point towards the smaller cavity [37]. Similar to a previous approach for a β-cyclodextrin host [35], we employed these as attachment points for custom AGBNP2 hydration sites with unfavorable hydration strength parameters [\(h_{w}\) in Eq. (9)]. The results submitted to SAMPL4 were obtained with \(h_{w}=2\) kcal/mol, although, given that these sites are significantly occluded even in the absence of a bound guest, their individual contribution to the binding free energy is only a fraction of this value. We used a different strategy to model water enclosure effects in the larger cavity of the host. This cavity is lined with aromatic rings lacking hydrogen atoms suitable to serve as attachment points for hydration sites. Instead, we opted to reduce the van der Waals \(\alpha\) parameters [see Eq. (8)] for the aromatic carbon atoms lining this cavity from 0.7 to 0.5. Both modifications work towards making the hydration free energy of the host less favorable relative to the complex thereby decreasing the desolvation penalty for binding. Given the limited scope of the classroom experiment, a full parameter optimization campaign was not carried out. The same modified parameters above were applied to both sets of complexes, those with known binding affinities and those with unknown affinities as part of the SAMPL4 challenge.
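The effect of the custom hydration sites on binding can be sketched with Eq. (9). The site strength (\(h_{w}=2\) kcal/mol) and the number of sites (four) are taken from the text, while the occupancies \(p_{w}\) below are hypothetical values chosen only to show why each site contributes a fraction of \(h_{w}\). The function names are likewise illustrative.

```python
def hydration_site_energy(strengths, occupancies):
    """Eq. (9): sum over hydration sites of h_w * p_w (kcal/mol)."""
    if len(strengths) != len(occupancies):
        raise ValueError("one occupancy is needed per hydration site")
    return sum(h * p for h, p in zip(strengths, occupancies))


def site_contribution_to_binding(strengths, p_free_host, p_complex):
    """Change in the site term on going from the free host to the complex."""
    return (hydration_site_energy(strengths, p_complex)
            - hydration_site_energy(strengths, p_free_host))


if __name__ == "__main__":
    h_w = [2.0] * 4          # four custom sites, h_w = 2 kcal/mol (from the text)
    p_free = [0.25] * 4      # HYPOTHETICAL water occupancy in the free host
    p_bound = [0.0] * 4      # HYPOTHETICAL: the guest fully occludes the sites
    delta = site_contribution_to_binding(h_w, p_free, p_bound)
    print(f"site contribution to binding: {delta:.1f} kcal/mol")
```

With these assumed occupancies the four sites together favor binding by 2 kcal/mol, that is 0.5 kcal/mol per site, a quarter of the nominal \(h_{w}\). This simplified picture ignores all other solvation terms and any sites on the guest.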

Fig. 1

Surface representation of the octa-acid host (with guest 7 bound). The guest (green carbon atoms) occupies the central cavity which is composed of an outer large cavity and a deeper smaller cavity occupied in this case by the methyl group of the guest

Computational details

Force field parameters were assigned using Schrödinger’s automatic atomtyper [39]. Parallel alchemical Hamiltonian Replica Exchange molecular dynamics simulations were conducted with the IMPACT program [39]. The simulation temperature was set to 300 K. We employed 16 replicas at \(\lambda\) = 0, 0.001, 0.002, 0.004, 0.005, 0.006, 0.008, 0.01, 0.02, 0.04, 0.07, 0.1, 0.25, 0.5, 0.75, and 1. The binding site volume was defined as any conformation in which the center of mass of the ligand was within 8 Å of the center of mass of the host. The ligand was sequestered within this binding site volume by means of a flat-bottom harmonic potential. Based on this definition the value of the term \(-k_{\mathrm{B}}T\ln C^{\circ }V_{\mathrm{site}}\) in Eq. (5) is −0.15 kcal/mol. No other restraints were applied.
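The standard-state term quoted above can be checked with a few lines of Python. The sketch assumes that the binding site is a sphere of radius 8 Å centered on the host center of mass, and it uses the standard volume of 1,668 Å\(^{3}\) given after Eq. (5). The function name and the value of the Boltzmann constant in kcal/(mol K) are supplied here for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def standard_state_term(radius, temperature=300.0, std_volume=1668.0):
    """Return -k_B T ln(C° V_site) in kcal/mol for a spherical binding site.

    radius: maximum ligand-host center-of-mass distance (Å).
    std_volume: volume per molecule at C° = 1 M (Å^3), as quoted in the text.
    """
    v_site = 4.0 / 3.0 * math.pi * radius ** 3
    return -K_B * temperature * math.log(v_site / std_volume)


if __name__ == "__main__":
    print(f"{standard_state_term(8.0):.2f} kcal/mol")  # prints -0.15 kcal/mol
```

The site volume is about 2,145 Å\(^{3}\), slightly larger than the standard volume, which is why the correction is small and negative.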

BEDAM calculations were performed for 1.4 ns of molecular dynamics per replica (22.4 ns total for each complex). Data from the last nanosecond of each replica trajectory were used for free energy analysis. Binding free energy estimates converged quickly; differences between estimates obtained using the first third and the full data set were all smaller than 1 kcal/mol. Binding energies were sampled every 1 ps, for a total of 16,000 binding energy samples per complex. Uncertainties in the binding free energies were estimated from MBAR [41] and scaled by a factor of 10 to reflect the correlation length of approximately 50 ps estimated from binding energy trajectories of guest 1. The binding free energy predictions were submitted to the SAMPL4 octa-acid challenge on July 20 2013 and assigned prediction ID #140.
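The bookkeeping in this paragraph can be reproduced with a short Python sketch. The text does not state the formula behind the factor of 10, so the sketch assumes one common convention, in which the standard error of correlated data is inflated by the square root of the statistical inefficiency \(g=1+2\tau /\Delta t\). That convention and the names used below are assumptions.

```python
import math

N_REPLICAS = 16                 # lambda states listed in the text
NS_PER_REPLICA = 1.4            # ns of molecular dynamics per replica
ANALYZED_PS_PER_REPLICA = 1000  # last nanosecond of each replica
SAMPLING_INTERVAL_PS = 1.0      # one binding energy sample per ps
CORRELATION_TIME_PS = 50.0      # estimated for guest 1


def statistical_inefficiency(correlation_time, sampling_interval):
    """g = 1 + 2 * tau / dt: stored samples per statistically independent sample."""
    return 1.0 + 2.0 * correlation_time / sampling_interval


def uncertainty_scale(correlation_time, sampling_interval):
    """Factor by which an uncorrelated-sample standard error is inflated (ASSUMED convention)."""
    return math.sqrt(statistical_inefficiency(correlation_time, sampling_interval))


total_ns = N_REPLICAS * NS_PER_REPLICA
n_samples = int(N_REPLICAS * ANALYZED_PS_PER_REPLICA / SAMPLING_INTERVAL_PS)

if __name__ == "__main__":
    print(f"total simulation time: {total_ns:.1f} ns")
    print(f"binding energy samples: {n_samples}")
    scale = uncertainty_scale(CORRELATION_TIME_PS, SAMPLING_INTERVAL_PS)
    print(f"uncertainty scale factor: {scale:.1f}")
```

Under this assumption a 50 ps correlation time with 1 ps sampling gives \(g\approx 101\) and a scale factor of about 10, consistent with the factor quoted in the text.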

Results and performance

Binding free energy validation

Table 1 reports the computed binding free energies for the octa-acid complexes for which experimental binding free energies were available at the time of the SAMPL4 challenge [37]. With the exception of the complexes with the two longest linear alkyl carboxylates (decanoate and octanoate) whose affinity is overestimated, there is good agreement between calculated and experimental binding free energies. The cause of the discrepancy for long chain carboxylates is not clear. The complex with the shorter hexanoate guest is predicted correctly and so are the complexes with the more compact adamantane and cyclohexane derivatives. As the SAMPL4 set did not contain long chain carboxylates, which appear problematic with the current model, we did not explore this issue further.

Table 1 Calculated binding free energies for complexes of the octa-acid host with a set of guests with published experimental affinities [37]

Blind predictions

The blind binding free energy predictions submitted to SAMPL4 are listed in Table 2 and shown in Fig. 2 compared to the experimental measurements, which were not known to us prior to the submission of the predictions [32]. Trans-4-methyl-cyclohexane carboxylate (guest 7) and 4-chlorobenzoate (guest 4) are correctly predicted as the strongest and next to strongest binders in this set. The calculated binding free energies for these guests are in quantitative agreement with the experiments (for example for guest 7, \(-7.2\) kcal/mol predicted vs. \(-7.6\) kcal/mol experimentally). At the other end of the spectrum, benzoate (guest 1) and cyclopentane carboxylate (guest 8) are correctly predicted as the weakest binders, although for these two guests the agreement is not as quantitative (for benzoate the binding free energy is underestimated by 2.7 kcal/mol). In general, the computational model predicts larger variations in binding free energies than observed, as confirmed by the greater-than-one slope of the correlation line of the calculated binding free energies relative to experiments (Fig. 2). For example, methylation at the trans position of guest 1 is predicted to favor binding by approximately 4 kcal/mol whereas measurements show a variation of approximately half this value.

As the thermodynamic decomposition data in Table 2 show, trends in binding affinity are generally determined by host–guest interaction energies measured by the binding energies \(\Delta E_{b}\). The strongest binder (guest 7) is also the one with the most negative binding energy (\(-19.6\) kcal/mol) whereas the weakest binders (guests 1 and 8) are the ones with the least negative binding energies (\(-9.3\) and \(-10.2\) kcal/mol, respectively). As is often the case, however, the range of variation of the binding energy (10.0 kcal/mol) is significantly larger than the range of binding free energies (6.2 kcal/mol) due to the compensating effect of reorganization (\(\Delta G_{\mathrm{reorg}}^{\circ }\) in Table 2). The reorganization free energy measures entropic losses and intramolecular strain of the host and the guest upon binding [14], which generally become increasingly unfavorable with increasing strength of host–guest interactions. The strongest binder (guest 7) is also the one which incurs the highest reorganization penalty while the weakest binders incur the least. In the middle of the pack, however, the balance between favorable host–guest interactions and unfavorable reorganization losses is more complex. For example, guest 2 would be predicted as the second strongest binder based on interaction energies alone, overcoming guest 4 by more than 1 kcal/mol. Binding free energy scores, however, correctly predict the opposite due to a 2 kcal/mol advantage of guest 4 in terms of reorganization penalty.
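As a worked illustration of how the reorganization free energy follows by difference from Eq. (6), the values quoted in the text for guest 7 (a predicted \(\Delta G_{b}^{\circ }\) of \(-7.2\) kcal/mol and an average binding energy \(\Delta E_{b}\) of \(-19.6\) kcal/mol) give

$$\begin{aligned} \Delta G_{\mathrm{reorg}}^{\circ }=\Delta G_{b}^{\circ }-\Delta E_{b}=-7.2-(-19.6)=12.4\ \text{kcal/mol}, \end{aligned}$$

that is, roughly two thirds of the favorable host–guest interaction energy of this guest is offset by reorganization.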

Table 2 Experimental binding free energies and calculated binding free energies, binding energies and reorganization free energies for the complexes of the octa-acid host with the SAMPL4 set of guests
Fig. 2

Calculated standard binding free energies of the SAMPL4 octa-acid complexes plotted against the corresponding experimental measurements. The continuous line is the 1:1 line and the dashed line is the least-squares line (slope = 1.5)

As summarized in the overview paper [33], ours were judged as some of the most accurate predictions of the SAMPL4 challenge. Our submission ranked best (among the 13 octa-acid entries made public) in terms of root mean square error with respect to both absolute and relative binding free energy measures. For the latter, using the notation in reference [33], the root mean square error after subtracting the average signed error was RMSE_o = 1.3 and the root mean square error of all pairs of relative binding free energies was RMSE_r = 0.9 kcal/mol. Our predictions performed best also in terms of correlation slope (slope = 1.5, but interestingly behind a null model based on guest size), and second best in terms of correlation coefficient (\(R^{2}=0.9\)). These quality metrics were statistically equivalent to those of absolute binding free energy predictions obtained by Ryde and coworkers with an explicit solvation model [52].

The predominant binding mode seen in the simulations is, as expected, one in which the hydrophobic ring of the guest is set into the cavity with the carboxylate group oriented towards the solvent (Fig. 3). Substituents in the 4th position of the ring occupy the inner cavity of the host. This happens for guests 2, 3, 4, and 7. In these guests the substituent is in register to occupy the lower cavity of the host while leaving the carboxylate group optimally solvated. Guest 5, with the chlorine substitution at the 3rd position, prefers mostly to not occupy the inner cavity rather than sacrificing optimal solvation of the carboxylate group (see Fig. 3). The calculations generally reproduce the observed trend that binding to the inner cavity contributes to stronger binding. In agreement with the experiments complexes with guests 2, 4 and 7 are more strongly bound than their respective homologues (guests 1, 5, and 6) not capable of occupying the inner cavity.

Fig. 3

Representative structures of the complexes of the octa-acid host with the nine cyclic carboxylate guests investigated as part of the SAMPL4 challenge. The structures displayed here are the final frames of the trajectory of the BEDAM replica at \(\lambda =1\)

Experimental trends also identify interactions with the larger outer cavity as an important determinant of binding, an aspect that appears to be underestimated by the computational model. For example, guest 9, the third strongest binder experimentally despite the lack of interactions with the inner cavity, is ranked only fifth by the model. Similarly, as noted above, the affinities of guests 1 and 8, while ranked correctly, are significantly underestimated. On the other hand, the binding of guest 3 is also underestimated even though it occupies the inner cavity, sacrificing in part good interactions with the outer cavity in order to accommodate the longer ethyl substituent. As noted [32], the stronger observed binding of guest 3 relative to guest 2 is contrary to expectations, and the model, which predicts the opposite relative ranking, fails to shed light on the underlying molecular mechanism for the anomaly.

Discussion

Overall, binding affinity prediction methods have performed well on the SAMPL4 host–guest challenge [33, 52–55], confirming the steady progress of the field, and the valuable contribution of blind experiments of this kind towards this progress. The binding free energy predictions made as part of this work were among the top scoring submissions for the octa-acid binding affinity challenge evaluated by the SAMPL4 organizers [33]. The present results, together with previous successful experiences in SAMPL challenges [36] and the good ligand screening performance in the concurrent SAMPL4 HIV integrase challenge [28], add further confidence in the reliability of the BEDAM protocol for binding free energy estimation. The present work also demonstrates the accessibility of the technology to non-experts, thanks to an automated workflow and the minimal set of structural assumptions required by the model.

As in previous work [35], tuning of the implicit solvation model to properly treat enclosed hydration sites has been important to achieve good accuracy. Conventional solvation models based on homogeneous continuous descriptions of the solvent do not adequately treat hydration in deep hydrophobic solute cavities. In particular, it has been shown in several contexts that the displacement into the bulk of high free energy water molecules enclosed within receptor cavities can contribute favorably to ligand association [49, 56]. The atypical properties of water in molecular-sized volumes are difficult to model accurately even in the context of explicit representations of the solvent [57, 58]. As we have done in this work, our approach to address these challenges has been to parameterize empirical geometrical models against experimental data. The advantage of this approach is that it can yield, depending on the availability and quality of the experimental data, representations of the thermodynamics of hydration at a level of accuracy equivalent and possibly superior to models of higher complexity. However, when adopting empirical approaches of this kind, transferability of parameters cannot be assumed. In this work the choice of parameters was guided by existing experimental data on the octa-acid system [37] and previous experiences with similar hydration cavities in other host–guest systems [35, 36, 50].

The octa-acid binding cavity is in many respects representative of hydrophobic binding sites on protein surfaces where complex hydration patterns significantly affect binding propensities [49]. The fact that reliable predictions were achieved in the present application despite very limited parameterization indicates that similar strategies could be successfully employed for protein receptors. Recent advances in inhomogeneous solvation theory analysis [50, 56] potentially offer a suitable route to automated parametrization from explicit solvent simulations.

The model generally confirms the expected trends in the SAMPL4 set of the octa-acid host system [32]. Guests capable of occupying the inner smaller cavity without sacrificing solvent exposure of the carboxylate group tend to bind the octa-acid host more strongly, as do guests containing an alkyl ring rather than an aromatic ring. The model provides further insights and details on the molecular origins of these trends. For example, it has been suggested that guest 4 (chlorine substituent at position 4) binds more strongly than guest 2 (methyl substituent) due to added hydrogen bonding-like interactions between the chlorine atom and the benzal hydrogen atoms of the host pointing towards the inner cavity [32]. However, the computational model, by predicting a stronger host–guest interaction energy for guest 2 relative to guest 4 by more than 1 kcal/mol (see column 5 in Table 2), appears to contradict this hypothesis. In our model the greater affinity of guest 4 for the host is due to its smaller reorganization free energy penalty relative to guest 2. We hypothesize that this is due to added intramolecular strain imposed on the host to open up the inner cavity slightly to accommodate the methyl group. This is an example of the commonly observed compensation between binding energies and reorganization free energy components [28, 35]: stronger receptor-ligand interactions are often achieved at the expense of entropic losses and intramolecular strain and, as a result, the outcome in terms of binding free energy is often the result of a subtle balance that is difficult to predict.

As an additional example, we find that the higher affinity of alkyl ring-containing guests is due to their stronger interaction energies with the host. For instance, the average binding energies of guests 6 and 7 are approximately 3 kcal/mol more favorable than those of the corresponding aromatic guests (guests 1 and 2). This is due to a combination of the larger number of hydrogen atom interaction centers in the alkyl ring and the smaller average distance between the carbon atoms of the guest and the atoms of the host afforded by the puckering of the ring. This conclusion is in agreement with the analogous analysis based on binding site volume occupancy [32]. Interestingly, in this case, unlike the example above, the stronger interaction energy actually translates into stronger binding affinity because the reorganization free energy component either reinforces the interaction energy difference (compare the reorganization free energy values of guest 6 and guest 1 in Table 1) or opposes it only slightly (guest 7 relative to guest 2).
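The two examples above can be summarized with a short numerical sketch. It assumes that the binding free energy is the sum of the average binding (interaction) energy and the reorganization free energy, so that the reorganization term is obtained by difference. The class name, function name and all numerical values below are hypothetical placeholders chosen for illustration; they are not the values reported in Table 1 or Table 2.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Hypothetical container for the free energy components of one guest."""
    name: str
    binding_free_energy: float  # kcal/mol
    interaction_energy: float   # average binding energy, kcal/mol

    @property
    def reorganization(self) -> float:
        # Assumed additive decomposition:
        # binding free energy = interaction energy + reorganization free energy
        return self.binding_free_energy - self.interaction_energy


def compare(a: Guest, b: Guest) -> dict:
    """Return the differences (a minus b) in each free energy component."""
    return {
        "ddG": a.binding_free_energy - b.binding_free_energy,
        "ddE": a.interaction_energy - b.interaction_energy,
        "ddG_reorg": a.reorganization - b.reorganization,
    }


# Placeholder values only (NOT taken from the tables of this work).
guest_a = Guest("A", binding_free_energy=-6.0, interaction_energy=-20.0)
guest_b = Guest("B", binding_free_energy=-5.0, interaction_energy=-21.5)

for key, value in compare(guest_a, guest_b).items():
    print(f"{key}: {value:+.2f} kcal/mol")
```

With these placeholder numbers, guest A binds more strongly than guest B even though its interaction energy is weaker, because its reorganization penalty is smaller. This is the same qualitative pattern described above for guest 4 relative to guest 2.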

The SAMPL4 host–guest binding experiment offered an invaluable opportunity to incorporate a realistic research task into a classroom context. The blind nature of the experiment, where no one was in possession of the right answer, created a unique collaborative network among students and between students and instructors. The best approach to each problem was worked out based on the collective wisdom of the class rather than selected because it gave the “right” answer. Validation of the model, just as in applied research, became not the goal of the exercise but rather the means to obtain a set of predictions of the highest possible quality.

To streamline the calculations, we employed a highly automated BEDAM workflow capable of preparing inputs for the molecular dynamics engine from a minimal set of user parameters: the structure files for receptor and ligand, and the maximum center of mass distance between the two, which defines the binding site volume (see Methods). This automation strategy, which has also been employed to automate hundreds of binding free energy calculations for the HIV integrase screening challenge as part of the same SAMPL4 experiment [28], enabled the simulations to be carried out with little user knowledge of the system preparation details, input file syntax and parallel execution commands for the molecular dynamics engine. Features of Schrödinger’s molecular modeling environment, such as the graphical user interface and automatic force field parameter assignment, also played a key role in making these complex calculations accessible to students.
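As an illustration of how a single distance parameter can define the binding site, the following minimal sketch tests whether a ligand lies within a given center of mass distance of the receptor and computes the corresponding site volume. It is not the workflow's actual code: the function names are hypothetical, and the binding site is assumed to be a sphere whose radius is the maximum center of mass distance.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted average position of a set of atoms (x, y, z tuples)."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for c, m in zip(coords, masses)) / total
        for i in range(3)
    )


def in_binding_site(rec_coords, rec_masses, lig_coords, lig_masses,
                    max_com_distance):
    """True if the ligand center of mass is within max_com_distance
    of the receptor center of mass (assumed spherical site)."""
    rec_com = center_of_mass(rec_coords, rec_masses)
    lig_com = center_of_mass(lig_coords, lig_masses)
    return math.dist(rec_com, lig_com) <= max_com_distance


def site_volume(max_com_distance):
    """Volume of the assumed spherical binding site."""
    return 4.0 / 3.0 * math.pi * max_com_distance ** 3


# Toy coordinates in arbitrary length units (hypothetical, for illustration).
receptor = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 3.0)]
print(in_binding_site(receptor, [1.0, 1.0], ligand, [12.0], 4.0))
```

Under this assumption the only geometric input needed from the user is the cutoff distance, which is consistent with the minimal parameter set described above.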

Conclusions

As part of the SAMPL4 blind challenge, we have employed the BEDAM protocol to predict the binding free energies of a set of octa-acid host–guest complexes. Our predictions consistently scored among the best submitted to SAMPL4 in this category (best in terms of root mean square error and correlation slope, and second best in terms of correlation coefficient). The experiment was conducted as part of a hands-on graduate class laboratory exercise. Collectively the students, guided by the instructors, performed the bulk of the calculations and the numerical and structural analysis. Students were encouraged to share data and prepared reports, on which this work is based, discussing their results in the context of those of all of the other students. Overall, participation in this SAMPL4 challenge has been a very instructive experience for both the students and their instructors. The success of the experiment supports the reliability of physics-based atomistic binding free energy estimation models and shows that these, when properly streamlined and automated, can be successfully employed by non-specialists.