Introduction

This year’s Statistical Assessment of Modeling of Proteins and Ligand (SAMPL) challenge focuses on prediction of cyclohexane–water distribution coefficients. The inclusion of distribution coefficients replaces the previous focus on hydration free energies which was a fixture of the past five challenges (SAMPL0-4) [17]. Due to a lack of ongoing experimental work to generate new data, hydration free energies are no longer a practical property to include in blind challenges. It has become increasingly difficult to find unpublished or obscure hydration free energies and therefore impossible to design a challenge focusing on target compounds, functional groups or chemical classes. But this type of data is extremely valuable—the past SAMPL challenges have driven real improvements in a variety of methods for calculating hydration free energies [1]—so we sought to include a similar physical property in SAMPL5. The organizers of SAMPL5 settled on cyclohexane–water distribution coefficients, and thanks to a partnership with Genentech, this led to a series of measurements on drug-like compounds, discussed in detail in this issue [8]. The experimental measurements are also straightforward enough that future distribution coefficient challenges can be deliberately designed to focus on issues that merit attention to move the field forward.

Partition and distribution coefficients are important physical properties [9, 10] which can provide a valuable opportunity for testing computational methods and molecular models. Distribution coefficients describe how all forms of a solute distributes itself across two immiscible solvents. In this case,

$$\begin{aligned} D = \frac{\sum _i [X_i]_{cyc}}{\sum _i [X_i]_{aq}} \end{aligned}$$
(1)

where \(X_i\) represents a single protonation or tautomeric state of the solute in one of the solvents, and the sum runs over all protonation and tautomeric states [10]. Results are reported as the logarithm of this ratio (\(\log D\)). These are more complicated than partition coefficients (\(\log P\)), which measure the concentration of the neutral solute in both solvents [9]. Specifically, if only one tautomer is relevant, \(\log P\) is proportional to the transfer free energy, and can be calculated from solvation free energies [1121]. In contrast, all relevant charged and neutral forms of the solute will need to be included to accurately calculate \(\log D\), which can be estimated from a calculated \(\log P\) and the relative populations of protonation states and tautomers in each solvent. Thus, accurate tautomer enumeration in both solvents may be an important part of predicting \(\log D\), introducing new complexities to the SAMPL challenge which were avoided in previous hydration free energy challenges.

Here we give an overview of the analysis done for the SAMPL5 challenge, including the compounds considered, overall performance of submissions, and the metrics used for analysis. We also include details for a set of reference calculations we performed estimating \(\log D\) as the cyclohexane/water partition coefficient as well as a series of follow-up studies focusing on the importance of tautomers in estimating \(\log D\). Overall, we believe the outcome of the present SAMPL5 challenge highlights the potential benefits of this type of experimental data to improve computational methods, force fields, sampling algorithms, and treatment of protonation states and tautomers. Many of these issues will be highly relevant for nominally more challenging problems, such as prediction of protein–ligand binding affinities.

Challenge logistics

SAMPL5 began on September 15, 2015 when the specifications and input files for the challenge were made available on the D3R website (http://drugdesigndata.org); these are also provided in the supporting information, made available on the University of California Dash (http://n2t.net/ark:/b7280/d1988w). The challenge deadline was February 2, 2016 and experimental results were provided to participants not long after. As in past SAMPL challenges, each group could submit multiple sets of predictions. There was also the option for participants’ identities to remain anonymous, although their results and method descriptions would still be made available. A total of 76 prediction sets from 18 participants or participating groups were submitted and assigned a random 2 digit ID number, 01–76, that will be used throughout this paper. Predictions were analyzed and overview statistics, as well as individual analysis of each submission by various error metrics (as detailed below) were returned to each participant. The challenge culminated with discussions of participants experiences and results at the 1st D3R Workshop at the University of California, San Diego March 9–11, 2016.

Table 1 A complete list of compounds used in the SAMPL5 challenge, sorted by batch

For the prediction of distribution coefficients in SAMPL5, a total of 53 molecules were considered. They were assigned an identifier in the form SAMPL5_XXX and are pictured in Table 1. The 53 molecules were divided into batches 0, 1, and 2 containing 13, 20, and 20 molecules respectively. We wanted each batch to have a similar dynamic range and for the molecules to increase in size across batches, so on average the smallest molecules are in batch 0 and the largest in batch 2. To ensure each batch had adequate dynamic range, the molecular weight and estimated octanol/water partition coefficient were computed for each compound. These partition coefficients were estimated with OpenEye’s \(\log P\) calculator. Molecules were then assigned to bins by estimated partition coefficient, and assigned to batches based on molecular weight. Specifically, the smallest molecules from each partition coefficient bin were added to batch 0, then batch 1, and the rest of the molecules comprise batch 2.

Participants could submit partial sets of predictions as long as they included full consecutive batches; that is, they could submit batch 0, batches 0 and 1, or batches 0, 1, and 2. The idea was that all participants should attempt predictions on the full set if at all possible, but grouping into batches would allow people with particularly demanding methods (such as polarizable force fields or methods requiring intensive quantum mechanics) to focus on smaller compounds and still be evaluated. Eight submissions from two participants included results for only batch 0, and an additional five submissions from two participants provided only batches 0 and 1. Here we focus on the results for the complete set of molecules (batches 0, 1, and 2). Separate analysis for data subsets is available in the supporting information on Dash

Participants were asked to report a cyclohexane/water distribution coefficient for each molecule. As discussed above, distribution coefficients are the ratio of concentrations for all forms of the solute in the cyclohexane and aqueous layers, at a specified pH. In this case, experiments were done with the water layer consisting of a buffered aqueous solution at pH 7.4. We also required participants to provide two estimates for uncertainty, a statistical uncertainty for their computational method and a model uncertainty that estimates agreement with experiment. The statistical uncertainty was intended to be the variation expected over repeated calculations of the same value. The model uncertainty, on the other hand, was intended to provide an estimate of how well the calculated value will agree with experiment. For example, in a recent study we computed cyclohexane/water partition coefficients using alchemical solvation free energy calculations in GROMACS where the statistical uncertainties were around 0.05 log units, but the root mean squared error was around 1.4 log units [23], so an appropriate estimated model uncertainty would have been 1.4 log units. A careful analysis of expected error could even yield model uncertainties which would vary based on the anticipated difficulty or complexity of a compound. Our interest in model uncertainties is in part based on the realization that an important part of creating predictive models is the ability to know when they will be unreliable or fail. Thus, analysis of model uncertainties is an important part of evaluating any model.

Analysis of submission performance

As in past SAMPL challenges, we considered a variety of error metrics in analyzing all predictions submitted to SAMPL5. Each error metric was calculated for all submissions, by batch, and distributed to challenge participants before the workshop. Here we will focus primarily on six error metrics: the root-mean-squared error (RMSE), average unsigned error (AUE), average signed error (ASE), Pearson’s R (R), Kendall’s tau (tau), and the ‘error slope’ explained in depth below. We also calculated the maximum absolute error and the percent of predictions with the correct sign, but these are not included in the analysis here. However, these metrics were provided to challenge participants and are available in the supporting information on Dash. The uncertainty in each metric was calculated as the standard deviation over 1000 bootstrap trials, where each trial consists of creating a ‘new’ dataset by sampling pairs of (predicted, calculated) values from the original set, with replacement. As described previously, this bootstrapping technique also included variation in the experimental values based on their reported uncertainties [1].

As discussed above, an important factor influencing the utility of a predictive tool is the ability of the tool to not only provide predictions but to predict the accuracy of those predictions—that is, how well the calculated values are likely to agree with experiment—not just its statistical error. To assess this, as in SAMPL4 [1], a quantile–quantile plot (QQ Plot) was created for each prediction set [24]. QQ Plots compare the fraction of a normal distribution within a specified number of standard deviations to the distribution of errors (calculated minus experiment) that are within that number of model uncertainties. For example, consider the number of predictions within one standard deviation of the expected value; if the samples are drawn from a normal distribution, then  0.68 of the values ought to fall within one standard deviation, so the value on the x-axis is 0.68. The value on the y-axis will represent the fraction of predictions that are within one model uncertainty of the experimental value. If the model uncertainty is accurate, then this also ought to correspond to a value of 0.68. A linear regression analysis helps summarize these results. The ‘error slope’ is the slope of the line comparing the fraction of predictions within a specified range of experiment to the expected fraction from a normal distribution. An error slope of greater than one indicates that the calculated values are within uncertainty of experiment more often than expected, or in other words the model uncertainty was overestimated. In contrast, an error slope less than one suggests the model uncertainty was underestimated.

We also attempted to identify any individual molecules where most of the methods failed to accurately estimate the distribution coefficient. To accomplish this, we analyzed all predictions on a molecule-by-molecule basis via our usual set of error metrics. Here we will primarily focus on just average unsigned error for molecules, but all other error metrics were provided to participants and are available in supporting information on Dash.

Reference calculations

We calculated distribution coefficients through a few different methods as a reference. K.H.B., a graduate student in the Mobley group, performed a set of blind calculations estimating the \(\log D\) as a partition coefficient between cyclohexane and water calculated from solvation free energies. In addition, C.C.B. and D.L.M. performed post-challenge analysis of protonation and tautomeric states and used this to convert our calculated partition coefficients to distribution coefficients. We considered a null hypothesis where all molecules are assumed to distribute equally between cyclohexane and water. Many fast structure-based tools for octanol/water partition coefficients exist, and we used one of those to estimate partition coefficients, both with no correction for the fact that we are interested in cyclohexane, and with a small adjustment for this as discussed below.

Calculating partition coefficients from solvation free energies

We decided to estimate distribution coefficients via a \(\log P\) calculation, by assuming only a single neutral tautomer of each solute, then calculating \(\log P\) from a difference in solvation free energies. Before the challenge, each molecule was taken directly from the provided SMILES string. As demonstrated in the literature [1121], partition coefficients are directly proportional to the difference between the solvation free energy for the solute into each solvent. We use previously established and automated protocols [23] to calculate the solvation free energy of each molecule into water and cyclohexane. Then the calculated partition coefficient was reported as an estimate for \(\log D\).

To calculate solvation free energies, we used automated tools created by the Mobley lab. Molecular dynamics simulations were performed in GROMACS [2531] with the General AMBER Force Field (GAFF) [32] with AM1-BCC charges [33, 34]. The TIP3P water model [35] was used for the aqueous phase. Topology and coordinate files for the solvated boxes with 1 solute molecule and 500 cyclohexane or 1000 water molecules were built using our Solvation Toolkit [23]. These files were then converted to AMBER, DESMOND, and LAMMPS formats and provided to SAMPL5 participants as reference calculations. The Solvation Toolkit takes advantage of many open source Python modules and is available at https://github.com/MobleyLab/SolvationToolkit. It converts SMILES strings or IUPAC names of any mixture of small organic compounds to parameterized molecules and builds topology and coordinate files for a variety of simulation packages. All molecular dynamics parameters are identical to previous studies [4, 23, 36]. The molecule is taken from the solvated box to a non-interacting gas phase in 20 lambda values. Solvation free energies are calculated with Alchemical Analysis tool [37] using the multi-state Bennett acceptance ratio to extract the free energy difference between the beginning and end state. The partition coefficient was calculated as the difference between the cyclohexane solvation free energy and the hydration free energy. The statistical uncertainty was reported as the propagated uncertainty from the solvation free energy calculations. The model uncertainty was estimated to be the same for all molecules and reported as the root-mean-squared error from a recent study on calculating cyclohexane/water partition coefficient, specifically 1.4 log units [23]. These reference calculations were assigned submission ID 39 and included in the error analysis performed on all submissions.

Simulation box size does not affect the calculated solvation free energy

Hydration free energies were previously shown to be independent of box size for box edges ranging from 2 to 9 nanometers, within calculated uncertainties [38]; however, here, because cyclohexane is much less polar, we had some concern that finite size effects could still be significant. To explore this, we performed tests in which we varied the simulation box size. Because more polar solutes are more likely to have substantial long range interactions, we calculated the dipole moment of each SAMPL5 molecule using the position and charges on atoms in the mol2 files. SAMPL5_024 had the largest dipole moment so it was used as the solute for the box size investigation. The solvation free energy calculations were set up as described above, changing the number of cyclohexane molecules from 100 to 500. Our calculations above are performed with lattice-sum (PME) treatment of coulomb interactions. It was the primary focus of this check and we included duplicate calculations for the smaller box sizes where the initial coordinate file was the same, but new velocities were generated for the equilibrium phases. We also repeated the solvation free energy calculations with reaction field coulomb interactions assigning the dielectric coefficient for cyclohexane, 2.0243 [39]. For both types of simulation, the calculated solvation free energy fluctuated around an average of 19.1 kcal/mol with no trend that would suggest solvation free energy depends on box size (Fig. 1). There are many other explanations for fluctuations in calculated solvation free energies, including sampling, that might account for the 0.48 kcal/mol range in the PME simulations. Ultimately, we find that box edge lengths from 2.64 to 4.54 nm have no significant effect on the calculated solvation free energies. This suggests that in the future, smaller box sizes could be used for computational efficiency. The input, results, molecular dynamics parameter and coordinate files, and tables of solvation free energies are available in the supporting information on Dash.

Fig. 1
figure 1

Calculated solvation free energy for SAMPL5_024 is independent of box size for PME and reaction field coulomb interactions. Points are connected to help distinguish between the two data sets

Consideration of tautomers after SAMPL5

To help understand how the results from our partition coefficient calculations could have been improved, we considered corrections for changes in the solutes’ protonation or tautomeric states. Distribution coefficients differ from partition coefficients in that they include all forms of the solute in both solvents. A common way to convert between experimentally measured distribution coefficients and partition coefficients is with pKa values for the solute [40]. This is a simple correction using the Henderson–Hasselbalch equation:

$$\begin{aligned} pH = pK_a + \log \frac{[X]}{[HX]} \end{aligned}$$
(2)

to relate the concentration of neutral species to the charged species at a given pH. This correction assumes the solute only has one other relevant protonation state and changes for acidic and basic molecules. Zwitterions and other neutral tautomers are not taken into account. The equation used to calculate a distribution coefficient (\(\log D\)) from a partition coefficient (\(\log P\)) for a basic solute (or X in Eq. 2) is below

$$\begin{aligned} \log D = \log P - \log(1+10^{pK_a-pH}) \end{aligned}$$
(3)

Alternatively for an acidic solute (or HX in Eq. 2) we would instead use:

$$\begin{aligned} \log D = \log P - \log(1+10^{pH-pK_a}) \end{aligned}$$
(4)

We use Schrödinger’s Epik tool [4143] to estimate pKa values for each molecule according to experimental conditions. We then estimated \(\log D\) using the equations above, accounting for just one change in protonation state, meaning each solute was taken to be either acidic or basic. For acidic solutes, the smallest acidic pKa was used with Eq. 4, oppositely for basic solutes the largest basic pKa was used with Eq. 3 to estimate \(\log D\) from \(\log P\).

Using pKa values only accounts for one change in protonation, whereas a correct distribution coefficient should include all relevant tautomers and protonation states of the molecule in both solvents. To account for all other tautomer states, we used Schrödinger’s LigPrep [44] to enumerate tautomers for each molecule in the aqueous solution. The results of the enumeration can include an energetic “state penalty” calculated with Epik which relates the population of that tautomer to all others. This state penalty can be converted into log units and used as a correction term to convert \(\log P\) to \(\log D\):

$$\begin{aligned} \log D = \log P + \frac{-E_{state\ penalty}}{k_BT \ln (10)} \end{aligned}$$
(5)

where \(k_B\) is Boltzmann constant and T is temperature. LigPrep can only perform the tautomer enumeration with water or DMSO as a solvent, so we were unable to predict tautomers in cyclohexane. Therefore both of these corrections account for the protonation or tautomer states only in the aqueous layer and assume the tautomer remains fixed in cyclohexane as the one used in the initial simulation. In the results section below, the corrections performed with \(pK_a\) and the corrections made with the the calculated state penalty are referred to \(\log D_{pK_a}\) and \(\log D_{state\ penalty}\) respectively.

Estimating distribution coefficients with a fast, structural based partition coefficient calculator

Many structure-based tools exist for octanol/water partition coefficients; they are very fast and generally accurate. However, these tools are all trained on empirical data, meaning they are limited by the training data. We chose the OpenEye tool OEXlogP [45, 46] as an example of such a tool. Two post-prediction sets were prepared with the OEXlogP tool. First, the predicted octanol/water partition coefficient was considered an estimate for cyclohexane/water distribution coefficient. In the second set, we used a linear regression to correct for the bias between the calculated XlogP values and a set of experimental cyclohexane/water partition coefficients [9]. For the rest of this paper we will refer to the octanol/water partition coefficient set as \(X \log\,P_{oct}\) and the bias-corrected set as \(X \log\,P_{corr}\).

Exploring the possibility of solvent mixing

Because no two solvents are perfectly immiscible, we wanted to explore the effect that a small amount of water present in cyclohexane would have on computed \(\log D\) values for one of the more polar solutes. The experimental concentration of water in cyclohexane is 0.00047 mole fraction [47]. The presence of water in the cyclohexane phase has a possibility of affecting the transfer free energy, especially for solutes with many polar functional groups. Also, it has been suggested that particularly polar compounds can pull water with them from the aqueous layer into the organic solvent [9]; while this is a kinetic argument and should not apply to equilibrium thermodynamic properties like \(\log D\), the point is well taken—some solutes have a particularly high affinity for water and may actually impact the amount of water present in cyclohexane when the solute is present at finite concentration. Thus, to investigate this, we took one of the most polar compounds—one for which we had particularly large errors relative to experiment—SAMPL5_074, and performed two additional sets of free energy calculations in cyclohexane. Both sets of simulations had the single solute in 150 cyclohexane molecules, but varied in water content—the first with seven water molecules present in cyclohexane, and the second with only a single water molecule. The input files for these simulations were also created with Solvation Toolkit as described above and the same simulation protocol was followed. These simulations were conducted after the SAMPL submission deadline in order to help us understand the role of water in such cases.

The experiments were not done on completely pure water and cyclohexane—particularly, experimental distribution coefficients were measured with small amounts of dimethyl sulfoxide (DMSO) and acetonitrile present in solution. Therefore, we investigated the how acetonitrile and DMSO would distribute in simulations. The experimental concentrations reported for each solvent are approximately 50, 50, 1, and 0.4 % by volume for cyclohexane, water, DMSO, and acetonitrile respectively [8]. To explore how the DMSO and acetonitrile distribute between cyclohexane and water we performed a single simulation with 130 cyclohexane, 780 water, four DMSO, and two acetonitrile molecules, mirroring those concentrations. Input topology and coordinate files were created with Solvation Toolkit. The system was minimized and equilibrated following our reference calculation procedure and then a 5 ns constant pressure and temperature production simulation was run. The trajectory from this simulation was visualized with VMD [48] and a movie is available in the supplementary information.

Results and discussion

A broad range of methods were used for the 76 submissions predicting cyclohexane/water distribution coefficients for the SAMPL5 challenge. Many of these predictions used alchemical molecular dynamics simulations to estimate the solvation free energy in explicit solvent using several classes of force fields, including fixed-charge all-atom force fields [4953], all-atom/coarse-grained hybrid force fields [54], and polarizable force fields [55]. One participant used Semi-Explicit Assembly, a type of implicit solvent solvation free energy method applied to one or more chosen solute conformations [56]. A variety of quantum mechanics (QM) methods were also used including QM/molecular mechanics (QM/MM) with explicit solvent [52, 53], QM with non-Boltzmann Bennett free energy calculations [52, 53], and QM energy calculations with a single optimized molecular geometry [52, 57]. Two participants used variations on the reference interaction-site model (RISM), an integral equation approach, to predict solvation free energies [58, 59]. One participant used QM calculations to derive parameters to tune an empirical model for activity coefficients and used these to estimate distribution coefficients [60]. A few submissions used empirically trained methods for calculating solvation free energies [61, 62]. One particularly successful submission, which will be discussed again below, employed the Conductor-Like-Screening Model for Real Solvents (COSMO-RS) [63].

SAMPL5 is the first SAMPL challenge to include distribution coefficients, but we can estimate how well we expect submissions to do based on past SAMPL challenges which included hydration free energies. Distribution coefficients can be related to transfer free energy between solvents, which allows us to estimate an expected performance from root-mean-squared error (RMSE) in past hydration free energy calculations. In SAMPL4 [1], the average RMSE for the best half of submissions was about 1.5 kcal/mol which would correspond to a 1.54 log unit error in a distribution coefficient if both solvation free energies have comparable errors. Here, only five submissions had an RMSE less than 2.5 log units in SAMPL5. There are many reasons for this perceived change in accuracy, such as a more complex set of molecules, the use of cyclohexane as a solvent, and the complexity of estimating tautomer populations, discussed in depth below. Since this is the first challenge on predicting distribution coefficients, it is likely that participants had not yet developed good protocols to deal with many of these challenges, meaning that somewhat less accuracy ought to be expected. It took several challenges focused on hydration [17] before a range of methods could achieve the success noted in SAMPL4.

Table 2 Error metrics were calculated for each set of predictions, including root-mean-squared error (RMSE), average unsigned error (AUE), average signed error (ASE), Kendall’s tau (tau), and Pearson’s R (R). Error slope refers to the slope of data in a QQ Plot. Indicated submissions included only batch 0\(^{\mathrm{a}}\) or batches 0 and 1\(^{\mathrm{b}}\)
Fig. 2
figure 2

Example plots created for each set of predictions. Submission 21 [53] and submission 49 [62] were chosen to try to represent average submissions, those that were in the middle by most error metrics. Comparison plots show how predicted distribution coefficients compared to experiment for both submissions. QQ Plots show how errors in the predictions were distributed compared to expectations given the model uncertainty

As discussed above, we calculated root-mean-squared error (RMSE), average unsigned error (AUE), average signed error (ASE), Pearson’s R (R), Kendall’s tau (tau), and the slope from the QQ plot (error slope) for each set of predictions. These are reported for all submissions (Table 2), but the rest of the analysis will focus only on submissions that reported results for batches 0, 1, and 2. For each submission, we also created a plot comparing the predicted and experimental values. Some example plots are provided (Fig. 2); these represent a typical submission, in that these submissions were in the middle of the pack by most error metrics. Comparison and QQ plots for every submission are available in the supporting information on Dash as well as error metric tables by batch rather than just for the full set.

Fig. 3
figure 3

Root-mean-squared error (RMSE), average error (AveErr), and average unsigned error (AUE) for every SAMPL5 submission covering all batches. The submissions on each plot are sorted from best to worst by that metric. Due to the number of submissions, data was split across the two panels, with a change in the y-axis scale

Fig. 4
figure 4

Kendall’s tau, Pearson’s R, and the slope from a linear regression analysis on the QQ Plot (‘error slope’) for every SAMPL5 submission covering all batches. The submissions on each plot are sorted from best to worst by that metric. Due to the number of submissions, data was split across the two panels, with a change in the y-axis scale

To help visualize all of the error metrics, the data was compiled into a histogram where results are sorted from best to worst for that metric (closest to 1 for error slope for example). These metrics are split into measurements of deviation from experiment (Fig. 3) and correlation with experiment (Fig. 4) distinctions which helped in identifying high performing groups. This analysis included only submissions that included data for all molecules; the other submissions were indicated in Table 2 and generally fall in the middle of the pack on most metrics. In comparing methods by all of the error metrics, it is important to keep in mind the uncertainty in these error metrics. While Figs. 3 and 4 are ordered by method performance in some sense, the reality is that there are many submissions that are not significantly different from one another.

In the error slope analysis, the slopes are often substantially different from 1, indicating that participants generally provided poor estimates of model uncertainty. Only the top three submissions are within uncertainty of 1. Sebastian Diaz-Rodriguez et al. from Miami University used conservative estimates based on results in previous calculations for solubility and hydration free energy for submissions 53 and 60 [60]. Gerhard König et al. from Max-Planck-Institut für Kohlenforschung provided no explicit discussion of model uncertainty with submission 43 [53]. Only submission 40 significantly overestimated their model uncertainty. All other submissions have an error slope below one, indicating a significant underestimation of the model uncertainty. This suggests that analysis and prediction of model uncertainty remains a key frontier for predictive molecular simulations, and further effort is needed in that area.

Top performing submissions

Fig. 5
figure 5

These plots compare predicted and experimental distribution coefficients for the top performing submissions (16, 36, and 14)

In order to determine which submissions performed the best, we group error metrics into two categories. The first category describes typical error relative to experiment, and includes metrics RMSE and AUE. The second category describes how well correlated the experimental values are with the experimental values, and includes Kendall \(\tau\) and Pearson R. Unlike past SAMPL challenges, there does appear to be one submission which performs best by all of these metrics, submission 16, and for most metrics it is better by a statistically significant amount. Next, we considered the top ten submissions for each of these four metrics. There were only two submissions, other than 16, which performed in the top ten for at least three of these metrics, submissions 14 and 36. Predictions from each of these submissions are compared to experiment in Fig. 5. For submission 16, Andreas Klamt et al. from COSMOlogic used COSMO-RS to compute a partition coefficient for each solute from the difference in chemical potentials for the solute in each solvent [63]. To find distribution coefficients, calculations for the formation of different protonation states, zwitterions, and tautomers were performed in COSMO-RS for relevant molecules. For submission 14, Frank Pickard et al. from the National Institute of Health calculated solvation free energies from QM calculations with SMD implicit solvent in Gaussian. Absolute \(pK_a\) calculations were used to account for additional ionization states [52]. For submission 36, Sherin Shanaka Paranhewage et al. from Oklahoma State University estimated \(\log D\) as a partition coefficient, calculated from the difference in alchemical solvation free energies where the solute was parameterized with the dielectrically corrected general AMBER force field, water was the dielectrically corrected H2O-DC model, and cyclohexane was a specially optimized united-atom model [49]. Further details for each of these submissions can be found in this issue so only a brief explanation of each method was provided here.

Comparisons to simple empirical models

Table 3 Null hypothesis corresponds to \(\log D = 0\) for all molecules

One way of evaluating predictive models is to compare them to a null hypothesis, or default result of some kind. In the case of distribution coefficients, we chose a null hypothesis where we assume all solute molecules distribute equally between cyclohexane and water, corresponding to \(\log\,D = 0\), as suggested by Christopher Fennell [64]. We performed all our standard error analyses discussed above (RMSE, AUE, and Ave. Err) on this simple model as a point of comparison (Table 3). The null hypothesis would have been the top submission for both RMSE and AUE. While this null

Fig. 6
figure 6

The experimental distribution coefficients of the SAMPL5 challenge have a relatively small dynamic range, with most falling within 2 log units of zero

hypothesis has no actual predictive power and could not be used to rank compounds, the fact that it performs better than any submission in terms of error statistics is a challenge for the other methods. These results may also provide commentary on the dataset, which contains a reasonably large percentage of \(\log D\) values that are not that far off from zero (Fig. 6). Organizers had hoped to ensure equal coverage of all \(\log D\) values within the assay range, but due to experimental time constraints this was not possible. It is possible the null model would look worse if the experimental results were more evenly dispersed across the entire dynamic range.

There are many structure-based and/or empirically trained prediction methods for octanol/water partition coefficients. To a first approximation, one might imagine that cyclohexane/water partition coefficients would follow similar trends to those in octanol/water. Therefore, we used OEXlogP from OpenEye (\(X\log\,P_{oct}\)) to examine the possibility of estimating cyclohexane/water distribution coefficients with such a tool. Next, we compared \(X\log\,P_{oct}\) results for a set of compounds with experimental cyclohexane/water partition coefficients [9]. A linear regression was used to correct the \(X\log\,P_{oct}\) values with a slope of 0.7241 and a y-intercept of −1.0306 (\(X\log\,P_{corr}\)). \(X\,\log\,P_{oct}\) would be in the top few submissions by tau and R, but ranked in the middle for all other metrics (Table 3). However, with a simple linear regression trained on experimental cyclohexane/water partition coefficients, \(X\,\log\,P_{corr}\) has a better RMSE and AUE than any SAMPL5 submission. We do not wish to suggest that regression-trained tools are the best mechanism for predicting distribution coefficients; rather, this indicates the potential for cyclohexane/water distribution data to help drive improvements in our physical models, as clearly there are a range of physical effects here which are not yet well described by our models.

Results of reference calculations

We performed a set of blind reference calculations (submission 39) for the SAMPL5 challenge, calculating \(\log P\) for the provided neutral tautomers of all solutes. Our protocol for these calculations was announced in advance, and parameter and coordinate files for the calculations were made available (as described above) in formats for a variety of simulation packages. Participants were encouraged to perform their own set of reference calculations for the full set, or at the very least several specified reference compounds, using these files. This would allow differences in performance to be traced back to methodological differences rather than force field differences. Unfortunately, no participants actually reported results of reference calculations, so this type of analysis has thus far been impossible.

Fig. 7
figure 7

Plots showing our reference calculations compared to experiment. Shown here is the results for submission 39 in SAMPL5, with no tautomer correction (\(\log P\)), distribution coefficient corrected from calculated partition coefficient based on pKas (\(\log D_{pK_a}\)), and distribution coefficient corrected from calculated partition coefficient based on state penalties (\(\log D_{state\ penalty}\))

However, the results of our reference calculations are still helpful for understanding the challenges facing SAMPL5 participants.

For our reference calculations, solvation free energies were calculated using GROMACS with GAFF parameters and AM1-BCC charges. Our reference calculations yielded partition coefficients, determined from the difference in solvation free energies without correcting for variation in tautomers. These calculations were done blindly, and analyzed as submission number 39, which was in the top quarter of submissions by most error metrics (Table 2) although, there is a slight bias favoring solvation in cyclohexane, evidenced by the average error (\(1.6 \pm 0.3\)).

Table 4 Shown are the results for error analysis on our reference calculations which estimated \(\log D\) as a cyclohexane partition coefficient (\(\log P\)) and the correction to distributions coefficients by \(pK_a\) (\(\log D_{pK_a}\)) and by state penalty (\(\log D_{state\ penalty}\))

After the challenge we explored how including protonation and deprotonation would have affected the initial partition coefficient predictions. The first set of corrections involved calculating the pKa for each molecule using Schrödinger’s Epik tool [4143]. Next, \(\log D\) was calculated using the pKa and partition coefficient determined in submission 39 using Eqs. 3 and 4 for basic and acidic solutes, respectively. We assumed only one change in protonation state occurred so only one \(pK_a\) was used. This does not account for zwitterions or alternate neutral tautomers. This correction (labeled \(\log D_{pK_a}\)) showed a slight improvement by most error metrics (Table 4) including a decrease in the average error from \(1.6 \pm 0.3\) to \(0.7 \pm 0.3\) indicating less bias toward overly high concentration in cyclohexane.

For the next set of corrections, we used Schrödinger’s

Ligprep tool [44] to enumerate tautomers and calculate a state penalty or relative energy of each tautomer in an aqueous buffer at pH 7.4. The state penalty was used to correct the concentration in the aqueous layer, according to Eq. 5. This correction (labeled \(\log D_{state\ penalty}\)) results show improvements from the original partition coefficient coefficients for tau (\(0.49 \pm 0.08\) to \(0.65 \pm 0.06\)) and R (\(0.6 \pm 0.1\) to \(0.77 \pm 0.06\)), but no significant change in RMSE or AUE. It should be noted that no attempt was made to estimate uncertainties in the newly calculated \(\log D_{pK_a}\) or \(\log D_{state\ penalty}\) although both of these corrections would certainty have introduced uncertainty into the estimate. Future exploration in tautomer estimation will need to account for the uncertainty in those calculations.

Both of these correction methods only adjust the concentration in the aqueous layer; however, there may be tautomer affects that would change the concentration in cyclohexane as well. Outliers and molecules with particularly significant changes in \(\log D\) were indicated by number in Fig. 7. SAMPL_050 for example, had an initial \(\log P\) value of \(1.20 \pm 0.04\) which was decreased significantly to \(-9.78\) with the pKa correction and \(-10.70\) with the state penalty correction compared to the experimental value \(-3.2 \pm 0.6\). SAMPL_060 and SAMPL_063 also changed by more than 3 log units due to these corrections. These state penalties allow us to account for other tautomers and protonation states only in the aqueous phase. Without tautomer enumeration in cyclohexane, we have to assume that the tautomer used for solvation free energy calculations, prior to correction, is the dominant state of the solute in the cyclohexane phase. If an alternate tautomer were relevant in water and cyclohexane, we could obtain dramatically incorrect values with this approach, since our state penalty will only recover alternate tautomer(s) in water, but not cyclohexane. This appears to be one of the reasons why these corrections seem to overshoot in the cases listed above. A better solution would compute state penalties in both water and cyclohexane, but we do not currently have an adequate approach for doing so.

Even after correcting for tautomer enumeration in the aqueous phase, there is a slight bias for cyclohexane in the calculated distribution coefficients (Table 4; Fig. 7). There are a variety of factors that could be contributing to this slight bias. A recent study by the Mobley group calculated partition coefficients for small molecules and found a slight bias for alcohol compounds to over favor cyclohexane [23]. Possibly, this demonstrates a limitation in GAFF or atomistic force fields in general to accurately predict the behavior of solutes in polar and non-polar environments. That same study found that for large, flexible compounds insufficient conformational sampling can dramatically affect the calculated partition coefficients [23]. Given the number of large, flexible, and polyfunctional compounds in SAMPL5, more investigation into each of these effects along with improved tautomer enumeration and handling (especially for the cyclohexane phase) will be required to completely understand the slight bias seen here.

Examining individual molecules

With only 53 molecules, it is difficult to find any statistically significant trends in terms of functional groups which are well- or poorly-predicted in general; compared to past SAMPL challenges, this set of molecules is much more complex. They are on average larger, more flexible, and contain multiple functional groups per compound. For each molecule, we organized a data set of predicted distribution coefficients and compared them to the experimental values, calculating the average unsigned error for each (Table 1). There are only three molecules with an AUE less than 2.0 log units (SAMPL5_ 003, 045, and 059). While these three molecules are relatively small, there were no trends in AUE and molecular weight, as there was in SAMPL4 hydration free energy results [1]. We tried grouping molecules by functional group, molecular mass, and estimated number of tautomers to see if size, presence or absence of particular functional groups, or number of tautomers played a role in how difficult each compound was in general. The only trend found in this process was that all five carboxylic acids (SAMPL5_ 010, 011, 015, 026, and 060) are in the worst ten molecules by AUE and RMSE. This could be due to poor treatment of the effects of protonation state changes. Among the bottom compounds, perhaps unsurprisingly, was SAMPL5_083 which is a large macrocycle, and SAMPL5_050; both have many neutral tautomeric forms. Most submissions had significant errors in predicting SAMPL5_074, despite the fact that it is relatively small, rigid, and has no other significant tautomers. Below we will explore why some of these molecules may have had distribution coefficients which were particularly difficult to predict.

The provided SMILES strings may not be the most populated tautomeric form of the molecule

From our tautomer enumeration and discussions with other SAMPL5 participants [65] it became clear that accurately estimating \(\log D\) for molecules with many tautomers was difficult. For example, we compared population corrections we derived using Schrödinger’s LigPrep [44] with corrections calculated by Pickard et al. [52, 66] and found significant differences between them. If we could perfectly calculate solvation free energies and tautomer populations in both solvents, the starting tautomer should not affect the final calculated distribution coefficient, but differing corrections—such as these—will yield different results. Additionally, whenever protonation state/tautomer populations are not estimated correctly or not included in both solvents, the initial choice of protonation state/tautomer is likely to affect computed \(\log D\) values. Here, our initial solvation free energy calculations used provided SMILES strings without any consideration of other tautomers. To explore how this may have affected our \(\log D\) calculation, we decided to repeat a few solvation free energy calculations with alternate tautomers. We used SAMPL5_050 and SAMPL5_083 as examples since both have other neutral tautomers that could be present in both the water and cyclohexane solutions. Also, most participants failed to accurately predict the correct \(\log D\) for either solute. The new tautomer for SAMPL5_050 was found with LigPrep [44]; the one for SAMPL5_083 was found with COSMO-RS and was provided by Andreas Klamt [63, 65]. For both SAMPL5_050 and SAMPL5_083 there were significant changes in calculated solvation free energies and partition coefficients for the two different tautomers (Table 5). Distribution coefficients were calculated from the \(\log P\) and state penalties calculated with Schrödinger’s LigPrep tool [44]. In both cases the \(\log D\) is still significantly different from the experimental values. Since both the calculated solvation free energies and the tautomer/protomer populations are needed to estimate the distribution coefficient, it is impossible for us to know which calculation introduces more error into our estimates.

Table 5 Shown are the results for solvation free energy of two different tautomers of SAMPL5_ 050 and 083
Table 6 Shown are results for the solvation free energy and distribution coefficient for SAMPL5_074 in cyclohexane with no water present, 1 water molecule, and 7 water molecules all with 150 cyclohexane molecules and 1 solute molecule

Solvents are not completely immiscible

Though the concentration of water in cyclohexane is very small, 0.00047 mole fraction [47], it may still affect how a solute is distributed across the two solvents. This will be particularly important for solutes with many polar groups; and may be one reason it was difficult to accurate estimate the \(\log D\) for SAMPL5_074. We performed two new calculations of solvation free energy of SAMPL5_074 into cyclohexane with some water present in cyclohexane. These simulations were set-up with Solvation Toolkit as described above. Both sets of simulations had a single solute molecule in 150 cyclohexanes, but one had seven water molecules and the other had a single water molecule. While neither of these results reach the low experimental concentration of water in cyclohexane, they do demonstrate the dramatic variation in solvation free energy and \(\log D\) which can be caused by the presence of water in cyclohexane. With varying amounts of water in cyclohexane, the water dramatically impacts the computed solvation free energy for SAMPL5_074 into cyclohexane (Table 6) and thus calculated \(\log D\) values. In this case particular case, the presence of water also dramatically improves the estimation for \(\log D\), though as noted this is with far too high a water concentration in cyclohexane. We visualized trajectories from the production phase of our calculations and find that all water molecules stay adjacent to SAMPL5_074 for the full simulation, likely indicating a particularly high affinity for water. Addition of one or a few molecules of a second solvent (water) to cyclohexane might be expected to raise the uncertainty in calculated free energies significantly due to slow mixing of the second solvent in the simulation box, but we do not observe that here. It seems likely this is because the water molecules are so strongly attracted to the solute that they essentially stay bound throughout the simulations, so we do not observe slow mixing and the associated increased uncertainty. For a solute with so many polar functional groups, it is perhaps unsurprising that the water molecules in cyclohexane are drawn to the solute. In general, this suggests that the local concentration of water near highly polar solutes may be much higher than the bulk concentration in cyclohexane, and this may potentially be important when considering simulation settings. It is important to note that the concentration of water in cyclohexane for these simulations are 0.0067 and 0.047 for 1 and 7 water molecules, both a gross overestimate of the amount of water in cyclohexane (by multiple orders of magnitude) as our focus here was to explore the sensitivity of \(\log D\) to cyclohexane water content. However, our results are sufficient to show that the presence of water in the cyclohexane phase can dramatically affect the computed distribution coefficient, depending on the affinity of the solute for water. To accurately account for these effects, we will need long simulations at much larger box sizes to not only match the experimental water concentration, but to determine how much water will localize around the solute at equilibrium. Given the extent to which water stays localized near the solute in these simulations, substantially longer simulations may also be needed to converge the relative populations of water in bulk cyclohexane versus near the solute.

Other buffer/solution components may affect distribution coefficients

The two phases for the distribution coefficients were cyclohexane and an aqueous buffer, however, DMSO and acetonitrile were used in the experiments [8] as well. While DMSO and acetonitrile were at very low concentrations, their presence in either solvent layer may affect how a solute distributes between phases. We created topology and coordinate files for a system with 780 water, 130 cyclohexane, 4 DMSO, and 2 acetonitrile molecules using SolvationToolkit, roughly matching the experimental concentrations. In the trajectory visualized with VMD, both DMSO and acetonitrile spend most of their time near the solvent interface, with very little movement into the bulk of the water or cyclohexane. In our view, this does not immediately suggest profound implications for calculated \(\log D\) values, but a detailed understanding of the implications of this for both calculated and measured distribution coefficients will require further study.

Conclusions

Past SAMPL challenges often involved a broad range of methods for hydration free energies. Here, in our first SAMPL on cyclohexane/water distribution coefficients, we saw a similarly diverse set of methods. Predicting \(\log D\) accurately for this set of molecules was rather difficult, exhibiting a number of the same challenges that will face accurate prediction of binding affinities. The best methods showed reasonable agreement with experiment with RMSEs around 2.5 log units and Kendall tau’s around 0.6. However, considering that the null model and \(Xlog \, P_{corr}\) both would have topped the submissions list by RMSE and AUE, there is clear room for all of the methods employed in SAMPL5 to improve.

This SAMPL5 set was substantially more complex, flexible, and polyfunctional than typical molecules in SAMPL hydration challenge sets. Additionally, the most relevant protonation state and tautomer were not always clear for the compounds. Some compounds likely had multiple relevant protonation states and tautomers, and shifts in protonation/tautomeric state on transferring between phases. These issues are especially important given that the challenge focused on distribution coefficients rather than partition coefficients (\(\log P\)). Given these complexities, it is not surprising that we saw drop in performance relative to the accuracy that would have been expected for \(\log D\) values if participants predicted \(\log D\) based on solvation free energies in two solvents that could be computed as accurately as hydration free energies in previous SAMPL challenge (yielding an expected accuracy of about 1.5 log units). In large part, this is probably because of the additional complexities of distribution coefficients such as the need to account for other protonation states and tautomers. Our solvation free energy calculations performed with a less dominant tautomer of SAMPL5_050 lead to a \(\log D\) estimate of \(-10.70 \pm 0.04\) which is 7.5 log units below the experimental value. Thus, accurately accounting for tautomers appears to be a vital part of accurately calculating \(\log D\), and better methods for treating protonation and especially tautomeric states in non-aqueous environments are needed.

As mentioned earlier, the molecules chosen for SAMPL5 are generally larger and more flexible than those in past SAMPL challenges. It follows that conformational sampling of the solutes might play a significant role in accurately predicting these distribution coefficients. In a previous study using the same protocol as the reference calculation here, the Mobley lab showed that changing the initial conformation of a large, flexible molecule dramatically changed its calculated partition coefficient due to insufficient conformational sampling [23]. In this issue, Tyler Luchko et al. spend time addressing the importance of conformational sampling for these molecules in the context of their submission results for the challenge [59].

We asked participants to estimate two forms of uncertainty, statistical uncertainty and model uncertainty, the latter of which should predict how well their calculation will agree with experiment. This latter uncertainty estimate is particularly key, as it would allow practitioners to predict how reliable their calculations are likely to be in applications. Here, we find that almost every participant dramatically underestimated their model uncertainty. The importance for the community to improve error estimation has been addressed in past SAMPL challenges [1], but clearly, much more work is still needed.

A common trend in SAMPL5 submissions was that the dynamic range in the predicted distribution coefficients was larger than the dynamic range observed experimentally—that is, generally, the smallest \(\log D\) values were underestimated and the largest were overestimated. This issue of dynamic range is visible in Figs. 2, 5, and 7, but was evident in comparison plots for almost all submissions. For SAMPL5_074, the introduction of water to cyclohexane increased the \(\log D\) estimate from −3.76 to −1.73; it is possible that accounting for water in cyclohexane would affect the apparent underestimation at the lower end of the \(\log D\) scale. As discussed above, it is also possible that insufficient conformational sampling could account for some of this problem. For example, if a solute has a particular conformation that is more stable in a non-polar environment and that conformation was not found for the solute in cyclohexane due to insufficient conformational sampling, then the calculated distribution coefficient would over favor the aqueous environment. However, this will require further investigation. It is of course also possible that experimental issues could have compressed the measured dynamic range of compounds, but this too will require further investigation.

The results of this challenge strongly suggest predictions for solute partitioning will be extremely helpful for driving improvements to physical modeling needed in pharmaceutical research. The major challenges encountered here are all very likely to occur when attempting to predict binding affinities or other biomolecular properties of interest to drug discovery. Specifically, accurately predicting the population of protonation and tautomeric states was a challenge, complicated by the fact that there is no simple way to predicted protonation states and tautomers to corresponding experimental values relevant to the conditions studied here. Most work so far on tautomer prediction has focused on tautomeric ratios in vacuum or in water, but tautomer populations are likely environment-dependent in ways that can dramatically affect computed physical properties. These same challenges are likely to apply when predicting biomolecular binding. An improved treatment of these effects within the context of SAMPL or similar challenges will drive advances in computational techniques also used to predict binding and related properties such as solubility.

Overall, distribution coefficients have been an extremely valuable part of this year’s SAMPL5 challenge, and, since they can be measured in a relatively straightforward way, seem to be a promising potential source of future data for blind challenges. Additionally, this data highlights important issues, such as tautomer enumeration, that need better treatment in many of our models. The ability to create new, completely blind data sets make distribution coefficients a great option for future challenges.

Supporting information

Supporting information is available free of charge online through the University of California Dash at http://n2t.net/ark:/b7280/d1988w. It includes all of the files provided to SAMPL5 participants. That includes a table of SAMPL5_ID numbers paired with SMILES, MOL2 files for each molecule, and the GROMACS, AMBER, DESMOND, and LAMMPS topology and coordinate files for each solute in water and cyclohexane. A separate directory is included with analysis associated with our reference calculations and post sample method development. This includes all python scripts, input files, output files, and results files required to repeat our simulations and calculations done with Schrödinger tools. We also provide the data files for all submission, the scripts used for error analysis, plots for all submissions, and data for only batch 0 and batches 0 and 1. This is supported with detailed README files explaining the structure of the directories.