Introduction

Ligand efficiency (LE) has rapidly become accepted as a key tool to help gauge the potential value of fragments and early hits [1–3]. It is usually defined as the negative of the binding free energy per non-hydrogen atom:

$$ \begin{aligned} LE & = - \Delta G/HAC \\ & = - RT\ln (K_{d} )/HAC \\ & \approx - RT\ln (IC_{50} )/HAC \end{aligned} \tag{1} $$

Here ∆G is the free energy change associated with the ligand binding to the protein, with associated dissociation constant Kd (or half maximal inhibitory concentration, IC50), and HAC is the number of “heavy” (non-hydrogen) atoms in the molecule. LE allows us to estimate how potent a fragment hit needs to be in order to be optimisable to a druglike lead, assuming that we can maintain LE during the optimisation. Typically this druglike endpoint is taken to be a molecule that has 36 heavy atoms (corresponding to Lipinski’s maximum molecular weight of 500) [4] with a potency of 10 nM. This profile leads to the commonly used threshold value for LE of 0.3 kcal mol−1 per heavy atom.
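As a minimal sketch of Eq. 1, the Python below computes LE from an IC50, which stands in for Kd as in the last line of Eq. 1. The gas constant in kcal mol−1 K−1 and a temperature of 298.15 K are our assumptions (no temperature is specified here), and the function name is illustrative. Applied to the druglike endpoint of 36 heavy atoms and 10 nM, it reproduces the 0.3 threshold.

```python
import math

# Assumed constants: molar gas constant in kcal mol^-1 K^-1 and T = 298.15 K.
R_KCAL = 1.987204e-3
T_KELVIN = 298.15
RT = R_KCAL * T_KELVIN  # ~0.5925 kcal/mol


def ligand_efficiency(ic50_molar: float, hac: int) -> float:
    """LE = -RT ln(IC50) / HAC in kcal/mol per heavy atom (Eq. 1, IC50 form)."""
    delta_g = RT * math.log(ic50_molar)  # approximate binding free energy, negative
    return -delta_g / hac


# Druglike endpoint: 36 heavy atoms, IC50 = 10 nM
print(round(ligand_efficiency(1e-8, 36), 3))  # 0.303
```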

The assumption that it will be possible to maintain LE during optimization has been challenged by some authors [5–7]. The authors of these publications have suggested varied (and more complex) schemes for normalizing potencies according to molecular size. Alternative ligand efficiency indices have also been proposed, where the binding affinity is normalized by molecular weight or polar surface area [8]. However, the simplicity of LE as defined in Eq. 1 makes it an intuitive quantity for medicinal chemists to use, and in practice we and others have found that with careful optimization it is often possible to maintain LE from fragment to lead [9].

Recently much attention has focussed on lipophilicity as a key physical property that must be controlled in a candidate small molecule. Compounds that are more lipophilic are associated with increased rates of attrition in the clinic [10], often have poor absorption properties and solubility [11] and can be expected to be more promiscuous [12].

So how lipophilic is too lipophilic? Leeson and Springthorpe have examined this problem for candidates and other late-stage molecules using a ligand lipophilicity efficiency (LLE) index [12]. Their LLE value is defined as follows:

$$ LLE = pIC_{50} - \log\,P \tag{2} $$

Here pIC50 = −log10(IC50), and P is the partition coefficient for the neutral compound between octanol and water. The value used for log P could be measured or calculated; in this paper we will mostly discuss calculated values (ClogP). The index presented in Eq. 2 has been used by other workers, and named “Lipophilic Efficiency” (LipE) [13]. Based on an analysis of the properties of oral drugs (average ClogP ~ 2.5, pIC50 > 7.5), Leeson and Springthorpe suggested that LLE ≥ 5 is a suitable target value.
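Eq. 2 is simple enough to express directly. This sketch uses illustrative helper names and takes IC50 in mol/L; it converts IC50 to pIC50 and subtracts log P, using the oral-drug profile quoted above.

```python
import math


def pic50(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50), with IC50 in mol/L."""
    return -math.log10(ic50_molar)


def lle(ic50_molar: float, log_p: float) -> float:
    """Leeson-Springthorpe LLE = pIC50 - log P (Eq. 2); dimensionless."""
    return pic50(ic50_molar) - log_p


# Oral-drug profile from the text: pIC50 ~7.5, ClogP ~2.5
print(round(lle(10 ** -7.5, 2.5), 2))  # 5.0
```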

More recently Keseru and Makara [14] defined the ligand-efficiency-dependent lipophilicity (LELP) index:

$$ LELP = \frac{\log\,P}{LE} \tag{3} $$

They suggest that promising compounds would have log P values between −3 and 3, with LE > 0.3, giving LELP values that would lie between −10 and 10. However, it is not always obvious how to interpret LELP values. For example, a compound with log P = 1.0 and LE = 0.1 would have an acceptable LELP value of 10, even though its LE is far below the 0.3 threshold.
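The LELP caveat can be checked numerically. This sketch assumes the suggested range is inclusive at ±10, and the helper names are illustrative.

```python
def lelp(log_p: float, le: float) -> float:
    """LELP = log P / LE (Eq. 3)."""
    return log_p / le


def lelp_in_suggested_range(value: float) -> bool:
    # Assumed inclusive bounds for the suggested range -10..10
    return -10.0 <= value <= 10.0


value = lelp(1.0, 0.1)
# LELP sits inside the range although LE = 0.1 fails the 0.3 threshold
print(value, lelp_in_suggested_range(value), 0.1 >= 0.3)
```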

We feel that the LLE index of Leeson and Springthorpe is a very useful metric for assessing candidates, leads or other drug-sized molecules, and we wanted to extend the idea to typical fragment screening hits. In our experience it would be rare to discover a small fragment hit with LLE ≥ 5; with a logP of 1, such a compound would need to have a pIC50 of 6. For a fragment with 12 heavy atoms, this would correspond to an LE of 0.68 kcal mol−1 per non-hydrogen atom, and while fragments at this level of efficiency are not unprecedented, they are certainly unusual. Here we have used a similar philosophy to define a new LLE index that is applicable to fragments, and can be used to compare molecules of different sizes. In this sense it is closer to conventional ligand efficiency. To avoid confusion between our new LLE index and that defined by Leeson and Springthorpe, we will denote ours LLEAT. LE and LLEAT are expressed throughout this paper in units of kcal mol−1 per non-hydrogen atom. The LLE index of Leeson and Springthorpe is by definition dimensionless.
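The arithmetic in the previous paragraph can be reproduced as follows, assuming T = 298.15 K. Here `le_from_pic50` is a hypothetical helper that rewrites Eq. 1 in terms of pIC50 using ln(10)·RT ≈ 1.364 kcal mol−1.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT  # ~1.364 kcal/mol per log unit of potency


def le_from_pic50(pic50: float, hac: int) -> float:
    """LE (kcal/mol per heavy atom) = ln(10)*RT*pIC50/HAC."""
    return LN10_RT * pic50 / hac


log_p = 1.0
required_pic50 = 5.0 + log_p  # LLE >= 5 means pIC50 >= 5 + log P
print(required_pic50, round(le_from_pic50(required_pic50, 12), 2))  # 6.0 0.68
```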

Definition of LLEAT

Figure 1 shows a schematic representation of a ligand binding to a protein. Conventional LE is defined in terms of ∆G, the free energy change associated with ligand binding, which we typically approximate using an IC50, as shown in Eq. 1. A (usually favourable) component of this binding affinity comes from transferring the ligand from an aqueous environment to the more hydrophobic environment of the protein. In Fig. 1 we have represented this as the transfer of the ligand from an aqueous environment to a non-specific hydrophobic environment, with an associated free energy change \( \Delta G_{lipo} \). To calculate LLEAT, we replace the ∆G in Eq. 1 with a modified free energy change \( \Delta G^{*} \), from which this non-specific lipophilic component has been removed. We can approximate \( \Delta G_{lipo} \) as \( -RT\ln (P) \), where P is the partition coefficient of the compound between a hydrophobic (octanol) and an aqueous environment; log P is easily estimated computationally (ClogP). This leads to Eq. 4:

$$ \begin{aligned} \Delta G^{*} & = \Delta G - \Delta G_{lipo} \\ & \approx RT\ln (IC_{50} ) + RT\ln (P) \\ & \approx \ln (10) \cdot RT\left( {\log\,P - pIC_{50} } \right) \end{aligned} \tag{4} $$

We note that, as defined in Eq. 4, \( \Delta G^{*} = -\ln (10) \cdot RT \cdot LLE \), so it is directly proportional to the LLE index of Leeson and Springthorpe. To define a threshold value for LLEAT, we first need to define a target compound profile. The targets for size and potency were chosen to be the same as those for conventional LE: 36 heavy atoms, with a potency of 10 nM. We selected a target logP of 3, which is approximately consistent both with the profile of known oral drugs (average ClogP ~ 2.5) and with the work of Leeson and Springthorpe. For this target profile, \( -\Delta G^{*}/HAC \) is around 0.19, which would therefore be the minimum acceptable value of that quantity. In order to make comparisons with conventional LE easier, we chose to give LLEAT the same target value as LE, 0.3. To this end, a constant term of 0.11 is added, giving our final definition in Eq. 5.

$$ LLE_{AT} = 0.11 - \Delta G^{*} /HAC \tag{5} $$
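A sketch of Eqs. 4 and 5 in Python, again assuming T = 298.15 K and approximating ∆G by RT ln(IC50); the function names are illustrative. Evaluated at the target profile, it shows the value of about 0.19 before the constant is added and 0.3 afterwards.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT


def delta_g_star(ic50_molar: float, log_p: float) -> float:
    """Eq. 4: dG* ~ ln(10)*RT*(log P - pIC50), in kcal/mol."""
    pic50 = -math.log10(ic50_molar)
    return LN10_RT * (log_p - pic50)


def lle_at(ic50_molar: float, log_p: float, hac: int) -> float:
    """Eq. 5: LLE_AT = 0.11 - dG*/HAC, in kcal/mol per heavy atom."""
    return 0.11 - delta_g_star(ic50_molar, log_p) / hac


# Target profile: 36 heavy atoms, IC50 = 10 nM, log P = 3
print(round(-delta_g_star(1e-8, 3.0) / 36, 2))  # 0.19, before the constant
print(round(lle_at(1e-8, 3.0, 36), 2))          # 0.3, with the constant
```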
Fig. 1

The binding process can be considered in two stages: first, transfer from aqueous solution to a hydrophobic environment (with free energy change \( \Delta G_{lipo} \)), and second, the specific binding of the ligand to the protein (\( \Delta G^{*} \)). \( \Delta G_{lipo} \) will be more favourable for more lipophilic compounds, so we would expect them to bind more strongly to an arbitrary protein (when \( \Delta G^{*} \) is approximately random).

Hit selection and optimization using LLEAT

LLEAT allows us to determine not only a suitable potency for a fragment to be of interest, but also an appropriate log P value. Table 1 represents a hypothetical progression from fragment to candidate, maintaining constant LE and LLEAT of 0.3. Also included in the table are values for LLE and LELP. Note that in contrast to LLEAT, these values increase as the optimisation progresses, making it harder to compare hits of different sizes.

Table 1 Hypothetical progression from fragment to candidate
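The idealised path described above can be generated for any set of sizes. In this sketch the sizes 12 to 36 are arbitrary choices for illustration (the rows are not reproduced from Table 1), T = 298.15 K is assumed, and each row is placed exactly at LE = LLEAT = 0.3. Requiring LE = 0.3 fixes pIC50, requiring −∆G*/HAC = 0.19 fixes LLE, and log P follows from their difference.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT
TARGET = 0.3   # common target for LE and LLE_AT
OFFSET = 0.11  # constant term in Eq. 5


def idealised_step(hac: int) -> dict:
    """Properties of a hypothetical compound sitting exactly at LE = LLE_AT = 0.3."""
    pic50 = TARGET * hac / LN10_RT            # from LE = ln(10)*RT*pIC50/HAC
    lle = (TARGET - OFFSET) * hac / LN10_RT   # from -dG*/HAC = 0.19 and Eq. 4
    log_p = pic50 - lle                       # Eq. 2 rearranged
    return {"HAC": hac, "pIC50": pic50, "logP": log_p,
            "LLE": lle, "LELP": log_p / TARGET}


for hac in (12, 18, 24, 30, 36):  # illustrative sizes only
    row = idealised_step(hac)
    print({k: round(v, 2) for k, v in row.items()})
```

Under these assumptions pIC50, log P, LLE and LELP all grow linearly with HAC, while LE and LLEAT stay fixed at 0.3.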

In reality, of course, an optimisation is unlikely to follow such an idealised path. Typically, at an early stage a number of hits may be available, of varying potency and lipophilicity. Table 2 shows four similar hypothetical hits from a fragment screen, all containing 12 heavy atoms but differing in lipophilicity. Also shown are the IC50 values required for each compound to give it an LLEAT of 0.3. Strikingly, there is almost a 50-fold difference between the required potencies of the most and least lipophilic compounds. By contrast, using conventional LE we would calculate the same required IC50 (~2 mM) for all these fragments.

Table 2 Four hypothetical fragment screening hits, all with an LLEAT of 0.3
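The contrast between LE and LLEAT for same-sized hits can be sketched by rearranging Eqs. 1, 4 and 5 for the required IC50. The log P values below are hypothetical (they are not the Table 2 compounds), T = 298.15 K is assumed, and the function names are illustrative.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT


def required_ic50_lle_at(log_p: float, hac: int, target: float = 0.3) -> float:
    """IC50 (M) giving LLE_AT == target; Eqs. 4 and 5 rearranged for pIC50."""
    required_pic50 = log_p + (target - 0.11) * hac / LN10_RT
    return 10 ** -required_pic50


def required_ic50_le(hac: int, target: float = 0.3) -> float:
    """IC50 (M) giving LE == target (Eq. 1); independent of log P."""
    return 10 ** -(target * hac / LN10_RT)


print(f"LE only: {required_ic50_le(12) * 1e3:.1f} mM")
for log_p in (0.0, 0.5, 1.0, 1.7):  # hypothetical log P values
    print(f"log P {log_p}: {required_ic50_lle_at(log_p, 12) * 1e3:.3f} mM")
```

Each unit of log P raises the required potency tenfold, so a spread of about 1.7 log units corresponds to a roughly 50-fold difference, whereas the LE-based requirement (about 2.3 mM at 12 heavy atoms) does not depend on log P.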

It is also possible to use LLEAT to assess how successful a modification to a compound has been, analogous to the group efficiency (GE) concept [15]. If we have an existing molecule “A”, and add a functional group to it to form a new molecule “B”, then GE represents the binding efficiency of the added functional group:

$$ GE = - \Delta \Delta G/\Delta HAC \tag{6} $$

where \( \Delta \Delta G = \Delta G_{B} - \Delta G_{A} \) (i.e. ∆∆G is the difference in binding free energy between molecules B and A) and \( \Delta HAC = HAC_{B} - HAC_{A} \). If molecule A has an LE of 0.3, and we wish to maintain this efficiency in molecule B, then the added functional group will need to have a GE of at least 0.3.
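Eq. 6 can be sketched as below, with ∆G again approximated by RT ln(IC50) at an assumed 298.15 K; the function names and the example IC50 of 100 µM for molecule A are hypothetical. The second helper inverts Eq. 6 to give the potency gain needed for a given number of added heavy atoms at a chosen GE.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K


def group_efficiency(ic50_a: float, ic50_b: float, hac_a: int, hac_b: int) -> float:
    """GE = -ddG / dHAC (Eq. 6), with each dG approximated as RT ln(IC50)."""
    ddg = RT * math.log(ic50_b) - RT * math.log(ic50_a)
    return -ddg / (hac_b - hac_a)


def fold_needed_for_ge(delta_hac: int, ge: float = 0.3) -> float:
    """Potency gain IC50_A / IC50_B needed for the added atoms to reach the given GE."""
    return math.exp(ge * delta_hac / RT)


fold = fold_needed_for_ge(6)  # six added heavy atoms
print(round(fold, 1))  # roughly 20-fold
print(round(group_efficiency(1e-4, 1e-4 / fold, 12, 18), 2))  # 0.3
```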

A similar analysis can be performed for LLEAT:

$$ GLE_{AT} = - \Delta \Delta G^{*} /\Delta HAC \tag{7} $$

where \( \Delta \Delta G^{*} = \Delta G^{*}_{B} - \Delta G^{*}_{A} \), and ∆G* is defined as in Eq. 4. We can rearrange Eq. 7 to calculate the improvement in potency required to maintain an LLEAT of 0.3. Because Eq. 5 includes the constant term 0.11, the GLEAT required to maintain an LLEAT of 0.3 is approximately 0.3 − 0.11 = 0.19. Combining Eqs. 4 and 7, our required ∆∆G is therefore:

$$ \Delta \Delta G = - 0.19 \cdot \Delta HAC - RT\ln (10) \cdot \Delta C\log\,P \tag{8} $$

where \( \Delta C\log P = C\log P_{B} - C\log P_{A} \). Figure 2 shows the fold improvement in potency necessary to maintain an LLEAT of 0.3 for a range of common substituents on an aromatic ring, calculated using Eq. 8. The list of substituents was constructed from an internal database of over 5 million commercially available screening compounds [16], by looking for pairs of molecules that differed only by a terminal group attached to an aromatic carbon. In order to calculate the required ∆∆G, we need to know not only ∆HAC but also ∆ClogP for each substituent. While ∆HAC is a constant for a given substituent, the precise value of ∆ClogP for a particular pair of molecules will depend on the local environment of the functional group addition. Here, average values of ∆ClogP have been taken over all the pairs of molecules in which the substitution was observed, so the values shown should be taken as indications rather than precise numbers. Nevertheless, it is instructive to compare, for example, the piperazinyl, 4-pyridyl and phenyl substituents. A conventional GE analysis (using Eq. 6) would argue that, as 6 heavy atoms are being added in each case, the improvement in potency needed to justify each change is approximately 20-fold. However, a GLEAT-based analysis comes to a markedly different conclusion, requiring a 460-fold improvement for the phenyl group, a 30-fold improvement for the pyridyl group, and only a threefold improvement for the piperazinyl group. This is perhaps a slightly contrived example, as only the most tolerant of binding sites would allow the chemist a completely free choice between these three substituents, but it does clearly illustrate the difference between an LE-driven analysis and one that incorporates LLEAT.

Fig. 2

Fold improvements in potency required to maintain an LLEAT of 0.3 for some common substituents, when attached to an aromatic carbon
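Eq. 8 translates into a required fold improvement via ∆∆G = RT ln(IC50,B/IC50,A). The sketch below assumes T = 298.15 K and uses hypothetical ∆ClogP values, since the averaged values behind Fig. 2 are not listed here; the function names are illustrative.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT


def required_ddg(delta_hac: int, delta_clogp: float, gle_at: float = 0.19) -> float:
    """Eq. 8: ddG = -GLE_AT * dHAC - ln(10)*RT*dClogP, in kcal/mol."""
    return -gle_at * delta_hac - LN10_RT * delta_clogp


def fold_improvement(delta_hac: int, delta_clogp: float) -> float:
    """Required IC50_A / IC50_B to keep LLE_AT at 0.3, since ddG = RT ln(IC50_B / IC50_A)."""
    return math.exp(-required_ddg(delta_hac, delta_clogp) / RT)


# Six added heavy atoms; these dClogP values are hypothetical placeholders
for d_clogp in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(d_clogp, round(fold_improvement(6, d_clogp), 1))
```

Because ln(10)·RT converts one log unit into free energy, each additional unit of ∆ClogP multiplies the required improvement by 10 on top of the roughly sevenfold size term for six heavy atoms, which is why the phenyl, pyridyl and piperazinyl requirements diverge so sharply.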

Deficiencies of LLEAT

As with any such metric, it is straightforward to point out potential shortcomings of LLEAT. It might be argued that logD should be used instead of logP when calculating LLEAT. While it is certainly arguable that logD is at least as relevant as logP, it is a harder property to estimate computationally, due to the additional complications of predicting pKa values. This would effectively limit use of the index to compounds with available experimental logD data. Eq. 5 would also need to be reparameterised using an appropriate target value for log D.

Another possible criticism of LLEAT is that it will allow small, potent compounds to be surprisingly lipophilic. This occurs because its target end point is a molecule with 36 heavy atoms. For example, a compound with 24 heavy atoms and an IC50 of 10 nM would be allowed to have a log P of up to 4.7. There are two main counterarguments to this criticism. Firstly, one could argue that at only 24 heavy atoms the compound in question is small enough that further functionality (a solubilising group, perhaps) could be added to reduce its lipophilicity, and the index is implicitly taking this into account. Secondly, we would suggest that, in general, efficiency indices that are scaled by molecular size become less useful as the compound approaches the target potency. We would argue that the primary purpose of such an efficiency index is to act as a guide during the early stages of the discovery process (analogous to a pacemaker in a long-distance running race), when it is not clear how to find the right balance between potency, size and lipophilicity. Once the target potency has been achieved, one can and should focus on simple properties of the molecule such as logP and molecular weight, and on unscaled metrics such as LLE. In this example, either the logP (4.7) or the LLE (3.3) could be used to show that the compound is more lipophilic than desired.
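The log P ceiling of 4.7 quoted above follows from rearranging Eqs. 4 and 5 for log P. This sketch assumes T = 298.15 K, and `max_log_p` is a hypothetical helper name.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol, assuming T = 298.15 K
LN10_RT = math.log(10) * RT


def max_log_p(ic50_molar: float, hac: int, target: float = 0.3) -> float:
    """Largest log P for which LLE_AT >= target (Eqs. 4 and 5 rearranged)."""
    pic50 = -math.log10(ic50_molar)
    min_lle = (target - 0.11) * hac / LN10_RT  # LLE needed so that -dG*/HAC = 0.19
    return pic50 - min_lle


limit = max_log_p(1e-8, 24)
print(round(limit, 1), round(8.0 - limit, 1))  # 4.7 3.3 (log P limit, LLE at that limit)
```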

Lipophilicity is of course not the only property that we need to monitor. In some situations the main challenge will be compounds that are too polar, or that have too many hydrogen bond donors or rotatable bonds. However, excessive lipophilicity is a sufficiently common and potentially serious problem that we feel it warrants consideration from the earliest stages of a discovery program.

Summary and conclusions

We have derived a new LLE index, LLEAT, which is a useful metric to help medicinal chemists when selecting and optimising fragment and other small, early screening hits. It is conceptually simple, and has been designed to have the same target value (0.3) and a similar dynamic range to LE. As such it is straightforward for medicinal chemists to interpret. LLEAT is best used in conjunction with LE; in our experience fragments with both LE and LLEAT ≥ 0.3 are more likely to be optimisable into potent molecules with physicochemical properties appropriate for an oral drug. Focussing on LLEAT as well as LE during the optimisation process will help to avoid improvements in potency that are largely driven by increases in lipophilicity, thus delivering lead molecules with improved chances of being a successful clinical candidate. Finally we would observe that LLEAT is ultimately an empirical metric, and as such should be used to complement, and not replace, the knowledge and intuition of an experienced medicinal chemist.