Introduction

The situation

Binding constants are key parameters for optimizing lead compounds into drugs. Optimizing a substance for efficacy and selectivity relies on the appropriate measurement of these parameters. Binding constants are also the basis for structure–activity relationships (SAR) and similar computer models. These models attempt to predict binding constants for novel compounds. However, without reliable input data, in silico models cannot yield good predictions.

Data sets extracted from public databases also critically depend on the initial data quality [13]. The data sets can be curated to some extent by removing implausible data records (e.g. transcription errors, grossly deviating replicates). In order to allow maximum gain from ligand binding databases, the respective protocol information behind the data must be preserved and associated with the data [2]. This is the basis for understanding the source of deviating results.

The binding data in publicly accessible databases for molecules evaluated by different assays typically show a high uncertainty. Unsatisfactory precision is the main source, but a lack of accuracy may also play a role. It is not unlikely that different assays evaluate different binding events; even different binding sites may be addressed [2]. When the same binding site is considered, different (or missing) additional binding partners (e.g. solvent molecules or small ions) or a diverse chemical environment (e.g. ionic strength) often make a striking difference. Thus, database entries with binding data from different sources must be interpreted carefully. They often only provide information about the order of magnitude of the studied effect. This may still serve as a first orientation for planning further experiments.

Hence, success in drug research is strongly tied to the accurate and precise measurement of binding constants. This is particularly true for interactomics and systems biology research, about which the first success stories are being told [4, 5]. These ambitious approaches require reliable data to achieve meaningful conclusions.

However, the accuracy and precision of these parameters are more limited than desirable. For instance, binding constants from different laboratories showed an average deviation of approximately 0.5 log units, which translates into a factor of about 3 by which the measurements differ [13]. Surprisingly, this is just the average. The cut-off beyond which two values were no longer kept in one data set was 2.5 log units [2], which corresponds to a factor of more than 300 by which the measurements differ (Fig. 1)!
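For orientation, the conversion between log units and fold differences is a one-line calculation (base-10 logarithms, as used for pKi values):

$$10^{0.5} \approx 3.2, \qquad 10^{2.5} \approx 316$$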

Fig. 1

Investigation of the uncertainty of experimental heterogeneous public Ki data. Experimental Ki data from 7667 independent measurements were compared for 2540 protein–ligand systems in an analysis of the ChEMBL database. This plot shows all pKi values that were measured for all investigated protein–ligand pairs. A threshold of 2.5 pKi units, depicted as the upper and lower line, was set as a maximum limit for measurements in one data set. Variations of this size call for improvement (reprinted from Kramer et al. [2] with permission)

In contrast, in pharmaceutical quality control (QC) the aim is typically to keep the within-assay experimental error (repeatability [6]) below 1 % RSD%. In this context, assay is understood as the quantitation of the active pharmaceutical ingredient. An experimental error of 2 % is often still acceptable, but is already considered unusual. In binding experiments, by contrast, a deviation of 30,000 % is considered the limit (e.g. for pKi values [2])! This comparison is certainly unfair, because the aforementioned 1–2 % experimental error refers to validated methods within one lab for far less complex assays. However, this sharp contrast highlights that there is obviously room for improvement in the quality of binding constant data.

The requirements

What analytical performance is required to measure binding constants precisely and accurately? The variability of a model and of a subsequent prediction can never be better than the variability of the underlying analytical method. When the data quality required for a correct prediction is known, the necessary analytical performance can be derived (see Fig. 2). For instance, the measurement of ∆G of a binding event, and the subsequent estimation of ∆S in particular, requires especially high data quality [7].

Fig. 2

Example of the variability of an analytical method over time in one laboratory, e.g. caused by changes in assay conditions and solution handling. In this experiment, IC50 values (the biochemical half-maximal inhibitory concentration) were measured for rolipram on PDE4D and for cilostamide on PDE3 in the same laboratory. The standard deviations of the log IC50 values are σ = 0.22 for rolipram/PDE4D and σ = 0.17 for cilostamide/PDE3 (reprinted with permission from Kalliokoski et al. [3])

The analytical performance, such as accuracy, precision, selectivity and speed, not only matters to the analysts; it is equally important for everyone working downstream with the produced data sets. Understanding assay performance makes it possible to ask the right questions about data quality. It also helps in choosing the appropriate experimental design and in picking the right future collaboration partners. Thus, for all assays of interest, the fundamentals and functional principles are outlined and the available information about their performance is compiled in the following.

The assays

Choice of the right assay

How can we achieve the analytical performance needed to reliably measure binding constants? Analytical accuracy and precision should be prerequisites. It is important to choose an assay which can perform well in the expected affinity range. Previous experiments or prior experience with a compound class may guide this decision. If no previous knowledge is available, assays with a wide affinity range are preferable. Slow binding kinetics may also influence the choice of the particular assay. The availability and the consumption of expensive sample material should be taken into account. Speed is certainly important for efficient research, but also for dealing with samples of limited stability.

Binding assays should reflect the binding properties within a relevant biological context. The proper choice of the solvent is one of the first critical steps here. For example, the use of DMSO may change the binding situation as compared to an aqueous solution. Dissociation constants and hydrogen bond strengths are typically very different in water and DMSO, reflected e.g. by the very different acidity of substances in these liquids. The charge and charge distribution within a ligand can therefore depend very strongly on the solvent. Versatile assays, which allow for a wide range of solvents and cofactors, are therefore advantageous.

Cell-based assays provide a holistic view on the biological action of the studied compounds. First, transport processes, e.g. into the cell or into the nucleus, are taken into account. Second, membrane proteins such as GPCRs are embedded in intact cell membranes. Third, the response of the whole cell can be observed. Hence, cell-based assays are often considered much more relevant than biochemical assays on isolated receptors or enzymes. Yet, if the target of the studied compounds is not yet known, cell-based assays do not provide information about the exact target [8].

Despite the advantages of cell-based assays, biochemical assays are often preferred in cases where information about a particular target is sought. The holistic view may also be undesirable if binding to a particular protein and transport processes are to be studied separately.

Furthermore, biochemical assays are often preferred, even if they are less than ideal, because they are much simpler to set up, provided they are sufficiently relevant for the studied problem. The closer the chemical environment, such as solvent and ionic strength, is to the biologically relevant conditions, the more likely a biochemical assay will be relevant as well.

If cell-based assays need to or shall be employed, a number of challenges arise with respect to precision. Yet, simpler approaches can be used to single out the major sources of variability in order to optimize the whole process (see “Optimizing assays” section).

For both biochemical and cell-based assays, the expected variability of the analytical matrix needs to be taken into account. Depending on the matrix, an appropriate sample pretreatment is needed. For example, the pH, the viscosity and the salt concentration within the sample can influence the results in immunoassays. Sometimes it is sufficient just to dilute the sample in order to mitigate these effects; in other cases buffering or dialysis will be required. Often it will be necessary to develop new pretreatments or to adjust existing ones to the requirements of the matrix at hand. Furthermore, the reaction kinetics must be considered. For slow kinetics, the apparent binding constant will depend on the measurement time! In this case it is necessary to allow sufficient time for the equilibrium to be established.

In addition, the available reagent quality has to be carefully considered. Furthermore, assays often fail because their protocols are too vague. Sometimes the amount or even the kind of solvent is not sufficiently described, reagent quality and manufacturer are not given and dilution steps are not defined. Well-described protocols are an essential starting point, independent of the analytical technique.

In the following a number of assays are described. After an assay is selected for the intended purpose, its performance should be validated. If this validation is successful, meaningful binding constants can be expected. If the assay performance is insufficient, further optimization is required. Typical strategies for doing this will be described below (see “Optimizing assays” section).

Biochemical assays

Immunoassays

Immunoassays are often used for the quantitation of biomacromolecules. They are well suited for stronger binding constants (KD = 10−7–10−10 M), and they can be performed with very low sample amounts. In principle, a few molecules can be sufficient.

The enzyme-linked immunosorbent assay (ELISA) is a very widespread technique. Because the aspects of validation and data quality were not fully appreciated in the beginning, ELISA and immunoassays in general were regarded as semi-quantitative techniques for a long time. However, based on essential previous work [9–14] and official guidelines [15–17], immunoassay protocols have been developed which allow 1.5–4 % RSD% for concentration analyses within one measurement series under favorable conditions. In order to reach this precision, the parameters given in Table 1 need to be carefully considered and controlled [18]. Furthermore, suitable dilution sequences have to be designed to minimize the resulting error [13].

Table 1 Parameters to be considered in ELISA assay [9, 10, 19]
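For illustration, the sketch below fits a four-parameter logistic (4PL) model, which is commonly used for immunoassay calibration curves, to a hypothetical ELISA standard series and back-calculates an unknown sample. The concentrations, absorbance values and the function name four_pl are invented for this example and are not taken from the cited protocols.

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, a, d, c, b):
    """Four-parameter logistic (4PL) calibration model.
    a: response at zero concentration, d: response at infinite concentration,
    c: concentration at the inflection point (EC50), b: slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

# Hypothetical ELISA standard curve: concentration (ng/mL) vs. absorbance (AU)
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])
absorbance = np.array([0.05, 0.09, 0.22, 0.55, 1.10, 1.55, 1.75])

popt, pcov = curve_fit(four_pl, conc, absorbance, p0=[0.05, 1.8, 5.0, 1.0])
a, d, c, b = popt

# Back-calculate the concentration of an unknown sample from its absorbance
# by inverting the 4PL model
y_unknown = 0.80
x_unknown = c * ((a - d) / (y_unknown - d) - 1.0) ** (1.0 / b)
print(f"EC50 = {c:.2f} ng/mL, back-calculated concentration = {x_unknown:.2f} ng/mL")
```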

Concepts to develop validated immunoassays have been comprehensively described in [10, 11, 14, 19]. Furthermore, the validation of commercially available immunoassays has already been well illustrated [18, 20]. Certainly, matrix effects remain a major source of variance even after considering all the parameters mentioned above [12, 14, 19]. As matrices vary from one analytical task to another, no general concept for dealing with these changes has been published so far. Sample pretreatment may be necessary in most cases (see “Optimizing assays” section).

Surface plasmon resonance (SPR) assays

In SPR-based binding assays, the interaction of molecules with immobilized ligands is measured directly in real time. Possible biomolecular interaction studies include affinity, kinetics, thermodynamics, specificity, and concentration analysis of interacting molecules. Currently this technology is widely applied in bioanalytical areas such as medical diagnostics, bioanalysis, biopharmaceutics, drug discovery and proteomics [21–23].

SPR offers several advantages over alternative immunoassay techniques such as ELISA or radioimmunoassay (RIA). In comparison to ELISA, lower binding affinities can be determined with SPR; the range of measurable binding affinities (KD) is about 10−3–10−10 M [24]. In addition, SPR binding signals do not require reporter molecules, so the conformational integrity of proteins can be retained. The binding signal depicts the interaction of the protein with the biosensor surface of the SPR instrument in real time. Moreover, results are obtained very rapidly, within a few minutes, since long sample incubations are not required. Interactions involving small organic molecules, such as drug molecules (>200 Da), or large entities, such as viruses or cells (<100,000,000 Da), can be measured using SPR techniques [24]. A disadvantage of SPR is the requirement for a sufficient amount of ligand that is successfully covalently bound to the surface of the sensor chip. Furthermore, this ligand must be robust enough to withstand several regeneration cycles [25].

SPR is a physical phenomenon that appears in conducting films at an interface between media of different refractive indices. In Biacore systems the different media are the glass of the sensor chip and the sample solution, and the conducting film is a thin layer of gold [24]. When light of a certain wavelength is shone onto the back of a sensor chip at an angle of total reflection, plasmons are excited in the gold film. A characteristic absorption of energy occurs and SPR is seen as a drop in intensity of the reflected light. The refractive index changes depending on the change of mass at the surface of the biosensor, and hence the resonance angle shifts. Thus changes in mass caused by biomolecular interactions are converted into optical signals that are detected in real time as an SPR sensorgram (Fig. 3) [24, 25]. A typical sensor chip has a gold side that is covered with a layer of carboxymethylated dextran onto which a protein or ligand is immobilized. A fluidic system leads the analyte solution over the sensor surface. The binding of the analyte to the ligand results in a change of the SPR signal, measured in response units (RU). Besides this common configuration, there are array-based SPR instruments for high-throughput measurements and localized SPR (LSPR) instruments, which employ nanoparticles as the conducting film.

Fig. 3

Monochromatic light is reflected at a gold interface. Molecules bound to the gold layer can influence this reflection, mediated by the evanescent wave of the light beam. Hence changes in mass at the gold layer can be detected as changes in the reflection, which in turn are recorded as a sensorgram [24, 25]
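To illustrate how kinetic constants are typically extracted from such sensorgrams, the following sketch fits a simple 1:1 (Langmuir) interaction model to simulated association and dissociation data. Commercial evaluation software uses refined variants of this model; all rate constants, concentrations and noise levels here are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def association(t, ka, kd, Rmax, C):
    """1:1 interaction model, association phase at analyte concentration C:
    R(t) = Rmax * C/(C + KD) * (1 - exp(-(ka*C + kd)*t)), with KD = kd/ka."""
    KD = kd / ka
    return Rmax * C / (C + KD) * (1.0 - np.exp(-(ka * C + kd) * t))

def dissociation(t, kd, R0):
    """1:1 interaction model, dissociation phase: R(t) = R0 * exp(-kd*t)."""
    return R0 * np.exp(-kd * t)

# Simulated sensorgram with invented values: ka = 1e5 1/(M s), kd = 1e-3 1/s
rng = np.random.default_rng(1)
C = 100e-9                      # analyte concentration, 100 nM
t_on = np.linspace(0, 300, 150)
t_off = np.linspace(0, 600, 150)
R_on = association(t_on, 1e5, 1e-3, 120.0, C) + rng.normal(0, 1.0, t_on.size)
R_off = dissociation(t_off, 1e-3, R_on[-1]) + rng.normal(0, 1.0, t_off.size)

# Fit the dissociation phase first (yields kd), then the association phase (ka)
(kd_fit, R0_fit), _ = curve_fit(dissociation, t_off, R_off, p0=[1e-2, 100.0])
(ka_fit, Rmax_fit), _ = curve_fit(
    lambda t, ka, Rmax: association(t, ka, kd_fit, Rmax, C),
    t_on, R_on, p0=[1e4, 100.0])

print(f"ka = {ka_fit:.2e} 1/(M s), kd = {kd_fit:.2e} 1/s, "
      f"KD = {kd_fit / ka_fit:.2e} M")
```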

In the past 12 years several SPR studies have been published that include the validation of assays using SPR technology. Most of them report the validation of concentration or quantitation assays [26–32], and only a few contain the validation of kinetic assays [33–35]. Unfortunately, few published SPR-based data concern GMP-regulated quality assessment. A comparison of different concentration studies performed by SPR assays shows that, for example, the inter-assay variability of the data can be in the range of 0.6–15 % RSD% for IC50, depending on assay conditions [31]. However, in most kinetic studies the RSD% is approximately 10–25 %. The variability of an SPR analysis depends very much on the assay system itself. To begin with, the molecular weight and the stability of the reagents are relevant. Studies investigating the robustness of SPR have compared different instruments, analysts, and sensor chips [28, 33–36]. Furthermore, the effect of matrices on method validation still needs to be investigated further. Another critical factor for assay robustness is the stability of the immobilized ligand on the sensor chip [36]. Proteins bound to the surface of the sensor chip may be denatured by regeneration steps; thus, the usable lifetime of the ligand surface is reduced (see the example in Fig. 4). Another important influence is the maintenance status of the instrument [34, 35]. Table 2 summarizes the parameters that could influence the variability and performance of a kinetic SPR assay.

Fig. 4

Surface plasmon resonance (SPR) single cycle kinetic measurements of the antigen–antibody system beta-2 microglobulin and its murine antibody. The maximal response units (RU) decline after 60 cycles, which is typical for this case. Note that the number of cycles can vary strongly depending on the particular system. The maximal RU is the characteristic parameter for the bound ligand on the sensor chip

Table 2 Parameters to be considered in kinetic SPR assay

Fluorescence based ligand binding assays

In drug discovery, fluorescence-based protein–ligand binding assays play an important role, especially in the investigation of G protein-coupled receptors (GPCRs) as molecular targets. Applications of fluorescent receptor ligands include, for example, the localization of receptors in tissues and cells, binding affinity and kinetic measurements, and the exploration of the mechanism of ligand–receptor interactions [37, 38].

There are many techniques based on measuring fluorescence intensity that are used as ligand binding assays, such as evaluation of bulk binding by separating bound from free ligand, determination of changes in fluorescence emission spectrum, fluorescence correlation spectroscopy (FCS), thermal shift assays (TSA), fluorescence anisotropy, flow cytometry, fluorescence polarization and ligand–receptor fluorescence resonance energy transfer (FRET) [37, 39, 40].

Evaluation of bulk binding by separating bound from free ligand means that the respective ligands are physically separated by methods such as filtration, centrifugation, affinity chromatography or size exclusion chromatography.

The determination of changes in the fluorescence emission spectrum can be used directly if the emission of a fluorescent ligand changes upon binding to a protein. In this approach a separation step is not necessary [41].

FCS measures the diffusion of fluorescently labeled molecules in solution in a defined unit of volume that is irradiated by a focused laser beam. This technique offers time-dependent binding measurements under equilibrium conditions, likewise without separating bound from free ligand [42].

TSA, also called differential scanning fluorimetry (DSF), determines the thermal stability of a target protein. To this end, the change in thermal stability of the target protein upon ligand binding is measured by recording a thermal denaturation curve in the presence of a fluorescent dye [43].

Fluorescence anisotropy is used to detect unequal intensities of the light emitted from a fluorophore along different axes of polarization [44, 45].

Flow cytometry is a technique that measures fluorescence when cells or particles pass through the fluid path of a flow cytometer. Thus the amount of fluorescent ligand associated with a cell in a homogeneous suspension can be determined [46] (see “Cell-based assays” section).

Fluorescence polarization detects the binding of a small fluorescent ligand to a larger protein using plane polarized light. The difference between polarization of the polarized light that is used to excite the fluorophore and the light that is emitted from the fluorophore is measured [44, 47].

FRET is the non-radiative energy transfer between an excited fluorophore (donor) and another fluorophore (acceptor) whose excitation spectrum overlaps the emission spectrum of the donor. The interaction is strongly distance-dependent (on the scale of 1–10 nm). FRET can be measured by the enhanced emission of the longer-wavelength acceptor fluorophore or by the loss of emission from the shorter-wavelength donor fluorophore [42, 48]. Time-resolved FRET offers an improved signal-to-noise ratio and therefore a more precise estimation of binding constants [48]. Typical applications of FRET are to probe the environment and geometry of the ligand binding site and the kinetic aspects of ligand binding to receptors.
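The steep distance dependence follows from the standard FRET efficiency relation E = 1/(1 + (r/R0)^6). The short sketch below evaluates it for an assumed Förster radius R0 of 5 nm (an illustrative value, since R0 depends on the particular donor–acceptor pair):

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """FRET efficiency for a donor-acceptor distance r and Foerster radius R0:
    E = 1 / (1 + (r/R0)**6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
# E drops from ~0.98 at half of R0 to ~0.08 at 1.5*R0, which is why FRET acts
# as a 'molecular ruler' on the 1-10 nm scale mentioned above.
```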

Due to the various techniques based on fluorescently labeled ligands, there are many publications about the development and experimental validation of fluorescence-based ligand binding assays (e.g. [41, 49, 50]). The KD is mostly in the range of 10−3–10−9 M. One example of a fluorescent ligand binding validation study concerning GMP-regulated quality assessment has been published by de Boer et al. [51]. In this concentration analysis study, an intra-assay precision of 7 % (RSD%) and an inter-assay precision of 12 % (RSD%) were determined. In principle, fluorescence-based LBAs can be very fast. They are suitable for high-throughput screening.

All fluorescence techniques have one significant problem in common: the modification of the ligand with a fluorophore often alters the ligand properties and consequently the nature of the ligand’s interaction with its receptor. Thus the fluorophore can have a direct influence on the measured KD value. In addition, false positive and false negative results are observed (e.g. from adsorbing interferences, aggregation, self-absorption, etc.). Finally, results are rather variable (e.g. for inhibition analysis studies, RSD% of up to 50 % [52, 53]). Some general factors have to be considered in fluorescence-based binding assays as in any other ligand binding assay, such as the differentiation of bound and unbound ligand, unspecific binding and matrix effects. A standard procedure to avoid or at least reduce errors and to optimize fluorescence-based protease assays has been suggested [52].

In contrast to radioligand binding assays it is difficult to quantitate the absolute concentration of bound receptors, because in fluorescent assays the measured values are compared in curve fitting with fluorescence values at saturation. Furthermore, the calibration against a standard fluorophore can be critical because in this case the fluorophore of the binding assay and the standard fluorophore should have the same fluorescent properties. Moreover, fluorescence levels measured in a binding experiment are affected by the sensitivity and geometry of the detecting instrument [37].

Isothermal titration calorimetry (ITC)

ITC has gained much attention in drug discovery in recent years, because it has become apparent that enthalpy-driven optimization of hits and leads is favorable for obtaining compounds with a balanced potency and physicochemical profile [54–56]. Hence, apart from applying van’t Hoff analysis (temperature-dependent assessment of binding constants), ITC is at present the gold standard for decomposing binding free energies (ΔG) into enthalpy (ΔH) and entropy (ΔS) contributions. It gives direct access in the measurement (Fig. 5) to ΔH, the binding constant KD, and the stoichiometry n, while ΔG and ΔS are then derived by calculation from the primarily measured properties [57]. Further favorable features of ITC are that it is a label-free technique and does not require immobilization of ligand or protein. During the experiment, one component of the ligand–receptor complex is titrated into the other component, and the incremental heat changes in µcal for each step of the titration are measured. These raw data are converted into a binding isotherm that needs to be fitted to a suitable binding model by a non-linear least squares fit in order to retrieve the desired thermodynamic parameters [57, 58]. The shape of this binding isotherm is characterized by the ratio of the receptor concentration to the dissociation constant, the so-called c value. It is generally accepted that c values in a certain range (5–500 [58], better: 10–100 [57, 59]) provide the best sigmoidal shape for obtaining reliable KD values. However, ligand and protein solubility can strictly limit the achievable c value, particularly for weak binders [60]. It has been demonstrated that for low affinity systems, even at c < 10, ITC can yield reasonable KD values, given that the binding stoichiometry is known, protein and ligand concentrations are accurately measured, the signal-to-noise level is appropriate and the titration is pursued almost up to saturation levels [60]. However, enthalpies derived under such “low c” conditions should be interpreted only with great caution. In contrast, at “high c” conditions, the binding isotherm approaches a rectangular shape, from which no KD value can be derived anymore. Still, the enthalpy is well defined in this case. Based on these requirements, the KD values that can usually be determined by ITC range from low 10−9 M to high 10−6 M affinities. The measurement of high affinity ligands is limited predominantly by the sensitivity of the instrument to detect the very small heats produced by the low concentrations necessary for an acceptable c value. In contrast, the measurement of low affinity ligands is typically limited by the solubility of protein and compound required for obtaining a high enough c value. In some cases, inverse titrations or a competition-based design of the experiment can help to overcome these limitations [61].
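A quick feasibility check before an ITC experiment is to compute the c value from the intended protein (cell) concentration and the expected KD. The sketch below, with invented example numbers, flags concentrations outside the commonly recommended window of roughly 10–100.

```python
def c_value(protein_conc_M, kd_M, n=1.0):
    """Wiseman c value: c = n * [M]t / KD (n = 1 for a 1:1 binding model)."""
    return n * protein_conc_M / kd_M

def check_c(protein_conc_M, kd_M, low=10.0, high=100.0):
    c = c_value(protein_conc_M, kd_M)
    if c < low:
        verdict = ("low-c regime: KD only obtainable with extra care "
                   "(known n, exact concentrations, titration to saturation)")
    elif c > high:
        verdict = "high-c regime: isotherm nearly rectangular, KD ill-defined"
    else:
        verdict = "within the commonly recommended window"
    return c, verdict

# Illustrative example: 20 uM protein in the cell, ligand with KD = 2 uM
c, verdict = check_c(20e-6, 2e-6)
print(f"c = {c:.0f} -> {verdict}")

# Protein concentration needed to reach c = 10 for a weak binder (KD = 1 mM):
# [M]t = 10 * KD = 10 mM, which is often beyond the protein's solubility.
print(f"[M]t needed for c = 10 at KD = 1 mM: {10 * 1e-3 * 1e3:.0f} mM")
```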

Fig. 5

Raw data (upper panel) generated by an ITC experiment, representing the heat released (or absorbed) over the course of the titration (µcal/s). These raw data are converted into the binding isotherm (lower panel) by integrating each injection peak, giving the thermal energy (ΔH) of each titration step. Upon saturation of the protein in the cell with added ligand, the signal decreases until only the background heat of dilution remains. Corrections for such heats (e.g. using control experiments) are essential. From the binding isotherm (heat in kcal/mol plotted against the molar ratio of ligand/protein), the change in enthalpy ΔH, the stoichiometry n, and the dissociation constant KD can be derived. The change in enthalpy is represented by the distance between the two asymptotic lines (red arrow) corresponding to the minimal and maximal heat formation. The stoichiometry n is the molar ratio at the inflection point (green arrow) of the sigmoidal curve. The slope at the inflection point (blue line) reflects the association constant (KA = 1/KD). ΔG and ΔS can be calculated from KD, ΔH, and T (adapted from [57, 65])

Without a doubt, the proper design of an ITC experiment is important for the quality of the resulting data. Still, a large variety of issues and systematic errors (biases) need to be avoided to produce reliable data. The accuracy of both the ligand and protein concentrations is certainly an important factor [62]. Thus, they should be verified by additional analytical procedures. Because all processes that produce or consume heat (e.g. mixing/dilution, protonation/deprotonation, aggregation/precipitation, alternative reactions, etc.) can interfere with the experiment quite substantially, ligand and protein buffers must be matched exactly and proper control experiments should always be performed. Further detailed information can be found in the “Supporting information” section.

Various studies have investigated the statistical errors in ITC curve fitting [63]. One parameter that can influence the quality of the curve fit is the number of injections and their volume. With an increasing number of injections, more data points of the binding isotherm become available for the fitting. In order to increase experimental throughput and, consequently, also to foster the repetition of experiments, reduced injection and single injection methods have been proposed [58]. Another important step toward higher throughput is miniaturization [58] (see also “Supporting information” section).

In addition to precision, accuracy and repeatability, reproducibility is an important factor in ITC measurements, particularly for the reported absolute values of ΔG, ΔH, and ΔS. Because of the multitude of possible pitfalls when planning, conducting, and interpreting an ITC experiment, it has been suggested that, as a common standard, some validation reactions (e.g. titrations of tris base in nitric acid, silver nitrate with sodium iodide/bromide, or bovine carbonic anhydrase II with CBS) should be performed and reported together with the newly determined data [64].

Nuclear magnetic resonance

NMR is in widespread use in rational drug design campaigns. Besides its classical applications for elucidating the constitution and structure of small organic molecules and, as an alternative to X-ray crystallography, for determining the 3D structure of bio-macromolecules, it provides sensitive probes for screening ligand binding to biomolecular targets such as proteins and nucleic acids [66–74]. Many successful applications have been reported from pharmaceutical and biotechnology companies in which NMR methods have been employed for hit identification, validation, and/or elaboration [66]. Many detailed reviews describing the large variety of NMR screening techniques are available in the literature [66–74], and here only a short summary is given. Specific properties of the most important methods are summarized in Table 3. Depending on the signals observed in the experiment, the methods can be assigned to two categories: target- or receptor-based and ligand-based. The first category measures the change of chemical shifts of the macromolecule upon ligand binding using 2D heteronuclear correlation spectra. The second relies on the fact that many NMR observables differ between the complexed and uncomplexed state. A ligand in complex with its target takes over the dynamic properties of the latter. Therefore, it will experience much slower diffusion rates, slower tumbling leading to faster transversal relaxation and, in this way, to broadening of the signals, and negative intra-NOE signals. For a fast exchanging ligand, corresponding to a large koff rate, these properties of the bound state are transferred to the free ligand in equilibrium and modulate the corresponding spectra. Since only the signals of the ligands are monitored here, specific labelling of the target is not needed and 1D 1H-NMR spectra are often sufficient, strongly reducing the acquisition time. Alternatively, methods relying on 19F resonances have started to be applied. Even though most applications use these methods to identify binders out of a large collection of molecules, identification of the binding epitopes of the ligand or the target as well as binding affinity determinations are possible, either directly or by replacement experiments.

Table 3 Comparison of methods for target- and ligand-based NMR screening in drug discovery (adapted from [71])

Target-based techniques all rely on chemical shift perturbations (CSPs) upon ligand binding, which are caused by changes in the electronic environment of atoms in the target due to the interactions with the ligands. These shifting signals are followed mainly in 2D 1H-15N correlation spectra, due to the need for only relatively inexpensive 15N labeling of the protein, but 1H-13C HSQC spectra of partially labelled proteins (e.g. methyl groups of Val, Leu, and Ile) can give more reliable results. One representative is the patented SAR-by-NMR approach [75, 76]. To identify hits, e.g. in a fragment library, mixtures of ligands can be screened. Those mixtures inducing CSPs in the target then have to be de-convoluted, i.e. each compound has to be measured individually, to identify the active ligands. After identification of a fragment for a specific binding pocket, the screenings are redone with a high concentration of this first fragment in the mixture. Fragments binding to other pockets of the binding site induce additional CSPs, and high affinity ligands can be generated by linking the two fragments together. The two main advantages are the possibility to accurately determine binding affinities by non-linear fitting of the size of the CSPs as a function of added ligand [71] and the identification of the binding epitope of the target (chemical shift mapping), if an assignment of the HSQC signals is available. For the latter it has to be kept in mind that changes in the conformation or the dynamics of the target can also lead to large CSPs, even in regions further away from the ligand [77, 78]. Disadvantages are the specific labelling and the high amount of protein needed, as well as the long acquisition time for the 2D spectra. When applying this method for the quantification of binding affinities in drug discovery projects (Fig. 6) [79, 80], several practical aspects should be taken into account: (1) Trivial or uncharacteristic shifts should be discarded. Chemical shifts should be considered significant if the average weighted 1H/15N chemical shift difference Δδ(1H/15N) = [(Δδ(1H))^2 + (Δδ(15N)/5)^2]^0.5 is >0.04 ppm. This helps to avoid overinterpretation of meaningless peak deviations. Uncharacteristic shifts can only be identified by thoroughly studying the protein under modified buffer conditions. (2) The number of concentrations should be sufficient for curve fitting. It is desirable to measure HSQC spectra for at least five different concentrations of the ligand. The statistics of the KD values obtained from multiple separate curve fits can provide additional insights about strange, possibly erroneous chemical shift changes. (3) Whenever peaks overlap or a shifting peak makes a transition through some other unchanged peaks, the shape of the peak can be distorted and it can become difficult to unambiguously identify the center of the peak and, thus, the correct chemical shift difference. (4) The solubility of weak binders can be a strict limitation for obtaining reliable data. It needs to be taken into consideration that saturation with respect to the solubility of the compound can mimic saturation of the binding site. For more details see also the “Supporting information” section.
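The affinity determination from CSP titrations mentioned under point (2) typically uses a single-site (1:1) binding model that accounts for ligand depletion. The sketch below fits such a model to hypothetical weighted chemical shift differences; the protein concentration, ligand concentrations and shift values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

P_TOT = 50e-6  # total protein concentration (M), assumed known

def csp_model(L_tot, kd, d_max):
    """Weighted CSP for 1:1 binding with ligand depletion:
    delta_obs = d_max * [PL]/[P]tot, with [PL] from the quadratic solution."""
    b = P_TOT + L_tot + kd
    PL = (b - np.sqrt(b**2 - 4.0 * P_TOT * L_tot)) / 2.0
    return d_max * PL / P_TOT

# Hypothetical titration: ligand concentrations (M) and weighted 1H/15N CSPs (ppm)
L_tot = np.array([25e-6, 50e-6, 100e-6, 200e-6, 400e-6, 800e-6])
delta_obs = np.array([0.018, 0.032, 0.053, 0.075, 0.094, 0.106])

popt, pcov = curve_fit(csp_model, L_tot, delta_obs, p0=[100e-6, 0.12])
kd_fit, d_max_fit = popt
kd_err = np.sqrt(pcov[0, 0])
print(f"KD = {kd_fit * 1e6:.0f} +/- {kd_err * 1e6:.0f} uM, "
      f"d_max = {d_max_fit:.3f} ppm")
```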

Fig. 6

Overlay of 1H-15N HSQC spectra of protein without ligand (red) and increasing concentrations of a ligand (orange 45 μM, yellow 114 μM, green 227 μM, cyan 455 μM, dark blue 909 μM) (a). One of several significant chemical shift changes is shown as a close-up. Curve fits for all 15 significant chemical shift differences are shown in b. The KD value obtained from each individual curve fit is presented in color code next to the respective curve. The mean value can be calculated as 78.0 µM, the standard deviation is 13.5 µM

Ligand-based methods can once more be divided into two groups [66]. The amplifying methods, which include transferred NOE (trNOE) [81, 82], saturation transfer difference (STD) [71, 83], waterLOGSY, the inter-ligand Overhauser effect (ILOE) [84, 85], inter-ligand NOE for pharmacophore mapping (INPHARMA) [86–88], as well as NOE [89] and inverse NOE [90] pumping, are all based on the nuclear Overhauser enhancement. Besides the possibility to identify binding, additional information on the bioactive conformation of the ligand can be obtained. The trNOE approach provides intra-ligand distances, which can be used to constrain the conformation, e.g. in docking experiments [81, 82]. Similarly, distances between ligands or fragments binding to the target at the same time are seen in ILOE experiments. In STD, magnetization is transferred from the target to the ligand, which is more effective at smaller distances. Therefore, it can quantify the relative closeness of parts of the ligand to the protein surface and, in this way, identify the binding epitope of the ligand [71]. In a similar way, INPHARMA uses the protein-mediated magnetization transfer between two competitively binding ligands, from which the relative orientation of the ligands in the binding site can be determined [86–88]. Finally, in waterLOGSY [91, 92] the magnetization originates from the solvent and is then transferred to the free or bound ligand through the macromolecule.

Since the signals of the methods just mentioned are proportional to the concentration of the bound ligand and accumulate in the course of several binding/unbinding events, they are superior to the second class, the non-amplifying techniques. For the latter, the signals are proportional to the bound/unbound fraction of the ligand. Thus, low ligand concentrations should be used, which limits the sensitivity. Therefore, techniques to enhance the sensitivity are required [93–95] (for detailed possibilities see the “Supporting information” section).

Affinity capillary electrophoresis (ACE)

ACE is an excellent and highly precise option for estimating binding properties if the binding interaction is accompanied by a change in the charge of the interacting partners [96–99] (Fig. 7).

Fig. 7

The principle of ACE. The electrophoretic mobility change of a weak-to-moderate ligand (L)–protein (P) binding system. The electrophoretic mobility (µ) of a receptor protein changes when it binds to a charged ligand. This electrophoretic mobility shift is induced by a change of the protein’s charge-to-mass ratio. The interconversion between ligand-free and ligand-bound protein can also broaden the protein peak, which then corresponds to an intermediate migration time, depending on the protein concentration. At saturating concentrations of the charged ligand, the protein peak (µP+L) sharpens and its mobility no longer changes. S internal standard (reprinted from Chu and Cheng [100] with permission)

ACE can successfully be applied to study the interactions of metal ions with biomolecules, since the complex is typically charged differently compared to the biomolecule alone [101–104]. This is particularly true for metal-containing binding sites. The ACE technique is well suited for the screening of weaker interactions (KD above 10−6 M) [105], for example at an earlier discovery stage. So far, it is less suitable for differentiating between stronger binding ligands, e.g. in the nanomolar range. ACE experiments only require sample amounts of 10–20 µg per series and binding constant [106]. Slow kinetics can easily be studied by pre-incubation experiments. One ACE experiment typically needs <10 min, including rinsing steps to avoid carry-over effects. Typically, relative standard deviations of <1 % are found for migration data within one measurement series, with the variability of the binding constants probably being in the same range [101–104]. Comprehensive information about method development and successful examples has recently been reviewed [96–99].
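In the simplest mobility-shift evaluation, the observed electrophoretic mobility is the population-weighted average of the mobilities of the free and the fully bound protein (assuming fast exchange and a ligand excess, so that the free ligand concentration approximately equals the total concentration). The sketch below fits this relation to hypothetical mobility data to obtain KD; all values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def mobility(L, kd, mu_free, mu_bound):
    """Observed mobility as a population-weighted average of free and bound
    protein (fast exchange; [L]free approximated by the total concentration)."""
    f_bound = L / (kd + L)
    return mu_free + (mu_bound - mu_free) * f_bound

# Hypothetical titration: ligand concentration (M) vs. effective mobility
# (arbitrary mobility units)
L = np.array([0.0, 5e-6, 10e-6, 25e-6, 50e-6, 100e-6, 250e-6])
mu_obs = np.array([1.00e-4, 1.06e-4, 1.11e-4, 1.20e-4, 1.26e-4, 1.31e-4, 1.36e-4])

popt, _ = curve_fit(mobility, L, mu_obs, p0=[20e-6, 1.0e-4, 1.4e-4])
kd_fit, mu_free_fit, mu_bound_fit = popt
print(f"KD = {kd_fit * 1e6:.1f} uM")
```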

Other biochemical assays

Radioligand binding assays are well established and still often used, owing to the extensive experience with this technique and its excellent selectivity, provided that unspecific binding of the radioligand can be avoided. Recently, this type of assay has gone somewhat out of fashion because of its intrinsic disadvantage: the need to use radioligands. These are potentially hazardous and thus not straightforward to handle. Certainly, additional regulatory effort is required, which makes the whole approach expensive. Apart from this, common error sources include ligand depletion and non-specific matrix binding of the chosen ligand [107].

Mass spectrometry, which is an excellent option for other analytical tasks, is less frequently used for binding assays. In addition to the high costs involved, there is another major disadvantage: the experiments often rely on binding in the vacuum phase. However, this binding behavior is often completely different from that in the aqueous phase [108–110].

Microscale thermophoresis has been employed for LBAs as well [111, 112], with minimal sample volumes and acceptable performance [113, 114], also compared to other techniques [115].

Binding data with acceptable error can also be obtained by atomic force microscopy (AFM). However, so far, this approach is not routinely used, hence it is not possible to evaluate its general performance [116].

Cell-based assays

Cell-based assays inherit all the properties discussed in the two previous sections, including issues such as instrument qualification, method validation and sample dilution. In addition, they bring with them another property, which is a pro and a con at the same time: they exhibit all the complexity of living organisms. Thus, they provide better information than isolated biochemical systems. For example, GPCRs can possibly only be properly investigated within intact cell membranes. However, life, and thus cell-based assays, also shows a great deal of individuality and development; that is just what life is about.

For this reason, typically there is no stable reference material for cellular assays. Cells just cannot be kept stable so easily [117].

Checking the accuracy is methodologically done by using an alternative (orthogonal) method. However, since the cells themselves are a central part of the assay, alternatives to them are not available. Using a different cell line is no alternative, since this would be considered a different biological system rather than an alternative test system. This means that there is no way to check accuracy in cell-based assays. Only the precision of the results can be checked by repeating the experiments. Since the precision is typically low, cell-based assays are only considered qualitative (just categorizing) or quasi-quantitative [120] test systems. This also means that long-term precision is hard to estimate; in other words, it contains all the errors from changes in reference materials and reagents. Moreover, in cell-based assays the average cell behavior is always measured, since the cells will typically be in different states of the cell cycle [117].

Thus the “Material” deserves particular attention in cell-based assays. Some of the difficulties have been solved by commercial suppliers. When using commercial kits, it is mandatory to record the lot numbers and expiration dates. Changes during the period of use should also be documented. What has been said about standards and cell cultures also applies to media. Their constitution and a possible change of media can matter substantially. However, with a clearly defined recipe it is certainly possible to use well-defined media for a long time, much more easily than for the cell cultures. In the meantime, standard media have evolved for some applications [118].

The specimen age can be tracked, allowing for the possible adjustment of protocols [118]. When using deep-frozen cultures or cell suspensions, good freezing and thawing protocols are required to ensure uniformity of the reagents and reference cells. Stability depends on the storage temperature range and therefore needs to be carefully validated and controlled as well [118]. Specimens should be adequately characterized: viability can be tested, e.g. with fluorescent dyes that are taken up by damaged cells, followed by cell counting. Viability and cell count typically decrease in a characteristic way, which may also allow for proper adjustments. For the validation of these tests, and for proper staff training, it is recommended to use very well characterized cell lines such as CaCo or HeLa. Evaluating the specimen morphology is another recommended approach for cell characterization [118].

In addition to these necessary efforts related to the cell material, drug permeability can be another major source of misleading experimental results. Due to all these difficulties, the global regulatory bodies such as the FDA and EMA hesitate to give guidance [117]. How to deal with this unsatisfactory situation?

Control experiments are a necessity; instrument performance qualification procedures and the preferential use of commercial tests or generic protocols are strongly recommended [119]. Error sources can possibly only be determined and systematically excluded in an economical manner if a test is more widely used. Laboratory developed tests (LDTs) or so-called “home brew” assays will often not be able to fully cover all the mentioned topics [117]. Sufficient time and manpower should be allowed for implementing tests, in particular new ones.

Possibly one cannot give generally valid benchmark numbers for the precision of cell-based assays, since they are just too different. However, performance parameters such as specificity, sensitivity, LLOQ, linearity and range have been identified and thoroughly discussed, including the respective required data numbers for the estimation of these parameters, e.g. by Wood et al. [120]. Even though benchmark numbers for the overall precision have not been given, limits for the stabilities of specimen and reagents have been proposed [120], including the number of measurements required.

The variability of cell-based assays is best controlled if measurements are produced within one lab by one person during a short period. This is understandable, since the number of unintentionally varying parameters is much lower under these circumstances. If a small number of compounds or conditions needs to be compared, the relative results from one lab are often much less variable and thus more often significant. If results from more than one lab are to be evaluated together, homogeneous, well-defined, validated protocols are a clear precondition. In addition, re-validation in all participating labs is advisable, or a proper method transfer is necessary [121]. These statements are particularly valid if databases are to be fed with data from various sources.

Although there are already very valuable individual assays, which are superior to their biochemical competitors, the performance characteristics of cell-based assays are generally still unsatisfactory. In the long run, it will be very important to make general progress, since these assays promise so much more biologically relevant information. A first step should be a comprehensive checklist of all possible parameters of interest that influence the total error. This list has to be agreed upon by regulatory authorities to later ensure acceptance of the preclinical usage of the respectively developed assays. As research into error sources progresses, the list should be updated regularly. An initial checklist could be based on [117, 118, 120, 122].

The importance of this task is widely recognized, and the topic has recently been pursued with considerable effort [117, 118, 120, 122]. However, cell-based assays are still much more difficult to establish than biochemical assays.

General comparison of assays

It is very difficult to give generally valid recommendations on how to choose the optimal assay. The number of possible approaches is large, and there are few scientists who have experience with several of them. Every technique has its advantages [97, 98, 121], as was discussed in the respective sections. In the following, well-suited approaches with respect to selected criteria are given (see “The requirements” and “Choice of the right assay” sections):

Well-suited approaches with respect to certain criteria

  • Speed: SPR, immunoassays, fluorescence assays, and radioligand binding assays. However, keep in mind that these are the most established ones, which have enjoyed the most attention and development. For example, ACE, MST and SPR could be accelerated using multiplexing.

  • Precision: For immunoassays, fluorescence assays, NMR, SPR and ACE, 2–10 % RSD% for KD can be reached.

  • Spatial information about binding: (SAR by) NMR (see “NMR” section), X-ray crystallography (not discussed here).

  • Low sample amounts: Immunoassays, fluorescence assays, ACE, SPR, MST.

  • Suitable affinity ranges: All approaches are suitable in the medium range; in addition:

    • Suitable for very strong affinities (KD < 10−9 M): Immunoassays, fluorescence assays and radioligand binding assays, SPR, NMR.

    • Suitable for low affinities (KD > 10−3 M): SPR, NMR, ACE, MST, fluorescence assays.

Robustness is another very important aspect to discuss, but there is very limited information regarding this parameter related to LBAs at the present time. Certainly a lot can be learned in this regard from robustness tests in QC.

Specificity is an important parameter as well. It denotes the independence of the result of a binding study from compounds that may be expected to be present (e.g. matrix compounds). At present, the specificity of methods is still hard to assess. Possibly, techniques such as NMR, immunoassays, fluorescence assays and radioligand binding assays could be seen as well-suited in this regard. In order to avoid systematic errors (biases), such as those caused by frequent hitters, aggregation or adsorption (see “Avoiding common error sources” section), it is common practice to employ at least two orthogonal techniques in binding experiments [43]. Please note that specificity is not equivalent to robustness or reliability.

The aforementioned list may reflect personal bias through individual experiences and shall serve as a starting point for discussing strengths and weaknesses of the individual techniques. In any case, it is always advantageous to employ various approaches to thoroughly characterize a binding process. The confirmation of binding data by an alternative approach is the best method to achieve accuracy.

Optimizing assays

Parameters to be optimized

The main aspects in choosing an assay include relevance (biological system, solvent used, etc.), the affinity range to be investigated, speed, robustness, selectivity, accuracy, and precision. These parameters are also those that may require further optimization. The respective benchmark techniques have been listed in the previous paragraph. The following optimization concepts are widely accepted for ensuring the quality of pharmaceuticals in a regulated environment (see [121]); the same concepts can be applied to LBAs.

Time requirements for assays play an important role. High speed is required for high-throughput systems. It also allows quick method development and optimization, since various conditions can be tested in a short time. In addition, it is possible to trade speed for precision. The standard deviation of a mean value \((SD(\bar{x}))\) is the standard deviation of the single values \((SD(x))\) divided by the square root of n, the number of repeated measurements used to obtain this mean value (see the following equation):

$$SD(x) = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n-1}}, \qquad SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}}$$

where \(x_i\) represents a single measurement and \(\bar{x}\) is the mean of the n measurements. If a technique operates at high speed, multiple measurements can be obtained in a short time. This makes it possible to reduce the standard deviation of the reported average by increasing the number of repetitions. It should be noted that the degree of improvement quickly levels off. While four repetitions yield an improvement by a factor of 2 in terms of the standard deviation \(SD(\bar{x})\), 9 and 16 repetitions are required to gain a factor of 3 or 4 in improvement, respectively. Yet, the improvement in precision through repetitions is an additional very good argument to go for maximal analytical speed, e.g. by multiplexing or lab-on-a-chip (LoC) approaches.
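A small simulation illustrates this relationship (a sketch with arbitrary numbers, not data from any of the assays discussed above):

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, sd_single = 7.0, 0.3   # e.g. a pKD of 7.0 with an assumed SD of 0.3

for n in (1, 4, 9, 16):
    # Simulate 100,000 campaigns, each reporting the mean of n measurements
    means = rng.normal(true_value, sd_single, size=(100_000, n)).mean(axis=1)
    print(f"n = {n:2d}: simulated SD of the mean = {means.std(ddof=1):.3f}  "
          f"(theory: {sd_single / np.sqrt(n):.3f})")
```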

Avoiding common error sources

All error components contribute to the total error. It is useful to distinguish random errors, which represent the random spread of data, and systematic errors, which shift the results in one direction. The latter error type is also called bias. Random errors can be caused by different kinds of noise. Examples of systematic errors include uncontrolled shifts of experimental parameters such as temperature or pH, or the loss of sample during sample preparation. There are several general sources of error which are often found to be important, independent of the employed assay technique:

Standard operating procedures (SOPs) and experimental descriptions: Quite often the employed protocols are still not mature in the sense that they do not describe the experiment thoroughly and completely. This means that important experimental parameters may have been omitted. Uncontrolled shifts in these parameters can in turn produce a large amount of error. The completeness of an SOP or of an experimental description can be checked by carefully comparing the draft to publications of similar assays. The most rigorous way would be to assess its quality by inter-laboratory trials. A first step in this direction could be to ask a cooperating laboratory to carry out the assay according to the SOP under scrutiny and to record the results for standard compounds. Comparing mean values and standard deviations will reveal the accuracy and robustness of the procedure.

Please note that SOPs are not limited to methods. LBAs can profit from the virtues of QC in many respects. Thus, establishing SOPs for instrument qualification and validation procedures will help to assure reproducible results and is highly recommended. Another efficient way to check the performance of the analytical system is the use of QC samples. These are samples with known properties that are analyzed together with the samples under study. About 5 % of all samples should be QC samples [121].

Sample preparation is widely known as a frequent major error source [123]. Yet, this aspect has not been thoroughly studied thus far, since systematic studies would require large resources. Sample pretreatment is often mandatory to reduce the influence of analytical matrices. Sample pretreatment techniques include liquid–liquid and solid–liquid extraction systems, precipitation, ultrafiltration, centrifugation and dialysis.

Artifacts from sample preparation, such as loss of sample or sample degradation, cannot always be avoided, even if great care is taken. Often a trade-off between avoiding these artifacts and an efficient sample clean-up has to be found. In this case, the use of a systematic approach to optimize the sample preparation conditions is certainly advisable. An example would be to optimize the conditions for sample pretreatment by statistical design of experiments [124] as follows: select several established liquid–liquid and solid–liquid extraction systems to be screened. Define a set of suitable conditions (factors) for each one that can be varied systematically. Potential influence factors would be e.g. phase ratios and mixing/homogenization times. Set up an experimental design to analyze the yield in sample extraction (i.e. minimize sample loss or degradation) with respect to time requirements.
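A minimal version of such a screening could look as follows: a full two-level factorial design over a few sample-preparation factors, generated in a few lines of code. The factor names and levels below are purely hypothetical placeholders, and the measured responses (extraction yield, degradation) would be filled in from the actual experiments.

```python
from itertools import product

# Hypothetical two-level factors for screening a liquid-liquid extraction
factors = {
    "phase_ratio":     (1.0, 3.0),    # organic:aqueous (v/v)
    "mixing_time_min": (1.0, 10.0),
    "salt_added":      ("no", "yes"),
    "pH":              (3.0, 7.0),
}

# Full 2^4 factorial design: 16 runs, every combination of the factor levels
design = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(design, start=1):
    print(f"run {i:2d}: {run}")

# The runs would be executed in randomized order; the measured responses
# (extraction yield, sample degradation, time required) are then analyzed,
# e.g. by comparing mean responses at the high and low level of each factor.
```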

Even if the sample matrix itself does not cause errors, the solvent can cause biased results. For example, it is common practice to use DMSO, although it is well known that interactions in this solvent are often quite different from those in aqueous buffers.

False positive results can also be caused by aggregation, adsorption and the formation of reactive species, such as hydrogen peroxide. These phenomena are often related to compounds giving false positive responses in several assays. Therefore, these are called frequent hitters or PAINS (pan-assay interference compounds [125]).

However, even if a well-suited buffer in an aqueous system can be chosen for a particular assay, at least some dilution steps are typically necessary for sample pretreatment. The contribution of the dilution error to the total error is often underestimated. Even today, it is still sometimes the dominating error source [13]. The quality of pipettes is obviously important; well-trained lab personnel are another prerequisite. The careful planning of dilution steps is essential as well. Each dilution step contributes to the total error; thus, many steps are typically unfavorable.
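How dilution steps accumulate error can be estimated by standard propagation of uncertainty: for a serial dilution, the relative standard deviations of the individual steps add approximately in quadrature. The sketch below compares two hypothetical dilution schemes; the per-step RSD values are assumptions chosen for illustration only.

```python
import math

def total_rsd(step_rsds_percent):
    """Approximate total RSD of a serial dilution: the relative variances of
    independent multiplicative steps add (first-order error propagation)."""
    return math.sqrt(sum(r ** 2 for r in step_rsds_percent))

# Scheme A: a 1:1000 dilution performed as three 1:10 steps at 1 % RSD each
print(f"three 1:10 steps: total RSD ~ {total_rsd([1.0, 1.0, 1.0]):.2f} %")

# Scheme B: the same 1:1000 dilution in a single step that requires pipetting
# a very small volume and is therefore assumed to be less precise (3 % RSD)
print(f"one 1:1000 step:  total RSD ~ {total_rsd([3.0]):.2f} %")

# Careful planning weighs the number of steps against the precision of each
# step; many imprecise steps or one very imprecise step both inflate the error.
```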

Finally, the signal-to-noise ratio (S/N) is always strongly related to the analytical performance. A low S/N often relates to poor precision [126, 127]. Sometimes noise reduction or signal increase can be straightforward, e.g. by optimizing the sample concentration.

Optimizing assay precision: decreasing variance components

Looking at the various equations to calculate the total error, it is obvious that the major error components always dominate in contribution, no matter what calculation is used for the propagation of uncertainty. By reducing the major error sources, the total error is substantially reduced as well.

Often the aforementioned common error sources belong to these major error components. Other important components are related to the specific technique employed. Thus, to improve the performance, the respective key publications need to be consulted, which discuss error sources, method development, and good working practices [9, 19, 22, 24, 29, 36, 37, 39, 52, 62–64, 66, 67, 98, 99]. This helps to assemble a list of known relevant parameters to be optimized. It should be noted that the influence of some parameters (for a specific technique) on the error may still be unknown. Put differently, there may be unknown relevant factors whose control would yield an error reduction. For efficient error control, these relevant factors need to be identified, since otherwise the performance of a technique cannot be efficiently improved.

Therefore, these unknown relevant parameters are the trickiest ones. In order to disclose them, thorough analytical experience and understanding is required, including careful investigations and observations. Such investigations can be strongly supported by control charts from control experiments, looking at the long-term trends in negative and positive control experiments [128–132].

After having identified all possibly relevant parameters for a certain technique, a suitable representative method to optimize the technique needs to be identified. Usually thousands of methods are published for each technique. However, only those methods are suitable that come with a full description of the experimental part and, at the same time, perform reasonably well according to the state of the art. If several methods meet these requirements, the simplest and most robust method should be chosen for further optimization.

For the selected method, the sources of variability and their contribution to the total error should be analyzed with the help of an experimental design. When starting with a large number of parameters (factors) whose effect sizes are unknown, it is recommended to start with a screening design such as the Plackett–Burman design [133]. After the most influential parameters and their contributions to the total error have been identified this way, their main effects, interactions, and statistical significance can be studied in more detail. This can be done either with (repeated) fractional factorial designs or with central composite designs, depending on how many factors remain and which functional form of the relationship between the respective parameter and the total error is expected [134].
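A 12-run Plackett–Burman screening design for up to 11 two-level factors can, for example, be written down directly from the tabulated generator row, as in the sketch below. The generator used here is the one commonly listed in the design-of-experiments literature, and dedicated software packages provide such designs ready-made.

```python
import numpy as np

# Commonly tabulated generator row for the 12-run Plackett-Burman design
generator = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

# Rows 1-11 are cyclic shifts of the generator, row 12 is all minus ones
rows = [np.roll(generator, shift) for shift in range(11)]
design = np.vstack(rows + [-np.ones(11, dtype=int)])

print(design)                               # 12 runs x 11 two-level factors
print("column sums:", design.sum(axis=0))   # all 0: each level occurs 6 times
print("orthogonal: ", bool(np.all(design.T @ design == 12 * np.eye(11))))
```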

After successfully improving the overall precision of the representative method, and thus the precision of the technique as such, the degree of complexity can now be increased by transferring the obtained knowledge to other methods, which may contain additional error sources. By introducing these additional error sources step by step, their contribution can be observed and understood much more easily. Again, control charts are strongly recommended to support and facilitate this process.

Conclusions

The choice of the appropriate method to measure binding constants is a very difficult task. When data obtained by different methods are compared, the resulting errors will be enormous and the resulting scientific value correspondingly limited.

For all assays, the proper choice of the sample pretreatment strategy is essential. This aspect has probably not received the attention it really deserves. Furthermore, well-defined and detailed protocols are a prerequisite for successful assays in general.

Using biochemical assays, intra-assay repeatability RSD% values of 2–10 % for KD and corresponding parameters can be achieved after careful optimization, with the complete variability, including error and bias, probably being in the same range. This is provided that the virtues well known from QC assays, such as having a proper SOP system and using QC samples, are put into practice. A better understanding of the relevant experimental parameters for intra-assay precision will also allow for the improvement of inter-assay reproducibility.

However, the value of isolated biochemical tests depends very much on the scientific problem to be solved. Cell-based assays are superior in this regard, but their variability is still not satisfactory. When working with one test in one laboratory and characterizing the cell lines appropriately at short time intervals, comparable results can certainly be obtained that are suitable for optimizing the binding properties of substances. However, the resulting data quality still impedes the training of really good computer models of substance efficacy derived solely from these data. Some authors still hesitate to give numbers for the achievable overall analytical performance of cell-based assays, because this information is very limited.

Should we give up and accept this situation? Certainly not. We should carefully identify the major sources of errors, then look into the details for each source and improve the procedures one by one. It has been shown that this can be very successful for biochemical assays, but it can be successful for cell-based assays as well, even though the task is much more difficult here. Fortunately, steps in the right direction have already been made, e.g. by the ICSH/ICCS Working Group [117, 118, 120, 122]. Still, the room for improvement is enormous, and every improvement will in turn improve the drug discovery process. Let us assume there is something similar to Moore's law for the advancement of computer performance in drug research. Maybe the variability can be halved every 2 years, in particular for the highly variable cell-based assays? Better data quality is possible. Success in drug discovery can be advanced through analytical performance.

Outlook

Comparing the performance of the various ligand binding assays has been a tricky topic to discuss. We have tried to do it as well as possible, but we expect that there are several differing opinions. However, a discussion must start somewhere. We expect some interesting debate and are looking forward to it. In order to share this debate and remarks from colleagues with the readers, we shall post the most interesting points in the supporting material.

Supporting material

https://www.tu-braunschweig.de/pharmchem/forschung/waetzig/zusatzmaterial.