Introduction to small molecules

What are small molecules?

Size is not the only criterion for a molecule to be regarded as a “small molecule”. The term “small molecule” is defined by pharmaceutical chemistry, and small molecules have had a big effect on drug discovery in recent decades.

A definitive, all-encompassing statement of the size and molecular weight of a small molecule cannot be found in the literature. The definition depends on the application and the field in which the term is used, and is therefore rather variable. Some sources state that small molecules are “organic, non-peptide compounds” [1, 2]; others note that the “size diversity of small biomolecules easily overlaps with many natural peptide hormones” and therefore propose three criteria: first, “it is not directly encoded by genome”, second, “it is synthesized by specific enzymes”, and third, “it is non polymeric” [3]. The latter description therefore includes non-ribosomal peptides and certain non-polymeric carbohydrates, contradicting the former. The molecular-weight limit also varies between definitions: some authors [3, 4] set it at 1000 Da, whereas others use 500 Da as the upper threshold [5, 6]. Nevertheless, molecular weight is a crucial aspect of drug design because, in most cases, permeability across biological barriers decreases as molecular weight increases [7]. This observation led to the “rule of five”, which states that “poor absorption or permeation are more likely when: There are more than 5 H-bond donors (expressed as the sum of “OH”s and “NH”s); The molecular weight is over 500; The log P is over 5 (or MLogP is over 4.15); There are more than 10 H-bond acceptors (expressed as the sum of “N”s and “O”s); Compound classes that are substrates for biological transporters are exceptions to the rule” [7].
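
Because the rule of five is a set of simple thresholds, it can be expressed as a short check. The sketch below assumes the four descriptor values have already been obtained elsewhere (e.g. from a cheminformatics toolkit); the `Compound` class, the function name, and the example values are hypothetical and only illustrate the four numeric criteria quoted above. The transporter-substrate exception cannot be derived from these numbers and must be judged separately.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Compound:
    """Descriptor values for one compound (supplied by the user)."""
    name: str
    mol_weight: float  # molecular weight in Da
    log_p: float       # octanol/water partition coefficient (log P)
    h_donors: int      # H-bond donors, counted as the sum of OH and NH groups
    h_acceptors: int   # H-bond acceptors, counted as the sum of N and O atoms


def rule_of_five_flags(c: Compound) -> list[str]:
    """Return the rule-of-five criteria that a compound exceeds.

    An empty list means no criterion is exceeded. The exception for
    substrates of biological transporters is NOT handled here.
    """
    flags = []
    if c.h_donors > 5:
        flags.append("more than 5 H-bond donors")
    if c.mol_weight > 500:
        flags.append("molecular weight over 500")
    if c.log_p > 5:
        flags.append("log P over 5")
    if c.h_acceptors > 10:
        flags.append("more than 10 H-bond acceptors")
    return flags


# Hypothetical descriptor values, for illustration only.
candidate = Compound("hypothetical-A", mol_weight=620.0, log_p=5.8, h_donors=2, h_acceptors=7)
print(candidate.name, rule_of_five_flags(candidate))
```

For the hypothetical compound, the check reports the molecular weight and log P criteria; whether a flagged compound is actually rejected remains a judgement for the medicinal chemist.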

The origin of small molecules

Small molecules and their derivatives can either be extracted from natural products, which are “secondary metabolites chemicals that are not directly involved in the growth or the development of the organism that produces them”, or be synthesized chemically [8]. Numerous natural small compounds can be isolated from bacteria, actinomycetes, fungi, and other microorganisms [9]. In recent years, compounds of marine origin have also been isolated and used as small-molecule drugs. One example already used in the treatment of patients is the anti-tumour drug Trabectedin (brand name Yondelis®) [10], which “is the first marine anticancer agent approved in the European Union for patients with soft tissue sarcoma” [11].

Small molecule fragments

When discussing small molecules, small-molecule fragments must also be considered. Fragment-based lead discovery (FBLD) is an increasingly popular screening technique. The basic principle is to subdivide drug candidates into functional fragments and use these fragments for screening. The drug candidate is then designed on the basis of the screening results: “FBLD can lead to molecules with better drug-like properties than those originating from typical HTS” [12]. The “rule of three”, which specifies that fragments have a molecular mass below 300 Da, log P ≤ 3, no more than three hydrogen-bond donors, and no more than three hydrogen-bond acceptors, is often used in this context, although it only serves to limit complexity in fragment libraries and is therefore not a strict guideline [12, 13].
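
Because the rule of three is meant to limit complexity rather than act as a strict filter, a library designer might count how many criteria a fragment misses and tolerate borderline cases. The sketch below illustrates this idea; the fragment names and their property values are hypothetical, and the `max_violations` tolerance parameter is an assumption added for illustration, not part of the rule itself.

```python
# Hypothetical fragment library: name -> (mass in Da, log P, H-bond donors, H-bond acceptors).
FRAGMENTS = {
    "frag-1": (180.2, 1.1, 1, 2),
    "frag-2": (310.4, 2.5, 2, 3),
    "frag-3": (245.3, 3.4, 0, 2),
    "frag-4": (150.1, 0.2, 3, 3),
}


def rule_of_three_violations(mass, log_p, donors, acceptors):
    """Count how many rule-of-three criteria a fragment misses."""
    return sum([
        not mass < 300,      # molecular mass below 300 Da
        not log_p <= 3,      # log P of at most 3
        not donors <= 3,     # at most 3 H-bond donors
        not acceptors <= 3,  # at most 3 H-bond acceptors
    ])


def select_fragments(library, max_violations=0):
    """Keep fragments that miss at most `max_violations` criteria.

    max_violations=0 applies the rule strictly; larger values treat it as
    the soft complexity guide it is usually meant to be.
    """
    return [name for name, props in library.items()
            if rule_of_three_violations(*props) <= max_violations]


print(select_fragments(FRAGMENTS))                    # strict selection
print(select_fragments(FRAGMENTS, max_violations=1))  # tolerant selection
```

With the strict setting only `frag-1` and `frag-4` are kept; allowing one violation also admits the slightly too heavy `frag-2` and the slightly too lipophilic `frag-3`.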

“A number of compounds that evolved from fragments have entered the clinic, and the approach is increasingly accepted as an additional route to identifying new hit compounds in pharmaceutical discovery and inhibitor design” [14]. For example, in December 2012 Merck announced MK-8931, an inhibitor of beta-secretase 1 (BACE1) originating from FBLD, as a potential drug for Alzheimer’s disease, although fifteen years earlier BACE1 had been regarded as an intractable target [12]. In FBLD, low-molecular-mass compounds with low affinity for the target are grown into larger compounds, which tend to have better physicochemical properties than those generated using conventional screening techniques [12, 13].

What do small molecules target?

Small molecules most often target proteins. DNA and RNA may also be interesting targets for small molecules, but they are excluded from this review.

In 2006, the list of the most common protein targets for pharmaceutical research included [15]:

  • Rhodopsin-like G-protein-coupled receptors (GPCRs)

  • Nuclear receptors

  • Ligand-gated ion channels

  • Voltage-gated ion channels

  • Penicillin-binding protein

  • Myeloperoxidase-like

  • Sodium:neurotransmitter symporter family

  • Type II DNA topoisomerase

  • Fibronectin type III

  • Cytochrome P450

  • Phosphodiesterases

  • Proteases

  • Protein kinases

Together, rhodopsin-like GPCRs, nuclear receptors, and ion channels account for more than 50 % of possible drug targets.

A very detailed overview of targets and respective drugs can be found in [16].

Many examples of screening for small molecules, their targets, and the related diseases can be found in recent literature (Table S1, Electronic Supplementary Material). Interesting examples will be emphasized in the appropriate method section.

Why are small molecules interesting?

Small molecules are interesting for two reasons. First, they are of great interest in pharmaceutical research because of their intrinsic properties, e.g. their ability to cross the blood–brain barrier [17]. Second, small molecules can be dangerous when they are present in the environment (i.e. not used in a controlled way as pharmaceuticals): a large group of small molecules can act as endocrine-disrupting chemicals (EDC).

Because they can both heal and harm, small molecules have been referred to as “Jekyll and Hydes” [18]. Their powerful effects range from curing diseases and reprogramming whole cells [19] to very negative, unwanted effects when they act as EDC (for example, being associated with a variety of diseases and disorders in living organisms).

EDC have been the focus of research for decades [20], but are still far from sufficiently understood. EDC research covers two main aspects: understanding the mechanism by which EDC interact with proteins to disrupt their function, and detecting EDC in the environment (even at trace levels). Monitoring EDC levels [21] is necessary to control their prevalence and to protect populations [22] from their negative effects. Understanding the mechanism by which EDC interact with proteins is closely related to understanding how small molecules can serve as pharmaceuticals. Detecting small molecules under environmental conditions, e.g. at trace levels in complex matrices, is a completely different task and is not covered in this review. Instead, we focus on analytical and bioanalytical techniques suitable for investigating small molecule–protein interactions.

Label-free detection methods for small molecule protein interaction investigation

Advantages of label-free detection

Systems that do not require labels have several obvious advantages, but they also have some disadvantages; for example, label-free technology will probably never match the limits of detection of labelled methods [23, 24].

However, the advantage of label-free technology when investigating small molecule–protein interactions is that the interaction is not disturbed by labels. A label could be, e.g., an isotope, a fluorescent dye, a fluorescent protein, an enzyme, a quantum dot, or a nanoparticle [25–28], and such labels can disturb the system, or at least make it more artificial. Whereas exchanging a single atom for another isotope of the same element can be regarded as a rather small manipulation, attaching a fluorescent label or an even larger molecule to a small molecule that is the same size as or even smaller than the label will have a dramatic effect on both its physical and its chemical properties. Even when isotope labelling is used in, e.g., scintillation proximity assays (SPA) [29], one must be aware that the radioisotopes used emit radiation with enough energy to break chemical bonds and alter the behaviour of the system in unknown ways (e.g. conformational changes).

Labelling the protein is less problematic than labelling the small molecule, but still has disadvantages. An experienced molecular biologist producing their own proteins (e.g. by cell culture) can straightforwardly introduce, for example, green fluorescent protein (GFP) or other labels [30]. However, such protein labels are large: GFP, for example, is approximately 27 kDa, comparable to the 30 kDa target protein shown in Fig. 1. Much smaller chemical dyes are also available, for example Cy5, which is less than 800 Da [31]; attaching such a dye changes the mass of the protein only slightly (see Fig. 1). However, the dye must be attached chemically and, in contrast with biologically introduced labels such as GFP, it is very hard to control the position at which it attaches; it may even attach to multiple sites within the same protein. As a result, proteins of the same type in the same mixture can have completely different properties, depending on where the dye molecule has attached [31, 34, 35].

Fig. 1
figure 1

Size comparison of the 3D structures, generated using PyMOL [32], of a typical protein target (ligand-binding domain of the estrogen receptor alpha [33], 30 kDa) in green, a small molecule (4-hydroxytamoxifen) in purple, and a typical fluorescence dye (Cy5.5-ester) in cyan
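
The size argument above reduces to simple mass ratios. The sketch below uses the 30 kDa protein, the approximately 27 kDa GFP, and the upper bound of 800 Da for Cy5 quoted in the text; the mass of 4-hydroxytamoxifen (about 387.5 Da) is computed from its formula C26H29NO2 and is the only value not taken from the text. Names such as `mass_increase` are illustrative.

```python
# Masses in Da. Protein, GFP, and Cy5 values are those quoted in the text;
# the 4-hydroxytamoxifen value follows from its formula C26H29NO2.
PROTEIN_DA = 30_000.0  # ER-alpha ligand-binding domain (Fig. 1)
GFP_DA = 27_000.0      # green fluorescent protein, approximate
CY5_DA = 800.0         # Cy5, upper bound ("less than 800 Da")
OHT_DA = 387.5         # 4-hydroxytamoxifen


def mass_increase(label_da, carrier_da):
    """Relative mass increase when a label is attached to a carrier molecule."""
    return label_da / carrier_da


for label, carrier, description in [
    (GFP_DA, PROTEIN_DA, "GFP on the protein"),
    (CY5_DA, PROTEIN_DA, "Cy5 on the protein"),
    (CY5_DA, OHT_DA, "Cy5 on 4-hydroxytamoxifen"),
]:
    print(f"{description}: +{100 * mass_increase(label, carrier):.0f} %")
```

This prints roughly +90 %, +3 %, and +206 %: a dye that is negligible for the protein more than triples the mass of the small molecule.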

Other reasons to avoid labels are that labelling can be expensive and time-consuming. Sometimes labelling is not possible at all: the protein might not be stable enough to be labelled [36], or it may be available only in a mixture with other proteins (e.g. a serum sample or cell lysate).

For these reasons, information gained from label-free methods can be regarded as more accurate and closer to the in vivo situation, provided that the experimental conditions are chosen carefully.

This critical review will discuss strengths and weaknesses of different label-free techniques in terms of their ability to investigate specific interaction characteristics of a small molecule with its target. These specific interaction characteristics can be described as the physico-chemical properties of a biomolecular reaction, and are very important for understanding and/or manipulating, e.g., the binding of a drug candidate to a receptor protein. Thus, the different label-free bioanalytical methods should provide access not only to structural information, but also to information on thermodynamics (e.g. affinity), kinetics (e.g. association and dissociation rate constants), or even reaction mechanisms or dimerization of reactants.

This review will discuss in detail several methods that can be used for investigating small molecule–protein interactions. Advantages, disadvantages, and requirements will be emphasised, and notable examples of the respective method will be presented.

In-silico (computational) methods

In-silico methods have probably progressed faster than any other analytical method in recent decades, driven by the exponential growth in computing power described by Moore’s law [37], which has now held for almost 50 years.

In silico covers a wide variety of methods, but we will focus on those which can be used to perform small molecule–protein virtual screening.

Virtual screening for pharmaceutically interesting compounds has become very attractive in recent years because, on average, it takes 14 years for a new drug (e.g. a small molecule) to progress from development to market [38], during which time the developing company spends approximately 1 billion USD [39].

There are three approaches to performing virtual screening for interesting small-molecule compounds or small-molecule fragments: structure-based drug design (SBDD), ligand-based drug design (LBDD), and sequence-based design. Normally, two or more approaches are combined when a pure virtual screening is performed. Which approach will be most promising for a given problem will depend on which information is available regarding the protein (3D structure, or only sequence) and the interacting small ligands.

During SBDD, the small ligand is docked into a protein of known 3D structure in an energetically favourable pose, as illustrated in Fig. 2. Many programs and algorithms can be used for SBDD, including Dock [41], FlexX [42], GOLD [43], GLIDE [44], SLIDE [45], LigandFit [46], FRED [47], Surflex [48], and AutoDock [49].
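
The core idea of SBDD docking, searching ligand poses and keeping the one with the lowest energy score, can be illustrated with a deliberately crude toy model. The sketch below is not how any of the programs listed above work: it assumes a hypothetical six-atom "pocket", a rigid two-atom ligand, a plain 12-6 Lennard-Jones score with arbitrary parameters, and a random search over translations only (no rotations, no flexibility, no electrostatics or solvation).

```python
import math
import random

# Toy "binding pocket": six hypothetical atoms on a ring of radius 4 Å in the
# z = 0 plane. Coordinates are in Å and are not taken from any real structure.
POCKET = [(4 * math.cos(k * math.pi / 3), 4 * math.sin(k * math.pi / 3), 0.0)
          for k in range(6)]
# Toy rigid ligand: two atoms 1.5 Å apart, centred on the origin.
LIGAND = [(0.0, 0.0, -0.75), (0.0, 0.0, 0.75)]


def pair_energy(r, sigma=3.5, epsilon=0.2):
    """12-6 Lennard-Jones energy (arbitrary units); r is clamped to avoid 1/0."""
    sr6 = (sigma / max(r, 0.1)) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)


def score(pocket, ligand):
    """Sum of pairwise energies between all pocket and ligand atoms."""
    return sum(pair_energy(math.dist(p, q)) for p in pocket for q in ligand)


def translate(ligand, shift):
    """Rigidly move every ligand atom by the same shift vector."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in ligand]


def random_search_dock(pocket, ligand, n_trials=5000, box=6.0, seed=0):
    """Place the rigid ligand at random positions in a cube of half-width `box`
    around the pocket and keep the lowest-energy pose (translations only)."""
    rng = random.Random(seed)
    best_pose, best_score = None, math.inf
    for _ in range(n_trials):
        shift = tuple(rng.uniform(-box, box) for _ in range(3))
        pose = translate(ligand, shift)
        s = score(pocket, pose)
        if s < best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score


pose, energy = random_search_dock(POCKET, LIGAND)
print(f"best score {energy:.2f} with ligand atoms at", pose)
```

The best pose ends up near the centre of the ring, where every pocket atom sits close to the Lennard-Jones minimum; real docking programs replace the random search with smarter optimisers and the toy score with physically or empirically derived scoring functions.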

Fig. 2
figure 2

Docking of a small molecule to a protein. Virtual docking (using AutoDock Vina [40]) of a small molecule (4-OH-tamoxifen, anti-cancer drug) to the binding pocket of the human estrogen receptor alpha. The different simulated conformations of the small molecule are shown in different colours (yellow, violet, and cyan)

Ligand-based drug design is possible even when the 3D structure of the protein is unknown, provided that a group of small ligands is known to interact with that protein [50]. The basic principle is to identify similarities within this group of ligands and use them as molecular descriptors, which can then be used to predict other small molecules that may bind to the protein [51].
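
One common way to turn "similarity to known ligands" into a number is to encode each molecule as a binary fingerprint and compare fingerprints with the Tanimoto coefficient. The sketch below assumes fingerprints are already available as sets of "on" feature bits; the fingerprints themselves and the candidate names are hypothetical, and max-similarity ranking is only one of several possible LBDD scoring schemes.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(actives: list[set[int]], candidates: dict[str, set[int]]):
    """Score each candidate by its highest similarity to any known active ligand."""
    scores = {name: max(tanimoto(fp, act) for act in actives)
              for name, fp in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature bit.
KNOWN_ACTIVES = [{1, 2, 3, 4}, {2, 3, 5, 6}]
CANDIDATES = {"cand_A": {1, 2, 3, 7}, "cand_B": {8, 9, 10}, "cand_C": {2, 3, 5, 6}}

ranked = rank_by_similarity(KNOWN_ACTIVES, CANDIDATES)
print(ranked)
```

This prints `[('cand_C', 1.0), ('cand_A', 0.6), ('cand_B', 0.0)]`: candidates sharing many features with a known binder are prioritised for experimental testing.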

Sequence-based drug design is probably the most ambitious approach. For this technique, neither the 3D structure of the protein nor any small molecule that can interact with the protein is known [52].

All three approaches require no laboratory equipment, chemicals, or laboratory space, and, depending on the number of small molecules and proteins to be screened, they may not need specialised hardware or specially trained personnel. However, the quality of the results depends strongly on the quality of the experimental input data (e.g. X-ray structures) and on the skills and knowledge of the person running the simulations, and even more on the critical assessment of the generated data.

There are several interesting applications and examples of the use of virtual screening. Recently, SBDD led to the discovery of two small molecules able to block the Ras–Raf interaction [53], which plays an important role in cancer development. Virtual screening reduced 40,882 candidates to 97, which were then screened using cell-based methods. One compound, Kobe0065, was found to inhibit the interaction with an affinity in the μmol L−1 range. A subsequent computer-assisted similarity search of approximately 160,000 candidates yielded 273 small-molecule inhibitor candidates, which were again validated experimentally and yielded a second successful inhibitor, Kobe2602. This is an example of the successful combination of in-silico methods (here SBDD) with other analytical methods, including NMR (to determine the mechanism by which Kobe0065 functions) and cell-based assays.

LBDD was used very early on to predict the affinities of polychlorinated biphenyls (PCBs) and polychlorinated dibenzodioxins (PCDDs) for the aryl hydrocarbon receptor (AhR), which is of great interest in xenobiotic metabolism but has no known 3D structure [54]. More recently, LBDD, and quantitative structure–activity relationship (QSAR) modelling in particular, has been used not only to predict affinities and find new small-molecule ligands for proteins, but also to predict other properties of molecule classes, for example toxicity [55].
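
In its simplest form, a QSAR model is a regression of a measured activity on one or more molecular descriptors. The sketch below fits a single-descriptor linear model by ordinary least squares; the training data are synthetic values generated from a known line (not measurements), and the descriptor is left unspecified. Real QSAR models typically use many descriptors, validation sets, and applicability-domain checks.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic training set generated from activity = 0.8 * descriptor + 1.2;
# these are not measured values.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.8 * x + 1.2 for x in train_x]

slope, intercept = fit_linear_qsar(train_x, train_y)
query = 3.5  # descriptor value of a hypothetical, untested compound
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"predicted activity={slope * query + intercept:.2f}")
```

Once fitted, the model predicts the activity (or, as noted above, another property such as toxicity) of compounds that have not been tested, from their descriptor values alone.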

Sequence-based drug design has been used to identify new binders in a library of 191,407 small molecules. The objective was to find small molecules that interact with four potentially interesting targets (GPR40, SIRT1, p38, and GSK-3beta). The hits from the virtual screen were then validated by cell-based assays (GPR40), fluorescence (SIRT1), or optical sensors (SPR; p38 and GSK-3beta). This screening identified and experimentally validated one novel binder for GPR40, five for SIRT1, two for p38, and one for GSK-3beta, which may be the first successful application of sequence-based virtual screening to this kind of problem [56].

Biosensors

Despite their different transduction principles (piezoelectric, electrochemical, or optical), all sensors share certain properties. They usually require one of the interacting substances (here either the protein or the small molecule) to be immobilized on a surface to create the sensing element. Binding to this element is then read out using a suitable transduction principle and the related electronics.

The process of immobilizing one of the reactants could be regarded as attaching a very large label to one of the reactants, especially when the small molecule is immobilized on the surface. Because of this, the sensor section of this review will almost exclusively focus on approaches and examples where the protein is immobilized, and the small molecule is detected in a direct test format (Fig. 3).

Fig. 3
figure 3

Schematic sensorgram. A typical sensorgram (upper) and the corresponding processes in the heterogeneous phase (lower), comprising five phases. 1: baseline, only buffer flows over the surface; 2: the small molecule is injected, changing the sensor signal; 3: equilibrium is reached, and the sensor signal is stable at a level different from the baseline; 4: buffer is injected again and the small molecule dissociates, returning the sensor signal to 5: baseline
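
The five phases of a sensorgram can be reproduced with the simplest kinetic model, 1:1 (Langmuir) binding, in which the response approaches equilibrium with the observed rate ka·C + kd during injection and decays with kd during the buffer wash. The sketch below assumes this model and mass-transport-free conditions; all rate constants, concentrations, and timings are hypothetical values chosen for illustration.

```python
import math

# Hypothetical 1:1 interaction parameters (illustrative, not from any study).
K_ON = 1e5      # association rate constant, M^-1 s^-1
K_OFF = 1e-2    # dissociation rate constant, s^-1
CONC = 1e-6     # small-molecule concentration during injection, M
RMAX = 60.0     # maximum response when all immobilised sites are occupied, RU
T_INJECT, T_WASH = 60.0, 360.0  # start of injection and start of buffer wash, s


def response(t):
    """Sensor response (RU) at time t for a 1:1 Langmuir model."""
    kobs = K_ON * CONC + K_OFF                  # observed rate during association
    r_eq = RMAX * CONC / (CONC + K_OFF / K_ON)  # equilibrium response, K_D = k_off / k_on
    if t < T_INJECT:                            # phase 1: buffer baseline
        return 0.0
    if t < T_WASH:                              # phases 2-3: association towards equilibrium
        return r_eq * (1 - math.exp(-kobs * (t - T_INJECT)))
    r_wash = r_eq * (1 - math.exp(-kobs * (T_WASH - T_INJECT)))
    return r_wash * math.exp(-K_OFF * (t - T_WASH))  # phases 4-5: dissociation to baseline


for t in (0.0, 90.0, 300.0, 400.0, 900.0):
    print(t, round(response(t), 2))
```

Fitting measured association and dissociation phases to such a model is how sensor-based techniques obtain the rate constants and K_D directly.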

Even when the protein rather than the small molecule is attached to the surface, the behaviour of the protein could be affected, just as happens during labelling. The difference is that little can be done to control the way a label attaches to a protein, whereas sensor scientists usually use biopolymers to immobilize the protein in a controlled way and to create an environment that does not affect its activity [57]. Such controlled surface chemistry is always necessary for label-free sensor techniques, because non-specific adsorption of molecules to the sensor surface must be minimized to discriminate between specific interactions and non-specific binding.

All sensor-based techniques need only a small amount of immobilised protein (a few μg). The amount of small molecule needed, in contrast, can be very high (up to the high μmol L−1 range), depending on the affinity of the small molecule–protein interaction.
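
Why weak interactions demand high small-molecule concentrations follows from the equilibrium occupancy of the immobilised protein, θ = C/(C + K_D), assuming simple 1:1 binding and a constant analyte concentration (as in a flow system). Rearranged, the concentration needed for a given occupancy is C = K_D·θ/(1 − θ). The K_D used below is a hypothetical example.

```python
def fractional_occupancy(conc, kd):
    """Fraction of immobilised protein occupied at equilibrium (1:1 binding, no depletion)."""
    return conc / (conc + kd)


def conc_for_occupancy(theta, kd):
    """Analyte concentration needed to reach fractional occupancy theta."""
    return kd * theta / (1 - theta)


# Hypothetical weak binder with K_D = 10 µmol/L.
kd = 10e-6
for theta in (0.5, 0.9):
    print(f"{theta:.0%} occupancy needs {conc_for_occupancy(theta, kd) * 1e6:.0f} µmol/L")
```

For this hypothetical binder, 50 % occupancy already requires 10 µmol/L and 90 % requires 90 µmol/L, i.e. concentrations in the high μmol L−1 range.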

Piezoelectric transduction

Quartz crystal microbalance (QCM) sensors use the piezoelectric effect of a quartz crystal. The resonance frequency of the crystal depends on its mass; for thin, rigid films, the frequency change is proportional to the added mass, as described by the Sauerbrey equation [58]. QCM with combined resonance frequency and dissipation measurements (QCM-D) is most widely used for analyte detection in the liquid phase. For a more detailed overview, refer to recent reviews [59, 60].
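
The Sauerbrey relation, Δf = −2 f0² Δm / (A √(ρq μq)), makes the mass dependence of the QCM signal explicit. The sketch below assumes a 5 MHz fundamental frequency (a common choice, not stated in the text) and the standard density and shear modulus of AT-cut quartz; the function names are illustrative. It reproduces the familiar sensitivity of about 17.7 ng cm⁻² per Hz.

```python
import math

RHO_Q = 2.648    # density of quartz, g cm^-3
MU_Q = 2.947e11  # shear modulus of AT-cut quartz, g cm^-1 s^-2


def sauerbrey_delta_f(areal_mass_ng_cm2, f0_hz=5e6):
    """Frequency shift (Hz) for a thin, rigid added mass per unit area (ng/cm^2).

    Sauerbrey: delta_f = -2 f0^2 (delta_m / A) / sqrt(rho_q * mu_q).
    """
    dm = areal_mass_ng_cm2 * 1e-9  # ng/cm^2 -> g/cm^2
    return -2 * f0_hz ** 2 * dm / math.sqrt(RHO_Q * MU_Q)


def mass_sensitivity(f0_hz=5e6):
    """Areal mass (ng/cm^2) that produces a 1 Hz decrease in frequency."""
    return math.sqrt(RHO_Q * MU_Q) / (2 * f0_hz ** 2) * 1e9


print(f"{mass_sensitivity():.1f} ng/cm^2 per Hz at 5 MHz")
print(f"{sauerbrey_delta_f(100.0):.1f} Hz for 100 ng/cm^2")
```

Because the shift scales linearly with bound mass, a monolayer of small molecules produces a much smaller shift than an equally dense layer of protein, which is the main limitation discussed below.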

The advantages of QCM compared with other techniques, and especially other sensor-based techniques, are its relatively simple technical setup and its relative cost-effectiveness. In contrast with optical sensors, QCM measurements can also be performed in non-transparent solutions, and the high solvent concentrations sometimes required in small-molecule screening do not disturb the measurements. Disadvantages are the limited high-throughput capability and the fact that the sensor signal depends directly on the mass of the detected molecule, which for a small molecule is by definition very small.

QCM can provide kinetic data for an interaction [61]. Nihara et al. investigated the kinetic rate constants of catalytic dextran elongation by a dextransucrase enzyme. Both the binding of the enzyme to immobilized dextran and the catalysed elongation process itself could be monitored kinetically. The reverse assay format, with the enzyme immobilized on the QCM, did not provide kinetic information: the molecular mass of the dextran acceptor was too small compared with that of the immobilized enzyme, so the frequency changes of the QCM were insufficient to determine rate constants.

Another notable recent example of QCM being used to investigate small-molecule binding to an immobilised protein is the binding of the antioxidant catechin (from green tea) to troponin C (a marker for cardiac failure) [62]. Tadano et al. showed that several green-tea catechin derivatives bind to immobilized troponin C subunits and calculated affinity constants of the interactions from the resulting frequency changes. For (−)-epigallocatechin gallate, the observed binding appears to agree well with NMR spectroscopy studies performed on troponin C peptides.

QCM with dissipation monitoring has been used to monitor conformational changes of the glycoprotein gp120 induced by exposure to small-molecule inhibitors [63]. This glycoprotein is found on the envelope of HIV-1 and is involved in viral entry into host cells. The QCM-D measurements revealed changes in the viscoelasticity of the layer on the sensor, indicating a conformational change in the gp120 immobilized on the surface. Although the results agree well with reference ITC experiments, the interaction with the small molecule was measured only indirectly, via the conformational change of the protein, and not directly via the mass change on the sensor surface upon binding of the small molecule.

Optical transduction

There are many optical transduction principles, and they vary greatly. Therefore, we will only briefly discuss the most popular techniques and their use in small molecule–protein interaction analysis.

The simplest way to investigate a small molecule–protein interaction is available when one of the binding partners has an intrinsic fluorophore, e.g. the fluorescent tryptophan residues of most proteins. This has been used for detailed biophysical analysis of the interaction of the enzyme FKBP-12 with rapamycin, a small molecule of interest for prostate-cancer treatment and immunosuppression. Using fluorescence intensity measurements, the authors could determine the energetic contribution of the interaction between a single residue and the rapamycin molecule [64]. It is debatable whether such techniques can be regarded as truly “label-free”, but no labelling step that could negatively affect the interaction is needed. However, light of a suitable wavelength is required to excite the fluorophore, and this irradiation has an unknown effect on the sample.

Other approaches are based on phase shifts (Mach–Zehnder interferometry, MZI [65]), waveguide grating biosensors (e.g. [66, 67]), photonic-crystal biosensors (e.g. [68, 69]), surface plasmon resonance (SPR) [70], or reflectometry (e.g. RIfS [71]). The technique with the best theoretical performance is probably MZI [65]. However, because it is probably also the most technically challenging, no successful applications of MZI to small molecule–protein screening have yet been reported.

Like MZI, SPR is an evanescent-field technique; it detects the change in refractive index near the sensor surface upon binding of an analyte to a sensitive layer. SPR-based techniques can determine kinetic constants directly and are compatible with a wide variety of surface chemistries [72]. Like QCM, SPR uses surfaces coated with gold (or, more rarely, another noble metal). One advantage of SPR is that the technique is very well accepted by the scientific community, especially for determining kinetic rate constants. As with QCM, a disadvantage of SPR is that the signal also correlates (although less directly than with QCM) with the size of the molecule to be detected, which here is a small molecule.
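
The size dependence of the SPR signal is commonly estimated with the theoretical maximum response, Rmax = (MW of analyte / MW of immobilised partner) × immobilisation level × binding stoichiometry, which assumes that the response is proportional to bound mass. The sketch below applies this relation to a hypothetical surface carrying 5000 RU of a 30 kDa protein; the analyte masses are the 4-hydroxytamoxifen value (about 387.5 Da, from its formula) and, for comparison, a protein of the same size.

```python
def theoretical_rmax(analyte_mw, immobilised_mw, immobilised_level_ru, stoichiometry=1.0):
    """Theoretical maximum SPR response (RU) for analyte binding to an immobilised partner.

    Assumes the response is proportional to bound mass, so the signal scales
    with the analyte/immobilised molecular-weight ratio.
    """
    return analyte_mw / immobilised_mw * immobilised_level_ru * stoichiometry


# Hypothetical surface: 5000 RU of an immobilised 30 kDa protein.
print(f"{theoretical_rmax(387.5, 30_000, 5000):.0f} RU for a 387.5 Da small molecule")
print(f"{theoretical_rmax(30_000, 30_000, 5000):.0f} RU for a 30 kDa protein analyte")
```

The small molecule can at most produce about 65 RU on this surface, compared with 5000 RU for a protein analyte, which is why small-molecule SPR measurements need high immobilisation levels and low-noise instruments.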

Sensors based on SPR have been used to screen for anti-prion compounds and to compare their inhibitory activity against abnormal prion protein formation in scrapie-infected neuroblastoma (ScNB) cells [73]. This study indicated that most of the anti-prion compounds tested interacted with, and had an affinity for, the recombinant domain of the prion protein. The SPR binding response to that domain correlated with anti-prion activity in ScNB cells.

Reflectometric techniques monitor not only the refractive-index change upon binding, as MZI and SPR do, but also the change in physical thickness. In contrast with QCM and SPR, the substrates are not gold-coated: glass or transparent polymers are most often used as transducer materials, although silicon can also be used depending on the detection technique and wavelength. Therefore, a wide variety of surface chemistries can be used [74]. Like SPR, reflectometric techniques have the disadvantage that the size of the analyte affects the magnitude of the sensor signal. In addition, the technique has not yet reached the high degree of acceptance of SPR. However, because both the physical thickness and the refractive index are detected, reflectometric techniques are less sensitive to temperature fluctuations [75]. As with SPR, kinetic data can be obtained directly.

Sensors based on reflectometry have been used for fragment-based screening. Using biolayer interferometry (BLI), reproducible hits were observed for the target JNK1, which has been implicated in diabetes, and for the target eIF4E, an important modulator of disease progression in oncology. In addition to hits that overlapped with those from SPR and biochemical assays, some compounds were identified only by BLI [76].

Both SPR and reflectometric techniques offer a high degree of automation and the possibility of high throughput [77].

Determining both affinity and kinetic rate constants requires one of the reactants to be immobilised. When only the affinity is of interest, it can be determined in homogeneous phase. Two such methods are briefly explained below.

The first method of determining the affinity in homogeneous phase with optical sensors is an assay format rather than a separate method, and can therefore be applied to all the optical transduction techniques presented here (SPR [78] or reflectometry [79]). This rather old format is commonly known as the binding inhibition assay (the same principle as used, e.g., in KinExA [80]): the protein is pre-incubated with the small molecule in solution, and only protein with unoccupied binding sites binds to the sensor surface. Because the large protein, rather than the small molecule, generates the signal, this format circumvents the low signals obtained when small molecules are detected directly, but at the cost of losing kinetic information. The effect of the surface properties on the biological system can also be avoided [81].
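
In a binding inhibition assay, the quantity read out by the sensor is the free protein remaining after pre-incubation with the small molecule. A minimal sketch, assuming simple 1:1 binding in solution (K_D = [P][L]/[PL] with mass balance for both partners) and a surface that captures only protein with an unoccupied binding site, computes that free fraction; fitting such a curve to the measured signals yields K_D. The concentrations and K_D are hypothetical.

```python
import math


def free_protein(p_total, l_total, kd):
    """Free protein concentration for 1:1 binding P + L <-> PL in solution.

    Solves K_D = [P][L]/[PL] with mass balance; the complex concentration is
    computed with the numerically stable form of the quadratic root.
    """
    s = p_total + l_total + kd
    complex_conc = 2 * p_total * l_total / (s + math.sqrt(s * s - 4 * p_total * l_total))
    return p_total - complex_conc


# Hypothetical assay: 10 nmol/L protein pre-incubated with increasing
# small-molecule concentrations; K_D = 1 µmol/L.
P_TOTAL, KD = 10e-9, 1e-6
for l_total in (0.0, 1e-7, 1e-6, 1e-5, 1e-4):
    fraction = free_protein(P_TOTAL, l_total, KD) / P_TOTAL
    print(f"[L]total = {l_total:.0e} M -> free protein {fraction:.1%}")
```

The free fraction, and hence the sensor signal, falls to about half when the small-molecule concentration equals K_D (here the protein is far below K_D, so ligand depletion is negligible).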

The second possibility, thermophoresis [82], does not require an immobilisation step. The method uses an IR laser to create a local temperature gradient and monitors, via fluorescence, the directed movement of molecules along this gradient, which changes when the small molecule binds to the protein. Information provided by this method includes affinity, free energy, enthalpy, and entropy. Because the method relies on the intrinsic fluorescence of the protein or the small molecule, it cannot be applied to all protein–small molecule systems. As with the first method, information on kinetic rate constants is not available.

Electrochemical transduction

Electrochemistry offers a wide variety of possibilities for investigating small molecule–protein interactions. The most important are square-wave voltammetry and impedance measurements.

One main advantage of electrochemical transduction is that relatively simple and cost-effective equipment can be used, even by untrained users [83]; the best-known example is the blood-glucose meter. In contrast with other sensor-based techniques, the intensity of the observed signal usually does not depend strongly on the size of the analyte, which is useful for small-molecule detection.

A disadvantage of square-wave voltammetry is that the analyte must be redox active. This problem can be overcome by immobilizing a redox-active component on the sensing surface close to where the analyte binds. When a protein binds, this redox process is disturbed, and the decrease in signal correlates with the amount of protein bound. This procedure has been used successfully for, e.g., the detection of TNT [84].

Since its invention in the late 1970s, the patch-clamp technique has been widely used in electrophysiology, where ion channels in cells are studied by measuring the ionic current through them. Impressive results have been obtained from studying voltage-gated calcium or sodium channels in relation to pain therapy [85, 86], and from measuring their inhibition by small molecules [87].

Impedance was first used relatively early (more than 25 years ago) to detect cholinergic agents via their binding to immobilised acetylcholinesterase [88]. More recently, the affinities of small molecules for the amyloid beta peptide, which plays an important role in Alzheimer’s disease, have been determined by impedance spectroscopy [89].

Although the small molecule does not need to be redox active for impedance spectroscopy, this technique was rarely used in small-molecule research until recent years, and it is now more widely used for monitoring whole cells exposed to small molecules than for monitoring isolated proteins [90–92].

Monitoring whole cells has advantages and disadvantages, which are the same for each transduction principle associated with cells. These advantages and disadvantages will be discussed in detail in the “Cell-based methods” section.

Isothermal titration calorimetry (ITC)

Calorimetry is one of the oldest techniques in analytical (and bioanalytical) chemistry and has been used in numerous applications; a detailed overview of current applications is given in [93]. In particular, the advantages of calorimetric methods have attracted much interest in drug discovery and drug design [94].

As shown in Fig. 4, ITC directly measures the heat of interaction between two reactants, e.g. a protein and a small molecule. The main advantage of calorimetric methods for investigating protein–small molecule interactions is the amount of information generated per experiment: enthalpy, entropy, affinity, specific heat capacity, and stoichiometry can be obtained in one experiment. Because this information is generated in homogeneous phase and without labelling, it can be assumed to be very accurate compared with equivalent information obtained by other methods. Additionally, the molecular weight of the reactants does not affect the strength of the signal. The basic principle of the method is explained in Fig. 4; for a more detailed explanation of the working principle see [95].
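
The link between the fitted ITC quantities and the remaining thermodynamic parameters is standard: ΔG = RT ln K_D (equivalently −RT ln K_A, with K_D expressed relative to a 1 mol/L standard state) and ΔG = ΔH − TΔS. The sketch below derives ΔG and TΔS from a K_D and ΔH; the input values are hypothetical fit results, not data from any study cited here.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def itc_thermodynamics(kd_molar, dh_kj, temp_k=298.15):
    """Derive ΔG and TΔS (both in kJ/mol) from a fitted K_D (mol/L) and ΔH (kJ/mol).

    Uses ΔG = RT ln K_D (= -RT ln K_A) and ΔG = ΔH - TΔS.
    """
    dg = R * temp_k * math.log(kd_molar) / 1000.0
    t_ds = dh_kj - dg
    return dg, t_ds


# Hypothetical fit result: K_D = 10 nmol/L, ΔH = -40 kJ/mol at 25 °C.
dg, t_ds = itc_thermodynamics(10e-9, -40.0)
print(f"ΔG = {dg:.1f} kJ/mol, TΔS = {t_ds:.1f} kJ/mol")
```

For these hypothetical values, ΔG ≈ −45.7 kJ/mol and TΔS ≈ +5.7 kJ/mol, i.e. binding that is mainly enthalpy-driven with a small favourable entropic contribution.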

Fig. 4
figure 4

ITC experimental setup and titration curve. Schematic drawing of the experimental setup for ITC measurements (left). The ITC instrument consists of two cells: one contains the protein of interest (sample cell), and the other lacks it (reference cell). A small molecule is injected stepwise into the protein solution. As the interaction takes place, heat is generated at each injection (upper right). The titration curve (lower right) is obtained by integrating the peak of each injection. Thermodynamic data are then determined by further data evaluation

However, this method has disadvantages. Experiments usually take several hours, with limited potential for automation or high-throughput screening. The amounts of reactants required are usually quite high; this is not a problem for the small molecule, but the availability of protein is limited in some cases. Other limitations are determined by intrinsic properties of the measured system, namely the product of affinity and protein concentration (the Wiseman constant), which is dominated by the affinity and should be approximately 10–100 in an optimum case. These disadvantages are being addressed by recent developments that reduce the volume of the measurement cells to a few hundred μL, which reduces the amount of protein needed to 10 μg [96]. Another technique, continuous isothermal titration calorimetry (cITC) [97], uses continuous injection of the small molecule, which provides more data points and thus reduces both the statistical error and the time needed per experiment. For high-affinity systems, the limiting Wiseman constant may be circumvented by varying experimental conditions, for example pH or temperature, to reduce the affinity of the system [98]. A more elegant way of circumventing this limitation is displacement titration, in which a lower-affinity small molecule is pre-incubated with the protein and then displaced by titrating the higher-affinity small molecule [99].
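
The Wiseman constant is usually written c = n·K_A·[protein]. The sketch below shows why a very tight binder falls outside the useful window and how displacement titration brings it back. It uses the simplified competitive relation K_A,app = K_A,strong / (1 + K_A,weak·[weak ligand]), which assumes that the weak competitor is present in large excess; all affinities and concentrations are hypothetical.

```python
def wiseman_c(ka, protein_molar, n=1.0):
    """Wiseman parameter c = n * K_A * [protein] (K_A in L/mol)."""
    return n * ka * protein_molar


def apparent_ka_displacement(ka_strong, ka_weak, weak_conc):
    """Apparent association constant of the strong ligand when the protein is
    pre-incubated with a weaker competitor (competitor assumed in large excess)."""
    return ka_strong / (1 + ka_weak * weak_conc)


PROTEIN = 10e-6   # hypothetical 10 µmol/L protein in the sample cell
KA_STRONG = 1e10  # hypothetical high-affinity ligand (K_D = 100 pmol/L)
KA_WEAK = 1e6     # hypothetical weak competitor (K_D = 1 µmol/L)

c_direct = wiseman_c(KA_STRONG, PROTEIN)
c_displacement = wiseman_c(apparent_ka_displacement(KA_STRONG, KA_WEAK, 1e-3), PROTEIN)
print(f"direct titration: c = {c_direct:.0f}")
print(f"displacement with 1 mmol/L competitor: c = {c_displacement:.0f}")
```

Direct titration gives c = 100,000, far too steep a curve to fit an affinity, whereas pre-incubation with the hypothetical competitor lowers the apparent affinity so that c falls to about 100; the true affinity of the strong ligand is then recovered from the known affinity and concentration of the competitor.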

Recent examples of using ITC to investigate small molecule–protein interactions also include membrane proteins, which are usually difficult to handle. ITC revealed that binding of glycine, an agonist of the glycine receptor, is driven by enthalpy, whereas binding of strychnine, an antagonist of the same receptor, is driven by entropy [100].

Using displacement titration calorimetry, Biela et al. determined the very high binding affinities (picomolar to nanomolar range) of different peptidomimetics to the thrombin receptor. In this series, the hydrophobicity of the investigated peptidomimetics was increased stepwise (Gly, D-Ala, D-Val, D-Leu, and D-Cha), with a parallel increase in ligand affinity. Determination of ΔH° enabled calculation of the entropic contributions, revealing an entropic term that grew with the stepwise increase in the hydrophobicity of the substituents. Thus, the results support “the classical understanding of the hydrophobic effect, being mainly entropy driven in nature and resulting from the release of firmly fixed water molecules from a well-hydrated binding pocket” [101].
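The step from a measured ΔH° to the entropic term follows from the standard relations ΔG° = −RT ln Ka and ΔG° = ΔH° − TΔS°. The sketch below applies them to hypothetical values (Kd = 1 nM, ΔH° = −30 kJ mol−1, 25 °C, 1 M standard state); these are not data from the cited study, and the function name is illustrative.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def thermo_signature(kd_molar: float, dh_kj: float, temp_k: float = 298.15):
    """Split the binding free energy into enthalpic and entropic terms (kJ/mol).

    dG = -RT ln Ka = RT ln(Kd / 1 M); -TdS = dG - dH.
    """
    dg = R * temp_k * math.log(kd_molar)
    minus_t_ds = dg - dh_kj
    return dg, dh_kj, minus_t_ds


dg, dh, minus_t_ds = thermo_signature(1e-9, -30.0)
print(f"dG = {dg:.1f}, dH = {dh:.1f}, -TdS = {minus_t_ds:.1f} kJ/mol")
```

A negative −TΔS° term, as in this example, indicates a favourable entropic contribution, the kind of contribution the study above attributes to the release of ordered water. Comparing this decomposition across a ligand series shows whether gains in affinity come from ΔH° or from −TΔS°.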

ITC in combination with NMR and X-ray crystal-structure analysis helped to explain the binding mechanism of small peptide inhibitors to a serine protease. The structures of the peptide inhibitors were varied on the basis of the structure of Upain1 (CSWRGLENHRMC). Modifications at the N-terminus reduced the affinity tenfold, whereas modifications at the C-terminus did not affect binding affinity. ITC measurements revealed that binding of all peptides was driven by enthalpy, whereas the entropic contribution opposed binding because the peptides adopt a more restrained conformation upon binding. N-terminal and C-terminal modifications both reduced the binding enthalpy, but for N-terminal modifications the unfavourable entropy overcompensated the enthalpic contribution, causing the reduction in affinity. C-terminal modifications did not change the binding entropy [102].

Mass spectrometry (MS)

MS is the method of choice for detection and identification of unknown molecules. For analytes ranging from small molecules to nanoparticles, the method is usually sensitive, fast, has low sample consumption, can be very well automated, and can be used for high-throughput approaches.

The basic principle of the method remains the same for investigating small molecule–protein interactions: instead of a single molecule, the small molecule–protein complex is ionised and then detected [103].

However, there are some challenges associated with using MS for this purpose, because the protein should remain in its native state and be minimally affected by ionisation. In the worst case, ionisation can completely disrupt the interaction between the small molecule and the protein. This can be especially problematic for hydrophobic interactions, because no water is present in the gas phase and some additives strengthen ionic and electrostatic forces while weakening hydrophobic and van der Waals forces. Another challenge is that the buffer must be compatible with both the protein and the ionisation technique; otherwise, peak broadening and “ion suppression” will result [104]. In most cases, finding a compatible buffer is not possible. A related challenge is that the mass shift caused by the bound small molecule is sometimes too small, relative to the much larger mass of the protein, to discriminate clearly between the free protein and the small molecule–protein complex.
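The resolution problem can be made concrete with an idealised calculation. For an [M + zH]z+ ion, the spacing between the apo-protein and the 1:1 complex on the m/z axis is m_ligand/z, so the resolving power needed to separate them, (m/z)/Δ(m/z), is roughly M_protein/m_ligand and hardly depends on the charge state. The sketch assumes protonated positive ions and ignores adducts and salt-induced broadening, which in practice widen native-MS peaks further; the masses, charge states, and function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz(mass_da: float, charge: int) -> float:
    """m/z of a protonated ion [M + zH]^z+."""
    return (mass_da + charge * PROTON_MASS) / charge


def resolving_power_needed(protein_da: float, ligand_da: float, charge: int,
                           n_bound: int = 1) -> float:
    """Idealised m/dm needed to separate the apo peak from the complex with n_bound ligands."""
    apo = mz(protein_da, charge)
    holo = mz(protein_da + n_bound * ligand_da, charge)
    return apo / (holo - apo)


# Hypothetical 150 kDa protein versus a 15 kDa protein, each with a 300 Da ligand
for protein_mass, z in [(150_000.0, 30), (15_000.0, 8)]:
    rp = resolving_power_needed(protein_mass, 300.0, z)
    print(f"{protein_mass / 1000:.0f} kDa, z = {z}: resolving power ~ {rp:.0f}")
```

For the same ligand, a tenfold larger protein needs roughly a tenfold higher resolving power, which is why the apo form and the complexes of large proteins with small ligands are hard to distinguish.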

Once these challenges have been overcome, it is possible to determine the binding stoichiometry and even the affinity very rapidly and with low sample consumption (less than a femtomole) [105]. Probably the most promising ionisation method is electrospray ionisation (ESI) [106].

Using ESI-MS, it has been possible to investigate the affinity and the mechanism of enzymatic cooperativity of multimeric proteins [107]. In this study, nanoESI-MS was used to reveal the allosteric mechanism of binding of different inhibitors to fructose-1,6-bisphosphatase (FBPase), a potential target in type 2 diabetes. FBPase is a tetrameric enzyme consisting of four identical subunits. In the nanoESI mass spectrum, formation of a non-covalent complex with four bound ligands was observed. States with one, two, or three bound ligands could not be resolved, because the mass of the small molecule is low compared with that of the protein. For a ligand of higher molecular weight, two states were detected, with two and with four ligands bound. The Hill coefficient was determined by titration, confirming positive cooperativity.
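Once the fraction of occupied binding sites θ has been estimated at several ligand concentrations (in native MS, for example, from the relative intensities of the ligand-bound states), the Hill coefficient n_H is the slope of the Hill plot, log[θ/(1 − θ)] against log[L]; n_H > 1 indicates positive cooperativity. The sketch below fits that line by ordinary least squares to synthetic data generated with n_H = 2 and K = 1 µM. It is a generic illustration with hypothetical function names, not the evaluation used in the cited study.

```python
import math


def hill_fraction(ligand: float, k_half: float, n_h: float) -> float:
    """Fractional saturation from the Hill equation."""
    return ligand**n_h / (k_half**n_h + ligand**n_h)


def fit_hill(ligand, theta):
    """Least-squares fit of log10(theta/(1-theta)) = n_H*log10[L] - n_H*log10(K).

    Returns (n_H, K). Only points with 0 < theta < 1 can be used.
    """
    xs = [math.log10(c) for c in ligand]
    ys = [math.log10(t / (1.0 - t)) for t in theta]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n_h = sxy / sxx
    intercept = y_mean - n_h * x_mean
    k_half = 10 ** (-intercept / n_h)
    return n_h, k_half


# Synthetic titration (molar concentrations) with positive cooperativity
conc = [0.2e-6, 0.5e-6, 1e-6, 2e-6, 5e-6]
theta = [hill_fraction(c, 1e-6, 2.0) for c in conc]
n_h, k_half = fit_hill(conc, theta)
print(f"n_H = {n_h:.2f}, K = {k_half:.2e} M")
```

With noise-free synthetic data the fit recovers the generating parameters exactly; real titration data would scatter around the line, and points close to θ = 0 or θ = 1 carry large uncertainty on the log scale.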

ESI-MS has also been used in fragment-based screening for drug discovery [108]. A fully automated, high-throughput nanoESI-MS method was established to screen hundreds of potential drug candidates in a short time. One hundred and fifty-seven phenylpyrazole-derived compounds were screened against the anti-apoptotic protein Bcl-xL, which is involved in tumour progression.

Another interesting application is the coupling of affinity columns to MS. With this approach it was possible to find new ligands for nuclear receptors [109].

Nuclear magnetic resonance (NMR)

NMR is still mainly used to determine the structures of molecules [110, 111]. However, with ongoing developments in instrumentation, this technique is now very well suited for investigating the interaction between small molecules and proteins or even larger structures. Rademacher et al. used several NMR experiments to study norovirus infection [112, 113]. They investigated the interaction of histo-blood group antigens (HBGAs), as small molecules, with virus-like particles (VLPs) that replicate the norovirus surface. A library of 500 fragments was screened to identify binders to human norovirus VLPs. These studies revealed α-L-fucose to be essential for binding to the VLPs, and delivered high-avidity binders that are potent inhibitors of norovirus infection. Beyond virus-infection processes involving VLPs, whole cells can also be investigated [114, 115]. An NMR study of living cancer cells was published in 2011 [116]. Integrins, which are transmembrane receptors, are involved in tumour-cell proliferation, migration, and survival; therefore, integrin antagonists are used for cancer therapy. For NMR studies of potential antagonists, bladder-cancer cells expressing integrin receptors were investigated in non-deuterated buffer suspensions. This study demonstrates the potential of investigating small ligands interacting with membrane-bound proteins in the environment of a whole cell.

Other potentially interesting information can be gained, e.g. about aggregation states of the investigated proteins (important in, e.g., Alzheimer’s research) [117], and about protein dynamics [118, 119] and stability. More specialised NMR reviews provide more detailed insight [120–123].

When investigating the interaction between a small molecule and a protein by NMR, the change in chemical shift of protein signals upon binding of the small molecule can be monitored. For example, 15N-HSQC-NMR experiments were used to identify the region in which a drug candidate interacts with the binding groove of the oncoprotein BCL6 [124]. This compound was selected by computer-aided drug design because of its potential to disrupt BCL6 activity by blocking its interaction with a corepressor; such loss of function can kill cancer cells. The same 2D-NMR experiment was used to reveal that the azobenzene compound ischemin binds to the CREB-binding protein at the acetyl-lysine binding pocket of its bromodomain [125]. Ischemin suppresses cardiac myocyte apoptosis during ischemia. Such 2D-NMR experiments (HSQC, HMBC, etc.) can help to identify the protein region with which the small molecule interacts, but they are time-consuming and their signals are challenging to interpret. Alternatively, the signals of the small molecule interacting with a substantially larger protein can be monitored, and several fast one-dimensional experiments (STD, WaterLOGSY) can provide sufficient information. In these experiments, the nuclear Overhauser effect (NOE) can be used to transfer magnetization from the protein to the small molecule.
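Chemical-shift perturbations from 15N-HSQC titrations are often combined into a single value per residue, Δδ = √(ΔδH² + (α·ΔδN)²), where α scales the wider 15N shift range down to that of 1H. The value α = 0.14 used here is one commonly used choice, but several weightings are in use. The sketch flags residues whose perturbation exceeds the mean plus one standard deviation, which is likewise only one of several common cut-off conventions. Residue labels, shifts (ppm), and function names are hypothetical.

```python
import math
import statistics


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted 1H/15N chemical-shift perturbation in ppm."""
    return math.sqrt(d_h**2 + (alpha * d_n) ** 2)


def perturbed_residues(free: dict, bound: dict, alpha: float = 0.14):
    """Return (residues above mean + 1 SD, per-residue CSP) for (1H, 15N) shift pairs."""
    csp = {}
    for residue, (h_free, n_free) in free.items():
        h_bound, n_bound = bound[residue]
        csp[residue] = combined_csp(h_bound - h_free, n_bound - n_free, alpha)
    values = list(csp.values())
    cutoff = statistics.mean(values) + statistics.stdev(values)
    hits = sorted(res for res, value in csp.items() if value > cutoff)
    return hits, csp


# Hypothetical (1H, 15N) shifts in ppm without and with ligand
free = {"A10": (8.20, 120.0), "L11": (7.95, 118.5), "R12": (8.40, 121.3),
        "Y13": (8.10, 117.8), "G14": (8.55, 109.2)}
bound = {"A10": (8.21, 120.1), "L11": (7.96, 118.4), "R12": (8.65, 122.8),
         "Y13": (8.30, 118.9), "G14": (8.56, 109.2)}
hits, csp = perturbed_residues(free, bound)
for residue, value in csp.items():
    print(f"{residue}: {value:.3f} ppm")
print("above cut-off:", hits)
```

With only five residues the statistical cut-off is crude: here Y13 also shifts noticeably but falls below it. Cut-offs are therefore usually derived from the whole assigned spectrum, and the flagged residues are mapped onto the protein structure to locate the binding site.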

Three notable techniques for investigating small molecule–protein interactions are: transferred nuclear Overhauser effect spectroscopy (NOESY), saturation transfer difference spectroscopy (STD), and water-ligand observed via gradient spectroscopy (WaterLOGSY).

The underlying principle of investigating small molecule–protein interactions with NOESY is the short rotational correlation time of small molecules tumbling freely in solution. When a protein is added and the small molecules bind to it, their effective correlation time increases dramatically, and the 2D 1H-1H NOESY spectrum changes accordingly: the weak NOEs of the free ligand are replaced by strong transferred NOEs of opposite sign. In this way, up to 96 different small molecules can be investigated at once when the method is used for fragment-based screening [126].

Like NOESY, STD is based on the exchange of magnetisation between the bound and unbound small molecule [127, 128]. An important variable is the saturation frequency, which must saturate the protein resonances and should therefore differ from the resonance frequencies of the small molecule. The experiment is described in Fig. 5. Saturation is transferred from the protein to bound small molecules, so the signals of small molecules that bind to the protein decrease as a result of spin relaxation [129, 130]. The experiments are usually performed with a 100- to 1000-fold excess of the small molecule, depending on the molecular weight of the protein [113].

Fig. 5

Comparison of a normal NMR spectrum (1H) with STD and WaterLOGSY spectra. Example spectra of a binder (orange) and a non-binder (blue) depending on the technique used. In the upper spectra, both ligands produce a signal. Using WaterLOGSY (middle), the signal of the non-binder becomes negative (negative NOE). For STD experiments (lower), the spectrum shown is obtained by subtracting the spectrum recorded during protein saturation from the normal 1H spectrum

STD has recently been used for screening and qualitative ranking of drugs in complex biological systems [123, 131–133], determination of affinity constants [134–136], and group epitope mapping [137]. STD-NMR observation of the time-dependent hydrolysis of piperacillin catalysed by penicillin-binding proteins (PBPs) revealed several hydrolysis products [138]. Surprisingly, one of these piperacillin derivatives, (5S)-penicilloic acid ((5S)-PA), bound to the PBPs even though it no longer has a β-lactam ring. The complex of PBP with (5S)-PA was confirmed by STD-NMR, supporting crystallographic observations. Fragment-based screening for inhibitors of human thymidylate synthase (hTS) was also successfully performed using STD [139]. A library of 420 fragments was screened, and pairs of ligands binding to proximal sites in the cofactor-binding pockets of hTS were identified. The fragment hits helped to identify novel non-canonical leads with excellent binding efficiencies for hTS. Inhibition of hTS is used in anti-cancer therapy and has recently been considered for treating infectious disease. These examples show the usefulness of ligand-based NMR both for initial screening and for more in-depth investigation of drug candidates.
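For affinity ranking and group epitope mapping, STD intensities are commonly converted into an amplification factor, A_STD = (I0 − Isat)/I0 × ([L]/[P]), where I0 and Isat are the signal intensities in the reference and saturated spectra and [L]/[P] is the ligand excess. The proton with the largest A_STD is then set to 100 % and the others are expressed relative to it. The sketch applies this convention to hypothetical intensities for three ligand protons; the proton labels and function names are illustrative.

```python
def std_amplification(i_ref: float, i_sat: float, ligand_excess: float) -> float:
    """STD amplification factor: fractional saturation transfer times ligand excess."""
    return (i_ref - i_sat) / i_ref * ligand_excess


def group_epitope_map(intensities: dict, ligand_excess: float) -> dict:
    """Relative STD effects (%) normalised to the most strongly saturated proton."""
    factors = {h: std_amplification(i0, isat, ligand_excess)
               for h, (i0, isat) in intensities.items()}
    strongest = max(factors.values())
    return {h: 100.0 * a / strongest for h, a in factors.items()}


# Hypothetical (reference, saturated) intensities at a 50-fold ligand excess
signals = {"H-aromatic": (100.0, 88.0), "CH3": (300.0, 285.0), "H-1": (50.0, 49.0)}
epitope = group_epitope_map(signals, ligand_excess=50.0)
for proton, percent in epitope.items():
    print(f"{proton}: {percent:.0f} %")
```

Protons with high relative STD effects are taken to be in closest contact with the protein surface. Because A_STD also depends on the saturation time and the ligand excess, such maps are best compared within one experimental set-up.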

WaterLOGSY is based on the same principles as NOESY and STD, but, as the name implies, water molecules have an important function in the magnetisation transfer: bulk water is magnetised, and this magnetisation is then transferred to bound small molecules. The signals of bound and unbound molecules have NOEs of opposite sign and can thus be discriminated (see Fig. 5). The experiments are usually performed with an excess of small molecule of no more than 100 equivalents. WaterLOGSY is therefore more suitable than STD when the small molecule has low solubility. WaterLOGSY has been successfully used for screening and affinity measurements [140], for studying interactions between small molecules and nucleic acids [141], and for epitope mapping. In one such study, the solvent accessibility of bound ligands was investigated to map ligand orientations [142]. These experiments were performed with ligands for two dehydrogenases (AKR1C3 and HSD17β1); even for ligands buried in deeper binding pockets, some parts of the molecule gave slightly different signal intensities, revealing the ligand orientation. In addition, the use of DMSO as co-solvent and its ability to transfer magnetisation were successfully investigated, enabling study of ligands that are poorly soluble in water. In summary, WaterLOGSY can help to design new compounds when crystal structures of the protein–ligand complexes are lacking, and it is complementary to STD-NMR.

There are additional advantages of NMR compared with other methods discussed within this review. Compared with X-ray diffraction, NMR has the advantage that structural information gained regarding the interaction complex is closer to the natural state of the protein [143, 144], because NMR experiments are performed in aqueous solution and the protein does not have to be crystallized. Therefore, NMR enables interaction studies on proteins which cannot be crystallized [145]. Another advantage is that the information gained is also on a molecular level, providing a very deep insight into the underlying principles of the interaction [145, 146]. In contrast with, e.g., biosensor applications, especially optical biosensors, the size (molecular weight) of the small molecule does not affect the strength of the signal being detected [142].

On the other hand, there are some disadvantages. A major one is the time needed per experiment when the chemical shifts of the protein are monitored to identify the binding domain [147–149]. This is caused by the mass-dependent transverse, longitudinal, and cross-relaxation and by the large increase in the rotational correlation time of large molecules [121, 144, 150]. In addition, the amount of protein needed can be quite high: standard NMR experiments need concentrations of approximately 1 mmol L−1 in a sample volume of 500 μL. NMR techniques for monitoring ligand–protein interactions can use concentrations as low as 50 μmol L−1 in a 100 μL sample volume [144], or even a few hundred nmol L−1 [151].
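The difference in protein consumption follows from simple arithmetic, mass = c × V × M. The sketch below assumes a hypothetical 25 kDa protein; the concentrations and volumes are those quoted in the paragraph above, and the function name is illustrative.

```python
def protein_mass_mg(conc_molar: float, volume_l: float, mw_g_per_mol: float) -> float:
    """Protein mass (mg) needed for one sample: c * V * M, converted from g to mg."""
    return conc_molar * volume_l * mw_g_per_mol * 1000.0


MW = 25_000.0  # hypothetical 25 kDa protein (g/mol)
standard_sample = protein_mass_mg(1e-3, 500e-6, MW)      # ~1 mmol/L in 500 uL
interaction_sample = protein_mass_mg(50e-6, 100e-6, MW)  # 50 umol/L in 100 uL
print(f"standard: {standard_sample:.2f} mg, interaction screening: {interaction_sample:.3f} mg")
```

For this protein the two set-ups need about 12.5 mg and 0.125 mg per sample, a 100-fold reduction that matters when the protein is difficult to produce.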

X-ray diffraction

X-ray diffraction is best known as the standard technique for determining molecular structures. It can resolve chemical structures ranging from a few Da [152] up to a theoretically unrestricted molecular weight [153], and it can resolve complexes of different molecules. The latter makes it well suited for investigating small molecule–protein interactions.

The basic principle of X-ray diffraction has changed little since its development over 60 years ago [154], although it has been extended and varied. An X-ray source (alternatively, electrons or neutrons can be used [155–157]) produces a beam, which is diffracted by the sample of interest. The diffraction pattern is then used to reconstruct the 3D structure of the sample, which usually must be in a crystalline state.
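The geometric relation behind the diffraction pattern is Bragg's law, nλ = 2d sin θ. Rearranged for the smallest lattice spacing that still diffracts, d_min = λ/(2 sin θ_max), it gives the nominal resolution limit of a data set. The sketch assumes Cu Kα radiation (λ ≈ 1.5418 Å) and hypothetical maximum scattering angles; for protein crystals, the achievable resolution is usually limited by crystal quality rather than by geometry. The function name is illustrative.

```python
import math

CU_K_ALPHA = 1.5418  # Angstrom, a common laboratory X-ray wavelength (assumed here)


def bragg_d_spacing(two_theta_deg: float, wavelength: float = CU_K_ALPHA, order: int = 1) -> float:
    """Lattice spacing d (Angstrom) from Bragg's law n*lambda = 2*d*sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))


for two_theta in (20.0, 60.0, 90.0):
    print(f"2theta = {two_theta:.0f} deg -> d = {bragg_d_spacing(two_theta):.2f} Angstrom")
```

Larger scattering angles correspond to smaller spacings, so the outer reflections carry the high-resolution information needed to place a bound small molecule precisely.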

When using X-ray diffraction to investigate small molecule–protein interaction, there are several possibilities:

  • The small molecule and the protein can be co-crystallized.

  • The protein can be crystallized alone and the crystal can later be soaked in a small-molecule solution.

  • The protein can be crystallized with a low-affinity ligand and later be soaked in a higher-affinity ligand.

Each of these methods of preparing the sample has advantages and disadvantages, but, as with NMR, structural information on both the protein and the small molecule is obtained [158]. Crystallizing the protein alone and soaking it with small molecules later can be regarded as the most resource-efficient method, because the crystallization process has to be optimized only once, and it is even possible to compare the structure of the apo-protein with those of the small molecule–protein complexes. However, soaking may destroy the crystal entirely, or it may lead to false-negative results when there is not enough structural flexibility within the crystal for the small molecule to bind [159]. Co-crystallization has the advantage that a small molecule–protein complex is often more stable than the apo-protein. Additionally, the crystallization process can be considered closer to the native state, because the complex forms in the liquid phase. This method is less suitable for low-affinity interactions, because the high concentrations of small molecule required may disrupt the crystallization process [160].

All crystallization methods provide the 3D structure of the protein, its binding pocket, and the small molecule, meaning there are no false-positive results. Very low-affinity interactions can also be investigated: dissociation constants as high as 5 mmol L−1 are unproblematic [161].
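Why weak binders need high ligand concentrations in soaking or co-crystallization can be seen from the fractional occupancy of a single 1:1 site, θ = [L]/(Kd + [L]), which holds when the free ligand concentration is approximately the total (ligand in large excess over protein). Solving for [L] gives [L] = Kd·θ/(1 − θ). The sketch uses Kd = 5 mmol L−1, matching the weakest affinity quoted above, and an arbitrary 90 % occupancy target; the function names are illustrative.

```python
def occupancy(ligand_molar: float, kd_molar: float) -> float:
    """Fractional occupancy of a single site, assuming free ligand ~ total ligand."""
    return ligand_molar / (kd_molar + ligand_molar)


def ligand_for_occupancy(kd_molar: float, target: float) -> float:
    """Free ligand concentration needed to reach a target occupancy (0 < target < 1)."""
    return kd_molar * target / (1.0 - target)


kd = 5e-3  # 5 mmol/L, a very weak binder
needed = ligand_for_occupancy(kd, 0.9)
print(f"ligand needed for 90 % occupancy: {needed * 1e3:.0f} mmol/L")
print(f"occupancy at 10 mmol/L: {occupancy(10e-3, kd):.2f}")
```

Concentrations of tens of mmol L−1 often approach the solubility limit of the ligand or require co-solvents, one reason why such conditions can disturb crystallization.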

On the other hand, large amounts of very pure protein are needed, and the crystallization process can be challenging (or even impossible in the worst case) and time-consuming; therefore, throughput will always be limited [162]. However, the latest developments in robotics and lab automation have increased throughput, and X-ray crystallography can now also be used in primary screening [163].

X-ray diffraction has an essential role in several drug-development projects. Recently, structural information on crucial interactions enabled Certal et al. to optimise their phosphoinositide 3-kinase (PI3K) inhibitors for cancer treatment so that they are selective for the beta isoform. Because usable crystal structures of the beta isoform were lacking, they used structural information on other PI3K isoforms to address their specific problem [164].

The study of Murray et al. shows the potential of X-ray diffraction for fragment-based screening. They used high-throughput X-ray crystallography, in combination with other methods, to find inhibitors of heat-shock protein 90 (HSP90), which is associated with cancer. Twenty-six interesting fragments were discovered, and two lead series with high ligand efficiency were identified. The resorcinol lead was further optimized into a compound for clinical trials. To obtain high-quality X-ray structures, a combination of co-crystallization and soaking experiments was necessary [165].

Goudreau et al. identified a new inhibitor-binding site on the N-terminal domain of the HIV-1 capsid protein. The co-crystal structure of the target protein with a benzimidazole inhibitor agrees well with NMR studies indicating binding to a novel site, well removed from the two previously reported binding sites [166].

Cell-based methods

Cell-based methods emerged from advances in molecular biology and genetic engineering. They can be regarded as a link between somewhat artificial methods, which rely on isolated proteins and ligands, and whole-animal models. They are therefore closer to reality than fully artificial systems, but are easier to handle and require less intensive care than animal models. In addition, studies with cell cultures are generally considered ethically preferable to studies with animals.

Most commonly, a transgenic cell line is created by introducing new genetic material. This new genetic material is designed as a “reporter gene”, which forces the cell to generate a detectable signal (e.g. fluorescence, antibiotic resistance) when exposed to a specific stimulus, in our case the chemical stimulus of a small molecule (Fig. 6) [167]. The reporter gene can be introduced either temporarily (transient transfection) or persistently (stable transfection), the latter generating a “reporter cell line”. Either bacterial or eukaryotic cells can be used. Bacterial cells are usually easier to handle, but eukaryotic cells are closer to human metabolism, which is usually desirable when investigating small molecule–protein interactions in pharmaceutical research.
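Reporter-gene responses are frequently summarised by a concentration–response curve. A common choice is the four-parameter logistic (Hill-type) model, y = bottom + (top − bottom)/(1 + (EC50/x)^h), where y could be, for example, fold induction of the reporter signal. The sketch below evaluates this model and inverts it to estimate the concentration giving a chosen response. The parameter values and function names are hypothetical, and fitting the parameters to real data would normally be done with a nonlinear least-squares routine.

```python
def four_pl(x: float, bottom: float, top: float, ec50: float, hill: float) -> float:
    """Four-parameter logistic response at concentration x (same units as ec50)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


def concentration_for_response(y: float, bottom: float, top: float,
                               ec50: float, hill: float) -> float:
    """Invert the 4PL model; y must lie strictly between bottom and top."""
    if not min(bottom, top) < y < max(bottom, top):
        raise ValueError("response outside the curve's range")
    return ec50 * ((top - bottom) / (y - bottom) - 1.0) ** (-1.0 / hill)


# Hypothetical fold-induction curve: baseline 1, maximum 11, EC50 = 100 nM, Hill slope 1
params = dict(bottom=1.0, top=11.0, ec50=100e-9, hill=1.0)
for conc in (10e-9, 100e-9, 1e-6):
    print(f"{conc * 1e9:.0f} nM -> {four_pl(conc, **params):.2f}-fold")
ec20 = concentration_for_response(3.0, **params)  # 20 % of the response range
print(f"EC20 ~ {ec20 * 1e9:.0f} nM")
```

Summarising each compound by EC50 and maximal response allows compounds acting through the same receptor to be ranked, although, as discussed below, the assay cannot distinguish a parent compound from an active metabolite.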

Fig. 6

Reporter gene assay. Upon translocation of the small molecule into the genetically modified cell, the small molecule binds to a receptor. This leads to a signalling cascade, which forces the cell to produce a detectable signal depending on which genetic material was used to generate the transgenic cell

One of the biggest advantages of reporter-gene assays performed with reporter cell lines is also their biggest disadvantage. As well as detecting a single small-molecule species, these assays can detect whole substance classes, including their metabolites, that share the same effect (without necessarily being structurally similar). This behaviour is often referred to as “effect-directed analysis” (EDA) [168]. Consequently, it is not possible to determine whether a small molecule itself has the observed effect or must first be metabolised to have it.

It is even possible to use genetically unmodified cells and monitor their “behaviour” (e.g. morphology, growth, and changes in refractive index caused by second-messenger activity) when they are exposed to small molecules. A comprehensive overview of methods that provide deeper insight into morphology can be found in [169]. In particular, the high degree of automation available for flow cytometry and microscopy makes these technologies ideally suited for multiparameter phenotypic profiling. Monitoring of the growth of human T-cell leukaemia (Jurkat) and human hepatocellular carcinoma (HepG2) cells to evaluate the cytotoxicity of the water-soluble fraction of biodiesel and its blends is reported in [170]. The authors describe assays to detect changes in mitochondrial membrane potential and compare their results with the detection of apoptosis. They also use a third technology, real-time impedance measurement, to monitor changes in cell behaviour. This label-free readout agrees very well with the results of the other assays in the study. Automated impedance measurement is a well-established and validated technology for cell-based assays. An important strength of this technology is its ability to work with unmodified cells in settings as close as possible to in-vivo conditions, as recently demonstrated by monitoring of T-cell activation [171]. Similar assays can also quantify changes in refractive index. For example, the authors of [172] used a grating-coupler approach to develop a screening assay for human stem-cell lines that enables monitoring of dynamic mass redistribution within living cells.

Cell-based assays have also been extensively used to investigate GPCRs [173] and nuclear receptors [174, 175], and in combination with virtual screening for discovering new STAT3 inhibitors [176].

Conclusions and outlook

After discussing the advantages and disadvantages of both types of method (Table 1), we conclude that each is best suited to a specific purpose.

Table 1 Comparison of methods for label-free interaction analysis of small molecules with proteins

Methods with high-throughput capabilities, e.g. in-silico screening, mass spectrometry, biosensors, or cell-based systems, are probably best suited for primary screening. Hits from primary screening should then be investigated further using other methods. For a general understanding of how individual small molecules interact with their target protein, methods such as X-ray diffraction, NMR, or ITC are probably most promising.

Other factors that will affect the choice of method are its availability and the related costs. The specialisation and expertise of those using the equipment will also contribute to the decision.

It is possible to perform in-silico simulations without buying new lab instrumentation, because free software is available and simple docking experiments can be performed on standard personal computers. “Unfortunately, many chemoinformatic approaches simply overpromise and under-deliver and, therefore, do not improve productivity (and may even reduce it)” [177]. Furthermore, as stated in the in-silico section, if the models are not chosen and prepared carefully, the simulations may still produce results, but these results will not be reliable. However, this problem is not confined to computational methods; a poorly performed experiment using any method may produce unreliable results. In-silico screening is unmatched in terms of throughput but, as with any method, experiments must be planned and performed carefully.

The other methods also have their unique features, advantages, and disadvantages. NMR and X-ray diffraction yield structural information. There may be workarounds for obtaining limited structural information, for example on conformational changes, by using other methods (e.g. biosensors [178]), but they will not equal the detailed insights obtained using NMR or X-ray diffraction. However, the choice between NMR and X-ray diffraction depends on the preferences of the researcher and the problem to be tackled. For example, X-ray diffraction does not place a limit on the size of the molecules to be investigated and provides better spatial resolution, whereas NMR is faster, can be applied to a larger number of samples, and requires no crystallization of the protein.

For extensive characterization of the thermodynamics of small molecule–protein interaction, ITC is the method of choice. Some variables, including affinity, can be determined using other methods, e.g. NMR, biosensors, or MS, but in many cases ITC provides more reliable and more complete information than any other method.

In contrast, biosensor measurements enable both the determination of affinity and detailed insights into binding kinetics. However, one of the reactants must be immobilized while maintaining its natural activity.
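The link between the kinetic and the equilibrium information from a biosensor is the relation KD = koff/kon for simple 1:1 binding; the reciprocal of koff is often reported as the residence time of the complex. The sketch below uses hypothetical rate constants and illustrative function names to show that two compounds with identical affinity can differ strongly in kinetics, which an equilibrium measurement alone would not reveal.

```python
def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant (M) from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on


def residence_time_s(k_off: float) -> float:
    """Mean lifetime of the complex, 1/k_off, in seconds."""
    return 1.0 / k_off


# Two hypothetical compounds with the same KD but different kinetics
compounds = {"fast on/off": (1e6, 1e-2), "slow on/off": (1e4, 1e-4)}
for name, (k_on, k_off) in compounds.items():
    print(f"{name}: KD = {kd_from_rates(k_on, k_off):.1e} M, "
          f"residence time = {residence_time_s(k_off):.0f} s")
```

Both compounds have KD = 10 nM, but the complex of the second lasts about 100 times longer, a difference that can matter for the duration of a drug's effect.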

At some point, pharmaceutical screening must progress beyond standard analytical methods. Cell- or animal-based systems can provide information complementary to that from the previously mentioned methods (e.g. pharmacokinetics), and are mandatory before clinical trials of a pharmaceutical.

We conclude that for serious research into small molecule–protein interactions, a combination of at least two (or preferably more) complementary methods covering different aspects is required. For example, in-silico approaches can be regarded as a useful supporting tool because simulated results can be experimentally validated, and the experimental results can be used to refine and improve the in-silico models.

The ultimate objective would be one method that provides the most information while being closest to an in-vivo situation and offering high throughput. However, no such method exists, and every method has its disadvantages. Recent developments in all label-free methods focus on overcoming these individual disadvantages while strengthening intrinsic advantages. In particular, methods that yield detailed information but suffer from low throughput (e.g. NMR, X-ray, or ITC) profit most from lab automation and robotics, as shown by the recent development of these techniques [179, 180]. Together with automation, the trend towards miniaturization not only increases throughput but also minimizes sample consumption [181]. This is of great importance, because some samples (mainly proteins) are difficult and therefore expensive to produce. In addition, miniaturization and parallelization further increase the throughput of sensors [182, 183]. Improving the limit of detection remains one of the major challenges of label-free sensing and is the objective of recent developments. Although sensors already work under conditions very close to the physiological state, improvements in assay design bring models closer to the in-vivo system. The next logical step is combining living cells with label-free methods; the additional information gained may greatly improve these methods [184], especially when primary cell lines or cancer-cell lines are combined with complementary methods, for example sensors.

The need for a sophisticated combination of multiple methods will remain the driving force behind any new development in label-free small molecule–protein interaction analysis in the immediate future.