Introduction

In modern drug discovery, protein–ligand and protein–protein docking play an important role in predicting the orientation of a ligand bound to a protein receptor or enzyme, using shape and electrostatic interactions to quantify the fit. Van der Waals interactions also play an important role, in addition to Coulombic interactions and the formation of hydrogen bonds. The sum of all these interactions is approximated by a docking score, which represents the binding potential. In the simplest rigid-body systems, the ligand is placed in the binding site by searching a six-dimensional space of rotations and translations, and a well-fitting ligand can serve as a lead compound for drug design (Alberg and Schreiber 1993).
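
As a rough illustration of how such a score can be assembled, the following minimal sketch sums a Lennard-Jones 12-6 term (van der Waals) and a Coulomb term over all ligand–receptor atom pairs. It is not the scoring function of any program named in this review: the function names, the single shared `epsilon`/`sigma` pair, and the constant dielectric are assumptions made for illustration, and the hydrogen-bond term is omitted.

```python
import math

# Converts charges in units of e and distances in Å to kcal/mol.
COULOMB_CONST = 332.06

def pair_energy(r, qi, qj, epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r (Å).

    epsilon, sigma and dielectric are illustrative values, not fitted parameters.
    """
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONST * qi * qj / (dielectric * r)
    return lennard_jones + coulomb

def docking_score(ligand, receptor):
    """Sum pair energies over all ligand-receptor atom pairs.

    Each atom is a tuple (x, y, z, charge); lower scores mean better binding.
    """
    total = 0.0
    for xl, yl, zl, ql in ligand:
        for xr, yr, zr, qr in receptor:
            r = math.dist((xl, yl, zl), (xr, yr, zr))
            total += pair_energy(r, ql, qr)
    return total
```

In a rigid-body search, a function of this kind would be evaluated once per candidate rotation and translation of the ligand.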

The docking accuracy in a rigid-body approach is much greater for bound complexes than uncomplexed molecules (Shoichet and Kuntz 1991). Even though the observed structural changes between the bound and free forms are small, the difference in accuracy implies that the assumption of rigidity is not fully warranted (Totrov and Abagyan 1994). Also, the difference between the near native structures and others far from native cannot be distinguished, even with simple scoring functions such as measures of surface complementarity (Katchalski-Katzir et al. 1992), solvent accessible surface area (SASA) burial, solvation free energy, electrostatic interaction energy, or the total molecular mechanics energy (Shoichet and Kuntz 1991). Hence, the docking procedures were improved by several groups by allowing for receptor and ligand flexibility.

The entropy lost by a flexible ligand in its six rigid-body degrees of freedom within the anisotropic environment of the receptor, together with the change in its internal energy upon binding, can greatly affect the binding affinity. Introducing local minimization of a molecular-mechanics energy function such as in the CHARMM package yields only limited improvement (Brooks et al. 2009). Consequently, information regarding the binding site location before the docking process became very important for increasing docking efficiency. There are several cavity detection programs or online servers that can detect putative active sites within proteins, e.g., GRID (Goodford 1985), POCKET (Levitt and Banaszak 1992), SURFNET (Laskowski 1995), PASS (Putative Active Sites with Spheres) (Brady and Stouten 2000), and MMC (mapping macromolecular topography) (Mezei 2003).

The earliest reported docking methods were based on the lock-and-key assumption proposed by Fischer, stating that both the ligand and the receptor can be treated as rigid bodies and their affinity is directly proportional to a geometric fit between their shapes (Mezei 2003). Later, the “induced-fit” theory proposed by Koshland suggested that the ligand and receptor should be treated as flexible during docking (Hammes 2002; Koshland 1963). Each backbone movement affects multiple side chains in contrast to relatively independent side chains. Thus, the sampling procedure in a fully flexible receptor/ligand docking is of a higher order of magnitude in terms of the number of degrees of freedom than in flexible docking with a rigid receptor. Consequently, these flexible docking algorithms not only predict the binding mode of a molecule more accurately than rigid body algorithms, but also its binding affinity relative to other compounds (Verkhivker et al. 2000).

Over the last two decades, more than 60 different docking tools and programs have been developed for both academic and commercial use, such as DOCK (Venkatachalam et al. 2003), AutoDock (Österberg et al. 2002), FlexX (Rarey et al. 1996), Surflex (Jain 2003), GOLD (Jones et al. 1997), ICM (Schapira et al. 2003), Glide (Friesner et al. 2004), Cdocker, LigandFit (Venkatachalam et al. 2003), MCDock, FRED (McGann et al. 2003), MOE-Dock (Corbeil et al. 2012), LeDock (Zhao and Caflisch 2013), AutoDock Vina (Trott and Olson 2010), rDock (Ruiz-Carmona et al. 2014), UCSF Dock (Allen et al. 2015), and many others.

Although ligand placement strategies differ from one program to another, these programs can be broadly categorized, ranging from incremental construction approaches such as FlexX (Rarey et al. 1996) to shape-based algorithms (e.g., DOCK) (Kuntz et al. 1982), genetic algorithms (GOLD) (Jones et al. 1997), systematic search techniques (Glide, Schrödinger, Portland, OR 97201), and Monte Carlo simulations (LigandFit) (Venkatachalam et al. 2003). With the exception of GOLD, almost all current flexible ligand docking programs treat the receptor as rigid (Jones et al. 1997). These programs were evaluated to test their abilities in producing the correct binding mode of a ligand to its biological target and identifying the known compounds with top scores in virtual screening trials. To assess docking accuracy and binding mode, FlexX was initially evaluated on a set of 19 protein–ligand complexes, with a subsequent evaluation on a larger set of 200 complexes (Rarey et al. 1996). The docking accuracy of Glide was assessed by redocking ligands from 282 co-crystallized PDB complexes, while GOLD was validated on 100 and 305 complexes (Friesner et al. 2004; Jones et al. 1997). Further, LigandFit was reported for 19 protein–ligand complexes (Venkatachalam et al. 2003), while DOCK has been verified on several targets over the years (Bodian et al. 1993; Debnath et al. 1999; Shoichet et al. 1993). Both AutoDock and AutoDock Vina were calibrated using the same test set of 30 structurally known protein–ligand complexes with experimentally determined binding constants (Österberg et al. 2002; Trott and Olson 2010).

Among these programs, AutoDock Vina, GOLD, and MOE-Dock predicted top ranking poses with best scores. GOLD and LeDock were able to identify the correct ligand binding poses. Both Glide (XP) and GOLD predict the poses consistently with a 90.0% accuracy (Wang et al. 2016). It was also shown that GOLD produced higher enrichment factors than Glide in a virtual screening trial against Factor Xa, whereas Glide outperformed GOLD against the same target in a similar virtual screening trial. Overall, it was reported recently that these docking programs are able to predict experimental poses with root-mean-squared deviations (RMSDs) averaging from 1.5 to 2 Å (Bissantz et al. 2000; Dixon 1997). However, flexible receptor docking, especially backbone flexibility in receptors, still presents a major challenge for the available docking methods.

Rigid body docking

Rigid body docking produces a large number of docked conformations with favorable surface complementarity, which are then reranked using an approximate free energy. The fast Fourier transform (FFT) correlation approach (Katchalski-Katzir et al. 1992) systematically explores the space of docked conformations using electrostatic interactions (Mandell et al. 2001) or both electrostatic and solvation terms (Chen et al. 2003), but the potential is restricted to a correlation function form. Later, polar Fourier correlations were used to accelerate the search for candidate low-energy conformations (Ritchie and Kemp 2000). Other approaches, such as computer vision concepts (Wolfson and Nussinov 2000), Boolean operations (Palma et al. 2000), and genetic algorithms (Gardiner et al. 2001), were also used. The Fourier transform algorithm can also use spherical harmonic decomposition to accelerate the search over 3D rotational space, as in FRODOCK. To further improve FFT docking, atomic contact energy is added to estimate the desolvation energy in RDOCK, and an electrostatic correction is added in ZDOCK (Bissantz et al. 2000; Metropolis and Ulam 1949).
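
The core of the FFT correlation approach can be sketched in a few lines. The example below is a simplified illustration, not the code of any program cited here: it scores every circular translation of a ligand grid against a receptor grid in a single pass using the correlation theorem. The function names are hypothetical, the grids are plain NumPy arrays whose surface and interior weights are left to the caller, and the rotational search (repeating the calculation for each ligand orientation) is omitted.

```python
import numpy as np

def translation_scores(receptor_grid, ligand_grid):
    """Score every circular translation t of the ligand at once.

    scores[t] = sum over x of receptor[x] * ligand[x - t],
    computed via the correlation theorem instead of an explicit loop.
    """
    fa = np.fft.fftn(receptor_grid)
    fb = np.fft.fftn(ligand_grid)
    return np.real(np.fft.ifftn(fa * np.conj(fb)))

def best_translation(receptor_grid, ligand_grid):
    """Return the grid translation with the highest correlation score."""
    scores = translation_scores(receptor_grid, ligand_grid)
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index)
```

For an N×N×N grid this replaces an O(N^6) scan over translations with O(N^3 log N) work per orientation, which is why the scoring potential has to be expressible as a correlation.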

There are also other useful FFT-based rigid-body docking tools that do not rely on a 3D grid-based search, such as Hex (Ritchie and Kemp 2000; Ritchie and Venkatraman 2010). Hex uses spherical polar Fourier correlations for both rotational and translational space. Furthermore, Fourier transform-based algorithms have been accelerated with the help of advanced software packages, such as the 3D convolution library (Pierce et al. 2011), and new hardware technologies, such as the graphics processing unit (GPU) (Ritchie and Venkatraman 2010) and Cell BE processor (Pons et al. 2012). The docking program ‘DOT’ performs a systematic rigid-body search of one molecule, translating and rotating it about a second molecule. Finally, the intermolecular energy, the sum of electrostatic and atomic desolvation energies expressed as a correlation function, is computed efficiently with FFTs for all the generated configurations (Roberts et al. 2013).

MEGADOCK is similar to ZDOCK in that it generates docking conformations in a grid-based 3D space using an FFT. But MEGADOCK calculations are 8.8 times faster than ZDOCK due to a much simpler score function in which only shape complementarity and electrostatics are considered (Ohue et al. 2014a). Using these two programs, the core signal process in the bacterial chemotaxis pathway has been identified (Matsuzaki et al. 2014). Later, a soft docking approach in FFT was developed where the ligand and the receptor are considered as rigid bodies, and their conformational changes are calculated by allowing a certain degree of inter-protein penetration (Katchalski-Katzir et al. 1992). These domain–domain poses were also scored by binding energy and a pseudo-energy term based on restraints derived from linker and end-to-end distances in pyDockTET (tethered-docking).

The other programs include SOFTDOCK (Jiang and Kim 1991), BiGGER (Palma et al. 2000), and SKE-DOCK (Terashi et al. 2007). For the sake of matching efficiency, each grid point is given a value of ‘1’ when occupied by the protein or ‘0’ otherwise. This grid-based system is similar to FFT-based grid searches, except that it has simpler values on the grid. Although the affinity of a ligand–protein complex is determined mainly by complementary physicochemical features, shape complementarity became an essential part of rigid body docking programs (O’Sullivan et al. 1991). Apart from electrostatics, hydrophobic complementarity based on geometry was incorporated in the MolFit FFT program to calculate the interface of a protein–protein complex (Katchalski-Katzir et al. 1992). Recently, PIPER was developed to predict the mutual orientation of two proteins using a pairwise interaction potential between atoms i and j. The contributions to the scoring function are evaluated in discretized 6D space as the sum of terms representing shape complementarity, electrostatic, and desolvation energies. The structures obtained in PIPER are very close to their native conformations owing to an eigenvalue–eigenvector decomposition of the potential, which is the key to its efficient use (Kozakov et al. 2006).

However, these algorithms are not well suited for unbound crystal structures and yield many false positives far from the native complex, even though they have good surface complementarity. To improve in silico prediction further, F2Dock was developed, which also uses shape complementarity and scores based on Coulombic potentials. This program is also structured to incorporate the Lennard-Jones potential, and docking solutions were reranked based on desolvation energy. These contributions were shown to be effective in more than 70% of the cases in a bound–unbound complex. The lowest RMSD was improved by at least 0.5 Å for 45 bound–unbound complexes and less than 1 Å was seen for 27 bound–bound complexes (Bajaj et al. 2011). In fact, DOCK was one of the first programs that involved shape complementarity through a set of spheres in the determination of ligand–protein interactions. The volume occupied by the ligand depends on the diameter of the spheres inside the binding pocket of the protein (Kuntz et al. 1982). The initial orientation of the ligand inside the binding pocket is determined by a maximum clique detection method based on distance compatibility. However, the data can be accessed rapidly through geometric hashing by matching features in triplets. The features are represented in the form of spheres and are clustered as poses (Fischer et al. 1993). SDOCK performs global searches by incorporating the van der Waals attractive potential, geometric collision, screened electrostatic potential, and Lazaridis–Karplus desolvation energy into the scoring function. Structure flexibility was based on stepwise potentials that were generated from the corresponding continuous forms (Zhang and Lai 2011).

Cell-Dock also performs the global scan using the translational and rotational space of two molecules based on surface complementarity and electrostatics. A paramount difference with FTDock is that the value of the grid size is fixed in a number of cells that reflects grid cell resolution and total span in Angstroms (Pons et al. 2012). Furthermore, to reduce the size of molecules from large compound libraries, shape complementarity was introduced between ligand and protein in MS-DOCK to perform efficient multiple conformation rigid-body docking (Sauton et al. 2008). The contact surface between the ligand and the protein is further optimized by a Gaussian shape fitting function in FLOG (Miller et al. 1994), CLIX (Lawrence and Davis 1992), FRED (McGann et al. 2003), and PAS-Dock (Protein Alpha Shape-Dock) (Tøndel et al. 2006) to perform rigid body docking.

The TagDock toolkit produces macromolecular complexes from rigid monomers by generating randomly posed docked pairs (decoys) that agree with inter-monomer distance restraints determined experimentally by using a penalty for each decoy (Smith et al. 2013). Examples of other docking programs that use local shape featuring algorithms include LZerD (Venkatraman et al. 2009), PatchDock (Schneidman-Duhovny et al. 2005) and GAPDOCK (Gardiner et al. 2001). Geometric hashing algorithms also perform a global protein–protein docking using local shape descriptors, such as surface patches in PatchDock (Harrison et al. 2002) or 3D Zernike descriptors in LZerD (Venkatraman et al. 2009), between proteins. Recently, an integrated algorithm, MEMDOCK (Membrane Dock) was designed for docking within the membranes. The method models both side chain and backbone flexibility and performs rigid body optimization of the ligand orientation using modified Patchdock and Fiberdock (Hurwitz et al. 2016).

Accuracy of rigid body docking

Docking was considered successful if the docked pose of a ligand in its active site was within a given threshold of the X-ray solution. The DOCK program, applied to the aspartic protease of HIV, yielded a candidate inhibitor whose potency turned out to be several orders of magnitude too low for clinical use. However, this molecule can be used as a lead compound for the design of more potent inhibitors.
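
The success criterion can be made concrete with a short sketch. The code below computes the root-mean-squared deviation between a docked pose and a reference pose over matched atoms and compares it with a threshold. The function names and the 2.0 Å default are assumptions for illustration; the sketch also assumes that both poses share the receptor's coordinate frame and atom ordering, so no superposition is performed.

```python
import numpy as np

def rmsd(pose, reference):
    """RMSD (in the units of the input, e.g. Å) between two matched atom lists.

    pose and reference are sequences of (x, y, z) coordinates in the same
    frame and the same atom order; no alignment is applied.
    """
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("pose and reference must contain the same atoms")
    squared_distances = ((pose - reference) ** 2).sum(axis=1)
    return float(np.sqrt(squared_distances.mean()))

def is_successful(pose, reference, threshold=2.0):
    """True if the pose lies within `threshold` of the reference structure."""
    return rmsd(pose, reference) <= threshold
```

A pose displaced rigidly by 1 Å along one axis therefore has an RMSD of exactly 1 Å and passes a 2 Å threshold.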

Ring and coworkers designed inhibitors against proteases of schistosome and malaria parasites that are crucial to pathogenicity, using a shape-complementarity function and a simplified molecular-mechanics potential approximating the interaction energy between the protease and ligand (Ring et al. 1993). The DOT program successfully predicted the electron transfer complex of the positively charged cytochrome c to the negative region on the cytochrome c oxidase surface formed by subunit II (Roberts and Pique 1999). Out of 25 protein–protein complexes tested using the BiGGER program, 22 yielded near-native docked geometries with C(alpha) RMS deviations ≤4.0 Å from the experimental structures, of which 14 were found within the 20 top ranking solutions (Palma et al. 2000). With the omission of water molecules, the top-ranking solutions of the MolFit program using geometric and geometric-electrostatic docking identify clusters of nearly correct solutions with limited rotational freedom at the interface for disassembled and unbound structures (Heifetz et al. 2002). In round 1 of the CAPRI (Critical Assessment of PRediction of Interactions) experiments, GAPDOCK correctly predicted 17 of 52 interprotein contacts with target 1 and 27 of 52 contacts with target 2 compared to those obtained by other methods (Gardiner et al. 2003). Using PatchDock, 31 out of 35 examples were shown to have a lowest RMSD below 2 Å. In 26 cases, the correct poses were ranked first, whereas in the other nine cases, the correct solution was ranked among the first 30 conformations. However, SymmDock only predicts structures with cyclic symmetry. If the input monomers have a different symmetry in their native complex, SymmDock is not suitable for such a prediction (Schneidman-Duhovny et al. 2005).

With simple unbound–bound target cases, 47% of the interface contacts were correctly predicted by ZDOCK, demonstrating its strength in binding site prediction (Wiehe et al. 2005). INTELEF, an updated version of SOFTDOCK, predicted 66 correct solutions out of 83 with ranks in the top 2000 solutions (Li et al. 2007). Using the SKE-DOCK server, the smallest RMSDs between the model and experimental structures were 3.307 and 3.324 Å, respectively. In the docking step, the SKE-DOCK server generated acceptable models with a ligand RMSD of 10 Å or lower for five out of eight targets. For the remaining three targets, SKE-DOCK failed in the geometric docking because of improper conformations obtained during the docking step (Terashi et al. 2007). When considering only the cases that have at least one acceptable solution generated by ZDOCK, the success rates of pyDockTET for predicting an acceptable conformation in the top 10 and 50 solutions are 69% and 77%, respectively, whereas the success rates of pyDock alone are 62% and 69%, respectively (Cheng et al. 2008).
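
Top-10 and top-50 success rates of this kind are simple to compute once each benchmark case has a ranked list of solutions. The sketch below is a generic illustration, not the evaluation code of the cited studies: the function name is hypothetical, each case is represented as a list of RMSD values ordered from best to worst score, and the acceptance threshold is supplied by the caller.

```python
def top_n_success_rate(ranked_rmsds_per_case, n, threshold):
    """Fraction of cases with at least one acceptable solution in the top n.

    ranked_rmsds_per_case: one list per benchmark case, holding the RMSD of
    each solution in score order (best-scored solution first).
    """
    if not ranked_rmsds_per_case:
        raise ValueError("at least one benchmark case is required")
    hits = sum(
        1
        for rmsds in ranked_rmsds_per_case
        if any(value <= threshold for value in rmsds[:n])
    )
    return hits / len(ranked_rmsds_per_case)
```

Because the top-n window only grows with n, the success rate can never decrease as n increases, which is consistent with the higher top-50 figures reported for both pyDockTET and pyDock.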

Except in five known difficult cases (1BGX, 1I4D, 1SBB, 1HE8, 1IB1), several acceptable solutions have been found in almost all docking cases with RMSDL ≤10 Å or RMSDL ≤4 Å within 10,000 default predictions yielded by FRODOCK (Garzon et al. 2009). For 64% of acetylcholinesterase complexes, the shape complementarity identified by HEX overlaps with the native binding site (Wass et al. 2011). Further, Cell-Dock was tested on the unbound structures of protein–protein docking benchmark version 2.0 formed by 84 cases. In 89% of the cases using CELL-256 and in 85% of the cases when using CELL-128, the docking poses are nearer to native conformations. These results were also assessed by pyDOCK based on electrostatics, desolvation, and van der Waals energy. The scoring by pyDOCK showed a slightly better success rate with CELL-256 than with CELL-128. With CELL-256, 19.7% of the cases obtained near native conformations within the top 10 scoring solutions, whereas 18.3% of the cases showed near native conformations within the top 10 scoring solutions using CELL-128 (Pons et al. 2012). In both the cases, the differences were minimal and the values in general were similar to those achieved by pyDOCK when scoring FTDOCK models (Pons et al. 2010).

According to the latest CAPRI experiments carried out in 2013, the ClusPro server was best in automated protein docking equivalent to the best human predictor group. HADDOCK (de Vries et al. 2010), SwarmDock (Torchala et al. 2013), and PIE-Dock (Ravikant and Elber 2010) were the next best. In the human predictor category, HADDOCK (Dominguez et al. 2003) was given the first rank, followed by SwarmDock (Venkatraman and Ritchie 2012). ICM (Fernández-Recio et al. 2002) was ranked in the 2nd to 5th positions (Kozakov et al. 2013; Lensink and Wodak 2013). The predicted binding mode for the CCDC-Astex set of 85 diverse protein–ligand complexes is correct in approximately 80% of cases with rDock (Ruiz-Carmona et al. 2014). By incorporating the electrostatic term, MEGADOCK 2.1 successfully predicted at least one near-native decoy for 128 protein complexes in the bound set and 23 complexes in the unbound set in the top 100 scored decoys. When compared with ZDOCK 3.0, MEGADOCK 2.1 was less successful (Ohue et al. 2014b).

Flexible docking

In standard virtual docking studies, ligands are freely docked into a rigid receptor. However, it has become increasingly clear that side chain flexibility plays a crucial role in ligand–protein complexes. This flexibility allows the receptor to alter its binding site according to the orientation of the ligand. The ligand orients in a (6 + N)-dimensional space of translational, rotational, and conformational variables in the anisotropic environment of the receptor (Jackson et al. 1998; Moon and Howe 1991; Rotstein and Murcko 1993a, b; Nishibata and Itai 1993). Four different strategies are currently in use for docking flexible ligands, namely: (a) Monte Carlo or molecular-dynamics docking of complete molecules; (b) in-site combinatorial search; (c) ligand buildup; and (d) site mapping and fragment assembly.

Monte Carlo methods accept or reject random changes among the thermodynamically accessible states by using the Metropolis criterion (Metropolis and Ulam 1949). Moves that raise the energy are accepted with a probability that depends on the temperature T, which is lowered slowly in so-called simulated annealing (Kirkpatrick et al. 1983). The changes in conformation can be quite large, allowing the ligand to cross the energy barriers on the potential energy surface. This conformational search technique, combined with molecular affinity potentials, gives an efficient method for docking substrates to known structures (Goodsell and Olson 1990). Along with affinity potentials, distance constraints were added as soft potentials in simulated annealing (Yue 1990).
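
The acceptance rule and the cooling schedule can be sketched on a one-dimensional toy problem. The code below is an illustration only, not the implementation used by any docking program: the function names, the geometric cooling factor, the step size, and the use of a single scalar coordinate in place of a ligand pose are all assumptions. The temperature is expressed in energy units, so the Boltzmann constant is absorbed into it.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01, cooling=0.95,
                        steps_per_t=50, step=0.5, seed=0):
    """Minimize energy(x) over a scalar x with geometric cooling.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_t):
            candidate = x + rng.uniform(-step, step)  # random trial move
            candidate_e = energy(candidate)
            if metropolis_accept(candidate_e - e, temperature, rng):
                x, e = candidate, candidate_e
                if e < best_e:
                    best_x, best_e = x, e
        temperature *= cooling  # slow cooling
    return best_x, best_e
```

At high temperature most uphill moves are accepted, which lets the search cross barriers; as the temperature falls the walk settles into a low-energy basin.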

Examples of applying the Monte Carlo methods include an earlier version of AutoDock (Novotny et al. 1989), ICM (Internal Coordinate Mechanics) (O’Sullivan et al. 1991), QXP (quick explore) (Pellegrini and Doniach 1993), and Affinity (Ring et al. 1993). AutoDock 2.4 generates conformers in real space using Monte Carlo simulated annealing, with rapid energy evaluation on molecular affinity grids computed from common force fields (Leach 1994). ICM software generates the ligand in 3D grid space by Monte Carlo movements and minimization of interaction potentials. Using this software, the interactions between FNR (ferredoxin:NADP+ reductase) and its redox partners were modeled and their binding interfaces were predicted. The results obtained were highly similar to the FNR:Fd complexes of Anabaena and maize, showing good agreement between computation and experiment. QXP is a multistep docking program using a local Monte Carlo search with a restricted rotational angle (Pellegrini and Doniach 1993).

Recently, a newly designed and implemented version of the AutoDock program called AutoDock Vina has been released. This version abandoned the former empirical scoring function and GA-based optimizer, but adopted a new knowledge-based scoring function with a Monte Carlo sampling technique and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method for local optimization. Their simulation results showed a significant improvement in both prediction accuracy and docking time. PSOVina is the first PSO (particle swarm optimization) protein–ligand docking algorithm in the framework of AutoDock Vina (Ng et al. 2015). Through careful integration of Vina’s efficient local optimizer into the canonical PSO procedure and proper tuning of parameters, PSOVina achieved a remarkable execution time reduction of 51–60% without compromising the docking accuracies. In recent years, swarm intelligence algorithms have emerged as a fast and reasonably accurate technique in solving complex search problems in computer science. To date, there exists only a handful of swarm-based docking methods: SODOCK, a hybrid of PSO and Solis and Wets’ local search method (Chen et al. 2007); PLANTS, an ant colony optimization method (Korb et al. 2009); pso@autodock, a velocity adaptive and regenerative constricted PSO method (Namasivayam and Günther 2007); ParaDockS, a parallel docking suite having PSO as the optimization algorithm (Banitt and Wolfson 2011); and FIPSDock, the fully informed PSO method (Liu et al. 2013). Three of the programs were modifications of the popular open-source docking program AutoDock, albeit different versions, and all of them showed better predictive performance when compared to the original AutoDock implementation.
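
For readers unfamiliar with particle swarm optimization, the canonical procedure that these programs build on can be sketched as follows. This is a generic textbook-style PSO on an arbitrary objective, not the algorithm of PSOVina or any other program named above: the function name, the inertia and acceleration coefficients, and the box-constraint handling are assumptions, and no local optimizer is included.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.729, c1=1.494, c2=1.494, seed=0):
    """Minimize objective(x) inside box bounds [(lo, hi), ...] with a basic PSO.

    Returns the best position found and its objective value.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, lo.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal best positions
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()          # swarm best position
    gbest_val = float(pbest_val.min())
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia plus pulls toward each particle's best and the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = float(pbest_val.min())
            gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, gbest_val
```

In a docking setting the position vector would encode the ligand's translation, orientation, and torsion angles, and the objective would be the scoring function.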

Furthermore, a novel search method called QPSO-ls (quantum-behaved particle swarm optimization) was introduced for solving a highly flexible docking problem, which is a hybrid of quantum-behaved particle swarm optimization (QPSO) and a local search method of Solis and Wets (Fu et al. 2015). In another program called GalaxyDock, the receptor side chains were preselected and globally optimized using an AutoDock-based algorithm for flexible side-chain docking (Shin and Seok 2012). FLIPDock uses the AutoDock force field for generating multiple receptor conformations, termed as the flexibility tree (FT) (Zhao and Sanner 2007). Furthermore, ‘RosettaLigand’ uses a low-resolution docking in the initial step combined with translational and rotational adjustments (DeLuca et al. 2015). GOLD explores the flexibility of the ligand through the process of evolution by using a genetic algorithm and displaces loosely bound water on ligand binding (Jones et al. 1995, 1997). Later, a wide range of nuclear magnetic resonance (NMR) and available experimental as well as bioinformatics data was used to drive the docking process in HADDOCK (Dominguez et al. 2003).

Previous modeling studies on protein–DNA and protein–RNA complexes using NMR data have been shown to be successful (Gu et al. 2015; Bursulaya et al. 2003; Paul and Rognan 2002; Berman et al. 2002). The various degrees of conformational flexibility of DNA were sampled by the semi-flexibility of sugar-phosphate backbone and DNA base pairs for further docking calculations. In FTDOCK, the docking score is measured by rotating and translating the protein along the DNA using shape and electrostatic complementarity by approximate flexibility (Bruccoleri and Karplus 1990). Further rotamer libraries can be used to reduce the side chain placement problem to a combinatorial optimization problem with the minimum energy, i.e., the global minimum energy conformation (GMEC) (Kohlbacher and Lenhof 2000; Canutescu et al. 2003). One of these methods is based on the dead-end elimination (DEE) theorem of Desmet et al. (1992). Later, GMEC was investigated as the convex hull of all feasible solutions with some classes of facet-defining inequalities in a branch-and-cut algorithm. The side chain conformations generated by these techniques are then subjected to a geometry optimization with a molecular mechanics force field. Finally, the binding free energy of the optimized structure is estimated (Jackson and Sternberg 1995).
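
The original dead-end elimination criterion can be illustrated compactly. In the sketch below, a rotamer r at residue i is removed when its best-case energy (self energy plus the minimum pair energy with every other residue) is still higher than the worst-case energy of some alternative rotamer t at the same residue. The data layout (`self_e[i][r]` for self energies and `pair_e[(i, j)][r][s]` with i < j for pair energies) and the function names are assumptions made for this example; later, stronger criteria are not shown.

```python
def pair_energy(pair_e, i, r, j, s):
    """Pair energy between rotamer r of residue i and rotamer s of residue j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def dead_end_elimination(self_e, pair_e):
    """Repeatedly apply the single-rotamer DEE criterion until nothing changes.

    Returns, for each residue, the set of rotamer indices that survive.
    """
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best case for r: most favorable partner at every other residue.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in others)
                for t in sorted(alive[i] - {r}):
                    # Worst case for t: least favorable partner everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be part of the GMEC
                        changed = True
                        break
    return alive
```

Rotamers that survive define a smaller combinatorial problem that still contains the GMEC, which can then be handed to an exact or heuristic search.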

Further, algorithms were developed to build ligands directly in the binding site as part of a flexible-docking and design strategy. One of these was the de novo design of peptide inhibitors using a library of low-energy conformations of isolated amino acid residues as building blocks (Moon and Howe 1991). Subsequently, this method was extended to non-peptide ligand design using functional groups or single atoms in GroupBuild and LEGEND (Nishibata and Itai 1993; Rotstein and Murcko 1993a, b). Goodford introduced the idea of using functional groups (water, methyl group, amine nitrogen, carboxy oxygen, and hydroxyl) as molecular probes to map the binding site of a macromolecule (Goodford 1985). Thus, the energy contour surfaces for the various probes differentiate regions of attraction between the probe and protein. The procedure is well suited to multiple-copy techniques (Miranker and Karplus 1991). The goal of fragment-assembly approaches, pioneered by Lewis and Dean (1989a, b), is to connect the individual molecular fragments into a single viable molecule. The CLIX program attempts to make a pair of favorable interactions in the binding site of the protein with a pair of chemical substitutions (Lawrence and Davis 1992). LUDI places molecular fragments to form hydrogen bonds with the enzyme so that the hydrophobic pockets are filled. These fragments are then linked together with suitable spacers (Böhm 1992). The linked-fragment approach of Verlinde and coworkers is based on shape descriptors (Verlinde et al. 1992). Caflisch and coworkers used MCSS (multiple copy simultaneous search) against HIV protease to map a binding site and constructed peptide inhibitors by building bonds to connect the various minima they found (Caflisch et al. 1993). In HOOK, MCSS is also used in the mapping stage, but the minima are connected by a database of molecular scaffolds for possible connectors (Eisen et al. 1994). FlexX uses a tree-search technique for placing the ligand into the active site, incrementally starting with the base fragment (Rarey et al. 1996).

Unlike other docking programs, Glide performs a complete systematic search of the conformational, orientational, and positional space of the docked ligand with the OPLS-AA force field (Optimized Potentials for Liquid Simulations). The best possible conformation is further refined using Monte Carlo sampling (Friesner et al. 2004). SLIDE (‘Screening for Ligands by Induced-fit Docking, Efficiently’) optimization is based on the mean-field theory, balancing flexibility between the ligand and the protein side chains (Schnecke et al. 1998). Further, a surface-based molecular similarity method was implemented in Surflex (Jain 2003) to rapidly generate suitable putative poses for molecular fragments using the Hammerhead docking system (Jain 2003). In addition, a multi-objective docking strategy, MoDock, has been proposed to further improve the pose prediction with the available scoring functions divided into the following three types: force field-based, empirical-based, and knowledge-based. The results obtained indicate that the multi-objective strategy can enhance the pose prediction power of docking with the available scoring functions (Gu et al. 2015).

Accuracy of flexible docking

Initially, three different docking programs (Dock, FlexX, and GOLD), with six different scoring functions (Chemscore, Dock, FlexX, Fresno, Gold, Pmf score) were evaluated against thymidine kinase (TK) and estrogen receptor to measure the accuracy of virtual screening methods. Out of the three docking programs, GOLD showed 60% docking accuracy with less than 1.2 Å RMSD, including the worst docked orientation, with an RMSD of 3.1 Å. Surprisingly, both Dock as well as FlexX were not able to produce a reasonable solution for at least three TK ligands (IdU (5-iododeoxyuridine), hmtt (6-[6-hydroxymethy-5-methyl-2,4-dioxo-hexahydro-pyrimidin-5-yl-methyl]-5-methyl-1H-pyrimidin-2,4-dione), and mct ((North)-methanocarba-thymidine) for Dock; hmtt, ganciclovir, and penciclovir for FlexX). Furthermore, the best docking poses for ERα receptor with raloxifene, 4-hydroxytamoxifen, were obtained using GOLD and, to some extent, with FlexX. On the other hand, DOCK failed completely to predict a reliable pose for raloxifene. However, no relationship was found between the docking accuracy and ranking score with these programs (Bissantz et al. 2000).

In 2003, five docking programs, DOCK 4.0, FlexX 1.8, AutoDock 3.0, GOLD 1.2, and ICM 2.8, were assessed with a dataset of 37 protein–ligand complexes and by screening a compound library containing 10,037 entries against 11 different proteins. The results revealed that ICM provided the highest docking accuracy against these receptors, with a value of 0.93 compared to AutoDock, DOCK, FlexX, and GOLD, with acceptable accuracies of 0.47, 0.31, 0.35, and 0.52, respectively. In 17 cases, ICM predicted the original ligands within the top 1% of the total library screened with 50% of the potentially active compounds falling under ∼1.5% of top scoring solutions, while DOCK and FlexX predicted only ∼9% of potentially active compounds. It was also found that ∼46%, 30%, 35%, 46%, and 76% of the molecules were docked correctly within 2 Å RMSD by AutoDock, DOCK, FlexX, GOLD, and ICM, respectively (Bursulaya et al. 2003).
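
Virtual screening results of this kind are commonly summarized with an enrichment factor, which compares the hit rate in the top-scoring fraction of a library with the hit rate of the library as a whole. The sketch below shows one common definition; the function name and the rounding rule for the size of the top fraction are assumptions, and the cited studies may have used different conventions.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor at a given top fraction of a ranked library.

    ranked_is_active: sequence of booleans (or 0/1) in score order, best
    first, marking which compounds are known actives.
    fraction: size of the top slice, e.g. 0.01 for the top 1%.
    """
    n_total = len(ranked_is_active)
    hits_total = sum(ranked_is_active)
    if n_total == 0 or hits_total == 0:
        raise ValueError("library must contain at least one active compound")
    n_selected = max(1, int(round(n_total * fraction)))
    hits_selected = sum(ranked_is_active[:n_selected])
    # Hit rate in the top slice divided by hit rate in the whole library.
    return (hits_selected / n_selected) / (hits_total / n_total)
```

A value of 1 corresponds to random selection, and the maximum possible value at a given fraction is reached when every compound in the top slice is active.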

Furthermore, in 2004, eight docking programs were evaluated on 100 protein–ligand complexes (Paul and Rognan 2002) from the Protein Data Bank (PDB) (Berman et al. 2002). At an RMSD cutoff of 2 Å, 50–55% of the ligands were successfully docked using FlexX, Glide, GOLD, and Surflex, whereas the success rates of DOCK, FRED, SLIDE, and QXP did not exceed 40%. Using the protein-bound X-ray conformation, OMEGA was able to predict at least one conformation closer than 2 Å for 99% of 100 ligands. With random ligand conformations, the docking poses obtained with FRED were satisfactory, with RMSDs between 1.76 and 2.14 Å from the X-ray conformation. All these docking programs performed well with small hydrophobic ligands, while the performance of GOLD and Surflex remained roughly unchanged. Moreover, Glide and FRED remained efficient in ligand placement despite a poor ranking ability (Kellenberger et al. 2004).

In the same year, three highly regarded docking programs, namely Glide, GOLD, and ICM, were evaluated on a vertex dataset of 150 diverse protein–ligand complexes for their ability to reproduce crystallographic binding orientations. In 61% of the cases, Glide correctly identified the crystallographic pose within 2.0 Å, compared with 48% for GOLD and 45% for ICM (Perola et al. 2004). With regard to ligand complexity, all these docking programs performed well for ligands having ten or fewer rotatable bonds. However, LigandFit identified 75% of its close conformations when the ligands had ten or fewer rotatable bonds, while FlexX identified 69% for fewer than ten rotatable bonds, which increased to 92% if the ligand had 15 or fewer degrees of freedom. In contrast, Glide was less sensitive, with a 78% success rate for ligands of lower complexity (less than 15), while GOLD was the least sensitive of all (Kontoyianni et al. 2004).

Evaluation of known crystal structures of 40 zinc-dependent metalloproteinase–ligand complexes showed that the lowest energy conformations from GOLD and DrugScore had proper ZBG (zinc binding group) binding. However, DOCK, GOLD, and DrugScore produced RMSD values greater than 8 Å and improper ZBG binding, showing significant differences between the docked pose and crystal structures. In contrast, AutoDock and FlexX gave better results, with RMSDs of 2.91 and 2.63 Å and proper ZBG binding. If the RMSD limit is increased to 2.5 Å, the percentage of well-docked poses with good/fair ZBG binding increases to 90% for all five approaches (Hu et al. 2004). At the 2% level of predicting top-scoring molecules, Glide identified known active molecules for four of the five protein targets. However, at the 10 and 20% levels, Glide was the only program that identified one or more of the known active molecules for each of the five target proteins. The same 2% level of success was achieved by DOCKVISION and Glide when 5% of the top-scoring molecules were considered. All the other programs reached a level of success between 10 and 20%, identifying one or more active seeds for four of the five targets (Cummings et al. 2005).

Further results against 164 targets show that ICM and Glide produced the lowest average RMSDs relative to the native ligands, 1.08 and 2.37 Å, while GOLD and FlexX fared worse, with RMSDs of 2.80 and 3.98 Å, respectively. At the RMSD cutoff of 2.0 Å, ICM and Glide showed success rates of 91 and 63%, respectively, by classifying 149 out of 164 and 104 out of 164 compounds correctly within this threshold. GOLD also performed reasonably well, classifying 91 out of 164 (55%), while FlexX performed less well with 70 out of 164 (42%). ICM and Glide again performed well at the more stringent RMSD cutoff of 1.0 Å, correctly docking 93 out of 164 and 81 out of 164, for success rates of 57 and 49%, respectively. GOLD was successful in 64 out of 164 cases, for a success rate of 39%, and FlexX was successful with a rate of 26% (42 out of 164) (Chen et al. 2006). With eight protein targets, 50% of the ligands were placed well for five targets by at least one program. Indeed, 90% of the ligands could be docked with the correct orientation and 100% could be docked in the correct location for several protein targets (Warren et al. 2006). The RMSD-based evaluations against 116 complexes of 13 types revealed that no docking program was significantly superior to GOLD. For thirteen complexes, solutions with an RMSD of 2 Å or better were found only by GOLD, and no solution was found by either AutoDock or DOCK alone. The sizes of the binding sites for the complexes that were successfully solved only by GOLD were widely distributed, from 2253 to 7900 Å, and represented the various protein types (Onodera et al. 2007).
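
The success rates quoted above are simply the share of complexes whose docked pose falls within a chosen RMSD cutoff. As a hedged illustration in Python (the function name success_rate and the RMSD values are hypothetical, and treating a pose exactly at the cutoff as a success is an assumption, since the cited studies may use a strict inequality):

```python
def success_rate(rmsds, cutoff=2.0):
    """Return (number of successes, success rate in percent) for a list
    of per-complex RMSD values at the given cutoff.

    Assumption (illustrative only): a pose counts as a success when
    its RMSD is less than or equal to the cutoff.
    """
    if not rmsds:
        raise ValueError("at least one RMSD value is required")
    hits = sum(1 for value in rmsds if value <= cutoff)
    return hits, 100.0 * hits / len(rmsds)


# Hypothetical RMSDs (Å) for five complexes docked by one program.
example = [0.5, 1.0, 1.9, 2.0, 3.5]
print(success_rate(example, cutoff=2.0))  # (4, 80.0)
print(success_rate(example, cutoff=1.0))  # (2, 40.0)

# The same arithmetic applied to a count from the text:
# 149 of 164 complexes within 2.0 Å.
print(round(100.0 * 149 / 164))  # 91
```

Evaluating the same list at both 2.0 Å and 1.0 Å shows why rates drop at the more stringent cutoff: the set of successes at 1.0 Å is always a subset of the successes at 2.0 Å.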

Later, ten docking programs and 37 scoring functions were analyzed against seven protein types for binding mode prediction, lead identification by virtual screening, and lead optimization. Of these ten programs, Glide, GOLD, and QXP were successful in 61–63% of the cases at an RMSD cutoff of 1 Å. At this cutoff, docking was successful in only 48% and 54% of the cases with FlexX and Surflex, respectively. At an RMSD threshold of 2 Å, Glide, GOLD, Surflex, and QXP placed 80–90% of the ligands within 2.0 Å of the X-ray pose, compared with 66% and 62% of the cases for FlexX and FRED, respectively. Lastly, DOCK and SLIDE placed only 50% of the ligands within the 2 Å RMSD threshold. Studies also showed that GOLD performed well with hydrophilic targets where there is some lipophilic character in the active site (i.e., thermolysins and PPAR-γ). In contrast to GOLD, both LigandFit and Glide performed well with COX-2, a target with a mainly hydrophobic binding pocket (Kontoyianni et al. 2004).

Furthermore, seven commonly used programs were evaluated on the PDBbind database with 1300 protein complexes (Plewczynski et al. 2011). The results showed that Surflex, FlexX, LigandFit, eHiTS, and GOLD performed reasonably, with no more than 30 failed complexes, while for GOLD and eHiTS 60% of the complexes had top-scoring conformations below 2 Å. AutoDock failed to dock nearly 90 pairs, while only 1170 complexes (90% of the entire database) satisfied the Glide ligand restraints of 35 rotatable bonds and a ligand size of 200 atoms. Both the pose prediction and scoring capabilities of Glide, AutoDock, and Surflex achieved results of around 50%. The docking accuracy of LigandFit reached nearly 60% with a higher number of rotatable bonds, whereas medium and weak ones achieved only 50%. The level of correlation for hydrophobic molecules was 0.2, though the ligand–protein contacts were based on van der Waals and polar interactions.

Later, 19 docking protocols were used to predict bound conformations for 136 compounds of seven different targets (kinase, protease, isomerase, polymerase, synthetase, metalloprotease, and NHR) with available protein/ligand crystal structures. For all targets except HCVP, at least one program was able to dock 40% of the ligands within 2 Å of the crystal conformation. In 2010, four popular docking programs, Glide (version 4.5), GOLD (version 3.2), LigandFit (version 2.3), and Surflex (version 2.0), were evaluated on a test set of 195 protein–ligand complexes. Of these four programs, GOLD and Surflex processed the dataset well, while Glide and LigandFit failed to process 25 and 8 complexes, respectively. Except for Surflex, the programs produced docking solutions with an RMSD below 1.0 Å in 40% of the cases, whereas Glide and GOLD showed a 60% success rate on this highly diverse test set. Based on these results, the docking programs were ranked as Glide > GOLD + GoldScore > GOLD + ChemScore ∼ GOLD + ASP ∼ LigandFit > Surflex (Li et al. 2010).

Recently, ten docking programs were evaluated. The success rates for the top scored poses and the best poses ranged from 40% to 60% and from 60% to 80%, respectively, with success defined as an RMSD of less than 2 Å between the docked pose and the native pose. On the basis of the results for the top scored poses, the performance of the academic programs conforms to the following order: LeDock (57.4%) > rDock (50.3%) ∼ AutoDock Vina (49.0%) > AutoDock (PSO) (47.3%) > UCSF DOCK (44.0%) > AutoDock (LGA) (37.4%), and that of the commercial programs conforms to the following order: GOLD (59.8%) > Glide (XP) (57.8%) > Glide (SP) (53.8%) > Surflex-Dock (53.2%) > LigandFit (46.1%) > MOE-Dock (45.6%). The averaged success rates of the commercial docking programs in predicting the top scored poses and best poses are 54.0% and 67.8%, compared with 47.4% and 68.4%, respectively, for the academic programs (Wang et al. 2016). This shows that all these docking algorithms were able to explore the conformational space sufficiently well to generate correctly docked poses in the binding pockets of a diverse set of protein–ligand complexes.

In general, Glide performs well with diversified binding sites and flexible ligands, while ICM and GOLD perform significantly worse when binding sites are mainly influenced by hydrophobic contacts. These results also show that the difference between the commercial and academic programs is not pronounced, even though, from a global perspective, the commercial programs are slightly better than the academic programs at predicting ligand binding poses.

Conclusions

Structure-based drug design is a powerful technique for the rapid identification of small molecules against the 3D structures of macromolecular targets obtained from X-ray crystallography, NMR, or homology models. Because of the abundant information available on protein sequences and structures, the structural information of individual proteins and their interactions has become very important for drug therapy. Although many docking programs exist for conformational searching and binding pose prediction, their scoring functions are not yet accurate and need further improvement. Nevertheless, despite the drawbacks of each docking strategy, active research is under way to address the outstanding issues of scoring, explicit protein flexibility, explicit water, etc.

Even in the absence of knowledge regarding the binding site and with limited backbone movements, a variety of search algorithms have been developed for protein–protein docking over the past two decades. Although rigid-body docking can systematically explore the shape complementarity between proteins, it may not work well for docking proteins that are crystallized separately. Thus, a high-resolution protocol is much needed to understand the underlying mechanism of protein–protein interactions and the actual binding with other proteins. Rescoring using empirical potentials may not eliminate all the false positives. Even the fine-tuning of individual protein–protein interactions by redesigning the protein interface depends on an accurate structure of the protein complex generated by high-resolution docking protocols.

Although ZDOCK, rDOCK, and HEX provide results with high docking accuracy, the resulting complexes are not highly useful for designing inhibitors of protein interfaces because of the constraints of rigid-body docking. For this reason, flexible approaches were developed, which generally examine far fewer conformations than the rigid-body methods. These docking methods predict the binding poses most likely to occur on broad surface regions and then refine the sites into high-affinity complex structures. The best example is the HADDOCK software, which has been quite successful in resolving a large number of accurate models for protein–protein complexes. One good example is the study of the complex formed between plectasin, a member of the innate immune system, and the bacterial cell wall precursor lipid II. The study clearly identified the residues involved at the binding site between the two molecules, providing valuable information for the design of novel antibiotics.

However, current algorithms do not estimate the absolute energies associated with the intermolecular interaction with satisfactory accuracy. The major issues of solvent effects, entropic effects, and receptor flexibility still need to be handled with special attention. As of now, some methods that deal with side chain flexibility, such as MOE-Dock, GOLD, Glide, FlexX, and Surflex, have proven effective and adequate in most cases. Establishing the realistic interactions between small molecules and receptors still relies on experimental techniques. Moreover, although the current docking methods discriminate between different ligands based on binding affinity with high accuracy, the mode of binding, solvent effects, entropic effects, and the effects of protonation states of the charged residues in the active site remain major problems. With the aid of community efforts such as CAPRI (Critical Assessment of PRediction of Interactions), the limitations of a large number of docking algorithms have been addressed through benchmark testing. The problem of flexibility is still under investigation, but with the accelerated pace of research in this area, it is expected to be tackled in the near future.