Introduction

Computational methods play an important role in the development of potential drugs. In particular, molecular modeling techniques [1–16] are used to optimize a potential ligand so that it fits well into the proposed binding pocket of a G-protein-coupled receptor (GPCR), because this docking is considered to be responsible for an agonistic or antagonistic effect. Most modeling studies neglect the transport of the ligand through the transmembrane helices into the binding pocket of the GPCR. As a result, the amino acids that guide the ligand into the binding pocket remain unknown, and no predictions can be made about kinetic aspects such as experimentally determined rate constants. Molecular-dynamics simulations include the time variable and should therefore be appropriate for modeling the transport of a ligand into the binding pocket. However, because of the large amount of computational time needed, this method is not practical for this problem. Steered molecular-dynamics simulations, as described in the literature [24], apply a constant external force to the ligand in one direction to enforce ligand movement. However, this force introduces an unwanted arbitrariness, and its magnitude is not known. A further technique, used in combination with molecular-dynamics simulations, is known for calculating the unbinding pathway [17]. In this technique, the ligand is moved incrementally from the binding pocket to the surface of the protein in a small number of steps. At each of these intermediate points on the pathway, minimization and molecular-dynamics simulations are carried out. Because of the large distance between the intermediate points, up to 0.2 nm, a loss of conformational information along the unbinding pathway must be expected.
The aim here is therefore to calculate a sequence of finely spaced conformations that describes the path of a ligand into the binding pocket, based only on the potential energy surface and, for now, without kinetic aspects. Nevertheless, the amino acids that are essential for the transport of the ligand into the binding pocket should be predicted. To this end, the new algorithm LigPath has been developed and applied to the calculation of the pathway of histaprodifen (Fig. 1), a potent agonist at the guinea-pig histamine H1 receptor (gpH1R) [18–20].

Fig. 1 Structure of histaprodifen

Methodology

As already mentioned, traditional modeling techniques are not suitable for solving this problem. The LigPath algorithm was therefore developed to limit CPU time and to obtain results within a reasonable calculation time. To obtain results after a few days of calculation without large computer clusters, prior information must be used in the calculation. It is natural to provide a source structure as starting information. This source structure should be a minimized ligand–receptor complex with the ligand positioned at the extracellular part of the receptor, where the entry to the binding pocket is located. To avoid extensive searches in regions of the energy surface that are not involved in ligand transport, the search direction should be specified by the destination of the ligand in the proposed binding pocket.

Essentially, the algorithm is based on a generation-child scheme. Beginning with the starting structure, n child conformations are generated randomly and energy minimized. The best child of the current generation is determined and used as the new starting structure for the next generation.
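The generation-child scheme can be illustrated with a minimal Python sketch. It is not the LigPath implementation (which is written in C and calls GROMACS); the function `evolve` and its callbacks `make_child`, `minimize` and `score` are hypothetical stand-ins for random child generation, energy minimization and the best-child criterion, and the sketch assumes that a lower score is better.

```python
def evolve(start, make_child, minimize, score, n_children, n_generations):
    """Generation-child loop: the best of n children seeds the next generation.

    All callbacks are hypothetical placeholders:
      make_child(parent) -> randomly perturbed copy of the parent structure
      minimize(child)    -> energy-minimized child
      score(child)       -> number used to rank children (assumed: lower is better)
    """
    best = start
    path = [best]                      # sequence of best structures per generation
    for _ in range(n_generations):
        children = [minimize(make_child(best)) for _ in range(n_children)]
        best = min(children, key=score)
        path.append(best)              # the best child becomes the next parent
    return path


if __name__ == "__main__":
    # Toy usage: a "structure" is a single number that should approach 5.
    print(evolve(0, lambda x: x + 1, lambda x: x, lambda x: abs(x - 5),
                 n_children=3, n_generations=3))
```

Note that the best child replaces the parent even if it scores worse than the parent; this follows the description above, in which the best child of each generation is always used as the new starting structure.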

In Fig. 2, the flow chart of the algorithm LigPath is given. At the beginning, the source structure and the ligand destination are read in. Starting with the source structure, n new child structures are calculated in five steps:

  • Each atom of the ligand is translated by Δr along a virtual guiding line between its current coordinates and the corresponding destination coordinates. Furthermore, each atom is allowed to deviate from the guiding line by an angle Δφ.

  • After translation, the ligand is rotated by angles Δα, Δβ and Δγ about the x-, y- and z-axes. The ligand atom that defines the center of rotation is chosen randomly in each step.

  • Next, rotations by an angle Δρ about all defined rotatable bonds of the ligand are carried out.

  • Next, rotations by an angle Δρ about all rotatable bonds of the defined amino acids (Fig. 3) of the receptor are carried out.

  • Finally, a random translation \(\Delta \text{TM}_{xy}\) of the transmembrane helices within the xy plane, coupled with a random angular variation \(\Delta\theta_{z}\) of the transmembrane helices with respect to the z axis, is allowed for \(n_{\text{TM}}\) children of the whole set of n children.
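The first of these steps, the guided translation, can be sketched in Python as follows. This is an illustration only, under stated assumptions: the helper name `guided_step` is hypothetical, the step length is drawn uniformly from [0, Δr] and the deviation angle uniformly from [0, Δφ] (the text only says that values are chosen randomly within user-defined limits), and the direction of the deviation around the guiding line is chosen uniformly.

```python
import math
import random


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def guided_step(pos, dest, dr_max, dphi_max_deg, rng):
    """Move one atom toward dest with a random step and angular deviation."""
    line = tuple(d - p for p, d in zip(pos, dest))
    if all(c == 0.0 for c in line):
        return tuple(pos)                      # already at the destination
    u = _unit(line)                            # direction of the guiding line
    # Two unit vectors spanning the plane perpendicular to the guiding line.
    helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    p = _unit(_cross(u, helper))
    q = _cross(u, p)
    step = rng.uniform(0.0, dr_max)            # assumed: uniform in [0, dr_max]
    phi = math.radians(rng.uniform(0.0, dphi_max_deg))  # tilt off the line
    psi = rng.uniform(0.0, 2.0 * math.pi)      # direction of the tilt
    direction = tuple(
        math.cos(phi) * u[k]
        + math.sin(phi) * (math.cos(psi) * p[k] + math.sin(psi) * q[k])
        for k in range(3)
    )
    return tuple(pos[k] + step * direction[k] for k in range(3))


if __name__ == "__main__":
    rng = random.Random(56789)
    print(guided_step((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 0.02, 20.0, rng))
```

The rigid-body rotation, the bond rotations and the helix moves would follow the same pattern: draw a random value within the user-defined limit and apply the corresponding transformation.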

Fig. 2 Flow chart of the LigPath algorithm

Fig. 3 Rotatable bonds of the amino acid side chains

These translational and rotational motions, carried out with randomly chosen values within user-defined limits, are expected to produce very small interatomic distances, below the optimal van der Waals distances, between some atoms of ligand and receptor. The generated structure must therefore be energy minimized. However, poor starting structures can result in long minimization times or in poor minimization results. Therefore, the ligand–receptor and receptor–receptor interatomic distances involving all displaced atoms are calculated. If any distance falls below a defined limit \(\Delta r_{\text{coll}}\), translation and rotation of the ligand and rotation of the corresponding amino-acid side chains (Fig. 2) are repeated until no interatomic collisions remain. If the structure thus generated is accepted within the defined limits, it is minimized using the software package GROMACS 3.2 [21]. Our algorithm, written in the programming language C, is linked to the GROMACS 3.2 [21] minimization routine by a shell script, using Linux as the operating system. The minimization parameters can be defined within GROMACS 3.2 [21] just as when using GROMACS 3.2 [21] in standalone mode. After minimization of all child structures, the best of them could in principle be determined from the potential-energy gradient of the whole simulation box. However, because the environment, containing water and lipid, has a large impact on the potential energy, unfavorable conformations of the ligand–receptor complex cannot be detected in this way. Thus, the potential energy of the ligand–receptor complex alone is used in the calculation of the quantity \(q_{j}\) (Eq. 1) to determine the best child.
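The collision test described above can be sketched as a simple distance check. The function names, the brute-force pair loop and the `max_tries` cap are assumptions made for illustration; the text does not describe how LigPath organizes this step internally.

```python
import math


def colliding_pairs(atoms_a, atoms_b, r_coll):
    """Return index pairs (i, j) of atoms closer than r_coll (same length unit)."""
    return [
        (i, j)
        for i, a in enumerate(atoms_a)
        for j, b in enumerate(atoms_b)
        if math.dist(a, b) < r_coll
    ]


def propose_until_clash_free(propose, receptor, r_coll, max_tries=1000):
    """Call propose() until the returned ligand has no clash with the receptor.

    propose is a hypothetical callback that returns a new list of ligand
    coordinates; max_tries is an assumed safety limit, not a LigPath parameter.
    """
    for _ in range(max_tries):
        ligand = propose()
        if not colliding_pairs(ligand, receptor, r_coll):
            return ligand
    raise RuntimeError("no clash-free structure found within max_tries")


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    # Ligand atom 0 is 0.05 nm from receptor atom 0, below the 0.1 nm limit.
    print(colliding_pairs([(0.05, 0.0, 0.0)], receptor, 0.1))
```

The same `colliding_pairs` function could be applied to receptor–receptor pairs by passing the displaced side-chain atoms as the first argument.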

$$q_{j} = - \frac{{E_{j} {\left( i \right)} - E{\left( {i - 1} \right)}}}{{{\text{rmsd}}_{j} {\left( i \right)} - {\text{rmsd}}{\left( {i - 1} \right)}}}$$
(1)

The term \(E_{j}(i)\) is the potential energy of the ligand–receptor complex of child j in the current generation i, whereas \(E(i-1)\) is the potential energy of the ligand–receptor complex of the best child of the previous generation (i−1). Specifically, \(E_{0}(0)\) denotes the potential energy of the ligand–receptor complex in the starting structure. Denoting the coordinates of ligand atom k of child j in generation i by \(x^{k}_{j}(i),\;y^{k}_{j}(i)\;\text{and}\;z^{k}_{j}(i)\), the corresponding \(\text{rmsd}_{j}(i)\) (Eq. 2) describes the spatial distance between the current position of the ligand, which consists of N atoms, and its destination position, given by \((x^{k}_{\text{dest}},\;y^{k}_{\text{dest}},\;z^{k}_{\text{dest}})\). The symbol \(\text{rmsd}_{0}(0)\) denotes the corresponding value for the starting structure.

$${\text{rmsd}}_{j} {\left( i \right)} = {\sqrt {\frac{{{\sum\limits_{k = 1}^N {{\left( {{\left( {x^{k}_{j} {\left( i \right)} - x^{k}_{{{\text{dest}}}} } \right)}^{2} + {\left( {y^{k}_{j} {\left( i \right)} - y^{k}_{{{\text{dest}}}} } \right)}^{2} + {\left( {z^{k}_{j} {\left( i \right)} - z^{k}_{{{\text{dest}}}} } \right)}^{2} } \right)}} }}}{N}} }$$
(2)
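Equations 1 and 2 translate directly into code. The sketch below uses hypothetical function names and returns `None` when the rmsd of a child does not differ from that of the previous best structure, since Eq. 1 is undefined in that case; how LigPath treats this case, and whether the best child is the one with the largest or the smallest \(q_{j}\), is not stated in the text.

```python
import math


def rmsd_to_destination(coords, dest):
    """Eq. 2: rmsd between current and destination ligand coordinates."""
    n = len(coords)
    total = sum(
        (x - xd) ** 2 + (y - yd) ** 2 + (z - zd) ** 2
        for (x, y, z), (xd, yd, zd) in zip(coords, dest)
    )
    return math.sqrt(total / n)


def q_value(e_child, e_prev, rmsd_child, rmsd_prev):
    """Eq. 1: negative energy change per change in rmsd; None if rmsd is unchanged."""
    d_rmsd = rmsd_child - rmsd_prev
    if d_rmsd == 0.0:
        return None                    # Eq. 1 is undefined for zero rmsd change
    return -(e_child - e_prev) / d_rmsd


if __name__ == "__main__":
    # Made-up numbers for illustration: energy drops by 10 while rmsd drops by 0.5.
    print(q_value(-110.0, -100.0, 1.5, 2.0))
```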

The calculation is stopped if the ligand has reached the defined destination position within defined limits. Thus, the LigPath algorithm is based on a combination of directional guiding (translation of the ligand on the guiding line), Monte-Carlo search (random motions of ligand and receptor) and minimization.

Evaluation of the LigPath algorithm

To evaluate the LigPath algorithm, we performed a calculation on a known system [3]. We used the crystal structure of bacteriorhodopsin (bR) (1FBB) [22, 23] as a template, prepared the retinal–bacteriorhodopsin complex according to [3], and minimized the complex with GROMACS 3.2 [21] in combination with the ffG53A6 force field [24]. To avoid the constraints on bacteriorhodopsin described in [3], the protein was embedded in the surrounding medium. The protein was placed manually in a POPC (1-palmitoyl-2-oleoyl-phosphatidylcholine) membrane bilayer (104 molecules) using the software package VEGA [25]. Then, 13,216 SPC water molecules and five sodium and chloride ions were added to the simulation box. After further energy minimization, the resulting system was treated with the LigPath algorithm. All possible rotatable bonds of the retinal were included in the “rotable bond” module of LigPath (Fig. 2). The rotatable bonds for the amino acids are given in Fig. 3. After equilibration of the simulation box (seed = 56,789, n = 2, Δr = 0.0 nm, Δφ = 20°, Δα = Δβ = Δγ = 0°, Δρ = 2.5°, \(n_{\text{TM}}\) = 0, \(\Delta \text{TM}_{xy}\) = 0 nm, \(\Delta\theta_{z}\) = 0°, \(\Delta r_{\text{coll}}\) = 0.1 nm, \(E_{0}(0)\) = −15,544 kJ mol⁻¹, \(\text{rmsd}_{0}(0)\) = 1.62 nm), the unbinding pathway of retinal was calculated with LigPath (seed = 56,789, n = 10, Δr = 0.02 nm, Δφ = 20°, Δα = Δβ = Δγ = 2.5°, Δρ = 2.5°, \(n_{\text{TM}}\) = 0, \(\Delta \text{TM}_{xy}\) = 0 nm, \(\Delta\theta_{z}\) = 0°, \(\Delta r_{\text{coll}}\) = 0.1 nm, \(E_{0}(0)\) = −16,032 kJ mol⁻¹, \(\text{rmsd}_{0}(0)\) = 1.62 nm).
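For readers who want to compare the two parameter sets at a glance, they can be written down as a small Python structure. The class `LigPathParams` and its field names are hypothetical and do not reflect the actual LigPath input format; only the numerical values are taken from the text. The initial values \(E_{0}(0)\) and \(\text{rmsd}_{0}(0)\) are omitted because they describe the starting structure rather than control the search.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LigPathParams:
    """Hypothetical container for the LigPath control parameters."""
    seed: int
    n_children: int          # n
    dr_nm: float             # Δr
    dphi_deg: float          # Δφ
    drot_deg: float          # Δα = Δβ = Δγ
    drho_deg: float          # Δρ
    n_tm: int                # n_TM
    dtm_xy_nm: float         # ΔTM_xy
    dtheta_z_deg: float      # Δθ_z
    dr_coll_nm: float        # Δr_coll


# Values as quoted in the text for the retinal/bR test system.
EQUILIBRATION = LigPathParams(56789, 2, 0.0, 20.0, 0.0, 2.5, 0, 0.0, 0.0, 0.1)
UNBINDING = LigPathParams(56789, 10, 0.02, 20.0, 2.5, 2.5, 0, 0.0, 0.0, 0.1)


def changed_fields(a, b):
    """Return the names of the fields in which two parameter sets differ."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


if __name__ == "__main__":
    print(changed_fields(EQUILIBRATION, UNBINDING))
```

The comparison makes explicit that equilibration differs from the pathway run only in the number of children, the guided translation Δr and the rigid-body rotation angles.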

In Fig. 4, different calculated quantities are plotted against the number of generations, where negative numbers indicate the equilibration process. Figure 4a shows the distance between the carbonyl oxygen of retinal and the nitrogen of Lys216 during unbinding, which takes place within generations 0–125 and is characterized by an increase of approximately 1.2 nm. The change in the potential energy of the ligand–receptor complex from the initial state to the final state is approximately 250 kJ mol⁻¹. During the unbinding process, this quantity passes through a maximum (Fig. 4b). In Fig. 4c, the variation of the rmsd for all non-hydrogen atoms of the protein is shown with reference to the initial structure. A comparison with the results given in [3] shows quantitative agreement for the O–N distance and only qualitative agreement for the rmsd: because the constraints described in [3] are replaced by the lipid/solvent environment, the increase in rmsd is smaller in our calculation. The use of a newer, improved force field is responsible for the larger differences in the variation of the potential energy of the ligand–receptor complex and, in particular, results in the pronounced maximum mentioned above.

Fig. 4 Characteristics of the unbinding pathway calculation of retinal. (a) Distance between the carbonyl oxygen of retinal and the \(\text{NH}_{3}^{+}\) nitrogen of Lys216. (b) Potential energy of the bR–retinal complex. (c) rmsd of the bR structure (only heavy atoms) during the calculation with respect to the initial structure

Qualitative agreement with the structural results given in [3] could also be found. The retinal interacts sequentially with the amino acids Lys216-Tyr185-Trp189-Pro186/Trp138-Met142/Thr139 during its unbinding pathway. As described in a short overview [3], the amino acids Tyr185, Trp189, Pro186, Trp138 and Met142 show an effect on the reconstitution rates in appropriate bR mutants.

In summary, the results for the unbinding pathway of retinal in bR obtained with the LigPath algorithm are in good agreement with those predicted by molecular-dynamics simulations [3].

Calculation of the binding pathway of histaprodifen

Preparation of the system

Construction of a receptor model

First, a model of the gpH1R was generated. The sequence of gpH1R was aligned to bovine rhodopsin according to Ballesteros et al. [26] (Fig. 5). Using the 3D crystal structure of bovine rhodopsin (1F88) [22, 27] as template, the 3D structure of the receptor was generated by comparative homology modeling in combination with the Loop Search module of the software package SYBYL 7.0 [28]. Because of the lack of sufficient experimental data concerning the structure of the intracellular C3-Loop (189 amino acids), it was only partially included in the modeling studies. This approximation should have little influence on the modeling of the entry to the binding pocket and of the binding pocket itself. The receptor was then minimized carefully, paying particular attention to the correct orientation of the helical amino-acid side chains that face the interior of the transmembrane helix bundle or the lipids, as predicted [26]. The homology model thus generated is a sound basis for further modeling studies, such as docking of ligands into the proposed binding pocket. However, for calculations that include the entry to the binding pocket on the extracellular side of the receptor, omitting an environment such as a membrane or aqueous extra- and intracellular regions would be a serious approximation. Therefore, it is state of the art to embed the receptor in the surrounding medium. Using the software package VEGA [25], the receptor was placed manually in a POPC membrane bilayer (104 molecules) [29]. Additionally, the histaprodifen was positioned manually at the proposed entry between the E3-Loop and the N-terminus at the extracellular side of the receptor. With GROMACS 3.2 [21], a simulation box containing intra- and extracellular water (12,735 molecules) was constructed. Because receptor and ligand are positively charged, electroneutrality was achieved by placing seven sodium and 25 chloride ions inside the box.
The whole system was minimized with GROMACS 3.2 [21].
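As a brief check of the ion counts quoted above: assuming monovalent Na⁺ and Cl⁻ ions and net-neutral POPC and water molecules (assumptions not stated explicitly in the text), electroneutrality of the box implies a combined charge of receptor and ligand of

$$Q_{\text{receptor}} + Q_{\text{ligand}} = - {\left( {7 \cdot {\left( { + 1} \right)} + 25 \cdot {\left( { - 1} \right)}} \right)}\,e = + 18\,e$$

which is consistent with the description of both as positively charged.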

Fig. 5 Sequence alignment of gpH1R to bovine rhodopsin

Parameters for calculation of the pathway

Definition of rotatable bonds

The rotatable bonds of histaprodifen are illustrated in Fig. 6.

Fig. 6 Rotatable bonds of histaprodifen

Initialization parameters

For our minimization with GROMACS 3.2 [21], the internal GROMACS force field ffG53A6 [24] was used. The parameters for the LigPath algorithm are given in Table 1.

Table 1 Initialization parameters for different pathway calculations (the symbols are explained in the text)

Changes in the surrounding medium

The surrounding medium (POPC lipid bilayer, water, sodium and chloride ions) was included during the whole calculation. The only changes in solvent/lipid positions were introduced by the minimization steps, whereby no solvent clashes were observed.

Calculation of the starting structure

To guarantee an optimal orientation of the amino-acid side chains, the whole simulation box was first minimized using the LigPath algorithm with the initialization parameter set run0 (Table 1), in which the characteristic variables Δr, Δα, Δβ and Δγ are set to zero. Within the first 100 generations, the potential energy of the whole simulation box decreased rapidly from about −7.9 × 10⁵ kJ mol⁻¹ to about −8.34 × 10⁵ kJ mol⁻¹. During the next 900 generations, the potential energy decreased only slightly, to −8.4 × 10⁵ kJ mol⁻¹ (Fig. 7). The last structure resulting from this minimization cycle was used as the source structure for the pathway calculation for histaprodifen.

Fig. 7 Potential energy of the simulation box as a function of the generations (run0)

Definition of the destination structure

The destination structure is based on the binding mode described for histaprodifen at the human H1R by Elz et al. [18]. By analogy, the binding mode for histaprodifen at the gpH1R was modeled with SYBYL [28]. The histaprodifen interacts with the amino acids Asp116, Tyr117, Ile124, Trp167, Phe208, Pro211, Phe429, Tyr432, Phe433 and Phe436 in the binding pocket of the gpH1R. The resulting coordinates of the histaprodifen relative to the coordinates of the gpH1R were used as the destination structure in the pathway calculation.

Results and discussion

The algorithm described allows the calculation of the pathway of histaprodifen into the proposed binding pocket of the gpH1R. The calculations were carried out including the environment (lipid bilayer and extracellular and intracellular water). Neglecting the environment destroys the tertiary structure of the GPCR during the calculation, so constraints on the backbone atoms would have to be set. However, this would have an artificial impact on the system. The inclusion of the environment thus leads to a natural stabilization of the tertiary structure without losing the flexibility of the receptor. The quality of the resulting receptor structures was checked with the help of Ramachandran plots.

Figure 7 shows the potential-energy curves for minimization of the simulation box (run0) and additionally for the five pathway calculations (run1 to run5). During the first 1,500 generations without ligand penetration, the potential energy converges smoothly toward a limiting value. When ligand penetration is allowed (starting with generation no. 1,500), the potential energy decreases rapidly until the ligand reaches the proposed binding pocket. Variations in the initialization parameters lead to deviations in the energy range of about 5%, arising from different orientations of the ligand or the amino-acid side chains.

In Fig. 8, the potential energy of the ligand–receptor-complex in the whole simulation box for the above runs is given as a function of the ligand’s rmsd.

Fig. 8 Potential energy of the ligand–receptor complex as a function of the rmsd of histaprodifen with respect to the binding pocket

The closed lines represent the minimum energy path of each run. The dots show the potential energy of all children produced during the calculations. The five runs show good agreement in the energy profiles with respect to local minima and maxima. The mean values together with the upper and lower limits of the potential energy for all runs are shown in Fig. 9. Six representative snapshots of significant structures along the whole pathway are given in Fig. 10. Structure (a) shows the result of the preminimization of the simulation box. The part of the extracellular receptor surface with the amino acids Tyr194 and Glu448 can be identified as a structure-recognition system for histaprodifen. After about 30 generations (structure (b)), the rmsd of the ligand is about 2.1 nm (Fig. 9). The potential energy of the ligand–receptor complex has a local minimum because of the stabilization of the diphenylpropyl moiety of the ligand. This stabilization is caused by an aromatic interaction between a phenyl group of the ligand and the amino acid Tyr194 of the receptor. This amino acid leads the ligand to the entrance of the hydrophobic channel of the receptor. The terminal \(\text{NH}_{3}^{+}\) moiety, however, does not change its position, so the ligand rotates from a horizontal into a vertical orientation. This reorientation destabilizes the ligand–receptor complex, so the potential energy increases by about 400 kJ mol⁻¹ at an rmsd of 1.8 nm (Fig. 9). In snapshot (c) (Fig. 10), the diphenylpropyl moiety is completely immersed in the receptor, so the hydrophobic part of the histaprodifen no longer undergoes a repulsive interaction with the polar water on the extracellular side of the receptor. The potential energy of the ligand–receptor complex varies between −18,900 and −18,700 kJ mol⁻¹ in the rmsd range from 1.7 to 0.3 nm (Fig. 9). This part is shown by snapshots (d) to (f) (Fig. 10).
There the amino acids Phe193, Phe436 and Phe433 lead the histaprodifen through the channel into the binding pocket. This guidance is mostly based on successive aromatic interactions of these amino acids with one of the ligand’s phenyl groups. In snapshot (f) (Fig. 10) the histaprodifen is shown in the binding pocket of the receptor. In the LigPath calculation the same histaprodifen-amino acid interactions were found, as described above.

Fig. 9 Potential energy profile of the ligand–receptor complex as a function of the rmsd of histaprodifen with respect to the binding pocket

Fig. 10 a–f Snapshots of penetration of histaprodifen into the binding pocket of the gpH1R (C: cyan, O: red, N: blue, H: grey)

The homology modeling of the receptor is based on an inactive GPCR structure of bovine rhodopsin. However, during the penetration of the ligand, slight changes in the mutual orientation of some transmembrane helices can be observed, accompanied by a structural rearrangement of the binding pocket for optimal docking of the histaprodifen. Figure 11 shows the difference in helix orientation before (grey) and after docking of the ligand (red, green and blue). During the pathway calculation, the transmembrane helices TM5 and TM6 exhibit the largest changes in helix orientation compared with the starting structure, induced by Pro211 (TM5, red circle, Fig. 11) and Pro431 (TM6, red circle, Fig. 11). Figure 12 shows the corresponding rmsd of the transmembrane helices relative to the starting structure as a function of the number of generations. After equilibration without further helix movement (generations −249 to 0, Fig. 12), a considerable rearrangement of the transmembrane helices is observed during docking of the histaprodifen into the binding pocket (second red arrow, Fig. 12). Further structural changes in the helices are observable for the next 900 generations. TM5 and TM6, with rmsds of 0.34 nm and 0.33 nm, respectively, show the largest deviations. These results may give an indication of continuous receptor activation during the calculation [30–34].

Fig. 11 Structural changes in helix orientation (initial structure: grey colour; red circles: Pro-kinks in TM5 and TM6)

Fig. 12 Rmsd of the seven transmembrane helices with respect to the initial structure (red arrows indicate start and stop of the penetration of histaprodifen into the binding pocket)

Conclusion

The LigPath algorithm for a predictive calculation of the binding pathway of a ligand into its proposed binding pocket has been developed. On the basis of this concept, the pathway of histaprodifen into the binding pocket of the gpH1R was calculated. The simulation shows that the histaprodifen is guided step by step into the binding pocket with the participation of the amino acids Tyr194, Phe193, Phe436 and Phe433. To establish experimental proof, point mutations of these amino acids are in preparation. A comparison of experimentally determined rate constants for the binding and unbinding of histaprodifen between the wild-type and mutated gpH1R will be used to verify the prediction.