1 Introduction

Proteins are involved in numerous biological processes such as enzymatic activity and signal transduction [1–3]. The biological functions of proteins result from their molecular interactions with other molecules such as metal ions, small organic compounds, lipids, peptides, nucleic acids, or other proteins. Typically, proteins interact with other molecules by binding them at specific sites. Therefore, identifying the binding sites on the three-dimensional protein surface can be an important step for inferring protein functions [4, 5], for designing novel molecules that control protein functions [6, 7], and for designing new proteins with desired interaction properties [8, 9]. Various methods have been developed to predict ligand binding sites of proteins from protein sequences or structures. These methods are based on geometry, energy, evolutionary information, or combinations of them [10]. Methods that use experimentally resolved structures of homologous protein–ligand complexes have proven successful in predicting binding sites in community-wide blind prediction experiments [11–13]. They predict binding sites by transferring the binding information available for homologs, assuming that binding sites are conserved among homologs. However, evolutionary information alone may not be sufficient to predict interactions at the binding sites in atomic detail, and physicochemical interactions may have to be considered in addition.

In this chapter, we introduce two methods, available on the GALAXY web server GalaxyWEB [14], that predict binding sites of small organic compounds and peptides. These methods search the protein structure database for experimental structures of related proteins complexed with ligands, build three-dimensional protein–ligand complex structures from the available information, and further refine the complex structure beyond the available information by optimizing physicochemical energy. The GalaxySite server predicts binding sites of small organic compounds from an input protein structure or sequence [15]. Binding ligands are predicted first, and the predicted ligands are then docked to the given protein structure, or to a predicted protein structure if a sequence is given. The predicted complex structures are optimized by protein–ligand docking simulations that take into account both the binding information derived from related proteins and additional physicochemical energy that does not rely on evolutionary information. GalaxySite was ranked among the top methods in the recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments when evaluated in terms of predicted binding site residues [16, 17]. GalaxyPepDock predicts protein–peptide complex structures from an input protein structure and peptide sequence [18]. It also combines information on interactions found in homologous complexes in the protein structure database with additional physicochemical energy to optimize the protein–peptide complex structures. The protein structure is allowed to change flexibly according to its interaction with the peptide ligand during optimization.

GalaxyPepDock proved its usefulness in the recent Critical Assessment of PRediction of Interactions (CAPRI) experiments ([19], http://www.ebi.ac.uk/msd-srv/capri/round28/round28.html). Both the GalaxySite ligand binding site prediction server and the GalaxyPepDock peptide binding site prediction server rely on similarity to protein–ligand complexes of known structure and provide detailed protein–ligand atomic interactions through energy optimization.

2 Materials

  1.

    A personal computer or other device with a web browser is required to access the GalaxyWEB server through the Internet. A JavaScript-enabled web browser is highly recommended for viewing the results. Server compatibility was tested on Google Chrome, Firefox, Safari, and Internet Explorer.

  2.

    The following input materials are required to use GalaxySite and GalaxyPepDock on GalaxyWEB.

    (a)

      To run GalaxySite for ligand binding site prediction, a sequence in FASTA format or a structure file in standard PDB format for the protein of interest is required. The input must contain only the 20 standard amino acids, in one-letter codes for a sequence or three-letter codes for a structure file. The input should be a single-chain protein, and the number of amino acids should be greater than 30 and less than 500. The user may judiciously delete irrelevant protein chains or termini before job submission to meet this requirement and/or to save computational cost. An example input sequence (Fig. 1, Label 1) and structure file (Fig. 1, Label 2) can be obtained from the GalaxySite web page.

      Fig. 1 The GalaxySite input page

    (b)

      To run GalaxyPepDock for peptide binding site prediction, a structure file in standard PDB format for the receptor protein of interest and a sequence file in FASTA format for the peptide of interest are required. The number of amino acids of the receptor protein should be less than 900 and that of the peptide less than 30. The input peptide sequence file must contain only the 20 standard amino acids in one-letter codes. Example input files (Fig. 2, Label 1) can be obtained from the GalaxyPepDock web page.

      Fig. 2 The GalaxyPepDock input page
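
The sequence requirements listed above can be checked locally before a job is submitted. The following is a minimal sketch and not part of GalaxyWEB: the function names are hypothetical, the insistence on a ">" header line and a single record is an assumption about what counts as acceptable FASTA input, and the checks the server itself performs may differ. Structure-file requirements (for example, the receptor size limit) are not covered.

```python
# Hypothetical pre-submission checks; not an official GalaxyWEB tool.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 standard amino acids


def read_single_fasta(text):
    """Return the sequence of a FASTA text that holds exactly one record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected a single record (one chain or one peptide)")
    return "".join(lines[1:]).upper()


def check_sequence(seq, min_exclusive, max_exclusive):
    """Return a list of problems; an empty list means the sequence passes."""
    problems = []
    unknown = sorted(set(seq) - STANDARD_AA)
    if unknown:
        problems.append("non-standard residue codes: " + ", ".join(unknown))
    if not (min_exclusive < len(seq) < max_exclusive):
        problems.append(
            f"length {len(seq)} is outside ({min_exclusive}, {max_exclusive})"
        )
    return problems


def check_galaxysite_sequence(seq):
    # GalaxySite input: more than 30 and fewer than 500 amino acids.
    return check_sequence(seq, 30, 500)


def check_pepdock_peptide(seq):
    # GalaxyPepDock peptide: fewer than 30 amino acids
    # (the lower bound of 0 only rejects an empty sequence; an assumption).
    return check_sequence(seq, 0, 30)
```

For example, `check_pepdock_peptide(read_single_fasta(">pep\nPQQATDD\n"))` returns an empty list, while a three-residue sequence passed to `check_galaxysite_sequence` is reported as too short.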

3 Methods

3.1 Ligand Binding Site Prediction Using GalaxySite

  1.

    Go to GalaxyWEB, http://galaxy.seoklab.org. Click “Site” in the “Services” tab at the top of the page.

  2.

    In the “User Information” section, enter a job name (defaults to “None”). The user can provide an e-mail address so that the server automatically sends progress reports on the submitted job. Otherwise, the user should bookmark the report page (Fig. 3b) after submitting the job.

    Fig. 3 (a) A summary page showing the submission information of a GalaxySite job. (b) An example report page showing the status of the GalaxySite job

  3.

    In the “Query Protein Information” section, provide a FASTA-formatted protein sequence or a standard PDB-formatted protein structure file. If the structure of the query protein has already been determined or predicted, the user may simply upload the protein structure file in PDB format (Fig. 1, Label 3). If only the sequence of the query protein is known, the user may provide a FASTA-formatted protein sequence by copying the sequence and pasting it into the text box (Fig. 1, Label 4). When sequence information is provided, the GalaxySite server predicts the protein structure by using a simplified version of GalaxyTBM [20], a template-based protein structure prediction method (see Note 1).

  4.

    Press the submit button to queue the job. If any errors occur with the provided input, the user will get a notice about the errors that need to be corrected. If the submission is successful, the user will be directed to the summary page of the submission information which has a link to the report page (Fig. 3a). The number of jobs in the “WAIT” or “RUN” status allowed per user is limited to three.

  5.

    Click “LINK” on the submission information page to access the report page. The user can track the status of the submitted job on the report page, which is refreshed every 30 s (Fig. 3b). When the job is completed, the predicted results are presented automatically. The average run time of GalaxySite is 2–4 h.

  6.

    Ligands predicted to bind: GalaxySite predicts up to three ligands that are likely to bind to the target protein (see Note 2). The predicted ligands are presented in descending order of the estimated likelihood of binding (Fig. 4). For each ligand, the ligand name in a three-letter code (Fig. 4, Label 1) and the two-dimensional chemical structure (Fig. 4, Label 2) are shown. The ligand name is hyperlinked to the ligand summary page of RCSB PDB (http://www.rcsb.org) [21] for detailed information on the molecule. PDB IDs for the protein–ligand complexes used for the prediction are also provided and hyperlinked to the structure summary page of RCSB PDB (Fig. 4, Label 3).

    Fig. 4 An example of the “Ligands predicted to bind” section on the GalaxySite report page

  7.

    Predicted ligand binding residues: For each predicted ligand, information on the predicted ligand binding residues is provided (Fig. 5a). Ligand binding residues are defined from the protein–ligand complex structure obtained by molecular docking in GalaxySite (Fig. 5a, Label 1). A residue is considered to bind the ligand if the distance between any of its atoms and any ligand atom is less than the sum of the van der Waals radii of the two atoms plus 0.5 Å. In addition, detailed atomic interactions between the ligand and the ligand binding residues are analyzed by using LIGPLOT [22] and can be seen through LINK (Fig. 5a, Label 2). On the LIGPLOT page (Fig. 5b), the ligand molecule and the protein amino acid residues are depicted in violet and brown, respectively. Hydrogen bonds are shown as green dashed lines with their lengths, and hydrophobic contacts are shown as red spikes. This interaction analysis may suggest ideas for designing ligands or ligand binding site residues.

    Fig. 5 (a) An example of the “Predicted ligand binding residue” section on the GalaxySite report page. (b) An example of interaction analysis between ligand and ligand binding residues made by LIGPLOT. (c) An example of the “Predicted binding poses” section on the GalaxySite report page

  8.

    Predicted binding poses: For each predicted ligand, a predicted protein–ligand complex structure can be viewed on the page using PV (http://biasmv.github.io/pv/), a JavaScript protein viewer, if the web browser supports JavaScript (Fig. 5c). Users can zoom in and out by scrolling the mouse wheel and change the focus center by double-clicking. Different predicted protein–ligand complex structures are shown by clicking the model number in the “View in PV” line (Fig. 5c, Label 3). Predicted protein–ligand complex structures can be downloaded as PDB-formatted files for further analyses (Fig. 5c, Label 4).

  9.

    Re-submission with other ligands: Other ligands that are likely to bind to the query protein are listed in another table (Fig. 6). As for the top three ligands with the highest estimated likelihood of binding (see step 6), ligand names, two-dimensional chemical structures, and PDB IDs for the corresponding protein–ligand complexes are shown in the table. By clicking the “Submit” button (Fig. 6, Label 1), the user can submit a new ligand binding site prediction job with a selected ligand.

    Fig. 6 An example of the “Re-submission with other possible ligands” section on the GalaxySite report page

  10.

    Detailed explanations of the GalaxySite web server are also provided on the GalaxySite help page; click the “Help” tab at the top of the page, and then click “GalaxySite” on the right of the help page. The prediction method used in the GalaxySite program is described in the original paper [15].
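
The binding-residue criterion in step 7 (interatomic distance below the sum of the two van der Waals radii plus 0.5 Å) can be reproduced on a downloaded model. The sketch below is an illustration under stated assumptions, not the GalaxySite implementation: the radii are Bondi values chosen here for illustration (the radii used by the server are not given in this chapter), atoms are passed in as simple `(element, (x, y, z))` tuples rather than parsed from a PDB file, and the function names are hypothetical.

```python
import math

# Bondi van der Waals radii in Å. Assumption: the radii used by GalaxySite
# are not stated in this chapter, so these values are illustrative only.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "P": 1.80}


def atoms_in_contact(atom_a, atom_b, tolerance=0.5):
    """True if the distance is below the sum of the vdW radii plus tolerance (Å)."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    cutoff = VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] + tolerance
    return math.dist(xyz_a, xyz_b) < cutoff


def ligand_binding_residues(protein, ligand_atoms, tolerance=0.5):
    """Return the IDs of residues with at least one atom contacting the ligand.

    protein maps a residue ID to a list of (element, (x, y, z)) atoms;
    ligand_atoms is a list of atoms in the same form.
    """
    return [
        res_id
        for res_id, atoms in protein.items()
        if any(
            atoms_in_contact(p_atom, l_atom, tolerance)
            for p_atom in atoms
            for l_atom in ligand_atoms
        )
    ]


# Toy coordinates, invented for illustration.
protein = {
    "ASP25": [("O", (0.0, 0.0, 0.0))],
    "GLY48": [("C", (10.0, 0.0, 0.0))],
}
ligand = [("N", (3.5, 0.0, 0.0))]
print(ligand_binding_residues(protein, ligand))  # ['ASP25']
```

In the toy example, the O–N cutoff is 1.52 + 1.55 + 0.5 = 3.57 Å, so the 3.5 Å pair counts as a contact, whereas the C–N pair at 6.5 Å exceeds its 3.75 Å cutoff.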

3.2 Peptide Binding Site Prediction Using GalaxyPepDock

  1.

    Go to GalaxyWEB, http://galaxy.seoklab.org. Click “PepDock” in the “Services” tab at the top of the page.

  2.

    In the “User Information” section, enter a job name (defaults to “None”). The user can provide an e-mail address so that the server automatically sends progress reports on the submitted job. Otherwise, the user should bookmark the report page after submitting the job.

  3.

    In the “Protein–peptide Docking” section, provide a standard PDB-formatted protein structure file (Fig. 2, Label 2) and a FASTA-formatted peptide sequence file (Fig. 2, Label 3).

  4.

    Press the submit button to queue the job. If the submission is successful, a “Submission Information” page will appear (Fig. 7a).

    Fig. 7 (a) A summary page showing the submission information of a GalaxyPepDock job. (b) An example report page showing the status of the GalaxyPepDock job

  5.

    Click “LINK” on the submission information page to access the report page (Fig. 7b). The report page is refreshed every 30 s to update the status of the submitted job. When the job is completed, the predicted results are presented. The average run time of GalaxyPepDock is 2–3 h.

  6.

    Predicted protein–peptide complex structures: Predicted structures of the query protein–peptide complex can be visualized on the report page using PV (http://biasmv.github.io/pv/), a JavaScript protein viewer, if the web browser supports JavaScript (Fig. 8). Users can zoom in and out by scrolling the mouse wheel and change the focus center by double-clicking. Template structures selected from the database of protein–peptide complex structures for use in the prediction are shown in light colors; protein and peptide structures are in light red and blue, respectively. Different protein–peptide complex model structures can be seen by clicking the model number in the “View in PV” line (Fig. 8, Label 1). Predicted protein–peptide complex structures can also be downloaded as PDB-formatted files for further analyses (Fig. 8, Label 2).

    Fig. 8 An example of the “Predicted protein–peptide complex structures” section on the GalaxyPepDock report page

  7.

    Additional information: Additional information on the predicted models and on intermediate results generated during the GalaxyPepDock run is provided in a table (Fig. 9a). The structures of the protein template and the peptide template are given as PDB IDs and can also be downloaded (Fig. 9a, Labels 1 and 2, respectively). Sequences and alignments of the query and the template used for the prediction are provided (Fig. 9a, Label 3) for both protein and peptide (Fig. 9b). Structure similarity between the predicted protein structure and the protein template structure is presented in terms of TM-score [23] and RMSD (Fig. 9a, Label 4). An interaction similarity score [18], designed to describe the similarity of the amino acids of the query complex aligned to the interacting residues of the template complex, is reported for each prediction (Fig. 9a, Label 5). This score indicates how the models differ from one another in their similarity to the selected templates.

    Fig. 9 An example of the “Additional information” section on the GalaxyPepDock report page. (a) A summary table showing the results of the protein–peptide complex structure predictions. (b) An example of structure/sequence alignments between the query protein/peptide and the template protein/peptide. (c) An example of the list of predicted binding residues of protein

  8.

    Predicted binding site residues: Binding site residues of the protein taken from the predicted complex structure (Fig. 9a, Label 7, and Fig. 9c) and the estimated prediction accuracy of the binding site (Fig. 9a, Label 6) are provided (see Note 3). Residues with any heavy atom within 5 Å of any peptide heavy atom in the predicted structure are reported as binding residues.

  9.

    A GalaxyPepDock help page is also available; click the “Help” tab at the top of the page, and then click “GalaxyPepDock” on the right of the help page. A more detailed description of the GalaxyPepDock prediction method can be found in the original paper [18].
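
The binding-residue definition in step 8 (any protein heavy atom within 5 Å of any peptide heavy atom) can be applied to a downloaded model to reproduce or vary the reported list. The following sketch is illustrative only and is not the GalaxyPepDock code: atoms are supplied as `(element, (x, y, z))` tuples rather than parsed from a PDB file, "within 5 Å" is read as less than or equal to 5 Å, and the function names are hypothetical.

```python
import math


def is_heavy(element):
    """Heavy atom = anything other than hydrogen (or deuterium)."""
    return element.upper() not in ("H", "D")


def peptide_binding_residues(protein, peptide_atoms, cutoff=5.0):
    """Return IDs of residues with a heavy atom within cutoff Å of a peptide heavy atom.

    protein maps a residue ID to a list of (element, (x, y, z)) atoms;
    peptide_atoms is a list of atoms in the same form.
    """
    peptide_heavy = [xyz for elem, xyz in peptide_atoms if is_heavy(elem)]
    hits = []
    for res_id, atoms in protein.items():
        residue_heavy = [xyz for elem, xyz in atoms if is_heavy(elem)]
        if any(
            math.dist(p, q) <= cutoff
            for p in residue_heavy
            for q in peptide_heavy
        ):
            hits.append(res_id)
    return hits


# Toy coordinates, invented for illustration.
receptor = {
    "A:LEU10": [("C", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))],
    "A:SER50": [("H", (4.0, 0.0, 0.0)), ("O", (12.0, 0.0, 0.0))],
}
peptide = [("N", (4.5, 0.0, 0.0)), ("H", (8.0, 0.0, 0.0))]
print(peptide_binding_residues(receptor, peptide))  # ['A:LEU10']
```

In the toy example, SER50 is not reported even though its oxygen lies 4 Å from a peptide hydrogen, because hydrogens are excluded on both sides; its only heavy-atom pair is 7.5 Å apart.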

4 Notes

  1.

    When a protein sequence is provided as input, GalaxySite first predicts the protein structure by using a simplified version of the GalaxyTBM template-based protein structure prediction program. A protein structure is required because ligand binding sites are predicted by structure-based protein–ligand docking with additional information from available protein–ligand complex structures in the database. For computational efficiency, the loop/termini modeling and further refinement steps employed in the original GalaxyTBM are skipped during GalaxySite runs. A user who wishes to use a protein structure predicted by the full GalaxyTBM pipeline can run the GalaxyTBM program on GalaxyWEB by selecting “TBM” in the “Services” tab at the top of the GalaxyWEB page. The same FASTA-formatted protein sequence described in the Materials section is sufficient to run GalaxyTBM.

  2.

    Because GalaxySite predicts ligand binding sites using available protein–ligand complex structures, it cannot predict ligand binding sites if no structures for similar protein–ligand complexes are identified. In such cases, GalaxySite generates the message, “No template for binding site prediction has been found”.

  3.

    The estimated prediction accuracy in GalaxyPepDock is the estimated fraction of correctly predicted binding site residues. This value is obtained from a linear regression fitted to the prediction and experimental results on the PeptiDB test set [24]. A low estimated prediction accuracy implies that suitable templates could not be selected and that the current similarity-based method may not provide reliable results for the query. When a very low estimated accuracy is returned, the user is advised to try an ab initio protein–peptide docking method such as PEP-SiteFinder [25], which does not rely on similarity to known structures.