1 Introduction

Proteins which bind to small molecules (i.e. ligands) are involved in many biological processes such as enzyme catalysis, receptor signaling, and metabolite transport. Designing these interactions can produce reagents which can serve as biosensors, in vivo diagnostics, signal modulators, molecular delivery devices, and sequestering agents [1–5]. Additionally, the computational design of proteins which bind small molecules serves as a critical test of our understanding of the principles that drive protein/ligand interactions.

While in vitro techniques for the optimization of protein/ligand interactions have shown success [6], these are limited in the number of sequence variants which can be screened, and often require at least a modest starting affinity from which to optimize further [7]. Computational techniques allow searching larger regions of sequence space and permit design in protein scaffolds with no detectable intrinsic affinity for the target ligand. Computational and in vitro techniques are often complementary, and starting activity achieved via computational design can often be improved via in vitro techniques ([8] and Chapter 9 of this volume). Although challenges remain, computational design of small molecule interactions has yielded success on a number of occasions [5, 9], and further attempts will refine our predictive ability to generate novel ligand binders.

The Rosetta macromolecular modeling software suite [10, 11] has proven to be a robust platform for protein design, having produced novel protein folds [12, 13], protein/DNA interactions [14], protein/peptide interactions [15], protein/protein interactions [16], and novel enzymes [17–19]. Technologies for designing protein/ligand interactions have also been developed and applied [4, 8, 20]. Design of ligand binding proteins using Rosetta approaches the problem in one of two ways. One method derives from enzyme design, where predefined key interactions to the ligand are emplaced onto a protein scaffold and the surrounding context is subsequently optimized around them [8]. The other derives from ligand docking, in which the interactions with a movable ligand are optimized comprehensively [4, 20]. Both approaches have proven successful in protein redesign, and features from both can be combined using the RosettaScripts system [21], tailoring the design protocol to particular design needs.

Here we present a protocol derived from RosettaLigand ligand docking [22–25], which designs a protein binding site around a given small molecule ligand (Fig. 1). After preparing the protein and ligand structures, the placement of the ligand in the binding pocket is optimized, followed by optimization of sidechain identity and conformation. This process is repeated iteratively, and the proposed designs are sorted and filtered by a number of relevant structural metrics, such as predicted affinity and hydrogen bonding. This design process should be considered as part of an integrated program of computational and experimental work, where proteins designed computationally are tested experimentally and the experimental results are used to inform subsequent rounds of computational design.

Fig. 1

Flowchart of RosettaLigand design protocol. From the combined input coordinates of the protein and ligand, the position of the ligand is optimized. Next, residues in the protein/ligand interface are optimized for both identity and position. After several cycles of small molecule perturbation, sidechain rotamer sampling, Monte Carlo minimization with Metropolis (MCM) criterion, and a final gradient-based minimization of the protein to resolve any clashes (“high resolution redocking”), the final model is the output. Further optimization can occur by using the final models of one round of design as the input models of the next round. Most variables in this protocol are user-defined, and will be varied to best fit the protein–ligand complex under study

2 Materials

  1.

    A computer running a Unix-like operating system such as Linux or MacOS. Use of a multi-processor computational cluster is recommended for production runs, although test runs and small production runs can be performed on conventional laptop and desktop systems.

  2.

    Rosetta. The Rosetta modeling package can be obtained from the RosettaCommons website (https://www.rosettacommons.org/software/license-and-download). Rosetta licenses are available free to academic users. Rosetta is provided as source code and must be compiled before use. See the Rosetta Documentation (https://www.rosettacommons.org/docs/latest/) for instructions on how to compile Rosetta. The protocol in this paper has been tested with Rosetta weekly release version 2015.12.57698.

  3.

    A program to manipulate small molecules. OpenBabel [26] is a free software package which allows manipulation of many small molecule file formats. See http://openbabel.org/ for download and installation information. The protocol in this paper has been tested with OpenBabel version 2.3.1. Other small molecule manipulation programs can also be used.

  4.

    A ligand conformer generation program. We recommend the BCL [27] which is freely available from http://meilerlab.org/index.php/bclcommons for academic use but does require an additional license to the Cambridge Structural Database [28] for conformer generation. The protocol in this paper has been tested with BCL version 3.2. Other conformer generation programs such as Omega [29], MOE [30], or RDKit [31] can also be used.

  5.

    The structure of the target small molecule in a standard format such as SDF or SMILES (see Note 1 ).

  6.

    The structure of the protein to be redesigned, in PDB format (see Notes 2 and 3 ).

3 Methods

Throughout the protocol ${ROSETTA} represents the directory in which Rosetta has been installed. File contents and commands to be run in the terminal are in italics. The use of a bash shell is assumed—users of other shells may need to modify the syntax of command lines.

3.1 Pre-relax the Protein Structure into the Rosetta Scoring Function [32]

Structures from non-Rosetta sources or from other Rosetta protocols can have minor structural variations resulting in energetic penalties which adversely affect the design process (see Notes 4 and 5).

${ROSETTA}/main/source/bin/relax.linuxgccrelease -ignore_unrecognized_res -ignore_zero_occupancy false -use_input_sc -flip_HNQ -no_optH false -relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints false -s PDB.pdb

For convenience, rename the output structure.

mv PDB_0001.pdb PDB_relaxed.pdb

3.2 Prepare the Ligand

  1.

    Convert the small molecule to SDF format, including adding hydrogens as needed (see Note 6 ).

obabel LIG.smi --gen3D -O LIG_3D.sdf

obabel LIG_3D.sdf -p 7.4 -O LIG.sdf

  2.

    Generate a library of ligand conformers (see Notes 7 and 8 ).

bcl.exe molecule:ConformerGenerator -top_models 100 -ensemble_filenames LIG.sdf -conformers_single_file LIG_conf.sdf

  3.

    Convert the conformer library into a Rosetta-formatted “params file” (see Notes 9 and 10 ).

${ROSETTA}/main/source/src/python/apps/public/molfile_to_params.py -n LIG -p LIG --conformers-in-one-file LIG_conf.sdf

This will produce three files: “LIG.params”, a Rosetta-readable description of the ligand; “LIG.pdb”, a selected ligand conformer; and “LIG_conformers.pdb”, the set of all conformers (see Note 11 ).
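Because LIG.params refers to the conformer library by filename (see Note 11), a quick check after moving or renaming files can catch a broken reference before a long run is launched. The following is a minimal sketch in Python, assuming the params file carries a line of the form `PDB_ROTAMERS <filename>`; the helper names `conformer_file_from_params` and `conformers_present` are hypothetical and are not part of Rosetta.

```python
from pathlib import Path


def conformer_file_from_params(params_text):
    """Return the filename on the PDB_ROTAMERS line, or None if absent."""
    for line in params_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "PDB_ROTAMERS":
            return fields[1]
    return None


def conformers_present(params_path):
    """True if the conformer file named in the params file exists.

    A relative name is looked up next to the params file, since Note 11
    asks for the two files to be kept in the same directory.
    """
    params_path = Path(params_path)
    name = conformer_file_from_params(params_path.read_text())
    if name is None:
        return False
    candidate = Path(name)
    if not candidate.is_absolute():
        candidate = params_path.parent / candidate
    return candidate.is_file()
```

Calling `conformers_present("LIG.params")` before step 3.4 would then flag a params file whose conformer library was left behind in another directory.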

3.3 Place the Ligand into the Protein (See Notes 12 and 13 )

  1.

    Identify the location of desired interaction pockets. Visual inspection using programs like PyMol or Chimera [33] is normally the easiest method (see Note 14 ). Use the structure editing mode of PyMol to move the LIG.pdb file from step 3.2.3 into the starting conformation. Save the repositioned molecule with its new coordinates as a new file (LIG_positioned.pdb) (see Note 15 ).

  2.

    If necessary, use a text editor to make the ligand be residue 1 on chain X (see Note 16 ).

  3.

    Using a structure viewing program, inspect and validate the placement of the ligand (LIG_positioned.pdb) in the binding pocket of the protein (PDB_relaxed.pdb) (see Note 17 ).
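The renumbering in step 3.3.2 can also be scripted rather than done by hand in a text editor. The following is a minimal sketch in Python, assuming the file contains only the ligand (as LIG_positioned.pdb does) and follows the standard fixed-column PDB layout, with the chain ID in column 22, the residue number in columns 23–26, and the insertion code in column 27. The function name `set_ligand_chain_and_number` is hypothetical.

```python
def set_ligand_chain_and_number(pdb_text, chain="X", resnum=1):
    """Rewrite chain ID and residue number on every ATOM/HETATM record.

    Other record types (REMARK, CONECT, TER, ...) are passed through
    unchanged. Any insertion code on atom records is cleared.
    """
    if len(chain) != 1:
        raise ValueError("PDB chain IDs are a single character")
    if not -999 <= resnum <= 9999:
        raise ValueError("residue number must fit in four columns")
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(27)  # make sure the fixed columns exist
            # Columns 1-21 are kept; 22 = chain, 23-26 = number, 27 = icode.
            line = line[:21] + chain + f"{resnum:>4d}" + " " + line[27:]
        out.append(line)
    return "\n".join(out) + "\n"
```

A typical use would be to read LIG_positioned.pdb, pass its text through the function, and write the result back out before the visual check in step 3.3.3.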

3.4 Run Rosetta Design

  1.

    Prepare a residue specification file. A Rosetta resfile allows specification of which residues should be designed and which should not. A good default is a resfile which permits design at all residues in the automatically detected interface (see Note 18).

    ALLAA

    AUTO

    start

    1 X NATAA

  2.

    Prepare a docking and design script (“design.xml”). The suggested protocol is based on RosettaLigand docking using the RosettaScripts framework [22–25]. It will optimize the location of the ligand in the binding pocket (low_res_dock), redesign the surrounding sidechains (design_interface), and refine the interactions in the designed context (high_res_dock). To avoid spurious mutations, a slight energetic bonus is given to the input residue at each position (favor_native).

<ROSETTASCRIPTS>

    <SCOREFXNS>

<ligand_soft_rep weights=ligand_soft_rep />

<hard_rep weights=ligandprime />

    </SCOREFXNS>

    <TASKOPERATIONS>

<DetectProteinLigandInterface name=design_interface cut1=6.0 cut2=8.0 cut3=10.0 cut4=12.0 design=1 resfile="PDB.resfile"/> # see Note 19

    </TASKOPERATIONS>

    <LIGAND_AREAS>

<docking_sidechain chain=X cutoff=6.0 add_nbr_radius=true all_atom_mode=true minimize_ligand=10/>

<final_sidechain chain=X cutoff=6.0 add_nbr_radius=true all_atom_mode=true/>

<final_backbone chain=X cutoff=7.0 add_nbr_radius=false all_atom_mode=true Calpha_restraints=0.3/>

    </LIGAND_AREAS>

    <INTERFACE_BUILDERS>

<side_chain_for_docking ligand_areas=docking_sidechain/>

<side_chain_for_final ligand_areas=final_sidechain/>

<backbone ligand_areas=final_backbone extension_window=3/>

    </INTERFACE_BUILDERS>

    <MOVEMAP_BUILDERS>

<docking sc_interface=side_chain_for_docking minimize_water=true/>

<final sc_interface=side_chain_for_final bb_interface=backbone minimize_water=true/>

    </MOVEMAP_BUILDERS>

    <SCORINGGRIDS ligand_chain=X width=15> # see Note 20

<vdw grid_type=ClassicGrid weight=1.0/>

    </SCORINGGRIDS>

    <MOVERS>

<FavorNativeResidue name=favor_native bonus=1.00 /> # see Notes 21 and 22

<Transform name=transform chain=X box_size=5.0 move_distance=0.1 angle=5 cycles=500 repeats=1 temperature=5 rmsd=4.0 /> # see Note 23

<HighResDocker name=high_res_docker cycles=6 repack_every_Nth=3 scorefxn=ligand_soft_rep movemap_builder=docking/>

<PackRotamersMover name=design_interface scorefxn=hard_rep task_operations=design_interface/>

<FinalMinimizer name=final scorefxn=hard_rep movemap_builder=final/>

<InterfaceScoreCalculator name=add_scores chains=X scorefxn=hard_rep />

<ParsedProtocol name=low_res_dock>

    <Add mover_name=transform/>

</ParsedProtocol>

<ParsedProtocol name=high_res_dock>

    <Add mover_name=high_res_docker/>

    <Add mover_name=final/>

</ParsedProtocol>

</MOVERS>

<PROTOCOLS>

    <Add mover_name=favor_native/>

    <Add mover_name=low_res_dock/>

    <Add mover_name=design_interface/> # see Note 24

    <Add mover_name=high_res_dock/>

    <Add mover_name=add_scores/>

    </PROTOCOLS>

</ROSETTASCRIPTS>

  3.

    Prepare an options file (“design.options”). Rosetta options can be specified either on the command line or in a file. It is convenient to put options which do not change run-to-run (such as those controlling packing and scoring) into an options file rather than the command line.

-ex1

-ex2

-linmem_ig 10

-restore_pre_talaris_2013_behavior # see Note 25

  4.

    Run the design application (see Notes 26 and 27). This will produce a number of output PDB files (named according to the input file names) and a summary score file (“design_results.sc”). See Note 28 for guidance on the number of output models to request.

${ROSETTA}/main/source/bin/rosetta_scripts.linuxgccrelease @design.options -parser:protocol design.xml -extra_res_fa LIG.params -s "PDB_relaxed.pdb LIG_positioned.pdb" -nstruct <number of output models> -out:file:scorefile design_results.sc
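As a worked example of the sizing rule in Note 20, the width passed to the SCORINGGRIDS tag can be estimated from ligand coordinates. This is a minimal sketch in Python: it approximates the “maximum extended length” of the ligand by the largest interatomic distance in whatever coordinates are supplied (for example, pooled over the conformers in LIG_conformers.pdb), which is an assumption, and the function names are hypothetical rather than part of Rosetta. The floor of 15 simply mirrors the width used in the example script.

```python
import math
from itertools import combinations


def max_extent(coords):
    """Largest distance (in Angstroms) between any two atoms."""
    return max((math.dist(a, b) for a, b in combinations(coords, 2)),
               default=0.0)


def suggested_grid_width(coords, box_size, minimum=15.0):
    """Ligand extent plus twice box_size, rounded up, never below minimum."""
    needed = max_extent(coords) + 2.0 * box_size
    return max(minimum, float(math.ceil(needed)))
```

For a ligand whose two most distant atoms are 12 Å apart and a Transform box_size of 5.0, the sketch suggests a width of 22, so the default of 15 in the script above would be too small.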

3.5 Filter Designs

  1.

    Most Rosetta protocols are stochastic in nature. The output structures produced will contain a mixture of good and bad structures. The large number of structures produced need to be filtered to a smaller number of structures taken on to the next step.

A rule of thumb is that filtering should remove unlikely solutions, rather than selecting the single “best” result. Successful designs are typically good across a range of relevant metrics, rather than being the best structure on a single metric (see Note 29 ).

The metrics to use can vary based on the desired properties of the final design. Good standard metrics include the predicted interaction energy of the ligand, the stability score of the complex as a whole, the presence of any clashes [34], shape complementarity of the protein/ligand interface [35], the interface area, the energy density of the interface (binding energy per unit of interface area), and the number of unsatisfied hydrogen bonds formed on binding.

  2.

    Prepare a file (“metric_thresholds.txt”) specifying thresholds to use in filtering the outputs of the design runs. IMPORTANT: The exact values of the thresholds need to be tuned for your particular system (see Note 30 ).

req total_score value < -1010 # measure of protein stability

req if_X_fa_rep value < 1.0 # measure of ligand clashes

req ligand_is_touching_X value > 0.5 # 1.0 if ligand is in pocket

output sortmin interface_delta_X # binding energy

  3.

    Filter on initial metrics from the docking run. This will produce a file (“filtered_pdbs.txt”) containing a list of output PDBs which pass the metric cutoffs.

perl ${ROSETTA}/main/source/src/apps/public/enzdes/DesignSelect.pl -d <(grep SCORE design_results.sc) -c metric_thresholds.txt -tag_column last > filtered_designs.sc

awk '{print $NF ".pdb"}' filtered_designs.sc > filtered_pdbs.txt

  4.

    Calculate additional metrics (see Note 31). Rosetta’s InterfaceAnalyzer [36] calculates a number of additional metrics. These can take time to evaluate, though, so they are best run on only a pre-filtered set of structures. After the metrics are generated, the structures can be filtered as in steps 3.5.2 and 3.5.3. This will produce a score file (“design_interfaces.sc”) containing the calculated metric values for the selected PDBs.

${ROSETTA}/main/source/bin/InterfaceAnalyzer.linuxgccrelease -interface A_X -compute_packstat -pack_separated -score:weights ligandprime -no_nstruct_label -out:file:score_only design_interfaces.sc -l filtered_pdbs.txt -extra_res_fa LIG.params

  5.

    Filter on additional metrics. The commands are similar to those used in step 3.5.3, but are run against the design_interfaces.sc score file and with a new threshold file (“metric_thresholds2.txt”). This will produce a file (“final_filtered_pdbs.txt”) listing the PDBs which pass the additional cutoffs.

perl ${ROSETTA}/main/source/src/apps/public/enzdes/DesignSelect.pl -d <(grep SCORE design_interfaces.sc) -c metric_thresholds2.txt -tag_column last > filtered_interfaces.sc

awk '{print $NF ".pdb"}' filtered_interfaces.sc > final_filtered_pdbs.txt

Example contents of metric_thresholds2.txt:

req packstat value > 0.55 # packing metric; 0-1 higher better

req sc_value value > 0.45 # shape complementarity; 0-1 higher better

req delta_unsatHbonds value < 1.5 # unsatisfied hydrogen bonds on binding

req dG_separated/dSASAx100 value < -0.5 # binding energy per contact area

output sortmin dG_separated # binding energy
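For exploring thresholds interactively (see Note 30), it can help to apply the same kind of cutoffs in a short script. The following is a minimal sketch in Python of threshold filtering on a Rosetta score file; it is not the implementation of DesignSelect.pl. It assumes that data lines start with `SCORE:`, that the first such line holds the column names, and that the last column is the model tag. The function names are hypothetical.

```python
def read_scores(text):
    """Parse 'SCORE:' lines into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        if fields == header or len(fields) != len(header):
            continue  # skip repeated headers and truncated rows
        rows.append(dict(zip(header, fields)))
    return rows


def filter_designs(rows, requirements, sort_by):
    """Keep rows passing every (column, op, cutoff); sort ascending."""
    tests = {"<": lambda v, c: v < c, ">": lambda v, c: v > c}
    kept = [
        r for r in rows
        if all(tests[op](float(r[col]), cut) for col, op, cut in requirements)
    ]
    return sorted(kept, key=lambda r: float(r[sort_by]))
```

The requirements mirror the “req” lines of the threshold files, for example `[("total_score", "<", -1010.0), ("if_X_fa_rep", "<", 1.0)]`, and `sort_by` plays the role of the “output sortmin” line.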

3.6 Manually Inspect Selected Sequences

While automated procedures are continually improving and can substitute to a limited extent [37], there is still no substitute for expert human knowledge in evaluating designs. Visual inspection of interfaces by a domain expert can capture system-specific requirements that are difficult to encode into an automated filter (see Note 32 ).

3.7 Reapply the Design Protocol, Starting at Step 3.4

Improved results can be obtained by repeating the design protocol on the output structures from previous rounds of design. The number of design rounds depends on your system and how quickly it converges, but 3–5 rounds of design, each starting from the filtered structures of the previous one, is typical (see Note 33 ).

3.8 Extract Protein Sequences from the Final Selected Designs into FASTA Format

${ROSETTA}/main/source/src/python/apps/public/pdb2fasta.py $(cat final_filtered_pdbs.txt) > selected_sequences.fasta
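Once the sequences are in FASTA format, the mutations in each design relative to the starting protein can be tallied, which is also useful when tuning the native residue bonus (see Note 21). The following is a minimal sketch in Python, assuming plain FASTA text with one record per structure and designs of the same length as the starting sequence. The function names and the record names in the usage example are hypothetical, and positions are 1-based indices into the sequence, not PDB residue numbers.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def list_mutations(native, design):
    """Return mutations such as 'A45R' (1-based sequence positions)."""
    if len(native) != len(design):
        raise ValueError("sequences must have equal length")
    return [
        f"{n}{i}{d}"
        for i, (n, d) in enumerate(zip(native, design), start=1)
        if n != d
    ]
```

For example, with records named `native` and `design_0001`, `list_mutations(recs["native"], recs["design_0001"])` gives the list of substitutions to examine during the manual inspection of step 3.6.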

3.9 Iteration of Design

Only rarely will the initial design from a computational protocol give exactly the desired results. Often it is necessary to perform iterative cycles of design and experiment, using information learned from experiment to alter the design process (Fig. 2).

Fig. 2

Protein/ligand interface design with RosettaLigand. (a) Comparison of improvements in Interface Score and Total Score for top models from an initial placement, docking without sequence design, and docking with design. (b) Sequence logo of mutation sites among the top models from a round of interface design [43]. For most positions, the consensus sequence resembles the native sequence. Amino acids with sidechains that directly interact with the ligand show a high propensity for mutation, as seen in the positions with decreased consensus. (c) Example of a typical mutation introduced by RosettaLigand. The protein structure is represented in cartoon (cyan). The native alanine (pink) is mutated to an arginine residue (green) to match ionic interactions with the negatively charged ligand (green). Image generated in PyMol [44]

4 Notes

  1.

    Acceptable formats depend on the capabilities of your small molecule handling program. OpenBabel can be used to convert most small molecule representations, including SMILES and InChI, into the SDF format needed by Rosetta.

  2.

    While Rosetta can ignore chain breaks and missing loops far from the binding site, the structure of the protein should be complete in the region of ligand binding. If the binding pocket is missing residues, remodel these with a comparative modeling protocol, using the starting structure as a template.

  3.

    High resolution experimental structures determined in complex with a closely related ligand are most desirable, but not required. Experimental structures of the unliganded protein and even homology models can be used [38, 39].

  4.

    The option “-relax:coord_constrain_sidechains” should be omitted if the starting conformations of the sidechains are from modeling rather than experimental results.

  5.

    Rosetta applications encode the compilation conditions in their filename. Applications may have names which end with *.linuxgccrelease, *.macosclangrelease, *.linuxiccrelease, etc. Use whichever ending is produced for your system. Applications ending in “debug” have additional error checking which slows down production runs.

  6.

    It is important to add hydrogens for the physiological conditions under which you wish to design. At neutral pH, for example, amines should be protonated and carboxylates deprotonated. The “-p” option of OpenBabel uses heuristic rules to reprotonate molecules for a given pH value. Apolar hydrogens should also be present.

  7.

    Visually examine the produced conformers and manually remove any which are folded back on themselves or are otherwise unsuitable for being the target design conformation.

  8.

    It is unnecessary to sample hydrogen positions during rotamer generation, although any ring flip or relevant heavy atom isomeric changes should be sampled.

  9.

    molfile_to_params.py can take a number of options; run with the “-h” option for details. The most important ones are: “-n”, which allows you to specify a three letter code to use with the PDB file reading and writing, permitting you to mix multiple ligands; “-p”, which specifies output file naming; “--recharge”, which is used to specify the net charge on the ligand if not correctly autodetected; and “--nbr_atom”, which allows you to specify a neighbor atom (see Note 10).

  10.

    Specifying the neighbor atom is important for ligands with offset “cores”. The neighbor atom is the atom which is superimposed when conformers are exchanged. By default the neighbor atom is the “most central” atom. If you have a ligand with a core that should be stable when changing conformers, you should specify an atom in that core as the neighbor atom.

  11.

    LIG.params expects LIG_conformers.pdb to be in the same directory, so keep them together when moving files to a new directory. If you change the name of the files, you will need to adjust the value of the PDB_ROTAMERS line in the LIG.params file.

  12.

    Rosetta expects the atom names to match those generated in the molfile_to_params.py step. Even if you have a starting structure with the ligand correctly placed, you should align the molfile_to_params.py generated structure into the pocket so that atom naming is correct.

  13.

    Other methods of placing the ligand in the pocket are also possible. Notably, Tinberg et al. [8] used RosettaMatch [40] both to place the ligand in an appropriate scaffold and to place key interactions in the scaffold.

  14.

    Other pocket detection algorithms can also be used (see Chapter 1 of this volume and [41] for a review).

  15.

    If you have a particularly large pocket, or multiple potential pockets, save separate ligand structures at different positions and perform multiple design runs. For a large number of locations, the StartFrom mover in RosettaScripts can be used to randomly place the ligand at multiple specified locations in a single run.

  16.

    Being chain X residue 1 should be the default for molfile_to_params.py produced structures. Chain identity is important as the protocol can be used to design for ligand binding in the presence of cofactors or multiple ligands. For fixed-location cofactors, simply change the PDB chain of the cofactor to something other than X, add the cofactor to the input protein structure, and add the cofactors’ params file to the -extra_res_fa command line option. For designing to multiple movable ligands, including explicit waters, see Lemmon et al. [42].

  17.

    To refine the initial starting position of the ligand in the protein, you can do a few “design” runs as in step 3.4, but with design turned off. Change the value of the design option in the DetectProteinLigandInterface tag to zero. A good starting structure will likely have good total scores and good interface energy from these runs, but is unlikely to show ideal interactions. Pay more attention to the position and orientation of the ligand than to the energetics of this initial placement docking run.

  18.

    The exact resfile to use will depend on system-specific knowledge of the protein structure and desired interactions. Relevant commands are ALLAA (allow design to all amino acids), PIKAA (allow design to only specified amino acids), NATAA (disallow design but permit sidechain movement), and NATRO (disallow sidechain movement). The AUTO specification allows the DetectProteinLigandInterface task operation to remove design and sidechain movement from residues which are “too far” from the ligand.

  19.

    Change the name of the resfile in the XML script to match the full path and filename of the resfile you are using. The cut values decide how to treat residues with the AUTO specification. All AUTO residues with a C-beta atom within cut1 Angstroms of the ligand will be designed, as will all residues within cut2 which are pointing toward the ligand. The logic in selecting sidechains is similar for cut3 and cut4, respectively, but with sidechain flexibility rather than design. Anything outside of the cut shells will be ignored during the design phase, but may be moved during other phases.

  20.

    The grid width must be large enough to accommodate the ligand. For longer ligands, increase the value to at least the maximum extended length of the ligand plus twice the value of box_size in the Transform mover.

  21.

    Allison et al. [20] found that a value of 1.0 for the FavorNativeResidue bonus worked best over their benchmark set. Depending on your particular requirements, though, you may wish to adjust this value. Do a few test runs with different values of the bonus and examine the number of mutations which result. If there are more mutations than desired, increase the bonus. If fewer than expected, decrease the bonus.

  22.

    More complicated native favoring schemes can be devised by using FavorSequenceProfile instead of FavorNativeResidue. For example, you can add weights according to BLOSUM62 relatedness scores, or even use a BLAST-formatted position-specific scoring matrix (PSSM) to weight the bonus based on the distribution of sequences seen in homologous proteins.

  23.

    The value of box_size sets the maximum rigid body displacement of the ligand from the starting position. The value of rmsd sets the maximum allowed root mean squared deviation from the starting position. Set these to smaller values if you wish to keep the designed ligand closer to the starting conformation, and to larger values if you want to permit more movement. These are limits for the active sampling stage of the protocol only. Additional movement may occur during other stages of the protocol.

  24.

    The provided protocol only does one round of design and minimization. Additional rounds may be desired for further refinement. Simply replicate the low_res_dock, design_interface, and high_res_dock lines in the PROTOCOLS section to add additional rounds of design and optimization. Alternatively, the EnzRepackMinimize mover may be used for finer control of cycles of design and minimization (although it does not incorporate any rigid body sampling).

  25.

    Refinement of the Rosetta scorefunction for design of protein/ligand interfaces is an area of current active research. The provided protocol uses the standard ligand docking scorefunction which was optimized prior to the scorefunction changes in 2013, and thus requires an option to revert certain changes. Decent design performance has also been seen with the “enzdes” scorefunction (which also requires the -restore_pre_talaris_2013_behavior option) and the standard “talaris2013” scorefunction.

  26.

    Use of a computational cluster is recommended for large production runs. Talk to your local cluster administrator for instructions on how to launch jobs on your particular cluster system. The design runs are “trivially parallel” and can either be manually split or run with an MPI-compiled version. If splitting manually, change the value of the -nstruct option to reduce the number of structures produced by each job, and use the options -out:file:prefix or -out:file:suffix to uniquely label each run. The MPI version of rosetta_scripts can automatically handle distributing structures to multiple CPUs, but requires Rosetta to be compiled and launched in cluster-specific ways. See the Rosetta documentation for details.

  27.

    The Rosetta option “-s” takes a list of PDBs to use as input for the run. The residues from multiple PDBs can be combined into a single structure by enclosing the filenames in quotes on the command line. Multiple filenames not enclosed in quotes will be treated as independent starting structures.

  28.

    The number of output models needed (the value passed to -nstruct) will depend on the size of the protein pocket and the extent of remodeling needed. Normally, 1000–5000 models is a good sized run for a single starting structure and a single protocol variant. At a certain point, you will reach “convergence” and the additional models will not show appreciable metric improvement or sequence differences. If you have additional computational resources, it is often better to run multiple smaller runs (100–1000 models) with slightly varying protocols (different starting location, number of rounds, extent of optimization, native bonus, etc.), rather than have a larger number of structures from the identical protocol.

  29.

    Relevant metrics can be determined by using “positive controls”. That is, run the design protocol on known protein–ligand interactions which resemble your desired interactions. By examining how the known ligand–protein complexes behave under the Rosetta protocol, you can identify features which are useful for distinguishing native-like interactions from non-native interactions. Likewise, “negative controls”, where the design protocol is run without design (see Note 17 ) can be useful for establishing baseline metric values and cutoffs.

  30.

    The thresholds to use are system-specific. A good rule of thumb is to discard at least a tenth to a quarter by each relevant metric. More important metrics can receive stricter thresholds. You may wish to plot the distribution of scores to see if there is a natural threshold at which to set the cut. You will likely need to do several test runs to adjust the thresholds to levels which give a reasonable number of output sequences. “Negative controls” (the protocol run with design disabled, see Note 17) can also be used to determine thresholds.

  31.

    Other system-specific metric values are available through the RosettaScripts interface as “Filters”. Adding “confidence = 0” in the filter definition tag will turn off the filtering behavior and will instead just report the calculated metric for the final structure in the final score file. Many custom metrics, such as specific atom–atom distances, can be constructed in this fashion. See the Rosetta documentation for details.

  32.

    Certain automated protocols can ease this post-analysis. For example, Rosetta can sometimes produce mutations which have only a minor influence on binding energy. While the native bonus (see Notes 21 and 22) mitigates this somewhat, explicitly considering mutation-by-mutation reversions can further reduce the number of such “spurious” mutations seen. Nivon et al. [37] present such a protocol.

  33.

    In subsequent rounds, you will likely want to decrease the aggressiveness of the low resolution sampling stage (the box_size and rmsd values of the Transform mover in step 3.4.2) as the ligand settles into a preferred binding orientation. As the output structure contains both the protein and ligand, the quotes on the values passed to the “-s” option (see step 3.4.4 and Note 27) are no longer needed. Instead, you may wish to use the “-l” option, which takes the name of a text file containing one input PDB per line. Each input PDB will produce “-nstruct” models. Reduce this value such that the total number of unfiltered output structures in each round is approximately the same.
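The arithmetic behind the last point of Note 33 can be written down explicitly. The following is a minimal sketch in Python; the function name `nstruct_per_input` is hypothetical, and rounding up (so that the round never falls short of the target) is a choice made here rather than a recommendation from the protocol.

```python
import math


def nstruct_per_input(total_models, n_inputs):
    """Value for -nstruct so a round yields about total_models structures."""
    if n_inputs < 1:
        raise ValueError("need at least one input structure")
    return max(1, math.ceil(total_models / n_inputs))
```

For instance, if the first round produced 2000 models from a single starting structure and 40 filtered designs are carried forward with “-l”, a value of 50 for “-nstruct” keeps the size of the second round roughly unchanged.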