Abstract
Molecular replacement is already able to solve the majority of structures in the Protein Data Bank, thanks to the rapidly increasing number of template structures available and continuous improvements in the algorithms. Chances of success can be optimised by proper preparation of models, for instance by trimming poorly conserved regions, creating an ensemble of alternative models or applying advanced homology modelling tools. The sensitivity of the molecular replacement search can be improved by using likelihood targets; these lend themselves to automation, which makes it possible to carry out extensive searches and helps to avoid user errors. The convergence radius of model completion can be extended by using methods that smoothly deform the starting model or apply advanced modelling techniques. Even more difficult structures can be solved by combining molecular replacement with other phasing methods, such as SAD phasing or multi-crystal averaging.
11.1 Introduction
When the second protein crystal structure was solved (haemoglobin; [16]), it was already seen to resemble the first protein crystal structure (myoglobin; [7]), and the seeds of the molecular replacement method were sown. In the subsequent half-century it has become very clear that proteins with similar amino acid sequences have similar 3D structure, and for a long time molecular replacement has been an essential tool for the macromolecular crystallographer [22].
By now about two-thirds of protein structures are solved by molecular replacement [10] and, as the Protein Data Bank continues to expand, the method can only become more prominent. The rise in molecular replacement is also fuelled, in large part, by improvements in the algorithms, from model preparation through the molecular replacement search algorithms and on to the methods used to complete structures from poor starting models.
11.2 Model Preparation
To carry out molecular replacement, it is necessary to find a template (a related structure in the PDB) and then, possibly, to modify this template to be more similar to the target structure in the unknown crystal. Until a few years ago, the application of any molecular modelling protocol that changed the coordinates of the atoms tended to make the model worse for molecular replacement than the underlying template; in essence, there are many more ways to degrade the model than to improve it.
11.2.1 Model Trimming
One simple way to improve a template is to trim off the parts that are not expected to be conserved in the target, such as a domain or a large surface loop. At times it has been popular to trim back all the side chains to give a poly-Ala model, avoiding uncertainty about side-chain conformation; we find, in general, that this is too extreme and throws away useful signal.
Schwarzenbacher et al. [24] carried out a careful study of model trimming and drew two important conclusions. First, it is generally better to leave the side chains of conserved residues in the model, because their conformation is likely to be conserved as well, but to trim back non-identical side chains (and non-conserved surface loops). Even for non-identical residues, the first torsion angle is often conserved, so it is usually a good idea to keep the gamma atom of the residue. Second, as the sequence identity drops, it becomes essential to use the best possible sequence alignment, such as one obtained by profile-profile alignment methods, so that the right side chains and surface loops are actually modified.
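The side-chain rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the one-letter alignment encoding with "-" for a gap, and the special handling of Gly and Ala targets are not taken from the cited study.

```python
BACKBONE = {"N", "CA", "C", "O"}
# Atoms at the gamma position, which define the first side-chain torsion angle.
GAMMA = {"CG", "CG1", "OG", "OG1", "SG"}


def atoms_to_keep(template_res, target_res, atom_names):
    """Return the atom names to retain for one aligned residue pair.

    Residues are one-letter codes; '-' in the target marks a gap (hypothetical encoding).
    """
    if target_res == "-":
        return []  # no counterpart in the target: delete the residue
    if template_res == target_res:
        return list(atom_names)  # identical residue: keep the whole side chain
    allowed = set(BACKBONE)
    # Assumption added for illustration: do not keep atoms the target cannot have.
    if target_res != "G":
        allowed.add("CB")
    if target_res not in ("G", "A"):
        allowed |= GAMMA
    return [name for name in atom_names if name in allowed]


leu_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]
kept_identical = atoms_to_keep("L", "L", leu_atoms)
kept_mutated = atoms_to_keep("L", "K", leu_atoms)
kept_gap = atoms_to_keep("L", "-", leu_atoms)
```

The quality of the result depends entirely on the alignment that supplies the residue pairs, which is the second conclusion drawn above.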
Another form of model preparation is carried out in MOLREP [28]. Rather than simply deleting uncertain side chains, their B-factors can be inflated to reduce their influence on the calculation in a more subtle fashion [9].
The program Sculptor [3] combines these approaches and allows a number of different model preparation protocols to be tested. Side chains and loops can be trimmed in different ways, and B-factors can be adjusted according to surface accessibility, local sequence conservation, or a combination of both. By carrying out a series of molecular replacement calculations with a number of different variations on the starting template, the overall success rate can be increased significantly.
11.2.2 Molecular Modelling
In recent years, the sophistication of molecular modelling algorithms has finally reached the point where the starting templates can be improved for molecular replacement. Impressive results have been obtained using the Rosetta modelling package to improve starting models derived from NMR experiments or from the crystal structures of homologues [17].
11.2.3 Ab Initio Modelling
In fact, even an ab initio model created by Rosetta in a blind structure prediction test was shown to be sufficiently accurate to be used successfully for molecular replacement [17]. The computational resources required to fold ab initio models of this level of accuracy are substantial, but it has subsequently been shown that, at least in favourable cases, ab initio folding methods making a more modest use of CPU time can also succeed [20].
11.2.4 Ensembles
As sequence identity drops, structures become less similar and the success rate of molecular replacement also drops. However, there is also often a greater number of choices of model at a lower sequence identity level. By collecting these into an ensemble, in which the conserved features are enhanced and the variable features are downweighted, the success rate can again be boosted. The likelihood framework, discussed below, allows a statistical weighting of the contributions of members of an ensemble, which can be helpful [18].
The success rate can also be enhanced by trimming off surface loops that are not conserved among members of the ensemble, leaving a conserved core. This was essential, for instance, in solving the structure of angiotensinogen using a collection of models with about 20 % sequence identity (Fig. 11.1; [29]). An automated trimming option has been implemented in the Ensembler program (Bunkóczi and Read, unpublished), along with a robust multiple-superposition method that optimises the superposition of the conserved core.
11.3 Molecular Replacement Calculations
In principle, molecular replacement is a 6n-dimensional search to find the orientations and positions of n models, but such a large space is impractical to search exhaustively. One approach is to use stochastic methods such as genetic algorithms (EPMR; [8]) or Monte Carlo (QoS; [5]) to search in the full space. However, most molecular replacement programs, such as our program Phaser, break the problem down into a series of 3D searches with rotation functions to find the orientation of a molecule and translation functions to find its position. For problems where the model is sufficiently accurate to yield a useful map, the signal in the individual searches is usually strong enough that the correct solution at each step is found in a relatively short list of plausible partial solutions. This enables a tree-search-with-pruning strategy [14].
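A tree search with pruning can be sketched generically as follows. This is a minimal illustration, not Phaser's implementation: `expand` is a hypothetical callable standing in for a rotation search followed by a translation search for the next component, and pruning is done here simply by keeping a fixed number of top-scoring branches.

```python
def tree_search(n_components, expand, max_keep=5):
    """Build solutions one component at a time, keeping only the best branches.

    expand(partial) must return (placement, score) pairs, where score rates the
    partial solution extended by that placement (higher is better).
    """
    frontier = [((), 0.0)]  # start from the empty partial solution
    for _ in range(n_components):
        children = []
        for partial, _score in frontier:
            for placement, score in expand(partial):
                children.append((partial + (placement,), score))
        # Pruning step: retain a short list of plausible partial solutions.
        children.sort(key=lambda item: item[1], reverse=True)
        frontier = children[:max_keep]
    return frontier


# Toy problem: each "placement" is an integer 0..9 and the hidden answer is (3, 7).
hidden = (3, 7)


def toy_expand(partial):
    results = []
    for p in range(10):
        trial = partial + (p,)
        score = -sum(abs(a - b) for a, b in zip(trial, hidden))
        results.append((p, score))
    return results


solutions = tree_search(2, toy_expand, max_keep=5)
best_solution = solutions[0][0]
```

With two components and ten placements each, the toy search scores 10 + 5 × 10 candidates rather than all 100 combinations; the saving grows rapidly with the number of components.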
11.3.1 Likelihood
Traditional molecular replacement calculations were based on the properties of the Patterson map, but the use of likelihood scores has a number of advantages [18]: the influence of data at different resolutions is weighted sensibly based on the expected quality of the model, information from partial models can be taken into account, and the likelihood score can be used robustly to rank different potential solutions, which is useful for automation strategies.
The molecular replacement likelihood functions [18] are relatively expensive to compute but, fortunately, it is possible to derive good approximations that can be computed efficiently. Likelihood-based fast rotation [25] and fast translation [13] functions can be used to generate a short list of plausible solutions, which can then be ranked using the full likelihood score.
The idea of likelihood is simple: models or hypotheses can be tested by how well they agree with the measured data. Likelihood gives a probabilistic measure of agreement with the data, i.e. likelihood measures the probability that the set of data would have been measured, given the model and any associated uncertainties in the model parameters or the data. A more in-depth understanding can be obtained from the review on likelihood in crystallography by McCoy [11].
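In symbols, the definition above can be written as follows. The notation is introduced here for illustration, and the second line assumes that the reflections can be treated as independent, which is a simplification.

```latex
% D: the measured data (observed structure-factor amplitudes)
% M: the model hypothesis, e.g. an orientation and position of the search model
\[
  L(M) = p(D \mid M)
\]
% Assumption for illustration: reflections \mathbf{h} are independent,
% so the log-likelihood is a sum over reflections.
\[
  \ln L(M) = \sum_{\mathbf{h}} \ln p\bigl(|F_{\mathrm{obs}}(\mathbf{h})| \mid M\bigr)
\]
```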
11.3.2 Automation
A molecular replacement calculation can be thought of as testing a series of hypotheses about the orientation and then the position taken by molecules in the crystal. Since likelihood is an effective measure to rank hypotheses, it lends itself to decision-making in an automated molecular replacement strategy. As noted above, Phaser uses a tree-search-with-pruning strategy. Heuristic rules (e.g. the correct solution is usually above 75 % of the distance between the mean and the top in any step of the search) are used to keep a list of plausible solutions and discard the less plausible ones. Multiple alternative models for a component can be evaluated at the same time, and the best one can be chosen by its likelihood score. Even different possible choices of space group can be evaluated. If the crystal contains a complex of different components, then the search order for the different components can be evaluated by considering how well each component would explain the data.
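The 75 % heuristic mentioned above can be sketched as a simple selection rule. This is an illustration only: the function name is hypothetical, and the mean is taken over the list of scores supplied, which may differ from how the mean is defined in an actual search.

```python
def prune_by_fraction(scores, fraction=0.75):
    """Return indices of candidates scoring at least mean + fraction * (top - mean)."""
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    top = max(scores)
    cutoff = mean + fraction * (top - mean)
    return [i for i, score in enumerate(scores) if score >= cutoff]


# Six candidate scores: mean is 20, top is 40, so the cutoff is 35.
candidate_scores = [10.0, 12.0, 11.0, 40.0, 35.0, 12.0]
kept = prune_by_fraction(candidate_scores)
```

Lowering `fraction` keeps more branches alive, trading computing time for a smaller risk of discarding the correct solution.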
Increasingly, molecular replacement is being implemented as part of a pipeline, such as MrBUMP [6], BALBES [10] and AutoMR in the Phenix package [1]. Ideally, such pipelines are started by supplying only the diffraction data and the sequences of the proteins in the crystal, and then they fetch the template structures, modify them, carry out molecular replacement, and even follow that with automated building and refinement.
11.3.3 Pathologies
Experience has shown that likelihood targets are more sensitive than the traditional Patterson-based methods in finding the solution. However, this sensitivity is a double-edged sword, because likelihood is also more sensitive to errors in the assumptions used to derive the likelihood targets. One such assumption is that the crystal diffracts isotropically (i.e. equally strongly in all directions in reciprocal space). Likelihood-based molecular replacement is severely degraded by the effects of anisotropic diffraction, unless a correction is applied. Fortunately, likelihood also provides the tools to characterise the anisotropy and correct for its effects [14], and anisotropic diffraction no longer presents a problem.
Similarly, the presence of translational non-crystallographic symmetry (tNCS) also severely violates the assumptions of the original likelihood targets. In tNCS, two or more copies of the molecule are found in the same orientation in the crystal. Depending on their relative position, and how this relates to the Bragg planes for a particular reflection, they can scatter in phase (leading to exceptionally strong reflections) or out of phase (leading to exceptionally weak reflections). Until recently, the presence of tNCS was one of the leading causes for Phaser to fail in cases that would otherwise be expected to succeed. Methods to characterise tNCS and account for its statistical effects on the diffraction pattern have now been implemented in Phaser, dramatically increasing success rates in these cases (McCoy and Read, unpublished).
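The in-phase and out-of-phase behaviour can be made concrete with an idealised sketch. Assuming two exactly identical copies in exactly the same orientation, related by a translation t in fractional coordinates, the combined structure factor is F(h)[1 + exp(2πi h·t)], so the intensity is scaled by 2 + 2cos(2π h·t) relative to a single copy. The function name is illustrative, and this is not the statistical treatment implemented in Phaser, which must also allow for small differences between the copies.

```python
import math


def tncs_intensity_factor(hkl, t_frac):
    """Intensity of two identical, identically oriented copies relative to one copy.

    hkl: Miller indices (h, k, l); t_frac: translation between the copies in
    fractional coordinates. Idealised: the copies are assumed to be exact duplicates.
    """
    phase = 2.0 * math.pi * sum(h * t for h, t in zip(hkl, t_frac))
    return 2.0 + 2.0 * math.cos(phase)


# Translation of half a cell edge along a: reflections with even h are enhanced,
# reflections with odd h are extinguished.
strong = tncs_intensity_factor((2, 0, 0), (0.5, 0.0, 0.0))
weak = tncs_intensity_factor((1, 0, 0), (0.5, 0.0, 0.0))
```

The factor ranges from 0 to 4, which is why intensity statistics that assume a single unmodulated distribution are misleading when tNCS is present.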
11.4 Model Completion
When the available models are poor (typically low sequence identity) or incomplete, or the resolution of the data is limited, it has frequently been found that the molecular replacement problem can be solved but the electron density maps are too poor to see what needs to be done to complete the structure. Fortunately, a number of recent developments have markedly improved this situation.
11.4.1 Morphing and Other Smooth Deformations
Looking at distant homologues, one often sees that the basic fold is preserved, but the relative positions and orientations of structural elements have changed slightly. Even though such movements might be difficult to see in a density map at the local level, there are weak signals that can be combined over a larger region. Tom Terwilliger (personal communication) has developed a “morphing” algorithm that takes advantage of these signals. It looks for rigid-body movements that would improve the fit to density of a window of residues along the chain, and then applies that shift to the central residue in the window. By sliding the window along the chain, a smooth transformation (“morphing”) of the model is achieved. In a number of test cases, this has led to sufficient improvement in the model, and thus the phases, that further improvements to the model become clear in the density.
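The sliding-window idea can be sketched in simplified form. This is not the algorithm as implemented by its author: the sketch uses a pure translation in place of a full rigid-body movement, and `estimate_shift` is a hypothetical callable standing in for the fit to density of a window of residues.

```python
def morph(coords, estimate_shift, half_window=2):
    """Shift each residue by the movement estimated for the window centred on it.

    coords: list of (x, y, z) tuples, one per residue along the chain.
    estimate_shift(coords, lo, hi): returns (dx, dy, dz) for residues lo..hi-1.
    All shifts are estimated from the unmodified starting coordinates.
    """
    n = len(coords)
    morphed = []
    for i, (x, y, z) in enumerate(coords):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        dx, dy, dz = estimate_shift(coords, lo, hi)
        # Only the central residue receives the shift found for its window.
        morphed.append((x + dx, y + dy, z + dz))
    return morphed


# Toy stand-in for density: the "true" chain is displaced along y by 3*j at residue j.
start = [(float(j), 0.0, 0.0) for j in range(5)]
truth = [(float(j), 3.0 * j, 0.0) for j in range(5)]


def toy_shift(coords, lo, hi):
    count = hi - lo
    return tuple(
        sum(truth[j][k] - coords[j][k] for j in range(lo, hi)) / count
        for k in range(3)
    )


result = morph(start, toy_shift, half_window=1)
```

Because neighbouring windows overlap, the shifts applied to adjacent residues change gradually along the chain, which is what keeps the deformation smooth.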
Refinement methods that lead to smooth deformations, such as the jelly-body method [15] or DEN refinement [23], are also very helpful in the initial stages of refinement from a poor molecular replacement model. This is illustrated clearly in a test case using DEN refinement to complete a structure that had been stuck in refinement [2].
11.4.2 Rosetta Modelling
In particularly difficult cases, the largest convergence radius in rebuilding and refining from a poor model is probably achieved by using the advanced modelling algorithms in Rosetta [4], combining the Rosetta energy functions with electron density fit scores to build into noisy density maps. The phenix.mr_rosetta pipeline [27] provides a convenient interface giving access to Rosetta modelling, molecular replacement in Phaser, and automated building and refinement in AutoBuild [26].
11.4.3 Arcimboldo
Completing the structure starting from a highly incomplete model presents similar challenges to starting from a poor but relatively complete model. The Arcimboldo procedure [21] is discussed elsewhere in greater detail by Isabel Usón. Briefly, this exploits the power of density modification and automated building algorithms to extend incomplete models comprising only a few helices, placed using Phaser.
11.5 Combined Methods
11.5.1 MR-SAD
A molecular replacement model can be used as a starting point for the computation of log-likelihood-gradient (LLG) maps to find anomalous scatterers using single-wavelength anomalous diffraction (SAD; [12, 19]). In some cases, the anomalous signal may be too weak to find the anomalous scatterers with ab initio substructure determination methods, but nonetheless significant phase information can be obtained once the sites have been found using SAD LLG maps, even if those are based on a poor molecular replacement model. In other cases, locating anomalous scatterers in a refined model can be a valuable tool for identifying unknown components, such as bound ions.
11.5.2 Using Density as a Model
Proteins frequently crystallise in multiple crystal forms and, at times, experimental phase information can only be obtained for one of these forms. In such cases, the electron density can be cut out of one map and used as a molecular replacement model to solve another crystal form.
Such a procedure was used in solving the structure of angiotensinogen [29]. A poor electron density map was available for crystals of the human form of this protein, combining information from molecular replacement with an ensemble of distant models at 3.3 Å resolution with SAD phases from a GdCl3 derivative at 4 Å resolution. Molecular replacement with the same ensemble model did not succeed in solving the structures of crystals from rat or mouse angiotensinogen, but electron density extracted from the map of the human form did give a clear solution for two copies of angiotensinogen in one of the rat crystal forms. In turn, averaged density from this rat crystal form could be used to find two copies in the second rat crystal form, allowing 4-fold multi-crystal averaging to be initiated between the two rat crystal forms.
Molecular replacement serves two purposes for multi-crystal averaging, in such cases: it defines the rotation and translation operators that superimpose the density in one crystal on the density in the other crystal, and it provides initial phases for the second crystal form.
11.6 Future Developments
There has been rapid progress in recent years in the power and reach of molecular replacement, and there are good reasons to believe that this will continue. As density modification and model-building algorithms improve, it will become possible to solve structures from even less complete and less accurate starting points. Improvements in our understanding of the likelihood targets will feed into better automation strategies, both by allowing us to predict how good the model must be to have a chance of success, and by providing measures of confidence in partial solutions obtained along the solution path. Even if there were no improvements in the algorithms, the continued rapid growth of the PDB would ensure that there are good models for an ever-expanding set of targets.
References
1. Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung L-W, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH (2010) PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D 66:213–221
2. Brunger AT, Das D, Deacon AM, Grant J, Terwilliger TC, Read RJ, Adams PD, Levitt M, Schröder GF (2012) Application of DEN-refinement and automated model-building to a difficult case of molecular replacement phasing: the structure of a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum. Acta Crystallogr D 68:391–403
3. Bunkóczi G, Read RJ (2011) Improvement of molecular replacement models with Sculptor. Acta Crystallogr D 67:303–312
4. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL, Das D, Vorobiev SM, Iwaï H, Pokkuluri PR, Baker D (2011) Improving molecular replacement by density- and energy-guided protein structure optimization. Nature 473:540–543
5. Glykos NM, Kokkinidis M (2001) Multidimensional molecular replacement. Acta Crystallogr D 57:1462–1473
6. Keegan RM, Winn MD (2008) MrBUMP: an automated pipeline for molecular replacement. Acta Crystallogr D 64:119–124
7. Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC (1958) A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181:662–666
8. Kissinger CR, Gehlhaar DK, Fogel DB (1999) Rapid automated molecular replacement by evolutionary search. Acta Crystallogr D 55:484–491
9. Lebedev AA, Vagin AA, Murshudov G (2008) Model preparation in MOLREP and examples of model improvements using X-ray data. Acta Crystallogr D 64:33–39
10. Long F, Vagin AA, Young P, Murshudov G (2008) BALBES: a molecular-replacement pipeline. Acta Crystallogr D 64:125–132
11. McCoy AJ (2004) Liking likelihood. Acta Crystallogr D 60:2169–2183
12. McCoy AJ, Read RJ (2010) Experimental phasing: best practice and pitfalls. Acta Crystallogr D 66:458–469
13. McCoy AJ, Grosse-Kunstleve RW, Storoni LC, Read RJ (2005) Likelihood-enhanced fast translation functions. Acta Crystallogr D 61:458–464
14. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ (2007) Phaser crystallographic software. J Appl Crystallogr 40:658–674
15. Murshudov GN, Skubák P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, Winn MD, Long F, Vagin AA (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr D 67:355–367
16. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC (1960) Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 185:416–422
17. Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D (2007) High-resolution structure prediction and the crystallographic phase problem. Nature 450:259–264
18. Read RJ (2001) Pushing the boundaries of molecular replacement with maximum likelihood. Acta Crystallogr D 57:1373–1382
19. Read RJ, McCoy AJ (2011) Using SAD data in Phaser. Acta Crystallogr D 67:338–344
20. Rigden DJ, Keegan RM, Winn MD (2008) Molecular replacement using ab initio polyalanine models generated with ROSETTA. Acta Crystallogr D 64:1288–1291
21. Rodríguez DD, Grosse C, Himmel S, González C, de Ilarduya IM, Becker S, Sheldrick GM, Usón I (2009) Crystallographic ab initio protein structure solution below atomic resolution. Nat Methods 6:651–653
22. Rossmann MG (1972) The molecular replacement method. Gordon & Breach, New York
23. Schröder GF, Levitt M, Brunger AT (2010) Super-resolution biomolecular crystallography with low-resolution data. Nature 464:1218–1222
24. Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L (2004) The importance of alignment accuracy for molecular replacement. Acta Crystallogr D 60:1229–1236
25. Storoni LC, McCoy AJ, Read RJ (2004) Likelihood-enhanced fast rotation functions. Acta Crystallogr D 60:432–438
26. Terwilliger TC, Grosse-Kunstleve RW, Afonine PV, Moriarty NW, Zwart PH, Hung L-W, Read RJ, Adams PD (2008) Iterative model building, structure refinement and density modification with the Phenix AutoBuild wizard. Acta Crystallogr D 64:61–69
27. Terwilliger TC, DiMaio F, Read RJ, Baker D, Bunkóczi G, Adams PD, Grosse-Kunstleve RW, Afonine PV, Echols N (2012) phenix.mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta. J Struct Funct Genomics 13(2):81–90. doi:10.1007/s10969-012-9129-3
28. Vagin A, Teplyakov A (1997) MOLREP: an automated program for molecular replacement. J Appl Crystallogr 30:1022–1025
29. Zhou A, Carrell RW, Murphy MP, Wei Z, Yan Y, Stanley PLD, Stein PE, Broughton Pipkin F, Read RJ (2010) A redox switch in angiotensinogen modulates angiotensin release. Nature 468:108–111
Acknowledgments
Our work on Phaser is supported by awards from the Wellcome Trust (082961/Z/07/Z) and the NIH (Grant No. P01GM063210). We are grateful to users who provide us with bug reports and challenging problems that push the limits.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Read, R.J., McCoy, A.J., Oeffner, R.D., Bunkóczi, G. (2013). Extending the Reach of Molecular Replacement. In: Read, R., Urzhumtsev, A., Lunin, V. (eds) Advancing Methods for Biomolecular Crystallography. NATO Science for Peace and Security Series A: Chemistry and Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6232-9_11
DOI: https://doi.org/10.1007/978-94-007-6232-9_11
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6231-2
Online ISBN: 978-94-007-6232-9
eBook Packages: Chemistry and Materials Science, Chemistry and Material Science (R0)