Introduction

Both residual dipolar couplings (RDCs) in high-resolution nuclear magnetic resonance (NMR) spectroscopy and small angle scattering (SAS) have experienced tremendous progress in recent years as powerful tools for biomolecular structure determination in solution (Prestegard et al. 2000; Bax 2003; Blackledge 2005; Svergun and Koch 2002; Koch et al. 2003; Neylon 2008). Whereas SAS data contain translational information on biomacromolecules such as radii of gyration, inter-domain distances and arrangements, RDCs provide the respective domain orientations in multimeric complexes as well as domain-internal structural information. NMR can provide additional information about binding interfaces and conformational dynamics. The two techniques are therefore highly complementary and have been applied to challenging problems in structural biology (Aliprandi et al. 2008; Bernado et al. 2005, 2007; Choy et al. 2002; Goult et al. 2007; Grishaev et al. 2008; Marino et al. 2006; Tidow et al. 2007). In particular, the high potential of the combined use of SAS and RDCs for structural analysis in solution has been recognized recently (Mattinen et al. 2002; Yuzawa et al. 2004; Bernado et al. 2005; Grishaev et al. 2005; Gabel et al. 2006; Marino et al. 2006; Mareuil et al. 2007).

Here, we present the implementation of a recently introduced target function, combining both NMR RDC and SAS restraints (Gabel et al. 2006), in CNS (Crystallography and NMR Systems; Brünger et al. 1998). In contrast to earlier work (Gabel et al. 2006), where the domains were kept artificially at fixed orientations and the target function was calculated directly from the atomic positions, we have now incorporated the target function into a molecular dynamics/simulated annealing protocol in CNS. The protocol is tested by retrieving the structure of the two-domain, 31 kDa nuclear export protein TAP (Liker et al. 2000) from a set of simulated RDC and SAS data, after initial randomization of the domain positions and orientations. We find a family of structures that cluster both translationally and orientationally and that match the overall SAS curve very well. The effect of the SAS parameters and of the number and errors of RDCs on the quality of the refined structures is investigated. The main advantages of our approach are the short computing times per MD (molecular dynamics) step, the limited angular range of SAS data needed and the flexibility to incorporate the SAS target potential at different levels of accuracy in the annealing protocol. We compare our results with an alternative refinement procedure using the rigid body modeling program SASREF (Petoukhov and Svergun 2005).

Material and methods

Simulation of RDCs

Three sets of RDCs were simulated with (a) no noise, (b) 1 Hz noise and (c) 2 Hz noise for all N–H bonds of TAP (residues 120–355, PDB entry 1FT8_A) using the program DC (Delaglio et al. 1995) for an alignment tensor magnitude Da = 7 Hz and rhombicity r = 0.24. More details are provided in the Supplementary Material.
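As an illustration of what such a simulation involves, the following minimal sketch evaluates the standard RDC expression D = Da[(3cos²θ − 1) + (3/2) r sin²θ cos 2φ] for bond vectors given in the principal frame of the alignment tensor. It is not the DC program: the function name `simulate_rdcs`, the Gaussian noise model and the random seed are assumptions made for this example only.

```python
import numpy as np

def simulate_rdcs(bond_vectors, da=7.0, rhombicity=0.24, noise_hz=0.0, seed=0):
    """Return RDCs (Hz) for bond vectors expressed in the tensor principal frame.

    Assumption: noise is Gaussian with standard deviation `noise_hz`.
    """
    v = np.asarray(bond_vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # unit vectors
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    # cos(theta) = z and sin^2(theta) * cos(2 phi) = x^2 - y^2
    rdc = da * ((3.0 * z**2 - 1.0) + 1.5 * rhombicity * (x**2 - y**2))
    if noise_hz > 0.0:
        rng = np.random.default_rng(seed)
        rdc = rdc + rng.normal(0.0, noise_hz, size=rdc.shape)
    return rdc

# Bond vectors along the z, x and y axes of the tensor frame
axis_rdcs = simulate_rdcs([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
```

With Da = 7 Hz and r = 0.24, a bond along the z axis of the tensor frame gives 14 Hz, which sets the scale against which 1 Hz and 2 Hz noise should be judged.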

Calculation of the SAS target curve and parameters

The small angle scattering target curve of TAP was calculated with the program CRYSOL (Svergun et al. 1995) from the PDB entry 1FT8_A (residues 120–194 and 207–355) in the s-range 0.00…0.35 Å−1 (Fig. 1). The simulated SAS curve was fitted with a polynomial function (Gabel et al. 2006) in order to extract the parameters A target, B target and C target used in the SAS target potential (Eq. 1). In addition, the target parameters were calculated directly from the atomic positions (for details see Supplementary Material). The respective values are compared in Supplementary Table 1, along with optimum values obtained in a grid-search (see next section). For the alternative refinement with SASREF, an additional scattering curve was calculated with randomized 5% errors (Supplementary Fig. 3).

Fig. 1

Calculated SAS data points and polynomial fit (thick continuous line; Gabel et al. 2006) up to order s^18. The truncated expansions to order s^2 (A), s^4 (A + B) and s^6 (A + B + C) are also shown for comparison
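The extraction of target parameters from a polynomial fit can be sketched as follows. The sketch assumes that A, B and C are the second, fourth and sixth moments of the pair-distance distribution, normalised to the forward scattering, so that they appear with the factors 1/6, 1/120 and 1/5040 of the series sin(x)/x = 1 − x²/3! + x⁴/5! − x⁶/7! + …. The exact definitions and normalisation used in the protocol are those of Gabel et al. (2006) and Supplementary Eq. 2, which are not reproduced here; the function name `extract_sas_moments` is hypothetical.

```python
import numpy as np

def extract_sas_moments(s, intensity, max_power=18):
    """Fit I(s) with even powers of s up to s^max_power and return (A, B, C).

    Assumption: A, B, C are pair-distance moments normalised to I(0).
    """
    s = np.asarray(s, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    s_max = s.max()
    x = s / s_max                               # rescale to [0, 1] for conditioning
    powers = np.arange(0, max_power + 1, 2)     # 0, 2, 4, ..., max_power
    design = x[:, None] ** powers[None, :]
    coeff, *_ = np.linalg.lstsq(design, intensity, rcond=None)
    c0, c2, c4, c6 = coeff[:4] / s_max ** powers[:4]   # undo the rescaling
    a = -6.0 * c2 / c0
    b = 120.0 * c4 / c0
    c = -5040.0 * c6 / c0
    return a, b, c

# Toy curve from three pair distances (in Angstrom), not TAP data
s_grid = np.linspace(0.0, 0.35, 200)
distances = np.array([5.0, 10.0, 15.0])
toy_curve = np.sinc(np.outer(s_grid, distances) / np.pi).mean(axis=1)
a_fit, b_fit, c_fit = extract_sas_moments(s_grid, toy_curve)
```

For this small toy system the fitted values reproduce the mean of r², r⁴ and r⁶ closely. For larger particles and noisy data the truncation of the series matters more, which is one reason a subsequent adjustment of the targets is needed.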

Optimization of SAS target potential parameters by a grid-search

While the SAS target values A target, B target and C target used in Eq. 1 are extracted from a polynomial fit of a CRYSOL-generated scattering curve from the TAP crystal structure (including a hydration shell), the parameters A, B and C are directly calculated from the atomic positions (Supplementary Eq. 2) of the structures during the refinement process and thus ignore a hydration shell. Therefore, a grid-search of the target parameters A target, B target and C target around the theoretically predicted ones is necessary to consider the hydration shell in modified target parameters (Supplementary Table 2). The optimum values were obtained as those that gave the best χ2 fit of all calculated structures in a run with the simulated SAS curve.
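A grid-search of this kind can be sketched in a few lines. The relative step sizes and the scoring callable `score` are hypothetical placeholders: in practice each grid point corresponds to a full refinement run whose structures are scored against the simulated SAS curve, and the grid actually explored is the one listed in Supplementary Table 2.

```python
import itertools

def grid_search_targets(predicted, score, rel_steps=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Return the (A, B, C) targets with the lowest score, and that score.

    `predicted` holds the predicted targets; `score` is a user-supplied
    callable (assumed here) returning e.g. the mean chi^2 of one run.
    """
    best_targets, best_score = None, float("inf")
    for fa, fb, fc in itertools.product(rel_steps, repeat=3):
        candidate = tuple(p * (1.0 + f) for p, f in zip(predicted, (fa, fb, fc)))
        value = score(candidate)
        if value < best_score:
            best_targets, best_score = candidate, value
    return best_targets, best_score

# Toy example with made-up numbers: the "optimum" is shifted from the prediction
predicted = (1000.0, 2.0e6, 8.0e9)
optimum = (1100.0, 1.8e6, 8.0e9)
toy_score = lambda t: sum(((x - o) / o) ** 2 for x, o in zip(t, optimum))
best, best_value = grid_search_targets(predicted, toy_score)
```

Because every grid point costs one complete set of structure calculations, the number of steps per parameter directly determines the total computing time.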

CNS protocol

We incorporated the SAS target potential (Gabel et al. 2006)

$$ \Psi = \frac{\lambda_{a}}{6}\left( A - A_{\text{target}} \right)^{2} + \frac{\lambda_{b}}{120}\left( B - B_{\text{target}} \right)^{2} + \frac{\lambda_{c}}{5040}\left( C - C_{\text{target}} \right)^{2} $$
(1)

into the existing CNS program suite (Brünger et al. 1998) by adding several FORTRAN modules and modifying existing CNS protocols. A schematic overview of the individual steps of the CNS protocol is given in Fig. 2.
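For orientation, Eq. 1 and its derivatives with respect to A, B and C can be written as a short function. This is only a sketch of the formula, not the FORTRAN implementation; the function name and argument layout are chosen for this example, and the further chain-rule step from A, B and C to atomic coordinates (Gabel et al. 2006) is not shown.

```python
def sas_potential(a, b, c, targets, weights=(1.0, 1.0, 1.0)):
    """Evaluate Eq. 1 and its partial derivatives with respect to A, B and C."""
    a_t, b_t, c_t = targets
    lam_a, lam_b, lam_c = weights
    da, db, dc = a - a_t, b - b_t, c - c_t
    psi = (lam_a / 6.0) * da**2 + (lam_b / 120.0) * db**2 + (lam_c / 5040.0) * dc**2
    # d(psi)/dA, d(psi)/dB, d(psi)/dC
    grad = (lam_a / 3.0 * da, lam_b / 60.0 * db, lam_c / 2520.0 * dc)
    return psi, grad

# Only the A-term deviates from its target in this toy call
psi_a, grad_a = sas_potential(6.0, 0.0, 0.0, targets=(0.0, 0.0, 0.0))
```

Setting a weight to zero switches off the corresponding term, which corresponds to activating the potential only up to a given order.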

Fig. 2

Schematic, sequential flowchart of the CNS protocol used

It is noteworthy that the target potential A target (i.e. excluding the higher order terms) may be incorporated into the CNS refinement protocol in a simplified way as distance restraints (“pseudo-NOE restraints”) between rigid domains (see Supplementary Material).
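Why a second-order term can act as a distance restraint between rigid domains may be seen from the parallel-axis relation Rg² = w1·Rg1² + w2·Rg2² + w1·w2·d², where w1 and w2 are the weight fractions of the two domains and d is the distance between their centres of mass. The sketch below checks this relation numerically. It assumes equal weights for all atoms (no scattering lengths or hydration shell) and uses random toy coordinates rather than TAP; the function names are hypothetical and the actual pseudo-NOE setup is described in the Supplementary Material.

```python
import numpy as np

def radius_of_gyration(coords):
    """Radius of gyration of equally weighted points."""
    c = np.asarray(coords, dtype=float)
    return np.sqrt(((c - c.mean(axis=0)) ** 2).sum(axis=1).mean())

def interdomain_distance(rg_total, coords1, coords2):
    """Centre-of-mass distance of two rigid domains implied by the overall Rg."""
    n1, n2 = len(coords1), len(coords2)
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    rg1, rg2 = radius_of_gyration(coords1), radius_of_gyration(coords2)
    d_squared = (rg_total**2 - w1 * rg1**2 - w2 * rg2**2) / (w1 * w2)
    return float(np.sqrt(max(d_squared, 0.0)))

# Two toy "domains" made of random points
rng = np.random.default_rng(0)
domain1 = rng.normal(size=(120, 3)) * 8.0
domain2 = rng.normal(size=(80, 3)) * 6.0 + np.array([25.0, 5.0, -3.0])
rg_all = radius_of_gyration(np.vstack([domain1, domain2]))
d_from_rg = interdomain_distance(rg_all, domain1, domain2)
d_direct = float(np.linalg.norm(domain2.mean(axis=0) - domain1.mean(axis=0)))
```

Since the internal radii of gyration of rigid domains are fixed, a target for the overall radius of gyration translates into a single target distance between the domain centres.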

Structure calculations

To test the refinement protocol, the following combinations of SAS and RDC potentials were used: (a) no SAS target potential active, all 217 N–H RDC restraints active, (b) activation of the target parameter A target and all RDC restraints, (c) activation of the target parameters A target and B target and all RDC restraints, (d) activation of the target parameters A target, B target and C target and all RDC restraints, (e) same setup as in (b–d) but all RDCs with 1 or 2 Hz experimental noise, and control runs with no SAS and no RDC restraints active (Table 1 and Supplementary Table 2). In each case, a total of 500 structures were calculated. The 50 structures with the highest CNS energies were discarded from the subsequent structural analysis since most of them had obvious steric problems.

Table 1 Structural parameters of the refined structures as a function of the activation degree of the SAS-potential

Selection of refined structures by scoring against the full SAS curve

The SAS scattering curves of the refined structures were calculated with CRYSOL and scored against the target SAS curve (for details see Supplementary Material). The refined structures were evaluated in terms of the following parameters (Table 1 and Supplementary Table 2): radius of gyration, root mean square displacements (RMSDs) of the RRM domain around its mean position and with respect to the crystal (target) RRM domain relative to the LRR domain, χ2 of the fit with the target SAS curve, mean RDC Q-value (Cornilescu et al. 1998) of the ten best refined structures and the percentage of clustered structures (a cluster being defined by structures with a maximum orientational deviation of ±15° in a molecule-fixed coordinate frame).
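Two of these scores are simple to state explicitly. The sketch below uses the Q-value as the r.m.s. difference between observed and calculated RDCs divided by the r.m.s. of the observed RDCs, and a reduced χ2 with a single least-squares scale factor between model and target curve. The latter form is an assumption for this example; the scoring actually used is described in the Supplementary Material.

```python
import numpy as np

def rdc_q_value(d_obs, d_calc):
    """Q = rms(D_obs - D_calc) / rms(D_obs)."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_calc = np.asarray(d_calc, dtype=float)
    return float(np.sqrt(np.mean((d_obs - d_calc) ** 2)) / np.sqrt(np.mean(d_obs**2)))

def reduced_chi2(i_target, i_model, sigma):
    """Reduced chi^2 after scaling the model curve onto the target curve.

    Assumption: one linear scale factor, N - 1 degrees of freedom.
    """
    i_target = np.asarray(i_target, dtype=float)
    i_model = np.asarray(i_model, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    scale = np.sum(i_target * i_model / sigma**2) / np.sum(i_model**2 / sigma**2)
    residuals = (i_target - scale * i_model) / sigma
    return float(np.sum(residuals**2) / (len(i_target) - 1))

# Toy numbers: a model curve that differs from the target only by a scale factor
model_curve = np.array([1.0, 0.8, 0.5, 0.2])
chi2_scaled = reduced_chi2(2.0 * model_curve, model_curve, sigma=0.05 * np.ones(4))
q_perfect = rdc_q_value([1.0, -2.0, 3.0], [1.0, -2.0, 3.0])
```

A model that matches the target up to an overall scale gives χ2 = 0, so this score is sensitive to the shape of the curve only.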

Alternative refinement with SASREF

We also compared the results of our approach to an alternative rigid body refinement using the program SASREF (Petoukhov and Svergun 2005). Details are provided in the Supplementary Material.

Results

The CNS refinement protocol

The flowchart in Fig. 2 shows the implementation of the SAS potential into the CNS program suite. The modules are available upon request and will be provided in an extended version of CNS/ARIA.

The SAS potential is only activated during the second cooling step sa_l_cool2.cns. This late activation ensures that the domains are already oriented properly by the RDC data and that the SAS potential terms B and C (that depend explicitly on the domain orientation; Gabel et al. 2006) act correctly. The exact time point of activation can be specified by the logical parameter $i_cool/$ncycle that represents the remaining fraction of molecular dynamics steps within sa_l_cool2.cns. In order to save computing time, we chose X = $i_cool/$ncycle = 0.9, i.e. the SAS restraints were only active during the last 10% of the sa_l_cool2.cns protocol (=2000 MD steps). The target parameters A target, B target and C target (=a xaf, b xaf, c xaf) and their individual weighting factors (Eq. 1) λa, λb and λc (=w axf, w bxf, w cxf, see Supplement) are provided by the user, along with the overall weighting factor λSAS (=w xaf) for the energy of the SAS potential and the definition of the rigid domains. A typical example is given in the Supplementary Material.

The FORTRAN77 file xafs.f is invoked during the sa_l_cool2.cns protocol in order to calculate the SAS parameters A, B and C of the structures during the refinement protocol. This calculation is done from the atomic coordinates as described (Gabel et al. 2006). xafs.f also calculates the potential Ψ (Eq. 1) and its gradient according to Gabel et al. (2006). If the parameters A, B and C of the present structure differ from their respective target values by more than the specified fractions a, b and c (defined in the xafs.f file), all atoms of domain 2 (RRM in our case) will be translationally shifted by grad Ψ with the partial derivatives \( \frac{\partial \Psi }{\partial \theta } \) and \( \frac{\partial \Psi }{\partial \varphi } \) multiplied by a weighting factor k = 10^−7. In the present protocol we chose a = 0.03, b = 0.05 and c = 0.05. The optimized value for k is defined in xafs.f. If the parameters A, B and C match their target values (within the margins provided), the present structure is considered the final, SAS-refined structure and the protocol stops.
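The stopping criterion described above amounts to a relative tolerance test on each parameter. A minimal sketch, assuming that the margins a, b and c are relative deviations from the respective target values (the function name is hypothetical and the code is not taken from xafs.f):

```python
def within_margins(values, targets, margins=(0.03, 0.05, 0.05)):
    """True if A, B and C all lie within their relative margins of the targets."""
    return all(abs(v - t) <= m * abs(t) for v, t, m in zip(values, targets, margins))

# Toy targets of the same order of magnitude as those reported for TAP
targets = (1180.0, 2.8e6, 8.5e9)
converged = within_margins((1180.0 * 1.02, 2.8e6 * 1.04, 8.5e9 * 0.96), targets)
not_converged = within_margins((1180.0 * 1.04, 2.8e6, 8.5e9), targets)
```

In the first call all three parameters are inside their margins, whereas in the second call A is 4% off its target and the domain would be shifted further.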

Optimization of SAS target potential values by a grid-search

The SAS parameters A, B and C (Eq. 1) are approximate and do not consider the hydration shell of the protein. Furthermore, the finite polynomial fit and experimental errors of the scattering curve lead to uncertainties in the determination of A target, B target and C target (discussed in Gabel et al. 2006). Therefore, a grid-search around their initial values (determined by a χ2-fit against the SAS curve) is necessary. The optimum values for A target, B target and C target were obtained as those that gave the best χ2 fit of all calculated structures in a run with the simulated SAS curve (Materials and Methods and Supplementary Table 2). The optimal A target = 1180 found by a grid-search is within 5% of its predicted value. B target = 2.8 × 10^6 and C target = 8.5 × 10^9 are within 10% and 25% of their predicted values, respectively (Supplementary Table 1).

The A-potential lifts part of the orientational degeneracy

The activation of the A-term of the SAS-potential in addition to RDC restraints improved the structural parameters of the 20 best structures (χ2-scored against the SAS-curve) in several regards (Fig. 3). The translational spread of the RRM centres of mass is decreased and its rotational degeneracy is partially lifted by reducing the number of orientations present from three to two (more details are provided in the Supplementary Material).

Fig. 3

20 best refined structures (in terms of χ2). Top: no SAS term active, RDCs active (dataset 1 in Table 1), bottom: A target active, RDCs active (dataset 4). In the bottom part, three structures out of the 20 best have been omitted since they did not belong to any of the four possible clusters. The structures in the domain orientation of the reference structure are depicted in blue, the crystal structure itself in red. For all structures the LRR domain is superimposed and shown in the same orientation. Structures in green and magenta belong to clusters other than the crystal structure orientation, where the RRM domain is rotated by 180° compared to the reference structure orientation. The fourth possible orientation is not found, presumably due to steric constraints introduced by the linker of finite length

Structural parameters as a function of the activation level of the SAS-potential

The effect of the progressive activation of the SAS-potential terms on all structures (with correct domain orientation; no χ2-scoring applied) is best summarized in Fig. 4, showing the spatial distribution of the TAP alignment tensor frames (placed on the RRM domain centre of mass after superposition of the LRR domain).

Fig. 4

Alignment tensor reference frames of all refined TAP structures (with correct domain orientation) as a function of the activation level of the SAS-potential from two different views. LRR domains have been superposed and the alignment tensor frames are placed at the centres of mass of the RRM domains. The centre of mass of the target (crystal) RRM domain is depicted as a small red sphere. The structures on the right-hand side have been generated by a rotation of 90° around the vertical axis

In the absence of a SAS-potential, all RRM domains are properly oriented and aligned by the RDCs but distributed randomly within a sphere that is limited by the steric restraints due to the finite length of the linker connecting both domains. The ample translational distribution of the RRM domains is reflected in large RMSDs of the vector connecting both domains and a poor mean χ2-fit of all structures with the SAS curve (Table 1, data set 1).

The activation of the A-term induces a significant decrease of the translational degrees of freedom of the structures in the sense that all RRM domain centres (while being correctly oriented by the RDCs) are now placed at equal distance from the LRR centre of mass on a two-dimensional (spherical) surface. However, since no driving force orthogonal to the domain-connecting vector is active, the refined RRM positions continue to be widely spread over this spherical surface, comparable to the case without SAS restraints. This is reflected by large RRM RMSDs to the mean refined structure and to the target structure (Table 1, data set 4) comparable to the ones without SAS restraints. However, the χ2-values of all structures are significantly improved, due to the adjustment of the radius of gyration induced by the activation of the A-potential. As in the case without SAS-potentials, the target position of the RRM domain is situated within the set of structures refined.

The additional activation of the B-term, which induces a driving force on the RRM domain tangentially to the spherical surface defined by the A-term, reduces mainly the translational spread of the RRM domain positions and their RMSDs to the mean refined structure but improves the RMSDs to the target structure only slightly (Table 1, data set 9). The χ2-values of all structures decrease further and their standard deviation is less pronounced with respect to the case where only A is active. All structures remain correctly oriented.

Finally, the full activation of the SAS-potential (C) further decreases the degrees of freedom of the RRM domain positions on the spherical surface, confining them to a crescent-moon-shaped region, and reduces the mean χ2-values (Table 1, data set 13). All structures match the scattering curve very well and the positions of the RRM alignment tensor frames can be considered as a geometric representation of a set of structures that are all compatible with the scattering curve (in terms of a low χ2-fit). The RMSDs of the RRM to the mean structure increase slightly with respect to the value in the case of B, while the RMSDs to the target structure are comparable to the preceding cases. Note, however, that the coordinate RMSD is not a good measure for characterizing the improvement of a spherical distribution of structures into a disk- or line-like distribution as is achieved by the activation of the SAS potential (Fig. 4).

Figure 5 illustrates how an additional χ2-scoring (against the complete scattering curve) selects a subset of structures out of those already refined against the C potential: the translational spread of these 20 best structures is limited to a core region of the crescent-moon-shaped surface, but a unique, unambiguous structure cannot be identified even in this case. Note that all structures shown in Fig. 5 have virtually indistinguishable χ2-values (=0.515…0.518). Figure 5 can therefore be considered as a geometrical representation of an ensemble of TAP structures equivalent from a SAS point of view (within the error bars of the SAS curve).

Fig. 5

Alignment tensor reference frames of the 20 best TAP structures (χ2-scored against the complete scattering curve) out of all C-refined structures (with correct domain orientation) from two different perspectives. LRR domains have been superposed and the alignment tensor frames are placed at the centres of mass of the RRM domains. The centre of mass of the target (crystal) RRM domain is depicted as a small red sphere. The structures on the right-hand side were obtained by a 90° rotation around the vertical axis

Effect of errors on the refined structures

The introduction of experimental noise (1 and 2 Hz) on the RDC data sets did not change notably the main structural parameters, i.e. radius of gyration, translational spread etc. (for more details see Supplementary Material). This suggests that the protocol is stable against errors in the experimental RDCs.

Alternative refinement with SASREF

The CNS-based structure calculation protocol with simultaneous refinement against SAS and RDC data was compared with rigid body modelling using the program SASREF (Petoukhov and Svergun 2005). SASREF allowed the reconstruction of the initial model in some runs but also yielded several other solutions equally compatible with the simulated data (see Supplementary Fig. 3, top). In particular, two “false positives”, one with a different position of the RRM relative to the LRR domain (Fig. 6b) and the other with a different relative orientation of the domains (Fig. 6c), were obtained, yielding the same or even better fit quality than the correct rigid body model (Fig. 6d, Supplementary Table 3). Not unexpectedly, although the RMSDs between the atomic coordinates of the false positives and the initial model are rather high, the NSD criteria (Kozin and Svergun 2001) were all below 1.0 (Supplementary Table 3). This indicates that the models with an “alternative” configuration of domains still kept essentially the same low resolution shape (and thus the same fit to the scattering data). A possible further reason for the observed ambiguity is the influence of the solvation shell. In CRYSOL, used to compute the simulated profile, this shell is calculated as an envelope surrounding the entire TAP particle. SASREF employs the scattering amplitudes calculated by CRYSOL from each individual domain, surrounded by its solvation shell. The influence of the hydration shell was tested by generating a second scattering curve with smaller errors (Supplementary Material).

Fig. 6

TAP rigid body models reconstructed by SASREF with RDC restraints. (a) reference structure, (b) alternative position, (c) alternative orientation, (d) correct reconstruction. The models are shown with the same orientation for the LRR domain. The bottom view is rotated by 90° about the horizontal axis

Discussion

The activation of an inter-domain distance restraint (A target) was sufficient to reduce the possible domain orientations that match the scattering curve well (Fig. 3, bottom). This demonstrates that the joint refinement against RDC and SAS data can lift the four-fold orientational degeneracy associated with a single alignment tensor at least partially. However, as becomes clear from Fig. 3, some degeneracy can remain, which is due to the rather isotropic shape of the RRM domain (effects of domain anisotropy are discussed in more detail in Gabel et al. 2006). In order to completely lift the orientational degeneracy it is necessary to include additional data. These could include RDC data from an additional distinct alignment tensor (Prestegard et al. 2000; Bax 2003; Blackledge 2005), or additional translational and distance information from NOEs (Grishaev et al. 2005), chemical shift perturbations and/or biochemical data (mutational analysis) (Dominguez et al. 2003; Clore and Schwieters 2003), or, for example, paramagnetic relaxation enhancements (Battiste and Wagner 2000; Mackereth et al. 2005). The implementation of the SAS potential in CNS allows all these different types of data to be readily added to the structure calculation. We also note that in this case a further refinement of the resulting structures considering electrostatic terms in a solvation shell (Linge et al. 2003) is possible and strongly recommended.
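The origin of the four-fold degeneracy can be verified numerically: in the principal frame of the alignment tensor the RDC depends only on the squares of the bond vector components, so rotating a domain by 180° about any principal axis leaves all RDCs unchanged. The following sketch uses random toy bond vectors and the tensor parameters of the simulation (Da = 7 Hz, r = 0.24); the function and variable names are chosen for this example.

```python
import numpy as np

def rdc(vectors, da=7.0, rhombicity=0.24):
    """RDCs (Hz) for bond vectors expressed in the tensor principal frame."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return da * ((3.0 * v[:, 2] ** 2 - 1.0)
                 + 1.5 * rhombicity * (v[:, 0] ** 2 - v[:, 1] ** 2))

# Identity and the three 180-degree rotations about the principal axes
ROTATIONS = {
    "identity": np.diag([1.0, 1.0, 1.0]),
    "180 about x": np.diag([1.0, -1.0, -1.0]),
    "180 about y": np.diag([-1.0, 1.0, -1.0]),
    "180 about z": np.diag([-1.0, -1.0, 1.0]),
}

rng = np.random.default_rng(1)
bonds = rng.normal(size=(50, 3))            # toy bond vectors of one domain
reference = rdc(bonds)
degenerate = {name: bool(np.allclose(rdc(bonds @ rot.T), reference))
              for name, rot in ROTATIONS.items()}

# A 90-degree rotation about z swaps x and y and changes the RDCs
rot90z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
changed_by_90 = not np.allclose(rdc(bonds @ rot90z.T), reference)
```

All four orientations reproduce the RDCs exactly, so they can only be distinguished by information that depends on domain position or shape, such as the SAS terms or the linker length.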

There are two general points that limit the accuracy of the target parameters used: firstly, A, B and C in Eq. 1 do not consider the presence of a protein solvent shell with different scattering density than the bulk solvent (Svergun et al. 1998). In our case, this effect may well account for a part of the differences between the target values A target, B target and C target determined by a fit of the calculated scattering curve (including a hydration shell) and the ones yielding the best results in the MD structural refinement (Supplementary Table 1). Secondly, due to practical reasons, the polynomial fit of the SAS curve is only a truncated form of the correct sinc function (Gabel et al. 2006), introducing a certain error in the extracted target parameters.

In contrast to an approach proposed recently (Grishaev et al. 2005), our protocol does not aim to derive relatively high-resolution structural information from the SAS data, but rather provides an efficient rigid body algorithm for combination with complementary NMR data. Therefore, no corrective form factors need to be introduced. Importantly, the determination of the target values A target, B target and C target is only required once at the beginning of the MD simulation. As a result our approach is computationally very efficient. For example, a single MD step requires about 1 second of computing time for all SAS potential terms in the case of TAP (235 residues, including hydrogen atoms) on a commercial Pentium computer (3 GHz). Calculation of a single structure, including all SAS terms takes about 1 hour on a single CPU. Presently, a grid search of several target value conditions is necessary so that the effective time per structure calculation is a function of the target parameter conditions explored (see Supplementary Table 2).

An advantage of the protocol presented here is that the user can easily choose and adjust the level and contribution of SAS restraints by selecting the order of SAS terms activated during the simulated annealing. The A potential is related to the radius of gyration which is amongst the most reliable information contained in SAS data and defines the interdomain distance (Gabel et al. 2006). Restraints on the radius of gyration have been used in molecular dynamics calculations for many years (Boczko and Brooks 1995; Hünenberger et al. 1995; Kuszewski et al. 1999). However, the A-term is mathematically distinct from these since it restrains the distance between the centers-of-mass of two domains. The activation of the B potential (under the precautions mentioned in the Supplementary Material) significantly improves the convergence of structures. If high quality data (good signal/noise, wide angular range) are available an activation of the C-term may further refine the geometric distribution of the structures (Fig. 4) even though it has little effect on the χ2 values of the best structures (Supplementary Table 2).

The coordinate RMSDs between the SAS-refined and the reference structures (Table 1) are rather large. The main reason for this is that, in contrast to other reports (e.g. Grishaev et al. 2005), we have not included any data other than the SAS data and RDCs. In particular, no interdomain distance restraints were included, which, if applied, would drastically reduce the coordinate RMSD. In fact, the protocol presented here yields a geometric representation of an ensemble of structures that are virtually equivalent in terms of χ2 fits with the scattering curve (Figs. 4 and 5), and thus provides a good representation of the conformational space that is consistent with the SAS data. The potential (Eq. 1) makes it possible to represent the accuracy of the refined structures graphically in terms of residual translational degrees of freedom of the domains for a given orientation. Given that SAS data are in general associated with high structural ambiguities, it is important to highlight these uncertainties. Such information is rarely provided by other rigid body modelling approaches.

The results obtained with the rigid-body modelling using SASREF illustrate the utility of the RDC restraints in SAS-based rigid body modelling but also highlight the possibility of ambiguous reconstructions even in this case. It is important to note that SAS essentially sees the overall particle shape, and if a false positive has a similar shape at low resolution, it may be difficult to discard such a solution based on the fit to the “noisy” experimental data (Supplementary Fig. 3, top). This documents the high structural ambiguity of SAS data. It is therefore always necessary to complement SAS-based rigid body models with independent and/or additional structural information.

In the present study we limited the application to a two-domain protein. However, the approach can be easily extended to the structural refinement of two subunits within a multimeric complex with SANS: either by using specific perdeuteration of the two subunits of interest and contrast variation of the solvent (H2O to D2O ratio) or by using the natural contrast between protein and RNA/DNA molecules (Timmins and Zaccai 1988).

Conclusions

We present an efficient structural calculation protocol that simultaneously employs SAS translational and RDC orientational restraints in CNS to define the quaternary arrangement of a two-domain protein. The addition of SAS restraints notably reduces the translational degrees of freedom of the refined structures and partially lifts the orientational degeneracies associated with RDCs from a single alignment medium. The main advantages of our approach are its computational efficiency and the fact that the target potential can be activated at several levels of precision, allowing structural refinement with SAS data from a very limited angular range. The protocol can be combined with additional structural information, for example from NMR chemical shift perturbation, paramagnetic relaxation enhancements or biochemical data. This is required to reduce the remaining translational spread of the resulting structures. The approach is generally applicable to multidomain proteins and/or complexes with known single-domain high resolution structures (X-ray or NMR).