A major challenge for structural biology is providing a mechanistic understanding of the plethora of functions and associated conformational changes performed by macromolecular and supramolecular complexes that underlie cell biology. Obtaining structures of such assemblies is a necessary prerequisite, and the rich data that they provide will open up new opportunities in the biomedical, biotechnological, and pharmacological arenas.

To investigate and adequately describe multifaceted biological systems, no single methodology is sufficient any longer: researchers are increasingly turning to integrated approaches that use complementary structural data. The complexity of biological phenomena, linked to the inherent partiality of any representation, requires the pursuit of multiple methods and models. As is widely appreciated, individual types of structural data are limited in scope, accuracy and generality, and their inherent shortcomings can be overcome or minimized by using complementary information in an integrative fashion.

In addition to the traditional structural biology techniques of X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy (EM), further methods are increasingly used, alone and in combination with traditional methods, to generate structural information. These include mass spectrometry of crosslinked complexes (Cohen and Chait 2001) and native complexes (Mehmood et al. 2015), synchrotron radiation circular dichroism spectroscopy (Cowieson et al. 2008), electron paramagnetic resonance (EPR) spectroscopy combined with site-directed spin labelling (Hubbell et al. 2000), small-angle X-ray scattering (SAXS) (Lipfert and Doniach 2007), and computational docking with sparse distance restraints (Schneidman-Duhovny et al. 2012).

Although the integration of all structural methodologies with cell biology, biochemistry and computational approaches has made major strides over the last few years, the current chapter focusses specifically on the integration of NMR and SAXS for structural biology, emphasizing their remarkable complementarity.

NMR has unique capabilities for studying the structure and dynamics of biomolecules at the atomic level. Structural characterization of a protein or any other biological macromolecule by NMR in solution invariably describes a distribution of interconverting conformers, in contrast to most structural descriptions from X-ray crystallography, cryo-EM or solid-state magic-angle spinning NMR. Solution NMR ensembles encompass conformational families that range from a narrow distribution for well-folded, globular proteins or domains to a wide distribution for unfolded or partially folded polypeptide ensembles.

In contrast to the atomic-level information available by NMR, SAXS affords low-resolution information but furnishes important data on the global size and shape of a particle in solution, ideally complementing the NMR-derived data. In other words, SAXS provides an overall picture of the 3D space occupied by all coexisting conformers, while high-resolution NMR describes the details of the conformational landscape at the atomic level. Several excellent reviews describing the general use of SAXS for biomolecules in solution have been published, covering a number of different aspects of the technique (Guinier and Fournet 1955; Doniach 2001; Koch et al. 2003; Putnam et al. 2007; Svergun and Koch 2003; Doniach and Lipfert 2012). Furthermore, a focused review on the use of SAXS to derive global shape information of folded RNA molecules is also available (Bhandari et al. 2016).

Like all structural techniques, NMR and SAXS each have unique strengths and shortcomings. For example, SAXS is not limited by the molecular size of the particle under investigation (Graewert and Svergun 2013; Grant et al. 2011; Hura et al. 2009; Jeffries and Trewhella 2013; Martel et al. 2012) and can describe the contours of molecules with molecular masses of a few hundred kDa, a size too large for atomic-level structure determination by solution NMR. Solution NMR, on the other hand, can provide detailed information about the atomic structure and dynamics of molecules, even for rare conformational sub-states (Sekhar and Kay 2013). However, both techniques are affected by potentially confounding factors to different degrees. While both methods ideally require monodispersity of the dissolved molecules, SAXS data quality is exquisitely sensitive to aggregation, and even a very small percentage (∼1%) of aggregated species can compromise the data analysis. In contrast, such small amounts of aggregates would not be observed by solution NMR, and the presence of very large aggregates does not interfere with structural characterization of the smaller major component. For both SAXS and NMR, an additional complexity arises from conformational averaging on different timescales, reflecting the presence of local as well as global motions, which are important inherent properties of proteins (Henzler-Wildman and Kern 2007). Therefore, it is desirable to combine orthogonal techniques, which together provide a more comprehensive description of the structure and dynamics than any individual method alone. In this regard, it is noteworthy that SAXS and NMR measurements can be performed on the same solution, which makes them well suited to being used in an integrative fashion.

Given their complementarity, the integrated use of NMR and SAXS provides a powerful means to describe the solution behavior of biological macromolecules more completely, filling in gaps or inherent imprecisions in the data extracted by either technique alone. When characterizing solution structures and architectures, it is therefore desirable to obtain a SAXS shape envelope into which high-resolution structures can be fitted, allowing the overall architecture of a multi-domain protein or multiprotein complex to be visualized.

NMR is an effective method for determining protein structure in solution at atomic resolution and has been routinely used for over 25 years (Fig. 11.1). However, for multi-domain proteins, even if a large number of distance, angle and chemical shift restraints are available, the relative orientations of individual domains are difficult to ascertain, given the predominantly local nature of the NMR-derived restraints. This limitation can be overcome, to some degree, by using extensive sets of residual dipolar couplings (RDCs). RDCs can be measured in solution NMR spectra if molecules experience weak alignment in the magnetic field, caused either by the molecule’s own magnetic susceptibility anisotropy or by very dilute liquid crystalline media (Tjandra and Bax 1997). These couplings contain information about the orientation of the associated inter-nuclear vector relative to the molecular susceptibility anisotropy tensor and, therefore, provide angular restraints for structure calculations. Adding RDC-derived restraints to conventional structure determination algorithms results in remarkable improvements, both locally and globally.
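For orientation, the angular dependence described above is commonly written in the form below. This is the standard textbook expression rather than one quoted from this chapter, and the symbols are introduced here for illustration only: D_a is the axial magnitude of the alignment (or susceptibility anisotropy) tensor for the nuclear pair A-B, R is its rhombicity, and θ and φ are the polar angles of the inter-nuclear vector in the principal axis frame of that tensor (φ here is unrelated to the backbone phi angle).

```latex
D^{AB}(\theta,\varphi) = D_a^{AB}\left[\left(3\cos^2\theta - 1\right) + \tfrac{3}{2}\,R\,\sin^2\theta\,\cos 2\varphi\right]
```

Because the expression depends only on the orientation of the vector relative to a tensor that is common to the whole molecule, each measured coupling restrains orientation irrespective of how far apart the vectors are, which is why RDCs help to define the relative orientation of domains.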

Fig. 11.1
Schematic illustration of NMR-provided information. 2D spectrum (middle), NOESY data and distances (left), chemical shift-derived phi, psi angles (top), J coupling-derived dihedral angles and RDC-derived orientational restraints (right), are all combined to determine an atomic model (bottom)

Algorithms for determining NMR structures aim to locate the global minimum of a target function containing terms for covalent geometry, non-bonded contacts, and the experimentally derived distance and angular restraints. The most important geometric information is provided by the nuclear Overhauser effect (NOE), which is translated into distances between proton pairs separated by <6 Å. Despite their short-range nature, these distances are highly conformationally restrictive, especially if they involve atoms that belong to units (amino acids or nucleotides) that are far apart in the linear sequence. Other experimental NMR restraints that provide short-range structural information are three-bond coupling constants and secondary ¹H and ¹³C chemical shifts. Three-bond coupling constants (³J) are related to torsion angles by the Karplus equation (Karplus 1963), with the ³J(HN,Hα) coupling providing direct information about the phi backbone torsion angle. In a similar way, the empirical correlation between a protein’s backbone conformation (phi/psi angles) and the deviations of the ¹³Cα and ¹³Cβ chemical shifts from random coil values is used in NMR structure determination. ¹H chemical shifts are primarily used for refinement purposes, although recent advances in the ab initio calculation of proton shifts hold great promise for their routine use in NMR structure determination. In addition to these originally used parameters, paramagnetic relaxation enhancements (PREs) (Gillespie and Shortle 1997) and pseudocontact shifts (PCSs) (Bertini and Luchinat 1999) augment the arsenal of geometric restraints that can be obtained by NMR.
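The Karplus relationship mentioned above can be sketched in a few lines. This is a minimal illustration, not the parameterization used by any particular structure calculation program: the coefficients A, B and C below are one commonly quoted set for ³J(HN,Hα) and should be treated as assumptions, as should the function name and the phi − 60° phase convention.

```python
import math

# Assumed Karplus coefficients in Hz for 3J(HN,Halpha); published
# parameterizations differ, so treat these numbers as illustrative only.
A, B, C = 6.51, -1.76, 1.60


def karplus_3j_hnha(phi_deg, a=A, b=B, c=C):
    """Predicted 3J(HN,Halpha) in Hz for a backbone phi angle in degrees.

    Uses J = a*cos^2(theta) + b*cos(theta) + c with theta = phi - 60 degrees.
    """
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


# Typical helical (phi near -60) and extended (phi near -120) backbone angles.
j_helix = karplus_3j_hnha(-60.0)
j_strand = karplus_3j_hnha(-120.0)
print(f"helix-like phi: {j_helix:.2f} Hz, strand-like phi: {j_strand:.2f} Hz")
```

With these assumed coefficients, helical phi angles give small couplings and extended phi angles give large ones, which is the basis for converting measured couplings into phi restraints. Because the curve is not single-valued, a measured coupling generally restricts phi to a few possible ranges rather than one value.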

SAXS data are measured as scattering signal intensity at a given value of q, where q = 4π sin θ/λ, with 2θ the scattering angle and λ the X-ray wavelength. Several program suites are available for processing SAXS data (e.g., PRIMUS, Scatter) (Rambo). The SAXS scattering profile (Fig. 11.2) at very small scattering angles (low q region) is frequently analyzed using the Guinier approximation, I(q) ≈ I(0) exp(−q²Rg²/3), since for q close to zero ln I(q) varies linearly with q² (Guinier and Fournet 1955). Thus, plotting the scattering intensity as ln I(q) vs q² results in a straight line with the slope equal to −Rg²/3 and the vertical intercept equal to the natural log of the zero-angle scattering intensity, I(0). In this manner, the radius of gyration, Rg, i.e. the root-mean-square distance from the center of density in the molecule, can be extracted. When Guinier plots are used to estimate Rg, the maximum q that is acceptable to include in the fit is 1.3/Rg. The extrapolated intensity at zero scattering angle, I(0), is proportional to the electron density contrast between the scattering entity and the buffer and can be used to determine the molecular mass of the molecule (Fischer et al. 2010; Mylonas and Svergun 2007). Plotting I(0) vs concentration yields a straight line, unless large-scale conformational averaging is present. Indeed, for highly flexible systems, the electron density contrast between the solute and the solvent is difficult to discern, rendering accurate determination of the volume and molecular weight values difficult.
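The Guinier analysis described above amounts to a straight-line fit of ln I(q) against q², restricted to q ≤ 1.3/Rg. The sketch below illustrates that procedure with NumPy on synthetic, noise-free data. The function name, the iterative way of shrinking the fit range, and the test profile are assumptions made for illustration; they are not taken from PRIMUS, Scatter or any other package.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, max_iter=50):
    """Estimate Rg and I(0) from a linear fit of ln I(q) against q^2.

    The fit range is shrunk iteratively until it is consistent with
    q <= qrg_max / Rg for the Rg obtained from that same range.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    valid = intensity > 0            # the logarithm needs positive intensities
    mask = valid.copy()              # first pass: use every valid point
    for _ in range(max_iter):
        n_used = int(mask.sum())
        if n_used < 3:
            raise ValueError("too few points left in the Guinier range")
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative slope: no Guinier region found")
        rg = np.sqrt(-3.0 * slope)   # slope = -Rg^2 / 3
        i0 = np.exp(intercept)       # intercept = ln I(0)
        new_mask = valid & (q <= qrg_max / rg)
        if np.array_equal(new_mask, mask):
            break                    # fit range is self-consistent
        mask = new_mask
    return rg, i0, n_used


# Synthetic profile that follows the Guinier law exactly (Rg = 20, I(0) = 100).
q_demo = np.linspace(0.005, 0.3, 300)
i_demo = 100.0 * np.exp(-(q_demo * 20.0) ** 2 / 3.0)
rg_demo, i0_demo, n_demo = guinier_fit(q_demo, i_demo)
print(f"Rg = {rg_demo:.3f}, I(0) = {i0_demo:.3f}, points used = {n_demo}")
```

On real data the low-q points are the ones most affected by aggregation and interparticle effects, so the residuals of the fit should be inspected rather than relying on the fitted numbers alone.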

Fig. 11.2
Schematic illustration of SAXS data and analysis. (a) Scattering pattern (top), an experimental scattering intensity profile with fit (middle), and a low-resolution dummy bead model (bottom). (b) A theoretical scattering intensity profile (middle) and the various basic methods for analysis of SAXS data

Conformational flexibility or large-amplitude motions in a molecule can be discerned from analysis of the scattering data using Kratky plots, in which the scattering data are transformed as q²·I(q) vs q (Fig. 11.2b) (Glatter and Kratky 1982). Kratky plots for well-ordered globular, disordered and highly flexible, as well as partially ordered entities exhibit characteristic features (Hammel 2012; Kikhney and Svergun 2015; Rambo and Tainer 2011) that can be used for an initial characterization of the system under investigation.
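The Kratky transformation and the contrast between compact and chain-like scatterers can be illustrated with two idealized model profiles. Both models are assumptions chosen for simplicity: a Gaussian (Guinier-type) profile stands in for a compact particle, although that form is strictly valid only at low q, and the Debye function stands in for a fully flexible Gaussian chain. The function names are hypothetical.

```python
import numpy as np


def kratky(q, intensity):
    """Return the Kratky-transformed intensity q^2 * I(q)."""
    q = np.asarray(q, dtype=float)
    return q ** 2 * np.asarray(intensity, dtype=float)


def compact_model(q, rg):
    """Toy compact particle: Guinier-type Gaussian profile with I(0) = 1."""
    return np.exp(-(q * rg) ** 2 / 3.0)


def debye_chain(q, rg):
    """Debye scattering function of an ideal Gaussian chain with I(0) = 1."""
    x = (q * rg) ** 2
    return 2.0 * (np.exp(-x) + x - 1.0) / x ** 2


rg = 20.0
q = np.linspace(0.001, 0.5, 500)
k_compact = kratky(q, compact_model(q, rg))
k_chain = kratky(q, debye_chain(q, rg))

# The compact model peaks near q = sqrt(3)/Rg and returns to the baseline,
# whereas the chain model levels off at a plateau close to 2/Rg^2.
q_peak = q[np.argmax(k_compact)]
print(f"compact peak at q = {q_peak:.3f}; chain plateau = {k_chain[-1]:.5f}")
```

A bell-shaped curve that returns to the baseline is thus the signature of a compact particle in this toy setting, while a curve that rises to a plateau indicates chain-like flexibility; partially ordered systems fall between these limits.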

The most powerful means for analyzing SAXS data consists of Fourier transforming the scattering intensity I(q) into a pair-distance distribution function P(r) (Fig. 11.2b). This function represents a continuous r²-weighted histogram of all electron-pair distances in the molecule (Glatter 1977). The P(r) function permits assessment of the overall quality of SAXS data analysis, since Rg and I(0) can be extracted directly from the P(r) function by integrating the function over all values of r. Calculating Rg and I(0) directly from P(r) uses all of the experimental data in real space, compared to solely using the linearly approximated points from the Guinier plot in the low-q region.
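The real-space route to Rg and I(0) can be sketched as two numerical integrals over P(r): Rg² is the second moment of P(r) divided by twice its area, and I(0) is proportional to the area. The prefactor 4π used for I(0) below follows one common normalization convention and is an assumption, as are the function names. The example uses the analytical P(r) of a uniform sphere, for which Rg² = 3R²/5, as a check.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x (avoids NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def rg_i0_from_pr(r, p):
    """Return (Rg, I0) from a pair-distance distribution sampled on r."""
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    area = trapezoid(p, r)
    rg = np.sqrt(trapezoid(r ** 2 * p, r) / (2.0 * area))
    i0 = 4.0 * np.pi * area          # assumed normalization convention
    return rg, i0


# Analytical P(r) of a uniform sphere of radius R, defined on 0 <= r <= 2R
# (overall scale is arbitrary and cancels in Rg).
radius = 30.0
r = np.linspace(0.0, 2.0 * radius, 2001)
p_sphere = r ** 2 * (1.0 - 0.75 * r / radius + r ** 3 / (16.0 * radius ** 3))

rg_sphere, i0_sphere = rg_i0_from_pr(r, p_sphere)
print(f"Rg from P(r) = {rg_sphere:.3f}; expected sqrt(3/5)*R = {np.sqrt(0.6) * radius:.3f}")
```

Agreement between this real-space Rg and the Guinier-derived value is a useful consistency check on the data and on the choice of the maximum dimension used when P(r) is computed.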

Initially, SAXS data together with RDC data were used to successfully refine known solution NMR structures of single-chain proteins with simulated annealing (SA) protocols (Grishaev et al. 2005; Lee et al. 2007). The power of combining SAXS and NMR, however, is most evident for multi-domain proteins, in which individual domains are connected by flexible linkers (Hennig and Sattler 2014). For example, it is possible to determine global architectures of complexes by employing experimental SAXS and RDC data in conjunction with solution NMR-derived component structures, as shown by us and others (Wang et al. 2009; Ellis et al. 2009). A very instructive and comprehensive review on the integration of SAXS and NMR for the analysis of the structural dynamics of modular multi-domain proteins, using DNA replication proteins as examples, was published recently (Thompson et al. 2017). In addition, several methods for characterizing flexible systems in solution using SAXS data have been reported; these include ensemble optimization methods (Bernado et al. 2007; Schwieters and Clore 2007), a minimal ensemble search (Pelikan et al. 2009), basis-set supported SAXS (Yang et al. 2010), an integrative modeling platform (Forster et al. 2008), a maximum-entropy refinement (Rozycki et al. 2011), and the maximum occurrence method, MaxOcc (Bertini et al. 2012). These approaches entail the generation of a large number of structures to cover the accessible conformational space, from which a subset of conformers is selected that fits the experimental SAXS data. The methods differ in the way the starting conformational ensemble is generated and in how the final ensemble is selected from the pool. Extending such ensemble refinement protocols to include NMR-derived distance and RDC restraints, in addition to SAXS data, in both the pool generation and the optimal ensemble selection has proven successful for two-domain proteins that possess significant inter-domain motions (Lemak et al. 2014).
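The select-from-a-pool idea shared by these approaches can be illustrated with a deliberately small toy example. The sketch below is not an implementation of any of the cited methods: it uses an exhaustive search over equally weighted subsets, Gaussian (Guinier-type) toy profiles in place of scattering curves computed from atomic models, and noise-free synthetic "experimental" data. All names and numbers are assumptions for illustration.

```python
import itertools

import numpy as np


def scaled_chi2(i_exp, sigma, i_calc):
    """Reduced chi-square after fitting one overall scale factor to i_calc."""
    weights = 1.0 / sigma ** 2
    scale = np.sum(weights * i_exp * i_calc) / np.sum(weights * i_calc ** 2)
    residual = (i_exp - scale * i_calc) / sigma
    return float(np.sum(residual ** 2) / (len(i_exp) - 1))


def best_subset(i_exp, sigma, pool, size):
    """Exhaustively find the equally weighted subset that best fits i_exp."""
    best_idx, best_chi2 = None, np.inf
    for idx in itertools.combinations(range(len(pool)), size):
        ensemble_avg = pool[list(idx)].mean(axis=0)   # average of the profiles
        chi2 = scaled_chi2(i_exp, sigma, ensemble_avg)
        if chi2 < best_chi2:
            best_idx, best_chi2 = idx, chi2
    return best_idx, best_chi2


# Toy pool: one profile per conformer, characterized only by its Rg.
q = np.linspace(0.01, 0.3, 100)
pool_rg = np.array([15.0, 20.0, 25.0, 30.0, 35.0])
pool = np.exp(-(q[None, :] * pool_rg[:, None]) ** 2 / 3.0)

# Synthetic data: equal mixture of conformers 1 and 4, assumed 1% errors.
i_exp = pool[[1, 4]].mean(axis=0)
sigma = 0.01 * i_exp

subset, chi2 = best_subset(i_exp, sigma, pool, size=2)
print(f"selected conformers: {subset}, reduced chi-square: {chi2:.2e}")
```

Exhaustive enumeration is feasible only for very small pools; with realistic pools of thousands of conformers, the number of subsets grows combinatorially, which is one reason the published methods rely on different search and weighting strategies.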

An illustrative example of method integration, aimed at obtaining a more detailed picture of a macromolecule in solution, is our recent study on the structure and dynamics of a domain-insertion protein (Fig. 11.3). In this case, we integrated crystallographic, NMR and SAXS data with microsecond-scale atomistic molecular dynamics (MD) simulations in explicit solvent to construct a structural model of the overall two-domain system. In particular, NMR relaxation and paramagnetic relaxation enhancement (PRE) experiments were carried out alongside the simulations. Using this comprehensive integrated approach, we established that the two domains in the protein have no fixed relative orientation, although certain orientations are preferred over others (Debiec et al. 2018). In summary, the integrated use of NMR and SAXS provides a powerful means to describe the solution behavior of biological macromolecules, as the combined data collected with the two methods permit one to derive a more complete picture of a multi-domain protein or multiprotein complex than can be provided by either technique alone. Thus, when characterizing solution structures of biological systems, one should consider obtaining a SAXS shape envelope into which high-resolution NMR structures can be fitted.

Fig. 11.3
Integration of NMR- or X-ray-derived domain structure information, NMR relaxation data, SAXS data and long-time scale molecular dynamics simulations permits the characterization of a probabilistic ensemble of the overall solution structure. The LysM domain is shown in blue, the CVNH domain in red, the interdomain linkers in green, and the paramagnetic MTSL tag in yellow. Structures were best fit to the CVNH domain coordinates. Solid contours represent 1 Å³ bins in the simulation that are occupied by a heavy atom in at least 1% of the ensemble, and transparent contours represent bins occupied in at least 0.1% of the ensemble