15.1 Introduction

In this chapter we focus on scattering from non-crystalline solutions of molecules or nanoparticles in which the scattering objects are rotationally disordered.

This field of solution scattering dates back to the classic work of Guinier [10] and Kratky [15] in the 1930s and 1940s. At that time it was shown that the radius of gyration of the macromolecule could be extracted from the small angle scattering profile as a model-free parameter (see Doniach 2001 [8] for details).

Following the earlier work by Svergun and Stuhrmann [22], a significant development in the field was made in 1998 by P. Chacon et al. [5], who were working in electron microscopy at the time. They exploited a basic principle of feature extraction from complex data: applying prior knowledge, in the form of constraints, when fitting simple models to the data (in this case scattering data) helps relate the data to molecular models of the molecules being examined.

As shown by Chacon et al., models of three-dimensional molecular shapes at nanometer-scale resolution can be extracted from one-dimensional I(q) Small and Wide Angle X-ray Scattering (SAXS/WAXS) data by imposing suitable constraints of continuity and positivity of sample density.

This approach was further developed by Svergun [21] and by Walther et al. [23]. As discussed below, the features extracted from scattering data are represented by the pair distribution functions of atoms in the molecules being examined. These one-dimensional pair distribution functions, which are the 2-point correlation functions encoded in the data, can then yield low-resolution three-dimensional shape models when suitable constraints are applied.

The advent of X-ray FELs has now made it possible to extract 4-point correlation functions from scattering data on solutions of rotationally disordered molecules, yielding data in a three-dimensional representation. This representation is obtained by computing angular correlations of the deviations of the intensity I(q, ϕ) from its angular mean I(q). In this chapter we focus on the experimental problems that must be dealt with in extracting three-dimensional models from the 4-point correlation functions obtained from X-ray FEL scattering data.

15.2 Extraction of 3d Molecular Shapes from 1d SAXS/WAXS Solution Scattering Data

S/WAXS measurements from a mono-disperse solution of macromolecules result in one-dimensional data sets which may be related to the three-dimensional structure of the scattering molecules via

$$\displaystyle \begin{aligned} I(q)\propto \int d\phi\ d\cos\theta\ |F(q\cos\theta, \phi)|^2 \end{aligned} $$
(15.1)

where F is the scattering form factor of the molecule

$$\displaystyle \begin{aligned} F(q\cos\theta, \phi)=\sum_i f_i(q) \exp(i \mathbf{q}\cdot \mathbf{r}_i) \end{aligned} $$
(15.2)

with atomic positions \(\mathbf{r}_i\) and atomic scattering factors \(f_i(q)\). Averaging over orientations gives the Debye formula [8]

$$\displaystyle \begin{aligned} I(q)&=\left\langle\int d\phi\ d\cos\theta\ |F|^2\right\rangle_{\mathrm{orientations}}\\ &=\sum_i |f_i|^2 + 2\sum_{i<j}f_i f_j {\sin q r_{ij}\over q r_{ij}} \end{aligned} $$
(15.3)

Equation (15.3) can be written in terms of the pair distribution function within the molecule, summed over ordered pairs i ≠ j with \(r_{ij}=|\mathbf{r}_i-\mathbf{r}_j|\),

$$\displaystyle \begin{aligned} g(r)= \sum_{i\ne j}f_if_j\delta (r-r_{ij}) \end{aligned} $$
(15.4)

as

$$\displaystyle \begin{aligned} I(q)=\sum_i |f_i|^2 + \int dr\, g(r){\sin(qr)\over q r} \end{aligned} $$
(15.5)

Although one-dimensional, these data, supplemented by prior knowledge that the molecule occupies a finite volume (its support), contain significant information about the three-dimensional structure of the scattering molecule.

Applying positivity and smoothness constraints beyond the Guinier small-angle region makes it possible to extract the 2-point correlation function appearing in Eq. (15.5), which gives the atomic pair distribution in the molecule [22] (see Doniach [8] for details).
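As a numerical sanity check of Eqs. (15.3)–(15.5), the following minimal NumPy sketch evaluates the orientationally averaged intensity of a hypothetical toy "molecule" (random point scatterers with unit scattering factors, an assumption made only for illustration) in two ways: as a double sum over all atom pairs, in which the i = j terms give the self term and each unordered pair appears twice, and from a finely binned pair distribution g(r) over pairs i ≠ j. The function names are illustrative and not taken from any published package.

```python
import numpy as np

def pair_distances(positions):
    """Matrix of interatomic distances r_ij (zero on the diagonal)."""
    diff = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def debye_intensity(positions, f, q):
    """I(q) as a double sum over all atom pairs i, j.

    The i == j terms (r_ii = 0, sinc = 1) give the self term sum_i f_i^2;
    the i != j terms count each unordered pair twice.
    """
    r = pair_distances(positions)
    ff = np.outer(f, f)
    # np.sinc(x) = sin(pi x)/(pi x), so sin(qr)/(qr) = np.sinc(q r / pi)
    return np.array([np.sum(ff * np.sinc(qq * r / np.pi)) for qq in q])

def intensity_from_g(positions, f, q, bin_width=1e-3):
    """Same I(q) from a binned pair distribution g(r) over pairs i != j."""
    r = pair_distances(positions)
    ff = np.outer(f, f)
    off_diag = ~np.eye(len(f), dtype=bool)
    edges = np.arange(0.0, r.max() + 2 * bin_width, bin_width)
    g, _ = np.histogram(r[off_diag], bins=edges, weights=ff[off_diag])
    r_mid = 0.5 * (edges[:-1] + edges[1:])
    self_term = np.sum(f ** 2)
    return np.array([self_term + np.sum(g * np.sinc(qq * r_mid / np.pi)) for qq in q])

# Hypothetical toy "molecule": 50 identical point scatterers in a 10 Å box
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(50, 3))
f = np.ones(50)
q = np.linspace(0.05, 0.5, 10)   # Å^-1
I_debye = debye_intensity(positions, f, q)
I_g = intensity_from_g(positions, f, q)
```

The two results agree to within the binning error, and as q → 0 both approach \((\sum_i f_i)^2\). The binned form shows why the pair distribution, rather than the atomic coordinates themselves, is the natural quantity recoverable from one-dimensional data.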

15.3 Angular Correlations of Solution X-Ray Scattering Lead to Three-Dimensional Data

Kam’s 1977 paper [12] showed how the measurement of azimuthal angular correlations in the scattering data yields three-dimensional information about the structure of the scattering molecules. At that time, the use of Dynamic Light Scattering (DLS) to obtain size information for mixtures of molecules had recently been developed. DLS involves the correlations of fluctuations in the scattered light intensity as a function of time. When Kam extended the application of these ideas to X-ray scattering, he referred to the resulting measurement of correlations as “fluctuation” scattering although the time-dependent correlation aspect was replaced by correlations of the angular dependence of the scattering.

The correlator for the scattering as defined by Kam [12] is

$$\displaystyle \begin{aligned} C(q_1,q_2,\psi)\equiv \int d\phi\ \delta I(q_1,\phi)\times \delta I (q_2, \phi+\psi) {} \end{aligned} $$
(15.6)

Here, δI(q, ϕ) = I(q, ϕ) − I(q) is the deviation of I(q, ϕ) from the azimuthal mean I(q), and ψ is the angle between the vectors \(\mathbf{q}_1\) and \(\mathbf{q}_2\). Comparison with Eq. (15.1) shows that each intensity is quadratic in the form factor, so the product of two intensities involves four form factors; C is therefore a measurement of a 4-point correlation function. This then needs to be averaged over orientations of the scattering molecules.

Since this is a 4-point correlation function, the rotational averaging involves two pairs of atoms, \(\mathbf{r}_{ij}\) and \(\mathbf{r}_{kl}\). For the scattering to be correlated, the four atoms i, j, k, l, which form a tetrahedron, must belong to the same individual molecule in the irradiated sample. The correlation requirement therefore leads to two kinds of rotational averaging: the correlator, Eq. (15.6), averaged over the set of random rotations \(\mathcal{R}_{\omega}\) applied to the same molecule containing both \(\mathbf{r}_{ij}\) and \(\mathbf{r}_{kl}\); and a second set of rotations, averaging over independent rotationally random ensembles for each of the molecules containing \(\mathbf{r}_{ij}\) and \(\mathbf{r}_{kl}\).

The rotationally averaged correlator is thus decomposed into two terms:

$$\displaystyle \begin{aligned} C^{\mathcal{C}}(q_1,q_2,\psi)=\sum_{\omega_i}\int d\phi\ \delta I^{\mathcal{R}_i}(q_1,\phi)\times \delta I^{\mathcal{R}_i} (q_2, \phi+\psi)\ {} \end{aligned} $$
(15.7)

and

$$\displaystyle \begin{aligned} C^{\mathcal{U}}(q_1,q_2,\psi)=\sum_{\omega_{i\ne j}}\int d\phi\ \delta I^{\mathcal{R}_i}(q_1,\phi)\times \delta I^{\mathcal{R}_j} (q_2, \phi+\psi) \end{aligned} $$
(15.8)

Here, \( \delta I^{\mathcal {R}_i}\) denotes the value of δI(q, ϕ) after rotation by rotation matrix \(\mathcal {R}_{\omega _i}\).
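On a pixelated detector, Eq. (15.6) becomes a discrete circular correlation over the N azimuthal pixels of each ring. The minimal sketch below assumes both rings have been resampled onto the same N equally spaced azimuthal bins (masking and polarization corrections are ignored). It computes the correlator with the correlation theorem and checks it against the direct O(N²) sum; the function names are hypothetical.

```python
import numpy as np

def kam_correlator(I1, I2):
    """Discrete form of Eq. (15.6) for two rings sampled at the same N azimuths.

    Returns C[k] = sum_n dI1[n] * dI2[(n + k) mod N], the correlation at
    psi_k = 2*pi*k/N, where dI = I - <I>_phi is the deviation from the ring mean.
    """
    dI1 = I1 - I1.mean()
    dI2 = I2 - I2.mean()
    # Correlation theorem: C = IFFT(conj(FFT(dI1)) * FFT(dI2))
    return np.fft.ifft(np.conj(np.fft.fft(dI1)) * np.fft.fft(dI2)).real

def kam_correlator_direct(I1, I2):
    """O(N^2) reference implementation of the same sum, for checking."""
    dI1 = I1 - I1.mean()
    dI2 = I2 - I2.mean()
    # np.roll(dI2, -k)[n] == dI2[(n + k) % N]
    return np.array([np.sum(dI1 * np.roll(dI2, -k)) for k in range(len(I1))])

# Example: a ring with six-fold modulation, 360 one-degree pixels
N = 360
phi = 2 * np.pi * np.arange(N) / N
ring = 1.0 + np.cos(6 * phi)
C = kam_correlator(ring, ring)   # peaks at psi = 0, 60, 120, ... degrees
```

The sum, rather than the mean, is returned; multiplying by 2π/N approximates the integral in Eq. (15.6). The FFT route costs O(N log N) per ring pair, which matters when the correlator is accumulated over many shots and many (q₁, q₂) combinations.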

15.4 Extraction of Structural Features by Averaging Many X-Ray FEL Shots

Since the correlator \(C^{\mathcal{U}}\) involves independent sets of random rotations for the molecules containing \(\mathbf{r}_{ij}\) and \(\mathbf{r}_{kl}\), the product of the deviations from the mean contains as many positive as negative terms. Hence, the mean of \(C^{\mathcal{U}}\) is expected to tend to zero when summed over many different ensembles of randomly oriented molecules.
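The vanishing of the mean of \(C^{\mathcal{U}}\) can be illustrated with a deliberately simplified toy model. This model is our assumption and not the full three-dimensional averaging of the text: each molecule's ring deviation is a fixed, zero-mean angular pattern rotated by a random angle about the beam axis. Correlating a pattern with itself (the analog of Eq. (15.7)) gives the same curve for every orientation, whereas correlating independently rotated copies (the analog of Eq. (15.8)) averages toward zero as the number of shots M grows.

```python
import numpy as np

N = 360                                  # azimuthal pixels on one ring
phi = 2 * np.pi * np.arange(N) / N

def ring_deviation(thetas):
    """Hypothetical zero-mean ring deviation dI(phi) of a molecule rotated by
    theta about the beam axis; one row per molecule."""
    x = phi[None, :] + np.asarray(thetas)[:, None]
    return np.cos(3 * x) + 0.5 * np.cos(5 * x)

def ring_corr(a, b):
    """Azimuthal mean of a(phi) * b(phi + psi_k) for each row, via FFT."""
    A = np.fft.fft(a, axis=-1)
    B = np.fft.fft(b, axis=-1)
    return np.fft.ifft(np.conj(A) * B, axis=-1).real / N

rng = np.random.default_rng(1)
M = 2000
theta_a = rng.uniform(0, 2 * np.pi, M)
theta_b = rng.uniform(0, 2 * np.pi, M)   # independent rotations

# Same molecule in both factors: identical curve for every theta
C_same = ring_corr(ring_deviation(theta_a), ring_deviation(theta_a)).mean(axis=0)
# Independently rotated molecules in the two factors: averages toward zero
C_cross = ring_corr(ring_deviation(theta_a), ring_deviation(theta_b)).mean(axis=0)
```

In this toy model the residual of C_cross shrinks roughly as \(1/\sqrt{M}\), which is why removing the \(C^{\mathcal{U}}\) term requires averaging over many shots.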

When collecting data at an X-ray FEL, the solution sample is typically illuminated by X-ray pulses at repetition rates of 30–120 Hz to date, with the potential for higher rates at the recently operational European XFEL and the future LCLS-II. Removing the \(C^{\mathcal{U}}\) term requires averaging over enough shots for adequate convergence; because of the high repetition rate, this can be achieved within a reasonable amount of beam time.

The signal/noise for X-ray FEL measurements of randomly oriented ensembles of molecules has been studied in detail by Kirian et al. [14]. Here we use their results to give a qualitative summary.

In the high X-ray fluence limit, in which many photons per shot scatter off a given molecule, we can treat the statistics of photons falling on a given pixel in the Gaussian limit. The probability distribution for the product of a pair of Gaussian random variables has been studied rigorously [3] and is non-Gaussian. Here we simplify the discussion by approximating this distribution by the product of two Gaussians. In this approximation, the magnitude of C for a given pair of pixels at \(q_1\) and \(q_2\) scales as the sum of a set of products of two independent Gaussian variables. For two independent Gaussian variables \(X_1, X_2\) with common mean μ and standard deviation σ, the product \(Y = X_1X_2\) has mean \(\mu^2\) and variance \(\sigma^4 + 2\mu^2\sigma^2\); for zero-mean deviations such as δI, the standard deviation of Y is therefore \(\sigma^2\).
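For independent \(X_1, X_2 \sim \mathcal{N}(\mu, \sigma^2)\), \(E[Y^2] = (\sigma^2+\mu^2)^2\), so the product \(Y = X_1X_2\) has mean \(\mu^2\) and variance \(\sigma^4 + 2\mu^2\sigma^2\); its standard deviation reduces to \(\sigma^2\) for zero-mean variables. A quick Monte Carlo check of these moments follows; the sample size and seed are arbitrary choices.

```python
import numpy as np

def product_stats_mc(mu, sigma, n=1_000_000, seed=0):
    """Monte Carlo mean and std of Y = X1 * X2 with X1, X2 ~ N(mu, sigma^2) independent."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, n) * rng.normal(mu, sigma, n)
    return y.mean(), y.std()

def product_stats_exact(mu, sigma):
    """Exact moments: E[Y] = mu^2, Var[Y] = sigma^4 + 2 mu^2 sigma^2."""
    return mu ** 2, np.sqrt(sigma ** 4 + 2 * mu ** 2 * sigma ** 2)

# Zero-mean case (as for the deviations dI): std of the product is sigma^2
mc_zero = product_stats_mc(0.0, 2.0)
# Non-zero mean: the 2 mu^2 sigma^2 term dominates the spread
mc_shift = product_stats_mc(3.0, 1.0)
```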

An atomic-scale estimate of the autocorrelator must be measured at large scattering vectors, q ∼ 2π/a, where a is an average interatomic distance. For a macromolecule, the autocorrelator

$$\displaystyle \begin{aligned} C(q,q,\psi)\approx\sum_{\phi} \delta I(q,\phi)\times \delta I (q, \phi+\psi) {} \end{aligned} $$
(15.9)

is a sum over the m(q) pixels that define the scattering ring at q. Since the variance of a sum of independent Gaussian variables is the sum of the individual variances, the relative error (the inverse of the signal-to-noise ratio) of the estimate of C scales as \(\pm 1/\sqrt{m(q)n(q)}\), where n(q) is the number of photons falling on each pixel of the ring.

Because the distribution is Gaussian at high flux, this error is further reduced by a factor \(\sqrt{M}\) when averaged over M shots. This finally leads to a relative error bar for \(C^{\mathcal{C}}(q,q,\psi)\) of \(\pm 1/\sqrt{Mm(q)n(q)}\).
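To make the error-bar scaling \(\pm 1/\sqrt{Mm(q)n(q)}\) concrete, the sketch below evaluates it and inverts it for the number of shots needed. The pixel count, photon count, target error, and 120 Hz repetition rate are hypothetical numbers chosen only for illustration, and the formula inherits the Gaussian high-flux assumption of this section.

```python
import math

def relative_error_bar(M, m, n):
    """Relative error of C^C(q, q, psi), scaling as 1/sqrt(M * m * n).

    M: number of shots, m: pixels on the ring, n: photons per pixel per shot.
    """
    return 1.0 / math.sqrt(M * m * n)

def shots_needed(target, m, n):
    """Invert the scaling: number of shots M giving a target relative error."""
    return 1.0 / (target ** 2 * m * n)

# Hypothetical numbers: 1000 pixels on the ring, 10 photons per pixel per shot
err = relative_error_bar(10 ** 5, 1000, 10)   # about 3.2e-5
M = shots_needed(1e-5, 1000, 10)              # about 1e6 shots
seconds = M / 120.0                           # beam time at an assumed 120 Hz
```

With these assumed values, reaching a relative error of \(10^{-5}\) takes \(10^6\) shots, a little over two hours at 120 Hz, which illustrates why the repetition rate matters.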

15.5 Dealing with Artifacts

In order to observe correlated scattering from a solution of randomly oriented macromolecules, their Brownian motion needs to be eliminated during the exposure. This may be achieved either by freezing the sample, held in a capillary, in a gas stream from liquid nitrogen, or by recording images from the 20–50 fs bursts of X-rays scattered from a jet of liquid sample injected into the X-ray FEL beam.

In a recent study of a suspension of gold nanoparticles in LCP (lipid cubic phase), a continuous jet of the sample was injected into the X-ray beam, which was running at 30 Hz [19].

An immediate observation was that the images on the detector are contaminated by parasitic X-rays, some of which arise from scattering from the injector nozzle. This parasitic scattering presents a basic problem for the measurement of angular correlations in the scattered beam, since the contamination is anisotropic in azimuth about the incident beam. This asymmetry causes large artifactual distortions of the angular correlations computed from the images, masking the correlated scattering from the molecules in the sample. Artifactual anisotropies can also arise from the detector geometry and from gain variations between the detector subunits.

A solution to this problem, originally proposed by Kam and collaborators [13], is to take the difference between pairs of images whose artifactual anisotropies are most similar. In the gold nanoparticle measurements [19], this was done by fitting a polynomial to the azimuthal scattering intensity on the principal (111) ring and using its coefficients to compare different exposures. In each run of 5000 shots, the pairs of images whose polynomials were closest were selected. For more complex samples such as biomolecules, pairing algorithms that sample more of the detector need to be developed.

When the autocorrelations were calculated, the artifactual contributions in the subtracted images were found to be considerably reduced. See Fig. 15.1 below. (For more details, see the Supplementary Information published in IUCrJ [19].)

Fig. 15.1

Comparing the raw correlation of selected moderate intensities \(C^m(\cos\psi)\) to the difference correlation of the moderate intensities \(D^m(\cos\psi)\). Reproduced with permission of the International Union of Crystallography

On a logarithmic scale, the anisotropy effects on the raw correlator (Fig. 15.1, upper left panel) are found to be ∼2 orders of magnitude larger than the signal (Fig. 15.1, upper right panel). We fit 6th-degree polynomials (dashed yellow) to the paired data and subtracted them (lower panels) to emphasize that, even after pair differencing, the correlations still contain residual anisotropies, which are certainly artifactual.

These data represent averages over tens of thousands of exposures. Expected CXS signals for gold nanoparticles are marked on the axis and shown with grid lines. As the figure makes clear, the difference correlation is a critical step in the analysis: without it, we would not be able to distinguish the gold nanoparticle CXS signal from the artifactual correlations. Low-frequency variation in the difference correlation (Fig. 15.1, right panel) persists and is probably due to extreme detector artifacts.

As an additional filter against artifacts, we employ a Friedel symmetry constraint. Friedel's law states that \(I(\mathbf{q}) = I(-\mathbf{q})\) (in the absence of anomalous scattering). Hence, if one measures a physical correlation between \(\mathbf{q}_1\) and \(\mathbf{q}_2\) at an angle \(\psi=\arccos(\mathbf{q}_1\cdot\mathbf{q}_2/q_1q_2)\), one should measure the same correlation between \(\mathbf{q}_1\) and \(-\mathbf{q}_2\), that is, at the angle \(\pi-\psi=\arccos(-\mathbf{q}_1\cdot\mathbf{q}_2/q_1q_2)\). This implies that a pure CXS function should be mirror-symmetric about \(\psi=\pi/2\) (\(\cos\psi=0\)). Any signal violating this symmetry is likely to contain an artifactual component. We define the Friedel difference correlation \(D_F(\cos\psi)=\frac{1}{2}\{D(\cos\psi)+D(-\cos\psi)\}\) in order to enhance the true CXS information while suppressing false correlation peaks that violate Friedel symmetry.
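The Friedel difference correlation \(D_F\) is a one-line operation once D is sampled on a grid in cos ψ that is symmetric about zero, which is an assumption of this sketch. Returning the antisymmetric remainder as well is our addition, as a possible diagnostic for artifacts. The example uses hypothetical Gaussian peaks and shows that symmetrization halves, but does not remove, a one-sided artifact.

```python
import numpy as np

def friedel_symmetrize(cos_psi, D):
    """Friedel difference correlation D_F(cos psi) = [D(cos psi) + D(-cos psi)] / 2.

    Assumes D is sampled on an ascending cos(psi) grid symmetric about 0,
    so reversing the array maps cos(psi) -> -cos(psi).
    """
    cos_psi = np.asarray(cos_psi, dtype=float)
    D = np.asarray(D, dtype=float)
    if not np.allclose(cos_psi, -cos_psi[::-1]):
        raise ValueError("cos_psi grid must be symmetric about zero")
    symmetric = 0.5 * (D + D[::-1])
    antisymmetric = 0.5 * (D - D[::-1])   # part violating Friedel symmetry
    return symmetric, antisymmetric

cos_psi = np.linspace(-1.0, 1.0, 201)

def peak(c0, w=0.02):
    """Hypothetical Gaussian correlation peak centred at cos(psi) = c0."""
    return np.exp(-0.5 * ((cos_psi - c0) / w) ** 2)

# Hypothetical D: Friedel-symmetric peaks at +/-1/3 plus an artifact at +0.6 only
D = peak(1 / 3) + peak(-1 / 3) + 0.8 * peak(0.6)
D_F, D_asym = friedel_symmetrize(cos_psi, D)
```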

15.6 Measurement of Molecular Details on the Atomic Scale Using Correlated Scattering

An electron density map allowing extraction of molecular details requires a range of scattering vectors up to q ≃ 2π/a, where a is on the order of interatomic distances, generally ≈3.5 Å. In recent publications ([18] and [19]), wide-angle scattering measurements were made of suspensions of silver and gold nanoparticles.

As is well established, the internal structure of these nanoparticles forms a face-centered cubic (fcc) lattice [11], which may be distorted by twinning defects. Calculating the X-ray powder diffraction pattern I(q) for a model nanoparticle in the form of an fcc lattice truncated to a spherical shape gives a principal scattering peak at the Bragg vector \(q_{111} = 2.674\) Å\(^{-1}\) of the fcc lattice, corresponding to a (111) interplanar spacing of 2.35 Å.
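The quoted Bragg vector follows from q = 2π/d. The sketch below reproduces 2.674 Å\(^{-1}\) from d = 2.35 Å and, for comparison, computes \(d_{111} = a/\sqrt{3}\) from the cubic-lattice relation \(d_{hkl} = a/\sqrt{h^2+k^2+l^2}\), assuming a room-temperature gold lattice constant a ≈ 4.078 Å.

```python
import math

def q_from_d(d):
    """Bragg scattering vector magnitude q = 2*pi/d (d in Å, q in Å^-1)."""
    return 2 * math.pi / d

def d_cubic(a, h, k, l):
    """Interplanar spacing of (hkl) planes in a cubic lattice with constant a."""
    return a / math.sqrt(h * h + k * k + l * l)

q_111_from_text = q_from_d(2.35)                 # about 2.674 Å^-1, as quoted
a_gold = 4.078                                   # Å, assumed lattice constant of Au
q_111_gold = q_from_d(d_cubic(a_gold, 1, 1, 1))  # about 2.669 Å^-1
```

The second value, about 2.669 Å\(^{-1}\), agrees with the observed ring position of about 2.67 Å\(^{-1}\) to within the precision quoted.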

Raw images from both capillary scattering from Ag nanoparticles [18] and jet scattering from Au nanoparticles [19] exhibit bright rings at q = 2.67 Å\(^{-1}\), superposed on artifactual, anisotropic parasitic scattering.

In the recent work of Mendez et al. [19], X-ray FEL images were first categorized according to the total average exposure. Of roughly 3.8 × 10\(^5\) usable exposures, recorded in runs of 5000 shots each, pairing was done only between exposures that occurred during the same experimental run.

Each exposure i was paired with an exposure j according to their azimuthal anisotropies, quantified by the fitted polynomials \(y^{\ast}_i, y^{\ast}_j\). The squared Euclidean distance \(d_{ij}=\sum_{\phi} |y^{\ast}_i(\phi) - y^{\ast}_j(\phi)|^2\) was used as the metric for comparing two exposures (for further details, see the supplementary information of [19]). Before computing angular correlations, images were sorted based on their [111] Bragg ring intensities, which were separated into two components: the brightest Bragg spots (Fig. 15.2a) and the moderate intensities (Fig. 15.2b). This separation indicated the existence of two populations of nanoparticles in the samples.
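The pairing step can be sketched as follows. This is a hypothetical reconstruction, not the code used in [19]: the polynomial degree, the greedy "closest pair first" matching rule, and the synthetic ring profiles are all assumptions. Each exposure's azimuthal profile on the ring is fitted with a polynomial y*(φ), the squared Euclidean distances d_ij between fits are computed, pairs are selected, and the difference correlation D = C_i − C_j is formed for each pair.

```python
import numpy as np

def fit_anisotropy(phi, ring_profiles, degree=6):
    """Fit a polynomial y*(phi) to each exposure's azimuthal ring profile.

    ring_profiles: array (n_exposures, n_phi). Returns fitted curves, same shape.
    """
    fits = []
    for profile in ring_profiles:
        poly = np.polynomial.Polynomial.fit(phi, profile, degree)  # domain-scaled fit
        fits.append(poly(phi))
    return np.array(fits)

def squared_distances(fits):
    """d_ij = sum_phi |y*_i(phi) - y*_j(phi)|^2 for all pairs of exposures."""
    sq = np.sum(fits ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * fits @ fits.T
    return np.maximum(d, 0.0)          # clip tiny negative round-off

def greedy_pairs(d):
    """Hypothetical pairing rule: repeatedly take the closest unpaired pair."""
    n = d.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(d[iu, ju], kind="stable")
    used = np.zeros(n, dtype=bool)
    pairs = []
    for idx in order:
        i, j = iu[idx], ju[idx]
        if not (used[i] or used[j]):
            pairs.append((int(i), int(j)))
            used[i] = used[j] = True
    return pairs

def autocorrelation(profile):
    """Azimuthal autocorrelation of the deviation from the ring mean (via FFT)."""
    dI = profile - profile.mean()
    F = np.fft.fft(dI)
    return np.fft.ifft(np.abs(F) ** 2).real / len(dI)

def difference_correlations(ring_profiles, pairs):
    """D = C_i - C_j for each selected pair (i, j)."""
    return np.array([autocorrelation(ring_profiles[i]) - autocorrelation(ring_profiles[j])
                     for i, j in pairs])

# Hypothetical data: three anisotropy shapes, each seen in two noisy exposures
rng = np.random.default_rng(2)
phi = np.linspace(0, 2 * np.pi, 360, endpoint=False)
shapes = [1 + 0.3 * np.sin(phi), 1 + 0.5 * np.cos(2 * phi), 1 + 0.2 * phi]
profiles = np.array([shapes[k // 2] + 0.01 * rng.normal(size=phi.size) for k in range(6)])
pairs = greedy_pairs(squared_distances(fit_anisotropy(phi, profiles)))
D = difference_correlations(profiles, pairs)
```

Exposures sharing the same anisotropy are paired, and their difference correlations are small compared with the individual autocorrelations. Greedy matching is simple but not globally optimal; a minimum-weight perfect matching would be a natural alternative if the total pairing distance matters.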

Fig. 15.2

Separation of bright Bragg spots in the angular intensity profile. (a) The 111 Bragg ring intensity of a single snapshot exposure i. Highlighted in green are the brightest intensities. (b) The same as (a), but the bright Bragg spots are removed, leaving behind the moderate intensity, which forms a relatively noisy signal. Reproduced with permission of the International Union of Crystallography

After removal of the artifactual scattering by pair selection, as described in Sect. 15.5, additional peaks were found in the autocorrelations around the (111) ring for the moderate component but not for the brightest component.

After pairing, the difference between the angular correlations of the moderate intensities of paired images i, j was denoted by \(D^m(\cos\psi)=\delta C^m_i(\cos\psi)- \delta C^m_j(\cos\psi)\) (see Eqs. (15.7) and (15.8)).

It was found that \(D^m(\cos\psi)\) showed peaks at \(\cos\psi = \pm 1/3, \pm 5/9\), and \(\pm 7/9\), indicating the presence of twinning (Fig. 15.3c). In contrast, the CXS of the images with bright Bragg spots, \(D^b(\cos\psi)\), showed peaks only at \(\cos\psi = \pm 1/3\) (Fig. 15.3d). The absence of pronounced peaks at \(\cos\psi=\pm 5/9\) and \(\pm 7/9\) implies that the domains which scattered the brightest Bragg spots were most likely not twinned, i.e., that this signal arises from a population of non-twinned scattering domains.
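These twinning signatures can be checked with a few lines of geometry. Assuming the twin operation is a 180° rotation about a shared [111] normal (the standard description of an fcc twin), the cosines between {111} reciprocal-lattice directions within one domain are only ±1 and ±1/3, while pairs straddling the twin boundary add ±5/9 and ±7/9. Both +q and −q (Friedel mates) are included, so every value appears with both signs.

```python
import numpy as np
from fractions import Fraction

# Unit normals of the four {111} plane families of one fcc domain
normals = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [-1, 1, 1]], dtype=float)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# Assumed twin operation: 180-degree rotation about the shared [111] normal,
# R = 2 n n^T - I for a unit axis n
n = normals[0]
R = 2.0 * np.outer(n, n) - np.eye(3)
twin_normals = normals @ R.T      # row k is R applied to normals[k]

def cosine_set(a, b):
    """Distinct cos(psi) values between {111} directions of two domains, as fractions."""
    values = set()
    for c in (a @ b.T).ravel():
        frac = Fraction(float(c)).limit_denominator(100)
        values.update({frac, -frac})  # include the Friedel mate -q as well
    return values

within_domain = cosine_set(normals, normals)       # {±1, ±1/3}
across_twin = cosine_set(normals, twin_normals)    # adds ±5/9 and ±7/9
```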

Fig. 15.3

(a) Simulated CXS for the gold decahedron in Fig. 15.4b. The horizontal line marks an SNR of 2.5; (d) The mirror-symmetric difference correlation of the bright Bragg intensities, \(D^b(\psi)\). The absence of pronounced peaks at \(\cos(\psi)=\pm 5/9\) and \(\pm 7/9\) suggests that this signal arises from a population of non-twinned scattering domains. Reproduced with permission of the International Union of Crystallography

Because the peak width is inversely proportional to the domain size, it was inferred that the relatively narrow CXS peaks at \(\cos\psi = 1/3\) from the bright Bragg spots indicate that the corresponding nanoparticle domains are larger than the twinned domains which produced the CXS signal shown in Fig. 15.3b.

These conclusions are supported by simulations of CXS from single- and multi-domain nanoparticle models, shown in Fig. 15.4.

Fig. 15.4

(a) (left panel): Simulated CXS autocorrelation signals for a non-twinned cuboctahedron gold nanoparticle atomic model. (b) (right panel): Simulated CXS autocorrelation signals for a nearest-neighbor tetrahedron (NNT) model. Reproduced with permission of the International Union of Crystallography

In Fig. 15.4a, the single-domain cuboctahedron shows only a single peak in the correlated signal, while the correlator for the multi-domain nearest-neighbor tetrahedron (NNT) model in Fig. 15.4b shows three twinning peaks on each side of ψ = 90°. The angular gap seen in the NNT model of the decahedron (Fig. 15.4b) arises because each of the tetrahedra is a close-packed fcc domain [24]. The twinning gives rise to the additional CXS peaks. The ability to distinguish these two classes of model from angular correlations of the data demonstrates that these observations can resolve model differences on an atomic length scale.
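The origin of the gap can be made quantitative under the assumption that it is the well-known closure deficit of a five-fold twinned decahedron. Each of the five tetrahedra subtends the tetrahedral dihedral angle arccos(1/3) about the shared five-fold axis (the same 1/3 that appears in the {111} correlation peaks), and five of them fall short of 360°.

```python
import math

# Dihedral angle between two faces of a regular tetrahedron: arccos(1/3)
dihedral_deg = math.degrees(math.acos(1.0 / 3.0))   # about 70.53 degrees
# Five tetrahedra sharing a common edge fill 5 * 70.53, about 352.64 degrees
gap_deg = 360.0 - 5 * dihedral_deg                   # about 7.36 degrees left unfilled
```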

15.7 Time-Resolved Solution Scattering: One-Dimensional Data

At the macromolecular level, living systems are characterized by structure-function relationships in which deviations from thermodynamic equilibrium result in biochemical reactions that modify cellular information at the chemical or physical level. Examples include polymerases using DNA templates to synthesize messenger RNA, molecular motors converting chemical to mechanical energy, and G-protein coupled receptors transmitting chemical information across cellular membranes.

A unique feature of X-ray FEL radiation is that the X-rays are delivered in bursts lasting a few tens of femtoseconds. For solution scattering, this means that pump-probe experiments can be carried out in which a macromolecular reaction is initiated at an initial time and the sample is then exposed to the X-ray FEL beam at a series of later times \(t_i\), opening the possibility of generating "molecular movies".

In recent years X-ray FEL technology has been used in pump-probe experiments to characterize the kinetics of macromolecular conformational change in two well-studied systems: photolysis of the CO-myoglobin complex [16], and photoexcitation of the Blastochloris viridis photosynthetic reaction center, \(\mathrm{RC}_{\mathrm{vir}}\) [2].

In each case, X-ray FEL radiation was used to measure one-dimensional solution X-ray scattering SAXS/WAXS profiles at a time-series of observations following a stimulus by an optical laser pulse.

For the CO-myoglobin experiments, Cammarata and colleagues [16] measured changes in the radius of gyration, \(R_g\), and the molecular volume, \(V_p\), over the first 10 picoseconds (ps) following the optical excitation pulse. After an initial rapid rise, \(R_g\) and \(V_p\) are observed to relax to close to their initial values. The global conformational change reflected in the WAXS difference signal suggested that the internal secondary-structure motions may initially occur through a quasi-ballistic mechanism, rather than through the overdamped Brownian motion usually associated with thermally excited conformational changes.

The dynamics of energy dissipation in proteins have traditionally been referred to as “protein quakes” [1]. These new time-resolved X-ray FEL experiments provide valuable insight into the ultrafast dynamics of proteins in solution, but they also raise new questions: a recent paper by Brinkman and Hub [4], based on systematic molecular dynamics simulations, suggests a very different interpretation from that suggested by Cammarata et al. [16].

In the Cammarata work [16], underdamped oscillations in \(\Delta R_{\mathrm{Guin}}\) and \(\Delta V_{\mathrm{Guin}}\) after CO photodissociation in myoglobin (Mb) were interpreted as underdamped oscillations of the protein. However, Brinkman and Hub [4] find that \(\Delta R_{\mathrm{Guin}}\) and \(\Delta V_{\mathrm{Guin}}\) are dominated by modulations of the solvent density. These results lead to the conclusion that the small-angle S/WAXS data from Mb report on protein dynamics only up to ∼500 fs, after which the signals are dominated by the propagation of the pressure wave into the solvent.

These findings highlight the importance of detailed simulations which accurately include solvent effects when interpreting solution scattering data.

Although a similar study has not yet appeared for the photosynthetic reaction center study [2], it seems likely that similar conclusions would also be reached for that case.

The overall conclusion is that one-dimensional X-ray FEL solution scattering experiments, coupled with detailed molecular dynamics simulations can provide atomic insights into the ultrafast dynamics of proteins. This ability provides a new perspective with which to examine the role of molecular dynamics in the structure-function relationships needed for understanding living systems.

15.8 Time-Resolved Solution Scattering: Three-Dimensional Data

In contrast to one-dimensional time-resolved S/WAXS measurements, the ability to obtain three-dimensional data by measuring 4-point angular correlations of X-ray FEL solution scattering data has the potential to reveal changes in molecular structure at close to atomic resolution and over a large range of time scales. Development of this methodology will provide an independent test of simulations based on molecular dynamics calculations, while at the same time providing detailed insight into the structural biology of macromolecular interactions.

The papers of Donatelli et al. [6, 7] provide the ability to generate density maps from correlated solution scattering.

Here, suitable constraints on the scattering data are applied through spherical harmonic projections of density maps, giving an iterative solution of the phase problem. To do this, Donatelli et al. combine the Hybrid Input-Output (HIO) method of Fienup [9] with the shrink-wrap method of Marchesini [17].

To combine phase retrieval with shrink-wrap, Marchesini et al. take the current real-space estimate of the density of the scattering object (initially, the autocorrelation function), blur it, and threshold the result. This gives a blurred estimate of the boundary of the object at a given intensity contour, which serves as the updated support. The procedure is repeated every 50 steps or so of the HIO phase retrieval optimization.

The authors find that application of the shrink-wrap method acts to smooth out noise and provides an improved support constraint that gives rise to a successively better estimate of the object in real space as the iterative refinement proceeds to convergence.
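The HIO and shrink-wrap ingredients can be illustrated with a two-dimensional toy sketch. This is not the algorithm of Donatelli et al., which operates on spherical harmonic projections of correlation data; it is a generic Fourier-magnitude phase retrieval on a hypothetical test object. The feedback parameter β = 0.9, the 4% autocorrelation threshold for the initial support, the 20% shrink-wrap threshold, the blur width σ = 1.5 pixels, and the 50-iteration update interval are assumed values.

```python
import numpy as np

def gaussian_blur(x, sigma):
    """Periodic Gaussian blur (std sigma, in pixels) applied in Fourier space."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.fft.ifft2(np.fft.fft2(x) * kernel).real

def fourier_error(x, mag):
    """Relative mismatch between |FFT(x)| and the measured magnitudes."""
    return np.linalg.norm(np.abs(np.fft.fft2(x)) - mag) / np.linalg.norm(mag)

def hio_shrinkwrap(mag, n_iter=600, beta=0.9, sw_every=50, sigma=1.5,
                   auto_thresh=0.04, sw_thresh=0.2, seed=0):
    """HIO phase retrieval with periodic shrink-wrap support updates (toy 2D sketch)."""
    rng = np.random.default_rng(seed)
    # Initial support: thresholded autocorrelation, i.e. inverse FFT of |F|^2
    auto = np.fft.ifft2(mag ** 2).real
    support = auto > auto_thresh * auto.max()
    g = rng.uniform(0.0, 1.0, mag.shape) * support
    for it in range(1, n_iter + 1):
        # Fourier constraint: keep current phases, impose measured magnitudes
        G = np.fft.fft2(g)
        g_prime = np.fft.ifft2(mag * np.exp(1j * np.angle(G))).real
        # Real-space constraints: inside the support and non-negative
        ok = support & (g_prime >= 0)
        # HIO update: accept where constraints hold, feed back negatively elsewhere
        g = np.where(ok, g_prime, g - beta * g_prime)
        if it % sw_every == 0:
            # Shrink-wrap: blur the constrained estimate and threshold it
            blurred = gaussian_blur(np.where(ok, g_prime, 0.0), sigma)
            support = blurred > sw_thresh * blurred.max()
    # Final estimate: one more Fourier projection, then apply the constraints
    g_prime = np.fft.ifft2(mag * np.exp(1j * np.angle(np.fft.fft2(g)))).real
    return np.where(support & (g_prime >= 0), g_prime, 0.0), support

# Hypothetical test object: an ellipse with a brighter internal bar on a 64 x 64 grid
yy, xx = np.mgrid[:64, :64]
obj = ((((xx - 30) / 6.0) ** 2 + ((yy - 34) / 4.0) ** 2) <= 1).astype(float)
obj[33:35, 26:34] += 1.0
mag = np.abs(np.fft.fft2(obj))
estimate, support = hio_shrinkwrap(mag)
```

Because the shrink-wrap support is recomputed from the blurred current estimate, it typically tightens around the object as the iterations proceed. As usual for Fourier-magnitude data, the reconstruction is defined only up to a translation and a centrosymmetric inversion.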

Donatelli et al. have used simulated data to demonstrate that this approach has the power to extract structural details from three-dimensional X-ray FEL scattering from non-crystalline molecular solutions at close to atomic resolution. In this way, the time resolution of X-ray FEL scattering can eventually lead to atomic scale models of biomolecular reactions on physiological time and length scales.

15.9 Summary

The measurements discussed in this chapter demonstrate that angular correlation of X-ray scattering data may be used to reveal the internal structure of macromolecules in non-crystalline solutions on atomic length scales. The advent of X-ray FEL sources thus opens up a new frontier in the science of X-ray scattering from non-crystalline samples.

In particular, X-ray FEL radiation has the potential to reveal the structural dynamics of biomolecules under physiological conditions, at time scales and spatial resolutions not accessible to other methodologies.