
7.1 Introduction and Historical Overview

Although NMR was discovered in 1946, its application to biological systems only started in the late 1960s and early 1970s. The application was very limited due to the poor sensitivity and very low resolution offered by the one-dimensional techniques used at that time. Two major breakthroughs in the 1970s revolutionized the field: Fourier transform (FT) NMR, which allowed rapid recording of NMR signals, and two-dimensional NMR spectroscopy, which dramatically increased the spectral resolution. These advances, in combination with the development of stable magnets at higher fields, led to an explosion of NMR investigations in the late 1970s and early 1980s, which centered on exploring the potential of NMR for determining the 3D structures of macromolecules. Even though X-ray crystallography was already the method of choice for structure determination during that period, it was believed that NMR could provide complementary structural information in the more physiologically relevant solution environment. Moreover, since some biomolecules are difficult to crystallize, NMR could be used as an alternative method for obtaining 3D structures.

In the mid-1980s, several groups reported the first generation of solution structures of proteins and oligonucleotides using 2D NMR methods. The protocols used in these NMR structure calculations proved to be valid when the same structure of the α-amylase inhibitor Tendamistat was determined in 1986 independently by NMR and crystallographic groups. After that, the field witnessed an exponential growth with the excitement over NMR being another powerful method for macromolecule structure determination. However, it was soon realized that signal degeneracy and the intrinsic relaxation behavior of macromolecules limited 2D NMR application within the range of small proteins and nucleic acids (< 10 kDa).

In the late 1980s and early 1990s, another quantum jump came when multidimensional heteronuclear NMR methods were developed, pushing the molecular size limit of NMR structures up to 35 kDa. Advances in molecular biology that led to the overexpression and isotope labeling of proteins also played an important role. Multidimensional heteronuclear NMR thus opened the door to studying a wide variety of proteins and protein domains. The development of TROSY and residual dipolar coupling (RDC) methods allows NMR to study even larger proteins and protein complexes. Hence, although NMR is still in its developmental stage and lags behind macromolecular crystallography by almost 30 years (the first crystal structure of a protein was published in 1957, whereas the first NMR structure appeared in the mid-1980s), it has certainly become one of the most powerful players in molecular/structural biology. Today, about one fifth of the macromolecular structures deposited in the PDB (Protein Data Bank) were derived from NMR spectroscopy. Despite its size limitation for macromolecular structure determination, NMR has the following unique features: (a) it allows structural studies in a physiologically relevant solution environment, which avoids experimental artifacts such as the crystal packing seen in crystal structures; (b) it allows structural studies of some molecules that are difficult to crystallize, such as flexible protein domains and weakly bound protein complexes; (c) it can provide information about protein dynamics, flexibility, folding/unfolding transitions, etc. (see Chap. 8). With the completion of the human genome, NMR will also play a major role in the post-genome era in areas such as structural genomics, proteomics, and metabolomics.

The outline for NMR-based structure determination (Wüthrich 1986) shown in Fig. 7.1 includes three stages: (1) sample preparation, NMR experiments, and data processing; (2) sequence-specific assignment, NOESY assignment, and assignment of other conformational restraints such as J couplings, hydrogen bonds, and dipolar couplings; and (3) structure calculation and structure refinement. One starts with a well-behaved sample (protein, nucleic acid, etc.) and performs a suite of NMR experiments designed for resonance assignment and structural analysis (Chap. 5). Four important parameters are generated for structure calculations: (a) chemical shifts, which provide mostly secondary structural information for proteins; (b) J coupling constants, which provide geometric angles within molecules; (c) nuclear Overhauser effects (NOEs), which provide 1H–1H distances within 5 Å and are considered the most important data because they are especially rich in tertiary structural information; and (d) RDCs, which provide valuable structural restraints that are complementary to the NOE data, since they carry long-range structural information (>5 Å), whereas the NOE data are restricted to <5 Å. Each parameter is first described briefly, and a protocol is then used to describe in detail how a structure is calculated and how each parameter is implemented during the structure calculation.

Fig. 7.1 Strategy for NMR-based structure determination

Key questions to be addressed in the current chapter include the following:

  1. What types of structural information can be obtained based on the results of the resonance assignments?
  2. What are the methods currently used for structure calculation from NMR data?
  3. How are J coupling constants used in the structure determination of a protein?
  4. What other nuclear interactions can be utilized for structure determination?
  5. How are the NMR parameters utilized in structure calculation?
  6. What are the strategies to carry out the calculation?
  7. How is the structural quality analyzed?
  8. How are the precision and accuracy of the calculated structures determined?
  9. What is the role of iterative NOE analysis during the structure calculation?
  10. Step-by-step illustration of structural calculation using a typical XPLOR protocol.

7.2 NMR Structure Calculation Methods

To date, the majority of structures characterized by NMR spectroscopy have been obtained using distance and orientational restraints derived from measurements of NOEs, J coupling constants, RDCs, and chemical shifts. The calculation of a 3D structure is usually formulated as a minimization problem for a target function that measures the agreement between a conformation and the given set of restraints. Of the algorithms developed over the past three decades, four are widely used (Table 7.1): (a) metric matrix distance geometry, represented by the DG-II protocol; (b) the variable target function method, represented by the DIANA protocol; (c) Cartesian space or torsion angle space restrained molecular dynamics (rMD), represented by the XPLOR (or CNS) protocol; and (d) torsion angle dynamics, represented by the DYANA protocol. These methods generate and refine biomolecular structures by searching globally for an ensemble of molecular structures that fit the experimentally measured restraints within the range of experimental error. Distance geometry and rMD (simulated annealing) calculations are discussed in this section.

Table 7.1 Structure calculation methods and programs

7.2.1 Distance Geometry

One approach is to generate structures with the distance and orientational restraints derived from NOEs and J coupling constants using a metric matrix or dihedral angle space distance algorithm. The metric matrix method utilizes all possible interatomic distances as restraints including the known distances from covalent bonds and experimentally estimated distances from NOE data to generate an n-dimensional matrix for a molecule with n atoms (Crippen 1981; Braun 1987). For the remaining distances, upper and lower bounds are chosen and altered until no further alterations can be made according to the triangle inequalities:

$$ \begin{array}{l} {u}_{ij} \leq {u}_{ik} + {u}_{kj} \\ {l}_{ij} \geq {l}_{ik} - {u}_{kj} \end{array} $$
(7.1)

in which i, j, k are the three atoms defining a triangle, and u and l are the upper and lower bounds between any two given points of the triangle. An ensemble of structures is generated from randomly selected distances within the boundary conditions.
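The bound-smoothing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DG-II implementation: the function name `smooth_bounds` and the three-atom example are hypothetical, and the lower-bound update uses the form commonly given in the distance geometry literature, \( l_{ij} \geq l_{ik} - u_{kj} \).

```python
def smooth_bounds(upper, lower):
    """Tighten symmetric n x n bound matrices (lists of lists, in Å) in place.

    Hypothetical helper for illustration only.
    """
    n = len(upper)
    # Pass 1: lower the upper bounds using u_ij <= u_ik + u_kj.
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                cand = upper[i][k] + upper[k][j]
                if cand < upper[i][j]:
                    upper[i][j] = upper[j][i] = cand
    # Pass 2: raise the lower bounds using l_ij >= l_ik - u_kj
    # (checked for both orderings of the pair i, j).
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                cand = max(lower[i][k] - upper[k][j],
                           lower[j][k] - upper[k][i])
                if cand > lower[i][j]:
                    lower[i][j] = lower[j][i] = cand
    return upper, lower


# Three atoms: 0-1 fixed at 1.0 Å, 1-2 fixed at 4.0 Å, 0-2 initially unknown.
upper = [[0.0, 1.0, 99.0], [1.0, 0.0, 4.0], [99.0, 4.0, 0.0]]
lower = [[0.0, 1.0, 0.0], [1.0, 0.0, 4.0], [0.0, 4.0, 0.0]]
smooth_bounds(upper, lower)
print(lower[0][2], upper[0][2])
```

In this example the unknown 0-2 distance ends up bracketed by the two known distances; trial distances for an initial structure would then be drawn at random from within such smoothed bounds.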

The initial structures obtained by the randomization are further refined by minimizing a penalty function V (or potential function) such as:

$$ V = \sum\limits_{i > j} \left\{ \begin{array}{ll} k{(r_{ij}^2 - l_{ij}^2)}^2 & {\hbox{if}}\quad {r}_{ij} < {l}_{ij} \\ k{(r_{ij}^2 - u_{ij}^2)}^2 & {\hbox{if}}\quad {r}_{ij} > {u}_{ij} \\ 0 & {\hbox{if}}\quad {l}_{ij} \leq {r}_{ij} \leq {u}_{ij} \end{array} \right. $$
(7.2)

in which \( r_{ij} \) is the distance between atoms i and j, k is a weighting factor, and u and l are defined as in (7.1). The true global minimum is found only if V = 0, meaning that all restraints are satisfied. However, in practice, V is always greater than zero because of the insufficient number of restraints available. Function V is also called a target function. The minimization is achieved by comparing the interproton distances of the calculated structures with the distances within the chosen boundaries. The improved distance geometry uses dihedral angles rather than Cartesian coordinates to fold protein structures based on the short-range restraints and then expands the calculation to eventually include all restraints. These distance geometry methods (e.g., the DIANA program) have played an important role in the determination of protein structures by solution NMR spectroscopy.
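As a hedged illustration of how such a penalty function can be evaluated, the sketch below sums a penalty over atom pairs, charging a pair only when its distance falls below the lower bound or above the upper bound. The function name `dg_penalty`, the dictionary layout of the bounds, and the coordinates are assumptions made for this example.

```python
import math


def dg_penalty(coords, bounds, k=1.0):
    """Distance-geometry style penalty V for a set of coordinates.

    coords: list of (x, y, z) tuples in Å.
    bounds: dict mapping an atom pair (i, j) to (lower, upper) in Å.
    k:      weighting factor.
    """
    v = 0.0
    for (i, j), (lo, up) in bounds.items():
        r = math.dist(coords[i], coords[j])
        if r < lo:
            v += k * (r ** 2 - lo ** 2) ** 2   # below the lower bound
        elif r > up:
            v += k * (r ** 2 - up ** 2) ** 2   # above the upper bound
        # within [lo, up]: no contribution
    return v


coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(dg_penalty(coords, {(0, 1): (1.8, 2.5)}))  # violated upper bound
print(dg_penalty(coords, {(0, 1): (1.8, 5.0)}))  # satisfied restraint
```

Because the penalty is written in terms of squared distances, it grows steeply with the size of a violation, which is one reason a weighting factor k is needed to balance it against other terms.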

7.2.2 Restrained Molecular Dynamics

Restrained molecular dynamics methods (e.g., the programs CNS and XPLOR) calculate the structures using NMR experimental restraints and energy minimization with potential energy functions similar to the above restrained potential energy function. The potential energy (or target function) is calculated for an array of initial atomic coordinates based on a series of potential energy functions (van Gunsteren 1993):

$$ {{V}_{\rm{tot}}} = {{V}_{\rm{classic}}} + {{V}_{\rm{NOE}}} + {{V}_{{J\;{\rm{coupling}}}}} + {{V}_{{{\rm{H}}\;{\rm{bond}}}}} + {{V}_{\rm{dipolar}}} + {{V}_{\rm{cs}}} + {{V}_{\rm{other}}} $$
(7.3)

in which the term \( {V}_{\rm{classic}} \) is the potential function from the classic energy of the molecule, which contains \( {{V}_{\rm{bond}}} + {{V}_{\rm{angle}}} + {{V}_{\rm{dihedral}}} + {{V}_{{{\rm{van}}\;{\rm{der}}\;{\rm{Waals}}}}} + {{V}_{\rm{electrostatic}}} \). The rest of (7.3) takes the NMR data in terms of distances derived from NOEs, torsion angles from J coupling constants, molecular bond orientation restraints from residual dipolar couplings, chemical shifts, and other restraints such as disulfide bridges, hydrogen bonding, and planarity. Although torsion angles and residual dipolar coupling restraints are sometimes not used, NOE distance restraints are always used in the rMD calculations. Among several different functions used to characterize the potential energy, a flat-well potential is commonly used, which consists of the energy contributions of NOE distance violations relative to the lower and upper distance bounds (Clore et al. 1986):

$$ {V}_{\rm{NOE}} = \sum\limits_i^{{\rm{all}}\;{\rm{NOEs}}} {V}_{{\rm{NOE}}i} = \sum\limits_i^{{\rm{all}}\;{\rm{NOEs}}} \left\{ \begin{array}{ll} {k}_{\rm{NOE}}{({r}_i - {r}_u)}^2 & {\hbox{if}}\quad {r}_i > {r}_u \\ {k}_{\rm{NOE}}{({r}_i - {r}_l)}^2 & {\hbox{if}}\quad {r}_i < {r}_l \\ 0 & {\hbox{if}}\quad {r}_l \leq {r}_i \leq {r}_u \end{array} \right. $$
(7.4)

in which \( {k}_{\rm{NOE}} \) is the force constant of the NOE potential function (also called the target function), \( r_l \) and \( r_u \) are the lower and upper distance bounds assigned to each NOE, respectively, and \( r_i \) is the interproton distance for each proton pair in the generated structure. If precise interproton distances (instead of a range of distances) are obtained from NOESY spectra with different mixing times, the NOE potential function can be described using a biharmonic potential:

$$ {V}_{\rm{NOE}} = \sum\limits_i^{{\rm{all}}\;{\rm{NOEs}}} \left\{ \begin{array}{ll} {C}_1{({r}_i - {r}_0)}^2 & {\hbox{if}}\quad {r}_i > {r}_0 \\ {C}_2{({r}_i - {r}_0)}^2 & {\hbox{if}}\quad {r}_i < {r}_0 \end{array} \right. $$
(7.5)

in which \( C_1 \) and \( C_2 \) are the force constants, which are temperature-dependent, and \( r_i \) and \( r_0 \) are the calculated and experimental distances, respectively.
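The two NOE potentials can be compared with a short Python sketch. It is an illustration only: the function names and default force constants are hypothetical, and the biharmonic form assumes that both branches are measured from the target distance \( r_0 \).

```python
def flat_well(r, r_lower, r_upper, k_noe=1.0):
    """Flat-well NOE energy for one restraint (distances in Å)."""
    if r > r_upper:
        return k_noe * (r - r_upper) ** 2
    if r < r_lower:
        return k_noe * (r - r_lower) ** 2
    return 0.0  # inside the allowed range: no penalty


def biharmonic(r, r0, c1=1.0, c2=1.0):
    """Biharmonic NOE energy: separate force constants above and below r0."""
    c = c1 if r > r0 else c2
    return c * (r - r0) ** 2


def total_noe_energy(distances, bounds, k_noe=1.0):
    """Sum the flat-well term over all NOEs; bounds is a list of (lower, upper)."""
    return sum(flat_well(r, lo, up, k_noe) for r, (lo, up) in zip(distances, bounds))


print(total_noe_energy([3.0, 6.0], [(1.8, 5.0), (1.8, 5.0)], k_noe=2.0))
```

The flat-well form tolerates any distance inside the bounds, which suits qualitative NOE categories, whereas the biharmonic form pulls every distance toward a single target value and is therefore appropriate only when that value is known precisely.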

An alternative method to calculate the NOE potential energy is to compare the calculated NOE intensities from the structure to the experimental NOE intensities at any given step during an rMD calculation using the NOE potential function (Brünger 1992a, 1992b):

$$ {{V}_{\rm{NOE}}} = k{{({{a}_{{{ \exp }}}} - {{a}_{\rm{cal}}})}^2} $$
(7.6)

in which \( a_{\exp} \) and \( a_{\rm{cal}} \) are, respectively, matrices of experimental NOE cross-peak intensities used for the calculation and calculated intensities from the structure obtained by the rMD simulation.

The energy barriers between local minima are more easily overcome in rMD because molecular dynamics is used in the energy minimization, which makes the method less sensitive to the initial structures (Allen and Tildesley 1987). A molecular dynamics simulation is performed by solving Newton’s equations of motion using the forces derived from the gradient of the potential energy of the macromolecular structure. A minimum-energy structure satisfies the condition that the first derivative of the potential energy with respect to the coordinates of each atom is zero. From Newton’s law, the force on an individual atom can be written as:

$$ F = ma = - \frac{{{\text{d}}V}}{{{\hbox{d}}r}} $$
(7.7)

or

$$ - \frac{{{\text{d}}V}}{{{\hbox{d}}r}} = m\frac{{{{\text{d}}^2}r}}{{{\hbox{d}}{{t}^2}}} $$
(7.8)

in which m is mass of the atom, a is the acceleration, V is the potential energy, t is time, and r the coordinates of the atom. The equation of motion is solved by numerical integration algorithms, and the trajectory for each atom as a function of time is calculated.
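One widely used numerical integration scheme is the velocity Verlet algorithm. The sketch below applies it to a single particle in one dimension; it is a toy example under stated assumptions (a harmonic potential, reduced units, and a hypothetical function name), not the integrator of any particular structure calculation program.

```python
def velocity_verlet(r, v, force, mass, dt, n_steps):
    """Integrate m * d2r/dt2 = F(r) = -dV/dr for one particle in 1D.

    force: callable returning the force at position r.
    Returns the list of positions and the final velocity.
    """
    trajectory = [r]
    f = force(r)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        r = r + dt * v_half                # full-step position update
        f = force(r)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(r)
    return trajectory, v


# Harmonic well V = 0.5 * r**2 (reduced units), so F = -r.
positions, v_end = velocity_verlet(1.0, 0.0, lambda r: -r, 1.0, 0.01, 628)
energy = 0.5 * v_end ** 2 + 0.5 * positions[-1] ** 2
print(positions[-1], energy)
```

After roughly one oscillation period the particle returns close to its starting position and the total energy stays near its initial value of 0.5, which is the kind of stability check used to judge whether a time step is small enough.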

In order to maintain an accurate and stable simulation, the time step should be kept sufficiently shorter than the period of the fastest local motion of the molecule. Typically, the time step chosen in the simulation is in the range of femtoseconds (\( 10^{-15} \) s) for simulations on a picosecond (\( 10^{-12} \) s) timescale. During the simulation, energy barriers of the system (whose amplitude approximately equals kT) are crossed by raising the temperature of the system high enough to increase the kinetic energy, so that a global energy minimum is located by balancing both the classical energy terms and the energy terms fitted to the experimental restraints. In the first stage of the calculation, an ensemble of initial structures is selected by randomization. The initial structures must gain kinetic energy, which is commonly provided by increasing the temperature of the simulated system, to move away from their local energy minima and then pass over higher energy barriers. With the higher energy, the system samples a greater range of conformational space. The system is then slowly cooled down to room temperature. During the cooling process, the system energy is minimized over the potential energy surface to search for stable structures at low temperature. The cycles of heating and cooling are repeated until an ensemble of stable structures with an acceptable penalty (or violations) is eventually determined.
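The heating and cooling cycle can be pictured with a small Python sketch of a cooling schedule. It is a simplified illustration: the temperatures, the number of stages, the linear spacing, and the function names are assumptions for this example, and velocity rescaling is only one of several ways a simulation can be brought to a target temperature.

```python
import math


def cooling_schedule(t_hot, t_final, n_stages):
    """Linearly spaced temperatures (K) from t_hot down to t_final, inclusive."""
    step = (t_hot - t_final) / (n_stages - 1)
    return [t_hot - i * step for i in range(n_stages)]


def rescale_velocities(velocities, t_current, t_target):
    """Scale velocities so that the kinetic energy corresponds to t_target.

    Kinetic energy is proportional to temperature and to v**2, so the
    scaling factor is sqrt(t_target / t_current).
    """
    scale = math.sqrt(t_target / t_current)
    return [v * scale for v in velocities]


# Hypothetical schedule: cool from 1000 K to 300 K in eight stages.
for temperature in cooling_schedule(1000.0, 300.0, 8):
    print(temperature)
```

In an actual annealing run, a block of dynamics steps would be carried out at each temperature of the schedule before moving on to the next, cooler stage.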

A typical procedure (Güntert 1998, 2003; de Alba and Tjandra 2002; Lipsitz and Tjandra 2004) for structure calculation includes (a) a stage of randomization in which a set of initial structures is generated with the idealized covalent geometry restraints such as bond lengths, bond angles, dihedral angles, and improper torsions; (b) global folding in which a variety of energy terms with both the geometry restraints and experimentally obtained distance and torsion restraints are used to obtain folded structures; and (c) refinement, which utilizes the same energy terms as in the previous stage but with a smaller step size (typically 0.5 fs) for ps rMD processes to refine the structures generated in the previous stage. The refinement can also involve RDCs, in which case the structures are refined against the observed RDCs by slowly increasing the dipolar force constants and simultaneously refining the principal components of the alignment tensor.

7.3 NMR Parameters for Structure Calculation

7.3.1 Chemical Shifts

In principle, chemical shifts of NMR-active nuclei such as 1H, 15N, and 13C are dictated by the structural and chemical environment of the atoms (see Chap. 1). Vigorous efforts have been made to deduce protein structure from chemical shifts. Chemical shift-based secondary structural prediction has met with some success. In particular, the deviations of 13Cα (and, to some extent, 13Cβ) chemical shifts from their random coil values can be well correlated with the α-helix or β-sheet conformations: 13Cα chemical shifts larger than the random coil values tend to occur for helical residues, whereas the opposite is observed for β-sheet residues (Spera and Bax 1991; de Dios et al. 1993; Wishart et al. 1991, 1992). A good correlation was also observed for proton shifts with secondary structures: 1Hα shifts smaller than the random coil values tend to occur for helical residues, whereas the opposite is observed for β-sheet residues. Although the information is useful for tertiary structure calculations (Luginbühl et al. 1995; Kuszewski et al. 1995a, b), it is much more valuable for the initial secondary structural analysis in combination with NOE data (see below).

Methods have been developed to predict the dihedral angles using backbone chemical shifts, such as the programs TALOS (torsion angle likelihood obtained from shift and sequence similarity, Cornilescu et al. 1999) and SHIFTOR (Zhang 2001). The prediction is based on the observation that similar amino acid sequences with similar backbone chemical shifts have similar backbone torsion angles. First, TALOS breaks the sequence of a target protein into overlapping amino acid triplets. Then, for each triplet, the program searches its database, which contains proteins with known chemical shifts (1Hα, 13Cα, 13Cβ, 13C′, and 15N) and high-resolution X-ray crystal structures, to compare the chemical shift and sequence similarity. The torsion angles for the central residue from the best ten matches are chosen as the predicted torsion angles for the residue, which are used as backbone dihedral angle restraints in the structure calculation. Typically, TALOS can predict the dihedral angles for ~70% of the residues within ±20°. Incorrect predictions can be removed on the basis of their inconsistency with other types of restraints during the structure calculation.
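The first step of this procedure, splitting a sequence into overlapping triplets, is simple enough to sketch. The function below is a hypothetical illustration and is not taken from TALOS; it pairs each triplet with the 1-based number of its central residue, which is the residue whose torsion angles would be predicted.

```python
def overlapping_triplets(sequence):
    """Return (central_residue_number, triplet) for every overlapping triplet.

    Residue numbers are 1-based; the first and last residues have no
    complete triplet centered on them and are therefore skipped.
    """
    return [(i + 2, sequence[i:i + 3]) for i in range(len(sequence) - 2)]


print(overlapping_triplets("MKVLA"))
```

For a five-residue sequence this yields three triplets, centered on residues 2, 3, and 4, so no prediction is obtained for the two terminal residues.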

7.3.2 J Coupling Constants

J coupling constants are derived from the scalar interactions between atoms. They provide geometric information between atoms in a molecule. The most useful and obtainable coupling constants are vicinal scalar coupling constants, \( {}^3J \), between atoms separated from each other by three covalent bonds. Their relation to the dihedral angle θ is given by the Karplus equation (Karplus 1959, 1963):

$$ ^3J(\theta ) = A\;{{\cos }^2}\theta + B\;\cos \theta + C $$
(7.9)

in which A, B, and C are coefficients for various types of couplings, and θ is the dihedral angle. Using the Karplus relationship, one can convert the J coupling constants into the dihedral angles, commonly ϕ, ψ, and χ1. The dihedral angles can be determined by best fitting the measured J values to the corresponding values calculated with (7.9) for known protein structures (see Fig. 7.2). These dihedral angles can be used as structural restraints later during calculations (see below). Table 7.2 lists the \( {}^3J \) couplings commonly used for deducing various dihedral angles for proteins.
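As a worked example of (7.9), the Python sketch below evaluates the Karplus relation for the coupling between the amide proton and Hα. The coefficients A = 6.51, B = −1.76, C = 1.60 Hz and the offset θ = ϕ − 60° are a commonly cited parameterization and are an assumption of this example; they are not necessarily the values listed in Table 7.2.

```python
import math


def karplus(theta_deg, a, b, c):
    """Vicinal coupling constant (Hz) for dihedral angle theta (degrees)."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def j_hn_ha(phi_deg, a=6.51, b=-1.76, c=1.60, offset_deg=-60.0):
    """3J(HN, Ha) as a function of the backbone angle phi (degrees).

    The default coefficients and offset are assumed values for illustration.
    """
    return karplus(phi_deg + offset_deg, a, b, c)


print(round(j_hn_ha(-60.0), 2))   # helix-like phi
print(round(j_hn_ha(-120.0), 2))  # strand-like phi
```

With these assumed coefficients, a helix-like ϕ of −60° gives a coupling of about 4.1 Hz and a strand-like ϕ of −120° gives about 9.9 Hz. Note that the relation is not one-to-one: a given J value can correspond to several angles, which is why J-derived angles are applied as ranges or combined with other data.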

Fig. 7.2 Karplus curves describing the relationships of vicinal J coupling constants and torsion angle φ using the constants listed in Table 7.2. The solid line is for the coupling between protons, dotted lines for the heteronuclear couplings. The angle θ = φ + offset.

Table 7.2 Karplus constants

7.3.3 Nuclear Overhauser Effect

NOEs are the most important NMR parameters for structure determination because they provide distance information between pairs of hydrogen atoms separated by less than 5 Å in space, whether the protons are close together (short-range) or far apart (long-range) in the sequence. Whereas the short-range NOEs are valuable for defining secondary structure elements such as α-helices or β-sheets, the long-range NOEs provide crucial tertiary structural information (Wüthrich 1986). The spectral intensity of an NOE (I) is related to the distance r between the proton pair, \( I = f(\tau_c)\langle r^{-6} \rangle \), in which \( f(\tau_c) \) is a function of the rotational correlation time \( \tau_c \) of the molecule. Because of many technical factors such as the highly variable \( \tau_c \) for different molecules at different temperatures and solvent conditions, it is common to use the intensity I (or cross-peak volume) to obtain qualitative distance information. The information is usually grouped into three different distance categories: 1.8−2.5 Å (strong), 1.8−3.5 Å (medium), and 1.8−5.0 Å (weak). Note that the lower bound for all three categories is 1.8 Å, corresponding to the van der Waals repulsion range. The common lower bound reflects the consideration that a weak NOE does not necessarily indicate a long distance (e.g., >4 Å); its intensity may instead have been diminished by chemical exchange or protein motions.
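The grouping of NOE intensities into distance categories can be expressed as a small lookup. In the sketch below, the distance bounds are those given in the text, whereas the intensity cutoffs that separate strong, medium, and weak peaks are hypothetical parameters that would have to be calibrated for each spectrum.

```python
# Distance bounds (lower, upper) in Å for each qualitative NOE category.
NOE_BOUNDS = {
    "strong": (1.8, 2.5),
    "medium": (1.8, 3.5),
    "weak": (1.8, 5.0),
}


def classify_noe(volume, strong_cutoff, medium_cutoff):
    """Assign a category from a cross-peak volume and user-chosen cutoffs."""
    if volume >= strong_cutoff:
        return "strong"
    if volume >= medium_cutoff:
        return "medium"
    return "weak"


def distance_bounds(volume, strong_cutoff, medium_cutoff):
    """Return the (lower, upper) distance bounds in Å for a cross-peak volume."""
    return NOE_BOUNDS[classify_noe(volume, strong_cutoff, medium_cutoff)]


# Hypothetical volumes and cutoffs in arbitrary units.
for volume in (100.0, 20.0, 5.0):
    print(volume, distance_bounds(volume, strong_cutoff=50.0, medium_cutoff=10.0))
```

All three categories share the 1.8 Å lower bound, so a peak that is weak for reasons unrelated to distance is never forced to represent a long distance.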

When performing a NOESY experiment to obtain NOE information, it is important to choose a proper mixing time that is in principle proportional to the distance (NOE intensity). Short mixing times may lead to the loss of weak NOEs that may contain important tertiary structural information. However, long mixing times may induce so-called “spin diffusion,” that is, NOEs indirectly generated by spins in the vicinity >5 Å. The mixing time can be accurately determined by analyzing an NOE build-up curve (Neuhaus and Williamson 1989). However, the build-up curves vary considerably among different spins in the same molecule, such as between methylene protons and methyl protons. A compromise is usually made to suppress spin diffusion and to maintain sufficient NOE intensities based on NOE build-up curves. As a rule of thumb, 80–120 ms is usually used for small- to medium-sized proteins. Sometimes, NOESY experiments with several different mixing times are performed to make sure that the spin-diffusion peaks are not picked.

Several programs are available for NOE analyses such as nmrview and PIPP. These programs can store all the assigned NOEs in a table that can be converted into distance format for structure calculations (see Appendix A). The assigned NOEs can also be plotted as a function of protein sequence to gain information about the protein secondary structural information and topology of tertiary fold in conjunction with chemical shifts and J coupling information (Fig. 7.3).

Fig. 7.3 Experimental restraints for eotaxin-2, including NH exchange, \( {}^3{{J}_{{{{\rm{H}}^{\rm{N}}}{{\rm{H}}^{{\rm{\alpha }}}}}}} \) coupling constants, sequential, short- and medium-range NOEs, and Hα, Cα, and Cβ secondary chemical shifts, along with the secondary structure deduced from the data (reproduced with permission from Mayer and Stone (2000), Copyright © 2000 American Chemical Society). The amino acid sequence and numbering are shown at top. Sequential N–N and α–N NOEs are indicated by black bars; the thickness of the bar represents the strength of the observed NOE. The presence of medium-range N–N and α–N NOEs is indicated by solid lines. Gray bars and dashed lines represent ambiguous assignments. \( {}^3{{J}_{{{{\rm{H}}^{\rm{N}}}{{\rm{H}}^{{\rm{\alpha }}}}}}} \) coupling constants are represented by diamonds corresponding to values of <6 Hz (open), 6–8 Hz (gray), and >8 Hz (black). Residues whose amide protons show protection from exchange with solvent are indicated with filled circles. The chemical shift indices shown for Cα, Cβ, and Hα were calculated according to the method developed by Wishart et al. (1992). The locations of the secondary structure elements identified in the calculated family of structures are shown at the bottom.

7.3.4 Residual Dipolar Couplings

Between the early 1980s and late 1990s, all NMR structures were determined based primarily on NOE data supplemented by J coupling constants and chemical shifts. In the late 1990s, a new class of conformational restraints emerged, which originate from internuclear RDCs in weakly aligned media such as bicelles (Tjandra and Bax 1997; Prestegard et al. 2000; de Alba and Tjandra 2002; Lipsitz and Tjandra 2004). RDCs give information on angles between covalent bonds and on long-range order. The addition of this structural parameter has proven to greatly improve the precision as well as the accuracy of NMR structures.

Although internuclear DD (dipole–dipole) couplings are typically averaged out due to molecular tumbling, RDC occurs when there is a small degree of molecular alignment with the external magnetic field (see Chap. 1). The RDCs are manifested as small, field-dependent changes of the splitting normally caused by one-bond J couplings between directly bound nuclei. With the assumption of an axially symmetric magnetic susceptibility tensor and neglecting the contribution from “dynamic frequency shifts,” the frequency difference Δνobs between the apparent J values at two different magnetic field strengths, \( B_0^1 \) and \( B_0^2 \), is given by (Tjandra et al. 1997):

$$ \Delta {{\nu }^{\rm{obs}}} = \frac{{\hbar {{\gamma }_a}{{\gamma }_b}{{\chi }_a}S}}{{30\pi rkT}}(B_0^2 - B_0^1)(3{{\cos }^2}\theta - 1) $$
(7.10)

in which \( \hbar \) is the reduced Planck constant, k is the Boltzmann constant, T is the temperature in Kelvin, \( \chi_a \) is the axial component of the magnetic susceptibility tensor, S is the order parameter for internal motion, r is the distance between coupled nuclei a and b, and \( \gamma_a \) and \( \gamma_b \) are the gyromagnetic ratios of a and b, respectively. The structural information is contained in the angle θ between the covalent bond formed by two scalar coupled atoms a and b and the main axis of the magnetic susceptibility tensor. It is then straightforward to add an orientational restraint term to the target function of a structure calculation program that measures the deviation between the experimental and calculated values of θ.

An alternative way to obtain RDC orientational restraints without the assumption of an axially symmetric magnetic susceptibility tensor is to obtain the magnitude and relative orientation of the alignment tensor. The value for the RDC (ΔνD) is extracted from the difference between the splittings observed in alignment medium (ΔνA) and in isotropic solution (ΔνJ):

$$ \Delta {{\nu }_{\rm{D}}} = \Delta {{\nu }_{\rm{A}}} - \Delta {{\nu }_{\rm{J}}} $$
(7.11)

The dipolar couplings can be determined using IPAP type experiments (see Chap. 5). As discussed in Chap. 1, the RDC provides orientational information according to (1.63). In order to use RDCs as structural restraints, the magnitudes of the axial and rhombic components of the alignment tensor and their relative orientation with respect to the magnetic field must be determined. The magnitude and rhombicity of the alignment tensor can be obtained by examining the powder pattern distribution of all normalized observed dipolar couplings for the molecule. When structures are calculated, all of the variables can be obtained by fitting the equation with a large number of RDCs. If the structures are accurate, the calculated dipolar couplings of the structures will be in good agreement with the observed RDCs within the experimental error range. Such orientational restraints have been shown to improve the quality of the structures. They are also extremely valuable when calculating protein complex structures.
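The arithmetic of (7.11) and the comparison between observed and calculated RDCs can be illustrated with a short Python sketch. The splitting values are invented for the example, the function names are hypothetical, and the root-mean-square deviation is used here as one simple measure of agreement; it is an assumption of this sketch rather than a criterion prescribed by the text.

```python
import math


def rdc_from_splittings(aligned_hz, isotropic_hz):
    """Residual dipolar coupling (Hz): aligned splitting minus isotropic splitting."""
    return aligned_hz - isotropic_hz


def rdc_rmsd(observed, calculated):
    """Root-mean-square deviation (Hz) between observed and calculated RDCs."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated lists must have equal length")
    squared = [(o - c) ** 2 for o, c in zip(observed, calculated)]
    return math.sqrt(sum(squared) / len(squared))


# Invented one-bond N-H splittings (Hz) for a single residue.
print(rdc_from_splittings(-85.5, -93.0))
# Invented observed and back-calculated RDCs (Hz) for two residues.
print(rdc_rmsd([1.0, 2.0], [1.0, 4.0]))
```

A deviation comparable to the measurement error of the splittings would indicate that the structure accounts for the RDC data about as well as the data allow.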

7.4 Preliminary Secondary Structural Analysis

Prior to structure calculations, it is useful to determine the secondary structure using a combination of chemical shifts, J coupling constants, and NOE data. As mentioned above, 13C chemical shifts are particularly indicative of α-helices and β-sheets. Three-bond \( {{J}_{{{\rm{NH}} - {\rm{H\alpha }}}}} \) coupling constants are often small for helical residues (<5 Hz), but large for β-sheet residues (>8 Hz). Regular secondary structure elements can also be easily identified from sequential NOEs, as each type of secondary structure element is characterized by a particular pattern of short-range NOEs (\( |r_i - r_j| < 5 \) Å). For instance, α-helices are characterized by a stretch of strong and medium NH(i)–NH(i+1) NOEs, and medium or weak CαH(i)–NH(i+3), CαH(i)–CβH(i+3), and CαH(i)–NH(i+1) NOEs, sometimes supplemented by NH(i)–NH(i+2) and CαH(i)–NH(i+4) NOEs. β-Strands, on the other hand, are characterized by very strong CαH(i)–NH(i+1) NOEs and by the absence of other short-range NOEs involving the NH and CαH protons. β-Sheets can be identified and aligned from interstrand NOEs involving the NH, CαH, and CβH protons. Hydrogen-exchange experiments are also often performed to extract information for slowly exchanging amides that are normally involved in helices or β-sheets. This adds great confidence later when dealing with H-bonds of the backbone amides involved in α-helices or β-sheets.
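The coupling-constant criterion alone can be turned into a rough per-residue classification, as sketched below. The thresholds of 5 Hz and 8 Hz come from the text; the one-letter codes, the function names, and the example values are assumptions of this illustration, and in practice the result would be cross-checked against chemical shifts and NOE patterns.

```python
def classify_by_j(j_hz, helix_max=5.0, strand_min=8.0):
    """Return 'H' (helix-like), 'E' (strand-like), or '-' (undetermined)."""
    if j_hz < helix_max:
        return "H"
    if j_hz > strand_min:
        return "E"
    return "-"


def secondary_structure_string(couplings_hz):
    """Classify a list of per-residue 3J(NH, Ha) values (Hz) into one string."""
    return "".join(classify_by_j(j) for j in couplings_hz)


# Invented coupling constants for five consecutive residues.
print(secondary_structure_string([4.2, 4.5, 6.5, 8.9, 9.3]))
```

Intermediate values between the two thresholds are left undetermined on purpose, since they are compatible with loops, turns, or conformational averaging.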

A computer program written as a shell script can convert all the NOE, J coupling, exchange data, and chemical shifts into a figure for analyzing the secondary structures of proteins. This is illustrated in Fig. 7.3. Note that this approach tends to perform poorly in ill-defined secondary structural regions such as loops. In addition, the exact start and end of helices tend to be less accurately defined, since the pattern of these parameters is similar to that present in turns. Thus, a turn at the end of a helix could be misinterpreted as still being part of the helix. In the case of β-sheets, the definition of the start and end is more accurate, as the alignment is accomplished from interstrand NOEs involving NH and CαH protons.

Another preliminary structural analysis is the stereospecific assignment of diastereotopic protons. There are two major types of diastereotopic groups in amino acids: (a) methylene protons in Lys, Arg, etc.; and (b) methyl groups of Val and Leu. If the signals are well resolved, stereospecific assignments of β-methylene protons can be made by a combination of 15N-edited TOCSY and 15N-edited NOESY. Some methylene protons can also be stereospecifically assigned during the course of structure calculations. Stereospecific assignments of Val and Leu methyls can be made experimentally by the partial 13C labeling or fractional deuteration method (Neri et al. 1989; Senn et al. 1989), provided the signals are well resolved. If signals are degenerate or weak, which prevents stereospecific assignment, diastereotopic protons are treated as pseudoatoms, which may result in less well-defined structures. Stereospecific assignments not only provide more accurate distance information, but also provide dihedral angle information including \( \chi_1 \) and \( \chi_2 \) (Powers et al. 1991). Hence, it is important to have as many stereospecific assignments as possible in order to obtain a high-quality structure.

7.5 Tertiary Structure Determination

7.5.1 Computational Strategies

Because proteins typically consist of more than a thousand atoms that are restrained by thousands of experimentally determined NOE restraints in conjunction with stereochemical and steric conditions, it is in general neither feasible to do an exhaustive search of allowed conformations nor to find solutions by interactive model building. In practice, as mentioned in the previous section, the calculation of a molecular structure is performed by minimizing the target function that represents the agreement between a conformation and a set of experimentally derived restraints. In the following section, a step-by-step description of structure calculation is provided using the most widely used XPLOR protocol.

7.5.2 Illustration of Step-by-Step Structure Calculations Using a Typical XPLOR Protocol

General guidance for the rMD protocol is given in Table 7.3. A complete protocol for protein structural calculations using simulated annealing XPLOR program (sa.inp) is provided in Appendix B. The file names in bold require modifications for specific protein structure determination and generally include the different input files such as distance restraints, PDB coordinates, etc. In the protocol, readers are referred to specific remarks on important lines such as “read the PSF file and initial structure,” which will help in understanding the protocol.

Table 7.3 Structure calculation protocol using rMD

7.5.2.1 Preparation of Input Files

  1. Example of NOE table. All the assigned NOEs in a peak-pick table generated by programs such as PIPP or NMRPipe can be converted into a distance restraint table using a shell script. An example of an XPLOR distance restraint table can be found in Appendix C.

  2. Example of dihedral angle restraint table. Dihedral angles derived from J coupling constants can be assembled into the format for the XPLOR program (Appendix D).

  3. Example of chemical shift restraint table. The carbon chemical shifts of Cα and Cβ can be formatted for the XPLOR program (Appendix E).

  4. Example of H-bond table. Although NMR experiments have been developed to measure H-bonds directly, most H-bond restraints are still derived indirectly from amide exchange experiments. These H-bond restraints are normally used for structure refinement after the initial structure is calculated. The H-bond input table is shown in Appendix F.

7.5.2.2 Preparation of Initial Random-Coil Coordinates and Geometric File

  1. Input file to generate random-coil coordinates based on the protein sequence (Appendix G).

  2. Input file to generate geometric PSF file (Appendix H). This file contains information on the molecular bonds, angles, peptide planes, etc. present in the structure (i.e., how the atoms are connected together).

7.5.2.3 Randomization

In the initial stage of the calculation, an array of random (or semi-random) initial structures is generated based on covalent geometry restraints including bond lengths, bond angles, dihedral angles, and improper torsions. After 10 ps of rMD is carried out at a temperature of 1,000 K, a total of 50–100 initial structures are obtained, which will be used for the structure calculation with experimental restraints in the next step of the calculation. The energy of the randomized structures is minimized by 500 cycles of Powell minimization (Brooks et al. 1983) against the bond length, bond angle, dihedral angle, and improper torsion terms.

7.5.2.4 First-Round Structure Calculation: Global Folding

After the starting structures are obtained and all other files in bold in sa.inp are prepared, one can start the first-round structure calculations. On a UNIX-based SGI workstation on which XPLOR or CNS is installed, simply type “XPLOR < sa.inp > sa.out &” to initiate the calculation process. The detailed process and output parameters are all contained in the sa.out file. The calculation often terminates at the beginning because of errors in the input files, nomenclature, metal coordination, etc. These errors are usually reported in the sa.out file, and readers are referred to the XPLOR or CNS manual for the specific file formats. PDB coordinates of the calculated structures are stored in the working directory during the calculation for visualization and analysis.

The starting structures are first subjected to 5 ps of rMD at 2,000 K with a step size of 5 fs, using force terms for the covalent geometry (bond lengths, bond angles, dihedral angles, and improper torsions), the van der Waals (Lennard–Jones) interactions, and the experimental NOE and J coupling restraints. The NOE restraints are used with a force constant of \( {{k}_{{\rm{NOE}}}} \) = 30–50 kcal mol⁻¹ Å⁻², whereas the torsion angle restraints are usually applied with a relatively weak force constant, \( {{k}_{{\rm{dih}}}} \) = 5–10 kcal mol⁻¹ rad⁻². The soft repulsive van der Waals (Lennard–Jones) radii are scaled by a factor of 0.9. In the next step, the system is slowly cooled to 300 K by decreasing the temperature by 50 K per cycle over 34 cycles of rMD (0.44 ps per cycle; 0.44 ps × 34 ≈ 15 ps) with a step size of 5 fs. During the cooling, the van der Waals radii are reduced from 90 to 80 % of their true values, \( {{k}_{{\rm{NOE}}}} \) is gradually increased from 2 to 30 kcal mol⁻¹ Å⁻², and \( {{k}_{{\rm{dih}}}} \) is set to 200 kcal mol⁻¹ rad⁻². The last step of the first-round calculation consists of 500 cycles of Powell energy minimization. The above procedure is looped for 100 cycles.
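The arithmetic of the cooling stage can be checked with a short Python sketch. It tabulates the temperature, the van der Waals radius scale, and the NOE force constant for each cooling cycle using the values quoted above. The linear ramps for the radius scale and the force constant are an assumption made here for simplicity (the text says only that the values are changed gradually), and the function and key names are hypothetical; this is not the XPLOR protocol itself.

```python
def cooling_schedule(t_start=2000.0, t_end=300.0, t_step=50.0,
                     radius_start=0.9, radius_end=0.8,
                     k_noe_start=2.0, k_noe_end=30.0):
    """Tabulate per-cycle parameters for the slow-cooling stage.

    The linear interpolation of radius_scale and k_noe is an assumption.
    """
    n_cycles = int(round((t_start - t_end) / t_step))
    schedule = []
    for cycle in range(1, n_cycles + 1):
        fraction = cycle / n_cycles
        schedule.append({
            "cycle": cycle,
            "temperature": t_start - cycle * t_step,  # K
            "radius_scale": radius_start + fraction * (radius_end - radius_start),
            "k_noe": k_noe_start + fraction * (k_noe_end - k_noe_start),
        })
    return schedule


schedule = cooling_schedule()
steps_per_cycle = round(0.44 / 0.005)   # 0.44 ps per cycle with a 5 fs step
total_time_ps = len(schedule) * 0.44    # total cooling time in ps
```

With the default values this gives 34 cycles of 88 steps each and a total cooling time of 14.96 ps, consistent with the approximately 15 ps quoted in the text.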

Although one will use as many NOEs as possible for the structure calculation, the interresidue distance restraints play a more important role in the calculation. In order to obtain a high quality structure, more than 10 distance restraints should be used for each residue.

7.5.2.5 NOE Violations and Removal of Incorrect Distance Restraints

Once the first-round structures are calculated using the experimental restraints, mainly NOE data, it is necessary to analyze and validate the derived structures. Because of experimental errors and imperfect restraints, the calculated structures always contain violations of distance and torsion angle restraints. A distance violation is the difference between the interproton distance in the structure and the closest distance bound (upper or lower) defined by the observed NOE intensity. NOE violations are output into a file after the calculation. At this stage, one should carefully examine the violations that appear in a large number of the structures. These consistent violations are likely caused by misassignment of NOEs or incorrect NOE volume integration. Finding the consistent violations is not always straightforward because the structure calculation is done by minimizing the potential function over all restraints. As a result, the large violations caused by the incorrect restraints are spread to neighboring regions, leading to a region being consistently violated on a smaller scale, or sometimes to violations too small to be recognized after being distributed over a large number of slightly violated restraints. Removal of the incorrect restraints after the first-round calculation will improve the quality of the structures.
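The definition of a distance violation and the search for consistent violations can be sketched in Python as follows. The restraint identifiers, distance bounds, violation threshold (0.5 Å), and the fraction of structures that counts as "consistent" (50 %) are all hypothetical values chosen for illustration; they are not taken from the XPLOR output format or from the text.

```python
def violation(distance, lower, upper):
    """Distance (in Å) beyond the closest bound; 0.0 when within bounds."""
    if distance > upper:
        return distance - upper
    if distance < lower:
        return lower - distance
    return 0.0


def consistent_violations(ensemble_distances, restraints,
                          threshold=0.5, min_fraction=0.5):
    """Return {restraint_id: fraction of structures violating it}.

    ensemble_distances maps a restraint id to one distance per structure;
    restraints maps the same id to its (lower, upper) bounds.
    Only restraints violated by more than `threshold` in at least
    `min_fraction` of the structures are reported.
    """
    flagged = {}
    for rid, (lower, upper) in restraints.items():
        distances = ensemble_distances[rid]
        n_violated = sum(
            1 for d in distances if violation(d, lower, upper) > threshold
        )
        fraction = n_violated / len(distances)
        if fraction >= min_fraction:
            flagged[rid] = fraction
    return flagged


# Hypothetical restraints (Å) and interproton distances in four structures.
restraints = {"A": (1.8, 2.7), "B": (1.8, 5.0), "C": (1.8, 3.3)}
distances = {
    "A": [2.5, 2.6, 2.9, 2.4],
    "B": [6.1, 5.9, 6.3, 4.8],
    "C": [3.2, 3.5, 3.1, 3.0],
}
flagged = consistent_violations(distances, restraints)
```

In this made-up example only restraint "B" is flagged, since it is violated by more than 0.5 Å in three of the four structures. Because an incorrect restraint can spread small violations over its neighbors, a lower threshold may be needed in practice to spot the affected region.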

7.5.2.6 Iterative Steps for NOE Analysis and Structure Calculations

Because chemical shift degeneracy and small discrepancies between NOESY cross-peak positions and the shifts obtained from resonance assignment make it impossible to assign all NOESY cross-peaks after sequence-specific assignment, only a fraction of the cross-peaks are assigned unambiguously and used for the structure calculation at the initial stage. Even with only 30 % of the final number of NOEs, the generated structures are usually well defined although the resolution is relatively low. These structures are then used to resolve the ambiguous NOESY cross-peaks based on the spatial information of the first-round structures. In order to utilize these NOEs, criteria must be set, such as a chemical shift tolerance range (usually <0.02 ppm) and the corresponding distance between the proton pair in the structures. The newly assigned NOEs are then used as restraints for the next round of structure calculation. It is necessary to carry out several rounds of NOE assignment and structure calculation to assign a majority of the ambiguous NOE cross-peaks.
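The two criteria described above, a chemical shift tolerance and a distance check against the preliminary structures, can be illustrated with a minimal Python sketch. The proton names, chemical shifts, and model distances are invented for the example, and the 5 Å distance cutoff is an assumed value; the 0.02 ppm tolerance is the figure quoted in the text.

```python
def candidate_protons(shift, assignments, tolerance=0.02):
    """Protons whose assigned shift lies within `tolerance` ppm of `shift`."""
    return [name for name, value in assignments.items()
            if abs(value - shift) <= tolerance]


def resolve_peak(peak, assignments, model_distances,
                 tolerance=0.02, max_distance=5.0):
    """Return proton pairs compatible with a 2D cross-peak.

    A pair is kept when both shifts match within the tolerance and the
    interproton distance in the preliminary structure does not exceed
    max_distance (Å). max_distance is an assumed cutoff.
    """
    pairs = []
    for p1 in candidate_protons(peak[0], assignments, tolerance):
        for p2 in candidate_protons(peak[1], assignments, tolerance):
            if p1 == p2:
                continue
            d = model_distances.get(frozenset((p1, p2)))
            if d is not None and d <= max_distance:
                pairs.append((p1, p2))
    return pairs


# Hypothetical assignments (ppm) and distances (Å) from a first-round structure.
assignments = {"A12.HN": 8.21, "L45.HN": 8.22, "V30.HA": 4.10, "K7.HA": 4.35}
model_distances = {
    frozenset(("A12.HN", "V30.HA")): 3.4,
    frozenset(("L45.HN", "V30.HA")): 11.8,
}
peak = (8.215, 4.105)  # cross-peak position (ppm, ppm)
resolved = resolve_peak(peak, assignments, model_distances)
```

Here two amide protons match the first frequency, but only one of them is close enough to V30 Hα in the preliminary structure, so the cross-peak is assigned to that pair. If several pairs survive, the peak remains ambiguous until a later round.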

The above process of assignment/calculation can also be performed automatically. First, the NOEs are listed with all possible assignments for a given chemical shift tolerance. After the first-round calculation, a program such as ARIA (Nilges et al. 1997; Nilges and O’Donoghue 1998; Linge et al. 2001, 2003) uses the output structures to reduce the assignment possibilities by comparing the interproton distance in the structures to that derived from the ambiguous NOE for each assignment possibility. The new distance restraints are tested during the next round of calculation. Usually, multiple restraints are given to a reassigned ambiguous NOE, of which only one will be correct. Therefore, these restraints are given a looser distance range during the structure calculation.

An alternative approach for iterative NOE analysis is back-calculation of NOESY cross-peaks based on the generated structure. Once the folded structure is generated by rMD, the intensities (in terms of volume) of NOESY diagonal and cross-peaks can be calculated for the structures using a relaxation matrix approach that takes spin diffusion into account. During the back-calculation, the relaxation matrix is first defined with the assumption of isotropic motion in the absence of any cross-correlation contribution to the relaxation. The relaxation matrix is then used to calculate theoretical NOESY spectra from the calculated structures. The theoretical NOESY spectrum is compared with the experimental data either manually or automatically to assign additional NOESY cross-peaks, which are used as distance restraints for further structure calculations as described above.

7.5.3 Criteria of Structural Quality

The quality of structures is usually reported in terms of several statistical parameters including the rmsd (root mean square deviation) of the distance and dihedral restraints, the rmsd of idealized covalent geometry, the rmsd of backbone atoms, the rmsd of heavy atoms, the total number of distance violations, and the number of dihedral restraint violations. These criteria provide insights about how consistent the structures are with the experimental restraints. However, these statistical criteria are imperfect when describing the accuracy of the structural calculations. The quality factor is a preferable parameter to describe the consistency of the derived structures with the experimentally determined restraints. The quality factor or Q factor is defined for a type of restraint A as follows (Yip and Case 1989; Nilges et al. 1991; Baleja et al. 1990; Gochin and James 1990):

$$ Q = \frac{{{\text{rms}}({{A}^{\rm{obs}}} - {{A}^{\rm{calc}}})}}{{{\hbox{rms}}({{A}^{\rm{obs}}})}} = \frac{{\sqrt {{\sum\limits_i {{{{(a_i^{\rm{obs}} - a_i^{\rm{calc}})}}^2}} }} }}{{\sqrt {{\sum\limits_i {{{{(a_i^{\rm{obs}})}}^2}} }} }} $$
(7.12)

in which \( {\text{rms}}({{A}^{\rm{obs}}} - {{A}^{\rm{calc}}}) \) is the root mean square of the difference between the observed and calculated values of the restraint, and \( {\text{rms}}({{A}^{\rm{obs}}}) \) is a normalization factor. The restraint can be the NOE intensity (defined by peak volume), J coupling constant, or RDC. As the equation depicts, the Q factor can be used as an indicator of how close the calculated structures are to the actual one for a given set of NMR data.
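Equation (7.12) translates directly into a few lines of Python. The sketch below evaluates the Q factor for one type of restraint from paired lists of observed and calculated values; the function name and the example RDC values are hypothetical.

```python
import math


def q_factor(observed, calculated):
    """Q factor of Eq. (7.12): rms(obs - calc) / rms(obs)."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated must have the same length")
    numerator = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    denominator = sum(o ** 2 for o in observed)
    if denominator == 0:
        raise ValueError("observed values are all zero; Q is undefined")
    # The 1/N factors of the two rms terms cancel in the ratio.
    return math.sqrt(numerator / denominator)


# Hypothetical observed and back-calculated RDCs (Hz).
observed_rdc = [10.0, -5.0, 8.0, -12.0]
calculated_rdc = [9.0, -6.0, 9.0, -11.0]
q = q_factor(observed_rdc, calculated_rdc)
```

For these made-up values the sum of squared differences is 4 and the sum of squared observations is 333, so Q is about 0.11. A structure that reproduces the data exactly gives Q = 0.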

7.5.4 Second-Round Structure Calculation: Structure Refinement

The second-round calculation involves structural refinement by optimizing the calculated structures after the folding stage with small time step rMD processes while simultaneously minimizing the Q factor of the generated structures. Usually, hydrogen bonding restraints observed for slowly exchangeable amide protons of secondary structural elements are included in the target function during the final stage of the refinement since the hydrogen bonding distance cannot be longer than 2.4 Å and the bonding angle must be within ±35° of linearity. If RDC data are available, the refinement is also carried out using an empirical energy term containing dipolar couplings for the target function. Since dipolar couplings provide long-range restraints, they can be used to correct misassigned NOEs, and hence reduce the number of violations and increase the accuracy of the structures.
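The hydrogen-bond geometry limits quoted above can be checked for a candidate donor and acceptor with a small Python sketch. It assumes, as an interpretation not stated explicitly in the text, that the 2.4 Å limit applies to the H···O distance and that the angle is the N–H···O angle measured at the hydrogen; the coordinates and function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def hbond_geometry(n_xyz, h_xyz, o_xyz):
    """Return (H...O distance in Å, N-H...O angle in degrees)."""
    h_to_n = _sub(n_xyz, h_xyz)
    h_to_o = _sub(o_xyz, h_xyz)
    distance = _norm(h_to_o)
    cosine = sum(a * b for a, b in zip(h_to_n, h_to_o)) / (_norm(h_to_n) * distance)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return distance, math.degrees(math.acos(cosine))


def satisfies_hbond(n_xyz, h_xyz, o_xyz, max_distance=2.4, max_deviation=35.0):
    """True when the distance and linearity limits are both met."""
    distance, angle = hbond_geometry(n_xyz, h_xyz, o_xyz)
    return distance <= max_distance and (180.0 - angle) <= max_deviation


# Hypothetical coordinates (Å): a linear N-H...O arrangement and a bent one.
linear_ok = satisfies_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0))
bent_ok = satisfies_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
```

The linear arrangement (2.0 Å, 180°) passes both limits, whereas the bent one (2.0 Å, 90°) fails the linearity criterion.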

7.5.5 Presentation of the NMR Structure

Once the structures are determined from NMR data, it is necessary to display them. In a ribbon representation of the average structure, the secondary structural elements are easily recognized, whereas the deviation of the determined structures is visualized by the backbone superposition of a set of final structures. The flexible and rigid regions of the structures are clearly indicated by the superposition representation. The structures can also be represented by molecular surface or electrostatic potential, which is helpful in studying the binding sites of a complex or the overall shape of a molecule. The detailed molecular structures can also be displayed.

There are many software packages available for displaying molecular structures in both schematic and detailed representations, of which MOLMOL (MOLecule analysis and MOLecule display) and MOLSCRIPT are widely used. The software takes the coordinates of the atoms in a structure file to generate 3D structures in the above representations. In addition to displaying superimposed structures, MOLMOL (http://www.mol.biol.ethz.ch/groups/wuthrich_group/software) can be used to display hydrogen bonds, electrostatic surfaces, and Ramachandran plots. A unique feature of MOLSCRIPT is that the output image can be saved in various formats such as PNG, JPEG, GIF, and many other image formats (http://www.avatar.se/molscript). Midasplus is another program for structural display, which also calculates molecular surfaces and electrostatic potentials, and draws distances between protons (http://www.cgl.ucsf.edu/Outreach/midasplus). Figure 7.4 shows a sample display of a set of structures in ribbon and superimposition representations using MOLMOL.

Fig. 7.4

Solution structure of the cytoplasmic domain of a prototypic integrin αIIbβ3 and binding interface (Vinogradova et al. 2002). (a) A backbone superposition of 20 best structures of αIIbβ3 tail complex. (b) Backbone ribbon display of the average structure of the αIIbβ3 tail complex same as in (a). (c) Zoom-in detailed view of the αIIbβ3 binding interface showing the hydrophobic and electrostatic contacts (reproduced from Vinogradova et al. (2002). Copyright © 2002 Elsevier)

7.5.6 Precision of NMR Structures

The precision of NMR structures is related to the precision of the experimental data. Errors in measurements of NOE, J coupling, and dipolar coupling will affect the precision in the estimation of distance and orientational restraints derived from the data. The precision of the calculated structures is usually presented in terms of the atomic rmsd such as the rmsd of backbone atoms and the rmsd of all heavy atoms. A low rmsd value of the structures indicates that the calculated structures are close to the average structure, which represents a high precision of the structure calculation. A smaller range of the errors in the restraints will produce structures with improved rmsd values. Several factors will contribute errors in the measurements such as low digital resolution in multidimensional experiments, noise level, resonance overlapping, etc. The rmsd of the calculated NOE intensity, J coupling, and chemical shift for the structures compared to the experimental data will also validate the quality of NMR structures as mentioned previously.
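The atomic rmsd of an ensemble about its average structure can be computed as in the following Python sketch. It assumes the structures have already been superimposed (for example on their backbone atoms), since no fitting step is included; the function names and the toy coordinates are hypothetical.

```python
import math


def mean_structure(ensemble):
    """Average coordinates over all models, atom by atom."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation (Å) between two matched atom lists."""
    squared = sum(
        (xa - xb) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for xa, xb in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(ensemble):
    """Mean rmsd of each pre-superimposed model from the average structure."""
    mean = mean_structure(ensemble)
    values = [rmsd(model, mean) for model in ensemble]
    return sum(values) / len(values)


# Two hypothetical two-atom models displaced by 2 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
precision = ensemble_rmsd([model_1, model_2])
```

In this toy case every atom lies 1 Å from the average structure, so the ensemble rmsd is 1.0 Å. Restricting the atom list to backbone atoms or to all heavy atoms gives the two values usually reported.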

In general, an increase in the number of experimental restraints will improve the precision of the calculated structures. However, the precision of the structure determination does not guarantee the accuracy of the NMR structures. For example, if the distances derived from NOEs are scaled by a factor due to incorrect NOE volume measurement, the calculated structures will be significantly different from the structures obtained with the correct distance restraints. Therefore, the accuracy of NMR structures must be examined with additional criteria.

7.5.7 Accuracy of NMR Structures

It is thought that an accurate structure should not have substantial violations in Ramachandran diagrams and covalent bond geometry. Programs have been developed such as PROCHECK (Laskowski et al. 1996) and WHAT_CHECK (Hooft et al. 1996) for checking the values of bond lengths and angles, the appearance of Ramachandran diagrams, the number and scale of violations of experimental restraints, potential energy, etc. Structures with poor scores do not necessarily indicate errors in the structure, but they require attention to locate possible misassigned experimental data. On the other hand, structures with high scores also do not assure the accuracy of the calculation.

As mentioned earlier in this chapter, the quality factor is frequently used to describe the consistency of the generated structures with the experimentally obtained restraints. Accurate NMR structures must possess a small Q factor. Consequently, the Q factor is often minimized during the refinement stage of the structural calculation. Although an ensemble of structures can be obtained with small rmsd and Q factor values, the accuracy of the structures cannot be validated using the restraints that were used to generate the structures. The accuracy of the structures requires cross validation with other criteria, and a free R factor has been used for this purpose (Brünger 1992a, 1992b). The idea is to set aside some portion of the experimental data, which will be used for validation of the accuracy of the NMR structures. The prerequisite is that there must be sufficient restraints to generate the structures after excluding those to be set aside. For example, NMR structures can be calculated and refined using restraints from the measurements of NOE, J coupling, chemical shift, and hydrogen bonding. RDCs can then be used for validation of the accuracy and to further refine the calculated structure. Good agreement of the validated structures with the structures refined using the dipolar couplings will confirm the accuracy of the NMR structures. NOE back-calculation is also a valuable indicator of the accuracy of structures determined using NMR data.

7.6 Protein Complexes

7.6.1 Protein–Protein Complexes

Protein–protein interactions play an essential role at various levels in the information flow associated with biological processes such as gene transcription and translation, cell growth and differentiation, neurotransmission, and immune responses. The interactions lead to changes in shape and dynamics as well as in chemical or physical properties of the proteins involved. Solution NMR spectroscopy provides a powerful tool to characterize these interactions at the atomic level and at near-physiological conditions. With the use of isotopic labeling, structures of many protein complexes in the 40 kDa total molecular mass regime have been determined (Clore and Gronenborn 1998). The development of novel NMR techniques and sample preparation methods has been increasing the accessible mass further for the structural determination of protein complexes. Furthermore, NMR has been utilized to quickly identify the binding sites of the complexes based on the results of chemical shift mapping or hydrogen bonding experiments. Because it is particularly difficult and sometimes impossible to crystallize weakly bound protein complexes (K d > 10⁻⁶ M), the chemical shift mapping method is uniquely suitable to characterize such complexes. The binding surfaces of proteins with molecular mass less than 30 kDa to large target proteins (unlabeled, up to 100 kDa) can be identified by solution NMR in combination with isotopic labeling (Matsuo et al. 1999; Takahashi et al. 2000). As discussed in Chap. 6, structures of small ligands weakly bound to the proteins can be determined by transferred NOE experiments. The structures of peptides or small protein domains of weakly bound protein complexes can also be characterized by NMR techniques, which may be beneficial to the discovery and design of new drugs with high affinity. In addition to the structural investigation of protein complexes, NMR is a unique and powerful technique to study the molecular dynamics involved in protein–protein reorganization.

7.6.2 Protein–Peptide Complexes

The contact surface contributing to interactions of high affinity and specificity generally involves 30 or fewer amino acid residues from each protein of the complex (de Vos et al. 1992; Song and Ni 1998). Frequently, this contact surface is located in a single continuous fragment of one of the proteins, which can be identified by mutation and deletion experiments. Therefore, such fragments can be chemically synthesized in large amounts and studied by 1H NMR experiments owing to their small molecular size (Wüthrich 1986). In the study of protein–peptide complexes, samples of isotope-labeled protein and unlabeled peptide, prepared according to the procedure discussed in Chap. 3, are most commonly used, since the availability of labeled peptide is limited by the expense of chemical synthesis from labeled amino acids and by the difficulty of biosynthesis due to peptide stability problems during expression and purification.

Data collection and resonance assignment for the complex can be carried out in three stages: for labeled protein, for unlabeled peptide, and for the complex. In the first two stages, the protein and peptide are treated as two independent entities. Standard multidimensional heteronuclear experiments can be carried out using the complex sample for resonance assignment, including J coupling measurement, NOE analysis, and RDC measurement. Backbone HN, N, Cα, C′ resonances are assigned using HNCO, HNCA, HN(CA)CO, and HN(CO)CA and aliphatic side-chain resonances using CBCA(CO)NH, CBCANH, 3D or 4D HCCH-TOCSY, and 15N edited TOCSY as described previously. The distance restraints are obtained from 3D 13C or 4D 13C–13C NOESY based on the resonance assignments (Qin et al. 2001).

Questions

  1. Why are small rmsd values of the calculated structure insufficient to describe the accuracy of the structure?

  2. Why is the temperature increased to 2,000 K and then cooled down to 300 K during rMD calculation?

  3. How are the chemical shift indexes used for identifying secondary structural elements?

  4. Why is the iterative NOE analysis important to the structure calculation?

  5. What is the Q factor of the structure calculation? What does it describe?

  6. What are the parameters used for protein structure calculation and how are they obtained?

  7. What kinds of restraints are used for secondary structure determination, and what kinds for tertiary structure determination?

  8. Both NOEs and RDCs are used as restraints for structure calculation. What kinds of structural information do they provide?