Introduction

Macromolecular dynamics are essential for life, and nuclear magnetic resonance (NMR) spectroscopy has been used extensively to characterize these phenomena since the 1950s (Boehr et al. 2006; Kleckner and Foster 2011; McConnell 1958). For the past 15 years, the Carr-Purcell Meiboom-Gill relaxation dispersion (CPMG RD) NMR experiment has afforded advanced NMR labs access to kinetic, thermodynamic, and structural details of molecular dynamics in the crucial μs-ms time window for proteins (Kempf and Loria 2003; Kleckner and Foster 2011; Loria et al. 2008; Palmer et al. 2005), and more recently for RNA (Johnson and Hoogstraten 2008; Kloiber et al. 2011; Ren and Ghose 2011). CPMG RD enables researchers to extract information on certain dynamic processes by measuring the effective transverse relaxation rate of the NMR signal, \( R_{2}^{Eff} \), in response to the application of refocusing pulses applied at frequency ν CPMG . Typically, the dynamic process is interpreted using the simplest description of exchange, the two-state A ↔ B model, which describes the RD curve using 4–5 parameters (Fig. 1).

Fig. 1
figure 1

The two-state exchange model requires 4–5 parameters to describe a single RD curve. a A single NMR probe (e.g., Ala 12NH) is considered to alternate between local structures designated A and B, which have distinct NMR chemical shifts in 1H and/or AX dimensions. Although two signals are shown here, the “minor” B state is not detected directly because its NMR signal is too weak and/or too broad. However, the A ↔ B exchange yields a quantitative effect on the measured RD curve of signal A. b A multiple quantum (MQ) RD curve is described by five parameters in the two-state model, which characterize the structural, thermodynamic, kinetic, and relaxation features of the exchanging states. (1) |Δω H | is the magnitude of 1H chemical shift difference between states A and B (n.b., this is not detected in a single quantum (SQ) heteronuclear experiment). (2) |Δω X | is the magnitude of AX chemical shift difference between states A and B (AX = 13C or 15N). (3) P A  = k B /k ex is the equilibrium population fraction of state A such that P A  + P B  = 1. (4) k ex  = k A  + k B is the total rate of exchange between states A and B. k A and k B are the rates of exchange from A → B and B → A respectively with k A  = (1 − P A )k ex and k B  = P A k ex . (5) \( {R_{ 2}^{ 0} } \) is the transverse relaxation rate in the absence of exchange assuming \( {R_{2}^{0} \approx R_{2A}^{0} \approx R_{2B}^{0} } \) (Ishima and Torchia 2006). The MQ curve shown here is simulated using |Δω H | = 8 Hz, |Δω X | = 201 Hz, k ex  = 1,000/s, P A  = 90%, and \( {R_{ 2}^{ 0} } \) = 10 Hz

Analysis of CPMG RD data is challenged by (1) the non-linear nature of its mathematical formulation, (2) the multitude of parameters required for fitting compared to its relatively featureless appearance, (3) the reality that an RD curve is often under-sampled with noisy data (to spare expensive NMR time), and (4) the fact that fitting accuracy is not validated by mere visual inspection of the fit to the data, nor by merely minimizing the fitted residuals, because some curves are fit equally well with divergent combinations of parameters (Hansen et al. 2008b; Ishima and Torchia 2005, 2006; Kovrigin et al. 2006; Myint and Ishima 2009; Neudecker et al. 2006). To address these limitations, experimentalists often acquire large datasets encompassing multiple magnetic field strengths (Kovrigin et al. 2006) and/or temperatures (Boehr et al. 2010; Mulder et al. 2001; Vallurupalli and Kay 2006) and/or pressures (Korzhnev et al. 2006) and/or quantum coherences (QCs) (Korzhnev et al. 2005), and produce custom software to implement a global analysis assuming two-state exchange (Beach et al. 2005; Boehr et al. 2010; Grey et al. 2003; Hansen et al. 2008a, b; Henzler-Wildman et al. 2007; Jee et al. 2008; Kovrigin, June 30, 2011; Mulder et al. 2002; Namanja et al. 2010) or, less often, three-state exchange (Korzhnev et al. 2007, 2004b; Neudecker et al. 2006). Although these software tools are generally available from their respective authors, their complexity and specificity may overwhelm a novice spectroscopist, and more intuitive software has only begun to emerge (Bieri and Gooley 2011).

Hence, we present the software package GUARDD (Graphical User-friendly Analysis of Relaxation Dispersion Data), which is designed to facilitate scientific investigations by organizing, automating, and enhancing the analytical processes related to CPMG RD data, especially for novice spectroscopists. In addition to its user-friendly interface, GUARDD contains powerful features that engage the user in the fitting process and thus clarify assessment of fitting accuracy. Advanced labs can benefit from using GUARDD to easily manage data and display results, to plan experiments using the comprehensive RD Simulator, and to extend customized functionality, permitted via an open-source code license. Importantly, we have reduced the activation barrier for its use by establishing tutorials, extensive documentation, and an online community of support (http://code.google.com/p/guardd/).

GUARDD software overview

GUARDD is designed for ease of use by implementing a graphical interface for key functions like managing data, performing and assessing fitting tasks, organizing and displaying results, and simulating RD data for experimental design and/or instruction. In the interest of space, only certain features are discussed, whereas comprehensive documentation and a user-friendly tutorial can be obtained via the GUARDD website (http://code.google.com/p/guardd/).

Data: hierarchical data objects are easily managed via graphical interface

Because analyzing RD data in aggregate is necessary for an accurate description of molecular dynamics, GUARDD implements a hierarchical data management system which accommodates a variety of hypothetical two-state exchange models (Fig. 2). The lowest tier is an RD Curve, which describes a series of relaxation rates and uncertainties, each paired with a value of ν CPMG , for a unique NMR signal/dynamic probe (i.e., \( R_{2}^{Eff} (v_{CPMG} ) \) and \( \sigma \left( {R_{2}^{Eff} (v_{CPMG} )} \right) \), where σ designates uncertainty, for one or more values of ν CPMG ). Each Curve is typically loaded with Curves from other signals as a Dataset, which specifies the experimental conditions: magnetic field strength B 0, temperature T, total time of the CPMG period T CPMG , and the nucleus AX and QC probed during that time (SQ = Single Quantum, MQ = Multiple Quantum). For convenience, Curves are loaded using a plain-text file that can easily be prepared using a spreadsheet program. For analysis, each Curve is designated its own relaxation rate, \( R_{2}^{0} \). At the middle tier, one or more Curves are aggregated by signal assignment into a Curveset, which designates the chemical shift differences, |Δω H | and |Δω X |, for that NMR probe. For each Curve, the QC determines the effect of the Curveset’s chemical shift differences (Korzhnev et al. 2005), such that MQ Curves are sensitive to |Δω H | + |Δω X | whereas SQ Curves are only sensitive to |Δω X | and therefore fix Δω H to zero. At the highest tier, one or more Curvesets are aggregated into a Group, which designates signals that share the kinetic parameters P A and k ex at each temperature. Because each Curveset (and therefore Curve) within a Group is constrained to exhibit the same exchange kinetics, the highest tier is useful for segregating independent modes of exchange in the molecule (e.g., domains). Importantly, although GUARDD can auto-generate Curvesets and Groups, the user has complete control over how the data are organized, and is encouraged to create many combinations of Curvesets and Groups to test the validity of different descriptions of molecular dynamics. These essential tasks are easily accomplished using the GUARDD Data Manager (Fig. 3).

Fig. 2
figure 2

To accommodate many descriptions of molecular dynamics, GUARDD organizes the basic unit of RD data, the Curve (C), in a hierarchical manner, allowing for global fitting of multi-temperature, multi-field, multi-coherence data. a At the lowest tier, Curves are loaded in Datasets (DS), which specify experimental conditions during data acquisition of those Curves. For analysis, one or more Curves are aggregated into a Curveset (CS), and one or more Curvesets are aggregated into a Group (G). In this way, fitting parameters required for each Curve are shared, thus enhancing fitting accuracy to gain insight into molecular dynamics. More Curves (C), Datasets (DS), Curvesets (CS) and Groups (G) are shown as smaller icons to indicate that each of these data objects may contain an arbitrary number of lower-tier objects. b An example Group, containing dynamic probes Ile 10δ1 and Leu 22δ1, contains two Curvesets, which each contain three Curves. Fitting this Group using GUARDD tests the hypothesis that these two NMR probes report a common exchange process in the protein. The fitting procedure requires 14 parameters. (1–4) The Group requires P A and k ex at each of two temperatures. (5–8) These Curves are MQ, and thus each Curveset requires specification of both |Δω H | and |Δω C | in ppm, which is converted to rad/s using each Curve’s field strength, B 0. (9–14) Each Curve is designated its own transverse relaxation rate \( {R_{ 2}^{ 0} } \)

Fig. 3
figure 3

The GUARDD Data Manager makes it easy to browse and construct hierarchical fitting objects, which allow the user to test models of molecular dynamics probed by RD. Using this interface, Curves can be added or removed to or from Curvesets, which can be added, removed or copied to or from Groups. For convenience, GUARDD can auto-generate Groups and Curvesets from the all Datasets, or a specified subset, by segregating Curves according to NMR signal assignment. To read these data using an external program, a hierarchical tree of Groups → Curvesets  Curves  \( {R_{2}^{Eff} (\nu_{CPMG} )} \) can be saved into a comprehensive plain-text file

Analysis: sharing fit parameters via two-state exchange at multiple temperatures

In order to fit a wide range of RD data, GUARDD uses the MQ Carver-Richards-Jones all-timescales dispersion equation (Korzhnev et al. 2004a). To accommodate multi-coherence data, MQ Curves allow both |Δω H | and |Δω X | to be optimized during fitting, whereas |Δω H | is fixed to zero for SQ Curves. To accommodate multi-field data, the Curve-specific rad/s value of Δω is obtained from its parent Curveset using \( \Updelta \omega_{Curve}^{{({\text{rad/s}})}} = 2\pi \gamma_{X} B_{0} \Updelta \omega_{Curveset}^{{({\text{ppm}})}} \), where γ X is the gyromagnetic ratio of nucleus X. To accommodate multi-temperature data, there are two fitting options. (A) Designate a Group-specific k ex and P A for each of the N T temperatures, and optimize these 2N T parameters during fitting. After fitting, additional details of the dynamic process can be obtained from activation energies and exchange enthalpies (Boehr et al. 2010; Vallurupalli and Kay 2006) resulting from Arrhenius and van’t Hoff analyses, respectively (Kleckner and Foster 2011; Winzor and Jackson 2006). Briefly, Arrhenius analysis requires both P A and k ex to quantify the temperature-dependence of exchange rate and van’t Hoff analysis requires only P A to quantify temperature-dependence of exchange populations. (B) The second fitting option uses the Arrhenius and van’t Hoff relations as part of the fitting process by optimizing four Group-specific parameters: k ex and P A at one temperature as well as activation energy E AB  = E B  − E A and exchange enthalpy ΔΗ = H B  − H A , which subsequently determine k ex and P A at any temperature. These options are useful because when data are available at more than two temperatures, option (B) requires fewer parameters than option (A) (4 vs. 2N T ), and therefore may be more appropriate for interpreting the RD dataset.

Analysis: grid search and Monte Carlo χ2 maps enhance and assess fitting accuracy

The goal of the fitting procedure is to obtain a set of parameters that accurately describes the trends in the data, which is typically pursued by minimizing the difference between the observed data and that calculated from the fit (Bevington and Robinson 2003; Motulsky and Christopoulos 2003). To accomplish this task, GUARDD invokes the MATLAB fmincon function with the interior-point algorithm (Byrd et al. 2000; though other options are available) to iteratively alter the fitting parameters \( {\vec{p}} \) to minimize the target function \( {\chi^{2} = \sum\limits_{All \, Curves} {\left( {\sum\limits_{{All \, \nu_{CPMG} }} {\left( {\frac{{R_{2,Eff}^{Obs} (\nu_{CPMG} ) - R_{2,Eff}^{Calc} (\nu_{CPMG} ,\vec{p})}}{{\sigma \left( {R_{2,Eff}^{Obs} (\nu_{CPMG} )} \right)}}} \right)^{2} } } \right)} } \), where \( {R_{2,Eff}^{Obs} (\nu_{CPMG} )} \) designates an observed data point with error \( {\sigma \left( {R_{2,Eff}^{Obs} (\nu_{CPMG} )} \right)} \) on a single Curve acquired with conditions B 0, T, QC, AX, and T CPMG ; \( {R_{2,Eff}^{Calc} (\nu_{CPMG} ,\vec{p})} \) designates the calculated point using the Curve conditions and the independent fitting parameters \( {\vec{p}} \) for the Group: P A and k ex for each temperature, |Δω H | and |Δω X | for each Curveset, and \( {R_{2}^{0} } \) for each Curve. Unfortunately, the nonlinear nature of RD phenomena makes the relationship between χ 2 and \( {\vec{p}} \) difficult to predict, and therefore optimization algorithms often “fail” by finding a local minimum of χ 2, which is sensitive to initial fitting conditions, instead of the intended global minimum (Fig. 4a). Hence, to assess the sensitivity of the final fit to initial conditions, GUARDD implements a grid search over a six-dimensional (6D) parameter space encompassing the most sensitive parameters: |Δω H |, |Δω X |, P A , k ex , E AB , and ΔΗ. Each point on this 6D grid is used to set the initial conditions for subsequent optimization. Using GUARDD’s Fit RD window (Fig. 5), the user can specify the limits and number of steps for each dimension of the grid, the selection of which depends Group size, knowledge of the fitting solution, and use of the Arrhenius constraint which enables the E AB and ΔH dimensions (see section "Analysis: sharing fit parameters", option (B)). The completed grid of N grid points contains N grid sets of optimized parameter values, with each set yielding one value of χ 2. Because the grid for each Group is typically large (10–500 points in each of 5–50 dimensions), GUARDD’s Chi2 Map window is designed to make the otherwise-cumbersome navigation relatively easy (Fig. 4, Bottom).

Fig. 4
figure 4

The nonlinear nature of the RD phenomena yields a “rough” χ 2 map that can trap an optimization routine in any of many local minima, instead of the intended global minimum, and GUARDD is designed to address and illustrate these potential pitfalls. ad The χ 2 map is a hypersurface with amplitude χ 2 and one dimension for each independent fitting parameter (e.g., 14 parameters yields a 14D hypersurface). Here, the response of χ 2 to just one parameter, k ex , produces a 2D slice through the hypersurface to illustrate four commonly encountered shapes that pose distinct challenges in obtaining an accurate fit. (Bottom) GUARDD addresses this challenge using a grid search to fit the data using many different initial conditions, thus sampling the χ 2 map. The Chi2 Map window enables the user to navigate each dimension of the map, which helps assess fitting accuracy and select the best fit. Here, two parameters, P A (25°C) and k ex (25°C), most closely correspond to the “Good fit” shape in a, though other parameters may require user-directed optimization, as in c, or may not be defined at all, as in d. To focus the display of this example χ 2 map, only the best 40% of fits are shown, where each black point designates the result of a single fit optimization and the red circle shows the “best” fit currently selected by the user (n.b., each fit can be displayed individually for more careful examination)

Fig. 5
figure 5

For a given Group, the fitting window specifies grid search limits, enables refinement via user-specified initial fitting conditions, and organizes and displays the array of candidate fits. In the lower half of the window, the selected fit is displayed, the user designates which parameters are valid for subsequent analyses, and plain-text notes can be logged for reference and aggregate export

When seeking the best fit for the Group, one must often select one χ 2 minimum out of several candidates (Fig. 4c). This issue is exacerbated by noisy RD data and/or small R ex resulting from exchange near a detection limit of the CPMG RD method. To this end, GUARDD is designed to accommodate three general strategies at the discretion of the user. (1) Select one of the fits, but mark the ill-defined parameters as “Not OK,” thus preventing their mis-interpretation in subsequent analyses. For example, in the fast-exchange regime (k ex  ≫ Δν), k ex is often valid despite that P A and Δω are not independently defined, but are instead convoluted into the quantity Φ ex  = P A P B Δω 2 (Ishima and Torchia 1999; Luz and Meiboom 1963). (2) Alter the Group and re-fit. The user may remove noisy Curves and/or add additional Curvesets to help constrain the values of k ex and P A . (3) Acquire more data and re-fit the new Group. The RD Simulator can help determine optimal conditions of temperature, magnetic field strength, and/or QC for efficient use of spectrometer time (see section “Optimization and education”). Importantly, NMR probes that sample more than two states may not fit well to the two-state exchange model used by GUARDD, and hence relatively poor fits (large χ 2) may reflect this limitation.

As mentioned previously, RD data are often noisy, and this can further widen the variety of fitting solutions surrounding the global minimum of χ 2. Hence, GUARDD implements a well-established Monte Carlo (MC) procedure to bootstrap from the residuals (Bevington and Robinson 2003; Motulsky and Christopoulos 2003). For a given Group, the MC algorithm generates many synthetic Groups, each randomly differing in \( {R_{2}^{Eff} } \) values from the original Group, yet still within their noise, and fits them individually to assess the sensitivity of the final fit to noise in the data. Much like the results of the grid search, the MC results are often large (100 s of fits with 5–50 parameters each), but can be easily navigated using the Chi2 Map window.

The computational time required for these fitting tasks depends on the number of Curves in the Group, where 1, 3, or 10 Curves requires about 5, 15, or 60 s per step using a desktop PC with a 2.6 GHz CPU and 3 Gb of RAM running MATLAB R2011a. Because these tasks are repeated hundreds of times for tens of Groups, GUARDD includes a Batch Task function which allows lengthy computations to be queued for consecutive processing, while incremental progress is automatically saved to the hard drive.

Optimization and education: RD Simulator explores the nature of RD phenomena

GUARDD includes an interactive RD Simulator that models multi-temperature, multi-coherence, multi-field RD data (Fig. 6). The related Kinetic Simulator models the temperature-dependence of population P A and rates k ex , k A , and k B for two-state exchange, provided enthalpy change ΔH and activation energy E AB . These features are designed to serve three important functions.

Fig. 6
figure 6

The RD Simulator can be used to explore the nature of RD phenomena to plan future experiments and to enhance understanding of RD. For example, by modeling the temperature-dependence of two-state exchange, one can identify the limits of detection for a particular NMR probe, thus avoiding uninformative experiments. The blue Curve corresponds to the “Curve Parameters” in the middle of the window, and the surface corresponds to how this Curve would appear at different temperatures, provided the “Exchange kinetics” in the upper panel. Data can be exported, either directly to GUARDD as a Group or to a plain-text file, with an arbitrary number of ν CPMG points and signal to noise ratio

First, they are helpful for selecting appropriate conditions for future experiments, thus optimizing the use of expensive NMR time. For example, if RD data are already available at two temperatures, the simulator can determine what third temperature yields the largest dispersions, which often yield the most accurate and informative fits used to describe the molecular dynamics.

Second, provided conditions for a future experiment, the simulator can identify the quantity and quality of new data required for an informative fit. This is important because if noisy Curves are added to a Group, the fit quality can be compromised. This assessment is performed by creating Groups of original data plus simulated Curves, each with different ν CPMG points and signal to noise ratios (e.g., 10 points at 10% noise, 15 points at 10% noise, 15 points at 5% noise). By comparing fits of these new Groups, the investigator can ensure that a sufficient number spectra and scans are acquired, thus avoiding the “noisy new data” pitfall.

Finally, these features are excellent teaching tools for learning basic principles of RD. Simply by editing fields in a table, the user can explore how the Curve reports on Δω, P A , k ex , and how it is influenced by B 0. Similarly, two-state exchange kinetics can be monitored in response to temperature, ΔH, and E AB . These tools can be used to create custom figures as teaching aids, or can engage a student directly with a “hands-on” approach.

Presentation: massive datasets are easy to visualize for analysis and dissemination

Because analysis of RD data typically yields many results, organization is challenging using a spreadsheet, for example. Hence, GUARDD is designed with versatile presentation features that allow the user to easily explore their results. In the Display RD window (Fig. 7, Top), the user can view the Curves from a Group with any of its candidate fits and residuals. Next, a table of results can be dynamically generated based on the user’s selection criteria from an arbitrary set of RD parameters and Curves (e.g., display only P A and k ex at one temperature, instead of all RD parameters for all Curves for all Groups) (Fig. 7, Bottom). This customized table can be exported to plain-text for publication or external analysis. Finally, the Group Display window allows the user to seamlessly prepare scatter plots and histograms of arbitrary RD parameters (Fig. 8). This is helpful for identifying outlying results, comparing candidate fits, and producing publication-quality figures. Moreover, any figures prepared by GUARDD can easily be altered using MATLAB’s figure editor (e.g., adding and moving text, specifying symbols, colors, and font sizes).

Fig. 7
figure 7

GUARDD includes powerful features to easily view fitting results. (Top) The Display RD window plots the selected Curves from one Group with a selected fit and its residuals. (Bottom) The Results Table window dynamically generates a table encompassing selected RD parameters from selected subsets of the data (e.g., magnetic field strength, quantum coherence, temperature). This table can be exported to a plain-text file for publication and/or external analysis

Fig. 8
figure 8

After fitting, the Group Display window visually organizes results to seek the nature of molecular motions. Here, subsets of Groups can be displayed as scatter-plot correlations between arbitrary RD parameters (e.g., Residue Number, k ex , P A , R ex , k A , α, ΔH). To this end, a set of Groups can be color-coded to designate a hypothetical concerted motion (e.g., protein domain A vs. B) or a particular fitting scenario (e.g., all Ile fit together vs. each Ile fit individually). Here, for two sets of Groups (red and blue), the upper subplot displays k ex along the protein sequence and the bottom subplot displays k ex as a histogram. This can help address the possible presence of multiple dynamic “modes” that each exchange at a distinct rate

Limitations and comparison to alternative software

Although GUARDD inherits desirable features from the MATLAB framework, this dependency imposes certain limitations. First, because MATLAB is an interpreted language, and because its graphical interface uses Java, GUARDD is slower than if it were coded using C or Python, for example. Although this lag is not prohibitive, it can be experienced while drawing the display, while reading or writing large session files, or while fitting data. Interested users may partially alleviate this drawback by using the MATLAB compiler, which generates a single executable that is not subject to runtime interpretation, and/or by programming consecutive fitting tasks to run in parallel. Second, malfunctions in MATLAB may hinder functionality of GUARDD (however, enhancements to MATLAB may imbue enhancements to GUARDD). Third, although MATLAB is a convenient cross-platform solution for dissemination of software, the user must have access to MATLAB and the MATLAB Optimization Toolbox (i.e., GUARDD is not a standalone program).

There are also limitations to the current features of GUARDD. First, the selection of exchange model is restricted to two-state using the all-timescales MQ Carver-Richards-Jones formulation (Korzhnev et al. 2004a). Although these equations are very general, certain cases may benefit from simplifications assuming skewed populations (P B  < P A ; Ishima and Torchia 1999) or assuming fast-exchange (k ex  > Δν; Kempf and Loria 2003; Kleckner and Foster 2011; Luz and Meiboom 1963). Also, more advanced labs may feel limited by the lack of analytical options like three-state exchange models (Neudecker et al. 2006), ZQ or DQ coherences (Korzhnev et al. 2005), pressure-dependence (Korzhnev et al. 2006), Anti-TROSY/TROSY analysis (Hansen et al. 2008a), or temperature-dependence using transition state theory (Boehr et al. 2010; Korzhnev et al. 2004b).

Regarding fitting accuracy, we expect GUARDD will match softwares that implement a grid search (Grey et al. 2003; Ishima and Torchia 2005; Kovrigin et al. 2006; Namanja et al. 2010), but will surpass ones that do not (Bieri and Gooley 2011), due to the ruggedness of RD χ 2 maps (Ishima and Torchia 2005; Kovrigin et al. 2006) and the effectiveness of the grid search procedure (Bevington and Robinson 2003; Motulsky and Christopoulos 2003). Regarding error estimation, the MC bootstrap procedure used by GUARDD is very effective, but not always the most computationally efficient option. For example, depending on the size of the dataset, the jackknife approach (Choy et al. 2005; Mosteller and Tukey 1968, 1977) or the covariance matrix method (Korzhnev et al. 2004b) may be superior, but GUARDD does not currently implement those options.

As detailed in prior sections, GUARDD is distinguished from other RD software for its user-friendly graphical interface, versatile data management and presentation features, and comprehensive simulator. These qualities are important because they fill a well-defined void in the domain of RD analysis software. Furthermore, because our goal is to enhance the analysis of CPMG RD data, limitations to GUARDD can be addressed in future updates.

Discussion and conclusions

Because CPMG RD is widely used, and because current analytical software options are limited, we expect GUARDD will be of broad interest to the NMR community. Especially for novice spectroscopists, the user-friendly interface, tutorials, organizational features, and RD Simulator will enhance and accelerate both knowledge of RD phenomena and analysis of their data. For experienced spectroscopists, the innovative display features will allow them to navigate results with ease, and the RD Simulator will help them plan future experiments with quantitative precision. Although GUARDD can accomplish important analytical procedures from start to finish, we encourage users to provide feedback, to request additional functionality, and to edit and share the source code, as we continue to serve the needs of RD experimentalists.

Acknowledgments

This work is funded by National Institutes of Health grant R01GM077234. The authors are thankful to investigators who shared their RD data, including Z. Wu, P. Vallurupali, F. Mulder, and D. Korzhnev. The authors are thankful for developmental feedback from all Foster Lab members, as well as E. Kovrigin, and D. Kern. Special thanks to A. Simmons for careful review of this manuscript.

Materials and methods

Alternative software for fitting CPMG RD NMR data

Although many labs will supply CPMG RD analysis software upon request, some is already publicly accessible: CPMG Fit (Palmer Lab), http://biochemistry.hs.columbia.edu/labs/palmer/software/cpmgfit.html; CATIA from F. Hansen (Kay Lab), http://pound.med.utoronto.ca/~flemming/catia/; NESSY from M. Bieri (Gooley Lab), http://home.gna.org/nessy/.

Extracting RD data from signal intensity

GUARDD can receive RD Curve data as NMR signal intensities, which is used to calculate relaxation rates \( {R_{2}^{Eff} = - \ln (I(\nu_{CPMG} )/I_{0} )/T_{CPMG} } \), where I(ν CPMG ) is the signal intensity in the 2D spectrum acquired with refocusing frequency ν CPMG , I 0 is the reference signal intensity obtained in the spectrum with no refocusing block, and T CPMG is the duration of the refocusing block. Errors in intensities \( {\sigma (R_{2}^{Eff} )} \)are estimated via standard deviation from repeat measures of I(ν CPMG ).

Modeling MQ relaxation dispersion

The fitting equations for MQ dispersions, which simplify to SQ dispersions provided Δω H  = 0, are as follows using δ = 1/(4ν CPMG ) and n = T CPMG ν CPMG (Korzhnev et al. 2004a):

$$ {R_{2}^{Eff} \left( {\frac{1}{2\delta }} \right) = \Re (\lambda_{1} ) - \frac{\ln (Q)}{4n\delta }} $$
(1a)
$$ {\lambda_{1} = R_{2}^{0} + \frac{1}{2}\left( {k_{ex} - \left( {\frac{1}{2\delta }} \right)\cosh^{ - 1} (D_{ + } \cosh (\eta_{ + } ) - D_{ - } \cos (\eta_{ - } ))} \right)} $$
(1b)
$$ {D_{ \pm } = \frac{1}{2}\left( {\frac{{\Uppsi + 2\Updelta \omega_{X}^{2} }}{{\left( {\Uppsi^{2} + \zeta^{2} } \right)^{1/2} }} \pm 1} \right)} $$
(1c)
$$ {\eta_{ \pm } = \sqrt 2 \delta ((\Uppsi^{2} + \zeta^{2} )^{1/2} \pm \Uppsi )^{1/2} } $$
(1d)
$$ {\Uppsi = (i\Updelta \omega_{H} + (P_{A} - P_{B} )k_{ex} )^{2} - \Updelta \omega_{C}^{2} + 4P_{A} P_{B} k_{ex}^{2} } $$
(1e)
$$ {\zeta = - 2\Updelta \omega_{C} (i\Updelta \omega_{H} + (P_{A}- P_{B} )k_{ex} )} $$
(1f)
$$ {Q = \Re \left( {1 - m_{D}^{2} + m_{D} m_{Z} - m_{Z}^{2} + \left( {\frac{{m_{D} + m_{Z} }}{2}} \right)\left( {\frac{{P_{B} }}{{P_{A} }}} \right)^{1/2} } \right)} $$
(1g)
$$ {m_{D} = \frac{{ik_{ex} (P_{A} P_{B} )^{1/2} }}{{d_{ + } z_{ +} }}\left( {z_{ + } + 2\Updelta \omega_{X} \left( {\frac{{\sin(z_{ + } \delta )}}{{\sin ((d_{ + } + z_{ + } )\delta )}}}\right)} \right)} $$
(1h)
$$ {m_{Z} = \frac{{ik_{ex} (P_{A} P_{B} )^{1/2} }}{{d_{ - } z_{ -} }}\left( {d_{ - } - 2\Updelta \omega_{X} \left( {\frac{{\sin(d_{ - } \delta )}}{{\sin ((d_{ - } + z_{ - } )\delta )}}}\right)} \right)} $$
(1i)
$$ {d_{ \pm } = (\Updelta \omega_{H} + \Updelta \omega_{X} ) \pm ik_{ex} } $$
(1j)
$$ {z_{ \pm } = (\Updelta \omega_{H} - \Updelta \omega_{X} ) \pm ik_{ex} } $$
(1k)

Exchange broadening

The exchange broadening R ex quantifies the height of the dispersion curve and is estimated from the fit as \( {R_{ex} \approx R_{2,Eff}^{Fit} (\nu_{CPMG} \approx 0{\text{Hz}}) - R_{2,Eff}^{Fit} (\nu_{CPMG} \approx 10^{4} {\text{Hz}})} \).

Scaling factor for chemical exchange timescale

The scaling factor α quantifies the timescale of chemical exchange, and is calculated α = d(ln(R ex ))/d(ln(Δω)) (Millet et al. 2000).

Temperature-dependence of exchange kinetics

Provided reasonable fitted exchange rates at multiple temperatures, Arrhenius and van’t Hoff analyses are performed (Winzor and Jackson 2006). Arrhenius analysis requires both P A and k ex to quantify temperature dependence of exchange rate via \( {\ln (k) = \ln (P) + \left( {\frac{ - E}{R}} \right)\left( \frac{1}{T} \right)} \) where k = k A  = (1 − P A )k ex (or k B  = P A k ex ) = kinetic rate of exchange from A → B (or B → A), P = P AB (or P = P BA ) pre-exponential rate, the exchange rate from A → B (or B → A) at infinite temperature, E = E AB (or E = E BA ) = activation energy (≈enthalpy) required to exchange from A → B (or B → A), R = gas constant, and T = absolute temperature. van’t Hoff analysis only requires P A to quantify temperature dependence of exchange populations via \( {\ln (K) = \ln \left( {\frac{{1 - P_{A} }}{{P_{A} }}} \right) = \frac{\Updelta S}{R} + \left( {\frac{ - \Updelta H}{R}} \right)\left( \frac{1}{T} \right)} \)where K = (1 − P A )/P A  = k A /k B  = equilibrium constant for exchange, ΔS = system entropy change from A → B, ΔH = system enthalpy change from A → B. Errors in Arrhenius and van’t Hoff analyses are estimated from errors from MATLAB’s fit routine (provided data at more than two temperatures), or from propagation of relative error from the fitting variables (when limited to data at only two temperatures).

Kinetic simulator

The two-state kinetic simulator itemizes all kinetic parameters of interest: ΔH, ΔS, E AB , P AB , E BA , P BA , k ex (T), P A (T), k A (T), and k B (T) where T is an arbitrary temperature, provided ΔH, E AB , \( {k_{ex}^{0} = k_{ex} (T_{0} )} \), \( {P_{A}^{0} = P_{A} (T_{0} )} \), and T 0. By extension of the Arrhenius and van’t Hoff relations above, this is accomplished as follows:

Using ΔH and P A (T 0), the van’t Hoff relation yields ΔS

$$ {\Updelta S = R\ln \left( {\frac{{1 - P_{A}^{0} }}{{P_{A}^{0} }}} \right) + \frac{\Updelta H}{{T_{0} }}} $$

which, with ΔH, determines P A at any temperature via van’t Hoff

$$ {P_{A} (T) = \left( {1 + \exp \left( {\frac{\Updelta S}{R} - \frac{\Updelta H}{RT}} \right)} \right)^{ - 1} } .$$

Next, using P A and k ex at T 0 determines k A and k B at T 0

$$ {k_{A}^{0} = k_{A} (T_{0} ) = (1 - P_{A}^{0} )k_{ex}^{0} } $$
$$ {k_{B}^{0} = k_{B} (T_{0} ) = P_{A}^{0} k_{ex}^{0} } $$

and using E AB and k A at T 0, the Arrhenius relation yields P AB

$$ {P_{AB} = k_{A}^{0} \exp \left( {\frac{{E_{AB} }}{{RT_{0} }}} \right)} $$

which, with E AB , yields k A at any temperature via Arrhenius

$$ {k_{A} (T) = P_{AB} \exp \left( {\frac{{ - E_{AB} }}{RT}} \right)} .$$

Next, knowledge of P A and k A at any temperature yields k ex at any temperature

$$ {k_{ex} (T) = \frac{{k_{A} \left( T \right)}}{{1 - P_{A} \left( T \right)}}}$$

and therefore k B at any temperature\( {k_{B} (T) = k_{ex} (T) - k_{A} (T)} \). Knowledge of k B at any temperature yields E BA via the Arrhenius relation and selection of any two temperatures T 1 and T 2 (e.g., 280 and 320 K)

$$ {E_{BA} = \frac{{R\ln (k_{B} (T_{1} )/k_{B} (T_{2} ))}}{{1/T_{2} - 1/T_{1} }}} .$$

Finally, using k B (T 0) and E BA , the Arrhenius relation yields P BA

$$ {P_{BA} = k_{B}^{0} \exp \left( {\frac{{E_{BA} }}{{RT_{0} }}} \right)} .$$

Monte Carlo bootstrapping from residuals (Motulsky and Christopoulos 2003)

Residuals are calculated for each observed ν CPMG value in a given dispersion Curve as \( {\varepsilon (\nu_{CPMG} ) = R_{2,Eff}^{Obs} (\nu_{CPMG} ) - R_{2,Eff}^{Calc} (\nu_{CPMG} )} \). The set of residuals is used to create a normal distribution \( {\mathbf{N}}(mean(\vec{\varepsilon }),\text{var} (\vec{\varepsilon })) \) for the Curve. Next, a synthetic dispersion Curve is created using the best fit at each observed ν CPMG value plus a random sample from this normal distribution: \( R_{2,Eff}^{Synth} (\nu_{CPMG} ) = R_{2,Eff}^{Calc} (\nu_{CPMG} ) + {\mathbf{N}}(mean(\vec{\varepsilon }),\text{var} (\vec{\varepsilon })) \). This is repeated for each Curve in the Group such that a synthetic Group is produced. The synthetic Group is fit using initial conditions from the best fit of the actual data. This is repeated 100 times to create 100 synthetic Groups and 100 sets of optimized fit parameters. The error in a given parameter is estimated as the standard deviation of the optimized fit parameter from its 100 element distribution. Errors in subsequent quantities (e.g., k A , k B , ln(k A ), etc.) are estimated using propagation of error assuming all parameters are uncorrelated (zero covariance).