Introduction

Sedimentation velocity (SV) experiments performed in an analytical ultracentrifuge provide information about the composition, size, and anisotropy of colloidal molecules in solution and, for some experimental designs, about their density. They measure the sedimentation and diffusion transport of a colloidal particle in a centrifugal force field, and provide the partial concentration of each solute in a mixture. The observed signal typically contains systematic and stochastic noise contributions. Where possible, systematic noise contributions can be removed mathematically (Demeler 2010), leaving only the stochastic noise in the residuals of a fit. We have developed a number of optimization routines to solve the problem of fitting experimental data in an unbiased approach, and to extract the sedimentation and diffusion coefficients and partial concentrations of mixtures of analytes (Brookes et al. 2006, 2010; Brookes and Demeler 2006, 2007; Demeler et al. 2014; Gorbet et al. 2014). For all of these methods, the ability to recover these parameters is limited by the magnitude of the stochastic noise present in the data. The magnitude of the noise determines the minimum amount of signal that the fitting method needs to be able to resolve. Any signal larger than the noise is not lost in the noise, and the grid must, therefore, be able to resolve differences between grid points that are equal to or slightly smaller than the noise signal. In other words, the underlying model must be able to explain the sedimentation and diffusion transport present in the experimental data with slightly higher resolution than the resolution necessary to account for the magnitude of the stochastic noise. This transport, when performed in a sector-shaped centrifugation cell under ideal solution conditions (constant temperature, absence of pressure dependence, constant rotor speed, and dilute conditions), is described by the Lamm equation L (Eq. 1) (Lamm 1929):

$$\frac{\partial C}{\partial t} = \frac{1}{r}\frac{\partial }{\partial r}\left[ {rD\frac{\partial C}{\partial r} - s\omega^{2} r^{2} C} \right],$$
(1)

where r is the radial distance from the rotor center, s and D are the sedimentation and diffusion coefficients, C is the partial concentration of a solute, and ω is the angular velocity of the rotor. Inspection of Eq. 1 reveals that fitting an experimental dataset consists of adjusting the sedimentation and diffusion coefficients and finding the appropriate concentration C. In the general case, one must allow for the presence of multiple solutes Ci, where i indicates the ith species in a mixture. For non-interacting mixtures of solutes, the general solution for a multi-component mixture with n unknown species is given by:

$$C_{\text{total}} = \mathop \sum \limits_{i = 1}^{n} c_{i} L_{i} (s,D),$$
(2)

where ci is the partial concentration of the ith solute. In the general case, n, ci, and Li are not known and need to be determined with a degenerate fitting approach that does not impose any user bias or prior knowledge upon the solution. Furthermore, a rigorous solution to this problem requires that s and D for each solute are allowed to vary independently, requiring a two-dimensional fitting approach that can account for variable distributions in both sedimentation and diffusion coefficients. Previously, we proposed a two-dimensional spectrum analysis (2DSA) approach to solve this problem (Brookes et al. 2006; Brookes and Cao 2010). 2DSA begins by building a regular two-dimensional grid of sedimentation coefficients in one dimension and frictional ratios in the second dimension. This results in a two-dimensional grid of unique solutes, where each solute is defined by a unique combination of sedimentation and diffusion coefficients. Next, the finite-element solution for the entire experiment is calculated and a full set of scans and radial absorbances is simulated for each individual solute, using the experimental and boundary conditions of the actual experiment (rotor speed, buffer conditions, meniscus position, and bottom of cell). The simulated data points for each unique solute represent a basis vector of a linear combination of all solutes represented by the two-dimensional grid. The optimization problem is solved by forming a linear combination (Eq. 2) of the basis vectors li = Li(s, D), representing simulated solutions for each s, D pair for all n solutes defined in the grid. This linear system can be written as Ax = b, where A is the matrix of basis vectors li, x is the vector of unknown concentrations ci, and b is the vector of experimental data. This problem is solved with the non-negatively constrained least-squares algorithm (NNLS) (Lawson and Hanson 1974), which results in a vector x containing positive concentrations ci for solutes Ci contributing to the NNLS fit and zero for all other solutes not found in the experimental data. Clearly, the model resolution obtained from the fit will be proportional to the number of solutes included in matrix A, with the resolution increasing with the size of A. In any case, for a typical experiment A will be very large (on the order of several gigabytes). As the size of A increases, so does the computational effort and the required calculation time. The exact scaling of the computational effort with resolution is difficult to generalize, since it depends on the number of components present in the experimental data, the size of A, the number of parallel processors available, and the number of partitions employed in the 2DSA. Therefore, a compromise has to be made between the desired resolution and the available computational resources. An obvious question, therefore, is: what exactly is the best set of grid points to use in a two-dimensional grid to obtain a desired model resolution for a given problem? A good rule of thumb is to use a grid layout where elimination of any grid point in the two-dimensional grid would introduce an error slightly less in magnitude than the stochastic noise inherent in the experimental data. If the grid-point density is high enough that the removal of a grid point does not change the root-mean-square deviation (RMSD) of the fit by more than the magnitude of the noise, then any solute present in the experimental data can be distinguished reliably, and the chance of missing a solute is minimized.
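To illustrate this step in isolation (UltraScan's production implementation is a parallelized C++ code, so all names below are purely illustrative), the NNLS fit of Eq. 2 can be sketched as follows, assuming the basis vectors have already been simulated:

```python
# Minimal sketch of the NNLS step of Eq. 2 (illustrative only; not UltraScan code).
import numpy as np
from scipy.optimize import nnls

def fit_partial_concentrations(basis_vectors, experiment):
    """basis_vectors: simulated Lamm solutions L_i(s, D), each flattened into a
    vector over all (scan, radius) points; experiment: the measured data,
    flattened the same way. Returns the non-negative concentrations c_i."""
    A = np.column_stack(basis_vectors)       # each column is one solute's simulation
    b = np.asarray(experiment, dtype=float)  # experimental data vector
    c, residual_norm = nnls(A, b)            # minimize ||Ac - b|| subject to c >= 0
    return c, residual_norm                  # c_i > 0 only for solutes present in the data
```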

According to Eq. 1, each solute measured in an analytical ultracentrifugation (AUC) experiment gives rise to a sedimentation and diffusion coefficient, and NNLS optimization recovers the partial concentration of each solute in the grid of solutes (which may be zero). Once sedimentation and diffusion coefficients with non-zero concentrations are determined, additional properties of the found solutes are available. From the diffusion coefficient, we can derive the frictional coefficient:

$$f = \frac{RT}{ND},$$
(3)

where R is the gas constant, T is the temperature in Kelvin, and N is Avogadro’s number. If the partial-specific volume, \(\bar{\nu }\), is available, we can derive the molar mass:

$$M = \frac{sNf}{{1 - \bar{\nu }\rho }},$$
(4)

where ρ is the density of the solvent. Once molar mass and partial specific volume are available, we can assume a hypothetical spherical particle with the same volume as the actual solute and calculate the volume V and hydrodynamic radius r0 of the spherical particle:

$$V = \frac{{M\bar{\nu }}}{N},\quad r_{0} = \left( {\frac{3V}{4\pi }} \right)^{1/3}$$
(5)

Using Stokes’ law, we can derive the frictional coefficient of this hypothetical sphere:

$$f_{0} = 6\pi \eta r_{0}$$
(6)

where η is the viscosity of the solvent. Finally, the frictional ratio, or anisotropy, k can be derived:

$$k = \frac{f}{{f_{0} }}$$

The latter property describes how non-globular a molecule is. For a perfectly spherical molecule, f = f0 and k = 1.0; for all other molecules, k > 1.0.
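The chain from a fitted (s, D) pair to these derived quantities (Eqs. 3–6) can be summarized in a short sketch. The temperature, viscosity, density, and partial specific volume below are illustrative placeholder values, not parameters taken from this study:

```python
# Worked example of Eqs. 3-6: derive f, M, r0, f0 and the anisotropy k from s and D.
# All buffer/solute constants below are illustrative placeholders (SI units).
import math

R = 8.314462618      # gas constant, J/(mol K)
N_A = 6.02214076e23  # Avogadro's number, 1/mol

def hydrodynamic_parameters(s, D, T=293.15, vbar=7.3e-4, rho=998.0, eta=1.002e-3):
    """s in s, D in m^2/s, T in K, vbar in m^3/kg, rho in kg/m^3, eta in Pa s."""
    f = R * T / (N_A * D)                    # Eq. 3: frictional coefficient
    M = s * N_A * f / (1.0 - vbar * rho)     # Eq. 4: molar mass, kg/mol
    V = M * vbar / N_A                       # Eq. 5: volume of the equivalent sphere
    r0 = (3.0 * V / (4.0 * math.pi)) ** (1.0 / 3.0)
    f0 = 6.0 * math.pi * eta * r0            # Eq. 6: frictional coefficient of the sphere
    return {"f": f, "M": M, "r0": r0, "f0": f0, "k": f / f0}

# Example: s = 4.3e-13 s and D = 6.1e-11 m^2/s yield M of roughly 63 kg/mol and an
# anisotropy k of about 1.3 (an albumin-like, moderately globular solute).
print(hydrodynamic_parameters(4.3e-13, 6.1e-11))
```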

To aid in the interpretation of AUC results, it is frequently more convenient to express the results using parameterizations of the sedimentation and diffusion coefficients, and to present them in terms of more intuitive parameters, for example, as functions of molar mass and anisotropy or of partial-specific volume and molar mass, instead of sedimentation and diffusion coefficients. As was shown in Demeler et al. (2014), it is straightforward to express a range of solute properties of interest in terms of any such combination of grid parameters. In this work, we evaluate the resolution and contrast the computational requirements of several regular grid layouts, and show that all of these regular grids are either computationally wasteful or lack the ability to describe an experimental system with the desired resolution in every region of the grid. With the recent introduction of the Beckman Optima AUC instrument, a significant enhancement of the data quality and signal-to-noise ratio is realized, which suggests that commensurate enhancements in the data analysis resolution are desirable. This raises the question of the exact distribution of solutes in Eq. 2 that will provide the optimal compromise between resolution and computational requirements. In this manuscript, we present a systematic evaluation of the performance of traditionally employed regular grids and propose an adaptive grid layout providing improved solute point positions for s and D, which are easily computed and which can still be converted to any of the custom grid applications proposed in Demeler et al. (2014). The new grid optimizes the retrieval of available information while at the same time minimizing the computational effort as a function of resolution, performing significantly better than any other regular grid layout tested by us.

Methods

Testing grid performance and simulation

We define grid performance as the reciprocal of the product of the computational effort and the number of grid points required to obtain a constant grid resolution. To compare grid performance, a resolution metric needs to be established. A convenient resolution metric is the signal difference between the simulated data of two solutes with equal loading concentration (Brookes and Demeler 2010). Here, the simulations are for two adjacent grid points and are matched to the experimental run conditions. An optimal grid layout will feature a resolution and grid spacing such that the difference between the Lamm equation solutions from adjacent grid points equals a tolerance t, which should be slightly less than the RMSD originating from stochastic noise in the data. We suggest a constant value e which should be half of the expected RMSD. This difference needs to be satisfied in both dimensions of the grid:

$$L(s_{i + 1} ,D_{i} ) - L(s_{i} ,D_{i} ) = t,\quad L(s_{i} ,D_{i + 1} ) - L(s_{i} ,D_{i} ) = t,\quad {\text{with}}\;t = {\text{RMSD}} - e$$
(7)

It is important to point out that contributions to the signal difference between two points in the grid depend on several experimental conditions, including rotor speed, the radial range of the fitted data, the interval between scans, the partial concentration of a solute, and the duration of the experiment, and should be derived from the UltraScan edit profile, which sets the data range limits. We investigated regular grid types parameterized by k vs. s, k vs. M, and D vs. s, as well as a new improved k vs. s grid with point spacings based on the first derivative of the Lamm equation with respect to s and D. In each case, we attempted to cover the same domain in s and D, regardless of parameterization. For all grids, the total number of grid points, Ngrid, was kept constant at 210 points to approximate equal computational effort across all grids. The total number of grid points was chosen such that the grid coverage was visually comparable across all grids. Generating grids should be fast and efficient, so numerical routines that empirically identify grid spacings satisfying a given resolution, for example through a line search or root-finding algorithm, are not desirable due to their large computational overhead (data not shown). In contrast, our proposed improved grid can be generated quickly and is suitable for parallel methods implemented on supercomputers (Demeler et al. 2009). To compare the efficiency of all grid types, an empirical method using finite-element simulations was needed. For this purpose, a new UltraScan module was developed, reusing already available data structures and processing methods in the UltraScan C++ class library. RMSD values were determined by subtracting two finite-element solutions representing adjacent grid points from each other as follows: scans for each component were simulated with equal time increments over the experimental duration. Only points having less than one-half of the plateau concentration were included in the calculation, and any scans where the midpoint of the boundary, given by \(r = m{\text{e}}^{{s\omega^{2} t}}\) for a meniscus position m, had moved past the bottom b of the cell were excluded from the RMSD calculations. Any points located to the left of a point 0.025 cm to the right of the meniscus were also excluded from the RMSD computation. This approach assured that steep gradients in the back-diffusion region were excluded from the fitting range, both because of refractive artifacts in this region and because absorbance values at the bottom of the cell tend to exceed the dynamic range of the detector. The simulated time of the experiment was 6.8 h and was chosen such that the midpoint of the boundary for the average sedimentation coefficient of the grid’s s-value range would cross the bottom of the cell according to Eq. 14. For each experiment, 100 equidistant scans in time were simulated. All finite-element simulations were performed at a rotor speed of 40,000 rpm, using 200 simulation points for the ASTFEM grid. For sedimentation coefficients larger than the mean sedimentation coefficient, the sedimentation time was shortened such that the RMSD calculation ignored scans after the faster of the two components had pelleted. This prevented calculated RMSD values from being underestimated due to the inclusion of baseline values from pelleted solute states. For all grids, we used a sedimentation coefficient range from s1 = 1.1 × 10−13 s to s2 = 9.9 × 10−13 s, and a frictional ratio range from k1 = 1.2 to k2 = 3.8. These ranges were chosen to prevent the program from simulating unreasonable frictional ratios below 1.0 during the RMSD isobar calculations. To measure grid resolution, RMSD isobars were calculated around each grid point by measuring the RMSD difference along polar coordinate lines, centered on each grid point, from zero to 2π in two-degree increments, producing 180 RMSD points around each grid point. The required length of the polar coordinate line was determined by testing the RMSD at points from the four corners of the grid. We found that constant multipliers proportional to the regular s · k grid spacing were more than sufficient to capture all RMSD isobars of interest. Furthermore, the chosen constant scaling also ensured that all RMSD values along the polar coordinate line allowed for a linear extrapolation (data not shown). This approach was repeated by iterating over all other solute points in a test grid projected onto the s · k plane. Linear interpolations between 0 and 0.5% RMSD were used to generate RMSD error ellipsoid isobars (see Fig. 1). Equations for the linear approximations needed for the generation of the ellipsoid isobars were then stored in an output file to allow ellipsoids for different RMSD values to be generated without simulating every grid point and its associated sample points again.
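To make the isobar construction concrete, the sketch below outlines one possible implementation of the polar sampling. Here, simulate(s, k) is a hypothetical stand-in for a finite-element Lamm equation solver returning the simulated scan data (the ASTFEM solver in the actual UltraScan module), and ds, dk stand for the constant multiples of the regular s · k grid spacing described above:

```python
# Schematic sketch of the RMSD isobar construction around a single grid point.
# simulate(s, k) is a placeholder for a finite-element Lamm solver (ASTFEM in
# UltraScan); ds and dk are the probe-line lengths in the s and k directions.
import numpy as np

def rmsd(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def isobar(simulate, s0, k0, ds, dk, level, n_angles=180, n_radii=8):
    """Return (s, k) points where the RMSD between the solutions of (s0, k0) and a
    displaced solute reaches `level`, sampled every 2 degrees around the point."""
    reference = simulate(s0, k0)
    points = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        radii = np.linspace(0.0, 1.0, n_radii + 1)[1:]     # fractions of the probe line
        errors = [rmsd(reference,
                       simulate(s0 + r * ds * np.cos(theta),
                                k0 + r * dk * np.sin(theta))) for r in radii]
        # RMSD grows with distance from the grid point, so interpolate linearly to
        # the requested level (assumes the probe line is long enough to reach it).
        r_level = np.interp(level, errors, radii)
        points.append((s0 + r_level * ds * np.cos(theta),
                       k0 + r_level * dk * np.sin(theta)))
    return points
```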

Fig. 1

RMSD error isobars for a solute point (black) in the 2DSA grid

Improved grid generation

Our improved grid is based on the rate of change of the concentration as a function of s and D. This can be represented by the derivative of the Lamm equation with respect to s and D. Since an analytical solution to this problem is not readily available, and numerical solutions require computationally demanding algorithms, we chose to use the Faxén approximation to the Lamm equation (Faxén 1929). Starting with the Lamm equation (Eq. 1), we first introduce dimensionless variables x, τ, and ε:

$$x = \left( {\frac{r}{m}} \right)^{2} ,\quad \tau = 2\;s\;\omega^{2} t,\;\;\;\varepsilon = \frac{2D}{{s\omega^{2} m^{2} }},$$
(8)

where m and b are the meniscus and the bottom of the cell, respectively. Then the Lamm equation can be transformed to:

$$\frac{\partial C}{\partial \tau } = \frac{\partial }{\partial x}\;\left[ {x\left( {\varepsilon \frac{\partial C}{\partial x} - C} \right)} \right]$$
(9)

It is evident that the solution C depends only on the parameter ε. When ε ≪ 1 and x is near one, the solution to the Lamm equation can be approximated by the Faxén solution:

$$C(x,\tau ) = \frac{1}{2}{\text{e}}^{ - \tau } [1 - \varPhi (\upsilon )],$$
(10)

where

$$\varPhi (\upsilon ) = \frac{2}{\sqrt \pi }\;\;\mathop \int \limits_{0}^{\upsilon } \;{\text{e}}^{{ - x^{2} }} {\text{d}}x$$
(11)

is the error function, and

$$\upsilon = \frac{{1 - (x{\text{e}}^{ - \tau } )^{1/2} }}{{[\varepsilon (1 - {\text{e}}^{ - \tau } )]^{1/2} }}$$
(12)

Taking the partial derivative with respect to ε yields:

$$\frac{\partial C}{\partial \varepsilon }(x,\tau ) = \frac{1}{2\sqrt \pi }{\text{e}}^{ - \tau } \varepsilon^{ - 1} \upsilon {\text{e}}^{{ - \upsilon^{2} }}$$
(13)

To evaluate the magnitude of \(\frac{\partial C}{\partial \varepsilon }\), we need to specify a meaningful time interval. We note that a typical experiment will be finished when the midpoint of the solute’s boundary reaches the bottom of the cell, which occurs at:

$$t_{*} = \ln \left( {\frac{b}{m}} \right)\frac{1}{{s\omega^{2} }}$$
(14)

Therefore, we use the time interval [0, t*] to evaluate the magnitude of \(\frac{\partial C}{\partial \varepsilon }\). More precisely, we introduce the norm of \(\frac{\partial C}{\partial \varepsilon }\) in the domain \(0 \le \tau \le \tau_{*} = 2s\omega^{2} t_{*} = 2\ln \;(b/m)\) and \(1 \le x \le x_{*} = (b/m)^{2}\) as follows:

$$\left\| {\frac{\partial C}{\partial \varepsilon }} \right\| = \left[ {\int\limits_{0}^{{\tau_{*} }} {\mathop \int \limits_{1}^{{x_{*} }} |\frac{\partial C}{\partial \varepsilon }(x,\tau )|^{2} {\text{d}}x{\text{d}}\tau } } \right]^{1/2}$$
(15)

For fixed values of m, b, and rotor speed ω, this norm depends on ε only. Unfortunately, there is no explicit formula for the norm as a function of ε. A numerical evaluation of the norm suggests that, for typical ranges of s, D, and ω, \(\left\| {\frac{\partial C}{\partial \varepsilon }} \right\|\) is approximately proportional to \(\varepsilon^{ - 3/4}\). See Fig. 2 for a log–log plot of \(\left\| {\frac{\partial C}{\partial \varepsilon }} \right\|\) as a function of ε in the case of m = 6.5, b = 7.2, and ω = 60,000 rpm.

Fig. 2

A plot of the norm of \(\frac{\partial C}{\partial \varepsilon }\) as a function of ε in the case of m = 6.5, b = 7.2, and ω = 60,000 rpm
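The \(\varepsilon^{ - 3/4}\) scaling can be reproduced with a short numerical sketch of Eqs. 13 and 15. The rectangle-rule quadrature, grid sizes, and range of ε values below are arbitrary illustrative choices rather than the settings used to produce Fig. 2:

```python
# Numerical check of ||dC/d(eps)|| ~ eps^(-3/4) using Eqs. 13 and 15.
# The quadrature resolution and the eps range are illustrative choices only.
import numpy as np

def dC_deps(x, tau, eps):
    """Derivative of the Faxen solution with respect to epsilon (Eq. 13)."""
    v = (1.0 - np.sqrt(x * np.exp(-tau))) / np.sqrt(eps * (1.0 - np.exp(-tau)))
    return np.exp(-tau) / (2.0 * np.sqrt(np.pi)) * v * np.exp(-v ** 2) / eps

def norm_dC_deps(eps, m=6.5, b=7.2, nx=600, ntau=400):
    """L2 norm over 1 <= x <= (b/m)^2 and 0 < tau <= 2 ln(b/m) (Eq. 15)."""
    tau = np.linspace(1e-6, 2.0 * np.log(b / m), ntau)   # integrand vanishes at tau = 0
    x = np.linspace(1.0, (b / m) ** 2, nx)
    X, TAU = np.meshgrid(x, tau)
    integrand = dC_deps(X, TAU, eps) ** 2
    inner = integrand.sum(axis=1) * (x[1] - x[0])        # rectangle rule over x
    return np.sqrt(inner.sum() * (tau[1] - tau[0]))      # then over tau

eps_values = np.logspace(-4.0, -2.0, 9)                  # a typical range of eps values
norms = np.array([norm_dC_deps(e) for e in eps_values])
slope = np.polyfit(np.log(eps_values), np.log(norms), 1)[0]
print("log-log slope:", slope)                           # expected to be close to -3/4
```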

A careful study shows that ε is inversely proportional to the 3/2 power of the product \(s \cdot k\); more precisely:

$$\varepsilon = \frac{2}{{9\sqrt 2 \pi \omega^{2} m^{2} }} \cdot \frac{RT}{N} \cdot \left( {\frac{{\eta^{3} \bar{\nu }}}{{1 - \bar{\nu }\rho }}} \right)^{ - 1/2} \cdot (s \cdot k)^{ - 3/2}$$
(16)

Let \(\mu = (s \cdot k)^{ - 1}\); then we have \(\varepsilon = O(\mu^{3/2})\), and thus:

$$\left\| {\frac{\partial C}{\partial \varepsilon }} \right\| \approx O(\varepsilon^{ - 3/4} ) = O\;(\mu^{-9/8} )$$
(17)

Using the chain rule for differentiation

$$\frac{\partial C}{\partial \mu } = \frac{\partial C}{\partial \varepsilon } \cdot \frac{\partial \varepsilon }{\partial \mu } = \frac{\partial C}{\partial \varepsilon } \cdot O\;(\mu^{ 1/2} )$$
(18)

which implies that:

$$\left\| {\frac{\partial C}{\partial \mu }} \right\| = O\;(\mu^{ - 5/8} )$$
(19)

Since \(\left\| {\frac{\partial C}{\partial \mu }} \right\|\) is approximately constant along a curve \(s \cdot k = {\text{const}},\) when designing an sk grid system, the grid points can be picked along various curves \(s \cdot k = \mu_{j}^{ - 1} ,j = 1,2, \ldots ,N,\) where the values of \(\mu_{j}\), j = 1, 2,…,N, are selected so that the RMSD error isobars are approximately uniformly distributed. We observed that when \(\mu_{j}^{ - 1/4}\) is evenly spaced, the distribution of the RMSD error isobars is the closest to uniformity. Thus, we select \(\mu_{j}\) values accordingly for the grid generation. A detailed description of the creation of the sk grid system follows.

Suppose that in a 2DSA analysis the sedimentation coefficient s is between the limits s1 and s2 and the frictional ratio k is between k1 and k2; then the range for \(\mu = (s \cdot k)^{ - 1}\) is between \(\mu_{1} = (s_{1} \cdot k_{1} )^{ - 1}\) and \(\mu_{2} = (s_{2} \cdot k_{2} )^{ - 1}\). Let N be the number of partitions we would like to place between μ1 and μ2. Then an equidistribution of \(\mu^{ - 1/4}\) can be achieved approximately using the dividing points:

$$\mu_{j} = \left[ {\left( {1 - \frac{j}{N}} \right) \cdot \mu_{1}^{ - 1/4} + \frac{j}{N} \cdot \mu_{2}^{ - 1/4} } \right]^{ - 4} ,\;\;0 \le j \le N$$
(20)

To generate the improved grid, we calculate all \(y_{j} = 1/(\mu_{j} \cdot s_{1} )\) for which \(\mu_{j}^{ - 1} \le s_{1} \cdot k_{2}\) (i.e., \(y_{j} \le k_{2}\)), and all \(x_{i,j} = 1/(\mu_{i} \cdot y_{j} )\) satisfying \(s_{1} \le x_{i,j} \le s_{2}\). The grid on the s · k plane is then the collection of all points \((x_{i,j} , y_{j} )\) satisfying \(s_{1} \le x_{i,j} \le s_{2}\) and \(y_{j} \le k_{2}\).
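A minimal sketch of this construction is shown below; it is one possible implementation, and the production code in UltraScan may differ in details such as endpoint and rounding handling:

```python
# Sketch of the improved s-k grid: mu_j values equidistribute mu^(-1/4) (Eq. 20),
# and grid points are placed on the curves s*k = 1/mu_i within the requested ranges.
import numpy as np

def improved_grid(s1, s2, k1, k2, N):
    """Return (s, k) grid points for s in [s1, s2] and k in [k1, k2],
    using N partitions between mu1 = 1/(s1*k1) and mu2 = 1/(s2*k2)."""
    mu1, mu2 = 1.0 / (s1 * k1), 1.0 / (s2 * k2)
    j = np.arange(N + 1)
    mu = ((1.0 - j / N) * mu1 ** -0.25 + (j / N) * mu2 ** -0.25) ** -4.0  # Eq. 20
    tol = 1e-9                               # tolerance for floating-point comparisons
    points = []
    for mu_j in mu:
        y_j = 1.0 / (mu_j * s1)              # frictional ratio where curve j meets s = s1
        if y_j > k2 * (1.0 + tol):           # curve enters above the k range: no new row
            continue
        for mu_i in mu:
            x_ij = 1.0 / (mu_i * y_j)        # s value where curve i crosses the row k = y_j
            if s1 * (1.0 - tol) <= x_ij <= s2 * (1.0 + tol):
                points.append((x_ij, y_j))
    return points

# Example with the s (in 1e-13 s units) and k ranges discussed in the text:
print(len(improved_grid(1.0, 10.0, 1.0, 4.0, N=24)), "grid points")
```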

Adjusting the resolution of the improved grid

The resolution of the improved grid is proportional to the total number of grid points, Ngrid. It can be controlled by adjusting N, the number of partitions between μ1 and μ2. An estimate of Ngrid can be obtained as follows: first, ensuring \(\mu_{j}^{ - 1} \le s_{1} \cdot k_{2}\), we have \(0 \le j \le J_{a}\) with:

$$J_{a} = N \cdot \frac{{1 - \left( {\mu_{1} \cdot s_{1} \cdot k_{2} } \right)^{1/4} }}{{1 - \left( {\mu_{1} /\mu_{2} } \right)^{1/4} }}$$
(21)

For each \(j \le J_{a}\), to ensure that \(s_{1} \le x_{i,j} \le s_{2}\), we have:

$$j \le i \le N \cdot \frac{{1 - \left( {\mu_{1} \cdot s_{2} \cdot y_{j} } \right)^{1/4} }}{{1 - \left( {\mu_{1} /\mu_{2} } \right)^{1/4} }}$$
(22)

Consequently, the total number of grid points, Ngrid, is given by:

$$N_{\text{grid}} = \mathop \sum \limits_{j = 0}^{{J_{a} }} \left( {N \cdot \frac{{1 - \left( {\mu_{1} \cdot s_{2} \cdot y_{j} } \right)^{1/4} }}{{1 - \left( {\mu_{1} /\mu_{2} } \right)^{1/4} }} - j} \right)$$
(23)

A plot of Ngrid vs. N is displayed in Fig. 3. Furthermore, a least-squares fit shows that Ngrid is approximately a quadratic function of the number of partitions, as given by:

$$N_{\text{grid}} \approx N^{2} /e$$
(24)

Fig. 3

Total number of grid points Ngrid vs. the number of partitions N using Eq. 23 and the approximation given by Eq. 24

A comparison with the estimated total number of grid points using the above formula is also shown in Fig. 3, which indicates a good match between Eq. 23 and the prediction of Eq. 24. Therefore, in practice, to generate an improved grid containing Ngrid points, we can select \(N = \sqrt {e \cdot N_{\text{grid}} }\) as the number of partitions, where e ≈ 2.718 is Euler's number.
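For example, for a target grid size of Ngrid ≈ 210 points, as used in the grid comparisons above, this gives \(N = \sqrt {2.718 \cdot 210} \approx 23.9\), so N = 24 partitions would be selected.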

Results

One of the best metrics for grid performance is the RMSD distance between adjacent grid points. When two grid points are identical, the RMSD difference between them is zero; the further the two grid points move apart along either the s or the D direction, the larger the RMSD difference between the finite-element solutions for these grid points becomes. The comparison cannot just be made in one dimension, because both s and D contribute to this RMSD difference. As shown in Fig. 1, a constant level of RMSD difference around a grid point is best described by an ellipsoid, which varies in aspect ratio and orientation depending on the position of the grid point in the two-dimensional grid space. Ideally, the RMSD difference between adjacent grid points should be slightly less than the RMSD level encountered in the stochastic noise in the data to assure that all solute concentrations that exceed the noise level can be detected. To this end, we plotted the location and RMSD ellipsoids for five different RMSD levels (0.001–0.005), corresponding roughly to the noise level ranges observed in commercially available analytical ultracentrifuges, for a fixed number of grid points and several regular grid types, as well as for the improved grid based on the Faxén solution. Regular grid types offer the advantage of being intuitive in terms of the variables that they represent (for example, frictional ratio and molar mass) and can be generated quickly. They avoid the computational overhead of numerically optimized grids that would result in equidistant RMSD grid points. Furthermore, such optimized grids are difficult to generate in more than one dimension. On the other hand, the computational overhead for the improved grid (Eqs. 8–24) is trivial, and the grid is well suited for methods such as the 2DSA or genetic algorithms (Brookes and Demeler 2006, 2007), where hundreds of grids need to be computed. We investigated regular grids where s and D values were based on s vs. k, M vs. k, and s vs. D over a range consistent with a sedimentation coefficient range for s from 1 to 10 × 10−13 s, and a frictional ratio range for k from 1 to 4. Our comparison of the performance of the improved grid with different regular grids revealed significant differences when error distances between adjacent grid points were evaluated. These differences are clearly seen when their RMSD error isobars are visually compared (Fig. 4). We observe the following characteristics for each grid: the conventionally used s vs. k grid (Fig. 4a) suffers from low resolution in s in the left half of the grid, but performs well for s in the right half of the grid. For k, the resolution increasingly suffers in the upper left quadrant of the grid, while the grid is computationally very wasteful in the right half, where adjacent grid points overlap strongly in the k dimension. By far, the worst-performing grid is a regular grid based on molar mass M and frictional ratio k (Fig. 4b). A drastic loss of resolution in the lower left quadrant of the grid in both dimensions is accompanied by significant overlap in both dimensions in the entire right half of the grid, which is especially strong in the center for k. A regular grid based on s vs. D (Fig. 4c) performs reasonably well for s throughout the s domain, with a slight loss in resolution in the upper left quadrant of the grid for k, similar to the error seen in the s vs. k grid (Fig. 4a). In the right half of the grid, significant overlaps are seen in the lower k regions, indicating computational inefficiencies. The most evenly distributed RMSD error over the entire grid is evident for the improved grid based on the Faxén solution (Fig. 4d).

Fig. 4

RMSD error isobars for an equal number of grid points from three regular grids (a s vs. k, b M vs. k, c s vs. D) and d the improved grid based on the Faxén solution. Increasing white space between the outermost error isobars indicates reduced resolution, while overlaps between adjacent red isobars indicate wasteful inefficiencies. Ideally, red isobars should touch, but not overlap

Remarkably, the improved grid provides excellent coverage for the lower left quadrant [see Fig. 5 for a magnified view of the lower left quadrant at the 0.0005 (red) and 0.001 (blue) RMSD error levels], demonstrating no overlap and nearly touching isobars. Similarly, overlaps in the right half of the grid are essentially absent, though the spacing in the s range suggests a slight resolution loss in the upper right quadrant of the grid. It should be noted that the diffusion resolution is very low in the upper right quadrant, since solutes in this portion of the grid have a small diffusion coefficient to begin with and sediment rapidly, leaving little time for diffusion; this decreases the diffusion signal and explains the lower resolution in D. Consequently, isobars are very elongated in the k direction, and the white space in the upper right quadrant is caused by missing solute points that would be centered at frictional ratios > 4, which were not considered in this simulation. The same effect is also very clear in Fig. 4c, where large regions were not simulated since they fell outside of the range 1 ≤ k ≤ 4, and solute regions outside of this range would be required to fill these white spaces. The current grid generation function for UltraScan’s 2DSA produces a regularly spaced grid of solute points in terms of sedimentation coefficient and frictional ratio. Although this method can effectively analyze AUC experimental data, it does not necessarily do so in the most computationally efficient way. When using a regular s · k grid, there are often cases in which two solute points on the grid are sufficiently similar in terms of their simulated behavior that, when the stochastic noise of the experimental data is taken into account, the two are functionally identical. This is problematic because it requires the program to unnecessarily simulate a redundant solute, a cost that can become significant for large grids with many redundant simulations.

Fig. 5

Detail of lower left corner of improved grid based on the Faxén solution, demonstrating excellent coverage without overlaps and without resolution gaps (simulated resolution in blue: RMSD = 0.001)

Summary

We have presented a novel method for generating a computationally efficient s · k grid that substantially improves the resolution for a given number of simulation points in a two-dimensional grid used for fitting sedimentation velocity experiments, while simultaneously minimizing the computational effort and required memory. This innovation will reduce the computer time needed on national supercomputing infrastructures such as XSEDE and PRACE, and improve the resolution obtained when fitting sedimentation velocity data.