INTRODUCTION

Hemoglobin is among the macromolecules that have been most comprehensively studied in molecular biology. Hill [1, 2] proposed his well-known equation to describe the cooperative binding of ligands with proteins more than a century ago for hemoglobin. Using hemoglobin as an example, Monod developed the theory of allosteric interactions [36], which underlie many processes involved in regulating biological activity of macromolecules. Allosteric regulation refers to the regulation where the binding of a ligand to one site of a protein affects the binding of another molecule to another protein site (in this context, allosteric interactions can be considered as a particular case of more general cooperative effects, which additionally manifest themselves as contact interactions between spatially close ligand-binding centers [7, 8]). Many issues are still not understood in the function of hemoglobin. Its four catalytically active subunits are capable of binding various ligands. The heterotetramer can regulate the basic functions of its monomers via cooperative effects (e.g., see [9, 10]). Alternative and accessory functions of hemoglobin have attracted special interest in the past years [11]. However, the phenomenon of cooperative oxygen–hemoglobin binding, which provides a classical example of allosteric interactions, has not been studied well enough to allow us to choose a single model among many theoretical models developed to describe it. The concerted functions of the subunits in the macromolecule are due to their cooperative interactions. The high level of functional adaptability of hemoglobin to varying conditions in the body is achieved via fine adjustments of its elements, which work as a single entity [1214]. A formal description of the interaction of hemoglobin with its ligands does not obey the Michaelis–Menten equation [15, 16]. Nonlinear effects arise as a result of cooperative interactions, and a variety of approaches are used to describe them in the given system. The approaches include conceptual models of protein–ligand interactions, chemical reactions, and regression models. We note that such approaches lack a physical content in a number of cases [1719]. Many descriptions have been proposed for the cooperative effects that occur in the hemoglobin molecule, but the relationships between individual models have still not been established and there are no criteria to choose the most adequate (best) model. We believe that this problem deserves special investigation (e.g., see [20, 21]).

We have previously proposed criteria [22] to evaluate regression models of oxygen–hemoglobin binding [23]. Here, we consider a broad range of models designed to describe oxygen–hemoglobin binding and systematize the approaches developed in the field.

METHODS

Models of oxygen–hemoglobin binding were the focus of this study and were analyzed using numerical methods and experimental data [24], which provide the most complete and accurate measurements available for the functional dependence (Fig. 1).

Fig. 1.
figure 1

Experimental measurements of the oxyhemoglobin dissociation curve (based on the data [24]). Hb4(O2)4 is a tetrameric oxyhemoglobin molecule; pO2 is the partial pressure of oxygen.

To eliminate the stochastic measurement errors, the data were digitally processed using piecewise-polynomial smoothing algorithms (data array 1) [25] or globally approximating polynomials (data array 2) [26].

The stability of a solution to the problem of modeling the oxyhemoglobin dissociation curve (ODC) was checked using the above data arrays and the initial data (data array 3) [24].

Model parameters were optimized by the least-squares method [27]. The coefficient of determination was used to evaluate the goodness of fit for a model [28]. A formal classification of models was constructed via cluster analysis [29].

Necessary computations were performed using the MS Excel spreadsheet processor with the Visual Basic for Applications (VBA) module.

RESULTS AND DISCUSSION

An overview of the models that describe oxygen–hemoglobin binding. In the history of mathematical models describing the oxygen–hemoglobin interaction, Hüfner [30] was the first to propose an equation (1890), which described oxygenation as a first-order chemical reaction:

$${\text{Hb}} + {{{\text{O}}}_{2}}\underset{\beta }{\overset{\alpha }{\rightleftarrows}}{\text{Hb}}{{{\text{O}}}_{2}},$$
(1)
$$y = \frac{{{\text{Hb}}{{{\text{O}}}_{2}}}}{{{\text{Hb}} + {\text{Hb}}{{{\text{O}}}_{2}}}} = \frac{{kp}}{{1 + kp}} = \frac{p}{{{{k}^{{ - 1}}} + p}} = \frac{p}{{{{p}_{{50}}} + p}},$$
(2)

where Hb is deoxyhemoglobin; O2 is oxygen; HbO2 is oxyhemoglobin; α and β are the kinetic coefficients of the forward and reverse reactions, respectively; y is the hemoglobin saturation with oxygen; p is the partial pressure of oxygen; k = α/β is the equilibrium reaction constant; and k–1 (hereafter designated p50) is the oxygen pressure at y = 50%.

The model fails to describe the S-like shape of the ODC and can be used only to calculate the binding of oxygen with myoglobin or an individual hemoglobin subunit and to provide a basis for constructing the equations that allow for a cooperative effect.

It should be noted that the Hüfner, Langmuir [31], and Michaelis–Menten [32] equations are essentially equivalent because they describe similar physical processes:

$$\frac{{kp}}{{1 + kp}} = \frac{p}{{{{p}_{{50}}} + p}} = \frac{{k{\kern 1pt} 'C}}{{1 + k{\kern 1pt} 'C}} = \frac{C}{{K{\kern 1pt}^ {"}\,\, + C}} = \frac{{[S]}}{{{{K}_{{\text{m}}}} + [S]}},$$
(3)

where k' is the absorption constant, K'' is the absorption equilibrium constant (1/k'), C is the equilibrium adsorbate concentration in the Langmuir equation, [S] is the substrate concentration, and Km is the Michaelis constant in the Michaelis–Menten equation.

The Hill equation (1910) [1, 2] is based on a model that assumes the hemoglobin molecule to be a polymer, which consists of h subunits and binds simultaneously h O2 molecules. Oxygenation is considered to be an h-order reaction in this case:

$${{\left( {{\text{Hb}}} \right)}_{h}} + h{{{\text{O}}}_{2}}\underset{\beta }{\overset{\alpha }{\rightleftarrows}}{{\left( {{\text{Hb}}{{{\text{O}}}_{2}}} \right)}_{h}},$$
(4)
$$\begin{gathered} y = \frac{{{{{\left( {kp} \right)}}^{h}}}}{{1 + {{{\left( {kp} \right)}}^{h}}}} = \frac{{{{p}^{h}}}}{{{{k}^{{ - h}}} + {{p}^{h}}}} \\ = \frac{{{{p}^{h}}}}{{p_{{50}}^{h} + {{p}^{h}}}} = \frac{{{{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{h}}}}{{1 + {{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{h}}}}, \\ \end{gathered} $$
(5)

where h is the Hill constant.

Adair et al. [34] (1925) proposed a model that was based on the hypothesis of intermediate hemoglobin saturation with oxygen and assumed that ligand binding proceeds consecutively in four steps:

$${\text{H}}{{{\text{b}}}_{4}} + {{{\text{O}}}_{2}}\underset{{{{\beta }_{1}}}}{\overset{{{{\alpha }_{1}}}}{\rightleftarrows}}{\text{H}}{{{\text{b}}}_{4}}{{{\text{O}}}_{2}},$$
(6)
$${\text{H}}{{{\text{b}}}_{4}}{{{\text{O}}}_{2}} + {{{\text{O}}}_{2}}\underset{{{{\beta }_{2}}}}{\overset{{{{\alpha }_{2}}}}{\rightleftarrows}}{\text{H}}{{{\text{b}}}_{4}}{{\left( {{{{\text{O}}}_{2}}} \right)}_{2}},$$
(7)
$${\text{H}}{{{\text{b}}}_{4}}{{\left( {{{{\text{O}}}_{2}}} \right)}_{2}} + {{{\text{O}}}_{2}}\underset{{{{\beta }_{3}}}}{\overset{{{{\alpha }_{3}}}}{\rightleftarrows}}{\text{H}}{{{\text{b}}}_{4}}{{\left( {{{{\text{O}}}_{2}}} \right)}_{3}},$$
(8)
$${\text{H}}{{{\text{b}}}_{4}}{{\left( {{{{\text{O}}}_{2}}} \right)}_{3}} + {{{\text{O}}}_{2}}\underset{{{{\beta }_{4}}}}{\overset{{{{\alpha }_{4}}}}{\rightleftarrows}}{\text{H}}{{{\text{b}}}_{4}}{{\left( {{{{\text{O}}}_{2}}} \right)}_{4}},$$
(9)
$$y = \frac{{{{a}_{1}}p + 2{{a}_{2}}{{p}^{2}} + 3{{a}_{3}}{{p}^{3}} + 4{{a}_{4}}{{p}^{4}}}}{{4\left( {1 + {{a}_{1}}p + {{a}_{2}}{{p}^{2}} + {{a}_{3}}{{p}^{3}} + {{a}_{4}}{{p}^{4}}} \right)}},$$
(10)

where a1 = k1, a2 = k1k2, a3 = k1k2k3, and a4 = k1k2k3k4 are the Adair coefficients; ki = αii (i = 1, 2, 3, and 4) are the equilibrium reaction constants; and Hb4 is the hemoglobin tetramer.

Based on the Wyman–Allen hypothesis, which postulates a simultaneous binding of two oxygen molecules to a hemoprotein molecule [35], Bernard [36] (1960) proposed the following oxygenation equation:

$$y = \frac{{0.5a{{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{2}} + {{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{4}}}}{{1 + a{{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{2}} + {{{\left( {{p \mathord{\left/ {\vphantom {p {{{p}_{{50}}}}}} \right. \kern-0em} {{{p}_{{50}}}}}} \right)}}^{4}}}},$$
(11)

where a is a certain constant kinetic coefficient. The model is principally a combined variant of the Hill and Adair equations.

The above (and related) mathematical equations are based on the general chemical model that considers the interaction of a ligand and a protein in terms of the law of the mass action and utilizes the n-th order power function.

Alternative models of oxygen–hemoglobin binding are based on the assumption of a transition process, to which the law of the mass action is inapplicable [3740]. An exponential function is used to determine the hemoprotein oxygenation rate in this case, and the partial pressure of oxygen serves as its argument (power).

Based on this idea, Vysochina [37] (1963) proposed the following equation:

$$y = 1 - \left[ {1 + bp + \frac{{{{{\left( {bp} \right)}}^{2}}}}{2} + \frac{{{{{\left( {bp} \right)}}^{3}}}}{3}} \right]\exp \left( { - bp} \right),$$
(12)

where b is the variable kinetic coefficient.

Podrabinek and Kamenskii [38] (1968) developed a model that is based on the Hüfner equation and assumes that the ligand binding constant is a function of the degree of protein macromolecule deformation, which exponentially depends on pO2:

$$y = \frac{{\alpha p\exp \left( {\lambda p} \right)}}{{1 + \alpha p\exp \left( {\lambda p} \right)}},$$
(13)

where α and λ are the positive constant kinetic coefficients and k = α exp(λp) is the equilibrium reaction constant.

The approach is similar to that developed by Tenford [41] and used to construct models that describe absorption of free ligands on a nucleic acid molecule with partly filled ligand-binding centers.

Kislyakov [39] (1975) proposed the following equation:

$$y = {{\left[ {1 - \exp \left( { - \frac{{bp}}{{10}}} \right)} \right]}^{{\frac{1}{b}}}},$$
(14)

where b is the kinetic coefficient equal to a0 + a1z + a2z2 and a0, a1, and a2 are the constant kinetic coefficients.

Khanin et al. [40] (1978) proposed a function that is analogous to the above ones and has the following form:

$$y = 1 + {{\delta }_{1}}\exp \left( { - {{\delta }_{2}}p} \right),$$
(15)

where δ1 and δ2 are the constant kinetic coefficients.

Analysis of the models of cooperative oxygen binding by hemoglobin. It is possible to evaluate how accurately the experimental ODC is approximated by mathematical functions with a certain physical content by using the coefficient of determination (R2) to measure the goodness of fit.

Because stochastic errors are always present in experimental data to a certain extent and may qualitatively change the interpretation of experimental results, two-step filtration was applied to the initial ODC values.

At the first step, outliers were eliminated. A difference of more than 1% between a single point of the experimental curve and its approximation with third-order nonuniform rational B-splines excluded the point from further analysis [4244]. Only 2 out of 65 points were outliers. The next step included smoothing with a third-order Savitzky–Golay piecewise polynomial filter. The filter is the derivative of the moving average method, thus providing a simple, available, and common tool to solve a number of similar problems [45, 46]. The data obtained by digital filtering were re-discretized with a regular grid increment of 0.5 mm Hg in an O2 partial pressure range of 0–622.5 mm Hg; the total number of points was 1246 (data array 1).

The functions were ranked in order of decreasing R2 (Table 1). As is seen, the functions based on a power dependence lead by a substantial margin in the approximation list with the expectable exception of the Hüfner equation, which is a basic model.

Table 1. ODC-approximating functions with a physical content (results obtained with data array 1)

The facts that similar R2 values were obtained for the Adair, Bernard, and Hill models and that the models similarly represent the oxygenation process indirectly suggest a particular manner for oxygen binding by hemoglobin; i.e., oxygen is bound in a stepwise manner as a result of structural rearrangements that arise in the protein macromolecule upon its binding with ligands via equilibrium reactions.

In addition, this is indirectly (by the exclusion method) supported by the approach based on the exponential dependence and used in the alternative models, together with the scatter observed for the coefficient of determination in the group of equations 47 (Table 1).

To formally assign the approximating functions in question to a particular type of physical (and, consequently, mathematical) models, the models were classified according to R2 by nearest neighbor clustering, using the Euclidean distance to measure the separation between models (Fig. 2a). It should also be noted that much the same dendrogram was obtained by the furthest neighbor method; i.e., clustering quality was not impaired when the two methods were used.

Fig. 2.
figure 2

A dendrogram (distance matrix) of several ODC-describing models as constructed for data arrays (a) 1, (b) 2, and (c) 3.

The clustering characteristics of the mathematical models under study were considered consecutively based on their analytic representation with the decreasing distance between models.

First, a dichotomy of the group of models 1–7 and group 8 (Table 1), which consisted of one model (the Hüfner equation), was observed at a distance of 57 710 points. It is well known that the Hüfner model fails to describe the cooperative character of oxygenation; an illustrative and mathematically formal demonstration is provided again by our cluster analysis (Fig. 2a).

A decrease in distance to 11 938 points shows a separation of the remaining models into two subclusters (cluster leaves). The Khanin and Vysochina equations form a cluster with a between-model distance of 6241 points. The similarity of these two approximations is determined by the equation component a exp(–bp), where a and b are the kinetic coefficients and p is the partial pressure of oxygen.

The other cluster (2578 points) comprises models 1–5 (Table 1). The closeness of the models formally indicates that power equations more efficiently describe the oxygenation process. For example, although exponential, the Kislyakov equation has b–1 as a power and the Podrabinek–Kamenskii model is based on the ratio kpn/(1 + kpn), where n = exp(λp), as mentioned above.

The small distance (279 points) between equations 4 and 5 reflects their similarity in having an exponential component, but not their good fit to experimental data.

The subcluster with a between-model distance of 237 points is a family of the Adair, Bernard, and Hill power models, thus supporting the assumption that ligand binding is an nth order chemical reaction.

Then, these mathematical approximations are possible to associate with physical models. The Adair equation suggests consecutive ligand binding and is therefore possible to associate with the Koshland–Némethy–Filmer model [47]. The Bernard equation (a hybrid solution) can be associated with the Wyman–Allen model [48]; and the Hill equation, with the Monod–Wyman–Changeux model (a symmetrical model) [3].

The cluster of these models divides into two subcluster, one including the Adair and Bernard equations (with a between-model distance of 63 points) and the other, the Hill equation. The Hill equation is inferior to the other two in terms of R2. The Bernard approximation groups with the Adair approximation, suggesting their similarity in describing the ligand binding by sequence states. Because the Adair equation has the highest R2 among all models under study, the formal approach shows that the Koshland–Némethy–Filmer model with the Adair equation as its mathematical projection is superior in describing the ligand binding with a macromolecule.

In the next variant of the experiment, the following polynomial was used to directly approximate the initial data:

$$y = \frac{{a + {{b}_{1}}\ln p + {{b}_{2}}{{{\left( {\ln p} \right)}}^{2}} + {{b}_{3}}{{{\left( {\ln p} \right)}}^{3}} + {{b}_{4}}{{{\left( {\ln p} \right)}}^{4}}}}{{1 + {{c}_{1}}\ln p + {{c}_{2}}{{{\left( {\ln p} \right)}}^{2}} + {{c}_{3}}{{{\left( {\ln p} \right)}}^{3}} + {{c}_{4}}{{{\left( {\ln p} \right)}}^{4}}}},$$
(16)

where a, b1–4, and c1–4 are coefficients and p (>0) is the partial pressure of oxygen. We believe that this approach makes it possible to evaluate how experimental errors, such as outliers, or systematic errors of approximation, affect the stability of the solution of a numerical modeling problem. The polynomial was selected so that no appreciable advantage was added to the Adair, Bernard, or Hill equation compared with the other models and that both power and logarithmic transformation were included in the polynomial. Its coefficient of determination was 999 918 ppm, which is higher than that of any of the regression models. This formally supports the adequacy of the selected polynomial. If the polynomial degree is increased from 4 to higher values, R2 increases naturally, but the polynomial approximates the interpolating polynomial, as is the case, for example, with Fourier and Chebyshev higher-order polynomials [49].

As in the above experiment, the numerical values were again re-discretized with a regular grid increment of 0.5 mm Hg and a total of 1246 points (data array 2), thus allowing comparisons of the R2 estimates obtained using data arrays 1 and 2.

As Table 2 demonstrates, the operation of taking logarithm present in the polynomial quite expectedly reduced the R2 values to some extent for the power equations and increased R2 for the equations based on an exponential function (Table 1) with the exception of the Khanin equation. However, the differences had almost no effect on the ranking of the regression models under study. Changes occurred only in positions 4 and 5 (Tables 1, 2). The Podrabinek–Kamenskii equation had a higher coefficient of determination compared with the Kislyakov equation in this case. The dendrogram of the ODC models also remained unchanged in structure (Fig. 2b).

Table 2. ODC-approximating functions with a physical content (results obtained with data array 2)

Thus, use of piecewise-polynomial smoothing or globally approximating polynomials does not affect the general ranking of the models and their clustering in dendrograms, indicating that consecutive binding of ligands is a more plausible scenario.

However, data that were processed digitally were used as reference values in the above experiments and data processing might add transformation errors, change the values of the coefficient of determination, and affect the model clustering pattern.

The initial data (data array 3) were therefore used without any treatment in the next experiment. This makes it possible to evaluate the stability of the analytical algorithm in the case of data that include stochastic errors and to avoid additional mathematical transformations if appropriate.

The results are shown as a ranking of the approximating functions (Table 3). The Vysochina and Khanin equations exchanged their places in this case. The change did not principally alter the general clustering pattern, as was expected from the results of the previous experiments. This finding supports the adequacy of our estimates.

Table 3. ODC-approximating functions with a physical content (results obtained with data array 3)

A cluster analysis was carried out with these data (Fig. 2c). The results showed that the dendrogram structure remained almost unchanged. Two clusters formed at a distance of 1260 points, one including models 1–4 (Table 3) and the other, model 5 (the Kislyakov equation). Unlike in the previous experiments (data arrays 1 and 2), the Podrabinek–Kamenskii model occurred in the power model cluster (the Adair, Bernard, and Hill models) rather than clustering together with the Kislyakov model. However, a relatively great distance (613 points) separates the Podrabinek–Kamenskii model from the power model cluster. This indicates, first, that stochastic errors are a factor and, second, that there is a certain similarity in describing the kinetics of oxygen binding by the hemoprotein between the Podrabinek–Kamenskii model and the Adair, Bernard, and Hill models, as already mentioned above.

As for the power models, the cluster structure, leaf positions, and the proportions and absolute values of distances between models were similar to those in the previous experiments (Figs. 2a, 2b). It should be noted that the distances observed between the Adair and Bernard models (56 points) and between the Adair and Bernard models and the Hill model (186 points) in the last experiment were lower than in the previous experiments (63 and 89 points for the former distance and 237 and 208 points for the latter distance). The difference in distance is probably determined by the difference in total point number: 1246 points (data arrays 1 and 2) vs. 65 points (data array 3). Then, the distance of 613 points might be even greater in the previous experiments if the Podrabinek–Kamenskii model clustered together with the power models.

CONCLUSIONS

Based on the experimental results, the Koshland–Némethy–Filmer model, which suggests a consecutive four-step binding of the ligand by an oligomeric protein, was found to best approximate the experimental data on oxygen binding by hemoglobin.

Our approach to evaluating the efficiency of ODC approximation can be used to solve similar problems, that is, to evaluate mathematical models with a physical content. This will eventually help to improve the efficiency of finding the most promising variants when designing modern molecular models and to more efficiently verify the templates, models, and schemes that exist in the field. We note that the cooperative binding of biologically active compounds with nucleic acids [6] is also possible to analyze using this method. Models developed in the field overlap the above models to a substantial extent [50, 51]. The basic mathematical constructs used to describe the ligand binding with biopolymers in this case include the Scatchard equations, which are designed to calculate the absorption isotherms; the Sips equations, which describe the heterogeneous ligand binding; and the results of theoretical studies by Tenford [41], Hill [54], Latt and Sober [55], and Crothers [56], who developed an approach to describing ligand absorption on linear polymers. Zasedatelev and colleagues [5763] and Lando, Teif, and colleagues [64–66] also performed important studies in the field and developed a modern approach to describing the cooperative binding of extended ligands to linear nucleic acid templates.

Our method to evaluate the efficiency of ODC approximation can additionally be used to solve problems of this kind in biomedicine, chemistry, and pharmacology.