Introduction

The survival of red blood cells (RBCs) has been studied for nearly a century [1] because of its importance in clinical medicine and translational research. To cite three examples, the mean RBC lifespan and production rate determine the steady-state hemoglobin (Hb) concentrations in both healthy and diseased individuals [2]; variation in mean RBC lifespan is sufficient to result in clinically important differences in HbA1c among diabetics with the same mean blood glucose level [3]; and reduced mean RBC lifespan contributes to the anemia of chronic renal failure [4, 5]. Equally important is the role of the RBC lifespan in establishing criteria for stored RBCs for transfusion, elucidation of the physiology and pathophysiology of erythropoiesis, and the design of therapies for anemia and other hematological conditions [6–8].

Confusion and a lack of proper quantification of various aspects of RBC survival still exist in the literature. Before discussing these issues we describe the two approaches, direct and indirect, commonly used to study RBC survival and the types of RBC samples used in such studies [9]. We then provide a brief literature review.

Direct and indirect models of RBC survival

The direct approach [9] involves observing and quantifying the disappearance of a population of labeled RBCs from the circulation. Direct models fall primarily into two categories: (1) empirical models for simple curve fitting [3, 5, 8, 10–14], and (2) “phenomenological” models accounting for macroscopic phenomena such as random destruction of the cells, lifespan-based elimination of the cells, neocytolysis (selective hemolysis of young RBCs under conditions of RBC excess when acutely exposed to increased tissue oxygen content), etc., and method-specific phenomena such as radioactive decay of the label, elution, vesiculation, etc. in the case of labeling by \(^{51}\)Cr [9, 10, 15–21].

Instead of the direct measurement of RBC survival, the indirect approach [9] relies on native Hb, glycosylated Hb, or other pharmacokinetic/pharmacodynamic (PK/PD) information over time to make inferences about the RBC lifespan [2, 9, 22–25]. Indirect models are lifespan-based compartmental PK/PD models (e.g., lifespan-based indirect response or LIDR models [26]), of which transit compartment (TC) models are a special case [27]. A series of transit compartments, each compartment with a PDF representing the lifespan distribution of the cells in that compartment [9, 27, 28], is a hallmark of TC models.

As the name indicates, LIDR models are “indirect” models. Direct models, in which the survival of labeled RBCs is tracked over time, are based on the residual survival function (SF), which is derived here as a consequence of the LIDR model.

Random and cohort samples

Direct RBC survival studies can be conducted in two ways depending on the type of sample of RBCs used [7]. A random (or population) sample consists of a mixture of RBCs of all ages, such as that obtained by a venous blood draw. In contrast, a cohort sample consists of RBCs that are all approximately the same age [7]. Random sample methods are more easily performed and thus more widely used in RBC survival studies [7, 14]. In such studies, RBCs in blood drawn from a donor are labeled and transfused into a recipient (possibly the same person), and their disappearance is followed [7]. The time at which the sample is collected is called the index time, and the corresponding RBC population (in the circulation plus in the sample) at that time is the index population.

For a random sample study, the sampling of blood and transfusion of labeled RBCs back into the subject should ideally happen at the index time and within a time interval insignificant in comparison to the shortest lifespan of all but a negligible fraction of the RBCs; further, the transfused RBCs should be distributed homogeneously in the circulation immediately. The concentration of labeled RBCs in the circulation at that instant would serve as the baseline concentration. In practice, the labeled RBC enrichment at day 1 is taken as the baseline to minimize artifacts from RBCs that are damaged and consequently removed during the first 24 h in circulation [29]. Blood samples are subsequently drawn at sufficient intervals after the labeled RBC transfusion to permit determination of RBC survival parameters. The ratio of the concentration of labeled RBCs in the later samples to the baseline concentration gives the survival data at each time. A plot of these measurements against time constitutes the random sample (residual) survival curve.

Labeling with radioactive \(^{51}\)Cr is the current de facto gold standard for RBC survival studies [14], despite the fact that it exposes subjects to radiation and despite the analytical complications due to radioactive decay, elution of \(^{51}\)Cr from cells, and loss of label by vesiculation [30]. Recently, Mock et al. [31] demonstrated that nonradioactive biotin-labeled RBCs (bioRBCs) can be used to obtain random sample survival data for normal adults, with results similar to those from \(^{51}\)Cr. Rather than measuring the radioactivity of hemoglobin-bound \(^{51}\)Cr in the blood samples, which is inevitably confounded by radioactive decay, vesiculation, and elution, individual cells are enumerated by flow cytometry after separation from unlabeled RBCs based on fluorescence intensity due to binding of fluorescent-labeled streptavidin to the biotin label on the RBCs. Thus, this method is free from the problems of elution and vesiculation. Moreover, biotinylation at lower densities appears not to affect the RBC lifespan [14].

Brief literature survey

Mathematical descriptions of RBC survival are broadly based on the direct or indirect method described above. Direct models are based on the theory for transfused RBCs developed early on [10, 32–34]. Some quantitative descriptions developed later focused on fitting a curve to the survival curve and quantifying the mean lifespan and half-life [5, 11, 15, 17, 18, 32]. Complicated phenomenological models were also used [10, 15–21, 30]. Some authors used simple linear or cubic curve-fitting of the survival curve to estimate the mean age at the time of labeling [3, 14]. Many studies use the maximum RBC lifespan (\(T_{max}\)), a quantity that is difficult to define in a plausible manner [3, 11, 13].

A direct PK/PD model including RBC survivor functions of various complexity was presented by Uehlinger et al. [25]. A new lifespan-based compartmental PK/PD model paved the way to an indirect and compartmental approach to quantify RBC/reticulocyte mean lifespan [35]. Earlier works based on this approach assumed that all cells in a compartment survived for a fixed time, which introduced a time delay in the input/output models, thus representing the lifespan of the cells in a compartment by a point distribution [2, 23, 26, 35–37]. The theory was expanded later to represent the RBC lifespan distribution by a possibly time-varying continuous PDF [21, 22, 38–40]. These indirect methods do not involve direct measurement of random sample cell survival data. However, Lledó-García et al. [9] presented a somewhat “hybrid” approach that used a TC model to estimate mean lifespan from a set of biotin-labeled random sample RBC survival data previously published by Cohen et al. [3].

Confusion persists in the literature, primarily because of a failure to distinguish between the various survival functions and survival parameters and because of subtle disconnects in the theoretical relationships that have been developed. Furthermore, some models have a large number of unknown parameters, which creates identifiability issues.

Objectives

In this paper, starting with a LIDR model based on an arbitrary lifespan distribution, we derive the relations between the residual lifespan distribution, which is the basis of observations in the direct method, and the full lifespan distribution; these are expressed through the survival functions corresponding to each distribution. Further, we derive survival functions for the current age distribution (distribution of ages in the index RBC population) and the excess lifespan distribution (distribution of remaining lifespan of cells in the index population that have survived beyond a specified time \(t_e\) past the index time). Other RBC survival descriptors (mean lifespan, mean current age, etc.) are then easily obtained.

We apply these results to analyze a set of bioRBC data [14] using the NLME framework. We consider three models, based on the Weibull, gamma, and lognormal distributions, respectively, for the full RBC lifespan. The PDFs for these distributions are given in Appendix 1. These distributions are widely used in reliability theory [41–43], survival analysis [44], and studies of aging and lifespan for both mechanical and biological systems using a reliability theory approach [45–49]. Although the Weibull is widely used as a failure distribution [41], aside from the papers by Friese et al. [22, 39], ours is the only use of the Weibull as an RBC lifespan distribution to our knowledge. In RBC survival studies the gamma distribution arises in transit compartment models [27] with a fixed, specified integer-valued shape parameter, a restriction we avoid in this work.

Based on observations of the residual survival function provided by the experimental data, the estimated structural parameters of each model are used to compute the survival functions for the full lifespan, current age, and excess lifespan distributions. From these it is easy to estimate many RBC survival descriptors at both the population and individual levels; as examples we use the mean lifespan, the standard deviation of the lifespan distribution, \(\tau _{_{95}}\) (as a surrogate for maximum lifespan), mean current age, and residual half-life. The definitions of these terms are provided later in the text.

The process of RBC aging and eventual death is of special scientific and clinical interest [50–52]. The RBC lifespan distribution may provide information of clinical and research relevance in understanding the type of red cell elimination taking place in normal individuals and in those with specific diseases. For example, these models can potentially be used to determine whether elimination is primarily driven by an intrinsic lifespan-based mechanism or is the result of extrinsic factors acting independently of the RBC lifespan.

Post-transfusion survival of stored RBCs has important implications for determining the shelf life of RBCs [8, 53]. The excess lifespan of RBCs can potentially be used to better quantify the post-transfusion survival of stored RBCs.

A potential use of the current age and excess lifespan distributions is to help identify morphological and/or physiological attributes of RBCs that correlate with the RBC age distribution, shedding light on the mechanism of RBC aging and eventual death in health and disease [54]. For example, it has been shown that the RBC surface area and hemoglobin amount are approximately lognormally distributed and that these distributions change in a time-dependent manner as the membrane continues to be shed [55]. Time-dependent changes in the density distribution of biotin-labeled sickle RBCs have also been studied [56]. If the current age or excess lifespan distribution could be measured accurately from such morphological or physiological attributes of RBCs, the RBC survival descriptors discussed in this paper could potentially be quantified from a single random sample of blood, instead of the multiple consecutive random samples over a period of up to 4 months that are required for the most commonly used residual lifespan data.

Mathematical model

The birth time of an RBC is the time when a reticulocyte transitions to a fully matured and hemoglobinized RBC after losing its ribosomes and mitochondria. By the death time of an RBC we mean the moment of elimination of the RBC from the circulation. At birth time \(t=-u\), each RBC is assumed to be endowed with a full lifespan \(\tau\). Consider the index time \(t=0\) when the RBC sample is collected and labeled. The age of an RBC in the index population at this time is the current age, \(\tau _c=u\), which is the amount of time it has already spent in circulation since its “birth.” The residual lifespan of the RBC is the time \(\tau _r\) that it remains in the circulation after the index time before its “death.” Thus

$$\begin{aligned} \tau =\tau _c+\tau _r. \end{aligned}$$

Given a fixed, specified time \(t_e\) after the index time, the remaining lifespan of an RBC that is still in the circulation at that time is the excess lifespan of the RBC, \(\tau _e=\tau _r-t_e\). Not all RBCs are born with the same lifespan. The full lifespan, and hence the residual lifespan, the current age, and the excess lifespan, can be represented by respective continuous probability density functions (PDFs). In general the RBC survival distributions for different people will not be the same.

We assume that the RBCs in healthy individuals survive independently of each other. We assume the internal environment of the study subject is stable in the sense that the RBC lifespan distribution is time-invariant and that there is no abnormal blood loss or other significant intercurrent event (e.g., events leading to hemolysis).

Full lifespan distribution

The (full) lifespan distribution in the population of RBCs in a given individual is represented by a PDF p(t), \(t \ge 0\). The corresponding full SF is the probability that the lifespan is \(>\) t; more explicitly,

$$\begin{aligned} \bar{P}(t)=\int ^\infty _tp(u) du. \end{aligned}$$
(1)

The mean lifespan \(\mu\) is then given by the well-known formulae

$$\begin{aligned} \mu =\int ^\infty _0 tp(t) dt=\int ^\infty _0 \bar{P}(t) dt. \end{aligned}$$
(2)
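The two expressions for the mean in Eq. (2) can be checked numerically. The following is a minimal Python sketch (the analysis in this paper was carried out in MATLAB), assuming a Weibull lifespan with illustrative shape and scale values that are not the fitted estimates; the helper names and the truncation point `upper` are choices made only for this example.

```python
import math

def weibull_sf(t, alpha, beta):
    """Full survival function: probability that the lifespan exceeds t."""
    return math.exp(-((t / beta) ** alpha))

def weibull_pdf(t, alpha, beta):
    """Weibull lifespan PDF p(t) with shape alpha and scale beta."""
    if t <= 0.0:
        return 0.0
    z = (t / beta) ** alpha
    return (alpha / t) * z * math.exp(-z)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative (assumed, not fitted) parameter values, in days.
alpha, beta = 5.0, 125.0
upper = 400.0  # the Weibull tail beyond this point is negligible for these values

# Left-hand form of Eq. (2): integral of t * p(t).
mean_from_pdf = trapezoid(lambda t: t * weibull_pdf(t, alpha, beta), 0.0, upper)
# Right-hand form of Eq. (2): integral of the survival function.
mean_from_sf = trapezoid(lambda t: weibull_sf(t, alpha, beta), 0.0, upper)
# Standard closed form for the Weibull mean, used here as a reference value.
mean_closed_form = beta * math.gamma(1.0 + 1.0 / alpha)
```

All three quantities should agree to within the quadrature error, which is what makes the area under a survival curve a convenient route to the mean.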

In principle, if we could isolate a sample of newly produced RBCs released into the circulation (i.e., a cohort sample), label them, and transfuse them into the recipient, the survival curve obtained by tracking the labeled RBCs would provide an estimate of the full SF \(\bar{P}(t)\).

RBC compartmental model

We start with the LIDR compartmental model previously studied by several other groups [26, 36–38, 40]. In brief the model specifies that

$$\begin{aligned} \dot{r}(t)=k(t)-\int _{-\infty }^t k(u) p(t-u)du, \end{aligned}$$
(3)

where r(t) denotes the number of cells in the RBC compartment at time t, k(t) is the RBC production rate, which is the rate of entry of cells into the RBC compartment, and p(t) is the full RBC lifespan PDF, which can be a completely arbitrary PDF on the positive real numbers. The integral term in Eq. (3) is the rate at which cells leave the compartment, i.e., the rate of RBC death due to senescence or hemolysis [40]. We assume k(t) is bounded and nonnegative, \(0\le k(t) \le M < \infty\) for all t, \(-\infty < t < \infty\), and that the mean lifespan \(\mu\) (Eq. (2)) is \(< \infty\). Integrating Eq. (3) from 0 to \(t > 0\), we obtain

$$\begin{aligned} r(t)-r(0)=\int _{0}^t k(u)du- \int _{0}^t \int _{-\infty }^v k(u) p(v-u)dudv. \end{aligned}$$

Because k(t) is bounded, both integrals are finite. After exchanging the order of integration in the double integral, letting \(r_0 = r(0)\) be the size of the index population (i.e., the RBC population that was present at time \(t_0 = 0\), which is the index time as described above), and some algebraic manipulation, we find

$$\begin{aligned} r(t)=r_0+r_0(t)+r_1(t)- \int _0^\infty k(-u) \bar{P}(u) du, \end{aligned}$$
(4)

where

$$\begin{aligned} r_0(t)=\int _0^\infty k(-u) \bar{P}(t+u)du \end{aligned}$$
(5)

and

$$\begin{aligned} r_1(t)=\int _0^t k(u) \bar{P}(t-u) du. \end{aligned}$$
(6)

The integral in (4) represents the RBC population due to past production, that is, during \(t < 0\), that is present at time \(t_0 = 0\). This can be seen by noting that \(k(-u)du\) is the number of cells produced during the time interval \(-u\) to \(-u + du\), and \(\bar{P}(u)\) is the fraction of cells that survive for time \(u > 0\), and the integral is the sum of these quantities over all \(u > 0\). For a person with a stable internal environment, this will be the RBC population at time 0, thus \(r_0\) and the integral cancel each other in Eq. (4) and we are left with

$$\begin{aligned} r(t)=r_0(t)+r_1(t). \end{aligned}$$
(7)

The term \(r_0(t)\) represents the part of the index population that has survived until time \(t > 0\), whereas \(r_1(t)\) is the portion of the RBC population that is present at time \(t > 0\) that was produced between times 0 and t.
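As a numerical illustration of Eqs. (5)–(7), the sketch below evaluates \(r_0(t)\) and \(r_1(t)\) by quadrature for a constant production rate, in which case their sum should stay at the steady-state value \(k\mu\). It is a minimal Python sketch under stated assumptions: the Weibull parameters, the unit production rate, and the truncation `horizon` for the infinite integral are all illustrative choices, not values from the study.

```python
import math

def weibull_sf(t, alpha, beta):
    """Full survival function; cells of non-positive age have all survived."""
    if t <= 0.0:
        return 1.0
    return math.exp(-((t / beta) ** alpha))

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def index_survivors(t, k, sf, horizon):
    """r_0(t) of Eq. (5): index-population cells still present at time t."""
    return trapezoid(lambda u: k(-u) * sf(t + u), 0.0, horizon)

def newly_produced(t, k, sf):
    """r_1(t) of Eq. (6): cells produced in (0, t) that are present at time t."""
    return trapezoid(lambda u: k(u) * sf(t - u), 0.0, t)

# Illustrative assumptions: Weibull lifespan in days, unit production rate.
alpha, beta = 5.0, 125.0
sf = lambda t: weibull_sf(t, alpha, beta)
k = lambda t: 1.0          # constant production rate, arbitrary units per day
horizon = 600.0            # truncation of the infinite upper limit in Eq. (5)
mu = beta * math.gamma(1.0 + 1.0 / alpha)

# Eq. (7): total population at several times after the index time.
totals = [index_survivors(t, k, sf, horizon) + newly_produced(t, k, sf)
          for t in (0.0, 30.0, 90.0, 150.0)]
```

Replacing `k` with a time-varying function shows how the two components respond separately to changes in past and future production.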

Residual lifespan distribution

The survival time after \(t = 0\) of a cell in the index population is called the residual lifespan of the cell. The residual survival function (RSF), \(\bar{P}_{r} (t)\), is the proportion of the index population that has survived at least until time \(t > 0\) (equivalently, the probability of residual lifespan \(> t\)), \(\bar{P}_{r} (t) = r_0(t)/r_0(0)\). From Eq. (5),

$$\begin{aligned} \bar{P}_{r} (t)= \frac{\int _0^\infty k(-u) \bar{P}(t+u)du}{\int _0^\infty k(-u) \bar{P}(u)du}. \end{aligned}$$
(8)

Survival curves reported in studies of random RBC samples are in fact estimates of the RSF.

The RSF depends on the past production rate k(t) for times \(t < 0\), as indicated in Eq. (8). If k(t) can be assumed constant for \(t < 0\) (which is consistent with the assumption that the subjects had a stable intravascular environment), \(\bar{P}_{r}\) reduces to the familiar form

$$\begin{aligned} \bar{P}_{r} (t)=\frac{1}{\mu } \int _t^\infty \bar{P}(u)du. \end{aligned}$$
(9)

Note the similarity between Eqs. (9) and (1). The corresponding PDF of the residual lifespan is

$$\begin{aligned} p_{r}(t)=\bar{P} (t) / \mu . \end{aligned}$$
(10)
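For readers who want the intermediate step, Eqs. (9) and (10) follow from Eq. (8) as sketched here, writing the constant past production rate as \(k_0\) (a symbol introduced only for this illustration), substituting \(v=t+u\) in the numerator, and using Eq. (2) in the denominator:

$$\begin{aligned} \bar{P}_{r}(t)=\frac{k_0\int _0^\infty \bar{P}(t+u)du}{k_0\int _0^\infty \bar{P}(u)du}=\frac{1}{\mu }\int _t^\infty \bar{P}(v)dv, \qquad p_{r}(t)=-\frac{d}{dt}\bar{P}_{r}(t)=\frac{\bar{P}(t)}{\mu }. \end{aligned}$$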

Integration by parts of \(\int _0^\infty \bar{P} _{r}(t)dt\) gives the mean residual lifespan \(\mu _{r}\), which is different from mean full lifespan:

$$\begin{aligned} \mu _{r}=\frac{E(\tau ^2)}{2\mu }=\frac{\sigma ^2+\mu ^2}{2\mu }, \end{aligned}$$
(11)

where \(E(\cdot )\) signifies expected value, and \(\mu\) and \(\sigma ^2\) are the mean and variance of the full RBC lifespan \(\tau\).
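Equations (9) and (11) can be checked against each other numerically. The following minimal Python sketch tabulates the RSF of Eq. (9) on an even grid and compares the area under it with the closed form of Eq. (11). The Weibull parameters, the grid size, and the truncation point are illustrative assumptions, and the second-moment formula \(E(\tau ^2)=\beta ^2\Gamma (1+2/\alpha )\) is the standard Weibull result.

```python
import math

def weibull_sf(t, alpha, beta):
    """Full survival function of a Weibull lifespan (shape alpha, scale beta)."""
    return math.exp(-((t / beta) ** alpha))

def residual_sf_grid(sf, upper, n):
    """Tabulate Eq. (9) on an even grid of n intervals over [0, upper].

    Returns the grid times, the RSF values, and the mean lifespan mu,
    which by Eq. (2) is the total area under sf.
    """
    h = upper / n
    times = [i * h for i in range(n + 1)]
    values = [sf(t) for t in times]
    tail = [0.0] * (n + 1)  # tail[i] approximates the integral of sf from times[i] to upper
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * h * (values[i] + values[i + 1])
    mu = tail[0]
    return times, [x / mu for x in tail], mu

# Illustrative (assumed, not fitted) parameter values, in days.
alpha, beta = 5.0, 125.0
times, rsf, mu = residual_sf_grid(lambda t: weibull_sf(t, alpha, beta), 400.0, 8000)

# Mean residual lifespan as the area under the RSF (trapezoidal rule on the grid).
h = times[1] - times[0]
mu_r_numeric = h * (sum(rsf) - 0.5 * (rsf[0] + rsf[-1]))

# Closed form of Eq. (11): E(tau^2) / (2 * mu).
second_moment = beta ** 2 * math.gamma(1.0 + 2.0 / alpha)
mu_r_closed = second_moment / (2.0 * beta * math.gamma(1.0 + 1.0 / alpha))
```

The cumulative construction costs a single pass over the grid, which avoids re-integrating the tail for every time point.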

Current age distribution

The members of the index population that have age \(> t \ge 0\) at time 0 are those that were produced at some time u before time \(-t\) and have survived for an additional time \(-u\); as in the case of Eq. (4), this is given by

$$\begin{aligned} \int _{-\infty }^{-t} k(u) \bar{P}(-u) du= \int _t^\infty k(-u)\bar{P}(u)du. \end{aligned}$$

The fraction of the index population of age \(> t\), namely,

$$\begin{aligned} \bar{P}_{c}(t)=\frac{\int _t^\infty k(-u)\bar{P}(u)du}{\int _0^\infty k(-u)\bar{P}(u)du}, \end{aligned}$$
(12)

is the SF of the current age distribution, which is the distribution of ages in the cell population at the index time \(t_0 = 0\). It reduces to Eq. (9) if the past production rate is constant; in this specific situation, but not in general, the current age distribution is the same as the residual lifespan distribution, and hence the mean current age \(\mu _c\) equals the mean residual lifespan \(\mu _r\).

Excess lifespan distribution

Consider the subpopulation of the index population that is still present at a specified time \(t_e > 0\); these are the cells that were produced at some time \(-u < 0\) and survived for a time \(t_e + u\). The excess lifespan distribution corresponds to the SF \(\bar{P}_{e}(t)\), which is the fraction of this subpopulation that is still present at time \(t_e + t\), where \(t > 0\) is called the excess lifespan. By an argument similar to that following Eq. (4), the cells still present at time \(t_e + t\) are those that were produced at some time \(-u < 0\) and have survived for time \(t_e + t+u\); the number of such cells is \(\int _0^\infty k(-u) \bar{P}(t_e+t+u)du\). The entire subpopulation present at time \(t_e\) corresponds to \(t = 0\) in this expression. Thus the fraction of those present at time \(t_e\) that are still present at time \(t_e + t\) is the ratio

$$\begin{aligned} \bar{P}_{e}(t)=\frac{\int _0^\infty k(-u)\bar{P}(t_e+t+u)du}{\int _0^\infty k(-u)\bar{P}(t_e+u)du}. \end{aligned}$$
(13)

This is the conditional probability that the residual lifespan of an RBC in the index population is \(> t + t_e\) given that it is \(>t_e\). Given a constant past production rate it reduces to

$$\begin{aligned} \bar{P} _{e}(t)=\frac{\int _{t+t_e}^\infty \bar{P}(u)du}{\int _{t_e}^\infty \bar{P}(u)du}=\frac{ \bar{P}_{r}(t_e+t)}{ \bar{P}_{r}(t_e)}, \end{aligned}$$
(14)

and this can be estimated from the observed residual survival curve. The corresponding PDF of excess lifespan distribution is given by

$$\begin{aligned} p_{e}(t)=\frac{ \bar{P}(t_e+t) }{ \int _{t_e}^\infty \bar{P}(u)du }. \end{aligned}$$
(15)

Note that, for \(t_e=0\), the excess lifespan SF is the same as the current age SF and consequently is the same as the residual lifespan SF in the case under consideration.
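The constant-production form in Eq. (14) is straightforward to evaluate numerically. The following minimal Python sketch computes the excess lifespan SF as a ratio of tail areas of the full SF. The exponential case is included only as a sanity check, since a memoryless lifespan must give the same SF for every \(t_e\); the Weibull parameters, the helper names, and the truncation points are illustrative assumptions.

```python
import math

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def tail_area(sf, t, upper):
    """Integral of the full SF from t to upper (upper truncates infinity)."""
    return trapezoid(sf, t, upper)

def excess_sf(t, t_e, sf, upper):
    """Eq. (14): fraction of cells present at t_e that are still present at t_e + t."""
    return tail_area(sf, t_e + t, upper) / tail_area(sf, t_e, upper)

# Sanity check with a memoryless (exponential) lifespan, mean 50 days.
exp_sf = lambda t: math.exp(-t / 50.0)
memoryless = excess_sf(30.0, 20.0, exp_sf, 2000.0)

# Illustrative Weibull lifespan (assumed shape 5, scale 125 days).
weib_sf = lambda t: math.exp(-((t / 125.0) ** 5.0))
fresh = excess_sf(30.0, 0.0, weib_sf, 400.0)   # t_e = 0: same as the RSF at 30 days
aged = excess_sf(30.0, 60.0, weib_sf, 400.0)   # survivors 60 days after the index time
```

For a lifespan distribution with an increasing hazard, such as this Weibull example, `aged` is smaller than `fresh`: cells that have already survived longer in the circulation have less remaining lifespan on average.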

Relation (2) holds for any probability distribution, including the full lifespan, residual lifespan, current age, and excess lifespan distributions. We can use Eqs. (8), (12), and (14) to compute the mean residual lifespan (\(\mu _r\)), mean current age (\(\mu _c\)) in the index population, and the mean excess lifespan (\(\mu _e\)) in the surviving part of the index population at any time \(t_e > 0\). These results generalize those of Lindsell et al. [13], without the questionable assumption of a stable, stationary, and closed population [57].

Methods and statistical analysis

We used a nonlinear mixed effects (NLME) model [58] to analyze data obtained from 8 healthy adults who received autologous biotin-labeled RBCs [14]. Parametric bootstrap analysis was performed to confirm the results of the NLME analysis because the number of subjects is small. The NLME model analysis was conducted using the nlmefit function in the Statistics toolbox of MATLAB software [59]. All computations and simulations were conducted using the same software. A personal computer with 4 GB of RAM and a dual-core Intel(R) Core(TM) i3 M 370 processor at 2.40 GHz was used for all computations.

We analyzed the lowest density bioRBC data from Mock et al. [14] as biotinylation at lower densities has negligible effect on the RBC lifespan [14]. Details of the experimental design and data collection are provided in [14]. Briefly, each of eight healthy subjects was transfused with autologous bioRBCs labeled at a biotinylation reagent density of 6 \(\upmu\)g/mL of packed RBC. Post-transfusion residual RBC survival data were collected. Since even microliter blood samples contain many RBCs, the laws of large numbers from probability theory imply that such samples will replicate the residual lifespan distribution of the whole circulation at a given time within measurement error; this is the basis of the statistical analysis of random blood samples. Since the flow cytometric instrumentation allows tracking of the disappearance of at least 97 % of bioRBCs [14], we excluded values of the survival curves smaller than 0.03. The bioRBC concentration at day 1 was used as the baseline value.
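The normalization and exclusion steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name, the input layout, and the example concentrations are hypothetical, and dropping any samples taken before the baseline day is an assumption of this sketch.

```python
def residual_survival_curve(days, concentrations, baseline_day=1, floor=0.03):
    """Convert labeled-RBC concentrations into an observed residual survival curve.

    Each value is divided by the concentration on the baseline day (day 1 here),
    and fractions below `floor` (0.03 here) are excluded.
    Samples before the baseline day are dropped (an assumption of this sketch).
    """
    baseline = concentrations[days.index(baseline_day)]
    curve = []
    for day, conc in zip(days, concentrations):
        if day < baseline_day:
            continue
        fraction = conc / baseline
        if fraction >= floor:
            curve.append((day, fraction))
    return curve

# Hypothetical example data: sampling days and bioRBC concentrations (arbitrary units).
days = [1, 14, 28, 56, 84, 112, 126]
concentrations = [2.0, 1.8, 1.5, 1.0, 0.5, 0.1, 0.04]
curve = residual_survival_curve(days, concentrations)
```

In this made-up example the last sample falls below the 0.03 threshold and is therefore left out of the curve.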

We performed separate analyses for the Weibull, gamma, and lognormal models. For each model, the nonlinear mean-value terms are the RSFs as in Eq. (9), but with the appropriate survival functions; thus,

$$\begin{aligned} \bar{W}_{r}(t;\alpha ,\beta )= & {} \frac{1}{\mu _w} \int _t^\infty \bar{W} (u;\alpha ,\beta )du,\\ \bar{G}_{r}(t;\alpha ,\beta )= & {} \frac{1}{\mu _g} \int _t^\infty \bar{G} (u;\alpha ,\beta )du,\\ \bar{L}_{r}(t;\alpha ,\beta )= & {} \frac{1}{\mu _l} \int _t^\infty \bar{L} (u;\alpha ,\beta )du, \end{aligned}$$
(16)

where \(\bar{W}_{r}\), \(\bar{G}_{r}\), and \(\bar{L}_{r}\) denote the RSFs, \(\bar{W}\), \(\bar{G}\), and \(\bar{L}\) denote the full SFs, and \(\mu _w\), \(\mu _g\), and \(\mu _l\) denote the mean full lifespans (see Appendix 1) in the Weibull, gamma, and lognormal models, respectively. The same letters \(\alpha\) and \(\beta\) are used for all of the Weibull, gamma, and lognormal parameters for brevity in presentation, but we emphasize that they are entirely unrelated. We refer to \(\alpha\) and \(\beta\) as the structural parameters on the original scale.

Just as \(\bar{P}_{r}(t)\) is the ratio of the surviving part of the index population relative to the size of the index population, the observed residual survival curve is the ratio of the measured concentration of the surviving part of the labeled RBCs at time \(t > 0\) relative to the concentration at index time \(t = 0\). In standard mixed effects model notation, the models in Eq. (16) are

$$\begin{aligned} \bar{W}_{ij}=\bar{W }_{r} (t_{ij};\alpha _i,\beta _i )+\epsilon _{ij},\\ \bar{G}_{ij}=\bar{G }_{r} (t_{ij};\alpha _i,\beta _i )+\epsilon _{ij},\\ \bar{L}_{ij}=\bar{L }_{r} (t_{ij};\alpha _i,\beta _i )+\epsilon _{ij},\\ i=1,\ldots ,M,\ j=1,\ldots ,n_i,\end{aligned}$$
(17)

where \(\bar{W}_{ij}\), \(\bar{G}_{ij}\), and \(\bar{L}_{ij}\) denote the observed residual survival curve corresponding to the Weibull, gamma, and lognormal models, respectively, \(M = 8\) is the number of subjects, \(n_i\) is the number of observations on the ith subject, \(t_{ij}\) is the time of the jth observation on the ith subject, and \(\epsilon _{ij}\) is an additive random error term, assumed normal \(N(0,\phi ^2)\) (we use \(\phi ^2\) to denote variance of the random error to not confuse with \(\sigma ^2\) used to denote the variance of the full lifespan distribution). Thus \(\bar{W}_{ij}=\bar{W}(t_{ij})\), for example, is the observed residual survival curve for the ith subject at the jth time using the Weibull model. To ensure the parameter estimates are \(>\)0 we write

$$\begin{aligned} \alpha _i=\exp (a_i),\ \beta _i=\exp (b_i) \end{aligned}$$
(18)

and rewrite the models as

$$\begin{aligned} \bar{W}_{ij}=\bar{W }_{r} (t_{ij};a_i,b_i )+\epsilon _{ij},\\ \bar{G}_{ij}=\bar{G }_{r} (t_{ij};a_i,b_i )+\epsilon _{ij},\\ \bar{L}_{ij}=\bar{L }_{r} (t_{ij};a_i,b_i )+\epsilon _{ij},\\ i=1,\ldots , M,\ j=1,\ldots , n_i,\ {\mathrm {and}} \end{aligned}$$
(19)
$$\begin{aligned} a_i = a_f + a_{ri},\ b_i = b_f + b_{ri}. \end{aligned}$$
(20)

We call \(a_i\) and \(b_i\) the individual structural parameters on the log scale; this well known “log trick” [58] allows the use of unconstrained optimization techniques. \(a_f\ (=\log (\alpha _f))\) and \(b_f \ (=\log (\beta _f))\) denote the fixed effects, representing the average values of the parameters (in log scale) over the whole population from which the subjects in the sample are drawn (the reference population), and \(a_{ri}\) and \(b_{ri}\) denote random effects, which represent the deviations of an individual’s parameters from the population average [58]. “Mixed effects” refers to the presence of both fixed and random effects in (20). The random effects vector \((a_{ri}, b_{ri})\) is modeled as multivariate normal, \(N({\mathbf {0,\Psi }})\), independent of the measurement errors \(\epsilon _{ij}\), and independent between subjects; likewise, the errors are assumed independent between and within subjects [58].
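The parameterization in Eqs. (18) and (20) can be illustrated by simulating individual parameters from assumed fixed and random effects. The sketch below is a minimal Python example, not the estimation procedure itself (which used MATLAB's nlmefit). For simplicity it assumes a diagonal \(\mathbf {\Psi }\), that is, independent random effects, and all numerical values and names are hypothetical.

```python
import math
import random

def individual_parameters(a_f, b_f, sd_a, sd_b, n_subjects, seed=0):
    """Draw individual (alpha_i, beta_i) pairs following Eqs. (18) and (20).

    a_f, b_f   : fixed effects on the log scale
    sd_a, sd_b : standard deviations of the random effects
                 (diagonal Psi, an assumption of this sketch)
    """
    rng = random.Random(seed)
    params = []
    for _ in range(n_subjects):
        a_i = a_f + rng.gauss(0.0, sd_a)   # Eq. (20): fixed effect plus random effect
        b_i = b_f + rng.gauss(0.0, sd_b)
        # Eq. (18): exponentiation guarantees positive parameters on the original scale.
        params.append((math.exp(a_i), math.exp(b_i)))
    return params

# Hypothetical population values: alpha_f = 5, beta_f = 125 days.
subjects = individual_parameters(math.log(5.0), math.log(125.0), 0.1, 0.05, 8)
```

Because the optimizer works with \(a_i\) and \(b_i\), which range over the whole real line, no positivity constraints are needed during fitting.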

For each model, the derived parameters for an individual are functions of the corresponding \(\alpha _i\) and \(\beta _i\), and the population-level derived parameters (fixed effects estimates) are functions of the corresponding \(\alpha _f\) and \(\beta _f\). For the full lifespan distribution, the mean full lifespan (or simply mean lifespan) \(\mu\) and the standard deviation \(\sigma\) are derived parameters that can be computed easily using the formulas provided in Appendix 1. Other derived parameters describing RBC survival of potential clinical importance are

  1. \(\tau _{_{95}}\), defined as the 95th percentile of the full lifespan distribution, a surrogate for the maximum lifespan,

  2. the mean current age, \(\mu _c\), defined as the mean of the current age distribution, which is equal to the mean of the residual lifespan distribution or the mean of the excess lifespan distribution at index time \(t=0\) assuming constant past production rate, and

  3. the residual half-life, \(T_{50}\), defined as the median of the residual lifespan distribution. Note that we use the term half-life irrespective of whether or not the distribution is exponential.

The values of \(\tau _{_{95}}\) and \(T_{50}\) have to be obtained numerically as they cannot generally be expressed in closed form; any modern statistical software has a facility to do this. An exception is that \(\tau _{_{95}}\) is easily obtained for the Weibull distribution: \(\tau _{_{95}} = \beta (\log (20))^{1/ \alpha }\) (all logarithms in this paper are natural logarithms). The closed form expressions for \(\mu _c\) for the corresponding PDFs are presented in Appendix 1. Formulas for mean residual lifespan and steady state mean full lifespan are provided in Appendix 1. Mathematical manipulations for efficient computation of the different survival functions based on the Weibull, gamma, and lognormal PDFs are also provided therein.
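The numerical step mentioned above can be sketched with a simple bisection. The following minimal Python example computes \(\tau _{_{95}}\) for a Weibull model both from the closed form given in the text and by root finding on the full SF, and computes \(T_{50}\) by root finding on the RSF of Eq. (9). The parameter values, helper names, and the choice of bisection are illustrative assumptions; any root finder would serve.

```python
import math

def weibull_sf(t, alpha, beta):
    """Full survival function of a Weibull lifespan (shape alpha, scale beta)."""
    return math.exp(-((t / beta) ** alpha))

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def residual_sf(t, sf, mu, upper):
    """Eq. (9): RSF under a constant past production rate."""
    return trapezoid(sf, t, upper) / mu

def solve_decreasing(g, target, lo, hi, iterations=60):
    """Bisection for g(t) = target, where g is decreasing on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative (assumed, not fitted) parameter values, in days.
alpha, beta, upper = 5.0, 125.0, 400.0
mu = beta * math.gamma(1.0 + 1.0 / alpha)
sf = lambda t: weibull_sf(t, alpha, beta)

tau95_closed = beta * math.log(20.0) ** (1.0 / alpha)   # closed form from the text
tau95_numeric = solve_decreasing(sf, 0.05, 0.0, upper)  # 95th percentile: SF = 0.05
t50 = solve_decreasing(lambda t: residual_sf(t, sf, mu, upper), 0.5, 0.0, upper)
```

The same `solve_decreasing` call works unchanged for the gamma and lognormal models once their survival functions are supplied.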

Results

Figure 1 shows the population fits for the 8 study subjects using the three different models. Residual survival data for each subject along with the respective individual fits using the Weibull, gamma, and lognormal models are shown in Fig. 2. The group values of root mean square errors (RMSEs) are 0.0120, 0.0126, and 0.0129 for Weibull, gamma, and lognormal models, respectively. Corresponding fractional RMSE (RMSE/average response value) are 0.023, 0.024, and 0.025 respectively. The log likelihood (logl) values are 358, 354, and 351 respectively. The RMSE and logl values are shown in Table 1.
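For concreteness, the two error summaries quoted above can be computed as in the following minimal Python sketch. It assumes the RMSE is taken over all observations with divisor \(n\) and that the "average response value" is the mean of the observed survival fractions; the exact conventions used in the analysis may differ, and the example numbers are made up.

```python
import math

def rmse(observed, fitted):
    """Root mean square error between observed and fitted values (divisor n)."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))

def fractional_rmse(observed, fitted):
    """RMSE divided by the average observed response value."""
    return rmse(observed, fitted) / (sum(observed) / len(observed))

# Made-up observed and fitted residual survival fractions.
observed = [1.0, 0.5]
fitted = [0.9, 0.6]
example_rmse = rmse(observed, fitted)
example_fractional = fractional_rmse(observed, fitted)
```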

Fig. 1 Fitted population-level residual survival curves for data of all 8 study subjects. Inset: tail region where the models differ the most

Table 1 Goodness-of-fit and model selection criteria

The individual weighted residuals, population weighted residuals, and conditional weighted residuals for all of the models were consistent with normality based on normal probability plots (not shown) and Shapiro–Wilk tests (\(P \ge 0.05\)). The population fit for the RSF along with the computed full lifespan, current age, and excess lifespan (at different times \(t_e\) after index time) SFs and corresponding PDFs using the gamma model are shown in Fig. 3 top and bottom panels respectively. Such a graph can easily be computed for individual subjects too (not shown). Similarly, the population and individual survival functions and the corresponding PDFs can easily be computed using the Weibull and lognormal models as well (not shown).

Population-level (fixed effects) estimates of the structural parameters on the original scale, \(\alpha _f\) and \(\beta _f\), for all the models are given in Table 2. For each of the three models, Table 3 gives maximum likelihood estimates (top), mean of the bootstrap estimates (middle), and the bootstrap \(95\, \%\) CI of the population derived parameters \(\mu\), \(\sigma\), \(\tau _{_{95}}\), \(\mu _c\), and \(T_{50}\). The predicted values of the individual structural parameters on original scale are shown in Table 4 and the values of the individual derived parameters of the RBC survival are shown in Table 5.

Table 2 Estimated population-level structural parameters \(\alpha _f\) and \(\beta _f\) on original scale (subscript f indicates fixed effects)
Table 3 Population estimates and confidence intervals for derived parameters
Table 4 Estimated individual-level (mixed effects) structural parameters \(\alpha _i\) and \(\beta _i\) on original scale
Table 5 Individual-level (mixed effects) estimates of derived RBC survival parameters

The range of individual \(\mu\) was 97.5–128.0 days for the Weibull, 99.6–128.4 days for the gamma, and 100.95–128.37 days for the lognormal model. The range of \(\sigma\) was 18.84–36.97 days for the Weibull, 18.86–35.59 days for the gamma, and 19.51–34.98 days for the lognormal model. For \(\tau _{_{95}}\) the ranges were 149.2–160.1, 155.3–164.5, and 157.14–165.72 days; for \(\mu _c\) they were 55.7–65.39, 56.18–65.59, and 48.62–63.50 days; and for \(T_{50}\) they were 50.1–64.0, 50.3–64.2, and 50.65–64.19 days, for the Weibull, gamma, and lognormal models, respectively.

Discussion

Various distributions have been used for the full RBC lifespan in the literature, including the homogeneous lifespan model [2, 23, 25, 40], in which each of the cells has the same fixed lifespan, and the Weibull [22, 39], gamma [40], and lognormal [60] distributions. A more complicated model, a mixture of two Weibulls (a so-called bathtub-shaped distribution), was used by Korell et al. [11, 19, 20, 30]. (Note that there is a misplaced minus sign in Eq. (1) in both [20] and [30].) By judicious choice of the parameters, the shape of this mixture can be made to mimic senescence, random destruction, and neocytolysis. This is a phenomenological model in the sense mentioned earlier.

In the present paper three separate models, using the Weibull, gamma, and lognormal distributions, were employed to analyze the bioRBC data from eight healthy adult subjects. The nonlinear mixed effects approach provided a framework for analysis of both population and individual variability of the structural and derived parameters. Because of the large number of parameters in their model, the authors in [11, 19, 20, 30] were unable to estimate all of them simultaneously and thus found it necessary to fix several of them in the analysis. By contrast, we were able to carry out complete analyses with the Weibull, gamma, and lognormal models, each with only two structural parameters, without fixing any parameter in advance.

The fits appear excellent visually (Figs. 1 and 2). The fractional RMSEs (RMSE divided by the average response value) of 0.023, 0.024, and 0.025 for the Weibull, gamma, and lognormal models, respectively, together with the fact that the residuals are consistent with normality, support the use of the NLME methodology.

Fig. 2 Fitted individual-level residual survival curves for data of all 8 study subjects

Fig. 3 Population survival functions (top panel) and PDFs (bottom) based on gamma model. For constant past production rate the residual lifespan, current age, and excess lifespan (for \(t_e = 0\)) SFs and PDFs coincide. Note the changes in excess lifespan SF and PDF depending on the choice of \(t_e\)

In addition to routine model diagnostics we performed a parametric bootstrap simulation (details in [21]) to confirm the results from the NLME software (MATLAB [59]), which are based on asymptotic theory for maximum likelihood and require the number of subjects to be large. The bootstrap simulation utilizes the NLME information from the reference population to generate a large number of virtual datasets that duplicate the random mechanism that generated the original data set. The confidence intervals (CIs) of structural parameters obtained from the bootstrap analysis are similar to those from the NLME analysis. This shows that parameter estimation by the NLME software is reliable even with the small number of subjects (\(M = 8\)) in our dataset. Since analytical expressions for the standard errors (SEs) of the derived parameters are not available, we also used the bootstrap results to quantify their SEs in order to compute the resulting \(95\,\%\) CIs (=estimate \(\pm\)1.96 \(\times\) SE).
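The last step above, turning bootstrap replicates of a derived parameter into a standard error and a normal-theory \(95\,\%\) CI, can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `bootstrap_normal_ci` is hypothetical, and the replicates are simulated placeholder numbers rather than study results.

```python
import random
import statistics

def bootstrap_normal_ci(estimate, boot_estimates, z=1.96):
    """Return (se, lower, upper).

    se is the sample standard deviation of the bootstrap replicates;
    the interval is estimate +/- z * se (z = 1.96 for a 95 % CI).
    """
    se = statistics.stdev(boot_estimates)
    return se, estimate - z * se, estimate + z * se

# Hypothetical illustration only: placeholder "replicates" of a derived
# parameter, drawn around an arbitrary value. In a real analysis each
# replicate would come from refitting the model to one virtual dataset.
rng = random.Random(0)
fake_replicates = [rng.gauss(100.0, 5.0) for _ in range(1000)]
se, lower, upper = bootstrap_normal_ci(100.0, fake_replicates)
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric about the point estimate by construction; a percentile-based bootstrap interval would be a different technique and is not shown here.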

All three models give physiologically plausible values for the parameters \(\mu\), \(\sigma\), \(\tau _{_{95}}\), \(\mu _c\), and \(T_{50}\) for both the population and the individual subjects. The agreement between the three models is almost exact for \(\mu\), \(\sigma\), \(\mu _c\), and \(T_{50}\), but the gamma and lognormal models give consistently greater values (approximately \(3\,\%\)) for \(\tau _{_{95}}\) than the Weibull. The reason is that the full lifespan distributions are very similar in their midranges but diverge towards the long survival tail, where the data are sparse, and \(\tau _{_{95}}\) is a property of the tail. We speculate that a data set with more reliable measurements in the tail region of the residual survival curve might provide a stronger basis for deciding which (if any) of the models is significantly better than the others. Techniques for more accurately enumerating bioRBCs after removal of \(95\,\%\) of the initial population are currently being developed by some of the authors (JAW, DMM, PV-P).

Because of the small sample size there is no statistical criterion by which to decide confidently which is the “best” of the three models entertained here. For some purposes the choice of any particular model is immaterial, and in this case the three models are about equally mathematically and computationally tractable; see Appendix 1. One advantage of the gamma model in certain applications is that a clean state space representation exists when \(\alpha\) is an integer. We presented such a gamma-based erythropoiesis model that is physiologically relevant and demonstrated its applicability using clinical data [61].

Recently, Lledó-García et al. [9] gave extensive comparisons of three common RBC lifespan models: homogeneous lifespan, random destruction (RD), and transit compartment (TC) models. The TC model contains a series of compartments connected by first-order cell transfer rates [27]. The TC model with one compartment is the RD model. A TC model with infinitely many compartments converges to the homogeneous lifespan model [27]. The TC model with 12 compartments was found by Lledó-García et al. to describe the data best based on likelihood functions (in the form of the objective function value or OFV, defined as minus twice the loglikelihood) and visual predictive checks. The mean full lifespan for the 12-compartment TC model was found to be 91.8 days for healthy subjects, which is substantially shorter than the normally accepted value of around 120 days.

There can be difficulties with likelihood-based model selection [62], especially for non-nested models [63] as occur in [9] and in the present paper. Unless the data satisfy the technical requirements, such as normality or independence, for several models, as must be determined from model diagnostics, a comparison based on likelihood may be questionable, as the likelihood function is then just a criterion function that is not the same for different models. Even when such requirements are met, it is not always clear what such a comparison means. To say, as is often done, that the model with the higher likelihood “explains the data better” is circular, because, without some discussion of the mechanism behind the data, the only sense in which the data are “explained” in this situation is by the higher likelihood. Unfortunately the probabilistic justification of a comparison by likelihood will also be inapplicable if the sample size is small, as is the case in some of the studies discussed in [9] and here.

The gamma model in the current paper is equivalent to a TC model with \(\alpha\) compartments if \(\alpha\) is a positive integer. For comparison we conducted NLME analyses of our bioRBC data with 1–50 compartments, that is, by fixing \(\alpha\) at each integer from 1 to 50 and estimating only \(\beta\) in each case. The smallest RMSE (= 0.0151), smallest fractional RMSE (= 0.029), and largest loglikelihood (= 336) all occurred for the 22-compartment TC model (\(\alpha\) = 22), as shown in Fig. 3. These values were less favorable than those for the unconstrained Weibull, gamma, and lognormal models; in particular, the loglikelihood for the lognormal model (= 351), which was the smallest loglikelihood for our three models, was considerably larger than that of the 22-compartment TC model.
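The equivalence between a gamma lifespan with integer shape and a TC chain can be checked numerically. The sketch below assumes that \(\beta\) is the scale parameter of the gamma distribution, so that each compartment empties at first-order rate \(1/\beta\); the paper's own parameterization is given elsewhere and may differ. The function names and the numerical values (22 compartments with a mean of 115 days) are illustrative choices, not fitted results.

```python
import math

def erlang_sf(t, n, beta):
    """Survival function of a gamma distribution with integer shape n, scale beta."""
    x = t / beta
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k          # builds x**k / k! incrementally
        total += term
    return math.exp(-x) * total

def tc_cohort_fraction(t_end, n, beta, steps=2000):
    """Fraction of a cohort still inside an n-compartment chain at time t_end.

    All cells start in compartment 1; each compartment transfers cells to the
    next at first-order rate 1/beta, and cells leaving the last compartment
    are removed. Integrated with the classical fourth-order Runge-Kutta method.
    """
    k = 1.0 / beta
    h = t_end / steps

    def deriv(y):
        d = [-k * y[0]]
        for i in range(1, n):
            d.append(k * (y[i - 1] - y[i]))
        return d

    y = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        k1 = deriv(y)
        k2 = deriv([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = deriv([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = deriv([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return sum(y)

# Hypothetical values: 22 compartments, scale chosen so the mean n * beta is 115 days.
n, beta = 22, 115.0 / 22
ode = tc_cohort_fraction(100.0, n, beta)
closed = erlang_sf(100.0, n, beta)
print(f"TC chain: {ode:.6f}, gamma SF: {closed:.6f}")
```

With `n = 1` the same functions reduce to the exponential survival of the RD model.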

It is accepted in the literature that RBC lifespan values are tightly dispersed around a mean value for healthy individuals. Many diseases affect RBC lifespan, e.g., sickle cell disease, diabetes, and chronic kidney disease. In such cases, information about the whole distribution of RBC lifespan, in particular the mean (\(\mu\)) and the standard deviation (\(\sigma\)), may be helpful in distinguishing between health and disease. To the best of our knowledge, simultaneous estimation of \(\mu\) and \(\sigma\) of the RBC lifespan distribution has not been done before. Typically, RBC survival studies have focused on quantifying an average description, e.g., mean age (i.e., mean current age according to our definition) of circulating RBCs [3], RBC survival (meaning the half-life of the \(^{51}\)Cr disappearance from the circulation) [5], half-life [4], mean lifespan [9, 11, 20, 64], mean potential lifespan (MPL) [14], mean remaining life span [12] (this is similar to mean excess lifespan at \(t_e=0\) according to our definition), etc.

The concepts of residual lifespan, current age, and excess lifespan have often appeared in the literature, but are not named explicitly and sometimes not treated rigorously [3, 8, 33, 53, 54, 57]. Outside of RBC survival literature, the concepts of current age and excess lifespan appear in renewal theory [65] but have not previously been used in compartmental models, which are conceptually different from renewal theory.

In the literature, the maximum RBC lifespan \(T_{max}\) is defined as the time when all of the labeled RBCs disappear from the circulation [3, 11, 13]. Taken literally, this means \(T_{max}\) is the time at which the residual survival curve decreases to zero. The determination of that time is highly dependent on the sensitivity of the method and associated analytical instrumentation because the proportion of labeled RBCs becomes very small towards the tail region. \(T_{max}\) is often estimated as the time at which a linear or nonlinear curve fitted to the residual survival curve intersects the time axis. When a linear fit of the entire residual survival curve is used, this intersection is actually an estimate of the mean lifespan rather than the maximum [31, 33]. When a nonlinear fit is used [3, 13], the estimate is strongly dependent on the last sample time of the residual survival curve. The final 2 points of the residual survival curve were extrapolated to the time axis to estimate the maximum RBC lifespan in [3]; such an estimate will also depend on the distance between the last two measurement points.

We used the 95th percentile \(\tau _{_{95}}\) of the lifespan distribution as a surrogate for the maximum lifespan; by definition, \(\tau _{_{95}}\) is the value such that \(95\,\%\) of the RBC lifespans are \(< \tau _{_{95}}\). (The Weibull, gamma, and lognormal distributions extend to infinity in the positive direction and hence have no true maximum value.) The choice of \(95\,\%\) is arbitrary and is used here only for illustration. Any other percentile could be used; in practice the choice would likely be dictated by the nature of the application. The estimated population values of \(\tau _{_{95}}\) are in physiologically plausible ranges.
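For the Weibull model the percentile has a closed form, which makes the definition concrete. The sketch below assumes the common shape/scale parameterization with survival function \(\exp (-(t/\beta )^{\alpha })\); whether this matches the paper's \(\alpha\) and \(\beta\) exactly is an assumption, and the parameter values are hypothetical, chosen only to land in a plausible range.

```python
import math

def weibull_sf(t, alpha, beta):
    """Weibull survival function, assuming shape alpha and scale beta."""
    return math.exp(-((t / beta) ** alpha))

def weibull_percentile(p, alpha, beta):
    """Lifespan value below which a fraction p of lifespans fall."""
    return beta * (-math.log(1.0 - p)) ** (1.0 / alpha)

def weibull_mean_sd(alpha, beta):
    """Mean and standard deviation of the Weibull lifespan distribution."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return beta * g1, beta * math.sqrt(g2 - g1 * g1)

# Hypothetical parameter values, not estimates from the study.
alpha, beta = 5.0, 125.0
tau95 = weibull_percentile(0.95, alpha, beta)
mu, sigma = weibull_mean_sd(alpha, beta)
print(f"mu = {mu:.1f}, sigma = {sigma:.1f}, tau95 = {tau95:.1f} days")
```

Changing `0.95` to any other fraction gives the corresponding percentile; for the gamma and lognormal models the percentile generally has to be found numerically.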

Recently, a new clinically relevant parameter to assess the quality of transfused red blood cells called mean remaining lifespan (MRL) has been introduced [12]. MRL is analogous to the area under the curve (AUC) or mean residence time (MRT) in PK studies. It is defined as the AUC of the fraction of the transfused RBCs remaining in the circulation versus time [12]. This is in fact an estimate of mean residual lifespan described in the current paper (\(\mu _{r}=\int _0^\infty \bar{P}_{r} (t)dt=\mu _{c}\)). Measurement of MRL as described in the paper [12] again runs into the problem of determining when the last of the transfused RBCs have left the circulation. For practical purposes, MRL is replaced by \(MRL_{0.95}\), which is defined as the area under the curve until the time, \(t_{0.95}\), when \(95\,\%\) of the transfused cells have disappeared from the circulation. \(t_{0.95}\) is estimated by interpolation of the curve [12]. The accuracy of \(MRL_{0.95}\) depends on the frequency of sampling around \(t_{0.95}\) and the last sample must wait until at least \(95\,\%\) of the transfused cells have disappeared, which takes approximately 3–4 months in healthy individuals.
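The identity between the AUC of the residual survival curve and the mean current age can be checked numerically. The sketch below assumes the standard steady-state form of the RSF under constant past production, namely the tail integral of the full-lifespan SF divided by \(\mu\), and uses the well-known renewal-theory value \((\mu ^2 + \sigma ^2)/(2\mu )\) for the mean current age. The Weibull parameterization and the parameter values are hypothetical.

```python
import math

def weibull_sf(t, alpha, beta):
    """Full-lifespan survival function, assuming shape alpha and scale beta."""
    return math.exp(-((t / beta) ** alpha))

def residual_sf_on_grid(alpha, beta, t_max, n):
    """Steady-state residual SF on a uniform grid of n + 1 points over [0, t_max].

    Uses RSF(t) = (1/mu) * integral from t to infinity of S(u) du, with the
    tail integral accumulated backwards by the trapezoidal rule and
    truncated at t_max (where S is negligibly small).
    """
    h = t_max / n
    s = [weibull_sf(i * h, alpha, beta) for i in range(n + 1)]
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * h * (s[i] + s[i + 1])
    mu = tail[0]  # the integral of S over [0, inf) is the mean full lifespan
    return h, [v / mu for v in tail], mu

def trapezoid(values, h):
    """Trapezoidal rule on equally spaced values."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

# Hypothetical parameter values, not estimates from the study.
alpha, beta = 5.0, 125.0
h, rsf, mu_num = residual_sf_on_grid(alpha, beta, t_max=400.0, n=40000)
auc = trapezoid(rsf, h)

# Closed-form Weibull moments for comparison.
g1 = math.gamma(1.0 + 1.0 / alpha)
g2 = math.gamma(1.0 + 2.0 / alpha)
mu = beta * g1
var = beta ** 2 * (g2 - g1 ** 2)
expected = (mu ** 2 + var) / (2.0 * mu)
print(f"AUC of RSF = {auc:.3f}, (mu^2 + sigma^2)/(2 mu) = {expected:.3f}")
```

Truncating the AUC at the time when the RSF reaches 0.05, as in \(MRL_{0.95}\), would give a smaller value than the full integral.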

As shown by Dornhorst [33], it follows from Eq. (9) that the mean RBC lifespan \(\mu\) is given by the negative reciprocal of the slope of the RSF at \(t = 0\), a result that holds for any lifespan distribution, assuming the past production rate is constant. Thus the MPL (mean potential lifespan) described in [14] is actually an estimate of \(\mu\) for each individual (note that we use the data from [14]). The average MPL for the eight subjects given in [14] was \(115 \pm 8\) days, which is very close to our population values of \(\mu = 116 \pm 3\) for Weibull and \(117 \pm 3\) for both the gamma and lognormal (mean \(\pm\) SE). Similarly, the time to disappearance of 50 % of the labeled RBCs from the circulation, denoted by \(T_{50}\) and given as 58 \(\pm\) 4 days in [14], is very close to the population residual half-lives \(T_{50}\) found here (58 \(\pm\) 1.5 for all models).
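The slope result can be sketched in one line, assuming that under constant past production the RSF takes the standard steady-state form (which we take to be the content of Eq. (9)). Here \(\bar{P}(u)\) denotes the full-lifespan SF, a notational assumption made for this sketch, with \(\bar{P}(0)=1\) and \(\mu =\int _0^\infty \bar{P}(u)\,du\):

\[
\bar{P}_{r}(t) = \frac{1}{\mu }\int _t^\infty \bar{P}(u)\,du
\quad \Longrightarrow \quad
\left. \frac{d\bar{P}_{r}}{dt}\right| _{t=0} = -\frac{\bar{P}(0)}{\mu } = -\frac{1}{\mu }.
\]

No property of the lifespan distribution other than \(\bar{P}(0)=1\) and a finite mean is used, which is why the result does not depend on the choice of model.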

Why use a complicated method like NLME when the simple MPL method gives similar results? There are both statistical and practical reasons. From the statistical point of view, first, the average MPL in [14] is the average of the individual MPLs obtained by simple linear regression for each subject separately. This approach yields an overestimate of the variability among individuals, and hence the SD of the MPLs, based on only 8 subjects, is likely too large. By contrast, NLME gives estimates and standard errors at both the population and individual levels [58]. Second, the MPL does not model the full lifespan distribution and, therefore, is incapable of providing quantitative information about other survival parameters such as the longest lifespans in the distribution as reflected in \(\tau _{_{95}}\). Information about the whole distribution may be helpful in distinguishing between health and disease (e.g., sickle cell disease). Third, the NLME framework provides standard methodology to include covariates such as sex, age, ethnicity, etc. in the analysis. Fourth, NLME is well-suited to the situation of sparse samples, which is usually the case in clinical settings. The results of an NLME analysis can be used to determine the minimum number, optimal timing, and last time point of measurements in a new subject when only a few measurements are possible [21, 66], e.g., in infants or sick patients.

Our results in Appendix 1 allow complete, rigorous, and computationally efficient analyses of the models considered here (parameter estimation took under a second for Weibull and just over a second for the gamma and lognormal models on a personal computer). Further, there are no constraints on parameters during the estimation process, as in other published models [9, 20, 30], or computational constraints, such as the inability of the software to handle more than 30 compartments described by Lledó-García et al. [9].

Conclusions

Starting with definitions of the index population, which has not previously been made explicit, and of birth time, death time, full lifespan, residual lifespan, current age, and excess lifespan, we provided mathematical descriptions of RBC survival parameters, which remove the lack of clarity often found in the literature. We exhibited the connections, not available previously in the literature, between a compartmental (or LIDR) model for the RBC population based on a given lifespan distribution and the residual lifespan, current age, and excess lifespan distributions. We gave analytical expressions for mean full lifespan, mean residual lifespan, mean current age, and steady-state mean full lifespan, and indicated how to compute RBC half-life and the 95th percentile of the RBC lifespan distribution (\(\tau _{_{95}}\)). The use of \(\tau _{_{95}}\) (or other percentiles) avoids the questionable concept of \(T_{max}\) often used in the literature.

Employing nonlinear mixed effects modelling, we analyzed residual survival data from biotin-labeled RBCs using models based on the Weibull, gamma, and lognormal distributions. The three models fit the data closely and gave equally physiologically plausible estimates of clinically interpretable RBC survival parameters at population and individual levels.

Our modelling framework could be useful in studying RBC lifespans in various diseases that affect RBC survival, especially in situations with strongly non-linear survival curves (e.g., sickle cell anemia). The model cannot be used if the assumption of “stability” of the subject’s internal environment is not satisfied, as may occur in the case of blood loss and/or other significant intercurrent events (e.g., events leading to hemolysis). The framework also lends itself to analyzing richer data sets containing covariates such as age, gender, and weight.