Introduction

Patient disease status is commonly assessed using abstract scores that represent a discrete set of values within a finite range. This type of data is referred to as "bounded outcome scores" (BOS) [1]. Examples include the widely known Child–Pugh score [2] and the Clinical Dementia Rating Sum of Boxes (CDR-SB), a composite score ranging from 0 to 18 based on domains of cognition and function [3]. Another frequently analyzed example in pharmacometric literature is the Psoriasis Area and Severity Index (PASI), a composite score ranging from 0 to 72, with 0.1 increments [4]. BOS, or their derived endpoints such as PASI 75/90/100, which indicate 75, 90, or 100% improvement from baseline PASI scores, respectively, are often used as endpoints in clinical trials.

Analyzing BOS data presents a unique challenge, as common practices often treat the data as different types, even within the same analysis. While BOS data are conceptually ordered categorical, they are often analyzed as continuous variables due to the large number of possible values (> 10). This approach can result in model predictions outside the natural range of BOS data and may introduce biases, as BOS data distributions often exhibit non-standard shapes such as J- or U-shaped, which violate the assumption of symmetry commonly associated with the normal distribution [1]. For longitudinal clinical trial data, even if the baseline distribution is considered symmetric, data distributions at later visits may become progressively more skewed as disease conditions improve over time with effective treatments [4]. In practice, BOS data typically reach the lower boundary but may occasionally reach the upper boundary. Describing BOS data accurately, especially for derived endpoints, can be challenging.

The key to analyzing BOS data lies in selecting an appropriate probability distribution for the data, potentially with transformations. Over the past two decades, various analysis methods have emerged, and a review of methods used in pharmacometric applications was published in 2019 [5]. Since then, additional applications have been reported in pharmacometric literature [3, 6,7,8,9]. A methodological elaboration has recently appeared [10].

The technical complexity, often arising from simultaneously treating BOS data as both continuous and categorical, can be daunting for non-statisticians, leading to confusion outside the statistical literature. In practice, the choice of analysis method often depends more on familiarity than on appropriateness. This manuscript aims to clarify the properties of the analysis methods and provide an updated summary based on the review in [5].

Analysis Approaches and Methods

To streamline the notation, it is assumed, without loss of generality, that the original BOS variable Y has been standardized using a linear transformation onto the closed interval [0, 1]. As a result, Y takes values of the form k/m, where k = 0, 1, …, m. This will result in slight differences in notation compared to the original method descriptions. A summary of the main BOS analysis methods used in pharmacometric applications is provided below, categorized according to their approaches.

Data Transformation

This approach may be the earliest used, with origins in psychological literature [11]. It may be motivated by the common perception that the main challenge in the analysis arises from the boundary data, as they hamper the use of commonly employed distributions on finite intervals, such as the beta distribution. This led to the idea of transforming the boundary data (0 or 1) to values within the interval (0, 1) using a linear transformation, such as Y* = Y(1 − δ) + δ/2 with a small correction factor δ, e.g., 0.01. Modeling the transformed data Y* leads to the two methods below.

Beta-Regression

This method models Y* using a beta distribution with density

$$\text{f}\left(\text{y}\right)=\Gamma \left(\alpha +\beta \right)/\left[\Gamma \left(\alpha \right)\Gamma \left(\beta \right)\right]{\text{y}}^{\alpha -1}{\left(1-\text{y}\right)}^{\beta -1}$$
(1)

where α, β > 0, and Γ denotes the gamma function. Note that a useful mean-precision parameterization is given by:

$$\begin{array}{cc}\alpha =\mu \phi ,& \beta =\left(1-\mu \right)\phi \end{array}$$
(2)

where μ is the mean, and ϕ is the precision parameter. Beta distributions can describe a variety of data skewness [12].
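For concreteness, a minimal Python sketch (not from the source; the function name is illustrative) of the mean-precision parameterization in Eqs. 1 and 2, mapped onto scipy's standard (α, β) parameterization:

```python
from scipy import stats

def beta_density(y, mu, phi):
    """Beta density of Eq. 1, parameterized by mean mu and precision phi (Eq. 2)."""
    alpha = mu * phi           # Eq. 2
    beta = (1.0 - mu) * phi    # Eq. 2
    return stats.beta.pdf(y, alpha, beta)

# A right-skewed shape (mean 0.2) versus a symmetric one (mean 0.5)
print(beta_density(0.1, mu=0.2, phi=5.0), beta_density(0.5, mu=0.5, phi=5.0))
```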

Logit-Normal-Regression

Similar in intuition to the beta-regression method, this method first applies the logit function logit(x) ≡ log[x/(1 − x)] to further transform the data to (−∞, ∞), and then models the transformed data with a normal distribution. Specifically, the model is logit(Y*) ~ N(p, σ2), where p is the model predictor on the transformed scale, and σ2 is the variance [3].
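As an illustration of the data transformation approach, a minimal Python sketch (illustrative values; not from the source) of the boundary shift and the subsequent logit transform:

```python
import numpy as np

def shift_from_boundary(y, delta=0.01):
    """Linear transformation Y* = Y(1 - delta) + delta/2, moving 0 and 1 into (0, 1)."""
    return y * (1.0 - delta) + delta / 2.0

def logit(x):
    return np.log(x / (1.0 - x))

y = np.array([0.0, 0.25, 1.0])                 # standardized BOS values, including both boundaries
print(logit(shift_from_boundary(y, 0.01)))     # boundary values map to finite numbers...
print(logit(shift_from_boundary(y, 0.001)))    # ...but drift toward +/- infinity as delta shrinks
```

The second print hints at the boundary behavior discussed next: the transformed boundary values depend entirely on the arbitrary choice of δ.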

Despite the intuitive appeal, the data transformation approach has an under-appreciated ill behavior at the boundary, namely that the boundary data can become arbitrarily influential as δ → 0 [4]. Therefore, the approach lacks statistical rigor [13]. In practice, its behavior will depend on the value of δ chosen by the analyst, which is difficult to determine a priori. This approach will not be discussed further in this manuscript.

Zero-Inflated

This approach treats data inside the boundary as continuous and the boundary data as categorical, by modeling not only the values within the boundary but also the probability of achieving the boundary. It has long been used in statistical literature [14]. Two methods applying this approach in pharmacometrics are given below.

Zero-Inflated-Beta

Recently, a zero-inflated beta distribution was applied to model the transformed PASI scores Y = PASI/72, such that the probability of observing Y = y is modeled as [8]:

$$\left\{\begin{array}{cc}p_0,&if\;y=0\\\left(1-p_0\right)f\left(y\right),&if\;0<y<1\end{array}\right.$$
(3)

where p0 is the probability at the boundary, and f(y) is given by Eq. 1. To allow dependence between p0 and f(y), p0 was further modeled as p0 = logit(ξ1 − ξ2μ), where μ is given in Eq. 2, and ξ1 and ξ2 are parameters to be estimated. Limited visual predictive checks (VPCs) [15] were also conducted for PASI scores as well as the derived endpoints of PASI 50/75/90/100.
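A minimal Python sketch of the zero-inflated beta likelihood in Eq. 3 (not from [8]); the link from μ to p0 is written here with an inverse logit so that p0 remains in (0, 1), and the exact link used in [8] should be checked there:

```python
import numpy as np
from scipy import stats

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def zib_likelihood(y, mu, phi, xi1, xi2):
    """Zero-inflated beta likelihood of Eq. 3 for one observation y in [0, 1)."""
    p0 = inv_logit(xi1 - xi2 * mu)              # boundary probability linked to mu (assumed inverse-logit link)
    if y == 0.0:
        return p0
    alpha, beta = mu * phi, (1.0 - mu) * phi    # mean-precision parameterization of Eq. 2
    return (1.0 - p0) * stats.beta.pdf(y, alpha, beta)

print(zib_likelihood(0.0, mu=0.3, phi=8.0, xi1=-1.0, xi2=2.0))
print(zib_likelihood(0.2, mu=0.3, phi=8.0, xi1=-1.0, xi2=2.0))
```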

Censoring

This method is motivated by the approach for analyzing concentration data below the quantification limit [9, 16], and may be viewed as a parsimonious sub-category of the zero-inflated approach. It treats the boundary data as lying beyond, but censored at, the boundary, and the data within the boundary as continuous. The Aranda-Ordaz link function, defined by

$$\text{x}=\text{h}\left(\text{y}\right)=\text{h}\left(y, \lambda \right)=\text{log}\left(\frac{{(1-y)}^{-\lambda }-1}{\lambda }\right)$$
(4)

where λ is a parameter to be estimated, was used to accommodate data skewness. A general nonlinear mixed-effects model X = p + gε was used, where p is the model predictor, g is a residual error standard deviation function, and ε is a normally distributed residual error. The conditional likelihood of an observation y on the continuous scale is given by

$${\left[\phi \left(\frac{h\left(y\right)-p}{g}\right)J\left(y,\lambda \right)\right]}^{I\left(y\in \left(\text{0,1}\right)\right)}{\Phi \left(\frac{{x}_{L}-p}{g}\right)}^{I\left(y=0\right)}{\left[1-\Phi \left(\frac{{x}_{U}-p}{g}\right)\right]}^{I\left(y=1\right)}$$
(5)

where ϕ is the normal density, Φ is the cumulative distribution function of ϕ, I is the indicator function, J(y, λ) = ∂h(y)/∂y is the Jacobian, and xL = h(0) and xU = h(1) are the transformed boundary values.
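A minimal Python sketch of Eqs. 4 and 5 (not from [16]); x_lo and x_hi are taken here as the censoring limits on the transformed scale, and the continuous branch is written as a proper density, i.e., including the 1/g scaling of the normal kernel:

```python
import numpy as np
from scipy import stats

def aranda_ordaz(y, lam):
    """Aranda-Ordaz transform of Eq. 4."""
    return np.log(((1.0 - y) ** (-lam) - 1.0) / lam)

def aranda_ordaz_jacobian(y, lam):
    """J(y, lambda) = dh/dy for the transform of Eq. 4."""
    return lam * (1.0 - y) ** (-lam - 1.0) / ((1.0 - y) ** (-lam) - 1.0)

def censoring_likelihood(y, p, g, lam, x_lo, x_hi):
    """Three-branch censoring likelihood of Eq. 5."""
    if y == 0.0:                                 # censored at the lower boundary
        return stats.norm.cdf((x_lo - p) / g)
    if y == 1.0:                                 # censored at the upper boundary
        return 1.0 - stats.norm.cdf((x_hi - p) / g)
    x = aranda_ordaz(y, lam)                     # uncensored, continuous branch
    return stats.norm.pdf((x - p) / g) / g * aranda_ordaz_jacobian(y, lam)
```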

These two methods are capable of describing skewed data distributions on the continuous scale, as suitable transformations may be found by visually inspecting observed data distributions. A drawback is that predictions for the in-boundary data will fall outside the original categories. While the predictions could be rounded to the nearest BOS data category [13], this would, in principle, result in a loss of information and could adversely affect analysis precision.

Latent Variable

This approach treats BOS data as ordered categorical, which aligns with the data's nature. The common statistical analysis methods of logistic and probit regression can be interpreted as positing an underlying latent variable [17] which, when crossing certain thresholds, causes the observed data to fall into the corresponding categories [18]. In the notation of this manuscript, the latent variable model may be written as:

$$Y = k/m \quad \text{if and only if} \quad a_k \le t(U) < a_{k+1}, \quad \text{for } k = 0, \dots, m$$
(6)

where U is the latent variable on the interval (0, 1), t is a possible transformation function, and {ak} are the thresholds. Selecting different distributions for U, along with t and {ak}, leads to different methods.

Ordered Categorical

The familiar logistic regression model is equivalent to setting t in Eq. 6 as the logit function, a0 = −∞, am+1 = ∞, and estimating the remaining {ak} as parameters, which correspond to the intercepts. Similarly, setting t as Φ−1, the inverse cumulative normal distribution function, leads to probit regression. In the past decade, it has been realized that logistic and probit regressions remain effective for analyzing BOS data with sufficient data sizes in all categories [19], which may occur with a moderate (e.g., 10–20) number of categories [13].
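A minimal Python sketch (not from the cited references) of the category probabilities implied by Eq. 6 for logistic or probit regression, with the interior thresholds playing the role of the estimated intercepts:

```python
import numpy as np
from scipy import stats

def ordered_cat_prob(k, thresholds, p, dist=stats.logistic):
    """P(Y = k/m) with the latent t(U) having location p and CDF `dist`;
    `thresholds` is (a_0, ..., a_{m+1}) on the transformed scale, with a_0 = -inf, a_{m+1} = +inf.
    dist=stats.logistic gives logistic regression; dist=stats.norm gives probit regression."""
    return dist.cdf(thresholds[k + 1] - p) - dist.cdf(thresholds[k] - p)

# Example with m = 3 (four categories) and three estimated intercepts
a = np.array([-np.inf, -1.0, 0.5, 2.0, np.inf])
print([ordered_cat_prob(k, a, p=0.3) for k in range(4)])   # the four probabilities sum to 1
```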

A larger number of categories can still hamper the analysis. In addition to selecting t in Eq. 6, fixing {ak} has led to the additional methods given below.

Logit/probit Normal

Two methods using the identity transformation function for t and a logit-normal distribution for U, i.e., logit(U) ~ N(p, σ2), where p is the model predictor on the transformed scale, and σ2 is the variance, are described below.

Coarsened Grid (CG)

This method is motivated by viewing BOS data as interval censored observations, with the intervals given by ak in Eq. 6, where ak = (k-0.5)/m, ak+1 = (k + 0.5)/m, and a0 = 0, am+1 = 1 [1]. The conditional likelihood of an observation Y = k/m is given by

$$\Phi \left(\frac{{z}_{k}^{(u)}-p}{\sigma }\right)-\Phi \left(\frac{{z}_{k}^{(l)}-p}{\sigma }\right)$$
(7)

where \({z}_{k}^{(l)}\) = logit(ak), \({z}_{k}^{(u)}\) = logit(ak+1). Note that the logit link is associated with logistic regression, while the probit link is associated with probit regression. In this sense, it may seem more natural to assume that U follows a probit-normal distribution instead, i.e., replacing the logit function with Φ−1 [5].
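A minimal Python sketch of the CG likelihood in Eq. 7 (not from [1]), using the interval limits a_k = (k − 0.5)/m with the boundary conventions a_0 = 0 and a_{m+1} = 1:

```python
import numpy as np
from scipy import stats

def logit(x):
    return np.log(x / (1.0 - x))

def cg_likelihood(k, m, p, sigma):
    """Coarsened-grid likelihood of Eq. 7 for an observation Y = k/m."""
    a_lo = max((k - 0.5) / m, 0.0)                    # a_0 = 0 at the lower boundary
    a_hi = min((k + 0.5) / m, 1.0)                    # a_{m+1} = 1 at the upper boundary
    z_lo = logit(a_lo) if a_lo > 0.0 else -np.inf     # logit(0) = -inf
    z_hi = logit(a_hi) if a_hi < 1.0 else np.inf      # logit(1) = +inf
    return stats.norm.cdf((z_hi - p) / sigma) - stats.norm.cdf((z_lo - p) / sigma)

# Example: PASI standardized to m = 720; an observation Y = 36/720
print(cg_likelihood(36, 720, p=-2.0, sigma=1.0))
```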

Additional flexible transformations have been used to accommodate skewed data distributions [20], including the Aranda-Ordaz and the Czado transformation below:

$$h\left(\text{x}, {\lambda }_{1}, {\lambda }_{2}\right)=\left\{\begin{array}{cc}\frac{{\left(\text{x}+1\right)}^{{\lambda }_{1}}-1}{{\lambda }_{1}}& if \text{x}\ge 0\\ -\frac{{\left(-\text{x}+1\right)}^{{\lambda }_{2}}-1}{{\lambda }_{2}}& if \text{x}<0\end{array}\right.$$
(8)

where λ1 and λ2 are parameters to be estimated.
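A small Python sketch of the Czado transformation in Eq. 8, written as a standalone elementwise function (illustrative only):

```python
import numpy as np

def czado(x, lam1, lam2):
    """Czado transformation of Eq. 8, applied elementwise."""
    x = np.asarray(x, dtype=float)
    pos = ((x + 1.0) ** lam1 - 1.0) / lam1            # branch for x >= 0
    neg = -(((-x + 1.0) ** lam2 - 1.0) / lam2)        # branch for x < 0
    return np.where(x >= 0.0, pos, neg)

print(czado([-0.8, 0.0, 1.5], lam1=0.5, lam2=2.0))
```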

As with logistic and probit regressions, CG naturally predicts data in their original BOS categories, which is an advantage over the zero-inflated approach. However, a difficulty, especially for those less familiar with categorical data analysis, is that understanding its ability to describe skewed data distributions may require more effort.

Bounded Integer (BI)

This method was motivated by an equidistant discretization of the cumulative normal distribution with the Z-values Z1/(m+1) to Zm/(m+1), where Zq denotes the q-quantile of the standard normal distribution [21], and has been used in further applications [6, 7, 22]. Assuming a general variance function g, the conditional likelihood of an observation Y = k/m is modeled as follows:

for k = 1, …, m-1,

$$\Phi \left(\frac{{Z}_{\left(k+1\right)/\left(m+1\right)}-p}{g}\right)-\Phi \left(\frac{{Z}_{k/\left(m+1\right)}-p}{g}\right)$$
(9)

for k = 0,

$$\Phi \left(\frac{{Z}_{1/\left(m+1\right)}-p}{g}\right)$$
(10)

and for k = m,

$$1-\Phi \left(\frac{{Z}_{m/\left(m+1\right)}-p}{g}\right)$$
(11)

A latent variable interpretation of BI, similar to that of CG, was also given in [21], but it was shown to be problematic [5]. On the other hand, comparing Eqs. 7 and 9 shows that, when g is a constant, BI is equivalent to setting logit(ak) = \({z}_{k}^{(l)}\)  = Zk/(m+1), or equivalently, ak = logit−1(Zk/(m+1)) in Eq. 6. As indicated under Eq. 7, using a probit instead of the logit link with CG would lead to using Φ−1(ak) instead of logit(ak) in Eq. 7, which in turn would lead to ak = Φ(Zk/(m+1)) = k/(m + 1). The difference between this and the choice of ak = (k − 0.5)/m appears subtle, thus BI and CG could be expected to perform similarly, especially when the number of categories is large. The symmetry of normal distributions could adversely affect the performance of BI and CG with skewed data [5].
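A minimal Python sketch of the BI likelihood in Eqs. 9–11 (not from [21]), with Z_q obtained from the standard normal quantile function:

```python
import numpy as np
from scipy import stats

def bi_likelihood(k, m, p, g):
    """Bounded-integer likelihood of Eqs. 9-11 for an observation Y = k/m."""
    z_hi = stats.norm.ppf((k + 1) / (m + 1)) if k < m else np.inf    # k = m: Eq. 11
    z_lo = stats.norm.ppf(k / (m + 1)) if k > 0 else -np.inf         # k = 0: Eq. 10
    return stats.norm.cdf((z_hi - p) / g) - stats.norm.cdf((z_lo - p) / g)

# The category probabilities sum to 1 (here m = 10)
print(sum(bi_likelihood(k, 10, p=0.2, g=1.3) for k in range(11)))
```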

Latent-Beta

This method sets t as the identity function in Eq. 6, i.e., no transformation. The latent variable U is assumed to follow the beta distribution given in Eq. 1, and ak = k/(m + 1). It emerged relatively recently from statistical literature [23], and has shown success in describing skewed PASI score data as well as derived endpoints (PASI 75/90/100), even in a challenging situation of subject population censoring where a subpopulation more sensitive to drug treatment was the focus [24, 25]. Note that to successfully describe derived endpoints, the model must accurately describe both the mean BOS trend and its variability [24]. While the latent-beta method has been shown to be successful in perhaps the most varied types of pharmacometric applications, it may also be the least intuitive, due to the relatively less frequent use of the beta distribution compared to others, such as the normal distribution.
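A minimal Python sketch of the latent-beta likelihood (not taken from [23,24,25]): with t the identity, ak = k/(m + 1), and U beta-distributed, Eq. 6 gives P(Y = k/m) as a difference of beta CDF values, which is also the computation in the NONMEM listing of the Implementation section below:

```python
from scipy import stats

def latent_beta_likelihood(k, m, mu, phi):
    """P(Y = k/m) = F_beta((k + 1)/(m + 1)) - F_beta(k/(m + 1)),
    with the beta distribution parameterized by mean mu and precision phi (Eq. 2)."""
    alpha, beta = mu * phi, (1.0 - mu) * phi
    return (stats.beta.cdf((k + 1) / (m + 1), alpha, beta)
            - stats.beta.cdf(k / (m + 1), alpha, beta))

# PASI on a 0.1 grid: m = 720, so an observed PASI score of 3.6 corresponds to k = 36
print(latent_beta_likelihood(36, 720, mu=0.05, phi=10.0))
```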

Combined Uniform Binomial

This approach also treats BOS data as ordered categorical, albeit with a different motivation than the latent variable approach. It contains a large class of methods, known in psychological literature as the CUB family, which combines a binomial distribution with a uniform distribution, typically with a mixture probability π [26]. The binomial distribution serves a similar role as in logistic regression, and the uniform distribution, termed the “uncertainty” component of the model, accommodates potential additional variability. The likelihood of an observation Y = k/m is given by

$$\pi {\xi }^{\text{k}}{\left(1-\xi \right)}^{\text{m}-\text{k}}\text{m}!/\left[\text{k}!\left(\text{m}-\text{k}\right)!\right]+\left(1-\pi \right)/\left(\text{m}+1\right)$$
(12)

where 0 ≤ π ≤ 1 and 0 < ξ < 1. In mixture-model terms, π and (1 − π) are the mixing probabilities for the binomial distribution and the discrete uniform distribution over the m + 1 categories, respectively. Letting F(k, m, ξ) denote the binomial cumulative distribution function, the cumulative distribution function of the CUB model is obtained directly from Eq. 12:

$$\text{prob}\;\left(\text{Y}\le \text{k}/\text{m}\right)=\pi \text{F}\;\left(\text{k}, \text{m}, \xi \right)+\left(\text{k}+1\right)\;\left(1-\pi \right)/\left(\text{m}+1\right)$$
(13)

It is noted that, as a special case of the above, Eq. 6 in [25] contained an error; the correct form can be derived by using k = 11 and m = 720 in Eq. 13. The CUB approach may suit certain types of survey data, but the uniform distribution may not be appropriate for representing the residual variability of disease severity scores [25]. This is because, given a model prediction, scores closer to the prediction should be more likely than scores farther away, which contradicts the assumptions of a uniform distribution.
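A minimal Python sketch of the CUB probabilities in Eqs. 12 and 13 (illustrative parameter values):

```python
from scipy import stats

def cub_pmf(k, m, xi, pi):
    """CUB probability of Eq. 12: a binomial(m, xi) component mixed with a
    discrete uniform over the m + 1 categories."""
    return pi * stats.binom.pmf(k, m, xi) + (1.0 - pi) / (m + 1)

def cub_cdf(k, m, xi, pi):
    """Cumulative probability of Eq. 13: prob(Y <= k/m)."""
    return pi * stats.binom.cdf(k, m, xi) + (k + 1) * (1.0 - pi) / (m + 1)

print(cub_pmf(3, 10, xi=0.4, pi=0.7), cub_cdf(3, 10, xi=0.4, pi=0.7))
```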

Implementation

An early implementation of the censoring method was carried out in SAS PROC NLMIXED [16]. To the author’s knowledge, all other pharmacometric applications were implemented in NONMEM [27]. This may be because the data likelihoods for all methods differ from those more routinely used in standard data analysis tasks, as can be seen from Eqs. 3–13. A particular example with clinical study data is when study entry criteria include requirements on the level of disease severity, and appropriately modeling such data requires baseline likelihood modification [25]. The implementation of such likelihood modification may be easier in NONMEM than in some other popular software in pharmacometrics. The NONMEM implementations for all methods were provided in the supporting materials of the respective references. As an example, the essential elements of the latent-beta implementation for PASI scores are provided below [24]:

  • $ABBR FUNCTION BETACDF(VQI,10); beta CDF

  • $ABBR VECTOR VQI2(10); required auxiliary function

  • $ERROR

    • PhiBeta = THETA(1); precision parameter of beta distribution

    • MuBlgt = ModPred; model prediction: include more THETA, ETA, etc

    • MuBeta = 1/(1 + EXP(-MuBlgt)); mean parameter of beta distribution

    • ALPHA = MuBeta*PhiBeta

    • BETAq = (1 - MuBeta)*PhiBeta

    • VQI(1) = DV*10/721; DV is observed PASI score (0 – 72, with 0.1 increments)

    • VQI(2) = ALPHA

    • VQI(3) = BETAq

    • CDFk = BETACDF(VQI); CDF(k), k ~ DV

    • VQI2(1) = (DV*10 + 1)/721

    • VQI2(2) = ALPHA

    • VQI2(3) = BETAq

    • CDFk1 = BETACDF(VQI2); CDF(k + 1)

    • Y = CDFk1 - CDFk; likelihood

Distribution Variance

Some of the methods mentioned above, as described in their original publications, incorporate a general standard deviation function g, which allows more flexibility than a constant σ. This may improve model fitting in the hybrid (zero-inflated or censoring) approach, e.g., by allowing the variance to increase proportionally with predictors that remain positive [16]. However, the benefit becomes dubious for the latent variable approach, particularly if the latent variable can be ≤ 0, rendering the common proportional and additive-plus-proportional error models ill-behaved [4]. It is also worth noting that the apparent heterogeneous variance seen when treating the data as continuous may not be observed when treating the data as ordered categorical.

Data Type and Model Comparisons

The definition of BOS implies that they are ordered categorical. However, in practice, BOS data appear in numerical forms, such as integers, allowing for numerical operations to derive endpoints like PASI 75, which is characteristic of continuous data. Indeed, this was a motivation for the BI method [21]. Additionally, different analysis approaches in practice treat data as different types. Therefore, it can be tempting to think of BOS data as both continuous and categorical, which is the source of many practical confusions. One such confusion is the use of likelihood-based criteria to compare models treating data as different types, which is inappropriate [10]. Another issue is the scale on which to evaluate the models, which will be elaborated later in this commentary. Note that while a continuous analysis model can be used to "predict" categorical outcomes, commonly by rounding [13], this essentially treats the continuous model predictor as a latent variable for the categorical data, as shown in Eq. 6. Since the model parameters are estimated assuming a continuous model distribution rather than a categorical model distribution, using the continuous analysis model for categorical data becomes ad hoc. Assuming a continuous distribution serves the purpose of facilitating useful analysis conclusions and should not be interpreted as the data truly following the assumed (misspecified) distribution. To avoid confusion, it is helpful to consider BOS data as ordered categorical, and any "continuous" properties attributed to the data as desirable properties for the analysis model. For more details, see [10].

Zero-Inflated vs Latent-Variable

The zero-inflated approach has the appeal that functional or transformation effects on the data can be easily seen on the continuous scale. For the latent variable approach, seeing how effects on the latent variable translate into predictions of the BOS data categories may require more effort. However, it is important to note that, conceptually, latent variables are continuous, allowing for the application of any type of model for continuous data. Therefore, the latent variable approach does not lose any flexibility.

It is worth noting that all analysis methods utilize at least two parameters: one for the mean and another for the variance of the distribution. However, the zero-inflated approach requires more parameters due to treating the boundary data differently. In particular, the zero-inflated-beta method uses two additional parameters to model data at one boundary. In contrast, the censoring method uses only one additional parameter, which also relates to the skewness of the remaining data. This is because the censoring method treats boundary data as censored versions of all data, considering them to be of a similar nature to data inside the boundaries. This preserves a link between data inside and on the boundaries.

Considering the latent variable interpretation of categorical data analysis [18], the zero-inflated approach can be seen as using predictors of data inside boundaries as latent variables that determine the probability of data on the boundary. It is important to note that for BOS data, a boundary is just a category like any other data category. This raises the question of why these data are treated differently, i.e., why not use the same latent variable for all categories. Indeed, this is achieved by the latent variable approach.

In principle, the latent variable approach is just as flexible as the zero-inflated approach, but it has the advantage of naturally predicting the data within its defined ranges. In this sense, the latent variable approach is superior. The fact that it is not widely used may be attributed to a lack of familiarity.

Categorical Analysis Methods

Among the main latent variable methods, the logit-normal methods CG and BI are expected to perform similarly, except when a substantial portion of the data is on the boundary. The latent-beta method may have advantages for highly skewed data, as beta distributions can effectively describe skewed data. To the author's knowledge, this is the only method that has consistently shown the ability to describe derived endpoints in addition to the BOS data. Additional transformations could potentially improve the performance of CG and BI with skewed data [20].

While there are reasons why the CUB approach may perform worse than the latent variable approach, it could still be a viable option when the nature of the BOS data is at least partly close to binomial, especially considering that BOS are often composite scores.

Finally, the ordered categorical method has been demonstrated to perform well with a sufficient sample size [19], which depends on the total number of observations and the minimum number of observations in each category [5]. With the many (= m) intercept parameters to distinguish all categories, the method is likely to be more accurate and robust [13]. This situation may occur more frequently with data that have a moderate (e.g., 10–20) number of categories.

More experiences are needed on the relative performance of these methods. In practice, these methods could be evaluated by AIC/BIC, and VPC [10].

Method Evaluation

Since BOS data are fundamentally ordered categorical, several important considerations arise. Firstly, it is worth noting that no practically useful residuals exist [28]. Additionally, there is a risk of overfitting the variability component, including the between-subject variability [18]. If desired, residuals obtained by treating the data as continuous can be used as a guide, especially as the number of categories increases. However, it is important to acknowledge that treating the data as continuous compromises the desired distributional properties. Similarly, metrics commonly used for continuous data analysis, such as mean prediction errors, become less useful.

Therefore, VPC becomes particularly valuable in this context. VPCs that treat the data as continuous can provide convenient general assessments [20], but they may introduce bias when comparing different methods that treat the data as different types [13]. Treating the data as ordered categorical in VPCs, preferably cumulatively [29], would align more consistently with the nature of latent variable models, but would be cumbersome. Hence, it is especially important to conduct VPCs on the quantities that are crucial for the specific application, including all derived endpoints of interest [10, 25].
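As an illustration of a VPC on a derived endpoint, a minimal Python sketch comparing the observed PASI 75 responder rate with its distribution across model-simulated replicates; the data below are hypothetical placeholders:

```python
import numpy as np

def pasi75_rate(baseline, post):
    """Proportion of subjects with >= 75% improvement from baseline PASI."""
    baseline = np.asarray(baseline, dtype=float)
    post = np.asarray(post, dtype=float)
    return np.mean(post <= 0.25 * baseline)

# Hypothetical observed data and simulated replicates (shape: n_replicates x n_subjects)
rng = np.random.default_rng(0)
baseline_obs = rng.uniform(12.0, 40.0, size=100)            # entry criterion PASI >= 12
post_obs = baseline_obs * rng.uniform(0.0, 0.6, size=100)
sims = baseline_obs * rng.uniform(0.0, 0.6, size=(200, 100))

obs_rate = pasi75_rate(baseline_obs, post_obs)
sim_rates = np.array([pasi75_rate(baseline_obs, s) for s in sims])
print(obs_rate, np.percentile(sim_rates, [5, 50, 95]))      # observed rate vs. simulation interval
```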

Another situation where confusion often arises in VPCs is when there are censored baseline observations. Clinical trial entry criteria often require a certain level of disease status, such as a PASI score ≥ 12. In practice, the censoring effect on baseline observations is often ignored during model fitting, and subjects with simulated baseline outcomes that violate the entry criteria are removed from VPCs. This discrepancy between the model and simulation can negatively impact VPC performance. It is more appropriate to adjust the data likelihood for baseline observations [25], which ensures accurate VPC evaluations.
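One standard way to carry out such a likelihood adjustment is to condition the baseline likelihood on the entry criterion; a minimal Python sketch using the latent-beta probabilities is given below (whether this matches the exact modification used in [25] should be checked there):

```python
from scipy import stats

def latent_beta_prob(k, m, mu, phi):
    """P(Y = k/m) under the latent-beta model (see the sketch in the Latent-Beta section)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return stats.beta.cdf((k + 1) / (m + 1), a, b) - stats.beta.cdf(k / (m + 1), a, b)

def truncated_baseline_prob(k, m, mu, phi, k_min):
    """Entry-criterion-adjusted baseline likelihood:
    P(Y = k/m | Y >= k_min/m) = P(Y = k/m) / P(Y >= k_min/m)."""
    a, b = mu * phi, (1.0 - mu) * phi
    p_at_least = 1.0 - stats.beta.cdf(k_min / (m + 1), a, b)   # P(U >= a_{k_min}) = P(Y >= k_min/m)
    return latent_beta_prob(k, m, mu, phi) / p_at_least

# PASI entry criterion >= 12 on the 0.1 grid (m = 720): k_min = 120
print(truncated_baseline_prob(150, 720, mu=0.25, phi=15.0, k_min=120))
```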

For general VPC usage, it has recently been pointed out that some common usages of pharmacometric interval terms, including confidence intervals and prediction intervals, are inconsistent with statistical terminology. A proposal to align with statistical terminology has been suggested [30]. Using terminology consistent with statistical literature can lead to more appropriate usage of VPCs by using confidence intervals constructed from observed data, such as those obtained by bootstrap, instead of the commonly used VPC intervals obtained by model-based simulations [31].

Practical Guideline

Considering the properties of the methods discussed, the following guidelines can help in selecting the appropriate analysis methods:

  • Standard Ordered Categorical: This is the best method when applicable, but only in the rare scenario where the sample size is large relative to the number of categories. The term 'sample size' here relates to the total number of observations as well as the minimum number of observations in each category.

  • Standard Continuous: Use this method if data distributions appear symmetric and the objective is to describe the observed data using a simple approach.

  • Zero-Inflated Beta or Censoring: Use these methods if data distributions appear skewed and the objective is to describe the observed data using a transformation that can be easily identified. The censoring method may be more parsimonious than the zero-inflated beta method.

  • CG or BI: Use these methods if data distributions appear symmetric or at most modestly skewed.

  • Latent Beta: This method may be widely applicable, especially if data distributions appear skewed.

Likelihood-based criteria, such as AIC or BIC, may be used to compare methods treating data as the same type. VPCs are recommended to evaluate model performance, especially for the endpoints of interest.

Conclusions

BOS analysis is technically complex, and the methods are evolving. While treating the data as continuous and attempting data transformations may seem intuitive, they can often lead to misleading results. Treating boundary data as a separate category may aid in data description, but it can create an artificial distinction between data on the boundary and data inside the boundary. Latent variable methods, on the other hand, naturally align with the nature of BOS data. These methods are not structurally more complex than alternative approaches and may offer the most potential. However, the choice of the best latent variable method may depend on the specific situation and requires further research.