Predictive Models for Trajectory Risks Prediction from Repeated Ordinal Outcomes

Chowdhury, Rafiqul I; Islam, M Ataharul

doi:10.1007/s40840-022-01277-1

Predictive Models for Trajectory Risks Prediction from Repeated Ordinal Outcomes

Published: 01 April 2022

Volume 45, pages 161–209, (2022)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Bulletin of the Malaysian Mathematical Sciences Society Aims and scope Submit manuscript

Predictive Models for Trajectory Risks Prediction from Repeated Ordinal Outcomes

Download PDF

130 Accesses
1 Citation
Explore all metrics

Abstract

This paper proposes new regressive proportional and partial proportional odds models and a framework to predict trajectories of repeated ordinal outcomes, which is a new development. We illustrated the proposed models using repeated ordinal responses on activities of daily living from older adults collected biannually through the Health and Retirement Study in the USA. The proposed framework uses the marginal and conditional modeling approach to obtain the joint model and predict the joint probability of a sequence of ordinal outcomes and trajectories. Besides, these models significantly reduce over-parameterization, as one needs to fit one model for each follow-up. This model allows assessing the effect of prior responses on current outcomes, including interaction terms among previous responses and between prior outcomes and covariates in the model. Also, it permits the varying number of risk factors for each follow-up. The prediction accuracy for full, training, and test data is close and varies between 0.91 and 0.94. The bootstrap simulation demonstrates the bias of parameter estimates, accuracy, and predicted joint probabilities are negligible with very low mean squared error. This model and framework would be instrumental in studying trajectories generated from longitudinal studies. The proposed framework can be used to analyze big data generated from repeated measures. This model readily uses a divide and recombine approach for big data in a statistically valid manner.

Robust Bayesian cumulative probit linear mixed models for longitudinal ordinal data

Article 04 May 2024

Modeling of Repeated Measures for Time-to-event Prediction

Dynamic Models for Longitudinal Ordinal Non-stationary Categorical Data

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Due to increased life expectancy in many countries, older adults are exposed to a higher risk of adverse health outcomes. Also, they are vulnerable to increased utilization of health care services and death [11]. For example, difficulty in the activity of daily living (ADL) may prospectively relate to the progression of functional limitations and disability among the elderly [5]. The Health and Retirement Study (HRS) is a nationally representative longitudinal survey in the USA and repeatedly measured ADL as ordinal outcomes. Individuals make transitions over time among different response categories, and a trajectory based on the series of events is helpful to understand the disease progression [4, 6, 12, 15, 40]. This is a large-scale longitudinal study with more than 20 years of follow-up data on approximately 20,000 people producing a huge data volume. Nowadays, due to the lower cost of data acquisition, large and complex longitudinal data captured, termed big data. As a result, there are new statistical challenges in methodology, theory, and computation to get vital insight, actual behavior, and make sense of this extensive complex data. A special issue on “The role of Statistics in the era of big data” of Statistics and Probability Letters in 2018 was devoted to the role of statistics in the era of big data. Modeling transitions over time among different response categories and predicting trajectory risks based on various risk factors would be difficult.

A growing area of interest is to predict the joint probability of a sequence of events (trajectory) based on a specified covariate vector [23, 29, 32, 35, 41, 42]. Modeling these sequences allows us to predict likely future outcomes. Specifically, interest might be in (i) What is the expected risk of having a condition of a patient based on previous responses and risk factors? (ii) What is the predicted risk of occurrence of a sequence of events based on specified features at different follow-ups? (iii) What would be the predicted outcome at the subsequent follow-up with specified values of covariates and previous outcomes? (iv) Also, in a follow-up, it is interesting to study the interaction effects between risk factors and outcomes of earlier follow-ups and interaction among previous events. Using the predicted risk of a sequence of outcomes, health care providers can screen individuals that would help them to suggest necessary therapy and preventive measures. A physician can recommend early or regular office visits or prescribe medication to prevent hospitalization based on a patient’s trajectory [37]. The risk prediction can also allow a patient to be aware of the future course of the disease [39].

The prediction of trajectories from a sequence of ordinal outcomes based on specified covariates is a great challenge. To predict the joint probability of a series of events, we need to examine the progression of responses during subsequent follow-ups using a joint model (multivariate) for ordinal outcomes. A multivariate approach is often complicated and would be challenging to develop for a large number of follow-ups [16]. The multistate higher-order Markov model (conditional model) can be used to study the underlying dependence in consecutive follow-ups [24]. Using this model, one can estimate the risk for a sequence of events [26]. However, these models are restricted for a large number of follow-ups due to over-parameterization, and one cannot assess the impact of the prior outcomes due to effective stratification [16, 25]. Also, it is not possible to include the interaction between responses from previous follow-ups and interaction between previous responses and risk factors in the model [13]. For many follow-ups, Markov models require data set with a big sample size throughout the follow-ups.

Figure 1 displays three repeated outcomes, each with three categories and twenty-seven possible trajectories (paths). In this case, one needs to fit a total of thirteen models; one marginal model from follow-up one (baseline), three first-order and nine second-order Markov models. Which could be computationally cumbersome and may explode for a large number of repeated responses [14, 17]. Another choice is the regressive logistic models under the Markovian assumption, which include both binary outcomes in previous times in addition to covariates in the conditional models [7, 8, 23, 36]. Islam and Chowdhury [23] developed a regressive logistic model to predict the joint probability of a sequence of binary outcomes based on specified covariates, which reduces the fitting of conditional models significantly. Chowdhury and Islam [13] extended this model for repeated multinomial responses. The ordinal logistic regression models with different variants are some approaches to model ordinal response [2, 3, 9, 19, 33, 34], for example proportional odds, partial proportional odds, continuation ratio, stereotype adjacent category, and baseline category models. However, these are univariate models only for the single ordinal outcome.

Against this backdrop, we proposed two regressive models for repeated ordinal outcomes and showed the joint model, which is a new development. First, we proposed a proportional odds regressive model (POM) for repeated ordinal outcomes. For POM, one needs to test the proportional odds assumption [9]. Second, in the case of violations of proportional odds assumption, we proposed a partial proportional odds regressive model (PPOM) for repeated ordinal outcomes. We also applied the multinomial regressive logistic model (MNOM) for repeated responses [13] by ignoring the ordinal nature of the outcome variables. Then we estimated the risk for a sequence of events for specified covariate values by linking marginal and conditional probabilities. We obtained the marginal probability using the outcome from the first follow-up, and the conditional probabilities are estimated from the subsequent follow-ups using the proposed regressive models. Using data partitioning (training and test data), we computed the prediction accuracy to check over(under)fitting. Furthermore, 10,000 bootstrap simulations are performed to assess the proposed models’ performance. Finally, we illustrated the proposed methods using follow-up data from the USA’s Health and Retirement Study (HRS).

2 Repeated outcomes and trajectories

Suppose $Y_1, Y_2$, and $Y_3$ are three repeated ordinal outcomes with three categories that may represent three states of ADL difficulty (0,1,2). Figure 1 displays the possible transitions between three outcome categories from three follow-ups. A total of twenty-seven distinct trajectories (paths) are possible. Here, the first column shows marginal probabilities, and the second and third are conditional probabilities.

2.1 Notations

Let ${Y_{i1}},{Y_{i2}},...,{Y_{iJ_i}}$ represent the past and present responses for ith subject at jth follow-up where $(i=1,2,...,n \text { and }j=1,2,...,J_i)$, $J_i$ is the number of follow-ups for subject i. For simplicity, subscript i is omitted in what follows next unless explicitly specified. Define, $Y_j=s$ where $(s=0,1,2,...,S)$ with $S+1$ outcome categories. The category 0 may denote non-event.

Following the notations used in [13] the joint probability mass function of $Y_1,Y_2,\cdots ,Y_J$ with covariate vector ${\varvec{X=x}}$ can be expressed as:

$$\begin{aligned} \begin{aligned} P(Y_1 = y_1,Y_2 = y_2,\ldots ,Y_J = y_J\mid {\varvec{x}})&=P(Y_1 = y_1\mid {\varvec{x}})\times P(Y_2 = y_2\mid y_1;{\varvec{x}})\\&\quad \times \cdots \times P(Y_J = s \mid y_1,\cdots ,y_{j-1};{\varvec{x}})\\&=P_{y_1}({\varvec{x}})\times P_{y_2.y_{1}}({\varvec{x}})\times \cdots \\&\quad \times P_{s.y_1}, \ldots ,y_{j-1}({\varvec{x}}), \end{aligned} \end{aligned}$$

(1)

where ${\varvec{X}}^\prime =[1,x_1,...,x_p]$ is vector of covariates for a subject at the first follow-up. It should be noted that ${\varvec{X}}={\varvec{x}}$ can be time dependent. Explanations of the functions of the right-hand side in Eq. (1) are as follows:

$P(Y_1 = s\mid {\varvec{x}})=P_s({\varvec{x}})$ is the marginal probability function of $Y_1$ conditional on ${\varvec{x}}$;

$P(Y_J = s\mid y_{j-1};{\varvec{x}})=P_{s.y_{j-1}}({\varvec{x}})$ is the probability function of $Y_j$ conditional on $y_{j-1}$ and ${\varvec{x}}$ of order one;

$P(Y_J = s\mid y_{j-1}, y_{j-2};{\varvec{x}})=P_{s.y_{j-1},y_{j-2}}({\varvec{x}})$ is the probability function for $Y_j$ conditional on $y_{j-1}, y_{j-2}$ and ${\varvec{x}}$ of order two;

$P(Y_J = s\mid y_{j-1},y_{j-2},\cdots ,y_1;{\varvec{x}})=P_{s.y_{j-1},y_{j-2},\cdots ,y_1}({\varvec{x}})$ is the probability function of $Y_j$ conditional on $y_{j-1},\cdots ,y_1$ and ${\varvec{x}}$ of order $k=j-1$.

The unconditional probability of the left-hand side of Eq. (1) is defined as:

$P(Y_1 = y_1,Y_2 = y_2,\cdots ,Y_J = y_J\mid {\varvec{x}})=P_{y_1,y_2,\cdots ,y_J}({\varvec{x}})$.

3 Models

3.1 Proportional odds model (POM)

McCullagh [33] proposed the proportional odds model (POM) model to analyze ordinal outcomes as a function of covariates. In this model, the coefficients that describe the relationship between lower-level versus all higher levels (thresholds or cut points) of the response variable are the same, which is the proportional odds assumption (parallel regression) and required to test [9]. We assessed the proportionality odds assumption using the Brant test [9]. POM-fitting using baseline outcome as a function of covariates will provide the marginal model.

Let, the outcome $Y_1$ having s categories $(s=0,1,\ldots ,S)$ with associated probabilities ${\pi _0+\pi _1+\cdots +\pi _S}$ and $P(Y_1 \le s)=\pi _0+\cdots +\pi _s$ where $P(Y_1 \le 0) \le P(Y_1 \le 1) \le \cdots \le P(Y_1 \le S)$. Then the proportional odds model can be shown as:

$$\begin{aligned} P(Y_1\le s\mid {\varvec{x}})=\frac{\hbox {exp}\left( \alpha _{s}-{\varvec{\beta _1^\prime }}{\varvec{X}}\right) }{1+\hbox {exp}\left( \alpha _{s}-{\varvec{\beta _1^\prime }}{\varvec{X}}\right) }, \quad s=1,2,\ldots ,S \end{aligned}$$

(2)

or equivalently can be expressed in logit form as

$$\begin{aligned} \begin{aligned} \hbox {logit}[P(Y_1\le s\mid {\varvec{x}})]&=\ln \bigg [\frac{\pi _0+\cdots +\pi _s}{\pi _{s+1}+\cdots +\pi _S}\bigg ]\\&= \alpha _s - \left( \beta _1X_1 +\cdots + \beta _pX_p\right) \\&=\alpha _s-{\varvec{\beta _1^\prime }}{\varvec{X}} \end{aligned} \end{aligned}$$

(3)

where $\alpha _s$’s are the threshold parameters (cut points) and ${\varvec{\beta _1}}=[\beta _1,\beta _2,\cdots ,\beta _p]^\prime $ is the vector of regression coefficients corresponding to the covariate vector ${\varvec{X}}=[X_1,X_2,\cdots ,X_p]^\prime $. This model assumes that the effects of the covariates are same for all categories (proportional odds). Then the marginal probability of sth category is

$$\begin{aligned} P_s({\varvec{x}})=P(Y_1=s\mid {\varvec{x}})=P(Y_1\le s+1\mid {\varvec{x}}) - P(Y_1 \le s\mid {\varvec{x}}), \quad s=0,1,\ldots ,S. \end{aligned}$$

(4)

3.2 Proposed kth-order proportional odds regressive model

Let $Y_1,\ldots , Y_J$ are repeated ordinal outcomes with s outcome levels ($s=0,1,2,\ldots ,S$). Then the proposed kth-order (k=j-1) proportional odds regressive model can be shown as follows:

$$\begin{aligned} \begin{aligned}&logit[P(Y_j\le s\mid {\varvec{z}})]= \alpha _{s.y_{j-1}} \\&\quad - \left( \beta _{j.y_{j-1}1}Z_1 +\cdots + \beta _{j.y_{j-1}p}Z_p+\beta _{j.y_{j-1}(p+1)}Z_{p+1}+\cdots \right. \\&\quad + \beta _{j.y_{j-1}(p+S)} Z_{p+S} +\beta _{j.y_{j-1}(p+S+1)}Z_{p+S+1} +\cdots \\&\quad +\beta _{ j.y_{j-1}(p+2S)} Z_{p+2S}+\cdots +\beta _{j.y_{j-1}[p+(j-2)S+1]}Z_{p+(j-2)S+1} +\cdots \\&\quad \left. +\beta _{j.y_{j-1}[p+(j-1)S]} Z_{p+(j-1)S}\right) =\alpha _{s.y_{j-1}}-{\varvec{\beta ^\prime }}_{j.y_{j-1}}{\varvec{Z}}, \quad s=1,2,\ldots ,S \end{aligned} \end{aligned}$$

(5)

where $\alpha _s$’s are the threshold parameters and

$$\begin{aligned} \begin{aligned}&{\varvec{\beta _{j.y_{j-1}}}}=\left[ \beta _{j.y_{j-1}1},\ldots , \beta _{j.y_{j-1}p},\beta _{j.y_{j-1}(p+1)}\ldots ,\beta _{j.y_{j-1}(p+S)},\beta _{j.y_{j-1}(p+S+1)}\right. \\&\quad \left. ,\ldots ,\beta _{j.y_{j-1}(p+2S)},\ldots ,\beta _{j.y_{j-1}[p+(j-2)S+1]},\ldots ,\beta _{j.y_{j-1}\left[ p+(j-1)S\right] }\right] ^\prime \end{aligned} \end{aligned}$$

(6)

is the vector of regression coefficients corresponding to the covariate vector

$$\begin{aligned} \begin{aligned} {\varvec{Z}}&= \left[ Z_1,\ldots ,Z_p,Z_{p+1},\ldots ,Z_{p+S},Z_{p+S+1},\ldots ,Z_{p+2S},\right. \\&\quad \left. \ldots ,Z_{p+(j-2)S+1},\ldots , Z_{p+(j-1)S}\right] ^\prime \\&=\left[ {\varvec{X^\prime }},{\varvec{D^\prime }}\right] \\&=\left[ X_1,X_2,\ldots ,X_p,D_{11},\ldots ,D_{1S},D_{21},\ldots ,D_{2S},D_{(j-1)1},\ldots ,D_{(j-1)S}\right] ^\prime . \end{aligned} \end{aligned}$$

(7)

here $D_{11},\ldots ,D_{1S},D_{21},\ldots ,D_{2S},\ldots ,D_{(j-1)1},\ldots ,D_{(j-1)S}$ are the dummy variables for categories $1, 2,\ldots ,S$ for $Y_1,\ldots Y_{j-1}$ with 0 as the reference category. Then the conditional probability of sth category is

$$\begin{aligned} \begin{aligned} P_{s.y_1y_2,\ldots ,y_{j-1}}({\varvec{z}})&=P(Y_j=s\mid y_1,y_2,\ldots ,y_{j-1};{\varvec{x}})\\&=P(Y_j\le s+1\mid y_1,y_2,\ldots ,y_{j-1};{\varvec{x}}) - P(Y_j \\&\le s\mid y_1,y_2,\ldots ,y_{j-1};{\varvec{x}}),\\&\quad s,y_1,\ldots ,y_{j-1}=0,1,\ldots ,S. \end{aligned} \end{aligned}$$

(8)

3.3 Partial proportional odds model (PPOM)

If the proportional odds assumption violates for predictors, then alternative models are unconstrained, constrained partial proportional odds [38], or multinomial logistic models among others [1, 20, p. 290–292]. The unconstrained partial proportional odds model allows non-proportional odds for a subset of q predictors ($q<p$, p is the total number of predictors) for those proportional odds assumptions violates. Then the marginal model using baseline outcome can be shown as:

$$\begin{aligned} \begin{aligned} P(Y_1\le s\mid {\varvec{x}})=\frac{\hbox {exp}\left( \alpha _{s}-{\varvec{\beta _1^\prime }}{\varvec{X}}-{\varvec{\gamma }}_s^\prime {\varvec{T}}\right) }{1+\hbox {exp}\left( \alpha _{s}-{\varvec{\beta _1^\prime }}{\varvec{X}}-{\varvec{\gamma }}_s^\prime {\varvec{T}}\right) }, \quad s=1,2,\ldots ,S. \end{aligned} \end{aligned}$$

(9)

or equivalently can be expressed in logit form as

$$\begin{aligned} \hbox {logit}\left[ P\left( Y_1\le s\mid {\varvec{x}}\right) \right] =\alpha _s-{\varvec{\beta _1^\prime }}{\varvec{X}}-{\varvec{\gamma }}_s^\prime {\varvec{T}} \end{aligned}$$

(10)

where $\alpha _s$ are the cut points, ${\varvec{T}}$ is the subset of covariate vector for which the proportional odds assumption is violated, and $\gamma _s$ is a vector of regression coefficients corresponding to the q covariates in ${\varvec{T}}$, ${\varvec{\beta _1^\prime }}$ is the vector of the regression coefficients of covariates those are not in q. Then using Eq. (4), we can obtain the marginal probability of sth category.

3.4 Proposed kth-order partial proportional odds regressive model

The kth-order partial proportional regressive model for $Y_1,\ldots ,Y_j$ can be shown as:

$$\begin{aligned} \hbox {logit}\left[ P\left( Y_j\le s\mid {\varvec{z}}\right) \right] =\alpha _{j.s}-{\varvec{\beta _{j.y_{j-1}}^\prime }}{\varvec{Z}}-{\varvec{\gamma }}_{j.s}^\prime {\varvec{T}} \end{aligned}$$

(11)

where $\alpha _{j.s}$ are the cut points, ${\varvec{T}}$, $\gamma _{j.s}$, ${\varvec{\beta _{j.y_{j-1}}^\prime }}$ are equivalent as explained in Eq. (10) and ${\varvec{Z}}$ is a covariate vector as defined in Eq. (7). The conditional probability of sth category can be estimated using Eq. (8).

3.5 Multinomial regressive logistic model

Chowdhury and Islam [13] showed kth-order multinomial regressive logistic model. The first-order multinomial regressive model $P(Y_2\mid y_1; {\varvec{z}})$ for outcomes $Y_1$ and $Y_2$ can be shown as:

$$\begin{aligned} P_{s.y_1}({\varvec{z}})=P(Y_2=s\mid y_1; {\varvec{z}})=\frac{e^{g_{s.y_1}({\varvec{Z}})}}{\sum \limits _{s=0}^S e^{g_{s.y_1}({\varvec{Z}})}} ,\quad s, y_1=0,1,\ldots ,S, \end{aligned}$$

(12)

$$\begin{aligned} \begin{aligned} \hbox {where } g_{s.y_1}({\varvec{Z}})&=\beta _{s.y_10}+\beta _{s.y_11}Z_1+\cdots +\beta _{s.y_1p}Z_p+\beta _{s.y_1(p+1)}Z_{p+1}+\cdots \\&\quad +\beta _{s.y_1(p+S)}Z_{p+S}, \quad s=1,\ldots ,S \text{ and } \end{aligned} \end{aligned}$$

${\varvec{Z^\prime }}=\left[ 1,Z_1,...,Z_p,Z_{p+1},\ldots , Z_{p+S}\right] $ $=\left[ {\varvec{X^\prime }},{\varvec{D^\prime }}\right] =\left[ 1,X_1,\ldots ,X_p,D_{11},\ldots , D_{1S}\right] $. Here $D_{11},\ldots , D_{1S}$ are the dummy variables for categories $1,\ldots , S$ of outcome $Y_1$ with 0 as the reference category and producing a total of $[(p+1)+S]S$ regression coefficients.

The first and all higher-order regressive models are equivalent to the corresponding marginal models. The regressive modeling approach requires fitting only one model for each repeated outcome by incorporating previous responses as covariates along with the risk factor. Besides, it allows the divide and recombines technique for large complex data. Therefore, one can run models for all the follow-ups in parallel using parallel programming and exploiting multiple processors. We can use R, SAS, STATA, or other software capable of fitting POM, PPOM, and MNOM. It is noteworthy that the regressive model for binary outcomes proposed by Islam and Chowdhury [23] and Bonney [7, 8] is special cases of the proposed regressive models shown in Eq. (12) for s=0,1.

3.6 Predictive models and joint probabilities

The log-likelihood function of the joint mass function in (1) can be obtained as:

$$\begin{aligned} \begin{aligned} l&=\sum \limits _{i=1}^{n}\sum \limits _{j=1}^{J} \ln P(Y_{i1} = y_{i1},Y_{i2} = y_{i2},\ldots ,Y_{iJ} = y_{iJ}\mid {\varvec{x}})\\&=\sum \limits _{i=1}^{n}\sum \limits _{j=1}^{J} \bigg [\ln P(Y_{i1} = y_{i1}\mid {\varvec{x}}) + \ln P(Y_{i2} = y_{i2}\mid y_{i1};{\varvec{x}}) \\&\quad +\cdots + \ln P(Y_{iJ} = s \mid y_{i1},\ldots ,y_{i(j-1)};{\varvec{x}})\bigg ]. \end{aligned} \end{aligned}$$

(13)

For each proposed model, differentiating the log-likelihood with respect to the parameters and equating the derivatives to zero, we obtain the equations whose solutions give the maximum likelihood estimates for parameters. The observed information matrix can be obtained by taking the second derivative. Then using the Newton–Raphson method, the estimates of the parameters are obtained. These provide the fitted models and are used for predictions.

We can predict the risks of a sequence of outcomes for a subject with specified covariate vector ${\varvec{X}}^*={\varvec{x}}^*$ for a particular trajectory as shown in Fig. 1.

The predicted joint probabilities of ${\hat{P}}(Y_1=y_1,Y_2=y_2,\ldots ,Y_j=y_j\mid {\varvec{x}}^*)$ can be obtained as:

$$\begin{aligned} \begin{aligned} {\hat{P}}\left( Y_1 = y_1,Y_2 = y_2,\ldots ,Y_J = y_J\mid {\varvec{x}}^*\right)&={\hat{P}}\left( Y_1 = y_1\mid {\varvec{x}}^*\right) \times {\hat{P}}\left( Y_2 = y_2\mid y_1;{\varvec{x}}^*\right) \\&\quad \times , \ldots , \times {\hat{P}}\left( Y_J = s \mid y_{j-1}, \ldots ,y_1;{\varvec{x}}^*\right) \\&={\hat{P}}_{y_1}\left( {\varvec{x}}^*\right) \times {\hat{P}}_{y_2.y_{1}}\left( {\varvec{x}}^*\right) \\&\quad \times \cdots \times {\hat{P}}_{s. y_{j-1}, \ldots ,y_1}\left( {\varvec{x}}^*\right) . \end{aligned} \end{aligned}$$

(14)

For simplicity, let two repeated outcomes $Y_1$ and $Y_2$ with categories $s=0,1$ and 2. Then using Eq. (14) the predicted joint probabilities $P(Y_1=y_1,Y_2=y_2 \mid {\varvec{x}}^*)$ is

$$\begin{aligned} \begin{aligned} {\hat{P}}_{y_1,y_2}\left( {\varvec{x}}^*\right)&={\hat{P}}\left( Y_1=y_1,Y_2=y_2 \mid {\varvec{x}}^*\right) ={\hat{P}}\left( Y_1=y_1 \mid {\varvec{x}}^*\right) \times {\hat{P}}\left( Y_2=s, \mid y_1; {\varvec{x}}^*\right) \\&={\hat{P}}_{y_1}\left( {\varvec{x}}^*\right) \times {\hat{P}}_{s.y_1}\left( {\varvec{x}}^*\right) , \quad s, y_1, y_2 = 0, 1, 2. \end{aligned} \end{aligned}$$

(15)

We can predict the marginal probabilities ${\hat{P}}_0({\varvec{x}}^*); {\hat{P}}_1({\varvec{x}}^*); {\hat{P}}_2({\varvec{x}}^*)$ from the fitted marginal model and the first-order conditional probabilities ${\hat{P}}_{s.y_1}({\varvec{x}}^*)$ from the fitted first-order regressive model using covariate vector ${\varvec{Z}}=[{\varvec{x}}^*,D_{11},D_{12}]^\prime $ where $D_{11},D_{12}=0,1$. For example, ${\hat{P}}_{1.0}({\varvec{x}}^*)$ and ${\hat{P}}_{2.0}({\varvec{x}}^*)$ are estimated using ${\varvec{Z}}=[{\varvec{x}}^*,0,0]^\prime $; ${\hat{P}}_{1.1}({\varvec{x}}^*)$ and ${\hat{P}}_{2.1}({\varvec{x}}^*)$ are estimated using ${\varvec{Z}}=[{\varvec{x}}^*,1,0]^\prime $; ${\hat{P}}_{1.2}({\varvec{x}}^*)$ and ${\hat{P}}_{2.2}({\varvec{x}}^*)$ are estimated using ${\varvec{Z}}=[{\varvec{x}}^*,0,1]^\prime $ and so on. Then the joint probabilities for two outcomes ${\hat{P}}_{00}={\hat{P}}_0\times {\hat{P}}_{0.0}$; ${\hat{P}}_{01}={\hat{P}}_0\times {\hat{P}}_{1.0}$ and ${\hat{P}}_{02}={\hat{P}}_0\times {\hat{P}}_{2.0}$ and so on.

4 Tests

4.1 Significance of the joint model

We can test the significance of the joint model using the likelihood ratio test between the joint constant only model (Red.) and joint full model (Full) as follows:

$$\begin{aligned} -2\left[ \ln L_{\text {Red.}}({\varvec{{\hat{\beta }}_0}})-\ln L_{\text {Full}}({\varvec{{\hat{\beta }}_1}})\right] \text{ is } \text{ distributed } \text{ asymptotically } \text{ as } \chi ^2_{(d)}. \end{aligned}$$

(16)

The degrees of freedom (d) for three models are as follows:

POM=$[\{(p+S)\}+\{(p+S+S)\} +\{(p+2S+S)\} +\cdots + \{p+(j-1)S\} +S]-jS$.

PPOM=$[\{(p^\prime +S)\}+\{(p^\prime +S+S)\} +\{(p^\prime +2S+S)\} +\cdots + \{p^\prime +(j-1)S\} +S]-jS$.

MNOM=$[\{(p+1)S\}+\{(p+1+S)S\} +\{(p+1+2S)S\} +\cdots + \{p+1+(j-1)S\} S] - jS$.

here ${\varvec{{\hat{\beta }}_0}}$ and ${\varvec{{\hat{\beta }}_1}}$ includes regression coefficients from the constant only joint model and the full joint model, respectively. Table 1 displays the degrees of freedom for different models.

Table 1 Number of parameters for different models

Predictive Models for Trajectory Risks Prediction from Repeated Ordinal Outcomes

Abstract

Similar content being viewed by others

Robust Bayesian cumulative probit linear mixed models for longitudinal ordinal data

Modeling of Repeated Measures for Time-to-event Prediction

Dynamic Models for Longitudinal Ordinal Non-stationary Categorical Data

1 Introduction

2 Repeated outcomes and trajectories

2.1 Notations

3 Models

3.1 Proportional odds model (POM)

3.2 Proposed kth-order proportional odds regressive model

3.3 Partial proportional odds model (PPOM)

3.4 Proposed kth-order partial proportional odds regressive model

3.5 Multinomial regressive logistic model

3.6 Predictive models and joint probabilities

4 Tests

4.1 Significance of the joint model

4.2 Test for order of the regressive model

4.3 Overfitting, underfitting and predictive accuracy

5 An illustration

5.1 Predicted trajectories

5.1.1 Impact of gender on trajectory

5.1.2 Impact of mobility index on trajectory

5.1.3 Impact of large muscle index on trajectory

5.1.4 Impact of large muscle index on trajectory with mild ADL difficulties

5.1.5 Impact of large muscle, mobility index, and previous outcomes on trajectory

6 Bootstrapping

7 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation