1 Introduction

Ordinal variables allow assigning numbers to classify characteristics of subjects into categories that are ordered in some meaningful way. There are two broad categories of ordinal variables: the first is a discretised version of a continuous variable, which is split into different intervals according to some specific criteria and where each interval corresponds to a discrete category. For example, patients can be grouped according to their hospital length of stay (LoS) into categories such as “short LoS”, “medium LoS” and “long LoS”. The other type of ordinal variable is originated by an assessing process which evaluates an indeterminate amount of information before providing a grade or score of the ordinal variable (Anderson 1984). Examples of this type of ordinal variable are the Lansky score (Lansky et al. 2006) and Barthel scale (Mahoney and Barthel 1965) to assess general well-being and performance in daily living activities for children with cancer and geriatric patients, the score systems for patient dependency at an intensive care unit (Flaatten et al. 2002) or the triage classification used in emergency departments.

Ordinal variables are very often objects of study. A classical approach is to create models to describe them in relation to other variables by looking for rules of classification based on data. In this context, there are a variety of models that can accommodate ordinal variables, including the classical and well-established logistic regression models. However, in practice, the range of models that can be used is not widely known. Consequently, it is very common to apply logistic models designed for nominal variables (unordered categories) to ordinal variables (Keski-Rahkonen et al. 2003; Takazawa et al. 2003; Newman et al. 2005; Mcelroy et al. 2002; Walston et al. 2002; Mäntyselkä et al. 2003). Another common approach is to amalgamate adjacent categories of the ordinal outcome into two broad categories and use standard binary logistic regression (Bender and Grouven 1998). However, this often results in a loss of information, descriptive detail and statistical power (Ananth and Kleinbaum 1997; Armstrong and Sloan 1989). Amalgamating adjacent categories could be acceptable if, when estimating a binary logistic regression on the categories that one wishes to combine, all the slopes in the model are found to be simultaneously equal to zero.

The purpose of this tutorial is to provide an introduction to models from the family of logistic regression that are suitable for the analysis of ordinal data, putting particular emphasis on understanding how each model can be used to answer different research questions. The logistic regression models considered here are: the ordinal regression model, continuation ratio model, adjacent category model, generalised ordered logit model, sequential model, multinomial logit model, partial proportional odds model, partial continuation ratio model and stereotype ordered regression model.

2 Data

The illustrative example of an ordinal outcome variable in this tutorial is a discretised version of LoS, whereby patients were classified according to three categories: “short LoS” (patients with LoS up to 3 days), “medium LoS” (patients with LoS from 4 to 11 days) and “long LoS” (patients with LoS from 12 days onwards). This classification was made based on empirical observation and personal judgment. Figure 1 shows the distribution of the continuous variable LoS, and its basic descriptive statistics are given in Table 1. Figure 2 depicts the discretised version of LoS split into three categories, where in accordance with Fig. 1, the vast majority of patients have a short LoS and just few of them experience a long LoS.
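The discretisation rule described above can be sketched in a few lines of Python. This is only an illustration of the cut-points stated in the text (up to 3 days, 4 to 11 days, 12 days onwards); the function name `los_category` is hypothetical, and LoS is assumed to be recorded in whole days.

```python
def los_category(days):
    """Map a length of stay (whole days) to the three categories of this tutorial."""
    if days <= 3:
        return "short"    # LoS up to 3 days
    if days <= 11:
        return "medium"   # LoS from 4 to 11 days
    return "long"         # LoS from 12 days onwards


# Illustrative values only, not taken from the hospital data set.
print([los_category(d) for d in (1, 3, 4, 11, 12, 40)])
```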

Fig. 1 Distribution of the LoS as continuous variable

Table 1 Descriptive statistics for LoS

Fig. 2 Distribution of the LoS as discretised variable

Potential predictor variables of a patient’s LoS category are described in Table 2. The data were extracted from routine patient records in a general public hospital in Mexico. The hospital belongs to the Secretariat of Health, is located in the heart of an urban area and is open to the general population, making it the preferable option for people who cannot afford private medical services or who are not affiliated to another healthcare provider. It is a 148-bed second-level hospital, which means it offers outpatient walk-in clinics and hospitalisations for basic medical specialties, such as adult medicine, paediatrics, obstetrics and gynaecology, and general surgery. Hospitals that correspond to this level of care have operating rooms and equipment suitable for performing surgery of low and medium level of complexity.

Table 2 Description of variables

The data correspond to almost 13,300 patient records from the years 2005 to 2009. The variables “diagnosis” and “surgical procedure” originally contained, respectively, around 800 and 200 different ICD codes (International Classification of Diseases codes version 10 for diagnoses and version 9 for surgical procedures), which would complicate their inclusion for further statistical analysis. To reduce the number of ICD codes of such variables, we used hierarchical cluster methods based on the \(\chi^{2}\) dissimilarity measure (Rezanková 2009). The diagnosis codes were grouped into five clusters using the complete linkage algorithm (Defays 1977), and the surgical procedure codes were grouped into four main categories using Ward’s algorithm (Ward 1963). For more details, the reader is referred to Guzman Castillo (2012). Tables 3 and 4 contain some examples of the most common ICD codes for each diagnosis and surgical procedure category.

Table 3 Some common diagnoses within each of the three diagnosis categories
Table 4 Some common surgical procedures within each of the four surgical procedure categories

3 Models

Fullerton (2009) presented a classification of models for ordinal data based on how they deal with the proportional odds assumption or parallel regression assumption. Most of the models presented here can be thought of as J − 1 simultaneous binary logistic regressions, where J is the number of categories of the ordinal dependent variable. The parallel regression assumption means that the beta coefficients are equal across the simultaneous regressions. Assuming the equality of slopes among categories allows the models to be interpreted in the same way for all categories, which yields more parsimonious models.

The models that are based on that assumption are the ordinal regression model, the continuation ratio model and the adjacent category model. However, there is a general consensus that this assumption is quite stringent and that it is likely to be quite rare for all the independent variables in the model to have identical slope coefficients (Lall et al. 2002). Consequently, other models have been presented in the statistical literature as alternatives.

The generalised ordered logit, the sequential and multinomial models have all slope coefficients not delineated by the parallel regression assumption. One of the drawbacks of these models is that they include many more parameters, as a result of setting free all variables from parallel line constraints. Although it is very common to find that the parallel regression assumption has been violated, usually not all the slope coefficients of the model transgress the assumption.

The partial proportional odds model, partial continuation ratio and stereotype ordered model are models that impose constraints for parallel lines only where they are needed, i.e. some slope coefficients can be the same for all the J categories, while others can differ, hence avoiding including unnecessary extra parameters in the model.

This selection of models is by no means comprehensive. In particular, one can find other partially constrained models for the ordinal regression, the continuation ratio and the adjacent category models (Cole and Ananth 2001; Hauser and Andrew 2006; Fullerton 2009).

Following this classification, the definition of the models in the next section has been grouped into models that hold the parallel regression assumption for every independent variable, for no independent variables and for some independent variables.

3.1 Parallel assumption for every independent variable

3.1.1 Ordinal regression model

The ordinal regression model (ORM), commonly known as the cumulative odds model (Walker and Duncan 1967) or proportional odds model (McCullagh 1980), was the first model developed exclusively for ordinal outcomes. The ORM can be defined as a probability model:

$$\ln \left( {\frac{\Pr (y \le j|\varvec{x})}{\Pr (y > j|\varvec{x})}} \right) = \tau_{j} - \varvec{x}\beta ,\quad j = 1, \ldots , J - 1,$$

where x is the vector of independent variables, β is the vector of slope coefficients, \(\tau_{j}\) are the thresholds, and J is the number of categories of the ordinal dependent variable. The predicted probabilities of belonging to a certain category are defined as:

$$\begin{aligned} \Pr \left( {y = 1|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{1} - \varvec{x}\beta } \right)}}{{1 + \exp \left( {\tau_{1} - \varvec{x}\beta } \right)}}, \\ \Pr \left( {y = j|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{j} - \varvec{x}\beta } \right)}}{{1 + \exp \left( {\tau_{j} - \varvec{x}\beta } \right)}} - \frac{{\exp \left( {\tau_{j - 1} - \varvec{x}\beta } \right)}}{{1 + \exp \left( {\tau_{j - 1} - \varvec{x}\beta } \right)}}, \quad j = 2, \ldots ,J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \frac{{\exp \left( {\tau_{J - 1} - \varvec{x}\beta } \right)}}{{1 + \exp \left( {\tau_{J - 1} - \varvec{x}\beta } \right)}}. \\ \end{aligned}$$
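As a minimal sketch of how these predicted probabilities can be evaluated, the Python function below takes the thresholds and the linear predictor xβ and returns one probability per category. The function names and the numerical values are hypothetical and chosen only for illustration; they are not estimates from the LoS data.

```python
import math


def logistic(z):
    """exp(z) / (1 + exp(z)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))


def orm_probabilities(thresholds, xb):
    """Category probabilities of the ORM.

    thresholds: increasing values tau_1, ..., tau_{J-1}
    xb: the linear predictor x * beta (a single number)
    """
    # Cumulative probabilities Pr(y <= j | x); the last one is 1 by definition.
    cumulative = [logistic(tau - xb) for tau in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)  # Pr(y = j) = Pr(y <= j) - Pr(y <= j-1)
        previous = value
    return probabilities


# Hypothetical thresholds and linear predictor for a three-category outcome.
print(orm_probabilities([-0.5, 1.5], 0.3))
```

Because the same xβ enters every cumulative logit, the thresholds alone separate the categories, which is the parallel regression assumption in computational form.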

Furthermore, the ORM is often formulated as a latent variable model, defined as:

$$\begin{aligned} y_{i}^{\prime } & = \varvec{x}\beta + \epsilon_{i} , \\ y_{i} & = j\quad \text{if}\quad \tau_{j - 1} \le y_{i}^{\prime } < \tau_{j} ,\quad j = 1, \ldots , J, \\ \end{aligned}$$

where \(y_{i}^{\prime }\) is the latent variable ranging from −∞ to ∞, and \(\epsilon_{i}\) is the random error. The thresholds \(\tau_{1}\) through \(\tau_{J - 1}\) are parameters to estimate, assuming that \(\tau_{0} = - \infty\) and \(\tau_{J} = \infty\). In the context of LoS, the continuous latent variable \(y_{i}^{\prime }\) can be thought of as the propensity of a patient to belong to a certain category. For example, the observed LoS category is determined by the latent variable as follows:

$$\begin{aligned} y_{i} & = \text{short}\quad \text{if}\quad \tau_{0} \le y_{i}^{\prime } < \tau_{1} , \\ y_{i} & = \text{medium}\quad \text{if}\quad \tau_{1} \le y_{i}^{\prime } < \tau_{2} , \\ y_{i} & = \text{long}\quad \text{if}\quad \tau_{2} \le y_{i}^{\prime } < \tau_{3} . \\ \end{aligned}$$

Thus, when the latent variable crosses a threshold \(\tau_{j}\), the patient category changes.
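The threshold rule can be illustrated with a short Python sketch. The thresholds used here are hypothetical, and the function name `category_from_latent` is introduced only for this example; the lower and upper limits (minus and plus infinity) are implicit.

```python
from bisect import bisect_right

CATEGORIES = ["short", "medium", "long"]


def category_from_latent(y_latent, thresholds):
    """Return the category j such that tau_{j-1} <= y' < tau_j.

    thresholds holds tau_1, ..., tau_{J-1} in increasing order.
    bisect_right counts how many thresholds are <= y', which is exactly
    the zero-based index of the category.
    """
    return CATEGORIES[bisect_right(thresholds, y_latent)]


# Hypothetical thresholds tau_1 = 0.0 and tau_2 = 2.0.
for y_latent in (-1.0, 0.0, 1.9, 2.0, 5.0):
    print(y_latent, category_from_latent(y_latent, [0.0, 2.0]))
```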

3.1.2 Continuation ratio model

A special type of ordinal model is the continuation ratio model (CRM) proposed by Fienberg (1977), in which the categories represent levels, where the lowest level must occur before the second, the second before the third and so forth until the highest level (Hilbe 2009). The categories can be thought of as stages in some process through which an individual can advance. A key characteristic of the process is that an individual must pass through each stage (Long and Freese 2006). This special characteristic suits the nature of the patient journey through the hospital, where the patient can evolve from a short to a medium LoS and so on. The CRM is defined as:

$$\ln \left( {\frac{\Pr (y = m|\varvec{x})}{\Pr (y > m|\varvec{x})}} \right) = \tau_{m} - \varvec{x}\beta , \quad m = 1, \ldots ,J - 1,$$

where m is the stage and J is the number of categories of the outcome variable. The predicted probabilities are calculated by:

$$\begin{aligned} \Pr \left( {y = m|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{m} - \varvec{x}\beta } \right)}}{{\mathop \prod \nolimits_{j = 1}^{m} \left[ {1 + \exp \left( {\tau_{j} - \varvec{x}\beta } \right)} \right]}},\quad m = 1, \ldots , J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \mathop \sum \limits_{j = 1}^{J - 1} \Pr (y = j|\varvec{x}). \\ \end{aligned}$$
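A possible way to compute these probabilities is to follow the stage-by-stage logic directly: at each stage the patient either stops in the current category or continues to a higher one. The sketch below is equivalent to the product formula above; the function name and the numbers are hypothetical.

```python
import math


def crm_probabilities(thresholds, xb):
    """Category probabilities of the continuation ratio model.

    thresholds: tau_1, ..., tau_{J-1}
    xb: the linear predictor x * beta (a single number)
    """
    probabilities = []
    still_in_process = 1.0  # Pr(y >= m | x), starts at 1 for m = 1
    for tau in thresholds:
        # Pr(y = m | y >= m, x): the probability of stopping at stage m.
        stop = 1.0 / (1.0 + math.exp(-(tau - xb)))
        probabilities.append(still_in_process * stop)
        still_in_process *= 1.0 - stop
    probabilities.append(still_in_process)  # the last category takes the remainder
    return probabilities


# Hypothetical thresholds and linear predictor.
print(crm_probabilities([0.2, 1.0], 0.4))
```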

3.1.3 Adjacent category model

As the name indicates, the probability of interest in the adjacent category model (ACM) is the adjacent probability: the probability of having a short LoS against the probability of having a medium LoS or the probability of having a medium LoS against the probability of having a long LoS. The ACM formulated by Goodman (1983) is defined as:

$$\ln \left( {\frac{\Pr (y = m|\varvec{x})}{\Pr (y = m + 1|\varvec{x})}} \right) = \tau_{m} - \varvec{x}\beta ,\quad m = 1, \ldots ,J - 1.$$

The predicted probabilities are calculated by:

$$\begin{aligned} \Pr \left( {y = m |\varvec{x}} \right) & = \frac{{\exp \left( {\mathop \sum \nolimits_{j = m}^{J - 1} \left( {\tau_{j} - \varvec{x}\beta } \right)} \right)}}{{1 + \mathop \sum \nolimits_{q = 1}^{J - 1} \left[ {\exp \left( {\mathop \sum \nolimits_{r = q}^{J - 1} \left( {\tau_{r} - \varvec{x}\beta } \right)} \right)} \right]}},\quad m = 1, \ldots , J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \mathop \sum \limits_{j = 1}^{J - 1} \Pr (y = j|\varvec{x}). \\ \end{aligned}$$
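The ACM probabilities follow from chaining the adjacent-category logits down to the last category, which acts as the baseline. The sketch below illustrates this under hypothetical thresholds and a hypothetical linear predictor; the function name is not taken from any package.

```python
import math


def acm_probabilities(thresholds, xb):
    """Category probabilities of the adjacent category model.

    thresholds: tau_1, ..., tau_{J-1}
    xb: the linear predictor x * beta (a single number)
    """
    # log[Pr(y = m) / Pr(y = J)] is the sum of (tau_j - xb) for j = m, ..., J-1.
    log_ratios = []
    running = 0.0
    for tau in reversed(thresholds):
        running += tau - xb
        log_ratios.append(running)
    log_ratios.reverse()
    log_ratios.append(0.0)  # the last category compared with itself
    total = sum(math.exp(v) for v in log_ratios)
    return [math.exp(v) / total for v in log_ratios]


# Hypothetical thresholds and linear predictor.
print(acm_probabilities([0.2, 1.0], 0.4))
```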

3.2 No parallel assumption for independent variables

3.2.1 Generalised ordered logit model

Described by Clogg and Shihadeh (1994), the generalised ordered logit model (GOLM) allows the slope coefficients to differ for each of the J − 1 binary regressions, as represented in the following equation:

$$\ln \left( {\frac{\Pr (y \le j|\varvec{x})}{\Pr (y > j|\varvec{x})}} \right) = \tau_{j} - \varvec{x}\beta_{j} ,\quad j = 1, \ldots , J - 1.$$

The predicted probabilities are calculated as:

$$\begin{aligned} \Pr \left( {y = 1|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{1} - \varvec{x}\beta_{1} } \right)}}{{1 + \exp \left( {\tau_{1} - \varvec{x}\beta_{1} } \right)}}. \\ \Pr \left( {y = j|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{j} - \varvec{x}\beta_{j} } \right)}}{{1 + \exp \left( {\tau_{j} - \varvec{x}\beta_{j} } \right)}} - \frac{{\exp \left( {\tau_{j - 1} - \varvec{x}\beta_{j - 1} } \right)}}{{1 + \exp \left( {\tau_{j - 1} - \varvec{x}\beta_{j - 1} } \right)}},\quad j = 2, \ldots , J - 1. \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \frac{{\exp \left( {\tau_{J - 1} - \varvec{x}\beta_{J - 1} } \right)}}{{1 + \exp \left( {\tau_{J - 1} - \varvec{x}\beta_{J - 1} } \right)}} \\ \end{aligned}$$

Notice that the equations for the GOLM are similar to those of the ORM. The GOLM retains the nature of the ORM by considering simultaneously the effects of a set of independent variables across successive dichotomisations of the outcome (O’Connell 2010), yet it sets the slope coefficients \(\beta_{j}\) free to vary across the categories.
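The only computational difference from the ORM is that each cumulative logit has its own linear predictor. The sketch below makes this explicit; the function name and the numerical values are hypothetical. One consequence worth keeping in mind is that, because the slopes differ, the cumulative probabilities are not guaranteed to be increasing for every x, so a difference of two of them can in principle be negative.

```python
import math


def golm_probabilities(thresholds, xbs):
    """Category probabilities of the generalised ordered logit model.

    thresholds: tau_1, ..., tau_{J-1}
    xbs: linear predictors x * beta_1, ..., x * beta_{J-1}, one per binary regression
    """
    # Pr(y <= j | x) uses the j-th threshold together with the j-th slope vector.
    cumulative = [1.0 / (1.0 + math.exp(-(tau - xb)))
                  for tau, xb in zip(thresholds, xbs)]
    cumulative.append(1.0)
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


# Hypothetical values: the two binary regressions have different linear predictors.
print(golm_probabilities([-0.5, 1.5], [0.3, 0.1]))
```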

3.2.2 Sequential model

Also known as the unconstrained continuation ratio model, the sequential model (SeqM) presented by Kahn and Morimune (1979) and Mare (1979) describes an ordinal outcome as a sequence of decisions or steps. It can be expressed as:

$$\ln \left( {\frac{{\Pr (y = m|\varvec{x})}}{{\Pr (y > m|\varvec{x})}}} \right) = \tau_{m} - \varvec{x}\beta_{m} ,\quad m = 1, \ldots ,J - 1.$$

The predicted probabilities are calculated by:

$$\begin{aligned} \Pr \left( {y = m|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{m} - \varvec{x}\beta_{m} } \right)}}{{\mathop \prod \nolimits_{j = 1}^{m} \left[ {1 + \exp \left( {\tau_{j} - \varvec{x}\beta_{j} } \right)} \right]}},\quad m = 1, \ldots , J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \mathop \sum \limits_{j = 1}^{J - 1} \Pr (y = j|\varvec{x}). \\ \end{aligned}$$

3.2.3 Multinomial logit model

Luce (1959) proposed the multinomial logit model (MNLM) as an extension of the binary logistic regression model to handle polytomous outcomes, where the categories are no longer considered as ordered and the effects of the independent variables are allowed to differ for each outcome (Hilbe 2009). Although it was defined for nominal outcomes, it is often used for ordinal data. The MNLM can be expressed as:

$$\ln \left( {\frac{{\Pr \left( {y = m|\varvec{x}} \right)}}{{\Pr \left( {y = b|\varvec{x}} \right)}}} \right) = \varvec{x}\beta_{m|b} ,\quad m = 1, \ldots , J,$$

where b is the reference category or the comparison group. The predicted probabilities are calculated by:

$$\Pr \left( {y = m|\varvec{x}} \right) = \frac{{\exp \left( {\varvec{x}\beta_{m|b} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{J} \exp \left( {\varvec{x}\beta_{j|b} } \right)}}.$$
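As an illustration, the MNLM probabilities can be computed from one linear predictor per category, with the entry for the reference category b equal to zero (since the log-odds of b against itself are zero). The function name and values below are hypothetical; subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result.

```python
import math


def mnlm_probabilities(linear_predictors):
    """Category probabilities of the multinomial logit model.

    linear_predictors: x * beta_{m|b} for m = 1, ..., J; the entry of the
    reference category b must be 0.
    """
    shift = max(linear_predictors)  # avoids overflow, cancels in the ratio
    weights = [math.exp(v - shift) for v in linear_predictors]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values with the first category as the reference (entry 0.0).
print(mnlm_probabilities([0.0, 0.5, -1.0]))
```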

The MNLM relies on the assumption of independence of irrelevant alternatives (IIA) (Luce 1959; Arrow 1963), where the odds do not depend on other alternative outcomes that are available. In other words, adding or deleting outcome categories does not affect the odds among other outcomes. It is plausible to assume that the categories of LoS are independent, because the odds of belonging to a certain LoS category do not change if the other two categories are omitted. Alternatively, the Hausman–McFadden test (Hausman and McFadden 1984) and the Small–Hsiao test (Small and Hsiao 1985) can be used to evaluate IIA. However, there is evidence suggesting that both tests often give inconsistent results and provide little guidance on violations of the IIA assumption (Long and Freese 2006).

3.3 Parallel assumption for some independent variables

3.3.1 Partial proportional odds model

The partial proportional odds model (PPOM) formulated by Peterson and Harrell (1990) imposes constraints for parallel lines only where they are needed. The GOLM equation is now extended to accommodate the unconstrained parameters of the variables that violate the assumption:

$$\ln \left( {\frac{\Pr (y \le j|\varvec{x})}{\Pr (y > j|\varvec{x})}} \right) = \tau_{j} - \left( {\varvec{x}\beta + T\gamma_{j} } \right),\quad j = 1, \ldots , J - 1.$$

Here x is the vector containing the full set of independent variables. T is a vector containing a subset of independent variables which violate the parallel assumption, and \(\gamma_{j}\) are the regression coefficients associated with the variables in T. The predicted probabilities of belonging to a certain category are defined as:

$$\begin{aligned} \Pr \left( {y = 1|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{1} - \left( {\varvec{x}\beta + T\gamma_{1} } \right)} \right)}}{{1 + \exp \left( {\tau_{1} - \left( {\varvec{x}\beta + T\gamma_{1} } \right)} \right)}}, \\ \Pr \left( {y = j|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{j} - \left( {\varvec{x}\beta + T\gamma_{j} } \right)} \right)}}{{1 + \exp \left( {\tau_{j} - \left( {\varvec{x}\beta + T\gamma_{j} } \right)} \right)}} - \frac{{\exp \left( {\tau_{j - 1} - \left( {\varvec{x}\beta + T\gamma_{j - 1} } \right)} \right)}}{{1 + \exp \left( {\tau_{j - 1} - \left( {\varvec{x}\beta + T\gamma_{j - 1} } \right)} \right)}},\quad j = 2, \ldots , J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \frac{{\exp \left( {\tau_{J - 1} - \left( {\varvec{x}\beta + T\gamma_{J - 1} } \right)} \right)}}{{1 + \exp \left( {\tau_{J - 1} - \left( {\varvec{x}\beta + T\gamma_{J - 1} } \right)} \right)}}. \\ \end{aligned}$$
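To make the role of T and the \(\gamma_{j}\) concrete, the following sketch builds the linear predictor of each cumulative logit as a common part xβ plus an equation-specific part Tγ_j. All names and numbers are hypothetical. Setting every γ_j to zero recovers the proportional odds case, which the example uses as a sanity check.

```python
import math


def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def ppom_probabilities(thresholds, x, beta, t, gammas):
    """Category probabilities of the partial proportional odds model.

    thresholds: tau_1, ..., tau_{J-1}
    x, beta: all independent variables and their common slopes
    t: the subset of variables that violate the parallel assumption
    gammas: one coefficient vector gamma_j per cumulative logit, aligned with t
    """
    cumulative = []
    for tau, gamma in zip(thresholds, gammas):
        eta = tau - (dot(x, beta) + dot(t, gamma))  # tau_j - (x beta + T gamma_j)
        cumulative.append(1.0 / (1.0 + math.exp(-eta)))
    cumulative.append(1.0)
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


# Hypothetical example: two predictors, the second one is unconstrained.
x, beta, t = [1.0, 2.0], [0.1, 0.1], [2.0]
print(ppom_probabilities([-0.5, 1.5], x, beta, t, [[0.0], [0.0]]))   # parallel case
print(ppom_probabilities([-0.5, 1.5], x, beta, t, [[0.2], [-0.1]]))  # non-parallel case
```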

3.3.2 Partial continuation ratio model

The partial continuation ratio model (PCRM) extends the equation for the CRM by adding coefficients for those variables that violate the parallel assumption:

$$\ln \left( {\frac{\Pr (y = m|\varvec{x})}{\Pr (y > m|\varvec{x})}} \right) = \tau_{m} - \left( {\varvec{x}\beta + T\gamma_{m} } \right), \quad m = 1, \ldots ,J - 1,$$

The predicted probabilities are calculated by:

$$\begin{aligned} \Pr \left( {y = m|\varvec{x}} \right) & = \frac{{\exp \left( {\tau_{m} - \left( {\varvec{x}\beta + T\gamma_{m} } \right)} \right)}}{{\mathop \prod \nolimits_{j = 1}^{m} \left[ {1 + \exp \left( {\tau_{j} - \left( {\varvec{x}\beta + T\gamma_{j} } \right)} \right)} \right]}},\quad m = 1, \ldots , J - 1, \\ \Pr \left( {y = J|\varvec{x}} \right) & = 1 - \mathop \sum \limits_{j = 1}^{J - 1} \Pr (y = j|\varvec{x}). \\ \end{aligned}$$

3.3.3 Stereotype ordered regression model

The stereotype ordered regression model (SORM) can be thought of as imposing ordering constraints on a multinomial model (Lunt 2005). It was proposed by Anderson (1984) in response to the restrictive parallel regression assumption of the ORM. The model was originally defined for ordinal variables originated by an assessing process, but it is not restricted to this type of variables only. The SORM is defined as:

$$\ln \frac{\Pr (y = m|\varvec{x})}{\Pr (y = b|\varvec{x})} = \left( {\theta_{m} - \theta_{b} } \right) - \left( {\phi_{m} - \phi_{b} } \right)\varvec{x}\beta ,$$

where θs are the intercepts, ϕs are scale factors associated with the outcome categories, and b is the reference category or the comparison group. The model allows the coefficients associated with each independent variable to differ by a scale factor that depends on the pair of outcomes on the left-hand side of the equation. Similarly, the θs allow different intercepts for each pair of outcomes. If the relationship between the independent variables and the dependent variable is ordinal, then \(\phi_{1} > \phi_{2} > \cdots > \phi_{J - 1} > \phi_{J}\).

Constraints need to be added to the model to make it identifiable: \(\phi_{1} = 1\), \(\phi_{J} = 0\), \(\theta_{1} = 1\) and \(\theta_{J} = 0\). The predicted probabilities of belonging to a certain category are defined as:

$$\Pr \left( {y = m|\varvec{x}} \right) = \frac{{\exp \left( {\theta_{m} - \phi_{m} \varvec{x}\beta } \right)}}{{\mathop \sum \nolimits_{j = 1}^{J} \exp \left( {\theta_{j} - \phi_{j} \varvec{x}\beta } \right)}}.$$
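A small sketch of this formula shows how a single linear predictor xβ is rescaled by a category-specific factor. The function name is hypothetical, and the values of θ and ϕ below are invented for illustration only; they respect the identification constraints and the ordering of the ϕs stated in the text.

```python
import math


def sorm_probabilities(thetas, phis, xb):
    """Category probabilities of the one-dimensional stereotype ordered model.

    thetas: intercepts theta_1, ..., theta_J
    phis: scale factors phi_1, ..., phi_J
    xb: the linear predictor x * beta (a single number)
    """
    scores = [theta - phi * xb for theta, phi in zip(thetas, phis)]
    shift = max(scores)  # numerical safeguard, cancels in the ratio
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters with phi_1 = 1 > phi_2 > phi_3 = 0.
print(sorm_probabilities([1.0, 0.4, 0.0], [1.0, 0.6, 0.0], 0.5))
```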

The model presented here is known as a one-dimensional stereotype ordered regression. Anderson (1984) also presents an extension of the equations to model ordinal variables that are constructed from multiple domains. For example, the coronary heart disease risk score (Wilson et al. 1998) classifies individuals according to their risk of having a coronary heart disease event in the next 10 years. The score is constructed by adding the scores of different risk factors (e.g. diabetes, cholesterol and blood pressure) indicating the level of risk (very low, low, moderate, high and very high).

4 Odds ratios and interpretation

The interpretation of the logistic regression models can be more manageable if it comes in terms of odds ratios (ORs). The odds of an event occurring are defined as the probability of an event occurring divided by the probability of that event not occurring. In terms of logistic regression models, the odds ratio then compares the change in the odds that results from a unit change in the predictor.

The models previously described differ in how an event (or non-event) is defined. Table 5 summarises how the odds in each type of approach and models are interpreted.

Table 5 Odds ratio interpretation for each logistic regression model

The adjacent category, multinomial logit and stereotype ordered models are similar in the sense they perform one by one comparison of the categories. For example, the adjacent category model compares one category against the next higher category. Because the MNLM ignores the ordering of the outcomes, it compares a given category against the reference category. The SORM interpretation of the odds could be similar to the MNLM, but when estimated using the default options of STATA, it compares the highest category versus the lowest category.

The odds ratios can be computed from the models’ parameter estimates by exponentiating the β coefficient. When the odds ratio is greater than 1 (i.e. computed from positive βs), it indicates that, as the predictor increases, the odds of the event occurring increase by a factor of \(\exp \left( \beta \right)\), holding all the other variables constant. Conversely, a value lower than 1 (i.e. computed from negative βs) indicates that, as the predictor increases, the odds of the event occurring decrease by a factor of \(\exp \left( \beta \right)\), holding all the other variables constant.

To facilitate the interpretation, when a predictor has a negative effect on the event occurring, instead of calculating the odds of the event occurring, the odds of the event not occurring can be computed by simply taking the inverse of the effect on the odds of the event occurring \(\left( {{\text{i}} . {\text{e}} . \;\exp \left( \beta \right)^{ - 1} } \right)\). Alternatively, the models can be interpreted in terms of per cent change: as the predictor increases by one unit, the odds of the event occurring change by \(\left( {100 \times \left[ {\exp \left( \beta \right) - 1} \right]} \right)\,\%\), holding all the other variables constant.
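The three quantities discussed in this section (the odds ratio, its inverse and the per cent change in the odds) are simple transformations of a coefficient. The sketch below uses hypothetical function names and hypothetical coefficients, not estimates from the LoS models.

```python
import math


def odds_ratio(beta):
    """Factor by which the odds change for a one-unit increase in the predictor."""
    return math.exp(beta)


def inverse_odds_ratio(beta):
    """exp(beta)^(-1): the odds of the event not occurring, useful when beta < 0."""
    return 1.0 / math.exp(beta)


def percent_change_in_odds(beta):
    """100 * [exp(beta) - 1]: per cent change in the odds per unit increase."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical coefficients: one positive, one negative.
for beta in (math.log(2.0), -math.log(4.0)):
    print(odds_ratio(beta), inverse_odds_ratio(beta), percent_change_in_odds(beta))
```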

5 Model estimation

As in common practice, two-thirds of the LoS data set, named the training set, was allocated for estimation purposes and the remaining third, named the validation set, was used for testing (Dobbin and Simon 2011). All the models described above were fitted to the training set using STATA. The first step is to evaluate the parallel assumption. We fitted the ORM using ologit followed by the command brant which performs a Brant test. This test compares the beta coefficients from J-1 binary logits and gives a list of which variables are violating the parallel assumption. This was useful later to estimate the partially constrained models (i.e. PPOM, PCRM and SORM) where constraints needed to be imposed on those variables where the assumption is not violated. The commands ocratio and adjcatlogit were used to estimate the CRM and ACM; the gologit command with the npl option was used for GOLM; and the commands ucrlogit, mlogit and slogit were used to estimate the SeqM, MNLM and SORM, respectively.

The PPOM was estimated using the gologit2 command with the autofit option to impose constraints on the variables where the parallel assumption is not violated. Finally, the PCRM was estimated using the seqlogit command, but the constraints needed to be added manually. See the appendix for the full STATA code and alternative commands for estimation.

All the models were estimated through the procedure of maximum likelihood estimation. Maximum likelihood estimates are the values of the parameters that have the “maximum likelihood” of generating the observed sample.

To compare the models discussed here, along with the log-likelihood, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) were calculated. A model with a higher log-likelihood should be considered as a better-fitting model. Models with the smaller absolute AIC and/or BIC values should be preferred.
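For reference, both criteria are computed from the maximised log-likelihood, the number of estimated parameters k and, for the BIC, the number of observations n, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. The sketch below uses hypothetical function names and invented numbers.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


# Invented values: two models with the same fit but different numbers of parameters.
for k in (5, 10):
    print(k, aic(-100.0, k), bic(-100.0, k, 1000))
```

With the same log-likelihood, the model with more parameters receives larger AIC and BIC values, and the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is at least 8.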

In addition, model performance was measured through accuracy rates (per category and overall performance) to express the percentage of times the patient membership (i.e. observed category to which they belong) matches with the membership predicted by the models discussed here. The predicted category was assigned using the highest probability method (Anderson and Philips 1981), which allocates a patient to the category for which he or she got the highest probability estimate.
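The highest probability method and the accuracy rates can be sketched as follows. Categories are coded 0, 1 and 2 for short, medium and long LoS; the function names and the predicted probabilities are invented for illustration and do not come from the fitted models.

```python
def predict_category(probabilities):
    """Highest probability method: index of the category with the largest probability."""
    return max(range(len(probabilities)), key=lambda j: probabilities[j])


def accuracy_rates(observed, predicted, n_categories):
    """Return (per-category accuracy rates, overall accuracy rate)."""
    per_category = []
    for category in range(n_categories):
        # Predictions made for the patients who truly belong to this category.
        members = [p for o, p in zip(observed, predicted) if o == category]
        if members:
            per_category.append(sum(p == category for p in members) / len(members))
        else:
            per_category.append(None)  # no patient observed in this category
    overall = sum(o == p for o, p in zip(observed, predicted)) / len(observed)
    return per_category, overall


# Invented probabilities for four patients (0 = short, 1 = medium, 2 = long).
predicted_probabilities = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1],
                           [0.3, 0.6, 0.1], [0.5, 0.3, 0.2]]
observed = [0, 0, 1, 2]
predicted = [predict_category(p) for p in predicted_probabilities]
print(predicted, accuracy_rates(observed, predicted, 3))
```

In this invented example the only long LoS patient is allocated to the short category, so the accuracy rate for that category is zero even though the overall rate is not.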

Furthermore, an analysis of the residuals might be useful in identifying the data points for which the models fit poorly. However, Hosmer and Lemeshow (2010) have highlighted the difficulties of defining what a large residual is in the context of logistic models, as it highly depends on the type of data involved. Therefore, the analysis of the residuals was left out of the scope of this tutorial.

6 Results

Table 6 gives the results of the Brant test. Notice that the \(\chi^{2}\) test, at the top of the table, indicates that the null hypothesis, stating that the model parameters are equal across categories (i.e. parallel regression assumption), can be rejected at the 0.0001 level. In general, when statistical assumptions, such as the parallel regression assumption, are broken as they are in this data set, models based on the parallel assumption for all the independent variables cannot be accurately applied to the whole population (the parameters of the model are said to be biased). In other words, it is not possible to draw conclusions about the population, although valid estimates of the models can be generated. Consequently, the models in which the parallel assumption is imposed on all the independent variables (ORM, CRM and ACM) were excluded from further analysis. Notice that not all the variables violate the assumption. This should be taken into account when imposing constraints on the partial models.

Table 6 Brant test for parallel assumption

Table 7 displays the odds ratio estimates for the six remaining models. Notice that for all models, gender, surgical procedure category 3 and number of surgical procedures to undergo were not significant predictors of short to medium LoS. However, gender and number of surgical procedures to undergo then became significant predictors of long LoS. For example, it seems that gender might not be an initial factor when predicting whether a patient will be hospitalised for 4 days or more, but being female might increase the odds of staying 12 days or more.

Table 7 Odds ratio estimates for all models

The first two columns of Table 7 display the parameter estimates for the GOLM. The first column contrasts short LoS with categories medium and long LoS, and the second column contrasts categories short and medium LoS with category long LoS. In terms of interpretation, an OR ≥ 1 indicates that higher values in the predictors make it more likely that the patient belongs to an upper LoS category than the current one, while an OR < 1 indicates that higher values on the independent variable increase the likelihood of belonging to the current category or to a lower one (Williams 2007). For example, the OR of diagnosis category 2 (e.g. insulin-dependent diabetes mellitus, hepatic failure and stroke) is higher in the first section, indicating that a patient with a diagnosis category 2 is more likely (2 times more) to have a medium or long LoS rather than a short one. The OR of outpatient clinic is less than 1, so to ease the interpretation, we take its inverse. The higher value in the second column indicates that a patient who is referred to hospitalisation from the outpatient clinic is more likely (4.2 times more) to have a short or medium LoS rather than a long one.

The next two columns of Table 7 display the odds ratios for the SeqM. The first column contrasts short LoS with categories medium or long LoS, and the second column contrasts category medium LoS with category long LoS. In terms of interpretation, an OR ≥ 1 indicates that higher values in the predictor make it more likely that the patient progresses to an upper LoS category than the current one, while an OR < 1 indicates that higher values on the independent variable increase the likelihood of not progressing to the next category. For example, the OR of diagnosis category 2 (e.g. insulin-dependent diabetes mellitus, hepatic failure and stroke) is higher in the first section, indicating that a patient with a diagnosis category 2 is more likely (2 times more) to progress to a medium or long LoS rather than a short LoS. The OR value of the coefficient of outpatient clinic is higher in the first section (after taking the inverse), indicating that a patient who is referred to hospitalisation from the outpatient clinic is more likely (3 times more) to have a short LoS rather than a medium or long one.

The next model is the MNLM, where short LoS is the reference category (STATA usually picks the category with the highest frequency as the reference category, but this can be easily modified). A patient with a diagnosis category 3 is three times more likely to have a short LoS rather than a medium LoS. A patient who is referred to hospitalisation from the outpatient clinic is 5 times more likely to have a short LoS than a long one. A patient undergoing a surgical procedure category 4 is 2.5 times more likely to have a long LoS rather than a short one.

The next two models (PPOM and PCRM) have beta coefficients varying across the categories only for those variables that violate the parallel assumption, i.e. gender, general surgery ward, diagnosis categories 2 and 3, surgical procedure category 4 and number of surgical procedures to undergo. Thus, the parameter estimates for the rest of the variables (i.e. constrained variables) are the same in both columns. In the PPOM, a patient undergoing a surgical procedure category 4 is almost 60 % more likely to have a longer LoS. The odds of having a shorter LoS are three times higher for patients who are referred to hospitalisation from the outpatient clinic. Conversely, more specific comparisons can be made for the variables which were set free of constraints: the higher coefficient of diagnosis category 2 (e.g. insulin-dependent diabetes mellitus, hepatic failure and stroke) in the first column indicates that a patient with a condition classified under that diagnosis category is two times more likely to have a medium or long LoS rather than a short one.

The next model is the PCRM, whose estimates were almost identical to those of the PPOM. Only the interpretation changes slightly. For example, a patient with a diagnosis category 2 is two times more likely to progress to medium or long LoS.

The last two columns show the output for the SORM. The parameters can be interpreted in terms of the odds of the reference category versus the first category. STATA usually selects the last category (long LoS) as the reference category, but this can be easily changed in the command line. For example, the odds of having a long LoS versus a short LoS are twice as high for patients with a disease classified in diagnosis category 2 (e.g. insulin-dependent diabetes mellitus, hepatic failure and stroke). The odds of having a short versus a long LoS are three times higher for patients who are referred to hospitalisation from the outpatient clinic.

Table 8 displays the goodness-of-fit and performance measures. The model with the best fit according to the log-likelihood and AIC (by a negligible margin) is the MNLM. However, in terms of BIC, the best model is the PPOM. The lower BIC value could indicate either a better fit or the presence of fewer parameters, since BIC penalises free parameters more strongly than AIC. The second part of the table shows the accuracy rates on the validation set, giving an idea of how well the models predict new patients. The six models perform well in predicting patients with short LoS. However, the models failed to predict any patient with a long LoS. Correct discrimination between categories is only possible if categories are essentially different in terms of the predictors (Ashby et al. 1986), which may suggest that collecting more data is appropriate for our illustrative example. Only when more data are not available is it acceptable to combine adjacent categories (e.g. medium and long LoS) and use conventional binary logistic regression instead to improve classification performance.
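The way AIC and BIC can disagree, as they do above, can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters and n the number of observations. The log-likelihoods, parameter counts and sample size below are hypothetical and are not the values reported in Table 8; they are chosen only to show a larger model winning on AIC while a smaller model wins on BIC.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical values (assumption): (log-likelihood, number of parameters).
N_OBS = 1000
models = {
    "larger model": (-500.0, 30),
    "smaller model": (-520.0, 16),
}

scores = {
    name: (aic(ll, k), bic(ll, k, N_OBS))
    for name, (ll, k) in models.items()
}

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
print("Best by AIC:", best_by_aic)
print("Best by BIC:", best_by_bic)
```

Because ln(n) exceeds 2 whenever n is at least 8, the per-parameter penalty of BIC is larger than that of AIC for any realistic sample size, which is why BIC tends to favour the model with fewer parameters.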

Table 8 Comparative chart logistic regression models

7 Discussion

When choosing the most appropriate model, there are some main points to consider. The MNLM is a very popular model, and there is a wide selection of software available for its implementation, which is naturally an advantage. However, the biggest drawback of the MNLM is that it ignores the clearly ordinal nature of the data, which hinders the ability to assess effect directionality and progression (Cliff 1996).

The models based on the parallel assumption are the only ones that strictly maintain the ordinality of the outcomes (McCullagh 1980). The rest of the models only retain the ordinal nature of the outcome to some extent (except for the MNLM, which completely ignores it). Another advantage of the models based on the parallel assumption is their parsimony compared to most of the models presented here. However, the parallel assumption is rarely fulfilled, in which case these models cannot be applied. The SORM is another parsimonious model that is easy to analyse, but it has been frequently associated with highly biased estimates and dubious identifiability when more than four outcome categories are involved (Holtbrugge and Schumacher 1991). The partially constrained models have fewer parameters than the fully unconstrained models. Both AIC and BIC take parsimony into account when evaluating the goodness of fit of a model. For our particular case, the more parsimonious models did not outperform those models with more parameters.

Furthermore, the choice of model should depend on the research question: if the main goal is classification and prediction, it is not imperative to select a model that preserves ordinality but rather to find the one that minimises the cost of misclassification. For example, when categories are ordered, misclassification to an adjacent category should always be preferred to misclassification to a more extreme one (Ashby et al. 1986). There is strong evidence that suggests the MNLM outperforms ordinal models under a variety of circumstances (Campbell et al. 1991). For our illustrative example, more data need to be collected if any of these models were to be used for prediction and classification purposes. If collecting more data is not feasible, combining adjacent categories (e.g. medium and long LoS) and using conventional binary logistic regression instead might be acceptable.
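One way to make the preference for adjacent over extreme misclassification concrete is to assign a cost that grows with the distance between the predicted and the true category, and to predict the category with the lowest expected cost. The sketch below assumes an absolute-distance cost, which is an illustrative choice and not one prescribed by the cited work; the predicted probabilities are hypothetical.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float]) -> List[float]:
    """Expected cost of predicting each category.

    Assumption: the cost of predicting category `pred` when the true
    category is `true` equals |pred - true|, so adjacent errors cost 1
    and more extreme errors cost more.
    """
    n = len(probs)
    return [
        sum(p * abs(pred - true) for true, p in enumerate(probs))
        for pred in range(n)
    ]


def min_cost_category(probs: Sequence[float]) -> int:
    """Index of the category with the lowest expected misclassification cost."""
    costs = expected_costs(probs)
    return min(range(len(costs)), key=costs.__getitem__)


def most_probable_category(probs: Sequence[float]) -> int:
    """Index of the category with the highest predicted probability."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical predicted probabilities for (short, medium, long) LoS.
labels = ["short", "medium", "long"]
probs = [0.45, 0.15, 0.40]

print("Most probable:", labels[most_probable_category(probs)])
print("Minimum expected cost:", labels[min_cost_category(probs)])
```

With these probabilities the most probable category is short LoS, but predicting medium LoS has the lowest expected cost, because it can never be wrong by more than one category.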

However, if the main goal is to understand the nature and direction of the predictor effects by exploiting the ordinal nature of the outcome, one should pick the model that best interprets the event of interest under study, especially in our case where there was little difference in terms of goodness of fit and performance between the six models.

The ACM, MNLM and SORM provide one-to-one comparisons of categories. These models can be of particular use when hospitals plan their resources around a specific type of patient, such as short-stay patients. It might be the case that patients with short and medium LoS are no different in terms of the level of care they need. Therefore, planners might be more interested in identifying factors for those patients who might have a long LoS. These three models allow direct comparisons with that category.

The ORM, GOLM and PPOM reveal the likelihood of moving in a certain direction (the patient moves towards either a longer or a shorter LoS). They are very useful in identifying trends in the odds, and for the illustrative example shown here, understanding patient progression with regard to LoS category, either upward or downward, provided more valuable information than comparing against a fixed category (e.g. MNLM or SORM).

The CRM, SEQM and PCRM reveal the likelihood of moving in a certain direction, assuming that patients must have passed through the lower categories. This might be particularly useful if there is an interest in understanding patient pathways at hospital.
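The different comparisons made by the three groups of models can be illustrated by computing the corresponding odds from a single set of category probabilities. The sketch below is a simplification under stated assumptions: the probabilities are hypothetical, the function names are illustrative, and the direction of each contrast (higher versus lower categories) is a convention that may differ from the parameterisation used by a given software package.

```python
from typing import List, Sequence


def fixed_category_odds(probs: Sequence[float], reference: int = 0) -> List[float]:
    """Odds of each category versus one fixed reference category."""
    return [p / probs[reference] for i, p in enumerate(probs) if i != reference]


def adjacent_odds(probs: Sequence[float]) -> List[float]:
    """Odds of category j+1 versus category j (adjacent pairs)."""
    return [probs[j + 1] / probs[j] for j in range(len(probs) - 1)]


def cumulative_odds(probs: Sequence[float]) -> List[float]:
    """Odds of being above category j versus at or below it."""
    return [
        sum(probs[j + 1:]) / sum(probs[: j + 1])
        for j in range(len(probs) - 1)
    ]


def continuation_odds(probs: Sequence[float]) -> List[float]:
    """Odds of progressing beyond category j versus stopping at it,
    given that category j has been reached."""
    return [sum(probs[j + 1:]) / probs[j] for j in range(len(probs) - 1)]


# Hypothetical probabilities for (short, medium, long) LoS.
probs = [0.6, 0.3, 0.1]

print("Versus short LoS:", fixed_category_odds(probs, reference=0))
print("Adjacent categories:", adjacent_odds(probs))
print("Cumulative:", cumulative_odds(probs))
print("Continuation:", continuation_odds(probs))
```

At the first cut point the cumulative and continuation odds coincide, since both contrast short LoS with everything above it; they differ at the second cut point, where the continuation odds condition on the patient having reached at least medium LoS.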

8 Conclusion

This tutorial presents a synthesised review of nine different models from the family of logistic regression for the analysis of ordered outcomes. The first three models (ORM, CRM and ACM) are developed under a strong assumption (i.e. the parallel regression assumption) that is rarely fulfilled. Accordingly, other models are presented as viable alternatives, including GOLM, SEQM, MNLM, PPOM, PCRM and SORM. During model validation, performance measurements indicated that the six models had analogous performance when predicting new data. Moreover, the directions of the effect estimates were very similar between models, although the interpretation of the odds ratios varied from one model to another.

Consequently, other factors should be taken into account when choosing the most appropriate model, such as simplicity, number of model parameters or software availability. Most importantly, the choice should be based on the user’s research question and the event under study. The models presented here have proved to be computationally inexpensive and easy to estimate, analyse and interpret. Therefore, there is no reason to maintain frequently used practices such as using nominal models, or binary regression models on combined adjacent categories.