1 Introduction

In many areas of applied research, data are collected as ordinal categorical observations. Sometimes they are genuine ordered assessments (judgements, preferences, degree of agreement with a statement, etc.), whereas in other circumstances they are categorized for convenience (age of people in classes, measurements of objects in intervals of constant width, educational achievement, levels of blood pressure for classifying heart health status, etc.). In both cases, an effective statistical analysis should take the ordinal nature of the responses into account, as discussed by Agresti (2010), Powers and Xie (2000), Tutz (2012), among others. Although the results of this paper may be applied more generally in any context where ordinal data and subjects’ covariates are involved, the subsequent discussion focuses on rating surveys for concreteness.

Different lines of attack on the problem have been proposed in the literature, and some of them stem from the well-known historical debate between Pearson and Yule: the main distinction lies in considering ordinal data as generated by a latent continuous variable or as an intrinsically discrete phenomenon. In fact, the boundary between these two approaches is not so sharp, as is evident in the case of logistic regression, which can be safely introduced within the logic of both paradigms.

In recent decades, several contributions have been proposed, and the leading trend is to bring the statistical analysis of ordinal data into the Generalized Linear Models (GLM) framework, as proposed by McCullagh (1980), developed by Nelder and Wedderburn (1972) and McCullagh and Nelder (1989), and extended with several variants by Peterson and Harrell (1990) and Cox (1995), among others. According to this line of reasoning, we model the probability of a response not exceeding a given category as a function of selected covariates; in fact, the distribution function induces an order constraint among the categories.

An alternative approach, mainly motivated by the investigation of respondents’ psychology, has been introduced by Piccolo (2003) and D’Elia and Piccolo (2005) and consists in the so-called cub models. They have been successfully applied in several fields since they allow for easy interpretation and visualization of the estimation results, and also for designing profiles and specifying clusters of respondents (Corduas 2008a, b, 2011). These models have since been extended in several directions and form the basis of the generalization we pursue in this paper, following the suggestions of Corduas et al. (2009) and the analysis of Iannario (2012a), who support the introduction of a further component, denoted as shelter effect. The novelty of the class of models discussed hereafter, denoted as GeCUB, is the ability to estimate the effect of subjects’ covariates for all the components of the extended mixture.

The paper is organized as follows: in the next section, we set notations and motivations for the model, whereas in Sect. 3 GeCUB models are specified and their usage is emphasized. Then, the main derivation of the maximum likelihood (ML) estimators is outlined in Sect. 4. A limited simulation experiment is performed in Sect. 5 to confirm the main properties of the ML procedures for finite sample sizes. Section 6 investigates the usefulness and the interpretation of these models in a real case study. Some final remarks conclude the paper.

2 Motivations and notation for the proposed mixture

Sample data consist of a collection of ordered scores \((r_1, \,r_2, \,\dots , r_n)\) anchored to the integers of the support \(I_m=\{1,2,\dots ,m\}\), for some known m. The ordered evaluation may concern opinions, judgments, degrees of liking/preference, and even a qualitative mapping of some continuous variable, but for simplifying the discussion we assume that responses are some sort of ratings in one-to-one correspondence with integers belonging to \(I_m\). Thus, respondents choose a qualitative assessment on a graduated sequence of verbal definitions (for instance, “extremely dissatisfied”, “very dissatisfied”, ..., “very satisfied”, “extremely satisfied”) which are coded as numbers just for convenience.

In statistical surveys further information is also collected, and we will speak of ratings and subjects’ covariates to refer to ordinal responses and information regarding the respondents, respectively. Our objective is to explain, fit, and predict the probability \(Pr\displaystyle \left( R=r\right) \) that a discrete random variable R assumes values \(r=1,2,\dots ,m\). When significant, subjects’ covariates should improve the performance and the interpretation of such a model. For this purpose, we introduce a probability structure where the final outcome of the evaluation process is a discrete observation generated by an investigated trait which is intrinsically continuous.

In this regard, two possible interpretations are admissible for explaining the mental process by which respondents rate their opinion/evaluation about an item by means of a finite and graduated scale.

According to the first interpretation, it may be conjectured that the i-th respondent adopts the following two-step strategy:

  • First of all, he/she chooses between a simplistic option (consisting in the selection of a modality which he/she considers very attractive because of the verbal wording and/or the numbering of the scale, for example) and a meditated response (which requires some thought). We assume that the selection between these two main alternatives happens with probabilities \(\delta _i\) and \(1-\delta _i\), respectively, for \(i=1,2,\dots ,n\). This choice may be motivated by a lazy behaviour of the respondent, who takes refuge in a category which is judged as convenient, safe, attractive, politically correct, etc. Such an option has been called shelter choice and it may be represented by a degenerate random variable located at \(R=c\), where \(c \in I_m\) is a known category depending on the specific question at hand:

    $$\begin{aligned} D_{r}^{(c)}\,=\,\left\{ \begin{array}{ll} 1, &{}\quad \text {if} \quad r=c;\\ 0, &{}\quad \text {otherwise}; \end{array} \right. \qquad \qquad r=1,2,\dots ,m. \end{aligned}$$
    (1)
  • If he/she selects the second option, the final selected category is a balanced decision between his/her feeling towards the item and a totally random choice, with propensities \(\pi _i\) and \(1-\pi _i\), respectively. This choice assumes a more involved respondent; thus the final decision is a weighted determination between a positive/negative sensation related to the item and a light/heavy indecision/fuzziness. In fact, the selection of one ordered modality among several is a very complex mental process since it involves several factors influencing the final choice (Tourangeau et al. 2000). Thus, a simplified version of such a psychological process should limit the analysis to the relevant components only. As fully discussed with reference to cub models (Piccolo 2003; Iannario and Piccolo 2012a), the attractiveness (or the repulsion) towards the item (= feeling) and the indecision (fuzziness) in the response (= uncertainty) have been considered as the relevant ones.

We consider feeling as an internal/personal attitude concerning the opinion of the subject towards the object and, depending on the circumstances, it may be named as degree of perception, measure of closeness, level of satisfaction/preference, assessment of proficiency, rating of concern, index of selectiveness, pain threshold, risk awareness, subjective probability, degree of confidence, etc.

On the other hand, uncertainty pertains to the operational modes of the final choice and to the external facts affecting and surrounding the final decision. Thus, uncertainty is not the “randomness” related to the sampling experiment, but it depends on convergent and related factors such as: limited set of information about the topic, personal interest/engagement in activities related to the problem, amount of time devoted to the response, nature of the scale in terms of range and wording, tiredness or fatigue affecting a correct comprehension of the question, willingness to joke and fake, lack of self-confidence, laziness/apathy/boredom of the respondent. Also, the “response style” may be interpreted as a component of uncertainty in the response (see Gottard et al. 2015 for a discussion of these and related topics).

In addition, uncertainty is also related to the “satisficing” behaviour (Simon 1957), which is generated by respondents who choose an adequate answer that may not be the optimal one, in the attempt to minimize the burden of the question (Krosnick 1991). This attitude generates a varying degree of indecision to answer a specific item and it ranges from a complete lack of satisficing (=  completely accurate response) to strong satisficing behaviour (=  completely random response). Then, we are assuming that uncertainty affects any individual choice and it can be, at worst, constituted by a purely random choice among categories. In intermediate cases, each respondent acts with a propensity to adhere to a thoughtful and to a completely random choice, and we will weigh such a propensity with quantities \((\pi _i)\) and \((1-\pi _i)\), respectively.

A second interpretation may be proposed and again it assumes a two-step strategy for the i-th respondent:

  • First of all, he/she decides either to activate his/her personal feeling towards the item with a meditated choice (as previously detailed) or to adopt a lazy behaviour stemming from a general mood of indecision, with probabilities \(\lambda _i\) and \(1-\lambda _i\), respectively, for \(i=1,2,\dots ,n\).

  • If he/she selects the second option, then he/she may activate a random selection over the support \(I_m\) or take refuge in a shelter category, and this happens with propensities \(\eta _i\) and \(1-\eta _i\), respectively.

The second interpretation is conceptually simpler and it is consistent with the “satisficing” behaviour. In the next section, the equivalence between the two conceptual models will be formally proved.

Turning these interpretations into a statistical framework, several distributions may adequately fit the implied components. The family of cub models (Piccolo 2003) is characterized by the shifted Binomial and the discrete Uniform random variable for modelling feeling and uncertainty, respectively, as defined by:

$$\begin{aligned} b_{r}(\xi _i)=\left( {\begin{array}{c}m-1\\ r-1\end{array}}\right) \xi _i^{m-r}(1-\xi _i)^{r-1};\qquad p_r^U=\frac{1}{m}; \qquad \qquad r=1,2,\dots ,m. \end{aligned}$$

Both pragmatic and statistical arguments support these choices. The shifted Binomial distribution involves a single parameter \((\xi _i)\) and, as this parameter varies, its mode can be located anywhere over the support \(\{1, 2,\dots ,m\}\). It allows a parsimonious parameterization when we have to fit observed distributions with different shapes in terms of skewness and flatness. Moreover, the Binomial distribution (and the shifted one) may be generated by a continuous unimodal distribution by selecting appropriate ordered cutpoints. Thus, this choice is consistent with the common hypothesis that a continuous latent variable drives the final selection of a discrete modality. Finally, the choice of the Binomial random variable may also be justified on the basis of statistical motivations, as detailed in “Appendix 1”.

As far as the discrete Uniform distribution is concerned, we adopt this random variable just as the extreme building block for the respondent choice since it is maximally uninformative and maximizes entropy over the class of discrete distributions with a given finite support. In addition, no parameter is added to the model (m is known) and we may judge the resoluteness of the respondent with respect to this extreme choice since it represents the maximum heterogeneity among the responses.

These arguments are behind the introduction of a discrete mixture (Piccolo 2003) defined by

$$\begin{aligned} Pr\displaystyle \left( R=r\right) =\pi _i\, b_{r}(\xi _i)+(1-\pi _i)\,p_r^U\,,\qquad i=1,2,\dots ,n, \end{aligned}$$
(2)

and called cub model since it is a convex Combination of a discrete Uniform and a shifted Binomial random variable.
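As a purely illustrative sketch (not part of the original derivation), the mixture (2) can be evaluated numerically as follows. The function names `shifted_binomial` and `cub_pmf` and the parameter values are assumptions chosen for the example.

```python
from math import comb


def shifted_binomial(r, m, xi):
    """b_r(xi) = C(m-1, r-1) * xi**(m-r) * (1-xi)**(r-1), for r = 1..m."""
    return comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)


def cub_pmf(m, pi, xi):
    """Probabilities Pr(R=r), r = 1..m, of the mixture in Eq. (2)."""
    return [pi * shifted_binomial(r, m, xi) + (1 - pi) / m
            for r in range(1, m + 1)]


# Arbitrary illustrative values: m = 7 categories, pi = 0.6, xi = 0.3.
probs = cub_pmf(m=7, pi=0.6, xi=0.3)
mode = probs.index(max(probs)) + 1  # category with the largest probability
print([round(p, 4) for p in probs], "mode:", mode)
```

Lowering \(\xi\) moves the mode towards m and raising it moves the mode towards 1, while lowering \(\pi\) flattens the distribution towards the discrete Uniform.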

3 Specification of a GeCUB model

For a given \(c \in I_m\) and known m, we will consider the observed response r as the realization of a random variable R whose probability distribution for the i-th subject, according to the first interpretation, is defined by:

$$\begin{aligned} Pr\displaystyle \left( R=r\right) =\delta _i\,\biggl [\,D_{r}^{(c)}\,\biggr ]\,+\,(1-\delta _i)\,\biggl [\,\pi _i\,b_{r}(\xi _i)+(1-\pi _i)\,p^U_r\,\biggr ], \quad r=1,2,\dots ,m. \end{aligned}$$
(3)

In the absence of covariates, this model has been denoted as cub model with a shelter effect by Iannario (2012a), who discusses properties, estimation issues and related topics.

If we adhere to the second interpretation, an alternative specification may be obtained:

$$\begin{aligned} Pr\displaystyle \left( R=r\right) =\lambda _i\,b_{r}(\xi _i)\,+\,(1-\lambda _i)\,\biggl [\,\eta _i\,p^U_r\,+\,(1-\eta _i)\,D_{r}^{(c)}\,\biggr ], \quad r=1,2,\dots ,m. \end{aligned}$$
(4)

Given the one-to-one mapping

$$\begin{aligned} \left\{ \begin{array}{lcl} \lambda _i &{} = &{} \pi _i (1-\delta _i);\\ \eta _i &{}=&{} \displaystyle \frac{(1-\pi _i) (1-\delta _i)}{1-\pi _i (1-\delta _i)}; \end{array} \right. \quad \Longleftrightarrow \quad \left\{ \begin{array}{lcl} \pi _i &{} = &{} \displaystyle \frac{\lambda _i}{\lambda _i+\eta _i (1-\lambda _i)};\\ \delta _i &{}=&{} (1-\lambda _i) (1-\eta _i); \end{array} \right. \qquad i=1,2,\dots ,n; \end{aligned}$$

the two GeCUB specifications (3) and (4) are equivalent, and either may be discussed. Hereafter, we focus on the model (3) since it gives an immediate weight \((\delta _i)\) to quantify the shelter effect.
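The equivalence of (3) and (4) under the mapping above can also be checked numerically. The sketch below uses hypothetical function names and arbitrary interior parameter values (the mapping is not defined at the boundary \(\lambda _i=1\)).

```python
from math import comb


def shifted_binomial(r, m, xi):
    return comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)


def gecub_pmf_3(m, c, pi, xi, delta):
    """Eq. (3): delta * D + (1 - delta) * [pi * b + (1 - pi) / m]."""
    return [delta * (r == c)
            + (1 - delta) * (pi * shifted_binomial(r, m, xi) + (1 - pi) / m)
            for r in range(1, m + 1)]


def gecub_pmf_4(m, c, lam, eta, xi):
    """Eq. (4): lam * b + (1 - lam) * [eta / m + (1 - eta) * D]."""
    return [lam * shifted_binomial(r, m, xi)
            + (1 - lam) * (eta / m + (1 - eta) * (r == c))
            for r in range(1, m + 1)]


def to_second_form(pi, delta):
    """(pi, delta) -> (lambda, eta)."""
    lam = pi * (1 - delta)
    eta = (1 - pi) * (1 - delta) / (1 - pi * (1 - delta))
    return lam, eta


def to_first_form(lam, eta):
    """(lambda, eta) -> (pi, delta)."""
    pi = lam / (lam + eta * (1 - lam))
    delta = (1 - lam) * (1 - eta)
    return pi, delta


# Arbitrary illustrative values.
m, c, pi, xi, delta = 7, 5, 0.7, 0.4, 0.1
lam, eta = to_second_form(pi, delta)
p3 = gecub_pmf_3(m, c, pi, xi, delta)
p4 = gecub_pmf_4(m, c, lam, eta, xi)
print(max(abs(a - b) for a, b in zip(p3, p4)))  # numerically negligible
```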

In standard cub random variables, thanks to the one-to-one correspondence between the parameters \((\pi ,\xi )\) and the probability distribution, we plot the estimated models as points in the unit square in order to interpret the behaviour of respondents when faced with different items, for varying circumstances of space, time and context. If we wish to add the additional parameter \((\delta )\) to this representation we may, for instance, increase the size of the point \((\pi ,\xi )\) or add a horizontal line starting at \((\pi ,\xi )\) with length proportional to \(\delta \).

In the presence of covariates, model (3) allows us to interpret the parameters in relation to the feeling \((1-\xi _i)\) of the respondent, the uncertainty \((1-\pi _i)\) of the responses and a possible shelter effect \(\delta _i\). Briefly, this effect is the weight of the shelter choice and it quantifies the increase of probability of the category \((R=c)\) with respect to a cub model (where \(\delta _i=0\)). Thus, cub models are nested into cub models with a shelter effect.
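To make the size of this increase explicit, a short worked step may help. Writing \(p_c^{cub}=\pi _i\,b_{c}(\xi _i)+(1-\pi _i)\,p^U_c\) for the cub probability of the category c (a shorthand introduced here only for this remark), model (3) evaluated at \(r=c\) gives:

$$\begin{aligned} Pr\displaystyle \left( R=c\right) =\delta _i+(1-\delta _i)\,p_c^{cub} \quad \Longrightarrow \quad Pr\displaystyle \left( R=c\right) -p_c^{cub}=\delta _i\,\left( 1-p_c^{cub}\right) \ge 0. \end{aligned}$$

Hence the increase vanishes only when \(\delta _i=0\) (or in the degenerate case \(p_c^{cub}=1\)), and every other category has its cub probability scaled down by the factor \((1-\delta _i)\).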

Suppose that information on the n subjects is summarized by a set of v variables and collected in the matrix

$$\begin{aligned} \varvec{T} =|| t_{ij},\,\, i=1,2,\dots ,n;\,\,\, j=1,2,\dots ,v||\,, \end{aligned}$$

which contains the available subjects’ covariates (the so-called concomitant variables). We consider sub-matrices \(\varvec{Y},\, \varvec{W},\, \varvec{X}\) obtained from \(\varvec{T}\) by selecting convenient columns. Then, we denote by \(\varvec{y}_i, \varvec{w}_i\), and \(\varvec{x}_i\), for \(i=1,2,\dots ,n\), the i-th rows of the \(\varvec{Y}, \varvec{W}\) and \(\varvec{X}\) matrices, respectively, that is:

$$\begin{aligned} \varvec{y}_i= & {} (y_{i0},y_{i1},y_{i2},\dots ,y_{ip}); \quad \varvec{w}_i=(w_{i0},w_{i1},w_{i2},\dots ,w_{iq});\\ \varvec{x}_i= & {} (x_{i0},x_{i1},x_{i2},\dots ,x_{is})\,. \end{aligned}$$

We let: \(y_{i0}=w_{i0}=x_{i0}=1\), for \(i=1,2,\dots ,n\). These rows contain all available sample information on the i-th subject related to the model components and they are necessary and sufficient for the model specification.

Then, for \(i=1, 2, \dots , n\), we introduce a direct logistic link between parameters and covariates:

$$\begin{aligned} \pi _i=\pi _i(\varvec{\beta })=\frac{1}{1+e^{-\varvec{y}_i \varvec{\beta }}}\,;\quad \xi _i=\xi _i(\varvec{\gamma })=\frac{1}{1+e^{-\varvec{w}_i \varvec{\gamma }}}\,;\quad \delta _i=\delta _i(\varvec{\omega })=\frac{1}{1+e^{-\varvec{x}_i \varvec{\omega }}}\,; \end{aligned}$$

where \(\varvec{\beta }=(\beta _0, \beta _1, \dots , \beta _p)', \varvec{\gamma }=(\gamma _0, \gamma _1, \dots , \gamma _q)'\) and \(\varvec{\omega }=(\omega _0, \omega _1, \dots , \omega _s)'\), respectively. Using the logit function, \(logit(p)=\log \left( p/(1-p)\right) \), the previous relationships are equivalent to:

$$\begin{aligned} { logit}\left( \pi _i\right) =\varvec{y}_i \varvec{\beta }\,;\quad { logit}\left( \xi _i\right) =\varvec{w}_i \varvec{\gamma }\,;\quad { logit}\left( \delta _i\right) =\varvec{x}_i \varvec{\omega }\,;\quad i=1,2,\dots ,n. \end{aligned}$$
(5)

Alternative links are admissible but we found that the logistic function is a convenient mapping in most real circumstances. Notice that, given the previous parameterization, the matrices \(\varvec{Y},\, \varvec{W},\, \varvec{X}\) may share any number of columns, or none at all.
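A minimal sketch of the links (5) for a single subject is given below. The function names and all numerical values (covariate rows and coefficient vectors) are hypothetical and serve only to illustrate the mapping from covariates to \((\pi _i, \xi _i, \delta _i)\).

```python
from math import exp


def logistic(z):
    """Inverse of the logit link: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + exp(-z))


def linear_predictor(row, coef):
    """Inner product of a covariate row (leading 1 included) and coefficients."""
    return sum(t * k for t, k in zip(row, coef))


def gecub_parameters(y_i, w_i, x_i, beta, gamma, omega):
    """Return (pi_i, xi_i, delta_i) for one subject, following Eq. (5)."""
    pi_i = logistic(linear_predictor(y_i, beta))
    xi_i = logistic(linear_predictor(w_i, gamma))
    delta_i = logistic(linear_predictor(x_i, omega))
    return pi_i, xi_i, delta_i


# Hypothetical subject: each row is an intercept plus one covariate.
y_i, w_i, x_i = (1.0, 0.5), (1.0, 0.5), (1.0, 1.0)
beta, gamma, omega = (0.8, -0.3), (-0.5, 0.2), (-2.0, 0.4)

pi_i, xi_i, delta_i = gecub_parameters(y_i, w_i, x_i, beta, gamma, omega)
print(round(pi_i, 4), round(xi_i, 4), round(delta_i, 4))
```

Whatever the coefficient values, the three parameters stay strictly inside (0, 1), which is the practical reason for working on the logit scale.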

To see how a single covariate affects the probability of the response, we may plot this probability mass function for any prefixed value of the discrete covariates or for some specific values of the continuous variables. Alternatively, we may consider how the points in the parameter space move for varying values of the covariates, or study the behaviour of \((1-\pi _i), (1-\xi _i)\) and \(\delta _i\) as functions of selected covariates, as will be done in the real case study.

Finally, a Generalized cub (GeCUB) model is fully specified by (3) and (5). For this model the length of the vector \((\varvec{\beta }', \varvec{\gamma }', \varvec{\omega }')'\) is \((p+q+s+3)\). If some or all subjects’ covariates (or components) are absent, the analysis is greatly simplified. In these circumstances, it is more convenient to refer to cub models without and with shelter effect, respectively, as derived by Piccolo (2006) and Iannario (2012a).

A critical point is that the model assumes c as a known constant. In principle, one might test a cub model with a possible shelter effect for any admissible \(c=1,2,\dots ,m\) and then accept the model with the best fit and significant parameters. In practice, in real case studies concerning a specific scientific field, researchers have nearly always accumulated evidence about a category c where people tend to give a response more often than predicted by the standard model. This happens because of psychological motivations, biased or sensitive questions, mass media pressure, difficulty in understanding the item, desire for privacy, specific wording, and so on. Thus, the knowledge of c is not a severe constraint in most current surveys.

4 Statistical inference for the GeCUB model

The sample ratings \(\varvec{r}=(r_1,r_2,\dots ,r_n)'\) are considered as realizations of the random sample \((R_1,R_2,\dots ,R_n)'\) where each \(R_i\) is independently distributed as a discrete random variable over the support \(I_m\).

In a mixture distribution, it is useful to characterize the notation of the parameters according to their roles. Thus, we denote by \(\varvec{\theta }=(\varvec{\psi }',\,\varvec{\eta }')'\) the full parameter vector of a GeCUB model, where \(\varvec{\psi }\) and \(\varvec{\eta }\) are the parameter vectors of the weights \((\alpha _g)\) and of the probability distributions \(\left( {\fancyscript{P}}_g\right) \), respectively, for the \(g=1,2,3\) components (as summarized in Table 1).

Table 1 Notation for the components of the mixture in the GeCUB model

Given the sample \(\varvec{r}\) and the information set of covariates \(\fancyscript{C}_i=(\varvec{y}_i, \varvec{w}_i, \varvec{x}_i)\), for \(i=1,2,\dots ,n\), the log-likelihood function may be written as:

$$\begin{aligned} \ell (\varvec{\theta })= & {} \sum _{i=1}^{n}\,\log \left( Pr\displaystyle \left( R=r_i|\fancyscript{C}_i,\,\varvec{\theta }\right) \right) =\sum _{i=1}^{n}\,\log \left( \sum _{g=1}^{3}\,\alpha _{gi}\,p_g(r_i; \varvec{\eta }_g)\right) \\= & {} \sum _{i=1}^{n}\,\log \biggl [\alpha _{1i}\,p_1(r_i; \varvec{\eta }_1)+\alpha _{2i}\,p_2(r_i; \varvec{\eta }_2)+\alpha _{3i}\,p_3(r_i; \varvec{\eta }_3)\biggr ]\\= & {} \sum _{i=1}^{n}\,\log \biggl [\delta _{i}\,D_{r_i}^{(c)}+\pi _i(1-\delta _i)\,b_{r_i}(\varvec{\gamma })+(1-\pi _i)(1-\delta _i)\,p^U_{r_i}\biggr ]\,. \end{aligned}$$

As for all mixture distributions, ML estimators are effectively obtained from \(\ell (\varvec{\theta })\) by exploiting the EM procedure proposed by Dempster et al. (1977) and specifically oriented to finite mixtures (McLachlan and Krishnan 2008; McLachlan and Peel 2000). Such a procedure is detailed for GeCUB models in “Appendix 2”. Asymptotic inference requires the knowledge of the information matrix for GeCUB models and this step is generally achieved by numerical computations or by simulation devices (bootstrap, for instance). However, it is more accurate to compute the second order derivatives of \(\ell (\varvec{\theta })\) by analytic methods.

All estimation procedures have been derived for the parameters of GeCUB models specified by (3). Given the invariance properties of ML estimators (Serfling 1980, p.43), it is immediate to get any estimation result of this model in terms of the alternative specification (4).
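For readers who wish to experiment, the log-likelihood \(\ell (\varvec{\theta })\) can be evaluated directly as sketched below. This is only a plain evaluation of the function under specification (3) with the links (5); it does not implement the EM procedure of “Appendix 2”, and the function names and the tiny data set are hypothetical.

```python
from math import comb, exp, log


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def dot(row, coef):
    return sum(t * k for t, k in zip(row, coef))


def shifted_binomial(r, m, xi):
    return comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)


def gecub_loglik(beta, gamma, omega, ratings, Y, W, X, m, c):
    """Sum over subjects of log Pr(R = r_i | covariates, theta)."""
    total = 0.0
    for r, y, w, x in zip(ratings, Y, W, X):
        pi = logistic(dot(y, beta))
        xi = logistic(dot(w, gamma))
        delta = logistic(dot(x, omega))
        prob = (delta * (r == c)                                 # shelter
                + pi * (1 - delta) * shifted_binomial(r, m, xi)  # feeling
                + (1 - pi) * (1 - delta) / m)                    # uncertainty
        total += log(prob)
    return total


# Hypothetical toy data: two subjects, intercept-only rows, m = 3, c = 2.
ratings = [2, 1]
Y = W = X = [(1.0,), (1.0,)]
value = gecub_loglik((0.0,), (0.0,), (0.0,), ratings, Y, W, X, m=3, c=2)
print(value)
```

Such a function could be passed to a general-purpose numerical optimizer as a cross-check of EM estimates, although that use is not discussed in the text.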

Then, the validation of the estimated model relies on several points:

  • parameter significance: this is assessed by comparing estimates to their standard errors (Wald test). Some caution is needed when we test on the border of the parametric space, since the significance level must be modified: a detailed account and related references for testing \(H_0: \delta =0\) for the shelter effect are discussed by Iannario (2012a).

  • log-likelihood comparisons: in the presence of nested models, we test the increase in log-likelihood against the standard \(\chi ^2\) percentiles to see if the more complex model is a valuable choice. Likelihood ratio tests, deviance and related statistics may be defined as in the current literature (Agresti 2010, pp. 67–75).

  • global indices: we consider measures such as \(BIC=-2\,\ell (\hat{\varvec{\theta }})+(p+q+s+3)\,\log (n)\), for instance, to take into account both the improvement in the log-likelihood and the penalty given by an increase in the number of parameters of the model.

  • residual diagnostics: Pearson and relative residuals may be defined and conveniently checked. In addition, further analyses based on the definition of generalized residuals (Di Iorio and Iannario 2012) may be pursued as well.
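The log-likelihood comparison and the BIC mentioned in the list above reduce to simple arithmetic once the maximized log-likelihoods are available. The sketch below uses hypothetical function names and invented log-likelihood values, purely to show the computations.

```python
from math import log


def bic(loglik_hat, n_params, n):
    """BIC = -2 * loglik + (number of parameters) * log(n)."""
    return -2.0 * loglik_hat + n_params * log(n)


def lr_statistic(loglik_nested, loglik_full):
    """Likelihood ratio statistic 2 * (l_full - l_nested) for nested models."""
    return 2.0 * (loglik_full - loglik_nested)


# Invented values: a cub model (2 parameters) nested in a cub model
# with shelter effect (3 parameters), both fitted on n = 500 ratings.
n = 500
ll_cub, ll_shelter = -905.0, -897.5

print("LR statistic:", lr_statistic(ll_cub, ll_shelter))
print("BIC cub:", round(bic(ll_cub, 2, n), 2))
print("BIC cub + shelter:", round(bic(ll_shelter, 3, n), 2))
```

The model with the smaller BIC is preferred; the LR statistic is compared with a \(\chi ^2\) percentile whose degrees of freedom equal the difference in the number of parameters, with the boundary caveat for \(H_0: \delta =0\) noted above.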

A fit index which compares the observed relative frequencies \(f_r\) with the expected probabilities \(\hat{p}_r=p_r(\hat{\varvec{\theta }})\) is based on a normalized dissimilarity measure:

$$\begin{aligned} \fancyscript{F}^2=1-\frac{1}{2}\,\sum _{r=1}^{m}{\mid } f_r-\hat{p}_r{\mid }. \end{aligned}$$

It may be interpreted as the proportion of correctly predicted responses (Iannario 2009). In the presence of discrete covariates with k categories, with obvious notation, this quantity may be generalized by means of

$$\begin{aligned} \fancyscript{F}^2=1-\frac{1}{2}\,\sum _{j=1}^{k}\,\frac{n_j}{n}\,\sum _{r=1}^{m}{\mid } f_{rj}-\hat{p}_{rj}{\mid }. \end{aligned}$$
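Both versions of the index are straightforward to compute. In the sketch below the function names are hypothetical, and each group is represented by a tuple \((n_j, f_{\cdot j}, \hat{p}_{\cdot j})\), which is an assumption about data layout made only for this example.

```python
def fit_index(observed_freq, fitted_prob):
    """F^2 = 1 - 0.5 * sum_r |f_r - p_r|."""
    return 1.0 - 0.5 * sum(abs(f - p)
                           for f, p in zip(observed_freq, fitted_prob))


def fit_index_grouped(groups):
    """Weighted version: groups is a list of (n_j, f_j, p_j) tuples."""
    n = sum(n_j for n_j, _, _ in groups)
    penalty = sum(n_j / n * sum(abs(f - p) for f, p in zip(f_j, p_j))
                  for n_j, f_j, p_j in groups)
    return 1.0 - 0.5 * penalty


# Invented frequencies and fitted probabilities for m = 4 categories.
f = [0.10, 0.20, 0.45, 0.25]
p = [0.12, 0.22, 0.40, 0.26]
print(round(fit_index(f, p), 3))
print(round(fit_index_grouped([(300, f, p), (200, p, p)]), 3))
```

The index equals 1 for a perfect fit and 0 when observed and fitted distributions have disjoint supports.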

From a predictive point of view, several problems have to be faced when ordinal data are involved. As a matter of fact, all modelling approaches are able to predict a whole probability distribution given the subjects’ covariates; accordingly, most of the methods (such as those involved in log-likelihood computations and analysis of deviance) concern the comparison of predicted and observed proportions of the ordinal categories.

On the other hand, the main purpose of the researcher is to predict the rating of a respondent, given his/her characteristics. Thus, we have to synthesize \(Pr\displaystyle \left( R=r \!\mid \! \hat{\varvec{\theta }}, \,{\fancyscript{C}}_i\right) \) by a predictor \(\hat{r}_i\) of \(r_i\), for \(i=1,2,\dots ,n\). Expectation, modal value (mode) and median of the estimated probability distribution, conditional on selected covariates \({\fancyscript{C}}_i\), are candidates for \(\hat{r}_i\). For any selection of a predictor, a Root Mean Square Error (RMSE) is defined by:

$$\begin{aligned} \textit{RMSE}=\sqrt{\frac{1}{n}\,\sum _{i=1}^{n}\, \left( r_i-\hat{r}_i\right) ^2}\,. \end{aligned}$$
(6)
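The three candidate predictors and the RMSE (6) may be sketched as follows. Function names and numbers are hypothetical; the median is taken here as the smallest category whose cumulative probability reaches 0.5, which is one common convention and an assumption of this example.

```python
from math import sqrt


def expectation(probs):
    """Mean of a distribution on 1..m (generally not an integer)."""
    return sum(r * p for r, p in enumerate(probs, start=1))


def mode(probs):
    """Category with the largest probability (first one in case of ties)."""
    return probs.index(max(probs)) + 1


def median(probs):
    """Smallest category whose cumulative probability is at least 0.5."""
    cumulative = 0.0
    for r, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= 0.5:
            return r
    return len(probs)  # guard against rounding in the last step


def rmse(ratings, predictions):
    """Root mean square error of Eq. (6)."""
    n = len(ratings)
    return sqrt(sum((r - h) ** 2 for r, h in zip(ratings, predictions)) / n)


# Invented fitted distribution for one covariate profile, m = 4.
probs = [0.1, 0.2, 0.4, 0.3]
print(expectation(probs), mode(probs), median(probs))
print(rmse([1, 2, 3], [1, 2, 5]))
```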

This measure should be critically considered since it is based on a point estimate; although it is useful to compare different predictors derived by different models, it should not be used to discriminate models belonging to different classes.

5 A simulation experiment

To check the ability of the proposed modelling approach to detect the presence of a possible shelter effect in the case of a finite sample size, a limited experimental design has been planned with a subjects’ covariate which, for simplicity, we suppose dichotomous. More precisely, in a rating survey with \(m=7\) categories, we assume the existence of two subgroups \({\fancyscript{G}}_0\) and \({\fancyscript{G}}_1\) characterized by the parameters: \(\varvec{\theta }_0= (\pi _0,\,\xi _0,\,\delta _0)'\) and \(\varvec{\theta }_1= (\pi _1,\,\xi _1,\,\delta _1)'\), respectively, with a shelter effect at the fifth category, so that \(c=5\).

Table 2 Experimental design for the simulation of GeCUB models

Table 2 lists \(\varvec{\theta }_i= (\pi _i,\,\xi _i,\,\delta _i)',\,i=0,1\) according to the cub and shelter parameterization and to the implied GeCUB parameters (when the dichotomous covariate is explicitly inserted). To express this basic structure in terms of GeCUB models, we exploit the relationships among \(\pi _0, \pi _1\) and \(\varvec{\beta }=(\beta _0,\,\beta _1)'\) parameters when a dichotomous covariate \(D_i,\,i=1,2,\dots ,n\) is present:

$$\begin{aligned} \pi _i= & {} \left[ 1+\exp (-\beta _0-\beta _1\,D_i)\right] ^{-1};\\D_i= & {} 0,1\,\Longrightarrow \,\, \beta _0=\log \,\frac{\pi _0}{1-\pi _0};\\ \beta _1= & {} \log \,\frac{\pi _1}{1-\pi _1}-\beta _0; \end{aligned}$$

similar relationships hold between \(\xi _i, \delta _i, i=0,1\) and \(\varvec{\gamma }=(\gamma _0,\,\gamma _1)', \varvec{\omega }=(\omega _0,\,\omega _1)'\), respectively.
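The relationships above amount to a difference of logits. A small sketch with hypothetical function names and illustrative group values (not those of Table 2) is:

```python
from math import exp, log


def logit(p):
    return log(p / (1.0 - p))


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def dummy_coefficients(p0, p1):
    """Map group-specific parameters (p0 for D=0, p1 for D=1) to
    (intercept, slope) on the logit scale."""
    b0 = logit(p0)
    b1 = logit(p1) - b0
    return b0, b1


# Illustrative values only: pi_0 = 0.8 for D = 0 and pi_1 = 0.5 for D = 1.
beta0, beta1 = dummy_coefficients(0.8, 0.5)
print(round(beta0, 4), round(beta1, 4))

# Round trip: the logistic link recovers the group-specific parameters.
print(round(logistic(beta0), 4), round(logistic(beta0 + beta1), 4))
```

The same function applies unchanged to \((\xi _0, \xi _1)\) and \((\delta _0, \delta _1)\), yielding \(\varvec{\gamma }\) and \(\varvec{\omega }\).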

The experiment is characterized by different configurations of location and shape of the probability distributions as shown in Fig. 1, where the contribution of the shelter effect at \(R=5\) has been emphasized.

Fig. 1 Population distributions (shelter at \(R=5\)) of two subgroups: \({\fancyscript{G}}_0\) (left) and \({\fancyscript{G}}_1\) (right)

For each simulation run, sample data consist of two samples of \(n_0=n_1=500\) observations \((r_i,d_i), i=1,2,\dots ,n\) where \(r_i\) is generated by the groups \({\fancyscript{G}}_0\) and \({\fancyscript{G}}_1\), and \(d_i\) assumes values 0 and 1 for the first and the second subgroups, respectively. Then, the following steps have been performed:

  1. Generate \(n_0\) and \(n_1\) observations from the “true” model.

  2. From the global sample \((r_1,r_2,\dots ,r_n)\) of \(n=n_0+n_1\) observations, estimate a GeCUB model by the ML method on the basis of the information \((r_i,d_i)\), for \(i=1,2,\dots ,n\) and collect estimates in the vector

    $$\begin{aligned} \varvec{\theta }^{\,[j]}=\left( \beta _0^{\,[j]},\,\beta _1^{\,[j]},\,\gamma _0^{\,[j]},\,\gamma _1^{\,[j]},\,\omega _0^{\,[j]},\,\omega _1^{\,[j]}\right) '. \end{aligned}$$

  3. Repeat steps 1-2 for \(j=1,2,\dots ,nsimul\), with \(nsimul=1000\).
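Step 1 of the procedure can be sketched as follows. The values \(m=7\), \(c=5\) and \(n_0=n_1=500\) come from the text, whereas the group parameters are illustrative placeholders (the entries of Table 2 are not reproduced here) and the function names are hypothetical. Step 2, the ML estimation, is not implemented in this sketch.

```python
import random
from math import comb


def gecub_pmf(m, c, pi, xi, delta):
    """Probabilities of model (3) for r = 1..m."""
    return [delta * (r == c)
            + (1 - delta) * (pi * comb(m - 1, r - 1) * xi ** (m - r)
                             * (1 - xi) ** (r - 1) + (1 - pi) / m)
            for r in range(1, m + 1)]


def simulate_group(n, m, c, pi, xi, delta, rng):
    """Draw n ratings from the GeCUB distribution of one subgroup."""
    probs = gecub_pmf(m, c, pi, xi, delta)
    return rng.choices(range(1, m + 1), weights=probs, k=n)


rng = random.Random(2024)  # fixed seed for reproducibility
m, c, n0, n1 = 7, 5, 500, 500

# Placeholder parameters for the two subgroups (assumed, not from Table 2).
theta0 = dict(pi=0.8, xi=0.3, delta=0.10)
theta1 = dict(pi=0.5, xi=0.7, delta=0.05)

r0 = simulate_group(n0, m, c, rng=rng, **theta0)
r1 = simulate_group(n1, m, c, rng=rng, **theta1)

# Global sample of pairs (r_i, d_i) with the dichotomous covariate d_i.
sample = [(r, 0) for r in r0] + [(r, 1) for r in r1]
print(len(sample), sample[:5])
```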

We report the estimates of bias and mean square error (MSE) of the parameters in Table 3 and we briefly comment on the main results of this experiment.
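Given the replicated estimates of one parameter, bias and MSE are computed in the usual Monte Carlo way. The sketch below uses a hypothetical function name and invented estimates; it also returns the ratio of the true parameter to the square root of the MSE, the summary quoted in the comments on the experiments.

```python
from math import sqrt


def bias_mse_ratio(estimates, true_value):
    """Monte Carlo bias, MSE and true_value / sqrt(MSE) for one parameter."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    ratio = true_value / sqrt(mse) if mse > 0 else float("inf")
    return bias, mse, ratio


# Invented replicated estimates of a parameter whose true value is 2.0.
estimates = [1.0, 3.0, 2.5, 1.5]
bias, mse, ratio = bias_mse_ratio(estimates, 2.0)
print(bias, mse, round(ratio, 3))
```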

Table 3 Bias and mean square error for the simulation experiments

Fig. 2 Histograms and approximating normal distributions for simulated estimates of Experiment 3. Each panel row represents the distributions of \(\hat{\beta }_i,\,\hat{\gamma }_i,\,\hat{\omega }_i\), for \(i=0\) (left) and \(i=1\) (right), respectively

  • The bias of the estimates is always very limited; thus, MSE is mainly due to the variability of the estimates. In fact, MSE is generally small but for some parameters it deserves some consideration.

  • The MSE of the estimator \(\hat{\beta }_1\) is more extreme in Experiments 1 and 2; the ratios of the parameters to the square root of MSE are 3.182 and 4.516, respectively. These MSEs are larger than expected as a consequence of a few atypical values observed in these experiments. A similar consideration applies to the MSE of the estimator \(\hat{\omega }_0\) in Experiment 2, for which the ratio of the parameter to the square root of MSE is \(-6.059\).

  • A different problem arises for the estimation of \(\omega _1\) in Experiments 2 and 3, for which the mentioned ratios are 1.169 and \(-2.123\), respectively. In these situations, the proportions of the shelter effect (estimated by \(\omega _1\)) at \(R=5\), expressed by \(\delta =0.15\) and \(\delta =0.05\), respectively, are large relative to the basic probabilities. Here, the values of \(Pr\displaystyle \left( R=5\right) \) are 0.218 and 0.107, and the shelter effect represents 69 % and 47 % of the probability of the category \(R=5\), respectively. In these cases, a larger number of observations is required for a more accurate estimation.

  • The asymptotic Normality of all estimators is sufficiently accurate, as shown by the histograms reported in Fig. 2 for the more extreme case (Experiment 3). Only the distribution of \(\hat{\omega }_1\), for the aforementioned reasons, presents a left tail which is too long for a Gaussian distribution, as a consequence of a limited set of atypical values. If we omit them, the resulting distribution is almost perfectly Normal.
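The percentages quoted in the comments above for the shelter effect follow from the ratio \(\delta / Pr\displaystyle \left( R=5\right) \). A two-line check, using only the figures reported in the text and a hypothetical helper name, is:

```python
def shelter_share(delta, prob_c):
    """Share of Pr(R = c) that is due to the shelter weight delta."""
    return delta / prob_c


# Values quoted in the text for the two cases discussed above.
for delta, prob_c in [(0.15, 0.218), (0.05, 0.107)]:
    print(delta, prob_c, round(100 * shelter_share(delta, prob_c)), "%")
```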

Although the parameters of the distributions have been selected in order to cover different shapes, more extensive simulations are required. We report that further experiments (not discussed here for brevity) have been performed for different numbers of categories, different proportions of the groups and varying sample sizes. They confirmed the adequacy of the ML estimation method and support the ability of the approach to detect different groups for samples of moderate/large size.

6 A real case study

We check the approach discussed so far with a real case study on political orientation, based on a survey of students of the University of Naples Federico II and their families and friends, carried out during 2010. Since the research is based on an observational sample, the study cannot be considered representative of the Italian population, but it illustrates the capability of the GeCUB approach in terms of fitting and interpretation of similar results.

It is generally difficult to collect reliable answers about political orientation, and this is particularly true if the research considers a finer disaggregation than a coarse definition of “Conservative”, “Moderate”, and “Liberal”. This happens, for instance, in Italy where the galaxy of political parties is extremely varied; thus, it is important to predict the political orientation of a person by means of related questions and/or different covariates which are strongly related to such an orientation.

The sample data consist of \(n=707\) questionnaires where respondents carefully expressed their (self-assessed) Political orientation as an ordinal variable R with \(m=9\) categories, where 1, 5, 9 stand for “Extremely to Left”, “Center” and “Extremely to Right”, respectively. In addition, several concomitant variables related to personal socio-demographic and economic situation, opinions, ranking of nationwide newspapers, etc. have been collected. Thus, 48 % of respondents are women and their average age is 38 (derived from a larger group of young university students and a smaller one of their relatives). Then, education is higher than the average of the population since about one half of interviewees has got a (secondary) diploma degree and about 40 % has a university degree.

After a preliminary analysis based on stepwise regression approaches, we found the following covariates to be relevant for explaining Political orientation: Age (\(=\)the respondents’ age transformed as deviations from the average of logged years), Rank (\(=\)the ranking assigned to a historic Italian newspaper, “L’Unità”, well known for Left positions; here, \(\mathtt{Rank}=1\) means it is the most preferred, \(\mathtt{Rank}=7\) means that it is considered the worst), and Demo (\(=\)a dummy variable which denotes whether the respondent has participated in public demonstrations in the last year).

All computations have been implemented by a program written in the GAUSS language, using ML methods and exploiting the EM procedure for convergence. Standard errors have been computed by analytical derivation of the observed information matrix with the ML estimates plugged in: details of these formal developments are reported in Iannario and Piccolo (2012b).

Table 4 Estimation of cub and g e cub models for the Political orientation

A g e cub model may be considered as the final step of several statistical analyses, including exploratory and correlation methods, cub models fitting with and without covariates, and a cub model with a dummy to check for a possible shelter effect in a definite category (see Table 4 for the estimates of these different models; standard errors are in parentheses). Figure 3 summarizes different aspects of these investigations, on which we briefly comment.

Fig. 3 Observed distribution of Political orientation, with estimated shelter effect (top-left). Box-plots of Political orientation with respect to Rank (top-right). cub models distributions conditional to Demo=0,1 (bottom-left). Heterogeneity index of Political orientation with respect to Rank (bottom-right)

The data set is characterized by serious uncertainty in the responses, since we get \((1-\hat{\pi })=0.57\) after fitting a cub model. The observed distribution shows a prominent shelter effect at \(R=5\) (see Fig. 3, top-left panel): an appreciable proportion of people, estimated by \(\hat{\delta }=0.089\), chooses an intermediate position which also corresponds to a non-selective choice. This option represents a genuine shelter option. By missing this point, one might deduce that the Political orientation of the interviewees is strongly anchored to a “Centre” position (about 1/5 of the respondents), a statement not confirmed by electoral results and other empirical analyses.

Moreover, a significant relationship has been found between the expressed Political orientation and the covariate Demo, as confirmed by the estimated cub model which includes Demo as a covariate for feeling: Fig. 3 (bottom-left panel). Similarly, there is sharp evidence of a connection between Political orientation and the covariate Rank, as supported by the conditional box-plots of Fig. 3 (top-right panel): here the location of the distribution of the responses changes with Rank in a non-linear fashion.

In the framework of cub models, \((1-\pi )\) can be considered as a direct measure of indecision and \(\pi \) is strongly related to the heterogeneity of the distribution (Iannario 2012c, pp. 169–171). Thus, to check whether the heterogeneity is related to some subjects’ covariate, we compute this measure on the ordinal data split by a categorical variable, such as Rank. We choose the normalized Laakso and Taagepera (1989) index which, for a discrete probability mass function \(\{p_1,p_2,\dots ,p_m\}\), is defined by

$$\begin{aligned} \fancyscript{H}=\frac{1}{m-1}\left[ \left( \sum _{i=1}^{m}\,p_i^{\,2}\right) ^{-1}-1\right] . \end{aligned}$$

The estimate of this index for the Political orientation conditional to Rank is shown in Fig. 3 (bottom-right). Although non-linear, we see a definite increase of \(\fancyscript{H}\) with the covariate Rank so we may expect a relationship of the uncertainty with this variable.
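As a minimal sketch of how the index defined above can be evaluated, the following Python function computes it for any probability mass function. The function name `normalized_heterogeneity` and the two example distributions are illustrative assumptions and do not come from the survey data.

```python
def normalized_heterogeneity(pmf):
    """Normalized Laakso-Taagepera index: ((sum_i p_i^2)^(-1) - 1) / (m - 1)."""
    m = len(pmf)
    if m < 2:
        raise ValueError("at least two categories are required")
    if abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    sum_sq = sum(p * p for p in pmf)
    return (1.0 / sum_sq - 1.0) / (m - 1)


# Two limiting cases on m = 9 categories (hypothetical inputs).
uniform = [1.0 / 9] * 9                      # every category equally likely
degenerate = [0.0] * 4 + [1.0] + [0.0] * 4   # all mass on R = 5

h_uniform = normalized_heterogeneity(uniform)
h_degenerate = normalized_heterogeneity(degenerate)
```

The index equals 1 for the uniform distribution (maximal heterogeneity) and 0 when all the mass sits on a single category. In practice the function would be applied to the relative frequencies of the responses within each level of Rank.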

Finally, observe that when \(\xi \rightarrow 1\) (\(\xi \rightarrow 0\)) people express preference for an extreme left (right) position; thus, the parameter \(\xi \) is a direct measure of “Left” orientation.

A cub model with covariates presents a good fit but it does not take the shelter effect into account. In fact, as the log-likelihood and BIC measures confirm (see Table 5), it is relevant to introduce a shelter effect with significant covariates by means of a g e cub model.

Table 5 Fitting measures of estimated models for the Political orientation

The final g e cub model interprets such a relationship between the dependent variable and Age, Demo and Rank, with a significant contribution of the interaction between Age and Rank due to the different personal histories of the respondents (who regarded this newspaper with extreme positive or negative feeling according to their Political orientation). This circumstance is confirmed by the fact that a cub model with covariates (and no shelter component) takes Age as a significant covariate for explaining uncertainty, whereas once a shelter component is introduced Age becomes a useful covariate to explain this effect. We observe that the introduction of a shelter effect removes the significant role of Gender and emphasizes the importance of Rank for assessing the distribution of the responses.

An immediate visualization consists in plotting the estimated probability distributions of the g e cub models conditional to the significant subjects’ covariates, that is Demo, Rank and Age, as in Fig. 4. It is evident that participation in demonstrations (Demo=1) increases the probability of being Left oriented (=low categories of the support). Rank is strongly related to Political orientation, since a low rank for the selected newspaper is closely related to Left orientation; notably, young respondents who give high consideration to this newspaper also have a considerable shelter effect. Age has a double effect: with increasing age people move towards Right and the shelter effect becomes so prominent that it accounts for about 50 % of the probability of response.
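To make the shape of such distributions concrete, here is a hedged Python sketch of a probability mass function with a shelter component. It assumes the mixture form \(\delta \,D_r^{(c)}+(1-\delta )\left[ \pi \,b_r(\xi )+(1-\pi )/m\right] \), with \(b_r(\xi )\) a shifted Binomial, which agrees with the weights \(\delta _i\) and \(1-\lambda _i\) used in this section; the defining equation itself lies outside this section. The function name `gecub_pmf` and the parameter values are hypothetical and are not the estimates of Table 4.

```python
from math import comb


def gecub_pmf(m, pi, xi, delta, shelter):
    """Probabilities of R = 1..m under a mixture of a shifted Binomial,
    a discrete Uniform and a degenerate mass at the shelter category."""
    probs = []
    for r in range(1, m + 1):
        # Shifted Binomial: xi close to 1 pushes mass towards r = 1.
        feeling = comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)
        uncertainty = 1.0 / m
        point_mass = 1.0 if r == shelter else 0.0
        mixture = pi * feeling + (1 - pi) * uncertainty
        probs.append(delta * point_mass + (1 - delta) * mixture)
    return probs


# Illustrative values only: m = 9 categories, shelter at the "Centre" R = 5.
example = gecub_pmf(m=9, pi=0.45, xi=0.70, delta=0.10, shelter=5)
```

Evaluating `gecub_pmf` on a grid of parameter values obtained from the covariate links yields the kind of conditional distributions that are plotted for each profile.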

Fig. 4 Estimated g e cub probability distributions for given subjects’ covariates

More stringent evidence of the shelter effect emerges by considering the role of the parameters \(\delta _i\) in the estimated g e cub model:

$$\begin{aligned} \hat{\delta }_i\,=\,\displaystyle \frac{1}{1+e^{2.842+\widetilde{Age}_i\,\left( 4.603-1.357\,Rank_i\right) }}\,, \qquad i=1,2,\dots ,n. \end{aligned}$$

The shelter effect \(\delta _i\) adds a positive probability at modality \(R=5\) (the so-called “Centre” choice) and this effect changes with Age and with the interaction between Age and Rank: the interaction acts positively with Age if \(\mathtt{Rank}=1, 2\) (people who substantially agree with the positions of the selected newspaper) and negatively elsewhere. The behaviour of \(\delta _i\) explains this composite effect for varying Age and Rank.
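The estimated relationship for \(\hat{\delta }_i\) can be evaluated directly. The following Python sketch transcribes the coefficients of the formula above; the function names are assumptions, and `mean_log_age` is a hypothetical argument because the sample mean of logged ages is not reported in the text.

```python
import math


def centered_log_age(age, mean_log_age):
    """Age covariate: log(age) minus the sample mean of logged ages.
    mean_log_age must be supplied by the user (not given in the text)."""
    return math.log(age) - mean_log_age


def shelter_effect(age_dev, rank):
    """Estimated delta_i for a centred log-age deviation and Rank in 1..7."""
    eta = 2.842 + age_dev * (4.603 - 1.357 * rank)
    return 1.0 / (1.0 + math.exp(eta))


# Shelter effect over a small grid of age deviations and ranks.
grid = {
    (age_dev, rank): shelter_effect(age_dev, rank)
    for age_dev in (-0.5, 0.0, 0.5)
    for rank in range(1, 8)
}
```

At `age_dev = 0` the estimate does not depend on Rank and equals \(1/(1+e^{2.842})\), about 0.055; away from that point the sign of \(4.603-1.357\,Rank_i\) determines whether the shelter effect grows or shrinks with age.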

Fig. 5 Shelter effect of the estimated g e cub model for varying Age and Rank

Figure 5 suggests that for people aged less than 34 years the shelter effect is quite moderate when they hold the newspaper in low regard (=high rank). A more important contribution is registered for people aged more than 34 years (especially when respondents are elderly): the shelter effect systematically increases if they rank the newspaper in a high position (=bad consideration). Thus, the refuge position more decisively attracts people who view the selected newspaper negatively and are not so young (again, this is related to the heated political debate in Italy during the last 50 years).

A further interpretation may be derived if we exploit the second specification (4) of a g e cub model where the weight of the combination of uncertainty and shelter effect (a sort of global indecision) is measured by \(1-\lambda _i\) and estimated as:

$$\begin{aligned} 1-\hat{\lambda }_i\,=\,1-\left( 1-\hat{\delta }_i\right) \,\,\displaystyle \frac{1}{1+e^{-2.133\,+\,0.355\,Rank_i}}\,, \qquad i=1,2,\dots ,n. \end{aligned}$$

This quantity is shown in Fig. 6 and confirms that the global indecision decreases sharply for young people up to 30 years of age and then remains substantially constant. By contrast, this global effect increases regularly with age for respondents who are right-oriented.
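The global indecision can be sketched in the same way. The Python code below combines the two estimated relationships reported in this section; the function names are assumptions, and the logistic term in Rank is read here as the estimate of \(\pi _i\), which is our interpretation of the formula for \(1-\hat{\lambda }_i\).

```python
import math


def shelter_effect(age_dev, rank):
    """Estimated delta_i (coefficients taken from the formula in the text)."""
    eta = 2.842 + age_dev * (4.603 - 1.357 * rank)
    return 1.0 / (1.0 + math.exp(eta))


def pi_hat(rank):
    """Logistic term in Rank, interpreted as the estimate of pi_i."""
    return 1.0 / (1.0 + math.exp(-2.133 + 0.355 * rank))


def global_indecision(age_dev, rank):
    """1 - lambda_i = 1 - (1 - delta_i) * pi_i."""
    return 1.0 - (1.0 - shelter_effect(age_dev, rank)) * pi_hat(rank)


# Global indecision by Rank at the reference age (age_dev = 0).
by_rank = [global_indecision(0.0, rank) for rank in range(1, 8)]
```

Since \(1-\lambda _i\ge \delta _i\) by construction, the global indecision is never smaller than the shelter effect alone; at the reference age it increases with Rank because the logistic term decreases in Rank.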

Fig. 6 Global indecision of the estimated g e cub model for varying Age and Rank

For a comparative analysis, some results obtained by using a more consolidated approach, such as proportional odds models (pom) (Agresti 2010, p. 53), are discussed and the main fitting measures are reported in the last line of Table 5. The procedure for selecting significant explanatory variables leads to the set of covariates {Age, Rank, Demo, Age \(\times \) Rank}, as for g e cub models. The inclusion of Gender does not significantly raise the likelihood.

Finally, two expected profiles according to pom and g e cub models (fitted with the same explanatory covariates) are compared to see the effect of the different structures on the probabilities of responses. These results are summarized in Fig. 7 for a young left-oriented respondent who reports participating in demonstrations (profile A: left panel) and for an elderly right-oriented respondent who does not participate in demonstrations (profile B: right panel). It seems evident that both models capture a similar pattern in the probabilities of responses; however, the g e cub model includes more uncertainty in the expected probabilities and gives special importance to the shelter option, which is only partially captured by a pom. In addition, the g e cub model relates this effect to significant covariates, as already shown in Fig. 5.

Fig. 7 Expected profiles of responses according to pom and g e cub models. Profile A concerns a young left-oriented respondent who participates in demonstrations (left panel). Profile B concerns an elderly right-oriented respondent who does not participate in demonstrations (right panel)

A few considerations may be added: first, the estimation of 8 additional cutpoints in pom induces a serious loss in parsimony and, secondly, these models transfer the effect of the respondents’ indecision into the global effect of covariates on the responses. This may bias the interpretation of the results and the corresponding prediction of the respondents’ behaviour. Thus, although maximum log-likelihood values are definitely comparable, the g e cub model seems more convenient in terms of fitting and parsimony (as measured by BIC).
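The cost of the extra cutpoints can be quantified with a short calculation. The Python sketch below assumes the usual definition \(BIC=-2\,\ell +p\,\log n\), which we take to be the one underlying Table 5; the log-likelihood value and the parameter counts are hypothetical placeholders, not the fitted ones.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion under the usual definition."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


n_obs = 707          # sample size of the case study
extra_cutpoints = 8  # m - 1 cutpoints required by a pom with m = 9

# BIC penalty attributable to the cutpoints alone.
penalty = extra_cutpoints * math.log(n_obs)

# Two models with the same (hypothetical) log-likelihood that differ
# only by the 8 cutpoints; the parameter counts are placeholders.
hypothetical_log_lik = -1400.0
bic_without_cutpoints = bic(hypothetical_log_lik, 4, n_obs)
bic_with_cutpoints = bic(hypothetical_log_lik, 4 + extra_cutpoints, n_obs)
```

With \(n=707\), each additional parameter costs \(\log 707\approx 6.56\) BIC points, so 8 cutpoints add about 52.5 points when the log-likelihoods are equal.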

The prediction of ordinal data is generally assessed by comparing the observed proportions of categorical responses with the fitted ones: this is performed by computing deviance measures which are related to the previous log-likelihood considerations. In fact, these models may be used also for predicting the responses of a single respondent given the covariates by means of the estimated significant relationships. More specifically, we use the estimated \(\hat{r}_i\) computed from the g e cub model to predict the Political orientation of any respondent, given the knowledge of Age, Rank and Demo. In Table 5, we list the RMSE obtained by different estimated models.

In our case study, the modal value as a predictor outperforms the expected value (the modal value is also easier to interpret, since it is a proper category). The improvement of a g e cub with respect to a cub model with covariates is quite moderate; however, from a predictive point of view, g e cub should be preferred if we use the modal value as point predictor for the response, and it is comparable to pom if we use the mean.

A common problem arises in the setting of point prediction: although the estimated probability distribution fits the observed one satisfactorily, it is difficult to predict a single response with high confidence, given the appreciable level of uncertainty which surrounds these options. The estimated g e cub model is able to predict the response exactly or within \(\pm 1\) category in more than 60 % of cases. However, the substitution of the whole estimated probability distribution with a single value (mode, mean, or median) misses at least one aspect of the features of the responses: in our data set the observed data are positively skewed and it is almost impossible to predict responses such as \(R=7,8,9\) (which have been observed in 124 cases, a relative frequency of 0.175). In fact, the prediction of these high categories requires the knowledge of further covariates which are explicitly related to these extreme values.
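The point-prediction measures discussed above can be sketched as follows. This Python code takes one fitted probability distribution per respondent and computes the modal and mean predictors, the RMSE and the share of predictions within \(\pm 1\) category; all function names and the toy inputs are hypothetical and do not reproduce the survey data or the values of Table 5.

```python
from math import sqrt


def modal_value(pmf):
    """Category (1-based) with the largest fitted probability."""
    return max(range(len(pmf)), key=lambda k: pmf[k]) + 1


def expected_value(pmf):
    """Mean of the fitted distribution over categories 1..m."""
    return sum((k + 1) * p for k, p in enumerate(pmf))


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted responses."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def hit_rate_within(observed, predicted, tol=1):
    """Share of predictions within +/- tol categories of the response."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tol)
    return hits / len(observed)


# Toy example with m = 3 categories and three respondents.
fitted = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
observed = [1, 3, 3]
by_mode = [modal_value(p) for p in fitted]
by_mean = [expected_value(p) for p in fitted]
rmse_mode = rmse(observed, by_mode)
rmse_mean = rmse(observed, by_mean)
share_close = hit_rate_within(observed, by_mode, tol=1)
```

Comparing `rmse_mode` and `rmse_mean` across competing models mirrors the comparison between modal and mean predictors, while `hit_rate_within` corresponds to the "exactly or within \(\pm 1\) category" criterion.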

7 Concluding remarks

g e cub models share with the GLM framework the presence of stochastic and systematic components (given by (3) and (5), respectively) but differ in several respects; a comparative discussion has been pursued by Iannario and Piccolo (2015). The main points are the following:

  • Data generating process for g e cub models concerns a probability mass function, whereas a distribution function is involved in the derivation of cumulative models.

  • g e cub and cub models do not belong to the exponential family.

  • In g e cub models the link among parameters and covariates is a direct one, without involving moments.

  • Parsimony of g e cub models is an added value since no cutpoints are required in the estimation procedures; thus, g e cub models should be preferred for larger m.

The specification of g e cub models may be extended in several directions and we list a few of them according to current research.

  • Some interesting relationships between the shelter choice and the presence of “don’t know” responses in rating surveys have been recently emphasized by Manisera and Zuccolotto (2014a) with special reference to cub models.

  • The Binomial distribution for the feeling has been generalized with the introduction of a Beta-Binomial random variable to take into account a possible overdispersion in the ordinal data (Iannario 2012b, 2014), also with subjects’ covariates (Piccolo 2015). A similar approach would lead to g e cube models.

  • The standard structure of cub and g e cub models assumes a constant uncertainty for all categories. Some interesting improvements have been recently obtained by Gottard et al. (2015) who considered a varying uncertainty in the model by specifying an a priori distribution for the subjects’ indecision. Similar considerations may be pursued by inserting a varying uncertainty in the g e cub structure and this extension does not require further parameters to be estimated.

  • In some circumstances, as in sensometric studies, it is convenient to introduce objects’ covariates (Piccolo and D’Elia 2008) in the link of the parameters since consumers’ preferences are undoubtedly conditioned by the sensory characteristics of the items under scrutiny (food or beverage, for instance). This proposal may be usefully applied to g e cub models.

  • When data are organized according to a hierarchical structure, it may be effective to consider multilevel models: thus, hierarchical cub models have been introduced (Iannario 2012d). This random effect might be extended to g e cub models in order to capture hierarchical structures and cluster variability.

To summarize, this paper has presented the main statistical issues of g e cub models for studying ordinal data, within a general framework derived from an interpretation of the data generating process for this kind of observations. More experience is necessary to implement designs able to detect and test the components of these mixtures. In addition, convenient and effective starting values for the EM procedure are required, as already obtained for cub models (Iannario 2012c).

Both the reported simulations and the empirical analysis, although limited, suggest that this framework can accomplish the standard goals of the analysis of ordinal data with an added value in terms of interpretation of the components and their effects, parameter parsimony and immediate graphical facilities. As in any scientific investigation, a multiple perspective for modelling real phenomena should be considered a positive improvement of knowledge.