Abstract
Ordinal responses often arise from surveys which require respondents to rate items on a Likert scale. Since most surveys contain more than one question, the data collected are multivariate in nature, and the associations between different survey items are usually of considerable interest. In this paper, we focus on a mixture distribution, called the combination of uniform and binomial (CUB), under which each response is assumed to originate from either the respondent’s uncertainty or the actual feeling towards the survey item. We extend the CUB model to the bivariate case for modelling two correlated ordinal variables without using copula-based approaches. The proposed model allows the associations between the unobserved uncertainty and feeling components of the variables to be estimated, a distinctive feature compared to previous attempts. This article describes the underlying logic and deals with both theoretical and practical aspects of the proposed model. In particular, we show that the model is identifiable under a wide range of conditions. Practical inferential aspects such as parameter estimation, standard error calculations and hypothesis tests are discussed through simulations and a real case study.
1 Introduction
Ordinal data are frequently encountered in various disciplines. As mentioned in Anderson (1984), ordinal data often arise in two situations: (1) thresholding an underlying continuous variable, and (2) ranking provided by an assessor after processing an unspecified amount of available information. An example of the first type could be the abundance of species based on percentage cover on the ground, which can be defined as 0 (absence), 1 (> 0–5% cover), 2 (> 5–12% cover), and so on (Guisan and Harrell 2000). When it is reasonable to assume the existence of a latent continuous variable, logit- or probit-type regression models are commonly employed to analyse the data (McCullagh 1980; Agresti 2010); see also a recent review for a detailed account of various ordinal regression models (Tutz 2022).
The second type of ordinal data is usually recorded in terms of a Likert scale, which has become a widely used tool in research involving surveys and questionnaires (Joshi et al. 2015). For example, in visual grading experiments for medical images, assessors are often requested to classify an image using one of several possible options such as “Definitely it is not clearly visible”, “Probably it is not clearly visible”, and so on (Al-Humairi et al. 2022). Since this type of data is usually collected from human respondents, there exist response biases, so the data may not truly reflect the respondent’s actual opinion towards the survey item (Baumgartner and Steenkamp 2006). For example, in answering a survey question, some people may choose a satisficing option rather than investing their time to give the optimal answer (Krosnick 1999). Van Vaerenbergh and Thomas (2013) have also reported different response styles where respondents tend to choose an answer regardless of the content. Thus, any serious attempt to analyse survey data should take into account the potential response biases inherent in the data. As argued by Iannario and Piccolo (2016), one of the simplest ways to model these kinds of data is to use a two-component model, which explicitly assumes that the data are generated from two processes as described below.
To this end, this paper focuses on the use of finite mixture models to analyse ordinal data arising from surveys. An advantage of using finite mixture models is that the data can be considered as generated from different underlying processes or heterogeneous populations, allowing for greater flexibility (McLachlan et al. 2019). A popular mixture model that has gained attention recently is the combination of uniform and binomial (CUB) model. Since their introduction by Piccolo (2003) and D’Elia and Piccolo (2005), CUB models and their variants have been widely applied in various disciplines to model ordinal data, especially those arising from surveys which require respondents to choose a response from a Likert scale. For example, CUB models have been applied in modelling survival probabilities (Iannario and Piccolo 2010b), customer preferences on food quality (Piccolo and D’Elia 2008), and job satisfaction (Gambacorta and Iannario 2013), to name just a few.
Under the settings of CUB models, the uniform component represents the indecisiveness or uncertainty of the respondent towards the survey item. In such a case, the respondent is assumed to pick an answer completely at random. The binomial component, on the other hand, is related to the feeling or actual opinion of the respondent towards the survey item. The stronger the feeling, the higher the rating. However, the analyst will not be able to distinguish whether the response is a completely random selection or a reflection of the actual feeling of the respondent. Nonetheless, the estimated parameters could inform the measure of uncertainty and preference for typical respondents. More details regarding the foundations and developments of CUB models can be found in a recent review (Piccolo and Simone 2019).
Most of the CUB models developed so far are univariate in nature. In other words, they focus on merely one survey item or question. Since most surveys contain more than one question, the data collected are multivariate in nature. To capture the dependency structure between the responses from several survey items, multivariate models are required. Some notable attempts to introduce multivariate CUB distributions include Corduas (2011, 2015), Andreis and Ferrari (2013), Colombi and Giordano (2016) and Colombi et al. (2019). Except for the last one, all these works use copula-based methods to combine univariate CUB random variables. In particular, Colombi and Giordano (2016) employed the Sarmanov distribution while the others used the Plackett distribution. On a related note, Barbiero (2021) demonstrates how a joint distribution of two CUB margins can be constructed using copulas to match a desired correlation. While copula-based methods are flexible, they have limitations that cannot be overlooked. Firstly, copulas are usually applied to continuous random variables. The dangers and restrictions of applying the same practices to discrete distributions have been outlined by several authors; see, for example, Genest and Nešlehová (2007) and Geenens (2020). Specifically, since copulas cannot be uniquely defined for discrete variables (Nelsen 2006), there are identifiability issues, which may cause inconsistency in parameter estimation (Genest and Nešlehová 2007). Secondly, the parameters of copula models are usually related to either the rank or the Pearson correlation between the two univariate random variables. However, since CUB random variables are a combination of two processes, the copula parameters (assuming they are consistently estimated) would relate only to the overall correlation between the mixtures, rather than to the correlation between the individual uniform or binomial components. This may make the estimated copula parameters difficult to interpret.
To avoid the above concerns, this paper aims to construct a joint distribution for \((R_1,R_2)\), which represents a pair of ratings arising from a survey, using bivariate uniform and bivariate binomial distributions. Some important features of the proposed model include (1) both \(R_1\) and \(R_2\) follow a CUB distribution marginally, (2) the joint distribution is not derived through copula-based routines, and (3) the dependency between the uniform and binomial components can be estimated separately, allowing better interpretation of model parameters. Our proposed model is similar to the hierarchical marginal models with latent uncertainty (HMMLU) proposed by Colombi et al. (2019), which will be described more formally in Sect. 3. Briefly, in their work, the uncertainty components can take a more flexible shape while the feeling components and the corresponding associations are modelled directly using marginal logits and log odds ratios, in the spirit of marginal models (Molenberghs and Lesaffre 1994; Bartolucci et al. 2007). One drawback of HMMLU is that the uncertainty components are assumed to be independent. Our proposed model overcomes this by having a parameter that directly measures the correlation between the uncertainty components. Another drawback of HMMLU lies in the large number of parameters, which characterise the marginal logits and log odds ratios, especially in the absence of covariates. In our proposed model, the feelings are modelled using a bivariate binomial distribution which contains only three parameters, making it more parsimonious.
The rest of the paper is organised as follows. Section 2 provides a brief account of the CUB, bivariate uniform and bivariate binomial distributions. Section 3 demonstrates how these distributions can be combined to form a new class of bivariate CUB models. A comparison between the proposed model and HMMLU is provided as well. Section 4 deals with various inferential issues including identifiability, parameter estimation, calculation of standard errors and hypothesis tests. Simulation and application results are reported in Sects. 5 and 6, respectively. Finally, Sect. 7 provides a conclusion and discussions.
2 Preliminaries
Formally, a random variable R is said to follow the CUB distribution with parameters \(\pi \) and \(\xi \), denoted by \(R\sim CUB(\pi ,\xi )\), if its probability mass function (pmf) is a mixture of the discrete uniform distribution and a binomial distribution. Suppose R takes one of the \((m+1)\) values from \(\lbrace 0,1,2,\ldots ,m\rbrace \); the pmf then admits the form
\[P(R=r)=\pi \, C_{r} ^m (1-\xi )^{r}\xi ^{m-r}+(1-\pi )\frac{1}{m+1},\quad r=0,1,\ldots ,m.\]
Here, the mixing weight \((1-\pi )\) measures the degree of uncertainty while \((1-\xi )\) measures the degree of feeling. As \((1-\xi )\) increases, there is a higher chance of observing a higher rating.
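As a minimal illustration of the univariate building block, the following Python sketch evaluates a CUB pmf on \(\lbrace 0,\ldots ,m\rbrace \) as the mixture \(\pi \times \) binomial \(+\,(1-\pi )\times \) uniform described above. The function name `cub_pmf` and the parameter values are illustrative assumptions, not taken from any existing package.

```python
from math import comb

def cub_pmf(r, m, pi, xi):
    """P(R = r) for R ~ CUB(pi, xi) on {0, ..., m} (ordinary, unshifted binomial)."""
    feeling = comb(m, r) * (1 - xi) ** r * xi ** (m - r)  # binomial component
    uncertainty = 1 / (m + 1)                             # discrete uniform component
    return pi * feeling + (1 - pi) * uncertainty

# Hypothetical example: 5 categories (m = 4), fairly decisive respondents.
m, pi, xi = 4, 0.7, 0.3
pmf = [cub_pmf(r, m, pi, xi) for r in range(m + 1)]
mean = sum(r * p for r, p in enumerate(pmf))
print([round(p, 4) for p in pmf], round(mean, 4))
```

The mean of the mixture is \(\pi m(1-\xi )+(1-\pi )m/2\), so a smaller \(\xi \) shifts the mass towards higher ratings, in line with the interpretation of \((1-\xi )\) given above.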
In the literature, CUB random variables are assumed to range from 1 to m instead of starting from zero, represented by a shifted binomial distribution (Iannario and Piccolo 2010a). However, in this paper, we use the ordinary binomial distribution for several reasons. Firstly, real survey choice sets are often textual and arbitrary in numbering, making numerical interpretation less meaningful. Secondly, we treat the binomial component as a sum of independent Bernoulli variables, which naturally starts from zero. Lastly, ordinary binomial distribution results are more accessible and less confusing for readers unfamiliar with the history of CUB models.
To construct a bivariate model for \(R_1\) and \(R_2\) (which may represent rating responses from two survey questions), we first provide some details on a bivariate discrete uniform distribution and a bivariate binomial distribution which we have chosen to work on. A main feature of these distributions is that the marginal distributions belong to the same class.
2.1 Bivariate discrete uniform distribution
Let \(U_1\) and \(U_2\) be two random variables where the pmf for \(U_1\) admits the form
\[P(U_1=u_1)=\frac{1}{m+1},\quad u_1=0,1,\ldots ,m. \tag{1}\]
We further assume the following form for the conditional distribution of \(U_2\) given \(U_1\):
In other words, the conditional distribution of \(U_2|U_1\) is not uniform but a categorical distribution. The parameter \(\alpha _U\) characterises the dependence between \(U_1\) and \(U_2\). Depending on the value of \(\alpha _U\), the probability of choosing the same answer in Question 2, given the response in Question 1, can be higher, lower, or unchanged. The admissible range of \(\alpha _U\) is \([-1,m]\), with \(\alpha _U=0\) representing the case of independence. Marginally, \(U_2\) follows the discrete uniform distribution, since
The joint distribution of \(U_1\) and \(U_2\) can be written as
where \(1_{A}\) is the indicator variable which takes the value 1 if condition A is satisfied, and 0 otherwise. Notice that \(U_1\) and \(U_2\) are independent if and only if \(\alpha _U=0\).
Studies on psychological aspects of survey responses have revealed the tendency for respondents to select the same category regardless of the question, thus we expect \(\alpha _U\) to be positive in practice. For example, three of the common response styles reported by Baumgartner and Steenkamp (2001) and Van Vaerenbergh and Thomas (2013) are acquiescence response style, extreme response style and midpoint responding. These response styles refer to the tendency to agree with the item regardless of content, to select the most extreme category regardless of content, and to choose the middle scale category regardless of content. All these tendencies would make the probability of having two identical responses higher than expected under the independence assumption. The first two moments of \(U_1\) and \(U_2\) are summarised below. The derivations can be found in Appendix A.
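To make the dependence structure concrete, the following Python sketch builds a joint pmf for \((U_1,U_2)\) and checks its properties numerically. It assumes a specific conditional form, namely \(P(U_2=u_2|U_1=u_1)=(1+\alpha _U)/(m+1)\) when \(u_2=u_1\) and \((m-\alpha _U)/\lbrace m(m+1)\rbrace \) otherwise. This form is an assumption made for illustration, chosen because it is consistent with the admissible range \([-1,m]\), with independence at \(\alpha _U=0\), and with uniform margins; the function names are hypothetical.

```python
from math import sqrt

def biv_uniform_pmf(m, alpha_u):
    """Joint pmf matrix of (U1, U2) on {0..m}^2 under the ASSUMED conditional form."""
    if not -1 <= alpha_u <= m:
        raise ValueError("alpha_u must lie in [-1, m]")
    same = (1 + alpha_u) / (m + 1) ** 2          # P(U1 = u, U2 = u)
    diff = (m - alpha_u) / (m * (m + 1) ** 2)    # P(U1 = u, U2 = v), u != v
    return [[same if u == v else diff for v in range(m + 1)]
            for u in range(m + 1)]

def correlation(joint):
    """Pearson correlation of two integer-valued variables from a joint pmf matrix."""
    k = len(joint)
    row = [sum(joint[u]) for u in range(k)]                       # margin of U1
    col = [sum(joint[u][v] for u in range(k)) for v in range(k)]  # margin of U2
    e1 = sum(u * p for u, p in enumerate(row))
    e2 = sum(v * p for v, p in enumerate(col))
    v1 = sum(u * u * p for u, p in enumerate(row)) - e1 ** 2
    v2 = sum(v * v * p for v, p in enumerate(col)) - e2 ** 2
    e12 = sum(u * v * joint[u][v] for u in range(k) for v in range(k))
    return (e12 - e1 * e2) / sqrt(v1 * v2)

joint_u = biv_uniform_pmf(m=9, alpha_u=5.0)
print(round(sum(joint_u[0]), 4), round(correlation(joint_u), 4))
```

Under this assumed form both margins remain uniform, and the computed correlation equals \(\alpha _U/m\) (here \(5/9\)), so a positive \(\alpha _U\) corresponds to a higher chance of repeating the same category.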
2.2 Bivariate binomial distribution
We will make use of the bivariate binomial distribution introduced in Biswas and Hwang (2002), where further details can be found. Let \(T_1\) and \(T_2\) be two random variables. Following Biswas and Hwang (2002), we consider \(T_1\) as a sum of m independent Bernoulli variables, i.e., \(T_1=\sum _{i=1} ^m T_{1i}\) where \(T_{1i}\overset{\text {i.i.d.}}{\sim } Ber(1-\xi _1)\). Given \(T_{1i}\), another Bernoulli variable \(T_{2i}\) is generated such that
where \(\alpha _B\) measures the dependency between \(T_{1i}\) and \(T_{2i}\), with the admissible ranges
Remark 1
The admissible ranges given in (4) ensure \(P(T_{2i}=0|T_{1i})\) and \(P(T_{2i}=1|T_{1i})\) are between 0 and 1. The above ranges correct the ones provided in Biswas and Hwang (2002).
When \(\alpha _B=0\), \(T_{1i}\) and \(T_{2i}\) are independent. Furthermore, \(T_{1i}\) and \(T_{2j}\) are assumed to be independent for all \(i\ne j\). Marginally, it can be checked that \(T_{2i}\sim Ber(1-\xi _2)\) since
We further define \(T_2=\sum _{i=1}^m T_{2i}\). In other words, both \(T_1\) and \(T_2\) follow the binomial distribution with parameters \((m,1-\xi _1)\) and \((m,1-\xi _2)\), respectively. The conditional distribution of \(T_2\) given \(T_1\) is given as
where
Hence, the joint distribution of \(T_1\) and \(T_2\) can be written as
where \(B_1 (t_1) = C_{t_1} ^m (1-\xi _1)^{t_1}\xi _1^{m-t_1}\). The covariance and correlation of \(T_1\) and \(T_2\) are given below [see also Biswas and Hwang (2002) for a more general class of the bivariate binomial distribution]:
Mathematical derivations are provided in Appendix A. When two survey questions inquire about similar aspects, it is reasonable to expect a positive correlation in the responses (\(\alpha _B>0\)). Conversely, if the two questions probe conflicting aspects (for example, satisfaction with salary and tendency to leave the company), one may anticipate a negative \(\alpha _B\). Studies of survey response have also revealed that prior questions often influence later responses (Krosnick and Alwin 1987; Tourangeau et al. 2000), thus it is important to capture the correlation between \(T_1\) and \(T_2\).
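The construction of \((T_1,T_2)\) as sums of m paired Bernoulli trials can be sketched numerically. The Python code below assumes the conditional success probability \(P(T_{2i}=1|T_{1i}=t)=\lbrace (1-\xi _2)+\alpha _B(t+\xi _1-\xi _2)\rbrace /(1+\alpha _B)\). This form is an assumption made for illustration: it yields \(T_{2i}\sim Ber(1-\xi _2)\) marginally and independence at \(\alpha _B=0\), as stated above, but it should be checked against (3) before being relied upon. The joint pmf of the sums is obtained by convolving the per-trial \(2\times 2\) table m times; all function names are hypothetical.

```python
from math import sqrt

def trial_table(xi1, xi2, alpha_b):
    """2x2 joint pmf of one Bernoulli pair (T1i, T2i) under the ASSUMED form."""
    p1, p2 = 1 - xi1, 1 - xi2
    table = {}
    for t1 in (0, 1):
        p_t1 = p1 if t1 == 1 else xi1
        p_cond = (p2 + alpha_b * (t1 - p1 + p2)) / (1 + alpha_b)  # P(T2i=1 | T1i=t1)
        if not 0 <= p_cond <= 1:
            raise ValueError("alpha_b is outside the admissible range")
        table[(t1, 1)] = p_t1 * p_cond
        table[(t1, 0)] = p_t1 * (1 - p_cond)
    return table

def biv_binomial_pmf(m, xi1, xi2, alpha_b):
    """Joint pmf of (T1, T2) as a dict, by m-fold convolution of the trial table."""
    step = trial_table(xi1, xi2, alpha_b)
    joint = {(0, 0): 1.0}
    for _ in range(m):
        nxt = {}
        for (s1, s2), p in joint.items():
            for (t1, t2), q in step.items():
                key = (s1 + t1, s2 + t2)
                nxt[key] = nxt.get(key, 0.0) + p * q
        joint = nxt
    return joint

def dict_correlation(joint):
    """Pearson correlation from a joint pmf stored as {(t1, t2): prob}."""
    e1 = sum(t1 * p for (t1, _), p in joint.items())
    e2 = sum(t2 * p for (_, t2), p in joint.items())
    v1 = sum(t1 * t1 * p for (t1, _), p in joint.items()) - e1 ** 2
    v2 = sum(t2 * t2 * p for (_, t2), p in joint.items()) - e2 ** 2
    e12 = sum(t1 * t2 * p for (t1, t2), p in joint.items())
    return (e12 - e1 * e2) / sqrt(v1 * v2)

joint_t = biv_binomial_pmf(m=9, xi1=0.6, xi2=0.4, alpha_b=1.5)
print(round(dict_correlation(joint_t), 4))
```

Because the m pairs are independent of each other, the correlation of the sums equals the correlation of a single pair, which under the assumed form is \(\frac{\alpha _B}{1+\alpha _B}\sqrt{\xi _1(1-\xi _1)/\lbrace \xi _2(1-\xi _2)\rbrace }\).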
3 A new class of bivariate CUB distributions
Suppose \(R_1\) and \(R_2\) represent the ordinal responses from two survey questions answered by the same respondent. Although there is no requirement for \(R_1\) and \(R_2\) to be the responses from two consecutive questions, it may be easier to understand the process considering that way. We assume the following generating process.
The respondent first decides if s/he is uncertain or certain about his/her feeling towards Question 1. If s/he is uncertain, the rating is given randomly according to a discrete uniform distribution. If s/he is certain, the rating is given by a binomial distribution reflecting her/his feeling. Hence, \(R_1\) resembles the generating process of a univariate CUB variable. The same process is repeated for Question 2. However, this time the rating may depend on the rating provided in the previous question.
Since the decision process is repeated two times, there are four scenarios: (uncertain, uncertain), (uncertain, certain), (certain, uncertain) and (certain, certain), with respective probabilities \((1-\pi _1)(1-\pi _2)\), \((1-\pi _1)\pi _2\), \(\pi _1(1-\pi _2)\) and \(\pi _1\pi _2\). Symbolically, let \(D_1\) and \(D_2\) be two independent Bernoulli variables with \(P(D_i = 1) = \pi _i\). The four scenarios can be written as \((D_1=0,D_2=0)\), \((D_1=0,D_2=1)\), \((D_1=1,D_2=0)\), and \((D_1=1,D_2=1)\). We also assume that if the ‘regime’ goes from uncertain to certain (or vice versa), the ratings given in the two questions are independent. Such a process is represented schematically in Fig. 1. Note that all the stages except the outcome are unobservable and therefore unobserved. The above described process would result in the following joint distribution:
From the joint distribution, it can be checked that, marginally, both \(R_1\) and \(R_2\) follow a univariate CUB distribution with parameters \((\pi _1,\xi _1)\) and \((\pi _2,\xi _2)\), respectively. Also, \(R_1\) and \(R_2\) are independent if and only if \(\alpha _B=\alpha _U=0\). In that case,
The first two moments of the proposed bivariate CUB distribution are given by
Derivation details are provided in Appendix A. The correlation, \(r_R\), can then be derived from the covariance and the variances. Figure 2 shows the contour plots and 3D histograms for the joint probability mass functions under three sets of parameters. From top to bottom panels, the figure demonstrates the cases where \(R_1\) and \(R_2\) are positively correlated, independent, and negatively correlated, respectively.
From (7), it can be deduced that the correlation between \(R_1\) and \(R_2\) is zero if \(\alpha _B=\alpha _U=0\), or when
as long as the right-hand-side (RHS) of the above equation is within the admissible range provided in (4). In other words, the dependency within the uniform components may sometimes cancel out that due to the binomial components.
The correlation \(r_R\) between the two responses is governed not only by \(\alpha _U\) and \(\alpha _B\), but also by all other parameters. For this reason, \(r_R\) may sometimes be misleading, or may at least understate the dependency between the respondent’s feelings towards the two items. For instance, if \(\pi _1=\pi _2=0.5\), then approximately half of the pairs \((R_1,R_2)\) will be generated independently, which may shrink the overall correlation \(r_R\), even when \(r_U\) and \(r_T\) are reasonably large. Yet, in practice, \(r_U\) and \(r_T\) may be of greater interest. The former represents the tendency to choose the same category when the respondent is uncertain about both questions, while the latter represents the correlation between the liking of the two survey items. Once the model parameters are estimated, \(r_T\) and \(r_U\) can be computed accordingly. This separation of the overall dependency into different components cannot be accomplished by any previously proposed copula-based method, as these methods estimate the overall correlation between the two margins.
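The shrinkage can be made explicit with a short derivation, sketched here under the generating process of this section only (independent \(D_1\) and \(D_2\), and independent ratings whenever the two regimes differ). Conditioning on the four scenarios gives

\[
\begin{aligned}
E(R_1R_2) &= (1-\pi _1)(1-\pi _2)E(U_1U_2)+(1-\pi _1)\pi _2E(U_1)E(T_2)\\
&\quad +\pi _1(1-\pi _2)E(T_1)E(U_2)+\pi _1\pi _2E(T_1T_2),\\
E(R_i) &= (1-\pi _i)E(U_i)+\pi _iE(T_i),\quad i=1,2,\\
\text {Cov}(R_1,R_2) &= E(R_1R_2)-E(R_1)E(R_2)\\
&= (1-\pi _1)(1-\pi _2)\text {Cov}(U_1,U_2)+\pi _1\pi _2\text {Cov}(T_1,T_2).
\end{aligned}
\]

Only the (uncertain, uncertain) and (certain, certain) scenarios contribute to the covariance. With \(\pi _1=\pi _2=0.5\), each component covariance enters with weight 0.25, while the variances of \(R_1\) and \(R_2\) are not reduced correspondingly, which is why \(r_R\) can be small even when \(r_U\) and \(r_T\) are not.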
3.1 Comparison with HMMLU
A model that is similar to the bivariate CUB model proposed is the aforementioned HMMLU (Colombi et al. 2019). Similar to our approach, the data generating process of HMMLU assumes the existence of latent states that represent if the respondent’s answer was based on feeling or uncertainty. In the bivariate case, the four scenarios \((D_1=0,D_2=0)\), \((D_1=0,D_2=1)\), \((D_1=1,D_2=0)\), and \((D_1=1,D_2=1)\) would still apply. The major difference between HMMLU and our proposal lies in the distributions of the responses under each of the four scenarios.
When an answer is given with uncertainty \((D=0)\), HMMLU assumes a distribution \(h_i(r_i), i=1,2,\) which can take different shapes such as U-shape and bell shape. The uniform distribution is one of the special cases. When both answers are given with uncertainty, \(R_1\) and \(R_2\) are assumed to be independent under HMMLU. In contrast, when an answer is given with certainty (\(D=1\)), HMMLU does not impose any specific distribution for the responses. Rather, the marginal distributions and the joint distribution are parameterised through marginal logits and log odds ratios, respectively. Such an approach stems from the general framework of marginal models for categorical data (Bergsma and Rudas 2002; Bartolucci et al. 2007). The joint distribution of \(R_1\) and \(R_2\) under HMMLU can be written as
Comparing Eqs. (6) and (8), some differences between HMMLU and the proposed bivariate CUB model are notable. Firstly, HMMLU does not allow correlation between uncertain responses. In the bivariate CUB model, such a correlation is captured through \(\alpha _U\) in \(U_{12}\). Of course, when \(\alpha _U=0\), the uncertain responses under the bivariate CUB model are generated in the same manner as HMMLU when both \(h_i\) take the uniform distribution. Secondly, the mixing weights (\(\pi \)’s) are generated differently. Implicitly, our approach assumes that \(D_1\) and \(D_2\) are independent while HMMLU allows them to be dependent.
Lastly, the distributions of the certain responses under HMMLU need not be binomial, and are hence more flexible. A consequence, however, is that HMMLU contains far more parameters. This is most apparent in the absence of covariates. For example, with \(m+1\) categories, HMMLU would require m parameters for the marginal logits for each of \(R_1\) and \(R_2\), and \(m^2\) log odds ratios to parameterise the joint distribution. As mentioned in Colombi et al. (2019, p. 599), the large number of parameters will usually lead to identifiability issues, and constraints are therefore required. In contrast, the distributions of the certain responses under the proposed bivariate CUB model can be characterised using three parameters \(\xi _1,\xi _2\) and \(\alpha _B\).
4 Inferential issues
Next, we discuss various issues related to the inferential processes. We start with the identifiability since the estimation of the parameters is only meaningful if the model is identifiable. Next, we discuss the strategy of estimating the parameters. Before closing this section, we provide details for the standard error calculations and hypothesis tests for some of the parameters.
4.1 Identifiability
The following theorem specifies the conditions under which the bivariate CUB model is identifiable.
Theorem 1
Given that \(0<\pi _1,\pi _2,\xi _1,\xi _2<1\), \(\xi _1\ne \xi _2\), and \(m\ge 3\), the bivariate CUB model given in (6) is identifiable.
Before we provide a proof for the theorem above, we first make a couple of remarks. Firstly, the condition \(m\ge 3\) (i.e., the number of categories is at least 4) is equivalent to the condition required in the univariate case (Iannario 2010). Similar to the identifiability condition for HMMLU, restrictions on the number of categories are necessary to make sure that the number of parameters is less than the number of free frequencies (Colombi et al. 2019). The univariate CUB model is still identifiable when \(\pi =1\) (Iannario 2010) because the discrete uniform distribution does not contain any parameters (\(\xi \) can be identified even when \(\pi =1\)). Thus, identifiability is ensured for \(\pi >0\). However, in the bivariate case, as specified in Theorem 1, while it is still required that \(\pi _1,\pi _2>0\), having \(\pi _1=1\) or \(\pi _2=1\) would make the model non-identifiable, since infinitely many values of \(\alpha _U\) would then yield the same joint distribution (6). Similarly, when \(\xi _1\) or \(\xi _2\) takes the value 0 or 1, \(\alpha _B\) cannot be identified either. The additional requirement of \(\xi _1\ne \xi _2\) may seem restrictive. In practice, however, since \(\xi _1\) and \(\xi _2\) correspond to the feeling of a respondent towards two survey questions, the values rarely coincide, unless the two questions probe exactly the same aspect (in which case a single question would be sufficient).
Proof
Let \(\varvec{\theta }=(\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'\in \varvec{\Theta }=(0,1)^4\times [-1,m]\times {\mathcal {A}}_B\) where \({\mathcal {A}}_B\) is the parameter space for \(\alpha _B\) governed by (4), with the exception that \(\xi _1\) cannot be equal to \(\xi _2\). Further, denote by \(P_{r_1,r_2}(\varvec{\theta })=P(R_1=r_1,R_2=r_2;\varvec{\theta })\), \(P_{\bullet r_2}=\sum _{r_1=0} ^m P_{r_1,r_2}\) and \(P_{r_1 \bullet }=\sum _{r_2=0} ^m P_{r_1,r_2}\). The bivariate CUB model is identifiable if and only if, for any parameter vector \(\varvec{\theta ^*}\), the system of equations in \(\varvec{\theta }\):
admits only one solution in the parameter space (Manisera and Zuccolotto 2015). With \((m+1)\) categories, there are altogether \((m+1)^2\) equations in (9). Fortunately, results in Manisera and Zuccolotto (2015) also demonstrate that it is possible to reduce the number of equations in the system by constructing some equations that allow the parameters to be specified sequentially.
For the bivariate CUB model on hand, we consider the following system of equations:
The above system was selected merely for the simplicity of the algebra involved, as shown below. In the first equation, both \(P_{m\bullet }\) and \(P_{0\bullet }\) represent marginal probabilities which are free of \(\alpha _U\) and \(\alpha _B\). According to Iannario (2010), the first two equations in (10) allow \(\pi _1\) and \(\xi _1\) to be uniquely specified. Similarly, the next two equations allow \(\pi _2\) and \(\xi _2\) to be uniquely specified. If \(\alpha _B\) can be uniquely specified, the last equation will only yield one \(\alpha _U\) (hence unique). Thus, it remains to prove the uniqueness of \(\alpha _B\). For this purpose, we consider in detail the fifth equation in (10). Since
we have
which is a function of \(\alpha _B\), and free of \(\alpha _U\), given the already specified parameters \(\pi _1,\pi _2,\xi _1\) and \(\xi _2\). Furthermore, this function is continuous in \(\alpha _B\). To see this, we simply need to show that \(\alpha _B>-1\). With \(\xi _1\ne \xi _2\), the lower bound of \(\alpha _B\) is always greater than \(-1\) since
Now, we will show that the above function is monotonically increasing in \(\alpha _B\). Differentiating (11) with respect to \(\alpha _B\) yields
Since \(\alpha _B>-1\), the denominator is always positive. Now, consider \(\xi _2-\alpha _B(\xi _1-\xi _2)+\alpha _B\). The lower bound of \(\alpha _B\) is given by
Since
we can deduce that
If \(\xi _1+\xi _2-1 \ge 0\),
Conversely, if \(\xi _1+\xi _2-1 < 0\),
Hence, \(\xi _2+\alpha _B(1-\xi _1+\xi _2)\) is always positive. Since Eq. (11) is continuous and monotonically increasing, one and only one \(\alpha _B\) will be specified. This completes the proof. \(\square \)
4.2 Parameter estimation
The parameter estimation can be carried out using the EM algorithm (Dempster et al. 1977). Although the chief focus of Dempster et al. (1977) was on handling incomplete data, the EM algorithm has been proven to work well for mixture distributions, including CUB models (Piccolo 2006). Further details on this topic can be found in Everitt and Hand (1981), Redner and Walker (1984), McLachlan and Peel (2000) and Arcidiacono and Jones (2003), among many others. The details of the algorithm for the proposed bivariate CUB model are provided in Appendix B.
4.3 Standard errors
The variance-covariance matrix of the estimated parameters can be obtained by inverting the observed information matrix:
The standard errors of the parameters are the square roots of the diagonal elements of \(\text {Var}(\hat{\varvec{\theta }})\). The use of the observed information matrix, instead of the expected information matrix, has been justified in Efron and Hinkley (1978). Explicit expressions of the elements in \(\text {Var}(\hat{\varvec{\theta }})\) are provided in Appendix C.
4.4 Model selection
For a particular dataset, when selecting between non-nested models such as the bivariate CUB and HMMLU, common measures such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be employed. In the context of the proposed bivariate CUB model, nested models can be compared through hypothesis tests by means of the likelihood ratio test (Hoel 1962). Here, we list some of the tests that can be carried out regarding the dependency parameters:
- \(H_0 ^1: \alpha _B=c_1, \alpha _U=c_2\),
- \(H_0 ^2: \alpha _B=c\), and
- \(H_0 ^3: \alpha _U=c\),
for some constants \(c, c_1\) and \(c_2\), against the alternative hypothesis that \(H_0\) is not true. In particular, testing whether either or both of \(\alpha _U\) and \(\alpha _B\) are zero would be of high interest. Under \(H_0 ^1\), if \(\alpha _B=\alpha _U=0\), \(R_1\) and \(R_2\) are completely independent. Under \(H_0 ^2\), if \(\alpha _B=0\), provided that the respondent chose to express his/her opinions on both questions, the feelings towards the two questions are independent. Under \(H_0 ^3\), if \(\alpha _U=0\), provided that the respondent was uncertain about both questions, his/her choices of the categories are independent (both completely random). The test statistic is
where \(\hat{\varvec{\theta }}_0\) is the maximum likelihood estimator of \(\varvec{\theta }\) evaluated under the restrictions specified in \(H_0\). The test statistic approximately follows a \(\chi ^2\) distribution, with 2 degrees of freedom for \(H_0 ^1\), and 1 degree of freedom for both \(H_0 ^2\) and \(H_0 ^3\).
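Once the restricted and unrestricted models have been fitted, the test itself is a short computation. The Python sketch below assumes the usual likelihood ratio statistic \(-2\lbrace \ell (\hat{\varvec{\theta }}_0)-\ell (\hat{\varvec{\theta }})\rbrace \) and uses the closed-form survival functions of the \(\chi ^2\) distribution with 1 and 2 degrees of freedom, so that only the standard library is needed. The function name and the log-likelihood values are hypothetical and serve only to show the mechanics.

```python
from math import erfc, exp, sqrt

def lrt_pvalue(loglik_null, loglik_full, df):
    """Likelihood ratio statistic and asymptotic chi-square p-value (df = 1 or 2)."""
    stat = -2.0 * (loglik_null - loglik_full)
    stat = max(stat, 0.0)  # guard against tiny negative values from numerical error
    if df == 1:
        p_value = erfc(sqrt(stat / 2.0))  # survival function of chi-square(1)
    elif df == 2:
        p_value = exp(-stat / 2.0)        # survival function of chi-square(2)
    else:
        raise ValueError("only df = 1 or df = 2 arise for the tests listed above")
    return stat, p_value

# Hypothetical maximised log-likelihoods, NOT taken from any fitted model.
stat_joint, p_joint = lrt_pvalue(-1502.0, -1500.0, df=2)   # e.g. alpha_B = alpha_U = 0
stat_single, p_single = lrt_pvalue(-1502.0, -1500.0, df=1)  # e.g. alpha_B = 0 only
print(stat_joint, round(p_joint, 4), round(p_single, 4))
```

The same difference in log-likelihood gives weaker evidence against the joint hypothesis than against a single restriction, since the reference distribution has more degrees of freedom.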
5 Simulation
Simulations were conducted to investigate the accuracy of the estimates based on the procedure described in Sect. 4.2 under two cases: (1) large sample with many categories, and (2) small sample with relatively fewer categories. As the number of categories is typically between 2 and 11, with 5 to 10 categories being the easiest to rate (Wakita et al. 2012), we have purposely chosen 5, 7 and 10 categories in the simulation studies below.
5.1 Large sample with 10 categories
In this simulation study, we set \(m=9\) (which means a total of 10 categories) and used two sets of parameters as given below.
- Set 1: \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.7,0.5,0.6,0.4,5.0,1.5)'\)
- Set 2: \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.5,0.6,0.6,0.4,3.0,-0.3)'\)
For each set of parameters, we first simulated two Bernoulli variables \(D_1\) and \(D_2\) using \(\pi _1\) and \(\pi _2\) as the respective parameters. If \(D_1=D_2=0\), \(R_1\) and \(R_2\) were simulated using (1) and (2), respectively. If \(D_1=0\) and \(D_2=1\), \(R_1\) was simulated using (1) and \(R_2\) was simulated using a binomial distribution with parameter \((1-\xi _2)\). If \(D_1=1\) and \(D_2=0\), \(R_1\) was simulated using a binomial distribution with parameter \((1-\xi _1)\) and \(R_2\) was simulated using (1). If \(D_1=D_2=1\), then m Bernoulli variables were simulated using \((1-\xi _1)\) as the parameter. These m Bernoulli variables were summed up to yield \(R_1\). Conditional on each value of these m Bernoulli variables, another m Bernoulli variables were generated with a parameter specified in (3). The sum of the latter m Bernoulli variables resulted in \(R_2\). Three sample sizes \(n=\lbrace 1000,2000,3000\rbrace \) were used. For each sample size, 1000 replicates were simulated. The convergence threshold for the EM algorithm was set to be \(1\times 10^{-5}\).
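The data generating steps above can be sketched in Python as follows. This is an illustrative reimplementation using the standard `random` module, not the code used for the reported simulations. Two forms are assumed rather than taken from the text: in the (uncertain, uncertain) regime the second rating repeats the first with probability \((1+\alpha _U)/(m+1)\) and is otherwise drawn uniformly from the remaining m categories; in the (certain, certain) regime the conditional success probability is \(\lbrace (1-\xi _2)+\alpha _B(t+\xi _1-\xi _2)\rbrace /(1+\alpha _B)\). The function name is hypothetical.

```python
import random

def simulate_pair(m, pi1, pi2, xi1, xi2, alpha_u, alpha_b, rng):
    """Draw one (R1, R2) pair from the four-scenario process (ASSUMED forms)."""
    d1 = rng.random() < pi1  # True: feeling (certain), False: uncertainty
    d2 = rng.random() < pi2
    if not d1 and not d2:
        # Both uncertain: correlated discrete uniform ratings.
        r1 = rng.randrange(m + 1)
        if rng.random() < (1 + alpha_u) / (m + 1):
            r2 = r1
        else:
            r2 = rng.choice([v for v in range(m + 1) if v != r1])
        return r1, r2
    if d1 and d2:
        # Both certain: m paired Bernoulli trials, summed within each question.
        p1, p2 = 1 - xi1, 1 - xi2
        r1 = r2 = 0
        for _ in range(m):
            t1 = rng.random() < p1
            p_cond = (p2 + alpha_b * (t1 - p1 + p2)) / (1 + alpha_b)
            t2 = rng.random() < p_cond
            r1 += t1
            r2 += t2
        return r1, r2
    # Regime switch: the two ratings are generated independently.
    r1 = sum(rng.random() < 1 - xi1 for _ in range(m)) if d1 else rng.randrange(m + 1)
    r2 = sum(rng.random() < 1 - xi2 for _ in range(m)) if d2 else rng.randrange(m + 1)
    return r1, r2

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
sample = [simulate_pair(9, 0.7, 0.5, 0.6, 0.4, 5.0, 1.5, rng) for _ in range(1000)]
print(sample[:5])
```

Each call returns one pair of ratings in \(\lbrace 0,\ldots ,m\rbrace ^2\); repeating the call n times gives one simulated dataset of size n.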
Under the parameters specified in Set 1, \(r_U=0.56, r_T=0.60\), and \(R_1\) and \(R_2\) are positively correlated, with a theoretical correlation of 0.24. Under those specified in Set 2, \(r_U=0.33, r_T=-0.43\), and \(R_1\) and \(R_2\) are only weakly positively correlated, with a theoretical correlation of 0.05. Table 1 summarises the estimation results across all simulation replicates.
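The reported correlations can be checked with a few lines of arithmetic. The Python sketch below uses closed forms that we derived under assumed component structures and that are not quoted from the paper's equations: \(\text {Cov}(U_1,U_2)=\alpha _U(m+2)/12\), \(\text {Cov}(T_1,T_2)=m\alpha _B\xi _1(1-\xi _1)/(1+\alpha _B)\), and \(\text {Cov}(R_1,R_2)=(1-\pi _1)(1-\pi _2)\text {Cov}(U_1,U_2)+\pi _1\pi _2\text {Cov}(T_1,T_2)\), together with the usual variance of a two-component mixture. The function name is hypothetical.

```python
from math import sqrt

def theoretical_correlations(m, pi1, pi2, xi1, xi2, alpha_u, alpha_b):
    """Return (r_U, r_T, r_R) from closed forms derived under ASSUMED component forms."""
    var_u = m * (m + 2) / 12              # variance of a discrete uniform on {0..m}
    cov_u = alpha_u * (m + 2) / 12
    var_t1 = m * xi1 * (1 - xi1)          # binomial variances
    var_t2 = m * xi2 * (1 - xi2)
    cov_t = m * alpha_b * xi1 * (1 - xi1) / (1 + alpha_b)

    def mixture_var(pi, xi, var_t):
        gap = m * (1 - xi) - m / 2        # difference between the component means
        return pi * var_t + (1 - pi) * var_u + pi * (1 - pi) * gap ** 2

    cov_r = (1 - pi1) * (1 - pi2) * cov_u + pi1 * pi2 * cov_t
    r_u = cov_u / var_u
    r_t = cov_t / sqrt(var_t1 * var_t2)
    r_r = cov_r / sqrt(mixture_var(pi1, xi1, var_t1) * mixture_var(pi2, xi2, var_t2))
    return r_u, r_t, r_r

set1 = theoretical_correlations(9, 0.7, 0.5, 0.6, 0.4, 5.0, 1.5)
set2 = theoretical_correlations(9, 0.5, 0.6, 0.6, 0.4, 3.0, -0.3)
print([round(x, 2) for x in set1])  # [0.56, 0.6, 0.24]
print([round(x, 2) for x in set2])  # [0.33, -0.43, 0.05]
```

Both parameter sets reproduce the values quoted above to two decimal places, which also illustrates how a sizeable \(r_U\) and \(r_T\) of opposite signs (Set 2) can nearly cancel in the overall correlation.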
For both sets of parameters, the biases of all estimated parameters were very small and generally decreased as the sample size increased. The coefficients of variation also decreased with the sample size, as one would expect. Not surprisingly, the estimates of \(\alpha _U\) and \(\alpha _B\) were more variable than those of the other parameters. This is probably because these two parameters are informed only by observations with \(D_1=D_2=0\) and \(D_1=D_2=1\), respectively, so a larger sample is needed to reach the same level of variability as the marginal parameters. Overall, we conclude that the EM algorithm proposed in Sect. 4.2 worked well and is therefore an appropriate method for fitting the bivariate CUB model when both the sample size and the number of categories are large.
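For readers who wish to reproduce summaries of this kind, the bias and coefficient of variation of one parameter across replicates can be computed as in the sketch below. The function name and the toy estimates are hypothetical, and the coefficient of variation is taken here as the standard deviation of the estimates divided by the absolute value of their mean, which is an assumption about the definition used.

```python
import statistics


def bias_and_cv(estimates, true_value):
    """Bias and coefficient of variation of replicate estimates of one parameter."""
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - true_value
    cv = statistics.stdev(estimates) / abs(mean_estimate)
    return bias, cv


# Five made-up replicate estimates of a parameter whose true value is 0.7.
toy_estimates = [0.69, 0.71, 0.70, 0.72, 0.68]
toy_bias, toy_cv = bias_and_cv(toy_estimates, 0.7)
```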
5.2 Small sample with 5 or 7 categories
The data-generating process was the same as that described in Sect. 5.1, except that the number of categories and the sample sizes were smaller. Specifically, \(m=4\) and \(m=6\) were considered and, for each value of \(m\), sample sizes of 100, 200 and 300 were used. The parameters used were \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.7,0.3,0.8,0.4,3.0,0.2)'\). Compared with the two parameter sets used previously, this set makes the data sparser because \(\xi _1\) is closer to 1, so that the values of \(R_1\) are concentrated at the lower end of the scale. The simulation results are provided in Table 2. As the sample size increases, the biases of the estimates generally decrease. The marginal parameters can be estimated accurately even with the smallest sample size considered, although the biases are larger than in the large-sample cases reported in Table 1. Consistent with the large-sample case, the dependency parameters \(\alpha _U\) and \(\alpha _B\) are estimated less accurately than the marginal parameters. The number of categories does not seem to have a large impact on the estimation of the parameters.
6 Application
The proposed bivariate CUB model was applied to the “relgoods” dataset, available within the CUB package (Iannario et al. 2020) in R (R Core Team 2022). The dataset contains results from a survey conducted in Naples, Italy, in 2014. Respondents of the survey were asked to evaluate their scores for various relational goods (for example, time dedicated to friends and family) and related issues such as safety of surroundings and their feeling of happiness. We focused on two of the questions related to the following aspects:
- Environment: the level of comfort with the surrounding environment, and
- Safety: the level of safety in the streets.
In the original survey, for both questions, respondents provided a score on a 10-point Likert scale, ranging from 1 = “never, at all” to 10 = “always, a lot”. For our purpose, we have re-scaled the responses to 0 to 9 by subtracting 1 from each response (meaning that \(m=9\)). The dataset contains many other variables. Univariate analysis results on some of the variables can be found in, for example, Iannario and Simone (2017) and Capecchi et al. (2018). Further details regarding the dataset can be found on https://rdrr.io/cran/CUB/man/relgoods.html. The R code used to obtain the results in this section is available as Supplementary Information online.
As one can naturally expect some association between the level of comfort with the surrounding environment and the level of safety in the surrounding areas, a bivariate model would be appropriate. Originally, there were a total of 2,459 responses. Upon removing 9 observations that contained missing values, the proposed bivariate CUB model was fitted on the remaining 2,450 observations. Here we label “Environment” as \(R_1\) and “Safety” as \(R_2\). The procedures described in Sects. 4.2 to 4.4 were employed to gain insights from the dataset.
Table 3 reports the estimated parameters based on the proposed bivariate CUB model and on separate univariate CUB models. The univariate estimates were obtained using the CUB package (Iannario et al. 2020). Overall, the bivariate model resulted in a higher log-likelihood as well as lower AIC and BIC values, indicating a better goodness-of-fit (GOF). The better performance can also be checked visually from the contour plots and 3D histograms provided in Fig. 3. In particular, the separate univariate models were not able to capture the positive correlation between the two ratings.
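To make the comparison concrete, the sketch below writes out the usual definitions of AIC and BIC and computes how much the log-likelihood must improve before the bivariate model is preferred. It assumes six parameters for the bivariate model (the elements of \(\varvec{\theta }\)) against four for the two separate univariate CUB models (one \(\pi \) and one \(\xi \) each), with \(n=2450\). No fitted log-likelihood values are used, and the function names are illustrative.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for log-likelihood loglik and k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for n observations."""
    return k * math.log(n) - 2.0 * loglik


n_obs = 2450
k_bivariate, k_separate = 6, 4

# Smallest log-likelihood gain for which the bivariate model has the lower criterion.
min_gain_aic = float(k_bivariate - k_separate)
min_gain_bic = (k_bivariate - k_separate) * math.log(n_obs) / 2.0
```

With these counts, the two extra dependence parameters pay for themselves under AIC once the log-likelihood rises by more than 2, and under BIC once it rises by more than about 7.8.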
Based on the estimated parameters in the bivariate model, we have \({\hat{r}}_U = 0.191\) and \({\hat{r}}_T = 0.316\), while the empirical correlation between \(R_1\) and \(R_2\) was \(r_R = 0.229\). Thus, the correlation between the feeling components of the two questions was larger than that suggested by \(r_R\). Results of the hypothesis tests in Table 4 also show that both \(\alpha _U\) and \(\alpha _B\) are significantly different from zero.
Suppose the respondent was uncertain about both questions. The estimate \({\hat{\alpha }}_U = 1.723\) then implies that the estimated probability of choosing the same category, given the first response, was \((1+1.723)/10=0.2723\), a 172.3% increase over the value of \(1/10\) implied by a model assuming independence between the responses. Moreover, when the respondent expressed his or her feeling towards both questions, the model found a moderate positive correlation (\({\hat{r}}_T = 0.316\)) between the two responses, indicating that they tended to go in the same direction. That is, respondents who were satisfied with the level of comfort with the surrounding environment tended to be satisfied with the level of safety in the streets as well. Insights of this kind regarding the associations between the two survey items would not be obtainable if the two variables were fitted separately.
The same dataset was also analysed using HMMLU with \(h_i(r_i)\) taking the form of a discrete uniform distribution. In total, 22 parameters were used: three for the mixing weights \(\pi _{00}, \pi _{01}\) and \(\pi _{10}\) (\(\pi _{11}\) can be derived from these three), nine for the marginal logits of each of \(R_1\) and \(R_2\), and one for the log odds ratio. In particular, local logits in the form of \(\eta _r ^j=\log \left[ P(R_j = r+1|D_j=1)/P(R_j = r|D_j=1)\right] \) for \(j=1,2\) and \(r=0,1,\ldots ,8\) were used, and a global odds ratio (Dale 1986)
that is identical for all i and j was used. The use of only one log odds ratio was to ensure model identifiability (Colombi et al. 2019, p. 599). Table 5 shows the estimated values of the parameters and the overall GOF of the model. Not surprisingly, HMMLU provided a better fit in terms of all measures used since it contained substantially more parameters. The relative advantage of the bivariate CUB model lies in parsimony and interpretability.
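The local-logit parameterisation mentioned above can be inverted to recover category probabilities, as the sketch below shows for one margin. It is a generic illustration under the definition \(\eta _r=\log [P(r+1)/P(r)]\) and is not taken from the HMMLU implementation; the function names are hypothetical.

```python
import math


def probs_from_local_logits(etas):
    """Map local logits eta_0, ..., eta_{m-1} to probabilities P(0), ..., P(m)."""
    log_weights = [0.0]
    for eta in etas:
        # log P(r+1) = log P(r) + eta_r, up to a common normalising constant.
        log_weights.append(log_weights[-1] + eta)
    largest = max(log_weights)
    weights = [math.exp(v - largest) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def local_logits_from_probs(probs):
    """Inverse map: adjacent-category log ratios."""
    return [math.log(probs[r + 1] / probs[r]) for r in range(len(probs) - 1)]


# Parameter count quoted in the text: mixing weights, two sets of logits, one odds ratio.
n_parameters = 3 + 9 + 9 + 1
```

Nine logits per margin describe the ten category probabilities, which is why the count reaches 22 once the three mixing weights and the single log odds ratio are added.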
7 Discussions and conclusion
In this work, we have proposed a novel bivariate CUB model for modelling correlated ordinal variables, especially those arising from surveys that require people to rate or express their opinions on a Likert scale. The joint distribution belongs to a general class of mixture distributions while the marginal variables follow the CUB distribution. Combining the two CUB variables allows further insights, such as the association between the two variables, to be drawn from the dataset. Identifiability and other inferential issues around the proposed model have been discussed throughout the paper. The estimation procedure has been found to work satisfactorily in simulation studies, although additional simulation studies under varied scenarios would improve the understanding of the model’s performance. By applying the proposed model to a set of publicly available data, we have demonstrated the capability of the model in analysing two variables jointly instead of separately, and how further insights on the associations between the survey items could be discovered.
Since responses from surveys involve psychological behaviours of the respondents, it is important to take into account the potential biases that may have been introduced. Apart from indecision or uncertainty, the uncertainty component of the CUB model can also be used to account for other elements such as difficulty in expressing an actual feeling, limited knowledge, fatigue or willingness to satisfy the interviewer (Iannario and Piccolo 2016; Iannario and Tarantola 2023). As shown by Colombi et al. (2019), ignoring the uncertainty component at the modelling stage would lead to substantial biases in the estimation results.
One distinctive feature of the proposed model is the ability to estimate the associations within the uncertainty and feeling components separately. Previous attempts to generalise CUB models to the multivariate setting typically rely on copula-based methods, in particular the Plackett distribution (Corduas 2011; Andreis and Ferrari 2013; Corduas 2015). Another notable work by Colombi and Giordano (2016) used the Sarmanov distribution to bind the univariate margins. Both the Plackett and Sarmanov distributions have a parameter that is related to either the rank or the Pearson correlation of the two marginal variables. However, it is not possible to tell whether the correlation results from the uncertainty or the feeling component of the underlying CUB variables. Our proposal, on the other hand, allows the overall correlation to be decomposed into two separate elements. In particular, the estimated correlation between the respondents’ feelings/preferences would be considered an important measure in many applications. Although Colombi et al. (2019) do not use a copula, their model assumes independence between the uncertain responses.
One of the reasons why CUB models have become popular is the ability to include respondents’ covariates in the model, enabling analysts to explore the relationship between the CUB parameters and the subjects’ covariates for better interpretation. Under the proposed bivariate CUB model, we conjecture that it would be straightforward to include covariates for the uncertainty parameters \(\pi \). However, it may be challenging to include covariates for the feeling parameters \(\xi \), as the admissible range of \(\alpha _B\) [which is a function of \(\xi _1\) and \(\xi _2\) as provided in Eq. (4)] will then be affected by the covariates. Re-parameterising \(\alpha _B\) could be a way to overcome this challenge, but it is unclear at this stage how this would affect the likelihood function and the mechanism of the EM algorithm introduced in this paper. Further studies are needed to devise a solution. In any case, we have purposely not considered models with covariates since their identifiability has not been established. In fact, to the best of our knowledge, no work has fully tackled the identifiability issue even for univariate CUB models with covariates.
Our proposed model can be extended in several ways. For example, the inclusion of a “shelter”/“refuge” (Iannario 2012) or “don’t know” category (Manisera and Zuccolotto 2014; Iannario et al. 2018) would be a direction for future research. Assuming identifiability is not an issue, other bivariate binomial and discrete uniform distributions could replace those utilised in this work. In the univariate case, Gottard et al. (2016) provide details of some other distributions that could be used to replace the uniform distribution in the uncertainty part. Building a bivariate model using these distributions would potentially lead to models that are more interpretable in certain contexts.
In this work, we have focused on the bivariate case. The model developed will serve as a building block for higher dimensional models. As the dependency structure becomes more complicated, the number of parameters will inevitably increase as well. Our proposed bivariate model would be useful if some pairwise dependence or Markov assumptions are to be imposed. These assumptions are particularly suitable for time series (Varin and Vidoni 2006) or spatial ordinal data (Feng et al. 2014; Ip and Wu 2024). More parameters will also mean a higher complexity of the observed information matrix. In that case, the empirical information matrix (Meilijson 1989; McLachlan and Peel 2000; Scott 2002), which requires only the first derivatives, can be used to ease the laborious burden in obtaining the second derivatives.
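As an illustration of why the empirical information matrix needs only first derivatives, the sketch below sums the outer products of per-observation score vectors and inverts the result to obtain standard errors. To keep the example short it uses the univariate CUB log-probability on \(\lbrace 0,\ldots ,m\rbrace \) as a stand-in for the bivariate likelihood, approximates the scores by central finite differences, and evaluates them at assumed parameter values rather than at a fitted maximum likelihood estimate. All names and the toy ratings are hypothetical.

```python
import math


def cub_logpmf(r, m, pi, xi):
    """Log-probability of a univariate CUB response r in {0, ..., m}."""
    binom = math.comb(m, r) * (1.0 - xi) ** r * xi ** (m - r)
    return math.log(pi * binom + (1.0 - pi) / (m + 1))


def score(r, m, theta, h=1e-6):
    """Numerical first derivatives of the log-pmf with respect to (pi, xi)."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((cub_logpmf(r, m, *up) - cub_logpmf(r, m, *down)) / (2.0 * h))
    return grad


def empirical_information(data, m, theta):
    """Sum over observations of the outer product of the score vector."""
    k = len(theta)
    info = [[0.0] * k for _ in range(k)]
    for r in data:
        s = score(r, m, theta)
        for a in range(k):
            for b in range(k):
                info[a][b] += s[a] * s[b]
    return info


def standard_errors_2x2(info):
    """Square roots of the diagonal of the inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = info
    det = a * d - b * c
    return math.sqrt(d / det), math.sqrt(a / det)


# Toy ratings on a 10-point scale (m = 9), made up for illustration.
ratings = [0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9]
info = empirical_information(ratings, m=9, theta=(0.7, 0.55))
se_pi, se_xi = standard_errors_2x2(info)
```

In a higher-dimensional model the same accumulation applies with longer score vectors, and the 2 x 2 inverse would be replaced by a general matrix inverse.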
Data availability
The dataset analysed during the current study is available within the CUB package in R (Iannario et al. 2020).
Code availability
The code used to produce part of the results in the Application section is available as Supplementary Information.
References
Agresti A (2010) Analysis of ordinal categorical data. Wiley, Hoboken
Al-Humairi A, Ip RHL, Spuur K, Zheng X, Huang B (2022) Visual grading experiments and optimization in CBCT dental implantology imaging: preliminary application of integrated visual grading regression. Radiat Environ Biophys 61:133–145. https://doi.org/10.1007/s00411-021-00959-x
Anderson JA (1984) Regression and ordered categorical variables. J R Stat Soc B Met 46(1):1–22. https://doi.org/10.1111/j.2517-6161.1984.tb01270.x
Andreis F, Ferrari PA (2013) On a copula model with CUB margins. Quad Stat 15:33–51
Arcidiacono P, Jones JB (2003) Finite mixture distributions, sequential likelihood and the EM algorithm. Econometrica 71(3):933–946. https://doi.org/10.1111/1468-0262.00431
Barbiero A (2021) Inducing a desired value of correlation between two point-scale variables: a two-step procedure using copulas. Adv Stat Anal 105:307–334. https://doi.org/10.1007/s10182-021-00405-9
Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sinica 17(2):691–711
Baumgartner H, Steenkamp JEM (2001) Response styles in marketing research: a cross-national investigation. J Mark Res 38(2):143–156. https://doi.org/10.1509/jmkr.38.2.143.18840
Baumgartner H, Steenkamp JEM (2006) Response biases in marketing research. In: Grover R, Vriens M (eds) The handbook of marketing research: uses, misuses, and future advances. SAGE, London
Bergsma WP, Rudas T (2002) Marginal models for categorical data. Ann Stat 30(1):140–159. https://doi.org/10.1214/aos/1015362188
Biswas A, Hwang JS (2002) A new bivariate binomial distribution. Stat Probab Lett 60(2):231–240. https://doi.org/10.1016/S0167-7152(02)00323-1
Capecchi S, Iannario M, Simone R (2018) Well-being and relational goods: a model-based approach to detect significant relationships. Soc Indic Res 135:729–750. https://doi.org/10.1007/s11205-016-1519-7
Colombi R, Giordano S (2016) A class of mixture models for multidimensional ordinal data. Stat Model 16(4):322–340. https://doi.org/10.1177/1471082X16649730
Colombi R, Giordano S, Gottard A, Iannario M (2019) Hierarchical marginal models with latent uncertainty. Scand J Stat 46(2):595–620. https://doi.org/10.1111/sjos.12366
Corduas M (2011) Modelling correlated bivariate ordinal data with CUB margins. Quad Stat 13:109–119
Corduas M (2015) Analyzing bivariate ordinal data with CUB margins. Stat Model 15(5):411–432. https://doi.org/10.1177/1471082X14558770
Dale JR (1986) Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics 42(4):909–917. https://doi.org/10.2307/2530704
D’Elia A, Piccolo D (2005) A mixture model for preferences data analysis. Comput Stat Data Anal 49:917–934. https://doi.org/10.1016/j.csda.2004.06.012
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B Met 39(1):1–22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Efron B, Hinkley DV (1978) Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information. Biometrika 65(3):457–483. https://doi.org/10.1093/biomet/65.3.457
Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London
Feng X, Zhu J, Lin P, Steen-Adams MM (2014) Composite likelihood estimation for models of spatial ordinal data and spatial proportional data with zero/one values. Environmetrics 25(8):571–583. https://doi.org/10.1002/env.2306
Gambacorta R, Iannario M (2013) Measuring job satisfaction with CUB models. Labour 27(2):198–224. https://doi.org/10.1111/labr.12008
Geenens G (2020) Copula modeling for discrete random vectors. Depend Model 8:417–440. https://doi.org/10.1515/demo-2020-0022
Genest C, Nešlehová J (2007) A primer on copulas for count data. Astin Bull 37(2):475–515. https://doi.org/10.2143/AST.37.2.2024077
Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in CUB models. Adv Data Anal Classif 10:225–244. https://doi.org/10.1007/s11634-016-0235-0
Guisan A, Harrell FE (2000) Ordinal response regression models in ecology. J Veg Sci 11(5):617–626. https://doi.org/10.2307/3236568
Hoel PG (1962) Introduction to mathematical statistics. Wiley, New York
Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron 68(1):87–94. https://doi.org/10.1007/BF03263526
Iannario M (2012) Modelling shelter choices in a class of mixture models for ordinal response. Stat Method Appl 21:1–22. https://doi.org/10.1007/s10260-011-0176-x
Iannario M, Manisera M, Piccolo D, Zuccolotto P (2018) Ordinal data models for no-opinion responses in attitude surveys. Sociol Method Res 49(1):250–276. https://doi.org/10.1177/0049124118769081
Iannario M, Piccolo D (2010) A new statistical model for the analysis of customer satisfaction. Qual Technol Quant Manage 7(2):149–168. https://doi.org/10.1080/16843703.2010.11673225
Iannario M, Piccolo D (2010) Statistical modelling of subjective survival probabilities. Genus 66(2):17–42
Iannario M, Piccolo D (2016) A comprehensive framework of regression models for ordinal data. Metron 74:233–252. https://doi.org/10.1007/s40300-016-0091-x
Iannario M, Piccolo D, Simone R (2020) CUB: a class of mixture models for ordinal data. R package version 1.1.4. https://CRAN.R-project.org/package=CUB
Iannario M, Simone R (2017) Mixture models for rating data: the method of moments via Gröbner basis. J Algebr Stat 8(2):1–28. https://doi.org/10.18409/JAS.V8I2.60
Iannario M, Tarantola C (2023) How to interpret the effect of covariates on the extreme categories in ordinal data models. Sociol Method Res 52(1):231–267. https://doi.org/10.1177/0049124120986179
Ip RHL, Wu KYK (2024) A Markov random field model with cumulative logistic functions for spatially dependent ordinal data. J Appl Stat 51(1):70–86. https://doi.org/10.1080/02664763.2022.2115985
Joshi A, Kale S, Chandel S, Pal DK (2015) Likert scale: explored and explained. Brit J Appl Sci Technol 7(4):157. https://doi.org/10.9734/BJAST/2015/14975
Krosnick JA (1999) Survey research. Annu Rev Psychol 50:537–567. https://doi.org/10.1146/annurev.psych.50.1.537
Krosnick JA, Alwin DF (1987) An evaluation of a cognitive theory of response-order effects in survey measurement. Public Opin Q 51(2):201–219. https://doi.org/10.1086/269029
Manisera M, Zuccolotto P (2014) Modeling “don’t know” responses in rating scales. Pattern Recogn Lett 45:226–234. https://doi.org/10.1016/j.patrec.2014.04.012
Manisera M, Zuccolotto P (2015) Identifiability of a model for discrete frequency distributions with a multidimensional parameter space. J Multivar Anal 140:302–316. https://doi.org/10.1016/j.jmva.2015.05.011
McCullagh P (1980) Regression models for ordinal data. J R Stat Soc B Met 42:109–142. https://doi.org/10.1111/j.2517-6161.1980.tb01109.x
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annu Rev Stat Appl 6:355–378. https://doi.org/10.1146/annurev-statistics-031017-100325
Meilijson I (1989) A fast improvement to the EM algorithm on its own terms. J R Stat Soc B Met 51(1):127–138. https://doi.org/10.1111/j.2517-6161.1989.tb01754.x
Molenberghs G, Lesaffre E (1994) Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. J Am Stat Assoc 89:633–644. https://doi.org/10.1080/01621459.1994.10476788
Nelsen RB (2006) An introduction to copulas. Springer, New York
Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quad Stat 5:85–104
Piccolo D (2006) Observed information matrix for MUB models. Quad Stat 8:33–78
Piccolo D, D’Elia A (2008) A new approach for modelling consumers’ preferences. Food Qual Prefer 19(3):247–259. https://doi.org/10.1016/j.foodqual.2007.07.002
Piccolo D, Simone R (2019) The class of CUB models: statistical foundations, inferential issues and empirical evidence. Stat Method Appl 28:389–435. https://doi.org/10.1007/s10260-019-00461-1
R Core Team (2022) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Redner RA, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239. https://doi.org/10.1137/1026034
Scott WA (2002) Maximum likelihood estimation using the empirical Fisher information matrix. J Stat Comput Simul 72(8):599–611. https://doi.org/10.1080/00949650213744
Tourangeau R, Rips LJ, Rasinski K (2000) The psychology of survey response. Cambridge University Press, Cambridge
Tutz G (2022) Ordinal regression: a review and a taxonomy of models. WIRES Comput Stat 14(2):e1545. https://doi.org/10.1002/wics.1545
Van Vaerenbergh Y, Thomas TD (2013) Response styles in survey research: a literature review of antecedents, consequences, and remedies. Int J Public Opin Res 25(2):195–217. https://doi.org/10.1093/ijpor/eds021
Varin C, Vidoni P (2006) Pairwise likelihood inference for ordinal categorical time series. Comput Stat Data Anal 51(4):2365–2373. https://doi.org/10.1016/j.csda.2006.09.009
Wakita T, Ueshima N, Noguchi H (2012) Psychological distance between categories in the Likert scale: comparing different number of options. Educ Psychol Meas 72(4):533–546. https://doi.org/10.1177/0013164411431162
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. No funds, grants, or other support was received.
Author information
Contributions
All authors contributed to the study conception and design. The first draft of the manuscript was written by Ryan H. L. Ip, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
Ethics approval is not required for this work.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Appendices
Appendix A: Moments of U, T and R
The covariance between \(U_1\) and \(U_2\) is given as
Since
we have
The covariance and correlation between \(T_1\) and \(T_2\) provided below are special cases of those presented in Biswas and Hwang (2002). To obtain the covariance between \(T_1\) and \(T_2\), observe that
meaning that
Thus, the covariance between \(T_1\) and \(T_2\) is
leading to a correlation of
The mean and variance of \(R_i\), \(i=1,2\), can be derived as follows.
The covariance between \(R_1\) and \(R_2\) can be derived as follows. Since
we have
Finally, the correlation between \(R_1\) and \(R_2\) can be found using
Appendix B: EM algorithm
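Assume \(n\) pairs of \((r_{1k},r_{2k})\) are observed. For notational simplicity, let \(\varvec{\theta }=(\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'\), \({\varvec{r}}_k=(r_{1k},r_{2k}), k=1,2,\ldots ,n\), \(q_1 = 1-\pi _1\), \(q_2=1-\pi _2\), \(g_{00}({\varvec{r}}_k;\varvec{\theta }) = U_{12}({\varvec{r}}_k,\alpha _U)\), \(g_{01}({\varvec{r}}_k;\varvec{\theta }) =B_2(r_{2k})/(m+1)\), \(g_{10}({\varvec{r}}_k;\varvec{\theta }) =B_1(r_{1k})/(m+1)\), and \(g_{11}({\varvec{r}}_k;\varvec{\theta }) =B_{12}({\varvec{r}}_k,\xi _1,\xi _2,\alpha _B)\). The aim is to maximise the log-likelihood
$$\begin{aligned} \sum _{k=1} ^n \log L({\varvec{r}}_k;\varvec{\theta }),\qquad L({\varvec{r}}_k;\varvec{\theta })=q_1q_2g_{00}({\varvec{r}}_k;\varvec{\theta })+q_1\pi _2g_{01}({\varvec{r}}_k;\varvec{\theta })+\pi _1q_2g_{10}({\varvec{r}}_k;\varvec{\theta })+\pi _1\pi _2g_{11}({\varvec{r}}_k;\varvec{\theta }), \end{aligned}$$
which could be done using the iterative procedure described below.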
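As a companion to the steps listed next, the sketch below implements the first E-step and the closed-form updates (B1), (B2) and (B5) in Python. It is an illustration under stated assumptions, not the authors' implementation. The bivariate uniform probability is taken as \((1+\alpha _U)/(m+1)^2\) for matching responses and \((1-\alpha _U/m)/(m+1)^2\) otherwise. The bivariate binomial is taken as the distribution of the sums of \(m\) independent Bernoulli pairs with \(P(1,1)=(1-\xi _1)\lbrace (1-\xi _2)+\alpha _B\xi _1/(1+\alpha _B)\rbrace \). Both forms are assumptions inferred from (B5) and the admissible ranges of \(\xi _1\) and \(\xi _2\), and all function names are hypothetical. The updates for \(\xi _1\), \(\xi _2\) and \(\alpha _B\) require numerical optimisation and are not covered.

```python
import math


def binom_pmf(r, m, p):
    return math.comb(m, r) * p ** r * (1.0 - p) ** (m - r)


def biv_uniform_pmf(r1, r2, m, alpha_u):
    base = 1.0 / (m + 1) ** 2
    return base * (1.0 + alpha_u) if r1 == r2 else base * (1.0 - alpha_u / m)


def biv_binom_pmf(r1, r2, m, xi1, xi2, alpha_b):
    p1, p2 = 1.0 - xi1, 1.0 - xi2
    p11 = p1 * (p2 + alpha_b * (1.0 - p1) / (1.0 + alpha_b))
    p10, p01 = p1 - p11, p2 - p11
    p00 = 1.0 - p11 - p10 - p01
    total = 0.0
    # k counts the Bernoulli pairs in which both outcomes equal 1.
    for k in range(max(0, r1 + r2 - m), min(r1, r2) + 1):
        rest = m - r1 - r2 + k
        coef = math.factorial(m) // (
            math.factorial(k) * math.factorial(r1 - k)
            * math.factorial(r2 - k) * math.factorial(rest))
        total += coef * p11 ** k * p10 ** (r1 - k) * p01 ** (r2 - k) * p00 ** rest
    return total


def posteriors(r1, r2, m, theta):
    """E-step: P(D1 = i, D2 = j | r1, r2) for the four (i, j) combinations."""
    pi1, pi2, xi1, xi2, alpha_u, alpha_b = theta
    g = {
        (0, 0): biv_uniform_pmf(r1, r2, m, alpha_u),
        (0, 1): binom_pmf(r2, m, 1.0 - xi2) / (m + 1),
        (1, 0): binom_pmf(r1, m, 1.0 - xi1) / (m + 1),
        (1, 1): biv_binom_pmf(r1, r2, m, xi1, xi2, alpha_b),
    }
    w = {(0, 0): (1 - pi1) * (1 - pi2), (0, 1): (1 - pi1) * pi2,
         (1, 0): pi1 * (1 - pi2), (1, 1): pi1 * pi2}
    lik = sum(w[key] * g[key] for key in g)
    return {key: w[key] * g[key] / lik for key in g}


def closed_form_updates(data, m, theta):
    """Updates (B1), (B2) and (B5) from the current posterior probabilities."""
    n = len(data)
    post = [posteriors(r1, r2, m, theta) for r1, r2 in data]
    pi1 = sum(p[(1, 0)] + p[(1, 1)] for p in post) / n
    pi2 = sum(p[(0, 1)] + p[(1, 1)] for p in post) / n
    s1 = sum(p[(0, 0)] for p, (r1, r2) in zip(post, data) if r1 == r2)
    s2 = sum(p[(0, 0)] for p, (r1, r2) in zip(post, data) if r1 != r2)
    alpha_u = (m * s1 - s2) / (s1 + s2)
    return pi1, pi2, alpha_u
```

Starting from \(\alpha _U^{(0)}=\alpha _B^{(0)}=0\), one call to `closed_form_updates` returns \(\pi _1^{(1)}\), \(\pi _2^{(1)}\) and \(\alpha _U^{(1)}\). Later iterations would replace the product weights in `posteriors` with the averaged posterior probabilities, as in step 2.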
1. Get initial values of \(\pi _1^{(0)},\pi _2^{(0)},\xi _1^{(0)},\xi _2^{(0)}\) by considering the margins separately. This step can be done using the CUB package in R (Iannario et al. 2020). Also, set \(\alpha _U ^{(0)} = \alpha _B ^{(0)}=0\).
2. Get the posterior probabilities \({\hat{p}}_{ij} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})=P(D_1=i,D_2=j | R_1=r_{1k},R_2=r_{2k}; \varvec{\theta }^{(0)})\), \(i,j=0,1\), \(k=1,2,\ldots ,n,\) based on each individual paired observation \({\varvec{r}}_k\):
$$\begin{aligned} {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})= & {} (1-\pi _1 ^{(0)})(1-\pi _2 ^{(0)})g_{00}({\varvec{r}}_k;\varvec{\theta }^{(0)}) / L({\varvec{r}}_k;\varvec{\theta }^{(0)})\\ {\hat{p}}_{01} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})= & {} (1-\pi _1 ^{(0)})\pi _2 ^{(0)}g_{01}({\varvec{r}}_k;\varvec{\theta }^{(0)}) / L({\varvec{r}}_k;\varvec{\theta }^{(0)})\\ {\hat{p}}_{10} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})= & {} \pi _1 ^{(0)}(1-\pi _2 ^{(0)}) g_{10}({\varvec{r}}_k;\varvec{\theta }^{(0)}) / L({\varvec{r}}_k;\varvec{\theta }^{(0)})\\ {\hat{p}}_{11} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})= & {} \pi _1 ^{(0)}\pi _2 ^{(0)} g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)}) / L({\varvec{r}}_k;\varvec{\theta }^{(0)}),\\ \end{aligned}$$which give the overall estimates of
$$\begin{aligned} {\hat{p}}_{ij} ^{(0)} = \frac{1}{n}\sum _{k=1} ^n {\hat{p}}_{ij} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}),\qquad i,j=0,1. \end{aligned}$$From the second iterations \((t>1)\) onward, the above estimates can be obtained from
$$\begin{aligned} {\hat{p}}_{ij} ^{(t+1)}({\varvec{r}}_k;\varvec{\theta }^{(t)})= & {} {\hat{p}}_{ij}^{(t)}g_{ij}({\varvec{r}}_k;\varvec{\theta }^{(t)}) / L({\varvec{r}}_k;\varvec{\theta }^{(t)}),\quad \text {and}\\ {\hat{p}}_{ij} ^{(t+1)}= & {} \frac{1}{n}\sum _{k=1} ^n {\hat{p}}_{ij} ^{(t+1)}({\varvec{r}}_k;\varvec{\theta }^{(t)}), \quad t>1. \end{aligned}$$ -
3. Update \(\pi _1\) and \(\pi _2\) through
$$\pi _1 ^{(1)} = {\hat{p}}_{10} ^{(0)}+{\hat{p}}_{11} ^{(0)}, \tag{B1}$$
$$\pi _2 ^{(1)} = {\hat{p}}_{01} ^{(0)}+{\hat{p}}_{11} ^{(0)}. \tag{B2}$$
It can be shown that Eqs. (B1) and (B2) provide the best estimates of \(\pi _1\) and \(\pi _2\), respectively. The steps below largely follow those provided in Everitt and Hand (1981), except that our mixing probabilities are constrained in a different manner. Our objective is to maximise
$$\begin{aligned} \ell (\varvec{\theta })=\sum _{k=1} ^n \log [L({\varvec{r}}_k;\varvec{\theta })]-\lambda _1 (\pi _1+q_1-1)-\lambda _2(\pi _2+q_2-1), \end{aligned}$$where \(\lambda _1\) and \(\lambda _2\) are Lagrange multipliers corresponding to the constraints \(q_1+\pi _1=1\) and \(q_2+\pi _2=1\), respectively. Differentiating \(\ell (\varvec{\theta })\) with respect to \(\pi _1\), and setting the equation to 0 yields
$$\frac{\partial }{\partial \pi _1}\ell (\varvec{\theta })=\sum _{k=1} ^n\frac{q_2g_{10}({\varvec{r}}_k;\varvec{\theta })}{L({\varvec{r}}_k;\varvec{\theta })}+\sum _{k=1} ^n\frac{\pi _2g_{11}({\varvec{r}}_k;\varvec{\theta })}{L({\varvec{r}}_k;\varvec{\theta })}-\lambda _1=0. \tag{B3}$$
Similarly, we have
$$\frac{\partial }{\partial q_1}\ell (\varvec{\theta })=\sum _{k=1} ^n\frac{q_2g_{00}({\varvec{r}}_k;\varvec{\theta })}{L({\varvec{r}}_k;\varvec{\theta })}+\sum _{k=1} ^n\frac{\pi _2g_{01}({\varvec{r}}_k;\varvec{\theta })}{L({\varvec{r}}_k;\varvec{\theta })}-\lambda _1=0. \tag{B4}$$
Multiplying (B3) by \(\pi _1\) and (B4) by \(q_1\), and adding them up:
$$\begin{aligned}{} & {} \sum _{k=1} ^n \frac{\pi _1q_2g_{10}({\varvec{r}}_k;\varvec{\theta })+\pi _1\pi _2g_{11}({\varvec{r}}_k;\varvec{\theta })+ q_1q_2g_{00}({\varvec{r}}_k;\varvec{\theta })+ q_1\pi _2g_{01}({\varvec{r}}_k;\varvec{\theta })}{L({\varvec{r}}_k;\varvec{\theta })}\\{} & {} \quad -\lambda _1(\pi _1+q_1)=0\\{} & {} {\hat{\lambda }}_1 = n. \end{aligned}$$Suppose one has \(\varvec{\theta }^{(0)}\), using \({\hat{\lambda }}_1 = n\) and multiplying (B3) by \(\pi _1\) yields
$$\begin{aligned} \sum _{k=1} ^n \frac{\pi _1q_2^{(0)}g_{10}({\varvec{r}}_k;\varvec{\theta }^{(0)})+\pi _1\pi _2^{(0)}g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)})}{L({\varvec{r}}_k;\varvec{\theta }^{(0)})}-n\pi _1=0. \end{aligned}$$Thus,
$$\begin{aligned} {\hat{\pi }}_1=\frac{1}{n}\sum _{k=1} ^n \left[ \frac{\pi _1^{(0)}(1-\pi _2^{(0)}) g_{10}({\varvec{r}}_k;\varvec{\theta }^{(0)})}{ L({\varvec{r}}_k;\varvec{\theta }^{(0)})}+\frac{\pi _1^{(0)}\pi _2^{(0)} g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)})}{L({\varvec{r}}_k;\varvec{\theta }^{(0)})}\right] , \end{aligned}$$where the RHS is equivalent to the posterior probability \(P(D_1=1|R_1=r_1,R_2=r_2;\varvec{\theta }^{(0)}).\) The derivation of \(\pi _2\) follows in a similar fashion.
4. Update \(\xi _1\) and \(\xi _2\) using
$$\begin{aligned} \xi _1 ^{(1)}= & {} \underset{\xi _1\in \Xi _1}{\text {argmax}} \sum _{k=1} ^n \left[ {\hat{p}}_{10} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{10}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \right. \\{} & {} \left. + {\hat{p}}_{11} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)})\right] ,\\ \xi _2 ^{(1)}= & {} \underset{\xi _2\in \Xi _2}{\text {argmax}} \sum _{k=1} ^n \left[ {\hat{p}}_{01} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)})\log g_{01}({\varvec{r}}_k;\varvec{\theta }^{(0)})\right. \\{} & {} \left. + {\hat{p}}_{11} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)})\right] ,\\ \end{aligned}$$where \(\Xi _1\) and \(\Xi _2\) are the ranges for \(\xi _1 ^{(1)}\) and \(\xi _2 ^{(1)}\), respectively, to ensure \(w_1\), \(w_2\), \(w_3\) and \(w_4\) are all positive, provided \(\varvec{\theta }^{(0)}\). Explicitly, the corresponding ranges are
$$ \begin{aligned} \Xi _1= & {} {\left\{ \begin{array}{ll} \left( \max \left\{ 0,\frac{-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}-\alpha _B^{(0)}}{-\alpha _B^{(0)}}\right\} ,\right. \\ \left. \qquad \min \left\{ \xi _2^{(0)},\frac{1-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}}{-\alpha _B^{(0)}},\frac{(1+\alpha _B^{(0)})(\xi _2^{(0)}-1)}{\alpha _B^{(0)}} \right\} \right) ,&{}\, \text {if } \xi _1 ^{(0)}< \xi _2 ^{(0)} \& \, \alpha _B ^{(0)}<0;\\ \left( \max \left\{ 0,\frac{1-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}}{-\alpha _B^{(0)}},\frac{(1+\alpha _B^{(0)})(\xi _2^{(0)}-1)}{\alpha _B^{(0)}}\right\} ,\right. \\ \left. \qquad \min \left\{ \xi _2^{(0)},\frac{-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}-\alpha _B^{(0)}}{-\alpha _B^{(0)}} \right\} \right) ,&{}\, \text {if } \xi _1 ^{(0)}< \xi _2 ^{(0)} \& \, \alpha _B ^{(0)}>0;\\ \left( \max \left\{ \frac{-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}-\alpha _B^{(0)}}{-\alpha _B^{(0)}},\frac{\xi _2^{(0)}+\alpha _B^{(0)}\xi _2^{(0)}}{\alpha _B^{(0)}},\xi _2^{(0)}\right\} ,\right. \\ \left. \qquad \min \left\{ \frac{(1+\alpha _B^{(0)})(\xi _2^{(0)}-1)}{\alpha _B^{(0)}},1\right\} \right) ,&{}\, \text {if } \xi _1 ^{(0)}> \xi _2 ^{(0)} \& \, \alpha _B ^{(0)}<0;\\ \left( \max \left\{ \xi _2^{(0)},\frac{(1+\alpha _B^{(0)})(\xi _2^{(0)}-1)}{\alpha _B^{(0)}}\right\} ,\right. \\ \left. \qquad \min \left\{ \frac{-\xi _2^{(0)}-\alpha _B^{(0)}\xi _2^{(0)}-\alpha _B^{(0)}}{-\alpha _B^{(0)}},\frac{\xi _2^{(0)}+\alpha _B^{(0)}\xi _2^{(0)}}{\alpha _B^{(0)}},1\right\} \right) ,&{}\, \text {if } \xi _1 ^{(0)}> \xi _2 ^{(0)} \& \, \alpha _B ^{(0)}>0;\\ \left( 0,1 \right) ,&{}\, \text {if } \alpha _B ^{(0)} =0, \text {and} \end{array}\right. }\\ \Xi _2= & {} {\left\{ \begin{array}{ll} \left( \max \left\{ \frac{-(1-\xi _1^{(0)})\alpha _B^{(0)}}{1+\alpha _B^{(0)}},\xi _1^{(0)}\right\} , \right. \\ \left. 
\qquad \min \left\{ \frac{1+\xi _1^{(0)}\alpha _B^{(0)}}{1+\alpha _B^{(0)}},\frac{1+\alpha _B^{(0)}+\alpha _B^{(0)}\xi _1^{(0)}}{1+\alpha _B^{(0)}},1\right\} \right) ,&{}\quad \text {if } \xi _1 ^{(0)} < \xi _2 ^{(0)} \& \, \alpha _B ^{(0)} \ne 0;\\ \left( \max \left\{ 0,\frac{-(1-\xi _1^{(0)})\alpha _B^{(0)}}{1+\alpha _B^{(0)}},\frac{\xi _1^{(0)}\alpha _B^{(0)}}{1+\alpha _B^{(0)}}\right\} ,\right. \\ \left. \qquad \min \left\{ \xi _1^{(0)},\frac{1+\alpha _B^{(0)}+\alpha _B^{(0)}\xi _1^{(0)}}{1+\alpha _B^{(0)}}\right\} \right) ,&{}\quad \text {if } \xi _1 ^{(0)} > \xi _2 ^{(0)} \& \, \alpha _B ^{(0)} \ne 0;\\ \left( 0,1 \right) ,&{}\quad \text {if } \alpha _B ^{(0)} =0. \end{array}\right. } \end{aligned}$$
5. Update \(\alpha _B\) using
$$\begin{aligned} \alpha _B ^{(1)}= & {} \underset{\alpha _B\in {\mathcal {A}}_B}{\text {argmax}} \sum _{k=1} ^n {\hat{p}}_{11} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{11}({\varvec{r}}_k;\varvec{\theta }^{(0)}), \end{aligned}$$
where \({\mathcal {A}}_B\) is the admissible range of \(\alpha _B\) based on \(\xi _1 ^{(0)}\) and \(\xi _2 ^{(0)}\) as provided in (4).
6. Update \(\alpha _U\) using
$$\begin{aligned} \alpha _U ^{(1)} = \frac{mS_1 - S_2}{S_1 + S_2}, \end{aligned} \tag{B5}$$
where
$$\begin{aligned} S_1 = \sum _{k=1} ^n {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) 1_{r_{1k}=r_{2k}}\quad \text {and}\quad S_2 = \sum _{k=1} ^n {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) 1_{r_{1k}\ne r_{2k}}. \end{aligned}$$
To see that (B5) gives the best estimate for \(\alpha _U\), notice that our aim is to find
$$\begin{aligned} \alpha _U ^{(1)}= & {} \underset{\alpha _U\in [-1,m]}{\text {argmax}} \sum _{k=1} ^n \left[ {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{00}({\varvec{r}}_k;\varvec{\theta }^{(0)})\right] . \end{aligned}$$
Expanding \(\sum _{k=1} ^n \left[ {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log g_{00}({\varvec{r}}_k;\varvec{\theta }^{(0)})\right] \) gives
$$\begin{aligned}{} & {} \sum _{k=1, r_{1k}=r_{2k}} ^n \left[ {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log \left( \frac{m+m\alpha _U}{m(m+1)^2}\right) \right] \\{} & {} \quad + \sum _{k=1, r_{1k}\ne r_{2k}} ^n \left[ {\hat{p}}_{00} ^{(0)}({\varvec{r}}_k;\varvec{\theta }^{(0)}) \log \left( \frac{m-\alpha _U}{m(m+1)^2}\right) \right] . \end{aligned} \tag{B6}$$
Differentiating (B6) with respect to \(\alpha _U\) and setting the derivative to zero, we have
$$\begin{aligned} \frac{S_1}{1+\alpha _U} - \frac{S_2}{m-\alpha _U}=0. \end{aligned}$$
Rearranging this as \(S_1(m-\alpha _U) = S_2(1+\alpha _U)\) and solving for \(\alpha _U\) gives (B5).
7. Calculate \(\log L({\varvec{r}}_k;\varvec{\theta }^{(1)})\). Repeat Steps 2 to 7 until \(\log L\) converges, that is, until \(\log L({\varvec{r}}_k;\varvec{\theta }^{(t+1)})-\log L({\varvec{r}}_k;\varvec{\theta }^{(t)})<\varepsilon \) for some small threshold \(\varepsilon > 0\).
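The closed-form update (B5) in Step 6 can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names `update_alpha_u` and `weighted_loglik`, the toy weights standing in for \({\hat{p}}_{00} ^{(0)}\), and the toy responses are all assumptions made for illustration. It computes \(S_1\) and \(S_2\), applies (B5), and evaluates the objective (B6) so that the update can be compared with nearby values of \(\alpha _U\).

```python
import math


def update_alpha_u(weights, r1, r2, m):
    """Closed-form update (B5): (m*S1 - S2) / (S1 + S2).

    weights[k] stands in for the posterior weight p_hat_00 of observation k
    (hypothetical input; in the EM algorithm it comes from the E-step).
    """
    s1 = sum(w for w, a, b in zip(weights, r1, r2) if a == b)  # ties
    s2 = sum(w for w, a, b in zip(weights, r1, r2) if a != b)  # non-ties
    if s1 + s2 == 0:
        raise ValueError("all weights are zero; alpha_U is not identified")
    return (m * s1 - s2) / (s1 + s2)


def weighted_loglik(alpha_u, weights, r1, r2, m):
    """Objective (B6) viewed as a function of alpha_U."""
    denom = m * (m + 1) ** 2
    total = 0.0
    for w, a, b in zip(weights, r1, r2):
        numer = m + m * alpha_u if a == b else m - alpha_u
        total += w * math.log(numer / denom)
    return total


# Toy data (assumed values, for illustration only).
m = 4
weights = [0.9, 0.2, 0.5, 0.7, 0.1, 0.6]
r1 = [0, 1, 2, 3, 4, 2]
r2 = [0, 3, 2, 1, 4, 2]

alpha_hat = update_alpha_u(weights, r1, r2, m)
best = weighted_loglik(alpha_hat, weights, r1, r2, m)
print(alpha_hat, best)
```

With these toy inputs, \(S_1 = 2.1\) and \(S_2 = 0.9\), so (B5) gives \((4 \times 2.1 - 0.9)/3 = 2.5\), which lies inside \([-1, m]\). Evaluating `weighted_loglik` on either side of this value is a quick way to confirm that the stationary point is a maximum.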
Appendix C: Detailed expressions for the information matrix
In this appendix, to simplify the notation, define \(f^\theta = \frac{\partial }{\partial \theta } f\) and \(f^{\theta _i \theta _j} = \frac{\partial ^2}{\partial \theta _i\partial \theta _j} f\) for an expression f. Whenever there is no risk of confusion, the dependence of the functions on the parameters and/or data is omitted. For instance, we write \(g_{00}\) instead of \(g_{00}(\varvec{\theta },{\varvec{r}}_k)\). In addition, the index j in \(\sum _j\) runs from 0 to \(r_1\), and the index k in \(\sum _k\) runs from 1 to n. We first list some recurrent expressions.
The elements of the negative information matrix, \(J(\varvec{\theta })=\Delta ^2 \log L(\varvec{\theta })\), are shown below, where the parameters are ordered as \(\varvec{\theta }=(\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'\).
The lower triangular elements are the same as the upper triangular ones, and are thus omitted.
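Analytic second derivatives of this kind can be cross-checked against a numerical approximation. The sketch below is an illustrative assumption rather than part of the paper: it swaps the analytic expressions for central finite differences, `numerical_hessian` is a hypothetical helper, and `toy_loglik` is a stand-in quadratic used only because its Hessian is known exactly. In practice, one would pass a function that evaluates \(\log L(\varvec{\theta })\) for the fitted model.

```python
def numerical_hessian(f, theta, h=1e-4):
    """Approximate the Hessian of f at theta by central finite differences.

    f     : callable taking a list of floats and returning a float
    theta : point at which the Hessian is evaluated
    h     : step size (an assumed default; tune for the problem at hand)
    """
    p = len(theta)
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            def shifted(di, dj):
                # Evaluate f after moving coordinates i and j by di*h and dj*h.
                point = list(theta)
                point[i] += di * h
                point[j] += dj * h
                return f(point)

            value = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4.0 * h * h)
            hess[i][j] = value
            hess[j][i] = value  # the Hessian is symmetric
    return hess


def toy_loglik(theta):
    """Stand-in objective with known Hessian [[-2, -3], [-3, -4]]."""
    x, y = theta
    return -(x * x + 3.0 * x * y + 2.0 * y * y)


hessian = numerical_hessian(toy_loglik, [0.3, -0.2])
print(hessian)
```

Only the upper triangle is computed and then mirrored, which reflects the symmetry noted above. Agreement between the numerical and analytic entries to several decimal places is a reasonable sanity check, although finite differences are sensitive to the choice of `h`.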
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ip, R.H.L., Wu, K.Y.K. A mixture distribution for modelling bivariate ordinal data. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01560-2