
In 1980, the American Psychological Association (APA) conducted an election in which five candidates (A, B, C, D, and E) ran for president and voters were asked to rank order all of the candidates. Candidates A and B are research psychologists, C is a community psychologist, and D and E are clinical psychologists. Among the voters, 5738 gave complete rankings, and these complete rankings are considered here (Diaconis 1988). Note that a lower rank means a more favorable candidate. The average ranks received by candidates A, B, C, D, and E are 2.84, 3.16, 2.92, 3.09, and 2.99, respectively, which means that voters generally prefer candidate A the most, candidate C the second most, and so on. However, in order to make inferences about voters' preferences for the candidates, a model for the ranking data is needed. In Sect. 9.1 we consider a model for these data which takes covariates into account.

In Sect. 9.2 we consider the following example for which factor analysis would be appropriate. In 1997, a mainland marketing research firm conducted a survey on people’s attitude toward career and living style in three major cities in Mainland China – Beijing, Shanghai, and Guangzhou. Five hundred responses from each city were obtained. A question regarding the behavior, conditions, and criteria for job selection of the 500 respondents in Guangzhou will be discussed here. In the survey, respondents were asked to rank the three most important criteria on choosing a job among 13 criteria: (1) favorable company reputation, (2) large company scale, (3) more promotion opportunities, (4) more training opportunities, (5) comfortable working environment, (6) high income, (7) stable working hours, (8) fringe benefits, (9) well matched with employees’ profession or talent, (10) short distance between working place and home, (11) challenging, (12) corporate structure of the company, and (13) low working pressure.

9.1 Multivariate Normal Order Statistics Models

In light of the Thurstone order statistics model mentioned in Sect. 8.1, the multivariate normal order statistics (MVNOS) model assumes that the ranking of t objects given by a judge is determined by ordering t latent utilities for the objects assigned by the judge. However, unlike the Thurstone order statistics model that assumes independent utilities, the MVNOS model assumes that the utilities are possibly correlated and the ranking \(\boldsymbol{\pi }_{j}\) given by judge j has the following probability:

$$\displaystyle{ P(\boldsymbol{\pi }_{j}) = P(y_{[1]_{j},j} > y_{[2]_{j},j} > \cdots > y_{[t]_{j},j}), }$$
(9.1)

where \(< [1]_{j},\cdots \,,[t]_{j} >\) is the ordering of the t objects corresponding to the ranking \(\boldsymbol{\pi }_{j}\) and the latent utility vector \(\boldsymbol{y}_{j} = (y_{1j},\cdots \,,y_{tj})'\) of judge j is assumed to follow a multivariate normal distribution with mean utility vector \(\mu _{j} = (\mu _{1j},\cdots \,,\mu _{tj})'\) and a general covariance matrix \(\boldsymbol{V }\), i.e.,

$$\displaystyle\begin{array}{rcl} \boldsymbol{y}_{j} = \mu _{j} +\boldsymbol{ e}_{j}& &{}\end{array}$$
(9.2)
$$\displaystyle\begin{array}{rcl} \boldsymbol{e}_{j}\stackrel{\mathrm{iid}}{\sim }N(\boldsymbol{0},\boldsymbol{V }).& &{}\end{array}$$
(9.3)

The MVNOS model is sometimes termed the multinomial probit model for ranking data.

9.1.1 The MVNOS Model with Covariates

When there are some covariates associated with the judges and objects, it is natural to impose the following linear model for \(\mu _{j}\):

$$\displaystyle{ \mu _{j} =\boldsymbol{ Z}_{j}\boldsymbol{\beta }, }$$
(9.4)

where \(\boldsymbol{Z}_{j}\) is a t × p matrix of covariates associated with judge j and \(\boldsymbol{\beta }\) is a p × 1 vector of unknown parameters. For example, in a marketing survey, respondents are asked to rank products according to their preference. Usually, apart from the ranking given by the respondents, some socioeconomic variables (\(\boldsymbol{s}_{j}\)) about the respondents and the attributes (\(\boldsymbol{a}_{i}\)) of the products are also available. Then one may study the heterogeneity of the preference due to these variables by assuming the following model:

$$\displaystyle{ \mu _{ij} = \boldsymbol{a}_{i}'\boldsymbol{\gamma } + \boldsymbol{s}_{j}'\boldsymbol{\delta }_{i},\quad i = 1,\cdots \,,t. }$$
(9.5)

The parameter vector \(\boldsymbol{\gamma }\) represents the attribute effects common to all respondents, while the vector \(\boldsymbol{\delta }_{i}\) represents the effects of the respondents' socioeconomic background on their preference for product i. It is easily seen that equation (9.5) is a particular case of the model in (9.4) when

$$\displaystyle{\boldsymbol{Z}_{j} = \left (\begin{array}{ccccc} \boldsymbol{a}_{1}' & \boldsymbol{s}_{j}'& \boldsymbol{0} &\cdots & \boldsymbol{0} \\ \boldsymbol{a}_{2}' & \boldsymbol{0} &\boldsymbol{s}_{j}'& & \boldsymbol{0}\\ \vdots & \vdots & & \ddots \\ \boldsymbol{a}_{t}' & \boldsymbol{0} & \boldsymbol{0} & &\boldsymbol{s}_{j}' \end{array} \right )\quad \text{and}\quad \boldsymbol{\beta } = \left (\begin{array}{c} \boldsymbol{\gamma }\\ \boldsymbol{\delta }_{1} \\ \boldsymbol{\delta }_{2}\\ \vdots\\ \boldsymbol{\delta } _{ t} \end{array} \right ).}$$
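As an illustration of this block structure, the following Python sketch assembles \(\boldsymbol{Z}_{j}\) and \(\boldsymbol{\beta }\) and computes \(\mu_{j} = \boldsymbol{Z}_{j}\boldsymbol{\beta }\). The attribute vectors, socioeconomic vector, and the function name build_Zj are hypothetical and only for illustration.

```python
import numpy as np

def build_Zj(attrs, s_j):
    """Assemble the t x p covariate matrix Z_j of (9.4) from object attributes
    a_1, ..., a_t and a judge's socioeconomic vector s_j, following the block
    layout displayed above (hypothetical helper, not from the text)."""
    t, k = attrs.shape          # t objects, k attributes per object
    m = s_j.shape[0]            # m socioeconomic variables
    Zj = np.zeros((t, k + t * m))
    for i in range(t):
        Zj[i, :k] = attrs[i]                       # common attribute block a_i'
        Zj[i, k + i * m:k + (i + 1) * m] = s_j     # object-specific s_j' block
    return Zj

# Hypothetical example: t = 3 products, 2 attributes each, 2 judge covariates
attrs = np.array([[1.0, 0.5], [0.2, 1.3], [0.7, 0.9]])
s_j = np.array([1.0, 25.0])                        # e.g. gender, age
gamma = np.array([0.4, -0.1])                      # common attribute effects
deltas = [np.array([0.3, 0.01]) for _ in range(3)] # object-specific effects
beta = np.concatenate([gamma] + deltas)
mu_j = build_Zj(attrs, s_j) @ beta                 # mean utilities mu_j = Z_j beta
```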

In what follows, we shall consider the MVNOS model with the mean given in equation (9.4).

9.1.2 Parameter Identifiability of the MVNOS Model

Note that one can add an arbitrary constant (location shift) or multiply by a positive constant (scale shift) on both sides of (9.2) while leaving the ranking probability unchanged. The location-shift problem is commonly dealt with by subtracting the last row from the first t − 1 rows, leading to the model

$$\displaystyle\begin{array}{rcl} \boldsymbol{w}_{j} = \boldsymbol{X}_{j}\boldsymbol{\beta } + \boldsymbol{\varepsilon }_{j}& &{}\end{array}$$
(9.6)
$$\displaystyle\begin{array}{rcl} \boldsymbol{\varepsilon }_{j}\stackrel{\mathrm{iid}}{\sim }N(\mathbf{0},\boldsymbol{\Sigma }),& &{}\end{array}$$
(9.7)

where \(w_{ij} = y_{ij} - y_{tj}\), \(\boldsymbol{X}_{j} = [\boldsymbol{I}_{t-1},-\mathbf{1}_{t-1}]\boldsymbol{Z}_{j}\), \(\varepsilon _{ij} = e_{ij} - e_{tj}\), and

$$\displaystyle{\boldsymbol{\Sigma } = [\boldsymbol{I}_{t-1},-\mathbf{1}_{t-1}]\boldsymbol{V }[\boldsymbol{I}_{t-1},-\mathbf{1}_{t-1}]'.}$$

Here, \(\boldsymbol{I}\) denotes an identity matrix and \(\mathbf{1}\) denotes a vector of ones. Then the ranking \(\boldsymbol{\pi }_{j}\) with corresponding ordering \(< [1]_{j},\cdots \,,[t]_{j} >\) corresponds to the event

$$\displaystyle\begin{array}{rcl} E_{j}& =& \{\boldsymbol{w}_{j}: w_{[1]_{j},j} > \cdots > w_{[r-1]_{j},j} > 0 > w_{[r+1]_{j},j} > \cdots > w_{[t]_{j},j}\} \\ & & \text{whenever}\ [r]_{j} = t. {}\end{array}$$
(9.8)

For the sake of simplicity, we use the convention \(w_{[0]_{j},j}= + \infty \) and \(w_{[t+1]_{j},j} = -\infty \). Notice that the scale-shift problem still exists in the model given by (9.6) and it can be easily resolved by adding a constraint on \(\boldsymbol{\Sigma }\) such as \(\sigma_{11} = 1\).
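For concreteness, a small numpy sketch of this differencing and scaling step, using hypothetical values of \(\boldsymbol{Z}_{j}\) and \(\boldsymbol{V }\):

```python
import numpy as np

t = 4
B = np.hstack([np.eye(t - 1), -np.ones((t - 1, 1))])   # [I_{t-1}, -1_{t-1}]

# Hypothetical covariates Z_j and utility covariance V, for illustration only
Z_j = np.arange(t * 2, dtype=float).reshape(t, 2)
V = 0.5 * np.eye(t) + 0.5 * np.ones((t, t))

X_j = B @ Z_j                 # covariates of the differenced utilities w_j
Sigma = B @ V @ B.T           # Sigma = [I_{t-1}, -1_{t-1}] V [I_{t-1}, -1_{t-1}]'
Sigma_identified = Sigma / Sigma[0, 0]   # fix the scale so that sigma_11 = 1
```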

Since rankings of objects only depend on utility differences, \(\boldsymbol{\beta }\) and \(\boldsymbol{\Sigma }\) (with \(\sigma_{11}\) fixed) are estimable, but the original parameters \(\mu_{j}\) and \(\boldsymbol{V }\) still cannot be fully identified. For example, suppose t = 3 and \(\mu_{j} = \mu\). Then the following three sets of parameters under the MVNOS model lead to the same ranking probabilities:

$$\displaystyle{\begin{array}{lll} \quad \text{Set A:} &\quad \text{Set B:} &\quad \text{Set C:} \\ \ \mu _{A} = (1,0,-1)' &\ \mu _{B} = (-1,-2,-3)' &\ \mu _{C} = (1,0,-1)' \\ \boldsymbol{V }_{A} = \left (\begin{array}{lll} 1 &0 &0.2\\ 0 &1 &0.8 \\ 0.2&0.8&1 \end{array} \right )\ \ &\boldsymbol{V }_{B} = \left (\begin{array}{lll} 0.4&0 &0.4\\ 0 &1.6 &1.6 \\ 0.4&1.6&2 \end{array} \right )\ \ &\boldsymbol{V }_{C} = \left (\!\begin{array}{lll} \ \ 0.756 & - 0.444& - 0.311\\ - 0.444 &\ \ 0.356 &\ \ 0.089 \\ - 0.311&\ \ 0.089 &\ \ 0.222 \end{array} \!\right ) \end{array} }$$

This is because they all have the same utility differences \(y_{1} - y_{3}\) and \(y_{2} - y_{3}\) whose joint distribution is

$$\displaystyle{N\left (\left [\begin{array}{c} 2\\ 1 \end{array} \right ],\left [\begin{array}{cc} 1.6& 0\\ 0 &0.4 \end{array} \right ]\right ).}$$

Generally speaking, the parameter \(\boldsymbol{\beta }\) can be identified in the presence of covariates \(\boldsymbol{X}_{j}\). However, when there are no covariates, i.e., \(\mu _{j} = \mu\), the values of the \(\mu_{i}\)'s are determined only up to a location shift. This indeterminacy can be eliminated by imposing one constraint on the \(\mu_{i}\)'s, say, \(\mu_{t} = 0\).

The major identification problem is due to the indeterminacy of the covariance matrix \(\boldsymbol{V }\) of the utilities. Owing to the fact that the ranking of the utilities \(y_{ij},i = 1,\cdots \,,t,\) is invariant under any scaling of \(\boldsymbol{V }\) and any transformation of \(\boldsymbol{V }\) of the form:

$$\displaystyle{ \boldsymbol{V }\longrightarrow \boldsymbol{V } + \boldsymbol{c}\mathbf{1}_{t}' + \mathbf{1}_{t}\boldsymbol{c}', }$$
(9.9)

for any constant vector \(\boldsymbol{c}\) (Arbuckle and Nugent 1973), \(\boldsymbol{V }\) can never be identified unless it is structured. In the previous example, it can be seen that \(\boldsymbol{V }_{A}\) can be transformed to \(\boldsymbol{V }_{B}\) and \(\boldsymbol{V }_{C}\) by setting \(\boldsymbol{c}\,=\,(-0.3,0.3,0.5)'\) and \(\boldsymbol{c}\,=\,(-0.122,\) \(-0.322,-0.389)'\), respectively. This identification problem is well known in the context of Thurstone order statistics models and multinomial probit models (Arbuckle and Nugent 1973; Dansie 1985; Bunch 1991; Yai et al. 1997; Train 2003).

Various solutions which impose constraints on the covariance matrix \(\boldsymbol{V }\) have been proposed in the literature. Among them, the methods proposed by Chintagunta (1992) and Yu (2000) provide the most flexible forms for \(\boldsymbol{V }\), as they do not require fixing any individual entry. Chintagunta's method restricts each column sum of \(\boldsymbol{V }\) to zero (and \(\sigma_{11} = 1\)), resulting in \(\boldsymbol{V } = \boldsymbol{B}^{-}\boldsymbol{\Sigma }(\boldsymbol{B}')^{-}\), with \(\boldsymbol{B} = [\boldsymbol{I}_{t-1},-\mathbf{1}_{t-1}]\), while Yu's method restricts each column sum of \(\boldsymbol{V }\) to 1 (and \(\sigma_{11} = 1\)), leading to

$$\displaystyle{\boldsymbol{V } = \boldsymbol{A}^{-1}\left [\begin{array}{cc} \boldsymbol{\Sigma }&\mathbf{0} \\ \mathbf{0} & t \end{array} \right ](\boldsymbol{A}')^{-1}\:\mathrm{with}\;\boldsymbol{A} = \left [\begin{array}{cc} \boldsymbol{I}_{t-1} & -\mathbf{1}_{t-1} \\ \mathbf{1}'_{t-1} & 1 \end{array} \right ].}$$

Note that the \(\boldsymbol{V }\) identified by Chintagunta's method is singular and the associated utilities must be correlated, whereas Yu's method always produces a non-singular matrix \(\boldsymbol{V }\) and includes the identity matrix (or a scaled version of it) as a special case. In addition, it is easy to show that this non-singular matrix can be obtained from the matrix used by Chintagunta (1992) through the transformation (9.9) with \(\boldsymbol{c} = \frac{1} {2t}\mathbf{1}_{t}\).
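Both the invariance (9.9) and Yu's identified form can be checked numerically. The sketch below uses the matrices of Set A and Set B above; the variable names are ours.

```python
import numpy as np

V_A = np.array([[1.0, 0.0, 0.2],
                [0.0, 1.0, 0.8],
                [0.2, 0.8, 1.0]])
c = np.array([-0.3, 0.3, 0.5])
one = np.ones(3)

# Transformation (9.9): V -> V + c 1' + 1 c' reproduces V_B of Set B
V_B = V_A + np.outer(c, one) + np.outer(one, c)

# Both matrices induce the same covariance of the utility differences
B = np.hstack([np.eye(2), -np.ones((2, 1))])
print(np.allclose(B @ V_A @ B.T, B @ V_B @ B.T))   # True

# Yu's identified V recovered from Sigma: each column sum equals 1
t = 3
Sigma = B @ V_A @ B.T
A = np.block([[np.eye(t - 1), -np.ones((t - 1, 1))],
              [np.ones((1, t - 1)), np.ones((1, 1))]])
Ainv = np.linalg.inv(A)
V_yu = Ainv @ np.block([[Sigma, np.zeros((t - 1, 1))],
                        [np.zeros((1, t - 1)), t * np.ones((1, 1))]]) @ Ainv.T
print(V_yu.sum(axis=0))   # each column sums to 1
```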

9.1.3 Bayesian Analysis of the MVNOS Model

Given a sample of n judges, the likelihood function of \((\boldsymbol{\beta },\boldsymbol{\Sigma })\) is given by

$$\displaystyle{ L(\boldsymbol{\beta },\boldsymbol{\Sigma }) =\prod _{ j=1}^{n}P(E_{ j}), }$$
(9.10)

where the event E j is given in (9.8). Note that the evaluation of the above likelihood function requires the numerical approximation of the \(\left (t - 1\right )\)-dimensional integral (e.g., Genz 1992) which can be done relatively accurately provided that the number of objects (t) is small, say less than 15. To avoid a high-dimensional numerical integration, limited information methods using the induced paired/triple-wise comparisons from the ranking data (e.g., structural equation models by Maydeu-Olivares and Bockenholt (2005) fitted using Mplus) have been proposed. Another approach is to use a Monte Carlo Expectation-Maximization (MCEM) algorithm (e.g., Yu et al. 2005; see also Sect. 9.2) which can avoid the direct maximization of the above likelihood function.

In this section we will consider a simulation-based Bayesian approach which can also avoid the evaluation and maximization of the above likelihood function. Recently, a number of R packages have become available for the Bayesian estimation of the MVNOS models for ranking data, including MNP (Imai and van Dyk 2005), rJAGS (Johnson and Kuhn 2013) as well as our own package StatMethRank.

9.1.3.1 Bayesian Estimation and Prior Distribution

In a Bayesian approach, the first step is to specify the prior distribution of the identified parameters. As mentioned previously, a constraint on \(\boldsymbol{\Sigma }\) could be added in order to fix the scale and hence identify all the parameters. Under such a constraint, however, the usual Wishart prior distribution for \(\boldsymbol{\Sigma }\) cannot be used. Following the approach of McCulloch and Rossi (1994) for the multinomial probit model, instead of imposing the scale constraint on \(\boldsymbol{\Sigma }\), we may compute the full posterior distribution of \(\boldsymbol{\beta }\) and \(\boldsymbol{\Sigma }\) and obtain the marginal posterior distribution of the identified parameters such as \(\boldsymbol{\beta }/\sqrt{\sigma _{11}},\sigma _{ii}/\sigma _{11}\), and \(\rho _{ij} =\sigma _{ij}/\sqrt{\sigma _{ii } \sigma _{jj}}\).

Let \(f(\boldsymbol{\beta },\boldsymbol{\Sigma })\) denote the joint prior distribution of \((\boldsymbol{\beta },\boldsymbol{\Sigma })\). Then the posterior density of \((\boldsymbol{\beta },\boldsymbol{\Sigma })\) is

$$\displaystyle{ f(\boldsymbol{\beta },\boldsymbol{\Sigma }\vert \boldsymbol{\Pi }) \propto L(\boldsymbol{\beta },\boldsymbol{\Sigma })f(\boldsymbol{\beta },\boldsymbol{\Sigma }), }$$
(9.11)

where \(\boldsymbol{\Pi } =\{ \boldsymbol{\pi }_{1},\cdots \,,\boldsymbol{\pi }_{n}\}\) is the data set of all n observed rankings. It is convenient to use a normal prior on \(\boldsymbol{\beta }\),

$$\displaystyle{\boldsymbol{\beta } \sim N(\boldsymbol{\beta }_{0},\boldsymbol{A}_{0}^{-1}),}$$

and an independent Wishart prior on \(\boldsymbol{G} \equiv \boldsymbol{\Sigma }^{-1}\),

$$\displaystyle{\boldsymbol{G} \equiv \boldsymbol{\Sigma }^{-1} \sim W_{ t-1}(\alpha,\boldsymbol{P}).}$$

Note that our parametrization of the Wishart distribution is such that \(E(\boldsymbol{\Sigma }^{-1}) =\alpha \boldsymbol{P}^{-1}\).

Although (9.11) is intractable for direct Bayesian calculations, we may use the method of Gibbs sampling with data augmentation. We augment the parameters \((\boldsymbol{\beta },\boldsymbol{\Sigma })\) with the latent variables \(\boldsymbol{W} = (\boldsymbol{w}_{1},\cdots \,,\boldsymbol{w}_{n})\). The joint posterior density of \((\boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{W})\) is then

$$\displaystyle{ f(\boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{W}\vert \boldsymbol{\Pi }) \propto f(\boldsymbol{\Pi }\vert \boldsymbol{W})f(\boldsymbol{W}\vert \boldsymbol{\beta },\boldsymbol{\Sigma })f(\boldsymbol{\beta },\boldsymbol{\Sigma }), }$$
(9.12)

which allows us to sample from the full conditional posterior distributions. The details are provided in the next section.

9.1.3.2 Gibbs Sampling Algorithm for the MVNOS Model

The Gibbs sampling algorithm for the MVNOS model consists of drawing samples consecutively from the full conditional posterior distributions, as follows:

  1. Draw \(\boldsymbol{w}_{j}\) from \(f(\boldsymbol{w}_{j}\vert \boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{\Pi })\), for j = 1, ⋯ , n.

  2. Draw \(\boldsymbol{\beta }\) from \(f(\boldsymbol{\beta }\vert \boldsymbol{\Sigma },\boldsymbol{W},\boldsymbol{\Pi }) \propto f(\boldsymbol{\beta }\vert \boldsymbol{\Sigma },\boldsymbol{W})\).

  3. Draw \(\boldsymbol{\Sigma }\) from \(f(\boldsymbol{\Sigma }\vert \boldsymbol{\beta },\boldsymbol{W},\boldsymbol{\Pi }) \propto f(\boldsymbol{\Sigma }\vert \boldsymbol{\beta },\boldsymbol{W}).\)

In step (1), it can be shown that given \(\boldsymbol{\beta }\), \(\boldsymbol{\Sigma }\), and \(\boldsymbol{\Pi }\), the \(\boldsymbol{w}_{j}\)’s are independent and \(\boldsymbol{w}_{j}\) follows a truncated multivariate normal distribution, \(N(\boldsymbol{X}_{j}\boldsymbol{\beta },\boldsymbol{\Sigma })I(\boldsymbol{w}_{j} \in E_{j})\). One may simulate \(\boldsymbol{w}_{j}\) by using the acceptance-rejection technique, but this may lead to a high rejection rate when the number of objects is fairly large. Instead of drawing the whole vector \(\boldsymbol{w}_{j}\) at one time, we successively simulate each entry of \(\boldsymbol{w}_{j}\) by conditioning on the other t − 2 entries. More specifically, we replace step (1) by

  1. Draw \(w_{ij}\) from \(f(w_{ij}\vert w_{-i,j},\boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{\Pi })\), for \(i = 1,\cdots \,,t - 1,\ j = 1,\cdots \,,n\), where \(w_{-i,j}\) is \(\boldsymbol{w}_{j}\) with \(w_{ij}\) deleted.

Let \(\boldsymbol{x}_{ij}'\) be the ith row of \(\boldsymbol{X}_{j}\), \(\boldsymbol{X}_{-i,j}\) be \(\boldsymbol{X}_{j}\) with the ith row deleted, and \(\boldsymbol{g}_{-i,i}\) be the ith column of \(\boldsymbol{G}\) with \(g_{ii}\) deleted. Suppose \(< [1]_{j},\cdots \,,[t]_{j} >\) is the ordering of objects corresponding to their ranks \(\boldsymbol{\pi }_{j} = (\pi _{1j},\cdots \,,\pi _{tj})\). Then \(\pi_{ij} = r\) if and only if \([r]_{j} = i\). Now we have

$$\displaystyle{ \begin{array}{c} w_{ij}\vert w_{-i,j},\boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{\Pi } \sim N(m_{ij},\tau _{ij}^{2}) \\ \text{subject to}\ \ w_{[r+1]_{j}j} < w_{ij} < w_{[r-1]_{j}j}\ \ \text{whenever}\ \ \pi _{ij} = r, \end{array} }$$
(9.13)

where

$$\displaystyle{m_{ij} = \boldsymbol{x}_{ij}'\boldsymbol{\beta } - g_{ii}^{-1}\boldsymbol{g}_{ -i,i}'(w_{-i,j} -\boldsymbol{X}_{-i,j}\boldsymbol{\beta })}$$

and \(\tau _{ij}^{2} = g_{ii}^{-1}\).

Although this still involves simulation from a truncated univariate normal distribution, we can adopt the inversion method to sample from this distribution, avoiding the acceptance-rejection technique, which may be inefficient (Devroye 1986).
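A minimal sketch of such an inverse-CDF draw from \(N(m,\tau^{2})\) truncated to an interval (a, b), using scipy (our own helper, not code from the packages mentioned earlier):

```python
import numpy as np
from scipy.stats import norm

def rtruncnorm(m, tau, a, b, rng):
    """Inverse-CDF draw from N(m, tau^2) truncated to the interval (a, b)."""
    Fa, Fb = norm.cdf((a - m) / tau), norm.cdf((b - m) / tau)
    u = rng.uniform(Fa, Fb)          # uniform on (F(a), F(b))
    return m + tau * norm.ppf(u)

rng = np.random.default_rng(0)
draw = rtruncnorm(m=0.5, tau=1.0, a=0.0, b=2.0, rng=rng)
```

The same helper works with infinite bounds, e.g. a = -np.inf or b = np.inf.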

Returning to steps (2) and (3), since we are conditioning on \(\boldsymbol{W}\), the MVNOS model is simply a standard Bayesian linear model setup. Therefore, the full conditional distribution of \(\boldsymbol{\beta }\) is

$$\displaystyle{ \boldsymbol{\beta }\vert \boldsymbol{\Sigma },\boldsymbol{W} \sim N_{p}(\boldsymbol{\beta }_{1},\boldsymbol{A}_{1}^{-1}), }$$
(9.14)

where

$$\displaystyle{\boldsymbol{A}_{1} = \boldsymbol{A}_{0} +\sum _{ j=1}^{n}\boldsymbol{X}_{ j}'\boldsymbol{\Sigma }^{-1}\boldsymbol{X}_{ j}\quad \mathrm{and}\quad \boldsymbol{\beta }_{1} = \boldsymbol{A}_{1}^{-1}(\boldsymbol{A}_{ 0}\boldsymbol{\beta }_{0} +\sum _{ j=1}^{n}\boldsymbol{X_{ j}'}\boldsymbol{\Sigma }^{-1}\boldsymbol{w}_{ j}).}$$

Finally, the full conditional distribution of \(\boldsymbol{\Sigma }\) is such that \(\boldsymbol{\Sigma } = \boldsymbol{G}^{-1}\) with

$$\displaystyle{ \boldsymbol{G}\vert \boldsymbol{\beta },\boldsymbol{W} \sim W_{t-1}\left (\alpha +n,\boldsymbol{P} +\sum _{ j=1}^{n}(\boldsymbol{w}_{ j} -\boldsymbol{X}_{j}\boldsymbol{\beta })(\boldsymbol{w}_{j} -\boldsymbol{X}_{j}\boldsymbol{\beta })'\right ). }$$
(9.15)

With a starting value for \((\boldsymbol{\beta },\boldsymbol{\Sigma },\boldsymbol{W})\), we draw in turn from each of the full conditional distributions given by (9.13), (9.14), and (9.15). As this process is repeated many times, the draws converge in distribution to draws from the full joint posterior distribution of \(\boldsymbol{\beta }\), \(\boldsymbol{\Sigma }\), and \(\boldsymbol{W}\). In practice, we iterate the process M + N times and discard the first M burn-in iterations. Because the Gibbs iterates are autocorrelated, we keep every sth draw in the last N iterations so that the resulting sample contains approximately independent draws from the joint posterior distribution. The value of s can be determined from the graph of the sample autocorrelation of the Gibbs iterates.

A natural choice of starting value for \((\boldsymbol{\beta },\boldsymbol{\Sigma })\) is \((\boldsymbol{0},\boldsymbol{I})\). Choosing a starting value for \(\boldsymbol{W}\) is less straightforward. We adopt an approach motivated by the fact that the ranking of \(\{w_{1j},\cdots \,,w_{t-1,j},0\}\) must be consistent with the observed ranking \(\{\pi _{1j},\cdots \,,\pi _{tj}\}\). Using this fact, a simple choice for the starting value of the w's is \(w_{ij} = (\pi _{tj} -\pi _{ij})/\sqrt{(t^{2 } - 1)/12}\), a type of standardized rank score (signed so that more preferred objects receive larger starting values).

It should be remarked that since Thurstone’s normal order statistics model is a MVNOS model with \(\boldsymbol{V } = \boldsymbol{I}_{t}\), its parameters can be estimated by fixing \(\boldsymbol{V }\) to \(\boldsymbol{I}_{t}\), or equivalently, fixing \(\boldsymbol{\Sigma }\) to \(\boldsymbol{I}_{t-1} + \mathbf{1}_{t-1}\mathbf{1}'_{t-1}\) and skipping the step of generating \(\boldsymbol{\Sigma }\) in the above Gibbs sampling algorithm.
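Putting steps 1–3 together with the burn-in/thinning schedule and the starting values just described, one possible Python sketch of the sampler is given below. This is our own illustration under the stated assumptions, not the StatMethRank or MNP implementation; rtruncnorm is the inverse-CDF helper shown earlier, and the starting values for W are signed so that they are consistent with the observed rankings.

```python
import numpy as np
from scipy.stats import norm, wishart

def rtruncnorm(m, tau, lo, hi, rng):
    """Inverse-CDF draw from N(m, tau^2) truncated to (lo, hi)."""
    Fa, Fb = norm.cdf((lo - m) / tau), norm.cdf((hi - m) / tau)
    return m + tau * norm.ppf(rng.uniform(Fa, Fb))

def gibbs_mvnos(pi, X, beta0, A0, alpha, P, n_iter, rng):
    """Gibbs sampler sketch for the MVNOS model.
    pi : n x t array of ranks (1 = most preferred, object t is the reference);
    X  : n x (t-1) x p array of differenced covariate matrices X_j."""
    n, t = pi.shape
    p = X.shape[2]
    beta, Sigma = np.zeros(p), np.eye(t - 1)
    # starting values for W consistent with the observed orderings
    W = (pi[:, [t - 1]] - pi[:, :t - 1]) / np.sqrt((t**2 - 1) / 12.0)
    draws = []
    for _ in range(n_iter):
        G = np.linalg.inv(Sigma)
        # step 1: update each w_ij given the rest (truncated normal, (9.13))
        for j in range(n):
            val = np.append(W[j], 0.0)            # w-values of all t objects
            order = np.argsort(pi[j])             # objects from best to worst
            for i in range(t - 1):
                r = int(pi[j, i])                 # rank of object i (1-based)
                hi = np.inf if r == 1 else val[order[r - 2]]
                lo = -np.inf if r == t else val[order[r]]
                g_i = np.delete(G[:, i], i)       # g_{-i,i}
                resid = np.delete(W[j], i) - np.delete(X[j], i, axis=0) @ beta
                m_ij = X[j, i] @ beta - (g_i @ resid) / G[i, i]
                W[j, i] = rtruncnorm(m_ij, 1.0 / np.sqrt(G[i, i]), lo, hi, rng)
                val[i] = W[j, i]
        # step 2: beta | Sigma, W ~ N(beta_1, A_1^{-1})  (9.14)
        A1 = A0 + sum(X[j].T @ G @ X[j] for j in range(n))
        b1 = np.linalg.solve(A1, A0 @ beta0
                             + sum(X[j].T @ G @ W[j] for j in range(n)))
        beta = rng.multivariate_normal(b1, np.linalg.inv(A1))
        # step 3: Sigma^{-1} | beta, W ~ Wishart  (9.15)
        S = sum(np.outer(W[j] - X[j] @ beta, W[j] - X[j] @ beta)
                for j in range(n))
        G = wishart.rvs(df=alpha + n, scale=np.linalg.inv(P + S),
                        random_state=rng)
        Sigma = np.linalg.inv(G)
        draws.append((beta.copy(), Sigma.copy()))
    return draws   # discard burn-in and thin the remaining draws afterwards
```

When there are no covariates, as in the APA data below, \(\boldsymbol{X}_{j}\) reduces to the identity matrix \(\boldsymbol{I}_{t-1}\) for every judge, so X can be formed by stacking identity matrices.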

Remark. Although the MVNOS model discussed here considers the complete ranking of t objects, it is not difficult to extend it to incorporate incomplete or partial rankings by modifying the event \(E_{j}\) in (9.8) and the corresponding truncation rule in (9.13) used to sample \(w_{ij}\) in the Gibbs sampling. For instance, suppose a partial ordering of 4 objects A, B, C, and D given by judge j is B ≻ C ≻ A, D. The event \(E_{j}\) will then be modified to \(\{\boldsymbol{w}_{j}:\max \{ w_{Aj},0\} < w_{Cj} < w_{Bj}\}\), and hence \(w_{Aj}\), \(w_{Bj}\), and \(w_{Cj}\) will be separately simulated from truncated normal distributions over the intervals \((-\infty, w_{Cj})\), \((w_{Cj}, +\infty)\), and \((\max \{w_{Aj},0\},w_{Bj})\), respectively. So far, we have assumed that the data contain no tied ranks; when ties do occur, we treat the ordering of the tied objects as unknown. For example, the tied ranking B ≻ C ≻ A = D is treated as the partial ranking B ≻ C ≻ A, D. See Sect. 9.2.1 for similar treatments of incomplete rankings in the context of factor analysis.

9.1.4 Adequacy of the Model Fit

To test the adequacy of the model, we may group the t! rankings into a small number of meaningful subgroups and examine the fit for each subgroup. In particular, let \(n_{i}\) be the observed frequency with which object i is ranked as the top object. Also let

$$\displaystyle{\hat{p}_{i} = P(Y _{i} > Y _{1},\cdots \,,Y _{i-1},Y _{i+1},\cdots \,,Y _{t}\vert \hat{\boldsymbol{\beta }},\hat{\boldsymbol{\Sigma }})}$$

be the estimated partial probability of ranking object i as first under the fitted MVNOS model with posterior mean estimates \(\hat{\boldsymbol{\beta }}\) and \(\hat{\boldsymbol{\Sigma }}\). The fit can be examined by comparing the observed frequency \(n_{i}\) with the expected frequency \(n\hat{p}_{i}\), i = 1, 2, ⋯ , t, or by calculating the standardized residuals:

$$\displaystyle{r_{i} = \frac{n_{i} - n\hat{p}_{i}} {\sqrt{n\hat{p}_{i }(1 -\hat{p}_{i })}},\quad i = 1,2,\cdots \,,t.}$$

If the expected frequencies match the observed frequencies well, or the absolute values of the residuals are small enough (say, less than 2), the MVNOS model adequately fits the data. The same argument can be applied to other ranking models.

In performing these calculations, it is necessary to evaluate numerically the estimated probability \(\hat{p}_{i}\) which may be expressed as

$$\displaystyle{\int _{-\infty }^{0}\!\!\!\cdots \!\int _{ -\infty }^{0}\phi (\boldsymbol{v}\vert \boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}})d\boldsymbol{v},}$$

where \(\boldsymbol{v} = (Y _{1} - Y _{i},\cdots \,,Y _{i-1} - Y _{i},Y _{i+1} - Y _{i},\cdots \,,Y _{t} - Y _{i})' \sim N(\boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}})\) and \(\boldsymbol{\beta }^{{\ast}}\) and \(\boldsymbol{\Sigma }^{{\ast}}\) can be obtained from \(\hat{\boldsymbol{\beta }}\) and \(\hat{\boldsymbol{\Sigma }}\), respectively. We employ the Geweke-Hajivassiliou-Keane (GHK) method (see Geweke 1991; Hajivassiliou 1993; Keane 1994). Let \(\boldsymbol{L} = (\ell_{ij})\) be the unique lower triangular matrix obtained from the Cholesky decomposition of \(\boldsymbol{\Sigma }^{{\ast}}\) (i.e., \(\boldsymbol{\Sigma }^{{\ast}} = \boldsymbol{L}\boldsymbol{L}'\)). The GHK simulator for the estimated partial probability \(\hat{p}_{i}\) is constructed via the following steps:

  1. Compute

    $$\displaystyle{P(v_{1} < 0\vert \boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}}) = \Phi (-\frac{\beta _{1}^{{\ast}}} {\ell_{11}} ),}$$

    and draw \(\eta _{1} \sim N(0,1)\) with \(\eta _{1} < -\frac{\beta _{1}^{{\ast}}} {\ell_{11}}\).

  2. For \(s = 2,\cdots \,,t - 1\), compute \(P(v_{s} < 0\vert \eta _{1},\eta _{2},\cdots \,,\eta _{s-1},\boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}}) = \Phi (-\frac{\beta _{s}^{{\ast}}+\sum _{ j=1}^{s-1}\ell_{ sj}\eta _{j}} {\ell_{ss}} )\), and draw \(\eta _{s} \sim N(0,1)\) with \(\eta _{s} < -\frac{\beta _{s}^{{\ast}}+\sum _{ j=1}^{s-1}\ell_{ sj}\eta _{j}} {\ell_{ss}}\).

  3. Estimate \(\hat{p}_{i}\) by \(P(v_{1} < 0\vert \boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}})\prod _{s=2}^{t-1}P(v_{s} < 0\vert \eta _{1},\eta _{2},\cdots \,,\eta _{s-1},\boldsymbol{\beta }^{{\ast}},\boldsymbol{\Sigma }^{{\ast}})\).

  4. Repeat steps 1–3 a large number of times to obtain independent estimates of \(\hat{p}_{i}\), and finally, by taking the average of these estimates, the GHK simulator for \(\hat{p}_{i}\) is obtained. In a later application, we will use 10,000 replications.
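A Python sketch of this simulator is given below. It is our own illustration; beta_star and Sigma_star stand for the mean and covariance of the difference vector \(\boldsymbol{v}\) defined above, and the draws of the truncated \(\eta_s\) again use the inverse-CDF method.

```python
import numpy as np
from scipy.stats import norm

def ghk_orthant(beta_star, Sigma_star, n_rep=10_000, rng=None):
    """GHK estimate of P(v < 0) for v ~ N(beta_star, Sigma_star), i.e. the
    estimated partial probability that object i is ranked first."""
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(Sigma_star)        # unique lower triangular factor
    k = len(beta_star)
    total = 0.0
    for _ in range(n_rep):
        eta = np.zeros(k)
        prob = 1.0
        for s in range(k):
            upper = -(beta_star[s] + L[s, :s] @ eta[:s]) / L[s, s]
            p_s = norm.cdf(upper)              # P(v_s < 0 | eta_1, ..., eta_{s-1})
            prob *= p_s
            # draw eta_s ~ N(0,1) truncated to (-inf, upper) by inversion
            eta[s] = norm.ppf(rng.uniform(0.0, p_s))
        total += prob
    return total / n_rep
```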

9.1.5 Analysis of the APA Election Data

We now consider the APA election data. Let Y ij be the jth voter’s utility of selecting candidate i, i = A, B, C, D, E. We apply the MVNOS model in which (i) the jth voter’s ranking is assumed to be formed by the relative ordering of \(Y _{Aj},Y _{Bj},Y _{Cj},Y _{Dj},Y _{Ej}\); and (ii) the Y ’s satisfy the following model:

$$\displaystyle{\begin{array}{c} Y _{ij} =\mu _{i} + e_{ij},\quad i = A,B,C,D,E,\ j = 1,\cdots \,,5738, \\ (e_{Aj},e_{Bj},e_{Cj},e_{Dj},e_{Ej})'\stackrel{\mathrm{iid}}{\sim }N(\mathbf{0},\boldsymbol{V }), \end{array} }$$

or equivalently, the model could be formed by the relative ordering of \(w_{Aj},w_{Bj},w_{Cj},w_{Dj},0\), and the w’s satisfy

$$\displaystyle{\begin{array}{c} w_{ij} =\beta _{i} +\varepsilon _{ij},\quad i = A,B,C,D,\ j = 1,\cdots \,,5738, \\ (\varepsilon _{Aj},\varepsilon _{Bj},\varepsilon _{Cj},\varepsilon _{Dj})'\stackrel{\mathrm{iid}}{\sim }N(\mathbf{0},\boldsymbol{\Sigma }), \end{array} }$$

where \(\beta _{i} =\mu _{i} -\mu _{E}\) and \(\boldsymbol{\Sigma } = (\sigma _{ij})\) with \(\sigma _{ij} = v_{ij} + v_{EE} - v_{iE} - v_{jE}\).

Using the proper priors \(\beta \sim N(\beta _{0} = \boldsymbol{0},A_{0}^{-1} = 100\boldsymbol{I})\) and \(\boldsymbol{\Sigma }^{-1} \sim W_{t-1}(\alpha = t + 1,\boldsymbol{P} = (t + 1)\boldsymbol{I})\), 11,000 Gibbs iterations were generated and the first 1000 burn-in iterations were discarded. As evidenced by the sample autocorrelations of the Gibbs samples (not shown here), keeping every 20th draw in the last 10,000 Gibbs iterations gives approximately independent draws from the joint posterior distribution of the parameters \(\boldsymbol{\beta }\) and \(\boldsymbol{\Sigma }\) of the MVNOS model. By imposing the constraint \(\mu_{E} = 0\) and our constraint for \(\boldsymbol{V }\) on the Gibbs sequences, we obtain estimates for \(\mu _{i}\ (i = A,B,C,D,E)\) and \(v_{ij}\ (i \leq j)\).

9.1.5.1 Adequacy of Model Fit and Model Comparison

To examine the goodness of fit of the MVNOS model, Table 9.1 shows the observed proportions and the estimated partial probabilities under the MVNOS model. The same statistics for Thurstone's normal order statistics model and Stern's mixtures of Luce models (called BTL models in his paper) are also listed in Table 9.1 as alternatives to the MVNOS model. Thurstone's model was fitted by repeating the Gibbs sampling with \(\boldsymbol{V }\) fixed at \(\boldsymbol{I}_{t}\), while Stern's mixture models were fitted by Stern (1993). Stern found that the data seem to be a mixture of 2 or 3 groups of voters. This feature is also supported by Diaconis's (1989) spectral analysis and McCullagh's (1993b) model of inversions.

Table 9.1 Observed proportions and estimated probabilities that a candidate is ranked as first under various models for the APA election data (values in brackets are the standardized residuals \(r_{i}\))

As seen from Table 9.1, the estimated partial probabilities for the MVNOS model match the observed proportions very well. Moreover, only the MVNOS model has standardized residuals \(r_{i}\) that are all small in magnitude (less than 2), indicating that, among the four models considered in Table 9.1, the MVNOS model gives the best fit to the APA election data.

9.1.5.2 Interpretation of the Fitted MVNOS Model

Table 9.2 shows the posterior means, standard deviations, and 90 % posterior intervals for the parameters of the MVNOS model. It is not surprising to see that the ordering of the posterior means of the \(\mu_{i}\)'s is the same as that of the average ranks. Apart from the posterior means, the Gibbs samples can also provide estimates of the probability that candidate i is more favorable than candidate j. For instance, the probability that candidate A is more favorable than candidate C is estimated by the sample mean of \(\Phi ( \frac{\mu _{A}-\mu _{C}} {\sqrt{v_{AA } +v_{CC } -2v_{AC}}})\) in the Gibbs samples, which is found to be 0.509 (posterior standard deviation = 0.006).
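Such preference probabilities can be computed directly from the stored Gibbs draws; a small sketch (the array layouts for the draws are our assumption):

```python
import numpy as np
from scipy.stats import norm

def pref_prob(mu_draws, V_draws, i, k):
    """Posterior mean and standard deviation of P(candidate i preferred to
    candidate k), averaging Phi((mu_i - mu_k)/sqrt(v_ii + v_kk - 2 v_ik))
    over the Gibbs draws.  mu_draws: M x t, V_draws: M x t x t."""
    mu_draws, V_draws = np.asarray(mu_draws), np.asarray(V_draws)
    z = (mu_draws[:, i] - mu_draws[:, k]) / np.sqrt(
        V_draws[:, i, i] + V_draws[:, k, k] - 2 * V_draws[:, i, k])
    p = norm.cdf(z)
    return p.mean(), p.std()
```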

Table 9.2 Parameter estimates of the MVNOS model for the APA election data
Fig. 9.1 Boxplots of \(\mu_{i}\), \(v_{ii}\), and \(r_{ij} = v_{ij}/\sqrt{v_{ii } v_{jj}}\ (i\neq j)\) for the APA election data

According to the boxplots of \(\mu_{i}\), \(v_{ii}\), and \(r_{ij} = v_{ij}/\sqrt{v_{ii } v_{jj}}\ (i\neq j)\) shown in Fig. 9.1, the distributions of some parameters are fairly symmetric. In addition, the large estimate of \(v_{CC}\) indicates that voters have fairly large variation in their preference for candidate C. To further investigate the structure of the covariance matrix \(\boldsymbol{V }\), a principal components analysis of the posterior mean estimate for \(\boldsymbol{V }\) is performed and the result is presented in Table 9.3.

A principal components analysis of the posterior mean estimate for \(\boldsymbol{V }\) produces the utilities of the five candidates {A, B, C, D, E} as

$$\displaystyle\begin{array}{rcl} \left [\begin{array}{c} y_{A} \\ y_{B} \\ y_{C} \\ y_{D} \\ y_{E} \end{array} \right ]& =& \left [\begin{array}{r} 0.086\\ \text{-0.071} \\ 0.067\\ \text{-0.048} \\ 0 \end{array} \right ] + \sqrt{1.015}\boldsymbol{a}_{1}z_{1} + \sqrt{0.2}\boldsymbol{1}_{5}z_{2} + \sqrt{0.440}\boldsymbol{a}_{3}z_{3} \\ & & +\sqrt{0.357}\boldsymbol{a}_{4}z_{4} + \sqrt{0.346}\boldsymbol{a}_{5}z_{5} {}\end{array}$$
(9.16)

where the z’s are independently and identically distributed as N(0, 1) and the principal components \(\boldsymbol{a}\)’s are given in Table 9.3. Since rankings of objects only depend on utility differences, the term \(\sqrt{0.2}\mathbf{1}_{5}z_{2}\) does not affect the rankings and hence, interpretation is based on the remaining four components.
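A decomposition of the form (9.16) can be reproduced from any posterior mean estimate of \(\boldsymbol{V }\) via an eigen-decomposition. The sketch below uses a randomly generated placeholder matrix rather than the fitted APA values; only the mean vector is taken from (9.16).

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
V_hat = M @ M.T / 5                     # placeholder for the posterior mean of V
mu_hat = np.array([0.086, -0.071, 0.067, -0.048, 0.0])   # means from (9.16)

# Principal components: V = sum_k lambda_k a_k a_k'
lam, A = np.linalg.eigh(V_hat)
order = np.argsort(lam)[::-1]           # sort components by variance
lam, A = lam[order], A[:, order]

# Utilities written as mu + sum_k sqrt(lambda_k) a_k z_k with z_k iid N(0,1)
z = rng.standard_normal(5)
y = mu_hat + A @ (np.sqrt(lam) * z)
```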

Table 9.3 Principal components analysis of the posterior mean estimate for \(\boldsymbol{V }\)

Component 1 separates two groups of candidates, {A, C} and {D, E}, implying that there are two groups of voters: those who prefer candidates A and C and those who prefer candidates D and E. Component 3 contrasts candidate E with candidates B and D, indicating that voters either prefer B and D to E or prefer E to B and D; for instance, voters who like B tend to prefer D to E. Finally, components 4 and 5 represent a contrast between A and C and a contrast between B and D, respectively. Based on the variances of the components, component 1 dominates and hence plays the major role in ranking the five candidates.

9.2 Factor Analysis

It was mentioned in Sect. 9.1.2 that the parameters of a MVNOS model cannot be fully identified unless the covariance matrix \(\boldsymbol{V }\) is structured. One possibility for resolving this problem is to impose on \(\boldsymbol{V }\) the factor covariance structure used in factor analysis.

Factor analysis is widely used in social sciences and marketing research to identify the common characteristics among a set of variables. The classical d-factor model for a set of continuous variables \(y_{1},y_{2},\cdots \,,y_{t}\) is defined as

$$\displaystyle{ y_{ij} = \boldsymbol{z}_{j}'\boldsymbol{a}_{i} +\varepsilon _{ij},\quad i = 1,\ldots,t;\ j = 1,\ldots,n, }$$
(9.17)

where \(\boldsymbol{y}_{j} = (y_{1j},\ldots,y_{tj})'\) is a t-dimensional vector of response variables from individual j, \(\boldsymbol{z}_{j} = (z_{1j},\ldots,z_{dj})'\) is a vector of unobserved common factors associated with individual j, \(\boldsymbol{a}_{i} = (a_{i1},\ldots,a_{id})'\) is a vector of factor loadings associated with object i on the d factors, and \(\varepsilon _{ij}\) represents the error of the factor model. By adopting the MVNOS framework with the latent utilities satisfying the above factor model, we can generalize the classical factor model to analyze ranking data. In what follows, we shall assume that the reader has a basic familiarity with the statistical concepts of factor scores, factor loadings, and varimax rotation as can be found in most textbooks on multivariate analysis.

9.2.1 The Factor Model

Suppose we have a random sample of n individuals from the population and each individual is asked to rank t objects under study according to their own preferences. Within the framework of the MVNOS model, the ranking of the t objects given by individual j in the factor model is determined by the ordering of t latent utilities \(y_{1j},\ldots,y_{tj}\) which satisfies a more general d-factor model:

$$\displaystyle{ y_{ij} =\boldsymbol{ z}_{j}'\boldsymbol{a}_{i} + b_{i} +\varepsilon _{ij}\quad j = 1,\ldots,n;\ i = 1,\ldots,t(> d) }$$
(9.18)

where \(\boldsymbol{b} = (b_{1},\ldots,b_{t})'\) is the mean utility vector reflecting the relative importance of the t objects and \(\boldsymbol{a}_{i} = (a_{i1},\ldots,a_{id})'\) represents the factor loadings. It is assumed that the latent common factors \(\boldsymbol{z}_{1},\ldots,\boldsymbol{z}_{n}\) are independent and identically distributed according to the standard d-variate normal distribution, \(N_{d}(\mathbf{0},\boldsymbol{I})\). The error term \(\varepsilon _{ij}\) is the unique factor, assumed to follow a \(N(0,\sigma _{i}^{2})\) distribution independently of the \(\boldsymbol{z}_{j}\)'s.

Denote a complete ranking by \(\boldsymbol{\pi }_{j} = (\pi _{1j},\ldots,\pi _{tj})^{'}\) where π ij is the rank of object i from individual j. Smaller ranks refer to the more preferred objects and hence higher utilities. For example, if \(\boldsymbol{\pi }_{j} = (2,3,1)^{'}\) is recorded, it corresponds to the unobservable utilities \(\boldsymbol{y}_{j} = (y_{1j},y_{2j},y_{3j})^{'}\) with \(y_{2j} < y_{1j} < y_{3j}\). Note that the only observable quantities are the π ij ’s but not the y ij ’s.

Remark.

Extension of the above factor model to incorporate incomplete ranking data is quite straightforward. In the case of top q partial rankings with the top q objects being \([1]_{j},\ldots,[q]_{j}\) for individual j, it is natural to assign objects \([1]_{j},\ldots,[q]_{j}\) the ranks 1, ⋯ , q, respectively, and the remaining objects the midrank, i.e., \([(q + 1) + \cdots + t]/(t - q)\). The factor model can be extended by restricting the utilities \(y_{1j},\cdots \,,y_{tj}\) to satisfy \(y_{[1]_{j}j} > y_{[2]_{j}j} > \cdots > y_{[q]_{j}j} > y_{[q+1]_{j}j},\cdots \,,y_{[t]_{j}j}\). For subset rankings, individuals are asked to rank a subset of the t objects only. The ranking of the remaining objects is unknown, and we simply restrict the ordering of the utilities of the objects in the subset to be consistent with the ranking of these objects. Generally speaking, a ranking \(\boldsymbol{\pi }\), complete or incomplete, corresponds to an event \(\{\boldsymbol{y}: C\boldsymbol{y} < \boldsymbol{0}\}\), for some contrast matrix C. For instance, in the case of ranking t = 4 objects, the complete ranking \(\boldsymbol{\pi }_{1} = (2,3,1,4)'\), the top 2 partial ranking \(\boldsymbol{\pi }_{2} = (2,3.5,1,3.5)'\), and the subset ranking \(\boldsymbol{\pi }_{3} = (2,\_,1,\_)'\) correspond to events with respective matrices C given by

$$\displaystyle{\left (\begin{array}{cccc} 1 & 0 & - 1&0\\ - 1 & 1 & 0 &0 \\ 0 & - 1& 0 &1 \end{array} \right ),\quad \left (\begin{array}{cccc} 1 &0& - 1&0\\ - 1 &1 & 0 &0 \\ - 1&0& 0 &1 \end{array} \right ),\quad \text{and}\quad \left (\begin{array}{cccc} 1&0& - 1&0 \end{array} \right ).}$$
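The contrast matrix C for each of these cases can be constructed mechanically. The sketch below reproduces the three matrices above; the function names and the 0-based object indexing are ours.

```python
import numpy as np

def chain_rows(order, t):
    """Rows encoding y[worse] - y[better] < 0 for consecutive pairs of `order`."""
    rows = []
    for better, worse in zip(order[:-1], order[1:]):
        c = np.zeros(t)
        c[worse], c[better] = 1.0, -1.0
        rows.append(c)
    return rows

def contrast_matrix(order, t, kind="complete"):
    """Contrast matrix C with {y : C y < 0} for a ranking of t objects.
    `order` lists 0-based object indices from most to least preferred
    (for 'top-q' and 'subset', only the ranked objects are listed)."""
    rows = chain_rows(order, t)
    if kind == "top-q":
        last = order[-1]                              # the q-th ranked object
        for i in sorted(set(range(t)) - set(order)):  # unranked objects lie below it
            c = np.zeros(t)
            c[i], c[last] = 1.0, -1.0
            rows.append(c)
    return np.array(rows)

# The three examples for t = 4: ordering 3 > 1 > 2 > 4 is [2, 0, 1, 3] (0-based)
print(contrast_matrix([2, 0, 1, 3], 4))            # complete ranking pi_1
print(contrast_matrix([2, 0], 4, kind="top-q"))    # top 2 partial ranking pi_2
print(contrast_matrix([2, 0], 4, kind="subset"))   # subset ranking pi_3
```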

Notationally, let

$$\displaystyle{\boldsymbol{A}_{d\times t} = [\boldsymbol{a}_{1}\cdots \boldsymbol{a_{t}}],}$$

\(\boldsymbol{\Psi }_{t\times t}\) be the diagonal matrix with \(diag(\boldsymbol{\Psi }) = (\sigma _{1}^{2},\ldots,\sigma _{t}^{2})\), and all other entries equal to zero, and

$$\displaystyle{\boldsymbol{\theta } =\{ \boldsymbol{A},\boldsymbol{b},\boldsymbol{\Psi }\}}$$

the set of parameters of interest. We shall discuss the maximum likelihood estimation of \(\boldsymbol{\theta }\) based on various types of ranking data via the Monte Carlo Expectation-Maximization (MCEM) algorithm in the next section.

9.2.2 Monte Carlo Expectation-Maximization Algorithm

The EM algorithm is a broadly applicable approach to the computation of maximum likelihood estimates in the presence of missing data, with the advantages of simplicity and stability. It requires one to compute the conditional expectation of the complete-data log-likelihood function given the observed data (E-step) and then to maximize this conditional expectation with respect to the parameters of interest (M-step).

Let \(\boldsymbol{Y }_{n\times t},\boldsymbol{Z}_{n\times d}\) be the matrices of the unobservable response utilities and latent common factors, respectively, with their jth rows corresponding to individual j. Denote by \(\boldsymbol{\Pi }_{n\times t} = [\boldsymbol{\pi }_{1},\ldots,\boldsymbol{\pi }_{n}]^{'}\) the matrix of the observed ranked data. Under an EM setting, we denote by {\(\boldsymbol{Y,Z}\)} the missing data and by \(\boldsymbol{\Pi }\) the observed data.

9.2.2.1 Implementing the E-step via the Gibbs Sampler

Since the complete-data log-likelihood function, apart from a constant, is given by

$$\displaystyle{ \ell(\boldsymbol{\theta }\vert \boldsymbol{Y },\boldsymbol{Z}) = -\frac{n} {2} \sum _{i=1}^{t}\log \sigma _{ i}^{2} -\frac{1} {2}\sum _{i=1}^{t}\sum _{ j=1}^{n}\frac{(y_{ij} -\boldsymbol{z}_{j}'\boldsymbol{a}_{i} - b_{i})^{2}} {\sigma _{i}^{2}}, }$$
(9.19)

the E-step here only involves computation of the conditional expectations of the complete-data sufficient statistics \(\{\boldsymbol{Y }'\boldsymbol{Y },\boldsymbol{Z}'\boldsymbol{Z},\boldsymbol{Z}'\boldsymbol{Y },\boldsymbol{Y }'\mathbf{1},\boldsymbol{Z}'\mathbf{1}\}\) given \(\boldsymbol{\Pi }\) and \(\boldsymbol{\theta }\). This can be done by using the Gibbs sampling algorithm which consists of drawing samples consecutively from the full conditional posterior distributions, as shown below:

  1. Draw \(\boldsymbol{z}_{j}\) from \(f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })\).

  2. Draw \(\boldsymbol{y}_{j}\) from \(f(\boldsymbol{y}_{j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })\), for j = 1, ⋯ , n.

For step 1, making draws from \(f\left (\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta }\right )\) is simple because

$$\displaystyle{f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta }) = f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\theta }),}$$

which is independent of \(\boldsymbol{\pi }_{j}\). Draws of \(\boldsymbol{Z}\) can be made from the conditional distribution

$$\displaystyle{ \boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\theta }\sim N_{d}(\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}(\boldsymbol{y}_{ j} -\boldsymbol{b}),\boldsymbol{I} -\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\boldsymbol{A}'). }$$
(9.20)
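A direct sketch of the draw in (9.20), with A stored as the d × t loading matrix of Sect. 9.2.1 and Psi the diagonal matrix of unique variances (argument names are ours):

```python
import numpy as np

def draw_z_given_y(y_j, A, b, Psi, rng):
    """Draw z_j from (9.20): z_j | y_j, theta ~ N(M (y_j - b), I - M A')
    with M = A (A'A + Psi)^{-1}; A is d x t, Psi is diagonal t x t."""
    M = A @ np.linalg.inv(A.T @ A + Psi)
    mean = M @ (y_j - b)
    cov = np.eye(A.shape[0]) - M @ A.T
    return rng.multivariate_normal(mean, cov)
```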

For step 2, \(\boldsymbol{y}_{j}\) is required to have an ordering consistent with the observed ranking \(\boldsymbol{\pi }_{j}\). Suppose that \(< [1]_{j},\cdots \,,[t]_{j} >\) represents the ordering of the t objects with respect to the complete ranking \(\boldsymbol{\pi }_{j}\), such that \([1]_{j}\) is the most preferred object, \([2]_{j}\) is the second most preferred object, and so on. Define \(y_{[0]_{j}j} = +\infty \) and \(y_{[t+1]_{j}j} = -\infty \).

9.2.2.1.1 Complete Rankings

For the cases with complete rankings, we can draw \(y_{ij}\) sequentially for i = 1, ⋯ , t from

$$\displaystyle{ y_{ij}\vert y_{1j},\ldots,y_{i-1,j},y_{i+1,j},\ldots,y_{tj},\boldsymbol{\pi }_{j},\boldsymbol{z}_{j},\boldsymbol{\theta }\sim N(\boldsymbol{z}_{j}'\boldsymbol{a}_{i} + b_{i},\sigma _{i}^{2}) }$$
(9.21)

with the constraint \(y_{[r-1]_{j}j} > y_{ij} > y_{[r+1]_{j}j}\) for \(\pi_{ij} = r\) (or \([r]_{j} = i\)).

9.2.2.1.2 Top q Partial Rankings

For top q partial rankings, we draw the utilities of the top q objects (i.e., \(\{y_{[1]_{j}j},\ldots,y_{[q]_{j}j}\}\)) as in the complete case and simulate those of the other objects by

$$\displaystyle{ y_{ij} \sim N(\boldsymbol{z}_{j}'\boldsymbol{a}_{i} + b_{i},\sigma _{i}^{2}) }$$
(9.22)

with the constraint \(-\infty < y_{ij} < y_{[q]_{j}j}\) for each object i not ranked in the top q.

9.2.2.1.3 Subset Rankings

For subset rankings, individuals are asked to rank a subset of the t objects only. The ranking of the remaining objects is unknown, and we can simulate their utilities \(\{y_{i'j}\}\) from

$$\displaystyle{ \{y_{i'j}\vert i'\notin \{\text{ranked objects}\}\} \sim N(\boldsymbol{z}_{j}'\boldsymbol{a}_{i'} + b_{i'},\sigma _{i'}^{2}). }$$
(9.23)

The conditional expectations of \(\boldsymbol{Y }'\mathbf{1}\) and \(\boldsymbol{Y }'\boldsymbol{Y }\) can be approximated by the averages, over the random draws, of \(\sum _{j}\boldsymbol{y}_{j}\) and of the product sum \(\sum _{j}\boldsymbol{y}_{j}\boldsymbol{y}_{j}'\), respectively. Finally, conditional expectations of \(\boldsymbol{Z}'\mathbf{1}\), \(\boldsymbol{Z}'\boldsymbol{Z}\), and \(\boldsymbol{Z}'\boldsymbol{Y }\) can be obtained by

$$\displaystyle{\begin{array}{rcl} E[\boldsymbol{Z}'\mathbf{1}\vert \boldsymbol{\Pi },\boldsymbol{\theta }]& =&\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}(E[\boldsymbol{Y }'\mathbf{1}\vert \boldsymbol{\Pi },\boldsymbol{\theta }] - n\boldsymbol{b}), \\ E[\boldsymbol{Z}'\boldsymbol{Z}\vert \boldsymbol{\Pi },\boldsymbol{\theta }]& =&n[\boldsymbol{I} -\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\boldsymbol{A}'] \\ & & + \boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A}\,+\,\boldsymbol{\Psi })^{-1}E[(\boldsymbol{Y }\,-\,\mathbf{1}\boldsymbol{b}')'(\boldsymbol{Y }\,-\,\mathbf{1}\boldsymbol{b}')\vert \boldsymbol{\Pi },\boldsymbol{\theta }](\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\boldsymbol{A}' \\ E[\boldsymbol{Z}'\boldsymbol{Y }\vert \boldsymbol{\Pi },\boldsymbol{\theta }]& =&\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}E[(\boldsymbol{Y }' -\boldsymbol{b}\mathbf{1}')\boldsymbol{Y }\vert \boldsymbol{\Pi },\boldsymbol{\theta }]. \end{array} }$$

9.2.2.2 M-Step

By replacing the complete-data sufficient statistics \(\{\boldsymbol{Y }'\boldsymbol{Y },\boldsymbol{Z}'\boldsymbol{Z},\boldsymbol{Z}'\boldsymbol{Y },\boldsymbol{Y }'\mathbf{1},\boldsymbol{Z}'\mathbf{1}\}\) with their corresponding conditional expectations obtained in E-step, we can compute the maximum likelihood estimate of \(\boldsymbol{\theta }\) by

$$\displaystyle{\left (\begin{array}{c} \hat{\boldsymbol{A}}\\ \hat{\boldsymbol{b} '} \end{array} \right ) = \left [(\boldsymbol{Z}\;\mathbf{1})'(\boldsymbol{Z}\;\mathbf{1})\right ]^{-1}(\boldsymbol{Z}\;\mathbf{1})'\boldsymbol{Y } = \left [\begin{array}{cc} \boldsymbol{Z}'\boldsymbol{Z}&\boldsymbol{Z}'\mathbf{1} \\ \mathbf{1}'\boldsymbol{Z} & \mathbf{1}'\mathbf{1} \end{array} \right ]^{-1}\left (\begin{array}{c} \boldsymbol{Z}'\boldsymbol{Y } \\ \mathbf{1}'\boldsymbol{Y } \end{array} \right )}$$

and

$$\displaystyle\begin{array}{rcl} \hat{\boldsymbol{\Psi }}& =& \frac{1} {n}\ \mathit{diag}\left ((\boldsymbol{Y } -\boldsymbol{Z}\hat{\boldsymbol{A}} -\mathbf{1}\hat{\boldsymbol{b}}')'(\boldsymbol{Y } -\boldsymbol{Z}\hat{\boldsymbol{A}} -\mathbf{1}\hat{\boldsymbol{b}}')\right ) {}\\ & =& \frac{1} {n}\ \mathit{diag}\left (\boldsymbol{Y }'\boldsymbol{Y } - 2\hat{\boldsymbol{A}'}\boldsymbol{Z}'\boldsymbol{Y } - 2\hat{\boldsymbol{b}}\mathbf{1}'\boldsymbol{Y } +\hat{ \boldsymbol{A}'}\boldsymbol{Z}'\boldsymbol{Z}\hat{\boldsymbol{A}} + 2\hat{\boldsymbol{b}}\mathbf{1}'\boldsymbol{Z}\hat{\boldsymbol{A}} + n\hat{\boldsymbol{b}}\hat{\boldsymbol{b}}'\right ). {}\\ \end{array}$$

The new set of \(\boldsymbol{\theta }\) is then used for calculation of the conditional expectation of the sufficient statistics in the E-step and the algorithm is iterated until convergence is attained.
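The M-step is thus available in closed form once the conditional expectations of the sufficient statistics are in hand. A Python sketch of the two updates is given below; the argument names are ours, with EZY denoting \(E[\boldsymbol{Z}'\boldsymbol{Y }\vert \boldsymbol{\Pi },\boldsymbol{\theta }]\), EY1 denoting \(E[\boldsymbol{Y }'\mathbf{1}\vert \boldsymbol{\Pi },\boldsymbol{\theta }]\), and so on.

```python
import numpy as np

def m_step(EYY, EZZ, EZY, EY1, EZ1, n):
    """Closed-form M-step from the conditional expectations of the
    complete-data sufficient statistics (a sketch of the formulas above).
    EYY: t x t, EZZ: d x d, EZY: d x t, EY1: length t, EZ1: length d."""
    d, t = EZZ.shape[0], EYY.shape[0]
    # Regression of Y on (Z, 1): top block gives A-hat (d x t), last row b-hat'
    lhs = np.block([[EZZ, EZ1.reshape(d, 1)],
                    [EZ1.reshape(1, d), np.array([[float(n)]])]])
    rhs = np.vstack([EZY, EY1.reshape(1, t)])
    coef = np.linalg.solve(lhs, rhs)          # (d+1) x t
    A_hat, b_hat = coef[:d, :], coef[d, :]
    # Psi-hat from the diagonal of the residual cross-product matrix
    resid = (EYY - 2 * A_hat.T @ EZY - 2 * np.outer(b_hat, EY1)
             + A_hat.T @ EZZ @ A_hat + 2 * np.outer(b_hat, EZ1 @ A_hat)
             + n * np.outer(b_hat, b_hat))
    Psi_hat = np.diag(np.diag(resid)) / n
    return A_hat, b_hat, Psi_hat
```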

9.2.2.3 Determining Convergence of MCEM via Bridge Sampling

To determine convergence of the EM algorithm we propose to use the bridge sampling criterion discussed by Meng and Wong (1996). The bridge sampling estimate for the likelihood ratio associated with the individual j is given by

$$\displaystyle{\frac{L(\boldsymbol{\theta }^{(s+1)}\vert \boldsymbol{y}_{j},\boldsymbol{z}_{j})} {L(\boldsymbol{\theta }^{(s)}\vert \boldsymbol{y}_{j},\boldsymbol{z}_{j})} = \frac{\sum _{m=1}^{M}\left [\frac{L(\boldsymbol{\theta }^{(s+1)}\vert \boldsymbol{y}_{ j}^{(s,m)},\boldsymbol{z}_{ j}^{(s,m)})} {L(\boldsymbol{\theta }^{(s)}\vert \boldsymbol{y}_{j}^{(s,m)},\boldsymbol{z}_{j}^{(s,m)})} \right ]^{1/2}} {\sum _{m=1}^{M}\left [ \frac{L(\boldsymbol{\theta }^{(s)}\vert \boldsymbol{y}_{j}^{(s+1,m)},\boldsymbol{z}_{j}^{(s+1,m)})} {L(\boldsymbol{\theta }^{(s+1)}\vert \boldsymbol{y}_{j}^{(s+1,m)},\boldsymbol{z}_{j}^{(s+1,m)})}\right ]^{1/2}},}$$

where \(\{\boldsymbol{y}_{j}^{(s,m)},\boldsymbol{z}_{j}^{(s,m)},m = 1,\ldots,M\}\) denote the M Gibbs samples from \(f(\boldsymbol{y}_{j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta }^{(s)})\) and \(f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta }^{(s)})\) with \(\boldsymbol{\theta }^{(s)}\) being the sth iterate of \(\boldsymbol{\theta }\). The estimate for the log-likelihood ratio of two consecutive iterates is then given by

$$\displaystyle{\hat{h}(\boldsymbol{\theta }^{(s+1)},\boldsymbol{\theta }^{(s)}) =\sum _{ j=1}^{n}\ \log \frac{L(\boldsymbol{\theta }^{(s+1)}\vert \boldsymbol{y}_{ j},\boldsymbol{z}_{j})} {L(\boldsymbol{\theta }^{(s)}\vert \boldsymbol{y}_{j},\boldsymbol{z}_{j})}.}$$

We plot \(\hat{h}(\boldsymbol{\theta }^{(s+1)},\boldsymbol{\theta }^{(s)})\) against s to determine the convergence of the MCEM algorithm. A curve converging to zero indicates convergence because the EM algorithm should increase the likelihood at each step.

9.2.3 Simulation

We adopt the parameter values listed in Table 9.4, used by Brady (1989), to study the MCEM algorithm for complete and incomplete rankings. Using the factor model and these parameter values, thirty data sets with n = 1,000 and utility vectors of t = 7 objects were simulated. Three types of ranked data were obtained from each data set. The first type corresponds to the complete rankings obtained by ranking the utilities of the 7 objects. The second type corresponds to the top 3 partial rankings constructed from the three largest utilities, while the third type corresponds to the subset rankings of 3 out of the 7 objects chosen according to the incomplete block design shown in Table 9.5.

Table 9.4 Parameter values of a 2-factor model for seven objects
Table 9.5 Incomplete block design for subset rankings

In our simulation studies, the Gibbs sampler and the MCEM algorithm both converge fairly fast. The computation time required for each MC E-step is shorter for subset rankings than for complete rankings because fewer truncated normal variates need to be drawn. For each E-step, we discarded the first 100 burn-in cycles and then selected one draw systematically from every fifth cycle until a total of 40 draws was reached. The MCEM algorithm converged within 10 iterations for all simulated data sets. The means of the 30 sets of estimates for the complete rankings, top 3 partial rankings, and 3 out of 7 subset rankings, together with their biases and standard errors, are shown in Table 9.6. The small biases and standard errors show that the estimation method remains accurate and reliable even for incomplete rankings.

Intuitively, since more information is provided when complete rankings are observed, the estimation of the factor model should perform better than with partial or subset rankings. This is indeed the case as can be seen from Table 9.6. Larger biases and standard errors are obtained for the case of 3 out of 7 subset rankings.

Table 9.6 Simulation results of MCEM algorithm

9.2.4 Factor Score Estimation

So far we have been interested mainly in problems concerning the parameters in factor models and their estimation. Indeed, this frequently represents the main objective of factor analysis since the loading coefficients, to a large extent, determine the reduction of observed variables into a small number of common factors in terms of meaningful phenomena. While these problems constitute the primary interest of factor analysis, it is sometimes desirable to go one step further and to estimate the scores of an individual on the common factors in terms of the realizations of the variates for that individual. Factor scores provide information concerning the relative position of each individual corresponding to each factor whereas the loadings generally remain constant for all individuals. We therefore turn our attention to the problem of factor score estimation.

Under the normality assumption, estimates of the factor scores can be obtained via the regression approach and the generalized least squares approach, which minimize, respectively, the variation of the estimator and the sum of squared standardized residuals (see Lawley and Maxwell 1971). However, these two approaches can only be used when the utilities \(\boldsymbol{Y }\) are observed. Shi and Lee (1997a) developed a Bayesian approach for estimating the factor scores in factor models with polytomous data. By constructing an appropriate posterior distribution, they proposed using the posterior mean as a factor score estimate. Their method involves the computation of multiple integrals, which they handled by Monte Carlo methods. To avoid tedious computation, Shi and Lee (1997b) applied the EM algorithm to obtain a Bayesian estimate of the factor scores with polytomous variables. In this section, we estimate the factor scores from ranked data via the MCEM algorithm discussed in Sect. 9.2.2.

9.2.4.1 Factor Score Estimation Using the MCEM Algorithm

The factor score \(\boldsymbol{z}_{j}\) can be estimated by the posterior mode of the posterior distribution \(\boldsymbol{z}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{\theta }\). Hence, the MCEM algorithm can be used to find the estimate by viewing the \(\boldsymbol{z}_{j}\)’s as parameters in the complete-data log-likelihood function in (9.19) and the resulting maximum likelihood estimate of \(\boldsymbol{z}_{j}\) will then be the posterior mode estimate. The MCEM iteration can be simplified as follows: given an initial value \(\boldsymbol{z}_{j}^{(0)}\) and the estimate \(\boldsymbol{\theta }\), at the (s + 1)th MCEM iteration,

E-step:

Find \(E(\boldsymbol{y}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{z}_{j}^{(s)},\boldsymbol{\theta })\) via Gibbs sampler.

M-step:

Update \(\boldsymbol{z}_{j}^{(s)}\) to \(\boldsymbol{z}_{j}^{(s+1)}\) by

$$\displaystyle{ \boldsymbol{z}_{j}^{(s+1)} = \boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\left[E(\boldsymbol{y}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{z}_{j}^{(s)},\boldsymbol{\theta }) -\boldsymbol{b}\right]. }$$
(9.24)

The Monte Carlo E-step is exactly the same as finding the conditional expectation of \(\boldsymbol{y}_{j}\), while the M-step improves the estimate of \(\boldsymbol{z}_{j}\) in a single step only. This iterative procedure will converge to the appropriate posterior mode, which will be taken as the estimate of \(\boldsymbol{z}_{j}\). We propose to stop the MCEM iteration when the likelihood values at \(\boldsymbol{z}_{j}^{(s)}\) and \(\boldsymbol{z}_{j}^{(s+1)}\) are very close to each other. A simple stopping criterion is based on the following expression:

$$\displaystyle{ l(\boldsymbol{z}^{(s)},\boldsymbol{z}^{(s+1)}) =\log \frac{\exp \left (\frac{1} {2}\sum _{j}{\boldsymbol{z}_{j}^{(s)}}'\boldsymbol{z}_{j}^{(s)}\right )} {\exp \left (\frac{1} {2}\sum _{j}{\boldsymbol{z}_{j}^{(s+1)}}'\boldsymbol{z}_{j}^{(s+1)}\right )} = \frac{1} {2}\sum _{j}\left ({\boldsymbol{z}_{j}^{(s)}}'\boldsymbol{z}_{j}^{(s)} -{\boldsymbol{z}_{j}^{(s+1)}}'\boldsymbol{z}_{j}^{(s+1)}\right ). }$$
(9.25)

Convergence of the MCEM iteration is attained when \(l(\boldsymbol{z}^{(s)},\boldsymbol{z}^{(s+1)})\) becomes stationary at zero level.
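A minimal sketch of the update (9.24) and the stopping statistic (9.25) is given below; the conditional expectation \(E(\boldsymbol{y}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{z}_{j}^{(s)},\boldsymbol{\theta })\) is assumed to have been approximated beforehand by a Gibbs run as in the E-step above.

```python
import numpy as np

def update_factor_score(Ey_given_pi_z, A, b, Psi):
    """One M-step update (9.24) for the factor score of individual j:
    z^{(s+1)} = A (A'A + Psi)^{-1} (E[y_j | pi_j, z^{(s)}, theta] - b)."""
    return A @ np.linalg.inv(A.T @ A + Psi) @ (Ey_given_pi_z - b)

def stopping_stat(Z_s, Z_s1):
    """Stopping statistic (9.25) for the n x d arrays of factor scores at two
    consecutive iterations; convergence once it is stationary at zero."""
    return 0.5 * (np.sum(Z_s ** 2) - np.sum(Z_s1 ** 2))
```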

Note that it is also possible to estimate the factor scores by the posterior mean based on the samples generated from the Gibbs sampler. The posterior mode and the posterior mean are usually very close and, moreover, the covariance matrix of the posterior mode can be obtained as a by-product of the MCEM factor score estimation.

9.2.4.2 The Covariance Matrix of the Factor Score Estimates

To provide more insight about the estimates and the impact of lost information from continuous to ranking measurements, it is desirable to derive the covariance matrix of the posterior distribution \(f(\boldsymbol{z}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{\theta })\), which is given by the negative inverse of the Hessian matrix of \(\log [f(\boldsymbol{z}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{\theta })]\). A convenient way to evaluate the Hessian matrix is via the following expression:

$$\displaystyle\begin{array}{rcl} -\frac{\partial ^{2}\log [f(\boldsymbol{z}_{j}\vert \boldsymbol{\pi }_{j},\boldsymbol{\theta })]} {\partial \boldsymbol{z}_{j}\partial \boldsymbol{z}_{j}'} & =& -\int \frac{\partial ^{2}\log [f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\theta })]} {\partial \boldsymbol{z}_{j}\partial \boldsymbol{z}_{j}'} f(\boldsymbol{y}_{j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })d\boldsymbol{y}_{j} \\ & & -\text{Var}\left \{-\frac{\partial \log [f(\boldsymbol{z}_{j}\vert \boldsymbol{y}_{j},\boldsymbol{\theta })]} {\partial \boldsymbol{z}_{j}} \right \} {}\end{array}$$
(9.26)

where the variance is with respect to \(f(\boldsymbol{y}_{j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })\) (Tanner 1997).

It can be shown that the covariance matrix of the factor score estimate \(\hat{\boldsymbol{z}}_{j}\) is equal to

$$\displaystyle{ \left [(\boldsymbol{I} -\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\boldsymbol{A}')^{-1} -\boldsymbol{W}\text{ Var}(\boldsymbol{y}_{ j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })\boldsymbol{W}'\right ]^{-1}\mid _{ \boldsymbol{z}_{j}=\hat{\boldsymbol{z}}_{j}}, }$$
(9.27)

where \(\boldsymbol{W} = (\boldsymbol{I} -\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\boldsymbol{A}')^{-1}\boldsymbol{A}(\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })^{-1}\) and \(\text{Var}(\boldsymbol{y}_{j}\vert \boldsymbol{z}_{j},\boldsymbol{\pi }_{j},\boldsymbol{\theta })\) can be approximated by the Gibbs sample variance, a by-product of the MCEM factor score estimation.

9.2.5 Application to the Job Selection Ranking Data

We now consider the marketing survey on people’s attitude toward career and living style in three main cities in Mainland China – Beijing, Shanghai, and Guangzhou. Five hundred responses from each city were obtained. A question regarding the behavior, conditions, and criteria for job selection of the 500 respondents in Guangzhou will be discussed here. Respondents were asked to rank the three most important criteria on choosing a job among the following 13 criteria: 1. favorable company reputation; 2. large company scale; 3. more promotion opportunities; 4. more training opportunities; 5. comfortable working environment; 6. high income; 7. stable working hours; 8. fringe benefits; 9. well matched with employees’ profession or talent; 10. short distance between working place and home; 11. challenging; 12. corporate structure of the company; and 13. low working pressure.

This is a typical top 3 out of 13 objects partial ranking problem. The values “1”, “2,” and “3” were assigned to the most, second, and third important criteria for job selection, respectively. Regarding the other less important items, it is common to define the midrank, i.e., \(\frac{1} {t-q}[(q + 1) + \cdots + t]\), as their rank. In this case the midrank is \(\frac{1} {10}[4 + \cdots + 13] = 8.5\). Table 9.7 provides some preliminary statistics, including sample mean and sample variance for each of the 13 criteria based on these 500 incomplete rankings with the midrank imputations.
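As a concrete check of the midrank calculation, the following sketch converts a top-q partial ranking into a complete ranking with midrank imputation; the function name and the example ranking are illustrative and not taken from the survey data.

```python
def impute_midrank(top_items, t):
    """Convert a top-q partial ranking of t items into a complete ranking.

    top_items : item labels (1-based), ordered from most to least preferred
    Returns a dict mapping each item label to its (mid)rank.
    """
    q = len(top_items)
    midrank = sum(range(q + 1, t + 1)) / (t - q)   # e.g. (4 + ... + 13) / 10 = 8.5 for t = 13, q = 3
    ranks = {item: r for r, item in enumerate(top_items, start=1)}
    for item in range(1, t + 1):
        ranks.setdefault(item, midrank)            # all unranked items share the midrank
    return ranks

# Example: a respondent ranked criteria 6, 9, and 3 as the top three out of 13.
print(impute_midrank([6, 9, 3], t=13))
```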

Table 9.7 Summary statistics of job selection ranking data

The factor model is assumed, and the analysis is carried out with the MCEM algorithm. Initial values of \(\boldsymbol{\theta }\) were obtained by principal factor analysis, and a standardized rank score, \(\frac{\pi _{ij}} {\sqrt{(t^{2 } -1)/12}} \times \frac{1} {\sqrt{\sum _{i } 1/\sigma _{i }^{2}}}\), was used as the starting value of \(y_{ij}\) in the Gibbs sampler. The choice of standardized rank score was motivated by the fact that the rankings of the \(y_{ij}\)'s must be consistent with the observed ranking \(\boldsymbol{\pi }_{j}\).

9.2.5.1 Model Estimation

Factor models with the number of factors ranging from zero to five were estimated. The Gibbs sampler (in the MC E-step) converged quite rapidly. We discarded the first 100 burn-in cycles and selected one \(\boldsymbol{y}_{j}\) systematically from every fifth cycle afterward until a total of 40 draws was reached.
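For clarity, the retention scheme just described amounts to the following arithmetic; whether the first retained cycle is the 101st or the 105th is not specified in the text, and the sketch below simply assumes the latter.

```python
burn_in, thin, n_keep = 100, 5, 40
total_cycles = burn_in + thin * n_keep                         # 300 Gibbs cycles per E-step
kept_cycles = list(range(burn_in + thin, total_cycles + 1, thin))
# kept_cycles = [105, 110, ..., 300]: 40 systematically selected draws of y_j
```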

We used the bridge sampling criterion discussed in Sect. 9.2.2.3 to detect the convergence of the MCEM algorithm. Figure 9.2 shows the plot of the log-likelihood ratio against the number of iterations for the 3-factor model. The MCEM algorithm converged after 20 iterations.

Fig. 9.2 Bridge sampling criteria

The Akaike information criterion (AIC) was used to determine the appropriate number of factors. The observed likelihood, which can be written as a product of multivariate normal probabilities over rectangular regions, was approximated by the Geweke-Hajivassiliou-Keane (GHK) method, which has been shown to be unbiased and highly reliable. Table 9.8 exhibits the AIC values approximated by the GHK method and the proportions of variation explained by the d-factor models with d = 0, 1, 2, 3, 4, 5. It can be seen that the “best” model according to AIC is the 3-factor model, and the proportion of variation it explains is 41%.
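For reference, a minimal, self-contained sketch of the GHK simulator for a rectangle probability \(P(\text{lower} < X < \text{upper})\) with \(X \sim N(\mu,\Sigma)\) is shown below. Casting the ranking probability into this rectangular form (by differencing the latent utilities according to the observed ordering) is not shown, and all function and argument names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(mu, Sigma, lower, upper, n_draws=1000, seed=None):
    """GHK simulator for P(lower < X < upper), where X ~ N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    mu, lower, upper = map(np.asarray, (mu, lower, upper))
    L = np.linalg.cholesky(Sigma)               # X = mu + L eta, eta ~ N(0, I)
    t = len(mu)
    weights = np.empty(n_draws)
    for r in range(n_draws):
        eta = np.zeros(t)
        w = 1.0
        for i in range(t):
            m = mu[i] + L[i, :i] @ eta[:i]      # conditional mean given earlier components
            a = norm.cdf((lower[i] - m) / L[i, i])
            b = norm.cdf((upper[i] - m) / L[i, i])
            w *= max(b - a, 1e-300)             # sequential conditional probability
            u = rng.uniform(a, b)               # truncated-normal draw via inverse CDF
            eta[i] = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))
        weights[r] = w
    return weights.mean()
```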

Table 9.8 AIC values and proportions of variance explained by the d-factor model with d = 0, 1, 2, 3, 4, 5

To examine the goodness of fit of the 3-factor model, we compare the top-choice probability for each of the 13 objects under the fitted model with the corresponding observed proportion. Here, the top-choice probabilities are estimated using the GHK method. Figure 9.3 plots the estimated top-choice probabilities against the observed proportions. The points lie close to a straight line, indicating that the 3-factor model fits the data reasonably well.
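The quantity being checked here can also be approximated with a crude frequency simulator instead of the GHK method used in the text: draw utilities from the marginal distribution implied by the fitted factor model, assumed below to be \(N(\boldsymbol{b},\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Psi })\) under the same shape convention as in (9.27), and count how often each item receives the largest utility. All names are illustrative.

```python
import numpy as np

def top_choice_probabilities(b, A, Psi, n_sim=100_000, seed=None):
    """Crude Monte Carlo estimate of each item's top-choice probability,
    assuming marginal utilities y ~ N(b, A'A + Psi)."""
    rng = np.random.default_rng(seed)
    y = rng.multivariate_normal(np.asarray(b), A.T @ A + Psi, size=n_sim)
    top = y.argmax(axis=1)                      # the item with the largest utility is ranked first
    return np.bincount(top, minlength=len(b)) / n_sim
```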

Fig. 9.3 Estimated top-choice probabilities vs observed proportions

Table 9.9 Parameter estimates of the 3-factor model

Estimated values of the factor loadings were obtained after a varimax rotation. The loadings, expressed as correlations between the factors and the utilities, are summarized in Table 9.9 together with the estimated values of \(\boldsymbol{b}\) and \(\boldsymbol{\sigma }^{2}\). The first factor can be regarded as a measure of career prospects: utilization of one’s talent and job aspiration are the major concerns in this factor. The second factor represents an undemanding job nature: short distance between working place and home, stable working hours, and low working pressure all have high loadings on this factor. The third factor represents a contrast between the scale of the company and the salary: a large company offering lower income can be more attractive than a small company offering higher income.
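The varimax rotation itself can be carried out with the standard Kaiser algorithm; a short SVD-based sketch (not the authors' code) for rotating a t × d loading matrix is given below.

```python
import numpy as np

def varimax(Lambda, tol=1e-6, max_iter=100):
    """Varimax rotation of a t x d loading matrix (standard SVD-based algorithm)."""
    t, d = Lambda.shape
    R = np.eye(d)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lambda @ R
        # Gradient of the varimax criterion with respect to the rotation matrix
        G = Lambda.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / t)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < crit_old * (1 + tol):      # stop when the criterion no longer improves
            break
        crit_old = s.sum()
    return Lambda @ R, R                        # rotated loadings and rotation matrix
```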

Also, the mean vector \(\boldsymbol{b}\) reflects the overall importance of the 13 criteria. Note that the ordering of \(\hat{b}_{1},\ldots,\hat{b}_{t}\) is consistent with the average ranks of the 500 rankings. Criterion 6 has the largest mean value, implying that salary is the respondents’ major concern in choosing a job, while factors regarding the company itself are the least important, as \(\hat{b}_{1}\), \(\hat{b}_{4}\), and \(\hat{b}_{12}\) take large negative values.

Fig. 9.4 Stopping criterion

9.2.5.2 Factor Score Estimation

To estimate the factor scores of the fitted 3-factor model, we applied the MCEM method. The Gibbs sampler in the E-step converged quite rapidly: we discarded the first 100 burn-in cycles and selected one \(\boldsymbol{y}_{j}\) systematically from every fifth cycle afterward until 40 draws were reached, for a total of 300 cycles per E-step. We again applied the stopping criterion to detect the convergence of the MCEM algorithm. Figure 9.4 plots \(l(\boldsymbol{z}^{(s)},\boldsymbol{z}^{(s+1)})\) against the number of iterations; according to the plot, the MCEM algorithm converged after 20 iterations.

Fig. 9.5 Factor scores vs age and education level

It is often of interest to study the relationship between the factor scores and the covariates of each individual. In this survey, age group was recorded in nine 5-year bands covering ages 15 to 59 ((1) 15–19, (2) 20–24, \(\ldots\), (9) 55–59), while education level was recorded in five categories: primary (1), junior secondary (2), senior secondary (3), postsecondary (4), and university degree or above (5). Figure 9.5 plots the means of the factor score estimates for the different age groups and education levels. From the plot of factor scores by age, a decreasing trend in the factor 1 scores and an increasing trend in the factor 2 scores are observed, whereas from the plot of factor scores by education, an increasing trend in the factor 1 scores and a decreasing trend in the factor 2 scores are observed. For the factor 3 scores, only a slightly increasing trend in education is observed. These observations suggest that a young, well-educated person places more weight on career prospects, whereas an older, less educated person tends to seek a job of an undemanding nature. Finally, a better educated person is more willing to work in a large company offering a lower salary.
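The means plotted in Fig. 9.5 are simple group means of the factor score estimates by age group and by education level. A sketch of the computation is shown below; since the survey data are not reproduced here, simulated stand-in values are used and all column names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-ins for the 500 estimated factor scores and the two covariates
df = pd.DataFrame(rng.standard_normal((500, 3)),
                  columns=["factor1", "factor2", "factor3"])
df["age_group"] = rng.integers(1, 10, size=500)   # bands (1) 15-19, ..., (9) 55-59
df["education"] = rng.integers(1, 6, size=500)    # primary (1), ..., degree or above (5)

mean_by_age = df.groupby("age_group")[["factor1", "factor2", "factor3"]].mean()
mean_by_edu = df.groupby("education")[["factor1", "factor2", "factor3"]].mean()
print(mean_by_age)    # the trends in Fig. 9.5 correspond to group means of this kind
```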

To demonstrate the performance of the factor score estimation, Table 9.10 provides descriptive statistics on the covariance matrices \(\boldsymbol{S}\) of \(\hat{\boldsymbol{z}}_{j}\), \(j = 1,\ldots,500\). The small standard errors indicate that the estimation method is accurate and reliable, and they also suggest that the impact of the unobservable information is not serious in this case.

Table 9.10 Standard error of factor scores