1 Introduction

In item response theory (IRT), inferences are made about the ordering of subjects’ latent scores on the basis of their responses to multiple test items. Many IRT models have been proposed to describe test data, and these models differ from each other with respect to the restrictions they impose on the data. For example, the Rasch (1960) model allows for consistent estimates of the differences between subjects’ latent scores, as the sum of the item scores is a sufficient statistic for the latent variable (Andersen 1973). This enables different subjects to be compared, irrespective of the particular set of items that were administered to them. But the restrictive parametric requirements the Rasch model imposes on the data also mean that the model might not fit. A nonparametric model like Mokken’s (1971) monotone homogeneity model is looser in terms of the requirements it imposes on the data, making it applicable to a wider variety of tests. Unlike the Rasch model, however, it does not allow for the estimation of the differences between subjects, but allows for ordinal inferences only. More specifically, the Mokken model implies a stochastic ordering of subjects’ latent scores on the basis of the sum of the item scores (Grayson 1988; Huynh 1994; Ünlü 2008). This measurement property is called the stochastic ordering of the latent trait by the sum score (SOL; Hemker, Sijtsma, Molenaar, & Junker, 1996, 1997), which means that a subject with a higher sum score is also expected to have a higher latent score than a subject with a lower sum score. When a test is used with the goal of obtaining an ordering of subjects on the basis of their sum scores, both the Rasch model and the Mokken model are sufficient; but for many tests the Rasch model does not fit the data, and the nonparametric Mokken model might be more appropriate.

The Rasch (1960) model and Mokken’s (1971) monotone homogeneity model are examples of IRT models for dichotomously scored items, where the sum score corresponds to the total number correct for an ability test or the number of times a subject agrees with the presented statement of an attitude test. For many test applications, however, the responses to the items may be assigned more than two scores. For example, the performance on a complex problem may be assigned credits for each step correctly taken in solving the problem, and attitudes are often graded on a Likert scale containing more than two ordered options to choose from. The IRT models for items with more than two ordered scores are called polytomous IRT models.

Masters’ (1982) partial credit model (PCM) is a polytomous IRT model which, like the Rasch (1960) model, has the sum score as a sufficient statistic for the latent scores. However, the PCM imposes constraints on the data that are far more restrictive than necessary for the SOL property to hold. So, if the PCM does not fit the data, the SOL property might still hold. As with the dichotomous IRT models, it is tempting to think that a nonparametric version of the PCM is sufficient for the SOL property. Hemker et al. (1996, 1997) defined such a nonparametric PCM, and showed that Samejima’s (1969) graded response model and Muraki’s (1992) generalized PCM are special cases of this nonparametric PCM. However, they also found that none of these models implies the SOL property. In fact, they found that, except for the PCM and special cases thereof, none of the popular polytomous IRT models implies the SOL property. Hence, there seems to be a mismatch between the polytomous IRT models and the SOL property.

One way to resolve this mismatch is to define a weaker form of SOL that is implied by the polytomous IRT models. For example, Van der Ark and Bergsma (2010) defined weak SOL for the dichotomized sum scores, and Scheiblechner (2007) defined a monotone likelihood ordering on the basis of concordant and discordant item scores. The weak SOL property subdivides the sum scores into two groups, with sum scores higher than a particular sum score and sum scores lower than that sum score, respectively. These weaker forms of SOL imply that the group with the higher scores is also expected to have (on average) a higher latent value than the group with the lower scores. As such, the weak SOL property justifies the polytomous IRT models for the ordinal measurement of subjects, whereby the dichotomized sum scores yield a stochastic ordering. However, these weaker forms of SOL do not imply a stochastic ordering of subjects’ latent values for each individual sum score; and, as such, they do not justify the individual sum scores as practical measures by which subjects’ latent values can be ordered. In contrast to the weaker forms of SOL, the SOL property justifies the use of the sum score for the ordinal measurement of subjects, and it allows the sum score to be used as a direct scoring rule for actually performing such measurement.

In this paper, the gap that exists between the polytomous IRT models and the SOL property is bridged by proposing an additional constraint to the nonparametric PCM. It will be shown that the additional constraint is sufficient for the monotone likelihood ratio property (MLR; Lehmann 1959; Hemker et al., 1996, 1997), which implies the SOL property. The nonparametric PCM with the additional constraint is referred to as the isotonic PCM. The isotonic PCM justifies the use of the sum score as a scoring rule for ordinal measurement, without having to assume any parametric distributions. Also, observable properties are derived from the isotonic PCM, allowing the model to be empirically tested. In a simulation study, the merits of the Gibbs sampling algorithm for assessing the model-data fit are investigated for tests consisting of a small number of items.

2 Theory

Let $A$ denote the set of subjects taking the test $J$. The test is assumed to be a finite and fixed set of items. Also, let $X_{ai}$ denote the item score variable for subject $a\in A$ on item $i\in J$. The item scores are assumed to be ordinal and have realizations $x_{ai}\in\{0,\dots,m_i\}$. Here, $m_i$ denotes the maximum score assigned to item $i$, which is allowed to vary across different items. Notice that for $m_i=1$, item $i$ is dichotomously scored. The probability of the score $X_{ai}=x_{ai}$ is denoted by $P_x(a,i)$, which depends on both the subject $a$ and the item $i$. It is assumed that the $m_i+1$ probabilities sum to unity for each item–subject pair, and that all probabilities are positive.

In IRT, a latent variable $\Theta$ is proposed to explain the associations that exist between the item scores on the test. Two assumptions lie at the heart of almost all IRT models. First, each subject $a\in A$ is characterized by a value $\theta_a$, where the assumption of unidimensionality (UD) states that these values are located on the real line. In our notation, the subject $a$ is interpreted as an individual subject. However, $a$ can also be considered a subject from an equivalence class of subjects, all with the same latent value $\theta_a$. For example, if $\Theta$ is assumed to be discrete, $\theta_a$ corresponds to a homogeneous latent class of subjects (e.g., Lazarsfeld & Henry, 1968). Usually, IRT models are expressed by a function that is defined across $\Theta$ and relates the latent variable to each item score, allowing the items to be characterized by a small number of parameters, and also allowing inferences to be made about the distribution of the latent variable. However, to emphasize that our goal is to attain an ordering of subjects, we instead focus here on the probabilities $P_x(a,i)$ defined for the set of subjects $A$.

Second, it is assumed that the item scores are locally (i.e., conditional on $\Theta=\theta_a$) independent (LI), which means that the joint score probability $P(X_{a1}=x_{a1},X_{a2}=x_{a2},\dots)$ equals the product of the corresponding marginals $P_x(a,i)$ for each $a\in A$. Together, the assumptions UD and LI allow the latent variable to be interpreted as the variable underlying or explaining the item scores.

2.1 Partial Credit Models

Let the local odds corresponding to item score $X_{ai}=x_{ai}$ be defined as

$$ O_x(a,i)=\frac{P_x(a,i)}{P_{x-1}(a,i)} $$

(e.g., Agresti 1990; Douglas, Fienberg, Lee, Sampson, & Whitaker, 1991), where $O_x(a,i)=1$ for $x_{ai}=0$, by definition. Assuming UD and LI, Masters’ (1982) PCM can be defined for $x_{ai}>0$ as

$$ O_x(a,i)= \exp(\theta_a- \delta_{ix}), $$
(1)

where for each item score $\delta_{ix}$ denotes the corresponding difficulty parameter. Notice that in Equation (1) the local odds are exponentially increasing in $\Theta$. Hemker (1996) and Hemker et al. (1996, 1997) defined the nonparametric PCM, where Equation (1) was relaxed by assuming the local odds to be nondecreasing in $\Theta$. The latter assumption was also proposed by Holland and Rosenbaum (1986) under the name of latent total positivity 2 (LTP2; cf. Karlin 1968). Formally, the assumption of LTP2 states that, for all $x_{ai}$ and $x_{bi}$ (where $i\in J$ and $a,b\in A$), if $\theta_a<\theta_b$ then

$$ O_x(a,i)\le O_x(b,i). $$
(2)

Special cases of the nonparametric PCM include Muraki’s (1992) generalized PCM, and Samejima’s (1969) graded response model (Hemker et al. 1997, Theorem 3). However, none of these three models imply the SOL property (Hemker et al., 1996, 1997).
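Equations (1) and (2) can be made concrete with a small numerical sketch. The Python fragment below (with hypothetical difficulty values chosen only for illustration) computes the PCM category probabilities implied by the local odds, and shows that each local odds is increasing in $\theta$, as LTP2 requires.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_x(a, i) for Masters' PCM.

    `deltas[x-1]` plays the role of the difficulty parameter delta_{ix},
    so the item has ordered scores 0, ..., m_i with m_i = len(deltas).
    Since O_x = P_x / P_{x-1} = exp(theta - delta_{ix}) (Equation (1)),
    P_x is proportional to the running product of the local odds.
    """
    weights = [1.0]  # x = 0: empty product of odds (O_0 = 1 by definition)
    for delta in deltas:
        weights.append(weights[-1] * math.exp(theta - delta))
    total = sum(weights)  # the m_i + 1 probabilities must sum to unity
    return [w / total for w in weights]

def local_odds(theta, deltas):
    """Local odds O_x = P_x / P_{x-1} for x = 1, ..., m_i."""
    p = pcm_probs(theta, deltas)
    return [p[x] / p[x - 1] for x in range(1, len(p))]
```

For any two latent values $\theta_a<\theta_b$, every local odds of the lower subject is smaller, which is exactly the LTP2 ordering of Equation (2); when the hypothetical deltas are increasing in $x$, the odds are also descending in $x$.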

In order to obtain a polytomous IRT model that does imply the SOL property, one additional assumption is made: the assumption of weak instrumental independence (WI; cf. Scheiblechner 1999). Scheiblechner (1999) introduced weak instrumental independence as an axiom of the isotonic probabilistic modeling framework in order to obtain decomposable scales for the subjects, the items, and the instrument. In another context, Samejima (1972) investigated the orderliness property for graded responses, which corresponds to assumption WI. The difference between the model presented here, on the one hand, and Scheiblechner’s models and the graded response models, on the other, is that in this paper the focus is on the local odds, whereas both Scheiblechner (1995, 1999) and Samejima (1969, 1972) consider cumulative probabilities instead. Neither Scheiblechner’s isotonic models nor the graded response models imply SOL. Here, assumption WI states that for all $1+x_{ai}=y_{ai}\le m_i$ (where $a\in A$ and $i\in J$),

$$ O_x(a,i)\ge O_{y}(a,i). $$
(3)

In words, assumption WI restricts the different local odds pertaining to the same item to be ordered the same for all subjects, in descending order of the item scores. In terms of Scheiblechner’s models, the nonparametric model with assumption WI is isotonic with respect to $\Theta$ for both subjects and item scores. The model that is defined by the assumptions of UD, LI, LTP2 (Equation (2)), and WI (Equation (3)) is therefore called the isotonic PCM. Strictly speaking, the PCM (Equation (1)) does not imply WI, as the difficulty parameters $\delta_{ix}$ are formally not restricted to be increasing in $x$. Notice also that for dichotomously scored items (i.e., $m_i=1$) WI is implied by definition, as only a single odds exists. Next, the MLR property is discussed.

Let the sum score be denoted by $X_{a+}$ and let $P(X_{a+}=x_{a+})$ denote the probability of subject $a$ obtaining a sum score equal to $x_{a+}$. The MLR property states that, for all $1+x_{a+}=y_{a+}$ (where $a,b\in A$), if $\theta_a<\theta_b$ then

$$ P(X_{a+}=x_{a+})P(X_{b+}=y_{a+})\ge P(X_{a+}=y_{a+})P(X_{b+}=x_{a+}). $$
(4)

Let $\boldsymbol{x}=(x_1,x_2,\dots)$ and $\boldsymbol{y}=(y_1,y_2,\dots)$ be two observed vectors of the item scores, for which $1+\sum\boldsymbol{x}=\sum\boldsymbol{y}$. Assuming LI, the probability of a sum score $X_{a+}=x_{a+}$ can be expressed as

$$ P(X_{a+}=x_{a+})=\prod_{i\in J}P_0(a,i)\sum_{\boldsymbol{x}:\,\sum\boldsymbol{x}=x_{a+}}\ \prod_{i\in J}Z_x(a,i) $$
(5)

(cf. Huynh 1994), with

$$ Z_x(a,i)=\prod_{z=0}^x O_z(a,i). $$
(6)

Next, inserting Equation (5) in Equation (4) and canceling the common factors $\prod_{i\in J}P_0(a,i)\prod_{i\in J}P_0(b,i)$ on both sides yields

$$ \sum_{\boldsymbol{x}}\sum_{\boldsymbol{y}}\biggl(\,\prod_{i\in J}Z_x(a,i)Z_y(b,i)-\prod_{i\in J}Z_x(b,i)Z_y(a,i)\biggr)\ge 0, $$
(7)

where the summations run across all patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ for which $\sum\boldsymbol{x}=x_{a+}$ and $\sum\boldsymbol{y}=y_{a+}$.

The MLR property is satisfied if for all sum scores $1+x_{a+}=y_{a+}$ (where $a,b\in A$) the left-hand side of Equation (7) is nonnegative.
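As an illustration of Equations (5) and (6), the following sketch (Python; the PCM odds are used only as a convenient example of odds satisfying the model assumptions) computes the sum-score distribution from the products $Z_x(a,i)$, so that the MLR inequality can be inspected numerically for a pair of latent values.

```python
import math
from itertools import product

def sum_score_dist(theta, items):
    """Sum-score distribution P(X_{a+} = s) via Equations (5) and (6).

    `items` is a list of difficulty vectors; the local odds are taken to
    be O_x = exp(theta - delta_{ix}) (the PCM, Equation (1)), purely as
    an example of odds that are nondecreasing in theta.
    """
    Z, P0 = [], []
    for deltas in items:
        z = [1.0]  # Z_0 = O_0 = 1
        for delta in deltas:
            z.append(z[-1] * math.exp(theta - delta))  # Z_x = Z_{x-1} O_x
        Z.append(z)
        P0.append(1.0 / sum(z))  # P_0(a, i) makes the categories sum to one
    dist = [0.0] * (sum(len(d) for d in items) + 1)
    # Equation (5): sum the products of Z over all patterns with sum score s
    for x in product(*(range(len(z)) for z in Z)):
        dist[sum(x)] += math.prod(Z[i][xi] for i, xi in enumerate(x))
    c = math.prod(P0)
    return [c * w for w in dist]
```

Because the PCM implies MLR, the likelihood ratios $P(X_+=s+1)/P(X_+=s)$ computed from this function are nondecreasing when moving from a lower to a higher $\theta$.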

Hemker et al. (1996) showed that MLR implies (i.e., is sufficient for) the SOL property, but the MLR property is more general than the PCM. With ‘⇒’ denoting a logical implication, this means that

$$\mbox{PCM}\quad \Rightarrow\quad \mbox{MLR}\quad \Rightarrow\quad \mbox{SOL}, $$

but the reverse relationships do not hold. Both the PCM and the isotonic PCM are special cases of the nonparametric PCM, as they impose additional restrictions on top of the UD, LI, and LTP2 assumptions, but the PCM and the isotonic PCM do not logically imply one another. (The PCM implies the isotonic PCM only if, for each item, $\delta_{i1}\le\delta_{i2}\le\cdots\le\delta_{im_i}$.) Hence,

$$\mbox{PCM}\quad \Rightarrow\quad \mbox{nonparametric PCM} $$

and

$$\mbox{isotonic PCM}\quad \Rightarrow\quad \mbox{nonparametric PCM}, $$

but the reverse relationships do not hold. Further, the theorem below shows that the isotonic PCM implies the MLR property (but not the other way around), and thus also implies the SOL property:

$$\mbox{isotonic PCM}\quad \Rightarrow\quad \mbox{MLR}\quad \Rightarrow\quad \mbox{SOL}. $$

Theorem

The isotonic PCM is sufficient (but not necessary) for MLR.

To prove the theorem, the property of MLR is broken down into smaller, more tractable parts. First, element-wise MLR is defined for a single item, for which Equation (7) holds with all the odds of the other items being constant. It is then shown (see lemma below) that element-wise MLR for each item implies the MLR property. Consequently, the proof of the theorem consists of showing that the isotonic PCM implies element-wise MLR for each item. This proof is given after the proof of the lemma.

Define element-wise MLR for item $i$ as the case for which Equation (7) holds with all the odds except those of item $i$ being constant on the interval between $\theta_a$ and $\theta_b$. Hence, element-wise MLR for item $i$ means that Equation (7) holds with $Z_x(a,k)=Z_x(b,k)$ and $Z_y(a,k)=Z_y(b,k)$ for all $k\ne i$. To break down Equation (7) into smaller and more tractable parts, it will first be shown by induction that Equation (7) is implied if element-wise MLR holds for each item separately.

Lemma

MLR (Equation (7)) is implied if element-wise MLR holds for each item separately.

Proof

For two items i and k, Equation (7) yields

$$ \sum_{\boldsymbol{x}} \sum _{\boldsymbol{y}} \bigl( Z_x(a,i) Z_y(b,i) Z_x(a,k) Z_y(b,k) - Z_x(b,i) Z_y(a,i) Z_x(b,k) Z_y(a,k) \bigr)\ge 0. $$
(8)

We start by showing that element-wise MLR for both items i and k separately imply Equation (8).

Suppose that element-wise MLR holds for item $i$. Then the odds corresponding to item $k$ are constant on the interval between $\theta_a$ and $\theta_b$, which means that the subject indices pertaining to the odds of item $k$ can be arbitrarily assigned the index $a$, as $Z_x(a,k)=Z_x(b,k)$ and $Z_y(a,k)=Z_y(b,k)$. This yields for Equation (8) the following inequality

$$ \sum_{\boldsymbol{x}} \sum _{\boldsymbol{y}} \bigl( Z_x(a,i) Z_y(b,i)Z_x(a,k)Z_y(a,k)-Z_x(b,i)Z_y(a,i) Z_x(a,k)Z_y(a,k) \bigr)\ge 0. $$
(9)

Likewise, if we assume that element-wise MLR holds for item $k$, then the indices of the odds of item $i$ can be arbitrarily assigned the index $a$. However, as the assignment of indices to item $i$ is arbitrary if element-wise MLR holds only for item $k$, we might just as well adopt these indices from Equation (9). Adopting the subject indices pertaining to the odds of item $i$ from Equation (9), and assuming element-wise MLR only for item $k$, yields an inequality with indices that correspond to the situation where element-wise MLR holds for both item $i$ and item $k$ separately. Moreover, this inequality is equal to Equation (8). This shows that if element-wise MLR holds for item $i$ and item $k$ separately, then MLR also holds for item $i$ and item $k$ together. The same line of reasoning readily extends to any number of items, which shows that Equation (7) holds if element-wise MLR holds for each item separately. □

The lemma shows that to prove the theorem it is sufficient to prove that the isotonic PCM implies element-wise MLR for each item $i\in J$, for which Equation (7) reduces to

$$ \sum_{\boldsymbol{x}} \sum _{\boldsymbol{y}} \biggl( \bigl(Z_x(a,i) Z_y(b,i) - Z_x(b,i) Z_y(a,i) \bigr) \mathop{ \prod_{k\in J}}_{k\ne i} Z_x(b,k) Z_y(b,k) \biggr) \ge 0. $$
(10)

In Equation (10), element-wise MLR is defined as a nonnegative summation across the differences in odds between the parentheses. This does not mean that for each pair of patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ the difference between the parentheses is nonnegative. Let $x_i$ and $y_i$ refer to the $i$th element of $\boldsymbol{x}$ and $\boldsymbol{y}$, respectively, and suppose that $x_i<y_i$. Then, Equation (6) shows that

$$Z_y(a,i)=Z_x(a,i)\prod_{z=x+1}^y O_{z}(a,i), $$

which means that for those pairs of patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ in Equation (10) for which $x_i\le y_i$,

$$ Z_x(a,i)Z_y(b,i)-Z_x(b,i)Z_y(a,i)=Z_x(a,i)Z_x(b,i)\Biggl(\,\prod_{z=x+1}^{y}O_z(b,i)-\prod_{z=x+1}^{y}O_z(a,i)\Biggr)\ge 0. $$

The last inequality holds for the isotonic PCM because LTP2 implies that the difference between the parentheses is nonnegative. However, using the same line of reasoning it can be shown that, for those pairs of patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ for which $x_i>y_i$, the difference between the parentheses of Equation (10) can take on negative values. To show that the isotonic PCM implies element-wise MLR, it needs to be shown that the sum of these negative values cannot be larger in absolute value than the sum of the positive values implied by the isotonic PCM. The proof of the theorem thus consists of showing that the isotonic PCM implies that the negative values associated with the patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ for which $x_i>y_i$ are canceled out by the positive values for those patterns for which $x_i<y_i$.

Proof of the Theorem

Define the domain $D$ as the collection of all pairs of patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ in Equation (10) for which $x_i>y_i$. Also, consider the codomain $C$ to contain all pairs of patterns for which $x_i<y_i$. The proof of the theorem consists of two parts. In the first part, a function is defined that assigns each element of $D$ (i.e., each pair of patterns $\boldsymbol{x}$ and $\boldsymbol{y}$ for which $x_i>y_i$) to a distinct element in $C$. In the second part, it is shown that the isotonic PCM implies that the negative values of the elements in $D$ are canceled out by the positive values of the assigned elements in $C$. The result is that Equation (10) yields nonnegative values under the assumptions of the isotonic PCM, which by the lemma implies MLR.

Let $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ denote an element in $D$ for which $x_i=y_i+u$ for some $u\ge 1$ (i.e., $x_i>y_i$). Also, let item $j$ be referred to as the operator if $j\ne i$ and for the item it holds that $x_j<y_j$. For each element of $D$, an operator item exists, as $1+\sum\boldsymbol{x}=\sum\boldsymbol{y}$ (from the definition of MLR). Further, let $\boldsymbol{x}_{ijk}$ and $\boldsymbol{y}_{ijk}$ denote the patterns excluding the scores on the items $i$, $j$, and $k$. Without loss of generality, the pair of patterns $\boldsymbol{x}^{d}_{1}$ and $\boldsymbol{y}^{d}_{1}$ can be expressed as

$$ \boldsymbol{x}^{d}_1=(y_i+u,\,x_j,\,x_k,\,\boldsymbol{x}_{ijk}) \quad\mbox{and}\quad \boldsymbol{y}^{d}_1=(y_i,\,y_j,\,y_k,\,\boldsymbol{y}_{ijk}). $$
(11)

Next, consider also the pairs of patterns \(\boldsymbol{x}^{c}_{1}\) and \(\boldsymbol{y}^{c}_{1}\) for which

$$ \boldsymbol{x}^{c}_1=(y_i,\,y_j-1,\,y_k,\,\boldsymbol{y}_{ijk}) \quad\mbox{and}\quad \boldsymbol{y}^{c}_1=(y_i+u,\,x_j+1,\,x_k,\,\boldsymbol{x}_{ijk}). $$
(12)

Because $u\ge 1$, the pair $\{\boldsymbol{x}^{c}_{1},\boldsymbol{y}^{c}_{1}\}$ is an element of the codomain $C$. Notice that $\boldsymbol{x}^{c}_{1}$ is obtained from $\boldsymbol{y}^{d}_{1}$ by subtracting 1 from the score on the operator $j$ (for which $x_j<y_j$), and likewise, $\boldsymbol{y}^{c}_{1}$ is obtained from $\boldsymbol{x}^{d}_{1}$ by adding 1 to the score on the operator. This operation for obtaining the element $\{\boldsymbol{x}^{c}_{1},\boldsymbol{y}^{c}_{1}\}$ from $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ is denoted by the function $f_j:D\to C$, which depends on the choice of the operator (in case more than one item exists for which $x_j<y_j$). For a given operator, the function yields distinct elements in $C$ for each element in $D$. However, if different items are used for the role of operator, then different elements in $D$ can be assigned to the same element in $C$ (i.e., a surjection). To illustrate, consider the pair $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ from $D$, where

$$ \boldsymbol{x}^{d}_2=(y_i+u,\,x_j+1,\,x_k-1,\,\boldsymbol{x}_{ijk}) \quad\mbox{and}\quad \boldsymbol{y}^{d}_2=(y_i,\,y_j-1,\,y_k+1,\,\boldsymbol{y}_{ijk}). $$
Suppose now that item $k$ is assigned the role of operator; then $f_{k}:\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ yields $\{\boldsymbol{x}^{c}_{1},\boldsymbol{y}^{c}_{1}\}$, the same element in the codomain as $f_{j}:\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$. This example illustrates that the function $f_j$ does not guarantee that each element in $D$ is assigned to a distinct element in $C$. If an alternative item exists for the role of operator, the surjection can be resolved by assigning the role of operator to another item. However, a problem occurs when for $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ only item $j$ can be the operator (i.e., only for item $j$ it holds that $x_j<y_j$) and for $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ only item $k$ can be the operator. This situation, where two elements in $D$ yield the same element in $C$ without alternative choices for the operators, is referred to as the exceptional case. If for $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ only item $j$ can be the operator, then the exceptional case implies that $x_k\ge y_k$ and $\sum\boldsymbol{x}_{ijk}\ge\sum\boldsymbol{y}_{ijk}$, which means that $x_j+1\le y_j-u$. In addition, if for $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ only item $k$ can take up the role of operator, then $x_j+1\ge y_j-1$, which means that $u=1$. Hence, if two elements from $D$ are assigned to the same element in $C$, and both elements have only a single (but different) item to take the role of operator, then it has to be true that $u=1$. Furthermore, this means that $x_k\le y_k+1-u=y_k$, which holds only if $x_j+1=y_j-1$, $x_k=y_k$, and $\boldsymbol{x}_{ijk}=\boldsymbol{y}_{ijk}$. Hence, if $f_{j}:\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ and $f_{k}:\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ both yield the same element in $C$, then the patterns $\boldsymbol{x}^{d}_{1}$, $\boldsymbol{y}^{d}_{1}$, $\boldsymbol{x}^{d}_{2}$, and $\boldsymbol{y}^{d}_{2}$ can be expressed as

$$ \everymath{\displaystyle} \begin{array}{c} \boldsymbol{x}^{d}_1=(x_i,x_j,x_k, \boldsymbol{x}_{ijk}),\qquad \boldsymbol{y}^{d}_1=(x_i-1,x_j+2,x_k, \boldsymbol{x}_{ijk}), \\[9pt] \boldsymbol{x}^{d}_2=(x_i,x_j+1,x_k-1, \boldsymbol{x}_{ijk}),\quad \mbox{and} \quad \boldsymbol{y}^{d}_2=(x_i-1,x_j+1,x_k+1, \boldsymbol{x}_{ijk}). \end{array} $$
(13)

Also, we obtain that the patterns in Equation (12) become

$$ \boldsymbol{x}^{c}_1=(x_i-1,\,x_j+1,\,x_k,\,\boldsymbol{x}_{ijk}) \quad\mbox{and}\quad \boldsymbol{y}^{c}_1=(x_i,\,x_j+1,\,x_k,\,\boldsymbol{x}_{ijk}). $$
(14)

Notice that the existence of the pairs $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ and $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ implies that $0\le x_j<m_j$. This immediately shows that the surjection can be resolved by allowing, for the exceptional case, item $j$ to take up the role of operator for $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ (an exception, as in this pair the scores on item $j$ are equal). For this exceptional case, $f^{\prime}_{j}:\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ (prime added to indicate the exception) is properly defined and yields the distinct element $\{\boldsymbol{x}^{c}_{2},\boldsymbol{y}^{c}_{2}\}$ in $C$. This element $\{\boldsymbol{x}^{c}_{2},\boldsymbol{y}^{c}_{2}\}$ is distinct, as it could not be attained without the exception of an operator with equal scores. Applying the function $f^{\prime}_{j}$ to $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ reveals that

$$ \boldsymbol{x}^{c}_2=(x_i-1,\,x_j,\,x_k+1,\,\boldsymbol{x}_{ijk}) \quad\mbox{and}\quad \boldsymbol{y}^{c}_2=(x_i,\,x_j+2,\,x_k-1,\,\boldsymbol{x}_{ijk}). $$
(15)

Next, it is shown that the isotonic PCM implies that the negative values in Equation (10) of the elements in $D$ are canceled out by the positive values of the assigned elements in $C$. This is first shown for the operator item $j$ for which $x_j<y_j$. Second, it is shown also to be true in the exceptional case, for which the scores on the operator are equal.

First, for the pair $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ in Equation (11) and $\{\boldsymbol{x}^{c}_{1},\boldsymbol{y}^{c}_{1}\}$ in Equation (12), Equation (10) reduces to

$$ \bigl(Z_{y_i}(a,i)Z_{y_i+u}(b,i)-Z_{y_i+u}(a,i)Z_{y_i}(b,i)\bigr) \bigl(Z_{x_j+1}(b,j)Z_{y_j-1}(b,j)-Z_{x_j}(b,j)Z_{y_j}(b,j)\bigr) \mathop{\prod_{k\in J}}_{k\ne i,j} Z_{x_k}(b,k)Z_{y_k}(b,k)\ge 0, $$
(16)

where the inequality holds if both terms between the parentheses in Equation (16) are nonnegative. The first term is nonnegative if

$$ \prod_{z=y_i+1}^{y_i+u}O_z(a,i)\le\prod_{z=y_i+1}^{y_i+u}O_z(b,i), $$
which holds because of assumption LTP2. The second term between the parentheses in Equation (16) is nonnegative if

$$ O_{x_j+1}(b,j)\ge O_{y_j}(b,j). $$
Because $x_j<y_j$ (so that $x_j+1\le y_j$), the second term in Equation (16) is nonnegative because of assumption WI.

Second, for the exceptional case, Equation (10) is applied to the four pairs $\{\boldsymbol{x}^{d}_{1},\boldsymbol{y}^{d}_{1}\}$ and $\{\boldsymbol{x}^{d}_{2},\boldsymbol{y}^{d}_{2}\}$ in Equation (13), the pair $\{\boldsymbol{x}^{c}_{1},\boldsymbol{y}^{c}_{1}\}$ in Equation (14), and the pair $\{\boldsymbol{x}^{c}_{2},\boldsymbol{y}^{c}_{2}\}$ in Equation (15), for which factorization similar to Equation (16) yields

$$ \bigl(Z_{x_i-1}(a,i)Z_{x_i}(b,i)-Z_{x_i}(a,i)Z_{x_i-1}(b,i)\bigr) \bigl(Z_{x_j+1}(b,j)^2-Z_{x_j}(b,j)Z_{x_j+2}(b,j)\bigr) \bigl(Z_{x_k}(b,k)^2-Z_{x_k-1}(b,k)Z_{x_k+1}(b,k)\bigr) \mathop{\prod_{l\in J}}_{l\ne i,j,k} Z_{x_l}(b,l)^2\ge 0. $$
(17)

Equation (17) holds if all three terms between the parentheses are nonnegative. Similar to Equation (16), the first term is nonnegative because of assumption LTP2. The second and third terms between the parentheses are nonnegative because of assumption WI. Hence, for any set of elements from D, a distinct set of elements from C exists for which the isotonic PCM implies that Equation (10) is nonnegative. This concludes the proof. □
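The theorem can also be checked numerically. The sketch below (Python, with hypothetical odds tables that are not of the parametric PCM form) builds the sum-score distribution directly from arbitrary local odds; whenever two odds tables satisfy LTP2 (element-wise ordering across subjects) and WI (descending within each item), the resulting sum-score distributions exhibit MLR, in line with the theorem.

```python
def sumscore_from_odds(odds):
    """Sum-score distribution of one subject from its local odds.

    `odds[i]` lists O_1(a, i), ..., O_m(a, i) for item i. The category
    probabilities follow from the running products Z_x (Equation (6))
    after normalization, and the item distributions are convolved under
    local independence.
    """
    dist = [1.0]
    for item in odds:
        z = [1.0]
        for o in item:
            z.append(z[-1] * o)  # Z_x = product of the odds up to x
        total = sum(z)
        p = [v / total for v in z]
        new = [0.0] * (len(dist) + len(p) - 1)
        for s, d in enumerate(dist):       # convolution over items (LI)
            for x, q in enumerate(p):
                new[s + x] += d * q
        dist = new
    return dist
```

For instance, with the hypothetical tables `odds_a = [[2.0, 1.0], [1.5, 0.5]]` and `odds_b = [[3.0, 1.5], [2.0, 1.0]]` (each row descends, satisfying WI, and the second subject's odds dominate the first's, satisfying LTP2), the ratios $P_b(s+1)/P_b(s)$ exceed $P_a(s+1)/P_a(s)$ at every sum score, as MLR requires.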

2.2 Observable Properties

In order to test the assumptions of the isotonic model, observable consequences of the model need to be obtained. Such observable consequences allow the use of the sum score for a stochastic ordering of subjects to be empirically tested. Holland and Rosenbaum (1986, Theorem 5) showed that the assumptions UD, LI, and LTP2 imply a property which they called conditional multivariate TP2 (CMTP2; cf. Karlin & Rinott, 1980). This means that

$$\mbox{isotonic PCM}\quad \Rightarrow\quad \mbox{nonparametric PCM}\quad \Rightarrow\quad \mbox{CMTP}_2. $$

Consider the item score vector $\boldsymbol{x}=(x_i,x_j,\boldsymbol{x}_{ij})$, where $\boldsymbol{x}_{ij}$ is the vector of item scores excluding the scores on items $i$ and $j$. Also, consider the scores $z_i$ and $z_j$, for which $x_i<z_i$ and $x_j<z_j$. The observable property CMTP2 states that for all $i,j\in J$ and all $\boldsymbol{x}_{ij}$,

$$ P(\boldsymbol{X}=\boldsymbol{x})P\bigl(\boldsymbol{X}=(z_i,z_j,\boldsymbol{x}_{ij})\bigr)\ge P\bigl(\boldsymbol{X}=(x_i,z_j,\boldsymbol{x}_{ij})\bigr)P\bigl(\boldsymbol{X}=(z_i,x_j,\boldsymbol{x}_{ij})\bigr). $$
(18)

In words, for any 2×2 sub-table of the joint score probabilities of items $i$ and $j$ (conditional on the other item scores), CMTP2 means that the product of the two diagonal elements is at least as large as the product of the two off-diagonal elements. Any reversal of the inequality in Equation (18) indicates a violation of the conjunction of the assumptions UD, LI, and LTP2. Hence, with ‘¬’ denoting logical negation,

$$\neg\mbox{CMTP}_2\quad \Rightarrow\quad \neg\mbox{nonparametric PCM} \quad \Rightarrow\quad \neg\mbox{isotonic PCM}. $$

Besides the property of CMTP2, a manifest property for WI can be obtained. Let $\boldsymbol{x}_i$ denote the item score vector excluding the score on item $i$. Let the manifest local odds be expressed as

$$\frac{P(\boldsymbol{X}=(x_i,\boldsymbol{x}_i))}{P(\boldsymbol{X}=(x_i-1,\boldsymbol{x}_i))}. $$

Here, conditional manifest WI (CMWI) is defined as follows: for all $i\in J$, all $0<x_i<m_i$, and all $\boldsymbol{x}_i$,

$$ \frac{P(\boldsymbol{X}=(x_i,\boldsymbol{x}_i))}{P(\boldsymbol{X}=(x_i-1,\boldsymbol{x}_i))}\ge \frac{P(\boldsymbol{X}=(x_i+1,\boldsymbol{x}_i))}{P(\boldsymbol{X}=(x_i,\boldsymbol{x}_i))}. $$
(19)

The next proposition shows that the isotonic PCM is sufficient for CMWI; that is,

$$\mbox{isotonic PCM}\quad \Rightarrow\quad \mbox{CMWI}. $$

Proposition

The assumptions UD, LI, and WI imply CMWI.

Proof

Let $\boldsymbol{X}\setminus X_i$ denote the score vector excluding the $i$th item score variable. Assuming UD and LI, the manifest local odds can be expressed as

$$ \frac{P(\boldsymbol{X}=(x_i,\boldsymbol{x}_i))}{P(\boldsymbol{X}=(x_i-1,\boldsymbol{x}_i))}=E\bigl(O_x(a,i)\mid\boldsymbol{X}\setminus X_i=\boldsymbol{x}_i\bigr) $$

(cf. Junker 1993, Proposition 4.1a). Inserting $E(O_x(a,i)\mid\boldsymbol{X}\setminus X_i=\boldsymbol{x}_i)$ into Equation (19) shows that CMWI is implied by WI, as the inequality holds for all $a\in A$. □

The proposition states that under the assumptions UD, LI, and WI, the manifest local odds of item $i$ are ordered the same for all subjects, in descending order of the item scores (conditional on the rest of the item scores). Any reversal of the inequality in Equation (19) indicates a violation of the conjunction of the assumptions UD, LI, and WI. Hence,

$$\neg\mbox{CMWI}\quad \Rightarrow\quad \neg\mbox{isotonic PCM}. $$

The observable properties CMTP2 and CMWI form the basis of our test for the fit of the isotonic PCM to the data. For CMTP2, the score vector $\boldsymbol{x}$ is partitioned into the disjoint item scores $(x_i,x_j)$ and $\boldsymbol{x}_{ij}$, where $\boldsymbol{x}_{ij}$ corresponds to the remaining item scores. This means that, depending on the number of items in the test and the numbers $m_i$ for these items, a large number of inequalities in Equation (18) and Equation (19) need to be assessed. To illustrate, for 4 items with each $m_i=2$, there are 486 inequalities corresponding to CMTP2 and 108 corresponding to CMWI. Besides the problem of having to combine these Boolean outcomes into a single conclusion about whether or not the isotonic PCM fits the data, we also need to keep in mind that a large sample size would be needed to obtain reliable estimates of the probabilities in Equation (18) and Equation (19). To deal with sparse data, the remaining item scores can be combined ad hoc to form rest-score groups (Molenaar & Sijtsma, 2000), with each group containing at least a preset number of subjects. However, this method also results in a loss of power of the test, as many of the inequalities in Equation (18) and Equation (19) are ignored. Instead of this method of combining the remaining item scores into rest-score groups, we explore in the next section the use of the Gibbs sampling algorithm to simultaneously deal with the large number of inequalities and with sparse data.

3 The Gibbs Sampling Algorithm

The Gibbs sampler allows inequality constraints to be imposed on the probability P(X=x), for all score vectors x, by restricting its prior distribution. Hereby, each probability is iteratively sampled from its posterior distribution, conditional on all the other probabilities at the previous iteration. Examples of such applications of the Gibbs sampler include Hoijtink (1998), Hoijtink and Molenaar (1997), Karabatsos (2001), Karabatsos and Sheu (2004), Ligtvoet and Vermunt (2012), and Van Onna (2002).

For notational convenience, we consider (without loss of generality) the expected frequency $F(\boldsymbol{X}=\boldsymbol{x})$ instead of the probability $P(\boldsymbol{X}=\boldsymbol{x})$. Suppose that $x_i<z_i$ and $x_j<z_j$. The observable property CMTP2 implies that

$$F(\boldsymbol{X}=\boldsymbol{x})F\bigl(\boldsymbol{X}=(z_i,z_j,\boldsymbol{x}_{ij})\bigr)\ge F\bigl(\boldsymbol{X}=(x_i,z_j,\boldsymbol{x}_{ij})\bigr)F\bigl(\boldsymbol{X}=(z_i,x_j,\boldsymbol{x}_{ij})\bigr). $$

This means that F(X=x) is restricted from below by

$$F\bigl(\boldsymbol{X}=(x_i,z_j,\boldsymbol{x}_{ij})\bigr) F\bigl(\boldsymbol{X}=(z_i,x_j,\boldsymbol{x}_{ij})\bigr)/ F\bigl(\boldsymbol{X}=(z_i,z_j,\boldsymbol{x}_{ij})\bigr). $$

Likewise, the expected frequency of the score vector $(x_i,z_j,\boldsymbol{x}_{ij})$ is restricted from above by

$$F(\boldsymbol{X}=\boldsymbol{x})F\bigl(\boldsymbol{X}=(z_i,z_j,\boldsymbol{x}_{ij})\bigr)/ F\bigl(\boldsymbol{X}=(z_i,x_j,\boldsymbol{x}_{ij})\bigr). $$

For all combinations of item pairs and all $\boldsymbol{x}_{ij}$, we can list all those inequalities that involve $F(\boldsymbol{X}=\boldsymbol{x})$ and collect all the lower bounds for $F(\boldsymbol{X}=\boldsymbol{x})$ in the set $L$ and all the upper bounds in the set $U$.

Similarly to CMTP2, we can collect the lower and upper bounds for each expected item score frequency on the basis of the observable property CMWI. For example, CMWI implies that $F(\boldsymbol{X}=\boldsymbol{x})$ is restricted from below by

$$\sqrt{F\bigl(\boldsymbol{X}=(x_i-1,\boldsymbol{x}_{i})\bigr)F\bigl(\boldsymbol{X}=(x_i+1,\boldsymbol{x}_{i})\bigr)} $$

(see Equation (19)). By listing all the inequalities implied by CMWI that involve $F(\boldsymbol{X}=\boldsymbol{x})$, and adding the lower and upper bounds to $L$ and $U$, respectively, we obtain that any frequency of $\boldsymbol{x}$ between $\max(L)$ and $\min(U)$ satisfies the constraints of both CMTP2 and CMWI.

3.1 Estimation

The goal of the following procedure is to obtain estimates for the expected frequencies under the constraints imposed by both CMTP2 and CMWI. Subsequently, these estimates can be compared to the observed item score frequencies to assess the fit of the isotonic PCM to the data.

With the frequencies $F(\boldsymbol{X}=\boldsymbol{x})$ following a multinomial distribution, the posterior distribution of interest (with the conjugate prior) is Dirichlet distributed. Using the Gibbs sampling algorithm, estimates of the expected frequencies are obtained by sampling values from this posterior. This can be achieved iteratively for each $\boldsymbol{x}$ by fixing all expected frequencies except $F(\boldsymbol{X}=\boldsymbol{x})$ to the sampled values from the previous iteration, determining the bounds for $F(\boldsymbol{X}=\boldsymbol{x})$ and sampling its value under the appropriate constraints, and then repeating the procedure for each $\boldsymbol{x}$. Sampling from the appropriate posterior is problematic here, because the probabilities $P(\boldsymbol{X}=\boldsymbol{x})$ are not independent under a Dirichlet distribution and, so, the value of $F(\boldsymbol{X}=\boldsymbol{x})$ cannot be sampled independently of the other frequencies. To circumvent this problem, each frequency is first re-scaled to a Gamma-distributed variable, which is truncated by the upper and lower bounds. Subsequently, a sampled value for $F(\boldsymbol{X}=\boldsymbol{x})$ can be obtained from the appropriate posterior distribution (Laudy & Hoijtink, 2007; Ligtvoet & Vermunt, 2012). The resulting Gibbs sampling algorithm contains the following steps:

  1.

    Assign to all frequencies initial values \(\hat{F}_{0}(\boldsymbol{X}=\boldsymbol{x})\) that conform to CMTP2 and CMWI.

  2.

    Compute at iteration t for the vector x all the lower bounds and upper bounds imposed upon it by CMTP2 and CMWI using the estimates of the other frequencies at the previous iteration. Sample a value \(\hat{G}_{t}(\boldsymbol{X}=\boldsymbol{x})\) from the truncated Gamma distribution,

    $$\hat{G}_t(\boldsymbol{X}=\boldsymbol{x})\sim \mbox{Gamma}\bigl(F(\boldsymbol{X}=\boldsymbol{x})+1,1\bigr)\times I\bigl(\max(L)<\hat{G}_t(\boldsymbol{X}=\boldsymbol{x})<\min(U)\bigr), $$

    where one observation is added to F(X=x) as a prior, the scale parameter is set equal to unity, and I(⋅) is the indicator function containing the restrictions. Sampling from the truncated Gamma distribution can be achieved by inverse probability sampling (e.g., Hoijtink & Molenaar 1997). Return the frequency

    $$\hat{F}_{t}(\boldsymbol{X}=\boldsymbol{x})=\frac{N\hat{G}_t(\boldsymbol{X}=\boldsymbol{x})}{ \hat{G}_t(\boldsymbol{X}=\boldsymbol{x})+\sum_{\boldsymbol{y}\ne \boldsymbol{x}} \hat{F}_{t-1}(\boldsymbol{X}=\boldsymbol{y})}, $$

    where N is the total sample size and the summation is taken across all previously sampled frequencies.

  3.

    Repeat Step 2 for all x.

  4.

    Repeat Steps 2 and 3 until a preset criterion of convergence has been reached.

Once the criterion of convergence in Step 4 is reached, subsequent parameter values are sampled as if taken from the posterior distribution of interest. The next step uses these samples to assess the model-data fit of the isotonic PCM:

  5.

    After Step 4, repeat Steps 2 and 3 until a sufficiently large sample of values for \(\hat{F}(\boldsymbol{X}=\boldsymbol{x})\) has been obtained.
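A minimal Python sketch of Steps 1–4 is given below. It uses a stand-in ordering constraint (estimated frequencies nonincreasing across four cells) in place of the actual CMTP2 and CMWI inequalities, and it replaces inverse probability sampling by simple rejection sampling from the untruncated Gamma distribution; the counts and the number of sweeps are invented for illustration.

```python
import random

random.seed(1)

N = 200
counts = [80, 60, 40, 20]  # observed counts for four hypothetical cells

def bounds(j, est):
    # Stand-in for max(L) and min(U): estimated frequencies must be
    # nonincreasing across the cells, so the neighbours act as bounds.
    lo = est[j + 1] if j + 1 < len(est) else 0.0
    hi = est[j - 1] if j > 0 else float(N)
    return lo, hi

def gibbs_sweep(counts, est):
    for j in range(len(est)):                 # Step 3: visit every cell
        lo, hi = bounds(j, est)
        rest = sum(est[k] for k in range(len(est)) if k != j)
        for _ in range(1000):                 # cap on rejection attempts
            g = random.gammavariate(counts[j] + 1, 1.0)  # Gamma(F + 1, 1)
            f = N * g / (g + rest)            # Step 2: re-scale to a frequency
            if lo < f < hi:                   # truncation by the bounds
                est[j] = f
                break
    return est

# Step 1: initial values that already satisfy the ordering constraint.
est = [float(c) for c in counts]
for _ in range(100):                          # Step 4: burn-in sweeps
    est = gibbs_sweep(counts, est)
```

After the burn-in, further sweeps would be stored as the posterior sample of Step 5.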

3.2 Model-Data Fit

Consider two statistics for assessing the model-data fit: Pearson’s X² and the difference in log-likelihood from the multinomial model. For Pearson’s X², the expected frequency of X=x under the constraints imposed by CMTP2 and CMWI is estimated by the median of the samples of \(\hat{F}_{t}(\boldsymbol{X}=\boldsymbol{x})\) from the Gibbs sampler (after convergence). Notice that the distributions of the two fit statistics under the constraints imposed by CMTP2 and CMWI can be obtained by computing the statistics for the samples of \(\hat{F}_{t}(\boldsymbol{X}=\boldsymbol{x})\). The Bayesian p-values (Gelman, Meng, & Stern, 1996; Meng 1994) for these statistics are then estimated as the proportion of times the statistic computed for the sampled values exceeds the value of the statistic for the observed score distribution.
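The computation of such a Bayesian p-value can be sketched as follows. For brevity, the draws from the constrained posterior are replaced here by ordinary multinomial samples around a set of invented expected frequencies; in the actual procedure the statistic would be computed for the constrained Gibbs samples instead.

```python
import random

random.seed(2)

N = 200
observed = [80, 60, 40, 20]          # observed item score frequencies
expected = [75.0, 65.0, 38.0, 22.0]  # e.g. medians of the Gibbs samples

def pearson_x2(freq, exp):
    # Pearson's X^2 statistic comparing frequencies to expectations
    return sum((f - e) ** 2 / e for f, e in zip(freq, exp))

obs_stat = pearson_x2(observed, expected)

# Stand-in for the constrained posterior draws: multinomial samples
# of size N with cell probabilities proportional to `expected`.
p = [e / N for e in expected]
exceed, T = 0, 400
for _ in range(T):
    draw = [0] * len(p)
    for _ in range(N):
        u, c = random.random(), 0.0
        for j, pj in enumerate(p):
            c += pj
            if u < c or j == len(p) - 1:
                draw[j] += 1
                break
    if pearson_x2(draw, expected) > obs_stat:
        exceed += 1

# Proportion of sampled statistics exceeding the observed statistic
p_value = exceed / T
```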

3.3 Simulation Study

To assess how well the Gibbs sampling algorithm handles the large number of inequalities and the sparseness of the data, a small simulation study was performed. In this simulation study, the Type I error rate and the power of two model-data fit procedures based on the Bayesian p-values were assessed. For the first procedure, the p-values were computed for Pearson’s X² statistic; for the second, for the log-likelihood ratio statistic. For each statistic, the Type I error rate corresponds to the proportion of times the p-value incorrectly indicated misfit of the model assumptions, and the power corresponds to the proportion of times the p-value correctly indicated misfit of the model assumptions.

3.3.1 Method

The independent variables in this study were the number of items, the number of score categories, the sample size, and whether or not the isotonic PCM was valid.

To generate data under a flexible (nonparametric) model, the following procedure was used for each item. First, subjects’ latent values were randomly sampled from a uniform distribution between 1 and 5. Second, a 5×m_i design matrix D was generated with random probabilities, where the rows represented θ=1, 2, 3, 4, and 5. These probabilities were subsequently sorted within each column of D in increasing order. Third, for those data sets in the design for which the isotonic PCM did not hold, the probabilities in adjacent rows of D were connected with a straight (nondecreasing) line to define a function relating each subject’s latent score to the probability of attaining each of the m_i item scores. These probabilities were taken as cumulative probabilities; hence, the data sets in the design for which the isotonic PCM did not hold correspond to the nonparametric graded response model (e.g., Molenaar 1997). This nonparametric graded response model implies neither assumption LTP2 nor assumption WI. For those data sets in the design for which the isotonic PCM did hold, the probabilities in D correspond to the nonparametric PCM. To facilitate assumption WI, these probabilities were sorted within each column of D in decreasing order. Fourth, using the formulas in Van der Ark, Hemker, and Sijtsma (2002), the probabilities P_x(a, i) could be obtained for both the nonparametric graded response model and the isotonic PCM. Finally, for each subject an item score vector was randomly sampled from a multinomial distribution using these probabilities.
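The generation steps can be sketched as follows for a single item. The interpolation grid and the way the interpolated values are treated as cumulative probabilities are simplifications of the procedure above, and all concrete numbers are invented for illustration.

```python
import random

random.seed(3)

M = 4     # number of item steps (item scores run from 0 to M)
ROWS = 5  # rows of D correspond to theta = 1, 2, 3, 4, 5

# Step 2: 5 x M matrix of random probabilities, sorted within each
# column in increasing order so each column is nondecreasing in theta.
D = [[random.random() for _ in range(M)] for _ in range(ROWS)]
for j in range(M):
    col = sorted(D[r][j] for r in range(ROWS))
    for r in range(ROWS):
        D[r][j] = col[r]

def interpolate(theta):
    # Step 3: connect adjacent rows of D by straight (nondecreasing) lines.
    r = min(int(theta) - 1, ROWS - 2)
    w = theta - (r + 1)
    return [D[r][j] + w * (D[r + 1][j] - D[r][j]) for j in range(M)]

def sample_score(theta):
    # Treat the interpolated values as cumulative probabilities P(X >= x);
    # sorting makes them nonincreasing in x, and the item score is the
    # number of cumulative probabilities exceeding a single uniform draw.
    g = sorted(interpolate(theta), reverse=True)
    u = random.random()
    return sum(u < gx for gx in g)

theta = random.uniform(1, 5)  # Step 1: latent value from Uniform(1, 5)
x = sample_score(theta)       # an item score between 0 and M
```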

Van der Ark (2005) found that the severity of the violation of MLR for the nonparametric graded response model depends both on the number of items and on the number of score categories, with smaller violations occurring for tests consisting of more items and fewer score categories. Because the violations of MLR are most severe for tests consisting of a few items, we only considered tests of 2, 3, and 4 items. Also, for tests of more than 4 items, the analysis of a single data set took more than 10 minutes, which is no problem when analyzing a single data set, but which became infeasible for the number of replications used in this simulation. For the number of score categories, we fixed all m_i = m and considered three levels: m+1 = 3, 5, and 7 for a two-item test. For tests consisting of 3 and 4 items, only three score categories were considered. For these five combinations of numbers of items and score categories, sample sizes of N = 200, 400, and 800 were considered for both the nonparametric graded response model and the isotonic PCM, with 50 data sets generated for each of the resulting 30 cells of the design.

The dependent variables were the Type I error rate and the power of the model-data fit procedures based on the Bayesian p-values for Pearson’s X² statistic and the log-likelihood ratio statistic. As a decision rule, a p-value smaller than 0.10 was taken to indicate misfit of the model assumptions. All 1500 data sets were generated and analyzed in R (R Development Core Team 2009). The code provided by Nadarajah and Kotz (2006) was used for sampling from the truncated Gamma distributions. Tests of the Gibbs sampler indicated that for the data sets used in this study a burn-in period of 100 iterations was sufficient for convergence to be reached; this small number might be explained by the large number of constraints that restrict the outcome space. The estimates were based on 400 samples from the posterior after this burn-in period. The R code for running the Gibbs sampler, which also works for varying numbers of score categories, can be obtained from the author’s web site: http://home.medewerker.uva.nl/r.ligtvoet.

3.3.2 Results

The results are discussed first for the model-data fit procedure based on Pearson’s X² statistic. Table 1 contains the Type I error rates and power for the different numbers of items, numbers of score categories, and sample sizes. Table 1 shows that, with the exception of the cells of the design corresponding to 2 items and 3 score categories, the Bayesian p-values for Pearson’s X² statistic indicate model-data misfit for almost all the data sets. Inspection of the distribution of these p-values across the cells of the design showed a U-shaped distribution, with most p-values close to zero. These results indicate that the Bayesian p-value for Pearson’s X² statistic is not a good indicator for model-data fit.

Table 1. Type I error rate and power based on Pearson’s X² statistic, for different numbers of items, numbers of score categories, and sample sizes.

The results of the model-data fit procedure based on the log-likelihood statistic are shown in Table 2, which gives the Type I error rates and power for the different numbers of items, numbers of score categories, and sample sizes. Across the cells of the design, the power was moderate to high (range 0.62 to 1.00), suggesting that the statistic is sensitive to model violations. In addition, the Type I error rate was small, except for the cells of the design corresponding to 7 score categories. However, for these cells the Type I error rate decreases for larger sample sizes, which suggests that a sample size of more than 800 subjects is required for data containing 7 score categories. These results suggest that the Bayesian p-value for the log-likelihood statistic is a good indicator for model-data fit, given that the sample size is large enough.

Table 2. Type I error rate and power based on log-likelihood statistic, for different numbers of items, numbers of score categories, and sample sizes.

3.3.3 Conclusion

In the simulation study, the Type I error rate and the power of two model-data fit procedures based on the Bayesian p-values were assessed. These p-values were computed for Pearson’s X² statistic and the log-likelihood ratio statistic. The results indicate that the Bayesian p-value for Pearson’s X² statistic is not a good indicator for model-data fit, whereas the Bayesian p-value for the log-likelihood statistic had high power and a low Type I error rate for fewer than 7 score categories.

4 Discussion

In this paper, sufficient conditions were obtained for the stochastic ordering of subjects by their sum score. These conditions define the isotonic PCM, which, like the PCM, justifies the use of the sum score as a basis for ordering subjects. However, the isotonic PCM is more flexible than the PCM in terms of the restrictions it imposes on the test data, which makes it applicable to a wider variety of tests. Also, observable properties of the isotonic PCM were derived in the form of inequality constraints, and it was shown how to obtain estimates of the score distribution under these constraints by using the Gibbs sampling algorithm. In a small simulation study, it was shown that the Bayesian p-values based on the log-likelihood ratio statistic can be used to assess the fit of the isotonic PCM to the data.

That the isotonic PCM is more flexible than the PCM does not mean that the isotonic PCM will almost surely hold for a given data set. In fact, the isotonic PCM imposes a great many restrictions on the data, which illustrates just how restrictive the requirements for using the sum score are. Misfit of the isotonic PCM means that the practical use of the sum score for ordering subjects lacks empirical justification; if the isotonic PCM fits the data, then there is at least some empirical justification for such use of the sum score.

Ligtvoet, Van der Ark, Bergsma, and Sijtsma (2011) studied sufficient conditions for an ordering of the items that is invariant across subjects. For polytomously scored items, they defined several latent scales which yield observable consequences in terms of the restrictions imposed on the observed score distribution. One possible extension of the isotonic PCM is to include such an item ordering, which can be easily tested by including additional inequality restrictions in the Gibbs sampling algorithm.