The Reduced Reparameterized Unified Model (Reduced RUM; Hartz, 2002; Hartz, Roussos, Henson, & Templin, 2005) has received considerable attention among psychometric researchers concerned with cognitive diagnosis models (CDMs) for educational assessment. CDMs are constrained latent class models. For a given ability domain, classes of intellectual proficiency are defined in terms of binary cognitive skills called attributes, which an examinee may or may not have mastered (DiBello, Roussos, & Stout, 2007; Haberman & von Davier, 2007; Leighton & Gierl, 2007; Rupp, Templin, & Henson, 2010). The Expectation Maximization (EM) algorithm or Markov chain Monte Carlo (MCMC) methods (de la Torre, 2009, 2011; Henson, Templin, & Willse, 2009; von Davier, 2005, 2008, 2011) are used to estimate the model parameters, which are then used to assign examinees to proficiency classes.

If a researcher wants to use MCMC for fitting the Reduced RUM, then he or she can either write his or her own code or use the MCMC routines implemented, for example, in OpenBUGS (Lunn, Spiegelhalter, Thomas, & Best, 2009) or the Arpeggio Suite (Bolt et al., 2008). Alternatively, the Reduced RUM can be fitted by the EM algorithm. The first option is again for a user to write his or her own code (see, for example, Feng, Habing, & Huebner, 2014). The second option is to resort to a commercial package that offers an implementation of the EM algorithm for fitting (constrained) latent class models, for example, Latent GOLD (Vermunt & Magidson, 2000) or Mplus (Muthén & Muthén, 1998–2011).

Using a latent class analysis (LCA) routine for fitting the Reduced RUM requires that it be re-expressed as a logit model, with constraints imposed on the parameters of the logistic function. The parameterization and the associated constraints have been worked out for the Reduced RUM involving two attributes (Henson et al., 2009). However, for more than two attributes, the specific reparameterization and the constraints to be imposed on the parameters of the logistic function are nontrivial and currently unknown.

This article intends to close this gap: The general reparameterization of the Reduced RUM as a logit model involving any number of attributes is presented, including the associated parameter constraints. Thus, a researcher can now use an LCA routine, say in Mplus, for fitting the Reduced RUM to educational data.

The presentation is divided into a theoretical and a practical/applied part. The next section briefly reviews definitions and technical key concepts of (general) CDMs and the Reduced RUM. The mathematical derivations and proofs are presented in the subsequent sections. As a practical illustration, the Reduced RUM is fitted to two synthetic data sets and to a real-world data set using the EM algorithm implemented in the constrained LCA routine in Mplus (key parts of the syntax are provided in the appendices). For comparison, all data sets were also fitted with the MCMC routine available in OpenBUGS.

1 Technical Background

1.1 (General) Cognitive Diagnosis Models

Suppose that \(K\) binary cognitive skills or attributes constitute a certain ability domain; there are then \(M=2^K\) distinct attribute profiles, each of which characterizes a proficiency class. Let the \(K\)-dimensional vector, \(\varvec{\alpha }_m = (\alpha _1, \ldots , \alpha _K)^T\), \(m=1, \ldots , M\), denote the binary attribute profile of proficiency class \(m\), where the \(k\)th entry indicates whether the respective attribute has been mastered. (Throughout the text, the superscript \(T\) denotes the transpose of vectors or matrices; the “prime notation” is reserved for distinguishing between vectors or their scalar entries.) Consider a test of \(J\) items for assessing ability in the domain. Each individual item \(j\) is associated with a \(K\)-dimensional binary vector, \(\mathbf {q}_j\), called the item-attribute profile, where \(q_{jk} = 1\) if a correct answer requires mastery of the \(k\)th attribute, and \(0\) otherwise. Note that item-attribute profiles consisting entirely of zeroes are inadmissible, because they correspond to items that require no skills at all. Hence, given \(K\) attributes, there are at most \(2^K-1\) distinct item-attribute profiles. The \(J\) item-attribute profiles of a test constitute its Q-matrix, \(\mathbf {Q}=\{q_{jk}\}_{(J \times K)}\) (Tatsuoka, 1983, 1985), which summarizes the constraints specifying the associations between items and attributes.
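As a concrete illustration of these definitions, the following R sketch (the four-item test and its Q-matrix are hypothetical, not taken from the analyses reported below) enumerates the \(M=2^K\) attribute profiles and assembles a small Q-matrix:

    # Enumerate the M = 2^K attribute profiles for K = 3 attributes.
    K <- 3
    alpha <- as.matrix(expand.grid(rep(list(0:1), K)))   # M x K matrix of profiles
    colnames(alpha) <- paste0("alpha", 1:K)

    # A hypothetical Q-matrix for a J = 4 item test.
    Q <- rbind(c(1, 0, 0),   # item 1 requires attribute 1 only
               c(0, 1, 0),   # item 2 requires attribute 2 only
               c(1, 1, 0),   # item 3 requires attributes 1 and 2
               c(1, 1, 1))   # item 4 requires all three attributes
    all(rowSums(Q) > 0)      # TRUE: no inadmissible all-zero item-attribute profile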

CDMs differ in the way in which mastery and nonmastery of the attributes are believed to affect an examinee’s performance on a test item (e.g., compensatory models vs. non-compensatory models; conjunctive models vs. disjunctive models; for a detailed discussion, see Henson et al., 2009). General CDMs (de la Torre, 2011; Henson et al., 2009; Rupp et al., 2010; von Davier, 2005, 2008, 2011) express the functional relation between attribute mastery and the probability of a correct item response in a unified mathematical form and parameterization that are applicable to “recognizable” CDMs (de la Torre, 2011, p. 181), as discussed previously in the literature, and CDMs “that have not yet been defined” (Henson et al., 2009, p. 199), thereby establishing a general standard for model comparison and evaluation.

Define the “kernel” (Rupp et al., 2010, p. 135) of item \(j\), \(g(\mathbf {q}_j, \varvec{\alpha })\), as the linear combination of all \(K\) attribute main effects, \(\alpha _k\), and their interactions

$$\begin{aligned} g(\mathbf {q}_j, \varvec{\alpha }) = \gamma _{j0} + \sum _{k=1}^K \gamma _{jk}q_{jk}\alpha _k + \sum _{k'=k+1}^K \sum _{k=1}^{K-1} \gamma _{jkk'}q_{jk}q_{jk'}\alpha _k\alpha _{k'} + \cdots + \gamma _{j12\ldots K}\prod _{k=1}^Kq_{jk}\alpha _k, \end{aligned}$$
(1)

where \(q_{jk}\) indicates whether mastery of attribute \(\alpha _k\) is required for item \(j\). The attribute interactions are expressed as product terms. For example, the two-way interaction of attributes \(\alpha _k\) and \(\alpha _{k'}\) is written as \(q_{jk}q_{jk'}\alpha _k\alpha _{k'}\). (The order of an interaction corresponds to the number of parenthetical attribute subscripts of the associated coefficient, \(\gamma _{j(\ldots )}\).) Attribute main effects and interaction terms can be removed from the kernel by constraining the corresponding entries in the parameter vector, \(\varvec{\gamma }_j=(\gamma _{j0}, \gamma _{jk}, \gamma _{j(kk^{\prime })}, \ldots , \gamma _{j(12\ldots K)})^T\), to zero.

Let \(Y_j\) denote the response to the binary test item \(j\). The expression of the item response function (IRF), \(P(Y_j=1 \mid \varvec{\alpha })\), in terms of \(g(\mathbf {q}_j, \varvec{\alpha })\) must guarantee that \(0 \le P(Y_j=1 \mid \varvec{\alpha }) \le 1\), which, for example, can be achieved by using the logit link, \(P(Y_j=1 \mid \varvec{\alpha }) = \mathrm{e}^{g(\mathbf {q}_j, \varvec{\alpha })} / (1 + \mathrm{e}^{g(\mathbf {q}_j, \varvec{\alpha })})\). Based on this general logistic function, von Davier (2005, 2008, 2011) defined the IRF of his General Diagnostic Model (GDM) by constraining all interaction terms in \(g(\mathbf {q}_j, \varvec{\alpha })\) to be zero (see Equations 1 and 2; von Davier, 2005, pp. 3–4). Henson et al. (2009) used the unconstrained general logistic function as the IRF of their Log-linear Cognitive Diagnosis Model (LCDM; see Equation 11 in Henson et al., 2009, p. 197). The specific IRFs of recognizable CDMs like the Deterministic Input Noisy Output “AND” gate (DINA) model (Junker & Sijtsma, 2001; Macready & Dayton, 1977), the Deterministic Input Noisy Output “OR” gate (DINO) model (Templin & Henson, 2006), and the Reduced RUM were then derived by Henson et al. (2009) through constraining the coefficients in \(g(\mathbf {q}_j, \varvec{\alpha })\). The logit link was also used by de la Torre (2011) for the IRF of a general CDM that he called the Generalized DINA (G-DINA) model (see Equation 2; de la Torre, 2011, p. 181). In addition, de la Torre (2011) proposed to use the identity link, \(P(Y_j=1 \mid \varvec{\alpha }) = g(\mathbf {q}_j, \varvec{\alpha })\), and the log link, \(P(Y_j=1 \mid \varvec{\alpha }) = \mathrm{e}^{g(\mathbf {q}_j, \varvec{\alpha })}\), for expressing the IRF of the G-DINA model (see Equations 1 and 3; de la Torre, 2011, pp. 181–182; these two link functions require further constraints on the parameters to guarantee that \(P(Y_j=1 \mid \varvec{\alpha })\) is bounded by 0 and 1). The IRFs of various recognizable CDMs were derived by de la Torre (2011) based on the G-DINA model.
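To make the role of the kernel and the three link functions concrete, the following R sketch (all \(\gamma \)-values hypothetical) evaluates Equation 1 for \(K=2\) and applies each link; plogis() is R's logistic distribution function:

    # Kernel of Equation 1 for K = 2 (hypothetical coefficients).
    kernel <- function(alpha, q, g0, g1, g2, g12) {
      g0 + g1 * q[1] * alpha[1] + g2 * q[2] * alpha[2] +
        g12 * q[1] * q[2] * alpha[1] * alpha[2]
    }
    g <- kernel(alpha = c(1, 1), q = c(1, 1),
                g0 = -1.5, g1 = 1.0, g2 = 1.2, g12 = 0.4)
    plogis(g)   # logit link:    P = e^g / (1 + e^g), always in (0, 1)
    g           # identity link: P = g, requires 0 <= g <= 1
    exp(g)      # log link:      P = e^g, requires g <= 0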

1.2 The Reduced RUM as a General Cognitive Diagnosis Model Based on the Logit Link

The IRF of the Reduced RUM in its traditional parameterization is

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha })=\pi ^*_j \prod _{k=1}^K r_{jk}^{*\, q_{jk}(1-\alpha _{k})}, \end{aligned}$$
(2)

where \(0 < \pi ^*_j < 1\) denotes the probability of a correct answer for an examinee who has mastered all the attributes required by item \(j\), and \(0 < r^*_{jk} < 1\) is a penalty parameter for not mastering the \(k\)th attribute.
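As a sketch (all parameter values hypothetical), Equation 2 translates into R as follows:

    # IRF of the Reduced RUM in its traditional parameterization (Equation 2).
    rum_irf <- function(alpha, q, pi_star, r_star) {
      pi_star * prod(r_star ^ (q * (1 - alpha)))
    }
    q <- c(1, 1, 0)   # the item requires attributes 1 and 2
    rum_irf(alpha = c(1, 1, 0), q, pi_star = 0.9, r_star = rep(0.6, 3))   # 0.90
    rum_irf(alpha = c(1, 0, 0), q, pi_star = 0.9, r_star = rep(0.6, 3))   # 0.90 * 0.60 = 0.54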

For \(K=2\) attributes, Henson et al. (2009) obtained the expression of the Reduced RUM as a general CDM based on the logit link by first transforming Equation 2 to the equivalent, but mathematically more tractable Inverse RUM, which then allowed for deriving the expressions of the parameters of the logistic function (see Equations 25–27, Henson et al., 2009, p. 201). The IRF is

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha })= & {} \frac{ \mathrm{e}^{ \beta _{j0} + \beta _{j1}q_{j1}\alpha _1 + \beta _{j2}q_{j2}\alpha _2 + \beta _{j12}q_{j1}q_{j2}\alpha _1\alpha _2 } }{ 1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1}q_{j1}\alpha _1 + \beta _{j2}q_{j2}\alpha _2 + \beta _{j12}q_{j1}q_{j2}\alpha _1\alpha _2 }},\\ \text{ subject } \text{ to: }&\beta _{jk} > 0 \quad k=1,2\nonumber \end{aligned}$$
(3)

where

$$\begin{aligned} \beta _{j12} = \ln \left( \frac{1 + \mathrm{e}^{\beta _{j0} }}{1 + \mathrm{e}^{\beta _{j0} + \beta _{j1} } + \mathrm{e}^{\beta _{j0} + \beta _{j2} } - \mathrm{e}^{\beta _{j0} + \beta _{j1} + \beta _{j2}}} \right) \end{aligned}$$
(4)

The constraint on the \(\beta _{jk}\) is mathematically not required, because it is already implied by \(0 < \pi ^*_j, r^*_{jk} < 1\). However, when the logit model is fitted directly, \(\beta _{jk} > 0\) must be imposed to guarantee monotonicity—that is, the probability of a correct response for an examinee who has mastered certain attributes must be equal to or greater than the probability of a correct response when these attributes have not been mastered (Henson et al., 2009, p. 198).

The inclusion of the interaction effect, \(\alpha _1\alpha _2\), in the model—in addition to the two main effects—is needed for modeling the probability of a correct response to an item that requires the mastery of both attributes. Specifically, the coefficient \(\beta _{j12}\) quantifies the relation between item \(j\) and attribute 2 conditional on the mastery of attribute 1 (and vice versa; see Henson et al., 2009, p. 198).

1.3 The Coefficient \(\beta _{j12}\): Further Considerations

The functional expression of the coefficient \(\beta _{j12}\) of the interaction term in Equation 4 implies the constraint—not explicitly mentioned in Henson et al. (2009)—that the argument of the log function be strictly positive (the logarithm is defined only for strictly positive arguments). Because it always holds that \(1 + \mathrm{e}^{ \beta _{j0} } > 0\), the constraint reduces to the requirement that the denominator of the fraction in parentheses be strictly positive:

$$\begin{aligned} 1 + \mathrm{e}^{\beta _{j0} + \beta _{j1} } + \mathrm{e}^{\beta _{j0} + \beta _{j2} } - \mathrm{e}^{\beta _{j0} + \beta _{j1} + \beta _{j2}} > 0 \end{aligned}$$
(5)

Certain software packages (e.g., Mplus) cannot handle this constraint in the form of Equation 5, but require that it be reformulated as an upper bound on one of the parameters, \(\beta _{j0}\), \(\beta _{j1}\), or \(\beta _{j2}\). Thus, arbitrarily choosing \(\beta _{j2}\), the parameter with the highest index \(k=K=2\), Equation 5 is re-expressed as

$$\begin{aligned} \mathrm{e}^{\beta _{j2}} < \frac{ 1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1} } }{ \mathrm{e}^{\beta _{j0} + \beta _{j1} } - \mathrm{e}^{\beta _{j0}} } \end{aligned}$$

Note that \(\mathrm{e}^{\beta _{j0} + \beta _{j1} } - \mathrm{e}^{\beta _{j0}} > 0\) must be true because \(\mathrm{e}^{ \beta _{j0} } > 0\) and \(\mathrm{e}^{ \beta _{j1} } > 1\) due to \(\beta _{j1} > 0\). Hence, \(\beta _{j2} < \ln \Big ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1} } \Big ) - \ln \Big ( \mathrm{e}^{ \beta _{j1} } - 1 \Big ) - \beta _{j0}\), and the constraints on Equation 3 are given in explicit form as

$$\begin{aligned} (1)&0 < \beta _{j1} \\ (2)&0 < \beta _{j2} < \ln \Big ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1} } \Big ) - \ln \Big ( \mathrm{e}^{ \beta _{j1} } - 1 \Big ) - \beta _{j0} \end{aligned}$$
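The following R sketch (hypothetical \(\beta \)-values) verifies constraint (2) and evaluates \(\beta _{j12}\) from Equation 4:

    # Check the explicit K = 2 constraints and compute beta_j12 (Equation 4).
    b0 <- -1.0; b1 <- 1.2; b2 <- 0.8                         # hypothetical values
    upper <- log(1 + exp(b0 + b1)) - log(exp(b1) - 1) - b0   # upper bound on b2
    b2 > 0 && b2 < upper                                     # TRUE: constraint (2) holds
    b12 <- log((1 + exp(b0)) /
               (1 + exp(b0 + b1) + exp(b0 + b2) - exp(b0 + b1 + b2)))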

2 The Reduced RUM as a General Cognitive Diagnosis Model Based on the Log Link

The Reduced RUM can also be expressed as a general CDM using the log link. Rewrite Equation 2 as

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha })= & {} \pi ^*_j \prod _{k=1}^K r_{jk}^{*\, q_{jk}(1-\alpha _{k})}\\= & {} \mathrm{e}^{ \ln (\pi ^*_j) + \sum _{k=1}^K \ln \Big (r_{jk}^{*\, q_{jk}(1-\alpha _{k})}\Big ) }\\= & {} \mathrm{e}^{ \ln (\pi ^*_j) + \sum _{k=1}^K \Big (q_{jk}\ln (r_{jk}^{*}) - q_{jk}\alpha _{k}\ln (r_{jk}^{*})\Big ) }\\= & {} \mathrm{e}^{ \ln (\pi ^*_j) + \sum _{k=1}^K \ln (r_{jk}^{*})q_{jk} + \sum _{k=1}^K -\ln (r_{jk}^{*})q_{jk}\alpha _{k} } \end{aligned}$$

Setting

$$\begin{aligned} \delta _{j0}= & {} \ln (\pi ^*_j) + \sum _{k=1}^K \ln (r_{jk}^{*})q_{jk}\\ \delta _{jk}= & {} -\ln (r_{jk}^{*}) \end{aligned}$$

then results in the IRF of the Reduced RUM as a general CDM with log link:

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha })&= \,\, \mathrm{e}^{\delta _{j0} + \sum _{k=1}^K \delta _{jk}q_{jk} \alpha _k}\\ \text{ subject } \text{ to: }&\, (1) \,\, \delta _{j0} < 0 \nonumber \\&(2) \,\, 0 < \sum _{k=1}^{K}\delta _{jk}\alpha _k < |\delta _{j0}| \nonumber \end{aligned}$$
(6)

Constraints (1) and (2) are implied by the restrictions on the traditional parameters of the Reduced RUM, \(0 < \pi ^*_j, r^*_{jk} < 1\). (Note that \(r_{jk}^{*} = \mathrm{e}^{ -\delta _{jk} }\) due to \(\delta _{jk} = -\ln (r_{jk}^{*})\); and because \(\delta _{j0} = \ln (\pi ^*_j) + \sum _{k=1}^K \ln (r_{jk}^{*})q_{jk}\) it holds that \(\pi ^*_j = \mathrm{e}^{\delta _{j0} + \sum _{k=1}^K \delta _{jk} q_{jk} }\).) Observe that the log link-based IRF of the Reduced RUM has a simpler parameter structure than the logit model because no interaction effects are included in the model—the log-link form is a main-effects-only model.
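A short R sketch (hypothetical traditional parameters) illustrates the conversion between the two parameterizations and the constraint checks:

    # Convert traditional parameters to log-link parameters and back
    # (hypothetical item requiring two attributes).
    pi_star <- 0.9; r_star <- c(0.6, 0.7); q <- c(1, 1)
    delta_k <- -log(r_star)                          # main-effect coefficients
    delta_0 <- log(pi_star) + sum(log(r_star) * q)   # intercept
    exp(-delta_k)                                    # recovers r_star
    exp(delta_0 + sum(delta_k * q))                  # recovers pi_star
    delta_0 < 0                                      # constraint (1)
    sum(delta_k * q) < abs(delta_0)                  # constraint (2) for alpha = q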

3 The Reduced RUM as a General Cognitive Diagnosis Model: The Connection Between the Log Link and the Logit Link

Henson et al. (2009) derived the reparameterization of the Reduced RUM as a logit model from the equivalent, but mathematically more tractable Inverse RUM. This section demonstrates that the log link provides an alternative transformation for deriving the Reduced RUM in the form of a general CDM using the logit link.

Consider item \(j\) requiring the mastery of \(K\) attributes (i.e., \(\mathbf {q}_j=(11 \ldots 1)^T\), a \(K\)-dimensional vector of ones); its IRF based on the log link is given by Equation 6. The parameters of the logit model are derived by inspecting Equation 6 separately for each of the \(2^K=M\) attribute profiles of the different proficiency classes.

3.1 The Attribute Profile \(\varvec{\alpha }=(00 \cdots 0)^T\)

The response probabilities of examinees in the first proficiency class are obtained from Equation 6 by noting that, for \(\varvec{\alpha }_1=(00 \cdots 0)^T\), all attribute terms vanish, leaving only the intercept \(\delta _{j0}\):

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha }_1) = \mathrm{e}^{\delta _{j0}} \end{aligned}$$
(7)

The equivalent IRF using the logit link must result in the same item response probability. Thus, the logit model too must be an intercept-only model:

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha }_1) = \frac{ \mathrm{e}^{ \beta _{j0} } }{ 1 + \mathrm{e}^{ \beta _{j0} } } \end{aligned}$$
(8)

Then, equating Equations 7 and 8 and solving for \(\delta _{j0}\) gives

$$\begin{aligned} \delta _{j0}= & {} \beta _{j0} - \ln \left( 1 + \mathrm{e}^{ \beta _{j0} } \right) \end{aligned}$$

3.2 The Attribute Profiles \(\varvec{\alpha }=\mathbf {e}_k\)

Let \(\mathbf {e}_k\) denote a unit vector with the \(k\)th element equal to 1, and the remaining entries all 0. Thus, \(\varvec{\alpha }=\mathbf {e}_k\) indicates the single-attribute profile of the proficiency class whose examinees master only the \(k\)th attribute. The IRF using the log link is then

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha }) = \mathrm{e}^{\delta _{j0} + \delta _{jk}} \end{aligned}$$

The equivalent logit model is

$$\begin{aligned} P(Y_j=1 \mid \varvec{\alpha }) = \frac{\mathrm{e}^{ \beta _{j0} + \beta _{jk} } }{ 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk} } } \end{aligned}$$

Equating the two IRFs, substituting \(\delta _{j0} = \beta _{j0} - \ln \Big (1 + \mathrm{e}^{ \beta _{j0} } \Big )\), and solving for \(\delta _{jk}\) gives

$$\begin{aligned} \delta _{jk} = \beta _{jk} - \ln \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}} \Big ) + \ln \Big (1 + \mathrm{e}^{\beta _{j0}} \Big ) \end{aligned}$$

3.3 The Attribute Profiles \(\varvec{\alpha }=(1\cdots 10 \cdots 0)^T\), where \(||\varvec{\alpha }||=K'\), \(K'=2,\ldots ,K\)

Without loss of generality, the entries in \(\varvec{\alpha }\) are assumed to be ordered such that the first \(K^{\prime }=2,\ldots ,K\) positions in \(\varvec{\alpha }\) are occupied by the entries \(\alpha _k = 1\). The IRF based on the log link is

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha }) = \mathrm{e}^{\delta _{j0} + \sum ^{K'}_{k=1} \delta _{jk}} \end{aligned}$$

which, upon substituting the expressions obtained earlier for \(\delta _{j0}\) and \(\delta _{jk}\), is written as

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha }) = \mathrm{e}^{ \beta _{j0} + \sum ^{K'}_{k=1} \beta _{jk} - \sum ^{K'}_{k=1} \ln \big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}} \big ) + (K'-1)\ln \big (1 + \mathrm{e}^{\beta _{j0}} \big ) } \end{aligned}$$
(9)

It is trivial to verify that the equivalent IRF using the logit link must include all \(\sum ^{K^{\prime }}_{k=2} \binom{K^{\prime }}{k}\) interaction terms:

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha }) = \frac{ \mathrm{e}^{\beta _{j0} + \sum ^{K'}_{k=1}\beta _{jk} + \sum ^{K'}_{k^{\prime }=k+1} \sum ^{K'-1}_{k=1} \beta _{jkk^{\prime }} + \cdots + \beta _{j1 \ldots K'}} }{ 1 +\mathrm{e}^{\beta _{j0} + \sum ^{K'}_{k=1}\beta _{jk} + \sum ^{K'}_{k^{\prime }=k+1} \sum ^{K'-1}_{k=1} \beta _{jkk^{\prime }} + \cdots + \beta _{j1 \ldots K'}} } \end{aligned}$$
(10)

(Otherwise, the fundamental requirement that the item response probabilities of Equations 9 and 10 be equal would be violated.) Equating the IRFs of Equations 9 and 10 and solving for \(\beta _{j1 \ldots K'}\) then results in

$$\begin{aligned} \beta _{j1 \ldots K'}= & {} \ln \left( \frac{ \Big (1 + \mathrm{e}^{\beta _{j0}} \Big )^{K'-1}}{\prod ^{K'}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}} \Big ) - \Big (1 + \mathrm{e}^{ \beta _{j0}} \Big )^{K'-1} \mathrm{e}^{ \beta _{j0} + \sum ^{K'}_{k=1} \beta _{jk} } } \right) \nonumber \\&\quad - \sum ^{K'}_{k^{\prime }=k+1} \sum ^{K'-1}_{k=1} \beta _{jkk^{\prime }} - \cdots - \sum ^{K'}_{k_{K'-1}=k_{K'-2}+1} \cdots \sum ^3_{k_2=k_1+1} \sum ^2_{k_1=1} \beta _{j k_1\ldots k_{K'-1}} \end{aligned}$$
(11)

From the general expression of \(\beta _{j1 \ldots K'}\) in Equation 11, the coefficient of any \(K'\)-way interaction term is readily obtained, provided the indices of the summation and product operators are adjusted accordingly. For example, consider item \(j\) requiring the mastery of \(K=3\) attributes. If the coefficients of the three two-way interactions are sought, then \(K^{\prime }=2\) and Equation 11 provides

$$\begin{aligned} \beta _{j1K^{\prime }} = \beta _{j12}= & {} \ln \left( \frac{(1 + \mathrm{e}^{ \beta _{j0}})^{2-1}}{\prod ^{2}_{k=1}(1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}) - (1 + \mathrm{e}^{ \beta _{j0}})^{2-1} \mathrm{e}^{ \beta _{j0} + \sum ^{2}_{k=1} \beta _{jk}}} \right) \\= & {} \ln \left( \frac{1 + \mathrm{e}^{ \beta _{j0}}}{(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1}})(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j2}}) - (1 + \mathrm{e}^{ \beta _{j0}}) \mathrm{e}^{ \beta _{j0} + \beta _{j1} + \beta _{j2}}} \right) \end{aligned}$$

Note, however, that for \(\beta _{j13}\) and \(\beta _{j23}\), the index \(k\) must be adjusted to the values in the sets \(\{1, 3\}\) and \(\{2, 3\}\), respectively, which refer to the indices of the main effects constituting the corresponding interaction terms:

$$\begin{aligned} \beta _{j13}= & {} \ln \left( \frac{(1 + \mathrm{e}^{ \beta _{j0}})^{2-1}}{\prod _{k \in \{1,3\}}(1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}) - (1 + \mathrm{e}^{ \beta _{j0}})^{2-1} \mathrm{e}^{ \beta _{j0} + \sum _{k \in \{1,3\}} \beta _{jk}}} \right) \\= & {} \ln \left( \frac{1 + \mathrm{e}^{ \beta _{j0}}}{(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1}})(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j3}}) - (1 + \mathrm{e}^{ \beta _{j0}}) \mathrm{e}^{ \beta _{j0} + \beta _{j1} + \beta _{j3}}} \right) \end{aligned}$$

and

$$\begin{aligned} \beta _{j23}= & {} \ln \left( \frac{(1 + \mathrm{e}^{ \beta _{j0}})^{2-1}}{\prod _{k \in \{2,3\}}(1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}) - (1 + \mathrm{e}^{ \beta _{j0}})^{2-1} \mathrm{e}^{ \beta _{j0} + \sum _{k \in \{2,3\}} \beta _{jk}}} \right) \\= & {} \ln \left( \frac{1 + \mathrm{e}^{ \beta _{j0}}}{(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j2}})(1 + \mathrm{e}^{ \beta _{j0} + \beta _{j3}}) - (1 + \mathrm{e}^{ \beta _{j0}}) \mathrm{e}^{ \beta _{j0} + \beta _{j2} + \beta _{j3}}} \right) \end{aligned}$$

The coefficient of the three-way interaction (i.e., \(K'=K=3\)) is obtained from Equation 11 as

$$\begin{aligned} \beta _{j123}= & {} \ln \left( \frac{ (1 + \mathrm{e}^{ \beta _{j0}})^{(3-1)} }{ \prod ^{3}_{k=1} ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} ) - ( 1 + \mathrm{e}^{ \beta _{j0}} )^{3-1} \mathrm{e}^{ \beta _{j0} + \sum ^{3}_{k=1} \beta _{jk} } } \right) - \sum ^{3}_{k^{\prime }=k+1} \sum ^{2}_{k=1} \beta _{jkk^{\prime }} \\= & {} \ln \left( \frac{ (1 + \mathrm{e}^{ \beta _{j0} } )^2 }{ ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{j1}}) (1 + \mathrm{e}^{ \beta _{j0} + \beta _{j2}}) (1 + \mathrm{e}^{ \beta _{j0} + \beta _{j3} }) - (1 + \mathrm{e}^{ \beta _{j0} } )^2 \mathrm{e}^{ \beta _{j0} + \beta _{j1} + \beta _{j2} + \beta _{j3}} } \right) \\&- \beta _{j13} - \beta _{j12} - \beta _{j23} \end{aligned}$$
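The following R sketch (hypothetical intercept and main-effect values) assembles all four interaction coefficients for \(K=3\) from these formulas and verifies numerically that the resulting logit IRF reproduces the log link IRF of Equation 6 for the full-mastery profile:

    # Interaction coefficients for K = 3 (hypothetical main-effect values).
    b0 <- -1.4; b <- c(0.6, 0.7, 0.65)
    two_way <- function(k1, k2) {
      log((1 + exp(b0)) /
          ((1 + exp(b0 + b[k1])) * (1 + exp(b0 + b[k2])) -
           (1 + exp(b0)) * exp(b0 + b[k1] + b[k2])))
    }
    b12 <- two_way(1, 2); b13 <- two_way(1, 3); b23 <- two_way(2, 3)
    b123 <- log((1 + exp(b0))^2 /
                (prod(1 + exp(b0 + b)) - (1 + exp(b0))^2 * exp(b0 + sum(b)))) -
            b12 - b13 - b23
    # Logit IRF for alpha = (1,1,1) ...
    plogis(b0 + sum(b) + b12 + b13 + b23 + b123)
    # ... agrees with the log link IRF built from delta_0 and delta_k:
    d0 <- b0 - log(1 + exp(b0))
    dk <- b - log(1 + exp(b0 + b)) + log(1 + exp(b0))
    exp(d0 + sum(dk))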

3.4 The Constraint on \(\beta _{j1 \ldots K'}\)

The expression of \(\beta _{j1 \ldots K'}\) in Equation 11 is defined only if the argument of the log function is strictly positive. This condition is satisfied if

$$\begin{aligned} \prod ^{K'}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) - \Big (1 + \mathrm{e}^{ \beta _{j0}} \Big )^{K'-1} \mathrm{e}^{ \beta _{j0} + \sum ^{K'}_{k=1} \beta _{jk}} > 0 \end{aligned}$$
(12)

Recall that certain software packages (e.g., Mplus) cannot handle the constraint in the form of Equation 12, but require that it be rephrased as an upper bound on one of the \(K^{\prime }+1\) parameters. The convention adopted here is to choose (arbitrarily) the last parameter, \(\beta _{jK'}\). Thus, Equation 12 is rewritten as

$$\begin{aligned} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jK'}} \Big ) \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) - \Big (1 + \mathrm{e}^{ \beta _{j0}} \Big )^{K'-1} \mathrm{e}^{ \beta _{j0} + \sum ^{K'-1}_{k=1} \beta _{jk}} \mathrm{e}^{ \beta _{jK'}} > 0 \end{aligned}$$

Then,

$$\begin{aligned} \prod ^{K'-1}_{k=1} \left( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \right) + \mathrm{e}^{ \beta _{j0}} \mathrm{e}^{ \beta _{jK'}} \prod ^{K'-1}_{k=1} \left( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \right) - \mathrm{e}^{ \beta _{j0}} \mathrm{e}^{ \beta _{jK'}} \left( 1 + \mathrm{e}^{ \beta _{j0}} \right) ^{K'-1} \mathrm{e}^{\sum ^{K'-1}_{k=1} \beta _{jk}} > 0 \end{aligned}$$
$$\begin{aligned} \prod ^{K'-1}_{k=1} \left( 1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}} \right)> & {} \mathrm{e}^{ \beta _{jK'}} \mathrm{e}^{ \beta _{j0}} \left( \Big (1 + \mathrm{e}^{\beta _{j0}} \Big )^{K'-1} \mathrm{e}^{ \sum ^{K'-1}_{k=1} \beta _{jk}} - \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) \right) \nonumber \\= & {} \mathrm{e}^{ \beta _{jK'}} \mathrm{e}^{ \beta _{j0}} \left( \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0}} \Big ) \mathrm{e}^{\beta _{jk}} - \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big )\right) \nonumber \\= & {} \mathrm{e}^{ \beta _{jK'}} \mathrm{e}^{ \beta _{j0}} \left( \prod ^{K'-1}_{k=1} \Big (\mathrm{e}^{ \beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) -\prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big )\right) \end{aligned}$$
(13)

Because \(\mathrm{e}^{\beta _{jk}}>1\) for all \(k\) (due to \(\beta _{jk}>0\))

$$\begin{aligned} \prod ^{K'-1}_{k=1} \Big (\mathrm{e}^{ \beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) -\prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) > 0 \end{aligned}$$

Thus,

$$\begin{aligned} \mathrm{e}^{\beta _{j0}}\left( \prod ^{K'-1}_{k=1} \Big (\mathrm{e}^{ \beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) -\prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big )\right) > 0 \end{aligned}$$

must be true, and Equation 13 can be written as

$$\begin{aligned} \mathrm{e}^{ \beta _{jK'}}< & {} \frac{ \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \Big ) }{\mathrm{e}^{ \beta _{j0}} \Big (\prod ^{K'-1}_{k=1} \big (\mathrm{e}^{ \beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \big ) -\prod ^{K'-1}_{k=1} \big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}} \big )\Big )} \end{aligned}$$

which implies the general expression of the upper bound, \(U^{(1 \ldots K^{\prime })}_{\beta _{jK^{\prime }}}\):

$$\begin{aligned} \beta _{jK'}< & {} U^{(1 \ldots K^{\prime })}_{\beta _{jK^{\prime }}} \nonumber \\= & {} \sum ^{K'-1}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) - \ln \left( \prod ^{K'-1}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) - \prod ^{K'-1}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big ) \right) - \beta _{j0} \nonumber \\ \end{aligned}$$
(14)

Based on Equation 14, the upper bound on any main-effect coefficient can be identified provided the indices of the summation and the product operator are adequately adjusted.
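Equation 14 translates directly into a small R function; in the sketch below (values hypothetical), the vector b holds the coefficients of the other main effects entering the interaction:

    # Upper bound of Equation 14 on the coefficient of the last main effect.
    upper_bound <- function(b0, b) {
      sum(log(1 + exp(b0 + b))) -
        log(prod(exp(b) + exp(b0 + b)) - prod(1 + exp(b0 + b))) - b0
    }
    upper_bound(b0 = -1.4, b = c(0.6, 0.7))   # bound on beta_j3 from the 3-way term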

3.5 The Least Upper Bound on the Coefficient of the Last Main Effect

In the previous section, it was shown that the constraint on the coefficient of any interaction term can be rephrased as an upper bound on the coefficient of any main effect that constitutes the interaction term (recall that the intercept would do as well). By convention, this upper bound is usually imposed on the coefficient of the last main effect, \(\beta _{jK'}\) (i.e., the coefficient of the main effect with the highest index \(k=K'\)). In this section, it is demonstrated that the upper bound derived from the constraint on the coefficient of the highest-order interaction term always guarantees the least upper bound—that is, all upper bounds derivable from the constraints on the coefficients of any lower-order interaction terms are larger.

Consider an item that requires the mastery of \(K\) attributes. Hence, the coefficient of the last main effect is \(\beta _{jK}\). There are \(\sum _{k=1}^{K-1}\binom{K-1}{k}\) interaction terms that involve the last main effect, \(\alpha _K\). Thus, \(\sum _{k=1}^{K-1}\binom{K-1}{k}\) different upper bounds on \(\beta _{jK}\) can be derived. (For example, let \(K=4\); then there are \(\sum _{k=1}^{3}\binom{3}{k}=7\) upper bounds on \(\beta _{j4}\).) The least upper bound on \(\beta _{jK}\) can be identified by exploring the relations among these candidate upper bounds. Consider the upper bound on \(\beta _{jK}\), \(U_{\beta _{jK}}^{(1\ldots K)}\), obtained from the constraint on the coefficient of the \(K\)-way interaction, \(\beta _{j1 \ldots K}\),

$$\begin{aligned} \beta _{jK}< & {} U^{(1\ldots K)}_{\beta _{jK}} \\= & {} \sum ^{K-1}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) - \ln \left( \prod ^{K-1}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) - \prod ^{K-1}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\beta _{j0} \end{aligned}$$

Without loss of generality, \(U^{(1\ldots K)}_{\beta _{jK}}\) is compared to the upper bounds on \(\beta _{jK}\) obtainable from the constraints on the coefficients of any \(K'\)-way interaction, \(\beta _{j((K-K'+1)\ldots K)}\), where \(K'=2,\ldots , K-1\):

$$\begin{aligned} \beta _{jK}&<\, U^{((K-K'+1)\ldots K)}_{\beta _{jK}}\\&=\, \sum ^{K-1}_{k=K-K'+1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\\&\quad -\,\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) -\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\beta _{j0} \end{aligned}$$

The difference of any pair of upper bounds is always strictly positive:

$$\begin{aligned}&U^{((K-K'+1)\ldots K)}_{\beta _{jK}}-U^{(1\ldots K)}_{\beta _{jK}}\\&\quad = \sum ^{K-1}_{k=K-K'+1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\\&\qquad -\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) -\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\beta _{j0}\\&\qquad -\sum ^{K-1}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) +\ln \left( \prod ^{K-1}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) -\prod ^{K-1}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) +\beta _{j0}\\&\quad = \ln \left( \prod ^{K-1}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) -\prod ^{K-1}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \\&\qquad -\ln \left( \prod ^{K-1}_{k=K\!-\!K'\!+\!1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big ) -\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\sum ^{K-K'}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \\&\quad = \ln \left( \prod ^{K-K'}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\right. \\&\qquad \left. -\prod ^{K-K'}_{k=1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \\&\qquad -\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big ) \!-\!\prod ^{K-1}_{k=K-K'+1} \Big (1 \!+\! \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\sum ^{K-K'}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \\&\quad > \ln \left( \prod ^{K-K'}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\right. \\&\qquad \left. -\prod ^{K-K'}_{k=1} (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{\beta _{j0} + \beta _{jk}})\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \\&\qquad -\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big ) \!-\!\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \!-\!\sum ^{K-K'}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \end{aligned}$$
$$\begin{aligned}&\quad = \ln \left( \prod ^{K-K'}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\left( \prod ^{K-1}_{k=K-K'+1} (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}) -\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \right) \nonumber \\&\qquad -\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big ) \!-\!\prod ^{K-1}_{k=K-K'+1} \Big (1 \!+\! \mathrm{e}^{\beta _{j0} \!+\! \beta _{jk}}\Big )\right) -\sum ^{K-K'}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \\&\quad = \ln \left( \prod ^{K-K'}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\right) + \ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big )\right. \\&\qquad \left. -\prod ^{K-1}_{k=K-K'+1} \Big (1 + \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) \\&\qquad -\ln \left( \prod ^{K-1}_{k=K-K'+1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big ) \!-\!\prod ^{K-1}_{k=K-K'+1} \Big (1 \!+\! \mathrm{e}^{\beta _{j0} + \beta _{jk}}\Big )\right) -\sum ^{K-K'}_{k=1}\ln \Big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \\&\quad = \ln \left( \prod ^{K-K'}_{k=1} \Big (\mathrm{e}^{\beta _{jk}} \!+\! \mathrm{e}^{ \beta _{j0} \!+\! \beta _{jk}}\Big )\right) \!-\!\sum ^{K-K'}_{k=1}\ln \Big (1 \!+\! \mathrm{e}^{ \beta _{j0} + \beta _{jk}}\Big ) \\&\quad >\, 0 \quad \text{ due } \text{ to }\, \mathrm{e}^{\beta _{jk}}>1 \end{aligned}$$

(After minor adjustments of the indices of the product and the summation operators, this result generalizes to any other \(K'\)-way interaction that involves the main effect \(\alpha _K\).) Among all upper bounds on \(\beta _{jK}\) that can be derived from the constraint on the coefficient of any interaction term involving the main effect \(\alpha _K\), the upper bound obtainable from the coefficient of the highest-order interaction term is always the least upper bound.
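Reusing the function upper_bound() from the previous sketch, this result can be checked numerically for \(K=3\) (hypothetical values): the bound derived from the three-way interaction is smaller than either bound derived from a two-way interaction involving \(\alpha _3\):

    b0 <- -1.4; b1 <- 0.6; b2 <- 0.7
    upper_bound(b0, c(b1, b2))   # from the 3-way term: the least upper bound
    upper_bound(b0, b1)          # from the (1,3) 2-way term: larger
    upper_bound(b0, b2)          # from the (2,3) 2-way term: larger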

4 Practical Illustration: Fitting the Reduced RUM Using the LCA Routine in Mplus

To illustrate the practical application of the theoretical results on the reparameterization of the Reduced RUM as a logit model, two synthetic data sets and a real-world data set on the “Examination for the Certificate of Proficiency in English” (ECPE) were analyzed. The model was estimated using the EM algorithm implemented in the LCA routine in Mplus; to provide a benchmark, all data sets were also fitted with the MCMC routine available in OpenBUGS. Parts of the Mplus syntax for implementing the parameterization of the logistic function and the associated parameter constraints can be found in the appendices (for general guidance on writing Mplus command files for fitting CDMs, consult Rupp et al., 2010, and Templin & Hoffman, 2013).

4.1 The Synthetic Data Sets

Two data sets were simulated each containing the responses of \(N=3000\) examinees to \(J = 30\) items conforming to the Reduced RUM involving \(K=3\) and \(K=4\) attributes, respectively. The examinees’ attribute profiles were generated based on the multivariate normal threshold model (Chiu, Douglas, & Li, 2009). Each attribute profile was linked to a latent continuous ability vector, \(\varvec{\theta }_i = (\theta _{i1}, \ldots , \theta _{iK})^T \sim \mathcal {N}_K (\varvec{0}, \varvec{\Sigma })\), with values along the main diagonal of \(\varvec{\Sigma }\) equal to 1.00 and off-diagonal entries set to 0.50. The \(\varvec{\theta }_i\) were randomly sampled and their components dichotomized according to

$$\begin{aligned} \alpha _{ik} = \left\{ \begin{array}{l@{\quad }l} 1 &{} \text{ if }\, \theta _{ik} \ge \Phi ^{-1} \Big (\frac{k}{K+1}\Big )\\ 0 &{} \text{ otherwise } \end{array} \right. \end{aligned}$$

resulting in the attribute profile \(\varvec{\alpha }_i\).

For all items, the baseline parameter was fixed at \(\pi ^{*}_j = 0.90\); the penalty parameters, \(r^{*}_{jk}\), were all set to 0.60. For \(K=3\) attributes, the Q-matrix replicated each of the single-attribute and two-attribute item-attribute profiles four times; the (single) three-attribute item-attribute profile was replicated six times. For \(K=4\) attributes, the Q-matrix replicated each of the feasible item-attribute profiles twice.
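The data-generating design just described can be summarized in a few lines of R; the sketch below (using MASS::mvrnorm; the seed is arbitrary) generates the attribute profiles and the responses to one three-attribute item:

    library(MASS)
    set.seed(1)
    N <- 3000; K <- 3
    Sigma <- matrix(0.50, K, K); diag(Sigma) <- 1.00
    theta <- mvrnorm(N, mu = rep(0, K), Sigma = Sigma)   # latent ability vectors
    cuts  <- qnorm((1:K) / (K + 1))                      # attribute thresholds
    alpha <- 1 * sweep(theta, 2, cuts, ">=")             # N x K attribute profiles
    # Responses under the Reduced RUM with pi* = 0.90 and r* = 0.60:
    irf <- function(a, q) 0.9 * prod(0.6 ^ (q * (1 - a)))
    q_j <- c(1, 1, 1)                                    # a three-attribute item
    y_j <- rbinom(N, 1, apply(alpha, 1, irf, q = q_j))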

4.2 Results: The Synthetic Data Set Involving \(K=3\) Attributes

On a machine with 8 GB RAM, 2.30 GHz Intel Core processor, and 64-bit OS, the EM algorithm (Mplus) used 2,054 seconds of CPU time; MCMC (OpenBUGS) required 16,109 seconds of CPU time. (For the simulated data, to secure high accuracy of the MCMC parameter estimates, 8,000 and 10,000 were chosen as the length of the burn-in and the number of updates, respectively.) None of the estimates of the (traditional) parameters of the Reduced RUM obtained from the EM algorithm deviated from the known true model parameters by more than 0.080; for the MCMC estimates, the maximum deviation from the known true model parameters was 0.082.

The parameter estimates of the Reduced RUM in traditional form, \(\hat{\pi }^{*}_j\), \(\hat{r}^{*}_{jk}\), \(k=1,2,3\), were retrieved from the EM-based estimates of the parameters of the logit model according to the conversion formulas derived in Sections 2 and 3:

$$\begin{aligned} \pi ^{*}_j= & {} \mathrm{e}^{ \delta _{j0} + \sum ^K_{k=1} \delta _{jk}q_{jk} } = \mathrm{e}^{\beta _{j0} - \ln (1 + \mathrm{e}^{ \beta _{j0} } ) + \sum ^K_{k=1} \left( \beta _{jk}q_{jk} - \ln \big ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk} }\big ) q_{jk} + \ln \big ( 1 + \mathrm{e}^{ \beta _{j0} }\big ) q_{jk} \right) }\\= & {} \frac{ \mathrm{e}^{ \beta _{j0} + \sum ^K_{k=1} \beta _{jk} q_{jk} } \prod ^K_{k=1} \big (1 + \mathrm{e}^{ \beta _{j0} }\big )^{q_{jk}} }{ \big (1 + \mathrm{e}^{ \beta _{j0} }\big ) \prod ^K_{k=1} \big (1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk} }\big )^{q_{jk}} } \end{aligned}$$

and

$$\begin{aligned} r^{*}_{jk} = \mathrm{e}^{-\delta _{jk}} = \mathrm{e}^{ - \beta _{jk} + \ln \big ( 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk} }\big ) - \ln \big ( 1+ \mathrm{e}^{ \beta _{j0} }\big ) } = \frac{ 1 + \mathrm{e}^{ \beta _{j0} + \beta _{jk} } }{ \mathrm{e}^{ \beta _{jk} } \big (1 + \mathrm{e}^{ \beta _{j0} }\big ) } \end{aligned}$$

As an illustration of these conversions, consider item 7. The coefficient estimates \(\hat{\beta }_{70}=-1.397\), \(\hat{\beta }_{71}=0.625\), \(\hat{\beta }_{72}=0.741\), \(\hat{\beta }_{73}=0.668\), \(\hat{\beta }_{712}=0.209\), \(\hat{\beta }_{713}=0.179\), \(\hat{\beta }_{723}=0.232\), and \(\hat{\beta }_{7123}=0.868\) were given in the Mplus output (not reported here). The estimate \(\hat{\pi }^{*}_7\) of item 7 was then computed as

$$\begin{aligned} \hat{\pi }^{*}_7 = \frac{ \mathrm{e}^{-1.397+0.625+0.741+0.668}(1+\mathrm{e}^{-1.397})^2 }{ (1+\mathrm{e}^{-1.397+0.625})(1+\mathrm{e}^{-1.397+0.741})(1+\mathrm{e}^{-1.397+0.668}) } = 0.8935971 \end{aligned}$$

The estimates of \(r^{*}_{71}\), \(r^{*}_{72}\), and \(r^{*}_{73}\) were

$$\begin{aligned} \hat{r}^{*}_{71}= & {} \frac{ 1+\mathrm{e}^{-1.397+0.625} }{ \mathrm{e}^{0.625}(1+\mathrm{e}^{-1.397}) } = 0.6274156\\ \hat{r}^{*}_{72}= & {} \frac{ 1+\mathrm{e}^{-1.397+0.741} }{ \mathrm{e}^{0.741}(1+\mathrm{e}^{-1.397}) } = 0.5804160\\ \hat{r}^{*}_{73}= & {} \frac{ 1+\mathrm{e}^{-1.397+0.668} }{ \mathrm{e}^{0.668}(1+\mathrm{e}^{-1.397}) } = 0.6093545 \end{aligned}$$

The conversion of the parameter estimates can be carried out directly in Mplus; the syntax for item 7 is provided in Appendix 1. As an aside, the estimates of the \(\beta \)-coefficients can also be used to verify that the coefficients of the interaction terms are functions of the parameters of the constituting main effects—as an example, consider \(\beta _{712}\):

$$\begin{aligned} \hat{\beta }_{712}= & {} \ln \left( \frac{ 1+\mathrm{e}^{-1.397} }{ (1 + \mathrm{e}^{-1.397+0.625})(1 + \mathrm{e}^{-1.397+0.741}) - (1+\mathrm{e}^{-1.397}) \mathrm{e}^{-1.397+0.625+0.741} } \right) \\= & {} 0.2095423 \end{aligned}$$
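These hand computations for item 7 can be reproduced in a few lines of R (a sketch; the estimates are those reported above):

    # Conversions for item 7 from the Mplus coefficient estimates.
    b0 <- -1.397; b <- c(0.625, 0.741, 0.668)
    pi7  <- exp(b0 + sum(b)) * (1 + exp(b0))^2 / prod(1 + exp(b0 + b))
    r7   <- (1 + exp(b0 + b)) / (exp(b) * (1 + exp(b0)))
    b712 <- log((1 + exp(b0)) /
                ((1 + exp(b0 + b[1])) * (1 + exp(b0 + b[2])) -
                 (1 + exp(b0)) * exp(b0 + b[1] + b[2])))
    round(c(pi7, r7, b712), 7)
    # 0.8935971 0.6274156 0.5804160 0.6093545 0.2095423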

Fitting the data also involved modeling the relation between the latent ability dimensions underlying the attribute profiles, often referred to as the higher-order structure among the attributes. Rupp et al. (2010) discussed several models that can be used for this purpose. In the simulation studies reported here, the saturated log-linear model was used. (For technical details, consult Rupp et al., 2010, Chaps. 8 and 9; the Mplus syntax is provided in Appendix 1. When using MCMC (OpenBUGS), the relation between the latent traits was analyzed based on the multivariate normal threshold model; it can be shown that the two approaches are mathematically equivalent.) The parameter estimates of the saturated log-linear model were G_0 = 0.007, G_11 = \(-\)1.014, G_12 = \(-\)1.052, G_13 = \(-\)1.130, G_212 = 1.093, G_213 = 1.032, G_223 = 1.012, and G_3123 = 0.053. Based on these estimates, the proportions of examinees in each proficiency class were computed, which were then used to estimate the tetrachoric correlations between pairs of attributes (the known true tetrachoric correlations are given in parentheses): \((\alpha _1, \alpha _2) = 0.498\) (0.488); \((\alpha _1, \alpha _3) = 0.484\) (0.496); and \((\alpha _2, \alpha _3) = 0.480\) (0.489).
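As a sketch of how the class proportions follow from the reported log-linear estimates (the intercept G_0 cancels in the normalization), consider the following R fragment; estimating the tetrachoric correlations from the resulting 2 \(\times \) 2 tables would additionally require a routine such as polycor::polychor, an assumption about available tooling rather than part of the analyses above:

    # Proficiency-class proportions implied by the saturated log-linear model.
    G1 <- c(-1.014, -1.052, -1.130)              # main effects
    G2 <- c(1.093, 1.032, 1.012)                 # 2-way terms: (1,2), (1,3), (2,3)
    G3 <- 0.053                                  # 3-way term
    alpha <- as.matrix(expand.grid(rep(list(0:1), 3)))
    eta <- alpha %*% G1 +
      cbind(alpha[, 1] * alpha[, 2],
            alpha[, 1] * alpha[, 3],
            alpha[, 2] * alpha[, 3]) %*% G2 +
      alpha[, 1] * alpha[, 2] * alpha[, 3] * G3
    prop <- as.vector(exp(eta) / sum(exp(eta)))  # class proportions, sum to 1
    # Joint 2 x 2 distribution of (alpha_1, alpha_2), the basis for the
    # tetrachoric correlation between the first two attributes:
    tapply(prop, list(alpha[, 1], alpha[, 2]), sum)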

4.3 Results: The Synthetic Data Set Involving \(K=4\) Attributes

On a machine with 8 GB RAM, 2.30 GHz Intel Core processor, and 64-bit OS, the EM algorithm (Mplus) used 7,430 seconds of CPU time; MCMC (OpenBUGS) required 23,762 seconds. (18,000 and 10,000 were chosen as the length of the burn-in and the number of updates, respectively.) None of the estimates of the (traditional) parameters of the Reduced RUM obtained from the EM algorithm deviated from the known true model parameters by more than 0.111; for the MCMC estimates, the maximum deviation from the known true model parameters was 0.186. Appendix 2 provides the Mplus syntax for the main and interaction effects of the most complex test item 15, \(\mathbf {q}_{15} = (1111)^T\), and the associated constraints.

Modeling the higher-order structure among the \(K=4\) attributes resulted in the following parameter estimates of the saturated log-linear model: G_0 = 0.066, G_11 = \(-\)1.389, G_12 = \(-\)1.539, G_13 = \(-\)1.542, G_14 = \(-\)1.567, G_212 = 1.181, G_213 = 1.279, G_214 =  1.216, G_223 = 1.030, G_224 = 1.198, G_234 = 1.268, G_3123 = \(-\)0.338, G_3124 = \(-\)0.649, G_3134 = \(-\)0.836, G_3234 = \(-\)0.124, and G_41234 = 0.745. The estimated tetrachoric correlations between the attribute pairs \((\alpha _1, \alpha _2)\), \((\alpha _1, \alpha _3)\), \((\alpha _1, \alpha _4)\), \((\alpha _2, \alpha _3)\), \((\alpha _2, \alpha _4)\), and \((\alpha _3, \alpha _4)\) were thus 0.497 (0.523), 0.498 (0.498), 0.449 (0.462), 0.538 (0.515), 0.530 (0.537), and 0.524 (0.500), respectively (the known true tetrachoric correlations are given in parentheses).

4.4 The “Examination for the Certificate of Proficiency in English (ECPE)” Data

The “Examination for the Certificate of Proficiency in English (ECPE)” for non-native speakers was developed by the English Language Institute at the University of Michigan. The test is administered annually to examinees from Africa, Asia, Europe, and Latin America. The data used here are a subset from the ECPE grammar section from a single year and have been previously analyzed by Buck and Tatsuoka (1998), Feng et al. (2014), Henson and Templin (2007), Templin and Bradshaw (2014), and Templin and Hoffman (2013). Responses to \(J = 28\) items were collected from \(N=2922\) examinees. The test involved \(K=3\) attributes (\(\alpha _1\) = lexical skills; \(\alpha _2\) = morphosyntactic skills; and \(\alpha _3\) = cohesive skills); but none of the items required the mastery of all \(K=3\) attributes (the complete Q-matrix is given in Templin & Hoffman, 2013).

4.5 Results: The “Examination for the Certificate of Proficiency in English (ECPE)” Data

On a machine with 8 GB RAM, 2.30 GHz Intel Core processor, and 64-bit OS, the EM algorithm (Mplus) used 568 seconds of CPU time; MCMC (OpenBUGS) required 8,319 seconds. The data were first fitted without including the higher-order attribute structure using the default setting of 0.50 for the item parameter starting values. Mplus did not converge and terminated prematurely with an error message recommending to change the starting values. Inspection of the (premature) parameter estimates revealed several problems: (a) the estimate of the main effect parameter for item 8, \(\hat{\beta }_{83}\), was negative, which is a violation of the positivity constraint; (b) all parameter estimates of item 22 were unusually large in magnitude. After increasing the Mplus starting value of the main effect of item 8 from the default setting of 0.50 to 0.80, and decreasing the Mplus starting value of the intercept parameter of item 22 from the default value of 0.50 to 0.10, the estimation process converged properly. Next, the estimation of the measurement model was augmented by fitting the higher-order attribute structure with the saturated log-linear model. The algorithm again terminated prematurely, suggesting that the Mplus starting values be changed. Indeed, the estimates of the coefficients of the main effects in the saturated log-linear model were unusually large. After decreasing their starting values from the default setting of 0.50 to \(-\)1.00, the algorithm converged properly. The Mplus item parameter estimates of the logit model were converted to the traditional Reduced RUM parameter estimates. They are presented in Table 1 together with the MCMC estimates. With minor exceptions, the EM and MCMC parameter estimates are nearly identical.

Table 1 The ECPE data: estimates of the traditional item parameters of the Reduced RUM obtained from EM (Mplus) and MCMC (OpenBUGS).

The Mplus estimates of the tetrachoric correlations between the attribute pairs (lexical, morphosyntactic), (lexical, cohesive), and (morphosyntactic, cohesive) were 0.865, 0.786, and 0.911, respectively, which are lower than the estimates obtained from MCMC: 0.915, 0.887, and 0.913.

5 Discussion

The Reduced RUM has been frequently studied in simulations and applications to real-world data sets (e.g., Feng et al., 2014; Henson & Douglas, 2005; Henson & Templin, 2007; Henson, Roussos, Douglas, & He, 2008; Henson, Templin, & Douglas, 2007; Kim, 2011; Liu, Douglas, & Henson, 2009; Templin, Henson, Templin, & Roussos, 2008). Researchers have appreciated the flexibility of the Reduced RUM in modeling the probability of correct item responses for different attribute profiles. However, this flexibility comes at the cost of a “significant degree of complexity” of the estimation process (Feng et al., 2014, p. 138). In fact, with the exception of Feng et al. (2014), the estimation method of choice in the studies referenced above was MCMC. But MCMC requires advanced technical skills, so its usefulness is likely restricted to researchers with a solid background in statistics.

To date, the options for educational researchers who wish to use the Reduced RUM in their testing programs and empirical research, but do not feel comfortable writing their own code, are still rather limited. Feng et al. (2014) recently reported the implementation of the EM algorithm for estimating the Reduced RUM as a routine in R. However, this routine is not (yet) publicly available. Alternatively, educational researchers can use a commercial package that offers an implementation of the EM algorithm for fitting (constrained) latent class models. However, using an LCA routine for fitting the Reduced RUM requires that it be re-expressed as a logit model, with constraints imposed on the parameters of the logistic function—but these were previously known only for models involving at most \(K=2\) attributes. In this article, the general parameterization of the Reduced RUM as a logit model involving any number of attributes and the associated parameter constraints were derived. Thus, educational researchers and practitioners can now use the EM algorithm in the LCA routines of commercial software packages like Latent GOLD and Mplus for fitting the Reduced RUM to data sets with a realistic number of attributes. Several aspects concerning the theoretical and practical implications of this result remain to be addressed.

In response to the “lack of available software for researchers and practitioners” (Templin & Hoffman, 2013, p. 63) of advanced, complex CDMs, Templin and Hoffman (2013) prepared a tutorial for fitting the (saturated) LCDM with Mplus. (At least equally important, and an immense aid to the practitioner, Templin and Hoffman (2013) direct the reader to a SAS macro written by Templin for generating the complex Mplus syntax of the parameter constraints of the LCDM automatically.) The saturated LCDM subsumes many of the familiar, recognizable CDMs that can be derived by imposing appropriate constraints on the model parameters of the saturated LCDM—among them the Reduced RUM (Henson et al., 2009; Templin & Bradshaw, 2014). So, why is a separate treatment of the technical aspects of the Reduced RUM and their software implementation needed in the first place? Casual inspection may easily deceive one into believing that the saturated LCDM is the Reduced RUM because both models contain identical main and interaction terms. However, the constraints on the parameters of the Reduced RUM are different from those of the saturated LCDM; they represent a distinct view on how the mastery of attributes and the probability of a correct response are related. The technical level of these constraints is nontrivial, requiring tailored syntax to implement them in Mplus.

In comparison with the parameterization of the Reduced RUM as a logit model, the traditional parameterization of the Reduced RUM has two immediate practical advantages. First, the parameters, \(\pi ^*_j\) and \(r^*_{jk}\), are bounded by 0 and 1. They are well-defined and have a direct and meaningful interpretation as probabilities that “modulate” the overall probability of a correct item response (which itself is the result of the complex interplay between the attributes required for an item and those mastered by an examinee). In contrast, the parameters of the Reduced RUM as a logit model are not readily interpretable as probabilities; although they quantify the contribution of the attributes (and their interactions) to a correct item response, the logit scale is not as intuitively accessible as the probability scale. Second, the well-defined meaning of the traditional parameters of the Reduced RUM allows a researcher to detect ill-devised items immediately. Recall that all parameters must be bounded by 0 and 1. High-quality items usually have a large \(\pi ^*_j\) and \(r^*_{jk}\) parameters that take on only moderate values (Henson & Templin, 2007). This information cannot be gleaned from the parameters of the Reduced RUM as a logit model.

Finally, what recommendations can be made to educational practitioners? First, the CPU times observed with the few examples reported here suggest preferring the EM algorithm in the Mplus LCA routine over MCMC (OpenBUGS). However, neither the EM nor the MCMC run times currently make conducting large-scale simulations a viable option. On the other hand, having access to an operational EM algorithm for analyzing a given data set with the Reduced RUM involving any number of attributes—although potentially time consuming—is certainly a big improvement.

Second, as might be gathered from the computational applications reported here, the specification of the starting values for the parameter estimates in Mplus appears to be more an art than a skill. The recommendation is to try the default settings first (i.e., 0.50 for all parameters). If Mplus encounters difficulties, then it will very likely terminate with a nondescript error message recommending a change of the starting values. The few applications run here were relatively easy to fix by inspecting the (premature) parameter estimates provided by Mplus and adjusting the starting values accordingly. However, this approach may not work for other applications. In such a situation, the general recommendation is to contact the Mplus technical support.

Third, composing and debugging Mplus syntax for fitting CDMs is a tedious and time-consuming undertaking. Therefore, a function written in R has been made available to researchers to aid them in writing their own Mplus code. The only inputs required by the R function are the data and the Q-matrix underlying the test; the complete Mplus syntax for several CDMs—including the Reduced RUM—is then generated automatically. This R function is available from the authors upon request.