1 Introduction

Mixture models for multivariate, unordered categorical data, also referred to as nominal data, are widely used as a data reduction technique to uncover a partition of latent classes. Nominal response data arise naturally in a diverse collection of fields, and the associated latent class models have been applied to uncover the structure underlying positional dependence of nucleotides (Dunson & Xing, 2009), survey responses for political elections (DeYoreo et al., 2017), and anuran abundance using calling survey data (Royle & Link, 2005), as well as for multiple imputation of educational (Si & Reiter, 2013) and social science (Murray & Reiter, 2016; Vermunt et al., 2008) surveys. In short, nominal mixture models serve an important role across the physical and social sciences.

Recent psychometric research introduced a class of restricted latent class models (RLCMs) that use a more parsimonious formulation for describing the structure underlying multivariate nominal data (e.g., see Chen & Zhou, 2017, Fang et al., 2019, Templin et al., 2008) than the traditional framework, which we refer to as unrestricted latent class models (ULCMs). For instance, a popular application of nominal RLCMs is to understand how latent classes relate to target and distractor responses on multiple choice tests (e.g., see Bradshaw & Templin, 2014, De La Torre, 2009, DiBello et al., 2015, Ku et al., 2016, Shear & Roussos, 2017, Yigit et al., 2019). To distinguish RLCMs from ULCMs, we let \(\varvec{\Theta }\) denote the emission parameters that govern the likelihood that latent classes select different options on J variables with M unordered options per variable. If C denotes the number of latent classes, the classic ULCM framework includes \(J\times M\times C\) parameters to relate latent class membership to observed responses. In contrast, RLCMs impose structure on the elements of \(\varvec{\Theta }\) by constraining some elements to be equal, so RLCMs include fewer parameters than ULCMs. Furthermore, as we demonstrate below, RLCMs generally offer a more interpretable framework for understanding the latent structure (i.e., the relationship between the latent classes and observed variables). In fact, the pattern of equal and unequal elements in the RLCM \(\varvec{\Theta }\) parameter provides researchers with a guide for interpreting the impact of latent class membership on response probabilities.

Although prior research developed several general models for nominal RLCMs, at least two limitations of existing research restrict the widespread applicability of these methods for statistical research in education and the social sciences. First, existing methods are primarily confirmatory in nature, given that researchers must prespecify the manner in which latent classes relate to observed response probabilities. Specifically, let \(\varvec{\Delta }\) be the RLCM parameter with \(J\times (M-1)\times C\) binary elements that indicate which elements of \(\varvec{\Theta }\) are equal (we formally define the \(\varvec{\Delta }\) matrix below with examples). Currently deployed nominal RLCMs must specify every element of \(\varvec{\Delta }\), which may be challenging for some research applications. Whereas researchers may be able to correctly articulate the latent structure in \(\varvec{\Delta }\) for some applications (e.g., target and distractor responses on some multiple choice tests), the general unavailability of substantive theory limits widespread application of nominal RLCMs. Second, numerous studies developed RLCMs for multivariate nominal data, yet there has been limited research on the conditions needed to ensure model parameters are identified. Several studies discussed identifiability for ULCMs (e.g., see Allman et al., 2009); however, those results do not apply directly to RLCMs because Allman et al. (2009) consider an unrestricted parameter space, whereas the parameter space of our RLCM is restricted by the structure of \(\varvec{\Delta }\). Consequently, the RLCM parameter space falls into a measure zero set with respect to the whole parameter space of ULCMs discussed in Allman et al. (2009), so the identifiability conditions mentioned above for ULCMs cannot be directly applied to our RLCMs. Our paper thus contributes to the literature on the identifiability of RLCMs. An extensive literature has examined local identifiability, which aims to ensure that model parameters are identifiable in a neighborhood of the true parameters: McHugh (1956) proposed sufficient conditions for local identifiability of latent class models with binary responses, Goodman (1974) extended the conditions to latent class models with polytomous responses, and Huang and Bandeen-Roche (2004) proposed local identifiability conditions for latent class models with covariates. For global identifiability, numerous papers propose strict and generic identifiability conditions for binary response data (Chen et al., 2015, 2020; Xu, 2017; Xu & Shang, 2018) and strict identifiability conditions for polytomous response data (Culpepper, 2019; Fang et al., 2019). Additionally, Gu and Dunson (2021) establish strict and generic identifiability conditions for a multiclass, multilayer latent structure model, which can be viewed as more general than the model we consider because it admits a multilayered, hierarchical structure for attributes. One strength of our paper relative to Gu and Dunson (2021) is that our identifiability conditions provide practitioners with clear guidance for designing nominal response assessments (e.g., forced-choice inventories). Furthermore, our identifiability conditions include generic conditions that are applicable to polytomous RLCMs.

Accordingly, the goal of our study is to address the aforementioned shortcomings in the literature. That is, we propose a fully exploratory framework for inferring nominal RLCM parameters and present new theory regarding model identification. The identifiability of model parameters is critical for statistical inference and we also provide researchers with guidance for designing multivariate nominal response studies.

It is also important to distinguish the models we explore in this study from polytomous latent class models. Specifically, researchers have advanced RLCMs for polytomous data using both confirmatory (e.g., see Ma & de la Torre, 2016, 2019) and exploratory methods (Culpepper, 2019; Culpepper & Balamuta, 2021; Jimenez et al., 2023). Several studies (Bacci et al., 2014; Bartolucci, 2007; Gnaldi et al., 2020) have also described latent class models within an item response theory (IRT) framework with at least three link functions (i.e., graded response, partial credit, and continuation ratio). These prior studies made important contributions and demonstrated how to use link functions for modeling ordered, polytomous response data with latent class models. In contrast, an important innovation of our study is that we deploy the multinomial logistic link function, which is suitable for unordered, nominal responses.

The remainder of this paper includes six sections. The first section provides a general introduction to ULCMs and RLCMs for nominal data and the second section presents new theoretical results concerning the identifiability of RLCMs (please see Appendix for related proofs). The third section outlines a Bayesian formulation for inferring the RLCM parameter posterior distribution. The fourth section reports Monte Carlo results concerning the accuracy of the developed algorithm and the fifth section reports results from an application. The final section discusses the implications of this study and provides concluding remarks.

2 Overview of Mixture Models for Nominal Responses

We consider the setting where multivariate, nominal response data are available such that \(Y_j\) (for \(j=1,\dots ,J\)) is a random categorical (or nominal) response with a realization \(y_j\in \left\{ 0,\dots ,M_j-1\right\} \), where \(M_j\ge 2\) denotes the number of unordered response options. We denote the random J-vector by \(\varvec{Y}=(Y_1,\dots , Y_J)^\top \) and the observed vector of responses by \(\varvec{y}= (y_{1},\dots ,y_{J})^\top \). The support for \(\varvec{Y}\) is defined as \(\varvec{y}\in \times _{j=1}^J \left\{ 0,\dots ,M_j-1\right\} \), which implies there are \(\prod _{j=1}^JM_j\) possible observed response patterns. The purpose of this section is to discuss the role of mixture models in understanding multivariate, nominal response patterns. The first subsection reviews existing unstructured latent class models (ULCMs) for nominal, unordered response data. ULCMs offer a powerful framework for uncovering substantively meaningful latent classes. However, the results from ULCM data analyses may not always be easily interpretable, as researchers must decipher the meaning of latent classes by comparing many latent class parameters. Accordingly, the second subsection introduces a new general restricted latent class model (RLCM) framework, which has the benefit of directly uncovering the latent structure by providing researchers with a \(\varvec{\Delta }\) parameter for more easily interpreting the class labels.

2.1 Unstructured Latent Class Models (ULCMs)

The goal of this section is to review the traditional ULCM framework. Let \(c\in \{0,\dots ,C-1\}\) index the C underlying latent classes. In the case of nominal data, the unstructured model includes a \(M_j\)-vector of category response probabilities for each class and item denoted by \(\varvec{\theta }_{jc}=(\theta _{jc0},\dots ,\theta _{jc,M_j-1})^\top \) so that the probability of observing a response of m on item j for members of class c is \(\theta _{jcm}=P(Y_j=m|c)\). We define \(\varvec{\Theta }_j=(\varvec{\theta }_{j0},\dots ,\varvec{\theta }_{j,C-1})\) as the \(M_j\times C\) matrix of response probabilities by response option and latent class. The goal of ULCMs is to describe the \(\prod _{j=1}^J M_j\) possible response patterns. ULCMs consider the case where latent classes differ in their chances of responding according to a given response pattern. The probability vector that governs the chance members of class c respond according to one of the \(\prod _{j=1}^J M_j\) possible response patterns is \(\mathbb {P}_c = \bigotimes _{j=1}^J\varvec{\theta }_{jc}\) where \(\otimes \) denotes a Kronecker product. Let \(\varvec{\pi }=(\pi _0,\dots ,\pi _{C-1})^\top \) be a C-vector of structural probabilities such that \(\pi _c\) denotes the chance of membership in class c and note that the model implied response pattern probability vector is \(\mathbb {P}=\sum _{c=0}^{C-1}\pi _c \mathbb {P}_c\).
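To make this construction concrete, the following minimal Python sketch (all names are ours and purely illustrative) computes \(\mathbb {P}_c\) by chaining Kronecker products across items and then mixes over classes with weights \(\varvec{\pi }\).

```python
import numpy as np

def pattern_probabilities(theta, pi):
    """Model-implied probabilities of all prod_j M_j response patterns.

    theta : list of J arrays; theta[j] has shape (M_j, C), and column c
            holds the option probabilities for item j in latent class c.
    pi    : length-C vector of latent class membership probabilities.
    """
    P = 0.0
    for c, pi_c in enumerate(pi):
        Pc = np.ones(1)
        for theta_j in theta:              # P_c = kron_{j=1}^J theta_jc
            Pc = np.kron(Pc, theta_j[:, c])
        P = P + pi_c * Pc                  # P = sum_c pi_c * P_c
    return P

# Two items with M_j = 2 options and C = 2 classes; the Kronecker
# convention orders patterns as (y1, y2) = (0,0), (0,1), (1,0), (1,1).
theta = [np.array([[0.8, 0.3], [0.2, 0.7]]),
         np.array([[0.6, 0.1], [0.4, 0.9]])]
pi = np.array([0.5, 0.5])
assert np.isclose(pattern_probabilities(theta, pi).sum(), 1.0)
```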

2.2 Restricted Latent Class Models (RLCMs)

This subsection introduces an RLCM for nominal data, which offers a more interpretable solution by imposing restrictions on the ULCM \(\varvec{\theta }_{jc}\) parameters. In particular, the RLCM adapts the ULCM to describe the \(\prod _{j=1}^JM_j\) response patterns by reparameterizing both the latent space and the parameters. First, the RLCM defines the latent classes using a K-dimensional binary attribute vector \(\varvec{\alpha }=(\alpha _1,\dots ,\alpha _K)^\top \in \{0,1\}^K\). Therefore, the connection between the number of classes in the ULCM and the RLCM is \(C=2^K\). An advantage of using the binary attribute profile is that researchers can interpret \(\alpha _k=1\) as denoting possession or mastery of attribute k and \(\alpha _k=0\) otherwise. The relationship between the ULCM and RLCM is also apparent when using a bijection between the binary attribute profile \(\varvec{\alpha }\) and the integers \(c\in \{0,\dots , 2^K-1\}\), defined by \(c=\varvec{\alpha }^\top \varvec{v}\in \{0,\dots , 2^K-1\}\) with \(\varvec{v}=(2^{K-1},2^{K-2},\dots ,1)^\top \).
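The bijection is easy to make concrete; the small sketch below (illustrative, with \(K=3\)) maps binary profiles to class indices and back.

```python
import numpy as np

K = 3
v = 2 ** np.arange(K - 1, -1, -1)          # v = (2^{K-1}, ..., 2, 1)

def class_of(alpha):
    """Map a binary attribute profile to its class index c = alpha' v."""
    return int(np.dot(alpha, v))

def profile_of(c):
    """Inverse map: recover the binary attribute profile from c."""
    return np.array([(c >> (K - 1 - k)) & 1 for k in range(K)])

assert class_of(np.array([1, 0, 1])) == 5
assert (profile_of(5) == np.array([1, 0, 1])).all()
```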

Second, the RLCM reparameterizes the elements of \(\varvec{\theta }_{jc}\) using the following multinomial logit-link function

$$\begin{aligned} \theta _{jcm}=\frac{\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm} \right) }{\sum _{m'=0}^{M_j-1}\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm'} \right) } \end{aligned}$$
(1)

where \(\varvec{a}_c\) is a design vector for the attribute profile of class c and \(\varvec{\beta }_{jm}\) is a P-vector of coefficients for item j and option m (i.e., P depends on the order of the model; \(P=2^K\) if we include main-effect and all interaction-effect terms for the latent classes). Note that the restriction \(\varvec{\beta }_{j0}=\varvec{0}\) is deployed for all j to identify the model. Furthermore, the restriction on \(\varvec{\beta }_{j0}\) implies that \(y_j=0\) is the reference response, so that \(\varvec{\beta }_{jm}\) for \(m>0\) quantifies the impact of the attributes on response values of \(y_j=m\) versus \(y_j=0\) on item j. Let the \(M_j\times 2^K\) matrix of coefficients for item j be denoted by \(\varvec{B}_j=(\varvec{\beta }_{j0},\dots ,\varvec{\beta }_{j,M_j-1})^\top \).

An important implication of reparameterizing \(\theta _{jcm}\) with a multinomial logit-link is that the transformed \(\varvec{\beta }_{jm}\) parameters provide a more coherent interpretation regarding the process by which the underlying attributes relate to observed responses. For instance, we define the \(2^K\)-vector \(\varvec{a}_c\) as including main- and interaction-effect terms for latent class \(\varvec{\alpha }^\top \varvec{v}=c\). Consequently, the elements of \(\varvec{\beta }_{jm}\) indicate the manner by which the attributes translate into preferences for response option m relative to response option zero.

We next present an example to further illustrate the link between ULCMs and RLCMs and the interpretation of the \(\varvec{a}_c\) and \(\varvec{\beta }_{jm}\) parameters.

Example 1

Suppose \(K=3\) and \(M_j=3\), so \(y_j\in \{0,1,2\}\). In this case, the matrix of ULCM parameters is,

$$\begin{aligned} \varvec{\Theta }_j = \begin{bmatrix} \theta _{j00}&{}\quad \theta _{j10}&{}\quad \theta _{j20}&{}\quad \theta _{j30}&{}\quad \theta _{j40}&{}\quad \theta _{j50}&{}\quad \theta _{j60}&{}\quad \theta _{j70}\\ \theta _{j01}&{}\quad \theta _{j11}&{}\quad \theta _{j21}&{}\quad \theta _{j31}&{}\quad \theta _{j41}&{}\quad \theta _{j51}&{}\quad \theta _{j61}&{}\quad \theta _{j71}\\ \theta _{j02}&{}\quad \theta _{j12}&{}\quad \theta _{j22}&{}\quad \theta _{j32}&{}\quad \theta _{j42}&{}\quad \theta _{j52}&{}\quad \theta _{j62}&{}\quad \theta _{j72}\\ \end{bmatrix} \end{aligned}$$
(2)

where we note that \(\theta _{jc0}=1-\sum _{m=1}^{M_j-1}\theta _{jcm}\) for all \(c\in \{0,1,\dots ,7\}\). In this setting, the ULCM includes \(2\times 8=16\) free parameters for each item. Moreover, in order to understand the meaning of the latent classes, researchers would need to interpret differences in the \(16\cdot J\) total class probabilities, which may be challenging for even a modest number of items J. The RLCM attempts to address this problem by reparameterizing both the latent classes and the item parameters. In the case with \(K=3\), we define the design vector \(\varvec{a}\) for an arbitrary attribute profile as:

$$\begin{aligned} \varvec{a}^\top = (1,\alpha _1,\alpha _2,\alpha _3,\alpha _1\alpha _2,\alpha _1\alpha _3,\alpha _2\alpha _3,\alpha _1\alpha _2\alpha _3) \end{aligned}$$
(3)

so that \(\varvec{a}\) includes all main-effect and interaction terms among the attributes and we use \(\varvec{a}_c\) to refer to the design vector for attribute profile \(\varvec{\alpha }^\top \varvec{v}=c\). The matrix of reparameterized parameters \(\varvec{\beta }_j\) for relating \(\varvec{\alpha }\) to \(Y_j\) is

$$\begin{aligned} \varvec{B}_j = \begin{bmatrix} \beta _{j00}&{}\quad \beta _{j10}&{}\quad \beta _{j20}&{}\quad \beta _{j30}&{}\quad \beta _{j40}&{}\quad \beta _{j50}&{}\quad \beta _{j60}&{}\quad \beta _{j70}\\ \beta _{j01}&{}\quad \beta _{j11}&{}\quad \beta _{j21}&{}\quad \beta _{j31}&{}\quad \beta _{j41}&{}\quad \beta _{j51}&{}\quad \beta _{j61}&{}\quad \beta _{j71}\\ \beta _{j02}&{}\quad \beta _{j12}&{}\quad \beta _{j22}&{}\quad \beta _{j32}&{}\quad \beta _{j42}&{}\quad \beta _{j52}&{}\quad \beta _{j62}&{}\quad \beta _{j72}\\ \end{bmatrix}. \end{aligned}$$
(4)

Note we can view \(\varvec{a}_c^\top \varvec{\beta }_{jm}\) as the latent response propensity for members of class c to pick option m vs. option 0. Therefore, the definition of \(\varvec{a}\) implies that \(\beta _{j0m}\) is an intercept term that corresponds with the latent propensity for the latent class with \(\varvec{\alpha }=\varvec{0}\) to select option m vs. 0. Furthermore, the main effects of \(\alpha _1\), \(\alpha _2\), and \(\alpha _3\) for distinguishing response m from 0 are \(\beta _{j1m}\), \(\beta _{j2m}\), and \(\beta _{j3m}\), respectively. The two-way interaction terms are \(\alpha _1\alpha _2\), \(\alpha _1\alpha _3\), and \(\alpha _2\alpha _3\), with effects \(\beta _{j4m}\), \(\beta _{j5m}\), and \(\beta _{j6m}\), respectively, and the three-way interaction effect is \(\beta _{j7m}\). In general, positive coefficients suggest a preference for option m over option 0, and the interaction effects provide researchers with insight into the extent to which preferences are determined by a complex interplay of the attributes.
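As an illustration of Eqs. 1 and 3, the sketch below builds the design vector for an arbitrary profile and evaluates the implied response probabilities for one item; the randomly drawn \(\varvec{B}_j\) is purely illustrative, with its first row fixed at zero per the identification restriction \(\varvec{\beta }_{j0}=\varvec{0}\).

```python
import numpy as np
from itertools import combinations

def design_vector(alpha):
    """Design vector of Eq. 3: intercept, main effects, then the
    two-way through K-way interaction terms."""
    K = len(alpha)
    a = [1.0]
    for order in range(1, K + 1):
        for idx in combinations(range(K), order):
            a.append(float(np.prod([alpha[k] for k in idx])))
    return np.array(a)

def item_probs(B_j, alpha):
    """Eq. 1: option probabilities theta_{jc} for the class with
    profile alpha; B_j is M_j x P with first row beta_{j0} = 0."""
    eta = B_j @ design_vector(alpha)       # propensities a_c' beta_{jm}
    e = np.exp(eta - eta.max())            # numerically stable softmax
    return e / e.sum()

# Illustrative item with K = 3 and M_j = 3.
rng = np.random.default_rng(0)
B_j = np.vstack([np.zeros(2 ** 3), rng.normal(size=(2, 2 ** 3))])
theta_jc = item_probs(B_j, np.array([1, 0, 1]))
assert np.isclose(theta_jc.sum(), 1.0)
```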

The aforementioned example demonstrates the ability of the RLCM to provide researchers with a clearer interpretation of the latent structure (i.e., the relationship between attributes and observed responses). Still, each \(\varvec{B}_j\) includes many parameters to estimate and interpret. A further refinement we advance to support coherent inferences about the latent structure is to incorporate variable selection methods into the RLCM to infer which elements of \(\varvec{B}_j\) are active (i.e., different from zero) versus inactive (i.e., equal to or near zero). In fact, the pattern of active vs. inactive elements of \(\varvec{B}_j\) indicates the underlying structure and describes the process by which attributes relate to the observed response \(Y_j\). Accordingly, we introduce an \(M_j\times 2^K\) binary matrix \(\varvec{\Delta }_j\) to indicate which elements of \(\varvec{B}_{j}\) are active. Specifically, \(\delta _{jpm}=1\) denotes that \(\beta _{jpm}\) is active (i.e., nonzero), and \(\delta _{jpm}=0\) denotes that \(\beta _{jpm}=0\) (i.e., inactive). Note that we always include the intercept and fix \(\delta _{j0m}=1\) for all \(m\in \{1,\dots ,M_j-1\}\).

We next revisit Example 1 to highlight the role of \(\varvec{\Delta }_j\) in interpreting the latent structure.

Example 2

Reconsider the case with \(M_j=3\) and \(K=3\). In this case, \(\varvec{\Delta }_j\) is generally written as

$$\begin{aligned} \varvec{\Delta }_j=\begin{bmatrix} \delta _{j00}&{}\quad \delta _{j10}&{}\quad \delta _{j20}&{}\quad \delta _{j30}&{}\quad \delta _{j40}&{}\quad \delta _{j50}&{}\quad \delta _{j60}&{}\quad \delta _{j70}\\ \delta _{j01}&{}\quad \delta _{j11}&{}\quad \delta _{j21}&{}\quad \delta _{j31}&{}\quad \delta _{j41}&{}\quad \delta _{j51}&{}\quad \delta _{j61}&{}\quad \delta _{j71}\\ \delta _{j02}&{}\quad \delta _{j12}&{}\quad \delta _{j22}&{}\quad \delta _{j32}&{}\quad \delta _{j42}&{}\quad \delta _{j52}&{}\quad \delta _{j62}&{}\quad \delta _{j72}\\ \end{bmatrix}. \end{aligned}$$
(5)

Note that \(\delta _{jp0}=0\) for all \(p=0,\dots , 2^K-1\) to identify the model parameters and that terms for the intercepts are generally specified as active so \(\delta _{j01}=\delta _{j02}=1\).

Remark 1

If \(\varvec{\Delta }_j = \varvec{1}\) for all \(j=1, \ldots , J\), which implies that all coefficients in \(\varvec{B}\) are active, the latent classes have distinct response probabilities, and the RLCM is equivalent to a ULCM. See Example 1 of Chen et al. (2020) for additional discussion involving the binary response RLCM.

Note that the pattern of 1's and 0's in \(\varvec{\Delta }_j\) conveys different types of relationships and structures. An item is said to have simple structure for attribute k if the response probabilities differ only by levels of \(\alpha _k\).

Definition 1

The structure of \(\varvec{\Delta }_j\), which is a slice of \(\varvec{\Delta }\) for item j, is referred to as simple structure for attribute k if it satisfies the following structure:

$$\begin{aligned} \varvec{\Delta }_j= \begin{bmatrix} 0&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad 0&{}\quad 0&{}\quad \cdots &{}\quad 0\\ 1&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad \delta _{jk1}&{}\quad 0&{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ 1&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad \delta _{jk,M_j-1}&{}\quad 0&{}\quad \cdots &{}0\\ \end{bmatrix}_{M_j \times P}, \end{aligned}$$
(6)

and \(\sum _{m=1}^{M_j-1} \delta _{jkm}\ge 1\) where P generally equals \(2^K\).

Remark 2

Note that, for convenience of notation, our identifiability proof below supposes that item j has simple structure for attribute j.

Example 3

Consider \(M_1=M_2=3\), \(P=2^K\) with \(K=3\), and \(J=2\), and note that examples of \(\varvec{\Delta }\) matrices that satisfy simple structure according to Definition 1 are:

$$\begin{aligned} \varvec{\Delta }_1 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}, \end{aligned}$$
(7)
$$\begin{aligned} \varvec{\Delta }_2 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 0&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}. \end{aligned}$$
(8)

Items 1 and 2 both have simple structure, so the probability of selecting the responses relates only to standing on \(\alpha _1\) in \(\varvec{\Delta }_1\) and on \(\alpha _2\) in \(\varvec{\Delta }_2\). \(\varvec{\Delta }_1\) indicates that only the main effect of \(\alpha _1\) differentiates between response options 1 vs. 0 and 2 vs. 0. In contrast, for item 2, \(\varvec{\Delta }_2\) represents the case where the main effect of \(\alpha _2\) is active only for differentiating between response options 2 vs. 0. The associated \(\varvec{B}_1\) and \(\varvec{B}_2\) matrices for the structure parameters \(\varvec{\Delta }_1\) and \(\varvec{\Delta }_2\) are:

$$\begin{aligned} \varvec{B}_1 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{101}&{}\quad \beta _{111}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{102}&{}\quad \beta _{112}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}, \end{aligned}$$
(9)
$$\begin{aligned} \varvec{B}_2 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{201}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{202}&{}\quad 0&{}\quad \beta _{222}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}. \end{aligned}$$
(10)

Let \(\varvec{\Theta }_j(\varvec{B}_j)\) denote the latent class response probabilities associated with the RLCM \(\varvec{B}_j\) matrix. The presence of structure in \(\varvec{B}_j\) and \(\varvec{\Delta }_j\) implies that elements of \(\varvec{\Theta }_j(\varvec{B}_j)\) are restricted to be equal.

Example 4

The rows of \(\varvec{\Delta }_1\) in Eq. 7 imply that certain latent classes have a common probability of selecting response 1 vs 0 and 2 vs. 0. That is, latent classes that do not possess the first attribute such that \(\varvec{\alpha }=(0,\alpha _2,\alpha _3)\) have common response probabilities for selecting options 0, 1, and 2 of \(\theta _{100}\), \(\theta _{101}\), and \(\theta _{102}\), respectively, whereas classes with the first attribute with \(\varvec{\alpha }=(1,\alpha _2,\alpha _3)\) have common response probabilities of \(\theta _{140}\), \(\theta _{141}\), and \(\theta _{142}\). The \(\varvec{\Theta }_1(\varvec{B}_1)\) in this case is

$$\begin{aligned} \varvec{\Theta }_1(\varvec{B}_1) = \begin{bmatrix} \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{140}&{}\quad \theta _{140}&{}\quad \theta _{140}&{}\quad \theta _{140}\\ \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{141}&{}\quad \theta _{141}&{}\quad \theta _{141}&{}\quad \theta _{141}\\ \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{142}&{}\quad \theta _{142}&{}\quad \theta _{142}&{}\quad \theta _{142}\\ \end{bmatrix}, \end{aligned}$$
(11)

where the columns of \(\varvec{\Theta }_1(\varvec{B}_1)\) are organized according to the binary-integer bijection and

$$\begin{aligned} \varvec{\theta }_{10}=(\theta _{100},\theta _{101},\theta _{102})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{10m})}\left( 1,\exp (\beta _{101}),\exp (\beta _{102})\right) , \end{aligned}$$
(12)
$$\begin{aligned} \varvec{\theta }_{14}=(\theta _{140},\theta _{141},\theta _{142})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{10m}+\beta _{11m})}\left( 1,\exp (\beta _{101}+\beta _{111}), \exp (\beta _{102}+\beta _{112})\right) . \end{aligned}$$
(13)

In contrast, the rows of \(\varvec{\Delta }_2\) in Eq. 8 imply that a different collection of elements is constrained to be equal in \(\varvec{\Theta }_2(\varvec{B}_2)\). Latent classes that do not possess the second attribute, such that \(\varvec{\alpha }=(\alpha _1,0,\alpha _3)\), have common response probabilities for selecting options 0, 1, and 2 of \(\theta _{200}\), \(\theta _{201}\), and \(\theta _{202}\), respectively, whereas classes with the second attribute, with \(\varvec{\alpha }=(\alpha _1,1,\alpha _3)\), have common response probabilities of \(\theta _{220}\), \(\theta _{221}\), and \(\theta _{222}\). The \(\varvec{\Theta }_2(\varvec{B}_2)\) in this case is

$$\begin{aligned} \varvec{\Theta }_2(\varvec{B}_2) = \begin{bmatrix} \theta _{200}&{}\quad \theta _{200}&{}\quad \theta _{220}&{}\quad \theta _{220}&{}\quad \theta _{200}&{}\quad \theta _{200}&{}\quad \theta _{220}&{}\quad \theta _{220}\\ \theta _{201}&{}\quad \theta _{201}&{}\quad \theta _{221}&{}\quad \theta _{221}&{}\quad \theta _{201}&{}\quad \theta _{201}&{}\quad \theta _{221}&{}\quad \theta _{221}\\ \theta _{202}&{}\quad \theta _{202}&{}\quad \theta _{222}&{}\quad \theta _{222}&{}\quad \theta _{202}&{}\quad \theta _{202}&{}\quad \theta _{222}&{}\quad \theta _{222}\\ \end{bmatrix}, \end{aligned}$$
(14)

where the columns of \(\varvec{\Theta }_2(\varvec{B}_2)\) are organized according to the binary-integer bijection and

$$\begin{aligned} \varvec{\theta }_{20}=(\theta _{200},\theta _{201},\theta _{202})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{20m})}\left( 1,\exp (\beta _{201}),\exp (\beta _{202})\right) , \end{aligned}$$
(15)
$$\begin{aligned} \varvec{\theta }_{22}=(\theta _{220},\theta _{221},\theta _{222})^\top = \frac{\left( 1,\exp (\beta _{201}),\exp (\beta _{202}+\beta _{222})\right) }{1+ \exp (\beta _{201})+ \exp (\beta _{202}+\beta _{222})}. \end{aligned}$$
(16)

Remark 3

Note that \(\varvec{\Delta }_j\) can also denote different structures where multiple attributes relate to response variables. For instance, \(\varvec{\Delta }_j\) might specify the inclusion of interaction terms so that response probabilities are shaped by a more complex relationship of the attributes. Furthermore, we can also draw a connection between the ULCM and RLCM where \(\varvec{\Delta }_j=(0,\varvec{1}_{M_j-1}^\top )^\top \varvec{1}_P^\top \) corresponds with the ULCM setting with distinct elements in \(\varvec{\Theta }_j(\varvec{B}_j)\).

3 Identifiability

3.1 Model Identifiability

As introduced in the previous section, the probability distribution over latent classes is given by \(\varvec{\pi }=(\pi _c)^\top \in [0,1]^{2^K}\) with \(\sum _c \pi _c =1\). The coefficient array \(\varvec{B}=(\varvec{B}_1, \ldots ,\varvec{B}_J)\) is a three-dimensional array, where \(\varvec{B}_j\), the j-th slice of \(\varvec{B}\), has size \(M_j\times P\). We denote the parameter space of \((\varvec{\pi },\varvec{B})\) by

$$\begin{aligned} \Omega (\varvec{\pi }, \varvec{B})=\{(\varvec{\pi }, \varvec{B}):\varvec{\pi }\in \Omega (\varvec{\pi }),\varvec{B}\in \Omega (\varvec{B})\}, \end{aligned}$$
(17)

where \(\Omega (\varvec{\pi })=\{\varvec{\pi }\in [0,1]^{2^K}:\sum _c \pi _c =1\}\), and \(\Omega (\varvec{B})\) represents the parameter space of the coefficient array \(\varvec{B}\), which could be the whole real space \({\mathbb {R}}^{P\times \sum _j M_j}\) or a subset of \(\mathbb R^{P\times \sum _j M_j}\) if constrained by \(\varvec{\Delta }\).

Definition 2

(Strict Identifiability) The parameters \((\varvec{\pi }, \varvec{B}) \in \Omega (\varvec{\pi }, \varvec{B})\) are identifiable if

$$\begin{aligned} P(\varvec{Y} =\varvec{y} \mid \varvec{\pi }, \varvec{B})=P(\varvec{Y} =\varvec{y} \mid \bar{\varvec{\pi }},\bar{\varvec{B}}) \Longleftrightarrow (\varvec{\pi },\varvec{B}) \sim (\bar{\varvec{\pi }},\bar{\varvec{B}}), \end{aligned}$$

where \((\bar{\varvec{\pi }}, \bar{\varvec{B}})\) is another value from the parameter space \(\Omega (\varvec{\pi }, \varvec{B})\) and “\(\sim \)” means two parameter values are equivalent up to label switching of attributes.

3.2 Generic Identifiability

Generic identifiability, which is a weaker notion of identifiability than Definition 2, was first introduced by Allman et al. (2009). Generic identifiability allows some exceptional parameter values for which strict identifiability does not hold, as long as the non-identifiable parameters form a set of Lebesgue measure zero within the parameter space. Because the non-identifiable parameters lie in a measure zero set, one is unlikely to face identifiability problems in performing inference. Thus, generic identifiability is generally sufficient for data analysis purposes.

However, the generic identifiability condition shown in Allman et al. (2009) cannot be applied directly in this paper. Under the setting of Allman et al. (2009), the parameter space \(\Omega (\varvec{B})\) is the whole real space \(\mathbb R^{P\times \sum _j M_j}\), whereas the parameter space \(\Omega (\varvec{B})\) in our RLCM is restricted by the structure of \(\varvec{\Delta }\). The dimension of \(\Omega (\varvec{B})\) varies with the \(\varvec{\Delta }\) array; that is, the parameter space of \(\varvec{B}\) restricted by \(\varvec{\Delta }\) might be a measure zero subspace of the parameter space of \(\varvec{B}\) restricted by another array \(\tilde{\varvec{\Delta }}\). Thus, it is important to discuss generic identifiability within a parameter space with a fixed \(\varvec{\Delta }\).

Therefore, in order to discuss generic identifiability for our RLCM, we need to define the parameter space \(\Omega (\varvec{B})\) by taking into account the sparsity structure due to the \(\varvec{\Delta }\) array. Analogous to Eq. 17, we denote the model parameter space with a given \(\varvec{\Delta }\) by

$$\begin{aligned} \Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B})=\{(\varvec{\pi }, \varvec{B}):\varvec{\pi }\in \Omega (\varvec{\pi }),\varvec{B}\in \Omega _{\varvec{\Delta }}(\varvec{B})\}. \end{aligned}$$
(18)

Coefficients in \(\Omega _{\varvec{\Delta }}(\varvec{B})\) are active when the corresponding elements in \(\varvec{\Delta }\) equal 1, so the parameter space \(\Omega _{\varvec{\Delta }}(\varvec{B})\) is \(\mathbb R^{|\varvec{\Delta }|}\), where \(|\varvec{\Delta }|\) is the sum of the entries of \(\varvec{\Delta }\). For generic identifiability, it suffices to consider the parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B})\) with a given sparsity structure \(\varvec{\Delta }\).

Let \(S_{\varvec{\Delta }}\) denote the set of non-identifiable parameters from \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\):

$$\begin{aligned} \begin{aligned} S_{\varvec{\Delta }}=\{(\varvec{\pi },\varvec{B}):&\ P(\varvec{Y} =\varvec{y} \mid \varvec{\pi },\varvec{B})= P(\varvec{Y} =\varvec{y} \mid \bar{\varvec{\pi }}, \bar{\varvec{B}}),\\ ( \varvec{\pi },\varvec{B}) \not \sim (\bar{\varvec{\pi }},&\bar{\varvec{B}}),\ (\varvec{\pi }, \varvec{B})\in \Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B}),(\bar{\varvec{\pi }}, \bar{\varvec{B}})\in \Omega _{\bar{\varvec{\Delta }}}(\varvec{\pi }, \varvec{B})\}. \end{aligned} \end{aligned}$$
(19)

Remark 4

The non-identifiable parameters \((\varvec{\pi }, \varvec{B}) \in S_{\varvec{\Delta }}\) could be due to some other parameters \((\bar{\varvec{\pi }}, \bar{\varvec{B}})\) with a different sparsity structure \(\bar{\varvec{\Delta }}\).

If the non-identifiable parameter set \(S_{\varvec{\Delta }}\) is of measure zero within parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\), then we say \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is a generically identifiable parameter space.

Definition 3

(Generic Identifiability) The parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is generically identifiable, if the Lebesgue measure of \(S_{\varvec{\Delta }}\) with respect to parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is zero.

3.3 Identifiability Conditions

In this section, we propose strict and generic identifiability conditions for our RLCM. We begin by introducing the form of \(\varvec{\Delta }\).

The sparse 3-dimensional array \(\varvec{\Delta }\) takes the form

$$\begin{aligned} \varvec{\Delta }=\left( \begin{array}{c}{\varvec{\Delta }^1}\\ {\varvec{\Delta }^2}\\ {\varvec{\Delta }^\prime }\end{array}\right) \end{aligned}$$

after a permutation of items, where \(\varvec{\Delta }^1\) and \(\varvec{\Delta }^2\) each contain K slices of \(\varvec{\Delta }\) and \(\varvec{\Delta }^\prime \) contains the remaining \(J-2K\) slices. We use \(\varvec{\Delta }_j^i\) to denote the j-th slice of \(\varvec{\Delta }^i\).

Theorem 1

(Strict Identifiability) The parameter space \(\Omega (\varvec{\pi },\varvec{B})\) is strictly identifiable if the following two conditions are satisfied:

  1. (A1)

    For \(j=1,\ldots ,K\), \(\Delta _j^1\) and \(\Delta _{j}^2\) satisfy simple structure shown in Definition 1 and Remark 2;

  2. (A2)

    For any two classes of subjects, there exists at least one item in \(\varvec{\Delta }^\prime \) such that they have different positive response probabilities for some response option.

Remark 5

The \(\varvec{\Delta }_j\) matrices shown in Example 3 satisfy the structure in (A1).

Theorem 2

(Generic Identifiability) The parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is generically identifiable if the following two conditions are satisfied:

  1. (B1)

    For \(j=1,\ldots ,K\), \(\Delta _j^1\) and \(\Delta _{j}^2\) satisfy the following structure:

    $$\begin{aligned} \varvec{\Delta }_j= \begin{bmatrix} 0&{}0&{}\cdots &{}0&{}0&{}0&{}\cdots &{}0\\ *&{}*&{}\cdots &{}*&{}\delta _{jj1}&{}*&{}\cdots &{}*\\ \vdots &{}\vdots &{}&{}\vdots &{}\vdots &{}\vdots &{}&{}\vdots \\ *&{}*&{}\cdots &{}*&{}\delta _{jj,M_j-1}&{}*&{}\cdots &{}*\\ \end{bmatrix}_{M_j \times P}, \end{aligned}$$
    (20)

    and \(\sum _{m=1}^{M_j-1} \delta _{jjm}\ge 1\), where \(*\) can be either 0 or 1 and P generally equals \(2^K\).

  2. (B2)

    \(\varvec{\Delta }^{\prime }\) satisfies the condition that for every \(k=1,\ldots ,K\) there exists a \(j > 2K\), such that \(\sum _{m=1}^{M_j-1} \delta _{jkm}\ge 1\).

Remark 6

Condition (B2) requires that there is at least one item among the last \(J-2K\) items on which attribute k loads onto the main effect for at least one response option.
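For readers who wish to verify the conditions for a candidate design, the sketch below checks (B1) and (B2) for a \(\varvec{\Delta }\) array. It assumes the column ordering of Eq. 3 (column 0 the intercept, columns 1 through K the main effects) and that the items have already been permuted so that the first two blocks of K items correspond to \(\varvec{\Delta }^1\) and \(\varvec{\Delta }^2\); this is a convenience sketch, not part of the formal theory.

```python
import numpy as np

def generically_identifiable(Delta, K):
    """Check conditions (B1) and (B2) of Theorem 2.

    Delta: list of J binary arrays; Delta[j] has shape (M_j, P), with
    column 0 the intercept and columns 1..K the attribute main effects.
    Items 0..K-1 and K..2K-1 are assumed to form Delta^1 and Delta^2.
    """
    J = len(Delta)
    if J < 2 * K:
        return False
    # (B1): in each of the first two blocks, item j's own main-effect
    # column (attribute j) is active for some non-reference option.
    for block in (0, K):
        for k in range(K):
            if Delta[block + k][1:, 1 + k].sum() < 1:
                return False
    # (B2): every attribute loads on a main effect for at least one
    # response option of some remaining item (j > 2K).
    for k in range(K):
        if not any(Delta[j][1:, 1 + k].sum() >= 1 for j in range(2 * K, J)):
            return False
    return True
```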

4 Bayesian Formulation for the Nominal RLCM

Following the same setting as in previous sections, consider an RLCM with N subjects, J items with \(M_j\) (\(j=1,\ldots ,J\)) unordered response options for each item j, and K attributes. We use subscript \(i=1,\ldots ,N\) to index subjects, \(j=1,\ldots ,J\) to index items, \(m=0,\ldots ,M_j-1\) to index options of each item, and \(c=0,\ldots ,2^K-1\) to index latent classes. Let \(\varvec{\alpha }_{i}\) denote the attribute profile of subject i, and let \(Y_{ij}\) denote the response of subject i to item j. The likelihood of observing a sample of N responses to J items is

$$\begin{aligned} p\left( \varvec{Y}=\varvec{y}\mid \varvec{B}, \varvec{\pi }\right) =\prod _{i=1}^{N}\sum _{\varvec{\alpha }_c\in \{0,1\}^K} \pi _c\prod _{j=1}^{J}\prod _{m=0}^{M_j-1}\left( \frac{\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm} \right) }{\sum _{m'=0}^{M_j-1}\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm'} \right) }\right) ^{\mathbb {1}(y_{ij}=m)}. \end{aligned}$$
(21)

The posterior distribution of all parameters for the nominal RLCM is given by

$$\begin{aligned} p(\varvec{\alpha },\varvec{B},\varvec{\Delta },\gamma ,\sigma _{\beta }^{2},\varvec{\pi }|\varvec{y})\propto p(\varvec{y}|\varvec{\alpha },\varvec{B})p(\varvec{\alpha }|\varvec{\pi })p(\varvec{\pi })p(\varvec{B}|\varvec{\Delta },\sigma _\beta ^2)p(\sigma _\beta ^2)p(\varvec{\Delta }|\gamma )p(\gamma ). \end{aligned}$$
(22)

We now formulate the RLCM Bayesian model and priors. Specifically, we use a categorical likelihood conditioned upon attributes and item parameters,

$$\begin{aligned} Y_{ij}|\varvec{\alpha }_i^\top \varvec{v}=c,\varvec{B}_{j}\sim \text {categorical}\left( \varvec{\theta }_{jc}(\varvec{B}_j)\right) . \end{aligned}$$
(23)

We also use a categorical prior for attributes conditioned upon the latent class probabilities,

$$\begin{aligned} \varvec{\alpha }_i|\varvec{\pi }\sim \text {categorical}(\varvec{\pi }) \end{aligned}$$
(24)

and a conjugate Dirichlet prior for the latent class probabilities, \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0)\) where \(\varvec{d}_0\) is a fixed constant vector.

We use stochastic search variable selection priors for the (jpm) elements of \(\varvec{B}\) and \(\varvec{\Delta }\):

$$\begin{aligned} \beta _{jpm}&\mid \delta _{jpm},\sigma _\beta ^2 \sim \left\{ \begin{array}{ll} N(0, \sigma _{\beta }^{2}) &{} \delta _{jpm}=1 \\ N(0, \sigma _{\beta }^{2}/D) &{} \delta _{jpm}=0 \end{array}\right. ,\end{aligned}$$
(25)
$$\begin{aligned} \delta _{jpm}&\mid \gamma \sim Bernoulli(\gamma ), \end{aligned}$$
(26)

where \(\varvec{B}=(\varvec{B}_1,\dots ,\varvec{B}_J)^\top \) satisfies the generic identifiability condition shown in Theorem 2, and the intercept is always set active with \(\delta _{j0m}=1\). Furthermore, D is a large fixed constant (e.g., we consider \(D=100\) and 1000) used to reduce the variance of the spike distribution for the case \(\delta _{jpm}=0\). The priors for the hyper-parameters of the coefficients and activeness parameters are:

$$\begin{aligned} \sigma _{\beta }^2&\sim IGamma(\alpha _{\sigma },\beta _{\sigma }),\end{aligned}$$
(27)
$$\begin{aligned} \gamma&\sim Beta(a,b). \end{aligned}$$
(28)

Here \((\alpha _{\sigma },\beta _{\sigma },D,a,b,\varvec{d}_0)\) are hyper-parameters.
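Given the current coefficients, the conditional update of each activeness indicator under this spike-and-slab prior is a Bernoulli draw whose success probability compares the slab and spike densities at the coefficient value (George & McCulloch, 1993). A minimal sketch of that single step (names illustrative; the full sampler is in Appendix 7):

```python
import numpy as np
from scipy.stats import norm

def sample_delta(beta, gamma, sigma2_beta, D, rng):
    """One SSVS update of an indicator delta_{jpm} given beta_{jpm}:
    slab N(0, sigma2_beta) when delta = 1, spike N(0, sigma2_beta / D)
    when delta = 0, with prior P(delta = 1) = gamma."""
    slab = gamma * norm.pdf(beta, scale=np.sqrt(sigma2_beta))
    spike = (1 - gamma) * norm.pdf(beta, scale=np.sqrt(sigma2_beta / D))
    return rng.binomial(1, slab / (slab + spike))

rng = np.random.default_rng(1)
# A small coefficient is far more plausible under the spike: usually 0.
sample_delta(beta=0.05, gamma=0.5, sigma2_beta=1.0, D=100, rng=rng)
```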

Model parameters of the nominal RLCM are inferred by applying the Polya-gamma data augmentation approach for multinomial logistic regression (Holmes & Held, 2006; Polson et al., 2013) along with the stochastic search variable selection algorithm (George & McCulloch, 1993) to infer the latent structure. The Gibbs sampler draws from the posterior distribution of the model parameters given in Appendix 7, and the full sampling algorithm is presented in Algorithm 1. To address issues with poor starting values, we use a combination of k-means clustering and factor analysis to specify starting values (see the description in Appendix 7).

Algorithm 1 (full Gibbs sampling steps for the nominal RLCM; see Appendix 7)
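As a rough illustration of the Polya-gamma step, the sketch below updates a single column \(\varvec{\beta }_{jm}\) under the one-vs-rest partitioning of the multinomial logit (Holmes & Held, 2006), drawing \(\omega _i\sim PG(1,\psi _i)\) via a truncated version of the sum-of-gammas representation of Polson et al. (2013). In practice an exact Polya-gamma sampler would replace the truncation; all names here are illustrative and this is not the paper's exact implementation.

```python
import numpy as np

def rpg_approx(z, rng, trunc=200):
    """Approximate PG(1, z_i) draws by truncating the representation
    omega = (1 / 2 pi^2) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1) (Polson et al., 2013)."""
    k = np.arange(1, trunc + 1)[:, None]                 # (trunc, 1)
    g = rng.gamma(1.0, 1.0, size=(trunc, len(z)))
    return (g / ((k - 0.5) ** 2 + (z / (2 * np.pi)) ** 2)).sum(0) / (2 * np.pi ** 2)

def update_beta_jm(beta, A, y_is_m, C, prior_prec, rng):
    """Gibbs update of beta_{jm}, holding the other options fixed.

    A:          N x P design matrix with rows a_i.
    y_is_m:     binary vector, 1 if y_ij = m.
    C:          offsets C_i = log sum_{m' != m} exp(a_i' beta_{jm'}).
    prior_prec: P-vector of prior precisions (spike or slab, per delta).
    """
    omega = rpg_approx(A @ beta - C, rng)                # omega_i ~ PG(1, psi_i)
    kappa = y_is_m - 0.5
    prec = A.T @ (A * omega[:, None]) + np.diag(prior_prec)
    cov = np.linalg.inv(prec)
    mean = cov @ (A.T @ (kappa + omega * C))
    return rng.multivariate_normal(mean, cov)            # Gaussian full conditional
```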

5 Monte Carlo Simulation Study

5.1 Settings

In this section, we report results from a Monte Carlo experiment that evaluates the performance of Algorithm 1. We conducted the simulation study under different numbers of attributes (i.e., \(K=2\) and 3), correlations among the attributes (i.e., \(\rho =0\) and 0.25), and sample sizes (i.e., \(N=1000\), 2000, 5000, and 10,000).

For the \(\rho =0\) case, the attribute profile \(\varvec{\alpha }=(\alpha _{1},\ldots ,\alpha _{K})^{\top }\) is generated uniformly from all \(2^K\) possible profiles, so the latent class membership probabilities are \(\varvec{\pi }= (1/2^K,\ldots ,1/2^K)^\top \). For the \(\rho >0\) case, the dependence among attribute profiles is introduced using the method of Chiu et al. (2009). Suppose \(\varvec{Z}=(Z_1,\ldots ,Z_K)^{\top }\) follows a multivariate normal distribution \(N(\varvec{0}, \varvec{\Sigma })\) with unit variances and correlation \(\rho \), where \(\varvec{\Sigma }=(1-\rho )\varvec{I}_K + \rho \varvec{1}_{K}\varvec{1}_{K}^\top \) and \(\varvec{1}_{K}\) is a column vector of ones of length K. Then, the attribute profile \(\varvec{\alpha }\) is given by \(\alpha _{k}={\mathcal {I}}(Z_k \ge \Phi ^{-1}(\frac{k}{K+1}))\), \(k=1,\ldots ,K\), where \(\Phi \) is the cumulative distribution function of the standard normal distribution. In this case, the data generating values for \(\varvec{\pi }\) are computed from integrals of the multivariate normal distribution (Chen et al., 2015; Culpepper & Balamuta, 2021).
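For the \(\rho >0\) case, the attribute-generation step can be sketched as follows (a direct transcription of the thresholding rule above; for \(\rho =0\) the profiles are instead drawn uniformly, as noted).

```python
import numpy as np
from scipy.stats import norm

def simulate_attributes(N, K, rho, rng):
    """Correlated attribute profiles via Chiu et al. (2009): threshold a
    multivariate normal at the cutoffs Phi^{-1}(k / (K + 1))."""
    Sigma = (1 - rho) * np.eye(K) + rho * np.ones((K, K))
    Z = rng.multivariate_normal(np.zeros(K), Sigma, size=N)   # (N, K)
    cut = norm.ppf(np.arange(1, K + 1) / (K + 1))             # per-attribute cutoffs
    return (Z >= cut).astype(int)

alpha = simulate_attributes(N=2000, K=3, rho=0.25, rng=np.random.default_rng(7))
```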

We assume there are \(J=18\) items with \(M_j=4\) unordered options for each item j. For the reparameterized design vector shown in Eq. 3, we include interaction terms among the attributes only up to the two-way terms. Our model does not explicitly contain \(\varvec{Q}\) matrices; therefore, we recover the \(\varvec{Q}\) matrices implied by \(\varvec{\Delta }_m\) for each option \(m=1,2,3\) using the method shown in Chen et al. (2020). The true \(\varvec{\Delta }\) and true \(\varvec{Q}\) matrices for each option are shown below (columns in \(\varvec{\Delta }\) follow the same order as the design vector shown in Eq. 3):

  • \(\varvec{\Delta }\) cube with \(K=2\)

    $$\begin{aligned} \varvec{\Delta }_{m=1}= \left( \begin{array}{llll} 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \end{array}\right) , \quad \varvec{\Delta }_{m=2}=\left( \begin{array}{llll} 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \end{array}\right) , \quad \varvec{\Delta }_{m=3}=\left( \begin{array}{llll} 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \end{array}\right) \end{aligned}$$
    (29)
  • \(\varvec{\Delta }\) cube with \(K=3\)

    $$\begin{aligned} \varvec{\Delta }_{m=1}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) , \varvec{\Delta }_{m=2}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) , \varvec{\Delta }_{m=3}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) \end{aligned}$$
    (30)
  • \(\varvec{Q}\) matrices with \(K=2\)

    $$\begin{aligned} {\varvec{Q}}_{m=1}= \left( \begin{array}{ll} 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=2}=\left( \begin{array}{ll} 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \end{array}\right) , \quad {\varvec{Q}}_{m=3}=\left( \begin{array}{ll} 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \end{array}\right) \end{aligned}$$
    (31)
  • \(\varvec{Q}\) matrices with \(K=3\)

    $$\begin{aligned} {\varvec{Q}}_{m=1}= \left( \begin{array}{lll} 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=2}=\left( \begin{array}{lll} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=3}=\left( \begin{array}{lll} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \end{array}\right) \end{aligned}$$
    (32)

Given \(\varvec{\Delta }\), we generate the coefficients in \(\varvec{B}\) according to their prior distribution shown in Eq. 25 using \(\sigma _\beta ^2 = 1\). For the hyper-parameters presented in the previous section, we set \(\alpha _{\sigma }=\beta _{\sigma }=a=b=1\), \(\varvec{d}_0 = \varvec{1}_{2^K}\), and \(D= 100\). We use a Markov chain of length 20,000 with a 10,000 burn-in period for \(K = 2\), and a chain of length 30,000 with a 20,000 burn-in period for \(K = 3\).

5.2 Results

We repeated the simulation study 100 times for each setting and use several metrics to evaluate parameter recovery. Specifically, we report the average element-wise accuracy rate (EAR) for \(\varvec{Q}\) by comparing the estimated \(\hat{\varvec{Q}}\) with the true \(\varvec{Q}\) matrix, where \(\hat{\varvec{Q}}\) is recovered by aggregating the \(\hat{\varvec{B}}\) samples after the burn-in period (Chen et al., 2020). Note that we transform \(\varvec{B}\) to \(\varvec{\Theta }\) for every sampled value using Eq. 1 and compute the point estimate \(\hat{\varvec{\Theta }}\) as the mean of all sampled \(\varvec{\Theta }\) arrays after the burn-in period. We compute the mean absolute deviation (MAD) to assess the accuracy of the estimated latent class response probabilities \(\hat{\varvec{\Theta }}\), and we report the proportion of attribute profiles that are correctly estimated.

It is important to mention how we address the label-switching problem for the RLCM and ULCM. Like other latent class models, the exploratory RLCM is identified only up to label switching. However, the RLCM has fewer admissible relabelings than the ULCM. For instance, the ULCM has \((2^K)!\) possible arrangements, whereas the RLCM has \(K!\times 2^K\) arrangements (i.e., there are K! ways to permute the order of the attributes and \(2^K\) ways to permute the attribute levels). For each replication, we draw values from the posterior and then compare the posterior means of the parameters (e.g., the \(\theta \)'s or \(\beta \)'s) under all \(K!\times 2^K\) arrangements with the data generating model parameters in order to evaluate parameter recovery. We select the arrangement for the ULCM and RLCM that minimizes the difference between the estimates and the data generating values. It is important to note that we did not find evidence of label switching within chains.
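A minimal sketch of this relabeling search (illustrative; it relies on the binary-integer bijection used throughout): enumerate all \(K!\times 2^K\) attribute-order permutations and level flips and keep the arrangement minimizing the mean absolute difference between estimated and data generating probabilities.

```python
import numpy as np
from itertools import permutations, product

def best_relabeling(theta_hat, theta_true, K):
    """Search the K! * 2^K attribute relabelings (order permutations x
    level flips) for the column reordering of theta_hat that is closest
    to theta_true; the last axis indexes classes c = alpha' v."""
    v = 2 ** np.arange(K - 1, -1, -1)
    profiles = np.array([[(c >> (K - 1 - k)) & 1 for k in range(K)]
                         for c in range(2 ** K)])            # (2^K, K)
    best, best_cols = np.inf, None
    for perm in permutations(range(K)):
        for flips in product([0, 1], repeat=K):
            cols = (profiles[:, list(perm)] ^ np.array(flips)) @ v
            d = np.abs(theta_hat[..., cols] - theta_true).mean()
            if d < best:
                best, best_cols = d, cols
    return best_cols, best
```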

Simulation results in Table 1 show good recovery of the model parameters. For fixed K, as the sample size grows, the MADs of \(\hat{\varvec{\Theta }}\) and \(\varvec{\pi }\) become smaller and the EARs of the \(\hat{\varvec{Q}}\) matrices become larger. The EARs of the \(\hat{\varvec{Q}}\) matrices are higher for smaller K, which is expected given that the number of unknown model parameters increases with K. The simulation results also provide evidence that a positive correlation among attributes (\(\rho > 0\)) results in slightly larger MADs for \(\varvec{\Theta }\) in some instances, and this impact is more systematic for \(K=3\). Although \(\rho =0.25\) slightly decreases recovery of \(\varvec{\Theta }\), \(\varvec{\pi }\), and \(\varvec{Q}\), the results in Table 1 show that attribute classification accuracy improves by a few percentage points. Overall, the classification accuracy is at acceptable levels and generally exceeds 70% for most scenarios.

We also conducted a Monte Carlo experiment for denser \(\varvec{\Delta }\) and \(\varvec{Q}\) than those shown in Eqs. 29-32; the true \(\varvec{\Delta }\) and \(\varvec{Q}\) matrices and simulation results are given in Appendix 7. Table 6 shows model parameter recovery similar to the results in Table 1.

5.3 Unstructured Latent Class Models (ULCMs)

We compare the performance of our model with ULCMs, which assume no latent structure between the latent attribute classes and the observed response variables. Following the same setting as for the nominal RLCM, the likelihood of observing a sample of N independent responses to J items is

$$\begin{aligned} p(\varvec{Y}=\varvec{y}|\varvec{\alpha },\varvec{\Theta })=\prod _{j=1}^J\prod _{c=0}^{2^K-1}\prod _{m=0}^{M_j-1} \theta _{jcm}^{n_{jcm} }, \end{aligned}$$
(33)

where \(n_{jcm}=\sum _{i=1}^{N}{\mathcal {I}}(y_{ij}=m){\mathcal {I}}(\varvec{\alpha }_{i}^\top \varvec{v}=c)\). Then, the posterior distribution of all parameters for the nominal ULCM is given by

$$\begin{aligned} p(\varvec{\alpha },\varvec{\Theta },\varvec{\pi }|\varvec{y})\propto p(\varvec{y}|\varvec{\alpha },\varvec{\Theta })p(\varvec{\Theta })p(\varvec{\alpha }|\varvec{\pi })p(\varvec{\pi }). \end{aligned}$$
(34)

Below is the Bayesian framework for our nominal ULCM. Given the attribute profile \(\varvec{\alpha }\) and the class-response probability matrix \(\varvec{\Theta }\), the response data follow a categorical distribution

$$\begin{aligned} Y_{ij}|\varvec{\alpha }_i^\top \varvec{v}=c,\varvec{\theta }_{jc}\sim \text {Categorical}(\varvec{\theta }_{jc}). \end{aligned}$$
(35)

We use a Dirichlet prior for the class-response probability vectors

$$\begin{aligned} \varvec{\theta }_{jc}\sim \text {Dirichlet}(\varvec{d}_{M_j}), \end{aligned}$$
(36)

and a categorical prior for attributes conditioned upon the latent class probabilities,

$$\begin{aligned} \varvec{\alpha }_i|\varvec{\pi }\sim \text {Categorical}(\varvec{\pi }) \end{aligned}$$
(37)

with a conjugate Dirichlet prior for the latent class probabilities, \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0)\), where \(\varvec{d}_{M_j}\) and \(\varvec{d}_0\) are fixed constant vectors.

We apply a Gibbs sampling algorithm to estimate the model parameters from their full conditional distributions.

  • \(\varvec{\theta }_{jc}\mid \varvec{y}_{1:n},\varvec{\alpha }_{1:n}\sim \text {Dirichlet}(\varvec{n}_{jc}+\varvec{d}_{M_j})\)

    $$\begin{aligned} p(\varvec{\theta }_{jc}\mid \varvec{y}_{1:n},\varvec{\alpha }_{1:n})&\propto p(\varvec{y}_{Ij}\mid \varvec{\alpha }_{I},\varvec{\theta }_{jc})p(\varvec{\theta }_{jc})\nonumber \\&\propto \prod _{m=0}^{M_j-1}\theta _{jcm}^{n_{jcm}}\cdot \prod _{m=0}^{M_j-1}\theta _{jcm}^{d_{m}-1}, \end{aligned}$$
    (38)

    where \(\varvec{n}_{jc} = (n_{jc0},\ldots ,n_{jc,M_j-1})^\top \), \(d_m\) denotes the m-th element of \(\varvec{d}_{M_j}\), and \(I = \left\{ i:\varvec{\alpha }_{i}^\top \varvec{v}=c \right\} \).

  • \(\varvec{\pi }|\varvec{\alpha }\sim Dirichlet(\varvec{n}+\varvec{d}_0)\)

    $$\begin{aligned} p(\varvec{\pi }\mid \varvec{\alpha }_{1:n})\propto p(\varvec{\alpha }_{1:n}\mid \varvec{\pi })p(\varvec{\pi }) \sim \text {Dirichlet}(\varvec{n}+\varvec{d}_0), \end{aligned}$$
    (39)

    where \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0) \) and \(\varvec{n}=(n_0,\cdots ,n_{2^K-1})^\top \) represents the frequencies of each attribute pattern \(\varvec{\alpha }_{i}^\top \varvec{v}=c\), \(c=0,\ldots ,2^K-1\).

  • \(\varvec{\alpha }_i\mid \varvec{\alpha }_{(i)},\varvec{y}_{1:n}\) We update \(\varvec{\alpha }\) while integrating \(\varvec{\pi }\) out

    $$\begin{aligned} p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N)&= \int p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N\mid \varvec{\pi })p(\varvec{\pi })d\varvec{\pi }\nonumber \\&=\dfrac{1}{B(\varvec{d}_0)} \int \left( \prod _{c=0}^{2^K-1} \pi _{c}^{n_{c}+d_{0,c}-1}\right) \textrm{d} \varvec{\pi }\nonumber \\&=\dfrac{B(\varvec{n} + \varvec{d}_0)}{B(\varvec{d}_0)}. \end{aligned}$$
    (40)

    Then, the full conditional distribution for \(\varvec{\alpha }_i\) is

    $$\begin{aligned} p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)})&= \dfrac{p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N)}{p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_{i-1},\varvec{\alpha }_{i+1},\ldots ,\varvec{\alpha }_N)}\nonumber \\&= \frac{n_{c(i)}+1}{n-1+2^K}, \end{aligned}$$
    (41)

    where \(n_{c(i)}\) represents the number of individuals other than i that have attribute profile \(\varvec{\alpha }_c\) (the expression takes \(\varvec{d}_0=\varvec{1}_{2^K}\)). The full conditional distribution of \(\varvec{\alpha }_i\) given \(\varvec{y}_{1:n}\) and \(\varvec{\alpha }_{(i)}\) is

    $$\begin{aligned} p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)},\varvec{y}_{1:n})&\propto p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)})p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta })\nonumber \\&\propto (n_{c(i)}+1)p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta }), \end{aligned}$$
    (42)

    so we update \(\varvec{\alpha }_i\) sequentially with weights proportional to \((n_{c(i)}+1)p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta })\).

The full Gibbs sampling steps for all parameters are shown in Algorithm 2; a minimal code sketch of one sweep follows Algorithm 2 below.

Algorithm 2 (full Gibbs sampling steps for the nominal ULCM)
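A compact sketch of one sweep of these updates under the unit Dirichlet priors used here (so the weight \(n_{c(i)}+1\) in Eq. 42 corresponds to \(\varvec{d}_0=\varvec{1}_{2^K}\)); this is illustrative rather than the paper's exact implementation.

```python
import numpy as np

def ulcm_gibbs_sweep(y, alpha_c, Theta, K, M, d_item=1.0, d0=1.0, rng=None):
    """One Gibbs sweep for the nominal ULCM (Algorithm 2 sketch).

    y:       (N, J) integer responses in {0, ..., M - 1}.
    alpha_c: length-N vector of class indices c = alpha_i' v (updated in place).
    Theta:   (J, C, M) class response probabilities (updated in place).
    """
    rng = rng or np.random.default_rng()
    N, J = y.shape
    C = 2 ** K
    # 1. theta_jc | y, alpha ~ Dirichlet(n_jc + d_{M_j}), Eq. 38.
    for j in range(J):
        for c in range(C):
            counts = np.bincount(y[alpha_c == c, j], minlength=M)
            Theta[j, c] = rng.dirichlet(counts + d_item)
    # 2. alpha_i | alpha_(i), y with pi integrated out, Eq. 42.
    for i in range(N):
        n_c = np.bincount(np.delete(alpha_c, i), minlength=C)
        loglik = np.log(Theta[np.arange(J), :, y[i]]).sum(0)   # (C,)
        w = (n_c + d0) * np.exp(loglik - loglik.max())
        alpha_c[i] = rng.choice(C, p=w / w.sum())
    # 3. pi | alpha ~ Dirichlet(n + d_0), Eq. 39.
    return rng.dirichlet(np.bincount(alpha_c, minlength=C) + d0)
```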

In order to compare the performance of the ULCM with the RLCM, we generate response data from the RLCM and then use both Algorithm 1 and Algorithm 2 to estimate the model parameters under the two models. We use the same simulation settings as for our RLCM. For the hyper-parameters in the ULCM, we use \(\varvec{d}_{M_j}=\varvec{1}_{M_j}\) for \(j=1,\ldots ,J\). Simulation results are shown in Table 1 and provide evidence that the RLCM has better parameter recovery. Table 2 reports additional detail regarding the MADs for \(\varvec{\Theta }\) at the item level for the RLCM and ULCM. The results show that the aggregate findings in Table 1 are consistent with item-level performance: the RLCM has smaller MADs than the ULCM.

The results in Tables 1 and 2 indicate that for response data generated from the RLCM, Algorithm 1 performs uniformly better than Algorithm 2, which implies that when there is structure in the latent relationship between attributes and observed variables, our RLCM achieves better parameter recovery than the ULCM.

Table 1 Summary of simulation performance for RLCM and ULCM.
Table 2 Summary of mean absolute deviations (MADs) of RLCM and ULCM item response probabilities by item for two selected conditions.

6 Applications

6.1 Wagner Preference Inventory

In this section, we apply Algorithm 1 to the Wagner Preference Inventory (WAPI II) data set (Wagner & Wells, 1985). The data set contains nominal responses to \(J=12\) items, each of which contains \(M=4\) choices. All 13,502 participants completed the 12 questions, so we have \(N=13,502\). Table 3 presents the twelve items along with the marginal probability of selecting each response option. The twelve items were originally designed to distinguish preferences along two dimensions: left vs. right brain and logical vs. creative. The proposed two-by-two design included (a) left, logical; (b) left, verbal; (c) right, manipulative/spatial; and (d) right, creative. Separate measures of left and right preference can be obtained by summing (a) and (b) and summing (c) and (d), respectively. To be consistent with Wagner and Wells (1985), we let \(K=2\) to represent the left-right brain dominance dichotomy. We ran five Markov chains with \(K=2\) for convergence diagnostics.

Table 3 Wagner preference inventory items, anchors, and response frequencies.

Figure 1 plots the maximum potential scale reduction factor (PSRF) (Brooks & Gelman, 1998), which checks the convergence of Markov chains with multivariate parameters. Approximate convergence is achieved after 5,000 iterations, since the maximum PSRF remains below 1.1 thereafter. We therefore ran 100 Markov chains of length 20,000 (with 10,000 as burn-in) to estimate the parameters; the results are shown in Table 4.

Fig. 1: The maximum PSRF for Wagner Preference Inventory data.
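The paper monitors the multivariate PSRF of Brooks and Gelman (1998); as a reference point, the univariate building block can be sketched as follows (illustrative).

```python
import numpy as np

def psrf(chains):
    """Univariate potential scale reduction factor for one scalar
    parameter; chains has shape (m, n): m chains of n draws each."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    V = (n - 1) / n * W + B / n                 # pooled variance estimate
    return np.sqrt(V / W)                       # near 1 at convergence
```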

Table 4 Estimated \(\hat{\varvec{\Theta }}\) for Wagner Preference Inventory data.

Table 4 implies that participants with attribute profiles \(\varvec{\alpha }_i = (0,1)^\top \), \((1,1)^\top \), \((0,0)^\top \) and \((1,0)^\top \) prefer option a, b, c and d, respectively. For instance, the choices for item 1 were “a. major in logic”, “b. write a letter”, “c. fix things at home”, and “d. major in art”. The estimates for \(\varvec{\Theta }\) in Table 4 indicate that members of class 01 were most likely to choose option “a” with an estimated response probability of 0.725. In contrast, members of class 00 had a 0.643 chance of selecting option “c” and respondents in the 10 class chose “d” with a probability equal to 0.693.

We also estimated the latent class probabilities of the attribute profiles. Specifically, the proportions of each attribute profile pattern, in increasing order of the bijection \(\varvec{\alpha }_c^\top \varvec{v}\), are shown in Table 5. The latent classes are nearly equal in size, with the largest share of respondents (0.290) classified into the 11 profile (i.e., Wagner's left-verbal group) and 0.204 of respondents in the 00 class (i.e., Wagner's right-manipulative/spatial group).

Table 5 Estimated distribution of attributes in Wagner Preference Inventory data.

The results shown in Table 4 can also be used to evaluate the intended choice design of the items. Most items differentiated among one or two of the underlying latent classes. However, some items, such as items 2, 7, and 10, had options that did not differentiate the latent classes as intended. For item 2, Wagner originally specified option a as a left-logical function and option d as a right-creative function. However, according to the probabilities reported in Table 4, people with attribute profile \(\varvec{\alpha }_i = (0,1)^\top \) did not strongly prefer option a, and people with attribute profile \(\varvec{\alpha }_i = (1,0)^\top \) did not strongly prefer option d. The choice design for item 2 should therefore be reconsidered.

7 Discussion

This paper focuses on the identifiability conditions of RLCMs. We proposed strict and generic identifiability conditions based on the uniqueness conditions for the decomposition of three-way arrays given by Kruskal's theorem (Kruskal, 1976, 1977). The established identifiability conditions are applicable to a wealth of models for binary (e.g., Chen et al., 2015, 2020; de la Torre, 2011), polytomous (e.g., Chen & de la Torre, 2013; Culpepper, 2019; Culpepper & Balamuta, 2021), and nominal response data. Accordingly, the new identifiability results can guide researchers in the design of diagnostic assessments. We then developed a Bayesian formulation for RLCMs that takes the generic identifiability conditions into consideration. In our simulation study, we applied Polya-gamma data augmentation to update the coefficients and compared our algorithm with ULCMs. Simulation results show that our algorithm can efficiently estimate the model parameters, especially when the number of attributes is small. When latent structure is present, our model outperforms the ULCM.

In this paper, we assumed that the number of attributes, K, is fixed and prespecified. However, prior knowledge of K may not be available in practice. Future research may consider K as an unknown parameter to be estimated (e.g., see Chen et al., 2021). An unknown K implies that the dimensions of the attribute profiles, the category response probability array, and the coefficient array are also unknown, which poses a challenging problem for future research.