1 Introduction

Mixture models for multivariate, unordered categorical data, also referred to as nominal data, are widely used as a data reduction technique to uncover a partition of latent classes. Nominal response data arise naturally in a diverse collection of fields, and the associated latent class models have been applied to uncover the structure underlying positional dependence of nucleotides (Dunson & Xing, 2009), survey responses for political elections (DeYoreo et al., 2017), and anuran abundance using calling survey data (Royle & Link, 2005), as well as for multiple imputation of educational (Si & Reiter, 2013) and social science (Murray & Reiter, 2016; Vermunt et al., 2008) surveys. In short, nominal mixture models serve an important role across the physical and social sciences.

Recent psychometric research introduced a class of restricted latent class models (RLCMs) that use a more parsimonious formulation for describing the structure underlying multivariate nominal data (e.g., see Chen & Zhou, 2017, Fang et al., 2019, Templin et al., 2008) than the traditional framework, which we refer to as unrestricted latent class models (ULCMs). For instance, a popular application of nominal RLCMs is to understand how latent classes relate to target and distractor responses on multiple choice tests (e.g., see Bradshaw & Templin, 2014, De La Torre, 2009, DiBello et al., 2015, Ku et al., 2016, Shear & Roussos, 2017, Yigit et al., 2019). To distinguish RLCMs from ULCMs, we let \(\varvec{\Theta }\) denote the emission parameters that govern the likelihood that latent classes select different options on J variables with M unordered options per variable. If C denotes the number of latent classes, the classic ULCM framework includes \(J\times M\times C\) parameters to relate latent class membership to observed responses. In contrast, RLCMs impose structure on the elements of \(\varvec{\Theta }\) by constraining some elements to be equal, so RLCMs include fewer parameters than ULCMs. Furthermore, as we demonstrate below, RLCMs generally offer a more interpretable framework for understanding the latent structure (i.e., the relationship between the latent classes and observed variables). In fact, the pattern of equal and unequal elements in the RLCM \(\varvec{\Theta }\) parameter provides researchers with a guide for interpreting the impact of latent class membership on response probabilities.

Although prior research developed several general models for nominal RLCMs, at least two limitations of existing research restrict the widespread applicability of these methods for statistical research in education and the social sciences. First, existing methods are primarily confirmatory in nature, given that researchers must prespecify the manner in which latent classes relate to observed response probabilities. Specifically, let \(\varvec{\Delta }\) be the RLCM parameter with \(J\times (M-1)\times C\) binary elements that indicate which elements of \(\varvec{\Theta }\) are equal (we formally define the \(\varvec{\Delta }\) matrix below with examples). Currently deployed nominal RLCMs must specify every element of \(\varvec{\Delta }\), which may be challenging for some research applications. Whereas researchers may be able to correctly articulate the latent structure in \(\varvec{\Delta }\) for some applications (e.g., target and distractor responses on some multiple choice tests), the general unavailability of substantive theory limits widespread application of nominal RLCMs. Second, numerous studies developed RLCMs for multivariate nominal data, yet there has been limited research on the conditions needed to ensure model parameters are identified. Several studies discussed identifiability for ULCMs (e.g., see Allman et al., 2009); however, those results do not apply directly to RLCMs because Allman et al. (2009) consider an unrestricted parameter space, whereas the parameter space of our RLCM is restricted by the structure of \(\varvec{\Delta }\). Consequently, the RLCM parameter space falls into a measure zero set with respect to the whole parameter space of ULCMs discussed in Allman et al. (2009), so the identifiability conditions mentioned above for ULCMs cannot be directly applied to our RLCMs. Our paper thus contributes to the literature on the identifiability of RLCMs. An extensive literature has examined local identifiability, which aims to ensure that model parameters are identifiable in a neighborhood of the true parameters: McHugh (1956) proposed sufficient conditions for local identifiability of latent class models with binary responses, Goodman (1974) extended the conditions to latent class models with polytomous responses, and Huang and Bandeen-Roche (2004) proposed local identifiability conditions for latent class models with covariates. For global identifiability, numerous papers propose strict and generic identifiability conditions for binary response data (Chen et al., 2015, 2020; Xu, 2017; Xu & Shang, 2018) and strict identifiability conditions for polytomous response data (Culpepper, 2019; Fang et al., 2019). Additionally, Gu and Dunson (2021) establish strict and generic identifiability conditions for a multiclass, multilayer latent structure model, which can be viewed as more general than the model we consider because it admits a multilayered, hierarchical structure for attributes. One strength of our paper relative to Gu and Dunson (2021) is that our identifiability conditions provide practitioners with clear guidance for designing nominal response assessments (e.g., forced-choice inventories). Furthermore, our identifiability conditions include generic conditions that are applicable to polytomous RLCMs.

Accordingly, the goal of our study is to address the aforementioned shortcomings in the literature. That is, we propose a fully exploratory framework for inferring nominal RLCM parameters and present new theory regarding model identification. The identifiability of model parameters is critical for statistical inference and we also provide researchers with guidance for designing multivariate nominal response studies.

It is also important to distinguish the models we explore in this study from polytomous latent class models. Specifically, researchers have advanced RLCMs for polytomous data using both confirmatory (e.g., see Ma & de la Torre, 2016, 2019) and exploratory methods (Culpepper, 2019; Culpepper & Balamuta, 2021; Jimenez et al., 2023). Several studies (Bacci et al., 2014; Bartolucci, 2007; Gnaldi et al., 2020) have also described latent class models within an item response theory (IRT) framework with at least three link functions (i.e., graded response, partial credit, and continuation ratio). These prior studies made important contributions and demonstrated how to use link functions for modeling ordered, polytomous response data with latent class models. In contrast, an important innovation of our study is that we deploy the multinomial logistic link function, which is suitable for unordered, nominal responses.

The remainder of this paper includes six sections. The first section provides a general introduction to ULCMs and RLCMs for nominal data and the second section presents new theoretical results concerning the identifiability of RLCMs (please see Appendix for related proofs). The third section outlines a Bayesian formulation for inferring the RLCM parameter posterior distribution. The fourth section reports Monte Carlo results concerning the accuracy of the developed algorithm and the fifth section reports results from an application. The final section discusses the implications of this study and provides concluding remarks.

2 Overview of Mixture Models for Nominal Responses

We consider the setting where multivariate, nominal response data are available such that \(Y_j\) (for \(j=1,\dots ,J\)) is a random categorical (or nominal) response with a realization \(y_j\in \left\{ 0,\dots ,M_j-1\right\} \), where \(M_j\ge 2\) denotes the number of unordered response options. We denote the random J-vector by \(\varvec{Y}=(Y_1,\dots , Y_J)^\top \) and the observed vector of responses by \(\varvec{y}= (y_{1},\dots ,y_{J})^\top \). The support for \(\varvec{Y}\) is defined as \(\varvec{y}\in \times _{j=1}^J \left\{ 0,\dots ,M_j-1\right\} \), which implies there are \(\prod _{j=1}^JM_j\) possible observed response patterns. The purpose of this section is to discuss the role of mixture models in understanding multivariate, nominal response patterns. The first subsection reviews existing unstructured latent class models (ULCMs) for nominal, unordered response data. ULCMs offer a powerful framework for uncovering substantively meaningful latent classes. However, the results from ULCM data analyses may not always be easily interpretable, as researchers must decipher the meaning of latent classes by comparing many latent class parameters. Accordingly, the second subsection introduces a new general restricted latent class model (RLCM) framework, which has the benefit of directly uncovering the latent structure by providing researchers with a \(\varvec{\Delta }\) parameter for more easily interpreting the class labels.

2.1 Unstructured Latent Class Models (ULCMs)

The goal of this section is to review the traditional ULCM framework. Let \(c\in \{0,\dots ,C-1\}\) index the C underlying latent classes. In the case of nominal data, the unstructured model includes a \(M_j\)-vector of category response probabilities for each class and item denoted by \(\varvec{\theta }_{jc}=(\theta _{jc0},\dots ,\theta _{jc,M_j-1})^\top \) so that the probability of observing a response of m on item j for members of class c is \(\theta _{jcm}=P(Y_j=m|c)\). We define \(\varvec{\Theta }_j=(\varvec{\theta }_{j0},\dots ,\varvec{\theta }_{j,C-1})\) as the \(M_j\times C\) matrix of response probabilities by response option and latent class. The goal of ULCMs is to describe the \(\prod _{j=1}^J M_j\) possible response patterns. ULCMs consider the case where latent classes differ in their chances of responding according to a given response pattern. The probability vector that governs the chance members of class c respond according to one of the \(\prod _{j=1}^J M_j\) possible response patterns is \(\mathbb {P}_c = \bigotimes _{j=1}^J\varvec{\theta }_{jc}\) where \(\otimes \) denotes a Kronecker product. Let \(\varvec{\pi }=(\pi _0,\dots ,\pi _{C-1})^\top \) be a C-vector of structural probabilities such that \(\pi _c\) denotes the chance of membership in class c and note that the model implied response pattern probability vector is \(\mathbb {P}=\sum _{c=0}^{C-1}\pi _c \mathbb {P}_c\).
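To make this construction concrete, the following minimal Python sketch (all names are ours and purely illustrative) computes \(\mathbb {P}_c\) by chaining Kronecker products across items and then mixes over classes with weights \(\varvec{\pi }\).

```python
import numpy as np

def pattern_probabilities(theta, pi):
    """Model-implied probabilities of all prod_j M_j response patterns.

    theta : list of J arrays; theta[j] has shape (M_j, C), and column c
            holds the option probabilities for item j in latent class c.
    pi    : length-C vector of latent class membership probabilities.
    """
    P = 0.0
    for c, pi_c in enumerate(pi):
        Pc = np.ones(1)
        for theta_j in theta:              # P_c = kron_{j=1}^J theta_jc
            Pc = np.kron(Pc, theta_j[:, c])
        P = P + pi_c * Pc                  # P = sum_c pi_c * P_c
    return P

# Two items with M_j = 2 options and C = 2 classes; the Kronecker
# convention orders patterns as (y1, y2) = (0,0), (0,1), (1,0), (1,1).
theta = [np.array([[0.8, 0.3], [0.2, 0.7]]),
         np.array([[0.6, 0.1], [0.4, 0.9]])]
pi = np.array([0.5, 0.5])
assert np.isclose(pattern_probabilities(theta, pi).sum(), 1.0)
```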

2.2 Restricted Latent Class Models (RLCMs)

This subsection introduces an RLCM for nominal data, which offers a more interpretable solution by imposing restrictions on the ULCM \(\varvec{\theta }_{jc}\) parameters. In particular, the RLCM adapts the ULCM to describe the \(\prod _{j=1}^JM_j\) response patterns by reparameterizing both the latent space and the parameters. First, the RLCM defines the latent classes using a K-dimensional binary attribute vector \(\varvec{\alpha }=(\alpha _1,\dots ,\alpha _K)^\top \in \{0,1\}^K\). Therefore, the connection between the number of classes in the ULCM and the RLCM is \(C=2^K\). An advantage of using the binary attribute profile is that researchers can interpret \(\alpha _k=1\) as denoting possession or mastery of attribute k and \(\alpha _k=0\) otherwise. The relationship between the ULCM and RLCM is also apparent when using a bijection between the binary attribute profile \(\varvec{\alpha }\) and the integers \(c\in \{0,\dots , 2^K-1\}\), defined by \(c=\varvec{\alpha }^\top \varvec{v}\in \{0,\dots , 2^K-1\}\) with \(\varvec{v}=(2^{K-1},2^{K-2},\dots ,1)^\top \).
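The bijection is easy to make concrete; the small sketch below (illustrative, with \(K=3\)) maps binary profiles to class indices and back.

```python
import numpy as np

K = 3
v = 2 ** np.arange(K - 1, -1, -1)          # v = (2^{K-1}, ..., 2, 1)

def class_of(alpha):
    """Map a binary attribute profile to its class index c = alpha' v."""
    return int(np.dot(alpha, v))

def profile_of(c):
    """Inverse map: recover the binary attribute profile from c."""
    return np.array([(c >> (K - 1 - k)) & 1 for k in range(K)])

assert class_of(np.array([1, 0, 1])) == 5
assert (profile_of(5) == np.array([1, 0, 1])).all()
```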

Second, the RLCM reparameterizes the elements of \(\varvec{\theta }_{jc}\) using the following multinomial logit-link function

$$\begin{aligned} \theta _{jcm}=\frac{\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm} \right) }{\sum _{m'=0}^{M_j-1}\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm'} \right) } \end{aligned}$$
(1)

where \(\varvec{a}_c\) is a design vector for the attribute profile of class c and \(\varvec{\beta }_{jm}\) is a P-vector of coefficients for item j and option m (i.e., P depends on the order of the model; \(P=2^K\) if we include main-effect and all interaction-effect terms for the latent classes). Note that the restriction \(\varvec{\beta }_{j0}=\varvec{0}\) is deployed for all j to identify the model. Furthermore, the restriction on \(\varvec{\beta }_{j0}\) implies that \(y_j=0\) is the reference response, so that \(\varvec{\beta }_{jm}\) for \(m>0\) quantifies the impact of the attributes on response values of \(y_j=m\) versus \(y_j=0\) on item j. Let the \(M_j\times 2^K\) matrix of coefficients for item j be denoted by \(\varvec{B}_j=(\varvec{\beta }_{j0},\dots ,\varvec{\beta }_{j,M_j-1})^\top \).

An important implication of reparameterizing \(\theta _{jcm}\) with a multinomial logit-link is that the transformed \(\varvec{\beta }_{jm}\) parameters provide a more coherent interpretation regarding the process by which the underlying attributes relate to observed responses. For instance, we define the \(2^K\)-vector \(\varvec{a}_c\) as including main- and interaction-effect terms for latent class \(\varvec{\alpha }^\top \varvec{v}=c\). Consequently, the elements of \(\varvec{\beta }_{jm}\) indicate the manner by which the attributes translate into preferences for response option m relative to response option zero.

We next present an example to further illustrate the link between ULCMs and RLCMs and the interpretation of the \(\varvec{a}_c\) and \(\varvec{\beta }_{jm}\) parameters.

Example 1

Suppose \(K=3\) and \(M_j=3\), so \(y_j\in \{0,1,2\}\). In this case, the matrix of ULCM parameters is,

$$\begin{aligned} \varvec{\Theta }_j = \begin{bmatrix} \theta _{j00}&{}\quad \theta _{j10}&{}\quad \theta _{j20}&{}\quad \theta _{j30}&{}\quad \theta _{j40}&{}\quad \theta _{j50}&{}\quad \theta _{j60}&{}\quad \theta _{j70}\\ \theta _{j01}&{}\quad \theta _{j11}&{}\quad \theta _{j21}&{}\quad \theta _{j31}&{}\quad \theta _{j41}&{}\quad \theta _{j51}&{}\quad \theta _{j61}&{}\quad \theta _{j71}\\ \theta _{j02}&{}\quad \theta _{j12}&{}\quad \theta _{j22}&{}\quad \theta _{j32}&{}\quad \theta _{j42}&{}\quad \theta _{j52}&{}\quad \theta _{j62}&{}\quad \theta _{j72}\\ \end{bmatrix} \end{aligned}$$
(2)

where we note that \(\theta _{jc0}=1-\sum _{m=1}^{M_j-1}\theta _{jcm}\) for all \(c\in \{0,1,\dots ,7\}\). In this setting, the ULCM includes \(2\times 8=16\) free parameters for each item. Moreover, in order to understand the meaning of the latent classes, researchers would need to interpret differences in the \(16\cdot J\) total class probabilities, which may be challenging for even a modest number of items J. The RLCM attempts to address this problem by reparameterizing both the latent classes and the item parameters. In the case with \(K=3\), we define the design vector \(\varvec{a}\) for an arbitrary attribute profile as:

$$\begin{aligned} \varvec{a}^\top = (1,\alpha _1,\alpha _2,\alpha _3,\alpha _1\alpha _2,\alpha _1\alpha _3,\alpha _2\alpha _3,\alpha _1\alpha _2\alpha _3) \end{aligned}$$
(3)

so that \(\varvec{a}\) includes all main-effect and interaction terms among the attributes and we use \(\varvec{a}_c\) to refer to the design vector for attribute profile \(\varvec{\alpha }^\top \varvec{v}=c\). The matrix of reparameterized parameters \(\varvec{\beta }_j\) for relating \(\varvec{\alpha }\) to \(Y_j\) is

$$\begin{aligned} \varvec{B}_j = \begin{bmatrix} \beta _{j00}&{}\quad \beta _{j10}&{}\quad \beta _{j20}&{}\quad \beta _{j30}&{}\quad \beta _{j40}&{}\quad \beta _{j50}&{}\quad \beta _{j60}&{}\quad \beta _{j70}\\ \beta _{j01}&{}\quad \beta _{j11}&{}\quad \beta _{j21}&{}\quad \beta _{j31}&{}\quad \beta _{j41}&{}\quad \beta _{j51}&{}\quad \beta _{j61}&{}\quad \beta _{j71}\\ \beta _{j02}&{}\quad \beta _{j12}&{}\quad \beta _{j22}&{}\quad \beta _{j32}&{}\quad \beta _{j42}&{}\quad \beta _{j52}&{}\quad \beta _{j62}&{}\quad \beta _{j72}\\ \end{bmatrix}. \end{aligned}$$
(4)

Note we can view \(\varvec{a}_c^\top \varvec{\beta }_{jm}\) as the latent response propensity for members of class c to pick option m vs. option 0. Therefore, the definition of \(\varvec{a}\) implies that \(\beta _{j0m}\) is an intercept term that corresponds with the latent propensity for the latent class with \(\varvec{\alpha }=\varvec{0}\) to select option m vs. 0. Furthermore, the main effects of \(\alpha _1\), \(\alpha _2\), and \(\alpha _3\) for distinguishing response m from 0 are \(\beta _{j1m}\), \(\beta _{j2m}\), and \(\beta _{j3m}\), respectively. The two-way interaction terms are \(\alpha _1\alpha _2\), \(\alpha _1\alpha _3\), and \(\alpha _2\alpha _3\), with effects \(\beta _{j4m}\), \(\beta _{j5m}\), and \(\beta _{j6m}\), respectively, and the three-way interaction effect is \(\beta _{j7m}\). In general, positive coefficients suggest a preference for option m over option 0, and the interaction effects provide researchers with insight into the extent to which preferences are determined by a complex interplay of the attributes.
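As an illustration of Eqs. 1 and 3, the sketch below builds the design vector for an arbitrary profile and evaluates the implied response probabilities for one item; the randomly drawn \(\varvec{B}_j\) is purely illustrative, with its first row fixed at zero per the identification restriction \(\varvec{\beta }_{j0}=\varvec{0}\).

```python
import numpy as np
from itertools import combinations

def design_vector(alpha):
    """Design vector of Eq. 3: intercept, main effects, then the
    two-way through K-way interaction terms."""
    K = len(alpha)
    a = [1.0]
    for order in range(1, K + 1):
        for idx in combinations(range(K), order):
            a.append(float(np.prod([alpha[k] for k in idx])))
    return np.array(a)

def item_probs(B_j, alpha):
    """Eq. 1: option probabilities theta_{jc} for the class with
    profile alpha; B_j is M_j x P with first row beta_{j0} = 0."""
    eta = B_j @ design_vector(alpha)       # propensities a_c' beta_{jm}
    e = np.exp(eta - eta.max())            # numerically stable softmax
    return e / e.sum()

# Illustrative item with K = 3 and M_j = 3.
rng = np.random.default_rng(0)
B_j = np.vstack([np.zeros(2 ** 3), rng.normal(size=(2, 2 ** 3))])
theta_jc = item_probs(B_j, np.array([1, 0, 1]))
assert np.isclose(theta_jc.sum(), 1.0)
```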

The aforementioned example demonstrates the ability of the RLCM to provide researchers with a clearer interpretation of the latent structure (i.e., the relationship between attributes and observed responses). Still, each \(\varvec{B}_j\) includes many parameters to estimate and interpret. A further refinement we advance to support coherent inferences about the latent structure is to incorporate variable selection methods into the RLCM to infer which elements of \(\varvec{B}_j\) are active (i.e., different from zero) versus inactive (i.e., equal to or near zero). In fact, the pattern of active vs. inactive elements of \(\varvec{B}_j\) indicates the underlying structure and describes the process by which attributes relate to the observed response \(Y_j\). Accordingly, we introduce an \(M_j\times 2^K\) binary matrix \(\varvec{\Delta }_j\) to indicate which elements of \(\varvec{B}_{j}\) are active. Specifically, \(\delta _{jpm}=1\) denotes that \(\beta _{jpm}\) is active (i.e., nonzero), and \(\delta _{jpm}=0\) denotes that \(\beta _{jpm}=0\) (i.e., inactive). Note that we always include the intercept and fix \(\delta _{j0m}=1\) for all \(m\in \{1,\dots ,M_j-1\}\).

We next revisit Example 1 to highlight the role of \(\varvec{\Delta }_j\) in interpreting the latent structure.

Example 2

Reconsider the case with \(M_j=3\) and \(K=3\). In this case, \(\varvec{\Delta }_j\) is generally written as

$$\begin{aligned} \varvec{\Delta }_j=\begin{bmatrix} \delta _{j00}&{}\quad \delta _{j10}&{}\quad \delta _{j20}&{}\quad \delta _{j30}&{}\quad \delta _{j40}&{}\quad \delta _{j50}&{}\quad \delta _{j60}&{}\quad \delta _{j70}\\ \delta _{j01}&{}\quad \delta _{j11}&{}\quad \delta _{j21}&{}\quad \delta _{j31}&{}\quad \delta _{j41}&{}\quad \delta _{j51}&{}\quad \delta _{j61}&{}\quad \delta _{j71}\\ \delta _{j02}&{}\quad \delta _{j12}&{}\quad \delta _{j22}&{}\quad \delta _{j32}&{}\quad \delta _{j42}&{}\quad \delta _{j52}&{}\quad \delta _{j62}&{}\quad \delta _{j72}\\ \end{bmatrix}. \end{aligned}$$
(5)

Note that \(\delta _{jp0}=0\) for all \(p=0,\dots , 2^K-1\) to identify the model parameters and that terms for the intercepts are generally specified as active so \(\delta _{j01}=\delta _{j02}=1\).

Remark 1

If \(\varvec{\Delta }_j = \varvec{1}\) for all \(j=1, \ldots , J\), which implies that all coefficients in \(\varvec{B}\) are active, the latent classes have distinct response probabilities, and the RLCM is equivalent to a ULCM. See Example 1 of Chen et al. (2020) for additional discussion involving the binary response RLCM.

Note that the pattern of 1's and 0's in \(\varvec{\Delta }_j\) conveys different types of relationships and structures. An item is said to have simple structure for attribute k if the response probabilities differ only by levels of \(\alpha _k\).

Definition 1

The structure of \(\varvec{\Delta }_j\), which is a slice of \(\varvec{\Delta }\) for item j, is referred to as simple structure for attribute k if it satisfies the following structure:

$$\begin{aligned} \varvec{\Delta }_j= \begin{bmatrix} 0&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad 0&{}\quad 0&{}\quad \cdots &{}\quad 0\\ 1&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad \delta _{jk1}&{}\quad 0&{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ 1&{}\quad 0&{}\quad \cdots &{}\quad 0&{}\quad \delta _{jk,M_j-1}&{}\quad 0&{}\quad \cdots &{}0\\ \end{bmatrix}_{M_j \times P}, \end{aligned}$$
(6)

and \(\sum _{m=1}^{M_j-1} \delta _{jkm}\ge 1\) where P generally equals \(2^K\).

Remark 2

Note that, for convenience of notation, our identifiability proof below supposes that item j has simple structure for attribute j.

Example 3

Consider \(M_1=M_2=3\), \(P=2^K\) with \(K=3\), and \(J=2\), and note that examples of \(\varvec{\Delta }\) matrices that satisfy simple structure according to Definition 1 are:

$$\begin{aligned} \varvec{\Delta }_1 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}, \end{aligned}$$
(7)
$$\begin{aligned} \varvec{\Delta }_2 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ 1&{}\quad 0&{}\quad 1&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}. \end{aligned}$$
(8)

Items 1 and 2 both have simple structure, so the probability of selecting the responses relates only to standing on \(\alpha _1\) in \(\varvec{\Delta }_1\) and on \(\alpha _2\) in \(\varvec{\Delta }_2\). \(\varvec{\Delta }_1\) indicates that only the main effect of \(\alpha _1\) differentiates between response options 1 vs. 0 and 2 vs. 0. In contrast, for item 2, \(\varvec{\Delta }_2\) represents the case where the main effect of \(\alpha _2\) is active only for differentiating between response options 2 vs. 0. The associated \(\varvec{B}_1\) and \(\varvec{B}_2\) matrices for the structure parameters \(\varvec{\Delta }_1\) and \(\varvec{\Delta }_2\) are:

$$\begin{aligned} \varvec{B}_1 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{101}&{}\quad \beta _{111}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{102}&{}\quad \beta _{112}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}, \end{aligned}$$
(9)
$$\begin{aligned} \varvec{B}_2 = \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{201}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \beta _{202}&{}\quad 0&{}\quad \beta _{222}&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0&{}\quad 0\\ \end{bmatrix}. \end{aligned}$$
(10)

Let \(\varvec{\Theta }_j(\varvec{B}_j)\) denote the latent class response probabilities associated with the RLCM \(\varvec{B}_j\) matrix. The presence of structure in \(\varvec{B}_j\) and \(\varvec{\Delta }_j\) implies that elements of \(\varvec{\Theta }_j(\varvec{B}_j)\) are restricted to be equal.

Example 4

The rows of \(\varvec{\Delta }_1\) in Eq. 7 imply that certain latent classes have a common probability of selecting response 1 vs 0 and 2 vs. 0. That is, latent classes that do not possess the first attribute such that \(\varvec{\alpha }=(0,\alpha _2,\alpha _3)\) have common response probabilities for selecting options 0, 1, and 2 of \(\theta _{100}\), \(\theta _{101}\), and \(\theta _{102}\), respectively, whereas classes with the first attribute with \(\varvec{\alpha }=(1,\alpha _2,\alpha _3)\) have common response probabilities of \(\theta _{140}\), \(\theta _{141}\), and \(\theta _{142}\). The \(\varvec{\Theta }_1(\varvec{B}_1)\) in this case is

$$\begin{aligned} \varvec{\Theta }_1(\varvec{B}_1) = \begin{bmatrix} \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{100}&{}\quad \theta _{140}&{}\quad \theta _{140}&{}\quad \theta _{140}&{}\quad \theta _{140}\\ \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{101}&{}\quad \theta _{141}&{}\quad \theta _{141}&{}\quad \theta _{141}&{}\quad \theta _{141}\\ \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{102}&{}\quad \theta _{142}&{}\quad \theta _{142}&{}\quad \theta _{142}&{}\quad \theta _{142}\\ \end{bmatrix}, \end{aligned}$$
(11)

where the columns of \(\varvec{\Theta }_1(\varvec{B}_1)\) are organized according to the binary-integer bijection and

$$\begin{aligned} \varvec{\theta }_{10}=(\theta _{100},\theta _{101},\theta _{102})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{10m})}\left( 1,\exp (\beta _{101}),\exp (\beta _{102})\right) , \end{aligned}$$
(12)
$$\begin{aligned} \varvec{\theta }_{14}=(\theta _{140},\theta _{141},\theta _{142})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{10m}+\beta _{11m})}\left( 1,\exp (\beta _{101}+\beta _{111}), \exp (\beta _{102}+\beta _{112})\right) . \end{aligned}$$
(13)

In contrast, the rows of \(\varvec{\Delta }_2\) in Eq. 8 imply that a different collection of elements is constrained to be equal in \(\varvec{\Theta }_2(\varvec{B}_2)\). Latent classes that do not possess the second attribute, such that \(\varvec{\alpha }=(\alpha _1,0,\alpha _3)\), have common response probabilities for selecting options 0, 1, and 2 of \(\theta _{200}\), \(\theta _{201}\), and \(\theta _{202}\), respectively, whereas classes with the second attribute, with \(\varvec{\alpha }=(\alpha _1,1,\alpha _3)\), have common response probabilities of \(\theta _{220}\), \(\theta _{221}\), and \(\theta _{222}\). The \(\varvec{\Theta }_2(\varvec{B}_2)\) in this case is

$$\begin{aligned} \varvec{\Theta }_2(\varvec{B}_2) = \begin{bmatrix} \theta _{200}&{}\quad \theta _{200}&{}\quad \theta _{220}&{}\quad \theta _{220}&{}\quad \theta _{200}&{}\quad \theta _{200}&{}\quad \theta _{220}&{}\quad \theta _{220}\\ \theta _{201}&{}\quad \theta _{201}&{}\quad \theta _{221}&{}\quad \theta _{221}&{}\quad \theta _{201}&{}\quad \theta _{201}&{}\quad \theta _{221}&{}\quad \theta _{221}\\ \theta _{202}&{}\quad \theta _{202}&{}\quad \theta _{222}&{}\quad \theta _{222}&{}\quad \theta _{202}&{}\quad \theta _{202}&{}\quad \theta _{222}&{}\quad \theta _{222}\\ \end{bmatrix}, \end{aligned}$$
(14)

where the columns of \(\varvec{\Theta }_2(\varvec{B}_2)\) are organized according to the binary-integer bijection and

$$\begin{aligned} \varvec{\theta }_{20}=(\theta _{200},\theta _{201},\theta _{202})^\top = \frac{1}{\sum _{m=0}^2 \exp (\beta _{20m})}\left( 1,\exp (\beta _{201}),\exp (\beta _{202})\right) , \end{aligned}$$
(15)
$$\begin{aligned} \varvec{\theta }_{22}=(\theta _{220},\theta _{221},\theta _{222})^\top = \frac{\left( 1,\exp (\beta _{201}),\exp (\beta _{202}+\beta _{222})\right) }{1+ \exp (\beta _{201})+ \exp (\beta _{202}+\beta _{222})}. \end{aligned}$$
(16)

Remark 3

Note that \(\varvec{\Delta }_j\) can also denote different structures where multiple attributes relate to response variables. For instance, \(\varvec{\Delta }_j\) might specify the inclusion of interaction terms so that response probabilities are shaped by a more complex relationship of the attributes. Furthermore, we can also draw a connection between the ULCM and RLCM where \(\varvec{\Delta }_j=(0,\varvec{1}_{M_j-1}^\top )^\top \varvec{1}_P^\top \) corresponds with the ULCM setting with distinct elements in \(\varvec{\Theta }_j(\varvec{B}_j)\).

3 Identifiability

3.1 Model Identifiability

As introduced in the previous section, the probability distribution over latent classes is given by \(\varvec{\pi }=(\pi _c)^\top \in [0,1]^{2^K}\) with \(\sum _c \pi _c =1\). The coefficient array \(\varvec{B}=(\varvec{B}_1, \ldots ,\varvec{B}_J)\) is a three-dimensional array, where \(\varvec{B}_j\), the j-th slice of \(\varvec{B}\), has size \(M_j\times P\). We denote the parameter space of \((\varvec{\pi },\varvec{B})\) by

$$\begin{aligned} \Omega (\varvec{\pi }, \varvec{B})=\{(\varvec{\pi }, \varvec{B}):\varvec{\pi }\in \Omega (\varvec{\pi }),\varvec{B}\in \Omega (\varvec{B})\}, \end{aligned}$$
(17)

where \(\Omega (\varvec{\pi })=\{\varvec{\pi }\in [0,1]^{2^K}:\sum _c \pi _c =1\}\), and \(\Omega (\varvec{B})\) represents the parameter space of the coefficient array \(\varvec{B}\), which could be the whole real space \({\mathbb {R}}^{P\times \sum _j M_j}\) or a subset of \(\mathbb R^{P\times \sum _j M_j}\) if constrained by \(\varvec{\Delta }\).

Definition 2

(Strict Identifiability) The parameters \((\varvec{\pi }, \varvec{B}) \in \Omega (\varvec{\pi }, \varvec{B})\) are identifiable if

$$\begin{aligned} P(\varvec{Y} =\varvec{y} \mid \varvec{\pi }, \varvec{B})=P(\varvec{Y} =\varvec{y} \mid \bar{\varvec{\pi }},\bar{\varvec{B}}) \Longleftrightarrow (\varvec{\pi },\varvec{B}) \sim (\bar{\varvec{\pi }},\bar{\varvec{B}}), \end{aligned}$$

where \((\bar{\varvec{\pi }}, \bar{\varvec{B}})\) is another value from the parameter space \(\Omega (\varvec{\pi }, \varvec{B})\) and “\(\sim \)” means two parameter values are equivalent up to label switching of attributes.

3.2 Generic Identifiability

Generic identifiability, which is a weaker notion of identifiability than Definition 2, was first introduced by Allman et al. (2009). Generic identifiability allows some exceptional parameter values for which strict identifiability does not hold, as long as the non-identifiable parameters form a set of Lebesgue measure zero within the parameter space. Because the non-identifiable parameters lie in a measure zero set, one is unlikely to face identifiability problems in performing inference. Thus, generic identifiability is generally sufficient for data analysis purposes.

However, the generic identifiability condition shown in Allman et al. (2009) cannot be applied directly in this paper. Under the setting of Allman et al. (2009), the parameter space \(\Omega (\varvec{B})\) is the whole real space \(\mathbb R^{P\times \sum _j M_j}\), whereas the parameter space \(\Omega (\varvec{B})\) in our RLCM is restricted by the structure of \(\varvec{\Delta }\). The dimension of \(\Omega (\varvec{B})\) varies with the \(\varvec{\Delta }\) array; that is, the parameter space of \(\varvec{B}\) restricted by \(\varvec{\Delta }\) might be a measure zero subspace of the parameter space of \(\varvec{B}\) restricted by another array \(\tilde{\varvec{\Delta }}\). Thus, it is important to discuss generic identifiability within a parameter space with a fixed \(\varvec{\Delta }\).

Therefore, in order to discuss generic identifiability for our RLCM, we need to define the parameter space \(\Omega (\varvec{B})\) by taking into account the sparsity structure due to the \(\varvec{\Delta }\) array. Analogous to Eq. 17, we denote the model parameter space with a given \(\varvec{\Delta }\) by

$$\begin{aligned} \Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B})=\{(\varvec{\pi }, \varvec{B}):\varvec{\pi }\in \Omega (\varvec{\pi }),\varvec{B}\in \Omega _{\varvec{\Delta }}(\varvec{B})\}. \end{aligned}$$
(18)

Coefficients in \(\Omega _{\varvec{\Delta }}(\varvec{B})\) are active when the corresponding elements in \(\varvec{\Delta }\) equal 1, so the parameter space \(\Omega _{\varvec{\Delta }}(\varvec{B})\) is \(\mathbb R^{|\varvec{\Delta }|}\), where \(|\varvec{\Delta }|\) is the sum of the entries of \(\varvec{\Delta }\). For generic identifiability, it suffices to consider the parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B})\) with a given sparsity structure \(\varvec{\Delta }\).

Let \(S_{\varvec{\Delta }}\) denote the set of non-identifiable parameters from \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\):

$$\begin{aligned} \begin{aligned} S_{\varvec{\Delta }}=\{(\varvec{\pi },\varvec{B}):&\ P(\varvec{Y} =\varvec{y} \mid \varvec{\pi },\varvec{B})= P(\varvec{Y} =\varvec{y} \mid \bar{\varvec{\pi }}, \bar{\varvec{B}}),\\ ( \varvec{\pi },\varvec{B}) \not \sim (\bar{\varvec{\pi }},&\bar{\varvec{B}}),\ (\varvec{\pi }, \varvec{B})\in \Omega _{\varvec{\Delta }}(\varvec{\pi }, \varvec{B}),(\bar{\varvec{\pi }}, \bar{\varvec{B}})\in \Omega _{\bar{\varvec{\Delta }}}(\varvec{\pi }, \varvec{B})\}. \end{aligned} \end{aligned}$$
(19)

Remark 4

The non-identifiable parameters \((\varvec{\pi }, \varvec{B}) \in S_{\varvec{\Delta }}\) could be due to some other parameters \((\bar{\varvec{\pi }}, \bar{\varvec{B}})\) with a different sparsity structure \(\bar{\varvec{\Delta }}\).

If the non-identifiable parameter set \(S_{\varvec{\Delta }}\) is of measure zero within parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\), then we say \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is a generically identifiable parameter space.

Definition 3

(Generic Identifiability) The parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is generically identifiable, if the Lebesgue measure of \(S_{\varvec{\Delta }}\) with respect to parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is zero.

3.3 Identifiability Conditions

In this section, we propose strict and generic identifiability conditions for our RLCM. We begin by introducing the form of \(\varvec{\Delta }\).

The sparse 3-dimensional array \(\varvec{\Delta }\) takes the form

$$\begin{aligned} \varvec{\Delta }=\left( \begin{array}{c}{\varvec{\Delta }^1}\\ {\varvec{\Delta }^2}\\ {\varvec{\Delta }^\prime }\end{array}\right) \end{aligned}$$

after a permutation of items, where \(\varvec{\Delta }^1\) and \(\varvec{\Delta }^2\) each contain K slices of \(\varvec{\Delta }\) and \(\varvec{\Delta }^\prime \) contains the remaining \(J-2K\) slices. We use \(\varvec{\Delta }_j^i\) to denote the j-th slice of \(\varvec{\Delta }^i\).

Theorem 1

(Strict Identifiability) The parameter space \(\Omega (\varvec{\pi },\varvec{B})\) is strictly identifiable if the following two conditions are satisfied:

  1. (A1)

    For \(j=1,\ldots ,K\), \(\Delta _j^1\) and \(\Delta _{j}^2\) satisfy simple structure shown in Definition 1 and Remark 2;

  2. (A2)

    For any two classes of subjects, there exists at least one item in \(\varvec{\Delta }^\prime \) such that they have different positive response probabilities for some response option.

Remark 5

The \(\varvec{\Delta }_j\) matrices shown in Example 3 satisfy the structure in (A1).

Theorem 2

(Generic Identifiability) The parameter space \(\Omega _{\varvec{\Delta }}(\varvec{\pi },\varvec{B})\) is generically identifiable if the following two conditions are satisfied:

  1. (B1)

    For \(j=1,\ldots ,K\), \(\Delta _j^1\) and \(\Delta _{j}^2\) satisfy the following structure:

    $$\begin{aligned} \varvec{\Delta }_j= \begin{bmatrix} 0&{}0&{}\cdots &{}0&{}0&{}0&{}\cdots &{}0\\ *&{}*&{}\cdots &{}*&{}\delta _{jj1}&{}*&{}\cdots &{}*\\ \vdots &{}\vdots &{}&{}\vdots &{}\vdots &{}\vdots &{}&{}\vdots \\ *&{}*&{}\cdots &{}*&{}\delta _{jj,M_j-1}&{}*&{}\cdots &{}*\\ \end{bmatrix}_{M_j \times P}, \end{aligned}$$
    (20)

    and \(\sum _{m=1}^{M_j-1} \delta _{jjm}\ge 1\), where \(*\) can be either 0 or 1 and P generally equals \(2^K\).

  2. (B2)

    \(\varvec{\Delta }^{\prime }\) satisfies the condition that for every \(k=1,\ldots ,K\) there exists a \(j > 2K\), such that \(\sum _{m=1}^{M_j-1} \delta _{jkm}\ge 1\).

Remark 6

Condition (B2) requires that there is at least one item among the last \(J-2K\) items on which attribute k loads onto the main effect for at least one response option.
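For readers who wish to verify the conditions for a candidate design, the sketch below checks (B1) and (B2) for a \(\varvec{\Delta }\) array. It assumes the column ordering of Eq. 3 (column 0 the intercept, columns 1 through K the main effects) and that the items have already been permuted so that the first two blocks of K items correspond to \(\varvec{\Delta }^1\) and \(\varvec{\Delta }^2\); this is a convenience sketch, not part of the formal theory.

```python
import numpy as np

def generically_identifiable(Delta, K):
    """Check conditions (B1) and (B2) of Theorem 2.

    Delta: list of J binary arrays; Delta[j] has shape (M_j, P), with
    column 0 the intercept and columns 1..K the attribute main effects.
    Items 0..K-1 and K..2K-1 are assumed to form Delta^1 and Delta^2.
    """
    J = len(Delta)
    if J < 2 * K:
        return False
    # (B1): in each of the first two blocks, item j's own main-effect
    # column (attribute j) is active for some non-reference option.
    for block in (0, K):
        for k in range(K):
            if Delta[block + k][1:, 1 + k].sum() < 1:
                return False
    # (B2): every attribute loads on a main effect for at least one
    # response option of some remaining item (j > 2K).
    for k in range(K):
        if not any(Delta[j][1:, 1 + k].sum() >= 1 for j in range(2 * K, J)):
            return False
    return True
```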

4 Bayesian Formulation for the Nominal RLCM

Following the same setting as in previous sections, consider an RLCM with N subjects, J items with \(M_j\) (\(j=1,\ldots ,J\)) unordered response options for each item j, and K attributes. We use subscript \(i=1,\ldots ,N\) to index subjects, \(j=1,\ldots ,J\) to index items, \(m=0,\ldots ,M_j-1\) to index options of each item, and \(c=0,\ldots ,2^K-1\) to index latent classes. Let \(\varvec{\alpha }_{i}\) denote the attribute profile of subject i, and let \(Y_{ij}\) denote the response of subject i to item j. The likelihood of observing a sample of N responses to J items is

$$\begin{aligned} p\left( \varvec{Y}=\varvec{y}\mid \varvec{B}, \varvec{\pi }\right) =\prod _{i=1}^{N}\sum _{\varvec{\alpha }_c\in \{0,1\}^K} \pi _c\prod _{j=1}^{J}\prod _{m=0}^{M_j-1}\left( \frac{\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm} \right) }{\sum _{m'=0}^{M_j-1}\exp \left( \varvec{a}_c^\top \varvec{\beta }_{jm'} \right) }\right) ^{\mathbb {1}(y_{ij}=m)}. \end{aligned}$$
(21)

The posterior distribution of all parameters for the nominal RLCM is given by

$$\begin{aligned} p(\varvec{\alpha },\varvec{B},\varvec{\Delta },\gamma ,\sigma _{\beta }^{2},\varvec{\pi }|\varvec{y})\propto p(\varvec{y}|\varvec{\alpha },\varvec{B})p(\varvec{\alpha }|\varvec{\pi })p(\varvec{\pi })p(\varvec{B}|\varvec{\Delta },\sigma _\beta ^2)p(\sigma _\beta ^2)p(\varvec{\Delta }|\gamma )p(\gamma ). \end{aligned}$$
(22)

We now formulate the RLCM Bayesian model and priors. Specifically, we use a categorical likelihood conditioned upon attributes and item parameters,

$$\begin{aligned} Y_{ij}|\varvec{\alpha }_i^\top \varvec{v}=c,\varvec{B}_{j}\sim \text {categorical}\left( \varvec{\theta }_{jc}(\varvec{B}_j)\right) . \end{aligned}$$
(23)

We also use a categorical prior for attributes conditioned upon the latent class probabilities,

$$\begin{aligned} \varvec{\alpha }_i|\varvec{\pi }\sim \text {categorical}(\varvec{\pi }) \end{aligned}$$
(24)

and a conjugate Dirichlet prior for the latent class probabilities, \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0)\) where \(\varvec{d}_0\) is a fixed constant vector.

We use stochastic search variable selection priors for the (jpm) elements of \(\varvec{B}\) and \(\varvec{\Delta }\):

$$\begin{aligned} \beta _{jpm}&\mid \delta _{jpm},\sigma _\beta ^2 \sim \left\{ \begin{array}{ll} N(0, \sigma _{\beta }^{2}) &{} \delta _{jpm}=1 \\ N(0, \sigma _{\beta }^{2}/D) &{} \delta _{jpm}=0 \end{array}\right. ,\end{aligned}$$
(25)
$$\begin{aligned} \delta _{jpm}&\mid \gamma \sim Bernoulli(\gamma ), \end{aligned}$$
(26)

where \(\varvec{B}=(\varvec{B}_1,\dots ,\varvec{B}_J)^\top \) satisfies the generic identifiability condition shown in Theorem 2, and the intercept is always set active with \(\delta _{j0m}=1\). Furthermore, D is a large fixed constant (e.g., we consider \(D=100\) and 1000) used to reduce the variance of the spike distribution for the case \(\delta _{jpm}=0\). The priors for the hyper-parameters of the coefficients and activeness parameters are:

$$\begin{aligned} \sigma _{\beta }^2&\sim IGamma(\alpha _{\sigma },\beta _{\sigma }),\end{aligned}$$
(27)
$$\begin{aligned} \gamma&\sim Beta(a,b). \end{aligned}$$
(28)

Here \((\alpha _{\sigma },\beta _{\sigma },D,a,b,\varvec{d}_0)\) are hyper-parameters.
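Given the current coefficients, the conditional update of each activeness indicator under this spike-and-slab prior is a Bernoulli draw whose success probability compares the slab and spike densities at the coefficient value (George & McCulloch, 1993). A minimal sketch of that single step (names illustrative; the full sampler is in Appendix 7):

```python
import numpy as np
from scipy.stats import norm

def sample_delta(beta, gamma, sigma2_beta, D, rng):
    """One SSVS update of an indicator delta_{jpm} given beta_{jpm}:
    slab N(0, sigma2_beta) when delta = 1, spike N(0, sigma2_beta / D)
    when delta = 0, with prior P(delta = 1) = gamma."""
    slab = gamma * norm.pdf(beta, scale=np.sqrt(sigma2_beta))
    spike = (1 - gamma) * norm.pdf(beta, scale=np.sqrt(sigma2_beta / D))
    return rng.binomial(1, slab / (slab + spike))

rng = np.random.default_rng(1)
# A small coefficient is far more plausible under the spike: usually 0.
sample_delta(beta=0.05, gamma=0.5, sigma2_beta=1.0, D=100, rng=rng)
```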

Model parameters of the nominal RLCM are inferred by applying the Polya-gamma data augmentation approach for multinomial logistic regression (Holmes & Held, 2006; Polson et al., 2013) along with the stochastic search variable selection algorithm (George & McCulloch, 1993) to infer the latent structure. The Gibbs sampler draws from the posterior distribution of the model parameters given in Appendix 7, and the full sampling algorithm is presented in Algorithm 1. To address issues with poor starting values, we use a combination of k-means clustering and factor analysis to specify starting values (see the description in Appendix 7).

Algorithm 1 (full Gibbs sampling steps for the nominal RLCM; see Appendix 7)
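As a rough illustration of the Polya-gamma step, the sketch below updates a single column \(\varvec{\beta }_{jm}\) under the one-vs-rest partitioning of the multinomial logit (Holmes & Held, 2006), drawing \(\omega _i\sim PG(1,\psi _i)\) via a truncated version of the sum-of-gammas representation of Polson et al. (2013). In practice an exact Polya-gamma sampler would replace the truncation; all names here are illustrative and this is not the paper's exact implementation.

```python
import numpy as np

def rpg_approx(z, rng, trunc=200):
    """Approximate PG(1, z_i) draws by truncating the representation
    omega = (1 / 2 pi^2) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1) (Polson et al., 2013)."""
    k = np.arange(1, trunc + 1)[:, None]                 # (trunc, 1)
    g = rng.gamma(1.0, 1.0, size=(trunc, len(z)))
    return (g / ((k - 0.5) ** 2 + (z / (2 * np.pi)) ** 2)).sum(0) / (2 * np.pi ** 2)

def update_beta_jm(beta, A, y_is_m, C, prior_prec, rng):
    """Gibbs update of beta_{jm}, holding the other options fixed.

    A:          N x P design matrix with rows a_i.
    y_is_m:     binary vector, 1 if y_ij = m.
    C:          offsets C_i = log sum_{m' != m} exp(a_i' beta_{jm'}).
    prior_prec: P-vector of prior precisions (spike or slab, per delta).
    """
    omega = rpg_approx(A @ beta - C, rng)                # omega_i ~ PG(1, psi_i)
    kappa = y_is_m - 0.5
    prec = A.T @ (A * omega[:, None]) + np.diag(prior_prec)
    cov = np.linalg.inv(prec)
    mean = cov @ (A.T @ (kappa + omega * C))
    return rng.multivariate_normal(mean, cov)            # Gaussian full conditional
```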

5 Monte Carlo Simulation Study

5.1 Settings

In this section, we report results from a Monte Carlo experiment that evaluates the performance of Algorithm 1. We conducted the simulation study under different numbers of attributes (i.e., \(K=2\) and 3), correlations among the attributes (i.e., \(\rho =0\) and 0.25), and sample sizes (i.e., \(N=1000\), 2000, 5000, and 10,000).

For the \(\rho =0\) case, the attribute profile \(\varvec{\alpha }=(\alpha _{1},\ldots ,\alpha _{K})^{\top }\) is generated uniformly from all \(2^K\) possible profiles, so the latent class membership probabilities are \(\varvec{\pi }= (1/2^K,\ldots ,1/2^K)^\top \). For the \(\rho >0\) case, the dependence among attribute profiles is introduced using the method of Chiu et al. (2009). Suppose \(\varvec{Z}=(Z_1,\ldots ,Z_K)^{\top }\) follows a multivariate normal distribution \(N(\varvec{0}, \varvec{\Sigma })\) with unit variances and correlation \(\rho \), where \(\varvec{\Sigma }=(1-\rho )\varvec{I}_K + \rho \varvec{1}_{K}\varvec{1}_{K}^\top \) and \(\varvec{1}_{K}\) is a column vector of ones of length K. Then, the attribute profile \(\varvec{\alpha }\) is given by \(\alpha _{k}={\mathcal {I}}(Z_k \ge \Phi ^{-1}(\frac{k}{K+1}))\), \(k=1,\ldots ,K\), where \(\Phi \) is the cumulative distribution function of the standard normal distribution. In this case, the data generating values for \(\varvec{\pi }\) are computed from integrals of the multivariate normal distribution (Chen et al., 2015; Culpepper & Balamuta, 2021).
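For the \(\rho >0\) case, the attribute-generation step can be sketched as follows (a direct transcription of the thresholding rule above; for \(\rho =0\) the profiles are instead drawn uniformly, as noted).

```python
import numpy as np
from scipy.stats import norm

def simulate_attributes(N, K, rho, rng):
    """Correlated attribute profiles via Chiu et al. (2009): threshold a
    multivariate normal at the cutoffs Phi^{-1}(k / (K + 1))."""
    Sigma = (1 - rho) * np.eye(K) + rho * np.ones((K, K))
    Z = rng.multivariate_normal(np.zeros(K), Sigma, size=N)   # (N, K)
    cut = norm.ppf(np.arange(1, K + 1) / (K + 1))             # per-attribute cutoffs
    return (Z >= cut).astype(int)

alpha = simulate_attributes(N=2000, K=3, rho=0.25, rng=np.random.default_rng(7))
```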

We assume there are \(J=18\) items with \(M_j=4\) unordered options for each item j. For the reparameterized design vector shown in Eq. 3, we include interaction terms among the attributes only up to the two-way terms. Our model does not explicitly contain \(\varvec{Q}\) matrices; therefore, we recover the \(\varvec{Q}\) matrices implied by \(\varvec{\Delta }_m\) for each option \(m=1,2,3\) using the method shown in Chen et al. (2020). The true \(\varvec{\Delta }\) and true \(\varvec{Q}\) matrices for each option are shown below (columns in \(\varvec{\Delta }\) follow the same order as the design vector shown in Eq. 3):

  • \(\varvec{\Delta }\) cube with \(K=2\)

    $$\begin{aligned} \varvec{\Delta }_{m=1}= \left( \begin{array}{llll} 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \end{array}\right) , \quad \varvec{\Delta }_{m=2}=\left( \begin{array}{llll} 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \end{array}\right) , \quad \varvec{\Delta }_{m=3}=\left( \begin{array}{llll} 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \end{array}\right) \end{aligned}$$
    (29)
  • \(\varvec{\Delta }\) cube with \(K=3\)

    $$\begin{aligned} \varvec{\Delta }_{m=1}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) , \varvec{\Delta }_{m=2}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) , \varvec{\Delta }_{m=3}= \left( \begin{array}{llllllll} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) \end{aligned}$$
    (30)
  • \(\varvec{Q}\) matrices with \(K=2\)

    $$\begin{aligned} {\varvec{Q}}_{m=1}= \left( \begin{array}{ll} 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=2}=\left( \begin{array}{ll} 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \end{array}\right) , \quad {\varvec{Q}}_{m=3}=\left( \begin{array}{ll} 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \\ 0 &{} 1 \\ 1 &{} 0 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 1 \end{array}\right) \end{aligned}$$
    (31)
  • \(\varvec{Q}\) matrices with \(K=3\)

    $$\begin{aligned} {\varvec{Q}}_{m=1}= \left( \begin{array}{lll} 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=2}=\left( \begin{array}{lll} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \end{array}\right) , \quad {\varvec{Q}}_{m=3}=\left( \begin{array}{lll} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \end{array}\right) \end{aligned}$$
    (32)

Given \(\varvec{\Delta }\), we generate the coefficients in \(\varvec{B}\) according to their prior distribution shown in Eq. 25 using \(\sigma _\beta ^2 = 1\). For the hyper-parameters presented in the previous section, we set \(\alpha _{\sigma }=\beta _{\sigma }=a=b=1\), \(\varvec{d}_0 = \varvec{1}_{2^K}\), and \(D= 100\). We use a Markov chain of length 20,000 with a 10,000 burn-in period for \(K = 2\), and a chain of length 30,000 with a 20,000 burn-in period for \(K = 3\).

5.2 Results

We repeated the simulation study 100 times for each setting and use several metrics to evaluate parameter recovery. Specifically, we report the average element-wise accuracy rate (EAR) for \(\varvec{Q}\) by comparing the estimated \(\hat{\varvec{Q}}\) with the true \(\varvec{Q}\) matrix, where \(\hat{\varvec{Q}}\) is recovered by aggregating the \(\hat{\varvec{B}}\) samples after the burn-in period (Chen et al., 2020). Note that we transform \(\varvec{B}\) to \(\varvec{\Theta }\) for every sampled value using Eq. 1 and compute the point estimate \(\hat{\varvec{\Theta }}\) as the mean of all sampled \(\varvec{\Theta }\) arrays after the burn-in period. We compute the mean absolute deviation (MAD) to assess the accuracy of the estimated latent class response probabilities \(\hat{\varvec{\Theta }}\), and we report the proportion of attribute profiles that are correctly estimated.

It is important to mention how we address the label-switching problem for the RLCM and ULCM. Like other latent class models, the exploratory RLCM is identified only up to label switching. However, the RLCM has fewer admissible relabelings than the ULCM. For instance, the ULCM has \((2^K)!\) possible arrangements, whereas the RLCM has \(K!\times 2^K\) arrangements (i.e., there are K! ways to permute the order of the attributes and \(2^K\) ways to permute the attribute levels). For each replication, we draw values from the posterior and then compare the posterior means of the parameters (e.g., the \(\theta \)'s or \(\beta \)'s) under all \(K!\times 2^K\) arrangements with the data generating model parameters in order to evaluate parameter recovery. We select the arrangement for the ULCM and RLCM that minimizes the difference between the estimates and the data generating values. It is important to note that we did not find evidence of label switching within chains.
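A minimal sketch of this relabeling search (illustrative; it relies on the binary-integer bijection used throughout): enumerate all \(K!\times 2^K\) attribute-order permutations and level flips and keep the arrangement minimizing the mean absolute difference between estimated and data generating probabilities.

```python
import numpy as np
from itertools import permutations, product

def best_relabeling(theta_hat, theta_true, K):
    """Search the K! * 2^K attribute relabelings (order permutations x
    level flips) for the column reordering of theta_hat that is closest
    to theta_true; the last axis indexes classes c = alpha' v."""
    v = 2 ** np.arange(K - 1, -1, -1)
    profiles = np.array([[(c >> (K - 1 - k)) & 1 for k in range(K)]
                         for c in range(2 ** K)])            # (2^K, K)
    best, best_cols = np.inf, None
    for perm in permutations(range(K)):
        for flips in product([0, 1], repeat=K):
            cols = (profiles[:, list(perm)] ^ np.array(flips)) @ v
            d = np.abs(theta_hat[..., cols] - theta_true).mean()
            if d < best:
                best, best_cols = d, cols
    return best_cols, best
```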

Simulation results in Table 1 show good recovery of the model parameters. For fixed K, as the sample size grows, the MADs of \(\hat{\varvec{\Theta }}\) and \(\varvec{\pi }\) become smaller and the EARs of the \(\hat{\varvec{Q}}\) matrices become larger. The EARs of the \(\hat{\varvec{Q}}\) matrices are higher for smaller K, which is expected given that the number of unknown model parameters increases with K. The simulation results also provide evidence that a positive correlation among attributes (\(\rho > 0\)) results in slightly larger MADs for \(\varvec{\Theta }\) in some instances, and this impact is more systematic for \(K=3\). Although \(\rho =0.25\) slightly decreases recovery of \(\varvec{\Theta }\), \(\varvec{\pi }\), and \(\varvec{Q}\), the results in Table 1 show that attribute classification accuracy improves by a few percentage points. Overall, the classification accuracy is at acceptable levels and generally exceeds 70% for most scenarios.

We also conducted a Monte Carlo experiment for denser \(\varvec{\Delta }\) and \(\varvec{Q}\) than those shown in Eqs. 29-32; the true \(\varvec{\Delta }\) and \(\varvec{Q}\) matrices and simulation results are given in Appendix 7. Table 6 shows model parameter recovery similar to the results in Table 1.

5.3 Unstructured Latent Class Models (ULCMs)

We compare the performance of our model with ULCMs, which assume no latent structure between the latent attribute classes and the observed response variables. Following the same setting as for the nominal RLCM, the likelihood of observing a sample of N independent responses to J items is

$$\begin{aligned} p(\varvec{Y}=\varvec{y}|\varvec{\alpha },\varvec{\Theta })=\prod _{j=1}^J\prod _{c=0}^{2^K-1}\prod _{m=0}^{M_j-1} \theta _{jcm}^{n_{jcm} }, \end{aligned}$$
(33)

where \(n_{jcm}=\sum _{i=1}^{N}{\mathcal {I}}(y_{ij}=m){\mathcal {I}}(\varvec{\alpha }_{i}^\top \varvec{v}=c)\). Then, the posterior distribution of all parameters for the nominal ULCM is given by

$$\begin{aligned} p(\varvec{\alpha },\varvec{\Theta },\varvec{\pi }|\varvec{y})\propto p(\varvec{y}|\varvec{\alpha },\varvec{\Theta })p(\varvec{\Theta })p(\varvec{\alpha }|\varvec{\pi })p(\varvec{\pi }). \end{aligned}$$
(34)

Below is the Bayesian framework for our nominal ULCM. Given the attribute profile \(\varvec{\alpha }\) and the class-response probability matrix \(\varvec{\Theta }\), the response data follow a categorical distribution

$$\begin{aligned} Y_{ij}|\varvec{\alpha }_i^\top \varvec{v}=c,\varvec{\theta }_{jc}\sim \text {Categorical}(\varvec{\theta }_{jc}). \end{aligned}$$
(35)

We use a Dirichlet prior for the class-response probability vectors

$$\begin{aligned} \varvec{\theta }_{jc}\sim \text {Dirichlet}(\varvec{d}_{M_j}), \end{aligned}$$
(36)

and a categorical prior for attributes conditioned upon the latent class probabilities,

$$\begin{aligned} \varvec{\alpha }_i|\varvec{\pi }\sim \text {Categorical}(\varvec{\pi }) \end{aligned}$$
(37)

with a conjugate Dirichlet prior for the latent class probabilities, \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0)\), where \(\varvec{d}_{M_j}\) and \(\varvec{d}_0\) are fixed constant vectors.

We apply a Gibbs sampling algorithm to estimate the model parameters from their full conditional distributions.

  • \(\varvec{\theta }_{jc}\mid \varvec{y}_{1:n},\varvec{\alpha }_{1:n}\sim \text {Dirichlet}(\varvec{n}_{jc}+\varvec{d}_{M_j})\)

    $$\begin{aligned} p(\varvec{\theta }_{jc}\mid \varvec{y}_{1:n},\varvec{\alpha }_{1:n})&\propto p(\varvec{y}_{Ij}\mid \varvec{\alpha }_{I},\varvec{\theta }_{jc})p(\varvec{\theta }_{jc})\nonumber \\&\propto \prod _{m=0}^{M_j-1}\theta _{jcm}^{n_{jcm}}\cdot \prod _{m=0}^{M_j-1}\theta _{jcm}^{d_{m}-1}, \end{aligned}$$
    (38)

    where \(\varvec{n}_{jc} = (n_{jc0},\ldots ,n_{jc,M_j-1})^\top \), \(d_m\) denotes the m-th element of \(\varvec{d}_{M_j}\), and \(I = \left\{ i:\varvec{\alpha }_{i}^\top \varvec{v}=c \right\} \).

  • \(\varvec{\pi }|\varvec{\alpha }\sim Dirichlet(\varvec{n}+\varvec{d}_0)\)

    $$\begin{aligned} p(\varvec{\pi }\mid \varvec{\alpha }_{1:n})\propto p(\varvec{\alpha }_{1:n}\mid \varvec{\pi })p(\varvec{\pi }) \sim \text {Dirichlet}(\varvec{n}+\varvec{d}_0), \end{aligned}$$
    (39)

    where \(\varvec{\pi }\sim \text {Dirichlet}(\varvec{d}_0) \) and \(\varvec{n}=(n_0,\cdots ,n_{2^K-1})^\top \) represents the frequencies of each attribute pattern \(\varvec{\alpha }_{i}^\top \varvec{v}=c\), \(c=0,\ldots ,2^K-1\).

  • \(\varvec{\alpha }_i\mid \varvec{\alpha }_{(i)},\varvec{y}_{1:n}\) We update \(\varvec{\alpha }\) while integrating \(\varvec{\pi }\) out

    $$\begin{aligned} p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N)&= \int p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N\mid \varvec{\pi })p(\varvec{\pi })d\varvec{\pi }\nonumber \\&=\dfrac{1}{B(\varvec{d}_0)} \int \left( \prod _{c=0}^{2^K-1} \pi _{c}^{n_{c}+d_{0,c}-1}\right) \textrm{d} \varvec{\pi }\nonumber \\&=\dfrac{B(\varvec{n} + \varvec{d}_0)}{B(\varvec{d}_0)}. \end{aligned}$$
    (40)

    Then, the full conditional distribution for \(\varvec{\alpha }_i\) is

    $$\begin{aligned} p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)})&= \dfrac{p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_N)}{p(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_{i-1},\varvec{\alpha }_{i+1},\ldots ,\varvec{\alpha }_N)}\nonumber \\&= \frac{n_{c(i)}+1}{n-1+2^K}, \end{aligned}$$
    (41)

    where \(n_{c(i)}\) represents the number of individuals other than i that have attribute profile \(\varvec{\alpha }_c\) (the expression takes \(\varvec{d}_0=\varvec{1}_{2^K}\)). The full conditional distribution of \(\varvec{\alpha }_i\) given \(\varvec{y}_{1:n}\) and \(\varvec{\alpha }_{(i)}\) is

    $$\begin{aligned} p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)},\varvec{y}_{1:n})&\propto p(\varvec{\alpha }_i^{\top } \varvec{v}=c\mid \varvec{\alpha }_{(i)})p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta })\nonumber \\&\propto (n_{c(i)}+1)p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta }), \end{aligned}$$
    (42)

    so we update \(\varvec{\alpha }_i\) sequentially with weights proportional to \((n_{c(i)}+1)p(\varvec{y}_i\mid \varvec{\alpha }_i^{\top } \varvec{v}=c,\varvec{\Theta })\).

The full Gibbs sampling steps for all parameters are shown in Algorithm 2; a minimal code sketch of one sweep follows Algorithm 2 below.

Algorithm 2 (full Gibbs sampling steps for the nominal ULCM)
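A compact sketch of one sweep of these updates under the unit Dirichlet priors used here (so the weight \(n_{c(i)}+1\) in Eq. 42 corresponds to \(\varvec{d}_0=\varvec{1}_{2^K}\)); this is illustrative rather than the paper's exact implementation.

```python
import numpy as np

def ulcm_gibbs_sweep(y, alpha_c, Theta, K, M, d_item=1.0, d0=1.0, rng=None):
    """One Gibbs sweep for the nominal ULCM (Algorithm 2 sketch).

    y:       (N, J) integer responses in {0, ..., M - 1}.
    alpha_c: length-N vector of class indices c = alpha_i' v (updated in place).
    Theta:   (J, C, M) class response probabilities (updated in place).
    """
    rng = rng or np.random.default_rng()
    N, J = y.shape
    C = 2 ** K
    # 1. theta_jc | y, alpha ~ Dirichlet(n_jc + d_{M_j}), Eq. 38.
    for j in range(J):
        for c in range(C):
            counts = np.bincount(y[alpha_c == c, j], minlength=M)
            Theta[j, c] = rng.dirichlet(counts + d_item)
    # 2. alpha_i | alpha_(i), y with pi integrated out, Eq. 42.
    for i in range(N):
        n_c = np.bincount(np.delete(alpha_c, i), minlength=C)
        loglik = np.log(Theta[np.arange(J), :, y[i]]).sum(0)   # (C,)
        w = (n_c + d0) * np.exp(loglik - loglik.max())
        alpha_c[i] = rng.choice(C, p=w / w.sum())
    # 3. pi | alpha ~ Dirichlet(n + d_0), Eq. 39.
    return rng.dirichlet(np.bincount(alpha_c, minlength=C) + d0)
```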

In order to compare the performance of the ULCM with the RLCM, we generate response data from the RLCM and then use both Algorithm 1 and Algorithm 2 to estimate the model parameters under the two models. We use the same simulation settings as for our RLCM. For the hyper-parameters in the ULCM, we use \(\varvec{d}_{M_j}=\varvec{1}_{M_j}\) for \(j=1,\ldots ,J\). Simulation results are shown in Table 1 and provide evidence that the RLCM has better parameter recovery. Table 2 reports additional detail regarding the MADs for \(\varvec{\Theta }\) at the item level for the RLCM and ULCM. The results show that the aggregate findings in Table 1 are consistent with item-level performance: the RLCM has smaller MADs than the ULCM.

The results in Tables 1 and 2 indicate that for response data generated from the RLCM, Algorithm 1 performs uniformly better than Algorithm 2, which implies that when there is structure in the latent relationship between attributes and observed variables, our RLCM achieves better parameter recovery than the ULCM.

Table 1 Summary of simulation performance for RLCM and ULCM.
Table 2 Summary of mean absolute deviations (MADs) of RLCM and ULCM item response probabilities by item for two selected conditions.

6 Applications

6.1 Wagner Preference Inventory

In this section, we apply Algorithm 1 to the Wagner Preference Inventory (WAPI II) data set (Wagner & Wells, 1985). The data set contains nominal responses to \(J=12\) items, each of which contains \(M=4\) choices. All 13,502 participants completed the 12 questions, so we have \(N=13,502\). Table 3 presents the twelve items along with the marginal probability of selecting each response option. The twelve items were originally designed to distinguish preferences along two dimensions: left vs. right brain and logical vs. creative. The proposed two-by-two design included (a) left, logical; (b) left, verbal; (c) right, manipulative/spatial; and (d) right, creative. Separate measures of left and right preference can be obtained by summing (a) and (b) and summing (c) and (d), respectively. To be consistent with Wagner and Wells (1985), we let \(K=2\) to represent the left-right brain dominance dichotomy. We ran five Markov chains with \(K=2\) for convergence diagnostics.

Table 3 Wagner preference inventory items, anchors, and response frequencies.

Figure 1 plots the maximum potential scale reduction factor (PSRF) (Brooks & Gelman, 1998), which checks the convergence of Markov chains with multivariate parameters. Approximate convergence is achieved after 5,000 iterations, since the maximum PSRF remains below 1.1 thereafter. We therefore ran 100 Markov chains of length 20,000 (with 10,000 as burn-in) to estimate the parameters; the results are shown in Table 4.

Fig. 1: The maximum PSRF for Wagner Preference Inventory data.
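The paper monitors the multivariate PSRF of Brooks and Gelman (1998); as a reference point, the univariate building block can be sketched as follows (illustrative).

```python
import numpy as np

def psrf(chains):
    """Univariate potential scale reduction factor for one scalar
    parameter; chains has shape (m, n): m chains of n draws each."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    V = (n - 1) / n * W + B / n                 # pooled variance estimate
    return np.sqrt(V / W)                       # near 1 at convergence
```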

Table 4 Estimated \(\hat{\varvec{\Theta }}\) for Wagner Preference Inventory data.

Table 4 implies that participants with attribute profiles \(\varvec{\alpha }_i = (0,1)^\top \), \((1,1)^\top \), \((0,0)^\top \) and \((1,0)^\top \) prefer option a, b, c and d, respectively. For instance, the choices for item 1 were “a. major in logic”, “b. write a letter”, “c. fix things at home”, and “d. major in art”. The estimates for \(\varvec{\Theta }\) in Table 4 indicate that members of class 01 were most likely to choose option “a” with an estimated response probability of 0.725. In contrast, members of class 00 had a 0.643 chance of selecting option “c” and respondents in the 10 class chose “d” with a probability equal to 0.693.

We also estimated the latent class probabilities of the attribute profiles. Specifically, the proportions of each attribute profile pattern, in increasing order of the bijection \(\varvec{\alpha }_c^\top \varvec{v}\), are shown in Table 5. The latent classes are nearly equal in size, with the largest share of respondents (0.290) classified into the 11 profile (i.e., Wagner's left-verbal group) and 0.204 of respondents in the 00 class (i.e., Wagner's right-manipulative/spatial group).

Table 5 Estimated distribution of attributes in Wagner Preference Inventory data.

The results shown in Table 4 can also be used to evaluate the intended choice design of the items. Most items differentiated among one or two of the underlying latent classes. However, some items, such as items 2, 7, and 10, had options that did not differentiate the latent classes as intended. For item 2, Wagner originally specified option a as a left-logical function and option d as a right-creative function. However, according to the probabilities reported in Table 4, people with attribute profile \(\varvec{\alpha }_i = (0,1)^\top \) did not strongly prefer option a, and people with attribute profile \(\varvec{\alpha }_i = (1,0)^\top \) did not strongly prefer option d. The choice design for item 2 should therefore be reconsidered.

7 Discussion

This paper focuses on the identifiability conditions of RLCMs. We proposed strict and generic identifiability conditions based on the uniqueness conditions for the decomposition of three-way arrays given by Kruskal's theorem (Kruskal, 1976, 1977). The established identifiability conditions are applicable to a wealth of models for binary (e.g., Chen et al., 2015, 2020; de la Torre, 2011), polytomous (e.g., Chen & de la Torre, 2013; Culpepper, 2019; Culpepper & Balamuta, 2021), and nominal response data. Accordingly, the new identifiability results can guide researchers in the design of diagnostic assessments. We then developed a Bayesian formulation for RLCMs that takes the generic identifiability conditions into consideration. In our simulation study, we applied Polya-gamma data augmentation to update the coefficients and compared our algorithm with ULCMs. Simulation results show that our algorithm can efficiently estimate the model parameters, especially when the number of attributes is small. When latent structure is present, our model outperforms the ULCM.

In this paper, we assumed that the number of attributes, K, is fixed and prespecified. However, prior knowledge of K may not be available in practice. Future research may consider K as an unknown parameter to be estimated (e.g., see Chen et al., 2021). An unknown K implies that the dimensions of the attribute profiles, the category response probability array, and the coefficient array are also unknown, which poses a challenging problem for future research.