1 Introduction

Cognitive diagnosis (CD) in educational measurement (DiBello, Roussos, & Stout, 2007; Haberman & von Davier, 2007; Leighton & Gierl, 2007; Nichols, Chipman, & Brennan, 1995; Rupp, Templin, & Henson, 2010; Tatsuoka, 2009) explicitly targets mastery of the instructional content and seeks to provide immediate feedback to students on their strengths and weaknesses in terms of attributes mastered and attributes needing study. Within the CD framework, skills, specific knowledge, talent, ability—any aptitude required to perform cognitive tasks—are collectively referred to as “attributes” that an examinee may or may not possess. CD models—or “Diagnostic Classification Models” (DCMs), as they are called here—describe an examinee’s ability as a composite of these attributes. Mastery of attributes is recorded as a binary string. Different zero-one combinations define the attribute profiles of distinct proficiency classes to which examinees are assigned when their individual attribute profiles are estimated from their test performance. Parametric methods for fitting DCMs prevail. They use either marginal maximum likelihood estimation relying on the expectation–maximization algorithm (MMLE-EM) or Markov chain Monte Carlo (MCMC) techniques (de la Torre, 2009, 2011; DiBello et al., 2007; von Davier, 2008).

A number of researchers (Ayers, Nugent, & Dean, 2008; Chiu, 2008; Chiu & Douglas, 2013; Chiu, Douglas, & Li, 2009; Park & Lee, 2011; Willse, Henson, & Templin, 2007) have explored the potential of methods that do not rely on a parametric statistical model—nonparametric methods for short—as alternatives to MMLE-EM and MCMC for assigning examinees to proficiency classes. The need for nonparametric methods might arise in situations where sample sizes are insufficient to provide reliable maximum likelihood estimates; for example, when assessment data have been collected in educational micro-environments, say, for monitoring the instruction and learning process at the classroom level. The NonParametric Classification (NPC) method by Chiu and Douglas (2013) and its generalization, the General NPC (GNPC) method (Chiu, Sun, & Bian, 2018), are two recent examples of efficient and effective nonparametric methods for assigning examinees to proficiency classes. The algorithms of the NPC and GNPC methods can handle small sample sizes (Chiu et al., 2018). In addition, they are easy to implement and computationally inexpensive, requiring only minimal CPU time. These features support the use of the NPC and GNPC methods as computational engines for CD-based computerized adaptive tests (CAT) tailored to small and very small teaching units such as individual classrooms.

Both methods have certain drawbacks. The NPC estimator of examinees’ proficiency class was proven by Wang and Douglas (2015) to be statistically consistent for any DCM under certain regularity conditions. However, these consistency conditions are often difficult to meet for more complex DCMs that model the probability of a correct item response as an increasing function of the number of required attributes mastered by an examinee (known as the “monotonicity assumption”). The NPC method does not provide the flexibility to account for such complex relations between required and mastered attributes. The GNPC method, on the other hand, provides this flexibility, but the statistical properties of the resulting proficiency-class estimator are currently unknown.

In this article, the statistical consistency of the GNPC estimator of examinees’ proficiency class is proven. The next sections provide brief summaries of the key technical concepts of the NPC and the GNPC method that are prerequisites for the proof of consistency. The paper concludes with a brief discussion of possible applications of this important result.

2 Review of Technical Key Concepts

2.1 Cognitive Diagnosis and Diagnostic Classification Models

Assume ability in a given domain is conceptualized as a composite of K latent binary attributes \(\alpha _1, \alpha _2, \ldots , \alpha _K\). The K-dimensional binary vector \(\varvec{\alpha }_m = (\alpha _{m1}, \alpha _{m2}, \ldots , \alpha _{mK})^\mathrm{T}\) denotes the attribute profile of proficiency class \(\mathcal {C}_m\), \(m=1,2,\ldots ,M\), where the \(k\mathrm{th}\) entry, \(\alpha _{mk} \in \{0,1\}\), indicates (non-)mastery of the corresponding attribute. (The transpose of vectors or matrices is denoted by a superscripted T; the conventional “prime” notation is reserved here for distinguishing between vectors or their scalar entries.) If the attributes do not have a hierarchical structure, then there are \(2^K=M\) distinct proficiency classes. The attribute profile \(\varvec{\alpha }_{i \in \mathcal {C}_m}\) of examinee \(i \in \mathcal {C}_m\) is usually written as \(\varvec{\alpha }_i = (\alpha _{i1}, \alpha _{i2}, \ldots , \alpha _{iK})^\mathrm{T}\). Throughout the text, the terms “profile” and “vector” are used interchangeably; for brevity, the examinee index i, \(i=1,2, \ldots ,N\), is omitted if the context permits; for example, \(\varvec{\alpha }_i\) is simply written as \(\varvec{\alpha } = (\alpha _1, \alpha _2, \ldots , \alpha _K)^\mathrm{T}\).

The individual items of a test are also characterized by K-dimensional attribute profiles \(\mathbf {q}_j\), \(j=1,2, \ldots , J\), that specify which attributes are required for a correct response to item j (\(q_{jk} = 1\) if a correct answer requires mastery of the \(k\mathrm{th}\) attribute, and 0 otherwise). If a domain is characterized by K attributes, then there are at most \(2^K-1\) distinct item-attribute profiles because item-attribute profiles that consist entirely of zeroes are inadmissible. The J item-attribute profiles of a test form its Q-matrix, \(\mathbf {Q}=\{q_{jk}\}_{(J \times K)}\) (Tatsuoka, 1985). The Q-matrix must be correctly specified and complete. A Q-matrix is said to be complete if it guarantees the identifiability of all realizable proficiency classes among examinees (Chiu et al., 2009; Köhn & Chiu, 2016a, 2017). Q-completeness is formally defined as \(\varvec{S}(\varvec{\alpha })=\varvec{S}(\varvec{\alpha }^{\prime }) \Rightarrow \varvec{\alpha }=\varvec{\alpha }^{\prime }\), where \(\varvec{S}(\varvec{\alpha }) = E(\varvec{Y} \mid \varvec{\alpha })\) denotes the conditional expectation of the item response vector \(\varvec{Y}\), given attribute vector \(\varvec{\alpha }\). Verbally stated, a Q-matrix is complete if the equality of two expected item response vectors, \(\varvec{S}(\varvec{\alpha })\) and \(\varvec{S}(\varvec{\alpha }^{\prime })\), implies that the underlying attribute profiles, \(\varvec{\alpha }\) and \(\varvec{\alpha }^{\prime }\), are also identical. Completeness of the Q-matrix is a general requirement for any diagnostic classification, regardless of whether MMLE-EM, MCMC, or nonparametric methods are used to assign examinees to proficiency classes. An incomplete Q-matrix causes examinees to be assigned to proficiency classes to which they do not belong.
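To make the completeness requirement concrete, the following sketch (hypothetical code, not part of any published package) enumerates all \(2^K\) attribute profiles and checks whether distinct profiles produce distinct ideal response vectors; for simplicity, the conjunctive ideal response of the DINA model (formally introduced below) serves as a stand-in for \(\varvec{S}(\varvec{\alpha })\), which is valid whenever the IRF is strictly monotone in the ideal response.

```python
from itertools import product

import numpy as np

def ideal_responses(Q, alpha):
    """Conjunctive (DINA) ideal responses: eta_j = prod_k alpha_k^{q_jk}."""
    # Item j is answered correctly in the ideal case iff every required
    # attribute (q_jk = 1) is mastered (alpha_k = 1).
    return np.array([int(np.all(alpha[q == 1] == 1)) for q in Q])

def is_complete_dina(Q):
    """Q is complete (under DINA) iff distinct attribute profiles yield
    distinct ideal response vectors."""
    K = Q.shape[1]
    profiles = [np.array(p) for p in product([0, 1], repeat=K)]
    patterns = {tuple(ideal_responses(Q, a)) for a in profiles}
    return len(patterns) == 2 ** K

# A Q-matrix containing the K x K identity matrix is complete under DINA;
# the second matrix lacks single-attribute items and is incomplete.
Q_complete = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
Q_incomplete = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
print(is_complete_dina(Q_complete), is_complete_dina(Q_incomplete))  # True False
```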

DCMs are constrained latent class models such that the latent variable proficiency-class membership—associated with mastery of a particular attribute set—determines the probability of a correct item response. The specific functional relation between mastery of attributes and the probability of a correct item response distinguishes DCMs (de la Torre & Douglas, 2004; Henson et al., 2009; Maris, 1999). Criteria for describing these differences are, for example, compensatory versus non-compensatory models, and disjunctive versus conjunctive models. DCMs that allow for compensating the lack of certain attributes by the mastery of other attributes are called compensatory models, in contrast to non-compensatory models that do not provide this possibility. The second criterion distinguishes between disjunctive DCMs, where mastery of a subset of the required attributes is a sufficient condition for maximizing the probability of a correct item response, and conjunctive DCMs, where mastery of only a subset of the required attributes results in a success probability equal to that of an examinee mastering none of the attributes. The Deterministic Input Noisy “AND” Gate (DINA) Model (Haertel, 1989; Junker & Sijtsma, 2001; Macready & Dayton, 1977) is the standard example of a conjunctive DCM; its item response function (IRF) is

$$\begin{aligned} P(Y_{ij}=1 \mid \varvec{\alpha }_i) = (1-s_j)^{\eta _{ij}} g_j^{(1-\eta _{ij})} \end{aligned}$$

where monotonicity is typically imposed through the restriction \(0< g_j< 1 - s_j < 1 \quad \forall j\). The response of examinee i to binary item j, \(j = 1,2,\ldots , J\) is denoted as \(Y_{ij}\). The conjunctive ideal response \(\eta _{ij}\), defined as \({ \eta _{ij} = \prod ^K_{k=1} \alpha _{ik}^{q_{jk}} }\), indicates whether examinee i has mastered all the attributes needed to answer item j correctly. The item-related parameters \(s_j = P(Y_{ij}=0 \mid \eta _{ij}=1)\) and \(g_j = P(Y_{ij}=1 \mid \eta _{ij}=0)\) formalize the probabilities of slipping (failing to answer item j correctly despite having the skills required to do so) and guessing (answering item j correctly despite lacking the skills required to do so), respectively. Thus, \(\eta _{ij}\) can be interpreted as the ideal item response when neither slipping nor guessing occurs. The Deterministic Input Noisy “OR” gate (DINO) model (Templin & Henson, 2006) is the prototypic disjunctive DCM (i.e., mastery of a subset of the required attributes is a sufficient condition for maximizing the probability of a correct item response). The IRF of the DINO model is

$$\begin{aligned} P(Y_{ij}=1 \mid \varvec{\alpha }_i) = (1-s_j)^{\omega _{ij}} g_j^{(1-\omega _{ij})} \end{aligned}$$

where the disjunctive ideal response \({ \omega _{ij}=1-\prod _{k=1}^K(1-\alpha _{ik})^{q_{jk}} }\) indicates whether at least one of the attributes required for item j has been mastered. (Like \(\eta _{ij}\) in the DINA model, \(\omega _{ij}\) is the ideal item response.) The DINA model and the DINO model are rather limited in their flexibility to model the relation between response probabilities and attribute mastery. For example, the DINA model cannot distinguish between examinees who master none and those who master a subset of the attributes required for an item. Only if all required attributes are mastered can an examinee realize a high probability of answering the item correctly. In contrast, more complex models like the general DCMs (de la Torre, 2011; Henson et al., 2009; Rupp et al., 2010; von Davier, 2005, 2008) offer far greater flexibility in modeling the probability of correct item responses for different attribute profiles.
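To illustrate the two IRFs, here is a minimal sketch (hypothetical helper functions, assuming NumPy) that computes \(\eta \), \(\omega \), and the resulting response probabilities for a two-attribute item with \(g_j = s_j = 0.1\):

```python
import numpy as np

def dina_prob(alpha, q, g, s):
    """DINA IRF: P(Y=1 | alpha) = (1-s)^eta * g^(1-eta), where the
    conjunctive ideal response eta requires mastery of ALL attributes in q."""
    eta = int(np.all(alpha[q == 1] == 1))
    return 1 - s if eta else g

def dino_prob(alpha, q, g, s):
    """DINO IRF: P(Y=1 | alpha) = (1-s)^omega * g^(1-omega), where the
    disjunctive ideal response omega requires mastery of AT LEAST ONE
    attribute in q."""
    omega = int(np.any(alpha[q == 1] == 1))
    return 1 - s if omega else g

q = np.array([1, 1])  # item requires both attributes
for alpha in ([0, 0], [1, 0], [0, 1], [1, 1]):
    a = np.array(alpha)
    print(alpha, dina_prob(a, q, g=0.1, s=0.1), dino_prob(a, q, g=0.1, s=0.1))
# DINA: 0.1 0.1 0.1 0.9 -- only full mastery raises the success probability.
# DINO: 0.1 0.9 0.9 0.9 -- any single required attribute suffices.
```

The printed probabilities make the conjunctive/disjunctive contrast explicit: under the DINA model only the profile \((11)\) attains 0.9, whereas under the DINO model every profile except \((00)\) does.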

2.2 The Nonparametric Classification Method

The NPC method for estimating an examinee’s proficiency class developed by Chiu and Douglas (2013) does not—as the name suggests—rely on parametric estimation, but uses a distance-based algorithm on the observed item responses for classifying examinees. Specifically, proficiency-class membership is determined by comparing an examinee’s observed item response profile \(\mathbf {y}\) with each of the ideal item response profiles of the realizable \(2^K=M\) proficiency classes. Let \(\varvec{\eta }_i\) denote the J-dimensional ideal item response vector of examinee i. (All examinees in proficiency class \(\mathcal {C}_m\) share the same attribute profile; hence, \(\varvec{\eta }_i = \varvec{\eta }_{i \in \mathcal {C}_m} = \varvec{\eta }_m\).) As the Q-matrix and the M realizable proficiency classes are known, the construction of their ideal item response profiles \(\varvec{\eta }_1, \varvec{\eta }_2, \ldots ,\varvec{\eta }_M\) is straightforward. The NPC estimator \(\widehat{\varvec{\alpha }}\) of an examinee’s attribute profile \(\varvec{\alpha }_i\) is defined as the attribute profile underlying the ideal item response profile that among all ideal item response profiles minimizes the distance to an examinee’s observed item response vector—formally,

$$\begin{aligned} \widehat{\varvec{\alpha }}_i = \mathop {\text {arg min}}\limits _{m \in \{ 1,2, \ldots , M \}} d(\varvec{Y}_i, \varvec{\eta }_m) \end{aligned}$$
(1)

with \(\varvec{Y}_i\) denoting the J-dimensional item response vector of examinee i. Hence, the choice of the specific distance measure \(d(\cdot )\) for the loss function of Eq. 1 is of critical importance in determining \(\widehat{\varvec{\alpha }}_i\).

A distance measure often used with binary data is the Hamming distance defined as the number of disagreements between two vectors—for the NPC method:

$$\begin{aligned} d_{_H}(\varvec{Y},\varvec{\eta })=\sum _{j=1}^J \mid Y_j - \eta _j \mid \end{aligned}$$

Weighted Hamming distances allow adjusting for different levels of variability in the item responses; for example, using the inverse of the item sample variance as weight increases the impact of items with smaller variance on the distance function:

$$\begin{aligned} d_{_{wH}}(\varvec{Y},\varvec{\eta })=\sum _{j=1}^J \frac{1}{\overline{p}_j(1-\overline{p}_j)} \mid Y_j - \eta _j \mid \end{aligned}$$

(\(\overline{p}_j\) is the proportion of correct responses to the jth item). A purported advantage of the weighted Hamming distance is the substantial reduction in the number of ties, which can be an issue especially with short tests.
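A minimal sketch of the NPC classifier (Eq. 1) under both distance measures might look as follows; the function names are hypothetical, and ties are resolved implicitly by argmin (the first minimizer wins):

```python
from itertools import product

import numpy as np

def npc_classify(Y, Q, weights=None):
    """NPC estimator (Eq. 1): assign each examinee the attribute profile
    whose ideal response vector minimizes the (weighted) Hamming distance
    to the observed responses.  Y is N x J, Q is J x K."""
    profiles = np.array(list(product([0, 1], repeat=Q.shape[1])))  # M x K
    # Conjunctive ideal response of every class to every item (M x J).
    eta = np.array([[int(np.all(a[q == 1] == 1)) for q in Q] for a in profiles])
    w = np.ones(Q.shape[0]) if weights is None else weights
    d = np.abs(Y[:, None, :] - eta[None, :, :]) @ w  # N x M distance matrix
    return profiles[d.argmin(axis=1)]  # ties: first minimizer wins

def inverse_variance_weights(Y, eps=1e-6):
    """Weights 1 / (p_j (1 - p_j)) for the weighted Hamming distance,
    with p_j the proportion of correct responses to item j."""
    p = Y.mean(axis=0).clip(eps, 1 - eps)  # guard against p_j = 0 or 1
    return 1.0 / (p * (1 - p))
```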

Wang and Douglas (2015) proved that under certain regularity conditions statistical consistency of the NPC estimator \(\widehat{\varvec{\alpha }}\) is guaranteed for any DCM as long as the probability of a correct response is greater than 0.5 for examinees who master the required attributes and less than 0.5 for examinees who do not master all the required attributes. An implementation of the NPC method is available in the R package NPCD (Zheng & Chiu, 2014).

2.3 The General Nonparametric Classification Method

The consistency conditions of the NPC estimator \(\widehat{\varvec{\alpha }}\) identified by Wang and Douglas (2015) are sometimes difficult to meet for complex DCMs that model the probability of a correct response to an item as an increasing function of the number of attributes that are required for an item and mastered by an examinee. As an example, consider a domain characterized by two attributes; the realizable proficiency classes are \(\varvec{\alpha }_1 = (00)\), \(\varvec{\alpha }_2 = (10)\), \(\varvec{\alpha }_3 = (01)\), and \(\varvec{\alpha }_4 = (11)\). For an item having attribute vector \(\varvec{q}=(11)\), the corresponding ideal item responses are \(\eta _1 = 0\), \(\eta _2 = 0\), \(\eta _3 = 0\), and \(\eta _4 = 1\). Assume this item conforms to the DINA model, with \(g=0.1\) and \(1-s=0.9\). (The equivalent parameterization using the G-DINA model is \(\varvec{\beta } = (\beta _0, \beta _1, \beta _2, \beta _{12})^\mathrm{T} = (0.1, 0, 0, 0.8)^\mathrm{T}\).) The probabilities of answering the item correctly for the four proficiency classes are 0.1, 0.1, 0.1, and 0.9, respectively. Thus, the ideal responses 0, 0, 0, and 1 are, indeed, the most likely responses of the four proficiency classes. However, this may no longer be true if the data conform to a more complex model, say, the saturated G-DINA model with parameter vector \(\varvec{\beta } = (\beta _0, \beta _1, \beta _2, \beta _{12})^\mathrm{T} = (0.1, 0.4, 0.6, -0.2)^\mathrm{T}\). Then, for the four proficiency classes, the probabilities of a correct item response are 0.1, 0.5, 0.7, and 0.9. Thus, the ideal item responses 0, 0, 0, and 1 are no longer the most likely responses. Said differently, using the conjunctive \(\varvec{\eta }\) in this instance to model the relation between \(\mathbf {q}\) and \(\varvec{\alpha }\) may result in examinee misclassifications.
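To see where these numbers come from, recall that under the identity link the saturated G-DINA model for a two-attribute item posits (in the notation assumed here)

$$\begin{aligned} P(Y_j = 1 \mid \varvec{\alpha }) = \beta _0 + \beta _1 \alpha _1 + \beta _2 \alpha _2 + \beta _{12} \alpha _1 \alpha _2 \end{aligned}$$

so that \(\varvec{\beta } = (0.1, 0.4, 0.6, -0.2)^\mathrm{T}\) yields \(P(Y_j = 1 \mid 00) = 0.1\), \(P(Y_j = 1 \mid 10) = 0.1 + 0.4 = 0.5\), \(P(Y_j = 1 \mid 01) = 0.1 + 0.6 = 0.7\), and \(P(Y_j = 1 \mid 11) = 0.1 + 0.4 + 0.6 - 0.2 = 0.9\).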

Recall that the DINA model and the DINO model are the prototypic conjunctive and disjunctive DCM, respectively. They define two extremes of a continuum describing the relation between \(\mathbf {q}\) and \(\varvec{\alpha }\). Based on this observation, Chiu et al. (2018) proposed a weighted ideal item response defined as the convex combination of the conjunction parameter \(\eta \) and the disjunction parameter \(\omega \). The weights are estimated from the data; hence, the relative contribution of \(\eta \) and \(\omega \) to the weighted ideal item response is tailored to the complexity of the DCM underlying the data, which elegantly resolves the limitations of the NPC method. As a result, the GNPC method can be used with any DCM that can be represented as a general DCM. Also, no a priori knowledge of the model underlying the data is required, which is presumably the most distinctive feature of the GNPC method.

The conjunctive ideal item response vector of examinee i is henceforth written as \(\varvec{\eta }^{(c)}_i\), with elements \(\eta ^{(c)}_{ij}=\prod _{k=1}^K \alpha _{ik}^{q_{jk}}\) defined earlier as the conjunction parameter of item j for the DINA model; \(\varvec{\eta }^{(d)}_i\) denotes the disjunctive ideal item response vector, with elements \(\eta ^{(d)}_{ij}=1-\prod _{k=1}^K(1-\alpha _{ik})^{q_{jk}}\) defined previously for the DINO model. For each item j and proficiency class \(\mathcal {C}_m\), the weighted ideal response \(\eta ^{(w)}_{mj}\) is defined as the convex combination

$$\begin{aligned} \eta ^{(w)}_{mj} = w_{mj}\eta _{mj}^{(c)} + (1-w_{mj})\eta _{mj}^{(d)} \end{aligned}$$
(2)

subject to \(0 \le w_{mj} \le 1\). The distance between the responses to item j and the weighted ideal responses \(\eta _{mj}^{(w)}\) of examinees in \(\mathcal {C}_m\) is defined as the sum of squared deviations:

$$\begin{aligned} d_{mj} = \sum _{i\in \mathcal {C}_m} (Y_{ij} - \eta _{mj}^{(w)})^2 = \sum _{i\in \mathcal {C}_m} \big ( Y_{ij} - w_{mj}\eta _{mj}^{(c)} - (1-w_{mj})\eta _{mj}^{(d)} \big )^2 \end{aligned}$$
(3)

Thus, \(w_{mj}\) can be estimated by minimizing \(d_{mj}\):

$$\begin{aligned} \tilde{w}_{mj} = \frac{\sum _{i\in \mathcal {C}_m} (Y_{ij}-\eta _{mj}^{(d)})}{\parallel \mathcal {C}_m \parallel (\eta _{mj}^{(c)}-\eta _{mj}^{(d)})} \end{aligned}$$
(4)

subject to \(\eta _{mj}^{(c)} - \eta _{mj}^{(d)} \ne 0\); \(\parallel \mathcal {C}_m \parallel \) indicates the number of examinees in proficiency class \(\mathcal {C}_m\). Some algebra shows

$$\begin{aligned} \tilde{w}_{mj} = 1 - \overline{Y}_{j\mathcal {C}_m} \end{aligned}$$
(5)
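To make the intermediate algebra explicit: setting the derivative of Eq. 3 with respect to \(w_{mj}\) to zero,

$$\begin{aligned} \frac{\partial d_{mj}}{\partial w_{mj}} = -2 \big (\eta _{mj}^{(c)} - \eta _{mj}^{(d)}\big ) \sum _{i\in \mathcal {C}_m} \big ( Y_{ij} - \eta _{mj}^{(d)} - w_{mj}(\eta _{mj}^{(c)} - \eta _{mj}^{(d)}) \big ) = 0 \end{aligned}$$

yields Eq. 4; in the only estimable case, \(\eta _{mj}^{(c)} = 0\) and \(\eta _{mj}^{(d)} = 1\) (see Section 3), Eq. 4 then reduces to

$$\begin{aligned} \tilde{w}_{mj} = \frac{\sum _{i\in \mathcal {C}_m} (Y_{ij} - 1)}{\parallel \mathcal {C}_m \parallel (0 - 1)} = \frac{\parallel \mathcal {C}_m \parallel - \sum _{i\in \mathcal {C}_m} Y_{ij}}{\parallel \mathcal {C}_m \parallel } = 1 - \overline{Y}_{j\mathcal {C}_m} \end{aligned}$$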

(As \(Y_{ij}\) is a binary random variable, \(0 \le \overline{Y}_{j\mathcal {C}_m} \le 1\); hence, \(0 \le \tilde{w}_{mj} \le 1\), which is in accord with the constraint \(0 \le w_{mj} \le 1\).) After \(\tilde{w}_{mj}\) has been determined, \(\tilde{\eta }_{mj}^{(w)}\) is computed as

$$\begin{aligned} \tilde{\eta }^{(w)}_{mj} = \tilde{w}_{mj}\eta _{mj}^{(c)} + (1-\tilde{w}_{mj})\eta _{mj}^{(d)} \end{aligned}$$
(6)

In this manner, the estimators of all possible weighted ideal response vectors \(\tilde{\varvec{\eta }}_1^{(w)}, \tilde{\varvec{\eta }}_2^{(w)},\ldots , \tilde{\varvec{\eta }}_{M}^{(w)}\) can be constructed for the M proficiency classes (each of which is identified by its attribute profile \(\varvec{\alpha }_m\)). Finally, note that the relation

$$\begin{aligned} \tilde{\eta }^{(w)}_{mj} = \overline{Y}_{j \mathcal {C}_m} \end{aligned}$$
(7)

follows directly from Eqs. 5 and 6.

Comment: Eq. 4 relies on \(\eta _{mj}^{(c)}\) and \(\eta _{mj}^{(d)}\), which implies that an initial classification of examinees is required as input to the estimation of \(w_{mj}\) (and, of course, \(\eta ^{(w)}_{mj}\)). Said differently, in the previous exposition, examinees’ true attribute profiles \(\varvec{\alpha }\) were assumed to be known and used to construct \(\tilde{\varvec{\eta }}_1^{(w)}, \tilde{\varvec{\eta }}_2^{(w)},\ldots , \tilde{\varvec{\eta }}_{M}^{(w)}\); hence, they are global minimizers of Eq. 3. The “tilde” notation is used to distinguish the case where the \(\varvec{\alpha }\) are known from a situation where the attribute profiles \(\varvec{\alpha }\) are unknown and must be estimated from the data to determine the ideal response vectors; this case is discussed next.

Examinees’ true attribute profiles \(\varvec{\alpha }\) are never known, so they must be estimated from the data to provide the input for initializing the GNPC algorithm. Denote this initial estimator of \(\varvec{\alpha }\) by \(\widehat{\varvec{\alpha }}^{(0)}\); from it, the estimators \(\widehat{\varvec{\eta }}^{(c)}\) and \(\widehat{\varvec{\eta }}^{(d)}\) can be derived, which are then used to obtain the estimators

$$\begin{aligned} \widehat{w}_{mj} = \frac{\sum _{i\in \widehat{\mathcal {C}}_m} (Y_{ij}-\hat{\eta }_{mj}^{(d)})}{\parallel \widehat{\mathcal {C}}_m \parallel (\hat{\eta }_{mj}^{(c)}- \hat{\eta }_{mj}^{(d)})} \end{aligned}$$
(8)

(\(\widehat{\mathcal {C}}_m\) indicates that membership in proficiency class m is determined based on \(\widehat{\varvec{\alpha }}^{(0)}\)) and

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} = \widehat{w}_{mj}\hat{\eta }_{mj}^{(c)} + (1-\widehat{w}_{mj})\hat{\eta }_{mj}^{(d)} \end{aligned}$$
(9)

Analogous to Eq. 5, the estimator in Eq. 8 simplifies to

$$\begin{aligned} \widehat{w}_{mj} = 1 - \overline{Y}_{j \widehat{\mathcal {C}}_m} \end{aligned}$$
(10)

(notice that \(0 \le \widehat{w}_{mj} \le 1\)), and from Eqs. 9 and 10

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} = \overline{Y}_{j \widehat{\mathcal {C}}_m} \end{aligned}$$
(11)

The GNPC estimator \(\widehat{\varvec{\alpha }}\) of an examinee’s attribute profile is defined as the attribute profile underlying the estimated weighted ideal item response profile that among all estimated weighted ideal item response profiles minimizes the loss function defined in terms of the distance to an examinee’s observed item response vector:

$$\begin{aligned} \widehat{\varvec{\alpha }}_i = \mathop {\text {arg min}}\limits _{m \in \{ 1,2,\ldots ,M \}} d(\varvec{Y}_i,\widehat{\varvec{\eta }}_m^{(w)}) \end{aligned}$$

where

$$\begin{aligned} d(\varvec{Y}_i,\widehat{\varvec{\eta }}_m^{(w)}) = \sum _{j = 1}^J d(Y_{ij}, \hat{\eta }_{mj}^{(w)}) = \sum _{j = 1}^J(Y_{ij}-\hat{\eta }_{mj}^{(w)})^2 \end{aligned}$$
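Operationally, one update cycle of the GNPC method can be sketched as follows (hypothetical Python code, assuming NumPy; the function gnpc_step is an illustration, not the reference implementation). Given the current class assignments, the weighted ideal responses are the within-class item means in the estimable case (Eq. 11) and coincide with \(\eta ^{(c)}_{mj} = \eta ^{(d)}_{mj}\) otherwise; each examinee is then reassigned by minimizing the squared-distance loss:

```python
from itertools import product

import numpy as np

def gnpc_step(Y, Q, assignments):
    """One GNPC update: estimate the weighted ideal responses from the
    current class assignments (Eqs. 10 and 11) and reclassify every
    examinee by the squared-distance loss.  Y is N x J, Q is J x K,
    assignments holds each examinee's current class index."""
    profiles = np.array(list(product([0, 1], repeat=Q.shape[1])))  # M x K
    eta_c = np.array([[int(np.all(a[q == 1] == 1)) for q in Q] for a in profiles])
    eta_d = np.array([[int(np.any(a[q == 1] == 1)) for q in Q] for a in profiles])
    eta_w = eta_c.astype(float)  # where eta_c == eta_d, eta_w equals both
    for m in range(len(profiles)):
        members = np.flatnonzero(assignments == m)
        for j in range(Q.shape[0]):
            if eta_c[m, j] != eta_d[m, j]:  # estimable case: class mean (Eq. 11)
                # Empty classes are not covered by the text; 0.5 is an
                # arbitrary placeholder for that case.
                eta_w[m, j] = Y[members, j].mean() if members.size else 0.5
    d = ((Y[:, None, :] - eta_w[None, :, :]) ** 2).sum(axis=2)  # N x M
    return d.argmin(axis=1), eta_w
```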

3 Consistency Theory of the Proficiency-Class Estimator \(\widehat{\varvec{\alpha }}\) of the General Nonparametric Classification Method

The consistency theory of the proficiency-class estimator \(\widehat{\varvec{\alpha }}\) of the GNPC method consists of four lemmas and one proposition that specify the theoretical requirements for two consistency theorems. A few technical preliminaries are warranted. First, note that Eq. 2 implies that if \(w_{mj}=1\), then \(\eta ^{(w)}_{mj}\) reduces to \(\eta ^{(c)}_{mj}\), and if \(w_{mj}=0\), then \(\eta ^{(w)}_{mj}\) reduces to \(\eta ^{(d)}_{mj}\). In these two cases, the underlying DCM corresponds to the DINA model and the DINO model, respectively. For the DINA model, \(\widehat{\varvec{\alpha }}\) was proven to be consistent by Wang and Douglas (2015); the proof can be readily extended to the DINO model based on the “dual” relation of the two models (see Köhn & Chiu, 2016b). Hence, these two cases are of no further concern here. Second, Eq. 4 shows that \(w_{mj}\) can only be estimated if \(\eta _{mj}^{(c)} - \eta _{mj}^{(d)} \ne 0\) is satisfied. If \(\eta _{mj}^{(c)} = \eta _{mj}^{(d)}\), then either \(\eta _{mj}^{(c)} = \eta _{mj}^{(d)} = 0\), implying \(\eta _{mj}^{(w)} = \eta _{mj}^{(c)} = \eta _{mj}^{(d)} = 0\) (i.e., all examinees in \(\mathcal {C}_m\) have mastered none of the attributes required for item j), or \(\eta _{mj}^{(c)} = \eta _{mj}^{(d)} = 1\), implying \(\eta _{mj}^{(w)} = \eta _{mj}^{(c)} = \eta _{mj}^{(d)} = 1\) (i.e., all examinees in \(\mathcal {C}_m\) have mastered all attributes required for item j). Hence, the DCM underlying these two instances is the DINA model or the DINO model, which means that the proof of the consistency of \(\widehat{\varvec{\alpha }}\) presented in Wang and Douglas (2015) also applies to these two cases; they are, therefore, of no further concern here either. Third, the case \(\eta _{mj}^{(c)} = 1\) and \(\eta _{mj}^{(d)} = 0\) is logically impossible because \(\eta _{mj}^{(c)} \le \eta _{mj}^{(d)}\) is always true. (\(\eta _{mj}^{(c)} = 1\) holds only if all attributes required for item j are mastered, in which case \(\eta _{mj}^{(d)} = 1\) too; if only a proper, non-empty subset of the required attributes is mastered, then \(\eta _{mj}^{(c)} = 0\), whereas \(\eta _{mj}^{(d)} = 1\); finally, if none of the attributes are mastered, then \(\eta _{mj}^{(c)} = \eta _{mj}^{(d)} = 0\), a scenario just discussed.) Fourth, as a consequence, \(w_{mj}\) is estimable only if \(\eta ^{(c)}_{mj}=0\) and \(\eta ^{(d)}_{mj}=1\). Therefore, the proof of the consistency of \(\widehat{\varvec{\alpha }}\) need only address the case \(\hat{\eta }^{(w)}_{mj} = \overline{Y}_{j \widehat{\mathcal {C}}_m}\) (see Eq. 11). In summary, the relations between \(\widehat{w}_{mj}\) and \(\hat{\eta }^{(w)}_{mj}\) can be described for the different values of \(\eta ^{(c)}_{mj}\) and \(\eta ^{(d)}_{mj}\) as

$$\begin{aligned} \widehat{w}_{mj} =\left\{ \begin{array}{ll} \text {not defined} &{} \quad \text {if}\, \eta _{mj}^{(c)} =\eta _{mj}^{(d)} = \eta _{mj}^{(w)} = 0\\ 1-\overline{Y}_{j \widehat{\mathcal {C}}_m} &{}\quad \text {if}\,\eta _{mj}^{(c)} =0\; \wedge \;\eta _{mj}^{(d)} = 1\\ \text {not defined} &{}\quad \text {if}\,\eta _{mj}^{(c)} =\eta _{mj}^{(d)} = \eta _{mj}^{(w)} = 1 \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} =\left\{ \begin{array}{ll} \eta ^{(c)}_{mj} &{}\quad \text {if}\, w_{mj}= 1\\ \eta ^{(d)}_{mj} &{}\quad \text {if}\, w_{mj}= 0\\ \overline{Y}_{j \widehat{\mathcal {C}}_m} &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$

The following assumptions are made:

A1:

The item response vectors \(\varvec{Y}_1,\varvec{Y}_2,\ldots ,\varvec{Y}_N\) for examinees \(1,2,\ldots ,N\) are statistically independent.

A2:

For examinee i, the item responses \(Y_{i1},Y_{i2},\ldots ,Y_{iJ}\) are locally independent.

A3:

The Q-matrix is complete.

A4:

For the number of examinees and the length of the test, the relation \(Ne^{-2J\varepsilon ^2}\rightarrow 0\) as \(J \rightarrow \infty \) holds \(\forall \varepsilon >0\).

Recall the distinction between a scenario where examinees’ attribute profiles \(\varvec{\alpha }\) are assumed to be known and a scenario where the attribute profiles \(\varvec{\alpha }\) are unknown and must be estimated from the data to initialize the GNPC algorithm. First, consider the scenario assuming that attribute profiles \(\varvec{\alpha }\) are known.

Lemma 1

Suppose the data conform to any DCM that can be expressed in terms of the G-DINA model. For each item j and each proficiency class \(\mathcal {C}_m\), \(\tilde{\eta }^{(w)}_{mj}\) is defined as in Eq. 6. \(N_m\) denotes the number of examinees in proficiency class \(\mathcal {C}_m\). Let \(S_j(\varvec{\alpha }_m) = E(Y_{j} \mid \varvec{\alpha }_m)\). Then, \(\sqrt{N_m} \quad \big ( \tilde{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) \big ) \quad \overset{\mathcal {D}}{\longrightarrow } \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big )\) for all j.

Proof

Suppose examinee i belongs to proficiency class \(\mathcal {C}_m\)—that is, \(\varvec{\alpha }_i = \varvec{\alpha }_m\). The response \(Y_{ij}\) is a binary random variable, with (conditional) expectation \(S_j(\varvec{\alpha }_m)\). Hence, the variance of \(Y_{ij}\) is

$$\begin{aligned} \text{ Var } \big ( Y_{ij} \mid \varvec{\alpha }_m \big )= S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \end{aligned}$$

with \(0<S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) < \infty \). Due to the central limit theorem

$$\begin{aligned} \sqrt{N_m} \quad \big ( \overline{Y}_{j\mathcal {C}_m} - S_j(\varvec{\alpha }_m) \big ) \quad \overset{\mathcal {D}}{\longrightarrow } \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big ) \quad \forall j \end{aligned}$$
(12)

where \(\overline{Y}_{j\mathcal {C}_m}\) is the mean of the responses \(Y_{ij}\) in proficiency class \(\mathcal {C}_m\). Because \(\tilde{\eta }^{(w)}_{mj} = \overline{Y}_{j\mathcal {C}_m}\) (see Eq. 7), Eq. 12 implies

$$\begin{aligned} \sqrt{N_m}\quad \big (\tilde{\eta }^{(w)}_{mj}-S_j(\varvec{\alpha }_m)\big ) \quad \overset{\mathcal {D}}{\longrightarrow } \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big ) \quad \forall j \end{aligned}$$
(13)

\(\square \)

Lemma 1 specifies the asymptotic distribution of \(\tilde{\eta }^{(w)}_{mj}\).

Second, consider the scenario where examinees’ proficiency classes \(\varvec{\alpha }\) are unknown and must be estimated. The asymptotic distribution of the estimator \(\hat{\eta }^{(w)}_{mj}\) is presented in Theorem 1; convergence of \(\hat{\eta }^{(w)}_{mj}\) to \(\tilde{\eta }^{(w)}_{mj}\)—a requirement for Theorem 1—is established in Lemma 2.

Lemma 2

Suppose \(\widehat{\varvec{\alpha }}^{(0)}\) is a consistent estimator of \(\varvec{\alpha }\). \(\tilde{\eta }^{(w)}_{mj}\) and \(\hat{\eta }^{(w)}_{mj}\) are estimators of \(\eta ^{(w)}_{mj}\) as defined in Eqs. 6 and 9, respectively. Assume \(J \rightarrow \infty \) and \(J<N\). Then \(\hat{\eta }^{(w)}_{mj} \overset{\mathcal {P}}{\longrightarrow } \tilde{\eta }^{(w)}_{mj}\) for all j.

Proof

Because \(\widehat{\varvec{\alpha }}^{(0)}\) is a consistent estimator of \(\varvec{\alpha }\)

$$\begin{aligned} P \Big ( \bigcup _{i=1}^N \big \{ \mid \widehat{\varvec{\alpha }}^{(0)}_i- \varvec{\alpha }_i \mid > \varepsilon \big \} \Big ) \rightarrow 0 \end{aligned}$$
(14)

as \(J \rightarrow \infty \). Equations 6 and 9 show that \(\hat{\eta }^{(w)}_{mj} = \tilde{\eta }^{(w)}_{mj}\) if \(\widehat{\varvec{\alpha }}^{(0)}_i = \varvec{\alpha }_i\) for all i, which can be expressed as

$$\begin{aligned} P \Big ( \bigcap _{i=1}^N \big \{ \widehat{\varvec{\alpha }}^{(0)}_i = \varvec{\alpha }_i \big \} \Big ) \le P \big ( \{ \hat{\eta }^{(w)}_{mj} = \tilde{\eta }^{(w)}_{mj} \} \big ) \end{aligned}$$

or equivalently as

$$\begin{aligned} P \big ( \{ \hat{\eta }^{(w)}_{mj} \ne \tilde{\eta }^{(w)}_{mj} \} \big ) \le P \Big ( \bigcup _{i=1}^N \big \{ \widehat{\varvec{\alpha }}^{(0)}_i \ne \varvec{\alpha }_i \big \} \Big ) \end{aligned}$$

Hence, for every \(\varepsilon >0\),

$$\begin{aligned} P \big ( \mid \hat{\eta }^{(w)}_{mj}- \tilde{\eta }^{(w)}_{mj} \mid>\varepsilon \big ) \le P \Big ( \bigcup _{i=1}^N \big \{ \mid \widehat{\varvec{\alpha }}^{(0)}_i - \varvec{\alpha }_i \mid >\varepsilon \big \} \Big ) \end{aligned}$$
(15)

Together with Eq. 14, Eq. 15 implies

$$\begin{aligned} P \big ( \mid \hat{\eta }^{(w)}_{mj} - \tilde{\eta }^{(w)}_{mj} \mid >\varepsilon \big ) \rightarrow 0 \end{aligned}$$

as \(J \rightarrow \infty \). Note that \(J \rightarrow \infty \) implies \(N \rightarrow \infty \) because \(N>J\). Therefore,

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} \quad \overset{\mathcal {P}}{\longrightarrow } \quad \tilde{\eta }^{(w)}_{mj} \end{aligned}$$

for all j. \(\square \)

Theorem 1

Let \(\tilde{\eta }^{(w)}_{mj}\) and \(\hat{\eta }^{(w)}_{mj}\) be the estimators of \(\eta ^{(w)}_{mj}\) with properties as specified in Lemmas 1 and 2. Assume \(N>J\) and \(J \rightarrow \infty \). Also, assume \(\widehat{\varvec{\alpha }}^{(0)}_i\) is a consistent estimator of \(\varvec{\alpha }_i\). Then \(\big ( \hat{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) \big ) \quad \overset{\mathcal {D}}{\longrightarrow } \frac{1}{\sqrt{N_m}} \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big )\) for all j.

Proof

Lemma 1 states that

$$\begin{aligned} \sqrt{N_m} \quad \big ( \tilde{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) \big ) \quad \overset{\mathcal {D}}{\longrightarrow } \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big ) \end{aligned}$$

which can be written as

$$\begin{aligned} \big ( \tilde{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) \big ) \quad \overset{\mathcal {D}}{\longrightarrow } \frac{1}{\sqrt{N_m}} \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big ) \end{aligned}$$
(16)

Lemma 2 states that

$$\begin{aligned} \hat{\eta }^{(w)}_{mj}-\tilde{\eta }^{(w)}_{mj} \quad \overset{\mathcal {P}}{\longrightarrow } \quad 0 \quad \forall j \end{aligned}$$
(17)

Applying Slutsky’s theorem to Eqs. 16 and 17 results in

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} - \tilde{\eta }^{(w)}_{mj} + \tilde{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) = \hat{\eta }^{(w)}_{mj} - S_j(\varvec{\alpha }_m) \quad \overset{\mathcal {D}}{\longrightarrow } \frac{1}{\sqrt{N_m}} \quad \mathcal {N} \Big ( 0, S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \Big ) \end{aligned}$$

for all j. \(\square \)

Theorem 1 asserts that the distribution of \(\hat{\eta }^{(w)}_{mj}\) is asymptotically normal.

Define the distance \(d_i(\varvec{\alpha }_m) = d(\varvec{Y}_i,\widehat{\varvec{\eta }}^{(w)}_m)\), with \(d_{ij}(\varvec{\alpha }_m) = d(Y_{ij}, \hat{\eta }^{(w)}_{mj})\); the notation \(d_i(\varvec{\alpha }_m)\) is chosen to emphasize that the distance between the item response vector \(\varvec{Y}_i\) of examinee i and the weighted ideal item response vector \(\widehat{\varvec{\eta }}^{(w)}_m\) of proficiency class \(\mathcal {C}_m\) is a function of its attribute profile.

Lemma 3

Suppose the item response vectors \(\varvec{Y}_1, \varvec{Y}_2, \ldots , \varvec{Y}_N\) are statistically independent and the Q-matrix is complete. Then, for each examinee i, the true attribute profile minimizes \(E\big (d_i(\varvec{\alpha }_m)\big )\) for all m.

Proof

Due to completeness of the Q-matrix, the identifiability of all realizable attribute profiles is guaranteed. Suppose \(\varvec{\alpha }_m\) is the true attribute profile of examinee i, and \(\varvec{\alpha }_{m'}\) is some other attribute profile. For the true attribute profile \(\varvec{\alpha }_m\),

$$\begin{aligned} E\big (d_{ij}(\varvec{\alpha }_m)\big )&= E\big ((Y_{ij} - \hat{\eta }^{(w)}_{mj})^2\big )\nonumber \\&= E\big (Y_{ij}^2 - 2\hat{\eta }^{(w)}_{mj}Y_{ij} + (\hat{\eta }^{(w)}_{mj})^2\big )\nonumber \\&= E(Y_{ij}^2) - 2E(\hat{\eta }^{(w)}_{mj}Y_{ij}) + E\big ((\hat{\eta }^{(w)}_{mj})^2\big ) \end{aligned}$$
(18)

\(Y_{ij}\) is a binary random variable; hence, \(E(Y_{ij}^2)=E(Y_{ij})\), and Eq. 18 can be reduced to

$$\begin{aligned}&E(Y_{ij}) - 2E(\overline{Y}_{j \mathcal {C}_m}Y_{ij}) + E\big ((\hat{\eta }^{(w)}_{mj})^2\big )\nonumber \\&\quad = S_j(\varvec{\alpha }_m) -\frac{2}{N_m}E \left( Y_{ij}Y_{ij} + \sum _{i'\ne i,i'=1}^{N_m}Y_{ij}Y_{i'j} \right) + \text{ Var }(\hat{\eta }^{(w)}_{mj}) + E^2(\hat{\eta }^{(w)}_{mj}) \end{aligned}$$
(19)

Independence of \(Y_{ij}\) and \(Y_{i'j}\) implies \(E(Y_{ij}Y_{i'j}) = E(Y_{ij})E(Y_{i'j})\) so that the RHS of Eq. 19 becomes

$$\begin{aligned}&S_j(\varvec{\alpha }_m) -\frac{2}{N_m}\big (E(Y_{ij}^2)+(N_m-1)E(Y_{ij}) E(Y_{i'j})\big ) +\frac{S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big )}{N_m}+S_j^2(\varvec{\alpha }_m)\\&\quad = S_j(\varvec{\alpha }_m)-\frac{2}{N_m}\big (S_j(\varvec{\alpha }_m)\big (1-S_j (\varvec{\alpha }_m)\big )+S_j^2(\varvec{\alpha }_m)+(N_m-1)S^2_j(\varvec{\alpha }_m)\big ) \nonumber \\&\qquad + \frac{S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big )}{N_m}+S_j^2( \varvec{\alpha }_m) \\&\quad = S_j(\varvec{\alpha }_m) -\frac{2}{N_m}S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) - 2S^2_j(\varvec{\alpha }_m) +\frac{S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m) \big )}{N_m}+S_j^2(\varvec{\alpha }_m)\\&\quad = S_j(\varvec{\alpha }_m) - S^2_j(\varvec{\alpha }_m)-\frac{1}{N_m}S_j( \varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \\&\quad =\frac{N_m-1}{N_m}S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) \end{aligned}$$

Conversely, if the incorrect attribute profile \(\varvec{\alpha }_{m'}\) is assumed to be the true attribute profile of examinee i, then

$$\begin{aligned} E\big (d_{ij}(\varvec{\alpha }_{m'})\big )&= E\big ((Y_{ij} - \hat{\eta }^{(w)}_{m'j})^2\big ) \\&= E(Y_{ij}) - 2E(\overline{Y}_{j\mathcal {C}_{m'}}Y_{ij}) + E\big ((\hat{\eta }^{(w)}_{m'j})^2\big ) \nonumber \\&= S_j(\varvec{\alpha }_m) -\frac{2}{N_{m'}}E\left( Y_{ij}\sum _{i'=1}^{N_{m'}}Y_{i'j}\right) + \text{ Var }(\hat{\eta }^{(w)}_{m'j}) + E^2(\hat{\eta }^{(w)}_{m'j}) \\&= S_j(\varvec{\alpha }_m) - \frac{2}{N_{m'}}E(Y_{ij})\sum _{i'=1}^{N_{m'}}E(Y_{i'j}) + \frac{S_j(\varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big )}{N_{m'}} + S_j^2(\varvec{\alpha }_{m'}) \\&= S_j(\varvec{\alpha }_m) -\frac{2}{N_{m'}}N_{m'}S_j(\varvec{\alpha }_m)S_j(\varvec{\alpha }_{m'}) + \frac{S_j(\varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big )}{N_{m'}}+S_j^2(\varvec{\alpha }_{m'}) \\&= S_j(\varvec{\alpha }_m) - 2S_j(\varvec{\alpha }_m)S_j(\varvec{\alpha }_{m'}) + \frac{S_j(\varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big )}{N_{m'}} + S_j^2(\varvec{\alpha }_{m'}) \end{aligned}$$

Thus, for each item j,

$$\begin{aligned}&E\big (d_{ij}(\varvec{\alpha }_m)\big )- E\big (d_{ij}(\varvec{\alpha }_{m'})\big ) \\&\quad = E\big ((Y_{ij}-\hat{\eta }^{(w)}_{mj})^2\big ) - E\big ((Y_{ij} - \hat{\eta }^{(w)}_{m'j})^2\big ) \\&\quad =\frac{N_m-1}{N_m}S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) - S_j(\varvec{\alpha }_m) +2S_j(\varvec{\alpha }_m)S_j(\varvec{\alpha }_{m'})\\&\qquad -\frac{S_j( \varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big )}{N_{m'}}-S_j^2( \varvec{\alpha }_{m'})\\&\quad =-S^2_j(\varvec{\alpha }_m)-\frac{1}{N_m}S_j(\varvec{\alpha }_m)\big ( 1-S_j(\varvec{\alpha }_m)\big )+ 2S_j(\varvec{\alpha }_m)S_j(\varvec{\alpha }_{m'}) \\&\qquad - \frac{1}{N_{m'}}S_j(\varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big )- S^2_j(\varvec{\alpha }_{m'})\\&\quad =-\big (S_j(\varvec{\alpha }_m)-S_j(\varvec{\alpha }_{m'})\big )^2- \frac{1}{N_m}S_j(\varvec{\alpha }_m)\big (1-S_j(\varvec{\alpha }_m)\big ) -\frac{1}{N_{m'}}S_j(\varvec{\alpha }_{m'})\big (1-S_j(\varvec{\alpha }_{m'})\big ) \\&\quad = -\big (S_j(\varvec{\alpha }_m) - S_j(\varvec{\alpha }_{m'})\big )^2 - \text{ Var }(\hat{\eta }^{(w)}_{mj}) - \text{ Var }(\hat{\eta }^{(w)}_{m'j}) \\&\quad < 0 \end{aligned}$$

Therefore, \(E\big (d_{ij}(\varvec{\alpha }_m)\big ) < E\big (d_{ij}(\varvec{\alpha }_{m'})\big )\) for all j, which implies

$$\begin{aligned} \sum _{j = 1}^J E\big (d_{ij}(\varvec{\alpha }_m)\big ) < \sum _{j = 1}^J E\big (d_{ij}(\varvec{\alpha }_{m'})\big ) \end{aligned}$$

Then,

$$\begin{aligned} E\left( \sum _{j = 1}^J d_{ij}(\varvec{\alpha }_m)\right)< & {} E\left( \sum _{j = 1}^J d_{ij}(\varvec{\alpha }_{m'})\right) \\ E\big (d_i(\varvec{\alpha }_m)\big )< & {} E\big (d_i(\varvec{\alpha }_{m'})\big ) \end{aligned}$$

due to \(d_i(\varvec{\alpha }_m)=\sum _{j = 1}^J d_{ij}(\varvec{\alpha }_m)\). These inequalities imply that the expected distance between the ideal and manifest item response vectors is minimized by the true attribute profile. \(\square \)

Lemma 3 maintains that for examinee i in \(\mathcal {C}_m\), the distance between the item response vector \(\varvec{Y}_i\) and the ideal response vector \(\hat{\varvec{\eta }}_m^{(w)}\) is minimized if the true attribute profile is used to construct \(\hat{\varvec{\eta }}_m^{(w)}\).

Proposition

Suppose the item response vectors \(\varvec{Y}_1, \varvec{Y}_2, \ldots , \varvec{Y}_N\) are statistically independent and the Q-matrix is complete. If \(\varvec{\alpha }_m\) is the true attribute profile of examinee i and \(\varvec{\alpha }_{m'}\) is a different attribute profile, then \(\lim _{J\rightarrow \infty }\Big (E\big (d_i(\varvec{\alpha }_{m'})\big )-E\big (d_i(\varvec{\alpha }_m) \big )\Big )=\infty \).

Proof

Based on Lemma 3,

$$\begin{aligned}&\lim _{J \rightarrow \infty }\Big (E\big (d_i(\varvec{\alpha }_{m'})\big ) - E\big (d_i(\varvec{\alpha }_m)\big )\Big )\nonumber \\&\quad = \lim _{J\rightarrow \infty } \sum _{j=1}^J \Big (\big (S_j(\varvec{\alpha }_m) - S_j(\varvec{\alpha }_{m'})\big )^2+\text{ Var }(\hat{\eta }^{(w)}_{mj}) + \text{ Var }(\hat{\eta }^{(w)}_{m'j})\Big )\nonumber \\&\quad> \lim _{J\rightarrow \infty }\sum _{j=1}^J\big (\text{ Var }(\hat{\eta }^{(w)}_{mj}) + \text{ Var }(\hat{\eta }^{(w)}_{m'j})\big )\nonumber \\&\quad > \lim _{J\rightarrow \infty }J\Big (\min _j\big (\text{ Var }(\hat{\eta }^{(w)}_{mj})\big ) + \min _j\big (\text{ Var }(\hat{\eta }^{(w)}_{m'j})\big )\Big )\nonumber \\&\quad = \infty \end{aligned}$$
(20)

\(\square \)

Let \(\bar{d}_i(\varvec{\alpha }_m)=\frac{1}{J}\sum _{j=1}^J d_{ij}(\varvec{\alpha }_m)\) denote the average distance for examinee i across all j.

Lemma 4

Assume local independence; then, for examinee i, \(\bar{d}_i(\varvec{\alpha }_m) \overset{\mathcal {P}}{\longrightarrow } E\big (d_{ij}(\varvec{\alpha }_m)\big )\) uniformly as \(J \rightarrow \infty \).

Proof

Hoeffding’s (1963) inequality states \(P\big (|\frac{1}{N}\sum _{i=1}^N X_i-E(X_i)|\ge \varepsilon \big )<2 \exp (-2N\varepsilon ^2)\), where \(X_1, X_2, \ldots ,X_N\) are iid random variables and \(0\le X_i \le 1\) for all i. Hoeffding’s inequality is used to show that \(\lim _{J\rightarrow \infty }P\Big (\max _m \mid \bar{d}_i(\varvec{\alpha }_m) - E\big (d_{ij}(\varvec{\alpha }_m)\big ) \mid \ge \varepsilon \Big )=0\). Specifically, due to local independence, \(Y_{i1}, Y_{i2}, \ldots ,Y_{iJ}\) are independent conditional on \(\varvec{\alpha }\). Hence, \(d_{i1}(\varvec{\alpha }_m), d_{i2}(\varvec{\alpha }_m), \ldots , d_{iJ}(\varvec{\alpha }_m)\) are independent as well. Also, because \(0 \le d_{ij}(\varvec{\alpha }_m) \le 1\), the conditions for using Hoeffding’s inequality are satisfied, and thus, for every \(\varepsilon > 0\),

$$\begin{aligned} P\Big (|\bar{d}_i(\varvec{\alpha }_m) - E\big (d_{ij}(\varvec{\alpha }_m)\big )|\ge \varepsilon \Big ) < 2 \exp (-2J\varepsilon ^2) \end{aligned}$$

is true. Proposition 3 in Wang and Douglas (2015) is then used to show \(\lim _{J\rightarrow \infty }P\Big (\max _m \mid \bar{d}_i(\varvec{\alpha }_m) - E\big (d_{ij}(\varvec{\alpha }_m)\big ) \mid \ge \varepsilon \Big )=0\). Hence, \(\bar{d}_i(\varvec{\alpha }_m) \overset{\mathcal {P}}{\longrightarrow } E\big (d_{ij}(\varvec{\alpha }_m)\big )\) uniformly as \(J \rightarrow \infty \). \(\square \)
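In essence, the step supplied by Proposition 3 of Wang and Douglas (2015) is a union bound over the finitely many proficiency classes; a sketch:

$$\begin{aligned} P\Big (\max _m \mid \bar{d}_i(\varvec{\alpha }_m) - E\big (d_{ij}(\varvec{\alpha }_m)\big ) \mid \ge \varepsilon \Big ) \le \sum _{m=1}^M P\Big ( \mid \bar{d}_i(\varvec{\alpha }_m) - E\big (d_{ij}(\varvec{\alpha }_m)\big ) \mid \ge \varepsilon \Big ) < 2M \exp (-2J\varepsilon ^2) \rightarrow 0 \end{aligned}$$

as \(J \rightarrow \infty \), because \(M = 2^K\) is fixed.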

Lemma 4 asserts that \(\bar{d}_i(\varvec{\alpha }_m)\) converges to \(E\big (d_{ij}(\varvec{\alpha }_m)\big )\), which is required for the proof of Theorem 2.

Theorem 2

Suppose all assumptions of the preceding lemmas and the proposition hold, including a) statistical independence of the item response vectors \(\varvec{Y}_1, \varvec{Y}_2, \ldots , \varvec{Y}_N\), b) local independence, and c) completeness of the Q-matrix. In addition, assume \(Ne^{-2J\varepsilon ^2}\rightarrow 0\) as \(J \rightarrow \infty \) for every \(\varepsilon > 0\) (assumption \(A_4\)). Then, \(\widehat{\varvec{\alpha }}\) obtained from the GNPC method is a uniformly consistent estimator of \(\varvec{\alpha }\) for each examinee i.

Proof

Theorem 2 is proven using Theorems 1 and 2 in Wang and Douglas (2015). \(\square \)

Theorem 2 establishes the consistency of the estimator \(\hat{\varvec{\alpha }}\) obtained by the GNPC method.

Comment:  Wang and Douglas (2015) presented two consistency theorems; their Theorem 1 asserts point-wise convergence in probability, and their Theorem 2 uniform convergence, a stronger form of statistical consistency because all parameter estimators are guaranteed to converge. As uniform convergence implies point-wise convergence, only uniform convergence is considered here (as stated in Theorem 2). Instead of an elaborate proof, reference is made to Theorems 1 and 2 in Wang and Douglas (2015). The connection between these two theorems, however, and how they allow for constructing a proof of Theorem 2 presented above might not be obvious. Thus, as a courtesy to the reader, a more detailed summary of the argument developed in Wang and Douglas (2015), and of how it applies to the proof of Theorem 2, is provided in the subsequent paragraphs.

Key to Wang and Douglas’ (2015) work are two assumptions about the independence of examinees’ item response vectors and the item responses of individual examinees:

Assumption (1) The item response vectors \(\varvec{Y}_1, \varvec{Y}_2,\ldots ,\varvec{Y}_N\) of examinees \(1, 2, \ldots , N\) are statistically independent.

Assumption (2) For examinee i, the item responses \(Y_{i1},Y_{i2},\ldots , Y_{iJ}\) are statistically independent conditional on attribute vector \(\varvec{\alpha }_i\).

Let \(q_{jk}\) be the entry in the \(j\mathrm{th}\) row and \(k\mathrm{th}\) column of the \(J \times K\) Q-matrix; define the set \(\mathcal {B}_j = \{k \mid q_{jk} = 1\}\). For some number \(\delta \in (0, 0.5)\), the following conditions on the parameters of the true model underlying the data must be fulfilled:

Condition (a.1) If the data conform to the DINA model, then the parameters \(g_j\) and \(s_j\) must satisfy \(g_j < 0.5-\delta \) and \(s_j < 0.5 -\delta \).

Condition (a.2) If the data conform to the NIDA model, then \(g_k < 0.5 - \delta \), for \(k = 1, 2,\ldots , K\), and \(\prod _{k \in \mathcal {B}_j}(1-s_k) > 0.5 + \delta \), for \(j = 1, 2,\ldots , J\) must be true.

Condition (a.3) If the data conform to the Reduced RUM, then \(\pi ^*_j > 0.5 + \delta \) for every j, and \(r^*_{jk} < 0.5 - \delta \) for some \(k \in \mathcal {B}_j\) must be true.

Condition (b) Define the set \(\mathcal {A}_{m,m'}= \{j \mid \eta _{mj} \ne \eta _{m'j}\}\), where m and \(m'\) index attribute profiles of different proficiency classes among all the \(M=2^K\) realizable proficiency classes; \(\text {Card}(\mathcal {A}_{m,m'})\rightarrow \infty \) as \(J\rightarrow \infty \).

Condition (c) The total number of examinees N and of items J satisfy the relation \(Ne^{-2J\varepsilon ^2}\rightarrow 0\) as \(J\rightarrow \infty \quad \forall \varepsilon > 0\).

How do Wang and Douglas’ (2015) assumptions and conditions relate to those made in this article? Assumption (1) in Wang and Douglas (2015) is equivalent to \(A_1\) in this article (p. 12). Assumption (2) essentially concerns local independence and corresponds to assumption \(A_2\) (p. 12). Conditions (a.1), (a.2), and (a.3) concern the DINA model, the NIDA model, and the Reduced RUM, respectively; they are irrelevant for the GNPC method. Condition (b) is equivalent to the completeness condition of the Q-matrix in \(A_3\) (p. 12). Condition (c) corresponds to \(A_4\) (p. 12). Hence, all assumptions and conditions in Wang and Douglas (2015)—with the exception of Conditions (a.1), (a.2), and (a.3)—were also used in this article.

For the proof of Theorem 1 in Wang and Douglas (2015), Assumptions (1) and (2) as well as Conditions (a.1), (a.2), (a.3), and (b) are needed; for the proof of Theorem 2, Assumptions (1) and (2) as well as Conditions (a.1), (a.2), (a.3), (b), and (c) are needed. In addition to these assumptions and conditions, Wang and Douglas (2015) presented three propositions that are required for proving Theorems 1 and 2.

The proofs of Theorems 1 and 2 of the GNPC method differ in structure from those of Wang and Douglas (2015). However, similar to their work, the proofs presented in this article required three additional antecedents: Lemma 3 (p. 15), the proposition presented on p. 18, and Lemma 4 (p. 19). The proof of Theorem 1 for the GNPC method is given in full above; for the proof of Theorem 2, assumptions \(A_1\), \(A_2\), \(A_3\), and \(A_4\), the proposition presented on p. 18, and Lemmas 3 and 4 were needed.

4 Discussion and Conclusion

The GNPC method does not rely on a parametric statistical model for identifying an examinee’s proficiency class. As a key advantage over other nonparametric classification procedures, the GNPC method can also account for item response probabilities that are modeled as an increasing function of the number of attributes required by an item and mastered by an examinee. This includes all models that can be represented as a general DCM. In the past, a major obstacle to the use of the GNPC method was that the statistical properties of the proficiency-class estimator were unknown. In this article, the GNPC estimator of an examinee’s proficiency class, \(\widehat{\varvec{\alpha }}\), was proven to be statistically consistent. The consistency theory of \(\widehat{\varvec{\alpha }}\) consists of four lemmas and one proposition that specify the theoretical requirements for two theorems that establish uniform convergence of \(\widehat{\varvec{\alpha }}\) to the true \(\varvec{\alpha }\) as \(J \rightarrow \infty \). In conclusion, three topics remain to be briefly addressed here.

First, recall that the GNPC algorithm must be initialized by the estimator \(\widehat{\varvec{\alpha }}^{(0)}\) of an examinee’s proficiency class. The NPC method is typically used to obtain \(\widehat{\varvec{\alpha }}^{(0)}\)—this is equivalent to initializing the GNPC method by setting \(\widehat{w}_{mj}\) to 1; hence,

$$\begin{aligned} \hat{\eta }^{(w)}_{mj} = \widehat{w}_{mj}\hat{\eta }_{mj}^{(c)} + (1-\widehat{w}_{mj})\hat{\eta }_{mj}^{(d)} = \hat{\eta }_{mj}^{(c)} \end{aligned}$$

(see Eq. 9). Then, based on \(\hat{\eta }^{(w)}_{mj}\), examinees’ estimated proficiency classes are updated, resulting in updated estimates of \(\widehat{\varvec{\eta }}^{(c)}\) and \(\widehat{\varvec{\eta }}^{(d)}\) that are subsequently used to obtain an update of \(\widehat{w}_{mj}\) (see Eq. 8), and so on.
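Schematically, the full iteration might be sketched as follows (hypothetical code, reusing the gnpc_step function sketched at the end of Section 2.3; not the reference implementation of the GNPC algorithm):

```python
from itertools import product

import numpy as np

def gnpc_estimate(Y, Q, max_iter=100):
    """Alternate between estimating the weighted ideal responses and
    reclassifying examinees until the assignments stabilize."""
    profiles = np.array(list(product([0, 1], repeat=Q.shape[1])))
    eta_c = np.array([[int(np.all(a[q == 1] == 1)) for q in Q] for a in profiles])
    # Initialization: NPC classification, i.e., w_mj = 1, so the weighted
    # ideal responses reduce to the conjunctive (DINA-type) ones.
    assignments = np.abs(Y[:, None, :] - eta_c[None, :, :]).sum(axis=2).argmin(axis=1)
    for _ in range(max_iter):
        new_assignments, _ = gnpc_step(Y, Q, assignments)  # sketched in Sect. 2.3
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
    return profiles[assignments]
```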

Second, the performance of the GNPC method was evaluated in a series of simulation studies using various conditions. These also included increasing the number of items, J, to study the effect on the consistency of \(\widehat{\varvec{\alpha }}\). The findings corroborated the theoretical derivations and proofs presented in this article. A description of the design of these simulation studies and their results is available in Chiu et al. (2018), which the reader may consult for further details. (They are not reported here to avoid redundancy.) Chiu et al. (2018) also present the results of the analysis of a real-world data set with the GNPC method.

Third, one could raise the general question whether nonparametric approaches to analyzing CD data might not be obsolete in light of the availability of specialized software offering efficient implementations of parametric, maximum-likelihood-based methods for fitting DCMs to (educational) assessment data—for example, the R packages CDM (Robitzsch, Kiefer, George, & Uenlue, 2018) and GDINA (Ma & de la Torre, 2019). However, as was mentioned earlier, in some situations, parametric methods may be difficult to implement, or they fail entirely. Algorithms like MMLE-EM work best for large-scale assessments, where the data of at least several hundred examinees are available. But when assessment data come from educational micro-environments, say, for monitoring the instruction and learning process at the classroom level, then sample sizes may simply be insufficient to guarantee reliable maximum likelihood estimates of examinees’ proficiency classes. (Within an applied context, the focus is typically on the evaluation of instruction and the assessment of students’ learning; hence, estimation of the item parameters is not necessarily a primary goal.) Under these circumstances, nonparametric methods may be the only viable tools for monitoring and assisting “the teaching and learning process while it is occurring” (Stout, 2002, p. 506)—that is, at the classroom level, where CD-based methods are most useful and needed. Similar restrictions apply to the availability of CD-based computerized adaptive testing (CD-CAT) in small-scale educational settings, where it would be most beneficial. The lack of an efficient and effective computational device for the reliable assessment of examinees’ proficiency class has so far been a serious obstacle to the use of CD-CAT in classrooms. The algorithms of the NPC and GNPC methods, however, can easily handle even the smallest sample sizes. They are immune to the difficulties arising from the unstable and unreliable estimates that parametric methods typically produce under such conditions (Chiu et al., 2018). In addition, the NPC and GNPC methods are easy to implement and computationally inexpensive. Their minimal CPU times make them well suited for CD-based computerized adaptive tests (CAT) administered for monitoring teaching and learning at the level of individual classrooms.

As a final comment, parametric and nonparametric methods for cognitive diagnosis require that the Q-matrix of a test be known and correctly specified; this also applies to the theoretical results concerning the nonparametric GNPC method as they have been presented in this article. However, as a remarkable aside, informal simulations (not reported here due to space restrictions) showed that the GNPC method was surprisingly robust to the misspecification of individual entries of the Q-matrix. The Q-matrices involved \(K=5\) attributes and were composed of 30 items conforming to the DINA model and 30 items conforming to the saturated G-DINA model. Even when \(20\%\) of the entries of the Q-matrix were misspecified, the GNPC method outperformed the MMLE-EM algorithm in the correct classification of examinees in all instances where samples consisted of 100 or fewer examinees.