Abstract
The Graded Response Model (GRM; Samejima, Estimation of ability using a response pattern of graded scores, Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, θ, to underlie the ordinal item scores (Takane & de Leeuw in Psychometrika, 52:393–408, 1987). Traditionally, a normal distribution is specified for Z implying homoscedastic error variances and a normally distributed θ. In this paper, we present the Heteroscedastic GRM with Skewed Latent Trait, which extends the traditional GRM by incorporation of heteroscedastic error variances and a skew-normal latent trait. An appealing property of the extended GRM is that it includes the traditional GRM as a special case. This enables specific tests on the normality assumption of Z. We show how violations of normality in Z can lead to asymmetrical category response functions. The ability to test this normality assumption is beneficial from both a statistical and substantive perspective. In a simulation study, we show the viability of the model and investigate the specificity of the effects. We apply the model to a dataset on affect and a dataset on alexithymia.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
The Graded Response Model (Samejima 1969) is a psychometric generalized linear mixed model to infer a continuously distributed latent trait from a set of ordinal item scores. In psychology, the model is particularly useful in the field of personality (e.g., Fraley, Waller, & Brennan, 2000; Emons, Meijer, & Denollet, 2007) where questionnaires are administered using items with Likert-type scales.
The GRM can be derived by assuming that the ordinal scores on item i have risen from categorization of an underlying normally distributed variable, Z i , at increasing thresholds. This variable is modeled as a linear function of the latent trait, θ. The category response functions—which specify the relation between the probability of a response and θ—are then given by integrating over the conditional density function of Z i|θ between the corresponding thresholds (see Takane & de Leeuw, 1987; Wirth & Edwards, 2007). In the present article, we focus on how violations of the normality assumption affect the model. We will present the Heteroscedastic GRM with Skewed Latent Trait which is an extension of the traditional GRM that enables specific tests on this assumption. These tests are useful both in the statistical and the substantive setting.
1 The Normality Assumption for Z i
The assumption of a normal distribution for Z i is pragmatic but testable (see, e.g., Jöreskog 2002, p. 13, for an unconditional test on the bivariate marginal distribution of Z i ). The normal distribution can be replaced by a logistic distribution (Birnbaum 1968) as an approximation. The shape of the category response functions in the GRM depends on Z i|θ . Thus, using a normal distribution for Z i implies a normal distribution for Z i|θ which results in the normal-ogive GRM. On the contrary, using a logistic distribution for Z i implies a logistic distribution for Z i|θ which results in the logistic GRM (see Samejima 1969). Note that the distribution of θ does not affect the category response function. For example, if the distribution of θ is logistic and the distribution of Z i|θ is normal, the category response functions are still characterized by a normal-ogive function while Z i is non-normally distributed (due to the non-normal θ).
Central to both the logistic and normal distribution is that they are symmetric, implying a symmetric distribution for Z i|θ , which results in symmetric category response functions. As a result, the category response functions increase and decrease at the same rate to their upper and lower limits, respectively, i.e., plotting the probability of answering in category c as a function of the underlying trait, θ, will result in symmetric functions for all c. This property of generalized (non-)linear (mixed) models has been questioned in general with respect to binary data (e.g., Czado & Santner, 1992; Chen, Dey, & Shao, 1999), and specifically in item response theory (IRT). Notably, Samejima (1997, 2000, 2008) pointed out that the assumption of a symmetrical distribution for Z i|θ can lead to an inconsistent relation between the category location parameters, β ic and estimated θ (see Samejima, 1997, 2000). As a solution, Samejima (2000, 2008) proposed the logistic positive exponent family that incorporates an acceleration parameter within the logistic GRM, which allows the category response functions to be asymmetric. In addition, Bazán, Branco, and Bolfarine (2006) (see also Bazán, Bolfarine, & Branco, 2004) obtained asymmetric category response functions by using a skewed normal distribution function for Z i|θ . In applying the model to a dataset on mathematics ability and on weight perception, Bazán et al. found that a model with a skewed distribution function fitted better than a model with the traditional normal distribution function. Finally, Ramsay and Abrahamowicz (1989) proposed using a spline function for the category response functions.
Past research as described above focused on asymmetric category response functions due to skewness in the conditional distribution of Z i|θ . We argue that asymmetry in the category response functions can also arise if the distribution of Z i|θ is perfectly symmetric. Specifically, we show that when the errors of the underlying regression of Z i on θ are heteroscedastic, the distribution of Z i|θ is symmetric, but the category response functions are not. As this effect will make the marginal distribution of Z i skewed, we also take into account that the marginal distribution could be asymmetric due to a skewed distribution of θ. However, this will not result in asymmetric category response functions.
The heteroscedastic GRM with a skewed latent trait—as presented in this paper—can be used to test for specific basis for the departure from normality in the marginal distribution of Z i ; i.e., heteroscedasticity or a skewed latent trait. The basis of the departure may be of interest for both statistical and substantive reasons.
1.1 Statistical Point of View
1.1.1 Parameter Bias
In the IRT literature, it has been shown that the presence of a non-normal latent trait will bias the parameter estimates of the discrimination parameters (Azevedo, Bolfarine, & Andrade, 2011), the item category parameters (Zwinderman & van der Wollenberg, 1990), and the ability estimates (Seong 1990; Ree 1979; Swaminathan & Gifford, 1983). However, some authors found no bias (Stone 1992) which Kirisci, Hsu, and Yu (2001) attributed to the different simulation conditions used in the different studies. It was concluded that for decreasing test length and decreasing sample size, bias on the item and person parameters increase (see Kirisci et al. 2001).
With respect to heteroscedasticity in IRT, little is known as—interestingly—the subject of heteroscedasticity has not yet been investigated within this field. Heteroscedasticity has been investigated in depth in other fields including (generalized) linear regression (e.g., Agresti 2002, p. 151; Long & Ervin, 2000), (repeated measures) analysis of (co)variance (e.g., Keselman & Lix, 1997; Rochon 1992), and factor analysis (e.g., Bollen 1996; Meijer & Mooijaart, 1996; Hessen & Dolan, 2009).
As heteroscedasticity has not yet been investigated in the GRM, it is unclear whether the presence of heteroscedastic error variances will bias either item and/or person parameter estimates. We suspected that this would be the case given that heteroscedasticity has a similar effect on the distribution of Z i compared to a non-normal latent trait. We conducted a small simulation study to investigate the effect of heteroscedasticity in the GRM on the parameter estimates. As it is not the main focus of the present paper, results are presented in the appendix. From the results it is clear that for increasing effect sizes item parameters get more biased (in the most extreme case, the root mean squared differences between the parameter estimates and the true values exceeded twice the values under the tradition GRM).
Thus, we think that, from a statistical point of view, testing for heteroscedasticity and non-normality in the latent trait within the GRM is valuable, at least to compare the results to those obtained from the traditional GRM. If the effects of heteroscedasticity and non-normality of the trait are small, and item parameters are highly similar to those obtained using the traditional GRM, one can safely rely on the traditional GRM. However, when differences are large, and effects appear to be large, it might be better to take the effects of heteroscedasticity and/or non-normality in the trait into account, which can be done using the models outlined in this paper.
1.1.2 Consistency of Estimated θ
As Samejima (1997, 2000) pointed out, estimating θ in the normal-ogive model using symmetric category response curves will result in an inconsistent relation between category location parameters, β ic , and estimated θ. For simplicity, consider two items that follow a 1-parameter normal-ogive model with category location parameter β 1 for item 1 and β 2 for item 2. Assume that item 2 is more difficult than item 1, i.e., β 1<β 2. Within this model Samejima (1997) showed that for subjects with \(\theta < \operatorname{Mean}(\theta)\), answering the more difficult item 2 correctly is given more credit, while for subjects with \(\theta > \operatorname{Mean}(\theta)\) answering the easier item 1 correctly is given more credit. Stated differently, for \(\theta < \operatorname{Mean}(\theta)\) answering a more difficult item correctly will result in a higher estimate of θ compared to answering an easier item correctly. In addition, for \(\theta > \operatorname{Mean}(\theta)\) answering an easier item correctly will result in a higher estimate of θ compared to answering a more difficult item correctly. When category response functions are asymmetric, this inconsistency can be overcome. For a detailed discussion, see Samejima (1997) for the 2-parameter normal-ogive model, and Samejima (2000) for the GRM. For the present undertaking, it thus seems reasonable to take the possibility of asymmetric category response functions into account, to address this inconsistency.
1.1.3 Advantage over Existing Tests
When testing for heteroscedasticity and/or for non-normality in θ, an omnibus test is less sensitive than a more specifically focused test. In an omnibus test, such as the standard tests in LISREL, see Jöreskog (2002, p. 13), or the test proposed by Muthén and Hofacker (1988), no distinction is made between non-normality due to heteroscedastic errors and/or due to a non-normal trait. We see three reasons that specific tests (on the trait and the errors) have added value relative to the omnibus tests on normality. First, omnibus tests suggest no solution when a violation is found, whereas the models in this paper can be used to take the non-normality into account. Second, it is important to distinguish between departures from non-normal Z i due to heteroscedasticity, and departures due to a non-normal trait. Heteroscedasticity will make the category response functions asymmetric similarly to Samejima (1997, 2000, 2008), but a non-normal trait will not influence the category response functions. Third, the effects of heteroscedasticity and the non-normal trait could be in opposite directions, causing the marginal distribution of Z i to be reasonably normal while homoscedasticity and normality of the trait are violated.
1.1.4 Non-linear Moderated Factor Analysis and Differential Item Functioning
In non-linear moderated factor analysis (Bauer & Hussong, 2009; see also Neale 1998; Neale, Aggen, Maes, Kubarych, & Schmitt, 2006; Molenaar, Dolan, Wicherts, & van der Maas, 2010b), the parameters in a given latent variable model are a function of an external variable (e.g., age), which may moderate these parameters. These models have important applications as models for differential item functioning (DIF; Mellenbergh 1989) with respect to a continuous background variable (see Neale et al. 2006; and Bauer & Hussong, 2009). We note that heteroscedasticity in the GRM, as formulated below, may be attributable to DIF. For instance, suppose that the error variance in the regression of a given item on the latent trait varies with the continuous background variable that is correlated with the latent trait. This heteroscedasticity may be detected and interpreted as a manifestation of DIF by conditioning on the background variable (as in Neale et al. 2006). In the unconditional application of our GRM, such DIF would give rise to heteroscedasticity in the regression of the item on the latent trait.
1.2 Substantive Point of View
Finding that a set of items that purport to measure the same underlying latent trait is associated with heteroscedasticity across the trait can have interesting implications for the theory of that trait. This applies equally well to the finding that a given latent trait has a skewed distribution. We consider three examples, which we discuss shortly below.
1.2.1 Schematicity
In personality research, schematicity refers to the hypothesis that people differ in the certainty and accuracy with which they answer to self-report measures because of differences in their cognitive structures concerned with processing information about the self (Markus 1977; Rogers, Kuiper & Kirker, 1977; Tellegen 1988). Research indicates that being highly schematic (i.e., having strong cognitive structures about the self) is associated with an extreme position on the trait (Markus 1977). Note that this phenomenon implies heteroscedastic errors in the GRM, i.e., conditional variances decrease across the trait. We are not aware of any latent variable approaches to modeling schematicity. However, using the present model this is possible, as we illustrate later.
1.2.2 Gene-by-Environment Interaction
In behavior genetics, a gene-by-environment interaction (GxE) will result in heteroscedasticity of the environment factors with respect to the genetic factors (Jinks & Fulker, 1970). Within the linear factor model, van der Sluis, Dolan, Neale, Boomsma, and Posthuma (2006) propose testing for heteroscedastic errors as a test of GxE, i.e., the variance conditional on the genetic factor increases or decreases across that factor (see also Molenaar, van der Sluis, Boomsma, & Dolan, 2012). This approach is, however, limited to continuous factor indicators (e.g., subtest scores). The GRM proposed in this paper can be used to investigate GxE at the item level.
1.2.3 Ability Differentiation
This hypothesis from intelligence research refers to the claim that the general intelligence factor is not equally strong across its range (Spearman 1927; Tucker-Drob 2009). This phenomenon could be attributed to heteroscedastic errors, i.e., increasing conditional variances across the intelligence factor (Hessen & Dolan, 2009), non-linear factor loadings (Tucker-Drob 2009) or a skewed ability factor (Molenaar, Dolan, & van der Maas, 2011). All previous studies have focused on the subtest level; however, using the heteroscedastic GRM with non-normal trait enables the investigation of the phenomenon at the item level.
In conclusion, the heteroscedastic GRM with non-normal trait can be used to test statistical and substantive hypotheses. The outline of the present paper is as follows: First, we present the derivation of the traditional GRM using the underlying Z i . Next we discuss how category response functions can be skewed. Then, we extend the GRM to incorporate heteroscedastic errors and a skewed θ. Next, we present a simulation study to show the viability of the model and to investigate the power to detect both effects. Finally, we apply the model to a questionnaire measuring affect, and to a dataset on alexithymia to test the schematicity hypothesis.
1.3 The Graded Response Model
In the GRM, it is assumed that a normally distributed variable Z i underlies Y i , the observed scores on item i. To enable inferences about a latent trait θ, Z i is regressed on θ resulting in
where υ i is the intercept, λ i is the regression weight or factor loading, and ε i is the error term.Footnote 1 The marginal distribution of Z i , g(.), is given by
where h(.) is the density function for θ, and Z i|θ is the score on Z i conditional on θ with conditional density function f(.) and
and
The probability of a subject with a given θ answering in category c on item i is given by
where Y i|θ is the observed score on item i conditional on θ,C i is the number of answer categories of item i,τ i0=−∞, and τ i(Ci−1)=∞. Equations (3), (4), and (5) give
where F(.) is the conditional distribution function of Z i|θ . Substituting \(\upsilon_{i} = 0,\alpha_{i} = \frac{\lambda_{i}}{\sigma_{\varepsilon i}}\), and \(\beta_{ic} = - \frac{\tau_{ic}}{\sigma_{\varepsilon i}}\), we obtain
which is the normal-ogive GRM with item scale parameter α i and category location parameter β ic ; see Takane and de Leeuw (1987), McDonald (1999).
2 Asymmetric Z i
In Equations (1) and (2), it holds that if θ and ε i are both normally distributed, Z i is normally distributed by convolution. Conversely, Cramér’s theorem (Cramér 1937) ensures that when Z i is normally distributed, θ is normally distributed and Z i|θ (i.e., ε i ) is normally distributed.Footnote 2 Additionally, for a normal distribution for Z i to hold, Z i|θ should be homoscedastic (i.e., constant across θ; see Meijer & Mooijaart, 1996; Hessen & Dolan, 2009). See Figure 1 for a graphical representation for the case where C=3.
Thus, if we accept the linear relation between Z i and θ in Equation (1), then any violation of symmetry in the marginal distribution of Z i is caused by the following not mutually exclusive causes:
-
1.
\(\sigma_{\varepsilon i}^{2}\) is heteroscedastic (i.e., variable across θ), see Figure 2(b).
-
2.
The distribution of θ is skewed, see Figure 2(c).
-
3.
The distribution of Z i|θ is skewed, see Figure 2(d).
Existing approaches by Samejima (1997, 2000, 2008), Bazán et al. (2006), and Ramsay and Abrahamowicz (1989), as discussed above, focus exclusively on cause 3. However, cause 1 will also result in asymmetric category response functions. Thus, if normality of Z i|θ is not rejected by one of the existing approaches, category response functions can still be asymmetric due to heteroscedastic \(\sigma_{\varepsilon i}^{2}\). In addition, non-normality can arise in Z i due to a skewed θ.
Specific tests on the violations as presented above are developed within the framework of factor analysis for continuous variables; Hessen and Dolan (2009) proposed a one-factor model to test for heteroscedastic errors, and Molenaar, Dolan, and Verhelst (2010a) proposed a one-factor model to test for heteroscedastic errors, non-linear factor loadings, and/or a non-normal factor distribution. In the framework of factor analysis for ordinal variables, including the GRM, such specific tests have received relatively little attention. Effort has been invested in approximating the distribution of θ using mixture distributions (Muthén & Muthén, 2007; Vermunt 2004; Vermunt & Hagenaars, 2004; Schmitt, Mehta, Aggen, Kubarych, & Neale, 2006), extensions of the normal distribution (van den Oord, 2005; Verhelst 2009; Azevedo et al., 2011), and splines (Woods 2007). However, we are not aware of any work on heteroscedastic errors in the GRM.
3 Extending the GRM
We extended the GRM to include heteroscedastic \(\sigma_{\varepsilon i}^{2}\) and a skewed θ. We present the model development in factor analysis notation, as conceptualization of the GRM in terms of an underlying Z i variable originates in this field (Wirth & Edwards, 2007). In addition, within factor analysis, heteroscedasticity has already been studied to some degree, so that our presentation connects better with the existing literature. Still, translating the final result to IRT notation is straightforward (using the results in Takane & de Leeuw, 1987, as motivated above).
3.1 Heteroscedastic Errors
In Equations (2), (3), and (4), \(\sigma_{\varepsilon i}^{2}\) are assumed to be homoscedastic. Thus, we propose to account for heteroscedasticity by making \(\sigma_{\varepsilon i}^{2}\) a function of θ, i.e.,
where δ i is a vector of parameters with elements δ i0,…,δ ir , and k(.) is any function that is strictly positive. Within the framework of linear factor analysis for continuous variables, Hessen and Dolan (2009) used an exponential function for k(.), i.e.,
Thus, \(\log(\sigma_{\varepsilon i}^{2})\) is modeled as an rth degree polynomial function of θ. Bauer and Hussong (2009) used a similar function to model the relation between \(\sigma_{\varepsilon i}^{2}\) and a moderator variable. Generally, as a simple test on heteroscedasticity it will suffice to consider the minimal heteroscedasticity model, i.e., r=1,
with baseline parameter δ i0∈(−∞,∞) and heteroscedasticity parameter δ i1∈(−∞,∞). When δ i1=0, error variances are homoscedastic. When δ i1>0 error variances are increasing for increasing levels of θ, and when δ i1<0 error variances are decreasing for increasing levels if θ. Note that, in principle, in the homoscedastic case of the GRM (i.e., δ i1=0) and the heteroscedastic case of the GRM (i.e., δ i1≠0) all parameters are interpreted the same (the thresholds, factor loadings and intercepts), except for δ i0. In the homoscedastic model, δ i0 represents \(\log(\sigma_{\varepsilon i}^{2})\) for all θ’s, while in the heteroscedastic model it represents \(\log(\sigma_{\varepsilon i|\theta}^{2})\) for θ=0.
The function in Equation (8) is not generally suitable within the GRM. To see this, consider the top graph in Figure 3 in which P(Y i =c|θ) from Equation (6) is plotted for C i =5, λ i =1, υ i =0, τ ic ={−3,−1,1,3}, −5≤θ≤5, and \(\sigma_{\varepsilon i|\theta}^{2} = \exp( 0.5 + 0.5 \times \theta)\). It appears that the heteroscedastic errors underlying the distribution of Z i make the category response curves skewed. In the bottom graph of Figure 3, the behavior of the same category response functions is shown for −5≤θ≤50. Clearly, the category response functions for c=0 and c=4 (first and last category) behave undesirably as both have an upper limit of 0.5. How this happens is clear from the category response functions in Equation (6), e.g., for c=0
because τ i0=−∞. From this equation it can be seen that when θ increases, the denominator of the fraction in the normal distribution function, Φ(.), accelerates more than the numerator (as the denominator involves an exponential function of θ and the numerator a linear function of θ). Therefore, for increasing θ the fraction approaches 0 causing P(Y i|θ =c) to approach 1−Φ(0)=0.5 in the limit. Key of the problem is that category 0 has only one effective threshold (τ i1) as the other threshold (τ i0) equals minus infinity. Therefore, a similar problem occurs for c=4 (i.e., the upper category), but not for the intermediate categories as these categories have thresholds that do not involve (minus) infinity. The problem outlined in Figure 3 appears to be occurring on a range of θ that will generally fall outside the observed data. However, when the degree of heteroscedasticity increases, the category response curves of category 0 and 4 will approach 0.5 in the more reasonable ranges for θ as well.
We therefore propose an alternative function for k(.) to link \(\sigma_{\varepsilon i}^{2}\) and θ, i.e.,
with baseline parameter δ i0∈[0,∞) and heteroscedasticity parameter δ i1∈(−∞,∞).
Note that when δ i1>0 and \(\theta \rightarrow \infty \Rightarrow \sigma_{\varepsilon i|\theta}^{2} \rightarrow \delta_{0}\), i.e., the function has an upper bound which prevents the category functions of c=0 and c=C−1 to approach 0.5. In addition, when \(\theta \rightarrow - \infty \Rightarrow \sigma_{\varepsilon i|\theta}^{2} \rightarrow 0\), i.e., the function has a lower bound of 0. Similarly when δ i1<0 and \(\theta \rightarrow - \infty \Rightarrow \sigma_{\varepsilon i|\theta}^{2} \rightarrow \delta_{i0}\), and when \(\theta \rightarrow - \infty \Rightarrow \sigma_{\varepsilon i|\theta}^{2} \rightarrow 0\). In addition, δ i1 is a heteroscedasticity parameter, i.e., when \(\delta_{i1} = 0 \Rightarrow \sigma_{\varepsilon i|\theta}^{2} = \delta_{i0}\).
3.2 Skewed Latent Trait
Within the 2-parameter logistic IRT model, Azevedo et al. (2011) proposed the use of the skew-normal density function (Azzalini, 1985, 1986; Azzalini & Capatanio, 1999; Arnold, Beaver, Groeneveld & Meeker, 1993).Footnote 3 Note that within the field of latent variable modeling, this distribution is also used by Bazán et al. (2006) to model skewness in the category response function of a 2-parameter normal-ogive model (as discussed above) and the distribution is also used by Molenaar et al. (2010a) and Molenaar et al. (2011) to approximate the latent variable distribution in the linear factor model.
An appealing feature of the class of skew-normal distributions is that it includes the normal distribution as a special case, which makes it straightforward to test a given variable for normality. Specifically, if random variable η has a skew-normal distribution, its probability density function is given by
where κ is a location parameter, κ∈(−∞,∞), ω is a scale parameter, ω∈[0,∞), and ζ is a shape parameter, ζ∈(−∞,∞). From these parameters the expected value, variance, and skewness of η can be calculated using the so-called centered parameterization of the skew-normal distribution (see Azzalini 1985)
and
with
Note that in Equation (10), for ζ=0, h(.) reduces to the normal density function. Within the heteroscedastic GRM with skewed θ, the latent trait, θ, can still be interpreted as in the traditional GRM. That is, a high position on θ in the traditional GRM corresponds with a high value on θ in the extended GRM.
3.3 Identification
The present form of the model in Equation (6), is unidentified as both θ and Z i are unobserved variables. Identification of θ traditionally proceeds by restricting E(θ)=0, \(\operatorname{Var}(\theta) = 1\), and identification of Z i proceeds by restricting E(Z i )=0 and \(\operatorname{Var}(Z_{i}) = 1\). Because of Equation (13), these restrictions have the following implications:
and
That is, intercepts are not in the model and the error variances are a deterministic function of the factor loadings. Thus, in the present model, we could not introduce heteroscedastic errors, as \(\sigma_{\varepsilon i}^{2}\) is not a free parameter. We therefore identify the scale of Z i by fixing two adjacent thresholds to some arbitrary values (see Mehta, Neale, & Flay, 2004). The scale of Z i is now identified because the distance between the fixed thresholds define one unit on the Z i scale. An advantage is that we can estimate \(\sigma_{\varepsilon i}^{2}\) and υ i for all i. We are still free to fix the scale of θ by either fixing \(\operatorname{Var}(\theta)\) to equal 1, or by fixing λ i to equal 1 for a given i. In the latter case we can estimate ω from Equation (10). In addition, in multi-group applications we can fix E(θ) to equal 0 in the first group and estimate κ from Equation (10) in the second group. However, in single group applications E(θ) (and thus κ) is fixed.
The restrictions presented above result in an identified model, as we verify in our simulation study (see below). In addition, in all applications of the model (among which are the simulations and illustrations that are presented below), using (marginal) maximum likelihood, the Hessian was positive definite, parameter recovery was satisfactory, and altering starting values did not result in a different solution with the same likelihood. We also note that within the linear one-factor model, we already established that heteroscedastic errors and a skew-normal distribution are simultaneously identified (see Molenaar et al., 2010a). Taking these results together, we are confident that the present model is identified.
3.4 Estimation
Equation (6), with \(\sigma_{\varepsilon i}^{2}\) given by Equation (9) and θ distributed with density function given by Equation (10), constitutes the heteroscedastic GRM with a skewed trait distribution. To fit this model to data, we propose a marginal maximum likelihood procedure (MML; Bock & Aitkin, 1981). That is, we maximize the log-marginal likelihood function, i.e.,
where X denotes the N×q matrix with item scores of N subjects on q items, x (pi) is element (p,i) from this matrix, and γ is the vector of parameters in the model. This vector includes: δ i0, δ i1, λ i , υ i , τ i2, τ i3, and ζ for all i=1,…,q. Note that κ and ω are not free parameters as these are fixed, such that E(θ)=0 and \(\operatorname{Var}(\theta) = 1\) according to Equations (11) and (12) as discussed above.
4 Simulation Study
In the present section, we present results of a small simulation study. The simulation study served three aims. First, we wanted to establish the viability of the model. Second, we wanted to investigate the statistical power to detect the different effects simultaneously and in isolation. Finally, we wanted to investigate the specificity of both effects, i.e., whether the models can distinguish between a dataset which is generated according to a skew-normal θ and a dataset which is generated according to heteroscedastic \(\sigma_{\varepsilon i}^{2}\).
4.1 Design
We simulated data according to the model and varied three aspects in the data: (i) The number of respondents: N=400 or N=800; (ii) the number of answer categories, C=3 or C=5; and (iii) the nature of the effect(s) in the data: no effects in the data, heteroscedastic \(\sigma_{\varepsilon i}^{2}\), skew-normal θ, or both. We simulated data for 10 items (i.e., q=10). In the case that one or both of the effects were present in the data, we chose ζ=2.17 [\(\operatorname{Skew}(\theta) = 0.5\)] and/or δ i1=0.4. All other parameters were fixed to λ i =1, υ i =0, δ i0=1.5, E(θ)=0, \(\operatorname{Var}(\theta) = 1\), and τ ic ={−2,−0.75,0.75,2} when C=5, and τ ic ={−0.75,0.75} when C=3. The models were identified by fixing E(θ)=0, \(\operatorname{Var}(\theta)= 1\), and τ i1=−2, τ i2=−0.75 (if C=5) and τ i1=−0.75, τ i2=0.75 (if C=3).
For each of the 16 conditions (2×2×4) we simulated 40 data sets and fitted four different models: the baseline model with no effects (i.e., ζ=0 and δ i1=0 for all i), a heteroscedasticity model (i.e., δ i1 is free for all i but ζ=0), a skewed-trait model (i.e., ζ is free but δ i1=0 for all i), and a full model with both effects simultaneously (i.e., ζ is free and δ i1 is free for all i).
4.2 Likelihood Ratio and Power Calculation
In testing for skewness in the distribution of θ and/or heteroscedasticity of the errors, we use the likelihood ratio test (LRT). In this test, H0 states that ζ=0 and/or δ i1=0 for some or all of the items, and HA states that ζ≠0 and/or δ i1≠0. The LRT statistic, T, is calculated as
where \(\hat{\boldsymbol{\gamma }}_{0}\) is the estimated vector of parameters from Equation (13) subjected to the parameter restrictions of H0, and \(\hat{\boldsymbol{\gamma }}_{\mathrm{A}}\) is the vector of parameter estimates under HA. Under H0, T has a central-χ 2 distribution with degrees of freedom (df) equal to the number of restrictions in H0. Under HA, this statistic has a non-central χ 2 distribution with a non-centrality parameter that depends on the effect size and sample size. Two important conditions that need to be satisfied for the LRT statistic to approach these theoretical distributions are (i) that H0 is nested under HA and (ii) that the restrictions in H0 should not be on a boundary of the parameter space (Cramér 1946). As our likelihood ratio tests concern either fixing ζ with parameter space (−∞,∞) to 0, and/or fixing δ i1 with parameter space (−∞,∞) to 0, these regularity conditions are satisfied.
In the present simulations, we study the empirical power of the LRT to reject H0 of a normal θ and/or homoscedastic ε i . To this end, we calculated T from Equation (14) for each replication in the design. As T has a non-central χ 2 distribution under HA, it holds that (Fisher 1928):
Thus, the non-centrality parameter (ncp) of the LRT statistic, T, could empirically be approximated by subtracting the df from the average T across the replications in a given cell in the simulation study. As now the ncp of the distribution of T is known under HA, we can choose a nominal alpha level (α) and calculate the power to reject H0 by integrating over the non-central χ 2 distribution. Within covariance structure modeling, Satorra and Saris (1985), (see Molenaar, Dolan, & Wicherts, 2009, for an illustrative example) developed an analytical approach to calculate the ncp given that N is large, and that the misspecification under H0 is not severe. We use essentially the same approach as that of Satorra and Saris; however, because we have no analytical expression for the ncp, we therefore establish the ncp empirically from the data, as described above.
We used Gauss–Hermite quadratures with 50 quadrature points to approximate the integral in the log marginal likelihood function in Equation (13). We fitted the model using the freely available software package Mx (Neale, Boker, Xie, & Maes, 2002). A script to fit the heteroscedastic GRM with skewed θ is available from the website of the first author.
4.3 Results
In Figures 4 and 5, box plots of the parameter estimates are shown for the heteroscedasticity parameters, δ i1, and the skewness parameter, ζ, for the case that both effects were in the data and the full model was fitted to that data. As it appears, true parameter values are recovered quite well. Variability in the parameter estimates is highest when N=400 and C=3 but decreases with increasing N and increasing C.
In Table 1, power coefficients (α=0.05) are shown for the 16 conditions in the simulation study and for three classes of models. The first class of models is the full model, i.e., the heteroscedastic GRM with a skewed trait (abbreviated het-GRM-skew in Table 1), second is the heteroscedastic GRM (het-GRM; i.e., a GRM with heteroscedastic errors only), and third is the GRM with a skewed trait only (GRM-skew). Ideally, when a given effect is in the data (e.g., heteroscedastic \(\sigma_{\varepsilon i}^{2}\)) it is not detected by a misspecified model (i.e., the GRM-skew). Thus, power of the misspecified model should ideally be equal to 0.05 (i.e., equal to the Type I error probability specified a priori). If this is the case, the model is said to be highly specific, that is, it detects only the effects that are in the model and it does not pick up any additional effects. In Table 1, the power coefficients of the misspecified models are underlined and are desired to be close to the nominal alpha level. All other coefficients should be reasonably large.
As can be seen from the table, all underlined values are close to or equal to 0.05 for the het-GRM-skew, which suggests that the two effects (heteroscedastic \(\sigma_{\varepsilon i}^{2}\) and skewed θ) are well separable within this model. For the het-GRM and GRM-skew, the underlined values are departing somewhat more from the 0.05 level (at most 0.31). This indicates that unmodeled skewness in the trait or heteroscedasticity in the errors will increase the false positive rate somewhat.
For N=400 and C=3, power values are around 0.4–0.5 which is judged to be low power (exception is the 0.74 power of the GRM-skew to detect the skewness in θ, which is already acceptable). For C=5, power is acceptable, as power coefficients vary around 0.7–0.8. An exception is the power of the het-GRM-skew to detect the skewed θ in the data. This situation is associated with a power of 0.46 which is still low. However, this is not surprising as the model has 11 extra parameters compared to the standard GRM of which only 1 is concerned with the skewed θ. Power is thus lowered because of the 10 parameters that remain unaffected. For N=800 power is judged to be acceptable-to-high as coefficients are around 0.8–1.0, with the larger power coefficients when C=5.
From Table 1 it also appears that when both effects are in the data, the het-GRM-skew outperforms both the het-GRM and GRM-skew in terms of power. For instance, in the cases that N=800 and C=3, power to detect the heteroscedastic errors is 0.72 and 0.67, respectively, in the het-GRM-skew, compared to 0.53 and 0.29, respectively, in the het-GRM. In addition, power of the het-GRM-skew to detect the skewness in θ is 0.84 compared to 0.25 in the GRM-skew. When only one effect is in the data but the het-GRM-skew is applied nevertheless (which includes both effects), power is somewhat smaller compared to the GRM-skew and het-GRM. For instance, when N=800 and C=3 and a skewed θ underlies the data, the het-GRM-skew has a power of 0.88 to detect the effect, while the GRM-skew has a power of 0.96. In addition, when heteroscedastic errors underlie the data, the het-GRM skew has power of 0.82 to detect this effect, and the het-GRM has a power of 0.88. In the case of one effect in the data, power is, thus, somewhat increased in the models with a single effect only. However, benefits are not that large as compared to when both effects are in the data and the het-GRM-skew is applied. It seems, thus, to be advisable that if one is interested in detecting one effect, e.g., heteroscedastic errors, it is safest to take the possibility of the skewed θ into account as well because of substantial power gains.
Results above depend highly on the present choice of effect sizes. However, we showed that given the chosen effect sizes power can be acceptable for reasonable sample sizes. Regardless of effect size, we can conclude that the model is viable and that the effects are highly specific and, therefore, well separable.
In the present simulation study, we only used effects on θ and \(\sigma_{\varepsilon i}^{2}\) that are in the same direction, i.e., they both make the distribution of Z i positively skewed. We did consider the case in which both effects were in opposite directions (specifically, ζ=−2.17 and δ i1=0.4), but this did not affect the results as presented above. This finding is not surprising as both effects are highly specific and, therefore, do not interfere with each other.
5 Illustration
In this section, we present two applications purported to test the schematicity hypothesis from personality research (Markus 1977; Rogers et al., 1977; Tellegen 1988). As discussed above, this hypothesis supposes that people who are low on a given personality trait have a less clear self-schemata, i.e., they are more ambiguous about their trait level compared to people high on that trait. In the literature, a similar phenomenon is the hypothesis that not all personality traits apply equally well to everybody (Allport, 1937; Baumeister & Tice, 1988). As discussed previously, the schematicity hypothesis predicts that people who are low on a given personality trait are less accurate in reporting their exact position on the answer scale of a given item from a questionnaire. On the item level, this prediction implies heteroscedasticity of the error variances in the GRM. In the applications below, we investigate the schematicity hypothesis by testing for heteroscedastic errors. We do also take the possibility of skewness in the trait into account as this might benefit the power to detect the heteroscedastic errors (as we showed in the simulation). The first application involves an investigation of the schematicity hypothesis in an affect questionnaire, and the second application involves an investigation of the schematicity hypothesis in an alexithymia questionnaire.
6 Application 1
6.1 Description of the Data
A total of 557 psychology freshman completed a Dutch version of the Positive Affect and Negative Affect scale (Guadagnoli & Mor, 1989). All items in these scales consist of words that describe a particular affect (e.g., wistful, desperate, happy, cheerful). Respondents are asked to report on a 5-point Likert scale to what degree they experienced each affect during past week. The questionnaire contains 40 items in total of which 20 purport to measure positive affect and 20 items purport to measure negative affect. Here we limit the analysis to the Negative Affect scale only.Footnote 4 The negative affect scale was judged to have acceptable fit to a discrete one-factor model (RMSEA=0.066).Footnote 5 We identified the model by fixing E(θ)=0, \(\operatorname{Var}(\theta)=1\), τ i0=−2, and τ i1=−0.75.
In the data analysis, we proceeded as follows: First, we fitted the full model (including the skew-normal distribution for θ and the heteroscedastic \(\sigma_{\varepsilon i}^{2}\)). From the full model, we dropped the skewness parameter, ζ, resulting in a model with heteroscedastic errors only. Next, from the full model, we dropped the heteroscedastic errors, resulting in a model with a skewed θ only. Finally, we fitted a model with no effects, i.e., the traditional GRM. To see which of these models fitted the data best, we considered various fit indices. First, we conducted the LRT between the model of interest and the full model as described above. Next, we consulted Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Both the AIC and BIC are calculated on basis of the marginal likelihood, i.e., the incidental parameters, θ, are integrated out and are not included in these indices. For the AIC and BIC, a lower value suggests a better model fit. Note that these indices are not tests in a statistical sense (e.g., Konishi & Kitagawa, 2008), i.e., they are used to decide which model predicts future data best among competing models. For present purposes we adopt this model selection based strategy using the AIC, BIC and LRT. However, we note that a more thorough approach would be to test whether data are a true realization of the stochastic model under investigation (Schmueli 2010).
6.2 Results
Table 2 contains the parameter estimates of υ i , τ i2, τ i3, δ 0i , δ 1i , and ζ in the full model. From Table 3, it can be seen that dropping the skewness parameter, ζ, from the full model did not result in a significant LRT. However, dropping the heteroscedasticity parameters, δ i1, did result in a significant LRT. Thus, judged by the LRT we would choose a model with heteroscedastic errors only. The AIC provides the same conclusion; however, judged by the BIC the model with no effects should be preferred. Thus, results concerning the Negative Affect scale are mixed. The observation that the AIC and the BIC do not agree can be attributed to differences in the penalty that is used in these indices. BIC is known to favor more parsimonious models, while the AIC tends to favor less parsimonious models. We tend to favor the model without effects as this model is more parsimonious, i.e., we put more weight on the BIC. We can thus conclude that we did not find any support for the schematicity hypothesis with respect to self-reported negative affect.
7 Application 2
7.1 Description of the Data
A sample purported to be representative of the Dutch population (N=816) completed the Bermond–Vorst alexithymia questionnaire (Vorst & Bermond, 2001) at home on a computer.Footnote 6 Alexithymia is a personality trait that reflects the inability to understand, process, or describe emotions. The Bermond–Vorst alexithymia questionnaire consists of 6 subscales: Emotionalizing, Fantasizing, Identifying, Verbalizing, Cognitive Analyzing, and Affective Analyzing. Each subscale consists of 8 items measured on a 5-point Likert scale. An example of an item in the Identifying subscale is: “If I am nervous, it is unclear to me which emotion caused this feeling.”
We analyzed all subscales separately. A discrete one-factor model was judged to fit in an acceptable way to the Fantasizing (RMSEA=0.061), Identifying (RMSEA=0.073), Affective Analyzing (RMSEA=0.064), and Cognitive Analyzing (RMSEA=0.060) subscales. The subscales Emotionalizing (RMSEA=0.110) and Verbalizing (RMSEA=0.091) were judged to fit poor to a one-factor model and are, therefore, not included in the analysis. We identified the model by fixing E(θ)=0, \(\operatorname{Var}(\theta)=1\), τ i0=−2, and τ i1=−0.75.
7.2 Results
Table 4 contains the parameter estimates of υ i , τ i2, τ i3, δ 0i , δ 1i , and ζ in the full model for the six subscales. In Table 5, modeling results are given. We followed the same procedure as in Application 1. As shown in Table 5, for all subscales, the full model appears to fit best according to the AIC and BIC. See Figure 6 for the model implied category response functions for the Affective Analyzing subscale under the full model.
Taken together, in contrast to Application 1, conclusions are relatively clear in the sense that we can conclude that for all 4 subscales analyzed, the distribution of Z i is characterized by both heteroscedastic errors and a skewed θ distribution. Additionally, we tested for equality of δ 1i to see whether we could model the heteroscedasticity with a single parameter for all items instead of an individual parameter for each item. We found that this equality restriction held for the Cognitive Analyzing subscale only (AIC reduced to 1267.99 and the BIC reduced to −14622.42). For the other subscales, δ 1i differed between items.
It can be seen in Table 4 that for the Affective Analyzing, Cognitive Analyzing, and Identifying subscales the heteroscedasticity is systematically in the same direction for almost all items.Footnote 7 That is, lower levels of θ are associated with higher error variance. Thus individuals with high levels of alexithymia are more consistent in their answers compared to individuals with lower levels of alexithymia. This finding is in line with the schematicity hypothesis, suggesting that high alexithymia individuals are less ambiguous about their psychological functioning in this regard because they are in little doubt concerning their lack of insight and knowledge of the emotions that they experience. Allport (1937) and Baumeister and Tice (1988) questioned whether personality traits do apply equally well to everybody. In perspective of this line of work, the present results suggest that for people that are well able to describe their emotions, the alexithymia trait applies less well (i.e., is a weaker source of individuals differences) compared to people that are less able to describe their emotion.
8 Discussion
In the present paper, we presented the Heteroscedastic Graded Response Model with a skewed latent trait—a unified model that extends the traditional Graded Response Model by adding heteroscedastic errors and a non-normal latent trait. In introducing heteroscedastic errors, we specified a parametric function between the trait and the error variances. As we showed, this choice is not straightforward, as the commonly used exponential function (Hessen & Dolan, 2009; Bauer & Hussong, 2009) gives undesirable results in the limits of the trait. We therefore adopted a logistic function, which was shown to be adequate. However, the choice for this function was pragmatic. The function could be replaced with any other function that behaves reasonably given the purposes of the researcher.
Although the logistic function performs well in the limits of the trait, it has some drawbacks. When effect sizes are very large (i.e., the logistic function tends to a step function), category response functions of some categories will show patterns that are not realistic from a psychological or measurement point of view. That is, category response curves will have more than one stationary point; for an example, see Figure 7. However, we stress that this occurs only given very large effect sizes, which we consider unrealistic. In addition, in applications of the model, this behavior of the category response function can easily be detected by inspecting relevant plots of the results. When a given item shows these patterns, a different function can be used for this item or all items. Or a non-parametric approach could be considered, as explained in the following.
An alternative to specifying a parametric function between the trait and the error variances is to consider a non-parametric approach. In this approach, heteroscedasticity is detected using a step function. An advantage is that the method is more flexible, i.e., highly specific forms of heteroscedasticity can be tested depending on the substantive theory and/or the aims of the researcher. However, the method will likely be associated with smaller power as compared to the parametric function approach, particularly as the number of steps in the step function increases. This remains to be investigated in depth.
With respect to the distribution of the trait, we considered the skew-normal distribution because, first, this distribution enables straightforward tests on normality; second, its statistical properties are well documented (e.g., Azzalini & Dalla Valle, 1996; Azzalini & Capatanio, 1999; Arnold et al., 1993; Arnold & Beaver, 2002; Chiogna, 2005; Monti, 2003); and, third, the distribution is applied in general (for examples, see Azzalini, 2005) and specifically with respect to psychometrics (e.g., Bazán et al. 2006; Molenaar et al. 2010b, 2011; Azevedo et al. 2011). However, our choice is based on pragmatic considerations; other distributions can be considered as well, e.g., mixture distributions (Muthén & Muthén, 2007; Vermunt 2004; Vermunt & Hagenaars, 2004; Schmitt et al. 2006), the shifted-log normal (Verhelst 2009), or distributions based on Johnson curves (van den Oord, 2005) or splines (Woods 2007).
Notes
Note that the error term may include a systematic component due to misfit.
Cramér’s theorem states that if X 1 and X 2 are independent random variables and X 1+X 2 is normally distributed, it follows that both X 1 and X 2 are normally distributed.
Specifically, Azevedo et al. (2011) used the centered skew-normal distribution, see below.
We note however that for the Positive Affect scale, results are similar.
Because of the illustrational purposes, we judge an RMSEA smaller than 0.08 to be an indication of acceptable model fit (see Schermelleh-Engel, Moosbrugger & Müller, 2003).
This sample was selected from a data set that is much larger (N=5780). The original data included various manipulations (15 in total). We selected subjects from 2 conditions (N 1=410 and N 2=406) that appeared to be homogeneous with respect to the manipulation (specifically, the subjects we selected completed the questionnaire with respectively a ‘scroll’ answer scale and a ‘static’ answer scale).
For the Cognitive Analyzing and Affective Analyzing subscales, only one item is associated with heteroscedasticity in the opposing direction.
References
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: Wiley.
Allport, G.W. (1937). Personality. A psychological interpretation. New York: Henry Holt.
Arnold, B., & Beaver, R. (2002). Skewed multivariate models related to hidden truncation and/or selective reporting. Test, 11, 7–54.
Arnold, B.C., Beaver, R.J., Groeneveld, R.A., & Meeker, W.Q. (1993). The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika, 58, 471–488.
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171–178.
Azzalini, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica, 46, 199–208.
Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32, 159–188.
Azzalini, A., & Capatanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society. Series B, 61, 579–602.
Azzalini, A., & Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715–726.
Azevedo, C.L.N., Bolfarine, H., & Andrade, D.F. (2011). Bayesian inference for a skew-normal IRT model under the centred parameterization. Computational Statistics & Data Analysis, 55, 353–365.
Bauer, D.J., & Hussong, A.M. (2009). Psychometric approaches for developing commensurate measures across independent studies: traditional and new models. Psychological Methods, 14, 101–125.
Baumeister, R.E., & Tice, T.M. (1988). Metatraits. Journal of Personality, 56, 571–598.
Bazán, J.L., Bolfarine, H., & Branco, D.M. (2004). A new family of asymmetric models for item response theory: a skew-normal IRT family (Technical Report No. RT-MAE-2004-17). Department of Statistics, University of São Paulo.
Bazán, J.L., Branco, M.D., & Bolfarine, H. (2006). A skew item response model. Bayesian Analysis, 1, 861–892.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison Wesley (Chapters 17–20).
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.
Bollen, K.A. (1996). A limited-information estimator for LISREL models with or without heteroscedastic errors. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: issues and techniques (pp. 227–241). Mahwah: Erlbaum.
Chen, M.-H., Dey, D.K., & Shao, Q.M. (1999). A new skewed link model for dichotomous quantal response data. Journal of the American Statistical Association, 94, 1172–1186.
Chiogna, M. (2005). A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Statistical Methods & Applications, 14, 331–334.
Cramér, H. (1937). Random variables and probability distributions. Cambridge: Cambridge University Press.
Cramér, H. (1946). Mathematical methods of statistics. Princeton: Princeton University Press.
Czado, C., & Santner, T.J. (1992). The effect of link misspecification on binary regression inference. Journal of Statistical Planning and Inference, 33, 213–231.
Emons, W.H., Meijer, R.R., & Denollet, J. (2007). Negative affectivity and social inhibition in cardiovascular disease: evaluating type-D personality and its assessment using item response theory. Journal of Psychosomatic Research, 63, 27–39.
Fisher, R.A. (1928). The general sampling distribution of the multiple correlation coefficient. Proceedings of the Royal Society of London. Series A, 121, 654–673.
Fraley, R.C., Waller, N.G., & Brennan, K.A. (2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78, 350–365.
Guadagnoli, E., & Mor, V. (1989). Measuring cancer patients’ affect: revision and psychometric properties of the Profile of Mood States (POMS). Psychological Assessment, 1, 150–154.
Hessen, D.J., & Dolan, C.V. (2009). Heteroscedastic one-factor models and marginal maximum likelihood estimation. British Journal of Mathematical & Statistical Psychology, 62, 57–77.
Jinks, J.L., & Fulker, D.W. (1970). Comparison of the biometrical genetical, MAVA, and classical approaches to the analysis of human behavior. Psychological Bulletin, 73, 311–349.
Jöreskog, K.J. (2002). Structural equation modeling with ordinal variables using LISREL. Scientific Software International Inc. Retrieved November 3, 2010, from: http://www.ssicentral.com/lisrel/techdocs/ordinal.pdf.
Keselman, H.J., & Lix, L.M. (1997). Analyzing multivariate repeated measures designs when covariance matrices are heterogeneous. British Journal of Mathematical & Statistical Psychology, 50, 319–338.
Kirisci, L., Hsu, T., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25, 146–162.
Konishi, S., & Kitagawa, G. (2008). Information criteria and statistical modeling. New York: Springer.
Long, J.S., & Ervin, L.H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician, 54, 217–224.
Markus, H. (1977). Self-schemata and processing information about the self. Journal of Personality and Social Psychology, 35, 63–78.
McDonald, R.P. (1999). Test theory: a unified treatment. Mahwah: Lawrence Erlbaum.
Mehta, P.D., Neale, M.C., & Flay, B.R. (2004). Squeezing interval change from ordinal panel data: latent growth curves with ordinal outcomes. Psychological Methods, 9, 301–333.
Meijer, E., & Mooijaart, A. (1996). Factor analysis with heteroscedastic errors. British Journal of Mathematical & Statistical Psychology, 49, 189–202.
Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Molenaar, D., Dolan, C.V., & van der Maas, H.L.J. (2011). Modeling ability differentiation in the second-order factor model. Structural Equation Modeling, 18, 578–594.
Molenaar, D., Dolan, C.V., & Verhelst, N.D. (2010a). Testing and modeling non-normality within the one factor model. British Journal of Mathematical & Statistical Psychology, 63, 293–317.
Molenaar, D., Dolan, C.V., & Wicherts, J.M. (2009). The power to detect sex differences in IQ test scores using multi-group covariance and mean structure analysis. Intelligence, 37, 396–404.
Molenaar, D., Dolan, C.V., Wicherts, J.M., & van der Maas, H.L.J. (2010b). Modeling differentiation of cognitive abilities within the higher-order factor model using moderated factor analysis. Intelligence, 38, 611–624.
Molenaar, D., van der Sluis, S., Boomsma, D.I., & Dolan, C.V. (2012). Detecting specific genotype by environment interaction using marginal maximum likelihood estimation in the classical twin design. Behavior Genetics, 42, 483–499.
Monti, A.C. (2003). A note on the estimation of the skew normal and the skew exponential power distributions. Metron, LXI, 205–219.
Muthén, B., & Hofacker, C. (1988). Testing the assumptions underlying tetrachoric correlations. Psychometrika, 53, 563–578.
Muthén, L.K., & Muthén, B.O. (2007). Mplus user’s guide (5th ed.). Los Angeles: Muthén & Muthén.
Neale, M.C. (1998). Modeling interaction and nonlinear effects with Mx: a general approach. In G. Marcoulides & R. Schumacker (Eds.), Interaction and non-linear effects in structural equation modeling (pp. 43–61). New York: Lawrence Erlbaum Associates.
Neale, M.C., Aggen, S.H., Maes, H.H., Kubarych, T.S., & Schmitt, J.E. (2006). Methodological issues in the assessment of substance use phenotypes. Addictive Behaviors, 31, 1010–1034.
Neale, M.C., Boker, S.M., Xie, G., & Maes, H.H. (2002). Mx: statistical modeling (6th ed.). Richmond: VCU.
Ramsay, J.O., & Abrahamowicz, M. (1989). Binomial regression with monotone splines: a psychometric application. Journal of the American Statistical Association, 84, 906–915.
Ree, M.J. (1979). Estimating item characteristic curves. Applied Psychological Measurement, 3, 371–385.
Rochon, J. (1992). ARMA covariance structures with time heteroscedasticity for repeated measures experiments. Journal of the American Statistical Association, 87, 777–784.
Rogers, T.B., Kuiper, N.A., & Kirker, W.S. (1977). Self-reference and the encoding of personal information. Journal of Personality and Social Psychology, 35, 677–688.
Samejima, F. (1969). Psychometric monograph: Vol. 17. Estimation of ability using a response pattern of graded scores. Richmond: The Psychometric Society.
Samejima, F. (1997). Departure from normal assumptions: a promise for future psychometrics with substantive mathematical modeling. Psychometrika, 62, 471–493.
Samejima, F. (2000). Logistic positive exponent family of models: virtue of asymmetric item characteristic curves. Psychometrika, 65, 319–335.
Samejima, F. (2008). Graded response model based on the logistic positive exponent family of models for dichotomous responses. Psychometrika, 73, 561–578.
Satorra, A., & Saris, W.E. (1985). The power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research, 8, 23–74.
Schmitt, J.E., Mehta, P.D., Aggen, S.H., Kubarych, T.S., & Neale, M.C. (2006). Semi-nonparametric methods for detecting latent non-normality: a fusion of latent trait and ordered latent class modeling. Multivariate Behavioral Research, 41, 427–443.
Schmueli, G. (2010). To explain or to predict. Statistical Science, 25, 289–310.
Seong, T.J. (1990). Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions. Applied Psychological Measurement, 14, 299–311.
Spearman, C.E. (1927). The abilities of man: their nature and measurement. New York: Macmillan.
Stone, C.A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: an evaluation of MULTILOG. Applied Psychological Measurement, 16, 1–16.
Swaminathan, H., & Gifford, J. (1983). Estimation of parameters in the three-parameter latent trait model. In D.J. Weiss (Ed.), New horizons in testing: latent trait test theory and computerized adaptive testing (pp. 13–30). New York: Academic Press.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Tellegen, A. (1988). The analysis of consistency in personality assessment. Journal of Personality, 56, 621–663.
Tucker-Drob, E.M. (2009). Differentiation of cognitive abilities across the life span. Developmental Psychology, 45, 1097–1118.
van den Oord, E.J. (2005). Estimating Johnson curve population distributions in MULTILOG. Applied Psychological Measurement, 29, 45–64.
van der Sluis, S., Dolan, C.V., Neale, M.C., Boomsma, D.I., & Posthuma, D. (2006). Detecting genotype-environment interaction in monozygotic twin data: comparing the Jinks & Fulker test and a new test based on marginal maximum likelihood estimation. Twin Research and Human Genetics, 9, 377–392.
Verhelst, N.D. (2009). Latent variable analysis with skew distributions. Manuscript in preparation.
Vermunt, J.K. (2004). An EM algorithm for the estimation of parametric and nonparametric hierarchical nonlinear models. Statistica Neerlandica, 58, 220–233.
Vermunt, J.K., & Hagenaars, J.A. (2004). Ordinal longitudinal data analysis. In R.C. Hauspie, N. Cameron, & L. Molinari (Eds.), Methods in human growth research (pp. 374–393). Cambridge: Cambridge University Press.
Vorst, H.C.M., & Bermond, B. (2001). Validity and reliability of the Bermond–Vorst alexithymia questionnaire. Personality and Individual Differences, 30, 413–434.
Wirth, R.J., & Edwards, M.C. (2007). Item factor analysis: current approaches and future directions. Psychological Methods, 12, 58–79.
Woods, C.M. (2007). Ramsay-curve IRT for Likert type data. Applied Psychological Measurement, 31, 195–212.
Zwinderman, A.H., & van den Wollenberg, A.L. (1990). Robustness of marginal maximum likelihood estimation in the Rasch model. Applied Psychological Measurement, 14, 73–81.
Acknowledgements
The research by Dylan Molenaar was made possible by a grant from the Netherlands Organization for Scientific Research (NWO). We thank Harry Vorst, Martin Muller, Pieter Röhling, and Clasine van der Wal for providing the data used in the illustration section and Mariska Knol for the computational resources used in carrying out the simulation study. We also thank three reviewers, the associate editor, and the editor for their comments on previous versions of this paper. Mx input files are available from the site of the first author, www.dylanmolenaar.nl.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
We conducted a small simulation study to see how the presence of heteroscedasticity influences the parameter estimates of the traditional GRM. In this simulation study, we simulated 50 data sets according to the extended GRM. We simulated either heteroscedastic residuals or homoscedastic residuals. In case of heteroscedastic residuals, we simulated either a small (δ 1=0.5), medium (δ 1=1), or large effect size (δ 1=1.5). Please note that the large effect size is still realistic, as we found such effect sizes in Application 2 of the manuscript (for instance, for the Cognitive Analyzing subscale 5 of the 7 items showed effects around δ 1=−1.5, see Table 4). We carried out this procedure for N=1000 and N=3000. Next, we fitted the traditional GRM to see how parameter estimates differed from the true values. We found effects on the discrimination parameters and the thresholds, but not on the trait estimates.
From Table A.1 it can be seen that a small degree of heteroscedasticity does not have much effect, but when the heteroscedasticity increases, the root mean squared difference (RMSD) increases when it is not recognized. In the case of a large degree of heteroscedasticity and N=3000, the RMSD is more than two times larger (0.17 to 0.19) compared to the case in which they are homoscedastic (0.07 to 0.08). We found negligible effects on the standard errors (both the empirical standard errors and the theoretical standard errors). It is clear that the discrimination parameters of the traditional GRM are biased if the true model is heteroscedastic. For the threshold, we found even larger effects but this is to be expected as the non-normality in the data is partly captured by the threshold. As already mentioned, for the trait estimates we found no (clear) effects.
Rights and permissions
About this article
Cite this article
Molenaar, D., Dolan, C.V. & de Boeck, P. The Heteroscedastic Graded Response Model with a Skewed Latent Trait: Testing Statistical and Substantive Hypotheses Related to Skewed Item Category Functions. Psychometrika 77, 455–478 (2012). https://doi.org/10.1007/s11336-012-9273-5
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11336-012-9273-5