
1 Introduction

The power of a test is usually estimated through Monte Carlo simulation methods. However, it can alternatively be computed asymptotically using the distribution of the test statistic under the alternative hypothesis that depends on a noncentrality parameter, often unknown or difficult to compute (Gudicha et al., 2017).

In this work we study the asymptotic power of two test statistics, the Lagrange Multiplier (LM) test and the Generalized Lagrange Multiplier (LM(S)) test, to detect measurement non-invariance under correct model specification and model misspecification.

An item is measurement non-invariant, or biased, if it measures different abilities for groups identified by an external variable (Mellenbergh, 1982, 1983). Group differences can be present on the item intercept only, or simultaneously on the item intercept and slope.

The Lagrange Multiplier test is used in the IRT context to detect measurement non-invariance (Glas, 1998; Fox & Glas, 2005) and other types of model violations, such as local dependence, incorrect specification of the item characteristic curve and a non-normal distribution of the latent variables (Glas, 1999; Glas & Falcón, 2003; Liu & Thissen, 2012). Despite its extensive use in IRT models, the LM test has been applied in only a few studies to the case of model misspecification under the null and the alternative hypothesis (Glas & Falcón, 2003).

In order to take into account possible misspecification in the model, the LM test can be generalized, obtaining the so-called Generalized Lagrange Multiplier test (LM(S)), whose expression involves the sandwich variance and covariance matrix (White, 1982). In the IRT context, the performance of the LM(S) test under model misspecification has recently been analyzed by Falk and Monroe (2018) through an elaborate simulation study.

The first objective of this paper is to present the theoretical computation of the asymptotic power of these tests using two different approximation methods to obtain the noncentrality parameter. The second objective is to compare, in terms of asymptotic and empirical power, the performance of the LM and LM(S) tests to detect measurement non-invariance under correct model specification and under misspecification of the latent variable distribution, through a simulation study. The model considered under the null and the alternative hypothesis is a classic Multiple Indicators Multiple Causes (MIMIC) model for binary data, based on the assumption of a normal distribution of the latent factor. The misspecification is introduced by assuming a non-normal distribution of the latent factor in the data generating model.

The paper is organized as follows: in Sect. 2 we review the theory of the LM test and the procedures to estimate its asymptotic power; in Sect. 3 we describe the LM(S) test and the procedures to estimate its asymptotic power; in Sect. 4 we present a Monte Carlo simulation study. We conclude with some remarks in Sect. 5.

2 The Lagrange Multiplier Test

Consider a sample y 1, …, y n from a model f(y, θ). Let θ 0 denote the true parameter vector, which can be partitioned into two subvectors \(\boldsymbol {\theta }_{0}^{\prime }=(\boldsymbol {\theta }_{01}^{\prime },\boldsymbol {\theta }_{02}^{\prime })\). The hypotheses H 0 and H 1 can be formalized as follows:

$$\displaystyle \begin{aligned} H_{0}:\boldsymbol{\theta}^{\prime}_{02}=\mathbf{c}\qquad vs\qquad H_{1}: \boldsymbol{\theta}^{\prime}_{02}\neq \mathbf{c}, \end{aligned} $$
(1)

where c is a vector of constants. The LM statistic is (Engle, 1984):

$$\displaystyle \begin{aligned}LM= \frac{1}{n}S_2(\tilde{\boldsymbol{\theta}})'A^{22}(\tilde{\boldsymbol{\theta}})^{-1}S_2(\tilde{\boldsymbol{\theta}}),\end{aligned} $$
(2)

where \(\tilde {\boldsymbol {\theta }}'=(\tilde {\boldsymbol {\theta }}^{\prime }_{1},\mathbf {c})\) denotes the restricted maximum likelihood estimate of the parameters θ and S 2 is the subvector of the score function \(S=\frac {\partial l(\boldsymbol {y},\boldsymbol {\theta })}{\partial \boldsymbol {\theta }}\) corresponding to the parameters θ 02, evaluated at \(\tilde {\boldsymbol {\theta }}\). The matrix A 22 is the block of the partitioned Fisher information matrix \(A=-E\bigg [\frac {1}{n}\frac {\partial ^2 l(\mathbf {y},\boldsymbol {\theta })}{\partial \boldsymbol {\theta } \partial \boldsymbol {\theta }'}\bigg ]\) defined as:

$$\displaystyle \begin{aligned}A^{22}=A_{22}-A_{21}A_{11}^{-1}A_{12},\end{aligned} $$
(3)

evaluated at \(\tilde {\boldsymbol {\theta }}\). The partition of A into A 22, A 21, A 11, A 12 is derived from the partition of \(\boldsymbol {\theta }^{\prime }_0\) into \((\boldsymbol {\theta }^{\prime }_{01},\boldsymbol {\theta }^{\prime }_{02}) \). In this study, we consider the LM test computed with the observed Hessian approach, in which the Fisher information matrix in formula (2) is replaced by the corresponding observed Hessian matrix:

$$\displaystyle \begin{aligned} \hat{A}({\boldsymbol{\theta}})=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 l_i({\mathbf{y}}_i,\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \partial \boldsymbol{\theta}'}\end{aligned} $$
(4)

Under a correctly specified likelihood and under H 0, the LM statistic is asymptotically distributed as a \(\chi ^{2}_r\), where the degrees of freedom (df) r equal the dimension of θ 02 (Silvey, 1959). When the alternative hypothesis is true but the null is tested, the LM test statistic has an asymptotic noncentral chi-square distribution that depends on two parameters, the df and a noncentrality parameter (Bollen, 1989). To compute the local asymptotic power of the LM test, a standard approach is to consider a set of local alternatives that are close to the null value for large n, \(H_1:\boldsymbol {\theta }_{02}=\boldsymbol {c}+\frac {\boldsymbol {\xi }}{\sqrt {n}}\), where ξ is an arbitrary vector with the same dimension as θ 02 (Boos & Stefanski, 2013). Under H 1, the LM test statistic converges in distribution to a \(\chi ^2_r(\lambda )\) with noncentrality parameter λ equal to (Cox & Hinkley, 1979):

$$\displaystyle \begin{aligned}\lambda=\boldsymbol{\xi}' A^{22}(\boldsymbol{\theta}^0)\boldsymbol{\xi},\end{aligned} $$
(5)

where θ 0 = (θ 01, c).

The asymptotic local power is computed as \(P(\chi ^2_r(\lambda )>\chi ^2_{r,1-\alpha })\), where \(\chi ^2_{r,1-\alpha }\) is the \((1-\alpha )\) quantile of the central \(\chi ^2_r\) distribution.
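This probability is straightforward to evaluate numerically. The sketch below, assuming SciPy is available, computes it for illustrative values of r and λ (the function name `asymptotic_power` is ours, not from the paper):

```python
# Asymptotic local power of a chi-square test: P(chi2_r(lambda) > c_alpha),
# where c_alpha is the (1 - alpha) quantile of the *central* chi2_r distribution.
from scipy.stats import chi2, ncx2

def asymptotic_power(lam, r, alpha=0.05):
    crit = chi2.ppf(1 - alpha, df=r)    # central chi-square critical value
    return ncx2.sf(crit, df=r, nc=lam)  # noncentral chi-square tail probability

# lambda around 7.85 with r = 1 corresponds to power close to 0.80 at alpha = 0.05
print(asymptotic_power(7.85, r=1))
```

With λ = 0 the noncentral chi-square reduces to the central one and the "power" equals the nominal level α, as it should.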

2.1 Approximation Procedures for the Asymptotic Power

The asymptotic distribution of the LM test under the alternative hypothesis as a noncentral chi-square with noncentrality parameter (5) holds when the model defined under the set of local alternatives is true, i.e. when the model under the null hypothesis is only slightly incorrect for large n (see Agresti 2002 and Reiser 2008). In practice, it is often reasonable to adopt an alternative hypothesis for fixed and finite n (Agresti, 2002), such as H 1 : θ 02 = c + ξ, or to use hypotheses of the form (1) (Gudicha et al., 2017). We present here two different approximation procedures for the computation of the noncentrality parameter.

The first method extends the approximation procedure for the asymptotic power derived by Gudicha et al. (2017) for the Likelihood-Ratio and the Wald tests to the LM test. It can be summarized in the following steps:

  1. From the model defined under the alternative hypothesis, create a large data set (e.g. N = 10000 observations).

  2. Fit the model under H 0 to the data.

  3. Take the value of the LM statistic as the estimate of the noncentrality parameter λ (Satorra, 1989; Bollen, 1989).

  4. Compute the noncentrality parameter for a sample of size 1 equal to \(\lambda _1=\frac {\lambda }{N}\).

  5. The noncentrality parameter for a sample of size n is λ n = nλ 1.

The power of the LM test can be determined by comparing the λ n obtained in step 5 with the tabled values of the noncentral chi-square with df corresponding to the number of parameters constrained under H 0 and significance level α (Bollen, 1989).
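The five steps above can be sketched in a deliberately simple setting: a test of H 0 : μ = 0 for y ∼ N(μ, 1), where there are no nuisance parameters, the per-observation information is 1, and the LM statistic reduces to (Σy)²/N. This is only an illustration of the recipe with arbitrary values, not the MIMIC model used later:

```python
# Method 1 in a toy normal-mean problem: y ~ N(mu, 1), H0: mu = 0.
# With no nuisance parameters, A^{22} = 1 and LM = (sum y)^2 / N.
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(1)
mu_alt, N, n, alpha = 0.3, 10_000, 200, 0.05

# Step 1: large data set from the model under H1
y = rng.normal(mu_alt, 1.0, size=N)
# Steps 2-3: "fit" H0 (mu fixed at 0) and take the LM statistic as the
# noncentrality estimate; the score at mu = 0 is sum(y)
lam = y.sum() ** 2 / N
# Steps 4-5: rescale to a sample of size n
lam_1 = lam / N
lam_n = n * lam_1
power = ncx2.sf(chi2.ppf(1 - alpha, df=1), df=1, nc=lam_n)
print(f"lambda_n = {lam_n:.2f}, asymptotic power = {power:.3f}")
```

Here λ_n concentrates around nμ² = 18, the exact local noncentrality for this problem.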

We propose a second method, also based on some of the steps of the procedure proposed by Gudicha et al. (2017), but in which the noncentrality parameter is computed according to formula (5). The procedure can be summarized as follows:

  1. From the model defined under the alternative hypothesis, create a large data set (e.g. N = 10000 observations).

  2. Fit the model under H 0 to the data.

  3. Compute \(\boldsymbol {\xi }=\sqrt {N}(\boldsymbol {\theta }_{02}-\mathbf {c})\), where θ 02 is the vector of the data generating values (values under H 1) of the constrained parameters and c is the vector of constants under the null hypothesis (Reiser, 2008).

  4. Compute the noncentrality parameter according to formula (5), where A 22(θ 0) can be consistently estimated by the corresponding matrix \(\hat {A}\), evaluated at \(\tilde {\boldsymbol {\theta }}\).

  5. Compute the noncentrality parameter for a sample of size 1 as \(\lambda _1=\frac {\lambda }{N}\).

  6. The noncentrality parameter for a sample of size n is λ n = nλ 1.

The power is computed as before, using the noncentrality parameter λ n obtained in step 6.
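As a toy illustration of this second method, consider again a test of H 0 : μ = 0 for y ∼ N(μ, 1), where A²² equals 1 exactly so steps 1, 2 and 4 are trivial; all numerical values are illustrative, not taken from the paper's simulation design:

```python
# Method 2 in a toy normal-mean problem: y ~ N(mu, 1), H0: mu = 0,
# where the per-observation information A^{22} is exactly 1.
from scipy.stats import chi2, ncx2
import math

mu_alt, N, n, alpha = 0.3, 10_000, 200, 0.05

# Step 3: xi from the data-generating value of the constrained parameter
xi = math.sqrt(N) * (mu_alt - 0.0)
# Step 4: lambda = xi' A^{22} xi; here A^{22} = 1
lam = xi * 1.0 * xi
# Steps 5-6: rescale to a sample of size n
lam_n = n * (lam / N)
power = ncx2.sf(chi2.ppf(1 - alpha, df=1), df=1, nc=lam_n)
print(f"lambda_n = {lam_n:.2f}, asymptotic power = {power:.3f}")
```

Because ξ uses the known data-generating value rather than a fitted statistic, λ_n = nμ² is exact here, whereas the first method estimates the same quantity with sampling noise.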

3 The Generalized Lagrange Multiplier Test

Consider a sample y 1, …, y n from a model with true density g(y). The model f(y;θ) is assumed for the data and differs from g(y). Under the assumptions given in White (1982), the parameter vector \(\hat {\boldsymbol {\theta }}_{n}\) that maximizes the log-likelihood function based on model f(y;θ) (quasi-ML estimator, White 1982) converges in probability to θ ∗, the parameter vector that minimizes the Kullback-Leibler information criterion. Moreover, the variance and covariance matrix of the quasi-ML estimator is the sandwich variance and covariance matrix \(\hat {C}(\hat {\boldsymbol {\theta }}_{n})=\hat {A}^{-1}(\hat {\boldsymbol {\theta }}_{n})\hat {B}(\hat {\boldsymbol {\theta }}_{n})\hat {A}^{-1}(\hat {\boldsymbol {\theta }}_{n})\), where the matrix \(\hat {A}\) is defined in formula (4) and \(\hat {B}=\frac {1}{n}\sum _{i=1}^{n}\frac {\partial l_i({\mathbf {y}}_i,\boldsymbol {\theta })}{\partial \boldsymbol {\theta } }\frac {\partial l_i({\mathbf {y}}_i,\boldsymbol {\theta })}{\partial \boldsymbol {\theta }' }\) is the observed cross-product matrix (White, 1982).
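The sandwich construction can be illustrated numerically with a deliberately misspecified model: fitting N(μ, 1) (unit variance assumed) to data whose true variance is 4. Per observation the score is (y_i − μ) and the Hessian is −1, so the naive inverse-information variance is 1, while the sandwich recovers the true variance. This toy example is ours, not from the paper:

```python
# Sandwich matrix C = A^{-1} B A^{-1} under misspecification:
# model N(mu, 1) fitted to data with true variance 4.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=50_000)      # true variance 4, model assumes 1
mu_hat = y.mean()                          # quasi-ML estimate of mu

scores = y - mu_hat                        # per-observation scores
A_hat = 1.0                                # -(1/n) * sum of per-obs Hessians
B_hat = np.mean(scores ** 2)               # observed cross-product matrix
C_hat = (1 / A_hat) * B_hat * (1 / A_hat)  # sandwich variance

print(f"naive A^-1 = {1 / A_hat:.2f}, sandwich C = {C_hat:.2f}")
```

The naive variance understates the sampling variability by a factor of about four here, which is exactly the discrepancy the sandwich matrix is designed to correct.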

Under model misspecification, the null and the alternative hypotheses are posed in terms of θ ∗. Let θ ∗ be partitioned into two subvectors \(\boldsymbol {\theta }^{\prime }_{*}=(\boldsymbol {\theta }^{\prime }_{*1},\boldsymbol {\theta }^{\prime }_{*2})\). The hypotheses H 0 and H 1 can be formalized as follows:

$$\displaystyle \begin{aligned}H_{0}:\boldsymbol{\theta}^{\prime}_{*2}=\mathbf{c} \qquad vs\qquad H_{1}:\boldsymbol{\theta}^{\prime}_{*2}\neq \mathbf{c},\end{aligned} $$
(6)

where c is a vector of constants.

The Generalized Lagrange Multiplier Test is defined as:

$$\displaystyle \begin{aligned}LM(S)=\frac{1}{n}S_2(\tilde{\boldsymbol{\theta}}_n)'\hat{A}^{22}(\tilde{\boldsymbol{\theta}}_n)^{-1}{{\hat{C}}_{22}(\tilde{\boldsymbol{\theta}}_n)}^{-1}\hat{A}^{22}(\tilde{\boldsymbol{\theta}}_n)^{-1}S_2(\tilde{\boldsymbol{\theta}}_n),\end{aligned} $$
(7)

where \(\tilde {\boldsymbol {\theta }}_n\) is the constrained quasi-ML estimator, \(\hat {A}^{22}\) is the block of the partitioned observed Hessian matrix computed as in formula (3), evaluated at \(\tilde {\boldsymbol {\theta }}_n\), and \({\hat {C}_{22}}\) is the block of the matrix \(\hat {C}\) corresponding to \(\boldsymbol {\theta }^{\prime }_{*2}\), evaluated at \(\tilde {\boldsymbol {\theta }}_n\). Under H 0 the statistic LM(S) is distributed as a \(\chi ^{2}_r\), where the df r equal the dimension of θ ∗2 (White, 1982). To compute the local asymptotic power of the LM(S) test, a standard approach is to consider a set of local alternatives \(H_1:\boldsymbol {\theta }_{*2}=\boldsymbol {c}+\frac {\boldsymbol {\xi }}{\sqrt {n}}\), where ξ is an arbitrary vector with the same dimension as θ ∗2. Under H 1, the test statistic LM(S) converges in distribution to a \(\chi ^2_r(\lambda )\), where λ is the noncentrality parameter given by Bera et al. (2020):

$$\displaystyle \begin{aligned} \lambda=\boldsymbol{\xi}' A^{22'}(B_{22}-A_{21}A_{11}^{-1}B_{12}-B_{21}A_{11}^{-1}A_{12}+A_{21}A_{11}^{-1}B_{11}A_{11}^{-1}A_{12})^{-1} A^{22} \boldsymbol{\xi} \end{aligned} $$
(8)

where A is the Fisher information matrix and B is the expected cross-product matrix, evaluated at θ .

If the model is correctly specified, the LM(S) test coincides with the LM test (White, 1982).
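The practical consequence of the sandwich correction can be checked in a small Monte Carlo sketch of our own (not the paper's simulation): fit N(μ, 1) to data with true variance 4 and test H 0 : μ = 0. With no nuisance parameters, A²² = 1 and C₂₂ reduces to the cross-product term, so LM = (Σy)²/n while LM(S) divides it by the cross-product estimate:

```python
# Size of LM vs LM(S) under a misspecified variance: H0: mu = 0 is true,
# but the assumed unit variance is wrong (true variance is 4).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
alpha, n, reps = 0.05, 500, 2000
crit = chi2.ppf(1 - alpha, df=1)

rej_lm = rej_lms = 0
for _ in range(reps):
    y = rng.normal(0.0, 2.0, size=n)  # H0 true, variance misspecified
    lm = y.sum() ** 2 / n             # LM with A^{22} = 1
    b_hat = np.mean(y ** 2)           # observed cross-product at mu = 0
    lms = lm / b_hat                  # sandwich-corrected statistic
    rej_lm += lm > crit
    rej_lms += lms > crit

print(f"LM rejection rate: {rej_lm / reps:.3f}, LM(S): {rej_lms / reps:.3f}")
```

In this setting the uncorrected LM test rejects a true null far too often, while LM(S) stays close to the nominal 5% level, which is the motivation for using the sandwich form under misspecification.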

3.1 Estimation Procedure for the Noncentrality Parameter

The estimation method described in Sect. 2.1 to compute the asymptotic power is used here to estimate the asymptotic power for the LM(S) test, with some differences.

In step 3 of the first method, the LM(S) statistic is taken as the estimate of the noncentrality parameter (the proof of this result can be found in Satorra 1989).

In step 4 of the second method, the noncentrality parameter is computed according to formula (8), where the matrices A(θ ) and B(θ ) are consistently estimated by \(\hat {A}\) and \(\hat {B}\), evaluated at \(\tilde {\boldsymbol {\theta }}_n\).

Moreover, the model fitted under H 0 at step 2 is assumed to be misspecified. Under correct model specification the LM(S) and the LM tests have the same noncentrality parameter and, consequently, the same asymptotic power.

4 Simulation Study

4.1 Simulation Design

The aim of this section is to compare, by means of a simulation study, the procedures described above to estimate the asymptotic and the empirical power of the LM and LM(S) tests to detect measurement non-invariance. A MIMIC model for binary data is considered. Under both correct model specification and misspecification, we consider a binary group variable x because we study measurement non-invariance in only two subgroups of the population. Given n individuals and p items, under correct model specification data are generated from the following model, where measurement non-invariance is introduced on the intercept of the last item p through the parameter γ 1 and the group variable x:

$$\displaystyle \begin{aligned} &logit (\pi_{ij})=\alpha_{0j}+\alpha_{1j}z_i\qquad i=1,\ldots,n \qquad j=1,\ldots,p-1\\ &logit(\pi_{ip})=\alpha_{0p}+\alpha_{1p}z_i+\gamma_1{x}_i \qquad \qquad \\ &z \sim N(0,1) \end{aligned} $$
(9)

Under misspecification of the latent variable distribution data are generated from the following model, where measurement non-invariance is introduced as before on the intercept of the last item p through the parameter γ 1 and the group variable x:

$$\displaystyle \begin{aligned} &logit (\pi_{ij})=\alpha_{0j}+\alpha_{1j}z_i\qquad i=1,\ldots,n \qquad j=1,\ldots,p-1\\ &logit(\pi_{ip})=\alpha_{0p}+\alpha_{1p}z_i+\gamma_1{x}_i \qquad \qquad \\ &z \sim SN(\kappa) \end{aligned} $$
(10)

In this case, the latent variable z is generated from a Skew-normal (SN) distribution with skewness parameter κ, with the following probability density function (Azzalini, 1985):

$$\displaystyle \begin{aligned}\phi(\epsilon;\kappa)=2\phi(\epsilon)\Phi(\kappa\epsilon),\end{aligned} $$

where ϕ and Φ are the standard normal density and distribution function, respectively. The parameter κ can take any value in (−∞, +∞): when it is equal to 0, the Skew-normal reduces to a standard normal distribution. In the simulations, we consider two values of κ, 3 and 5. When κ = 3 the mean and the variance of the latent variable are 0.76 and 0.43, respectively, and when κ = 5 they are 0.78 and 0.39, respectively. In both models (9) and (10) we consider two possible effect sizes, equal to 0.2 and 0.5, for the parameter γ 1. Moreover, in both cases, the values x i are generated from a Bernoulli distribution with success probability 0.7, the intercepts from a normal distribution with mean 0 and Standard Deviation (SD) 0.1 and the slopes from a normal distribution with mean 0 and SD 0.5.
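The data-generating step under model (10) can be sketched as follows, assuming SciPy's `skewnorm` for the latent variable; parameter values follow the simulation design above, but the exact generation code used by the authors is not shown in the paper:

```python
# Sketch of data generation under model (10): binary responses from a
# MIMIC-type logistic model with a skew-normal latent variable.
import numpy as np
from scipy.stats import skewnorm
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(2024)
n, p, kappa, gamma1 = 1_000, 10, 3, 0.5

z = skewnorm.rvs(a=kappa, size=n, random_state=rng)  # latent variable, SN(3)
x = rng.binomial(1, 0.7, size=n)                     # binary group variable
alpha0 = rng.normal(0.0, 0.1, size=p)                # item intercepts
alpha1 = rng.normal(0.0, 0.5, size=p)                # item slopes

eta = alpha0 + np.outer(z, alpha1)                   # n x p linear predictors
eta[:, p - 1] += gamma1 * x                          # non-invariance on item p
y = rng.binomial(1, expit(eta))                      # binary item responses

# for kappa = 3 the latent mean and variance are about 0.76 and 0.43
print(z.mean(), z.var())
```

Setting κ = 0 in the same code reproduces the correctly specified model (9), since the skew-normal then collapses to the standard normal.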

The following set of hypotheses is being tested:

$$\displaystyle \begin{aligned}H_{0}:\gamma_{1p}=0\qquad vs\qquad H_{1}: \gamma_{1p}\neq 0,\end{aligned}$$

which implies that the last item is tested for measurement invariance.

Model (9) is fitted to the data with γ 1p fixed to 0. When data are generated from model (10), we are working under model misspecification. Indeed, as mentioned before, the true latent variable has mean and variance around 0.7 and 0.4, respectively, and it is skewed. Since model (9) is fitted to the data, the misfit lies in the mean, assumed to be 0, in the variance, assumed to be 1, and in the distribution of the latent variable, assumed to be symmetric. The following simulation conditions are considered: number of items (p = 10) × sample size (n = 200, 500, 1000, 5000, 10000) × test statistic (LM, LM(S)). Due to the computational burden, the empirical power is computed only for n = 200, 500, 1000. We consider 200 replications for each condition of the study. The empirical power \(\hat {p}\) is computed as \(\hat {p}=\sum _{l=1}^{N_v}\frac {I(T_{l} \ge c)}{N_v}\), where N v is the number of valid statistics out of the number of replications, I is the indicator function, T l is the value of the test statistic in the l-th replication and c is the theoretical asymptotic critical value corresponding to the 95-th percentile of the \(\chi ^2_{df}\) distribution, with df equal to the number of constrained parameters under H 0. Non-valid statistics, if any, are excluded from the analysis. The asymptotic power is computed through methods 1 and 2 described in Sects. 2.1 and 3.1. The nominal level α is equal to 0.05 in all simulations. ML estimates of the parameters are obtained by direct maximization of the likelihood function using 21 Gauss-Hermite quadrature points. Numerical derivatives are used to compute the Hessian and cross-product matrices.
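The empirical power formula above translates directly into code; the sketch below (function name ours) uses noncentral chi-square draws as stand-ins for replicated LM statistics:

```python
# Empirical power from replicated test statistics: the share of valid
# statistics exceeding the central chi-square critical value.
import numpy as np
from scipy.stats import chi2

def empirical_power(stats, df=1, alpha=0.05):
    t = np.asarray(stats, dtype=float)
    t = t[np.isfinite(t)]  # drop non-valid replications
    crit = chi2.ppf(1 - alpha, df=df)
    return np.mean(t >= crit)

# illustration: 200 replications, statistics drawn from a noncentral
# chi-square whose asymptotic power at alpha = 0.05 is close to 0.80
rng = np.random.default_rng(7)
stats = rng.noncentral_chisquare(df=1, nonc=7.85, size=200)
print(empirical_power(stats))
```

With 200 replications the estimate carries a Monte Carlo standard error of roughly √(p(1 − p)/200) ≈ 0.03 near p = 0.8, which is one reason small gaps between empirical and asymptotic power are expected.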

4.2 Results

Table 1 presents the results for the LM and LM(S) tests under correct model specification when γ 1 is equal to 0.2 and 0.5 in the data generating model, p = 10, n = 200, 500, 1000, 5000, 10000. In general, the differences between the asymptotic and empirical power are small, and method 1 is slightly closer to the empirical power than method 2. As for the power to detect measurement non-invariance, the LM test has slightly higher power than the LM(S) test under all conditions, with the exception of the case γ 1 = 0.5 with large sample sizes (n = 5000, 10000), where the two tests reach the same power, as expected from the theory.

Table 1 Asymptotic and empirical power of the LM and LM(S) tests under correct model specification, γ 1 = 0.2, 0.5, p = 10, n = 200, 500, 1000, 5000, 10000

Table 2 shows the results for the LM and LM(S) tests computed under misspecification of the latent variable distribution when γ 1 is equal to 0.2 and 0.5 in the data generating model, p = 10, n = 200, 500, 1000, 5000, 10000. Also in this case the differences between the asymptotic and empirical power are small. As for the power to detect measurement non-invariance under model misspecification, despite the fact that the LM(S) test is derived under model misspecification, the LM test has the higher power under all conditions. The two tests reach the same power only when γ 1 = 0.5 and n = 10000. In both tables and for both tests, the power increases with the sample size and the effect size of the parameter γ 1, and decreases when the model is misspecified.

Table 2 Asymptotic power of the LM and LM(S) tests under incorrect distribution of the latent variable, γ 1 = 0.2, 0.5, p = 10, n = 200, 500, 1000, 5000, 10000

5 Conclusion

In this paper we presented two methods to compute the power of the LM and LM(S) tests, based on their asymptotic distributions under the alternative hypothesis. Moreover, we assessed the performance of these two tests to detect measurement non-invariance under correct model specification and misspecification of the latent variable distribution. The simulation study highlighted that the asymptotic power, computed through the two different approximation methods for the non-centrality parameter, is very close to the empirical power, also under model misspecification. Small differences between the empirical and asymptotic power have been found also by Gudicha et al. (2017) for the Likelihood-Ratio and Wald tests and by Saris et al. (1987) for the score test.

To compute the noncentrality parameter of the LM and LM(S) tests, we have generated data from the model under the alternative hypothesis considering 10000 observations. Increasing this number could reduce the differences between the empirical and asymptotic power, but it would increase the time burden to obtain the parameter estimates and the numerical derivatives used in the noncentrality parameter approximation procedures.

As for the power of the two tests to detect measurement non-invariance, the LM test has slightly higher power than the LM(S) test under most simulation conditions. The two tests reach the same power only for large sample sizes. A similar behaviour of the power of the LM and LM(S) tests was also found by Falk and Monroe (2018), under correct model specification and under misspecification due to an omitted cross-loading.

From this study we can conclude that the asymptotic power is a valid alternative for obtaining the power of a test, both under the correct model and under a model with a misspecified distribution of the latent variable, since it reduces the computational burden compared to the empirical power. Although not shown here, the asymptotic power can also be used to find the sample size necessary to reach a given power (Boos & Stefanski, 2013; Gudicha et al., 2017). However, the asymptotic power can be computed only for test statistics with a known noncentrality parameter.

This work was limited to one type of misspecification. Further analyses should be carried out on the LM(S) test to evaluate whether its performance improves under different types of model misspecification and different estimation methods.