
1 Introduction

Equating methods rely on the comparison of score distributions using what is called an equating transformation function. Let \(F_X\) and \(G_Y\) be the score distributions of the random variables X and Y, corresponding to the test scores on two test forms X and Y, and defined on \(\mathcal {X}\) and \(\mathcal {Y}\), respectively. The equipercentile equating function \(\varphi : \mathcal {X}\mapsto \mathcal {Y}\), computed as

$$\displaystyle \begin{aligned} \varphi(x)=G_Y^{-1}(F_X(x)), \end{aligned} $$
(1)

maps the scores from one test form into the scale of the other (Braun & Holland, 1982; González & Wiberg, 2017). The equating transformation is a functional parameter that in practice is estimated using score data. Although various measures for the assessment of equating functions have been proposed (Wiberg & González, 2016), the uncertainty in the estimation of the equating transformation has mainly been measured by the standard error of equating (SEE),

$$\displaystyle \begin{aligned} \mathrm{SEE}_Y(x) = \sqrt{\mathrm{Var}(\hat{\varphi}(x))}. \end{aligned} $$
(2)
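As a small illustration of the mapping in (1), the following R sketch (R is the software used for the analyses in Sect. 4) computes an equipercentile-style transformation from two simulated score samples; the empirical distribution function stands in for \(F_X\), a quantile-type inverse stands in for \(G_Y^{-1}\), and all object names are ours.

## Illustrative sketch of phi(x) = G_Y^{-1}(F_X(x)) in (1), using simulated scores.
## ecdf() plays the role of F_X and a quantile-type inverse plays the role of G_Y^{-1}.
set.seed(1)
x_scores <- rbinom(1000, size = 40, prob = 0.55)   # simulated scores on form X
y_scores <- rbinom(1000, size = 40, prob = 0.60)   # simulated scores on form Y

F_X     <- ecdf(x_scores)                          # empirical CDF of X
G_Y_inv <- function(p) quantile(y_scores, probs = p, type = 6, names = FALSE)

phi <- function(x) G_Y_inv(F_X(x))                 # equated score on the Y scale
phi(c(10, 20, 30))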

Different methods to calculate the SEE include exact formulas (see, e.g., Kolen and Brennan, 2014, Table 7.2); the Delta method (Lord, 1982; Braun & Holland, 1982, p. 33; Holland et al., 1989); and the bootstrap (Tsai et al., 2001). Another method, proposed in Liou and Cheng (1995) and later extended by Liou et al. (1997), is based on Bahadur's representation of sample quantiles (Bahadur, 1966; Ghosh, 1971). In this paper, this method will be referred to as the Quantile-Based SEE (QB-SEE).

Liou and Cheng (1995) used the QB-SEE method and obtained results for traditional equipercentile equating under the single group (SG), equivalent groups (EG), and nonequivalent groups with anchor test (NEAT) designs. Later, Liou et al. (1997) extended this work to the kernel equating transformation using Gaussian and uniform kernels, considering only the NEAT design. These authors did not, however, compare the QB-SEE with the more traditionally used Delta method for the estimation of the SEE under the kernel equating framework. In this paper, we aim to fill this gap.

The paper is organized as follows. In Sect. 2 we briefly review the kernel equating transformation and the way the SEE is calculated using the Delta method. Next, in Sect. 3 we introduce the QB-SEE method and give details on how it can be used under the kernel equating framework. An illustration comparing the QB-SEE and the Delta method applied to the estimated kernel equating transformation is given in Sect. 4. The paper ends in Sect. 5, summarizing the main results and discussing future research.

2 Equating and the Standard Error of Equating

In this section we briefly review the basics of kernel equating (KE) and the way the SEE has been calculated within this framework. Next, we introduce the QB-SEE method and show how it can be adapted for use in KE.

2.1 Kernel Equating

Kernel equating (Holland & Thayer, 1989; von Davier et al., 2004) is a semiparametric method used to estimate the equating function (González & von Davier, 2013). The score distributions are estimated using both kernel density estimation techniques (the nonparametric part) and maximum likelihood estimates of score probabilities (the parametric part).

Let \(X(h_X)\) be a continuized version of the discrete score random variable X, defined as

$$\displaystyle \begin{aligned}X(h_X)= a_X(X+h_XV)+(1-a_X)\mu_X,\end{aligned}$$

where V is a continuous random variable with mean 0 and variance \(\sigma ^2_V\), \(a_X^2=\sigma ^2_X/(\sigma ^2_X +\sigma ^2_Vh_X^2)\), \(\mu_X\) and \(\sigma ^2_X\) are the mean and variance of X, and \(h_X\) is a smoothing parameter. The estimated score distribution of \(X(h_X)\) is obtained as

$$\displaystyle \begin{aligned}\hat{F}_{h_X}(x) =\sum_j \hat{r}_j K (\hat{R}_{jX}(x)),\end{aligned} $$

where \(r_j = \Pr(X = x_j)\) are score probabilities, typically modelled using log-linear models estimated by maximum likelihood, \(\hat {R}_{jX}(x) = \big (x-\hat {a}_Xx_j-(1-\hat {a}_X)\hat {\mu }_X\big )/(\hat {a}_Xh_X)\), and K is a kernel determined by the distribution of V. In this paper we will assume that \(V\sim N(0,\sigma _V^2)\) with \(\sigma^2_V=1\), so that K = Φ, the standard normal (or Gaussian) distribution function.
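As a minimal base-R sketch of the Gaussian-kernel continuization defined above, the code below evaluates \(\hat{F}_{h_X}(x) =\sum_j \hat{r}_j \varPhi(\hat{R}_{jX}(x))\) on simulated scores; sample proportions play the role of the score probabilities \(\hat{r}_j\) (no log-linear presmoothing), the bandwidth is chosen arbitrarily, and all names are ours.

## Gaussian-kernel continuized CDF: F_hat(x) = sum_j r_j * pnorm(R_jX(x)), with
## R_jX(x) = (x - a*x_j - (1 - a)*mu) / (a*h) and a^2 = s2 / (s2 + h^2) (sigma_V^2 = 1).
kernel_cdf <- function(x, x_j, r_j, h) {
  mu <- sum(r_j * x_j)                              # mean of the discrete score distribution
  s2 <- sum(r_j * (x_j - mu)^2)                     # variance of the discrete score distribution
  a  <- sqrt(s2 / (s2 + h^2))
  sapply(x, function(xx) sum(r_j * pnorm((xx - a * x_j - (1 - a) * mu) / (a * h))))
}

set.seed(1)
x_scores <- rbinom(1655, size = 36, prob = 0.55)    # simulated form X scores
x_j <- 0:36
r_j <- as.numeric(table(factor(x_scores, levels = x_j))) / length(x_scores)
F_hat <- function(x) kernel_cdf(x, x_j, r_j, h = 0.6)   # h = 0.6 chosen arbitrarily
F_hat(c(10, 18, 26))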

Defining \(s_k = \Pr(Y = y_k)\), and with similar expressions for \(\hat {R}_{kY}(y)\), \(a_Y\), and \(Y(h_Y)\), the score distribution of the continuized Y scores, \(\hat {G}_{h_Y}\), is obtained, leading to the kernel equating function

$$\displaystyle \begin{aligned}\varphi(x,\hat{\mathbf{r}},\hat{\mathbf{s}})=\hat{G}_{h_Y}^{-1}(\hat{F}_{h_X}(x)),\end{aligned} $$

where \( \hat {\mathbf {r}} = (\hat {r}_1, \ldots , \hat {r}_J)^\top \) and \( \hat {\mathbf {s}} = (\hat {s}_1, \ldots , \hat {s}_K)^\top . \)
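Continuing the sketch above (and reusing kernel_cdf() and F_hat from it), the kernel equating function can be obtained by numerically inverting \(\hat{G}_{h_Y}\); here uniroot() does the inversion, and the simulated Y scores and all object names are, again, ours.

## Kernel equating phi_hat(x) = G_hat^{-1}(F_hat(x)), inverting G_hat with uniroot().
## Assumes kernel_cdf() and F_hat() from the previous sketch are in the workspace.
set.seed(2)
y_scores <- rbinom(1638, size = 36, prob = 0.60)
y_k <- 0:36
s_k <- as.numeric(table(factor(y_scores, levels = y_k))) / length(y_scores)
G_hat <- function(y) kernel_cdf(y, y_k, s_k, h = 0.6)

phi_hat <- function(x) {
  p <- F_hat(x)
  sapply(p, function(pp)
    uniroot(function(y) G_hat(y) - pp, lower = -10, upper = 46)$root)
}
phi_hat(c(10, 18, 26))                              # equated scores on the Y scale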

Because \(\hat {\mathbf {r}}\) and \(\hat {\mathbf {s}}\) are maximum likelihood estimates, the Delta method (e.g., Rao, 1973; Lehmann, 1999), described next, has been used to quantify the uncertainty in the estimation of φ.

2.2 SEE in Kernel Equating

The SEE in KE is based on the Delta method. The following theorem from von Davier et al. (2004) formalizes the result.

Theorem 1 (Delta method for the SEE in KE)

If \((\hat{\mathbf{r}}^\top, \hat{\mathbf{s}}^\top)^\top\) is approximately normally distributed with mean \((\mathbf{r}^\top, \mathbf{s}^\top)^\top\) and covariance matrix \(\varSigma\), then \(\hat{\varphi}(x)=\varphi(x,\hat{\mathbf{r}},\hat{\mathbf{s}})\) is approximately normally distributed with mean \(\varphi(x,\mathbf{r},\mathbf{s})\) and variance

$$\displaystyle \begin{aligned}\mathrm{Var}(\hat{\varphi}(x)) = J_{\varphi}\varSigma J_{\varphi}^\top,\end{aligned}$$

where

$$\displaystyle \begin{aligned}\varSigma = \begin{pmatrix} \varSigma_{\hat{\mathbf{r}}} & \varSigma_{\hat{\mathbf{r}}, \hat{\mathbf{s}}}\\ \varSigma^\top_{\hat{\mathbf{r}}, \hat{\mathbf{s}}} & \varSigma_{\hat{\mathbf{s}}}\end{pmatrix},\end{aligned}$$

and

$$\displaystyle \begin{aligned}J_{\varphi}=\left( \frac{\partial\varphi}{\partial\mathbf{r}} , \frac{\partial\varphi}{\partial\mathbf{s}} \right).\end{aligned}$$

When the score probabilities are obtained as maximum likelihood estimates from log-linear models, and a design function, \(DF(\hat {\mathbf {r}}, \hat {\mathbf {s}})\), is used to accommodate the different equating designs, von Davier et al. (2004) showed that the asymptotic variance obtained via the Delta method can be written as

$$\displaystyle \begin{aligned}\mathrm{Var}(\hat{\varphi}) = ||J_{\varphi}J_{DF}C||^2, \end{aligned}$$

where \(J_{\varphi}\) is the Jacobian of the equating function, \(J_{DF}\) is the Jacobian matrix of the design function, and C is a factor of the covariance matrix such that \(\varSigma = CC^\top\). From this result, the SEE for the kernel equating function is defined as

$$\displaystyle \begin{aligned} \mbox{SEE}_Y(x) = ||J_{\varphi}J_{DF}C||, \end{aligned} $$
(3)

which in this paper is denoted as \(\mbox{SEE}^{\varDelta }_Y(x)\).
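Computationally, (3) is the Euclidean norm of a row vector multiplied by two matrices. The base-R sketch below shows only this matrix arithmetic, using randomly generated stand-ins of conformable dimensions for \(J_{\varphi}\), \(J_{DF}\), and C; none of these values come from real equating data.

## SEE_Y^Delta(x) = || J_phi %*% J_DF %*% C ||, with random stand-in matrices.
set.seed(3)
J <- 37; K <- 37                                    # hypothetical numbers of X and Y score points
p <- 60                                             # hypothetical dimension of the presmoothing model
J_phi <- matrix(rnorm(J + K), nrow = 1)             # 1 x (J+K): Jacobian of phi w.r.t. (r, s)
J_DF  <- matrix(rnorm((J + K) * p), nrow = J + K)   # (J+K) x p: Jacobian of the design function
C_fac <- matrix(rnorm(p * p), nrow = p)             # p x p factor with Sigma = C %*% t(C)

v <- J_phi %*% J_DF %*% C_fac
SEE_delta <- sqrt(sum(v^2))                         # Euclidean norm of the resulting row vector
SEE_delta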

3 Quantile-Based Estimation of SEE

The QB-SEE method is based on the so-called Bahadur representation of sample quantiles. The main result is presented in Ghosh (1971) and reproduced here.

Theorem 2 (Ghosh, 1971)

Suppose that G is once differentiable at \(\xi_p = G^{-1}(p)\) with \(G^{\prime}(\xi_p) > 0\). If 0 < p < 1, then

$$\displaystyle \begin{aligned} \hat{\xi}_p = \xi_p + \frac{p - \hat{G}(\xi_p)}{G^\prime(\xi_p)}+o_p\big(N^{-1/2}\big). \end{aligned} $$
(4)

Liou and Cheng (1995) used this result to derive a formula for the SEE of the equipercentile equating transformation. After replacing p by \(\hat{F}_X(x)\) and checking regularity conditions, these authors took the variance of the representation in (4) to obtain

$$\displaystyle \begin{aligned} \mathrm{Var}\big(\hat{G}_Y^{-1}(\hat{F}_X(x))\big) = \frac{1}{G_Y^{\prime}(\varphi)^2}\Big \{\mathrm{Var}(\hat{F}_X) +\mathrm{Var}(\hat{G}_Y(\varphi))-2\,\mathrm{Cov}(\hat{F}_X,\hat{G}_Y(\varphi))\Big \}.\end{aligned} $$
(5)
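To make the step from (4) to (5) explicit (this is our heuristic reconstruction of the argument), set \(p=\hat{F}_X(x)\) and \(\xi_p=\varphi(x)=G_Y^{-1}(F_X(x))\); ignoring the remainder term, the representation gives

$$\displaystyle \begin{aligned}\hat{\varphi}(x) \approx \varphi(x) + \frac{\hat{F}_X(x) - \hat{G}_Y(\varphi(x))}{G_Y^{\prime}(\varphi(x))},\end{aligned}$$

and taking the variance of the right-hand side produces the three variance and covariance terms in (5), scaled by \(1/G_Y^{\prime}(\varphi)^2\).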

We call the square root of expression (5) the QB-SEE and denote it as \(\mbox{SEE}^{B}_Y(x)\). In the next subsection we describe how the QB-SEE can be used to evaluate the kernel equating transformation under the NEAT design. For a critical review of the NEAT equating design, see San Martín and González (2020).

3.1 Quantile-Based SEE in KE

The sample estimates of the score distributions can be replaced by kernel estimates, in which case the QB-SEE formula becomes

$$\displaystyle \begin{aligned} \mbox{SEE}^{B}_Y(x) = \frac{1}{G^{\prime}(\varphi)}\Big \{\mathrm{Var}(\hat{F}_{h_X}) +\mathrm{Var}(\hat{G}_{h_Y}(\varphi))-2\mathrm{Cov}(\hat{F}_{h_X},\hat{G}_{h_Y}(\varphi))\Big \}^{1/2}. \end{aligned} $$
(6)
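Once the three variance and covariance components and the derivative \(G^{\prime}(\varphi)\) are available, assembling (6) is a one-line computation. The base-R sketch below shows only this assembly; the numeric values of the components are made up for illustration.

## Assemble SEE^B_Y(x) from its components as in (6); the input values are made up.
qb_see <- function(var_F, var_G, cov_FG, g_prime) {
  sqrt(var_F + var_G - 2 * cov_FG) / g_prime
}
qb_see(var_F = 2.1e-4, var_G = 1.8e-4, cov_FG = 0.4e-4, g_prime = 0.07)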

To derive the QB-SEE for the particular case of equating under the NEAT design, we introduce the following additional notation: \(t_l = \Pr(A = a_l)\) are the marginal score probabilities for the anchor random variable A (written \(h(a_l)\) in the expressions below, following Liou et al., 1997), and \(r_{j|l}\) and \(s_{k|l}\) are the conditional score probabilities of X and Y given A, respectively.

Following Liou et al. (1997), the variances and covariance terms in (6) can be obtained as

$$\displaystyle \begin{aligned} \mathrm{Var}(\hat{F}_{h_X}(x)) & = \mathrm{Var}\bigg(\sum_j \hat{r}_j K (\hat{R}_{jX}(x))\bigg) \\ & \approx \sum_j \sum_{j^{\prime}} K \big(R_{jX}(x)\big) K \big(R_{j^{\prime} X}(x)\big) \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big] \\ & = \sum_{j} K^2 \big(R_{jX}(x) \big) \mathrm{Var}\big[\hat{r}_j\big] + \operatorname*{\sum \sum}_{j\ne j^{\prime}} K\big(R_{jX}(x)\big) K\big(R_{j^{\prime} X}(x)\big) \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big], \end{aligned} $$
(7)

where

$$\displaystyle \begin{aligned} \mathrm{Var}\big[\hat{r}_j\big] & = \sum_{l} \Bigg\{ \frac{\hat{r}_{j|l}[1-\hat{r}_{j|l}]}{(n_X+1)\hat{h}(a_l)-1} \hat{h}^2(a_l) + \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y}\hat{r}^2_{j|l} \\ & \quad + \frac{\hat{r}_{j|l}\big[1-\hat{r}_{j|l}\big]\hat{h}(a_l)\big[1-\hat{h}(a_l)\big]}{\big[(n_X+1)\hat{h}(a_l)-1\big](n_X+n_Y)} \Bigg\} - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j|l^{\prime}}, \end{aligned} $$
(8)

and

$$\displaystyle \begin{aligned} \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big] & = \sum_{l} \Bigg\{ \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j^{\prime}|l} - \frac{\hat{r}_{j|l} \hat{r}_{j^{\prime}|l}}{(n_X + 1) \hat{h}(a_l)-1}\hat{h}^2(a_l) \\ & \quad -\frac{\hat{r}_{j|l} \hat{r}_{j^{\prime}|l} \hat{h}(a_l)[1-\hat{h}(a_l)]}{\big[(n_X+1)\hat{h}(a_l)-1\big](n_X + n_Y)} \Bigg\} - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j^{\prime}|l^{\prime}}. \end{aligned} $$
(9)

Replacing terms accordingly, similar derivations lead to the variance of \(\hat{G}_{h_Y}\).

Finally, the covariance term is calculated as

$$\displaystyle \begin{aligned} \mathrm{Cov}(\hat{F}_{h_X}, \hat{G}_{h_Y}) & \approx \sum_{j} \sum_{k} K(R_{jX}) K(R_{kY}) \Bigg\{ \sum_{l} \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y} \hat{r}_{j|l} \hat{s}_{k|l} \\ & \quad - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{s}_{k|l^\prime} \Bigg\}. \end{aligned} $$
(10)
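As a concrete, and heavily hedged, rendering of (7)-(9), the base-R sketch below computes \(\mathrm{Var}[\hat{r}_j]\), \(\mathrm{Cov}[\hat{r}_j, \hat{r}_{j^{\prime}}]\), and \(\mathrm{Var}(\hat{F}_{h_X}(x))\) from conditional score probabilities \(\hat{r}_{j|l}\), anchor probabilities \(\hat{h}(a_l)\), a bandwidth, and sample sizes. All inputs are simulated, the code is our own translation of the displayed formulas rather than an implementation taken from Liou et al. (1997), and the covariance in (10) follows the same pattern.

## Our own translation of (7)-(9): Var of the kernel CDF estimate at a score x,
## built from Var[r_j] and Cov[r_j, r_j'] under the NEAT design. All inputs simulated.
set.seed(4)
J <- 37; L <- 13                                    # hypothetical numbers of X and anchor score points
n_X <- 1655; n_Y <- 1638
h_a  <- prop.table(runif(L) + 0.5)                  # stand-in anchor probabilities h(a_l)
r_jl <- apply(matrix(runif(J * L), J, L), 2, function(p) p / sum(p))   # r_{j|l}, columns sum to 1
d_l  <- (n_X + 1) * h_a - 1                         # recurring denominator (n_X + 1) h(a_l) - 1

## Equation (8): Var[r_j] for each j
var_r <- sapply(1:J, function(j) {
  within_l <- sum(r_jl[j, ] * (1 - r_jl[j, ]) * h_a^2 / d_l +
                  h_a * (1 - h_a) * r_jl[j, ]^2 / (n_X + n_Y) +
                  r_jl[j, ] * (1 - r_jl[j, ]) * h_a * (1 - h_a) / (d_l * (n_X + n_Y)))
  hr <- h_a * r_jl[j, ]
  cross_l <- (sum(hr)^2 - sum(hr^2)) / (n_X + n_Y)  # sum over l != l' of h_l h_l' r_{j|l} r_{j|l'}
  within_l - cross_l
})

## Equation (9): Cov[r_j, r_j'] for a pair j != j'
cov_r <- function(j, jp) {
  rr <- r_jl[j, ] * r_jl[jp, ]
  within_l <- sum(h_a * (1 - h_a) * rr / (n_X + n_Y) -
                  rr * h_a^2 / d_l -
                  rr * h_a * (1 - h_a) / (d_l * (n_X + n_Y)))
  hj  <- h_a * r_jl[j, ]
  hjp <- h_a * r_jl[jp, ]
  cross_l <- (sum(hj) * sum(hjp) - sum(hj * hjp)) / (n_X + n_Y)
  within_l - cross_l
}

## Equation (7): Var(F_hat_{h_X}(x)) at a score x, with Gaussian kernel weights K(R_jX(x))
var_F_hat <- function(x, x_j = 0:(J - 1), h = 0.6) {
  r_j <- as.numeric(r_jl %*% h_a)                   # marginal r_j = sum_l r_{j|l} h(a_l)
  mu  <- sum(r_j * x_j)
  s2  <- sum(r_j * (x_j - mu)^2)
  a   <- sqrt(s2 / (s2 + h^2))
  Kw  <- pnorm((x - a * x_j - (1 - a) * mu) / (a * h))
  v   <- sum(Kw^2 * var_r)
  for (j in 1:(J - 1)) for (jp in (j + 1):J)
    v <- v + 2 * Kw[j] * Kw[jp] * cov_r(j, jp)
  v
}
var_F_hat(18)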

In all the previous equations, either sample estimates or presmoothed score probabilities can be used as weights for the kernel. In the next section, the former case is considered for illustration and to compare the SEE for KE calculated using the traditional Delta method with that obtained from the QB-SEE method.

4 Illustration

4.1 Data

We use data described in Kolen and Brennan (2014). The data set consists of two 36-item test forms. Form X was administered to 1,655 examinees and form Y was administered to 1,638 examinees. Twelve of the 36 items are common to both test forms (items 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, and 36). The data are distributed with the CIPE software, which is freely available at https://education.uiowa.edu/centers/center-advanced-studies-measurement-and-assessment/computer-programs, and are also available in the equate (Albano, 2016) and SNSequate (González, 2014) R packages.

4.2 Analyses

To investigate how \(\mbox{SEE}^{B}_Y(x)\) relates to \(\mbox{SEE}^{\varDelta }_Y(x)\), we compared the SEE for KE under the NEAT design with poststratification equating (NEAT-PSE), calculated using the Delta, QB-SEE, and bootstrap methods.

The SEE based on the Delta method is calculated using Equation (3), which is implemented in SNSequate and appears as one of the output values of a call to the ker.eq() function.

The QB-SEE is calculated using Equation (6). The variance and covariance components inside the braces are obtained using Equations (7)-(10), whereas the denominator, \(G^{\prime}(\varphi)\), corresponds to the derivative of the continuized distribution evaluated at the equated score, which in this case is obtained from the Gaussian kernel.

The bootstrap SEE follows the procedure described in Kolen and Brennan (2014, Chap. 7) and is computed using 500 replications.
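For completeness, the resampling logic of the bootstrap SEE can be sketched in a few lines of base R. The sketch below uses plain equipercentile equating on simulated independent samples (an EG-type setting), not kernel equating under the NEAT design, so it only illustrates the resample-and-re-equate idea with 500 replications; all object names are ours.

## Simplified bootstrap SEE: resample scores, re-equate, and take the standard
## deviation of the equated scores across replications (resampling logic only).
set.seed(5)
x_scores <- rbinom(1655, size = 36, prob = 0.55)
y_scores <- rbinom(1638, size = 36, prob = 0.60)
x_grid   <- 0:36

equate_eqp <- function(xs, ys, x) {
  quantile(ys, probs = ecdf(xs)(x), type = 6, names = FALSE)
}

B <- 500
boot_eq <- replicate(B, {
  xs <- sample(x_scores, replace = TRUE)
  ys <- sample(y_scores, replace = TRUE)
  equate_eqp(xs, ys, x_grid)
})
see_boot <- apply(boot_eq, 1, sd)                   # bootstrap SEE at each X score point
round(see_boot[c(1, 19, 37)], 3)                    # SEE at scores 0, 18, 36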

All the analyses were carried out using the R software (R Core Team, 2020) and the code is available from the authors upon request.

4.3 Results

The SEEs obtained with the three compared methods are shown graphically in Fig. 1. Except for some score values in the lower range of the score scale, all methods yielded similar SEE estimates for the analyzed data.

Fig. 1 SEE for the three compared methods

The results for all SEE methods reflect that there are few test-takers in the tails of the score scale, as seen in the increased values of the SEE there. The results also suggest that the QB-SEE and the Delta method produce very similar results, even though they rely on different asymptotic arguments. Given that both the QB-SEE and the Delta-method SEE deviate from the bootstrap SEE in the lower tail, the results also indicate that these two methods might be better approximations when the number of test-takers is large, which is expected given that both are large-sample approximations.

5 Discussion

In this paper we have revisited Bahadur's result on the asymptotic representation of sample quantiles and its use in the derivation of what we call the QB-SEE method for estimating the standard error of equating. The method was applied to kernel equating transformations under the NEAT design and compared to the more traditional Delta method for obtaining the SEE. Results from a numerical illustration showed that, for the analyzed data set, the QB-SEEs are very similar to the SEEs obtained using the Delta method.

An advantage of the QB-SEE method is that it allows one to separate the sources of uncertainty influencing the SEE, as can be seen from (5). A comprehensive simulation study to assess how these variance and covariance terms vary under different conditions is planned for future research. Another advantage of this method is that, in contrast to the Delta method, it does not rely on normality. This could open room for other presmoothing models and methods that do not necessarily rest on the asymptotic normality of parameter estimates, which is the property exploited when log-linear models are estimated by maximum likelihood.

Future work includes other methods to estimate the variance and covariance components in the SEE formulas, as well as the evaluation of other kernels and other equating designs.