
1 Introduction

Equating methods rely on the comparison of score distributions using what is called an equating transformation function. Let \(F_X\) and \(G_Y\) be the score distributions of the random variables X and Y, corresponding to the test scores on two test forms X and Y, and defined on \(\mathcal {X}\) and \(\mathcal {Y}\), respectively. The equipercentile equating function \(\varphi : \mathcal {X}\mapsto \mathcal {Y}\), computed as

$$\displaystyle \begin{aligned} \varphi(x)=G_Y^{-1}(F_X(x)), \end{aligned} $$
(1)

maps the scores from one test form into the scale of the other (Braun & Holland, 1982; González & Wiberg, 2017). The equating transformation is a functional parameter that in practice is estimated using score data. Although various measures for the assessment of equating functions have been proposed (Wiberg & González, 2016), the uncertainty in the estimation of the equating transformation has mainly been measured by the standard error of equating (SEE),

$$\displaystyle \begin{aligned} \mathrm{SEE}_Y(x) = \sqrt{\mathrm{Var}(\hat{\varphi}(x))}. \end{aligned} $$
(2)
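As a small illustration of the mapping in (1), the following R sketch (R is the software used for the analyses in Sect. 4) computes an equipercentile-style transformation from two simulated score samples; the empirical distribution function stands in for \(F_X\), a quantile-type inverse stands in for \(G_Y^{-1}\), and all object names are ours.

## Illustrative sketch of phi(x) = G_Y^{-1}(F_X(x)) in (1), using simulated scores.
## ecdf() plays the role of F_X and a quantile-type inverse plays the role of G_Y^{-1}.
set.seed(1)
x_scores <- rbinom(1000, size = 40, prob = 0.55)   # simulated scores on form X
y_scores <- rbinom(1000, size = 40, prob = 0.60)   # simulated scores on form Y

F_X     <- ecdf(x_scores)                          # empirical CDF of X
G_Y_inv <- function(p) quantile(y_scores, probs = p, type = 6, names = FALSE)

phi <- function(x) G_Y_inv(F_X(x))                 # equated score on the Y scale
phi(c(10, 20, 30))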

Different methods to calculate the SEE include exact formulas (see, e.g., Kolen and Brennan, 2014, Table 7.2); the Delta method (Lord, 1982; Braun & Holland, 1982, p. 33; Holland et al., 1989); and the bootstrap (Tsai et al., 2001). Another method, proposed in Liou and Cheng (1995) and later extended by Liou et al. (1997), is based on Bahadur's representation of sample quantiles (Bahadur, 1966; Ghosh, 1971). In this paper, this method will be referred to as the Quantile-Based SEE (QB-SEE).

Liou and Cheng (1995) used the QB-SEE method and obtained results for traditional equipercentile equating under the single group (SG), equivalent groups (EG), and nonequivalent groups with anchor test (NEAT) designs. Later, Liou et al. (1997) extended this work to the kernel equating transformation using Gaussian and uniform kernels, considering only the NEAT design. These authors did not, however, compare the QB-SEE with the more traditionally used Delta method for the estimation of the SEE under the kernel equating framework. In this paper, we aim to fill this gap.

The paper is organized as follows. In Sect. 2 we briefly review the kernel equating transformation and the way the SEE is calculated using the Delta method. Next, in Sect. 3 we introduce the QB-SEE method and give details on how it can be used under the kernel equating framework. An illustration comparing the QB-SEE and the Delta method applied to the estimated kernel equating transformation is given in Sect. 4. The paper ends in Sect. 5, summarizing the main results and discussing future research.

2 Equating and the Standard Error of Equating

In this section we briefly review the basics of kernel equating (KE) and the way the SEE has been calculated within this framework. Next, we introduce the QB-SEE method and show how it can be adapted for use in KE.

2.1 Kernel Equating

Kernel equating (Holland & Thayer, 1989; von Davier et al., 2004) is a semiparametric method used to estimate the equating function (González & von Davier, 2013). The score distributions are estimated using both kernel density estimation techniques (the nonparametric part) and maximum likelihood estimates of score probabilities (the parametric part).

Let \(X(h_X)\) be a continuized version of the discrete score random variable X, defined as

$$\displaystyle \begin{aligned}X(h_X)= a_X(X+h_XV)+(1-a_X)\mu_X,\end{aligned}$$

where V is a continuous random variable with mean 0 and variance \(\sigma ^2_V\), \(a_X^2=\sigma ^2_X/(\sigma ^2_X +\sigma ^2_Vh_X^2)\), \(\mu_X\) and \(\sigma ^2_X\) are the mean and variance of X, and \(h_X\) is a smoothing parameter. The estimated score distribution of \(X(h_X)\) is obtained as

$$\displaystyle \begin{aligned}\hat{F}_{h_X}(x) =\sum_j \hat{r}_j K (\hat{R}_{jX}(x)),\end{aligned} $$

where \(r_j = \Pr(X = x_j)\) are score probabilities, typically modelled using log-linear models estimated by maximum likelihood, \(\hat {R}_{jX}(x) = \big (x-\hat {a}_Xx_j-(1-\hat {a}_X)\hat {\mu }_X\big )/(\hat {a}_Xh_X)\), and K is a kernel determined by the distribution of V. In this paper we will assume that \(V\sim N(0,\sigma _V^2)\) with \(\sigma^2_V=1\), so that K = Φ, the standard normal (or Gaussian) distribution function.
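As a minimal base-R sketch of the Gaussian-kernel continuization defined above, the code below evaluates \(\hat{F}_{h_X}(x) =\sum_j \hat{r}_j \varPhi(\hat{R}_{jX}(x))\) on simulated scores; sample proportions play the role of the score probabilities \(\hat{r}_j\) (no log-linear presmoothing), the bandwidth is chosen arbitrarily, and all names are ours.

## Gaussian-kernel continuized CDF: F_hat(x) = sum_j r_j * pnorm(R_jX(x)), with
## R_jX(x) = (x - a*x_j - (1 - a)*mu) / (a*h) and a^2 = s2 / (s2 + h^2) (sigma_V^2 = 1).
kernel_cdf <- function(x, x_j, r_j, h) {
  mu <- sum(r_j * x_j)                              # mean of the discrete score distribution
  s2 <- sum(r_j * (x_j - mu)^2)                     # variance of the discrete score distribution
  a  <- sqrt(s2 / (s2 + h^2))
  sapply(x, function(xx) sum(r_j * pnorm((xx - a * x_j - (1 - a) * mu) / (a * h))))
}

set.seed(1)
x_scores <- rbinom(1655, size = 36, prob = 0.55)    # simulated form X scores
x_j <- 0:36
r_j <- as.numeric(table(factor(x_scores, levels = x_j))) / length(x_scores)
F_hat <- function(x) kernel_cdf(x, x_j, r_j, h = 0.6)   # h = 0.6 chosen arbitrarily
F_hat(c(10, 18, 26))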

Defining \(s_k = \Pr(Y = y_k)\), and with similar expressions for \(\hat {R}_{kY}(y)\), \(a_Y\), and \(Y(h_Y)\), the score distribution of the continuized Y scores, \(\hat {G}_{h_Y}\), is obtained, leading to the kernel equating function

$$\displaystyle \begin{aligned}\varphi(x,\hat{\mathbf{r}},\hat{\mathbf{s}})=\hat{G}_{h_Y}^{-1}(\hat{F}_{h_X}(x)),\end{aligned} $$

where \( \hat {\mathbf {r}} = (\hat {r}_1, \ldots , \hat {r}_J)^\top \) and \( \hat {\mathbf {s}} = (\hat {s}_1, \ldots , \hat {s}_K)^\top . \)
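Continuing the sketch above (and reusing kernel_cdf() and F_hat from it), the kernel equating function can be obtained by numerically inverting \(\hat{G}_{h_Y}\); here uniroot() does the inversion, and the simulated Y scores and all object names are, again, ours.

## Kernel equating phi_hat(x) = G_hat^{-1}(F_hat(x)), inverting G_hat with uniroot().
## Assumes kernel_cdf() and F_hat() from the previous sketch are in the workspace.
set.seed(2)
y_scores <- rbinom(1638, size = 36, prob = 0.60)
y_k <- 0:36
s_k <- as.numeric(table(factor(y_scores, levels = y_k))) / length(y_scores)
G_hat <- function(y) kernel_cdf(y, y_k, s_k, h = 0.6)

phi_hat <- function(x) {
  p <- F_hat(x)
  sapply(p, function(pp)
    uniroot(function(y) G_hat(y) - pp, lower = -10, upper = 46)$root)
}
phi_hat(c(10, 18, 26))                              # equated scores on the Y scale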

Because \(\hat {\mathbf {r}}\) and \(\hat {\mathbf {s}}\) are maximum likelihood estimates, the Delta method (e.g., Rao, 1973; Lehmann, 1999), described next, has been used to quantify the uncertainty in the estimation of φ.

2.2 SEE in Kernel Equating

The SEE in KE is based on the Delta method. The following theorem from von Davier et al. (2004) formalizes the result.

Theorem 1 (Delta method for the SEE in KE)

If \((\hat{\mathbf{r}}^\top, \hat{\mathbf{s}}^\top)^\top\) is approximately normally distributed with mean \((\mathbf{r}^\top, \mathbf{s}^\top)^\top\) and covariance matrix \(\varSigma\), then \(\hat{\varphi}(x)=\varphi(x,\hat{\mathbf{r}},\hat{\mathbf{s}})\) is approximately normally distributed with mean \(\varphi(x,\mathbf{r},\mathbf{s})\) and variance

$$\displaystyle \begin{aligned}\mathrm{Var}(\hat{\varphi}(x)) = J_{\varphi}\varSigma J_{\varphi}^\top,\end{aligned}$$

where

$$\displaystyle \begin{aligned}\varSigma = \begin{pmatrix} \varSigma_{\hat{\mathbf{r}}} & \varSigma_{\hat{\mathbf{r}}, \hat{\mathbf{s}}}\\ \varSigma^\top_{\hat{\mathbf{r}}, \hat{\mathbf{s}}} & \varSigma_{\hat{\mathbf{s}}}\end{pmatrix},\end{aligned}$$

and

$$\displaystyle \begin{aligned}J_{\varphi}=\left( \frac{\partial\varphi}{\partial\mathbf{r}} , \frac{\partial\varphi}{\partial\mathbf{s}} \right).\end{aligned}$$

When the score probabilities are obtained as maximum likelihood estimates from log-linear models, and a design function, \(DF(\hat {\mathbf {r}}, \hat {\mathbf {s}})\), is used to accommodate the different equating designs, von Davier et al. (2004) showed that the asymptotic variance obtained via the Delta method can be written as

$$\displaystyle \begin{aligned}\mathrm{Var}(\hat{\varphi}) = ||J_{\varphi}J_{DF}C||^2, \end{aligned}$$

where \(J_{\varphi}\) is the Jacobian of the equating function, \(J_{DF}\) is the Jacobian matrix of the design function, and C is a factor of the covariance matrix such that \(\varSigma = CC^\top\). From this result, the SEE for the kernel equating function is defined as

$$\displaystyle \begin{aligned} \mbox{SEE}_Y(x) = ||J_{\varphi}J_{DF}C||, \end{aligned} $$
(3)

which in this paper is denoted as \(\mbox{SEE}^{\varDelta }_Y(x)\).
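Computationally, (3) is the Euclidean norm of a row vector multiplied by two matrices. The base-R sketch below shows only this matrix arithmetic, using randomly generated stand-ins of conformable dimensions for \(J_{\varphi}\), \(J_{DF}\), and C; none of these values come from real equating data.

## SEE_Y^Delta(x) = || J_phi %*% J_DF %*% C ||, with random stand-in matrices.
set.seed(3)
J <- 37; K <- 37                                    # hypothetical numbers of X and Y score points
p <- 60                                             # hypothetical dimension of the presmoothing model
J_phi <- matrix(rnorm(J + K), nrow = 1)             # 1 x (J+K): Jacobian of phi w.r.t. (r, s)
J_DF  <- matrix(rnorm((J + K) * p), nrow = J + K)   # (J+K) x p: Jacobian of the design function
C_fac <- matrix(rnorm(p * p), nrow = p)             # p x p factor with Sigma = C %*% t(C)

v <- J_phi %*% J_DF %*% C_fac
SEE_delta <- sqrt(sum(v^2))                         # Euclidean norm of the resulting row vector
SEE_delta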

3 Quantile-Based Estimation of SEE

The QB-SEE method is based on the so-called Bahadur representation of sample quantiles. The main result is presented in Ghosh (1971) and reproduced here.

Theorem 2 (Ghosh, 1971)

Suppose that G is once differentiable at \(\xi_p = G^{-1}(p)\) with \(G^{\prime}(\xi_p) > 0\). If 0 < p < 1, then

$$\displaystyle \begin{aligned} \hat{\xi}_p = \xi_p + \frac{p - \hat{G}(\xi_p)}{G^\prime(\xi_p)}+o_p\big(N^{-1/2}\big). \end{aligned} $$
(4)

Liou and Cheng (1995) used this result to derive a formula for the SEE of the equipercentile equating transformation. After replacing p by \(\hat{F}_X(x)\) and checking regularity conditions, these authors took the variance of the representation in (4) to obtain

$$\displaystyle \begin{aligned} \mathrm{Var}\big(\hat{G}_Y^{-1}(\hat{F}_X(x))\big) = \frac{1}{G_Y^{\prime}(\varphi)^2}\Big \{\mathrm{Var}(\hat{F}_X) +\mathrm{Var}(\hat{G}_Y(\varphi))-2\,\mathrm{Cov}(\hat{F}_X,\hat{G}_Y(\varphi))\Big \}.\end{aligned} $$
(5)
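To make the step from (4) to (5) explicit (this is our heuristic reconstruction of the argument), set \(p=\hat{F}_X(x)\) and \(\xi_p=\varphi(x)=G_Y^{-1}(F_X(x))\); ignoring the remainder term, the representation gives

$$\displaystyle \begin{aligned}\hat{\varphi}(x) \approx \varphi(x) + \frac{\hat{F}_X(x) - \hat{G}_Y(\varphi(x))}{G_Y^{\prime}(\varphi(x))},\end{aligned}$$

and taking the variance of the right-hand side produces the three variance and covariance terms in (5), scaled by \(1/G_Y^{\prime}(\varphi)^2\).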

We call the square root of expression (5) the QB-SEE and denote it as \(\mbox{SEE}^{B}_Y(x)\). In the next subsection we describe how the QB-SEE can be used to evaluate the kernel equating transformation under the NEAT design. For a critical review of the NEAT equating design, see San Martín and González (2020).

3.1 Quantile-Based SEE in KE

The sample estimates of the score distributions can be replaced by kernel estimates, in which case the QB-SEE formula becomes

$$\displaystyle \begin{aligned} \mbox{SEE}^{B}_Y(x) = \frac{1}{G^{\prime}(\varphi)}\Big \{\mathrm{Var}(\hat{F}_{h_X}) +\mathrm{Var}(\hat{G}_{h_Y}(\varphi))-2\mathrm{Cov}(\hat{F}_{h_X},\hat{G}_{h_Y}(\varphi))\Big \}^{1/2}. \end{aligned} $$
(6)
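Once the three variance and covariance components and the derivative \(G^{\prime}(\varphi)\) are available, assembling (6) is a one-line computation. The base-R sketch below shows only this assembly; the numeric values of the components are made up for illustration.

## Assemble SEE^B_Y(x) from its components as in (6); the input values are made up.
qb_see <- function(var_F, var_G, cov_FG, g_prime) {
  sqrt(var_F + var_G - 2 * cov_FG) / g_prime
}
qb_see(var_F = 2.1e-4, var_G = 1.8e-4, cov_FG = 0.4e-4, g_prime = 0.07)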

To derive the QB-SEE for the particular case of equating under the NEAT design, we introduce the following additional notation: \(t_l = \Pr(A = a_l)\) are the marginal score probabilities for the anchor random variable A (written \(h(a_l)\) in the expressions below, following Liou et al., 1997), and \(r_{j|l}\) and \(s_{k|l}\) are the conditional score probabilities of X and Y given A, respectively.

Following Liou et al. (1997), the variances and covariance terms in (6) can be obtained as

$$\displaystyle \begin{aligned} \mathrm{Var}(\hat{F}_{h_X}(x)) & = \mathrm{Var}\bigg(\sum_j \hat{r}_j K (\hat{R}_{jX}(x))\bigg) \\ & \approx \sum_j \sum_{j^{\prime}} K \big(R_{jX}(x)\big) K \big(R_{j^{\prime} X}(x)\big) \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big] \\ & = \sum_{j} K^2 \big(R_{jX}(x) \big) \mathrm{Var}\big[\hat{r}_j\big] + \operatorname*{\sum \sum}_{j\ne j^{\prime}} K\big(R_{jX}(x)\big) K\big(R_{j^{\prime} X}(x)\big) \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big], \end{aligned} $$
(7)

where

$$\displaystyle \begin{aligned} \mathrm{Var}\big[\hat{r}_j\big] & = \sum_{l} \Bigg\{ \frac{\hat{r}_{j|l}[1-\hat{r}_{j|l}]}{(n_X+1)\hat{h}(a_l)-1} \hat{h}^2(a_l) + \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y}\hat{r}^2_{j|l} \\ & \quad + \frac{\hat{r}_{j|l}\big[1-\hat{r}_{j|l}\big]\hat{h}(a_l)\big[1-\hat{h}(a_l)\big]}{\big[(n_X+1)\hat{h}(a_l)-1\big](n_X+n_Y)} \Bigg\} - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j|l^{\prime}}, \end{aligned} $$
(8)

and

$$\displaystyle \begin{aligned} \mathrm{Cov}\big[\hat{r}_j, \hat{r}_{j^{\prime}}\big] & = \sum_{l} \Bigg\{ \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j^{\prime}|l} - \frac{\hat{r}_{j|l} \hat{r}_{j^{\prime}|l}}{(n_X + 1) \hat{h}(a_l)-1}\hat{h}^2(a_l) \\ & \quad -\frac{\hat{r}_{j|l} \hat{r}_{j^{\prime}|l} \hat{h}(a_l)[1-\hat{h}(a_l)]}{\big[(n_X+1)\hat{h}(a_l)-1\big](n_X + n_Y)} \Bigg\} - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{r}_{j^{\prime}|l^{\prime}}. \end{aligned} $$
(9)

Replacing terms accordingly, similar derivations lead to the variance of \(\hat{G}_{h_Y}\).

Finally, the covariance term is calculated as

$$\displaystyle \begin{aligned} \mathrm{Cov}(\hat{F}_{h_X}, \hat{G}_{h_Y}) & \approx \sum_{j} \sum_{k} K(R_{jX}) K(R_{kY}) \Bigg\{ \sum_{l} \frac{\hat{h}(a_l)[1-\hat{h}(a_l)]}{n_X + n_Y} \hat{r}_{j|l} \hat{s}_{k|l} \\ & \quad - \operatorname*{\sum \sum}_{l\ne l^{\prime}} \frac{\hat{h}(a_l) \hat{h}(a_{l^{\prime}})}{n_X + n_Y} \hat{r}_{j|l} \hat{s}_{k|l^\prime} \Bigg\}. \end{aligned} $$
(10)
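As a concrete, and heavily hedged, rendering of (7)-(9), the base-R sketch below computes \(\mathrm{Var}[\hat{r}_j]\), \(\mathrm{Cov}[\hat{r}_j, \hat{r}_{j^{\prime}}]\), and \(\mathrm{Var}(\hat{F}_{h_X}(x))\) from conditional score probabilities \(\hat{r}_{j|l}\), anchor probabilities \(\hat{h}(a_l)\), a bandwidth, and sample sizes. All inputs are simulated, the code is our own translation of the displayed formulas rather than an implementation taken from Liou et al. (1997), and the covariance in (10) follows the same pattern.

## Our own translation of (7)-(9): Var of the kernel CDF estimate at a score x,
## built from Var[r_j] and Cov[r_j, r_j'] under the NEAT design. All inputs simulated.
set.seed(4)
J <- 37; L <- 13                                    # hypothetical numbers of X and anchor score points
n_X <- 1655; n_Y <- 1638
h_a  <- prop.table(runif(L) + 0.5)                  # stand-in anchor probabilities h(a_l)
r_jl <- apply(matrix(runif(J * L), J, L), 2, function(p) p / sum(p))   # r_{j|l}, columns sum to 1
d_l  <- (n_X + 1) * h_a - 1                         # recurring denominator (n_X + 1) h(a_l) - 1

## Equation (8): Var[r_j] for each j
var_r <- sapply(1:J, function(j) {
  within_l <- sum(r_jl[j, ] * (1 - r_jl[j, ]) * h_a^2 / d_l +
                  h_a * (1 - h_a) * r_jl[j, ]^2 / (n_X + n_Y) +
                  r_jl[j, ] * (1 - r_jl[j, ]) * h_a * (1 - h_a) / (d_l * (n_X + n_Y)))
  hr <- h_a * r_jl[j, ]
  cross_l <- (sum(hr)^2 - sum(hr^2)) / (n_X + n_Y)  # sum over l != l' of h_l h_l' r_{j|l} r_{j|l'}
  within_l - cross_l
})

## Equation (9): Cov[r_j, r_j'] for a pair j != j'
cov_r <- function(j, jp) {
  rr <- r_jl[j, ] * r_jl[jp, ]
  within_l <- sum(h_a * (1 - h_a) * rr / (n_X + n_Y) -
                  rr * h_a^2 / d_l -
                  rr * h_a * (1 - h_a) / (d_l * (n_X + n_Y)))
  hj  <- h_a * r_jl[j, ]
  hjp <- h_a * r_jl[jp, ]
  cross_l <- (sum(hj) * sum(hjp) - sum(hj * hjp)) / (n_X + n_Y)
  within_l - cross_l
}

## Equation (7): Var(F_hat_{h_X}(x)) at a score x, with Gaussian kernel weights K(R_jX(x))
var_F_hat <- function(x, x_j = 0:(J - 1), h = 0.6) {
  r_j <- as.numeric(r_jl %*% h_a)                   # marginal r_j = sum_l r_{j|l} h(a_l)
  mu  <- sum(r_j * x_j)
  s2  <- sum(r_j * (x_j - mu)^2)
  a   <- sqrt(s2 / (s2 + h^2))
  Kw  <- pnorm((x - a * x_j - (1 - a) * mu) / (a * h))
  v   <- sum(Kw^2 * var_r)
  for (j in 1:(J - 1)) for (jp in (j + 1):J)
    v <- v + 2 * Kw[j] * Kw[jp] * cov_r(j, jp)
  v
}
var_F_hat(18)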

In all the previous equations, either sample estimates or presmoothed score probabilities can be used as weights for the kernel. In the next section, the former case is considered for illustration and to compare the SEE for KE calculated using the traditional Delta method with that obtained from the QB-SEE method.

4 Illustration

4.1 Data

We use data described in Kolen and Brennan (2014). The data set consists of two 36-item test forms. Form X was administered to 1,655 examinees and form Y was administered to 1,638 examinees. Twelve of the 36 items are common to both test forms (items 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, and 36). The data are distributed with the CIPE software, which is freely available at https://education.uiowa.edu/centers/center-advanced-studies-measurement-and-assessment/computer-programs, and are also available in the equate (Albano, 2016) and SNSequate (González, 2014) R packages.

4.2 Analyses

To investigate how \(\mbox{SEE}^{B}_Y(x)\) relates to \(\mbox{SEE}^{\varDelta }_Y(x)\), we compared the SEE for KE under the NEAT design with poststratification equating (NEAT-PSE), calculated using the Delta, QB-SEE, and bootstrap methods.

The SEE based on the Delta method is calculated using Equation (3), which is implemented in SNSequate and appears as one of the output values of a call to the ker.eq() function.

The QB-SEE is calculated using Equation (6). The variance and covariance components inside the braces are obtained using Equations (7)-(10), whereas the denominator, \(G^{\prime}(\varphi)\), corresponds to the derivative of the continuized distribution evaluated at the equated score, which in this case is obtained from the Gaussian kernel.

The bootstrap SEE follows the procedure described in Kolen and Brennan (2014, Chap. 7) and is computed using 500 replications.
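For completeness, the resampling logic of the bootstrap SEE can be sketched in a few lines of base R. The sketch below uses plain equipercentile equating on simulated independent samples (an EG-type setting), not kernel equating under the NEAT design, so it only illustrates the resample-and-re-equate idea with 500 replications; all object names are ours.

## Simplified bootstrap SEE: resample scores, re-equate, and take the standard
## deviation of the equated scores across replications (resampling logic only).
set.seed(5)
x_scores <- rbinom(1655, size = 36, prob = 0.55)
y_scores <- rbinom(1638, size = 36, prob = 0.60)
x_grid   <- 0:36

equate_eqp <- function(xs, ys, x) {
  quantile(ys, probs = ecdf(xs)(x), type = 6, names = FALSE)
}

B <- 500
boot_eq <- replicate(B, {
  xs <- sample(x_scores, replace = TRUE)
  ys <- sample(y_scores, replace = TRUE)
  equate_eqp(xs, ys, x_grid)
})
see_boot <- apply(boot_eq, 1, sd)                   # bootstrap SEE at each X score point
round(see_boot[c(1, 19, 37)], 3)                    # SEE at scores 0, 18, 36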

All the analyses were carried out using the R software (R Core Team, 2020) and the code is available from the authors upon request.

4.3 Results

The SEEs obtained with the three compared methods are shown graphically in Fig. 1. Except for some score values in the lower range of the score scale, all methods yielded similar SEE estimates for the analyzed data.

Fig. 1 SEE for the three compared methods

The results for all SEE methods reflect that there are few test-takers in the tails of the score scale, as seen in the increased values of the SEE there. The results also suggest that the QB-SEE and the Delta method produce very similar results, even though they rely on different asymptotic arguments. Given that both the QB-SEE and the Delta-method SEE deviate from the bootstrap SEE in the lower tail, the results also indicate that these two methods might be better approximations when the number of test-takers is large, which is expected given that both are large-sample approximations.

5 Discussion

In this paper we have revisited Bahadur's result on the asymptotic representation of sample quantiles and its use in the derivation of what we call the QB-SEE method for estimating the standard error of equating. The method was applied to kernel equating transformations under the NEAT design and compared to the more traditional Delta method for obtaining the SEE. Results from a numerical illustration showed that, for the analyzed data set, the QB-SEEs are very similar to the SEEs obtained using the Delta method.

An advantage of the QB-SEE method is that it allows one to separate the sources of uncertainty influencing the SEE, as can be seen from (5). A comprehensive simulation study to assess how these variance and covariance terms vary under different conditions is planned for future research. Another advantage of this method is that, in contrast to the Delta method, it does not rely on normality. This could open room for other presmoothing models and methods that do not necessarily rest on the asymptotic normality of parameter estimates, which is the property exploited when log-linear models are estimated by maximum likelihood.

Future work includes other methods to estimate the variance and covariance components in the SEE formulas, as well as the evaluation of other kernels and other equating designs.