
1 Introduction

Reliability is a measure of the overall internal consistency of a test. It has been widely used in statistical, psychological, educational, social, and behavioral research in which data are collected from responses to the items of a test or questionnaire (Bollen, 1989; Finney & DiStefano, 2006). A high reliability indicates that the measure yields similar and stable results under consistent conditions. Many approaches have been proposed to estimate reliability. Among them, the classical test theory (CTT) approach has been widely used, and the Cronbach alpha is the most popular reliability coefficient. However, it is only a lower bound on the consistency of a test (Green et al., 1977; Novick & Lewis, 1967; Sijtsma, 2009). An approach based on structural equation modeling (SEM) has been proposed to obtain more accurate reliabilities (Bentler, 2009; Bollen, 1989; Green & Yang, 2009; Miller, 1995; Raykov, 1997; Raykov & Shrout, 2002), but these methods are designed for continuous outcomes.

For tests with ordered categorical responses, Green and Yang (2009) proposed a nonlinear reliability coefficient within an SEM framework. This nonlinear reliability has been found to be more accurate than the linear reliability, which treats categorical scores as continuous. However, they considered only items with the same number of categories. Kim, Lu and Cohen (2020) extended their work to more general situations and proposed a general formula for reliabilities. These formulas, however, apply only to single-level data, and their research did not consider the composite reliability or coefficient omega (McDonald, 1985), or the maximal reliability H of a weighted sum (Bentler, 2007), for tests with categorical responses. To our knowledge, no research has addressed this topic so far.

To fill this gap, the current study reviews various approaches to reliability, extends single-level reliabilities to multilevel reliabilities, and provides closed-form formulas for multilevel nonlinear SEM reliabilities for tests with ordered categorical responses via a multilevel confirmatory factor analysis (MCFA) approach. Multilevel alpha is also considered.

2 Reliabilities

2.1 CTT Approach

Suppose there are J items in a test. In classical test theory (CTT), the observed score \( X_j \) on item j (j = 1, …, J) is composed of two uncorrelated components: a latent true score (or trait) \( T_j \) and an error score \( \epsilon_j \) with mean 0:

$$ {X}_j={T}_j+{\epsilon}_j $$

Let X, T, and ϵ be the sums of the observed scores, the true scores, and the error scores, respectively, across the J items. Then

$$ X={X}_1+{X}_2+\dots +{X}_J=\sum_{j=1}^J{X}_j, $$

and \( T=\sum_{j=1}^J{T}_j \), \( \epsilon =\sum_{j=1}^J{\epsilon}_j \), so that \( X=T+\epsilon \).

We want to know how much of the variance of the observed score is due to the latent true score rather than to error, and reliability is one measure of this. The reliability coefficient of a test is defined as the ratio of the true variance to the total variance. Because T and ϵ are uncorrelated, the total variance is the sum of the true variance and the error variance. Mathematically, it is

$$ \rho =\frac{\sigma_T^2}{\sigma_x^2}, $$

where \( {\sigma}_T^2 \) is the variance of T and \( {\sigma}_x^2 \) is the variance of X (Lord & Novick, 1968). Reliability is thus the proportion of observed-score variance attributable to true scores. Its complement, \( 1-\rho ={\sigma}_{\epsilon}^2/{\sigma}_x^2 \), is the proportion due to random error around the true score.
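As a quick numerical illustration of this definition, the sketch below simulates item scores under \( X_j = T_j + \epsilon_j \) and compares the empirical ratio \( \sigma_T^2/\sigma_x^2 \) with its population value. The setup is hypothetical and chosen only for illustration: five items share one true score with unit variance, and the errors are independent with unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 200_000, 5                       # hypothetical: 200,000 examinees, 5 items

# Illustrative assumption: all items share the same true score t (T_j = t)
# and have independent errors with mean 0 and variance 1.
t = rng.normal(0.0, 1.0, size=n)
errors = rng.normal(0.0, 1.0, size=(n, J))
X_items = t[:, None] + errors           # X_j = T_j + eps_j

T = J * t                               # T = sum of the J true scores
X = X_items.sum(axis=1)                 # X = T + eps
rho_empirical = T.var() / X.var()       # sigma_T^2 / sigma_x^2 estimated from the sample
rho_population = J**2 / (J**2 + J)      # Var(T) = J^2, Var(X) = J^2 + J
error_share = 1 - rho_empirical         # proportion of variance due to error
```

Under these assumptions the population reliability is 25/30 = 5/6, and the simulated ratio should fall close to it.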

Cronbach Alpha

Under CTT, reliability can be estimated in several ways: test-retest, alternate forms, split-half, the Spearman-Brown prophecy formula, and the Cronbach alpha (or coefficient alpha). Among them, the Cronbach alpha (Cronbach, 1951) is the most commonly used. The Cronbach alpha is defined as

$$ \alpha =\frac{J^2{\overline{\sigma}}_{x{x}^{\prime }}}{{\sigma_x}^2} $$

where J is the total number of items, \( {\sigma_x}^2 \) is the variance of the observed test score X, and \( {\overline{\sigma}}_{x{x}^{\prime }} \) is the mean of the off-diagonal covariances between pairs of distinct items. Specifically, alpha can be calculated as

$$ \alpha =\frac{J^2\ast \frac{\sum_i\sum_{j,i<j}\ {\sigma}_{x_i,{x}_j}}{\frac{J\left(J-1\right)}{2}}}{{\sigma_x}^2} $$

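where \( \sum_i\sum_{j,i<j}{\sigma}_{x_i,{x}_j} \) is the sum of the lower (or, equivalently, upper) off-diagonal covariances between items i and j. Because there are J(J − 1)/2 such item pairs, the fraction in the numerator is the mean inter-item covariance \( {\overline{\sigma}}_{x{x}^{\prime }} \).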
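The formula above can be computed directly from an item covariance matrix. The sketch below does this for a hypothetical 3-item covariance matrix and, as a sanity check, compares the result with the algebraically equivalent form \( \frac{J}{J-1}\left(1-\sum_j \sigma_{x_j}^2/\sigma_x^2\right) \). The function name `cronbach_alpha` is illustrative only.

```python
import numpy as np


def cronbach_alpha(cov):
    """Alpha = J^2 * (mean off-diagonal item covariance) / Var(X), with X the sum score."""
    cov = np.asarray(cov, dtype=float)
    J = cov.shape[0]
    upper = cov[np.triu_indices(J, k=1)]   # the J(J-1)/2 covariances with i < j
    mean_cov = upper.sum() / (J * (J - 1) / 2)
    var_x = cov.sum()                      # Var(X) = sum of all entries of the covariance matrix
    return J**2 * mean_cov / var_x


# Hypothetical 3-item covariance matrix, for illustration only.
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.2, 0.6],
              [0.4, 0.6, 0.9]])
alpha = cronbach_alpha(S)
# Cross-check against (J / (J - 1)) * (1 - sum of item variances / Var(X)).
J = S.shape[0]
alpha_check = J / (J - 1) * (1 - np.trace(S) / S.sum())
```

Here the mean covariance is 0.5 and Var(X) = 6.1, so both forms give 4.5/6.1, about 0.738.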

2.2 CFA Approach

However, the Cronbach alpha is only a lower bound on the internal consistency of the test (Green et al., 1977; Novick & Lewis, 1967; Sijtsma, 2009). A better estimate can be obtained with the structural equation modeling (SEM) approach, which specifically uses confirmatory factor analysis (CFA) to estimate reliability. Tests are assumed to have an underlying factorial structure (factors such as reading ability, math ability, or personality), and each item score is modeled as

$$ {X}_j^{\ast }={\lambda}_{1j}{\eta}_1+{\lambda}_{2j}{\eta}_2+\cdots +{\lambda}_{Mj}{\eta}_M+{e}_j, $$

where \( {X}_j^{\ast } \) is a continuous score for item j, M is the number of latent factors, \( {\eta}_m \) (1 ≤ m ≤ M) are the latent factors weighted by the corresponding factor loadings \( {\lambda}_{mj} \), and \( {e}_j \) is a measurement error term. We assume the errors are independent (of each other and of the factors) and that the error for item j has variance \( {\sigma}_{\varepsilon_j}^2 \).

Suppose there are J items in the test, and let \( {X}^{\ast } \) and T be the sums of the observed scores and of the true scores, respectively. Then \( {X}^{\ast }=\sum_{j=1}^J{X}_j^{\ast } \) and \( T=\sum_{j=1}^J\sum_{m=1}^M{\lambda}_{mj}{\eta}_m \). Because the item scores are represented by a confirmatory factor analysis model, the linear reliability \( {\rho}_{lin} \) is calculated as the ratio of the true sum score variance to the observed sum score variance, where the true score variance is estimated from the CFA model above:

$$ {\rho}_{lin}=\frac{\sigma_T^2}{\sigma_{X^{\ast}}^2}=\frac{Var\left(\sum_{j=1}^J\sum_{m=1}^M{\lambda}_{mj}{\eta}_m\right)}{Var\left(\sum_{j=1}^J{X}_j^{\ast}\right)}. $$

Here \( {\rho}_{lin} \) measures the linear proportion of observed sum score variance that is attributed to the latent factors \( {\eta}_1,{\eta}_2,\dots,{\eta}_M \) (Bollen, 1989).
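In matrix form, the numerator is \( \mathbf{1}'\Lambda\Phi\Lambda'\mathbf{1} \), and the denominator adds the error variances, where Λ is the J × M loading matrix and Φ is the factor covariance matrix. The sketch below evaluates \( \rho_{lin} \) for a hypothetical two-factor solution. It assumes the errors are independent of each other and of the factors, and `linear_reliability` is an illustrative name.

```python
import numpy as np


def linear_reliability(loadings, factor_cov, error_var):
    """rho_lin = Var(T) / Var(X*) for X*_j = sum_m lambda_mj * eta_m + e_j.

    loadings   : (J, M) array, entry [j, m] = lambda_mj
    factor_cov : (M, M) covariance matrix of eta_1, ..., eta_M
    error_var  : length-J error variances (errors assumed independent of each
                 other and of the factors)
    """
    L = np.asarray(loadings, dtype=float)
    Phi = np.asarray(factor_cov, dtype=float)
    ones = np.ones(L.shape[0])
    var_T = ones @ L @ Phi @ L.T @ ones     # Var(sum_j sum_m lambda_mj eta_m)
    var_X = var_T + np.sum(error_var)       # add the independent error variances
    return var_T / var_X


# Hypothetical two-factor solution: items 1-2 load on eta_1, items 3-4 on eta_2.
L = np.array([[0.7, 0.0],
              [0.6, 0.0],
              [0.0, 0.8],
              [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])
theta = np.array([0.51, 0.64, 0.36, 0.75])
rho_lin = linear_reliability(L, Phi, theta)
```

For these values Var(T) = 4.394 and Var(X*) = 4.394 + 2.26, so \( \rho_{lin} \approx 0.660 \).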

Composite Reliability/Coefficient Omega ω

If item scores have only one factor, then M = 1 and the reliability above becomes

$$ {\rho}_{lin}=\frac{\sigma_T^2}{\sigma_{X^{\ast}}^2}=\frac{Var\left(\sum_{j=1}^J{\lambda}_j\eta \right)}{Var\left(\sum_{j=1}^J{X}_j^{\ast}\right)}. $$

If we further assume that the latent factor has unit variance, Var(η) = 1, then the reliability \( {\rho}_{lin} \) is referred to as the composite reliability or coefficient omega ω (McDonald, 1985):

$$ \omega =\frac{{\left(\sum {\lambda}_{x_j}\right)}^2}{{\left(\sum {\lambda}_{x_j}\right)}^2+\sum {\sigma}_{\varepsilon_j}^2} $$

where \( \sum {\lambda}_{x_j} \) is the sum of the factor loadings over the J items and \( \sum {\sigma}_{\varepsilon_j}^2 \) is the sum of the error variances.
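Coefficient omega is the one-factor special case of \( \rho_{lin} \) with Var(η) = 1. The following minimal sketch computes it from a hypothetical standardized solution, in which each error variance equals \( 1-\lambda_j^2 \). The helper name `coefficient_omega` is illustrative.

```python
import numpy as np


def coefficient_omega(loadings, error_var):
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = float(np.sum(loadings))
    return s**2 / (s**2 + float(np.sum(error_var)))


# Hypothetical standardized one-factor solution with Var(eta) = 1,
# so each error variance equals 1 - lambda_j^2.
lam = np.array([0.7, 0.6, 0.8, 0.5])
omega = coefficient_omega(lam, 1 - lam**2)
```

With these loadings, (Σλ)² = 6.76 and Σσ² = 2.26, giving ω = 6.76/9.02, about 0.749.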

Maximal Reliability H for Weighted Sum

Composite reliability represents the relation between a scale’s underlying latent factor and its unit-weighted composite, but a scale’s unit-weighted composite may not optimally reflect its underlying latent construct. The true score variance estimated in factor analysis allows for heterogeneous indicator weights, so it is reasonable to allow heterogeneous weights when creating a scale’s composite score.

$$ X={w}_1{X}_1+{w}_2{X}_2+\dots +{w}_J{X}_J=\sum_{j=1}^J{w}_j{X}_j $$

One approach that compares the true score variance of one common factor with the variance of an optimally weighted, rather than unit-weighted, composite is the maximal reliability H (e.g., Bentler, 2007). When the weight vector is

$$ W={\left({\lambda}^{\prime }{\psi}^{-1}\lambda \right)}^{-1/2}\ {\psi}^{-1}\lambda, $$

where λ is the vector of factor loadings, ψ is the residual (error) variance matrix, and the factor variance is fixed at 1, the maximal reliability is given by

$$ {\rho}_{w\prime x\left(\mathit{\max}\right)}=\frac{\lambda^{\prime }{\psi}^{-1}\lambda }{\lambda^{\prime }{\psi}^{-1}\lambda +1} $$

By assuming that the variance of each item is 1, the standardized version of maximal reliability for a single common factor model can be expressed as follows (e.g., Hancock & Mueller, 2001; Geldhof et al., 2014).

$$ H=\frac{\sum \frac{\lambda_{x_j}^2}{\sigma_{\varepsilon_j}^2}}{1+\sum \frac{\lambda_{x_j}^2}{\sigma_{\varepsilon_j}^2}}=\frac{\sum \frac{\lambda_{x_j}^2}{1-{\lambda}_{x_j}^2}}{1+\sum \frac{\lambda_{x_j}^2}{1-{\lambda}_{x_j}^2}}=\frac{1}{1+\frac{1}{\sum \frac{\lambda_{x_j}^2}{1-{\lambda}_{x_j}^2}}} $$

where \( {\lambda}_{x_j}^2 \) is the squared standardized factor loading of item j, and \( {\sigma}_{\varepsilon_j}^2 \) is the error variance of item j.
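To see that the weight vector W actually attains H, the sketch below computes H for hypothetical standardized loadings. It then evaluates the reliability of the weighted composite \( W'X \) directly, as (true-score variance)/(true-score variance + error variance), for both W and unit weights. It assumes a one-factor model with Var(η) = 1 and a diagonal ψ. The function names are illustrative.

```python
import numpy as np


def maximal_reliability(loadings, error_var):
    """H = a / (1 + a) with a = lambda' Psi^{-1} lambda, for diagonal Psi and Var(eta) = 1."""
    lam = np.asarray(loadings, dtype=float)
    a = np.sum(lam**2 / np.asarray(error_var, dtype=float))
    return a / (1 + a)


def weighted_reliability(weights, loadings, error_var):
    """Reliability of the composite w'X under a one-factor model with Var(eta) = 1."""
    w = np.asarray(weights, dtype=float)
    lam = np.asarray(loadings, dtype=float)
    true_var = (w @ lam) ** 2                                   # Var(w' lambda eta)
    error = np.sum(w**2 * np.asarray(error_var, dtype=float))   # w' Psi w for diagonal Psi
    return true_var / (true_var + error)


lam = np.array([0.7, 0.6, 0.8, 0.5])      # hypothetical standardized loadings
psi = 1 - lam**2                           # standardized items: error variance = 1 - lambda^2
H = maximal_reliability(lam, psi)

# Weight vector from the text: W = (lambda' Psi^{-1} lambda)^{-1/2} Psi^{-1} lambda.
a = np.sum(lam**2 / psi)
W = (lam / psi) / np.sqrt(a)
rel_W = weighted_reliability(W, lam, psi)              # reliability of the optimally weighted sum
rel_unit = weighted_reliability(np.ones(4), lam, psi)  # unit weights reproduce coefficient omega
```

With these weights, \( W'\lambda=\sqrt{a} \) and \( W'\psi W = 1 \), so the composite's reliability is exactly \( a/(1+a) = H \). The unit-weighted composite (omega, about 0.749 here) is no larger than H (about 0.784).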

3 Multilevel Reliabilities

Data collected in the social sciences and education often have a multilevel structure, such as students nested within schools. For such data, multilevel reliabilities have been proposed.

3.1 Multilevel Confirmatory Factor Analysis (MCFA)

Muthén (2011) defined multilevel confirmatory factor analysis (MCFA) by assuming that a one-factor model holds for both the between and the within components. The observed value of the p-dimensional variable \( {y}_{gi} \) is partitioned into three components:

$$ {y}_{gi}=V+{y}_{Bg}+{y}_{wgi} $$

where \( {y}_{gi} \) is the observed value for individual i in group g, V is the grand mean, \( {y}_{Bg} \) is the between-group part of the observed value, and \( {y}_{wgi} \) is the within-group part. The multilevel CFA specifies separate models at the between-group and within-group levels. Suppose there are h factors between groups and m factors within groups. The between-group CFA model is then

$$ {y}_{Bg}={\varLambda}_{Bg}{\eta}_{Bg}+{\varepsilon}_{Bg} $$

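where \( {\varLambda}_{Bg} \) is a (p × h) matrix of factor loadings with elements \( {\lambda}_{Bg} \), and \( {\eta}_{Bg} \) is an h-dimensional vector of factor scores assumed to follow \( {\eta}_{Bg}\sim MN\left({0}_h,{\Psi}_{B,h\times h}\right) \). The term \( {\varepsilon}_{Bg} \) is a p-dimensional vector of errors assumed to follow \( {\varepsilon}_{Bg}\sim MN\left({0}_p,{\Phi}_{B,p\times p}\right) \), where MN denotes the multivariate normal distribution. The within-group CFA model is defined as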

$$ {y}_{wgi}={\varLambda}_{wg}{\eta}_{wgi}+{\varepsilon}_{wgi} $$

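where \( {\varLambda}_{wg} \) is a (p × m) matrix of factor loadings with elements \( {\lambda}_{wg} \), \( {\eta}_{wgi} \) is an m-dimensional vector of factor scores assumed to follow \( {\eta}_{wgi}\sim MN\left({0}_m,{\Psi}_{W,m\times m}\right) \), and \( {\varepsilon}_{wgi} \) is a p-dimensional vector of errors assumed to follow \( {\varepsilon}_{wgi}\sim MN\left({0}_p,{\Phi}_{W,p\times p}\right) \).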

If there is only one factor at each of the between-group and within-group levels, the variance \( {\sigma}_{y_{gi}}^2 \) of an observed variable \( {y}_{gi} \) is decomposed (Muthén, 2011) as

$$ {\sigma}_{y_{gi}}^2={\lambda}_{Bg}^2{\sigma}_{\eta Bg}^2+{\sigma}_{\varepsilon Bg}^2+{\lambda}_{wg}^2{\sigma}_{\eta wgi}^2+{\sigma}_{\varepsilon wgi}^2 $$
$$ ={\sigma}_{BF}^2+{\sigma}_{BE}^2+{\sigma}_{WF}^2+{\sigma}_{WE}^2 $$

where \( {\lambda}_{Bg} \) is the between-group factor loading, \( {\sigma}_{\eta Bg}^2 \) is the variance of the between-group factor scores, \( {\sigma}_{\varepsilon Bg}^2 \) is the between-group error variance, \( {\lambda}_{wg} \) is the within-group factor loading, \( {\sigma}_{\eta wgi}^2 \) is the variance of the within-group factor scores, and \( {\sigma}_{\varepsilon wgi}^2 \) is the within-group error variance. The four terms are therefore the between-level factor variance component \( {\sigma}_{BF}^2={\lambda}_{Bg}^2{\sigma}_{\eta Bg}^2 \), the between-level error variance \( {\sigma}_{BE}^2 \), the within-level factor variance component \( {\sigma}_{WF}^2={\lambda}_{wg}^2{\sigma}_{\eta wgi}^2 \), and the within-level error variance \( {\sigma}_{WE}^2 \).
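The decomposition can be illustrated by simulation. The sketch below generates hypothetical two-level data for a single item, with one factor at each level. It then checks that the sample variance of \( y_{gi} \) is close to \( \sigma_{BF}^2+\sigma_{BE}^2+\sigma_{WF}^2+\sigma_{WE}^2 \). All parameter values are assumptions made for illustration, and the simulation draws the between-group and within-group parts independently.

```python
import numpy as np

rng = np.random.default_rng(1)
G, n = 20_000, 10                            # hypothetical: 20,000 groups of 10 individuals
V = 3.0                                      # grand mean
lam_B, var_etaB, var_epsB = 0.8, 1.0, 0.2    # assumed between-group parameters
lam_W, var_etaW, var_epsW = 0.7, 1.0, 0.5    # assumed within-group parameters

# Between-group part y_Bg: one draw per group, shared by all members of group g.
y_B = lam_B * rng.normal(0.0, np.sqrt(var_etaB), G) + rng.normal(0.0, np.sqrt(var_epsB), G)
# Within-group part y_wgi: one draw per individual.
y_W = (lam_W * rng.normal(0.0, np.sqrt(var_etaW), (G, n))
       + rng.normal(0.0, np.sqrt(var_epsW), (G, n)))
y = V + y_B[:, None] + y_W                   # y_gi = V + y_Bg + y_wgi

components = {
    "BF": lam_B**2 * var_etaB,               # between-level factor component
    "BE": var_epsB,                          # between-level error variance
    "WF": lam_W**2 * var_etaW,               # within-level factor component
    "WE": var_epsW,                          # within-level error variance
}
var_model = sum(components.values())         # 0.64 + 0.2 + 0.49 + 0.5
var_sample = y.var()
```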

3.2 Multilevel Reliabilities

To apply MCFA to reliability, the p observed variables are taken to be the J items of the test, and we assume a single factor at each of the between-group and within-group levels.

Multilevel Alpha

The multilevel alpha is calculated as

$$ \mathrm{within}-\mathrm{group}\ \mathrm{level}\ \alpha =\frac{J^2\ast {\overline{\sigma}}_{wg i, wgj}}{\sigma_{wg}^2} $$
$$ \mathrm{between}-\mathrm{group}\ \mathrm{level}\ \alpha =\frac{J^2\ast {\overline{\sigma}}_{Bg i, Bgj}}{\sigma_{Bg}^2} $$

where J is the number of items, \( {\overline{\sigma}}_{wgi, wgj} \) is the mean within-group covariance between distinct items i and j (i ≠ j), \( {\sigma}_{wg}^2 \) is the within-group variance of the total (sum) score, \( {\overline{\sigma}}_{Bgi, Bgj} \) is the mean between-group covariance between distinct items i and j, and \( {\sigma}_{Bg}^2 \) is the between-group variance of the total score.
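Multilevel alpha applies the single-level formula separately to the within-group and between-group covariance matrices. In the sketch below, those matrices are hypothetical, built as \( \Sigma=\lambda\lambda'+\operatorname{diag}(\theta) \) from assumed one-factor loadings λ and error variances θ at each level. In practice, they would come from an MCFA fit. The helper name `alpha_from_cov` is illustrative.

```python
import numpy as np


def alpha_from_cov(cov):
    """J^2 * mean off-diagonal covariance / variance of the sum score (applied per level)."""
    cov = np.asarray(cov, dtype=float)
    J = cov.shape[0]
    mean_cov = cov[np.triu_indices(J, k=1)].mean()
    return J**2 * mean_cov / cov.sum()


# Hypothetical level-specific covariance matrices implied by one-factor models.
lam_w = np.array([0.7, 0.6, 0.8, 0.5])
theta_w = np.array([0.5, 0.6, 0.4, 0.7])
lam_b = np.array([0.4, 0.3, 0.5, 0.35])
theta_b = np.array([0.05, 0.04, 0.06, 0.05])
Sigma_W = np.outer(lam_w, lam_w) + np.diag(theta_w)   # within-group covariance matrix
Sigma_B = np.outer(lam_b, lam_b) + np.diag(theta_b)   # between-group covariance matrix

alpha_within = alpha_from_cov(Sigma_W)
alpha_between = alpha_from_cov(Sigma_B)
```

Because these matrices come from one-factor models with uncorrelated errors, each alpha stays at or below the omega computed from the same level's loadings. This is consistent with alpha being a lower bound.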

Multilevel Omega

Multilevel omega is obtained as

$$ \mathrm{within}-\mathrm{group}\ \mathrm{level}\ \upomega =\frac{{\left(\sum {\lambda}_{wg_j}\right)}^2}{{\left(\sum {\lambda}_{wg_j}\right)}^2+\sum {\sigma}_{{\varepsilon wg}_j}^2\ } $$
$$ \mathrm{between}-\mathrm{group}\ \mathrm{level}\ \upomega =\frac{{\left(\sum {\lambda}_{Bg_j}\right)}^2}{{\left(\sum {\lambda}_{Bg_j}\right)}^2+\sum {\sigma}_{{\varepsilon Bg}_j}^2} $$

where \( \sum {\lambda}_{wg_j} \) is the sum of the within-group factor loadings over the items, \( \sum {\sigma}_{\varepsilon wg_j}^2 \) is the sum of the within-group error variances, \( \sum {\lambda}_{Bg_j} \) is the sum of the between-group factor loadings, and \( \sum {\sigma}_{\varepsilon Bg_j}^2 \) is the sum of the between-group error variances. As in the single-level case, the factor variance at each level is fixed at 1.
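Computationally, multilevel omega is the single-level omega evaluated twice. The minimal sketch below uses hypothetical within-group and between-group loadings and error variances, assuming the factor variance is fixed at 1 at each level.

```python
import numpy as np


def omega(loadings, error_var):
    """(sum lambda)^2 / ((sum lambda)^2 + sum error variances), with Var(eta) = 1."""
    s = float(np.sum(loadings))
    return s**2 / (s**2 + float(np.sum(error_var)))


# Hypothetical MCFA estimates (factor variance fixed at 1 at each level).
lam_w = np.array([0.7, 0.6, 0.8, 0.5])
theta_w = np.array([0.5, 0.6, 0.4, 0.7])
lam_b = np.array([0.4, 0.3, 0.5, 0.35])
theta_b = np.array([0.05, 0.04, 0.06, 0.05])

omega_within = omega(lam_w, theta_w)     # 6.76 / (6.76 + 2.2)
omega_between = omega(lam_b, theta_b)    # 2.4025 / (2.4025 + 0.2)
```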

Multilevel H

The multilevel maximal reliability H is calculated as

$$ \mathrm{within}-\mathrm{group}\ \mathrm{level}\ H=\frac{1}{1+\frac{1}{\sum \frac{\lambda_{wg_j}^2}{1-{\lambda}_{wg_j}^2}}} $$
$$ \mathrm{between}-\mathrm{group}\ \mathrm{level}\ H=\frac{1}{1+\frac{1}{\sum \frac{\lambda_{Bg_j}^2}{1-{\lambda}_{Bg_j}^2}}} $$

where \( {\lambda}_{wg_j}^2 \) is the squared standardized within-group factor loading of item j and \( {\lambda}_{Bg_j}^2 \) is the squared standardized between-group factor loading of item j.
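The sketch below evaluates multilevel H from standardized loadings at each level. The loadings are hypothetical and are assumed to have been standardized separately within each level, so that each error variance is \( 1-\lambda^2 \) at that level.

```python
import numpy as np


def maximal_reliability_H(std_loadings):
    """H = 1 / (1 + 1 / sum(lambda^2 / (1 - lambda^2))) for standardized loadings."""
    lam = np.asarray(std_loadings, dtype=float)
    a = np.sum(lam**2 / (1 - lam**2))
    return 1 / (1 + 1 / a)


# Hypothetical standardized loadings, standardized separately within each level.
lam_w_std = np.array([0.70, 0.61, 0.78, 0.51])
lam_b_std = np.array([0.87, 0.83, 0.90, 0.84])

H_within = maximal_reliability_H(lam_w_std)
H_between = maximal_reliability_H(lam_b_std)
```

At each level, H is at least as large as the omega computed from the same standardized loadings, because omega corresponds to unit weights rather than optimal ones.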

4 Multilevel Reliability for Categorical Responses

Items with ordered categories are very common in the social and behavioral sciences. When the observed data are ordinal, fitting linear SEM models with the linear estimation method is not desirable. It violates the assumption of continuous, normally distributed indicators and yields inflated chi-square estimates and attenuated factor loadings (Bollen, 1989). To address this problem, we assume that each observed categorical score \( {X}_j \) arises from an underlying continuous variable \( {X}_j^{\ast } \), with the following nonlinear relationship between \( {X}_j \) and \( {X}_j^{\ast } \):

$$ {X}_j=\begin{cases}{C}_j-1, & \text{if } {X}_j^{\ast}\ge {\nu}_{C_j-1}\\ \vdots & \quad \vdots \\ 1, & \text{if } {\nu}_1\le {X}_j^{\ast }<{\nu}_2\\ 0, & \text{if } {X}_j^{\ast }<{\nu}_1.\end{cases} $$

where \( {C}_j \) is the number of categories of \( {X}_j \), and \( {\nu}_i \) (i = 1, 2, …, \( {C}_j-1 \)) are the category thresholds. If \( {X}_j^{\ast } \) is less than \( {\nu}_1 \), then \( {X}_j=0 \); if \( {\nu}_1\le {X}_j^{\ast }<{\nu}_2 \), then \( {X}_j=1 \); and if \( {X}_j^{\ast } \) is at or above \( {\nu}_{C_j-1} \), then \( {X}_j={C}_j-1 \). If the structure of the test is well specified, this approach can estimate reliability more accurately than the linear SEM approach.
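The threshold rule amounts to counting how many thresholds \( X_j^{\ast} \) has reached. The sketch below implements this with NumPy's `searchsorted`. Assuming \( X_j^{\ast} \) is standard normal, it also computes the implied category probabilities. The thresholds are hypothetical, and `categorize` is an illustrative name.

```python
import numpy as np
from scipy.stats import norm


def categorize(x_star, thresholds):
    """Map latent scores to categories 0..C-1: X = number of thresholds <= X*."""
    return np.searchsorted(np.asarray(thresholds), x_star, side="right")


nu = np.array([-1.0, 0.0, 1.2])          # hypothetical thresholds, C_j = 4 categories
x_star = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.2, 3.0])
cats = categorize(x_star, nu)            # [0, 1, 1, 2, 2, 3, 3]

# Assuming a standard-normal X*, the category probabilities follow from the thresholds.
cdf = np.concatenate(([0.0], norm.cdf(nu), [1.0]))
probs = np.diff(cdf)                     # P(X_j = 0), ..., P(X_j = C_j - 1)
mean_from_probs = np.sum(probs * np.arange(probs.size))
mean_from_thresholds = nu.size - norm.cdf(nu).sum()   # E(X_j) = (C_j - 1) - sum_k Phi(nu_k)
```

The identity \( E(X_j)=(C_j-1)-\sum_k\Phi_1(\nu_k) \) checked in the last line is the same one used for the expectations in the covariance derivation for the maximal reliability.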

Multilevel Alpha for Tests with Categorical Responses

Single-level alpha for categorical responses can be calculated from polychoric correlations, assuming that the variance of each item is 1. The resulting reliability is calculated from parallel measures. For multilevel alpha, reliabilities are calculated at both the within-group and between-group levels.

Multilevel Composite Reliability for Tests with Categorical Responses

Kim, Lu and Cohen (2020) proposed single-level composite reliabilities to investigate reliability for items with the same or different numbers of ordered categories. For multilevel composite reliabilities, the same formula can be applied at both the within-group and between-group levels. Multilevel omega for tests with categorical responses is a simplified version of this formula for one-factor models.

Multilevel Maximal Reliability for Tests with Categorical Responses

To our knowledge, this topic has not been studied before. In this article, we derive the formula as follows. Suppose X and \( \overset{\sim }{X} \) are two parallel tests given by the weighted sums \( X=\sum_{j=1}^J{w}_j{X}_j \) and \( \overset{\sim }{X}=\sum_{j^{\prime }=1}^J{w}_{j^{\prime }}{\overset{\sim }{X}}_{j^{\prime }} \). To estimate the reliability under the nonlinear measurement model, we use the correlation between X and \( \overset{\sim }{X} \), which is

$$ {\rho}_{X\overset{\sim }{X}}=\frac{Cov\left(X,\overset{\sim }{X}\right)}{\sqrt{\mathit{\operatorname{var}}(X)\mathit{\operatorname{var}}\left(\overset{\sim }{X}\right)}}. $$

The numerator for weighted sums is

$$ Cov\left(X,\overset{\sim }{X}\right)= Cov\left(\sum_{j=1}^J{w}_j{X}_j,\sum_{j^{\prime }=1}^J{w}_{j^{\prime }}{\overset{\sim }{X}}_{j^{\prime }}\right)=\sum_{j=1}^J\sum_{j^{\prime }=1}^J{w}_j{w}_{j^{\prime }} Cov\left({X}_j,{\overset{\sim }{X}}_{j^{\prime }}\right), $$

in which

$$ Cov\left({X}_j,{\overset{\sim }{X}}_{j^{\prime }}\right)=E\left({X}_j{\overset{\sim }{X}}_{j^{\prime }}\right)-E\left({X}_j\right)E\left({\overset{\sim }{X}}_{j^{\prime }}\right) $$
$$ \begin{aligned} &=\left(\sum_{k=1}^{C_j-1}\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_2\left({\nu}_{j_k},{h}_{j_l^{\prime }};{\rho}_M\right)-\left({C}_{j^{\prime }}-1\right)\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)-\left({C}_j-1\right)\right. \\ &\quad \times \left. \sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right)+\left({C}_j-1\right)\left({C}_{j^{\prime }}-1\right)\right)\\ & \quad -\left(-\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)+\left({C}_j-1\right)\right)\left(-\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right)+\left({C}_{j^{\prime }}-1\right)\right) \end{aligned} $$
$$ =\sum_{k=1}^{{\mathrm{C}}_j-1}\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_2\left({\nu}_{j_k},{h}_{j_l^{\prime }};{\rho}_M\right)-\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right) $$

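where \( {\Phi}_1 \) and \( {\Phi}_2 \) denote the standard univariate and bivariate normal distribution functions, and \( {\nu}_{j_k} \) and \( {h}_{j_l^{\prime }} \) are the thresholds of \( {X}_j \) and \( {\overset{\sim }{X}}_{j^{\prime }} \). The latent responses are assumed to be standardized, and \( {\rho}_M \) is the correlation between the latent responses underlying \( {X}_j \) and \( {\overset{\sim }{X}}_{j^{\prime }} \):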

$$ {\rho}_M=\sum_{m=1}^M\sum_{m^{\prime }=1}^M{\lambda}_{mj}{\lambda}_{m^{\prime }{j}^{\prime }}{\rho}_{\eta_m{\eta}_{m^{\prime }}} $$
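The covariance expression can be checked numerically. The following sketch assumes standard-normal latent responses and a one-factor model with unit factor variance, so that \( \rho_M=\lambda_j\lambda_{j^{\prime}} \). It evaluates the closed form with SciPy's bivariate normal CDF and compares it with a Monte Carlo estimate. The thresholds and loadings are hypothetical, and `phi2` and `item_cov` are illustrative helper names.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal


def phi2(a, b, r):
    """Standard bivariate normal CDF P(Z1 <= a, Z2 <= b) with correlation r."""
    if r >= 1.0:                        # degenerate case Z1 = Z2
        return float(norm.cdf(min(a, b)))
    if r <= -1.0:                       # degenerate case Z1 = -Z2
        return max(0.0, float(norm.cdf(a) + norm.cdf(b) - 1.0))
    return float(multivariate_normal.cdf([a, b], mean=[0.0, 0.0],
                                         cov=[[1.0, r], [r, 1.0]]))


def item_cov(nu, h, r):
    """sum_k sum_l Phi2(nu_k, h_l; r) - sum_k Phi1(nu_k) * sum_l Phi1(h_l)."""
    joint = sum(phi2(a, b, r) for a in nu for b in h)
    return joint - norm.cdf(nu).sum() * norm.cdf(h).sum()


# Hypothetical one-factor example with Var(eta) = 1, so rho_M = lambda_j * lambda_j'.
nu = np.array([-1.0, 0.0, 1.0])         # thresholds of X_j (4 categories)
h = np.array([-0.5, 0.8])               # thresholds of X~_j' (3 categories)
lam_j, lam_jp = 0.8, 0.6
rho_M = lam_j * lam_jp
cov_formula = item_cov(nu, h, rho_M)

# Monte Carlo check: simulate standardized latent responses and cut them at the thresholds.
rng = np.random.default_rng(2)
n = 400_000
eta = rng.normal(size=n)
x_star = lam_j * eta + np.sqrt(1 - lam_j**2) * rng.normal(size=n)
y_star = lam_jp * eta + np.sqrt(1 - lam_jp**2) * rng.normal(size=n)
X_j = np.searchsorted(nu, x_star, side="right")      # number of thresholds <= X*
Xt_jp = np.searchsorted(h, y_star, side="right")
cov_mc = np.cov(X_j, Xt_jp)[0, 1]
```

Setting r = 1 with the same thresholds on both sides gives the item variance. The special-case branch handles this exactly, because the bivariate normal covariance matrix is singular there.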

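Because X and \( \overset{\sim }{X} \) are parallel, \( \operatorname{var}(X)=\operatorname{var}\left(\overset{\sim }{X}\right) \), so the denominator \( \sqrt{\operatorname{var}(X)\operatorname{var}\left(\overset{\sim }{X}\right)} \) reduces to Var(X):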

$$ \begin{aligned} Var(X)& = Var\left(\sum_{j=1}^J{w}_j{X}_j\right)=\sum_{j=1}^J\sum_{j^{\prime }=1}^J{w}_j{w}_{j^{\prime }} Cov\left({X}_j,{X}_{j^{\prime }}\right)\\ & =\sum_{j=1}^J\sum_{j^{\prime }=1}^J{w}_j{w}_{j^{\prime }}\left(\sum_{k=1}^{C_j-1}\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_2\left({\nu}_{j_k},{h}_{j_l^{\prime }};{\rho}_{X_j^{\ast }{X}_{j^{\prime}}^{\ast }}\right)\right. \\ &\quad \left. -\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right)\right) \end{aligned} $$

Therefore, the reliability of a weighted sum of ordered categorical responses, calculated from parallel measures, is

$$ {\rho}_{Cat}=\frac{\sum_{j=1}^J\sum_{j^{\prime }=1}^J{w}_j{w}_{j^{\prime }}\left[\sum_{k=1}^{C_j-1}\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_2\left({\nu}_{j_k},{h}_{j_l^{\prime }};{\rho}_M\right)-\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right)\right]}{\sum_{j=1}^J\sum_{j^{\prime }=1}^J{w}_j{w}_{j^{\prime }}\left[\sum_{k=1}^{C_j-1}\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_2\left({\nu}_{j_k},{h}_{j_l^{\prime }};{\rho}_{X_j^{\ast }{X}_{j^{\prime}}^{\ast }}\right)-\sum_{k=1}^{C_j-1}{\Phi}_1\left({\nu}_{j_k}\right)\sum_{l=1}^{C_{j^{\prime }}-1}{\Phi}_1\left({h}_{j_l^{\prime }}\right)\right]} $$

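For one-factor models with unit factor variance, standardized latent responses, and uncorrelated errors, the formula above simplifies considerably. In that case, \( {\rho}_M={\lambda}_j{\lambda}_{j^{\prime }} \), and \( {\rho}_{X_j^{\ast }{X}_{j^{\prime}}^{\ast }}={\lambda}_j{\lambda}_{j^{\prime }} \) as well for j ≠ j′. The numerator and denominator therefore differ only in the diagonal terms j = j′, where \( {\rho}_{X_j^{\ast }{X}_j^{\ast }}=1 \). The multilevel maximal reliability of a weighted sum is obtained by applying the formula at both the within-group and between-group levels.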
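The following is a minimal sketch of \( \rho_{Cat} \) for a one-factor model. It assumes standardized latent responses, unit factor variance, and uncorrelated errors, so that \( \rho_M=\lambda_j\lambda_{j^{\prime}} \). Under these assumptions, \( \rho_{X_j^{\ast}X_{j^{\prime}}^{\ast}} \) equals \( \lambda_j\lambda_{j^{\prime}} \) for j ≠ j′ and 1 for j = j′. The loadings, thresholds (for items with different numbers of categories), and unit weights are hypothetical. The result is compared with the simulated correlation between two parallel forms that share η but have independent errors. For the multilevel version, the same function would be applied separately to the within-group and between-group parameters.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal


def phi2(a, b, r):
    """Standard bivariate normal CDF with correlation r, assuming -1 < r <= 1."""
    if r >= 1.0:                                  # Z1 = Z2: P(Z <= min(a, b))
        return float(norm.cdf(min(a, b)))
    return float(multivariate_normal.cdf([a, b], mean=[0.0, 0.0],
                                         cov=[[1.0, r], [r, 1.0]]))


def item_cov(nu, h, r):
    joint = sum(phi2(a, b, r) for a in nu for b in h)
    return joint - norm.cdf(nu).sum() * norm.cdf(h).sum()


def rho_cat(weights, loadings, thresholds):
    """rho_Cat for a one-factor model (hypothetical helper, not a library function)."""
    w, lam = np.asarray(weights, float), np.asarray(loadings, float)
    num = den = 0.0
    for j in range(lam.size):
        for jp in range(lam.size):
            r_M = lam[j] * lam[jp]                # parallel forms share only eta
            r_X = 1.0 if j == jp else r_M         # corr(X*_j, X*_j') within one form
            num += w[j] * w[jp] * item_cov(thresholds[j], thresholds[jp], r_M)
            den += w[j] * w[jp] * item_cov(thresholds[j], thresholds[jp], r_X)
    return num / den


lam = np.array([0.8, 0.7, 0.6, 0.75])             # hypothetical standardized loadings
thresholds = [np.array([-1.0, 0.0, 1.0]),          # 4 categories
              np.array([-0.5, 0.8]),               # 3 categories
              np.array([-1.2, -0.3, 0.4, 1.1]),    # 5 categories
              np.array([0.2])]                     # binary item
w = np.ones(4)                                     # unit weights; other weights also work
rel_formula = rho_cat(w, lam, thresholds)

# Monte Carlo check: correlation between two simulated parallel forms.
rng = np.random.default_rng(3)
n = 200_000
eta = rng.normal(size=n)


def parallel_form():
    z = eta[:, None] * lam + rng.normal(size=(n, lam.size)) * np.sqrt(1 - lam**2)
    items = [np.searchsorted(t, z[:, j], side="right") for j, t in enumerate(thresholds)]
    return np.column_stack(items) @ w


rel_mc = np.corrcoef(parallel_form(), parallel_form())[0, 1]
```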

5 Conclusions

This study proposed a confirmatory factor analysis approach to multilevel reliability for tests with ordered categorical item responses. It extended single-level reliabilities to multilevel reliabilities and provided closed-form formulas for calculating various types of multilevel nonlinear reliabilities, including the composite reliability (coefficient omega) and the maximal reliability.