Abstract
No research has yet addressed the multilevel composite reliability (or coefficient omega) and the multilevel maximal reliability H for the weighted sum of a test with ordered categorical responses. To fill this gap, this study reviewed various approaches to reliability, extended single-level reliabilities to multilevel reliabilities, and provided closed-form formulas for multilevel nonlinear SEM reliabilities via a multilevel confirmatory factor analysis approach. Multilevel Cronbach's alpha was also considered.
1 Introduction
Reliability is a measure of the overall internal consistency of a test. It has been widely used in statistical, psychological, educational, social, and behavioral research when data are collected from responses to items in a test or a questionnaire (Bollen, 1989; Finney & DiStefano, 2006). A high reliability indicates that the measure provides similar and stable results under consistent conditions. Many approaches have been proposed to estimate reliability. Among them, the classical test theory (CTT) approach has been widely used, and the Cronbach alpha is the most popular reliability coefficient. However, it provides only a lower bound on the consistency of a test (Green et al., 1977; Novick & Lewis, 1967; Sijtsma, 2009). Another approach, based on structural equation modeling (SEM), has been proposed to obtain more accurate reliabilities (Bentler, 2009; Bollen, 1989; Green & Yang, 2009; Miller, 1995; Raykov, 1997; Raykov & Shrout, 2002). These methods, however, focus on continuous outcomes.
For tests with ordered categorical responses, Green and Yang (2009) proposed a nonlinear reliability coefficient within an SEM framework, and this nonlinear reliability has been found to be more accurate than the linear reliability that treats categorical scores as continuous. However, they considered only items with the same number of categories. Kim, Lu and Cohen (2020) extended their research to items with different numbers of categories and proposed a general formula for reliabilities. These formulas apply only to single-level data structures, and they do not cover the composite reliability or coefficient omega (McDonald, 1985) or the maximal reliability H of a weighted sum (Bentler, 2007) for tests with categorical responses. So far, there has been no research on this topic.
In order to fill the gap, the current study reviewed various approaches to reliabilities, extended single level reliabilities to multilevel reliabilities, and provided closed-form formulas for multilevel nonlinear SEM reliabilities for tests with ordered categorical responses via a multilevel confirmatory factor analysis (MCFA) approach. Multilevel alpha was also considered.
2 Reliabilities
2.1 CTT Approach
Suppose there are J items in a test. In classical test theory (CTT), an observed score \( X_j \) on item j (j = 1, …, J) is composed of two uncorrelated components, a latent true score or trait, \( T_j \), and an error score, \( \epsilon_j \), with mean 0:
\[ X_j = T_j + \epsilon_j . \]
Let X, T and ϵ be the sums of the observed scores, of the true scores, and of the error scores, respectively, across the J items. Then \( X=\sum_{j=1}^J{X}_j \), \( T=\sum_{j=1}^J{T}_j \), and \( \epsilon =\sum_{j=1}^J{\epsilon}_j \), so we have X = T + ϵ.
We want to determine how much of the variance of the observed score is due to the latent true score rather than the error. Reliability provides such a measure. The reliability coefficient of a test is defined as the ratio of the true variance to the total variance, which is the sum of the true variance and the error variance. Mathematically, it is
\[ \rho =\frac{\sigma_T^2}{\sigma_x^2}=\frac{\sigma_T^2}{\sigma_T^2+{\sigma}_{\epsilon}^2}, \]
where \( {\sigma}_T^2 \) is the variance of T, \( {\sigma}_{\epsilon}^2 \) is the variance of ϵ, and \( {\sigma}_x^2 \) is the variance of X (Lord & Novick, 1968). The reliability thus quantifies the proportion of observed score variance that is attributable to the true score, and it indicates how much random error the scores contain around the true score.
Cronbach Alpha
Under CTT, there are many ways to estimate reliability: test-retest, alternative forms, split-half, the Spearman-Brown prophecy formula, and the Cronbach alpha (or coefficient alpha). Among them, the Cronbach alpha (Cronbach, 1951) is the most commonly used. The Cronbach alpha is defined as
\[ \alpha =\frac{J^2\,{\overline{\sigma}}_{xx\prime }}{\sigma_x^2}, \]
where J is the total number of items, \( {\sigma}_x^2 \) is the variance of the observed scores of the test X, and \( {\overline{\sigma}}_{xx\prime } \) is the mean of the off-diagonal covariances between two parallel tests X and X ′. Specifically, alpha can be calculated as
\[ \alpha =\frac{J}{J-1}\cdot \frac{2\sum_i\sum_{j,i<j}{\sigma}_{x_i,{x}_j}}{\sigma_x^2}, \]
where \( \sum_i\sum_{j,i<j}{\sigma}_{x_i,{x}_j} \) is the sum of the lower (or upper) off-diagonal covariances between items i and j.
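As a rough illustration, the sketch below computes alpha from raw item scores in the two ways described above: once from the mean off-diagonal covariance and once from the sum of the off-diagonal covariances for i < j. The data and the helper names `covariance` and `cronbach_alpha` are hypothetical, and sample covariances stand in for the population quantities.

```python
def covariance(a, b):
    """Unbiased sample covariance of two equal-length score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def cronbach_alpha(items):
    """Return alpha computed in two equivalent ways.

    `items` is a list of J lists, one list of respondent scores per item.
    """
    J = len(items)
    # Off-diagonal covariances for item pairs with i < j.
    off_diag = [covariance(items[i], items[j])
                for i in range(J) for j in range(i + 1, J)]
    # Variance of the sum score X = sum_j X_j.
    total = [sum(scores) for scores in zip(*items)]
    var_total = covariance(total, total)
    # Form 1: J^2 * (mean off-diagonal covariance) / Var(X).
    alpha_mean = J ** 2 * (sum(off_diag) / len(off_diag)) / var_total
    # Form 2: J / (J - 1) * 2 * (sum of covariances for i < j) / Var(X).
    alpha_sum = J / (J - 1) * 2 * sum(off_diag) / var_total
    return alpha_mean, alpha_sum


# Hypothetical scores: 3 items answered by 6 respondents.
items = [
    [1, 2, 3, 4, 5, 4],
    [2, 2, 3, 5, 5, 3],
    [1, 3, 3, 4, 4, 4],
]
alpha_mean, alpha_sum = cronbach_alpha(items)
```

The two values agree because there are J(J − 1)/2 item pairs with i < j, so the mean and the sum of the off-diagonal covariances differ only by that constant.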
2.2 CFA Approach
However, the Cronbach alpha is only a lower bound on the internal consistency of the test (Green et al., 1977; Novick & Lewis, 1967; Sijtsma, 2009). To get a better estimate, reliability can instead be obtained with the structural equation modeling (SEM) approach. Specifically, this approach uses confirmatory factor analysis (CFA) to estimate reliability. Tests are assumed to have an underlying factorial structure (factors, e.g., reading ability, math ability, or personality), so that each item score is modeled as
\[ {X}_j^{\ast }=\sum_{m=1}^M{\lambda}_{mj}{\eta}_m+{\varepsilon}_j, \]
where \( {X}_j^{\ast } \) is a continuous score for item j, M is the number of latent factors, \( {\eta}_m \) (1 ≤ m ≤ M) are latent factors weighted by the corresponding factor loadings \( {\lambda}_{mj} \), and \( {\varepsilon}_j \) is a measurement error term. We assume the errors are independent and have variance \( {\sigma}_{\varepsilon_j}^2 \) for item j.
Suppose there are J items in the test and \( {X}^{\ast } \) and T are the sums of the observed scores and of the true scores, respectively. Then \( {X}^{\ast }=\sum_{j=1}^J{X}_j^{\ast } \) and \( T=\sum_{j=1}^J\sum_{m=1}^M{\lambda}_{mj}{\eta}_m \). Because item scores are represented by a confirmatory factor analysis model, the linear reliability \( {\rho}_{lin} \) is calculated as the ratio of the true sum score variance to the observed sum score variance, where the true score variance is estimated using the CFA model above:
\[ {\rho}_{lin}=\frac{\sigma_T^2}{\sigma_{X^{\ast}}^2}=\frac{\mathrm{Var}\left(\sum_{j=1}^J\sum_{m=1}^M{\lambda}_{mj}{\eta}_m\right)}{\mathrm{Var}\left(\sum_{j=1}^J{X}_j^{\ast}\right)}. \]
Here \( {\rho}_{lin} \) measures the proportion of the observed sum score variance that is attributed, under the linear model, to the latent factors \( {\eta}_1,{\eta}_2,\dots, {\eta}_M \) (Bollen, 1989).
Composite Reliability/Coefficient Omega ω
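If item scores have only one factor, then M = 1 and, writing \( {\lambda}_{x_j} \) for the loading of item j on the single factor η, the reliability above becomes
\[ {\rho}_{lin}=\frac{{\left(\sum {\lambda}_{x_j}\right)}^2\mathrm{Var}\left(\eta \right)}{{\left(\sum {\lambda}_{x_j}\right)}^2\mathrm{Var}\left(\eta \right)+\sum {\sigma}_{\varepsilon_j}^2}. \]
If we further assume that the latent factor has unit variance, Var(η) = 1, then the reliability \( {\rho}_{lin} \) is referred to as composite reliability or coefficient omega ω (McDonald, 1985):
\[ \omega =\frac{{\left(\sum {\lambda}_{x_j}\right)}^2}{{\left(\sum {\lambda}_{x_j}\right)}^2+\sum {\sigma}_{\varepsilon_j}^2}, \]
where \( \sum {\lambda}_{x_j} \) is the sum of the factor loadings over the J items, and \( \sum {\sigma}_{\varepsilon_j}^2 \) is the sum of all error variances.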
Maximal Reliability H for Weighted Sum
Composite reliability represents the relation between a scale’s underlying latent factor and its unit-weighted composite, but a scale’s unit-weighted composite may not optimally reflect its underlying latent construct. The true score variance estimated in factor analysis allows for heterogeneous indicator weights, so it is reasonable to allow heterogeneous weights when creating a scale’s composite score.
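One approach that relates the true score variance for one common factor to the variance of an optimally weighted scale is maximal reliability H (e.g., Bentler, 2007). When the weight vector is
\[ w={\psi}^{-1}\lambda, \]
where λ is the factor loading matrix and ψ is the residual variance matrix, then, with the factor variance fixed at 1, the maximal reliability is given by
\[ H=\frac{\lambda^{\prime }{\psi}^{-1}\lambda }{1+{\lambda}^{\prime }{\psi}^{-1}\lambda }. \]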
By assuming that the variance of each item is 1, the standardized version of maximal reliability for a single common factor model can be expressed as follows (e.g., Hancock & Mueller, 2001; Geldhof et al., 2014):
\[ H={\left[1+{\left(\sum_{j=1}^J\frac{\lambda_{x_j}^2}{\sigma_{\varepsilon_j}^2}\right)}^{-1}\right]}^{-1}, \]
where \( {\lambda}_{x_j}^2 \) is the squared standardized factor loading of item j, and \( {\sigma}_{\varepsilon_j}^2=1-{\lambda}_{x_j}^2 \) is the error variance of item j.
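The following minimal sketch contrasts the unit-weighted composite reliability with the maximal reliability for a one-factor model. It assumes the standard forms ω = (Σλ)² / ((Σλ)² + Σσ²) and H = q / (1 + q) with q = Σ λ² / σ², a unit-variance factor, and a diagonal residual matrix. The loadings are hypothetical values chosen for illustration.

```python
def omega(loadings, error_variances):
    """Composite reliability of the unit-weighted sum, assuming Var(eta) = 1."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_variances))


def maximal_h(loadings, error_variances):
    """Maximal reliability H of the optimally weighted sum (one factor)."""
    # q = lambda' psi^{-1} lambda when psi is diagonal.
    q = sum(lam ** 2 / err for lam, err in zip(loadings, error_variances))
    return q / (1 + q)


# Hypothetical standardized loadings; each error variance is 1 - loading^2.
loadings = [0.8, 0.7, 0.6, 0.3]
errors = [1 - lam ** 2 for lam in loadings]
w = omega(loadings, errors)
h = maximal_h(loadings, errors)
```

With unequal loadings H exceeds omega, because the weighted sum gives less weight to weak indicators. With equal loadings the two coefficients coincide.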
3 Multilevel Reliabilities
Data collected in social and educational research often have a multilevel structure. Multilevel reliabilities have been proposed for such data.
3.1 Multilevel Confirmatory Factor Analysis (MCFA)
Muthén (2011) defined a multilevel confirmatory factor analysis (MCFA) by assuming a one-factor model holds for both the between and the within components. The observed value of the p-dimensional variable \( {y}_{gi} \) is partitioned into three components:
\[ {y}_{gi}=V+{y}_{Bg}+{y}_{wgi}, \]
where \( {y}_{gi} \) is the observed value of individual i in group g, V is a grand mean, \( {y}_{Bg} \) is the between-group part of the observed value, and \( {y}_{wgi} \) is the within-group part of the observed value. The multilevel CFA specifies a model at the between-group level and the within-group level separately. Suppose there are h factors between groups and m factors within groups. The between-group level CFA model is
\[ {y}_{Bg}={\Lambda}_{Bg}{\eta}_{Bg}+{\varepsilon}_{Bg}, \]
where \( {\Lambda}_{Bg} \) is a (p × h) matrix of factor loadings with elements \( {\lambda}_{Bg} \), \( {\eta}_{Bg} \) is an h-dimensional vector of factor scores with the assumption \( {\eta}_{Bg}\sim MN\left({0}_h,{\Psi}_{h\times h}\right) \), and \( {\varepsilon}_{Bg} \) is a p-dimensional vector of errors with the assumption \( {\varepsilon}_{Bg}\sim MN\left({0}_p,{\Phi}_{p\times p}\right) \). The within-group level CFA model is defined as
\[ {y}_{wgi}={\Lambda}_{wg}{\eta}_{wgi}+{\varepsilon}_{wgi}, \]
where \( {\Lambda}_{wg} \) is a (p × m) matrix of factor loadings with elements \( {\lambda}_{wg} \), \( {\eta}_{wgi} \) is an m-dimensional vector of factor scores with the assumption \( {\eta}_{wgi}\sim MN\left({0}_m,{\Psi}_{m\times m}\right) \), and \( {\varepsilon}_{wgi} \) is a p-dimensional vector of errors with the assumption \( {\varepsilon}_{wgi}\sim MN\left({0}_p,{\Phi}_{p\times p}\right) \).
If there is only one factor at each of the between-group and within-group levels, then the variance of the observed variable y, \( {\sigma}_{y_{gi}}^2 \), is decomposed (Muthén, 2011) as
\[ {\sigma}_{y_{gi}}^2={\lambda}_{Bg}^2{\sigma}_{\eta Bg}^2+{\sigma}_{\varepsilon Bg}^2+{\lambda}_{wg}^2{\sigma}_{\eta wg}^2+{\sigma}_{\varepsilon wg}^2={\sigma}_{BF}^2+{\sigma}_{BE}^2+{\sigma}_{WF}^2+{\sigma}_{WE}^2, \]
where \( {\lambda}_{Bg} \) is the between-group factor loading, \( {\sigma}_{\eta Bg}^2 \) is the variance of the between-group factor scores, \( {\sigma}_{\varepsilon Bg}^2 \) is the variance of the between-group errors, \( {\lambda}_{wg} \) is the within-group factor loading, \( {\sigma}_{\eta wg}^2 \) is the variance of the within-group factor scores, \( {\sigma}_{\varepsilon wg}^2 \) is the variance of the within-group errors, \( {\sigma}_{BF}^2 \) is the between-level factor score variance, \( {\sigma}_{BE}^2 \) is the between-level error variance, \( {\sigma}_{WF}^2 \) is the within-level factor score variance, and \( {\sigma}_{WE}^2 \) is the within-level error variance.
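To make the partition of an observed value into a grand mean, a between-group part, and a within-group part concrete, the sketch below splits the scores of a single variable. It uses sample group means as an observed stand-in for the between-group component. This is an assumption made for illustration only, since MCFA treats the between-group part as latent and estimates it from the model. The function name and data are hypothetical.

```python
def partition_scores(groups):
    """Split each score into grand mean + between part + within part.

    `groups` maps a group label g to the list of scores y_gi in that group.
    Returns tuples (g, y, grand_mean, between, within).
    """
    all_scores = [y for scores in groups.values() for y in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    parts = []
    for g, scores in groups.items():
        group_mean = sum(scores) / len(scores)
        between = group_mean - grand_mean      # observed analogue of y_Bg
        for y in scores:
            within = y - group_mean            # observed analogue of y_wgi
            parts.append((g, y, grand_mean, between, within))
    return parts


# Hypothetical scores on one item for individuals in three groups.
groups = {"g1": [2.0, 3.0, 4.0], "g2": [5.0, 6.0, 7.0], "g3": [1.0, 2.0, 3.0]}
parts = partition_scores(groups)
```

Each tuple satisfies y = grand mean + between part + within part, and the within parts sum to zero inside every group.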
3.2 Multilevel Reliabilities
Applying MCFA to reliability calculation, the p dimensions become J items, and we assume there is only one factor at each of the between-group and within-group levels.
Multilevel Alpha
The multilevel alpha is calculated at the within-group and between-group levels as
\[ {\alpha}_w=\frac{J^2\,{\overline{\sigma}}_{wgi, wgj}}{\sigma_{wg}^2},\kern2em {\alpha}_B=\frac{J^2\,{\overline{\sigma}}_{Bgi, Bgj}}{\sigma_{Bg}^2}, \]
where J is the number of items, \( {\overline{\sigma}}_{wgi, wgj} \) is the mean of the within-group covariances between items i and j, \( {\sigma}_{wg}^2 \) is the variance of the within-group part of the observed value, \( {\overline{\sigma}}_{Bgi, Bgj} \) is the mean of the between-group covariances between items i and j, and \( {\sigma}_{Bg}^2 \) is the variance of the between-group part of the observed value.
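A minimal sketch of level-specific alpha follows. It assumes that within-group and between-group item covariance matrices are already available (for example, from a fitted two-level model) and that each level's alpha takes the form J² times the mean off-diagonal covariance divided by the variance of the sum score at that level. The matrices shown are hypothetical.

```python
def alpha_from_cov(cov):
    """Alpha = J^2 * mean off-diagonal covariance / variance of the sum score."""
    J = len(cov)
    off_diag = [cov[i][j] for i in range(J) for j in range(J) if i != j]
    mean_cov = sum(off_diag) / len(off_diag)
    # The variance of a sum of items equals the sum of all covariance entries.
    var_sum = sum(sum(row) for row in cov)
    return J ** 2 * mean_cov / var_sum


# Hypothetical within- and between-group covariance matrices for J = 3 items.
cov_within = [[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.5],
              [0.3, 0.5, 1.0]]
cov_between = [[0.5, 0.4, 0.4],
               [0.4, 0.6, 0.45],
               [0.4, 0.45, 0.5]]
alpha_w = alpha_from_cov(cov_within)
alpha_b = alpha_from_cov(cov_between)
```

The same function is applied to both levels, which mirrors the way the level-specific coefficients share one functional form.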
Multilevel Omega
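Multilevel omega is obtained as
\[ {\omega}_w=\frac{{\left(\sum {\lambda}_{wgj}\right)}^2}{{\left(\sum {\lambda}_{wgj}\right)}^2+\sum {\sigma}_{\varepsilon wgj}^2},\kern2em {\omega}_B=\frac{{\left(\sum {\lambda}_{Bgj}\right)}^2}{{\left(\sum {\lambda}_{Bgj}\right)}^2+\sum {\sigma}_{\varepsilon Bgj}^2}, \]
where \( \sum {\lambda}_{wgj} \) is the sum of the within-group factor loadings over items j, \( \sum {\sigma}_{\varepsilon wgj}^2 \) is the sum of the within-group error variances, \( \sum {\lambda}_{Bgj} \) is the sum of the between-group factor loadings over items j, and \( \sum {\sigma}_{\varepsilon Bgj}^2 \) is the sum of the between-group error variances.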
Multilevel H
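The multilevel maximal reliability H is calculated by
\[ {H}_w={\left[1+{\left(\sum_{j=1}^J\frac{\lambda_{wgj}^2}{1-{\lambda}_{wgj}^2}\right)}^{-1}\right]}^{-1},\kern2em {H}_B={\left[1+{\left(\sum_{j=1}^J\frac{\lambda_{Bgj}^2}{1-{\lambda}_{Bgj}^2}\right)}^{-1}\right]}^{-1}, \]
where \( {\lambda}_{wgj}^2 \) is the squared standardized within-group factor loading of item j, and \( {\lambda}_{Bgj}^2 \) is the squared standardized between-group factor loading of item j.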
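The sketch below applies omega and H separately at the within-group and between-group levels. It assumes standardized one-factor loadings at each level, so that each error variance equals one minus the squared loading, and it uses the standard one-factor forms of the two coefficients. The loadings and the function name `level_reliabilities` are hypothetical.

```python
def level_reliabilities(std_loadings):
    """Omega and H for one level from standardized one-factor loadings."""
    errors = [1 - lam ** 2 for lam in std_loadings]   # standardized error variances
    true_var = sum(std_loadings) ** 2
    omega = true_var / (true_var + sum(errors))
    ratio = sum(lam ** 2 / err for lam, err in zip(std_loadings, errors))
    h = 1 / (1 + 1 / ratio)
    return {"omega": omega, "H": h}


# Hypothetical standardized loadings at each level of a two-level model.
loadings_by_level = {
    "within": [0.6, 0.5, 0.7, 0.4],
    "between": [0.9, 0.8, 0.85, 0.7],
}
results = {level: level_reliabilities(lams)
           for level, lams in loadings_by_level.items()}
```

Reporting the coefficients per level keeps within-group and between-group consistency separate, since a scale can be reliable at one level and not at the other.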
4 Multilevel Reliability for Categorical Responses
It is very common in the social and behavioral sciences for items to have ordered categories. When the observed data are ordered categorical, fitting linear SEM models with a linear estimation method is not desirable because it violates the assumption that the observed variables are continuous and yields inflated chi-square estimates and attenuated factor loadings (Bollen, 1989). To address this problem, we consider the observed categorical scores (\( {X}_j \)) to arise from underlying continuous variables (\( {X}_j^{\ast } \)), and the nonlinear relationship between \( {X}_j \) and \( {X}_j^{\ast } \) is
\[ {X}_j=\begin{cases}0, & {X}_j^{\ast }<{\nu}_1\\ c, & {\nu}_c\le {X}_j^{\ast }<{\nu}_{c+1},\kern0.5em c=1,\dots, {C}_j-2\\ {C}_j-1, & {X}_j^{\ast}\ge {\nu}_{C_j-1}\end{cases} \]
where \( {C}_j \) is the number of categories for \( {X}_j \), and the \( {\nu}_i \) (i = 1, 2, …, \( {C}_j-1 \)) are the category thresholds. If \( {X}_j^{\ast } \) is less than \( {\nu}_1 \), \( {X}_j \) is equal to 0; for \( {\nu}_1\le {X}_j^{\ast }<{\nu}_2 \), \( {X}_j \) is equal to 1; and if \( {X}_j^{\ast } \) is at or above \( {\nu}_{C_j-1} \), \( {X}_j \) is equal to \( {C}_j-1 \). If the structure of the test is well specified, this approach can estimate reliability more accurately than the linear SEM approach.
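The threshold rule described above can be sketched with a binary search over the sorted thresholds. The thresholds and input values are hypothetical, and the helper name `categorize` is introduced only for this illustration.

```python
from bisect import bisect_right


def categorize(x_star, thresholds):
    """Map a continuous score to an ordered category 0, ..., C - 1.

    `thresholds` is the increasing list nu_1 < ... < nu_{C-1}. The category
    equals the number of thresholds that are less than or equal to x_star.
    """
    return bisect_right(thresholds, x_star)


# Hypothetical thresholds for an item with C = 4 categories.
thresholds = [-1.0, 0.0, 1.2]
categories = [categorize(x, thresholds) for x in (-2.5, -1.0, 0.4, 1.2, 3.0)]
```

A value exactly equal to a threshold falls in the higher category, which matches the convention that the lower bound of each interval is inclusive.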
Multilevel Alpha for Tests with Categorical Responses
Single-level alpha for categorical responses can be calculated by using polychoric correlations if we assume the variance of each item is 1, with the reliability calculated from parallel measures. For multilevel alpha, reliabilities are calculated at both the within-group and between-group levels.
Multilevel Composite Reliability for Tests with Categorical Responses
Single-level composite reliabilities were proposed by Kim, Lu and Cohen (2020) to investigate reliability for items having the same or different numbers of ordered categories. For multilevel composite reliabilities, the same formula can be applied at both the within-group and between-group levels. Multilevel omega for tests with categorical responses is a simplified version for one-factor models.
Multilevel Maximal Reliability for Tests with Categorical Responses
No previous research has addressed this topic. In this article, we derived the formula as follows. Suppose X and \( \overset{\sim }{X} \) are two parallel tests, each of which is a weighted sum, \( X=\sum_{j=1}^J{w}_j{X}_j \) and \( \overset{\sim }{X}=\sum_{j^{\prime }=1}^J{w}_{j^{\prime }}{\overset{\sim }{X}}_{j^{\prime }} \). To estimate the reliability for the nonlinear measurement model, the correlation between X and \( \overset{\sim }{X} \) is used, which is
\[ {\rho}_{X\overset{\sim }{X}}=\frac{\mathrm{Cov}\left(X,\overset{\sim }{X}\right)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}\left(\overset{\sim }{X}\right)}}. \]
The numerator for weighted sums is
in which
and
The denominator is
Therefore, the reliability is calculated from parallel measures of ordered categorical responses
For one-factor models, the formula above can be greatly simplified. The multilevel maximal reliability of the weighted sum applies the formula at both the within-group and between-group levels.
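As a numerical companion to the parallel-test idea, the sketch below approximates the correlation between two parallel weighted sums of ordered categorical items by Monte Carlo simulation. It is not the closed-form formula of this section. It is a simulation stand-in under stated assumptions: a single factor with unit variance, standardized underlying responses, parallel forms that share the factor but have independent errors, and hypothetical loadings, thresholds, and weights.

```python
import random
from bisect import bisect_right


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / (s_aa * s_bb) ** 0.5


def simulate_parallel_reliability(loadings, thresholds, weights,
                                  n=20000, seed=1):
    """Monte Carlo correlation between two parallel weighted sum scores."""
    rng = random.Random(seed)
    form_a, form_b = [], []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)                  # latent factor shared by both forms
        totals = []
        for _form in range(2):                     # two parallel forms
            total = 0.0
            for lam, nu, w in zip(loadings, thresholds, weights):
                # Underlying continuous response with unit variance.
                x_star = lam * eta + rng.gauss(0.0, (1 - lam ** 2) ** 0.5)
                total += w * bisect_right(nu, x_star)   # weighted category score
            totals.append(total)
        form_a.append(totals[0])
        form_b.append(totals[1])
    return correlation(form_a, form_b)


# Hypothetical one-factor test with 3 items and differing numbers of categories.
loadings = [0.8, 0.7, 0.6]
thresholds = [[-0.5, 0.5], [0.0], [-1.0, 0.0, 1.0]]
weights = [lam / (1 - lam ** 2) for lam in loadings]   # illustrative weights only
rho = simulate_parallel_reliability(loadings, thresholds, weights)
```

Running the same function on within-group and between-group parameters separately would give a simulation-based check of level-specific values.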
5 Conclusions
This study proposed a confirmatory factor analysis approach to multilevel reliability for tests with ordered categorical item responses. It extended single-level reliabilities to multilevel reliabilities, and provided closed-form formulas for calculating various types of multilevel nonlinear reliabilities, including the composite reliability, the coefficient omega, and the maximal reliability.
References
Bentler, P. M. (2007). Covariance structure models for maximal reliability of unit-weighted composites. In Handbook of latent variable and related models (pp. 1–19). North-Holland.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143.
Bollen, K. A. (1989). Structural equations with latent variables. Wiley.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
Finney, S. J., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 269–314). Information Age.
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91.
Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827–838.
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167.
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future – A festschrift in honor of Karl Jöreskog (pp. 195–216). Scientific Software International.
Kim, S., Lu, Z., & Cohen, A. S. (2020). Reliability for tests with items having different numbers of ordered categories. Applied Psychological Measurement, 44(2), 137–149.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
McDonald, R. P. (1985). Factor analysis and related methods. Erlbaum.
Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling. Structural Equation Modeling, 2(3), 255–273.
Muthén, B. O. (2011). Mean and covariance structure analysis of hierarchical data. UCLA: Department of Statistics, UCLA. Retrieved from https://escholarship.org/uc/item/1vp6w4sr.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32(1), 1–13.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21(2), 173–184.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9(2), 195–212.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lu, Z., Hong, M., Kim, S. (2021). Formulas of Multilevel Reliabilities for Tests with Ordered Categorical Responses. In: Wiberg, M., Molenaar, D., González, J., Böckenholt, U., Kim, JS. (eds) Quantitative Psychology. IMPS 2020. Springer Proceedings in Mathematics & Statistics, vol 353. Springer, Cham. https://doi.org/10.1007/978-3-030-74772-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74771-8
Online ISBN: 978-3-030-74772-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)