Abstract
The results of large-scale student assessments are increasingly being used to rank nations, states, and schools and to inform policy decisions. These uses often rely on aggregated student test score data and imply inferences about multilevel constructs. Validating uses and interpretations of these multilevel constructs requires appropriate multilevel validation techniques. This chapter combines multilevel data analysis techniques with an explanatory view of validity to develop explanations of score variation that can be used to evaluate multilevel measurement inferences. We use country-level mathematics scores from the Trends in International Mathematics and Science Study (TIMSS) to illustrate the integration of these techniques. The explanation-focused view of validity, accompanied by the ecological model of item responding, situates conventional response process research in a multilevel construct setting and moves response process studies beyond the traditional focus on individual test-takers’ behaviors.
Notes
- 1.
Domain scores were used in the multilevel confirmatory factor analyses because they can be treated as continuous observed variables, so conventional fit statistics were available to assess model fit, and because analyzing continuous scores substantially reduces computing time. Ours is a variation on the use of item parcels; in our case, however, the parcels are theoretically driven and confirmed to be unidimensional. As further support for the use of the four domain scores in subsequent analyses, we fit a multilevel exploratory item response theory model to all 29 items simultaneously. The first three eigenvalues of the within-level polychoric correlation matrix were 10.0, 1.5, and 1.3, and the first three eigenvalues of the between-level correlation matrix were 22.4, 1.5, and 1.0. Clearly, the eigenvalues point toward one between and one within latent variable even when the items are the focus of analysis. For the one-factor-within, one-factor-between model, CFI = 0.92, RMSEA = 0.03, SRMR (within) = 0.07, and SRMR (between) = 0.06. As an example of the computational burden of the item-level analyses, the 29-item analysis described in this footnote required over 6 hours of computing time, whereas each domain model completes in less than 5 minutes. All of this evidence lends further support to the use of the domain scores in the subsequent analyses.
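The eigenvalue screening described in this footnote rests on decomposing scores into between-group (country means) and within-group (student deviations) parts and inspecting the leading eigenvalues of each level's correlation matrix. The following is a minimal sketch of that logic on simulated data, not the authors' code (the chapter's analyses used polychoric correlations on the actual TIMSS item responses); all sample sizes and effect strengths here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: 47 countries x 300 students x 4 domain scores,
# generated with one shared country-level factor and one shared
# student-level factor (illustrative values, not the TIMSS data).
n_groups, n_per, n_vars = 47, 300, 4
country_factor = rng.normal(size=(n_groups, 1, 1))      # between-level factor
student_factor = rng.normal(size=(n_groups, n_per, 1))  # within-level factor
noise = 0.5 * rng.normal(size=(n_groups, n_per, n_vars))
scores = country_factor + student_factor + noise

# Split each score into its country mean (between part)
# and its deviation from that mean (within part).
group_means = scores.mean(axis=1, keepdims=True)
within = (scores - group_means).reshape(-1, n_vars)
between = group_means.reshape(n_groups, n_vars)

# Eigenvalues (descending) of the within- and between-level
# correlation matrices; a dominant first eigenvalue at each
# level is consistent with one factor per level.
eig_within = np.linalg.eigvalsh(np.corrcoef(within, rowvar=False))[::-1]
eig_between = np.linalg.eigvalsh(np.corrcoef(between, rowvar=False))[::-1]

print("within: ", np.round(eig_within, 2))
print("between:", np.round(eig_between, 2))
```

Under this one-factor-per-level setup, the first eigenvalue dwarfs the rest at both levels, mirroring the pattern (10.0 vs. 1.5 within; 22.4 vs. 1.5 between) reported above.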
References
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004a). A framework for conducting multilevel construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 273–303). Oxford, UK: Elsevier.
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004b). Validating frogs and ponds in multilevel contexts: Some afterthoughts. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 335–343). Oxford, UK: Elsevier.
Dansereau, F., & Yammarino, F. J. (2000). Within and between analysis: The variant paradigm as an underlying approach to theory building and testing. In K. J. Klein & S. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 425–466). San Francisco, CA: Jossey-Bass.
Forer, B., & Zumbo, B. D. (2011). Validation of multilevel constructs: Validation methods and empirical findings for the EDI. Social Indicators Research: An International Interdisciplinary Journal for Quality of Life Measurement, 103, 231–265. doi:10.1007/s11205-011-9844-3.
Goldstein, H., & McDonald, R. P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53, 455–467.
Hofmann, D. A., & Jones, L.M. (2004). Some foundational and guiding questions for multilevel construct validation. In F. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes. Amsterdam: Elsevier.
Kaplan, D., & Elliott, P. R. (1997). A didactic example of multilevel structural equation modeling applicable to the study of organizations. Structural Equation Modeling, 4, 1–24.
Klein, K. J., Dansereau, F., & Hall, R. J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review, 19, 195–229.
Lee, S.-Y. (1990). Multilevel analysis of structural equation models. Biometrika, 77, 763–772.
Longford, N. T., & Muthén, B. O. (1992). Factor analysis for clustered observations. Psychometrika, 57, 581–597.
Morgeson, F. P., & Hofmann, D. A. (1999). The structure and function of collective constructs: Implications for multilevel research and theory development. Academy of Management Review, 24, 249–265.
Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O’Sullivan, C. Y., Arora, A., & Erberber, E. (2005). TIMSS 2007 assessment frameworks. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. URL: http://timss.bc.edu/timss2007/PDF/T07_AF.pdf.
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376–398.
Muthén, B. O., & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267–316.
Raudenbush, S. W., Rowan, B., & Kang, S. J. (1991). A multilevel, multivariate model for studying school climate with estimation via the EM algorithm and application to U.S. high-school data. Journal of Educational Statistics, 16, 295–330.
Stone, J., & Zumbo, B. D. (2016). Validity as a Pragmatist project: A global concern with local application. In V. Aryadoust & J. Fox (Eds.), Trends in language assessment research and practice (pp. 555–573). Newcastle, UK: Cambridge Scholars Publishing.
Watkins, K. (2007). Human development report 2007/2008, fighting climate change: Human solidarity in a divided world. New York, NY: United Nations Development Programme.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26: Psychometrics (pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP – Information Age Publishing, Inc.
Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. A. Bovaird, K. F. Geisinger, & C. W. Buckendahl (Eds.), High stakes testing in education – Science and practice in K-12 settings (pp. 177–190). Washington, DC: American Psychological Association Press.
Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Astivia, O. L. O., & Ark, T. K. (2015). A methodology for Zumbo’s Third Generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136–151.
Acknowledgment
The authors would like to thank Professor Fred Dansereau for his generous guidance and feedback on the WABA analyses, and Professor Bob Linn for the encouragement to publish this paper. An earlier version of this paper was presented at the symposium “A Multilevel View of Test Validity”, 2010 Annual Meeting of the American Educational Research Association, Denver, CO.
Appendices
Appendix A: Countries Involved in the Study and Sample Size
Nation | Number of students |
---|---|
Algeria | 384 |
Armenia | 277 |
Australia | 294 |
Bahrain | 303 |
Bosnia and Herzegovina | 301 |
Botswana | 298 |
Bulgaria | 288 |
Chinese Taipei | 287 |
Colombia | 347 |
Cyprus | 314 |
Czech Republic | 349 |
Egypt | 466 |
England | 299 |
Georgia | 306 |
Ghana | 377 |
Hong Kong, SAR | 249 |
Hungary | 285 |
Indonesia | 305 |
Iran, Islamic Republic of | 291 |
Israel | 234 |
Italy | 315 |
Japan | 307 |
Jordan | 370 |
Korea, Republic of | 306 |
Kuwait | 284 |
Lebanon | 267 |
Lithuania | 287 |
Malaysia | 321 |
Malta | 337 |
Mongolia | 317 |
Norway | 326 |
Oman | 322 |
Palestinian National Authority | 315 |
Qatar | 516 |
Romania | 303 |
Russian Federation | 320 |
Saudi Arabia | 307 |
Scotland | 290 |
Serbia | 288 |
Singapore | 328 |
Slovenia | 292 |
Sweden | 369 |
Syria, Arab Republic of | 327 |
Thailand | 390 |
Tunisia | 292 |
Turkey | 314 |
Ukraine | 321 |
United States | 544 |
Appendix B: Listing of the National Level Curriculum Explanatory Variables
Variable | Description | Data coding |
---|---|---|
1. Calculator | Does the national curriculum contain statements/policies about the use of calculators in grade 8 mathematics? | Binary 0/1; Yes = 1 |
2. Computer | Does the national curriculum contain statements/policies about the use of computers in grade 8 mathematics? | Binary 0/1; Yes = 1 |
How much emphasis does the national mathematics curriculum place on the following? | | |
3a. Basic | (a) Mastering basic skills and procedures | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3b. Concept | (b) Understanding mathematical concepts and principles | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3c. Real life | (c) Applying mathematics in real-life contexts | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3d. Communicate | (d) Communicating mathematically | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3e. Reason | (e) Reasoning mathematically | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3f. Integrating | (f) Integrating mathematics with other subjects | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3g. Proof | (g) Deriving formal proofs | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
4a & b. DFlevel, DFcur | Which best describes how the mathematics curriculum addresses the issue of students with different levels of ability? (Two variables, DFlevel and DFcur, coded as a design matrix) | Design matrix (DFlevel, DFcur) |
 | Different curricula are prescribed for students of different ability levels | DFlevel = 0, DFcur = 1 |
 | The same curriculum is prescribed for students of different ability levels, but at different levels of difficulty | DFlevel = 1, DFcur = 0 |
 | The same curriculum is prescribed for all students | DFlevel = 0, DFcur = 0 |
5. Remedial | Is there an official policy to provide remedial mathematics instruction at the eighth grade of formal schooling? | Binary 0/1; Yes = 1 |
6. Degree | Which are the current requirements for being a middle/lower secondary grade teacher? A degree from a teacher education program | Binary 0/1; Yes = 1 |
7. Exam | Across grades K–12, does an education authority in your country (e.g., National Ministry of Education) administer examinations in mathematics that have consequences for individual students, such as determining grade promotion, entry to a higher school system, entry to a university, and/or exiting or graduating from high school? | Binary 0/1; Yes = 1 |
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and International Educational Achievement Testing: A Case of Multi-level Validation Framed by the Ecological Model of Item Responding. In: Zumbo, B., Hubley, A. (eds) Understanding and Investigating Response Processes in Validation Research. Social Indicators Research Series, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56128-8
Online ISBN: 978-3-319-56129-5