Abstract
The results of large-scale student assessments are increasingly being used to rank nations, states, and schools and to inform policy decisions. These uses often rely on aggregated student test score data and imply inferences about multilevel constructs. Validating uses and interpretations of these multilevel constructs requires appropriate multilevel validation techniques. This chapter combines multilevel data analysis techniques with an explanatory view of validity to develop explanations of score variation that can be used to evaluate multilevel measurement inferences. We use country-level mathematics scores from the Trends in International Mathematics and Science Study (TIMSS) to illustrate the integration of these techniques. The explanation-focused view of validity, accompanied by the ecological model of item responding, situates conventional response process research in a multilevel construct setting and moves response process studies beyond the traditional focus on individual test-takers’ behaviors.
Notes
- 1.
Domain scores were used in the multilevel confirmatory factor analyses because they can be treated as continuous observed variables, so conventional fit statistics were available to assess model fit, and because analyzing continuous scores substantially reduces computing time. Ours is a variation on the use of item parcels; in our case, however, the parcels are theoretically driven and confirmed to be unidimensional. As further support for the use of the four domain scores in subsequent analyses, we fit a multilevel exploratory item response theory model to all 29 items simultaneously. The first three eigenvalues of the within-level polychoric correlation matrix were 10.0, 1.5, and 1.3, and the first three eigenvalues of the between-level correlation matrix were 22.4, 1.5, and 1.0. Clearly, the eigenvalues point toward one between and one within latent variable even when the items are the focus of analysis. For the one-factor-within, one-factor-between model, CFI = 0.92, RMSEA = 0.03, SRMR (within) = 0.07, and SRMR (between) = 0.06. As an example of the computational burden of the item-level analyses, the 29-item analysis described in this footnote required over 6 hours of computing time, whereas each domain model completes in less than 5 minutes. All of this evidence lends further support to the use of the domain scores in the subsequent analyses.
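The eigenvalue screening described in this footnote rests on decomposing scores into between-group (country means) and within-group (student deviations) parts and inspecting the leading eigenvalues of each level's correlation matrix. The following is a minimal sketch of that logic on simulated data, not the authors' code (the chapter's analyses used polychoric correlations on the actual TIMSS item responses); all sample sizes and effect strengths here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: 47 countries x 300 students x 4 domain scores,
# generated with one shared country-level factor and one shared
# student-level factor (illustrative values, not the TIMSS data).
n_groups, n_per, n_vars = 47, 300, 4
country_factor = rng.normal(size=(n_groups, 1, 1))      # between-level factor
student_factor = rng.normal(size=(n_groups, n_per, 1))  # within-level factor
noise = 0.5 * rng.normal(size=(n_groups, n_per, n_vars))
scores = country_factor + student_factor + noise

# Split each score into its country mean (between part)
# and its deviation from that mean (within part).
group_means = scores.mean(axis=1, keepdims=True)
within = (scores - group_means).reshape(-1, n_vars)
between = group_means.reshape(n_groups, n_vars)

# Eigenvalues (descending) of the within- and between-level
# correlation matrices; a dominant first eigenvalue at each
# level is consistent with one factor per level.
eig_within = np.linalg.eigvalsh(np.corrcoef(within, rowvar=False))[::-1]
eig_between = np.linalg.eigvalsh(np.corrcoef(between, rowvar=False))[::-1]

print("within: ", np.round(eig_within, 2))
print("between:", np.round(eig_between, 2))
```

Under this one-factor-per-level setup, the first eigenvalue dwarfs the rest at both levels, mirroring the pattern (10.0 vs. 1.5 within; 22.4 vs. 1.5 between) reported above.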
References
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004a). A framework for conducting multilevel construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 273–303). Oxford, UK: Elsevier.
Chen, G., Mathieu, J. E., & Bliese, P. D. (2004b). Validating frogs and ponds in multilevel contexts: Some afterthoughts. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 335–343). Oxford, UK: Elsevier.
Dansereau, F., & Yammarino, F. J. (2000). Within and between analysis: The variant paradigm as an underlying approach to theory building and testing. In K. J. Klein & S. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 425–466). San Francisco, CA: Jossey-Bass.
Forer, B., & Zumbo, B. D. (2011). Validation of multilevel constructs: Validation methods and empirical findings for the EDI. Social Indicators Research: An International Interdisciplinary Journal for Quality of Life Measurement, 103, 231–265. doi:10.1007/s11205-011-9844-3.
Goldstein, H., & McDonald, R. P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53, 455–467.
Hofmann, D. A., & Jones, L.M. (2004). Some foundational and guiding questions for multilevel construct validation. In F. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes. Amsterdam: Elsevier.
Kaplan, D., & Elliott, P. R. (1997). A didactic example of multilevel structural equation modeling applicable to the study of organizations. Structural Equation Modeling, 4, 1–24.
Klein, K. J., Dansereau, F., & Hall, R. J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review, 19, 195–229.
Lee, S.-Y. (1990). Multilevel analysis of structural equation models. Biometrika, 77, 763–772.
Longford, N. T., & Muthén, B. O. (1992). Factor analysis for clustered observations. Psychometrika, 57, 581–597.
Morgeson, F. P., & Hofmann, D. A. (1999). The structure and function of collective constructs: Implications for multilevel research and theory development. Academy of Management Review, 24, 249–265.
Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O’Sullivan, C. Y., Arora, A., & Erberber, E. (2005). TIMSS 2007 assessment frameworks. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. URL: http://timss.bc.edu/timss2007/PDF/T07_AF.pdf.
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376–398.
Muthén, B. O., & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267–316.
Raudenbush, S. W., Rowan, B., & Kang, S. J. (1991). A multilevel, multivariate model for studying school climate with estimation via the EM algorithm and application to U.S. high-school data. Journal of Educational Statistics, 16, 295–330.
Stone, J., & Zumbo, B. D. (2016). Validity as a Pragmatist project: A global concern with local application. In V. Aryadoust & J. Fox (Eds.), Trends in language assessment research and practice (pp. 555–573). Newcastle, UK: Cambridge Scholars Publishing.
Watkins, K. (2007). Human development report 2007/2008, fighting climate change: Human solidarity in a divided world. New York, NY: United Nations Development Programme.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26: Psychometrics (pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP – Information Age Publishing, Inc.
Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. A. Bovaird, K. F. Geisinger, & C. W. Buckendahl (Eds.), High stakes testing in education – Science and practice in K-12 settings (pp. 177–190). Washington, DC: American Psychological Association Press.
Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Astivia, O. L. O., & Ark, T. K. (2015). A methodology for Zumbo’s Third Generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136–151.
Acknowledgment
The authors would like to thank Professor Fred Dansereau for his generous guidance and feedback on the WABA analyses, and Professor Bob Linn for the encouragement to publish this paper. An earlier version of this paper was presented at the symposium “A Multilevel View of Test Validity”, 2010 Annual Meeting of the American Educational Research Association, Denver, CO.
Appendices
Appendix A: Countries Involved in the Study and Sample Size
Nation | Number of students |
---|---|
Algeria | 384 |
Armenia | 277 |
Australia | 294 |
Bahrain | 303 |
Bosnia and Herzegovina | 301 |
Botswana | 298 |
Bulgaria | 288 |
Chinese Taipei | 287 |
Colombia | 347 |
Cyprus | 314 |
Czech Republic | 349 |
Egypt | 466 |
England | 299 |
Georgia | 306 |
Ghana | 377 |
Hong Kong, SAR | 249 |
Hungary | 285 |
Indonesia | 305 |
Iran, Islamic Republic of | 291 |
Israel | 234 |
Italy | 315 |
Japan | 307 |
Jordan | 370 |
Korea, Republic of | 306 |
Kuwait | 284 |
Lebanon | 267 |
Lithuania | 287 |
Malaysia | 321 |
Malta | 337 |
Mongolia | 317 |
Norway | 326 |
Oman | 322 |
Palestinian National Authority | 315 |
Qatar | 516 |
Romania | 303 |
Russian Federation | 320 |
Saudi Arabia | 307 |
Scotland | 290 |
Serbia | 288 |
Singapore | 328 |
Slovenia | 292 |
Sweden | 369 |
Syria, Arab Republic of | 327 |
Thailand | 390 |
Tunisia | 292 |
Turkey | 314 |
Ukraine | 321 |
United States | 544 |
Appendix B: Listing of the National Level Curriculum Explanatory Variables
Variable | Description | Data coding |
---|---|---|
1. Calculator | Does the national curriculum contain statements/policies about the use of calculators in grade 8 mathematics? | Binary 0/1; Yes = 1 |
2. Computer | Does the national curriculum contain statements/policies about the use of computers in grade 8 mathematics? | Binary 0/1; Yes = 1 |
How much emphasis does the national mathematics curriculum place on the following? | | |
3a. Basic | (a) Mastering basic skills and procedures | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3b. Concept | (b) Understanding mathematical concepts and principles | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3c. Real life | (c) Applying mathematics in real-life contexts | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3d. Communicate | (d) Communicating mathematically | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3e. Reason | (e) Reasoning mathematically | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3f. Integrating | (f) Integrating mathematics with other subjects | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
3g. Proof | (g) Deriving formal proofs | 4-point scale; None = 0, Very Little = 1, Some = 2, A lot = 3 |
4a & b. DFlevel, DFcur | Which best describes how the mathematics curriculum addresses the issue of students with different levels of ability? (Two variables, DFlevel and DFcur, coded as a design matrix) | Design matrix (DFlevel, DFcur) |
 | Different curricula are prescribed for students of different ability levels | DFlevel = 0, DFcur = 1 |
 | The same curriculum is prescribed for students of different ability levels, but at different levels of difficulty | DFlevel = 1, DFcur = 0 |
 | The same curriculum is prescribed for all students | DFlevel = 0, DFcur = 0 |
5. Remedial | Is there an official policy to provide remedial mathematics instruction at the eighth grade of formal schooling? | Binary 0/1; Yes = 1 |
6. Degree | Which are the current requirements for being a middle/lower secondary grade teacher? A degree from a teacher education program | Binary 0/1; Yes = 1 |
7. Exam | Across grades K–12, does an education authority in your country (e.g., National Ministry of Education) administer examinations in mathematics that have consequences for individual students, such as determining grade promotion, entry to a higher school system, entry to a university, and/or exiting or graduating from high school? | Binary 0/1; Yes = 1 |
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and International Educational Achievement Testing: A Case of Multi-level Validation Framed by the Ecological Model of Item Responding. In: Zumbo, B., Hubley, A. (eds) Understanding and Investigating Response Processes in Validation Research. Social Indicators Research Series, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56128-8
Online ISBN: 978-3-319-56129-5