Introduction

In recent years, schools across the world have emerged as more than academic learning environments. Contemporary educational systems are arguably the chief milieu in which children and adolescents develop the adaptive attitudes, emotions, and skills that support forming positive relationships while becoming increasingly independent (Thapa et al. 2013). Wang and Degol (2016) identified school climate as the convergence of a school’s characteristic academic atmosphere, community of interpersonal relationships, physical and emotional safety, and institutional structures. Ice et al. (2015) suggested that school climate is evidenced by both the processes and outcomes that define the range of acceptable interactions and norms on a campus. Student perceptions of school climate have been positively associated with academic achievement (Berkowitz et al. 2017), physical health (Lucarelli et al. 2014), willingness to ask for help (Shukla et al. 2016), prosocial behavior (O’Brennan et al. 2014), and resilience and emotional well-being (Aldridge et al. 2016). By contrast, low appraisals of school climate tend to be associated with deleterious outcomes including bullying (Duggins et al. 2016), absenteeism (Burton et al. 2014), dropout (Jia et al. 2016), and teacher burnout (Malinen and Savolainen 2016). Thus, perceptions of school climate may serve as an indicator of campus health and of students’ well-being within their learning environment. Given the high-stakes implications of the relationships between school climate and youth development across cultural characteristics (La Salle et al. 2015), identification and use of related assessments may support best practices for all counselors working in school settings.

During recent decades, theories of school climate have shifted from grounding in bio-ecological (Bronfenbrenner 1979) and attachment (Ainsworth 1989) frameworks to those reflecting conceptual developments related to risk and resilience (Rutter et al. 1997), social cognition (Bandura 1997), and fit between developmental stage and environment (Roeser and Eccles 1996). In response, the measurement of school climate has also diversified through the use of surveys, interviews, focus groups, observations, and archival data. Although the integration of diverse data sources has been encouraged by groups such as the American School Counselor Association (ASCA 2019), the International School Counselor Association (ISCA 2017), and the World Health Organization (WHO 2017a, 2017b), survey data have persisted as a primary approach across school campuses (Wang and Degol 2016). The issue is further complicated by the numerous conceptualizations of the school climate construct, ranging from unidimensional to multi-component (Wang and Degol 2016). Regardless of which survey school counselors select, its helpfulness for planning and decision-making activities will be a function of the degree to which its scores are valid and reliable. Thus, consideration of content evidence and other psychometric features may help counselors understand how useful a measure is likely to be for a given group of students.

Psychometric evidence reflects the fluid representation of constructs such as school climate, which may vary from one group of students to the next (Hays and Wood 2017; Spurgeon 2017). Lenz and Wester (2017) described five types of validity evidence based on the Standards for Educational and Psychological Testing (AERA et al. 2014) that can support counselors making decisions about how useful assessment scores may be with certain groups: (a) assessment content, (b) response processes, (c) internal structure, (d) relations to other variables, and (e) consequences of assessment. Evidence based on assessment content can be known through the degree to which items reflect key features of school climate based on theory, expert review, or stakeholder feedback (Lambie et al. 2017). Evidence based on response processes reflects the degree to which prompts require accessing relevant cognitions and memories that are reflected in responses (Peterson et al. 2017). Evidence based on internal structure is represented by the degree to which item and subscale scores interrelate in ways consistent with the hypothesized dimensions of the construct (Lewis 2017). Evidence depicting relations to other variables can be known through statistical correlations with related variables, such as other school climate measures or relevant outcomes like the number of school-wide disciplinary referrals (Balkin 2017; Swank and Mullen 2017). Finally, evidence of consequential validity is indicated by the degree to which interpretations and uses of responses have the potential to affect students’ experiences, opportunities, and outcomes in differential ways. When coupled with robust score reliability, these types of validity evidence can support selection of a school climate measure that reveals the need for activities such as universal prevention programming and more targeted approaches that may promote student development and well-being across the lifespan.

Given the substantive ways that school climate influences student development, well-being, and success across the lifespan, counselors working in schools may be responsible for leading the way through selection and use of measures that best represent their campuses. In instances when a potentially useful measure is not linguistically or culturally representative, straightforward procedures for translation and cross-cultural validation have been established to promote access and usefulness for practitioners (Lenz et al. 2017; WHO n.d.). To date, systematic reviews of school climate measures have included older, legacy measures whose items may not reflect contemporary school-student interactions and may therefore represent the construct in unrepresentative or extraneous ways. As standards and best practices continue trending toward data-driven solutions for enhancing student developmental experiences (ASCA 2019; ISCA 2017; WHO 2017a, 2017b), systematic reviews of measure characteristics and psychometric properties may support decision-making and survey selection in local communities.

Purpose of the Study and Research Questions

The purpose of this study was to identify and appraise a corpus of psychometric assessments depicting the school climate construct developed within a recent 25-year period. This time period was selected to include measures representing the shift in theories of school climate from bio-ecological (Bronfenbrenner 1979) and attachment (Ainsworth 1989) frameworks to those reflecting more contemporary conceptualizations of risk and resilience (Rutter et al. 1997), social cognition (Bandura 1997), and fit between developmental stage and environment (Roeser and Eccles 1996). The related analyses are intended to function as a foundation to support assessment selection among school counselors by informing decisions about the measures that may be most helpful when completing singular or continuous student assessment activities consistent with the ASCA Standards (2019), the ISCA Model (2017), and the WHO Guidelines for Promoting School Health (2017a, b). This study was guided by the following research questions: (a) What school climate measures have been developed within the 25-year timespan from 1993 to 2017? (b) What are the compositional characteristics of the identified measures? (c) What is the nature of the psychometric reliability and validity evidence reported in primary studies of school climate measure development and evaluation?

Method

We implemented a systematic search to identify primary studies reporting the psychometric properties of school climate measures intended for use with students in K-12 settings. Data from eligible studies were coded and systematically reviewed to depict characteristics and validity evidence between and across measures. Results were summarized in tables and reviewed in an even-handed, plain-language manner to promote measure selection and identification of best practices.

Search Strategies

We implemented two search strategies to identify and include all eligible studies related to the initial development and evaluation of school climate measures: (1) electronic database searches and (2) review of reference lists. Two authors independently searched the Academic Search Complete, PsycINFO, ERIC, and Campbell Collaboration databases for the 1993 to 2017 date range. Keywords used to identify relevant documents included: “School Climate Scale,” “School Climate Assessment,” “School Climate Inventory,” “School Climate Measure,” “School Environment Scale,” “School Environment Assessment,” “School Environment Inventory,” and “School Environment Measure.” All retrieved documents were screened by title and abstract review to yield potential studies for inclusion. Redundancies between references were eliminated, and all candidate articles selected for inclusion were saved in Portable Document Format (PDF) or Hypertext Markup Language (HTML) format. Once eligible documents were identified using Strategy 1, reference lists were reviewed to include any additional studies in our sample.

Inclusion and Exclusion Criteria

Inclusion within the systematic review was contingent upon the following criteria: (a) the study reported an initial development and evaluation of a psychometric measure intended to assess the school climate construct; (b) the target population was students in a K-12 setting; (c) the report was published in English; and (d) the report was available as a peer-reviewed publication or unpublished manuscript. Studies were excluded if they replicated psychometric properties from previous studies or were not focused on student ratings.

Data Extraction and Coding

We extracted data from eligible studies using a coding procedure developed by the first author to document bibliographic data, participant and survey characteristics, and psychometric features. The second and third authors completed independent coding of articles. Both coders were doctoral students in a counselor education program accredited by the Council for Accreditation of Counseling and Related Educational Programs (CACREP) who were Professional School Counselors and had completed coursework in research methods, statistics, and assessment. Both coders received an orientation to evidence-based practices, systematic review procedures, and manual-based training for article coding. All authors engaged in recursive review of coded data, accuracy verification, and resolution of any inconsistencies to generate the final data set.

Descriptive Statistics

Coded participant information included the number of participants, age, ethnicity, gender, grade, and country of origin. Survey information included the name of the measure, number of items, response type/format, strategies for item development, and subscale details.

School Climate Construct Representation

Subscales from the identified measures were extracted, evaluated, and coded using the four general school climate domains that emerged from Wang and Degol’s (2016) systematic review: (a) academic atmosphere, (b) community, (c) safety, and (d) institutional structures.
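As a reading aid only, the following minimal sketch illustrates the tallying logic behind this coding step using hypothetical measure and subscale names; the actual coding was completed manually by the authors.

```python
# Minimal sketch of the subscale-to-domain coding step, using hypothetical
# measure and subscale names; the actual coding was performed manually.
from collections import defaultdict

DOMAINS = ("academic atmosphere", "community", "safety", "institutional structures")

# Hypothetical coded data: measure name -> {subscale: assigned domain}
coded_subscales = {
    "Measure A": {
        "Teacher Support": "community",
        "Peer Relations": "community",
        "Order and Discipline": "safety",
    },
    "Measure B": {
        "Academic Press": "academic atmosphere",
        "Physical Environment": "institutional structures",
    },
}

def domain_coverage(coded):
    """Count how many measures include at least one subscale per domain."""
    coverage = defaultdict(set)
    for measure, subscales in coded.items():
        for domain in subscales.values():
            coverage[domain].add(measure)
    return {d: len(coverage[d]) for d in DOMAINS}

print(domain_coverage(coded_subscales))
# {'academic atmosphere': 1, 'community': 1, 'safety': 1, 'institutional structures': 1}
```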

Psychometric Validity and Reliability Evidence

We extracted and coded psychometric information, including indices representative of the five sources of validity evidence and reliability data (e.g., internal consistency coefficients and test-retest metrics) as described in the Standards (AERA et al. 2014). When the presence of a type of validity evidence was affirmed, 1 point was assigned to that domain. Thus, for each measure a validity quotient was computed with a minimum possible score of 0 and a maximum of 5.
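To make the scoring rule concrete, the brief sketch below tallies a validity quotient from hypothetical coded evidence flags; it simply implements the one-point-per-affirmed-source rule described above.

```python
# Minimal sketch of the validity quotient: one point per affirmed source of
# validity evidence for a measure (range 0-5). Evidence flags are hypothetical.
EVIDENCE_TYPES = (
    "test_content",
    "response_processes",
    "internal_structure",
    "relations_to_other_variables",
    "consequences_of_testing",
)

def validity_quotient(evidence_flags):
    """Sum affirmed evidence types; types not coded are treated as absent."""
    return sum(1 for e in EVIDENCE_TYPES if evidence_flags.get(e, False))

example = {"test_content": True, "internal_structure": True,
           "relations_to_other_variables": True}
print(validity_quotient(example))  # 3
```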

Results

Our search resulted in 30 candidate documents that had apparent relevance and warranted further inquiry. Table 1 depicts the 9 studies that remained eligible for review after applying the inclusion/exclusion criteria: (a) the Inventory of School Climate-Student (Brand et al. 2003); (b) the California School Climate and Safety Survey–Short Form (Furlong et al. 2005); (c) the School Climate Measure (Zullig et al. 2010); (d) the Delaware School Climate Survey-Student (Bear et al. 2011); (e) the School Climate Survey (Ding et al. 2011); (f) the What’s Happening In This School? questionnaire (Aldridge and Ala’i 2013); (g) the Georgia Brief School Climate Inventory (White et al. 2014); (h) the School Climate for Diversity-Secondary Scale (Byrd 2017); and (i) the School Climate and School Identification Measure-Student (Lee et al. 2017). There were 344,730 participants across the selected studies, with a mean sample size of 38,303. Among studies that reported gender and mean age, participants were girls (n = 83,141; 50.4%) and boys (n = 81,552; 49.6%) in grades 1–12 who resided within the United States (n = 326,432; 97%) and Australia (n = 11,828; 3%). Among samples from the United States, authors reported the ethnic identities of their samples as predominantly Caucasian (n = 179,529; 54%), ethnic minorities (n = 70,840; 21%), African American (n = 52,102; 16%), Hispanic (n = 16,232; 5%), Asian American (n = 5602; 2%), and Native American (n = 210; < 1%).

Table 1 Description of studies and measures included in review of school climate literature

School Climate Construct Representation

Table 2 depicts the categories of school climate constructs represented within our sample of measures. Inspection of construct representation indicated that the most common subscales were associated with aspects of school community (j = 8; 88%). Within the 9 studies, 27 subscales portrayed aspects of interpersonal relationships among peers, students, and administrators. Seven studies (77%) reported 14 subscales associated with student perceptions of physical and emotional safety. Six studies (66%) included 9 subscales intended to depict the supportive nature of the academic atmosphere, while only 1 study (11%) reported 1 subscale evaluating the quality of institutional structures. Across measures, only 1 (Zullig et al. 2010) included all four school climate domains, 3 (Aldridge and Ala’i 2013; Brand et al. 2003; Ding et al. 2011) included three domains, 4 (Bear et al. 2011; Furlong et al. 2005; Lee et al. 2017; White et al. 2014) included two domains, and 1 (Byrd 2017) included a single domain.

Table 2 School climate construct representation of measures included in review of school climate literature

Psychometric Validity and Reliability Evidence

When inspecting the validity characteristics within our sample of studies, some clear trends emerged related to the prominent types of validity evidence used to develop and evaluate measures of school climate (see Table 3). We situated categories of validity evidence into three tiers: evidence reported across all studies (Tier 1), evidence reported in the majority of studies (Tier 2), and evidence that was rarely reported (Tier 3). Within Tier 1, all identified studies (j = 9, 100%) reported Evidence of Test Content and Evidence Based on Internal Structure. This finding indicates a preference for assuring that depictions of school climate were characterized by items representative of the construct. The predominant method for establishing content evidence across studies (j = 6, 66%) relied on reference to items of previously established measures, with one study (11%) relying on reference to theories of school climate and one (11%) not specifying its strategy. Additionally, a predominant trend was noted for researchers to rely upon multivariate estimations such as factor analysis to evaluate the interrelations among measure scores. These procedures provided evidence that item score variability was attributable to the related dimensions (see Table 1). Across all studies, classical test theory strategies such as exploratory and confirmatory factor analyses were implemented ubiquitously, rather than item response theory approaches.

Table 3 Psychometric characteristics of measures included in review of school climate literature
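The primary studies implemented their own exploratory and confirmatory factor analysis procedures, none of which are reproduced here. As a general illustration of the internal structure logic described above, the following minimal sketch fits an unrotated maximum-likelihood factor analysis (via scikit-learn, an assumed tool choice) to simulated item responses generated from two hypothetical school climate dimensions.

```python
# Illustrative sketch only: examining internal structure of simulated
# Likert-type item responses with an unrotated maximum-likelihood factor
# analysis (scikit-learn). Nothing here reproduces a reviewed study's analysis.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n_students, n_items = 500, 12

# Simulate two latent dimensions (e.g., "community" and "safety") driving items.
latent = rng.normal(size=(n_students, 2))
loadings = np.zeros((2, n_items))
loadings[0, :6] = 0.8   # first six items load on factor 1
loadings[1, 6:] = 0.8   # last six items load on factor 2
items = latent @ loadings + rng.normal(scale=0.5, size=(n_students, n_items))

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_, 2))  # estimated loadings for each factor
```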

Within Tier 2, 6 of our identified studies (66%) reported Evidence of Relations with Other Variables, typically represented by convergence of scores with related measures of school climate. Within Tier 3, 3 of our identified studies (33%) reported Evidence Related to Response Processes, indicating a limited representation of primary studies reporting analyses of how participants interpreted items and selected responses. Similarly, 1 (11%) of the studies within our sample reported Evidence Based on the Consequences of Testing, suggesting an underrepresentation of data indicating whether the intended benefits of school climate assessment were likely to be realized in target schools. Across all studies, reported reliability coefficients varied from .58 to .94, suggesting internal consistency of item scores ranging from poor to excellent.

Discussion

This systematic search and review was intended to identify primary studies of school climate measures that emerged between 1993 and 2017. Our method located 9 measures characterized by varying degrees of construct representation and validity and reliability evidence, as indicated by our coding activities. This broadly auspicious finding represents a corpus of available measures that is comparable to those for other constructs associated with youth well-being, such as resilience (Windle et al. 2011) and quality of life (Chopra and Kamal 2012). However, we recognize both the preliminary nature of these findings and their limited potential to support both universal and targeted assessment practices by counselors who work in schools.

Although the applicability of validity and reliability evidence is nuanced, one aspect of our systematic review that was straightforward in its presentation was the representativeness of constructs consistent with Wang and Degol’s (2016) conceptualization of school climate. Our search revealed that school counselors and administrators have many options among the identified measures to depict aspects of their campus community (j = 8), school safety (j = 7), and academic atmosphere (j = 6), with 4 measures available to depict all three of these constructs (Aldridge and Ala’i 2013; Brand et al. 2003; Ding et al. 2011; Zullig et al. 2010). It is possible that the use of any of these measures may promote an understanding of school climate in ways that allow counselors to plan prevention and intervention programming with students, faculty, administrators, and parents. We regard the availability of these resources as encouraging when considering the positive associations between aspects of school climate and academic achievement (Berkowitz et al. 2017), physical health (Lucarelli et al. 2014), willingness to ask for help (Shukla et al. 2016), prosocial behavior (O’Brennan et al. 2014), and resilience and emotional well-being (Aldridge et al. 2016). By contrast, Wang and Degol’s Institutional Structures category was proportionally underrepresented, being associated with just one subscale (School Physical Environment) in one measure (Zullig et al. 2010). It is plausible that this finding signals a low prioritization of school environment enhancements such as creating open spaces, organizing inviting classrooms, and planning for noise management due to limited resources. It is also possible that this underrepresentation is indicative of a general gap in our understanding of relationships between the physical environment and relevant academic, social, and health outcomes.

The sample of identified studies represented varying degrees of validity evidence. Lenz and Wester (2017) noted that validity is a unitary concept and that a greater amount of validity evidence is not in itself an indicator of a measure’s utility. While some evidence, such as that for test content and internal structure, may be self-evident during the decision-making process, other evidence may be a less transparent indicator of usefulness for a particular school campus. This is especially true when validity evidence is not considered holistically in terms of its merits at the local level. We submit that the existence of a statistical relationship may be less practically important than consideration of how relevant these associations are for a campus, especially when considering that constructs are expressed differently across cultures. Although the Inventory of School Climate-Student (Brand et al. 2003), Delaware School Climate Survey-Student (Bear et al. 2011), What’s Happening In This School? questionnaire (Aldridge and Ala’i 2013), and School Climate and School Identification Measure-Student (Lee et al. 2017) received the highest validity quotient ratings, our analyses should be regarded as a starting point for scrutinizing the degree of measure-setting fit. The meaningful integration of validity evidence in juxtaposition with needs at local levels may provide a more accurate conceptualization of a measure’s utility. Stated another way, a measure with a high degree of validity evidence is of little use when the constructs it depicts are not the ones that a school counselor is interested in portraying.

The same point is true for estimations of score reliability within our sample of studies, which were based entirely on computation of coefficient alpha and represented a range of estimates from .58 to .96, a span that could be characterized as reaching from poor to excellent reliability. Although it may be tempting to rely on conventions such as poor, modest, good, and excellent to facilitate selection of measures for local use, judgment of a measure and its subscales based on alpha coefficient estimates alone may be inherently problematic. Several authors have contested the use of coefficient alpha as a signal of measure quality (Dunn et al. 2014; Peters 2014). The central positions among these arguments concern (a) statistical assumptions, such as each item contributing equally to the measurement of the construct, that are rarely met in practice, and (b) the resulting distortion of reliability estimates, especially in small samples. In the absence of related confidence intervals, the interactions and influences of (a) and (b) are rarely known. Therefore, when considering the adequacy of reliability estimates for our sample of studies, it may be prudent for counselors to thoughtfully compare their contexts (students, academic settings, and cultural characteristics) to those within the initial validation study. If there is a reasonable degree of similarity, the reported reliability estimates may be a relevant depiction. In the absence of contextual similarities, however, the limitations of alpha estimation suggest that the estimates should not be assumed to generalize.
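Because alpha and its interpretation depend on sample and context, an interval estimate can be more informative than a point estimate alone. As a rough illustration of this point, the following sketch computes coefficient alpha from its standard formula and attaches a percentile bootstrap confidence interval to simulated Likert-type responses; the data, sample sizes, and bootstrap settings are hypothetical and do not reproduce any reviewed study.

```python
# Illustrative sketch: coefficient alpha with a percentile bootstrap CI on
# simulated Likert-type data. Not a reproduction of any reviewed study.
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def bootstrap_ci(items, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for alpha, resampling respondents with replacement."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    stats = [cronbach_alpha(items[rng.integers(0, n, n)]) for _ in range(n_boot)]
    tail = (1 - level) / 2
    return np.percentile(stats, [100 * tail, 100 * (1 - tail)])

# Simulated responses: 300 students, 8 correlated items on a 1-5 scale.
rng = np.random.default_rng(1)
trait = rng.normal(size=(300, 1))
responses = np.clip(np.round(3 + trait + rng.normal(scale=0.8, size=(300, 8))), 1, 5)

print(round(cronbach_alpha(responses), 2), bootstrap_ci(responses).round(2))
```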

Taken together, our search revealed a sample of 9 school climate measures, each with its own compositional and psychometric merits. Establishing an alignment of compositional characteristics, validity evidence, and consistency of scores that meets a counselor’s unique needs may start with reference to our findings, but it will also require contextualization at the local level. For example, Zullig et al.’s (2010) School Climate Measure was the most comprehensive in its coverage of academic atmosphere, community, safety, and institutional structures. However, at 153 items it may be less practical as part of a broader campus assessment when compared to the 9-item Georgia Brief School Climate Inventory (White et al. 2014), which includes content related to community and safety. Briefer inventories may be less informative by way of comprehensiveness, yet more valuable to some school-based counselors when they offer strong validity evidence, sample representativeness, and parsimony. Similar comparisons and contrasts are possible among all the measures included in our review; thus, the onus of professional decision making may extend beyond crosswalks and interpretative benchmarks and instead reside with local practitioners who intimately understand their campuses and evaluative needs.

Implications for School Counselors and Professional Counselors Working in Schools

We believe that our findings have implications for school-based counselors at local and broad levels. At the local level, selection and use of a school climate measure with reasonable fit for a campus provides a mechanism for continuous monitoring of student body experiences across academic atmosphere, community, safety, and institutional structures. Annual or semi-annual data collection may facilitate data-informed direction of prevention programming, classroom guidance, and in-service trainings for faculty and staff. The same evaluation schedule may also provide immediate and longitudinal depictions of the impact associated with previous initiatives, consistent with standards presented by groups such as ASCA, ISCA, and WHO. Furthermore, through testimonies and visual depictions, school climate data may support appeals to school boards and funding sources for resources that may be difficult to justify in the absence of such metrics. These activities may promote student success and establish protective factors against bullying (Duggins et al. 2016), absenteeism (Burton et al. 2014), dropout (Jia et al. 2016), and teacher burnout (Malinen and Savolainen 2016).

At broader levels, wide-ranging use and reporting of school climate data would answer calls by state and federal education agencies for meaningful assessment and monitoring of educational environments. The dissemination of these data would allow for aggregation and multi-level analyses that depict the pulse of a generation of students whose distinctive needs may share notable similarities when represented collectively. With this information, individuals and professional organizations could mobilize advocacy efforts that speak to legislators with generalizable quantitative data. When coupled with personal testimonies, such data may spur pragmatic approaches to educational policy and resource allocation.

Limitations and Recommendations for Future Research

Some limitations and associated recommendations for future researchers are noted. First, the date range associated with our search strategy inherently defined the studies we would identify, evaluate, and comment upon. Future researchers are encouraged to expand the time frame (e.g., 30 years) or keep the same 25-year range yet advance the anchor year to account for contemporary developments (e.g., 1996 to 2020). Second, our review was limited to primary studies, which constrained our commentary about psychometric characteristics. Future studies that include all available psychometric evidence in a meaningful way may promote reliability and validity generalization analyses that allow for cross-cultural comparisons of construct variance. Finally, our activities were not intended to provide a prescriptive representation of measure-population fit, but instead to describe the body of available resources for assessing school climate that can be evaluated at local levels and collectively aggregated over time. As data accumulate, more sophisticated crosswalks of characteristics and data may yield decision trees or actuarial models for reference by school counselors and administrators.