American public schools are expected to accommodate all students who enter their doors and are charged with effectively fostering the academic and social growth of all students (Rebell 2012). This has become increasingly difficult over the years as the landscape of America’s youth changes and becomes more diverse ethnically, linguistically, economically, as well as culturally (Walker et al. 2010). These challenges to the successful accommodation of students are intensified by increasing academic demands and pressure for accountability, such as the provisions put forth by No Child Left Behind (NCLB: 2001) and Every Student Succeeds Act (ESSA: Mathis and Trujillo 2016). These demands have pushed schools into rigorous academic curricula to ensure that all students receive a quality education. However, the focus of these accountability measures on academic content marginalizes students at risk for behavioral and emotional problems. In these schools, students who are judged to be difficult to teach and manage are seen as impediments to satisfying external demands of accountability. As a result, they may be pushed towards a discipline system that excludes them from the learning environment (Losen et al. 2015).

Prevention programming has been found to be more effective than exclusionary discipline programs at increasing student achievement and mitigating negative student outcomes (American Psychological Association Zero Tolerance Task Force 2008; Bear et al. 2000). Research indicates that prevention initiatives also have the potential to reduce racial disproportionality in special education and exclusionary discipline practices (Proctor et al. 2012), thereby enhancing academic and behavioral outcomes for the entire school population (Vaillancourt et al. 2013). Consequently, schools across the country have turned attention to their implementation. Multi-tiered Systems of Support (MTSS) is a school-based prevention model that focuses on early identification of academic and behavioral risk to connect students to interventions early (e.g., Chafouleas et al. 2010; O’Connell et al. 2009; Skiba et al. 2006). MTSS itself is not an intervention but instead provides a systematic, data-informed framework to help organize, track implementation, and assess the effectiveness of prevention and intervention efforts that have been identified to meet the needs of students.

A critical component of any school-based prevention program, but especially applied within an MTSS framework, is the collection and monitoring of data regarding student functioning (Lane et al. 2009; Lane et al. 2011). Many schools have successfully established procedures to help teachers systematically collect information on student academic functioning as part of academic MTSS services. However, systematic collection of student behavioral and emotional functioning has yet to follow (Oakes et al. 2014; Kamphaus, Reynolds, and Dever 2014; Bruhn et al. 2014). One reason for this gap in systematic data collection may be the lack of information available on the implementation of behavioral and emotional screeners for systematic assessment of student behavioral and emotional risk. Conducting universal screening, including screening for behavioral and emotional risk, requires staff time, training, and depending on the screening instrument used, material costs. Schools may struggle to acquire these resources, especially if they feel uncertain regarding the process.

The purpose of the current study is twofold. The first aim is to assess whether a standardized universal screening system for student emotional and behavioral risk predicts student academic (measured through student grade point average) and behavioral outcomes (measured through student absences and suspensions). The second aim is to determine whether this screening measure is a better early predictor of behavioral and academic outcomes than the more traditionally used office discipline referrals (ODRs). This paper evaluates these tools within a school serving predominantly African American youth from low-income communities. Answering questions about screening for behavioral risk in this setting increases our understanding of student risk indicators for African American youth from low-income communities, helping to improve decision-making (Proctor et al. 2012; Raines et al. 2012).

Collecting Data on Student Behavioral and Emotional Risk in Schools

Students who struggle with behavioral and emotional problems are at substantial developmental risk (Reid, Patterson, and Snyder 2002), and, left untreated, these problems persist into adulthood (Vander, Weiss, Kuo, Cheney, and Cohen 2003). Some children who struggle with behavioral and emotional functioning experience poorer academic performance, absenteeism, grade retention, a greater risk of dropping out, as well as juvenile delinquency (Perfect and Morris 2011). Furthermore, some students with behavioral and emotional problems experience a myriad of symptoms that hinder their social competence, including difficulty with self-regulation and with the development of social skills such as appropriate assertion, cooperation, and independence. The possession of these skills not only aids the child in interacting appropriately with his or her environment but aids in the development of academic and occupational success as well (Merrell et al. 2008). Deficits in these areas contribute to a “failure to thrive,” both in an academic and social sense, launching a risky cycle of school failure and school adjustment problems (Merrell et al. 2008).

Eighty percent of students who need behavioral and emotional supports do not receive them, particularly students living in under-resourced communities (Perfect and Morris 2011). Schools are in a unique position to identify early those students who need extra support and to provide the interventions that can increase student well-being and academic success. Universal screening is one way to identify those students at risk for behavioral and emotional problems, and therefore at greater risk of school failure, absences, and school disciplinary action such as suspension. Early identification of student risk allows diversion away from reactive practices to preventative practices (Chafouleas et al. 2010; Proctor et al. 2012).

Even though the benefits of universal school-based screening for behavioral and emotional risk have been identified and promoted for over a decade, only 12% of K–12 schools incorporate the use of a standardized instrument of school-wide behavioral and emotional screening in MTSS protocol (Bruhn et al. 2014). While this is a noticeable increase in implementation of school-wide screening, up from 5% approximately a decade before (Glover and Albers 2007), traction in the use of screening instruments has been limited. Considering the strong link between student behavioral and emotional functioning and academic success (Perfect and Morris 2011), this slow growth of attention to screening procedures for behavioral and emotional risk is counterintuitive. To promote understanding and ultimately widespread application of screening instruments, researchers suggest that additional empirical data is needed to establish predictive validity and to compare their use to other referral procedures, such as monitoring of office discipline referrals (Dever et al. 2015; Dowdy et al. 2014; Kamphaus et al. 2014).

Meanwhile, in the absence of universal screening for behavioral and emotional functioning, schools rely on information gathered through ODRs to make decisions about which students to support, and how to support them (Clonan et al. 2007). ODRs are defined as teacher-reported, teacher-documented observations of student behavior that violate school rules (McIntosh et al. 2010). ODRs are a convenient, easily accessible tool for measuring student behavioral functioning and are seen by teachers as valid tools (Irvin et al. 2004). In an MTSS problem-solving model, students whose frequency of ODRs is higher than that of other students would be identified as in need of behavioral referral (Clonan et al. 2007). Previous research indicates that ODRs can be systematically evaluated so that the frequency of ODRs can indicate the level of risk a child is experiencing (Predy et al. 2014). A study by McIntosh et al. (2010) found that students with ODRs between 0 and 1 were at “normal” risk, 2–5 were at “elevated risk,” and 6 or more were at “extremely elevated” risk.

While ODRs provide one method for evaluating student behavior, there are several limitations to consider. First, the tracking of ODRs is a reactive practice rather than a preventative one. This refers to the fact that ODRs are generally assigned once a child has displayed highly disruptive behaviors, and that students are only flagged for intervention once they have received several ODRs. This is especially problematic when ODRs are used to inform preventative programs. The information collected by ODRs may not be valid until enough ODRs have been collected to identify students at risk of school failure, which may be too late for optimal, early intervention. Secondly, ODRs have demonstrated only limited predictive validity. In their analysis of ODR data collected on more than 900,000 students, McIntosh et al. (2010) found inconsistent patterns of predictive validity. They concluded that while ODR data had its uses for progress monitoring and for the understanding of specific behavioral problems students may demonstrate, ODRs were not supported for use as early screening measures.

The ease of data collection and the relatively inexpensive process of collecting ODRs are particularly appealing to schools with limited resources such as inner-city schools serving students from low-income communities. However, several aspects of ODR reporting strategies in urban schools serving students from low-income communities, such as the advent of no-excuse policies, make ODRs even less likely to provide valuable information on true student risk (Golann 2015). For example, students living in low-income households often face additional challenges, including acute and chronic stress as well as health and safety issues that can affect their behavior in school (Wadsworth and Rienks 2012). Intensifying these factors is a current trend in educational reform practice in schools serving students from low-income communities that includes more rigid discipline policies. Students in these settings may receive citations for placing a head down on a desk or wearing socks that do not match uniform expectations (Golann 2015). Consequently, these schools may report a higher incidence of problem behaviors that result in office referral, compromising the utility of these referrals to tease apart levels of need (Warren et al. 2003).

There is also a subjective aspect to discipline referrals. Teacher decisions about when to issue an ODR can be as much a function of a teacher’s unique behavioral expectations and biases as they are an indicator of student risk (Kern and Manz 2004; Skiba et al. 2002). For example, Skiba et al. (2011) found that African American elementary school students were more than twice as likely to receive an ODR as White peers. This higher incidence of discipline referral could not be attributed to a difference in behavior (American Psychological Association Zero Tolerance Task Force 2008). Though not directly related to office discipline referrals, Fish (2016), when looking at referrals for special education, found that when presented with a vignette, teachers’ decisions about whom to refer for special education, including for behavioral disorders, depended on the race and gender of the student presented in the vignette. The vulnerability of ODRs to implicit biases limits the utility of the office discipline referral as the primary indicator of true behavioral and emotional risk (e.g., McIntosh et al. 2010; Skiba et al. 2011).

Because of the limitations of ODRs in identifying student risk, it becomes even more urgent to utilize systematic screening procedures that emphasize prevention, promotion, and intervention services, not just a reactive approach to problem behaviors (Warren et al. 2003). Systematic school-based universal screening provides information on indicators of emotional and behavioral problems and therefore affords schools with the ability to implement early intervention as needed. Furthermore, such screeners are more likely to indicate underlying student behavioral and emotional risk rather than incompatibility with school rules that are particularly sensitive to situational variables. In short, the most appealing aspect of using a screening instrument is that it can be applied proactively and systematically to determine true underlying student risk so that students who most need support can receive it earlier. Furthermore, because of the standardized nature of universal screening instruments, there is potential to negate the bias found in more subjective data collection methods such as ODRs (Raines et al. 2012).

This study furthers the research into best practices for behavioral MTSS data collection by comparing the predictive validity of ODRs to a standardized universal screening system for behavior that also relies on teacher-reported data, the Behavior Assessment System for Children-2 (BASC-2) Behavioral and Emotional Screening System Teacher Form (BESS TF, Kamphaus and Reynolds 2007).

Predictive Validity of the BESS

The BESS TF (Kamphaus and Reynolds 2007) was developed as a nationally normed and standardized screening measure for use in school settings (King et al. 2012). Researchers have demonstrated evidence of the BESS’s validity as a screening instrument for practical use in schools, including evaluation of the BESS’s internal consistency, external concurrent validity, and predictive ability (Kamphaus et al. 2010).

Preliminary research indicates that the BESS TF is an effective means of identifying students with behavioral and emotional risks (Eklund and Dowdy 2014). Some studies have linked the BESS TF to teacher reports of student conduct problems, unusual patterns of thoughts and behaviors, and poor social skills (Dever et al. 2012; Kamphaus et al. 2007). Other studies have provided initial support for the relationship of the BESS TF to school outcomes such as behavioral misconduct, school engagement, and academic achievement (Eklund and Dowdy 2014; Kamphaus et al. 2010; King and Reschly 2014). For example, in a study of 26 third graders and 22 fourth graders from two California elementary schools (Renshaw et al. 2009), reports from the BESS TF completed in the first quarter of the school year were found to correlate with student behavioral, academic, and engagement indicators assessed from students’ first quarter report cards. The BESS TF has also shown some predictive validity through the identification of a significant relationship between BESS scores and school outcome variables taken at later time points. In a 2-year longitudinal study with 206 students in grades K–5, the BESS TF score derived in the first year of the study was correlated with teacher ratings of conduct problems, indices of school maladjustment, special education placement, and teacher-assigned reading and math grades reported on students’ report cards in the second year of the study (Kamphaus et al. 2007).

There are limited studies comparing systematic universal screeners to office discipline referrals, though both are used as screening measures to identify students in need of behavioral and emotional support. One of the few studies making this comparison analyzed ODRs’ identification rates compared to three systematic universal screeners as well as teacher nomination methods. Teacher nominations are direct referrals of students by teachers to support services such as counseling or special education. Results indicated that ODRs and teacher nomination methods identified the fewest number of students, with teacher nomination methods and the BESS TF showing the least convergence. The authors conclude that ODRs and teacher nomination methods are best for system level indicators of behavioral functioning and measures of externalizing behaviors, but not as screening tools to identify student mental health concerns (Miller et al. 2015).

The systemic nature of the BESS TF and the focus on evaluating both externalizing and internalizing risk may be the reason for discrepancies recorded between the BESS TF and teacher referral methods (i.e., teacher nomination and ODRs). The BESS encourages teachers to systematically think of each child and to consider patterns that reflect internalizing symptoms along with the more easily observable externalizing behaviors (Raines et al. 2012). This characteristic of the BESS TF results in a screening method that is more likely to identify factors that underlie risk, including internalizing risk (Dowdy et al. 2013; Eklund et al. 2009). Because many universal screeners, in particular, the BESS, incorporate indicators of both internalizing and externalizing risks, rates of student identification using systematic tools have yielded results closer to expected community prevalence than have traditional teacher referral methods (Dowdy et al. 2013; Jellinek et al. 1999; Nelson et al. 2002).

The Current Study

The purpose of the current study is to compare the predictive validity of the BESS TF and teacher-reported ODRs. As schools turn to the implementation of programming to prevent behavioral and emotional problems in schools, and to promote student academic success and well-being, the tools we use to collect data on student need should align with these goals. Currently, office discipline referrals are the default method for identifying students at risk behaviorally and emotionally. This study seeks to understand if office discipline referrals are effective at early identification of student risk for academic and behavioral difficulties in school, particularly when compared to a psychometrically constructed universal screening instrument. Furthermore, this study seeks to answer this question when considering a school serving predominately African American youth from low-income communities.

The central questions in this study are whether the BESS TF, collected early in the year, predicts which students will be struggling academically and behaviorally at the end of the year, and whether it is a better predictor of these outcomes than the more traditionally collected ODRs. The authors believe that the BESS TF collected at the beginning of the year is more likely to pick up on underlying student risk, and therefore will be predictive of student academic and behavioral functioning at the end of the year while ODRs will not be. We are specifically interested in being able to identify students at risk of school failure due to behavioral and emotional risk before these problems become entrenched. Early identification of behavioral and emotional risk is key to putting preventative measures in place. Therefore, we further hypothesize that the BESS TF will be predictive of change in student academic functioning over the course of the school year, not just at the end of the school year. Because ODRs are reactive and subjective, we believe they will be less predictive of student academic functioning over the course of the year. Ultimately, the purpose of this study is to evaluate whether a standardized universal screener, the BESS TF, predicts student risk for school failure as measured by school outcome variables such as Grade Point Average (GPA), absences, and school suspensions. The second question in this study is whether the BESS TF is a better predictor of student school outcomes than the more traditionally used behavioral data system, ODRs.

Method

Participants

This study utilized archival data collected during the 2014–2015 academic school year, in an urban community in the Southeastern United States. The participating school was a kindergarten through eighth grade school, enrolling approximately 450 students. Due to the archival nature of the data, demographic data was not available for each student. However, public school records from 2014 to 2015 indicate that the student body was predominantly African American (98%), and most of the students were eligible for free or reduced lunch (96%; New Orleans Parents’ Guide to Public Schools 2015). While the school enrolls approximately 450 students each year, the BESS TF is completed for grades 1–3, while the BESS Student Form (BESS SF) is completed for grades 4–8. ODRs are only collected for first through eighth grade students. The current study focuses on the BESS TF and ODRs; therefore, the final sample represents first through third grades. The final sample included 142 (53.5% female) students in grades 1–3, ranging in age from 6 to 10 years (M = 7.67, SD = 1.08).

Procedure

The archival data utilized for this study was provided by the school. Due to the anonymous nature of the data, the study was deemed exempt from human subjects review by the university Institutional Review Board. However, all procedures were conducted in a manner consistent with ethical standards for research noted by the American Psychological Association (2010).

The BESS TF was administered 6 weeks into the school year to first through third grade teachers as part of the school’s universal screening efforts. Administration of the BESS TF followed common practice for the use of universal screening in school settings (Parisi et al. 2014). Testing coordinators provided guidance to teachers on the completion of screeners and were available for assistance while teachers completed forms. Teachers completed screening forms during a professional development session where all teachers were present. Teachers were given instructions on how to complete the scantrons and then independently completed a BESS TF for each child in their homeroom class. Scantrons were then scored using the BESS software, which reported each item response and a final global BESS TF t score.

Measures

BESS Teacher Form

The BESS Teacher Form (BESS TF; Kamphaus and Reynolds 2007) is a 27-item instrument designed to measure teacher-reported levels of risk for student behavioral and emotional challenges. Items for the BESS TF were selected from the larger BASC-2. The BESS TF asks teachers to rate items on a 4-point scale (i.e., never, sometimes, often, almost always) and is designed to be completed in 5–10 min per student. BESS items represent the domains of internalizing problems, externalizing problems, school problems, and adaptive skills. The BESS TF produces a raw score that is then converted to a t score. Higher t scores represent a greater risk for behavioral and emotional challenges (20–60 is “normal” risk, 61–70 is “elevated” risk, and 71+ is “extremely elevated” risk). The psychometric properties of the BESS TF are acceptable, having good split-half reliability (.96–.97), test-retest reliability (.83), and moderate correlations with other measures of behavioral and emotional problems (Kamphaus and Reynolds 2007).
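The risk bands described above can be sketched as a simple lookup. This is a minimal illustration, not the BESS scoring software: the function names are hypothetical, the example scores are invented, and treating any score of 60 or below (including scores under 20) as “normal” is an assumption made here.

```python
def bess_risk_band(t_score):
    """Map a BESS TF t score to the risk band named in the text."""
    if t_score >= 71:
        return "extremely elevated"
    if t_score >= 61:
        return "elevated"
    # Assumption: every score of 60 or below is treated as normal risk.
    return "normal"


def band_percentages(t_scores):
    """Return the percentage of scores in each band, rounded to 1 decimal."""
    bands = ["normal", "elevated", "extremely elevated"]
    counts = {band: 0 for band in bands}
    for score in t_scores:
        counts[bess_risk_band(score)] += 1
    total = len(t_scores)
    return {band: round(100 * counts[band] / total, 1) for band in bands}


# Hypothetical scores for illustration only; these are not study data.
example_scores = [45, 52, 60, 61, 70, 71, 79, 33]
print(band_percentages(example_scores))
```

Applied to a full set of teacher-rated t scores, the same tally would give the share of students in each risk range.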

In the current sample, t scores were normally distributed (skewness = .178; kurtosis = − .644) and ranged from 33 to 79 (M = 51.49, SD = 11.59). Using the median absolute deviation (MAD) method of detecting outliers (Leys et al. 2013), no scores were detected that exceeded a standardized threshold of ± 3. The BESS TF for this sample yielded a Spearman-Brown split-half reliability of r = .95 and an internal consistency of α = .94.
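Two of the computations named in this paragraph can be sketched briefly. The code below is an illustration under stated assumptions rather than the authors' analysis script: it uses the common MAD rule with a scaling constant of 1.4826 (which assumes roughly normal data) and the standard Spearman-Brown step-up formula for a two-half split. The function names and the example values are hypothetical.

```python
from statistics import median

# Scaling constant commonly paired with the MAD under a normality assumption.
MAD_CONSTANT = 1.4826


def mad_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` MADs from the median."""
    center = median(values)
    mad = MAD_CONSTANT * median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the rule cannot be applied")
    return [v for v in values if abs(v - center) / mad > threshold]


def spearman_brown(half_correlation):
    """Step up a correlation between two test halves to full-length reliability."""
    return 2 * half_correlation / (1 + half_correlation)


# Hypothetical values for illustration only; these are not study data.
print(mad_outliers([48, 50, 51, 53, 55, 120]))
print(round(spearman_brown(0.90), 3))
```

A threshold of 3 is a conservative choice; a lower threshold flags more cases.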

Office Discipline Referrals

Teachers would write ODRs down during the day on a form, including the date, time, category of the incident, and description of the incident. Teachers would then enter ODRs at the end of the day using a software platform. For this study, the ODR variable is continuous and represents the frequency of ODRs the student received in September (M = 4.99, SD = 8.18). ODRs ranged from 0 to 46. When teachers logged behaviors, all entries had to be assigned predetermined categories that included behaviors like causing a disturbance in class, bullying, or cursing/using vulgar language. For purposes of this study, ODRs representing minor risks such as chewing gum or a uniform violation were excluded from analysis (McIntosh et al. 2010; Predy et al. 2014). The ODR variable used in this study includes ODRs assigned from the beginning of the school year and through the month of September when the BESS TF was administered.

Academic Outcome Variables

Student academic functioning was measured by using the GPA system where scores range from 1 to 4, 4 indicating the highest GPA possible. For this study, we used GPA as measured each quarter, where each quarter represents 9 weeks of the school year. Research indicates that GPA is an indicator of academic skill, including the ability to learn new information and complete assigned class work (Duckworth et al. 2012), and is a predictor of school success generally (National Education Association n.d.).

Behavioral Outcome Variables

Behavioral outcome variables included suspensions and absences. Student suspensions are continuous and represent the total number of suspensions (in-school and out of school) garnered by the end of the school year (M = .93, SD = 2.07). Absences are the number of school days the child was absent during the school year (M = 9.3, SD = 7.84).

Results

Data Screening

Prior to conducting statistical analyses, data screening procedures were used to address issues with data accuracy as well as missing data (Tabachnick and Fidell 2007). Data screening procedures revealed that 5% of the sample had missing data for one or more key outcome variables such as GPA or absences. These data were probably missing because some students transferred to or from another school during the year. Further examination of these cases revealed no pattern associated with the missing data. Due to the small number of cases with missing data (7) relative to the overall sample size (142), as well as the random nature of the missing data, we chose to delete the cases with missing data rather than use an estimation technique to replace the missing values (Tabachnick and Fidell 2007). No outliers were identified in the sample and all scores fell within expected variable ranges. After data screening procedures were conducted, the resulting sample included 135 participants.
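The case deletion described above (often called listwise deletion) can be sketched as follows. The record layout, field names, and values are hypothetical and serve only to show the filtering step.

```python
def listwise_delete(records, required_fields):
    """Keep only records with a non-missing value for every required field."""
    return [
        record
        for record in records
        if all(record.get(field) is not None for field in required_fields)
    ]


# Hypothetical records for illustration only; these are not study data.
students = [
    {"id": 1, "gpa": 3.2, "absences": 4, "suspensions": 0},
    {"id": 2, "gpa": None, "absences": 12, "suspensions": 1},
    {"id": 3, "gpa": 2.5, "absences": None, "suspensions": 0},
    {"id": 4, "gpa": 3.9, "absences": 2, "suspensions": 0},
]

complete = listwise_delete(students, ["gpa", "absences", "suspensions"])
print([student["id"] for student in complete])
```

Dropping whole cases keeps every analysis on the same set of students, at the cost of a smaller sample.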

Descriptive Statistics

Descriptive statistics regarding study variables can be found in Table 1. Both the BESS and ODRs were used to measure the number of students considered to be at behavioral or emotional risk. Using the BESS TF, the sample included 77.8% within a normal risk range, 16.3% within the elevated risk range, and 5.9% within the extremely elevated risk range. Using accepted cutoff points for ODRs (McIntosh et al. 2010), 33.9% of students are considered in the extremely elevated risk range (5+ referrals), 24.5% of students in the elevated risk range (2–4 referrals), and 41.5% of students in the normal risk range (0–1 referral).

Table 1 Descriptive statistics of key variables

Correlations reveal that the BESS TF risk score is negatively associated with GPA (r = − .614, p < .01), and positively associated with absences (r = .335, p < .01) and suspensions (r = .337, p < .01). ODRs were only correlated with suspensions (r = .272, p < .01). Table 2 presents correlations for study variables. Gender and grade were incorporated in the correlation analysis in order to assess for a potential relationship to study variables. Grade showed some correlation with GPA, while gender showed some correlation with the BESS TF, GPA, and suspensions.

Table 2 Correlations

Hierarchical Linear Regression

A series of regression analyses were run to compare the predictive validity of the BESS TF and ODRs collected in the same time period. GPA, absences, and suspensions were included as dependent variables. Table 3 summarizes the findings of each model. Because participant grade and gender are correlated with key variables, these were entered as control variables. When controlling for gender and grade, the BESS TF significantly predicted end-of-year academic and behavioral functioning, including GPA, absences, and suspensions. Hotelling’s t test for non-independent correlations was used to compare the predictive utility of the BESS TF and ODRs. The BESS TF accounted for significantly more variance in GPA (t(132) = 5.062, p < .001) and absences (t(132) = 2.370, p < .02) than ODRs. There was no difference in the variance accounted for in number of suspensions (t(132) = .438, p > .05).
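For readers unfamiliar with the comparison of dependent correlations, the sketch below shows the textbook form of Hotelling’s t for two correlations that share one variable (here, an outcome j correlated with two predictors k and h measured on the same students). It is offered as an assumption about the general technique, not as the authors’ exact computation; its degrees of freedom, n − 3, agree with the reported t(132) for a sample of 135. The function name and the input correlations in the usage lines are hypothetical.

```python
import math


def hotelling_t(r_jk, r_jh, r_kh, n):
    """Hotelling's t for two dependent correlations sharing variable j.

    r_jk: correlation of the outcome with predictor k
    r_jh: correlation of the outcome with predictor h
    r_kh: correlation between the two predictors
    n:    number of cases
    Returns (t, degrees_of_freedom).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Determinant of the 3 x 3 correlation matrix of j, k, and h.
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    if det <= 0:
        raise ValueError("correlations do not form a valid correlation matrix")
    t = (r_jk - r_jh) * math.sqrt((n - 3) * (1 + r_kh) / (2 * det))
    return t, n - 3


# Hypothetical correlations for illustration only; these are not study data.
t_value, df = hotelling_t(r_jk=0.5, r_jh=0.2, r_kh=0.3, n=103)
print(round(t_value, 3), df)
```

The sign of t follows the order of the two correlations, so the test is usually read on its absolute value against a t distribution with n − 3 degrees of freedom.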

Table 3 Regression

Discussion

To date, ODRs remain the most commonly used data for assessing student behavioral needs. While ODRs can provide valuable information on student functioning, they are particularly susceptible to subjective observations of student behavior and are reactive versus preventative. The practice of utilizing ODRs for identification of student behavioral and emotional risk is a reactive approach to monitoring student need that is contrary to the prevention framework of MTSS. This is especially meaningful in schools serving minority students from low-income communities where subjectivity in the referral process may impact who receives ODRs. Furthermore, risk factors associated with poverty may manifest as behaviors that are incompatible with school behavioral expectations. In these schools, ODRs may be an indicator of immediate teacher reactions to student behavior rather than true behavioral and emotional risk.

The current study compares ODRs and a standardized universal screening tool for behavioral and emotional risk, and the utility of these instruments in predicting student outcomes as measured by GPA, suspensions, and absences at the end of the year. Ideally, a strong data collection tool in an MTSS framework would be able to identify students at academic and behavioral risk throughout the school year. Moreover, a psychometrically sound data collection tool would minimize bias by promoting reliable and valid reporting on indicators of student difficulty (Belser et al. 2016).

Results of this study indicate that the BESS TF and ODRs differ in the number of students they report as at risk, with the BESS reporting numbers more like what is expected in an MTSS model. The highly subjective nature of ODRs may be one reason why student risk status as measured by ODRs in this study is so highly divergent from what is expected within an MTSS framework. While higher rates of risk are generally expected in urban schools (Warren et al. 2003), the rate of 34% of students at extremely elevated risk would seem to be more a function of the way risk is being reported than of students’ actual behavioral and emotional risk. The BESS TF may present a more accurate picture. This more accurate picture is particularly important for connecting the students most in need of support to services. Being able to identify students needing support at the tier II and tier III level will allow schools to more effectively and efficiently initiate intervention efforts, while also focusing on universal needs such as the implementation of PBIS programming or SEL programming.

Preliminary analyses also indicate that ODRs in the beginning of the year are not related to end of the year outcomes like GPA and absences. Regression analyses strengthen this claim, indicating that the BESS TF is a better predictor of GPA and absences at the end of the year than ODRs taken during the same time period. Both GPA and absences are important indicators of student functioning. Student absences factor into how schools are evaluated, as they can be an indicator of school climate, and student well-being generally. GPA is also an important outcome variable as it reveals how students are functioning academically. Specifically, GPA gives school staff an indication of how students are engaging with, organizing, retaining, and using what they are learning in the classroom (Duckworth et al. 2012). Results of this study indicate that students with higher BESS TF scores are less likely to come to school regularly and more likely to struggle academically. ODRs collected early in the year do not provide this indication.

While ODRs are not related to GPA and absences at the end of the year, both the BESS TF and ODRs can predict student suspensions. Like ODRs, suspensions identify students who are exhibiting behaviors that are not in line with school expectations. Suspensions are generally assigned when a student has exhibited egregious behavior, such as fighting, that may result in harm to themselves or others. ODRs might capture similar behaviors, and their predictive value may come from identifying students who are more prone to acting in ways contrary to school rules. The BESS TF measures similar variables and can also predict student suspension rates. Therefore, both give a good indication of which students are more likely to exhibit behaviors that violate school norms. However, for schools interested in supporting students and reducing the use of exclusionary discipline practices, the BESS TF gives a broader indicator of student functioning, including both externalizing and internalizing behavioral and emotional risks. Acting on this broader indicator can result in fewer instances of the more severe behaviors that lead to suspension.

Standardized screening referral systems like the BESS TF can give us more accurate information on student risk as measured by other school variables of importance such as GPA and school absences. Traditionally, schools use ODRs to measure student behavioral needs. While ODRs may reveal how often a student behaves in a way that contradicts school expectations, this is not necessarily an indicator of behavioral and emotional risk. This finding is especially important as schools increasingly recognize the need for school services that address the whole child, and do not divorce behavioral and emotional functioning from academic functioning. As schools seek to better support students and to institute intervention and prevention programs around behavioral and emotional health, ODRs may not be enough to indicate which students need support or to measure the impact of these programs directly.

The results of this study have practical implications for educators. First, the traditional model for connecting students who display behavioral and emotional risk to services is subjective and has been linked to current disproportionality in special education for emotional and behavioral disorders, as well as the overrepresentation of African American youth in school discipline data despite lack of evidence that these youth display more maladaptive behaviors (American Psychological Association Zero Tolerance Task Force 2008). Results of this study indicate that ODRs may indeed be more a function of teachers' immediate reactions to student behavior than of true student risk. ODRs may help schools understand which school rules students are having difficulty following and therefore help schools reshape their discipline policies. However, ODRs are not useful in identifying which students may need early support due to behavioral and emotional risk, especially in an urban school serving primarily African American youth from low-income communities. While these students may be more likely to come to school with a higher chance of displaying behavioral and emotional risk, ODRs may not be a useful tool for planning how to address student needs at either the universal or targeted levels.

Limitations and Future Directions

This study provides a needed comparison between ODRs, the most commonly used extant data on student behavior, and a standardized universal screening measure for behavioral and emotional risk. While this information will contribute to understanding the practical application of these methods in an urban school, results may be limited to such urban schools and may not be generalizable to a broader population. However, it is also important to understand the unique context of schools in urban districts serving low-income youth, especially regarding current education reform practices and disparities in educational outcomes. This study adds to the literature by answering a critical question about using behavioral data to make decisions about student supports. Furthermore, while GPA as a measure of student academic functioning is an important school outcome variable, examining the relationship of these variables to standardized testing scores, a valuable metric to schools, would help strengthen the conclusions of the study.

Decades of research have illustrated that students from certain racial and ethnic groups are more likely to be subjected to exclusionary discipline practices such as suspension and expulsion (Gershoff and Font 2016). Likewise, office discipline referrals are subject to similar bias (Staats and Patton 2014). The subjective decision-making involved in whether a teacher issues a discipline referral for misbehavior limits our knowledge of the quality of ODR data obtained for this analysis and thus may limit the application of findings generally. Additionally, neither the variance with which ODRs were applied by individual teachers nor the variance in the type or severity of behavior that prompted the discipline referral was investigated in this analysis. Future research may investigate ways in which child and teacher factors affect ODR application and thus how variance by teacher affects comparison to standardized screening tools. Regardless, results of this study demonstrate that while ODR data are important for school decision-making and provide an important indicator of student functioning, reliance on ODRs as the primary mechanism for referral within an MTSS system is inadequate.

Summary

MTSS relies on the collection and monitoring of data and the application of data to the allocation of resources for early intervention and prevention initiatives. In previous studies, the BESS TF identified more students than teacher referral. However, in environments that have strict discipline policies that lead to high numbers of behavioral infractions, the BESS TF may accurately identify students in need of support. This study supports the predictive validity of the BESS TF as a universal screener in schools and adds to the literature by examining implementation and use in a school serving a predominantly low-income African American student body. Researchers are encouraged to assess the effectiveness of screening instruments as used in various populations. This study suggests that the BESS TF provides a more accurate reflection of student need and social-emotional functioning than does the office discipline referral system. Data collected with the BESS TF can effectively inform ways in which prevention and early intervention resources can be allocated, especially in an environment where discipline practices result in high numbers of behavioral referrals. Additional research into ways in which data can support early intervention and be used in progress monitoring is recommended.