Introduction

Achievement gaps among whites, blacks, and Hispanics in standardized test scores and grade point averages (GPAs) are an on-going unfortunate feature of educational performance in many of today’s public schools (Berends and Penaloza 2010). Black and Hispanic students score as much as a standard deviation or a grade level behind their peers on a variety of standardized assessments, indicating academic struggles that may impact minorities’ futures as these groups are projected to occupy significant shares of the working age population (Lichter 2013; Passel 2011). This “puzzle of underachievement” (see Massey et al. 2011) is even more vexing given that a student’s socioeconomic disadvantage, while a core driver of academic ability, does not fully explain race/ethnic gaps in academic performance (Coley 2002; Downey 2008; Fryer and Levitt 2010; Massey et al. 2011; Phillips 2000). Even middle class minority students attending elite schools underperform when compared to their white peers (Massey et al. 2011). Understanding and ameliorating these gaps remains a persistent challenge to policy makers and researchers.

A small but growing body of work has explored correcting achievement gaps by addressing social psychological “threats” to student performance. Stereotype threat, the anxiety of confirming group-specific stereotypes of poor intellectual ability, has well-established links to the poor performance of racial minority students in college, high school, and even in earlier grades. Ameliorating experiences of threat is the focus of interventions aimed at enhancing student self-concept and self-integrity by exposing students to writing exercises that are “self-affirming” or allowing students to write freely on what they value. Cohen et al. (2006) have generated one such intervention that effectively closed the race/ethnic gap in grade point average (GPA) for students exposed to the treatment condition.

While these results are impressive, the question remains whether such gains can be replicated across contexts, particularly in schools and school districts where students of color represent a large share of the district and achievement gaps are likely to be large. The current study reports the results of an intervention implementing the self-affirmation exercise developed by Cohen et al. (2006) within a large urban school district in the state of Texas. We add to a growing body of work that has fielded this intervention by administering these exercises within a unique race/ethnic context. This urban school district is majority–minority: 8% of the students are white (non-Hispanic), 25% are black, and about 60% are Hispanic (Footnote 1). Additionally, several schools are composed entirely of black or Hispanic students, while white students represent one-third or more of students in other schools. We implement this exercise in three distinctive high school contexts: a predominantly black school, a predominantly Hispanic school, and a school whose composition is divided roughly into thirds: black, Hispanic, and white. Few if any studies have fielded an intervention targeting stereotype threat in such contexts. This is a critical gap in light of the fact that schools attended by black and Hispanic students are decreasingly likely to include white students and these schools are most likely to be poor performing (Condron et al. 2013). Therefore, exploring the efficacy of this intervention in such a context has the potential to speak to race/ethnic gaps that plague a growing number of American schools.

Our findings draw on a sample of 886 ninth graders who received this intervention during their English class at multiple points during the 2012–2013 academic year. We explored its effectiveness by comparing academic achievement on two standardized examinations and students’ Spring semester grade in their English class (the targeted class) of those receiving the “affirmation” treatment exercise and those receiving the “neutral” control exercise, where the exercise prompt presents students with a neutral or non-emotion eliciting topic to write about. We first gauge whether there are stronger treatment effects among black and Hispanic students than white students in test scores and grades; we then gauge whether treatment effectiveness varies by race/ethnicity of student or by school setting. In addition, we examine whether the effect of the treatment for black and Hispanic students varies by their school’s racial/ethnic composition.

Stereotype Threat and Affirmation Exercises: Can the Impact of Anxiety be Reduced?

The role of social psychological issues, specifically experiencing anxiety, has emerged as a potent contributor to the academic abilities of marginalized students. Stereotype threat, as identified by Steele and Aronson, refers to anxiety over “being at risk of confirming…a negative stereotype about one’s group” and this experience inhibits performance on academic tasks (Steele and Aronson 1995: 797). Therefore, while anxiety affects everyone, this form is linked specifically with being reflective about membership in a minority or socially subordinate group (Downey 2008). Stereotype threat occurs in the presence of some trigger, either the explicit linking of a task to intellectual ability, or a reference to marginalized status (e.g., marking race on a form or mentioning race) or even being in a context where a student’s minority status is made salient through being one of the only members of this group in one’s class. Members of groups who are stereotyped as having low intellectual ability experience anxiety when facing these triggers and perform worse on intellectual tasks than students who are not members of minority groups. Therefore, it is that anxiety, not poor ability, that drives down performance (Osborne 2001).

A growing body of scholarship finds evidence that stereotype threat negatively affects the academic performance of black and Hispanic students across a variety of age groups and school settings (Brown and Day 2006; Cohen et al. 2006; McKay et al. 2002; Nguyen and Ryan 2008; Schmader and Johns 2003; Steele and Aronson 1995; Walton and Spencer 2009; Wasserberg 2014). Stereotype threat impacts performance on cognitive tasks conducted by black and Hispanic students as young as elementary school age (McKown and Weinstein 2003; Wasserberg 2014). Most studies, however, focus on minority college students, while only a select few focus on minority middle school students, and even fewer studies focus on stereotype threat impacting minority high school students (see Marchand and Taasoobshirazi 2013). This is a critical gap as Osborne’s (2001) analysis reveals that threat explains a meaningful portion of the racial differences in academic performance among high school students. For example, Kellow and Jones (2008) find that black high school students score significantly lower on a high-stakes test than whites when they are told that their performance on the test will predict their performance on an important standardized test.

A critical question emerging from this work is whether the impact of threat can be reduced and academic performance improved. There is evidence that academic ability can improve when threat is relieved (Cohen et al. 2006). Enhancing students’ self-integrity, that is their feelings about their self-worth, can be a way to limit or even remove the influence of threat-driven anxiety. While several similar interventions exist (see Walton and Spencer 2009), Cohen et al. (2006) reported the most dramatic results. This intervention is, according to Cohen et al. (2006), designed to specifically target self-confidence in an attempt to find methods that lead to higher test scores. This self-affirmation exercise is aimed to function as a “catalyst” that boosts students’ self-confidence and self-integrity while allowing their abilities to be unencumbered, translating, theoretically, into better performance (Purdie-Vaughns et al. 2009), specifically during an anxiety-provoking moment when a student is about to be evaluated.

The intervention, a writing exercise completed four times during the school year, was fielded among middle school students to affirm their self-integrity and raise their grades (Cohen et al. 2006: 1307). The intervention takes an average of 15 min to complete and merely asks students to write about what is important to them at several points during the year. Among seventh graders, 70% of black student participants at a middle school benefited from the writing intervention designed to reduce stereotype threat; the overall racial gap decreased by 40%, with treatment students experiencing significantly higher GPAs (Cohen et al. 2006). Similarly promising results are reported in a two-year follow-up study indicating that the lowest-achieving black students who took the intervention improved the most when compared to the control group (i.e., students who completed a similar exercise that was designed to have a neutral impact on their self-integrity) (Cohen et al. 2009). Likewise, in an enhanced replication with a different body of students and teachers, in which teachers read students’ writing assignments in an effort to alter their presumed stereotypical views of students, researchers found that the writing exercise significantly improved black students’ overall GPA and narrowed the race gap in GPA to nearly nothing, while gaps persisted for those taking the control or neutral exercise (Bowen et al. 2013).

Placing Self-Affirmation in Context: How does Racial Composition in Schools Matter?

These results are impressive, but an important remaining question is whether similar treatment effects can be found across a variety of school settings. The results thus far are mixed, with some studies finding gains in academic performance (Borman 2012; Cohen et al. 2009, 2006; Hanselman et al. 2014; Sherman 2013) but others finding fewer or no effects for the intervention (Dee 2015). For example, Borman (2012), who fielded the intervention among students in three Minnesota schools, found gains in standardized math scores, but these gains were explained away with controls, and he did not find a treatment effect for reading scores. Dee (2015) explored this intervention in six Philadelphia schools and identified small and statistically insignificant gains in test scores and GPAs among black and white students in the treatment condition, but did not find an overall narrowing of the achievement gaps.

While prior work establishes that gains are possible using this intervention, these studies provide little insight into the role of the school itself. In an innovative departure, Hanselman et al. (2014) explored the efficacy of the intervention in contexts that may be deemed high threat compared to those that are low threat. Contexts are considered high threat when black or Hispanic students comprise a relatively small share of the school’s racial composition and interracial contact is a regular facet of a student’s experience in the classroom. When students are in the racial minority, they are more aware of their minority status and this awareness cues their anxiety of fulfilling stereotypes of the group (Cohen and Garcia 2005; Martiny et al. 2012). Notably, this work identifies stronger treatment effects for students in high-threat contexts.

Children of color increasingly attend schools that are composed entirely of other black or Hispanic students. These are contexts that have largely been untapped for assessment of stereotype threat or for interventions aimed at reducing its effects. Minority students attending schools with a large concentration of black and Hispanic students fall even more behind their white counterparts in both reading and math, compared to minority students in racially integrated schools or schools dominated by whites (Condron et al. 2013). Since the number of black and Hispanic students attending schools with large shares of whites has declined dramatically (Orfield et al. 2014), understanding the experience of students attending majority black or Hispanic schools is critical to addressing on-going achievement gaps.

Fielding the Intervention in a Majority-Minority School District and All Black/Hispanic Schools

How might the effect of a self-affirmation exercise vary across these different types of schools? On the one hand, we may find few effects of self-affirmation exercises across these schools because these are all low-threat contexts according to previous research. For example, in the study by Hanselman et al. (2014), high-threat contexts were schools where blacks and Hispanics represented 13 and 14% of the student body, while low-threat contexts were schools with 25 and 27%, respectively. By these standards, all the schools in the current study, and nearly all in this district, would be characterized as low threat. Presumably, whether a student of color is in a mixed environment or attending a school where their group is a large majority, they would likely not stand out as racially distinctive in a way that would correspond to a threat (i.e., be the only one in their classroom) and perhaps are less likely to confront the burdens of race-based stereotyping. In addition, even if white students are present, these students may be more cognizant of racial difference (i.e., negative stereotypes about one’s racial group) in these environments and may develop a strong affinity to communities of color (Morris 2006; Perry 2002). While students of color face academic challenges in these environments, it is not clear if they are due to stereotype threat.

On the other hand, it is possible that this intervention is effective in raising the academic performance of black and Hispanic students in these contexts, even if they could be understood as low threat, because students of color may encounter a high-threat context in their classrooms. Moreover, issues of race/ethnic disparities are not necessarily absent in schools where multiple groups are equally represented, since race shapes students’ self-perception and understanding of their own chances to succeed, as well as how students understand and view other students (Bettie 2003; Lewis 2003; Morris 2006; Tyson et al. 2005).

Additionally, self-affirmation may be effective even where students of color are the majority. Schools composed entirely of black or Hispanic students are more likely to be lower performing, least desired by parents, and most likely to face instability in terms of teacher and administrator turnover and even possible closure (Condron et al. 2013; Lee and Klugman 2013; Orfield et al. 2014). As a result, it is possible that even in the absence of interactions with racial out-group members, students’ self-confidence and their sense of self-worth are compromised in these settings, producing lower academic performance, independent of ability. In particular, high-stakes testing moments are classically associated with anxieties and boosting self-integrity might be particularly effective in these environments.

Research Questions

We advance the following three research questions. First, we test whether receiving the self-affirmation treatment exercise led to better academic performance, independent of the race/ethnicity of the student, the race/ethnic composition of the school, or other relevant background characteristics. To capture academic performance, we employ two central metrics: (a) teacher-assigned grades in English (the class where the intervention is fielded) and (b) performance on standardized tests. Prior assessments have uncovered improvement in grades (see Cohen et al. 2006), but fewer have identified improvement in standardized test performance. We compare the grades and standardized test scores of those receiving the self-affirmation treatment to those who received an exercise that did not include an affirmation component (i.e., the control exercise). If the treatment has the intended effect, students who received and completed the treatment exercises should have better academic performance at the close of the academic year than the students in the control condition.

Second, we ask whether the intervention is more or less effective for students by race/ethnicity. As this intervention is designed to alleviate anxiety that taxes minority students’ performance in particular, we expect the strength of the association to vary by race, with greater benefits for black and Hispanic students than for white students. A contribution of our work is addressing whether Hispanic and black students experience the same treatment effects. The earliest and most persistent evidence shows black students experiencing stronger treatment effects than white students, but we have less information on Hispanics (Steele and Aronson 1995).

Third, we advance a similar question regarding race/ethnic composition of the school, as prior research has been fielded in settings where white students are the majority and the probability of interracial contact is high (see Hanselman et al. 2014). Specifically, we investigate whether the treatment effects for black and Hispanic students are stronger in a racially mixed school, where they encounter racially or ethnically different students, than in the corresponding school where they are members of the majority and thus interact mostly with students of the same racial group. As stereotype threat is cued in contexts where inter-group contact is frequent, we anticipate the impact of the treatment on academic performance will be highest in the racially mixed school and relatively weaker in the predominantly black and predominantly Hispanic schools.

Data and Methods

The Schools

With growing evidence that U.S. schools are becoming more racially segregated (Frankenberg and Lee 2002; Orfield et al. 2014), a growing number of schools are almost entirely composed of one race/ethnic minority group, either black or Hispanic. We argue above that this may shift the experience of stereotype threat and present a compelling context to field an intervention aimed at reducing its impact on academic performance. We drew on data collected by the school district. The district requested that the Houston Education Research Consortium (HERC) analyze the data. HERC conducted a randomized experimental design to field self-affirming writing exercises to students in a large metropolitan school district in Texas. The district administered the exercises, collected the data, and shared the data with HERC.

The specific schools were identified through a stratified school search based on these racial/ethnic criteria (Footnote 2). Our sampling approach prioritized identifying different schools with distinctive race/ethnic compositions, highlighting the different types of schools in which black and Hispanic students find themselves. We fielded the intervention in a school district where only 8% of the students are white (non-Hispanic), 25% are black, and more than 60% are Hispanic. Moreover, there is a great degree of variation across school campuses, with several schools composed almost entirely (in excess of 90%) of black students or Hispanic students, while other schools are more than 50% white or Asian, even though these groups represent less than 15% of the district (Texas Education Agency 2012). Notably, there are no schools in this district where white students are a large majority.

We identified three types of schools that represent three distinct but common configurations within the district. We identified a mixed school with roughly equal shares of black, white, and Hispanic students; a predominantly black school where black students made up almost 100% of the school; and a predominantly Hispanic school where Hispanic students made up more than 95% of the school.

Ideally, these schools would be similar in many respects to truly tap the effect of race/ethnic composition; however, they differ on several dimensions. The predominantly black and Hispanic schools are lower performing than the mixed race campus. As of 2011, one year prior to the administration, students from the predominantly black and Hispanic schools earned scores well below the state average on state standardized tests (Footnote 3). A salient question is whether this intervention can be effective in low-performing contexts. We will return in our discussion to this issue. Schools also differed in terms of class size. Across the 54 classrooms where the intervention was fielded, the average class size was 13 students in the school we defined as “mixed” (i.e., roughly one-third each black, white, and Hispanic students), 19 students in the “predominantly Hispanic” school, and 22 students in the “predominantly black” school. Despite these differences across schools, a comparison across these contexts signals an important contribution as these educational settings are largely absent in the work on stereotype threat.

Procedure

The administration aimed to follow the affirmation writing exercises of Cohen et al. (2006). Ninth-grade students (roughly aged 14) were given a series of four short writing exercises (lasting between 15 and 20 min) in their English classrooms during the 2012–2013 academic year. With the approval of the district and the principals and deans of instruction at the individual schools, this exercise was integrated into students’ daily class assignments, and we were not obligated to obtain informed consent from the students or their parents. To target moments of high academic stress, when anxiety related to stereotype threat is theoretically operating, the exercises were administered no more than three days prior to four high-stakes tests: (1) a national pre-college exam administered in October, (2) the semester final exam in December, (3) a school-based examination in February, and (4) an end-of-the-year standardized test in April that is required for graduation and is used to determine promotion to the next grade.

We aimed to administer the intervention in a way that mimicked, as much as possible, its appearance as part of the regular curriculum. We therefore employed a double-blind approach. We began by soliciting school administrators to allow their teachers, as opposed to members of the research team, to give the exercises to their students during class. Once we received approval, we trained teachers on the administration. They were informed that our goal was to examine the potential benefits of a special type of writing assignment, one where students were allowed to write freely; students should therefore be instructed not to worry about spelling or grammar. We also explained that we were fielding various formats of the writing exercise but did not indicate that these signaled a “treatment” or “control” condition; therefore, students should be instructed not to compare their assignments with those of their peers. During the training, the research team intentionally did not mention stereotype threat, race, or self-affirmation. We then provided the exercises directly to teachers a day before the intended administration date in sealed envelopes, each marked only with the specific student’s name at the top.

The exercise itself included a list of 12 items that reflected values students might hold (e.g., spending time with friends or family, enjoying sports, enjoying music). Students assigned to the treatment circled those values that were important to them and then elaborated on why these values mattered (e.g., why the student values sports or spending time with friends and family). The proprietary nature of the self-affirmation exercise limits our ability to provide more description, but we invite readers to see Cohen et al. (2006) for more specifics. The control condition provided a similar list of values, but instead of a prompt to elaborate, it gave students a brief writing prompt on a “neutral” scenario (e.g., whether the U.S. should continue using the penny given increasing production costs). Later in the cycle, students were given modified versions of these exercises to reduce repetition. Students in the control condition were provided a new neutral scenario for each exercise (Footnote 4).

These writing exercises were adapted from the self-affirming intervention developed by Cohen et al. (2006), which has been fielded in several other studies (Borman 2012; Dee 2015; Hanselman et al. 2014; Sherman 2013). In adapting these exercises, we made several changes. First, our writing prompt for the control exercise differs because pilot testing suggested the original control writing prompt was confusing for students. The purpose of the control is to elicit non-emotional student responses, and our revised control prompt succeeded in gathering written responses that did not reflect emotional reactions. Second, we had English teachers administer the writing exercises, as opposed to researchers, to signal that this exercise is part of the classroom curriculum. Finally, in order to maintain the appearance of a typical English curriculum assignment, we did not employ the racial climate survey Cohen and colleagues administered that gauges the centrality of race for students’ daily experiences. We acknowledge that these changes limit the degree to which this is a true replication of the Cohen and colleagues study, but we have maintained all other features of the self-affirmation exercises.

Sample

Identifying our study sample from the population of students involved the following steps. Once three schools were selected, we obtained the roster of roughly 1600 ninth-grade students (with an average age of 14–15) across the schools. Administrative data on students’ race/ethnic background, gender, and prior achievement on standardized examinations were merged onto these files. Students were stratified by race/ethnicity across all three schools to ensure representation of each race/ethnic group (black, white, Hispanic) in the treatment and control groups. We then conducted a student-level randomization within each school, randomly assigning students to either treatment or control conditions, a designation they maintained throughout the school year. We provided the intervention materials to teachers in individually marked envelopes for each classroom. Teachers passed out the materials with student names already printed on each intervention so that students received their assigned treatment or control exercise. Follow-up tests indicated no statistically significant differences between treatment and control in gender or race/ethnicity across the full sample and within schools. Notably, the balance on race/ethnicity held even within schools identified as consisting of predominantly black and Hispanic students. We excluded students in English as a Second Language (ESL) or other Limited English Proficient (LEP) programs or special education. We also excluded students who entered school after the school year began, those who shifted classrooms, or who exited the school before the year was over. This limited our initial sample to 1245 students.
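The assignment procedure described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually run for the study: it assumes strata defined by school and race/ethnicity, an even split within each stratum, and a hypothetical roster layout (`id`, `school`, `race`); the function name `assign_conditions` and the seed value are likewise invented for illustration.

```python
import random
from collections import defaultdict


def assign_conditions(roster, seed=2012):
    """Randomly assign students to 'treatment' or 'control' within strata.

    roster: list of dicts with hypothetical keys 'id', 'school', 'race'.
    Strata are (school, race) cells; this blocking scheme is an assumption.
    Returns a dict mapping student id -> condition.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    strata = defaultdict(list)
    for student in roster:
        strata[(student["school"], student["race"])].append(student["id"])

    assignment = {}
    for key in sorted(strata):  # sorted for a deterministic processing order
        ids = strata[key]
        rng.shuffle(ids)
        n_treat = len(ids) // 2
        # In an odd-sized stratum, a coin flip decides which arm gets the extra student.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for position, student_id in enumerate(ids):
            assignment[student_id] = "treatment" if position < n_treat else "control"
    return assignment


# Hypothetical toy roster, for illustration only.
toy_roster = [
    {"id": i, "school": "mixed", "race": race}
    for i, race in enumerate(["black", "white", "hispanic"] * 4)
]
toy_assignment = assign_conditions(toy_roster)
```

Because each student keeps the same condition for all four exercises, an assignment of this kind would be generated once and then reused when preparing the named envelopes.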

Our goal was to limit our analysis to students who were “fully exposed” to the administration, that is, students who were present in the classroom and who received and returned the appropriate exercise. We ultimately dropped students who received or returned the incorrect exercise, were absent during the administration, or changed classrooms. This resulted in the removal of roughly 29% (N = 359) of the initial sample of 1245 students. While, ideally, these exclusions should be random, ancillary analyses reveal students included in our sample are disproportionately white, female, and not disadvantaged. Also, students excluded from our analyses are more likely to attend the predominantly black school in our sample. The significant association between inclusion in our sample and these characteristics raises important questions for the analysis that we return to in the discussion. Our final study sample includes 886 students, and we specify the number of students in each school in the forthcoming tables.
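The sample counts reported above can be cross-checked with simple arithmetic. The variable names below are illustrative; the figures are those given in the text.

```python
initial_sample = 1245     # students remaining after eligibility exclusions
not_fully_exposed = 359   # dropped: wrong exercise, absent, or changed classrooms

final_sample = initial_sample - not_fully_exposed
attrition_share = not_fully_exposed / initial_sample

print(final_sample)               # 886
print(f"{attrition_share:.1%}")   # 28.8%
```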

Measures

Dependent Variables

We employ three different indicators of student achievement to capture different facets of academic performance: students’ standardized test results in Reading and Algebra and the English grade teachers assigned students at the close of the Spring semester (or academic year). The initial test by Cohen et al. (2006) that revealed positive results employed GPA, and others have found treatment effects that differ between GPA and standardized test scores (Dee 2015; Hanselman et al. 2014). We assess the academic performance of students with a continuous measure of the grade received, ranging from 0 to 100, at the close of the Spring semester in English class (the course where they received the exercises). We refer to this outcome as “Spring Semester English grade”. We also draw on their standardized test score in English Reading I and the standardized test score in Algebra I on the STAAR-EOC (State of Texas Assessments of Academic Readiness-End of Course Results). We hereafter refer to these assessments as Reading and Algebra. The district administered this assessment for the first time during the year in which we conducted this intervention. The EOC assesses students’ knowledge and skills necessary for success in future academic courses.

We examine STAAR performance as a continuous outcome and observe whether taking the treatment relative to the control translates into increases in average score performance, which aligns with the emphasis of previous tests of this intervention. The STAAR scores are horizontally scaled, which allows for comparison across test forms from year to year for a specific subject assessment. Horizontally scaled scores are comparable only within the same subject area; they cannot be compared with the scale scores of students in other grades or in a different subject area.

Independent Variables

Our key independent variables are presence in the treatment or control conditions (1 = treatment, 0 = control), race/ethnicity, and school. Our analysis only focuses on the three largest racial/ethnic groups in the district, under the Office of Management and Budget designation that is adopted in the categories provided by the PEIMS (Public Education Information Management System) data: whites (non-Hispanic), blacks (non-Hispanic), and Hispanics. Students in all other categories, including Asians, Pacific Islanders, multiracial students, and students classified as “Some other race” are placed in an “Other race” category in our analyses.

Controls

We adjust for a variety of characteristics, coded categorically, that may have implications for academic performance. These are gender (male/female), economic disadvantage, and English proficiency. Gender is dichotomous (male = reference). Economic disadvantage is a three-category variable: not disadvantaged (reference), Free/Reduced Lunch (FRL), or living in poverty (household income at or below the poverty line). FRL status is a measure of student socioeconomic status identifying students who are eligible for free or reduced-price meals under the National School Lunch Act; eligibility is determined by household size and income or by categorical eligibility, which is a proxy for income data, at roughly less than 185% of the federal poverty line. In our data, we further distinguish students whose household incomes are below the federal poverty line. Status as LEP, also known as English Language Learners (ELLs), refers to students who are eligible for language assistance programs (e.g., English as a Second Language or bilingual education).

Analysis Plan

Our goal is to compare treatment and control students in terms of academic performance. We therefore conduct a series of OLS regression models predicting our key outcomes, all of which are continuous, assessed at the close of the academic year. To test for the effect of the intervention (research question 1), we assess the average academic performance of treatment relative to control students in a baseline (Model 1) and a fully adjusted model (Model 2) that includes covariates for race/ethnicity and school and other relevant characteristics. To test whether the intervention’s effect intersects with race/ethnicity (research question 2) and school (research question 3), we introduce a series of interactions between receiving the treatment and student’s race/ethnic background and school (Models 3 and 4, respectively).
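To make the interaction specification concrete, the sketch below builds a Model 3 style design matrix (treatment, race/ethnicity indicators, and their products) and fits it by ordinary least squares. It is a toy illustration under stated assumptions, not the estimation code used in the study: the data are hypothetical and noise-free, white students are assumed to be the reference category, the controls and school indicators are omitted, and the coefficient values in `TRUE_BETA` are invented. It also reports no standard errors, which a statistical package would supply in practice.

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(treat, race):
    """Columns: intercept, treatment, black, hispanic, treat x black, treat x hispanic.

    White is the assumed reference category.
    """
    black = 1.0 if race == "black" else 0.0
    hispanic = 1.0 if race == "hispanic" else 0.0
    return [1.0, treat, black, hispanic, treat * black, treat * hispanic]


# Hypothetical coefficients used only to generate noise-free toy outcomes.
TRUE_BETA = [80.0, 0.5, -6.0, -4.0, 2.0, 1.0]
students = [
    (treat, race)
    for treat, race in product([0.0, 1.0], ["white", "black", "hispanic"])
    for _ in range(3)  # three toy students per treatment-by-race cell
]
X = [design_row(treat, race) for treat, race in students]
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in X]

beta = ols(X, y)
# Treatment effect for each group = main effect + that group's interaction term.
effect_white = beta[1]
effect_black = beta[1] + beta[4]
effect_hispanic = beta[1] + beta[5]
```

In this parameterization the treatment coefficient alone is the effect for the reference group, and each interaction coefficient is the difference between a group's treatment effect and the reference group's. Model 4 would follow the same pattern with school indicators in place of the race/ethnicity indicators.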

Although both sets of interactions would ideally appear in the same model, our sample cannot sustain such a modeling approach because there is limited racial variation across schools. Specifically, white students are only present in one school (“mixed”), and black and Hispanic students are only present in the “mixed” school and the school where they are the large majority. However, we do explore how differing school composition may shape the effect of the intervention for specific groups of students (research question 3) in models where we stratify our sample by race/ethnicity and explore the effect of the intervention and its interaction with school on academic performance among black students (Table 5) and then Hispanic students (Table 6). In these models, we explore whether the effect of the treatment for black or Hispanic students differs by the school attended (i.e., either the mixed setting or the corresponding predominant setting). Similar comparisons for white students are not possible as white students are only present in one school.
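The sparse race-by-school structure described above can be made concrete with a simple cross-tabulation. The sketch below uses a handful of invented records (not the study's data) whose layout mirrors the pattern in the text; the category labels are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical (race, school) records; NOT the study's data.
students = [
    ("white", "mixed"), ("black", "mixed"), ("hispanic", "mixed"),
    ("black", "majority_black"), ("black", "majority_black"),
    ("hispanic", "majority_hispanic"), ("hispanic", "majority_hispanic"),
]

races = ["white", "black", "hispanic"]
schools = ["mixed", "majority_black", "majority_hispanic"]

counts = Counter(students)

# Cells with no students cannot inform any school comparison for that group.
empty_cells = [(r, s) for r in races for s in schools if counts[(r, s)] == 0]

for r in races:
    print(r, [counts[(r, s)] for s in schools])
print("empty race-by-school cells:", empty_cells)
```

Four of the nine cells are empty in this layout, which is why school comparisons are only possible within the black and Hispanic subsamples.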

Results

Descriptive Statistics

We begin with a description of our respondents and the distribution of their characteristics in Table 1. In all, we have 886 students who received and returned either the treatment or control exercise four times over the course of the academic year. The number of students in each condition, listed in the bottom row, is roughly equivalent (treatment students = 430, control students = 456). We show percentage distributions for the entire sample in the “All” column and for the sample stratified by treatment and control condition in their respective columns. We also show the corresponding frequency within each category in the “Frequency” column.

Table 1 Distribution of independent variables, by treatment condition

We present information on students’ race/ethnicity, gender, FRL and LEP status, and school type in Table 1. In our sample, we find generally equivalent percentages in both treatment and control conditions. Regarding race, our sample is broadly reflective of the district. The majority of the students in the sample are Hispanic (more than 60%), roughly a quarter are black, and 11% are white, with slightly more white students in the control condition. The remaining 1% includes all other race/ethnic groups, including Asians, Pacific Islanders, Native Americans, “Some Other race” students, and multiracial students. Half of all students in the sample are female. Roughly one quarter are “not disadvantaged”, almost 40% qualify for free/reduced lunch, and the remaining one-third are “in poverty.” About 7% of students in our sample are classified as limited English proficiency (LEP). Half of our students come from the majority Hispanic school, 35% are from the mixed school, and 15% are from the majority black school.

Fortunately, we find no statistically significant differences in these characteristics across the treatment and control conditions, indicating that our efforts to randomize the administration yielded balance among those included in the analytical sample. The racial and gender distributions are nearly identical across both conditions. We see slightly more free and reduced lunch students in the treatment condition and slightly fewer students from the majority Hispanic school in the control condition. Our forthcoming analyses will control for these characteristics as they have meaningful associations with academic performance; however, we do not anticipate that they will explain away the role of the treatment or control on grades or standardized scores.
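A balance check of this kind can be illustrated with a Pearson chi-square test on a two-by-two table. This is a sketch under assumptions: the text does not state which test was used, the function name `chi_square_2x2` is hypothetical, and the cell counts below are invented (only the row totals of 430 and 456 come from the text).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = treatment, control; columns = female, male.
# Row totals match the reported group sizes; the female/male split is invented.
table = [[220, 210], [226, 230]]
chi2, p = chi_square_2x2(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A large p-value for each characteristic is consistent with balance across conditions, although it does not by itself prove that randomization succeeded.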

Multivariate models

We now turn to our multivariate models. In Tables 2, 3, and 4, we explore our research questions in a series of models that assess the association between exposure to the treatment and Spring semester grade in English (Table 2), standardized test scores in Reading (Table 3), and standardized test scores in Algebra (Table 4). In each table, we show the coefficients and associated standard errors and signal statistical significance with asterisks. We begin with the association between exposure to the treatment and grade received in English at the end of the semester. The coefficient in Model 1 reveals small differences in Spring semester grades between the students taking the treatment and their peers taking the control, and notably, our adjusted R² registers no variation explained. In Model 2, we continue to observe no effect, net of demographic characteristics, including race/ethnicity, gender, socioeconomic status, LEP status, and specific school. However, this model reveals very clear racial gaps in end of semester grades in English. Model 3 tests whether the efficacy of the treatment varies by race/ethnicity of the student. Not surprisingly, none of the effects are statistically significant and they are small in size. In our final interactive model (Model 4), testing the treatment’s potential impact across schools, we continue to find little evidence that the effect of the treatment on semester grades in English varies across school context. In sum, we find little to no evidence that exposure to the treatment corresponded to higher grades in English and no evidence that the treatment’s effect varied by race/ethnicity or by specific school.

Table 2 Regression models predicting Spring Semester English grade
Table 3 Regression models predicting STAAR Reading scores
Table 4 Regression models predicting STAAR Algebra scores

Tables 3 and 4 show the results for the STAAR Reading and Algebra standardized tests, respectively. Using the same modeling approach, we again find little evidence, for the most part, that students who received the treatment did significantly better on standardized tests in Reading or Algebra. Models 1, 2, and 3 for both outcomes assess the effects of the treatment and reveal no statistically significant difference in the academic outcomes between treatment and control students. We do find sustained racial gaps. On average, blacks and Hispanics score between 50 and nearly 100 points lower on standardized reading tests, relative to white students (see Table 3), and score 150 points lower on standardized tests in Algebra. Achievement gaps are also apparent across schools, as those in the “mixed” school scored considerably higher than students in the other two schools. Turning to Model 3, we find no evidence that the influence of the treatment on either Reading or Algebra scores varies by race/ethnicity of the student.

Despite these patterns of null effects, we do find some evidence of the treatment’s impact when appraising the interactions across schools in Reading (Table 3, Model 4), but no such effect is identified in Algebra scores. Specifically, we uncover a positive and significant effect of the interaction between treatment and school context for the majority Hispanic school, indicating potentially greater gains in standardized reading scores for these students than were achieved in the mixed school.

To further our examination of the roles of race and context, we turn to Tables 5 and 6, which explore the impact of the treatment among black (Table 5) and Hispanic students (Table 6). Unlike white students, who are solely present in the “mixed” context, black and Hispanic students encounter schools where they are either in the minority (i.e., in the “mixed” school) or the large majority of students in their school. Theoretically, these conditions may shape the experience of stereotype threat and thus impact the treatment’s effect on their academic performance. We present the results for each outcome across three panels using a modeling approach similar to that in the previous tables. Among black students, we find no significant effects of the treatment on their final grade in English. However, we do find a significant association (at the p < .10 level) in Model 2 between exposure to the treatment and Reading scores (see Panel B) and Algebra scores (see Panel C). Contrary to expectations, students receiving the treatment earned lower Reading and Algebra scores than those in the control. This is also apparent in Model 3 for Reading scores, but not Algebra, where we gauge the potential interaction of treatment and school attended. Patterns for Hispanic students, shown in Table 6, echo the findings in Tables 2, 3, and 4, as the treatment seems to have little bearing on academic performance, regardless of outcome.

Table 5 Regression models predicting Spring Semester English grades, STAAR Reading, and STAAR Algebra scores for black students
Table 6 Regression models predicting Spring Semester English grades, STAAR Reading, and STAAR Algebra scores for Hispanic students

Discussion

This study reports the findings of a fielded intervention aimed at relieving stereotype threat and ultimately, translating into better academic performance among minority students. Stereotype threat is a theory suggesting that poorer performance by minorities (racial or otherwise) in academic achievement can be traced back, in part, to performance anxiety that reflects fears of confirming stereotypes of minority groups. Although the intervention itself has been shown to be effective in some contexts, it has never been fielded in districts or schools with a heavy concentration of blacks and Hispanics as is found in our study site. Districts and schools that are composed entirely of black or Hispanic students are increasingly common and unfortunately distinctive as academic achievement within such schools tends to be poorer than schools that are more racially integrated.

Our analysis reveals several notable findings. Overall, we find very little evidence that the self-affirmation exercise enhances students’ grades or their standardized test scores within these school contexts. Scores on standardized tests in Reading and Algebra as well as English Spring semester grades did not, for the most part, differ significantly between students in the treatment condition (i.e., those who received the self-affirmation exercise) and those in the control (i.e., those who received a modified version with no affirming content). However, there were some exceptions to this pattern of null findings. Specifically, we uncovered a positive interaction between receiving the treatment and presence in a majority Hispanic school when assessing reading scores (see Table 3, Model 4). These findings are somewhat tenuous, as we uncover no differences among Hispanic students across different school types (mixed vs. majority Hispanic). Additionally, while the interaction tests do not reveal that the treatment’s effect varied by race, we do find some evidence that exposure to the treatment significantly impacted the performance of black students on standardized tests (see Table 5). Notably, this occurred in the opposite direction from that expected, as treatment students earned lower scores than their peers in the control. Ancillary analyses (not shown, available from the authors upon request) reveal that the pattern is driven by those in the mixed school, but is not apparent for those attending the “majority black” school. Taken together, the treatment’s effectiveness is still unclear within these school contexts.

There are many important implications of these patterns. The first is reconciling the relatively dramatic patterns identified by Cohen et al. (2006) with the largely null associations that we uncover. While Cohen et al. (2006) identified dramatic improvement in GPA, we note little change in grades or test scores following the intervention. Our study is not the first to uncover a lack of association between this intervention and student achievement (see Hanselman et al. 2014). Our work runs parallel to other studies revealing that this intervention was considerably less successful in schools where students had fewer opportunities for contact with racially different peers, and specifically, where minority students did not encounter white students as often as in other schools (Hanselman et al. 2014). These patterns potentially point to differences in the operation and experience of stereotype threat or other identity threats within segregated contexts. When contexts are both racially homogeneous and dominated by racial minority members, they are less likely (or wholly unlikely) to provide the type of inter-group contact that triggers the sense that a minority student is being judged by outsider stereotypes.

Notably, we also find that patterns for black and Hispanic students diverge: black student achievement was negatively impacted by the treatment, while results for Hispanic students were less clear. This challenges the initial theorizing that suggests that all minority groups encounter and are similarly impacted by stereotype threat or similarly benefited by self-affirmation. Recent debates about the changing nature of racial stratification assert that difference from whites should be unpacked further, as racialized disadvantages are often more acute for African Americans compared to other groups (Lee and Bean 2010) and Latinos’ experiences are strongly shaped by skin color, generation, and nativity (Faught and Hunter 2012). More research is needed to identify interventions that are effective for black and Hispanic students. We note here, however, that these schools differ on more dimensions than merely their race/ethnic composition and in more ways than may be captured in our models. Future work needs to examine more carefully the possibility that these two groups (or other race/ethnic groups) may experience threat in different ways, potentially impacting the means through which it can be relieved.

The current analysis has a few limitations. First, many concepts that bear on this relationship go unmeasured. We do not, for example, directly test for the existence of “threat”. Stereotype threat is often identified after it has been cued, and we present no such trigger for these students. However, we operate under the notion that there is a “threat in the air” (Steele and Aronson 1995), suggesting that all persons classified as members of minority groups are susceptible to feeling as if they are being judged in any moment when they are evaluated. We accept this as an operating premise; however, it is unclear how threat may emerge, or how it might inhibit achievement, in contexts where children of color are the large majority. Enhancing self-integrity through the exercise may not be able to counter threat if none is present. Prior analyses gauge the level of race consciousness by fielding a survey on racial climate, a feature that is absent from our intervention. Our aim is to introduce a scenario that can be replicated at a large scale, and fielding such a survey repeatedly may not be feasible. However, the decision to exclude the racial climate survey may have removed the sensitivity to race that may have underlain the experience of stereotype threat in other assessments. Future research might explore the impact of the presence or absence of a climate survey for these interventions.

Additionally, our main analyses do not adjust for the academic ability of students. Theoretically, these exercises should be efficacious regardless of student academic ability (Cohen et al. 2006); however, prior academic achievement may drive or suppress the effectiveness of the affirmation. Ancillary analyses presented in Appendix Table 8 reveal that adjusting for performance on the Stanford 10, a standardized examination taken prior to entering ninth grade, does not alter the effect of the treatment, though this score is strongly related to academic performance a year later (at the close of ninth grade).

We make a few suggestions for future research. Future work should aim to expand the assessment of self-affirmation by exploring some of its qualitative dimensions. When students write about their personal experiences, their reflections, whether positive or negative, may be critical to the power the affirmation exercise may have. We explored this preliminarily by coding each response to the open-ended portion of the exercise, regardless of the condition, for its use of positive or life-affirming content, as opposed to neutral or negative content (i.e., conveying unhappiness or some degree of emotional upset). The analyses shown in Appendix Table 7 compared students in the treatment who reported affirming responses to those in the control who used neutral content in their exercises. In all, we still find few differences in academic performance between these two groups; however, analyses of students reporting negative content (not shown) suggest this could be detrimental to their academic performance (see Footnote 5). These patterns warrant further exploration as they suggest that the “power” of self-affirmation may be conditional on whether students actually find it to be affirming and, more importantly, that there may be costs for students for whom reflecting on their values is an upsetting experience.

Table 7 Regression models predicting Spring Semester English grade, STAAR Reading, and STAAR Math scores for students providing affirming responses on treatment or neutral responses on control

We submit that the null findings strongly suggest that the academic challenges faced by many of these students are complex, limiting the potential of enhanced self-integrity to ameliorate gaps in school performance. Notably, these schools differ in more ways than just race/ethnic composition and in ways that outstrip the adjustments in our models. Our attempt to adjust for socioeconomic status between students, for example, draws on Free/Reduced Lunch (FRL) status. While researchers frequently use FRL (Hoffman 2012), this measure has noted flaws (Harwell and LeBeau 2010). Moreover, our results reveal some treatment-related differences in standardized test scores, but a less clear impact on grades, suggesting that these aspects of academic performance may respond to the influence of self-affirmation differently. The academic challenges of students in segregated schools (often from high poverty neighborhoods) are tremendous, and research is still unraveling their complex nature. Many have argued that segregation in today’s schools is reaching pre-1960s levels (Frankenberg and Lee 2002; Orfield et al. 2014, 2012). However, what this means for identity threats remains unclear, suggesting a variety of tools are required to address these experiences.