History repeatedly shows that people are not very good at noticing their biases, and a conglomerate of biases creates social norms with dire consequences for people who are not in power positions. Fortunately, social science and its systematic scientific thinking and analysis provide a venue to question social norms and the impacts on people. But social science is not a silver bullet. Social scientists are also people steeped in the same social norms which can unknowingly frame research—whether qualitative or quantitative. This chapter discusses how large-scale datasets can be used to investigate patterns of social injustice in education and demonstrates these procedures using a case of high school curricula opportunities.

Large-Scale Data: Risks and Advantages

It was only in 1994 that a book was widely distributed under the auspices of social science which misinterpreted results to conclude that American students of Anglo-Saxon ancestry were biologically predisposed to be more intelligent than students of African ancestry (Herrnstein & Murray, 1994). The authors based these conclusions on large-scale data on achievement patterns among US students. What the authors failed to recognize were their own biases steeped in a history of white supremacy, along with the generations of US laws that made it illegal for entire groups of Americans to read or go to school (Fisher et al., 1996; Jencks & Phillips, 1998). The failure to incorporate these contextual factors created a “false-positive” error when they looked only at individual-level differences in achievement outcomes by racial identity. Racial and ethnic identities are not to be assessed at the individual level since they are not individual psychological factors or static attributes, but rather measures used to reflect dynamic social norms (Bonilla-Silva, 2001). Fortunately, an esteemed group of social scientists gathered their collective talents to point out the major statistical errors in the book and retested the same data with context included to clearly show that racial differences in achievement were artifacts of context and had nothing to do with biology (Fisher et al., 1996).

One of the advantages of large-scale quantitative work is that it can be rerun and replicated. With replication, researchers, like those at the University of California, Berkeley (i.e., Fisher et al., 1996), can check others’ results, test how omitted variable bias may sway results, and explain how the omitted variables provide a lens through which to interpret the results. It is through interpretation that education policies are developed, so it is the responsibility of researchers to test these biases.

Large-scale datasets can also reveal patterns that are not easily seen with the naked eye. In the earlier example, while complexion can be thought to be observed (although this is steeped in its own set of context and misperceptions), historical racism is not observable. To test the impact of such conceptual ideas, researchers think deeply about which observable variables can be used to represent hard-to-observe social facts. In the earlier case, it was the inclusion of a constellation of measures of unemployment, parents’ education level, neighborhood locale, and the like that provided the context in which to test the cumulative impact of generations of historical racism on students’ achievement (Fisher et al., 1996; Jencks & Phillips, 1998).

Large-scale studies provide generalizable results and are large enough to disaggregate into subgroup populations. With subgroup clusters, social justice ideas such as equality, equity, and differential treatments and applications can be measured and tested over time and between contexts. These aspects increase the external validity of the analyses and reduce cynics’ criticisms that the observed differences are subjective. Instead, the abundance of data points used in large-scale quantitative analyses can provide an avenue for researchers to shelve their preconceived notions of how things appear to operate and focus squarely on the patterns in the historical, structural, institutional, and organizational data. With these aspects, the interpretation is less open to the charge that it lies in the eye of the beholder and can instead be valued as providing the 20/20 lens to clearly see the patterns that underlie our social systems.

Large-Scale Data for Social Justice in Education

In the twenty-first century, large-scale data on students, teachers, school leaders, and school organizations pervade US education. The rhetoric of “data-driven decision-making” abounds (Gummer, Hamilton, Miller, Penuel, & Shepard, 2018), yet school leaders and teachers feel underprepared and lack the time to answer questions with the data (Honig & Coburn, 2008). Shepard discusses how the lack of clear training in research design and questioning risks the misuse and misinterpretation of data when data users do not have the skills to critically assess the quality of the measures (such as whether the measures match the conceptual core of their research questions or whether there were errors in the data collection, input, or coding) or the training to test for the assumptions and biases that undergird the data. These problems exist in all data, and thus an undisciplined use of the data can develop into harmful policies for children, their learning, and the democratic education ideal (Gummer et al., 2018). Penuel emphasizes that “evidence-based decision-making” has yet to take shape in our educational organizations and that, without clear questions to ask of the data, large-scale data become an exercise in reporting numbers without meaning (Gummer et al., 2018). Thus, training educational practitioners to ask questions about social justice can shape the type of data that are collected, define the analyses to perform, and develop policies rooted in evidence aimed to ameliorate the injustices among children’s learning opportunities.

Defining Educational Opportunities

When considering large-scale data in addressing educational equity and equality, definitions become central to correctly identifying how to measure these attributes. Equity is the ultimate goal where opportunities are not differentiated by birth or ability, and ability to achieve goals is not relegated to a privileged few (Coleman, 1990; Espinoza, 2007; Secada, 1989). Equity thus has two parts: access to opportunities (resource inputs) and achievement successes (outcome outputs). Access to opportunities is rooted in equality. Equality requires the basic tenet of equal access no matter the sociodemographics of the individuals (Coleman, 1990; Espinoza, 2007; Secada, 1989). This can have inputs from community to schooling factors. For schooling, which is the focus of this chapter, this means that “inequality may be defined in terms of consequences of the school for individuals” (Coleman, 1990, p. 25). Essentially, equity aims at the goal of social justice, which requires corrective measures to adjust for historical social inequalities. Equity cannot exist without first assessing inequality in order to consider how to appropriately adjust resources.

There are a host of inequality measures to use, including the Gini coefficient, mean relative deviation, the Theil index, and the squared coefficient of variation (Reardon & Firebaugh, 2002). These measures capture the proportional distribution of occupants across one space, such as students in schools, counties, or neighborhoods, compared to the general population distribution (for an extensive discussion on this, see Reardon & Firebaugh, 2002). These types of measures can answer questions such as: are students who are suspended representative of all the students in the district? A curricular pipeline, however, has multiple nested spaces: (1) students (1a) attending schools with or (1b) without access; if 1a, then (2) students who (2a) are enrolled in the courses or (2b) are not; if 2a, then (3) students who (3a) take course exams or (3b) do not; and finally, if 3a, (4) students who pass the exams. A measure thus needs to be comparable across this compound clustering and concentration, in which each space defines the next space in the pipeline.

Most inequality measures cannot produce comparable gauges of inequality across an interdependent and moving denominator (since there is compounding loss of students at each stage of space). The Herfindahl-Hirschman Index (HHI), used mostly by economists, can do this using a comparable approach to gauge market concentration (Taagepera & Lee Ray, 1977). The HHI assumes that all firms have a 1:1 chance to enter the market (one firm, one chance). Conceptually, groups of students act as “firms” who occupy different spaces of the curriculum market. Since schools have varying distributions of student populations, the formula needs to adjust the 1:1 assumption. The HHI estimate presented in Eq. 17.1 shows the denominator addition that adjusts for the varied proportional representation of students.

The calculation for this normalized HHI inequality measure is:

$$ HHI=\sum_{d=1}^{D}\sum_{j=1}^{J}\frac{\left(N_{js}-n_{js}\right)^{2}}{N_{js}}, \tag{17.1} $$

where N is the proportion in the population, n is the proportion in the pipeline space, j is the subgroup designation, and s is the school.
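As an illustration only, the following Python sketch evaluates the inner sum of Eq. 17.1 over subgroups j for a single school. It does not attempt the outer sum over d, because that index is not defined in the text. The function name, the group labels, and the proportions are hypothetical, and skipping subgroups with no students in the school population is an assumption made to avoid dividing by zero.

```python
from typing import Mapping


def normalized_hhi(population: Mapping[str, float],
                   space: Mapping[str, float]) -> float:
    """Inner sum of Eq. 17.1 for one school.

    population: proportion of each subgroup in the population (N).
    space: proportion of each subgroup in the pipeline space (n).
    """
    total = 0.0
    for group, pop_share in population.items():
        if pop_share <= 0:
            # Assumption: subgroups absent from the population are skipped.
            continue
        space_share = space.get(group, 0.0)
        total += (pop_share - space_share) ** 2 / pop_share
    return total


# Hypothetical school with three subgroups.
school_population = {"A": 0.5, "B": 0.3, "C": 0.2}
proportional = {"A": 0.5, "B": 0.3, "C": 0.2}   # space mirrors the population
concentrated = {"A": 1.0, "B": 0.0, "C": 0.0}   # one group holds the space

print(normalized_hhi(school_population, proportional))  # 0.0
print(normalized_hhi(school_population, concentrated))  # about 1.0
```

In this sketch, a space that mirrors the population yields zero, and the score grows as one subgroup takes a larger share of the space than its share of the population.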

Equal representation of the groups in the market produces an HHI = 0. The higher the value, the greater a group monopolizes the asset in the market. Unlike many traditional segregation indices of Gini, Theil, and others that restrict to bi-group analyses (white-to-non-white, white-to-Hispanic), the HHI allows for multiple groups to be assessed together. With the HHI, the seven different racial and ethnic group identities (Footnote 1) cited in the data can be compared as a whole rather than a series of pairs which otherwise would be a set of 21 combination pairs for analyses.

Declaring Data Collections

Another important consideration when researching social justice in education is the type of data collection: census or sample. Census data collect information from an entire population, while sample data collect from a subset of the population (Knoke, Bohrnstedt, & Potter Mee, 2002). Census data include the universe of all cases in the population and thus have no sampling error in the estimations, while sample data collections include a selection of data that can be mathematically transformed to represent the whole population with an estimated tolerance for error (Knoke et al., 2002). In the US, the decennial US Census asks questions of all US households, while the Current Population Survey occurs every month to keep a pulse on the changes in US households using data from a sample of households. For US education, the Common Core of Data from the National Center for Education Statistics (NCES) and the Civil Rights Data Collection (CRDC) from the US Department of Education are two examples of census datasets. The “study” or “survey” named datasets from NCES, such as the Early Childhood Longitudinal Study (ECLS) or the Crime and Safety Surveys datasets, use sample data.

The type of data collection to use depends on the research question. If the core idea is to discuss patterns across the general population of students, teachers, schools, or the like, then datasets using samples do just fine. An advantage of sample datasets is that it is often the case that more nuanced survey questions are asked on particular topics. For example, the ECLS survey can show individual students’ waxing and waning through their educational years since it follows the same students and asks the same questions over many years of schooling. With this type of dataset, questions such as the average learning growth patterns over time can be deeply tested, and questions about impacts of teacher qualifications or discipline on student learning can be estimated.

If the research or policy question seeks to understand the differences between student, teacher, or school subgroups, sample or census data oftentimes both can work. However, if the subgroup counts are small, the census data are more reliable because census data are not prone to sampling error. To illustrate this idea, imagine a map of all the homes with students in the US. Now imagine that a representative sample of students by grade level is drawn across the country. If the idea is to ask questions about differences in educational opportunities between boys and girls, then this type of sample would suffice since the laws of statistics would show the high probability that a random sample selected would have nearly an equal representation of boys and girls. If the counts were off by a little bit, weights could be applied to tilt the scales to get the 51/49 girl/boy split found in the population. The models would also want to adjust for transgender student representation as the grade levels got higher since, before teenage years, very few students identify as transgender, but by the teen years, about 0.7 percent of the student population does identify as such (Blad, 2017).
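The reweighting idea in this paragraph can be sketched in a few lines of Python. This is a minimal post-stratification example and not a procedure taken from any particular survey: the sample counts are hypothetical, the 51/49 target comes from the paragraph above, and the sketch assumes every group has at least one sampled case.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    sample_size = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / sample_size  # assumes count > 0
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical sample of 1,000 students that came out slightly unbalanced.
sample_counts = {"girls": 540, "boys": 460}
population_shares = {"girls": 0.51, "boys": 0.49}

weights = poststratification_weights(sample_counts, population_shares)
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
weighted_total = sum(weighted_counts.values())

for group in sample_counts:
    share = weighted_counts[group] / weighted_total
    print(group, round(weights[group], 3), round(share, 2))
```

After weighting, the groups contribute to estimates in the target proportions even though the raw counts were off.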

If the goal of the research is to understand the differences in educational opportunities between transgender boys and transgender girls, then data from a general NCES sample-based database would not suffice since it would be highly unlikely that even one transgender student would be selected from that sample of households. Even if there were a few transgender students who were sampled by random chance, the information on a few transgender students would be susceptible to much error (i.e., large sampling error) since a handful of students’ data could not be relied upon to represent the general patterns among the transgender subgroup. To gather data on this group of students, a very particular sampling would need to be designed, or census data could be used since it already has the universe of all students in the database (that is, if there were more than a binary gender identifier option on the census questionnaire).

All students represent their own voice in census data, whereas sample data allow a selection of students to represent the variation among the unsampled voices. Given the Central Limit Theorem of statistics, the sampled variation is often plenty close to what is needed to test most research questions (Knoke et al., 2002). However, the Central Limit Theorem does not suffice when there are only a few voices to speak within a subgroup.
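To make the point about small subgroups concrete, the sketch below uses the textbook standard error of a proportion, sqrt(p(1 - p)/n), with an approximate 95 percent margin of error. The subgroup sizes and the value p = 0.5 are hypothetical. The normal approximation is itself unreliable at the smallest sizes shown, which is part of the point being illustrated.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Textbook standard error of a sample proportion p from n cases."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical subgroup sizes, from a handful of cases to several thousand.
for n in (5, 50, 500, 5000):
    se = standard_error_of_proportion(0.5, n)
    margin = 1.96 * se  # approximate 95 percent margin of error
    print(f"n = {n:>4}: margin of error is about +/- {margin:.3f}")
```

The margin shrinks with the square root of the subgroup size, so an estimate based on a handful of students is far less precise than one based on thousands.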

Illustrating Inequality Using Large-Scale Data

This chapter uses an example of disparities in high school learning opportunities to illustrate these ideas of inequality in US high school curriculum resources. This example uses census data regarding Advanced Placement (AP) opportunities among high school students. In designing this study, a general representative sample of students’ high school transcripts could provide enough information on the enrollment rates of students in these courses compared to other courses. However, the question is about more than the general differences between all students. Instead, this question seeks to drill down into the magnitude of differences experienced by students of varying racial or ethnic identities (Footnote 2). Given this orientation, the sample numbers would become too small to represent some students’ voices. For instance, students who identify as indigenous to the Americas comprise 1 percent of the US school-age population (Musu-Gillette et al., 2016). Even if a transcript data collection was a large representative sample of 10,000 high school students, only approximately 100 of the sample would identify as American Indian (Footnote 3) across the 9th, 10th, 11th, and 12th grades combined. Of these students, there would likely be only about a dozen American Indian students in college prep courses since these courses are not available to all students and are typically only offered in the upper high school grades. Given these conditions, a study on racial or ethnic inequality in curriculum opportunities is more reliable using a census dataset. Fortunately, the CRDC has collected a biennial census from all public schools on the enrollment of students in AP since the 2011–12 school year.
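The arithmetic behind this example can be written out as a short Python sketch. The sample size and the 1 percent share come from the paragraph above. The even split across the four grades is an added assumption for illustration only.

```python
def expected_subgroup_count(sample_size: int, population_share: float) -> float:
    """Expected number of sampled cases from a subgroup in a representative sample."""
    return sample_size * population_share


sample_size = 10_000          # from the example above
american_indian_share = 0.01  # 1 percent of the school-age population

expected_total = expected_subgroup_count(sample_size, american_indian_share)
# Assumption: the four high school grades are roughly equal in size.
expected_per_grade = expected_total / 4

print(f"Expected in sample: about {expected_total:.0f}")
print(f"Expected per grade: about {expected_per_grade:.0f}")
```

Any further restriction, such as counting only students in college prep courses, shrinks these counts again.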

A Brief Background on the Social Injustice of Opportunities to Learn

Since desegregation, education policy has focused on access to curriculum no matter a student’s school or district (Orfield & Lee, 2006). Research on tracking provides ample examples of how to measure course enrollment patterns by gauging inequality of representation by students’ ascriptive characteristics (Gamoran, 1987; Hallinan, 1991; Kelly, 2004; Kelly & Price, 2011; Rosenbaum, 1976). Most of the research on differences in quality of delivery and course credentialing comes from qualitative comparative work (Cisneros et al., 2014; Gagnon & Mattingly, 2016; Klugman, 2012; Lareau, 2000; Oakes, 2005; Palmer, 2016). Most quantitative research uses the attainment of students (equity), such as high school graduation and college admission, as distal signals of schools’ overall curricular rigor. For particular courses, grades and course descriptions from administrative transcript records in the nationally representative sample-based datasets of the National Education Longitudinal Study (NELS) and the Education Longitudinal Study (ELS) have been the best proxies to compare across the state-based education system in the US (Gamoran, 1987; Gamoran & Mare, 1989; Lucas & Berends, 2002). However, these quantitative operationalizations of quality or credentialing do not directly link to the course curriculum and instead assume that students’ grades or attainment are absolute to some external criteria when they are in fact relative to the school standards.

Equality in Learning Opportunities

In the discussion of equality of opportunity, there exists a four-part chain of events that fuels the curricular pipeline. The four-part chain is operationalized for this analysis under the following parameters:

  1. Access: whether or not students are enrolled in schools with rigorous curriculum offerings.

  2. Treatment: who in the school with the curriculum participates in those particular courses.

  3. Quality: whether the courses meet the external expectations of quality. For AP courses, quality is defined as whether or not students had access to taking the AP course exam which can be exchanged for college credit because it is assumed that if the school thought the course was of high quality, then it would offer the test to their students.

  4. Credentialing: whether students acquire the credential to demonstrate that they learned the expected material in quality courses. For AP courses, the credential in the pipeline is defined as whether or not students who took the exam indeed passed with a mark high enough to gain college credit (for an AP course, this is typically an exam score of 3, 4, or 5).

This sequence of events compounds spaces of learning opportunities along the pipeline.
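The four-part chain can be sketched in Python to show how the denominator moves from one space to the next. The counts below are hypothetical and are not CRDC figures, and the stage names are labels chosen for this sketch.

```python
STAGES = ("access", "enrolled", "took_exam", "passed")


def within_group_shares(counts: dict) -> dict:
    """Share of the whole subgroup remaining at each stage (fixed denominator)."""
    total = counts["students"]
    return {stage: counts[stage] / total for stage in STAGES}


def stage_to_stage_rates(counts: dict) -> dict:
    """Share of students at each stage among those who reached the previous
    stage (moving denominator)."""
    rates = {}
    previous = counts["students"]
    for stage in STAGES:
        rates[stage] = counts[stage] / previous if previous else float("nan")
        previous = counts[stage]
    return rates


# Hypothetical counts for one subgroup.
subgroup = {"students": 1000, "access": 800, "enrolled": 200,
            "took_exam": 100, "passed": 50}

print(within_group_shares(subgroup))
print(stage_to_stage_rates(subgroup))
```

The first function tracks how much of a subgroup remains at each stage, while the second shows the rate of loss at each step, where the denominator shrinks along the pipeline.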

Evidencing inequality. Figure 17.1 shows the proportion of students within their own racial or ethnic subgroup who have access, are enrolled, take an exam, and pass at least one exam in AP courses. It shows the general clustering patterns along the pipeline for each of the seven racial or ethnic identity subgroups. Each turn in the line shows the places where the valves shut off flow to students along the pipeline.

Fig. 17.1 Within-group student shutoff along the Advanced Placement curriculum pipeline (line graph of the percentage of each racial or ethnic subgroup at each stage of the AP pipeline). Source: Civil Rights Data Collection, pooled school years of 2011–12 and 2013–14

Access. Although four-in-five high school students attend schools that offer some AP curriculum, three times fewer American Indian students attend these schools. Every other group of students appears to have wide access to AP curriculum in their schools (Footnote 4).

Treatment. Figure 17.1 also shows that enrollment in AP courses is selective: only about 1 in 20 students are enrolled in at least one AP course. Between access and treatment, the lines representing white non-Hispanic and Asian American students drop less than the others, which indicates that these student subgroups are experiencing higher college prep course enrollments compared to their peers.

Quality. Unequal chances to take an AP exam are more extreme for Asian American students than others.

Credential. In the final stage of the pipeline, Hispanic, African American, and Asian American students experience a greater proportion of failing scores on their AP exams compared to their peers who made it through the pipeline with them. Asian American students, on the whole, leave the pipeline with more credentials. By the end of this four-part compounding of opportunities to learn, Asian American students earn the college prep credential of a passing score at a rate more than two times greater than their white non-Hispanic and multiracial peers; three times greater than their Hispanic and Hawaiian and Pacific Islander peers; and nearly five times greater than their African American and American Indian peers who reach the end of the AP pipeline.

Although Asian American students persist at the highest rate in the pipeline within their own identity group, Asian American students make up only 5 percent of the school-age population (Musu-Gillette et al., 2016). In the same line of thought, it would be helpful to be able to dig more deeply into the between-group differences. This points to the need for a discussion on market share: where is the inequality in the market along the AP pipeline? What can market share indices like the HHI demonstrate regarding whether certain groups monopolize the AP resources in high schools?

Table 17.1 shows the HHI scores for the schools along the different spaces in the pipeline. Of initial note is the result that more than one in three high schools in the US do not have disparities in three of the four spaces along the pipeline (HHI = 0). Enrollment is the one place in the pipeline where disparities accumulate.

Table 17.1 HHI scores along the Advanced Placement curriculum pipeline

In particular, inequality in access exists, but it is the least monopolized of all the spaces since more than 99 percent of schools have HHI scores less than 1. Enrollment in AP courses has a moderately high rate of monopoly within schools, alluding to the idea that some groups of students “own the AP market” in their schools. Unequal AP exam test-taking is moderate for most schools, but 10 percent of schools show extremely high monopolies over the market (HHI > 1) of who takes AP exams. Inequality scores in regard to obtaining the passing score of 3, 4, or 5 show that 43.7 percent of schools have no inequality by racial or ethnic identity subgroups, but the flip side is that 19.1 percent of schools show extremely high disparities in market concentrations on passing exam scores.
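A summary of this kind can be produced by binning school-level HHI scores. The Python sketch below uses the two cut points named in the text (HHI = 0 and HHI > 1). The category labels, the tolerance for treating a score as zero, and the scores themselves are hypothetical.

```python
def summarize_hhi(scores, zero_tolerance=1e-12):
    """Share of schools with no disparity, some disparity, and HHI above 1."""
    n = len(scores)
    none = sum(1 for s in scores if abs(s) <= zero_tolerance)
    extreme = sum(1 for s in scores if s > 1)
    some = n - none - extreme
    return {
        "no_disparity": none / n,   # HHI = 0
        "some_disparity": some / n, # 0 < HHI <= 1
        "extreme": extreme / n,     # HHI > 1
    }


# Hypothetical school-level scores for one space in the pipeline.
school_scores = [0.0, 0.0, 0.2, 0.5, 1.5]
print(summarize_hhi(school_scores))
```

Running the same summary for each space in the pipeline gives a comparable view of where concentration is highest.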

As a check on the data and the assumptions of who is excluded from the moving denominator, calculations were also made regarding disparities in who does not take the exam or who fails the exam, as is shown below the horizontal line in Table 17.1. With these measures of inequality on full exclusion from opportunity, the results show that there is less chance of subgroup monopolies over these “lack of opportunity” markets (HHI = 0 for 40 percent and 60 percent of students who do not have the opportunity to take an AP exam or earn a passing score, respectively).

Interpreting Evidence

The HHI information, together with Fig. 17.1 information, provides a more complete picture of the inequality issues in high school AP curriculum. The AP treatment HHI points to the continuation of decades of within-school tracking issues where a group of students “own the AP market” in their schools. Whether this happens as a result of school policies on closed-track systems or de facto tracking cannot be determined with these data, but these results do point to questions for further study. The notable proportions of schools with no inequality along the pipeline point to schools that seem to be achieving some equal opportunities in curricula resources for their students, regardless of racial or ethnic identity (Footnote 5).

This example shows how to use large-scale data to understand how historically marginalized students are shut out of the pipeline at rates higher than advantaged students. There are distinct racial and ethnic patterns regarding the timing of when students get shut out of the pipeline. These findings complement studies on within-school tracking inequality by moving the discussion forward to understand the nested spaces of opportunity along the curricular pipeline. This study can adjust the policy light on the new twenty-first-century racial inequality emerging in education.

Conclusions

Large-scale datasets allow for persistent patterns of inequality and inequity to be demonstrated. Whether over time or between subgroups, disparities in educational opportunities are hard to disregard when the evidence is clear and consistent. To achieve this level of rigor, education research must clearly define terms related to learning opportunities and injustice. Although the terms equality and equity are often conflated, the ideas are importantly distinct. Equality, as in the Equality of Educational Opportunity (Coleman et al., 1966), involves the notion of the absence (compared to the presence) of resources between student groups. Inequity goes beyond equality of access: if people are in a place where they can reach the same pieces of curricula and still have unequal outcomes, that disparity is a matter of inequity.

Large-scale datasets, especially census data collections, allow for small numbers of voices to be heard among the many. It is with these voices that researchers can begin to listen to the social injustices that underlie our society and begin to enact change in educational policy.