13.1 Introduction

In 2011, New York State’s 695 school districts (New York State Education Department n.d.) spent $53.7 billion (U.S. Census Bureau 2011, Table 6) to educate almost 2.7 million elementary and secondary pupils (U.S. Census Bureau 2011, Table 19), a cost of over $19,000 per pupil (U.S. Census Bureau 2011, Table 8). Elementary and secondary education accounts for nearly one-quarter of all state and local expenditures in New York State (U.S. Government Spending n.d.). While New York State has some excellent school districts, others struggle with poor standardized test scores and low graduation rates. Many of the reasons for the differences among school districts are widely accepted; these include differences in wealth and English proficiency, as well as inefficient use of resources.

Given the high cost of public education and its critical importance for the future of New York and the nation, it is natural for taxpayers, legislators, and administration officials to hold public education institutions accountable for producing high quality outcomes. To do so, we must measure the performance of each school district in an objective, data-informed manner. Commonly used methods for performance measurement under these circumstances are often called benchmarking models. When applied to school districts, a benchmarking model identifies leading school districts, called benchmark school districts, and it facilitates the comparison of other school districts to these benchmarks. Non-benchmark school districts can then focus on specific ways to improve their performance, and thereby the performance of the statewide school system as a whole.

In this chapter, we present an appropriate benchmarking methodology, apply that methodology to measure the performance of New York State school districts in the 2011–2012 academic year, and provide detailed alternative improvement pathways for each school district.

13.2 Choosing an Appropriate Benchmarking Methodology

There are several methods used to perform benchmarking analysis. They differ in the nature of the data employed and the manner in which the data are analyzed. They also differ in their fundamental philosophies.

Some approaches compare individual units to some measure of central tendency, such as a mean or a median. For example, we might measure the financial performance of each firm within an industry by comparing its net income to the average net income of all firms in the industry. A moment’s reflection reveals that large firms will outperform small firms simply due to their size and without regard to their managerial performance. We might attempt to correct for this by computing each firm’s net income divided by its total assets, called the firm’s return on assets. This approach is called ratio analysis, and a firm’s performance might be measured by comparing its return on assets to the mean (or median) return on assets of all firms in the industry. Ratio analysis, however, assumes constant returns to scale—the marginal value of each dollar of assets is the same regardless of the size of the firm—and this may be a poor assumption in certain applications.
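To make ratio analysis concrete, the brief sketch below (a Python illustration with hypothetical firm names and figures, not data from this chapter) computes each firm’s return on assets and compares it to the industry mean.

```python
# Ratio analysis sketch: return on assets (net income / total assets)
# compared to the industry mean. Firm names and figures are hypothetical.
firms = {
    "Firm A": {"net_income": 120.0, "total_assets": 1500.0},
    "Firm B": {"net_income": 45.0, "total_assets": 400.0},
    "Firm C": {"net_income": 300.0, "total_assets": 5000.0},
}

roa = {name: f["net_income"] / f["total_assets"] for name, f in firms.items()}
industry_mean = sum(roa.values()) / len(roa)

for name, r in sorted(roa.items()):
    verdict = "above" if r > industry_mean else "at or below"
    print(f"{name}: ROA = {r:.3f} ({verdict} the industry mean of {industry_mean:.3f})")
```

Note that dividing by total assets implicitly treats every dollar of assets alike, which is exactly the constant-returns-to-scale assumption discussed above.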

To avoid this assumption, we might perform a regression analysis using net income as the dependent variable and total assets as the independent variable. The performance of an individual firm would be determined by its position relative to the regression model, that is, a firm would be considered to be performing well if its net income were higher than predicted by the model given its total assets. We point out, however, that regression is a (conditional) averaging technique and measures units relative to average, rather than best, performance, and therefore does not achieve the primary objective of benchmarking.
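As an illustration of the regression alternative (again with hypothetical figures), the sketch below fits net income on total assets by ordinary least squares and judges each firm by the sign of its residual; a positive residual means the firm earns more than the model predicts for its size.

```python
import numpy as np

# Regression-based comparison sketch: regress net income on total assets
# and evaluate each firm by its residual. All figures are hypothetical.
total_assets = np.array([400.0, 900.0, 1500.0, 2200.0, 5000.0])
net_income = np.array([45.0, 40.0, 120.0, 210.0, 300.0])

# Ordinary least squares fit: net_income = b0 + b1 * total_assets
b1, b0 = np.polyfit(total_assets, net_income, 1)
residuals = net_income - (b0 + b1 * total_assets)

for assets, income, resid in zip(total_assets, net_income, residuals):
    status = "better than average for its size" if resid > 0 else "no better than average for its size"
    print(f"assets={assets:6.0f}  income={income:6.1f}  residual={resid:+7.1f}  -> {status}")
```

Even a firm with a large positive residual is only being compared to the conditional average, which is the limitation noted above.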

Other approaches compare individual units to a measure of best, rather than average, performance. For example, we might modify ratio analysis by comparing a firm’s return on assets to the largest return on assets of all firms in the industry. This has the advantage of revealing how much the firm needs to improve its return on assets to become a financial leader in the industry. Using such a methodology, we would encourage firms to focus on the best performers, rather than on the average performers, in its industry.

The complexity of business organizations means that no one ratio can possibly measure the multiple dimensions of a firm’s financial performance. Therefore, financial analysts often report a plethora of ratios, each measuring one specific aspect of the firm’s performance. The result can be a bewildering array of financial ratios requiring the analyst to piece together the ratios to create a complete, and inevitably subjective, picture of the firm’s financial performance.

Fortunately, there is a methodology, called data envelopment analysis (DEA), that overcomes the problems associated with ratio analysis of complex organizations. As described in the next section, DEA employs a linear programming model to identify units, called decision-making units (DMUs), whose performance, measured across multiple dimensions, is not exceeded by any other unit or even any combination of other units. Cook et al. (2014) argue persuasively that DEA is a powerful “balanced benchmarking” tool in helping units to achieve best practices.

13.3 Data Envelopment Analysis

DEA has proven to be a successful tool in performance benchmarking. It is particularly well suited to measuring the performance of units along multiple dimensions, as is the case with complex organizations such as school districts. DEA has been used since its introduction in the late 1970s in a wide variety of applications, including health care, banking, pupil transportation, and, most recently, education. DEA’s mathematical development may be traced to Charnes et al. (1978), who built on the work of Farrell (1957) and others. The technique is well documented in the management science literature (Charnes et al. 1978, 1979, 1981; Sexton 1986; Sexton et al. 1986; Cooper et al. 1999), and it has received increasing attention as researchers have wrestled with problems of productivity measurement in the services and nonmarket sectors of the economy. Cooper et al. (2011) cover several methodological improvements in DEA and describe a wide variety of applications in banking, engineering, health care, and services. Emrouznejad et al. (2008) provide a review of more than 4000 DEA articles. Liu et al. (2013) use a citation-based approach to survey the DEA literature and report finding 4936 DEA papers. See deazone.com for an extensive bibliography of DEA publications as well as a DEA tutorial and DEA software.

DEA empirically identifies the best performers by forming the performance frontier based on observed indicators from all units. Consequently, DEA bases the resulting performance scores and potential performance improvements entirely on the actual performance of other DMUs, free of any questionable assumptions regarding the mathematical form of the underlying production function. On balance, many analysts view DEA as preferable to other forms of performance measurement.
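To illustrate the kind of linear program that underlies DEA, the sketch below solves a basic input-oriented, constant-returns-to-scale (CCR) envelopment model with scipy. It is a simplified stand-in, not the model developed in this chapter’s Appendix (which handles several orientations and district characteristics), and the small data set is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of an input-oriented, constant-returns-to-scale DEA (CCR) model
# in envelopment form, solved as a linear program. This is a simplified
# stand-in for the chapter's Appendix model; the data are hypothetical.
# Columns are DMUs (e.g., school districts).
X = np.array([[8.0, 7.5, 6.0, 8.39],       # input 1: FTE teachers per 100 students
              [3.0, 2.5, 2.0, 3.10]])      # input 2: FTE support per 100 students
Y = np.array([[92.0, 88.0, 80.0, 77.0]])   # output: % of students scoring 3 or 4

m, n = X.shape     # m inputs, n DMUs
s = Y.shape[0]     # s outputs

def efficiency(k):
    """Minimize theta such that some nonnegative combination lam of all DMUs
    uses no more than theta times DMU k's inputs while producing at least
    DMU k's outputs. theta = 1 means DMU k lies on the frontier."""
    c = np.r_[1.0, np.zeros(n)]                       # variables: [theta, lam_1..lam_n]
    A_inputs = np.hstack([-X[:, [k]], X])             # X @ lam - theta * x_k <= 0
    A_outputs = np.hstack([np.zeros((s, 1)), -Y])     # -Y @ lam <= -y_k
    res = linprog(c,
                  A_ub=np.vstack([A_inputs, A_outputs]),
                  b_ub=np.r_[np.zeros(m), -Y[:, k]],
                  bounds=[(0, None)] * (n + 1))
    return res.fun

for k in range(n):
    print(f"DMU {k}: efficiency = {efficiency(k):.3f}")
```

Adding a convexity constraint requiring the lam weights to sum to one yields the variable-returns-to-scale (BCC) variant; output-oriented and mixed-orientation counterparts modify the objective in a similar way.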

Figures 13.1 and 13.2 illustrate the performance frontier for a simple model of school districts. We can use this simple model, which is clearly inadequate for capturing the complexity of school districts, to demonstrate the fundamental concepts of DEA. In this model, we assume that each school district employs only one type of resource, full-time equivalent (FTE) teachers, and prepares students for only one type of standardized test, mathematics at the appropriate grade level, measured as the percentage of students who score at a given level or higher. Each school district is represented by a point in the scatterplot.

Fig. 13.1
figure 1

The performance frontier for a simple example

Fig. 13.2
figure 2

Several ways for school district D to move to the performance frontier

In Fig. 13.1, school districts A, B, and C define the performance frontier. In each case, there is no school district, or weighted average of school districts, that has fewer FTE teachers per 100 students and a higher percentage of students who scored 3 or 4 on the standardized mathematics test. Such school districts, if they existed, would lie to the northwest of A, B, or C, and no such districts exist, nor do any points on the line segments connecting pairs of districts lie in that region.

School district D, in Fig. 13.2, does not lie on the performance frontier and therefore its performance can improve. In principle, D can choose to move anywhere on the performance frontier. If school district D chooses to focus on resource reduction without test performance change, it would move to the left, reaching the performance frontier at point DRR. This move would require a reduction from 8.39 to 7.71 FTE teachers per 100 students. If school district D enrolls 10,000 students, this reduction would be from 839 to 771 teachers, a reduction of 8.1 %. We refer to this strategy as the resource reduction orientation.

If school district D chooses to focus on performance enhancement without resource reduction, it would move upward, reaching the performance frontier at point DPE. This move would require 94.6 % of its students to score 3 or 4 on the standardized mathematics test, up from 77 %. If 1000 students in school district D sat for the standardized mathematics test, the number scoring 3 or 4 would increase from 770 to 946, or by 22.9 %. We refer to this strategy as the performance enhancement orientation.

School district D might prefer an intermediate approach that includes both resource reduction and performance enhancement and move to point DM. This entails both a reduction in FTE teachers per 100 students from 8.39 to 7.80 and an increase in the percentage of students who score 3 or 4 on the standardized mathematics test from 77 to 82.4 %. If school district D enrolls 10,000 students, teachers would fall from 839 to 780, or by 7.0 %, and students scoring 3 or 4 would rise from 770 to 824, also by 7.0 %. We refer to this strategy as the mixed orientation. The mixed orientation has the feature that the percentage decrease in each resource equals the percentage increase in each performance measure.

The three points DRR, DPE, and DM are called targets for school district D because they represent three possible goals for D to achieve to reach the performance frontier. School district D can choose its target anywhere on the performance frontier, but these three points represent reasonable reference points for D as it improves its overall performance.
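The percentage changes quoted above follow directly from the coordinates of D and its three targets; the short Python check below simply reproduces that arithmetic from the values given in the text.

```python
# The arithmetic behind the three targets for school district D in Fig. 13.2,
# using the values quoted in the text (a worked check, not new data).
actual = {"fte_per_100": 8.39, "pct_level_3_4": 77.0}

targets = {
    "resource reduction (DRR)":      {"fte_per_100": 7.71, "pct_level_3_4": 77.0},
    "performance enhancement (DPE)": {"fte_per_100": 8.39, "pct_level_3_4": 94.6},
    "mixed (DM)":                    {"fte_per_100": 7.80, "pct_level_3_4": 82.4},
}

for name, t in targets.items():
    teacher_change = (t["fte_per_100"] - actual["fte_per_100"]) / actual["fte_per_100"] * 100
    score_change = (t["pct_level_3_4"] - actual["pct_level_3_4"]) / actual["pct_level_3_4"] * 100
    print(f"{name}: teachers {teacher_change:+.1f} %, test performance {score_change:+.1f} %")
```

Running the check yields the -8.1 %, +22.9 %, and +/-7.0 % figures reported above, and makes visible the defining property of the mixed orientation: the same percentage applies to the resource decrease and the performance increase.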

Of course, this model does not consider other resources used by school districts such as teacher support personnel and other staff, nor does it consider standardized test scores in science or English. It also ignores graduation rates in school districts with one or more high schools. Moreover, it does not recognize differences in important district characteristics such as the number of elementary and secondary students, the percentage of students who qualify for free or reduced price lunch or who have limited English proficiency, or the district’s combined wealth ratio.

When other measures are included in the model, we can no longer rely on a simple graphical method to identify a school district’s target school district. For this purpose, we rely on the linear programming model that we describe in detail in the Appendix. Nonetheless, the target school district will have the same basic interpretation. Relative to the school district in question, the target school district consumes the same or less of each resource, its students perform the same or better on each standardized test, its graduation rate is at least as high (if applicable), it educates the same number or more students, and it operates under the same or worse district characteristics.
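The sketch below restates this dominance relation in code. The field names and the direction assumed for each district characteristic (for example, treating a lower combined wealth ratio as more challenging) are illustrative assumptions, not the exact formulation of the Appendix model.

```python
# Sketch of the dominance relation described above. Field names and the
# "challenge" direction of each characteristic are illustrative assumptions.

def target_dominates(target, district):
    """True if `target` uses no more of each resource, performs at least as
    well on each measure, serves at least as many students, and operates
    under conditions at least as challenging as `district`."""
    resources_ok = all(target["resources"][k] <= v
                       for k, v in district["resources"].items())
    performance_ok = all(target["performance"][k] >= v
                         for k, v in district["performance"].items())
    # Higher values assumed to indicate a more challenging environment
    harder = ("elementary_students", "secondary_students",
              "pct_free_reduced_lunch", "pct_limited_english")
    harder_ok = all(target["characteristics"][k] >= district["characteristics"][k]
                    for k in harder)
    # A lower combined wealth ratio is assumed to be more challenging
    wealth_ok = (target["characteristics"]["combined_wealth_ratio"]
                 <= district["characteristics"]["combined_wealth_ratio"])
    return resources_ok and performance_ok and harder_ok and wealth_ok
```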

13.4 A DEA Model for School District Performance in New York State

To apply the DEA methodology to measure the performance of New York State school districts, we began by identifying three categories of important school district measurements. They were:

  • resources consumed;

  • performance measures; and

  • district characteristics.

We defined the resources consumed as:

  • FTE teachers;

  • FTE teacher support (teacher assistants + teacher aides); and

  • building administration and professional staff (principals + assistant principals + other professional staff + paraprofessionals).

For school districts with no high school, we defined the performance measures as:

  • percentage of students scoring at or above level 3 on ELA grade 6;

  • percentage of students scoring at or above level 3 on math grade 6; and

  • percentage of students scoring at or above level 3 on science grade 4.

For school districts with one or more high schools, we defined the performance measures as:

  • total cohort results in secondary-level English after 4 years of instruction: percentage scoring at levels 3–4;

  • total cohort results in secondary-level math after 4 years of instruction: percentage scoring at levels 3–4;

  • grade 8 science: percentage scoring at levels 3–4 all students; and

  • 4-year graduation rate as of August.

We defined the district characteristics as:

  • number of elementary school students;

  • number of secondary school students;

  • percentage of students with free or reduced price lunch;

  • percentage of students with limited English proficiency; and

  • school district’s combined wealth ratio.

We recognize that other choices of variables are possible. We use this particular set of variables because it captures a reasonable range of resources consumed, performance dimensions to be measured, and district characteristics to be taken into account. Other variables may be added if statewide data are available for every school district. Our objective is to illustrate the model and its ability to provide school districts with useful feedback for strategic planning and other purposes.

We consider all three possible orientations. The resource reduction orientation seeks to reduce resource consumption as much as possible while maintaining performance measures at their current levels. The performance enhancement orientation seeks to improve performance measures as much as possible while maintaining resource consumption at current levels. The mixed orientation seeks to improve performance measures and reduce resource consumption simultaneously in a balanced way.

We present the results of all three orientations to provide school district administrators with alternative options for reaching the performance frontier. One district might elect to focus on resource reduction; another might opt for increases in test scores and graduation rate, while a third might prefer a blended strategy that combines these two objectives. Moreover, there are infinitely many points on the performance frontier toward which a district may move; the three that we present highlight reasonable alternatives.

We point out that the performance frontier is unaffected by the choice of orientation. Any district that lies on the performance frontier in one orientation will also lie on it in any other orientation. Orientation only determines the location of the target district on the performance frontier.

13.5 Data and Results

We obtained complete data for the 2011–2012 academic year for 624 public school districts with one or more high schools and 31 public school districts with no high school; districts with incomplete data were excluded. All data were obtained from the New York State Education Department.

13.6 Results for Three Example Districts

Table 13.1 shows the results for three districts based on the model described above. These districts were selected to illustrate the manner in which the model results can be presented to school districts and how they might be interpreted.

Table 13.1 Results for three example districts under three orientations (in percentages)

School district A would reduce all three resources by 18.3 % using the resource reduction orientation and by 4.0 % under the mixed orientation, but would not reduce any resources under the performance enhancement orientation. Improvements in English and science would be virtually the same using all three orientations (in the range of 4 %) but the improvements in math and graduation rate are notably higher using either the performance enhancement or mixed orientations. The message for school district A is that it can raise all three test measures by about 4 % and graduation rate by about 8 % with little or no reduction in resources. Alternatively, it can improve English and science (but not math) by about 4 % and graduation rate by 4–5 % even with significant resource reductions. The choice of strategy would be influenced by many other factors not reflected in the model.

School district B can reduce its FTE teachers by at least 6.9 % but its greater opportunity lies in teacher support, which it can reduce by at least 27.4 %. Despite these reductions, it can improve English by almost 7 % and math by almost 4 %.

School district C is performing very well regardless of orientation with the exception of math, which it can improve by almost 14 %.

13.7 Statewide Results

For 201 of the 624 (32.2 %) school districts with one or more high schools, we found no evidence that resource consumption can be reduced or performance improved. The same statement applies to 28 of the 31 (90.3 %) school districts with no high school. Put another way, each of these school districts serves as its own target school district—none of these school districts can simultaneously reduce each of its resources and improve each of its performance measures while operating under the same district characteristics.

13.8 Districts with One or More High Schools

The 624 school districts with one or more high schools employed 126,470 FTE teachers, 33,035 FTE teacher support personnel, and 25,492.5 FTE building administration and professional staff in the academic year 2011–2012. The average percentage of students who scored 3 or 4 on the English exam was 84.4 %; on the mathematics exam, the average was 86.0 %, and on the science exam, the average was 81.6 %. The average graduation rate was 84.2 %. See Table 13.2.

Table 13.2 Data and statewide results for all three orientations for school districts with one or more high schools

Using a mixed orientation, we found evidence that the number of FTE teachers can be reduced by 8.4 %, the number of FTE teacher support personnel can be reduced by 17.2 %, and the number of FTE building administration and professional staff personnel can be reduced by 9.4 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 4.9 percentage points, by 5.0 percentage points on the mathematics exam, and by 5.8 percentage points on the science exam. Moreover, the average graduation rate can rise by 5.4 percentage points.

Using a resource reduction orientation, we found evidence that the number of FTE teachers can be reduced by 19.1 %, the number of FTE teacher support personnel can be reduced by 22.3 %, and the number of FTE building administration and professional staff personnel can be reduced by 19.3 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 2.2 percentage points, by 2.4 percentage points on the mathematics exam, and by 3.7 percentage points on the science exam. Moreover, the average graduation rate can rise by 2.3 percentage points.

Finally, using a performance enhancement orientation, we found evidence that the number of FTE teachers can be reduced by 5.7 %, the number of FTE teacher support personnel by 15.5 %, and the number of FTE building administration and professional staff personnel by 7.1 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 5.3 percentage points, by 5.3 percentage points on the mathematics exam, and by 6.0 percentage points on the science exam. Moreover, the average graduation rate can rise by 6.8 percentage points.

Figures 13.3, 13.4, and 13.5 illustrate the potential improvements in the three resource categories. For districts that lie on the diagonal of one of these graphs, there is no evidence that they could reduce their use of that resource category. Other districts have the potential to reduce resource consumption by the amount that they lie below the diagonal.

Fig. 13.3
figure 3

Target vs. actual FTE teachers under each of the three orientations for school districts with at least one high school

Fig. 13.4
figure 4

Target vs. actual FTE teacher support under each of the three orientations for school districts with at least one high school

Fig. 13.5
figure 5

Target vs. actual FTE building and administrative professional staff under each of the three orientations for school districts with at least one high school

Figures 13.6, 13.7, 13.8, and 13.9 illustrate the potential improvements in the four performance measures. For districts that lie on the diagonal of one of these graphs, there is no evidence that they could improve their performance in that dimension. Other districts have the potential to improve by the amount that they lie above the diagonal.

Fig. 13.6
figure 6

Target vs. actual percentage of students scoring 3 or 4 on the secondary level English standardized test under each of the three orientations for school districts with at least one high school

Fig. 13.7
figure 7

Target vs. actual percentage of students scoring 3 or 4 on the secondary level mathematics standardized test under each of the three orientations for school districts with at least one high school

Fig. 13.8
figure 8

Target vs. actual percentage of students scoring 3 or 4 on the grade 8 science standardized test under each of the three orientations for school districts with at least one high school

Fig. 13.9
figure 9

Target vs. actual percentage of 4-year graduation rate under each of the three orientations for school districts with at least one high school

Figure 13.10 shows the histograms of the school districts for each of the three factor performances associated with the resources, excluding those districts for which no improvement is possible. Figure 13.11 shows the histograms of the school districts for each of the four factor performances associated with the performance measures, again excluding those for which no improvement is possible.

Fig. 13.10
figure 10

Histograms of the school districts with at least one high school for each of the three factor performances associated with the resources, excluding those districts for which no improvement is possible

Fig. 13.11
figure 11

Histograms of the school districts with at least one high school for each of the four factor performances associated with the performance measures, excluding those for which no improvement is possible

13.9 Districts Without a High School

The 31 school districts with no high school employed 2233 FTE teachers, 762 FTE teacher support personnel, and 416 FTE building administration and professional staff in the academic year 2011–2012. The average percentage of students who scored 3 or 4 on the English exam was 84.4 %; on the mathematics exam, the average was 86.0 %, and on the science exam, the average was 81.6 %. See Table 13.3.

Table 13.3 Statewide results for all three orientations for school districts without a high school

Using a mixed orientation, we found evidence that the number of FTE teachers can be reduced by 0.2 %, the number of FTE teacher support personnel by 4.3 %, and the number of FTE building administration and professional staff personnel by 3.3 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 0.4 percentage points, by 0.9 percentage points on the mathematics exam, and by 0.3 percentage points on the science exam.

Using a resource reduction orientation, we found evidence that the number of FTE teachers can be reduced by 0.8 %, the number of FTE teacher support personnel by 4.6 %, and the number of FTE building administration and professional staff personnel by 4.8 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 0.6 percentage points, by 0.6 percentage points on the mathematics exam, and by 0.0 percentage points on the science exam.

Finally, using a performance enhancement orientation, we found evidence that the number of FTE teachers can be reduced by 0.0 %, the number of FTE teacher support personnel by 4.3 %, and the number of FTE building administration and professional staff personnel by 3.0 %. In addition, the average percentage of students who score 3 or 4 on the English exam can rise by 0.4 percentage points, by 0.9 percentage points on the mathematics exam, and by 0.3 percentage points on the science exam.

13.10 Implementation

We reiterate that other choices of variables are possible. An important first step is for the school districts and the New York State Education Department (NYSED) to work together to modify this model as necessary. For example, the current model does not include data on Regents exam scores. In principle, the only requirement is that complete data exist for all school districts for the specified school year. In addition, it is important to provide a complete data set so that all school districts, especially those in New York City, can be included. This data set needs to be compiled for the latest school year for which complete data are available.

The NYSED would need to determine how widely the model results are distributed. Perhaps the initial distribution during a pilot phase should be restricted to the school districts and NYSED. This would give school districts the opportunity to better understand their own results and to begin to incorporate those results into their operations and planning. The pilot phase would also allow school districts and NYSED to suggest further improvements to the model.

Ultimately, the model can serve as a key element in a quality improvement cycle. By providing direct feedback to each school district about its performance along multiple dimensions, it supports school district decisions about how to improve and allows districts to demonstrate that their decisions have in fact had the desired effects.

13.11 Conclusions

We have presented a flexible model that allows school districts and NYSED to measure school district performance throughout New York State. The model provides multiple, mathematically derived performance measures that allow school districts to detect specific areas for improvement. The model also enables NYSED to identify school districts that are the top performers in the state and others that most require improvement.

The results of a preliminary version of the model applied to data from the 2011–2012 school year show that approximately one-third of the school districts in New York State are performing as well as can be expected given their local school district characteristics. Another 26.8–42.3 %, depending on the specific resource or performance measure, can improve by no more than 10 %.

Nonetheless, substantial statewide improvements are possible. Using the mixed orientation, for example, if every school district were to reach its target, New York State would have between 8 and 17 % fewer personnel, 6–7 % more students scoring 3 or 4 on standardized tests, and 6 % more students graduating within 4 years.

Public education is critically important to the future of New York State and the nation. This model offers the potential to support public school education leaders in recognizing where improvements are possible and in taking appropriate action to implement those improvements.