Introduction

Social scientists have demonstrated that some police practices can be effective at reducing crime or disorder. Systematic reviews of hot spots policing (Braga et al. 2014), focused deterrence (Braga and Weisburd 2012), problem-oriented policing (Weisburd et al. 2010), and third-party policing (Mazerolle and Ransley 2006), for example, provide evidence for the relative effectiveness of these strategies. Long before researchers and practitioners conceived of these strategies, police and researchers were asking a far more basic effectiveness question: does hiring more police reduce crime? Two systematic reviews of the impact of police force size on crime have recently been published. Lim and colleagues (2010) counted the number of findings that demonstrate a statistically significant crime reduction effect and found that such findings were in the minority of published findings. Even more recently, Carriaga and Worrall (2015) conducted a systematic review of 24 studies and a meta-analysis of 12 studies. They found that police force size has a small but significant crime reduction effect.

In this paper, we are interested in the development of research over time. We use methods developed for systematic reviews and meta-analysis for this purpose. We are not simply concerned with drawing a conclusion about whether adding police reduces crime, though this is important. We are also interested in how the findings change over time and in uncovering possible reasons for the findings: do changes in methods and data correspond with differences over time, for example.

Our paper builds on prior findings in three specific ways. First, we show how findings have changed over time. These changes are critical for understanding what we know about the police force size–crime relationship. Second, we include all relevant studies identified in our analysis (over 60 studies with over 200 separate findings). This provides a more complete picture of what the research shows. Third, we examine likely explanations for the cumulative research findings. Our inquiry provides insights into how researchers produce knowledge and the impact of their evolving research methods on their findings. In the end, there appears to be no impact on crime in general of hiring more police, and advances in research methods do not seem to have helped to produce stronger conclusions on this important issue.

We organize our paper as follows. Following this introduction, we describe the economic theory of the police production of less crime, and outline the principal difficulties researchers face when trying to test it. In the third section, we describe the methods we use to identify the empirical research linking police force size to crime. The fourth section shows how the findings from this research have fluctuated over time. We first illustrate this change using descriptive findings over time and then we show effect sizes over time. These show contradictory results. The descriptive findings indicate that results have fluctuated over time, but recently the addition of police seems to have had a greater impact on crime. In contrast, the meta-analysis shows that, at least for the last four decades, effect sizes have been consistently indistinguishable from zero. In the fifth section, we test for possible explanations related to these findings: changes in the statistical methods and units of analysis, and the lack of variation in the principal independent variable (i.e., police size stability over time). Based on the tests of these explanations, we conclude that there is little reason to continue conducting research examining the police force size–crime relationship. However, for cost-conscious mayors, city managers, and city councils, there is a silver lining: modest planned reductions in police force size are unlikely to have a consequential impact with regard to overall crime.

The hypothesis that more police reduces crime

The hypothesis that increasing police force size reduces crime is relatively simple. It treats a police agency as a “firm” with a single homogeneous input, labor, and a set of outcomes that are various forms of crime (Becker 1968). Using the typical ceteris paribus argument, as the number of police officers increases, crimes should decline, for a variety of reasons. More police should increase arrests of offenders, some of whom end up incarcerated for their crimes. Thus, one mechanism driving this relationship is incapacitation: what Nagin (2013) refers to as the apprehension role of police. General deterrence is a second mechanism: with more police, potential offenders perceive that their risk of being caught after offending is higher, so they too cut back on their crimes. Nagin (2013) calls this the sentinel role of police. Another possible mechanism is specific deterrence: more police officers allow tracking of specific offenders, who then cut back on their misdeeds. This is a special case of the sentinel role. Nagin, Solow, and Lum (2015) present a theoretical model of policing that suggests that the sentinel role is more effective than the apprehension role, though the two roles are intertwined.

Becker (1968) focuses his attention on the apprehension role: “The more that is spent on policemen, court personnel, and specialized equipment, the easier it is to discover offenses and convict offenders. One can postulate a relation between the output of police and court ‘activity (A)’ and various inputs of manpower (m), materials (r), and capital (c), as in A = f(m, r, c), where f is a production function summarizing the ‘state of the arts’.” (p. 174). Though Becker focuses on apprehension, his argument does not depend on the precise mechanism by which numbers of police produce crime reductions.

Becker (1968) also assumes, following standard economic theory, that there are diminishing marginal returns to policing. That is, as inputs increase, the outputs would increase, but at a declining rate: the benefit of the first 100 officers deployed would be greater than the benefits of the next 100 deployed, which would be greater than the next 100, and so on. We can summarize Becker’s ideas in Fig. 1. The horizontal axis is the number of police, and the vertical axis is the number of crimes. The curves represent the theoretical relationship between these two variables, assuming all other factors influencing crime have been controlled for. The shapes of the curves show the declining marginal utility of police officers.

Fig. 1 Theory of police force size and crime reduction

Historical examples provide evidence that removing police can spark a large increase in crime (the equivalent of moving from the right to the extreme left on the curve). Andenaes (1974) gives two examples—the Liverpool Police strike of 1919, and the Nazi arrest of the Danish police in 1944—where the removal of policing preceded a dramatic surge in crime. Russell (1975 [1930]) describes a similar result from the 1919 Boston police strike (the Finnish police strike of 1976 produced only a small increase in crime and may provide a counter example [Makinen and Takala 1980]). From these examples, it is reasonable to conclude that moving from zero police to a modest number of police is likely to reduce crime substantially.

The research we are reviewing does not address this. Rather, it examines whether a small (marginal) change in the size of an existing police force has an impact on crime. In short, it looks at the right part of the curve, rather than the extreme left part where crime drops rapidly with foundational (baseline) changes in police force size.

Becker (1968) notes that “it would be cheaper to achieve any given level of activity... the more highly developed the state of the arts, as determined by technologies like fingerprinting, wiretapping, computer control, and lie-detecting” (p. 174). This too is in keeping with standard economic theory, which assumes that at any given time, a firm is using a particular technology. An improvement in technology shifts the curve downward. In Fig. 1, strategy A is the “older” technology, while strategy B is a “newer” technology. For any given level of police force size, there is less crime with strategy B than strategy A, though both curves show a similar relationship between police force size and crime. Becker (1968) reflects the thinking of the 1960s by focusing on physical technology. Today, we can think of “state of the art” as reflecting the operational strategy of a police department. We will come back to this extremely important point later in the paper.

The first empirical attempt to estimate the relationship between police force size and crime was by Morris and Tweeten in 1971. Controlling for other confounders, they found that an increase in police force size results in an increase in crime. Since this study, there have been numerous other efforts. Two recent systematic reviews suggest contrasting conclusions. Lim and colleagues (2010) claimed there is more evidence contradicting a crime reduction effect of hiring more police than there is evidence supporting it. In contrast, Carriaga and Worrall (2015) suggested there is a small but significant crime reduction effect, on average. This contrast may be due to the criteria for selecting studies and the methods they used to analyze them. Lim, Lee, and Cuvelier (2010) examined 58 studies. They counted the number of findings supporting and contradicting the crime reduction hypothesis: a vote count analysis. Carriaga and Worrall (2015) restricted the studies they examined to those that analyzed the relationship between police force size and crime over time. They included 24 studies in their systematic review and 12 in their meta-analysis. Nevertheless, reconciling their different conclusions is important. Given that researchers have been examining this topic for over 40 years, it is also important to examine how findings have changed over the past four decades.

Methods

To systematically assess the relationship between police force size and crime, we followed established systematic review methods (Cook et al. 1997; Higgins and Green 2011; Mulrow and Oxman 1997) and used an advanced meta-regression technique. Our target studies are those that describe an empirical relationship between police force size (or a proxy for this independent variable) and crime, for police agencies in the United States. We restricted our inquiry to the U.S. because most of the studies of this type describe U.S. policing; including the small number of non-U.S. studies would therefore increase the heterogeneity in the findings without shedding additional light on the subject. Consequently, we do not make any assertions about the generalizability of our findings to police forces outside the U.S., nor do our findings apply to U.S. federal law enforcement agencies or to county or state police forces.

Search strategy

We examined the English-language literature using both electronic and manual searches. Our electronic search of various databases (Footnote 1) used these keywords: police force, police employment, police level, police expenditure, police budget, police effectiveness, police hiring, COPS and GAO (Government Accountability Office), and police deterrence. Because the literature on this topic prior to Becker’s (1968) economic theory of crime offered few empirical findings, we retrieved studies from 1968 through March 2015. We also conducted manual reference checks of relevant literature, including the two recent systematic reviews described above. Finally, we contacted scholars in this field and asked them about any study we may have missed. We presented an early version of this study at the Annual Conference of the American Society of Criminology in Atlanta, Georgia in 2013, and asked attendees if they knew of gaps in our literature search. We relied upon an iterative approach rather than a sequential approach. Once we found studies that met the keyword criteria from a search of online databases, we searched the bibliographies of the new studies for other papers. If new keywords became apparent, we conducted another online search. This was particularly important because the online databases have limited access to studies prior to the 1980s, and because the terminology used has changed over the decades.

Literature screening and data extraction

We used a three-step screening process to select studies. First, we reviewed both abstracts and tables with empirical findings to determine if a study contained an empirical assessment of the relationship between police force size and crime. We then examined the entire article for all studies that met the step-one criteria. Second, we dropped the studies that did not provide the standard errors of estimated coefficients showing the effect of police force size on some form of crime. Six studies failed to provide this information. Finally, we eliminated any study of policing outside the U.S. Table 1 summarizes these steps. By step three, we had included 62 studies and 229 findings (most studies reported on multiple crime types resulting in multiple findings). A comparison with two other recently published systematic reviews by Lim, Lee, and Cuvelier (2010) and Carriaga and Worrall (2015) suggests that we drew upon a similar but slightly more comprehensive body of studies, despite using slightly different selection criteria (see Table 1).

Table 1 Number of studies and findings considered for systematic review

Coding protocol

Though the search period for literature was between 1968 and 2015, we found relevant studies between 1971 and 2013. For each finding, we coded whether or not it supported a crime reduction hypothesis (a statistically significant negative relationship between police force size and crime). To meta-analyze the overall effect size of 229 findings across 62 studies, we recorded the standard errors and relevant statistics (e.g., sample size, t-statistics, confidence interval, and standard deviations) for each estimated coefficient linking police force size to crime.

We also coded the statistical modeling techniques of the 62 studies over this 43-year period: ordinary least squares, two-stage least squares, three-stage least squares, hierarchical linear modeling, and first-difference generalized method of moments. We also accounted for the year of publication, the geographic unit of analysis (city, county, metropolitan area, or state), and the years from which the data were collected. Table 2 shows the characteristics of the studies reviewed in this paper.

Table 2 Characteristics of the studies analyzed

Descriptive findings over time

Our first interest is to determine whether findings appear to change over time. We are interested in showing how a diligent reader of the police force size literature might view the conclusions of studies as the cumulative findings evolve. Such a reader will not be conducting a progressive meta-analysis, updated with every new study. Rather, she or he is likely to count the number of studies that seem to support the hypothesis that more police reduce crime, relative to the studies that fail to support this hypothesis.

There are several serious limitations to such an approach (which we describe shortly), but this approach is probably a reasonable approximation of how researchers and practitioners might alter their views of the hypothesis over time. We began by sorting the 229 findings into two categories: those that support the crime reduction hypothesis, and those that do not. We classified a finding as supportive if there was at least one significant negative coefficient for police force size. Otherwise, we classified it as contradictory. If a study had at least one supportive finding, we coded the study as supportive. Otherwise, we coded it as contradictory. We then, for each year, subtracted the number of contradictory studies from the number of supportive studies.
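The counting procedure described above can be illustrated with a minimal Python sketch. It is not the code used for this review: the function names and the input layout are hypothetical, and the significance rule (a two-sided test at the 5 % level, using a z cutoff of 1.96) is an assumption, since the threshold applied to each finding is not stated here.

```python
from collections import defaultdict

Z_CRIT = 1.96  # assumed cutoff for a two-sided test at the 5 % level


def is_supportive(coef, se, z_crit=Z_CRIT):
    """A finding is supportive if its coefficient is significantly negative."""
    return coef < 0 and abs(coef / se) >= z_crit


def net_support_by_year(findings):
    """Return {year: supportive studies minus contradictory studies}.

    `findings` is an iterable of (study_id, year, coef, se) tuples
    (a hypothetical layout). A study counts as supportive if at least
    one of its findings is supportive; otherwise it is contradictory.
    """
    study_year = {}
    study_supportive = {}
    for study_id, year, coef, se in findings:
        study_year[study_id] = year
        already = study_supportive.get(study_id, False)
        study_supportive[study_id] = already or is_supportive(coef, se)

    net = defaultdict(int)
    for study_id, supportive in study_supportive.items():
        net[study_year[study_id]] += 1 if supportive else -1
    return dict(net)
```

Because a single supportive finding is enough to mark a whole study as supportive, a count by study can look more favorable to the hypothesis than a count by finding.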

Figure 2 displays the differences between the supporting and contradicting studies over time. It is apparent that the net conclusions have fluctuated a great deal across the study period. Specifically, we find that equal numbers of studies supported and contradicted the hypothesis during the 1970s. However, during the 1980s and 1990s, the number of contradicting studies was two times greater than the number of supporting studies. After 2000, it is apparent that support for the crime reduction hypothesis became predominant. Over the entire 43-year period, more studies support the crime reduction hypothesis than contradict it. In conclusion, a diligent reader who had followed this literature over 43 years might change her or his mind several times, but conclude in 2015 that on balance hiring more police seems to have an impact on crime reduction. Not only are there a few more studies supporting this conclusion, but since 2000 the research also seems to point in this direction.

Fig. 2 The fluctuating findings for studies of police force size–crime relationships: 1971 through 2013

There are two reasons this analysis may be misleading. First, changing how we count changes our conclusions. If we count findings rather than studies, we find that about 36 % of findings across studies are supportive, while 64 % of findings are not. This is broadly consistent with Lim and colleagues (2010), who reported that 21 % of the findings in their database supported a crime reduction effect of police strength (the other 79 % of the findings were either nonsignificant or significant positive relationships). Second, comparing studies or findings in this way assumes each study (or finding) is equally weighted; in doing so, we accept the null hypothesis for nonsignificant findings, rather than concluding we are uncertain of the true finding. In fact, our confidence in a finding depends on its standard error. We should, therefore, weight findings by their standard errors and compare effect sizes. We next turn to this more precise analysis.

Statistical analysis

Effect size analysis

To estimate effect sizes, we used each study’s standardized regression coefficient (and standard error) between police force size and crime variables (Higgins and Green 2011; Wilson 2001). Because the impact of police force size on crime is often measured using different methods and metrics across the studies, the direct pooling of regression coefficients is not meaningful (Nieminen et al. 2013). In such a case, standardized regression coefficients may offer a solution. Standardized regression coefficients are the estimates from an analysis carried out on variables that have been standardized so that their variances equal one (Vittinghoff et al. 2005). Therefore, in the context of the police–crime relationship, standardized coefficients show how many standard deviations the number of crimes will change per one standard deviation increase (or decrease) in the police force size variable. For most of the studies, researchers supplied standardized coefficients, or relevant statistics (e.g., raw coefficient estimate, standard deviations of each variable in the model specification) in tables, figures, or in the body of the text, so we were able to calculate the standardized coefficient estimates of each study (Footnote 2). We excluded studies that failed to provide one or more of these components from our systematic review. We then used these standardized coefficients of all 229 findings from 62 studies to estimate the overall effect size in the Stata 14 statistical package.
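For readers unfamiliar with the conversion, a standardized coefficient can be recovered from a raw coefficient and the standard deviations of the two variables as beta = b × (SD of police measure / SD of crime measure). The short Python sketch below illustrates this; the function name is hypothetical, and rescaling the standard error by the same factor is a simplifying assumption that treats the reported standard deviations as fixed constants.

```python
def standardize_coefficient(b, se_b, sd_x, sd_y):
    """Convert a raw regression coefficient to a standardized one.

    b     : raw coefficient of police force size on crime
    se_b  : standard error of b
    sd_x  : standard deviation of the police force size variable
    sd_y  : standard deviation of the crime variable

    Returns (beta, se_beta). The standard error is rescaled by the same
    factor as the coefficient, which assumes sd_x and sd_y are known
    constants rather than estimates.
    """
    scale = sd_x / sd_y
    return b * scale, se_b * scale
```

For example, a raw coefficient of -2.0 with sd_x = 3.0 and sd_y = 12.0 corresponds to a standardized coefficient of -0.5: a one standard deviation increase in police force size is associated with a half standard deviation decrease in crime.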

Overall effect size

There are two different methods for estimating the overall effect size of police force strength on crime: fixed-effects and random-effects models. Both models rely on the inverse variance weight, so findings with smaller variances (larger studies) contribute more to the weighted average than findings with larger variances (smaller studies) (Helfenstein 2002). However, the final weight in a random-effects model is the inverse of the sum of the study-level variance and the estimated between-study variance. This gives random-effects models precise type I error rates and yields more conservative effect size estimates than the fixed-effects model (Lipsey and Wilson 2001). Further, the fixed-effects model assumes that all findings come from the same population, control for the same variables, use the same outcome definitions, and are otherwise similar to each other with regard to factors that influence analytical findings. This is often an invalid assumption (Higgins et al. 2003; Wilson 2001), and the assumption is unlikely to be valid for the 62 studies and 229 findings we examined. For example, researchers measure police force size using the number of police officers or the dollar amount spent for hiring additional police officers (e.g., GAO 2005). They also frequently rely on different geographic units of analysis (e.g., city, county, metropolitan area, state, and nationwide), and analyze different periods. As we have noted, the studies used several different statistical modeling techniques. Due to the presence of heterogeneity across different studies, a random-effects model seems to be an appropriate method to estimate the overall effect size.
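The difference between the two weighting schemes can be seen in a minimal Python sketch. It is an illustration only: the function names are hypothetical, and the between-study variance is estimated with the DerSimonian-Laird moment estimator, which is an assumption made here for concreteness rather than a description of the estimator used in this review.

```python
def inverse_variance_mean(effects, variances):
    """Inverse-variance weighted mean and the variance of that mean."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, 1.0 / total


def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance (needs 2+ effects)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed_mean, _ = inverse_variance_mean(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effects mean
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    return max(0.0, (q - (len(effects) - 1)) / c)


def random_effects_mean(effects, variances):
    """Random-effects mean: weights are 1 / (within variance + tau2)."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    mean, var = inverse_variance_mean(effects, [v + tau2 for v in variances])
    return mean, var, tau2
```

When the effects disagree by more than their sampling variances would predict, the estimated between-study variance is positive, every weight shrinks, and the variance of the pooled mean grows. This is the sense in which the random-effects estimate is the more conservative of the two.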

However, a conventional random-effects model relies on the assumption that the effect sizes from different studies are independent of one another. This assumption is not valid where multiple findings are nested within the same study. Without considering the dependence among multiple findings, a conventional random-effects model would incorrectly estimate the overall effect size. To account for the hierarchical nature of multiple findings per study, we estimated the overall effect size using the robust variance estimation (RVE) technique in meta-analysis (Hedberg 2014; Hedges et al. 2010). The RVE method provides a robust method for estimating standard errors in meta-regression, particularly when there are dependent effects among multiple findings within the same study. However, multiple findings per study can further cluster by research group. For example, effect sizes reported in Boba and Lilley (2009) and Lilley and Boba (2008) are likely to cluster at a higher level if the researchers used similar data and statistical methods to estimate their effect sizes in several studies. Therefore, it makes sense to assume that the observed effect sizes (findings) are nested within studies, which are further nested within higher-level clusters. So, we used the RVE method with “hierarchical” model weights in Stata to operationalize the hierarchical nature across findings, studies, and research group levels.
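The core idea of RVE for a simple mean effect size can be sketched in a few lines of Python. This is a deliberately simplified illustration and not a reproduction of the Stata routine: it takes the weights as given rather than deriving hierarchical weights, handles a single level of clustering, and omits the small-sample corrections that applied work would normally include. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def rve_mean(effects, weights, clusters):
    """Weighted mean effect size with a cluster-robust standard error.

    effects  : effect sizes, one per finding
    weights  : one weight per finding (assumed given)
    clusters : cluster label per finding (e.g., a study identifier)

    Residuals are summed within each cluster before squaring, so
    findings from the same cluster are not treated as independent.
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Weighted residual sum for each cluster
    cluster_score = defaultdict(float)
    for y, w, c in zip(effects, weights, clusters):
        cluster_score[c] += w * (y - mean)

    robust_var = sum(s ** 2 for s in cluster_score.values()) / total_w ** 2
    return mean, sqrt(robust_var)
```

If every finding is placed in its own cluster, the function returns the standard error that would result from treating all findings as independent. When findings from the same study tend to share the same sign, grouping them by study produces a larger standard error.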

We estimated a mean effect size of police force size on crime of about –.030 (with a 95 % confidence interval between –.078 and .019) (Footnote 3). Because this interval includes zero, the effect size from the RVE method is not statistically significant. The corresponding tau-squared value (= 0.0007; an estimate of the between-clusters variance component) shows that there is some between-clusters variance among the 229 findings nested within 62 studies. The nonsignificant and tiny mean effect size between police force size and crime (Footnote 4) suggests that simply increasing police force size may not help reduce crime, and if it does, it does not reduce crime by much.

It is instructive to compare this effect size to the effect sizes from other meta-analyses of police strategies, as this provides a set of standards by which to judge the importance of hiring more police. In Fig. 3, we compare this effect size of police force size to the effect sizes reported in meta-analyses of crime hot spots (Braga et al. 2014), problem-oriented policing (Weisburd et al. 2010), neighborhood watch (Bennett et al. 2006), and focused deterrence (Braga and Weisburd 2012). Though all effect sizes for these programs are negatively related to crime, we convert their negative signs into positive (crime prevention gain) signs so that taller bars correspond to a larger impact on crime reduction. It is apparent that the effect size for adding police is minuscule compared to the other effect sizes. Thus, it appears that cities might reduce more crime by using specific strategies to reduce crime than by hiring more police.

Fig. 3 Effect sizes from systematic review studies

One possible explanation for the nonsignificant tiny effect size is that effect size might change over time. If recent studies demonstrate significant effect sizes while older studies report nonsignificant effect sizes, then the nonsignificant effect sizes from older studies might be masking the significant (and more valid) effect sizes of recent studies.

Period effect sizes

To test whether the effect sizes for police force size and crime have changed over time, we divided our 43-year study period into four overlapping 20-year time periods. This provides sufficient years in each period to show the change in effect sizes over four decades. We assigned each study (and its findings) to the one period that contained the years covered by the study’s dataset. For example, Kovandzic and Sloan (2002) analyzed data for the years from 1980 to 1998. The authors found three significant negative relationships and five nonsignificant relationships between police force size and crime. We assigned these findings to 1980–1999 but not to 1990–2013, because the authors’ dataset only fit in the first period. This assignment process did force us to drop ten studies and 70 findings from the analysis because these studies overlapped two or more periods.

We calculated the mean effect sizes for each period using the RVE technique. Our findings are shown in Table 3. The effect sizes for four overlapping decades fluctuate from positive, to highly negative, to around zero, to small and negative, but no period effect size is significantly different from zero. One persistent finding is clear. The nonsignificant overall effect size is not due to older studies; indeed, the effect size between police strength and crime has been consistently nonsignificant for the past four decades.

Table 3 Effect sizes between police force size and crime by decades of dataset

However, another possible explanation for the nonsignificant overall effect size is the methods used. Studies that use weaker methods might be driving the overall nonsignificant effect size. If so, then looking at only strong studies would give a more valid estimate of the average effect size. We look at that next.

Methods used

Researchers prefer advanced statistical methods because they assume these methods produce more valid findings than more basic correlational techniques. For instance, since Levitt introduced it in 1997, researchers have increasingly employed the instrumental variable (IV) technique to control for the endogenous relationship between police force size and crime. Recently, Kovandzic and his colleagues (2016) have suggested that applying the generalized method of moments (GMM) provides more advanced specification tests for instruments than the conventional IV technique. They reanalyzed Levitt’s two studies (1997, 2002) and showed that GMM yields different results from Levitt’s. One way of thinking about the history of research on this topic is as an arms race between those supporting and those skeptical of the crime reduction hypothesis, where the arms are statistical methods.

Though newer methods are introduced over time, researchers may not switch to newer methods all at once. We attempt to unravel the context and the history of the use of statistical methods in the study of relationships between police force size and crime. Figure 4 shows that statistical methods change over time. Specifically, prior to 2000, OLS regression was the most used statistical method, but after 2000 other methods became dominant. Studies that use lagged variable models to control the endogenous relationship between police force size and crime also decreased after 2000. Two- and three-stage least squares have not appeared in this literature since 2010.

Fig. 4 Statistical methods employed in police force size and crime studies: 1971 through 2013

If the advanced methods consistently produce more valid results, we should see evidence that changes in research methods have changed the size or the sign of effect sizes, regardless of the decade the research was conducted. To test this hypothesis, we used the RVE method with beta as the outcome variable, and dummy variables for the methods and measures used. We coded both two-stage least squares and three-stage least squares models as 1 if a study used either statistical model; otherwise 0. We coded both generalized method of moments and hierarchical linear model as 1 if a study used either method; otherwise 0. We also considered simultaneity bias between police force size and crime. Levitt (1997) notes that while more police may drive crime down, more crime might prompt elected officials to hire more police. Failing to control for this loop (simultaneity) between the two variables may result in a biased estimate of the relationship between police size and crime. Therefore, we coded the studies that used longitudinal data to control for simultaneity as 1, otherwise 0.

Since choice of statistical method may be linked to other methods choices, we also controlled for the way the independent variable was measured, and the units of analysis used. If a study measures police force size using the amount of dollars spent for hiring additional police officers, we coded the dummy variable as 1. Consequently, any study that measures police force size using the number of police officers or other measures is coded as 0. We also created dummy variables for the geographic unit in which the researchers conducted their study: city or county were coded as 1 and larger units were coded as 0 and used as the reference value.

Table 4 shows no significant relationships between the statistical methods used and effect sizes, holding other factors fixed. This is consistent with the recent findings of Carriaga and Worrall (2015) that neither unit of analysis nor study design has a significant impact on the outcome variable. Specifically, using the city or county as the unit of analysis does not increase the chances of discovering a crime reduction effect relative to other possible units. We also find that the choice of a geographic unit of analysis does not significantly influence the effect size. Measuring police force size by police budget (rather than by number of officers) does not seem to influence the effect size. These results hint at the possibility that neither the statistical method nor the measure of police force size nor the geographic unit of the study has a significant impact on the main relationship between police force size and crime. For now, we must conclude that variation in the statistical methods used is not related to effect sizes. While we cannot assert that the next advance in such methods will be unproductive, these results suggest we should not be overly optimistic that newer methods would give us different findings.

Table 4 Test of advanced method hypothesis in police force size and crime studies

Lack of variation in police force size

Finally, we examine another factor that might explain the tiny and nonsignificant effect size: lack of variation in police force size. In an experiment, the amount of an intervention should exceed a certain dosage, relative to a control, to ensure sufficient variation in the intervention variable. If this does not occur, then the independent variable approximates a constant and it will be impossible to measure the impact of the intervention. Something similar to insufficient dosage could influence our estimates of the relationship between police force size and crime. Specifically, if there is very little change in police force size over time, then whatever impact hiring more cops has on crime may be difficult to detect.

How much change has occurred in police strength over time? To help answer this question we analyzed the temporal change in police force size for different groups of cities based on their populations. The data come from FBI Table 78, police force size by city agency (Federal Bureau of Investigation [FBI], 2010). The data describe all U.S. cities with population greater than 100,000 between 1990 and 2010 with complete data (about 77 % of all U.S. cities). We divided cities into groups of different populations. In Fig. 5, we display the temporal variation in police per capita of each group over time. Within each group, police force size is stable for most of these cities. Only large cities (about 5 % of all cities) with population greater than 1,000,000 show some changes in police force size. There is growth from 1990 through 1999, and a large decline in 2003. This decline is due to the large decrease in the police force of a single city, New York: the six other cities in the same group did not experience any change in police force size. Table 5 provides further evidence of the stability of police force size over time: standard deviations of the average number of officers are very small relative to their mean values. For the most part, it appears that cities and counties maintain a constant level of police per capita. Large variation in police per capita is not created by hiring and attrition (or firing), but by jurisdictions of different sizes having different levels of policing: bigger jurisdictions have more police per capita than smaller ones.
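The comparison of standard deviations with mean values can be expressed as a coefficient of variation (standard deviation divided by the mean). The sketch below is illustrative only: the three cities and their officers-per-1,000 figures are made up for the example and are not taken from FBI Table 78 or from Table 5. It contrasts variation within each city over time with variation between cities of different sizes.

```python
import statistics

# Hypothetical officers per 1,000 residents for three made-up cities over five years.
# These numbers are illustrative assumptions; they are NOT FBI or Table 5 values.
officers_per_1000 = {
    "small_city": [1.50, 1.52, 1.49, 1.51, 1.50],
    "medium_city": [2.00, 2.03, 1.98, 2.01, 2.02],
    "large_city": [3.90, 4.00, 4.05, 3.95, 4.00],
}

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Within-city variation: how much each city's police strength moves over time.
within_cv = {
    city: coefficient_of_variation(series)
    for city, series in officers_per_1000.items()
}

# Between-city variation: how much average police strength differs across cities.
city_means = [statistics.mean(series) for series in officers_per_1000.values()]
between_cv = coefficient_of_variation(city_means)

if __name__ == "__main__":
    for city, cv in within_cv.items():
        print(f"{city}: within-city CV = {cv:.4f}")
    print(f"between-city CV = {between_cv:.4f}")
```

In this made-up example the within-city coefficients of variation are far smaller than the between-city coefficient, which is the pattern the text describes: most of the variation in police per capita comes from differences between jurisdictions rather than from changes over time.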

Fig. 5 Police force size rate for U.S. cities grouped by population: 1990 through 2010

Table 5 Police officers per capita for U.S. cities with populations greater than 100,000: 1990 through 2010

These findings are very important for four reasons. First, police force size behaves more like a constant than an independent variable, and without much variation in an independent variable, it is difficult to find a connection to a dependent variable. Second, the tiny variation in police force size makes any empirical findings about the link to crime highly sensitive to model specification. This would help explain the variation in findings across studies. Third, with high sensitivity, both positive and negative relationships between police force size and crime can be found (as shown in Fig. 2). Finally, the lack of temporal variation also helps explain why studies that made use of temporal changes in police and crime did not produce different findings from those that relied on cross-sectional comparisons only. In conclusion, like the Dude’s rug, this lack of variation brings together the various findings in our systematic review.
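The instability of estimates under low variation can also be sketched by simulation. This is a hedged illustration, not a reanalysis of the reviewed studies: the function name and all parameter values are assumptions, and random sampling noise stands in for the differences in data and model specification across real studies. The code draws many hypothetical "studies" from the same process, in which the true slope is negative, and reports the share of studies that nevertheless estimate a positive slope.

```python
import numpy as np

def share_positive_slopes(x_sd, n_studies=1000, n=200,
                          true_slope=-0.5, noise_sd=1.0, seed=1):
    """Share of simulated studies whose estimated slope is positive.

    The true slope is negative in every simulated study; all parameter
    values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    positive = 0
    for _ in range(n_studies):
        # Hypothetical police strength with spread x_sd around a fixed level.
        x = rng.normal(loc=2.0, scale=x_sd, size=n)
        y = true_slope * x + rng.normal(loc=0.0, scale=noise_sd, size=n)
        x_centered = x - x.mean()
        slope = (x_centered @ (y - y.mean())) / (x_centered @ x_centered)
        positive += int(slope > 0)
    return positive / n_studies

if __name__ == "__main__":
    for x_sd in (1.0, 0.1, 0.01):
        share = share_positive_slopes(x_sd)
        print(f"sd(x)={x_sd:<5} share of positive estimates = {share:.3f}")
```

When the independent variable varies widely, essentially all simulated studies recover the negative sign. When it barely varies, the sign of the estimate approaches a coin flip, which is one way in which both positive and negative findings could arise from the same underlying relationship.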

Discussion and conclusions

In review, here are our findings. First, the overall effect size of police force size is negative, small, and not statistically significant. Compared with the effect sizes of strategies tested in randomized and quasi-experimental studies in the policing literature, this effect is negligible, and we conclude that merely increasing police force size does nothing to reduce crime. This finding differs from that of the most recent meta-analysis of this topic (Carriaga and Worrall 2015).

Second, the reported effect of police force size on crime has been constant and not significant for more than 40 years. This is contrary to what we showed in the descriptive analysis (Fig. 2). There is no good reason to suspect that marginally more police reduce crime at a meaningful level, or that more recent research is more supportive of the crime reduction hypothesis.

Third, effect sizes do not appear to be related to the research methods or statistical techniques used in the 62 studies. This conclusion is consistent with Carriaga and Worrall’s (2015) analysis of 24 studies.

Fourth, the limited temporal variation in police force size from 1990 through 2010 helps explain the tiny and statistically nonsignificant effect sizes we found. It also might explain why researchers have produced so many contradictory findings over 43 years.

Reconsider Fig. 1. The most optimistic summary of the evidence to date is that police services in the U.S. are located along the right-hand end of the curve: any crime control utility from adding more officers will be difficult to detect, particularly given that police agencies are unlikely to make substantial increases in their force sizes. Further, greater progress might be made in police crime-control effectiveness by changing the policing strategy from curve A to curve B. The substantially larger (and statistically significant) effect sizes from meta-analyses of police strategies indicate that policy makers who want police to have an impact on crime would be better served by investing resources in new evidence-based strategies than by funding surges in police hiring.

For researchers interested in policing, these findings should be sobering. After more than 40 years of study, during which researchers have used increasingly sophisticated statistical models, we can state that police force size is unlikely to make much difference in crime, on average. However, the increased sophistication of research methods over these four decades has contributed little to our understanding of this topic. Unlike Lim and colleagues (2010) and Kovandzic and coauthors (2016), we are not optimistic that further research in this area using statistical modeling is likely to be productive. It is possible that police force size influences some specific crimes more than others. Several studies suggest this (e.g., Carriaga and Worrall 2015; GAO 2005; Lim et al. 2010). Unfortunately, we cannot reliably address this question with 62 studies and 229 findings: researchers have not consistently studied the same crime types, so we have too few studies to assess this thesis.

Further, the lack of variation in the principal independent variable suggests that any result will be highly sensitive to data and model specification. This is true of crime-specific results as well. This explains why findings have appeared to fluctuate randomly over time even when researchers used similar methods and datasets to decipher the relationship between police force size and crime.

We should put the weak, and possibly nonexistent, connection between police force size and crime in context. This finding is part of a larger literature that indicates that the types of methods used to answer this type of question may not be up to the task. After reviewing the research on the impact of the death penalty on crime, Daniel Nagin (2013, p. 92), for example, concludes that “This unpredictability calls into question the usefulness of prior data on the death penalty when calculating present and future risk.” Looking outside of the criminal justice policy literature, we find a similar conclusion in education policy research. In the conclusions of an article on the effects of class size on the future earnings of students, Dustmann et al. (2003: F118-19) state, “The main result of our analysis is that class size effects on wages are present, but very small. They are unlikely to be detected in simple reduced form regressions when using data sets of moderate size.” Hanushek (2002) comes to the same conclusion.

Within this context, our results seem unremarkable. However, because of this context, the major implication of our systematic review is clear: researchers should reconsider efforts to link simple police outcomes to simple inputs using nonexperimental statistical modeling. In contrast to the quasi- and randomized controlled experimental literature, nonexperimental studies are easy to implement, but their collective findings may be too weak and unreliable to inform policy. Here we break with the traditional way of concluding a research paper. We anticipate little benefit to pursuing this line of enquiry, and suggest that it is time to end it.