Introduction

A robust body of theoretical and empirical research indicates that crime, particularly serious violent offending, is spatially concentrated into relatively small geographic units (Braga et al. 2010; Brantingham and Brantingham 1993; Groff et al. 2010; Weisburd 2015). This research has informed the development and implementation of a variety of hot spots policing strategies, in which police resources are focused on increased activities in the areas where crime is the most densely concentrated (Sherman and Weisburd 1995). Evaluations of hot spots policing strategies have produced promising results, suggesting that their deployment can enhance the effectiveness of police in reducing violent crime (Braga 2005; Braga et al. 2014; Skogan and Frydl 2004). The evidence of hot spots policing effectiveness is a welcome development for law enforcement agencies faced with increasing demands for services, limited resources, and the need to make the most effective use of available assets (Braga et al. 2014; Hardy 2010).

To date, police agencies have utilized a variety of tactics within hot spots, including concentrating traditional police activities within hot spots (e.g., directed patrol, firearm seizures) (McGarrell et al. 2001; Sherman and Rogan 1995), or engaging in problem-oriented policing designed to address the underlying conditions of crime (Braga and Bond 2008; Braga et al. 2014). Recently, Data Driven Approaches to Crime and Traffic Safety (DDACTS) has been put forward as a new operational initiative which aligns with the former strategy, utilizing highly visible traffic enforcement (i.e., increased visible directed patrols focused on using traffic stops to target contraband (McClure et al. 2014; Weiss 2013)) within empirically generated hot spots as a means to reduce violent crime and traffic accidents simultaneously (NHTSA 2014; Weiss 2013). In the face of limited resources, the synergy of crime control and traffic safety opens the possibility of a more efficient and effective allocation of resources (McClure et al. 2014). Based on a limited number of evaluations, the DDACTS program has been rated as “promising” by crimesolutions.gov (2016).

The current study describes a rigorous impact evaluation of a place-based directed patrol intervention in Flint, Michigan. Although the program was originally conceptualized to be grounded in the DDACTS model, several deviations in practice resulted in a more traditional directed patrol intervention aimed at reducing violent crime. We utilize recent advances in synthetic control estimation (Robbins et al. 2017; Saunders et al. 2015) to implement a retrospective, quasi-experimental design. The synthetic control method is an alluring tool for producing counterfactuals in the retrospective evaluation of place-based policing programs, where randomization was not built into the planning of the intervention (Braga 2010), or is impractical in that the target areas are generated due to their abnormally high crime rates. We will outline the logic of the DDACTS program that was intended to be implemented in Flint, describe the deviations from the model in practice, and describe the synthetic control approach used to estimate treatment effects.

Background

Development of data-driven approaches to crime and traffic safety (DDACTS)

Traffic enforcement and crime prevention have traditionally been thought of as separate entities. In the late 1930s, the rapid emergence of motorized vehicles and associated traffic problems in many cities, and the belief that general patrol officers lacked the training and skilled supervision to adequately handle traffic enforcement, led to the creation of specialized traffic units in police departments across the United States (Kreml 1954; Weiss 2013). The proliferation of these specialized units reinforced the practice of regarding traffic safety and general crime control as distinct, separate activities undertaken by police (Weiss 2013). However, in contemporary policing, increasing demands for services, growing operational costs, and limited resources in many jurisdictions have resulted in a growing need for law enforcement agencies to prioritize the allocation of police resources. These pressures have led many law enforcement agencies to primarily focus their resources on crime while traffic safety has become a secondary issue (IACP and NHTSA 2001; NHTSA 2014).

DDACTS is an innovative strategy that uses a problem-oriented policing approach to reduce both crime and traffic incidents in areas where the two overlap, allowing law enforcement to address both problems simultaneously despite limited resources. The DDACTS strategy involves geographically and temporally plotting locations of crimes and motor vehicle crashes to identify places and times where these incidents have a high occurrence of overlap. Once identified, law enforcement focuses special attention on these areas using high-visibility traffic enforcement to deter crime, traffic violations, and motor vehicle crashes (NHTSA 2014).

The development of DDACTS guiding principles can be traced through the evolution of several initiatives grounded in the attempt to strengthen law enforcement’s role in traffic safety and promote the crime control effects that can be achieved through traffic enforcement. Notably, in the late 1990s and early 2000s, the National Highway Traffic Safety Administration (NHTSA), along with the International Association of Chiefs of Police (IACP), advocated the secondary benefits of traffic enforcement as a way to fight crime by disrupting criminals who use motor vehicles during the commission of a crime (e.g., robbers, drug traffickers, car thieves). It was argued that traffic safety initiatives were no less important than those for gang violence, narcotics, and violent crimes, and should be given serious consideration by law enforcement executives. In advocating for traffic safety to become a core value of law enforcement agencies, the IACP and NHTSA contended that departments would see a number of benefits, including the reduction of crash injuries and fatalities, and the reduction of criminal activity where traffic enforcement was heaviest (IACP and NHTSA 2001). These ideas were implemented in the Strategic and Tactical Approaches to Traffic Safety (STATS) program, leading to a push for the role of traffic enforcement and for data to be the driving force for strategic decision making (Weiss 2013: 18). Using STATS as a guide, in 2008 DDACTS was developed to “…provide a dynamic, evidence-based problem-solving approach to crashes and crime” (NHTSA 2014: 1).

Components of the DDACTS model

A number of key elements drive the DDACTS model. First, DDACTS uses a place-based policing strategy. This emphasis on proactively reducing opportunities for crime at places contrasts with a reactive strategy focused on responding to crimes after they occur. The focus is shifted from the people involved in crime to the contexts of criminal behavior (Weisburd 2008; Weisburd et al. 2010), an approach that is more efficient than focusing on targeting individuals and provides more stable targets for intervention (Skogan and Frydl 2004; Weisburd 2008). Recent research in Indianapolis has supported the notion that both violent and property crime are significantly associated with traffic crashes (Carter and Piza 2017).

Second, in addition to focusing on the specific places where crime and traffic crashes intersect, the DDACTS model draws on research illustrating the positive crime control effect of directed, high-visibility traffic enforcement (Cohen and Ludwig 2003; Kubrin et al. 2010; McGarrell et al. 2001; Sherman and Rogan 1995; Stuster et al. 1997; Weiss and Freels 1996; Weiss and McGarrell 1999). Directed patrol is one such strategy which involves assigning officers to high-risk areas to engage in proactive investigations and enforcement of suspicious activities. Some of the most promising research on directed patrols involves their impact on gun violence. Evaluations of directed patrol programs in Kansas City, Indianapolis, and Pittsburgh involved police patrols that focused on illegally carried firearms (Cohen and Ludwig 2003; McGarrell et al. 2001; Sherman and Rogan 1995). Cohen and Ludwig (2003), for example, argued that the positive findings from directed patrol studies are a result of increased police presence in the target areas whereby the high visibility of officers and focus on illegal gun carrying deterred high-risk people from carrying or using guns in public, in turn reducing gun violence in the communities. These findings are corroborated by Ratcliffe et al. (2011) in Philadelphia, who observed deterrent effects stemming from increased police visibility in designated hot spots. However, it is important to note that meta-analytic reviews have suggested that simply increasing police presence has produced mixed effects on a variety of outcomes, including gun violence and calls for service (Braga et al. 2014).

Researchers suggest targeted traffic enforcement offers several benefits beyond altering driver risk perceptions and driving behavior (e.g., Stanojević et al. 2013; Yannis et al. 2007). First, the high visibility of law enforcement serves as a general deterrent for crime (Thomas et al. 2008). If there is a notable difference in enforcement activity in the targeted area, the perceived risk of getting caught for a crime increases (Ratcliffe et al. 2011). Second, it disrupts organized crime by making it riskier to use vehicles in the course of engaging in illicit activities (Worden and McLean 2009). Offenders may be less likely to use a vehicle if they think officers will find evidence of illegal activity during a stop (NHTSA 2014; Weiss 2013). Finally, a more visible law enforcement presence gives members of the community an increased sense of safety which may improve police–community relationships and build collective efficacy (Hardy 2010).

The final element driving the DDACTS model is the emphasis on the use of crime mapping and data to identify the places where targeted traffic enforcement may be needed. By using a data-driven process to identify hot spots, law enforcement can easily justify allocation of police resources to those areas and consistently monitor the strategy’s progress to determine if and when it needs to be modified (Hardy 2010; NHTSA 2014). According to Weiss (2013), crime mapping can help law enforcement agencies better understand how crimes and crashes are related and can be a useful tool for demonstrating to the public where crime in a community is occurring and subsequent results from the implementation of DDACTS.

The directed patrol (DDACTS) implementation in Flint

Flint, Michigan, USA, is a post-industrial, Midwestern city of approximately 100,000 residents. Over the last three decades, the city has experienced deleterious effects of deindustrialization and globalization, resulting in considerable loss of employment opportunities, particularly in the automobile industry (Matthews 1997). The population peaked at nearly 200,000 residents in 1960, and, by the late 1970s, over 80,000 people were employed in the automobile industry. By 2010, the population had declined to 102,000 and fewer than 8000 worked in the automobile industry. The result of this significant population loss and high unemployment has been a reduction in tax revenue, strained public resources, and high rates of crime and violence. Efforts to control the growing violent crime problem have been hampered by a significant decline in the city budget and a corresponding reduction in the size of the police force. The Flint Police Department (FPD) experienced close to a 50% reduction in personnel, from 242 sworn officers in 2003 to 122 by 2011. This period also saw an increase in violent crime, nearly doubling from 12.2 violent crimes per 1000 people in 2003 to 23.4 per 1000 in 2011.

The Flint DDACTS initiative was developed as part of the State of Michigan’s Secure Cities initiative, which sought to reduce high rates of violent crime in the cities of Detroit, Flint, Pontiac, and Saginaw, and later expanded to other Michigan cities. The Secure Cities initiative represented an effort of the Governor’s Office, through the Michigan State Police and the Michigan Department of Corrections, to provide state resources to local communities experiencing high levels of violent crime as well as budgetary constraints on local public safety resources. The specific strategies varied across the target cities and were developed in consultation between local and state officials.

The Flint strategy was an adaptation of the DDACTS model, a central component of which was the designation of Michigan State Police (MSP) directed patrols in high violent crime areas as identified by a data-driven selection process. However, Flint DDACTS represented a modified (or partial) implementation of the DDACTS strategy in two ways. First, the concentration of violent crime, and not the overlap of violent crime and traffic accidents, was the major determinant of hot spot identification. From the outset, the program staff were clear that the explicit purpose of the initiative was to address violence. A dedicated Geographic Information Systems (GIS) staff person at MSP performed spatial analyses of violent crime data in Flint to determine whether and where violent crime hot spots existed. Individual violent crime incidents were geocoded and linked to street segments. A measure of spatial autocorrelation (Moran’s I) was used to initially identify clusters of street segments with a relatively high number of violent incidents. Within these clusters of street segments, those with consistent rates of violence (i.e., repeated incidents within a relatively high violence area) were identified as statistical focus areas. This process was referred to internally as repeat locations analysis. Once the focus areas had been identified, dedicated GIS staff coordinated with crime analysts to expand the small clusters of street segments to encompass nearby street segments. This was done to create larger geographic areas which would be patrolled by specific MSP enforcement units.
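To make the clustering step more concrete, the following is a minimal Python sketch of the global Moran's I statistic applied to incident counts on street segments. It is an illustration only. The source does not report the software, the spatial weights, or whether a global or local variant of the statistic was used, so the function name `morans_i`, the binary adjacency weights, and the four-segment toy data are all assumptions.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary adjacency weights.

    values    -- incident counts, one per street segment (hypothetical data)
    neighbors -- dict mapping a segment index to the indices adjacent to it
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    # Sum of cross-products over every ordered neighbor pair (i, j).
    cross = sum(deviations[i] * deviations[j]
                for i, adjacent in neighbors.items()
                for j in adjacent)
    # S0 is the sum of all spatial weights (each link counts once per direction).
    s0 = sum(len(adjacent) for adjacent in neighbors.values())
    variance_sum = sum(d * d for d in deviations)
    return (n / s0) * (cross / variance_sum)


# Four segments along one street; the high counts sit next to each other.
counts = [10, 8, 1, 0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
moran = morans_i(counts, chain)
print(round(moran, 3))
```

Positive values indicate that segments with similar counts tend to be adjacent, which is the pattern a hot spot would produce. Values near zero suggest no spatial pattern, and negative values suggest alternating high and low segments.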

Second, although the term “hot spots” was employed by DDACTS staff, these were relatively larger geographic areas similar to a police precinct, and were consequently much larger than intervention areas utilized in other place-based policing programs (e.g., Saunders et al. 2015). Following the initial identification of hot spots, spatial analysis of violent crime incidents occurred on a continual basis, suggesting the emergence of new statistical focus areas and the displacement of crime from original hot spots. The program response to these trends was the identification of new hot spots and the expansion of existing hot spots. The program staff rationale for the enlarged intervention areas concerned providing officers with geographic areas amenable to directed patrols. At the outset of the Flint DDACTS implementation (January 2012), MSP had identified five violent crime hot spots and dedicated 14 h per day to enforcement activities. By the end of the observation period (December 2013), a total of 7 hot spots were identified and a unit of 33 troopers and 5 full- or part-time sergeants was devoted to program operations. Patrols specific to the program were included in all shifts, to the extent that program activities were occurring in at least one hot spot on a daily basis. To these extents, the Flint DDACTS program is more accurately framed as a directed patrol initiative.

The MSP directed patrol initiative represented a significant increase in police patrols in the hot spot areas. Discussions with the Flint Police Department (FPD) and MSP officials indicated that the selection of the directed patrol strategy was based on the belief that FPD’s limited patrol resources had resulted in minimal proactive patrol, with FPD officers largely reactively responding to calls-for-service. FPD did maintain normal patrol levels in the hot spot areas, and thus the MSP directed patrols represented an increase of police patrols into these hot spot zones as opposed to a replacement of local patrols. As such, the strategy represented an infusion of state police-directed police patrols in the context of very limited local patrol resources.

Initial findings and evaluation challenges

The initial results of the Flint directed patrol initiative were promising. MSP maintained detailed activity data from the troopers involved in the program, and it was clear that it was implemented with significant intensity. For example, there were over 22,000 traffic stops between January 1, 2012 and March 2014. The hot spot areas experienced over 600 traffic stops per month. The traffic stops were to be based on probable cause, with the goal of identifying contraband and fugitives, with the most common actions involved during stops being verbal warnings (95 per 100 stops), misdemeanor and felony arrests (14 per 100 stops), fugitive arrests (17 per 100 stops) and citations (2 per 100 stops). Nearly three-quarters of the traffic stops occurred in the hot spot areas, though this also indicated spillover to surrounding non-intervention areas. This diffusion was likely due to the manner in which hot spots were operationalized—as large, expanding areas within which MSP units were assigned to patrol.

Preliminary, unconditional analyses of overall trends suggested a decline in violent crime within the hot spot areas. Violent crime (homicide, aggravated assaults, robberies, criminal sexual conduct, weapons offenses) declined 19% in the hot spot areas, while during the same period, the remainder of the city experienced a 7% decline in violent crime. The declines for robberies were even more pronounced (26%) during a time that the remainder of the city experienced a 2% decline.

Although these overall declines were quite promising, the picture became more convoluted when considering the appropriateness of the available counterfactuals. The preliminary analysis suggested that three threats to internal and construct validity were plausible explanations for the results (Shadish et al. 2002). First, as the intervention areas were chosen purposefully, selection bias between the intervention and comparison areas appeared likely. Second, as violence was declining throughout the entire city during the intervention, some form of endogenous change may have accounted for the observed declines. Third, the presence of program activities in the comparison areas of Flint opened the possibility of treatment diffusion, which could also contribute to the decline in violence throughout the city. This study extends these earlier analyses to provide a more rigorous test of whether the preliminary, promising descriptive violent crime trends were the result of the intervention.

The current inquiry

Analytic framework

Producing synthetic control weights

We implemented recent advances in synthetic control methodology proposed by Robbins et al. (2017), which is put forward as a means of producing a counterfactual estimate of a treatment effect. Specifically, let \( {Y}_{ijt} \) represent the value of outcome i (of I total outcomes) at time t in block j, where \( {Y}_{ijt}(1) \) represents the observed outcome in the presence of an intervention, and \( {Y}_{ijt}(0) \) the same value in the absence of treatment. Under a counterfactual model, the causal effect of the intervention on the outcome would be calculated as \( {\alpha}_{ijt}={Y}_{ijt}(1)-{Y}_{ijt}(0) \), or the difference between potential outcomes. This further implies

$$ {Y}_{ijt}={Y}_{ijt}(0)+{\alpha}_{ijt}{D}_{jt}, $$
(1)

where \( {D}_{jt} \) is equal to 1 when a given block was receiving an intervention at time t, and takes on a value of zero otherwise. However, since for any block receiving the intervention the value \( {Y}_{ijt}(0) \) is unobservable, both that value and the treatment effect (\( {\alpha}_{ijt} \)) must be approximated.

The synthetic control method attempts to achieve such approximation by estimating a vector of weights, \( {w}_j \), which, when applied to the potential comparison areas (the “donor pool”), produce a counterfactual comparison area which collectively resembles the treatment area prior to the onset of the intervention, in terms of both outcomes over time and baseline covariates. Using the notation provided by Robbins et al. (2017), consider J total blocks, where \( {J}_0 \) is the number of blocks in the donor pool, and the remainder (\( j={J}_0+1,\dots, J \)) represent the blocks in the intervention area, and T total time periods, where \( {T}_0 \) represents the number of pre-intervention time periods. Restricting to pre-intervention time periods \( t\in \{1,\dots, {T}_0\} \), synthetic control methods estimate a set of weights which produce equivalence in outcomes between the aggregate comparison and treatment areas, or

$$ \sum \limits_{j=1}^{J_0}{w}_j{Y}_{ijt}=\sum \limits_{j={J}_0+1}^J{Y}_{ijt} $$
(2)

for \( t\le {T}_0 \). When weights satisfying (2) are applied, the synthetic control estimator for the average post-intervention treatment effect for outcome i can be approximated by

$$ {\widehat{\alpha}}_i=\frac{1}{T-{T}_0}\sum \limits_{t={T}_0+1}^T\left(\sum \limits_{j={J}_0+1}^J{Y}_{ijt}-\sum \limits_{j=1}^{J_0}{w}_j{Y}_{ijt}\right) $$
(3)

where the first component of Eq. 3 serves to average treatment effects over post-intervention time points. Several variations on synthetic control have been proposed in recent years (Abadie et al. 2015; Saunders et al. 2015). These approaches share a similar logic and a similar approach to estimating treatment effects, but differ in how the synthetic control weights are optimized and in the research problems to which they can be applied. In this instance, to evaluate the impact of the Flint Directed Patrol intervention on violence, we implemented the synthetic control approach outlined by Robbins et al. (2017). This approach applies methods from the analysis of complex survey designs to estimate both synthetic control weights and subsequent treatment effects. The logic of this approach considers the intervention area as the population, and the donor pool of comparison areas as a sample that will be weighted to reflect the properties of the population.
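The logic of Eqs. 2 and 3 can be illustrated with a small Python sketch. It is not the authors' implementation (the paper reports using R), and the function name `synthetic_control_effect`, the toy outcome series, and the hand-picked weights are hypothetical. The sketch only shows how, once weights reproduce the treated totals before the intervention, the average post-intervention gap serves as the effect estimate.

```python
def synthetic_control_effect(treated_totals, donor_outcomes, weights, n_pre):
    """Average post-intervention gap between treated and synthetic totals.

    treated_totals -- outcome summed over intervention blocks, one per period
    donor_outcomes -- one list per donor block, one value per period
    weights        -- one synthetic control weight per donor block
    n_pre          -- number of pre-intervention periods (T0)
    """
    n_periods = len(treated_totals)
    gaps = []
    for t in range(n_pre, n_periods):  # post-intervention periods only
        synthetic_total = sum(w * y[t] for w, y in zip(weights, donor_outcomes))
        gaps.append(treated_totals[t] - synthetic_total)
    return sum(gaps) / (n_periods - n_pre)


# Hypothetical data: 2 pre-intervention and 2 post-intervention periods.
treated = [10, 12, 7, 8]
donors = [[4, 6, 5, 6], [3, 3, 3, 2]]
w = [1.0, 2.0]  # chosen by hand so the pre-period totals match exactly

# Pre-period check corresponding to Eq. 2: weighted donor totals per period.
pre_match = [sum(wj * y[t] for wj, y in zip(w, donors)) for t in range(2)]
effect = synthetic_control_effect(treated, donors, w, n_pre=2)
```

In this toy case the weighted donor totals equal the treated totals in both pre-intervention periods, so the post-intervention gap is attributed to the intervention under the method's assumptions.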

Specifically, consider pre-intervention time points \( t\in \{1,\dots, {T}_0\} \) and a data structure using a wide-format matrix X, where each row is a vector of all outcomes at all pre-intervention time points, including an intercept and baseline covariates (R) for block j, or

$$ {\mathbf{X}}_j={\left(1,{Y}_{1j1},\dots, {Y}_{1j{T}_0},{Y}_{2j1},\dots, {Y}_{2j{T}_0},\dots, {Y}_{Ij1},\dots, {Y}_{Ij{T}_0},{\mathbf{R}}_j^{\prime }\ \right)}^{\prime }. $$
(4)

From this data structure, a vector of outcome, intercept, and covariate totals across the intervention blocks (\( {\mathbf{t}}_x \)) is calculated (see Eq. 5 below) and utilized as a vector of population totals. Consistent with the logic of the synthetic control weights in Eqs. 2 and 3, a set of sampling weights (\( {w}_j \)) for the donor pool is then calibrated so that the weighted totals for the comparison blocks are exactly equivalent to the totals for the intervention blocks, such that

$$ {\mathbf{t}}_x=\sum \limits_{j={J}_0+1}^J{\mathbf{X}}_{\boldsymbol{j}}=\sum \limits_{j=1}^{J_0}{w}_j{\mathbf{X}}_j $$
(5)

This weight calibration is accomplished by utilizing the generalized raking procedure implemented by the calibrate function in the survey package in the R statistical computing environment (Deville et al. 1993; Lumley 2014; R Core Team 2016). Constrained by the conditions set in Eq. 5, the generalized raking procedure solves for a set of weights which minimizes

$$ \sum \limits_{j=1}^{J_0}{d}_jG\left({w}_j/{d}_j\right) $$
(6)

where \( {d}_j \) is a set of initial weights, and G(∙) is a truncated linear distance metric in that \( G(x)=\frac{1}{2}{\left(x-1\right)}^2 \) for x ≥ 0. Following Robbins et al. (2017), we set initial weights to scale the comparison blocks to the treatment area, or \( {d}_j=\left(J-{J}_0\right)/{J}_0 \). The resulting weights for the comparison blocks are strictly positive, and all intervention blocks are given a weight of 1 prior to export to long-format data for analysis, where each row is a block j at time t (see Robbins et al. 2017 for additional details).
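The calibration constraint in Eq. 5 can be sketched in Python under a simplifying assumption. The analysis described above uses generalized raking in R with a truncated linear distance. The sketch below instead uses the untruncated linear distance, for which the weights have the closed form \( {w}_j={d}_j\left(1+{\mathbf{X}}_j^{\prime}\boldsymbol{\lambda}\right) \), so unlike the weights described above they are not guaranteed to be positive. The helper names `solve` and `calibrate_linear` and all data values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def calibrate_linear(donor_rows, target_totals, d):
    """Weights w_j = d * (1 + x_j' lam) whose weighted totals equal target_totals.

    Assumption: untruncated linear distance, so weights may be negative.
    """
    k = len(target_totals)
    # Normal equations: (sum_j d x_j x_j') lam = t_x - sum_j d x_j
    gram = [[sum(d * x[a] * x[b] for x in donor_rows) for b in range(k)]
            for a in range(k)]
    gap = [target_totals[a] - sum(d * x[a] for x in donor_rows)
           for a in range(k)]
    lam = solve(gram, gap)
    return [d * (1.0 + sum(l * v for l, v in zip(lam, x))) for x in donor_rows]


# Hypothetical rows X_j = (intercept, outcome at t=1, outcome at t=2).
treated_rows = [[1, 4, 5], [1, 6, 3]]
donor_rows = [[1, 0, 1], [1, 2, 1], [1, 3, 4], [1, 5, 2], [1, 1, 0]]

t_x = [sum(col) for col in zip(*treated_rows)]   # population totals (Eq. 5)
d_j = len(treated_rows) / len(donor_rows)        # initial weight (J - J0) / J0
weights = calibrate_linear(donor_rows, t_x, d_j)
weighted_totals = [sum(w * x[a] for w, x in zip(weights, donor_rows))
                   for a in range(len(t_x))]
```

After calibration, `weighted_totals` reproduces `t_x` up to floating-point error, which is the exact-match property stated in Eq. 5. Because the intercept is part of each row, the weights also sum to the number of treated blocks.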

Treatment effect estimation and inference

As outlined by Robbins et al. (2017), the synthetic control estimator is implemented using weighted least squares (WLS), and fit separately for each outcome of interest. For outcome i, the long-format data are restricted to post-intervention time periods and the following model is fit,

$$ {Y}_{ijt}={\beta}_{it}+{\widehat{\alpha}}_i{D}_{jt}+{\epsilon}_{ijt} $$
(7)

in which \( {\beta}_{it} \) is a fixed effect indicator for each time period, \( {\epsilon}_{ijt} \) is an error term, \( {D}_{jt} \) is a treatment indicator (i.e., equal to 1 if block j is under intervention at time t; else 0) and \( {\widehat{\alpha}}_i \) is the synthetic control estimate of the treatment effect. WLS is utilized over regression models more specifically designed for discrete count outcomes because under a linear model the value \( {\widehat{\alpha}}_i \) will be equivalent to the desired synthetic control estimator identified in Eq. 3, while this may not be the case when utilizing Poisson or negative binomial regression (Robbins et al. 2017).Footnote 1 Further, the primary rationale for utilizing WLS is to produce an estimate of the variance of \( {\widehat{\alpha}}_i \) which accounts for the design effect implicit in the weights (Robbins et al. 2017). We implemented this model using the svyglm function (in the R package survey) (Lumley 2014) in order to estimate the variance of the synthetic control estimator via Taylor series linearization, and calculate a test statistic for the estimator.

Specifically, to test the null hypothesis that the intervention effect for outcome i is equal to zero (\( {H}_{0i}:{\widehat{\alpha}}_i=0 \)), Robbins et al. (2017) suggest a test statistic based on a two-sided z test,

$$ {\widehat{Z}}_i={\left(\frac{{\widehat{\alpha}}_i}{\sqrt{\widehat{\mathrm{Var}}\left({\widehat{\alpha}}_i\right)}}\right)}^2 $$
(8)

Further, in instances where there are multiple outcomes of interest, an omnibus test statistic can be generated to test the compound null hypothesis that all intervention effects are equal to zero. This statistic is calculated as the sum of \( {\widehat{Z}}_i \) for all outcomes I, thus standardizing the treatment effects across each outcome,

$$ {\widehat{Z}}^{\pm }=\sum \limits_{i=1}^I{\left({\widehat{Z}}_i\right)}^2 $$
(9)

Although a naïve p value can be generated for individual \( {\widehat{Z}}_i \) by assuming the statistic was sampled from a normal distribution, the omnibus test statistic does not have an assumed sampling distribution (Robbins et al. 2015), and instead must be computed via a placebo test.Footnote 2

Indeed, in the original synth formulation, Abadie et al. (2015) suggested the use of a permutation-based placebo method, which was adapted by Robbins et al. (2017) for interventions applied to micro-level units. A sampling distribution for the test statistics identified in (8) and (9) is approximated by calculating the same test statistics across random permutations of the study areas. Specifically, the indices for the J total blocks in the study are randomly sorted, and the first \( J-{J}_0 \) blocks (the number of intervention blocks) are considered as a placebo intervention area (i.e., a mixture of units which obviously were not an actual intervention area), and the remainder as the comparison to the placebo area. Synthetic control weights for the comparison areas are estimated in the same manner as described above, and subsequently applied to estimate treatment effects and test statistics. Test statistics are calculated across 1000 (K) random permutations of the original study blocks, creating a distribution of treatment effects observed had the intervention been assigned to the blocks at random (Carsey and Harden 2014).
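The placebo assignment step described above can be sketched in Python as follows. This is an illustrative sketch only. The generator name `placebo_assignments`, the fixed seed, and the block counts are assumptions, and the weight calibration and effect estimation that would follow each split are omitted.

```python
import random


def placebo_assignments(n_blocks, n_treated, n_permutations, seed=0):
    """Yield (placebo_treated, placebo_donors) index lists for each permutation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    indices = list(range(n_blocks))
    for _ in range(n_permutations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first J - J0 shuffled blocks act as the placebo intervention area;
        # the remaining blocks form its comparison pool.
        yield shuffled[:n_treated], shuffled[n_treated:]


# Hypothetical study of 10 blocks, 3 of which were actually treated.
splits = list(placebo_assignments(n_blocks=10, n_treated=3, n_permutations=5))
```

Each split keeps the placebo area the same size as the real intervention area, so the resulting test statistics are comparable to the one computed for the actual hot spot.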

The results of the permutation tests can then be used to estimate p values for the individual and omnibus test statistics for the actual treatment area.Footnote 3 The logic of this test is that if the intervention were responsible for the observed differences between the intervention areas and their synthetic controls, then we should observe that only a very small proportion of placebo regions obtain test statistics as large as or larger than the actual intervention area (Carsey and Harden 2014). That is, we can calculate the permutation p value as

$$ p\ (perm.)=\frac{\left\{\#k:{Z}^k\ge Z\right\}}{K} $$
(10)

in which Z is the individual or omnibus test statistic for the actual intervention area (see Eqs. 8 and 9, above), \( {Z}^k \) is the same test statistic for permutation k, and K is the total number of permutations (Robbins et al. 2017). Calculating p values in this manner has several benefits, including compensating for the distributional misspecification necessitated by applying survey methods to the present case, providing an intuitive means to describe the compatibility of the observed effects with the null hypothesis (Wasserstein and Lazar 2016), and reducing to classical inference as if the intervention had been randomized (Abadie et al. 2015; Rosenbaum 2005).
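Eq. 10 amounts to a simple proportion, which the following Python sketch makes explicit. The function name `permutation_p_value` and the placebo values are hypothetical and are not results from the study.

```python
def permutation_p_value(observed, placebo_statistics):
    """Share of placebo statistics at least as large as the observed one (Eq. 10)."""
    exceed = sum(1 for z_k in placebo_statistics if z_k >= observed)
    return exceed / len(placebo_statistics)


# Hypothetical placebo statistics from K = 8 permutations.
placebo = [0.2, 1.1, 3.9, 0.5, 2.4, 0.1, 4.6, 0.9]
p_perm = permutation_p_value(observed=3.0, placebo_statistics=placebo)
```

Here two of the eight placebo statistics are at least as large as the observed value, giving a permutation p value of 0.25. With K = 1000 permutations, as in the analysis described above, the smallest nonzero p value that can be obtained is 0.001.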

Data and comparisons

Flint comparison donor pool

In order to assess the impact of the directed patrol intervention on violent crime in Flint, the research team paired geocoded incident-level violent crime data provided by MSP with block-level data from the 2010 Census. The violent crime data captured homicides, aggravated assaults, and robberies from January 2010 to end of year 2013 (n = 7983 total offenses). With January 2012 serving as the onset of the intervention period, the crime data covered 24 months pre- and post-intervention, and the Census data covered static pre-intervention indicators. In Flint, there were 3005 blocks, with 1117 incorporated into program hot spots (37.2%) and 1888 outside of the intervention areas. As it was implemented in practice, the Michigan State Police initially identified five hot spots, and later expanded to seven over the course of the program. Relative to other implementations of place-based violence prevention programs, the hot spots utilized in the Flint program were quite large and could be considered as standalone interventions. For instance, the High Point Drug Market Intervention took place in five separate neighborhoods, comprising 145 blocks in total, with an average size of 29 blocks (Corsaro 2013; Saunders et al. 2015). Comparatively, Flint hot spots ranged from 13 to 364 blocks, with an average hot spot size of 160 blocks, and follow-up periods ranged from 6 to 24 months. Because of the variable start points for the hot spots, they are treated as seven separate interventions, with overall effects computed via meta-analysis (see below).

Utilizing an alternate donor pool from Detroit

Analysis of program process documents and raw crime trends in Flint indicates several threats to internal and construct validity. Over the course of the intervention, nearly one-quarter of the total program traffic stops took place outside the designated hot spots.Footnote 4 This contamination threat is not eliminated by the use of the synthetic control method described here, as it is more accurate to say that using the Flint comparison donor pool amounts to comparing the program hot spots to a synthetic control that received a diluted intervention (Shadish et al. 2002).

In order to guard against the impact of treatment diffusion, we drew on an alternate donor pool from the city of Detroit. Although Detroit is substantially larger than Flint in terms of both land area and population, the two cities are structurally similar. Stemming in part from the flight of the automotive industry in the 1980s, both have experienced similarly high rates of population loss (Jacobs 2004), poor public health outcomes (Grady and Enander 2009), and concentrated disadvantage and violence (Matthews 1997). Despite the difference in size between the two cities, a strength of the synthetic control method is that the estimated weights will favor blocks in Detroit which more closely resemble the blocks in the Flint hot spots. In this case, the Detroit comparison will produce synthetic hot spots that were not subject to treatment diffusion.Footnote 5 The Detroit alternative donor pool consists of 13,097 blocks with 54,387 total offenses.

Analysis plan

In the analyses that follow, we apply this synthetic control approach to evaluate the impact of the Flint directed patrol intervention. Because the seven program hot spots are relatively large, and have different starting points and follow-up times, we treat them in the analysis as seven separate interventions. First, we examine the quality of balance between the designated hot spots and their synthetic control regions across outcomes and block-level covariates. We then estimate and visualize the treatment effect utilizing the placebo-based permutation method described above. The effect of the program is tracked across the post-intervention period by manipulating the maximum follow-up duration (Robbins et al. 2017; Saunders et al. 2015). An overall picture of the treatment effect is produced by using a fixed effects meta-analysis to pool the effects from each hot spot.
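Tracking the effect by maximum follow-up duration can be illustrated with a minimal sketch. It assumes the effect at a given follow-up length is the cumulative difference between the observed hot spot counts and the synthetic control counts up to that quarter; the function name `effect_by_follow_up` and all counts are hypothetical and are not taken from the study.

```python
from itertools import accumulate
from typing import List


def effect_by_follow_up(treated: List[float], synthetic: List[float]) -> List[float]:
    """Cumulative treated-minus-synthetic difference for each follow-up length.

    Element k is the summed difference over the first k + 1 post-intervention
    quarters, i.e. the estimated effect if follow-up stopped at that quarter.
    """
    if len(treated) != len(synthetic):
        raise ValueError("series must cover the same quarters")
    return list(accumulate(t - s for t, s in zip(treated, synthetic)))


# Hypothetical quarterly post-intervention counts (illustration only).
treated_counts = [30, 28, 25, 27, 26, 24, 22, 23]
synthetic_counts = [26, 27, 26, 28, 27, 26, 25, 26]

cumulative_effects = effect_by_follow_up(treated_counts, synthetic_counts)
for quarters, effect in enumerate(cumulative_effects, start=1):
    print(f"follow-up of {quarters} quarter(s): cumulative difference {effect:+}")
```

In this made-up series, an early excess of offenses in the hot spot is gradually offset as the follow-up window lengthens, which is the kind of pattern that varying the maximum follow-up duration is meant to reveal.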

Results

Outcome and covariate balance quality

The target areas for the program hot spots were originally selected because of their distinctively high rates of violence within Flint. Unsurprisingly, the available comparison blocks within Flint were substantively different from the intervention areas in terms of pre-intervention levels of outcomes and covariates (see Table 1). In order to reduce the plausibility of selection bias, synthetic control weights were calibrated and applied to the Flint comparison blocks for each individual hot spot (Footnote 6). As the weights create a counterfactual intervention area that represents the treatment area in aggregate, descriptive totals for hot spot 1 before and after the application of synthetic control weights are displayed in Table 2. The “After” column in Table 2 demonstrates the quality of the resulting synthetic control areas: the weighted Flint comparison blocks now exactly match the pre-intervention characteristics of the patrol blocks in aggregate (Footnote 7).

Table 1 Pre-intervention block averages for outcomes and demographics across hot spots and unweighted comparisons
Table 2 Pre-intervention totals for hot spot 1 and comparisons before and after application of synthetic control weights
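The exact-match property shown in the “After” column can be written compactly. The notation below is assumed for illustration and is not the paper's own (its numbered equations are not reproduced here): $T$ indexes the blocks in a hot spot, $D$ the donor blocks, $x_{ik}$ the $k$-th pre-intervention quantity for block $i$ (a covariate, or an outcome in a given pre-intervention period), and $y_{it}$ the outcome in post-intervention period $t$.

```latex
% Exact balance: weighted donor totals equal hot spot totals
\sum_{j \in D} w_j \, x_{jk} \;=\; \sum_{i \in T} x_{ik},
\qquad k = 1, \dots, K,
\qquad w_j \ge 0 \ \text{for all } j \in D .

% Counterfactual outcome and estimated effect in post-intervention period t
\hat{Y}^{0}_{t} \;=\; \sum_{j \in D} w_j \, y_{jt},
\qquad
\hat{\delta}_{t} \;=\; \sum_{i \in T} y_{it} \;-\; \hat{Y}^{0}_{t} .
```

Under this reading, a positive $\hat{\delta}_{t}$ corresponds to more offenses in the hot spot than in its synthetic control, and a negative value to fewer.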

In addition to balancing the baseline pre-intervention characteristics of the comparison areas to match the intervention hot spots, the synthetic control calibration also balances trends in outcomes over the pre-intervention period (see left panels of Fig. 1). As noted by Robbins et al. (2017), when micro-units such as blocks are used, some crimes occur so infrequently that there is an excess of zeros within both intervention and comparison blocks across time periods (e.g., homicides each month). When this occurs, the method may be unable to produce non-negative synthetic control weights. To compensate, we aggregated the monthly aggravated assaults and robberies in each block to quarters, and aggregated the homicides over the entire pre-intervention period. The quarterly trends for total violence, aggravated assaults, and robberies will be equivalent between the intervention hot spots and their synthetic control, but homicides will only be exactly equivalent in terms of pre-intervention totals.
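The aggregation step can be sketched as follows. This is a minimal illustration of the idea (summing monthly counts into quarters, and collapsing a very rare outcome into a single pre-intervention total); the function name `monthly_to_quarterly` and the counts are hypothetical.

```python
from typing import List


def monthly_to_quarterly(monthly_counts: List[int]) -> List[int]:
    """Sum consecutive groups of three monthly counts into quarterly counts."""
    if len(monthly_counts) % 3 != 0:
        raise ValueError("expected a whole number of quarters")
    return [sum(monthly_counts[i:i + 3]) for i in range(0, len(monthly_counts), 3)]


# Hypothetical block-level robbery counts over 24 pre-intervention months.
robberies = [0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1,
             0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
quarterly = monthly_to_quarterly(robberies)

# Aggregation lowers the share of zero-count periods.
share_zero_months = robberies.count(0) / len(robberies)
share_zero_quarters = quarterly.count(0) / len(quarterly)

# Hypothetical homicide counts: too sparse even for quarters, so only the
# total over the whole pre-intervention period is kept.
homicides = [0] * 23 + [1]
homicide_total = sum(homicides)

print(quarterly, round(share_zero_months, 2), round(share_zero_quarters, 2), homicide_total)
```

The total count is unchanged by aggregation; only the time resolution at which trends are matched becomes coarser.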

Fig. 1 Violent crime trends in directed patrol hot spot 1 and synthetic controls (left), and treatment–control differences (right)

To compensate for contamination and endogenous change threats to validity, similar procedures were followed with the Detroit comparison (see Tables 1, 2). As with the Flint comparison, these blocks were substantively different from the program hot spots prior to the intervention. Following the application of synthetic control weights, each of these comparison regions was indistinguishable from the observed pre-intervention characteristics of the hot spots.

Effect of DDACTS on violent crime

Applying the synthetic control weights for the Flint comparison, we first plotted the quarterly trends for total violence, aggravated assaults, and robberies across the entire study period (Fig. 1; Footnote 8). Hot spot 1, which received the most intensive intervention activities, is presented as an example (analogous plots for other hot spots are available from the authors). The panels on the left present the trends in quarterly crime counts for hot spot 1 and its synthetic controls across the study period. The overlap of these trends prior to the onset of the intervention demonstrates the quality of the synthetic control produced by the survey calibration method. The panels on the right represent the difference between the hot spot and the synthetic control, where values over the zero-line represent an increase in violence attributable to the intervention, and values below the line represent decreases in violence. The gray lines represent the analogous treatment–control differences for each of the 1000 placebo regions, representing the permutation-generated sampling distribution of treatment effects.
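The comparison of the observed effect against the placebo distribution can be sketched as a permutation p value. This is a generic illustration, not the authors' exact procedure: it assumes a two-sided test on absolute effect size and uses the common convention of counting the observed effect in both numerator and denominator. The function name and all values are hypothetical.

```python
import random
from typing import Sequence


def permutation_p_value(observed: float, placebo: Sequence[float]) -> float:
    """Two-sided p value: share of effects at least as extreme as the observed one.

    The observed effect is counted as one of the permutations (an assumed
    convention), so the p value can never be exactly zero.
    """
    extreme = sum(1 for effect in placebo if abs(effect) >= abs(observed))
    return (extreme + 1) / (len(placebo) + 1)


# Small deterministic example.
small_p = permutation_p_value(5.0, [-10.0, -3.0, 0.0, 4.0, 6.0])

# Hypothetical stand-in for 1000 placebo treatment-control differences.
rng = random.Random(42)
placebo_effects = [rng.gauss(0.0, 10.0) for _ in range(1000)]
simulated_p = permutation_p_value(12.0, placebo_effects)

print(small_p, simulated_p)
```

An observed difference that sits well inside the cloud of gray placebo lines yields a large p value, which is how a "statistically indistinguishable from the placebo regions" result arises.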

Synthetic control treatment effects across hot spots and post-intervention periods for the Flint Comparison are presented in Table 3. The results suggest a largely null effect across hot spots, with treatment effects for the actual program hot spots statistically indistinguishable from the placebo regions. Indeed, consistent with the trends for hot spot 1 visualized in Fig. 1, the onset of the intervention appeared to be associated with an increase in violence, which then dissipated over time. However, these trends were within the range of effects produced by the placebo hot spots. Hot spot 4 is the exception to this trend, where the intervention was associated with increases in aggravated assaults across the entire post-intervention period. It is also important to note that this analysis indicates that violence displacement from one hot spot to another is an unlikely explanation for the observed null effects (Cornish and Clark 1987). It does not appear that a suppressive treatment effect in a hot spot at one point in time was followed by a criminogenic effect in another hot spot at a later point in time.

Table 3 Treatment effect estimation across hot spots, post-intervention period, and outcomes; synthetic hot spots generated using Flint blocks

The estimated treatment effects and permutation p values for the Detroit comparison are presented in Table 4. As with the Flint-based comparison, many of the outcome trends remained non-significant. However, estimated effects in hot spots 1, 2, and 4 suggest a relatively stronger increase in aggravated assaults, which diminished toward the end of the observation period. The robbery trends indicate a suppressive effect for the program, which grew as the intervention period unfolded. The magnitude of these effects on post-intervention crime trends is displayed in Table 5, where the program hot spots experienced increases in aggravated assaults ranging from 17 to 46%, relative to their Detroit-based synthetic controls. In contrast, hot spots 1 and 5 experienced decreases in robberies ranging from 15 to nearly 30%.

Table 4 Treatment effect estimation across hot spots, post-intervention period, and outcomes; synthetic hot spots generated using Detroit blocks
Table 5 Post-intervention crime totals across hot spots and change relative to synthetic comparisons
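A percentage change relative to a synthetic comparison can be computed as in the minimal sketch below. It assumes the synthetic total is the weighted sum of donor-block counts and that the change is expressed relative to that total; the weights, counts, and function names are hypothetical and do not reproduce any value in Table 5.

```python
from typing import Sequence


def synthetic_total(weights: Sequence[float], donor_counts: Sequence[float]) -> float:
    """Weighted sum of donor-block counts: the counterfactual total."""
    if len(weights) != len(donor_counts):
        raise ValueError("one weight per donor block is required")
    return sum(w * c for w, c in zip(weights, donor_counts))


def percent_change(observed_total: float, counterfactual_total: float) -> float:
    """Observed total as a percentage change relative to the counterfactual."""
    if counterfactual_total <= 0:
        raise ValueError("counterfactual total must be positive")
    return 100.0 * (observed_total - counterfactual_total) / counterfactual_total


# Hypothetical donor blocks and weights (illustration only).
weights = [0.5, 2.0, 0.0, 1.5]
donor_counts = [40, 20, 99, 40]
counterfactual = synthetic_total(weights, donor_counts)   # 120.0

change = percent_change(150, counterfactual)
print(f"{change:+.1f}% relative to the synthetic comparison")
```

A donor block with a weight of zero, such as the third block here, contributes nothing to the counterfactual regardless of how many offenses it recorded.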

To gain a picture of the overall impact of the program across hot spots, we used a fixed-effects meta-analysis to pool the treatment effects for all hot spots across their entire post-intervention periods. A fixed-effects model is appropriate in this instance because we are attempting to estimate an overall “true” effect for the hot spots comprising the overall intervention, rather than assuming that these effects were sampled from a set of similar evaluations that have yet to be conducted (Viechtbauer 2010). Further, the use of fixed-effects models can be supported by the effect heterogeneity indicator, τ², which quantifies variance in treatment effects across the individual hot spots. These analyses were implemented using the metainc function in the meta package in R (Schwarzer 2007).
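The pooling logic can be illustrated with a generic inverse-variance fixed-effect estimator on log rate ratios, together with Cochran's Q and the DerSimonian-Laird estimate of τ². This is a sketch under stated assumptions, not a reproduction of what metainc computed: it assumes equal exposure for each hot spot and its synthetic control, approximates the variance of each log ratio as 1/a + 1/b for counts a and b (a simplification for weighted synthetic counts), and uses hypothetical counts.

```python
import math
from typing import List, Tuple


def pool_fixed_effect(log_ratios: List[float],
                      variances: List[float]) -> Tuple[float, float, float, float]:
    """Return (pooled log ratio, standard error, Cochran's Q, DL tau^2)."""
    weights = [1.0 / v for v in variances]            # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ratios)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ratios))
    c = total_w - sum(w * w for w in weights) / total_w
    # DerSimonian-Laird estimate, truncated at zero; zero if only one effect.
    tau2 = max(0.0, (q - (len(log_ratios) - 1)) / c) if c > 0 else 0.0
    return pooled, se, q, tau2


# Hypothetical (hot spot count, synthetic control count) pairs.
counts = [(120, 100), (60, 50), (33, 30)]
log_ratios = [math.log(a / b) for a, b in counts]
variances = [1.0 / a + 1.0 / b for a, b in counts]

pooled, se, q, tau2 = pool_fixed_effect(log_ratios, variances)
pooled_pct = 100.0 * (math.exp(pooled) - 1.0)
print(f"pooled change: {pooled_pct:+.1f}%  Q = {q:.3f}  tau^2 = {tau2:.4f}")
```

When τ² is estimated at or near zero, as in this made-up example, the fixed-effect and random-effects pooled estimates coincide, which is one way the heterogeneity indicator can support the fixed-effects choice.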

The treatment effects across the individual hot spots are combined in Table 5. The fixed-effect meta-analysis suggested little heterogeneity in effects for overall violence, homicides, and robberies. However, the meta-analysis indicates that when using the Flint-based comparison, the intervention was associated with significant increases in overall violence (+18%) and aggravated assaults (+33%). When using the Detroit comparison blocks, the intervention was associated with a similar increase in aggravated assaults (+26%), and a decline in robberies (−24%).Footnote 9

Discussion

The impact of the Flint directed patrol intervention on violent crime

Although recent initiatives have offered highly focused traffic enforcement as a means to enhance traffic safety and reduce criminal activity in designated geographic areas, there has been little research evaluating the impact of such initiatives on reducing violent crime. The current effort sought to estimate the effect of a traffic enforcement based directed patrol intervention—originally grounded in the data-driven approach emphasized in DDACTS—on violent crime via the use of a rigorous synthetic control analysis. These analyses suggested that the intervention produced inconsistent results—when utilizing blocks from Flint to form a comparison, largely null program effects were observed. However, when blocks from Detroit were utilized to create comparisons, we observed that the directed patrol program was associated with a null effect on homicides, an initial increase in aggravated assaults which dissipated with time, and a decrease in robberies which grew with time.

Interpreting these results is both perplexing and intriguing: at this point there are few similar evaluations for comparison, and the contrast between a gradual decline in robberies and an initial increase in aggravated assaults is difficult to explain. An evaluation of DDACTS in Shawnee, Kansas observed a decrease in robberies in the intervention area, though the effect was not statistically significant, perhaps due to the small number of robberies in the jurisdiction (Bryant et al. 2014). The Shawnee study did not consider aggravated assaults. A meta-analysis of hot spots policing interventions by Braga et al. (2014) found that increased traditional policing tactics were associated with an overall reduction in violence, but gives little context on whether effects should be expected to vary across offenses. For the most part, prior directed patrol studies are of limited value in dissecting this effect, as most are focused solely on firearm violence or have not distinguished impact by type of crime. One exception to the latter is the research of Rosenfeld et al. (2014) in St. Louis, but their findings contrast with the present results. They found that directed police patrol decreased gun assaults but not robberies with a gun. Theoretically, most researchers have attributed the effects of directed patrol on violent gun crime as being consistent with specific deterrence and situational crime prevention, both of which emphasize the increased risk for illegal gun carrying (Sherman and Rogan 1995). Why directed patrol would have a differential impact on robbery in contrast to aggravated assault, much less why the results would be different in Flint and St. Louis, is difficult to explain, but may be attributable to the more focused intervention areas in St. Louis relative to the Flint directed patrol program.

The finding of a possible reduction in robbery does have research precedent in an early study of the association between traffic stops and violent crime. Sampson and Cohen (1988) examined robbery rates across 171 U.S. cities in 1980 to study the impact of proactive policing on robbery. Although not a study of directed police patrol, Sampson and Cohen used a measure of arrests per police officer for driving under the influence and disorderly conduct as their indicator of proactive policing. There was considerable variation across the 171 police departments, with an average arrest rate per officer ranging from 0.47 to 20.38. The results indicated more proactive policing resulted in reduced robbery rates. They interpreted this as being consistent with both a direct effect on offender perceptions through control of disorder and incivilities as well as an indirect effect through a change in the risk of arrest (Sampson and Cohen 1988). Similarly, an early directed patrol study in Pontiac, Michigan, observed that “targeted crimes,” including robberies, declined significantly (Cordner 1981). These previous findings and interpretations align with the gradual effect of the Flint program on robberies observed in this study. Further, the Indianapolis directed police patrol study, which examined two different models of gun-focused directed patrol, observed a decline in gun robberies as a result of directed patrol. However, the declines were observed for robberies and gun assaults, as well as gun homicides (McGarrell et al. 2001).

More commonly, the directed patrol studies have tended not to distinguish program effects on robbery from assaults. Cohen and Ludwig’s (2003) study in Pittsburgh found that assault-related gunshot injuries significantly declined in the directed patrol treatment areas. These results are consistent with the Rosenfeld et al. (2014) study, but they are silent on the impact on robbery. The Kansas City Gun Experiment examined the category gun crimes as well as specific effects on homicides, shots fired, and drive-by shootings (Sherman and Rogan 1995). They found that the overall gun crime category as well as the sub-categories all experienced declines, but they did not distinguish gun assaults and armed robberies from these other categories.

In summary, prior studies find evidence of a differential impact on robberies and gun assaults, whereas some studies find declines in one crime type but no impact on the other. We are not, however, aware of studies finding a decline in one crime category but an increase in the other. Further, the observed “increase” in aggravated assaults is particularly perplexing given prior research and theoretical considerations. Theoretically, both deterrence theory and situational crime prevention would seem to predict a consistent impact on aggravated assaults and robberies as a result of saturated police presence within the Flint intervention areas. Alternatively, the gun assault results could be consistent with defiance theory (Sherman 1993), in which increased police patrols could lead to negative interactions between police and citizens and trigger defiance. For instance, Haberman et al. (2016) observed that higher perceptions of procedural injustice within violent crime hot spots in Philadelphia were associated with lower public satisfaction with police. The intervention strategy in Flint, providing intense traffic enforcement which rarely resulted in a citation, would reasonably serve to enhance such perceptions of procedural injustice. However, why such interactions would result in increased assaults but reduced robberies is unclear and unanticipated given prior research (e.g., Cohen and Ludwig 2003; Rosenfeld et al. 2014). Future research may benefit from combining official record analysis with surveys of local residents to better understand perceptions of the intervention.

One possible explanation is that the observed initial increase in aggravated assaults is due to some manner of endogenous change in aggravated assaults that the directed patrol program would have been unlikely to impact. Theoretically, if directed police patrols are expected to suppress aggravated assaults, we would anticipate a larger impact on such assaults in public spaces and perhaps no or reduced impact on assaults occurring in private spaces, such as domestic violence incidents. To explore whether such distinctions in type of assaults could help explain the results, we utilized National Incident-Based Reporting System data to conduct supplementary analyses of city-wide trends in aggravated assaults occurring in residential and non-residential locations. The data indicated that Flint experienced a sharp increase in residential aggravated assaults in early 2012 that was not observed in Detroit. These residential assaults later declined. The results suggest that this increase in residential assaults, less likely to be affected by the program, may have produced the unexpected finding of an increase in aggravated assaults in Flint, particularly when using the Detroit synthetic comparison. This particular interpretation would be more consistent with the program having a null, instead of positive, impact on aggravated assaults.

Limitations

There are several noteworthy limitations, both in terms of the program being evaluated and the design that was utilized by the research team. First, the nature of program implementation should be considered in interpreting and generalizing from the results. The contrast of these findings and those of other directed patrol evaluations (e.g., Cohen and Ludwig 2003; Rosenfeld et al. 2014) could reflect the broad definition of hot spot as it was applied in the Flint program. The target areas were considerably larger than those utilized in other place-based interventions, and larger than the conceptualization of hot spots in much of the existing literature (Braga and Weisburd 2010; Saunders et al. 2015). Even though MSP conducted significant program activities within the designated target areas, the absolute size of these sites may have served to dilute the dosage of the intervention. For instance, Ratcliffe et al. (2011) partially attribute the success of the Philadelphia Foot Patrol Experiment to the concentrated spatial dosage of the Philadelphia program, relative to earlier experiments. Indeed, Ratcliffe et al. (2011) juxtapose their findings to an earlier foot patrol experiment in Flint (Trojanowicz 1986), in which the patrol effectiveness was reduced following the expansion of the intervention areas. It is also possible that certain subsections of the hot spots experienced more beneficial outcomes. However, the consistency of estimated effects across the multiple intervention areas, regardless of the number of blocks, casts doubt on this possibility. Due to the precinct-sized intervention areas, it is more appropriate to generalize the effects of the Flint intervention to other directed patrol programs, as opposed to place-based hot spots policing programs more generally.

Similarly, it is important to note that, although the Flint program was originally an interpretation of the DDACTS model, there were significant divergences from DDACTS in practice (NHTSA 2014). The Flint directed patrol implementation was focused almost entirely on enforcement activities within violent crime hot spots, as opposed to places of overlap between violent crime and traffic accident hot spots. A focus on the latter would likely have produced much more focused intervention sites. This reiterates the point that the Flint intervention areas may have been too large to allow for the level of focused police activity required for a consistent reduction in violence (Skogan and Frydl 2004). To this extent, we urge caution in generalizing these findings to future DDACTS implementations—to do so opens the possibility of conflating a treatment effect with a specific level of treatment (Shadish et al. 2002). At a minimum, these findings suggest that policing research should be more specific in conceptualizing and measuring the impact of policing interventions on specific crime types. This is apparent in the current findings of inconsistent impacts on aggravated assaults and robberies but also in the review of prior studies of similar directed patrol studies. The tendency has been to highlight the pattern of declines in violent crime associated with directed patrol interventions in high gun crime areas. This study raises the possibility of inconsistent effects across different violent crime types.

Concerning the adequacy of the methods used to evaluate the intervention, the synthetic control method is designed to act against selection bias threats to internal validity for retrospective quasi-experimental designs (Shadish et al. 2002). However, the counterfactual can only be based on observed characteristics and, as such, is limited by the data available and by the possibility of unobserved confounders. Consistent with previous research, we have used a set of block-level characteristics expected to be associated with aggregate violence within the cities (Robbins et al. 2017; Saunders et al. 2015). As with any quasi-experiment, we cannot rule out the possibility that unobserved differences between the intervention areas and their synthetic controls are responsible for the findings observed here. Further, the effect of unobserved confounders could be compounded if these indicators also vary between the Flint and Detroit comparisons. However, in contrast to other studies using external controls (e.g., Rudolph et al. 2015), the interaction of unobserved indicators and comparison sites is likely to be minimal given the geographic proximity and shared socioeconomic history between Flint and Detroit.

Additionally, and as with random assignment to treatment, synthetic control cannot rule out other plausible threats to validity. Consistent with other place-based policing interventions, the Flint program encountered issues with boundary adherence and contamination (Sorg et al. 2014; Weisburd 2005), in which nearly one-quarter of all program-related traffic stops took place outside any designated intervention areas. Accounting for this diffusion is important in that even small doses of directed patrol activities can result in significant impacts on crime (Telep et al. 2014), which could potentially account for the null effects observed using the Flint comparison. In order to counter the diffusion of the DDACTS intervention outside of the hot spots, we drew on an alternative donor pool of blocks from Detroit, a larger but demographically and economically similar city that was not subject to the intervention. This analysis suggested a suppressive effect of the Flint program on robberies, but it also enhanced the positive effect on aggravated assaults observed using the Flint comparison. In the absence of a synthetic control design, it would be possible that the differences between the Flint and Detroit comparisons are due to differential maturation, in which a decreasing trend in Flint is compared to an increasing trend in Detroit, which would have persisted even without the intervention. However, the use of synthetic control guards against this possibility by exactly matching the pre-intervention outcome trends between the hot spots and the Detroit-based synthetic control.

Methodological utility of synthetic control

When the goal of an evaluation is to determine whether the program was responsible for observed variation in outcomes, a randomized controlled trial remains the gold standard (Shadish et al. 2002). However, evaluations of place-based policing initiatives are often not amenable to such designs, as the intervention sites may be purposefully generated based on their distinctive levels of violence or other crimes (Braga 2010), or the evaluation may be initiated after the program activities have taken place (Braga et al. 2011). In these cases, evaluators and program stakeholders can benefit from the application of quasi-experimental designs utilizing high-quality counterfactuals. The family of synthetic control approaches (Abadie et al. 2015; Robbins et al. 2017; Saunders et al. 2015) represents a powerful tool for generating counterfactual treatment units when randomization is not possible, or the evaluation is retrospective in nature. In the current inquiry, we implemented the synthetic control approach described by Robbins et al. (2017), in which weights were calibrated via survey methods. When these weights were applied to the available comparison units, the result was a synthetic treatment unit which exactly matched the designated intervention areas in regard to pre-intervention outcomes and covariates, providing an estimate of what would have happened to the outcome in the absence of the intervention. Compared to previous implementations of synthetic control (Abadie et al. 2015), specific features of Robbins et al.’s (2017) implementation were well suited to the current evaluation, particularly the ability to incorporate numerous micro-level units and multiple outcomes of interest.

Relative to the synthetic control approaches outlined by Abadie et al. (2015) and Saunders et al. (2015), this method differs in that the others do not guarantee that weights will be estimated which produce an exact match between the actual treatment unit and the synthetic treatment unit, although the correspondence may still be very strong (see Saunders et al. 2015: table 1 and fig. 1). Although the Robbins et al. (2017) method will produce weights that provide an exact match on observed pre-intervention outcomes and covariates, it is possible that no weights can be obtained that would satisfy this condition (see Eqs. 5 and 6). As such, depending on the number of pre-intervention time periods, outcomes, covariates, and donor pool units, it cannot be assumed a priori that an exact solution using the Robbins et al. (2017) method exists. Those authors note that the method appears more feasible when there is a relatively larger number of possible comparison units. In these instances, it would be possible to use the Saunders et al. (2015) method to minimize treatment–control differences, in lieu of an exact match.