Introduction

The history of randomized controlled trials (RCTs) in medicine is well known (Bothwell et al., 2016; Bothwell & Podolsky, 2016; Chalmers et al., 2012). The same can be said about the history of RCTs in assessing the effects of social interventions (Forsetlund et al., 2007), as well as more narrowly in the field of criminology (Farrington, 1983; Farrington & Welsh, 2005). Less well known is the history of another rigorous evaluation design that predated the widespread use of RCTs in criminology and medicine: pair-matching in combination with random allocation, otherwise known as the matched-pair RCT. Here, units (people or places) are matched into pairs on a wide array of covariates, and the members of each pair are randomly allocated to the treatment and control conditions.

Recent research identifies that this design was used as early as 1926 in medicine, and was first used in 1935 in the social sciences, specifically, criminology (Podolsky et al., 2021; Welsh et al., 2021). Within medicine, in 1926, Amberson et al. (1931) initiated a trial to investigate the efficacy of sanocrysin as a therapeutic for pulmonary tuberculosis. Twenty-four patients “free from serious complications” participated in the study. The study reports:

On the basis of clinical X-ray and laboratory findings the 24 patients were divided into two approximately comparable groups of 12 each. The cases were individually matched, one with another, in making this division. Obviously, the matching could not be precise, but it was as close as possible, each patient having previously been studied by two of us… by a flip of the coin, one group became identified as group I (sanocrysin-treated) and the other as group II (control). (Amberson et al., 1931, pp. 403-404)

In the history of RCTs in medicine, this study is considered an “outlier,” owing to alternate-allocation designs being the dominant model in the first half of the twentieth century (Bothwell & Podolsky, 2016, p. 502). Importantly, Gabriel (2014) has demonstrated the origins of the trial at the intersection of mutual public health service and pharmaceutical industry interest in an objective assessment of the drug, with the trial entailing blinding of patients to prevent a “psychic influence” on healing.

The 1935 criminology study, known as the Cambridge-Somerville Youth Study (CSYS), was initiated to evaluate the impact on youth delinquency of a social intervention of “directed friendship” (deQ Cabot, 1940). Founded and directed by Richard Clarke Cabot, a physician and professor of clinical medicine and social ethics at Harvard University, the CSYS set out to discover whether an individually focused and “morally inspired” intervention in the lives of young, disadvantaged boys could prevent them from becoming delinquent (O’Brien, 1985, p. 550). Recruitment and screening of 1953 boys, ages 5–13 years and from the cities of Cambridge and Somerville (Mass.), produced a final sample of 650. All the boys were then matched into pairs—according to 142 variables (rated on an 11-point scale)—and one member of each pair was randomly allocated, based on a coin toss, to the treatment group. There have been four assessments of delinquent and criminal behavior and other outcomes covering major periods of the life-course: transition from adolescence to early adulthood, early adulthood, middle-age, and old age (up to age 90), with the latter representing a 72-year follow-up (Welsh et al., 2019).

Cabot himself was both physician and social interventionist, and exemplified the contemporary attempt to rigorously evaluate interventions in both domains. Using Amberson et al.’s tuberculosis trial and Cabot’s CSYS as the starting points, this paper examines the subsequent history of matched-pair RCTs, and related attention to stratification prior to randomization, in both medicine and criminology over almost a full century to illustrate shared interest in the advantages and disadvantages of a research design intended to ensure the comparison of like with like. Also important is consideration of implications for experimental criminology.

Background

In the 1926 sanocrysin trial, Amberson and colleagues appear to have used pair-matching followed by random allocation as a way to mitigate concerns about recruited patients presenting differing levels of symptoms of pulmonary tuberculosis. Far from just relying on the National Tuberculosis Association classification of the extent of the disease, as reported by the authors, it was “necessary to give weight to the character as well as to the extent of the disease, and also to include other clinical factors in the final judgment of the cases” (Amberson et al., 1931, p. 404). In the absence of documentation about the formal plan for the evaluation design (see Gabriel, 2014), we might infer that an equally pressing concern facing the researchers was the small number of recruited patients (N = 24). In short, simple random allocation could not be relied upon to produce balance in the pre-test measures between the treatment and control groups.

In the CSYS, Cabot in turn used pair-matching followed by random allocation because he regarded matching on its own to be insufficient. As reported by Powers and Witmer (1951, p. 78), following the matching process:

The next question was to determine whether any given boy should fall into the treatment or the control group. It was evident that an arbitrary decision might give rise to a constant error. The proper method of determining this question was, of course, by chance. Accordingly, a coin was flipped and the cases fell into the treatment or comparison groups in accordance with its fall.

Powers and Witmer (1951, p. 78) added the following about Cabot’s decision-making: “It was believed that, even if the measures used in the matching were not perfectly reliable, chance would tend to preserve, in groups as large as 325 each, an even balance of important factors.”

These landmark studies draw attention to certain key factors that remain central to contemporary use of matched-pair RCTs in the social sciences and medicine. First, although random allocation is designed to help eliminate confounding, covariate imbalance is still possible. That is, the treatment and control groups may still differ by chance. This can be especially problematic in small N studies. Matching across known covariates can thus add “face validity” to an experimental study (Chondros et al., 2021, p. 5766).
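The match-then-randomize procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any study cited here: the unit labels, the two covariates, and the greedy nearest-neighbour matching rule are all hypothetical, and the covariates are assumed to be standardized already.

```python
import random
from math import dist


def match_pairs(units):
    """Greedy nearest-neighbour pairing on covariate vectors.

    `units` maps a unit id to a tuple of standardized covariates.
    Greedy matching is an illustrative choice only (an assumption).
    """
    remaining = dict(units)
    pairs = []
    while len(remaining) >= 2:
        a = next(iter(remaining))          # take the next unmatched unit
        cov_a = remaining.pop(a)
        # its partner is the closest remaining unit in covariate space
        b = min(remaining, key=lambda u: dist(cov_a, remaining[u]))
        remaining.pop(b)
        pairs.append((a, b))
    return pairs


def allocate(pairs, rng):
    """Coin toss within each pair: one member treated, the other control."""
    allocation = {}
    for a, b in pairs:
        treated, control = (a, b) if rng.random() < 0.5 else (b, a)
        allocation[treated] = "treatment"
        allocation[control] = "control"
    return allocation


# Hypothetical units with two standardized covariates each.
units = {
    "A": (0.1, 1.0), "B": (0.2, 1.1),
    "C": (2.0, -0.5), "D": (2.1, -0.4),
    "E": (-1.0, 0.0), "F": (-1.2, 0.1),
}
pairs = match_pairs(units)
allocation = allocate(pairs, random.Random(2024))
```

Because the coin toss happens inside each pair, every pair contributes exactly one treated and one control unit, whichever way the toss falls.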

Second, pair-matching prior to randomization can improve study power when the matching is effective, meaning that there is a positive within-pair correlation on relevant variables (Wacholder & Weinberg, 1982). By decreasing variation within matched pairs on known covariates, matching can improve the precision of estimated treatment effects compared to other designs (i.e., statistical efficiency). The relative efficiency of the matched-pair design has been demonstrated in several recent simulation studies (Balzer et al., 2015; Chondros et al., 2021), although as noted below, this will depend on the success of the matching itself (see, e.g., Ariel and Farrington (2010) on “unsuccessful blocking”).
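The dependence of precision on within-pair correlation can be made concrete with a standard textbook result, offered here as an illustrative sketch rather than a formula taken from the studies cited. The symbols are assumptions introduced for the example: m pairs, a common outcome variance σ² in each condition, and a within-pair outcome correlation ρ. The comparison design is idealized as 2m independent units split evenly between the two conditions.

```latex
\begin{align*}
\operatorname{Var}\!\left(\hat{\tau}_{\text{matched}}\right) &= \frac{2\sigma^{2}(1-\rho)}{m}, \\
\operatorname{Var}\!\left(\hat{\tau}_{\text{simple}}\right) &= \frac{2\sigma^{2}}{m}, \\
\text{relative efficiency} &= \frac{\operatorname{Var}\!\left(\hat{\tau}_{\text{simple}}\right)}{\operatorname{Var}\!\left(\hat{\tau}_{\text{matched}}\right)} = \frac{1}{1-\rho}.
\end{align*}
```

Under these assumptions, ρ = 0.5 would halve the variance of the estimated effect, whereas ρ = 0 (wholly unsuccessful matching) would yield no reduction in variance at all.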

Third, randomization within matched pairs provides a straightforward way of dealing with differential attrition, which can present a serious threat to the internal validity of follow-up assessments of prospective trials. Since the proper comparison in a randomized trial involves the original treatment and control groups (i.e., “intent-to-treat”), differential attrition threatens the internal validity of the simple randomized design. Matched-pair randomization overcomes this problem since the researcher can drop both members of the pair in the event one member is missing (Farrington & Welsh, 2006). Of course, this essentially doubles the loss of follow-up, which may pose a problem for smaller studies (Ivers et al., 2012).
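The pair-deletion rule, and the way it roughly doubles the loss to follow-up, can be sketched as follows. The unit identifiers and the attrition pattern are hypothetical and serve only to illustrate the bookkeeping.

```python
def drop_incomplete_pairs(pairs, followed_up):
    """Keep only pairs in which both members were followed up."""
    return [(t, c) for t, c in pairs if t in followed_up and c in followed_up]


# Hypothetical trial: four matched pairs of (treated, control) units.
pairs = [("t1", "c1"), ("t2", "c2"), ("t3", "c3"), ("t4", "c4")]
followed_up = {"t1", "c1", "t2", "c3", "t4", "c4"}  # c2 and t3 were lost

retained = drop_incomplete_pairs(pairs, followed_up)

# Units actually lost to attrition versus units removed from the analysis.
units_lost_to_attrition = 2 * len(pairs) - len(followed_up)
units_excluded_from_analysis = 2 * (len(pairs) - len(retained))
```

In this example two units are lost to follow-up, but four are excluded from the analysis because each missing unit takes its matched partner with it.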

Perhaps the single most important application of matched-pair RCTs—and by far the dominant issue in scholarly and policy debates in both the social sciences and medicine (see Ariel & Farrington, 2010; Weisburd & Gill, 2014; Balzer et al., 2015; Imai et al., 2009a, 2009b; Chondros et al., 2021)—is when the units are clusters of individuals or places rather than individuals alone. Unlike with individual-based studies, where securing an initial N of some minimum threshold (e.g., 50 units in each condition; Farrington, 1983) is often straightforward, cluster- and place-based studies present any number of challenges to obtaining an initial N of such magnitude. Recruiting 100 or 150 schools, communities, or high-crime properties is far more difficult than obtaining a similar number of families, patients, or offenders. Small sample size is thus a key motivating factor for pair-matching in cluster- and place-based RCTs, where it appears to be most common (Campbell et al., 2007).

In the last two decades, a robust debate in medicine and public health has taken place over the potential benefits of using pair-matching in cluster-RCTs. [Footnote 1] Some have gone so far as to suggest that “randomization by cluster without prior construction of matched pairs, when pairing is feasible, is an exercise in self-destruction” (Imai et al., 2009a, p. 48). Others have been somewhat restrained: “a randomized trial with adaptive pair-matching will often be more efficient for estimation of the CATE [conditional average treatment effect] than its completely randomized counterpart” (Balzer et al., 2015, p. 1009). Still, others have been more reserved in their enthusiasm for the design, arguing that “the actual benefits of matching in practice will not be realized unless several conditions are satisfied, conditions that may be difficult to achieve in practice” (Donner & Klar, 2004, p. 418). For example, the “degrees of freedom used to calculate the confidence interval and P-value for the intervention effect is based on the number of pairs of clusters rather than the total number of clusters,” such that pair-matching results in a substantial loss of degrees of freedom compared to simple or stratified designs (Chondros et al., 2021, p. 5766). This may pose serious problems for trials with small numbers of clusters (Donner & Klar, 2004; Ivers et al., 2012).
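The degrees-of-freedom cost can be illustrated with the usual t-based analyses. The sketch assumes, for illustration only, a paired t-test on within-pair differences for the matched design and a two-sample t-test for the unmatched design, with m denoting the number of pairs (so 2m clusters in total).

```latex
\begin{align*}
\mathrm{df}_{\text{matched}} &= m - 1, \\
\mathrm{df}_{\text{unmatched}} &= 2m - 2 = 2\,\mathrm{df}_{\text{matched}}.
\end{align*}
```

With 10 clusters in 5 pairs, for example, the matched analysis would have 4 degrees of freedom against 8 for the unmatched analysis, and critical values of the t distribution are noticeably larger at such small values.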

Most recently, Chondros and colleagues (2021) performed a simulation study comparing the efficiency of the matched-pair design with stratified and simple random designs for cluster randomized trials. The authors found that the matched-pair design was more efficient when the correlation between cluster-level outcomes within pairs was moderate to strong (r ≥ 0.3), but not more efficient with weaker correlations.
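The role of within-pair correlation can be explored with a small Monte Carlo sketch. It is not a replication of the simulation by Chondros and colleagues: the data-generating model (a shared pair-level component plus independent noise), the number of pairs, and the number of replications are assumptions chosen for illustration, and the comparison covers only the variance of the effect estimate, not the degrees-of-freedom penalty.

```python
import random
from statistics import mean, pvariance


def simulate_variance_ratio(rho, n_pairs=20, reps=4000, seed=1):
    """Variance of the simple-randomization estimator divided by the
    variance of the matched-pair estimator, under a null treatment effect.

    Each cluster outcome is a shared pair component (variance rho) plus
    independent noise (variance 1 - rho), so the within-pair correlation
    is rho. All settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    matched, simple = [], []
    for _ in range(reps):
        pairs = []
        for _ in range(n_pairs):
            shared = rng.gauss(0.0, rho ** 0.5)
            pairs.append((shared + rng.gauss(0.0, (1 - rho) ** 0.5),
                          shared + rng.gauss(0.0, (1 - rho) ** 0.5)))
        # Matched-pair design: coin toss within each pair.
        diffs = [(a - b) if rng.random() < 0.5 else (b - a) for a, b in pairs]
        matched.append(mean(diffs))
        # Simple randomization: ignore the pairs, split the pool in half.
        pool = [y for pair in pairs for y in pair]
        rng.shuffle(pool)
        simple.append(mean(pool[:n_pairs]) - mean(pool[n_pairs:]))
    return pvariance(simple) / pvariance(matched)


ratio_strong = simulate_variance_ratio(rho=0.5)  # effective matching
ratio_none = simulate_variance_ratio(rho=0.0)    # ineffective matching
```

A ratio above 1 indicates that the matched-pair estimator is the more precise of the two. In this toy model the ratio should be close to 2 when ρ = 0.5 and close to 1 when ρ = 0, which is why any gain from matching has to be weighed against the smaller number of degrees of freedom.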

Such deliberations have taken place alongside the evolving—if intermittent—application of a priori trial stratification and more extensive matched-pair randomization in medicine, public health, and the social sciences, as we will next illustrate.

Medicine and public health

In post-1926 prospective clinical trials in medicine, a priori matching would remain an important methodological consideration. There were those who employed matching alone, whether for ethical (Gehan & Freireich, 1974; King et al., 2006) or logistical (Inouye et al., 1999) concerns about randomization, with increasingly sophisticated measures taken to ensure the equivalence of such matching (Lin et al., 2018). However, matching alone among prospective trials appears to have been rarely practiced in the RCT era. Rather, most discussions have focused on the relative utility of matched randomization (or before then, alternate allocation) versus randomization (or alternate allocation) alone, with discussion dating to Austin Bradford Hill’s own elaboration of the “Principles of Medical Statistics” in 1937, the same year that Cabot was enrolling his first participants in the CSYS. [Footnote 2]

In Hill’s framing, it was critical in clinical trials “to ensure beforehand that, as far as is possible, the control and treated groups are the same in all relevant respects” (Hill, 1937a, p. 42). In alternate allocation studies, continued Hill, “in the long run we can fairly rely upon this random allotment of the patients to equalise in the two groups the distribution of other characteristics that may be important,” and that especially “with large numbers we can be reasonably sure that the numbers of each type [of differing representation with respect to particular characteristics] will be equally, or nearly equally, represented in both groups” (p. 42). However, recognizing the potential for unequal sorting in smaller studies, Hill provided a key caveat:

If it be known that certain characteristics will have an influence upon the results of treatment and on account of relatively small numbers the distribution of these characteristics may not be equalised in the final groups, it is advisable to extend this method of allocation. For instance, alternate persons will not be treated but a division will be made by sex, so that the first male is treated and the second male untreated, the first female is treated and the second female untreated. (p. 42)

Hill later alluded to the “practical difficulties” that could enter into the design of clinical trials (Hill, 1937b). And most matched-pair randomized studies entailed only a handful of variables, with Wladyslaw Billewicz noting in 1964 that of 20 “recently published medical investigations,” the number ranged from one to six, with most studies employing two or three. Debate over ensuing decades would thus focus on the relative merits and demerits of including matching prior to randomization. On the pro side of including a priori matching, a “state of ‘other things being equal’ is built into the design,” protecting “the investigator against ‘freaked’ samples” (Billewicz, 1964), and, as eventually noted, improving statistical power (see, e.g., McClatchey et al., 1992).

Perhaps most prominently, in 1966, the Director of the American Medical Association’s Department of Biostatistics, Stanley Schor, emphasized for JAMA’s audience the benefits of stratification prior to randomization: “To many clinical investigators the word ‘randomization’ has a magic connotation. As long as they randomize, they think it does not matter how important some pertinent characteristic is in terms of its effect on the results of a study. This may be true with enormous samples. But in the ordinary course of clinical research an investigator should not trust the randomization procedure to produce unbiased results” (Schor, 1966, p. 124). Instead, attention should be devoted early to equalize those seemingly knowable factors that could shape the trial outcomes: “If a characteristic is known to have an important effect on the experiment, an investigator should not depend upon chance in the selection process to cancel it out. The effects of important factors should be designed out of the study, controlled in some way, or allowed to remain in such a manner as to have their net effects measurable. Randomization should be relied upon only for the numerous factors of lesser importance” (Schor, 1966, p. 124). Or, as Schor concluded, the investigator “should not simply randomize and hope” (p. 124). However, statisticians were likewise willing to draw attention to the con side of the ledger, whether concerning the potentially increased cost and logistical difficulties entailed in such matching, or the potential statistical messiness it introduced (Billewicz, 1964; Bland & Altman, 1994; McKinlay, 1977).

The usage of matching within the New England Journal of Medicine in the twentieth and twenty-first centuries may be an instructive and representative sampling device concerning the consequent application of matching and matched-pair randomization. [Footnote 3] The vast majority of “matched” investigations in the journal were retrospective case–control studies, with several hundred represented. Nonetheless, a small fraction (between 1 and 2% of the “hits” represented) were matched prospective studies. Some of these were matched, prospective observational studies: in a 1960 study of physical activity and obesity, “obese” subjects were matched by age, occupation, and socioeconomic background to “nonobese” subjects (Chirico & Stunkard, 1960), while in a 1978 study of growth and development in children with sickle-cell trait, the children were matched as closely as possible to controls according to sex, birth date, birth weight, gestational age, five-minute Apgar score, and socioeconomic status (Kramer et al., 1978). By 2015, still more elaborate methods could be used to match patients within a prospective “registry” study of patients receiving cardiac bypass surgery versus percutaneous intervention with second-generation drug-eluting stents among patients with multi-vessel coronary artery disease (Bangalore et al., 2015).

Other researchers conducted matched, prospective RCTs. The first of these, a 1961 study of vitamin C and antihistamines on gingival hyperplasia among patients receiving the anti-seizure medication phenytoin, was analogous to the study by Amberson et al. (1931), a matched-cluster randomization study (Rose et al., 1961). Later studies on the impact of glycemic control on kidney function among diabetic patients (Feldt-Rasmussen et al., 1986), and the first study of what would eventually be called copaxone for multiple sclerosis (Bornstein et al., 1987), were matched-pair studies, using three matched characteristics (albeit different ones) apiece.

Two studies, entailing matched-pair cluster randomization, shaded closer to social science investigations, with one concerning an educational program for risk factor modification for heart disease (Walter et al., 1988) and the other a safe childbirth checklist study in India (Semrau et al., 2017). That such NEJM-reported educational interventions noted above shared much in common with social science investigations is perhaps no surprise, given the role of biostatisticians as the shared colleagues of investigators of multiple disciplines, and the increasing ease of access of investigators across disciplines to the papers of one another (see, e.g., McKinlay, 1977). Having shown the persisting, albeit limited, application of matched-pair randomization in medicine and public health, we thus next turn to the discipline of criminology—harkening back to Cabot—and the social sciences more generally.

Criminology and the social sciences

The combination of pair-matching and random allocation in prospective controlled trials in criminology and in the social sciences is most common when the unit of allocation is clusters of individuals or places. Designs that employ some form of stratification, including pair-matching, are especially useful in this context due to the smaller number of units to be allocated to treatment and control conditions. Imai et al. (2009b) reviewed pre-randomization designs in studies with cluster randomization in political science, economics, education, and medicine and public health during the 2000s. Of the 107 cluster randomized experiments that were located, 22% used stratification and 19% used pair-matching. The authors also noted that pair-matching was largely confined to studies in medicine and public health, but was also common in development economics.

Others have similarly observed that, outside of medicine and public health, cluster-RCTs with pair-matching are employed most frequently in development economics (Banerjee & Duflo, 2009). Much of this work has been conducted at MIT’s Poverty Action Lab (e.g., Banerjee et al., 2007). One survey of randomized experiments in development economics found that, while most studies employed stratification prior to cluster randomization, few employed pair-matching (Bruhn & McKenzie, 2009). [Footnote 4] Intriguingly, in an accompanying survey of leading researchers, approximately half indicated that they had used randomization within matched pairs at some point in their work. [Footnote 5] Elsewhere, in a meta-analysis of 77 educational interventions involving random assignment procedures performed in developing countries, McEwan (2015) found that approximately 70% used some form of stratification (including pair-wise matching) prior to randomization. [Footnote 6]

In the first comprehensive review of RCTs in criminology, which included published studies with a minimum N = 100 units (individuals or places) and covering the period 1939 to 1981, only 2 out of 37 trials used pair-matching (Farrington, 1983). One of these trials was the CSYS (McCord, 1978). The other, run by the California Youth Authority in the late 1950s, evaluated effects on recidivism of two different institutional living units (20- and 50-bed) for juvenile offenders (Jesness, 1971). Participants (N = 281) were matched by age and social backgrounds and then randomly allocated to either of the two treatment conditions.

An update of this review, using the same criteria and covering the period 1982–2004, identified an additional 85 RCTs (mostly of individuals) with criminological outcomes (Farrington & Welsh, 2005; see also Farrington & Welsh, 2006). Only one of the trials included pair-matching. This trial evaluated effects on recidivism of a cognitive-behavioral treatment program for male sex offenders in California (Marques et al., 1994). Participants (N = 229) were matched on three variables (age, prior criminal history, and offender type), arranged by pairs, and randomly allocated to either the treatment or control conditions.

Similar to the use of cluster randomization in the social sciences more generally, most examples of pair-matching with random allocation in criminology involve place-based experiments. Here, the unit of interest is not an individual but rather a discrete geographical area, such as a police district, high crime area (“hot spot”), business, or neighborhood (Boruch et al., 2010). In the aforementioned reviews, the included experiments with few exceptions used individuals as the unit of allocation. Since place-based experiments typically involve a small number of areas (more often < 100), pair-matching prior to random allocation provides important benefits over random allocation alone. Ideally, matched pairs of places could be established with one member of each pair randomly allocated to the treatment condition. This is also called a fully blocked design (Weisburd & Gill, 2014), but it is not often employed because it can entail a substantial loss of degrees of freedom (i.e., the number of variables that are free to vary following one or more restrictions placed on the data).

Policing experiments often utilize blocking prior to randomization, and on occasion, this involves pair-matching. To get a sense of the extent of the use of pair-matching in policing experiments, we drew upon the latest analysis of the Global Policing Database (GPD), as well as carried out some preliminary searches of the GPD. Developed by researchers at the University of Queensland and Queensland University of Technology in Australia, the GPD is a “web-based and searchable database designed to capture all published and unpublished experimental and quasi-experimental evaluations of policing interventions conducted since 1950” (Higginson et al., 2014; see also Eggins et al., 2016). Impressively, the GPD is updated on a fairly regular basis and it is not restricted to studies reported in English. In their latest analysis of the GPD (through 2018), Mazerolle et al. (2022) identified a total of 431 RCTs of policing interventions. Based on searches of the RCTs in the database, we identified at least 20 unique studies (or 4.6%) that employed pair-matching or full blocking prior to random allocation. Some of the other RCTs used partial blocking, which involves some type of stratification of the place-based units prior to random allocation to treatment and control conditions (Weisburd & Gill, 2014).

One notable example of the use of the matched-pair RCT design in policing was carried out by Weisburd et al. (2008) to evaluate a risk-focused policing intervention in Redlands, CA. The authors grouped 26 census blocks into 13 pairs, matched according to risk factor scores, calls for police service, population density, and median home value, and then randomly allocated units in each matched pair to receive risk-focused policing or usual patrol.

Outside of policing, there are few examples of pair-matching with random allocation in criminology. Most often, these occur in school settings where the matched-pair design is especially useful: “Since it is difficult to assign a large number of schools randomly, it may be best to place schools in matched pairs and randomly assign one member of each pair to the experimental condition and one member to the control condition” (Farrington & Ttofi, 2009, p. 327). The most notable example is Communities That Care (CTC), a multi-modal, community-based youth development program. Across seven states, 24 small, rural communities (average population = 14,646) were recruited and matched by pairs based on “population size, racial and ethnic diversity, economic indicators, and crime rates” (Hawkins et al., 2008, p. 183). One community in each pair was then randomly assigned by coin toss to receive the preventive intervention (from grades 5 to 9). Analyses indicated baseline similarity of the intervention and control communities. Follow-up assessments have been conducted at 8 years (through grade 12; Hawkins et al., 2014) and 11 years (through age 21; Oesterle et al., 2018).

Another example of pair-matching with random allocation involved a behavioral intervention to prevent sexual assault in Nairobi, Kenya (Baiocchi et al., 2017). Thirty-two schools were pair-matched based on “number of girls in the school, number of boys in the school, academic performance, public versus private school, location, materials used to construct the school, and materials used for the floor” (Baiocchi et al., 2017, p. 822). One school from each pair was then randomly allocated to receive the intervention. Two intervention schools ultimately did not participate in the program, and the researchers dropped these schools and their matched controls. It can be concluded that the use of pair-matching in RCTs in criminology and in other social science disciplines has attracted renewed interest in the last two decades but, as in medicine and public health, remains rather limited.

Discussion and conclusions

This paper started with Cabot and Amberson to show the shared and enduring interest in criminology and medicine in rigorously comparing like with like in evaluating effects of prevention interventions and treatments. Indeed, Cabot, who was both a physician and social interventionist, showed overlap of these concerns. Over the twentieth and twenty-first centuries, both domains have continued to wrestle with methodologies to most efficiently and robustly compare like with like. Both, in this setting, have turned to pair-matching in combination with random allocation, though less often than its advocates would like.

Certainly, the boundaries can be fuzzy between criminology/social sciences and medicine/public health. Some intersection between the domains has been clearer. One important example comes from the medical profession’s response to victims of violent crime. In their seminal (but non-experimental) study “Murder and Medicine,” Harris et al. (2002) found that advances in emergency medical technology and care (e.g., development of 911 call systems and trauma units at hospitals, improved training for medical technicians) in the USA during the 1960s through the 1990s played a central role in increasing the chance of survival for victims of violent criminal assault. The authors estimated that the lethality of violent assaults (i.e., assaults resulting in homicides) decreased over this period of time by 2.5 to 4.5% per year.

Another notable example is the movement toward evidence-based policy and practice in the respective domains. The Cochrane Collaboration (now Cochrane) in medicine was instrumental in the founding of the Campbell Collaboration in the social sciences (which includes a major focus on crime and justice) more than 20 years ago, and the two international organizations work closely together, with many systematic reviews registered jointly (Wilson et al., 2021). Moreover, like efforts to make medicine more evidence-based, the adoption of an evidence-based approach in criminology is confronted by a number of similar obstacles, including institutional resistance and to some degree an unwillingness to learn from failures (Millenson, 2021).

Charting the evolution of this novel and highly rigorous research design in criminology and medicine over almost a full century draws attention to the possibilities for advancing knowledge and improving public policy. It also draws attention to the possibilities for experimental criminology (see Farrington et al., 2020).

Implications for experimental criminology

While most criminological research is non-experimental (Dezember et al., 2021), there has been a growing recognition that random allocation is not only necessary for establishing causal effects in evaluation research (Weisburd, 2010), but that a broad scope of criminological topics can benefit from randomized controlled trials (Ridgeway, 2019). This echoes earlier calls to make social science more experimental (Sherman, 2003), including the prediction that “[c]riminology may soon resemble medicine more than economics” (Sherman, 2005, p. 132). While criminology has not yet achieved this status (see Dezember et al., 2021), this is an intriguing observation given the historical development of pair-matching with random allocation in medicine/public health as well as in criminology.

Today, the main use of pair-matching in combination with random allocation is when the units are clusters or places, the latter often for policing interventions. In the context of a rapid growth of experimental research in criminology, as documented in the Global Policing Database and other sources (see, e.g., Farrington et al., 2020; Mazerolle et al., 2022), there are seemingly many more opportunities for researchers to use this design. Take policing, for example. Of the 431 RCTs of policing interventions in the GPD (Mazerolle et al., 2022), we identified at least 20 unique studies (or 4.6%) that used pair-matching or full blocking prior to random allocation. While this number may be small in both absolute and relative terms, it is noteworthy that most of the studies that have used this design have been conducted in the last two decades.

Understanding why some researchers who are using RCTs to evaluate police interventions are incorporating the pair-matching technique draws attention to a couple of broader themes. One has to do with the need for increased methodological rigor to achieve like-with-like comparisons (i.e., to improve internal validity) and increase confidence in observed effects. This takes on added importance in the context of place-based interventions when the number of units of allocation (N) is small and there is heterogeneity among the units. In this context, Weisburd and Gill (2014) demonstrate that blocking of units prior to random allocation can go a long way to decreasing covariate imbalance, and thus improving equivalence, between treatment and control conditions, without necessarily compromising statistical power or degrees of freedom. In doing so, the authors also rebut the conventional wisdom that there should be a minimum of 50 units in each condition (Farrington, 1983; Farrington & Welsh, 2006), which is not always feasible when the units are places or clusters.

Another key theme has to do with new developments in experimental methodologies and their application to criminological interventions. Most recently, Sherman (2022) reviewed the advantages and disadvantages of the repeat crossover RCT design compared to the simple (or parallel track) RCT design as applied to place-based policing interventions. In the context of the strategy of hot spots policing, Sherman (2022, p. 2) describes the repeat crossover RCT design’s fundamentals:

In this design, each hot spot serves as its own control. Using each day in each hot spot as the unit of analysis (hot spot-days), each hot spot is randomly assigned to different treatments on different days. Crime outcomes on treatment days, on average, in each hot spot are then compared to average outcomes on no-treatment days, within each hot spot.

The main advantage of this design is to allow for “continuous impact assessment” of interventions, based on “local knowledge,” to produce reductions in real time in targeted crimes at the local level (Sherman, 2022, p. 2, emphasis in original). Recent examples of the use of the crossover RCT design include two short-duration police foot patrol interventions in hot spots of serious violence in the English counties of Essex (Basford et al., 2021) and Bedfordshire (Bland et al., 2021).

For the criminologist designing a prospective RCT, whether it involves a simple (or parallel track), wait-list control, or some other type of design (but not crossover design), the key questions become as follows: (1) Can the units (e.g., people or places), based on the data available on the units and the recruitment process of the units, be matched into pairs prior to random allocation? and (2) Will this produce a more rigorous assessment of the impact of the intervention? The point here is that, like the principle that evaluation designs (experimental or quasi-experimental) need to be guided by the research question at hand and not the other way around, pair-matching in combination with random allocation may not always be feasible or needed. For example, an argument could be made today that the Cambridge-Somerville Youth Study, at least based on its large original sample (N = 650), did not require pair-matching in addition to random allocation. But, of course, this overlooks the historical context of the beginnings of experimentation in the social sciences and medicine (Bothwell & Podolsky, 2016; Forsetlund et al., 2007), not to mention concerns that Cabot had about the use of matching on its own. (Recall that for the CSYS, random allocation was a secondary consideration.) To return to the criminologist designing a prospective RCT today, even a large sample size may not be sufficient, especially if there is a moderate to high degree of heterogeneity among the units.

Whether it be through this application or others, the shared history of this particular technique for rigorously comparing like with like reinforces experimental criminology’s bonds with experimentation in medicine and public health.