Introduction

Social scientists have become increasingly aware of the relevance of space and place (Goodchild et al. 2000), and criminology is no exception. As a matter of fact, the geography of crime has been a focal concern of criminologists from the very start of the discipline. In nineteenth-century Europe, the “moral statistics” of the pioneers Guerry and Quetelet empirically demonstrated that crime varied across geographical regions. They not only produced maps that visualized these differences, but also studied statistical relations between crime, poverty, and education.

During the early twentieth century, researchers associated with the University of Chicago studied how crime and other social problems varied across urban communities, again mapping the geographical patterns and using community characteristics to explain these distributions. The work on juvenile delinquency has become a classic example (Shaw and McKay 1942).

Interest in the geography of crime decreased somewhat between 1950 and 1980, to some extent possibly because the ecological fallacy (Robinson 1950) hampered the interpretation of aggregated data. Later, the link between communities and crime was revitalized (Bursik and Grasmick 1993; Sampson et al. 1997). In recent years, a new concern has emerged with micro units of place such as addresses or street segments (Eck and Weisburd 1995; St. Jean 2007).

From the very start, geographical criminology has been an area of research where methodological and statistical innovations were either developed or adopted early. In their times, Guerry and Quetelet were pioneers and innovators, and their work is said to have been the launching pad for much of modern social science (Beirne 1987; Friendly 2007). When hierarchical linear (multilevel) models were developed in the 1980s, criminologists and sociologists who studied the links between community and crime quickly embraced and applied them to model community context effects, and even took a lead by developing a new “ecometrics” of crime measurement (Raudenbush and Sampson 1999). When trajectory models were developed to model the criminal development of individuals (Nagin 1999), geographical criminologists soon saw their value for modeling the crime trajectories of geographical entities (Griffiths and Chavez 2004; Weisburd et al. 2004). As another example, spatial econometrics (Anselin 1988) has quickly diffused into the criminology field. In the research on crime location choice, developments in discrete choice modeling have been adopted (Bernasco and Nieuwbeerta 2005). The recent focus on small units of analysis creates new methodological challenges for geographic criminology (Weisburd et al. 2009). In sum, geographic criminology has always been at the cutting edge of major methodological and empirical progress.

The purpose of the present chapter is to provide an up-to-date overview of methods for the statistical modeling of spatial crime data, to review some instructive and innovative applications in the field, and to direct the reader to the relevant literature.

The chapter consists of three sections. The first section introduces and delineates the subject matter. It discusses the relevance of spatial analysis, describes what spatial data are, which spatial units of analysis can be distinguished, and how they are sampled. The section further addresses criminological categories that can be geographically referenced, and delineates spatial modeling from descriptive spatial statistics and from visualization techniques (“crime mapping”) that are treated elsewhere in this volume.

We distinguish two types of spatial outcomes that can be modeled: spatial distribution, and movement. The second section deals with the analysis of spatial distributions. We discuss how spatial structure is specified in spatial statistics, address the basic concept of spatial autocorrelation, and review a variety of spatially informed regression models and their uses in criminology. The third section addresses the analysis of movement. We address the length of the journey-to-crime, and discuss spatial interaction models, spatial choice models, and the analysis of mobility triads, again highlighting applications in the field of crime and criminal justice.

This chapter resembles and builds upon a review that appeared nearly a decade ago (Anselin et al. 2000). Compared to that review, the present chapter dedicates less space to theory, to geographic information systems (GIS) and to descriptive spatial analysis methods, and more to the analysis of spatial choice and movement.

What are Spatial Crime Data?

All methods discussed in this chapter apply to spatial crime data. Crime data are simply data that bear a direct relation to crime. Often the data apply to people in their roles of offenders, accomplices, fences, victims, bystanders, police officers or judges. They can also be crime targets, such as houses (for burglary), empty walls (for graffiti), cars (for theft), or airplanes (for hijacking). Most often, however, the data are the criminal events themselves: the burglaries, rapes, arsons, robberies, assaults, and murders.

What makes crime data spatial crime data is that the units of analysis are geographically referenced. This means that they have attributes (e.g., a pair of geographical coordinates) that can be used to establish where they are situated relative to the other units in the sample. In modeling spatial distributions, a weight matrix (see section “Specification of Spatial Structure: The Spatial Weight Matrix” and Chap. 6 by Tita and Radil) specifies the spatial relations between all pairs of observations.

Thus, as in network data and in hierarchically structured data, the observational units in spatial data are interrelated. In spatial data, this relation is geographic in nature. For example, two units are adjacent or non-adjacent, they are nearby or distant, they are nearest neighbors or not.

Many textbooks distinguish spatial data by the spatial characteristics of the units of analysis, e.g., whether the data refer to points, to cells of a grid or to areas (depending on the context also referred to as zones, lattices, or polygons). For the purpose of the present review, however, it is more useful to make another distinction, namely between stationary (time invariant) spatial distributions on the one hand, and movement between origins and destinations on the other hand. The first type of data may be referred to as spatial distribution data, the second as movement data. Here are some examples of spatial distribution data on crime and criminal justice issues:

  • Geographical coordinates of the home addresses of convicted juvenile offenders in Chicago (Shaw and McKay 1942)

  • Numbers of homicides per county in the USA (Baller et al. 2001)

  • Percentage of residents reporting to be victims of violent assault in their own neighborhood, for each neighborhood cluster in Chicago (Sampson et al. 1997)

  • Geographical coordinates and dates of police reported burglary incidents in Liverpool, England (Bowers and Johnson 2005)

  • Numbers of police recorded crimes per street segment in Seattle over a period of 14 years (Weisburd et al. 2004)

Spatial mobility data involve movement between two or more locations. Here are some examples of movement data on crime and criminal justice issues:

  • The distance between the home and the place of the offence of serial rapists (Warren et al. 1998).

  • Robbery incidents in Chicago, georeferenced according to the census tract of residence and the census tract of the robbery incident (Bernasco and Block 2009).

  • Homicides in Washington, DC, georeferenced according to the geographical coordinates of the offender’s home, the victim’s home and the location of the homicide (Groff and McEwen 2007).

  • Numbers of crime trips (linking offender’s home to crime site) between neighborhoods in The Hague, the Netherlands (Elffers et al. 2008).

What is Spatial Modeling?

Although spatial models require spatial data, spatial data need not necessarily be analyzed with spatial models. As a matter of fact, most spatial crime data have been analyzed without spatial models. For example, with a few exceptions (e.g., Heitgerd and Bursik 1987; Morenoff et al. 2001) spatial models have not been used in the century-old ecological tradition that studies how neighborhood crime rates are influenced by neighborhood conditions, while neighborhoods are clearly spatial entities. Neither have they been used in cross-national comparisons of crime phenomena, although like neighborhoods, countries are spatial entities. The present chapter will obviously focus on the methods of analysis that actually utilize the spatial nature of the data.

We distinguish between two types of spatial analysis methods. The first is often referred to as exploratory spatial data analysis (acronym ESDA) and is concerned with the description and exploration of spatial data. Typically, the results of these analytical methods are visualized with the use of geographic information systems (GIS). Geographical information systems are software tools for digital cartography that help to process, organize, analyze, and visualize geographically referenced information. Applied to crime and justice topics, this is commonly referred to as “crime mapping.” Textbooks that discuss ESDA methods are Bailey and Gatrell (1995) and Haining (2003). Visualization and crime mapping issues are comprehensively dealt with in Chainey and Ratcliffe (2005), and more concisely in the chapter by Ratcliffe (Chap. 2) in this volume. Two studies that followed a similar setup analyzed longitudinal data on census tracts in Chicago (Griffiths and Chavez 2004) and on street segments in Seattle (Weisburd et al. 2004). Both first used group-based trajectory analysis (see Chap. 4 by Nagin in this volume) to classify the spatial units according to their temporal crime patterns. Subsequently, the resulting classification was visualized using maps of Chicago and Seattle respectively.

The present chapter addresses only the second type of spatial data analysis: spatial modeling. This term refers to a set of regression analysis techniques that are adapted to spatial data.

When we model spatial distribution data, we attempt to predict the outcome variable at each location as a function of variables of the focal location and possibly of variables at other locations as well (typically assuming that a variable measured at nearby locations has a larger influence than the same variable measured at more distant locations). In the following section, we extensively discuss spatial regression models and briefly touch upon spatial filtering models, geographically weighted regression models, and multilevel models.

When we model movement data, we use either aggregated or disaggregated movement data. In the case of aggregated data, we attempt to predict the number of movements from an origin to a destination as a function of attributes of the origin, the destination, and some measure of impedance (usually distance, or travel time) between the origin and the destination. These models are referred to as spatial interaction models. In the case of disaggregated data, we attempt to predict which one of a set of potential destinations an actor will choose as a destination, given that he or she starts from a specific origin. The variables used in the prediction include attributes of all potential destinations and their distance to the origin, possibly in interaction with attributes of the origin or the actor. The models are referred to as spatial choice models.
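The two kinds of movement models can be illustrated with a minimal sketch. It rests on assumptions that are not made in the text above: the spatial interaction model is given a simple gravity form with power-law distance decay, and the spatial choice model is given a conditional logit form, which is one common discrete choice specification. All function names, parameter values, and data are hypothetical.

```python
import math

def gravity_flows(origin_mass, dest_mass, distance, k=1.0, gamma=2.0):
    """Aggregated data: predicted trips T[i][j] = k * O_i * D_j / d_ij**gamma.

    The power-law impedance term d_ij**gamma is an assumed functional form.
    """
    return [
        [k * o * d / distance[i][j] ** gamma for j, d in enumerate(dest_mass)]
        for i, o in enumerate(origin_mass)
    ]

def choice_probabilities(utilities):
    """Disaggregated data: conditional logit, P(j) = exp(V_j) / sum_k exp(V_k)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v - top) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

# Hypothetical data: 2 origin neighborhoods, 3 destination neighborhoods.
origin_offenders = [100, 50]        # attribute of the origins
dest_targets = [200, 400, 100]      # attribute of the destinations
dist_km = [[1.0, 2.0, 4.0],
           [2.0, 1.0, 2.0]]         # impedance between origin and destination

flows = gravity_flows(origin_offenders, dest_targets, dist_km, k=0.001)

# One offender living in origin 0 chooses among the three destinations.
# The coefficients are illustrative values, not estimates from any study.
beta_targets, beta_distance = 0.005, -0.8
utilities = [beta_targets * t + beta_distance * d
             for t, d in zip(dest_targets, dist_km[0])]
probs = choice_probabilities(utilities)
print(flows)
print(probs)
```

In an actual application the parameters (here `k`, `gamma`, `beta_targets`, and `beta_distance`) would be estimated from observed crime trips rather than fixed in advance.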

Why is the Spatial Dimension Important?

There are two different reasons why we should care about our crime data being geographically referenced. The first reason is that the spatial arrangement of the data might bias the findings of regular statistical analysis, and we need spatial statistics to diagnose the situation or correct for it. The second reason is that the spatial dimension is a necessary element of our findings and conclusions, because we are intrinsically interested in spatial patterns and spatial effects, and need spatial statistics to explore them.

Let us first discuss the spatial arrangement of data as a potential cause of bias. As mentioned earlier, in geographically referenced data, all units of analysis are interrelated. For example, each unit is located at a certain distance from each and every other unit in the data. This implies that they are not independent observations, but that one observation may influence one or all of the others. According to Tobler’s First Law of Geography, to the effect that “everything is related to everything else, but near things are more related than distant things” (Tobler 1970: 236), we might expect that nearby units influence each other more intensely than distant units. Thus, not only can we expect our observations to be interdependent, but they may also be interdependent to a varying degree.

Because most statistical techniques are based on the assumption of independence between observations, our results may be biased if this assumption is unjustified. Thus, even if we do not have a substantive interest in spatial aspects of the research problem, we may need spatial statistics either to verify that the assumptions hold (diagnostic tests for spatial autocorrelation), or else to correct in a statistically appropriate way for the interdependence between observations. Much recent work in criminology illustrates the increasing awareness of spatial interdependence as a potential source of bias, and of statistical methods and techniques to deal with it (Kubrin and Stewart 2006; McCord and Ratcliffe 2007). Issues of spatial interdependence require special attention as the spatial scale becomes smaller, as is demonstrated in an analysis of spatial autocorrelation of trajectory group membership among nearly 30,000 street blocks in the city of Seattle (Groff et al. 2009).

Rather than being only a source of bias and a nuisance factor, the spatial arrangement of data is also a potential source of information, and it may provide an opportunity to explore substantive hypotheses. For example, it has been hypothesized that local measures against crime give rise to displacement of crime to nearby locations, i.e., that when offenders find out that their crime opportunities are blocked in one place, they will move to nearby places to commit crime. Alternatively, the beneficial crime-reducing effects may diffuse or spill over to nearby places. In order to test these hypotheses, we need to ascertain whether, after the implementation of the measures, crime increases or decreases more in nearby locations than in more distant locations (Bowers and Johnson 2003; Weisburd et al. 2006). Thus, we can only address the issue empirically if we have geographically referenced crime data.

Another example in which the research question dictates that spatially referenced data are to be analyzed is a study on lynching in the Southern USA in the 1890–1919 era (Tolnay et al. 1996). The authors assessed the effect lynching in one place had on lynching elsewhere, contrasting a “contagion” model – lynching in one place increases the likelihood of lynching in nearby places – with a “deterrence” model, which states that lynching in one place reduces the likelihood of lynching in nearby places. In this research, the authors formulated specific hypotheses on spatial processes assumed to influence the occurrence of crime events, as is the case in many recent analyses of spatial crime data (Andresen 2006; Baller et al. 2001; Hipp 2007; Kubrin and Stewart 2006; Mears and Bhati 2006; Messner et al. 1999; Morenoff et al. 2001; Nielsen et al. 2005; Wilcox et al. 2007).

When we analyze movement, the spatial arrangement of origins relative to destinations is crucial. For example, in a typical application to crime data, movement applies to the journey to crime. Geography is an essential element of travel, and in order to analyze offenders’ journeys to crime, we need to know where they live and where they commit crime.

Spatial Units of Analysis

Spatial crime data are crime data that are geographically referenced: we know where the observations are located relative to each other. Spatial crime data can be either point data or areal data. Point data are data for which the geographic reference is a single point, usually a pair of coordinates in a two-dimensional coordinate system that indicates the exact location of the event of interest. As an example, if we have a data set of burglaries and if the geographical coordinates of all burglaries are included, we have point data.

Areal data apply to a continuous subset of the study area. Depending on the discipline where they are used, areas are also denoted as “zones,” “spatial lattices” or “polygons.” An example of areal data is the numbers of calls for service for all police beats in a city. Although the calls for service originate from a specific point in space, the data are aggregated to the police beat level and, therefore, areal in nature. Areas are typically demarcated by physical or administrative boundaries.

A classic methodological problem in geography is the modifiable areal unit problem (MAUP) (Openshaw 1984): there are numerous ways to aggregate individual point data into areas, and the results of the analysis of the aggregated data depend on how the aggregation was done. One aspect of the MAUP is the issue of scale: how large aggregated units should be. In the USA context, for example, the spatial unit of analysis could be as large as a state, a county, or a city. In practice, most studies in criminology use smaller units: they have neighborhood clusters, neighborhoods or census tracts as the spatial unit of analysis. Because even areas as small as census tracts tend to be far from homogeneous, many scholars in geographic criminology have advocated measurement at still smaller levels of spatial aggregation, such as (again we use terms from the USA context) census block groups, census blocks, street segments, or even addresses (Weisburd et al. 2004, 2009).

When the spatial crime data are georeferenced as points, their spatial relationship to each other can be easily established by calculating the distance between them. When the geographic reference is an area, there are various possibilities. One possibility is to use the centroid of each area as an approximation of its location and to calculate the distances between the centroids. In most applications, however, the spatial relation between areas is indicated by some measure of contiguity or adjacency (see section “Specification of Spatial Structure: The Spatial Weight Matrix”).
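The centroid approach can be sketched as follows. The sketch assumes that areas are simple polygons given in projected (planar) coordinates, so that straight-line distances are meaningful; the function names and the two example areas are hypothetical.

```python
import math

def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `vertices` is a list of (x, y) pairs in order around the boundary.
    """
    area2 = 0.0          # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

def euclidean_distance(p, q):
    """Straight-line distance between two points in planar coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two hypothetical areas: a 2 x 2 square and a 2 x 4 rectangle.
area_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
area_b = [(4, 0), (6, 0), (6, 4), (4, 4)]

centroid_a = polygon_centroid(area_a)
centroid_b = polygon_centroid(area_b)
distance_ab = euclidean_distance(centroid_a, centroid_b)
print(centroid_a, centroid_b, distance_ab)
```

For point data, `euclidean_distance` can be applied directly to the coordinates of the events.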

Sampling and Statistical Inference

Spatial data in general, and spatial crime data in particular, seldom represent a random sample from a population. In virtually every analysis of spatial crime data, the data are an exhaustive sample of the population, e.g., all cities in a country, all census tracts in a region, or all street segments in a city. Running a significance test to infer something about a larger population is then useless, as there is no larger population, and the “estimates” are truly descriptions (Gould 1970). In practice, researchers often routinely resort to significance testing in these cases, because they tend to think of their data as being a sample that could be generalized to other places or to other times. One argument for statistical significance testing in this situation refers to the modifiable areal unit problem (MAUP, see section “Spatial Units of Analysis”): the particular areal subdivision that is used in the analysis is one particular sample from a large number of hypothetical subdivisions (Cliff 1973). This line of argument gives rise to so-called permutation tests (Edgington 1980). For a critique of the application of procedures of classical statistical inference to population data in geography, see Summerfield (1983).

Of course, it is conceivable that true sampling occurs in a spatial context, e.g., when we analyze the numbers of criminal events in randomly chosen street segments in a city, and relate them to characteristics of neighboring segments. In such cases, statistical inference is only logical, because the observed street segments are a random sample from all segments in the city. However, true samples are extremely rare in the statistical analysis of spatial crime data, and we have not been able to find a single instance. In virtually all studies, there is no sample of spatial units, and the complete population is analyzed. Sometimes stratified samples are taken from spatial strata, for example, a population survey stratified by neighborhood (Sampson et al. 1997). In those cases, all neighborhoods are selected, and a random sample of residents is drawn in each neighborhood. Inferential statistics can then logically be applied to the sample units (residents), but the use of inferential statistics for the strata is void, as discussed earlier.

Analysis of Spatial Distributions

This section deals with the analysis of spatial variation, i.e., with the observation that offenders, victims, crimes, and related events, like many other human phenomena, are not randomly distributed across space, but usually display patterns. Before reviewing a variety of spatially informed regression models and their uses in criminology, we discuss the issue of how spatial structure is incorporated into statistical analysis, how it is used in the calculation of spatial autocorrelation, and how we can test for residual spatial autocorrelation in nonspatial regression models.

Specification of Spatial Structure: The Spatial Weight Matrix

In spatial regression models, an outcome at a focal location is assumed to be influenced not only by its own characteristics, but also by characteristics of other locations. Typically, the strength of these influences is assumed to decay over distance, i.e., the influence of nearby locations is stronger than that of locations further away (this principle is illustrated by the first Law of Geography, see section “Why Is the Spatial Dimension Important?”).

To specify the relative strength of these spatial influences and incorporate them in a statistical model, spatial statistics methods use a spatial weight matrix, which is sometimes referred to as a connectivity matrix (Brunsdon 2001), or adjacency matrix. A spatial weight matrix W is a square matrix of dimension n, where n is the number of locations in the dataset. Each element \({w}_{ij}\) of W is a measure of the strength of the influence of location i on location j. The elements \({w}_{ii}\) on the diagonal equal zero by definition. Because the strength of the influence is supposed to decay over distance, normally the closer i is to j, the larger \({w}_{ij}\).

In spatial models, the weight matrix W is fixed (it is not estimated but defined). Because different weight matrices imply different spatial dependency structures, the outcomes of any model are conditional on the weight matrix used. The specification of the spatial structure is too often chosen as a matter of convenience or dictated by constraints imposed by software. Deane et al. (2008) report an interesting analysis of a single model that is estimated with various differently specified weight matrices.

Figure 33.1 is an example of a simple weight matrix. It depicts a map of an area comprising nine neighborhoods on the left side, and a weight matrix \({W}^{(1)}\) on the right side.

Fig. 33.1 Map of area containing 9 neighborhoods (left) and an associated weight matrix \({W}^{(1)}\) based on first-order adjacency (right)

The \({W}^{(1)}\) weight matrix specifies how the nine neighborhoods in the area are expected to influence each other. In this case, \({W}^{(1)}\) is defined according to first-order adjacency, i.e., \({w}^{(1)}_{ij} = 1\) if and only if neighborhoods \({a}_{i}\) and \({a}_{j}\) have a border in common, otherwise \({w}^{(1)}_{ij} = 0\). Used in a regression model, this weight matrix implies that an outcome variable in a focal neighborhood is influenced by the characteristics of adjacent neighborhoods, and that each of these influences is equally strong (as the relevant \({w}_{ij}\) are all equal to 1).
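A first-order adjacency matrix of this kind can be constructed as in the following sketch. The actual layout of the nine neighborhoods in Fig. 33.1 is not reproduced here; the sketch assumes, purely for illustration, a regular 3 x 3 grid in which neighborhoods share a border only with their horizontal and vertical neighbors. The function name is hypothetical.

```python
def first_order_adjacency(n_rows, n_cols):
    """Binary weight matrix: w[i][j] = 1 if areas i and j share a border.

    Areas are the cells of an n_rows x n_cols grid, numbered row by row.
    Cells that touch only at a corner are not treated as adjacent.
    """
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w

w1 = first_order_adjacency(3, 3)
for row in w1:
    print(row)
```

In this assumed layout the matrix is symmetric with a zero diagonal, and the central neighborhood (index 4) is the only one with four adjacent neighbors.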

\({W}^{(1)}\) is by no means the only possible dependency structure. We could, for example, use a second-order adjacency matrix \({W}^{(2)}\), in which case \({w}^{(2)}_{ij} = 1\) if and only if the minimal number of borders to be crossed between locations i and j equals 1 or 2, and \({w}^{(2)}_{ij} = 0\) otherwise.

Still another weight matrix, depicted in Fig. 33.2, uses multiple-order adjacencies to define a continuous measure of influence. In \({W}^{(3)}\), the relationship \({w}^{(3)}_{ij}\) between two different neighborhoods \({a}_{i}\) and \({a}_{j}\) is quantified as

$${w}_{ij}^{(3)} = 1/(\mathrm{minimal\ number\ of\ borders\ to\ be\ crossed\ between}\ {a}_{i}\ \mathrm{and}\ {a}_{j})$$
(33.1)

Fig. 33.2 Map of area containing 9 neighborhoods (left) and an associated weight matrix \({W}^{(3)}\) based on inverse of minimal number of borders to be crossed between i and j (right)
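Both the second-order matrix and the inverse border-crossing matrix of (33.1) can be derived from first-order adjacency by counting the minimal number of borders between each pair of areas. The sketch below does this with a breadth-first search. As the map in the figure is not reproduced here, it assumes a regular 3 x 3 grid of neighborhoods; all function names are hypothetical.

```python
from collections import deque

def grid_adjacency(n_rows, n_cols):
    """First-order (shared-border) adjacency for a regular grid of areas."""
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ri, ci = divmod(i, n_cols)
            rj, cj = divmod(j, n_cols)
            if abs(ri - rj) + abs(ci - cj) == 1:
                w[i][j] = 1
    return w

def border_crossings(adjacency):
    """Minimal number of borders crossed between every pair of areas."""
    n = len(adjacency)
    steps = [[None] * n for _ in range(n)]   # None marks "not reachable"
    for start in range(n):
        steps[start][start] = 0
        queue = deque([start])
        while queue:                          # breadth-first search
            i = queue.popleft()
            for j in range(n):
                if adjacency[i][j] and steps[start][j] is None:
                    steps[start][j] = steps[start][i] + 1
                    queue.append(j)
    return steps

def second_order(steps):
    """Binary weights: 1 if one or two borders separate i and j, else 0."""
    return [[1 if s in (1, 2) else 0 for s in row] for row in steps]

def inverse_crossings(steps):
    """Continuous weights as in (33.1); the diagonal stays zero."""
    return [[1.0 / s if s else 0.0 for s in row] for row in steps]

steps = border_crossings(grid_adjacency(3, 3))
w2 = second_order(steps)
w3 = inverse_crossings(steps)
print(w3[0])
```

In this assumed grid, opposite corners are separated by four borders, so their weight in the third matrix is 0.25 while their weight in the second-order matrix is 0.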

Many other spatial weight matrices are possible, and any measure of proximity or of the strength of the spatial relation between locations i and j may be considered as an element of W. For example, it could be inverse distance, the inverse of the traffic intensity from i to j, the length of the common border between i and j, or a nearest neighbor relation, in which neighborhood i is the x-order nearest neighbor of neighborhood j if there are no more than x other neighborhoods in the dataset that are closer to i than j is (also see Getis 2007).

The weight matrix plays a pivotal role in the analysis of spatial models, because the results are conditional on the choice of the spatial weight matrix (for a detailed discussion of this point, see Chap. 6 by Tita and Radil). Therefore, the weight matrix must be carefully chosen by the researcher, justified by theoretical arguments, and reported in detail (Anselin 1988).

Spatial Autocorrelation

Spatial autocorrelation exists when the distribution of some measure across space is nonrandom, so that there is a spatial pattern (Cliff and Ord 1973; Dubin 1998). Spatial autocorrelation is positive when nearby entities are more similar than entities that are far apart, and thus it embodies Tobler’s first Law of Geography. On a map where events are indicated by dots, positive spatial autocorrelation shows up as dots forming clusters. On areal maps, it is indicated by the grouping of similarly colored or patterned areas.

Negative spatial autocorrelation is the opposite phenomenon, where nearby entities tend to be different from each other (Griffith 2006). This shows up on maps as evenly spaced patterns of dots or colors (e.g., like on a checkerboard). In the absence of spatial autocorrelation, there is no pattern, neither a clustered pattern nor a uniform (checkerboard) pattern.

There are numerous measures of spatial autocorrelation (for a list, not a discussion, see Getis 2007), the most well-known including Moran’s I, Geary’s C, and Ripley’s K. In order to be informed about the spatial structure of the data, all of these measures require a weight matrix (see section “Specification of Spatial Structure: The Spatial Weight Matrix”). Thus, the level of autocorrelation in a dataset depends on both the specific statistic that is chosen and the weight matrix that is selected. The statistics can be tested against a theoretical distribution, but most contemporary software tests the observed values using permutation tests, in which a Monte Carlo simulation is run to construct a sampling distribution using the observed data (Besag and Diggle 1977).

Moran’s I is the most commonly used statistic because it is easily computed and adaptable to special situations. For example, it can be used to calculate “local” spatial autocorrelation (Anselin 1995) and can be adapted to take into account variations in the underlying population densities and to situations where the data are points (for an application to fine-grained crime data, see Groff et al. 2009). Its variance–covariance structure is also similar to that of other well-known statistics.
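As an illustration, the global Moran’s I for values \({x}_{i}\) and weights \({w}_{ij}\) is commonly written as \(I = (n/{S}_{0}) \sum_{i}\sum_{j} {w}_{ij}({x}_{i} - \bar{x})({x}_{j} - \bar{x}) / \sum_{i} {({x}_{i} - \bar{x})}^{2}\), where \({S}_{0}\) is the sum of all weights. The sketch below computes this statistic and a one-sided permutation p-value. The 3 x 3 grid, the two example patterns, and all function names are hypothetical, and dedicated spatial statistics software should be preferred for real analyses.

```python
import random

def grid_adjacency(n_rows, n_cols):
    """First-order (shared-border) adjacency for a regular grid of areas."""
    n = n_rows * n_cols
    return [[1 if abs(i // n_cols - j // n_cols) + abs(i % n_cols - j % n_cols) == 1
             else 0 for j in range(n)] for i in range(n)]

def morans_i(values, w):
    """Global Moran's I for a list of values and a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)                 # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def permutation_p_value(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation.

    Values are randomly reassigned to locations n_perm times; the p-value is
    the share of permutations with I at least as large as the observed I.
    """
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

w = grid_adjacency(3, 3)
banded = [1, 1, 1, 2, 2, 2, 3, 3, 3]        # similar values lie next to each other
checkerboard = [1, 0, 1, 0, 1, 0, 1, 0, 1]  # every neighbor differs

i_banded, p_banded = permutation_p_value(banded, w)
i_checker = morans_i(checkerboard, w)
print(i_banded, p_banded)   # I = 0.5 for the banded pattern
print(i_checker)            # I = -1.0 (up to rounding) for the checkerboard
```

The banded pattern yields a positive I and the checkerboard a negative one, which matches the verbal description of positive and negative spatial autocorrelation.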

Regression Analysis in the Presence of Spatial Autocorrelation

As was discussed in section “Why Is the Spatial Dimension Important?,” the spatial relations between our observations may be a nuisance rather than an asset. If we are interested in processes within the examined units of analysis, and not between them, then the fact that they may be spatially interdependent is actually a disadvantage from a statistical point of view. Consider the ordinary least squares (OLS) regression model:

$$y = X\beta + \epsilon $$
(33.2)

In this equation, y is a dependent variable (for example, the crime rate) observed in n spatial units (for example, census tracts), X is the matrix of independent variables (e.g., population density, affluence, residential stability), β is the vector of their associated parameters, and ε is a vector of residuals (error terms).

One of the assumptions of OLS (and many other) regression models is that the residuals ε are uncorrelated with each other. When spatial data are used, residuals are assumed not to be spatially autocorrelated, i.e., the residuals of pairs of nearby observations should not be more similar than the residuals of pairs of observations that are located at greater distance from each other.

The presence of residual spatial autocorrelation can be tested with Moran’s I or one of the other test statistics developed for this purpose. If no residual spatial autocorrelation is detected, it means that any existing spatial autocorrelation in the dependent variable is due to spatial autocorrelation in the independent variables, so that the conditional distribution of the dependent variable, i.e., conditional on the values of the independent variables, is not spatially autocorrelated.

If the errors are spatially autocorrelated, and if the substantive interest is in processes within rather than between the observations, the conclusion must be that unobserved variables – the effect of which is captured in the error term – are responsible for the residual spatial autocorrelation. Although OLS may still yield unbiased estimates of the β parameters, their estimated standard errors are unreliable, and significance testing is therefore not appropriate.

For this situation, two solutions are available: either filter away the autocorrelated error, which is done in the spatial filtering approach (see section “Spatial Filtering” below), or explicitly incorporate spatially autocorrelated error in the model, which is what spatial error regression models do (see section “Spatial Error Regression”).

Spatial Filtering

If the researcher is not directly interested in the spatial aspects of the problem under study (i.e., if the spatial arrangement of the data is mainly a potential cause of bias), it may be useful to resort to the spatial filtering approach. Spatial filtering is a method to remove or “filter out” the spatially autocorrelated part of variables in a regression equation. The advantage of this method is that it generates estimates that are not biased by spatial autocorrelation, while still allowing the analysis to reap all the advantages of the well-known ordinary least squares regression model.

Spatial filtering converts variables that are spatially autocorrelated into spatially independent variables in an OLS regression framework. The conversion requires spatial filtering procedures. Two alternative filtering procedures are available, one devised by Getis (1990, 1995) and one devised by Griffith (2000). Both seem to perform equally well (Getis and Griffith 2002).

Spatial Error Regression

In the spatial error model (Anselin 2003), spatially autocorrelated error is an explicit part of the regression model:

$$y = X\beta + \lambda W\epsilon + u$$
(33.3)

In this equation, y is the dependent variable, Xβ is the matrix representing the independent variables and their associated parameters, ε is a vector of error terms that are subject to the spatial interdependence that is specified in the weight matrix W, λ is a single parameter that measures the amount of spatial interaction, and u is a regular (non-autocorrelated) error term. Thus, in the spatial error model, the error term is split into an autocorrelated part and a non-autocorrelated part. This is an appropriate model if it can be assumed that there are unobserved independent variables that are spatially autocorrelated and affect the value of y. In this sense, the model captures the spatial influence of unobserved (unmeasured) independent variables.

Andresen (2006) analyzed calls for service made to the Vancouver police, utilizing census tracts as the spatial unit of analysis. The purpose was to explain differences in crime rates across census tracts, with a particular focus on using ambient populations as well as resident populations in the denominator of the crime rate equation, in order to measure the population at risk of criminal victimization. After finding that the residuals of OLS regression models displayed spatial autocorrelation, but in the absence of a specific hypothesis of why the crime rate in a focal tract would be affected by the characteristics of adjacent tracts, Andresen used the spatial error model to control for the residual autocorrelation.

In a study of city-level robbery rates (Deane et al. 2008) in 1,056 cities in the United States with 25,000 or more residents, OLS regression was first used to assess the factors that explained variation in robbery rates between cities. When the residuals displayed spatial autocorrelation, a spatial error model was estimated. An interesting feature of this study was that the authors estimated the model with various alternative spatial dependence structures (i.e., weight matrices): one based on distance between the cities, one based on distance between cities in the same state (i.e., the weight of a pair of cities in different states was zero), and one in which all cities are nested in states (i.e., all pairs of cities within a state have a value of 1, and all other pairs 0). The last model, with the nested weight matrix structure, was preferred on the basis of model fit, suggesting that similarity in robbery rates may be a function of processes at the state level (legislation or state-level policies) rather than of spatial proximity alone.

Spatial Lag Regression

Spatial filtering procedures and spatial error models are appropriate in situations where spatial interdependence is a nuisance rather than the main topic of inquiry. In many substantive applications, the issue of spatial dependence is at the heart of the research question. In such cases, spatial relations between the units of analysis are actually a necessity because they allow us to assess, for example, whether gang incident rates are affected by gang violence in nearby areas, whether the number of robberies in a block depends on the number of commercial businesses in adjacent blocks, or whether the burglary rate in a community depends on the level of economic deprivation in nearby communities. The observation of adjacent or nearby spatial entities is needed to answer such questions.

Spatial lag models are regression models that incorporate a specification of spatial interdependence not in the error term but in the fixed or predicted part of the regression equation (Anselin 2003). It is useful to distinguish between two separate cases, one in which the hypothesized spatial interdependence runs from the independent variables in nearby areas to the dependent variable in the focal area, and one in which the hypothesized spatial interdependence runs from the dependent variables in nearby areas to the dependent variable in the focal area (so that y is an endogenous variable that appears on both sides of the regression equation). The former model can be written as follows:

$$y = X\beta + \mu WX + \epsilon $$
(33.4)

This model with lagged independent variables is sometimes not specifically referred to as a spatial regression model because it can be estimated with the OLS technique. However, from a substantive point of view it is, because it allows the researcher to assess whether independent variables measured in nearby areas have an effect on a dependent variable in the focal area. In this equation, μWX is a spatially weighted matrix of independent variables in areas “nearby” the focal area where y is measured (where “nearby” is thus defined in the weight matrix W).
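Because the lagged independent variable WX is simply an additional regressor, the model in (33.4) can be estimated in two steps: construct WX, then run OLS. The sketch below illustrates this on simulated data. The ring-shaped neighbourhood structure, the sample size, and the parameter values are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical weight matrix: areas on a ring, two neighbours each, row-standardised.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

x = rng.normal(size=n)          # independent variable in the focal area
wx = W @ x                      # average of x in the neighbouring areas

# Simulated outcome with a local effect (beta) and a neighbour effect (mu).
beta_true, mu_true = 1.0, 0.5
y = 2.0 + beta_true * x + mu_true * wx + rng.normal(scale=0.1, size=n)

# Ordinary least squares on [constant, x, Wx].
design = np.column_stack([np.ones(n), x, wx])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, beta_hat, mu_hat = coef
```

The estimate `mu_hat` answers the substantive question posed above: whether the independent variable measured in nearby areas affects the dependent variable in the focal area.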

In an early application of the idea that extra-community processes affect internal outcomes, Heitgerd and Bursik (1987) demonstrated that local delinquency rates in a community were influenced by racial changes in adjoining communities. More recent examples of this approach are Bernasco and Luykx (2003), who explored whether burglary rates in a reference neighborhood were affected by concentrations of burglars’ residences in adjacent neighborhoods, and Mears and Bhati (2006), who explored whether the number of homicides in a neighborhood was affected by resource deprivation in adjacent neighborhoods.

In the other spatial lag model, not the independent variables but the dependent variable is lagged:

$$y = X\beta + \rho Wy + \epsilon $$
(33.5)

Here, ρWy is a spatially weighted matrix of the dependent variable. This is the traditional spatial lag regression model (Anselin 2003) where the dependent variable is endogenous, as it appears on both sides of the equation. This model is often applied when spatial effects are being considered in research on crime and criminal justice. An essential aspect is that it assumes that spatial dependence operates through effects of nearby y variables upon each other, and any spatial effects of the exogenous x variables run indirectly as “spatial multipliers” (Anselin 2003) through their influence on the local y variable. For example, the model could assume that the crime rate in a focal area is affected by the crime rate in surrounding areas. Often, the model is justified by referring to concepts like diffusion, contagion, or displacement, although these concepts assert a sequential process, while the spatial lag model is not sequential.
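The “spatial multiplier” can be made explicit by solving (33.5) for y. The following is a sketch under the assumption that the matrix I − ρW is invertible, which holds, for example, when W is row-standardized and |ρ| < 1:

$$y = {(I - \rho W)}^{-1}X\beta + {(I - \rho W)}^{-1}\epsilon, \qquad {(I - \rho W)}^{-1} = I + \rho W + {\rho }^{2}{W}^{2} + \cdots $$

The series expansion shows that a change in X in one area affects y in that area directly (the term I), in its neighbors (the term ρW), in the neighbors of its neighbors (the term ρ²W²), and so on, with an influence that shrinks at each step.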

Some examples of this spatial lag model in research on crime and criminal justice are Baller et al. (2001) who studied the spatial clustering of county-level homicide rates and found evidence for the existence of a diffusion process, and Morenoff et al. (2001) and Kubrin (2003), who studied neighborhood level homicide, and also found spatial proximity to homicide to be related to homicide rates.

The (maximum likelihood) estimation of (33.5) becomes unfeasible in large samples (i.e., in cases where the number of spatial entities N is large, and where the weight matrix W thus contains N² cells). A solution for this problem was devised by Land and Deane (1992). They used a Two-Stage-Least-Squares (2SLS) procedure in which they first estimated y using instrumental variables, and subsequently used the predicted y in the right-hand side of the above equation. This procedure is still regularly used in situations where direct estimation of (33.5) is difficult (Hipp 2007; Rosenfeld et al. 2007).
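The logic of a two-stage procedure can be sketched on simulated data. The sketch below uses spatial lags of the independent variable (WX and W²X) as instruments for Wy. This is a common choice in the spatial econometrics literature but is an assumption of this illustration, not necessarily the instrument set used by Land and Deane (1992). The ring-shaped weight matrix and the parameter values are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical row-standardised weights: areas on a ring with two neighbours each.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

# Simulate y = rho*W*y + beta*x + eps by solving (I - rho*W) y = beta*x + eps.
rho_true, beta_true = 0.4, 1.0
x = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, beta_true * x + eps)

wy = W @ y                       # endogenous spatial lag of y
ones = np.ones(n)
wx = W @ x
instruments = np.column_stack([ones, x, wx, W @ wx])

# Stage 1: predict Wy from the instruments.
stage1, *_ = np.linalg.lstsq(instruments, wy, rcond=None)
wy_hat = instruments @ stage1

# Stage 2: regress y on [constant, x, predicted Wy].
design = np.column_stack([ones, x, wy_hat])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, beta_hat, rho_hat = coef
```

Note that the standard errors reported by a naive second-stage OLS are not the correct 2SLS standard errors; dedicated routines correct for the fact that the predicted Wy is itself an estimate.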

For a discussion on non-linear spatial lag (and error) models (Logit, Poisson, Negative-Binomial), see Anselin (2001). These methods have also been used in panel designs (Anselin 2001; Baltagi et al. 2007).

Geographically Weighted Regression

Geographically weighted regression (GWR) analysis (Brunsdon et al. 1996; Fotheringham et al. 2002; LeSage 2004) is a modeling technique for exploratory spatial data analysis. It estimates (ordinary least squares, logistic or Poisson) regression equations in which the parameters are allowed to vary across space (and it tests whether they do). For example, in estimating a regression model of crime on income in a city, GWR allows the effect of income to be different, and even to change sign, between different parts of the city. In fact, the effect is allowed to be different at every data point. The (ordinary least squares) equation used in geographically weighted regression analysis is:

$${y}_{i} = {\alpha }_{i} + {\beta }_{i}{X}_{i} + {\epsilon }_{i}$$
(33.6)

The subscript i refers to an individual data point. Because the β has an i subscript, it can have a different value for every observation in the data. To estimate the coefficients of the equation at point i, the other observations in the data set are used as well, but they are spatially weighted to the effect that observations near i weigh more heavily in the estimation of βᵢ than distant observations. The spatial weights can be calculated using a variety of methods.

The estimated coefficients are the output of GWR analysis and can be mapped, because they are all linked to a specific location. The resulting map displays spatial variation in the relationship between two variables (for an example, see Chap. 5 in Chainey and Ratcliffe 2005).
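The local estimation step can be sketched as a weighted least squares regression that is repeated at every data point. In the sketch below, the Gaussian distance kernel, the fixed bandwidth, and the simulated data (in which the slope changes sign between the western and eastern half of a hypothetical study area) are all assumptions made for the illustration. Bandwidth selection and significance testing, which are essential parts of a real GWR analysis, are not shown.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Return an (n, 2) array with a local [intercept, slope] for every data point."""
    coords = np.asarray(coords, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    local = np.empty((n, 2))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel (assumed)
        XtW = X.T * w                                    # weight every observation
        local[i] = np.linalg.solve(XtW @ X, XtW @ y)     # weighted least squares
    return local

rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))                 # hypothetical locations
x = rng.normal(size=n)
true_slope = np.where(coords[:, 0] < 5, -1.0, 1.0)       # sign flips at the midline
y = true_slope * x + rng.normal(scale=0.1, size=n)

local = gwr_coefficients(coords, x, y, bandwidth=1.0)
west_mean = local[coords[:, 0] < 3, 1].mean()            # average local slope, west
east_mean = local[coords[:, 0] > 7, 1].mean()            # average local slope, east
```

A single global OLS regression on these data would report a slope near zero, whereas the local slopes in `local[:, 1]` recover the negative effect in the west and the positive effect in the east.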

GWR models can also be mixed models, in which the coefficients of some variables are assumed to be global, while others are allowed to vary locally. For example, in the following mixed model, β varies locally, but γ has a single value for all observations:

$${y}_{i} = {\alpha }_{i} + {\beta }_{i}{X}_{i} + \gamma {Z}_{i} + {\epsilon }_{i}$$
(33.7)

An application of GWR to crime in Portland, Oregon (Cahill and Mulligan 2007) explored spatial variation in the factors that affect the amount of violent crime in block groups. The authors concluded that the GWR method was successful in exploring spatial variation in the relation between structural variables and crime. They found that the effects of some variables were quite stable across the city of Portland, while those of other variables fluctuated substantially across the city. The authors also performed a cluster analysis, using the similarity of coefficients and spatial contiguity as a criterion for placing observations together in a cluster (for a formal method of identifying a mixed model in GWR, see Mei et al. 2004). Malczewski and Poetz (2005) used GWR analysis to study spatial variation of the relation between socioeconomic neighborhood characteristics and the burglary risk in London, Ontario.

Multilevel Regression and Spatial Dependence

Multilevel regression models (also known as hierarchical linear models) have been developed for the analysis of hierarchically structured data, i.e., data in which the units of analysis are grouped or nested. Examples of such data include gang members grouped by the gang they are part of, offences grouped by the offenders who committed them, and crime victims grouped by their neighborhood of residence. Like spatial data structures, hierarchical data structures violate the standard assumption of OLS regression models that observations are independent. One of the various functions of multilevel models is that they relax this assumption, so that the observations within a group are allowed to be interdependent.

Multilevel models are now used routinely in situations where individual crime data that are nested spatially are available (Kubrin and Stewart 2006; Sampson et al. 1997; Velez 2001; Van Wilsem et al. 2006). Typically, such research involves individuals (usually offenders or victims) as the micro level of analysis and census tracts or neighborhoods as the spatially aggregated second level of analysis. The hierarchical structure is thus spatial in nature – the neighborhood encapsulates the individuals – and the multilevel model is used to take into account that all residents of the same neighborhood are subject to the same neighborhood conditions.

Multilevel models were not developed to model spatial processes, and they provide a very crude way of correcting for spatial autocorrelation because neither the influence of neighborhoods on nearby neighborhoods nor the influence of individuals on nearby individuals is modeled. Thus, in a standard multilevel model where spatial data are forced into a hierarchical data structure, both the spatial distribution of subjects within a neighborhood and the spatial arrangement of the neighborhoods themselves become irrelevant. The outcome of the analysis would not change if we were to change the locations of the individuals within the neighborhood, or the locations of the neighborhoods within the study area (Chaix et al. 2005; Elffers 2003).

The spatial aspect of a hierarchy does not always need to dominate its theoretical relevance. For example, if we are to study gun ownership and gun use across the states of the USA, the differences in regulations between states might be considered more salient than their relative spatial positions.

The complete integration of multilevel and spatial modeling is a complex issue. Morenoff (2003) has constructed a partial integration of spatial and multilevel modeling, in which the individuals are nested in neighborhoods but are not directly influenced by a spatial process, while the neighborhoods are subject to influence from adjacent or nearby neighborhoods. Various authors use this approach to test multilevel models for residual autocorrelation (e.g., Kubrin and Stewart 2006; Wyant 2008). If the spatial effects between the neighborhoods are not of substantive interest but a nuisance factor (i.e., a potential source of bias), an alternative way to address the issue is by introducing a higher aggregated spatial level. Thus, districts or neighborhood clusters are collections of adjacent neighborhoods that are introduced as a third level in the multilevel model (Van Wilsem 2003; Wilcox et al. 2007). However, this solution solves the problem only partially and reintroduces it at a higher spatial level of aggregation.

Analysis of Movement

All methods discussed in the previous section apply to situations in which the units of analysis are stationary objects. As far as the analysis is concerned, they are fixed in space. Various theoretical perspectives require an analysis of movement. For example, routine activity theory (Cohen and Felson 1979) asserts that the convergence in time and space of motivated offenders and attractive and unguarded targets is a necessary and sufficient condition for the occurrence of predatory crime. From this perspective, understanding spatial crime patterns requires the analysis of the movements of potential offenders, potential targets, and potential guardians. Some targets, such as houses and businesses, or guardians like CCTV cameras may of course be immobile.

In the present section, the statistical analysis of criminal movement data is addressed. The distance between the offender’s home and the crime location is one of the most studied variables in this field of research. We address the analysis of the length of the journey to crime as a dependent variable in section “Length of the Journey to Crime,” and briefly discuss geographic offender profiling, a method that inverts distance decay curves to prioritize the search for an offender. In section “Spatial Interaction Models,” we describe how criminologists have taken up the study of the journey to crime using gravity and spatial interaction models that are used in other sciences to study flow between places. In these models, distance is one of the main independent variables. Subsequently, in section “Disaggregate Discrete Location Choice Models,” we discuss how explicit models of choice can be used and have been used to study location choice at the individual level. Finally, section “Crime Triads: Convergence of Offenders, Victims and Crimes” deals with methods that can be used to analyze spatial triads, in particular, the offender-victim-crime triad. In each subsection, we address basic methodological literature and discuss applications in research on crime and criminal justice.

Length of the Journey to Crime

One of the applications of Tobler’s First Law of Geography, referred to earlier, is that, everything else being equal, the amount of interaction between two places declines as the distance between them increases (Fotheringham and Pitts 1995; Haynes and Fotheringham 1984). This phenomenon, distance decay, has been found to apply to daily routine journeys for various purposes, such as travel between home, workplace, and shopping and leisure centers, and also to migration. The distribution of the length of these trips is skewed to the right (positively skewed), with the large majority of the trips being relatively short. The general explanation for the distance decay phenomenon is that human movement is governed by the principle of least effort (Zipf 1949). If people travel purposefully towards destinations that provide rewards, they will generally prefer a nearby to a distant destination, unless the distant location provides significantly more rewards than the one nearby.

Criminologists have had a longstanding interest in the offender’s journey to crime, in particular in its length: the distance between the offender’s home and the location where he or she committed the offence. Starting with Bullock (1955), some studies have also explored the victim’s journey to crime, and in the following section, we will discuss the joint analysis of the victim’s and the offender’s journey to crime in terms of mobility triangles. The length of the journey to crime also displays distance decay. Some studies find a “buffer zone” of decreased criminal activity just around the offender’s home (Rossmo 2000), but many other studies do not.

Quite a number of studies relate the length of the journey to crime to features of the crime itself or to characteristics of the offender (Wiles and Costello 2000). For example, various studies demonstrate that juvenile offenders travel shorter distances, that there are systematic differences in the average length of the journey to crime between types of offences, and that generally, the rewards of a crime increase with the length of the journey to it. Geographic offender profiling (Rossmo 2000) is an investigative method that utilizes the distance decay phenomenon to prioritize the search area for an unknown offender, and it has become the subject of a comprehensive literature (Harries and LeBeau 2007; Wilson and Maxwell 2007). Geographic offender profiling is predicated on the assumption that distance decay, which is a feature of aggregated journeys to crime by different offenders, also holds for multiple crimes committed by the same offender. Recent work that uses multilevel analysis to empirically disentangle the total variation in the length of the journey to crime into within-offender and between-offender components shows that this is barely the case, and that a large part of the variation is between offenders who have different ranges of operation (Smith et al. 2009).

Spatial Interaction Models

When goods, money, information, or people move between two locations, geographers call it spatial interaction. Spatial interaction models¹ are utilized to explain the quantities of these movements between locations. They can be used to analyze all types of movement flows that have an origin (starting point) and a destination (end). For example, spatial interaction models have been used to study travel between cities in terms of numbers of tickets sold (Zipf 1946), migration in terms of numbers of people moving between cities (Stouffer 1960; Wadycki 1975), trade between countries in terms of monetary value (Bergstrand 1985), inter-city telecommunication in terms of numbers of phone calls (Guldmann 1999), and also the journey to crime, in terms of numbers of offenders traveling from their homes to the locations where they commit crimes (Elffers et al. 2008; Peeters 2007; Rengert 1981; Reynald et al. 2008; Smith 1976).

There are various textbooks that discuss spatial interaction models in detail (Golledge and Stimson 1997; Haynes and Fotheringham 1984; Wilson and Bennett 1985). Here, we address the main features of these models and their use for understanding crime related movement patterns.

Spatial interaction models are regression models in which the unit of analysis is a pair of locations. What is modeled is the size of the interaction between these two locations. The independent variables in the regression equation include characteristics of the origins and the destinations, and one or more measures of the impedance or friction between the origin and the destination. Typically, this is the distance between the two locations, but it could also be formulated in travel time or cost. In one of the first uses of this model (Zipf 1946), it was shown, for travel by bus, by train, and by airplane, that the number of passengers traveling between 29 randomly chosen US cities was roughly proportional to the product of the populations of origin and destination divided by the distance between origin and destination. A simple spatial interaction model is given in (33.8).

$${M}_{ij} = k \cdot {P}_{i}^{\alpha } \cdot {P}_{j}^{\beta } \cdot {D}_{ij}^{-\gamma }$$
(33.8)

In (33.8), Pᵢ is a characteristic of the origin that we assume to represent the propulsiveness of the origin (it is a push factor that generates movement), Pⱼ represents the attractiveness of the destination (a pull factor), Dᵢⱼ is the distance between origin and destination, Mᵢⱼ is the amount of movement from origin to destination, and α, β, and γ are the parameters to be estimated. A more general formulation could include multiple variables that characterize the propulsiveness of the origin (variables with subscript i), the attractiveness of the destination (variables with subscript j), and the impedance or friction between origin and destination (variables with subscripts i and j).

Wilson (1971) distinguished four variants of the basic spatial interaction model, each suited to different substantive questions and data (Pooler 1994 addresses an extended family of spatial interaction models). In the total-flow-constrained model, only the total flow is fixed in advance. In the production-constrained model, the total outflow from each origin is fixed, so the model is used to estimate “where they go to”. In the attraction-constrained model, this is reversed: the total inflow into each destination is fixed and the model is used to estimate “where they came from”. Finally, in the doubly constrained model, both the total outflow from each origin and the total inflow into each destination are fixed, and the model can be used to estimate the effects of distance or, more generally, of the impedance factors.
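The production-constrained variant can be illustrated with a minimal sketch in which the fixed outflow of every origin is distributed over the destinations in proportion to their attractiveness and inversely to their distance. The zone sizes, distances, and the values of β and γ below are hypothetical; in an empirical application the parameters would be estimated from observed flows.

```python
import numpy as np

def production_constrained_flows(outflows, attractiveness, distance, beta, gamma):
    """Distribute each origin's fixed outflow over the destinations.

    outflows:        length-I array, fixed number of trips leaving each origin
    attractiveness:  length-J array, pull factor of each destination
    distance:        I x J array of origin-destination distances (all > 0)
    """
    weights = attractiveness[None, :] ** beta * distance ** (-gamma)
    shares = weights / weights.sum(axis=1, keepdims=True)   # rows sum to 1
    return outflows[:, None] * shares

# Hypothetical example: two origin zones and three destination zones.
outflows = np.array([100.0, 50.0])
attractiveness = np.array([10.0, 20.0, 5.0])
distance = np.array([[1.0, 2.0, 4.0],
                     [3.0, 1.0, 2.0]])

flows = production_constrained_flows(outflows, attractiveness, distance,
                                     beta=1.0, gamma=2.0)
```

By construction, every row of `flows` sums to the outflow of the corresponding origin, so the model only answers the question of where the trips go, not how many trips are generated.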

Although spatial interaction models do include distances between pairs of locations, they do not take into account the role of spatial structure. For example, the potential destinations from a given origin may be spatially clustered, and this clustering may either facilitate travel to the clustered destinations (an agglomeration effect), reduce it (a competition effect) or have no effect at all (the assumption of the standard spatial interaction model). Two conceptually similar adaptations to the spatial interaction model have been proposed (Cascetta et al. 2007). One approach to incorporate these possible effects is to introduce into the model a competing destination factor, which measures the accessibility of a destination to other destinations (Fotheringham 1983a, b), i.e., the sum of the distance weighted attractions of other potential destinations.
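The competing destination factor described above, the sum of the distance-weighted attractions of the other potential destinations, can be computed as in the following sketch. The inverse-power distance weighting with an exponent of 1 and the three-destination example are assumptions of the illustration; other weighting functions are possible.

```python
import numpy as np

def competing_destination_accessibility(attractiveness, dest_distance, decay=1.0):
    """For every destination j, sum attractiveness_k / d_jk**decay over all k != j."""
    d = np.asarray(dest_distance, dtype=float).copy()
    np.fill_diagonal(d, np.inf)            # a destination does not compete with itself
    a = np.asarray(attractiveness, dtype=float)
    return (a[None, :] / d ** decay).sum(axis=1)

# Hypothetical example: destinations 0 and 1 are close together, 2 is isolated.
attractiveness = np.array([10.0, 10.0, 10.0])
dest_distance = np.array([[0.0, 1.0, 10.0],
                          [1.0, 0.0, 10.0],
                          [10.0, 10.0, 0.0]])

accessibility = competing_destination_accessibility(attractiveness, dest_distance)
```

The two clustered destinations receive a high accessibility score and the isolated one a low score. Entering this score as an additional destination variable allows the data to indicate whether clustering facilitates travel (a positive coefficient) or reduces it (a negative coefficient).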

Alternatively, an intervening opportunities factor can be used to incorporate spatial pattern (Stouffer 1940, 1960). According to this approach, the distance effect itself is seen as theoretically redundant, as it represents the absorbing effects of destinations located between the origin and the potential destination (in Stouffer’s 1960 formulation), or of destinations located at shorter distances from the origin than the potential destination (in Stouffer’s 1940 formulation).

Although various applications of the spatial interaction model exclude the interaction of a spatial unit with itself (e.g., local migration, intra-city phone calls), there is little in the model that prevents it from being applied to such trips. The only issue is that for local trips, the distance in the denominator of the equation may be coded as zero. The solution is to replace the zero value with a small distance, e.g., the average distance between two random points within the origin area.

Spatial interaction models have often been estimated using ordinary least squares regression analysis on the linear equation that results from taking the natural logarithm of both sides of an equation like (33.8):

$$\ln ({M}_{ij}) =\ln (k) + \alpha \,\ln ({P}_{i}) + \beta \,\ln ({P}_{j}) - \gamma \,\ln ({D}_{ij})$$
(33.9)

When modeling spatial interaction, however, there is often no specific reason to follow the functional specification of (33.9), where the logarithm of the size of the movement stream is a linear function of the logarithms of the other variables. In many cases and for various reasons, including the presence of pairs of locations where the quantity of interaction is zero and logarithms cannot be taken, a Poisson or negative binomial model is to be preferred (Flowerdew and Aitkin 1982; Flowerdew and Lovett 1988). For expositions of the Poisson model family tailored to criminology, see Osgood (2000) and Berk and MacDonald (2008).
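The following sketch illustrates how a Poisson specification of the spatial interaction model accommodates zero flows. It fits a log-link Poisson regression with a small, hand-written iteratively reweighted least squares routine, so that no libraries beyond NumPy are assumed; in practice a standard statistical package would be used. The zones, their sizes, and the parameter values used to simulate the flows are hypothetical.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Log-link Poisson regression fitted by iteratively reweighted least squares."""
    # Crude starting values from a log-linear fit; adding 0.5 avoids log(0).
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtW = X.T * mu                       # IRLS weights equal the fitted means
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(3)
n_zones = 30
coords = rng.uniform(0, 10, size=(n_zones, 2))        # hypothetical zone centroids
origin_size = rng.uniform(50, 500, size=n_zones)      # push factor of each zone
dest_size = rng.uniform(50, 500, size=n_zones)        # pull factor of each zone

i, j = np.where(~np.eye(n_zones, dtype=bool))         # all ordered pairs with i != j
dist = np.linalg.norm(coords[i] - coords[j], axis=1)

# Simulate flows from a gravity model with known parameters.
alpha_true, beta_true, gamma_true = 0.8, 0.6, 1.5
log_mu = (-4.0 + alpha_true * np.log(origin_size[i])
          + beta_true * np.log(dest_size[j]) - gamma_true * np.log(dist))
flows = rng.poisson(np.exp(log_mu))                   # includes many zero flows

X = np.column_stack([np.ones(dist.size), np.log(origin_size[i]),
                     np.log(dest_size[j]), np.log(dist)])
est = poisson_irls(X, flows)                          # est[3] estimates -gamma
```

The pairs with zero flow remain in the analysis, whereas the log-linear OLS approach would have to drop them or add an arbitrary constant before taking logarithms.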

In theory, there are quite a number of possible applications for spatial interaction models in criminology and criminal justice. We can model the residential migration of offenders or victims, the journey to crime of victims, offenders or police officers who respond to a call; we can also use spatial interaction models to study co-offending patterns, where the amount of interaction is the number of co-offending relations between two neighborhoods. In practice, we find only a handful of applications of the spatial interaction model in criminology, and all of them analyze offenders’ journey to crime, i.e., the spatial interaction between the zones where offenders live and the zones where they commit offences (Elffers et al. 2008; Kleemans 1996; Peeters 2007; Rengert 1981; Reynald et al. 2008; Smith 1976).

In the first application of spatial interaction models to crime travel data (Smith 1976), various specifications of spatial interaction models, including Stouffer’s intervening opportunities model, were used to analyze the crimes that resulted in arrest in 1972 by the Rochester (New York) Police Department. Offenders’ residence and crime site were coded by census tract. Distances were calculated as distances between the centroids of census tracts. Using less elaborate sets of specifications, similar spatial interaction models were applied to burglary in Philadelphia (Rengert 1981) and Enschede, the Netherlands (Kleemans 1996).

More recently, spatial interaction models have been used to study the flow of crime in The Hague, in particular to analyze whether physical barriers (Peeters 2007) and social barriers (Reynald et al. 2008) reduce the criminal movement between neighborhoods, and to test various variants of Stouffer’s intervening opportunities theory (Elffers et al. 2008).

Disaggregate Discrete Location Choice Models

Spatial interaction models apply to aggregated crime data. They have been used to explain or predict the total numbers of crimes that originate in one area and take place in another. They do not apply to individual offenders and their specific characteristics, or to individual crimes. While it is possible to estimate spatial interaction models for specific classes of offenders or offences separately (Smith 1976, for example, also performed the analysis separately for property crime only), the model remains an essentially aggregated model.

The models discussed below are similar to production-constrained spatial interaction models, because they assume that the number of journeys to crime that originate from a given location is fixed; it is not part of what must be explained. They differ because they are applied to “disaggregated” data, i.e., to individual journeys to crime, and they model the behavior of individual actors. They are called discrete choice models because they predict which available alternative a decision maker will choose from a set of discrete alternatives. When the choice is between spatial entities (such as the country to visit on holiday, the neighborhood to move to, or the street corner to commit a robbery), it is called spatial (discrete) choice.

Discrete choice models (McFadden 1973) are explicitly based on a theory of random utility maximization (RUM). The first applications of discrete choice models were in the study of travel mode choice (i.e., the choice between train, bus, car, or airplane). Later, the model was also applied to spatial choice (Ben-Akiva and Lerman 1985).

The point of departure of the spatial discrete choice model is an actor who is faced with a choice among J discrete spatial alternatives, of which (s)he must choose only one. The actor could be a motivated offender who is about to choose an area to commit a crime. The actor is supposed to evaluate the utility (net gain, profits, satisfaction) that could be derived from each alternative, and the utility derived by actor i from alternative j is given by the following equation:²

$${U}_{ij} = \beta {P}_{j} + \gamma {D}_{ij} + {\epsilon }_{ij}$$
(33.10)

In this equation, Pⱼ is an attribute that varies across the spatial alternatives (e.g., economic deprivation of a city, or number of bars in a neighborhood), Dᵢⱼ is the distance that varies across spatial alternatives and across individuals, and β and γ are the parameters to be empirically estimated on the basis of the actually observed location choices. They indicate the relative importance of the attributes in the outcome of the utility evaluation. Finally, εᵢⱼ is a random error term that contains unmeasured relevant attributes of actors and alternatives, as well as measurement error.

The statistical model used to estimate the theoretical random utility model is the conditional logit model (Greene 1997; McFadden 1973), which is also known as the multinomial logit model.
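Under the standard assumption of the conditional logit model that the error terms are independently and identically distributed according to a type I extreme value distribution, the probability that actor i chooses alternative j equals the exponentiated systematic utility of j divided by the sum of the exponentiated systematic utilities of all alternatives. The sketch below computes these probabilities for the utility function in (33.10); the attribute values, distances, and parameter values are hypothetical.

```python
import numpy as np

def choice_probabilities(attribute, distance, beta, gamma):
    """Conditional logit probabilities; rows are actors, columns are alternatives."""
    v = beta * attribute[None, :] + gamma * distance    # systematic part of utility
    v = v - v.max(axis=1, keepdims=True)                # guards against overflow
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

# Hypothetical example: two offenders choosing among three target areas.
attribute = np.array([1.0, 2.0, 0.5])                   # e.g., attractiveness of area
distance = np.array([[0.5, 3.0, 1.0],                   # offender 0 to each area
                     [4.0, 1.0, 2.0]])                  # offender 1 to each area

probs = choice_probabilities(attribute, distance, beta=1.0, gamma=-0.8)
most_likely = probs.argmax(axis=1)
```

With a negative distance parameter, each hypothetical offender is most likely to choose the area that combines high attractiveness with proximity to his or her home. In estimation, β and γ are chosen to maximize the likelihood of the choices actually observed.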

The most useful feature of the disaggregated spatial choice model when compared to the aggregated origin-constrained spatial interaction model, is that it can be utilized to study the role of individual characteristics. By including interaction terms of individual characteristics (e.g., age) and destination characteristics (e.g., affluence), we can study whether some destination characteristics have different effects on the spatial choices of different types of offenders.

A feature of the conditional logit model is the assumption of independence from irrelevant alternatives (IIA), which implies that the ratio of two probabilities does not depend on the remaining probabilities. As this assumption is generally considered the main weak spot of the conditional logit model in general and of its application to spatial choice in particular (Pellegrini and Fotheringham 2002; Thill 1992), alternatives have been suggested, such as the nested logit model (Greene 1997: 921–926; Heiss 2002), which relaxes the assumption. Used in a spatial choice context (Hunt et al. 2004; Kanaroglou and Ferguson 1996; Pellegrini and Fotheringham 2002), the nested logit model is based on the assumption that the human process of spatial information processing and decision making is hierarchical. This means that spatial alternatives are perceived in more or less homogeneous clusters, and that spatial choice takes place in steps: first decide on a cluster, and once the cluster is chosen, choose a destination from within that cluster. For example, when we decide on where to go on vacation, we would first decide on a country, in the next phase decide on a region, and finally on a specific town or city. In the nested logit model, the IIA assumption is maintained within clusters but not between clusters.
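The IIA property of the conditional logit model can be verified numerically. In the sketch below, with hypothetical utility values, removing a third alternative from the choice set changes the individual probabilities but leaves the ratio of the probabilities of the first two alternatives unchanged.

```python
import numpy as np

def logit_probs(v):
    """Conditional logit choice probabilities for a vector of systematic utilities."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

v = np.array([0.6, -0.4, -0.3])        # hypothetical utilities of three areas

p_full = logit_probs(v)                # choice among all three areas
p_reduced = logit_probs(v[:2])         # third area removed from the choice set

ratio_full = p_full[0] / p_full[1]
ratio_reduced = p_reduced[0] / p_reduced[1]
```

Both ratios equal exp(0.6 − (−0.4)) = exp(1), regardless of whether the third area is available. In a spatial context this is often implausible: if the third area closely resembles the second one, its removal would be expected to benefit the second area disproportionately.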

There are various problems with the application of the nested logit model to spatial choice. The most important is that there is usually no way to determine the spatial choice structure empirically, so that the researcher is forced to specify in advance, and often arbitrarily, the hierarchical structure of the spatial choices that individuals are confronted with. This is difficult, sometimes impossible, especially in those cases where the perspective on spatial alternatives may depend on where the origin is. For example, because we are generally most familiar with nearby places, we distinguish between small spatial “pockets” in the nearby environment, while we use larger categories for distant areas. Thus, the actors in our model who live in different locations will have choice sets that are differently structured (Pellegrini and Fotheringham 2002).

The problem of defining the relevant choice set – what is actually the set of alternatives that actors choose from – is a general problem of the discrete spatial choice model (Thill 1992), and another manifestation of the modifiable areal unit problem (MAUP).

In criminology, there have been only a few applications of the model until now (Bernasco 2006; Bernasco and Block 2009; Bernasco and Nieuwbeerta 2005; Clare et al. forthcoming), all of which use the regular multinomial model (conditional logit model). In an analysis of neighborhood destination choice of burglars in the city of The Hague, the Netherlands (Bernasco and Nieuwbeerta 2005), the authors tested hypotheses on the effects of distance, affluence and social disorganization of neighborhoods on the likelihood of being a target area for burglary, and on whether these effects depended on the age and ethnic background of the offenders. Recognizing that quite a number of offences are committed by co-offenders, a subsequent study, using the same data on The Hague but including burglaries committed by co-offenders, established that the criteria used in the spatial choice of target areas did not differ between solitary offences and group offences (Bernasco 2006). Using the census tract as a spatial unit of analysis, another study addressed location choice of robbers in Chicago (Bernasco and Block 2009). In addition to distance, the authors also used measures of census tract dissimilarity between origin and destination tract as impeding factors to the journey to crime. They showed not only that crime trips are more likely between racially and ethnically similar census tracts, but also that individual robbers prefer to rob in census tracts where the majority of the population is of their own racial or ethnic group.

A study in Perth, Australia (Clare et al. forthcoming) used the model to study the effect of physical barriers and connectors on offenders’ choice of a destination for burglary trips.

A complication of using multinomial logit models for spatial choice is that the number of possible choice alternatives becomes unmanageably large when the spatial unit of analysis is small. McFadden (1978) describes a solution in which, for each individual choice, a random sample of the non-chosen alternatives is deleted from the choice set, and he shows that this procedure asymptotically yields unbiased estimates.
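The sampling-of-alternatives idea can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions, not McFadden's own procedure in full: for one observed choice it keeps the chosen alternative and a simple random sample of the non-chosen alternatives, which is equivalent to deleting the remaining non-chosen alternatives at random. The function name, the number of spatial units, and the sample size are hypothetical.

```python
import random

def sample_choice_set(chosen, alternatives, n_sampled, rng=None):
    """Return the chosen alternative plus a random sample of non-chosen ones."""
    rng = rng or random.Random()
    non_chosen = [a for a in alternatives if a != chosen]
    if n_sampled > len(non_chosen):
        raise ValueError("n_sampled exceeds the number of non-chosen alternatives")
    # The chosen alternative is always retained; the rest are drawn
    # without replacement and with equal probability.
    return [chosen] + rng.sample(non_chosen, n_sampled)

# Hypothetical example: 1,000 street segments, of which segment 417 was chosen.
segments = list(range(1000))
reduced = sample_choice_set(417, segments, n_sampled=49, rng=random.Random(0))
print(len(reduced))  # 50 alternatives instead of 1,000
```

The model is then estimated on the reduced choice sets, with a new sample drawn for every observed choice. The equal-probability sampling used here is the simplest case; other sampling schemes generally require a correction term in the utility function.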

Crime Triads: Convergence of Offenders, Victims and Crimes

Most studies of the journey to crime have focused on the offender’s journey to crime. A handful (Bullock 1955; Caywood 1998; Messner and Tardiff 1985) have, in addition, explored the victim’s journey to crime.

Routine activity theory, an influential perspective in criminology, asserts that the necessary and sufficient conditions for a crime are fulfilled when a willing offender and a desirable and unprotected target converge in space and time (Cohen and Felson 1979). This perspective views the journey to crime not as a dyad, but as a triad: it links the offender’s home, the victim’s home, and the location of the criminal event.

Spatial analyses of crime triads have used either a distance approach or a mobility triangle approach. Studies that have used the distance approach typically extend the types of analysis that have been used to study the length of the offender’s journey to crime. For example, they explore the distance between the offense and the offender’s and the victim’s home, and relate these distances to characteristics of the offender, the victim, and the offense (Block et al. 2007; Pizarro et al. 2007).

Another approach focuses on the mobility triangle: a typology of the spatial relations between offender, victim and criminal event (Groff and McEwen 2007; Tita and Griffiths 2005). Most studies have used the following five-category typology:

  1. Offender and victim both live at the location where the crime takes place (neighborhood or internal triangle)

  2. Offender and victim live in different locations, and the crime takes place at the victim’s address (offender mobility triangle)

  3. Offender and victim live in different locations, and the crime takes place at the offender’s address (victim mobility triangle)

  4. Offender and victim live at the same location, but the crime takes place elsewhere (offense mobility triangle)

  5. Offender and victim live in different locations, and the crime takes place at neither’s address but elsewhere (total mobility triangle)
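The five categories listed above follow mechanically from comparing three location identifiers, as the following minimal sketch shows. The function name and the example identifiers are hypothetical; a location can be any spatial unit, such as a census tract, a neighborhood, or an address, as long as the same unit is used for all three locations.

```python
def mobility_triangle(offender_home, victim_home, crime_location):
    """Classify a crime triad into the five-category mobility triangle typology."""
    same_home = offender_home == victim_home
    if same_home and crime_location == offender_home:
        return "internal"            # type 1: all three locations coincide
    if same_home:
        return "offense mobility"    # type 4: shared home, crime elsewhere
    if crime_location == victim_home:
        return "offender mobility"   # type 2: offender travels to the victim
    if crime_location == offender_home:
        return "victim mobility"     # type 3: victim travels to the offender
    return "total mobility"          # type 5: three distinct locations

# Hypothetical census tract identifiers.
print(mobility_triangle("tract_A", "tract_B", "tract_C"))
```

Because the classification depends only on equality of identifiers, the resulting distribution over categories changes with the size of the spatial unit chosen.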

In this typology, the term “location” is used generically, although most mobility triangle studies have used the census tract or the neighborhood as a spatial unit of analysis. The typology could, however, also be based on a much smaller spatial entity, such as an address. In such a case, for example, the internal triangle (type 1) would only apply when the crime is committed in the joint home of the offender and the victim (e.g., domestic violence). Groff and McEwen (2006, 2007) propose a distance-based mobility triangle that distinguishes between near (less than a quarter mile apart) and distant (more than a quarter mile apart) location pairs. In another study, they discuss cartographic techniques to visualize mobility patterns in crime triads (Groff and McEwen 2006).

The five-category typology can be related to characteristics of the victim, the offender and the offense. Multivariate approaches have used multinomial regression analysis to explore the relation between these characteristics as the independent variables, and the mobility triangle category as the dependent variable (Groff and McEwen 2007; Tita and Griffiths 2005).

Conclusion

There is an increasing spatial awareness in criminology. Testing for residual autocorrelation and using methods that account for spatial autocorrelation have become routine in the analysis of spatial data. Moreover, spatial crime data are more and more the object of theoretically inspired research that focuses explicitly on spatial effects in terms of spillover, diffusion, contagion, and displacement processes. In addition, the analysis of movement and mobility has moved beyond the description of the distance between offenders’ home and crime site, and is now used to test theories of travel behavior and target choice. Although the field of crime and criminal justice is certainly not the place where most new analytical methods are discovered and developed, the field is quick to absorb new analytical strategies and apply them to spatial crime data.

At present, there are a few issues that could be addressed to advance the field of spatial analysis of crime data. They include

  • The integration of the spatial and the temporal dimensions of crime,

  • The empirical measurement of travel of (mobile) targets and offenders,

  • The measurement and analysis at lower levels of spatial aggregation, and the utilization of methods that are robust to the skewed distributions resulting from it,

  • A greater focus on experimental work, both empirically (field experiments) and theoretically (simulation).

We discuss these four issues below.

There are not only places but also times that are suited for crime. However, the temporal dimension has received far less scholarly attention than the spatial dimension (Grubesic and Mack 2008; Ratcliffe 2006). Still, it is a common observation that crime varies spatio-temporally. For example, because offenders prefer to burgle unoccupied premises, residential burglaries are mostly daytime events while commercial burglaries are nighttime events (Ratcliffe 2001). To capture this variation, research on spatial crime distribution and on spatial movement should take into account spatio-temporal patterns.

All quantitative research on movement and crime location choice has been performed on origin–destination data (whereby the origin is typically the registered home address of an offender as the assumed starting point of the journey to crime). No research has yet studied the concrete spatial behavior of offenders, including routes and travel modes. In other research areas, such as time use and transportation research, instruments have been developed that measure subjects’ activities over time (Pentland et al. 1999; Schlich and Axhausen 2003). These time-budgets ask subjects about their activities in terms of what they were doing, when they were doing it, at what place, and with whom they were doing it. In a space-time budget instrument, subjects are also asked to indicate at what geographic location the activities took place (Wikström and Sampson 2003: 137–138). Such data would not only specifically document from where and along which roads offenders travel towards and away from their targets (rather than assume a straight line from home to crime site), but also allow us to compare, within the same individual, which time-space patterns are associated with offending and which ones are not.

Ethnographic fieldwork (St. Jean 2007) and quantitative research (Oberwittler and Wikström 2009) show that even at low levels of spatial aggregation, such as block groups or street segments, considerable variation exists. As discussed in this chapter and elsewhere (Weisburd et al. 2009), to capture small-scale variations and interactions, research in geographic criminology should use small spatial units of analysis. Because crime counts in small units generally have more skewed distributions, such data must be analyzed with analytical tools specifically adapted to modeling skewed data.

Finally, although this is more an issue of methodological than of analytical concern, spatial crime data have very seldom been experimental in nature. Empirically, there is some interesting experimental work on the spatial effects of certain forms of policing, in particular, on spatial displacement effects (for a review, see Braga 2001). On the theoretical front, a start has been made with spatially informed simulation studies (Elffers and van Baal 2008; Groff 2007; Johnson 2008).