Introduction

The overall purpose of this study is to examine whether violent crime rates are a good predictor of community health. Specifically, this study used local obesity rates as a proxy for community health. From an ecological standpoint, researchers often study obesity by investigating its associations with environmental characteristics (Sandy et al. 2013; Burdette and Whitaker 2004; Shahid and Bertazzon 2015; Ruijsbroek et al. 2015) and/or social structural characteristics, such as poverty (Halleröd and Larsson 2008; Chen and Truong 2012; Salois 2012; Rybarczyk et al. 2015; Huang et al. 2018) and race (Fan and Jin 2014). Furthermore, sociologists have often argued that the fear of crime and the lack of appropriate infrastructure lead to less physical activity, which results in obesity among residents. To date, however, no direct causal associations or universal relationships between crime and obesity have been found in the literature.

The selection of analytical methods may contribute to the inconsistent results found in previous research. Existing research has mainly used global regression models, which fail to consider the spatial non-stationarity within the relationships between variables (Sandy et al. 2013; Brown Barbara et al. 2014). Therefore, spatially weighted analytics such as geographically weighted regression (GWR) and geographically weighted Poisson regression (GWPR) have become increasingly recognized and used in public health research (Gilbert and Chakraborty 2011; Nakaya et al. 2005; Yang and Matthews 2012; Comber et al. 2011). This study uses GWR to examine whether violent crime is a good predictor of local obesity prevalence and whether that association varies spatially.

This paper demonstrates the use of GWR in modeling crime, health, demographic, and environmental data. The discussion addresses the following questions: Is it necessary to use GWR in crime and health modeling? How should the data be organized? How should appropriate variables be selected for model building? How should the results be interpreted? Does GWR perform better than ordinary least squares (OLS) regression? What should we pay attention to when mapping the results?

Geographically Weighted Regression and Ordinary Least Squares Regression

Spatial Non-stationarity

The concept of spatial non-stationarity was first introduced by Fotheringham, Charlton, and Brunsdon (Fotheringham et al. 1996). In their paper, they pointed out that even though researchers had recognized the spatial component in data, global models were still widely used in studies. However, in light of the First Law of Geography (Tobler 1970), global parameters cannot capture the spatial variation that exists in the relationships between the explanatory variables and the dependent variable (Fotheringham et al. 1996; Brunsdon et al. 1996).

In an effort to address the issue of spatial non-stationarity, researchers proposed localized spatial statistics such as the G statistics (Getis and Ord 1992), local indicators of spatial association (LISA) statistics (Anselin 1995), local ordinary least squares regression (OLS) and local Bi-square (Fotheringham et al. 1996), and geographically weighted regression (Brunsdon et al. 1996). Among these methods, the first two measure the levels of spatial clustering in geographical events and capture the spatial heterogeneity among them. The latter two extend traditional regression models by adding components that measure the strength of spatial associations.

Since the introduction of geographically weighted regression (GWR), an increasing number of studies have used the method in research related to public health (e.g., Chen and Truong 2012; Chi et al. 2013; Wen et al. 2010; Chalkias et al. 2013). However, because of concerns that GWR models may yield a higher correlation between model variables than OLS models (Cahill and Mulligan 2003; Cahill and Mulligan 2007; Troy et al. 2012; Deng 2015; Rybarczyk et al. 2015), many previous and current studies still apply global linear regression such as OLS for model building (Vandewater et al. 2004; Singh et al. 2008; Carroll-Scott et al. 2020). Given this, it is crucial to weigh the strengths and weaknesses of the two methods so that future researchers can choose the most appropriate one.

The GWR model can be expressed as:

$$ {y}_i={\beta}_0\left({u}_i,{v}_i\right)+\sum_{j=1}^k{\beta}_j\left({u}_i,{v}_i\right){x}_{ij}+{\varepsilon}_i $$

where $y_i$ is the value of the dependent variable at location $i$, $(u_i, v_i)$ are the coordinates of $i$, $\beta_0(u_i, v_i)$ is the local intercept, $\beta_j(u_i, v_i)$ is the local parameter for the $j$th explanatory variable $x_{ij}$ at point $i$, and $\varepsilon_i$ is the error term. The values of $\beta_j$ vary across locations. It is assumed that the observed data close to point $i$ have a greater influence on the estimation of $\beta_j$ than data farther away. A detailed explanation of the GWR method can be found in Fotheringham, Charlton, and Brunsdon (Fotheringham et al. 2002).
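To make the estimation step concrete, the local parameters are usually obtained by weighted least squares, as described in the GWR literature (Fotheringham et al. 2002). The matrix notation ($X$, $y$, $W$) and the Gaussian kernel with bandwidth $b$ shown here are a common textbook form and are assumptions for illustration; a given software package may define the kernel slightly differently:

$$ \hat{\beta}\left({u}_i,{v}_i\right)={\left({X}^T W\left({u}_i,{v}_i\right) X\right)}^{-1}{X}^T W\left({u}_i,{v}_i\right) y $$

$$ {w}_{im}=\exp \left[-\frac{1}{2}{\left(\frac{d_{im}}{b}\right)}^2\right] $$

where $X$ is the matrix of explanatory variables (with a leading column of ones for the intercept), $y$ is the vector of observed values, $W(u_i, v_i)$ is a diagonal matrix whose $m$th diagonal element is the weight $w_{im}$ given to observation $m$ when estimating at location $i$, and $d_{im}$ is the distance between locations $i$ and $m$.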

In addition to GWR, the spatial weights matrix has been introduced into other regression models, extending spatial statistics to methods such as the GWPR mentioned earlier (Nakaya et al. 2005) and the Gaussian semi-parametric GWR (SGWR; Villarraga et al. 2014).

Although GWR can be used to explore the spatial non-stationarity among variables, it might not be suitable for all datasets. Therefore, before applying it to a specific dataset, researchers need to justify that the spatially weighted method is indeed superior to global modeling.

Justification for Using GWR

Several criteria can be used to assess whether GWR is suitable for a study. A global regression model may be built for an initial assessment. After fitting the OLS regression, the Jarque-Bera test (Jarque and Bera 1980; Thadewald and Büning 2007; Barbu 2012) and the Koenker (BP) statistic (Wallace 2011; Avila-Flores et al. 2010) can be applied to the model residuals. Together, the two tests assess whether the relationships captured by the model are biased and whether they are consistent over the study region. If both statistics are statistically significant, we can be confident that the global model is biased, and another method, such as GWR, should be used (Ortolano et al. 2018; Avila-Flores et al. 2010).
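The two residual diagnostics can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by any particular GIS package: the function names are hypothetical, the Koenker (BP) version handles only a single explanatory variable (so the auxiliary regression R² reduces to a squared correlation), and the p-values use the chi-square approximation with 2 and 1 degrees of freedom, respectively.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera statistic and p-value (chi-square, 2 df) for OLS residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 df is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def koenker_bp_single(residuals, x):
    """Koenker (studentized BP) statistic for ONE explanatory variable.

    Squared residuals are regressed on x; the statistic is n * R^2 of that
    auxiliary regression. Assumes the squared residuals are not all equal.
    """
    n = len(residuals)
    squared = [e * e for e in residuals]
    mean_x = sum(x) / n
    mean_s = sum(squared) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sss = sum((si - mean_s) ** 2 for si in squared)
    sxs = sum((xi - mean_x) * (si - mean_s) for xi, si in zip(x, squared))
    lm = n * (sxs * sxs) / (sxx * sss)
    # Survival function of a chi-square variable with 1 df.
    return lm, math.erfc(math.sqrt(lm / 2.0))

# Hypothetical residuals and explanatory values, for illustration only.
example_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
example_x = [1.0, 2.0, 3.0, 4.0, 5.0]
jb_stat, jb_p = jarque_bera(example_residuals)
bp_stat, bp_p = koenker_bp_single(example_residuals, example_x)
```

Small p-values from both functions would correspond to the situation described above, in which the global model is suspected of being biased and non-stationary.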

However, the Jarque-Bera test and the Koenker (BP) statistic do not necessarily reveal whether spatial non-stationarity causes the bias. Therefore, other criteria should be included to justify whether using GWR produces a better model. Overall, we may compare the residual sum of squares, the Akaike information criterion (AIC) (Akaike 1974), which serves as a goodness-of-fit indicator (Yang and Matthews 2012), and the R² and/or adjusted R² of the GWR and OLS models to determine whether it is necessary to use GWR and whether the GWR model performs better than the OLS model.

The residual sum of squares is the sum of the squared residuals in the model. Models with a smaller residual sum of squares have a closer fit between the estimated values and the observed data. Furthermore, the model with the lower corrected AIC (AICc) value provides a better fit to the observed data. If the AICc value of the GWR model is lower than that of the OLS model by more than 3, it can be asserted that using the GWR model is beneficial.
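The comparison rule can be written as a short check. The sketch below uses hypothetical function names; the two AICc values in the example are the ones reported for the OLS and GWR models in the results sections of this chapter, and the threshold of 3 follows the rule stated above.

```python
def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and estimated values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def aicc_improvement(aicc_ols, aicc_gwr, threshold=3.0):
    """Return the AICc reduction and whether it exceeds the threshold."""
    difference = aicc_ols - aicc_gwr
    return difference, difference > threshold

# AICc values reported in this chapter: 421.58 (OLS) and 402.40 (GWR).
difference, gwr_preferred = aicc_improvement(421.58, 402.40)
```

With these inputs the reduction is well above 3, which matches the conclusion drawn in the results that the GWR model is the better-fitting one.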

R² also represents goodness of fit: it is the proportion of the variance in the dependent variable accounted for by the independent variables. The values of R² and adjusted R² show the strength of the association between the dependent and independent variables; the larger the value, the better the fit of the model. However, adding explanatory variables, including collinear ones, might inflate the value of R². Therefore, the adjusted R² should also be evaluated.
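For a global model, the relationship between the two measures can be sketched as follows. The function names and the numbers in the example are hypothetical; the adjustment shown is the standard one for a global regression with n observations and k explanatory variables, and GWR software typically replaces k with an effective number of parameters.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of explanatory variables k (global model)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: the same R^2 is worth less when more variables are used.
with_four_variables = adjusted_r_squared(0.5, n=101, k=4)
with_five_variables = adjusted_r_squared(0.5, n=101, k=5)
```

The second value is smaller than the first, which is why an extra variable that does not raise R² noticeably lowers the adjusted R².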

Bandwidth Selection

Unlike the ordinary least squares (OLS) regression model, which treats a study area as having the same association between dependent and independent variables everywhere, GWR uses a moving kernel with a fixed or adaptive bandwidth to define the spatial weight that each local unit receives in the analysis of the association between dependent and independent variables. Such a model produces localized regression outputs, i.e., a local R² for each spatial unit and local regression coefficients for each spatial unit and each independent variable in the model. Therefore, GWR can be used to search for locations that exhibit significantly strong (or weak) associations between the independent and dependent variables or to detect “hot spots” (Fotheringham et al. 2002).

The spatial weights matrix may be based on distances between local spatial units. In light of the First Law of Geography (Tobler 1970), the selection of a bandwidth may significantly affect the model outcome (Cahill and Mulligan 2007). A bandwidth that is too small may include too few data points for estimation, which may result in unstable parameter estimates, while a bandwidth that is too large may smooth out the spatial variation at the estimation point.
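The effect of the bandwidth on the weights can be illustrated with a small sketch. The Gaussian form used here is one common kernel and is an assumption, as are the function names and the example distances; the adaptive variant simply sets the bandwidth to the distance of the k-th nearest observation.

```python
import math

def gaussian_weights(distances, bandwidth):
    """Gaussian kernel weights: nearby observations get weights close to 1."""
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]

def adaptive_bandwidth(distances, k):
    """Bandwidth equal to the distance of the k-th nearest observation.

    The regression point itself (distance 0) is counted as the first neighbor.
    """
    return sorted(distances)[k - 1]

# Hypothetical distances from one regression point to five observations.
distances = [0.0, 1.0, 2.0, 4.0, 8.0]
narrow = gaussian_weights(distances, bandwidth=0.5)   # few points matter
wide = gaussian_weights(distances, bandwidth=10.0)    # almost a global model
local_bandwidth = adaptive_bandwidth(distances, k=3)
```

With the narrow bandwidth, only the closest observations carry meaningful weight, so estimates rest on very few points; with the wide one, even the farthest observation keeps most of its weight and local variation is smoothed away.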

Many GIS software packages allow users to choose between a pre-defined fixed bandwidth and an adaptive bandwidth. The value of a fixed bandwidth may come from previous experience or the literature. In this research, an adaptive bandwidth was selected. Two criteria are usually provided in GIS software to calculate the optimal bandwidth: the corrected Akaike information criterion (AICc) and cross-validation (CV). The two criteria produce similar results, but the AICc (Akaike 1974) can also serve as a goodness-of-fit indicator (Yang and Matthews 2012), which makes it more popular among researchers (Cahill and Mulligan 2007; Fotheringham et al. 2002).
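The idea behind cross-validation can be sketched with a deliberately simplified model. The code below is an assumption-laden illustration, not the procedure implemented in GWR4 or ArcGIS: it uses an intercept-only local model (a geographically weighted mean), a Gaussian kernel, a fixed bandwidth, and a plain search over candidate values supplied by the user. All names and data are hypothetical.

```python
import math

def loo_cv_score(coords, values, bandwidth):
    """Leave-one-out CV score for a geographically weighted mean."""
    score = 0.0
    for i, (ux, uy) in enumerate(coords):
        weighted_sum = 0.0
        weight_total = 0.0
        for m, (vx, vy) in enumerate(coords):
            if m == i:
                continue  # leave the target observation out
            d = math.hypot(ux - vx, uy - vy)
            w = math.exp(-0.5 * (d / bandwidth) ** 2)
            weighted_sum += w * values[m]
            weight_total += w
        score += (values[i] - weighted_sum / weight_total) ** 2
    return score

def select_bandwidth(coords, values, candidates):
    """Pick the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda b: loo_cv_score(coords, values, b))

# Hypothetical data: two spatial clusters with different typical values.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
values = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
best = select_bandwidth(coords, values, candidates=[1.0, 50.0])
```

Because the values differ sharply between the two clusters, the small bandwidth predicts each left-out value far better than the large one, so the search selects it. An AICc-based search follows the same pattern with a different score.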

Attribute Selection Process

It is also crucial to select a proper set of independent variables that are correlated with the dependent variable but not collinear with one another, so that the model is reliable. The first step in choosing variables is always to refer to the literature and previous studies so that the model makes sense. This study drew on ecological studies, including opportunity theory (Cohen and Felson 1979) and environmental criminology theory (Brantingham and Brantingham 1975), to identify candidate variables. The selected explanatory variables represent a racial component (Black population percentage); economic components such as housing occupancy, income, and employment rate; and physical/built environment characteristics. However, not all variables were suitable for building the health-crime regression model.

Because introducing highly correlated variables may affect the model outcome by falsely inflating the R², several criteria should be used to select the most appropriate set of independent variables for model building. A correlation analysis may be performed on both the dependent and independent variables before building the model. Independent variables that are not statistically significantly correlated with the dependent variable can be discarded. Among the remaining independent variables, those that are not highly correlated with each other can be retained in the final GWR model. In addition, the variance inflation factor (VIF) of each variable can be calculated to check for redundancy among explanatory variables. If the VIF is less than 5, the variable can be included in both the OLS and the GWR model.
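The screening steps can be illustrated for the simplest case of two explanatory variables, where the VIF reduces to 1 / (1 − r²) with r the Pearson correlation between them. The function names and data below are hypothetical; with more than two variables, r² would be replaced by the R² from regressing each variable on all the others.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF shared by both variables when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical block-group values for two candidate variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(x1, x2)
keep_both = vif < 5.0  # cutoff of 5, following the rule stated above
```

In this example the two variables are fairly strongly correlated, yet the VIF still falls below the cutoff of 5, so both would be retained.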

When building an OLS model, variables that are not significant, for example, at the 95% confidence level (t-values between −1.96 and 1.96), are often excluded. However, because spatial non-stationarity may exist in the relationship between the dependent variable and independent variables, an independent variable that is not significantly correlated with the dependent variable in an OLS model may still be acceptable in the GWR model, because it may be significantly related to the dependent variable at certain places.

In this research, variable selection for the GWR model was done using the GWR4 software. In the model settings, the “Geographical variability test” function was selected, which reports a statistic for each variable called the “difference of criterion” (DIFF of Criterion). It indicates whether the variable exhibits any spatial variability. Variables whose DIFF of Criterion values are greater than 2 should be treated as global rather than local variables. These variables were best removed manually if they also were not significantly related to the dependent variable at the global level.
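The decision rule described above can be summarized as a small helper. This is only a sketch of the rule as stated in the text, not part of GWR4: the function name, the returned labels, and the example values are hypothetical, and the cutoffs (2 for the DIFF of Criterion, 1.96 for the global t-value) are exposed as parameters.

```python
def screen_variable(diff_of_criterion, global_t_value,
                    diff_cutoff=2.0, t_cutoff=1.96):
    """Classify a variable as 'local', 'global', or 'remove'.

    diff_of_criterion: value reported by the geographical variability test.
    global_t_value: t-value of the variable in the global (OLS) model.
    """
    if diff_of_criterion <= diff_cutoff:
        return "local"    # shows spatial variability; keep as a local term
    if abs(global_t_value) >= t_cutoff:
        return "global"   # no spatial variability, but globally significant
    return "remove"       # neither spatially varying nor globally significant

# Hypothetical test outputs for three variables.
decisions = [
    screen_variable(-3.1, 0.5),
    screen_variable(4.0, 2.5),
    screen_variable(4.0, 0.7),
]
```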

Study Area and Data

The study area was the city of Akron, which is located in the center of Summit County in Northeast Ohio. Akron was one of the fastest-growing cities in America during the 1920s with a population peak of over 300,000 people. However, the city’s population has been continuously declining since then. The 2010 census showed that the city had 199,110 people. Figure 1 shows the locations of the downtown Akron neighborhoods and the population density by block groups in 2010 (population data retrieved from US Census Bureau). As shown in Fig. 1, the population of Akron is concentrated in neighborhoods surrounding downtown.

Fig. 1 Population density in the city of Akron, 2010

The spatial units for analysis are census block groups, which are the smallest units for which an extensive selection of census variables is available. Therefore, data retrieved in other spatial forms, such as point data of crime and BMI (for obesity), were aggregated into block groups for analysis.

Obesity and Crime Data

Self-reported data were used to calculate the BMI of individuals. These data came from Summit County’s Department of Motor Vehicles. The dataset contained a complete listing of self-reported heights and weights for all residents who held drivers’ licenses. The data cover all holders of drivers’ licenses between 2009 and 2014, a 5-year span that corresponds to the interval at which each license has to be renewed.

To further reduce potential biases, only records of heights and weights for license holders aged between 16 and 21 were considered for this study. This is based on the assumption that first-time reported heights and weights are more accurate than those on licenses renewed later, because many renewals probably did not include updated heights and weights. The data file of all license holders has approximately 440,000 records.

Crime data come from the Akron Police Department. Each year’s data file contains the time and date of the crime events. Collectively, the crime data cover 2009 to 2012. Each record in these files contains location information, geocoded by the Akron Police Department in the form of latitude and longitude.

Violent crime types were selected for analysis. Violent crime in this dataset includes assault, battery, murder, homicide, and manslaughter. Crimes of sexual assault were not included because of the complex nature of that crime.

The Uneven Distribution of Obesity Cases, Crimes, and Neighborhood Characteristics

Figure 2a, b shows the distributions of obesity rates and violent crime rates (i.e., the number of crime incidents per 100,000 population in each spatial unit). The distribution of obesity, as shown in Fig. 2a, is similar to the distribution of population density. However, the distribution of violent crime rates, as shown in Fig. 2b, differs from that of obesity rates in that the downtown center often has the highest crime rates.

Fig. 2 (a) Distribution of obesity rates; (b) Distribution of violent crime rates
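The rate definition used for the crime map (incidents per 100,000 population in each spatial unit) can be sketched as follows. The function name, the block group identifiers, and the counts are hypothetical, and the sketch assumes that each geocoded incident has already been assigned to a block group by a spatial join.

```python
from collections import Counter

def crime_rates_per_100k(incident_block_groups, populations):
    """Violent crime incidents per 100,000 population for each block group.

    incident_block_groups: one block group ID per incident (after spatial join).
    populations: mapping of block group ID to population count.
    Block groups with zero population are skipped to avoid division by zero.
    """
    counts = Counter(incident_block_groups)
    return {
        block_group: counts.get(block_group, 0) * 100000.0 / population
        for block_group, population in populations.items()
        if population > 0
    }

# Hypothetical incidents and populations for three block groups.
incidents = ["A", "A", "B"]
populations = {"A": 1000, "B": 2000, "C": 500}
rates = crime_rates_per_100k(incidents, populations)
```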

In addition, neighborhood characteristics, including socioeconomic and environmental variables, were included in the analysis. The socioeconomic data were obtained from the US Census Bureau, and the 2014 5-year estimates were selected. The variables include population density, Black percentages, renter-occupied housing percentage, median household income (MHHI, US dollars), and unemployment rates. These variables represent several socioeconomic components of Akron, including racial composition, housing occupancy, and economic status.

The built environment data, including the impervious surface percentage and tree cover percentage per block group and road density, were retrieved from the County of Summit GIS Hub and Open Data (http://data-summitgis.opendata.arcgis.com/).

Data Organization and Software

The obesity percentage of each block group was the dependent variable in the model. It was calculated by first computing each individual’s BMI from the sample. Individuals with a BMI larger than 30 were considered to be obese and were included in this study. The BMI >30 criterion follows the definition by the United States Centers for Disease Control and Prevention (https://www.cdc.gov/obesity/adult/defining.html). The number of obese individuals was aggregated to block groups and standardized by dividing by the block groups’ population counts to derive the obesity percentages (i.e., obesity rates). Independent variables in the model include violent crime rates and SES and environmental attributes of Akron, including Black population percentages, renter-occupied housing percentages, median household income, unemployment rates, impervious surface percentages, tree cover percentages, and road densities.
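The derivation of the dependent variable can be sketched as follows. The units are an assumption: the sketch supposes that license records store weight in pounds and height in inches, in which case BMI is 703 × weight / height². The function names, record layout, and example values are hypothetical.

```python
def bmi_from_imperial(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches."""
    return 703.0 * weight_lb / (height_in ** 2)

def obesity_percentages(records, populations, threshold=30.0):
    """Obesity percentage per block group.

    records: iterable of (block_group, weight_lb, height_in) tuples.
    populations: mapping of block group ID to population count.
    An individual counts as obese when BMI is larger than the threshold.
    """
    obese_counts = {}
    for block_group, weight_lb, height_in in records:
        if bmi_from_imperial(weight_lb, height_in) > threshold:
            obese_counts[block_group] = obese_counts.get(block_group, 0) + 1
    return {
        block_group: 100.0 * obese_counts.get(block_group, 0) / population
        for block_group, population in populations.items()
        if population > 0
    }

# Hypothetical license records and block group populations.
records = [("A", 200, 68), ("A", 150, 68), ("B", 150, 68)]
populations = {"A": 50, "B": 40}
percentages = obesity_percentages(records, populations)
```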

GWR was performed using the GWR4 software (Nakaya 2014; software available at https://gwrtools.github.io/) to build models between violent crime rates and obesity rates. An adaptive Gaussian kernel was selected for building the models, and AICc was used to assess the model’s goodness of fit. ArcGIS 10.5 (ESRI, Redlands, CA) was used for mapping the results.

Obesity, Violence, and Neighborhood Characteristics: Regression Analysis Results

OLS Regression Result

Table 1 shows the regression coefficients of the best OLS model built from subsets of the independent variables. The R² and adjusted R² of the OLS model are 43.43% and 41.31%, respectively, and the minimum AICc value is 421.58. The VIF values for the impervious surface percentages and tree cover rates are higher than 5 but lower than 7.5, which indicates moderate multicollinearity in the model. Other VIF values indicate no problematic levels of multicollinearity. All coefficients other than those of the environmental variables (impervious surface percentages, road densities, and tree cover rates) were statistically significant at the 5% level.

Table 1 OLS model result

The OLS model shows that violent crime rates and the Black population percentages are positively related to the obesity rates. Although it cannot be asserted that there exists a causal relationship between crime, race, and obesity, elevated crime rates and Black percentages in a neighborhood may be related to increasing obesity rates as indicated by the model outcome.

Also, median household income, unemployment rates, and renter-occupied housing rates are negatively related to the obesity rates. These results show an inconsistent relationship between the economic status of the neighborhoods and obesity rates, as an increase in median household income and an increase in the unemployment rate are both related to lower obesity rates in the neighborhood. Higher unemployment rates and renter-occupied housing rates are typically considered to reflect lower SES of a neighborhood, while a higher median household income often indicates a higher SES.

Overall, inconsistent results were found in the OLS model. Both the Koenker (BP) and Jarque-Bera tests are statistically significant, indicating that the OLS model is biased and not reliable. This may be because the OLS model fails to consider the spatial variations within the relationships (Fotheringham et al. 2002). Therefore, building a GWR model is necessary to analyze the local variations.

Geographically Weighted Regression Results

Four variables entered the GWR model: road densities, violent crime rates, Black population percentages, and unemployment rates. The model’s AICc value is 402.40, which is 19.18 lower than that of the OLS model. The R² and adjusted R² are 50.00% and 45.78%, respectively, which are higher than those of the OLS model. The optimal bandwidth is 50.86.

Figure 3a–d shows maps of the individual variables’ t-values, presented using graduated colors. A positive t-value represents a positive association between the variable and obesity, while a negative value represents a negative association. Locations with absolute t-values higher than 1.96 or 2.58 (i.e., t-values above 1.96 or 2.58, or below −1.96 or −2.58), which correspond to the 95% and 99% confidence levels, respectively, reveal statistically significant relationships. Mapping was done in ArcMap 10.5. For road densities and unemployment rates, values are divided into five categories based on the level of significance. Because all locations show positive associations between violent crime rates/Black population percentages and obesity rates, the t-values for these two variables are classified using Natural Breaks in ArcMap 10.5.
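The five significance categories used for the road density and unemployment maps can be expressed as a simple classification of local t-values. The function name and the category labels below are hypothetical; the breakpoints are the ±1.96 and ±2.58 thresholds given above.

```python
def classify_t_value(t):
    """Assign a local t-value to one of five significance categories."""
    if t <= -2.58:
        return "negative, 99% level"
    if t <= -1.96:
        return "negative, 95% level"
    if t < 1.96:
        return "not significant"
    if t < 2.58:
        return "positive, 95% level"
    return "positive, 99% level"

# Hypothetical local t-values for five block groups.
categories = [classify_t_value(t) for t in (-3.0, -2.0, 0.5, 2.0, 3.0)]
```

Each block group can then be symbolized by its category rather than by the raw t-value, which keeps non-significant areas visually distinct from significant ones.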

As shown in Fig. 3a, road densities are negatively related to obesity rates in Akron neighborhoods. Block groups located in the south and east of Akron show the most robust relationships. However, most block groups in Akron show no significant relationship between road densities and obesity rates.

Figure 3b shows that violent crime rates are positively related to obesity rates across the whole study area. Violent crimes in block groups of the south, southwest, and southeast have stronger associations with obesity rates. The race variable also has a positive association with obesity in the study area, as shown in Fig. 3c. The stronger relationships of this kind are mostly concentrated in downtown and north Akron.

In addition, the unemployment rates are negatively related to obesity rates. This is shown in Fig. 3d. Significant relationships are found in the south and east sides of Akron, as well as in downtown neighborhoods.

Fig. 3 (a) Road density t-values; (b) violent crime rate t-values; (c) Black population percentage t-values; (d) unemployment rate t-values

Discussion

Overall, the results of our analysis are consistent with previous research in that locations that experienced higher violent crime rates also experienced higher obesity rates. The GWR models performed better than the OLS models. As shown in Fig. 3b, violent crimes showed overall positive associations with obesity rates in the study area. These results are consistent with findings from previous research (Taylor 1995; Stafford et al. 2007; Sandy et al. 2013). The results of the GWR models showed that there was spatial non-stationarity in the associations between obesity, crime, racial, socioeconomic, and environmental variables. Some locations were more vulnerable to violent crime and obesity than others, especially locations in the urban center and the south side of Akron.

In general, it is expected that increasing crime rates, especially in neighborhoods located in urban areas, may be associated with elevated obesity rates. However, different neighborhoods show different degrees of association between crime and obesity. It is therefore worth mentioning that locations with higher regression coefficients between crime and obesity are not necessarily the locations with high crime rates and high obesity rates, as shown in Fig. 2a, b. Locations with higher crime rates and obesity rates are in the urban areas, in and around neighborhoods adjacent to the urban center, whereas areas with the highest coefficients are located in the southern part of Akron. These neighborhoods are older urban residential communities where housing values and household incomes are low and renter-occupied rates are higher than the national average (around 42% compared to the national average of 36%), according to the US Census Bureau (https://www.census.gov/quickfacts/fact/table/US/PST045218). Nevertheless, renter-occupied percentage was not found to contribute significantly to the obesity rate in the final GWR model. When the variable was used as the sole explanatory variable, however, locally significant relationships were found. As a result, more detailed research should be done locally to investigate whether other factors contribute to poor safety or health conditions.

Furthermore, no strong or significant associations are found between environmental variables and obesity, except for road densities. Even for road densities, only some block groups show significant t-values. These neighborhoods are located on the south side of Akron, which has low housing values and household incomes. The result also shows that there is spatial non-stationarity in the association between road density and obesity. More research should be done to explore the effects and extent of that association.

The racial variable was shown to be significantly associated with obesity. As shown in Fig. 3c, block groups located in downtown and the northern neighborhoods show stronger relationships with obesity. These neighborhoods have high housing rental rates and low household incomes according to the Akron Neighborhood Profiles.

Also, the economic variable of unemployment rate is found to be related to obesity in some block groups. Locations in the eastern part of Akron show negative associations between unemployment and obesity. This result is not consistent with findings from previous research that low economic status contributes to higher obesity rates (Laitinen et al. 2002). Therefore, solely trying to increase local income levels may not be the best strategy for improving local health status. Instead, more surveys or research should be done on the levels and patterns of food consumption and the nutrition levels of residents.

Strengths and Weaknesses

Comparing the results from the OLS model and those from the GWR model, we demonstrated that the GWR model produces statistically significant results that seem to be more practical than those from the OLS model. However, GWR is not a panacea for all model-building problems. Based on the discussion above, this section summarizes some weaknesses of the method and what users should pay attention to when addressing the issue of spatial non-stationarity.

First, it is necessary to justify whether GWR is needed. The procedure presented in this research uses a combination of the Jarque-Bera test and the Koenker (BP) statistic to indicate that the OLS model is biased and unreliable, so that applying GWR is preferable to achieve potentially better results. However, although the GWR method produced a significantly better model than the OLS model in this paper, the increase in the overall R² is not substantial. Therefore, further investigation and other measures may be needed to improve the explanatory power of the independent variables in the model.

Second, the selection of the bandwidth is crucial in model building. The most common selection criteria are the AICc and CV. Researchers can also assign a bandwidth based on existing literature and their particular research needs. Keep in mind, however, that both fixed and adaptive bandwidths may introduce a certain degree of generalization over the spatial non-stationarity. Ideally, the selected bandwidth would not only give the best statistical results but also be the most meaningful.

Third, the selection of model variables is also critically important. In this research, variables in the final model were selected based on the following criteria. First, population density was not suitable as a local predictor because its DIFF of Criterion is greater than 2. Also, impervious surface and tree cover percentages were not statistically significant in explaining the variations in the dependent variable, since their local absolute t-values were less than 1.96 (or 2.58). After these three variables were removed, MHHI and renter percentage contributed less to the model than the other variables. When removing a variable has little to no impact on the overall R², the variable should be eliminated. Therefore, neither variable was included in the final GWR model.

In summary, because GWR is a quantitative method, its users should choose the model parameters carefully in order to produce an optimal model and interpret the results appropriately.

Concluding Remarks

To investigate why existing studies found inconsistent results, this article applied OLS and GWR methods to explore the associations between obesity and crime, SES, and environment. In the OLS model, both the Koenker (BP) and Jarque-Bera tests were significant, which indicates that the global model was neither sufficient nor appropriate for exploring the associations between crime and obesity. Overall, GWR models revealed spatial non-stationarity in the associations being investigated and produced better results than OLS models.

Violent crimes were found to be generally positively related to obesity. It is worth noting that locations that already experience higher crime and obesity rates may not show the same effects as other places. Therefore, it is worth examining why specific neighborhoods that do not currently have high crime and obesity rates are still vulnerable to change. Policies such as revitalizing or gentrifying downtown Akron may attract new investment into downtown and thereby improve the local economy. Accordingly, the police department should increase patrols in urban areas to help ensure the safety of neighborhoods.

Both the environmental and SES variables showed local variation in the GWR model. This variation confirms the existence of spatial non-stationarity in their relationships with local health status. According to the results, even within a single city, the same strategies might not work for all neighborhoods. Policies must be adjusted to target local situations. Given the spatial non-stationarity found in this study, more detailed investigations should be conducted locally so that appropriate measures can be taken to reduce neighborhood crime and health problems.

Finally, the study reported here is one of the relatively few that look at the associations between violent crime and public health from a quantitative perspective. The results reported here should contribute to our understanding of how, spatially, violence, socioeconomic, and environmental conditions may influence local health.