1 Introduction

Geographically Weighted Regression (GWR) allows the spatial modeling of non-stationary data and was defined by Fotheringham et al. (2002). However, this technique is used when the distribution of the data is Gaussian. In many applications the dependent variable represents a count which makes classic GWR inappropriate to model this type of data. Poisson and negative binomial are the most adequate distributions for the modeling of count data, and Geographically Weighted Poisson Regression (GWPR) was developed by Nakaya et al. (2005). Kobayashi and Lane (2007) and Hadayeghi et al. (2010) used GWPR because Geographically Weighted Regression for a negative binomial distribution was not available. The advantage of a negative binomial distribution is the ability to model data with overdispersion (Hilbe 2011) because this type of data has an additional parameter, α. Furthermore, this distribution is a generalization of the geometric and Poisson distributions, when α=1 and α→0, respectively.

The negative binomial and Poisson distributions belong to the exponential family of distributions (Nelder and Wedderburn 1972). The Fisher Score algorithm of Generalized Linear Models (GLM) can be extended to the spatial case with a negative binomial distribution considering a modification in the Iteratively Reweighted Least Squares (IRLS) procedure, which results in the proposed Geographically Weighted Poisson Regression (GWPR). The parameter α of the negative binomial, which is supposedly known through the IRLS method, can be estimated in a subroutine with the maximum likelihood method using the Newton-Raphson (NR) algorithm (Hilbe 2011).

Since the work of Nakaya et al. (2005), many other studies have attempted to investigate the limitations and problems associated with Geographically Weighted Regression, such as spatial error autocorrelation (Leung et al. 2000b), multicollinearity (Wheeler and Tiefelsdorf 2005; Geniaux et al. 2011) and extreme coefficients including sign reversal (Farber and Paez 2007). In a more recent work, Paez et al. (2011) investigated spatially varying relationships in a simulation-based study using diverse levels of multicollinearity between the explicative variables. These researches concluded that caution should be exercised when attempting to interpret coefficients obtained through Gaussian GWR.

The objective of this paper is to extend the IRLS and NR methods to establish Geographically Weighted Regression for a negative binomial distribution, which is denoted GWNBR, for the modeling of non-stationary count spatial data with overdispersion. In addition, the procedure developed by Paez et al. (2011) will be used to investigate the performance of our proposed method in the presence of multicollinearity between the explicative variables. Section 2 presents the specifications of the GWNBR methodology. Section 3 presents some simulations and real applications that were used to evaluate the potential of GWNBR compared with GWPR and to conduct the simulations presented by Paez et al. (2011). Some conclusions are drawn in Sect. 4.

2 Specifications of Geographically Weighted Negative Binomial Regression

The most used global Negative Binomial Regression (NB-2) considers a logarithm link function. Parameterizing this model in terms of μ j /t j , where t j is an offset variable, μ j is the predicted mean, α is the parameter of overdispersion, β k is the parameter related to the explicative variable x k , for k=1,…,K, and y j is the j-th dependent variable for j=1,…,n we have

(1)

where NB represents Negative Binomial.

GWNBR is an extension of the global or non-spatial model (1), which allows the spatial variation of the parameters β k and α. Without limiting the functional form of this variation, GWNBR produces non-parametric surfaces of the parameter estimates. This local model is described as the following:

(2)

where (u j ,v j ) are the locations (coordinates) of the data points j, for j=1,…,n.

The parameter estimation of the global model (1) is performed interactively with the combination of the NR and IRLS methods, i. e., using \(\hat{\alpha}\), which is estimated by the NR method, we estimate the vector β using the IRLS method. Thus, from this new \(\hat{\boldsymbol{\beta}}\), we update \(\hat{\alpha }\), and so forth until convergence is obtained. However, modifications in the NR and IRLS algorithms are necessary to incorporate the local variations.

Let us consider first the modified IRLS method. Initially, suppose that the parameters α(u,v) are known. Then, the log-likelihood of GWNBR, as function of β(u,v), is given by

(3)

where for j=1,…,n,

(4)
(5)
(6)

Considering that the surfaces of the parameters β(u,v) are approximately planar (Fig. 1) in the neighborhood of the regression point i, the local log-likelihood of GWNBR at location i can be approximated as

(7)

where, for i=1,…,g, μ j (β(u i ,v i )) is the predicted mean at location j with the parameters at regression point i:

(8)

and w(d ij ) is the geographical weight of observation j at regression point i, which depends on the distance between them. A kernel function can be used to determine these weights, as will be described in Sect. 3.

Fig. 1
figure 1

Surface of β k which is locally planar at point i

Note that it is possible to estimate the parameters at any regression point i. However, the estimated means are calculated only for the observed points j at which the information x jk is known.

The maximum local log-likelihood (7) provides estimates for the parameters β(u i ,v i ) of GWNBR at location i. Thus, using the results provided by Nakaya et al. (2005) for GWPR, the analytical solution for (7) is given by

(9)

where X is an n×k matrix of the explicative variables

(10)

W(u i ,v i ) is an n×n GWR diagonal weighting matrix for point i

(11)

and A(u i ,v i )(m) is an n×n GLM diagonal weighting matrix for iteration m and location i

(12)
(13)

In this paper, the IRLS method will be used with the observed Fisher Information Matrix. Thus, the elements \(a_{ij}^{(m)}\) (j=1,…,n) of GWNBR (12) are the following:

(14)

As in GWPR, the covariance matrix of the parameter estimates can be estimated by

(15)

where

(16)

and the elements of W(u i ,v i ) and A(u i ,v i ) are given by (11) and (12), respectively.

After an estimate of β(u i ,v i ) is obtained, the parameters α i will be estimated using the NR method based on the local log-likelihood (7). To facilitate the calculation, the parameters r i , where r i =1/α i , will be estimated first. Thus, rewriting (7), we have

(17)

Maximizing the local log-likelihood (17) using the univariate NR method, we obtain

(18)

where \(U_{i}^{(m)}\) and \(H_{i}^{(m)}\) are the first and second derivatives of the local log-likelihood with respect to \(r_{i}^{(m)}\), i.e.,

(19)
(20)

where ψ(.) e ψ′(.) are the digamma and trigamma functions, respectively, which are given by

Using delta method, we find that, for a function g(.) that is differentiable in θ, if the distribution \(\hat{\theta}_{n}\rightarrow N(\theta, \sigma^{2}_{n})\) achieves convergence, then the distribution \(g(\hat{\theta}_{n})\rightarrow N (g(\theta), [g'(\theta)]^{2} \times\sigma^{2}_{n} )\) also converges (Casella and Berger 2001). If, under certain conditions, the estimators that maximize the local likelihood are asymptotically normal, unbiased and consistent (Staniswalis 1989), then

(21)

because \(\mathit{Var}(\hat{r}_{i})=-1/H_{i}\), where H i is given equation by (20), and \([g'(r_{i})]^{2}= 1/r_{i}^{4}\).

Thus, for each regression point i, the NR and IRLS algorithms are used alternately until the parameter estimates achieve convergence.

To complete the fitting of the model, it is necessary to estimate the bandwidth of the chosen kernel function, which helps determine the weights w(d ij ). One possibility could be to estimate it such that it minimizes the corrected AIC criterion (AICc):

(22)

where k is the effective number of parameters and L(β,α) is the log-likelihood of GWNBR shown in (3). Hereafter, the term “AIC” represents AICc.

The effective number of parameters of GWNBR can be written as k=k 1+k 2, where k 1 and k 2 are the effective number of parameters due to β and α, respectively. Following the method developed by Nakaya et al. (2005), k 1 is given by the trace of the matrix R, which is given by elements

(23)

where X j is the j-th row of X.

However, to date, it has not been possible to estimate k 2, i.e., the contribution of the surface of \(\hat{\boldsymbol{\alpha}}\) on the effective number of parameters of the model. Consequently, we opted to estimate the bandwidth using the cross-validation criterion given by (Fotheringham et al. 2002):

(24)

where \(\hat{y}_{\neq j}(b)\) is the estimated value for point j, omitting the observation j and b is the bandwidth.

Note that the indetermination of k 2 does not prevent the adjustment of GWNBR. However, this indetermination makes it difficult to compare the models because the complexity of the GWNBR, which is given by the effective number of parameters, is unknown.

2.1 GWNBRg model

To avoid the difficult associate with the estimation of k 2, we propose the Geographically Weighted Negative Binomial Regression with α global methodology, which is namely GWNBRg. In this model, the spatial variation is allowed only for β(u i ,v i ), i. e.,

(25)

where the parameters are the same as in (2).

In the GWNBRg model, the estimation of the parameter α is made globally, i.e., we assume that all of the parameters in the model are stationary, and we estimate a global overdispersion \(\hat{\alpha}\) to be used in the local estimates β(u i ,v i ). Consequently, we propose that the estimate of α in the GWNBRg model will be the same as that obtained through non-spatial (or global) negative binomial regression.

The parameters β(u i ,v i ) are estimated using the IRLS method, as in the GWNBR model described earlier, assuming that \(\hat{\alpha_{i}}=\hat{\alpha}\) for all i. Note that it is not necessary to alternate the NR and IRLS methods, because once \(\hat {\alpha}\) is estimated globally, we estimate β(u i ,v i ) for each regression point i using only the IRLS method.

Because there is no spatial variation for α, its contribution to the effective number of the parameters in the model is the unity, i.e., k 2=1. Consequently, the bandwidth can be found using the AIC criterion (22), where

(26)

and \(k=\operatorname{tr}(\mathbf{R})+1\).

Thus, the objective of the GWNBRg model is to perform the GWNBR methodology in a simplified manner, which has the disadvantage of obtaining a biased estimate of α and the advantage of being a model with known complexity. We hope that the GWNBRg methodology be a better approach for the modeling of spatial count data with overdispersion than the GWPR method, which considers that α is equal to zero.

3 Simulations and applications

This section aims to show the fit of the GWNBR and GWNBRg models to simulated and real data. Using simulated data, our objective is to evaluate whether the models correctly reproduce the parameters. The real data was used to show a practical application of the proposed model. In both cases, the results will be compared with GWPR and Negative Binomial (NBR) and Poisson (PR) regressions.

It is important to mention that the discussion presented here is mainly focused on the comparison between different distributions for the modeling of count data. The analysis of the limitations and problems associated with the GWR framework, such as the spatial error autocorrelation (Leung et al. 2000b), multicollinearity (Wheeler and Tiefelsdorf 2005; Geniaux et al. 2011) and extreme coefficients including sign reversal (Farber and Paez 2007) will not be treated as in previous studies, but some of the results show that our approach does retain some of these problems.

The simulations will be conducted using the SAS 9.2 software and the procedure presented by Paez et al. (2011), i.e., we will simulate spatial data and spatial coefficients, and we will introduce varying levels of correlations between variables x 1 and x 2 (from −0.75 to 0.75 by 0.25 and 100 times for each level), using a transformation similar to that used by Wheeler and Tiefelsdorf (2005):

$$ x_2^{\theta}=\sin(\theta)x_1+\cos( \theta)x_2 $$
(27)

where θ is the correlation level.

The correlation between variables x 1 and x 2 and the correlation measured on simulated data are presented below. As shown, the pattern is approximately the same.

figure a

3.1 Simulation of GWNBR

The main objective of the GWNBR and GWNBRg frameworks is the modeling of non-stationary count data with overdispersion. Thus, we first simulate data with these characteristics using the following structure:

(28)

where

$$ \begin{aligned} &\mu(u_j,v_j) =\exp \bigl\{b_0(u_j,v_j)+b_1(u_j,v_j)x_{j1} \\ &\hphantom{\mu(u_j,v_j) =}{}+ b_2(u_j,v_j)x_{j2} \bigr\} \\ &b_0(u_j,v_j) = 1 \\ &b_1(u_j,v_j) = \sqrt{[u_j- \overline{u}_j]^2 \times[v_j- \overline{v}_j]^2} \\ &b_2(u_j,v_j) = 2 \bigl\{- \bigl[(u_j-\overline{u}_j)/5\bigr]^2 - \bigl[(v_j-\overline{v}_j)/5\bigr]^2 \\ &\hphantom{b_2(u_j,v_j) =}{}+0.13 \bigr\} \\ &\overline{u}_j=-40.7693, \qquad\overline{v}_j=-19.5915 \\ &\alpha(u_j,v_j) = 2 \bigl\{10^{-7}[|v_j|]^5 \bigr\}^2 \end{aligned} $$
(29)

The explicative variables x j1∗ and x j2∗ were initially simulated from a Uniform(0,1) distribution, and the variable x 1 was then updated to be spatially dependent by \(x_{j1}:= (\frac {x_{j1}*|u_{j}|}{1000} )^{2}+ (\frac{x_{j1}*|v_{j}|}{90} )^{2}\) and the variable x 2 was updated such that it is correlated with x 1 using (27). We use the shape of the Brazilian state Espirito Santo as a geographical reference to simulate the data (77 municipalities) because irregular tessellations are more representative of real geographic systems, as described by Farber et al. (2009). Figure 2 shows the spatial distributions of the parameters, and Fig. 3 shows the spatial distributions of the variables, assuming that the correlation between x 1 and x 2 is equal to zero.

Fig. 2
figure 2

True coefficients of surfaces b 1, b 2, and α of the GWNBR framework

Fig. 3
figure 3

Spatial patterns of y, x 1, and x 2 of the GWNBR framework

To confirm whether the method used to find the best bandwidth affects the results based on the suggestion provided by Farber and Paez (2007), we fit the GWNBR, GWNBRg, and GWPR using AIC and Cross-Validation (CV) approaches, but the comparison between the models will be performed using log-likelihood. To determinate the geographical weights using the nearest-neighbor bandwidth, we will use the adaptive bi-square kernel given by

$$ w_{ij}=\left\{ \begin{array}{l@{\quad}l} [1-(d_{ij}/b)^2]^2 , & \text{if $j$ is one of the $N$-th nearest}\\ & \text{neighbors of $i$;}\\ 0, & \text{otherwise.} \end{array} \right. $$
(30)

where d ij is the distance between the j-th and i-th nearest neighbors and b is the bandwidth.

To determine the geographical weights by the distance bandwidth, we will use the Gaussian kernel function given by

$$ w_{ij}=\exp\biggl\{- \frac{1}{2}(d_{ij}/b)^2 \biggr\} $$
(31)

where d ij is the distance between the j-th and i-th locations and b is the bandwidth.

Table 1 shows the frequency of the best model using cross-validation or the AIC approach for the distance or the nearest-neighbor bandwidths. As shown, approximately 65 % of the best fits were obtained using the GWNBR method, and approximately 33 % of the best fits were obtained from the GWPR framework.

Table 1 Frequency of the best model for the fitting of GWNBR simulated data

Table 2 shows the frequency of the best model using only cross-validation for the distance or the nearest-neighbor bandwidths. As shown, the percentage of times that GWNBR is chosen as the best model increases to 72 %, and increases to 14 % for GWNBRg.

Table 2 Frequency of the best model for the fitting of GWNBR simulated data

Figures 4 and 5 show the variability of the log-likelihoods for the GWNBR for no correlation case and PR and NBR models, respectively. In general, when the AIC was minimized to find the optimal bandwidth, the variability of the log-likelihood was smaller than when the CV was used to find the optimal bandwidth. We found the same results for each level of correlation between x 1 and x 2 and for GWNBRg and GWPR. Also, it is interesting to note that the NBR model is influenced by the correlation between variables x 1 and x 2, whereas the Poisson regression is not.

Fig. 4
figure 4

Distributions of log-likelihood of GWNBR

Fig. 5
figure 5

Distributions of log-likelihood of PR and NBR

Because the cross-validation approach apparently gives the best results for the modeling of count data, let us analyze the variability in the optimal bandwidth. Figure 6 shows the box-plot of the optimal bandwidth using cross-validation or AIC for the distance and nearest-neighbor bandwidths of GWNBR. Figure 7 shows the box-plot of the optimal bandwidth using cross-validation or AIC for the distance and nearest-neighbor bandwidths of GWPR. In addition, Fig. 8 shows the box-plot of the optimal bandwidth using cross-validation or AIC and for the distance and nearest-neighbor bandwidths of GWNBRg.

Fig. 6
figure 6

Distributions of the optimal bandwidth of GWNBR

Fig. 7
figure 7

Distributions of the optimal bandwidth of GWPR

Fig. 8
figure 8

Distributions of the optimal bandwidth of GWNBRg

In summary, the correlation between variables x 1 and x 2 appears to influence the distance bandwidth used in the GWNBR framework, but this correlation does not affect the results when the type of bandwidth is nearest-neighbor and when the model is GWPR or GWNBRg.

Table 3 shows some median statistics for the goodness of fit. Column b refers to the bandwidth, which is expressed in terms of the numbers of points included in the local regressions, see (30), and which was found by minimizing the AIC for the GWPR and GWNBRg models and minimizing the CV for the GWNBR model. The column labeled Par indicates the effective number of parameters, see (23) and the column labeled L(β,α) indicates the log-likelihood.

Table 3 Median of the results obtained from the GWNBR simulations, where b refers to the bandwidth and Par indicates the effective number of parameters

The negative binomial regression shows a goodness of fit that is better than that obtained from the Poisson regression. This finding was expected because the simulated data were simulated according to a negative binomial distribution. In addition, allowing α to vary spatially, the best estimated model was GWNBR compared with GWNBRg, which uses a fixed α. As expected, the Poisson regression was the worst model.

Figures 9, 10 and 11 show the differences between the real and the values estimated by the GWNBR, GWPR and GWNBRg models, respectively. In summary, GWNBR was the best model, which successfully adjusted the simulated data based on a spatial negative binomial distribution. In addition, GWNBRg, despite of its biased estimate for the overdispersion parameter α also gave good results. Unfortunately, the variability was very large as was observed by Paez et al. (2011) with a small sample size. It is interesting to study this behavior for larger samples.

Fig. 9
figure 9

Distributions of the differences between the real values and the values estimated with the GWNBR model

Fig. 10
figure 10

Distributions of the differences between the real values and the values estimated with the GWPR model

Fig. 11
figure 11

Distributions of the differences between the real values and the values estimated with the GWNBRg model

Figure 12 shows the sample distributions of RMSE of the parameter estimates using 700 replications for each. In general, the approach that found the best bandwidth and the type of bandwidth produced the same distribution of the parameters estimates. As shown, the distribution of RMSE for GWNBR model presents smaller values and less variability than GWNBRg and GWPR models.

Fig. 12
figure 12

Distributions of the RMSE of the parameters estimates

3.2 Simulation of GWPR

The main objective of this section is to show that GWNBR can also model Poisson data that are spatially distributed and to therefore reinforce that GWPR is a special case of GWNBR. The simulated data were created 100 times, as in the previous section, assuming that α is equal to zero and that variable x j1 follows a normal (0,1) distribution. We choose to not present all detailed results of the GWPR simulated data because of length constraints and because the Poisson distribution was very stable to the correlation between variables. The results correspond to the median values.

(32)

where

$$ \begin{aligned} &b_0(u_j,v_j) = 0.000005[|u_j|]^3+0.000005[|v_j|]^3 \\ &b_1(u_j,v_j) = \sqrt{[u_j- \overline{u}_j]^2 \times[v_j- \overline{v}_j]^2} \\ &\overline{u}_j=-40.7693, \qquad\overline{v}_j=-19.5915 \end{aligned} $$
(33)

The bandwidth was found through an adaptive bi-square kernel function, see (30) and the graphic of the minimization of the CV was achieved through a golden section search algorithm (resulting in b=15), as shown in Fig. 13.

Fig. 13
figure 13

Bandwidth of the GWNBR model that minimize the CV

Table 4 shows some statistics for the goodness of fit obtained for the models. As expected, the GWPR shows a goodness of fit that is superior to that obtained with the other models. However, the log-likelihood of GWNBR is close to that of GWPR, which shows that this model is also a candidate. The quality measures for GWNBRg were worse than those obtained with the other models. In addition, it is interesting to see that, despite the fact that the data were derived from a Poisson distribution, the Poisson regression was the worst model because the data are spatially dependent. Even the negative binomial regression showed a better fit than the PR model, because its interpretation of the spatiality was captured in the overdispersion parameter.

Table 4 Results obtained from simulations using the GWPR model

Figure 14 shows the estimated surfaces of parameters b 0 and b 1 and the real maps based on (33). As shown, all of the models estimated the intercept correctly, but the adjustment of parameter b 1 by GWNBRg was the worst. In contrast, the values for parameter b 1 estimated by the GWPR and GWNBR models were almost the same. This result can be explained by the fact that the parameters α of GWNBR were close to zero (varying between 0 and 0.076) and, as was found previously, the negative binomial distribution approximates a Poisson distribution when α tends to zero.

Fig. 14
figure 14

Real and estimated surfaces of the parameters of the GWPR

The GWNBRg model does not to differentiate between non-stationarity and overdispersion well. Because α is estimated under the assumption that all of the parameters are constants, there is confusion between the two concepts. Thus, we do not recommend that GWNBRg should be used to model count data without overdispersion.

To demonstrate that GWPR is a special case of GWNBR, we set the parameter α of the GWNBR framework to zero, and the results, including the quality measures, such as the log-likelihood and AIC, obtained were exactly the same up to the sixth decimal point. Thus, we concluded that GWNBR is able to model non-stationary data from a negative binomial distribution and from Poisson distribution. In addition, if α=0, the GWNBR framework overrides GWPR.

3.3 Application for Tokyo, Japan

The GWPR technique was developed by Nakaya et al. (2005) and used to model the working-age deaths in the Tokyo metropolitan area of Japan. The results indicate that there are significant spatial variations in the variables and that, consequently, the application of a traditional ‘global’ Poisson model would yield misleading results. Thus, our objective is to compare the results found by Nakaya et al. (2005) with those estimated by the GWNBR model (setting α=0, i.e., GWPR from GWNBR) and to also estimate the overdispersion parameter α using GWNBR.

Another objective is to determine whether GWPR is the best approach to model the mortality data from Japan and whether GWNBR is capable to produce the same results. In addition, we aimed to show the consistency of the SAS %gwnbr macro compared with the GWR 3.x software.

The explicative variables are the following:

$$ \begin{array}{@{}rcl} b_1 &=& \mbox{proportion of professional and technical workers} \\ && \mbox{(PRO);} \\ b_2 &=& \mbox{proportion of elderly people, i.e., over 64~years of} \\ && \mbox{age (OLD);} \\ b_3 &=& \mbox{rate of house-ownership (OWNH); and} \\ b_4 &=& \mbox{unemployment rate (UNEMP).} \end{array} $$

Table 5 shows the parameter values estimated by Nakaya et al. (2005) using the software GWR 3.x software and the parameters estimated by GWNBR using the %gwnbr macro developed in SAS/IML. Table 6 shows the standard errors of the parameters estimates. The bandwidth estimated by Nakaya et al. (2005) using the adaptive bi-square kernel given by (30) and minimizing the AIC criterion was b=95 neighbors. The bandwidth estimated using GWNBR was b=18 km, this bandwidth was found using the Gaussian kernel function given by (31) and minimizing the CV.

Table 5 Summary of the parameter estimates obtained for the data from Tokyo, Japan
Table 6 Summary of the standard errors of the parameters estimates

The comparison of the results shown in Table 5 and Table 6 reveals the same estimates, up to the third decimal point, between GWPR (GWR 3.x) and GWNBR (%gwnbr macro and α=0), which indicates that the algorithm written for GWNBR is correct.

Table 5 shows no estimates for the overdispersion parameters α for GWNBR, but these values vary between 0.0001 and 0.01 and are not significant in only 5.7 % of the areas. In addition, the log-likelihood of the GWPR and GWNBR frameworks were close (−985.9 and −980.7, respectively), which indicates that both models exhibited a similar goodness of fit. Thus, we conclude that the data does not exhibit overdispersion (and this fact was verified by GWNBR). Consequently, the GWPR model is the most adequate model. Thus, to use the GWPR model with the GWNBR framework, should be set α to zero in the SAS %gwnbr macro.

3.4 Application for the Brazilian state of Espirito Santo

In this section, we analyze the distribution of the vehicles used for road freight transportation in Espirito Santo, Brazil. The understanding of the spatial distribution of these vehicles in the country can help the Brazilian authorities formulate public policies for the sector, such as supervision, maintenance and expansion of the highways, which are the main mode of transportation in Brazil. In addition, this spatial configuration is of interest to the companies that need these vehicles to transport their products.

To facilitate the interpretation and focus on the GWNBR model, we used only one explicative variable. The amount of industries (X) in the city, which represents the economic aspect of the problem, will help explain the amount of vehicles used in road freight transportation (Y). The spatial distributions of the variables (Fleet—Y and Industry—X) are shown in Fig. 15.

Fig. 15
figure 15

Spatial distribution of the variables Fleet and Industry

As shown in Fig. 15, the trucks are concentrated in the sea border of the state (southeast side), where the capital of the state Vitoria is located. The same behavior was observed for the variable industry, which indicates that there is a relationship between these variables.

To check for linearity, which is required for the GWR framework, we use the linear LOESS (Cleveland and Devlin 1988) and GAM models (Hastie and Tibshirani 1990; Wood 2006). As shown in Fig. 16, the adjustment achieved with the GAM model is more sophisticated than those obtained with the LOESS and linear models. However, linearity was also shown due to the large number of data points around the line.

Fig. 16
figure 16

Relationship between the variables Fleet and Industry

The same procedure that was followed in the analysis of the data obtained from Japan will be performed in this analysis, i.e., we will use the Poisson and Negative Binomial models, in their global and spatial forms. For the GWNBR and GWNBRg models, we used the Gaussian kernel, which resulted in a bandwidth of 53 kilometers. For the GWPR model, the bandwidth, which was found using the same kernel, was b=9.4 kilometers. However, the parameter estimation with this bandwidth is inappropriate because there are only between 1 and 9 points for each regression. Therefore, we used the adaptive bi-square kernel with a bandwidth equal to 10 points. Table 7 shows the results of the fit.

Table 7 Model results for the analysis of the data from Espirito Santo, Brazil

As shown in Table 7, the PR and GWPR models were the worst models. The earlier problem detected in the estimation of the bandwidth was a clue of the lack of fit obtained with the Poisson distribution. Thus, we can conclude that the fleet of vehicles used in road freight transportation probably exhibits overdispersion. Therefore the negative binomial distribution is the best approach for the modeling of these variables.

Although the difference between their AIC and log-likelihood is marginal, GWNBR and GWNBRg are better choices for the modeling of these data than the NBR model. To highlight this fact, we performed a non-stationarity test based on m=999 replications for the GWNBR model, by checking the variability of its estimates over space. For parameter k, the variability can be calculated as

(34)

The significance of the V k can be evaluated through a randomization test (Hope 1968), in which the empirical distribution of the V k is created under the null hypothesis of spatial stationarity. Initially, the geographic coordinates of the observations are randomly permuted and GWNBR is computed for this new configuration. According to the new estimates, we calculate V k using (34), and repeat this procedure m times until the empirical distribution of V k , under the null hypothesis is obtained. By ranking the values of V k in descending order, including the original value for V k we obtain the p-value using the following equation: p/(m+1), where p is the rank of the original V k , i.e., the number of times that the simulated estimates were greater than the original estimate. The randomization test has the advantage that it does not use a probability distribution for V k , but it is computationally intensive. Another parametric test for non-stationarity was proposed by Leung et al. (2000a), but this method was not used in this analysis.

The p-values for β 0, β 1, and α were 1.2 %, 8.4 % and 62.6 %, respectively. Considering the null hypothesis of stationarity, we can conclude that β 0 and β 1 are statistical different across the space at level of 10 %.

The surface of the parameters and their standard errors are shown in Fig. 17. The greater values for the intercept (b 0) are concentrated in the Vitoria metropolitan region (southeast side), which is the capital of the state. This finding reflects the greater amount of vehicles located in that place. In contrast, the parameter estimates for b 1 are smaller in that area because there are many industries and the link function is exponential.

Fig. 17
figure 17

Surface of the parameter estimates and standard errors obtained with the GWNBR model

In summary, the results indicate that there is spatial variation and overdispersion in the distribution of vehicles used for road freight transportation in Espirito Santo, Brazil. Consequently, the application of a traditional negative binomial model would yield misleading results. In addition, the Poisson models (global or spatial) would produce even poorer results.

4 Conclusions

This paper aims to show a methodology for Geographical Weighted Regression for data that follow a negative binomial distribution; this method is denoted GWNBR. Due to the difficulty associated with finding the effective number of parameters, we also developed the GWNBRg approach. The difference between these two methods is in the estimation of the overdispersion parameter α. In GWNBRg, α is estimated in globally, which generates a biased estimate for the parameter of overdispersion. However, the simplicity of this approach allows the calculation of the effective number of parameters of the model. In GWNBR, although this amount is unknown, GWNBR allows parameter α to vary spatially.

Based on the simulations presented, it was found that the global regressions search for regularities and patterns is inadequate for the search for local particularities or exceptions. In addition, the models based on the Poisson distribution were found to be inappropriate for the modeling of data with overdispersion. Also, it is interesting to investigate whether the distribution of α influenced the results, i.e, simulate a spatial distribution of α for data that do not exhibit overdispersion or non smoothy (more rugged data).

GWNBR shows a goodness of fit in the modeling of non-stationary and overdispersion data that is superior to that obtained with the other models, as was observed in the example of the fleet of vehicles in the Brazilian state of Espirito Santo. Also, GWNBR exhibits a good approximation for GWPR when α→0, or the same results when α=0, as was observed in the example of the distribution of deaths in Tokyo, Japan.

The simulations of count data showed a pattern that was the same as that found by Paez et al. (2011), who evaluated the results of Gaussian GWR in the presence of multicollinearity, i.e., when the data are created from a negative binomial distribution, NBR but not PR is sensitive to multicollinearity. The simulations also showed that the approach used to find the best bandwidth influences the results: AIC has a smaller variability than CV but CV produces a better fit than AIC, as measured by the log-likelihood.

In conclusion, the GWNBR method is a robust tool for the modeling of count data, especially when the data exhibit non-stationarity and overdispersion, by incorporating the Poisson Regression (PR), Geographically Weighted Poisson Regression (GWPR), and Negative Binomial Regression (NBR) models.