
14.1 Introduction

Geospatial data analysis has long been rooted in two main paradigms of statistical inference: classical frequentist inference and the younger but fast-growing Bayesian inference (Haining, 2014). The two paradigms represent ontologically (model-building) and epistemologically (integrating knowledge from other sources) different forms of statistical reasoning (Withers, 2002). Geographers have recognized the great potential of Bayesian inference for decades. For instance, Bennett (1985) suggested that Bayesian approaches held the greatest potential for advancing spatial analysis; Hepple (1995) introduced Bayesian analysis in spatial and network econometrics; Fotheringham et al. (2000) discussed Bayesian inference along with classical inference in their quantitative geography textbook; Withers (2002) provided a comprehensive review of the methodological and substantive benefits of Bayesian methods in human geography research and encouraged geographers to try the new approach; and Lawson and Banerjee (2009) conducted a comprehensive technical review of Bayesian spatial analysis, which covered a broad range of topics including spatial data types, basic concepts and algorithms of Bayesian inference, Bayesian models and examples for point processes and disease mapping, and software packages for Bayesian modeling.

This chapter is organized into five sections. After this brief introduction, Sect. 14.2 reviews the basics of Bayesian inference and its potential in geospatial research; Sect. 14.3 discusses four applications and models; Sect. 14.4 provides a short discussion of the implementation of Bayesian models; and the last section offers some concluding remarks.

14.2 Bayesian Inference

Of the two major inference frameworks in statistics, frequentist inference is the conventional approach. It interprets probability as the long-run frequency of outcomes in repeatable experiments. Parameters are treated as unknown but fixed quantities and are estimated from the sample data. Bayesian inference, in contrast, is based on Bayes’ theorem, named after Thomas Bayes. In addition to the long-run frequency that can be obtained from sample data (Greenland, 2000), this approach draws on the subjective experience of uncertainty (De Finetti, 1974) to interpret probability. Previous knowledge about that uncertainty is used to describe the distributions of the parameters, which are treated as unknown random variables; these distributions are called the prior distributions of the parameters. Bayesian methods combine the information from the sample data and the prior distributions to produce the posterior distribution. The model fit and the implications of the resulting posterior distribution can then be evaluated (Gelman, 2014).

To be specific, let \(y\) be a random variable with distribution \(f(y|\theta )\). If a sample of independent observations \(y_{1} , \ldots ,y_{n}\) is collected for \(y\), the likelihood is defined as: \(L\left( {\theta {|}y_{1} , \ldots ,y_{n} } \right) = \prod\nolimits_{i = 1}^{n} {f(y_{i} |\theta )} .\)
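The definition above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that \(f(y|\theta )\) is a Poisson distribution with mean \(\theta\) and that the counts are hypothetical; neither assumption comes from the chapter. The code evaluates the log of the likelihood on a grid of \(\theta\) values and locates its maximum, which for this model should fall near the sample mean.

```python
import math

def poisson_log_likelihood(theta, ys):
    """Log of L(theta | y_1..y_n) = prod_i f(y_i | theta) for Poisson counts."""
    if theta <= 0:
        return -math.inf
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

# Hypothetical counts observed in n = 8 areal units (illustrative only).
counts = [3, 0, 2, 5, 1, 4, 2, 3]

# Evaluate the log-likelihood on a grid of theta values and pick the maximizer.
grid = [0.01 * k for k in range(1, 1001)]          # theta in (0, 10]
theta_hat = max(grid, key=lambda t: poisson_log_likelihood(t, counts))
sample_mean = sum(counts) / len(counts)
print(theta_hat, sample_mean)
```

Working on the log scale avoids numerical underflow when the product runs over many observations.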

The likelihood summarizes the information about \(\theta\) contained in the observations (Tanner, 1998) and is used in frequentist inference to estimate the parameter \(\theta\). Frequentist inference therefore uses information from the sample data only. In comparison, Bayesian inference is based on information from the likelihood as well as the prior distribution. Let \(p\left( \theta \right)\) be the prior distribution of \(\theta\), which represents the subjective knowledge about \(\theta\). Bayes’ theorem then gives the posterior distribution (Kreft & de Leeuw, 2007) as:

$$ \begin{aligned} p\left( {\theta {|}y_{1} , \ldots , y_{n} } \right) & = \frac{{p\left( \theta \right)L(\theta |y_{1} , \ldots ,y_{n} )}}{{p\left( {y_{1} , \ldots ,y_{n} } \right)}} \\ & = \frac{{p\left( \theta \right)L\left( {\theta {|}y_{1} , \ldots ,y_{n} } \right)}}{{\smallint p(y_{1} , \ldots , y_{n} |\theta )p\left( \theta \right){\text{d}}\theta }} \\ \end{aligned} $$
(14.1)

The posterior distribution is a conditional distribution, i.e., the distribution of \(\theta\) given the sample data. It updates the knowledge about \(\theta\) held in the prior distribution using the information from the sample data. It can then be used to make inferences about \(\theta\), such as the posterior mean or a credible interval, the Bayesian counterpart of the confidence interval in the frequentist approach.
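As a minimal sketch of this updating step, consider a case in which the posterior is available in closed form. The example assumes a binomial likelihood with a conjugate Beta(a, b) prior, so that the posterior is Beta(a + k, b + n − k); the data (7 successes in 20 trials) and the prior parameters are hypothetical. The posterior mean is computed exactly, and an equal-tailed 95% interval is approximated by simulation.

```python
import random

def beta_binomial_posterior(k, n, a, b):
    """Posterior Beta parameters for k successes in n trials under a Beta(a, b) prior."""
    return a + k, b + n - k

def summarize(a_post, b_post, draws=20000, level=0.95, seed=1):
    """Exact posterior mean and a simulated equal-tailed interval for theta."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a_post, b_post) for _ in range(draws))
    lo = sample[int((1 - level) / 2 * draws)]
    hi = sample[int((1 + level) / 2 * draws) - 1]
    mean = a_post / (a_post + b_post)
    return mean, (lo, hi)

# Hypothetical data and prior (illustrative only).
a_post, b_post = beta_binomial_posterior(k=7, n=20, a=2.0, b=2.0)
mean, (lo, hi) = summarize(a_post, b_post)
print(a_post, b_post, round(mean, 3), round(lo, 3), round(hi, 3))
```

In this example the posterior mean, 9/24, lies between the prior mean (0.5) and the sample proportion (0.35), which is the sense in which the data update the prior.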

Because Bayes’ theorem incorporates the prior information, it adds complexity to the computation. The integral in the denominator of Eq. (14.1) may not have a closed form. Because analytical and numerical integration are often intractable, especially in high dimensions, methods have been proposed to approximate the posterior distribution. The most popular is Markov chain Monte Carlo (MCMC) (Berger, 2000; Cappe & Robert, 2000), which includes the Gibbs sampler and the Metropolis–Hastings algorithm.

The MCMC method rests on the law of large numbers, which implies that an expectation can be approximated by a Monte Carlo estimator, that is, by an average over simulated draws. The basic idea is therefore to make inferences from samples drawn from the posterior distribution. Specifically, the method first generates sequences of dependent draws, called Markov chains, and then uses these draws for inference, for example by estimating the expectation of a parameter with the sample mean. Although the draws are dependent, it has been proved that they can be treated as a sample from the true posterior distribution when the Markov chain is long enough (i.e., as its length tends to infinity) and satisfies certain conditions (the chain must be finite, aperiodic, irreducible, and ergodic). Because the early iterations still depend on the starting values, some iterations at the beginning of the MCMC run need to be discarded; these are called burn-in samples. The number of burn-in samples to discard can be determined with diagnostics such as the Geweke diagnostic, the Heidelberger and Welch diagnostic, the Raftery and Lewis diagnostic, and the Gelman and Rubin multiple sequence diagnostic. The Gibbs sampler is among the simplest MCMC algorithms and is a special case of the Metropolis–Hastings algorithm.
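The following sketch illustrates these ideas with a random-walk Metropolis sampler, one simple member of the Metropolis–Hastings family. It is an illustration under stated assumptions rather than a recommended implementation: the target is a hypothetical Beta–binomial posterior (7 successes in 20 trials, Beta(2, 2) prior), the step size, chain length, and burn-in are arbitrary choices, and the convergence check is a basic version of the Gelman and Rubin statistic computed from four chains with dispersed starting values.

```python
import math
import random
import statistics

def log_post(theta, k=7, n=20, a=2.0, b=2.0):
    """Unnormalized log posterior: Beta(a, b) prior times a binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a + k - 1) * math.log(theta) + (b + n - k - 1) * math.log(1.0 - theta)

def metropolis(start, n_iter, step, rng):
    """Random-walk Metropolis chain targeting exp(log_post)."""
    theta, lp = start, log_post(start)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); otherwise keep theta.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        chain.append(theta)
    return chain

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(42)
burn_in, n_iter = 1000, 6000                      # arbitrary illustrative settings
starts = [0.05, 0.35, 0.65, 0.95]                 # dispersed starting values
chains = [metropolis(s, n_iter, 0.2, rng)[burn_in:] for s in starts]
draws = [x for c in chains for x in c]
post_mean = statistics.fmean(draws)
r_hat = gelman_rubin(chains)
print(round(post_mean, 3), round(r_hat, 3))
```

Because this target has a known Beta(9, 15) posterior, the Monte Carlo mean can be compared with the exact value 9/24 = 0.375; values of the Gelman and Rubin statistic close to 1 suggest that the chains have mixed.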

Bayesian inference is detailed in numerous textbooks (Congdon, 2014; Gelman, 2014). Several journal articles also provide extensive discussions of Bayesian methods that deal directly with geospatial data (Berger, 2000; Cappe & Robert, 2000; Hepple, 1995; Lawson & Banerjee, 2009). The following discussion addresses only selected applications of Bayesian inference in geospatial data analysis.

14.3 Applications of Bayesian Models in Geospatial Problems

We focus our discussions on models that analyze two types of geospatial data, point data with attributes and count data aggregated to areal units. The following discussions are confined to four selected methods and models, namely Bayesian spatial interpolation, spatial epidemiology/disease mapping, Bayesian hierarchical models, and Bayesian spatial autoregressive models.

14.3.1 Bayesian Spatial Interpolation

Classical interpolation methods such as kriging rely on the BLUP (Best Linear Unbiased Predictor) and substitute maximum likelihood estimates for the model parameters (Lam, 1983). The Bayesian approach, on the other hand, first computes a posterior distribution for the model parameters and then computes the posterior predictive distribution by marginalizing (averaging) over that posterior distribution. The major advantage of the latter solution is that the inference is supported by proper and moderately informative priors on the weakly identified correlation function parameters (Lawson & Banerjee, 2009; Mugglin et al., 1999). The Bayesian approach also fuses information from multiple sources in model development, so it can better handle uncertainty in the interpolation results. Bayesian spatial interpolation methods are applied most commonly in environmental studies (Brown et al., 1994; Cooley et al., 2007; Fuentes & Raftery, 2005). More recently, because of these advantages, Bayesian spatiotemporal methods have been developed to analyze the rapidly growing volume of accessible spatiotemporal data (Christakos, 2000; Cressie & Wikle, 2011; Esmaeilbeigi et al., 2020; Haining & Li, 2021; Li & Revesz, 2004; Sahu et al., 2010, 2015; Susanto et al., 2016).
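The contrast between plugging in a single parameter estimate and averaging over the posterior can be sketched in a deliberately simplified setting. The code below assumes a zero-mean Gaussian process on a line with unit variance, no nugget, and an exponential correlation function with range parameter phi; phi is given a flat prior on a small grid of candidate values. All locations, observations, and grid values are hypothetical, and the model is a toy stand-in for the richer models in the cited literature.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def chol_solve(l, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def bayes_krige(sites, y, s0, phis):
    """Posterior weights of phi on the grid and the posterior predictive mean at s0."""
    log_post, means = [], []
    for phi in phis:
        r = [[math.exp(-abs(a - b) / phi) for b in sites] for a in sites]
        l = cholesky(r)
        alpha = chol_solve(l, y)                                  # R^{-1} y
        log_det = 2.0 * sum(math.log(l[i][i]) for i in range(len(y)))
        quad = sum(yi * ai for yi, ai in zip(y, alpha))
        log_post.append(-0.5 * (log_det + quad))                  # flat prior on the grid
        r0 = [math.exp(-abs(s0 - a) / phi) for a in sites]
        means.append(sum(ri * ai for ri, ai in zip(r0, alpha)))   # kriging mean given phi
    m = max(log_post)
    w = [math.exp(lp - m) for lp in log_post]
    total = sum(w)
    w = [wi / total for wi in w]
    return w, sum(wi * mi for wi, mi in zip(w, means))

sites = [0.0, 1.0, 2.5, 4.0]          # hypothetical 1-D locations
y = [1.2, 0.7, -0.3, 0.4]             # hypothetical observations
phis = [0.5, 1.0, 2.0, 4.0]           # hypothetical grid of range parameters
weights, pred = bayes_krige(sites, y, 1.8, phis)
print([round(w, 3) for w in weights], round(pred, 3))
```

A plug-in predictor would use only the kriging mean for the single best phi; the Bayesian predictor instead weights the kriging mean for every candidate phi by its posterior probability.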

14.3.2 Bayesian Models for Disease Mapping, Risk Estimate, and Prediction

There are two common types of disease data. The first type is case event data, where the locations of cases (points) are usually the known residential addresses of patients. These data form a spatial point process, but they are usually unavailable, particularly for large study areas. The second type is aggregated counts of cases (events), which are more common and accessible. The boundaries of aggregation that form the basic spatial units of the study region (such as zip code areas) are typically arbitrary with respect to the disease process. Bayesian models have long been applied to disease mapping, risk estimation, and prediction (Besag & Newell, 1991; Greenland, 2006, 2007, 2009; Lawson, 2018; Wakefield & Morris, 2001; Waller et al., 1997). A wise choice of the prior distribution can inform the models by bringing in epidemiological domain knowledge and other information from the study area, which can lead to more reliable model results. In addition, the Bayesian approach is more flexible and effective in dealing with sparse data or rare events. Lawson and Banerjee (2009) illustrated the technical details of specifying Bayesian models for analyzing the two types of data and highlighted the applications of count data.
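The benefit for sparse counts can be illustrated with a simple Poisson–gamma model, one common starting point for smoothing area-level relative risks. The sketch assumes that the observed count in area i is Poisson with mean E_i × θ_i, where E_i is the expected count, and that θ_i has a Gamma(a, b) prior (shape a, rate b); the posterior mean of θ_i is then (a + O_i)/(b + E_i). The area data and the prior values are hypothetical.

```python
def posterior_relative_risk(observed, expected, a, b):
    """Posterior mean of theta when O ~ Poisson(E * theta) and theta ~ Gamma(a, rate=b)."""
    return (a + observed) / (b + expected)

# Hypothetical areas: (name, observed cases, expected cases).
areas = [("A", 0, 0.8), ("B", 3, 1.1), ("C", 45, 50.0), ("D", 130, 100.0)]
a, b = 4.0, 4.0   # assumed prior with mean a/b = 1 and variance a/b^2 = 0.25

for name, o, e in areas:
    smr = o / e                                   # raw standardized ratio O/E
    smoothed = posterior_relative_risk(o, e, a, b)
    print(name, round(smr, 2), round(smoothed, 2))
```

The posterior mean is a weighted average of the prior mean and the raw ratio O/E, with weight E/(b + E) on the data, so areas with small expected counts are pulled strongly toward the prior mean while areas with large expected counts are left almost unchanged.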

14.3.3 Bayesian Hierarchical Models

Hierarchical (multilevel) regression models have long been used in geospatial research to explicitly incorporate data collected at various spatial scales of observation, for instance, individuals nested in neighborhoods and neighborhoods nested in cities. Hierarchical models are naturally Bayesian because the distributions of regression coefficients across clusters (groups, geographic regions, etc.) can be treated as a special type of prior distribution. The “empirical Bayes” method estimates regression coefficients as a weighted average of the coefficients obtained from the sample data of all clusters. In this case, the sample data are used to form the prior population distribution, so there are no prior distributions for the hyperparameters. The “Pure Bayes” method generates prior distributions for the hyperparameters from a population. Though the two approaches commonly yield similar results, the latter explicitly takes account of prior uncertainty, so it usually generates a larger posterior variance (Western, 1999). Moreover, datasets used in hierarchical models can be complex because of problems such as measurement error, censored or missing observations, complex multilevel correlation structures, and multiple endpoints. Compared with frequentist procedures, Bayesian procedures are not only more flexible in handling these data issues but also make it easier to justify the theoretical properties of the model (Congdon, 2021; Dunson, 2001; McGlothlin & Viele, 2018).
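The weighted-average idea behind the empirical Bayes method can be sketched for the simplest hierarchical model, in which each cluster has only a mean (an intercept-only regression). The example assumes that each cluster mean has a known sampling variance and uses a simple method-of-moments estimate of the between-cluster variance; these simplifications, the function name, and the neighborhood data are illustrative assumptions, not the procedure of any cited study.

```python
import statistics

def empirical_bayes_shrink(group_means, sampling_vars):
    """Shrink each group mean toward the overall mean with data-estimated weights."""
    mu_hat = statistics.fmean(group_means)
    # Moment estimate of the between-group variance, truncated at zero.
    tau2_hat = max(0.0, statistics.variance(group_means) - statistics.fmean(sampling_vars))
    estimates = []
    for ybar, v in zip(group_means, sampling_vars):
        weight = tau2_hat / (tau2_hat + v)        # weight placed on the group's own mean
        estimates.append(weight * ybar + (1.0 - weight) * mu_hat)
    return mu_hat, tau2_hat, estimates

# Hypothetical neighborhood means, sample sizes, and within-neighborhood variance.
group_means = [52.0, 61.0, 47.0, 70.0, 55.0]
sizes = [5, 40, 8, 12, 100]
sigma2 = 100.0
sampling_vars = [sigma2 / n for n in sizes]

mu_hat, tau2_hat, shrunk = empirical_bayes_shrink(group_means, sampling_vars)
for ybar, n, est in zip(group_means, sizes, shrunk):
    print(n, ybar, round(est, 2))
```

Neighborhoods with few observations have noisier means and are pulled more strongly toward the overall mean. A fully Bayesian version would additionally place prior distributions on the overall mean and the between-group variance rather than estimating them from the data.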

14.3.4 Bayesian Spatial Autoregressive Models

Spatial autoregressive (SAR) models differ from standard regression models in that they account for spatial autocorrelation in the sample data (Griffith, 2009). Bayesian methods have been used to estimate SAR models for several decades (Hepple, 1979; LeSage, 1997, 2000), motivated by several advantages of the approach: it can accommodate an unknown form of heteroskedasticity in the disturbance term of SAR models; it can produce the posterior distribution of the spatial lag parameter; it can help choose between a logit and a probit model; and it naturally allows prior knowledge to be introduced into the model when available. Doğan and Taşpınar (2014) compared the robust generalized method of moments (GMM) estimator with estimators based on the Bayesian MCMC approach for SAR models with heteroskedasticity of an unknown form. Their results indicate that the maximum likelihood estimator (MLE) and the Bayesian estimators show relatively greater bias in the SAR parameter estimates when there is negative spatial dependence in the model. They also found that the Bayesian estimators perform better than the robust GMM estimator in terms of finite sample efficiency. LeSage and Chih (2018) developed a Bayesian heterogeneous coefficients SAR panel model to estimate spill-in and spill-out effects for wages in the contiguous US states.
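For reference, the spatial lag form of the SAR model is commonly written as follows. The notation is an assumption introduced here for illustration: \(y\) is the \(n \times 1\) response vector, \(W\) the \(n \times n\) spatial weights matrix, \(\rho\) the spatial lag parameter, \(X\) the covariate matrix, and \(\beta\) the regression coefficients. The sketch shows the homoskedastic case with independent priors.

$$ y = \rho Wy + X\beta + \varepsilon ,\quad \varepsilon \sim N\left( {0,\sigma^{2} I_{n} } \right) $$

$$ \ln L\left( {\rho ,\beta ,\sigma^{2} {|}y} \right) = \ln \left| {I_{n} - \rho W} \right| - \frac{n}{2}\ln \left( {2\pi \sigma^{2} } \right) - \frac{1}{{2\sigma^{2} }}\left( {y - \rho Wy - X\beta } \right)^{\top } \left( {y - \rho Wy - X\beta } \right) $$

$$ p\left( {\rho ,\beta ,\sigma^{2} {|}y} \right) \propto L\left( {\rho ,\beta ,\sigma^{2} {|}y} \right)p\left( \rho \right)p\left( \beta \right)p\left( {\sigma^{2} } \right) $$

The log-determinant term \(\ln \left| {I_{n} - \rho W} \right|\) is what distinguishes this likelihood from that of an ordinary regression, and the last line is Bayes’ theorem up to the normalizing constant. A heteroskedastic variant would replace \(\sigma^{2} I_{n}\) with a diagonal covariance matrix whose entries are allowed to differ.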

Litterman (1986) proposed the Bayesian vector autoregressive model (BVAR) to overcome the collinearity and overparameterization that are typically found in unrestricted vector autoregressive models (VAR). The Bayesian approach can specify coefficients with varying weights and the estimated coefficients are therefore a combination of prior knowledge and the information from sample data. Like other Bayesian SAR models, BVAR models have mostly been applied in economic and regional forecasting research (Cuaresma et al., 2016; Puri & Soydemir, 2000).

14.4 Bayesian Implementation

Bayesian models are often fit using MCMC techniques. Many software packages can perform MCMC estimation with varying degrees of difficulty and different sampling procedures. The most popular one is WinBUGS (Bayesian inference Using Gibbs Sampling). This free software package employs both Gibbs sampling and Metropolis–Hastings updating methods for a wide range of models. It allows specifying models, sampling from the posterior distribution of parameters, diagnosing model convergence, and creating graphical and analytic output (Lunn et al., 2009). GeoBUGS, a GIS add-on module of WinBUGS, can be used to fit spatial models and to produce a range of map products from model output.

JAGS (Just Another Gibbs Sampler) is an open-source, cross-platform Bayesian analysis program that uses the same model description language as WinBUGS. It can specify Bayesian models and generate samples from the posterior distribution (Plummer, 2003). JAGS users usually rely on R packages such as “coda” and “mcmcplots” to test model convergence, analyze model output, and generate graphics of model results. The “rjags” package of the R software provides an interface to the JAGS library.

Stan is another specialized software package for Bayesian analysis. Unlike the JAGS and WinBUGS samplers, it uses Hamiltonian Monte Carlo with the No-U-Turn sampler, which can handle nonconjugate priors and high posterior correlations (Stan Development Team, 2021). As with JAGS, an R package, “rstan”, is commonly used to access Stan from R, and the same R packages used with JAGS can be used to analyze model output and produce graphics of the results.

Bayesian analysis is also facilitated by a growing number of R packages. For instance, “brms” is for fitting Bayesian generalized (non-)linear multivariate multilevel models using Stan for full Bayesian inference; “geoR” is for geostatistical analysis including variogram-based, likelihood-based, and Bayesian methods; “spBayes” is for spatially varying short-length time series data; “spTimer” is for fitting large hierarchical Bayesian spatiotemporal models; and “CARBayes” is for fitting a class of univariate and multivariate spatial generalized linear mixed models for areal unit data, with inference in a Bayesian setting using MCMC simulation.

Over the past 10 or so years, a growing number of researchers have adopted INLA (integrated nested Laplace approximation) as an alternative method for approximate Bayesian inference (Blangiardo & Cameletti, 2015; Rue et al., 2009). The INLA methodology focuses on models that can be expressed as latent Gaussian Markov random fields (GMRF), so it works for a large family of models. It also enjoys significant computational advantages over classic methods such as MCMC in dealing with complex models. The method can be implemented using the R-INLA package (R-INLA Project).

14.5 Some Concluding Thoughts

For more than four decades, Bayesian inference has been proposed as an alternative that overcomes some intrinsic issues of classical statistical inference in geospatial inquiries. For instance, Summerfield (1983) questioned the validity and relevance of applying classical statistical inference to population data in geography research. Bennett (1985) argued that Bayesian approaches offered powerful alternative theory and techniques to advance statistical inference in spatial science. After summarizing the major progress in spatial statistics in the first decade of the twenty-first century, Haining (2014) highlighted three areas of development in spatial data analysis for the coming years: spatial data mining; the “new” geostatistics; and Bayesian spatial hierarchical modeling. Although the word “Bayesian” appears only in the last area, Bayesian methods have also been applied in the other two areas (Diggle & Lophaven, 2006; Gelfand & Banerjee, 2017; Zhang et al., 2019).

The Bayesian approach has provided geospatial researchers with a versatile alternative for fitting a wide range of models. In addition to the benefit of incorporating prior knowledge into models, the Bayesian approach is much more flexible because almost any model assumption can be treated as a prior. Furthermore, geospatial datasets are not always samples; they can be populations or apparent populations (irreplicable observations), which makes classical inference difficult. The Bayesian approach provides a debatable solution to this unique inference challenge in geospatial research (Berk et al., 1995; Mendoza et al., 2021). Moreover, the Bayesian approach can accommodate the needs of the rapidly growing field of space–time modeling (Faghmous & Kumar, 2014; Holmström et al., 2015).

Opponents of the Bayesian approach hold two fundamental objections to the method. One is that the approach might be abused as an automatic inference engine, and the other is the subjectivity in the choice of the prior distribution (Gelman et al., 2013). Like many other authors, we view Bayesian inference as an ontologically and epistemologically different approach with appealing statistical properties that classical inference lacks. The true value of the approach, however, will need to be assessed by whether it can advance geospatial reasoning in the long run (Withers, 2002). We encourage you to explore the great potential offered by this compelling alternative approach to tackling problems in the fascinating geospatial world.