1 Introduction

Traditionally, models that use origin-destination flow data to explain variation in the level of flows between origin and destination locations of interaction across some relevant geographic space are called gravity models,Footnote 1 in analogy with Newton’s concept of gravity. Locations may be either regions or point units, and spatial interactions relate to movements of various kinds. Examples include not only migration, journey-to-work, traffic, commodity or trade flows, but also flows of less tangible entities such as capital, information and knowledge. By adopting a spatial interaction perspective, attention is focused on interaction patterns at the aggregate rather than the individual level.

Gravity modelsFootnote 2 typically rely on three types of factors to explain mean interaction frequencies: Origin specific variables that characterise the ability of origin locations to produce or generate flows; destination specific variables that attempt to capture the attractiveness of destination locations; and, a separation function that reflects the way spatial separation of origins from destinations constrains or impedes the interaction (Fischer and Wang 2011). At larger scales of geographical inquiry spatial separation might be simply measured in terms of the great circle distance separating an origin from a destination location. In other cases, it might be transportation cost, perceived travel time or any other sensible measure such as political distance, language distance and cultural distance measured in terms of nominal or categorical attributes. One popular example of a separation or deterrence function is the exponential function that leads to gravity models known as exponential gravity models.

Alternative forms of the gravity model can be specified by imposing (exogenous) constraints on the mean interaction frequencies. These model variants include origin and/or destination specific balancing (normalising) factors that act as constraints to ensure that the origin and destination totals for spatial interactions are exactly predicted (see Wilson 1971). The model is said to be doubly constrained if both origin and destination constraints hold for each location. If only the origin or only the destination constraints hold, the model is singly constrained; if neither holds, it is unconstrained. It is worth noting that the doubly constrained gravity model has also become known as the trip distribution stage in the four-step transport planning approach.Footnote 3 A more recently recognised role of these constraints is to account for spatial autocorrelation effects in the geographic distribution of attributes across origins and destinations.

The focus in this paper is on singly and doubly constrained exponential gravity model variants for situations involving flows taking the form of counts; for example, counts of persons commuting from home to work locations, or as in the example considered in this paper, the number of patent citations from one region to another. In such cases, current practice is to model origin-destination flow data with Poisson gravity model specifications. Under the assumption that the flows are independently distributed Poisson variables the constrained gravity model variants can be treated as particular cases of a generalised linear model (GLM) with fixed (or random) effects, employing a logarithmic link function and a Poisson mean flow. Maximum likelihood estimates of the model parameters can be obtained using an iterative re-weighted least squares algorithm, as implemented in statistical software packages such as GLIM (Generalised Linear Interactive Modelling).

Flows, however, are not strictly independent. Spatial (or network) dependenceFootnote 4 is more likely than spatial independence when considering origin-destination flows. Spatial dependence in a flow setting refers to a situation where flows from nearby locations (either origins or destinations) are similar in magnitude. A failure to incorporate spatial dependence in model specifications leads to biased parameter estimates and incorrect conclusions. Eigenvector spatial filtering—described in Griffith (2003) for conventional regression models—offers an approach to dealing with spatial dependence in constrained gravity model variants. This approach relies on the decomposition of a spatial weight matrix into eigenvalues and eigenvectors and then uses a subset of the eigenvectors as additional explanatory variables in the singly and doubly constrained gravity model specifications to reduce potential bias in parameter estimates. A virtue of this approach is that existing software can be applied for the case of spatial dependence in constrained model variants involving flows taking the form of counts.

The purpose of this paper is twofold: first, to establish theoretical connections between the constrained gravity model versions with balancing factors, fixed effects represented by binary location specific indicator variables, and random effects; and second, to illustrate these connections with an empirical example while accounting for spatial dependence among flows during estimation. Fulfilling these goals reveals that fixed and random effects are identical and equal to the logarithm of the entropy maximisation derived factors, except for slight rounding/algorithm-convergence errors. This finding is the outcome of an equivalency between assigning a single fixed effects indicator variable to each origin/destination on the one hand, and estimating a single random effects value for an origin/destination while treating the corresponding destinations/origins as repeated measures, on the other. As with the unconstrained gravity model variant, adjusting for spatial dependence in origin-destination flows reduces bias in parameter estimates and improves model performance.

The rest of the paper is organised as follows. Section 3.2 describes unconstrained and constrained classes of the gravity model with a focus on doubly and singly constrained model variants that rely on a multiplicative adjustment scheme to enforce satisfactorily the conservation rule. Section 3.3 presents the counterpart Poisson specifications that interpret/predict the level of flows as dependent on not only the explanatory variables (and the associated coefficient estimates), but also origin and destination specific effects coefficients. Section 3.4 describes spatial filtering as a way of filtering the sample origin-destination data for spatial dependence (i.e., transferring spatial autocorrelation effects from residuals to the mean/intercept parameter) in an effort to mimic independent data amenable to standard Poisson regression estimation procedures. Section 3.5 continues to establish theoretical connections between balancing factors, fixed effects and random effects (spatially filtered) model specifications. The results are illustrated in Sect. 3.6 with an empirical example involving knowledge flows between 257 European regions resulting in \( 257^{2}=66{,}049 \) flow dyads. Section 3.7 concludes the paper.

2 Unconstrained and Constrained Classes of Gravity Models: The Classical View

Gravity models that describe mean interaction frequencies in a system of n locations can be writtenFootnote 5 as

$$ E\left({Y}_{ij}\right)={K}_{ij} {U}_i {V}_j f\left({d}_{ij}\right) $$
(3.1)

where \( Y_{ij} \) \( \left(i,j=1, \dots, n\right) \) is a random variable denoting the level of flows from origin location \( i \) \( \left(i=1, \dots, n\right) \) to destination location \( j \) \( \left(j=1, \dots, n\right) \), \( U_i \) and \( V_j \) are appropriate origin and destination specific factors or functions reflecting locational propensities to emit or attract interactions, \( f(d_{ij}) \) is a separation function of some inter-location measure \( d \) that separates origin \( i \) from destination \( j \), and \( K_{ij} \) is an origin-destination specific constant of proportionality, or scaling factor, which reduces to a constant scaling factor \( K \) for the unconstrained gravity model specification (which then is accompanied by attaching exponents other than one to \( U_i \) and \( V_j \)). The role of this origin-destination specific constant of proportionality in the gravity model equation depends on how extensively the conservation rule (Ledent 1985) is enforced in the system of locations. Four alternative cases may be distinguished, giving rise to equally many classes of gravity models.

A gravity model is called unconstrained if the conservation principle is ignored altogether so that

$$ {K}_{ij}=K $$
(3.2)

where \( K \) is a constant scaling factor independent of all origins and destinations. If \( Y_{\bullet\bullet} \) denotes the total number of flows in the spatial system, then

$$ K={Y}_{\bullet\bullet}\left[{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill i=1\hfill \\ {}\hfill i\ne j\hfill \end{subarray}}^n}{ \sum_{j=1}^n}{U}_i{V}_j\,f\left({d}_{ij}\right)\right]^{-1} $$
(3.3)

where the summation is over the range \( i=1, \dots, n \) and \( j=1, \dots, n \) with \( i\ne j \). Although Eq. (3.1) has been developed by analogy with Newton’s gravity equation, Isard (1960) and Sen and Smith (1995) developed versions of the unconstrained model using a probabilistic approach.

At the other extreme of the spectrum is the doubly constrained case of spatial interaction, which refers to a situation in which the conservation principle is enforced from the viewpoint of both origin and destination locations. The origin-destination specific constant of proportionality, \( K_{ij} \), now depends on both origins and destinations. For simplicity, it is generally assumedFootnote 6 that

$$ {K}_{ij}={A}_i\;{B}_j $$
(3.4)

where the origin and destination specific constants, \( A_i \) and \( B_j \), called balancing factors, are solutions of the equation system (Wilson 1967)

$$ {A}_i={Y}_{i\bullet}\,\left[{U}_i{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill j=1\hfill \\ {}\hfill j\ne i\hfill \end{subarray}}^n}\,{B}_j\,{V}_j\,f\left({d}_{ij}\right)\right]^{-1} $$
(3.5)
$$ {B}_j={Y}_{\bullet j}\left[{V}_j{\displaystyle\,\sum\limits_{\begin{subarray}{c}\hfill i=1\hfill \\ {}\hfill i\ne j\hfill \end{subarray}}^n}\,{A}_i\,{U}_i\,f\left({d}_{ij}\right)\right]^{-1}. $$
(3.6)

These balancing factors act as constraints to ensure that the estimated inflows \( {\skew2\hat{Y}}_{\bullet j} \) for \( j=1, \dots, n \) and outflows \( {\skew2\hat{Y}}_{i\bullet} \) for \( i=1, \dots, n \) equal the observed inflow and outflow totals, respectively. Such doubly (or attraction-production) constrained models have been extensively used as trip distribution models in transport modelling, and many variants of this model form exist for describing journey-to-work interactions.

Between these two extreme cases of unconstrained and doubly constrained spatial interaction lie many models that are subject to some constraints but not to others. Two important classes can be identified: the origin (or alternatively called production) constrained gravity model, and the destination (or alternatively called attraction) constrained gravity model. In the production constrained case the conservation principle is enforced from the viewpoint of origin locationsFootnote 7 only. Hence

$$ {K}_{ij}={A}_i. $$
(3.7)

\( A_i \) is a factor dependent on the location of an origin, and is called an origin specific balancing factor. If \( Y_{i\bullet} \) denotes the total number of outflows from location \( i \),

$$ {A}_i={Y}_{i\bullet}\left[{U}_i{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill j=1\hfill \\ {}\hfill j\ne i\hfill \end{subarray}}^n}\,{V}_j\,f\left({d}_{ij}\right)\right]^{-1}. $$
(3.8)

The origin constrained gravity model is useful in situations where the outflow totals are known or can be exogenously predicted for each origin location in the system. For an instructive example see Haynes and Fotheringham (1984, pp. 60–62).

The attraction constrained case of spatial interaction enforces the conservation principle from the viewpoint of destination locations. Thus

$$ {K}_{ij}={B}_j $$
(3.9)

where \( B_j \) is a factor dependent on the destination location. If \( Y_{\bullet j} \) denotes the total number of inflows into location \( j \),

$$ {B}_j={Y}_{\bullet j}\left[{V}_j{\displaystyle\,\sum\limits_{\begin{subarray}{c}\hfill i=1\hfill \\ {}\hfill i\ne j\hfill \end{subarray}}^n}{U}_i\,f\left({d}_{ij}\right)\right]^{-1}. $$
(3.10)

This model variant can be used to forecast total outflows from origin locations. Such a situation might arise, for example, in forecasting the effects of locating a new industrial park within a metropolitan area. The number of people to be employed in the new development area is known, and the destination constrained gravity model can be used to forecast the demand for housing in particular locations of the metropolitan area that will result from the new employment (Haynes and Fotheringham 1984).

The models presented in Eqs. (3.1)–(3.10) are in a generalised form and no mention has yet been made to the particular set of parameters characterising such gravity models. Although the balancing factors are sometimes referred to as parameters, in this paper the term parameter is restricted to those constants that must be estimated statistically, rather than to those constants that imply the accounting constraints placed on the model.

Many different model formulations can be obtained from Eq. (3.1), despite its structural simplicity (see Baxter 1983). \( U_i \) and \( V_j \) can be treated as completely known, as parameters to be estimated (see Cesario 1973), or as simple power functions of some known variables (see Fotheringham and O’Kelly 1989). The separation function constitutes the very core of gravity models.Footnote 8 In this study we use the multivariate exponential deterrence function

$$ f\left({d}_{ij}\right)= \exp \left(-\theta\;{d}_{ij}\right) $$
(3.11)

in which d denotes a multivariate separation measure with an associated sensitivity parameter θ. This specification of the spatial separation function leads to the following three variants of the gravity model: the doubly constrained variant

$$ E\left({Y}_{ij}\right)={A}_i {B}_j {U}_i {V}_j \exp \left(-\theta\;{d}_{ij}\right) $$
(3.12)
$$ {A}_i={Y}_{i\bullet}{\left[{U}_i{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill j=1\hfill \\ {}\hfill j\ne i\hfill \end{subarray}}^n}\,{B}_j\,{V}_j \exp \left(-\theta {d}_{ij}\right)\right]}^{-1} $$
(3.13)
$$ {B}_j={Y}_{\bullet j}{\left[{V}_j{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill i=1\hfill \\ {}\hfill i\ne j\hfill \end{subarray}}^n}\,{A}_i\,{U}_i \exp \left(-\theta {d}_{ij}\right)\right]}^{-1} $$
(3.14)

the origin constrained variant

$$ E\left({Y}_{ij}\right)={A}_i {U}_i {V}_j \exp \left(-\theta\;{d}_{ij}\right) $$
(3.15)
$$ {A}_i={Y}_{i\bullet}{\left[{U}_i{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill j=1\hfill \\ {}\hfill j\ne i\hfill \end{subarray}}^n}\,{V}_j \exp \left(-\theta {d}_{ij}\right)\right]}^{-1} $$
(3.16)

and, the destination constrained variant

$$ E\left({Y}_{ij}\right)={B}_j {U}_i {V}_j \exp \left(-\theta\;{d}_{ij}\right) $$
(3.17)
$$ {B}_j={Y}_{\bullet j}{\left[{V}_j{\displaystyle \sum\limits_{\begin{subarray}{c}\hfill i=1\hfill \\ {}\hfill i\ne j\hfill \end{subarray}}^n}\,{U}_i \exp \left(-\theta {d}_{ij}\right)\right]}^{-1}. $$
(3.18)

The central concern in this paper is with the problem of estimating the model parameter \( \theta \) rather than with the problem of determining appropriate values for the balancing factors.Footnote 9 A solution to this latter problem, for example in the case of Eqs. (3.12)–(3.14), involves using an iterative biproportional adjustment technique, known as the Deming-Stephan-Furness procedureFootnote 10 (see Sen and Smith 1995, p. 374). As Evans (1970) shows, convergence to a unique set of values for \( A_i \) and \( B_j \) is guaranteed for any non-trivial set of starting values.
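The biproportional adjustment can be sketched in a few lines of Python. The sketch is illustrative only: the function name `furness`, the toy flow matrix, the unit factors \( U_i=V_j=1 \), the starting values and the value of \( \theta \) are all assumptions made for the example, not quantities taken from this paper. It alternates between Eqs. (3.13) and (3.14) until the fitted outflow totals match the observed ones.

```python
import math

def furness(out_tot, in_tot, U, V, d, theta, tol=1e-10, max_iter=10000):
    """Alternate between Eqs. (3.13) and (3.14) until the margins are met.

    Intra-location dyads (i == j) are excluded, as in the sums of the text.
    Returns the balancing factors A, B and the fitted mean flows mu.
    """
    n = len(U)
    # Exponential deterrence, Eq. (3.11); a zero diagonal drops i == j.
    f = [[0.0 if i == j else math.exp(-theta * d[i][j]) for j in range(n)]
         for i in range(n)]
    B = [1.0] * n  # assumed starting values
    for _ in range(max_iter):
        A = [out_tot[i] / (U[i] * sum(B[j] * V[j] * f[i][j] for j in range(n)))
             for i in range(n)]
        B = [in_tot[j] / (V[j] * sum(A[i] * U[i] * f[i][j] for i in range(n)))
             for j in range(n)]
        mu = [[A[i] * B[j] * U[i] * V[j] * f[i][j] for j in range(n)]
              for i in range(n)]
        # Inflows are exact after the B update, so only outflows are checked.
        gap = max(abs(sum(mu[i]) - out_tot[i]) / out_tot[i] for i in range(n))
        if gap < tol:
            break
    return A, B, mu

# Toy data (assumed): observed flows between four locations on a line.
flows = [[0, 12, 5, 3],
         [8, 0, 9, 4],
         [6, 7, 0, 10],
         [2, 5, 11, 0]]
n = len(flows)
out_tot = [sum(row) for row in flows]
in_tot = [sum(flows[i][j] for i in range(n)) for j in range(n)]
d = [[abs(i - j) for j in range(n)] for i in range(n)]
U = V = [1.0] * n

A, B, mu = furness(out_tot, in_tot, U, V, d, theta=0.5)
fitted_out = [sum(row) for row in mu]
fitted_in = [sum(mu[i][j] for i in range(n)) for j in range(n)]
```

Multiplying every \( A_i \) by a positive constant and dividing every \( B_j \) by the same constant leaves the fitted flows unchanged, so in this sketch it is the products \( A_i B_j \), and hence the fitted flows, that should be compared across runs.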

3 Poisson Versions of the Constrained Gravity Models

Flows often take the form of counts such as numbers of migrants moving from one location to another. In such situations a common assumption is that the \( Y_{ij} \) \( \left(i,j=1, \dots, n\right) \) follow independentFootnote 11 Poisson distributions,Footnote 12 \( Y_{ij}\sim \mathcal{P}({\mu}_{ij}) \), where \( {\mu}_{ij} \) is equated with the right hand side of Eq. (3.1). The mean and the variance of the distribution are both equal to \( {\mu}_{ij} \). The Poisson specifications of the gravity model variants interpret/predict the level of flows as dependent on not only the explanatory variables (and their associated coefficient estimates), but also origin and destination specific effects coefficients. The fixed effects version of the three constrained model variants of the gravity model can be described as in Eqs. (3.19)–(3.21), respectively.

$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\;{V}_j \exp \left[\alpha +{\displaystyle \sum_{h=1}^{n-1}{I}_{iho}\;{\beta}_{ho}+{\displaystyle \sum_{k=1}^{n-1}{I}_{jkd}\;{\beta}_{kd}\,-\,}\theta\;{d}_{ij}}\right] $$
(3.19)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\; \exp \left[\alpha +{\displaystyle \sum_{h=1}^{n-1}{I}_{iho}\;{\beta}_{ho}-\theta\;{d}_{ij}}\right] $$
(3.20)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={V}_j\; \exp \left[\alpha +{\displaystyle \sum_{k=1}^{n-1}{I}_{jkd}\;{\beta}_{kd}-\theta\;{d}_{ij}}\right] $$
(3.21)

with origin \( h \) and destination \( k \) specific effects coefficients \( \exp({\beta}_{ho}) \) and \( \exp({\beta}_{kd}) \), and corresponding binary indicator variablesFootnote 13 \( {I}_{iho} \) and \( {I}_{jkd} \) for origins \( i \) and destinations \( j \) respectively,

$$ {I}_{iho}=\begin{cases}1 & \text{if}\ i=h\\ 0 & \text{otherwise}\end{cases} $$
(3.22)
$$ {I}_{jkd}=\begin{cases}1 & \text{if}\ j=k\\ 0 & \text{otherwise.}\end{cases} $$
(3.23)

The fixed effects parameters inflate or deflate the level of flows, depending on whether they are positive or negative. Of note is that one of the origin and one of the destination specific effects coefficients, \( {\beta}_{no} \) and \( {\beta}_{nd} \), have to be set to zero to avoid perfect collinearity in the specifications, and these values are absorbed in the intercept term \( \alpha \).

The most direct approach to estimating the models is with maximum likelihood techniques. The likelihood function to be maximised is proportional to

$$ \mathcal{L}={\displaystyle \prod_{i,j}{\mu}_{ij}^{y_{ij}}} \exp \left(-{\mu}_{ij}\right) $$
(3.24)

where \( y_{ij} \) is the realisation of the random variable \( Y_{ij} \). The Poisson distribution is a member of the exponential family of distributions. Hence parameter estimation can be achieved via GLMs (see McCullagh and Nelder 1983) so that the constrained gravity model variants (3.19)–(3.21) can be treated simply as particular cases of a GLM with a logarithmic link functionFootnote 14 and a Poisson mean. Then for the doubly constrained case, for example, we get

$$ \log \left[E\left({Y}_{ij}\right)\right]= \log {\mu}_{ij}= \log {U}_i+ \log {V}_j+\alpha +\sum_{h=1}^{n-1}{I}_{iho}\;{\beta}_{ho}+\sum_{k=1}^{n-1}{I}_{jkd}\;{\beta}_{kd}-\theta\;{d}_{ij} $$
(3.25)

where the term \( \left( \log {U}_i+ \log {V}_j\right) \) is included in the estimation procedure as an offset variable (that is, its coefficient is fixed to equal one).

The maximum likelihood estimates can be derived by means of an iterative re-weighted least squares procedureFootnote 15 that is implemented in many statistical software packages such as GLIM. A convenient property of the Poisson assumption along with the log-linear functional form assumed for \( {\mu}_{ij} \) is that the resulting maximum likelihood estimates guarantee that the fitted flows \( {\hat{Y}}_{ij} \) satisfy relationships that are consistent with the desirable origin and/or destination constraints of spatial interactionFootnote 16 (see Kirby 1974; Davies and Guy 1987, and Bailey and Gatrell 1995, pp. 353–354 for details). Hence, there is no need to modify standard maximum likelihood parameter estimation to incorporate explicit constraints on predicted flows. The goodness-of-fit of GLMs is assessed on the basis of the log-likelihood ratio statistic, known as the deviance.
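A brief sketch of why the constraints are met automatically, written here for the origin specific effects of Eq. (3.19) and assuming that all sums run over the dyads included in the estimation: up to an additive constant, the logarithm of Eq. (3.24) is

$$ \log \mathcal{L}=\sum_{i,j}\left({y}_{ij} \log {\mu}_{ij}-{\mu}_{ij}\right). $$

Because the logarithmic link implies \( \partial {\mu}_{ij}/\partial {\beta}_{ho}={I}_{iho}\,{\mu}_{ij} \), setting the derivative with respect to \( {\beta}_{ho} \) to zero gives

$$ \frac{\partial \log \mathcal{L}}{\partial {\beta}_{ho}}=\sum_{j}\left({y}_{hj}-{\mu}_{hj}\right)=0\quad \Longrightarrow \quad {\hat{Y}}_{h\bullet}={y}_{h\bullet},\qquad h=1, \dots, n-1. $$

The corresponding condition for \( \alpha \) equates the total of the fitted flows with the total of the observed flows, so the outflow total of the reference origin \( n \) is reproduced as well. The same argument applied to \( {\beta}_{kd} \) yields the destination constraints.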

Fixed effects model specifications allow the unobserved location specific effects to be correlated with the explanatory variables. If the individual effects are strictly uncorrelated with the regressors, then it might be appropriate to model the location specific constant terms as randomly distributed across the locations. The role of random effects terms in this context may be twofold: first, supporting inferences beyond the specific fixed values of covariates employed in an analysis, and, second, accounting for correlation in a non-random sample of data being analysed, in part due to missing variables, for which they function as a surrogate. Random effects may be used if the values of independent variables—which were not deliberately selected by an experimenter—are thought to be a small subset of all possible values to which inferences are to be made, to account for heterogeneity/overdispersion/inter-observation correlation, or to handle observations that are not obtained by simple random sampling but come from a cluster or multi-level sampling design.

The random effects counterpartsFootnote 17 of the fixed effects model specifications (3.19)–(3.21) may be formulated as in Eqs. (3.26)–(3.28).

$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\;{V}_j \exp \left[\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij}\right] $$
(3.26)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\; \exp \left[\alpha +{\xi}_{io}-\theta\;{d}_{ij}\right] $$
(3.27)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={V}_j \exp \left[\alpha +{\xi}_{jd}-\theta\;{d}_{ij}\right] $$
(3.28)

with zero mean normally distributed origin and destination specific random effectsFootnote 18 \( {\xi}_{io} \) and \( {\xi}_{jd} \).

Finally, note that there is a constant of proportionality term, \( \alpha \), in the preceding Poisson gravity model specifications. This term is made explicit because the balancing factors, which can be calibrated with the Deming-Stephan-Furness procedure as already mentioned, have a constant factored from them. This factorisation is achieved by: (i) setting the maximum \( A_i \) and/or the maximum \( B_j \) values to one at each iteration step in the procedure; (ii) arbitrarily removing one of the origin and/or destination indicator variables in the fixed effects specifications; and, (iii) imposing a mean of zero on the random effects prior distribution. This is equivalent to rewriting Eq. (3.12) as \( E({Y}_{ij})=K {A}_i {B}_j {U}_i {V}_j \exp (-\theta\;{d}_{ij}), \) where \( K \) is a constant.
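The factorisation in (i) can be written out explicitly. As a sketch, suppose \( A_i \) and \( B_j \) solve Eqs. (3.13)–(3.14), and define rescaled factors (the starred symbols are introduced here for illustration only):

$$ {A}_i^{*}=\frac{A_i}{\max_h {A}_h},\qquad {B}_j^{*}=\frac{B_j}{\max_k {B}_k},\qquad K=\max_h {A}_h\;\max_k {B}_k. $$

Then

$$ {A}_i\;{B}_j\;{U}_i\;{V}_j \exp \left(-\theta\;{d}_{ij}\right)=K\;{A}_i^{*}\;{B}_j^{*}\;{U}_i\;{V}_j \exp \left(-\theta\;{d}_{ij}\right),\qquad \alpha = \log K. $$

The fitted flows are unchanged by the rescaling; only the split between the constant and the location specific factors changes, with the largest rescaled factor on each side equal to one.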

4 Accounting for Spatial Dependence in the Model Specifications

Origin-destination flows are not independent (Bolduc et al. 1995; Tiefelsdorf 2003), because flows are fundamentally spatial in nature (LeSage and Pace 2009). Spatial dependence in flows relates to correlations among flows between locations that are neighbouring a given origin-destination pair of locations.Footnote 19 Hence, a failure to account for spatial dependence in model specifications may lead to biased parameter estimates and incorrect conclusions. One way to overcome this problem is by incorporating spatial dependence into the Poisson versions of the constrained gravity model variants.Footnote 20 Another way to address spatial dependence in origin-destination flows involves eigenvector spatial filteringFootnote 21 (see Chun 2008; Fischer and Griffith 2008; Griffith 2009; Chun and Griffith 2011). Eigenvector spatial filtering relies on a spectral decompositionFootnote 22 of an N-by-N spatial weight matrix W into eigenvalues and eigenvectors, and then uses a subset of these eigenvectors as additional explanatory variables in the model specifications.

The spatial filtering used in this paper relies on a spectral decomposition of a transformed spatial weight matrix \( MWM \), where \( W \) is an N-by-N spatial weight matrixFootnote 23

$$ W={W}_n\otimes {W}_n $$
(3.29)

that captures spatial dependence between flows from locations neighbouring both the origins and destinations, labelled origin-to-destination dependence by LeSage and Pace (2008). \( W_n \) is a row-stochastic n-by-n spatial weight matrix that describes spatial neighbourhood relationships between the n locations, so that \( N=n^2 \). This matrix has, by convention, zeros in the main diagonal and non-negative elements in the off-diagonal cells. Specifically, the (i, j)th element of \( W_n \) is greater than zero if i and j are neighbouringFootnote 24 locations. \( \otimes \) denotes the Kronecker product. \( M \) is the N-by-N projection matrix

$$ M=I-\iota {\iota}^{\prime}\frac{1}{N} $$
(3.30)

where I is the N-by-N identity matrix, and ι the N-by-1 vector of ones.

The approach focuses on capturing correlations among flows with a spatial filter that is constructed as a linear combination of eigenvectors extracted from the matrix \( MWM \). The eigenvalues scaled by \( N/({\iota}^{\prime}\,{W}_n\,\iota) \) directly indicate the Moran’s I coefficient values of the map patterns represented by the corresponding eigenvectors (Tiefelsdorf and Boots 1995). The first eigenvector, say \( E_1 \), is the set of real numbers that has the largest Moran’s I value achievable by any set of real numbers for the spatial dependence structure defined by the spatial weight matrix \( W \). The second eigenvector, \( E_2 \), is the set of real numbers that has the largest achievable Moran’s I value by any set that is orthogonal to and uncorrelated with \( E_1 \). The third eigenvector is the third such set of values, and so on through \( E_N \), the set of real numbers that has the largest negative Moran’s I value achievable by any set that is orthogonal to and uncorrelated with the preceding \( (N-1) \) eigenvectors. As such, these eigenvectors furnish distinct map pattern descriptions of latent spatial dependence in the origin-destination flow variable because they are both orthogonal and uncorrelated. The Moran’s I values given by their corresponding eigenvalues index the nature and degree of spatial dependence portrayed by each eigenvector (Tiefelsdorf and Boots 1995), and can be standardised by the largest Moran’s I value, \( I_{\max} \).

The construction of a spatial filter involves a stepwise selection process. Griffith (2003) suggests identifying a set of candidate eigenvectors first, based on a critical value for the corresponding eigenvalues, a value that indicates a specific minimum spatial autocorrelation levelFootnote 25 such as 0.5 measured in terms of the statistic \( I/I_{\max} \). From these candidate eigenvectors, a subset of \( Q \) eigenvectors then can be selected with standard model selection criteria such as the Akaike information criterion. In the doubly constrained case of spatial interaction, for example, this yields the following spatial filter versions of the model specifications (3.12), (3.19) and (3.26), respectively:

$$ E\left({Y}_{ij}\right)={A}_i\;{B}_j\;{U}_i\;{V}_j \exp \left(\alpha -\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}\right) $$
(3.31)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\;{V}_j \exp \left[\alpha +{\displaystyle \sum_{h=1}^{n-1}{I}_{iho}\;{\beta}_{ho}}+{\displaystyle \sum_{k=1}^{n-1}{I}_{jkd}\;{\beta}_{kd}}-\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}\right] $$
(3.32)
$$ E\left({Y}_{ij}\right)={\mu}_{ij}={U}_i\;{V}_j \exp \left[\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}\right] $$
(3.33)

where \( E_q \) denotes the qth eigenvector and \( {\phi}_q \) its associated coefficient. The term \( \exp \big({{\sum}_q\;{E}_q\;{\phi}_q}\big) \) is called a spatial filter.
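The construction of a candidate eigenvector can be illustrated with a small Python sketch. Everything specific in it is an assumption made for the example: a hypothetical ring of four locations in which each location has two neighbours with weight 0.5 (so that \( W_n \) is both row-stochastic and symmetric), the helper names, and the use of power iteration in place of a full eigen-decomposition routine. The sketch builds \( W={W}_n\otimes {W}_n \), applies the projection \( M \) by centring, extracts the leading eigenvector of \( MWM \), and evaluates its Moran’s I.

```python
def kron(A, B):
    """Kronecker product of two square matrices stored as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(size)]
            for r in range(size)]

def centre(x):
    """Apply M = I - (1/N) * ones * ones' to a vector."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def moran(W, x):
    """Moran's I written as (N / 1'W1) * (x'MWMx) / (x'Mx)."""
    z = centre(x)
    s0 = sum(map(sum, W))
    num = sum(a * b for a, b in zip(z, matvec(W, z)))
    den = sum(a * a for a in z)
    return (len(x) / s0) * num / den

def leading_eigenvector(W, iters=500):
    """Power iteration on MWM + I.

    Assumes W is symmetric with eigenvalues in [-1, 1], so the shift by I
    makes the largest eigenvalue of MWM the dominant one.
    """
    x = centre([float(k * k) for k in range(len(W))])  # arbitrary start
    for _ in range(iters):
        y = centre(matvec(W, centre(x)))       # MWM x
        x = [a + b for a, b in zip(x, y)]      # (MWM + I) x
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
    return x

# Hypothetical ring of n = 4 locations; flow dyad (i, j) has index i * n + j.
Wn = [[0.0, 0.5, 0.0, 0.5],
      [0.5, 0.0, 0.5, 0.0],
      [0.0, 0.5, 0.0, 0.5],
      [0.5, 0.0, 0.5, 0.0]]
W = kron(Wn, Wn)            # N-by-N with N = n * n = 16
E1 = leading_eigenvector(W)
I1 = moran(W, E1)
```

In applied work with a non-symmetric row-stochastic \( W_n \) and many locations, a numerical eigen-decomposition routine would normally replace the power iteration; the sketch only shows how the pieces \( W \), \( M \) and Moran’s I fit together.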

The approach provides a simple way of filtering the sample flow data for spatial dependence in an effort to mimic independent data amenable to standard estimation procedures, and hence to reduce potential bias in the estimation of coefficients associated with the explanatory variables. Spatial filtering, however, also faces computational challenges in situations involving a large sample of observations.Footnote 26

5 Equivalency Relationships Between the Balancing Factors, Fixed Effects, and Random Effects

This section compares the three different model variants of constrained spatial interaction with each other. First, attention is shifted toward comparisons between model specifications with balancing factors and with fixed effects, and then between model specifications with balancing factors and with random effects, for both the doubly and singly constrained cases of spatial interaction.

5.1 Comparisons Between Balancing Factors and Fixed Effects

The first comparison is between the doubly constrained model with balancing factors and its corresponding fixed effects model specification, and hence focuses on the relationship between Eqs. (3.31) and (3.32).

Theorem 1

If \( Y_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {U}_i+ \log {V}_j+\alpha +{\alpha}_{io}+{\alpha}_{jd}-\theta\;{d}_{ij}+{\sum}_q\;{E}_q\;{\phi}_q) \), where \( \alpha \) denotes the global Poisson regression intercept term, \( {\alpha}_{io} \) the Poisson regression origin location specific intercept term, and \( {\alpha}_{jd} \) the Poisson regression destination location specific intercept term, then the balancing factors for the doubly constrained gravity model are given by \( {A}_i= \exp ({\alpha}_{io}) \) and \( {B}_j= \exp ({\alpha}_{jd}). \)

Proof

Because Eqs. (3.31) and (3.32) posit the expectation for the same random variable, \( Y_{ij} \), for \( i,j=1, \dots, n \)

$$ \begin{aligned} &{A}_i\;{B}_j\;{U}_i\;{V}_j \exp \left(\alpha -\theta\;{d}_{ij}+\sum_{q=1}^Q{E}_q\;{\phi}_q\right)\\ &\quad ={U}_i\;{V}_j \exp \left(\alpha +{\beta}_{io}+{\beta}_{jd}-\theta\;{d}_{ij}+\sum_{q=1}^Q{E}_q\;{\phi}_q\right) \end{aligned} $$
(3.34)
$$ {A}_i\;{B}_j = \exp\;\left({\beta}_{io}+{\beta}_{jd}\right) = \exp\;\left({\beta}_{io}\right)\; \exp\;\left({\beta}_{jd}\right) $$
(3.35)
$$ \therefore {A}_i = \exp \left({\beta}_{io}\right)= \exp \left({\alpha}_{io}\right)\quad \text{for all}\ i=1, \dots, n $$
(3.36)
$$ {B}_j= \exp \left({\beta}_{jd}\right)= \exp \left({\alpha}_{jd}\right)\quad \text{for all}\ j=1, \dots, n\qquad \square $$
(3.37)

Remarks

The equivalencies \( {\alpha}_{io}={\beta}_{io} \) and \( {\alpha}_{jd}={\beta}_{jd} \) relate these results not only to the doubly, but also to the singly constrained cases. Furthermore, \( \exp (\alpha) \) is the constant of proportionality, which frequently is set to one (i.e., \( \alpha =0 \)) for the traditional entropy maximising solution, and other than one for the conventional gravity model solution. Allowing \( \exp (\alpha) \) to deviate from one in the Deming-Stephan-Furness procedure helps to stabilise convergence for large flow matrices, and may be achieved by setting the largest \( A_i \) and the largest \( B_j \) values to one during each iteration. This adjustment is equivalent to setting one of the \( {\alpha}_{io}={\beta}_{io} \) and one of the \( {\alpha}_{jd}={\beta}_{jd} \) to zero in the fixed effects specification in order to avoid perfect multicollinearity between the location specific indicator variables and the global mean (which is a coefficient times an n-by-1 vector of ones). Estimates of \( {\beta}_{io} \) and \( {\beta}_{jd} \) are obtained with Poisson regression.

This result relates the log-balancing factors, \( \log ({A}_i) \) and \( \log ({B}_j) \), for the doubly constrained gravity model to their counterpart origin and destination fixed effects, \( {\alpha}_{io} \) and \( {\alpha}_{jd} \). Hence fixed effects take on a particular meaning because they can be interpreted as balancing factors. Cesario (1977) characterises the meaning of the origin and destination balancing factors as follows: \( 1/{A}_i \) indexes the accessibility of all destination locations vis-à-vis origin \( i \), and \( 1/{B}_j \) indexes the accessibility of all origin locations vis-à-vis destination \( j \).

The next comparisons are between the model specifications with balancing factors and fixed effects in the singly constrained cases of spatial interaction. To this end, Theorem 1 suggests the following two corollaries pertaining to the singly constrained spatial filter model specifications.

Corollary 1

If \( Y_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {U}_i+\alpha +{\alpha}_{io}-\theta\;{d}_{ij}+{\sum}_q\;{E}_q\;{\phi}_q) \), where \( \alpha \) denotes the global Poisson regression intercept term, and \( {\alpha}_{io} \) the Poisson regression origin location specific intercept term, then the balancing factors for the origin constrained gravity model are given by \( {A}_i= \exp ({\alpha}_{io}) \) for \( i=1, \dots, n \).

Corollary 2

If \( {Y}_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {V}_j+\alpha +{\alpha}_{jd}-\theta\;{d}_{ij}+{{\sum}_q\;{E}_q\;{\phi}_q}) \), where α denotes the global Poisson regression intercept term, and \( {\alpha}_{jd} \) the Poisson regression destination location specific intercept term, then the balancing factors for the destination constrained gravity model are given by \( {B}_j= \exp ({\alpha}_{jd}) \) for \( j=1, \dots, n \).

These two corollaries relate the log-balancing factors for the singly constrained gravity models to their counterpart origin or destination fixed effect model specifications.
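The mechanism behind Corollary 1 can be checked numerically. The Poisson score equation for each origin indicator forces the fitted row totals to equal the observed row totals, which is exactly the origin constraint. The sketch below fits an origin fixed effects Poisson regression by Newton iterations on simulated data. Everything in it is an assumption made for illustration: the data are synthetic, the global intercept is absorbed into the origin effects (so α = 0), and the spatial filter term is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
xy = rng.uniform(0.0, 1.0, size=(n, 2))                       # hypothetical locations
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)   # n-by-n distances
U = rng.uniform(20.0, 50.0, size=n)                           # origin factor U_i
a_true = rng.uniform(-0.5, 0.5, size=n)
Y = rng.poisson(U[:, None] * np.exp(a_true[:, None] - 1.0 * d))

# Stack the flows row by row: n origin indicators plus one (negated) distance column
y = Y.ravel().astype(float)
X = np.hstack([np.kron(np.eye(n), np.ones((n, 1))), -d.reshape(-1, 1)])
offset = np.repeat(np.log(U), n)                              # log U_i as offset

# Start at theta = 0 with origin effects that already reproduce the row totals
beta = np.append(np.log(Y.sum(axis=1) / (n * U)), 0.0)
for _ in range(100):
    mu = np.exp(offset + X @ beta)
    step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    beta += step                                              # Newton update
    if np.max(np.abs(step)) < 1e-12:
        break

alpha_io, theta = beta[:n], beta[n]
mu = np.exp(offset + X @ beta).reshape(n, n)

# Origin balancing factors computed directly from the origin constraint
A = Y.sum(axis=1) / (U * np.exp(-theta * d).sum(axis=1))
```

At convergence `mu.sum(axis=1)` reproduces the observed origin totals and `A` coincides with `np.exp(alpha_io)`, which is the statement of the corollary for this simplified setting.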

5.2 Comparisons Between Balancing Factors and Random Effects

Finally, comparisons can be made between the preceding results and the random effects model specifications. In this context, a specification includes \( {\xi}_{io} \) and/or \( {\xi}_{jd} \), normal random variables, which respectively denote the random effects for origin i and/or destination j, whose stochastic quantities are added to the global intercept term. The next theorem relates to the relationship between a doubly constrained model with balancing factors and its corresponding random effects model specification, and hence focuses on the relationship between Eqs. (3.31) and (3.33).

Theorem 2

If \( {Y}_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {U}_i+ \log {V}_j+\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij}+{{\sum}_q\;{E}_q\;{\phi}_q}) \), where α denotes the global Poisson regression intercept term, \( {\xi}_{io} \) the Poisson regression origin location random effect, and \( {\xi}_{jd} \) the Poisson regression destination location random effect, such that \( {\xi}_{io}\sim \mathcal{N}(0,{\sigma}_{\xi_o}^2) \) and \( {\xi}_{jd}\sim \mathcal{N}(0,{\sigma}_{\xi_d}^2) \), where \( {\sigma}_{\xi_o}^2 \) and \( {\sigma}_{\xi_d}^2 \) denote the origin and the destination location random effects variances respectively, then the balancing factors for the doubly constrained gravity model are given by \( {A}_i= \exp ({\xi}_{io}) \) for \( i=1, \dots, n \), and \( {B}_j= \exp ({\xi}_{jd}) \) for \( j=1, \dots, n \).

Proof

Equation (3.33) implies

$$ \log \left[E\left({Y}_{ij}\right)\right]= \log {\mu}_{ij}= \log {U}_i+ \log {V}_j+\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}. $$
(3.38)

Because Eqs. (3.31) and (3.38) posit the expectation for the same random variable, \( {Y}_{ij} \), for \( i,j=1, \dots, n \),

$$ \begin{array}{l} {A}_i\;{B}_j\;{U}_i\;{V}_j \exp \left(\alpha -\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}\right)\\ \qquad ={U}_i\;{V}_j \exp \left(\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij}+{\displaystyle \sum_{q=1}^Q{E}_q\;{\phi}_q}\right). \end{array} $$
(3.39)

Cancelling the common factor \( {U}_i\;{V}_j \exp \left(\alpha -\theta\;{d}_{ij}+{\sum}_q\;{E}_q\;{\phi}_q\right) \) from both sides gives

$$ {A}_i{B}_j= \exp ({\xi}_{io}+{\xi}_{jd})= \exp ({\xi}_{io})\; \exp ({\xi}_{jd}) $$
(3.40)

\( \therefore \) \( {A}_i= \exp ({\xi}_{io}) \) for all \( i=1, \dots, n \) and \( {B}_j= \exp ({\xi}_{jd}) \) for all \( j=1, \dots, n \)

Remarks

Both \( {\xi}_{io} \) and \( {\xi}_{jd} \) have a mean of zero, which is achieved by having the global mean, α, in the model specification. In other words, the individual origin and destination location means deviate from the global mean by random quantities. Theorems 1 and 2 together imply: \( {A}_i= \exp ({\xi}_{io})= \exp ({\beta}_{io})= \exp ({\alpha}_{io}) \) for all \( i=1, \dots, n \), and \( {B}_j= \exp ({\xi}_{jd})= \exp ({\beta}_{jd})= \exp ({\alpha}_{jd}) \) for all \( j=1, \dots, n \). Estimates of \( {\xi}_{io} \) and \( {\xi}_{jd} \) are obtained by integrating them out of the likelihood function.
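To make the phrase "integrating them out" concrete, the marginal likelihood can be written down for the simplest case. The expression below is a sketch for a specification with origin random effects only, assuming independent normal effects and using the mean \( {\mu}_{ij}= \exp ( \log {U}_i+\alpha +{\xi}_{io}-\theta\;{d}_{ij}+{{\sum}_q\;{E}_q\;{\phi}_q}) \); the lower-case \( {y}_{ij} \) for an observed flow is notation introduced here.

$$ L\left(\alpha, \theta, \phi, {\sigma}_{\xi_o}^2\right)={\displaystyle \prod_{i=1}^n{\displaystyle \int_{-\infty}^{\infty}\left[{\displaystyle \prod_{j=1}^n\frac{{\mu}_{ij}^{\,{y}_{ij}}\; \exp \left(-{\mu}_{ij}\right)}{{y}_{ij}!}}\right]\frac{1}{\sqrt{2\pi {\sigma}_{\xi_o}^2}} \exp \left(-\frac{{\xi}_{io}^2}{2{\sigma}_{\xi_o}^2}\right)d{\xi}_{io}}} $$

Here the integral factorises into n one-dimensional integrals, one per origin. With both \( {\xi}_{io} \) and \( {\xi}_{jd} \) present the effects are crossed, every flow shares an effect with others in its row and its column, and the integral no longer factorises in this way.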

Singly constrained models are obtained by setting \( {\xi}_{jd}=0 \) for all j, yielding the origin constrained specification, or \( {\xi}_{io}=0 \) for all i, yielding the destination constrained specification. Accordingly, Theorem 2 suggests the following two corollaries pertaining to the singly constrained model specifications.

Corollary 3

If \( {Y}_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {U}_i+\alpha +{\xi}_{io}-\theta\;{d}_{ij}+{{\sum}_q\;{E}_q\;{\phi}_q}) \), where α denotes the global Poisson regression intercept term, and \( {\xi}_{io} \) the Poisson regression origin location random effect, such that \( {\xi}_{io}\sim \mathcal{N}(0,{\sigma}_{\xi_o}^2) \), where \( {\sigma}_{\xi_o}^2 \) denotes the origin location finite random effects variance, then the balancing factors for the origin constrained gravity model are given by \( {A}_i= \exp \left({\xi}_{io}\right) \) for \( i=1, \dots, n \).

Corollary 4

If \( {Y}_{ij}\sim \) Poisson with mean \( {\mu}_{ij}= \exp ( \log {V}_j+\alpha +{\xi}_{jd}-\theta\;{d}_{ij}+{{\sum}_q\;{E}_q\;{\phi}_q}) \), where α denotes the global Poisson regression intercept term, and \( {\xi}_{jd} \) the Poisson regression destination location random effect, such that \( {\xi}_{jd}\sim \mathcal{N}(0,{\sigma}_{\xi_d}^2) \), where \( {\sigma}_{\xi_d}^2 \) denotes the destination location finite random effects variance, then the balancing factors for the destination constrained gravity model are given by \( {B}_j= \exp ({\xi}_{jd}) \) for \( j=1, \dots, n \).

6 An Illustrative Example

In this section we use knowledge flows as captured by patent citation data to numerically illustrate the relationships between the aforementioned balancing factors, fixed effects and random effects in the cases of singly and doubly constrained variants of the gravity model. The origin-destination data relate to citations between European high-technology patents. By European patents we mean patent applications at the European Patent Office assigned to high-technology firms located in Europe. High-technology is defined to include the International Standard Industrial Classification (ISIC) sectors of aerospace (ISIC 3845), electronics-telecommunication (ISIC 3832), computers and office equipment (ISIC 3825), and pharmaceuticals (ISIC 3522). Self-citations (that is, citations from patents assigned to the same firm) have been excluded, given our interest in pure externalities as evidenced by interfirm knowledge spillovers.

Experts acknowledge that observations of patent citations are subject to a truncation bias, because we observe citations for only a portion of the life of an invention. To avoid this bias in the analysis, we have established a 5-year window (that is, 1985–1989, 1986–1990,…, 1993–1997) to count citations to a patent.Footnote 27 The observation period is 1985–1997 with respect to cited patents, and 1990–2002 with respect to citing patents. The sample used in this section is restricted to inventors located in n = 257 European NUTS-2 regions, covering the EU-27 member states (excluding Cyprus and Malta) plus Norway and Switzerland. In case of cross-regional inventor teams, we have used the procedure of multiple full counting that—unlike fractional counting—does justice to the true integer nature of patent citations, but gives interregional cooperative inventions greater weight.

Subject to caveats relative to the relationship between patent citations and knowledge spillovers, the sample data allow us to identify and measure spatial separation effects for interregional knowledge spillovers in the spatial system of 257 regions. We use a binary 257-by-257 contiguity matrix to specify the 66,049-by-66,049 spatial weight matrix W that captures spatial dependence between patent citation flows from locations neighbouring both the origins and the destinations. Our interest is focused on the following three measures of separation: geographical distance, measured in terms of the great circle distance (in km); technological proximity, measured in terms of an index (for details see Fischer et al. 2006); and a dummy variable that represents border effects, measured in terms of the presence of country borders between the regions. The product \( {U}_i\;{V}_j \) may be interpreted simply as the number of distinct (i, j) interactions that are possible. Thus, a reasonable way to measure the origin factor \( {U}_i \) is in terms of the number of patents in knowledge producing region i in the time period 1985–1997, and the destination factor \( {V}_j \) in terms of the number of patents in knowledge absorbing region j in the time period 1990–2002 (Fischer and Griffith 2008). Accordingly, we have 66,049 observations, five (four) covariates and an intercept term in the doubly (singly) constrained cases of spatial interaction.
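One way to build a flow-level weight matrix of this kind from a regional contiguity matrix is a Kronecker product, under which flow (i, j) neighbours flow (k, l) when i is contiguous to k and j is contiguous to l. The sketch below assumes that construction, since this section does not spell out how W is formed, and uses a hypothetical four-region chain in place of the 257 NUTS-2 regions.

```python
import numpy as np

# Hypothetical contiguity matrix for four regions arranged in a chain 0-1-2-3
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = C.shape[0]

# Assumed construction: W[(i, j), (k, l)] = C[i, k] * C[j, l], with flows stacked row by row
W = np.kron(C, C)                     # n^2-by-n^2

# Eigenpairs of W follow from those of C, so W itself never has to be decomposed
lam, E = np.linalg.eigh(C)
v = np.kron(E[:, 0], E[:, -1])        # eigenvector of W
lam_w = lam[0] * lam[-1]              # matching eigenvalue of W
```

With n = 257 the matrix has 66,049 rows, so working from the n-by-n eigenvectors rather than from W directly is what keeps such a computation manageable.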

6.1 Model Specifications Ignoring Spatial Dependence in Origin-Destination Flows

Empirical experiments were conducted to numerically illustrate relationships between the aforementioned balancing factors, fixed effects, and random effects. The preceding theorems and corollaries indicate that these should be perfectly straight trend line relationships (using the log-balancing factors) with a slope of one, but not necessarily an intercept of zero. The intercept term represents an arbitrary multiplicative factor (i.e., a constant of proportionality).

Theorem 1 together with Corollaries 1 and 2 indicate that the scatterplots of the log-balancing factors versus their concatenated Poisson regression indicator variable coefficients (augmented by zero for the arbitrarily removed indicator variables) form a perfect straight line [see Fig. 3.1]. The accompanying linear regression equationsFootnote 28 relating these two pairings of values are as followsFootnote 29: for the origin constrained case of spatial interaction, \( \log ({A}_i)=0.00051+0.99997\;{\alpha}_{io} \) (\( {R}^2=1.0000 \)); for the destination constrained case, \( \log ({B}_j)=0.00001+0.99999\;{\alpha}_{jd} \) (\( {R}^2=1.0000 \)); and for the doubly constrained case, \( \log ({A}_i)=0.00018+1.00114\;{\alpha}_{io} \) (\( {R}^2=1.0000 \)) and \( \log ({B}_j)=0.00099+1.00110\;{\alpha}_{jd} \) (\( {R}^2=1.0000 \)). Furthermore, these log-balancing factors strongly covary [see Fig. 3.2(a) and (b)], and all deviate somewhat from a normal frequency distribution as indicated by Fig. 3.2(c)–(f).

Fig. 3.1 Scatterplots of the log-balancing factors [\( \log ({A}_i) \) and \( \log ({B}_j) \)] versus the vector of the Poisson regression indicator variable coefficients: (a) the singly constrained cases: origin and destination balancing factor plots; and (b) the doubly constrained case: the origin and the destination balancing factor plots

Fig. 3.2 Log-balancing factors for the constrained model variants: (a) scatterplot of the singly constrained origin and destination log-balancing factor pairs; (b) scatterplot of the doubly constrained origin and destination log-balancing factor pairs; (c) normal quantile plot of \( \log ({A}_i) \) values in the origin-constrained case, with its 95 % confidence interval (CI); (d) normal quantile plot of \( \log ({B}_j) \) values in the destination-constrained case, with its 95 % CI; (e) normal quantile plot of \( \log ({A}_i) \) values in the doubly-constrained case, with its 95 % CI; and (f) normal quantile plot of \( \log ({B}_j) \) values in the doubly-constrained case, with its 95 % CI

Model comparison results for the fixed effects versions of the constrained models are presented in Table 3.1. Inclusion of the origin and/or destination balancing factors as fixed effects covariates reduces overdispersion as indicated by the deviance statistic,Footnote 30 noticeably changes the three separation function component parameter estimates (especially that for the geographical distance decay), and remarkably increases the pseudo-\( {R}^2 \) value (measured in terms of a linear relationship between the predicted and observed counts). The last column in Table 3.1 presents estimation results for the doubly constrained spatially filtered gravity model specification, for comparative purposes. The elimination of spatial dependence in the flows triggers a change of estimated parameter values and generates a decrease in the estimated overdispersion, compared with the standard doubly constrained model specification. That is, the part of the overdispersion caused by spatial dependence is eliminated by including eigenvectors, which serve as proxy variables for the spatial dependence embedded in the standard model.
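For readers who want to reproduce diagnostics of this kind, the following Python sketch shows one common way to compute them. It assumes the usual Poisson deviance, takes deviance divided by residual degrees of freedom as the overdispersion indicator, and computes the pseudo-\( {R}^2 \) as the squared correlation between observed and predicted counts. The function names and the toy numbers are hypothetical, and the exact definitions behind Table 3.1 may differ.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y * log(y / mu) - (y - mu)], with 0 * log(0) taken as 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)   # log(1) = 0 handles the zero counts
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def dispersion(y, mu, n_params):
    """Deviance per residual degree of freedom; values well above 1 suggest overdispersion."""
    return poisson_deviance(y, mu) / (len(y) - n_params)

def pseudo_r2(y, mu):
    """Squared linear correlation between observed and predicted counts."""
    return np.corrcoef(y, mu)[0, 1] ** 2

y_obs = np.array([1.0, 3.0, 5.0, 12.0, 2.0])    # hypothetical observed flows
mu_hat = np.array([1.5, 2.5, 6.0, 10.0, 1.5])   # hypothetical fitted means
stats = (poisson_deviance(y_obs, mu_hat),
         dispersion(y_obs, mu_hat, n_params=2),
         pseudo_r2(y_obs, mu_hat))
```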

Table 3.1 Summary statistics for estimation of the fixed effects constrained variants of the gravity model (standard deviations in parentheses)

Theorem 2 together with Corollaries 3 and 4 address the random effects model specifications for the three constrained variants of the gravity model. Treated as particular cases of a GLM with a logarithmic link function and a Poisson mean, these specifications yield the following expected values: \( \log \left[E({Y}_{ij})\right]= \log ({\mu}_{ij})= \log ({U}_i)+ \log ({V}_j)+\alpha +{\xi}_{io}+{\xi}_{jd}-\theta\;{d}_{ij} \) in the doubly constrained case of spatial interaction, \( \log \left[E({Y}_{ij})\right]= \log ({\mu}_{ij})= \log ({U}_i)+\alpha +{\xi}_{io}-\theta\;{d}_{ij} \) in the origin constrained case, and \( \log \left[E({Y}_{ij})\right]= \log ({\mu}_{ij})= \log ({V}_j)+\alpha +{\xi}_{jd}-\theta\;{d}_{ij} \) in the destination constrained case. The log terms on the right-hand side of the equations are the offset variables. Bolduc et al. (1995) argue that estimating origin and destination specific random effects in the gravity model specification is very difficult. Nevertheless, Theorem 2 implies that a numerical demonstration is possible for the doubly constrained specification as well.

Descriptive statistics for the random effects estimates are given in Table 3.2. A frequentist approach requires integration of these effects out of the likelihood function under study. As n increases, the multidimensional integration involved becomes increasingly difficult. Here, with n = 257, the SAS procedure PROC NLMIXED fails to correctly calculate about 10 % of the random effects (see the Appendix). This complication led to an indirect demonstration of Theorem 2, designed as follows. Each balancing factor was introduced into the model specification, and then a random effects term was estimated. If a balancing factor is equivalent to a random effects term, then all of the estimated random effects should be approximately zero. This expectation characterises the findings summarised in Table 3.2. In other words, the estimated fixed and random effects display no consequential differences. The generalised linear mixed model (GLMM) random effects estimates are nearly identical to their balancing factor fixed effects counterparts.

Table 3.2 Summary statistics for random effects estimations: the origin-constrained, the destination-constrained and the doubly-constrained cases

Spatial filter descriptions of these variates are nearly identical,Footnote 31 as shown in Table 3.3, and comprise 17–20 of the 42 candidate eigenvectors depicting at least weak positive spatial autocorrelation map patterns. These filters allow the balancing factors to be deconstructed into spatially structured (SSRE) and spatially unstructured (SURE) random effects: the linear combination of eigenvectors constitutes the SSRE, and the (remaining) residuals constitute the SURE. The SSREs account for roughly two-thirds of the variance displayed by the total random effects terms. This spatial structuring represents moderate-to-strong positive spatial autocorrelation, and is one reason the individual terms deviate from a normal frequency distribution [all P(S-W) statistics increase, but still indicate marked deviation from a normal distribution]. These linear spatial filters account for virtually all of the spatial autocorrelation latent in the spatial distribution of these balancing factor variates, which differs from the spatial dependence latent in the flows between the regions.
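The decomposition just described amounts to a linear regression of a balancing factor variate on selected eigenvectors. The sketch below illustrates it under several assumptions that are not taken from the text: the candidate eigenvectors come from the doubly centred contiguity matrix MCM with \( M=I-{11}^T/n \), the regions form a hypothetical 5-by-5 lattice, the variate is simulated, and the four eigenvectors with the largest eigenvalues are used in place of a stepwise selection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-by-5 lattice with rook contiguity
side = 5
n = side * side
C = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        if c + 1 < side:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < side:
            C[i, i + side] = C[i + side, i] = 1.0

# Candidate eigenvectors of the doubly centred contiguity matrix (assumed construction)
M = np.eye(n) - np.ones((n, n)) / n
lam, E = np.linalg.eigh(M @ C @ M)
E_sel = E[:, np.argsort(lam)[::-1][:4]]     # four largest eigenvalues

# Simulated log-balancing factors: spatial signal plus unstructured noise
z = E_sel @ np.array([1.0, -0.5, 0.8, 0.3]) + 0.4 * rng.standard_normal(n)
zc = z - z.mean()

coef, *_ = np.linalg.lstsq(E_sel, zc, rcond=None)
ssre = E_sel @ coef                          # spatially structured component
sure = zc - ssre                             # spatially unstructured residual
share = ssre.var() / zc.var()                # variance share attributable to the SSRE
```

Because the selected eigenvectors are orthogonal to the residual, the two components partition the variance, which is what allows a statement such as "roughly two-thirds" to be read directly from `share`.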

Table 3.3 Summary statistics for the balancing factors and the decomposition

Consequently, these particular singly constrained gravity model results confirm Corollaries 3 and 4, and as such indirectly demonstrate Theorem 2. They also illustrate that \( {A}_i= \exp ({\xi}_{io})= \exp ({\beta}_{io})= \exp ({\alpha}_{io}) \) for \( i=1, \dots, n \), and \( {B}_j= \exp ({\xi}_{jd})= \exp ({\beta}_{jd})= \exp ({\alpha}_{jd}) \) for \( j=1, \dots, n \). In other words, the model specifications with balancing factors, fixed effects and random effects, respectively, yield identical estimation results for the origin (production) constrained and the destination (attraction) constrained cases of spatial interaction. These findings imply that the same results hold for the doubly constrained case (Fig. 3.3).

Fig. 3.3 Matrix lower triangular scatterplots, and upper triangular correlations: (a) spatially structured random effects (SSREs): sossre—singly-constrained origin, sdssre—singly-constrained destination, dossre—doubly-constrained origin, and ddssre—doubly-constrained destination; and (b) spatially unstructured random effects (SUREs): sossre—singly-constrained origin, sdssre—singly-constrained destination, dossre—doubly-constrained origin, and ddssre—doubly-constrained destination

6.2 Spatial Filter Model Specifications Accounting for Spatial Dependence in Flows

Estimating the balancing factors for singly and doubly constrained model specifications accounts for spatial dependence in the origin and destination factors of the gravity model, but not for spatial dependence in flows. Because only one set of indicator variables is involved in singly constrained model specifications, the intercept term can be added to each factor, forcing α to zero in both the origin constrained and the destination constrained model specifications. This simple adjustment is not possible for the doubly constrained model, for which the intercept term includes the sum of the two arbitrarily selected indicator variable coefficients set to zero. Estimating random effects in the doubly constrained case also overlooks spatial dependence in flows, and treats the n origin flow recipients as repeated measures for each destination, and the n destination flow sources as repeated measures for each origin, respectively. All of these specifications posit a unique value for each origin/destination for the \( N={n}^2 \) flow data.

Origin and destination balancing factors must be estimated simultaneously (not sequentially) with the spatial filters, in order to preserve the row and column constraining totals. For the current case study, the spatial filters represent moderate-to-strong positive spatial autocorrelation \( \left(I\approx 0.70\right) \), decrease overdispersion by a third or more beyond the reduction attributable to the balancing factors (Table 3.1), produce a modest increase in the pseudo-\( {R}^2 \) value, induce a marked decrease in the distance decay parameter (for example, the confidence interval does not overlap with those for the other specifications), and comprise Q = 221 of the 576 candidate eigenvectorsFootnote 32 of matrix W.

Fig. 3.4 Scatterplots of observed (vertical axis) versus predicted (horizontal axis) flows; grey lines denote the line of perfect prediction. (a) Unconstrained model specification; (b) doubly constrained model specification; and (c) doubly constrained model specification adjusting for spatial dependence

Figure 3.4 reports the scatterplots of observed versus predicted flows for the unconstrained gravity model specification, and the doubly constrained gravity model specification with and without accounting for origin-to-destination dependence in the flows. The scatterplots display the standard Poisson pattern of increasing variance with increasing amount of flow, and indicate a sequentially improved alignment of predicted with observed values. Imposing flow data matrix row and/or column total constraints, coupled with inclusion during estimation of a spatial filter capturing spatial dependence between flows from locations neighbouring both the origins and destinations, shrinks especially the larger predicted flow values toward the perfectly straight trend line.

Figure 3.5 portrays the three individual separation effects. An expected finding is that the geographical distance decay parameter estimate adjusted for spatial dependence is less than in the model specifications that ignore spatial dependence in flows, and differs substantially from its unadjusted counterparts [see Fig. 3.5(a)]. The pairs of values do not have overlapping confidence intervals (CIs), in part because of the large sample size. The CI for the unconstrained case of spatial interaction is (−0.4887, −0.4613), the origin constrained case (−0.8473, −0.8161), the destination constrained case (−0.7695, −0.7384), the doubly constrained case (−0.9249, −0.8934), and the spatially filtered doubly constrained case (−1.5507, −1.4476). The technological separation decay parameter estimate exhibits little difference across the specifications [see Fig. 3.5(b)], whereas ignoring spatial dependence appears to exaggerate border separation effects [see Fig. 3.5(c)]. Of note is that the geographical distance parameter estimate has the largest spread across the model specifications.

Fig. 3.5 Separation decay effects for the various model specifications: unconstrained (thin line), destination-constrained (dotted line), origin-constrained (short dash line), doubly-constrained (long dash line), doubly-constrained adjusted for network spatial autocorrelation (thick line). (a) geographical distance; (b) technological distance; and (c) intervening border

7 Concluding Remarks

This paper suggests a number of interesting conclusions and implications for the statistical analysis of origin-destination data. Foremost, and quite counterintuitively, fixed effects and random effects are identical and equal the logarithm of the entropy maximisation derived balancing factors, except for slight rounding/algorithm-convergence errors. This finding is the outcome of an equivalency between assigning a single fixed effects indicator variable to each origin/destination, on the one hand, and estimating a single random effects value (which is a mean) for an origin/destination while treating the corresponding n destinations/origins as repeated measures, on the other hand. This finding also indicates that the number of degrees of freedom associated with the random effects term in this context may well be closer to \( n-1 \) than to two (i.e., for estimating the mean and the variance of a random effects term), for both the origin and the destination random effects.

As with the unconstrained gravity model, adjusting for spatial dependence in flows improves the performance of the constrained variants of the gravity model in terms of both the pseudo-\( {R}^2 \) and the deviance statistic, and has a substantial impact on the separation parameter estimates that is in line with Curry (1972). The cost in degrees of freedom is modest. On average, at least 90 degrees of freedom are available for each parameter estimated in this case study. The eigenvectors successfully capture origin-to-destination dependence in flows. Hence, eigenvector spatial filtering provides a useful way of filtering spatial dependence in the sample origin-destination data. A virtue of this approach is that standard model specifications of the constrained gravity models and existing software can be applied to origin-destination data samples. This proves especially useful when dealing with flows taking the form of counts. However, the difficulty of computing eigenvalues and eigenvectors when dealing with a large number of locations limits the ability of filtering to capitalise on these advantages.