1 Introduction

Since Alfred Marshall’s pioneering Principles of Economics, a common theme in Urban and Regional Economics has been that the agglomeration of similar firms can boost firm productivity. Agglomeration economies are therefore a key variable in the location decision process. Usually, only firms located in small areas, such as the city of Prato (Italy) or Silicon Valley (USA) (often referred to as Marshallian industrial districts), are assumed to enjoy the advantages of agglomeration economies. However, one can expect that spillovers and other advantages derived from agglomeration economies might also benefit plants located in nearby areas, in addition to those in the same town or municipality (Ellison and Glaeser 1997). This issue is related to the so-called geographical scope of agglomeration economies, which are commonly assumed to attenuate with distance. From this perspective, the aim of this paper is to analyze whether the location decisions of manufacturing plants in Spanish municipalities are related to the location decisions taken in surrounding or neighboring municipalities, and to give insight into the reasons for this agglomerative behavior. To do so, we apply Spatial Econometric techniques to study the location decisions of 11 industries in Spanish municipalities.

Firms may cluster for many reasons, such as history, random events, natural advantages, or agglomeration economies (Marshall 1890; Krugman 1991a, b; Ellison and Glaeser 1997; Ellison et al. 2010)Footnote 1. The most usual classification of agglomeration economies distinguishes urbanization economies, which arise when the industrial mix is diverse and firms also benefit from the services and facilities of urban areas, from localization economies or Marshallian external economies, which arise when the advantages of clustering derive from the same industry (Hoover 1948)Footnote 2. According to Marshall (1890), the sources of agglomeration economies are shared input marketsFootnote 3, labor market poolingFootnote 4, and human capital and knowledge spilloversFootnote 5. A concept similar to localization economies is that of the so-called MAR externalities, named after Marshall (1890), Arrow (1962) and Romer (1986), which arise when the agglomeration of firms occurs in an oligopolistic environment (Glaeser et al. 1992).

Most analyses of Marshallian externalities have focused on the aforementioned sources of agglomeration economiesFootnote 6 and on the so-called industrial scope, which deals with the distinction between localization economies and urbanization economiesFootnote 7. However, as pointed out by Rosenthal and Strange (2004), less attention has been paid to the other dimensions over which agglomeration economies extend: the temporal scope and the geographic scope. The temporal scope concerns whether the effects of these economies are felt immediately or with a time lag, since there may be both static and dynamic agglomeration economies (see Glaeser et al. 1992; Henderson 1997). The geographic scope deals with the attenuation of the benefits of agglomeration with physical distance since, ceteris paribus, economic agents that are closer to each other have more potential for interaction. This paper focuses on this geographical dimension of agglomeration economies, using data from Spanish municipalities.

Little work has been done on the geographic scope of agglomeration economies, and existing studies offer only limited evidence of benefits extending beyond town limits. Using US zip codes, Rosenthal and Strange (2003) show that the geographic scope of localization economies seems larger than that of urbanization economies: employment outside the industry of focus had an inconsistent and frequently insignificant effect. For Spanish municipalities, Viladecans-Marsal (2004), who limits her analysis to the largest Spanish cities (over 15,000 inhabitants), found that urbanization economies influence location in most industries, while localization economies play a minor role, and that agglomeration effects only spill over city borders in three of the six manufacturing industries analyzed. Using similar techniques but studying Catalan municipalities, Jofré-Montseny (2009) found evidence of a geographical scope for localization economies in the textile and the wood and furniture industries, and for urbanization economies in the medical, precision and optical instruments, chemical products, and metal products (except machinery) industries.

On the other hand, Soest et al. (2006), working with zip code data from a Dutch province, conclude that agglomeration economies may well operate on a geographic scale smaller than a city, since they only found evidence of interurban externalities for manufacturing, which is analyzed as a single industry. Simmie (1998), Suárez-Villa and Alrod (1998), and Arita and McCann (2000) also cast doubt on the spatial extent of agglomeration.

According to Tobler’s first law of geography, “everything is related to everything else, but near things are more related than distant things” (Tobler 1970)Footnote 8. This sentence is often used to explain the concept of spatial dependence or spatial autocorrelation, and to justify the need to check for spatial autocorrelation when dealing with spatial data and processes. There is spatial dependence or autocorrelation when the values of a variable in a certain location are related to the values of the same variable in neighboring locations. Surprisingly, spatial autocorrelation is seldom taken into consideration in the analysis of industrial location decisions. Indeed, most of the studies referenced above are mainly based on non-spatial regression analysisFootnote 9, which limits their findings. To properly capture the geographical scope of agglomeration economies, controls for spatial dependence should be usedFootnote 10. Spatial tools allow location decisions to be influenced by the decisions of firms in neighboring or nearby municipalities; ignoring these influences can lead to biased or inefficient estimates.

The aim of this paper is to analyze the extent of dependence in location decisions between neighboring municipalities. Instead of building or testing a comprehensive or sophisticated location decision model, we focus on the similarities or dissimilarities of those location decisions among neighboring municipalities.

We apply Spatial Econometrics (Spatial Probit models and non-spatial Probit models with spatially lagged explanatory variables) to estimate a simple location decision model, and Spatial Statistics techniques (the BB Join Count statistic and Moran’s I statistic) to analyze the spatial allocation of new manufacturing establishments in Spanish municipalities. Both methods examine spatial dependence in location decisions. Our dataset comprises the municipalities of continental Spain and 11 industries.

This paper is organized as follows. Section 2 presents a simple location decision model, the methodology for both the exploratory and the confirmatory analysis, and the data. Results are shown in Sect. 3. Finally, the main conclusions of this research are set out in Sect. 4.

2 A simple location model, the statistical methodology, the spatial unit of analysis and the data

In this section, we introduce the model, the spatial econometrics and spatial statistics techniques that will be implemented in the next section, some considerations about the spatial unit of analysis, and the data.

2.1 Econometric specification

Location models are usually constructed by treating the location decision as a problem of “random” profit maximizationFootnote 11 (Figueiredo et al. 2002). Following McFadden (1974) and Carlton (1983), suppose that an entrepreneur who has already decided to open a new establishment in manufacturing industry j would obtain a potential profit \(\pi _{ij}\) by locating it in municipality i. Formally,

$$\begin{aligned} \pi _{ij} = X_{i}+\varepsilon _{ij} \end{aligned}$$
(1)

where \(X_i\) reflects internal characteristics of municipality i and \(\varepsilon _{ij}\) is a random term, assumed to be independently distributed. The entrepreneur will therefore locate in municipality i if the potential profit there is greater than in any other municipality m, that is

$$\begin{aligned} \pi _{ij }>\pi _{mj} \end{aligned}$$
(2)

for all \(m \ne i\). This profit depends on a set of local characteristics and is usually expressed as a linear combination of them (Figueiredo et al. 2002). In our case, profit is assumed to depend also on the characteristics of the neighboring area:

$$\begin{aligned} \pi _{ij} = f(X_{i},WX) \end{aligned}$$
(3)

where the explanatory variables \(X_i\) and WX account for the local characteristics that affect profits and for the relevant characteristics of the neighboring municipalities, respectively. W is a spatial weights matrix (SWM) whose element \(w_{ih}\) is set to 1 if municipalities i and h are considered neighbors, and to 0 otherwise. As profits in the neighboring municipalities depend in turn on their own characteristics, WX could be substituted by the spatially lagged profit \(W\pi _{ij}\):

$$\begin{aligned} \pi _{ij} = f(X_{i}, W\pi _{ij}) \end{aligned}$$
(4)
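To make the spatial lag concrete, the sketch below builds a binary contiguity SWM, as described above, for a hypothetical set of five municipalities and computes WX for one explanatory variable. The neighbor structure, the values of x and the helper names (`weights_matrix`, `spatial_lag`) are illustrative assumptions, not the paper's data or code; the optional row standardization is a common alternative, not necessarily the one used here.

```python
# Hypothetical contiguity structure for five municipalities (ids 0-4):
# pairs 0-1, 1-2, 1-3, 2-3 and 3-4 share a border.
NEIGHBORS = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}


def weights_matrix(neighbors, n, row_standardize=False):
    """Binary contiguity SWM: w[i][h] = 1 if i and h are neighbors, 0 otherwise.

    If row_standardize is True, each non-empty row is divided by its sum,
    so that Wx becomes the average (rather than the sum) of neighbors' values.
    """
    w = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for h in nbrs:
            w[i][h] = 1.0
    if row_standardize:
        for row in w:
            total = sum(row)
            if total > 0:
                row[:] = [v / total for v in row]
    return w


def spatial_lag(w, x):
    """Return Wx: for each municipality, the weighted sum of its neighbors' x."""
    return [sum(w_ih * x_h for w_ih, x_h in zip(row, x)) for row in w]


x = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical values of one regressor
print(spatial_lag(weights_matrix(NEIGHBORS, 5), x))
# [20.0, 80.0, 60.0, 100.0, 40.0]
print(spatial_lag(weights_matrix(NEIGHBORS, 5, row_standardize=True), x))
```

With the binary matrix, each entry of WX is the sum of the neighbors' values (municipality 1, which borders 0, 2 and 3, gets 10 + 30 + 40 = 80); row standardization turns it into their average.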

As \(\pi _{ij}\) cannot be observed (Ellison and Glaeser 1997), the dependent variable of location models is usually the number of new establishments or new firms created over a period of time, LOC. We may therefore express LOC as a linear combination of the explanatory variables in Eq. (3):

$$\begin{aligned} { LOC}_{ij}=\Sigma _{n}\beta _{n}X_{ni}+\Sigma _{n}\rho _{ n}WX_{ni}+\varepsilon _{ij}. \end{aligned}$$
(5)

Location decision models are usually estimated using limited dependent variable models, i.e., Logit, Probit or Poisson specificationsFootnote 12. However, a variety of unobserved (or difficult to quantify) influences could cause location decisions to be spatially dependent. For instance, some areas may have better infrastructure or road networks that are conducive to manufacturing. If \(LOC_{ij}\) depends on what happens in neighboring municipalities, the assumption of an independently distributed \(\varepsilon _{ij}\) is too strong. Two popular tests of spatial dependence are described in Sect. 2.3. The existence of spatial autocorrelation invalidates the use of most of the usual statistical and econometric techniques, such as ordinary least squares or the basic Logit and Probit modelsFootnote 13. If those models are used on spatially dependent data, biased or inefficient results will be obtained.

Spatial autocorrelation in data and processes may be treated in different ways. A simple approach is to try to remove it from the datasetFootnote 14, but this is often not sufficient. Alternatively, spatial controls can be included in the specification of the model. The two most common approaches to the latter method are the spatial autoregressive model (SAR) and the spatial error model (SEM)Footnote 15.

Three models will be estimated for each manufacturing industry: a standard Probit with spatially lagged explanatory variables (PLEV), a Bayesian spatial autoregressive Probit (SARP), and a Bayesian spatial error Probit (SEMP).

As a change in an explanatory variable for municipality i has a direct impact on the location decisions of municipality i, as well as an indirect, or spatial spillover, impact on its neighbors, we follow Lesage and Pace (2009) and estimate the total, direct and indirect effects of the SARP models.
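The three effects can be illustrated with a minimal numerical sketch. It assumes the scalar summary measures of Lesage and Pace (2009): for regressor r, the n × n matrix of partial derivatives of \(\Pr(y_i = 1)\) with respect to \(x_{hr}\) is approximated here as \(\mathrm{diag}(\phi(\eta))\, S\, \beta_r\), with \(S = (I_n - \rho W)^{-1}\) and \(\eta\) evaluated at the sample means of X; the direct effect is the average diagonal element, the total effect the average row sum, and the indirect effect their difference. The function name, the five-municipality line and all parameter values are hypothetical, and in the Bayesian setting the calculation would be repeated for each retained MCMC draw.

```python
import numpy as np


def sar_probit_effects(W, rho, beta, x_mean, r):
    """Direct, indirect and total effects of regressor r in a SAR Probit (sketch).

    Uses dPr(y_i = 1)/dx_hr ~= phi(eta_i) * S[i, h] * beta[r], with
    S = (I - rho W)^-1 and eta = S @ (iota * x_mean @ beta), i.e. evaluated at
    the sample means of X and ignoring the heteroskedastic scaling of the
    reduced-form errors.
    """
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W)            # spatial multiplier
    eta = S @ np.full(n, x_mean @ beta)               # latent index at the means
    pdf = np.exp(-0.5 * eta**2) / np.sqrt(2 * np.pi)  # standard normal density
    M = pdf[:, None] * S * beta[r]                    # n x n partial derivatives
    direct = np.trace(M) / n                          # average own-municipality effect
    total = M.sum() / n                               # average row sum
    return direct, total - direct, total


# Hypothetical example: five municipalities on a line, row-standardized W.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)
beta = np.array([-0.5, 1.0])    # intercept and one regressor (made up)
x_mean = np.array([1.0, 0.2])   # mean of X, with 1.0 for the intercept
direct, indirect, total = sar_probit_effects(W, 0.5, beta, x_mean, r=1)
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

Because the row-standardized W used here has unit row sums, each row of S sums to 1/(1 − ρ) = 2, so \(\eta_i = -0.6\) and the total effect equals \(2\phi(-0.6)\beta_r\); the off-diagonal part of M is the spillover to other municipalities.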

However, since the effects in SAR models are global (Lesage and Pace 2009), that is, a change in one municipality propagates to all the others through the spatial multiplier, whereas location processes may be more localized, we will also estimate SEMP models with spatially lagged explanatory variables.
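The contrast between global and local spillovers can be seen directly from the matrices involved. In the hypothetical sketch below (a row-standardized line of five municipalities and ρ = 0.5, both assumptions), a change in x in the last municipality enters the PLEV and SEMP specifications only through WX, so it reaches its single first-order neighbor, whereas in the SAR model it is transmitted to every municipality through \((I_n - \rho W)^{-1} = I_n + \rho W + \rho ^2 W^2 + \dots \).

```python
import numpy as np

n = 5
W = np.zeros((n, n))
for i in range(n - 1):                 # municipalities on a line: i borders i + 1
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row-standardized (assumption)

rho = 0.5
S = np.linalg.inv(np.eye(n) - rho * W)  # SAR spatial multiplier

# Through WX, a change in x at municipality 4 reaches only the rows with a
# non-zero entry in column 4 of W, i.e. its first-order neighbors ...
print(np.nonzero(W[:, 4])[0])          # [3]
# ... whereas in the SAR model it reaches every municipality through S.
print(np.all(S[:, 4] > 0))             # True

# The multiplier equals the power series I + rho W + rho^2 W^2 + ...
approx = sum(np.linalg.matrix_power(rho * W, k) for k in range(60))
print(np.allclose(S, approx))          # True
```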

As a dependent variable we use \({\textit{LOC}}_{ij}\), a binary variable set to 1 if a location decision in industry j is implemented in municipality i over the period 1991–1995Footnote 16, and to 0 otherwise. We estimate an equation for each of the eleven manufacturing industries considered. The standard approach to this type of data would be a Probit or Logit modelFootnote 17. In the presence of spatial autocorrelation, however, standard Logit and Probit models are not appropriate, since the errors \(\varepsilon \) are no longer independently and identically distributed. Most spatial econometric models with a continuous dependent variable are estimated by maximum likelihood. With a binary dependent variable, however, there is no closed-form solution for the Probit or Logit probabilities, because the joint probability of the observed outcomes involves an n-dimensional integral over the spatially correlated errors (Anselin 2002; Lesage and Pace 2009).

We therefore use an alternative approach that employs Bayesian methods to control for spatial dependence (Lesage 1997, 2000; Smith and Lesage 2002). Although there are other, less popular alternativesFootnote 18, such as generalized method of moments (GMM) estimation (Pinkse and Slade 1988) or the EM (expectation maximization) approach for error models (Mcmillen 1995), Bayesian methods are the most comprehensive approach and are well supported by previous literature. This approach, proposed in Lesage (1997, 2000) and Smith and Lesage (2002), “is the most flexible of the spatially dependent models because it can incorporate spatial lag dependence and spatial error dependence in addition to general heteroskedasticity, of unknown form” (Fleming 2004, pp. 166–167).

The Bayesian approach used here has its foundations in a non-spatial paper by Albert and Chib (1993), who model the binary dependent variable y as an indicator of an unobserved latent utility \(y^*\) (Lesage and Pace 2009). The relationship between y and \(y^*\) is as follows: \(y_i = 1\) if \(y_i^* \ge 0\), and \(y_i = 0\) if \(y_i^* < 0\). In the present application, when the net utility of locating in municipality i is non-negative (\(y_i^* \ge 0\)), \(y_i = 1\) and the firm selects i as its location. Albert and Chib (1993) recognized that \(p(\beta , \sigma ^2 \mid y^*) = p(\beta , \sigma ^2 \mid y^*, y)\), since \(y^*\) contains all the information needed to create y. This significantly simplifies the problem: if \(y^*\) is treated as an additional parameter to be estimated, the joint conditional posterior distribution of \(\beta \) and \(\sigma ^2\) takes the same form as in a Bayesian regression with a continuous dependent variable (Lesage and Pace 2009; LeSage et al. 2011).
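The data augmentation idea of Albert and Chib (1993) can be sketched for the simplest, non-spatial case. The Gibbs sampler below alternates between drawing \(y^*\) from normal distributions truncated at zero according to the observed y, and drawing \(\beta \) from the normal posterior of a regression of \(y^*\) on X. It assumes a flat prior on \(\beta \) and normalizes \(\sigma ^2 = 1\), as is usual for identification in a Probit; the spatial SARP and SEMP samplers add steps for \(\rho \) or \(\lambda \) and for the heteroskedastic variances, which are not shown. The function names and the simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent(mean, y, rng):
    """Draw y*_i ~ N(mean_i, 1) truncated to [0, inf) if y_i = 1, (-inf, 0) if y_i = 0."""
    cut = np.array([STD_NORMAL.cdf(-m) for m in mean])     # P(y*_i < 0)
    u = np.where(y == 1, rng.uniform(cut, 1.0), rng.uniform(0.0, cut))
    u = np.clip(u, 1e-12, 1 - 1e-12)                       # keep inv_cdf finite
    return mean + np.array([STD_NORMAL.inv_cdf(p) for p in u])


def probit_gibbs(X, y, ndraw=800, nburn=200, seed=0):
    """Albert-Chib Gibbs sampler for a non-spatial Probit with a flat prior on beta."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(xtx_inv)
    beta = np.zeros(k)
    kept = []
    for it in range(nburn + ndraw):
        ystar = draw_latent(X @ beta, y, rng)             # p(y* | beta, y)
        beta_hat = xtx_inv @ (X.T @ ystar)                # least squares on the latent y*
        beta = beta_hat + chol @ rng.standard_normal(k)   # p(beta | y*) = N(beta_hat, (X'X)^-1)
        if it >= nburn:                                   # discard burn-in draws
            kept.append(beta)
    return np.array(kept)


# Simulated (hypothetical) data: y = 1 when the latent net utility is non-negative.
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
true_beta = np.array([-0.5, 1.0])
y = (X @ true_beta + rng.standard_normal(n) >= 0).astype(int)
draws = probit_gibbs(X, y)
print(draws.mean(axis=0))  # posterior means; should be close to true_beta
```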

Instead of having to integrate numerically over the conditional distributions, the contribution of Albert and Chib (1993) allows us to use Bayesian Markov chain Monte Carlo (MCMC) methods to sample each parameter from its conditional distribution. After numerous iterations of this sampling algorithm, a set of draws is produced that converges to the unconditional joint posterior distribution (full details are given in Lesage and Pace (2009)). For instance, the conditional distributions of \(\rho \) in the SAR model and of \(\lambda \) in the SEM model are as followsFootnote 19:

$$\begin{aligned}&p(\rho |\beta ,y^{*})\propto \left| {I_n -\rho W} \right| \exp \left( {-\frac{1}{2}\left( {(I_n -\rho W)y^{*}-X\beta } \right) ^{\prime }\left( {(I_n -\rho W)y^{*}-X\beta } \right) } \right) \nonumber \\&p(\lambda |\beta ,y^{*})\propto \left| {I_n -\lambda W} \right| \exp \left( {-\frac{1}{2}\left( {y^{*}-X\beta } \right) ^{\prime }\left( {I_n -\lambda W} \right) ^{\prime }\left( {I_n -\lambda W} \right) \left( {y^{*}-X\beta } \right) } \right) \nonumber \\ \end{aligned}$$
(6)
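Equation (6) does not correspond to a standard distribution for \(\rho \), so a separate sampling step is needed. One simple option, used here purely for illustration, is a “griddy Gibbs” step: evaluate the log conditional density on a grid of \(\rho \) values, normalize it, and sample from the resulting discrete distribution. The sketch assumes a row-standardized W (so that \(|I_n - \rho W| > 0\) on the grid), treats \(\beta \) and \(y^*\) as given, and uses simulated data; the function names and grid are hypothetical. The \(\lambda \) step of the SEM model is analogous, with the residual replaced by \((I_n - \lambda W)(y^* - X\beta )\).

```python
import numpy as np


def log_cond_rho(rho, W, ystar, X, beta):
    """Log of p(rho | beta, y*) in Eq. (6), up to an additive constant."""
    A = np.eye(W.shape[0]) - rho * W
    _, logdet = np.linalg.slogdet(A)   # log|I - rho W| (determinant positive on the grid)
    e = A @ ystar - X @ beta
    return logdet - 0.5 * e @ e


def rho_conditional_on_grid(W, ystar, X, beta, grid=None):
    """Normalized conditional probabilities of rho over a grid of values."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)
    logp = np.array([log_cond_rho(r, W, ystar, X, beta) for r in grid])
    p = np.exp(logp - logp.max())      # subtract the max to avoid overflow
    return grid, p / p.sum()


# Hypothetical check: latent y* from a SAR process with rho = 0.6 on a ring
# of municipalities, each bordering the previous and the next one.
rng = np.random.default_rng(7)
n = 200
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5  # row-standardized contiguity
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([0.2, 1.0])
ystar = np.linalg.solve(np.eye(n) - 0.6 * W, X @ beta + rng.standard_normal(n))

grid, p = rho_conditional_on_grid(W, ystar, X, beta)
rho_draw = rng.choice(grid, p=p)       # one griddy-Gibbs draw of rho
print(grid @ p)                        # conditional mean; should lie near 0.6
```

Each grid point requires a log-determinant; with thousands of municipalities these are typically precomputed or approximated (for example, from the eigenvalues of W, since \(\log |I_n - \rho W| = \sum _i \log (1 - \rho \omega _i)\)) rather than recomputed inside every MCMC iteration.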

We use 10,000 draws, preceded by a 2500-draw “burn-in” that is discarded but helps calibrate the initial parameter values. To determine whether this number of draws is sufficient, Raftery–Lewis convergence diagnostics are employed. Although we implement several tests of spatial dependence below, there is no robust method of choosing between the SAR and SEM models in the context of a binary dependent variableFootnote 20. Consequently, both models are presented below.

2.2 Data sources and location determinants

Location models try to explain how certain variables may influence location decisions. Most empirical work groups these variables into categories such as supply factors, demand factors, and external economies and diseconomies (Guimarães et al. 2004). Since our central focus is the spatial influence of neighboring municipalities, we do not carry out an extensive analysis of location determinantsFootnote 21. As explained later, this is also due to the lack of data at the NUTS V level in Spain on location factors such as labor costs, land prices or taxesFootnote 22. The location determinants we take into consideration are human capital as a supply factor; municipality product as a demand factor; local external economies (localization and urbanization); and the location decisions and characteristics of neighboring municipalities.

The human capital index, \({HC}_i\), is defined as the percentage of the population with at least a secondary school degree in municipality i in 1991. Its expected sign is positive, since it reflects the availability of skilled labor. Municipality product in 1991, \({MP}_i\), reflects the volume of economic activity in the municipality, that is, the potential market for new firms, so its expected sign is also positive.

External economies are represented by the classic location quotient and by a diversity index.

The location quotient, \({LQ}_{ij}\), measured in 1990, represents the advantages of the geographical specialization of municipality i in industry j, that is, traditional localization economies, Marshallian externalities or MAR agglomeration economies. Its expected sign is positive. Since a high \({LQ}_{ij}\) may be caused either by a large number of small firms or by a small number of large firms, besides localization externalities it may also reflect the effects of concentration or internal returns to scale. It is defined as follows:

$$\begin{aligned} LQ_{i,j} =(E_{ij} /E_i )/(E_J /E_T ) \end{aligned}$$
(7)

where \(E_{ij}\) is total employment in manufacturing activity j in municipality i, \(E_i\) total employment in municipality i, \(E_J\) national employment in manufacturing activity j, and \(E_T\) total national employment in all manufacturing activities.
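As a minimal sketch of Eq. (7), the function below computes location quotients from a hypothetical employment table with two municipalities and two industries. It assumes that \(E_i\), \(E_J\) and \(E_T\) are all computed from the same manufacturing employment table and returns NaN where the quotient is undefined; both are illustrative choices, and the function name is hypothetical.

```python
from collections import defaultdict


def location_quotients(emp):
    """LQ_ij = (E_ij / E_i) / (E_J / E_T), Eq. (7).

    emp maps (municipality, industry) -> manufacturing employment. In this
    sketch E_i, E_J and E_T are all taken from the same manufacturing table.
    """
    e_i = defaultdict(float)   # employment by municipality
    e_j = defaultdict(float)   # national employment by industry
    for (muni, ind), e in emp.items():
        e_i[muni] += e
        e_j[ind] += e
    e_t = sum(e_j.values())    # total national manufacturing employment
    lq = {}
    for (muni, ind), e in emp.items():
        if e_i[muni] == 0 or e_j[ind] == 0:
            lq[(muni, ind)] = float("nan")   # undefined: no employment to compare
        else:
            lq[(muni, ind)] = (e / e_i[muni]) / (e_j[ind] / e_t)
    return lq


emp = {("A", "textile"): 30, ("A", "food"): 10,
       ("B", "textile"): 10, ("B", "food"): 50}
for key, value in sorted(location_quotients(emp).items()):
    print(key, round(value, 3))
```

Municipality A comes out specialized in textiles (LQ = 1.875) and B in food (LQ ≈ 1.389). A useful check is that, for each industry, the employment-weighted average of the location quotients across municipalities equals 1.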

\({DI}_i\) is a manufacturing diversification index for municipality i in 1990. Its expected sign is positive, since manufacturing diversity may reflect the existence of inter-industrial external economies such as those of the Jacobs type (Jacobs 1969; Glaeser et al. 1992), and also because the creation of new plants is biased toward more diversified cities (Duranton and Puga 2000). The index, proposed in Duranton and Puga (2000), is the inverse of a Hirschman–Herfindahl-type index corrected for differences in sectoral employment shares at the national level:

$$\begin{aligned} { DI}_i =1\Big /{\sum \nolimits _j {\left| {s_{ij} -s_j } \right| } } \end{aligned}$$
(8)

where \(s_{ij}\) is the share of manufacturing industry j in manufacturing employment in municipality i, and \(s_j\) is the share of manufacturing industry j in total national manufacturing employment.
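Eq. (8) can be computed from the same kind of employment table. The sketch below uses hypothetical figures and a hypothetical function name; a municipality whose industrial structure exactly matches the national one would have a zero denominator, which the sketch maps to an infinite index (an assumption, since the text does not say how such cases are handled).

```python
import math
from collections import defaultdict


def diversity_index(emp):
    """DI_i = 1 / sum_j |s_ij - s_j|, Eq. (8).

    emp maps (municipality, industry) -> manufacturing employment; s_ij is the
    share of industry j in municipality i and s_j its share nationally.
    """
    by_muni = defaultdict(dict)
    national = defaultdict(float)
    for (muni, ind), e in emp.items():
        by_muni[muni][ind] = e
        national[ind] += e
    total = sum(national.values())
    di = {}
    for muni, row in by_muni.items():
        local_total = sum(row.values())
        if local_total == 0:
            di[muni] = float("nan")            # no manufacturing employment
            continue
        dist = sum(abs(row.get(ind, 0.0) / local_total - national[ind] / total)
                   for ind in national)
        # A municipality that exactly mirrors the national structure gets DI = inf.
        di[muni] = math.inf if dist == 0 else 1.0 / dist
    return di


emp = {("A", "textile"): 30, ("A", "food"): 10,
       ("B", "textile"): 10, ("B", "food"): 50}
print(diversity_index(emp))
```

Municipality B, whose mix (1/6 textiles, 5/6 food) is closer to the national shares (0.4 and 0.6) than A's, obtains the higher index (15/7 ≈ 2.14 versus 1/0.7 ≈ 1.43).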

Finally, we consider the potential role of neighboring municipalities \({NM}_i\), that is, location decisions of neighboring municipalities and the characteristics of neighboring municipalities. It may be measured by the spatially lagged independent variables in a standard (non-spatial) Probit model and in spatial error models, (\({WHC}_i\), \({WLQ}_{ij}\), \({WDI}_i\) and \({WMP}_i\)), where W is an SWM, and by the spatially lagged dependent variable in a Spatial Autoregressive Probit modelFootnote 23, (\({WLOC}_i\)). While \({WHC}_i\) and \({WMP}_i\) account for the human capital and the potential market of neighboring municipalities, \({WLQ}_{ij}\) and \({WDI}_i\) represent the geographical scope of agglomeration economies which are originated in neighboring municipalities. Location decisions of neighboring municipalities in industry j are represented by \({WLOC}_i\). That is, \({WLOC}_i \) measures part of the geographical scope of location decisions.

Therefore, location decisions may be explained as a function of local variables and of variables of neighboring municipalities, such as agglomeration economies, human capital, and potential market, through the following expression:

$$\begin{aligned} LOC_{ij} =f(HC_i ,LQ_{ij} ,DI_i ,MP_i ,NM_i ) \end{aligned}$$
(9)

As Ottaviano and Puga (1998) point out, the literature on economic geography identifies economic agglomeration at different levels of aggregation, from the small scale, e.g., a highly specialized industrial district such as the city of Prato in Italy, to large-scale agglomerations that cut across states, such as the US “Manufacturing Belt” or the European “Hot Banana.” Since the geographic scope of agglomeration economies does not seem to be very large, as described in the previous section, we focus on Spanish municipalities (NUTS V). This seems a sensible choice for studying both the location of new manufacturing plants and the geographical scope of agglomeration economies (as shown in Holl (2004a), Jofré-Montseny (2009), and Viladecans-Marsal (2001, 2003, 2004)), since the average size of Spanish municipalities is \(64 \hbox { km}^{2}\), about one third of the average size of the U.S. zip codes analyzed in Rosenthal and Strange (2003), and around 85 % of the municipalities consideredFootnote 24 are smaller than \(100\hbox { km}^{2}\).

Nevertheless, working with Spanish municipalities also imposes a hard data constraint, since most municipal data relate to socio-demographic characteristics and are not usually up to date, because they are often produced for the decennial census or for other purposes. We could try to overcome this scarcity of data by using data at higher levels of spatial aggregation, such as NUTS III, as done in Holl (2004a) to proxy municipal wages, labor force qualification, sector and industry specialization, and industry share. Unfortunately, as is widely known in spatial analysis but often ignored in location analysis, our results could then be distorted by the so-called Modifiable Areal Unit Problem (MAUP)Footnote 25. The MAUP is a potential source of error in spatial studies that use aggregate data; it consists of both a scale and an aggregation problem and is related to the concept of ecological fallacy (Unwin 1996; Bailey and Gatrell 1995). Thus, as our target is not to fully explain location decisions or location determinants, but to test whether location decisions in a municipality are related to those taken in neighboring municipalities, we will only consider NUTS V data.

The data sources used in our analysis are the Registro de Establecimientos Industriales (Industrial Establishments Register, REI), the Censo de Población 1991 (1991 Population Census), the Censo de Locales 1990 (1990 Establishments Census), and Alañón (2002). REI dataFootnote 26 allow us to study the spatial allocation of new manufacturing establishments in Spanish municipalities for 11 industries at the two-digit level of CNAE-93 (the Spanish classification of economic activities). The industries considered are: food and tobacco; clothes and leather; wood and furniture; printing and paper; chemistry; other nonmetallic minerals; first transformation of metals; machinery; computer, office equipment, etc.; electric and electronic equipment; and transport equipment. We have data from 1980 to 1998Footnote 27. The 1991 Population Census and the 1990 Establishments Census are the most recent Spanish censuses for which data are available for all municipalities. Census data allow us to build indicators of the advantages derived from human capital and agglomeration economies. Alañón-Pardo (2002) provides the gross domestic product of Spanish municipalities for 1991.

Due to the restrictions of the data sources referred to above, while the spatial exploratory analysis covers the 1980–1998 period, the regression analysis is limited to the 1991–1995 period.

2.3 The spatial statistics tools

In this section, we introduce the BB Join Count statistic and Moran’s I statistic that will be applied to study the spatial allocation of new manufacturing plants in Spanish municipalities.

The BB Join Count TestFootnote 28 for spatial autocorrelation or spatial dependence reflects whether the values of a binary variable are clustered or randomly distributed in space. The BB Join Count statistic is defined as follows:

$$\begin{aligned} BB=(1/2)\sum _i \sum _h {w_{ih} } LOC_i LOC_h \end{aligned}$$
(10)

where LOC is a binary variable set to 1 when a manufacturing establishment is created in the municipality over a period of time, and to 0 otherwise, and \(w_{ih}\) is the (i, h)-th element of a spatial weights matrix W, which reflects whether municipalities i and h share a common border, that is, whether they are neighbors. Thus BB counts the number of times a municipality with manufacturing births is contiguous to another municipality with manufacturing births. A positive and significant z-value for this statistic indicates positive autocorrelation, that is, for a given manufacturing industry, establishment births are more spatially clustered than would be expected purely by chance (Anselin 1992).
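The BB statistic of Eq. (10) is easy to compute from a neighbor list. The sketch below does so on a hypothetical 6 × 6 lattice of municipalities with rook contiguity. Instead of the analytical z-value used in the paper, it assesses significance with a random permutation test that keeps the number of municipalities with births fixed; this is a swapped-in, commonly used alternative, and all names and data are illustrative.

```python
import random


def rook_neighbors(nrows, ncols):
    """Hypothetical lattice of municipalities: cells sharing an edge are neighbors."""
    nb = {}
    for r in range(nrows):
        for c in range(ncols):
            cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * ncols + c] = [rr * ncols + cc for rr, cc in cells
                                 if 0 <= rr < nrows and 0 <= cc < ncols]
    return nb


def bb_join_count(loc, neighbors):
    """BB = (1/2) sum_i sum_h w_ih LOC_i LOC_h with binary contiguity weights, Eq. (10)."""
    return 0.5 * sum(loc[i] * loc[h] for i, nbrs in neighbors.items() for h in nbrs)


def bb_permutation_test(loc, neighbors, nperm=999, seed=1):
    """One-sided pseudo p-value: share of random relabelings with BB >= observed."""
    rng = random.Random(seed)
    observed = bb_join_count(loc, neighbors)
    shuffled = list(loc)          # keeps the number of municipalities with births fixed
    as_large = 0
    for _ in range(nperm):
        rng.shuffle(shuffled)
        if bb_join_count(shuffled, neighbors) >= observed:
            as_large += 1
    return observed, (as_large + 1) / (nperm + 1)


nb = rook_neighbors(6, 6)
# LOC = 1 in a 3 x 3 block of municipalities in one corner (clustered births).
loc = [1 if (i // 6 < 3 and i % 6 < 3) else 0 for i in range(36)]
print(bb_permutation_test(loc, nb))
```

The clustered block yields 12 black-black joins, far more than the roughly 3.4 expected when nine births are scattered at random over the 60 joins of this lattice.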

Using a measure of spatial autocorrelation for a binary variable seems sensible, since we are interested in whether the location decision is implemented or not. However, it could be argued that in our case the measure could produce misleading results, since LOC is a binary variable that does not account for the number of establishments created: the BB statistic is the same whether one or many new establishments are created in the municipality.

In order to avoid this criticism, we will also apply Moran’s I statisticFootnote 29, which is defined as follows:

$$\begin{aligned} I=\frac{N}{S_0}\,\frac{\sum _i \sum _h w_{ih} (x_i -\mu )(x_h -\mu )}{\sum _i (x_i -\mu )^{2}} \end{aligned}$$
(11)

where N is the number of observations; \(w_{ih}\) is as defined above; \(x_i\) and \(x_h\) are the numbers of new establishments of a given manufacturing activity set up in municipalities i and h, respectively; \(\mu \) is the mean of x over all municipalities; and \(S_0\) is a scaling constant, \(S_0 =\sum \nolimits _i {\sum \nolimits _h {w_{ih} } } \). A positive and significant z-value for this statistic indicates positive spatial autocorrelation, that is, municipalities that have been chosen as locations for new entries in a given manufacturing activity tend to be close to each other.
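Moran’s I in Eq. (11) can be sketched in the same way, again on a hypothetical 6 × 6 rook-contiguity lattice and with a permutation test swapped in for the analytical z-value. The two test patterns are illustrative assumptions: counts that rise smoothly toward one corner give strong positive autocorrelation, while a checkerboard gives perfect negative autocorrelation.

```python
import random


def rook_neighbors(nrows, ncols):
    """Hypothetical lattice of municipalities: cells sharing an edge are neighbors."""
    nb = {}
    for r in range(nrows):
        for c in range(ncols):
            cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * ncols + c] = [rr * ncols + cc for rr, cc in cells
                                 if 0 <= rr < nrows and 0 <= cc < ncols]
    return nb


def morans_i(x, neighbors):
    """Moran's I, Eq. (11), with binary contiguity weights w_ih in {0, 1}."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())   # S_0 = sum of all w_ih
    cross = sum(dev[i] * dev[h] for i, nbrs in neighbors.items() for h in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


def morans_i_permutation(x, neighbors, nperm=999, seed=1):
    """Observed I and one-sided pseudo p-value from random relabelings of x."""
    rng = random.Random(seed)
    observed = morans_i(x, neighbors)
    shuffled = list(x)
    as_large = 0
    for _ in range(nperm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            as_large += 1
    return observed, (as_large + 1) / (nperm + 1)


nb = rook_neighbors(6, 6)
births = [r + c for r in range(6) for c in range(6)]  # hypothetical counts rising to one corner
print(morans_i_permutation(births, nb))               # I = 0.8 for this gradient
checkerboard = [(r + c) % 2 for r in range(6) for c in range(6)]
print(morans_i(checkerboard, nb))                     # perfect negative autocorrelation (I = -1)
```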

If the BB Join Count statistic and Moran’s I statistic show spatial autocorrelation in location decisions and in the creation of new manufacturing establishments, respectively, this does not necessarily mean that the spatial co-location is due to Marshallian agglomeration economies, since firms may cluster because of history, random events, natural advantages, etc., as noted in the introduction. So, if location decisions and establishment births are spatially autocorrelated, we will apply Moran’s I statistic to the location quotient of the 11 manufacturing industries considered. The location quotient, \({LQ}_{ij}\), represents the advantages of geographical specialization, that is, traditional localization economies, Marshallian externalities or MAR-type agglomeration economies. If the location quotient, or municipality specialization in a given industry, is autocorrelated in space, then location decisions and establishment births may be autocorrelated in space because firms seek the advantages derived from a specialized environment.

3 Results

3.1 Exploratory analysis results

In this section, we provide results on the spatial statistics tools applied to the location decisions (Table 1), on the creation of new manufacturing establishments (Table 2), and on the manufacturing industry specialization in the Spanish municipalities (Table 3)Footnote 30. These analyses correspond to the 1980–1998 period and involve 11 manufacturing industriesFootnote 31.

As can be seen in Table 1, which shows the BB Join Count Test on location decisions in Spanish municipalities, location decisions are spatially autocorrelated in all the manufacturing industries considered, except for the computer and office equipment and the electric and electronic equipment industries in 1980 and 1981. That is, municipalities chosen for the location of manufacturing establishments of a given industry tend to share a common border with other municipalities with manufacturing births in that industry more often than would be expected purely by chance.

Looking at the number of births in each manufacturing industry in Table 2, the results are very similar. Thus, both positive location decisions for a given industry and year and the number of manufacturing births are autocorrelated in space. These spatial patterns may be due to Marshallian agglomeration economies or to other reasons, as stated at the beginning of the introduction. To look for evidence of Marshallian agglomeration economies, Moran’s I statistic is applied to the level of municipality specialization in each manufacturing industry, measured by the location quotient defined in Eq. (7). As shown in Table 3, except for the food industry, which is widely spread across the Spanish territory, municipalities specialized in a given industry tend to be neighbors. So, since municipality specialization in a given industry is autocorrelated in space, and so are location decisions and new manufacturing births, we cannot reject the hypothesis that the benefits of locating in specialized municipalities are behind these spatial patterns.

Table 1 BB join count test (1980–1998)
Table 2 Moran’s I statistic (1980–1998)
Table 3 Moran’s I statistic on municipality specialization (location quotient)

3.2 Econometric results

In this section, as noted in Sect. 2.1, three models are estimated for each manufacturing industry: a standard Probit with spatially lagged explanatory variables (PLEV), a Bayesian spatial autoregressive Probit (SARP), and a Bayesian spatial error Probit with spatially lagged explanatory variables (SEMP). The Bayesian SARP and SEMP models both allow for heteroskedasticity. Spatially lagged explanatory variables in the PLEV models are built with a first-order contiguity SWM. As the PLEV results suggest that spatial effects do exist in location decisions, we extend the geographical scope of these effects by considering alternative SWMs. The Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002) was used to select the SWM specification.

Table 4 DIC for SEM and SAR models

This criterion is commonly used in Bayesian analyses with competing models (LeSage et al. 2011), and is based on the model likelihood. The DIC provides a measure of fit, which adjusts for the complexity of a model. Formally, the DIC is defined as:

$$\begin{aligned} DIC=\bar{{D}}({{\varvec{\uptheta }}})+p_D \end{aligned}$$
(12)

where \(D(\theta ) = -2\,\mathrm{LL}(\theta )\), that is, minus two times the log-likelihood, and

$$\begin{aligned} p_D =\bar{{D}}({{\varvec{\uptheta }}})-D({\bar{{{\varvec{\uptheta }}}}}) \end{aligned}$$
(13)

where \(D({\bar{{{\varvec{\uptheta }}}}})\) is the deviance calculated at the mean of the parameters, \({\bar{{{\varvec{\uptheta }}}}}\), obtained from the MCMC draws, and the average deviance, \(\bar{{D}}\), is computed by averaging the deviance over the MCMC draws (Spiegelhalter et al. 2002). As can be seen in Table 4, multiple SWMs were examined, including nearest neighbors (NN), inverse distance (InvDist), and inverse distance squared (InvDistSQ). The 20 NN SWM and the InvDistSQ SWM for 10 km had the lowest DIC for the SEMP and SARP models, respectively (with the difference in DICs much greater than 7 in each case), providing strong evidence for the superiority of these specifications (LeSage et al. 2011). Note that the DIC of the SARP models is lower than that of the SEMP models.
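The computation in Eqs. (12) and (13) only needs the deviance evaluated at each retained draw and at the posterior mean. The sketch below illustrates it for a non-spatial Probit with a handful of hypothetical draws; it does not reproduce the likelihood of the spatial Probit models, which involves multivariate normal probabilities, so it shows the mechanics of the DIC rather than the paper's calculation. Function names and data are assumptions.

```python
import math
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def probit_deviance(beta, X, y):
    """D(theta) = -2 * log-likelihood of a non-spatial Probit at coefficients beta."""
    loglik = 0.0
    for xi, yi in zip(X, y):
        p = PHI(sum(b * v for b, v in zip(beta, xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)        # guard against log(0)
        loglik += math.log(p) if yi == 1 else math.log(1 - p)
    return -2.0 * loglik


def dic(draws, X, y):
    """Return (DIC, p_D) following Eqs. (12)-(13), given a list of coefficient draws."""
    d_bar = fmean(probit_deviance(b, X, y) for b in draws)  # average deviance
    theta_bar = [fmean(col) for col in zip(*draws)]         # posterior mean of beta
    p_d = d_bar - probit_deviance(theta_bar, X, y)           # effective number of parameters
    return d_bar + p_d, p_d


X = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5)]
y = [0, 0, 1, 0, 1, 1]
draws = [(-0.2, 1.1), (0.1, 0.8), (-0.1, 1.3), (0.0, 0.9)]  # hypothetical MCMC draws
dic_value, p_d = dic(draws, X, y)
print(round(dic_value, 2), round(p_d, 2))
```

Because the Probit deviance is convex in \(\beta \), \(p_D\) is non-negative here; when comparing specifications, the one with the lower DIC is preferred.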

Table 5 Standard probit with spatially lagged explanatory variables results (SWM = first-order contiguity spatial weights matrix)
Table 6 Spatial error Probit model (SWM = nearest neighbor 20)
Table 7 Spatial autoregressive Probit model
Table 8 Total effects
Table 9 Direct effects
Table 10 Indirect effects

To assess the convergence of the MCMC samplers, we used Raftery–Lewis convergence diagnostics (LeSage and Pace 2009). The results indicate that convergence was achieved in fewer than 4000 draws for all models, with most models converging at around 2000 draws.

The results of the econometric models are summarized in Tables 5, 6, 7, 8, 9 and 10. All non-lagged explanatory variables are significant and show the expected sign in all three models, except for LQ in the Food and Tobacco industry in the SARP and SEMP models. According to these results, we cannot reject the hypothesis that population skills, manufacturing specialization (localization economies), market potential and diversity (urbanization or Jacobs external economies) play an important role in location processes. The results for the Food industry in the spatial Probit models are consistent with the lack of significance of Moran’s I for its location quotient in Table 3.
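The link between specialization and spatial autocorrelation can be illustrated with a location quotient and Moran's I. The sketch below assumes an employment-based LQ, a row-standardized 20-nearest-neighbor SWM built from hypothetical municipality centroids, and permutation inference; the definitions behind Table 3 may differ (for instance, analytical rather than permutation p-values), and all data are simulated. The helper names are hypothetical.

```python
import numpy as np


def location_quotient(emp):
    """LQ[j, i] = (E_ji / E_j) / (E_i / E) for municipality j and industry i.

    emp: (n_municipalities, n_industries) array of employment counts (assumed input).
    """
    share_in_muni = emp / emp.sum(axis=1, keepdims=True)
    share_national = emp.sum(axis=0) / emp.sum()
    return share_in_muni / share_national


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor SWM from point coordinates."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a unit is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros_like(d)
    rows = np.arange(len(coords))[:, None]
    W[rows, nearest] = 1.0 / k                # each row sums to one
    return W


def morans_i(x, W):
    """Moran's I = (n / S0) * z'Wz / z'z, with z the deviations from the mean."""
    z = x - x.mean()
    n, s0 = len(x), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)


def moran_permutation_pvalue(x, W, n_perm=999, seed=0):
    """Observed I and one-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return observed, (1 + np.sum(sims >= observed)) / (n_perm + 1)


# Hypothetical data: industry 0 concentrates in the east, industry 1 is spread evenly
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(200, 2))   # hypothetical centroids (km)
emp = np.column_stack([
    rng.poisson(5 + coords[:, 0]),
    rng.poisson(50, size=200),
]) + 1                                        # +1 avoids empty municipalities
lq = location_quotient(emp)
W = knn_weights(coords, k=20)
stat, p = moran_permutation_pvalue(lq[:, 0], W)
print(f"Moran's I for LQ of industry 0: {stat:.3f} (pseudo p = {p:.3f})")
```

A clustered industry yields a positive and significant I, whereas an industry spread evenly across municipalities, as Food is described in the text, would tend to yield an I close to its null expectation.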

These results differ somewhat from the evidence in previous studies such as Viladecans-Marsal (2004), who finds that urbanization economies influence location in most sectors while specialization plays only a minor role.

Turning to the spatially lagged explanatory variables in the PLEV models (Table 5), which capture the sources of agglomeration economies in neighboring municipalities, WLQ and WDI are significant and show the expected sign in every industry, except for WLQ in Food and in First transformation of metals. WLQ is highly significant in all the other industries, which could reflect the positive effect of Marshallian agglomeration economies in neighboring municipalities. As noted in Sect. 3, the insignificant Food results may be due to the fact that this industry is highly spread across Spain.Footnote 32
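Spatially lagged explanatory variables such as WLQ and WDI are the product of a row-standardized SWM and the regressor matrix, so each entry is a weighted average of the variable over a municipality's neighbors. A first-order contiguity SWM requires municipal boundary data, so the sketch below substitutes a distance-band inverse-distance-squared SWM (the type selected for the SARP models); the coordinates, the 10 km cutoff and the LQ and DI values are hypothetical, as are the helper names.

```python
import numpy as np


def inverse_distance_sq_weights(coords, cutoff):
    """Row-standardized inverse-distance-squared SWM with a distance cutoff.

    w_ij = 1 / d_ij**2 if 0 < d_ij <= cutoff, else 0; rows are then scaled to sum to one.
    Units with no neighbor inside the cutoff keep an all-zero row.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):        # d = 0 on the diagonal is masked out below
        W = np.where((d > 0) & (d <= cutoff), 1.0 / d**2, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)


def spatial_lags(X, W):
    """Return WX: each column becomes the weighted neighbor average of that variable."""
    return W @ X


# Hypothetical municipalities on a line (km) and two regressors (LQ, DI)
coords = np.array([[0.0, 0.0], [5.0, 0.0], [8.0, 0.0], [30.0, 0.0]])
X = np.array([[1.0, 0.2],
              [2.0, 0.4],
              [4.0, 0.1],
              [0.5, 0.9]])
W = inverse_distance_sq_weights(coords, cutoff=10.0)
WX = spatial_lags(X, W)                       # columns play the role of WLQ and WDI
print(WX)
```

The last municipality has no neighbor within 10 km, so its lagged values are zero; how such isolated units are treated is a modeling choice that the text does not specify.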

The high significance of the spatially lagged diversity indicator, WDI, stresses the key role of inter-industrial linkages at the interurban level. As suggested at the beginning of this paper, the WLQ and WDI results also provide evidence on the geographical scope of agglomeration economies.

A striking result is the lack of significance of the spatially lagged human capital indicator, WHC, in most manufacturing activities. This could mean that commuting is not very important in Spain as a whole (excluding the largest cities), or that commuters are not very skilled. Alternatively, its effect may already be captured by WLQ, since a skilled labor market is itself a source of agglomeration economies.

The spatially lagged market potential indicator, WMP, is not significant in most manufacturing activities. Decision-makers therefore seem to focus primarily on the market of their own municipality.

Turning to the full spatial models in Tables 6 and 7, the spatial error and spatial lag parameters, \(\lambda \) and \(\rho \), are significant in all models except for Computers and office equipment (SARP and SEMP models) and for Electric and electronic equipment and Transport equipment (SEMP models). Computers and office equipment is highly clustered in a few areas and not widespread in Spain, which agrees with the findings of the BB joint count test. Moreover, if \(\rho \) is taken as a measure of the spatial dependence in the SARP model, Computers and office equipment has the lowest coefficient (0.09); it also has the lowest \(\lambda \) in Table 6. The strongest spatial dependence appears in the Food industry, whose spatial autoregressive coefficient \(\rho \) is 0.57, consistent with the fact that this industry is highly spread across Spain. As \(\lambda \) and \(\rho \) are highly significant in most of the manufacturing industries analyzed, we cannot reject the hypothesis that location decisions in neighboring municipalities matter for industrial location decisions.
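The BB joint count statistic mentioned above counts pairs of neighboring units in which both received a birth. A minimal sketch, assuming a binary symmetric rook-contiguity SWM on a hypothetical regular grid (real municipalities are irregular polygons) and permutation inference with the number of births held fixed; the paper's test may rely on analytical moments instead, and the helper names are hypothetical.

```python
import numpy as np


def rook_contiguity_grid(nrows, ncols):
    """Binary, symmetric rook-contiguity matrix for cells of an nrows x ncols grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:                 # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrows:                 # neighbor below
                W[i, i + ncols] = W[i + ncols, i] = 1.0
    return W


def bb_joins(y, W):
    """Number of black-black (1-1) joins: 0.5 * sum_ij w_ij y_i y_j for symmetric W."""
    return 0.5 * y @ W @ y


def bb_permutation_test(y, W, n_perm=999, seed=0):
    """Observed BB count and pseudo p-value under random relocation of the births."""
    rng = np.random.default_rng(seed)
    observed = bb_joins(y, W)
    sims = np.array([bb_joins(rng.permutation(y), W) for _ in range(n_perm)])
    return observed, (1 + np.sum(sims >= observed)) / (n_perm + 1)


# Hypothetical 10 x 10 grid with births clustered in a 3 x 3 corner block
W = rook_contiguity_grid(10, 10)
y = np.zeros(100)
for r in range(3):
    for c in range(3):
        y[r * 10 + c] = 1.0
observed, p = bb_permutation_test(y, W)
print(f"BB joins = {observed:.0f}, pseudo p = {p:.3f}")
```

The clustered block produces 12 BB joins, far more than expected if the same nine births were scattered at random; an industry located in only a few scattered areas would yield few such joins.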

The coefficient estimates in Tables 6 and 7Footnote 33 are not easily compared with those in Table 5, since the latter require accounting for the impact of both the coefficient and its spatial lag. Although some of the non-lagged coefficients in Table 5 fall within the credible intervals of the spatial estimates (for example, LQ in all industries except machinery), many others do not.

As stated in Sect. 2.1, because location processes may be more localized, our SEMP models include spatially lagged explanatory variables (Table 6). The estimates for these variables do not differ much from those of the PLEV models: WHC is either not significant or has a negative sign in most industries; WLQ is significant in all industries except Food; and WMP and WDI are significant with the expected sign in all industries.

As shown in Table 4, according to the DIC the SAR models fit better than the SEM models. The effect estimates for the SAR models are reported in Tables 8 (total), 9 (direct) and 10 (indirect). As expected, direct effects (Table 9) are larger than indirect effects (Table 10) in all industries. All explanatory variables are significant except LQ in the Food industry. Location decisions in each municipality appear to be most influenced by changes in its own human capital (HC) and industrial diversity (DI).
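The scalar summaries behind Tables 8 to 10 can be sketched following LeSage and Pace (2009): the matrix of partial derivatives of Pr(y_i = 1) with respect to variable r is S_r = diag(φ(η)) (I - ρW)^-1 β_r, with η = (I - ρW)^-1 Xβ; the direct effect is the average diagonal element of S_r, the total effect its average row sum, and the indirect effect their difference. Assumptions: the derivatives are evaluated at the observed X with point values of ρ and β (the authors' estimates average over MCMC draws), the reduced-form error variance is not rescaled (published implementations differ on this point), and the ring-shaped W, the data and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def sar_probit_effects(X, W, beta, rho, r):
    """Average direct, indirect and total effects of variable r in a SAR probit.

    S_r = diag(phi(eta)) (I - rho W)^{-1} beta_r, with eta = (I - rho W)^{-1} X beta.
    Simplified: no rescaling by the reduced-form error variance.
    """
    n = W.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W)   # spatial multiplier matrix
    eta = A_inv @ X @ beta                        # reduced-form latent index
    S_r = norm.pdf(eta)[:, None] * A_inv * beta[r]  # row i scaled by phi(eta_i)
    direct = np.trace(S_r) / n                    # average own-municipality effect
    total = S_r.sum() / n                         # average row sum
    return direct, total - direct, total


# Hypothetical example: 50 municipalities on a ring, each with two neighbors
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # row-standardized
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([0.2, 0.8])
direct, indirect, total = sar_probit_effects(X, W, beta, rho=0.5, r=1)
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With ρ > 0 and a nonnegative W, every element of (I - ρW)^-1 is nonnegative, so the indirect effect has the same sign as β_r; in this toy example the own-municipality term is larger than the spillover term.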

The indirect effects, that is, the spatial spillovers of each explanatory variable onto neighboring municipalities, are shown in Table 10. These results are largely consistent with those for the spatially lagged variables in the PLEV and SEMP models, except that human capital is significant and shows the expected sign in most industries. Changes in neighboring human capital and industrial diversity seem to have a larger impact on location decisions than changes in neighboring market potential and industrial specialization.

These results highlight the importance of properly controlling for spatial dependence. Although previous papers have used specifications similar to that in Table 5, such models do not fully control for spatial dependence in the error structure. Viladecans-Marsal (2004) also provides empirical evidence on the geographical scope of agglomeration economies in the largest Spanish cities, but her results differ from ours: agglomeration effects spill over beyond administrative borders in only three of the six industries she analyzes.Footnote 34

4 Conclusions

This paper focuses on the geographical scope of agglomeration economies in Spain using municipal data, and specifically on the role that the characteristics of neighboring municipalities play in location decisions. The exploratory analysis has shown that, for every manufacturing industry considered, births are spatially autocorrelated, whether we test positive location decisions or the number of births. That is, municipalities chosen as locations for births in a given industry tend to be neighbors of municipalities that have also been chosen as locations by the same manufacturing industry. The spatial exploratory analysis of municipal specialization suggests that this spatial pattern may be due to Marshallian agglomeration economies that extend beyond municipal borders, since the location quotient is also spatially autocorrelated for every manufacturing industry. Therefore, the geographical scope of agglomeration economies may play a role in location decisions.

To test the role of the geographical scope of agglomeration economies in industrial location decisions, a confirmatory analysis was carried out: a simple location model was outlined and estimated using Spatial Econometrics and Spatial Statistics techniques. The spatial variables are highly significant for most industries, so we cannot reject the hypothesis that the characteristics of neighboring municipalities matter in industrial location decisions. In other words, what happens in a municipality depends not only on what happens inside it but also on what happens in its neighboring area. Interurban agglomeration economies due to industrial diversity seem to play a larger role in location decisions than interurban agglomeration economies due to industrial specialization.

Policy makers in countries with a highly decentralized regional system, such as Spain, should bear in mind that these agglomeration economies can extend to, or come from, neighboring areas that belong to other regions. Inter-regional coordination is therefore needed before implementing local or regional location incentives. This may be an important argument for defining industrial policy at the regional level, avoiding both a national basis, which is less efficient (Aghion et al. 2011), and a municipal basis. In fact, most of the variables that determine location (population skills, manufacturing specialization, market potential and diversity) are mainly affected by policies of regional scope.

Future research should examine the spatial extent, in kilometers, of agglomeration economies for each industry. Longer time series and more disaggregated industrial datasets (at the three-digit level or finer) are needed to properly analyze the industrial, temporal and geographical scope of agglomeration economies.

Finally, spatial autocorrelation should be taken into consideration when estimating location models, since spatial dependence invalidates the use of traditional estimation techniques.