1 Introduction

The most striking feature of the spatial distribution of economic activity is its heterogeneity. As has recently been highlighted by Ellison et al. (2010), economic activity is geographically concentrated and this concentration is too pronounced to be explained by exogenous spatial differences in natural advantages alone. Nonetheless, the observed concentration may also be due to these natural advantages. Some regions simply possess a better environment for certain industries that attracts them to that area. Similarly, it is also difficult to ascertain whether the establishments of two industries are located close to each other because they are attracted by similar characteristics or natural advantages of the area or because they have strong linkages and have deliberately decided to locate close to each other to exploit synergies.

It is therefore not surprising that economists have paid attention to this tendency of firms and industries to become spatially localised since the pioneering works of Von Thünen, Marshall and Weber up to more recent contributions from the ‘new economic geography’ initiated by Krugman (1991a).Footnote 1

Although it is easy to define localisation or colocalisation as the tendency of different establishments or industries to opt for the company of other firms or industries and as a result to locate together, it is more difficult to measure it properly and to detect whether the location of firms or industries is conditioned by the location of other firms or industries. That is, we should differentiate between exogenous causes for locating together, like natural advantages, and endogenous reasons, like strong linkages or synergies between firms or industries.

Thus, the interest in theories that can explain the agglomeration of firms and industries has also been extended, in recent years, to the development of empirical methods to quantify and characterise this tendency of individual firms and industries to cluster in space. The first generation of these measures, to use the terminology employed by Duranton and Overman (2005), was based on indicators such as the Herfindahl or Gini indices, which did not take space into consideration.Footnote 2 The second generation, initiated by Ellison and Glaeser (1997), began to take space into account, but not in a proper way. The Ellison and Glaeser index still used administrative units to measure the spatial distribution of economic activity, treating space as being discrete.Footnote 3 Therefore, they restricted the analysis of spatial distribution to just one administrative scale: ‘they transform points on a map into units in boxes’.Footnote 4 Alternatively, the third generation of empirical measures of spatial localisation, developed by authors from different scientific fields (economics, geography and statistics), introduced the treatment of space as being continuous, by simultaneously analysing multiple spatial scales. These new measures are unbiased with respect to arbitrary changes in the spatial units and allow us to know and to compare the concentration intensity for each spatial scale. Authors like Marcon and Puech (2003), Quah and Simpson (2003), Duranton and Overman (2005) and Arbia et al. (2008), among others, were the pioneers in introducing these methods into economic geography. More recently, papers by Duranton and Overman (2008), Marcon and Puech (2010) or Albert et al. (2012) have developed several extensions and improvements to these methodologies. Since then, an increasing number of studies have appeared thanks to these methodological developments and the widespread use of micro-geographic data. Some examples are Nakajima et al. (2012), who examined the location patterns of Japanese manufacturing industries, in the same way as Koh and Riedel (2014) did with the four-digit German manufacturing and service industries. Meanwhile, Barlet et al. (2013) improved the test proposed by Duranton and Overman (2005), avoiding the bias with respect to the number of plants, and studying the location patterns of service and manufacturing industries in France. In accordance with other localisation measures, Guimarães et al. (2011) modified measures of spatial concentration by taking into account neighbouring effects and applying the new instruments to the USA. Similarly, Behrens and Bougna (2015) used micro-geographic data to analyse the evolution of geographic concentration in Canada, applying the Duranton and Overman index and also integrating neighbourhood effects into the Ellison and Glaeser index, in the same way as Guimarães et al. (2011).

In this paper, we use different measures belonging to this last generation in order to analyse two important issues characterising the spatial location patterns of manufacturing firms: the sectoral scope of the location patterns of different industries and the tendency to colocalise among various industries whose activity is related either vertically or horizontally. To do so, we apply an extension of Ripley’s K function,Footnote 5 which allows us to assess the different tendencies to cluster in each industry while also enabling us to know whether concentration exists, its intensity at each distance, and on what spatial scale its highest level is obtained. Moreover, to analyse colocalisation, we use the K-cross functionFootnote 6 and we incorporate a methodological improvement that, unlike other proposals, enables us to obtain results that are closer to reality from the economic point of view.

Specifically, using a narrowly defined industry classification, we analyse the patterns of intra- and inter-industry location of Spanish manufacturing sectors. First, we focus on the patterns of location of the set of establishments making up each manufacturing industry at the four-digit level, the results revealing that 68% of them show localisation patterns. Moreover, these industries reach their maximum concentration at very heterogeneous distances. Second, we check the sectoral scope, that is, whether the four-digit industries that are part of the same two-digit sector have similar patterns of location among them (intra-sectoral homogeneity) and whether, at the same time, they are similar to the location pattern of the whole two-digit sector. Our results confirm that the more homogeneous the activities in a specific sector are, the more similarities we find in the spatial location patterns among their industries. Third, we analyse the colocalisation patterns of the pairs of industries with relevant linkages, finding that 48% of these examined pairs are colocalised. Furthermore, 74% of colocalised industries are vertically linked. Finally, we show that the patterns of colocalisation detected are sensitive to the methodology used to construct the counterfactual that allows us to establish the statistical significance of the results. So, the more restrictive the methodology used, the more likely it is to reject the existence of colocalisation between pairs of industries, especially at short distances, thereby increasing the risk of rejecting actually existing colocalisation patterns. In this case, the colocalised pairs of industries are reduced from 48 to 20%.

The remainder of the paper is organised as follows. In Sect. 2, we present the data used in our analysis and the methodology employed. In Sect. 3, we introduce and discuss the main results obtained, taking into consideration the sectoral scope of localisation for industries at the four-digit level and their corresponding colocalisation between vertically and horizontally linked industries. Finally, in Sect. 4, we conclude and discuss the final considerations.

2 Data and methodology

2.1 Data

We use establishment-level data, for the year 2007, from the Analysis System of Iberian Balances databaseFootnote 7 to carry out our empirical analysis. For each establishment in our database, we know its geographical coordinates (longitude and latitude), its number of employeesFootnote 8 and the kind of industrial activity it performs. Specifically, we have information about the codes of the National Classification of Economic Activities (NACE)Footnote 9 to which it belongs. When we refer to all establishments grouped into four-digit or two-digit NACE codes, we are speaking about industries and sectors, respectively. Note that each sector (two-digit code) includes several industries (four-digit codes).

The geographical coordinates allow us to treat space as being continuous instead of using a single administrative scale, thus allowing multiple spatial scales to be analysed simultaneously. In fact, through said geographical coordinates, we locate the establishments, represented by dots, accurately in space without having any modifiable areal unit problems (MAUP), that is, our results do not depend on the administrative scale chosen.

Our database is restricted to Spanish manufacturing firms located only on the peninsula and not in the Canary and Balearic Islands, Ceuta or Melilla, and which employ at least ten workers. This second requirement is due to the fact that most establishments with fewer than ten workers do not have the essential information (geographical coordinates) needed to carry out our analysis. Moreover, we should highlight the fact that some industries are not included in the analysis because the number of their establishments is too small to be able to apply the statistical methods.Footnote 10 After considering all these requirements, our database contains 42,820 establishments, belonging to 90 industries and 19 sectors.Footnote 11

2.2 Methodology: localisation and colocalisation

The methods we are going to use follow Albert et al. (2012) and are based on Ripley’s K function, K(r). This function is a distance-based method that measures concentration by counting the average number of neighbours each firm has within a circle of a given radius, ‘neighbours’ being understood to mean all the firms situated at a distance equal to or lower than the radius (r). From here on, firms will be treated as points.

The K(r) function describes the characteristics of the point patterns on many different scales simultaneously, depending on the value of ‘r’ we take into accountFootnote 12; that is,

$$ K\left( r \right) = \frac{1}{\lambda N}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1,i \ne j}^{N} {w_{ij} I\left( {d_{ij} } \right)} } $$
$$ I\left( {d_{ij} } \right) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {d_{ij} \le r} \hfill \\ {0,} \hfill & {d_{ij} > r} \hfill \\ \end{array} } \right. $$

where \( d_{ij} \) is the distance between the ith and jth establishments; \( I\left( {d_{ij} } \right) \) is the indicator function and takes a value of 1 if the distance between the ith and jth establishments is lower than or equal to r, and 0 otherwise; N is the total number of points observed in the studied area; \( \lambda = N/A \) represents its density, A being the area of studyFootnote 13; and \( w_{ij} \) is the weighting factor to correct for border effects, and will be equal to the area of the circle divided by the intersection between the area of the circle and the area of study.Footnote 14
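As a rough illustration of the estimator just defined, the following Python sketch computes K(r) for a small set of points. It is only a minimal sketch under stated assumptions: the function name `ripley_k` and its optional `weight` argument are hypothetical, and the border-effect weight is set to 1 unless a correction function is supplied, so this is not the edge-corrected estimator used in the paper.

```python
import math


def ripley_k(points, r, area, weight=None):
    """Estimate K(r) = 1/(lambda*N) * sum_i sum_{j != i} w_ij * I(d_ij <= r).

    points : list of (x, y) coordinates, one per establishment
    r      : radius of the circle drawn around each point
    area   : area A of the study region, so that lambda = N / A
    weight : hypothetical callable (p_i, p_j, d_ij) -> w_ij; if None,
             w_ij = 1 (assumption: no border correction is applied)
    """
    n = len(points)
    lam = n / area  # density of points in the study area
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbour
            d = math.hypot(xi - xj, yi - yj)
            if d <= r:  # indicator function I(d_ij)
                total += 1.0 if weight is None else weight(points[i], points[j], d)
    return total / (lam * n)
```

For four points placed on the corners of a square with side 1 inside a study area of 4, each point has two neighbours within r = 1, so the sketch gives K(1) = 8/(1 × 4) = 2.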

The next step in the evaluation of the location patterns of economic activity is to define the counterfactuals that allow us to provide our results with statistical robustness. The null hypothesis is usually a kind of randomly distributed set of locations in the area of study. Thus, if establishments were located at random in the study area and independently from each other, we would have a location pattern known as complete spatial randomness (CSR). However, it is not altogether correct to use CSR as the null hypothesis because economic activity cannot be located in space in a random and independent way. Economic activities are also spatially concentrated for reasons unrelated to economic factors, for example, because of dissimilarities in such natural features as mountains, rivers or harbours. Additionally, with CSR as our benchmark we cannot isolate the idiosyncratic tendency of each industry to localise from the general tendency of manufacturing firms to agglomerate.

To avoid these drawbacks and to control for the overall agglomeration of manufacturing, we proceed in two steps. First, we define \( M_{TM}(r) \) as the difference between the K value of each set of industrial firms under consideration and the K value of the total manufacturing at radius r, that is: \( M_{TM}(r) = K(r) - K_{TM}(r) \). Second, we test the significance of departures from a random distribution, conditioned on the overall distribution of manufacturing. To do this, we construct suitable confidence intervals using Monte Carlo simulations.Footnote 15 Specifically, for each industry, we construct counterfactuals by randomly drawing the same number of points (establishments) as in each of the industries under consideration, but the location of these hypothetical establishments is restricted, as in Duranton and Overman (2005), to the sites where we can currently find establishments from the whole manufacturing sector. In this way, the construction of the confidence interval allows us to assess the significance of departures from spatial patterns followed by the whole of manufacturing and to control for industrial concentration. When the estimated \( M_{TM}(r) \) for a specific industry lies within the confidence interval, we cannot reject the null hypothesis that the location pattern of this industry is the same as that of manufacturing as a whole. If our estimation lies above the upper bound of the confidence interval, the industry analysed is more concentrated than manufacturing as a whole, while if it is below the lower bound of the confidence interval, then the analysed industry exhibits a more dispersed pattern than manufacturing as a whole.Footnote 16
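The two steps described above can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually implemented: the function names, the number of simulations (`n_sims=200`), the 5% level and the use of pointwise quantiles at a single radius are all assumptions made for the example, and border weights are fixed at 1.

```python
import math
import random


def k_value(points, r, area):
    """Ripley's K at radius r, with border weights fixed at 1 (assumption)."""
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return pairs * area / (n * n)  # equals pairs / (lambda * N) with lambda = N / A


def m_tm(industry, manufacturing, r, area):
    """M_TM(r) = K(r) - K_TM(r) for one industry against total manufacturing."""
    return k_value(industry, r, area) - k_value(manufacturing, r, area)


def mc_envelope(n_industry, manufacturing, r, area, n_sims=200, alpha=0.05, seed=0):
    """Confidence interval for M_TM(r) under the null hypothesis.

    Each simulation draws n_industry sites, without replacement, from the
    sites occupied by any manufacturing establishment.
    """
    rng = random.Random(seed)
    k_tm = k_value(manufacturing, r, area)
    sims = sorted(
        k_value(rng.sample(manufacturing, n_industry), r, area) - k_tm
        for _ in range(n_sims)
    )
    cut = int(n_sims * alpha / 2)  # number of simulations trimmed from each tail
    return sims[cut], sims[n_sims - 1 - cut]


def classify(m_value, envelope):
    """Compare an estimated M_TM(r) with the simulated confidence interval."""
    lower, upper = envelope
    if m_value > upper:
        return "concentrated"
    if m_value < lower:
        return "dispersed"
    return "not significant"
```

A typical call would compute `m_tm(industry, manufacturing, r, area)` for the observed industry and pass it to `classify` together with `mc_envelope(len(industry), manufacturing, r, area)`, repeating the exercise for each radius of interest.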

Similarly, in order to analyse whether two industries, horizontally or vertically linked, are colocalised, we have to consider a multivariate spatial point pattern. To do so, we use a K-cross function, \( K_{ij}(r) \), where i ≠ j and r is the radius, that is:

$$ K_{ij} \left( r \right) = \left( {\lambda_{i} \lambda_{j} A} \right)^{ - 1} \sum\limits_{k} {\sum\limits_{l} {w\left( {i_{k} ,j_{l} } \right)I\left( {d_{{i_{k} ,j_{l} }} } \right)} } $$
$$ I\left( {d_{{i_{k} ,j_{l} }} } \right) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {d_{{i_{k} ,j_{l} }} \le r} \hfill \\ {0,} \hfill & {d_{{i_{k} ,j_{l} }} > r} \hfill \\ \end{array} } \right. $$

where \( d_{{i_{k} ,j_{l} }} \) is the distance between the kth location of type i and the lth location of type j; \( I\left( {d_{{i_{k} ,j_{l} }} } \right) \) is the indicator function and takes a value of 1 if the distance between the kth location of type i and the lth location of type j is lower than or equal to r, and 0 otherwise; \( \lambda_{i} = N_{i}/A \) and \( \lambda_{j} = N_{j}/A \) represent the density of points of type i and j, respectively, A being the area of study and \( N_{i} \) and \( N_{j} \) being the total number of points of type i and type j observed in the studied area; and \( w\left( {i_{k} ,j_{l} } \right) \) is the weighting factor to correct for border effects, namely the fraction of the circumference of the circle centred at the kth location of process i with radius \( d_{{i_{k} ,j_{l} }} \) that lies inside the area of study.
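The bivariate estimator can be illustrated with a short Python sketch. As before, this is only an illustration under assumptions: `k_cross` and its `weight` argument are hypothetical names, and the border-effect weight is taken as 1 unless a correction function is passed in.

```python
import math


def k_cross(points_i, points_j, r, area, weight=None):
    """Estimate K_ij(r) = (lambda_i * lambda_j * A)^-1 * sum_k sum_l w * I(d <= r).

    points_i, points_j : lists of (x, y) coordinates for industries i and j
    r                  : radius
    area               : area A of the study region
    weight             : hypothetical callable (p, q, d) -> w(i_k, j_l); if None,
                         the weight is 1 (assumption: no border correction)
    """
    lam_i = len(points_i) / area  # density of type-i points
    lam_j = len(points_j) / area  # density of type-j points
    total = 0.0
    for p in points_i:
        for q in points_j:
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if d <= r:  # indicator function
                total += 1.0 if weight is None else weight(p, q, d)
    return total / (lam_i * lam_j * area)
```

Because the densities are \( N_{i}/A \) and \( N_{j}/A \), the normalising factor reduces to \( A/(N_{i} N_{j}) \): the function is the number of weighted i–j pairs within distance r, rescaled by the area and the two industry sizes.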

As argued by Duranton and Overman (2005), in this case defining the null hypothesis in order to construct the counterfactuals is rather more complicated. Furthermore, this choice will condition the interpretation of our results. Indeed, we must point out that the fact that two industries are located together does not always mean that they deliberately locate close to each other to exploit synergies between them. Instead, they can possess similar location patterns only because the two industries may be attracted by the same localised natural advantages.Footnote 17 However, only in the first case can we speak of colocalisation in the proper sense of the term.

In this paper, we construct the counterfactuals, with which the estimated K-cross function can be compared, in two different ways.

In the first way, for any two four-digit industries (i and j) we simulate the spatial location of establishments by randomly sampling the same number of points (establishments) of four-digit industry i (\( N_{i} \)) in the set of sites actually occupied by establishments of two-digit sector I, which this industry belongs to (i ∈ I).Footnote 18 Then, we use these simulations to compute the K-cross function, \( K_{ij}(r) \), and construct the confidence intervals.Footnote 19 In this case, the upper deviations of the estimated \( K_{ij}(r) \) value from randomness indicate that the establishments in the four-digit industry i ∈ I are attracted by establishments of industry j ∈ J, even after controlling for any tendency of establishments of industry i to cluster with establishments in the remaining sector I they belong to. In other words, we interpret this result as statistically significant evidence of a tendency of establishments in industry i to locate closer to establishments of industry j (i → j), instead of locating closer to other establishments from their own sector. Alternatively, if the \( K_{ij}(r) \) value lies below the lower bound of the confidence interval, colocalisation will not exist and establishments from industry i will prefer to locate closer to establishments from their own sector rather than to establishments from industry j. Finally, if the value of \( K_{ij}(r) \) lies between the upper and the lower bounds of the confidence interval, the location pattern of the establishments of industry j does not have a significant influence on the location pattern of the establishments of industry i and these establishments will locate close both to establishments of industry j and to establishments from their own sector. Furthermore, we use the same criterion to assess whether establishments from four-digit industry j, belonging to two-digit sector J (j ∈ J), are located closer to establishments from industry i than to other establishments in their own sector J (j → i).
Hence, using these criteria together allows us to consider that colocalisation exists in both ways. This means that establishments in industry i locate closer to establishments in industry j than to other establishments in their own sector I, and vice versa, (i ↔ j); that is, establishments from both industries show mutual attraction. In this way, more robustness is given to colocalisation and the probability of ‘joint-localisation’ can be minimised.
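A minimal Python sketch of this first counterfactual and of the two-way criterion is given below. It rests on several assumptions that are not taken from the paper: the function names, 200 simulations, a 5% level, pointwise quantiles at a single radius and no border correction. Since \( N_{i} \), \( N_{j} \) and A are the same in every simulation, the K-cross value is proportional to the number of i–j pairs within distance r, so the sketch compares pair counts directly.

```python
import math
import random


def cross_count(points_i, points_j, r):
    """Number of (i, j) pairs at distance <= r; proportional to K_ij(r)."""
    return sum(
        1
        for (xi, yi) in points_i
        for (xj, yj) in points_j
        if math.hypot(xi - xj, yi - yj) <= r
    )


def attracted(points_i, sector_sites, points_j, r, n_sims=200, alpha=0.05, seed=0):
    """One direction (i -> j): is industry i closer to industry j than
    expected when its establishments are redrawn from the sites of its
    own two-digit sector?"""
    rng = random.Random(seed)
    observed = cross_count(points_i, points_j, r)
    sims = sorted(
        cross_count(rng.sample(sector_sites, len(points_i)), points_j, r)
        for _ in range(n_sims)
    )
    upper = sims[n_sims - 1 - int(n_sims * alpha / 2)]  # upper bound of the interval
    return observed > upper


def broad_colocalised(points_i, sector_i, points_j, sector_j, r, **kwargs):
    """Mutual attraction (i <-> j): both directions must be significant."""
    return attracted(points_i, sector_i, points_j, r, **kwargs) and attracted(
        points_j, sector_j, points_i, r, **kwargs
    )
```

Here `sector_i` and `sector_j` stand for the lists of all sites occupied by the two-digit sectors to which industries i and j belong, including the sites of the industries themselves.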

Our second way to construct counterfactuals relies on the test proposed by Duranton and Overman (2005, 2008). In this case, the alternative is to randomly sample the same number of points as the sum of establishments belonging to the two industries (\( N_{i} + N_{j} \)) in all sites actually occupied by them. As in the previous case, we use these simulations to compute the K-cross function, \( K_{ij}(r) \), and construct the confidence intervals.Footnote 20 Hence, when the estimated \( K_{ij}(r) \) value is higher than the upper bound of the confidence interval, it means that establishments in these industries are attracted to each other even after controlling for whatever tendency they have to cluster. Note that this is an extremely restrictive and demanding test, because as Duranton and Overman said, ‘the desire to locate close to establishments in a vertically linked industry does not necessarily require locating closer to establishments in this industry than establishments in one’s own industry’.Footnote 21
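One way to read this second counterfactual is as a random relabelling of the pooled sites of the two industries, which the following Python sketch illustrates. The relabelling interpretation, the function names, the 200 simulations, the 5% level and the absence of border correction are assumptions made for the example. As the industry sizes and the area are fixed across simulations, comparing the number of i–j pairs within distance r is equivalent to comparing K-cross values.

```python
import math
import random


def cross_count(points_i, points_j, r):
    """Number of (i, j) pairs at distance <= r; proportional to K_ij(r)."""
    return sum(
        1
        for (xi, yi) in points_i
        for (xj, yj) in points_j
        if math.hypot(xi - xj, yi - yj) <= r
    )


def narrow_colocalised(points_i, points_j, r, n_sims=200, alpha=0.05, seed=0):
    """Test whether i and j are closer to each other than expected when
    the N_i + N_j occupied sites are randomly reassigned to the two industries."""
    rng = random.Random(seed)
    pooled = list(points_i) + list(points_j)  # all sites occupied by either industry
    n_i = len(points_i)
    observed = cross_count(points_i, points_j, r)
    sims = []
    for _ in range(n_sims):
        rng.shuffle(pooled)  # random reallocation of industry labels
        sims.append(cross_count(pooled[:n_i], pooled[n_i:], r))
    sims.sort()
    upper = sims[n_sims - 1 - int(n_sims * alpha / 2)]  # upper bound of the interval
    return observed > upper
```

Under this null, an establishment of industry i is as likely to be replaced by a neighbour from its own industry as by one from industry j, which is why the test only detects colocalisation when cross-industry proximity exceeds within-industry proximity.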

Obviously, the way the counterfactual is built defines the conditioning factors of the underlying spatial randomness of our null hypothesis, and changes the intuition behind the test we are performing. The key question is: against which alternative should we evaluate the significance of the tendency of establishments in an industry i to be closer to establishments in a linked industry j? In sum, in this paper we propose two alternatives. In the first, this tendency must be stronger than the tendency to be located closer to other establishments in their own two-digit sector (which henceforth we will call the ‘broad test’ of colocalisation); in the second, it must be stronger than the tendency to be located closer to establishments in their own four-digit industry (hereafter called the ‘narrow test’ of colocalisation).

3 Empirical results

3.1 Localisation in four-digit industries

The analysis of the spatial location pattern of the Spanish manufacturing industries shows that 61 of the 90 four-digit industries considered (68%) are concentrated,Footnote 22 whereas 16 industries (18%) are dispersed, and 13 (14%) do not present any significant differences from the location pattern of the whole manufacturing.

Focusing on those industries that have a higher tendency to spatial concentration than manufacturing as a whole, Fig. 1 shows the cumulative percentage of industries that reach their maximum level of concentration (maximum \( M_{TM} \) value) at each distance of the radius.

Fig. 1 Cumulative percentage of industries that reach their maximum concentration level (maximum \( M_{TM} \)) at different distances

In this figure, we can observe that, first, this maximum intensity of concentration is reached at very different radii among the industries analysed and, second, that there is a considerable change in the rate of incorporation of new industries that reach their highest level of concentration from 90 km onwards.

In fact, a large number of industries (30%) reach their highest level of concentration at very short distances (between 0 and 45 km), while another 20% reach their maximum concentration between 45 and 65 km. This trend continues up to 90 km, and from this distance on the trend slows down considerably. As a result, 75% of the concentrated industries reach their maximum level of concentration at distances lower than 90 km and only 25% reach their highest level of concentration at distances larger than 90 km.

Finally, we must take into account the fact that the maximum \( M_{TM} \) value is the maximum difference between \( K_{i} \) and \( K_{TM} \), and given that \( K_{TM} \) values do not remain constant when the radius becomes larger, the interpretation of the \( M_{TM} \) value is not independent of the behaviour of \( K_{TM} \). Hence, industries that reach their maximum level of concentration at large distances will not be very important in our analysis, because the whole of Spanish manufacturing (TM) reaches its maximum level of concentration at a distance of 60 km and from this distance onwards its concentration becomes lower, even showing dispersion patterns beyond 150 km. Consequently, those industries that reach their \( M_{TM} \) peak at distances below 60 km are of greater importance in our analysis, since this evidence could be interpreted as strong localisation.

Table 1 gives detailed information about the location patterns of a representative sample of the 90 industries analysed.Footnote 23 This sample contains industries belonging to all the sectors analysed and not only those that are concentrated, as in Fig. 1. It therefore also includes industries that present dispersion patterns and industries that do not present significant differences from the manufacturing industry as a whole.

Table 1 Location patterns of Spanish manufacturing industries (\( M_{TM} \))

The first column of Table 1 shows the four-digit code and name of the industries. Its last two columns show the significant extreme value of the \( M_{TM} \) function and the distance (r) at which this value is reached. A positive extreme value of the \( M_{TM} \) function means that at the critical value of radius r, there is a maximum; that is, for this radius the highest level of concentration at all possible radii is reached. Similarly, a negative extreme value of the \( M_{TM} \) function indicates that at radius r, there is a minimum; that is, for this radius the highest level of dispersion at all possible radii is reached. Therefore, in this table, the spatial location pattern of each industry is characterised by two specific features: (1) the intensity of the cluster (extreme value of the \( M_{TM} \) function), and (2) the distance at which this highest/lowest intensity is reached (critical value of radius r).
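To make the two reported features concrete, the following Python sketch extracts a significant extreme value and its critical radius from an estimated function evaluated on a grid of radii. The function name, the input layout (parallel lists of radii, estimates and confidence bounds) and the choice to report a peak and a trough separately are assumptions made for illustration; the paper does not describe how the table entries were computed.

```python
def significant_extremes(radii, m_values, lower, upper):
    """Return ((max value, radius), (min value, radius)) among significant radii.

    radii    : list of radii r at which the function was evaluated
    m_values : estimated M_TM(r) at each radius
    lower    : lower bound of the confidence interval at each radius
    upper    : upper bound of the confidence interval at each radius
    Either element is None when no radius shows significant
    concentration (peak) or significant dispersion (trough).
    """
    above = [(m, r) for r, m, hi in zip(radii, m_values, upper) if m > hi]
    below = [(m, r) for r, m, lo in zip(radii, m_values, lower) if m < lo]
    peak = max(above) if above else None      # highest significant concentration
    trough = min(below) if below else None    # highest significant dispersion
    return peak, trough
```

For example, with radii of 10, 20, 30 and 40 km, estimates of 0.01, 0.05, 0.03 and −0.02, and a constant interval of (−0.01, 0.02), the peak is 0.05 at 20 km and the trough is −0.02 at 40 km.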

As we can see in Table 1, although there are similarities in the spatial distribution of some of these industries, the dominant feature is the substantial variation among them. The intensity and the distance at which the maximum concentration/dispersion is reached vary from industry to industry and this can encourage a large diversity of types of clusters.

If we look at the intensity of concentration in Table 1, it is easy to observe that, on the one hand, textile-related industries, media-based industries, and chemical-related industries are among those with higher levels of spatial concentration, together with the ceramic manufacturing, an industry that is heavily localised in Spain.Footnote 24 On the other hand, we find that mostly food and food-related industries, together with industries with a high dependence on natural resources, are the activities that show the most dispersed location patterns.Footnote 25 Like Duranton and Overman (2008), we do not find any particular characteristics among the most concentrated industries. Thus, most of the textile-related industries are low-tech and presumably their geographical concentration is due to historical trends; in the case of media-based and chemical-related industries, their decision to concentrate may be due to the search for skilled labour; and, finally, the ceramic manufacturing has a markedly specialised industrial activity, as well as knowledge and technological spillovers that make its establishments form a well-known industrial district. Moreover, if we go further in the comparison of our results with those obtained by other authors, we realise that many similarities are to be found. For instance, Duranton and Overman (2008) found that, in the UK, textile-related industries and media-based industries are amongst the most localised, while it is mostly food-related industries together with industries with high transport costs or a high dependence on natural resources that show dispersion. Behrens and Bougna (2015), in Canada, found that textile and publishing industries are among the most localised industries, while food and drink and wood industries are among the least localised ones. They also added that most of the industries do not have extreme spatial patterns, which is similar to the results for the UK and Spain. 
Similarly, the analysis performed by Koh and Riedel (2014), for the German manufacturing industry, also suggested that the especially traditional industries, like textile production, are the ones which exhibit strong localisation patterns.

Finally, the last column of Table 1 allows us to know the spatial scale or the rough size of the cluster of each industry, i.e. the distance (radius) at which the highest intensity of concentration is reached. As can be seen, heterogeneity is the norm, reaching peak concentration levels at distances ranging from 6 to 200 km. Thus, among the industries that reach their highest level of concentration at short distances (lower than 30 km) we can find Building and repairing of ships (3511), which reaches it at only 6 km; some textile and leather industries, like Manufacture of other textiles (1754) or Manufacture of luggage, handbags, saddlery and harness (1920), which reach their highest concentration at distances of 19 and 29 km, respectively; and Manufacture of pulp, paper and paperboard (2112), at a distance of 23 km. However, we should note that, although these industries reach their highest level of concentration at very short distances, they do not have excessive ‘intensity’, that is, their maximum level of concentration is not very high. We find only one industry that meets both requirements. Manufacture of ceramic tiles and flags (2630) presents a high level of concentration and this happens at short distances.

Only when we combine the information on the value of the maximum intensity of the clusters with the distance at which it is achieved is it possible to give some idea of the underlying reality in Table 1 for different industries. Thus, in the first cases, we are looking at industries with small clusters distributed throughout the study area, but which also have a significant proportion of establishments distributed throughout the space, although they are not located inside those clusters. However, in the second case, the level of concentration of the industry is much higher at short distances because most of its establishments are located in a single small cluster, and only a few establishments are situated outside this cluster.

3.2 Sectoral scope: four-digit industries within two-digit sectors

Next, we examine whether related four-digit industries within the same two-digit sector tend to follow similar or different patterns of localisation.

Our results indicate that for many two-digit sectors, related industries within the same sector tend to follow similar location patterns; that is, intra-sectoral homogeneity appears. Some examples are sectors (15) Food products and beverages, (18) Wearing apparel and dressing, (27) Basic metals, (31) Electrical machinery, (32) Radio, televisions and other appliances, (33) Instruments and (34) Motor vehicles and trailers. For instance, sector 15 exhibits a more dispersed pattern than manufactures as a whole, reaching its maximum level of dispersion, with an intensity of −0.01, around the distance of 60 km, as do most of the industries that are grouped in it.Footnote 26 Duranton and Overman (2005) found similar results, showing that for many of these previously mentioned two-digit sectors, related four-digit industries within these sectors tend to follow similar patterns of distribution.

However, for other sectors, related industries tend to follow different location patterns. This is the case of sector (36) Furniture and other products, where none of its industries have a similar location pattern to that of the sector to which they belong, and the spatial distribution differs widely from one industry to another. This is probably due to the fact that it is a sector with very varied activities grouped in it, like furniture, mattresses, jewellery, musical instruments, sports goods, toys, and so forth.Footnote 27 Another example of intra-sectoral heterogeneity is sector (29) Other machinery and equipment. In this case, its industries present very heterogeneous patterns of localisation, showing high levels of dispersion and concentration. Specifically, on the one hand, industries (2953) Manufacture of machinery for food and beverage and (2932) Manufacture of agricultural and forestry machinery show dispersion patterns. This happens because the main activity of these two industries is related to other widely dispersed industries. Thus, industry 2953 is vertically linked to food-related industries and industry 2932 is vertically linked to industries with a high dependence on natural resources. On the other hand, industry (2954) Manufacture of machinery for textile, apparel and leather production presents concentration patterns at short distances, due to the fact that it is vertically linked to textile industries, which are also concentrated at short distances. These results coincide with those obtained in the UK by Duranton and Overman (2005), who say that industries 2953 and 2932 present dispersion at all distances, while industry 2954 exhibits localisation for distances between 0 and 50 km.

Lastly, there are other sectors where the heterogeneity among their industries is not as high as in the previous examples, with industries with location patterns very similar to the sector as a whole and industries with completely different patterns of localisation coexisting within the same sector. This is the case of sectors (17) Textiles and (22) Publishing, printing and recorded media, where two of their industries differ from the rest and from the spatial distribution of the aggregated sector. The same happens with sectors (25) Rubber and plastic products and (26) Other non-metallic mineral products, in which one industry presents location patterns that are far more concentrated than the other industries and the sector itself.

Naturally, for each industry and sector it is possible to have a far more detailed analysis when we combine information on the level of intensity with the distance at which it is achieved. However, it is not feasible to offer a detailed description of the location pattern for all industries and sectors analysed. Hence, we will use two examples to illustrate more accurately the magnitude of the differences highlighted by the estimation of Ripley’s K functions.

Figures 2 and 3 illustrate the case of industries (2213) Publishing of journals and periodicals and (2630) Manufacture of ceramic tiles and flags, and their corresponding two-digit sectors (22 and 26). Figures 2a, c, 3a, c depict point clouds for each industry and sector, where each dot corresponds to an establishment, and Figs. 2b, d, 3b, d show their estimated $M_{TM}$ functions.Footnote 28

Fig. 2 Relative location patterns of sector 22 and industry 2213

Fig. 3 Relative location patterns of sector 26 and industry 2630

The point clouds clearly show that there are differences in the density and distribution of establishments between each industry and its corresponding sector. However, just by looking at the clouds it is difficult to establish to what extent the spatial location patterns of the sectors and the industries concerned are different, how large these differences are, or how the degree of spatial concentration is affected at each distance. By contrast, on examining the estimated $M_{TM}$ functions it is easy to recognise that the most pronounced difference between the location pattern of the four-digit industry and that of the two-digit sector occurs in Fig. 3. In fact, sector 26 shows dispersion patterns at all distances of the radius analysed, while industry 2630 presents high levels of concentration. In Fig. 2, the difference between the spatial location pattern of the four-digit industry and that of the two-digit sector is smaller, because sector 22 is concentrated at every distance of the radius, although its intensity is not as high as that of industry 2213. Nevertheless, the common feature of Figs. 2 and 3 is that, in both cases, the four-digit industries show higher levels of concentration than their respective two-digit sectors and this concentration is reached at much shorter distances.

At this point, it should be added that although both industries present concentration patterns at short distances, the shape of their $M_{TM}$ curves is very different. In Figs. 2d, 3d, we observe a common fast growth of the $M_{TM}$ value at short distances of the radius, but once the maximum $M_{TM}$ is reached, the behaviour of the $M_{TM}$ function differs in the two industries. On the one hand, industry 2213 (Fig. 2d) reaches its maximum intensity (0.33) at a distance of 50 km and then the $M_{TM}$ value hardly decreases as the distance analysed becomes larger. On the other hand, industry 2630 (Fig. 3d) reaches its maximum intensity (0.32) at a distance of 30 km and, as the distance becomes larger, the $M_{TM}$ value decreases very rapidly.

The rapid growth of the $M_{TM}$ value at small distances has an obvious explanation. In both industries, most of the establishments are located within clusters rather than being spread out (82% in industry 2213 and 78% in industry 2630). In fact, localisation leads to specialisation in particular jobs. As a result, workers skilled in those jobs are attracted to that place and these localised industries are continuously fed by a regular supply of skilled labour that also attracts new firms into the industry. Therefore, the majority of establishments in industries 2213 and 2630 are attracted over the years to specific locations and located in small clusters, thereby making the density of establishments at small distances very high and the intensity of concentration (the $M_{TM}$ value) very high at short distances.

Moreover, the reason why the $M_{TM}$ function does not behave in the same way when the radius becomes larger is that the two industries possess a different number of clusters. Industry 2213 has two huge clusters around Madrid and Barcelona, whereas industry 2630 has a single well-defined cluster around Castellón. Therefore, as the radius is increased and our test counts the neighbours of each establishment, the value of $M_{TM}$ is maintained over large distances when more than one cluster exists. In this way, the value of the $M_{TM}$ function depends on (1) the relative number of establishments located within the clusters, (2) the number of clusters in the point pattern, and (3) the distance between these clusters.Footnote 29
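The role of the number of clusters can be illustrated with a minimal sketch on synthetic data. It is not the $M_{TM}$ function used in this paper: it computes a plain Ripley's K without edge correction and without the benchmark of total manufacturing, and the study region, cluster centres, spreads and radii are all hypothetical values chosen for illustration. With a single cluster, the count of neighbours stops growing once the radius exceeds the size of the cluster; with two clusters, it grows again when the radius reaches the distance between them.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction, illustrative only)."""
    n = len(points)
    # Pairwise Euclidean distances between all establishments.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an establishment is not its own neighbour
    # Number of ordered pairs lying within each radius.
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

rng = np.random.default_rng(0)
side = 1000.0                      # hypothetical square region, side in km
area = side ** 2
radii = np.array([25.0, 50.0, 100.0, 400.0, 800.0])

def make_cluster(centre, n, spread=15.0):
    """Hypothetical cluster: n establishments scattered around a centre."""
    return rng.normal(loc=centre, scale=spread, size=(n, 2))

# One cluster of 200 establishments versus two clusters of 100 each,
# placed 600 km apart.
one_cluster = make_cluster((500.0, 500.0), 200)
two_clusters = np.vstack([make_cluster((200.0, 500.0), 100),
                          make_cluster((800.0, 500.0), 100)])

k_one = ripley_k(one_cluster, radii, area)
k_two = ripley_k(two_clusters, radii, area)

for r, a, b in zip(radii, k_one / area, k_two / area):
    print(f"r = {r:5.0f} km  share of pairs within r: one cluster {a:.2f}, two clusters {b:.2f}")
```

In this toy setting the single-cluster pattern has essentially no further neighbours to count beyond 100 km, whereas the two-cluster pattern picks up the establishments of the second cluster between 400 and 800 km, which is the mechanism described above for industries 2630 and 2213.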

3.3 Colocalisation

The next issue that we analyse concerns colocalisation patterns between pairs of industries with significant linkages. We use the 2005 Spanish Input–Output TableFootnote 30 to analyse the existence and magnitude of linkages between the different manufacturing industries. This table gives information about the inter- and intra-industry flows between the industries and sectors of an economy, and it allows us to establish the degree of industrial interdependence (or linkages) between them. Industries having the highest input–output linkages are good candidates to become colocalised for reasons other than the natural advantages of the places where they agglomerate.

An initial review provides us with a large number (thousands) of pairs of industries that may be eligible as candidates in our study, but it would be impractical to analyse all of them. Hence, we focus on only the 168 pairs of industries with the strongest linkages, and we use the K-cross function to evaluate the tendency to colocalise among them.Footnote 31

Figure 4 shows the percentage of pairs of analysed industries that are colocalised at each distance of the radius (r), using the ‘broad test’ of colocalisation, as explained in the methodology section. As can be seen in the figure, there is a higher percentage of pairs of industries that present colocalisation at short distances than at large distances. Specifically, in the first 50 km, around 35% of the pairs of industries analysed present colocalisation patterns, while the remaining 65% show a higher attraction towards establishments from their own sector than towards establishments of the other industry considered. The percentage of colocalised pairs increases significantly at distances larger than 70 km, reaching its maximum at a distance of 90 km, where 48% of the pairs of industries with the strongest linkages are colocalised. Finally, this percentage decreases rapidly from this distance onwards. At distances above 120 km, the percentage of pairs of industries that are colocalised falls below 20%. In sum, our results show that colocalisation is not a widespread phenomenon among the most linked industries.

Fig. 4 Percentage of pairs of industries colocalised at different distances (‘broad test’)
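The quantity plotted in this kind of figure can be computed as in the following sketch. It assumes, as a reading of the text rather than a statement of the exact procedure, that a pair counts as colocalised at a given distance when its observed K-cross value exceeds the upper bound of its confidence interval. The function name and the toy values are hypothetical.

```python
import numpy as np

def share_colocalised(k_obs, upper):
    """Percentage of industry pairs colocalised at each distance.

    k_obs, upper: arrays of shape (n_pairs, n_distances) holding the
    observed K-cross values and the upper confidence bounds.
    """
    significant = k_obs > upper          # True where a pair is colocalised
    return 100.0 * significant.mean(axis=0)

# Toy example: 4 pairs of industries evaluated at 3 distances.
k_obs = np.array([[1.2, 2.0, 3.0],
                  [0.8, 2.5, 2.9],
                  [1.5, 1.9, 3.5],
                  [0.7, 1.8, 2.0]])
upper = np.array([[1.0, 2.1, 3.1]] * 4)  # same bound for every pair, for simplicity

share = share_colocalised(k_obs, upper)
print(share)  # one percentage per distance
```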

Another relevant question is whether there are significant differences in the patterns of colocalisation between the pairs of industries that belong to the same sector, horizontally linked industries, and those pairs that belong to different sectors, vertically linked industries.Footnote 32 The question arises because when we observe the values of the diagonal of the input–output table, it is striking that these values are higher than those for other intersections, showing that there is a high percentage of intra-sectoral flows. The answer is no. Of all the industries that present colocalisation patterns, only 26% correspond to horizontally linked pairs of industries. Therefore, the fact that industries belonging to the same sector have stronger linkages between them does not translate directly into higher levels of colocalisation.

Alternatively, in Fig. 5, we observe the percentage of pairs of analysed industries that are colocalised at each distance of the radius (r) using the second way to construct the counterfactuals, that is, what we have called the ‘narrow test’. On the one hand, we see that a lower percentage of industries show colocalisation patterns than in Fig. 4. On the other hand, there is a higher percentage of pairs of industries that present colocalisation patterns at large distances of the radius, rather than at short distances. In fact, the maximum percentage of colocalised industries (20%) is reached from 130 km onwards and persists at long distances. This result is very similar to that obtained by Duranton and Overman, who found that in the UK colocalisation patterns increase at distances beyond 160 km.

Fig. 5 Percentage of pairs of industries colocalised at different distances (‘narrow test’)

The results obtained in Figs. 4, 5 are consistent with the construction of the counterfactuals. Since most industries analysed show a tendency to spatial concentration at short distances, it is not surprising that this trend is imposed on the attraction of establishments in other industries over short distances, when our null hypothesis is very restrictive (‘narrow test’). Thus, in the first case, when we control for the tendency of firms to be located closer to other establishments in their own two-digit sector (‘broad test’ of colocalisation), we find more pairs of industries colocalised at shorter distances. In the second case, when we control for the tendency of firms to be located closer to other establishments in their own four-digit industry (‘narrow test’ of colocalisation), the number of pairs of linked industries with colocalisation patterns decreases at short distances and increases significantly over long distances. However, we must take into account that both tests may fail. In the first case, although there is a tendency for establishments of industry i to locate near establishments of industry j, and we can reject conditional randomness, we may be facing some form of joint-localisation. But in the second case, the test is extremely demanding and we may be unable to reject randomness despite strong forces pushing towards colocalisation if own-industry concentration forces dominate. In other words, in this case we fail to detect industries that are effectively being colocalised.

Figure 6 exemplifies the arguments discussed above. Here, we observe the interaction that industry i (2213) Publishing of journals and periodicals maintains with industry j (2121) Manufacture of corrugated paper and paperboard. The continuous lines represent the estimated value of $K_{ij}$ at every distance r, and the dashed lines are the lower and upper bounds of the confidence intervals.

Fig. 6 Colocalisation between industries 2213 and 2121 (‘broad test’ vs. ‘narrow test’)

The difference between the left- and right-hand panels of Fig. 6 (6a, b) lies in the method of building the confidence interval. On the left-hand side we control for the tendency of firms to be located closer to other establishments in their own two-digit sector (‘broad test’), while on the right-hand side we control for the tendency of firms to be located closer to other establishments in their own four-digit industry (‘narrow test’). In Fig. 6a, the $K_{ij}(r)$ function lies above the upper bound of the confidence interval at every distance analysed. This means that colocalisation between these two industries is significant at all radii and establishments of industry 2213 have a tendency to locate close to establishments of industry 2121, or at least closer than to establishments within their own sector, (22) Publishing, printing and recorded media. However, the function only barely exceeds the upper bound at short distances of the radius, meaning that at short distances the establishments of industry 2213 also have a strong tendency to be located close to establishments in their own sector. In Fig. 6b, we observe that the $K_{ij}(r)$ function lies below the lower bound of the confidence interval, or between the upper and the lower bounds, up to a distance of 70 km, lying above the upper bound from this distance onwards. Thus, colocalisation is not significant at short distances of the radius, which means that establishments of industry 2213 prefer to locate closer to establishments in their own industry than to establishments of industry 2121 at these distances.
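The logic of reading these panels can be sketched as follows, under loudly stated assumptions. The K-cross estimate below is a naive version without edge correction; the confidence bounds come from a generic Monte Carlo scheme that redraws the locations of industry i from a pool of candidate sites; and the choice of that pool (left as the hypothetical argument `pool`) stands in for the construction of the counterfactuals, which is specified in the methodology section and not reproduced here. Reading values below the lower bound as codispersion is likewise an interpretation of the text, and all data are synthetic.

```python
import numpy as np

def k_cross(pts_i, pts_j, radii, area):
    """Naive K-cross: scaled count of type-j points within r of type-i points."""
    diff = pts_i[:, None, :] - pts_j[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (len(pts_i) * len(pts_j))

def envelope(pool, n_i, pts_j, radii, area, n_sim=199, alpha=0.05, seed=0):
    """Pointwise Monte Carlo bounds.

    Each simulation draws n_i sites for industry i from `pool`
    (hypothetical candidate sites) and recomputes K-cross against j.
    """
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, len(radii)))
    for s in range(n_sim):
        idx = rng.choice(len(pool), size=n_i, replace=False)
        sims[s] = k_cross(pool[idx], pts_j, radii, area)
    lower = np.quantile(sims, alpha / 2, axis=0)
    upper = np.quantile(sims, 1 - alpha / 2, axis=0)
    return lower, upper

def classify(k_obs, lower, upper):
    """Label each distance by the position of K-cross relative to the bounds."""
    return np.where(k_obs > upper, "colocalised",
                    np.where(k_obs < lower, "codispersed", "not significant"))

# Synthetic illustration: industry i is placed next to industry j.
rng = np.random.default_rng(1)
area = 100.0 * 100.0
radii = np.array([5.0, 10.0, 20.0])
pts_j = rng.uniform(0, 100, size=(40, 2))
pool = rng.uniform(0, 100, size=(200, 2))            # hypothetical candidate sites
pts_i = pts_j[:30] + rng.normal(0, 1.0, size=(30, 2))

k_obs = k_cross(pts_i, pts_j, radii, area)
lower, upper = envelope(pool, len(pts_i), pts_j, radii, area)
labels = classify(k_obs, lower, upper)
for r, lab in zip(radii, labels):
    print(f"r = {r:4.0f}: {lab}")
```

Passing a different pool to `envelope` changes the bounds but not the observed curve, which mirrors the way the solid lines coincide across the two panels while the dashed lines differ.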

The key to explaining the difference in the results lies in the location pattern of industry i. Industry 2213 has very intense concentration patterns at short distances. Hence, if own-industry concentration forces are strong and dominate, the narrow colocalisation test is likely to fail despite strong forces pushing towards colocalisation. This does not mean that establishments of industry 2213 do not locate close to establishments of industry 2121 at small distances of the radius, but rather that the attraction towards the establishments in one’s own industry is stronger than the attraction towards the establishments in the vertically linked industry 2121. However, when we condition on the tendency of establishments to be located close to establishments in their own sector, a greater tendency towards colocalisation appears between the two industries considered. We can therefore conclude that the location pattern of industry i affects the narrow test more than the broad one.

The characteristics of the spatial location pattern of the sector or the industry that underlie the construction of the counterfactuals condition the capability of the test to accept or reject the proposed alternatives. Rejecting conditional randomness in the first case (‘broad test’) may create ‘false positives’, whereas being unable to reject conditional randomness in the second case (‘narrow test’) may create ‘false negatives’. Consequently, the analysis of the colocalisation patterns cannot be independent of the analysis of the location patterns of the specific industries and sectors considered.

Finally, in Figs. 7 and 8, we have two industries that belong to the same two-digit sector of motor vehicles: industries 3410 Manufacture of motor vehicles and 3430 Manufacture of accessories for motor vehicles and their engines. Traditionally, these assemblers and car suppliers have attempted to locate close to each other to minimise assembly time and reduce manufacturing costs. Industry 3410 has barely any concentration patterns and industry 3430 does not present any significant difference from the location pattern of manufacturing as a whole.

Fig. 7 Colocalisation between industries 3410 and 3430 (‘broad test’ vs. ‘narrow test’)

Fig. 8 Colocalisation between industries 3430 and 3410 (‘broad test’ vs. ‘narrow test’)

Figure 7 illustrates the interaction in the location patterns of establishments in industry 3410 (assemblers) and establishments in industry 3430 (suppliers). In Fig. 7a, b, the solid lines are the same and represent the observed or empirical K-cross functions between both industries, while the dashed lines delimit the confidence interval. The difference between the confidence intervals again lies in the test used, that is, in the way the counterfactuals are built (Fig. 7a, ‘broad test’, and Fig. 7b, ‘narrow test’ of colocalisation).

In Fig. 7a, the observed $K_{ij}(r)$ lies above the upper bound of the confidence interval at relatively short distances. Specifically, this tells us that establishments in industry 3410, assemblers, tend to localise close to establishments in industry 3430, suppliers, from 0 up to 101 km, or, at least, they locate closer to these than to establishments in their own sector. However, if we look at Fig. 7b, we can see that the $K_{ij}(r)$ function lies between the upper and the lower bounds of the confidence interval. This means that colocalisation is not significant at any distance analysed, but neither do we find any codispersion patterns.

In the case of the automotive sector, it seems obvious that the industries analysed present a clear tendency towards colocalisation. This is the result that is detected when our ‘broad test’ is used, while according to the ‘narrow test’ colocalisation would be rejected. Evidently, one could argue that the tendency to colocalisation is not symmetrical (who attracts whom?), and this is probably true. A more detailed historical analysis of inputs and outputs in both industries could shed more light on this issue. However, as we do not have that information, we introduce Fig. 8 in order to add more detail to the analysis of the colocalisation patterns of both industries. Thus, Fig. 8 shows the interaction in the location patterns of establishments in industry 3430 and establishments in industry 3410, that is, the reverse case of Fig. 7.

In Fig. 8a, we observe that the behaviour of the observed $K_{ij}(r)$ almost repeats that of Fig. 7a, lying above the upper bound of the confidence interval from 0 up to 104 km. This means that suppliers are also colocalised with assemblers at these distances. Thus, we find colocalisation in both directions (i ↔ j), which adds robustness to the results discussed above. In Fig. 8b, we find again, as in Fig. 7b, that the $K_{ij}(r)$ function lies between the upper and the lower bounds of the confidence interval at all the distances of the radius that were analysed. Hence, with the ‘narrow test’, we do not obtain codispersion at short distances, probably because both industries present individual location patterns which are not very concentrated. In this way, the tendency of the establishments of both industries to locate closer to establishments in their own industry is not stronger than the tendency to locate closer to establishments of the other industry. Therefore, we can conclude that these two industries have located close to each other over the years.

Finally, in general terms, we observe that the way the counterfactuals are constructed affects the interpretation of our results. Since either test can be misleading, their joint use can give us a more accurate assessment of colocalisation patterns between two industries. Using the ‘narrow test’, we would be able to detect fewer than half of the industries that have a relevant influence on the location pattern of other industries and locate close to them. However, the number of industries with colocalisation patterns becomes more or less equal when we apply our ‘broad test’ in both directions (i ↔ j), as in the last example. Hence, taking into account this mutual attraction, the number of pairs of industries showing colocalisation patterns is very similar in both cases, and we can add more robustness to the analysis and minimise the probability of ‘joint-localisation’.

4 Conclusions

In this paper we extend the point pattern methodology of Albert et al. (2012) to analyse the location and colocalisation patterns of Spanish manufacturing industries. We use extensive micro-geographic data, with a more detailed industrial classification, in order to assess the tendencies to cluster in each four-digit industry relative to the whole of manufacturing, to measure the sectoral scope of localisation and to analyse the colocalisation patterns of the pairs of industries with relevant input–output linkages.

To do this, we apply an extension of Ripley’s K function and a bivariate generalisation of this function, the K-cross function. Both of them are distance-based methods, which allow us to treat space as being continuous and simultaneously analyse multiple spatial scales while avoiding the shortcomings of the administrative scale.

The analysis carried out provides us with a meaningful variety of results. First, we find that 68% of the Spanish manufacturing industries present concentration patterns, whereas 18% are dispersed and 14% do not present any significant differences from the location pattern of manufacturing as a whole. Second, a vast majority of the four-digit industries that show concentration patterns are grouped in relatively small clusters, since 75% of them reach their maximum level of concentration at distances shorter than 90 km. Third, we observe that textile-related industries, media-based industries and chemical-related industries are listed as the most concentrated industries, whereas mostly food and food-related industries, together with industries with a high dependence on natural resources, are the activities that show the most dispersed location patterns. These results coincide to a large extent with those obtained by other authors for other European countries.

With regard to the sectoral scope, we do not find a common pattern when we compare the location patterns of four-digit industries that are part of the same two-digit sector. However, it seems that the more homogeneous the activities in a specific sector are, the more similarities we find in the spatial location patterns among their industries (intra-sectoral homogeneity).

With regard to colocalisation, we obtain a rich harvest of results. First, the existence of significant input–output linkages between different industries does not guarantee the spatial colocalisation of these industries. Second, we find that establishments tend to locate closer to those in vertically linked industries than to establishments in horizontally linked industries. So, 74% of colocalised industries are vertically linked. This result suggests that the input–output linkages, although probably not the only factor generating a tendency towards inter-industrial spatial concentration, are an important source of externalities that encourage inter-industrial agglomeration, especially in vertically linked industries. Third, the number of pairs of industries that show colocalisation patterns varies depending on the method used to construct the counterfactuals, which allow us to establish the statistical significance of the results. Thus, the more demanding the methodology used is, the more likely it is that actually existing colocalisation between pairs of industries will be rejected. Specifically, when we use the ‘narrow test’, we find that only 20% of the total number of pairs of industries with stronger linkages are colocalised. Moreover, this tendency to locate close to other industries is higher at large distances of the radius, rather than at short distances, the maximum percentage of colocalised industries being reached from 130 km onwards. This result coincides with that obtained by Duranton and Overman for the UK, which is to our knowledge the only study that has also addressed this issue. Otherwise, when we employ the ‘broad test’, we observe that a higher number of industries show colocalisation patterns. In fact, 48% of the total number of pairs of industries analysed are colocalised, the percentage of industries with colocalisation patterns at short distances now being higher than at large distances. 
However, we believe that this result is not merely an artefact of the design of the null hypothesis, but also reflects the real contribution of other forces, such as the existence of a skilled labour market or greater mobility of labour at short distances.

In this work, following the path of what has been called the third generation, we have abandoned the use of CSR as a benchmark, building our own counterfactuals. These have been adjusted to the specific reality under study and to the economic restrictions affecting industrial localisation. This work, like other similar studies, enables us to establish a series of ‘stylised facts’ in a rigorous manner, although the reported evidence is basically descriptive. An interesting extension of this type of work would therefore be the design of tests for specific theoretical hypotheses. This would allow us to discriminate between different causes of concentration of economic activity, at different spatial scales, although doing so is beyond the scope of this paper.