1 Introduction

Successful innovation is increasingly based on interactions and collaborative research activities between research actors (Wuchty et al. 2007). Despite globalisation and the increased use of information and communication technologies (ICT), geographical distance continues to be an impediment to collaboration, and thus a barrier to expansive knowledge diffusion between different actors in a system of innovation. There is a strong tendency towards spatial concentration of collaboration, even though geographical proximity can be partially substituted by other forms of proximity (Boschma 2005; Breschi and Lissoni 2009) or has to be established temporarily (Torre 2008). There are some hints that the importance of spatial distance has decreased in Europe since the 1990s (Cappelli and Montobbio 2013; Chessa et al. 2013; Lata et al. 2015). However, a decreasing importance of distance could also be induced by increased ICT usage, the overall trend of globalisation in science, or, in Europe, specific policy measures aiming to establish a European Research Area (European Commission 2011). Up to now, no such policy measures for enhancing collaboration over distance have been adopted in the USA. Since the USA is, in terms of culture and economics, more similar to Europe than other large economic areas in the world, it could serve as a suitable comparison for the development over time. Research on interregional innovation collaboration has been a prominent topic in Europe but less so in the USA, so that hardly any comparable studies are available for the USA.

The paper at hand investigates whether the importance of spatial and further forms of distance has developed differently in the USA and Europe recently. The aim of our research is threefold: firstly, we will analyse the development over time. Secondly, we will distinguish between different types of distance in order to see whether there is a development in the importance of certain types of distance while others remain unchanged. Thirdly, we will compare Europe and the USA in order to investigate whether there are differences in the relevance of the various types of distances and whether there are specific European developments that could be policy-induced. We employ a spatial interaction modelling perspective in order to identify and compare the relevance of distinct measures of distance for the different collaboration networks in Europe and the USA. In order to have a strong basis for the analysis, we use two collaboration output measures: co-patenting and scientific co-publication.

The study differs from previous research (Lata et al. 2015; Morescalchi et al. 2014; Hoekman et al. 2009) in the following respects. First, we use small area units (NUTS3 level in Europe and the Core Based Statistical Areas in the USA) in order to examine the influence of different types of distance on co-patent and co-publication networks. Second, we analyse these networks in two different areas (Europe and the USA) by employing one encompassing panel model for each network that includes all observations in Europe and the USA in all considered years. These two models allow us to investigate the publication and patent networks from a comparative and longitudinal perspective. Third, following Lata et al. (2015), we employ the spatial filter technique in an encompassing random effects spatial interaction model. By adjusting the spatial filter technique for two different areas (i.e. Europe and the USA) in one encompassing model, we are able to account for spatial dependence among residual flows.

The remainder of the paper is structured as follows. The next section gives an overview of the literature on collaboration over distance and lists possible differences between the USA and Europe, especially regarding the development over time. Section 3 presents our method. Section 4 presents the data and variables. Results of the regressions and their discussion follow in Sect. 5 before Sect. 6 concludes.

2 Theoretical background

The literature provides a number of reasons why spatial and other forms of distance should play a role for co-patenting and co-publishing. We discuss them in turn below.

First, the traditional geographical approach connects distance with costs, mainly travelling costs. In the last decades, travel costs have decreased tremendously and seem to have lost their relevance (Cairncross 1997). Travel times (indirect travel costs) depend on the available transportation infrastructure, which has improved in the last decades. This implies a decrease in the relevance of geographical distance. Comparing the USA and Europe, there are no clear differences in travel times. Travelling by car is cheaper in the USA. However, the spatial distribution of metropolitan areas differs: the USA has its main centres at the borders of the country, implying long travel distances, while in Europe economic centres are more evenly distributed in space.

Second, many studies claim that geographical distance does not matter because of the distance itself but because of the occasions and options it provides for meeting each other, even unintentionally (Bathelt et al. 2004). One argument is based on the common belief that it is only possible to exchange certain kinds of knowledge face-to-face. Another argument is based on the assumption that interacting frequently, and especially face-to-face, enhances trust and, therefore, the willingness to exchange knowledge (see Storper and Venables 2004 for an overview of the arguments). Finally, it is argued that the search for collaboration partners has a regional bias, leading to more collaborations with nearby partners (Broekel and Binder 2007). These arguments do not seem to lose relevance with time, and there is also no obvious difference between the USA and Europe.

Third, it is argued in the literature that social proximity is the reason for the findings on the impact of geographical proximity. Breschi et al. (2003) and Breschi and Lissoni (2009) have shown that collaboration occurs mainly between actors that know each other. The driving force according to their argument is social proximity, which in most cases comes with geographical proximity, except if people relocate. As a consequence, mobility becomes an issue. There are clear differences in the mobility of people between the USA and Europe. People in the USA are more mobile, especially with respect to large distances (see, e.g. Ihrke and Faber 2012: Table 1, Statistisches Bundesamt Deutschland 2012: p. 23). For some years now, EU policies have been directed at the integration of innovation activities, e.g. policies aiming at a European Research Area (ERA), students’ mobility programs (like Erasmus), and the European Regional Development Fund (ERDF) aiming at the cohesion of European regions. If these approaches were successful, we should see a decreasing relevance of geographical distance in Europe, especially in co-publications, while there is no obvious reason to expect changes in the USA.

Fourth, in the context of globalisation the last decades have been characterised by an increasing similarity of regulations and laws between countries, especially among developed countries and an increasing share of the English-speaking population in countries in which English is not the official language (European Commission 2012a). These developments could be responsible for decreasing cultural and institutional barriers. However, this holds only for Europe and not for the USA, where these barriers are not present.

The impeding effect of geographical distance holds for both co-publication and co-patenting. The economic interests are usually greater in patenting activities, because filing a patent is more expensive than submitting papers for publication and because patents can, more often than publications, be used for developing innovations that are expected to generate revenues. On the one hand, this should lead to larger investments in co-patenting activities, including costs for overcoming spatial distance. On the other hand, this leads to a higher share of collaborations that take place within one organisation (high level of trust) and in close proximity. The latter probably prevails and leads to more distance sensitivity of co-patenting activities in comparison with co-publications.

Besides geographical distance, we also study the impact of cognitive distance (in the form of technological and scientific distance). The importance of an optimal distance in interaction and collaboration has been intensively discussed in the literature [e.g. Nooteboom (2000) and Nooteboom et al. (2007)]. Neither very small nor very large cognitive distances lead to good outcomes and learning effects in interactions. However, recent literature argues that new technological developments are more and more based on the combination of rather distant technological fields (Choi and Valikangas 2001). As a consequence, it can be expected that the optimal cognitive distance between actors becomes larger with time, especially in the technological sphere (patents).

The scientific world is slightly different. Scientific researchers focus mainly on publishing, which implies that whether research can be published is a major issue. Although interdisciplinary research has been propagated strongly by politicians in the last decades, most journals are still very focused and even the university system is in most countries structured in scientific fields with little interaction. This hints at a rather inhibiting effect of cognitive distance. Furthermore, journal rankings disadvantage interdisciplinary research and make mono-disciplinary research more attractive (Rafols et al. 2012). Hence, in the scientific domain it cannot be expected that the relevance of cognitive distance has decreased.

Besides the above arguments, the amount of knowledge increases continuously and exponentially, and each individual is able to know a decreasing share of the overall knowledge, even with intensive and interdisciplinary education. This is the reason for the increasing share of team inventions and larger teams (Wuchty et al. 2007). It might be argued that if the share of total knowledge that is known by each actor decreases, it becomes necessary to collaborate with cognitively less distant partners. This would lead to a decrease in cognitive distances, which should similarly hold for publications and patents.

There are several studies on various kinds of collaboration between regions, firms, or individuals that investigate the influence of more than one type of distance empirically. These studies usually find negative effects of the different types of distance on collaboration; sometimes there is no significant effect. Where it is explicitly stated, technological distance seems to have the strongest negative effect. The bulk of the studies covers Europe and is of a static nature. There are still some empirical gaps regarding the analysis of interregional collaboration in the USA and the analysis from a longitudinal perspective. We found few dynamic studies. The one by Lata et al. (2015) finds a decreasing negative impact of spatial distance (for EU framework program projects). Cappelli and Montobbio (2013) compare four periods of 5 years each and find a decreasing importance of spatial and technological distance for patent collaboration but an increasing importance of spatial distance for patent citations. By comparing old and new EU member states, they find an increased amount of collaboration between old and new EU members. This hints at a positive development of integration in European research activities. The study closest to our investigation is the one by Morescalchi et al. (2014). It is a follow-on study of Chessa et al. (2013) that includes spatial, technological, and cultural distance. They find a decreasing influence of spatial distance until the mid-90s, then an increase, and lately a stagnating trend, which they do not try to explain. When comparing EU with non-EU collaboration, they find an EU integration effect until 2004, but not thereafter. Chessa et al. (2013) find that spatial distance plays a smaller role in the USA compared to the EU. Similarly, Fritsch and Slavtchev (2007) find a smaller radius of university-business collaboration in Germany compared to the study of Acs et al. (2002) for the USA. However, we do not know of any other study comparing the USA and Europe regarding the impact of spatial distance on innovation collaboration.

2.1 Hypotheses

Overall, our theoretical considerations and the existing empirical record hint at a lower sensitivity to spatial distance in the USA compared to Europe. The theoretical arguments are mainly based on the higher mobility of people in the USA and on the additional barriers that exist in Europe, such as language and institutional differences. Empirically, two studies support this difference between the USA and the EU (Chessa et al. 2013; Fritsch and Slavtchev 2007). In order to disentangle the effects of country and language borders from the “pure” distance effect, we will test the following hypothesis:

H1: The probability of a link between two regions is in Europe more negatively related to spatial distance than in the USA even when taking country and language borders into account.

It follows from our considerations above that spatial as well as other forms of distance continue to play a role everywhere. There are some indications that the impeding effect of spatial distance is decreasing over time, at least in Europe, but probably also in the USA. However, the empirical literature does not provide clear evidence about trends in the relevance of geographical distance, although a decrease is slightly more often supported. Similarly, the empirical literature does not provide a clear statement about differences between the EU and other countries. The theoretical arguments are quite clear in this context. Given the efforts of the EU to support mobility and cohesion within the EU, the relevance of geographical distance, country borders and language barriers should decrease. We will test the following hypothesis:

H2: The negative relationship between spatial and cultural distance and the probability of a link between two regions has decreased more strongly in Europe than in the USA during the last decade.

The empirical studies that have been conducted so far do not find a change in the negative impact of technological distance over time. The theoretical arguments (see above) lead to ambiguous expectations. With respect to patents, we have two contradicting arguments: new technological developments require more and more the combination of distant technologies, while increasing specialisation makes collaborations with lower cognitive distance necessary. With respect to publications, the question is whether the increased support for interdisciplinary research counterbalances the increased specialisation. We assume that the increased interdisciplinarity outweighs the increased specialisation and state:

H3: The negative relationship of cognitive distance, measured as technological and scientific distance, with the probability of a link between two regions decreases with time. Differences between the USA and Europe are not expected.

3 Method

The purpose of this study is to estimate and to compare how spatial and other types of distance influence collaboration behaviour in Europe and the USA over time. As we are dealing with origin–destination flow data in the form of collaboration activities between different regions, it seems natural to employ a spatial interaction modelling perspective. Spatial interaction models have been widely used to explain different kinds of flows across geographical space [see, for example, Fischer (2002) and Sen and Smith (1995)]. In order to compare the effects of different distance types in Europe and the USA, we pool the data for Europe and the USA and estimate one model for the co-patent network and one model for the co-publication network. By implementing an encompassing panel model for Europe and the USA, we are able to compare the effects of different distance types in both areas. Furthermore, we are able to identify time effects in both areas.

In what follows we will specify the panel version of the spatial interaction model.

Following Fischer and Wang (2011), spatial interaction models rely on three types of factors that explain the mean interaction frequencies between origins i and destinations j in time period t: (1) the origin-specific factors, which characterise the origin i of the interaction; (2) the destination-specific factors, which describe the destination j of the interaction; and (3) the origin–destination measures, which characterise the spatial separation between an origin i and a destination j. The general form of the spatial interaction model is given by

$$\begin{aligned} y_{ijt}=a+x_{ijt} + {\varepsilon }_{ijt} \quad i, j=1, \ldots , n; t = 1, \ldots , T \end{aligned}$$
(1)

with

$$\begin{aligned} x_{ijt} = O_{it} D_{jt} S_{ijt} \quad i, j = 1, \ldots , n; t =1, \ldots , T \end{aligned}$$
(2)

where \(y_{ijt}\) is our dependent variable with observations \(\{ y_{ijt}: i,j=1,{\ldots },n; t=1, {\ldots },T\}\) on collaborations between region i and j in time period t within one of our networks. \({\varepsilon }_{ijt}\) refers to the disturbance term with the property \(E[{\varepsilon }_{ijt }{\vert }y_{ijt}] = 0\). As we estimate an encompassing model, the dependent variable represents all collaborations between European regions and between regions in the USA in a given network. The term \(x_{ijt}\) is the expected mean interaction frequency of flows from region i to region j in time period t. \(O_{it }\) and \(D_{jt}\) characterise the origin and destination factors in time period t, and \(S_{ijt}\) denotes the function of some measure of separation between region i and j in time period t. It is generally accepted that the origin and destination factors are best stated by power functions [see, for example, Fischer and Wang (2011)]. Thus, we define \(O_{it} =o_{it}^{\alpha _1 }\) and \(D_{jt} =d_{jt}^{\alpha _2 } \), where \(\alpha _{1}\) and \(\alpha _{2}\) are the statistical parameters to be estimated. The core of the spatial interaction model is the separation function, which is specified as

$$\begin{aligned} S_{ijt} =\exp \left[ {\sum _{k=1}^K {\beta _{k} s_{ijt}^{(k)} } }\right] \quad i, j = 1, \ldots , n; t = 1, \ldots , T \end{aligned}$$
(3)

where \(s_{ijt}^{(k)} \) are \(K (k = 1, \ldots , K)\) separation variables and \(\beta _{k}\) are parameters to be estimated. A detailed description of the separation variables used in this study is outlined in the next section.
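
As a reading aid only, substituting the power functions for \(O_{it}\) and \(D_{jt}\) and the separation function of Eq. (3) into Eq. (2) and taking logarithms gives the log-additive form below. It assumes \(o_{it} > 0\) and \(d_{jt} > 0\) and introduces no additional terms beyond those already defined above:

$$\begin{aligned} \log x_{ijt} = \alpha _1 \log o_{it} + \alpha _2 \log d_{jt} + \sum _{k=1}^K \beta _{k} s_{ijt}^{(k)} \quad i, j = 1, \ldots , n; t = 1, \ldots , T \end{aligned}$$

In this form each \(\beta _{k}\) can be read as the change in the log of the expected interaction frequency per unit change in the k-th separation variable, holding the origin and destination factors constant.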

As highlighted by other studies [see, for instance, Fischer et al. (2006) and Scherngell and Barber (2011)], the use of least squares regression requires origin–destination flows that are independent and log-normally distributed about the mean value. In our case, this assumption is violated due to the discrete nature of our dependent variable and the presence of zero flows. Consequently, the use of least squares regression would produce biased estimates. The usual approach to address this problem is the Poisson model specification. We therefore assume that our \(y_{ijt}\) is distributed as an independent Poisson variable. Furthermore, we follow Lata et al. (2015) and implement a panel Poisson version of the spatial interaction model, which is given by

(4)

where \(\gamma _{ij} \) denotes the unobserved individual specific effect (see Baltagi 2008).

The crucial advantage of a panel model specification is that it takes region-pair-specific heterogeneity into account. The two usual approaches for fitting panel data models are known as fixed effects regression and random effects regression. As noted by Greene (2012), the selection of the appropriate panel model is difficult, as all specifications have distinct shortcomings. A disadvantage of the fixed effects specification is that the within estimator wipes out all time-invariant variables (Baltagi 2008). However, time-invariant independent variables, such as geographical distance or technological distance, are the core of our interest with respect to the research question of the current study. A likelihood ratio (LR) test comparing the pooled model with the random effects model indicates that the random effects specification is more appropriate. Therefore, we use a random effects specification. In our case, the random term \(\gamma _{ij} \) is time invariant but varies across all (ij)-region-pairs and thus accounts for region-pair-specific effects. Furthermore, we follow Lata et al. (2015) and estimate a random effects negative binomial spatial interaction model in order to take overdispersion into account. When our dependent variable \(y_{ijt}\) is Poisson distributed with mean \(x_{ijt}\), as defined in Eq. (4), and \(\exp (\gamma _{ij} ) \sim \) Gamma, then our random effects negative binomial spatial interaction model is defined as:

$$\begin{aligned} \Pr \left( {y_{ij1} , \ldots ,y_{ijT} } \right) =\frac{\left( {\prod _{t=1}^T {x_{ijt} ^{y_{ijt} }} } \right) \varGamma \left( {\theta +\sum _{t=1}^T {y_{ijt} } } \right) }{\left( {\varGamma (\theta ) \prod _{t=1}^T {y_{ijt} !} } \right) \left[ {\left( {\sum _{t=1}^T {x_{ijt} } } \right) ^{\sum _{t=1}^T {y_{ijt} } }} \right] } Q_i \left( {1-Q_i } \right) ^{\sum _{t=1}^T {y_{ijt} } } \end{aligned}$$
(5)

with

$$\begin{aligned} Q_i =\frac{\theta }{\theta +\sum _{t=1}^T {x_{ijt} } } \end{aligned}$$
(6)

where \(\varGamma (\cdot )\) denotes the gamma function and \(\theta \) is the parameter of the Gamma distribution assumed for \(\exp (\gamma _{ij} )\).

Different studies have used spatial interaction models for modelling origin and destination flows in various regional settings [see, for instance, Sen and Smith (1995) and Fischer and Reismann (2002)]. In this context, numerous works have pointed to the problem of spatial dependence in interaction models, also called spatial autocorrelation of flows (Fischer and Griffith 2008; LeSage and Pace 2008; Scherngell and Lata 2013; Fichet de Clairfontaine et al. 2015). Spatial autocorrelation of flows occurs when collaboration flows are related to each other, e.g. when collaboration flows from region a to region b are related to collaboration flows from region a to region c [see, for example, Chun (2008)]. Spatial autocorrelation of flows can lead to incorrect inferences due to inconsistent standard errors and, thus, to misleading significance levels.

Thus, we follow Lata et al. (2015) and Scherngell and Lata (2013) by applying the spatial filter method to our panel model settings. This approach consists of introducing eigenvectors in order to filter out spatial autocorrelation in our models. However, as suggested by Griffith (2003), it is not appropriate to use the full set of extracted eigenvectors. We follow Griffith (2003) and extract a subset of eigenvectors on the basis of their Moran’s I values, keeping those with a value higher than 0.25. Furthermore, we adapt this set of eigenvectors to our spatial interaction model framework by using Kronecker products [see, for details, Fischer and Griffith (2008)]. At this point, we follow Lata et al. (2015) in order to construct the eigenvectors for our panel models. These final panel origin and destination filters are time invariant and cover the total number of space–time observations. We denote the origin and destination filters as \(E_{q}\) and \(E_{r}\), respectively. In the next step, we include the time-invariant and time-variant spatial filters as regressors in our panel models. Thus, our negative binomial panel interaction model is given by
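
The eigenvector selection and Kronecker step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code of the study: the function names `moran_eigenvectors` and `od_filters` are hypothetical, the spatial weights matrix `W` is assumed to be symmetric (it is symmetrised if not), the Moran's I of each eigenvector is computed as its eigenvalue scaled by n divided by the sum of all weights, and the ordering of region pairs in the stacked filters (origin index varying slowest) is an assumption.

```python
import numpy as np

def moran_eigenvectors(W, threshold=0.25):
    """Eigenvectors of M W M whose Moran's I exceeds `threshold`.

    W is an n-by-n spatial weights matrix (assumed symmetric; symmetrised here).
    M = I - 11'/n is the centring projection matrix.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    W = (W + W.T) / 2.0
    M = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(M @ W @ M)
    morans_i = eigvals * n / W.sum()      # Moran's I of each eigenvector
    keep = morans_i > threshold
    return eigvecs[:, keep], morans_i[keep]

def od_filters(E):
    """Expand n-by-q region eigenvectors to n*n origin and destination filters.

    Row i*n + j of the result refers to the pair (origin i, destination j);
    this stacking order is an assumption made for the sketch.
    """
    n = E.shape[0]
    ones = np.ones((n, 1))
    origin = np.kron(E, ones)        # row i*n + j holds E[i]
    destination = np.kron(ones, E)   # row i*n + j holds E[j]
    return origin, destination
```

For a panel, the resulting time-invariant filters would simply be repeated for each of the T periods before entering the regression as additional columns.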

(7)

The coefficients to be estimated for the spatial filters are \(\psi _{q} \) and \(\varphi _{r} \).

With regard to our research question, namely how specific separation effects evolve over time, we introduce a time variable \(H_{t}\). We construct this variable by assigning a number to each year. In order to examine changes in the effects of our separation variables, we construct interaction terms between the time variable and the separation variables (see Wooldridge 2008). These terms show whether the effect of our separation variables changed during the observed time period. However, as mentioned above, we use an encompassing panel model for Europe and the USA. In order to distinguish the effects of our separation variables between Europe and the USA, we construct a dummy variable \(Z_{t}\). This variable takes a value of one if a region is located in Europe and zero if the region is located in the USA. By constructing interaction terms between this dummy variable and our separation variables, we are able to estimate the differences of the separation effects in Europe and the USA. In addition, we also build interaction terms between both variables (\(H_{t}\) and \(Z_{t}\)) and the separation variables, in order to detect differences in the time trends between Europe and the USA. Incorporating \(H_{t}\) and \(Z_{t}\) in (7) leads to our final spatially filtered panel version of the negative binomial spatial interaction model. The resulting model is given by
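
The construction of the interaction terms can be illustrated with a short sketch. The function name `add_interactions` and the coding of the time variable as years since 1999 (so that the first year equals zero) are assumptions made for illustration; the text only states that a number is assigned to each year.

```python
import numpy as np

def add_interactions(s, year, europe, base_year=1999):
    """Return the columns [s, s*H, s*Z, s*H*Z] for one separation variable.

    s      : values of the separation variable per observation
    year   : calendar year per observation
    europe : 1 if the region pair lies in Europe, 0 if in the USA
    """
    s = np.asarray(s, dtype=float)
    H = np.asarray(year, dtype=float) - base_year  # assumed coding: 0, 1, 2, ...
    Z = np.asarray(europe, dtype=float)            # Europe dummy
    return np.column_stack([s, s * H, s * Z, s * H * Z])
```

With this coding, the coefficient on `s` is the effect in the USA in the base year, the coefficient on `s*H` is the yearly change in the USA, the coefficient on `s*Z` is the European difference in the base year, and the coefficient on `s*H*Z` is the difference in the yearly change between Europe and the USA.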

(8)

4 Data and variables

In order to analyse different collaboration networks across Europe and the USA, two datasets are used. We use the Web of Science database (WoS) to construct the co-publication dataset and the OECD REGPAT database (January 2013 edition) to build the co-patenting dataset. We extract the two datasets for the time period 1999–2009. The WoS database is a bibliographical database which also contains information on the institutional addresses of authors. We use this information to construct our regional setting of the co-publications. We use all given addresses of authors and assign them to the regions (we succeed in around 98% of the cases). Authors with no address given and addresses that could not be assigned are not considered further. For our co-publication dataset, a link is given for each pair of authors that are on the same publication with an assignable address. For patents, we use the inventor addresses in order to trace the origin of the invention. The assignment of inventors to the regions (NUTS-3 and CBSAs) was done according to postal codes and, in ambiguous cases, supported by city/county name and state. For the co-patenting dataset, a link is given when a patent comprises two or more inventors.

If a patent or publication contains inventors or authors from more than two regions, we assign a link to each region-pair that is involved. Thus, a link is defined as a collaborative activity between two inventors or two authors. For example, we count three links for a patent when three different inventors a, b and c are located in three different regions (from a to b, from b to c and from a to c) [see Scherngell and Lata (2013)]. \(y_{ijt}\) is the sum of these links for each combination of two regions and each year. We consider only links within one of the two studied economic areas, i.e. all US–Europe collaborations as well as all collaborations with actors outside the USA and Europe are excluded.
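
The counting rule can be sketched in a few lines. The function name `count_links` and the input layout (one list of region codes per patent or publication, one entry per inventor or author) are hypothetical. The sketch assumes that region pairs are treated as unordered and that pairs of inventors in the same region are kept as intraregional links.

```python
from collections import Counter
from itertools import combinations

def count_links(documents):
    """Count links per unordered region pair.

    documents: iterable of lists; each list holds one region code per
    inventor (or author) of a single patent (or publication).
    """
    links = Counter()
    for regions in documents:
        # every pair of inventors/authors on the same document is one link
        for r1, r2 in combinations(regions, 2):
            links[tuple(sorted((r1, r2)))] += 1
    return links
```

Applying the function separately to the documents of each year would yield the yearly counts that enter \(y_{ijt}\); a patent with inventors in regions a, b and c contributes one link to each of the pairs (a, b), (a, c) and (b, c).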

4.1 Regions and regional collaboration networks

Interregional collaboration studies are often based on the rather large NUTS-2 level. The paper at hand takes a more detailed approach and investigates interregional collaboration on the NUTS-3 level (1260 regions) in Europe. Using fine-grained spatial units has at least two advantages: (1) most large cities do not share a border with the next large city, so that neighbourhood effects are not confounded with collaborations between large cities, and (2) using the centres of the regions for calculating distances between regions results in more accurate distances. To the authors’ knowledge, there is no study analysing interregional collaboration in the USA and there is no equivalent to the NUTS classification in the USA except the OECD territorial level 3, which is comparable in size with the NUTS2 level. Most similar in size to NUTS3 are the 925 Core Based Statistical Areas (CBSAs, without overseas islands). They contain so-called metropolitan and micropolitan areas, i.e. urban agglomerations of more than 10,000 inhabitants defined by the US Office of Management and Budget. By definition, this classification excludes some rural areas (accounting for approximately 13% of the population). In fact, we expect that regarding patent inventors, the CBSAs cover a share that is even larger than that of the population, because innovative firms are rarely located in the countryside. The average region in the USA in 2010 had 316,000 inhabitants (sd: 1,062,000), and the average NUTS3 region had 382,026 inhabitants (sd: 463,503). Thus, the regions are of comparable size. In order to construct our collaboration networks on a regional level, we extract n-by-n collaboration matrices for each time period t and for each dataset, by summing up the links in time period t for each region. This leads to the observed number of scientific collaborations \(y_{ijt}\) between two regions i and j in time period t and for each collaboration network, i.e. the co-patent and co-publication collaboration network in the USA and Europe.

4.2 Descriptive statistics

Tables 1 and 2 present some descriptive statistics on observed collaborations for the co-patent and co-publication networks in the USA and Europe. The columns show the statistics for the USA and Europe for the years 1999 and 2009. The results can be summarised as follows. First, we observe that all networks become denser between 1999 and 2009. Second, in both economic areas the number of links differs between the co-patent and co-publication networks, confirming a higher cross-regional collaboration activity in the co-publication network than in the co-patent network. Third, comparing the share of interregional and intraregional links, we find differences in intra- and interregional collaboration activities in the co-patent networks between Europe and the USA. The results show a lower spatial concentration of co-patent activities in Europe.

Table 1 Descriptive statistics of scientific collaboration in co-patent networks (USA/Europe)
Table 2 Descriptive statistics of scientific collaboration in co-publication networks (USA/Europe)

In the USA, the collaboration activities in the co-patent and the co-publication network tend to be dominated by the northeast regions with the large urban areas New York, Philadelphia, Boston, and Baltimore and the west regions with the agglomerations of Seattle, San Francisco, and Los Angeles (see Figs. 1, 2). In Europe, the regional collaboration activities in the co-patent network show a spatial concentration in regions that belong to the industrial core of Europe (Brunet 2002). The collaboration activity in the co-publication network seems to be governed by the urban areas of London, Paris, northern Italy, and Barcelona.

The spatial structure of the co-patent network and the co-publication network differs in both observed areas. However, the spatial differences seem to be larger in Europe than in the USA. Furthermore, in both areas the share of intraregional collaboration activity is higher in the co-patent network than in the co-publication network, pointing to an (expected) higher propensity of spatial clustering of collaborative activity in the co-patent networks.

Fig. 1

Spatial distributions of cross-region scientific collaborations for the year 2009 in the USA within (a) the co-publication network and (b) the co-patent network. In each network, each node corresponds to one region. The sizes of the nodes are proportional to the number of regional participations. The size of a node therefore represents the total number of collaborative activities in co-publications or co-patents of a region. It is not the total number of publications or patents of this region. The strength of the lines corresponds to the number of joint collaborations between two regions

Fig. 2

Spatial distributions of cross-region scientific collaborations for the year 2009 in Europe within (a) the co-publication network and (b) the co-patent network. In each network, each node corresponds to one region. The size of a node is proportional to the number of regional participations. It therefore represents the total number of collaborative activities of a region in co-publications or co-patents, not the total number of publications or patents of this region. The thickness of the lines corresponds to the number of joint collaborations between two regions

4.3 The independent variables

Since the seminal paper on types of proximity by Boschma (2005), and in some cases even before, many studies have included more than one type of proximity in models explaining collaboration or knowledge flows. The types of distance were selected and operationalised depending on the object of research and data availability. In the paper at hand, a number of measures are used as separation variables that characterise the separation between regions. In what follows, we specify these separation variables, which are the focus of interest in the context of our research questions.

The first separation variable is the spatial distance between the centres of two regions, calculated as the great circle distance. The second separation variable is a non-neighbouring dummy variable. If the population is spread out within a region, the distance between some parts of the population in neighbouring regions might be very small and not adequately reflected by the distance between the regions’ centres. This might facilitate collaboration between neighbouring regions. While the areas used for the USA are in most cases defined around a dominating city, the areas in Europe are defined more on historical grounds. Thus, the spatial aspect within the non-neighbouring dummy should be stronger in Europe. There are different methods to choose the neighbour criterion [see, for example, Bivand et al. (2013)]. Neighbouring objects can be defined, for example, using contiguity neighbours, higher-order neighbours, or grid neighbours. We use a distance-based method and choose the k-nearest neighbour criterion (with \(k=5\)) in order to assign each region in Europe and the USA the same number of neighbours. Thus, our variable takes the value of zero if region j is one of the five nearest neighbours of region i, and one otherwise.
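The first two separation variables can be illustrated with a minimal sketch. It assumes that region centres are given as latitude/longitude pairs in degrees and uses the haversine formula as one common way to compute a great circle distance; the function names, the Earth radius constant, and the toy input format are illustrative assumptions and are not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def non_neighbour_dummy(centres, k=5):
    """Map each ordered pair (i, j) to 0 if j is one of the k nearest
    neighbours of i, and to 1 otherwise.

    `centres` is a hypothetical dict: region id -> (lat, lon) in degrees.
    """
    dummy = {}
    for i, (lat_i, lon_i) in centres.items():
        # Sort all other regions by their distance to region i.
        others = sorted(
            (j for j in centres if j != i),
            key=lambda j: great_circle_km(lat_i, lon_i, *centres[j]),
        )
        nearest = set(others[:k])
        for j in others:
            dummy[(i, j)] = 0 if j in nearest else 1
    return dummy
```

Note that a k-nearest neighbour relation is not necessarily symmetric: region j can be among the five nearest neighbours of region i without the reverse being true, so the dummy is defined on ordered pairs in this sketch.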

The third separation variable captures technological distance between two regions. Technological distance is measured by \(1-r^{2}\), with \(r^{2}\) being the uncentered correlation between the patent class distributions (aggregated to 121 classes) of the patents applied for in the two regions [see Hoekman et al. (2010) and Moreno et al. (2005)]. In a similar way, the fourth separation variable, “knowledge distance”, is defined on the basis of publication data. To this end, the science classification of the WoS is used and aggregated to 68 classes (reflecting the classification of university activities used by the German Statistical Office). Again, \(1-r^{2}\), with \(r^{2}\) being the uncentered correlation between the publication class distributions of the regions, is used as the measure of knowledge distance. In this way, we obtain two measures that express the similarity between the activities in two regions in a comparable manner: the technological distance based on patent activity and the knowledge distance based on publication activity.
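The two cognitive distance measures share one construction, which the following sketch illustrates for a pair of class-count vectors. It assumes that \(r\) denotes the uncentered correlation (cosine similarity) of the two class distributions and that the distance is one minus its square; if \(r^{2}\) is instead meant to denote the uncentered correlation itself, the `squared=False` option applies. The function names and the toy vectors are illustrative assumptions.

```python
import math


def uncentered_correlation(p, q):
    """Uncentered correlation (cosine similarity) of two class-count vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("a region without any patents/publications has no profile")
    return dot / norm


def cognitive_distance(p, q, squared=True):
    """Distance 1 - r^2 (or 1 - r if squared=False) between two regional profiles.

    The squaring is an assumption about how the notation should be read.
    """
    r = uncentered_correlation(p, q)
    return 1 - (r * r if squared else r)


# Toy profiles over three classes (the study aggregates to 121 patent
# classes and 68 publication classes; these numbers are made up).
same_mix = cognitive_distance([10, 20, 30], [1, 2, 3])   # same shares, close to 0
disjoint = cognitive_distance([5, 0, 0], [0, 7, 0])      # no overlap, equal to 1
```

Because the measure depends only on the direction of the class vectors, two regions with the same class shares but very different output volumes have a distance of (numerically) zero.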

The fifth separation measure is a country border variable, which indicates whether there is a country border between two regions. In Europe, this variable takes a value of zero if two regions are located in the same country and one otherwise. For all regions in the USA, this variable is zero, as country borders do not exist. The sixth separation variable is a language border dummy, which for European region pairs equals one if the two regions belong to different language areas. The language assigned to a region is the one spoken as first language by the majority of the population. For regions in the USA, this variable always takes a value of zero. Furthermore, we include origin and destination variables, as is common in gravity models. We use population data as origin and destination variables (instead of the amount of patenting/publishing in the regions) in order to be able to use the same data for all datasets. Since the correlation between population and patenting/publishing is high, this is a legitimate standardisation. The US data are from the 2010 Census Gazetteer by the US Census Bureau. The European data are available from Eurostat and refer to 2012.

Moreover, as mentioned above, we construct a dummy variable that takes a value of one if the regions are located in Europe and zero if they are located in the USA. By constructing interaction terms between this dummy variable and our separation variables, we are able to estimate the differences between the separation effects in Europe and the USA. We denote this dummy variable “Europe” and build the following four interaction terms: first, between geographical distance and Europe; second, between non-neighbouring region and Europe; third, between technological distance and Europe; and fourth, between knowledge distance and Europe. Furthermore, we include a time variable in our model framework and label this variable “time”. We construct the following six interaction terms: first, between geographical distance and time; second, between non-neighbouring region and time; third, between technological distance and time; fourth, between knowledge distance and time; fifth, between country border and time; and sixth, between language border and time. By estimating this set of interaction terms, we are able to determine how the effects of the separation variables change over time. However, in order to differentiate the time effects between Europe and the USA, we build additional interaction terms in which the separation variables interact with both the dummy variable “Europe” and the time variable. We construct the following four interaction terms: first, between geographical distance, Europe and time; second, between non-neighbouring region, Europe and time; third, between technological distance, Europe and time; and fourth, between knowledge distance, Europe and time.
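The set of interaction terms described above can be sketched as a small helper that expands one region pair into a row of regressors. All variable names are hypothetical, the coding of the time variable (for example, years since the first observation year) is an assumption, and the origin and destination variables are left out for brevity; only the counts of four, six, and four interaction terms follow the text.

```python
# Hypothetical names for the six separation variables.
SEPARATION = [
    "geo_dist", "non_neighbour", "tech_dist",
    "know_dist", "country_border", "language_border",
]
# The border variables are zero for all US pairs, so only the first four
# separation variables are interacted with the Europe dummy.
EUROPE_INTERACTED = SEPARATION[:4]


def build_regressors(pair, europe, time):
    """Expand the separation variables of one region pair into main effects
    and two-way and three-way interaction terms.

    pair   -- dict with one value per name in SEPARATION
    europe -- 1 if both regions are in Europe, 0 if they are in the USA
    time   -- numeric time variable (its coding is an assumption here)
    """
    row = {name: pair[name] for name in SEPARATION}
    row["europe"] = europe
    row["time"] = time
    for name in EUROPE_INTERACTED:            # four Europe interactions
        row[f"{name}:europe"] = pair[name] * europe
    for name in SEPARATION:                   # six time interactions
        row[f"{name}:time"] = pair[name] * time
    for name in EUROPE_INTERACTED:            # four Europe-by-time interactions
        row[f"{name}:europe:time"] = pair[name] * europe * time
    return row


example_pair = {
    "geo_dist": 250.0, "non_neighbour": 1, "tech_dist": 0.4,
    "know_dist": 0.2, "country_border": 0, "language_border": 0,
}
us_row = build_regressors(example_pair, europe=0, time=10)
eu_row = build_regressors(example_pair, europe=1, time=10)
```

For a US pair, every term involving the Europe dummy is zero, which is what makes the US coefficients the baseline.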

5 Results and discussion

As explained in the previous section, we take the US data for each distance variable as the baseline. An interaction term of each distance with the dummy for Europe results in coefficients that show the differences between the two economic areas. That is, the sum of the baseline coefficient and the European coefficient gives the overall value for Europe.
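This reading of the coefficients can be written compactly. The symbols are introduced here for illustration only and do not appear in the estimation tables: for a separation variable \(d\), let \(\beta_{d}\) be the baseline (US) coefficient, \(\delta_{d}\) the coefficient of its interaction with Europe, \(\gamma_{d}\) the coefficient of its interaction with time, and \(\theta_{d}\) the coefficient of its interaction with Europe and time. Assuming that time enters linearly as \(t\), the implied effects are

\[
\beta_{d}^{\mathrm{US}}(t) = \beta_{d} + \gamma_{d}\,t, \qquad
\beta_{d}^{\mathrm{EU}}(t) = (\beta_{d} + \delta_{d}) + (\gamma_{d} + \theta_{d})\,t .
\]

A European effect close to zero therefore requires \(\delta_{d} \approx -\beta_{d}\), and a European time trend that differs from the US trend shows up in \(\theta_{d}\). For the country and language border variables, which are zero for all US pairs, only the main coefficient and the time interaction are estimated.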

5.1 Publications

5.1.1 Comparison US–Europe

Table 3 shows, as expected, that geographical distance reduces the likelihood of collaboration, as do country and language borders. The impeding effect of spatial distance is not significantly stronger in Europe than in the USA. This result contradicts Hypothesis 1. While not being neighbours reduces the likelihood of collaboration further in the USA, the effect hardly exists in Europe (recall that the coefficient for the USA is the baseline, so that the effect in Europe is given by the sum of the coefficients, which is almost zero). Hence, collaboration of actors in neighbouring regions (without language and country borders between them) is similarly likely in Europe as in the USA. However, while in the USA this likelihood decreases significantly if the two regions are not neighbours, no such neighbouring effect exists in Europe; instead, the likelihood decreases much more strongly if a language or country border lies between the two regions. The pure distance effect is similar in both areas.

Considering cognitive distances, we find that knowledge distance plays a far larger role than technological distance for co-publication activities. This is no surprise, since knowledge distance is calculated from publication data. Comparing the USA and Europe, both knowledge and technological distance have the larger impeding effect in Europe.

Table 3 Estimation results (USA and Europe 1999–2009)

Hence, differences in the scientific specialisation between regions prevent collaboration between these regions more in Europe than in the USA. This contradicts Hypothesis H3. However, above we argued that the relevance of cognitive distance in scientific collaborations might be influenced by a strong structuring of university research and journal rankings into scientific subjects. The results might imply that these structural bindings are stronger in Europe than in the USA.

5.2 Time trend

Geographical distance slightly loses importance over time in the USA. In Europe, the loss in importance is even a bit stronger, hinting at a European integration effect. This confirms the first part of Hypothesis H2. The impeding effect of not being neighbours also becomes weaker over time, in Europe more than in the USA. This supports H2. However, at the same time, the importance of language borders has become stronger in Europe, while the effect of country borders did not change. Overall, the language effect does not completely offset the positive development of spatial distance and non-neighbourhood. This finding points to some developments that could be attributed to the EU and its integration endeavours.

Regarding knowledge distance, we find a negative overall trend and a positive trend for Europe compared with the USA, leading to an overall slightly positive trend in Europe. Hence, H3 holds for Europe but not for the USA. Europe probably shows a trend towards more interdisciplinary research, while in the USA the trend goes rather towards specialisation. The influence of technological distance becomes slightly weaker over time, with no difference between the two economic areas.

5.3 Patents

5.3.1 Comparison US–Europe

As in the case of publications, we obtain significant impacts for all distance measures. Even when controlling for language and border effects, the negative impact of spatial distance is significantly stronger in Europe. Links between distant regions in Europe are less likely. However, being neighbours again matters more in the USA than in Europe. This counterbalances the geographical effect for regions that are not too far away from each other. For regions that are less than 300 km apart and not neighbours, the probability of scientific collaboration is even larger in Europe than in the USA. However, this only holds as long as the regions are not separated by a language or country border. Hence, Hypothesis H1 is only partly confirmed for patents. Actors in Europe interact more often with other actors that are not too far away, as long as they belong to the same country and speak the same language. Whether regions are neighbours is less of an issue in Europe. Actors in the USA are more likely than those in Europe to interact with others in neighbouring regions and in faraway regions.

Considering cognitive distance, technological distance clearly plays a stronger role than knowledge distance. This is not surprising because technological distance is calculated on the basis of patent data. For both knowledge and technological distance, the impeding effect is weaker in Europe. We can only speculate on the reasons for that. One possible explanation is that economic activity is more diverse within Europe, offering more options for interdisciplinary collaboration. Another possible explanation is a potential impact of EU policy, which fosters interdisciplinary collaboration also in the economy. Hence, the second part of Hypothesis H3 is not confirmed.

5.3.2 Time trend

In contrast to the publication dataset, the patent dataset shows no decreasing barrier of geographical distance in the USA, but a slightly decreasing effect in Europe. In both areas, the effect of not being neighbours weakens over time. However, in total we see little change over time in the likelihood of links between distant regions (see Table 3). In comparison with the publication networks, the relevance of distance is much stronger in the case of patents, and there is no observable time trend. The impeding effect of country borders is slightly decreasing, while the effect of language borders is increasing. Patenting continues to be a space-sensitive activity with only weak positive developments. Again, the development in Europe shows somewhat more of a tendency towards a decreasing relevance of distance, with the exception of the language barriers (as was the case for publications). To what extent the European development can be traced back to European integration policy has to be investigated in future studies.

There is no trend in the effect of technological distance. Most likely, increasing specialisation and increasing interdisciplinarity counterbalance each other. Regarding the effect of knowledge distance, the two economic areas converge: Europe starts from a higher level of links between regions with different knowledge bases but shows decreasing likelihoods, while for the USA it is the other way round. Hypothesis 3 is not supported, because there are differences between the two areas and there is no clear time trend. This hints at the difficulty of overcoming cognitive distance.

5.4 Robustness check

The inventor and author team sizes of the co-patent and co-publication networks differ between the two areas under observation. Since we use every possible combination of two inventors or two authors as a link in our analyses, this could have an influence on the results. Therefore, as a robustness check, we also applied our analyses to the reduced set of all two-inventor patents and all two-author publications. A descriptive analysis shows that two inventors and two authors are the most common type of team, so the dataset for the models is still quite large. Regarding the two-inventor co-patent network, the results show that the coefficients do not change sign and significance, with a few exceptions (Table 4). The most important change for our analysis is that all coefficients related to distance, non-neighbouring, and cognitive distance are larger, meaning that distance plays a stronger role for two-inventor teams. Furthermore, for the USA we now find a significant time trend of increasing relevance of distance. The relevance of language and borders remains nearly the same, although the decreasing relevance of country borders is no longer significant. The other changes are of minor interest to our study.

Regarding the two-author co-publication network, more coefficients change significantly. First of all, most coefficients related to the importance of distance, neighbouring, cognitive distance and country borders again became larger. Hence, distance again plays a stronger role in two-author publications. In this context, a number of coefficients also became significant, supporting the arguments made before: this holds especially for the difference in the relevance of geographical distance between Europe and the USA. Beyond this, three coefficients are significant in both analyses and change their sign. Technological distance is now significantly less important in Europe, which is of minor interest. More relevant is that we now find a significantly increasing relevance of geographical distance, which is in line with the corresponding finding for patents. Hence, the trend in the relevance of geographical distance is unclear. Considering all cooperations, we find mainly (except for patents in the USA) a decreasing relevance, while in the case of two-actor links we find mainly (except for patents in Europe) an increasing relevance. A final difference is that knowledge distance decreases in relevance in both the USA and Europe in our robustness check. Hence, the above interpretation that science in Europe is becoming more interdisciplinary might also hold for the USA.
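The link construction behind this robustness check can be sketched as follows. The sketch assumes that each patent or publication is represented by the list of its team members' regions and that links are undirected; the function name, the input format, and the treatment of two members from the same region as an intraregional link are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def count_links(teams, team_size=None):
    """Count region-to-region links over all pairs of team members.

    teams     -- iterable of lists, each holding the region of every
                 inventor/author of one patent/publication
    team_size -- if given, only teams of exactly this size are used
                 (team_size=2 corresponds to the reduced dataset)
    """
    links = Counter()
    for regions in teams:
        if team_size is not None and len(regions) != team_size:
            continue
        # Every pair of team members contributes one link.
        for a, b in combinations(regions, 2):
            links[tuple(sorted((a, b)))] += 1
    return links


# Toy data: one three-person team and one two-person team.
toy_teams = [["NY", "NY", "BOS"], ["NY", "BOS"]]
all_links = count_links(toy_teams)
two_person_links = count_links(toy_teams, team_size=2)
```

A team of \(n\) members contributes \(n(n-1)/2\) links, so large teams weigh heavily in the full dataset; restricting the data to two-person teams removes this weighting, which is the motivation for the check.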

Table 4 Coefficients for the robustness check (model with two-inventor patents only)

Table 5 sums up our findings. For patent data, the barriers are higher in the USA, except for geographical distance. The effects for publication data do not show such clear differences.

For most variables we find a decreasing importance of distance. This holds especially for the effects of non-neighbouring and partly for the cognitive distances. For geographical distance, which is the most discussed in the literature, we find unclear results (see robustness check). Only for co-patent activity in Europe do we find a clear decrease in the relevance of geographical distance. Interestingly, language barriers are the only factor with an increasing relevance over time. Overall, we find somewhat more trends towards a higher probability of collaboration over distance in Europe than in the USA. This is a first hint that there are indeed scientific integration developments in Europe that are less present in the USA. Whether this is due to EU policies has to be investigated in further studies.

Table 5 Summary of the findings

6 Conclusion

The paper at hand compares the impeding effects of four types of distance on collaboration behaviour in the USA and Europe. Three features distinguish the study from others in the field. Firstly, we compare two very important economic areas. Secondly, we use fine-grained spatial units in order to have rather homogeneous units. Lastly, we analyse a period of eleven years for a dynamic view of the different types of distance. We find that space continues to be a barrier for collaboration even though the inhibiting effect has become weaker in co-patenting in Europe, while for other activities the results are not robust. We found that group size is an issue here that has to be further analysed in future studies.

The comparison between Europe and the USA shows that for publications the impact of spatial distance is similar in both areas. However, non-neighbourhood of regions has a negative impact in the USA, while language and country borders matter in Europe. In the case of patents, cooperation over short and medium distances (below 300 km) between regions that are not neighbours is more frequent in Europe. However, this higher likelihood of links in Europe compared with the USA turns into a lower one for large distances and if country or language borders are crossed.

Furthermore, we find clear differences in the relevance of cognitive distance. While technological as well as knowledge distance are greater barriers in the USA for co-patents, it is rather the other way round for publications. In sum, some positive trends are visible, especially in Europe, which might be the result of EU policies. More detailed research would be necessary for analysing causal effects in this context, which is beyond the scope of the paper at hand.

Another interesting topic would be to investigate technology-specific differences in the relevance and development of the effect of different types of distance. Running the models for a set of different industries or technologies will thus be the next step in a follow-on study.

Certainly, the investigation above has some limitations. A time period of eleven years may be too short to expose strong trends. Furthermore, we had to exclude data from teams whose members are not all located in one of the two economic areas under observation. Moreover, the estimation of a spatially filtered panel version of the negative binomial spatial interaction model with pseudo-maximum likelihood is a subject for future research [see Krisztin and Fischer (2015)]. Nevertheless, the study is the first one to compare Europe and the USA with such large datasets and over several years. Further studies of this kind would be helpful to corroborate our results.