Introduction

Given the context of the “knowledge economy”, policy makers have expressed increasing interest in influencing the intensity of production of new knowledge, and potentially, the speed and breadth of its diffusion. For this reason, researchers have focused on the ways in which knowledge flows are achieved, the intensity with which they manifest themselves and the factors that condition them. One of the issues for specific attention is the importance of the geographic factor in the creation and dissemination of knowledge. The geographic barriers bounding knowledge flows (Audretsch and Lehmann 2005; Audretsch and Keilbach 2007) have been lowered by the rapid development of information technologies, contributing significantly to savings in the costs and time of knowledge diffusion (Ding et al. 2010).

Nevertheless, geographic distance still seems to be relevant in determining the structure of knowledge flows between territories. In their seminal article, Jaffe et al. (1993) found that citations to domestic patents are more likely to be domestic, and more likely to come from the same state and municipality. Twenty years later, Belenzon and Schankerman (2013) confirmed these early results, using patent citations both to university patents and scientific publications.

In shifting from the flows associated with patent citations to those of knowledge encoded in the scientific literature, one could expect the geographic factor to be less important: citations between articles, since they refer to the public content of research findings, would in theory be “placeless”, i.e. not influenced by the geographic location of the cited and citing authors (Livingstone 2010). However, a series of empirical studies have demonstrated that a geographic proximity effect also exists in citations of scientific literature (Matthiessen Wichmann et al. 2002; Börner et al. 2006; Ahlgren et al. 2013). Pan et al. (2012) showed that the citation flows between cities, as well as the collaboration strengths, decrease with the distance between them, following a gravity law.

Unlike most previous studies on the topic, which were mainly limited to one or a few research fields, Abramo et al. (2020) analyzed the influence of geographic distance on knowledge flows related to the entire Italian scientific production in the period 2010–2012. Applying a gravity model, estimated using ordinary least squares (OLS), the authors show that geographic distance is an influential factor in knowledge flows between regions of the same country, and that it is not negligible in “continental” flows (from Italy to European countries), but irrelevant in intercontinental flows (from Italy to non-European countries).

Frenken et al. (2009) offer a relevant methodological contribution, proposing an analytical framework able to distinguish between physical and other forms of “proximity”, for instance “social proximity”, as determinants of scientific interaction. When controlling for such forms of proximity, physical distance seems to be reduced in importance. Yan and Sugimoto (2011) observed that the steady introduction of online databases has weakened the effect of physical distance, so that citations are now more closely dependent on the intensity of collaboration. Recently, Wuestman et al. (2019) claimed that self-citations are an important driver of “geographic bias”. Moreover, once “cognitive relatedness” (measured by the number of references shared by two publications) is accounted for, the effect of distance between citing and cited publications is weak. The authors warn about the generalizability of their findings due to the sector and time specific nature of their analysis. Also, Head et al. (2019) conclude that the negative impact of geographic distance on citations is “mediated” by “social relatedness”. They studied how geographic distance and social ties (co-authorship, past co-location, and relationships mediated by advisors and the alma mater) affect citation patterns in mathematics, observing that when controlling for ties, the negative impact of geographic distance on citations is generally halved. The authors hypothesize that spatial proximity facilitates the creation of interpersonal links that in turn favor knowledge flows.

The contributions proposed so far in the literature are based on observations of individual fields, or at an aggregate level without distinction between fields. Therefore, we do not know if and to what extent the geographic proximity effect varies across fields. With the current work, our intention is to address this gap.

A number of reasons may explain why the role of geographic proximity might differ across research fields. Citation behavior of authors differs across research fields (Hurt 1987; Vieira and Gomes 2010). Field-focused research organizations may be more or less geographically clustered, and large and numerous within clusters. Therefore, to the extent that places concentrate their research efforts on certain topics (Boschma et al. 2014), citations reflecting intellectual recognition will also be more geographically concentrated (Head et al. 2019). Hence, the geographical proximity effect in citations may, in principle, be fully explained by the geographical concentration of intellectually related knowledge. Furthermore, in certain fields research topics might be more territory specific, addressing local needs, and therefore with more localized spillovers. Finally, when included in the analysis, self-citations amplify the role of geographic proximity (Aksnes 2003), and it is known that self-citation rates vary across fields (Ioannidis et al. 2019).

To conduct our investigation, we analyze the world publications that cite, up to the close of 2017, the Italian publications of the period 2010–2012 indexed in the Clarivate Analytics Italian National Citation Report (I-NCR), extracted from the Web of Science (WoS) core collection. To each publication (citing and cited) we assign a prevalent territory of production, as well as the WoS subject category (SC) to which it belongs. The analysis of the effect of geographic distance on citation flows is carried out using gravity models estimated using OLS, for each SC (244 in all) and geographic context (national, continental and intercontinental).

The work is structured as follows. In the next section we present the data and methods of analysis; the third section provides the results of the analysis; the fourth section closes the work with a synthesis of the main results and the authors’ considerations on the implications of the study.

Methods

To test the influence of geographic distance on knowledge flows at SC and area level, we apply a gravitational model similar to that used by Ponds et al. (2007) for the study of scientific collaborations between different types of institutions. The model is based on two assumptions:

  • The flow of knowledge between any two territories can be measured through the citations made in the scientific production by the research centres in the first territory, to the scientific production by the research centres in the second (i.e. citations in the scientific literature of the “citing territory” to the scientific literature of the “cited territory”).

  • Citations between two territories increase with the amount of scientific production of both, and decrease with the distance between them.

We assign publications (cited or citing) to a territory following the criteria conceived by Abramo et al. (2020), to which we refer the reader for a thorough discussion:

  • For cited publications, we define a publication as “made in” a territory if the majority of its co-authors are affiliated to organizations located in that territory.

  • Unlike for the cited publications, for the citing publications the I-NCR reports only the address list, without the link to authors. We therefore define a citing publication as “made in” a territory if the majority of its addresses refer to that territory.

  • Publications with no prevalent territory are excluded from the analysis.
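The prevalence criterion above can be sketched in a few lines. This is only an illustration: the function name `prevalent_territory` is hypothetical, and the sketch assumes that “majority” means strictly more than half of the affiliations (for cited publications) or addresses (for citing publications); a plurality rule would be a different choice, and the source does not say which is used.

```python
from collections import Counter
from typing import Optional, Sequence


def prevalent_territory(territories: Sequence[str]) -> Optional[str]:
    """Return the territory that accounts for a strict majority of entries.

    `territories` holds one territory label per co-author affiliation
    (cited publications) or per address (citing publications).
    Returns None when no territory exceeds half of the entries, in which
    case the publication would be excluded from the analysis.
    """
    if not territories:
        return None
    territory, count = Counter(territories).most_common(1)[0]
    # Assumption: "majority" is read as strictly more than 50%.
    return territory if count > len(territories) / 2 else None


# Illustrative labels only, not data from the study.
print(prevalent_territory(["Milan", "Milan", "Turin"]))  # a prevalent LAU exists
print(prevalent_territory(["Milan", "Turin"]))           # tie: no prevalent LAU
```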

The analysis of knowledge flows will be carried out in three distinct geographic contexts:

  • the national one, in which the citing publications assigned to “Italy” are attributed to one and only one LAU (local administrative unit, i.e. municipality) of the Italian territory, always on the basis of the prevalence criterion;

  • the international one, where the citing publications are attributed to one and only one country on the basis of the prevalent NUTS0 code; here we also distinguish between the continental (Europe) and the extra-continental (extra-Europe) contexts.

We then measure the “distance” of each citation flow along the geodetic line that joins the prevalent Italian LAU of production of the cited publication with:

  • the citing Italian LAU, for national analysis,

  • the capital of the citing country, for international analysis.
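As a rough illustration of how such distances can be computed, the sketch below uses the haversine (great-circle) formula on a spherical Earth. This is an assumption made for simplicity: the source specifies only the geodetic line, and an ellipsoidal geodesic would give slightly different values. The function name and the coordinates for Milan and Turin are illustrative approximations, not values taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates, used only as an example.
milan = (45.4642, 9.1900)
turin = (45.0703, 7.6869)
print(f"Milan-Turin: {great_circle_km(*milan, *turin):.0f} km")
```

For the international analysis the second point would be the capital of the citing country rather than a citing LAU.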

In this work, we control for the cognitive proximity of the citing-cited publications. Previous studies measured the mass of the citing territory by the total number of publications made in that territory (Pan et al. 2012) or solely by the number of publications falling in the same field as the cited publication (Abramo et al. 2020). Here, we adopt a different method to measure the mass of the citing territory.

As usual, we measure the mass of the territory of the cited publication by the total number of publications of that territory falling in the same WoS subject category (SC).

Much more complex is the way we measure the mass of the territory of the citing publications. Citing publications may fall or not fall in the same SC as the cited publication. We first calculate the SC frequency distribution of all world publications citing all Italian publications within a certain SC. The mass of the territory of each citing publication is the weighted sum of that territory’s publications falling in the above identified SCs, whereby the weights correspond to their frequency distribution.

To exemplify, we consider all publications in the dataset falling in the SC Paleontology. Relevant world citing publications in the observed period fall in 93 different SCs: 45% in Paleontology, 18.9% in Geosciences, multidisciplinary, 9.4% in Geology, 8.6% in Geography, physical, and the remaining 18% are dispersed across the remaining 89 SCs. Let us assume that we want to measure the knowledge flows generated by the cited publications in Paleontology made in LAU Milan, to LAU Turin. The mass of Milan is measured by the 2010–2012 cited publications in Paleontology made in Milan (52 in all). The mass of Turin, instead, is measured by the weighted sum of the 2010–2017 publications made in Turin and falling in the above 93 SCs (189 in all).
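The weighting scheme can be made concrete with a short sketch. The function names and all counts below are hypothetical and chosen only to show the arithmetic; they are not the Paleontology figures of the study. The sketch assumes the weights are the relative frequencies of the citing publications across SCs, as described above.

```python
from typing import Dict


def citing_sc_weights(citing_counts: Dict[str, int]) -> Dict[str, float]:
    """Relative frequency of each SC among the world publications citing a given SC."""
    total = sum(citing_counts.values())
    return {sc: n / total for sc, n in citing_counts.items()}


def weighted_mass(territory_pubs_by_sc: Dict[str, int],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of a territory's publications over the citing SCs."""
    return sum(w * territory_pubs_by_sc.get(sc, 0) for sc, w in weights.items())


# Hypothetical counts of world publications citing the cited SC, by SC.
weights = citing_sc_weights({"Paleontology": 450, "Geology": 100, "Ecology": 450})

# Hypothetical 2010-2017 output of the citing territory, by SC.
citing_territory = {"Paleontology": 40, "Geology": 200, "Oncology": 900}

# Oncology receives no weight because it never cites the SC in this toy example.
print(weighted_mass(citing_territory, weights))
```

One consequence of this design is that a territory with a large output in unrelated fields does not gain mass, which is how the measure controls for cognitive proximity.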

The gravity model adopted for the national analysis in each SC is:

$$C_{ij} = k \cdot \frac{{M_{i}^{\alpha } M_{j}^{\beta } }}{{d_{ij}^{\gamma } }}$$
(1)

where \(C_{ij}\) is the number of citations to publications made in LAU i by the publications made in LAU j; k is a constant; \(M_{i}\) is the total number of publications made in LAU i in the 2010–2012 period; \(M_{j}\) is the weighted number of publications made in LAU j in the 2010–2017 period; and \(d_{ij}\) is the geodetic distance between cited LAU i and citing LAU j.

For the international analysis, the following distinctions apply:

  • \(C_{ij}\) indicates the number of citations to publications made in LAU i by the publications made in country j.

  • \(M_{j}\) refers to the prevalent country j, and is the weighted number of publications made in country j in the 2010–2017 period.

  • \(d_{ij}\) is the distance between cited LAU i and the capital of the citing country j.

Applying a logarithmic transformation to all variables of Eq. (1), we obtain:

$$\ln (C_{ij} ) = \ln \left( k \right) + \alpha \ln \left( {M_{i} } \right) + \beta \ln \left( {M_{j} } \right) - \gamma \ln \left( {d_{ij} } \right) + \varepsilon$$
(2)

The coefficients of a log–log model represent the elasticity of the Y dependent variable with respect to the X independent variable. For example, for the distance variable (\(d_{ij}\)) an elasticity of one (γ = 1) indicates that a 1% increase in the distance is associated with a 1% decrease in citations exchanged, on average.
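A minimal sketch of how Eq. (2) can be estimated by OLS is given below, using NumPy's least-squares solver on synthetic data. Everything here is an assumption made for illustration: the function name `fit_gravity_ols`, the simulated masses and distances, and the "true" parameter values are invented, and the sketch presumes strictly positive citation counts (the treatment of territory pairs with zero citations is not addressed here).

```python
import numpy as np


def fit_gravity_ols(citations, mass_cited, mass_citing, distance):
    """Estimate ln(k), alpha, beta and gamma of the log-linear gravity model by OLS."""
    y = np.log(np.asarray(citations, dtype=float))
    X = np.column_stack([
        np.ones_like(y),                                # intercept, ln(k)
        np.log(np.asarray(mass_cited, dtype=float)),    # ln(M_i)
        np.log(np.asarray(mass_citing, dtype=float)),   # ln(M_j)
        -np.log(np.asarray(distance, dtype=float)),     # -ln(d_ij), as in Eq. (2)
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(("ln_k", "alpha", "beta", "gamma"), coef))


# Synthetic territory pairs generated from Eq. (1) with multiplicative noise.
rng = np.random.default_rng(0)
n = 2000
m_i = rng.uniform(10, 500, n)
m_j = rng.uniform(10, 500, n)
d = rng.uniform(5, 1000, n)
c = 2.0 * m_i**0.4 * m_j**0.3 / d**0.5 * np.exp(rng.normal(0, 0.1, n))

estimates = fit_gravity_ols(c, m_i, m_j, d)
print({name: round(float(value), 3) for name, value in estimates.items()})
```

Because the distance regressor enters as \(-\ln(d_{ij})\), a positive estimate of γ in this sketch means that citations decay with distance. If the regressor is entered as \(+\ln(d_{ij})\) instead, the same decay shows up as a negative coefficient.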

For the 2010–2012 triennium the I-NCR dataset contains 255,399 Italian publications, 184,177 of which had received at least one citation up to the close of 2017. A total of 161,680 were assigned univocally to an Italian LAU, and had received 3,002,835 total citations from 1,800,037 citing publications. The overall dataset was broken down by SC (244 in all, according to the WoS classification schema) of the hosting journal. In turn, the SCs are grouped into OECD disciplinary areas (DAs, six in all) by applying a category-to-category mapping available on the InCites (Clarivate Analytics) portal.

Empirical evidence

The following sections illustrate the results of the analysis at two levels of aggregation: (1) by DA; (2) by SC. For each level, the analysis was carried out considering three geographical contexts: national, European and extra-European, depending on the location of the citing publications.

Disciplinary area level analysis

The national context

For the analysis of the national context, Table 1 shows the descriptive statistics of the variables used in the gravitational model estimated for each DA.

Table 1 Descriptive statistics for the variables of the gravitational model applied to the national context, by disciplinary area

In the period of observation, the mean citation flows between Italian municipalities vary greatly among the DAs considered, ranging from a minimum of 3.6 (Humanities) to a maximum of 41.5 (Natural sciences). Differences are mainly due to the peculiar characteristics of the DAs considered, such as the different intensity of publication and citation.

Focusing on the variable of interest dij, it can be observed that for all DAs the mean distance is always higher than the median, revealing a right skewed distribution. The mean distance ranges from a minimum of 320 km (Humanities) to 373 km (Natural sciences). The maximum distance of citation flows registered between two Italian municipalities ranges between 1022 km (Humanities) and 1119 km (Medical and health sciences). To contextualize these figures, it should be observed that the maximum geographic distance between two LAUs, from extreme southern to northern Italy, is 1271 km (Lampedusa, Vipiteno).

In summary, it is clear from the data observed that compared to the other DAs, in the Social sciences and Humanities the average distances of citation flows between national organisations are significantly smaller.

Table 2 shows the estimates of the coefficients of the gravitational model calculated by means of OLS. The \(R^2\) values are always lower than 0.6; the lowest values are recorded in Humanities (0.398) and in Social sciences (0.471).

Table 2 OLS regression outcome at the disciplinary area level for the national context

Our next focus is on the variable of interest dij. Having previously shown at an aggregate level that distance still matters in scholarly knowledge flows (Abramo et al. 2020), we now find that the same phenomenon is also present at a lower level of aggregation, but with different intensities. For all six DAs considered there is a clear effect of geographic proximity on knowledge flows, with values of γ all negative and statistically significant: a 1% increase in distance corresponds to a decrease in the citations exchanged of roughly 0.3–0.5%. In detail, the largest average reductions in citations exchanged are observed in Natural sciences (−0.505) and Engineering and technology (−0.497), followed by the closely grouped trio of Medical and health sciences, Social sciences and Agricultural sciences (−0.427, −0.409 and −0.392, respectively), and finally, at some distance, Humanities (−0.277).
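To give a feel for the size of these elasticities, the sketch below converts a reported distance coefficient into the expected change in citations when distance is scaled by a given factor, holding both masses fixed. It assumes the constant-elasticity form of the gravity model and takes the coefficient with the sign reported in the text, i.e. as the coefficient on ln(dij). The function name is hypothetical.

```python
def citation_multiplier(distance_ratio: float, distance_coefficient: float) -> float:
    """Factor by which expected citations change when distance is multiplied
    by `distance_ratio`, for a given coefficient on ln(distance)."""
    return distance_ratio ** distance_coefficient


# Coefficients reported for the national context.
reported = {"Natural sciences": -0.505, "Humanities": -0.277}

for area, coefficient in reported.items():
    doubling = citation_multiplier(2.0, coefficient)
    one_percent = citation_multiplier(1.01, coefficient)
    print(f"{area}: x{doubling:.3f} when distance doubles, "
          f"{(one_percent - 1) * 100:+.2f}% for a 1% increase")
```

Under these assumptions, doubling the distance between two municipalities would leave about 70% of the citation flow in Natural sciences and about 83% in Humanities.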

In the national context, the geographic proximity effect on citations between territories is therefore present, but differs across DAs: it is weaker in the Humanities and Social sciences, and stronger in the Sciences, in particular in Natural sciences and Engineering and technology. These data, corroborated by the descriptive statistics of Table 1, suggest the hypothesis that in Humanities and Social sciences geographic influence is limited to a small area, probably reflecting the national specificity of the research topics covered.

The same applies for the variables Mi and Mj, whose coefficients are all positive and statistically significant: for these, a percentage increase of 1% corresponds to an increase in citations exchanged that varies in the range 0.1–0.5%, with the minimums always in Humanities (0.114 for Mi, 0.145 for Mj) and in Social sciences (0.296 for Mi, 0.279 for Mj).

The international context

Table 3 shows the estimates of the coefficients at DA level for the analysis of the continental European (EUR) context. The \(R^2\) values are always > 0.5, except for Humanities (0.337). The effect of geographic proximity on knowledge flows is evident in all six DAs, with values of γ all negative and statistically significant. A 1% increase in distance corresponds to a decrease in citations of around 0.2%, with the smallest decreases recorded in Social sciences (−0.119), Humanities (−0.151) and Engineering and technology (−0.193). In contrast, the largest reductions in citations with increasing distance are observed in Natural sciences (−0.261), Agricultural sciences (−0.250) and Medical and health sciences (−0.215).

Table 3 OLS regression outcome at the disciplinary area level for the European context

As seen previously in the national case, the values of the coefficients of the variables Mi and Mj, all positive and statistically significant, are almost aligned; however, these always reveal Humanities as the DA with the lowest coefficients (0.265 for Mi, 0.341 for Mj).

Finally, we carry out the same analysis for the extra-EUR context (Table 4). The results show \(R^2\) varying in the range 0.4–0.7, with values higher than 0.6 for all DAs except Humanities (0.4).

Table 4 OLS regression outcome at the disciplinary area level for the extra-European context

For four out of the six DAs considered, a geographic effect on knowledge flows would seem to be attested by coefficients that are statistically significant, but positive in sign. For the remaining two DAs (Humanities and Medical and health sciences) the geographic proximity effect is not manifested, since the coefficients are not statistically significant. Taking this evidence as a whole, the geographic proximity effect appears to vanish beyond a “threshold distance”, meaning that the phenomenon would be confined to the national and continental scale.

We can hypothesize that as distances increase, the contact mediated by information and communications technologies prevails over purely “personal” relationships. In the four DAs where the values of the γ coefficient are statistically significant, these are in any case all lower than 0.12, in absolute value. Still, it remains to be understood why the γ coefficient of dij, although limited in value, has a positive sign for these four DAs in the intercontinental citation flows.

Analysis at the level of subject category

We can now replicate the analysis seen at the DA level, but at the SC level. This is a critical analysis, given that the DAs aggregate SCs, which in addition to varying contents, also have different characteristics in terms of publication intensity and citability.

Table 5 shows, as an example, the results for the 11 SCs belonging to the Agricultural sciences DA.

Table 5 OLS regression outcome for the subject categories of the Agricultural sciences disciplinary area

The results of the OLS regression show that the \(R^2\) values vary in the range 0.4–0.5 (national case), 0.2–0.5 (EUR), and 0.3–0.6 (extra-EUR). Food Science & Technology has the highest values in both EUR and extra-EUR contexts. In the analysis at national scale, the coefficients relating to distances are all statistically significant and negative. In the European context this is true for 7 SCs (with the exception of Agricultural Engineering, Agricultural Economics & Policy, Fisheries, Horticulture), and in the extra-European context for just two (Agriculture, dairy & animal science; Food Science & Technology).

We can conclude that the presence of the geographic proximity effect is also confirmed at the SC level, certainly for the citation flows on a national scale and to a lesser extent in the European context. In the extra-EUR case the geographic proximity effect is almost never significant, and is possibly confined to a limited number of SCs.

This is confirmed by the data of Table 6, which for each DA presents the descriptive statistics for the distribution of the γ coefficient for the variable dij. The maximum and minimum values thus refer to what is observed for the SCs of the given DA.

Table 6 Descriptive statistics for the distribution of the dij γ coefficient for the SCs of each disciplinary area

At national scale, the coefficients are all negative: geographic distance between municipalities has a negative impact on the citation flows in all SCs. By contrast, the “Max” column shows some positive values, i.e. the presence of at least one SC where geographic proximity reduces the citation flows, both in the continental and inter-continental analyses. At EUR scale, this is recorded in six SCs belonging to four different DAs: Engineering and technology (SC of Engineering, Chemical), Humanities (SCs of Art and History), Social Sciences (Criminology & penology and Ergonomics), Medical and Health Sciences (Nursing). On the extra-EUR scale, positive values are recorded in 26 SCs belonging to five different DAs: more precisely in nine SCs of Natural sciences, eight of Medical and health sciences, six of Engineering and technology, two of Social Sciences and one of Agricultural sciences. The “Min” column shows one case only (Humanities), on the EUR scale, with all positive values for each relevant SC.

Comparing columns 2 and 5 of Table 6, we observe that the minimum γ coefficients in the national context are always higher, in absolute value, than those recorded in the EUR context. We can conclude that the geographic bias, where significant, is always greater for national flows than for continental ones.

It is also interesting to observe the trend in the value of standard deviation within each DA, i.e. the dispersion of data around the average. In the national context, the highest value for dispersion of data is observed in Engineering and technology and Natural sciences, while the lowest corresponds to Humanities; in the other DAs the values are almost equal. In the EUR context, Social sciences presents the maximum value of standard deviation, Agricultural sciences the minimum. In the extra-EUR context, the highest and lowest values occur respectively in Social sciences, and Humanities.

Table 7 shows the counts concerning the distribution of the coefficient γ of the dij variable in the three territorial contexts analyzed. In the analysis of flows between national municipalities, in all 215 cases where the γ coefficient is significant, it is also negative; in the analysis of continental flows, the SCs with a significant γ coefficient drop to 125, and in six of these the sign is positive, indicating that distance encourages flows rather than limiting them. Finally, in the extra-EUR context, the number of SCs where the OLS model returns significant γ coefficients drops further, to 59, and in 26 of these the γ coefficient is positive.

Table 7 Counting data for dij γ coefficient by disciplinary area and subject category

Conclusions

The current work continues from and deepens a previous study by the authors (Abramo et al. 2020), concerning the influence of geographic distance on the knowledge flows from producers of new knowledge (articles cited) to the users (articles citing). In this case the specific aim is to study and compare knowledge diffusion across scientific fields, to determine whether and how the effects of geographic distance vary between SCs. The study is based on the same methodological assumptions as the previous paper: using a gravitational model estimated by ordinary least squares, we have now analyzed the effect of geographic proximity in each DA and SC, at three different geographic scales.

On a national scale, the results show that as the distance between territories increases, controlling for their mass, there is a decrease in the number of citations exchanged in all SCs investigated, a finding in line with previous literature.

However, the proximity effect is more evident in Natural sciences and Engineering and technology, and less in Humanities. The same occurs on a European scale, but with less noticeable decreases than observed at national scale: in relative terms, the reductions in knowledge flows with distance are more appreciable in the DAs of Natural sciences and Agricultural sciences, less in Social sciences and Humanities. In the analysis carried out at extra-European scale, the geographic effect seems to disappear; instead there is even a positive relationship between citations exchanged and distance, in Engineering and technology and in Natural sciences, while the inverse relationship holds only in Humanities and Agricultural sciences.

Therefore, what we previously observed at the aggregate level is confirmed at SC level, concerning the presence of a “threshold” effect beyond which the geographic distance effect disappears and the intensity of the citation flows becomes insensitive to the spatial factor. A plausible explanation is that at the intercontinental scale, ICT-mediated communications have been progressively replacing face-to-face contacts, and so the effects of geographic proximity have waned. However, such results could also be linked to the geographic context analyzed. This type of analysis is inevitably country specific, as is the very concept of “intercontinental”: each country has its own specific place in the world, and what is evident for the flows generated by Italian scientific production might not be so in other cases, for example New Zealand. It follows that analyses of this type must necessarily treat geographic scale as a fundamental factor. This would be true for Italy, in particular, whose scientific production generates extra-EUR citation flows that compose almost 50% of the total international ones.

This work also evidences that geographic bias tends to be differentiated between SCs, by virtue of their intrinsic characteristics. Humanities and Social sciences have a smaller area of influence, a smaller average range of citation flows; at the same time, the decay of citation flows with geographic distance is lower than in other DAs, particularly compared to the Sciences. This could be due to the peculiarity of the research topics addressed, more country specific for Humanities and Social sciences, and therefore with more localized spillovers, but also with lower citability of the works, as well as lower incidence of self-citations for these DAs compared to the SCs of Sciences. All these determinants can be the object of further study, continuing from the current work.

Still on the subject of future developments, from a methodological point of view it could be useful to work on the specification of the model, for example by integrating a series of latent variables associated with the so-called “social proximity factors” (links between mentors and students, belonging to the same scientific school within a field; asymmetry in citation processes in favour of papers published in prestigious journals, or by prestigious scientists, …). In fact citations reflect not only the attribution of scientific credit, but can also be dictated by reasons of a “social” nature, and this could generate a bias in favor of the geographic factor. Finally, it would certainly be interesting to include the time variable in the analytical model, with the aim of verifying the variation of the effect of geographic distance as a function of time, with the relevant implications concerning citation time windows.

Appendix

See Tables 8 and 9.

Table 8 Descriptive statistics for the European context analysis by disciplinary area
Table 9 Descriptive statistics for the extra-European context analysis by disciplinary area