Keywords

1 Introduction

Scientific research is usually carried out in teams that can be considered domestic, if all the researchers are from the same country, or international, if they belong to different countries. According to [1], the fourth age of research is driven by international collaboration, which may be motivated by two kinds of factors [2]: those related to the diffusion of scientific capacity and those related to the interconnectedness of researchers. That is, in the current global knowledge society [3], researchers tend to collaborate with colleagues from other countries in order to advance in their own fields.

Furthermore, papers produced by international teams tend to be more highly cited than papers produced by domestic teams [2, 4]. Indeed, it has recently been shown that researchers who develop their careers in different countries tend to be more cited than those who remain in the same country throughout their academic life [5].

The main aim of this contribution is therefore to determine whether the number of citations increases when researchers from different geographical areas collaborate. To answer this question, we carry out a bibliometric analysis [6,7,8] focused on the 9 public universities of the region of Andalusia (Spain). We compare the number of citations in the following scenarios: when publications are signed only by members of the same Andalusian university, when they are signed only by members of Andalusian universities, when they are signed only by members of Spanish institutions with at least one Andalusian university, and finally when they are signed by at least one member of an Andalusian university. For each scenario, the number of publications, the number of citations and the citations per publication are compared.

We used the Dimensions database to obtain the publications and their citation counts. We chose this database because it is freely available for academic purposes, it includes a large corpus (about 89 million publications and 4 billion references), and it provides a powerful API that enables advanced analytics through its own DSL (Domain Specific Language) query language from a programming language such as Python.

The rest of the paper is structured as follows: Sect. 2 explains the methodology used to obtain the dataset of publications, Sect. 3 discusses the results, and Sect. 4 presents the conclusions and future work.

2 Methodology

This section describes the steps followed to obtain the dataset of articles published by researchers from Andalusian universities. The data is sourced from Dimensions, an inter-linked research information system provided by Digital Science (https://www.dimensions.ai). This database was chosen not only because of the large amount of data available, including the number of citations per publication, but also because it offers an API to perform queries using a DSL. These SQL-like requests can be issued from any programming language and return large batches of specific results in JSON format, which facilitates their processing and analysis. Python was used to call the Dimensions API and to plot and analyse the results.
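To make the workflow concrete, the following is a minimal sketch of how a DSL query might be sent to the Dimensions API from Python using only the standard library. It is not the code used in this study: the endpoint URLs, the key-for-token exchange and the `JWT` authorization header are assumptions based on the public Dimensions API documentation and may have changed since 2018.

```python
"""Minimal sketch of calling the Dimensions API from Python.

Assumptions (not taken from the study): the endpoint URLs below, an API key
exchanged for a JWT token, and the DSL query sent as the raw request body.
Check the current Dimensions documentation before use.
"""
import json
import urllib.request

AUTH_URL = "https://app.dimensions.ai/api/auth.json"  # assumed endpoint
DSL_URL = "https://app.dimensions.ai/api/dsl.json"    # assumed endpoint


def build_auth_request(api_key: str) -> urllib.request.Request:
    """Build the POST request that exchanges an API key for a token."""
    body = json.dumps({"key": api_key}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_dsl_request(token: str, query: str) -> urllib.request.Request:
    """Build the POST request that sends a DSL query with the token."""
    return urllib.request.Request(
        DSL_URL,
        data=query.encode("utf-8"),
        headers={"Authorization": "JWT " + token},
        method="POST",
    )


def run_query(api_key: str, query: str) -> dict:
    """Authenticate, run one DSL query and return the parsed JSON result."""
    with urllib.request.urlopen(build_auth_request(api_key)) as resp:
        token = json.load(resp)["token"]
    with urllib.request.urlopen(build_dsl_request(token, query)) as resp:
        return json.load(resp)
```

For example, `run_query(MY_KEY, "search publications return publications")` would return a dictionary parsed from the JSON response; `MY_KEY` is a hypothetical placeholder for a valid API key. Building the requests in separate functions keeps them inspectable without network access.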

We are interested in the publications signed by at least one member of an Andalusian university. As we want to compare citation counts across different regional, national and international collaboration patterns, we divided the dataset as follows, from local to international co-authorship:

  • \(P_{One}\): Papers from Andalusia (only one affiliation). This dataset includes the papers where all the authors belong to the same Andalusian University.

  • \(P_{And}\): Papers from Andalusia (only with collaborators from Andalusian universities). This dataset includes the papers where all the authors belong to Andalusian universities.

  • \(P_{Spa}\): Papers from Andalusia (only with collaborators from Spanish institutions). This dataset includes the papers where all the authors belong to Spanish institutions and at least one belongs to an Andalusian university.

  • \(P_{All}\): All papers from Andalusian universities. This dataset includes all the papers where at least one author belongs to an Andalusian university.

Each dataset is included in the one listed after it; therefore, \( P_{One} \subseteq P_{And} \subseteq P_{Spa} \subseteq P_{All}\).
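The nesting of the four datasets can be illustrated with a small classification function. This is a hypothetical sketch, not the procedure used in the study: it assumes each author is reduced to a single (organisation ID, country code) pair, and the organisation IDs in `ANDALUSIAN_ORGS` are placeholders rather than the real GRID identifiers.

```python
"""Hypothetical sketch: decide which of the four nested datasets contain a
paper, given its authors' affiliations.

Assumption: each author is one (org_id, country) pair; the IDs in
ANDALUSIAN_ORGS are placeholders, not the real GRID identifiers.
"""
from typing import Iterable, Set, Tuple

ANDALUSIAN_ORGS: Set[str] = {"grid.A", "grid.B", "grid.C"}  # placeholder IDs


def datasets_for(authors: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the names of every dataset (P_One..P_All) containing the paper."""
    authors = list(authors)
    orgs = {org for org, _ in authors}
    countries = {country for _, country in authors}
    if not orgs & ANDALUSIAN_ORGS:
        return set()  # no Andalusian author: the paper is outside P_All
    result = {"P_All"}
    # Each condition is checked only inside the previous one, so the
    # subset chain P_One <= P_And <= P_Spa <= P_All holds by construction.
    if countries <= {"ES"}:
        result.add("P_Spa")
        if orgs <= ANDALUSIAN_ORGS:
            result.add("P_And")
            if len(orgs) == 1:
                result.add("P_One")
    return result
```

For instance, a paper with two authors from the same (placeholder) university `grid.A` belongs to all four datasets, whereas adding a co-author from a French institution leaves it only in \(P_{All}\).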

The remaining query parameters are:

  • The analysis covers the 9 public universities of Andalusia. To perform the queries, their associated Global Research Identifier Database (GRID) identifiers, available from https://grid.ac/, were used (Footnote 1).

  • Date range: from 2010 to 2015.

  • Only publications of type “article” are used.

  • The queries were performed on 25th July 2018.
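The parameters listed above could be assembled into a single DSL string roughly as follows. This is a sketch under stated assumptions, not the query used in the study: the field names (`research_orgs.id`, `year`, `type`, `times_cited`), the `[first:last]` range syntax and the `limit`/`skip` paging clause reflect our reading of the Dimensions DSL documentation, and the GRID IDs passed in are placeholders.

```python
"""Hypothetical sketch: build a Dimensions DSL query from the parameters
above (organisations, year range, article type). Field names and syntax
are assumptions based on the DSL documentation; GRID IDs are placeholders.
"""
import json
from typing import Sequence


def build_query(grid_ids: Sequence[str], first_year: int, last_year: int,
                limit: int = 1000, skip: int = 0) -> str:
    """Return a DSL query for articles by the given organisations and years."""
    ids = json.dumps(list(grid_ids))  # DSL lists use double-quoted strings
    return (
        f"search publications where research_orgs.id in {ids}"
        f" and year in [{first_year}:{last_year}]"
        ' and type = "article"'
        " return publications[id+times_cited+research_orgs]"
        f" limit {limit} skip {skip}"
    )
```

For example, `build_query(["grid.A", "grid.B"], 2010, 2015)` restricts the search to articles from 2010 to 2015 by the two placeholder organisations. Assuming the API caps the number of records returned per request, the full corpus would be retrieved by repeating the query with increasing `skip` values.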

To obtain all the papers from Andalusian universities (\(P_{All}\)), the following query was used:

figure a

To obtain the papers of \(P_{One}\) directly from Dimensions, the following modifier can be used:

figure b

To select all the papers from Andalusian universities with only Spanish collaborators (\(P_{Spa}\)), the following modifier was used:

figure c

Dimensions does not allow filtering directly by “only from” a specific attribute value (for example, only from a list of universities). For this reason, we obtained \(P_{And}\) by filtering the \(P_{All}\) dataset: we iterated over every publication and removed those with at least one author whose affiliation is not in the list of Andalusian universities.
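The post-filtering step described above can be sketched in a few lines of Python. This is an illustration rather than the study's code: it assumes each record is a dictionary parsed from the API's JSON output, with affiliations listed under a `research_orgs` key as `{"id": ...}` entries, and the GRID IDs in `ANDALUSIAN_GRIDS` are placeholders.

```python
"""Hypothetical sketch of the post-filter: keep only the publications of
P_All whose every affiliation is an Andalusian university.

Assumption: each record has a "research_orgs" list of {"id": ...} dicts;
the GRID IDs below are placeholders.
"""
from typing import Dict, List, Set

ANDALUSIAN_GRIDS: Set[str] = {"grid.A", "grid.B"}  # placeholder IDs


def filter_only_andalusian(publications: List[Dict]) -> List[Dict]:
    """Return the publications whose affiliations are all Andalusian."""
    kept = []
    for pub in publications:
        orgs = {org["id"] for org in pub.get("research_orgs", [])}
        # Drop the paper if it has no affiliations or any falls outside the list.
        if orgs and orgs <= ANDALUSIAN_GRIDS:
            kept.append(pub)
    return kept
```

Records without any affiliation information are discarded, since their membership in \(P_{And}\) cannot be verified.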

3 Results

The results and corpus sizes are summarized in Table 1. As can be seen, a large share of the articles (39.81%) are signed only by researchers from the same affiliation (\(P_{One}\)). Andalusian researchers also collaborate beyond their own university: 1426 publications are co-signed with other Andalusian universities (\(P_{And}-P_{One}\)), and 5536 with other Spanish affiliations (excluding Andalusian universities, that is, \(P_{Spa}-P_{And}\)). Remarkably, 13944 publications (\(P_{All}-P_{Spa}\)) are co-signed by Andalusian universities and foreign affiliations (40.14% of the total), more than those with regional and national collaborators combined. The results in Table 1 also show that the average number of citations increases as publications with other affiliations are included, with \(P_{All}\) clearly reaching the largest value (15.675). A possible explanation is that large research projects involving different countries tend to be more ambitious than regional ones.

Table 1. Summary of citations and publications per corpus. Each corpus is included in the one listed below it: \( P_{One} \subseteq P_{And} \subseteq P_{Spa} \subseteq P_{All}\)
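The quantities compared in Table 1 and in the text above (publications, citations, citations per publication, and the size of each difference such as \(P_{All}-P_{Spa}\) as a share of the total) can be computed as in the following illustrative sketch. The input format is an assumption, and the example data are invented toy values, not the study's results.

```python
"""Illustrative sketch of the per-dataset summary: publications, citations,
citations per publication, and the share of P_All added by each step.
Input format is an assumption; any example data are toy values.
"""
from typing import Dict, List

ORDER = ["P_One", "P_And", "P_Spa", "P_All"]


def summarise(citations: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """citations maps each dataset name to the citation counts of its papers."""
    total = len(citations["P_All"])
    summary = {}
    previous = 0
    for name in ORDER:
        counts = citations[name]
        n = len(counts)
        summary[name] = {
            "publications": n,
            "citations": sum(counts),
            "citations_per_pub": sum(counts) / n if n else 0.0,
            # Papers in this dataset but not in the previous one; the simple
            # subtraction is valid only because the datasets are nested.
            "new_share": (n - previous) / total if total else 0.0,
        }
        previous = n
    return summary
```

With this layout, `summary["P_All"]["new_share"]` corresponds to the share of \(P_{All}-P_{Spa}\) discussed in the text, and `summary["P_One"]["new_share"]` to the share of single-affiliation papers.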

The citation histograms of the four datasets also reveal clear differences between them. Figure 1 shows that citations follow a long-tailed distribution: most publications receive no citations or fewer than 200, while a few highly cited papers reach up to 594 citations. As the geographical scope widens, papers with more citations appear. For \(P_{And}\) and \(P_{Spa}\) (Figs. 2 and 3), the most cited paper has 1089 citations and belongs to \(P_{And}\). However, the average number of citations per paper is still greater for the Spanish scope (\(P_{Spa}\)). Although another highly cited paper, with 839 citations, appears in Fig. 3, the differences between these two datasets are not so clear. It is only when plotting all the papers co-authored by Andalusian researchers with the rest of the world (\(P_{All}\), Fig. 4) that the most cited papers appear (4231, 2919, 2309 and 1515 citations), and the bulk of publications at the low end of the x-axis also shifts towards higher citation counts.

Fig. 1. Citation histogram for \(P_{One}\). Y-axis uses a logarithmic scale.

Fig. 2. Citation histogram for \(P_{And}\). Y-axis uses a logarithmic scale.

Fig. 3. Citation histogram for \(P_{Spa}\). Y-axis uses a logarithmic scale.

Fig. 4. Citation histogram for \(P_{All}\). Y-axis uses a logarithmic scale.
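Histograms such as those in Figs. 1 to 4 could be produced along the lines of the following sketch. The bin width is an assumption (it is not stated in the paper), and matplotlib is assumed for plotting; the binning itself uses only the standard library, so it can be checked without a plotting backend.

```python
"""Sketch of producing citation histograms with a logarithmic y-axis.
Assumptions: a fixed bin width (not stated in the paper) and matplotlib
for plotting; the binning uses only the standard library.
"""
from collections import Counter
from typing import Dict, Iterable


def citation_histogram(citations: Iterable[int], bin_width: int = 50) -> Dict[int, int]:
    """Map each bin's lower edge to the number of papers falling in it."""
    counts = Counter((c // bin_width) * bin_width for c in citations)
    return dict(sorted(counts.items()))


def plot_histogram(citations: Iterable[int], title: str, bin_width: int = 50) -> None:
    """Draw the histogram with a logarithmic y-axis (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so binning works without it

    hist = citation_histogram(citations, bin_width)
    plt.bar(list(hist), list(hist.values()), width=bin_width, align="edge")
    plt.yscale("log")  # long-tailed counts are easier to read on a log scale
    plt.xlabel("Citations")
    plt.ylabel("Number of publications")
    plt.title(title)
    plt.show()
```

Empty bins are simply absent from the result, which avoids plotting zero heights on the logarithmic axis.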

4 Conclusions

In this paper, we analysed the number of citations of articles from Andalusian universities, taking into account the geographical collaboration network of their authors. The Dimensions database was used to obtain the articles by Andalusian researchers, which were divided into four datasets of increasing geographical scope: only one affiliation, only Andalusian affiliations, only Spanish affiliations, and all publications.

Results show that Andalusian publications fall mainly into two groups: articles signed only by researchers from the same affiliation (39.81%) and articles co-signed with researchers from foreign countries (40.14%). The average number of citations per paper also increases as the collaboration network widens geographically, indicating that publications with international collaborations obtain more citations than those with only one affiliation.

Future studies will provide more complete information by separating the presented datasets into disjoint sets, or by restricting the analysis to a specific university. Other kinds of analysis, such as collaboration graphs between countries or universities, may give more insight into the members of each network or the quality of the research. Furthermore, the Dimensions API also allows retrieving the funding agencies of each publication, so the impact of projects funded by different countries could be compared. Patents and clinical trials are also available in Dimensions, so a comparison across different types of publications may also be relevant to the issue.