Background and aim of the paper

Although the study of complex networks has a long tradition in various fields of knowledge, in the last few years, after the introduction of models for small-world (Watts and Strogatz 1998) and scale-free networks (Barabási and Albert 1999), complex networks have become increasingly important in many fields of science, including the social sciences, biology, chemistry and computer science. Complex networks are used to describe different real-world systems (Albert and Barabási 2002). By this, we mean systems with a large number of interconnections and variability over time. Social relationships among agents are an example of complexity, where nodes can be people, programs, projects etc., and links represent their interactions (Watts et al. 2002; Arenas et al. 2004). This paper studies the collaboration among researchers in a specific Italian funding program, the Projects of National Interest (PRIN), which aims to support academic research (Reale and Zinilli 2014). Through the PRIN, which is an important governmental competitive instrument for allocating resources to collaborative research in different sectors, we analyze the factors behind the collaborations among researchers during the period from 2000 to 2011. The main aims of this paper are to understand the distribution of the connections among researchers who decide to collaborate on a project funding application and why the links in the research collaborations, given a set of covariates, change over time. This paper does not consider collaboration among scholars in Italy in general, but only those collaborations that have received funding in the PRIN context. Since researchers join or leave the network and the network composition consequently changes over time, it is important to understand network formation according to a set of attributes, which have been selected on the basis of the relevant literature.
We have studied a competitive funding scheme through dynamic network analysis techniques, as there is a growing tendency towards competitive systems for project funding (Lepori et al. 2007). We hypothesize that the collaboration network follows a power-law distribution, that a spatial dimension influences the ties, that there is no tendency to collaborate with researchers from other disciplines, and that collaborators are dissimilar in their h-index. The paper is organized in five steps. First, we review the literature on network collaborations. Second, we check for the presence of a power law in our dataset in the four areas of interest (chemistry, physics, economics and sociology). Third, a model is developed to analyze how collaborations are formed in the four areas over time. A stochastic actor-oriented model is used to estimate the parameters underlying the mechanism of change. In this model, we take into account the different PRIN calls from 2000 to 2011 and include a set of control variables. Fourth, we present the estimates from the longitudinal analysis. Finally, we check our hypotheses and present the conclusions that derive from the estimates.

Conceptual framework

The current literature focuses on the effectiveness of the network approach for examining models and patterns of collaboration in scientific communities and for describing the different roles of researchers in the network (e.g. more isolated or central researchers, or those who have a brokerage role). In recent years, research collaborations have been studied through several approaches: one investigates how research collaboration can be measured (a methodological approach); another studies the impact of collaboration on productivity (among researchers, between firms and universities etc.); a third studies the factors that drive the formation of research collaborations, taking into consideration patterns of geographic proximity and levels of scientific interaction (Andersson and Persson 1993; Newman 2001a, b; Katz and Martin 1997). Scientific collaboration is a set of informal functions (e.g. face-to-face contacts) and formal activities (e.g. participating in research projects or professional conferences) among scientists involved in producing knowledge (De Stefano et al. 2013). Usually, researchers start to collaborate with others through personal contacts developed during their career, often begun at professional conferences (Laudel 2006). Most studies focus on co-authorship networks or co-citation/citation networks. According to Gmür (2003, pp. 27–28), the number of co-citations represents an indicator of proximity in terms of content. Hummon and Doreian (1989) developed new methods (main-path algorithms) to weigh the links in a citation network of articles. Hummon and Carley (1993) used bibliographic citations and the SNA technique of path analysis to study the connections between researchers published in the journal Social Networks. In this paper we focus on collaborations that originated in projects funded on a competitive basis in four disciplinary fields.

An important law (especially in the physical sciences literature) for modelling the overall mechanism of network formation is the power-law degree distribution, which arises from a growth mechanism in which new vertices tend to connect to vertices that already have a high degree. In particular, such a distribution can be observed when the probability of connecting to a given vertex is proportional to its degree (Albert and Barabási 2002). Of course, this mechanism may occur with different variations in different real contexts. The power law is a mathematical expression of the Matthew effect described by Merton (1968), which is a possible outcome of another concept used in network science, namely preferential attachment (Newman 2001a, b). In scientific collaboration networks, the power-law distribution has been observed repeatedly (Newman 2001a, b), which means that the more connected nodes are destined to acquire many more connections in the future than the new nodes in a network. Many empirical studies have also found power-law behaviour in the upper tail of a distribution (Sinha 2006), while other studies have shown that income and wealth distributions follow log-normal distributions (Chatterjee et al. 2005). However, as explained by Clauset et al. (2009), detecting a power-law distribution in empirical data may be very difficult, and for this reason it is important to compare the power-law model with other distributions (Newman 2005).

Very often junior researchers have to team up with senior researchers in order to enter a network. Following this, the first question is:

Q1

is there a power law distribution in the collaborations in a competitive funding context?

H1

the hypothesis is that a power-law distribution exists in the collaboration network in competitive systems.

The everyday use of the word collaboration suggests researchers working together to achieve a common goal (most likely conducting research and developing new scientific knowledge) (Katz and Martin 1997; Newman 2001a, b, 2004). Funding programs are always keen “to increase the level of collaboration engaged in by the researchers whom they support in the belief that this will bring about better benefits”, increasing researchers’ chances of funding success (Katz and Martin 1997, p. 2; Laudel 2006; Beaver 2001). We are aware that the collaboration pattern is the result of a complex interplay of other factors. Boschma (2005) suggests five forms of proximity: Geographical proximity, cognitive proximity, organizational proximity, social proximity, institutional proximity. Therefore, the decision to collaborate with other researchers is influenced by different dimensions of proximity, which may overlap, reducing the independence concept. Usually, the several concepts of proximity are applied to inter-firm collaborations, innovation and regional economic development (Anand and Khanna 2000; Boschma 2005). Fernández and López (2015) examine the effect of geographical, cognitive, institutional, organizational and social proximity on scientific collaboration among academic institutions using both bibliometric data on co-authorship and EUMIDA data on institutional-level information. The results show evidence of the importance of geographical, cognitive, institutional and social proximity in the rise of research collaborations while the effect of organizational proximity seems to be weaker.

Spatial (or geographical) proximity is an important determinant of collaboration patterns. Indeed, many collaborations begin in informal contexts, which geographical proximity facilitates (face-to-face meetings are easier to organize) (Andersson and Persson 1993; Katz 1994; Boschma 2005). Cognitive proximity refers to the degree of similarity of the knowledge bases of organizations and institutions (Nooteboom et al. 2007). Organizational proximity is defined as the degree of strategic interdependence between two organizations, and it reduces the uncertainty about the behaviour of the future partner (Boschma 2005). Social proximity is defined by the existence of direct and informal personal interaction between the employees or the managers of two different organizations (Boschma 2005; Boschma and Frenken 2009). Institutional proximity is defined by the similarity of informal and formal rules shared by actors (North 1990).

In this work we explore the possibility of using geographical proximity and cognitive proximity to understand how they influence the choice of partners in the PRIN context.

In particular, researchers nearby are more likely to collaborate in the PRIN. Researchers whose universities are far away are less likely to collaborate together, even if their research interests are close. The literature on the contribution of spatial distance has analyzed the importance of distance in time (Scherngell and Lata 2011; Smith and Katz 2000). Therefore, we look at the geographical dimension of researchers that have collaborated in the PRIN program by examining the correlation between proximity and the decisions to collaborate with different partners. The question that we ask is:

Q2

is there spatial correlation in the formation of collaborations in a competitive funding context?

H2

the hypothesis is that spatial dimension positively influences the evolution of the collaboration network.

In our case, cognitive proximity is understood in disciplinary terms and is measured through the scientific area to which the researchers belong. Disciplinary proximity promotes communication and the exchange of knowledge among researchers of the same scientific area.

Gibbons et al. (1994) note that modern science is adopting new organizational forms largely using interdisciplinary perspectives instead of disciplinary ones. Different works have studied collaboration when there is heterogeneity of knowledge and skills (Durfee et al. 1989; Andersson 2011). Increasingly, there are calls for more interdisciplinary approaches to research, along with encouragement for greater collaboration among researchers (Hicks and Katz 1996; Gibbons et al. 1994). The paper investigates disciplinary proximity in terms of the researchers’ area of affiliation: two researchers who belong to the same area are considered close (similar knowledge bases); otherwise they are considered distant.

The research question is:

Q3

is there a tendency to collaborate between individuals from different disciplinary backgrounds in the changing links?

H3

the hypothesis is that in the PRIN there is no tendency to collaborate with scholars who have different knowledge bases.

For chemistry and physics we have the h-index (Footnote 1) for each researcher. In this paper we use this measure to investigate how the variation in individual scientific impact is related to the collaboration network. We treat the h-index as a measure of prestige, with all its limitations (Bornmann and Marx 2011). A citation can be viewed as a vote, which carries the same meaning in an academic context (Davis 2008); the status of a researcher in a social context is given by the total number of citations received from other researchers (Bollen et al. 2006). The question is:

Q4

is there h-index assortativity (Footnote 2) in the choice of partners?

H4

the hypothesis is that there is no assortativity, but rather preferential attachment based on scientific prestige.

Assortativity reflects the tendency of researchers with a high h-index to be tied preferentially to other researchers with a high h-index. We expect exactly the opposite: researchers with a low h-index tend to join those who have a high h-index (preferential attachment behaviour). We calculated the h-index over all the years of the study; this information was taken from a Scopus search.

Dataset and methodology

The paper uses two approaches to study dynamic complex networks: first, we identify the observed distribution of links among researchers in the four areas of interest through distribution models; then, we use a stochastic model to understand how the links change over time. To do this, we built from the PRIN website a large and novel dataset containing 4322 researchers from 98 universities and research institutes that were selected for PRIN funding from 2000 to 2011. We chose this time period because it was long enough for a longitudinal analysis and because data for subsequent years were not available.

The sample refers to four research areas: chemistry, physics, economics and sociology. Both physics and chemistry are characterized by consolidated collaborative patterns and by the habit of publishing in internationally indexed journals. As to the social sciences, the focus is on areas where collaborations among scholars are less common and the use of indexed journals is not widespread, since scholars prefer other types of output (e.g. books). These areas were selected in order to have a sample representing different characteristics of the epistemic communities. This study concerns one Italian funding scheme and cannot be generalized to the entire academic community; we therefore consider this work only as a proxy for the overall communities.

Twelve years of the PRIN program are observed. Originally, the dataset was a two-mode network (researchers by projects); we then converted it into a one-mode network (researchers by researchers) through a command implemented in STATA (Zinilli and Cerulli 2015). We consider PRIN research team membership the most viable way of collecting data on collaboration for our analysis. The data collected on the PRIN grants allocated in the period considered allow us to understand the characteristics of the proponents. Each research team has a principal investigator and one or more heads of research units, and for each researcher we know the gender, academic position, scientific quality, role in the PRIN and affiliation (we have taken into account that researchers change institutions during the 12 years under analysis).

To study how collaborations are formed over time, the RSiena package for the R environment is used in the analysis. In the four models that we have built (one for each area) we have included a number of control variables (some labelled “Call PRIN”) to understand how government guidelines have influenced changes over time. We include these variables because excluding them would compromise the internal validity of the model, and because they allow us to understand whether there is a policy effect on the changing links. Initiatives and directives by government could lead to the establishment of contacts with people other than the scientifically most interesting ones. An interaction term between the h-index and academic role has been added, because the researcher’s seniority could act as a confounder (does seniority modify the effect of the h-index?). The goal is to isolate any modification of the effect of the h-index due to seniority.

Variables used in the model are (Table 1):

Table 1 Variables used in the SAOM model 

The composition of the PRIN dataset changes over time: the network can acquire new researchers (joiners) or lose researchers (leavers). As Bellotti (2012) explains, the PRIN calls have changed since 2005, and after this date it has not been possible to participate in the PRIN for two consecutive years. To avoid convergence problems in the estimation algorithm, we had to combine the data for the years 2004 and 2005, 2006 and 2007, and 2008 and 2009. Furthermore, we have applied the composition change directives in the RSiena package. Researchers joining and leaving over the years could have influenced the model, although the parameters are estimated with careful consideration through the imputation techniques recommended by Huisman and Snijders (2003) (see also Huisman and Steglich 2008). Before moving forward in the analysis, we introduce some basic network terminology. A network is composed of nodes (vertices), in our analysis the researchers, and links (edges), here membership of the same research team. It is represented as an n × n adjacency matrix, where the element \(a_{ij}\) indicates whether a link exists between researcher i and researcher j; the network is undirected.

Power-law distribution

In previous years, several papers on real complex networks have analyzed self-organizing processes in such networks.

Since the network is undirected, an actor’s degree is the number of other actors to which it is directly connected. By degree we mean the number of co-workers with whom each researcher collaborated over the years (repeated collaborations with the same partner are counted only once).
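The conversion from project membership to researcher degree described above can be sketched in a few lines. This is only an illustration and not the STATA command of Zinilli and Cerulli (2015); the function names, project identifiers and researcher labels are hypothetical.

```python
from itertools import combinations

def project_one_mode(project_members):
    """Turn {project: [researchers]} into a set of undirected researcher pairs."""
    edges = set()
    for members in project_members.values():
        # Every pair of researchers on the same project is linked.
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))  # a set keeps a repeated collaboration only once
    return edges

def degrees(edges):
    """Number of distinct co-workers of each researcher that has at least one tie."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

# Hypothetical toy data (invented for illustration only).
toy_projects = {
    "P1": ["anna", "bruno", "carla"],
    "P2": ["anna", "bruno"],   # the same pair again: counted once
    "P3": ["carla", "dario"],
}
toy_edges = project_one_mode(toy_projects)
toy_degrees = degrees(toy_edges)
```

Storing the ties as a set of sorted pairs is one simple way to make the network undirected and to ignore repeated collaborations, which matches the definition of degree used here.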

Degree distribution is the probability that a randomly chosen node has x connections. The probability P(x) that a node in the network interacts with x other nodes decays with the law:

$$\text{P} (x) \propto ax^{ - \gamma }$$

where P(x) is the probability of observing the value x and γ is the scaling exponent (γ > 0). Scale-free networks are open and dynamically formed by the continuous addition of new nodes. The likelihood of a node having x links is a function of the number of links, scaled by the exponent on which the skewness of the distribution depends. A power-law distribution can resemble other types of distributions (for instance, an exponential distribution), and right-skewed distributions can sometimes be better fitted by a power law with exponential cut-off (Giot et al. 2003). In other cases the distribution follows a power law only above a lower bound (\(x_{\min}\)), which has to be estimated when fitting the empirical data. A power law has two important features: (1) it does not have a peak at its average value; it starts at its maximum value and then decreases all the way to infinity; (2) it decays much more slowly than other distributions (for example, an exponential distribution), which leads to a much greater likelihood of extreme events. To estimate \(x_{\min}\) we followed the procedures described by Clauset et al. (2009). Table 2 lists the distributions that we compared with the power law; for each distribution we give the basic functional form f(x):

Table 2 Functional form by distribution

First, to understand whether we can speak of a power law, we check whether the data follow a straight line on logarithmic axes (log–log plot). By inspecting the log–log plot of the probability distribution function we can see whether there are values that deviate from the power-law distribution. The slope of the line equals −γ, where γ is the scaling parameter, and a change in slope reveals the lower bound when the power law does not hold for all values.

A power law is a linear relationship between logarithms, of the form:

$$\log P(x) = - \gamma \log x + \log a$$

In the continuous case, the constant a in the equation is given by the normalization requirement, namely:

$$\int_{{x_{\hbox{min} } }}^{\infty } {p\left( x \right)} dx = 1 \to a = \left( {\gamma - 1} \right)x_{\hbox{min} }^{\gamma - 1}$$
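The normalization step can be spelled out as follows. This is a short derivation under the assumption that the density has the form \(p(x) = ax^{-\gamma}\) for \(x \ge x_{\min}\) and that γ > 1, so that the integral converges:

$$\int_{x_{\min}}^{\infty} a x^{-\gamma}\,dx = a\left[\frac{x^{1-\gamma}}{1-\gamma}\right]_{x_{\min}}^{\infty} = \frac{a\,x_{\min}^{1-\gamma}}{\gamma - 1} = 1 \;\Rightarrow\; a = (\gamma - 1)\,x_{\min}^{\gamma - 1}$$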

In the discrete case, as in this study, we have:

$$\sum\limits_{{x = x_{\hbox{min} } }}^{\infty } {p(x)} = 1 \to a = \frac{1}{{\zeta (\gamma ,x_{\hbox{min} } )}}$$

with:

$$\zeta (\gamma ,x_{\hbox{min} } ) = \sum\nolimits_{n = 0}^{\infty } {(n + x_{\hbox{min} } )}^{ - \gamma }$$

Clearly γ > 1 is required, otherwise the sum diverges. To obtain more consistent estimates, we follow the methodology of Clauset et al. (2009) for measuring power-law behaviour and compare the power law with other distributions. To fit a discrete power law to the data and to understand whether alternative models fit the data better, we calculated the \(x_{\min}\) that minimizes the distance (measured by the Kolmogorov–Smirnov statistic) between the empirical data and the best-fitting power-law model. For a given value of \(x_{\min}\), the scaling parameter is estimated by numerically optimising the log-likelihood (Gillespie 2015). For continuous cases, the optimization takes place through a maximum likelihood estimator:

$$\hat{\gamma } \simeq 1 + n\left[ {\sum\limits_{i = 1}^{n} {\log \left( {\frac{{x_{i} }}{{x_{\hbox{min} } }}} \right)} } \right]^{ - 1}$$

where \(x_{i}\), i = 1, 2, …, n, are the observations such that \(x_{i} \ge x_{\min}\). In discrete cases, \(\hat{\gamma }\) can be estimated in different ways; one is the numerical maximization of the logarithm of the likelihood function (Clauset et al. 2009):

$$L(\gamma ) = - n\log \zeta (\gamma ,x_{\hbox{min} } ) - \gamma \sum\limits_{i = 1}^{n} {\log } \, x_{i}$$
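A minimal sketch of this numerical maximization is given below, using only the Python standard library. It is not the implementation used for the paper: the truncated series with an Euler-Maclaurin tail for ζ, the golden-section search, the search interval and the toy degree sample are all assumptions made for illustration.

```python
import math

def hurwitz_zeta(gamma, x_min, terms=1000):
    """Approximate zeta(gamma, x_min) = sum_{n>=0} (n + x_min)^(-gamma), gamma > 1."""
    head = sum((n + x_min) ** (-gamma) for n in range(terms))
    t = terms + x_min
    # Euler-Maclaurin tail: integral, half of the first tail term, first correction.
    tail = (t ** (1 - gamma) / (gamma - 1)
            + 0.5 * t ** (-gamma)
            + gamma * t ** (-gamma - 1) / 12)
    return head + tail

def log_likelihood(gamma, data, x_min):
    """L(gamma) = -n log zeta(gamma, x_min) - gamma * sum(log x_i), for x_i >= x_min."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    return -n * math.log(hurwitz_zeta(gamma, x_min)) - gamma * sum(math.log(x) for x in tail)

def fit_gamma(data, x_min, lo=1.01, hi=6.0, tol=1e-6):
    """Golden-section search for the gamma that maximizes the log-likelihood."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, data, x_min) > log_likelihood(d, data, x_min):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return (a + b) / 2

# Hypothetical degree sample (invented for illustration only).
toy_degrees = [1] * 60 + [2] * 20 + [3] * 10 + [4] * 5 + [5] * 3 + [10] * 2
gamma_hat = fit_gamma(toy_degrees, x_min=1)
```

Golden-section search is adequate here because the log-likelihood is concave in γ; in practice, dedicated software such as that described by Gillespie (2015) would also estimate \(x_{\min}\) and the bootstrap uncertainty.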

The standard errors of the estimated parameters are computed with standard bootstrap methods with 5,000 replications. Then, a goodness-of-fit test is applied and used to compare the power-law fit with the fits of the alternatives. The goodness-of-fit test shows how well the model fits with the parameters \(x_{\min}\) and γ estimated before. As Clauset et al. (2009) have shown, the hypothesis is tested using a bootstrapping procedure. If the resulting p value is large, any difference between the empirical distribution and the model can be attributed to statistical fluctuations. If p ≃ 0, the model does not offer a reasonable fit to the data and other distributions could be more appropriate. Following the literature, the power-law hypothesis is rejected when the p value is < 0.1. On the other hand, a large p value does not mean that the power law is the best model: alternative hypotheses, such as the log-normal, the exponential or the Poisson, could be better. To compare these distributions, the logarithm of the ratio of the likelihoods of the data under the two models is calculated: if the value is positive the power-law model is preferable, if negative the other distribution is preferable. To understand whether the sign of the value is significant, a standard technique is Vuong’s test (Vuong 1989). A low p value tells us that the sign is a reliable indicator of which model better describes the empirical data. A high p value instead indicates that we are not able to decide which model is preferable.
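The likelihood-ratio comparison can be sketched as follows. The function takes the pointwise log-likelihoods of two fitted models on the same observations and returns the log-likelihood ratio R together with the two-sided p value in the normalized form given by Clauset et al. (2009). The function name and the numbers in the example are hypothetical.

```python
import math

def vuong_test(loglik_a, loglik_b):
    """Return (R, p) from pointwise log-likelihoods of models A and B on the same data.

    R > 0 favours model A, R < 0 favours model B; p is the two-sided p value
    for the hypothesis that the true value of R is zero.
    Assumes the pointwise differences are not all identical (sigma > 0).
    """
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p_value = math.erfc(abs(ratio) / (math.sqrt(2 * n) * sigma))
    return ratio, p_value

# Invented pointwise log-likelihoods for four observations (illustration only).
ll_power_law = [-1.0, -2.0, -1.5, -0.5]
ll_alternative = [-1.2, -2.5, -1.4, -1.1]
ratio, p_value = vuong_test(ll_power_law, ll_alternative)
```

In this toy case R is positive, so the first model would be preferred, and the sign would be read as informative only if the p value is small.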

Stochastic actor-oriented model

Stochastic actor-oriented models (SAOM) were developed to study the complexity of network panel data and thus to model change in social networks. The first SAOM study of social network panel data was introduced by Snijders (1996). In a SAOM, the dependent variables are the actors’ choices about changes in network relations. Each observation is represented by an n × n matrix x = (\(x_{ij}\)), where \(x_{ij}\) represents the link from researcher i to researcher j (i, j = 1, 2, …, n). The independent variables can be endogenous or exogenous. Endogenous variables explain network changes through existing network structures; examples are transitivity and reciprocity. These variables capture the influence of specific local network structures of co-collaboration on the probability of creating a new tie. Exogenous variables are the actor covariates, which capture the influence of individual characteristics of researchers on the probability of creating a new tie, for example same sex or similarity in a specific characteristic. In the SAOM each change in the network is made according to processes of individual choice. This important assumption is realistic for what we are studying (Footnote 3). The stochastic actor-oriented model combines Markov processes (Footnote 4), a random utility function and Monte Carlo simulation. The model assumes that the discrete network observations (\(t_{1}, t_{2}, \ldots, t_{n}\)) are only snapshots of an unobserved underlying dynamic sequence; the evolution between \(t_{n-1}\) and \(t_{n}\) is assumed to be continuous and is simulated with a Monte Carlo method. These models of network evolution are outcomes of a Markov process evolving in continuous time, which is a special kind of stochastic process. A continuous-time Markov process determines the change of the network connections. Given random variables \(X_{t}\), with t ∊ (0, ∞), taking values in a discrete set S, extending the notion of a Markov chain to that of a continuous-time Markov chain requires:

$$P\left[ {X_{s + t} = j|X_{s} = i,X_{{s_{n} }} = i_{n} , \ldots ,X_{{s_{1} }} = i_{1} } \right] = P\left[ {X_{s + t} = j|X_{s} = i} \right]$$

for all \(t > 0\), \(s > s_{n} > \ldots > s_{1} \ge 0\) and \(i, j, i_{k} \in S\). The quantities \(P[X_{s+t} = j | X_{s} = i]\) are the transition probabilities. This is the analogue of the Markov property for a discrete time variable, except that here the time parameter is continuous (in this model, simulated with a Monte Carlo method).

The idea behind this type of model is to model the change process through two components: the change opportunity process (rate function) and the change determination process (objective function) (Footnote 5). The actors control their links, and each seeks to change these links (i.e. create, maintain or dissolve them) so that their personal “satisfaction” with the network composition is maximized. An important assumption here is that each actor has full knowledge of the network, including the other actors, the links and their characteristics. The opportunities to change a link are modelled by a Poisson process with rate \(\lambda_{i}\) for each actor i. The interpretation is that opportunities for actor i to change one of the tie variables occur at a rate \(\lambda_{i}(x^{0})\), where \(x^{0}\) identifies the state of the network at a certain time. In the basic model, all actors have the same opportunity for change, equal to a constant parameter \(\lambda_{i} = p_{m}\). When we include an individual attribute (\(v_{i}\)) or structural variables such as the degree (\(\sum\nolimits_{j} {x_{ij} }\)), we introduce heterogeneity in change opportunities. In these more complex models the rate function is given by:

$$\lambda_{i} \left( {x^{0} ,v} \right) = p_{m} \exp \left( {\alpha_{1} v_{i} + \alpha_{2} \sum\nolimits_{j} {x_{ij} } } \right)$$
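As a small numerical illustration of the rate function, the sketch below evaluates \(\lambda_{i}\) for a toy network of three actors. It also uses a standard property of independent Poisson processes: the probability that the next change opportunity falls to actor i is \(\lambda_{i}\) divided by the sum of all rates. The parameter values, the attribute vector and the network are hypothetical.

```python
import math

def rate(p_m, alpha1, alpha2, v_i, row_i):
    """lambda_i = p_m * exp(alpha1 * v_i + alpha2 * degree_i), degree_i = sum_j x_ij."""
    return p_m * math.exp(alpha1 * v_i + alpha2 * sum(row_i))

# Hypothetical current network x0 (symmetric, zero diagonal) and actor attribute v.
x0 = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
v = [1.0, 0.0, 0.0]

rates = [rate(2.0, 0.5, 0.1, v[i], x0[i]) for i in range(len(x0))]
total_rate = sum(rates)
# Probability that the next opportunity for change falls to each actor.
next_actor_prob = [r / total_rate for r in rates]
# Expected waiting time until the next opportunity anywhere in the network.
expected_wait = 1.0 / total_rate
```

With \(\alpha_{1} = \alpha_{2} = 0\) the function returns the constant \(p_{m}\) for every actor, which is the basic model with homogeneous change opportunities.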

From the general theory of continuous-time Markov chains (Norris 1997), given a current state \(x^{0}\), a set of permitted new states \(C(x^{0})\) and the product of the two model components \(\lambda_{i}\) and \(p_{i}\) (where \(p_{i}\) defines the probability distribution of choices), there follows the existence of an intensity matrix that describes the rate at which \(X(t) = x^{0}\) tends to transition into \(X(t + dt) = x\) as dt → 0:

$$q_{{\left( {x^{0} ,x} \right)}} = \mathop {\lim }\limits_{dt \to 0} \frac{{P\left\{ {X(t + dt) = x\,|\,X(t) = x^{0} } \right\}}}{dt}\quad \left( {x^{0} \ne x} \right)$$

where x ∊ X, \(q_{(x^{0},x)} = 0\) whenever \(x_{ij} \ne x_{ij}^{0}\) for more than one element (i, j), and \(q_{(x^{0},x)} = \lambda_{i}(x^{0},v,w)\,p_{i}(x^{0},x,v,w)\) for graphs x and \(x^{0}\) that differ only in the element with index (i, j). Given that an actor i has the opportunity to change a relation, the choice for this actor is to change one of the link variables \(x_{ij}\), because actors can change only one link variable at a time. Changing the link variable \(x_{ij}\) leads to a new state x, with x ∊ \(C(x^{0})\). To model the choice probabilities, a classical multinomial logistic regression specified by an objective function \(f_{i}\) is used (Snijders et al. 2010):

$$p\{ X\left( t \right)\,{\text{changes}}\;{\text{to}}\;x\left| i \right.\,{\text{has}}\;{\text{a}}\;{\text{change}}\;{\text{opportunity}}\;{\text{at}}\;{\text{time}}\;t, X\left( t \right) = x^{0} \} = p_{i} \left( {x^{0} ,x,v,w} \right) = \frac{{\exp \left( {f_{i} \left( {x^{0} ,x,v,w} \right)} \right)}}{{\sum\nolimits_{{x' \in C\left( {x^{0} } \right)}} {\exp \left( {f_{i} \left( {x^{0} ,x^{'} ,v,w} \right)} \right)} }}$$

When actors have the opportunity to change their relations, they choose the change that they expect to maximize their objective function \(f_{i}\). This objective function describes the preferences and constraints of the nodes (in our case, the researchers). More precisely, relationship choices are determined by a linear combination of effects, depending on the current state (\(x^{0}\)), the potential new state (x), individual attributes (v) and attributes at the dyadic level (w). Effects related to the current state of the network are endogenous, implying a self-reproduction of network structures, such as transitive closure or the betweenness effect. Individual attribute effects model the propensity of some nodes to create more links. Dyadic effects indicate the propensity of actors with similar characteristics to form links.
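The multinomial choice step can be illustrated with a toy objective function made of two endogenous effects, density (the actor's degree) and transitive triads. This is a simplified sketch and not the RSiena implementation: it assumes that the permitted states are "no change" plus the toggling of one tie of the focal actor, that a toggle is applied symmetrically because the network is undirected, and the parameter names and values are hypothetical.

```python
import math

def toggle(x, i, j):
    """Copy of the symmetric adjacency matrix x with the tie between i and j flipped."""
    y = [row[:] for row in x]
    y[i][j] = y[j][i] = 1 - x[i][j]
    return y

def degree_stat(x, i):
    """Density statistic for actor i: number of ties."""
    return sum(x[i])

def triad_stat(x, i):
    """Number of closed triads (triangles) that include actor i."""
    n = len(x)
    return sum(x[i][j] * x[i][h] * x[j][h] for j in range(n) for h in range(j + 1, n))

def objective(x, i, beta_density, beta_triads):
    """Linear combination of effects evaluated by actor i in state x."""
    return beta_density * degree_stat(x, i) + beta_triads * triad_stat(x, i)

def choice_probabilities(x0, i, beta_density, beta_triads):
    """Multinomial logit over 'none' (no change) and toggling the tie to each j != i."""
    options = {"none": x0}
    for j in range(len(x0)):
        if j != i:
            options[j] = toggle(x0, i, j)
    scores = {k: objective(x, i, beta_density, beta_triads) for k, x in options.items()}
    top = max(scores.values())
    weights = {k: math.exp(s - top) for k, s in scores.items()}  # shift for stability
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

# Hypothetical network with ties 0-1 and 1-2; actor 0 receives the opportunity.
x0 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
probs = choice_probabilities(x0, 0, beta_density=-1.0, beta_triads=0.5)
```

With a negative density parameter and a positive triad parameter, creating the tie to actor 2 (which closes a triangle) is more likely than creating the tie to the unconnected actor 3, which is the kind of clustering tendency a transitivity effect is meant to capture.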

Parameters indicating the strength of each effect in determining the network dynamics are estimated from longitudinal data using a simulation-based approach inspired by the method of moments. The solution of the moment equation is obtained by a variation of the Robbins and Monro (1951) algorithm (for further information see Snijders 2001). The stochastic algorithm simulates the development of the network and estimates the parameters that minimize the distance between the observed and simulated networks. During the iteration phase, the provisional parameters of the probability model are gradually adjusted so that the simulated networks fit the observed networks. The parameters are then held constant at their final values in order to evaluate the goodness of fit of the model and the standard errors.
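The logic of the Robbins and Monro iteration can be shown on a deliberately simple one-parameter problem rather than on a network model. In this toy sketch the "simulated statistic" is the share of successes in a batch of Bernoulli draws whose success probability depends on θ, and the algorithm adjusts θ until the simulated statistic matches an observed target on average. The simulator, gain sequence, seed and target value are all assumptions for illustration; the variant implemented in RSiena is more elaborate.

```python
import math
import random

def simulate_statistic(theta, rng, n_draws=50):
    """Toy simulator: share of successes when the success probability is logistic(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(rng.random() < p for _ in range(n_draws)) / n_draws

def robbins_monro(target, theta0=0.0, gain=5.0, steps=2000, seed=1):
    """Stochastic approximation of the root of E[statistic(theta)] - target = 0."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, steps + 1):
        simulated = simulate_statistic(theta, rng)
        # Move theta against the deviation, with a step size that shrinks as 1/k.
        theta -= (gain / k) * (simulated - target)
    return theta

theta_hat = robbins_monro(target=0.7)
```

In this toy problem the moment equation has the known solution log(0.7/0.3), so the estimate can be checked directly; in a SAOM the statistics are network counts and each simulation is a full run of the actor-oriented process.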

Finally, the parameters estimated by the SAOM can be read as non-standardized coefficients obtained from a classical logistic analysis. Indeed, the parameter estimates are log-odds ratios, and they can be interpreted as the change in the log-odds of a tie change associated with a one-unit change in the corresponding independent variable.
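One way to see the log-odds reading is to take the ratio of the choice probabilities defined earlier for two candidate states \(x^{a}\) and \(x^{b}\). The notation here is an assumption introduced only for illustration: the objective function is written as \(f_{i} = \sum\nolimits_{k} \beta_{k} s_{ik}\), where \(\beta_{k}\) is the parameter of effect k and \(s_{ik}\) the corresponding statistic for actor i. The common denominator cancels, so that:

$$\frac{{p_{i} \left( {x^{0} ,x^{a} ,v,w} \right)}}{{p_{i} \left( {x^{0} ,x^{b} ,v,w} \right)}} = \exp \left( {f_{i} \left( {x^{0} ,x^{a} ,v,w} \right) - f_{i} \left( {x^{0} ,x^{b} ,v,w} \right)} \right) = \exp \left( {\sum\nolimits_{k} {\beta_{k} \left[ {s_{ik} \left( {x^{a} } \right) - s_{ik} \left( {x^{b} } \right)} \right]} } \right)$$

If the two states differ by one unit in a single statistic \(s_{ik}\), the ratio reduces to \(\exp(\beta_{k})\). For instance, a hypothetical estimate \(\beta_{k} = 0.5\) would correspond to odds about 1.65 times higher for the state with the larger statistic.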

Empirical results

In this section we show the results for the different research questions. As a first step, we visualize our data to get an idea of their structure. The table below presents some descriptive statistics of the network structure, where the values are normalized (Table 3; Footnote 6).

Table 3 Descriptive analysis

In the PRIN, the number of nodes (researchers) and connections is greater in the natural sciences than in sociology and economics, consistent with the different sizes of the communities. The average distance is a measure of how far a researcher is from the other researchers; in physics, the distance is smaller than in the other areas. Indeed, the average geodesic distance between researchers in the network is 5.94. The clustering coefficient, which ranges from 0 to 1, is high for all disciplinary fields. In PRIN projects each researcher is linked to everyone else within the project, and for this reason the clustering coefficient, average geodesic distance and maximum geodesic distance have been calculated on the whole network and not year by year (Footnote 7). The mean clustering coefficient is measured by the fraction of paths of length two in the network that are closed. The “small world” phenomenon, the combination of short average path lengths with a high clustering coefficient, seems to be present in physics. This phenomenon occurs because some researchers act as shortcuts between distant researchers. A possible explanation is that in physics more researchers are connected through past collaborations with a common third researcher. We expect that the closest nodes are those that work together in large laboratories (they belong to the experimental sector of physics). On the other hand, the most isolated nodes seem to be those belonging to theoretical physics, who have less need to collaborate (verified by observing the nodes in more depth; Footnote 8).
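The definition of the clustering coefficient as the fraction of closed paths of length two can be made concrete with a short sketch. The function name and the toy network are hypothetical, and the code is meant only to illustrate the definition, not to reproduce the values in Table 3.

```python
def transitivity(adj):
    """Fraction of paths of length two (a - centre - b) whose end points are also tied."""
    n = len(adj)
    closed = 0
    paths = 0
    for centre in range(n):
        neighbours = [j for j in range(n) if adj[centre][j]]
        for a in range(len(neighbours)):
            for b in range(a + 1, len(neighbours)):
                paths += 1
                closed += adj[neighbours[a]][neighbours[b]]
    return closed / paths if paths else 0.0

# Hypothetical network: a triangle 0-1-2 plus researcher 3 tied only to researcher 0.
toy_adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0 + 1, 0, 0],
    [1, 0, 0, 0],
]
toy_clustering = transitivity(toy_adj)
```

Because every PRIN project forms a fully connected group, each project contributes only closed paths, which helps to explain why the coefficient is high in all four fields.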

After this brief description of the state of the networks in the four disciplinary fields, the power-law distribution is presented. The first hypothesis argues that the researchers’ network follows a power-law distribution. Given the definitions above (paragraph “Power-law distribution”), a power law is a linear relationship between logarithms, so the log–log plot of the distribution should follow a straight line in the case of a power law. Figure 1 presents the log–log plot for chemistry:

Fig. 1 Log–log plot of chemistry

The log–log plot shows a straight line only after a specific value, suggesting that a power-law model fits well only above a minimum value. Chemistry follows a power law from an x min equal to 5, with an exponent equal to 2.7. For physics x min is equal to 9 with an exponent of 3.83, and for sociology x min is equal to 12 with an exponent of 2.11. In economics the minimal Kolmogorov–Smirnov value is reached with the lower bound at 12 and a scaling exponent of 2.04. The power-law p-value (second column in Table 4) does not support the power-law model for physics, economics and sociology. For that reason alternative distributions should be explored.
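The role of the lower bound and of the Kolmogorov–Smirnov value can be made concrete with a short sketch. It uses the continuous maximum-likelihood estimator of the exponent, alpha = 1 + n / sum(ln(x_i / x_min)), and the largest gap between the empirical and the fitted cumulative distributions. This is an assumption-laden illustration on synthetic data: the function names are hypothetical, degree data are discrete and would call for the discrete variant of the estimator, and the code is not the implementation used for Table 4.

```python
import math
import random


def fit_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of the exponent for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("the tail needs at least one value above x_min")
    return 1.0 + len(tail) / log_sum


def ks_distance(values, x_min, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    tail = sorted(x for x in values if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def best_x_min(values, candidates):
    """Return (KS distance, x_min, alpha) for the candidate lower bound with the smallest KS."""
    scored = []
    for x_min in candidates:
        alpha = fit_alpha(values, x_min)
        scored.append((ks_distance(values, x_min, alpha), x_min, alpha))
    return min(scored)


# Synthetic tail drawn from a power law with exponent 2.7 and lower bound 5
# (inverse-transform sampling); these are NOT the PRIN data.
rng = random.Random(0)
sample = [5.0 * (1.0 - rng.random()) ** (-1.0 / (2.7 - 1.0)) for _ in range(20000)]

alpha_hat = fit_alpha(sample, 5.0)
ks_hat = ks_distance(sample, 5.0, alpha_hat)
best = best_x_min(sample, [5.0, 6.0, 8.0])
print(round(alpha_hat, 2), round(ks_hat, 3), best[1])
```

Scanning candidate lower bounds and keeping the one with the smallest Kolmogorov–Smirnov distance mirrors the selection of x min described above; the p-value itself would additionally require comparing this distance with those obtained from many synthetic samples.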

Table 4 Power-law and alternative distributions

Chemistry seems to present a pure power law, with a p value of 0.62 (thus > 0.1); physics, economics and sociology do not follow a power law but other types of distributions. From Table 4 we can see that physics, economics and sociology are not fitted well by the power-law model according to the goodness-of-fit test used. Physics follows a lognormal distribution, the sociology data are best fitted by an exponential distribution, and for economics there is no statistical significance. In conclusion, we find that in the PRIN program only the collaboration network of chemistry is fitted well by a power-law model between 2000 and 2011, even if the p values of the other distributions (in chemistry) are so large that the tests cannot exclude that other distributions fit better.

SAOM results

Following the approach of Snijders (2005), we initially included in the model only basic network effects, in order to check for endogenous dynamics.Footnote 9 We then added further effects and dropped those that were not significant. In this way it was possible to avoid model instability when running the algorithm and to obtain reliable estimates of the parameters. Consequently, the first model presented (Model 1) is very simple and only includes the density, transitive triads, betweenness and degree assortativity effects. The density effect must be included in all objective functions. It is a sort of intercept of the model and represents the tendency to form arbitrary edges. Transitivity indicates that two researchers sharing a tie with a third researcher are more likely to engage in collaborative activities with each other than with other researchers. This effect measures the tendency of researchers to cluster together. The betweenness effect measures the intermediation dynamics in the evolution of the network, which are present if researchers tend to stay in the middle between indirectly connected pairs of researchers. The degree assortativity effect reflects the tendency of researchers with high degrees to be tied preferably to other researchers with high degrees.
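In a SAOM the effects described above enter each researcher's objective function as a weighted sum of network statistics. The sketch below illustrates this on a toy undirected network under simplifying assumptions: the `brokerage` count (pairs of unconnected neighbours) is only a rough stand-in for the betweenness effect, degree assortativity is omitted, and the weights are hypothetical values rather than the estimates reported in the tables.

```python
from itertools import combinations


def effect_statistics(adj, i):
    """Simplified network statistics for researcher i in an undirected network."""
    neighbours = adj[i]
    closed_pairs = open_pairs = 0
    for j, h in combinations(neighbours, 2):
        if h in adj[j]:
            closed_pairs += 1  # i, j and h form a closed triad
        else:
            open_pairs += 1    # i sits between j and h, who are not tied
    return {
        "density": len(neighbours),         # number of ties of i
        "transitive_triads": closed_pairs,  # closed triads involving i
        "brokerage": open_pairs,            # simplified stand-in for betweenness
    }


def objective(adj, i, weights):
    """Weighted sum of the statistics: the value researcher i attaches to this network."""
    stats = effect_statistics(adj, i)
    return sum(weights[name] * stats[name] for name in weights)


# Hypothetical toy network and hypothetical weights (NOT estimates from the paper).
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
hypothetical_weights = {"density": -2.0, "transitive_triads": 0.5, "brokerage": 0.1}

stats_for_c = effect_statistics(toy_adj, "c")
value_for_c = objective(toy_adj, "c", hypothetical_weights)
print(stats_for_c, value_for_c)
```

With a negative density weight every additional tie lowers the value of the network for the researcher unless it is compensated by other effects, for example by closing a triad.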

For network evolution, the rate function describes the average number of changes in network ties between measurement points. Below we show the estimates for the chemistry area (Table 5):

Table 5 SAOM for chemistry

The table above shows that the average rate of change in the final model from 2000 to 2011 is 3; this means that between 2000 and 2011 the expected number of changes was on average 3 for each researcher. The density effect is negative and significant; this indicates that there is an opportunity cost in establishing each relation. Thus, the tendency of researchers to start a relationship is driven by other variables that compensate for this cost.

The significant transitivity effect indicates a tendency for transitive closure (specifically, the transitivity effect indicates a preference for collaborating with friends of friends); we could say there is trust in sharing common partners of research.

The betweenness effect in the second model is not significant, which rules out a tendency towards intermediation by researchers.

Degree assortativity does not seem to affect the actors’ choice about changes of network relations.

Spatial proximity seems to be a relevant variable: there is a tendency to create collaborative ties with researchers located in the same geographical area.

For the h-index effect, the positive parameter implies that researchers prefer ties to other researchers with different values of the h-index (preferential attachment based on prestige). The interaction effect is not statistically significant; seniority therefore does not change the effect of the H index.

The changes in the funding specification and in the thresholds on the number of collaborations in the PRIN call seem to influence the choice of partners.

Below are the results for physics (Table 6).

Table 6 SAOM for physics

In physics the average rate of change is slightly lower than in chemistry; physicists have therefore changed fewer partners in these years. The control variables “Funding” and “Collab” seem to have an effect on the choice of partners. The transitivity effect, the h-index and geographical proximity are significant, as in chemistry. In this case too, the interaction between H Index and academic role is not statistically significant; seniority therefore does not change the effect of the H index on the changing of ties. The important difference compared to chemistry is the role played by the endogenous variable “betweenness”. The betweenness effect in the second model is significant: in this situation, the only node that is connected to two otherwise unconnected nodes has an advantage, because it can directly acquire resources from them and manage the flows between them. In physics there are nodes that can act as intermediaries between indirectly connected pairs of nodes.

The last two tables show the SAOM results for economics and for sociology (Tables 7, 8).

Table 7 SAOM for economics
Table 8 SAOM for sociology

In economics the average rate of change in the final model over the years considered is 3.7; this means that on average the expected number of changes is almost 4 for each researcher. Here too, the transitivity effect and geographical proximity influence the changing of links. In addition to the dummy variable “Funding”, economics shows the effect of a further control variable, here called “collab”,Footnote 10 which indicates a change in the MIUR guidelines about the possibility of participating in the PRIN program in two consecutive years. In this context we can only say that this variable influences the change of the links; we cannot relate it to the other variables of the model.

Below are the results for sociology.

In sociology the average rate of change is equal to 4.2; this means that sociologists tend to change partners more than researchers in the other disciplines. Here too there is a transitivity effect and a geographical effect, as in economics. What differentiates sociology are the variables “interdisciplinary” and “same academic role”.

The negative sign of the variable “interdisciplinary” means that there is a low tendency to collaborate with researchers who come from other disciplinary fields. The positive sign of the variable “same academic role” means that in sociology, in the PRIN context, a researcher tends to change links on the basis of academic level; for instance, a researcher prefers to do research with a full professor rather than with another researcher (preferential attachment on the basis of academic role). This is a feature found only in sociology.

Discussion and conclusions

We analyzed data on PRIN funded grants to identify the collaboration patterns among researchers in a competitive context. We found different types of distributions in the four networks, highlighting the distinct scientific traditions of the four academic communities. The first hypothesis is confirmed only in part: chemistry seems to follow a power law (although we cannot exclude other distributions) starting from a minimum value. This could be explained by preferential attachment based on scientific quality; that is, researchers with a lower profile tend to tie to excellent researchers to increase their chances of funding. Laudel (2002) speaks of a ‘lobby’ in project funding contexts: this effect can enhance the chances of funding success and could become a real ‘lobby’, with the risk of influencing research policy and decisions on the allocation of loans in its favour. Even if analyzing the presence of a lobby is beyond the scope of the paper, we can imagine that this kind of effect is also present in Italy and in other countries. Physics reveals a small-world characteristic (short paths and high clustering coefficient) (Watts and Strogatz 1998) but does not exhibit a power-law degree distribution according to the Clauset et al. (2009) method. In physics some nodes are closer than in the other fields: there is one large team that forms the majority of the collaborations. This could be explained by the presence of large laboratories in physics, which require many researchers (and therefore more connections). The clustering coefficient is high in all areas; this means that friends of friends have a larger probability of collaborating. Altogether, however, the presented results are in agreement with Kronegger et al. (2012), supporting the conclusion that dynamic social networks take different forms across disciplines (in our case the network structure has a different distribution for each area of study).
Each field of study is affected by its own characteristics and academic culture. The second hypothesis is confirmed for all disciplinary fields. The data show that geographical proximity is an important driver of the longitudinal evolution of the network in all disciplinary fields. Even if we are in a national context (relative distance), physical proximity plays a key role in the formation of collaborations. This confirms the effects of face-to-face contact mentioned by Laudel (2006), for instance among researchers who work in the same department.

The third hypothesis is also confirmed: an interdisciplinary component does not play a role when a researcher chooses a research partner. This does not mean that the projects are mostly disciplinary, but simply that researchers do not change links on the basis of the partner’s area. Finally, the last hypothesis, about the H-index, is confirmed: there is no pattern of connecting with those who have the same H-index. Therefore, there is preferential attachment based on scientific prestige in chemistry and physics. Gender has been used as a control variable and is not significant in the four models. This suggests that there are no mechanisms of gender homophily, even if we cannot exclude an interaction effect between gender and differences in rank (different academic role). This interaction could invalidate the effect on the dependent variable, because females are found less often than males in powerful positions (e.g. positions of full professor) and so they might be less attractive. Moreover, research policies (isolated through the calls) drive the collaborative processes, as demonstrated by the significance of some control variables (e.g. Funding and Collab); this means that changes in the calls also influence the choice of research partners.

This paper concerns a specific Italian funding scheme in four areas of interest, and the results are not valid for other funding schemes, in Italy or abroad; they cannot be generalized to the entire academic community.