Introduction

In recent years there has been a worldwide increase in scientific collaborations (Milojević 2014). The share of single-authored publications has been in constant decline (Abt 2007; Uddin et al. 2012), while the average number of authors per publication has been continuously increasing (Persson et al. 2004; Wuchty et al. 2007; Bukvova 2010; Gazni et al. 2012; Larivière et al. 2015). Among other motivations, the interaction between scientists of different disciplines and/or organizations is a response to the need to address the complex challenges of science and society (Hall et al. 2018). In addition, the capacity to activate and manage effective collaborations with colleagues, in their own and other institutions, both domestic and international, has become a rewarding factor in the scientist’s career development (Petersen et al. 2012). Collaboration allows scientists to participate in broader research projects, gain access to funding and, not least, improve personal competencies, with positive effects on the quantity and quality of research outputs. It has been shown empirically that, as research collaboration increases, the number of publications (Ductor 2015; Lee and Bozeman 2005) and citations (Bidault and Hildebrand 2014; Li et al. 2013) also increases. Indeed, the link between research collaboration and performance is widely accepted in the literature (He et al. 2009), although fewer studies have tested the impact of research performance on the ability to activate collaborations (Abramo et al. 2017).

In general, collaboration behavior at the individual level can vary on the basis of contextual factors: first of all with the research discipline concerned (Abramo et al. 2013a; Gazni et al. 2012; Yoshikane and Kageura 2004); also with personal factors, such as gender, age, academic rank (Abramo et al. 2014; Kyvik and Olsen 2008; Bozeman and Gaughan 2011; Gaughan and Bozeman 2016; Zhang et al. 2018); and also with social conventions, particularly those concerning the manner of assigning credits and publication authorship (Katz and Martin 1997; Cronin 2001; Glänzel and Schubert 2004).

Studies on the effect of gender on scientific collaboration show that women have less extensive collaboration networks than their male counterparts (Cole and Zuckerman 1984; Bozeman and Corley 2004; McDowell et al. 2006; van Rijnsoever et al. 2008). In addition, there is greater heterogeneity in their individual networks, which on the one hand implies less specialization (Leahey 2006), and on the other hand favors inter-disciplinary collaboration (Rhoten and Pfirman 2007; van Rijnsoever et al. 2008). However, as Araújo et al. (2017) show, it is only in the natural sciences that women are more likely than men to have collaborators from other fields.

Some studies indicate that women seem to prefer collaborations with colleagues from other domestic organizations (Moya Anegón et al. 2009), while showing a lower propensity for international collaboration than their male colleagues (Frehill et al. 2010; Larivière et al. 2011). Fox et al. (2017), surveying women engineers, found that the frequency of international research collaboration varies by region, with European women leading the ranking. However, Abramo et al. (2013b), analyzing the scientific production of academics from Italy, found that even if female researchers register a greater capacity to collaborate at the intramural and extramural domestic levels, there is still a gap with their male colleagues in terms of international collaborations. Addressing similar objectives, Iglič et al. (2017) surveyed Slovenian scientists in four disciplines: mathematics, physics, biotechnology and sociology. Their research shows that while, in general, gender differences in the level of collaboration are not observed, women are probably more connected with colleagues of other research units and departments in the same organization (intramural), while being less connected in terms of international collaborations. These results are in line with the findings of Jadidi et al. (2017), who analyzed the international community of computer scientists, and of González-Álvarez and Cervera-Crespo (2017), who, investigating scientific production in neuroscience, claimed that the pattern of female collaboration in this field is less international than is the case for male collaboration.

A number of factors have been identified as mainly responsible for the difference in collaboration behavior between men and women. Among others, the choice of research collaborators is often influenced by mechanisms of gender homophily, which stimulate a search for collaborations primarily among colleagues of the same gender, with whom the individual is more likely to share values and methodological approaches (Boschini and Sjögren 2007; Ferber and Teiman 1980; McDowell and Smith 1992). Also, women academics are still a minority in the main disciplines (Hamel et al. 2006; Rivellini et al. 2006), and their presence is lower still among academics of higher rank (Athanasiou et al. 2016; Gaughan and Bozeman 2016). Furthermore, the effects of gender discrimination (i.e. under-recognition of women’s contributions to science, gender biases in perceptions of publication quality and collaboration interest, gender biases in evaluations of research work) make female scientists less attractive to potential research collaborators (Knobloch-Westerwick et al. 2013). The combination of these factors brings about the isolation of female academics, all the more so in departments that are smaller (McDowell and Smith 1992) and have lower percentages of women (Etzkowitz et al. 2000). This isolation becomes still more acute given the long history of male overrepresentation in the academic environment (Rhoten and Pfirman 2007), and could at least partly explain the so-called “productivity gap”, a term indicating that male researchers do indeed perform better than women (Fox 1983; Cole and Zuckerman 1984; Long 1987, 1992; Xie and Shauman 1998, 2004; Mauleón and Bordons 2006; Larivière et al. 2013). In particular, Abramo et al. (2009) showed that the gender productivity gap is noticeable among top scientists (TSs) only.

Keeping in mind the existence of a triangular relationship between gender, collaboration behavior and research performance, in this paper we intend to verify whether gender differences occur in the collaboration behavior of TSs. The current study is part of the stream of works on this matter by the authors’ research group, and more specifically a continuation of two previous works. In the first, Abramo et al. (2011) showed that TSs are also those who collaborate more abroad, but that the reverse is not always true. In the second, Abramo et al. (2018) verified whether TSs have a collaboration behavior different from that of other scientists. The results from a longitudinal analysis over two successive 5-year periods show a strong increase in the propensity to collaborate at the domestic level (both extramural and intramural); however, this increase is smaller for professors who remain or become TSs than it is for their lower-performing colleagues. In contrast, the increase in international collaboration is greater for scientists who become or remain top than it is for their peers.

Using the same dataset as this last work, we will extend the analysis to include the gender variable. The objective is to measure collaboration behavior at the “international”, “domestic extramural”, and “intramural” levels for TSs, to see if this behavior differs by gender from that of their colleagues. The field of observation consists of the Italian academic system and the co-authorships of scientific publications of 11,145 professors over the 5-year period 2006–2010, catalogued according to gender, as well as by their scientific field.

The next section further describes the field of observation and the methodology for the study. The “Results and analysis” section presents the results obtained from the statistical analyses. The paper closes with the conclusions and questions for further examination.

Data and method

The research performance indicator

A fundamental requirement of this study is the identification of TSs, and therefore the measurement of individual research performance. The citation-based indicator used to measure individual research performance is the fractional scientific strength (FSS). The value of FSS is measured for professors in the sciences of Italian universities for the 2006–2010 period, with citations counted as of 30/06/2017. Because the intensity of publication varies across fields, we need to classify the population under observation into research fields. Incidentally, this also allows us to investigate whether the collaboration behavior of TSs varies across fields. In Italy each professor is classified in one and only one research field named “scientific disciplinary sector” (SDS, 370 in all). SDSs are grouped into disciplines named “university disciplinary areas” (UDAs, 14 in all). We define TSs as professors placing among the top 10% by FSS in each SDS.
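As an illustration of the classification rule, the following minimal Python sketch flags the top 10% of professors by FSS within each SDS. The record layout, the function name `flag_top_scientists`, the sample values, and the handling of quotas and ties (quota rounded up, ties broken by input order) are assumptions made for illustration; the text does not specify how ties or very small SDSs are treated.

```python
from collections import defaultdict

def flag_top_scientists(records, top_percent=10):
    """Return the ids of professors in the top `top_percent` by FSS in each SDS.

    `records` is an iterable of (professor_id, sds_code, fss) tuples.
    Assumptions (not stated in the paper): the quota per SDS is rounded up,
    and professors with equal FSS are ranked in input order.
    """
    by_sds = defaultdict(list)
    for prof_id, sds, fss in records:
        by_sds[sds].append((prof_id, fss))

    top = set()
    for members in by_sds.values():
        # Integer ceiling of top_percent% of the group size.
        quota = (len(members) * top_percent + 99) // 100
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        top.update(prof_id for prof_id, _ in ranked[:quota])
    return top

# Hypothetical data: ten professors in one SDS, five in another.
records = [(f"A{i}", "SDS-A", float(i)) for i in range(10)]
records += [(f"B{i}", "SDS-B", float(i) / 2) for i in range(5)]
top_scientists = flag_top_scientists(records)
```

Ranking within each SDS rather than over the whole population reflects the point made above that publication intensity varies across fields.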

The FSS is a proxy of the average yearly total impact of an individual’s research activity over a period of time. Here we provide the formula to measure FSS, while referring the reader to Abramo and D’Angelo (2014) for a thorough treatment of the underlying microeconomic theory, and all the limits and assumptions embedded in both the definition and the operationalization of the measurement:

$$\text{FSS} = \frac{1}{t}\sum_{i=1}^{N} \frac{c_{i}}{\bar{c}} f_{i}$$

where \(t\) is the number of years of work in the period under observation; \(N\) is the number of publications in the period under observation; \(c_{i}\) is the number of citations received by publication \(i\); \(\bar{c}\) is the average of the distribution of citations received for all cited publications in the same year and subject category as publication \(i\); and \(f_{i}\) is the fractional contribution of the professor to publication \(i\).

The fractional contribution equals the inverse of the number of authors in those fields where the practice is to place the authors in simple alphabetical order, but assumes different weights in other cases. For the life sciences, the widespread practice in Italy is for the authors to indicate the various contributions to the published research by the order of the names in the byline. For the life science SDSs, we therefore give different weights to each co-author according to their position in the list of authors and the character of the co-authorship (intramural or extramural).
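The formula can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function name `fss`, the dictionary keys and the sample numbers are hypothetical, and the fractional contribution is taken as the inverse of the number of authors. The position-based weights used for the life sciences are not reproduced here.

```python
def fss(publications, years):
    """Fractional scientific strength of one professor (illustrative sketch).

    `publications` is a list of dicts with the hypothetical keys:
      citations  -> c_i, citations received by publication i
      field_mean -> c-bar, mean citations of cited publications of the same
                    year and subject category as publication i
      n_authors  -> number of authors in the byline
    Assumption: f_i = 1 / n_authors, i.e. the alphabetical-order convention.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    total = 0.0
    for pub in publications:
        f_i = 1.0 / pub["n_authors"]
        total += pub["citations"] / pub["field_mean"] * f_i
    return total / years

pubs = [
    {"citations": 10, "field_mean": 5.0, "n_authors": 2},  # 2.0 * 0.5 = 1.0
    {"citations": 3, "field_mean": 6.0, "n_authors": 1},   # 0.5 * 1.0 = 0.5
]
example_fss = fss(pubs, years=5)  # (1.0 + 0.5) / 5
```

Dividing each citation count by the field mean before summing is what makes values comparable across subject categories with different citation habits.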

The reader is warned that evaluative scientometrics is based on: (1) the axiom that for the production of new knowledge to have an impact “on scientific advancement”, it has to be used by other scientists: no use, no impact; and (2) the assumption that citations “certify” the use of prior knowledge. As a consequence, all the usual limits, caveats, and qualifications apply, in particular: (1) publications are not representative of all knowledge produced; (2) bibliometric repertories do not cover all publications; and (3) citations do not always certify real use, nor are they representative of all use.

Dataset and data source

The source for data on each professor of Italian universities is the database maintained by the Ministry of Education, Universities and Research (MIUR), which indexes the name, gender, academic rank, field/discipline (SDS/UDA), and institutional affiliation of all professors in Italian universities, recorded at the close of each year.

The bibliographic dataset used to measure FSS is extracted from the Observatory of Public Research (ORP), a database developed by the authors and derived under license from Clarivate Analytics’ Web of Science (WoS). Beginning from the raw data of WoS and applying a complex algorithm for disambiguation of the true identity of the authors and reconciliation of their institutional affiliations, each publication is attributed to the Italian university professor that authored it, with a harmonic average of precision and recall (F-measure) equal to 97% (for details see D’Angelo et al. 2011).

Because the bibliographic repositories’ coverage of research output in arts and humanities and a number of fields within the social sciences is not completely satisfactory (Hicks 1999; Archambault et al. 2006), and particularly so in Italy, our analysis only focuses on the sciences. Professors in the sciences, totaling 39,139, are classified in 9 UDAs, namely: 1, mathematics and computer science; 2, physics; 3, chemistry; 4, earth sciences; 5, biology; 6, medicine; 7, agricultural and veterinary sciences; 8, civil engineering; 9, industrial and information engineering.

The dataset used for the analyses, taken directly from Abramo et al. (2018), is a subset of this population, and is made up of professors who satisfy the following two conditions in the period 2001–2010: (1) they are permanently on staff over the whole period, at the same university and SDS; and (2) they have authored at least one publication indexed in WoS.

Since in UDA 8 the number of female TS professors is too low (only one), we have omitted this UDA. The dataset consists of 11,145 professors (or 28.5% of the total) distributed over 175 SDSs, as indicated in Table 1. Women are just under 30% of the total population, with a peak of 48.9% in biology and a minimum of 12.9% in industrial and information engineering. The lower representation of women in the dataset, where they account for just under 27% of the total (last row, column 4), is due in part to the higher incidence of unproductive women in the period under observation. The comparison between the percentages indicated in columns 4 and 5 indicates a low concentration of females in the restricted group of TSs in almost all UDAs, the sole exception being physics, in which women represent 13.7% of the total and 12.9% of the TS. The last two columns of Table 1 highlight the different publication intensity across UDAs and, within each UDA, the higher average output of TSs compared to their non-TS colleagues.

Table 1 Dataset of the analysis, by UDA; in brackets the share of females

The collaboration propensity indicators

In order to assess the collaboration behavior we analyze the nature of co-authorships, adopting the taxonomy described in Abramo et al. (2013a). For each academic i of the dataset, we measure the propensity to collaborate overall and by type of collaboration, using the following indicators:

  • Propensity to collaborate: \(C = \frac{{cp_{i} }}{{N_{i} }}\), where \(cp_{i}\) is the number of publications resulting from collaborations (two or more co-authors in the byline) over the period, and \(N_{i}\) is the total number of publications authored by the academic i over the period;

  • Propensity to collaborate at the intramural level: \({\text{CI}} = \frac{{{\text{cip}}_{i} }}{{N_{i} }}\), where \({\text{cip}}_{i}\) is the number of publications resulting from collaborations with other academics belonging to the same university over the period;

  • Propensity to collaborate extramurally at the domestic level: \({\text{CED}} = \frac{{{\text{cedp}}_{i} }}{{N_{i} }}\), where \({\text{cedp}}_{i}\) is the number of publications resulting from collaborations with scientists belonging to other domestic organizations over the period;

  • Propensity to collaborate extramurally at the international level: \({\text{CEF}} = \frac{{{\text{cefp}}_{i} }}{{N_{i} }}\), where \({\text{cefp}}_{i}\) is the number of publications resulting from collaborations with scientists belonging to foreign organizations over the period.

These indicators vary between zero (if, in the observed period, the scientist under observation did not produce any publications resulting from the form of collaboration analyzed), and 1 (if the scientist produced all of their publications through that form of collaboration).
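The four indicators defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary keys and the sample records are hypothetical. The sketch also assumes that a collaboration means at least two authors in the byline and that one publication may count toward more than one collaboration type, so that CI, CED and CEF need not sum to C.

```python
def collaboration_propensities(publications):
    """Compute C, CI, CED and CEF for one academic (illustrative sketch).

    Each publication is a dict with the hypothetical keys:
      n_authors  -> number of authors in the byline
      intramural -> True if a co-author belongs to the same university
      domestic   -> True if a co-author belongs to another domestic organization
      foreign    -> True if a co-author belongs to a foreign organization
    """
    n = len(publications)
    if n == 0:
        raise ValueError("at least one publication is required")
    cp = sum(1 for p in publications if p["n_authors"] >= 2)
    cip = sum(1 for p in publications if p["intramural"])
    cedp = sum(1 for p in publications if p["domestic"])
    cefp = sum(1 for p in publications if p["foreign"])
    return {"C": cp / n, "CI": cip / n, "CED": cedp / n, "CEF": cefp / n}

pubs = [
    {"n_authors": 1, "intramural": False, "domestic": False, "foreign": False},
    {"n_authors": 3, "intramural": True, "domestic": False, "foreign": True},
    {"n_authors": 2, "intramural": True, "domestic": False, "foreign": False},
    {"n_authors": 4, "intramural": False, "domestic": True, "foreign": False},
]
example = collaboration_propensities(pubs)
```

In this hypothetical case three of the four publications are co-authored, two involve an intramural co-author, and one each involves a domestic extramural and a foreign co-author.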

Statistical testing

In order to respond to the research questions, we have used two types of statistical test.

At the aggregate level (overall) we have used the two-sample t test with unequal variances, to verify whether differences in gender (male vs. female) and status (TS vs. non-TS) correspond, on average, to differences in the collaboration behavior of scientists. The preliminary skewness and kurtosis normality tests have shown that none of the collaboration propensity distributions is normal. This does not raise concern, since in large samples the t test is valid for any distribution (Lumley et al. 2002). We have repeated the exercise applying non-parametric tests, which showed exactly the same results.
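For readers unfamiliar with the unequal-variance (Welch) form of the t test, the sketch below computes its test statistic and approximate degrees of freedom using only the Python standard library. The function name and the sample values are hypothetical, and the p-value, which requires the t distribution, is left out.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for the two-sample t test with unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two means (sample variances use n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical propensity values for two small groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.1, 0.2, 0.3, 0.4]
t_stat, dof = welch_t(group_a, group_b)
```

Unlike the pooled-variance test, this form does not assume that the two groups share the same variance, which matters when group sizes differ as much as they do between TSs and non-TSs.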

At the UDA level, we have used a non-parametric test (the Wilcoxon rank-sum test) because of the varying sizes of UDAs and, in a few cases, their small sizes. Moreover, the Wilcoxon rank-sum test is valid for data from any distribution and much less sensitive to outliers than the two-sample t test (Mann and Whitney 1947).

Results and analysis

All the bibliometric measures described above were calculated for the period 2006–2010, for purposes of verifying whether variation in gender (male vs. female) and status (TS vs. non-TS) correspond to variations in the collaboration behavior of scientists. For this, a t test was used. The results of the analysis at aggregate level are shown in Table 2, for all types of collaboration considered.

Table 2 Overall propensity to collaborate relative to status and gender: t test for comparison of averages (95% confidence interval in brackets)

With gender fixed, the differences between the averages (TS vs. non-TS) are consistently in favor of non-TSs, apart from the propensity to collaborate at the international level (CEF), the sole case in which TSs prevail. The female TSs register as follows:

  • propensity for international collaboration 7.9 percentage points higher (30.2% vs. 22.3%) compared to female non-TS colleagues;

  • propensity for extramural domestic collaboration lower by 2.5 percentage points (50.6% vs. 53.1%), although the difference is not statistically significant;

  • propensity for intramural collaboration lower by 8.8 percentage points (71.1% vs. 79.9%);

  • overall propensity to collaborate lower by 1.4 percentage points (97.0% vs. 98.4%).

For males, the above four differences show the same signs, at +7.7, −1.4, −7.0 and −0.5 percentage points respectively.

The comparison between women and men shows that there are no significant differences in the propensity to collaborate for the TSs. On the other hand, for the non-TSs, the differences are statistically significant and in favor of women: for their propensity to collaborate in general (98.4% vs. 97.4%); for their propensity for extramural domestic collaboration (53.1% vs. 50.8%); and for intramural domestic collaboration (79.9% vs. 77.0%); the opposite is true for international collaborations (22.3% vs. 23.4% in favor of men).

Differences among disciplines

The above analysis was repeated at the UDA level. However, given the low number of female TSs in some UDAs (e.g. eight each in earth sciences and industrial and information engineering), a non-parametric test was applied: the Mann–Whitney U test, which is equivalent to the Wilcoxon rank-sum test. In particular, the porder option of the Stata ranksum command was used, which estimates the probability that a random draw from the first group is larger than a random draw from the second. For each indicator of propensity for collaboration, the tables below show the sign of the difference observed between the two sub-sets and the relative statistical significance.
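The quantity behind the porder option can be illustrated with a short Python sketch. It estimates the probability that a value drawn from the first group exceeds one drawn from the second, which equals the Mann–Whitney U statistic divided by the product of the two group sizes. The function name and the sample values are hypothetical, ties are assumed to count as one half, and the sign rule (positive when the estimate exceeds 0.5) is an assumption about how the tables report direction.

```python
def porder(first, second):
    """Estimate P(X > Y) for X drawn from `first` and Y from `second`.

    Equals the Mann-Whitney U statistic divided by len(first) * len(second).
    Assumption: tied pairs contribute one half.
    """
    if not first or not second:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for x in first:
        for y in second:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(first) * len(second))

# Hypothetical CEF values for a handful of TSs and non-TSs.
ts_cef = [0.40, 0.30, 0.50]
non_ts_cef = [0.20, 0.30, 0.10, 0.25]
p = porder(ts_cef, non_ts_cef)
# Assumed sign rule: "+" when the first group tends to have higher values.
sign = "+" if p > 0.5 else "-" if p < 0.5 else "0"
```

Because only the ordering of values enters the computation, a single extreme value in a small UDA cannot dominate the result, which is the property that motivates the rank-based test here.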

Table 3 shows the analysis for the propensity to collaborate at international level (CEF), in each UDA. With gender fixed (F/M), in the comparison between TS and non-TS, the porder option shows positive sign (for both women and men) in all UDAs. In other words, the TSs show a greater propensity to collaborate abroad than their colleagues, regardless of gender or UDA. While for males the test is significant in all UDAs except UDA 2 (Physics), for females it is significant only in UDA 5 (Biology), 6 (Medicine) and 9 (Industrial and information engineering).

Table 3 Differences in the propensity for international collaboration (CEF) by UDA, according to gender and status

However, when status (TS/non-TS) is fixed, the comparison between women and men shows differences varying with the discipline. In particular, among the TSs, women show a lower propensity to collaborate at international level in UDA 1 (mathematics and computer science) and 3 (chemistry). In UDAs 2, 7 and 9 the differences are also in favor of men but are not statistically significant; nor are they significant in the other UDAs. On the other hand, considering the non-TS category, women show a CEF that is significantly lower than that of men in UDA 1 (mathematics and computer science) and 5 (biology); in the other UDAs the test is not significant.

Table 4 provides the analysis of propensity to collaborate at extramural domestic level (CED). Columns 2 and 3 show that differences in behavior between TSs and non-TSs are only significant in Chemistry (UDA 3) for women, and only in chemistry and physics (UDA 2) for men, in both cases in favor of non-TSs. Focusing on TSs, the comparison between women and men shows statistically significant differences in favor of the former only in physics (UDA 2). On the other hand, analyzing non-TSs, there are significant differences in favor of women in UDA 4 (earth sciences), 6 (medicine), 9 (industrial and information engineering), and in favor of men in mathematics (UDA 1).

Table 4 Differences in the propensity to extramural domestic collaboration (CED) by UDA, according to gender and status

Finally, Table 5 shows the results for propensity to collaborate at intramural level (CI). In the TS versus non-TS comparison, statistically significant differences were observed for men, and in favor of non-TSs, in all the UDAs considered. What emerged at an overall level in the previous section is confirmed at the level of individual disciplines: male TSs show a significantly lower propensity for intramural collaboration than do their non-TS male colleagues. For women, TS versus non-TS comparisons are consistently in favor of the latter, but significant in only three UDAs (chemistry, biology, medicine).

Table 5 Differences in propensity for intramural collaboration (CI) by UDA, according to gender and status

The comparison between women and men does not show significant differences for TSs in any of the cases. Instead, concerning non-TSs, in 4 UDAs (1, mathematics and computer science; 2, physics; 3, chemistry; 5, biology), propensity to collaborate at intramural level is significantly higher for women than for men.

Conclusions

Many studies in the literature agree that gender matters, both in the research performance and in the collaboration behavior of scientists. Compared to male colleagues, women seem to prefer collaborations with domestic colleagues (both intramural and extramural), while they show a lower propensity for international collaborations. It has also been shown that collaboration intensity is positively correlated with research performance and, vice versa, that research performance seems to be a driver of attractiveness for scientific collaborations.

The existence of this triangular relationship between gender, collaboration behavior and research performance, has prompted the authors to check whether gender matters in the collaboration behavior of top performers, as a natural sequel of the authors’ previous empirical studies on these interrelated topics.

The test set is composed of 11,145 professors and the coauthorship of their scientific publications over the 2006–2010 period. Examining this data, the average values for propensity to collaborate at domestic level are always lower for TSs than for their non-TS colleagues, both among men and women. On the contrary, the propensity to collaborate internationally sees the TSs prevail, without distinction for gender.

Focusing on the TSs, at the aggregate level the comparison between women and men does not show statistically significant differences in propensity for collaboration, either domestic or international; gender differences do emerge for the non-TS set, for all types of collaboration.

At the level of single disciplines, for TSs, statistically significant gender differences are limited to three cases: in Mathematics and computer science, as in Chemistry, women show a lower propensity to collaborate at the international level; in Physics, men show a lower propensity to collaborate at extramural domestic level. For non-TSs, significant gender differences emerge in some ten cases. Women show less propensity to collaborate at international level in biology and mathematics and computer science. For extramural domestic collaboration the differences are in favor of women in earth sciences, medicine, and industrial and information engineering; in favor of men in mathematics. Women also show a higher propensity to collaborate at the intramural level in the disciplines of mathematics and computer science, physics, chemistry, and biology.

Further to what is known in the literature, the results of the study suggest that the differences in collaboration behavior between males and females do not concern TSs; in particular, at the aggregate level no differences occur in the propensity to collaborate at the international level. This is consistent with the two-way positive link between international collaboration and research performance: unlike female non-TSs, female TSs have a propensity to engage in international collaboration similar to that of males.

Several gender policies have been envisaged in the Italian research system, as highlighted by the European Institute for Gender Equality (https://eige.europa.eu/gender-mainstreaming/toolkits/gear/legislative-policy-backgrounds/italy). In particular “The National Code of Equal Opportunities between Women and Men”, established by Legislative Decree No. 198 in 2006, sets the obligation for Public Administrations (and therefore Universities) to adopt a Positive Action Plan (PAP). The plan lasts 3 years and must assure the removal of all obstacles hindering equal opportunities at work between men and women. The directive of the Presidency of the Council of Ministers of 23 May 2007 identifies the instruments and the areas of intervention: positive actions aiming at balancing female representation in sectors and professional levels where they are underrepresented; the organisation of work aiming at promoting work-life balance; and hiring and promotional mechanisms targeting women.

Unfortunately, because of persistent government instability in Italy, very little beyond declarations of intent has actually been realized (one example being the extension of maternity leave to post-doc researchers).

With regard to the specific focus of this study, a few policy mechanisms might be considered. Because, all else being equal, an increase in productivity is the underlying aim of all productive systems, fostering international collaboration is an indirect way to achieve it, in particular for women, who are noticeably underrepresented among TSs. To foster the propensity of women to collaborate at the international level, a wide variety of incentives can be envisaged: increasing the freedom and responsibility of individual female researchers to form international research partnerships and attract female foreign researchers; utilizing honorary and visiting professor or research-fellow appointments to attract female external scholars for collaboration purposes; and creating internationalization offices that promote the institution’s research qualities and strengths, with a specific focus on women. Finally, funding schemes can be specifically engineered to require partnerships that include female researchers, thus facilitating bottom-up collaboration involving women.

In the interpretation of the outcomes of the analysis, we urge scholars to take into account the limitations and assumptions embedded in the bibliometric approach to the measurement of research performance and collaboration, the sensitivity of the results to the conventions and classification schemes adopted and, last but not least, the characteristics of the country system under analysis. Given this, the reproduction of this study in other countries would provide interesting interpretive keys on the phenomenon, which is clearly affected by the sociocultural features of the different national science systems. Possible future research could investigate the trends of gender differences in TSs’ collaboration behavior, through time-series analysis.