Introduction

Since Gross and Gross (1927) first employed citation count to evaluate scientific work, citation-based indicators have played essential roles in research evaluation, as a complement to peer review (Onodera and Yoshikane 2015). Previous studies have found that citation count can be influenced by many factors (e.g., Bornmann and Daniel 2008a; Tahamtan et al. 2016; Tahamtan and Bornmann 2018; Yu and Yu 2014). The influencing factors are divided into three categories: paper-related factors, journal-related factors, and author-related factors (Tahamtan et al. 2016; Tahamtan and Bornmann 2018). As one of the most important author-related factors, scientific collaboration has received increasing attention in recent years.

Scientific collaboration is defined as “the working together of researchers to achieve the common goal of producing new scientific knowledge” (Katz and Martin 1997). The significant and positive relationship between scientific collaboration and citation count has been generally accepted in the academic community (e.g., Asubiaro 2019; Moldwin and Liemohn 2018; Frenken et al. 2005; Sooryamoorthy 2009, 2017; Annalingam et al. 2014; Low et al. 2014; Ronda-Pupo et al. 2015). Tahamtan et al. (2016) comprehensively reviewed the empirical studies and found a positive relationship between scientific collaboration and citation count. However, there is still a lack of widely agreed quantitative evidence about the strength of this positive relationship. Cohen (1988) divided the strength of correlation into four groups: non-correlated (r = 0.00–0.09), weak (r = 0.10–0.29), moderate (r = 0.30–0.49), and strong (r = 0.50–1.00). For example, in Iranian publications, a strong correlation (r = 0.685) was found (Hayati and Didegah 2010), whereas there was a weak relationship (r = 0.133) in Latin-American management articles (Ronda-Pupo et al. 2015). Furthermore, some studies have failed to report a significant correlation between scientific collaboration and citation count (Bartneck and Hu 2010; Hart 2007; Bornmann et al. 2012), and some even reported a negative effect (Ahmed et al. 2016; Fu and Ho 2018; Fu et al. 2018). Therefore, it is necessary to establish a consistent estimate and to investigate the potential moderators responsible for the inconsistency across empirical studies.

Research is a highly complicated activity, with outputs and performance that are influenced by numerous factors. Exploring the contribution of collaboration to the academic impact of research can help researchers improve the impact of their research (Haslam et al. 2008). It can also be a useful reference for administrators and funders designing mechanisms to encourage effective research models (Polyakov et al. 2017). In addition, considering the time lag between a paper’s publication and being cited, the value of scientific collaboration in the early prediction of citation count can be revealed by examining the strength and consistency of the relationship (Louscher et al. 2019; Alabousi et al. 2019).

The current study applied a meta-analysis approach to systematically investigate the relationship between scientific collaboration and citation count. Meta-analysis is a method that provides a quantitative synthesis of the results from different primary studies and allows statistical comparison among subgroups to test potential moderators. Co-authorship is an important part of scientific collaboration (Kraut et al. 1987). Moreover, the co-authorship indicator is verifiable, stable over time, and easy to use (Bozeman et al. 2013). Thus, as the most accepted measurement of scientific collaboration, co-authorship was used in the current study.

Data and method

Literature searching

We adopted the following four steps to search for literature targeting the correlation between research collaboration and citation count.

1. Pre-searching. Based on the key concepts in our research, i.e., scientific collaboration and citation count, we formulated the following search strategy: (collaborat* OR cooperat*) AND (“number of citation*”). We then searched the Web of Science by subject field, browsed titles and abstracts, and read the full texts of highly relevant literature. The purpose of this step was to explore as many related search terms as possible.

2. Searching. Based on the pre-search results, four bibliographic databases, i.e., the Web of Science, Scopus, PubMed and the Library & Information Science Abstracts (LISA), were searched in December 2019, using the terms: (collaborat* OR cooperat* OR co-author* OR multi-author* OR multi-nation* OR multi-institution* OR “number of author*” OR “number of institut*” OR “number of countr*”) AND (“citation impact*” OR “citation count*” OR “number of citation*” OR “citation rate*” OR “cited time*”). We placed no limitation on document type or year of publication. Duplicates and obviously irrelevant records were removed by screening the titles and abstracts. Subsequently, 332 relevant papers were identified and their full texts were downloaded. Five papers were excluded due to the lack of full text.

3. Reference tracking. By screening the reference lists of the 327 papers, we found an additional 21 relevant ones.

4. Updating. A supplementary search was conducted in May 2020 to avoid missing newly published literature. Another 13 papers were identified by re-adopting the second and third steps. Finally, we obtained 361 papers that could potentially be included in the meta-analysis, as shown in Fig. 1.

Fig. 1 Flow chart for literature searching, inclusion and coding

Criteria for inclusion and exclusion

The main effect of the current meta-analysis was the relationship between scientific collaboration (measured by co-authorship) and citation count, both of which are continuous variables. Therefore, we used the Pearson correlation coefficient as the effect size. We did not distinguish the Spearman correlation coefficient from the Pearson correlation coefficient because they provide similar information (Morgan et al. 2013). Additionally, several statistics, such as t (Rosenthal 1991), F (Rosenthal 1991), Mann–Whitney U (Morgan et al. 2013), χ2 (Cohen 1988), Kruskal–Wallis H (Li and He 2013) and the coefficient of determination of univariate linear regression (R2) (Li and He 2013), can be transformed to correlation coefficients. These transformations enabled more studies to be included in our meta-analysis.
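As an illustration, the simplest of these conversions (a t statistic, an F statistic with one numerator degree of freedom, a χ2 with one degree of freedom, and R2 from a univariate regression) can be sketched in Python as follows; the sign of r must still be taken from the direction of the effect reported in the primary study, and this sketch is not the exact routine used in our coding.

```python
import math

def r_from_t(t, df):
    """t statistic (e.g., from a two-group comparison or a regression slope) -> r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def r_from_f(f, df_error):
    """F statistic with 1 numerator df -> r (equivalent to the t conversion)."""
    return math.sqrt(f / (f + df_error))

def r_from_chi2(chi2, n):
    """Chi-square with 1 df and total sample size n -> r (phi coefficient)."""
    return math.sqrt(chi2 / n)

def r_from_r2(r2, slope_sign=1.0):
    """Coefficient of determination of a univariate regression -> r, signed by the slope."""
    return math.copysign(math.sqrt(r2), slope_sign)

# e.g., a primary study reporting t(118) = 2.4 contributes r ~= 0.216
print(round(r_from_t(2.4, 118), 3))
```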

We established the following inclusion criteria, i.e., we included studies (1) using the co-authorship indicator (e.g., the number of authors, institutions or countries) as the independent variable, and citation count of papers as the dependent variable; (2) reporting correlation coefficients, statistics that can be transformed to correlation coefficients, or original data that can be used to calculate these statistics; and (3) reporting sample sizes. As a result, we excluded 154 papers that did not meet all three criteria.

Studies were excluded if they met any of the following criteria: (1) non-empirical studies, such as letters to editors and reviews; twelve papers were excluded accordingly. (2) Our research investigated the relationship between scientific collaboration and citation count at the paper level, so studies exploring the correlation at the paper-set level were excluded. For example, Bornmann and Daniel (2007) took the applicants of the Boehringer Ingelheim Fonds fellowship as research objects and calculated the influence of the average number of authors across all papers they published on their total citation count; forty-nine papers were excluded by this criterion. (3) Irrelevant studies, including bibliometric reports about co-authorship and citation, studies focusing on methods of assigning citations to each co-author, and studies addressing the Matthew effect of citations in co-authored papers; thirty-five papers were excluded accordingly. (4) Nineteen non-English papers were excluded.

After inclusion and exclusion criteria were applied, 92 papers were finally included in our meta-analysis, as shown in Fig. 1. A list of the included papers is provided in "Appendix A".

Coding

Based on the main effect and possible moderators, we designed the coding schema to extract variables from the 92 included papers. For example, correlation coefficients and sample sizes reported in these studies were extracted to calculate the main effect.

Moderators possibly affecting the relationship between scientific collaboration and citation count can be divided into three categories: (1) collaboration types; (2) sample characteristics in primary studies, including disciplines, countries, journals, and document types; and (3) citation characteristics in primary studies, including citation sources, citation windows and citation types.

Some previous studies have suggested that the relationships between different types of scientific collaboration and citation count are not the same. For instance, Iribarren-Maestro et al. (2009) found that institutional and international collaboration were significantly related to citation count, whereas individual collaboration was not. Asubiaro (2019) also showed that papers from international collaboration were cited more frequently, even though no significant difference between local and domestic collaboration was found. However, Gazni and Didegah (2011), using regression analysis, revealed that the influence of international collaboration on citation count was not significant. In terms of the classification of scientific collaboration, there are two approaches. One is based on the geographical distribution of collaborators (Bordons et al. 1996), classifying collaboration as local (i.e., collaborators from the same institution), domestic (i.e., collaborators from different institutions in the same country) and international (i.e., collaborators from different countries). The other is based on the granularity of collaboration (Didegah and Thelwall 2013), classifying it as individual, institutional, and international collaboration. Since most existing studies have quantified scientific collaboration by the number of authors, institutions, or countries, we followed the latter classification and coded the scientific collaboration in primary studies as “Individual,” “Institutional,” and “International.”

Some studies have shown that sample characteristics, such as disciplines (Puuska et al. 2014; Shehatta and Mahmood 2016; Van Wesel 2014), countries (Leimu and Koricheva 2005; Chi and Glanzel 2016, 2017; Thelwall and Maflahi 2019), journals (Rousseau and Ding 2016; Ibanez et al. 2013; Peclin et al. 2012) and document types (Abramo and D'Angelo 2015; Sin 2011; Muniz et al. 2018), could influence the correlation between scientific collaboration and citation count, indicating their moderating effects. The coding schema for these variables is as follows: (1) disciplines, (2) countries, (3) journals and (4) document types.

In terms of disciplines, Puuska et al. (2014) and Shehatta and Mahmood (2016) classified disciplines by a single research domain, e.g., Arts & Humanities, Social Sciences, and Natural Sciences. Van Wesel (2014) also conducted research in specific subject areas, e.g., Sociology, General & Internal Medicine, and Applied Physics. Since most primary studies in our meta-analysis used Web of Science as the data source, and it is feasible to group subject areas into research domains, we referred to the research domain categories of Web of Science and coded disciplines as Arts & Humanities, Social Sciences, Life Sciences and Biomedicine, Physical Sciences, and Technology. This variable was coded as null if a primary study involved more than one research domain.

As for countries, Chi and Glanzel (2016, 2017) and Thelwall and Maflahi (2019) selected samples from individual countries, e.g., Iran, Israel, and Belgium. Leimu and Koricheva (2005) also based their sampling on the geographical positions of countries and collected data from the US and Europe. Although the level of economic development is positively correlated with the scientific wealth of a country (Kumar et al. 2016; Hatemi-J et al. 2016), these levels often differ among countries on the same continent. Thus, we divided the countries into developed and developing countries according to the list of advanced economies from the International Monetary Fund, instead of by continent. This variable was coded as null if the primary studies included samples from both developed and developing countries.

For the third category, journals, Ibanez et al. (2013) and Peclin et al. (2012) used journal impact factor (JIF) quartiles in the Journal Citation Reports (JCR) to characterize their samples. Rousseau and Ding (2016) also collected samples from three individual journals, i.e., PNAS, Science and Nature. We classified journals by JCR JIF quartiles since most primary studies in this meta-analysis used Web of Science as the data source, and individual journals could easily be mapped to JIF quartiles. The JCR version used is either the year in which a paper was published (Ibanez et al. 2013; Peclin et al. 2012) or a designated year (Low et al. 2014; Bales et al. 2014). In the current meta-analysis, a number of primary studies contained samples published before 1997, when the first version of JCR was issued; therefore, we chose 2018 as the reference year. Following the 2018 JCR, journals in Q4 to Q1 were scored 1 to 4, respectively. For journals belonging to several subject areas (with different JIF quartiles), the subject areas were identified according to the primary studies. If the samples of a primary study were published in different journals, the study was coded with the arithmetic mean of the journal scores. In addition, this variable was marked as null when the samples were from non-indexed journals.

The fourth classification is by document type. Most primary studies in our meta-analysis conducted their research with articles or reviews in Web of Science (e.g., Abramo and D'Angelo 2015; Sin 2011; Muniz et al. 2018). Following their coding of document types, we classified them as “Article,” “Review,” and “Both.” Although some primary studies characterized their samples as journal papers and conference papers (e.g., Ibanez et al. 2013), we coded these samples as “Both” because both articles and reviews can be published in journals and proceedings.
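The journal-scoring rule described above (Q1–Q4 scored 4–1, averaged over the journals covered by a study) can be illustrated with a minimal sketch; the quartile lists and the helper name journal_code are hypothetical and only illustrate the coding logic.

```python
# Hypothetical illustration of the journal-coding rule: Q1-Q4 scored 4-1,
# averaged over the journals covered by a primary study; non-indexed -> null.
QUARTILE_SCORE = {"Q1": 4, "Q2": 3, "Q3": 2, "Q4": 1}

def journal_code(quartiles):
    """quartiles: list of 2018-JCR quartiles for the journals in a primary study."""
    if not quartiles:          # samples from non-indexed journals
        return None
    scores = [QUARTILE_SCORE[q] for q in quartiles]
    return sum(scores) / len(scores)

print(journal_code(["Q1", "Q1", "Q3"]))   # 3.33...
```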

The variation of effect sizes across studies was also bound up with citation characteristics, such as citation sources (Garcia-Aroca et al. 2017; Louscher et al. 2019), i.e., the data sources used to collect citation counts; citation windows (Bornmann and Daniel 2008b; Onodera and Yoshikane 2015), i.e., the interval between publication and citation observation; and citation types (Clements 2017; Leimu and Koricheva 2005). Citation sources included Web of Science, Scopus, Google Scholar, and “Other.” We grouped sources such as CNKI, PubMed, and journal websites into “Other” because few primary studies (N = 8) collected samples from these sources. In terms of citation windows, Abramo et al. (2011) suggested that a citation window of two or three years would be long enough to guarantee the robustness of citation counts in impact measurement. Liu et al. (2015) also found that papers reach their citation peak in the third year after publication, which indicated that the third year was a reasonable cutoff. Therefore, we coded citation windows as “annual,” “1–3 years,” and “≥ 4 years.” A citation window was coded as null if it was too long to be classified. Notably, the year of citation observation was counted as the current year if the observation occurred after July; otherwise, it was counted as the previous year. For example, Muniz et al. (2018) collected the citation counts of papers published between 2000 and 2015 on April 6, 2017, so the year of citation observation was counted as 2016, resulting in a citation window of 2–17 years. Citation types included “peer-citations” and “self-citations” (Clements 2017). They were coded as “total-citations” when both peer-citations and self-citations were considered, or when no citation type was reported in the primary studies.

All variables except citation types were coded as null if not reported. Two authors independently coded the included papers and compared the results. Disagreements were resolved through discussion between the first two authors; the third author joined the discussion if necessary.

Meta-analytic method

Since the studies included in our meta-analysis were independent rather than drawn from a homogeneous population, it is unreasonable to assume that the true effects of all studies were the same. Therefore, a random-effects model was used to calculate the mean effect size. The correlation coefficient (r) was transformed to Fisher’s z, which served as the effect size (ES) in the meta-analysis procedures (Formula 1) (Borenstein et al. 2009). The within-study standard error (SE) of Fisher’s z was calculated using Formula 2 (Borenstein et al. 2009). Fisher’s z was transformed back to a summarized correlation coefficient when reporting the results (Formula 3) (Borenstein et al. 2009).

$$ES=\frac{1}{2}ln\left(\frac{1+r}{1-r}\right)$$
(1)
$$SE=\frac{1}{\sqrt{n-3}}$$
(2)
$$r=\frac{{e}^{2ES}-1}{{e}^{2ES}+1}$$
(3)
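A minimal Python sketch of Formulas 1–3, combined with a random-effects pooling step, is given below. The DerSimonian–Laird estimator of the between-study variance is used here purely as an illustrative choice; the analyses reported in this paper were run in Stata, and the exact estimator is not specified above.

```python
import math

def fisher_z(r):
    """Formula 1: correlation coefficient -> Fisher's z."""
    return 0.5 * math.log((1 + r) / (1 - r))

def se_z(n):
    """Formula 2: within-study standard error of Fisher's z."""
    return 1 / math.sqrt(n - 3)

def z_to_r(es):
    """Formula 3: Fisher's z -> correlation coefficient."""
    return (math.exp(2 * es) - 1) / (math.exp(2 * es) + 1)

def random_effects_mean(rs, ns):
    """Pool correlations under a random-effects model (illustrative
    DerSimonian-Laird estimate of tau^2, not necessarily the Stata routine)."""
    es = [fisher_z(r) for r in rs]
    v = [se_z(n) ** 2 for n in ns]
    w = [1 / vi for vi in v]                                  # fixed-effect weights
    mean_fe = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    q = sum(wi * (ei - mean_fe) ** 2 for wi, ei in zip(w, es))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(es) - 1)) / c)                  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                      # random-effects weights
    mean_re = sum(wi * ei for wi, ei in zip(w_re, es)) / sum(w_re)
    return z_to_r(mean_re), tau2

print(random_effects_mean([0.10, 0.25, 0.05], [500, 1200, 300]))
```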

To assess the reliability of the main effect, we conducted tests of publication bias and heterogeneity. Publication bias was examined using a funnel plot, Egger’s regression test, a p-curve, and Rosenthal’s fail-safe N. Cochran’s Q test, I2, the funnel plot, and the prediction interval (PI) were used to evaluate heterogeneity. In addition, we divided the studies into subgroups according to the potential moderators and investigated their effects using between-subgroup analysis of variance (ANOVA) and between-subgroup z-tests. Stata 16.0 was used for these analyses, and the level of significance was 0.05.

Results

Coding results

Information about all variables was extracted from the 92 papers (see detailed coding results in Online Appendix B). Some papers reported multiple individual studies. For instance, Thelwall and Sud (2014) examined the difference in citation counts between co-authored and single-authored papers in 30 disciplines, so 30 correlation coefficients were extracted from this paper. Finally, 340 correlation coefficients were included (Fig. 1). Effect sizes (i.e., Fisher’s z) transformed from these coefficients ranged from − 0.400 to 0.838, and their sample sizes ranged from 38 to 12,021,209. In addition, the included papers were published between 1975 and 2020.

Main effect

A total of 340 effect sizes were synthesized with a random-effects model, yielding a mean effect size of 0.147 with a confidence interval of [0.136, 0.158] (Online Appendix C). The z-test of the mean effect size was also significant (z = 25.77, p < 0.001). The summarized correlation coefficient was 0.146 after back-transformation, showing a positive but weak correlation between scientific collaboration and citation count (Cohen 1988).

Test of publication bias

Publication bias is a common issue in meta-analyses. It arises because studies with significant results are more likely to be published, and these published studies are more likely to be included in a meta-analysis. Consequently, studies with smaller effect sizes and sample sizes are easily omitted, leading to an overestimated mean effect size. The mean effect size will be unreliable if the publication bias is too large (Borenstein et al. 2009).

A funnel plot can be used for a qualitative examination of publication bias. As shown in Fig. 2, the distribution of effect sizes was asymmetric, with many in the upper right, indicating publication bias. In contrast, there was no obvious asymmetry at the bottom, suggesting that few primary studies with small effects and sample sizes were missed and that the publication bias was small. We also performed Egger’s regression to quantitatively analyze publication bias. In Egger’s regression, the bias coefficient was 2.16 (t = 3.56, p < 0.001), so we rejected the null hypothesis and accepted the alternative hypothesis that publication bias existed.
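A sketch of Egger’s test in its common form (regressing the standardized effect on precision and testing the intercept) is shown below; it relies on statsmodels and hypothetical input values, and is not the exact Stata routine used in this study.

```python
import numpy as np
import statsmodels.api as sm

def egger_test(es, se):
    """Egger's regression test for funnel-plot asymmetry: regress the
    standardized effect (ES/SE) on precision (1/SE); the intercept is the
    bias coefficient, and its t-test indicates asymmetry."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    snd = es / se                      # standard normal deviate
    precision = 1.0 / se
    fit = sm.OLS(snd, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]   # bias coefficient and its p-value

bias, p = egger_test([0.15, 0.30, 0.05, 0.22], [0.02, 0.10, 0.01, 0.05])
print(round(bias, 2), round(p, 3))
```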

Fig. 2 Funnel plot for the effect sizes

We further investigated whether the mean effect size was an artifact of this bias. First, we conducted a p-curve analysis, i.e., a histogram of the p-values of individual effect sizes (Fig. 3). As shown in Fig. 3a, the number of effect sizes increased as p-values decreased, and the p-values of most effect sizes were less than 0.05. Figure 3b also shows that the majority of p-values clustered around zero, which indicates the reliability of the mean effect (Simonsohn et al. 2014). In addition, Rosenthal’s fail-safe N was 2,755,856.94, far larger than the reference value of 1710 (5k + 10, with k the number of effect sizes), showing that 2,755,857 additional null-result studies would be needed to change the significance of the mean effect size. In sum, although there was some publication bias in our meta-analysis, its influence on the mean effect size was limited.
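Rosenthal’s fail-safe N and the 5k + 10 tolerance level can be sketched as follows; the per-study z values are assumed here to be each Fisher’s z divided by its standard error, which is one common way to obtain them.

```python
def fail_safe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe N: the number of unpublished null studies needed
    to raise the combined one-tailed p above .05. z_values are per-study z
    statistics (here assumed to be Fisher's z divided by its standard error)."""
    k = len(z_values)
    return (sum(z_values) ** 2) / z_alpha ** 2 - k

def tolerance_level(k):
    """Rosenthal's 5k + 10 reference value."""
    return 5 * k + 10

# The synthesis is considered robust when fail_safe_n(...) exceeds this value;
# for the 340 effect sizes above, the threshold is 1710.
print(tolerance_level(340))
```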

Fig. 3 P-curve analysis. a The distribution of p-values for all effect sizes; 253 statistically significant effect sizes (p < 0.05) compared with 87 non-significant values. b The distribution of p-values for the 253 statistically significant effect sizes

Tests for heterogeneity

Heterogeneity is usually examined with Cochran’s Q test (Q-value) and I2 (Formulas 4 and 5), where ESi is the effect size of the ith study, Si is its within-study standard error, U is the mean effect size, and k is the number of effect sizes. Q-values follow a χ2 distribution with k − 1 degrees of freedom under the hypothesis that all studies share a common effect size (Borenstein et al. 2009). Higgins et al. (2003) also suggested that an I2 of 25%, 50%, and 75% indicates a low, moderate, and high extent of heterogeneity, respectively, and that conducting a meta-analysis is inappropriate when I2 > 75%.

$$Q=\sum_{i=1}^{k}\frac{{\left({ES}_{i}-U\right)}^{2}}{{S}_{i}^{2}}$$
(4)
$${I}^{2}=\frac{Q-k+1}{Q}$$
(5)
$$Q=\sum_{i=1}^{k}\left({n}_{i}-3\right){\left({ES}_{i}-U\right)}^{2}$$
(6)
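A short sketch of Formulas 4 and 5 is given below; the fixed-effect weighted mean is used for U as an illustrative choice, and the comment notes how Formula 6 follows when SE = 1/sqrt(n − 3).

```python
def cochran_q(es, se):
    """Formula 4: weighted squared deviations of the effect sizes from U;
    the fixed-effect weighted mean is used for U as an illustrative choice."""
    w = [1 / s ** 2 for s in se]          # with SE = 1/sqrt(n-3), w = n-3 (Formula 6)
    u = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    return sum(wi * (ei - u) ** 2 for wi, ei in zip(w, es))

def i_squared(q, k):
    """Formula 5: share of total variation attributable to between-study heterogeneity."""
    return max(0.0, (q - (k - 1)) / q)

q = cochran_q([0.10, 0.20, 0.15], [0.02, 0.03, 0.01])
print(round(q, 2), round(i_squared(q, 3), 3))
```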

The Q-value of this meta-analysis was 40,271.11 (p < 0.001), and I2 equaled 99.2%, suggesting large heterogeneity among the included studies and implying that conducting the current meta-analysis might not be reasonable. In the following paragraphs, we explain why Cochran’s Q test and I2 are not appropriate for assessing the heterogeneity of a meta-analysis that includes studies with large samples.

Formula 6 is derived from Formulas 2 and 4, indicating that Q-values and sample sizes (ni) are interrelated. For the included studies, the average sample size was as high as 7,266.71 after excluding the maximum and the minimum, which necessarily resulted in a high Q-value. Given that heterogeneity is defined as the variation of the true effect across studies, with no relation to sample size (Parr et al. 2019), the large Q-value of this meta-analysis resulted from the high statistical power of the test.

I2 is essentially the ratio of between-studies variance to total variance, the latter consisting of both between-studies variance and within-study variance (Borenstein et al. 2017). Within-study variance approaches zero when the sample size is extremely large (Formula 2), and thus I2 is close to 100%. Xie et al. (2019) also examined this phenomenon by simulation, finding that when the sample sizes in primary studies ranged from 50 to 100, the I2 of 89% of the simulated meta-analyses exceeded 75%. As the sample sizes increased further, almost all I2 values exceeded 80%.

Instead, we used the funnel plot to examine heterogeneity. In the current study, as the within-study standard error decreased, the individual effect sizes tended to converge towards the mean effect size, with a few exceptions (Fig. 2), suggesting some extent of heterogeneity (Wake et al. 2020). We also calculated the prediction interval (PI) to estimate the range of true effects among primary studies (Parr et al. 2019). In our meta-analysis, the 95% PI of Fisher’s z was [− 0.039, 0.333], and the PI of the corresponding correlation coefficient was [− 0.039, 0.321]. This indicates that in a universal set of studies investigating the relationship between scientific collaboration and citation count, 95% of the correlation coefficients would fall between − 0.039 and 0.321, showing non-correlation or a weak correlation (Cohen 1988). Therefore, the degree of dispersion among studies was small. Based on the funnel plot and the PI, although there was some heterogeneity in our meta-analysis, the mean effect size was still reliable.
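The prediction interval can be sketched with a Higgins-style formula as below; the between-study variance (tau2) and the standard error of the mean used in the example are assumed values chosen only to be roughly consistent with the figures reported above, not quantities taken from the analysis itself.

```python
import math
from scipy import stats

def prediction_interval(mean_es, se_mean, tau2, k, level=0.95):
    """Higgins-style prediction interval for the true effect in a new study:
    mean +/- t_(k-2) * sqrt(tau^2 + SE(mean)^2), on the Fisher's z scale."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=k - 2)
    half = t_crit * math.sqrt(tau2 + se_mean ** 2)
    return mean_es - half, mean_es + half

def z_to_r(es):
    return (math.exp(2 * es) - 1) / (math.exp(2 * es) + 1)

# Assumed inputs, roughly consistent with the reported mean effect and CI:
lo, hi = prediction_interval(0.147, 0.0056, 0.009, 340)
print(round(z_to_r(lo), 3), round(z_to_r(hi), 3))
```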

Moderators analysis

In this section, we examine whether moderators could account for the heterogeneity. First, we divided the data into subgroups based on the classification of potential moderators to indicate their influences on the main effect (Table 1).

Table 1 The result of subgroup division

We then employed a between-subgroup ANOVA to identify moderators that significantly affected the mean effect size (Formula 7). In the formula, Q is the Q-value of the overall meta-analysis, Qi is the Q-value of the ith subgroup, and j is the number of subgroups. Under the hypothesis of no heterogeneity across subgroups, \({Q}_{\mathrm{bet}}\) follows a χ2 distribution with j − 1 degrees of freedom (Borenstein et al. 2009).

$${Q}_{\mathrm{bet}}=Q-\sum_{i=1}^{j}{Q}_{i}$$
(7)
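A minimal sketch of Formula 7 and the accompanying χ2 test, using hypothetical Q-values, might look like this:

```python
from scipy import stats

def q_between(q_overall, q_subgroups):
    """Formula 7: Q_bet = Q - sum(Q_i). Under the null of no between-subgroup
    heterogeneity, Q_bet follows a chi-square distribution with j - 1 df."""
    q_bet = q_overall - sum(q_subgroups)
    p = stats.chi2.sf(q_bet, df=len(q_subgroups) - 1)
    return q_bet, p

# Hypothetical example with three subgroups:
print(q_between(120.0, [40.0, 35.0, 30.0]))   # Q_bet = 15.0, p ~ 0.0006
```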

Due to limited data availability in the primary studies, the numbers of effect sizes differed among moderators (Table 2). As shown by the \({Q}_{\mathrm{bet}}\) and p values, disciplines, countries, document types, and citation sources exerted a significant effect on the relationship between scientific collaboration and citation count, whereas the other potential moderators did not. It is noteworthy that the results of the between-subgroup ANOVA could be influenced by the primary studies within individual subgroups. For example, the non-significant moderating effect of journals might be influenced by the fourth subgroup (value = 1), whose between-study variance was large (T2 = 0.1323) and whose number of effect sizes was small (n = 7) (Table 1).

Table 2 The result of the between-subgroup ANOVA

The between-subgroup ANOVA revealed whether the mean effect sizes of the subgroups were significantly different. To further explore the relationships among the subgroup mean effect sizes, between-subgroup z-tests (Formula 8) were performed for significant moderators with more than two subgroups, i.e., disciplines, document types, and citation sources. The moderator “countries” had only two subgroups, so no between-subgroup z-test was conducted for it. In Formula 8, ESa and ESb are the mean effect sizes of the two subgroups, and Va and Vb are their respective variances (Borenstein et al. 2009).

$$\mathrm{z}=\frac{{\mathrm{ES}}_{\mathrm{a}}-{\mathrm{ES}}_{\mathrm{b}}}{\sqrt{{\mathrm{V}}_{\mathrm{a}}+{\mathrm{V}}_{\mathrm{b}}}}$$
(8)
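Formula 8 can be sketched directly, using hypothetical subgroup means and variances:

```python
import math
from scipy import stats

def subgroup_z_test(es_a, var_a, es_b, var_b):
    """Formula 8: z-test comparing two subgroup mean effect sizes (Fisher's z),
    where var_a and var_b are the variances of those means."""
    z = (es_a - es_b) / math.sqrt(var_a + var_b)
    p = 2 * stats.norm.sf(abs(z))      # two-sided p-value
    return z, p

# Hypothetical subgroup means and variances:
print(subgroup_z_test(0.172, 0.0001, 0.137, 0.00012))
```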

Countries. The mean effect sizes of developed and developing countries (Table 1), as well as the difference between them (Table 2), were all significant. The relationship between scientific collaboration and citation count was weaker in developed countries (r = 0.112) and stronger in developing countries (r = 0.180). This result supports the findings of Chi and Glanzel (2016, 2017).

Disciplines. Scientific collaboration was significantly and positively related to citation count in all research domains (Table 1). Table 3 also shows that the correlation in Life Sciences & Biomedicine was significantly larger than the correlations in Technology, Physical Sciences, and Arts & Humanities. Although Technology and Physical Sciences rely more on expertise and skills, the correlations in these two domains were significantly smaller than that in Social Sciences. There was no significant difference among Technology, Physical Sciences, and Arts & Humanities, or between Life Sciences & Biomedicine and Social Sciences.

Table 3 The results of between-subgroup z-tests for “Disciplines”

Document types. Scientific collaboration and citation count were significantly and positively related in each subgroup, with the highest correlation coefficient in the “Article” subgroup (r = 0.171) (Table 1). As shown in Table 4, the correlation coefficient between scientific collaboration and citation count in “Article” was significantly higher than that in “Both” (r = 0.136). Although “Review” had a lower mean effect size than “Article,” the difference between them was not significant (p = 0.279).

Table 4 The results of between-subgroup z-test for “document types”

Citation sources. As shown in Table 1, scientific collaboration showed no significant relation to citation count when Google Scholar was the citation source (p = 0.502). With Web of Science and Scopus as citation sources, the correlations were significantly higher than those based on Google Scholar and other sources (Table 5). Although the correlation coefficient for Web of Science (r = 0.146) was smaller than that for Scopus (r = 0.173), the difference was not significant (p = 0.148); the same held between Google Scholar and other sources.

Table 5 The results of between-subgroup z-test for “citation sources”

Discussion

Main effect

We quantitatively summarized the extant studies about the relationship between scientific collaboration and citation count using meta-analysis and found that the correlation coefficient between them was 0.146, showing a positive and weak correlation. The results of the funnel plot, Rosenthal’s fail-safe N, and prediction interval supported the reliability of this result.

The positive correlation between scientific collaboration and citation count suggests the benefits of collaboration. Three aspects of such benefits have been reported by previous studies. For researchers, scientific collaboration allows the sharing and transferring of knowledge, skills, or techniques to promote their academic competence (Katz and Martin 1997). For research teams, collaboration plays a critical role in developing scientific and technical human capital and in raising more funds (Bozeman and Corley 2004). It can also create rigorous internal reviews for team building (van Wesel et al. 2014). In terms of research outputs, the clash of views and cross-fertilization of ideas brought by scientific collaboration contribute considerably to knowledge recombination and output innovation (Katz and Martin 1997; He et al. 2009; Talke et al. 2011). These benefits are essential for scholars and their teams to conduct superior research and produce high-quality publications that are widely cited. Some studies have also supported the advantages of co-authored papers, compared to non-co-authored papers, in peer-review scores (Franceschet and Costantini 2010), acceptance rates of journals (Tregenza 2002), JIF (Sahu and Anda 2014), and methodological quality (Cartes-Velasquez and Manterola 2017). In addition to quality, the increased opportunities for self-citation (Lin and Huang 2012) and the visibility gained through larger social networks in the community (Katz and Martin 1997; Goldfinch et al. 2003; van Wesel et al. 2014) might contribute to more citations of co-authored papers.

The weak correlation found in this study resonates with the inverted “U” relationship between scientific collaboration and citation counts (Lariviere et al. 2015; Hsiehchen et al. 2015; Quan et al. 2019; Acedo et al. 2006). The inverted “U” relationship suggests an optimal team size in collaborative activities, which can be explained by the costs of scientific collaboration. For example, various differences (including favorable differences that underpin the collaboration and incidental differences that undermine its achievement) exist among collaborators, and managing these differences incurs coordination costs (Bammer 2008). Moreover, coordination costs have been found to statistically mediate the negative influence of scientific collaboration (Cummings and Kiesler 2007). When team size increases, both favorable and incidental differences are likely to increase, and more coordination costs are needed to manage them. In short, an oversized team may receive limited profits, or even losses, from scientific collaboration. In addition, scientific collaboration requires time and resources from researchers (Godin and Gingras 2000) and introduces challenges in the allocation of credit and responsibilities (Wray 2006), which might reduce the efficiency and motivation of collaborators.

Moderating effects

The between-subgroup ANOVA identified significant moderating effects of disciplines, countries, document types, and citation sources on the main effect. The moderating effect of disciplines possibly resulted from the diverse research practices in different domains. For example, the higher mean effect size in Social Sciences might be because Social Sciences research is often multi-paradigmatic and produces arguments and interpretations; thus, the perceived credibility may increase when there are several authors (Puuska et al. 2014). The correlation between scientific collaboration and citation count was also higher in Life Sciences & Biomedicine, which might be explained by two reasons. First, Life Sciences & Biomedicine research requires more instruments, ideas, analyses and interpretations, calling for more collaboration among researchers to improve research quality (Shehatta and Mahmood 2016). Second, Chinchilla-Rodriguez et al. (2012) found that biomedical publications accounted for around 30% of the outputs worldwide, far more than other research domains; papers in this domain were therefore more likely to be cited with higher frequency due to the large number of potential citing papers. In addition, the weaker correlation between scientific collaboration and citation count in Arts & Humanities is possibly due to the fact that this domain has the lowest extent of scientific collaboration among all domains (Franceschet and Costantini 2010; Wuchty et al. 2007). Finally, since team sizes in Physical Sciences and Technology are generally large (Wuchty et al. 2007), the profits of scientific collaboration could be limited by excessive coordination costs, which might lead to the lower correlation between scientific collaboration and citation count.

This study revealed that the mean effect size of developed countries was lower than that of developing countries. The level of economic development of a country is positively correlated with its scientific wealth (Kumar et al. 2016; Hatemi-J et al. 2016). For instance, developed countries can invest more money and resources in research activities, increasing the quality of their scientific outputs (Allik et al. 2020). Therefore, papers from developed countries may be cited frequently more because of the perceived quality of the research than because of scientific collaboration. In addition, the moderating effect of countries may reflect cultural differences. Hofstede (2011) found that individualism is prevalent in developed and Western countries, whereas collectivism is more dominant in developing countries. Researchers from countries with a high degree of individualism might be less willing to achieve common goals through scientific collaboration (Thelwall and Maflahi 2019).

Document types reflect the nature of studies. For example, reviews aim at reviewing the scientific literature on a particular topic, whereas articles present new results (Abramo and D'Angelo 2015). Therefore, the benefits of knowledge recombination and output innovation from scientific collaboration might be more significant in articles than in reviews. Additionally, the higher mean effect size of articles is possibly related to citation practices. As Lachance et al. (2014) suggested, researchers tend to cite the corresponding original papers instead of the current (secondary) source when content worth citing is found in a review.

Various citation sources provide different coverage of scholarly publications, leading to different citation counts for the same paper (Bakkalbasi et al. 2006), which possibly explains the moderating effect of citation sources. For example, Web of Science and Scopus cover publications from multiple disciplines (Mongeon and Paul-Hus 2015), so papers can achieve higher citation counts when citations are collected from these databases. Moreover, Scopus provides broader coverage of publications than Web of Science in all fields, especially in Social Sciences and Arts & Humanities (Mongeon and Paul-Hus 2015), which could explain the larger correlation coefficient for Scopus than for Web of Science. In contrast, other citation sources often have narrow and limited coverage of the literature; for instance, PubMed mainly includes academic publications from Life Sciences & Biomedicine. Although Google Scholar has broader coverage than Web of Science and Scopus (Harzing and Alakangas 2016), its mean effect size was significantly lower, which might result from the unreliability and lack of transparency of its citation data (Mingers and Lipitakis 2010).

The current study found that all types of scientific collaboration correlated positively with citation counts and that there were no significant differences among them. This result addresses the inconsistency in previous studies (Iribarren-Maestro et al. 2009; Gazni and Didegah 2011; Didegah and Thelwall 2013; Fu and Ho 2018; Sud and Thelwall 2016), for two possible reasons. On the one hand, although the classification of scientific collaboration used in this study has been commonly applied (e.g., Didegah and Thelwall 2013; Fu and Ho 2018), different types of collaboration overlap. For example, Didegah and Thelwall (2013) and Sud and Thelwall (2016) found that the number of authors of a publication correlates moderately with the number of institutions and the number of countries. Bordons et al. (2013) further showed that the increase in the number of institutions was the main reason for the rise in the number of authors. On the other hand, the comparable effect sizes of coarse-granularity collaboration (i.e., institutional and international collaboration) and individual collaboration might be due to coordination costs. Members from different institutions and countries constitute a potential source of collaboration diversity (Bordons et al. 2013), and thus more investment in coordination is required (Bammer 2008). Some empirical studies have also revealed the negative effects of coordination costs in institutional and international collaboration (e.g., Cummings and Kiesler 2007; Wagner et al. 2019).

Although it is generally accepted that scientific collaboration increases the number of self-citations, whether self-citation plays a major role in improving citation counts remains controversial (Smart and Bayer 1986; Herbertz 1995; Van Raan 1998; Aksnes 2003). In our study, citation types did not have a significant moderating effect. In particular, we failed to demonstrate that the correlation between scientific collaboration and self-citation counts was higher than that between scientific collaboration and peer-citation counts. Therefore, the current study does not support a major role of self-citation in increasing the citation counts of co-authored papers.

Reasons for publication bias

Although Egger’s regression analysis and the funnel plot showed some publication bias in the current study, the reasons for this bias might differ from those in previous meta-analyses. These differences are related to the characteristics of bibliometric studies. On the one hand, traditional publication bias results from the lower visibility of publications with fewer samples and non-significant effect sizes (Borenstein et al. 2009). However, bibliometric studies generally have large sample sizes: in this meta-analysis, the average sample size of the included studies was 7,266.71, and even the minimum sample size (N = 38) met the criterion of a large sample in statistics (N > 30) (Li and He 2013). Therefore, it is less likely that bibliometric studies remain unpublished due to non-significant results arising from insufficient statistical power. On the other hand, the diverse research methods in the primary studies were probably the main source of publication bias. Although many papers were included after transforming their statistics to correlation coefficients, others were excluded because their results could not be transformed. For instance, multivariate linear regressions and generalized regressions were widely employed to investigate the influencing factors of citation counts in primary studies, but there are few applicable approaches for transforming their results to correlation coefficients.

Conclusion and implications

Using a meta-analysis of the relevant primary studies, we found a positive and weak correlation between scientific collaboration and citation count. The correlation in Life Sciences & Biomedicine and Social Sciences was higher than that in other research domains. The correlation was also higher in articles and in publications from developing countries. In addition, scientific collaboration was more closely associated with citation count when Web of Science or Scopus was used as the citation source.

The results of this study provide practical implications for both research administrators and researchers. For research administrators, especially those from developing countries, incentives for scientific collaboration are recommended to improve the knowledge and skills of domestic scholars and to expand their academic impact. Since significant and positive effect sizes were found in all research domains, administrators should not only pay particular attention to Life Sciences & Biomedicine and Social Sciences, which had the highest mean effect sizes, but also provide the same or even more support for other disciplines. For example, collaboration in Physical Sciences and Technology should be encouraged because of their heavy reliance on expertise and skills. For citation-based evaluation, Web of Science and Scopus should be used as the citation sources to avoid underestimating the performance of scientific collaboration. In addition, reasonable self-citation by researchers is acceptable, and it is unnecessary to eliminate self-citations in evaluations. As for researchers, active participation in scientific collaboration is encouraged across all disciplines and countries, particularly when conducting exploratory and innovative research. Researchers should also pay close attention to the efficiency of scientific collaboration, e.g., by choosing competent partners and avoiding the blind pursuit of large teams or international collaboration.

This study also offers guidelines for assessing heterogeneity in meta-analyses. Indicators providing different information should be reported together to comprehensively reveal heterogeneity. For example, Q-value (i.e., the ratio of the dispersion of effect sizes to the within-study variance), I2 (i.e., the ratio of between-studies variance to total variance), and prediction interval (i.e., the dispersion of effect sizes in a universal set of relevant studies) are recommended.

The quality of this meta-analysis was restricted by the availability and quality of the data in the primary studies (Ellis et al. 2011). For instance, only 92 of the 361 retrieved papers reported the required data, resulting in publication bias. Although the significance of the mean effect sizes differed across the “journals” subgroups, the result of the between-subgroup ANOVA for this moderator was not significant; future studies are needed to explore the moderating effect of journals. Furthermore, this meta-analysis was secondary research based on the results of extant studies. Although a positive and weak correlation between scientific collaboration and citation counts and some moderators have been identified, more detailed and qualitative analysis is required to draw stronger conclusions about the underlying reasons.

Compliance with ethical standards

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.