Introduction

There has been much research over the years into the factors that affect the number of citations that a published article attracts. Typical factors include the number of authors (Smart and Bayer 1986), the journal impact factor (Elkins et al. 2010), whether international collaboration is involved (Inzelt et al. 2009; Prathap 2013), whether a journal is open access (Craig et al. 2007), author reputations (Makino 1998) etc. Recent examples of such studies include those by Thelwall and Wilson (2014), Onodera and Yoshikane (2015) and by Tahamtan et al. (2016), whose detailed review identified no fewer than 28 factors in three broad categories, these relating to the article itself, to the article’s author(s), and to the journal in which the article was published.

Such studies have focused on cited articles, but there has also been some interest in uncitedness, i.e., the study of articles that never attract any citations (Hu and Wu 2014; Hu et al. 2018; Schwartz 1997; Zhao 2015). There are many reasons why a particular publication might remain uncited: it may be of low quality, be poorly written or difficult to understand, be published in an inappropriate journal, or be valuable but undiscovered or forgotten. That said, MacRoberts and MacRoberts (2010) have suggested that even uncited publications may influence subsequent research in a range of ways. There have been several studies of the statistical characteristics of uncited publications (e.g., Burrell 2013; Egghe 2010; Liang et al. 2015). Lou and He (2016) reported a correlation between uncitedness and author affiliations. Stern (1990) analysed the bibliographic characteristics of uncited biomedical papers and found that the number of authors had a smaller influence on whether articles were cited. Hu and Wu (2018) have recently reported the reasons for citing or not citing a publication, including ‘prestigious authors’, and ‘academic tastes and interests similar to citers’ that can drive authors to form large research communities. In this paper we investigate the differences in research communities between cited and uncited articles from a novel perspective, that of social network analysis (SNA).

SNA is increasingly being used to study bibliometric phenomena involving the co-occurrences in publications of authors, topics, institutions and countries, and similarly the co-citations of the same; here, we focus on co-authorship. Co-authorship of a publication can be thought of as documenting a collaboration between two or more authors, and these collaborations form a network in which the nodes represent authors and an edge between a pair of authors denotes a joint publication. The use of such networks to reveal the structures of academic communities was pioneered by Newman (2001a, b, 2004). For example, a co-authorship study of publications from biology, physics and mathematics identified a single large connected group of authors in each case, representing 82–92% of the total number of author nodes. Newman (2004) noted that this suggested that a large fraction of each of these research communities could be regarded as working in what he described as a “linked research enterprise”, with the remaining author nodes forming a large number of much smaller connected components.

Furthermore, research collaboration in groups has been identified as an important factor in driving productivity and enhancing the citation of research (Smart and Bayer 1986; Cohen 1991; Etzkowitz 1992). These factors have been examined in a large number of follow-up studies covering size and productivity (Seglen and Aksnes 2000; Guimerà et al. 2005; Maaike et al. 2015), differences in cooperation within research groups (Adams 2012; Kyvik et al. 2017), and the role of the leader within groups (Pudovkin et al. 2012). Glanzel (2002) showed that the relationship between collaboration and the quality of research varies across subject fields. Evidence from the patterns and structure of social science co-authorship networks showed that authors with many collaborators and high scientific prestige gain more connections from newcomers than do their colleagues (Moody 2004). From a study of the structure of co-authorship networks in four scientific disciplines (Physics, Mathematics, Biotechnology and Sociology) over four 5-year periods (1986–2005), Kronegger et al. (2011) found that, regardless of the research discipline, the co-authorship structure very quickly consolidates into a multi-core, semi-periphery and periphery structure.

There is now a large, and growing, literature on the analysis of co-authorship networks. Examples include studies of the patterns and structure of the research communities in mathematics (Grossman 2002) and computer science (Izquierdo et al. 2016; Franceschet 2011); the measurement of authors’ centrality in co-authorship networks (Lu and Feng 2009); the detection and identification of research groups in co-authorship networks (Calero et al. 2006; Perianes-Rodriguez et al. 2010; Wang et al. 2012, 2013); the relations between research groups and quality of research as revealed by co-authorship analysis (Yan and Ding 2009; Abbasi et al. 2011; Reyes-Gonzalez et al. 2016); the stability of co-authorship structures (Cugmas et al. 2016); and scientific collaborations between China and the European Union as reflected in co-authorship structures (Wang and Wang 2017; Wang et al. 2017), inter alia.

However, these studies have mainly examined the structure of research communities using publications that were not classified by citation status, and so may not reveal the relationship between the structure of a research community and the quality of its research. This paper considers both cited and uncited publications, and uses the analysis of co-authorship networks to examine the differences in research communities between the two types of publication across different disciplines and countries. An SNA tool, CiteSpace (Chen 2004, 2006; Chen et al. 2010), has been used to address two principal research questions. First, are there significant differences between networks based on cited and uncited articles? Second, are there significant differences between networks based on work conducted in different countries [specifically the People’s Republic of China (PRC), the United Kingdom (UK), and the United States of America (USA)]?

Experimental details

The basic data for the study were obtained from the Web of Science (hereafter WoS) Core Collection in December 2016. Four different WoS subject categories were selected that contained a sufficient number of both cited and uncited publications in all of the three chosen countries for subsequent analysis (where the selected publications comprised articles, reviews and proceedings papers). These categories were Chemistry Organic, Engineering Environmental, Economics, and Management, which belong to three broad and fundamentally different disciplines, i.e., one physical science, one engineering and two social science categories. The differences between the three chosen disciplines might help to provide persuasive evidence for differences in the structure of research communities across disciplines. The three countries chosen were the UK and the USA, which have long been two of the most productive academic nations, and the PRC, which has risen to academic prominence over the last decade or so.

For each category, all publications for the period 2000–2014 from the three chosen countries were identified, together with the citations to those publications for the period 2000–2016. The resulting WoS dataset contained 109,843 publications from Chemistry Organic, 78,738 from Engineering Environmental, 151,827 from Economics and 112,371 from Management, as detailed in Table 1. For example, 47,504 of the Management publications had USA authors, of which 37,723 were cited and the remaining 9781 were uncited.

Table 1 Numbers of publications for the period 2000–2014, and citations to those for the period 2000–2016

From these publications, we identified the most productive authors for each combination of subject and country. For each year of publication (2000, 2001 etc.), the 50 most productive authors were identified (and also the 51st, 52nd etc. if they had the same number of publications in a given year as the 50th most productive author for that year), subject to their having at least two publications in the chosen year. So, for example, in 2001 the most productive USA author from cited publications in Economics was Acemoglu D with 8 publications, then Kleit AN with 7 publications and so on. Name ambiguity might also affect the quality of the results to some extent. We therefore checked the name of each author in all of the co-authorship networks and merged duplicate authors by checking whether two name variants shared the same affiliation. As an illustration, two authors in the uncited co-authorship network for USA Engineering Environmental, Allenby B and Allenby BR, share the same affiliation (Department of Civil, Environmental and Sustainable Engineering, Arizona State University, Tempe, USA). We merged such authors in the co-authorship network using the Alias List function in CiteSpace III.
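The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (pairs of publication year and author list), the function name productive_authors and the aliases dictionary are hypothetical, and the sketch does not reproduce how CiteSpace III or its Alias List works internally. Apart from the two Allenby name variants, the author names are invented.

```python
from collections import Counter

def productive_authors(records, year, top_n=50, min_pubs=2, aliases=None):
    """Return {author: publication count} for the most productive authors in `year`.

    `records` is an iterable of (year, [author names]) pairs and `aliases`
    maps name variants to a canonical name; both layouts are assumptions.
    """
    aliases = aliases or {}
    counts = Counter()
    for pub_year, authors in records:
        if pub_year == year:
            # Merge name variants and count each author once per paper.
            counts.update({aliases.get(a, a) for a in authors})

    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return {}
    # Publication count of the author in position top_n (or the last, if fewer).
    cutoff = ranked[min(top_n, len(ranked)) - 1][1]
    # Keeping everyone at or above the cutoff admits ties with the top_n-th
    # author; the minimum-publications rule is applied on top of that.
    threshold = max(cutoff, min_pubs)
    return {a: c for a, c in ranked if c >= threshold}

# Toy data: hypothetical papers, with one alias taken from the example in the text.
records = [
    (2001, ["Allenby B", "Smith J"]),
    (2001, ["Allenby BR", "Lee K"]),
    (2001, ["Smith J", "Lee K"]),
    (2001, ["Jones P"]),
    (2000, ["Jones P", "Lee K"]),
]
top = productive_authors(records, 2001, top_n=2,
                         aliases={"Allenby BR": "Allenby B"})
```

In this toy case three authors are returned even though top_n is 2, because all three are tied on two publications; the author with a single publication is excluded by the minimum-publications rule.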

The CiteSpace III system (available from http://cluster.cis.drexel.edu/~cchen/citespace/) was then used to generate two co-authorship networks for each combination of subject and country: one based on the cited publications for these productive authors and the other based on their uncited publications. The connected components in the resulting networks then corresponded to research communities (Girvan and Newman 2002), comprising groups of authors who are linked either directly (e.g., author A has published jointly with author B) or indirectly (e.g., authors A and C are linked if A has published jointly with B, who has also published jointly with author C).
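The notion of a research community as a connected component can be made concrete with a minimal sketch. It assumes that each paper is available as a list of author names and that the network is restricted to a given set of productive authors; the function names are hypothetical and the code is not a description of CiteSpace III.

```python
from itertools import combinations

def coauthorship_graph(papers, productive):
    """Undirected adjacency map over `productive` authors (assumed input layout)."""
    adj = {a: set() for a in productive}
    for authors in papers:
        kept = sorted(a for a in set(authors) if a in adj)
        # Every pair of productive co-authors on a paper is linked.
        for a, b in combinations(kept, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Return the components as sets of authors, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

# A and C never publish together but are linked indirectly through B.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
adj = coauthorship_graph(papers, productive="ABCDEFG")
communities = connected_components(adj)
sizes = [len(c) for c in communities]
```

Here authors F and G end up as singleton components: F has only a sole-authored paper and G has no paper with another productive author.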

The incidence of uncited publications

Citation studies often pay little attention to uncited publications. However, since one of the principal foci of this study is the difference between cited and uncited materials, we start with a brief review of the characteristics of the uncited publications in our WoS sample. Table 2 lists the percentages of uncited publications for each combination of subject and country, e.g., for the 2002 Chemistry Organic publications, 1.3% of the American, 4.9% of the Chinese and 1.9% of the British publications attracted no citations at all in the period 2002–2016. The bottom two rows of the table list the mean and standard deviation over the 15 years 2000–2014.

Table 2 Percentages of uncited articles in each year of publication for each combination of subject and country

We can draw two principal conclusions from this table. First, far fewer Chemistry Organic publications remain uncited than is the case for the other three subject categories, where the uncited percentage can be as much as an order of magnitude greater. The Chemistry Organic uncited rates range from 1.0% (2003 USA publications) to 10.2% (2014 PRC publications), but even the latter figure is notably less than the figure of 18.8% quoted by Hamilton (1991) for 1984 Chemistry Organic publications that were uncited in the period 1984–1988. As well as the mean values, the year-to-year fluctuations (as quantified by the standard deviations) are also the smallest for this subject.

Second, the uncited percentages for PRC publications are consistently larger than for USA and UK publications (where the mean uncited percentages are very similar for all four disciplines). Indeed, for the 60 combinations (15 years of publication and four subject categories), the Chinese uncited percentage is the largest for all but three of them (Chemistry Organic in 2011 and Engineering Environmental in 2000 and 2002). Moreover, the differences are very substantial: between two and three times in the mean values for all but Chemistry Organic. This is especially so in the two social science subjects where the uncited percentages are in excess of 60% for all but the very earliest years (specifically, 2000–2001 for Economics and 2000–2003 for Management). Thus, while citations to Chinese research in general are increasing in many subject fields (Tang et al. 2015; Xie et al. 2014), there are still very large numbers that fail to achieve any form of impact in the shape of WoS citations. At least in the four disciplines considered here, this occurs far less for the USA and UK publications, thus mirroring the more general pattern of low Chinese impact noted by Radosevic and Yoruk (2014) and by Zhu et al. (2014).

Co-authorship patterns in cited and uncited publications

The co-authorship networks for the uncited and cited USA publications are shown in Figs. 1 and 2 respectively, the four diagrams in each case corresponding to the Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right) datasets for the most productive authors. Each connected component in the network can be considered as a research community that comprises individual productive authors who are linked together by the fact that they have co-authored papers with other authors in the network. The numbers and sizes of the connected components are shown in Tables 3 and 4, these corresponding to the networks in Figs. 1 and 2, respectively.

Fig. 1 The co-authorship mapping from uncited USA publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

Fig. 2 The co-authorship mapping from cited USA publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

Table 3 The sizes and numbers of research communities in four subject categories from Fig. 1
Table 4 The sizes and numbers of research communities in four subject categories from Fig. 2

Inspection of the four networks in Fig. 1 shows that they all exhibit a comparable structure in which there are large numbers of small families (the largest being a single Engineering Environmental grouping containing 24 members). For example, the network based on the uncited USA Chemistry Organic publications contained 158 productive author nodes and 118 co-authorship links: these yielded a total of 36 families containing between 2 and 7 individuals, with the remaining 47 productive authors unlinked. The ratio of the number of links to the number of nodes is 0.75, with the corresponding figures for Economics and Management being 0.63 and 0.53, respectively; that for Engineering Environmental is 1.05, representing a more highly linked structure than for the other three disciplines. However, it is clear that in all four networks the great majority of the families consist of just two or three authors. Figure 2 and Table 4 describe the corresponding networks based on the cited publications. The structures here are very different: while there are again many small, two- or three-membered communities, all four networks are dominated by a single, enormous community, a sort of extended research family. That for Economics contains 97 authors, while those for the other three disciplines all contain over 200 authors.
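The figures quoted above for the uncited USA Chemistry Organic network can be checked with a short calculation. The sketch below uses only the numbers given in the text; the variable names are ours, and the mean family size is a derived quantity that is not reported in the tables.

```python
# Figures quoted in the text for the uncited USA Chemistry Organic network.
nodes, links = 158, 118
families, singletons = 36, 47

# Link-to-node ratio used to compare how densely linked the networks are.
ratio = links / nodes

# Authors who belong to a family of two or more, and the implied mean family size.
authors_in_families = nodes - singletons
mean_family_size = authors_in_families / families

print(f"links/nodes = {ratio:.2f}")
print(f"authors in families = {authors_in_families}")
print(f"mean family size = {mean_family_size:.2f}")
```

The 111 linked authors spread over 36 families give a mean of about three authors per family, which is consistent with the observation that most families contain just two or three authors.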

The corresponding sets of networks for PRC and UK publications are shown in Figs. 3 and 4 (uncited publications) and Figs. 5 and 6 (cited publications), respectively. The eight UK networks are similar to those for the USA in that the uncited ones (Fig. 4) again consist of many small communities while the four cited ones (Fig. 6) are each dominated by a single, extended research family. This is also the case with the four networks based on the Chinese cited publications in Fig. 5. Where the Chinese networks differ is in the uncited publications (Fig. 3), since these networks all have the single, very large community that otherwise characterises the various cited-publication networks.

Fig. 3 The co-authorship mapping from uncited Chinese publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

Fig. 4 The co-authorship mapping from uncited UK publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

Fig. 5 The co-authorship mapping from cited Chinese publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

Fig. 6 The co-authorship mapping from cited UK publications for Chemistry Organic (upper-left), Engineering Environmental (lower-left), Economics (upper-right) and Management (lower-right)

To save space, we have not included full tables analogous to Tables 3 and 4 here. Instead, the data for all of the networks are summarised in Table 5. This table organizes the communities into four sizes: those with ≥ 200 authors, with 20–100 authors, with 2–19 authors, and singletons; a blank cell denotes no communities within a given range of sizes. For example, if we read across the row for ≥ 200 Engineering Environmental authors, the Chinese uncited network has one community containing 450 authors (this was indeed the largest community identified in any of the analyses), and the USA, Chinese and UK cited networks each have one community containing 225, 365 and 318 authors, respectively. This table highlights the fact that the Chinese uncited networks are totally different in structure from the USA and UK uncited networks, whereas the three sets of cited networks are comparable in structure. It also makes clear that, in most cases, the majority of the authors are in small or singleton clusters, as would be expected given the power law behaviour of co-authorship networks first noted by Newman (2001b, 2004). A χ² analysis of the data in Table 5 is detailed in Table 6, each cell of which contains the χ² value and the associated probability (ν = 3 degrees of freedom) for the difference between the distributions of family sizes for the uncited and cited networks. At the 0.05 level of statistical significance, there is no significant difference for two of the Chinese datasets, those for Engineering Environmental and Management, and the difference is only marginal (p = 0.049) for Economics. This contrasts sharply with the USA and UK datasets, where the difference is significant in every case.

Table 5 Communities based on uncited and cited publications
Table 6 Chi squared analysis for differences between the cited and uncited networks
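A test of this kind can be sketched as follows, assuming a 2 × 4 table whose rows are the uncited and cited networks and whose columns are the four size classes, which gives ν = (2 − 1)(4 − 1) = 3. The counts in the example are invented purely for illustration and are not taken from Table 5; the function names are ours, and the upper-tail probability uses the closed form for three degrees of freedom rather than a statistics library.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Every row and every column must have a non-zero total; a size class that
    is empty in both networks would have to be dropped first.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_dof3(stat):
    """Upper-tail probability of a chi-square variate with exactly 3 d.o.f."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))

# Invented counts of communities per size class: >=200, 20-100, 2-19, singletons.
uncited = [1, 2, 40, 60]
cited = [1, 3, 30, 20]
stat, dof = chi_square([uncited, cited])
p = p_value_dof3(stat)
significant = p < 0.05
```

With such small expected counts in the first two columns, a real analysis might prefer to merge size classes or use an exact test; the sketch is meant only to show how the statistic and its probability are obtained.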

The structural characteristics of the 24 networks (cited and uncited for four disciplines in three nations) are further summarised in Table 7 in terms of their density, modularity and centrality. The density is the ratio of the number of actual edges in the network to the total possible number of edges. The modularity is the fraction of the edges that fall within communities minus the fraction that would be expected if the edges were assigned at random, so that a large positive value reflects a highly structured network. The centrality is the betweenness centrality for each node in the network, i.e., the number of shortest paths passing through that node. While some of the parameter values differ little between countries or between cited and uncited networks, others support the view that the Chinese networks differ markedly from their American and British counterparts. One example is the number of authors with non-zero centrality in the largest community, for which the Chinese uncited values are much larger than those for the other two countries across all four disciplines. Similarly, both the percentage of authors in the largest community and the mean centralities are larger, and the percentage of singleton communities smaller, for China in the sets of both cited and uncited publications.

Table 7 Characteristics of the networks for uncited and cited publications
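Two of these measures can be written down compactly. The sketch below assumes an undirected network stored as an adjacency map and uses the standard Newman–Girvan form of modularity, Q = Σ_c [e_c/m − (d_c/2m)²], where m is the number of edges, e_c the number of edges inside community c and d_c the total degree of its members. How CiteSpace computes the values reported in Table 7 is not described here, so this should be read as an illustration of the definitions rather than of the software; betweenness centrality is omitted for brevity.

```python
def density(adj):
    """Actual edges divided by possible edges, n(n - 1)/2, for an undirected network."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))

def modularity(adj, communities):
    """Newman-Girvan modularity of a partition of the nodes into `communities`."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        # Each internal edge is seen from both of its endpoints, hence the / 2.
        internal = sum(1 for a in comm for b in adj[a] if b in comm) / 2
        degree = sum(len(adj[a]) for a in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

# Toy network (hypothetical): two separate triangles of co-authors.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
d = density(adj)
q_split = modularity(adj, [{"A", "B", "C"}, {"D", "E", "F"}])
q_single = modularity(adj, [set(adj)])
```

Treating the two triangles as separate communities gives a clearly positive modularity, whereas lumping all six authors into one community gives zero, which illustrates why a large positive value indicates a highly structured network.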

The influence of the larger threshold for productive scholars

A reviewer suggested that the use of a larger number of productive scholars might affect the results obtained. To check whether this is in fact the case, the UK and USA Economics data were re-analyzed using the top-100 scholars, rather than the top-50 as previously. The structural characteristics of the resulting networks are detailed in Table 8, which reveals a very similar pattern of behaviour to that observed in the Economics portion of Table 7. Analogous results are obtained if we compare the numbers and sizes of the various communities obtained using the two sets of scholars, as shown in Table 9. The top-50 co-authorship networks are dominated by large clusters (containing 97 and 177 authors for the USA and the UK respectively) for the cited publications and exactly the same behaviour is observed for the top-100 networks (240 and 364 authors for the USA and the UK respectively). That apart, the great majority of the authors are in small or singleton clusters containing between 1 and 19 members: 100% for the uncited USA and UK publications, and 82% and 76% for the cited USA and UK publications. There is again a statistically significant difference between the cited and uncited networks: the χ² values for USA and UK are 2.739 and 3.768 respectively, and the associated probability is 0.000 in both cases. It would hence seem that the choice of threshold has little or no effect on the overall structure of the research communities or on the differences between the cited and uncited publications.

Table 8 Characteristics of the networks for uncited and cited publications
Table 9 Communities based on uncited and cited publications

Discussion and conclusions

As noted in the Introduction, we sought to answer two separate questions, namely whether there are differences between networks based on cited and uncited articles, and whether there are differences between networks based on different countries. It is clear from the above that these two questions are inter-related in that the results for the two types of Chinese network differ considerably from those for the other two countries. Collaboration, whether at the institutional, national or international level, has long been recognised as an important factor in the ability of a research publication to attract citations. It might hence be expected that there would be greater evidence of collaboration in cited, as against uncited, publications. The co-authorship networks considered in this paper have demonstrated that this is certainly the case for USA and UK publications in four disparate research disciplines, since there are significant differences in structure between the cited and uncited networks. The former are characterised by a single, extended community involving a large fraction of the total set of productive authors, together with a large number of much smaller, or even singleton, components, whereas the extended community is absent from the latter. There is some degree of variation in the precise figures but this is hardly unexpected given the very different ways in which disciplines are organized in the physical sciences, engineering and the social sciences, and the differences are small when compared with the corresponding results for the Chinese researchers.

We are unaware of any previous studies that have suggested that the Chinese results might be very different from those of other countries. Our results show that there is a much greater degree of commonality between the cited and uncited networks for the PRC, with both types of network exhibiting a single, highly extended community that encompasses a large fraction of the complete set of productive authors. So why have the extensive linkages, as reflected in the co-authorship network, not resulted in the expected impact in terms of citation in the published literature? One possible reason may be a cultural phenomenon arising from the widespread use of gift or honorary authorship, i.e., the inclusion in a list of authors of individuals who have made little, or no, substantive contribution to the research described in the publication. Here, a combination of the ‘publish or perish’ syndrome, payment for publication in WoS journals, and an in-built respect for authority figures in China means that it is common for new, or less qualified, staff to include better established, senior colleagues as authors in an attempt to increase the chance that a submission will be accepted for publication in a prestigious journal. It must be emphasised that we are not suggesting that this is a purely Chinese phenomenon since it clearly occurs in many countries and disciplines (Sokol 2008; Strange 2008; Zaki 2011); however, it is known to be particularly well established in the Chinese context (Hvistendahl 2013; Liao et al. 2017; Macfarlane 2017). For example, an empirical analysis by Hao et al. (2009) found that guest authorship was involved in 28.6% of papers published in 2008 in the Chinese Medical Journal, the great majority of the guest authors being heads of departments or institutions. In contrast, Wislar et al. (2011) quoted a markedly lower figure of 11.4% for 2008 articles in three USA journals (Annals of Internal Medicine, Journal of the American Medical Association, and New England Journal of Medicine).

Chen and MacFarlane (2016) have suggested that guanxi (a generic term for the networks of relationships that are used to oil the wheels of business and more generally) and the intensive norms of reciprocity that dominate academic life in China are very different from the culture of authorship in Western academe. Tang et al. suggested guanxi as one of several factors that could influence the citation counts for highly cited publications in nanotechnology, a topic that has been the focus of considerable Chinese research efforts (Tang et al. 2015). The work reported here extends that of Tang et al. in that our results suggest that guanxi may also occur in other disciplines (and not just in nanotechnology) and that this effect applies to uncited as well as to cited articles: consequently, this is a factor that might usefully form the basis for future bibliometric studies.

In this paper, we have focused on the role of gift authorship in China, but, as Rethinaraj and Chakravarty (2017) noted, such behaviour is common amongst researchers in Asia. In particular, the phenomenon has been extensively discussed in the Indian context (see, e.g., Bavdekar 2012; Daniel 2016; Mandal et al. 2015; Zaki 2011), and it would hence be of interest to determine whether this can be identified in co-authorship networks of the sort investigated here. Furthermore, it would also be worthwhile to explore how a co-authorship network might change if individual sets of publications were grouped not only by whether they were cited or not cited, but also by how many citations they received. Our study is exploratory in nature; in future studies, more disciplines and countries might be selected to verify the robustness of the findings reported here.