Introduction

As carriers of knowledge, scientific publications play a vital role in scholarly communication. Quick access to high-quality publications is of great importance for scholars. Although peer review has been recognized as the principal mechanism for quality control in most scientific fields (Bornmann 2011), there are many criticisms of the process. Therefore, academia sometimes uses citation count as an alternative measure of quality because of its availability, applicability and objectivity (Garfield 1979). Some studies have shown that the outcomes of bibliometric indicators are generally in line with those of peer reviews (Li et al. 2010; Oppenheim 1995; Raan 2006). However, the use of citations as an alternative to expert judgment of quality has itself been subjected to continuous scrutiny. Therefore, it is important to understand the factors which influence the number of citations a scientific publication receives.

Tahamtan and Bornmann (2018) provided a conceptual model of the citation process, based on the context of cited documents, processes from selection to citation, and the context of citing documents. Contextual features of the cited documents include document features (e.g., perceived quality, accessibility, title, abstract, etc.), author features (e.g., number of authors, the rank of authors’ institutions, etc.), journal features (e.g., the scope and reputation of a journal) and document values (e.g., the perceived utility). The comprehensive review by Tahamtan et al. (2016) also identified 28 factors contributing to citation count and grouped them into three categories: paper-, journal- and author-related factors. Among previous studies, some have focused on one category of factors (for example, Amara et al. (2015) explored what kind of faculty members achieve high scholarly performance); some have investigated two categories (Wesel et al. 2014); and others have taken all three categories into account, e.g., Antoniou et al. (2015) and Leimu and Koricheva (2005). It is now commonly accepted that a number of bibliometric indicators also affect citation count, such as the number of authors, Journal Impact Factor (JIF), the number of pages, and the presence of early citations (Yu et al. 2014). In addition, the influential factors vary across disciplines (Antoniou et al. 2015; Stremersch et al. 2015; Yu et al. 2014).

A variety of regression analyses, including multiple linear regression and negative binomial regression, have utilized different numbers of factors to construct models with different degrees of fit (Yu et al. 2014; Bornmann and Leydesdorff 2015; Stremersch et al. 2015; Haslam et al. 2008; Lokker et al. 2008). Stepwise regression is preferred as a method of selecting significant factors among large candidate sets (Vanclay 2013).

Studies focusing on a single category of factors have offered detailed explanations of citations within a specific context, but have not made comparisons across categories. Moreover, although previous studies have considered journal-, author-, and article-related factors in tandem, they emphasized only the quantitative factors available directly from scientific databases. Less is known about qualitative factors, such as authors’ educational background. In addition, some previous studies reviewed factors related to citation count, but neglected the differences in their importance. Based on these observations, we aim to address the following questions in the present study:

  1. Are there any heretofore neglected factors which have significant association with the citations of scientific publications?

  2. Which factors, whether established or newly identified, have the greatest influence on citation count?

The structure of this article is as follows. The next section reviews the literature on citation-related factors and frequently used regression models. The third section presents the data and methodology, the fourth reports the results, and the last draws conclusions and discusses potential implications and limitations.

Related work

In their conceptual model of the citation process, Tahamtan and Bornmann (2018) found that many factors are associated with authors’ decision to cite a document, such as the location of the citation context, the features of citing documents, as well as their authors’ and journals’ features. In addition, they explained many reasons for citing a document, such as provision of a background for new research, use of the cited authors’ methodology, and criticism of a previously published work. In the context of cited documents, document features, author features, and journal features, together with citing authors’ positive or negative attitudes toward the documents’ value, are all shown to influence the citation process.

The current study intends to uncover more factors which may have a significant correlation with citations, and to identify the most correlated ones through comparisons. We organize factors which potentially affect citations into four categories (article-, author-, reference-, and citation-related factors) and review the corresponding literature for each.

Factors associated with citation counts

Article-related factors

Previous studies have verified that the number of citations an article receives is significantly positively correlated with the length of the article (generally operationalized as page count) (Leimu and Koricheva 2005), the number of keywords, the length of abstract (Rostami et al. 2014; Wesel et al. 2014), structured abstract, the number of tables, figures, footnotes, appendixes and formulas (Lokker et al. 2008; Stremersch et al. 2015), open access and inclusion in numerous databases (Lokker et al. 2008), download times (Schlögl et al. 2014), as well as document types, article age, peer reviewing, reviewing times, funds and acknowledgements (Bornmann and Leydesdorff 2015; Vanclay 2013; Rigby 2013). The length of title sometimes has no or negative effects on citations (Rostami et al. 2014). In addition, the number of citations is also associated with the order of an article’s appearance in a journal (Yu et al. 2014; Bornmann and Leydesdorff 2015; Dalen and Henkens 2001; Stremersch et al. 2015), i.e., earlier articles usually receive more citations.

Some article-related factors deserve further investigation. For example, the effect of download times has usually been analyzed separately by bivariate methods, without comparison with other factors. Likewise, since previous studies applied only absolute order, and the number of articles in an issue varies significantly across journals, the relative order of an article is worth exploring.

Author-related factors

Amara et al. (2015) investigated factors which explained why scholars perform differently in terms of research productivity and impact. According to their results, authors’ time allocated to research and teaching, financial resources, academic title, and institutional affiliation are all significant in this respect. An author’s previous citations and publications increase the citations of a given article (Yu et al. 2014; Stremersch et al. 2015). Co-authorship characteristics, including the number of authors, the first author’s h-index, and the group’s highest and minimum h-index, are also related to the number of citations (e.g., Hurley et al. 2013). Other correlated factors include institutional, regional, and international collaborations, as well as interdisciplinary cooperation (Amara et al. 2015).

Numerical factors have been frequently considered in previous studies, such as the number of authors and h-index. However, many categorical factors’ influence has been underestimated. For example, whether authors’ nationality, educational background or academic degree exerts significant influence on citations remains unanswered.

Reference-related factors

The number of citations an article received is associated with the number and impact of references (Stremersch et al. 2015; Bornmann and Leydesdorff 2015), the variety of references (Chakraborty et al. 2014), as well as reference currency, author self-citation and journal self-citations (Roth et al. 2012; Vanclay 2013).

It has been shown that language barriers have a negative impact and prevent people from seeking necessary information (Henderson 2005). Whether an author is proficient in a foreign language or languages (and thus gets access to rich information in those languages) could be indicated by the percentage of foreign-language references. Moreover, different document types carry distinct information or knowledge (Crossick 2016).

Citation-related factors

Early citations, i.e., citations an article receives immediately after publication, are the early feedback of the scientific community about the article’s impact, and significantly correlate with total citations (Dalen and Henkens 2005). The time at which a paper receives its first citation represents its speed of dissemination in the scientific community, which is often characterized by the reciprocal of an article’s first-cited age (Yu et al. 2014). Citations received in the first 2 years can also be a predictor of citation counts according to Yu et al. (2014). They detected the positive effects of the number of countries, document types, journals and disciplines citing an article in its first 2 years after publication in detail. However, whether these effects can be expanded to a larger citation window requires further study.

Regression models involving citation counts

More and more studies have developed regression models to compare the effect sizes of influencing factors and to predict future citations at an early stage. Bivariate analysis was used to investigate the relationship between bibliometric indicators and citation impact (Leimu and Koricheva 2005; Rostami et al. 2014; Schlögl et al. 2014), and to explore whether the above factors have significant effects on citations. Negative binomial regression is a standard model used to account for over-dispersion that a Poisson model cannot capture. For instance, Didegah and Thelwall (2013) employed a zero-inflated negative binomial regression model with eight independent factors and identified JIF and impact of references as the most effective predictors of citation counts. A comparison between the negative binomial regression model and ordinary least squares regression was made by Bornmann and Leydesdorff (2015) to demonstrate the stability of the former model’s results.

Ordinary least squares regression is a widely used linear regression model in this context, often applied after the logarithmic transformation of bibliometric indicators. For example, through multivariate regression, Antoniou et al. (2015) found three factors’ independent influence on the number of citations: study subject, study design and article length. Royle et al. (2013) performed multiple regression to determine the amount of variation in citations attributable to JIF and other factors.

Stepwise regression is preferred as a method of selecting significant indicators among a large candidate set. It is a method that guarantees the validity and importance of the chosen indicators and reduces the additional error introduced by redundant indicators (Yu et al. 2014). Accordingly, stepwise linear regression has become popular in bibliometric research, where large sets of candidate factors are commonplace. Vanclay (2013), for instance, constructed a stepwise multilinear model beginning with 12 potential factors, five of which were identified to play a significant role in predicting citation impact. Yu et al. (2014) also compared forward selection, backward elimination and bidirectional elimination theoretically. It is a common practice to use forward stepwise regression to select a good subset from a moderate or large number of predictors, and predictors chosen by forward stepwise regression can decrease the residual sum of squares effectively (Taylor and Tibshirani 2015).

Methodology

To address the research questions raised in the Introduction, we constructed, labeled, and analyzed a large dataset of full-text journal articles. Detailed information is listed in Tables 1–4, while data sources and the holonomic schema are presented in the Appendix. Bivariate analysis was used to detect the relationship between different candidate factors and citations, and stepwise regression analysis was employed to compare the sizes of the effects.

Table 1 Results of data collection and bivariate analysis for article-related factors
Table 2 Results of data collection and bivariate analyses for author-related factors
Table 3 Results of data collection and bivariate analysis for reference-related factors
Table 4 Results of data collection and bivariate analysis for citation-related factors

Data source

The Chinese Social Science Citation Index (CSSCI), developed by Nanjing University, is an important tool for inquiry and assessment of the major publications and journals in the humanities and social sciences (Yang et al. 2010). The list of journals in CSSCI is revised every 2 years. The 2014–2015 edition includes 18 journals in Library and Information Science (LIS): Journal of Academic Libraries; Journal of the National Library of China; Information Science; Information Studies: Theory & Application; Journal of The China Society for Scientific and Technical Information; Journal of Intelligence; Information and Documentation Services; Library; Library Work and Study; Library Development; Library Tribune; Research on Library Science; Library Journal; Library and Information Service; Documentation Information & Knowledge; Library & Information; Data Analysis and Knowledge Discovery (previously New Technology of Library and Information Science); and Journal of Library Science in China.

Chinese National Knowledge Infrastructure (CNKI), launched by Tsinghua Tongfang Knowledge Network Technology Company, is a full-text scientific database (Wan et al. 2010). To ensure at least 1 year for citation accumulation, we downloaded full-text articles published between 2005 and 2015 in the 18 journals, obtaining 55,720 articles in total. Forty-nine bibliometric indicators were encoded manually from the full-text articles or accessed automatically from CNKI. The full-text articles published in Journal of The China Society for Scientific and Technical Information were not indexed by CNKI prior to 2017; therefore, instead we accessed them from Wanfang Data (another full-text scientific database) and manually searched these articles in CNKI to identify their citation counts.

Baidu Scholar (xueshu.baidu.com), similar to Google Scholar, indexes Chinese scientific publications, as well as overseas literature, across a variety of publishers and disciplines. We used this platform to count the number of databases in which an article was indexed (X21) and to determine whether an article was open-access (X22). Other online resources, such as institutional websites and author’s homepages, were used to code author-related factors, e.g., educational background, age and alma mater.

Sampling

The dataset was then randomly sampled. Following the method introduced by Berenson and Levine (1993), the sample size was calculated via Formula 1 below, where P represents the diversity of samples, e the error, z the critical value for the chosen confidence level, and N the population size. With P = 0.5 (the maximum), e = 0.05, z = 1.96 and N = 55,720, the required sample size is 382. To enhance the reliability of the study, we randomly sampled 600 articles from the dataset.

$$n = \frac{P(1 - P)}{e^{2}/z^{2} + P(1 - P)/N}.$$
(1)
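
As an illustration only, the arithmetic behind Formula 1 can be reproduced with a few lines of Python. The function name and its default arguments are illustrative choices, not part of the original analysis; rounding up to a whole article is likewise an assumption.

```python
import math


def required_sample_size(population, p=0.5, e=0.05, z=1.96):
    """Required sample size from Formula 1, rounded up to a whole article.

    population: size of the full dataset (N in Formula 1)
    p: assumed diversity of samples (0.5 is the most conservative value)
    e: tolerated error
    z: critical value for the chosen confidence level
    """
    variability = p * (1 - p)
    size = variability / (e ** 2 / z ** 2 + variability / population)
    return math.ceil(size)


# Values reported in the text: 55,720 articles, P = 0.5, e = 0.05, z = 1.96.
print(required_sample_size(55720))
```

With the values reported above, the unrounded result is about 381.5, which agrees with the stated requirement of 382 articles.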

When extracting data from the sampled articles, we excluded 34 non-scholarly articles, leaving 566 articles in our final sample. We also calculated the percentage of selected articles in each journal and found that the likelihood of being sampled was nearly the same for articles published in different journals.

Data collection

Two authors of this study were trained before data collection, and intercoder reliability was assessed by percent agreement and kappa coefficients (Lombard et al. 2002). The two authors informally pilot-tested the schema by extracting data from ten articles together, then discussed problems and revised the schema. Their percent agreement over all factors was 0.952 (> 0.90), which is satisfactory (Lombard et al. 2002). All reference- and citation-related variables, and most of the article- and author-related variables, were objective and could be extracted from databases directly. Thus, kappa coefficients were calculated for the remaining 16 article- and author-related variables, which were subjectively coded by the two authors. Fifteen coefficients were greater than 0.700 and the remaining one was almost 0.700 (kappa for X1 = 0.697), indicating substantial agreement between coders (Lombard et al. 2002). Disagreements were settled by negotiation.
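
The two reliability measures mentioned above can be sketched as follows. This is a minimal illustration that assumes Cohen's kappa for two coders; the text does not state which kappa variant was used, and the function names and example codes are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty item list")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both coders use a single identical code throughout.
    """
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten articles on one categorical variable.
coder_1 = ["a", "a", "b", "b", "a", "b", "a", "a", "b", "b"]
coder_2 = ["a", "a", "b", "b", "a", "b", "a", "b", "b", "b"]
print(percent_agreement(coder_1, coder_2), cohen_kappa(coder_1, coder_2))
```

Kappa is lower than percent agreement whenever some agreement would be expected by chance, which is why both figures are reported for the subjectively coded variables.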

Descriptions of selected factors are given below, while the schema for all 66 potential factors and average citations per year is listed in the Appendix. As shown in Tables 1–4, all potential factors (X1 to X66) studied here are substantial features, rather than subjective judgments like perceived utility and quality. Average citations per year was used as the citation metric to correct for differences in publication year.

Article-related factors

We divided article order into anterior, medium and posterior tertiles with respect to the different numbers of articles in different issues. Category number was in accordance with the Book Classification of China (National Library of China 2010). Funding was divided into 6 categories: national funding, funding from Ministry of Education of the PRC, provincial funding, prefectural funding, other, and no funding.
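
One way to express the relative article order described above is sketched below. The cut-offs at one third and two thirds of the issue are an assumption made for illustration; the text does not state the exact rule, and the function name is hypothetical.

```python
def order_tertile(position, issue_size):
    """Classify an article's position within its issue into a tertile.

    position: 1-based order of the article in the issue
    issue_size: total number of articles in that issue
    Integer arithmetic avoids floating-point edge cases at the cut-offs.
    """
    if not 1 <= position <= issue_size:
        raise ValueError("position must lie between 1 and issue_size")
    if 3 * position <= issue_size:
        return "anterior"
    if 3 * position <= 2 * issue_size:
        return "medium"
    return "posterior"


# The same absolute position falls in different tertiles in issues of different size.
print(order_tertile(10, 30), order_tertile(10, 12))
```

Under this rule, very small issues may have no anterior article at all, which is one reason the actual coding rule may have differed.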

Author-related factors

Since the first author makes the largest contribution to a Chinese article in LIS, all author-related features were coded based on the first author and the other authors were neglected. The educational background of the first author was described with reference to his/her highest degree, according to the Ministry of Education of the PRC (2005). Academic degree likewise referred to the highest degree of the first author, categorized as junior college, bachelor’s, master’s, or doctorate. The classification of universities from which the first author graduated included “Project 985”, i.e., thirty-nine leading Chinese universities that meet certain scientific, technical, and human resources standards and receive financial support from the government; “Project 211”, i.e., 116 universities in China that meet less stringent requirements and are eligible for government support; “general university”; “foreign university”; and “other.” Authorship was categorized into teachers, librarians, independent researchers (who are not affiliated with universities), students, officials or engineers. Categories of affiliations included “Project 985” universities, “Project 211” universities, general universities, junior colleges, national libraries, provincial libraries, city libraries, district libraries, military libraries and foreign affiliations.

Reference-related factors

The mean JIF of references was the average impact factor of the journals in the references. Reference age was denoted by the mean age of references.

Citation-related factors

The first-cited age was denoted by the reciprocal of the number of years from publication to the first citation. A value of zero was used for never-cited articles.
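
A minimal sketch of this transformation is given below. The function name is hypothetical, and the handling of an interval of zero years (a first citation in the publication year) is not specified in the text, so the sketch rejects such input rather than guessing.

```python
from typing import Optional


def first_cited_indicator(years_to_first_citation: Optional[float]) -> float:
    """Reciprocal of the first-cited age; 0.0 for never-cited articles.

    years_to_first_citation: years from publication to the first citation,
    or None if the article has never been cited.
    """
    if years_to_first_citation is None:
        return 0.0  # never cited: the slowest possible dissemination
    if years_to_first_citation <= 0:
        # Assumption: the source does not say how same-year citations were coded.
        raise ValueError("interval must be positive")
    return 1.0 / years_to_first_citation


print(first_cited_indicator(2), first_cited_indicator(None))
```

The reciprocal keeps never-cited articles on the same scale as cited ones: larger values mean faster first citation, and zero marks articles that were never cited.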

Statistical analysis

We applied bivariate analysis, including Spearman correlations and other nonparametric tests, to our data because of their skewed distribution. For ordinal and nominal variables, the Mann–Whitney U test was used to determine whether two independent samples were drawn from populations with the same distribution, while the Kruskal–Wallis test was applied to compare three or more independent samples. A significant Kruskal–Wallis test indicates that at least one sample differs from the others. An effect can be highly significant yet of no practical importance (Cumming and Calin-Jageman 2017). Thus, the present study also compared the effect sizes of correlations according to Cohen’s (1988) rules, i.e., r = 0.00–0.09 for no correlation, r = 0.10–0.29 for weak, r = 0.30–0.49 for medium and r = 0.50–1.00 for high.
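
For illustration, the Spearman coefficient and the effect-size bands above can be sketched in plain Python. The analysis itself was run in SPSS; the function names here are hypothetical, and treating every |r| below 0.10 as "none" is an assumption about values that fall between the rounded bands.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r):
    """Map a correlation coefficient to the effect-size bands used in the text."""
    size = abs(r)
    if size < 0.10:
        return "none"
    if size < 0.30:
        return "weak"
    if size < 0.50:
        return "medium"
    return "high"


print(cohen_label(0.74), cohen_label(-0.13))
```

Because Spearman's coefficient works on ranks, it is unaffected by the skew of citation and download counts, which is the reason given above for preferring nonparametric methods.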

To find the most significant factors, average citations per year was regressed on all variables showing significant effects in the preceding bivariate analysis. Seven scale variables with missing data were estimated by linear interpolation in the regression analysis; citation-related variables were excluded from this imputation, because replacing missing data in early citation-related factors could introduce biases. Records with missing data in 10 categorical variables were deleted automatically. We constructed dummy variables, and scale variables were standardized after logarithmic transformation. Then, at each step, we entered the single factor that most decreased the residual sum of squares. The forward stepwise multiple linear regression model was run with logarithmic transformation of the dependent variable after adding 1.0, and independent variables were entered into the model if P < 0.10 using simple regression. SPSS 23.0 was used for both analyses.
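
The selection logic described above can be sketched as follows. This is not the SPSS procedure: the entry rule based on P < 0.10 is replaced by a hypothetical threshold, `min_rel_gain`, on the relative decrease in the residual sum of squares, because p-values would require an F distribution that the Python standard library does not provide. All names are illustrative. The dependent variable would be transformed beforehand, for example with `math.log1p` for log(citations + 1).

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((y[i] - sum(b * v for b, v in zip(beta, r))) ** 2
               for i, r in enumerate(rows))


def forward_stepwise(predictors, y, min_rel_gain=0.05):
    """Enter one predictor per step, choosing the one that lowers RSS the most.

    predictors: dict mapping a variable name to its list of values
    y: dependent variable, already transformed
    min_rel_gain: hypothetical stopping rule (stand-in for the P < 0.10 rule)
    """
    selected = []
    current = rss([], y)
    remaining = sorted(predictors)  # sorted for deterministic tie-breaking
    while remaining and current > 0:
        trials = {name: rss([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = min(remaining, key=lambda name: trials[name])
        if (current - trials[best]) / current < min_rel_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = trials[best]
    return selected
```

The function returns the variable names in the order in which they entered the model, so the first name is the predictor with the largest single contribution to the fit.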

Results

Bivariate analysis

Article-related factors

As shown in Table 1, we found no significant correlation between the number of citations and article order, category number, timespan of peer review, or the number of footnotes or formulas. In addition, whether the article is OA or has structured abstracts made no difference in terms of citations, whereas document type and inclusion in more databases did. There was a strong relationship between citations and download times (r = 0.74). The correlation coefficients between citations and the number of keywords (r = 0.15), words in abstract (r = 0.14), and number of tables and figures (r = 0.12), though small, were all significant and positive. The relationships between citations and JIF (r = 0.13) and article length (r = 0.27) were likewise weak. The total number of publications in a journal year, meanwhile, was slightly negatively associated with citations (r = − 0.13, p = 0.003). X14, the number of appendixes in an article, equaled 0 in all articles sampled, so the effect of this factor could not be detected.

Author-related factors

As shown in Table 2, educational background, academic degree, years of research, years of working, average h-index of all authors and authorship category were all significantly associated with citations. However, in this study, whether the first author is a nationally recognized top talent (i.e., a Yangtze Scholar) was not correlated with citations, nor were alma mater or cross-discipline/province collaboration.

The number of citations was also associated with gender, productivity, previous citations, previous citations per article, authors’ affiliations and academic title. In particular, the authors’ previous total citations and previous citations per article had medium effects (r = 0.42 and r = 0.49). Moreover, the h-index of the first author (r = 0.38) and the highest h-index among authors (r = 0.35) also correlated moderately with citations, with coefficients higher than that of the minimum h-index among authors (r = 0.20). In addition, the number of authors had a small but significant effect (r = 0.16). Cross-institution collaboration was related to citations, but international collaboration was not. There was no evidence that authors’ age or nationality correlated with citation count.

Reference-related factors

Table 3 shows that in an article’s references, the percentages of conference papers (r = 0.10), online resources (r = 0.15) and dissertations (r = 0.12) as well as monographs (r = − 0.18) are weakly associated with the number of citations it received. The percentage of journal articles in the references, however, has no effect. Interestingly, the percentage of foreign-language references also slightly correlates with citations (r = 0.17, p < 0.001). In addition, the impact of references, i.e., the total number of citations an article’s references received, is associated with citations. There is a slight tendency for articles with more recent references to receive more citations (r = − 0.15, p < 0.001). Among these reference-related factors, the number of references has the highest correlation with citations, but the effect size is still small.

Citation-related factors

There is a negative correlation between citations and an article’s first citation age (measured in years), with a correlation coefficient of 0.31 after reciprocal transformation. The number of citations an article received in the first year since publication correlates strongly with the total citations (r = 0.76), and the coefficient increases as the citation window is enlarged (r = 0.87 for the first 2 years and r = 0.97 for the first 5 years). A similar pattern holds for the number of citing journals, where the correlations are 0.66, 0.78 and 0.90 in the first year, first 2 years and first 5 years respectively.

Regression analysis

The results of stepwise multiple linear regression are shown in Table 5. Among the 46 factors which have significant effects in the bivariate analysis, only 6 were retained in the regression model, which reached R² = 0.948. This value of R² is higher than those obtained by Yu et al. (2014) (R² = 0.676), Vanclay (2013) (R² = 0.450), or Haslam et al. (2008) (R² = 0.360).

Table 5 Results of stepwise regression

The citation count in the first 5 years was found to have the strongest effect on citations (β = 0.74). Download times also exerted a positive influence on average citations (β = 0.26). Interestingly, authors’ characteristics, including authorship category, educational background, and category of affiliation, were also found to have significant effects. Authors who are independent researchers received more citations (β = 0.10). However, working for a “Project 211” university (β = − 0.04) or having a computer science background (β = − 0.06) was found to have a negative influence on citations. Unexpectedly, the percentage of monographs in the references was also slightly related to the citations an article received (β = 0.06).

Discussion

The citation preferences of social science research differ from those of the natural sciences. For example, natural sciences articles have fewer references than social science ones (Skilton 2006). The percentage of self-citations in social sciences and humanities is 49%, and in natural sciences the percentage is 87% (Larivière et al. 2006).

The factors affecting citations

Author-related factors

Since senior scientists who produce higher-quality publications and hold higher academic titles received more citations (Slyder et al. 2011), we inferred that scientists who have conducted research (X33) and worked (X34) for a longer time were more experienced and hence received more citations. In keeping with the prior observation that faculty members who dedicate more time to research activities at the expense of teaching, administration and consulting activities achieved better publication and citation performance (Amara et al. 2015), the academic title of authors (X30) had a positive correlation with citations. Furthermore, an author’s educational background and academic degree reflect academic literacy and improve the performance of publications as well. This is in line with the findings of Ried and McKenzie (2004), who argued that students’ academic preparation, as measured by academic degree, was a predictor of citation performance.

There is a significant relationship between citations and the author’s h-index as well as the number of authors (Qian et al. 2017). The significant effects of the first author’s gender (X24) and affiliation (X35) are also supported by many previous studies (Amara et al. 2015). The association between citations and cross-discipline/country/province collaboration was not supported, but cross-institution collaboration made a difference.

In our study, the presence of a recognized top talent in the author list did not significantly contribute to an article’s citations, though his/her productivity and citation impact did (Dalen and Henkens 2005). This finding may be due to the small number of reputable scholars, since there were only 15 Yangtze Scholars in LIS at the time when the dataset was collected.

Reference-related factors

A notable finding in this part was the small but significant influence exerted by the variety of references (X56 to X59). Referencing more dissertations, conference papers and online resources was associated with higher citation counts, which has been explained by previous studies (Chung et al. 2012; Crossick 2016; Yi 2004).

We found that the proportion of foreign-language references in an article was also associated with citation counts. A study of the usage of English-language resources by Spanish and Latin American literature scholars shed light on the percentage growth of foreign resources in humanities (Nolen 2014).

The number of references, as well as their respective impacts and recency, exerted weak influence on citations, as documented previously (Bornmann and Leydesdorff 2015). This might also imply a quality element: for example, with more references, the authors obtained more subject-matter knowledge from other researchers, which boosted the quality of the original research (Haslam and Koval 2010; Bornmann and Leydesdorff 2015).

Article-related factors

In this study, we found that the article order in a journal issue was not significantly associated with the number of citations, possibly because of the easy access to publications in the digital era. The category number given in Chinese articles had no effect either, which perhaps indicates the increasingly important role that interdisciplinarity has played in LIS (Xu et al. 2016). Although funding (X18) and acknowledgements (X19) are independent in Chinese articles, they highlight the importance of financial and academic support, both of which facilitate high-quality research (Amara et al. 2015). However, open access (OA) seems unrelated to citations, possibly because most institutions in China subscribe to scientific full-text databases (e.g., CNKI and Wanfang Data mentioned above) which provide convenient access to rich information sources. Bhat (2009) suggested that peer-reviewed articles and those with longer reviewing time correlate positively with citations, but this study did not support this suggestion.

As documented before (Haslam et al. 2008; Hegarty and Walton 2012), we found that containing more tables and figures, an extensive abstract, more keywords and more pages were slightly associated with citations. Why title length was not associated with citations could be explained by the argument that a valuable title is simple, understandable and informative (Rostami et al. 2014). In addition, it has been observed that articles in journals with high JIFs received more citations.

Although downloads do not inevitably translate into citations, we found that, within our sample, articles which were downloaded more often received more citations, because once a paper has been downloaded, it can potentially be cited (Jamali and Nikzad 2011). Many scientific databases can rank search results by download count, so frequently downloaded articles attract more attention and disseminate more broadly. The positive effect of the number of databases in which an article is included and the negative effect of the total number of publications in an issue can also be explained as visibility effects (Lokker et al. 2008).

Citation-related factors

We found that the more quickly an article is recognized by the scientific community, the more citations it is likely to receive. This effect is sometimes known as the “first mover advantage”: being a pioneer leads to a strong advantage in scientific performance (Sabatier and Chollet 2017). The dynamic Waring model of Glänzel and Schubert (1995) also demonstrated the influence of the number of early citations on those obtained later, which explains the cumulative advantage. The high correlation between current and previous citations of a paper (e.g., citation counts in the first year, 2 years, or 5 years) is of prime importance in increasing visibility, and this process of dissemination can also be accelerated and broadened by the citing journals (Yu et al. 2014). The reasons may be twofold: on one hand, search results in databases are often ranked by previous citation counts; on the other hand, highly-cited articles are frequently seen in others’ references (Hilmer and Lusk 2009).

Regression model

In this model, the six variables exerting significant influence on citations belong to different categories, including examples of author-, reference-, citation- and article-related factors. By examining the factors that correlate with citations, authors can boost the impact of their work, while readers can more effectively select articles with high quality and impact. Because the variables were standardized before regression, the coefficients of the different variables on the dependent variable, i.e., average citation count, can be compared directly.
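The comparison of standardized coefficients described above can be illustrated with a minimal sketch. It assumes an ordinary least squares fit on z-scored variables; the function name, the two predictors and the synthetic data are hypothetical and are not taken from the study. The point is only that, after standardization, coefficients no longer depend on the raw measurement scale of each predictor.

```python
import numpy as np


def standardized_coefficients(X, y):
    """Fit OLS on z-scored predictors and response; return the coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept term is needed: every z-scored variable has mean zero.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta


# Synthetic, illustrative data only (NOT the study's data).
rng = np.random.default_rng(0)
n = 500
early_citations = rng.normal(size=n)
downloads = rng.normal(size=n) * 1000.0  # deliberately on a much larger raw scale
citations = 3.0 * early_citations + 0.001 * downloads + rng.normal(size=n)

beta = standardized_coefficients(
    np.column_stack([early_citations, downloads]), citations
)

# Rescaling a predictor (e.g., downloads counted in thousands) leaves the
# standardized coefficients unchanged.
beta_rescaled = standardized_coefficients(
    np.column_stack([early_citations, downloads / 1000.0]), citations
)
```

In this toy setting the raw coefficient on downloads (0.001) looks negligible next to that on early citations (3.0), yet the standardized coefficients show the two effects on a common per-standard-deviation scale, which is what makes a ranking of predictors meaningful.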

Citation count in the first 5 years exerted the strongest effect in the model, which underscored the cumulative advantage discussed above. With the development of electronic publishing, the metric of download counts has become a popular ranking feature, and its association with citations has been documented in previous bibliometric and altmetric research (Schlögl et al. 2014). In the present study, we first compared the effect of downloads with that of other indicators in a regression model. Apart from the citation counts in the first 5 years after publication, downloads had the strongest correlation with citations. Combined with the results of the present study, this suggests the value of an attractive title and extensive abstract in encouraging more potential readers to download an article (Jamali and Nikzad 2011).

The monograph has a central place in the culture and ecology of research publication in the fields of arts and humanities and is likewise important in most of the social sciences (Crossick 2016). The percentage of monographs in a reference list thus relates to citations. As argued by Crossick (2016), monographs offer the length and space for a full examination of a topic and for rich ideas, which is not possible in a journal article.

The effects of three author-related factors have also been highlighted. Independent researchers (X30) received more citations. University faculty members’ research time is often occupied by teaching, administration and professional consulting activities (Amara et al. 2015), whereas independent researchers can focus on their research. This result suggests that scholars must devote more time to research activities to increase the probability of success in their academic careers. Furthermore, the 39 top-tier universities in China (“Project 985”) received a great deal of funding to promote the development and reputation of the Chinese higher education system, and overshadowed “Project 211” universities in all aspects. Therefore, articles published by better universities may have advantages in receiving more citations (Mingers and Xu 2010). We also found that authors with an educational background in computer science might find it difficult to gain recognition in the field of LIS. The reasons may be that scholarly practices differ between these two disciplines (Qian et al. 2017), and that few articles to date have been published in LIS journals by computer scientists.

Although the dataset of the present study included only eighteen Chinese LIS journals, the findings are in line with previous studies of international journals. First of all, this study confirmed the significant effects of the rank of affiliations (Stremersch et al. 2015), downloads (Schlögl et al. 2014), and early citations (Yu et al. 2014) on citation counts through regression analysis. Second, it has been found that there are significant relationships between citation counts and the factors discussed above, i.e., document type (Mingers and Xu 2010), article length (Bornmann and Leydesdorff 2015), the number of tables and figures (Stremersch et al. 2015), Journal Impact Factors (Haslam and Koval 2010), publication year (Sin 2011), collaboration (Sin 2011), authors’ previous productivity (Yu et al. 2014), academic title (Amara et al. 2015), the number of references (Haslam et al. 2008), etc. Nevertheless, Chinese journals are characterized by local features which may also appear in other non-English journals. For example, no relationship was found between international collaboration and citations (Sin 2011), as international collaboration is rare in Chinese journal articles. The same is true of Spanish journals (Jesús and José 2004). In conclusion, the present findings from the Chinese journals are not limited to Chinese academia, but also shed light on international academia.

Conclusions

In this paper, we probed the influence of 66 factors on citations by drawing a sample of articles from eighteen leading Chinese library and information science journals. Bivariate and regression analyses shed light on the relationship between a host of factors and citation counts, and created a more comprehensive picture of citation patterns.

There are at least two limitations in this study. One is our reliance on manual extraction of data, which limited the sample size compared with previous predictive analyses. Because the dependent variable in this randomly sampled set was highly skewed, highly cited articles might be under-represented. In future studies, the population could be split into subsets, for example, by investigating separately the factors related to citations of highly cited articles and those related to never-cited ones. The other is the limited scope of the sample: our conclusions are based only on Chinese articles in the field of library & information science. Further studies should investigate publications in other fields and other languages, in order to establish a wider applicability of these findings.