Introduction

Until recently, assessing the quality of an article through the average quality of the journal in which it was published was the only feasible approach. However, more and more citation indexes now exist for individual publications, such as those of the Journal Citation Reports, Scopus, or Google Scholar. Currently, the two ways of assessing publication quality, through the average quality of the journal or through individual citations, co-exist. There have been very few attempts to assess whether the two approaches really measure the same phenomenon. Abramo et al. (2010), for instance, compare Italian university rankings obtained using Web of Science citations and journal impact factors over 2004–2007, but they do not control for any of the characteristics of the universities, nor of their academics. Using an individual dataset of French academics and their publications over 1969–2008, we provide the first econometric assessment of the determinants of the two types of measures and we evaluate the extent to which they are substitutes. Importantly, we control for a large number of individual-level covariates, such as age, gender, field of specialisation, team size, and network.

We propose to answer three sets of questions. First, are some of the standard determinants of productivity in market activities (age and gender) also determinants of productivity in academic research, or do other variables (typically the size of the author’s team and network) also play an important role? Second, to what extent are publication and citation records driven by specialisation fields? Third, do individual citations and publication scores adjusted for journal quality measure the same dimension of academic productivity, and which variables (age/gender, specialisation, or team size/network) drive the gap between the two?

We use an exhaustive dataset of French academic economists in 2008, their publication records in EconLit and their Google Scholar citation indexes. Alongside the academic’s age, age squared and gender, we introduce into the estimated model a first variable specific to the organisation of labour in academic research: the average number of authors per publication (we refer to this variable as the author’s team size). We find that team size has a more robust impact on publications adjusted for journal quality and on citations than standard Mincerian determinants such as age or gender. We also introduce the number of published articles and the size of the co-author network (the author’s total number of different co-authors) as determinants of the average quality of publications. We find increasing returns to scale with respect to both variables: academics who have published more articles and who have had more different co-authors reach a higher average quality of publications.

Then, we introduce specialisation patterns measured by the share of each academic’s articles in each JEL (Journal of Economic Literature) classification code. It turns out that, even if these specialisation choices do not have much impact on the overall results described above, some evidence of disparities between fields is observed, both in terms of publication and citation patterns. For instance, French academics specialised in the fields of “Microeconomics” and “Labor and Demographic Economics” publish more articles, of a higher average quality, and are more cited than the average.

Finally, regressing citation indexes on publication scores and on the variables mentioned above, we find that the largest part of the variance in citations is explained by publication scores. This allows us to conclude that, on the whole, publications adjusted for journal quality and citations measure the same dimension of publication productivity. Nevertheless, we observe some non-random deviations between the two measures, for specific over- or under-cited fields and due to strong team size and network effects related to the organisation of research and publication activity. For a given publication record, a larger team size and a larger co-author network generate more citations. Since publication volume and journal quality are controlled for, we interpret this result as a pure impact of team size and network on knowledge diffusion at identical levels of academic activity. This can emerge from two types of effects. First, different co-authors of an article present their study at different conferences, seminars, informal talks, etc.: the more numerous the co-authors, the stronger the diffusion. Second, because academics talk about their new papers with their former co-authors, the more numerous these are, the larger the knowledge diffusion to their colleagues and their other co-authors. While these effects are fairly intuitive, our study, using individual publications and citations simultaneously, is the first, to the best of our knowledge, to provide systematic evidence on them and to quantify their overall magnitude.

In the 1970s, the quantity of academics’ publications was shown to be an important determinant of academic wages. Katz (1973) evaluates the return to an article at between $18 and $102 in 1969 in one large US public university. In a multi-equation system modelling job quality, research productivity and earnings for a sample of 863 economists in 1966, Hansen et al. (1978) find that experience, measured by the number of years since the Ph.D., has a significant impact on research productivity. They also evaluate the return to an additional published article or book at an almost 8 % increase in annual earnings. From the 1980s onwards, the quality of publications, measured by citations, has been shown to be a more important determinant of salaries than their quantity. On a sample of 148 full professors of economics at seven large public universities, Hamermesh et al. (1982) show that citations, which they define as indirect contributions to knowledge, have a stronger effect on academic earnings than the number of publications, which they define as direct contributions to knowledge. Diamond (1986) goes further and estimates that the marginal value of a citation lies between $50 and $1,300, depending on the discipline. With a panel of 140 academic economists, Sauer (1988) confirms the existence of incentives to knowledge growth and estimates that an individual’s return from a co-authored paper with n authors is approximately 1/n times that of a single-authored paper. Finally, Kenny and Studley (1995) show that economists’ salaries are better characterised by implicit long-term contracts (predicted publications and citations) than by current productivity (presumably because of mobility costs) and find an insignificant effect of field choice on academic wages. This paper takes a similar perspective, assessing the respective roles of age, gender, the number of articles published, and specialisation in research productivity, to which we add the role of team size and networks.

Adopting a macro perspective, Lovell (1973) estimates production functions of publications and citations, which he defines as contributions to economic knowledge. At the aggregate level, considering previously published articles as the stock of capital and the number of PhDs granted in the USA as the labour input in a Cobb-Douglas production function, he explains the tendency of scientific literature to grow exponentially between 1895 and 1965 by the exponential growth of the labour input. Here, we take another road and study the micro determinants of publications and citations, which have been shown to be important determinants of academic wages and promotions and are therefore considered to be evidence of research productivity. This allows us to shed some light on the impact on knowledge creation and diffusion of the behaviour of academics in terms of co-authorship, specialisation choices and research strategy in general. Recently, several studies have examined the role of seniority (Mishra and Smyth 2013), gender (van Arensbergen et al. 2012) and networks (Badar et al. 2013; Egghe et al. 2013) on different measures of academic productivity. However, the impact of these variables has never been studied simultaneously, together with the further possible role of specialisation, which we do here. From a policy point of view, understanding and quantifying such mechanisms is an important step in designing a more efficient academic system.

Finally, Stigler and Friedland (1975) is the study most closely related to ours. They examine the citations of articles published between 1950 and 1968 in two economic sub-fields, cited in doctorates in economics from six major US universities, to identify patterns of intellectual debtors and creditors. Regressing the number of citations—considered as a measure of intellectual influence—on the number of published articles, they find some evidence of weak but significant increasing returns to quantity. However, their specification does not include the age, gender, team size, and network variables we consider here.

Another contribution of our paper is the use of a new tool to measure the impact of academics: their Google Scholar citations. Standard studies on citations use the Journal Citation Reports, but even its latest version only covers about 300 journals in economics, and far fewer were used in most of the studies mentioned above. In that case, citations of and by articles in non-referenced journals are excluded. Using Google Scholar citations presents the decisive advantage of taking into account a much wider range of outlets for both cited and citing articles; it also includes books, working papers, and policy reports, since any document posted on academic websites is covered. As a consequence, Google Scholar is sometimes considered a better tool for comparisons across disciplines (Amara and Landry 2012; Harzing 2013), and we also contribute to this emerging literature that assesses the properties of Google Scholar citations. As regards our measures of publications adjusted for journal quality, we use all 1,206 EconLit journals and their relative quality, which is also a large extension with respect to previous studies. Finally, we do not restrict the academic sample used, which can induce selection biases when, for instance, only the best universities are kept; we consider the full sample of 2,782 French academics in 2008, whether they publish a lot or not at all.

Data and the econometric strategy are presented in “Data” and “Econometric specification”, respectively. The individual determinants of publication scores and citation indexes are analysed in “Individual determinants of publication and citation records”. “The impact of specialisation choices” tests the robustness of the findings when specialisation choices are taken into account. Comparing citation indexes and publication scores, “The patterns of knowledge diffusion” analyses the patterns of knowledge diffusion and “Discussion and conclusion” concludes.

Data

Measure of output

We measure the research output of the academic i in two ways: her/his number of publications adjusted for journal quality and her/his number of citations in Google Scholar.

Publication records

Publication records are measured as weighted sums of articles referenced in EconLit, which lists more than 560,000 publications in more than 1,200 journals between 1969 and 2008. As is now standard, three dimensions enter the weighting of publications: the relative number of pages, the number of authors and the quality of the journal.

We take into account the number of pages to capture the idea that longer articles contain more information (regular vs short papers in the American Economic Review, for instance). However, since the layout can differ greatly from one journal to another and since we do not want to favour some journals for that reason, the weighting is made within each journal. The weight is the ratio of the number of pages of article a to the average number of pages of the articles published in that journal in the same year, which assumes the consistency of editorial policy over one year only. By contrast, differences in the length of articles between journals (Economics Letters vs American Economic Review, for instance) are considered to be captured directly by the journal quality index (presented below).

We also take into account the number of authors of each article. As is standard, we assume an equal split of the article between its authors.

Finally, we take the quality of publications into account using the Combes and Linnemer (2010) journal weighting scheme. For robustness purposes, we compare our results using two of their indexes assuming different degrees of convexity in the distribution of journals’ weights.

To sum up, the output of academic i is a weighted sum of her/his articles a:

$$ y_{i} = \sum_{a}\frac{W(a)}{n(a)}\frac{p(a)}{\bar{p}} $$
(1)

where p(a) is the number of pages of the article, \(\bar{p}\) the annual average number of pages of articles in the journal, n(a) the number of authors of the article and W(a) the journal weighting scheme. We consider the medium and high degrees of convexity of journal weights, denoted CLm and CLh respectively. Scores using neither journal weights nor the correction for the relative number of pages are denoted E. CLm ranges from a weight of 100 for the Quarterly Journal of Economics to a weight of 4 for the last journal, with, for instance, 55.1 for the Journal of Labor Economics. CLh ranges from 100 for the Quarterly Journal of Economics to 0.0007 for the last journal, with 16.7 for the Journal of Labor Economics. We refer to these two schemes as the “Quality” and “Top quality” publication measures respectively. They are illustrated for the top 50 journals in Table 9 of Appendix. E is referred to as “Quantity” (and is corrected for the number of authors only).
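As an illustration, Eq. (1) can be sketched as follows. The dictionary keys and the journal weight used below are hypothetical stand-ins, not the actual EconLit data or the exact Combes and Linnemer (2010) values.

```python
from collections import defaultdict

def mean_pages(journal_articles):
    """p_bar: average page length per (journal, year), computed over all
    articles published in that journal-year (in practice, the full EconLit
    corpus; here, whatever sample of articles is passed in)."""
    totals = defaultdict(lambda: [0, 0])  # (journal, year) -> [pages, count]
    for a in journal_articles:
        key = (a["journal"], a["year"])
        totals[key][0] += a["pages"]
        totals[key][1] += 1
    return {k: pages / count for k, (pages, count) in totals.items()}

def publication_score(author_articles, p_bar, journal_weight):
    """Eq. (1): y_i = sum over articles a of W(a)/n(a) * p(a)/p_bar."""
    return sum(
        journal_weight[a["journal"]] / a["n_authors"]
        * a["pages"] / p_bar[(a["journal"], a["year"])]
        for a in author_articles
    )
```

For example, a 30-page, two-author article in a journal-year whose articles average 20 pages, in a journal with weight 100, contributes 100/2 × 30/20 = 75 to the author’s score.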

Citation records

We assess citations through the Google Scholar citations of the articles, books and working papers (which we refer to as entries) written by the academics in our database. These citations were extracted in January 2010, around two years after the date at which we want to measure research productivity, which seems reasonable given the time needed for studies to be cited.

In order to avoid problems of homonyms involving academics with identical names in fields other than economics, we restrict the fields on Google Scholar to the “subject areas” “Business, Administration, Finance, and Economics” and “Social Sciences, Arts, and Humanities”. To have a period of time comparable to that used for EconLit, we only keep entries dated between 1969 and 2008.

As we did for publications, we take into account the number of authors of each entry. However, it is no longer necessary to take into account the quality of the outlet or the relative length of publications, since we assume that the number of citations directly reflects the quality of the entry.

We first build an index of total citations, TCit_i, which is the total number of citations received by all of academic i’s entries, each divided by its number of authors. Then, to combine this (quality-adjusted) measure of quantity with something closer to the average quality of publications, we use a synthetic index. We do not use the famous H-index, H_i, proposed by Hirsch (2005), because two academics can have the same H-index when one of them has some very highly cited entries and the other does not. In other words, the H-index ignores the internal distribution of citations received by the articles used to calculate it, which we consider one of its strong limitations. We therefore prefer the G-index, proposed by Egghe (2006), which states that academic i has a G_i-index equal to g, which is unique, if her/his g most-cited articles have received g² citations in total, or g citations on average. It can be shown that G_i ≥ H_i. The difference between the two indexes relates to the number of citations received on average by the most-cited articles.

To take the number of co-authors of each article into account, we follow Schreiber (2008), who proposes to attribute all of its citations to any entry but only a fraction of the entry to each author. The H and G indexes are then no longer necessarily integers, but they keep the same interpretation. For instance, a G-index of 7.5 means that the academic has published at least 7.5 “single-author equivalent” articles with 7.5 citations each on average.
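A minimal sketch of this fractional G-index follows. Each entry keeps all of its citations, but contributes only 1/n of a paper to the rank; for simplicity, the sketch evaluates the defining condition at prefix boundaries only, a discrete approximation of the continuous index.

```python
def fractional_g_index(entries):
    """Fractional G-index with Schreiber-style counting.

    `entries` is a list of (citations, n_authors) pairs. Entries are ranked
    by citations; the g-index is the largest effective (fractional) rank r
    such that the r most-cited single-author-equivalent papers have received
    at least r^2 citations in total, i.e. r citations each on average.
    """
    ranked = sorted(entries, key=lambda e: -e[0])
    g, cum_frac, cum_cit = 0.0, 0.0, 0.0
    for citations, n_authors in ranked:
        cum_frac += 1.0 / n_authors       # fractional paper count
        cum_cit += citations              # citations are NOT divided
        if cum_cit >= cum_frac ** 2:      # g^2 citations for rank g
            g = cum_frac
    return g
```

For instance, three single-authored entries with 25, 9 and 2 citations give g = 3 (36 citations ≥ 3²), while two two-author entries with 16 and 4 citations give a fractional rank of 1.0.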

Population and descriptive statistics

The EconLit database, which enables us to compute our different publication scores, is matched with a list of academics in economics provided by the French Ministry of Education and Research.Footnote 1 In 2008, the year for which the analysis was conducted, 2,782 academics were considered.Footnote 2

Table 10 in Appendix presents descriptive statistics about our main dependent and independent variables. The average academic was around 47 years old, had published 3.5 (single-author equivalent) articles referenced in the EconLit database between 1969 and 2008, had entries cited 107 times with a G-index of 7.25 (meaning that the academic’s 7.25 most-cited entries had been cited 7.25 times on average). In 2008, 30 % of French academic economists were women and 73 % had published at least one article. We refer to them as the “Published”. Also, 85 % have at least one entry with at least one citation in Google Scholar; we refer to them as the “Cited”.

If we restrict our sample to the academics who have published at least one article referenced in the EconLit database (panel (b) in Table 10 of Appendix), the average published author is slightly younger (around 46 years old) and more likely to be a man (27 % of women, compared with 30 % for all academics). S/he has published 4.8 single-author equivalent articles, has been cited 137 times and has a G-index of 9 on average. 95 % of published academics have been cited at least once on Google Scholar. The average number of authors per article is around 2 (1.85 in EconLit, 2.04 in Google Scholar) while the average network size (total number of different co-authors) is 4.12.

Alternatively, if we restrict our sample to academics who have been cited at least once on Google Scholar (panel (c)), 81 % have published at least one article and 28 % are women. The average cited academic has published 4.05 single-author equivalent articles, has been cited 126 times and has a G-index of 8.5. The average network size of cited academics is slightly smaller (3.49 vs 4.12) than that of published academics.

Finally, Table 1 provides some simple correlations between EconLit publication scores and Google Scholar citation indexes.Footnote 3 The following observations are noteworthy. As expected, the academics who have published more articles are more cited. The correlation between citation indexes and publication scores is higher when the quality of journals is taken into account with a medium degree of convexity rather than a high degree of convexity. However, the average quality of publications is more correlated with citation indexes when there is a high degree of convexity in the journal weighting scheme.

Table 1 Correlations of EconLit and Google Scholar indexes

Econometric specification

For our different measures of publication scores or citation indexes, we estimate the following specification using ordinary least squares:

$$ \log y_i = \beta_0 + \beta_1 \, \text{Gender}_i + \beta_2 \, \text{Age}_i + \beta_3 \, \text{Age}^{2}_i + \beta_4 \, \log \overline{\text{nau}}_i + \beta_5 \, \log (1+\text{Net}_i) + \sum_{j=1}^{18} \gamma_j \frac{y_{ij}}{y_i} + \epsilon_i $$

where y_i is the publication score or citation index of academic i, Gender_i is a dummy variable equal to 1 for women, \(\overline{\text{nau}}_i\) is the average number of authors per publication by academic i and Net_i is the size of the academic’s co-authorship network, i.e., her/his total number of different co-authors.Footnote 4 Finally, \(\frac{y_{ij}}{y_i}\) is the share of academic i’s output in JEL code j at the first letter level (calculated in terms of E, the number of single-author equivalent articles published in EconLit).Footnote 5, Footnote 6

Since some academics have never published an article in EconLit or have no citations on Google Scholar, we also take selection into account using a Heckman 2-step procedure. The first step is a probit selection equation and the second step is the main equation augmented with the inverse Mills ratio. In the present case, this corresponds to a model where academics who have not published or are not cited are those who do not reach a sufficient quality threshold in their research activity, which can be explained by the same type of variables as those that determine the level of publications or citations. Unfortunately, for this very reason, it is quite difficult to find exclusion restrictions, i.e., variables that would explain the probability of being published or cited, but not the levels. Therefore the selection effect is identified from non-linearities only.

Individual determinants of publication and citation records

In this section, we analyse the individual determinants of publication and citation records by regressing total publication scores (E, the number of single-author equivalent articles, and CLm and CLh, the quality and top quality publication scores using a medium and a high degree of convexity in the journal weighting scheme, respectively) and citation indexes (TCit, Google Scholar total citations discounted by the number of authors per paper, and G, the Google Scholar G-index) on gender, age and its square, and the average number of authors per article. This is presented in Table 2. We also regress CLm/E and CLh/E, the average quality and top quality of publications, on the same explanatory variables augmented with E (quantity) and the network size, to identify possible increasing returns to quantity and co-authorship. Since network size is by construction highly correlated with quantity, it is included in the regressions only when quantity is also included, so as to identify the network effect on top of its natural quantity effect. Except for age and gender, all variables are in logs.

Table 2 Determinants of publication scores and citation indexes

Women are less productive, whatever the measure of research output. Older academics have published more articles, which are on average of lower quality. These two effects cancel out when the dependent variable is the total publication score taking the quality of publications into account. Older academics are also more cited (with a slightly concave effect), which could be because their articles were published longer ago and have therefore had more time to be cited. Moreover, we find increasing returns to the average number of co-authors per article for all research output measures, except for the number of single-author equivalent articles. A published academic who has on average two co-authors instead of one (per article) has 10.1 % fewer publications, but their average quality is 8.4 % higher and their average top quality is 36.8 % higher.Footnote 7 Her/his total quality and top quality publication scores are 11.4 and 96.9 % higher, respectively, while s/he is cited 53.4 % more and has a G-index 41.8 % higher.Footnote 8

We also find increasing returns to individual quantity and to the size of the co-author network in the average quality of publications, and they are stronger when the journal quality assessment is more selective. The more articles academics have published, and the more different co-authors they have had, the higher the average quality of their publications. Researchers who have published five single-author equivalent articles instead of four have an average quality of publications 3 % higher and an average top quality of publications 15.5 % higher.Footnote 9 Having a stock of five different co-authors instead of four, meaning a total network size of 6 instead of 5, increases average publication quality by 4 % and average publication top quality by 15.5 %.Footnote 10
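These percentage effects follow mechanically from the log-log specification: for a coefficient \(\beta\) on \(\log x\), moving the regressor from \(x_0\) to \(x_1\) changes the dependent variable by

$$ \frac{\Delta y}{y} = \left(\frac{x_1}{x_0}\right)^{\beta} - 1, $$

so that doubling the average team size (from 1 to 2 authors per article) changes the outcome by \(2^{\beta}-1\), going from four to five single-author equivalent articles changes average quality by \((5/4)^{\beta}-1\), and, because the network enters as \(\log(1+\text{Net}_i)\), a fifth co-author (network size 6 instead of 5) changes it by \((6/5)^{\beta}-1\).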

In Table 3, we repeat the same exercise as in Table 2, except that we take selection into account using the Heckman 2-step procedure. Hence, in the first column of Table 3, we run a probit equation for the probability of having published, which allows us to calculate the inverse Mills ratio (“Selection”) that we include in columns 2–6. The same exercise is done for Google Scholar citation indexes in columns 7–9. Without clear exclusion restrictions, the inverse Mills ratio is identified from non-linearities in the model.

Table 3 Determinants of publication scores and citation indexes, with selection

Comparing the results of Tables 2 and 3, we find that if women have published fewer articles in EconLit on average, it is because of the women who have not published at all. Once we control for this selection process, published women have had as many papers published as men, and cited women are cited as often as cited men.

Taking selection into account, older academics do not produce articles of lower or higher quality than younger academics. Older academics are nevertheless more cited, despite the correlation between average journal quality and citations, probably because their articles were published longer ago and have therefore had more time to be cited. The results on the increasing returns to the average number of authors per article, and to the quantity of publications and the size of the co-authorship network for the average quality of publications, are not affected by the Heckman procedure.

The impact of specialisation choices

In this second step, we test the robustness of the results of “Individual determinants of publication and citation records” and analyse the effect of specialisation choices by including the shares of research output published in each field (JEL codes at the first letter level) as control variables. This tests whether academics specialised in certain fields publish more or are cited more.Footnote 11

Specialisation only

We start by regressing publication scores and citation indexes on specialisation shares alone. As seen in Table 11 in Appendix, French economists specialised in the fields of “General Economics and Teaching” (A), “Macroeconomics and Monetary Economics” (E), “Microeconomics” (D) and “History of Economic Thought, Methodology, and Heterodox Approaches” (B) have published more articles than the average (by order of magnitude of the coefficients). At the other extreme, academics specialised in “Business Administration and Business Economics; Marketing; Accounting” (M), “Economic History” (N) and “Health, Education, and Welfare” (I) have published less single-author equivalent articles.

The French economists’ average quality of publications is higher in the fields of “Mathematical and Quantitative Methods” (C), “Microeconomics” (D), “Public Economics” (H), “Labor and Demographic Economics” (J) and “Macroeconomics and Monetary Economics” (E) (by order of magnitude of the coefficients). Total publication scores are higher than average in the same fields (except for Public Economics (H) in CLm), meaning that fields which are over-represented in terms of quantity, such as “General Economics and Teaching” (A) or “History of Economic Thought, Methodology, and Heterodox Approaches” (B), are published in low-quality journals.

In terms of Google Scholar citation indexes, French economists specialised in the fields of “Microeconomics” (D), “Labor and Demographic Economics” (J), “Agricultural and Natural Resource Economics; Environmental and Ecological Economics” (Q) and “Mathematical and Quantitative Methods” (C) (only considering total citations for the latter) are more cited than the average. At the other extreme, French economists specialised in the fields of “History of Economic Thought, Methodology, and Heterodox Approaches” (B), “Financial Economics” (G) and “Economic History” (N) are less cited than the average.

Interestingly, comparing Table 11 with Table 2 in “Individual determinants of publication and citation records”, we observe that the R² of the model explaining total publication scores is higher with specialisation shares alone than with individual characteristics alone. The variance explained by specialisation is 13.8 % for CLm and 19.5 % for CLh, whereas the variance explained by individual characteristics (Table 2) is 3.4 % for CLm and 6.6 % for CLh. By contrast, the R² of the model explaining citation indexes is higher when we introduce individual characteristics alone (8.2 and 10.1 % for Google Scholar total citations and the G-index, respectively) than when we introduce specialisation choices alone (2.9 and 3.4 % for total citations and the G-index, respectively). In any case, the variance explained by individual characteristics is weak compared with what is found in the literature on wage equations, for instance (in which individual characteristics often explain between 30 and 40 % of wages, a measure of labour productivity). In academic research, specialisation choices appear to be a stronger determinant of publication outcomes than the individual variables we considered in “Individual determinants of publication and citation records”, while the reverse is true for citations. We now need to assess whether the conclusions presented in “Individual determinants of publication and citation records” are driven by the omitted specialisation variables or whether the two sets of variables correspond to different effects.

Specialisation and individual characteristics

We now introduce both specialisation patterns and individual characteristics. We conclude from Table 4 that the average number of authors per article is the only individual characteristic that matters for explaining total publication scores and citation indexes. As found in “Individual determinants of publication and citation records”, it has a negative impact on the quantity of published articles and a positive impact on the average quality of publications, total publication scores (except in CLm) and citation indexes. The coefficients associated with age, its square and gender are no longer significant, underlining the fact that the demographic impact on total publication scores observed in “Individual determinants of publication and citation records” was in fact due to specialisation choices.

Table 4 Determinants of publication scores and citation indexes: specialisation and individual characteristics

This contrasts with the results on the average quality of publications, which is still affected, with the same order of magnitude, by increasing returns to quantity and to the size of the co-authorship network. Table 12 of Appendix, which includes the detailed coefficients on specialisation shares, also shows that French academics in economics specialised in the fields of “Microeconomics” (D) and “Labor and Demographic Economics” (J) publish more articles of a higher average quality, hence reach higher total publication scores, and are more cited than the average. French economists specialised in the fields of “Macroeconomics and Monetary Economics” (E), “Mathematical and Quantitative Methods” (C) and “Public Economics” (H) also have higher publication scores but are not more cited than the average. At the other extreme, academics specialised in the fields of “History of Economic Thought, Methodology, and Heterodox Approaches” (B) and “Economic History” (N) publish lower-quality articles and are less cited than the average, whereas academics specialised in “Business Administration and Business Economics; Marketing; Accounting” (M) and “Urban, Rural, and Regional Economics” (R) also publish lower-quality articles but are not less cited than the average.

The patterns of knowledge diffusion

Finally, we compare citation indexes and publication scores to assess whether they measure the same dimensions of research productivity. The above analysis shows that this is not fully the case, since the effects of certain variables differ in significance and magnitude across the two types of measures. To go further, we regress Google Scholar total citation indexes and G-indexes on EconLit publication scores, age, gender, specialisation fields, team size and co-author networks. This allows us to analyse the patterns and determinants of knowledge diffusion, that is, the number of citations received everything else being equal, including the number and quality of publications. If publication scores and individual citations differed only because the former are noisier (quality being assessed at the journal level and not at the individual level), no variable should systematically explain the gap between the two. If we show that some variables do explain this gap, then citations and publication scores do not measure exactly the same dimensions of productivity. This also allows us to quantify the determinants of the non-random divergence of citations from publication scores, i.e., the driving forces behind citations, reflecting knowledge diffusion, at identical levels of publication.

Citations regressed on publications and specialisation

We start by regressing citation indexes on publication scores (quantity and average quality of publications) and specialisation shares to assess whether citation patterns differ from one field to another. Unsurprisingly, academics who publish more and in journals of higher quality are more cited. This confirms that Google Scholar citation indexes capture both the quantity and quality aspects of publication records. In Table 5, the total R² of the model is slightly higher when the average quality of publications is measured with a high degree of convexity in the journal weighting scheme. With this measure, having published five single-author equivalent articles instead of four increases Google Scholar total citations by 22.4 % and the Google Scholar G-index by 11.9 % (Footnote 12). On top of that, increasing the average publication top quality by 10 % increases Google Scholar total citations by 2.1 % and the Google Scholar G-index by 1.2 % (Footnote 13).
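The quoted percentages can be read off a log-log specification, in which the estimated coefficient is an elasticity. The following sketch is purely illustrative: the functional form (log citations on log quantity) is assumed, and the elasticity below is hypothetical, reverse-engineered from the 22.4 % figure quoted in the text rather than taken from Table 5.

```python
import math

# Back-of-the-envelope check of a reported marginal effect, assuming a
# log-log model: log(citations) = beta * log(quantity) + other terms.
# beta_quantity is hypothetical, implied by the 22.4 % figure in the text.

def pct_change(elasticity, x_new, x_old):
    """Percentage change in the dependent variable when a regressor moves
    from x_old to x_new under a log-log specification."""
    return ((x_new / x_old) ** elasticity - 1) * 100

beta_quantity = math.log(1.224) / math.log(5 / 4)  # elasticity implied by +22.4 %
print(round(pct_change(beta_quantity, 5, 4), 1))   # recovers the 22.4 % figure
```

The same arithmetic applies to the quality elasticity: a 10 % rise in average top quality maps into a `(1.1 ** elasticity - 1)` proportional change in citations.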

Table 5 Citation indexes regressed on publication scores and specialisation

As seen in Table 13 in the Appendix, which includes the detailed specialisation shares, controlling for publication scores, academics specialised in the fields of “Business Administration and Business Economics; Marketing; Accounting” (M), “Industrial Organization” (L), “Agricultural and Natural Resource Economics; Environmental and Ecological Economics” (Q) and “Urban, Rural, and Regional Economics” (R) (by order of magnitude of the coefficients) are relatively more cited. At the other extreme, academics specialised in the fields of “History of Economic Thought, Methodology, and Heterodox Approaches” (B), “Macroeconomics and Monetary Economics” (E), “Microeconomics” (D), “Public Economics” (H) and “Financial Economics” (G) are relatively less cited, given their level of publications.

The determinants of citations when publications have been controlled for

Finally, we regress citation indexes on publication scores (quantity and average quality of publications), specialisation, and individual characteristics including co-authorship variables. The results of “Citations regressed on publications and specialisation” are robust to the introduction of individual characteristics, except that academics specialised in “Agricultural and Natural Resource Economics; Environmental and Ecological Economics” (Q) are no longer over-cited and academics specialised in “Financial Economics” (G) are no longer under-cited relative to the average. Controlling for the quantity of published papers and the average quality of these publications, older academics have slightly higher total Google Scholar citation scores, which could be due to the fact that their articles were published longer ago and hence have had more time to be cited. But they do not have higher G-indexes.

Importantly, at given publication records, academics with a higher average number of authors per article and with a larger network size (total number of different co-authors) are significantly more cited. Our interpretation of this result is that larger team sizes generate more knowledge diffusion through conferences, seminars, informal talks, etc. Academics may also generate knowledge diffusion of their new publications through their former co-authors, which is stronger when these are more numerous. When the average quality (and not the top quality) is controlled for, a published academic who has on average two co-authors instead of only one (per article) is cited 9.7 % more and has a G-index 20.5 % higher (Footnote 14). Moreover, a stock of five different co-authors instead of four, meaning a total network size of 6 instead of 5, increases Google Scholar total citations by 5.7 % and the Google Scholar G-index by 4.2 % (Footnote 15).

We perform a variance analysis to infer the relative explanatory power of each variable (or group of variables). To do so, we first calculate the effect of each variable as the product of the variable and its estimated coefficient. Then, we calculate the standard error of this effect across all observations and the correlation coefficient between the calculated effect and the dependent variable. For an explanatory variable to have substantial explanatory power, its effect should first have a high standard error relative to the standard error of the dependent variable. Most importantly, a variable, or a group of variables, has a large explanatory power when its effect is strongly correlated with the dependent variable.
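The procedure above can be sketched in a few lines. This is a minimal illustration under assumed names, not the authors' code: `X` is the matrix of regressors (one column per variable), `beta` the vector of estimated coefficients, and `y` the dependent variable (e.g. log citations); the simulated data are purely illustrative.

```python
import numpy as np

def variance_analysis(X, beta, y):
    """For each regressor, compute (i) the dispersion of its effect
    X[:, j] * beta[j] across observations and (ii) the correlation of
    that effect with the dependent variable y."""
    effects = X * beta                       # per-observation effect of each variable
    sd = effects.std(axis=0)                 # dispersion of each effect
    corr = np.array([np.corrcoef(effects[:, j], y)[0, 1]
                     for j in range(X.shape[1])])
    return sd, corr

# Illustrative data: three regressors with decreasing true coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta = np.array([1.0, 0.5, 0.1])
y = X @ beta + rng.normal(size=200)

sd, corr = variance_analysis(X, beta, y)
# The variable whose effect has both a large dispersion and a large
# correlation with y (here the first one) has the most explanatory power.
```

Note that a scalar multiple of a column changes the dispersion of its effect but not its correlation with `y`, which is why the correlation is the more decisive criterion in the text.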

In Table 7, we perform the variance analysis of the estimations reported in columns 3 and 5 of Table 6 (or 14), where the log of Google Scholar total citations is the dependent variable. In Table 8, we perform the variance analysis of columns 4 and 6 of Table 6 (or 14), where the log of the Google Scholar G-index is the dependent variable. In both cases, with a standard error equal to around 55–60 % of that of the dependent variable and a correlation coefficient of around 0.65, EconLit publication scores are the most important determinants of citation indexes (Footnote 16). Within this group of variables, the quantity of publications has a much higher explanatory power than the average quality of these publications, with a standard error between 1.5 times and twice as large and a correlation coefficient 1.5 times as large, whatever the journal weighting scheme. The other explanatory variables have much weaker explanatory power for citation indexes than publication scores. For total Google Scholar citations, even if individual demographic characteristics (gender and age) have a higher standard error than co-authorship effects, the correlation coefficients of the latter with the dependent variable are 1.8 times as large as those of individual demographic characteristics (Footnote 17). For Google Scholar G-indexes, it is even clearer that co-authorship patterns matter more than individual demographic characteristics, since both the standard errors and the correlation coefficients with the dependent variable are higher for co-authorship effects (importantly, the correlation coefficients are more than twice as large) (Footnote 18). Hence, the determinants of publication productivity seem to differ from those in market activities and encompass network effects that are typical of this activity.

Table 6 Determinants of citation indexes controlled for publication scores
Table 7 Variance analysis, Google Scholar total citations
Table 8 Variance analysis, Google Scholar G-index

In the “individual demographic characteristics” group of variables, age has the largest explanatory power. In the “co-authorship patterns” group, network size has a larger explanatory power than the average number of authors per article: knowledge diffusion is driven more by total co-authorship network size than by team size. Finally, even if we observe some disparities between fields in the citation practices described above, Tables 7 and 8 show that the measured effects of specialisation shares are very weakly correlated with the dependent variables, meaning that specialisation patterns have a weak explanatory power for citation indexes such as Google Scholar total citations and G-indexes once all individual effects are controlled for. In both tables, using CLm or CLh to measure journal quality leads to very similar results.

Discussion and conclusion

We study the individual determinants of EconLit publication scores and Google Scholar citation indexes of French academic economists. We show that when co-author patterns have been controlled for (the average number of authors per article and the total co-author network size), gender and age do not matter anymore, except for the probabilities of being published and of being cited at least once. Moreover, we carefully analyse the role of the specialisation patterns of the academics. Those specialised in “Microeconomics” and “Labor and Demographic Economics” publish more articles of a higher average quality and are more cited.

Importantly, we exhibit increasing returns to quantity and to co-author network size for the average quality of publications, whatever the chosen journal weighting scheme, and increasing returns to the average number of authors per article for all research output taking quality into account, including citation indexes. Finally, by looking at the patterns of knowledge diffusion, we find that publication scores are the most important determinant of citation indexes. Nevertheless, we also show that team size and co-author networks constitute the second largest explanatory group of variables. Academics who publish more papers and of higher quality are more cited, as expected. But we also show that academics working in larger co-author teams and who have a larger total co-author network are more cited. Our interpretation is the following. Co-authored articles are presented at conferences, seminars and workshops by their several authors. Moreover, academics discuss their new findings with their peers, including those with whom they have already worked. Both practices generate more knowledge diffusion, which we measure through citations. We therefore confirm the widespread intuition that networks do matter for citations.

It has already been shown that citations are an important determinant of academic wages and that network effects matter for academic promotions (see McDowell and Smith (1992), Combes et al. (2008) or Zinovyeva and Bagues (2012) for instance). We show here that network effects matter for citations. Therefore, data on wages and positions would allow us to disentangle the direct effects of networks from their indirect effects through citations in the explanation of wages.

We must also emphasise that using citation indexes other than Google Scholar, which is considered here, could lead to different conclusions. Google Scholar is currently the citation dataset with the largest scope. On the one hand, this is interesting because it allows us to take into account any type of publication in the academic’s record: journal articles, books, reports, and so on, as long as they are present on academic websites. This can be important for domains where media other than journal articles are still widely used. This enlargement also operates on the citing side, since any citation in these alternative outlets is also counted. By contrast, citation datasets such as the Web of Knowledge or Scopus consider only academic journal articles, and typically only a subset of them, which greatly reduces any academic’s publication record. It also greatly reduces the number of documents considered as citation sources, which are in general those found in the same journals. On the other hand, the enlargement achieved by Google Scholar clearly introduces some noise into both the academics’ publication records and their citations. This is because the same study or report is sometimes posted several times on the web, with slightly different titles or different publishers for instance, or because different chapters of the same book may or may not be counted as separate items, and so on. Also, when one enlarges the citation sources, the question of whether to weight a citation by the impact of the citing source becomes more crucial. Typically, some consider that a citation from an article in an academic journal has more value than a citation in an essay posted on the web by an undergraduate student; Google Scholar does not allow distinguishing between the two. Within the Web of Knowledge or Scopus, at least, there is already a selection of relatively homogeneous citation sources, and one can go even further by weighting each citation by the source’s average impact.
For these reasons, Google Scholar is considered less selective than alternative citation sources. Depending on the purpose of the study, one may prefer one or the other. For instance, Amara and Landry (2012) show that Google Scholar better discriminates among academics in the middle of the distribution, since they all achieve very low scores when these are computed using more selective sources. Amara and Landry (2012) also show that the ranking of the best academics is not much affected by using Google Scholar rather than another source. Still, one would probably prefer the most accurate information to compare top academics, and therefore use more selective and precise sources than Google Scholar. In any case, our study considers Google Scholar only. Replicating it with other citation datasets would show whether the effects we bring to light are robust.

Another limit of our study is that only academics in economics are considered, and they are not necessarily representative of other academics on dimensions directly linked to the phenomenon we study. For instance, practices in terms of the number and hierarchy of co-authors of a paper differ greatly across domains. The economists’ world is relatively simple: first, because the number of authors is rarely above three (only around 5 % of articles have four or more co-authors); second, because authors are ranked in alphabetical order, independently of the supposed contribution of each of them. When the number of co-authors is large and the co-authors explicitly recognise a hierarchy between them without stating the exact contribution of each, which is the standard practice in other domains, it is first more difficult to compute the score of a given academic (for economists we simply divide by the number of co-authors), and second, the impact of the number of co-authors on this score itself can change. Again, this calls for replications of our study in other domains.
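The equal-division convention just described can be sketched as follows. This is a minimal illustration of "single-author equivalent" counting under assumed names; the article weights and author counts below are purely hypothetical, not from the dataset.

```python
# Sketch of the "single-author equivalent" scoring used for economists:
# each article contributes weight / n_authors to an academic's score.
# With alphabetical author ordering and no stated hierarchy of
# contributions, equal division is the natural convention; in fields
# with an explicit author hierarchy, such a split is harder to justify.

def single_author_equivalent(articles):
    """Sum each article's journal weight divided equally among co-authors."""
    return sum(a["weight"] / a["n_authors"] for a in articles)

# Hypothetical record: weights stand for journal quality in some scheme.
articles = [
    {"weight": 1.0, "n_authors": 1},  # solo article, baseline journal
    {"weight": 2.0, "n_authors": 2},  # two authors, higher-weighted journal
    {"weight": 1.0, "n_authors": 4},  # rare four-author article
]
print(single_author_equivalent(articles))  # 1.0 + 1.0 + 0.25 = 2.25
```

In a field where, say, the first author is credited more than the others, the `weight / n_authors` line would have to be replaced by some position-dependent split, which is exactly the difficulty the paragraph above points to.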

Finally, using the sources of citations (the citing academics/articles) would allow us to build a more precise picture of the patterns of knowledge diffusion by tracking the path of citations. Unfortunately, Google Scholar does not provide such information. It would also be interesting to perform more structural estimations or to benefit from richer datasets (with a panel dimension possibly and some instruments) to deal with endogeneity concerns and to infer the directions of causality, as Bramoullé et al. (2009) propose, for instance, to identify peer effects in recreational services. This would improve on the correlations established here and move us towards more causal interpretations. This is a difficult exercise though, which awaits further research.