Introduction and background

This study attempts to investigate whether, in top economics journals, the length of an article has an effect on the number of citations received by that article.Footnote 1 We examined articles published in the top-five general interest economics journals between 2010 and 2014 combined with citation count data from Google Scholar. Controlling for a range of factors, including author quality, we find that a 1% increase in article length is associated with an increase in the number of citations by around 0.56%.

Our study relates to rankings of academic output that are important in building the reputation of individual articles, books and other scholarly works. These rankings, in turn, determine the perception of the quality of authors, universities and journals. Such impressions of quality play a central role in determining academics’ job market outcomes and the receipt of grants, prizes, awards and honors (Hamermesh, 2013, 2018; Ellison, 2013; Gibson, 2014; Gibson et al., 2014; Anauati et al., 2016; Sandnes, 2018).

In economics, the citation count of scholarly works is a widely used measure to evaluate researchers.Footnote 2 Citation counts have been described in the literature as a measure of research impact, research quality and researcher influence.Footnote 3 Many studies in economics (e.g., Liebowitz & Palmer, 1984; Medoff, 1989; Galiani & Gálvez, 2017; Hamermesh, 2018; Arrow et al., 2011) find citation count to be a good indicator of the influence and quality of research. Gibson et al. (2017) found that citation count contributed to increasing academic salaries, particularly in lower ranked departments where journal quality is much more heterogeneous and publishing in top-ranked journals less frequent. There can be many reasons behind such findings. First, submitted articles in a journal are accepted for publication on the basis of their perceived future impact while citation count measures the actual impact of research works (Gibson et al., 2017). Second, citation count is non-subjective in nature (Card & DellaVigna, 2020; Gibson et al., 2017; Powdthavee et al., 2018; Besancenot et al., 2017). A further potential advantage of citation count is that it is quantifiable and easy to employ for analysis and comparison. For example, aggregate citation counts of researchers in a university can be used as an indicator of its research quality (Bruns & Stern, 2016).

How many times an article is cited may depend upon several factors including the length of the article. Earlier research in economics, while not explicitly focusing on article length, found that citation counts are significantly higher for longer papers (Medoff, 2003; Hudson, 2007; Card & DellaVigna, 2013). This association between article length and citations has also been studied in medicine (Falagas et al., 2013) and ecology (Fox et al., 2016). Medoff (2003) examined articles published in the top eight economics journals in 1990 to test if collaboration increases paper quality. He finds a positive and significant correlation between article length and citations. Hudson (2007), using five-year citation trends from American Economic Review and Economic Journal, also concludes that citations are positively related to page length. He hypothesizes that length reflects paper quality. Card and DellaVigna (2013) find that published articles in the upper quintile of the length distribution have 50% more citations than those in the middle quintile. They conclude that the increase in rejection rates has motivated authors to improve paper quality, resulting in longer papers.Footnote 4 Xie et al. (2019) conduct a meta-analysis of the research on the correlation between paper length and citations which covers 18 studies from a wide range of research domains including the social sciences. They find a correlation of 0.3 between citation counts and article length.

Theoretically, the effect of an article’s length on the number of citations it receives is ambiguous. Article length may negatively affect citation. For example, shorter papers might be easier for editors and referees to evaluate quickly (Ellison, 2002; Card & DellaVigna, 2012; Slusky, 2019). Such papers may be easier for readers to consume quickly, understand and then cite. This could be particularly true for researchers with a non-English speaking background whose share, with the globalization of research and development, is increasing over time (Moiwo & Tao, 2013). On the other hand, it is possible for length to have little or no impact on citations. Expert reviewers and editors in top economics journals are likely to slash unnecessary details from articles. This could make length irrelevant.

For a number of reasons, we may observe a positive effect of article length on citations. First, authors of longer articles may write differently, with more detailed introductions, model development and specification testing. All these can contribute to the clarity of the article and this could produce more citations (Card & DellaVigna, 2013). Second, in order to make papers comprehensive, some articles focus both on theory and empirics which can make them longer. Such papers can be cited by both theorists and empirical researchers, generating more citations. Finally, limiting references to a minimum number, a practice often followed by researchers in economics, may induce authors to cite lengthy articles which cover a larger number of issues. It can also lead to only citing articles from top journals.

Our primary contribution in this paper is that we improve on previous estimates of the relationship between article length and citation counts by controlling for paper quality in several novel ways. To identify a causal impact of article length on citations would require exogenous variation in article length that is unrelated to any other factors which influence citations. The most important of these is article quality. If article quality produces both longer articles and higher citation counts, then we would expect a positive association between the two, even in the absence of any effect of article length on citation counts.

We attempt to control for quality in a number of ways. First, we only consider articles from the top five economics journals. While there is heterogeneity in article quality in these journals, it is much less than what we would find if we expanded our analysis to a larger number of economics journals. The top five journals are generally agreed to have the highest quality papers.

We also attempt to control for article quality in the regressions using information about author quality based upon citation counts from other publications. Finally, we use an exogenous change in policy at one of the top five journals which affected article length, but should not have affected quality, and a difference-in-differences strategy to examine the effect of article length on citation counts. While neither approach can produce a causal estimate of the impact of article length on citations in the absence of exogenous variation in article length, we believe that our approach improves on previous estimates of this relationship. Reassuringly, the regression approach and the quasi-experimental approach produce very similar results.

Our focus on the top five economics journals, which receive an inordinate amount of attention in that field, is a contribution to our knowledge that will be of particular interest to economists.

As mentioned above, we find that article length, even after we control for author quality, has a positive relationship with the number of citations. This flew in the face of our intuition about our own behaviour. Thus, to supplement these findings, we conducted a survey of economists at the Australian National University (ANU). Their responses confirmed that researchers were not aware of article length in choosing which papers to cite. Nor did they think that article length, per se, changed the likelihood that they would cite a paper. However, ANU economists did believe that longer articles are usually comprehensive and focus both on theory and empirical issues. This may be an underlying factor for the higher citations of longer papers. We briefly discuss this survey at the end of the results section in “Survey of economists” section.

The remainder of this article is organized as follows. The “Empirical strategy, identification and variable construction” section presents the empirical strategy and identifying assumptions employed in our investigation. The “Citation data: descriptive statistics” section briefly describes the data. Results from our analysis are presented in the “Results and discussion” section. The “Conclusion” section concludes.

Empirical strategy, identification and variable construction

We employ a linear regression model

$$\begin{aligned} Y_{ijacmt} =\alpha +\beta _1 \, X_{ijacmt} +\beta _2 \, X^2_{ijacmt} + \varvec{\gamma } \, {\mathbf {Z}}_{ijacmt} +\phi _{jmt} + \psi _{a} + \eta _{c} + u_{ijacmt}, \end{aligned}$$
(1)

which allows for non-linearity in the effect of article length on citation count. For each article i in journal \(j=1,\ldots ,5\) in month m of year \(t=2010,\ldots ,2014\), of type a (regular; comment/replies/notes; lectures) and category c (theory; empirical; theoretical and empirical; econometric; other), Y denotes the citation count, X is article length (in standardized pagesFootnote 5), \({\mathbf {Z}}\) is a vector of other variables which could potentially affect the number of citations and u are other unobservable factors affecting the citation count. The model includes fixed effects for journal issue (\(\phi _{jmt}\)) (given by the interaction of journal, month and year), article type (\(\psi\)) and article categories (\(\eta\)). We estimate this model in levels and in logs. In the latter case, we use the natural log of the citation count (y) and the natural log of the number of pages in the article (x).
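As a minimal sketch of the log version of equation (1), the Python code below fits the log of the citation count on the log of article length and its square by ordinary least squares. It deliberately omits the control vector and all fixed effects, and the function names and any data passed to them are hypothetical, so it illustrates the functional form only and does not reproduce the estimates reported in this paper.

```python
import math

def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_quadratic_in_logs(pages, cites):
    """OLS of ln(cites) on [1, ln(pages), ln(pages)^2].

    Returns [alpha, beta1, beta2]. Controls and fixed effects are left out
    (a simplification made for this sketch).
    """
    rows = [[1.0, math.log(x), math.log(x) ** 2] for x in pages]
    y = [math.log(c) for c in cites]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

In applied work the same regression would normally be run with a statistical package that also handles the fixed effects and standard errors; the hand-written solver is used here only to keep the sketch self-contained.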

Our investigation relies on recent articles from the top five general interest journals in economics: American Economic Review, Journal of Political Economy, Quarterly Journal of Economics, Econometrica and the Review of Economic Studies, all of which publish scholarly articles in economics.Footnote 6 Their ranking by citations has largely remained stable over time (Card & DellaVigna, 2013). They receive a significant proportion of citations in economics (Anauati et al. 2020). A number of similar studies employed these journals for their investigations (e.g., Card & DellaVigna, 2013; Anauati et al. 2016).Footnote 7

Card and DellaVigna (2013) observed that papers published in the 1990s have higher citation counts compared to those published in the 1970s and 1980s. They hypothesised that this might be due to the nature of the sources used by Google Scholar whose searches include citation of and citation in working papers and publications. Thus, by construction, it is more likely for older papers to exclude some citations. Current practice of increased citations by researchers in economics may also generate this stylized fact. As a result, this paper only uses recent articles, published between 2010 and 2014, and focuses on recent trends in citations.Footnote 8

Following some of the previous literature (e.g., Card & DellaVigna, 2013, 2014), Papers and Proceedings issues of the AER were excluded from our analysis. The total number of observations in our sample is 1561 with 621 from AER, 154 from JPE, 211 from QJE, 319 from ECA and 256 from RES. See Appendix, Table A.1 for a complete distribution of the analysis sample by journal, year and month.
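The composition of the sample can be checked with a few lines of Python. The counts are those reported above; the variable names are illustrative only.

```python
# Article counts by journal, as reported for the analysis sample.
counts = {"AER": 621, "JPE": 154, "QJE": 211, "ECA": 319, "RES": 256}

total = sum(counts.values())
shares = {journal: n / total for journal, n in counts.items()}

for journal, share in shares.items():
    print(f"{journal}: {share:.1%}")
```

The counts sum to the 1561 observations in the sample, and the AER share comes out just under 40%.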

Journal issue fixed effects in our model will account for any journal-specific effects such as variation in page formatting across journals. The share of articles from the five journals we use is not proportional–AER, which published more volumes per year than the other journals, accounts for nearly 40% of the articles in our sample. The time element in the journal issue fixed effects will also capture differences in citation that originate from the variation in time after publication. It takes time for an article to accumulate citations. These can also control for the recent increase in citation of economics papers from within economics and from other social sciences disciplines (Angrist et al., 2020). The journal issue fixed effects also capture seasonality in citation. Publishing in the final 3 months of the year was found to lead to fewer citations relative to other months; see Chao et al. (2019).

Our inclusion of fixed effects for article type aims to control for the differing number of citations for different article types. There are three article types: normal articles; comments/replies/notes; and lectures. There are no review articles in our sample.

Citation of empirical, theory or econometric articles can differ and we thus control for this. Following Angrist et al. (2017), we categorize articles as (i) theoretical, (ii) empirical, (iii) both theory and empirical, (iv) econometrics and (v) other types.

The vector \({\mathbf {Z}}\) in equation (1) includes number of authors, title length, position in the order of articles (in the journal issue), author quality, reported field of study within economics and the length of time elapsed between the first published working paper version of the paper and the publication of the final article. For number of authors, title length and article position we use categorical dummy variables in our main specification but we also explore what happens when we include these variables in a continuous form. For field of study we use dummy variables for each of the top level (single letter) Journal of Economic Literature (JEL) codes.Footnote 9 JEL codes are only available for RES, QJE and AER. The reference category for these dummy variables in our regression specification is articles from ECA and JPE which do not have JEL codes.

Collaboration can be related to article quality and therefore citation numbers. Card and DellaVigna (2013) find that the number of authors per paper in the top-five economics journals has increased from 1.3 in 1970 to 2.3 in 2012 which may result in an increase in the quality of the papers. Also, people cite their own papers more often (Snyder & Bonzi, 1998). Thus, although some previous studies (e.g., Medoff, 2003; Hudson, 2007) find no significant effect of collaboration on citation, we include dummies for the number of authors.

Bramoullé and Ductor (2018) found a negative correlation between the length of the title and the number of citations for economics articles. In contrast, Guo et al. (2018) observed the correlation between title length and the number of citations to be positive after 2000, when online searches became more important for finding relevant literature. The correlation was negative between 1956 and 2000. Therefore, we also include title length, in words, as an explanatory variable.

An article’s position in the table of contents of a particular issue of a journal may also affect its citation. For example, investigating consumer response to the ordering of economics papers in a weekly email announcement issued by the National Bureau of Economic Research (NBER), Feenberg et al. (2017) conclude that, despite the effectively random list placement, papers listed first are nearly 30% more likely to be viewed, downloaded and cited. Thus, in our models, we controlled for where the article appeared in the issue in which it was published.

Article quality is undoubtedly the most important factor which determines citation and therefore needs to be included in any model which tries to explain the number of citations. As explained above, article length may simply be a proxy for quality and have no independent impact on the number of citations. We attempt to address this in several ways.

First, our use of a sample exclusively from the top five journals in economics ensures that all of our articles are of high quality. This intrinsically controls for journal quality and is the reason why we restrict our analysis to a narrow set of journals. We can think of our estimates as telling us about the effect of article length conditional on average page quality for a sample where page quality is uniformly high.

High quality articles are produced by high quality researchers. We thus construct a measure of researcher quality for each article. We do this by looking at citation counts–high quality researchers have high numbers of citations. The problem with putting total citations for a researcher on the right-hand side of a regression as a control for quality is that that total will include citations to the article in question. This creates a reflection problem as the right-hand side variable is correlated with the left-hand side variable by construction.

To solve this problem, we construct a measure of author quality based upon the number of citations that each author has received for all other publications excluding the article in question. Our approach of constructing a ‘leave-one-out’ measure of quality avoids the reflection problem of having a variable on the right-hand side of the regression that is partially a function of the left-hand side variable. Our approach is similar to how Haucap et al. (2018) control for scholarly influence using bibliometric indicators such as the number of citations and h-index as proxies. However, they do not control for this reflection problem.

Having created this ‘leave-one-out’ measure of author quality, we then create a set of dummy variables which we include in the regression based upon the quintiles of the distribution of number of citations for the most-cited author of the paper: one dummy for each of the quintiles and one dummy for cases where citation information is missing. For about 8% of articles, none of the authors appear in Google Scholar. In “Sensitivity checks” section we consider using h-index instead of citations to control for author quality and show that the results are not statistically different. Using the citation count for all authors as a control for the quality of the research team also produces nearly identical results.
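The construction of the quality control can be sketched in Python as follows. The function names, the rank-based quintile rule and the treatment of ties are assumptions made for illustration; they are not a description of the exact procedure used to build the variables in this paper.

```python
from typing import List, Optional

def leave_one_out(author_total: int, article_cites: int) -> int:
    """Citations to all of an author's works other than the article in question."""
    return author_total - article_cites

def article_quality(author_totals: List[Optional[int]], article_cites: int) -> Optional[int]:
    """Leave-one-out count of the most-cited author.

    Returns None when no author has citation information (None entries).
    """
    known = [leave_one_out(t, article_cites) for t in author_totals if t is not None]
    return max(known) if known else None

def quintile_labels(values: List[Optional[int]]) -> List[str]:
    """Label each article 'q1'..'q5' by the rank of its quality value, or 'missing'."""
    known = [v for v in values if v is not None]
    n = len(known)
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        else:
            # Number of known values strictly below v, mapped onto 1..5 (assumed rule).
            below = sum(1 for k in known if k < v)
            labels.append(f"q{min(5, 1 + (5 * below) // n)}")
    return labels
```

Each label would then enter the regression as a dummy variable, with one category omitted as the reference.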

Fields of study within economics are inconsistently used across the journals and articles in our sample. Kosnik (2018) shows that authors and editors often disagree about assignment of articles to different field of study codes. Perhaps because of this, earlier studies find little impact of the field of study on both regression results and model fit (e.g., Card & DellaVigna, 2013). However, citation practices may differ across economic sub-fields in the way they differ across disciplines (Anderson & Tressler, 2016; Aistleitner et al., 2019). Therefore, we include field of study in our models.

A longer gap between the appearance of a paper in working paper form and the final publication may lead to longer articles.Footnote 10 A longer gap usually signals that the paper has been submitted to more journals and this may result in additional material being added to the paper at each iteration. The paper may also have been presented at more seminars and conferences which could also lead to additional material being added to the original paper.

Information on the month and year of the first working paper version of the article was gathered from Google Scholar and the Social Science Research Network (SSRN). For about 11% of articles, we could not find a working paper version. For a very small number of articles, the working paper version appeared in the same month and year as the journal version. In the specifications below we control for the natural log of one plus the number of months between working paper publication and article publication for those papers with a working paper where those two dates are different. Adding one to the month results in no missing values and ensures that 0 months and 1 month are distinct.Footnote 11
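The transformation of the publication gap can be sketched as follows. The names are hypothetical, and returning None for articles without a working paper is an assumption of this sketch: how those articles enter the regression is not reproduced here.

```python
import math
from typing import Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month)

def months_between(first: YearMonth, second: YearMonth) -> int:
    """Whole months elapsed from `first` to `second`."""
    return (second[0] - first[0]) * 12 + (second[1] - first[1])

def log_gap(working_paper: Optional[YearMonth], published: YearMonth) -> Optional[float]:
    """Natural log of one plus the number of months between the two dates."""
    if working_paper is None:
        return None  # assumption: articles without a working paper are handled separately
    return math.log(1 + months_between(working_paper, published))
```

Because one is added before taking logs, a gap of zero months maps to 0 and a gap of one month maps to ln 2, so the two cases remain distinct.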

Female-authored papers receive more citations than observably similar male-authored papers (Grossbard et al., 2018; Card et al., 2019; Hengel & Moon, 2019). Kolev et al. (2019) find that gender differences in writing and communication are a significant contributor to gender disparities in the evaluation of science and innovation. Our leave-one-out measure of quality will capture some of these effects–female authors will show up as higher-quality if they are cited more. Given that almost all of our articles are co-authored and many include authors of different sex, it is difficult to control for this. We also exclude the impact of authors’ location on citation as Head et al. (2019) found no impact of geographic location on citations in mathematics. Finally, surnames that are earlier in the alphabet are cited more often than those later in the alphabet when journals order citations alphabetically as opposed to chronologically or numerically (Stevens & Duque, 2019). Our quality variable may thus capture some additional factors such as age/cohort (older authors have had more time to accumulate more total citations), gender, surname and geographic location (provided that authors do not change family name and do not move).

\(\beta _1\) in equation (1) provides the partial correlation between article length and citation counts conditional on journal/issue, article type and category, number of authors, title length, position of article (in the issue), field of study and author quality. If there were no correlation between \(u_{ijacmt}\) and article length and the other right-hand side variables, then \(\beta _{1}\) would provide an estimate of the causal effect of article length on citation. This seems unlikely as there are probably other unobserved factors which influence both quality and length.

As a supplement to our regression estimates, we therefore conduct an analysis employing a quasi-experimental design. In September 2008, the AER introduced a page limit policy for new submissions which required the manuscript to be no more than 40 pages with 1.5 line spacing or no more than 50 pages with 2 line spacing (Card & DellaVigna, 2012). AER continues this policy whereas no other top-five economics journal has adopted it. As a result, with some minor fluctuations, the article length of AER between 2010 and 2014 remained almost the same at 41.03 (standardized) pages. At the same time, article length at the other top-five journals has increased. In (standardized) pages, between 2010 and 2014, JPE increased by 2.01 pages (from 49.42), QJE by 8.47 pages (from 44.75), ECA by 10.05 pages (from 38.44) and RES by 2.41 pages (from 51.9). Using these differences in changes in page count over time in a quasi-experimental setting, we conduct an analysis by employing a flexible difference-in-differences (DiD) model of the following form

$$\begin{aligned} y_{ijact}= & {} \alpha + \varvec{\gamma } \, {\mathbf {Z}}_{ijact} + \psi _{a} + \eta _{c} + \sum _{j=1}^{5}\beta _{j} Journal_j + \sum _{t=2010}^{2014}\gamma _{t} Year_t \nonumber \\&+ \sum _{j=1}^{5}\sum _{t=2010}^{2014}\delta _{jt} Journal_j \times Year_t + \varepsilon _{ijact}, \end{aligned}$$
(2)

in which \(y_{ijact}\) is the level or natural log of the citation count of article i in journal j at year t of type a and category c where the subscripts are as in equation (1). \(Year_t\) is a vector of indicator variables that takes the value of 1 for a particular year t and 0 otherwise, covering all the periods in our data. \(Journal_j\) is also a vector of indicator variables that takes the value of 1 for journal j and 0 otherwise, covering all the journals in our data. Including these interactions is equivalent to a fixed effect for each year/journal combination. We treat AER as the reference category since it has not changed article length over time. With 2010 as the reference year, the DiD coefficients \(\delta _{jt}\) for all the journals and years will give us the average impact of article length on citations by journal and year.Footnote 12
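To fix ideas, consider a stripped-down version of equation (2) without the controls \({\mathbf {Z}}\) and without the article type and category effects. In that saturated case each \(\delta _{jt}\) equals a double difference of cell means: the change in journal j between the reference year and year t, minus the corresponding change at the reference journal. The Python sketch below computes this quantity; the function name and the cell means are hypothetical and are not taken from our data.

```python
def did_estimate(cell_means, journal, year, ref_journal="AER", ref_year=2010):
    """Double difference of cell means relative to the reference journal and year.

    `cell_means` maps (journal, year) to the mean outcome in that cell,
    for example the mean log citation count.
    """
    change_journal = cell_means[(journal, year)] - cell_means[(journal, ref_year)]
    change_ref = cell_means[(ref_journal, year)] - cell_means[(ref_journal, ref_year)]
    return change_journal - change_ref

# Hypothetical mean log citation counts, for illustration only.
toy_means = {
    ("AER", 2010): 5.0, ("AER", 2014): 4.0,
    ("QJE", 2010): 5.5, ("QJE", 2014): 4.9,
}
delta_qje_2014 = did_estimate(toy_means, "QJE", 2014)
```

With controls included, the estimated coefficients are no longer simple differences of raw means, but the interpretation as a change relative to AER and to 2010 carries over.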

Equation (2) could provide a causal estimate of the impact of article length on citation counts if there were no other changes at any of the journals over this time period which could also impact on either article length or citation counts. Unfortunately, this was not the case because the AER simultaneously increased the number of issues (from 5 to 7 in 2011–2013 and then to 12 in 2014) and the number of published papers (from 104 in 2010 to 148 in 2014). This could have resulted in a decrease in the average article quality which would impact on the estimates from equation (2). This does not seem to have happened however, as the AER’s impact factor increased from 3.15 in 2010 to 3.67 in 2014 and further to 5.55 in 2019.Footnote 13

Neither the approach of equation (1) nor that of equation (2) will produce a causal estimate, but using these two approaches together will improve our understanding of the correlation between article length and citation count.

Citation data: descriptive statistics

There are a number of sources that provide citation count data. Citation count in Google Scholar offers free public access to citation information and is consequently an increasingly popular tool to evaluate the quality and impact of both articles and authors (e.g., Card & DellaVigna, 2013, 2014; Anauati et al., 2016). The effectiveness of Google Scholar in capturing citation information is confirmed by a relatively high degree of correlation with other citation tools, including ISI Web of Knowledge and Citations in Economics (Card & DellaVigna, 2013). Many important studies (e.g. Card & DellaVigna, 2020) employ Google Scholar citations in assessing research in economics.

Citation counts of articles in our analysis were manually collected from each Google Scholar entry in the week starting 15 November 2019. We include references to the published paper and to the working paper version of articles. Summary statistics of citation count by journal and year are presented in Table 1.Footnote 14 For all journals, it shows a decrease in citations over time, reflecting the need for time to accumulate citations. The table demonstrates that QJE articles receive the most citations among the five journals while ECA and RES receive the least.Footnote 15 The minimum number of citations is one–every article in our time window is cited at least once.

Table 1 Summary statistics: citation by year and journal

Information about article length in our data is collected from the table of contents of each journal issue and relies on the total number of pages, including references and appendices. Summary statistics of article length by journal and year are presented in Tables 2 and 3.Footnote 16 For all journals, they indicate an increase in the length of articles over time.

We follow the approach of previous studies (e.g., Card & DellaVigna, 2012, 2013) which adjust for journal formatting to make article length more comparable across journals. We follow Card and DellaVigna (2012) and use standardization factors of 1.08 for the QJE, 1.23 for the JPE, 1.32 for the ECA, 1.67 for the RES, 1.76 for 2010 AER and 1.49 for 2011–2014 AER. Interestingly, when comparing standardized lengths, we find that the size of AER articles remains constant between 2010 and 2014 (Tables 2 and 3).
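The adjustment can be sketched in Python as below. The factors are those listed above; that the factor multiplies the raw page count is an assumption of this sketch, and the function names are illustrative.

```python
FACTORS = {"QJE": 1.08, "JPE": 1.23, "ECA": 1.32, "RES": 1.67}

def standardization_factor(journal: str, year: int) -> float:
    """Formatting factor for a journal and publication year (2010 to 2014)."""
    if not 2010 <= year <= 2014:
        raise ValueError("factors are only given for 2010 to 2014")
    if journal == "AER":
        # The AER factor differs between 2010 and 2011 to 2014.
        return 1.76 if year == 2010 else 1.49
    return FACTORS[journal]

def standardized_pages(raw_pages: float, journal: str, year: int) -> float:
    """Raw page count scaled by the journal's factor (assumed multiplicative)."""
    return raw_pages * standardization_factor(journal, year)
```

Under this assumption a 30-page QJE article counts as roughly 32 standardized pages, while the same raw length in the 2010 AER counts as roughly 53.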

Table 2 Summary statistics: page count by year and journal
Table 3 Means and SDs of important control variables by year and journal

Other information collected includes journal name, year and month of publication, types of article, number of authors, title length, orders of article (in the journal issue), authors’ citations and field of study. See Appendix, Table A.2, for the distribution of articles by type. Over time the number of authors increases for all journals except QJE, title length fluctuates, order of articles reduces due to there being fewer articles per issue (consistent with longer papers) and there is a reduction in the maximum citations of authors for articles published at a later date.

Results and discussion

We begin with a simple nonparametric (NP) analysis of the bi-variate relationship between citation count and article length. Figure 1 shows that the relationship is linear and positive for articles below 50 pages in length. Above that length, the relationship appears to flatten out and may be non-linear. However, the relationship becomes very imprecisely estimated and the confidence intervals are quite wide.

Fig. 1 Movement of citation count with article length (in levels)

We also estimate a semi-parametric model, in which article length enters into the model nonparametrically while all other variables enter linearly. Figure B.1 demonstrates that the relationship is roughly similar.

Figure 2 shows the bi-variate NP regression of citation count on article length where we take the natural log of both variables. This log-linear relationship appears to fit much better than the linear one.Footnote 17

Fig. 2 Movement of citation count with article length (in logs)

These two figures lead us to believe that the model in logs is the better specification. However, in what follows, we present results for models both in levels and logs.

The marginal effects of the OLS estimates of our model coefficients are presented in Table 4. Column 1 provides estimation results from equation (1) without controls. The estimated marginal effect of (standardized) page count indicates that an increase in article length by one page increases the number of citations by 3.7, on average. Column 2 results are generated from the full model except for the control for quality. Controlling for all of these effects reduces the estimate to 2.3.

Table 4 The effect of article length on citation: marginal effects

Results from the full model are provided in Column 3, in which we add information about the most-cited author’s other citations as a measure of author quality. These results indicate that a one-page increase in article length raises the number of citations by 1.9.

Results for all covariates (see Appendix Table A.3) show that journal issue fixed effects are significant in most cases. Empirical papers are cited more as are papers employing both theory and empirical analysis. Among other variables, number of coauthors shows a positive effect while article order generally shows a negative effect on citation. Subject areas show mixed effects but are often not statistically significant. A longer gap between the publication of the first working paper version and the article itself leads to more citations and the effect is statistically significant.

In line with our expectations, as we move to the upper quintile of author’s citation (quintile 1 is the reference category), we find an increasingly large number of citations. We interpret this as author quality positively affecting citations. But importantly, this effect of author quality does not eliminate the statistically significant effect of article length.

Results from the quadratic-in-logs models are presented in Columns 4–6 of Table 4 in the same way: first the model with no controls, then the model with all controls except author quality and finally the full model including the quality measure. Without controls, we find an average elasticity of 1.00; this drops to 0.62 when we add all of the controls except quality and to 0.56 when we also include the quality variable. Looking at the detailed results in Appendix Table A.3, the other variables behave as described above for the quadratic-in-levels model.

The quadratic-in-logs model of columns (4)-(6) fits better than the quadratic-in-levels models of columns (1)-(3).Footnote 18 This is not surprising given the non-parametric results and the distributions of article length and citation which are much closer to normal after the log transform (see Appendix, Figures A.4 and A.5). Thus, Column (6) of Table 4 represents our preferred specification. Models without the quadratic term were similar to those estimated with the quadratic term, but we prefer the quadratic specification as it gives more flexibility and we find that the log of page length and its square are always jointly significant.Footnote 19

The quadratic form results in a marginal effect which is not constant with respect to page length. We find that the marginal effect of an additional page increases with article length. The marginal effect is insignificant below 16 pages (16 pages is below the tenth percentile of the distribution of article length) and then increases slowly with page length (see Appendix Figure A.6). The marginal effect is 0.39 at the 10th percentile (17 pages), 0.52 at the 25th percentile (25 pages), 0.60 at the median (34 pages), 0.66 at the 75th percentile (39 pages) and 0.70 at the 90th percentile (42 pages).
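To make the source of this variation explicit, the quadratic-in-logs specification can be sketched as follows. The notation is assumed here for illustration only (C for citations, L for page length, X for the control variables) and may differ from the symbols used in the empirical strategy section.

```latex
\begin{align*}
\ln C_i &= \beta_0 + \beta_1 \ln L_i + \beta_2 (\ln L_i)^2 + X_i'\gamma + \varepsilon_i \\
\frac{\partial \ln C_i}{\partial \ln L_i} &= \beta_1 + 2\beta_2 \ln L_i
\end{align*}
```

Under this form the elasticity depends on article length through the second term, and it rises with length whenever \(\beta_2 > 0\), which is the pattern reported here.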

That our estimates in both models do not change that much when we add the proxy for quality should not surprise us. As already indicated, our highly selected sample means that all of the articles which we consider are of high quality. Editors and referees most likely only permit additional pages when they are of equal quality to those already in the journal and in an article.

The results in Table 4 conform with some earlier studies that find a significant and positive effect of article length on citation. For example, given the within-quintile average article length of 32 pages and 52 pages at the middle and top quintiles of our sample, our estimate indicates 35% more citations for articles in the latter quintile compared to the former. This is slightly less than Card and DellaVigna (2013) who observed 50% more citations when comparing these two quintiles in their sample. Medoff (2003) observed a positive, but statistically insignificant, effect of article length on citation. Hilmer and Lusk (2009) also observed a positive but insignificant effect of article length on citations in agricultural economics.
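The 35% figure can be reproduced with simple arithmetic. The sketch below is illustrative only and the function names are hypothetical. It compares a first-order (linear) use of the 0.56 elasticity, which appears to be consistent with the 35% quoted, against the change implied if the elasticity were held constant over the whole 32 to 52 page range.

```python
ELASTICITY = 0.56  # preferred elasticity estimate reported in the text


def pct_change_linear(old_pages, new_pages, elasticity=ELASTICITY):
    """First-order approximation: elasticity times the percent change in length."""
    return elasticity * (new_pages - old_pages) / old_pages * 100


def pct_change_constant_elasticity(old_pages, new_pages, elasticity=ELASTICITY):
    """Percent change in citations if the elasticity were constant over the range."""
    return ((new_pages / old_pages) ** elasticity - 1) * 100


# Middle quintile (32 pages) versus top quintile (52 pages)
linear = pct_change_linear(32, 52)
exact = pct_change_constant_elasticity(32, 52)
print(f"linear approximation: {linear:.1f}%")
print(f"constant elasticity:  {exact:.1f}%")
```

The two numbers differ because a 62.5% increase in length is far from marginal; the constant-elasticity figure is somewhat lower than the linear one.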

Sensitivity checks

We consider a number of robustness checks. First, it is possible that the effect of article length is driven entirely by a positive effect in one or two journals. To check this, we estimate the model separately for each journal. Columns 1–5 of Table 5 indicate that the (marginal) effect of (standardized) article length in the quadratic models differs somewhat by journal. However, in all cases the effects are positive, ranging from 1.56 for QJE to 3.50 for JPE. The quadratic-in-logs model results indicate elasticity estimates between 0.02 and 0.74 (Columns 6–10), although the estimate is not statistically significant in the case of QJE. See Appendix Table A.4 for the complete, detailed regression results.

Table 5 The effect of article length on citation: marginal effects from journal specific models

The effect of article length on citation may be dissimilar at different points of the (conditional) citation distribution. For example, while length may not matter for average citation counts, it would be useful to know if increased length increases citations at the top or the bottom of the conditional distribution. We employ quantile regression to investigate such concerns.

Our quantile regression estimates of the effect of article length on citation are presented in Table 6. We estimate the model at the 0.10, 0.25, 0.50, 0.75, and 0.90 quantiles, both for the quadratic and quadratic-in-logs models. While the results of both sets of models are similar, our preferred log–log model with full controls suggests that length has a significant effect on citations across the conditional distribution. At the lower end of the distribution (10th percentile), the coefficient for length is 0.56 and remains similar throughout the distribution.Footnote 20 The distribution of the quantile regression coefficients and their confidence intervals include the entire distribution of OLS estimates.Footnote 21

Table 6 The effect of article length on citation: marginal effects from quantile regression models

We use standardized page length and journal/issue fixed effects to account for page formatting differences across journals. In Table 7 we present results using raw page counts (with full results in Appendix Table A.6). The results are essentially unchanged.

Finally, we estimate the DiD model discussed in “Empirical strategy, identification and variable construction” section above. The results are presented in Table 8 following the layout of the previous tables. In most models, the DiD estimates indicate a positive impact. The DiD coefficients, which are the year \(\times\) journal interactions, capture the change in article length over time relative to the AER after controlling for journal and time fixed effects. As article length in the AER does not change over time, these coefficients pick up the effect of changing article length on citation count.

Table 7 The effect of article length on citation: marginal effects with non-standardized page count
Table 8 The effect of article length on citation: difference in differences estimate

In our preferred log–log specification, presented in column (6), seven of sixteen coefficients are positive and statistically significant. Several others are not significant but have t-values above 1.2. Three are negative, but very close to zero both statistically and practically. If we average across the 16 DiD coefficients, we find an average effect of 0.38.

The estimates of other coefficients in the model are similar to our previous results.Footnote 22 The quasi-experimental approach confirms our main results of the positive and statistically significant impact of increasing page length on higher citations.

Our results are subject to the caveats discussed above, namely that the number of papers published by the AER increased during this time and, if this had an impact on the average quality of AER papers, our impact estimate will be biased. This bias would most likely be downwards if the drop in quality resulted in lower citations of AER papers, which would make our estimate a lower bound on the true estimate. The fact that the AER’s impact factor actually went up would seem to provide evidence against this, however.

We conduct a variety of additional robustness checks. Dropping comments/reply/notes and lectures from the analysis and only using regular articles does not affect our results (Table A.8). Using continuous variables for the number of authors, title length and order of articles does not affect our results (Table A.9). In our main estimates we control for quality by using the total number of citations (less the citations attributable to the article in question) for the most highly cited author. In Table A.10 we replace this measure of author quality with dummies for quintiles of h-index for the author with the highest h-index. Again, our results are largely unaffected. The marginal effect of article length on citation count is slightly higher at 0.61, but is not statistically different from our main estimates.Footnote 23

Following Hamermesh (2018), we added an additional variable which categorized articles as (i) empirical: borrowed data, (ii) empirical: own data, (iii) theory only, (iv) theory with simulation, (v) lab experiment, (vi) field experiment, (vii) econometric theory, (viii) both theory and empirical: borrowed data, (ix) both theory and empirical: own data and (x) other types. When we add this variable to our models, the key results are not statistically different from those presented in Table 4.Footnote 24

Table A.11 presents results from models where article length enters linearly (either in levels or logs) rather than as a quadratic. The marginal effects are slightly smaller. In our preferred log–log model, the effect of article length falls from 0.56 to 0.40. The confidence intervals overlap, so these estimates do not seem to be significantly different from our main estimates. Interestingly, this estimate is nearly identical to what we found in the DiD estimates.Footnote 25

Results from our preferred model indicate that a 1% increase in article length increases citations by 0.56% after controlling for all other factors. On that basis, if we compare an article of 40 pages against a similar quality article of 50 pages, the number of citations for the latter article will be higher by 14 per cent. Thus it appears beneficial for authors to write longer articles.

However, this may not be true at the author level. Gibson (2014), investigating 5620 articles in 700 different journals, showed that citations are higher for people who spread a given output into more articles. In other words, from the supply side, the opportunity cost of writing a longer article is writing two or more shorter articles of similar quality, and two or more shorter articles will yield more total citations than one long article. For example, with our elasticity estimate, if an author wrote two short articles of 20 pages each, instead of one 40-page article of similar quality, her total citations would increase by 44%.
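The arithmetic behind the splitting example can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the long article's citations are normalized to 1, and every shorter article is assumed to be of the same quality. The linear version applies the 0.56 elasticity to the full 50% cut in length, which appears to be consistent with the 44% quoted; the constant-elasticity version is shown for comparison.

```python
ELASTICITY = 0.56  # preferred elasticity estimate reported in the text


def split_gain_linear(n_parts, elasticity=ELASTICITY):
    """Percent change in total citations from splitting one article into
    n_parts equal-length articles, using a first-order approximation."""
    length_change = 1 / n_parts - 1                  # e.g. -0.5 when halving
    per_article = 1 + elasticity * length_change     # citations per short article
    return (n_parts * per_article - 1) * 100


def split_gain_constant_elasticity(n_parts, elasticity=ELASTICITY):
    """Same comparison if citations scale as length ** elasticity."""
    per_article = (1 / n_parts) ** elasticity
    return (n_parts * per_article - 1) * 100


# One 40-page article versus two 20-page articles
print(f"linear approximation: {split_gain_linear(2):.1f}%")
print(f"constant elasticity:  {split_gain_constant_elasticity(2):.1f}%")
```

In either version, splitting raises total citations whenever the elasticity is below one, because each shorter article loses proportionally fewer citations than it loses pages.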

Obviously these effects are at the margin and while significant may be relatively unimportant compared to other factors. Researchers want to publish in top five journals and, in order to do so, in the face of high rejection rates and tough refereeing, authors will try to make articles more informative and useful by writing articles with the greatest possible breadth and depth.

Survey of economists

In order to gain a little more insight into authors’ citation practices and to inform our decisions about which variables to include in the regression models above, we conducted a very short survey of economists at the Australian National University (ANU).Footnote 26 All active economists at the University were invited by email to participate in the survey. Participants were asked if they prefer to cite longer or shorter articles. Twenty academics responded, all of whom claimed that article length did not influence their choice of whether or not to cite a paper. However, in follow-up questions, some (15%) mentioned that shorter articles are easier to read (Fig. 3), while longer articles have the benefit of being comprehensive and including both theory and empirical components (Fig. 4).Footnote 27

Fig. 3 Responses against preferring shorter articles

Fig. 4 Responses against preferring longer articles

Thus, the survey suggests that economists may not be aware that they are more likely to cite longer articles, but that other preferences lead them to do so. Such a result may be due to the correlation between the length and the comprehensiveness of articles.

Conclusion

We examine whether there is any relationship between article length and the number of citations that an article receives. In our investigation, we employ information on articles published between 2010 and 2014 in the top five journals in economics. Using citation counts from Google Scholar and controlling for a range of factors associated with citation, we find an elasticity of the number of citations with respect to page length of 0.56. A difference-in-differences approach exploiting differential changes in article length over time also confirms a statistically significant relationship between increased article length and citations. Neither approach is able to identify a causal impact of article length on citation numbers, but our estimates provide improved evidence for this relationship in top economics journals.

While longer articles may be of higher quality, we attempt to control for this by using a measure of author quality that uses information from all of the author’s other publications. Our estimates do not change much when we exclude this measure, probably because our focus on papers only from the top five journals means that we have a homogeneous sample of very high-quality papers.

Although citations increase with article length, our estimates, consistent with other literature, also suggest that authors are better off publishing multiple smaller articles rather than one longer article.

A brief survey of economists at the Australian National University indicates that authors do not have a conscious preference for citing longer articles. However, they do prefer comprehensive articles which combine both theory and empirical work and this may lead them to cite longer articles. Also, longer articles may contain a greater variety of things that can be cited.

Do our results mean that authors should try and pad out their articles with useless prattle? Of course not. High rejection rates and rigorous editing and reviewing systems in the top journals pare down irrelevant and uninteresting additional pages. If additional pages are of equally high quality to average pages, as is probably the case in our sample, then longer articles suggest more breadth and depth of high-quality analysis. By all means, authors should strive to write better papers with more high-quality analysis. Such papers produce more citations.