Introduction

Earlier research investigated the issue of readability from the perspectives of typography, interests, and style (Dale and Chall 1948). Of the three perspectives, research on readability in terms of style focuses on the “style of expression” (Dale and Chall 1948, p. 3), i.e. how words and sentence structures affect readers’ comprehension of a text. Probably inspired by this stream of research, Klare (1963) defined readability as “the ease of understanding or comprehension due to the style of writing”. Many writing experts advise that brevity and conciseness are essential to the quality of writing and thus to achieving high readability (e.g. Marroquín and Cole 2015; Zinsser 2006). Readability is also important for knowledge development and information communication (e.g. Hartley et al. 2002). Accordingly, readability has been examined in a wide range of contexts such as education materials, newspaper reports, and advertisements (Dolnicar and Chapple 2015).

Given this significance for research on academic texts, more than 200 readability measures or formulas have been developed to quantify the readability or difficulty level of texts (Gazni 2011). Of them, the Flesch Reading Ease (FRE) (Flesch 1948; Hartley 2000) is perhaps the most widely used (Didegah and Thelwall 2013). The FRE considers the numbers of syllables, words, and sentences in a text. A text with an FRE score above 80 is regarded as very easy, while a text with a score below 50 is considered difficult or very difficult. See “Appendix 1” for explanations of the FRE score bands (Dolnicar and Chapple 2015).
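For reference, the FRE combines average sentence length with average word length in syllables (Flesch 1948):

    \mathrm{FRE} = 206.835 - 1.015 \times \frac{\text{total words}}{\text{total sentences}} - 84.6 \times \frac{\text{total syllables}}{\text{total words}}

Higher scores thus correspond to shorter sentences and shorter words, i.e. easier texts.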

In contrast, McLaughlin’s (1969) simple grading readability formula, known as the Simple Measure of Gobbledygook (SMOG), counts the number of polysyllabic words (words with three or more syllables). The rationale is that “the counting of polysyllabic words in a fixed number of sentences gives an accurate index of the relative difficulty of various texts” (McLaughlin 1969). Compared with the FRE index, the SMOG score is more intuitive to interpret: the SMOG score of a text is the number of years of education a general reader may need in order to understand the text. Thus, the higher the SMOG score of a text, the more difficult the text is to understand. Many researchers prefer the SMOG index (McLaughlin 1969) to the FRE (e.g. Fitzsimmons et al. 2010), and regard it as “probably one of the simplest but most valid and reliable readability formulas available to date” (Contreras et al. 1999, p. 22).
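The precise regression form of the SMOG grade commonly implemented in software normalises the polysyllable count to a 30-sentence sample:

    \mathrm{SMOG} = 1.0430 \times \sqrt{\text{polysyllables} \times \frac{30}{\text{sentences}}} + 3.1291

McLaughlin’s (1969) original hand-calculation rule, counting polysyllables in exactly 30 sentences and taking 3 plus the square root of that count, approximates the same value.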

Recently, readability has become a concern in academia, and a number of studies have investigated the readability of academic writing. For example, Bauerly et al. (2006), Crosier (2004), Sawyer et al. (2008), and Stremersch et al. (2007) examined the readability of academic articles in management and marketing. Hartley et al. (2002, 2003) and Shelley and Schuh (2001) did so in psychology and education, while Bottle et al. (1983) and Dolnicar and Chapple (2015) studied chemistry and tourism respectively. In addition, Oliver et al. (1998) and Gazni (2011) scrutinized the readability of research articles across multiple disciplines.

Several points can be summarized from previous studies of the readability of academic articles. First, academic articles are very difficult to read. Dolnicar and Chapple (2015) analysed 13 studies that investigated the readability of research articles with the FRE formula, and found that most of the studies reported research articles to be difficult to read. For example, Hartley et al. (2002) examined the readability of 164 academic articles in psychology from 1935 to 1990. The results showed that the minimum FRE score of the articles was 19 (very difficult) and the maximum was 34 (difficult). Another example is Sawyer et al. (2008), who analysed the readability of 162 award-winning and non-winning articles. The study reported that both categories of articles are difficult to read in terms of FRE scores, though the non-winning articles are less readable than the award-winning ones.

Second, academic articles seem to have become increasingly difficult to read over time. One example is Bauerly et al. (2006), who investigated the readability of articles in The Journal of Marketing from 1936 to 2001 and found a significant decrease in readability over the period examined. Another example is Dolnicar and Chapple (2015), who studied the changes in readability of tourism articles from 1993 to 2013. They found that the readability of the articles dropped from 21.0 in 1993 to 20.6 in 2003 and to 15.5 in 2013. These results indicate that articles in tourism are also difficult to read and, more importantly, have become more difficult to read, though the decrease was not statistically significant.

Third, the readability and the citations of academic articles are either not significantly correlated or, in some cases, negatively correlated. For example, Hartley et al. (2002) investigated the relationship between the readability and the impact of articles in psychology, and found that the readability scores of articles with higher and lower citations were not significantly different. Another example is Stremersch et al. (2007), who examined, amongst other indices, the relationship between readability and citations for 1825 articles in five top marketing journals, and found that reading ease negatively affects citations. More recently, Gazni (2011) studied the readability of the abstracts of 260,000 articles and its relationship with the articles’ citation counts, also finding that less readable abstracts tend to receive more citations. In addition, Dolnicar and Chapple’s (2015) investigation of 493 tourism articles from 1993 to 2013 yielded similar results: the readability of the 20% most and 20% least cited articles was negatively correlated with their citations, although no significant overall correlation was found for all abstracts across the two decades.

Although the relationship between readability and citation counts of research articles has been examined in many disciplines, such as management, marketing, psychology, and tourism, no such research, to our knowledge, has been conducted on research articles in information science. Therefore, the present study examines the readability and the number of citations of research articles in information science. The study addresses the following three research questions:

1. What is the difficulty level of research articles in information science in terms of readability?

2. Does the difficulty level of research articles in information science change over time?

3. Is the readability of research articles in information science correlated with the number of citations of the articles?

Methods

We searched the Web of Knowledge database with the advanced search expression “IS = 0138-9130 OR IS = 1751-1577 OR IS = 0048-7333 OR IS = 0958-2029. Timespan = 2003–2012”. That is, we retrieved the bibliometric information of four journals, i.e. Scientometrics, Journal of Informetrics, Research Policy, and Research Evaluation, from 2003 to 2012. The four journals were chosen because they are important journals in information science or, as Milojević and Leydesdorff (2012) labelled the field, information metrics or iMetrics. Accordingly, the relationship between readability and citations of articles in information science or iMetrics may be well revealed by examining these four journals. The time span from 2003 to 2012 was selected for two reasons. First, a span of 10 years serves as a time window for discerning possible changes in the difficulty level of the articles. Second, articles published between 2012 and 2015 would not have had enough time to accumulate citations.

We used two sets of data in the present study: the abstracts and the full texts of articles. The first set is the abstract data. Following Gazni (2011) and Dolnicar and Chapple (2015), we used the abstracts of the articles to analyse readability for two reasons. First, previous research claimed that the abstract is often the only section of an article that is read (Pitkin et al. 1999), and it may be “the most important surrogate type of information” of an article (Gazni 2011, p. 273). More importantly, Hartley et al. (2003) analysed the Abstract, Introduction and Discussion sections of psychology journal articles and found that these sections are stylistically consistent within authors. In other words, the abstract “should reflect fully and accurately the work reported” (Pitkin et al. 1999, p. 1110). Accordingly, the articles of the four journals, together with their citation information from 2003 to 2012, were harvested after the first retrieval. The articles with both abstracts and citation counts were then taken as the abstract data set of the present study. Note that all abstract and full-text data of the four journals described below cover 2003 to 2012, except for Journal of Informetrics, which was launched in 2007; its data therefore cover 2007 to 2012.

The second set is the full-text data. Due to practical limitations, we could not download the full texts of all the articles published in the four journals from 2003 to 2012. We therefore randomly selected five articles per year of the examined decade from each of the four journals, so that the full-text data consisted of 200 articles in total, 50 from each journal. As previously mentioned, the data for Journal of Informetrics cover 2007 to 2012. The full-text data serve two purposes: they address one of the reviewers’ concerns, and the results from both the abstracts and the full texts allow us to triangulate our findings.

Both the Flesch Reading Ease (FRE) (Flesch 1948) and the Simple Measure of Gobbledygook (SMOG) (McLaughlin 1969) were employed for the analysis of readability. The FRE was used because most previous studies pertinent to the present research (e.g. Dolnicar and Chapple 2015; Gazni 2011) used it as the readability index; calculating it allows direct comparisons with the results of those studies. In addition, the SMOG was computed because many researchers have argued that it is more reliable and valid for the analysis of readability (e.g. Contreras et al. 1999; Fitzsimmons et al. 2010).

The Python package Readability developed by the NLTK team (available at https://github.com/nltk/nltk_contrib) was adopted to analyse the readability of the abstracts and the full texts. The Readability package calculates, amongst other indices, both the FRE and the SMOG scores of the submitted texts. To examine the validity and accuracy of the package, another Python package also entitled Readability (available at https://github.com/buriy/python-readability) was used to calculate the readability of the same data. Linear correlation analyses (Pearson’s r) showed that the readability scores from the two Python tools were significantly correlated on both the FRE and the SMOG (FRE: r = 0.9955, p < .001; SMOG: r = 0.9985, p < .001). That is, the readability scores from the package developed by the NLTK team were consistent with an independent implementation, and hence these scores were used and reported in the present study.
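The scoring and cross-validation steps can be sketched as follows. The sketch is illustrative only: it uses a crude vowel-group heuristic for syllable counting rather than the packages’ own tokenisers, and the names abstracts and other_tool_fre are hypothetical placeholders for the data set and the second tool.

    import re
    from math import sqrt
    from scipy.stats import pearsonr

    def count_syllables(word):
        # Crude heuristic: one syllable per group of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fre_and_smog(text):
        # Split into sentences and words with simple regular expressions.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z]+", text)
        syllables = sum(count_syllables(w) for w in words)
        polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
        fre = (206.835 - 1.015 * len(words) / len(sentences)
               - 84.6 * syllables / len(words))
        smog = 1.0430 * sqrt(polysyllables * 30 / len(sentences)) + 3.1291
        return fre, smog

    # Cross-validation: score the same texts with a second, independent
    # tool and correlate the two sets of scores (names are hypothetical).
    # scores_a = [fre_and_smog(t)[0] for t in abstracts]
    # scores_b = [other_tool_fre(t) for t in abstracts]
    # r, p = pearsonr(scores_a, scores_b)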

Then, the relative citation rate (RCR) (Schubert and Braun 1986) was employed to normalise the number of citations. The RCR was calculated as the observed citation rate divided by the expected citation rate, where the observed citation rate was the citation count of a publication (i.e. a journal article in the present study), and the expected citation rate was the average citation count of the journal in the year when the article was published.
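For illustration, the normalisation can be sketched as follows, assuming the bibliometric records are held in a pandas data frame; the column names and example records are hypothetical:

    import pandas as pd

    # Hypothetical records; in the study these would be the harvested articles.
    df = pd.DataFrame({
        "journal": ["Scientometrics", "Scientometrics", "Research Policy"],
        "year": [2005, 2005, 2005],
        "citations": [12, 4, 30],
    })

    # Expected citation rate: mean citations of the same journal in the same year.
    expected = df.groupby(["journal", "year"])["citations"].transform("mean")

    # RCR = observed citation rate / expected citation rate.
    df["RCR"] = df["citations"] / expected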

Last, Spearman’s rank correlation analyses (Spearman’s rho) between the readability scores of the two data sets and the normalised citations of the articles were conducted, since the citation counts were not normally distributed (we owe this point to one of the reviewers).
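This step reduces to a single rank-correlation call; the values below are hypothetical stand-ins for the FRE (or SMOG) scores and the normalised citations of the same articles:

    from scipy.stats import spearmanr

    # Spearman's rho ranks the data internally, so no normality
    # assumption is required (hypothetical example values).
    fre_scores = [28.4, 19.7, 35.2, 24.1, 30.9]
    rcr_values = [1.2, 0.4, 2.3, 0.9, 1.1]

    rho, p = spearmanr(fre_scores, rcr_values)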

Results

Difficulty levels of abstracts and full texts of articles

The descriptive statistics of the FRE are reported in “Appendix 2” (Tables 4, 5). The minimum FRE score of the abstracts is −212.60 (very difficult) and the maximum is 90.66 (very easy), indicating that the difficulty of the abstracts varies considerably. The mean and median FRE scores of the abstracts are 28.45 and 29.38 respectively (very difficult). That is, the abstracts of research articles in information science are generally very difficult to read.

The results of the FRE indices of the full texts are similar to those of the abstracts. The difficulty of the full texts ranges from very difficult (7.82) to fairly easy (76.64), and the full texts are also difficult to read on average (mean: 42.79, median: 41.96).

As one of the reviewers commented, some software packages such as Microsoft Word report FRE scores with negative values converted to zero. The Python tool used in the present study does not make such a conversion, which seems more accurate.

Dolnicar and Chapple (2015) reported mean FRE scores of 17 to 19 for the abstracts in three tourism journals; the journal abstracts in information science are thus less difficult to read than those in tourism. In addition, Gazni (2011) reported the readability of journal abstracts by researchers in five distinguished universities and institutes across 22 disciplines, with mean FRE scores ranging from 12 to 28. The readability of the abstracts in information science appears comparable to that in space science and mathematics (25.4 and 25.6) and higher than that in the other disciplines.

The descriptive statistics of the SMOG are reported in “Appendix 2” (Tables 6, 7). The minimum SMOG score of the abstracts is 3 and the maximum is 48.50, which again indicates that the difficulty of the abstracts varies considerably. The mean SMOG score of the abstracts is 16.52; that is, a general reader may need 16.52 years of education, on average, to understand the abstracts of research articles in information science, which also shows their high difficulty level. Similar results are found for the SMOG statistics of the full texts. The mean and median scores of the full texts are 15.67 and 15.60, indicating that a general reader may need roughly 15 to 16 years of education to understand the full texts. See “Appendix 3” for two sample abstracts of differing readability.

Trends of difficulty levels over time

The readability trends of the abstracts and full texts in terms of the FRE scores are demonstrated in Figs. 1 and 2. The FRE scores of the abstracts showed a downtrend from 2003 to 2012, i.e. the abstracts tended to become less readable, though they remained at the very difficult level with FRE scores approximately at or below 30. A simple linear regression showed that the downtrend was not significant (F(1, 3047) = 2.647, p = 0.1038, Cohen’s d = 0.0589).
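A minimal sketch of such a trend test, using scipy’s linregress on hypothetical yearly scores (the study fitted the regression over individual article scores, and the data below are illustrative stand-ins): for simple linear regression, the F statistic can be recovered from the squared correlation, with the same p value linregress reports for the slope.

    from scipy.stats import linregress

    # Hypothetical (year, FRE) pairs standing in for the per-article scores.
    years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]
    fre = [30.1, 29.8, 29.5, 29.9, 28.7, 28.9, 28.2, 28.5, 27.9, 28.1]

    result = linregress(years, fre)
    # F = (R^2 / 1) / ((1 - R^2) / (n - 2)) for one predictor.
    f_stat = result.rvalue ** 2 / (1 - result.rvalue ** 2) * (len(years) - 2)
    print(f"slope = {result.slope:.3f}, F = {f_stat:.3f}, p = {result.pvalue:.4f}")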

Fig. 1 FRE scores from 2003 to 2012 of the abstracts

Fig. 2 FRE scores from 2003 to 2012 of the full texts

In contrast, the FRE scores of the full texts showed an uptrend from 2003 to 2012, i.e. the full texts tended to become more readable, though they remained at the difficult level with FRE scores approximately between 35 and 50. A simple linear regression showed that the uptrend was significant (F(1, 198) = 5.889, p = 0.01613), although the change is small in terms of effect size (Cohen’s d = 0.3432).

The readability trends of the abstracts and full texts in terms of the SMOG scores are reported in Figs. 3 and 4. Interestingly, an uptrend of the SMOG scores was found for the abstracts; that is, the abstracts became less readable in terms of SMOG (F(1, 3047) = 19.34, p < .001), though the change was very small (Cohen’s d = 0.1593). In addition, a downtrend of the SMOG scores was found for the full texts; that is, the full texts became more readable in terms of SMOG (F(1, 198) = 6.899, p = 0.0093), though this change was also small (Cohen’s d = 0.3715).

Fig. 3 SMOG scores from 2003 to 2012 of the abstracts

Fig. 4 SMOG scores from 2003 to 2012 of the full texts

We explain these trends in the Discussion section.

Readability and the number of normalised citations

The descriptive statistics of the normalised citations of the abstracts and full texts are reported in Tables 1 and 2. For the abstracts, the normalised citation counts range from 0 to 17.83, indicating considerable variation in citations across articles.

Table 1 Descriptive statistics of normalised citations of the abstracts
Table 2 Descriptive statistics of normalised citations of the full texts

We calculated the correlation between the number of normalised citations and the readability scores for the abstracts and full texts respectively. No significant correlation was found between the number of normalised citations and the FRE scores (abstracts: ρ = −0.020, p = .260; full texts: ρ = 0.035, p = .619) or between the number of normalised citations and the SMOG scores (abstracts: ρ = 0.006, p = .750; full texts: ρ = −0.005, p = .946). The scatter plots are illustrated in Figs. 5, 6, 7 and 8.

Fig. 5 Scatter plot of FRE readability and normalised citations of the abstracts

Fig. 6 Scatter plot of SMOG readability and normalised citations of the abstracts

Fig. 7 Scatter plot of FRE readability and normalised citations of the full texts

Fig. 8 Scatter plot of SMOG readability and normalised citations of the full texts

Discussion and conclusion

The present study analysed the readability of the abstracts and the full texts of articles published in four leading journals in information science, or iMetrics, from 2003 to 2012. The results showed that both the abstracts and the full texts fall into the difficult or very difficult categories in terms of both FRE and SMOG scores.

In addition, the readability of the abstracts in terms of FRE scores remained stable across time, while their readability in terms of SMOG scores declined, though the effect size of the change was minuscule. Since the SMOG index is based on the normalised number of polysyllabic words (i.e. words with three or more syllables), the fact that the abstracts became less readable in terms of SMOG scores suggests that the abstracts came to contain more polysyllabic words over the examined decade.

The findings for the full texts in terms of both FRE and SMOG scores showed that they became significantly more readable over the examined decade. That is, the full texts may have come to contain fewer polysyllabic words, shorter words, and shorter sentences. However, the effect sizes of both the FRE and the SMOG changes are minuscule; that is, the change in the readability of the full texts was small.

The results confirm the findings of previous studies such as Gazni (2011) and Dolnicar and Chapple (2015), which also found abstracts to be very difficult to read across time. In addition, the bottoming-out effect proposed by Bottle et al. (1983) may account for the stable difficulty level of the abstracts over time in information science: the abstracts and the full texts are already at the very difficult, or bottom, level.

One point of interest is the difference in readability between the abstracts and the full texts. Although both are difficult to read, the abstracts seem even more difficult than the full texts in terms of mean FRE scores (28.45 vs. 42.79; the mean SMOG scores are comparable: 16.52 vs. 15.67). That is, the abstracts may use longer words and sentences than the full texts do, which may be explained by the very limited space allowed for abstracts. Abstracts are usually limited to 200–300 words, yet within that space they are responsible for reporting the essence of the full texts. Accordingly, in comparison with full texts, less frequent words such as technical terms may take up a higher proportion of the sentences in abstracts, and complex and compound sentences may be used more frequently in order to save space.

The study also examined the relationship between readability and the number of citations. The results demonstrated that readability indices such as the FRE and the SMOG scores were not significantly correlated with the number of citations, confirming the findings of previous studies such as Hartley et al. (2002) and Dolnicar and Chapple (2015). The results seem more interesting when previous studies such as Stremersch et al. (2007) and Gazni (2011) are taken into account. In those studies, a significantly negative correlation was found between the reading ease index and the number of citations, i.e. the Dr. Fox phenomenon in academia (Dolnicar and Chapple 2015, p. 164), whereby people’s judgements of academic publications may be misled by superficial linguistic features.

Bottle et al. (1983) suggested that the increased printing density of journals was responsible for the downward trend in the readability of research articles. Probably based on this argument, Dolnicar and Chapple (2015) proposed the use of online supplementary data in order to reduce the space pressure on authors. However, the space-pressure explanation does not seem to hold, because the readability of academic texts may not be constrained by space limits. Authors, while composing their texts, may not take the factors that define the FRE score, such as word length or sentence length, into consideration. It is more reasonable to assume that they are primarily concerned with how knowledge is conveyed and information communicated to their target audience, particularly when that audience consists of their peers in a specific academic field. Accordingly, academic texts unavoidably include a fairly large number of technical terms, which tend to be long words and are less readable to non-academics or to academics outside the field. As a result, academic texts are less readable or very difficult to read, especially to readers outside the field, even though many writing experts and researchers such as Zinsser (2006), Okulicz-Kozaryn (2013) and Marroquín and Cole (2015) suggest that academic writing should be readable and economical.

The picture becomes even less encouraging if less readable articles receive more citations, as Stremersch et al. (2007) and Gazni (2011) found in their studies, even though award-winning articles have been found to be more readable than non-winning ones (Sawyer et al. 2008). Given such citation patterns, authors may see little reason to consider readability while composing academic texts (Lei 2016). This is particularly true if, as one of the reviewers argues, researchers in the natural and applied sciences tend to write texts that score as less readable on readability formulas.

Accordingly, we fully accept the point that the academic community values articles for the impact of the studies they report, not for their writing quality (Gazni 2011). That is, the primary goal of academic writing is to exchange knowledge; the number of citations an article receives is only a measure of the impact of the study, and the writing quality or readability of an academic text is secondary to that impact. That said, we do not suggest that academic writers pay no attention to readability. On the contrary, it would be better if technical texts were made more readable and clearer once the knowledge or information has been conveyed accurately.

Future research may consider the following points. The first is to increase the size of the full-text data set and conduct more in-depth investigations of the relationship between the readability of full texts and citation counts. The data may also be extended over a longer span, such as two or more decades, if diachronic changes are to be examined in more depth. The second, as one of the reviewers suggests, is to examine the relationship between human-rated readability and the scores from indices such as the FRE and the SMOG; such investigations are motivated by challenges to the superficial textual features that these indices measure (e.g. Hartley et al. 2002). The third, as one of the reviewers suggests, is to examine the readability of academic texts across disciplines and by authors of different first-language backgrounds, ethnicities, genders and ages. In addition, future research may use other readability measures, such as the Flesch-Kincaid Grade Level and the Fry formula, to examine the relation between readability and citation counts.