Introduction

The Impact Factor (IF) reflects the relative importance of a journal within its field and quantifies the frequency with which the “average article” in a journal has been cited in a particular period [1]. The IF of a given indexed journal is calculated by dividing the number of times the articles published in that journal during the two previous years were cited, by the number of articles the journal published in the same interval [2]. Although the IF has become the leading metric for tenure and promotion decisions and for budget and resource planning in most universities, research institutions, and colleges [3], dissenting opinions hold that the IF is not a perfect metric and has severe limitations [4,5,6,7].
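For orientation, the standard two-year definition can be written as a formula; the year labels below are illustrative and are not taken from the article:

$$\mathrm{IF}_{Y} \;=\; \frac{\text{citations received in year } Y \text{ to items published in years } Y-1 \text{ and } Y-2}{\text{number of citable items published in years } Y-1 \text{ and } Y-2}$$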

Recently, the American Society for Cell Biology, together with journal editors, publishers, and other stakeholders, issued a pledge to move away from an over-reliance on the journal impact factor and to seek new ways of assessing research output [8]; a similar opinion appeared in an editorial of BMC Medicine [9]. Furthermore, recent articles in journals of the Radiology, Nuclear Medicine and Medical Imaging category, written by editors and authors, have commented on several aspects of the IF and its association with concurrent bibliometrics [1, 7, 10,11,12].

Bibliometrics alternative to the impact factor (IF), all of them reported annually by the Web of Knowledge managed by Thomson Reuters [13], have been claimed to better capture the esteem of journals in specific categories [14, 15]. These bibliometrics include the 5-year Impact Factor, Immediacy Index, number of articles published, Cited Half-life, Eigenfactor™ Score (ES), and Article Influence Score.

To further analyze the relationship between citations and the bibliometrics of Radiology, Nuclear Medicine and Medical Imaging journals reported annually by the Web of Knowledge, we assessed the bibliometrics of these journals and calculated their ability to predict total cites over a 7-year period. Our findings may help authors understand which bibliometrics offer a better ranking of journals before submission.

Materials and methods

Study design

We conducted a retrospective study to evaluate the performance of journals in the Radiology, Nuclear Medicine and Medical Imaging category of the Web of Knowledge [13], recording the values of eight selected bibliometrics listed in the Journal Citation Reports (JCR) [16] over a 7-year period. Definitions of each bibliometric used, as given by the Web of Knowledge, have been published recently [15].

Definitions of the alternative bibliometrics from the Web of Knowledge (compact formulas for the 5-year Impact Factor and Immediacy Index follow the list):

  1. The 5-year Impact Factor (5-yIF) is the IF of a given publication in a specific year, but calculated over a 5-year period.

  2. The cited half-life (CHL) is a measure of the rate of decline of the citation curve, that is, the number of years it takes for the number of current citations to decline to 50% of its initial value. It measures how long articles continue to be cited after publication [17].

  3. The immediacy index (ImIn) depicts how often, on average, authors cite very recent articles from a particular journal and, hence, how rapidly the average paper from that journal is adopted into the literature [17].

  4. The Eigenfactor™ Score (ES) is considered an indicator of the global influence or repercussion of manuscripts published online in the Journal Citation Reports® (JCR) as part of the Web of KnowledgeSM. Its calculation is based on the number of times articles published in the past five years have been cited during the JCR year. It also considers which journals have contributed these citations, so that highly cited journals influence the network more than lesser-cited journals; references from one article to another article in the same journal are removed, so that Eigenfactor Scores are not biased by journal self-citation [18, 19].

  5. The Article Influence™ Score (AIS) determines the average influence of a journal's articles over the first 5 years after publication. It is derived from the ES using the same iterative algorithm, but takes the number of articles into account [20].

  6. Article number (AN) is the number of articles published during the selected year.
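As a compact reference for two of the definitions above, following the standard JCR formulations (the year label Y is illustrative):

$$\mathrm{5\text{-}yIF}_{Y} \;=\; \frac{\sum_{k=1}^{5}\text{citations in year } Y \text{ to items published in year } Y-k}{\sum_{k=1}^{5}\text{items published in year } Y-k}, \qquad \mathrm{ImIn}_{Y} \;=\; \frac{\text{citations in year } Y \text{ to items published in year } Y}{\text{items published in year } Y}$$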

Journal selection and measured periods

We chose the bibliometric values of journals in the Radiology, Nuclear Medicine and Medical Imaging category of the JCR Science Edition. A total of 124 journals were selected (see Table 1). Journals that appeared consistently in the JCR Science Editions between 2007 and 2013 were included. We assembled five sets of bibliometrics for each journal, each set matched to that journal's total citations two years ahead (a minimal assembly sketch appears below):

Table 1 List of journals in the Radiology, Nuclear Medicine and Medical Imaging category
  • Set 1: 2007 Bibliometrics vs. 2009 Total Cites.

  • Set 2: 2008 Bibliometrics vs. 2010 Total Cites.

  • Set 3: 2009 Bibliometrics vs. 2011 Total Cites.

  • Set 4: 2010 Bibliometrics vs. 2012 Total Cites.

  • Set 5: 2011 Bibliometrics vs. 2013 Total Cites.

A PDF file containing the bibliometrics of all reported journals (2007–2013) is available as a supplementary file for online access.
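The pairing of predictor years with citations two years ahead can be sketched in a few lines of Python. The file name and column names below are hypothetical (they are not the names used in the supplementary file); the snippet only illustrates how the five sets were assembled into one long-format table.

```python
import pandas as pd

# Hypothetical long-format table: one row per journal-year with the JCR
# bibliometrics and total cites (file and column names are assumptions).
jcr = pd.read_csv("jcr_radiology_2007_2013.csv")

METRICS = ["impact_factor", "if_5year", "immediacy_index", "articles",
           "cited_half_life", "eigenfactor", "article_influence"]

def build_lagged_sets(df, first_year=2007, last_year=2011, lag=2):
    """Pair each year's bibliometrics with total cites `lag` years ahead."""
    sets = []
    for year in range(first_year, last_year + 1):
        predictors = df.loc[df["year"] == year, ["journal", "year"] + METRICS]
        outcome = df.loc[df["year"] == year + lag, ["journal", "total_cites"]]
        sets.append(predictors.merge(outcome, on="journal", how="inner"))
    return pd.concat(sets, ignore_index=True)

long_data = build_lagged_sets(jcr)  # five sets (2007->2009 ... 2011->2013) stacked
```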

Sample size calculation

We followed the recommendation of Tabachnick and Fidell [21] for repeated measures analysis: the univariate F test is robust to modest violations of normality as long as there are at least 20 degrees of freedom for error in a univariate ANOVA and the violations are not due to outliers. Even with unequal n and only a few dependent variables (DVs), a sample size of about 20 in the smallest cell should ensure robustness; we obtained a total of 124 measurements per bibliometric included in our assessment.

Statistical analysis

Design of a mixed effects model

We assembled a predictive model of total cites two years ahead by combining the overall effect of the bibliometrics while accounting for within-journal correlation across five repeated measures. A linear mixed effects model with random slopes and intercepts was used to test the hypothesis that the alternative bibliometrics of the Web of Knowledge surpass the IF as predictors of total cites. In agreement with the hierarchical structure of the data, we assembled a mixed model comprising a three-level hierarchy: level 1, the selected bibliometrics (continuous variables) measured consecutively over 5 years; level 2, the year of citation (that is, time, the repeated measures); and level 3, the journal ID (levels 1 and 2 are nested within each journal). Figure 1 shows the diagram of the three-level hierarchical data structure that assembled the variables and effects in the model.
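A minimal sketch of this specification in Python (statsmodels) is shown below: fixed effects for the seven bibliometrics and the year of citation, plus a random intercept and a random slope for time within each journal. It reuses the hypothetical `long_data` table sketched above and is not the exact SPSS MIXED syntax used in the study.

```python
import statsmodels.formula.api as smf

# Fixed effects: seven bibliometrics + year of citation (categorical).
# Random effects: intercept and slope over time, grouped by journal.
model = smf.mixedlm(
    "total_cites ~ impact_factor + if_5year + immediacy_index + articles"
    " + cited_half_life + eigenfactor + article_influence + C(year)",
    data=long_data,
    groups=long_data["journal"],
    re_formula="~year",        # random intercept and random slope over time
)
fit = model.fit(reml=False)    # maximum likelihood estimation, as in the study
print(fit.summary())
```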

Fig. 1 The design of a three-level hierarchical data structure, where the level-2 variable defines the repeated measures

Evaluating the need for multilevel modeling

The intraclass correlation coefficient (ICC) quantified the ratio of variation across journals relative to the year of citation. A high ICC value points to differences in the mean levels of a selected bibliometric across time (year-over-year data), in which case multilevel modeling is needed to estimate separately the bibliometric variance that occurs across journals and across time of measurement [22].
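A rough ICC sketch for one bibliometric is shown below, using the hypothetical column names introduced earlier: fit an intercept-only mixed model with journals as the grouping factor and take the share of total variance attributable to between-journal differences.

```python
import statsmodels.formula.api as smf

# Intercept-only (null) model for one bibliometric, grouped by journal.
null_fit = smf.mixedlm("eigenfactor ~ 1", data=long_data,
                       groups=long_data["journal"]).fit(reml=False)
between_var = float(null_fit.cov_re.iloc[0, 0])  # variance of journal intercepts
within_var = null_fit.scale                      # residual (within-journal) variance
icc = between_var / (between_var + within_var)
print(f"ICC for the Eigenfactor Score: {icc:.3f}")
```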

Independent and dependent variables

A total of eight independent variables were included: seven bibliometrics (continuous variables), namely Impact Factor, 5-year Impact Factor, Immediacy Index, No. of Articles, Cited Half-life, Eigenfactor Score, and Article Influence Score; and one categorical variable, year of measurement (the time-set of citations). Total citations two years ahead was the dependent variable.

Mixed-model effects analysis

Data were analyzed using maximum likelihood (ML) estimation; it is considered an appropriate approach for studying individual change because ML focuses on the entire model (both fixed and random effects). For our data, it created a hierarchical model that nested the repeated yearly measures within journals. To specify the within-individual error covariance structure that best fits the data and protects the precision of the estimates, we evaluated the most common covariance matrix types reported in the literature (unstructured, scaled identity, compound symmetry, diagonal) [23]; each matrix type generated a different model. The fit of the models was assessed with the −2 log likelihood (i.e., the likelihood ratio/deviance test), Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC).
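The three information criteria are simple functions of the fitted log likelihood; a small self-contained sketch is given below with illustrative numbers only (they are not the values reported in Table 2).

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Deviance, AIC, and BIC used to compare candidate covariance structures;
    smaller values indicate a better fit/complexity trade-off."""
    deviance = -2.0 * log_likelihood                # the -2 log likelihood
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    return deviance, aic, bic

# Illustrative example: a model with 12 parameters fitted to 620 journal-year rows.
print(information_criteria(log_likelihood=-4875.3, n_params=12, n_obs=620))
```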

Our model considered the fixed effects of the selected independent variables. We added a random effect for the repeated measures, which allowed us to resolve the non-independence of the data by assuming a different “baseline” of the continuous independent variable values for each journal. This effect characterizes the variation due to individual differences; the model design expected multiple responses per journal, and these responses would depend on each journal's baseline level [24]. The fit of the data was represented graphically using a scatter plot of the predicted versus observed values of total citations for the whole model, labeled by subgroup (year of citations). Linear regression (LR) analysis provided the R2 and p values [25].

Measure of the effect size

We computed a pseudo-R2 as a global effect size statistic, even though the response variable variance was partitioned across the levels of our model. As previously described, we used the predicted score for each journal in the sample, calculated the correlation between the observed and predicted scores, and squared that correlation [22]. The effect size (the proportion of variance in the dependent variable that can be explained by the independent variables) was interpreted using the thresholds proposed by Cohen [26], where 0.10–0.29 = small effect, 0.30–0.49 = moderate effect, and ≥ 0.5 = large effect. All analyses were carried out using IBM's SPSS software (version 22.0.0.0, IBM Corporation, Armonk, NY, USA). Statistical significance was indicated by p < 0.05 (two-tailed).
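The pseudo-R2 described above reduces to a few lines; the usage comment refers to the hypothetical model objects sketched earlier.

```python
import numpy as np

def pseudo_r2(observed, predicted):
    """Squared correlation between observed and model-predicted outcomes,
    used here as the global effect-size statistic."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2

# e.g. pseudo_r2(long_data["total_cites"], fit.fittedvalues) with the model above
```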

Evolution of IF and ES over time

We finished our analysis by plotting the evolution of the IF and ES (separately) in the Radiology, Nuclear Medicine and Medical Imaging category, sorting the journals into five groups based on the values of their IF [(level 1, 0–0.99); (level 2, 1.0–1.49); (level 3, 1.5–1.99); (level 4, 2.0–2.99); (level 5, ≥ 3.0)] and ES [(level 1, 0–0.00120); (level 2, 0.00121–0.00285); (level 3, 0.00286–0.00565); (level 4, 0.00566–0.01200); (level 5, ≥ 0.01201)]. These cut points correspond to the 20th, 40th, 60th, 80th, and 100th percentiles, respectively. We plotted the selected bibliometric values from 2007 to 2013 and performed a split-plot factorial ANOVA (a grouping sketch is shown below).
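The five-level grouping amounts to cutting each metric at its 20th/40th/60th/80th percentiles; a minimal sketch, again with the hypothetical column names used above, is:

```python
import numpy as np
import pandas as pd

def quintile_levels(values):
    """Assign level 1-5 based on the 20th/40th/60th/80th percentile cut points."""
    cuts = np.percentile(values, [20, 40, 60, 80])
    return pd.cut(values, bins=[-np.inf, *cuts, np.inf], labels=[1, 2, 3, 4, 5])

long_data["if_level"] = quintile_levels(long_data["impact_factor"])
long_data["es_level"] = quintile_levels(long_data["eigenfactor"])
```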

Displacement of the ranking place

To provide a graphic representation of how the alternative bibliometrics reorder the top 25 journals initially ranked by the IF, we present a simple line graph connecting the rankings of the top 25 journals under their most significant predictive metrics.
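The underlying comparison can be sketched as follows: rank the journals by their 2013 IF, keep the top 25, and record where each lands when ranked by the ES instead. The snippet reuses the hypothetical `jcr` table introduced earlier and is not the exact code behind Fig. 5.

```python
# Rank journals by 2013 IF and by 2013 ES, then compute each journal's displacement.
ranks = jcr[jcr["year"] == 2013].assign(
    rank_if=lambda d: d["impact_factor"].rank(ascending=False, method="first"),
    rank_es=lambda d: d["eigenfactor"].rank(ascending=False, method="first"),
)
top25 = ranks[ranks["rank_if"] <= 25].copy()
top25["displacement"] = top25["rank_if"] - top25["rank_es"]
print(top25[["journal", "rank_if", "rank_es", "displacement"]].sort_values("rank_if"))
```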

Results

Multilevel modeling analyses

Intraclass correlation coefficients

We found significant ICC values for all selected bibliometrics: Impact Factor (ICC = 0.988, p < 0.001); 5-year Impact Factor (ICC = 0.993, p < 0.001); Immediacy Index (ICC = 0.930, p < 0.001); No. of Articles (ICC = 0.989, p < 0.001); Cited Half-life (ICC = 0.981, p < 0.001); Eigenfactor Score (ICC = 0.998, p < 0.001); and Article Influence Score (ICC = 0.993, p < 0.001). These ICC values justified the use of multilevel modeling: the variation in mean bibliometric levels across time (year of citation) indicated that the bibliometric variance occurring across journals and across time should be estimated separately.

Overall fit of models

We performed separate analyses using the most common covariance matrix types; the unstructured matrix type produced the best model, with the smallest values in the information criteria table. This matrix type has been reported to offer the best fit for longitudinal data because it imposes no assumptions on the error structure [27]. Table 2 shows the assessment of the overall fit of the multivariate models.

Table 2 Assessment of the overall fit of the multivariate models evaluating different covariance matrix types; the selected covariance type (smallest values in the information criteria) corresponded to the unstructured covariance matrix

Significant predictors of total citations and beta coefficients of the regression model

All independent variables were included: Impact Factor, 5-year Impact Factor, Immediacy Index, No. of Articles, Cited Half-life, Eigenfactor Score, and Article Influence Score. For the random effects, we used by-subject (journal) random slopes and intercepts for the effect of the repeated measures (time) (Wald Z = 10.845, p < 0.001). There was a significant effect for five independent variables in the model: 5-year Impact Factor, No. of Articles, Cited Half-life, Eigenfactor Score, and Article Influence Score (p ≤ 0.010 in all cases). The Impact Factor and Immediacy Index were not significant. Table 3 shows the main effects for the selected model.

Table 3 Significant predictors of Total Cites for selected bibliometrics

The most significant coefficient corresponded to the ES; it showed a positive, meaningful direction on the outcome. Table 4 shows the unstandardized beta coefficients for each variable and their confidence intervals. Figure 2 shows graph lines identifying each selected journal in the final model; the existence of random slopes and intercepts is evident.

Table 4 Directions of the relationship between each predictor and the outcome (total citations), represented by a positive or negative regression coefficient (b value); the large beta coefficient of the Eigenfactor Score signals it as the most influential predictor of total citations
Fig. 2 Graph lines identifying each selected journal in the final model. Lines represent the predicted values of total citations (number) at each year of measurement; the existence of random slopes and intercepts among the journals is evident

Global effect

The global effect size (the pseudo-R2 correlation between the observed and predicted scores) for the whole model was R2 = 0.934 (p < 0.001), which corresponds to a large effect size. Figure 3 illustrates the regression line between the observed and predicted values for total citations.

Fig. 3 Regression line between the observed and predicted number of total cites, which was considered a measure of the global effect size; there was an excellent correlation between the data (99.9%, p < 0.001)

Evolution of IF and ES over time

The split-plot ANOVA showed no significant interaction between time and IF (p > 0.05) but a main effect of time (p < 0.001); there was neither an interaction between time and ES nor a main effect of time (p > 0.05 in both analyses). Journals with an IF > 3.0 showed a continuously growing trend from 2007 to 2013; journals with an IF between 2 and 3 decreased in value after 2010, and those with an IF below 2.0 remained stable after that date. Regarding the ES, journals with values higher than 0.01201 showed a decreasing trend from 2007 to 2011, followed by a recovery that continued until 2013. All journals with an ES below 0.01200 depicted a very mild downward trend from 2007 to 2013. Figure 4a, b shows the IF and ES trends from the split-plot ANOVA.

Fig. 4 Graphical representation of the IF and ES trends over time (from 2007 to 2013) in the Radiology, Nuclear Medicine and Medical Imaging category using a split-plot ANOVA with a 5-level rank. a Journals with an IF > 3.0 showed a continuously growing trend from 2007 to 2013. b Regarding the ES, journals with values higher than 0.01201 showed a decreasing trend from 2007 to 2011, followed by an increasing trend that continued until 2013

Displacement of journals previously ranked by the IF

There was a significant re-ranking among the top 25 journals initially listed by the IF (year 2013). After we reclassified the journals based on their ES, only 16 journals remained in the top 25; some were demoted, but the rest climbed higher. For example, JAAC-Cardiovas Imag moved from 1st to 20th place, and Med Phys jumped from 25th to 6th. When ranked using the CHL, only 3 of the original top 25 IF journals stayed within the top 25 places. Figure 5 depicts the ranking displacements based on IF, ES, and CHL.

Fig. 5 Graphical representation of displacements in the ranking order of the top 25 journals; after re-ranking the journals by their ES, only 16 journals remained in the top 25 and the rest were demoted; in a second re-ranking by CHL, only 3 of the original top 25 IF journals stayed within the top 25 positions

Discussion

The ranking of journals by their IF has become a primary consideration when authors decide where to submit their papers. The IF is misused as a proxy for the quality of individual articles [8]. Researchers usually look for journals with the highest impact factor instead of journals with the best audience for their research [4].

The success of researchers is nowadays judged by the number of papers they have published in high-IF journals [28]. The scientific impact of a journal, as evaluated by bibliometrics, is a complex, multidimensional construct, and the use of a single bibliometric index is therefore inappropriate for ranking, evaluating, and valuing journals. Readers should look beyond the impact factor and assess scientific articles individually [29]. Preferably, the use of multiple metrics with complementary features provides a more comprehensive view of journals and their relative placement in their fields [30].

Our study adds evidence to numerous reports on the apparent limitations of the IF as a significant predictor of total citations [5, 31,32,33]. The use of mixed-model analyses allowed us to study intra- and interindividual differences in the curve parameters (slopes and intercepts) [23]; this approach dispenses with the assumption of homogeneity of regression slopes, sets aside the assumption of independence between cases, and tolerates missing data [24]. Our results point to the ES as the bibliometric that best captures the prestige of a journal, an ability that has previously been compared with that of the IF [34]. However, our data cannot establish whether the ES assesses the actual dissemination of an article (i.e., its use, as well as the category of journals that include it in their reference lists) [20]. The ranking displacements obtained with the ES contradict a previous statement that the IF and ES produce similar rank orders of medical journals [35]. We consider that our finding of journals with ES values higher than 0.01201 showing a decreasing trend from 2007 to 2011 followed by a recovery that continued until 2013 depicts a real pattern in the data, as this subgroup behavior was not observed at lower ES values.

Our findings agree with a similar study in the Gastroenterology and Hepatology category [14], although our global effect size (R2) was slightly lower (0.934 vs. 0.999). To explain our findings, we must note that although the IF is a per-article measure, people use it to evaluate journals. The ES, on the other hand, is a per-journal measure that reflects each journal's size through its total citations; it is therefore considered superior for evaluating the quality of journals [34]. The non-significance of the IF is explained by the fact that it measures citations per article and is thus a poor indicator of total citations, given that scholarly journals vary in size over multiple orders of magnitude.

The ES is gaining traction because it focuses on the impact of particular articles, but its sole dependence on citations still limits it. The remaining bibliometrics are less well known as predictors of citations; for example, the number of articles at least scales with journal size but does not account for quality at all, which leaves the ES as the winner of an unbalanced competition [14]. We found significant predictive ability for the AIS, which has been reported to correlate positively with the IF (r = 0.94); however, we did not find an analysis of the AIS similar to our study [36].

Readers should be aware that even though the number of citations has been widely used as a metric to rank papers, some iterative processes are now considering new approaches, such as the PageRank algorithm applied to citation networks [37]. Moreover, more modern usage-based article-level metrics are also being explored, such as the Usage Factor, Publisher and Institutional Repository Usage Statistics (PIRUS2), and the Y-Factor [30].

Additional factors are worth mentioning: each medical specialty has a different IF range; for example, a journal in the oncology field might have an IF up to 30 times as high as the corresponding figure in the forensic medicine category [3]. All journals have a diverse set of citations, and even the best publications contain some papers that are never cited [38]. That is, citations are not equally distributed, with fewer than 20% of the articles accounting for more than 50% of the total number of citations [39]. Despite these facts, the misuse of the IF for judging the value of science persists because it confers significant benefits on individual scientists and journals [40].

Several limitations of this study need to be addressed. A detailed explanation of each bibliometric is beyond the scope of this article. Our analysis was not a conventional regression model but a linear mixed effects design; part of the benefit of a study of this kind is that the assumption of independence between cases is cast aside and correlation among variables in the model is expected [24]. Our predictive analysis was limited to time-sets with a 2-year comparison period (five sets of repeatedly measured bibliometrics from 2007 until 2013). This timespan reflects the fact that the first publications on the ES methodology appeared in 2008 [19] and that the original idea for this project was conceived at the beginning of 2015, when the most recent list of bibliometrics published by the JCR covered the year 2013. After our previous articles on this topic were published in 2015 and 2018, we sought to remain consistent with those reports [14, 15].

Additional factors, such as longer time frames, the number of articles published in each issue, the circulation of each journal, and a host of factors affecting citation (self-citation, semi-mandatory, and mandatory citation), might all influence citation calculations. We did not include these possible confounders, as the Web of Knowledge does not consider them. We acknowledge that normalization of journal citations by article count is desirable; however, we used the raw data provided by the ISI Web of Knowledge at the time we wrote this study to assemble our predictive model. Also, factors that may affect where authors submit their work were outside the scope of this study: topic, affiliation with a society, geography, rejection of an earlier submission, familiarity with the submission and revision process, turnaround time, and invitation by editors. Our previous publications on this topic were in the Gastroenterology and Hepatology [14] and Neurosciences [15] categories. Given the similarities in methodology, we do not consider this study a case of self-plagiarism but a needed proof of concept (PoC), that is, the application of a specific method to demonstrate its feasibility. Because a PoC assesses the real potential of a method for clinically meaningful effects in the intended population [41], we applied the same methodology (a linear mixed-model design) to our target journal category and obtained similar results. Readers should be aware that the JCR includes approximately 171 categories in the sciences and 54 in the social sciences; publication of future studies validating our model in other specialties would therefore be desirable.

In conclusion, the Impact Factor and Immediacy Index showed no ability to predict annual citations two years ahead; our findings support researchers' decision to stop the misuse of the IF alone to evaluate journals. A re-ranking of journals using the Eigenfactor Score, Article Influence Score, and Cited Half-life provides a better assessment of the significance and importance of scientific journals in particular disciplines. Radiologists and other researchers should review these scores in their decision-making during the manuscript submission phase; these scores may even become a new standard of the quality and validity of research.