INTRODUCTION

Three main approaches to evaluating scientific productivity can be identified. The first could be called formal or scientometric: it relies on publications, with the weight of each publication determined by the journal’s indicators, such as impact factor and quartile ranking. The second approach is expert-based: the results of a scientist’s work are evaluated directly by other scientists on the basis of their content. The third is a hybrid approach: assessment is based on publications, but the level of the journal (rather than of the specific publication) is determined by expert scientists. A key element of both the scientometric and hybrid approaches is the stratification of scientific journals, i.e., dividing journals into groups by levels or strata.

In Russia, during the 2010s, the predominant approaches to evaluating scientific productivity were the scientometric and hybrid methods. These methods faced significant criticism due to doubts regarding the fairness of the evaluations, the distorted incentives that they created for scientists, and the rise of unethical practices in the scientific community. In 2022, following the cessation of Russia’s access to Web of Science and Scopus, the government announced a restructuring of the evaluation system, which many in the scientific community welcomed enthusiastically, hoping that the new system would address the shortcomings of the previous ones.

At the same time, the development of a new system for evaluating the productivity of scientific work commenced, along with its significant component, a White List of scientific journals. This model, adopted from the Scandinavian countries, was intended to include reputable Russian and international scientific journals. The plan was to stratify the list, distributing journals across levels. For all versions after the first, the composition of the list was intended to draw not only on journal indicators but also on expert evaluations.

As of early 2024, the first version of the ranked White List (footnote 1) has been published, including journals indexed in the Web of Science Core Collection (WoS CC), Scopus, or the Russian Science Citation Index (RSCI). The ranking was performed according to a specific algorithm (footnote 2) based on journal indicators and did not yet include expert evaluation. On the basis of these rankings, journals were assigned one of four levels. To date, the White List has not been officially incorporated into existing state science support tools. The development of a procedure for updating the list and reviewing journal levels has been announced (see the official website).

Criticism of the approaches adopted in Russia to assess scientific productivity has frequently centered on the failure of journal stratifications to filter out publications of dubious quality. Prior to 2022, this assessment relied heavily on databases such as the WoS CC, the RSCI, and Scopus, along with the journal impact factor and Scimago Journal Rank quartiles. For journals not included in these international databases, stratification was determined using the Higher Attestation Commission (VAK) list and the RSCI. Alongside many reputable publications, these strata also included weaker and frankly unethical journals. The creation of the White List of scientific journals has begun, but it is too early to tell whether this new approach will resolve the existing issues. Its stratification may be determined expertly or fully or partly on the basis of scientometric indicators.

This article explores the following questions:

• Which of the available stratifications of Russian journals fail to keep publications with weak editorial policies out of their higher levels?

• Which scientometric indicators identify publications with weak editorial policies?

We address these questions by analyzing Russian journals in the field of medicine and healthcare. While individual cases of weak or unethical journals reaching high journal strata are widely known within the domestic scientific community, there have been few systematic analyses of this issue based on Russian data.

RESEARCH REVIEW

Two primary approaches to journal stratification can be identified: expert-based and automatic, where the latter is based on formal information and scientometric indicators. A well-known example of stratification based on expert evaluations is the Norwegian model [4, 5], which may use citation metrics as supplementary information. The scientometric approach relies on available indicators, primarily using international citation databases, adopting either direct metrics such as SNIP or SJR, or the JCR impact factor and journal quartile levels.

Research and experience from various countries indicate the strengths and weaknesses of both stratification approaches. The expert approach can take into account information that is overlooked by metrics. However, this method is susceptible to conflicts of interest, in that experts may favor journals that frequently publish their work [5, 6]. Researchers suggest that the outcomes of ratings based on expert surveys may reflect the scientific interests of the experts themselves: “Scientists studying robotics might overrate robotics journals, while those researching cognitive science might overrate machine learning journals” [6]. Systematic differences in journal evaluations have also been observed among American political scientists [7]. Furthermore, expert stratification is a costly procedure requiring ongoing resources.

Scientometric indicators can replace expert evaluations when metrics align well with the results of expert procedures. Researchers have tested this on several national examples. For Finland’s White List, for instance, they examined whether rankings based on expert evaluations would coincide with those obtained automatically through scientometric indicators [5]. The three indicators provided by Scopus (SJR, SNIP, and IPP) predict the Finnish expert rating very accurately: for over 99% of the journals considered, the decisions of the Finnish experts did not significantly deviate from the levels obtained through the metrics. Of the three, SNIP had the highest predictive power. However, it is difficult to use scientometrics where a significant portion of scientific output is not indexed in international databases.

Countries where English is not the primary language of scientific communication cannot rely entirely on indicators from databases maintained outside the country. The Scandinavian countries’ experience with creating stratified lists of scientific journals and using them to assess research outcomes is well documented. In Norway, Denmark, and Finland, panels of expert scientists play a significant role in creating the lists, which allows journals with low indicators but high standing in the community to be ranked highly and, conversely, allows publications of dubious quality to be excluded even if they are indexed internationally. Such white lists have been operational for over a decade, which makes it possible to analyze their effects on national science. Despite some critical evaluations (e.g., [8]), the accumulated data indicate predominantly positive effects [4, 9, 10]. In particular, although some unethical journals are indexed in Scopus, publications in them are rare in Scandinavian countries [11], which is logical, as such journals are mostly filtered out by the national white lists.

The Polish experience, although not as widely known, is described in [12]. Poland has a relatively low proportion of scientific publications in English, which makes national journals a crucial channel for scientific communication. In 2015, a task was set at the national level to assess national journals lacking metrics from Scopus or Web of Science. This assessment was based on a combination of formal, scientometric, and expert methods. The researchers tested a scenario relying solely on scientometric information and found that many journals would have to be excluded from the ratings, especially in the social sciences and humanities, whereas scientometrics is quite suitable for national journals in the natural sciences.

Until 2022, Russia maintained two lists of national journals in use for assessing scientific productivity and qualifications: the VAK List, which included 2592 journals as of 2022, and the RSCI list, which included 952 journals [13]. In addition, there were journal stratifications that used indexing in international databases, sorting journals according to their indexing status and quartile. O.V. Kirillova and E.V. Tikhonova [14] have thoroughly investigated the role of international databases in shaping journal stratification. These databases establish independent expert councils with external experts who develop evaluation criteria, participate in creating an expert system, and review submitted journals [14]. However, the reliance on quartile divisions has been criticized. Scientists often choose journals in relation to their indexing status and quartile, not because of the publication’s reputation or its potential to disseminate results. Moreover, this focus has guided some scientists toward predatory journals [2, 3, 15–17]. Research measuring the proportion of a country’s publications in Scopus that are of dubious quality shows a high percentage for Russia [11, 15].

The VAK List, which has long been compiled through expert selection of journals, has been criticized for insufficiently rigorous selection and opaque replenishment procedures [18], leading to a revision of its methodology in 2022 (see [13]). Based on scientometric and expert information, a journal ranking was established, and it was divided into the categories K1, K2, K3: K1 for the top 25% of journals, K2 for the next 50%, and K3 for the remaining 25%. Additionally, all journals from the RSCI, as well as those indexed in international databases (most of which were not yet in the VAK List), were assigned to category K1.

The RSCI ranking has been criticized for the incompleteness of the list, its lack of transparency, and the insufficient use of quality criteria. A study of economic journals revealed a set of publications that were not included in the RSCI despite their high citation metrics and positive reviews from the community [19]. Further analysis showed that eight journals indexed in WoS and Scopus with significant bibliometric indicators were still not included in the RSCI [20]. Discrepancies were also noted in a bibliometric analysis of journals in the RSCI and the RINZ (the Russian Index of Scientific Citation, a broad database of Russian journals), which showed that only half of the publications in the RSCI occupy top positions in the Science Index of the RINZ [21].

A critical review of several Russian initiatives to compile lists of recommended journals, including the White List of scientific publications, the Categorized VAK Journal List, and the Academic Rating of RSCI Journals, is presented in [20]. O.V. Tretyakova argues that using citation indices calculated from metrics of different databases is problematic, because different databases differ in coverage and consequently give different metrics for the same journals.

Typically, scientists analyze the stratifications of Russian journals in terms of the representation of the most authoritative publications. A distinctive feature of our analysis is that we consider a broad range of journals, including those exhibiting signs of weak or improper editorial policies.

RESEARCH METHODOLOGY

For our analysis, we employed our own stratification of journals, grounded in data from the RINZ database. We selected journals that met the following criteria:

• Country: Russia

• Field: Medicine and Healthcare

• Language of publication: Any

• Inclusion in RINZ: Indexed

We matched our data set against the SCIENCE INDEX 2021 list in the “Medicine and Healthcare” category, adding nine journals to our initial query results. We then excluded multidisciplinary journals in which less than 40% of articles related to medicine and healthcare. This refinement led to a final sample of 586 journals. For each journal, we collected indicators from its RINZ profile, instances of editorial misconduct from the Dissernet website, and data on multiple publications from the study [22].
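For transparency, the selection logic can be expressed as a short sketch. This is only an illustration: the file names and column names (journal_id, country, field, share_medical, indexed_in_rinz) are hypothetical placeholders of our own, not actual RINZ export fields.

```python
import pandas as pd

# Hypothetical export of RINZ journal records; file and column names are
# illustrative placeholders, not actual RINZ field names.
journals = pd.read_csv("rinz_journals.csv")

# Initial query: Russian medical journals indexed in RINZ, any language of publication.
selected = journals[
    (journals["country"] == "Russia")
    & (journals["field"] == "Medicine and Healthcare")
    & (journals["indexed_in_rinz"])
]

# Add the journals from the SCIENCE INDEX 2021 "Medicine and Healthcare" list
# that the initial query missed (nine journals in our case).
science_index_2021 = pd.read_csv("science_index_2021_medicine.csv")
selected = pd.concat([selected, science_index_2021]).drop_duplicates("journal_id")

# Exclude multidisciplinary journals in which less than 40% of articles
# relate to medicine and healthcare.
final_sample = selected[selected["share_medical"] >= 0.40]

print(len(final_sample))  # 586 journals in our final sample
```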

Based on this data, we categorized Russian medical journals into four groups:

• A—Most reputable journals

• B—Recognized journals

• C—Journals of average or uncertain quality

• D—Journals exhibiting signs of improper editorial policies

This study aims to determine whether available scientometric indicators significantly differentiate these groups. If they do, these indicators could be valuable for refining journal stratification; if not, their utility for this task remains questionable. We pay special attention to group D, composed of journals exhibiting signs of improper editorial policies. Penalty points were assigned to each journal based on the following criteria:

• 1 point for being in the quartile that had the highest self-citation rate among medical journals (RINZ data);

• 1 point for being in the quartile that had the highest level of detected unethical borrowings (RINZ data);

• 1 point for being in the quartile with the highest number of multiple publications (Chekhovich and Khazov data [22]) (footnote 3);

• 1 point for having three or more cases listed in the Dissernet database (footnote 4).

In this study, a journal could thus accumulate from zero to four penalty points. We considered journals that scored three or more points to show clear signs of unethical editorial policies and placed them in group D. In addition, journals with five or more instances of duplicate publication within the same journal (according to the Chekhovich and Khazov data) were also classified into group D, since this is a clear sign of problematic editorial practice. We refer to this group as journals showing signs of unethical editorial policies, or simply as questionable journals. It is important to note, however, that not every journal in this group is necessarily predatory or unethical: the violations noted may have been temporary and may have since been resolved. As a whole, however, the group is characterized by a lower quality threshold than the other groups.

Journals that were not included in group D were distributed across the remaining three categories according to a straightforward principle: if a journal was indexed in two or more reputable indices (WoS CC, Scopus, RSCI) as of 2021, it was included in group A; if a journal was indexed in only one of these databases, it was placed in group B. Journals not indexed in any of these databases fell into group C.
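To make the grouping rule explicit, the full assignment procedure described above can be summarized in the following minimal sketch. The field names (for example, in_worst_self_citation_quartile or dissernet_cases) are illustrative placeholders of our own; the sketch simply restates the stated rules in code form.

```python
def penalty_points(journal):
    """Count penalty points for one journal record (keys are illustrative placeholders)."""
    points = 0
    points += journal["in_worst_self_citation_quartile"]         # RINZ data
    points += journal["in_worst_unethical_borrowing_quartile"]   # RINZ data
    points += journal["in_worst_multiple_publication_quartile"]  # Chekhovich and Khazov data [22]
    points += journal["dissernet_cases"] >= 3                    # Dissernet database
    return int(points)


def assign_group(journal):
    """Assign group A, B, C, or D according to the rules described in the text."""
    # Group D: three or more penalty points, or five or more instances of
    # duplicate publication within the same journal.
    if penalty_points(journal) >= 3 or journal["duplicate_publications_same_journal"] >= 5:
        return "D"
    # Remaining journals are distributed by indexing status as of 2021.
    indexed = sum([journal["in_wos_cc"], journal["in_scopus"], journal["in_rsci"]])
    if indexed >= 2:
        return "A"  # most reputable journals
    if indexed == 1:
        return "B"  # recognized journals
    return "C"      # not indexed in any of the three databases
```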

It is crucial to clarify that when we refer to groups A and B as “most reputable” and “recognized” journals, respectively, this recognition is contingent upon their acceptance by authoritative databases. The categorization of journals into groups A, B, C, and D serves as a practical tool for further analysis and is not an assertion that this is the correct stratification of journals. Below we show how various journal indicators differ across these four groups.

RESULTS

Overview of the Field of Medical Journals

The array of Russian medical journals has diverse origins. Among publishers, the most common categories are nonprofit organizations and universities, accounting for 33% and 24% of journals, respectively. In addition, journals are published by commercial companies, scientific organizations (institutes and centers), clinical institutions, and other government organizations.

Applying the described algorithm, we classified 49 publications (8% of the total array) as journals showing signs of unethical editorial policies, which form group D. In addition, 106 journals (18%) were categorized as most reputable (group A) and 127 (22%) as recognized (group B). Approximately half of the array fell into group C, consisting of journals not included in the authoritative databases but also not exhibiting clear signs of unethical editorial practices. Particular attention in our analysis is given to group D, the journals that exhibit serious signs of unethical editorial policies. Contrary to the intuitive expectation that lapses in editorial integrity are driven by profit motives, only 4 of the 49 journals in group D are published by commercial companies; the majority are published by nonprofit organizations and universities, as in the overall array.

Approximately 35–40 thousand articles are published annually in Russian medical journals. Figure 1 shows how these journals and the articles published in them are distributed across the groups for 2020. Comparing the distribution of articles with that of journals reveals that the share of articles in recognized and questionable journals exceeds their proportional representation in the complete journal array. In other words, the average number of articles published annually in group C journals is lower than in the other groups (47 articles per journal, compared with 92 in group A, 61 in group B, and 103 in group D).

Fig. 1. Number of Russian medical journals and articles published in 2020, by journal groups A, B, C, and D.
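The per-journal averages quoted above follow directly from the per-group counts of journals and articles shown in Fig. 1. A minimal aggregation sketch, assuming a hypothetical table with one row per journal, its assigned group, and its 2020 article count (the file and column names are illustrative), is given below.

```python
import pandas as pd

# Hypothetical table: one row per journal with its assigned group (A/B/C/D)
# and its 2020 article count; column names are illustrative placeholders.
df = pd.read_csv("journals_with_groups.csv")

summary = df.groupby("group").agg(
    journals=("journal_id", "count"),
    articles=("articles_2020", "sum"),
)
summary["articles_per_journal"] = summary["articles"] / summary["journals"]
print(summary)
# In our data set, the averages reported in the text are roughly
# 92 (A), 61 (B), 47 (C), and 103 (D) articles per journal.
```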

Available Journal Stratifications (VAK, RSCI, White List) Across the Four Journal Groups

Table 1 shows how the journals from the groups that we identified are distributed across the levels of the well-known and widely used Russian stratifications. Notably, none of the examined stratifications completely filters out all questionable journals. The VAK stratification stands out: nearly half of the journals in group D are not only included in the VAK List but are ranked in its second quartile. The RSCI and White List stratifications appear much more adequate in terms of the distribution of group D journals.

The VAK classification also raises questions in regard to group A (the most reputable journals). Almost 40% of the journals indexed in two or more of the authoritative databases (the criterion for inclusion in group A) are not listed in the VAK List at all. Since all of these journals are automatically included in the White List, a paradoxical situation arises for some of them: they are considered good enough for evaluating the performance of scientists but not good enough for evaluating the work of doctoral candidates. It is also notable that about half of the journals in group C are included in the VAK List, and many of them rank at its higher levels.

Table 1. Distribution of Russian Medical Journals by Stratification Levels and Groups A, B, C, and D

In our study, we noted a significant disparity in the placement of Russian journals at the top tiers of the White List and the RSCI quartiles. Both stratifications appear to distribute groups C and D adequately, yet they rarely elevate Russian journals to the highest level. This observation has a crucial implication: if scholars’ work is evaluated on the basis of the White List, they cannot achieve the highest possible score for any publication in a domestic medical journal. Whether this is appropriate, relative to the objectives of assessment, remains outside the scope of this paper. On the one hand, it seems reasonable that Russian medical and health journals do not reach Level 1 of the White List, since they are not among the leading international publications. On the other hand, Russian scientists have predominantly published in top global journals as part of international research teams [23], a practice currently complicated by strained international scientific cooperation. Whether the assessment of scholars’ work is applied equitably across fields, given that Russian journals appear at Level 1 in some fields but not in others, is a matter of ongoing debate.

Impact Factor Indicators and SCIENCE INDEX for the Four Groups of Journals

We have shown that the well-known Russian stratifications do not adequately filter out journals with lenient editorial standards, and that the VAK List largely fails to filter them at all. The new system for evaluating scientists’ work that is now being developed faces the same issue, since dealing with weak journals has been problematic for previously used tools. Group D as we have defined it is not a solution to this problem but merely a tool for analysis. The exclusion of weak and unethical journals can be addressed either by conducting a comprehensive review of all Russian scientific publications or by imposing filters based on certain available journal indicators, in the hope that these filters will exclude weak and predatory publications. Below we consider a set of indicators available for journals in the RINZ database and check whether any of them can identify journals with improper editorial policies.

As seen in Fig. 2, the distributions of journals by the RINZ impact factor, calculated in six variations, indicate that all versions of this indicator adequately differentiate recognized and nonrecognized journals. However, they are incapable of differentiating journals that have problematic editorial policies from the main mass of mediocre journals (groups D and C). Thus, any version of the impact factor is useful in distributing journals across the levels of the White List (for example, as informational support for experts), but it would not solve the problem of filtering problematic publications.

Fig. 2. Distribution of journals by RINZ impact factor in 2020, calculated in six variations, for the four groups A, B, C, and D.

We also consider another indicator related to journal citation. The integral indicator of a journal in the Science Index rating is calculated as the weighted sum of the journal’s normalized five-year impact factor in the RINZ core, the journal’s normalized Hirsch index in the RINZ core for articles over the last 10 years, the average normalized Hirsch index of the authors of articles in the journal over the last 3 years, and the average length of articles over the last 3 years. Journals are ranked by this integral Science Index indicator; the best journals have a high Science Index value and a position close to the top of the ranking (a low position number).
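In schematic form (the weights and normalization procedures are those of the Science Index methodology and are not reproduced here; the notation is ours), the integral indicator of journal j can be written as

SI_j = w1 · IF5norm_j + w2 · h10norm_j + w3 · hauth_norm_j + w4 · Lmean_j,

where IF5norm_j is the journal’s normalized five-year impact factor in the RINZ core, h10norm_j is its normalized Hirsch index in the RINZ core over the last 10 years, hauth_norm_j is the average normalized Hirsch index of its authors over the last 3 years, and Lmean_j is the average article length over the last 3 years.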

A clear trend emerges from Group A to Group D: a journal’s position in the ranking worsens as its Science Index value decreases, which supports the validity of this indicator. In each group, however, some journals occupy high positions in the Science Index ranking. Figure 3 does not show outliers, but in Groups C and D several journals outperform the majority of Group A journals on this indicator. As such journals are few and are outliers, this fact does not compromise the indicator. It is important to note that the Science Index value does not directly depend on the databases in which a journal is indexed, so the separation between Groups A and B was not predetermined by their composition. Nevertheless, the Science Index cannot be recognized as a well-differentiating indicator, since there is no statistically significant difference between Groups C and D on this measure.
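The absence of a significant difference between Groups C and D can be verified with a standard nonparametric comparison; the sketch below uses the Mann-Whitney U test purely as an illustration (the specific test applied to our data is not prescribed here), assuming the Science Index values have been collected into per-group sequences.

```python
from scipy.stats import mannwhitneyu

def compare_groups(values_c, values_d):
    """Two-sided Mann-Whitney U test comparing Science Index values of two groups.

    values_c, values_d: sequences of per-journal Science Index values, to be
    filled in from the RINZ data (no actual values are reproduced here).
    """
    stat, p_value = mannwhitneyu(values_c, values_d, alternative="two-sided")
    return stat, p_value

# Example call once the per-group values have been collected:
# u, p = compare_groups(science_index_by_group["C"], science_index_by_group["D"])
# A large p-value would be consistent with the conclusion that the indicator
# does not separate Groups C and D.
```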

Fig. 3. Distribution of medical journals by Science Index rating in 2020 and their positions in the overall journal ranking, for the four groups A, B, C, and D.

Other Scientometric Indicators for the Four Groups of Journals

Figure 4 shows the distributions of four indicators, calculated within the RINZ database, for each group of journals. With the exception of the 5-year self-citation indicator, the other three do not measure the journal’s citation metrics but rather characterize the corpus of articles published in the journal. Such indicators are valuable because, unlike citation metrics, they do not require a time lag for journal evaluation.

Fig. 4. Distribution of journals by scientometric indicators (available in RINZ) for the four groups A, B, C, and D.

None of the included indicators demonstrate a significant difference between Groups A, B, C, and D (see Fig. 4). One might expect the Self-Citation Indicator to distinguish Group D from the others, not only because it is a criterion for categorizing journals into Group D but also because it is generally known that unscrupulous journals often mandate that authors cite the journal, thus inflating their citations. However, Group D exhibits considerable heterogeneity in this indicator.

The RINZ database calculates more indicators for journals than those included in our analysis. Our goal has been to demonstrate the approach and to recommend its application in journal stratification methodology. If stratification relies on expert evaluation, with scientometric indicators provided to the expert scientists as supplementary information, it would be prudent to inform the experts which indicators are statistically associated with questionable editorial practices and to ask them to pay particular attention to publications with abnormally low or high values of such indicators.

CONCLUSION

Constructing a hierarchy of scientific journals is an essential component of a system for assessing researchers’ performance, unless the evaluation relies entirely on expert examination of the content of scientific work. In Russian science, the past decade and a half has been significantly shaped by the scientometric approach to such evaluation. This has had both positive effects, which are not well studied, and negative ones. One seemingly minor issue, namely the imperfection of the journal stratification embedded in evaluation tools, has created strong incentives for Russian authors to publish in weak and pseudoscientific journals. As a result, thousands of works of unclear or blatantly low scientific quality have appeared, damaging the reputation of Russian science and leading to a misallocation of resources.

Currently, new tools for evaluating scientific performance are being developed at the national level. These tools will likely continue to adopt formal and hybrid approaches to evaluation, meaning that researchers’ work can be assessed on the basis of the journals in which it is published. With the restructuring of the system, there is now an opportunity to address the critical issue of preventing weak, unscrupulous, and predatory journals from entering the upper levels of journal stratification. Removing the incentives to publish in such journals should significantly reduce the flow of articles by domestic authors to them.

Although examples of weak journals ending up at one of the top stratification levels are widely discussed in professional circles, this issue has not yet been systematically analyzed. Our work proposes an approach to such an analysis. Using a data set of Russian journals in medicine and health, we demonstrated how to identify publications with dubious editorial policies, and we compared this group of journals with the others both across the stratifications used in Russia and across several journal characteristics. We found that the VAK stratification aligns poorly with other approaches to ranking journals; that the Science Index differentiates journals better than the impact factor; and that noncitation characteristics do not distinguish the group of dubious journals from the others. Continuing this type of analytical work should contribute to creating an effective system for evaluating researchers’ performance, benefiting Russian science by eliminating incentives for imitation.