Introduction

Research evaluations that use citation count data are complicated by average citation counts differing between fields and years. This can be resolved by only comparing publications from the same field and year (and document type) or by using field-normalised indicators. The standard normalising approach is exemplified by the Mean Normalised Citation Score (MNCS), which divides the citation count for each article by the world average for the article’s field and year (Waltman et al. 2011a, b). These normalised values are then averaged for each group (e.g., country or research unit), with values above 1 indicating more citations than the world average. This calculation is, in theory, sensitive to outliers because citation data is highly skewed (Clauset et al. 2009; Price 1976; Thelwall 2016a). The Mean Normalised Log Citation Score (MNLCS) has been designed to deal with this issue with an extra step: applying the logarithmic transformation ln(1 + x) to each citation count before any other calculations (Thelwall 2017a, b). For example, one study found that confidence intervals are narrower for MNLCS than for MNCS for funded medical research (Thelwall 2017b). Another found that the choice of MNCS or MNLCS affected the outcome of a test of which gender’s research tended to be most highly cited (Thelwall 2018). Since the MNCS type of indicator is widely used (including Elsevier’s Source Normalised Impact per Paper: SNIP; Clarivate Analytics’ Category Normalized Citation Impact: CNCI), it is important to assess the influence of outliers on this calculation and on the MNLCS, which is designed to reduce their influence. A unit may have an above average MNCS value solely because it publishes a few highly cited articles, whereas the remainder of its research is average. Irrespective of whether this is regarded as a drawback or advantage of MNCS calculations, policy makers using the calculation would benefit from knowing whether this is likely to occur in practice.
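To illustrate the intended effect of the log transformation with hypothetical numbers, suppose an article attracts 100 citations in a field and year where the mean citation count is 5 and the mean of ln(1 + citations) is 1.5. Its normalised score is then

$$\frac{100}{5} = 20 \ \text{under MNCS,} \qquad \frac{\ln (1 + 100)}{1.5} \approx 3.1 \ \text{under MNLCS,}$$

so a single such article shifts a group average far more under MNCS than under MNLCS.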

Highly cited articles are presumably much more influential than typical articles. They are more likely to be the result of international collaboration, especially for small countries (Aksnes 2003; Persson 2009), and so might represent an unusual type of research to some extent. It has been argued that highly cited papers should be analysed separately from a unit’s typical output because the two can give different results (Tijssen et al. 2002). It is not clear whether the capacity to produce highly cited papers is separate from the capacity to produce good typical research or whether it could be conceived as an occasional, almost accidental, by-product of research. The ability to produce highly cited papers differs between nations and seems to be fairly stable over time (Bornmann et al. 2015), suggesting that it could be a side-effect of producing research of higher average quality or impact.

Since citations follow an imitation pattern to some extent (Merton 1968), it seems likely that the citation counts for highly cited papers overestimate their value because a greater proportion of the citations will be imitative. A study of physics and physical chemistry suggested that 40% of citations to the 23 highly cited papers examined did not reflect active use (Oppenheim and Renn 1978). Conversely, some influential research may become standard enough to not need citations (McCain 2011), so extremely high citation counts may underestimate the influence of seminal works. One of the few studies to empirically assess the usefulness of highly cited studies found that 83% of 35 biomarker studies overestimated the effect that they claimed to measure (Ioannidis and Panagiotou 2011), and so highly cited studies are not necessarily excellent.

A previous analysis of the influence of highly cited articles on the citation impact of Norway in 24 fields from 1981 to 1994, using a 5-year citation window, found that the national citation average was predominantly due to a small number of highly cited articles in some fields. Substantial variations over time in individual field averages occurred because of these few highly cited articles, and so field-specific average citation counts were not useful indicators of underlying research capacity or excellence (Aksnes and Sivertsen 2004). It is not clear whether these issues would be ameliorated for cross-disciplinary field normalised citation impact indicators, however. The great influence of individual highly cited publications on the MNCS of individual universities has also been acknowledged (Waltman et al. 2012).

Highly cited articles tend to be indexed by scholarly databases (Martín-Martín et al. 2018), and so the main cause of any variation in outlier influence between databases is likely to be the classification scheme. Field normalised indicators are known to be affected by the field classification scheme used, with one study suggesting that thousands of categories may be needed for optimal results (Ruiz-Castillo and Waltman 2015). The standard journal classifications from Scopus and the Web of Science contain errors (Wang and Waltman 2016), which may create outliers. It is therefore important to assess how classification schemes affect the influence of outliers in field normalised calculations.

Despite the findings discussed above, no previous study seems to have directly analysed the influence of outliers on field normalised calculations. The goal of this paper is to assess the influence of highly cited papers on field normalised calculations at the national level. As argued above, whilst it is theoretically possible for a small number of highly cited papers to exert a great influence on overall field normalised scores, it is not clear whether this is likely to be a problem in practice. The focus here is on the MNCS and MNLCS indicators because the former is standard in scientometrics and the latter is an MNCS variant that may reduce the influence of highly cited papers.

Motivated by the above discussion, the following research questions drive this study.

  • RQ1 How much influence do outliers exert on the field normalised citation indicators MNCS and MNLCS for individual countries?

  • RQ2 Does the choice of field classification scheme affect the influence of outliers on MNCS and MNLCS for individual countries?

Methods

The research design was to obtain a large coherent collection of academic journal articles and their citation counts, assess them for the presence of outliers for both MNCS and MNLCS, and then check whether the country-level results differ between the two indicators. As an additional check, the country-level results are compared between two time intervals to assess whether the indicator that is less influenced by outliers is more stable. For the second research question, the presence of outliers and the indicator values are compared between two different subject classification schemes.

Data

Scopus journal articles from the year 2013 were chosen for the main data set. The year 2013 ensures that each article has about 5 years in which to attract citations. Citation counts with a 5-year citation window have a correlation of 0.9 with long term (31 years) citation counts (Wang 2013), which is adequate for the current study. Scopus was chosen in preference to the Web of Science (WoS) for its more international coverage (Falagas et al. 2008). Since the current article focuses on international differences, greater international coverage is a desirable property. The additional journals in Scopus would presumably tend to publish articles that attract few citations, which would inflate the significance of outliers. Thus, a similar study for WoS may produce less substantial outliers. Reviews and documents that are not journal articles were excluded since these can have different average citation counts. Article records were downloaded from Scopus in October 2018 using queries like the following. Queries were submitted for each of the 304 Scopus narrow fields, excluding the overlapping general categories for each broad field. The example below is for Classics, with field code 1205, and the publication year was sent as a separate API parameter (date).

  • SUBJMAIN(1205) AND DOCTYPE(ar) AND SRCTYPE(j)
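As an illustration of how such field-level queries might be submitted programmatically, the minimal sketch below uses the Scopus Search API from Python with the requests library. The endpoint, parameter names and response structure follow Elsevier's public API documentation as understood at the time of writing; the key, field code and pagination handling are simplified placeholders rather than the exact harvesting code used for this study.

import requests

API_KEY = "YOUR-ELSEVIER-API-KEY"  # placeholder: an institutional Scopus API key
BASE_URL = "https://api.elsevier.com/content/search/scopus"

def fetch_field(field_code: str, year: int, page_size: int = 25):
    """Retrieve Scopus journal article records for one narrow field and year.

    Mirrors the query pattern above: SUBJMAIN(<code>) AND DOCTYPE(ar) AND
    SRCTYPE(j), with the publication year passed as the separate 'date'
    parameter. Pagination is simplified and deep-paging limits are ignored.
    """
    records, start = [], 0
    while True:
        response = requests.get(
            BASE_URL,
            params={
                "query": f"SUBJMAIN({field_code}) AND DOCTYPE(ar) AND SRCTYPE(j)",
                "date": str(year),
                "start": start,
                "count": page_size,
                "apiKey": API_KEY,
            },
            timeout=60,
        )
        response.raise_for_status()
        entries = response.json().get("search-results", {}).get("entry", [])
        if not entries:
            break
        records.extend(entries)
        start += page_size
    return records

# Example: the Classics narrow field (code 1205) for publication year 2013.
# classics_2013 = fetch_field("1205", 2013)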

The Scopus API occasionally reported errors and so the searches were repeated for fields that returned fewer records than the total reported by the API. The final dataset should therefore be a complete list of all 3,387,576 documents of type journal article in Scopus with a publication year of 2013 (Table 1). Document types can sometimes be classified incorrectly in citation databases (Donner 2017) and so there may be some errors in the data.

Table 1 Data sets and sample sizes used in this study

Articles were allocated to countries based on the declared national affiliation of the first author, as recorded in Scopus. The first author contributes the largest average share of the work in all broad fields (Larivière et al. 2016). Alphabetical authoring in maths and economics undermines this to a limited extent (Levitt and Thelwall 2013). The last author is often senior in biomedical research but the first author still tends to do most of the work (Larivière et al. 2016).

The Scopus narrow field journal classification scheme of 304 fields (excluding general fields that are largely contained within subcategory fields) was used as the primary field classification (see the source title spreadsheet: www.elsevier.com/solutions/scopus/how-scopus-works/content, the ASJC classification codes worksheet). This is a manual classification of journals into one or more subject categories, presumably designed primarily for information retrieval. Although document-level classifications are more desirable because of interdisciplinarity (Glänzel and Schubert 2003), journal-based classification schemes are currently more widely used. The Science-Metrix journal classification list of 176 narrow journal categories (www.science-metrix.com/en/classification, July 2018) was used as a second, independent scheme. This is partly based on classification schemes from WoS, CHI Research and the Australian Research Council's Excellence in Research for Australia (ERA) process (Archambault et al. 2011). The originally collected Scopus articles were fitted into this scheme, with articles in journals not covered by the Science-Metrix classification excluded from the Science-Metrix calculations (but retained for the Scopus classifications). The Science-Metrix classification is apparently designed for research impact calculations rather than information retrieval.
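A minimal sketch of this mapping step, assuming the article records and the Science-Metrix journal list have been exported to CSV files with hypothetical column names ('journal' and 'sm_field'); only articles whose journal appears in the Science-Metrix list are kept for the Science-Metrix calculations.

import pandas as pd

# Hypothetical file and column names, for illustration only.
articles = pd.read_csv("scopus_2013_articles.csv")        # one row per article
sm_list = pd.read_csv("science_metrix_journal_list.csv")  # journal -> narrow category

# The inner join keeps only articles in journals covered by the Science-Metrix
# scheme; the full article set is still used for the Scopus-classified calculations.
sm_articles = articles.merge(sm_list[["journal", "sm_field"]], on="journal", how="inner")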

For comparison purposes, a second data set of articles published in 2012 was collected in October 2018 and processed with the same methods (Table 1). This has a different citation window (6 years rather than 5 years) but this should not affect the results much (Wang 2013). The purpose of this paper is not to focus on any time window but to examine long term citation behaviour for data that is recent enough to be relevant.

Analyses

MNCS and MNLCS values were calculated for the Scopus classification (including duplicate articles that occur in multiple categories) and separately for the Science-Metrix classification (without duplicate articles). The individual normalised citation counts were retained for analysis. For article $i$ with citation count $c_i$ in field $f_i$, its normalised citation counts are as follows, where the divisor is the arithmetic mean of the values for all articles from the same field:

$$\begin{aligned} \text{MNCS}_{i} &= c_{i} \Big/ \left. \overline{c_{j}} \right|_{f_{j} = f_{i}} \\ \text{MNLCS}_{i} &= \ln (1 + c_{i}) \Big/ \left. \overline{\ln (1 + c_{j})} \right|_{f_{j} = f_{i}} \end{aligned}$$

MNCS and MNLCS values for countries were obtained by taking the arithmetic mean of the $\text{MNCS}_i$ and $\text{MNLCS}_i$ values for articles $i$ with a first author from the given country.
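A minimal sketch of these calculations, assuming the article data are held in a pandas DataFrame with hypothetical columns 'field', 'country' and 'citations' (one row per article–field pair, so that an article in several Scopus categories appears once per category):

import numpy as np
import pandas as pd

def add_normalised_scores(articles: pd.DataFrame) -> pd.DataFrame:
    """Add per-article MNCS_i and MNLCS_i columns, following the formulas above."""
    articles = articles.copy()
    articles["log_citations"] = np.log1p(articles["citations"])
    # Divide each (log) citation count by the world arithmetic mean for its field.
    articles["mncs_i"] = articles["citations"] / articles.groupby("field")["citations"].transform("mean")
    articles["mnlcs_i"] = articles["log_citations"] / articles.groupby("field")["log_citations"].transform("mean")
    return articles

def country_indicators(articles: pd.DataFrame) -> pd.DataFrame:
    """Country MNCS and MNLCS: arithmetic means of the per-article scores."""
    return (articles.groupby("country")[["mncs_i", "mnlcs_i"]]
            .mean()
            .rename(columns={"mncs_i": "MNCS", "mnlcs_i": "MNLCS"}))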

The extent to which outliers were present in each dataset was estimated using the kurtosis. Although commonly thought of as a measure of distribution shape (peakedness), kurtosis is more accurately an assessment of the extent to which a distribution or sample has outliers (Westfall 2014). This is because the shape of the centre of the distribution has little influence on kurtosis compared to outliers, when these are present. Kurtosis values were calculated for each country separately (i.e., using the country average rather than the world average) so that the values would not be affected by whether the country tended to produce research that was cited differently from the world average. Kurtosis values are reported here rather than excess kurtosis (i.e., 3 was not subtracted). Although outliers could in theory be either lowly or highly cited articles, in practice, even for the MNLCS calculation, the largest outliers are all above average and are therefore highly cited articles rather than uncited articles.
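For reference, a sketch of this step for one country's article-level normalised scores (SciPy's fisher=False option reports plain rather than excess kurtosis, so a normal sample gives values near 3, matching the convention used here; the variable names are illustrative):

from scipy.stats import kurtosis

def country_kurtosis(normalised_scores):
    """Plain (non-excess) kurtosis of a country's article-level normalised scores."""
    return kurtosis(normalised_scores, fisher=False, bias=True)

# Illustrative call with made-up values (not the study's data):
print(country_kurtosis([0.2, 0.5, 0.9, 1.1, 1.3, 7.5]))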

Correlations were used to assess whether the presence of outliers could systematically influence MNCS or MNLCS. A high positive correlation between kurtosis and MNCS or MNLCS would suggest that high country normalised indicator values might be mainly due to the presence of outliers. Pearson correlations were not used because some of the samples failed a Kolmogorov–Smirnov univariate normality test. Spearman correlations were used instead, giving rank order comparisons. Lower correlations between MNCS and MNLCS would suggest that outliers influence MNCS scores, since MNLCS is less prone to outliers by design. All Spearman correlations reported are statistically significantly different from zero (p = 0.000, for a two-tailed hypothesis test that the underlying population correlation differs from 0, in the social sciences sense of repeated observations under similar conditions), although this is not relevant to the current article.
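A sketch of the correlation step with SciPy, using made-up values aligned by country (not the study's data):

from scipy.stats import spearmanr

# Illustrative, country-aligned values only.
kurtosis_by_country = [5.1, 3.2, 8.4, 2.7, 4.6]
mnlcs_by_country = [1.10, 0.95, 1.25, 0.90, 1.05]

rho, p_value = spearmanr(kurtosis_by_country, mnlcs_by_country)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")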

The main analysis was conducted for the 20 countries with the most articles in Scopus in 2013 to assess the research questions for major research publishing nations. The analysis was repeated for the 50 and 75 countries with the most articles in Scopus in 2013 to encompass nations that publish moderate numbers of papers.

Results

Scopus classification scheme

The MNCS kurtosis values are extremely high for the top 20 countries and the Scopus classification scheme, as expected. They vary between 119 (Iran) and 518,540 (USA), with an average of 29,518. For reference, normally distributed data has a kurtosis of 3. Also as expected, the MNLCS kurtosis values are much lower, varying between 2.7 (India) and 8.4 (Sweden) with an average of 5.0. Some of the MNLCS country values are therefore moderately outside the range expected for a sample taken from the normal distribution. Given that citation counts for a single field and year tend to follow the discretised lognormal distribution (Thelwall 2016a), values close to 3 might be expected for MNLCS because of its logarithmic formula.

Kurtosis values were plotted against MNLCS to check whether high MNLCS could be primarily due to the presence of outliers (Fig. 1). There is a tendency for higher MNLCS to be associated with higher MNLCS kurtosis (Spearman correlation: 0.817), although Poland is a prominent exception. Thus, highly cited articles may strongly influence the MNLCS, despite the outlier-reducing MNLCS formula.

Fig. 1 Kurtosis values calculated separately for the 20 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Scopus classification scheme. MNLCS values are plotted on the y axis for comparison

Expanding the country set to the largest 75 countries in terms of Scopus journal articles in 2013, most of the extra countries have MNLCS kurtoses between 3 and 5 (Fig. 2). This is not true for all the extra countries, however, with Denmark, Philippines, Finland, Belgium and Austria all having higher MNLCS kurtoses than the top 20. The MNLCS against MNLCS kurtosis correlation is weaker for these 75 countries (Spearman correlation: 0.521).

Fig. 2 Kurtosis values calculated separately for the 75 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Scopus classification scheme. MNLCS values are plotted on the y axis for comparison. *Not all countries are labelled

Median MNLCS kurtosis values calculated separately for each country and field are much lower than the cross-field MNLCS kurtosis values for each country, although the two correlate (Fig. 3; Spearman correlation: 0.765). The high overall MNLCS kurtosis values are therefore caused by merging fields with differing individual field MNLCS values.

Fig. 3 Kurtosis values calculated separately for the 20 countries with the most Scopus 2013 journal articles, using the field normalised log citation counts and the Scopus classification scheme. The median of the individual field kurtosis values is plotted against the overall kurtosis value

The MNLCS individual article outliers for 2013 (i.e., the highest article $\text{MNLCS}_i$ values) were examined to determine their causes. For the Scopus classification scheme, 97% of the 100 largest MNLCS outliers were from Literature and Literary Theory or Visual Arts and Performing Arts. These two narrow fields were therefore the main cause of the high outlier values. For example, one US article in the first field had 89 citations but a log normalised score of 13.0. This extremely high value (the highest article $\text{MNLCS}_i$ in the dataset) is due to the low average for the category, which contains many prestigious but rarely cited large literary magazines that are not in English. As a result, the world geometric mean citation count for Literature and Literary Theory is only 0.41. The USA and UK scores have benefited from publishing in mainstream literary journals rather than in non-English literary magazines. The problem of essentially uncitable articles in Scopus has previously been shown to give some fields untypical (zero-inflated) citation distributions (Thelwall 2016b).
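As a rough consistency check on these figures (assuming the geometric mean here is the usual back-transformation of the mean log, $e^{\overline{\ln (1+c)}} - 1$), the reported values fit together:

$$\overline{\ln (1 + c_{j})} \approx \frac{\ln (1 + 89)}{13.0} \approx 0.35, \qquad e^{0.35} - 1 \approx 0.41$$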

The eight countries in the top 20 with high MNLCS kurtosis values were investigated to find the cause. This entailed inspecting the 100 articles from each country with the highest $\text{MNLCS}_i$ values because these will have contributed most to the country MNLCS kurtosis. In addition to Literature and Literary Theory and Visual Arts and Performing Arts, the Religious Studies category was prominent among these. There are outliers in Religious Studies due to the combination of magazines (e.g., The Expository Times, with all 53 articles uncited, including “All-Age Worship Resources for July”; Parabola, with all 43 articles uncited, including “The night I died”) and entirely uncited journals (e.g., Svensk Teologisk Kvartalskrift; Bibel und Kirche; Bulletin de Litterature Ecclesiastique) together with well-cited interdisciplinary journals whose main focus is outside religion (e.g., Psychology of Religion and Spirituality: 97% cited; Christian Bioethics: 95% cited; Journal of Religion and Health: 85% cited). The following summarises the results for the 100 articles with the largest $\text{MNLCS}_i$ values for each country. The list is ordered in decreasing order of country MNLCS kurtosis.

  • Sweden: 24% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 10 MNLCS values; 8% in Religious Studies; 57% in the Arts and Humanities broad category.

  • Australia: 59% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 24 MNLCS values; 6% in Religious Studies; 80% in the Arts and Humanities broad category.

  • USA: 89% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 28 MNLCS values; 7% in Religious Studies; 96% in the Arts and Humanities broad category.

  • UK: 76% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 21 MNLCS values; 4% in Religious Studies; 93% in the Arts and Humanities broad category.

  • Canada: 46% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 16 MNLCS values; 12% in Religious Studies; 78% in the Arts and Humanities broad category.

  • Netherlands: 26% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 5 MNLCS values; 15% in Religious Studies; 70% in the Arts and Humanities broad category.

  • Poland: 13% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 4 MNLCS values; 6% in Religious Studies; 40% in the Arts and Humanities broad category. A further 17% were from Computer Science (miscellaneous).

  • Spain: 58% in Literature and Literary Theory or Visual Arts and Performing Arts, including the largest 9 MNLCS values; 3% in Religious Studies; 81% in the Arts and Humanities broad category.

In summary, relatively highly cited articles in the Literature and Literary Theory and/or Visual Arts and Performing Arts categories were the main causes of high MNLCS kurtosis for Australia, Canada, Spain, the UK and the USA. For The Netherlands and Sweden, which have lower outputs, these categories together with the wider Arts and Humanities account for their high kurtosis values. For Poland, Computer Science (miscellaneous) articles were a significant contributory factor. The 17 articles in this category were all from the International Journal of Applied Mathematics and Computer Science, from 13 different Polish research institutions. This journal seems to reflect a high-citation specialism within the category. This is exacerbated by the inclusion of the huge, rarely cited Information Technology Journal from China, which accounted for 44% (1439) of the articles in Computer Science (miscellaneous). Only 27% of its articles were cited, in contrast with 71% of the remaining articles.

Changing between MNLCS and MNCS makes little difference to the relative sizes of country scores (Spearman correlation: 0.971) but the difference is substantial enough to change the policy conclusions that might be drawn from the results (see “Appendix” section, Table 7 for values). For example, the USA is ranked 2nd according to MNCS but 5th according to MNLCS (Fig. 4).

Fig. 4 MNCS against MNLCS for the 20 countries with the most Scopus 2013 journal articles, using the Scopus classification scheme

Science-Metrix classification scheme

The results for the Science-Metrix classification scheme are like those for the Scopus classification scheme, but with substantially lower kurtosis values. The MNCS kurtosis values vary between 59 (Poland) and 32,784 (Japan), with an average of 2660. The MNLCS kurtosis values range from 2.6 (China) to 3.9 (Netherlands) with an average of 3.3 and are therefore close to the values expected for the normal distribution (Figs. 5, 6).

Fig. 5 Kurtosis values calculated separately for the 20 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Science-Metrix classification scheme. MNLCS values are plotted on the y axis for comparison

Fig. 6 Kurtosis values calculated separately for the 75 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Science-Metrix classification scheme. MNLCS values are plotted on the y axis for comparison. *Not all countries are labelled

The Science-Metrix $\text{MNLCS}_i$ article-level outliers are substantially more moderate than those for the Scopus classifications, with the highest being 6.8 for 2013 (Psychoanalysis, Canada, 60 citations) and 5.6 for 2012 (Literary Studies, USA, 90 citations). The lack of higher outliers is due to the Science-Metrix classification scheme not including many non-English literary magazines (in its Literary Studies category), which increases the average log citation score for the category.

Median MNLCS kurtosis values calculated separately for each country and field are slightly lower (average 2.9) than cross-field kurtosis values and the two correlate (Spearman correlation: 0.789), but the difference is much smaller (Fig. 7).

Fig. 7 Kurtosis values calculated separately for the 20 countries with the most Scopus 2013 journal articles, using the field normalised log citation counts and the Science-Metrix classification scheme. The median of the individual field kurtosis values is plotted against the overall kurtosis value

Changing between MNLCS and MNCS makes little difference to the relative sizes of country scores (Spearman correlation: 0.943) but, again, enough to change the conclusions that might be drawn from them. For example, China has a substantially higher MNCS than Brazil but a slightly lower MNLCS (Fig. 8).

Fig. 8 MNCS against MNLCS for the 20 countries with the most Scopus 2013 journal articles, using the Science-Metrix classification scheme

Scopus versus Science-Metrix

The choice of classification scheme influences the MNCS and MNLCS results by a small but meaningful amount in all country sets (Table 2). Even when using the relatively outlier-resistant MNLCS, changing the classification scheme alters the country ranks. The change is larger for the top 50 and top 75 country sets. MNCS is more resistant to classification scheme changes than MNLCS.

Table 2 Spearman correlations between MNCS and MNLCS based upon Scopus and Science-Metrix classifications for the 20, 50 and 75 countries with the most articles in Scopus in 2013

The MNCS and MNLCS results for journal articles from 2013 were compared with those from 2012. Comparing two different source years is a better test of the influence of outliers than comparing within years because it uses a different set of data with none of the same outliers. Since a dramatic change in the research capability of a country between 2012 and 2013 is unlikely, a robust field normalised average impact calculation should give results for 2012 that correlate highly with those for 2013.

The Scopus MNLCS individual article outliers from 2012 were again mainly due to the two narrow fields Literature and Literary Theory and Visual Arts and Performing Arts, which together accounted for 94% of the 100 largest $\text{MNLCS}_i$ article values, with the highest individual score again being 13.0.

For both the Scopus (Fig. 9) and Science-Metrix (Fig. 10) classification schemes, and for all three sets of countries investigated, the MNLCS and MNCS values from 2012 correlate very strongly with the corresponding values from 2013 (bold figures in Tables 3, 4, 5). Since the bold correlations are all very high and similar in magnitude within the same tables, they do not point to one indicator or classification scheme being more robust over time.

Fig. 9 Kurtosis values from 2012 calculated separately for the 75 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Scopus classification scheme. MNLCS values are plotted on the y axis for comparison. *Not all countries are labelled

Fig. 10 Kurtosis values from 2012 calculated separately for the 75 countries with the most Scopus 2013 journal articles using the field normalised log citation counts and the Science-Metrix classification scheme. MNLCS values are plotted on the y axis for comparison. *Not all countries are labelled

Table 3 Spearman correlations between MNCS and MNLCS based upon Scopus and Science-Metrix classifications for the 20 countries with the most articles in Scopus in 2013
Table 4 Spearman correlations between MNCS and MNLCS based upon Scopus and Science-Metrix classifications for the 50 countries with the most articles in Scopus in 2013
Table 5 Spearman correlations between MNCS and MNLCS based upon Scopus and Science-Metrix classifications for the 75 countries with the most articles in Scopus in 2013

Kurtosis values were calculated for the 2012 data to check that the 2013 data was not unusual. The MNLCS kurtosis averages were similar in all cases, but the MNCS averages were not (Table 6). Sweden had the highest MNLCS kurtosis in 2012 for the Scopus classification, echoing the situation for 2013. In contrast, whilst Canada had the highest MNLCS kurtosis for the Science-Metrix classification in 2013, The Netherlands had the highest for Science-Metrix in 2012.

Table 6 Average and extreme kurtosis values for the field normalised citation counts of the top 20 countries (i.e., with the most articles in Scopus in 2013) in 2012 and 2013

Discussion

The results are limited by the classification schemes and years analysed. Different patterns may have been obtained from WoS (possibly with weaker outliers if WoS has more balanced journal coverage) or from article-based classification schemes (Perianes-Rodriguez and Ruiz-Castillo 2017). Newer or much older data may also display different characteristics. The analysis is restricted to a technical discussion, without known correct values for the underlying research impacts of the countries. The influence of outliers may also be different for smaller types of unit, such as research groups, where they seem likely to change the results more. Of course, citation counting is only one way of attempting to assess scholarly influence and it ignores many important ways in which scholarship can be useful to other researchers (MacRoberts and MacRoberts 2010) and society (Priem et al. 2010).

The high MNLCS kurtosis values for some countries with the Scopus classification scheme were mainly due to values being inflated in two fields containing many essentially uncitable articles. This would be less of a problem for percentile indicators, which report the percentage of a unit’s articles that are within the top X% (e.g., 1%, 5%, 10%, 50%) for each field separately (Waltman and Schreiber 2013). The uncitable articles could inflate this percentile for units not publishing uncitable articles, but individual outliers have little effect (Waltman et al. 2012). Nevertheless, sets of outliers (as in the two fields above) may still influence the overall results. This may be the reason why one study found classification schemes to influence the top 1% results more than the top 10% results for universities (Perianes-Rodriguez and Ruiz-Castillo 2018). Two other citation indicators, the number of highly cited articles and the proportion of highly cited articles (Waltman 2016), have similar issues to percentile indicators.

Conclusions

The country-level data for 2013 and 2012 shows, as expected, that MNCS has extreme outliers. It also shows that MNLCS can have moderately stronger outliers than expected from the normal distribution if the Scopus classification scheme is used, and slightly stronger outliers if the Science-Metrix scheme is used. The stronger Scopus outliers are due to the inclusion of essentially uncitable articles in a few Scopus categories, lowering the world average citation rate. The slightly stronger-than-normal outliers for MNLCS with the Science-Metrix scheme, despite citations approximately following the discretised lognormal distribution (Thelwall 2016a), could be due to countries having differing research impacts between fields, which would inflate kurtosis values calculated using the overall average rather than individual field averages. This conclusion is supported by the lower values found when taking the median of the within-field kurtoses rather than a single cross-field kurtosis.

Although the MNLCS was designed to reduce the impact of individual highly cited articles, the above findings show that it is important to check for uncitable articles in any category before producing MNLCS values to inform policy decisions. This is even more important for MNCS calculations because of their much larger outliers. Surprisingly, however, the much larger MNCS outliers do not make MNCS values less stable over time than MNLCS values, nor does the use of a classification scheme (from Science-Metrix) that reduces the largest outliers lead to more stable values. This is probably because the long-term root cause of the highest outliers (at the country level) is classification scheme issues that influence each country consistently over time, even though they differ between countries.

The differences between MNLCS and MNCS values and between classification schemes were numerically small in all cases, in the sense of producing very high correlations between them, especially for larger countries. Nevertheless, in a context where small differences can have implications for policy decisions, especially if they change the rank order of nations, these small variations are worrying. For example, a country believing that a recent policy decision had helped its researchers to overtake a near competitor in average research impact might not realise that this indicator-based conclusion would be reversed if a different indicator or classification scheme had been used. Thus, indicator producers should be careful to ensure that the categorisation schemes are appropriate and that the end users are aware of the potential influence of outliers and category schemes on the conclusions drawn.