Introduction: Background

Soon after citation-based bibliometric measures appeared, first among them the Journal Impact Factor (Garfield 1972), it was recognized that these measures were dependent on disciplinary effects. Garfield (1979) attributed this variability to the citing behavior in the various scientific fields. The traditional response to the variability of citation impact across fields is normalization in a broad sense, based for example on field average values, or on quantiles in rank analysis. For example, a journal in biochemistry is compared to the average impact of biochemistry, and so on, making it sensible to compare these relative values across fields. This “cited-side” or “target-based” normalization, studied in an abundant literature (Footnote 1), is not the only way. A novel family of citation normalization reverses the point of view, by correcting the variability where it appears, in the referencing behavior. Among forerunning works of citing-side normalization, Small and Sweeney (1985) defined fractional citation in a co-citation context and Zitt et al. (2005) called for an exploration of citing-side normalization of citation measures in their study of the scale-instability of impact normalization. The first applications were developed at the journal level: the experimental Audience Factor (Zitt and Small 2008; Zitt 2010) and the Source-Normalized Impact per Paper implemented on Scopus (Moed 2010). A less general tool, but with some similarity in the case of journals with no citation exchanges, is the Reference Return Ratio (Nicolaisen and Frandsen 2008). Applications of citing-side normalization at the actors’ level are appearing (Waltman and van Eck 2010, Leydesdorff and Opthof 2010). Among novel approaches of the impact factor, the intellectual influence measures, a powerful concept pioneered by Pinski and Narin (1976), were reactivated for example by Palacios-Huerta and Volij (2004), and generalized by Bergstrom (Eigenfactor 2007) on Google-type algorithms. Moya-Anegon (2007) implemented influence measures in Scopus. Some implementations include, along with iterative citation chains, a reference-based entry (Eigenfactor).

This article focuses on the rationale of the citing-side approach, which emerges from scrutiny of the Journal Impact Factor formula. It shows how the determinants of the aggregate impact factor of a field, stated in previous publications, shape the impact factor, first in a model of a closed field (no imports or exports of citations with other fields) with a symmetrical design of the database (same coverage on the citing and the cited side), then in more general cases (Footnote 2).

Let \( B_{t} \) be a set of publications from selected fields, \( M_{t} \) a set of publications from selected media (for example a selection of journals), and \( D_{t} \) a set of publications with selected document types, where t is the time unit. A citation index contains a pair of sets of publications (which in practice largely overlap) from the scientific literature Ω:

  • one qualified for emitting citations (the potential source set, subscript g for citing):

    $$ W_{gt} = \, \{ \omega\, \epsilon \, \Upomega ,\;\omega \,\epsilon \, B_{gt},\;\omega \,\epsilon \, M_{gt},\;\,\omega \,\epsilon \, D_{gt}\} \, = \;B_{gt} \cap M_{gt} \cap D_{gt} $$
  • one qualified for receiving citations (the potential target or “citable” set):

    $$ W_{ct} = \, \{ \omega \,\epsilon \,\Upomega \,,\;\,\omega \,\epsilon \,B_{ct},\;\,\omega \,\epsilon \, M_{ct},\;\,\omega \,\epsilon \, D_{ct} \} \, = B_{ct} \cap M_{ct} \cap D_{ct} $$
  • The filters B, M, D of the database may vary over time.

In the following, some general simplifications are made:

  • Year is the time unit.

  • No difference is made between “publication year” and “database year”

  • The filters above are noted as independent but the reality is somewhat different, for example active “types of documents” depend on the “type of media”, themselves sometimes dependent on the field.

  • A field is considered as a set of journals without multi-assignment to fields. Similarly we assume the stability of assignments over time.

  • Finally, the archetype of JIF described here is assumed to be consistent, insofar as all the literature considered in the numerator and the denominator is based on the same filter (\( D_{g} \) below) (Footnote 3).

These assumptions are made for simplicity, but do not alter the general findings.

A framework of citation analysis involves a pair of subsets \( Z_{gt} \) and \( Z_{ct} \), respectively from \( W_{gt} \) and \( W_{ct} \), under coupling constraints on the time frame. T(t) is a period of consecutive years, anchored in some year t (see below). We note \( Z_{T} \) the union set \( Z_{T(t)} = \bigcup_{u \in T(t)} Z_{u} \).

Let \( Z_{gT} \) be the set qualified as potential source (emitters). T = T(t) denotes the citing period.

Let \( Z_{c\Uptheta} \) be the set qualified as a potential target of citations (“citable” set). Θ = Θ(t) denotes the cited period. Within a given framework defined by \( Z_{gT} \) and \( Z_{c\Uptheta} \), a citation study typically focuses on some part of \( Z_{c\Uptheta} \), say \( z_{c\Uptheta} \), for example: an individual publication; a set of publications by the same author; an institution; a journal, noted \( j_{c\Uptheta} \); a whole field, noted \( F_{c\Uptheta} \).

For any sets of publications U and V, p(U) and c(U,V) denote the number of publications in U and the number of citations from U to V.

Let \( p(Z_{gT}) \) and \( p(z_{c\Uptheta}) \) be the numbers of documents in \( Z_{gT} \) and \( z_{c\Uptheta} \).

Let \( c(Z_{gT}, Z_{c\Uptheta}) \) be the number of citations from \( Z_{gT} \) to \( Z_{c\Uptheta} \). The bibliographic references in \( Z_{gT} \) are deemed “active” (by comparison with \( c(Z_{gT}, \Upomega) \), which counts references to all literature) if they cite documents within the time frame Θ and obeying the filters on \( W_{c} \). For example, references to books are not deemed active if books are ruled out by the filter B in \( W_{c} \); references to articles older than the period Θ are ruled out as well.

For a specific target \( z_{c\Uptheta} \),

\( c(Z_{gT}, z_{c\Uptheta}) = c(S_{gT}(z_{c\Uptheta}), z_{c\Uptheta}) \), where \( S_{gT}(z_{c\Uptheta}) \) denotes the subset of effective sources belonging to \( Z_{gT} \) and citing \( z_{c\Uptheta} \). Papers in the set \( S_{gT}(z_{c\Uptheta}) \) cite \( z_{c\Uptheta} \) but also, possibly, other parts of \( Z_{c\Uptheta} \). Effective sources should be distinguished from potential sources like \( W_{gt} \) or \( Z_{gt} \): all elements in the sets \( W_{gt} \), \( Z_{gt} \) are qualified for being citing items, but some may not contain effective citations to the particular target \( z_{c\Uptheta} \) under scrutiny; similarly, elements in \( W_{ct} \), \( Z_{c\Uptheta} \), \( z_{c\Uptheta} \) are qualified on the cited side but may receive no citations.

There are many schemes of citation counting defined

  • by arrangements of T and Θ. These issues of time frame of citations have been discussed many times; see for example Irvine and Martin (1989), Rousseau (1997), Ingwersen et al. (2001), Glänzel (2004). Terminology is not stabilized: citing year(s), synchronous, backward, retrospective count, opposed to cited year(s), diachronous, forward, prospective count.

  • by type of count (fractional, whole and many others) and field multi-assignment rules, not real issues at the journal level with our working hypotheses.

The framework of the Journal Impact Factor JIF(j,t) of journal j relies on a citing year scheme.

JIF(j,t) dated year t is based on:

  • the set of targets \( j_{c\Uptheta} = z_{c\Uptheta} \), with z instantiated to the journal j over the period Θ = {t−d, …, t−1}. In the standard Journal Impact Factor, the length of the citation window d is 2 years.

  • the set of potential sources \( Z_{gT} \) with T reduced to t, i.e. \( Z_{gt} \). t is both the date of emitted citations and the date of the impact factor.

  • the set of effective sources \( S_{t}(j_{c\Uptheta}) \), i.e. the set of publications, subset of \( Z_{gt} \), citing \( j_{c\Uptheta} \).

The general formula of JIF(j,t) dated year t is:

$$ JIF\left( {j,t} \right)\; = \;c(S_{t} (j_{c\Uptheta } ),j_{c\Uptheta } ) \, /p(j_{c\Uptheta } ) $$
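As an illustration of the citing-year scheme, the following minimal Python sketch counts the active citations emitted in year t towards a journal's items published in the window {t−d, …, t−1} and divides by the number of citable items in that window. The record types `Paper` and `Citation`, the function name and the toy data are hypothetical, introduced only for this example; database filters (B, M, D) are assumed to have been applied beforehand.

```python
from collections import namedtuple

# Hypothetical record types, introduced only for this illustration.
Paper = namedtuple("Paper", ["journal", "year"])
Citation = namedtuple("Citation", ["citing_year", "cited_journal", "cited_year"])


def journal_impact_factor(journal, t, papers, citations, d=2):
    """JIF(j, t): citations emitted in year t to items of `journal` published
    in {t-d, ..., t-1}, divided by the citable items of `journal` in that window."""
    window = range(t - d, t)  # cited period: years t-d .. t-1
    citable = sum(1 for p in papers if p.journal == journal and p.year in window)
    received = sum(
        1
        for c in citations
        if c.citing_year == t
        and c.cited_journal == journal
        and c.cited_year in window
    )
    if citable == 0:
        raise ValueError("no citable items in the cited window")
    return received / citable


# Toy data (invented): journal "J" has 3 + 2 citable papers in 2008-2009.
papers = [Paper("J", 2007)] + [Paper("J", 2008)] * 3 + [Paper("J", 2009)] * 2
citations = (
    [Citation(2010, "J", 2008)] * 4    # active: inside the window
    + [Citation(2010, "J", 2009)] * 3  # active: inside the window
    + [Citation(2010, "J", 2007)] * 2  # not active: older than the window
    + [Citation(2009, "J", 2008)]      # emitted in another citing year
)
jif_2010 = journal_impact_factor("J", 2010, papers, citations)  # 7 / 5
```

In this toy case only 7 of the 10 citation records are active for the year 2010 with d = 2, which gives 7/5 = 1.4.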

Let us define the “Field Impact Factor” of the field F, remembering that a field is defined here as a set of uni-assigned journals:

$$ FIF\left( {F,t} \right) = c(S_{t} (F_{c\Uptheta } ),F_{c\Uptheta } ) \, /p(F_{c\Uptheta } ) $$
(1)

with

  • the set of targets \( F_{c\Uptheta} = z_{c\Uptheta} \), with z now instantiated to the field F over the period Θ = {t−d, …, t−1}.

  • the set of potential sources, \( Z_{gt} \) as above.

  • the set of effective sources \( S_{t}(F_{c\Uptheta}) \), i.e. the set of publications, subset of \( Z_{gt} \), citing \( F_{c\Uptheta} \).

FIF(F,t) can also be seen as a weighted average of the impacts of the n individual journals, noted \( j^{1}, \ldots, j^{n} \), forming the field F, with the size of the journal (number of papers) as the weight.

$$ FIF\left( {F,t} \right) = \sum_{i = 1..n} c(S_{t} (j^{i}_{c\Uptheta } ),j_{c\Uptheta }^{i} ) \, \big/ \, \sum_{i = 1..n} p(j^{i}_{c\Uptheta } ) \, = \, \sum_{i = 1..n} \left[ p(j^{i}_{c\Uptheta } ) \, JIF\left( {j^{i} ,t} \right) \right]\,\big/ \, \sum_{i = 1..n} p(j^{i}_{c\Uptheta } ) $$
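The equivalence between the pooled ratio and the size-weighted average of journal impact factors can be checked numerically. The sketch below is only illustrative: the function names and the three-journal toy field are assumptions, with each journal given as a pair (citations received, citable papers).

```python
def field_impact_factor(journals):
    """Pooled ratio: total citations received over total citable papers.
    `journals` is a list of (citations_received, citable_papers) pairs."""
    total_citations = sum(c for c, _ in journals)
    total_papers = sum(p for _, p in journals)
    return total_citations / total_papers


def weighted_mean_jif(journals):
    """Average of the individual JIFs, weighted by journal size."""
    total_papers = sum(p for _, p in journals)
    return sum(p * (c / p) for c, p in journals) / total_papers


def unweighted_mean_jif(journals):
    """Plain mean of the JIFs, shown only for contrast."""
    return sum(c / p for c, p in journals) / len(journals)


# Toy field (invented numbers): JIFs are 3.0, 1.0 and 0.5.
toy_field = [(300, 100), (50, 50), (20, 40)]
fif = field_impact_factor(toy_field)
```

With these invented numbers the pooled ratio and the weighted mean coincide (370/190), whereas the unweighted mean of the three JIFs is 1.5, which is why the weighting by journal size matters.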

Case of a closed field

Assumptions

  (a) We assume a symmetrical design of the citing set \( W_{g} \) and the citable set \( W_{c} \) of the database, in terms of coverage of fields, journals and types of documents. At any date u:

$$ W_{cu} \; = \;W_{gu} $$

  (b) We assume F is a closed field: it does not exchange citations with other fields I in either direction. For effective citations, in the absence of multi- or re-assignments of journals:

$$ {\text{I }} \ne F \rightarrow c(S_{t} (F_{c\Uptheta } ),I_{c\Uptheta } ) = \, 0;\;c(S_{t} (I_{c\Uptheta } ),F_{c\Uptheta } ) = \;0 $$

In Fig. 1, this configuration is represented by the arrow 1. Let us calculate the denominator and numerator of the FIF (Eq. 1) as a function of the current literature in the field F, \( p(F_{gt}) \).

Fig. 1 Exchanges in four configurations

Denominator of the Field Impact Factor

The denominator in Eq. 1, \( p(F_{c\Uptheta}) \), can be expressed as a function of \( p(F_{gt}) \). It follows from (a) and (b) that for every individual year u:

$$ F_{cu} = \, F_{gu} \;{\text{then}}\;p(F_{cu} ) = p(F_{gu} ) $$

Change of \( p(F_{c}) \) over time is carried by change in the average number of papers per journal (it would make no difference if the change were also due to the number of journals assigned). Whatever the process, assume a constant annual rate of change, positive (actual growth) or negative, over the period.

Let r(F) be the annual growth rate of the field F measured on \( p(F_{gu}) \):

$$ r\left( F \right) = (p\left( {F_{gu} } \right)\; - \;p(F_{g(u - 1)}) )/p(F_{g(u - 1)} ) $$

Note h(F) the ratio \( p(F_{gu})/p(F_{g(u-1)}) = 1 + r(F) \).

The literature of the field in year t−1 is \( p(F_{g(t-1)}) = p(F_{gt}) \, h^{-1} \),

and for year t−d: \( p(F_{g(t-d)}) = p(F_{gt}) \, h^{-d} \).

The denominator of FIF(F,t) is the qualified literature within the cited period Θ:

$$ F_{cu} = \, F_{gu} \rightarrow \, p(F_{c\Uptheta } ) = \;p(F_{g\Uptheta } ) $$
$$ DEN = p(F_{c\Uptheta } ) \, = p(F_{g\Uptheta } ) = \sum_{u = 1..d} \left[ {p\left( {F_{g(t - u)} } \right)} \right] \, = \, p\left( {F_{gt} } \right)\;G_{d} \;\;\;\;\;\;\;\;\;\;d \ge 1 $$
(2)

where \( G_{d} = \sum_{u = 1..d} h^{-u} = (h^{-1} - h^{-d-1})/(1 - h^{-1}) \)
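The growth term can be checked numerically. The sketch below, with hypothetical function names, compares the direct sum over the d years of the window with the closed geometric form; the closed form requires h ≠ 1, and for a stationary field (h = 1) the sum simply equals d.

```python
def growth_term_sum(h, d):
    """G_d as the direct sum of h**-u for u = 1..d."""
    return sum(h ** -u for u in range(1, d + 1))


def growth_term_closed_form(h, d):
    """G_d from the geometric-series formula; falls back to d when h == 1."""
    if h == 1:
        return float(d)
    return (h ** -1 - h ** -(d + 1)) / (1 - h ** -1)


# Toy values (invented): 5% annual growth, standard two-year window.
g_growing = growth_term_sum(1.05, 2)
g_stationary = growth_term_sum(1.0, 2)
```

For a 5% annual growth and d = 2, the term is roughly 1.86, below the stationary value of 2: a growing field has a relatively smaller citable stock in the window than a stationary one.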

Numerator of the Field Impact Factor: citations received by the field

The numerator of Eq. 1, \( c(S_{t}(F_{c\Uptheta}), F_{c\Uptheta}) \), can also be expressed as a function of \( p(F_{gt}) \).

In a closed field, the effective citations received by the field are identical to the “active references” (citations emitted) in this field in the citing year. We shall note \( v(Z_{gt}) \) the per-document average of active references in any citing set. NUM can then be decomposed in two ways.

A first way to decompose NUM is:

\( NUM = c(S_{t}(F_{c\Uptheta}), F_{c\Uptheta}) = v(S_{t}(F_{c\Uptheta})) \, p_{g}(S_{t}(F_{c\Uptheta})) \), where \( v(S_{t}(F_{c\Uptheta})) \) is the average number of active references calculated over the effective source documents \( S_{t}(F_{c\Uptheta}) \) in the field in year t.

Alternatively, NUM can be decomposed on the basis of \( p(F_{gt}) \):

$$ NUM \, = \, c(S_{t} (F_{c\Uptheta } ),\;F_{c\Uptheta } ) \, = \;v\left( {F_{gt} } \right) \, p(F_{gt} ) $$
(3)

where \( v(F_{gt}) \) is the average number of active references calculated on the basis of all citing documents \( p(F_{gt}) \) in the field in year t. Compared to \( S_{t}(F_{c\Uptheta}) \), \( F_{gt} \) includes additional documents, obeying the filters of \( W_{g} \) but lacking “active references”. For example, they may contain only other references, falling outside the window Θ. In principle \( W_{g} \) rules out those document types that may not contain any reference (e.g. editorials). The components of the two alternative products composing NUM satisfy:

$$ p_{g} \left( {S_{t} (F_{c\Uptheta } )} \right) \le p\left( {F_{gt} } \right)\;{\text{and}}\;\;v(S_{t} (F_{c\Uptheta } ))\; \ge \;v\left( {F_{gt} } \right) $$

Calculation of the Field Impact Factor

The FIF can then be written:

$$ FIF\left( {t,d} \right) \, = \, v\left( {F_{gt} } \right) \, p\left( {F_{gt} } \right)/ \left( p\left( {F_{gt} } \right)G_{d} \right) = \, v\left( {F_{gt} } \right) \, /G_{d} $$
(4)

for example \( FIF(t,2) = v(F_{gt})/G_{2} \),

which expresses that in a closed field the average impact factor depends only on: the propensity to cite within the citation window, if we measure propensity by the average number of active references in the citing articles \( v(F_{gt}) \); and the growth conditions, reflected by \( G_{d} \). Note that the size of the field, represented by the variable \( p(F_{gt}) \) present in both the numerator and the denominator, does not directly affect the level of FIF.
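The cancellation of field size in Eq. 4 can be made concrete with a short sketch. The function name and the numerical values are assumptions chosen for illustration; the numerator and denominator are built separately, as in Eqs. 2 and 3, before taking the ratio.

```python
def closed_field_fif(p_gt, v, h, d=2):
    """FIF of a closed, symmetrically covered field.
    p_gt: citing papers in year t; v: average active references per paper;
    h: annual growth ratio 1 + r; d: length of the citation window."""
    g_d = sum(h ** -u for u in range(1, d + 1))  # growth term G_d
    numerator = v * p_gt                         # Eq. 3: active references emitted
    denominator = p_gt * g_d                     # Eq. 2: citable stock in the window
    return numerator / denominator


# Toy values (invented): two fields with identical citing habits and growth,
# but very different sizes.
small_field = closed_field_fif(p_gt=1_000, v=6.0, h=1.05)
large_field = closed_field_fif(p_gt=50_000, v=6.0, h=1.05)
stationary_field = closed_field_fif(p_gt=1_000, v=6.0, h=1.0)  # G_2 = 2
```

Under these assumptions the small and the large field obtain the same FIF, while the stationary field (same v, no growth) obtains 6/2 = 3, slightly below the growing ones.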

\( v(F_{gt}) \), the average number of active references per document, can be further decomposed:

$$ v\left( {F_{gt} } \right)\; = \;q\left( {F_{gt} } \right)\;b\left( {F_{gt} } \right)\;a\left( {F_{gt} } \right) $$

where \( q(F_{gt}) \) is the average total number of references per document in \( F_{gt} \), whatever the target and the date of the reference; \( q(F_{gt}) \) refers to a scheme where all filters B, M, D are deactivated;

\( b(F_{gt}) \) is the proportion of all references accounted for in q that go to documents passing the filters (for example, in the standard JIF, references to books do not);

and \( a(F_{gt}) \) is the proportion of the latter references falling in the citation window Θ.
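The decomposition can be illustrated on a toy set of bibliographies. In the sketch below, which is only an assumption about how such data might be encoded, each reference is a pair of booleans (passes the database filters, falls in the citation window); a reference is active only if both hold.

```python
def decompose_active_references(documents):
    """Return (q, b, a, v) for a citing set.
    `documents` is a list of bibliographies; each reference is a pair
    (passes_filters, in_window) of booleans."""
    n_docs = len(documents)
    references = [ref for doc in documents for ref in doc]
    passing = [ref for ref in references if ref[0]]
    active = [ref for ref in passing if ref[1]]
    q = len(references) / n_docs       # all references per document
    b = len(passing) / len(references)  # share going to filtered-in targets
    a = len(active) / len(passing)      # share of those inside the window
    v = len(active) / n_docs            # active references per document
    return q, b, a, v


# Toy bibliographies (invented): two citing documents with 4 and 2 references.
toy_documents = [
    [(True, True), (True, False), (False, False), (True, True)],
    [(True, False), (False, True)],  # second reference: recent but filtered out
]
q, b, a, v = decompose_active_references(toy_documents)
```

Here q = 3, b = 4/6, a = 1/2 and v = 1, so the product q·b·a reproduces the directly counted average of active references.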

In the following, since there is no ambiguity, the time subscripts t, Θ and d will be dropped in the expressions of G, \( F_{g} \), \( F_{c} \), S. For example, \( F_{g} \) will stand for \( F_{gt} \), \( F_{c} \) for \( F_{c\Uptheta} \), etc. Similarly \( v(F_{gt}) \), \( q(F_{gt}) \), \( b(F_{gt}) \), \( a(F_{gt}) \) will be abridged v, q, b, a. Subscripts will be retained when necessary, for example in new definitions (e.g. β below). With these notations, Eq. 4 becomes:

$$ FIF\left( {t,d} \right) = v/G $$

A related approach in the context of comparison of citation frameworks is found in Glänzel (2004). Equation 4 shows that the determinants of the average field impact factor in a closed field, with working assumptions that are not heroic, are quite clear: the average number of active references per citing paper on the one hand, the growth conditions on the other hand.

Case of a closed field F with an asymmetrical design

It may be interesting to explore asymmetrical schemes. Assumption (a), that the citing set and the citable set are similar in terms of coverage of journals and document types, is dropped. For simplicity, assumption (b) is held.

We now allow \( W_{c} \ne W_{g} \), namely through an asymmetry of document types \( D_{c} \ne D_{g} \). The consequences would be analogous for an asymmetry on M. Assume the difference between \( D_{c} \) and \( D_{g} \) is due to the introduction of a (fake) type of document, say “web conference proceedings” (WCP), on the cited side and/or the citing side. For convenience, we will assume the absence

  • of change in citation behavior. The issue is limited to “taking or not taking” those WCP in the data on either side, citing and cited.

  • of time effect on the proportion of the WCP literature during the period.

Let us examine what happens in different cases. Table 1 shows four cells A, B, C, D corresponding to inclusion or exclusion of WCP on the citing or the cited side. The previous section corresponds to cell A.

Table 1 Four configurations

Cell B of Table 1: same citing sources, new citable set and new references activated

The cardinal of the citing set, \( p(F_{g}) \), is unchanged by construction, but in this citing literature, new references in the articles’ bibliographies are taken into account: those references which point to the new targets WCP. In Fig. 1, this configuration activates the flows of arrows 1 and 2. On the target literature, both the number of citations (numerator of the FIF) and the number of publications (denominator of the FIF) are concerned.

Let \( F'_{gt} \) be the set on the citing side, including the WCP, with \( p(F'_{gt}) \) publications. By construction, this set remains virtual (not activated) in the present configuration (cell B of Table 1) but \( p(F'_{gt}) \) is used as an auxiliary variable for defining the ratio \( \beta_{gt} \):

\( \beta_{gt} = p\left( {F'_{gt} } \right)/p\left( {F_{gt} } \right) \ge 1 \). \( F'_{gt} \) and \( \beta_{gt} \) will be noted \( F'_{g} \) and \( \beta_{g} \).

Let \( F_{c\Uptheta }^{'} \) (noted below \( F'_{c} \)) be the new target set, including the WCP as citable documents, with \( p(F_{c\Uptheta }^{'}) \) documents.

$$ p(F{'}_{c} ) \, = \, p\left( {F{'}_{g} } \right)G' = p\left( {F{'}_{g} } \right)G $$

admitting similar growth conditions for both types of literature (G′ = G) and no distortions in their relative coverage on the citing and the cited side.

Equation 2, ruling the denominator of the FIF, is modified to reflect a citable set now larger than the potentially citing set:

$$ DEN' = \, p\left (F{'}_{c}\right ) = p\left( {F{'}_{g} } \right) \, G = \beta_{g} p\left( {F_{g} } \right)G $$
(2b)

Equation 3, ruling the numerator, is modified through the increase in the number of active references, which now encompasses the WCP targets. This modification acts through the components of v, now v′: q is not altered, since it is based on all references whatever their target; a is assumed not to be altered: for convenience, we consider that a′ is equal to a, i.e. that the additional literature does not alter the immediacy conditions; b is altered: we shall note b′ the new value. b′ is the proportion of total references which go to the (enhanced) cited set considered.

Then

$$ NUM' = c\left(S\left( {F{'}_{c} } \right),F{'}_{c}\right ) = \, v' \, p\left( {F_{g} } \right) = \, qb'ap\left( {F_{g} } \right) $$
(3b)

Let us now expand b′. The ratio b′/b is equal to \( \beta_{g} \) (ratio of increase of cited targets) if the new cited objects WCP have the same impact as the previous ones (articles, etc.). For example, assume that b = 0.7, i.e. 70% of all references previously passed the filter. We now add WCP on the cited side, and suppose the total number of targets is multiplied by 1.3. All things equal, we get b′ = b \( \beta_{g} \) = 0.7 × 1.3 = 0.91. Now, if the number of references to WCP is not in the same proportion to their number as it is for the previously considered literature, indicating that their resulting impact will be lower, a correcting coefficient \( s_{t} \) (noted s) is necessary in the expression of the ratio b′/b, so that:

$$ b' = bs \, \beta_{g} $$

Adapting Eq. 3b:

$$ NUM' = \, v' \, p\left( {F_{g} } \right) = \, qbs \, \beta_{g} ap\left( {F_{g} } \right) $$

From Eq. 2b, \( DEN' = \beta_{g} \, p(F_{g}) \, G \)

Hence

$$ FIF'\left( {t,d} \right) = (qbs \, \beta_{g} a \, p\left( {F_{g} } \right))/\left( {\beta_{g} p\left( {F_{g} } \right)G} \right) = sv/G $$
(4b)

The new FIF′(t,d) as a function of FIF(t,d) is then

$$ FIF'\left( {t,d} \right) = \, sFIF\left( {t,d} \right) $$

This highlights the fact that no systematic up-trend or down-trend is expected from the change. It depends on whether the impact of the added literature (here the WCP) is higher or lower than that of the previous one. In a realistic situation, as changes in the coverage of sound databases usually occur along a Bradfordian path (taking first the core of the literature, then extending towards less visible classes), the case of diminishing returns in terms of impact is probably the most frequent (s < 1).
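One way to read the coefficient s is as the ratio of the average number of active references received per target in the enlarged cited set to the same average in the original cited set. This reading follows from b′ = b s β_g when β_g is taken as the ratio of increase of cited targets; it is an interpretation added here, not a formula stated in the text. The sketch below uses hypothetical names and invented counts.

```python
def coverage_coefficient_s(refs_old, n_old, refs_wcp, n_wcp):
    """Coefficient s such that b' = b * s * beta_g.
    refs_old / refs_wcp: active references going to the old targets / to WCP;
    n_old / n_wcp: number of old targets / of added WCP targets."""
    b_ratio = (refs_old + refs_wcp) / refs_old  # b'/b (same total references)
    beta_g = (n_old + n_wcp) / n_old            # ratio of increase of targets
    return b_ratio / beta_g


# Toy counts (invented): 1000 old targets receiving 700 active references.
s_same_impact = coverage_coefficient_s(700, 1000, 210, 300)   # WCP cited like articles
s_lower_impact = coverage_coefficient_s(700, 1000, 90, 300)   # WCP cited less
```

When the 300 added items attract references at the same per-item rate as the old targets, s equals 1 and the FIF is unchanged; when they attract fewer, s falls below 1 and FIF′ = s·FIF decreases accordingly.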

Cell C, new source papers, same qualified “citable” set

This other form of asymmetry activates the flows of arrows 1 and 3 in Fig. 1.

The citing set \( F_{g} \) is now enhanced by the WCP, becoming \( F''_{g} \) such that \( p\left( {F''_{g} } \right)\; = \; \beta_{g} p(F_{g} ). \)

The citable literature, \( p(F_{c}) \), is not getting larger, but it now collects citations from the new items and its average impact tends to increase, all things equal. Remember that we assume similar growth conditions in the new and the “old” sources (G″ = G).

Equation 2, ruling the denominator of FIF, is rewritten since the citable set, unchanged, is now a modified function of the new citing set. Instead of \( DEN = p(F_{c}) = p(F_{g}) \, G \) we now have:

$$ DEN^{\prime \prime} = p\left( {F_{c} } \right)\; = p\left( {F_{g}^{\prime \prime } } \right)G^{\prime \prime} / \, \beta_{g} = p\left( {F_{g}^{\prime \prime } } \right)\;G/ \, \beta_{g} $$
(2c)

Equation 3, ruling the numerator of FIF, is modified to take into account the increase in the number of citing sources, which directly affects the level of FIF. Here we assume that \( q(F^{\prime \prime}_{g} ) \), \( b(F^{\prime \prime}_{g} ) \), \( a(F^{\prime \prime}_{g} ) \) and therefore \( v\left( {F^{\prime \prime}_{g} } \right) \) are not influenced by the addition of new sources. In other words, the structure of bibliographies in the WCP is supposed identical to the structure of bibliographies in the previously covered literature (Footnote 4):

$$ NUM^{\prime \prime} = v\left( {F_{g}^{\prime \prime } } \right) \, p\left( {F_{g}^{\prime \prime } } \right) = v \, p\left( {F_{g}^{\prime \prime } } \right) $$
(3c)

Then the new FIF is:

$$ FIF^{\prime \prime} \left( {t,d} \right) = vp\left( {F_{g}^{\prime \prime } } \right)/ \, \left( {p\left( {F_{g}^{\prime \prime } } \right)G/\beta_{g} } \right) $$
$$ FIF^{\prime \prime} \left( {t,d} \right) = \beta_{g} v/G $$
(4c)

\( FIF''(t,d) = \beta_{g} \, FIF(t,d) \) through Eq. 4.

FIF″(t,d) is strictly larger than FIF(t,d) iff \( \beta_{g} > 1 \) (number of WCP not equal to zero).

Adding sources results in an increase of the impact factor.

Cell D: back to symmetry

In this case the new category is implemented both on cited and citing side, back to a symmetrical design but on an enlarged basis. In Fig. 1, this configuration activates the flows of arrows 1 through 4.

DEN‴ is now equal to DEN′ (enhancement of the targets) and can be written as a function of \( p(F_{g}) \)

$$ DEN^{\prime \prime \prime} \; = \; p\left( {F^{\prime \prime \prime}_{g} } \right) \, G^{\prime \prime \prime} = \beta_{g} p\left( {F_{g} } \right) \, G $$
(2d)

as in Eq. 2b.

NUM″′ collects additional citations from new source papers (like in configuration C) and from references to the new targets (like in configuration B) in all citing documents.

\( NUM^{\prime \prime \prime} \; = \; v\left( {F^{\prime \prime \prime}_{g} } \right) \, p(F^{\prime \prime \prime}_{g} ) \), where both factors are enhanced by WCP:

$$ p\left( {F^{\prime \prime \prime}_{g} } \right) \; = \; \beta_{g} p(F_{g} ) $$

and \( v\left( {F^{\prime \prime \prime}_{g} } \right) = q\left( {F_{g} } \right) \, b^{\prime \prime \prime} \left( {F_{g} } \right) \, a(F_{g} ) \) with, as in configuration B, \( b^{\prime \prime \prime}(F_{g}) = b(F_{g}) \, s \, \beta_{g} \), so that

\( v\left( {F^{\prime \prime \prime}_{g} } \right) \; = \; q \, b \, s \, \beta_{g} \, a = v \, s \, \beta_{g} \), since we assume that q and a are not altered.

Hence

$$ NUM^{\prime \prime \prime} \, = vs \, \beta_{g}^{2} p\left( {F_{g} } \right) $$
(3d)
$$ FIF^{\prime \prime \prime} \left( {t,d} \right) \, = \;\left( {vs \, \beta_{g}^{2} p\left( {F_{g} } \right)} \right)/(\beta_{g} p\left( {F_{g} } \right)G) $$
$$ FIF^{\prime \prime \prime} \left( {t,d} \right) = \left( {vs \, \beta_{g} } \right)/G $$
(4d)
$$ FIF^{\prime \prime \prime} \left( {t,d} \right) = \left( {s\beta_{g} } \right)FIF\left( {t,d} \right) $$

As in configuration C, the impact factor is expected to increase because of addition of citing sources. As in configuration B, the coefficient s may also affect the result.

To summarize the results on asymmetrical cases, the primary effect is the increase of impact factor when sources are enhanced. A secondary effect takes place when the added documents do not, on average, score like others. Figure 1 illustrates the issue of coverage both on citing and cited side (see also Moed 2010). It may be noted that asymmetry involving other filters—for example a different coverage of media (journals) on the citing and the cited side —may be treated accordingly.
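The four configurations can be compared side by side by rebuilding the numerator and denominator under the stated assumptions (G unchanged, q and a unchanged, b multiplied by s β_g when WCP are citable). The function below and its parameter names are hypothetical and serve only as a sketch; `p_g` is included to show that it cancels.

```python
def fif_configuration(v, G, beta_g, s, wcp_cited, wcp_citing, p_g=1000.0):
    """FIF when WCP are included on the cited and/or the citing side.
    v, G: closed symmetric baseline (cell A); beta_g: enlargement ratio;
    s: relative impact coefficient of the enlarged cited set."""
    # Cited side: more active references per paper, and a larger citable set.
    v_eff = v * s * beta_g if wcp_cited else v
    citable = (beta_g if wcp_cited else 1.0) * p_g * G
    # Citing side: more source papers emitting references.
    sources = (beta_g if wcp_citing else 1.0) * p_g
    return v_eff * sources / citable


# Toy parameters (invented).
v, G, beta_g, s = 6.0, 1.9, 1.3, 0.8
cell_a = fif_configuration(v, G, beta_g, s, wcp_cited=False, wcp_citing=False)
cell_b = fif_configuration(v, G, beta_g, s, wcp_cited=True, wcp_citing=False)
cell_c = fif_configuration(v, G, beta_g, s, wcp_cited=False, wcp_citing=True)
cell_d = fif_configuration(v, G, beta_g, s, wcp_cited=True, wcp_citing=True)
```

Relative to cell A, the multipliers are s for cell B, β_g for cell C and s β_g for cell D, matching Eqs. 4b, 4c and 4d; with the invented values above (s β_g = 1.04) cell D ends slightly above cell A.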

Case of an open field

Assumption (b) is now dropped: the field is assumed to exchange citations with other fields, which is a realistic case. For simplicity, assumption (a), symmetry, is held. The denominator of the impact factor is not modified but the numerator now reflects the across-field exchanges.

The sources \( S(F_{c}) \) are distributed among fields I, and the field as a target loses the part of the citations it emits which are not self-citations.

\( S(F_{c}) = \bigcup_{I} S^{I}(F_{c}) \), where \( S^{I}(F_{c}) \) denotes the sources belonging to field I.

Adapting Eq. 3:

\( NUM \; = \;c(\bigcup_{I} S^{I} (F_{c} ),F_{c} ) = \sum_{I} w_{IF} v\left( {I_{g} } \right) \, p\left( {I_{g} } \right) \), where \( w_{IF} \) denotes the proportion of active citations emitted by I going to F.

In the matrix W of transactions, \( w_{IF} \) is the ratio to the margin for citation emissions: W(IF)/W(I+). In the case of a closed field, \( w_{IF} = 0 \) for I ≠ F and \( w_{FF} = 1 \), that is, incoming citations are all self-citations and outgoing citations are zero.

The time frame is the same as in the sections above.

Then:

$$ FIF\left( {t,d} \right) = \sum_{I} [w_{IF} v\left( {I_{g} } \right)p\left( {I_{g} } \right)/(p\left( {F_{g} } \right)G)] $$

Noting \( x_{IF} = p(I_{g})/p(F_{g}) \), the ratio of the size (number of publications) of the source field I to that of the source field F:

$$ FIF\left( {t,d} \right) = \sum_{I} [w_{IF} x_{IF} v\left( {I_{g} } \right)]/G $$
(5)

Separating self-cites from other flows:

$$ FIF\left( {t,d} \right) = w_{FF} v\left( {F_{g} } \right)/G + \sum_{I \ne F} [w_{IF} x_{IF} v\left( {I_{g} } \right)]/G $$

\( v(I_{g}) \) may be decomposed:

$$ v\left( {I_{g} } \right)\; = \;q\left( {I_{g} } \right)\;b\left( {I_{g} } \right)\;a\left( {I_{g} } \right) $$

if we assume, for simplicity, that parameters (b, a) in the emitting field I are invariant with the target F.

This illustrates how the impact factor of an open field depends on the propensity to cite in each emitting field, weighted according to the structure of inputs across fields (including self-citation). The dependence on growth in the cited (target) field is maintained.
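Equation 5 can be checked against a direct count on a toy matrix of transactions. In the sketch below, the nested dictionary `W`, the field labels and the value of G are invented for illustration; `W[I][J]` is the number of active citations emitted by field I in year t towards field J, and the symmetric design is assumed, so that each row sum equals v(I_g) p(I_g).

```python
def open_field_fif(W, p, target, G):
    """FIF of `target` from Eq. 5: sum over source fields I of
    w_IF * x_IF * v(I_g), divided by the growth term G of the target."""
    total = 0.0
    for source, row in W.items():
        emitted = sum(row.values())      # all active citations emitted by I
        if emitted == 0:
            continue                     # a field emitting nothing contributes nothing
        w = row.get(target, 0) / emitted  # share of I's citations going to target
        v = emitted / p[source]           # active references per paper in I
        x = p[source] / p[target]         # relative size of the source field
        total += w * x * v
    return total / G


# Toy data (invented): two fields exchanging citations.
W = {"F": {"F": 800, "I": 200}, "I": {"F": 300, "I": 2700}}
p = {"F": 200, "I": 500}  # citing papers per field in year t
G = 1.9
fif_F = open_field_fif(W, p, "F", G)
direct_F = (W["F"]["F"] + W["I"]["F"]) / (p["F"] * G)  # received / citable stock
```

The two computations agree, and the self-citation term (0.8 × 5 / G) can be read off separately from the import term (0.1 × 2.5 × 6 / G), mirroring the separation of flows in the text.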

Applications

A first extension of the above developments consists in scaling up or down the aggregate “field” considered above. Instead of fields—based for example on WoS subject categories—we could consider micro-fields made from a single journal. Similar configurations could be studied: isolated journals with symmetrical design (only self-citing, and receiving only self-cites), isolated journals with asymmetrical design, open journals exchanging with each other.

From Eqs. 5 and 2, by noting h the target journal and j the sources, we retrieve a formula of JIF making explicit the role of active references:

$$ JIF\left( {t,d} \right)\; = \sum_{j} \left[ \, w_{jh} x_{jh} v\left( {j_{g} } \right)\right] /G_{h} \; = \;\sum_{j} \left[ \, w_{jh} p\left( {j_{g} } \right)\;v\left( {j_{g} } \right)\right]/ \left( G_{h} \, p\left( {h_{g} } \right) \right) $$

equivalent to \( \sum_{j} [ \, w_{jh} p\left( {j_{g} } \right)v\left( {j_{g} } \right)]/p\left( {h_{c} } \right) \),

reducing to \( JIF(t,d) = v(h_{g})/G_{h} \) in an isolated journal, where \( w_{jh} = 1 \) if j = h and 0 otherwise, and j = h → \( p(j_{g}) = p(h_{g}) \).

The decomposition of the Field Impact Factor, or weighted average of JIF of the field journals, shows that, with a few sensible hypotheses, the across-field variations are determined by three factors: the propensity to cite and to cite rapidly (speed of citation) within the citation window; the growth conditions; the import–export of citations, often seen as a counterpart of knowledge and information exchanges. Various assumptions have been made here to simplify the model; most of them could easily be relaxed by introducing correcting factors. Intricate issues may come from the coverage conventions in the databases. The basic finding remains: the field-dependence of the field impact factor first depends on the propensity to cite, and thus may be corrected by citing side normalization.

The general principle allows many applications: the first prototype, the Audience Factor, for example, was based on the network of journals, with v calculated at the journal level, and was intended to follow the scheme of JIF with a particular weighting. The backward framework of the JIF, looking back two years from a current “citing year(s)”, is not ideal for fine grain applications. For general purposes, a forward view is usually preferred. The principle of citing-side normalization is applicable, with appropriate settings, to any time framework and any granularity on the cited or citing side. A crucial question is the basis on which the propensity to cite (here the coefficient v) is calculated, and its mode of calculation (see discussion below).

Discussion and conclusion

Concerning the field level, contrary to a common belief, there is no functional linkage of the average JIF to the size of the field, a point already made by Garfield (1976, 1998). However, other measures, such as the maximum JIF reached in a field, may exhibit such dependence. For example, in a limit case where “the winner takes all” in a particular field (a single journal gathers all citations), the maximum impact factor in the field, namely this particular journal’s impact, is a direct function of the size of the field. In this case the variance and the skewness of citation scores also reach top values. This effect of range is met at any level of observation; for example, outstanding individual articles may capture a higher number of citations in larger fields, even assumed closed.

Explaining the referencing behavior is another issue. The propensity to cite results from a mix of field knowledge structure and sociological habits, topics studied in the abundant literature on citation theories, or rather on the implications, for citation studies, of theories in the sociology of science (see Cronin 1984 and Luukkonen 1997 for reviews). In the realm of socio-cognitive motives, going from the rather cognitive to the rather sociological determinants, we could expect a propensity to cite positively correlated to: the horizontal integration of the field, not confining references to a narrow specialty within the field; the vertical integration of the field with internalization of the knowledge base, implying references to the theoretical substrate; the dependence on other fields, a form of multi-disciplinarity; the complexity of the mode of production (theories, instruments, methods) along with a division of labor and collaboration intensity, possibly amplified by self-citation, which also may spur external dependences; the social practices, for example the acceptance of contextual references with relatively low relevance or the generosity towards “non-Mertonian” references. These field-dependent factors influencing the propensity to cite and likely to be normalized in typical citing-side approaches are not, however, the only players.

Within a community and area of research, the variability amongst individual researchers and articles remains high. Should this be neutralized by normalization? The priority is controlling for communities’ citation habits. Controlling for habits at a lower level may be interesting but should be done very carefully to avoid adverse consequences. Secondly, the type of literature also influences the propensity to cite. A typical case concerns the articles in “trade journals”, which often exhibit scarce references, but at the same time are not really primary research accounts (Footnote 5) and do not deserve an overweight of references. Both problems are challenges to citing-side normalization, as commented below.

As argued in the literature on the subject, a typical implementation of the citing-side normalized impact

  • controls for the citing propensity, including the speed of citations (in contrast with the original JIF)

  • does not control for the growth rate, giving some advantage to growing fields (as in the original JIF).

  • does not control for the exports or imports of citations, which are fully reflected (as in the JIF). This is a way to recognize the role of basic/generic research.

  • may be sensitive to the coverage of the citing sources in the database.

  • has a rationale that is almost or strongly classification-free; we return to this point later.

In contrast, cited-side (or ex-post) normalization jointly controls for all variables. This family of measures does not reflect differential growth or imports and exports of knowledge. This is probably considered an advantage, and the equality amongst fields carried by cited-side normalization could find some justification. However, the argument is fragile, because equality depends on the somewhat arbitrary choice of a specific classification scheme. The dependence of cited-side normalization on the coverage of sources (citing side, right part of Fig. 1) is only second-order. This is an advantage over the citing-side normalized JIF and the straight JIF, which depend at the first order on differences of coverage amongst fields. On the other hand, a major weakness of the typical implementations of cited-side normalization is their strong dependence on the nomenclatures or classifications used for normalization, both in the delineation of areas at a given scale and in cross-scale instability (Zitt et al. 2005). By and large, citing-side normalization offers the most natural way to correct citing discrepancies.

When citations to a given entity come from multiple origins, citing-side normalization takes into account the conditions of each of these sources, rather than a fixed reference on the cited side such as the MEDLINE MeSH index, CAS indexes, or WoS subject categories. The spirit of the citing-side approach is “classification-free”: it does not rely on arbitrary definitions of fields at any level, with their consequences such as the issue of multi-assignment of journals to fields. The essential point is the level used for calculating the length of active bibliographies, and their statistical treatment. Natural choices are:

  (a) the individual paper level: a radical choice where the sum of reference weights per citing paper is unity

  (b) the neighborhood of the paper in a bibliometric network

  (c) the journal level: the sum of reference weights per citing paper is proportional (as in conventional citation schemes) to the number of references

  (d) the neighborhood of the journal in a bibliometric network

All variants above have the property of independence from classification schemes. The property is strong if the normalization base is free from any alternative process of grouping, namely neighborhood determination in bibliometric networks. Only variants (a) and (c) (Footnote 6) are strongly classification-free in this respect.
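The difference between the two strongly classification-free variants can be made concrete with a minimal sketch. Everything below is an assumption made for illustration: the toy data, the function names, the use of a journal mean as the smoothing statistic, and the simplification that every reference counts as "active" (no citation window or coverage filter). It is not the implementation of any published indicator.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: citing paper -> (citing journal, list of cited items).
CITING_PAPERS = {
    "p1": ("J_short", ["x", "y"]),
    "p2": ("J_short", ["x", "y", "z", "w"]),
    "p3": ("J_long", ["x"] + [f"r{i}" for i in range(19)]),
}


def paper_level_weights(papers):
    """Variant (a): each reference weighs 1 / (length of the citing paper's
    bibliography), so the weights emitted by one citing paper sum to unity."""
    return {pid: 1.0 / len(refs) for pid, (_, refs) in papers.items() if refs}


def journal_level_weights(papers):
    """Variant (c): each reference weighs 1 / (mean bibliography length of the
    citing journal), so the weights emitted by one citing paper sum to a value
    proportional to its number of references."""
    lengths = defaultdict(list)
    for journal, refs in papers.values():
        lengths[journal].append(len(refs))
    journal_mean = {j: mean(ns) for j, ns in lengths.items()}
    return {
        pid: 1.0 / journal_mean[journal]
        for pid, (journal, _) in papers.items()
        if journal_mean[journal] > 0
    }


def weighted_citations(papers, weights):
    """Sum, for each cited item, the weights of the references pointing to it."""
    scores = defaultdict(float)
    for pid, (_, refs) in papers.items():
        for cited in refs:
            scores[cited] += weights[pid]
    return dict(scores)


paper_scores = weighted_citations(CITING_PAPERS, paper_level_weights(CITING_PAPERS))
journal_scores = weighted_citations(CITING_PAPERS, journal_level_weights(CITING_PAPERS))
```

In this toy case the item `x`, cited by all three papers, scores about 0.80 under the paper-level weighting and about 0.72 under the journal-level weighting. The neighborhood variants (b) and (d) would replace the journal grouping with a network-derived grouping, which is where a grouping process re-enters.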

The options with the strict classification-free property come at the price of shortcomings. The “journal level” option (first implementation of the “audience factor”, v0.1) faces the issue of journal multi-disciplinarity, a shortcoming mitigated, not cancelled, by using journal networks (audience factor v0.2). The “article level” option, an ideal theoretical case of “fractional citation weighting” where every citing source weighs one, has the drawback of being sensitive to several problems: individual variability; types of literature with scarce references due to their marginality to research; and editorial constraints on bibliography length. The correction provided in favor of sources with short bibliographical lists must take into account that bibliographies may be short for bad reasons. Translating the scarce bibliographies of trade journal articles into an overweight for each individual reference is an obvious trap, likely to spuriously enhance the impact of these journals. It would be abnormal for an article cited three times in trade journals to receive a larger score than an article cited ten times in good scientific journals. Scarce bibliographies sometimes imposed for editorial reasons by scientific journals, despite the fact that they force a stronger selection process by the citing authors, may also be seen as a source of irregularity.
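The trade-journal trap can be checked with simple arithmetic. The sketch below assumes strict paper-level fractional weighting and purely hypothetical bibliography lengths (2 references per trade-journal article, 40 per scientific article); the function name is ours.

```python
def fractional_score(citing_bibliography_lengths):
    """Paper-level fractional count: each received citation weighs
    1 / (length of the bibliography of the citing paper)."""
    if any(n <= 0 for n in citing_bibliography_lengths):
        raise ValueError("a citing paper must have at least one reference")
    return sum(1.0 / n for n in citing_bibliography_lengths)


# Hypothetical case 1: three citations, each from a trade-journal article
# listing only 2 references.
trade_cited = fractional_score([2, 2, 2])

# Hypothetical case 2: ten citations, each from a scientific article
# listing 40 references.
science_cited = fractional_score([40] * 10)
```

Under these assumptions the article cited three times scores 1.5, against 0.25 for the article cited ten times, which is the inversion described above.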

In the SNIP, Moed mitigates the problem by using a robust central value, the median, but this is only a partial solution. Other works (Leydesdorff and Shin 2011) advocate the radical application of the principle, at great risk in our opinion, especially in actor-level applications. Another recent proposal is put forth by Glänzel (2010). In our view, a challenge for the new form of normalization is to find a satisfactory trade-off, in the statistical smoothing, between the reliability of the picture of referencing habits and the statistical artefacts likely to arise from cluster/neighborhood analyses in bibliometric networks.

As stated in our previous articles mentioning this approach (2005, 2008, 2010, op. cit.), the principle of citing-side normalization is quite general. It opens a third family of normalized indicators, alongside the traditional cited-side standardization, in its many forms, and the implicit source-level normalization conveyed in some implementations of iterative influence measures. The universality of its application range, at any granularity level, is a crucial point (Footnote 7). Suitable settings make it appropriate for backward or forward schemes in impact calculation; the JIF calculation is the archetype of a citing-year process. The generality of the principle is reaffirmed by Waltman and van Eck (2010), who implemented the Mean Source-Normalized Citation Score (MSNCS) on the model of classical forward counts.

In typical implementations, citing-side normalization avoids the consistency problems of “mediant fractions” met in a classical normalized indicator, the relative citation impact (Ramanana et al. 2009), also termed “mean citation rate” or “crown indicator”, which is based on the ratio of an actor’s impact to the field average. Leydesdorff and Opthof (2010), taking a radical position, dismissed this type of indicator, based on a quotient of sums that over-weights highly cited items, and recommended going back to a paper-level normalization, a worthwhile option with advantages and also shortcomings. However, they extend the critique to some forms of citing-side indicators, which is quite strange in our opinion (Footnote 8).
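The contrast between the quotient of sums and the paper-level alternative can be illustrated with a small sketch. The two-paper portfolio and the field expected values below are hypothetical, and the function names are ours; the sketch only shows how the two aggregation orders diverge, not how any specific indicator is computed in practice.

```python
# Hypothetical actor portfolio: (citations received, expected citations in
# the paper's field), one tuple per paper.
PAPERS = [
    (30, 20.0),  # paper in a field with high citation density
    (1, 2.0),    # paper in a field with low citation density
]


def ratio_of_sums(papers):
    """Quotient of sums (a "mediant" of the per-paper fractions):
    total observed citations divided by total expected citations."""
    observed = sum(c for c, _ in papers)
    expected = sum(e for _, e in papers)
    return observed / expected


def mean_of_ratios(papers):
    """Paper-level normalization: average of the observed/expected ratios,
    each paper counting equally."""
    return sum(c / e for c, e in papers) / len(papers)


quotient = ratio_of_sums(PAPERS)    # 31 / 22, dominated by the first paper
average = mean_of_ratios(PAPERS)    # (1.5 + 0.5) / 2
```

Here the quotient of sums is about 1.41 while the mean of ratios is exactly 1.0: the first paper, with the larger citation counts, dominates the quotient. A citing-side indicator involves no division by a field average, which is why it sidesteps this particular choice.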

The power of the family of “influence measures” for tracking intellectual dependences has been mentioned above. These indicators also have limitations. A technical issue, quite serious at the journal level, is the treatment of self-citations (Bar-Ilan 2009). At a disaggregated level, applications can be hindered by the accumulation of citing delays along a chain of citations, a limitation that is ignored at a high aggregation level if entities (journals, institutions as a whole, etc.) are considered as timeless blocks. Other citation analyses, including citing-side methods, rely only on direct citation links and thus escape this limitation.

A challenging question is the relation between the various forms of citing-side normalization and the question of “cross-scale” normalization mentioned above. The fact that, technically, citing-side normalization is classification-free does not abolish the fundamental question of “cross-scale” effects. The citation behavior is averaged (lato sensu) over sets whose definition depends on the variants above (paper level, journal level, neighborhoods) with respect to the citation network. The choice of the smoothing area keeps the cross-scale issue alive, albeit dormant.