On fractional approach to analysis of linked networks

Batagelj, Vladimir

doi:10.1007/s11192-020-03383-y

Published: 28 February 2020

Volume 123, pages 621–633, (2020)
Scientometrics

Vladimir Batagelj (ORCID: orcid.org/0000-0002-0240-9446)

Abstract

In this paper, we present the outer product decomposition of a product of compatible linked networks. It provides a foundation for the fractional approach in network analysis. We discuss the standard and Newman’s normalization of networks. We propose some alternatives for fractional bibliographic coupling measures.

Introduction

The fractional approach was proposed by Lindsey (1980). For example, in the analysis of coauthorship the contributions of all coauthors to a work have to add up to 1. Usually the contribution of each coauthor is then estimated as 1 divided by the number of coauthors. An alternative rule, Newman’s normalization, which excludes self-collaboration, was given in Newman (2001, 2004). Recently several papers (Batagelj and Cerinšek 2013; Cerinšek and Batagelj 2015; Perianes-Rodriguez et al. 2016; Prathap and Mukherjee 2016; Leydesdorff and Park 2017; Gauffriau 2017) reconsidered the background of the fractional approach. In this paper we propose a theoretical framework based on the outer product decomposition to gain insight into the structure of bibliographic networks obtained with network multiplication.

The paper starts with basic notions: collections of linked networks in “Linked networks” section and network multiplication in “Network multiplication” section. In “Outer product decomposition” section we formalize, using the outer product decomposition, an observation from our paper (Batagelj and Cerinšek 2013) that the product of two compatible networks is a sum of complete two-mode networks. Their contributions to the product can be different (“Derived networks” section). To equalize their impact we have to normalize both networks (fractional approach, “Fractional approach” section). We extend the result to the linking through a network (“Linking through a network” section) and discuss different implications of the decomposition (“Some notes” section). The standard fractional approach works well for co-citation. Not all normalizations make sense: we cannot apply the standard fractional approach to bibliographic coupling. In “Bibliographic coupling and co-citation” section we discuss this problem and show that the Salton cosine and the Jaccard index are among the options.

Linked networks

Linked or multi-modal networks are collections of networks over at least two sets of nodes (modes) and consist of some one-mode networks and some two-mode networks linking different modes. For example, let the modes be Persons and Organizations. Two one-mode networks describe collaboration among Persons and among Organizations. The linking two-mode network describes the membership of Persons in different Organizations.

Linked networks are the basis of the MetaMatrix approach developed by Krackhardt and Carley (Krackhardt and Carley 1998; Carley 2003). For an example see Table 3 in Diesner and Carley (2004, p. 89).

Bibliographic networks are another example of linked networks. From special bibliographies (BibTeX) and bibliographic services (Web of Science, Scopus, SICRIS, CiteSeer, Zentralblatt MATH, Google Scholar, DBLP Bibliography, US patent office, IMDb, and others) we can construct some two-mode networks on selected topics: authorship on works $\times$ authors (${\mathbf {W\!A}}$), keywordship on works $\times$ keywords (${\mathbf {W\!K}}$), journalship on works $\times$ journals/publishers (${\mathbf {W\!J}}$), and from some data also the classification network on works $\times$ classification (${\mathbf {WC}}$) and the one-mode citation network on works $\times$ works (${\mathbf {Ci}}$); where works include papers, reports, books, patents, movies, etc. Besides these we also get the partition of works by publication year, and the vector of the numbers of pages (WoS 2018; Batagelj 2007).

An important tool in the analysis of linked networks is the use of derived networks obtained by network multiplication.

Network multiplication

Given a pair of compatible two-mode networks ${\mathcal {N}}_A = ({{\mathcal {I}}},{{\mathcal {K}}},{\mathcal {A}}_A,w_A)$ and ${\mathcal {N}}_B = ({{\mathcal {K}}},{\mathcal {J}},{\mathcal {A}}_B,w_B)$ with corresponding matrices ${\mathbf {A}}_{{{\mathcal {I}}} \times {{\mathcal {K}}}}$ and ${\mathbf {B}}_{{{\mathcal {K}}} \times {\mathcal {J}}}$ we call a product of networks ${\mathcal {N}}_A$ and ${\mathcal {N}}_B$ a network ${\mathcal {N}}_C = ({{\mathcal {I}}},{\mathcal {J}},{\mathcal {A}}_C,w_C)$, where ${\mathcal {A}}_C = \{ (i,j): i \in {{\mathcal {I}}}, j \in {\mathcal {J}}, c_{i,j} \ne 0 \}$ and $w_C(i,j) = c_{i,j}$ for $(i,j) \in {\mathcal {A}}_C$. The product matrix ${\mathbf {C}} = [ c_{i,j} ]_{{{\mathcal {I}}} \times {\mathcal {J}}} = {\mathbf {A}} \cdot {\mathbf {B}}$ is defined in the standard way

$$\begin{aligned} c_{i,j} = \sum _{k \in {\mathcal {K}}} a_{i,k} \cdot b_{k,j} \end{aligned}$$

In the case when ${\mathcal {I}} = {\mathcal {K}} = {\mathcal {J}}$ we are dealing with ordinary one-mode networks (with square matrices).

In the following we will often identify networks by their matrices.

In the paper Batagelj and Cerinšek (2013) it is shown that $c_{i,j}$ is equal to the value of all two step paths from $i \in {\mathcal {I}}$ to $j \in {\mathcal {J}}$ passing through ${\mathcal {K}}$. In a special case, if all weights in networks ${\mathcal {N}}_A$ and ${\mathcal {N}}_B$ are equal to 1 the value of $c_{i,j}$ counts the number of ways we can go from $i \in {\mathcal {I}}$ to $j \in {\mathcal {J}}$ passing through ${\mathcal {K}}$: $c_{i,j} = | N_A(i) \cap N^-_B(j)|$; where $N_A(i)$ is the set of nodes in ${\mathcal {K}}$ linked by arcs from node i in the network ${\mathcal {N}}_A$, and $N^-_B(j)$ is the set of nodes in ${\mathcal {K}}$ linked by arcs to node j in the network ${\mathcal {N}}_B$.
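The binary special case can be checked on a small example. The following is a minimal sketch in plain Python; the matrices `A` and `B` and the helper names `matmul`, `out_neighbors` and `in_neighbors` are illustrative assumptions, not data or code from the paper.

```python
# Hypothetical binary networks: A is I x K (2 x 3), B is K x J (3 x 2).
A = [[1, 1, 0],
     [0, 1, 1]]
B = [[1, 0],
     [1, 1],
     [0, 1]]

def matmul(X, Y):
    """Standard product: c[i][j] = sum over k of x[i][k] * y[k][j]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def out_neighbors(X, i):
    """N_X(i): column nodes linked by arcs from row node i."""
    return {k for k, v in enumerate(X[i]) if v != 0}

def in_neighbors(X, j):
    """N^-_X(j): row nodes linked by arcs to column node j."""
    return {k for k, row in enumerate(X) if row[j] != 0}

C = matmul(A, B)
# For binary weights each entry should equal |N_A(i) & N^-_B(j)|.
common = [[len(out_neighbors(A, i) & in_neighbors(B, j))
           for j in range(len(B[0]))]
          for i in range(len(A))]
print(C)
print(common)
```

Both printed matrices should coincide, since every two-step path through a node of the intermediate set has value 1.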

The standard matrix multiplication has the complexity $O(|{\mathcal {I}}|\cdot |{\mathcal {K}}|\cdot |{\mathcal {J}}|)$, which is too slow for large networks. For large sparse networks we can multiply much faster by considering only the nonzero elements.

In general the multiplication of large sparse networks is a ’dangerous’ operation since the result can ’explode’, that is, it need not be sparse. If for the sparse networks ${\mathcal {N}}_A$ and ${\mathcal {N}}_B$ there are in ${\mathcal {K}}$ only a few nodes with large degree, and none of them has large degree in both networks, then the resulting product network ${\mathcal {N}}_C$ is also sparse.

From the network multiplication algorithm we see that each intermediate node $k \in {\mathcal {K}}$ adds to a product network a complete two-mode subgraph $K_{N^-_A(k),N_B(k)}$ (or, in the case ${\mathbf {B}} = {\mathbf {A}}^T$, where ${\mathbf {A}}^T$ is the transpose of ${\mathbf {A}}$, a complete subgraph $K_{N(k)}$). If both degrees $\deg _A(k)=|N^-_A(k)|$ and $\deg _B(k)=|N_B(k)|$ are large then already the computation of this complete subgraph has a quadratic (time and space) complexity, and the result ’explodes’. For details see the paper Batagelj and Cerinšek (2013).
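A sparse multiplication that visits only nonzero elements can be sketched as follows. This is an illustration consistent with the description above, not necessarily the exact algorithm of Batagelj and Cerinšek (2013); the adjacency-dictionary layout and the names `sparse_product`, `update_count`, `a_in` and `b_out` are assumptions made for the example.

```python
from collections import defaultdict

def sparse_product(a_in, b_out):
    """Multiply two sparse networks through the intermediate set K.

    a_in[k]  maps i -> a[i,k] for the arcs of N_A entering k;
    b_out[k] maps j -> b[k,j] for the arcs of N_B leaving k.
    """
    c = defaultdict(int)
    for k in a_in.keys() & b_out.keys():
        for i, a in a_in[k].items():
            for j, b in b_out[k].items():
                # node k contributes the complete two-mode subgraph
                # on N^-_A(k) x N_B(k)
                c[(i, j)] += a * b
    return dict(c)

def update_count(a_in, b_out):
    """Elementary updates performed: sum over k of deg_A(k) * deg_B(k)."""
    return sum(len(a_in[k]) * len(b_out[k])
               for k in a_in.keys() & b_out.keys())

# Hypothetical binary networks with three intermediate nodes 0, 1, 2.
a_in = {0: {0: 1}, 1: {0: 1, 1: 1}, 2: {1: 1}}
b_out = {0: {0: 1}, 1: {0: 1, 1: 1}, 2: {1: 1}}

C = sparse_product(a_in, b_out)
print(C)
print(update_count(a_in, b_out))
```

The update count makes the 'explosion' visible: a single intermediate node with large degree in both networks dominates the sum.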

Outer product decomposition

For vectors $x = [x_1, x_2, \ldots , x_n]$ and $y = [y_1, y_2, \ldots , y_m]$ their outer product $x \circ y$ is defined as the matrix

$$\begin{aligned} x \circ y = [x_i \cdot y_j]_{n\times m} \end{aligned}$$

Using it, we can express the previous observation about the structure of the product network as the outer product decomposition

$$\begin{aligned} {\mathbf {C}} = {\mathbf {A}} \cdot {\mathbf {B}} = \sum _k {\mathbf {H}}_k \quad \text{ where } \quad {\mathbf {H}}_k = {\mathbf {A}}[\cdot ,k] \circ {\mathbf {B}}[k,\cdot ] \end{aligned}$$

For binary networks (all weights equal to 1) we have ${\mathbf {H}}_k = K_{N^-_A(k),N_B(k)}$.
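The decomposition can be verified numerically. In the sketch below each term pairs the weights on arcs into the intermediate node k (column k of the first matrix) with the weights on arcs out of k (row k of the second matrix). The weighted matrices and helper names are hypothetical.

```python
def outer(x, y):
    """Outer product of vectors x and y as a len(x) x len(y) matrix."""
    return [[xi * yj for yj in y] for xi in x]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical weighted networks: A is 2 x 3, B is 3 x 2.
A = [[1, 2, 0],
     [0, 1, 3]]
B = [[1, 0],
     [2, 1],
     [0, 4]]

# H[k] = (column k of A) outer (row k of B)
H = [outer([row[k] for row in A], B[k]) for k in range(len(B))]

total = [[0] * len(B[0]) for _ in A]
for Hk in H:
    total = add(total, Hk)

print(total)
print(matmul(A, B))
```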

Example A

As an example let us take the binary network matrices ${\mathbf {W\!A}}$ and ${\mathbf {W\!K}}$ (Fig. 1):

and compute the product ${\mathbf {H}}= {\mathbf {W\!A}}^T \cdot {\mathbf {W\!K}}$. We get a network matrix ${\mathbf {H}}$ which can be decomposed as

Derived networks

We can use the multiplication to obtain new networks from existing compatible two-mode networks. For example, from basic bibliographic networks ${\mathbf {W\!A}}$ and ${\mathbf {W\!K}}$ we get

$$\begin{aligned} {\mathbf {A\!K}}= {\mathbf {W\!A}}^T \cdot {\mathbf {W\!K}}\end{aligned}$$

a network relating authors to keywords used in their works, and

$$\begin{aligned} {\mathbf {Ca}} = {\mathbf {W\!A}}^T \cdot {\mathbf {Ci}}\cdot {\mathbf {W\!A}}\end{aligned}$$

is a network of citations between authors.

Networks obtained from existing networks using some operations are called derived networks. They are very important in analysis of collections of linked networks.

What is the meaning of the product network? In general we could consider weights, addition and multiplication over a selected semiring (Cerinšek and Batagelj 2017). In this paper we will limit our attention to the traditional addition and multiplication of real numbers.

The weight ${\mathbf {A\!K}}[a,k]$ is equal to the number of times the author a used the keyword k in his/her works.

The weight ${\mathbf {Ca}}[a,b]$ counts the number of times a work authored by the author a is citing a work authored by the author b; or shorter, how many times the author a cited the author b.
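As an illustration of both derived networks, the sketch below builds them for a small invented bibliography with four works, three authors and two keywords. The data and the helper names are assumptions for the example only.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Hypothetical data: rows are works w1..w4.
WA = [[1, 1, 0],   # w1 by a1, a2
      [1, 0, 0],   # w2 by a1
      [1, 1, 1],   # w3 by a1, a2, a3
      [0, 0, 1]]   # w4 by a3
WK = [[1, 0],      # w1 uses k1
      [1, 1],      # w2 uses k1, k2
      [0, 1],      # w3 uses k2
      [1, 1]]      # w4 uses k1, k2
Ci = [[0, 1, 0, 0],   # w1 cites w2
      [0, 0, 0, 0],   # w2 cites nothing
      [1, 1, 0, 0],   # w3 cites w1, w2
      [1, 1, 1, 0]]   # w4 cites w1, w2, w3

AK = matmul(transpose(WA), WK)               # authors x keywords
Ca = matmul(matmul(transpose(WA), Ci), WA)   # authors x authors

print(AK)  # AK[a][k]: number of works of author a using keyword k
print(Ca)  # Ca[a][b]: number of times author a cited author b
```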

Using network multiplication we can also transform a given two-mode network, for example ${\mathbf {W\!A}}$, into corresponding ordinary one-mode networks (projections)

$$\begin{aligned} {\mathbf {WW}} = {\mathbf {W\!A}}\cdot {\mathbf {W\!A}}^T \qquad \text{ and } \qquad {\mathbf {AA}} = {\mathbf {W\!A}}^T \cdot {\mathbf {W\!A}}\end{aligned}$$

The obtained projections can be analyzed using standard network analysis methods. This is the traditional recipe for analyzing two-mode networks. Often the weights are not considered in the analysis; when they are considered we have to be very careful about their meaning.

The weight ${\mathbf {WW}}[p,q]$ is equal to the number of common authors of works p and q.

The weight ${\mathbf {AA}}[a,b]$ is equal to the number of works that authors a and b coauthored. In the special case $a=b$ it is equal to the number of works that the author a wrote. The network ${\mathbf {AA}}$ describes the coauthorship (collaboration) between authors and is also denoted as ${\mathbf {Co}}$, the “first” coauthorship network.

In the paper Batagelj and Cerinšek (2013) it was shown that there can be problems with the network ${\mathbf {Co}}$ when we try to use it for identifying the most collaborative authors. By the outer product decomposition the coauthorship network ${\mathbf {Co}}$ is composed of complete subgraphs on the sets of coauthors of each work. Works with many authors produce large complete subgraphs, thus blurring the collaboration structure, and are over-represented by their total weight. To see this, let $S_x = \sum _i x_i$ and $S_y = \sum _j y_j$; then the total contribution of the outer product $x\circ y$ is equal to

$$\begin{aligned} T = \sum _{i,j} (x\circ y)_{ij} = \sum _i \sum _j x_i\cdot y_j = \sum _i x_i\cdot \sum _j y_j = S_x \cdot S_y \end{aligned}$$

In general each term ${\mathbf {H}}_w$ in the outer product decomposition of the product ${\mathbf {C}}$ has a different total weight $T({\mathbf {H}}_w) = \sum _{a,k} ({\mathbf {H}}_w)_{ak}$, leading to over-representation of works with large values. In the case of the coauthorship network ${\mathbf {Co}}$ we have $S({\mathbf {W\!A}}[w,.]) = \text{ outdeg }_{\mathbf {W\!A}}(w)$ and therefore $T({\mathbf {H}}_w) = \text{ outdeg }_{\mathbf {W\!A}}(w)^2$. To resolve the problem we apply the fractional approach.
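The quadratic over-representation is easy to see on a toy authorship matrix. The sketch below uses invented data and compares the total weight of the first coauthorship network with the sum of squared numbers of authors per work.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Hypothetical binary authorship matrix: 4 works x 3 authors.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]

Co = matmul(transpose(WA), WA)         # first coauthorship network
outdeg = [sum(row) for row in WA]      # number of authors of each work
per_work = [d * d for d in outdeg]     # T(H_w) = outdeg(w)^2
total = sum(sum(row) for row in Co)

print(per_work)  # the three-author work contributes 9, a single-author work 1
print(total)
```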

Fractional approach

To make the contributions of all works equal we can apply the fractional approach by normalizing the weights: setting $x' = x / S_x$ and $y' = y / S_y$ we get $S_{x'} = S_{y'} =1$ and therefore $T({\mathbf {H}}'_w) = 1$ for all works w.

In the case of two-mode networks ${\mathbf {W\!A}}$ and ${\mathbf {W\!K}}$ we denote

$$\begin{aligned} S^{{\mathbf {W\!A}}}_w = {\left\{ \begin{array}{ll} \sum _a {\mathbf {W\!A}}[w,a] & \quad \text{ outdeg }_{\mathbf {W\!A}}(w) > 0\\ 1 & \quad \text{ outdeg }_{\mathbf {W\!A}}(w) = 0 \end{array}\right. } \end{aligned}$$

(and similarly $S^{\mathbf {W\!K}}_w$) and define the normalized matrices

$$\begin{aligned} {\mathbf {W\!A}}{\mathbf {n}} = \text{ diag }\left( \frac{1}{S^{\mathbf {W\!A}}_w}\right) \cdot {\mathbf {W\!A}}, \quad {\mathbf {W\!K}}{\mathbf {n}} = \text{ diag }\left( \frac{1}{S^{\mathbf {W\!K}}_w}\right) \cdot {\mathbf {W\!K}}\end{aligned}$$

In real-life networks ${\mathbf {W\!A}}$ (or ${\mathbf {W\!K}}$) it can happen that some work has no author. In such a case $S^{\mathbf {W\!A}}_w = \sum _a {\mathbf {W\!A}}[w,a] = 0$, which causes problems in the definition of the normalized network ${\mathbf {W\!A}}{\mathbf {n}}$. We can bypass the problem by setting $S^{\mathbf {W\!A}}_w = 1$, as we did in the above definition.

Then the normalized product matrix is

$$\begin{aligned} {\mathbf {A\!K}}{\mathbf {t}} = {\mathbf {W\!A}}{\mathbf {n}}^T \cdot {\mathbf {W\!K}}{\mathbf {n}} \end{aligned}$$

Denoting $\displaystyle {\mathbf {F}}_w = \frac{1}{S^{\mathbf {W\!A}}_w S^{\mathbf {W\!K}}_w} {\mathbf {H}}_w$ the outer product decomposition takes the form

$$\begin{aligned} {\mathbf {A\!K}}{\mathbf {t}} = \sum _w {\mathbf {F}}_w \end{aligned}$$

Since

$$\begin{aligned} T({\mathbf {F}}_w) = {\left\{ \begin{array}{ll} 1 & \quad (\text{ outdeg }_{\mathbf {W\!A}}(w)> 0) \wedge (\text{ outdeg }_{\mathbf {W\!K}}(w) > 0) \\ 0 & \quad \text {otherwise} \end{array}\right. } \end{aligned}$$

we have further

$$\begin{aligned} \sum _{a,k} {\mathbf {A\!K}}{\mathbf {t}}[a,k] = \sum _{a,k} \sum _w {\mathbf {F}}_w[a,k] = \sum _w T({\mathbf {F}}_w) = |W^+| \end{aligned}$$

where $W^+ = \{ w \in W : (\text{ outdeg }_{\mathbf {W\!A}}(w)> 0) \wedge (\text{ outdeg }_{\mathbf {W\!K}}(w) > 0) \}$.

In the network ${\mathbf {A\!K}}{\mathbf {t}}$ the contribution of each work to the bibliography is 1. These contributions are redistributed to arcs from authors to keywords.
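The row normalization and the resulting total can be sketched with exact rational arithmetic. The data are invented; work w2 is given no keywords on purpose, so that it falls outside the set of works with both authors and keywords. The helper `normalize_rows` is a hypothetical name for the normalization defined above.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def normalize_rows(X):
    """Divide each row by its sum; a zero row sum is replaced by 1."""
    return [[Fraction(v, sum(row) or 1) for v in row] for row in X]

# Hypothetical data: 4 works, 3 authors, 2 keywords.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]
WK = [[1, 0],
      [0, 0],   # w2 has no keywords
      [0, 1],
      [1, 1]]

WAn, WKn = normalize_rows(WA), normalize_rows(WK)
AKt = matmul(transpose(WAn), WKn)

total = sum(sum(row) for row in AKt)
w_plus = sum(1 for ra, rk in zip(WA, WK) if sum(ra) > 0 and sum(rk) > 0)

print(AKt)
print(total, w_plus)
```

Using `Fraction` rather than floating point keeps the totals exact, so the equality with the number of contributing works can be tested directly.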

Example B

For matrices from Example A we get the corresponding diagonal normalization matrices

compute the normalized matrices

outer products such as

and finally the product matrix

Linking through a network

Let a network ${\mathbf {S}}$ link works to works. The derived network ${\mathbf {WA}}^T \cdot {\mathbf {S}} \cdot {\mathbf {WA}}$ links authors to authors through ${\mathbf {S}}$. Again, the normalization question has to be addressed. Among different options let us consider the derived network defined as:

$$\begin{aligned} {\mathbf {C}} = \mathbf {WAn}^T \cdot {\mathbf {S}} \cdot {\mathbf {WAn}} \end{aligned}$$

It is easy to verify that:

if ${\mathbf {S}}$ is symmetric, ${\mathbf {S}}^T = {\mathbf {S}}$, then also ${\mathbf {C}}$ is symmetric, ${\mathbf {C}}^T = {\mathbf {C}}$;
$$\begin{aligned} {\mathbf {C}}^T = ( \mathbf {WAn}^T \cdot {\mathbf {S}} \cdot \mathbf {WAn})^T = \mathbf {WAn}^T \cdot {\mathbf {S}}^T \cdot (\mathbf {WAn}^T)^T = {\mathbf {C}} \end{aligned}$$
if $W^+ = \{ w \in W : \text{ outdeg }_\mathbf {W\!A}(w) > 0 \} = W$, the total of weights of ${\mathbf {S}}$ is redistributed in ${\mathbf {C}}$:
$$\begin{aligned} T({\mathbf {C}}) = \sum _{e \in L({\mathbf {C}})} c(e) = \sum _{e \in L({\mathbf {S}})} s(e) = T({\mathbf {S}}) \end{aligned}$$
Since $\displaystyle \sum _{a \in A} wa[p,a] = \text{ outdeg }_\mathbf {W\!A}(p)$ and $\displaystyle wan[p,a] = {\left\{ \begin{array}{ll} \frac{wa[p,a]}{\text {outdeg}_\mathbf {W\!A}(p)} & \ \text{ outdeg }_\mathbf {W\!A}(p) > 0 \\ 0 & \ \text {otherwise} \end{array}\right. }$ we get
$$\begin{aligned} T({\mathbf {C}})&= \sum _{e \in L({\mathbf {C}})} c(e) = \sum _{a \in A}\sum _{b \in A} c[a,b] \\&= \sum _{a \in A}\sum _{b \in A} \sum _{p \in W}\sum _{q \in W} wan[p,a] \cdot s[p,q] \cdot wan[q,b] \\&= \sum _{p \in W^+}\sum _{q \in W^+} \frac{s[p,q]}{\text{ outdeg }_\mathbf {W\!A}(p)\text{ outdeg }_\mathbf {W\!A}(q)} \sum _{a \in A} wa[p,a] \sum _{b \in A} wa[q,b]\\&= \sum _{p \in W^+}\sum _{q \in W^+} s[p,q] \end{aligned}$$
and finally, if $W^+ = W$
$$\begin{aligned} \sum _{p \in W^+}\sum _{q \in W^+} s[p,q] = \sum _{e \in L({\mathbf {S}})} s(e) = T({\mathbf {S}}) \end{aligned}$$

As special cases we get for normalized author’s citation networks with $W^+ = W$: for ${\mathbf {S}} = \mathbf {Ci}$

$$\begin{aligned} \sum _{a \in A}\sum _{b \in A} c[a,b] = \sum _{p \in W}\sum _{q \in W} ci[p,q] = | \mathbf {Ci}| \end{aligned}$$

and for ${\mathbf {S}} = \mathbf {Cin}$

$$\begin{aligned} \sum _{a \in A}\sum _{b \in A} c[a,b] = \sum _{p \in W: \ \text{ outdeg }_\mathbf {Ci}(p)> 0}\sum _{q \in W} \frac{ci[p,q]}{\text{ outdeg }_\mathbf {Ci}(p)} = \sum _{p \in W: \ \text{ outdeg }_\mathbf {Ci}(p) > 0} 1 = | W_\mathbf {Ci}^+ | \end{aligned}$$
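Both special cases can be checked on a toy example in which every work has at least one author. The sketch below is an illustration under that assumption; the data and the helper names `link_through` and `total` are hypothetical.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def normalize_rows(X):
    """Divide each row by its sum; a zero row sum is replaced by 1."""
    return [[Fraction(v, sum(row) or 1) for v in row] for row in X]

def total(X):
    return sum(sum(row) for row in X)

def link_through(S, WAn):
    """Derived network WAn^T . S . WAn linking authors through S."""
    return matmul(matmul(transpose(WAn), S), WAn)

# Hypothetical data: 4 works, 3 authors; every work has an author.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]
Ci = [[0, 1, 0, 0],   # w1 cites w2
      [0, 0, 0, 0],   # w2 cites nothing
      [1, 1, 0, 0],   # w3 cites w1, w2
      [1, 1, 1, 0]]   # w4 cites w1, w2, w3

WAn = normalize_rows(WA)
C_ci = link_through(Ci, WAn)                   # S = Ci
C_cin = link_through(normalize_rows(Ci), WAn)  # S = Cin

print(total(C_ci), total(Ci))  # total weight is preserved
print(total(C_cin))            # number of works that cite something
```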

Some notes

A. Instead of computing the normalized network $\mathbf {W\!A}{\mathbf {n}}$ from the network $\mathbf {W\!A}$ we could collect the data about the real proportion wan[w, a] of the contribution of each author a to a work w such that $\mathbf {W\!A}{\mathbf {n}}$ is normalized: for every work w it holds

$$\begin{aligned} \sum _{a \in A} wan[w,a] \in \{0,1\} \end{aligned}$$

Unfortunately in most cases such data are not available and we use the computed normalized weights as their estimates. Most of the results do not depend on the way the normalized network was obtained.

B. In general a given network matrix $\mathbf {W\!A}$ can be normalized in two ways: by rows, as used in this section, and by columns

$$\begin{aligned} \mathbf {W\!A}\mathbf {n'} = \mathbf {W\!A}\cdot \text{ diag }\left( \frac{1}{S^{\mathbf {W\!A}}_a}\right) \quad \text {where} \quad S^{\mathbf {W\!A}}_a = {\left\{ \begin{array}{ll} \sum _w \mathbf {W\!A}[w,a] & \ \text{ indeg }_\mathbf {W\!A}(a) > 0\\ 1 & \ \text{ indeg }_\mathbf {W\!A}(a) = 0 \end{array}\right. } \end{aligned}$$

In the context of bibliographic networks the column normalization does not make much sense.

C. The network $\mathbf {Co}$ is symmetric: $co_{ab} = co_{ba}$. We need to compute only half of the values $co_{ab}$, $a\le b$. The resulting network is undirected with weights $co_{ab}$.

D. In the paper Batagelj and Cerinšek (2013) the “second” coauthorship network $\mathbf {Cn} = \mathbf {W\!A}^T\cdot \mathbf {W\!A}{\mathbf {n}}$ is considered. The weight $cn_{ab}$ is equal to the contribution of an author a to works that (s)he wrote together with the author b. Using these weights the self-sufficiency of an author a is defined as:

$$\begin{aligned} \displaystyle S_a = \frac{cn_{aa}}{\text{ indeg }_{\mathbf {W\!A}}(a)} \end{aligned}$$

and the collaborativeness of an author a as its complementary measure $K_a = 1 - S_a$.
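A small sketch of these two indices follows, using invented data and assuming that every author has at least one work (otherwise the division by the indegree is undefined).

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def normalize_rows(X):
    """Divide each row by its sum; a zero row sum is replaced by 1."""
    return [[Fraction(v, sum(row) or 1) for v in row] for row in X]

# Hypothetical binary authorship matrix: 4 works x 3 authors.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]

Cn = matmul(transpose(WA), normalize_rows(WA))   # second coauthorship network
indeg = [sum(col) for col in zip(*WA)]           # number of works per author

S = [Cn[a][a] / indeg[a] for a in range(len(indeg))]   # self-sufficiency
K = [1 - s for s in S]                                 # collaborativeness

print(S)
print(K)
```

In this toy data the first author has one single-authored work out of three, which raises the self-sufficiency relative to the second author, who never publishes alone.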

E. In the “third” coauthorship network $\mathbf {Ct} = \mathbf {W\!A}{\mathbf {n}}^T\cdot \mathbf {W\!A}{\mathbf {n}}$ the weight $ct_{ab}$ is equal to the total fractional contribution of ‘collaboration’ of authors a and b to works. Each work w with $S^\mathbf {W\!A}_w > 0$ contributes 1 to the total of weights in $\mathbf {Ct}$. This is the network to be used in analysis of collaboration between authors (Batagelj and Cerinšek 2013; Leydesdorff and Park 2017; Prathap and Mukherjee 2016). To identify the most collaborative groups we can use methods such as $P_S$-cores and link islands (Batagelj et al. 2014).

The product $\mathbf {Ct}$ is symmetric, so note C applies. We transform it to the corresponding undirected network: pairs of opposite arcs are replaced by an edge with doubled weight. In analyses we usually treat separately the vector of weights on loops (self-contribution) and the network $\mathbf {Ct}$ without loops.
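The product of the row-normalized authorship matrix with itself, and its conversion to an undirected network with a separate loop vector, can be sketched as follows. The data, the variable names and the dictionary representation of edges are assumptions made for the example.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def normalize_rows(X):
    """Divide each row by its sum; a zero row sum is replaced by 1."""
    return [[Fraction(v, sum(row) or 1) for v in row] for row in X]

# Hypothetical binary authorship matrix: 4 works x 3 authors.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]

WAn = normalize_rows(WA)
Ct = matmul(transpose(WAn), WAn)

n = len(Ct)
loops = [Ct[a][a] for a in range(n)]            # self-contribution
edges = {(a, b): 2 * Ct[a][b]                   # opposite arcs merged
         for a in range(n) for b in range(a + 1, n) if Ct[a][b] != 0}

total = sum(sum(row) for row in Ct)
print(total)                                    # one unit per work with authors
print(loops)
print(edges)
```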

F. An alternative normalization $\mathbf {W\!A}\mathbf {n'}$ of a binary authorship matrix $\mathbf {W\!A}$ was proposed in Newman (2004)

$$\begin{aligned} wan'_{wa} = \frac{wa_{wa} }{ \max (1,\text{ outdeg }_{\mathbf {W\!A}}(w)-1)} \end{aligned}$$

in which only collaboration with coauthors is considered, without self-collaboration. Note that using the network construction proposed on p. 5 of Newman (2001) we get a network in which works with many coauthors are still over-represented. The same idea is used in the fractional counting co-authorship matrix ${\mathbf {U}}^*$ proposed in equation (5) in Perianes-Rodriguez et al. (2016).

To treat all works equally using Newman’s normalization the “fourth” coauthorship network was proposed in Cerinšek and Batagelj (2015). To compute it we first compute

$$\begin{aligned} \mathbf {Ct'} = \mathbf {W\!A}{\mathbf {n}}^T \cdot \mathbf {W\!A}\mathbf {n'} \end{aligned}$$

The weight $ct'_{ab}$ is equal to the total contribution of “strict collaboration” of authors a and b to works. The obtained product is symmetric. Again note C applies. We transform it to the corresponding undirected network—pairs of opposite arcs are replaced by an edge with doubled weight. The loops are removed. The contribution of each work with at least two coauthors is equal to 1. A kind of the outer product decomposition exists also for the network $\mathbf {Ct'}$ with a diagonal set to 0.
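The construction of this network can be sketched for a binary authorship matrix as follows. The data are invented and `newman_rows` is a hypothetical helper implementing the normalization by max(1, outdeg(w) - 1).

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def normalize_rows(X):
    """Divide each row by its sum; a zero row sum is replaced by 1."""
    return [[Fraction(v, sum(row) or 1) for v in row] for row in X]

def newman_rows(X):
    """Newman's normalization of a binary matrix: row w / max(1, outdeg(w) - 1)."""
    return [[Fraction(v, max(1, sum(row) - 1)) for v in row] for row in X]

# Hypothetical binary authorship matrix: 4 works x 3 authors.
WA = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1],
      [0, 0, 1]]

Ctp = matmul(transpose(normalize_rows(WA)), newman_rows(WA))

n = len(Ctp)
# Remove loops: only strict collaboration between different authors remains.
no_loops = [[0 if a == b else Ctp[a][b] for b in range(n)] for a in range(n)]
off_total = sum(sum(row) for row in no_loops)
multi_author = sum(1 for row in WA if sum(row) >= 2)

print(no_loops)
print(off_total, multi_author)
```

After the loops are removed, the remaining total equals the number of works with at least two coauthors, in line with the statement that each such work contributes 1.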

Bibliographic coupling and co-citation

Bibliographic coupling occurs when two works each cite a third work in their bibliographies, see Fig. 2, left. The idea was introduced by Kessler (1963) and has been used extensively since then. In the figure two citing works, p and q, are shown. Work p cites five works and q cites seven works. The key idea is that there are three works cited by both p and q. This suggests some content commonality for the three works cited by both p and q. Having more works citing pairs of prior works increases the likelihood of them sharing content.

We assume that the citation relation means $p\ \mathbf {Ci}\ q \equiv \text{ work } p \text{ cites } \text{ work } q$. Then the bibliographic coupling network $\mathbf {biCo}$ can be determined as

$$\begin{aligned} \mathbf {biCo} = \mathbf {Ci} \cdot \mathbf {Ci}^T \end{aligned}$$

The weight $bico_{pq}$ is equal to the number of works cited by both works p and q; $bico_{pq}= | \mathbf {Ci}(p) \cap \mathbf {Ci}(q) |$. Bibliographic coupling weights are symmetric: $bico_{pq} = bico_{qp}$:

$$\begin{aligned} \mathbf {biCo}^T = (\mathbf {Ci} \cdot \mathbf {Ci}^T)^T = \mathbf {Ci} \cdot \mathbf {Ci}^T = \mathbf {biCo} \end{aligned}$$

Co-citation is a concept with strong parallels with bibliographic coupling (Small 1973; Marshakova 1973), see Fig. 2, right. The focus is on the extent to which works are co-cited by later works. The basic intuition is that the more often two earlier works are cited together, the higher the likelihood that they have common content. The co-citation network $\mathbf {coCi}$ can be determined as

$$\begin{aligned} \mathbf {coCi} = \mathbf {Ci}^T \cdot \mathbf {Ci} . \end{aligned}$$

The weight $coci_{pq}$ is equal to the number of works citing both works p and q. The network $\mathbf {coCi}$ is symmetric $coci_{pq} = coci_{qp}$:

$$\begin{aligned} \mathbf {coCi}^T = (\mathbf {Ci}^T \cdot \mathbf {Ci})^T = \mathbf {Ci}^T \cdot \mathbf {Ci} = \mathbf {coCi} \end{aligned}$$

An important property of co-citation is that $\mathbf {coCi}(\mathbf {Ci}) = \mathbf {biCo}(\mathbf {Ci}^T)$:

$$\begin{aligned} \mathbf {biCo}(\mathbf {Ci}^T) = \mathbf {Ci}^T \cdot (\mathbf {Ci}^T)^T = \mathbf {Ci}^T \cdot \mathbf {Ci}= \mathbf {coCi}(\mathbf {Ci}) \end{aligned}$$

Therefore the constructions proposed for bibliographic coupling can be applied also for co-citation.
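The two networks and the relation between them can be illustrated on a small invented citation matrix. The helper `bico` is a hypothetical name for the bibliographic coupling construction.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def bico(X):
    """Bibliographic coupling network of a citation matrix X."""
    return matmul(X, transpose(X))

# Hypothetical citation matrix: Ci[p][q] = 1 iff work p cites work q.
Ci = [[0, 1, 0, 0],   # w1 cites w2
      [0, 0, 0, 0],   # w2 cites nothing
      [1, 1, 0, 0],   # w3 cites w1, w2
      [1, 1, 1, 0]]   # w4 cites w1, w2, w3

biCo = bico(Ci)
coCi = matmul(transpose(Ci), Ci)

# Reference sets Ci(p), used to check bico[p][q] = |Ci(p) & Ci(q)|.
refs = [{q for q, v in enumerate(row) if v} for row in Ci]
by_sets = [[len(P & Q) for Q in refs] for P in refs]

print(biCo)
print(coCi)
```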

What about normalizations? Searching for the most coupled works we have again problems with works with many citations, especially with review papers. To neutralize their impact we can introduce normalized measures. The fractional approach works fine for normalized co-citation

$$\begin{aligned} \mathbf {CoCit} = \mathbf {Cin}^T \cdot \mathbf {Cin} \end{aligned}$$

where $\mathbf {Ci}{\mathbf {n}} = {\mathbf {D}} \cdot \mathbf {Ci}$ and ${\mathbf {D}} = \text{ diag }(\frac{1}{\max (1,\text {outdeg}(p))})$. Since ${\mathbf {D}}$ is diagonal, ${\mathbf {D}}^T = {\mathbf {D}}$. In the normalized network every citing work has value 1, which is distributed equally among the works it cites.

The fractional approach cannot be directly applied to bibliographic coupling. For the outer product decomposition to work we would need to normalize $\mathbf {Ci}$ by columns: a cited work would have value 1, distributed equally among the citing works, so the most cited works would give the least. This is against our intuition. To construct a reasonable measure we can proceed as follows. Let us first look at

$$\begin{aligned} \mathbf {biC} = \mathbf {Ci}{\mathbf {n}} \cdot \mathbf {Ci}^T \end{aligned}$$

we have

$$\begin{aligned} \mathbf {biC}&= ({\mathbf {D}} \cdot \mathbf {Ci}) \cdot \mathbf {Ci}^T = {\mathbf {D}} \cdot \mathbf {biCo} \\ \mathbf {biC}^T&= ({\mathbf {D}} \cdot \mathbf {biCo})^T = \mathbf {biCo}^T \cdot {\mathbf {D}}^T = \mathbf {biCo} \cdot {\mathbf {D}} \end{aligned}$$

For $\mathbf {Ci}(p) \ne \emptyset$ and $\mathbf {Ci}(q) \ne \emptyset$ it holds

$$\begin{aligned} \mathbf {biC}_{pq} = \frac{|\mathbf {Ci}(p) \cap \mathbf {Ci}(q)|}{|\mathbf {Ci}(p)|} \quad \text{ and } \quad \mathbf {biC}_{qp} = \frac{|\mathbf {Ci}(p) \cap \mathbf {Ci}(q)|}{|\mathbf {Ci}(q)|} = (\mathbf {biC}^T)_{pq} \end{aligned}$$

and $\mathbf {biC}_{pq} \in [0,1]$. $\mathbf {biC}_{pq}$ is the proportion of its references that the work p shares with the work q. The network $\mathbf {biC}$ is not symmetric. We have different options to construct normalized symmetric measures such as

$$\begin{aligned} \mathbf {biCoa}_{pq}&= \frac{1}{2}( \mathbf {biC}_{pq} + \mathbf {biC}_{qp} ) \quad \text{ Average }\\ \mathbf {biCom}_{pq}&= \min ( \mathbf {biC}_{pq}, \mathbf {biC}_{qp} ) \quad \text{ Minimum }\\ \mathbf {biCoM}_{pq}&= \max ( \mathbf {biC}_{pq}, \mathbf {biC}_{qp} ) \quad \text{ Maximum } \end{aligned}$$

or, perhaps more interesting,

$$\begin{aligned} \mathbf {biCog}_{pq}&= \sqrt{ \mathbf {biC}_{pq}\cdot \mathbf {biC}_{qp}} = \frac{|\mathbf {Ci}(p) \cap \mathbf {Ci}(q)|}{\sqrt{ |\mathbf {Ci}(p)| \cdot |\mathbf {Ci}(q) |} } \quad \begin{array}{l}\text{ Geometric } \text{ mean }\\ \text{ Salton } \text{ cosine }\end{array} \\ \mathbf {biCoh}_{pq}&= 2\cdot ( \mathbf {biC}_{pq}^{-1} + \mathbf {biC}_{qp}^{-1} )^{-1} = \frac{ 2 |\mathbf {Ci}(p) \cap \mathbf {Ci}(q)|}{ | \mathbf {Ci}(p)| + |\mathbf {Ci}(q) |} \quad \text{ Harmonic } \text{ mean } \\ \mathbf {biCoj}_{pq}&= ( \mathbf {biC}_{pq}^{-1} + \mathbf {biC}_{qp}^{-1} - 1)^{-1} = \frac{ |\mathbf {Ci}(p) \cap \mathbf {Ci}(q)|}{ | \mathbf {Ci}(p) \cup \mathbf {Ci}(q) |} \quad \text{ Jaccard } \text{ index } \end{aligned}$$

All these measures are similarities.

It is easy to verify that $biCoX_{pq} \in [0,1]$ and: $biCoX_{pq} = 1$ iff the works p and q are referencing the same works, $\mathbf {Ci}(p) = \mathbf {Ci}(q)$.

From $m \le H \le G \le A \le M$ and $J \le m$, ($\frac{|P \cap Q|}{|P \cup Q|} \le \min (\frac{|P \cap Q|}{|P|} ,\frac{|P \cap Q|}{|Q|} )$) we get

$$\begin{aligned} \mathbf {biCoj}_{pq} \le \mathbf {biCom}_{pq} \le \mathbf {biCoh}_{pq} \le \mathbf {biCog}_{pq} \le \mathbf {biCoa}_{pq} \le \mathbf {biCoM}_{pq} \end{aligned}$$

The equalities hold iff $\mathbf {Ci}(p) = \mathbf {Ci}(q)$.
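The six measures and their ordering can be computed directly from the reference sets of two works. The sketch below assumes both sets are non-empty; the function name `coupling_measures` and the example sets are hypothetical.

```python
from math import sqrt

def coupling_measures(P, Q):
    """Normalized coupling measures for two non-empty reference sets."""
    inter = len(P & Q)
    pq, qp = inter / len(P), inter / len(Q)      # biC_pq and biC_qp
    return {
        "jaccard": inter / len(P | Q),
        "minimum": min(pq, qp),
        "harmonic": 2 * inter / (len(P) + len(Q)),
        "geometric": inter / sqrt(len(P) * len(Q)),
        "average": (pq + qp) / 2,
        "maximum": max(pq, qp),
    }

# Hypothetical reference lists: p cites two works, q cites three.
m = coupling_measures({"w1", "w2"}, {"w1", "w2", "w3"})
values = list(m.values())   # listed from Jaccard up to Maximum

for name, value in m.items():
    print(f"{name:10s} {value:.4f}")
```

The dictionary is filled in the order of the inequality chain, so the printed values should be non-decreasing.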

To get a dissimilarity we can use transformations $dis = 1 - sim$ or $dis = \frac{1}{sim} - 1$ or $dis = - \log sim$. For example

$$\begin{aligned} \mathbf {biCod}_{pq} = 1 - \mathbf {biCoj}_{pq} = \frac{ |\mathbf {Ci}(p) \oplus \mathbf {Ci}(q)|}{ | \mathbf {Ci}(p) \cup \mathbf {Ci}(q) |} \quad \text{ Jaccard } \text{ distance } \end{aligned}$$

where $\oplus$ denotes the symmetric difference of sets.
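The identity between one minus the Jaccard index and the symmetric-difference form can be checked with Python sets, where `^` is the symmetric difference operator. The function names and example sets are illustrative, and the sketch assumes the two sets are not both empty.

```python
from fractions import Fraction

def jaccard_index(P, Q):
    """|P & Q| / |P | Q| for sets that are not both empty."""
    return Fraction(len(P & Q), len(P | Q))

def jaccard_distance(P, Q):
    """|P ^ Q| / |P | Q|, where ^ is the symmetric difference."""
    return Fraction(len(P ^ Q), len(P | Q))

# Hypothetical reference sets of two works.
P = {"w1", "w2"}
Q = {"w1", "w2", "w3"}

print(jaccard_index(P, Q))
print(jaccard_distance(P, Q))
```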

Bibliographic coupling and co-citation networks link works to works. To get links between authors, journals or keywords based on citation similarity we can apply the construction from “Linking through a network” section to the normalized co-citation or bibliographic coupling network.

Conclusions

In the paper we presented an attempt to provide a foundation for the fractional approach to bibliometric networks based on the outer product decomposition of product networks. We also discussed the fractional approach to bibliographic coupling and co-citation networks. The results of applying the proposed methods to real bibliographic data will be presented in separate papers.

All described computations can be done efficiently in the program Pajek (De Nooy et al. 2018) using macros such as norm1 (normalized 1-mode network), norm2 (normalized 2-mode network), norm2p (Newman’s normalization of a 2-mode network), biCo (bibliographic coupling network), and biCon (normalized bibliographic coupling network), available at GitHub (Batagelj 2018).

References

Batagelj, V. (2007). WoS2Pajek. Networks from web of science. Version 1.5 (2017). http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:wos2pajek.
Batagelj, V. (2018). Github: biblio—Bibliographic network analysis. https://github.com/bavla/biblio.
Batagelj, V., & Cerinšek, M. (2013). On bibliographic networks. Scientometrics, 96(3), 845–864.
Batagelj, V., Doreian, P., Ferligoj, A., & Kejžar, N. (2014). Understanding large temporal networks and spatial networks: Exploration, pattern searching, visualization and network evolution. Chichester: Wiley.
Carley, K. M. (2003). Dynamic network analysis. In R. Breiger & K. M. Carley (Eds.), Summary of the NRC workshop on social network modeling and analysis (pp. 133–145). Washington, DC: National Research Council.
Cerinšek, M., & Batagelj, V. (2015). Network analysis of Zentralblatt MATH data. Scientometrics, 102(1), 977–1001.
Cerinšek, M., & Batagelj, V. (2017). Semirings and matrix analysis of networks. In R. Alhajj & J. Rokne (Eds.), Encyclopedia of social network analysis and mining. New York: Springer.
Clarivate Analytics (2018). https://clarivate.com/products/web-of-science/databases/.
De Nooy, W., Mrvar, A., & Batagelj, V. (2018). Exploratory social network analysis with Pajek: Revised and expanded edition for updated software. Structural analysis in the social sciences. Cambridge: Cambridge University Press.
Diesner, J., & Carley, K. M. (2004). Revealing social structure from texts: Meta-matrix text analysis as a novel method for network text analysis, Chapter 4. In V. K. Narayanan & D. J. Armstrong (Eds.), Causal mapping for research in information technology (pp. 81–108). Calgary: Idea Group Inc.
Gauffriau, M. (2017). A categorization of arguments for counting methods for publication and citation indicators. Journal of Informetrics, 11(3), 672–684.
Kessler, M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14(1), 10–25.
Krackhardt, D., & Carley, K. M. (1998). A PCANS model of structure in organization. In Proceedings of the 1998 international symposium on command and control research and technology evidence based research (pp. 113–119), Vienna, VA.
Leydesdorff, L., & Park, H. W. (2017). Full and fractional counting in bibliometric networks. Journal of Informetrics, 11(1), 117–120.
Lindsey, D. (1980). Production and citation measures in the sociology of science: The problem of multiple authorship. Social Studies of Science, 10(2), 145–162.
Marshakova, I. (1973). System of documentation connections based on references (sci). Nauchno-Tekhnicheskaya Informatsiya Seriya, 2(6), 3–8.
Newman, M. E. J. (2001). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1), 016132.
Newman, M. E. J. (2004). Coauthorship networks and patterns of scientific collaboration. In Proceedings of the national academy of sciences of the United States of America (vol. 101, no. Suppl1, pp. 5200–5205).
Perianes-Rodriguez, A., Waltman, L., & Van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. Journal of Informetrics, 10(4), 1178–1195.
Prathap, G., & Mukherjee, S. (2016). A conservation rule for constructing bibliometric network matrices. arXiv:1611.08592
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269.

Acknowledgements

The paper is based on presentations on 1274. Sredin seminar, IMFM, Ljubljana, 29. March 2017; NetGloW 2018, St Petersburg, July 4-6, 2018; and COMPSTAT 2018, Iasi, Romania, August 28-31, 2018. This work is supported in part by the Slovenian Research Agency (research program P1-0294 and research projects J1-9187, J7-8279 and BI-US/17-18-045) (Javna Agencija za Raziskovalno Dejavnost RS), project CRoNoS (COST Action IC1408) (European Cooperation in Science and Technology) and by Russian Academic Excellence Project ‘5-100’.

Author information

Authors and Affiliations

Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000, Ljubljana, Slovenia
Vladimir Batagelj
University of Primorska, Andrej Marušič Institute, 6000, Koper, Slovenia
Vladimir Batagelj
National Research University Higher School of Economics, 11 Pokrovsky Boulevard, Moscow, Russia, 101000
Vladimir Batagelj

Corresponding author

Correspondence to Vladimir Batagelj.

About this article

Batagelj, V. On fractional approach to analysis of linked networks. Scientometrics 123, 621–633 (2020). https://doi.org/10.1007/s11192-020-03383-y

Received: 10 February 2019
Issue Date: May 2020
