Introduction

Author name disambiguation is an entity resolution task that groups name instances in bibliographic data into clusters, each referring to a unique author. It is crucial to research that mines authorship data because ambiguous names can lead to merged and/or split author identities and thus to flawed knowledge about research production and collaboration (Fegley and Torvik 2013; Kim and Diesner 2015, 2016; Strotmann and Zhao 2012). As digital libraries accumulate more publications and more ambiguous author names, such as East Asian names (Bornmann and Mutz 2015; Torvik and Smalheiser 2009), various methods for author name disambiguation have been proposed (Hussain and Asghar 2017; Smalheiser and Torvik 2009).

After a disambiguation method is implemented, its clustering performance is evaluated by measures such as Cluster-F, K-metric, Pairwise-F, Splitting & Lumping Error, and B-cubed. As there is no consensus on a definitive measure for evaluating author name disambiguation (Ferreira et al. 2012), one or two measures are chosen at the researcher's discretion. The selection is sometimes justified by the argument that the chosen measure is frequently used; in many studies, however, a measure is selected without any explanation.

A clustering measure should be selected with the context of each study in mind. The choice can, however, change our impression of a disambiguation method if its performance is rated highly by one measure but low or mediocre by another. Calculating diverse measures in a disambiguation study can be a nontrivial task because clustering measures take distinct evaluation approaches whose similarities and differences are not easy to compare. In addition, a straightforward implementation of a measure such as Pairwise-F can consume excessive runtime depending on data size, because the number of instance pairs to compare can grow quadratically in a worst-case scenario (Menestrina et al. 2010).

To help scholars select the clustering measures that best characterize their disambiguation results, this study shows that five commonly used measures of clustering performance in author name disambiguation can be calculated all in one with a common set of code. This integrative calculation shows intuitively how these measures are similar to and different from one another in evaluating clustering results. The proposed approach also reduces computation runtime, dramatically so for Pairwise-F. In the following sections, the usage patterns of clustering measures in author name disambiguation are reviewed. Then, the integration process is explained step by step with pseudo-code and examples.

Literature review

Table 1 lists selected author name disambiguation studies and the measures they use to evaluate clustering performance. A detailed explanation of each measure is provided in the Results section of this paper.

Table 1 Clustering performance measures in selected author name disambiguation studies

According to the table, Pairwise-F is the most popular measure, appearing in 15 out of 23 studies. This confirms that it is the most frequently used measure in entity resolution in general (Maidasani et al. 2012; Menestrina et al. 2010) as well as in author name disambiguation (Levin et al. 2012). K-metric is calculated in 8 studies, followed by B-cubed (B3; 7 studies) and Cluster-F (5 studies). Three studies use the Splitting and Lumping Error (SE & LE) measure.

In Table 1, 11 out of 23 studies rely on a single measure, while the others rely on two or three. In addition, the combinations of co-used measures vary. Figure 1 shows pairs of co-used measures in Table 1 and their co-usage frequencies. For example, Pairwise-F is paired with K-metric 7 times. Interestingly, some possible pairs have never been calculated together: B3, for example, is paired with Pairwise-F twice but never with K-metric, Cluster-F, or SE & LE.

Fig. 1 Co-usage frequency of clustering measures in Table 1

The use of Pairwise-F is often justified on the grounds that it is frequently used in entity resolution studies. Sometimes, measures are selected to follow the practice of referenced studies, or without any explanation. Although such choices should be understood in each study's unique context, they can change our impression of the performance of a disambiguation method. To illustrate this, the clustering performance of a digital library, DBLP (Ley 2009; Reitz and Hoffmann 2013), was evaluated on a labeled dataset, KISTI (Kang et al. 2011). KISTI consists of a set of ambiguous name instances filtered from DBLP and disambiguated semi-manually by researchers at the Korean Institute for Science and Technology Information. Of the 41,673 name instances in the original KISTI, a total of 41,358 were matched to DBLP (September 2017) records. Figure 2 shows DBLP's clustering performance evaluated on KISTI by the five measures.

Fig. 2 Performance of DBLP's author name disambiguation evaluated by five measures on KISTI

According to the figure, DBLP's disambiguation is highly accurate: the precision, recall, and F scores of three measures (Pairwise-F, B3, and K-metric) are all above 0.95, corroborating Kim (2018). The Cluster-F and SE & LE scores are, however, not as encouraging. In particular, Cluster-F shows that DBLP performs slightly worse in recall than in precision, which contrasts with the other three measures, according to which DBLP performs better in recall than in precision. SE & LE also reports that DBLP disambiguates names better in recall than in precision, but the recall-precision gap (|recall − precision| = 0.1794) is much more pronounced than under Pairwise-F, K-metric, and B3 (|recall − precision| = 0.0346–0.0487).

This illustrates why we need to consider various clustering measures when evaluating a disambiguation method. Depending on the choice of measures, the same clustering results can be judged encouraging or less so. As shown in Table 1, however, the selection of measures does not seem to be guided by any common practice. This does not imply that scholars need to report evaluation results calculated by all available measures, which would be undesirable for efficient communication.

Instead, it should be recognized that using diverse measures can illuminate where a disambiguation method performs well and where it needs improvement. For example, the low Cluster-F coupled with the high B3 in Fig. 2 indicates that the name instances misidentified by DBLP are few (high B3 scores) but appear across several truth clusters (low Cluster-F), because a single misidentified instance in a truth cluster makes DBLP's result for that whole cluster count as a failure. In addition, diverse measures enable scholars to compare the performance of their disambiguation methods with studies evaluated by different measures and thus to find room for improvement or to synthesize the strengths of each study.

Calculating various measures for a disambiguation study can, however, be a nontrivial task. Each measure needs to be implemented and carefully validated for accuracy. In addition, each measure can be implemented with different code snippets, which are not often shared, so scholars who want to implement a clustering measure usually need to write code from scratch. Sometimes, a measure may not be easily implementable for a large dataset. For example, calculating Pairwise-F can consume much computing time and RAM because the number of instance pairs can increase quadratically in the worst case (Menestrina et al. 2010).

To facilitate the efficient use of diverse clustering measures in author name disambiguation, this study proposes an algorithm that calculates the five commonly used measures all in one within an integrative framework. Specifically, although the five measures take different evaluation approaches, they can be calculated by a common set of code, which helps us better understand their similarities and differences. This integrative calculation is the first attempt of its kind and a novel contribution to the measurement of clustering performance in author name disambiguation. Moreover, during the integration process, B3 and K-metric are shown to produce the same precision and recall scores. Within this framework, Pairwise-F is calculated by a heuristic rather than a brute-force comparison of instance pairs, greatly reducing computation time from quadratic (at worst) to linear. This solution is motivated by Menestrina et al. (2010), in which Pairwise-F is calculated in linear time through a Slice algorithm combined with a cost function. This study combines the Slice algorithm with a heuristic to calculate Pairwise-F faster than the 'Slice algorithm + cost function' approach. In the following sections, the details of the integrative calculation are described with examples and pseudo-code.

Methods

Scholars usually evaluate clustering results in two ways: recall and precision. Here, a cluster consists of name instances that are decided to represent the same author either by a disambiguation algorithm (a predicted cluster) or by manual labeling (a truth cluster). Recall measures how many truth clusters are not compromised by merged or split name instances in predicted clusters, while precision measures how many predicted clusters correctly group name instances that belong to the same truth clusters.

Incorporating the aforesaid five measures into the same framework is possible because all of them evaluate disambiguation results by both recall and precision. What makes them different from one another is that each measure is designed to assess precision and recall at one of three levels: cluster, instance, or pair of instances, as summarized in Table 2.

Table 2 Summary of calculation level and recall-precision types per performance measure

Despite such different calculation levels, the measures can be implemented by embedding the instance- and pair-level calculations into the cluster-level calculation through a set of common steps ("skeleton code" hereafter). Algorithm 1 shows the skeleton code.

Algorithm 1 Skeleton code (pseudo-code figure)

A key idea of Algorithm 1 is that truth clusters are not compared cluster by cluster to predicted ones. Instead, each name instance (p) in a predicted cluster (Pi) is recorded in a hash table (pIndex) in which the instance p (key) is mapped to its cluster membership (= i: value) (code lines 2–7). Next, each name instance (t) in a truth cluster (Tj) is looked up in pIndex to find its index (i) among the predicted clusters (P). The count of that index (i) is then recorded in another hash table (tMap) in which an index i (key) is mapped to its frequency (value) (code lines 10–15). In other words, this code snippet counts how many name instances of a truth cluster appear together in each predicted cluster (i.e., share the same i), which corresponds to detecting the intersection of a truth cluster (Tj) and the predicted clusters (P). Note that this procedure adopts part of the Slice algorithm in Menestrina et al. (2010).
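To make this concrete, the skeleton logic can be sketched in Python as follows. This is a minimal sketch, not the paper's own pseudo-code: clusters are assumed to be given as lists of sets of instance ids, and the helper names build_index and intersections are ours.

def build_index(predicted):
    # Map each name instance to the index of its predicted cluster (pIndex)
    # and record the size of each predicted cluster (cSize).
    p_index, c_size = {}, {}
    for i, P_i in enumerate(predicted):
        c_size[i] = len(P_i)
        for p in P_i:
            p_index[p] = i
    return p_index, c_size

def intersections(T_j, p_index):
    # For one truth cluster, count how many of its instances fall into each
    # predicted cluster, i.e. the intersection sizes |T_j ∩ P_i| (tMap).
    t_map = {}
    for t in T_j:
        i = p_index[t]  # assumes every truth instance appears in some predicted cluster
        t_map[i] = t_map.get(i, 0) + 1
    return t_map

# The example from Table 3 (also used in the later tables): two predicted
# clusters and three truth clusters over eight name instances.
predicted = [{1, 2, 3}, {4, 5, 6, 7, 8}]
truth = [{1, 2, 3}, {4, 5}, {6, 7, 8}]
p_index, c_size = build_index(predicted)
print([intersections(T_j, p_index) for T_j in truth])  # [{0: 3}, {1: 2}, {1: 3}]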

Within this cluster-level calculation framework, the pair- and instance-level measures can also be calculated. To demonstrate this, each measure is explained in detail below, starting from the cluster level and moving to the pair and instance levels.

Results

Cluster level: Cluster-F

Cluster-F (cF) is a harmonic mean of cluster recall (cR) and cluster precision (cP) (Menestrina et al. 2010).

$$cR = \frac{{\left| {P \cap T} \right|}}{\left| T \right|}$$
(1)
$$cP = \frac{{\left| {P \cap T} \right|}}{\left| P \right|}$$
(2)
$$cF = \frac{2 \times cR \times cP}{(cR + cP)}$$
(3)

Here, P is the set of predicted clusters, while T is the set of truth clusters. The numerator \(\left| {P \cap T} \right|\) counts the number of predicted clusters that contain all and only the instances of a single truth cluster. Cluster recall (cR) is the ratio of this numerator to the number of all truth clusters (\(\left| T \right|\)). Cluster precision (cP) is the ratio of this numerator to the number of all predicted clusters (\(\left| P \right|\)).

Table 3 shows an example from Maidasani et al. (2012, p. 17) for calculating Cluster-F. In the first column, there are three truth clusters (T1, T2, and T3) to which eight name instances with numeric ids (1, 2, 3, …, 8) are assigned. The second column shows the predicted results: the eight instances in the first column are assigned to two clusters (P1 and P2). After instances are compared across predicted and truth clusters, only one case of \(\left| {P \cap T} \right|\) (P1 = T1) is detected. So the numerator for cR is 1, while the denominator is 3 (the number of truth clusters), resulting in cR = 1/3. The numerator for cP is also 1, but its denominator is 2 (the number of predicted clusters), resulting in cP = 1/2. Their harmonic mean is 0.4.

Table 3 An Illustration of Cluster-F calculation

The calculation of cR and cP can be implemented as follows.

Algorithm 2 Cluster-F calculation (pseudo-code figure)

In Algorithm 2, the code lines added to Algorithm 1 are highlighted in bold. As a result of running the skeleton code, the hash table tMap records every predicted cluster index i associated with the name instances in a truth cluster Tj, together with the frequency of each index. If (1) the frequency of an index i (key) in tMap equals the size of the truth cluster Tj (value = \(\left| {T_{j} } \right|\)) and (2) the size of the predicted cluster Pi is the same (cSize[key] = \(\left| {T_{j} } \right|\)), then all and only the name instances in the truth cluster appear together in the same predicted cluster. This is a case of the intersection (\(\left| {P \cap T} \right|\)) and increments cMatch, the numerator of cR and cP.
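For reference, this cluster-level counting can be sketched in Python by reusing the build_index and intersections helpers sketched after Algorithm 1. The function name cluster_f is ours, and the logic follows Eqs. 1–3 rather than reproducing the pseudo-code verbatim.

def cluster_f(predicted, truth):
    p_index, c_size = build_index(predicted)
    c_match = 0  # |P ∩ T|: predicted clusters that perfectly match a truth cluster
    for T_j in truth:
        t_map = intersections(T_j, p_index)
        for i, freq in t_map.items():
            # all of T_j's instances are in P_i, and P_i holds nothing else
            if freq == len(T_j) and c_size[i] == len(T_j):
                c_match += 1
    cR = c_match / len(truth)
    cP = c_match / len(predicted)
    cF = 2 * cR * cP / (cR + cP) if cR + cP else 0.0
    return cR, cP, cF

print(cluster_f([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))
# (0.333..., 0.5, 0.4), as in Table 3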

Cluster level: K-metric

K-metric consists of Average Author Purity (AAP), Average Cluster Purity (ACP), and their geometric mean (K) (Santana et al. 2017).

$$AAP = \frac{1}{N}\mathop \sum \limits_{j = 1}^{\left| T \right|} \mathop \sum \limits_{i = 1}^{\left| P \right|} \frac{{n_{ij}^{2} }}{{n_{j} }}$$
(4)
$$ACP = \frac{1}{N}\mathop \sum \limits_{i = 1}^{\left| P \right|} \mathop \sum \limits_{j = 1}^{\left| T \right|} \frac{{n_{ij}^{2} }}{{n_{i} }}$$
(5)
$$K = \sqrt {ACP \times AAP}$$
(6)

Here, T and P represent the sets of truth and predicted clusters, respectively. N is the total number of name instances to be disambiguated. Throughout this paper, it is assumed that every name instance in the truth clusters is assigned to one of the predicted clusters. \(n_{ij}\) is the number of Pi name instances that also appear in Tj; ni and nj represent the numbers of name instances in Pi and Tj, respectively. AAP measures the fragmentation of truth clusters: its value is low when many instances of an author (= a truth cluster) are split into separate predicted clusters (≈ recall). In contrast, ACP measures the purity of the predicted clusters: its value decreases when predicted clusters contain name instances that belong to different authors, i.e., other truth clusters (≈ precision).

Table 4 illustrates the K-metric calculation. AAP starts by counting the number of name instances in each truth cluster that appear in each predicted cluster. For example, all instances in T1 appear together in P1; thus \(n_{11}^{2}\) = 3² (= 9) and n1 = 3. This is repeated over the other truth clusters (T2: 2²/2 and T3: 3²/3). The same procedure is applied for ACP, but this time starting from P1 being compared to each truth cluster.

Table 4 An Illustration of K-metric Calculation

Equations 4 and 5 can be re-written using set notation as follows. The order of cluster comparison (truth → predicted or predicted → truth) does not affect the outcome because the final sets of intersections (\(P_{i} \cap T_{j}\)) in AAP and ACP are the same. So the summation can be ordered so that truth clusters are compared to predicted clusters (i.e., truth → predicted) for both AAP and ACP.

$$AAP = \frac{1}{N}\mathop \sum \limits_{j = 1}^{\left| T \right|} \mathop \sum \limits_{i = 1}^{\left| P \right|} \frac{{n_{ij}^{2} }}{{n_{j} }} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|^{2} }}{{\left| {T_{j} } \right|}}$$
(7)
$$ACP = \frac{1}{N}\mathop \sum \limits_{i = 1}^{\left| P \right|} \mathop \sum \limits_{j = 1}^{\left| T \right|} \frac{{n_{ij}^{2} }}{{n_{i} }} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|^{2} }}{{\left| {P_{i} } \right|}}$$
(8)

The revised equations can be implemented by expanding Algorithm 1.

Algorithm 3 K-metric calculation (pseudo-code figure)

Algorithm 3 recycles the skeleton code; the lines added to Algorithm 1 are shown in bold. This re-use is possible because, in Eqs. 7 and 8, K-metric is re-formulated as a single procedure in which truth clusters are compared to predicted clusters for both AAP and ACP. In contrast, Eqs. 4 and 5 compare truth clusters to predicted clusters for AAP and then predicted clusters to truth clusters for ACP.

As all name instances in truth clusters are assigned to one of the predicted clusters, the value of N can be obtained by counting instances in either the truth clusters (instSum, code line 11) or the predicted clusters. In code lines 20–21, \(\left| {P_{i} \cap T_{j} } \right|^{2} /\left| {T_{j} } \right|\) in Eq. 7 and \(\left| {P_{i} \cap T_{j} } \right|^{2} /\left| {P_{i} } \right|\) in Eq. 8 are calculated and summed into aapSum and acpSum, respectively, using the hash values in tMap. In particular, \(\left| {P_{i} } \right|\) is obtained by looking up a predicted cluster index i (key) in cSize, generated in code line 7.
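Under the same assumptions, Eqs. 7 and 8 can be sketched in Python as follows, again reusing the skeleton helpers; k_metric is an illustrative name, not the paper's pseudo-code.

import math

def k_metric(predicted, truth):
    p_index, c_size = build_index(predicted)
    n = aap_sum = acp_sum = 0
    for T_j in truth:
        n += len(T_j)  # instSum: N equals the number of truth instances
        t_map = intersections(T_j, p_index)
        for i, freq in t_map.items():            # freq = |P_i ∩ T_j|
            aap_sum += freq * freq / len(T_j)    # Eq. 7 term
            acp_sum += freq * freq / c_size[i]   # Eq. 8 term
    aap, acp = aap_sum / n, acp_sum / n
    return aap, acp, math.sqrt(aap * acp)

print(k_metric([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))
# AAP = 1.0, ACP = 0.7, K ≈ 0.837 for the Table 3 example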

Cluster level: Splitting and Lumping Error

Several studies have adopted the concept of Lumping (= merging) and Splitting Error (Kim and Diesner 2016; Lerchenmueller and Sorenson 2016; Li et al. 2014; Liu et al. 2014; Torvik and Smalheiser 2009). Splitting Error (SE) and Lumping Error (LE) are defined as follows (Li et al. 2014):

$$SE = \frac{{\mathop \sum \nolimits_{a} \left| {\left\{ {x |x \in T_{a} , x \notin P_{a} } \right\}} \right|}}{{\mathop \sum \nolimits_{a} \left| {T_{a} } \right|}}$$
(9)
$$LE = \frac{{\mathop \sum \nolimits_{a} \left| {\left\{ {x |x \in P_{a} , x \notin T_{a} } \right\}} \right|}}{{\mathop \sum \nolimits_{a} \left| {P_{a} } \right|}}$$
(10)

Here, x denotes an author name instance. Ta represents the truth cluster of a unique author a, while Pa denotes the predicted cluster that contains the largest number of name instances of that author. SE evaluates how many name instances of a unique author (= a truth cluster) fail to appear in the predicted cluster that contains the largest number of name instances associated with that author (≈ recall). LE measures how many name instances in that predicted cluster belong to other distinct authors, i.e., other truth clusters (≈ precision). Note that SE and LE consider only the predicted cluster that contains the largest number of name instances of a truth cluster. In contrast, Cluster-F considers only the perfect match of all name instances between a predicted cluster and a truth cluster. The others (K-metric, Pairwise-F, and B3) consider all intersection sets of instances between a truth cluster and the predicted clusters.

Table 5 illustrates how to calculate SE and LE. The SE calculation starts by comparing the name instances in T1 with P1 and P2. P1 contains the largest number of T1 name instances. As there is no name instance in T1 that does not belong to P1, the value of \(\left| {\left\{ {x |x \in T_{a} , x \notin P_{a} } \right\}} \right|\) in Eq. 9 is zero. Likewise, no splitting error is detected for T2 and T3 because all name instances in T2 and T3 are found in P2, the predicted cluster that contains all name instances of both T2 and T3. Thus, the numerator for SE is 0, while its denominator, the sum of all truth cluster sizes, is 8. For LE, the name instances in T1 are all found in P1, but the name instances of T2 and T3 are lumped together in the same predicted cluster P2. For T2, three name instances from T3 are wrongly assigned to P2 (lumping error = 3), while for T3, two instances from T2 are wrongly assigned to P2 (lumping error = 2). As both T2 and T3 share the same largest predicted cluster, P2, their \(\left| {P_{a} } \right|\) value is 5 (= |P2|).

Table 5 An Illustration of Splitting and Lumping Errors Calculation

A key difference between SE & LE and the other four measures is that SE & LE counts errors (split or lumped name instances), while the others count correctly predicted name instances. For comparison across the five measures, these error-based measures can be converted into recall (eR), precision (eP), and F (eF) measures as follows (Lerchenmueller and Sorenson 2016; Liu et al. 2014; Torvik and Smalheiser 2009):

$$eR = 1 - SE$$
(11)
$$eP = 1 - LE$$
(12)
$$eF = \frac{2 \times eR \times eP}{eR + eP}$$
(13)

This conversion scales eR between 0 (all split) and 1 (no splitting), and eP between 0 (all lumped) and 1 (no lumping). In Table 5, for example, eR = 1 − SE = 1 − 0 = 1 and eP = 1 − LE = 1 − 0.3846 = 0.6154. Their harmonic mean (= 0.7619) is eF.

Equations 9 and 10 can be re-written using set notation as follows.

$$SE = \frac{{\mathop \sum \nolimits_{a} \left| {\left\{ {x |x \in T_{a} , x \notin P_{a} } \right\}} \right|}}{{\mathop \sum \nolimits_{a} \left| {T_{a} } \right|}} = \frac{{\mathop \sum \nolimits_{a} (\left| {T_{a} } \right| - \left| {T_{a} \cap P_{a} } \right|)}}{{\mathop \sum \nolimits_{a} \left| {T_{a} } \right|}}$$
(14)
$$LE = \frac{{\mathop \sum \nolimits_{a} \left| {\left\{ {x |x \in P_{a} , x \notin T_{a} } \right\}} \right|}}{{\mathop \sum \nolimits_{a} \left| {P_{a} } \right|}} = \frac{{\mathop \sum \nolimits_{a} (\left| {P_{a} } \right| - \left| {T_{a} \cap P_{a} } \right|)}}{{\mathop \sum \nolimits_{a} \left| {P_{a} } \right|}}$$
(15)

The calculation of SE and LE can be implemented by adding lines to the skeleton code as follows.

Algorithm 4 Splitting and Lumping Error calculation (pseudo-code figure)

In Algorithm 4, code lines 17 and 19–22 find the predicted cluster index i (key) with the largest frequency (value) in tMap. For an author a (= a truth cluster Ta), this maxValue in tMap serves as \(\left| {T_{a} \cap P_{a} } \right|\) in Eqs. 14 and 15. In addition, the key of the maxValue is used to obtain cSize[maxKey] \(= \left| {P_{a} } \right|\), the size of the predicted cluster that contains the largest number of name instances of the truth cluster Ta.
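The same pattern covers SE and LE (Eqs. 14, 15) and their converted scores (Eqs. 11–13). The following Python sketch, with our own function name, reuses the skeleton helpers.

def splitting_lumping(predicted, truth):
    p_index, c_size = build_index(predicted)
    se_num = se_den = le_num = le_den = 0
    for T_a in truth:
        t_map = intersections(T_a, p_index)
        max_key = max(t_map, key=t_map.get)  # predicted cluster with the largest share of T_a
        max_value = t_map[max_key]           # |T_a ∩ P_a|
        se_num += len(T_a) - max_value       # split instances (Eq. 14 numerator)
        se_den += len(T_a)
        le_num += c_size[max_key] - max_value  # lumped instances (Eq. 15 numerator)
        le_den += c_size[max_key]
    se, le = se_num / se_den, le_num / le_den
    eR, eP = 1 - se, 1 - le                  # Eqs. 11 and 12
    eF = 2 * eR * eP / (eR + eP) if eR + eP else 0.0
    return se, le, eR, eP, eF

print(splitting_lumping([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))
# SE = 0.0, LE ≈ 0.3846, eR = 1.0, eP ≈ 0.6154, eF ≈ 0.7619, as in Table 5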

Pairwise level: Pairwise-F

Pairwise-F measures clustering performance at the pair level via pairwise Precision (pP), pairwise Recall (pR), and pairwise F (pF), defined as follows (Menestrina et al. 2010):

$$pR = \frac{{\left| {pairs\left( P \right) \cap pairs\left( T \right)} \right|}}{{\left| {pairs\left( T \right)} \right|}}$$
(16)
$$pP = \frac{{\left| {pairs\left( P \right) \cap pairs\left( T \right)} \right|}}{{\left| {pairs\left( P \right)} \right|}}$$
(17)
$$pF = \frac{2 \times pR \times pP}{pR + pP}$$
(18)

Here, pairs(P) and pairs(T) denote the name instance pairs generated within the same cluster in the predicted clusters P and the truth clusters T, respectively. The numerator \(\left| {pairs\left( P \right) \cap pairs\left( T \right)} \right|\) is the number of instance pairs that appear in both P and T.

The calculation of pR and pP is illustrated in Table 6. Here, a pair of name instances is represented by two instance ids separated by a vertical bar. In T1, for example, three name instances (1, 2, and 3) are paired into three pairs (1|2, 1|3, and 2|3). The list of name pairs of truth clusters is compared with that of predicted clusters to generate a list of pairs found in both lists. The count of these intersection pairs constitutes the numerator (1|2, 1|3, 2|3, 4|5, 6|7, 6|8, 7|8; 7 pairs), which is divided by the total of pairs in truth clusters (= 7) for pR and by the total of pairs in predicted clusters (= 13) for pP.

Table 6 An Illustration of Pairwise-F Calculation

Calculating pR and pP can be memory- and time-consuming because the number of pairs in a cluster increases quadratically with the number of name instances (Levin et al. 2012; Louppe et al. 2016). For example, the number of pairs for a cluster with 10 instances is 45, while that for a cluster with 1000 instances is 499,500. To overcome this problem, the Pairwise-F measures can be re-written as follows.

$$pR = \frac{{\left| {pairs\left( P \right) \cap pairs\left( T \right)} \right|}}{{\left| {pairs\left( T \right)} \right|}} = \frac{{\mathop \sum \nolimits_{j \in T} \mathop \sum \nolimits_{i \in P} \left| {T_{j} \cap P_{i} } \right| \times \left( {\left| {T_{j} \cap P_{i} } \right| - 1} \right)/2}}{{\mathop \sum \nolimits_{j \in T} \left| {T_{j} } \right| \times \left( {\left| {T_{j} } \right| - 1} \right)/2}}$$
(19)
$$pP = \frac{{\left| {pairs\left( P \right) \cap pairs\left( T \right)} \right|}}{{\left| {pairs\left( P \right)} \right|}} = \frac{{\mathop \sum \nolimits_{j \in T} \mathop \sum \nolimits_{i \in P} \left| {T_{j} \cap P_{i} } \right| \times \left( {\left| {T_{j} \cap P_{i} } \right| - 1} \right)/2}}{{\mathop \sum \nolimits_{i \in P} \left| {P_{i} } \right| \times \left( {\left| {P_{i} } \right| - 1} \right)/2}}$$
(20)

Here, the number of pairs in a cluster is counted not by generating all possible pairs in the cluster but by the heuristic that the number of pairs can be computed from the number of instances via cluster size × (cluster size − 1)/2. Likewise, the number of pairs in an intersection can be obtained from the number of instances in it. Algorithm 5 implements this heuristic.

Algorithm 5 Pairwise-F calculation (pseudo-code figure)

Again, this implementation of pR and pP is based on the same skeleton code as Cluster-F, K-metric, and SE & LE. The code lines added to Algorithm 1 are highlighted in bold.
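A Python sketch of this pair-counting heuristic (Eqs. 19, 20), reusing the skeleton helpers, might look as follows; pairwise_f is our illustrative name.

def pairwise_f(predicted, truth):
    p_index, c_size = build_index(predicted)
    pairs = lambda m: m * (m - 1) // 2  # pairs in a cluster of m instances
    pair_match = truth_pairs = 0
    for T_j in truth:
        truth_pairs += pairs(len(T_j))
        t_map = intersections(T_j, p_index)
        for freq in t_map.values():     # freq = |T_j ∩ P_i|
            pair_match += pairs(freq)
    pred_pairs = sum(pairs(size) for size in c_size.values())
    pR = pair_match / truth_pairs
    pP = pair_match / pred_pairs
    pF = 2 * pR * pP / (pR + pP) if pR + pP else 0.0
    return pR, pP, pF

print(pairwise_f([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))
# pR = 1.0, pP = 7/13 ≈ 0.538, pF = 0.7, as in Table 6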

Instance level: B-cubed

B-cubed measures clustering performance at the instance level. Its three parts, B3 Recall (bR), B3 Precision (bP), and B3 F (bF), are defined as follows (Levin et al. 2012):

$$bR = \frac{1}{N}\mathop \sum \limits_{t \in T} \frac{{\left| {P\left( t \right) \cap T\left( t \right)} \right|}}{{\left| {T\left( t \right)} \right|}}$$
(21)
$$bP = \frac{1}{N}\mathop \sum \limits_{t \in T} \frac{{\left| {P\left( t \right) \cap T\left( t \right)} \right|}}{{\left| {P\left( t \right)} \right|}}$$
(22)
$$bF = \frac{2 \times bR \times bP}{bR + bP}$$
(23)

Here, t is a name instance in truth clusters T. N is the total of instances in truth clusters (T). \(T\left( t \right)\) means a truth cluster that contains a name instance t, while \(P\left( t \right)\) means a predicted cluster that contains the name instance t.

Table 7 illustrates the B3 calculation. Starting with instance 1 in T1 for bR, for example, the predicted cluster containing it is identified: \(P\left( 1 \right) = P_{1}\) and \(T\left( 1 \right) = T_{1}\). Next, the intersection of the truth cluster (T1) and the predicted cluster (P1) is obtained (instances 1, 2, and 3). Then, \(\left| {P_{1} \cap T_{1} } \right|/\left| {T_{1} } \right|\) = 3/3 is computed. This is repeated for instances 2 and 3 in T1, resulting in an array of (3/3, 3/3, 3/3) for T1. After the same procedure is applied to T2 and T3, the sum of \(\left| {P\left( t \right) \cap T\left( t \right)} \right|/\left| {T\left( t \right)} \right|\) over all name instances is divided by the total number of those instances (= 8), producing bR = 1.0.

Table 7 An Illustration of B3 F Calculation

Although B3 is an instance-level metric, its calculation can be formulated at the cluster level. This is possible because, in Eqs. 21 and 22, the calculation results for the name instances in the same intersection are identical. In Table 7, for example, instances 4 and 5 in T2 have the same outcome (= 2/2) because they appear together in the intersection of T2 and P2. So we can re-write (2/2 + 2/2) as (2/2) × 2 = 2²/2. Here, 2/2 is the outcome for a single instance, while the multiplier 2 (yielding the squared numerator 2²) is the number of instances in the intersection (|T2 ∩ P2|). Drawing on this formulation, Eqs. 21 and 22 can be re-written as follows.

$$\begin{aligned} bR & = \frac{1}{N}\mathop \sum \limits_{t \in T} \frac{{\left| {P\left( t \right) \cap T\left( t \right)} \right|}}{{\left| {T\left( t \right)} \right|}} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{{t \in T_{j} }} \frac{{\left| {P\left( t \right) \cap T_{j} } \right|}}{{\left| {T_{j} } \right|}} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{{t \in T_{j} }} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|}}{{\left| {T_{j} } \right|}} \\ & = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|}}{{\left| {T_{j} } \right|}} \times \left| {P_{i} \cap T_{j} } \right| = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|^{2} }}{{\left| {T_{j} } \right|}} = AAP \\ \end{aligned}$$
(24)
$$\begin{aligned} bP & = \frac{1}{N}\mathop \sum \limits_{t \in T} \frac{{\left| {P\left( t \right) \cap T\left( t \right)} \right|}}{{\left| {P\left( t \right)} \right|}} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{{t \in T_{j} }} \frac{{\left| {P\left( t \right) \cap T_{j} } \right|}}{{\left| {P\left( t \right)} \right|}} = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{{t \in T_{j} }} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|}}{{\left| {P_{i} } \right|}} \\ & = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|}}{{\left| {P_{i} } \right|}} \times \left| {P_{i} \cap T_{j} } \right| = \frac{1}{N}\mathop \sum \limits_{j \in T} \mathop \sum \limits_{i \in P} \frac{{\left| {P_{i} \cap T_{j} } \right|^{2} }}{{\left| {P_{i} } \right|}} = ACP \\ \end{aligned}$$
(25)

In Eq. 24, a cluster Tj is first set as the calculation unit (\(\mathop \sum \nolimits_{j \in T} \mathop \sum \nolimits_{{t \in T_{j} }} \left( \right)\)). This follows from the transformation of T(t) into Tj, because all name instances in Tj have the same set of elements (themselves) and thus the same value of \(\left| {T\left( t \right)} \right|\) \(( = \left| {T_{j} } \right|\)). Next, an instance t needs to be checked cluster by cluster to decide where it appears in the predicted clusters Pi(t), as in \(\mathop \sum \nolimits_{j \in T} \mathop \sum \nolimits_{{t \in T_{j} }} \mathop \sum \nolimits_{i \in P} \left| {P_{i} \left( t \right) \cap T_{j} } \right|/\left| {T_{j} } \right|\). Evidently, Pi(t) is the same as Pi. Finally, the calculation can be simplified to \(\mathop \sum \nolimits_{j \in T} \mathop \sum \nolimits_{i \in P} \left| {P_{i} \cap T_{j} } \right|/\left| {T_{j} } \right| \times \left| {P_{i} \cap T_{j} } \right|\), because the results of \(\left| {P_{i} \cap T_{j} } \right|/\left| {T_{j} } \right|\) are identical for all name instances that appear in the same intersection (\(P_{i} \cap T_{j}\)). That is why \(\left| {P_{i} \cap T_{j} } \right|/\left| {T_{j} } \right|\) is multiplied by the number of instances belonging to the intersection (\(\left| {P_{i} \cap T_{j} } \right|\)), omitting the instance-referencing part of the nested summation (\(\mathop \sum \nolimits_{{t \in T_{j} }} \left( \right)\)). The final re-writing is the same as the calculation of AAP in Eq. 7. Likewise, bP can be re-written to match ACP (Eq. 25). This transformation is illustrated by the example in Table 8, where the B3 and K-metric calculations are juxtaposed to show their similarity.

Table 8 An Illustration of B3 F Calculation in comparison with K-metric Calculation

Thus, Eqs. 24 and 25 indicate that bR and bP can be calculated by Algorithm 3, which computes AAP and ACP. The only difference is that B3 F is the harmonic mean of AAP (= bR) and ACP (= bP), while K is the geometric mean of AAP and ACP.
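Because bR and bP coincide with AAP and ACP, a B-cubed sketch only needs to swap the geometric mean for a harmonic one; the snippet below reuses the k_metric sketch given earlier.

def b_cubed(predicted, truth):
    aap, acp, _ = k_metric(predicted, truth)  # bR = AAP, bP = ACP (Eqs. 24, 25)
    bR, bP = aap, acp
    bF = 2 * bR * bP / (bR + bP) if bR + bP else 0.0
    return bR, bP, bF

print(b_cubed([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))
# bR = 1.0, bP = 0.7, bF ≈ 0.824 for the Table 3 example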

All-in-one calculation and runtime test

In Algorithms 2–5, the five clustering measures are calculated using the same skeleton code from Algorithm 1. This commonality enables us to integrate them into a single set of code, as in Algorithm 6 below. Note that B3 precision and recall are not calculated separately because they produce the same results as ACP and AAP in K-metric.

Algorithm 6 All-in-one calculation (pseudo-code figure)
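While Algorithm 6 itself is given as a pseudo-code figure, a self-contained Python sketch of such an all-in-one pass (with our own variable names, not the paper's) might look as follows: one loop over the truth clusters feeds all five measures.

import math

def all_in_one(predicted, truth):
    # Skeleton: index predicted clusters by instance and record cluster sizes.
    p_index, c_size = {}, {}
    for i, P_i in enumerate(predicted):
        c_size[i] = len(P_i)
        for p in P_i:
            p_index[p] = i

    pairs = lambda m: m * (m - 1) // 2
    n = c_match = 0
    aap_sum = acp_sum = 0.0
    se_num = se_den = le_num = le_den = 0
    pair_match = truth_pairs = 0

    for T_j in truth:
        # Skeleton: intersection sizes |T_j ∩ P_i| for this truth cluster.
        t_map = {}
        for t in T_j:
            i = p_index[t]
            t_map[i] = t_map.get(i, 0) + 1

        n += len(T_j)
        truth_pairs += pairs(len(T_j))
        max_key = max(t_map, key=t_map.get)           # SE & LE: largest intersection
        se_num += len(T_j) - t_map[max_key]
        se_den += len(T_j)
        le_num += c_size[max_key] - t_map[max_key]
        le_den += c_size[max_key]
        for i, freq in t_map.items():
            if freq == len(T_j) and c_size[i] == len(T_j):
                c_match += 1                          # Cluster-F: perfect match
            aap_sum += freq * freq / len(T_j)         # AAP (= bR) term
            acp_sum += freq * freq / c_size[i]        # ACP (= bP) term
            pair_match += pairs(freq)                 # Pairwise-F numerator

    pred_pairs = sum(pairs(s) for s in c_size.values())
    f = lambda r, p: 2 * r * p / (r + p) if r + p else 0.0
    cR, cP = c_match / len(truth), c_match / len(predicted)
    aap, acp = aap_sum / n, acp_sum / n
    eR, eP = 1 - se_num / se_den, 1 - le_num / le_den
    pR, pP = pair_match / truth_pairs, pair_match / pred_pairs
    return {
        "Cluster-F": (cR, cP, f(cR, cP)),
        "K-metric": (aap, acp, math.sqrt(aap * acp)),
        "B3": (aap, acp, f(aap, acp)),
        "SE&LE": (eR, eP, f(eR, eP)),
        "Pairwise-F": (pR, pP, f(pR, pP)),
    }

# Prints a dict of (recall, precision, F) per measure for the Table 3 example.
print(all_in_one([{1, 2, 3}, {4, 5, 6, 7, 8}], [{1, 2, 3}, {4, 5}, {6, 7, 8}]))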

Besides integrating multiple measures into a single framework, Algorithm 6 reduces computation runtime. To illustrate this, the 41,358 name instances in KISTI were used again to evaluate the clustering performance of DBLP's disambiguation by the five measures, as in Fig. 2. For this test, the steps implied by the original equations of each measure were implemented straightforwardly. For example, for Pairwise-F, instance pairs per cluster were generated (797,297 truth pairs and 826,187 predicted pairs) and compared one by one to find their intersection. The execution time of each measure was recorded in seconds and compared to that of the same measure implemented by the corresponding Algorithm 2–5. Table 9 reports the runtime results.

Table 9 Runtime (in seconds) of measure Calculation by straightforward implementation versus proposed algorithms

Table 9 shows that Algorithms 2–5 calculated each measure in less than 0.057 s, while the straightforward implementations took from approximately 47 s (Cluster-F) up to 23,433 s (6.5 h, Pairwise-F). All measures could be calculated in less than 0.065 s by the all-in-one algorithm.

To test the scalability of Algorithm 6, a set of 1.2 M name instances associated with unique identifiers in a high-energy physics publication library, INSPIRE, was obtained (Louppe et al. 2016). Using the INSPIRE unique identifiers as the ground truth of author identity, the performance of all-initials-based name disambiguation was evaluated by the five measures. This task is challenging, especially for the calculation of Pairwise-F, because the number of instance pairs in the truth clusters (= 15,388 authors) approximates 213.4 M, while that in the predicted clusters (= 18,672) approaches 194.5 M (intersection pairs ≈ 179.9 M). Algorithm 6 produced evaluation results for all measures in 1.583 s. Tested for the Pairwise-F calculation alone, Algorithm 5 ran in 1.552 s, which compares favorably with the 12.903 s of the Generalized Merge Distance (GMD) algorithm (Menestrina et al. 2010), the most runtime-efficient method for calculating Pairwise-F so far.

Conclusion and discussion

This paper demonstrated that five measures of clustering performance in author name disambiguation can be integrated into one calculation framework. This was possible mainly because name instances in truth and predicted clusters were compared not by a brute-force cluster-by-cluster comparison but by means of two hash tables: one recording instances with their predicted cluster indices, and one recording the frequencies of those indices within the truth-predicted cluster intersections. Using set notation, each measure's equations were reformulated to fit the integrative framework.

A few contributions of this paper are worth noting. First, as there is no standard collection of code for the five clustering measures above, this paper can provide an anchoring place for scholars to implement them and validate their correctness using efficient code and samples. Second, the proposed integration of measures dramatically reduces runtime compared to straightforward implementations because it uses hash tables instead of brute-force cluster-by-cluster and instance-by-instance comparisons, whose runtime can grow up to O(n²). In particular, the Pairwise-F calculation was re-formulated using a heuristic that counts pairs in a cluster, enabling fast calculation. The scalability of the integrative calculation can help scholars evaluate the clustering performance of a disambiguation method at a large scale, for example, on the several million name instances associated with Researcher IDs in Web of Science (Backes 2018). This paper demonstrated this potential by evaluating the clustering results of 1.2 M name instances in INSPIRE.

Another contribution is the demonstration that the K-metric and B3 measures produce the same recall and precision scores. This means that studies using either K-metric or B3 have evaluated their clustering results in almost the same way and are thus comparable to one another. This can also be good news for scholars who use K-metric, because B3 has been argued to evaluate clustering results better than other measures on challenging cases (Amigó et al. 2009). In addition, the combined usage frequency of these two measures in Table 1 equals that of Pairwise-F (= 15), which makes them a family of major clustering measures in author name disambiguation.

Most importantly, the integrative calculation shows that the five measures can be understood within a single framework in terms of their similarities and differences. This can help us modify current measures or propose new ones that assess disambiguation performance from distinctive perspectives. In addition, the integrative framework can incorporate other clustering measures, such as Closest-Cluster-F (Menestrina et al. 2010) and Variation of Information (Meilă 2003), which have rarely been used in author name disambiguation. Such integration will not only guide us in selecting the measures that best characterize disambiguation performance but also support future efforts to compare different evaluation approaches under diverse ambiguity conditions for entity resolution in general, beyond author name disambiguation.