1 Introduction

With the rapid development of information science and technology, the volume of information and data is exploding. Extracting information effectively and discovering knowledge from such massive data has become an urgent problem. A knowledge map describes concepts in the physical world and their relationships in symbolic form and, by analyzing and processing structured knowledge, displays them graphically. As scientific research advances, a large number of research results and academic papers have emerged. Faced with a large number of academic authors and complicated relations among them, constructing an author influence map becomes particularly important. Based on the author map, we can effectively explore potential relationships between authors and build a personalized recommendation system for authors. The map also helps to analyze the research ability and academic ranking of individual authors.

Work on building author maps focuses mainly on evaluating authors’ influence and the relationships between authors. The h-index has been a popular evaluation measure in recent years and has been applied to the impact assessment of authors and academic journals [1, 2]. It has become a popular scientometric measure because of its simplicity and intuitiveness, and a large h-index usually means that the author has great influence. McCarty suggests that the highest h-index can be achieved by working with many co-authors, at least some of whom have high h-indexes themselves [3]. Kudelka proposed a new author evaluation measure named a h-index, based on the h-index, to evaluate the citation quantity received by individual scientists [4]. In addition, authors’ influence can be analyzed based on the number of publications, the number of co-authors and the number of citations. However, author influence is reflected not only in such static counts, but also in how the author’s views are noticed and communicated [5]. Influence spreads along the paths of cooperation and citation between authors, who should have similar academic interests. Network-structure-based algorithms such as PageRank, applied to authors’ cooperation and citation networks, can therefore also be used to evaluate authors’ influence. Three PageRank-based methods for evaluating authors’ influence, namely Pub Rank, Star Rank and WMI Rank, were introduced in [6]. An optimized topic-related PageRank algorithm was used to rank publications and authors [7]. Studies of the relationships between authors are mainly based on co-citation and cooperation relationships. Citation content was applied to measure the similarity between co-cited authors and to construct an author co-citation map [8]. An author cooperation map was constructed from all papers published in SIGMOD from 1975 to 2002 [9]. Xu selected papers from 2005–2014 in the CNKI series of databases and used social network analysis to study author cooperation in digital publishing [10].

An author map can intuitively reflect not only the authors’ influence but also the relationships between them. Although some research results have been achieved, only a single factor, such as the number of publications or citations, has been considered in evaluating authors’ influence. Based on data field theory and the entropy weight method, we put forward a method for constructing an author influence map that comprehensively evaluates the number of publications, the author’s signature order, the level of the publisher, the reference relationships among authors and the number of citations. First, the entropy weight method is used to evaluate the author’s contribution comprehensively by weighting factors such as the number of publications, the author’s signature order and the level of the publisher. Second, data field theory is introduced to measure the author’s influence. Finally, the author influence map (AIM), which contains reference relationships, cooperation relationships and influence, is constructed.

The rest of this paper is organized as follows. Section 2 gives the formal definition of the AIM. Section 3 elaborates on the key techniques for measuring authors’ influence and the method for evaluating the relationships between authors. Section 4 presents the experimental analysis and its conclusions. Section 5 summarizes the paper.

2 Formal Definition of AIM

This paper presents an Author Influence Map (AIM) for academic fields. The AIM reflects each author’s influence together with the cooperation and reference relationships among authors. The nodes of the AIM carry the authors’ influence, and the connections between nodes indicate reference relationships and cooperation relationships.

Definition 1: Author Influence Map (AIM).

\( AIM = {<}A, R{>} \), where

  • \( A = \{ a_{i} \} \) is the set of nodes; each element represents an author in a particular academic field.

    • \( a_{i} \, = \,{<}a\_n_{i} ,\,a\_p_{i} {>} \), where \( a\_n_{i} \) indicates the name of author \( a_{i} \) and \( a\_p_{i} \) indicates the influence of author \( a_{i} \).

  • \( R \) is the set of relations between authors, \( R = QR \cup CR \), where

    • \( QR = \{ {<}a_{j} \to a_{i} {>} \} \) denotes the reference relationships between authors; the element \( {<}a_{j} \to a_{i} {>} \) denotes that author \( a_{j} \) cites a paper of author \( a_{i} \).

    • \( CR = \left\{ {{<}a_{i} ,a_{j} {>}} \right\} \) denotes the cooperation relationships between authors; the element \( {<}a_{i} ,a_{j} {>} \) denotes that author \( a_{i} \) and author \( a_{j} \) have completed at least one paper together.
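Definition 1 can be sketched as a small data structure. The following Python sketch is illustrative only: the class names `Author` and `AuthorInfluenceMap`, the method names, and the choice to store CR edges under a sorted name pair are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Author:
    """Node a_i = <a_n_i, a_p_i>: author name and influence value."""
    name: str
    influence: float = 0.0


@dataclass
class AuthorInfluenceMap:
    """AIM = <A, R>, with R = QR ∪ CR (hypothetical container)."""
    authors: Dict[str, Author] = field(default_factory=dict)
    # QR: directed edge (citing a_j, cited a_i) -> reference weight
    references: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # CR: undirected edge, stored under a sorted name pair -> cooperation weight
    cooperations: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_author(self, name: str, influence: float = 0.0) -> None:
        self.authors[name] = Author(name, influence)

    def add_reference(self, citing: str, cited: str, weight: float) -> None:
        # Direction matters: (citing, cited) is not the same edge as (cited, citing).
        self.references[(citing, cited)] = weight

    def add_cooperation(self, a: str, b: str, weight: float) -> None:
        # Sorting the pair makes <a, b> and <b, a> the same undirected edge.
        first, second = sorted((a, b))
        self.cooperations[(first, second)] = weight
```

Keeping QR and CR in separate dictionaries mirrors the definition \( R = QR \cup CR \): the reference edges are directed, while the cooperation edges are not.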

It can be seen that the key issues in building the AIM are mining and measuring the relations among authors and measuring authors’ influence, which are introduced in Sects. 3.1 and 3.2, respectively.

3 Construction of Author Influence Map Based on Data Field Theory and Entropy Weight Method

3.1 Mining Relations Between Authors

Reference Relations among Authors.

There are reference relationships between different authors, which are directed relationships.

$$ r_{{a_{j} \to a_{i} }} = \frac{{C_{{a_{j} \to a_{i} }} }}{MAX} $$
(1)

Formula (1) gives the reference weight between two authors, where \( a_{j} \to a_{i} \) denotes that author \( a_{j} \) cites author \( a_{i} \)’s papers and \( r_{{a_{j} \to a_{i} }} \) denotes the weight of this reference relationship. If \( r_{{a_{j} \to a_{i} }} \) is larger than \( r_{{a_{j} \to a_{k} }} \), author \( a_{j} \) cites author \( a_{i} \) more heavily than author \( a_{k} \). \( C_{{a_{j} \to a_{i} }} \) denotes the number of times author \( a_{j} \) cites author \( a_{i} \)’s papers, and \( MAX \) is the largest such count over all reference relationships, so every reference weight lies in (0, 1].
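As a minimal sketch of formula (1), the reference weights can be computed from a list of citation events. The function name `reference_weights` and the input layout (one `(citing, cited)` pair per citation) are assumptions for illustration; dropping self-citations follows the experimental setup in Sect. 4.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def reference_weights(
    citations: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Return r_{a_j -> a_i} = C_{a_j -> a_i} / MAX for every citing/cited pair.

    `citations` holds one (citing_author, cited_author) pair per citation event.
    """
    # C_{a_j -> a_i}: number of times a_j cites a_i (self-citations dropped).
    counts = Counter((j, i) for j, i in citations if j != i)
    if not counts:
        return {}
    # MAX: the largest count over all reference relationships.
    max_count = max(counts.values())
    return {pair: c / max_count for pair, c in counts.items()}
```

For example, if b cites a four times and c cites a twice, the weights are 1.0 and 0.5 respectively.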

Cooperation Relations Among Authors.

A scientific research paper often has multiple co-authors, and these authors form cooperation relationships. The cooperation weight between two authors indicates their degree of cooperation, which depends not only on the number of times they cooperate but also on their degree of cooperation within each paper. The degree of cooperation within a paper can be measured through each author’s contribution to that paper. The signature order, whether an author is the first author, and whether an author is the corresponding author can all reflect the degree of an author’s contribution. Therefore, effectively measuring authors’ contributions to a paper is the key to calculating the degree of cooperation between authors.

Contribution Degree of Authors in Different Signature Order.

Many studies have addressed authors’ contributions under different signature orders. This paper adopts the linear model [11, 12], shown in formula (2).

$$ C\left( {s,n} \right) = \frac{{n - s + 1}}{{\sum\nolimits_{{s^{\prime} = 1}}^{n} {\left( {n - s^{\prime} + 1} \right)} }} = \frac{n - s + 1}{{\frac{1}{2}n\left( {n + 1} \right)}} $$
(2)

where n denotes the number of collaborators on the paper and s denotes the author’s signature order. Without considering corresponding authors, an author’s contribution decreases linearly as the signature order increases, and the size of each decrease depends on the number of authors, as described by formula (3).

$$ \Delta C = \frac{2}{{n\left( {n + 1} \right)}} $$
(3)

The larger n is, the smaller the decrease between adjacent signature positions. However, n should not be too large: \( \Delta C \) varies little once n is large, so the model can no longer accurately describe the impact of the number of collaborators on authors’ contribution degrees.
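Formulas (2) and (3) can be checked with a short Python sketch. The function names `linear_contribution` and `contribution_step` are hypothetical; exact fractions are used so that the sum over all signature positions can be compared with 1 without rounding error.

```python
from fractions import Fraction


def linear_contribution(s: int, n: int) -> Fraction:
    """Formula (2): contribution of the author at signature position s of n."""
    if not 1 <= s <= n:
        raise ValueError("signature order s must satisfy 1 <= s <= n")
    return Fraction(2 * (n - s + 1), n * (n + 1))


def contribution_step(n: int) -> Fraction:
    """Formula (3): decrease in contribution between adjacent positions."""
    return Fraction(2, n * (n + 1))
```

For a three-author paper this gives contributions 1/2, 1/3 and 1/6, with a constant step of 1/6 between adjacent positions. For ten authors the step shrinks to 1/55, which illustrates why the model discriminates poorly when n is large.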

The corresponding author is usually the first author of the paper, but not always. After adding the corresponding-author factor, the formula is modified as follows [12].

$$ C\left( {s,n,p,t,m} \right) = \frac{{n + 1 - s^{1 - m} }}{{\frac{1}{2}n\left( {n + 1} \right) + p\left( {t - 1} \right)}} $$
(4)

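where s denotes the author’s signature order and n is the number of authors of the paper. p is 1 if the paper has a corresponding author and 0 otherwise; t denotes the corresponding author’s signature position; and m indicates whether the current author is the corresponding author (m = 1 if so, m = 0 otherwise). When m = 1, the term \( s^{1 - m} \) equals 1, so the corresponding author receives the same numerator as the first author, and the term \( p\left( {t - 1} \right) \) in the denominator keeps the contributions of all authors of a paper summing to 1.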
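A minimal Python sketch of formula (4) follows. The function name `contribution` is hypothetical, and the sketch assumes a single corresponding author, so that the current author is the corresponding author exactly when s equals t; passing `t=None` stands for a paper without a corresponding author (p = 0).

```python
from fractions import Fraction
from typing import Optional


def contribution(s: int, n: int, t: Optional[int] = None) -> Fraction:
    """Formula (4): contribution of the author at position s among n authors.

    t is the corresponding author's signature position, or None if the paper
    has no corresponding author (p = 0).
    """
    if not 1 <= s <= n:
        raise ValueError("signature order s must satisfy 1 <= s <= n")
    p = 0 if t is None else 1
    # Assumption: one corresponding author, identified by s == t.
    m = 1 if (p == 1 and s == t) else 0
    numerator = n + 1 - s ** (1 - m)
    shift = 0 if t is None else t - 1  # the p(t - 1) term
    denominator = Fraction(n * (n + 1), 2) + shift
    return Fraction(numerator) / denominator
```

With three authors and the corresponding author in third position, the first and third authors each receive 3/8 and the second author 2/8, so the contributions still sum to 1.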

Degree of Cooperation Between Authors.

The co-authors of each paper form many pairwise cooperation relationships, but their cooperation degrees differ. The cooperation weight between two authors is the sum of the cooperation degrees of their individual collaborations, and the cooperation degree of a single collaboration is measured by the author contribution model above. We follow the calculation method of cooperation degree in [12].

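$$ W_{i,j}^{k} = \frac{{2\left( {n + 1} \right) - \left( {s_{i}^{{1 - m_{i} }} + s_{j}^{{1 - m_{j} }} } \right)}}{{n\left( {n + 1} \right) + 2p\left( {t - 1} \right)}} $$
(5)

where \( W_{i,j}^{k} \) indicates the degree of cooperation between author i and author j in paper k, \( s_{i} \) and \( s_{j} \) are their signature orders, \( m_{i} \) and \( m_{j} \) indicate whether each of them is the corresponding author, and n, p and t are defined as in formula (4). Formula (5) is thus the mean of the two authors’ contributions given by formula (4). Based on the cooperation degree of a single co-authored paper, the total cooperation degree between authors can be calculated as follows.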

$$ W_{i,j} = \sum\nolimits_{k} {W_{i,j}^{k} } $$
(6)

where \( W_{i,j} \) denotes the total cooperation weights between author i and author j.
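As an illustration of formulas (5) and (6), the following Python sketch accumulates pairwise cooperation weights over a list of papers. The function names `paper_cooperation` and `total_cooperation`, the input layout, and the rule that an author is the corresponding author exactly when the signature position equals t are assumptions made for this example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def paper_cooperation(n: int, s_i: int, s_j: int, t: Optional[int] = None) -> Fraction:
    """Formula (5): cooperation degree of two authors within one paper."""
    p = 0 if t is None else 1
    m_i = 1 if (p == 1 and s_i == t) else 0
    m_j = 1 if (p == 1 and s_j == t) else 0
    numerator = 2 * (n + 1) - (s_i ** (1 - m_i) + s_j ** (1 - m_j))
    shift = 0 if t is None else t - 1
    denominator = n * (n + 1) + 2 * shift
    return Fraction(numerator, denominator)


def total_cooperation(
    papers: Sequence[Tuple[List[str], Optional[int]]]
) -> Dict[Tuple[str, str], Fraction]:
    """Formula (6): sum the per-paper cooperation degrees for each author pair.

    Each paper is (authors in signature order, corresponding position or None).
    """
    totals: Dict[Tuple[str, str], Fraction] = defaultdict(Fraction)
    for authors, t in papers:
        n = len(authors)
        positions = list(enumerate(authors, start=1))
        for (s_i, a_i), (s_j, a_j) in combinations(positions, 2):
            first, second = sorted((a_i, a_j))  # undirected pair
            totals[(first, second)] += paper_cooperation(n, s_i, s_j, t)
    return dict(totals)
```

Usage: for the papers `(["a", "b", "c"], None)` and `(["b", "a"], 2)`, authors a and b obtain 5/12 from the first paper and 1/2 from the second, for a total cooperation weight of 11/12.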

3.2 Assessment of Authors’ Influence Based on Data Field

The author’s influence is related to two aspects: the author’s “quality” weight and the reference weights between authors. The overall process of evaluating the author’s influence is shown in Fig. 1. An author’s “quality” reflects the author’s own characteristics in academic research and is related to the number of publications, the publisher level and the signature order. In a given field, the more papers an author publishes, the higher the author’s position in the signature order of each paper and the higher the publisher level of the papers, the greater the author’s “quality”. The “quality” weight is therefore a comprehensive evaluation of these indicators; its calculation is described in detail later. The reference weights are related to the number of citations between authors. Inspired by physical fields, we introduce the interaction between particles and the description of their fields into an abstract numerical space [13] and use data field theory to measure the author’s influence. The method considers both aspects together.

Fig. 1. Flowchart of the overall process of the author influence map

The Data Field of Authors.

According to data field theory, each entity within a given range produces a potential energy field, and these fields also affect the surrounding entities. The potential at each entity depends not only on its own quality but also on the superposition of the potential fields around it. The effect of a surrounding field depends on distance: the smaller the distance, the greater the effect. Applied to the academic field, each author is an entity that generates a data field, and each author is also located in, and affected by, the data fields produced by other authors. The value of an author’s data field depends not only on the author’s own “quality” but also on the superposition of the data fields of the surrounding authors, that is, the authors who cited his papers. The larger the reference weight, the smaller the distance between two authors and the greater the effect of the citing author. The value of an author’s data field therefore combines his own field with the superposed fields of the others, and the author’s influence can be expressed by this value. We use a Gaussian function to calculate the value of each author’s data field.

$$ P\left( {a_{i} } \right) = W_{{a_{i} }} \sum\nolimits_{j = 1}^{n} {W_{{a_{j} }} \times {\text{e}}^{{ - \left( {\frac{{\mathrm{dis}\left( {a_{j} \to a_{i} } \right)}}{\sigma }} \right)^{2} }} } $$
(7)

$$ \mathrm{dis}\left( {a_{j} \to a_{i} } \right) = 1 - r_{{a_{j} \to a_{i} }} $$
(8)

where \( P\left( {a_{i} } \right) \) denotes the data field value of author \( a_{i} \), σ is the balance parameter, \( \mathrm{dis}\left( {a_{j} \to a_{i} } \right) \) denotes the distance between author \( a_{j} \) and author \( a_{i} \), and \( r_{{a_{j} \to a_{i} }} \) denotes the reference weight from author \( a_{j} \) to author \( a_{i} \) given by formula (1). \( W_{{a_{i} }} \) represents the “quality” weight of author \( a_{i} \); the calculation method of the “quality” weights is described in detail in the next subsection.
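Formulas (7) and (8) can be sketched in Python as follows. The function name `data_field` and the dictionary layout are hypothetical. Following the description above, the sum is restricted to authors who actually cite \( a_{i} \), and the code follows formula (7) as printed, multiplying the author’s own “quality” by the superposed term. The paper does not state a value for σ, so the default of 1.0 is only a placeholder.

```python
import math
from typing import Dict, Tuple


def data_field(
    quality: Dict[str, float],
    ref_weights: Dict[Tuple[str, str], float],
    sigma: float = 1.0,  # placeholder value; not specified in the paper
) -> Dict[str, float]:
    """Formula (7): data field value P(a_i) for every author.

    quality maps author -> W_a; ref_weights maps (citing, cited) -> r.
    Every citing author is assumed to have an entry in `quality`.
    """
    potential = {}
    for a_i, w_i in quality.items():
        superposed = 0.0
        for (a_j, cited), r in ref_weights.items():
            if cited != a_i:
                continue
            distance = 1.0 - r  # formula (8)
            superposed += quality[a_j] * math.exp(-((distance / sigma) ** 2))
        potential[a_i] = w_i * superposed
    return potential
```

Under this reading, an author whom nobody cites has a data field value of 0 regardless of “quality”, which is consistent with the large ranking drops reported in Sect. 4.2 for weakly cited authors.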

Evaluation of Author’s Quality Based on Entropy Weight Method.

The author’s “quality” is mainly related to the number of papers published, the signature order in those papers and the publisher level of the papers. In this paper, we mainly analyze these three factors. To measure their contribution to the comprehensive evaluation of authors’ “quality”, we use the entropy weight method.

Information entropy measures the degree of disorder of the information in a system. The greater the information entropy of an indicator, the greater the uncertainty of its information and the smaller the amount of information it provides, so the smaller its contribution to a multi-indicator comprehensive evaluation. Conversely, the smaller an indicator’s information entropy, the more information it provides and the larger its contribution to the comprehensive evaluation. In this paper, the author’s publication quantity, the publisher level and the signature order are taken as the indicators for evaluating the author’s “quality” comprehensively. Publication quantity and publisher level can be obtained directly. The signature order is first converted into a contribution degree, because the signature order reflects the author’s contribution to a paper: the higher the author appears in the signature order, the greater the contribution degree, which decreases linearly as the signature position increases [13].

Suppose \( AC = \left\{ {a_{1} ,a_{2} ,a_{3} , \ldots ,a_{n} } \right\} \) represents a set of authors. Author \( a_{i} \)’s publication quantity is \( X_{i1} \), publisher level is \( X_{i2} \) and contribution degree is \( X_{i3} \), and \( Y_{i1} \), \( Y_{i2} \), \( Y_{i3} \) are the corresponding values after normalization. The information entropy of the three indicators is then calculated as shown in formula (9).

$$ H_{j} = - \frac{1}{{\ln n}}\sum\nolimits_{i = 1}^{n} {Y_{ij} \ln Y_{ij} } ,\quad j = 1,2,3,\;\;i = 1,2,3, \ldots ,n $$
(9)

where j indexes the three indicators above and n denotes the number of authors in the set AC.

After the information entropy of each indicator is calculated, the entropy weight of each indicator can be obtained from formula (10).

$$ w_{j} = \frac{{1 - H_{j} }}{{k - \sum\nolimits_{l = 1}^{k} {H_{l} } }},\quad j = 1,2,3,\;k = 3 $$
(10)

When the information entropy \( H_{j} \) of indicator j reaches its maximum value of 1, the entropy weight \( w_{j} \) is 0. This means that indicator j provides hardly any useful information for determining the author’s “quality”, so it carries little or no weight in the comprehensive evaluation. After the entropy weights of the three indicators are obtained, the authors’ “quality” values can be calculated by formula (11).

$$ W_{{a_{i} }} = \sum\nolimits_{j = 1}^{k} {w_{j} Y_{ij} } ,\quad i = 1,2, \ldots ,n,\;k = 3 $$
(11)

where \( W_{{a_{i} }} \) denotes the “quality” weight of author \( a_{i} \), the same quantity that appears in formula (7).
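The entropy weight steps in formulas (9) to (11) can be sketched in Python as follows. The paper does not specify the normalization, so this sketch assumes proportion normalization, \( Y_{ij} = X_{ij} / \sum\nolimits_{i} X_{ij} \), which makes each indicator column sum to 1 and keeps \( H_{j} \) within [0, 1]; it also uses the usual convention \( 0 \ln 0 = 0 \). The function name `entropy_weight_quality` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy_weight_quality(
    X: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], List[float]]:
    """Return (H, w, quality) for an n-by-k matrix of non-negative indicators.

    Rows are authors, columns are indicators. Assumes n >= 2, every column
    has a positive sum, and not all columns are perfectly uniform.
    """
    n, k = len(X), len(X[0])
    if n < 2:
        raise ValueError("need at least two authors, since ln(1) = 0")
    # Assumed normalization: each column is rescaled to sum to 1.
    col_sums = [sum(row[j] for row in X) for j in range(k)]
    Y = [[row[j] / col_sums[j] for j in range(k)] for row in X]
    # Formula (9): information entropy of each indicator.
    H = []
    for j in range(k):
        s = sum(Y[i][j] * math.log(Y[i][j]) for i in range(n) if Y[i][j] > 0)
        H.append(-s / math.log(n))
    # Formula (10): entropy weights, which sum to 1 by construction.
    denom = k - sum(H)
    w = [(1 - h) / denom for h in H]
    # Formula (11): weighted "quality" of each author.
    quality = [sum(w[j] * Y[i][j] for j in range(k)) for i in range(n)]
    return H, w, quality
```

An indicator on which all authors have the same value has entropy 1 and receives weight 0, so the “quality” ranking is then driven entirely by the remaining indicators.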

4 Experimental Analysis

This paper takes information retrieval as the target research field and uses the academic papers published at the ACM SIGIR conference and their authors as experimental objects. We selected papers and authors from 2006 to 2015, obtaining a total of 2241 papers and 3207 distinct authors, with 7548 pairs of cooperation relationships and 29774 pairs of reference relationships (excluding self-citations). Authors with identical names are distinguished mainly by the “author + institution” pair.

4.1 Assessment of Authors’ “Quality”

Measuring the author’s “quality” is the key to calculating the author’s influence. Because the source data in the experiment all come from SIGIR, the publisher-level indicator is not considered in this paper. We extracted all authors who appeared in papers of the ACM SIGIR conference from 2006 to 2015 as the author set \( AC = \left\{ {a_{1} ,a_{2} ,a_{3} , \ldots ,a_{n} } \right\} \). The number of papers of each author can be obtained from the ACM database. We also extracted the author signature order of each SIGIR paper from 2006 to 2015. Then, using the author contribution method described in Sect. 3.1, we obtain the total contribution degree of every author.

Different indicators carry different importance in a comprehensive evaluation. This paper determines the evaluation weight of each indicator through its information entropy and entropy weight. The results are shown in Table 1.

Table 1. Information entropy of publication quantity and contribution degree.

It can be seen that the information entropy of the publication quantity is slightly smaller, and its entropy weight slightly larger, than that of the contribution degree. The author’s “quality” value is then calculated with the entropy weight method. We select the core authors using the threshold given by Price’s formula.

$$ M_{p} = 0.749\sqrt {N_{p \;max} } $$
(12)

The total number of authors \( N_{p \;max} \) is 3207, and the number of selected core authors is \( M_{p} = 42 \). Due to space limitations, Fig. 2 shows the “quality” value of the top 20 authors.
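The reported threshold can be reproduced with a one-line calculation. The function name `price_threshold` is hypothetical, and rounding to the nearest integer is an assumption about how the value 42 was obtained.

```python
import math


def price_threshold(n_p_max: float) -> float:
    """Formula (12): M_p = 0.749 * sqrt(N_p_max)."""
    return 0.749 * math.sqrt(n_p_max)


# With N_p_max = 3207 as reported above, the threshold is about 42.4.
core_author_count = round(price_threshold(3207))
```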

Fig. 2. Authors’ “quality” weights

The author’s “quality” reflects only the author’s own characteristics and is not sufficient to reflect the author’s influence. To reflect authors’ influence more accurately, we should also consider the impact of the authors who have a reference relationship with them.

4.2 Evaluation Results of the Author’s Domain Influence

After the reference relationships and weights between authors are extracted and the authors’ “quality” is obtained, we calculate each author’s data field value by formula (7). The data field value reflects the author’s influence.

Then we compare the authors’ influence with their “quality” weights. The top 20 authors’ influence values and top 20 authors’ “quality” weights are shown in Fig. 3.

Fig. 3. Comparison of authors’ “quality” and influence

It can be seen that some authors in the top 20 of the “quality” ranking are not in the top 20 of the influence ranking. Han J, for example, ranked first in “quality”, but his influence ranked 157th: he was cited by only 10 authors, 10 times in total, in the ACM SIGIR conference from 2006 to 2015. Garcia-Molina H dropped from 11th in the “quality” ranking to 312th in the influence ranking: he was cited by only 4 authors, 4 times in total, and co-authored papers with only 5 people during this period. This indicates that they were weakly connected with the surrounding authors during this period, so their academic influence is lower. In contrast, Agichtein E and Dumais S rose from the 50th and 28th positions in the “quality” ranking to the 5th and 7th positions in the influence ranking, respectively. Agichtein E was cited by 323 authors 579 times, and Dumais S was cited by 265 authors 426 times. They co-authored papers with 35 and 31 people, respectively, during this period. They are clearly more closely connected with the surrounding authors, so they have more influence.

From the analysis above, we can draw the following conclusions:

  1. Authors’ “quality” and authors’ influence reflect different dimensions. “Quality” reflects the author’s own characteristics, such as the number of the author’s papers and the signature order, whereas influence reflects not only the author’s “quality” but also the degree to which other authors cite him.

  2. A higher “quality” means that the author has published more articles and appears higher in the signature order. However, such an author is not necessarily cited more by other authors, so his influence is not necessarily greater.

  3. Influence based on the data field emphasizes the author’s influence on other authors and better reflects the actual influence.

5 Conclusions

This paper studies the construction of the author influence map and the mining of reference and cooperation relationships between authors. To measure author influence, we use the author data field method, which considers not only the author’s own influence-related characteristics but also the superposed impact of the authors who cited him. Effective evaluation of authors’ “quality” is the premise for measuring authors’ influence; we use the entropy weight method to measure “quality”, which effectively evaluates the different importance of each indicator in the comprehensive evaluation. For the cooperation relationships and weights between authors, we use a method that calculates authors’ contribution degrees under different signature orders to measure cooperation weights. Through experimental analysis, we explore the relationship between authors’ influence and authors’ “quality” and find that the data field method reflects authors’ influence more accurately. This provides a reliable way to construct the author influence map.

This paper addresses author influence and relationship maps. In future work, we will expand the research areas and paper databases covered. In addition, we will apply the author influence map to personalized information service recommendation and build a personalized service recommendation system for academic authors.