1 Introduction

Three-way decision was proposed by Professor Yao Yiyu [81, 82], and it is an effective mathematical tool to make decisions based on trisection idea [15, 18, 43, 45, 54, 92]. In recent years, the theory of three-way decision has been developed by incorporating granular computing, cognitive computing, rough set, formal concept analysis, fuzzy set, multi-attribute decision making, and so on. In the development of three-way decision, there have been appearing many hot and promising research issues such as decision-theoretic rough set [23, 38,39,40, 46, 75, 113], probabilistic rough set [48, 72, 83, 109], three-way granular computing [44, 89, 91], attribute reduction based on three-way decision [56, 105,106,107], fuzzy set oriented three-way decision [12, 22, 24, 25, 104], multi-granulation three-way decision [6, 36, 57, 59], cost-sensitive three-way decision [47, 102, 112], sequential three-way decision [19, 31, 83, 100, 103], three-way concept analysis [33, 35, 55, 60, 61, 63, 64, 88, 96, 110, 111], three-way concept learning [13, 34], three-way conflict analysis [21, 67, 90], clustering with three-way decision [1, 70, 93,94,95, 108], orthopairs [4] and shadowed sets [53]. In the meanwhile, a large number of useful and interesting results have been obtained. So it is natural and important to analyze the researches in three-way decision for the purpose of providing a reference for interested readers to read, study, understand and develop the theory of three-way decision.

The dataset of three-way decision articles we collected for complex network analysis in this paper were searched from ISI Web of Science by using the keywords: “three-way decision”, “three-way decisions”, “decision-theoretic rough sets” or “probabilistic rough sets” for those published before December 18, 2019. The dataset covers 549 papers and each of them was saved as a CIW (customer information warehouse) file for easy access. Number of the collected papers as a function of dates is shown in Fig. 1. It should be pointed out that the number of those papers written by only one author was 43 which accounts for 7.8% of total papers. We also found that the dataset of three-way decision articles includes 11 ESI highly cited papers [14, 29, 31, 34, 41, 59, 66, 82, 84, 94, 98].

Fig. 1
figure 1

Number of the collected papers as a function of dates

The rest of this paper is organized as follows. In Sect. 2, we introduce the measurement indices and algorithms used in this paper to make data analysis based on complex networks. In Sect.  3, scientific collaboration network is constructed to reveal the relationship between authors. In Sect. 4, university collaboration network is built to show the relationship between affiliations. In Sect. 5, citation network, bibliographic coupling network and co-citation network are investigated to obtain the relationship between three-way decision articles. In Sect. 6, keywords network is discussed to find the hottest research issues closely related to three-way decision. Finally, some useful conclusions are given in Sect. 7. The main content of our work can be shown in Fig. 2.

Fig. 2
figure 2

The main content of our work

2 Models and algorithms

A network G can be presented by a set V of nodes and a set E of edges [50, 71], i.e., \(G=(V,E)\). Generally speaking, networks can be divided into directed networks and undirected networks based on whether the edges have directions or not. Like undirected networks, directed networks were also encountered frequently in the real world. For instance, in citation network of papers, the nodes are papers and there is a directed edge from paper M to paper N if M cites N in its bibliography.

A network G can simply be represented by an adjacency matrix A no matter whether it is directed or undirected. For example, let G be a network with n nodes, and A be the adjacency matrix of G. In [50], if G is an undirected network, its adjacency matrix A can be defined as

$$\begin{aligned} A_{ij}= {\left\{ \begin{array}{ll} 1, &{} \text {if there is an edge between the nodes } i \text { and } j,\\ 0, &{} \text {otherwise;} \end{array}\right. } \end{aligned}$$
(1)

if G is a directed netwrok, the elements of the adjacency matrix A are given as follows:

$$\begin{aligned} A_{ij}= {\left\{ \begin{array}{ll} 1, &{} \text {if there is an edge from the node } j \text { to the node } i,\\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(2)

It should be pointed out that sometimes the importance of the connections between nodes of a network may be different from each other. If it happens, then we call such a network as a weighted network whose elements \(A_{ij}\) are viewed as the weights \(w_{ij}\) of the corresponding connections and they can be computed in a certain way in real instances. For example, in scientific collaboration network, weights often present frequency of cooperation between authors.

The degree of a node is an important notion in a network. Let G be a network with n nodes, and A be the adjacency matrix of G. In [50], if G is an undirected network, then the degree \(k_i\) of the node i is defined as the number of the edges directly connecting it to the other neighbors, i.e.,

$$\begin{aligned} k_i = \sum _{j=1}^nA_{ij}. \end{aligned}$$
(3)

If G is a directed network, the case may be more complicated and it needs an additional distinction between in-degree and out-degree [50]. The former means the number of incoming edges, while the latter means the number of outgoing edges. In this paper, we denote the in-degree and out-degree of the node i by \(k_i^{\text{ in }}\) and \(k_i^{\text{ out }}\), respectively. Then we have

$$\begin{aligned} k_i^{\text{ in }} = \sum _{j=1}^nA_{ij},\quad k_j^{\text{ out }} = \sum _{i=1}^nA_{ij}. \end{aligned}$$
(4)

Another important notion in a network is network centrality. This index was used to find the most important or central nodes in a network. Different people may have different ideas about this issue. To the best of our knowledge, there have been many network centrality measures. In this paper, degree centrality, eigenvector centrality and PageRank value will be used to search for important authors and influential papers from the dataset of three-way decision articles.

Different from degree centrality, eigenvector centrality further considered the importance of the nodes that they are connected to [2, 50, 71]. In fact, this is very reasonable because if all the nodes in a network are assigned with different scores in advance, the connections to high-scoring nodes will naturally contribute more than those to low-scoring nodes.

For an undirected network G with n nodes, let A be the adjacency matrix of G, \(\lambda\) be a constant, and \(x=(x_1,x_2,\ldots ,x_n)\) be a vector of centralities of all nodes. If \(x_i\) is defined as (see e.g. [69] for details)

$$\begin{aligned} x_i = \dfrac{1}{\lambda }\sum _{j=1}^nA_{ij}x_j, \end{aligned}$$
(5)

then the above equation can be rewritten in the matrix form as

$$\begin{aligned} \lambda x = Ax. \end{aligned}$$
(6)

It can be observed that \(\lambda\) is an eigenvalue of the adjacency matrix A with the corresponding eigenvector x. In the eigenvector centrality measure, when an appropriate eigenvalue \(\lambda\) of the adjacency matrix A is obtained, we find the required eigenvector x.

Google’s PageRank algorithm can be considered as a generalization of the eigenvector centrality measure. Compared with the eigenvector centrality measure, PageRank algorithm further considered the importance of the nodes that they are pointed to as a relative value [3]. In other words, the contribution of the importance of a node is also affected by the out-degree of this node. The more the number of the edges directly connecting one node to the other neighbors is, the less the contribution of the importance of this node is. In [50], the PageRank value was defined by

$$\begin{aligned} x_i = \alpha \sum _{j=1}^nA_{ij}\frac{x_j}{k_j^{\text{ out }}}+\beta _i, \end{aligned}$$
(7)

where \(x_i\) is the centrality measure of the web page i, \(k_j^{\text{ out }}\) is the out-degree of the web page j, and \(\alpha\) and \(\beta _i\) are two positive constants. Note that \(\alpha\) is used to keep a balance between the first item and the second item in the equation, and it was often set to be 0.85 for the purpose of accelerating the convergence speed. The parameter \(\beta _i\) can be determined by the text similarity between the web page and query condition. In PageRank algorithm, the PageRank value will be iterated until the computational results are unchanged or only slightly changed. In this case, the PageRank algorithm will be terminated and the required vector x of centralities can be obtained.

Hyperlink-induced topic search (HITS) algorithm can provide more information than the other network centrality algorithms [20, 50]. It gives each node two values: authority centrality and hub centrality. The former shows the authority of a node which can be quantified by the number of nodes with high hub centrality connecting to it, and the latter shows the importance of a node which can be quantified by the number of nodes with high authority centrality that it is pointing to. For example, in citation network of papers, nodes with high authority centrality mean the important or influential papers in the research field, and nodes with high hub centrality mean the ordinary papers which cite many important or influential papers.

The concept of a connected component of a network comes from graph theory. It is a maximal subnetwork in which each pair of nodes is connected by a path. In real networks, there often exist some large components which include most of nodes and the rest of nodes are divided into many small components disconnected to each other [50, 68]. For example, scientific collaboration network has some large cooperative groups and many small cooperative groups.

Detecting community structures of a network is an important issue in the domain of complex network analysis. If a network has community structures, it means that the network can be easily grouped into sets of nodes within which connections are dense but between which they are sparse [49, 50]. For instance, the scientific collaboration network and university collaboration network in this paper are divided naturally into communities.

3 Scientific collaboration network

Scientific collaboration network is a social network where nodes are scientists and links are co-authorships. The 549 papers collected in this paper with the topic of three-way decision involves 709 authors. In order to avoid too many authors’ influence on the final analytical results, this paper only considered the papers where the number of authors is less than or equal to 5.

Fig. 3
figure 3

Scientific collaboration network

The scientific collaboration network visualization was realized by Gephi which is an open source network analysis and visualization software package, and it is shown in Fig. 3. This weighted network has 644 nodes and 1235 edges and it includes 68 connected components. The number of connections repeated between two nodes was converted into the weight of a link. The node size is proportional to its degree (the number of coauthors). The node color is based on connected components. The edge width is based on its weight (the number of cooperation in pairs). Top 10 components colored the nodes with color and the remain components colored the nodes with gray. The first largest component represented by Professor Yao Yiyu has 366 nodes and 830 edges accounted for 56.83%. The second largest component represented by Professor Xue Zhan’ao has 13 nodes and 37 edges accounted for 2.02%. The third largest component represented by Professor Zhu Yanhui has 12 nodes and 29 edges accounted for 1.86%.

Fig. 4
figure 4

Community detection of scientific collaboration network

In Fig. 4, the community structures in the largest components of the scientific collaboration network are shown. The network was divided into 18 communities. The node color is based on community cluster membership. The node size is proportional to its degree (the number of coauthors). Top 10 communities colored the nodes with color and the remain colored the nodes with gray.

Table 1 Community ranking of modularity class

Table 1 shows the ranking of communities in the investigation of three-way decision model. Members of influential communities and their proportion of total researchers are listed in the table.

Table 2 Authors ranking

In order to obtain the influence of authors in the three-way decision field, author rankings are listed with different evaluating measures and benchmarks in Table 2. The evaluation indices are the number of papers published, the amount of coauthors, the frequency of collaboration with other authors, and eigenvector centrality (considering both quantity and quality of coauthors). In the 2nd, 3rd, 4th columns of Table 2, each of them includes two parts: author name and the value of certain index.

4 University collaboration network

In this section, we construct a weighted network in which nodes represent universities and undirected edges indicate co-occurrence pairs of universities. The number of connections repeated between two nodes was converted into the weight of a link.

Fig. 5
figure 5

University collaboration network

The university collaboration network visualization was realized by Gephi, and it is shown in Fig. 5. This weighted network has 156 nodes and 268 edges and it includes 15 connected components. The node size is proportional to its degree (the number of co-occurrence pairs between this node and other universities). The node color is based on connected components. Top 10 connected components colored the nodes with color and the remain colored the nodes with gray. The edge width is based on its weight (the number of connections repeated between the two nodes of this edge). The largest component represented by University of Regina has 135 nodes and 245 edges accounted for 77.56%.

Fig. 6
figure 6

Community detection of university collaboration network

In Fig. 6, the community structures in the largest components of the university collaboration network are shown. The network was divided into nine communities. The node color is based on community cluster membership. The node size is proportional to its degree.

Table 3 Community ranking of modularity class

Table 3 shows the ranking of communities. Members of influential communities and their proportion of total universities are listed in the table.

5 Networks of scientific papers

In this section, we construct a directed network (citation network) and two undirected networks (bibliographic coupling network and co-citation network) by bibliography to reveal the relationship between papers. In the analysis of the dataset of three-way decision papers, we used digital object identifier (DOI) which is a string of numbers, letters and symbols as the unique label to identify a paper. Moreover, the PageRank and HITS algorithms will be used to search for the most important or influential papers in the networks.

5.1 Citation network

Citation network is a directed network in which nodes are papers and there is a directed edge from paper M to paper N if M cites N in its bibliography [50, 71].

Fig. 7
figure 7

Citation network

The citation network visualization was realized by Gephi, and it is shown in Fig. 7. This directed network has 4578 nodes and 14,627 edges. The node size is proportional to its in-degree (times cited by the papers in our dataset). This investigation shows that about 210 papers were never cited at all accounted for 4.59%. For the remainder, 2874 papers have one citation accounted for 62.78%, 554 papers have two citations accounted for 12.1%, and 201 papers have three citations accounted for 4.4%. Only 194 papers have 10 or more citations accounted for 4.24%, and just 7 papers have 100 or more citations accounted for 0.15%.

Table 4 Paper ranking of citation network

Table 4 shows the ranking of papers by in-degree (times cited by the papers in our dataset) based on citation network. Top 12 influential papers [26, 32, 34, 41, 66, 74, 82, 87, 89, 95, 99, 115] and their cited times are listed. Note that the citations in Table 4 contain two parts: the former is the in-degree of a paper within the citation network, while the latter quotes from ISI Web of Science (their topics are not limited to three-way decision). By the way, we list in Table 4 only the papers from our downloaded dataset. However, the papers [8, 16, 17, 30, 51, 52, 58, 65, 77,78,79,80, 85, 86, 93, 97, 114] beyond our dataset are also given in Table 5 for the convenience of interested readers’ reference. In other words, the topics of the papers listed in Table 5 may not be three-way decision. These additional papers are recommended to readers because they were cited very frequently together with those in Table 4. Generally speaking, if readers want to well understand the ideas of the influential papers in Table 4, it is necessary to read those in Table 5 at the same time.

Table 5 Paper ranking beyond our dataset

By Table 5, we can obtain more information on papers outside our downloaded dataset. There are two reasons: (1) keywords of some papers do not cover three-way decision, such as Nos. 6, 7, 8, 10, 11, 12, 13, 14, 16, 17 in Table 5; (2) some papers have not been included in ISI Web of Science (of course, they do not have any citation in ISI Web of Science, but still have citations in our citation network), which are presented in Table 5 with “–”. In other words, these important papers outside our dataset can be successfully traced by complex network analysis although they were not included in our dataset.

Table 6 Paper ranking of citation network on PageRank

Table 6 shows the ranking of papers by PageRank algorithm on citation network. Top 12 influential papers [7, 26, 28, 32, 34, 73, 74, 82, 87, 95, 99, 115] and their cited times are listed. To compare with Table 4, there are significant differences for these two ranking methods in terms of top 12 papers. The reasons are as follows: the ranking results shown in Table 4 completely depend on the quantity of citations, while those in Table 6 depend on both the quantity and quality of citations.

Table 7 Paper ranking of citation network on authorities

Table 7 shows the ranking of papers by HITS algorithm on citation network. That is, the third ranking method was used to rank papers and the ranking results are given in Table 7. It can be observed from Tables 4 and 7 that the parameters times cited and authority seem to have similar effects on the evaluation of academic papers for our dataset since only one paper is different between them. This is not surprising because the ranking methods based on times cited and authority have the same characteristic of depending on the quantity only. To be more concrete, the former depends on the quantity of citations, and the latter depends on the quantity of hubs connecting to it. In other words, if we view citations and hubs as the same, then the ranking methods based on times cited and authority will be similar.

Note that HITS algorithm also output top 12 hub papers when it obtained top 12 authority papers. In Table 8, top 12 hub papers [10, 11, 26,27,28, 37, 42, 56, 57, 62, 66, 101] and their in/out-degrees (times cited by the papers in our dataset/the number of bibliographies) are listed. According to the discussion in Sect. 2, we know that top 12 hub papers in Table 8 were used to help the generation of top 12 authority papers in Table 7. In fact, the papers with high hub centrality and the papers with high authority centrality are beneficial to each other in HITS algorithm because they need to search for each other. In other words, readers can easily find influential papers from the papers with high hub centrality.

Table 8 Paper ranking of citation network on hubs

5.2 Bibliographic coupling network

Citation network provides a simple and intuitive way to show citation patterns but not the only one. An alternative representation is the bibliographic coupling network. Two papers are said to be bibliographic coupling if they cite the same other papers [50, 71]. In this section, we construct a weighted network in which nodes are papers and undirected edges indicate strengths of coupling which are the number of common citations between two papers.

Fig. 8
figure 8

Bibliographic coupling network

The bibliographic coupling network visualization was realized by Gephi, and it is shown in Fig. 8. This weighted network has 532 nodes and 100,518 edges. The node size is proportional to its degree (the number of common citations). We only drew the nodes and ignored the huge number of edges. There are a slight difference among the sizes of nodes. The reason is that all the papers belong to the same research field. Here, we do not give the ranking of papers based on  bibliographic coupling network because it lacks of convincing due to only slight difference among nodes.

5.3 Co-citation network

Another undirected network of citation patterns is co-citation network. Two papers are said to be co-cited if they are both cited by the same third paper [50, 71]. In this section, we construct a weighted network in which nodes represent papers and undirected edges indicate strengths of co-citation which equal to the number of other papers that cite both of them.

Fig. 9
figure 9

Co-citation network

The co-citation network visualization was realized by Gephi, and it is shown in Fig. 9. This weighted network has 247 nodes and 9744 edges. The node size is proportional to its degree (the number of co-citations).

Table 9 Paper ranking of co-citation network

Table 9 shows the ranking of papers by degree on co-citation network. Top 12 influential papers [5, 9, 31, 32, 34, 41, 59, 76, 82, 84, 94, 98] and their cited times are listed.

6 Keywords network

In this section, we construct a weighted network in which nodes represent keywords and undirected edges indicate co-occurrence pairs of keywords. The number of connections repeated between two nodes was converted into the weight of a link.

Fig. 10
figure 10

Keywords network

In Fig. 10, the keywords network visualization is shown in which weighted degree is not less than 10. This weighted network has 109 nodes and 544 edges. The node size is proportional to its degree (the number of co-occurrence pairs of keywords). The edge width is based on its weight. Top 10 keywords are three-way decision, decision-theoretic rough set, rough set, probabilistic rough set, granular computing, attribute reduction, fuzzy set, multi-granulation, cost-sensitive and loss function.

Furthermore, we trisected pairs of keywords according to occurring frequently together with three-way decision, and the trisection results are shown in Fig. 11. This star network has 100 nodes and 99 edges. In the figure, there are high frequency keywords (the frequency is greater than or equal to 11), middle frequency keywords, and low frequency keywords (the frequency is less than or equal to 3). These three types of keywords excluding three-way decision are colored with different colors from the inner to the outer. The trisection results of keywords provide useful hints for future researches of three-way decision since high frequency keywords often mean well-established researches, middle frequency keywords probably present emerging researches, and low frequency keywords may be new novel researches.

Fig. 11
figure 11

Star network of keywords on three-way decision

7 Conclusions

We have studied the dataset of three-way decision articles downloaded from ISI Web of Science. Concretely, the scientific collaboration network, university collaboration network, networks of scientific papers (i.e., citation network, bibliographic coupling network, co-citation network) and keywords network have been constructed to show the relationships between authors, affiliations, papers and keywords, respectively. Some interesting results are summarized as follows:

  1. (1)

    The scientific collaboration network has shown that most of papers were completed by different authors with cooperation. Yao Yiyu, Liu Dun, Li Tianrui, Liang Decui, Yu Hong, Miao Duoqian, Li Huaxiong, Yao JingTao, Wang Guoyin, Min Fan, et al. played a huge role in the development of three-way decision. Moreover, the community structure has presented the detailed collaboration among them and other authors.

  2. (2)

    University collaboration network has shown the cooperation between different research affiliations. A clear cooperative relationship among universities has been established with close geographical location, especially those in the same province. University of Regina, Southwest Jiaotong University, Tongji University, Nanjing University, Sichuan University, and University of Electronic Science and Technology of China actively promoted the development of three-way decision.

  3. (3)

    Networks of scientific papers include citation network, bibliographic coupling network and co-citation network. It has been shown that Professor Yao Yiyu’s two pioneering papers occupied the core position. Some important papers outside our dataset can be successfully traced by complex network analysis. It is interesting that there is a strong correlation between the weighted degree of papers in the co-citation network and ESI highly cited papers.

  4. (4)

    Keywords network has shown that decision-theoretic rough set, rough set, probabilistic rough set, granular computing, attribute reduction, fuzzy set, multi-granulation, cost-sensitive, loss function and sequential three-way decision are the hottest research issues around three-way decision.