1 Introduction

The concept of opinion leaders was first introduced by Lazarsfeld [10], who is a sociologist, for a group of individuals who have the power to affect a group’s perceptions. The concept of opinion leaders was developed by Von Hippel [21, 22] to characterize a set of users that are ahead of others in terms of developing new product needs, and was subsequently taken up by other researchers in social networks, aiming to understand users whose opinions are highly influenced by these opinion leaders.

The concept of opinion leaders is widely used in online social network analysis as they emerge in the interaction between subjects. There have been many researchers from different aspects to research opinion leaders detection. Ravi [17] presents a rigorous survey on opinion mining and sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications. So far, opinion leaders are being detected and identified using standard questionnaires or observed behaviour. Belz [1] has shown that using the analysis of posts in internet forums, classifies almost half of internet users correctly as opinion leaders or non-leaders. Bilgram [2] studied online social network communities and identified several factors for identification of opinion leaders. Mazhari [11] proposed a model for measuring the similarities of two members based on information contained in their profiles in online social networks. The model can contribute to detect opinion leaders through friendship relationship. Severyn [18] designed the model for effectively carrying out opinion mining on YouTube comments. Huang [7] used the comments in online social networks to detect the opinion leaders. It can find the most influential comments from all the network comments and the most influential users form the entire user network. Above studies mainly depend on comments or sentiment analysis to detect opinion leaders and require comprehensive analysis of opinions expressed in online social networks. But analysis of opinions expressed is very difficult and the most influential comments may not belong to the most influential opinion leaders.

On the other hand, Kratzer [9] detected opinion leaders by specific network topologies, specifically some typical topologies of complex network, such as betweenness and degree centrality. Online social networks are typical examples of complex networks and complex network theories can be used to analyse their network topological characteristics. There are many traditional evaluation indicators for node importance in complex networks. These indicators are mainly divided into two categories: local connectivity importance indicators of node, and global connectivity importance indicators of node. Degree and clustering coefficient are typical indicators of the former, betweenness, closeness and composite indicator for node importance are typical indicators of the latter.

The degree of a node in a network is the number of connections or links that the node has to other nodes. High-degree nodes are those that have plenty of nodes connected with them. It is revealed that high-degree nodes are more important in a manner [3, 6, 20, 23]. The indicator is very intuitive and has low time complexity, but it can only reflect the local importance of a node.

Betweenness of a node is the total number of shortest paths from all nodes to all others that pass through that node. This indicator can reflect perfectly a node’s centrality in a network [5, 12]. A node with high betweenness has a large influence on the transfer of items through the network. The dynamic change characteristics of betweenness over a certain timeframe are taken into consideration. Dynamic betweenness [8, 15], such as flow betweenness and random-walk betweenness are proposed.

The farness of a node is defined as the sum of its shortest paths to all other nodes, and its closeness is defined as the reciprocal of the farness [14]. The indicator can reflect the impact of the node on other nodes, as well as the impact of topology location for nodes.

In online social networks, opinion leaders can be defined as the impact of node importance in complex networks. But there are many social relationships, such as friendship, commerce, or others in online social networks. With regards to opinion leaders, they can have different interests for different subjects, so groups of opinion leaders tend to overlap. It is inaccurate to detect and identify opinion leaders in multi-relationship online social networks according to a single kind of relationship alone. The aim of our study is to fill those gaps. Differing from previous studies [1, 2, 4, 7, 9, 11, 13, 18, 19, 24, 25], we propose a complex network model, multi-subnet composited complex network, in which a link represents all kinds of relationships between nodes. Given comments or sentiment analysis similarity relationships and network topologies, subnets could be picked up from the original network. Through multi-subnet composited networks, multiple relationships in online social networks can be used to detect and identify opinion leaders accurately.

2 Multi-subnet composited complex network model

  • Definitions 1 (Multi-subnet Composited Complex Network): A multi-subnet composited complex network is 4-tuple G = (V, E, R, F), where

  1. (1)

    V = {v 1, v 2, …, v m } is a finite set of nodes and m =  ∣ V∣;

  2. (2)

    E = {<v h , v l >| v h , v l  ∈ V, 1 ≤ h, l ≤ m} ⊆ V × V is a finite set of links between nodes;

  3. (3)

    R = R 1 × … × R i  × … × R n  = {(r 1, …, r i , …, r n )| r i  ∈ R i , 1 ≤ i ≤ n}, where R i is one kind of relationships between nodes and the amount of kinds of relationships is n ;

  4. (4)

    F is a mapping from E to R.

Mapping F gives each link an n-tuple to denote all kinds of relationships between its two nodes. Let r i  = ∅ be no interrelation R i between the nodes.

  • Definitions 2 (Subnet): Assuming multi-subnet composited complex network G = (V, E, R, F), G ' = (V ', E ', R ', F ') is said to be subset of G according to set of relationships R (R  ⊆ R and R  ≠ ∅), iff

  1. (1)

    V  ⊆ V;

  2. (2)

    E  = {<v h , v l >| F(<v h , v l >) ∩ R  ≠ Φ, <v h , v l  >  ∈ E, v h , v l  ∈ V};

  3. (3)

    F  : E  → R , where \( \forall <{v}_h^{\prime },{v}_l^{\prime }>\in {V}^{\prime } \), s.t. \( {F}^{\prime}\left(<{v}_h^{\prime },{v}_l^{\prime }>\right)=F\left(<{v}_h^{\prime },{v}_l^{\prime }>\right) \).

Actually, a subnet is also a composited network. The reason that we define it is to show a composited network could be dynamically organized. Due to the complexity of complex systems, we usually model one aspect of these systems as a subnet in which one kind of relationships among parts of the system is considered. Whereas for some problems, comprehensive analysis referring to several kinds of relationships in the system is essential. In that case, we expect that several subnets could be composited together.

For practical usage, multi-dimensional space is imported into our model, converting the relationships into a spatial vector to make it computable for solving real problem. Based on multi-subnet composited complex network, a vector composited network is defined by converting the relationships into a spatial vector.

Just as each link is assigned a weight proportional to the intensity or capacity of multiple relationships in a weighted network, dom(r i ) is defined to denote the intensity set. Each element in dom(r i ) indicates the intensity of relationship r i .

  • Definitions 3 (Vector Space of relationship intensity): Assuming a multi-subnet composited complex network G = (V, E, R, F), where R = R 1 × … × R i  × … × R n  = {(r 1, …, r i , …, r n )| r i  ∈ R i , 1 ≤ i ≤ n}, let dom(r i ) ⊆  +, null intensity is zero.

    Define

$$ S=\mathit{\operatorname{dom}}\left({r}_1\right)\times \dots \times \mathit{\operatorname{dom}}\left({r}_i\right)\times \dots \times \mathit{\operatorname{dom}}\left({r}_n\right) $$
(1)

as set of ordered n-tuple real numbers. It is obvious that S is vector-space of a field of real numbers. S is a vector-space of relationship intensity. Eq.(1) could be rewrote as Eq.(2) below,

$$ S=\left\{\boldsymbol{s}=\left(\boldsymbol{s}\left({r}_1\right),\dots, \boldsymbol{s}\left({r}_i\right),\dots, \boldsymbol{s}\left({r}_n\right)\right)|\boldsymbol{s}\left({r}_i\right)\in \mathit{\operatorname{dom}}\left({r}_i\right),1\le i\le n\right\} $$
(2)

where s is the vector of relationship intensity and s(r i ) is the i-th term of s denoting the intensity of relationship r i .

  • Definitions 4 (Bases of Vector Space of interrelation intensity): Assuming multi-subnet composited complex networkG = (V, E, R, F), where R = R 1 × … × R i  × … × R n  = {(r 1, …, r i , …, r n )| r i  ∈ R i , 1 ≤ i ≤ n}. Let S be a vector space of relationship intensity G, set of vectors

$$ {e}_1=\left(1,0,\dots, 0\right),{e}_2=\left(0,1,\dots, 0\right),\dots, {e}_n=\left(0,0,\dots, 1\right) $$
(3)

is defined as bases of S. For ∀s ∈ S, s = s(r 1)e 1 + s(r 2)e 2 + … + s(r n )e n . e 1, …, e i , …, e n are linearly independent vectors. Base e i could be regarded as relationship r i and vice versa. The number of dimensions of S is dim(S) = n.

  • Definitions 5 (Vector Network): Given G = (V, E, R, F), let S be a vector space of relationship intensity G. Then there is the mapping M : E → S, s.t. for ∀ < v h , v l  >  ∈ E, v h , v l  ∈ V, 1 ≤ h, l ≤ m

$$ M\left(<{v}_h,{v}_l>\right)={\boldsymbol{s}}^{<{v}_h,{v}_l>}=\left({\boldsymbol{s}}^{<{v}_h,{v}_l>}\left({r}_1\right),\dots, {\boldsymbol{s}}^{<{v}_h,{v}_l>}\left({r}_i\right),\dots, {\boldsymbol{s}}^{<{v}_h,{v}_l>}\left({r}_n\right)\right) $$
(4)

and if r i  ≠ ∅ in F(<v h , v l >) then \( {\boldsymbol{s}}^{<{v}_h,{v}_l>}\left({r}_i\right)\ne 0 \), 1 ≤ i ≤ n. Three-tuple Σ = (G, S, M) is a vector network of G. And let \( {\boldsymbol{s}}^{<{v}_h,{v}_l>}\left({r}_i\right) \) indicates the intensity of relationship r i between nodes v h , v l .

3 Node importance evaluation approach in composited networks based on signalling

3.1 Signalling process

For a network with nodes, every node is assumed to be a system which can send, receive and record signals. One node can only affect its neighbour nodes, likewise those nodes also will affect their neighbour nodes in the same way. With the signalling process, each node will affect the whole network.

At the beginning, a node as source is selected and has one unit of signal and the other nodes in the network have no signal. Then the source node sends a signal to all of its neighbour nodes and itself. After the first process, the node and all its neighbour nodes have a signal. All the nodes which have a signal will subsequently send it to their neighbour nodes and themselves. Every node would record the quantity of signals which it received and then would send the same quantity of signals in the next process. In this way, the signalling process would be repeated constantly on the network. After T time process, an nd vector that records each node’s signal quantity which represents the impact of the source node is received. The signalling process of network with 6 nodes is shown in Fig. 1a-d.

Fig. 1
figure 1

Signalling process in network with 6 nodes

The signalling process in Fig. 2 could be described by using a mathematical equation shown as follows:

$$ V={\left(I+A\right)}^T $$
(5)

where, A is an adjacency matrix, which can represent mathematically a network with s nodes, elements A ij of the adjacency matrix represents whether there is a link from node i to node j and 0 otherwise. I is an identity matrix. T represents the signalling process time. So the column i of matrix V represents the impact of source node i to the whole network in signalling process time, T.

Fig. 2
figure 2

Signalling process of multi-subnet composited complex network with 6 nodes

In order to get the relative impact, each row of matrix V should be standardized. Assume the row i of V is V i  = (v i1, v i2, …, v is ), then the V i can be standardized as U i  = (u i1, u i2, …, u is ), here \( {u}_{ij}={v}_{ij}/{\varSigma}_j^n{v}_{ij} \).

3.2 Signalling process in multi-subnet composited complex networks

Every element \( {\widehat{A}}_{ij} \) of adjacency matrix \( \widehat{A} \) of a multi-subnet composited complex network is an m-tuple which represents all kinds of relationships of link from node i to node j. The ith vector component \( {\boldsymbol{s}}^{<{v}_i,{v}_j>}\left({r}_f\right) \)represents if there is relationship r f on link from node i to node jor not.

In a multi-subnet composited complex network, when the source node sends signal about relationship r f to all of its neighbour nodes and itself, only those neighbour nodes which are connected to the source node to form a link where there is relationship r f can receive signals. Other neighbour nodes connected with the source node to form link where there is no relationship r f cannot receive signals with regard to relationship r f .

After T signalling process, for every source node, an nd vector that includes all relationships in the network occurs, every vector component is all m-tuple (r 1, …, r i , …, r m ) which represents all kinds of relationships in the network. Thus, for the whole multi-subnet composited complex network, an n ∗ n adjacency matrix whose every element is a m-tuple occurs.

The signalling process of a multi-subnet composited complex network with 6 nodes and node 1 acting as source node is shown in Fig. 2a-d.

The signalling process in a multi-subnet composited complex network could be described by using a mathematical equation shown as follows:

$$ \widehat{V}={\left(\widehat{I}+\widehat{A}\right)}^T $$
(6)

where, T represents the signalling process time. \( \widehat{A} \) is an adjacency matrix of a multi-subnet composited complex network, it can be expressed as follows:

$$ {\displaystyle \begin{array}{l}\kern4.90em {v}_1\kern2.30em {v}_2\kern2.30em {v}_3\kern2.30em {v}_4\kern3.25em \\ {}\ \widehat{A}=\begin{array}{c}{v}_1\\ {}{v}_2\\ {}{v}_3\\ {}{v}_4\end{array}\left[\begin{array}{cccc}\left(0,0\right)& \left(1,1\right)& \left(1,1\right)& \left(0,1\right)\\ {}\left(1,1\right)& \left(0,0\right)& \left(1,1\right)& \left(0,0\right)\\ {}\left(1,1\right)& \left(1,1\right)& \left(0,0\right)& \left(0,0\right)\\ {}\left(0,1\right)& \left(0,0\right)& \left(0,0\right)& \left(0,0\right)\end{array}\right]\end{array}} $$
(7)

The above signalling process is under the assumption that there is no interrelation between the multiple relationships. Yet in reality, there would be interrelation between the multiple relationships in a multi-subnet composited complex network. For a multi-subnet composited complex network, the parameter αcan be used to represent the interrelationship between multiple relationships in the signalling process.

For example, if relationship r 2effects relationshipr 1, signalling t 0on relationship r 1which is received by neighbour nodes v 0 of source node v s can be expressed as follows:

$$ {t}_o=\left\{\begin{array}{l}\ \alpha \times 1\kern1em if\ {\boldsymbol{s}}^{<{v}_s,{v}_o>}\left({r}_1\right)=\mathbf{1}\ \mathbf{and}\ {\boldsymbol{s}}^{<{v}_s,{v}_o>}\left({r}_2\right)=\mathbf{0}\\ {}\ 1\kern2.6em if\ {\boldsymbol{s}}^{<{v}_s,{v}_o>}\kern0.24em \left({r}_1\right)=\mathbf{1}\ \mathbf{and}\ {\boldsymbol{s}}^{<{v}_s,{v}_o>}\left({r}_2\right)=\mathbf{1}\end{array}\right. $$
(8)

The signalling process of an interactional multi-subnet composited complex network with 6 nodes and node 1 acting as source node is shown in Fig. 3a-d.

Fig. 3
figure 3

Signalling process of interactional multi-subnet composited complex network with 6 nodes. In the process, parameter α = 0.5

4 Opinion leader detecting algorithm in multi-relationship online social networks

The signalling process in multi-relationship online social networks could be described by a matrix iterative operation based on a multi-subnet composited complex network. For node v h in a multi-subnet composited complex networkG = (V, E, R, F), \( {\widehat{k}}_{v_h}^{r_i} \)represents the degree of node v h on relationshipr f , expressed as follows:

$$ {\widehat{k}}_{v_h}^{r_i}=\sum \limits_{v_l\in V}{\widehat{\boldsymbol{a}}}_{hl}\left({r}_i\right) $$
(9)

The node degree on relationship r f indicates the number of nodes on which signalling on relationship r f can be directly reached. Importance vector R can be conducted by an adjacency matrix, and importance matrix D can be constructed by an adjacency matrix and importance vectorR. The steps of an opinion leader detecting algorithm are descripted as follows:

  1. (1)

    Generating the initial adjacency matrix \( \widehat{A} \) of a multi-relationship online social network;

  2. (2)

    Generating the initial importance vector R of every node by adjacency matrix\( \widehat{A} \);

  3. (3)

    Constructing an importance matrix D by importance vector R and adjacency matrix \( \widehat{A} \);

  4. (4)

    Using matrix multiplication to simulate the signalling process, the matrix multiplication mathematical equation is defined as follows: R i + 1 = R i  • D i ;

  5. (5)

    Comparing R i + 1 with R i to judge that the difference is whether or not more than setting threshold value, which means that importance vector R is whether or not tending towards stability after matrix iterative, if stable state is achieved, the algorithm would terminate, otherwise, jumping to step (3) for going on matrix iterative operation.

5 Experiment and analysis

To verify our proposed algorithm, experimental analysis is conducted. The data for the experiment consists of 30,453 users in Douban.com [16] through tracking their relationships and comments over two days (2015–03-17 08:00: 00~2015–03-18 07:59:59). The experimental network is composed of 30,453 nodes, 55,865 links and 1,789,567 book comments. There are friendship relationships and book comments similar in relationship in the network, and the two kinds of relationships interrelate with each other. So the experimental network is a typical multi-relationship online social network.

Firstly, we select 4 users from the data to form a directed network to validate node importance of our algorithm. The 4 users network are shown as Fig. 4.

Fig. 4
figure 4

The 4 users network used to validate our algorithm

We assumed that if the difference in every node importance in two iterations is less than 10−3, the important degree of the nodes is regarded as stable, and then the termination condition of the algorithm would be met.

Comparison of the importance degree of nodes of our algorithm with the PageRank algorithm and HITS algorithm for a 4 user network is shown in Table 1.

Table 1 Comparison of important degree of nodes of our algorithm with PageRank algorithm and HITS algorithm

From Table 1 we can see that if the multiple relationships in a 4 user network are ignored, the analysis results of the nodes importance of a 4 user network through our algorithm and classical algorithms, such as the PageRank algorithm and HITS algorithm, are completely consistent.

If the difference of every node importance in two iterations is less than a certain threshold, the important degree of nodes is regarded as stable. The iteration times with different thresholds to display trend stability through our algorithm, the PageRank algorithm and HITS algorithm are shown in Fig. 5.

Fig. 5
figure 5

Iterative times comparison of our algorithm PageRank algorithm and HITS algorithm

From Fig. 5 we can see that the convergence of our proposed algorithm is much better than the PageRank and HITS algorithms. In the case of the same importance difference threshold, the iteration time of our proposed algorithm is much less than the iteration times of the PageRank and HITS algorithms.

In the experiment, First of all, manual voting method is used to detect opinion leaders. The first 500 users got by HITS algorithm and PageRank algorithm were investigated with email anonymous questionnaire, and 500 candidate opinion leaders were given in the questionnaire. A total of 378 valid questionnaires were returned, according to the results of the questionnaire, the top 20 opinion leaders were obtained.

Then through friendship relationship building of a single relationship network, the PageRank and HITS algorithms are used to detect opinion leaders, and via multiple relationships in the same data building a composited network, applying the algorithm proposed in Section 4, the top 20 nodes in order of importance are shown in Table 2.

Table 2 Comparison of the analysis results of our proposed algorithm,PageRank algorithm and HITS algorithm with manual voting

From Table 2 we can see that though the sequence of the top 20 nodes is different for each of the three algorithms, there are only 3 nodes in the top 20 node sequences of our proposed algorithm not in the sequence of the PageRank and HITS algorithms. If the top 100 and top 500 nodes of the three algorithms sequences are selected, the three sequences are completely covered. So for node importance analysis, the three algorithms can be considered to be concordant.

But the detecting important nodes of our proposed algorithm are very close to the results of manual voting, and there are very big difference between the other two algorithms and the results of manual voting. It is proved that our proposed algorithm considering multiple relationships joint effects are more in line with the reality.

For the purpose of validating our algorithm further, sentiment analysis of comments is applied, where comments are usually divided into positive (1), negative (−1) and neutral (0). We divide the period of tracking data time into two different time periods. There are 211 comments of top 500 importance nodes in the first time period as shown in Fig. 6.

Fig. 6
figure 6

Ranking (08:00:00–12:30:00)

It is also easy to find that comment No.25 which belongs to user No.1239 received the highest scores, while comment No.166 which belongs to user No.4323 didn’t receive the highest score. Hence, user No.4323 is not an opinion leader at this time. As illustrated in Fig. 7, user No.4323 should be an opinion leader, while the score of user No.166 increases with time. But opinion leader user No.4323 can’t be detected by either the PageRank or HITS algorithms.

Fig. 7
figure 7

Ranking (08:00:00–next day 07:59:59)

Consequently, it is found that our algorithm proposed in this paper has much more accuracy than other algorithms, and our algorithm can better apply to detecting opinion leaders of multi-relationship online social networks.

6 Conclusion

Based on a multi-subnet composited complex network model and signalling process, we proposed the use of an importance matrix iterative method to analyse node importance, and thereby detect opinion leaders in multi-relationship online social networks. Through experiment analysis, it proved that compared with existing node importance evaluation approaches, our proposed algorithm can better reflect node importance in multi-relationship online social networks, and iteration times of our proposed algorithm are much lower. The research would contribute to improved efficiency and accuracy of public opinion analysis, community structure detecting, and signalling processing in multi-relationship online social networks.