1 Introduction

With the explosive increase of individual interactions across the world by social services such as Facebook, Email and telephone, the social network has been extensively studied in a myriad of contexts, from uncovering criminal activity to finding friends and from detecting community to predicting the spread of epidemics [9, 28]. An attractive and challenging work in the social network is event detection and evaluation [40]. A generalized social network is a set of individuals (people, organizations or other social entities) connected by a set of social relationships (friendship, co-working, kinships or information exchange) [19], which is not limited in popular social media. The occurrences of many events have enormous influences in their corresponding social networks. The detections and evaluations of events can provide critical materials that describe the current situations in many real applications, such as crisis management and decision making. For example, the telephone communications of staffs in a small company constitute a generalized social network, and its topological structures in four consecutive days are shown in Figure 1. It is known that the event of a management change happens in the second day, which leads to the significant increase of the communication. Can we evaluate the event impact in the social network without available texts? Most real-world social networks cannot be simply visualized like Figure 1, so how can we detect events in large-scale and complex social networks efficiently? Assuming the company detects the anomaly and evaluates the impact of the management change in time, the company can have an effective insight into staff reactions, which is significant for the company to take useful strategies to avert negative effects.

Figure 1
figure 1

The telephone communication of a work group in four consecutive days

In this paper, we are devoted to solve the problem of event detection and evaluation for generalized social networks. There are two types of existing methods about event detection and evaluation in social networks. (1) The method based on text analysis has been widely studied in the field of the social media [45], such as tweet, Facebook and blog. And the content like the messages, associated tags and descriptions in the social media are used as text to extract event information. This method is seriously limited by the context provided by users. And many social interactions like friendship, co-working and telephone communications have no available texts to be applied for this method. (2) The method based on graphic feature statistics involves graph similarity comparison [29], scan statistics [41] and graphic pattern recognition [43]. But this method needs select appropriate graphic features by calculating and comparing a huge amount of statistical variables [2, 17]. Ignoring the dynamics of network evolutions, this method is hard to evaluate event impacts in practical application. There is still desperate for an efficient method to detect and evaluate events in generalized social networks with limited topology information.

It has been acknowledged that each social network is driven by its inherent evolution mechanism [32, 47]. And the occurrences of many events can lead a social network to deviate from its inherent evolution and produce abnormal fluctuations. So we can detect events by discovering abnormal fluctuations and evaluate events by analyzing fluctuation trends. The key point behind the idea is how to describe a network evolution fully. The evolution mechanism of a real-world network is often hybridized with multiplex features instead of a single pure mechanism [23], and different social networks usually have different mechanisms. It is hard to directly construct a suitable model to describe a real network evolution. There is a strong need for an efficient way to achieve the corresponding optimal descriptions for different social networks. Another important point desiring attention is how to evaluate the fluctuations caused by events quantitatively. Even without happening social events, routine evolution fluctuations usually happen in the network evolution process. It needs us to distinguish the routine evolution fluctuation and the abnormal network evolution fluctuation caused by events.

Link prediction is a popular link mining method, and its goal is to evaluate the existence probability of future links or unknown links between nodes [6, 7]. Link prediction has been applied to extract missing information, identify spurious interactions, recommend similar friends and so on [5], but there is little work related to event detection and evaluation in the social network. Both link prediction and network evolution consider the increase and decrease of edges and nodes. And a network evolution mechanism theoretically has a corresponding link prediction algorithm [24]. It is possible to indirectly describe the evolution mechanism of a social network by estimating the performance of its corresponding link prediction algorithm. It is natural to inspire us to describe a given network evolution by link prediction.

However, most link prediction algorithms can only achieve good prediction performances for networks with given inherent mechanisms, and they are not universal for different social networks. To overcome the problem, we firstly propose the mixed index for link prediction that is suited to all the social networks. Because of the fast computing speed and the outstanding global searching ability in dynamic multi-objective optimization environment, the particle swarm optimization (PSO) has become one of the most powerful swarm intelligence based methods [37]. To efficiently determine the optimal mixed index for a given social network from a huge number of possible mixed indexes, we propose the optimal weight algorithm (OWA), which introduces the quantum mechanism to improve the description performance of PSO. The link prediction algorithm based on the optimal index can describe the corresponding network evolution fully. On the other hand, to quantify the fluctuations caused by events in a given social network, the otherness of micro node evolutions is further considered to propose a fluctuation detection algorithm (FDA) by the link prediction algorithm based on the optimal index. Based on OWA and FDA, a hybrid quantum swarm intelligence indexing method (HQSII) inspired by link prediction is proposed in this paper. The main conclusions can be summarized as follows.

  1. (1)

    The mixed index for link prediction is proposed firstly, which improves the universality of existing similarity indexes. Furthermore, OWAis proposed to determine the optimal mixed index for different social networks efficiently. OWA not only improves the accuracy of link prediction by exploring the cooperation of similarity indexes, but also makes link prediction describe different network evolution mechanisms fully.

  2. (2)

    FDAis proposed to quantify the abnormal fluctuations caused by events in social network evolutions at different periods. To avoid the interferences of routine network fluctuations, the otherness of micro node evolutions is also introduced into link prediction.

  3. (3)

    With limited topology information in generalized social networks, HQSII detects events by discovering abnormal fluctuations and evaluates event impacts by analyzing fluctuation trends. We demonstrate that HQSII can accurately detect major events and quantitatively evaluate event impacts in different social networks with single and multiple events.

The rest of this paper is organized as follows: the related work is summarized briefly in Section 2; then the problem of event detection and evaluation in social networks is defined in Section 3; we detail our proposed method and experiments in Section 4 and Section 5, respectively, followed by a brief conclusion and prospects in Section 6.

2 Related work

The existing methods about event detection and evaluation in social networks mainly focus on text analysis and graphic feature statistic. Their latest and influential works are reviewed as follows.

For event detection and evaluation based on text analysis, Zhou and Chen [48] proposed a novel framework to detect composite social events over streams, which fully exploits the information of social data over multiple dimensions. A graphical model called location-time constrained topic is proposed to capture the content, time, and location of social messages. And the events are identified by conducting efficient similarity joins over social media streams. Dong et al. [10] proposed a novel approach towards multi-scale event detection using social media data, which takes into account different temporal and spatial scales of events in the data. They also proposed a novel algorithm to compute a data similarity graph at appropriate scales and detect events of different scales simultaneously by a single graph-based clustering process. The spatiotemporal statistical analysis of the noisy information in the data stream is proposed to define a novel term-filtering procedure. Stilo and Velardi [39] presented a novel method for clustering words in micro-blogs, based on the similarity of the related temporal series. Their method applies the symbolic aggregate approximation algorithm to transfer the temporal series of terms into a small set of levels, leading to a string for each. They evaluated the quality of all discovered events in a 1-year stream, “googling” with the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. However, the above methods based on text analysis are seriously limited by the context provided by user messages. They cannot be applied for most generalized social networks with no available texts like telephone communications and friendship.

For event detection and evaluation based on graphic feature statistics, Priebe et al. [31] introduced a theory of scan statistics on graphs, and applied the ideas to solve the problem of anomaly detection in a time series of Enron email graphs. Mcculloh and Carley [27] proposed a social network change detection method, which monitors social networks to determine when the significant changes of their organizational structure occur and what caused them. Their method combines analytical techniques from social network analysis with statistical process control, which requires the use of statistical process control charts to detect changes in observable network measures. Iglesias and Zseby [17] addressed the feature selection problem for network traffic based anomaly detection. They proposed a multi-stage feature selection method using filters and stepwise regression wrappers. Their feature selection method reduces the original feature vectors from 41 to only 16 features, and the detection performance with five fundamentally different classifiers has significant reduction. However, the above methods based on graphic feature statistics need select valid features from a huge number of possible graph features. Ignoring the dynamics of network evolutions, the above methods based on graphic feature statistics are hard to evaluate event impacts in practical applications. As Akoglu et al. [2] summarized, the develop definitions and formulations for abnormalities in dynamic attributed graphs is insufficient, which limits them to find applications in dynamic social networks of the real world.

The benefit and disadvantage of the above two typical methods is further shown in Table 1. There is still not an effective method for event detection and evaluation in generalized social networks, which motivates us to propose a method from the aspect of link prediction in this paper.

Table 1 The comparison between two methods for event detection and evaluation

3 Problem statement

Impressive achievements of event detection and evaluation have been achieved in the social media like Facebook and tweet [10, 39]. Different from these narrow social media with available texts, we focus on event detection and evaluation in the generalized social networks that derive from the interactions among different individuals in various ways, such as friendship, club membership, co-working and information exchange.

Figure 2 shows the process of event detection and evaluation in generalized social networks. The social data reflects the social relationship of different individuals at different periods, which can be abstracted to a dynamic evolution network where nodes denote individuals and edges denote social relationships among individuals [46]. It is a common way to denote a dynamic evolution network by extracting its network snapshot set {g 1, g 2,  ... , g t,  … , g T − 1, g T}. T denotes the sum of snapshots, which is decided by actual needs. g t denotes the network snapshot at the moment t. The interval between two adjacent snapshots is usually constant.

Figure 2
figure 2

The process of event detection and evaluation in generalized social networks

A real-world event occurrence that is reflected in the objective social network is called as an event in this paper. We focus on the events that have impacts on the objective social network. And these events are critical to describe the social network situation. The social content just existed in social media cannot be used as input in our method for generalized social networks, and we need propose a general method (Me) just depended on the network snapshot set where each network snapshot consists of nodes (v) and edges (e). The problem is defined as follows.

Problem definition

At the current momentt, the social network snapshot set of the objective social network is extracted as {g 1(v 1, e 1), g 2(v 2, e 2),  ... , g t − 1(v t − 1, e t − 1), g t(v t , e t )} , how to judge whether events happen in the latest period (t − 1, t)? If we detect events in the period (t − 1, t), how to quantitatively evaluate its impact on the objective social network in the period (t − 1, t)? These can be denoted by Eq. (1).

$$ Me\left(\left\{{g}^1\left({v}_1,{e}_1\right),{g}^2\left({v}_2,{e}_2\right),\dots, {g}^{t-1}\left({v}_{t-1},{e}_{t-1}\right),{g}^t\left({v}_t,{e}_t\right)\right\}\right)\to {\left\{\zeta, E\right\}}_{\left(t-1,t\right)} $$
(1)

For the objective period (t − 1, t), the required result is a judge array with ζ (ζ ∈ {0, 1}) and E. When events happen in the period(t − 1, t), ζ = 1. Else, ζ = 0. And E denotes the event impact value that estimates the possible events.

Assuming the above problem is solved, we can observe the impact change of detected events in different periods. And we also can further predict the impact of an event in the future by many statistical analysis techniques like regression analysis [8]. In this paper, HQSII is proposed as the general method Me to solve this problem from the perspective of link prediction. It detects events by accurately discovering abnormal fluctuations and evaluates events by quantitatively analyzing fluctuation trends.

4 HQSII

OWA that describes network evolutions and FDA that evaluates network fluctuations are hybridized to propose HQSII, which accurately detects events and quantitatively evaluates event impacts. The framework of HQSII is shown in Figure 3, which is detailed as follows. (1) The mixed index is proposed to explore the cooperation of different independent similarity indexes. OWA determines the optimal mixed indexes to describe different social network evolutions. (2) Based on the optimal mixed index, FDA is proposed to quantitatively evaluate the fluctuations caused by events in social network evolutions at different periods, which considers the otherness of micro node evolutions to avoid the interferences of routine network fluctuations. (3) Based on OWA and FDA, HQSII selects the periods with relatively high impact values as the potential event occurrence periods, and evaluates the impacts of events by their fluctuation degree. It finally outputs the event detection sequence ({ζ, E}(1, 2),  ... , {ζ, E}(t − 1, t),  ... , {ζ, E}(T − 1, T)) that contains judge arrays in all periods. The process of HQSII is shown in Table 2.

Figure 3
figure 3

The framework of HQSII

Table 2 The process ofHQSII

In the rest of this section, Section 4.1 introduces the preliminary concepts of quantum information, similarity index and link prediction measurement. OWA and FDA of HQSII are detailed in Section 4.2 and Section 4.3, respectively.

4.1 Preliminary concepts

Quantum information

Quantum information is a new subject from the combination of quantum mechanics and information science. The basic storage unit in quantum information is quantum bit [12]. |0〉 and |1〉 denote tow polarization states of a quantum bit. The state of a quantum bit is denoted as P ic |0〉 + P is |1〉. And P ic and P is denote the probability amplitudes of |0〉 and |1〉, respectively. The quantum mechanism is usually applied to improve the performances of many optimization algorithms [22].

Link prediction

Link prediction has three common methods: the link prediction based on Markov chain or machine learning [30], the link prediction based on likelihood analysis [11], and the link prediction based on similarity indexes [26]. Compared with the first two methods with the problem of high computational complexity, the link prediction based on similarity indexes is more simple and efficient to achieve high prediction accuracy [26]. The common steps of link prediction based on similarity indexes are as followings: (1) Use a similarity index to calculate the probability score that the node pairs without links produce links; (2) According to the probability scores in descending order, output high ranked links as the prediction results. Many independent similarity indexes have been proposed, such as preferential attachment index (PA) [3], common neighbor index (CN) [33], Adamic-Adar index (AA) [1], Jaccard index (JA) [18], Sorensen index (SO) [38], HPI index (HPI) [34], salton index (SA) [35] and LNH-I index (LNH) [21]. The similarity indexes are used to calculate the similarity score of each node pair. When the similarity score of a node pair is high, the node pair has more probability to produce a link. S(i, j) denotes the similarity score of a node pair (i, j). For node i, Γ(i) and k(i) denote the neighbor node set and the degree, respectively. And k(i) = |Γ(i)|. PA, CN, AA, JA, SO, HPI, SA and LNH are defined as S(i, j) = k(i) × k(j) [3], S(i, j) = |Γ(i) ∩ Γ(j)| [33], \( S\left(i,j\right)={\displaystyle \sum_{z\in \varGamma (i)\cap \varGamma (j)}\frac{1}{l{g}^{k(z)}}} \) [1], \( S\left(i,j\right)=\frac{\left|\varGamma \left(\mathrm{i}\right)\cap \varGamma \left(\mathrm{j}\right)\right|}{\left|\varGamma \left(\mathrm{i}\right)\cup \varGamma \left(\mathrm{j}\right)\right|} \) [18], \( S\left(i,j\right)=\frac{2\left|\varGamma \left(\mathrm{i}\right)\cap \varGamma \left(\mathrm{j}\right)\right|}{k\left(\mathrm{i}\right)+k\left(\mathrm{j}\right)} \) [38], \( S\left(i,j\right)=\frac{\left|\varGamma \left(\mathrm{i}\right)\cap \varGamma \left(\mathrm{j}\right)\right|}{ \min \left\{k\left(\mathrm{i}\right),k\left(\mathrm{j}\right)\right\}} \) [34], \( S\left(i,j\right)=\frac{\left|\varGamma (i)\cap \varGamma (j)\right|}{\sqrt{k(i)\times k\left(\mathrm{j}\right)}} \) [35] and \( S\left(i,j\right)=\frac{\left|\varGamma \left(\mathrm{i}\right)\cap \varGamma \left(\mathrm{j}\right)\right|}{k\left(\mathrm{i}\right)\times k\left(\mathrm{j}\right)} \) [21], respectively.

Link prediction measurement

The measurement indexes of link prediction are Precision [14], RankingScore [49] and AUC [13]. Precision only considers the edges with higher score rank, and RankingScoreconsiders all ranked edges. Because of low computing complexity, AUC is the most commonly used measurement index, which evaluates the performance of link prediction algorithms comprehensively. AUC [13] is selected as the measurement index in this paper. Its definition is as follows.

$$ AUC=\frac{n^{\prime }+0.5{n}^{{\prime\prime} }}{n_o} $$
(2)

n o denotes the total comparison time, and n' denotes the comparison time that the similarity score of a link randomly selected from the test link set is higher than the similarity score of a link randomly selected from the inexistent link set. n" denotes the comparison times that the similarity score of a link randomly selected from the test link set is equal to the similarity score of a link randomly selected from the inexistent link set. The higher the AUC value is, the higher the link prediction accuracy is. When the scores of all comparison links are randomly produced, AUC = 0.5 in theory. To avoid the fluctuation of AUC caused by random selections, the edge in the test link set can compare with the edge in the inexistent link set one by one, when the computation power is available. And 672,400 comparisons are needed at most to satisfy that the absolute error of AUC does not exceed 1/10,000 with the confidence level of 90 % [20].

4.2 OWA

A real network evolution often contains various network evolution mechanisms. A link prediction algorithm based on an independent similarity index is difficult to fully describe a real-world social network evolution that is usually hybridized with multiplex features [47]. Exploring the cooperation of different similarity indexes, the mixed index is proposed in OWA. To fully describe the evolutions of different social networks, OWAfurther determines the optimal mixed indexes for different social networks efficiently.

Mixed index

The mixed index is proposed and defined by Eq. (3), which improves the universality of independent similarity indexes.

$$ MixSimIndex=f\left({w}_1,\dots, {w}_{\mu },\dots, {w}_n, SimInde{x}_1,\dots, SimInde{x}_{\mu },\dots, SimInde{x}_n\right) $$
(3)

where SimIndex μ  ∈ ϕ and \( {\displaystyle \sum_{\mu =1}^n{w}_{\mu }}=1 \).

SimIndex μ denotes an independent similarity index, which is called as a unit index of MixSimIndex.w μ is the corresponding weight of SimIndex μ . n denotes the number of unit indexes in MixSimIndex. ϕ denotes the similarity index set, and it should be noted that the new similarity indexes proposed in the future study can also be added into ϕ. All unit indexes are selected from ϕ. fdenotes the mixed function, and it enhances the flexibility of mixed indexes. When the scores of all comparison links are randomly produced, AUC = 0.5 in theory. So it is defined that only the similarity index in ϕ that makes AUC > 0.5 can be selected as a unit index.

$$ MixSimIndex={\displaystyle \sum_{\mu =1}^n{w}_{\mu}\times SimInde{x}_{\mu }}, $$
(4)

where SimIndex μ  ∈ ϕ and \( {\displaystyle \sum_{\mu =1}^n{w}_{\mu }}=1 \).

In the following paper, the mixed function f is denoted by Eq. (4), which emphasizes the proportions of different unit indexes. And ϕ contains eight common similarity indexes PA, CN, AA, JA, SO, HPI, SA and LNH. For example, if MixSimIndex = 0.7 × AA + 0.1 × SA + 0.2 × CN, the similarity score of the node pair (i, j) is calculated by \( S\left(i,j\right)=0.7\times {\displaystyle \sum_{z\in \varGamma (i)\cap \varGamma (j)}\frac{1}{l{g}^{k(z)}}}+0.1\times \frac{\left|\varGamma (i)\cap \varGamma (j)\right|}{\sqrt{k(i)\times k\left(\mathrm{j}\right)}}+0.2\times \left|\varGamma (i)\cap \varGamma (j)\right| \).

The independent similarity index (PA, CN, AA, JA, SO, HPI, SA or LNH) is a special example of MixSimIndex. For example, if only CN is selected form ϕ as the unit index, then the weight value of CN is 1 and MixSimIndex = CN. And the mixed index is equal to the independent index CN.

Optimal mixed index

The performance of a mixed index is closely related to the weights of its unit indexes. When a link prediction algorithm is consistent to the network evolution mechanism, the algorithm should provide accurate prediction [47]. A mixed index that its link prediction algorithm achieves the highest link prediction accuracy is defined as the optimal mixed index. The link prediction algorithm based on the optimal mixed index is most consistent with the network evolution mechanism, so the optimal mixed index can describe the corresponding network evolution best.

By introducing the quantum mechanism to improve the searching ability of PSO, OWA is proposed to determine the optimal mixed index efficiently in this section. Assuming the mixed index has n unit indexes, they are denoted as SimIndex 1, SimIndex 2 ,…. and SimIndex n , their corresponding weights w 1,w 2,…. andw n constitute the weight array W = (w 1, w 2,  .... , w n ). AUC (MixSimIndex(W)) denotes the AUC value of the mixed index corresponding to the weight array W, which is used as fitness function to conduct and evaluate the performance of a mixed index. The higher the fitness value is, the better the corresponding mixed index is. W il denotes the optimal weight array that the particle i has discovered until now, and its corresponding probability amplitude is P il . W g denotes the optimal weight array that the particle swarm has discovered until now, and its corresponding probability amplitude is P g .The three steps of OWA are as follows.

  1. (1)

    Produce the initial quantum particle swarm. A quantum state particle P i is encoded by Eq. (5).

$$ {P}_i=\left[\left|\begin{array}{c}\hfill \cos \left({\theta}_{i1}\right)\hfill \\ {}\hfill \sin \left({\theta}_{i1}\right)\hfill \end{array}\right|\begin{array}{c}\hfill \cos \left({\theta}_{i2}\right)\hfill \\ {}\hfill \sin \left({\theta}_{i2}\right)\hfill \end{array}\left|\begin{array}{c}\hfill ......\hfill \\ {}\hfill ......\hfill \end{array}\right|\left.\begin{array}{c}\hfill \cos \left({\theta}_{in}\right)\hfill \\ {}\hfill \sin \left({\theta}_{in}\right)\hfill \end{array}\right|\right] $$
(5)

Where θ ij  = 2π × rnd,rnd is produced in (0,1) randomly,i = 1 , 2 ,  .  .  .  , m and j = 1 , 2 ,  .  .  .  , n. m is the particle number of a quantum particle swarm. The increase of m contributes to the diversity of the initial weight arrays. n denotes the number of unit indexes in the optimal mixed index. The probability amplitudes of each quantum state particle are denoted as P is and P ic , respectively. Each quantum state particle occupies two sites corresponding to P is and P ic , which are denoted by Eqs. (6) and (7).

$$ {P}_{is}=\left( \sin \left({\theta}_{i1}\right), \sin \left({\theta}_{i2}\right),......, \sin \left({\theta}_{in}\right)\right) $$
(6)
$$ {P}_{ic}=\left( \cos \left({\theta}_{i1}\right), \cos \left({\theta}_{i2}\right),......, \cos \left({\theta}_{in}\right)\right) $$
(7)

The corresponding weight arrays of P is and P ic are obtained by Eqs. (8) and (9). The weight array W can be W is or W ic .

$$ {W}_{is}=\left(\frac{ \sin \left({\theta}_{i1}\right)}{{\displaystyle \sum_{\eta =1}^n \sin \left({\theta}_{i\eta}\right)}},\kern0.5em \frac{ \sin \left({\theta}_{i2}\right)}{{\displaystyle \sum_{\eta =1}^n \sin \left({\theta}_{i\eta}\right)}},....,\frac{ \sin \left({\theta}_{in}\right)}{{\displaystyle \sum_{\eta =1}^n \sin \left({\theta}_{i\eta}\right)}}\right) $$
(8)
$$ {W}_{ic}=\left(\frac{ \cos \left({\theta}_{i1}\right)}{{\displaystyle \sum_{\eta =1}^n \cos \left({\theta}_{i\eta}\right)}},\kern0.5em \frac{ \cos \left({\theta}_{i2}\right)}{{\displaystyle \sum_{\eta =1}^n \cos \left({\theta}_{i\eta}\right)}},....,\frac{ \cos \left({\theta}_{in}\right)}{{\displaystyle \sum_{\eta =1}^n \cos \left({\theta}_{i\eta}\right)}}\right) $$
(9)
  1. (2)

    Update weight arrays. Assuming the optimal sites are cosine sites, P il  = (cos(θ il1), cos(θ il2),  ... , cos(θ iln )) and P g  = (cos(θ g1), cos(θ g2),  ... , cos(θ gn )). The update of weight arrays is implemented by adjusting the probability amplitudes P is and P ic . After each iteration, P is and P ic will be adjusted as \( \overline{P_{is}} \) and \( \overline{P_{ic}} \) by Eq. (10), then \( {P}_{is}=\overset{\_}{P_{is}} \) and \( {P}_{ic}=\overset{\_}{P_{ic}} \).

$$ \begin{array}{l}{\overline{P}}_{is}=\left( \sin \left({\theta}_{i1}(t)+\varDelta {\theta}_{i1}\left(t+1\right)\right),....., \sin \left({\theta}_{in}(t)+\varDelta {\theta}_{in}\left(t+1\right)\right)\right),\hfill \\ {}{\overline{P}}_{is}=\left( \cos \left({\theta}_{i1}(t)+\varDelta {\theta}_{i1}\left(t+1\right)\right),....., \cos \left({\theta}_{in}(t)+\varDelta {\theta}_{in}\left(t+1\right)\right)\right),\hfill \\ {}\kern7em \varDelta {\theta}_{ij}\left(t+1\right)=w\varDelta {\theta}_{ij}(t)+{c}_1{r}_1\left(\varDelta {\theta}_1\right)+{c}_2{r}_2\left({\varDelta \theta}_g\right),\hfill \end{array} $$

where

$$ \varDelta {\theta}_l=\left\{\begin{array}{c}\hfill 2\pi +{\theta}_{ilj}+{\theta}_{ij,}\left({\theta}_{ilj}-{\theta}_{ij}<-\pi \right)\hfill \\ {}\hfill {\theta}_{ilj}-{\theta}_{ij,}\left(-\pi \le {\theta}_{ilj}-{\theta}_{ij}\le \pi \right)\hfill \\ {}\hfill {\theta}_{ilj}-{\theta}_{ij}-2{\pi}_{,}\left({\theta}_{ilj}-{\theta}_{ij}>\pi \right)\hfill \end{array}\right.\kern0.5em \mathrm{and}\kern0.5em \varDelta {\theta}_g=\left\{\begin{array}{c}\hfill 2\pi +{\theta}_{gj}+{\theta}_{ij,}\left({\theta}_{gj}-{\theta}_{ij}<-\pi \right)\hfill \\ {}\hfill {\theta}_{gj}-{\theta}_{ij,}\left(-\pi \le {\theta}_{gj}-{\theta}_{ij}\le \pi \right)\hfill \\ {}\hfill {\theta}_{gj}-{\theta}_{ij}-2{\pi}_{,}\left({\theta}_{gj}-{\theta}_{ij}>\pi \right)\hfill \end{array}\right. $$
(10)
  1. (3)

    Mutation operation of weight arrays. OWA avoids the diversity loss by mutation operation by Eq. (11). p ro denotes the mutation probability. rnd i is randomly produced in (0,1). If rnd i  < p ro , ⌈n/2⌉ quantum bits of the quantum state particle are randomly selected to achieve the mutation process by Eq. (11).

$$ \left[\begin{array}{cc}\hfill 0\hfill & \hfill 1\hfill \\ {}\hfill 0\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill \cos \left({\theta}_{ij}\right)\hfill \\ {}\hfill \sin \left({\theta}_{ij}\right)\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \cos \left(\frac{\pi }{2}-{\theta}_{ij}\right)\hfill \\ {}\hfill \sin \left(\frac{\pi }{2}-{\theta}_{ij}\right)\hfill \end{array}\right] $$
(11)

The Eqs. (5), (6), (7), (8), (10) and (11) main offer an operating setting corresponding to PSO. The introduction of quantum information mainly focuses on the mutation operation by Eq. (12). The mutation operation of weight arrays can effectively avoid the problem of early convergence in PSO. When m quantum state particles go through g max iterations, the process of OWA is introduced as follows.

figure a

OWA avoids considering the magnitude difference of different unit indexes. When more than one independent similarity indexes are qualified to be unit indexes, we can constitute a mixed index with all of them as unit indexes. With enough iterations in OWA, the weights of unit indexes that yield the increase of AUC will be aligned with 0 automatically.

4.3 FDA

Based on the optimal mixed index, FDA is proposed to quantify the network evolution fluctuation caused by social events, which considers the otherness of micro node evolutions to avoid the interference of the routine network fluctuation.

The otherness of node evolutions

Achieving the highest link prediction accuracy in the current period, the link prediction algorithm based on the current optimal mixed index is most consistent with the current network evolution mechanism. Under normal circumstances without influencing events, the current network evolution mechanism will remain stable for a certain period of time, so the link prediction algorithm based on the current optimal mixed index will still obtain the relatively high link prediction accuracy in the following periods. When the link prediction accuracy decreases severely, we can guess that events happen in the corresponding period. And the events disturb the internal network evolution mechanism and cause the decrease of the link prediction accuracy.

However, the link prediction accuracy is just a macroscopic performance of network evolutions, which ignores the otherness of node evolutions. If the topology change of a node is consistent with the network inherent evolution, the node can be regarded as a normal evolution node, and its effect on the network evolution fluctuation is small. If the topology change of a node is not consistent with the network evolution, it may be caused by event occurrences. Considering the otherness of micro node evolutions is helpful to quantify the network evolution fluctuation caused by events more precisely, and it effectively avoids the interference of routine network fluctuations. Assuming AUC t denotes the AUC at the moment t, we propose \( AU{C}_i^t \) by introducing AUC t at the micro level to consider node evolutions, which evaluates the consistent degree between the topology change of node i and the network internal evolution. For \( AU{C}_i^t \), the links connected with node i constitute the test link set, and the inexistent links that connects the existing nodes with node i constitute the inexistent link set.

Based on \( AU{C}_i^t \) ,we further propose the measurement index mAUC t to comprehensively consider the evolutions of all nodes in social networks. At the moment t, Γ(t) denotes the node set, and D t denotes the sum of nodes in Γ(t). mAUC t is denoted by Eq. (12).

$$ mAU{C}^t=\left(\frac{{\displaystyle \sum_{i\in \varGamma (T)} AU{C}_i^t}}{D_t}\right)\times AU{C}^t $$
(12)

Quantify fluctuations caused by events

T O denotes a historical moment without influential event occurrences, which can be set flexibly. And \( BMixSimInde{x}_{T_O} \) is used to denote the optimal mixed index at T O . Both \( AU{C}_i^t \) and AUC t are calculated based on \( BMixSimInde{x}_{T_O} \). To quantify the network fluctuations caused by events, we propose the event impact value E. And E t denotes the event impact value of possible events at the period (t ‐ 1, t), which is calculated by Eq. (13).

$$ {E}^t=\left\{\begin{array}{l}mAU{C}^t-\frac{{\displaystyle \sum_{\eta =t-\varDelta t}^{t-1}mAU{C}^{\eta }}}{\varDelta t}\\ {}mAU{C}^t,\kern0.5em \left(\varDelta t=0\right)\end{array}\right.,\kern0.5em \left(0<\varDelta t\; and\kern0.5em \varDelta t\kern0.5em is\kern0.5em \mathrm{interger}\right) $$
(13)

Δt denotes the memory time. When multiple events cause the consecutive network fluctuation, Δt should be aligned with small values to avoid the interference among different events. When Δt = 0, E t is decided by mAUC t directly. The higher E t is, the bigger the network evolution fluctuation caused by possible events is. The process of FDA is introduced as follows.

figure b

5 Experiments and discussions

To better verify the proposed algorithms OWA and FDA in HQSII, we analysis them by typical social network models in Section 5.1. In Section 5.2, extensive comparison experiments in theoretical and real social networks are conducted to verify the performance of HQSII.

5.1 Analysis of OWA and FDA

Watts and Strogatz [44] proposed the important WS small world network, which is between the regular network and the random network. Based on the evolution mechanism of priority connection, Barabasi and Albert [4] proposed the typical BA scale-free model. Because of their flexible constructions, WS small world model and BA scale-free model can be applied to analyze OWA and FDA powerfully.

The feasibility of mixed index in OWA

To analyze the feasibility of the mixed index in OWA, a network instance N of WS small world model is constructed: 200 nodes are produced, each node has four neighbor nodes, and the probability of randomization reconnection is 0.3. As a network snapshot, N can be used to analyze the mixed indexes in OWA. For N, the existing links constitute the test link set, and the inexistent links constitute the inexistent link set. Precision is also introduced to evaluated the link prediction precision. Assuming the test link set has sn edges, we rank the edges in the test link set and inexistent link set together according to their similarity scores. Then we count the edge number of the top sn edges existed in the test link set, which denotes the value of Precision. The link prediction algorithms based on different independent similarity indexes are conducted, and their results are shown in Table 3.

Table 3 AUC and Precision of different similarity indexes in N

When AA and HPI, CN and PA are selected, and their corresponding AUC and Precision values are shown in Table 4. From Tables 3 and 4, by the cooperation of different independent similarity indexes, the mixed indexes have the ability to achieve higher prediction values than independent similarity indexes. But not all mixed indexes can achieve higher prediction values than independent similarity indexes. Only when the unit indexes of a mixed index are aligned with appropriate weights, it can achieve a high prediction value. When MixSimIndex is 0.5 × AA + 0.5 × HPI and 0.9 × CN + 0.1 × PA , they all get AUC and Precision values that are higher than eight independent indexes. But when MixSimIndex is 0.1 × AA + 0.9 × HPI and 0.1 × CN + 0.9 × PA, their prediction value is lower than some independent similarity indexes.

Table 4 AUC and Precision of mixed indexes in N

The consistency analysis in FDA

To better explain the consistency between the change of the link prediction measurement index and the network evolution fluctuation in FDA, we construct network instances N1 and N2 based on the BA scale-free model. N1 is the network evolution of the BA scale-free model, and N2 is the designed network evolution with event occurrences in the BA scale-free model. The construction steps of N1 are as follows. (1) The initial network is empty, and a node is added at the first time step. (2) From the second time step to the 200th time step, a new node is added to link the existing node that owns the largest degree at each time step. The construction steps of N2 are as follows. (1) The normal network evolution phase (from the first time step to the 120th time step): a new node is added at the first time step. From the second time step to the 120th time step, a new node is added to link the existing node that owns the largest degree at each time step. (2) The event occurrence phase (from the 121th time step to the 159th time step): a new node is added to link one of the existing nodes randomly at each time step. (3) The network evolution recovery phase (from the 160th time step to the 200th time step): a new node is added to link the existing node that owns the largest degree at each time step.

Compared with N1, N2 applies the designed evolution mechanism from the 121th time step to the 159th time step to simulate the event occurrence. Before the end of the 120th time step, N1 and N2 have the same network mechanism. When T O  = 120, the AUC 120of different independent similarity indexes in N1 and N2 are shown in Table 5.

Table 5 AUC 120of different similarity indexes in N1 and N2

Only AUC 120 of PA is higher than 0.5, and PA is selected as the single unit index in the mixed index, so BMixSimIndex 120 = PA. The AUC values of the link prediction algorithms based on BMixSimIndex 120 = PA and other similarity indexes are shown in Figure 4. Because SA, JA, SO, HPI and LNH are similar with CN and AA, only CN and AA are demonstrated to make Figure 4 clear.

Figure 4
figure 4

The change of AUC of different similarity indexes

As shown in Figure 4, when the appropriate similarity index is selected, the change of the link prediction measurement index can be consistent with the network evolution fluctuations. For N2, the change of AUC of PA reflects the network evolution fluctuation. The AUC value of PA is higher than 0.7 in the normal network evolution phase. But the AUC value of PA decreases severely in the network event occurrence phase, because of the change of the network internal evolution mechanism. In the network evolution recovery phase, the AUC value of PA rises gradually. For N1 without event occurrences, the AUC value of PA is always higher than 0.7. Meanwhile, for both N1 and N2, the AUC values of CN, AA, JA, SO, HPI, SA and LNH all have no obvious changes, and they cannot reflect the network evolution fluctuation.

5.2 The performance of HQSII

After detailing the data sets, the computation complexity of HQSII is analyzed by introducing comparison methods. The performances of HQSII to describe network evolutions and evaluate network fluctuations are verified by extensive comparison experiments.

Data sets

In order to verify the performance of the proposed HQSII, the constructed instance N2, the real telephone communication network VAST and the email network ENRON are used as experimental data sets. N2 derives from the famous BA scale-free model. VAST and ENRON are typical event detection data sets in social network. The reason why we select N2, VAST and ENRON is that their event occurrence times and occurrence reasons are known clearly. It is important for us to compare experiment results with real data precisely. The three cases are detailed as follows.

  1. (1)

    N2: As constructed in Section 5.1, the constructed instance N2 is a designed network evolution that a defined event happens in the evolution of the BA scale-free model.

  2. (2)

    VAST: The data set derives from IEEE VAST 2008 (http://www.cs.umd.edu/hcil/VASTchallenge08), which contains the call data network in a social network of 400 staffs in 10 days. And the major management change happens between the seventh day and the eighth day. The event in VAST is single, which is helpful to analyze the changes of a network inherent evolution before and after the event.

  3. (3)

    ENRON: The data set contains the internal email data of 150 staffs in 111 weeks in the company of Enron (http://www.cs.cmu.edu/~enron). And 20 representative weeks are selected, which involves events like company acquisition and bankruptcy. The events in ENRON are multiple, which are conducive to analysis the performance of HQSII in social networks with multiple events.

The computation complexity analysis

Considering the difference of running environment, programming languages and coding styles, HQSII is evaluated by the computation complexity analysis in O notation that counts critical programming statements in iterations. Because HQSII consists of OWA and FDA, the computation complexity can be analyzed from OWA and FDA. To verify the performance of OWA, the differential search algorithm (DSA) method [25], the genetic algorithm (GA) method [42], the PSO method [36] and the comprehensive learning particle swarm optimization (CLPSO) method [15] are introduced as four comparison methods against the proposed OWA in HQSII. And three typical and latest methods of generalized social networks are also introduced as comparison methods to verify the performance of FDA, which are the structure analysis (SC) method [27], the feature selection (FS) method [17] and the graph similarity (GS) method [16].

n o denotes the total comparison time. m quantum state particles go through g max iterations. The mixed index has n unit indexes. OWA has the same computation complexity with four comparison methods including initialization, evaluation and update, which is shown in Table 6. OWA usually has enough computing time to apply in real applications, because the optimal mixed index needs to be demined only once in the proposed method HQSII. The following detection and evaluation of events are carried out by FDA based on the given optimal mixed index.

Table 6 The computation complexity of five methods

Assuming Ddenotes the sum of nodes, the computation complexity of FDA is O(D × n O ). Because of GS, FS and SC need traverse all node pairs, their corresponding computation complexities are all O(D × D). For link prediction, 672,400 comparisons are needed at most to satisfy that the absolute error of AUC does not exceed 1/10,000 with the confidence level of 90 % [20]. When FDA is applied into the large-scale social network, its computation advantage is obvious with the computation complexity O(D × 672400).

Network evolution descriptions

HQSII describes the evolutions of social networks by OWA that discovers the optimal mixed indexes efficiently. For OWA, the numbers of particles and iterative times are decided by actual needs. As recommended by the study about PSO [37], the number of particles and iterative times are set as 150 and 200 in the following experiments. The fitness function, the swarm size and the iterative time of four comparison methods are same with OWA, and their other parameters are set in accordance of their original settings.

It is easy to determine that \( BMixSimInde{x}_{T_O}=PA \), because only PA makes AUC > 0.5 for N2 in Table 5. The \( AU{C}^{T_O} \) of different similarity indexes in VAST and ENRON are shown in Table 7. And the first day and the first week are selected as their corresponding T O . From Table 7, the mixed index in VAST includes unit indexes of CN, PA, AA, SA, JA, SO, HPI and LNH. And the mixed index in ENRON includes unit indexes of CN, PA, AA, SA, JA and LNH. When the five methods are applied to determine the optimal mixed index, the fitness value boxplots that contain all discovered fitness values in the iterations are shown in Figure 5. For VAST and ENRON, the parameter values by OWA for different similarity are shown in Table 8. The conclusions are made as follows.

Table 7 \( AU{C}^{T_O} \)of different similarity indexes in the data set of VAST and ENRON
Figure 5
figure 5

Boxplots of discovering fitness values (AUC) for five methods in VAST and ENRON

Table 8 The parameter values by OWA for different similarity indexes in VAST and ENRON
  1. (1)

    OWA is more effective to find the optimal mixed with highest fitness values than four comparison methods, which is benefited from its quantum mechanism that weakens the premature convergence. When the optimal mixed index is an independent similarity index in N2, the five methods have no effect. In other case, OWA has all discovered the largest optimization ranges. And the highest fitness value achieved by OWA is significantly higher than four comparison methods, which are 0.5158 and 0.6260.

  2. (2)

    Compared with the independent similarity indexes, the more complex the evolution mechanism is, the bigger the advantage of the optimal mixed index is. In ENRON, the multiple events cause large fluctuations in the network evolution, and the fitness value of the optimal mixed index improves 0.0424 than independent indexes. But in VAST with a single event, the fitness value just improves 0.0012. In N2 with a single event, the optimal mixed index is even equivalent to PA.

Evaluate network fluctuations

To evaluate the abnormal fluctuations caused by events in N2, VAST and ENRON quantitatively, FDA in HQSII are applied to obtain their event impact values at different periods. And our proposed method is verified by comparison experiments. The standardizing event impacts of N2, VAST and ENRON are shown in Figure 6.

Figure 6
figure 6

The standardizing event impact value by four methods inN2, VAST and ENRON

The event impact by FDA can better reflect the abnormal fluctuation caused by events than three comparison methods in the social network with a single event and multiple events. As shown in Figure 6, when events happen in the three cases, the five methods all have relatively high event impact values and the event impact of FDA is higher than three comparison methods. For the three cases, the ratio of the average event impact values in event occurrence periods and normal periods is shown in Table 9, FDA is more sensitive to detect the abnormal fluctuations caused by events. And the ratio of FDA is significantly higher than three comparison methods. For N2, compared with the event impact value in the normal network evolution phase (40th to 120th time step), the event impact value significantly increases in the event occurrence phase (from 121th to 159th time step). For VAST, the event impact values all decline dramatically at the sixth and seventh days, because of the major management changes between the seventh day and the eighth day. And their event impact values rise to the normal level at the eighth day, which shows that the impact of the major management changes weakens gradually in the social network. For ENRON, Table 10 shows the contrast between the detected events and real events. The event impact values have good matching relations with real events. The Event 2 severely disrupts the network internal evolution mechanism and causes the rapid decrease of the event impact value.

Table 9 The ratio of average event impact values in event occurrence periods and normal periods
Table 10 The contrast between detected events and real events in ENRON

6 Conclusions and prospects

In this paper, we propose a hybrid quantum swarm intelligence indexing method for generalized social networks with limited available information, which not only detects events by discovering abnormal fluctuations, but also evaluates event impacts by analyzing fluctuation trends. It consists of two algorithms OWA and FDA. Extensive comparison experiments about WS small world model, BA scale-free model, VAST and ENRON are implemented. The conclusions are made as follows.

  1. (1)

    The cooperation of different independent similarity indexes is conductive to archive higher link prediction accuracy than other independent similarity indexes. When the appropriate similarity index is selected, the change of the link prediction measurement index can be consistent with the network evolution fluctuations. The optimal mixed index by OWA can describe the network evolution fully with the highest fitness value.

  2. (2)

    The event impact by FDA can obviously reflect the network fluctuations caused by events in social networks with a single and multiple events. The consideration of the node evolution otherness improves the sensitivity to detect the network fluctuations caused by events.

  3. (3)

    In instances of WS small world model, the designed BA scale-free model, VAST and ENRON, the events detected and evaluated by the proposed method have good matching relations with real events, which proves the accuracy and practicability of the proposed method.

Despite the proposed method is promising, further researches are still needed, which include: (1) how to determine the proportion of different network evolution mechanisms in real-world social networks; (2) how to further explore the relationship of existing similarity indexes; (3) how to propose more effective mixed functions.