
1 Introduction

Social influence refers to the change in behaviour, or in the performance of actions, of an individual brought about by interactions with other individuals. It is known to be a causal link for other well-documented phenomena observed in social networks, such as social competition, peer pressure, homophily, information spread and network evolution. Influence is also important in constraining the flow of dynamics within a network [1].

In online social networks, social influence may depend on factors such as the strength of relationships between nodes, the distance between nodes, the number of paths from a node to its neighbors and the characteristics of the individuals in the network [2]. However, to develop a computational model that measures influence quantitatively and qualitatively, a statistic-based measure is required. Statistic-based measures proposed in the literature for measuring influence include centrality, betweenness, closeness and decay. These methods focus only on the structural characteristics of a social network to calculate the influence of its nodes. The Influence Maximization technique goes beyond such simple statistical measures and provides an alternative way to quantify influence. However, it is a subset selection problem and hence fundamentally different from influence ranking, which is a measurement problem; it is therefore not the focus of the present inquiry.

Joint and conditional probabilistic frameworks are suitable candidates for measuring influence, since such techniques have been experimentally validated over a wide range of scenarios. Their advantage is that they can be trained rapidly on data, which makes them suitable for deployment on massive social networks such as Twitter and Facebook. However, a key aspect that determines the success of these methods is the selection of suitable predictors for training. This inquiry selects statistics about user interactions as predictors for training four probabilistic models. Extensive experiments are then performed with these frameworks on Twitter data, and standard evaluation metrics are used to select the best-performing model among them. The aim of this work is to extend the literature on the use of probabilistic models to rank nodes by influence.

2 Review of Literature

Edge and node measures for calculating centrality rely on structural features while ignoring the attribute-level data of a node and its interaction characteristics in the network. To overcome these drawbacks, the Influence Maximization technique and probabilistic generative models were proposed.

2.1 Quantifying Influence

A formal computational definition of influence is proposed in [3]. Here \(x_i\) and \(x_j\) denote the nodes, t and \(t-1\) are consecutive time instants, and \(a_{ij}^t\) denotes the value of the adjacency matrix at time instant t for nodes i and j.

$$\begin{aligned} \frac{p((x_i^t, x_j^t)> (x_i^{t-1}, x_j^{t-1}) | a_{ij}^{t-1} = 0, a_{ij}^t = 1)}{p((x_i^t, x_j^t) > (x_i^{t-1}, x_j^{t-1}) | a_{ij}^{t-1} = 0)} \end{aligned}$$
(1)

The numerator is the conditional probability that two nodes that were not linked at instant \(t-1\) but became linked at t have seen an increase in their similarity. The denominator is the conditional probability that two nodes that were not linked at \(t-1\) see an increase in their similarity at t compared to \(t-1\). This method, however, does not differentiate influence from various angles [2].
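As a concrete illustration of Eq. 1, the following sketch estimates the numerator and denominator from two consecutive network snapshots. The snapshot representation and the use of cosine similarity over attribute vectors are assumptions made here for illustration; Eq. 1 itself does not fix a similarity measure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def influence_ratio(X_prev, X_curr, A_prev, A_curr):
    """Estimate Eq. 1 from snapshots at t-1 and t.

    X_prev, X_curr: per-node attribute vectors at t-1 and t.
    A_prev, A_curr: 0/1 adjacency matrices at t-1 and t."""
    n = len(X_prev)
    num_hits = num_total = den_hits = den_total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if A_prev[i][j]:                 # consider only pairs unlinked at t-1
                continue
            increased = cosine(X_curr[i], X_curr[j]) > cosine(X_prev[i], X_prev[j])
            den_total += 1
            den_hits += increased
            if A_curr[i][j]:                 # the pair became linked at t
                num_total += 1
                num_hits += increased
    numerator = num_hits / num_total if num_total else 0.0
    denominator = den_hits / den_total if den_total else 0.0
    return numerator / denominator if denominator else float("nan")
```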

2.2 Influence Maximization (IM)

The formal definition of the problem is: given a social influence graph G(V, E), with V and E representing vertices (individuals) and edges (social relationships) respectively, p(u, v) is the probability that v is activated by an already active node u sharing the directed edge (u, v). In the independent cascade model, a small set of seed nodes is activated first; an activated node u can then activate its neighbor v with probability p(u, v). The influence spread is the expected number of nodes eventually activated, and the problem is to choose a seed set that maximizes this spread [4]. IM is NP-hard, and hence a greedy approximation algorithm is used that theoretically guarantees an influence spread within a factor \((1 - 1/e) \approx 63\%\) of the optimum. However, the greedy algorithm requires evaluating the influence spread of a given seed set, and this step is time consuming.
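A minimal sketch of this greedy procedure under the independent cascade model is shown below. The Monte Carlo spread estimation, the adjacency-list graph representation and the uniform activation probability p are assumptions made for illustration.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One IC cascade: each newly activated u activates neighbor v with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v in graph.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

def expected_spread(graph, seeds, p, rng, runs=1000):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def greedy_im(graph, k, p=0.1, runs=1000, seed=0):
    """Greedily add the node with the largest marginal gain in estimated spread."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: expected_spread(graph, seeds + [v], p, rng, runs))
        seeds.append(best)
    return seeds

# Example: greedy_im({'a': ['b', 'c'], 'b': ['c'], 'c': []}, k=1)
```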

In contrast to this line of work, statistics of the interactions made by nodes have also been explored for measuring their influence in the network. Interaction occurs in the network through activity performed by a node, responses received from its neighbors to such activity, propagation of the activity further through the network, reactions triggered in nodes not directly connected to each other, and so on. Klout rank is one such measure that ranks nodes in a network by influence based on their interaction statistics [5]. These statistics are collected from the multiple social networks in which a node participates. Klout rank uses a feature vector of 3600 attributes, which are statistics of user interaction on the social platform, and analyzes interactions between the user and other participants on the social network to generate a score. Alexy et al. investigated the use of machine learning to calculate influence ranks from data generated by users' social network activity [6]. Behnam et al. modelled the influence of a node in a network based on metrics such as the number of followers and the ratio of affection [6]. The purpose of the current inquiry is to advance existing knowledge in the field of influence analysis by providing an experimental evaluation of conditional and joint probabilistic learning techniques for modelling the influence of nodes in social networks.

3 Mathematical Model

Learning techniques build a function \(g \in G\), where G represents the hypothesis space. The function \(g : X \rightarrow Y\) maps the input space X to the output space Y. If F denotes the space of scoring functions f such that \(f: X \times Y \rightarrow \mathbb {R}\), then g is as shown in Eq. 2.

$$\begin{aligned} g(x) = \mathop {\arg \max }\limits _{y} f(x, y) \end{aligned}$$
(2)

A suitable conditional probability model or joint probability model is then chosen to map f to g. For a conditional probability model, g is computed from P(y|x); for a joint probability model, from P(x, y). A method to choose the appropriate model is structural risk minimization, in which a regularization penalty, the \(L_2\) norm \(\sum _j \beta _j^2\), is incorporated into the cost function \(J(\theta )\) to limit over-fitting.
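For illustration, assuming a conditional model trained with logistic loss on the pairwise data (the original does not fix a particular loss), the regularized cost function takes the form

$$\begin{aligned} J(\theta ) = -\frac{1}{n}\sum _{i=1}^{n}\Big [\, y_i \log p_i + (1-y_i)\log (1-p_i) \,\Big ] + \lambda \sum _j \beta _j^2, \qquad p_i = P(y_i = 1 \mid x_i; \theta ) \end{aligned}$$

where \(\lambda \) controls the strength of the \(L_2\) penalty.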

3.1 Learning Technique for Ranking Influential Nodes

Conditional or joint probability models are then used in the technique below to identify the most suitable model among them for comparing influence and ranking nodes accordingly.

[Algorithm figure: learning technique for ranking influential nodes (not reproduced here)]
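Since the algorithm figure is not reproduced here, the following sketch illustrates one plausible reading of the technique: a trained pairwise model (conditional or joint) compares every pair of users, and users are ranked by their number of pairwise wins. The round-robin scheme and the function names are assumptions, not the authors' published algorithm.

```python
from itertools import combinations

def rank_by_pairwise_model(predict_pair, features):
    """Rank users with a trained pairwise comparison model.

    predict_pair(a, b) -> 0 if the user with features `a` is judged more
    influential, 1 otherwise (matching the label convention in Sect. 4.1).
    `features` maps user id -> interaction-feature vector."""
    wins = {u: 0 for u in features}
    for u, v in combinations(features, 2):
        if predict_pair(features[u], features[v]) == 0:
            wins[u] += 1
        else:
            wins[v] += 1
    # Users with more pairwise wins are ranked higher.
    return sorted(wins, key=wins.get, reverse=True)
```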

4 Experimental Study

4.1 Data-Set

The data-set used for the experiment consists of features describing user interaction characteristics (Tables 1 and 2) extracted from Twitter, an online social network [6]. Twitter is modelled as a graph \(G = (V,E)\) with n nodes, each having k attributes formed from its interaction characteristics, which are collected from the node's activity observed on the online social network. Any two nodes A and B of G are picked and a feature vector \(x_i\) is built by combining their individual interaction characteristics, i.e. \(x_i = (a_i, b_i)\). This is used to build a dataset X containing training samples of the form \((x_1, y_1), \ldots , (x_n, y_n)\), where \(x_i\) is the feature vector of the ith example. The corresponding class label of \(x_i\) is \(y_i\), which represents the human judgement about which of the two individuals in \(x_i\) is more influential. Thus \(y_i \in \{0, 1\}\), where \(y_i = 0\) means the first user is more influential and \(y_i = 1\) means the second user is more influential.
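A minimal sketch of this pairwise data-set construction is shown below; the dictionary-based inputs and function name are illustrative assumptions.

```python
import numpy as np

def build_pairwise_dataset(user_features, judgments):
    """Build training pairs (x_i, y_i) as described in Sect. 4.1.

    user_features: dict mapping user id -> vector of interaction statistics.
    judgments: iterable of (user_a, user_b, label), where label = 0 if user_a
               was judged more influential and 1 if user_b was.
    Each feature vector x_i = (a_i, b_i) concatenates the two users' statistics."""
    X, y = [], []
    for a, b, label in judgments:
        X.append(np.concatenate([user_features[a], user_features[b]]))
        y.append(label)
    return np.array(X), np.array(y)
```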

The learning models described in Sect. 4.2 are used in the influence learning strategy proposed in Sect. 3.1. The performance metrics used for selecting the appropriate model are accuracy, kappa, area under the ROC curve, specificity, sensitivity, log-loss, precision and recall.

Table 1 Description of the data-set
Table 2 Attributes in the feature vector

4.2 Result

4.3 Artificial Neural Network: [7, 8]

This model is a single-hidden-layer feed-forward neural network whose tunable parameters, the number of hidden units (nhid) and the activation function (actfun), are chosen using random hyper-parameter optimization [9]. The optimal model was selected by fivefold cross-validation repeated 5 times on the basis of the accuracy metric; the other metrics are not computed for this model because it does not produce class probabilities (Table 3).
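The sketch below mirrors this tuning procedure with random search and fivefold cross-validation repeated 5 times, scored on accuracy. The original experiments appear to use an R/caret extreme learning machine (the "Ex-LM" of Table 3), so the scikit-learn MLPClassifier stand-in, the synthetic placeholder data and the search ranges are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=22, random_state=0)  # placeholder data

search = RandomizedSearchCV(
    MLPClassifier(max_iter=2000),
    param_distributions={
        "hidden_layer_sizes": [(h,) for h in range(5, 101, 5)],  # nhid analogue
        "activation": ["logistic", "tanh", "relu"],              # actfun analogue
    },
    n_iter=20,                      # random hyper-parameter optimization
    scoring="accuracy",
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```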

Table 3 Parameters of the Optimal Ex-LM model

4.4 Random Forests (Rf): [10]

Random forests were the next model trained on the interaction characteristics. The tunable parameter for this model is the number of randomly selected predictors considered at each split (mTry). Four optimal models were selected, one each for the metrics accuracy, area under the ROC curve (ROC), area under the precision-recall curve (AUC) and log-loss. Fivefold cross-validation with 5 repeats was used (Table 4).
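A sketch of this tuning step is given below; scikit-learn's max_features plays the role of caret's mTry, and the candidate values and placeholder data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

X, y = make_classification(n_samples=500, n_features=22, random_state=0)  # placeholder data

search = GridSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=0),
    param_grid={"max_features": [2, 4, 8, 12, 16, 22]},  # mTry candidates
    scoring="roc_auc",        # repeat with "accuracy", "neg_log_loss", "average_precision"
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```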

Table 4 Parameters of the Optimal Rf model

4.5 Stochastic Gradient Boosted Trees (GBM): [11]

Stochastic gradient boosted trees were the next model trained on the interaction characteristics. The tunable parameters for this model are the number of boosting iterations (n.trees), the maximum tree depth (interaction.depth), the shrinkage (shrinkage) and the minimum terminal node size (n.minobsinnode). Four optimal models were selected, one each for the metrics accuracy, area under the ROC curve (ROC), area under the precision-recall curve (AUC) and log-loss. Fivefold cross-validation with 5 repeats was used (Table 5).
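The sketch below maps the named caret parameters to scikit-learn's GradientBoostingClassifier; the grid values and placeholder data are assumptions, and subsample < 1 gives the stochastic variant of boosting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

X, y = make_classification(n_samples=500, n_features=22, random_state=0)  # placeholder data

search = GridSearchCV(
    GradientBoostingClassifier(subsample=0.8, random_state=0),  # subsample < 1 => stochastic boosting
    param_grid={
        "n_estimators": [50, 100, 150],   # n.trees
        "max_depth": [1, 2, 3],           # interaction.depth
        "learning_rate": [0.1],           # shrinkage
        "min_samples_leaf": [10, 20],     # n.minobsinnode
    },
    scoring="neg_log_loss",   # repeat with "accuracy", "roc_auc", "average_precision"
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```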

Table 5 Parameters of the Optimal GBM model

4.6 Extreme Gradient Boosted Trees (xgbTree): [11]

The extreme gradient boosted trees model was then trained on the interaction characteristics. The tunable parameters for this model are the boosting iterations (nrounds), the maximum tree depth (\(max-depth\)), the shrinkage (eta), the minimum loss reduction (gamma), the sub-sample ratio of columns (\(col-bt\)), the minimum sum of instance weights (\(min-cw\)) and the sub-sample percentage (subsample). Four optimal models were selected, one each for the metrics accuracy, area under the ROC curve (ROC), area under the precision-recall curve (AUC) and log-loss. Fivefold cross-validation with 5 repeats was used (Table 6).
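The sketch below maps these parameters onto the xgboost Python API; the grid values and placeholder data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=22, random_state=0)  # placeholder data

search = GridSearchCV(
    XGBClassifier(),
    param_grid={
        "n_estimators": [50, 100, 150],   # nrounds
        "max_depth": [2, 3, 4],           # max tree depth
        "learning_rate": [0.1, 0.3],      # eta (shrinkage)
        "gamma": [0],                     # minimum loss reduction
        "colsample_bytree": [0.6, 0.8],   # column sub-sample ratio (col-bt)
        "min_child_weight": [1],          # minimum sum of instance weights (min-cw)
        "subsample": [0.75, 1.0],         # sub-sample percentage
    },
    scoring="roc_auc",        # repeat with "accuracy", "neg_log_loss", "average_precision"
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```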

Table 6 Parameters of the Optimal xgbTree model

The optimal models selected by the above procedure were then evaluated on the test data. Based on the evaluation metrics, it can be inferred that the xgbTree models fit the test data better than the other ML techniques applied to the training dataset. xgbTree models are ensembles of boosted decision trees with built-in regularization, which helps limit over-fitting.
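A sketch of the test-set evaluation underlying Table 7 is given below, computing the metrics listed in Sect. 4.1 for a fitted probabilistic classifier; the variable names and the 0.5 decision threshold are assumptions.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, confusion_matrix,
                             log_loss, precision_score, recall_score, roc_auc_score)

def evaluate(clf, X_test, y_test):
    """Compute the evaluation metrics of Sect. 4.1 on held-out test data."""
    prob = clf.predict_proba(X_test)[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    return {
        "Accuracy":    accuracy_score(y_test, pred),
        "Kappa":       cohen_kappa_score(y_test, pred),
        "ROC AUC":     roc_auc_score(y_test, prob),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Log-loss":    log_loss(y_test, prob),
        "Precision":   precision_score(y_test, pred),
        "Recall":      recall_score(y_test, pred),
    }
```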

Table 7 Results of Optimal ML based model on test data

Table 7 shows that extreme gradient boosted trees (xgbTree) were the most appropriate probabilistic models among those reviewed in this study for influence ranking on the Twitter dataset. The xgbTree implementation supports parallelized tree construction, cache-aware data structures and algorithms that make good use of the hardware, distributed computing, and out-of-core computation for datasets that do not fit into memory, enabling very large models to be trained on a cluster of machines [12].

5 Conclusion

This inquiry was conducted on the intuition that interaction between the entities of any social network could play an important role in modelling influence. Hence, statistics of the interactions made by entities in a social network were used to model influence and rank the entities accordingly. Joint and conditional probability based techniques such as neural networks, decision trees and ensemble predictors were used to conduct the inquiry. Based on the evaluation of these learning techniques on the data, it was concluded that extreme gradient boosted trees are the most suitable among the techniques used. A key advantage of this technique over the other methods is that it could scale to massive online networks better than other influence ranking models.