Abstract
Online social networks have become an essential communication channel for the broad and rapid sharing of information. The mechanics of such information sharing are commonly captured by the notion of cascades, which are tree-like networks composed of (re)sharing actions. However, it is still unclear what factors drive cascade growth. Moreover, there is a lack of studies of non-Western countries and of platforms other than Facebook and Twitter. In this work, we investigate what factors contribute to the scope of information cascading and how to predict this variation accurately. We examine six machine learning algorithms for their predictive and interpretative capabilities concerning cascades’ structural metrics (width, mass, and depth). To do so, we use data from VKontakte, a leading Russian-language online social network, capturing cascades of 4,424 messages posted by 14 news outlets over the course of a year. The results show that the best models in terms of predictive power are the Gradient Boosting algorithm for width and depth and the Lasso Regression algorithm for the mass of a cascade, while depth is the least predictable. We find that the most potent factor associated with cascade size is the number of reposts on its origin level. We examine its role along with other factors such as content features and characteristics of sources and their audiences.
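The three structural metrics named in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions based on common usage in the cascade literature, not taken from the chapter: mass is the total number of reposts, depth is the number of levels below the original post, and width is the largest number of reposts on any single level. The function name `cascade_metrics` and the edge-list input format are hypothetical.

```python
from collections import defaultdict, deque


def cascade_metrics(edges, root):
    """Return (width, mass, depth) of a cascade tree.

    Assumed definitions (not taken from the chapter):
      mass  = number of reposts (all nodes except the root),
      depth = number of levels below the root,
      width = largest number of reposts on any single level.
    `edges` is a list of (parent, child) pairs forming a tree.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first traversal, counting reposts per level.
    level_sizes = defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level > 0:  # level 0 is the original post itself
            level_sizes[level] += 1
        for child in children[node]:
            queue.append((child, level + 1))

    mass = sum(level_sizes.values())
    depth = max(level_sizes, default=0)
    width = max(level_sizes.values(), default=0)
    return width, mass, depth


# Toy cascade: three direct reposts, one of which is reshared twice in a chain.
edges = [("post", "a"), ("post", "b"), ("post", "c"), ("a", "d"), ("d", "e")]
width, mass, depth = cascade_metrics(edges, "post")
```

In this toy cascade the first level holds three reposts, which corresponds to what the abstract calls the number of reposts on the origin level.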
Acknowledgements
This work is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).
A Appendix
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Moroz, A., Pashakhin, S., Koltsov, S. (2021). Modeling Cascade Growth: Predicting Content Diffusion on VKontakte. In: Antonyuk, A., Basov, N. (eds) Networks in the Global World V. NetGloW 2020. Lecture Notes in Networks and Systems, vol 181. Springer, Cham. https://doi.org/10.1007/978-3-030-64877-0_12
DOI: https://doi.org/10.1007/978-3-030-64877-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64876-3
Online ISBN: 978-3-030-64877-0
eBook Packages: Intelligent Technologies and Robotics (R0)