Modeling Cascade Growth: Predicting Content Diffusion on VKontakte

Moroz, Anna; Pashakhin, Sergei; Koltsov, Sergei

doi:10.1007/978-3-030-64877-0_12

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 181)

Included in the following conference series:

Fifth Networks in the Global World Conference

Abstract

Online social networks have become an essential channel for sharing information broadly and rapidly. The mechanics of such sharing are commonly described through cascades: tree-like networks composed of (re)sharing actions. However, it remains unclear which factors drive cascade growth, and few studies look beyond Western countries and platforms such as Facebook and Twitter. In this work, we investigate which factors contribute to the scope of information cascading and how this variation can be predicted accurately. We compare six machine learning algorithms on their predictive and interpretive capabilities with respect to three structural metrics of cascades (width, mass, and depth). We use data from VKontakte, a leading Russian-language online social network, covering the cascades of 4,424 messages posted by 14 news outlets over one year. The results show that the best-performing models are Gradient Boosting for width and depth and Lasso Regression for mass, and that depth is the least predictable metric. The factor most strongly associated with cascade size is the number of reposts at the cascade’s origin level. We examine its role alongside other factors, such as content features and the characteristics of sources and their audiences.
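To make the three structural metrics concrete, the following is a minimal sketch of how they could be computed for a single cascade tree. The abstract does not give formal definitions, so the ones used here are assumptions: mass is taken as the total number of reposts, depth as the number of levels below the original post, and width as the largest number of reposts on any single level. The function name `cascade_metrics` and the edge-list input format are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


def cascade_metrics(edges, root):
    """Return (width, mass, depth) for one cascade.

    edges: iterable of (parent, child) pairs, one per repost action.
    root:  identifier of the original post.
    The metric definitions below are assumptions, not the paper's own.
    """
    # Build a parent -> children adjacency map from the repost edges.
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Breadth-first traversal, counting how many nodes sit on each level.
    level_sizes = Counter()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        level_sizes[level] += 1
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))

    depth = max(level_sizes)              # deepest level reached (root is level 0)
    mass = sum(level_sizes.values()) - 1  # all reposts, excluding the original post
    width = max(                          # largest level, ignoring the root level
        (size for level, size in level_sizes.items() if level > 0),
        default=0,
    )
    return width, mass, depth


# A toy cascade: three direct reposts, one of which is reshared twice in a chain.
toy_edges = [
    ("post", "a"), ("post", "b"), ("post", "c"),
    ("a", "d"), ("d", "e"),
]
print(cascade_metrics(toy_edges, "post"))
```

Under these assumed definitions, the number of reposts at the origin level corresponds to the count at level 1, which is the factor the abstract reports as most strongly associated with cascade size.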

Acknowledgements

This work is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).

Author information

Authors and Affiliations

Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, Saint Petersburg, Russia
Anna Moroz, Sergei Pashakhin & Sergei Koltsov

Authors

Anna Moroz
Sergei Pashakhin
Sergei Koltsov

Corresponding author

Correspondence to Anna Moroz.

Editor information

Editors and Affiliations

St. Petersburg University, Centre for German and European Studies, St. Petersburg, Russia
Artem Antonyuk & Nikita Basov

About this paper

Cite this paper

Moroz, A., Pashakhin, S., Koltsov, S. (2021). Modeling Cascade Growth: Predicting Content Diffusion on VKontakte. In: Antonyuk, A., Basov, N. (eds) Networks in the Global World V. NetGloW 2020. Lecture Notes in Networks and Systems, vol 181. Springer, Cham. https://doi.org/10.1007/978-3-030-64877-0_12

DOI: https://doi.org/10.1007/978-3-030-64877-0_12
Published: 20 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64876-3
Online ISBN: 978-3-030-64877-0
eBook Packages: Intelligent Technologies and Robotics (R0)
