Modeling Cascade Growth: Predicting Content Diffusion on VKontakte

  • Conference paper
Networks in the Global World V (NetGloW 2020)

Abstract

Online social networks have become an essential communication channel for the broad and rapid sharing of information. Currently, the mechanics of such information sharing are captured by the notion of cascades, which are tree-like networks composed of (re)sharing actions. However, it is still unclear what factors drive cascade growth. Moreover, there is a lack of studies outside Western countries and platforms such as Facebook and Twitter. In this work, we investigate what factors contribute to the scope of information cascading and how to predict this variation accurately. We examine six machine learning algorithms for their predictive and interpretative capabilities concerning cascades’ structural metrics (width, mass, and depth). To do so, we use data from VKontakte, a leading Russian-language online social network, capturing cascades of 4,424 messages posted by 14 news outlets over one year. The results show that the best models in terms of predictive power are the Gradient Boosting algorithm for width and depth and the Lasso Regression algorithm for the mass of a cascade, while depth is the least predictable metric. We find that the most potent factor associated with cascade size is the number of reposts at its origin level. We examine its role along with other factors such as content features and characteristics of sources and their audiences.
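To make the three structural metrics concrete, the following minimal sketch computes them for a toy cascade tree. The abstract does not define the metrics, so the definitions used here are assumptions: mass is the number of nodes including the original post, depth is the longest chain of reshares from the original post, and width is the largest number of nodes on any single level. The function name `cascade_metrics` and the toy edge list are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def cascade_metrics(edges, root):
    """Return (mass, depth, width) of a cascade tree.

    edges: iterable of (parent, child) pairs, one per repost.
    root:  identifier of the original post.
    Assumed definitions (not from the paper):
      mass  = number of nodes, original post included
      depth = number of levels below the original post
      width = largest number of nodes on a single level
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first traversal, counting how many nodes sit on each level.
    level_sizes = defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        level_sizes[level] += 1
        for child in children[node]:
            queue.append((child, level + 1))

    mass = sum(level_sizes.values())
    depth = max(level_sizes)           # deepest level index reached
    width = max(level_sizes.values())
    return mass, depth, width


# Toy cascade: three direct reposts, one of which is reshared twice in a chain.
edges = [("post", "a"), ("post", "b"), ("post", "c"), ("a", "d"), ("d", "e")]
mass, depth, width = cascade_metrics(edges, "post")

# Number of reposts made directly from the original post (the origin level).
first_level_reposts = sum(1 for parent, _ in edges if parent == "post")
```

In this toy cascade the origin-level repost count equals the width, which illustrates why such a feature could carry much of the signal for size-related targets; whether that holds in the VKontakte data is a question for the paper itself.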

References

  1. Thorson, K., Wells, C.: Curated flows: a framework for mapping media exposure in the digital age. Commun. Theory 26(3), 309–328 (2015)

  2. Boyd, D.M., Ellison, N.B.: Social network sites: definition, history, and scholarship. J. Comput.-Mediated Commun. 13(1), 210–230 (2007)

  3. Sun, E., Rosenn, I., Marlow, C.A., Lento, T.M.: Gesundheit! Modeling contagion through Facebook news feed. In: Third International AAAI Conference on Weblogs and Social Media (2009)

  4. González-Bailón, S., Borge-Holthoefer, J., Moreno, Y.: Online networks and the diffusion of protest. In: Analytical Sociology, pp. 261–278 (2014)

  5. Liben-Nowell, D., Kleinberg, J.: Tracing information flow on a global scale using internet chain-letter data. Proc. Natl. Acad. Sci. 105(12), 4633–4638 (2008)

  6. Gomez-Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data 5(4), 1–37 (2012)

  7. Myers, S.A., Zhu, C., Leskovec, J.: Information diffusion and external influence in networks. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012 (2012)

  8. Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone’s an influencer. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM 2011 (2011)

  9. Cheng, J., Adamic, L., Dow, P.A., Kleinberg, J.M., Leskovec, J.: Can cascades be predicted? In: Proceedings of the 23rd International Conference on World Wide Web, WWW 2014 (2014)

  10. Cao, Q., Shen, H., Cen, K., Ouyang, W., Cheng, X.: DeepHawkes: bridging the gap between prediction and understanding of information cascades. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1149–1158 (2017)

  11. Petrovic, S., Osborne, M., Lavrenko, V.: RT to Win! Predicting message propagation in Twitter. In: Fifth International AAAI Conference on Weblogs and Social Media (2011)

  12. Hong, L., Dan, O., Davison, B.D.: Predicting popular messages in Twitter. In: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (2011)

  13. Elsharkawy, S., Hassan, G., Nabhan, T., Roushdy, M.: Towards feature selection for cascade growth prediction on Twitter. In: Proceedings of the 10th International Conference on Informatics and Systems, INFOS 2016 (2016)

  14. Tsur, O., Rappoport, A.: What’s in a hashtag? Content based prediction of the spread of ideas in microblogging communities. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pp. 643–652 (2012)

  15. Martin, T., Hofman, J.M., Sharma, A., Anderson, A., Watts, D.J.: Exploring limits to prediction in complex social systems. In: Proceedings of the 25th International Conference on World Wide Web, pp. 683–694 (2016)

  16. Leskovec, J., Mcglohon, M., Faloutsos, C., Glance, N., Hurst, M.: Patterns of cascading behavior in large blog graphs. In: Proceedings of the 2007 SIAM International Conference on Data Mining (2007)

  17. Vicario, M.D., Bessi, A., Zollo, F., Petroni, F., Scala, A., Caldarelli, G., Stanley, H.E., Quattrociocchi, W.: The spreading of misinformation online. Proc. Natl. Acad. Sci. 113(3), 554–559 (2016)

  18. Mail.ru Group Limited Annual Report for FY 2019 and unaudited IFRS results for Q1 2020, April 2020. https://corp.imgsmail.ru/media/files/engq1-2020-results.pdf

  19. Koltsov, S., Pashakhin, S., Dokuka, S.: A full-cycle methodology for news topic modeling and user feedback research. In: International Conference on Social Informatics, pp. 308–321. Springer (2018)

  20. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, vol. 112. Springer, Heidelberg (2013)

  21. Becker, R.A., Chambers, J.M., Wilks, A.R.: The New S Language, April 2018

  22. TreeNet stochastic gradient boosting: an implementation of the MART methodology. http://docs.salford-systems.com/TreeNetManual_v1.pdf

  23. Quan, Z., Valdez, E.A.: Predictive analytics of insurance claims using multivariate decision trees. SSRN Electron. J. (2018)

  24. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001)

  25. Sullivan, L.E.: Selective exposure. In: The SAGE Glossary of the Social and Behavioral Sciences, p. 465 (2009)


Acknowledgements

This work is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).

Author information

Corresponding author

Correspondence to Anna Moroz.

A Appendix

Fig. 1. A graph showing the top ten predictive features used as input by the Lasso Regression algorithm for predicting cascade mass.

Fig. 2. A plot displaying the proportion of negative to positive values among the Lasso Regression variables’ coefficients.

Fig. 3. Plots showing observation-level effects of the number of a post’s first-level reposts on cascade depth (left) and width (right). The multiple black lines are individual conditional expectation (ICE) curves, while the red line shows the values averaged across all predictions [24] by the Gradient Boosting algorithm.
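The construction behind such a plot can be sketched in a few lines. This is a minimal illustration, not the authors' code: `toy_predict` is a hypothetical stand-in for a fitted Gradient Boosting model, and the feature names `first_level_reposts` and `audience_size` are assumptions chosen for the example. Each ICE curve is obtained by holding one observation fixed, sweeping a single feature over a grid, and recording the model's prediction; the average line is the pointwise mean of those curves.

```python
def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per observation.

    predict: callable mapping a feature dict to a numeric prediction
    rows:    list of feature dicts (the observations)
    feature: name of the feature to vary
    grid:    values to substitute for that feature
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # vary only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def average_curve(curves):
    """Pointwise mean of the ICE curves (the single averaged line)."""
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical stand-in for a fitted model; a real analysis would call
# the trained regressor's prediction method here instead.
def toy_predict(row):
    return 2.0 * row["first_level_reposts"] + row["audience_size"]


rows = [
    {"first_level_reposts": 1, "audience_size": 10.0},
    {"first_level_reposts": 5, "audience_size": 20.0},
]
grid = [0, 1, 2]

curves = ice_curves(toy_predict, rows, "first_level_reposts", grid)
mean_curve = average_curve(curves)
```

Because the toy model is additive, the curves here are parallel; with a tree ensemble the individual curves can diverge, which is the heterogeneity an ICE plot is meant to expose.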

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Moroz, A., Pashakhin, S., Koltsov, S. (2021). Modeling Cascade Growth: Predicting Content Diffusion on VKontakte. In: Antonyuk, A., Basov, N. (eds) Networks in the Global World V. NetGloW 2020. Lecture Notes in Networks and Systems, vol 181. Springer, Cham. https://doi.org/10.1007/978-3-030-64877-0_12
