Abstract
Social networks record the lives of their users and offer great potential for journalists, sociologists, and business analysts. Crawling data from a social network is a basic step in social network information analysis and processing. As networks grow huge and their information is updated faster than ordinary web pages, crawling becomes more difficult because of limits on bandwidth, politeness etiquette, and computation power. To extract fresh information from social networks efficiently and effectively, this paper presents a novel crawling method for social networks. To discover the characteristics of social networks, we gather data from a real social network, analyze it, and build a model that describes the regularities of users’ behavior. Using the modeled behavior, we propose methods to predict users’ behavior. Based on these predictions, we schedule our crawler more reasonably and extract more fresh information. Experimental results demonstrate that our strategies can obtain information from SNS efficiently and effectively.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, R., Wang, H., Li, K., Li, J., Gao, H. (2013). CUVIM: Extracting Fresh Information from Social Network. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_36
DOI: https://doi.org/10.1007/978-3-642-38562-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)