Abstract
In this paper, we propose an online incremental learning framework for identifying web spam. The proposed framework incrementally updates its learning model as new samples arrive, without revisiting the original training data. A prototype of the framework has been evaluated on a real, large-scale web spam dataset. The results show that the proposed online detector achieves high learning speed and high prediction accuracy on web spam.
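The abstract does not name the learning algorithm. The following is a minimal sketch of what it can mean to update a model from each newly arrived sample without keeping the original data. It assumes a linear classifier trained with the Passive-Aggressive (PA-I) online update of Crammer et al. (2006). This update is chosen here only for illustration and is not necessarily the authors' method. The class name `OnlinePAClassifier`, the helper `prequential_accuracy`, the parameter `C`, and the feature names are all hypothetical. The test-then-train evaluation loop is likewise an assumed protocol, not the paper's reported setup.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: a sparse feature vector (feature name -> value)
# with a label of +1 (spam) or -1 (non-spam).
Features = Dict[str, float]
Sample = Tuple[Features, int]


class OnlinePAClassifier:
    """Online linear classifier using the PA-I update (Crammer et al., 2006).

    Illustrative choice only. Just the weight vector is stored; each update
    reads the newly arrived sample alone, so past samples are never kept.
    """

    def __init__(self, C: float = 1.0) -> None:
        self.C = C  # aggressiveness parameter: upper bound on the step size
        self.w: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        # Features never seen during training contribute zero weight.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0.0 else -1

    def update(self, x: Features, y: int) -> None:
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return  # passive step: margin already satisfied, or empty sample
        tau = min(self.C, loss / sq_norm)  # aggressive step, capped at C
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) + tau * y * v


def prequential_accuracy(model: OnlinePAClassifier, stream: Iterable[Sample]) -> float:
    """Test-then-train: predict each sample before learning from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.update(x, y)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stream with hypothetical host features; not data from the paper.
    toy_stream = [
        ({"spammy_anchor_text": 1.0}, +1),
        ({"editorial_links": 1.0}, -1),
        ({"spammy_anchor_text": 1.0}, +1),
        ({"editorial_links": 1.0}, -1),
    ]
    model = OnlinePAClassifier(C=1.0)
    print(prequential_accuracy(model, toy_stream))  # prints 0.75
```

The learner stores only weights, so its memory grows with the number of distinct features rather than with the number of samples seen. A deployment at web scale would probably also need to bound the feature space, which this sketch does not attempt.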
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Han, L., Levenberg, A. (2012). Scalable Online Incremental Learning for Web Spam Detection. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25781-0_35
DOI: https://doi.org/10.1007/978-3-642-25781-0_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25780-3
Online ISBN: 978-3-642-25781-0
eBook Packages: Engineering, Engineering (R0)