Abstract
This paper presents a new algorithm for hypertext graph crawling. Using ants as agents in a hypertext graph significantly limits the number of irrelevant hypertext documents that must be downloaded in order to obtain a given number of relevant documents. Moreover, throughout the crawling process, the artificial ants do not need a queue for central control of the crawling. The proposed algorithm, called the Focused Ant Crawling Algorithm, performs better than both the Shark-Search crawling algorithm and a crawler with a best-first search strategy that uses a queue for central control of the crawling process.
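To make the contrast between queue-free ant agents and a queue-driven best-first crawler concrete, here is a minimal sketch on a toy in-memory graph. It is not the authors' Focused Ant Crawling Algorithm, whose details are in the full chapter. It is a generic ant-colony-style walk under stated assumptions: the graph `GRAPH`, the relevance set `RELEVANT`, the parameters `n_ants`, `max_steps`, `rounds`, `rho` (evaporation rate) and `q` (deposit amount), and the "child inherits parent relevance" scoring of the baseline are all hypothetical. Each ant picks its next link locally, with probability proportional to the pheromone on the outgoing links, so there is no shared frontier queue; the `downloaded` set is only bookkeeping.

```python
import heapq
import random

# Hypothetical toy hypertext graph: page -> pages it links to.
GRAPH = {
    "home": ["news", "sports", "science"],
    "news": ["politics", "home"],
    "sports": ["football"],
    "science": ["physics", "biology"],
    "physics": ["quantum"],
    "biology": [], "politics": [], "football": [], "quantum": [],
}
# Hypothetical set of pages judged relevant to the crawl topic.
RELEVANT = {"science", "physics", "biology", "quantum"}


def ant_crawl(graph, relevant, start, n_ants=4, max_steps=4, rounds=10,
              rho=0.1, q=1.0, seed=0):
    """Assumed ACO-style crawl: pheromone on links, no central frontier queue."""
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u, links in graph.items() for v in links}
    downloaded = {start}
    for _ in range(rounds):
        walks = []
        for _ in range(n_ants):
            page, path = start, []
            for _ in range(max_steps):
                links = graph.get(page, [])
                if not links:
                    break  # dead end: this ant stops walking
                # Local probabilistic choice weighted by pheromone on each link.
                nxt = rng.choices(links, weights=[tau[(page, v)] for v in links])[0]
                path.append((page, nxt))
                downloaded.add(nxt)
                page = nxt
            walks.append(path)
        for edge in tau:  # evaporation on every link
            tau[edge] *= 1.0 - rho
        for path in walks:  # deposit proportional to relevant pages on the walk
            found = sum(1 for _, v in path if v in relevant)
            for edge in path:
                tau[edge] += q * found / len(path)
    return downloaded, tau


def best_first_crawl(graph, relevant, start, budget):
    """Baseline: one central priority queue; a child inherits score 1.0
    if its parent is relevant, else 0.0 (an assumed scoring rule)."""
    frontier = [(0.0, start)]  # heapq is a min-heap, so scores are negated
    seen = {start}
    downloaded = []
    while frontier and len(downloaded) < budget:
        _, page = heapq.heappop(frontier)
        downloaded.append(page)
        score = 1.0 if page in relevant else 0.0
        for child in graph.get(page, []):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score, child))
    return downloaded


if __name__ == "__main__":
    pages, tau = ant_crawl(GRAPH, RELEVANT, "home")
    print(sorted(pages), len(pages & RELEVANT))
    print(best_first_crawl(GRAPH, RELEVANT, "home", budget=5))
```

The design choice being illustrated is where the crawl decision lives: the baseline pops every page from one shared `frontier`, whereas the ants only read and update pheromone on the links they traverse, and evaporation lets unrewarded links fade over the rounds.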
This work was partly supported by the Foundation for Polish Science (Professorial Grant 2005-2008) and the Polish State Committee for Scientific Research (Grant N516 020 31/1977), Special Research Project 2006-2009, Polish-Singapore Research Project 2008-2010, Research Project 2008-2010.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dziwiński, P., Rutkowska, D. (2008). Ant Focused Crawling Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. ICAISC 2008. Lecture Notes in Computer Science, vol. 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_96
DOI: https://doi.org/10.1007/978-3-540-69731-2_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69572-1
Online ISBN: 978-3-540-69731-2
eBook Packages: Computer Science, Computer Science (R0)