Abstract
Search has arguably become the dominant paradigm for finding information on the World Wide Web. Building a successful search engine raises a number of challenges where techniques from artificial intelligence can have a significant impact. In this paper, we explore a number of problems related to finding information on the web and discuss approaches that have been employed in various research programs, including some of those at Google. Specifically, we examine issues such as web graph analysis, statistical methods for inferring meaning in text, and the retrieval and analysis of newsgroup postings, images, and sounds. We show that by leveraging the vast amounts of data on the web, it is possible to address these problems in innovative ways that vastly improve on standard, but often data-impoverished, methods. We also present a number of open research problems to help spur further research in these areas.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sahami, M., Mittal, V., Baluja, S., Rowley, H. (2004). The Happy Searcher: Challenges in Web Information Retrieval. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_3
DOI: https://doi.org/10.1007/978-3-540-28633-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22817-2
Online ISBN: 978-3-540-28633-2
eBook Packages: Springer Book Archive