Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval

Shokouhi, Milad

doi:10.1007/978-3-540-71496-5_17

Milad Shokouhi¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Collection selection is one of the key problems in distributed information retrieval. Due to resource constraints, it is usually not feasible to search all collections in response to a query. Therefore, the central component (the broker) selects a limited number of collections to be searched for each submitted query. During the past decade, several collection selection algorithms have been introduced. However, their performance varies across testbeds. We propose a new collection selection method based on the ranking of downloaded sample documents. We test our method on six testbeds and show that our technique can significantly outperform other state-of-the-art algorithms in most cases. We also introduce a new testbed based on the TREC GOV2 documents.
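The abstract describes the method only as ranking downloaded sample documents at the broker. The sketch below illustrates that general idea and should not be read as the paper's algorithm: the term-overlap scorer, the reciprocal-rank weighting, the `top_n` cutoff, and all names (`SAMPLES`, `rank_sampled_documents`, `select_collections`) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sampled documents held by the broker:
# doc_id -> (source collection, document text).
SAMPLES = {
    "d1": ("news", "distributed retrieval of news articles"),
    "d2": ("news", "sports results and news"),
    "d3": ("gov", "government retrieval systems distributed retrieval"),
    "d4": ("gov", "tax policy documents"),
    "d5": ("web", "cooking recipes"),
}


def rank_sampled_documents(query, samples):
    """Rank sampled documents for a query on the broker's central index.

    Assumption: a plain term-frequency overlap score stands in for
    whatever retrieval model the broker actually uses.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, (collection, text) in samples.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc_id, collection))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, collection) for _, doc_id, collection in scored]


def select_collections(query, samples, k=2, top_n=10):
    """Return up to k collections, ordered by rank-based evidence.

    Assumption: each of the top_n ranked sample documents contributes
    1/rank to its source collection; the paper may weight ranks differently.
    """
    ranking = rank_sampled_documents(query, samples)[:top_n]
    scores = defaultdict(float)
    for rank, (_, collection) in enumerate(ranking, start=1):
        scores[collection] += 1.0 / rank
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [collection for collection, _ in ordered[:k]]


if __name__ == "__main__":
    print(select_collections("distributed retrieval", SAMPLES, k=2))
```

In this toy setup, a collection whose sampled documents rank highly for the query is preferred, and a collection with no matching samples is never selected. Only the broker's own samples are consulted, which fits the uncooperative setting where collections expose no statistics.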

Author information

Authors and Affiliations

School of Computer Science and Information Technology, RMIT University, Melbourne 3001, Australia
Milad Shokouhi

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

About this paper

Cite this paper

Shokouhi, M. (2007). Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_17

DOI: https://doi.org/10.1007/978-3-540-71496-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)
