Abstract
Entity search has emerged as an important research topic in recent years, but so far it has been addressed only in a centralized setting. In this paper we present an attempt to solve the task of ad-hoc entity retrieval in a cooperative distributed environment. We propose a new collection ranking and selection method for entity search, called AENN. The key underlying idea is that a lean, name-based representation of entities can be stored efficiently at the central broker, which therefore does not have to rely on sampling. This representation is then used for collection ranking and selection, such that the number of collections selected and the number of results requested from each collection are adjusted dynamically on a per-query basis. Using a collection of structured datasets in RDF and a sample of real web search queries targeting entities, we demonstrate that our approach outperforms state-of-the-art distributed document retrieval methods in terms of both effectiveness and efficiency.
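To make the broker-side idea more concrete, the following is a minimal illustrative sketch and not the AENN method itself, whose details are not given in the abstract. It assumes the broker holds a flat list of (collection, entity name) pairs, scores names against the query with a simple Jaccard token overlap, and derives both the set of selected collections and the per-collection result counts from the top-scoring names. The function names, the scoring function, and the `top_n` cutoff are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (an assumed, deliberately simple choice)."""
    return set(text.lower().split())


def name_score(query, name):
    """Jaccard overlap between query tokens and entity-name tokens.

    This scoring function is an assumption for illustration only.
    """
    q, n = tokenize(query), tokenize(name)
    if not q or not n:
        return 0.0
    return len(q & n) / len(q | n)


def select_collections(query, name_index, top_n=5):
    """Rank and select collections using only entity names held at the broker.

    name_index: list of (collection_id, entity_name) pairs (hypothetical layout).
    Returns a list of (collection_id, score, results_to_request) tuples,
    ordered by descending score. Collections with no entity among the
    top_n name matches are not selected at all, so both the number of
    collections and the per-collection request size vary per query.
    """
    scored = [(name_score(query, name), coll, name) for coll, name in name_index]
    scored = [s for s in scored if s[0] > 0]
    # Deterministic order: best score first, ties broken by collection and name.
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    top = scored[:top_n]

    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, coll, _ in top:
        totals[coll] += score
        counts[coll] += 1

    ranking = sorted(totals, key=lambda c: (-totals[c], c))
    return [(c, totals[c], counts[c]) for c in ranking]


if __name__ == "__main__":
    # Toy index; collection and entity names are made up for the example.
    index = [
        ("dbpedia", "Barack Obama"),
        ("dbpedia", "Michelle Obama"),
        ("geonames", "Obama Japan"),
        ("musicbrainz", "Nirvana"),
    ]
    for coll, score, k in select_collections("barack obama", index, top_n=3):
        print(coll, round(score, 3), k)
```

In this toy setup, a query that matches names in only two of the three collections causes only those two to be contacted, each with a request size equal to its number of top-ranked name matches. This mirrors the per-query adjustment described in the abstract without claiming to reproduce the authors' actual scoring or cutoff strategy.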
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balog, K., Neumayer, R., Nørvåg, K. (2012). Collection Ranking and Selection for Federated Entity Search. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_9
DOI: https://doi.org/10.1007/978-3-642-34109-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer Science, Computer Science (R0)