Abstract
Peer-to-peer networks have been identified as a promising architectural concept for developing search scenarios across digital library collections. Digital libraries typically offer sophisticated search over their local content; however, search methods involving a network of such stand-alone components are currently quite limited. We present an architecture for highly efficient search over digital library collections based on structured P2P networks. As the standard single-term indexing strategy faces significant scalability limitations in distributed environments, we propose a novel indexing strategy: key-based indexing. The keys are term sets that appear in a restricted number of collection documents. Thus, they are discriminative with respect to the global document collection, and ensure scalable search costs. Moreover, key-based indexing computes posting list joins at indexing time, which significantly improves query performance. As search-efficient solutions usually imply costly indexing procedures, we present experimental results that show acceptable indexing costs while the retrieval performance is comparable to that of standard centralized solutions with TF-IDF ranking.
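The idea of keys as term sets with bounded document frequency can be illustrated with a small single-machine sketch. This is not the paper's implementation: the function names `build_key_index` and `lookup`, the parameters `df_max` and `max_key_size`, and the toy documents are all assumptions made for illustration, and the distribution of keys over a structured P2P network is left out entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, df_max=1, max_key_size=2):
    """Map each discriminative term set (key) to the ids of documents containing it.

    Hypothetical parameters: df_max is the largest document frequency a key
    may have, max_key_size is the largest number of terms in a key.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        terms = sorted(set(text.lower().split()))
        for size in range(1, max_key_size + 1):
            # Every term set of this size found in the document is a candidate key.
            for key in combinations(terms, size):
                postings[key].add(doc_id)
    # Keep only keys that appear in a restricted number of documents.
    return {key: sorted(ids) for key, ids in postings.items() if len(ids) <= df_max}


def lookup(index, query):
    """Answer a multi-term query with one key lookup instead of a posting list join."""
    key = tuple(sorted(set(query.lower().split())))
    return index.get(key, [])


# Toy collection (assumed data, not from the paper).
docs = {
    1: "peer network search",
    2: "peer network index",
    3: "peer library search",
}
index = build_key_index(docs, df_max=1, max_key_size=2)
result = lookup(index, "search network")
```

In this toy collection the single term `peer` occurs in all three documents and is therefore not kept as a key, while the term set `(network, search)` occurs only in document 1 and is kept. A two-term query is then answered by a single lookup, because the intersection of the two posting lists was already materialised when the index was built.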
The work presented in this paper was carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European FP 6 STREP project ALVIS (002068).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Podnar, I., Luu, T., Rajman, M., Klemm, F., Aberer, K. (2006). A Peer-to-Peer Architecture for Information Retrieval Across Digital Library Collections. In: Gonzalo, J., Thanos, C., Verdejo, M.F., Carrasco, R.C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_2
DOI: https://doi.org/10.1007/11863878_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44636-1
Online ISBN: 978-3-540-44638-5
eBook Packages: Computer Science, Computer Science (R0)