Abstract
The promises inherent in users coming together to form data-sharing network communities bring to the foreground new problems formulated over such dynamic, ever-growing computing, storage, and networking infrastructures. A key open challenge is to harness these highly distributed resources to develop an ultra-scalable, efficient search engine. From a technical viewpoint, any acceptable solution must fully exploit all available resources, which dictates the removal of any centralized points of control; such points can also readily lead to performance bottlenecks and reliability/availability problems. Equally important, a highly distributed solution can also facilitate pluralism in informing users about internet content, which is crucial to preclude the formation of information-resource monopolies and the biased visibility of content from economically powerful sources. To meet these challenges, the work described here puts forward MINERVA∞, a novel search engine architecture designed for scalability and efficiency. MINERVA∞ encompasses a suite of novel algorithms, including algorithms for creating data networks of interest, placing data on network nodes, load balancing, top-k algorithms for retrieving data at query time, and replication algorithms for expediting top-k query processing. We have implemented the proposed architecture and report on extensive experiments with real-world, web-crawled, and synthetic data and queries, showcasing the scalability and efficiency of MINERVA∞.
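The abstract names top-k algorithms for query-time retrieval but does not spell them out. As a generic illustration only, not MINERVA∞'s own distributed algorithm, the sketch below implements Fagin's Threshold Algorithm (TA) over per-term index lists sorted by descending score. It assumes summation as the aggregation function and simulates random access with in-memory dictionaries; the function name `threshold_topk` and the sample lists are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[str, float]  # (document id, score)


def threshold_topk(index_lists: List[List[Posting]], k: int) -> List[Posting]:
    """Return the k documents with the highest summed score across index lists.

    Sketch of Fagin's Threshold Algorithm (TA). Each index list must be sorted
    by descending score. Random accesses are simulated with in-memory dicts;
    a document absent from a list contributes 0 from that list.
    """
    # Hypothetical stand-in for random access to a (possibly remote) index list.
    lookups = [dict(lst) for lst in index_lists]
    aggregated: Dict[str, float] = {}
    max_depth = max((len(lst) for lst in index_lists), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        for lst in index_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen docs score 0 here
            doc, score = lst[depth]  # sorted access
            threshold += score
            if doc not in aggregated:
                # Random access to every list to complete this doc's total score.
                aggregated[doc] = sum(lk.get(doc, 0.0) for lk in lookups)
        top = heapq.nlargest(k, aggregated.items(), key=lambda item: item[1])
        # Stop early: no unseen document can beat the current k-th score.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, aggregated.items(), key=lambda item: item[1])


if __name__ == "__main__":
    term_a = [("d1", 0.75), ("d2", 0.5), ("d3", 0.25), ("d4", 0.125)]
    term_b = [("d2", 1.0), ("d1", 0.5), ("d4", 0.25), ("d3", 0.125)]
    print(threshold_topk([term_a, term_b], k=2))  # [('d2', 1.5), ('d1', 1.25)]
```

With these sample lists, the loop stops after reading only the first two positions of each list, because the second-best total (1.25) already reaches the threshold (0.5 + 0.5). In a peer-to-peer deployment each sorted or random access would plausibly cost a network message, which is why an early-termination bound of this kind matters.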
Copyright information
© 2005 IFIP International Federation for Information Processing
About this paper
Cite this paper
Michel, S., Triantafillou, P., Weikum, G. (2005). MINERVA∞: A Scalable Efficient Peer-to-Peer Search Engine. In: Alonso, G. (ed.) Middleware 2005. Middleware 2005. Lecture Notes in Computer Science, vol 3790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587552_4
DOI: https://doi.org/10.1007/11587552_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30323-7
Online ISBN: 978-3-540-32269-6
eBook Packages: Computer Science, Computer Science (R0)