Abstract
CiteSeer is a well-known online resource for the computer science research community, allowing users to search and browse a large archive of research papers. Unfortunately, its current centralized incarnation is costly to run. Although members of the community would presumably be willing to donate hardware and bandwidth at their own sites to assist CiteSeer, the current architecture does not facilitate such distribution of resources. OverCite is a proposal for a new architecture for a distributed and cooperative research library based on a distributed hash table (DHT). By harnessing resources at many sites, the new architecture will be able to support new features such as document alerts and to scale to larger data sets.
This research was conducted as part of the IRIS project (http://project-iris.net/), supported by the National Science Foundation under Cooperative Agreement No. ANI-0225660. Isaac G. Councill receives support from NSF SGER Grant IIS-0330783 and Microsoft Research.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stribling, J. et al. (2005). OverCite: A Cooperative Digital Research Library. In: Castro, M., van Renesse, R. (eds) Peer-to-Peer Systems IV. IPTPS 2005. Lecture Notes in Computer Science, vol 3640. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558989_7
DOI: https://doi.org/10.1007/11558989_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29068-1
Online ISBN: 978-3-540-31906-1
eBook Packages: Computer Science (R0)