Abstract
Content distribution networks (CDNs) improve scalability and reliability, by replicating content to the “edge” of the Internet. Apart from the pure networking issues of the CDNs relevant to the establishment of the infrastructure, some very crucial data management issues must be resolved to exploit the full potential of CDNs to reduce the “last mile” latencies. A very important issue is the selection of the content to be prefetched to the CDN servers. All the approaches developed so far, assume the existence of adequate content popularity statistics to drive the prefetch decisions. Such information though, is not always available, or it is extremely volatile, turning such methods problematic. To address this issue, we develop self-adaptive techniques to select the outsourced content in a CDN infrastructure, which requires no apriori knowledge of request statistics. We identify clusters of “correlated” Web pages in a site, called Web site communities, and make these communities the basic outsourcing unit. Through a detailed simulation environment, using both real and synthetic data, we show that the proposed techniques are very robust and effective in reducing the user-perceived latency, performing very close to an unfeasible, off-line policy, which has full knowledge of the content popularity.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Annapureddy, S., Freedman, M.J., Mazieres D.: Shark: scaling file servers via cooperative caching. In: Proceedings of the USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, 2–4 May 2005
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1–3), 89–113 (2004)
Bent, L., Rabinovich, M., Voelker, G.M., Xiao, Z.: Characterization of a large Web site population with implications for content delivery. World Wide Web J. 9(4), 505–536 (2006)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: Proceedings of the SIAM Data Mining Conference (SDM) (2004)
Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)
Chen, Y., Qiu, L., Chen, W., Nguyen, L., Katz, R.H.: Efficient and adaptive Web replication using content clustering. IEEE J. Sel. Areas Commun. 21(6), 979–994 (2003)
Eiron, N., McCurley, K.S.: Untangling compound documents on the Web. In: Proceedings of the ACM Conference on Hypertext and hypermedia (HT), Nottingham, 26–30 August 2003
Fan, L., Cao, P., Almeida, J.M., Broder, A.Z.: Summary cache: a scalable wide-area Web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-organization and identification of web communities. IEEE Comput. 35(3), 66–71 (2002)
Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph clustering and minimum cut trees. Internet Math. 1(4), 385–408 (2003/2004)
Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977)
Fujita, N., Ishikawa, Y., Iwata, A., Izmailov, R.: Coarse-grain replica management strategies for dynamic replication of Web contents. Comput. Networks 45(1), 19–34 (2004)
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. U. S. A. 99(12), 7821–7826 (2002)
Greco, G., Greco, S., Zumpano, E.: Web communities: models and algorithms. World Wide Web J. 7(1), 58–82 (2004)
Guimera, R., Sales-Pardo, M., Nunes Amaral, L.A.: Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70, 025101 (2004)
Haveliwala, T.H.: Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)
He, Y., Siganos, G., Faloutsos, M., Krishnamurthy, S.: Asystematic framework for unearthing the missing links: measurements and impact. In: Proceedings of the USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), pp. 187–200. Cambridge, USA, 11–13 April 2007
Ino, H., Kudo, M., Nakamura, A.: Partitioning of Web graphs by community topology. In: Proceedings of the ACM World Wide Web Conference (WWW), Chiba, 10–14 May 2005
Jung, Y., Krishnamurthy, B., Rabinovich, M.: Flash crowds and denial of service attacks: characterization and implications for CDNs and Web sites. In: Proceedings of the ACM World Wide Web Conference (WWW), Honolulu, 7–11 May 2002
Kangasharju, J., Roberts, J.W., Ross, K.W.: Object replication strategies in content distribution networks. Comput. Commun. 25(4), 376–383 (2002)
Katsaros, D., Manolopoulos, Y.: Caching in Web memory hierarchies. In: Proceedings of the ACM Symposium on Applied Computing (SAC), Nicosia, 14–17 March 2004
Katsaros, D., Manolopoulos, Y.: Prediction in wireless networks by Markov chains. IEEE Wireless Communications magazine, 2007 (in press)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Krishnamurthy, B., Wills, C., Zhang, Y.: On the use and performance of content distribution networks. In: Proceedings of the ACM SIGCOMM Internet Measurement Workshop (IMW), San Francisco, 1–2 November 2001
Kroeger, T., Long, D.E., Mogul, J.: Exploring the bounds of Web latency reduction from caching and prefetching. In: Proceedings of the USENIX Symposium on Internet Technologies and Services (USITS), Monterey, 8–11 December 1997
Kulkarni, P., Shenoy, P.J., Gong, W.: Scalable techniques for memory-efficient CDN simulations. In: Proceedings of the ACM World Wide Web Conference (WWW), Budapest, 20–24 May 2003
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for emerging cyber-communities. Comput. Networks 31(11–16), 1481–1493 (1999)
Laoutaris, N., Zissimopoulos V., Stavrakakis, I.: On the optimization of storage capacity allocation for content distribution. Comput. Networks 47(3), 409–428 (2005)
Leung, K.Y., Eric Wong, W.M., Yeung, K.H.: Designing efficient and robust caching algorithms for streaming-on-demand services on the Internet. World Wide Web J. 7(3), 297–314 (2004)
Li, W.-S., Candan, K.S., Vu, Q., Agrawal, D.: Query relaxation by structure and semantics for retrieval of logical Web documents. IEEE Trans. Knowl. Data Eng. 14(4), 768–791 (2002)
Masada, T., Takasu, A., Adachi, J.: Web page grouping based on parameterized connectivity. In: Lee, Y.-J., Li, J., Whang, K.-Y., Lee, D. (eds.) Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), pp. 374–380. Springer, Heidelberg (2004)
Nanopoulos, A., Katsaros, D., Manolopoulos, Y.: A data mining algorithm for generalized Web prefetching. IEEE Trans. Knowl. Data Eng. 15(5), 1155–1169 (2003)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
Ninan, A.G., Kulkarni, P., Shenoy, P.J., Ramamritham, K., Tewari, R.: Scalable consistency maintenance in content distribution networks using cooperative leases. IEEE Trans. Knowl. Data Eng. 15(4), 813–828 (2003)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University (1999)
Pallis, G., Stamos, K., Vakali, A., Katsaros, D., Sidiropoulos, A., Manolopoulos, Y.: Replication based on objects load under a content distribution network. In: Proceedings of the IEEE International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), Atlanta, 3–7 April 2006
Pallis, G., Vakali, A.: Insight and perspectives for content delivery networks. Commun. ACM 49(1), 101–106 (2006)
Pallis, G., Vakali, A., Stamos, K., Sidiropoulos, A., Katsaros, D., Manolopoulos, Y. A latency-based object placement approach in content distribution networks. In: Proceedings of the IEEE Latin American Web Congress (LA-Web), Buenos Aires, 31 October–2 November, 2005
Pan, J., Hou, Y.T., Li, B.: An overview of DNS-based server selections in content distribution networks. Comput. Networks 43(6), 695–711 (2003)
Qiu, L., Padmnanabhan, V.N., Voelker, G.M.: On the placement of Web server replicas. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), vol. 3, pp. 1587–1596. IEEE, Piscataway (2001)
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. U. S. A. 101(9), 2658–2663 (2004)
Sivasubramanian, S., Pierre, G., van Steen, M., Alonso, G.: Analysis of caching and replication strategies for web applications. IEEE Internet Computing 11(1), 60–66 (2007)
Spertus, E., Sahami, M., Buyukkokten, O.: Evaluating similarity measures: a large-scale study in the Orkut social network. In: Proceeding of the ACM International Conference on Knowledge Discovery in Data Mining (SIGKDD), pp. 678–684 (2005)
Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for Internet applications. In: Proceedings of ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), San Diego, 27–31 August 2001
Tajima, K., Hatano, K., Matsukura, T., Sano, R., Tanaka, K.: Discovery and retrieval of logical information units in Web. In: Proceedings of the ACM Workshop on Organizing Web Space, in conjunction with ACM Digital Libraries (1999)
Vakali, A.: Proxy cache replacement algorithms: a history-based approach. World Wide Web J. 4(4), 277–298 (2001)
Vakali, A., Pallis, G.: Content delivery networks: status and trends. IEEE Internet Computing 7(6), 68–74 (2003)
Venkataramani, A., Yalagandula, P., Kokku, R., Sharif, S., Dahlin, M.: The potential costs and benefits of long-term prefetching for content distribution. Comput. Commun. 25(4), 367–375 (2002)
Wu, B., Kshemkalyani, A.D.: Objective-greedy algorithms for long-term Web prefetching. In: Proceedings of the IEEE Conference on Network Computing and Applications (NCA), pp. 61–68. IEEE, Piscataway (2004)
Wu, B., Kshemkalyani, A.D.: Objective-optimal algorithms for long-term Web prefetching. IEEE Trans. Comput. 55(1), 2–17 (2006)
Zegura, E., Calvert, K., Bhattacharjee, S.: How to model an internetwork. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), pp. 594–602. IEEE, Piscataway (1996)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Sidiropoulos, A., Pallis, G., Katsaros, D. et al. Prefetching in Content Distribution Networks via Web Communities Identification and Outsourcing. World Wide Web 11, 39–70 (2008). https://doi.org/10.1007/s11280-007-0027-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-007-0027-8