freeCycles - Efficient Multi-Cloud Computing Platform

Bruno, Rodrigo; Costa, Fernando; Ferreira, Paulo

doi:10.1007/s10723-017-9414-2

Published: 30 October 2017

Volume 15, pages 501–526 (2017)

Journal of Grid Computing

Abstract

The growing adoption of the MapReduce programming model increases the appeal of Internet-wide computing platforms for running MapReduce applications. However, the data distribution techniques that such platforms currently use to move the large volumes of data required by MapReduce jobs are naive, and therefore fail to offer an efficient approach for running MapReduce over the Internet. We therefore propose freeCycles, a computing platform that runs MapReduce jobs over the Internet and makes two main contributions: i) it improves data distribution, and ii) it increases intermediate data availability by replicating tasks or data across nodes, which avoids losing intermediate data and the significant delays in overall MapReduce execution time that such losses cause. We present the design and implementation of freeCycles, which uses the BitTorrent protocol to distribute all data, along with an extensive set of performance results that confirm the usefulness of these contributions. The system's improved data distribution and availability make it an ideal platform for large-scale MapReduce jobs.

Author information

Authors and Affiliations

INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029, Lisbon, Portugal
Rodrigo Bruno, Fernando Costa & Paulo Ferreira

Corresponding author

Correspondence to Rodrigo Bruno.

About this article

Cite this article

Bruno, R., Costa, F. & Ferreira, P. freeCycles - Efficient Multi-Cloud Computing Platform. J Grid Computing 15, 501–526 (2017). https://doi.org/10.1007/s10723-017-9414-2

Received: 01 October 2016
Accepted: 09 October 2017
Published: 30 October 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10723-017-9414-2
