Abstract
Failures are normal rather than exceptional in cloud computing environments. Replicating popular data to multiple suitable locations is an advisable way to improve system availability, because users can then access the data from a nearby site. This does not hold, however, for replication schemes that keep a fixed number of copies at several locations. Deciding on a reasonable number of replicas and the right locations for them has become a challenge in cloud computing. This paper proposes a dynamic data replication strategy, together with a brief survey of replication strategies suitable for distributed computing environments. The strategy includes: 1) analyzing and modeling the relationship between system availability and the number of replicas; 2) evaluating and identifying the popular data and triggering a replication operation when the popularity of the data passes a dynamic threshold; 3) calculating a suitable number of copies to meet a reasonable system byte effective rate requirement and placing replicas among data nodes in a balanced way; 4) designing the dynamic data replication algorithm in a cloud. Experimental results demonstrate that the proposed strategy improves the efficiency and effectiveness of the system in a cloud.
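The relationship in item 1) between availability and the number of replicas can be illustrated with a minimal sketch. It is not the model developed in the paper: it assumes that each replica is reachable independently with the same probability `p`, and the names `availability`, `min_replicas`, and `max_replicas` are hypothetical, introduced only for this illustration.

```python
def availability(p: float, n: int) -> float:
    """Probability that at least one of n replicas is reachable.

    Assumption (not from the paper): replicas fail independently and each
    is reachable with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


def min_replicas(p: float, target: float, max_replicas: int = 64) -> int:
    """Smallest replica count whose availability reaches the target.

    max_replicas is a hypothetical safety bound for this sketch.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be in [0, 1]")
    for n in range(1, max_replicas + 1):
        if availability(p, n) >= target:
            return n
    raise ValueError("target not reachable within max_replicas")


if __name__ == "__main__":
    # Each node reachable 90% of the time, target availability 99.5%.
    print(min_replicas(0.9, 0.995))
```

Under this simplified assumption, availability grows quickly with the first few replicas and then flattens, which is one reason a fixed replica count can be wasteful for unpopular data and insufficient for popular data.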
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 61070162, 71071028 and 70931001, the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant Nos. 20110042110024 and 20100042110025, and the Fundamental Research Funds for the Central Universities of China under Grant Nos. N100604012, N090504003 and N090504006.
Cite this article
Sun, DW., Chang, GR., Gao, S. et al. Modeling a Dynamic Data Replication Strategy to Increase System Availability in Cloud Computing Environments. J. Comput. Sci. Technol. 27, 256–272 (2012). https://doi.org/10.1007/s11390-012-1221-4
DOI: https://doi.org/10.1007/s11390-012-1221-4