Abstract
This paper addresses the problems caused by physically scattered chunks of data. Fragmentation occurs in two forms, sparse containers and out-of-order containers, and both degrade restore speed and garbage-collection efficiency. Out-of-order containers reduce restore speed in particular when the restore cache is small. To diminish fragmentation, we propose the History-Aware Rewriting (HAR) algorithm, which exploits historical information from previous backups to identify and reduce sparse containers. Each chunk is assigned a unique fingerprint by the Message Digest 5 (MD5) hash function, and logical block addresses are used to merge the blocks back into the original single file. The Data Encryption Standard (DES) is used to generate a secret-key file, which is issued to the user when the data owner creates the user. Collectively, these algorithms aim to minimize the fragmentation problem in an in-line deduplication backup storage system. The improvement in restore performance depends on the amount of duplicate data: simulation results show that if the same data is uploaded twice, write performance rises by up to 80%, and by up to 90% for a third upload of the same data. This value varies with the deduplication technique. When there is no duplicate data at all, the model does not affect the system.
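To make the chunk-fingerprinting step concrete, the sketch below shows a minimal in-line deduplication loop in Python: each fixed-size chunk is fingerprinted with MD5, only previously unseen chunks are written to the store, and a recipe of fingerprints (standing in for the logical block addresses mentioned above) is kept so the original file can be reassembled. The fixed 4 KiB chunk size, the in-memory dictionary store, and the function names are illustrative assumptions, not the paper's implementation.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size, not the paper's parameter


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and fingerprint each with MD5."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield hashlib.md5(chunk).hexdigest(), chunk


def deduplicate(data: bytes, store: dict) -> list:
    """Write only chunks whose MD5 fingerprint is new to the store.

    Returns the file recipe: an ordered list of fingerprints that
    preserves the logical order of the chunks for later restore.
    """
    recipe = []
    for fp, chunk in chunk_fingerprints(data):
        if fp not in store:   # unseen chunk: store it once
            store[fp] = chunk
        recipe.append(fp)     # duplicates only add a reference
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original file by concatenating chunks in recipe order."""
    return b"".join(store[fp] for fp in recipe)
```

A second upload of identical data adds no new chunks to the store, which is why write performance improves with the proportion of duplicate data; the restore-speed problem the paper targets arises because, in a real system, those shared chunks live in containers scattered across disk rather than in a dictionary.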
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Gayathri Devi, K., Raksha, S., Sooda, K. (2020). Enhancing Restore Speed of In-line Deduplication Cloud-Based Backup Systems by Minimizing Fragmentation. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol 159. Springer, Singapore. https://doi.org/10.1007/978-981-13-9282-5_2
DOI: https://doi.org/10.1007/978-981-13-9282-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9281-8
Online ISBN: 978-981-13-9282-5
eBook Packages: Intelligent Technologies and Robotics (R0)