Abstract
This paper addresses the problems caused by physically scattered chunks of data. Fragmentation occurs in two forms, sparse containers and out-of-order containers, and both degrade restore speed and garbage-collection efficiency. Out-of-order containers reduce restore speed in particular when the restore cache is small. To diminish fragmentation, we propose the History-Aware Rewriting (HAR) algorithm, which exploits historical information from previous backups to identify and reduce sparse containers. Each chunk is assigned a unique fingerprint by the Message Digest 5 (MD5) hash function, and logical block addresses are used to merge the blocks back into the original single file. The Data Encryption Standard (DES) is used to generate a secret-key file, which is issued to the user when the data owner creates the user. Collectively, these algorithms aim to minimize the fragmentation problem in an in-line deduplication backup storage system. The improvement in restore performance depends on the amount of duplicate data: simulation results show that if the same data is uploaded twice, write performance rises by up to 80%, and by up to 90% for a third upload of the same data. This value varies with the deduplication technique. When there is no duplicate data at all, the model does not affect the system.
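To make the chunk-fingerprinting step concrete, the sketch below shows a minimal in-line deduplication loop in Python: each fixed-size chunk is fingerprinted with MD5, only previously unseen chunks are written to the store, and a recipe of fingerprints (standing in for the logical block addresses mentioned above) is kept so the original file can be reassembled. The fixed 4 KiB chunk size, the in-memory dictionary store, and the function names are illustrative assumptions, not the paper's implementation.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size, not the paper's parameter


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and fingerprint each with MD5."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield hashlib.md5(chunk).hexdigest(), chunk


def deduplicate(data: bytes, store: dict) -> list:
    """Write only chunks whose MD5 fingerprint is new to the store.

    Returns the file recipe: an ordered list of fingerprints that
    preserves the logical order of the chunks for later restore.
    """
    recipe = []
    for fp, chunk in chunk_fingerprints(data):
        if fp not in store:   # unseen chunk: store it once
            store[fp] = chunk
        recipe.append(fp)     # duplicates only add a reference
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original file by concatenating chunks in recipe order."""
    return b"".join(store[fp] for fp in recipe)
```

A second upload of identical data adds no new chunks to the store, which is why write performance improves with the proportion of duplicate data; the restore-speed problem the paper targets arises because, in a real system, those shared chunks live in containers scattered across disk rather than in a dictionary.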
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Gayathri Devi, K., Raksha, S., Sooda, K. (2020). Enhancing Restore Speed of In-line Deduplication Cloud-Based Backup Systems by Minimizing Fragmentation. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol 159. Springer, Singapore. https://doi.org/10.1007/978-981-13-9282-5_2
DOI: https://doi.org/10.1007/978-981-13-9282-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9281-8
Online ISBN: 978-981-13-9282-5
eBook Packages: Intelligent Technologies and Robotics (R0)