Abstract
A distributed file system can store and share files across a peer-to-peer network using the InterPlanetary File System (IPFS) protocol. In a distributed file system, files are saved on multiple servers and accessed by remote clients that hold the proper authorization rights within the network. The volume of data generated each minute is now very large, and the same data may be accessed by many users at once. This makes data difficult to manage, access, and process, and ultimately leads to concurrency control issues. Concurrency control is the process of managing simultaneous operations on shared data so that execution remains serializable for multiple users. In the context of Hadoop, if multiple clients want to write updated data to the Hadoop Distributed File System (HDFS), a protocol must be followed to ensure that a write by one client does not affect the computation performed by another. This paper describes how multiple users can access the same file without its content being corrupted. It also provides a theoretical solution to the concurrency control problem in Hadoop: implementing Hadoop’s Java-based FileSystem interface for the decentralised, peer-to-peer file system IPFS. The proposed interface would allow Hadoop MapReduce jobs to run directly on data files hosted on IPFS.
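As a rough illustration of why content addressing helps with the concurrent-write problem described above, the following Python sketch models a toy content-addressed store. It is not the chapter's proposed interface and does not use the IPFS or Hadoop APIs: the `ContentStore` class, its method names, the use of a SHA-256 hex digest as a stand-in for an IPFS content identifier, and the compare-and-swap update rule are all assumptions made for this example.

```python
import hashlib


class ContentStore:
    """Toy in-memory stand-in for a content-addressed store.

    Hypothetical sketch: single process, not thread-safe, not the IPFS API.
    """

    def __init__(self):
        self._blocks = {}  # content id -> immutable bytes
        self._heads = {}   # logical path -> content id of the latest version

    def put(self, data: bytes) -> str:
        # The id is derived from the content, so a stored block never changes:
        # different content always gets a different id.
        cid = hashlib.sha256(data).hexdigest()
        self._blocks[cid] = bytes(data)
        return cid

    def get(self, cid: str) -> bytes:
        return self._blocks[cid]

    def resolve(self, path: str):
        # Returns the id of the latest published version, or None.
        return self._heads.get(path)

    def publish(self, path: str, expected_cid, data: bytes) -> bool:
        # Optimistic update (assumed rule): succeed only if the caller
        # based its write on the version that is still the latest.
        if self._heads.get(path) != expected_cid:
            return False
        self._heads[path] = self.put(data)
        return True


store = ContentStore()
store.publish("/logs/day1", None, b"v1")

# Two clients read the same version, then both try to write an update.
seen_by_a = store.resolve("/logs/day1")
seen_by_b = store.resolve("/logs/day1")
ok_a = store.publish("/logs/day1", seen_by_a, b"v1 + update from A")
ok_b = store.publish("/logs/day1", seen_by_b, b"v1 + update from B")

# Client B's write is rejected, and the version B originally read is intact.
old_content = store.get(seen_by_b)
latest_content = store.get(store.resolve("/logs/day1"))
```

In this sketch a computation that holds the identifier of an older version keeps reading exactly the bytes it started with, while the rejected writer can re-read the latest version and retry. A real deployment would need a mutable naming layer and locking or consensus across peers, which this example leaves out.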
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sethi, J., Srivastava, S., Upadhyay, D. (2021). A Review on P2P File System Based on IPFS for Concurrency Control in Hadoop. In: Hassanien, A.E., Bhattacharyya, S., Chakrabati, S., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 1286. Springer, Singapore. https://doi.org/10.1007/978-981-15-9927-9_83
DOI: https://doi.org/10.1007/978-981-15-9927-9_83
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9926-2
Online ISBN: 978-981-15-9927-9
eBook Packages: Intelligent Technologies and Robotics (R0)