Keywords

1 Introduction

For several decades, people have been depending on centralized solutions that act as Trusted Third Parties (TTPs) to exchange information and to transfer assets through the Internet. These TTPs are responsible for securing data exchanges and they collect massive amounts of privacy-sensitive information from their users. However, a TTP becomes a single point of failure (SPOF) and is more vulnerable to security breaches and attacks [13]. As a solution to overcome this issue, blockchain [9] has gained massive attention due to its decentralized, transparent and immutable features. Indeed, blockchain allows participants to exchange information and store transactions without any TTP. Concretely, a blockchain is a chain of blocks that contain transactions, and each block is linked to the previous one with a cryptographic signature generated using a hash function. Adding a block to the chain relies on a consensus algorithm [3], which ensures that the same copy of the transactions in the block are validated by enough (in general, the majority) participants. For the validation to happen, different consensus algorithms (e.g. proof of work, proof of stake, etc.) are available nowadays, with different characteristics (computational cost, complexity, etc.). However, the availability of the recorded data to everyone in the blockchain network raises issues when it comes to privacy-sensitive data [8]. Besides this, the immutability property of blockchain guarantees that the data records stored in transactions are tamper-proof, i.e. they can neither be deleted nor be mutated, which can be seen as a limiting factor.

In this paper, we aim at addressing these challenges with a single framework that integrates the following contributions:

  • a solution for decentralized data storage that combines blockchain and Distributed Hash Table (DHT) to allow for data updates,

  • a Role-Based Access Control (RBAC) solution to manage access to privacy-sensitive data,

  • a flexible encryption design that allows to choose between multiple types of encryption while storing and querying data on the blockchain,

  • a proof-of-concept implementation with performance evaluation that demonstrate the feasibility of our solution.

The rest of the paper is organized as follows. Section 2 presents the motivating scenario that highlights the research problem. In Sect. 3, we discuss existing work and their limitations before highlighting the originality of our contribution. Section 4 presents our framework and its components and explains how its provides privacy-preserving, secure, and decentralized data management. Section 5 describes the experimental results that confirm the feasibility of our proposed solution. Finally, Sect. 6 concludes this paper and lays ground for future work.

2 Motivating Scenario and Research Problem

In this section, we present a wooden furniture supply chain scenario that motivates the need for data updates and highlights our research problems. In the following, we identified 6 actors that involve in the wood supply chain: 1) Wood cutting company identifies specific trees and cuts them into logs. 2) Transport company transports wood logs to warehouse. 3) Storage warehouse company stores logs temporarily. 4) Furniture assembly company assemble the furniture. 5) Furniture shop company displays the assembled furniture 6) Customer purchases wooden furniture and verifies product origin.

Our wood supply chain scenario highlights the need for trust and traceability in the supply chain process. Existing solutions rely on Radio Frequency Identification (RFID) technology to enable electronic traceability of wood in the supply chain. Generally, this traceability framework needs a third-party centralized database framework to collect and store RFID data, which leads to a Single Point Of Failure (SPOF). Decentralized storage solutions can solve the problem of a single point of failure. In particular, blockchain is a decentralized and distributed ledger technology that stores records of users in such a way that makes them accessible to all participants without the risk of SPOF. A blockchain consists of a linear sequence of blocks. The contents of each blockFootnote 1 contains a hash of the contents of the previous one to prevent the modification of stored transactions [11]. If the previous block is modified, the new hash one could generate from the content upon verification would not match the one stored in the next block. This design provides the blockchain with its immutability feature [6]: once data has been stored, no one can modify it.

However, our wood supply chain scenario highlights that actors need to insert, retrieve and delete data about their business activities, and at the same time, they need to be able to modify data, while keeping the proof that data was inserted. There is a need to develop a solution that overcomes the immutability characteristic of blockchain to allow update and delete operations on stored data. At the same time, the developed solution must fine-grained access control, as data access permissions vary depending on the data requester identity (data owner, business partners, client). In the following, we identify the research problems that raise from the scenario discussed above:

  • Data modification management: In our case study, actors may need to modify data in blockchain (e.g. number of logs and product type). However, blockchain does not allow data modification, once it has been added to the chain due to its immutability nature. The challenge consists in overcoming this limitation while keeping the properties that make the blockchain interesting.

  • Data security and access control: Blockchain stores data publicly and allows anyone to access it. In our context, a decentralized solution should ensure data privacy and protect privacy-sensitive data from unauthorized access.

In the following, we discuss the limitations of decentralized solutions supporting privacy-aware data access and data update.

3 Related Work

In this section, we present the related work and its limitations to store and update data on the blockchain. In [7], the authors present a blockchain-based framework to share data between stakeholders. Data hash sum is stored on blockchain while original data is stored on a MySQL database. However, MySQL databases are centralized and they are not as scalable as DHTs to store large amounts of data [5]. The authors in [14], propose a blockchain-based supply chain framework to maintain food traceability using a smart contract. However, product data is accessible publicly and immutable, which leads to privacy and data updates issues. In [12], the authors combine blockchain with DHT for secure IoT data sharing. Blockchain is used to store access control permissions, which is publicly visible and raise privacy issues. However, access control permissions are unable to modify due to blockchain immutability feature. The authors in [1], present a blockchain-based data storage for PingER (Ping End-to-End Reporting). The proposed solution use permissioned blockchain to store metadata of PingER files whereas corresponding data are stored on DHT without any encryption mechanisms. Additionally, this framework stores monitoring agent name and upload locations of the file on the blockchain, which raises data security and privacy issues. Inspired by the PingER metadata structure, our framework extends metadata structure and enables privacy and security management that ensures authorized access control and privacy protection.

As a summary, we have identified the most relevant work related to blockchain and DHT data storage. To the best of our knowledge, this is the first paper that provides decentralized data storage, data mutability, manages access to privacy-sensitive data, multiple types of encryption, and message sender anonymity at the same time in a single solution. In the following, we discuss the steps of our proposed framework in detail.

4 Proposed Framework

In this paper, we propose a secure privacy-aware decentralized framework that supports role-based access control, data mutability and actor’s anonymity. Each actor, as a peer of the framework, runs the same code that is structured into a set of components. The following subsections describe each of these components in detail as depicted in Fig. 1.

Fig. 1.
figure 1

Overview of a peer architecture.

4.1 General Overview

Our proposed framework allows wood supply chain actors to store data, read data upon request, and communicate with other actors through HTTP calls. Figure 1 shows the layout of our framework and its components organized around a main program. In our framework different actors are running the main program and they connect to each other using their APIs, after an initial call to the registry_server to get the list of available peers.

Let us consider the following example: an actor such as a wood cutter logs in our framework to store the number of logs cut on this day. When the program starts, the wood cutter will send a ‘POST’ request to the /peers resource of the registry_server to add its public key and URL (Uniform Resource Locator) to the list of connected peersFootnote 2. Then, it will call the /peers resource with the ‘GET’ method and retrieve the list of connected peers. It will then connect with other available peers to download a copy of the blockchain (/chain resource, method ‘GET’, available on each peer). Upon request, the main component will call the rbac_manager component to verify the current actor’s permission such as wood cutter to perform read and write operations. Indeed, each actor’s roles, resources, and permission are defined in this rbac_manager component. Our framework allows the authorized actor to choose different types of encryption methods while storing data and generates a public key, private key, or symmetric key accordingly. Before storing the data, the main component will call the encrypt_manager component to encrypt the entered data with the current actor’s public key or symmetric key depends on the selected encryption method. For each actor, the encrypt_manager component is responsible for generating public and private keys. This encrypted data sent to off-chain (key-value) storage called DHT_manager component, while corresponding pointer and metadata are stored on a blockchain_manager component (pointer is the hash of the data).

An authorized actor allows to create, update, delete, and read data using the pointer stored on the public ledger. A request (/chain/<block_no>, method ‘GET’) to the main component might call the ring_signature component to sign data anonymously, only in the case where the data is privacy-sensitive and the role of the requester requires anonymization. Accordingly, a request to (/chain/<block_no>, method ‘POST’), will create a block, or update it if it already exists, and process the contents sent in the request message.

The following subsections describe each of these components in detail.

4.2 Framework Components

In this section, we describe in detail the components of the proposed framework including decentralized data storage, authorized data access, ensure data traceability, and maintains the actor’s anonymity.

RBAC Manager Component: we use a Role-Based Access Control (RBAC) model to manage access to privacy-sensitive data. The RBAC model is based on the following four parameters: user, role, resource, and permission. In RBAC, users are actors related to the application. Roles are the application’s functions that allow to access resources based on the given permissions. A permission is an authorization to access one or more resources within the application [2].

In our work, we define the following users, roles, resources, and permissions that assigns permissions to the user based on their role in our wood supply chain scenario.

  • Users: In our framework, we need to define RBAC users according to the actors of the wood supply chain. Therefore, we define the following users: wood cutter, transporter, warehouse manager, furniture assembler, furniture seller, and customer.

  • Roles: According to the different actions our supply chain users can perform on the architecture, we define the following roles: 1) Data owner any userFootnote 3 can be data owner. Data owners can add, read, modify and delete data about their products. For example, a wood cutter would act as “data owner” and insert information such as (trees-cut:20, type:oak,). 2) Business partner the business partner role allows specific users (chosen by the data owner) to access data that is not available to anyone. For example, a furniture assembler would act as “business partner” and might be allowed to read from the previous example: (trees-cut:20, type:oak). 3) Public reader the public reader role gives access to all public data. For example, a customer would act as “public reader” and might be allowed to read (type:oak).

  • Resources: In our framework, user can access resources according to defined roles and permissions. In our framework, we define the following resources: 1) DHT: user can access DHT resource to add data about their business activities. For example, a wood cutter has a role “data owner” and store information such as (trees-cut:20, type:oak). 2) Blockchain: User allows to access blockchain resource to read data. For example, a customer has a role “public reader” and might be allowed to read information such as (type:oak).

  • Permissions: We define permissions to restrict user’s actions to access resources. For example, from previous example. a wood cutter has a role “data owner” and has a “permission” to write, read, update, and delete data such as (trees-cut:30, type:maple), whereas transport company would act as a “business partner” and has only “permission” to read information such as (trees-cut:30).

  • Rules and policies: Our framework defines rules and policies that controls access to the data such as private data, privacy-sensitive data, and public data. Our rbac_manager component is responsible to authenticate role of current login actor. It also ensures if current role has permission to access resource or not as denoted by verify_permission (role, operation, resource). For example, wood cutter has a role ‘business partner’ logs into the framework to store data on blockchain. The main component calls the method authenticate(actor, role) to authenticate that if a ‘business partner’ role exists in our rbac_manager component or not. After role authentication, the rbac_manager component verifies the permissions of actions for current login actor’s role such as if (actor_role == ‘owner’), then “owner” has permission to perform read, write, update, and delete all types of data on the blockchain. In case, if (actor_role == ‘business_partner’), then our framework allows just to read some data such as privacy-sensitive and public data. If (actor_role == ‘public_user’), then our framework provides access to just read public data. Our framework provides filter access based on role such as wood cutter as a ‘business partner’ has not permission to write, update, and delete data. We maintain data security by limiting unnecessary access to sensitive data based on each actor’s role. Please note that although this simple RBAC model answers the requirements of our scenario, more elaborate models could be plugged in without changing anything in the framework design.

Blockchain Component: We use blockchain_manager component to manage metadata and pointer of encrypted data. Our proposed metadata structure consists of the data entry date, data entry time, and data pointer. The main components of the blockchain include block transaction, consensus algorithms, and metadata extension. Each component is explained as follows.

  • Block transaction: Each block contains the block header, consensus signature, hash of the previous block, timestamps, verified metadata, and pointer of the actual data. In our framework, actors will connect to the framework and call initialize(chain) method to copy the blockchain if there will be any other available actors on the network, otherwise genesis block will be created and added to the blockchain. A blockchain is composed of a chain of the blocks where each block is comprised of many transactions [10]. Each transaction is broadcast on the network for verification and miners verifies the transaction through signature. Then, the verified transactions are added to the block of the blockchain. After storing verified metadata and pointer on the blockchain, our framework returns the block number to the data owner. The proposed framework allows data owner to access specific block to perform data update and delete operations.

  • Consensus mechanism: It is used in our blockchain_manager component to establish the agreement on one state of the data in a distributed network. It ensures that the same copy of the data is replicated to all nodes in the blockchain network. Further, it verifies the transactions from this block and prevents the attacker to change the state of the data. Our framework uses a proof of work consensus mechanism to add each block to the blockchain. To do so, miners solve the complex puzzle and receive a reward such as a new coin to validate the block. Miners validate the transactions in a block and add this block to the blockchain. Proof of work consensus mechanism prevents a malicious actor to compromise more than half of the hashing power on the blockchain. The process to verify the proof and its correctness is easy and fast. In the following we define the proposed metadata structure.

  • Metadata extension: In [1], the authors allow storing metadata in the blockchain. We follow a similar approach and store the metadata information for each piece of data to maintain product traceability and actors’ trust.

    In our framework (see Fig. 1), we have an RBAC_manager component to restrict user’s actions on the data and we use a blockchain component to store metadata and pointer of actual data that are stored on the DHT component. We use REST APIs (/chain) that allow actors to copy blockchain and to store and read data on the distributed framework. We propose a metadata extension that relies on paper [1], to handle privacy constraints on data. To do so, we propose to encrypt user’s sensitive information (e.g. location) with encryption mechanisms, and we store this encrypted data on offline storage (DHT). In our sample scenario, actual data on DHT consists of an actor’s name, product identity, product location, quantity, and wood type.

DHT Component: In the proposed framework the encrypted data of each actor is stored on off-blockchain (key, value) storage called DHT. We implement a DHT component of our framework by using the Kademlia library. DHT is comprised of network of nodes that enable actors to write/read data associated with a given key. Actor’s data are randomized across the nodes of the network and replicated to eliminate the chance of data loss. Our proposed framework records the date and time of each new data entered by the actor. This enables a network to keep track of the product and maintains the order of product entries.

Encryption Manager Component: In our framework, the encrypt_manager component is responsible for data encryption and decryption according to the selected encryption method. Our framework allows actors to choose encryption methods for each data write operation. If the data owner chooses the asymmetric encryption method then data will be encrypted with the owner’s public key and stored encrypted data on DHT. A public key is accessible publicly while the private key is kept private by the key’s owner to decrypt the data. If the data owner chooses the symmetric encryption method then data will be encrypted with a symmetric key and this symmetric key again will be encrypted with the owner’s public key to ensure that only the data owner can access it later. Both encrypted symmetric key and encrypted data will be stored on DHT.

Ring Signature Component: It is an option here to actor’s ensure anonymity within a group. A signature is created by any member from a set of public keys called a ring. Therefore, the identity of the signer remains hidden and no one can identify that who is the actual signer of the data. In our framework, the data owner can allow other actors to read their data upon request by using (/chain/<block_no>, method ‘GET’). To read data, we rely on encryption according to data reading requirements: 1) Private data will not be shared with anyone. Therefore, it will be encrypted with the owner’s public key, so only the owner can decrypt data using their private key. 2) Privacy-sensitive data is shared with only a specific number of users. It will be encrypted using the receiver’s public key, so later data can be decrypted only with the corresponding private key. The data owner will also sign data by using ring signature to remain anonymous within a group, An authorized requester can read data and verify the signature. 3) Public data is available to anyone. It will optionally be signed by ring signature or encrypted with the data owner’s public key to guarantee data ownership.

5 Implementation and Evaluation

This section discusses the implementation and evaluation of our proposed work. We implemented the key components of our framework by using an open source blockchain libraryFootnote 4 and the Kademlia DHT libraryFootnote 5. The blockchain library is used to achieve consensus on a distributed network and creation of blocks. While, we used the DHT to store and retrieve data link with a key in a network of peer nodes. We performed all the experimental process using Python 3. The experiments are performed on the data (privacy-sensitive, private, and public) entered by the actors into the framework.

We evaluated the key components of our proposed framework on 64-bit Microsoft Windows Operating System with 16 GB of memory. In the following, we discuss the qualitative security and privacy analysis as well as quantitative performance evaluation. 1) Security analysis: according to the design of proposed framework, only authorized actors are allowed to access the system to perform write, read, update, and delete operations. A malicious user cannot modify existing data unless he/she controls more computation power than all other miners. Our framework ensures following security properties: we achieve confidentiality using asymmetric and symmetric encryption. We encrypt data with the owner’s public key and store the corresponding pointer on the blockchain to achieve integrity. Our framework archives availability through the access control model. We ensures non-repudiation by adding metadata to the chain. 2) Linking attack: our framework uses a unique public key for each transaction. It prevents an attacker to link multiple data and transactions with the same ID. 3) Modification attack: in our solution, data owner has ability to encrypt data with their public key and store hash of the encrypted data on the blockchain. It also records evidence of data entry date and data entry time to trace last modification of data. An attacker can not modify owner’s data. 4) Privacy: our proposed solution ensures that the owner owns and control their private data. Actor’s private data will not be shared with other actors on the network. We encrypted privacy-sensitive and public data using requester public key to protect the data from malicious actor. In our proposed solution, we achieve anonymity using ring signature. 4) Scalability currently, we tested our prototype with six actors and achieve reasonable performance. Our framework is flexible and scalable to work with a large number of actors.

Fig. 2.
figure 2

Overall time overhead for asymmetric encryption

Performance Evaluation: We evaluate the time overhead to verify permission, data encryption/decryption using a symmetric or asymmetric method, DHT access, blockchain access, and overall total time while data store, read, update and delete operations. Figure 2a and Fig. 2b outline the time processing for both asymmetric encryption without ring signature and asymmetric encryption with ring signature. The results demonstrate that the total time of asymmetric encryption without ring signature is larger than the total time of asymmetric encryption with ring signature while store, update and delete data. We calculated the overall time for symmetric encryption as depicted in Fig. 3a and Fig. 3b. We compare results symmetric encryption without ring signature with symmetric encryption using ring signature. It is seen from the results that the total time of storing and deleting data for symmetric encryption without ring signature is larger than the symmetric encryption with ring signature. The total time to read data for symmetric encryption without ring signature is less than the symmetric encryption with ring signature. Total time to update data for both Fig. 3a and Fig. 3b are not much affected by the ring signature and symmetric encryption.

We also calculated average, standard deviation, min, and max value for both asymmetric and symmetric encryption while store, read, update, and delete data. We ran our prototype 50 times and experimental results show that asymmetric encryption gives a standard deviation of 0,022 s and symmetric encryption has a standard deviation of 0,023 s during data storing operation. To read data, asymmetric encryption has a minimum value of 0,124 s and symmetric encryption gives 0,142 s. For data update operation, asymmetric encryption has maximum value of 0,068 s and symmetric encryption gives 0,052 s maximum value. Experimental results clearly show that our proposed framework achieves a low overhead that is acceptable for the actor.

Fig. 3.
figure 3

Overall time overhead for symmetric encryption

6 Conclusion

In this paper, we illustrate the need for privacy-aware decentralized data storage, access control, data mutability, and actor anonymity in the wood supply chain scenario. Our framework enables this by combining the blockchain with DHT, role-based access control, and multiple encryption mechanisms that allow only authorized actors to access and modify their data without disclosing their identity on a distributed ledger. Thanks to its RESTful (between peers) and component-based (inside a peer) design, our framework is fully reusable across the wide diversity of possible application domains and use cases. We also presented a performance evaluation regarding its operation. Our simulation results demonstrate that our framework shows promising results and achieves an acceptable overhead. To the best of our knowledge, this research is the first work that integrates this combination of technologies in a single framework. In future work, we plan to compare our solution to similar blockchain implementations. Furthermore, we will study how the behaviour of our prototype evolves over larger number of peers, and devise optimizations to improve its performance over large scale networks, in real or simulated environments.