Keywords

1 Introduction

The various benefits of emerging cloud computing and its storage service have attracted a lot of enterprises and private users. Although it is popular with companies and private users, cloud storage can be targeted by criminals. As the users outsource their local data to the cloud, the security of data on the cloud becomes a very important issue [1]. Compared with the local environment, the dynamic nature of cloud computing makes traditional data security solutions become powerless, which brings greater challenges to the protection of cloud data [2,3,4]. Especially for users as the government, banking, securities, insurance and other industries, as well as large enterprise users, they attach great importance to data security. If sensitive data is stolen or leaked, it will cause huge losses to cloud users. Besides, there is another possibility that cloud service may be abused. For example, criminals can also store and hide incriminating and illegal materials in cloud, even distribute illegal information through shared cloud storage service [5].

Different to the previous cloud auditing research [6,7,8], we pay more attention to the security of log generated during the operation of cloud data. As any user behavior is recorded by cloud service provider as activity logs, the logs can be an important resource for extracting user behavior characteristics or illegal behavior. Collecting evidence of incidents or crimes involving cloud users and cloud services used by them is known as digital forensics (or cloud forensics) [9, 10]. Hence, logs are critical evidence in cloud forensics [11]. With analyzing logs, investigators can trace back after the event to obtain important clues [12]. However, acquisition of logs is completely dependent on the cloud service provider because logs are generated and stored by them. Considering that after an incident, the cloud service provider may modify the log data in order to evade responsibility. And the existence of some hardware and software failure factors will inevitably lead to the destruction of data. Therefore, the logs provided directly by the cloud service provider are not completely trusted. In our threat model, the cloud provider is honest when generates the log. However, after that the provider may hide the fact that logs have been tampered with, or even manipulate the logs for some business interests. An incompletely trusted log will throw subsequent investigations and forensics into a dilemma. To this end, ensuring the integrity of logs in cloud storage must be achieved before log analysis in digital forensics.

As trustworthy logs are very critical for digital forensics, researchers have come up with several solutions to ensure the security of log. However, most of current schemes focused on the secure logging method. Although some of them supported public verification, the schemes required downloading the whole logs for verification, which was not practical for cloud environment. In [13], public auditing for operation behaviors log in cloud storage was introduced, which enhanced the credibility of forensic results. However, the scheme was complex and caused large computational and communication overhead. On this basis, we made some improvements. In this paper, we present a novel public secure auditing scheme for logs in cloud based on blockchain. As blockchain can ensure the integrity and non-repudiation of data [14], the properties of blockchain make it suitable for protecting log security. However, the size of log in cloud is generally significant and it is not realistic to store log in the blockchain. Therefore, we just store only a small amount of important auxiliary information on the blockchain. In addition, considering that log information is highly sensitive and user’s privacy issues are directly related to it, we use random masking technique proposed in [15], enabling the auditor to audit the logs without learning the log content. The proposed scheme ensures that the forensic investigators get secure and reliable logs.

The contributions of this paper are as follows:

  1. 1.

    We designed a secure and efficient auditing scheme for logs recording activities of users generated in cloud environment.

  2. 2.

    We introduce blockchain to prevent external attacker or CSPs from tempering with the logs after-the-fact.

  3. 3.

    We evaluate the performance of the proposed scheme by experiments and comparisons with the state-of-the-art schemes and prove our advantage in computing overhead.

The rest of this paper is organized as follow. Section 2 provides some background information and our public auditing model for logs in cloud storage. Section 3 presents our scheme in detail. In Sect. 4, we provide the performance evaluation, and finally we conclude in Sect. 5.

2 Problem Statement

2.1 Public Auditing Model for Logs in Cloud Storage via Blockchain

A representative public auditing model for cloud storage logs is illustrated in Fig. 1. The model includes three different entities as follows:

Fig. 1.
figure 1

Public auditing model for logs in cloud storage via blockchain

Users, an entity that needs to store large amounts of data in the cloud. The user is the owner of the data and has the right to share data (the data hosted by the user can be shared), which can be an individual or an organization.

Cloud Storage Provider (CSP), an entity that is responsible for supervising and maintaining the outsourced data provided by users. At the same time, in order to meet user needs, CSP also provides related services such as cloud computing and logging.

Third Party Auditor (TPA), an entity that is authorized by the user or its forensic institution to verify the integrity of log operations on the cloud and return the verification results.

In general, users hosting large amounts of data on a cloud server can greatly reduce local storage and computational overhead. In addition, data stored on the cloud can be easily shared with others. Generally, users and other data visitors who are granted legal rights can complete a series of operations on data through cloud services, such as adding, deleting, and modifying cloud data. And every operation of the data should be recorded in the log by the cloud service provider correctly, since there may be illegal operations on cloud data. Once the data is destroyed, we need to obtain evidence of illegal operations through log analysis. Determining the integrity of the log before log analysis is the key to solving the above problem. Therefore, we introduced a log integrity audit. In cloud data integrity auditing, a common practice is to introduce a trusted third party to improve the credibility of the audit process. In view of this, we propose a public audit model for logs in cloud storage based on third parties and use the blockchain as an aid.

2.2 Design Objectives

To efficiently support public verification of log integrity in cloud storage, our protocol is designed to achieve the following security and performance objectives:

  • Public auditing: to enable any authorized TPA to verify the integrity of the cloud logs.

  • Auditing correctness: to ensure that there exist no cheating cloud logs that can pass TPA’s audit without indeed storing logs intact.

  • Privacy preserving: to ensure that TPA cannot directly or indirectly obtain the actual log content during the auditing process.

  • Blockless verification: to ensure that the actual log data does not need to be retrieved during the audit process.

  • Lightweight: to enable TPA to carry out auditing with minimum communication and computation costs.

3 The Proposed Scheme

In this section, we first briefly introduce the preliminaries that will be used to construct the proposed scheme. Then, we present our public auditing for logs in cloud storage with all design details.

3.1 Preliminaries

Homomorphic Hash Function:

A homomorphic hash function \( H \) is a collision-resistant hash function that for any vectors \( a,b \) and scalars \( \alpha ,\beta \) holds that \( H\left( {\alpha a + \beta b} \right) = H(a)^{\alpha } H(b)^{\beta } \). Collision resistance implies that if one knows vectors \( H\left( c \right) = H(a)^{\alpha } H(b)^{\beta } \) then it must be \( c = \alpha a + \beta b \) [16].

Merkle Hash Tree (MHT):

Merkle hash tree is a classic validation data structure [17] and is often used to verify the data integrity. As Fig. 2. Shows, it is a binary tree consisting of a root node, a set of intermediate nodes, and a set of leaf nodes. The bottom leaf node contains the stored data or its hash value. Each intermediate node is the hash of the contents of its two child nodes as well as the root node [18], such as \( h_{d} = h(h\left( {N_{3} } \right)||h\left( {N_{4} } \right)) \). Suppose the verifier has the value \( h_{r} \) corresponding to the root node and he wants to verify the integrity of \( N_{3} \) and \( N_{6} \). Then, only the certifier needs to provide relevant auxiliary information \( \varOmega = \{ N_{3} ,\,N_{6} ,\,h\left( {N_{4} } \right),\,h\left( {N_{5} } \right),\,h_{c} ,\,h_{f} \} \). The verifier could use the auxiliary information to recursively obtain the root node \( h_{r}^{ '} \) by constructing the MHT and check whether the calculated \( h_{r}^{ '} \) is the same as the authentic one.

Fig. 2.
figure 2

Example of Merkle hash tree

Blockchain Technology:

Blockchain [19] is a distributed general ledger technology derived from digital cryptocurrency bitcoin [20]. The development of this technology has aroused widespread concern in industry and academia. From a technical point of view, blockchain is a combination of innovations in various technologies, such as peer-to-peer network technology, asymmetric encryption technology, consensus mechanism, smart contracts, etc. Therefore, blockchain technology has many features and we have briefly summarized it here:

  • Decentralization. A blockchain is a point-to-point network composed of many nodes. There is no centralized device and management organization. The maintenance of the network depends on all the nodes with maintenance functions in the network. Each node has equal status.

  • Dependence. The blockchain operation rules and inter-node data are open and transparent, and nodes do not need to rely on trusted third parties to establish trust relationships in advance.

  • Irreversibility. The latter block in the blockchain system stores the hash value of the previous block. Therefore, to tamper with the data of a certain block, the data content of the block and all subsequent blocks must be modified accordingly. So, tampering with data in a blockchain system is computationally infeasible.

  • Traceability. The blockchain stores data by using a time-stamped chained block structure. So, the transactions on the chain are traceable.

3.2 Description of the Proposed Scheme

In our scheme, after each user operation is detected, a log entry e will be generated by CSP. The log entries generated within a certain period constitute a log block Bi = {ei1, …, eis}. A log file F, which needs to be verified, consists of a fixed number of log blocks: {B1, …, Bn}. Let H(∙) and h(∙) denote a homomorphic hash function and a one-way hash function respectively, G be a multiplicative cyclic group of the prime order q, g represent the generator of G. The system public parameters are {G, q, g, H, h}.

Specifically, our scheme consists of 4 algorithms: {Setup, Challenge, Prove, verify}. The details of the proposed scheme are as follows:

  1. 1.

    Setup Algorithm: In this algorithm, the CSP generates a key pair. Then, CSP computes a tag for each log block Bi of the log file F, and constructs a Merkle tree with hashes of log block tags as leaf nodes. Finally, the CSP stores all the tags and publishes the tree root into blockchain to protect the integrity of the tags.

    1. a.

      The CSP generates tag σi for each log block Bi as:

      $$ \sigma_{i} = H\left( {\sum\nolimits_{j = 1}^{s} {e_{ij} } } \right) $$
      (1)

      Let Σ = {σi} be the set of tags, where 1 ≤ i ≤ n.

    2. b.

      The CSP uses tags as leaf nodes to construct a Merkle tree MT and generates the tree root R. Finally, the CSP publishes the tree root R with blockchain to prevent the tags from being altered.

  2. 2.

    Challenge Algorithm: This algorithm is conducted by the TPA to generate an auditing challenge for checking the integrity of logs in the cloud. The TPA first randomly picks a set L = {l1, …, lc} with c elements, where 1 ≤ li ≤ n. Then, the TPA chooses a random value vi ∈ ℤp, for each element lL. Finally, the TPA sends the auditing challenge chal = {i, vi} to the CSP, where iL.

  3. 3.

    Prove Algorithm: After receiving the auditing challenge, the CSP runs this algorithm to generate a proof of log integrity as follows:

    1. a.

      The CSP first chooses s random valuer wj ∈ ℤp, 1 ≤ j ≤ s. Then, the CSP computes

      $$ W = H\left( {\sum\nolimits_{j = 1}^{s} {w_{j} } } \right) $$
      (2)
      $$ E_{j} = \sum\limits_{i \in L} {v_{i} \cdot e_{ij} + w_{j} } $$
      (3)
    2. b.

      The CSP generates the sibling path from the leaf nodes {σi} to the root node in the tree as an auxiliary information {Ωi}, where iL. Finally, the CSP responds the TPA with proof P = {W, Ej, {σi, Ωi}iL} jS.

  4. 4.

    Verify Algorithm: In this algorithm, the TPA first retrieves the root R from the blockchain. Then, the TPA computes root r using the received {σi, Ωi}iL and checks that if r and R are equal. If r is not the same value as R, the TPA outputs False, which means that the tags of the log blocks stored in cloud has been tampered with. Otherwise, the TPA checks where the following equation holds:

    $$ W \cdot \prod\limits_{i \in L} {\sigma_{i}^{{v_{i} }} } = H\left( {\sum\nolimits_{j = 1}^{s} {E_{j} } } \right) $$
    (4)

If the equation holds, output TRUE; otherwise, False.

Here, we give the demonstration about the correctness of the verification Eq. (5) as follows:

$$ \begin{aligned} {\text{Right}} = & \;H\left( {\sum\nolimits_{j = 1}^{s} {\left( {\sum\limits_{i \in L} {v_{i} \cdot e_{ij} + w_{j} } } \right)} } \right) \\ = & \;H\left( {\sum\nolimits_{j = 1}^{s} {\sum\limits_{i \in L} {v_{i} \cdot e_{ij} } } + \sum\nolimits_{j = 1}^{s} {w_{j} } } \right) \\ = & \;H\left( {\sum\limits_{i \in L} {v_{i} \left( {\sum\nolimits_{j = 1}^{s} {e_{ij} } } \right)} } \right) \cdot H\left( {\sum\nolimits_{j = 1}^{s} {w_{j} } } \right) \\ = & \;\prod\limits_{i \in L} {H\left( {\sum\nolimits_{j = 1}^{s} {e_{ij} } } \right)}^{{v_{i} }} \cdot W \\ = & \;\prod\limits_{i \in L} {\sigma_{i} }^{{v_{i} }} \cdot W = {\text{Left}} \\ \end{aligned} $$
(5)

4 Performance Evaluation

4.1 Theoretical Analysis

We compare our scheme with ref. [18] in terms of computational cost. For convenience, we define the following notations to denote the operations. EXP and MUL to denote the complexity of one exponentiation operation and one multiplication operation on Group G respectively. With the use of homomorphic hash function, our computation overhead in tag generation for each log file is nEXP operations, where n is the number of elements in block, compared with the overhead in [18] which is 2nEXP and n hash operation. Since both schemes use Merkle hash tree, we do not consider the time of constructing the MHT in this section. In our scheme, the CSP runs the Prove algorithm with one EXP and c MUL operations, where c is the number of challenged blocks. After that, TPA runs Verify algorithm with (c + 1) EXP and one MUL operations. However, in [18] needs cEXP and cMUL operations while the TPA needs two Pairing, 2cEXP and cMUL operations. In conclusion, our scheme can reduce computation overhead in both CSP and TPA side.

4.2 Experimental Results

In this subsection, we show the performance of our scheme by the experiments implemented with python based on Pairing Base Cryptography (PBC) library version 0.5.14 and employs an MNT d159 curve based on 160-bit group order to achieve 80-bit security parameter. We conduct all the experiments on a Linux system (ubuntu 16.04.2 LTS x64 with 4.8.0 kernel version) with an Intel Xeon E3-1225 v5 CPU at 3.31 GHz, 8 GB RAM and a 7200 RPMSATA 2 TB. All the results are the averages of 20 trials. In experiments, we assume that a log file is 20 MB composed by 5000 blocks.

We conduct the experiments for tag generation and evaluate its performance in Fig. 3, where the number of blocks is increased from 200 to 2000 with intervals of 200. As we can see, the time cost of tag generation linearly increases with the number of blocks, which ranges from 315.56458 ms to 3059.623603 ms.

Fig. 3.
figure 3

The computation overhead of tag generation under different number of blocks

The computation cost for the algorithm challenge, proof and verify is shown in Figs. 4 and 5. The results of these experiments under different number of challenged blocks varied from 100 to 1000 show that the time in different auditing phases increases proportionally to the number of challenged blocks. From the above experimental results, it is clearly that most of the computation overhead is taken by the CSP and TPA only costs a little time. Therefore, our proposed scheme achieves high efficiency on TPA side.

Fig. 4.
figure 4

The computation overhead of TPA in challenge phase under different number of challenged blocks

Fig. 5.
figure 5

Computation overhead in different auditing phase under different number of challenged blocks

5 Conclusion

In this paper, we present a public auditing scheme for logs in cloud storage with privacy preserving. Specifically, we utilize homomorphic hash function to generate tag of log block. As the tag is generated by the CSP and the CSP is not completely trusted, we aggregate the log tags by Merkle hash tree and publish the tree root with blockchain, which not only reduces the computation cost of CSP to generate tags, but also prevents the log tags from being tampered with. Besides, our proposed scheme can support public auditing without leaking the log content with the use of random masking technique. The theoretical analysis and experiment results show that the proposed scheme has a good performance.