
1 Introduction

Cloud computing has been envisioned as the next-generation architecture of IT enterprise [1]. It enables users to access infrastructure and application services on a subscription basis. These services can be categorized into Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [2]. Owing to advantageous characteristics including large-scale computation and data storage, virtualization, high scalability and elasticity, cloud computing technologies have developed rapidly; an important branch is the cloud storage system. Cloud storage is a new paradigm for delivering storage on demand over a network, billed for just what is used. Many international IT corporations now offer cloud storage services on scales from individual to enterprise, such as Amazon Simple Storage Service (S3) and EMC Atmos Cloud Storage.

Although cloud storage is growing in popularity, data security remains one of the major concerns in the adoption of this new paradigm of data hosting. For example, cloud service providers may discard data that is never or rarely accessed to save storage space, or keep fewer replicas than promised [3]. A storage service provider that occasionally experiences Byzantine failures may decide to hide the data errors from the client for its own benefit [1]. Furthermore, disputes occasionally arise from the lack of trust in the cloud service provider (CSP), because data changes may not be known to the client in a timely manner, even when these disputes result from the user's own improper operations [4]. Therefore, clients would like to check the integrity and availability of their stored data. However, the large size of the outsourced data and the client's limited resources impose an additional restriction: the client should be able to perform the integrity check without downloading all stored data.

To date, extensive research has been carried out to address this problem [5–14, 18–20]. Early work concentrated on enabling data owners to check the integrity of remote data, which can be denoted as private verifiability. Although schemes with private verifiability can achieve higher efficiency, public verifiability (or public auditability) allows anyone, not just the client (data owner), to challenge the cloud server for the correctness of data storage while keeping no private information [1]. In cloud computing, data owners are able to delegate the verification of data integrity to a trusted third party auditor (TPA), who has the expertise and capabilities to audit the outsourced data on demand. This is because clients themselves are unwilling to perform frequent integrity checks due to the heavy overhead and cost.

Recently, public auditability has become one of the basic requirements for any proposed data storage auditing scheme. However, some major concerns still need to be resolved before auditing schemes can be put into practical use. Many big data applications keep clients' data in the cloud and offer frequent update operations; Twitter is a typical example. Data stored in the cloud may not only be accessed but also updated by the clients, by modifying an existing data block, inserting a new block, or deleting any block. Supporting the most general forms of update operations is important for broadening the scope of practical applications of cloud storage. Therefore, it is imperative to extend auditing schemes to support provable updates to outsourced data. Unfortunately, traditional data integrity verification schemes are mainly designed for static data storage, and direct extensions of these schemes may lead to functional defects or security vulnerabilities. In this paper, we focus on better support for dynamic data operations in cloud storage applications. We employ a secure signature scheme from bilinear maps [15] and the Large Branching Tree (LBT) to achieve this aim. Our contributions can be summarized as follows:

  1. We formally define the framework of a dynamic provable data possession scheme and provide an efficient construction, which supports fully dynamic updates including modification, insertion and deletion.

  2. We analyze existing schemes and point out the disadvantages of the Merkle Hash Tree (MHT) when used as the data structure for dynamic updates. For better efficiency, we replace the MHT with an LBT. This multi-branching data structure reduces the size of the auxiliary information, thereby incurring lower communication cost than MHT-based schemes.

  3. We employ a secure signature algorithm for the LBT data structure. Thanks to the properties of bilinear pairings, the signature algorithm incurs only O(1) computation cost on the CSP for each dynamic update. Moreover, the client no longer needs to construct the LBT structure to support dynamic operations. Consequently, this algorithm greatly reduces the computation cost on both the CSP and the client, and simplifies the update process.

  4. We prove the security of our proposed construction and justify the performance of our scheme through comparisons with existing data integrity verification schemes [1, 5–7, 11, 12].

The rest of this paper is organized as follows. Section 2 discusses related works. In Sect. 3, we introduce main techniques, system model and security model. Then, Sect. 4 presents the specific description of our proposed scheme. Section 5 provides security analysis. We further analyze the experimental results in Sect. 6. Section 7 concludes the paper.

2 Related Works

Recently, integrity verification for data outsourced to the cloud has attracted extensive attention. Existing provable data integrity schemes can be classified into two categories: proofs of retrievability (POR) and provable data possession (PDP). The POR scheme was first proposed by Juels et al. in 2007 [5]. In their scheme, the client can not only check the integrity of the remote data, but also recover the outsourced data in its entirety by employing erasure-correcting codes. Subsequent research on POR focused on providing security analysis [7] and improving the construction. However, most existing POR schemes can only be applied to static archive storage systems, e.g., libraries and scientific data sets [5, 7–9]. The reason is that the erasure-correcting codes used in POR systems introduce a problem: the whole outsourced data set must be processed to perform even a small update. This is the main obstacle to making POR dynamic.

In cloud computing, dynamic update is a significant requirement for many applications: the outsourced data should be dynamically updatable by the clients through modification, deletion and insertion. Therefore, an efficient dynamic auditing protocol is essential in practical cloud storage systems [10].

In 2007, Ateniese et al. [6] proposed the PDP framework. Compared to POR schemes, PDP does not use erasure-correcting codes and is hence more efficient. Although PDP does not provide a retrievability guarantee, dynamic techniques for PDP were well developed in follow-up studies. Ateniese et al. [11] gave a dynamic PDP scheme based on their prior work [6], in which the client pre-computes a limited number of random challenges with the corresponding answers and stores them at the server. This scheme cannot support insertion, since an insertion would affect all remaining answers.

The first fully dynamic PDP protocol was proposed by Erway et al. [12] in 2009. They considered using a dynamic data structure to support data updates, constructing rank-based authenticated dictionaries based on the skip list. However, the skip list requires a long authentication path and a large amount of auxiliary information during the verification process. Wang et al. [1] employed homomorphic signatures and the MHT data structure to support fully dynamic updates. Zhu et al. [4] proposed a dynamic auditing system based on fragments, random sampling and an Index-Hash Tree (IHT) that supports provable updates and timely anomaly detection. Later research focused on supplying additional properties [16], supporting distribution and replication [13], or enhancing efficiency with other data structures [17]. For instance, Wang et al. [18] first proposed a proxy provable data possession (PPDP) system. Their protocol supports a general access structure so that only authorized clients are able to store data on public cloud servers. Lin et al. [19] proposed a novel provable data possession scheme in which data of different values are integrated into a data hierarchy, and clients are classified and authorized with different access permissions. Their scheme also allows the data owner to efficiently enroll and revoke clients, which makes it more practical in cloud environments.

Recently, Gritti et al. proposed an efficient and practical PDP system by adopting asymmetric pairings [20]. Their scheme outperforms other existing schemes because no exponentiations and only three pairings are required. However, this scheme is vulnerable to three attacks, as they later pointed out [21]. Gritti et al. proposed solutions corresponding to each of the vulnerabilities of the scheme in [20]: they used IHT and MHT techniques to resist the replace attack and the replay attack, and employed a weaker security model to achieve data privacy. Although system security can be guaranteed, the performance of the system still needs improvement.

To solve the above problems, we introduce a new data structure, the Large Branching Tree (LBT), into the PDP system. The difference between an LBT and an MHT is that each non-leaf node has multiple children, say q. This multi-branching data structure enables the client to increase the number of a node's children and decrease the depth of the tree without inflating the signature length. To further improve system efficiency, we introduce a secure signature scheme to verify the values of the data blocks. In fact, the improvement stems from the different way in which sibling nodes are authenticated. We discuss this in detail in Sect. 4.

3 Preliminaries

3.1 Large Branching Tree

Compared to an MHT, an LBT is concise in structure. Each node of the tree except the leaves has more than two children. For concreteness, we take the outdegree of each node to be q and the height of the tree to be l. An authenticated LBT scheme produces signatures that represent paths connecting data blocks to the root of the tree. The authentication mechanism works inductively: the root authenticates its children, these nodes authenticate their children, and the authentication proceeds recursively down to the data blocks, each authenticated by its parent [15]. In our scheme, the way sibling nodes are authenticated is different. Since every node has multiple siblings, we label each node with a number denoting its position among its siblings, and a unique authentication value that can be verified independently is generated for the verification.
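To make the indexing concrete, the following minimal Python sketch shows how a leaf's position can be decomposed into per-level sibling positions in an LBT with outdegree q and height l (so \(n=q^l\) leaves). The base-q digit decomposition is our illustrative assumption, not notation from the paper.

```python
def path_positions(i, q, l):
    """Sibling positions (i_1, ..., i_l) on the path from the root to
    leaf i (0-indexed) in an LBT with outdegree q and height l."""
    digits = []
    for _ in range(l):
        digits.append(i % q)   # position among siblings at this level
        i //= q
    return list(reversed(digits))  # i_1 is the child index under the root

# Example: q = 4, l = 3 gives n = 64 leaves; leaf 37 sits at positions
# (2, 1, 1), so the path has l = 3 levels versus log2(64) = 6 levels
# in a binary MHT over the same number of blocks.
print(path_positions(37, 4, 3))  # [2, 1, 1]
```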

3.2 Dynamic PDP System

The dynamic PDP system for outsourced data in the cloud consists of three entities: the Client, who has limited storage and computational resources but a large amount of data to be stored in the cloud; the Cloud Storage Server (CSS), an entity which has huge storage space and is able to provide data maintenance and computation; and the Third Party Auditor (TPA), who specializes in verifying the integrity of outsourced data in the cloud upon receiving a request from the client. The system model is shown in Fig. 1.

We assume the communication between any two of these three entities is reliable. The whole auditing scheme is based on a challenge-response protocol and contains three phases: first, the client completes the initialization work and then hosts his/her data files in the cloud; second, the client performs an update operation by communicating with CSS; third, TPA and CSS work together to provide the data auditing service by exchanging challenge and proof messages. TPA reports the audit results to the client.

Fig. 1. System model

Definition 1

In a DPDP system, the client, CSS and TPA cooperate with each other to accomplish the challenge-response procedure. A DPDP scheme consists of the following algorithms:

  • \(KeyGen (1^{k}) \rightarrow \{sk,pk\}\). This probabilistic algorithm is run by the client. It takes as input security parameter \(1^{k}\), and returns private key sk and public key pk.

  • \(TagGen(F,sk) \rightarrow \{T\}\). This algorithm is run by the client to generate the metadata. It takes as input the data file F and the private key sk, and outputs the tag set T, which is a collection of signatures \(\{\tau _{i}\}\) on \(\{m _{i}\}\).

  • \(Update (F,Info,\varOmega ,pk) \rightarrow \{F^{'},P_{update}\}\). This algorithm is run by CSS in response to an update request from the client. As input, it takes the data file F, the update information Info, the previous auxiliary information \(\varOmega \) and the public key pk. The output is the new version of the data file \(F^{'}\) along with its proof \(P_{update}\). CSS sends the proof to TPA.

  • \(VerifyUpdate (P_{update},sk,pk) \rightarrow \{accept,reject\}\). This algorithm is run by TPA to verify that CSS updated the data correctly. The input contains the proof \(P_{update}\) from CSS, the new file \(F^{'}\) with its corresponding metadata \(T^{'}\), and the private and public keys. The output is accept if the proof is valid and reject otherwise.

  • \(Challenge (\cdot ) \rightarrow \{chal\}\). TPA runs this algorithm to start a challenge and sends the challenge information chal to CSS.

  • \(GenProof (F,T,chal,pk) \rightarrow \{P\}\). This algorithm is run by CSS. It takes data file F, metadata T, the challenge information chal and the public key as inputs, and outputs the proof for the verification.

  • \(VerifyProof (P,pk) \rightarrow \{accept,reject\}\). TPA runs this algorithm to verify the response P from CSS. It outputs “accept” if the proof is correct and “reject” otherwise.
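As a reading aid, the following Python sketch captures the shape of these seven algorithms as an interface; the method signatures and types are our own illustrative assumptions, not part of the scheme.

```python
from typing import Any, Protocol, Tuple

class DPDP(Protocol):
    """Interface sketch mirroring Definition 1 (types are illustrative)."""
    def key_gen(self, k: int) -> Tuple[Any, Any]: ...      # client: (sk, pk)
    def tag_gen(self, F: bytes, sk: Any) -> list: ...      # client: T = {tau_i}
    def update(self, F: bytes, info: Any, aux: Any,
               pk: Any) -> Tuple[bytes, Any]: ...          # CSS: (F', P_update)
    def verify_update(self, p_update: Any, sk: Any,
                      pk: Any) -> bool: ...                # TPA: accept/reject
    def challenge(self) -> Any: ...                        # TPA: chal
    def gen_proof(self, F: bytes, T: list, chal: Any,
                  pk: Any) -> Any: ...                     # CSS: proof P
    def verify_proof(self, P: Any, pk: Any) -> bool: ...   # TPA: accept/reject
```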

3.3 Security of Dynamic PDP

Following the security model defined in [12, 20], we define the security model for our proposed DPDP scheme by a data possession game between a challenger C and an adversary A. The detailed data possession game is described in Appendix A.

Definition 2

We say that a DPDP scheme is secure if for any probabilistic polynomial time (PPT) adversary A (i.e., malicious CSS), the probability that A wins the data possession game is negligible.

4 Construction

The main building blocks of our scheme are the LBT, a secure signature scheme proposed by Boneh et al. [15] and Homomorphic Verifiable Tags (HVTs) [6]. The LBT data structure is an extension of the MHT, which is intended to prove that a set of elements is undamaged and unaltered [1]. Naturally, one would consider employing the hash algorithm used in the MHT structure to authenticate the values of the nodes in the LBT, but this algorithm has an undesirable effect on performance: during the update process, modifying, inserting, or deleting even a single block affects the whole data structure, causing O(n) computation overhead for both the client and CSS. Therefore, it is imperative to find a better method to authenticate the LBT data structure. Instead of using hash functions, we employ the signature scheme of [15] to improve the efficiency of verifying the elements in the LBT; the computation complexity of an update decreases to O(1). As for public auditability, we resort to homomorphic verifiable tags, since HVTs make it possible to verify the integrity of the data blocklessly.

The procedure of our scheme is summarized in three phases: Setup, Dynamic Operation and Periodic Auditing. The details are as follows:

4.1 Setup

In this phase, we assume the data file F is segmented into \(\{m_1,m_2,...,m_n\}\), where \(n=q^l\) and q, l are positive integers. Let \(e:G \times G \rightarrow G_T\) be a secure bilinear map, where G is a group of prime order p with generator g. \(H:\{0,1\}^* \rightarrow G\) is a collision-resistant hash function; where a hash value appears in an exponent, it is understood to be mapped into \(Z_p\). Note that all exponents in the following algorithms are reduced modulo p, and for simplicity we omit writing “(mod p)” explicitly.

  • \({{\varvec{KeyGen}}}\,\, (1^{k})\). The client runs this algorithm to generate a pair of private and public keys. Choose a random \(x \leftarrow Z_p\) and compute \(y=g^x\). Pick \(\alpha _1,\alpha _2,...,\alpha _q \leftarrow Z_p\) and \(\lambda \leftarrow G\). Compute \(\lambda _1 \leftarrow \lambda ^{1/\alpha _1},\lambda _2 \leftarrow \lambda ^{1/\alpha _2},...,\lambda _q \leftarrow \lambda ^{1/\alpha _q} \in G\). Pick \(\mu \leftarrow G\), \(\beta _0 \leftarrow Z_p\), then compute \(\nu =e(\mu ,\lambda )\) and \(\eta _0 =e(\mu ,\lambda )^{\beta _0}\), where \(\eta _0\) denotes the root of the LBT (in contrast, the root of an MHT is a hash computed over all the nodes). For the remaining nodes of the LBT, the client chooses \(\{\beta _j\}_{1 \le j \le n}\). The client also generates a random signing key pair (spk, ssk). The public key is \(pk=\{y,\lambda ,\nu ,\mu ,\{\alpha _i\}_{1 \le i \le q},\{\beta _i\}_{1 \le i \le n},spk\}\) and the private key is \(sk=\{x,\beta _0,ssk\}\).

  • \({{\varvec{TagGen}}}\,\, (F,sk)\). The client chooses a random element \(\omega \leftarrow G\) for the file F, and for each data block \(m_i\) (\(i=1,2,...,n\)) computes a signature tag \(\tau _i \leftarrow (H(m_i)\cdot \omega ^{m_i})^x\). The set of all tags is denoted by \(T=\{\tau _i \},1 \le i \le n\). Let \(t=name \parallel n \parallel \omega \parallel Sig_{ssk}(name \parallel n \parallel \omega )\) be the tag for file F. The client then computes \(\gamma =Sig_x(\eta _0)\) and sends \(Ini=\{F,T,t,\gamma \}\) to CSS. The client also computes \(sig=Sig_{ssk}(t)\) and sends sig along with the auditing delegation request to TPA, so that TPA can compose challenges later on.

Upon receiving the initialization information Ini, CSS first stores all the data blocks and then constructs an LBT as follows: for the i-th data block \(m_i\) (\(i=1,2,...,n\)), CSS generates the i-th leaf of the LBT together with a path from the leaf to the root. We denote the leaf by \(\eta _l \in G_T\), where l is the layer of the leaf, and the nodes on its path to the root are \((\eta _l,i_l,\eta _{l-1},i_{l-1},...,\eta _1,i_1)\), where \(\eta _j\) is the \(i_j\)-th child of \(\eta _{j-1}\), \(1 \le j \le l\). The authentication values for these nodes are computed in the following steps:

  • Step 1: For every node on the path from leaf \(\eta _l\) to the root, CSS generates \(\eta _j \leftarrow e(\mu ,\lambda )^{\beta _j}\).

  • Step 2: The authentication value of node \(\eta _j\), the \(i_j\)th child of \(\eta _{j-1}\), is \(f_j \leftarrow \mu ^{\alpha _{i_j}(\beta _{j-1} +H(\eta _j))}\).

  • Step 3: The authentication value of \(H(m_i)\), the child of the leaf node \(\eta _l\), is \(f \leftarrow \mu ^{\beta _l + H(m_i)}\).

Therefore, the signature on data block \(m_i\) is \(\varOmega _i=(f,f_l,i_l,...,f_1,i_1)\), which also serves as the auxiliary information for authentication in the dynamic update process. The construction of the LBT is illustrated in Fig. 2.
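To illustrate Steps 1 and 2 concretely, the following self-contained sketch computes the authentication value for one parent-child link and checks it the way TPA later does, using the Charm-Crypto pairing library. The API calls, the hash-to-\(Z_p\) encoding and all variable names are our assumptions for illustration, not the paper's implementation.

```python
from charm.toolbox.pairinggroup import PairingGroup, G1, ZR, pair

group = PairingGroup('SS512')                 # symmetric pairing e: G x G -> GT
mu, lam = group.random(G1), group.random(G1)  # mu, lambda from KeyGen
nu = pair(mu, lam)                            # nu = e(mu, lambda)
alpha = group.random(ZR)                      # alpha_{i_j} for this child slot
lam_i = lam ** (~alpha)                       # lambda_{i_j} = lambda^{1/alpha_{i_j}}

beta_parent, beta_child = group.random(ZR), group.random(ZR)
eta_parent = nu ** beta_parent                # Step 1: eta_{j-1} = e(mu,lambda)^{beta_{j-1}}
eta_child = nu ** beta_child                  # Step 1: eta_j

def H_zr(elem):
    """Hash a node value into Z_p (illustrative encoding)."""
    return group.hash(str(group.serialize(elem)), ZR)

# Step 2 on the CSS side: authentication value of the child node
f_j = mu ** (alpha * (beta_parent + H_zr(eta_child)))

# TPA's later reconstruction: e(f_j, lambda_{i_j}) * nu^{-H(eta_j)} = eta_{j-1}
assert pair(f_j, lam_i) * nu ** (-H_zr(eta_child)) == eta_parent
```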

Fig. 2. Construction of LBT

4.2 Dynamic Operation

  1. Modification. The client composes an update request \(Info=(m,i,m_{i}^{'},\tau _{i}^{'})\), which indicates that the client wants to modify \(m_i\) to \(m_{i}^{'}\); here \(\tau _{i}^{'}=(H(m_{i}^{'}) \cdot \omega ^{m_{i}^{'}})^x\) is the signature of \(m_{i}^{'}\). The client then sends the update information Info to CSS.

  • \({{\varvec{Update}}}\,\, (F,Info,\varOmega ,pk)\). Upon receiving the update request, CSS first modifies the data block \(m_i\) to \(m_{i}^{'}\) and replaces \(H(m_i)\) with \(H(m_{i}^{'})\) in the LBT. As shown in Fig. 3, CSS generates the new authentication value \(f^{'} \leftarrow \mu ^{\beta _l + H(m_{i}^{'})}\) and updates the signature \(\varOmega \) to \(\varOmega ^{'}\). Note that CSS incurs only O(1) computation overhead. Finally, CSS responds with

    $$\begin{aligned} P_{update}=(H(m_{i}^{'}),\varOmega ^{'},\gamma ), \end{aligned}$$

    to TPA.

  • \({{\varvec{VerifyUpdate}}}\,\, (P_{update},sk,pk)\). TPA reconstructs the root \(\eta _{0}^{'}\) from \((H(m_{i}^{'}),\varOmega ^{'}_{i})\) as follows:

  • Step 1: Compute \(\eta _{l}^{'} \leftarrow e(f',\lambda )\cdot \nu ^{-H(m_{i}^{'})}\).

  • Step 2: Compute \(\eta _{j-1}^{'} \leftarrow e(f _{j}^{'},\lambda _{i_j})\cdot \nu ^{-H(\eta _{j}^{'})}\) for \(j=l,...,1\).

  • Step 3: The proof is accepted if \(e(\gamma ,g)=e(\eta _{0}^{'},y)\) or otherwise rejected.

Fig. 3. LBT update under modification

Fig. 4. LBT update under insertion

  2. Insertion. As the insert operation changes the structure of the LBT, this process is different from data modification. We assume the client wants to insert a block \(m^{*}\) after the i-th block \(m_i\). First, the client generates a tag \(\tau ^{*} \leftarrow (H(m^{*}) \cdot \omega ^{m^{*}})^x\). Then the client chooses two parameters \(\beta _{l+1},\beta _{l+1}^{*}\) and sends an update request \(Info=(i,m^{*},\tau ^{*}, \beta _{l+1},\beta _{l+1}^{*})\) to CSS.

  • \({{\varvec{Update}}}\,\, (F,Info,\varOmega ,pk)\). Upon receiving the update information, CSS updates the data file and turns the leaf node \(\eta _l\) into a parent node whose first child is \(\eta _{l+1}\) and whose second child is \(\eta _{l+1}^{*}\). Data blocks \(m_i\) and \(m^{*}\) are authenticated with respect to the leaves \(\eta _{l+1}\) and \(\eta _{l+1}^{*}\). As shown in Fig. 4, CSS computes the authentication values \(f _{l+1}\) and \(f _{l+1}^{*}\) from \(\eta _{l+1}\) and \(\eta _{l+1}^{*}\) respectively. The authentication values of the two blocks are computed as \(f \leftarrow \mu ^{\beta _{l+1}+H(m_i)}\) and \(f^{*} \leftarrow \mu ^{\beta _{l+1}^{*}+H(m^{*})}\). Finally, CSS responds to TPA with the proof \(P_{update}=\{(\varOmega _{i}^{'},H(m_i)),(\varOmega ^{*},H(m^{*})),\gamma \}\). The process is shown in Fig. 4; a structural sketch of this step follows the list below.

  • \({{\varvec{VerifyUpdate}}}\,\, (P_{update},sk,pk)\). This process is similar to the update verification process in the modification operation, except that the data blocks and the auxiliary information are different.

  3. Deletion. Suppose the client wants to delete the block \(m_i\). The update process is very simple: CSS only needs to delete \(m_i\) from its storage space and remove \(H(m_i)\) from the LBT structure.
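For the insertion step above, the following small Python sketch shows the structural change on a plain tree representation, i.e., how the old leaf becomes a parent of two new leaves. The dictionary layout and node naming are purely illustrative assumptions.

```python
def insert_after_leaf(tree, leaf_id, m_star):
    """tree: dict node_id -> {'children': [...], 'block': ...}.
    Turn the leaf holding m_i into a parent of two new leaves that
    hold m_i and the inserted block m*."""
    node = tree[leaf_id]
    m_i = node.pop('block')                  # old leaf becomes an inner node
    left, right = leaf_id + '.1', leaf_id + '.2'
    tree[left] = {'children': [], 'block': m_i}       # leaf eta_{l+1}
    tree[right] = {'children': [], 'block': m_star}   # leaf eta_{l+1}^*
    node['children'] = [left, right]
    # CSS then recomputes f <- mu^{beta_{l+1} + H(m_i)} and
    # f* <- mu^{beta_{l+1}^* + H(m*)} for the two new leaves.

tree = {'r.3': {'children': [], 'block': 'm_i'}}      # toy one-leaf tree
insert_after_leaf(tree, 'r.3', 'm_star')
```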

4.3 Auditing

After the Setup phase, no matter whether an update operation has been executed or not, integrity verification is available for TPA to perform its duty as an auditor. The integrity verification process is a challenge-response protocol: TPA generates a challenge message chal and sends it to CSS; CSS responds with a proof P; TPA then verifies the correctness of the proof and outputs accept or reject.

  • \({{\varvec{Challenge}}}\,\, (\cdot )\). Before challenging, TPA first uses spk to verify the signature on t and recovers \(\omega \). Suppose TPA wants to challenge c blocks. The indexes of these blocks are randomly selected from [1, n]; let \(I=\{i_1,i_2,...,i_c\}\) be the indexes of the challenged blocks. For each \(i \in I\), TPA chooses a random element \(\pi _i \leftarrow Z_p\). TPA then sends \(chal=\{(i,\pi _i)_{i \in I}\}\) to CSS.

  • \({{\varvec{GenProof}}}\,\, (F,T,chal,pk)\). Upon receiving the challenge, CSS takes the data F, tags T and challenge information chal as inputs, and outputs: \(\varphi = \sum \limits _{i \in I} \pi _i m_i\) and \(\tau = \prod \limits _{i \in I} \tau _{i}^{\pi _i}\).

Moreover, CSS also provides TPA with the auxiliary information \(\{\varOmega _i\}_{i \in I}\), which encodes the authentication paths from the challenged data blocks to the root. CSS sends the proof \(P=\{\varphi ,\tau ,\{H(m_i),\varOmega _i\}_{i \in I},\gamma \}\) to TPA.
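The two aggregates in GenProof are plain modular arithmetic; a minimal sketch over toy integers modulo a small prime (all values illustrative, with a multiplicative group standing in for G) looks as follows.

```python
p = 7919                                   # toy prime; real schemes use far larger groups
blocks = {3: 1234, 8: 567, 21: 89}         # challenged blocks m_i
tags = {3: 11, 8: 22, 21: 33}              # stand-ins for the tags tau_i
chal = [(3, 5), (8, 17), (21, 2)]          # (i, pi_i) pairs chosen by TPA

phi = sum(pi * blocks[i] for i, pi in chal) % p    # varphi = sum pi_i * m_i
tau = 1
for i, pi in chal:
    tau = (tau * pow(tags[i], pi, p)) % p          # tau = prod tau_i^{pi_i}
```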

  • \({{\varvec{VerifyProof}}}\,\, (P,pk)\). For each challenged block \(m_i\), \(i \in I\), TPA first uses the auxiliary information to reconstruct the nodes \(\eta _l,\eta _{l-1},...,\eta _0\) in bottom-up order by the following steps:

  • Step 1: Compute \(\eta _{l} \leftarrow e(f,\lambda )\cdot \nu ^{-H(m_{i})}\).

  • Step 2: For \(j=l,l-1,...,1\), compute \(\eta _{j-1} \leftarrow e(f_j,\lambda _{i_j})\cdot \nu ^{-H(\eta _{j})}\).

  • Step 3: Verify \(e(\gamma ,g)=e(\eta _0,y)\).

If the equality in step 3 holds, TPA continues to verify \(e(\tau ,g)=e(\prod \limits _{i \in I}H(m_i)^{\pi _i}\cdot \omega ^{\varphi },y)\).

If so, the proof is accepted, otherwise rejected.
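For the final tag equation, a hedged end-to-end sketch with the Charm-Crypto library is given below; it generates toy keys and tags, runs the challenge and aggregation, and checks \(e(\tau ,g)=e(\prod _{i \in I}H(m_i)^{\pi _i}\cdot \omega ^{\varphi },y)\). The library calls and the way blocks are encoded into \(Z_p\) are our assumptions, not the paper's code.

```python
from functools import reduce
from charm.toolbox.pairinggroup import PairingGroup, G1, ZR, pair

group = PairingGroup('SS512')                  # symmetric pairing, as in the paper
g = group.random(G1)
x = group.random(ZR); y = g ** x               # client key pair (sk, pk)
omega = group.random(G1)                       # per-file random element

blocks = {i: group.random(ZR) for i in range(1, 5)}        # toy blocks m_i in Z_p
Hm = {i: group.hash('block-%d' % i, G1) for i in blocks}   # stand-in for H(m_i)
tags = {i: (Hm[i] * omega ** blocks[i]) ** x for i in blocks}   # TagGen

chal = [(i, group.random(ZR)) for i in blocks]             # Challenge: (i, pi_i)

# GenProof on the CSS side
phi = reduce(lambda a, b: a + b, (pi * blocks[i] for i, pi in chal))
tau = reduce(lambda a, b: a * b, (tags[i] ** pi for i, pi in chal))

# Final check in VerifyProof on the TPA side
agg = reduce(lambda a, b: a * b, (Hm[i] ** pi for i, pi in chal)) * omega ** phi
assert pair(tau, g) == pair(agg, y)
```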

5 Correctness and Security

Correctness. Correctness of our scheme means that both the proof generated for dynamic auditing and the proof for integrity checking pass the verification algorithm. The correctness of the proof for dynamic auditing is easy to prove. Indeed, Step 1 of the verification algorithm results in

$$\begin{aligned} e(f,\lambda )\cdot \nu ^{-H(m_i)}=e(\mu ^{\beta _l+H(m_i)},\lambda )\cdot e(\mu ,\lambda )^{-H(m_i)}=e(\mu ^{\beta _l},\lambda )= \eta _l. \end{aligned}$$

For any \(j \in \{l, l-1, \dots , 1\}\), the result of computation in step 2 of the verification algorithm is

$$\begin{aligned} e(f_j,\lambda _{i_j})\cdot \nu ^{-H(\eta _j)}&=e(\mu ^{\alpha _{i_j}(\beta _{j-1}+H(\eta _j))},\lambda ^{1/\alpha _{i_j}})\cdot e(\mu ,\lambda )^{-H(\eta _j)}\\&=e(\mu ^{\beta _{j-1}+H(\eta _j)},\lambda )\cdot e(\mu ^{-H(\eta _j)},\lambda )=e(\mu ^{\beta _{j-1}},\lambda )=\eta _{j-1}. \end{aligned}$$

The proof for integrity checking is also based on the properties of bilinear maps.

$$\begin{aligned} e(\tau ,g)=e(\prod \limits _{i \in I}(H(m_i)\cdot \omega ^{m_i})^{x\pi _i},g)=e(\prod \limits _{i \in I}(H(m_i)^{\pi _i}\cdot \omega ^{m_i\pi _i}),g^x)=\!e(\prod \limits _{i \in I}H(m_i)^{\pi _i}\cdot \omega ^{\varphi },y). \end{aligned}$$

We now show that our proposed scheme is secure in the random oracle model. The security of our scheme rests on the fact that only a correctly generated proof can pass verification. We divide the security analysis of our scheme into two parts:

  1. Prove that if the challenger accepts a proof \(P=\{\varphi ,\tau ,\{H(m_i),\varOmega _i\}_{i\in I},\gamma \}\) in which \(\tau \) is a tag proof aggregated from forged tags for the challenged blocks, then the Computational Diffie-Hellman (CDH) problem can be solved with non-negligible probability.

  2. Prove that if the challenger accepts a proof \(P=\{\varphi ,\tau ,\{H(m_i),\varOmega _i\}_{i\in I},\gamma \}\) in which \(\varphi \) is a data proof generated by the adversary without possessing all the challenged blocks \(\{m_i\}_{i\in I}\), then the Discrete Logarithm (DL) problem can be solved with non-negligible probability.

Security. In analyzing existing schemes, we found that different schemes achieve different security levels, which we classify by their key techniques. Most MAC-based schemes are semantically secure. RSA-based and BLS-based schemes are both provably secure since they rely on public-key techniques. Like most homomorphic tag-based schemes, our scheme is provably secure in the random oracle model.

Theorem 1

If the tag generation scheme we use is existentially unforgeable and the CDH and DL problems are intractable in bilinear groups, then, in the random oracle model, no adversary against our provable data possession scheme can cause the verifier to accept a corrupted proof in the challenge-verify process with non-negligible probability, except by responding with a correctly computed proof.

Proof

The full proof of this theorem can be found in Appendix B.

6 Performance

In this section, we analyze the performance of our scheme in terms of storage overhead, computation cost and communication complexity.

6.1 Storage Overhead

Analyzing the state of the art, we find that what affects the storage overhead most is the metadata. For example, in [5] the verifier (the client) has to store the sentinels for verification, and in [14] the verifier (the client) needs to store MACs.

In our scheme, the metadata is stored at CSS instead of at the verifier (TPA). The client sends the metadata together with the data to CSS during the setup phase. For each challenge, CSS returns both the data proof and the tag proof to TPA.

Table 1 shows a comparison of the storage overhead of different schemes. In the table, k denotes the total number of sentinels, n denotes the total number of data blocks, \(\lambda \) is the security parameter, p denotes the order of the group G and N is the RSA modulus.

Table 1. Comparison of the storage overhead

6.2 Computation Complexity

There are three entities in our scheme: the client, CSS and TPA. We discuss their computation costs in each phase. In the setup phase, the client needs to compute 2 pairings, \(2n+2\) exponentiations and n multiplications on G.

Fig. 5. Comparison of pre-processing time

Fig. 6. Comparison of communication cost

For comparison, we implemented both our scheme and the MHT-based scheme [1] on Linux. All experiments were conducted on a system with an Intel Core i5 processor running at 2.6 GHz and 750 MB RAM. Pairing and SHA-1 algorithms are provided by the Pairing-Based Cryptography (PBC) library and the crypto library of OpenSSL. All experimental results represent the mean of 10 trials. Figure 5 shows the pre-processing time as a function of the number of blocks on the client side. The MHT-based scheme [1] exhibits slower pre-processing performance: our scheme only performs one exponentiation per data block to create the metadata, whereas in [1] the client needs to perform the exponentiations as well as construct an MHT to generate the root.

Besides, in the dynamic update phase, TPA only needs to compute 1 exponentiation for a modification and 2 exponentiations for an insertion, and incurs no computation for a deletion. Note that the computation complexity of CSS in scheme [1] is O(n) for all three update operations, where n is the number of data blocks. Therefore, the secure signature scheme based on bilinear maps [15] introduced in our scheme greatly reduces the computation overhead during the dynamic update phase. In the auditing phase, TPA needs to perform 2c additions and 2c multiplications, where c is the number of challenged data blocks, so the computation complexity of TPA is O(c).

6.3 Communication Cost

The main communication cost we are concerned with is the cost between CSS and TPA during each challenge-response query. Since the metadata is stored at CSS, the proof sent from CSS to TPA is larger; there is a trade-off between the storage overhead and the communication cost. The major component of the communication cost is the proof sent to TPA by CSS. We compare our scheme with the MHT scheme [1]. Figure 6 shows the proof size as a function of the number of challenged blocks. Our scheme clearly incurs less communication between CSS and TPA, and the auxiliary information accounts for that gap: in our scheme each challenged block contributes an authentication path of length \(l=\log _q n\), whereas in the MHT scheme [1] the path has length \(\log _2 n\), so the auxiliary information grows much faster with the number of challenged blocks.
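A quick back-of-the-envelope sketch of this effect (the parameter values are purely illustrative):

```python
def path_len(n, q):
    """Per-block authentication path length: smallest l with q**l >= n
    (q = 2 recovers the binary MHT)."""
    l = 0
    while q ** l < n:
        l += 1
    return l

n = 2 ** 20                  # ~1M data blocks
print(path_len(n, 2))        # binary MHT: 20 values per challenged block
print(path_len(n, 64))       # LBT with q = 64: 4 values per challenged block
```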

7 Conclusion

In this paper, we propose an efficient dynamic auditing scheme based on a secure signature scheme [15] and the LBT data structure. We formally give the system model and security model, and then present the concrete construction of the proposed scheme. The LBT data structure reduces the size of the auxiliary information, thereby incurring lower communication cost than MHT-based schemes. Moreover, thanks to the properties of bilinear pairings, the signature algorithm incurs only O(1) computation cost on the CSP for each dynamic update, and the client no longer needs to construct the LBT structure to support dynamic operations. Therefore, our scheme greatly reduces the computation cost on both the CSP and the client, and simplifies the update process. Security and performance analyses show that our scheme is provably secure and efficient.