Keywords

1 Introduction

Storage systems allow everyone to upload his/her data on cloud servers, and thus avoid keeping them on his/her own devices that have often limited storage capacity and power. Nevertheless, storage services are susceptible to attacks or failures, and lead to possible non-retrievable losses of the stored data. Indeed, storage systems are vulnerable to internal and external attacks that harm the data integrity even being more powerful and reliable than the data owner’s personal computing devices. A solution is to construct a system that offers an efficient, frequent and secure data integrity check process to the data owner such that the frequency of data integrity verification and the percentage of audited data should not be limited by computational and communication costs on both cloud server’s and data owner’s sides.

A Provable Data Possession (PDP) enables a data owner, called the client, to verify the integrity of his/her data stored on an untrusted cloud server, without having to retrieve them. Informally, the client first divides his/her data into blocks, generates tags on each block, and then forwards all these elements to the server. In order to check whether the data are correctly stored by the server, the client sends a challenge such that the server replies back by creating a proof of data possession. If the proof is correct, then this means that the storage of the data is correctly done by the server; otherwise, this means that the server is actually cheating somehow. Natural extension features of PDP include: (1) Dynamicity (D) that enables the client to update his/her data stored on the server via three operations (insertion, deletion and modification); (2) Public verifiability (PV) that allows a client to indirectly check that the server correctly stores his/her data by enabling a Third Party Auditor (TPA) or everyone else to do the audit; (3) Data privacy (DP) preservation that ensures that the contents of the stored data are not leaked to neither the TPA nor anyone else. We require that a Dynamic PDP (DPDP) with PV and DP system is secure at untrusted server, which means that the server cannot successfully generate a proof of data possession that is correct without actually storing all the data. In addition, a DPDP with PV and DP system should be data privacy-preserving, which means that the TPA should not learn anything about the client’s data even by having access to the public information.

Gritti et al. [9] recently constructed an efficient and practical DPDP system with PV and DP. However, we have found three attacks threatening this construction: (1) The replace attack enables the server to store only one block of a file m and still pass the data integrity verification on any number of blocks; (2) The replay attack permits the server to keep the old version of a block \(m_{i}\) and the corresponding tag \(T_{m_{i}}\), after the client asked to modify them by sending the new version of these elements, and still pass the data integrity verification; (3) The attack against data privacy allows the TPA to distinguish files when proceeding the data integrity check without accessing their contents. We then propose two solutions to overcome the adversarial issues threatening the DPDP scheme with PV and DP in [9]. We give a first new publicly verifiable DPDP construction based on Index Hash Tables (IHT) in the random oracle model. We prove that such scheme is secure against replace and replay attacks as well as is data privacy-preserving according to a model differing from the one proposed in [9]. We present a second new publicly verifiable DPDP construction based on Merkle Hash Trees (MHT) in the random oracle model. We demonstrate that such scheme is not vulnerable against the three attacks mentioned above. In particular, we use the existing model given in [9] to prove that the MHT-based scheme is data privacy-preserving.

1.1 Related Work

Ateniese et al. [1] introduced the notion of Provable Data Possession (PDP) which allows a client to verify the integrity of his/her data stored at an untrusted server without retrieving the entire file. Their scheme is designed for static data and used homomorphic authenticators as tags based on public key encryption for auditing the data file. Subsequently, Ateniese et al. [2] improved the efficiency of the aforementioned PDP scheme by using symmetric keys. The resulting scheme gets lower overhead and partially supports partial dynamic data operations. Thereafter, various PDP constructions were proposed in the literature [10, 20, 23, 24]. Moreover, PDP schemes with the property of full dynamicity were suggested in [4, 18, 19, 25, 26]. An extension of DPDP includes version control [3, 6] where all data changes are recorded into a repository and any version of the data can be retrieved at any time. DPDP protocols with multi-update capability were suggested in [5]. More recently, data privacy-preserving and publicly verifiable PDP schemes were presented in [7, 9, 14,15,16,17].

2 Preliminaries

Let \(\mathbb {G}_{1}\), \(\mathbb {G}_{2}\) and \(\mathbb {G}_{T}\) be three multiplicative cyclic groups of prime order \(p \in \Theta (2^{\lambda })\) (where \(\lambda \) is the security parameter). Let \(g_{k}\) be a generator of \(\mathbb {G}_{k}\) for \(k \in \{1,2\}\), that we denote \(<g_{k}>=\mathbb {G}_{k}\).

Bilinear Maps: Let \(e : \mathbb {G}_{1} \times \mathbb {G}_{2} \rightarrow \mathbb {G}_{T}\) be a bilinear map with the following properties: (1) Bilinearity: \(\forall u \in \mathbb {G}_{1}, \forall v \in \mathbb {G}_{2}, \forall a,b \in \mathbb {Z}_{p},e(u^{a},v^{b})=e(u,v)^{ab}\). (2) Non-degeneracy: \(e(g_{1},g_{2}) \ne 1_{\mathbb {G}_{T}}\). \(\mathbb {G}_{1}\) and \(\mathbb {G}_{2}\) are said to be bilinear groups if the group operation in \(\mathbb {G}_{1} \times \mathbb {G}_{2}\) and the bilinear map e are both efficiently computable. Let GroupGen denote an algorithm that on input the security parameter \(\lambda \), outputs the parameters \((p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2})\).

Discrete Logarithm (DL) Assumption: Let \(a \in _{R} \mathbb {Z}_{p}\). If \(\mathcal {A}\) is given an instance \(( g_{1},g_{1}^{a})\), it remains hard to extract \(a \in \mathbb {Z}_{p}\). The DL assumption holds if no polynomial-time adversary \(\mathcal {A}\) has non-negligible advantage in solving the DL problem.

Computational Diffie-Hellman (CDH) Assumption: Let \(a,b \in _{R} \mathbb {Z}_{p}\). If \(\mathcal {A}\) is given an instance \(( g_{1}, g_{1}^{a}, g_{1}^{b})\), it remains hard to compute \( g_{1}^{ab} \in \mathbb {G}_{1}\). The CDH assumption holds if no polynomial-time adversary \(\mathcal {A}\) has non-negligible advantage in solving the CDH problem.

Decisional Diffie-Hellman Exponent (DDHE) Assumption: Let \(\beta \in _{R} \mathbb {Z}_{p}\). If \(\mathcal {A}\) is given an instance \((g_{1},g_{1}^{\beta },\cdots ,g_{1}^{\beta ^{s+1}},g_{2},g_{2}^{\beta },Z)\), it remains hard to decide if either \(Z= g_{1}^{\beta ^{s+2}}\) or Z is a random element in \(\mathbb {G}_{1}\). The \((s+1)\)-DDHE assumption holds if no polynomial-time adversary \(\mathcal {A}\) has non-negligible advantage in solving the \((s+1)\)-DDHE problem.

2.1 Definition of the DPDP Scheme with PV and DP

Let m be a data file to be stored that is divided into n blocks \(m_{i}\), and then each block \(m_{i}\) is divided into s sectors \(m_{i,j} \in \mathbb {Z}_{p}\), where p is a large prime. A DPDP scheme with PV and DP is made of the following algorithms:

\(\bullet \) \(\textsf {KeyGen}(\lambda ) \rightarrow (pk,sk)\). On input the security parameter \(\lambda \), output a pair of public and secret keys (pksk).

\(\bullet \) \(\textsf {TagGen}(pk,sk,m_{i}) \rightarrow T_{m_{i}}\). TagGen is independently run for each block. Therefore, to generate the tag \(T_{m}\) for a file m, TagGen is run n times. On inputs the public key pk, the secret key sk and a file \(m=(m_{1},\cdots ,m_{n})\), output a tag \(T_{m}=(T_{m_{1}},\cdots ,T_{m_{n}})\) where each block \(m_{i}\) has its own tag \(T_{m_{i}}\). The client sets all the blocks \(m_{i}\) in an ordered collection \(\mathbb {F}\) and all the corresponding tags \(T_{m_{i}}\) in an ordered collection \(\mathbb {E}\). He/she sends \(\mathbb {F}\) and \(\mathbb {E}\) to the server and removes them from his/her local storage.

\(\bullet \) \(\textsf {PerfOp}(pk,\mathbb {F},\mathbb {E},info=(\text{ operation },l,m_{l},T_{m_{l}})) \rightarrow (\mathbb {F'},\mathbb {E}',\nu ')\). On inputs the public key pk, the previous collection \(\mathbb {F}\) of all the blocks, the previous collection \(\mathbb {E}\) of all the corresponding tags, the type of the data operation to be performed, the rank l where the data operation is performed in \(\mathbb {F}\), the block \(m_{l}\) to be updated and the corresponding tag \(T_{m_{l}}\) to be updated, output the updated block collection \(\mathbb {F}'\), the updated tag collection \(\mathbb {E}'\) and an updating proof \(\nu '\). For the operation: (1) Insertion: \(m_{l} =m_{\frac{i_{1} + i_{2}}{2}}\) is inserted between the consecutive blocks \(m_{i_{1}}\) and \(m_{i_{2}}\) and \(T_{m_{l}}=T_{m_{\frac{i_{1} + i_{2}}{2}}}\) is inserted between the consecutive tags \(T_{m_{i_{1}}}\) and \(T_{m_{i_{2}}}\). We assume that \(m_{\frac{i_{1} + i_{2}}{2}}\) and \(T_{m_{\frac{i_{1} + i_{2}}{2}}}\) were provided by the client to the server, such that \(T_{m_{\frac{i_{1} + i_{2}}{2}}}\) was correctly computed by running TagGen. (2) Deletion: \(m_{l} =m_{i}\) is deleted, meaning that \(m_{i_{1}}\) is followed by \(m_{i_{2}}\) and \(T_{m_{l}}=T_{m_{i}}\) is deleted, meaning that \(T_{m_{i_{1}}}\) is followed by \(T_{m_{i_{2}}}\), such that \(i_{1},i,i_{2}\) were three consecutive ranks. (3) Modification: \(m_{l} =m_{i}'\) replaces \(m_{i}\) and \(T_{m_{l}}=T_{m_{i}'}\) replaces \(T_{m_{i}}\). We assume that \(m_{i}'\) and \(T_{m_{i}'}\) were provided by the client to the server, such that \(T_{m_{i}'}\) was correctly computed by running TagGen. After operations, the set of ranks becomes \((0,n+1) \cap \mathbb {Q}\).

\(\bullet \) \(\textsf {CheckOp}(pk,\nu ') \rightarrow 0/1\). On inputs the public key pk and the updating proof \(\nu '\) sent by the server, output 1 if \(\nu '\) is a correct updating proof; output 0 otherwise.

\(\bullet \) \(\textsf {GenProof}(pk,F,chal,\varSigma ) \rightarrow \nu \). On inputs the public key pk, an ordered collection \(F \subset \mathbb {F}\) of blocks, a challenge chal and an ordered collection \(\varSigma \subset \mathbb {E}\) which are the tags corresponding to the blocks in F, output a proof of data possession \(\nu \) for the blocks in F that are determined by chal.

\(\bullet \) \(\textsf {CheckProof}(pk,chal,\nu ) \rightarrow 0/1\). On inputs the public key pk, the challenge chal and the proof of data possession \(\nu \), output 1 if \(\nu \) is a correct proof of data possession for the blocks determined by chal; output 0 otherwise.

Correctness. We require that a DPDP with PV and DP is correct if for \((pk,sk) \leftarrow \textsf {KeyGen}(\lambda )\), \(T_{m} \leftarrow \textsf {TagGen}(pk,sk,m)\), \((\mathbb {F'},\mathbb {E}',\nu ') \leftarrow \textsf {PerfOp}(pk,\mathbb {F},\mathbb {E},info)\), \(\nu \leftarrow \textsf {GenProof}(pk,F,\) \(chal,\varSigma )\), then \(1 \leftarrow \textsf {CheckOp}(pk,\nu ')\) and \(1 \leftarrow \textsf {CheckProof}(pk,chal,\nu )\).

2.2 Security and Privacy Models

Security Model Against the Server. The model follows the ones in [1, 4, 9]. We consider a DPDP with PV and DP as defined above. Let a data possession game between a challenger \(\mathcal {B}\) and an adversary \(\mathcal {A}\) (acting as the server) be as follows:

\(\diamond \) Setup. \(\mathcal {B}\) runs \((pk,sk) \leftarrow \textsf {KeyGen}(\lambda )\) such that pk is given to \(\mathcal {A}\) while sk is kept secret.

\(\diamond \) Adaptive Queries. First, \(\mathcal {A}\) is given access to a tag generation oracle \(\mathcal {O}_{TG}\). \(\mathcal {A}\) chooses blocks \(m_{i}\) and gives them to \(\mathcal {B}\), for \(i \in [1,n]\). \(\mathcal {B}\) runs \(\textsf {TagGen}(pk,sk,m_{i}) \rightarrow T_{m_{i}}\) and gives them to \(\mathcal {A}\). Then, \(\mathcal {A}\) creates two ordered collections \(\mathbb {F}=\{m_{i}\}_{i \in [1,n]}\) of blocks and \(\mathbb {E} =\{T_{m_{i}}\}_{i \in [1,n]}\) of the corresponding tags. Then, \(\mathcal {A}\) is given access to a data operation performance oracle \(\mathcal {O}_{DOP}\). For \(i \in [1,n]\), \(\mathcal {A}\) gives to \(\mathcal {B}\) a block \(m_{i}\) and \(info_{i}\) about the operation that \(\mathcal {A}\) wants to perform. \(\mathcal {A}\) also submits two new ordered collections \(\mathbb {F}'\) of blocks and \(\mathbb {E}'\) of tags, and the updating proof \(\nu '\). \(\mathcal {B}\) runs \(\textsf {CheckOp}(pk,\nu ')\) and replies the answer to \(\mathcal {A}\). If the answer is 0, then \(\mathcal {B}\) aborts; otherwise, it proceeds. The above interaction between \(\mathcal {A}\) and \(\mathcal {B}\) can be repeated. Note that the set of ranks has changed after calls to the oracle \(\mathcal {O}_{DOP}\).

\(\diamond \) Challenge. \(\mathcal {A}\) chooses blocks \(m_{i}^{*}\) and \(info_{i}^{*}\), for \(i \in \mathcal {I} \subseteq (0,n+1) \cap \mathbb {Q}\). Adaptive queries can be again made by \(\mathcal {A}\), such that the first \(info_{i}^{*}\) specifies a full re-write update (this corresponds to the first time that the client sends a file to the server). \(\mathcal {B}\) still checks the data operations. For \(i \in \mathcal {I}\), the final version of \(m_{i}\) is considered such that these blocks were created regarding the operations requested by \(\mathcal {A}\), and verified and accepted by \(\mathcal {B}\) beforehand. \(\mathcal {B}\) sets \(\mathbb {F} = \{m_{i}\}_{i \in \mathcal {I}}\) of these blocks and \(\mathbb {E}= \{T_{m_{i}}\}_{i \in \mathcal {I}}\) of the corresponding tags. It then sets two ordered collections \(F=\{m_{i_{j}}\}_{i_{j} \in \mathcal {I},j \in [1,k]} \subset \mathbb {F}\) and \(\varSigma =\{T_{m_{i_{j}}}\}_{i_{j} \in \mathcal {I},j \in [1,k]} \subset \mathbb {E}\). It computes a resulting challenge chal for F and \(\varSigma \) and sends it to \(\mathcal {A}\).

\(\diamond \) Forgery. \(\mathcal {A}\) computes a proof of data possession \(\nu ^{*}\) on chal. Then, \(\mathcal {B}\) runs \(\textsf {CheckProof}(pk,chal,\) \(\nu ^{*})\) and replies the answer to \(\mathcal {A}\). If the answer is 1 then \(\mathcal {A}\) wins.

The advantage of \(\mathcal {A}\) in winning the data possession game is defined as \(Adv_{\mathcal {A}}(\lambda ) = Pr [ \mathcal {A} \text{ wins }]\). The DPDP with PV and DP is secure against the server if there is no PPT (probabilistic polynomial-time) adversary \(\mathcal {A}\) who can win the above game with non-negligible advantage \(Adv_{\mathcal {A}}(\lambda )\).

Data Privacy Model Against the TPA. In a DPDP protocol, we aim to ensure that data privacy is preserved at the verification step, meaning that data are accessible to all but protected only via a non-cryptographic access control, and the verification process does not leak any information on the data blocks.

First Data Privacy Model. The model is found in [17, 20]. We consider a DPDP with PV and DP as defined above. Let the first data privacy game between a challenger \(\mathcal {B}\) and an adversary \(\mathcal {A}\) (acting as the TPA) be as follows:

\(\diamond \) Setup. \(\mathcal {B}\) runs \(\mathsf{KeyGen}(\lambda )\) to generate (pksk) and gives pk to \(\mathcal {A}\), while sk is kept secret.

\(\diamond \) Queries. \(\mathcal {A}\) is allowed to make queries as follows. \(\mathcal {A}\) sends a file \(m = (m_{1},\cdots ,m_{n})\) to \(\mathcal {B}\). \(\mathcal {B}\) computes \(T_{m} = (T_{m_{1}},\cdots ,T_{m_{n}})\) and gives it back to \(\mathcal {A}\). Then, two ordered collections \(\mathbb {F}= \{m_{i}\}_{i \in [1,n]}\) of blocks and \(\mathbb {E} = \{T_{m_{i}}\}_{i \in [1,n]}\) of tags are created.

\(\diamond \) Challenge. \(\mathcal {A}\) submits a challenge chal containing \(k \le n\) ranks, the k corresponding blocks in F and their k tags in \(\varSigma \).

\(\diamond \) Generation of the Proof. \(\mathcal {B}\) computes a proof of data possession \(\nu ^{*} \leftarrow \textsf {GenProof}(pk,F,chal,\varSigma )\) such that the blocks in F are determined by chal and \(\varSigma \) contains the corresponding tags.

\(\mathcal {A}\) succeeds in the first data privacy game if \(F \nsubseteq \mathbb {F}\) and \(\varSigma \nsubseteq \mathbb {E}\), and \(\textsf {CheckProof} (pk,chal,\nu ^{*}) \rightarrow 1\). The advantage of \(\mathcal {A}\) in winning the first data privacy game is defined as \(Adv_{\mathcal {A}}(\lambda ) = Pr[ \mathcal {A} \text{ succeeds }]\). The DPDP with PV and DP is data privacy-preserving if there is no PPT adversary \(\mathcal {A}\) who can win the above game with non-negligible advantage \(Adv_{\mathcal {A}}(\lambda )\). This implies that there is no \(\mathcal {A}\) who can recover the file from a given tag tuple with non-negligible probability.

Second Data Privacy Model. The model follows the ones in [7, 9, 24]. We consider a DPDP with PV and DP as defined above. Let a second data privacy game between a challenger \(\mathcal {B}\) and an adversary \(\mathcal {A}\) (acting as the TPA) be as follows:

\(\diamond \) Setup. \(\mathcal {B}\) runs \(\textsf {KeyGen}(\lambda )\) to generate (pksk) and gives pk to \(\mathcal {A}\), while sk is kept secret.

\(\diamond \) Queries. \(\mathcal {A}\) is allowed to make queries as follows. \(\mathcal {A}\) sends a file m to \(\mathcal {B}\). \(\mathcal {B}\) computes the corresponding \(T_{m}\) and gives it to \(\mathcal {A}\).

\(\diamond \) Challenge. \(\mathcal {A}\) submits two different files \(m_{0}\) and \(m_{1}\) of equal length, such that they have not be chosen in the phase Queries, and sends them to \(\mathcal {B}\). \(\mathcal {B}\) generates \(T_{m_{0}}\) and \(T_{m_{1}}\) by running TagGen, randomly chooses a bit \(b \in _{R} \{0,1\}\) and forwards \(T_{m_{b}}\) to \(\mathcal {A}\). Then, \(\mathcal {A}\) sets a challenge chal and sends it to \(\mathcal {B}\). \(\mathcal {B}\) generates a proof of data possession \(\nu ^{*}\) based on \(m_{b}\), \(T_{m_{b}}\) and chal, and replies to \(\mathcal {A}\) by giving \(\nu ^{*}\).

\(\diamond \) Guess. Finally, \(\mathcal {A}\) chooses a bit \(b' \in \{0,1\}\) and wins the game if \(b'=b\).

The advantage of \(\mathcal {A}\) in winning the second data privacy game is defined as \(Adv_{\mathcal {A}}(\lambda )= |Pr[b'=b] - \frac{1}{2} |\). The DPDP with PV and DP is data privacy-preserving if there is no PPT adversary \(\mathcal {A}\) who can win the above game with non-negligible advantage \(Adv_{\mathcal {A}}(\lambda )\).

3 The Three Attacks

3.1 DPDP Construction with PV and DP in [9]

The DPDP scheme with PV and DP construction presented in [9] is as follows:

\(\bullet \) \(\textsf {KeyGen}(\lambda ) \rightarrow (pk,sk)\). The client runs \(\textsf {GroupGen}(\lambda ) \rightarrow (p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2})\) such that on input the security parameter \(\lambda \), GroupGen generates the cyclic groups \(\mathbb {G}_{1}\), \(\mathbb {G}_{2}\) and \(\mathbb {G}_{T}\) of prime order \(p = p(\lambda )\) with the bilinear map \(e : \mathbb {G}_{1} \times \mathbb {G}_{2} \rightarrow \mathbb {G}_{T}\). Let \(<g_{1}>=\mathbb {G}_{1}\) and \(<g_{2}>=\mathbb {G}_{2}\). Then, \(h_{1},\cdots ,h_{s} \in _{R} \mathbb {G}_{1}\) and \(a \in _{R} \mathbb {Z}_{p}\) are randomly chosen. Finally, he/she sets the public key \(pk= (p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2},h_{1},\cdots ,h_{s},g_{2}^{a})\) and the secret key \(sk=a\).

\(\bullet \) \(\textsf {TagGen}(pk,sk,m_{i}) \rightarrow T_{m_{i}}\). A file m is split into n blocks \(m_{i}\), for \(i \in [1,n]\). Each block \(m_{i}\) is then split into s sectors \(m_{i,j} \in \mathbb {Z}_{p}\), for \(j \in [1,s]\). Therefore, the file m can be seen as a \(n \times s\) matrix with elements denoted as \(m_{i,j}\). The client computes \( T_{m_{i}} = (\prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-sk} =\prod _{j=1}^{s} h_{j}^{- a \cdot m_{i,j}}\). Yet, he/she sets \(T_{m}= (T_{m_{1}},\cdots ,T_{m_{n}}) \in \mathbb {G}_{1}^{n}\).

\(\bullet \) \(\textsf {PerfOp}(pk,\mathbb {F},\mathbb {E},info=(\text{ operation },l,m_{l},T_{m_{l}})) \rightarrow (\mathbb {F'},\mathbb {E}',\nu ')\). The server first selects at random \(u_{j} \in _{R} \mathbb {Z}_{p}\), for \(j \in [1,s]\), and computes \(U_{j}=h_{j}^{u_{j}}\). It also chooses at random \(w_{l} \in _{R} \mathbb {Z}_{p}\) and sets \(c_{j}= m_{l,j} \cdot w_{l} + u_{j} \), \(C_{j}=h_{j}^{c_{j}}\), and \(d= T_{m_{l}}^{w_{l}}\). Finally, it returns \(\nu ' = (U_{1},\cdots ,U_{s},C_{1},\cdots ,C_{s},d,\) \(w_{l}) \in \mathbb {G}_{1}^{2s+1}\) to the TPA. For the operation: (1) Insertion: \((l,m_{l},T_{m_{l}}) = (\frac{i_{1}+i_{2}}{2},m_{\frac{i_{1}+i_{2}}{2}},T_{m_{\frac{i_{1}+i_{2}}{2}}})\); (2) Deletion: \((l,m_{l},T_{m_{l}}) = (i,\_,\_)\), meaning that \(m_{l}\) and \(T_{m_{l}}\) are not required (the server uses \(m_{i}\) and \(T_{m_{i}}\) that are kept on its storage to generate \(\nu '\)); (3) Modification: \((l,m_{l},T_{m_{l}}) =(i,m_{i}',T_{m_{i}'})\).

\(\bullet \) \(\textsf {CheckOp}(pk,\nu ') \rightarrow 0/1\). The TPA has to check whether the following equation holds:

$$\begin{aligned} e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$
(1)

If Eq. 1 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

\(\bullet \) \(\textsf {GenProof}(pk,F,chal,\varSigma ) \rightarrow \nu \). The TPA first chooses \(I \subseteq (0,n+1) \cap \mathbb {Q}\), randomly chooses |I| elements \(v_{i} \in _{R} \mathbb {Z}_{p}\) and sets \(chal=\{(i,v_{i})\}_{i \in I}\). After receiving chal, the server sets \(F = \{m_{i}\}_{i \in I} \subset \mathbb {F}\) of blocks and \(\varSigma = \{T_{m_{i}}\}_{i \in I} \subset \mathbb {E}\) which are the tags corresponding to the blocks in F. It then selects at random \(r_{j} \in _{R} \mathbb {Z}_{p}\), for \(j \in [1,j]\), and computes \(R_{j}=h_{j}^{r_{j}}\). It also sets \(b_{j}=\sum _{(i,v_{i}) \in chal} m_{i,j} \cdot v_{i} + r_{j}\), \(B_{j}=h_{j}^{b_{j}}\) for \(j \in [1,s]\), and \(c =\prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}}\). Finally, it returns \(\nu = (R_{1},\cdots ,R_{s},B_{1},\cdots ,B_{s},c) \in \mathbb {G}_{1}^{2s+1}\) to the TPA.

\(\bullet \) \(\textsf {CheckProof}(pk,chal,\nu ) \rightarrow 0/1\). The TPA has to check whether the following equation holds:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$
(2)

If Eq. 2 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

Correctness. Given the proof of data possession \(\nu \) and the updating proof \(\nu '\), we have:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})= & {} e ( \prod _{\begin{array}{c} (i,v_{i}) \\ \in chal \end{array}} T_{m_{i}}^{v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) = e(\prod _{j=1}^{s} h_{j}^{b_{j}},g_{2}) =e(\prod _{j=1}^{s} B_{j},g_{2}) \\ e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})= & {} e( T_{m_{i}}^{w_{i}},g_{2}^{a}) \cdot e(\prod _{j=1}^{s} h_{j}^{u_{j}} ,g_{2}) = e(\prod _{j=1}^{s} h_{j}^{c_{j}},g_{2}) = e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$

N.B. In the construction in [9], the definition of the tag \(T_{m_{i}}\) corresponding to the block \(m_{i}\) and enabling to remotely verify the data integrity is independent of the rank i; thus, this begs for being used for an attack. Note that if \(m_{i}=0\), then \(T_{m_{i}}=1\) and thus, one can trivially cheat since the tag is independent of the file.

3.2 Replace Attack

Let the server store only one block (e.g. \(m_{1}\)) instead of n blocks as the client believes. The TPA audits the server by sending it a challenge chal for blocks with ranks in \(I \subseteq [1,n]\) such that \(|I| \le n\). The server generates a proof of data possession on the |I| blocks \(m_{1}\) (instead of the blocks defined by chal) by using |I| times the block \(m_{1}\) to obtain the proof of data possession. The attack is successful if the server manages to pass the verification process and has its proof of data possession being accepted by the TPA.

The client computes \(T_{m} = (T_{m_{1}},\cdots ,T_{m_{n}}) \in \mathbb {G}_{1}^{n}\) for a file \(m = (m_{1},\cdots ,m_{n})\) where \(T_{m_{i}} = (\prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-sk} = (\prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-a} \) for s public elements \(h_{j} \in \mathbb {G}_{1}\) and the secret key \(sk=a \in \mathbb {Z}_{p}\). Then, the client stores all the blocks \(m_{i}\) in \(\mathbb {F}\) and the tags \(T_{m_{i}}\) in \(\mathbb {E}\), forwards these collections to the server and deletes them from his/her local storage. Yet, the server is asked to generate a proof of data possession \(\nu \). We assume that it only stores \(m_{1}\) while it has deleted \(m_{2},\cdots ,m_{n}\) and we show that it can still pass the verification process. The TPA prepares a challenge chal by choosing a set \(I \subseteq [1,n]\) (without loss of generality, we assume that the client has not requested the server for data operations yet). The TPA then randomly chooses |I| elements \(v_{i} \in _{R} \mathbb {Z}_{p}\) and sets \(chal=\{(i,v_{i})\}_{i \in I}\). Second, after receiving chal, the server sets \(F = \{m_{1}\}_{i \in I} \subset \mathbb {F}\) of blocks (instead of \(F = \{m_{i}\}_{i \in I}\)) and \(\varSigma = \{T_{m_{1}}\}_{i \in I} \subset \mathbb {E}\) (instead of \(\varSigma = \{T_{m_{i}}\}_{i \in I}\)). The server finally forwards \(\nu = (R_{1},\cdots ,R_{s},B_{1},\cdots ,B_{s},c) \in \mathbb {G}_{1}^{2s+1}\) to the TPA, where \(R_{j}=h_{1}^{r_{j}}\) for \(r_{j} \in _{R} \mathbb {Z}_{p}\) and \(B_{j} = h_{j}^{\sum _{(i,v_{i}) \in chal} m_{1,j} \cdot v_{i} + r_{j}}\) (instead of \(B_{j}=h_{j}^{\sum _{(i,v_{i}) \in chal} m_{i,j} \cdot v_{i} + r_{j}}\)) for \(j \in [1,s]\), and \(c =\prod _{(i,v_{i}) \in chal} T_{m_{1}}^{v_{i}}\) (instead of \(c =\prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}}\)). The TPA has to check whether the following equation holds:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$
(3)

If Eq. 3 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

Correctness. Given the proof of data possession \(\nu \), we have:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})= & {} e ( \prod _{(i,v_{i}) \in chal} T_{m_{1}}^{v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e ( \prod _{(i,v_{i}) \in chal} \prod _{j=1}^{s} h_{j}^{m_{1,j} \cdot (-a) \cdot v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e(\prod _{j=1}^{s} h_{j}^{b_{j}},g_{2}) =e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$

Therefore, Eq. 3 holds, although the server is actually storing one block only.

3.3 Replay Attack

The client asks the server to replace \(m_{i}\) with \(m_{i}'\). However, the server does not proceed and keeps \(m_{i}\) on its storage. Then, the TPA has to check that the operation has been correctly done and asks the server for an updating proof \(\nu '\). The server generates it, but using \(m_{i}\) instead of \(m_{i}'\). The attack is successful if the server manages to pass the verification process and has \(\nu '\) being accepted by the TPA.

A client asks the server to modify the block \(m_{i}\) by sending \(m_{i}'\) and \(T_{m_{i}'}\). However, the server does not follow the client’s request and decides to keep \(m_{i}\) and \(T_{m_{i}}\), and deletes \(m_{i}'\) and \(T_{m_{i}'}\). The server receives i, \(m_{i}'\) and \(T_{m_{i}'}\) from the client but deletes them, and generates the updating proof \(\nu ' = (U_{1},\cdots ,U_{s},C_{1},\cdots ,C_{s},d) \in \mathbb {G}_{1}^{2s+1}\) by using \(m_{i}\) and \(T_{m_{i}}\) such that \(U_{j}=h_{1}^{u_{j}}\) where \(u_{j} \in _{R} \mathbb {Z}_{p}\) and \(C_{j} = h_{j}^{m_{i,j} \cdot w_{i} + u_{j} }\) (instead of \(C_{j}= h_{j}^{ m_{i,j}' \cdot w_{i} + u_{j} }\)) for \(j \in [1,s]\), and \(d= T_{m_{i}}^{w_{i}}\) (instead of \(d= T_{m_{i}'}^{w_{i}}\)). It gives \(\nu '\) to the TPA. The TPA has to check whether the following equation holds:

$$\begin{aligned} e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$
(4)

If Eq. 4 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

Correctness. Given the updating proof \(\nu '\), we have:

$$\begin{aligned} e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})= & {} e( T_{m_{i}}^{w_{i}},g_{2}^{a}) \cdot e(\prod _{j=1}^{s} h_{j}^{u_{j}} ,g_{2}) = e( \prod _{j=1}^{s} h_{j}^{m_{i,j} \cdot (-a) \cdot w_{i}},g_{2}^{a}) \cdot e(\prod _{j=1}^{s} h_{j}^{u_{j}} ,g_{2}) \\= & {} e(\prod _{j=1}^{s} h_{j}^{c_{j}},g_{2}) = e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$

Therefore, Eq. 4 holds, although the server has not updated the block \(m_{i}'\) and the corresponding tag \(T_{m_{i}'}\).

3.4 Attack against Data Privacy

The adversarial TPA and the server play the second data privacy game. The TPA gives two equal-length blocks \(m_{0}\) and \(m_{1}\) to the server and the latter replies by sending \(T_{m_{b}}\) of \(m_{b}\) where \(b \in _{R} \{0,1\}\) is a random bit. Then, the TPA selects a bit \(b' \in \{0,1\}\). The attack is successful if using \(m_{b'}\), the TPA can discover which block \(m_{b} \in \{m_{0},m_{1}\}\) was chosen by the server.

Let \(m_{0} = (m_{0,1} , \cdots , m_{0,n})\) and \(m_{1} = (m_{1,1} , \cdots , m_{1,n})\). The server computes \(T_{m_{b,i}} =(\prod _{j=1}^{s} h_{j}^{m_{b,i,j}})^{-sk} =(\prod _{j=1}^{s} h_{j}^{m_{b,i,j}})^{-a} \), for \(b \in _{R} \{0,1\}\) and \(i \in [1,n]\), and gives them to the TPA. Note that \( e(T_{m_{b,i}},g_{2}) = e( (\prod _{j=1}^{s} h_{j}^{m_{b,i,j}})^{-a},g_{2}) =e( \prod _{j=1}^{s} h_{j}^{m_{b,i,j}},(g_{2}^{a})^{-1})\). The computation of \(e(\prod _{j=1}^{s} h_{j}^{m_{b,i,j}},(g_{2}^{a})^{-1})\) requires only public elements. Therefore, for \(b' \in \{0,1\}\), the TPA is able to generate the pairing \(e(\prod _{j=1}^{s} h_{j}^{m_{b',i,j}},(g_{2}^{a})^{-1})\) given pk and the block that it gave to the server, and \(e(T_{m_{b,i}},g_{2})\) given the tag sent by the server. Finally, the TPA compares them. If these two pairings are equal, then \(b'=b\); otherwise \(b' \ne b\).

N.B. This attack is due to the public verifiability property of the scheme in [9] based on the definition of the second data privacy game. Moreover, in the proof for data privacy in [9], the analysis is wrong: the affirmation “The probability \(Pr[b'=b]\) must be equal to \(\frac{1}{2}\) since the tags \(T_{m_{b,i}}\), for \(i \in [1,n]\), and the proof \(\nu ^{*}\) are independent of the bit b.” is incorrect since \(T_{m_{b,i}}\) and \(\nu ^{*}\) actually depend on b.

4 IHT-based DPDP Scheme with PV and DP

A solution to avoid the replace attack is to embed the rank i of \(m_{i}\) into \(T_{m_{i}}\). When the TPA on behalf of the client checks \(\nu \) generated by the server, it requires to use all the ranks of the challenged blocks to process the verification. Such idea was proposed for the publicly verifiable scheme in [13]. A solution to avoid the replay attack is to embed the version number \(vnb_{i}\) of \(m_{i}\) into \(T_{m_{i}}\). The first time that the client sends \(m_{i}\) to the server, \(vnb_{i}=1\) (meaning that the first version of the block is uploaded) and is appended to i. When the client wants to modify \(m_{i}\) with \(m_{i}'\), he/she specifies \(vnb_{i}=2\) (meaning that the second version of the block is uploaded) and generates \(T_{m_{i}'}\) accordingly. When the TPA on behalf of the client checks that the block was correctly updated by the server, it has to use both i and \(vnb_{i}\) of \(m_{i}\). Moreover, we stress that the rank i of the block \(m_{i}\) is unique. More precisely, when a block is inserted, a new rank is created that has not been used and when a block is modified, the rank does not change. However, when a block is deleted, its rank does not disappear to ensure that it won’t be used for another block and thus, to let the scheme remain secure.

4.1 IHT-based Construction

The IHT-based DPDP scheme with PV and DP construction is as follows:

\(\bullet \) \(\textsf {KeyGen}(\lambda ) \rightarrow (pk,sk)\). The client runs \(\textsf {Group-}\) \(\textsf {Gen}(\lambda ) \rightarrow (p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},\) \(e,g_{1},g_{2})\) such that on input the security parameter \(\lambda \), GroupGen generates the cyclic groups \(\mathbb {G}_{1}\), \(\mathbb {G}_{2}\) and \(\mathbb {G}_{T}\) of prime order \(p = p(\lambda )\) with the bilinear map \(e : \mathbb {G}_{1} \times \mathbb {G}_{2} \rightarrow \mathbb {G}_{T}\). Let \(<g_{1}>=\mathbb {G}_{1}\) and \(<g_{2}>=\mathbb {G}_{2}\). Let the hash function \(H : \mathbb {Q} \times \mathbb {N} \rightarrow \mathbb {G}_{1}\) be a random oracle. Then, \(h_{1},\cdots ,h_{s} \in _{R} \mathbb {G}_{1}\) and \(a \in _{R} \mathbb {Z}_{p}\) are randomly chosen. Finally, he/she sets the public key \(pk= (p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2},h_{1},\cdots ,h_{s},g_{2}^{a},H)\) and the secret key \(sk=a\).

\(\bullet \) \(\textsf {TagGen}(pk,sk,m_{i}) \rightarrow T_{m_{i}}\). A file m is split into n blocks \(m_{i}\), for \(i \in [1,n]\). Each block \(m_{i}\) is then split into s sectors \(m_{i,j} \in \mathbb {Z}_{p}\), for \(j \in [1,s]\). Therefore, the file m can be seen as a \(n \times s\) matrix with elements denoted as \(m_{i,j}\). The client computes \( T_{m_{i}} = (H(i,vnb_{i}) \cdot \prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-sk} =H(i,vnb_{i})^{-a} \cdot \prod _{j=1}^{s} h_{j}^{- a \cdot m_{i,j}}\). Yet, he/she sets \(T_{m}= (T_{m_{1}},\cdots ,T_{m_{n}}) \in \mathbb {G}_{1}^{n}\).

\(\bullet \) \(\textsf {PerfOp}(pk,\mathbb {F},\mathbb {E},info=(\text{ operation },l,m_{l},T_{m_{l}}))\) \(\rightarrow (\mathbb {F'},\mathbb {E}',\nu ')\). The server first selects at random \(u_{j} \in _{R} \mathbb {Z}_{p}\), for \(j \in [1,s]\), and computes \(U_{j}=h_{j}^{u_{j}}\). It also chooses at random \(w_{l} \in _{R} \mathbb {Z}_{p}\) and sets \(c_{j}= m_{l,j} \cdot w_{l} + u_{j} \), \(C_{j}=h_{j}^{c_{j}}\) for \(j \in [1,s]\), and \(d= T_{m_{l}}^{w_{l}}\). Finally, it returns \(\nu ' = (U_{1},\cdots ,U_{s},C_{1},\cdots ,C_{s},d,w_{l}) \in \mathbb {G}_{1}^{2s+1}\) to the TPA. For the operation: (1) Insertion: \((l,m_{l},T_{m_{l}}) = (\frac{i_{1}+i_{2}}{2},m_{\frac{i_{1}+i_{2}}{2}},T_{m_{\frac{i_{1}+i_{2}}{2}}})\) and \(vnb_{l} = vnb_{\frac{i_{1}+i_{2}}{2}}=1\); (2) Deletion: \((l,m_{l},T_{m_{l}}) = (i,\_,\_)\) and \(vnb_{l} = vnb_{i}=\_\), meaning that \(m_{l}\), \(T_{m_{l}}\) and \(vnb_{l}\) are not required (the server uses \(m_{i}\), \(T_{m_{i}}\) and \(vnb_{i}\) that are kept on its storage to generate \(\nu '\)); (3) Modification: \((l,m_{l},T_{m_{l}}) =(i,m_{i}',T_{m_{i}'})\) and \( vnb_{l}=vnb_{i}' = vnb_{i}+1\).

\(\bullet \) \(\textsf {CheckOp}(pk,\nu ') \rightarrow 0/1\). The TPA has to check whether the following equation holds:

$$\begin{aligned} e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e ( H(l,vnb_{l})^{w_{l}},g_{2}) \cdot e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$
(5)

If Eq. 5 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

\(\bullet \) \(\textsf {GenProof}(pk,F,chal,\varSigma ) \rightarrow \nu \). The TPA first chooses \(I \subseteq (0,n+1) \cap \mathbb {Q}\), randomly chooses |I| elements \(v_{i} \in _{R} \mathbb {Z}_{p}\) and sets \(chal=\{(i,v_{i})\}_{i \in I}\). After receiving chal, the server sets \(F = \{m_{i}\}_{i \in I} \subset \mathbb {F}\) of blocks and \(\varSigma = \{T_{m_{i}}\}_{i \in I} \subset \mathbb {E}\) which are the tags corresponding to the blocks in F. It then selects at random \(r_{j} \in _{R} \mathbb {Z}_{p}\), for \(j \in [1,s]\), and computes \(R_{j}=h_{j}^{r_{j}}\). It also sets \(b_{j}=\sum _{(i,v_{i}) \in chal} m_{i,j} \cdot v_{i} + r_{j}\), \(B_{j}=h_{j}^{b_{j}}\) for \(j \in [1,s]\), and \(c =\prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}}\). Finally, it returns \(\nu = (R_{1},\cdots ,R_{s},B_{1},\cdots ,B_{s},c) \in \mathbb {G}_{1}^{2s+1}\) to the TPA.

\(\bullet \) \(\textsf {CheckProof}(pk,chal,\nu ) \rightarrow 0/1\). The TPA has to check whether the following equation holds:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e ( \prod _{\begin{array}{c} (i,v_{i}) \\ \in chal \end{array}} H(i,vnb_{i})^{v_{i}},g_{2}) \cdot e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$
(6)

If Eq. 6 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

Correctness. Given the proof of data possession \(\nu \) and the updating proof \(\nu '\), we have:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})= & {} e ( \prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e ( \prod _{(i,v_{i}) \in chal} (H(i,vnb_{i}) \cdot \prod _{j=1}^{s} h_{j}^{m_{i,j} })^{-a \cdot v_{i}},g_{2}^{a}) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e ( \prod _{(i,v_{i}) \in chal} H(i,vnb_{i})^{v_{i}},g_{2} ) \cdot e(\prod _{j=1}^{s} B_{j},g_{2} ) \end{aligned}$$
$$\begin{aligned} e(d,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} U_{j} ,g_{2})= & {} e( T_{m_{l}}^{w_{l}},g_{2}^{a}) \cdot e(\prod _{j=1}^{s} h_{j}^{u_{j}} ,g_{2}) \\= & {} e(H(l,vnb_{l}) \cdot \prod _{j=1}^{s} h_{j}^{m_{l,j} },g_{2}^{a})^{-a \cdot w_{l}} \cdot e(\prod _{j=1}^{s} h_{j}^{u_{j}} ,g_{2}) \\= & {} e ( H(l,vnb_{l})^{w_{l}},g_{2}) \cdot e(\prod _{j=1}^{s} C_{j},g_{2}) \end{aligned}$$

N.B. The client or TPA must store the values vnb locally. However, this does not incur more burden if we consider the values vnb as bit strings.

4.2 Security and Privacy Proofs

Security Proof Against the Server

Theorem 1

Let \(\mathcal {A}\) be a PPT adversary that has advantage \(\epsilon \) against the IHT-based DPDP scheme with PV and DP. Suppose that \(\mathcal {A}\) makes a total of \(q_{H} >0\) queries to H. Then, there is a challenger \(\mathcal {B}\) that solves the Computational Diffie-Hellman (CDH) and Discrete Logarithm (DL) problems with advantage \(\epsilon ' = \mathcal {O}(\epsilon )\).

We give the security proof in the Appendix A.

First Data Privacy Proof Against the TPA

Theorem 2

Let \(\mathcal {A}\) be a PPT adversary that has advantage \(\epsilon \) against the IHT-based DPDP scheme with PV and DP. Suppose that \(\mathcal {A}\) makes a total of \(q_{H} >0\) queries to H. Then, there is a challenger \(\mathcal {B}\) that solves the CDH problem with advantage \(\epsilon ' = \mathcal {O}(\epsilon )\).

We give the first data privacy proof in the full version of this paper [8].

4.3 Performance

We compare the IHT-based scheme with the original scheme proposed in [9]. First, the client and TPA obviously have to store more information by keeping the IHT. Nevertheless, we stress that in any case, the client and TPA should maintain a rank list. Indeed, they need some information about the stored data in order to select some data blocks to be challenged. We recall that the challenge consists of pairs of the form “(rank, random element)”. By appending an integer and sometimes an auxiliary comment (only in case of deletions) to each rank, the extra burden is not excessive. Therefore, such table does slightly affect the client’s as well as TPA’s local storages. The communication between the client and TPA rather increases since the client should send more elements to the TPA in order to keep the table updated. Second, the client has to perform extra computation when generating the verification metadata: for each file block \(m_{i}\), he/she has to compute \(H(i,vnb_{i})\). However, the communication between the client and server overhead does not increase. Third, the TPA needs to compute an extra pairing \(e(H(i,vnb_{i}),g_{2})^{w_{i}}\) in order to check that the server correctly performed a data operation requested by the client. The TPA also has to compute |I| multiplications in \(\mathbb {G}_{1}\) and one extra pairing when checking the proof of data possession: for each challenge \(chal = \{(i,v_{i})\}_{i \in I}\), it calculates \(\prod _{(i,v{i}) \in chal} H(i,vnb_{i})\) as well as the pairing \(e(\prod _{(i,v{i}) \in chal} H(i,vnb_{i})^{v_{i}},g_{2})\). This gives a constant total of four pairings in order to verify the data integrity instead of three, that is not a big loss in term of efficiency and practicality. Finally, apart the storage of a light table and computation of an extra pairing by the TPA for the verification of both the updating proof and proof of data possession, the new construction for the DPDP scheme with PV and DP is still practical by adopting asymmetric pairings to gain efficiency and by still reducing the group exponentiation and pairing operations. In addition, this scheme still allows the TPA on behalf of the client to request the server for a proof of data possession on as many data blocks as possible at no extra cost, as in the scheme given in [9].

5 MHT-based DPDP Scheme with PV and DP

A second solution to avoid the three attacks is to implement a MHT [12] for each file. In a MHT, each internal node has always two children. For a leaf node \(nd_{i}\) based on the block \(m_{i}\), the assigned value is \(H'(m_{i})\), where the hash function \(H' : \{0,1\}^{*} \rightarrow \mathbb {G}_{1}\) is seen as a random oracle. Note that the hash values are affected to the leaf nodes in the increasing order of the blocks: \(nd_{i}\) and \(nd_{i+1}\) correspond to the hash of the blocks \(m_{i}\) and \(m_{i+1}\) respectively. A parent node of \(nd_{i}\) and \(nd_{i+1}\) has a value computed as \(H'(H'(m_{i}) || H'(m_{i+1}))\), where || is the concatenation sign (for an odd rank i). The Auxiliary Authentication Information (AAI) \(\varOmega _{i}\) of a leaf node \(nd_{i}\) for \(m_{i}\) is a set of hash values chosen from its upper levels, so that the root rt can be computed using \((m_{i},\varOmega _{i})\).

5.1 MHT-based Construction

Let \(\textsf {DPDP}\) be a DPDP construction with PV and DP such as defined in Sect. 3.1 and [9]. Let \(\textsf {SS}=(\textsf {Gen},\textsf {Sign},\textsf {Verify})\) be a strongly unforgeable digital signature scheme. The MHT-based DPDP scheme with PV and DP construction is as follows:

\(\bullet \) \(\textsf {MHT.KeyGen}(\lambda ) \rightarrow (\mathsf{pk},\mathsf{sk})\). Let \(\textsf {GroupGen}(\lambda ) \rightarrow (p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2})\) be run as follows. On input the security parameter \(\lambda \), GroupGen generates the cyclic groups \(\mathbb {G}_{1}\), \(\mathbb {G}_{2}\) and \(\mathbb {G}_{T}\) of prime order \(p = p(\lambda )\) with the bilinear map \(e : \mathbb {G}_{1} \times \mathbb {G}_{2} \rightarrow \mathbb {G}_{T}\). Let \(<g_{1}>=\mathbb {G}_{1}\) and \(<g_{2}>=\mathbb {G}_{2}\). The client runs \(\textsf {Gen}(\lambda ) \rightarrow (pk_{\textsf {SS}},sk_{\textsf {SS}})\) and \(\textsf {KeyGen}(\lambda ) \rightarrow (pk,sk) =((p,\mathbb {G}_{1},\mathbb {G}_{2},\mathbb {G}_{T},e,g_{1},g_{2},h_{1},\cdots ,h_{s},g_{2}^{a}),a)\), where \(h_{1},\cdots ,h_{s} \in _{R} \mathbb {G}_{1}\) and \(a \in _{R} \mathbb {Z}_{p}\) are randomly chosen. The client sets his/her public key \(\mathsf{pk} = (pk,pk_{\textsf {SS}}) \) and his/her secret key \(\mathsf{sk} =(sk,sk_{\textsf {SS}}) \).

\(\bullet \) \(\textsf {MHT.TagGen}(\mathsf{pk},\mathsf{sk},m_{i}) \rightarrow T_{m_{i}}\). The client runs n times \(\textsf {TagGen}(pk,sk,m_{i}) \rightarrow T_{m_{i}}' = (\prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-sk} = (\prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-a}\) for \(i \in [1,n]\) and obtains \(T_{m}'= (T_{m_{1}}',\cdots ,T_{m_{n}}') \in \mathbb {G}_{1}^{n}\). He/she also chooses a hash function \(H' : \{0,1\}^{*} \rightarrow \mathbb {G}_{1}\) seen as a random oracle. Then, he/she creates the MHT regarding the file \(m =(m_{1},\cdots ,m_{n})\) as follows. He/she computes \(H'(m_{i})\) and assigns it to the i-th leaf for \(i \in [1,n]\). He/she starts to construct the resulting MHT, and obtains the root rt. Finally, the client runs \(\textsf {Sign}(sk_{\textsf {SS}},rt) \rightarrow \sigma _{rt}\). Using the hash values, he/she computes the tags as \(T_{m_{i}} = H'(m_{i})^{-sk} \cdot T_{m_{i}}'= H'(m_{i})^{-a} \cdot \prod _{j=1}^{s} h_{j}^{- a \cdot m_{i,j}}\) for \(i \in [1,n]\). Then, the client stores all the blocks \(m_{i}\) in an ordered collection \(\mathbb {F}\) and the corresponding tags \(T_{m_{i}}\) in an ordered collection \(\mathbb {E}\). He/she forwards these two collections and \((H',\sigma _{rt})\) to the server. Once the server receives \((\mathbb {F},\mathbb {E},H')\), it generates the MHT. It sends the resulting root \(rt_{server}\) to the client. Upon getting the root \(rt_{server}\), the client runs \(\textsf {Verify}(pk_{\textsf {SS}},\sigma _{rt},rt_{server}) \rightarrow 0/1\). If 0, then the client aborts. Otherwise, he/she proceeds, deletes \((\mathbb {F},\mathbb {E},\sigma _{rt})\) from his/her local storage and keeps \(H'\) for further data operations.

\(\bullet \) \(\textsf {MHT.PerfOp}(\mathsf{pk},\mathbb {F},\mathbb {E},R=(\text{ operation },i),info=(m_{i},T_{m_{i}},\sigma _{rt'})) \rightarrow (\mathbb {F'},\mathbb {E}',rt_{server}')\). First, the client sends a request \(R=(\text{ operation },i)\) to the server, that contains the type and rank of the operation. Upon receiving R, the server selects the AAI \(\varOmega _{i}\) that the client needs in order to generate the root \(rt'\) of the updated MHT, and sends it to the client. Once the client receives \(\varOmega _{i}\), he/she first constructs the updated MHT. He/she calculates the new root \(rt'\) and runs \(\textsf {Sign}(sk_{\textsf {SS}},rt') \rightarrow \sigma _{rt'}\). Then, the client sends \(info=(m_{i},T_{m_{i}},\sigma _{rt'})\) (note that \(m_{i}\) and \(T_{m_{i}}\) are not needed for a deletion). After receiving info from the client, the server first updates the MHT, calculates the new root \(rt_{server}'\) and sends it to the client. Upon getting the root \(rt_{server}'\), the client runs \(\textsf {Verify}(pk_{\textsf {SS}},\sigma _{rt'},rt_{server}') \rightarrow 0/1\). If 0, then the client aborts. Otherwise, he/she proceeds and deletes \((m_{i},T_{m_{i}},\sigma _{rt'})\) from his/her local storage. For the operation: (1) Insertion: \(m_{i_{0}}\) is added before \(m_{i}\) by placing \(m_{i_{0}}\) at the i-th leaf node, and all the blocks from \(m_{i}\) are shifted to leaf nodes by 1 to the right; (2) Deletion: \(m_{i}\) is removed from the i-th leaf node and all the blocks from \(m_{i+1}\) are shifted to leaf nodes by 1 to the left; (3) Modification: \(m_{i}'\) simply replaces \(m_{i}\) at the i-th leaf node.

\(\bullet \) \(\textsf {MHT.GenProof}(\mathsf{pk},F,chal,\varSigma ) \rightarrow (\nu ,rt_{server},\{H'(m_{i}),\varOmega _{i}\}_{i \in I})\). The TPA chooses a subset \(I \subseteq [1,n_{max}]\) (\(n_{max}\) is the maximum number of blocks after operations), randomly chooses |I| elements \(v_{i} \in _{R} \mathbb {Z}_{p}\) and sets the challenge \(chal=\{(i,v_{i})\}_{i \in I}\). Then, after receiving chal and given \(F = \{m_{i}\}_{i \in I} \subset \mathbb {F}\) and \(\varSigma = \{T_{m_{i}}\}_{i \in I} \subset \mathbb {E}\), the server runs \(\textsf {GenProof}(pk,F,chal,\varSigma ) \rightarrow \nu \) such that \(\nu = (R_{1},\cdots ,R_{s},B_{1},\cdots ,B_{s},c) \in \mathbb {G}_{1}^{2s+1}\), where \(r_{j} \in _{R} \mathbb {Z}_{p}\), \(R_{j}=h_{1}^{r_{j}}\), \(b_{j}=\sum _{(i,v_{i}) \in chal}\) \( m_{i,j} \cdot v_{i} + r_{j} \in \mathbb {Z}_{p}\) and \(B_{j}=h_{j}^{b_{j}}\) for \(j \in [1,s]\), and \(c =\prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}}\). Moreover, the server prepares the latest version of the stored root’s signature \(\sigma _{rt}\) provided by the client, the root \(rt_{server}\) of the current MHT, the \(H'(m_{i})\) and AAI \(\varOmega _{i}\) for the challenged blocks, such that the current MHT has been constructed using \(\{H'(m_{i}),\varOmega _{i}\}_{i \in I}\). Finally, it returns \((\nu ,\sigma _{rt},rt_{server},\{H'(m_{i}),\varOmega _{i}\}_{i \in I})\) to the TPA.

\(\bullet \) \(\textsf {MHT.CheckProof}(\mathsf{pk},chal,\nu ,\sigma _{rt},rt_{server},\{H'(m_{i}),\varOmega _{i}\}_{i \in I}) \rightarrow 0/1\). After receiving \(\{H'(m_{i}),\) \(\varOmega _{i}\}_{i \in I}\) from the server, the TPA first constructs the MHT and calculates the root \(rt_{TPA}\). It then checks that \(rt_{server}=rt_{TPA}\). If not, then it aborts; otherwise, it runs \(\textsf {Verify}(pk_{\textsf {SS}},\sigma _{rt},\) \(rt_{server}) \rightarrow 0/1\). If 0, then the TPA aborts. Otherwise, it proceeds and checks whether the following equation holds:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})&{\mathop {=}\limits ^{?}}&e(\prod _{(i,v_{i}) \in chal} H'(m_{i})^{v_{i}} ,g_{2}) \cdot e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$
(7)

If Eq. 7 holds, then the TPA returns 1 to the client; otherwise, it returns 0 to the client.

Correctness. We suppose that the correctness holds for \(\textsf {DPDP}\) and \(\textsf {SS}\) protocols. Given the proof of data possession \(\nu \), we have:

$$\begin{aligned} e(c,g_{2}^{a}) \cdot e(\prod _{j=1}^{s} R_{j} ,g_{2})= & {} e ( \prod _{(i,v_{i}) \in chal} T_{m_{i}}^{v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e ( \prod _{(i,v_{i}) \in chal} (H'(m_{i}) \cdot \prod _{j=1}^{s} h_{j}^{m_{i,j}})^{-a \cdot v_{i}},g_{2}^{a} ) \cdot e(\prod _{j=1}^{s} h_{j}^{r_{j}} ,g_{2}) \\= & {} e ( \prod _{(i,v_{i}) \in chal} H'(m_{i})^{v_{i}},g_{2} ) \cdot e(\prod _{j=1}^{s} B_{j},g_{2}) \end{aligned}$$

N.B. In MHT.GenProof, since I is a subset of ranks, the server has to be given the appropriate \(\{\varOmega _{i}\}_{i \in I}\) along with \(\{H'(m_{i})\}_{i \in I}\) to obtain the current MHT and thus complete the proof generation. Otherwise, the TPA won’t get the proper MHT.

5.2 Security and Privacy Proofs

We give the proofs in the full version of this paper [8].

Security Proof Against the Server

Theorem 3

Let \(\mathcal {A}\) be a PPT adversary that has advantage \(\epsilon \) against the MHT-based DPDP scheme with PV and DP. Suppose that \(\mathcal {A}\) makes a total of \(q_{H'} >0\) queries to \(H'\). Then, there is a challenger \(\mathcal {B}\) that solves the CDH and DL problems with advantage \(\epsilon ' = \mathcal {O}(\epsilon )\).

Second Data Privacy Proof Against the TPA

Theorem 4

Let \(\mathcal {A}\) be a PPT adversary that has advantage \(\epsilon \) against the MHT-based DPDP scheme with PV and DP. Suppose that \(\mathcal {A}\) makes a total of \(q_{H'} >0\) queries to \(H'\). Then, there is a challenger \(\mathcal {B}\) that solves the \((s+1)\)-DDHE problem with advantage \(\epsilon ' = \mathcal {O}(\epsilon )\).

5.3 Performance and Discussion with Other Existing Works

We first compare the MHT-based scheme with the original one presented in [9]. The MHT-based construction seems less practical and efficient than the construction in [9]. Communication and computation burdens appear in order to obtain the desired security standards against the server and TPA. The communication overheads increase between the client and server. The computation overheads for the client raise also, although the client is limited in resources. The storage space of the server should be bigger, since it has to create and possibly stores MHTs for each client. The TPA has to provide more computational resources for each client in order to ensure valid data integrity checks. Nevertheless, experiments might show that the time gap between the algorithms in the scheme proposed in [9] and the ones in the MHT-based scheme is acceptable.

The MHT is an Authenticated Data Structure (ADS) that allows the client and TPA to check that the server correctly stores and updates the data blocks. Erway et al. [4] proposed the first DPDP scheme. The verification of the data updates is based on a modified ADS, called Rank-based Authentication Skip List (RASL). This provides authentication of the data block ranks, which ensures security in regards to data block dynamicity. However, public verifiability is not reached. Note that such ADS with bottom-up leveling limits the insertion operations. For instance, if the leaf nodes are at level 0, any data insertion that creates a new level below the level 0 will bring necessary updates of all the level hash values and the client might not be able to verify. Wang et al. [21] first presented a DPDP with PV using MHT. However, security proofs and technical details lacked. The authors revised the aforementioned paper [21] and proposed a more complete paper [22] that focuses on dynamic and publicly verifiable PDP systems based on BLS signatures. To achieve the dynamicity property, they employed MHT. Nevertheless, because the check of the block ranks is not done, the server can delude the client by corrupting a challenged block as follows: it is able to compute a valid proof with other non-corrupted blocks. Thereafter, in a subsequent work [20], Wang et al. suggested to add randomization to the above system [22], in order to guarantee that the server cannot deduce the contents of the data files from the proofs of data possession. Liu et al. [11] constructed a PDP protocol based on MHT with top-down leveling. Such protocol satisfies dynamicity and public verifiability. They opted for such design to let leaf nodes be on different levels. Thus, the client and TPA have both to remember the total number of data blocks and check the block ranks from two directions (leftmost to rightmost and vice versa) to ensure that the server does not delude the client with another node on behalf of a file block during the data integrity checking process. In this paper, the DPDP scheme with PV and DP is based on MHT with bottom-up leveling, such that data block ranks are authenticated. Such tree-based construction guarantees secure dynamicity and public verifiability processes as well as preservation of data privacy, and remains practical in real environments.

6 Conclusion

We provided two solutions to solve the adversarial issues encountered in the DPDP scheme with PV and DP proposed in [9]. These solutions manage to overcome replay attacks, replace attacks and attacks against data privacy by embedding IHT or MHT into the construction in [9]. We proved that the two new schemes are both secure against the server and data privacy-preserving against the TPA in the random oracle.