1 Introduction

Cloud computing and internet of things have been envisioned as a next generation information technology paradigm for provisioning of computing and storage resources with a reduced cost and fast accessibility [1]. However, its benefits in terms of flexibility are shadowed by security challenges which inhibit its adoption. In the cloud and internet of things, users put large data files on the cloud storage server. As the users’ data do not reside within their physical possession any more, how to efficiently audit the integrity of outsourced data becomes a great challenge in the cloud [35]. Traditional data auditing needs data to be downloaded to the local storage, which could seriously increase the communication and computation overhead. So this is not suitable for cloud environments and internet of things because of the huge amount of users’ data in the cloud and internet of things.

Many schemes including private auditing [15] and public auditing [2, 38, 42] are proposed to process data integrity checking in cloud computing. Compared with private auditing schemes, public auditing schemes allow any public verifier to check the integrity of data storage. Besides, users themselves do not have to afford the overhead of data integrity checking. Thus, public auditing seems more practical and may play a more important role in the cloud [34]. Most of the above schemes focus on the public auditing of personal data in the cloud. However, several other schemes [27, 29] concerning on the integrity checking of shared data have been proposed recently. Different blocks in shared data in the cloud are signed by different users during the data modification. Moreover, users who leave the group or misbehave must be revoked from the group for security consideration. Revoked users cannot access or modify shared data any more. Furthermore, the data blocks previously signed by the revoked users still need to be re-signed by an existing user in the group though the content of shared data is not changed during user revocation.

However, none of the above schemes considers the efficiency of user revocation when checking the integrity of shared data. And then, Panda [32] and its variants [28, 30, 31] are proposed to solve this problem. According to [32], Panda is able to allow cloud server to efficiently re-sign data blocks on behalf of existing users in the group during user revocation, thus existing users do not need to download and re-sign multimedia data blocks by themselves. Moreover, a public verifier is able to check the integrity of shared data without retrieving the entire data blocks from the cloud. Unfortunately, we find that Panda and its variants can neither preserve data privacy nor resist against integrity forgery attack. These schemes are insecure and vulnerable to attacks from outside attackers or malicious servers.

  • Contribution. In this paper, we make a cryptanalysis of Panda and show two specific attacks on Panda: data recovery attack can be implemented by outside attackers to reveal data privacy; while integrity forgery attack can be performed by malicious cloud servers to forge an auditing proof against any auditing challenge successfully even without correct data storage. We pinpoint that the primary cause of the insecurity is the linear combinations of sampled data blocks without random masking properly. Then we propose an improvement of Panda together with data privacy preserving and sound public auditing while incurring optimal communication and computation overhead. The cryptanalysis and improvement are also available for Panda’s variants.

  • Organization. The rest of the paper is organized as follows. Section II presents the problem statement of Panda. Then a brief description of Panda is given in Section III. In Section VI, we introduce two attacks on Panda. An improvement of Panda is proposed in Section V and then the security analysis of the new proposal is followed in Section VI. Performance evaluation is given in Section VII.

1.1 Problem statement

The system model, design goals, several preliminaries of Panda and the improved scheme are described in this section.

1.2 System and security model

The system comprises three different entities: the cloud, the users (who share data as a group), and the public verifier, as shown in Fig. 1.

Fig. 1
figure 1

The system model of Panda

  • The cloud. The cloud owns the infrastructure and expertise to offer outsourced data storage and sharing services to the group.

  • The users. The users can be classified into two types, original users and other group users. The original users create and share data with other group users in the cloud. Both the original users and group users can modify shared data.

  • The public verifier. A client or a third-party auditor (TPA), who can provide data integrity verification, aims to check the storage reliability and validity of shared data via a challenge-response protocol with the cloud server.

Users can put large data files on the cloud and internet of things to free themselves from the burden of storage and maintenance. Shared data is divided into several blocks. Users in the group can perform insert, delete and update operations on the blocks. Each block in shared data is attached with a signature. The original user creates all the signatures on shared data initially. After that, users who modify data blocks are required to sign the modified block with their own private key. Assume that the original user is the group manager and can revoke users on behalf of the group. Once users are revoked from the group, the blocks previously signed by the revoked users need to be re-signed by an existing user.

1.3 Design goals

Panda and the improved scheme are required to achieve the following security and performance goals:

  • Storage Correctness. The public verifier can audit the integrity of shared data correctly.

  • Efficient and Secure User Revocation. The data blocks signed by revoked users can be efficiently re-signed and revoked users cannot create valid signatures any more.

  • Public Auditing. The public verifier can check the integrity of shared data without retrieving the entire data from the cloud.

  • Scalability. Data can be efficiently shared among a large number of users in the cloud, and the public verifier can manage multiple auditing tasks from possibly many users concurrently in secure and efficient manner.

  • Privacy preserving. The public verifier cannot derive the content of shared data from information collected during integrity checking. for the performing of public auditing between cloud and public verifier will reveal data privacy to outside attackers in Panda, this design goal is only achieved by the improved scheme.

1.4 Preliminaries

Some necessary cryptographic primitives are introduced as follows.

  • Bilinear Maps. Let \( {\mathbb{G}}_1 \) and \( {\mathbb{G}}_2 \) be two multiplicative cyclic groups of prime order p. Let g be a generator of \( {\mathbb{G}}_1 \). Bilinear map e is a map e: \( {\mathbb{G}}_1\times {\mathbb{G}}_1\to {\mathbb{G}}_2 \) with the following properties: 1) Bilinear: for all \( u\in {\mathbb{G}}_1 \), \( v\in {\mathbb{G}}_2 \) and a, b ∈ ℤ p , e(u a, v b) = e(u, v)ab; 2) Non-degenerate: e(g, g) ≠ 1; 3) Computable: there exists an efficient algorithm for computing map e.

  • Discrete Logarithm (DL) Problem. Let a ∈ ℤ p , given \( g,{g}^a\in {\mathbb{G}}_1 \) as input, output a.

  • Computational Diffie-Hellman (CDH) Problem. Let a, b ∈ ℤ p , given \( g,{g}^a,{g}^b\in {\mathbb{G}}_1 \) as input, output \( {g}^{ab}\in {\mathbb{G}}_1 \).

  • Homomorphic Authenticators. Homomorphic authenticators are used to construct public auditing mechanisms. Homomorphic authenticable signature schemes should satisfy the following properties: unforgeability, blockless verifiability, and non-malleability [25].

  • Proxy Re-signatures. A semi-trusted proxy is able to act as a translator of signatures between two users. Specifically, the proxy can translate one user’s signature into the other’s on the same block without learning any private information of the two users [3]. In Panda, the cloud is the proxy and translates signatures during user revocation.

2 Panda description

Panda uses public key-based homomorphic authenticators, which are based on the BLS signature scheme [4], to equip the auditing with public auditability. Proxy re-signatures are also used to support cloud server to re-sign shared data blocks. The security of Panda is based on the hardness assumptions of CDH and DL problem over bilinear groups. This section introduces the construction of Panda and its extension.

2.1 Construction of Panda

Details of Panda construction are illustrated in Fig. 2.

Fig. 2
figure 2

Details of Panda

At last, if and only if the verification result is true, the public verifier believes that the integrity of all the blocks in shared data is correct.

2.2 Extension of Panda

Panda can be extended in terms of detection probability, scalability, and reliability. Thus, the detection probability is improved by performing multiple auditing tasks on the same shared data; batch auditing is supported by verifying multiple auditing tasks simultaneously; and reliability is enhanced by adoption of a multi-proxy model of Panda. Further details of Panda extension can be found in [32].

3 Attacks on Panda

In fact, not only is the cloud semi-trusted, but also the public verifier is not fully trusted. Pubic auditing allows any potential client to verify the integrity of the cloud data. As described in Panda, a client who wants to use the shared data in cloud can act as the public verifier. So the public verifier may collect the auditing information for his own purpose (e.g., revealing data privacy, etc.)

It is claimed in [32] that Panda can achieve the following goals: storage correctness, efficient and secure user revocation, public auditing, and scalability. Unfortunately, we find that Panda is vulnerable to attacks from outside attackers and malicious cloud servers. More concretely, the performing of public auditing between cloud and public verifier will reveal data privacy to outside attackers; and the storage correctness of shared data in the cloud may not be ensured even if the cloud passes the integrity auditing from public verifier.

We first describe the threat model of Panda in this section. Then, we introduce two specific attacks on Panda. Data recovery attack can be performed by outside attackers to reveal data privacy. Integrity forgery attack can be implemented by malicious cloud servers to forge an auditing proof against any auditing challenge successfully. Note that the two attacks are also available on panda’s variants [28, 30, 31].

3.1 Threat model

  • Privacy Threats. The content of shared data should be private. An outside attacker acting as a public verifier, who is only allowed to verify the correctness of shared data integrity, may try to recover the content of data blocks. Once the outside attacker reveals the content directly or indirectly, he finishes the privacy attack successfully.

  • Integrity Threats. The cloud server may corrupt or even delete the shared data in cloud storage because of software bugs or hardware failures inadvertently. Besides, the cloud server can be self-interested. It may be economically motivated, which means it might even hide these data corruption incidents to cloud users in order to save its reputation and avoid profit loss of its services.

3.2 Data recovery attack

We assume that an outside attacker who wants to reveal the content of shared data acts as a public verifier. And the outside attacker performs public auditing process with the cloud server. We will show that he can achieve this goal after performing enough public auditing process. The detailed attack scheme is as follows.

For the sake of simplicity, the outside attacker firstly targets only at data blocks signed (or re-signed) by user u 1. For further attacks targeting at data blocks signed (or re-signed) by user u i , 2 ≤ i ≤ d, the attack scheme below is also available.

We assume that the number of elements in subset L 1 is c 1 = t and L 1 = {l 1, l 2, ⋯, l t }. After the user and the cloud finish KeyGen, ReKey (optional), Sign and ReSign (optional), the outside attacker perform t times of ProofGen process and the auditing challenges are \( c{h}_1=\left\{\left({l}_1,{\eta}_{1{l}_1}\right),\left({l}_2,{\eta}_{1{l}_2}\right),\cdots, \left({l}_t,{\eta}_{1{l}_t}\right)\right\} \), \( c{h}_2=\left\{\left({l}_1,{\eta}_{2{l}_1}\right),\left({l}_2,{\eta}_{2{l}_2}\right),\cdots, \left({l}_t,{\eta}_{2{l}_t}\right)\right\} \), …, \( \left\{c{h}_t=\right\{\left({l}_1,{\eta}_{t{l}_1}\right), \) \( \left({l}_2,{\eta}_{t{l}_2}\right),\cdots, \left({l}_t,{\eta}_{t{l}_t}\right)\Big\} \). Then he sends them to the cloud and receives auditing proofs as P 1 = (α 1, β 1, id l ), P 2 = (α 2, β 2, id l ), …, P t  = (α t , β t , id l ).

Since the attack firstly targets only at data blocks signed (or re-signed) by user u 1, there is only one element in vector α i and β i respectively, which is \( {\alpha}_i={\displaystyle {\sum}_{j=1}^t{\eta}_{i{l}_j}{m}_{l_j}},1\le i\le t \) and \( {\beta}_i={\displaystyle {\prod}_{j=1}^t{\sigma_{l_j}}^{\eta_{i{l}_j}}},1\le i\le t \), where \( {\sigma}_{l_j}={\left(H\left(i{d}_{l_j}\right){w}^{m_{l_j}}\right)}^{\pi_1}, \) 1 ≤ j ≤ t.

Assume that \( {\boldsymbol{\upeta}}_1=\left({\eta}_{1{l}_1},{\eta}_{1{l}_2},\cdots, {\eta}_{1{l}_t}\right) \), \( {\boldsymbol{\upeta}}_2=\Big({\eta}_{2{l}_1},{\eta}_{2{l}_2},\cdots, \) \( {\eta}_{2{l}_t}\Big) \), …, \( {\boldsymbol{\upeta}}_t=\left({\eta}_{t{l}_1},{\eta}_{t{l}_2},\cdots, {\eta}_{t{l}_t}\right) \) and the construction of matrix η 1 is

$$ {\boldsymbol{\upeta}}^1=\left[\begin{array}{cccc}\hfill {\eta}_{1{l}_1}\hfill & \hfill {\eta}_{1{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{1{l}_t}\hfill \\ {}\hfill {\eta}_{2{l}_1}\hfill & \hfill {\eta}_{2{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{2{l}_t}\hfill \\ {}\hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \vdots \hfill \\ {}\hfill {\eta}_{t{l}_1}\hfill & \hfill {\eta}_{t{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{t{l}_t}\hfill \end{array}\right] $$

Let det(η 1) ≠ 0, so vectors η 1, η 2, ⋯, η t are linearly independent. Then there is a matrix μ that satisfies μη 1 = E.

Assume that matrix \( {\mathbf{m}}_1=\begin{array}{cccc}\hfill \Big[{m}_{l_1}\hfill & \hfill {m}_{l_2}\hfill & \hfill \cdots \hfill & \hfill {m}_{l_t}\Big]\hfill \end{array} \) is constructed from data blocks signed (or re-signed) by user u 1 and \( {\boldsymbol{\upalpha}}^{\mathbf{\prime}}=\left[\begin{array}{cccc}\hfill {\alpha}_1\hfill & \hfill {\alpha}_2\hfill & \hfill \cdots \hfill & \hfill {\alpha}_t\hfill \end{array}\right] \). Thus α′ = η 1 m 1, and then the outside attacker can derive m 1  = μα′. The outside attacker recovers the data blocks signed (or re-signed) by user u 1 successfully.

In fact, even if the outside attacker cannot act as the public verifier in some cases, if he eavesdrops on enough auditing challenges {(l, η l )} l ∈ L and auditing proofs {α, β, id l }, he can also recover the matrix m i , namely the data blocks signed (or re-signed) by user u i , 1 ≤ i ≤ d. And the exact number of pairs of auditing challenges and proofs he need to collect is c i that satisfies det(η i) ≠ 0, 1 ≤ i ≤ d.

3.3 Integrity forgery attack

We assume that malicious cloud servers might delete data owned by users or even hide some data corruptions for their own benefits. We will show that a malicious cloud server can forge an auditing proof against any auditing challenge successfully even without correct data storage.

To make matters worse, according to the following attack scheme, an outside attacker, who does not initially possess the shared data at all, can forge an auditing proof against any auditing challenge after eavesdropping on enough valid pairs of auditing challenges and auditing proofs. This means that any user is able to masquerade as the cloud server as long as he can eavesdrop on auditing challenges and auditing proofs. This serious security flaw can bring unexpected risks to both users and the cloud. The detailed scheme of integrity forgery attack is as follows.

For the sake of simplicity, the attacker firstly targets only at data blocks signed (or re-signed) by user u 1. For further attacks targeting at data blocks signed (or re-signed) by user u i , 2 ≤ i ≤ d, the attack scheme below is also available.

As the same as data recovery attack, we assume that the number of elements in subset L 1 is c 1 = t and L 1 = {l 1, l 2, ⋯, l t }. The user and the malicious cloud server have finished KeyGen, ReKey (optional), Sign and ReSign (optional), After the public verifier performs t times of ProofGen process and the auditing challenges are \( c{h}_1=\left\{\left({l}_1,{\eta}_{1{l}_1}\right),\left({l}_2,{\eta}_{1{l}_2}\right),\cdots, \left({l}_t,{\eta}_{1{l}_t}\right)\right\} \), \( c{h}_2=\left\{\left({l}_1,{\eta}_{2{l}_1}\right),\right({l}_2, \) \( {\eta}_{2{l}_2}\left),\cdots, \left({l}_t,{\eta}_{2{l}_t}\right)\right\} \), …, \( \Big\{c{h}_t=\left\{\left({l}_1,{\eta}_{t{l}_1}\right),\left({l}_2,{\eta}_{t{l}_2}\right),\cdots, \left({l}_t,{\eta}_{t{l}_t}\right)\right\} \), the malicious cloud server returns auditing proofs as P 1 = (α 1, β 1, id l ), P 2 = (α 2, β 2, id l ), …, P t  = (α t , β t , id l ).

Since the attack firstly targets only at data blocks signed (or re-signed) by user u 1, there exists only one element in vector α i and β i respectively, which is \( {\alpha}_i={\displaystyle {\sum}_{j=1}^t{\eta}_{i{l}_j}{m}_{l_j}},1\le i\le t \) and \( {\beta}_i={\displaystyle {\prod}_{j=1}^t{\sigma_{l_j}}^{\eta_{i{l}_j}}},1\le i\le t \), where \( {\sigma}_{l_j}={\left(H\left(i{d}_{l_j}\right){w}^{m_{l_j}}\right)}^{\pi_1}, \) 1 ≤ j ≤ t.

Assume that \( {\boldsymbol{\upeta}}_1=\left({\eta}_{1{l}_1},{\eta}_{1{l}_2},\cdots, {\eta}_{1{l}_t}\right) \), \( {\boldsymbol{\upeta}}_2=\Big({\eta}_{2{l}_1},{\eta}_{2{l}_2},\cdots, \) \( {\eta}_{2{l}_t}\Big) \), …, \( {\boldsymbol{\upeta}}_t=\left({\eta}_{t{l}_1},{\eta}_{t{l}_2},\cdots, {\eta}_{t{l}_t}\right) \) and the construction of matrix η 1 is

$$ {\boldsymbol{\upeta}}^1=\left[\begin{array}{cccc}\hfill {\eta}_{1{l}_1}\hfill & \hfill {\eta}_{1{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{1{l}_t}\hfill \\ {}\hfill {\eta}_{2{l}_1}\hfill & \hfill {\eta}_{2{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{2{l}_t}\hfill \\ {}\hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \vdots \hfill \\ {}\hfill {\eta}_{t{l}_1}\hfill & \hfill {\eta}_{t{l}_2}\hfill & \hfill \cdots \hfill & \hfill {\eta}_{t{l}_t}\hfill \end{array}\right] $$

If det(η 1) ≠ 0, vectors η 1, η 2, ⋯, η t are linearly independent. Therefore, the malicious cloud server deletes data blocks signed (or re-signed) by user u 1 and stores the t pairs of auditing challenges and proofs. And then he can generate valid auditing proofs against auditing challenge to data blocks signed (or re-signed) by user u 1 even without correct data storage of them.

After receiving an auditing challenge \( c{h}^{\ast }=\left\{\left({l}_1,{\eta_{l_1}}^{\ast}\right),\right({l}_2, \) \( {\eta_{l_2}}^{\ast}\left),\cdots, \left({l}_t,{\eta_{l_t}}^{\ast}\right)\right\} \) to data blocks signed (or re-signed) by user u 1, the malicious cloud server generates an auditing proof as follows:

  1. 1)

    Assume \( {\boldsymbol{\upeta}}^{\ast }=\left({\eta_{l_1}}^{\ast },{\eta_{l_2}}^{\ast },\cdots, {\eta_{l_t}}^{\ast}\right) \), since det(η 1) ≠ 0, the malicious cloud server can generate a i , 1 ≤ i ≤ t that satisfies η  = a 1 η 1 + a 2 η 2 + ⋯ + a t η t , namely \( {\eta_{l_j}}^{\ast }={\displaystyle {\sum}_{i=1}^t{a}_i{\eta}_{i{l}_j}} \).

  2. 2)

    The malicious cloud server generates α  = ∑ t i = 1 a i α i and \( {\beta}^{\ast }={\displaystyle {\prod}_{i=1}^t{\beta_i}^{a_i}} \), and outputs an auditing proof {α , β , id l }, where α  = (α ) and β  = (β ).

Thus, when the integrity forgery attack firstly targets at data blocks signed (or re-signed) by user u 1, namely for the i = 1 factors in \( e\left({\displaystyle {\prod}_{i=1}^d{\beta}_i},g\right)={\displaystyle {\prod}_{i=1}^de\left({{\displaystyle {\prod}_{l\in {L}_i}H\left(i{d}_l\right)}}^{\eta_l}\cdot {w}^{\alpha_i},p{k}_i\right)} \), the auditing proof {α , β , id l } generated by the malicious cloud server can pass the verification because of the following equation:

$$ \begin{array}{l}e\left({\beta}^{\ast },g\right)=e\left({\displaystyle {\prod}_{i=1}^t{\beta_i}^{a_i}},g\right)\\ {}=e\left({\displaystyle {\prod}_{i=1}^t{\displaystyle {\prod}_{j=1}^t{\sigma_{l_j}}^{\eta_{i{l}_j}{a}_i}}},g\right)\\ {}=e\left({\displaystyle {\prod}_{j=1}^t{\left({\sigma}_{lj}\right)}^{{\displaystyle {\sum}_{i=1}^t{a}_i{\eta}_{i{l}_j}}},g}\right)\\ {}=e\left({\displaystyle {\prod}_{j=1}^t{\left(H\left(i{d}_{l_j}\right){w}^{m_{l_j}}\right)}^{\pi_1{\displaystyle {\sum}_{i=1}^t{a}_i{\eta}_{i{l}_j}}},g}\right)\\ {}=e\left({\displaystyle {\prod}_{j=1}^tH{\left(i{d}_{l_j}\right)}^{{\displaystyle {\sum}_{i=1}^t{a}_i{\eta}_{i{l}_j}}}\cdot {w}^{{\displaystyle {\sum}_{i=1}^t{a}_i{\displaystyle {\sum}_{j=1}^t{\eta}_{i{l}_j}{m}_{l_j}}}},{g}^{\pi_1}}\right)\\ {}=e\left({\displaystyle {\prod}_{j=1}^tH{\left(i{d}_{l_j}\right)}^{{\displaystyle {\sum}_{i=1}^t{a}_i{\eta}_{i{l}_j}{\pi}_1}}\cdot {w}^{{\displaystyle {\sum}_{i=1}^t{a}_i{\alpha}_i{\pi}_1}},p{k}_1}\right)\\ {}=e\left({{\displaystyle {\prod}_{j=1}^tH\left(i{d}_{l_j}\right)}}^{{\eta_{l_j}}^{\ast }}\cdot {w}^{\alpha^{\ast }},p{k}_1\right)\end{array} $$

Since the auditing challenges received by the malicious cloud server are the simple arrangement of auditing challenge for each data block signed (or re-signed) by each user u i , 2 ≤ i ≤ d, it’s easy to separate the auditing challenges signed (or re-signed) by different users. The auditing proofs can be separated in the same way. Thus the malicious cloud server can get pairs of auditing challenges and auditing proofs for data blocks signed (or re-signed) by each user u i , 2 ≤ i ≤ d. When receiving auditing challenge, the malicious cloud server firstly generates the partial auditing proof {α , β , id l } for each i, 2 ≤ i ≤ d specialized in the auditing challenge, and then he gets the valid auditing proof by arranging the partial auditing proof in the way of the auditing challenge arrangement.

Thus even without correct data storage, the malicious cloud server can forge an auditing proof against any auditing challenge successfully.

4 New proposal for Panda

In this section, we propose an improved scheme of Panda, which is also a homomorphic authenticable proxy re-signature scheme. The system model and security preliminaries of this scheme are the same as Panda’s. However, because Panda cannot preserve data privacy during auditing process, we add privacy preserving as the new design goal of the improved scheme to ensure that the public verifier cannot derive shared data during integrity auditing.

From the two specific attacks on Panda described above, it is concluded that the primary cause of the insecurity of Panda is the linear combinations of sampled data blocks without random masking properly. Data blindness during data auditing is not well concerned in Panda. Thus outside attackers and malicious cloud servers can easily derive shared data content or forge a valid auditing proof by collecting enough linear combinations of data blocks. So in the improved scheme, we integrate the homomorphic authenticators with random masking technique.

Actually, the data privacy preserving in public auditing schemes and its solution with random masking have already been discussed and proposed (Wang et al., 2010). But random masking is not properly used in (Wang et al., 2010). In fact, this solution is found of security flaws and cannot provide secure data storage for users [39]. However, in our new proposal with improved random masking technique, the outside attackers and malicious cloud servers cannot get necessary information to derive the shared data content or generate valid auditing proofs any more, no matter how many linear combinations of data blocks can be collected. And even with the presence of the randomness, the correctness validation of the pairs of auditing challenges and proofs can still be processed in a new way.

The new proposal for Panda also includes the following six algorithms: KeyGen, ReKey, Sign, ReSign, ProofGen, and ProofVerify. The initialization and the first four algorithms of the new proposal are the same as Panda. The ProofGen and ProofVerify algorithms are illustrated in Fig. 3.

Fig. 3
figure 3

The improvement of Panda

At last, if and only if the verification result is true, the public verifier believes that the integrity of all the blocks in shared data is correct.

The proof of the correctness of the verification is given as follows.

$$ \begin{array}{l}e\left({\displaystyle {\prod}_{i=1}^d{\beta}_i},g\right)\\ {}={\displaystyle {\prod}_{i=1}^de\left({\displaystyle {\prod}_{l\in {L}_i}}{\sigma_l}^{\eta_l},g\right)}\\ {}={\displaystyle {\prod}_{i=1}^de\left({{\displaystyle {\prod}_{l\in {L}_i}\left(H\left(i{d}_l\right){w}^{m_l}\right)}}^{\pi_i{\eta}_l},g\right)}\\ {}={\displaystyle {\prod}_{i=1}^de\left({\displaystyle {\prod}_{l\in {L}_i}H{\left(i{d}_l\right)}^{\eta_l}\cdot {\displaystyle {\prod}_{l\in {L}_i}}{w^{m_l}}^{\eta_l},{g}^{\pi_i}}\right)}\\ {}={\displaystyle {\prod}_{i=1}^de\left({\displaystyle {\prod}_{l\in {L}_i}}H{\left(i{d}_l\right)}^{\eta_l}\cdot {w}^{{\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l+{\gamma}_i}}\cdot {w}^{-{\gamma}_i},p{k}_i\right)}\\ {}={\displaystyle {\prod}_{i=1}^de\left({\displaystyle {\prod}_{l\in {L}_i}}H{\left(i{d}_l\right)}^{\eta_l}\cdot {w}^{\alpha_i}\cdot {R}_i^{-1},p{k}_i\right)}\end{array} $$

As the same as Panda, this new proposal can also be extended in terms of detection probability, scalability, and reliability easily. Due to space limitation, we will not describe this extension and details of the extended construction can refer to [32].

5 Security analysis

We evaluate the security of the new proposal for Panda by modularizing it into two parts, namely the storage correctness guarantee and the privacy preserving guarantee. The security of our scheme depends on the hardness assumption of CDH and DL problems.

  • Storage Correctness Guarantee. We need to prove that the cloud server cannot generate valid auditing proof for the public verifier without actually storing the shared data.

Theorem 1.

The cloud passes the verification done by the public verifier only if it indeed possesses the specified shared data intact as it is.

Proof.

First, the signature scheme of the new proposal is existentially unforgeable and please refers to [4, 34]. Then, the proof follows from Theorem 4.2 of [25]. The cloud server is treated as an adversary. The challenger controls the random oracle H(⋅). We show that if the adversary passes the verification with non-negligible probability, a simulator can be constructed that can solve the CDH problem.

Given \( g,{g}^a,{g}^b\in {\mathbb{G}}_1 \), the simulator needs to output \( {g}^{ab}\in {\mathbb{G}}_1 \). The simulator randomly chooses x, y ∈ ℤ p and then sets pk 1 = g a and w = g x g by. For each l, the simulator chooses r l  ∈ ℤ p , and programs the random oracle at l as \( H\left(i{d}_l\right)={g}^{r_l}{{}^{-x{m}_l}}^{-by{m}_l} \).

Since w = g x g by, the simulator computes σ l for the signature query issued by the adversary as

$$ {\sigma}_l={\left(H\left(i{d}_l\right){w}^{m_l}\right)}^a={\left({g}^{r_l}{{}^{-x{m}_l}}^{-by{m}_l}{\left({g}^x{g^b}^y\right)}^{m_l}\right)}^a={g}^{a{r}_l} $$

Firstly, for an auditing challenge returned by the challenger, let {α, β, R, id l } be the cloud server’s response that can also satisfy

$$ e\left({\displaystyle {\prod}_{i=1}^d{\beta}_i},g\right)={\displaystyle {\prod}_{i=1}^de\left({{\displaystyle {\prod}_{l\in {L}_i}H\left(i{d}_l\right)}}^{\eta_l}\cdot {w}^{\alpha_i}\cdot {R_i}^{-1},p{k}_i\right)} $$
(1)

And then for the same γ i  ∈ ℤ p , let the adversary output {α′, β′, R, id l } as the auditing proof and it satisfies

$$ e\left({\displaystyle {\prod}_{i=1}^d{\beta}_i^{\prime }},g\right)={\displaystyle {\prod}_{i=1}^de\left({{\displaystyle {\prod}_{l\in {L}_i}H\left(i{d}_l\right)}}^{\eta_l}\cdot {w}^{\alpha_i^{\prime }}\cdot {R_i}^{-1},p{k}_i\right)} $$
(2)

Obviously that α i  ≠ α i , otherwise β i  = β i , which contradicts the assumption that the challenger aborted on the adversary’s response. Assume that Δα i  = α i  − α i , we can solve the CDH problem as follows:

Since γ i  ∈ Z p is the same in Eqs. (1) and (2), dividing Eq. (2) by Eq. (1), we have

$$ e\left({\displaystyle {\prod}_{i=1}^d\frac{\beta_i^{\prime }}{\beta_i}},g\right)={\displaystyle {\prod}_{i=1}^de\left({w}^{\varDelta {\alpha}_i},p{k}_i\right)} $$
(3)

For the i = 1 factors in Eq. (3), replacing w by g x g by and pk 1 by g a, we have

$$ e\left({\beta}_1^{\prime}\cdot {\beta_1}^{-1},g\right)=e\left({\left({g}^x{g^b}^y\right)}^{\varDelta {\alpha}_1},{g}^a\right)=e\left(p{k_1}^{x\varDelta {\alpha}_1}{\left({g^{by}}^{\varDelta {\alpha}_1}\right)}^a,g\right) $$

Thus we have

$$ \begin{array}{l}e\left(p{k_1}^{-x\varDelta {\alpha}_1},g\right)e\left({\beta}_1^{\prime}\cdot {\beta_1}^{-1},g\right)=e\left(p{k_1}^{-x\varDelta {\alpha}_1},g\right)e\left(p{k_1}^{x\varDelta {\alpha}_1}{\left({g^{by}}^{\varDelta {\alpha}_1}\right)}^a,g\right)\hfill \\ {}e\left({\beta}_1^{\prime}\cdot {\beta_1}^{-1}\cdot p{k_1}^{-x\varDelta {\alpha}_1},g\right)=e\left({\left({g^{by}}^{\varDelta {\alpha}_1}\right)}^a,g\right)=e{{\left({g}^{ab},g\right)}^y}^{\varDelta {\alpha}_1}\hfill \\ {}e\left({\left({\beta}_1^{\prime}\cdot {\beta_1}^{-1}\cdot p{k_1}^{-x\varDelta {\alpha}_1}\right)}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$y\varDelta {\alpha}_1$}\right.},g\right)=e\left({g}^{ab},g\right)\hfill \end{array} $$

So as long as yΔα 1 ≠ 0 mod p, we can solve the CDH problem as

$$ {g}^{ab}={\left({\beta}_1^{\prime}\cdot {\beta_1}^{-1}\cdot p{k_1}^{-x\varDelta {\alpha}_1}\right)}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$y\varDelta {\alpha}_1$}\right.} $$

Since the adversary cannot get the value of y, the probability that yΔα 1 = 0 mod p will be 1/p which is negligible and therefore β 1 = β 1 . And in this case if the adversary successes with non-negligible probability, a simulator can be constructed that can solve the DL problem.

We have proved that β 1 = β 1 . It is only the values α i and α i that can differ. The simulator answers the adversary’s queries and the adversary outputs a forged proof {α′, β′, R, id l }. Then we have

$$ \begin{array}{l}e\left({\beta}_1^{\prime },g\right)=e\left({\beta}_1,g\right)\hfill \\ {}e\left({{\displaystyle {\prod}_{l\in {L}_1}H\left(i{d}_l\right)}}^{\eta_l}\cdot {w}^{\alpha_1^{\prime }}\cdot {R_1}^{-1},p{k}_1\right)=e\left({{\displaystyle {\prod}_{l\in {L}_1}H\left(i{d}_l\right)}}^{\eta_l}\cdot {w}^{\alpha_1}\cdot {R_1}^{-1},p{k}_1\right)\hfill \\ {}e\left({w}^{\alpha_1^{\prime }},p{k}_1\right)=e\left({w}^{\alpha_1},p{k}_1\right)\hfill \\ {}{w}^{\varDelta {\alpha}_1}=1\hfill \end{array} $$

In this case, Δα 1 = 0 mod p and this contradicts our assumption. Otherwise, we can solve the DL problem as

$$ 1={w}^{\varDelta {\alpha}_1}={\left({g}^x{g^b}^y\right)}^{\varDelta {\alpha}_1}={g^x}^{\varDelta {\alpha}_1}\cdot {g}^b{{}^y}^{\varDelta {\alpha}_1} $$

Then the solution for the DL problem is

$$ {g}^b={g}^{-\frac{x\varDelta {\alpha}_1}{y\varDelta {\alpha}_1}}={g}^{-\frac{x}{y}} $$

And y is zero only with probability 1/p, which is negligible and this completes the proof.

The analysis above shows that there is negligible probability that an adversary can cause the public verifier to accept its proof except by responding with correctly computed values.

  • Privacy Preserving Guarantee. We need to prove that the public verifier cannot derive the shared data from the information collected during integrity checking.

Theorem 2.

From the cloud’s auditing proof {α, β, R, id l }, The public verifier cannot recover any block in the shared data.

Proof.

First, we show that privacy of \( {\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l} \) is guaranteed from α i . This is because α i is blinded by γ i as \( {\alpha}_i={\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l}+{\gamma}_i \) where γ i is chosen randomly by the cloud and is unknown to the public verifier. Even with R, due to the hardness assumption of DL problem, γ i is still hidden from the public verifier. Thus no information of \( {\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l} \) can be learned from α i .

Second, we show that privacy of \( {\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l} \) is guaranteed from β i .

$$ \begin{array}{l}{\beta}_i={\displaystyle {\prod}_{l\in {L}_i}{\sigma_l}^{\eta_l}}\\ {}={\displaystyle {\prod}_{l\in {L}_i}{\left(H\left(i{d}_l\right){w}^{m_l}\right)}^{\pi_i{\eta}_l}}\\ {}={\displaystyle {\prod}_{l\in {L}_i}H{\left(i{d}_l\right)}^{\pi_i{\eta}_l}}\cdot {\left({w}^{{\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l}}\right)}^{\pi_i}\end{array} $$

From the equation above, we can see that \( {\left({w}^{{\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l}}\right)}^{\pi_i} \) is blinded by \( {\displaystyle {\prod}_{l\in {L}_i}H{\left(i{d}_l\right)}^{\pi_i{\eta}_l}} \). Computing the value of \( {\displaystyle {\prod}_{l\in {L}_i}H{\left(i{d}_l\right)}^{\pi_i{\eta}_l}} \) from H(id l ), η l and \( {g}^{\pi_i} \), which is the only information the public verifier can get, is hard due to the CDH problem. Thus the value of \( {\left({w}^{{\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l}}\right)}^{\pi_i} \) as well as \( {\displaystyle {\sum}_{l\in {L}_i}{\eta}_l{m}_l} \) cannot be derived from β i . This completes the proof.

6 Performance evaluation

In this section, we evaluate the performance of our scheme by evaluating the time and communication overhead. Then we show the performance of auditing experiments. Results show that the new proposal for Panda provides the desired data privacy preserving and sound public auditing while incurring a little extra communication and computation overhead compared with Panda.

  • Communication Overhead. As the same as in [32], the scheme does not introduce communication overhead to existing users during user revocation. Thus we only analyze the communication overhead incurred by auditing challenge and its corresponding auditing proof. As we have described above, we assume c random data blocks that will be checked in auditing process. The size of an auditing message is c ⋅ (|n| + |p|) bits. The size of an auditing proof {α, β, R, id l } is 3d|p| + c(|id|) bits. Thus the total communication overhead is 3d|p| + c(|id| + |n| + |p|) bits. Compared with Panda, the extra communication overhead of this new proposal is only d|p|. Moreover, since the scheme is based on the BLS short signatures, we have the shortest auditing challenge and auditing proof which shows that the communication complexity is constant and asymptotically it is O(1).

  • Computation Overhead. The initialization and the first four algorithms (KeyGen, ReKey, Sign, ReSign) of the new proposal are the same as Panda. Thus the computation overhead of the first four algorithms is as the same as Panda, which can refer to [32]. Based on the auditing equation illustrated in Fig. 3, the computation overhead of an auditing proof verification is \( \left(c+d\right) Ex{p}_{{\mathbb{G}}_1}+\left(c+2d\right)Mu{l}_{{\mathbb{G}}_1}+\left(d+1\right) Pair \) \( +d\kern0.1em Mu{l}_{{\mathbb{G}}_2}+c\kern0.1em Has{h}_{{\mathbb{G}}_1}+d\kern0.1em In{v}_{{\mathbb{G}}_1} \), where \( Ex{p}_{{\mathbb{G}}_1} \) denotes one exponentiation in \( {\mathbb{G}}_1 \), \( Ex{p}_{{\mathbb{G}}_1} \) denotes one exponentiation in \( {\mathbb{G}}_1 \), \( Mu{l}_{{\mathbb{G}}_1} \) denotes one multiplication in \( {\mathbb{G}}_1 \), Pair denotes one pairing operation on e: \( {\mathbb{G}}_1\times {\mathbb{G}}_1\to {\mathbb{G}}_2 \), \( Has{h}_{{\mathbb{G}}_1} \) denotes one hashing operation in \( {\mathbb{G}}_1 \), \( In{v}_{{\mathbb{G}}_1} \) denotes multiplicative inversion in \( {\mathbb{G}}_1 \). In fact, compared with Panda, the extra computation overhead of this new proposal is only \( d\kern0.1em In{v}_{{\mathbb{G}}_1} \).

  • Performance of Auditing. Pairing Based Cryptography Library is used to implement cryptographic operations. As the same as Panda, experiments are tested under Ubuntu with 2.5 GHz Processor and 4 GB Memory. Assuming the size of an element of \( {\mathbb{G}}_1 \) is |p| = 160 bits, |id| = 80 bits and |n| = 1, 000, 000. By utilizing proper aggregation methods [25], the size of each block can be set as 2 KB, and the volume of shared data is 2 GB. The communication overhead and auditing time overhead are both linearly increasing with the number of users in the group, as illustrated in Figs. 4 and 5. Our scheme can also support large groups efficiently. For example, when the number of users is 50 and c = 300, the auditing task can be finished with only 820 milliseconds and 11 KB communication overhead.

    Fig. 4
    figure 4

    Communication overhead

    Fig. 5
    figure 5

    Auditing time overhead

7 Conclusion

Ensuring the security of outsourced data needs continuous integrity auditing, meanwhile, without privacy leakage. In a public auditing scheme, the public verifier is delegated to check the validity of outsourced data. However, this delegation brings privacy concerns since the public verifier has the potential to derive multimedia data blocks. Moreover, if the scheme is not well designed, the cloud and internet of things might successfully hide some data corruptions for their own benefits. Besides, an outside attacker may forge an auditing proof against any auditing challenge after eavesdropping on enough valid pairs of auditing challenges and auditing proofs. In this paper, we have shown two security drawbacks of Panda. We have demonstrated that Panda is vulnerable to attacks from outside attackers and malicious cloud servers. Therefore, Panda cannot preserve data privacy or audit the integrity of shared data in the cloud and internet of things correctly. Then, we propose a new proposal for Panda, which is also a homomorphic authenticable proxy re-signature scheme. Detailed security and performance analyses show that this new proposal can provide desired data privacy preserving and sound public auditing while incurring a little extra communication and computation overhead compared with Panda. The proposed research result is able to be applied in some related research fields, such as image processing [7, 12, 13, 17, 19, 24, 43], visualization [22, 26], network [10, 11, 14, 23], grid [5, 6, 33], cloud computing [16, 36, 37, 41, 45], multimedia [9, 18, 20, 21], hardware [40], and others [8, 44].