1 Introduction

As the amount of data continues to grow, cloud storage services are becoming increasingly important. Instead of storing data in a local storage system, outsourcing large amounts of data to a remote cloud server can greatly relieve individuals and enterprises from the burden of data maintenance and management [1, 2]. Cloud benefits such as elastic storage, high reliability, affordable management, location independence, on-demand self-service and ubiquitous network access [3] attract more and more users toward cloud services for data storage and sharing, as shown in Fig. 1. According to Cisco [4], 2 billion individuals will be using cloud storage services by 2018 [5].

Fig. 1
figure 1

The framework of cloud storage

Despite the benefits of cloud storage managed by cloud service providers (CSPs), the cloud storage service brings new threats and challenges to the security of user data in terms of integrity, availability, secure sharing and utilization. For instance, the cloud infrastructure may suffer from inevitable failures and a wide range of external attacks, which can lead to data corruption (e.g., some data stored on Amazon’s cloud were permanently destroyed in a crash disaster), and the CSP may hide such data loss to avoid a bad reputation. Even worse, a CSP may deliberately discard some sensitive data that are not frequently visited by users for financial reasons. Because users lose physical control of their data on remote cloud servers, they have no way to know whether their outsourced data are stored correctly. Hence, how to guarantee data integrity on a remote cloud server becomes a key issue for cloud storage services.

To address the aforementioned security issue, various auditing schemes have been proposed in recent years. An auditing scheme helps an auditor verify the integrity of outsourced data periodically without downloading the entire data file from the remote cloud server. In an auditing scheme, the original data file is considered to consist of n data blocks. Typically, a user creates a corresponding signature or tag for each data block, and then outsources all data signatures to a cloud server. During the auditing procedure, an authorized auditor can run a challenge–response protocol to verify the data integrity. Since the auditing proof returned by the cloud server is generated by aggregating the sampled data blocks and their corresponding signatures, it is essential for each data signature to be unforgeable. Likewise, the value of the aggregated data blocks, as part of the auditing proof, should also be unforgeable.

Generally, the user is limited by the available computational resources, so an authorized third party [known as a third-party auditor (TPA)] is introduced to audit the data integrity instead of the original user. As a third party, the TPA should not gain any knowledge of the data content during the auditing procedure. In addition, several users can easily form a group for information sharing (e.g., an academic debate forum, where many researchers upload files and share information with each other). In this case, the researchers in the group would not want their identities leaked, for security reasons.

If a public auditing scheme cannot guarantee users’ identity privacy, the TPA may learn, after performing several audits, which blocks are more important and which user is more important within the group. The frequently audited data may then suffer more attacks and attract more attackers, e.g., via address tracing. Thus, an auditing scheme for shared data must protect the identity anonymity of group users from both the TPA and the cloud server.

However, absolute anonymity may bring other integrity and security issues to the data owner or users. For example, a malicious user may damage or modify part of the shared data, while absolute anonymity provides a protective umbrella that shields that user from blame. In this situation, absolute anonymity can lead to unpredictable financial or reputational losses. Therefore, both identity anonymity and traceability are important for a public auditing scheme for shared data.

In recent years, although many public auditing schemes for shared data have been proposed, they either suffer from various security attacks or incur high computation/communication overheads, making them unsuitable for deployment in practical applications. Therefore, designing a secure and efficient public auditing scheme for shared data remains a significant challenge. In this work, we investigate how to achieve such a scheme for shared data on a remote cloud server.

1.1 Our contributions

Based on the above description, we note that the unforgeability of auditing proofs and the privacy of both the shared data and user identities are significant for group users in cloud auditing. To address these security issues, we present a new efficient public auditing scheme for shared data on a remote cloud server based on Tzu-Hsin Ho’s group signature scheme [6]. We summarize our research contributions as follows:

  • First, we propose a new public auditing scheme for shared data with high efficiency. In the proposed scheme, we use a group signature and a homomorphic message authentication code (MAC) to create signatures for the data blocks of users in a group. We then authorize a TPA to support public data auditing by sharing a secret key pair with it.

  • Second, we analyze the security of the proposed scheme against signature forgery attack and proof forgery attack. Moreover, we prove that the proposed scheme can preserve data privacy, protect identity anonymity of users and provide conditional identity traceability.

  • Finally, we evaluate the performance of the proposed scheme and demonstrate its efficiency.

1.2 Organization

The remainder of this paper is structured as follows: In Sect. 2, we briefly review some previous related works. In Sect. 3, we present some background related to the proposed scheme. In Sect. 4, we describe our proposed scheme. In Sect. 5, we analyze the proposed scheme in terms of its correctness and security. Then, in the following section, we evaluate the efficiency of the proposed scheme. Finally, we make some concluding remarks.

2 Related work

Data integrity verification is of vital importance for verifying the storage correctness of outsourced data on a remote cloud server. Ateniese et al. [7] first proposed the provable data possession (PDP) concept to avoid downloading entire files from a remote cloud server while auditing data integrity. In their scheme, homomorphic tags are utilized to aggregate the sampled blocks and corresponding signatures in order to reduce communication costs. Later, Shacham and Waters proposed the compact proofs of retrievability (CPoR) concept [8] with a concrete construction based on the Boneh–Lynn–Shacham (BLS) signature [9]; the scheme is provably secure in the random oracle model [10]. Since then, several other public auditing schemes [11,12,13,14,15] have been proposed to meet diverse application requirements.

To reduce the certificate management overhead in PKI-based auditing schemes [7, 16, 17], identity-based cryptography has been adopted in many public auditing schemes [18, 19]. For instance, Yu et al. [20] presented a generic construction of an identity-based provable data possession (ID-PDP) scheme, and described a concrete ID-based auditing scheme as well. Wang et al. [21] proposed an identity-based proxy-oriented data uploading and auditing scheme, which is specialized for business managers who are unable to upload or audit the outsourced data in person. Li et al. [22] presented a novel auditing scheme that uses biometrics as a fuzzy identity to address the complex key management issue.

Although some organizations require strong security in key management, identity-based cryptography is not the best choice for checking the integrity of outsourced data because of the key escrow problem: the private key of a user is known to a third party (in particular, the key generation center, KGC), so the KGC can impersonate the user, act unexpectedly on the user’s behalf and damage the user’s interests. Therefore, certificateless public auditing schemes have been proposed. Wang et al. [23] first presented a secure certificateless public auditing scheme for data integrity verification. Zhang et al. [24] were the first to propose a certificateless public integrity verification scheme that simultaneously considers security against a malicious TPA. He et al. proposed a certificateless public auditing scheme for cloud-assisted wireless body area networks (WBANs) [25], which is proved to be more efficient than Wang et al.’s scheme [23].

Besides secure key management, data privacy preservation is also important for public data integrity verification. Early on, most public auditing schemes suffered from a security weakness owing to the linear combination of sampled data blocks used as one part of the auditing proof, e.g., \( \sum _{i \in L}v_i m_i \), where \( v_i \) is a random integer and \( m_i \) is a sampled block. For example, in [26], Yang et al. analyzed the issue of data privacy in detail and pointed out that the Panda scheme [16] is vulnerable to the proof forgery attack and cannot preserve data privacy against a curious TPA. They further proposed an improved scheme that addresses both weaknesses by using the random masking technique, that is, by adding a random element to each aggregation (i.e., \( \sum _{i \in L}v_i m_i + \eta \)). Similarly, Xu et al. analyzed the security of Tang and Zhang’s public auditing scheme [27] in [28], and pointed out that it suffers from the same vulnerabilities as Panda. Recently, Li et al. [29] proposed a privacy-preserving scheme specialized for low-performance end devices in a cloud environment. Later, Shen et al. introduced a third-party entity into their privacy-preserving public auditing scheme. Since this third party performs almost all the computation operations instead of the group users, their scheme is very efficient in computation cost [30]. However, it incurs higher communication costs than most previously proposed auditing schemes because of the additional communication between the user and the third-party entity.

When it comes to privacy preservation, identity anonymity is also an essential property for cloud auditing, especially when many group users share their data on a remote cloud server. Luo et al. designed an efficient integrity auditing scheme for shared data that achieves secure user revocation, using the concept of Shamir secret sharing to address the weakness of proxy re-signatures [31]. Wang et al. proposed an integrity checking and sharing scheme for remote data in a cloud-based health Internet of Things (IoT) environment [32]. However, neither scheme considers identity privacy. In 2015, He et al. proposed a public auditing scheme for shared data aimed at preserving identity privacy [33]. Their scheme converts each user’s signature into the TPA’s signature with a re-signing key, which perfectly protects the identity privacy of users from the TPA, but the scheme cannot provide identity traceability.

In the case of auditing shared data, Wang et al. proposed several public auditing schemes, such as Panda [16], Oruta [34] and Knox [35]. Panda performs efficient user revocation based on proxy re-signatures, but provides neither data privacy nor user identity anonymity. Oruta utilizes ring signatures and homomorphic authenticators to achieve public auditing and identity anonymity, but its communication and computation costs increase linearly with the number of group users. To improve efficiency, Knox uses group signatures; it assures user identity anonymity from the TPA and also supports the traceability of user identities by the group manager. However, the communication and computational efficiency of Knox can be further improved. Although many public auditing schemes with various functions and security features have been proposed in recent years, none of them satisfies all of the above security requirements (i.e., public auditing, data sharing, data privacy, user identity anonymity, identity traceability and high performance). Therefore, it is important to design a novel auditing scheme that provides all of the above practical features.

3 Background

3.1 Preliminaries

  1. (1)

    Bilinear maps

Let \( G_{1} \), \( G_{2} \) and \( G_{T} \) denote three cyclic groups of the same prime order q. Let \( g_{1} \) and \( g_{2} \) be generators of \( G_{1} \) and \( G_{2} \), respectively. An efficiently computable bilinear map is denoted as \( e: G_{1} \times G_{2} \rightarrow G_{T} \) and has the following two properties:

  • Bilinearity: Given \( \forall a,b \in Z_{q} \) and \( \forall u\in G_{1} \), \( \forall v \in G_{2} \), the equation \( e(u^a, v^b) = e(u^a,v)^b = e(u, v^b)^a = e(u, v)^{ab} \) holds.

  • Non-degeneracy: The inequality \( e(g_1,g_2)\ne 1_{G_{T}} \) holds.

  1. (2)

    Complexity assumptions

The security of the proposed scheme relies on the following three assumptions:

  • Decisional Diffie–Hellman (DDH) assumption

    Given four elements \( g, g^a, g^b, g^c \in G \), there is no polynomial-time algorithm that decides whether \( g^c = g^{ab} \), where a, b, c are randomly chosen from \( Z_q \).

  • Computational Diffie–Hellman (CDH) assumption

    Given three elements \( g, g^a, g^b \in G \), there is no polynomial-time algorithm that computes the value of \( g^{ab} \) with non-negligible probability, where a, b are randomly chosen from \( Z_q \).

  • Elliptic curve discrete logarithm (ECDL) Assumption

Given two elements \( g, g^a \in G \), it is computationally infeasible to compute the value of a, where a is randomly chosen from \( Z_q \) and g is a random point in the group G.

  1. (3)

    Elliptic ElGamal encryption

The proposed scheme uses an efficient group signature to achieve data integrity auditing, while the Elliptic ElGamal encryption is embedded into this group signature to protect the private keys of group users. The encryption process can be described as follows:

Assume that G is an elliptic curve group of large prime order q, and g is a generator of G. Let a random element \( x \in Z_q \) be the private key, and \( pk = g^x \in G \) the public key. Given a message \( m \in G \), the encryptor first selects a temporary key \( y \in Z_q \) at random, then computes \( C_1 = g^y \in G \) and \( C_2 = m \cdot pk^y \), and finally returns \( (C_1, C_2) \) as the ciphertext. To decrypt, the decryptor uses the private key x and computes \( C_2 / C_1^x \) to recover the plaintext m. The Elliptic ElGamal encryption is secure if the DDH assumption holds.
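To make the mechanics concrete, the following is a minimal sketch of ElGamal encryption and decryption. It substitutes a toy multiplicative subgroup of \( Z_p^* \) for the elliptic curve group G; the parameters p, q, g and the message encoding are our illustrative assumptions, not the scheme’s actual instantiation:

```python
# Illustrative sketch of ElGamal encryption/decryption, using a small
# multiplicative group mod p in place of the elliptic curve group G.
# p = 2q + 1 with q prime; g = 4 generates the order-q subgroup of Z_p^*.
import secrets

p, q, g = 467, 233, 4

def keygen():
    x = secrets.randbelow(q - 1) + 1           # private key x in [1, q-1]
    return x, pow(g, x, p)                     # (x, pk = g^x)

def encrypt(pk, m):
    y = secrets.randbelow(q - 1) + 1           # temporary key y
    return pow(g, y, p), (m * pow(pk, y, p)) % p   # (C1, C2)

def decrypt(x, C1, C2):
    # m = C2 / C1^x, computed with the modular inverse of C1^x
    return (C2 * pow(pow(C1, x, p), -1, p)) % p

x, pk = keygen()
m = pow(g, 42, p)                              # a message encoded as a group element
C1, C2 = encrypt(pk, m)
assert decrypt(x, C1, C2) == m
```

The toy group here has only 233 elements and offers no security; a real deployment would use an elliptic curve group of cryptographic size.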

3.2 System model

The system model of the proposed scheme consists of four entities, as illustrated in Fig. 2:

Fig. 2
figure 2

The proposed system model

  • Group manager The group manager is a special group user who is trusted by all other group users and is responsible for generating a distinct signing key pair for each group user.

  • Group users The group users are responsible for generating signatures of data blocks, and then outsourcing them to a cloud server.

  • Third-party auditor The third-party auditor (TPA) is authorized by the group manager and users to check the storage correctness of shared data on a remote cloud server.

  • Cloud server The cloud server, which is managed by a semi-trusted cloud service provider (CSP), stores and manages large amounts of data.

3.3 Design goals

To achieve a secure and efficient auditing process, the scheme should satisfy some basic requirements as follows:

  1. (1)

    Correctness The auditing proof should pass the auditor’s verification if and only if the shared data are stored correctly on the remote cloud server.

  2. (2)

    Public auditing To reduce the computation overhead of group users, an authorized third-party auditor is allowed to check the data integrity of shared data.

  3. (3)

    Privacy preservation Since the TPA is honest but curious, the data content should be kept secret from the TPA. That is, even an authorized TPA cannot get any knowledge of the data content from the auditing proofs.

  4. (4)

    Anonymity Given an outsourced file, an adversary (including the TPA and other group users) cannot reveal the identities of the file owners from the signatures attached to each data block.

  5. (5)

    Traceability The group manager can disclose the identity of any group user if that user uploads malicious files.

  6. (6)

    Efficiency The auditor (e.g., a group user or the TPA) can efficiently verify the data integrity without downloading the entire data file, and the computation cost remains constant during the auditing process.

4 The proposed scheme

In this section, we first present an overview of the proposed scheme, and then we describe our design approach.

4.1 Overview

To achieve the design goals listed above for auditing shared data on a remote cloud, we propose a privacy-preserving efficient auditing scheme by utilizing the group signature and the homomorphic MACs presented by Tzu-Hsin Ho [6] and by Agrawal and Boneh [36], respectively. According to [6], the group signature evolved from the Elliptic ElGamal encryption algorithm, which satisfies semantic security under the DDH assumption. Homomorphic MACs are applied to create homomorphic signatures when developing the data auditing mechanism.

On the basis of this group signature, we can keep the identities of the signers of shared data blocks anonymous to the TPA while checking data integrity. As in other group signature schemes, the group manager, which is considered to be the original data owner, can trace the actual identity of each group signer when necessary. Moreover, the proposed scheme preserves the privacy of shared data by utilizing a data masking technique. By making use of the short group signature and homomorphic MACs, the proposed scheme generates a constant-size public key and constant-size signatures, which helps reduce communication and computation costs.

Specifically, the proposed scheme uses eight algorithms: SysInit, GroupSetup, Enroll, GroupSign, ProofGen, ProofVerify, Open and Revocation. All public system parameters are generated by taking a security parameter as input with the SysInit algorithm. The GroupSetup algorithm is executed by the group manager to generate the group key pairs, where the group private key is kept secret and only the manager knows it, while the public key is shared among group users and the authorized TPA. With the Enroll algorithm, a valid group user can be assigned a distinct key pair by the group manager, which is used to sign the data blocks in the GroupSign algorithm. The ProofGen and ProofVerify algorithms are executed to check the storage correctness of shared data on the remote cloud. As for the Open algorithm, the group manager is able to recognize the identity of the signer who uploads illegal files. Furthermore, the group manager can revoke that malicious user by following the Revocation algorithm.

4.2 Detailed design

Some important notations are shown in Table 1. We then describe the detailed design of our proposed scheme as follows:

Table 1 Summary of some important notations

SysInit\(( \lambda )\) Taking as input the security parameter \( \lambda \), the system outputs three cyclic groups \( G_{1}, G_{2}, G_{T} \) of \( \lambda \)-bit prime order q, and lets \( e: G_{1} \times G_{2} \longrightarrow G_{T} \) be an efficient bilinear map. \( g_{1} \) and \( g_{2} \) denote the generators of \( G_{1} \) and \( G_{2} \), respectively. It is worth noting that there is no efficiently computable isomorphism from \( G_{1} \) to \( G_{2} \). There are two hash functions \( h:\{0,1\}^{*} \longrightarrow Z_{q} \) and \( H:\{0,1\}^{*} \longrightarrow G_{1} \), which are modeled as random oracles in the security proof. Then, the system chooses a pseudo-random generator PRG: \( K_{prg} \longrightarrow Z_{q}^{s} \) and a pseudo-random function PRF: \( K_{prf} \times I \longrightarrow Z_{q} \), where \( K_{prg} \) and \( K_{prf} \) are the sets of secret keys for PRG and PRF, respectively, and I is the index table of data blocks. Finally, the system publishes the public parameters \( params = \langle q, G_{1}, G_{2}, G_{T}, e, g_{1}, g_{2}, h, H, PRG, PRF \rangle \)

GroupSetup(params) Given the public parameters params, the group manager executes the algorithm to get group secret keys and public keys as follows:

  • Randomly selects a key pair \( thk = (a, b) \), where \( a \in K_{prg} \) and \( b \in K_{prf} \), and shares the key pair \( thk = (a, b) \) with the group users and the TPA via a secure channel.

  • Chooses four random elements \( d,u,t,\xi \in Z_{q}^{*} \) as the group master key, i.e., \( msk = (d, u, t, \xi ) \), where d, u are used to enroll new group users, t is used to trace the signatures of group users, and \( \xi \) is used to sign data blocks. Then, the group manager sets \( D = g_{1}^{d} \), \( U = g_{2}^{u} \), \( T = g_{1}^{t} \), \( X = g_2^{\xi } \), chooses a random \( \eta \in Z_{q}^{*} \), and forms the group public key \( gpk = (D, U, T, X, \eta ) \).

  • Keeps the group master key \( msk = (d, u, t, \xi ) \) secret, and publishes the public key \( gpk = (D, U, T, X, \eta ) \) among the group users and the authorized TPA.

Enroll\((i, msk)\) Given the index i of a new group user and the group master key msk, the group manager runs this algorithm to distribute a private key to the new user as follows:

  • Randomly selects an element \( x_{i} \in Z_{q}^{*} \), and sets \( Z_{i} = g_{1}^{(d-x_{i})(ux_{i})^{-1}} = g_{1}^{z_{i}} \in G_{1} \), where \( z_{i} \in Z_{q}^{*} \). Note that \( d = x_{i} +z_{i}ux_{i} \).

  • Re-selects another value \( x_{i} \) if the current \( x_{i} \) collides with an existing user’s secret. Otherwise, the group manager records the key pair \( (x_{i}, Z_{i}, \xi ) \) as the i-th user’s signing key in a table called Tab.

  • Sends the secret key \( usk[i] = (x_{i}, Z_{i}, \xi ) \) to the i-th user through a secure channel.

GroupSign\((gpk, thk, usk[i], M)\) Given the group public key \( gpk =(D, U, T, X, \eta ) \), the group private key thk, the user’s private key \( usk[i] = (x_{i}, Z_{i}, \xi ) \) and the file \( M = \{m_{1},m_{2},\ldots ,m_{n}\} \in \{0,1\}^{*} \), where \( m_{j} = \{m_{j1},m_{j2},\ldots ,m_{js}\} \), the user i executes the following steps to calculate the corresponding signature for each data block \( m_{j} \):

  • Selects a random element \( y_{j} \in Z_q^{*} \), and computes \( Q_{j} = e(U,T)^{y_{j}} \), \( C_{j1} = g_{1}^{y_{j}} \) and \( C_{j2} = Z_{i}^{x_{i}}\cdot T^{y_{j}} \), respectively.

  • Computes \( \vec {A} = \{a_{1}, a_{2}, \ldots , a_{s}\} \longleftarrow PRG(a) \) and \( \vec {B} = \{b_{1},b_{2},\ldots ,b_{n}\} \), where \( b_{j} \longleftarrow PRF(b,j) \). The user then calculates \( \pi _{j} = \sum _{l=1}^{s}a_{l}m_{jl}+b_{j} \).

  • Computes \( \tau _{j} = \eta ^{\pi _{j}}h(C_{j1},C_{j2},Q_{j}) \) and \( w_{j} = y_{j}\cdot \tau _{j}+x_{i} \).

  • Computes a tag for the data block as \( \theta _{j} = [H(fn\parallel j)g_{1}^{\pi _{j}}]^{\xi } \), where fn is a distinct random string assigned by the user i as the unique identifier of the file. We note that fn can be known to the TPA for auditing, because the file identifier does not reveal the real content of the data.

  • Outsources the file M with its signatures \( \sigma _{j} = (C_{j1},C_{j2},\tau _{j},w_{j},\theta _{j})_{1 \le j \le n} \) to the remote cloud server.
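The homomorphic MAC computation \( \pi _{j} = \sum _{l=1}^{s}a_{l}m_{jl}+b_{j} \) can be sketched as follows; the HMAC-based stand-ins for PRG and PRF, the key strings and the toy modulus q are our own assumptions for illustration, not the paper’s concrete instantiation:

```python
# Sketch of the homomorphic MAC used in GroupSign:
# pi_j = sum_l a_l * m_jl + b_j (mod q), with A <- PRG(a), b_j <- PRF(b, j).
import hmac, hashlib

q = 2**61 - 1                       # toy prime standing in for the group order

def prf(key: bytes, label) -> int:
    # PRF(b, j): one pseudorandom element of Z_q per input label
    d = hmac.new(key, repr(label).encode(), hashlib.sha256).digest()
    return int.from_bytes(d, "big") % q

def prg(key: bytes, s: int):
    # PRG(a): expand one seed into the coefficient vector A = (a_1, ..., a_s)
    return [prf(key, ("A", l)) for l in range(s)]

def pi_tag(a_key: bytes, b_key: bytes, j: int, block) -> int:
    # pi_j = sum_l a_l * m_jl + b_j  (mod q)
    A = prg(a_key, len(block))
    return (sum(al * ml for al, ml in zip(A, block)) + prf(b_key, j)) % q

thk = (b"key-a", b"key-b")          # the shared key pair thk = (a, b)
tag = pi_tag(thk[0], thk[1], 1, [5, 7, 9])
```

Because \( \pi_j \) is linear in the block sectors, these tags aggregate under the challenge coefficients, which is what the later verification step exploits.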

ProofGen\(( fn, \sigma _j, n) \) This algorithm is an interactive protocol executed between the remote cloud server and the user/TPA for data integrity checking.

If the user cannot check the data integrity in person, he/she can delegate the auditing task to an authorized TPA by sending an auditing request.

Step 1 When receiving the auditing request, the TPA generates an auditing message and sends it to the remote cloud server as follows:

  • Randomly selects a c-element subset L of the set [1, n], where each \( j \in L \) denotes the index of a sampled block and c is the number of data blocks to be audited.

  • For each \( j \in L \), generates a random small integer \( v_{j} \in Z_{q}^{*} \).

  • Sends the auditing message \( chal = \{(j, v_{j}), fn\}_{j \in L} \) to the remote cloud server.

Step 2 Upon receiving the auditing message chal from the user or TPA, the cloud server generates the corresponding proof of data possession with the sampled data blocks as follows:

  • Randomly selects a set of masking elements \( \gamma _{1}, \gamma _{2}, \ldots , \gamma _{s} \), and computes \( \mu _{l} = \sum _{j \in L}v_{j}m_{jl} + \gamma _{l} \) for \( 1 \le l \le s \).

  • Computes the aggregation of block tags as \( \Theta = \prod _{j \in L}\theta _{j}^{v_{j}} \).

  • Sends \( pf = \{\Theta , \vec {\mu }, \varOmega , \vec {\varUpsilon } \} \) as the auditing proof, where \( \vec {\mu } = \{\mu _{l}\}_{1 \le l \le s} \), \( \vec {\varUpsilon } = \{\delta _{l}, \eta _l\}_{1 \le l \le s} \), \( \delta _{l} = g_{1}^{-\gamma _{l}} \), \( \eta _{l} = \eta ^{-\gamma _{l}} \), and \( \varOmega = \{\varphi _{j}\}_{j \in L} \), \( \varphi _{j} = (C_{j1}, C_{j2}, Q_{j}, \tau _{j}, w_{j}) \), where \( Q_j = \tfrac{e(C_{j2},U)\cdot e(g_{1},g_{2})^{w_j}}{e(D \cdot C_{j1}^{\tau _{j}},g_{2})} \) can be precomputed by the cloud server to save the computation cost.
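The index/coefficient bookkeeping of Steps 1 and 2 can be sketched as follows, omitting all group operations; the block contents, the modulus q and the parameters (n, s, c) are toy values of our own choosing:

```python
# Sketch of the ProofGen data flow: the TPA samples a challenge, and the
# cloud masks the aggregated sectors so mu_l alone leaks nothing about them.
import secrets

q = 2**61 - 1
n, s, c = 8, 3, 4                          # n blocks of s sectors; sample c
blocks = [[secrets.randbelow(q) for _ in range(s)] for _ in range(n)]

# Step 1 (TPA): choose the index set L and per-index coefficients v_j
rng = secrets.SystemRandom()
L = rng.sample(range(n), c)
chal = {j: secrets.randbelow(q - 1) + 1 for j in L}

# Step 2 (cloud): blind each aggregated sector sum_j v_j * m_jl with a
# fresh random mask gamma_l before returning it in the proof
gamma = [secrets.randbelow(q) for _ in range(s)]
mu = [(sum(v * blocks[j][l] for j, v in chal.items()) + gamma[l]) % q
      for l in range(s)]
```

In the actual scheme the masks are also committed in the group via \( \delta_l = g_1^{-\gamma_l} \) and \( \eta_l = \eta^{-\gamma_l} \), so the verifier can cancel them without ever learning \( \gamma_l \).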

ProofVerify\((pf, gpk, thk)\) Given the auditing proof pf, the group public key gpk and the private key thk (which is shared only among group members and the authorized TPA), the user/TPA can execute the following steps to check the correctness of data storage:

  • Checks the following equation

    $$\begin{aligned} \prod _{j \in L}Q_{j} {\mathop {=}\limits ^{?}} \dfrac{e(\prod _{j \in L}C_{j2},U)\cdot e(g_{1},g_{2})^{\sum _{j \in L}w_{j}}}{e(D^c \cdot \prod _{j \in L} C_{j1}^{\tau _{j}},g_{2})} \end{aligned}$$
    (1)

    If it does not hold, the TPA aborts the procedure and outputs “0”; otherwise, it continues with the following steps to check Eqs. (2) and (3):

  • Computes \( \vec {A} = \{a_{1}, a_{2}, \ldots , a_{s}\} \longleftarrow PRG(a) \) and \( b_{j}\longleftarrow PRF(b,j) \) for \( j \in L \) by taking the private key thk as input.

  • Computes \( \varpi = \sum _{l=1}^{s}a_{l}\mu _{l} + \sum _{j \in L}v_{j}b_{j} \), and then checks Eqs. (2) and (3):

    $$\begin{aligned} \prod _{j \in L}\tau _{j}^{v_{j}}&{\mathop {=}\limits ^{?}}&\eta ^{\varpi }\cdot \prod _{l=1}^{s}\eta _l^{a_l}\cdot \prod _{j \in L}h(C_{j1},C_{j2},Q_{j})^{v_j} \end{aligned}$$
    (2)
    $$\begin{aligned} e(\Theta ,g_{2})&{\mathop {=}\limits ^{?}}&e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{l=1}^{s}\delta _{l}^{a_{l}}\cdot g_{1}^{\varpi },X\right) \end{aligned}$$
    (3)

If all of the above three equations hold, the user’s data are stored correctly on the remote cloud server and the algorithm outputs “1”; otherwise, the data file is corrupted.

  • Sends the auditing result to the user if the verifier is an authorized TPA.
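Stripping away the pairings and hash terms, the core of the check in Eq. (2) is a linear identity over \( Z_q \): since \( \mu _l = \sum _{j \in L} v_j m_{jl} + \gamma _l \), the verifier’s value \( \varpi \) satisfies \( \sum _{j \in L} v_j \pi _j = \varpi - \sum _{l=1}^{s} a_l \gamma _l \pmod q \). The following sketch checks this with toy values; it is our own simplification, with the PRG/PRF outputs replaced by random lists:

```python
# End-to-end check of the masked homomorphic-MAC identity behind Eq. (2).
import secrets

q = 2**61 - 1
n, s, c = 6, 3, 3
A = [secrets.randbelow(q) for _ in range(s)]          # PRG(a) coefficients
B = [secrets.randbelow(q) for _ in range(n)]          # PRF(b, j) values
M = [[secrets.randbelow(q) for _ in range(s)] for _ in range(n)]

# Signer: pi_j = sum_l a_l * m_jl + b_j
pi = [(sum(A[l] * M[j][l] for l in range(s)) + B[j]) % q for j in range(n)]

# Challenge and masked response
L = secrets.SystemRandom().sample(range(n), c)
v = {j: secrets.randbelow(q - 1) + 1 for j in L}
gamma = [secrets.randbelow(q) for _ in range(s)]      # cloud's masks
mu = [(sum(v[j] * M[j][l] for j in L) + gamma[l]) % q for l in range(s)]

# Verifier: varpi = sum_l a_l * mu_l + sum_j v_j * b_j, and the identity
#   sum_j v_j * pi_j == varpi - sum_l a_l * gamma_l   (mod q)
varpi = (sum(A[l] * mu[l] for l in range(s))
         + sum(v[j] * B[j] for j in L)) % q
lhs = sum(v[j] * pi[j] for j in L) % q
rhs = (varpi - sum(A[l] * gamma[l] for l in range(s))) % q
assert lhs == rhs
```

In the scheme itself, both sides appear as exponents of \( \eta \), with the mask term cancelled by the commitments \( \eta_l^{a_l} \), so the TPA never learns the raw combination \( \sum_j v_j m_{jl} \).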

Open\(( t, M, \sigma )\) Given the tracing key t, the data file M and the corresponding signature set \( \sigma \), the group manager can trace a signature back to the actual group user if that user uploads illegal files.

  • Verifies that the signature set \( \sigma \) is valid on the file M through Eqs. (1) and (2).

  • For an arbitrary \( j \in [1,n] \), computes

    $$\begin{aligned} Z_{i}^{x_{i}} = C_{j2} / C_{j1}^{t} \end{aligned}$$
    (4)

Since the value of \( Z_{i}^{x_{i}} \) combines the two parts of usk[i], the group manager can reveal the actual identity of the user by traversing the table Tab and matching \( Z_{i}^{x_{i}} \) to a recorded user’s key pair.
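The Open step is exactly an ElGamal decryption of the pair \( (C_{j1}, C_{j2}) \) under the tracing key t. A toy sketch, again using a multiplicative group mod p in place of \( G_1 \) (all parameters are illustrative assumptions):

```python
# Sketch of Open: C_j2 / C_j1^t recovers Z_i^{x_i}, which the group manager
# matches against its enrollment table Tab to identify the signer.
import secrets

p, q, g = 467, 233, 4                     # toy group: order-q subgroup of Z_p^*
t = secrets.randbelow(q - 1) + 1          # tracing key
T = pow(g, t, p)                          # public tracing element T = g^t

Zx = pow(g, 99, p)                        # stand-in for the signer's Z_i^{x_i}
y = secrets.randbelow(q - 1) + 1          # signer's per-block randomness
C1, C2 = pow(g, y, p), (Zx * pow(T, y, p)) % p

opened = (C2 * pow(pow(C1, t, p), -1, p)) % p
assert opened == Zx                       # manager maps this value back via Tab
```

This also explains why anonymity holds against everyone else: without t, distinguishing \( Z_i^{x_i} \) inside \( C_{j2} \) is a DDH instance.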

4.3 Support user revocation

Initially, the group manager publishes a revocation list, called RList, to record the identities of revoked/departed users. When a group user i repeatedly uploads junk files, the group manager can expel the user from the group by adding \( x_i \) to RList, where \( x_i \) is the part of the secret key that belongs to the revoked user i. Then, the group manager deletes the tuple \( (x_i, Z_i, \xi ) \) from Tab.

Once a user has been revoked, the cloud server should not respond to future requests from this user for security reasons. To determine whether a user has been revoked, the cloud server can perform the following revocation test:

Given the partial signature \( (C_{j1}, C_{j2}, \tau _j, w_j) \), for each record \( x \in \) RList, the cloud server first computes the value of \( \widetilde{Q}_j = \tfrac{e(C_{j2},U)\cdot e(g_{1},g_{2})^{w_{j}}}{e(D \cdot C_{j1}^{\tau _{j}},g_{2})} \), and then checks the correctness of the following equation:

$$\begin{aligned} e(C_{j2}, U)\cdot e(g_1^x, g_2) {\mathop {=}\limits ^{?}} e(D, g_2)\cdot \widetilde{Q}_j \end{aligned}$$
(5)

If Eq. (5) holds for some \( x = x_{i{'}} \), the cloud server knows that the current data file and the corresponding signatures were provided by the revoked user \( i{'} \), so it can refuse the user’s storage request to save space. According to the specific policies of different groups, the group manager can choose to destroy the data uploaded by the revoked user, or to re-sign the data by utilizing the proxy re-signature technique as in [35].

5 Correctness and security analysis

In this section, we present a detailed analysis of the proposed scheme with respect to the security requirements listed in the design goals. The correctness of the scheme can be verified through straightforward calculations based on the properties of bilinear maps. The following theorems support the security analysis of the proposed scheme.

5.1 Correctness analysis

Theorem 1

The proposed scheme satisfies correctness, i.e., the TPA can check the data integrity in the ProofVerify procedure as long as all entities follow the proposed scheme honestly.

Proof

The group user i honestly generates a set of signatures \( \{\sigma _{j}\}_{1 \le j \le n} \) on the file M with his private key usk[i] , which is generated by the honest group manager. Moreover, the cloud server correctly stores the user’s data and follows the proposed scheme to generate the corresponding auditing proof \( pf = \{\Theta , \mu _{1}, \mu _{2}, \ldots , \mu _{s}, \varOmega , \vec {\varUpsilon } \} \). Given the group public key \( gpk = (D, U, T, X, \eta ) \), the correctness of Eqs. (1), (2) and (3) can be proved as follows:

$$\begin{aligned} \begin{aligned}&e\left( \prod _{j \in L}C_{j2},U\right) \cdot e(g_{1},g_{2})^{\sum _{j \in L}w_{j}} \\&\quad = \prod _{j \in L}[e(Z_{i}^{x_i}\cdot T^{y_j},U)\cdot e(g_1,g_2)^{y_j \cdot \tau _j + x_i}] \\&\quad = \prod _{j \in L} [e(g_1,g_2)^{x_i z_i u} \cdot e(T,U)^{y_j } \cdot e(g_1,g_2)^{y_j \cdot \tau _j + x_i}] \\&\quad = \prod _{j \in L} [e(g_1,g_2)^{x_i z_i u + y_j \cdot \tau _j + x_i} \cdot Q_{j}] \\&\quad = \prod _{j \in L} [e(g_1,g_2)^{d + y_j \cdot \tau _j} \cdot Q_{j}] \\&\quad = \prod _{j \in L} [e(D\cdot C_{j1}^{\tau _j},g_2) \cdot Q_{j}] \\&\quad = e\left( D^c\cdot \prod _{j \in L}C_{j1}^{\tau _j},g_2\right) \cdot \prod _{j \in L}Q_{j} \end{aligned} \end{aligned}$$

The above formula transformation shows that Eq. (1) holds.
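The exponent bookkeeping in this derivation can be spot-checked numerically: writing every pairing as a power of \( e(g_1,g_2) \), both sides reduce to the same element of \( Z_q \) whenever the enrollment relation \( d = x_i + z_i u x_i \) holds. A toy check with illustrative values:

```python
# Numeric spot-check of the exponents in the Eq. (1) derivation, in Z_q:
#   LHS exponent: u*(x*z + t*y) + w,  with w = y*tau + x
#   RHS exponent: d + y*tau + u*t*y,  with Q_j contributing u*t*y
# These agree exactly because d = x + z*u*x.
import secrets

q = 233
u, t, x, z, y, tau = (secrets.randbelow(q - 1) + 1 for _ in range(6))
d = (x + z * u * x) % q                    # enrollment relation d = x_i + z_i*u*x_i
w = (y * tau + x) % q                      # the signature component w_j

lhs = (u * (x * z + t * y) + w) % q        # exponent of e(C_j2, U) * e(g1, g2)^w
rhs = (d + y * tau + u * t * y) % q        # exponent of e(D * C_j1^tau, g2) * Q_j
assert lhs == rhs
```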

Equation (2) can be proved as follows:

$$\begin{aligned} \begin{aligned} \prod _{j \in L}\tau _{j}^{v_j}&= \prod _{j \in L}[\eta ^{\pi _j}h(C_{j1},C_{j2},Q_j)]^{v_j} \\&= \eta ^{\sum _{j \in L}\pi _j v_j}\cdot \prod _{j \in L}h(C_{j1},C_{j2},Q_j)^{v_j} \\&= \eta ^{\sum _{j \in L}(\sum _{l=1}^{s}a_l m_{jl}+b_j) v_j}\cdot \prod _{j \in L}h(C_{j1},C_{j2},Q_j)^{v_j} \\&= \eta ^{\sum _{l=1}^{s}a_l (\mu _l - \gamma _l) + \sum _{j \in L}b_j v_j}\cdot \prod _{j \in L}h(C_{j1},C_{j2},Q_j)^{v_j} \\&= \eta ^{\varpi }\cdot \prod _{l=1}^{s}\eta _l^{a_l} \cdot \prod _{j \in L}h(C_{j1},C_{j2},Q_j)^{v_j} \end{aligned} \end{aligned}$$

From the above two equations, we note that the group signatures for batch auditing satisfy the correctness criterion.

$$\begin{aligned} \begin{aligned} e(\Theta ,g_{2})&= e\left( \prod _{j \in L}\theta _{j}^{v_j},g_2\right) \\&= e\left( \prod _{j \in L}[H(fn\parallel j)g_{1}^{\pi _{j}}]^{\xi v_j}, g_2\right) \\&= e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{j \in L}g_1^{\pi _j v_j},g_2^{\xi }\right) \\&= e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{l=1}^{s}g_1^{a_l \mu _l - a_l \gamma _l}\cdot g_1^{\sum _{j \in L}b_j v_j},X\right) \\&= e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}} \cdot \prod _{l=1}^{s}g_1^{ - a_l \gamma _l}\cdot g_1^{\sum _{l=1}^{s}a_l \mu _l + \sum _{j \in L}b_j v_j},X\right) \\&= e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{l=1}^{s}\delta _{l}^{a_{l}}\cdot g_{1}^{\varpi },X\right) \end{aligned} \end{aligned}$$

Thus, Eq. (3) also satisfies the correctness criterion. This proves that, as long as the data remain intact, the auditing proof generated over the original data can pass the TPA's verification.

Moreover, for Eq. (4) in the Open procedure, we have

$$\begin{aligned} \begin{aligned} C_{j2} / C_{j1}^t&= \left( Z_i^{x_i} \cdot T^{y_j}\right) / C_{j1}^t \\&= \left( Z_i^{x_i} \cdot g_1^{y_j t}\right) / g_1^{y_j t} \\&= Z_i^{x_i} \end{aligned} \end{aligned}$$

Thus, all valid signatures can be opened correctly by the group manager.
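The opening step above is ordinary ElGamal decryption. As a sanity check, here is a minimal Python sketch of the same algebra over a toy multiplicative group modulo a prime (the scheme itself works in the pairing group \( G_1 \); all concrete numbers below are illustrative placeholders, not parameters from the scheme):

```python
# Hypothetical sketch of the Open step C_{j2} / C_{j1}^t, carried out in a
# plain multiplicative group mod p; the unblinding algebra is identical.
p = 2**127 - 1          # a Mersenne prime, used here as a toy modulus
g = 3                   # illustrative base element

t = 123456789                        # group manager's tracing key
T = pow(g, t, p)                     # public value T = g^t

Z_token = pow(g, 987654321, p)       # stands in for the user's token Z_i^{x_i}

# Signing: the user picks a fresh y_j and encrypts the token ElGamal-style.
y_j = 555555
C1 = pow(g, y_j, p)                  # C_{j1} = g^{y_j}
C2 = (Z_token * pow(T, y_j, p)) % p  # C_{j2} = Z_i^{x_i} * T^{y_j}

# Open: the manager divides out C1^t (multiplies by its inverse mod p).
recovered = (C2 * pow(pow(C1, t, p), -1, p)) % p
assert recovered == Z_token
```

The modular inverse via `pow(x, -1, p)` requires Python 3.8 or later.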

The correctness of the value \( \widetilde{Q}_j \) is based on the proof of Eq. (1), and Eq. (5) can be proved as follows:

$$\begin{aligned} \begin{aligned}&e(C_{j2}, U)\cdot e(g_1^x, g_2) \\&\quad = e\left( Z_{i}^{x}\cdot T^{y_{j}},U\right) \cdot e\left( g_1^x, g_2\right) \\&\quad = e\left( g_1^{z_i x},g_2^u\right) \cdot e\left( T^{y_{j}},U\right) \cdot e\left( g_1^x, g_2\right) \\&\quad = e(g_1,g_2)^{z_i x u+x}\cdot e(T^{y_{j}},U) \\&\quad = e(D, g_2)\cdot \widetilde{Q}_j \end{aligned} \end{aligned}$$

\(\square \)

5.2 Security analysis

Theorem 2

It is computationally infeasible for a semi-trusted cloud server or an external adversary to forge a valid signature for any data block if the CDH assumption holds.

Proof

To prove unforgeability, we first assume that an algorithm \( \mathcal {F} \), which can adaptively choose data blocks and identities, is able to generate a forged group signature with advantage \( \varepsilon \), executing at most \( q_H \) hash queries and \( q_s \) sign queries within time qt. Then, we can construct a challenger \( \mathcal {C} \) that breaks the CDH assumption within time \( qt{'} \) and with advantage \( \varepsilon {'} \), where

$$\begin{aligned} \begin{aligned} qt{'}&\le qt + (q_H + q_s +1)\cdot T_{G_1} \\ \varepsilon {'}&\ge \dfrac{\varepsilon }{e\cdot (1+q_s)} \end{aligned} \end{aligned}$$

Let us assume that \( \mathcal {C} \) can compute homomorphic MACs; that is, given a message \( m_j \), \( \mathcal {C} \) can generate a valid homomorphic MAC \( \pi _j \). Then, it interacts with \( \mathcal {F} \) in the following challenge-response game:

Game1 Given \( (g_1, g_1^\alpha , g_1^\beta ) \) as a CDH challenge, \( \mathcal {C} \) uses it to set a target public key \( PK = g_1^\alpha \). Let \( Q_{Hash} \), \( Q_{sign} \) denote the hash queries and sign queries, respectively, for which \( \mathcal {C} \) is supposed to return valid responses. We note that the result of a hash query \( (j,m_{j}) \) in this game is \( H(fn\parallel j)g_1^{\pi _j} \), where \( \pi _j \) is the homomorphic MAC of block \( m_j \).

\( Q_{Hash} \) \( \mathcal {C} \) flips a biased coin \( coin \in \{0,1\} \) with \( Pr[coin = 0] = \tfrac{1}{1+q_s} \), and maintains a list \( L_H \) to record the hash queries. For each hash query \( (j, m_j) \) from \( \mathcal {F} \), \( \mathcal {C} \) first checks whether the entry is already in \( L_H \); if so, it directly outputs the corresponding result to \( \mathcal {F} \). Otherwise, \( \mathcal {C} \) flips the coin. If \( coin = 0 \), it outputs the result \( h_j^* = g_1^{\beta } \) to \( \mathcal {F} \). Otherwise, it chooses a random element \( r \in Z_q^{*} \), outputs the result \( h_j = g_1^r \) to \( \mathcal {F} \), and records the result in \( L_H \), where r is a distinct value for the current query with index j.

\( Q_{sign} \) \( \mathcal {C} \) maintains a list \( L_S \) to record the sign queries. If a sign query \( (j, m_j) \) has been recorded in \( L_S \), \( \mathcal {C} \) outputs the existing result to \( \mathcal {F} \). Otherwise, \( \mathcal {C} \) flips the coin; if \( coin = 0 \), it implies a failure and \( \mathcal {C} \) aborts the game. Otherwise, by searching the hash query list \( L_H \) (we assume that a hash query has already been issued on this block), it outputs the signature \( \theta _j = (g_1^\alpha )^r = h_j^\alpha = g_1^{r\cdot \alpha } \) to \( \mathcal {F} \).

Forgery \( \mathcal {F} \) outputs a forgery \( (j^*, m_j^*, \theta ^*) \). In accordance with the above random oracle queries, if \( coin \ne 0 \) for this query, \( \mathcal {C} \) has failed to guess the target \( (j^*, m_j^*) \) on which \( \mathcal {F} \) implements the forgery attack. Otherwise, the hash result on \( (j^*, m_j^*) \) is \( h^* = g_1^{\beta } \), so the forged signature satisfies \( \theta ^* = (h^*)^\alpha = g_1^{\alpha \cdot \beta } \), which \( \mathcal {C} \) outputs as the solution to the CDH problem.

Thus, \( \mathcal {C} \) can break the CDH assumption if \( \mathcal {F} \) generates a forgery successfully. Additionally, the probability that \( \mathcal {C} \) wins the game is \( (1-Pr[0])^{q_s}\cdot Pr[0]\cdot \varepsilon \ge \tfrac{\varepsilon }{e\cdot (1+q_s)} \), where \( e = \lim _{q_s \rightarrow +\infty }(1+\frac{1}{q_s})^{q_s} \) is a constant. For each hash query or sign query of \( \mathcal {F} \), \( \mathcal {C} \) requires one exponentiation operation in \( G_1 \). The final forgery requires one exponentiation operation as well. So, the whole time for \( \mathcal {C} \) to break the CDH assumption is at most \( qt{'} = qt + (q_H + q_s +1)\cdot T_{G_1} \), where \( T_{G_1} \) is the running time of one exponentiation operation.

However, it is computationally infeasible to solve the CDH problem if the CDH assumption holds. It is also worth mentioning that the probability for an algorithm \( \mathcal {F} \) to break a homomorphic MAC on any data block is 1 / q [36], which is negligible. Therefore, a semi-trusted cloud server or an external adversary cannot forge a signature under the proposed scheme. \(\square \)
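The probability analysis above follows the standard Coron-style counting argument. A quick numeric sketch (assuming a per-query embedding probability of \( 1/(1+q_s) \), and taking \( \varepsilon = 1 \) for simplicity) confirms that the simulator's success probability stays above the claimed \( \varepsilon /(e\cdot (1+q_s)) \) bound:

```python
import math

def reduction_success_prob(q_s: int, p: float) -> float:
    """Probability that the simulator answers all q_s sign queries without
    aborting (each aborts with probability p) and the final forgery lands
    on the single embedded challenge point (probability p)."""
    return (1.0 - p) ** q_s * p

for q_s in (10, 100, 1000, 10000):
    p = 1.0 / (1 + q_s)                  # assumed per-query embedding probability
    bound = 1.0 / (math.e * (1 + q_s))   # claimed bound with epsilon = 1
    assert reduction_success_prob(q_s, p) >= bound
```

The inequality holds because \( (1+1/q_s)^{q_s} < e \), so \( (1-p)^{q_s} = (1+1/q_s)^{-q_s} > 1/e \).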

Theorem 3

It is computationally infeasible for a semi-trusted cloud server or an external adversary to generate an auditing proof with corrupted data that can pass the verification of an auditor under the proposed auditing scheme.

Proof

Based on Theorem 2, it is rather hard for a semi-trusted cloud server to forge a group signature on a data block. Besides, to prove the unforgeability of the auditing proof, we also define a challenge-response game (named Game2) between the TPA and a semi-trusted cloud server. If the cloud server wins Game2 by generating a forged auditing proof with corrupted data blocks that passes the TPA's verification, we can solve the ECDL problem on group \( G_1 \). Similar to the games defined in previous works [37, 38], Game2 can be described as follows:

Game2 The TPA generates an auditing challenge \( chal = \{j, v_j\}_{j \in L} \) on the shared data M and sends it to the cloud server. If none of the sampled data blocks is corrupted, the correct auditing proof is \( pf = \{\Theta , \vec {\mu }, \varOmega , \vec {\varUpsilon } \} \). If we assume that some sampled data blocks have been corrupted, the cloud server has to generate a forged proof \( pf{'} = \{\Theta , \vec {\mu {'}}, \varOmega , \vec {\varUpsilon } \} \), where \( \vec {\mu {'}} = \{\mu _l{'}\}_{1 \le l \le s} \) and \( \mu _{l}{'} = \sum _{j \in L}v_{j}m_{jl}{'} + \gamma _{l} \). Since Theorem 2 has proved that the signatures of data blocks cannot be forged, the values \( \{\Theta , \varOmega \} \) cannot be forgeries. In addition, as some data blocks are corrupted, there is at least one nonzero element in \( \{\varDelta \mu _{l} \}_{1 \le l \le s} \), where \( \varDelta \mu _{l} = \mu _l - \mu _l{'} \).

According to Eq. (3), a correct auditing proof satisfies

$$\begin{aligned} e(\Theta ,g_{2}) = e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{l=1}^{s}\delta _{l}^{a_{l}}\cdot g_{1}^{\varpi },X\right) \end{aligned}$$

Now, assume that the forged auditing proof also passes the TPA's verification; then it satisfies

$$\begin{aligned} e(\Theta ,g_{2}) = e\left( \prod _{j \in L}H(fn\parallel j)^{v_{j}}\cdot \prod _{l=1}^{s}\delta _{l}^{a_{l}}\cdot g_{1}^{\varpi {'}},X\right) \end{aligned}$$

where \( \varpi {'} = \sum _{l=1}^{s}a_{l}\mu _{l}{'} + \sum _{j \in L}v_{j}b_{j} \).

From the above two equations, we can get

$$\begin{aligned} \begin{aligned}&g_1^{\varpi {'}} = g_1^\varpi \Rightarrow g_1^{\sum _{l=1}^{s}a_{l}\mu _{l}{'} + \sum _{j \in L}v_{j}b_{j}} = g_1^{\sum _{l=1}^{s}a_{l}\mu _{l} + \sum _{j \in L}v_{j}b_{j}} \\&g_1^{\sum _{l=1}^{s}a_{l}\mu _{l}{'}} = g_1^{\sum _{l=1}^{s}a_{l}\mu _{l}} \Rightarrow g_1^{\sum _{l=1}^{s}a_{l}\varDelta \mu _{l}} = 1 \\&1 = \prod _{l=1}^{s}(g_1^{a_l})^{\varDelta \mu _{l}} \end{aligned} \end{aligned}$$
(6)

Given two elements \( h, g \in G_1 \), because \( G_1 \) is a cyclic group, there always exists a number x that satisfies \( h = g^x \). Then, we show how to compute the value of x as follows:

Let \( g_1^{a_l} = g^{\kappa _{l}}h^{\rho _l} \), where \( \kappa _{l}, \rho _{l} \) are randomly chosen from \( Z_{q} \). Then, Eq. (6) can be transformed into

$$\begin{aligned} \begin{aligned} 1&= \prod _{l=1}^{s}(g_1^{a_l})^{\varDelta \mu _{l}} = \prod _{l=1}^{s}(g^{\kappa _l}h^{\rho _{l}})^{\varDelta \mu _{l}} \\&= g^{\sum _{l=1}^{s}\kappa _l\cdot \varDelta \mu _{l}}h^{\sum _{l=1}^{s}\rho _l\cdot \varDelta \mu _{l}} \\&= g^{\sum _{l=1}^{s}\kappa _l\cdot \varDelta \mu _{l}+x\cdot \sum _{l=1}^{s}\rho _l\cdot \varDelta \mu _{l}} \end{aligned} \end{aligned}$$

We can get the following result:

$$\begin{aligned} x = -\dfrac{\sum _{l=1}^{s}\kappa _l\cdot \varDelta \mu _{l}}{\sum _{l=1}^{s}\rho _l\cdot \varDelta \mu _{l}} \end{aligned}$$

Since at least one \( \varDelta \mu _{l} \) is nonzero and \( \rho _{l} \) is a random element of \( Z_q \), the probability that \( \sum _{l=1}^{s}\rho _l\cdot \varDelta \mu _{l} = 0 \) is 1 / q, which is negligible. Thus, the value of x obtained above is meaningful, and we can therefore conclude that if the cloud server wins Game2 by passing the verification with a forged proof, we can solve the ECDL problem with probability \( 1 - 1/q \), which contradicts the ECDL assumption. Therefore, the semi-trusted cloud server cannot forge an auditing proof that passes the TPA's verification when the data blocks are not stored correctly under the proposed scheme. \(\square \)
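The extraction step can be checked concretely. Writing \( g_1^{a_l} = g^{\kappa _l}h^{\rho _l} \) with \( h = g^x \) is equivalent to \( a_l \equiv \kappa _l + x\rho _l \pmod q \), so any nonzero vector \( \varDelta \mu \) with \( \sum _l a_l \varDelta \mu _l \equiv 0 \pmod q \) reveals x. A small Python sketch over \( Z_q \) (the modulus, seed and vector sizes are arbitrary illustrative choices):

```python
import random

q = 2**61 - 1          # a prime, standing in for the group order q

random.seed(7)
x = random.randrange(1, q)          # the unknown discrete log, h = g^x

s = 4
kappa = [random.randrange(q) for _ in range(s)]
rho = [random.randrange(1, q) for _ in range(s)]
# Setting g_1^{a_l} = g^{kappa_l} h^{rho_l} means a_l = kappa_l + x*rho_l (mod q)
a = [(kappa[l] + x * rho[l]) % q for l in range(s)]

# A forged proof that verifies forces sum_l a_l * dmu_l = 0 (mod q);
# construct one such nonzero vector dmu for the demonstration:
dmu = [a[1], (-a[0]) % q] + [0] * (s - 2)
assert sum(a[l] * dmu[l] for l in range(s)) % q == 0

# Extraction, as in the proof: x = -sum(kappa*dmu) / sum(rho*dmu) (mod q)
num = sum(kappa[l] * dmu[l] for l in range(s)) % q
den = sum(rho[l] * dmu[l] for l in range(s)) % q
assert den != 0  # the denominator vanishes only with probability 1/q
x_recovered = (-num * pow(den, -1, q)) % q
assert x_recovered == x
```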

Theorem 4

The TPA cannot get any knowledge of the original data from the auditing proof under the proposed scheme if the ECDL assumption holds.

Proof

The authorized but curious TPA can only obtain three kinds of information: a shared key pair \( a \in K_{prg} \), \( b \in K_{prf} \); the auditing challenge \( \{j, v_j\}_{j \in L} \); and the corresponding proof \( pf = \{\Theta , \vec {\mu }, \varOmega , \vec {\varUpsilon }\} \). As the shared key pair and the auditing challenge do not contain any information about the original data, we only need to prove that the auditing proof does not leak data privacy, based on the ECDL assumption, as follows:

For the element \( \varOmega = \{C_{j1}, C_{j2}, Q_j, \tau _j, w_j\}_{j\in L} \in pf \), only the elements \( \tau _j = \eta ^{\pi _j}\cdot h(C_{j1}, C_{j2}, Q_j) \) and \( w_j = y_j \cdot \tau _j + x_i \) refer to the original data, where \( \pi _j = \sum _{l=1}^s a_lm_{jl}+b_j \). Although the TPA can get the values of \( a_l, b_j \) and \( h(C_{j1}, C_{j2}, Q_j) \), it still cannot compute the value of \( \pi _j \) if the ECDL assumption holds. Thus, it is not able to get \( m_{jl} \) from \( \tau _j \). As the TPA cannot get the original data from \( \tau _j \), it also cannot obtain the value of \( m_{jl} \) from \( w_j \), which is further blinded by the secret keys \( x_i \) and \( y_j \).

For the element \( \vec {\mu } = \{\mu _{l}\}_{1 \le l \le s} \in pf \), we have \( \mu _l = \sum _{j \in L} v_jm_{jl} + \gamma _l \). As shown in previous work (i.e., [25, 30, 39]), the TPA could recover the original data from \( \sum _{j \in L} v_jm_{jl} \), whereas we blind it with a random element \( \gamma _l \in Z_q \) in the proposed scheme. Although the TPA can get \( \vec {\varUpsilon } = \{\delta _{l}, \eta _l\}_{1 \le l \le s} \), it cannot get the value of \( \gamma _l \) from \( g_1^{\gamma _l} \) or \( \eta ^{\gamma _l} \) if the ECDL assumption holds.

For the element \( \Theta \in pf \), the formula \( \Theta = \prod _{j \in L} \theta _j^{v_j} \) can be transformed as follows:

$$\begin{aligned} \begin{aligned} \Theta&= \prod _{j \in L}H(fn\parallel j)^{v_j \xi }\cdot \prod _{j \in L}g_1^{\pi _j v_j \xi }\\&= \prod _{j \in L}H(fn\parallel j)^{v_j \xi }\cdot \prod _{j \in L}g_1^{(\sum _{l=1}^{s}a_lm_{jl}+b_j) v_j \xi }\\&= \prod _{j \in L}H(fn\parallel j)^{v_j \xi }\cdot \prod _{j \in L}g_1^{b_j v_j \xi }\cdot \prod _{j \in L}g_1^{(\sum _{l=1}^{s}a_lm_{jl}) v_j \xi }\\&= \prod _{j \in L}H(fn\parallel j)^{v_j \xi }\cdot \prod _{j \in L}g_1^{b_j v_j \xi }\cdot g_1^{ \xi \sum _{j \in L}(v_j\sum _{l=1}^{s}a_lm_{jl})} \end{aligned} \end{aligned}$$

From the above equation, the value of \( g_1^{ \xi \sum _{j \in L}(v_j\sum _{l=1}^{s}a_lm_{jl})} \) is blinded by \( \prod _{j \in L}H(fn\parallel j)^{v_j \xi } \) and \( \prod _{j \in L}g_1^{b_j v_j \xi } \). Given \( X = g_2^{ \xi } \), \( H_{j} \), \( v_j \) and \( b_j \), it is computationally infeasible for the TPA to compute these blinding factors, based on the CDH assumption, especially as there is no efficiently computable isomorphism from \( G_{2} \) to \( G_{1} \). Besides, the TPA still cannot get the value of \( \sum _{l=1}^{s}a_lm_{jl} \) from \( g_1^{ \xi \sum _{j \in L}(v_j\sum _{l=1}^{s}a_lm_{jl})} \) under the ECDL assumption. Therefore, data privacy is guaranteed under the proposed scheme. \(\square \)

Theorem 5

The TPA and cloud server cannot know the group user’s identity under the proposed scheme if ElGamal encryption is secure.

Proof

Since the proposed auditing scheme is developed from a simple-yet-efficient group signature scheme [6], and the corresponding block signature \( (\Theta , \vec {\mu }, \vec {\varUpsilon }) \) does not contain any identity information, the proof of anonymity follows that described in [6]. Here, we show that if an algorithm \( \mathcal {F} \) can break the identity anonymity of the proposed auditing scheme, there exists an algorithm \( \mathcal {B} \) that can break ElGamal encryption. \( \mathcal {B} \) first selects a random element \( t \in Z_q^* \) to compute \( T = g_1^t \) as a public key of ElGamal encryption. Then, \( \mathcal {B} \) computes \( D = g_1^d \), \( U = g_2^u \), where d, u are randomly selected from \( Z_q^* \) as the group private key gsk. For each user i, \( \mathcal {B} \) computes \( Z_i = g_1^{z_i} = g_1^{(d-x_i)(ux_i)^{-1}} \), where \( x_i \) is a random element in \( Z_q^* \). Finally, \( \mathcal {B} \) stores the distinct pair \( (x_i, Z_i) \) as the i-th user's private key, and outputs the group public key \( gpk = (D, U, T, X, \eta ) \) to \( \mathcal {F} \), where \( \eta \in Z_q^* \).

Let us assume that \( \mathcal {F} \) can query \( \mathcal {B} \) about the random hash oracle \( h(\cdot ) \) at any time, and \( \mathcal {B} \) holds a list List of these answers. If \( \mathcal {F} \) issues a repeated query, \( \mathcal {B} \) directly responds with the existing answer in List. Otherwise, it responds with a random element chosen from \( Z_{q} \) and stores the answer in List for repeated queries.

Now, \( \mathcal {F} \) sends two indices, \( i_0 \) and \( i_1 \), and a homomorphic MAC \( \pi \) to \( \mathcal {B} \) as an anonymity challenge. Then, \( \mathcal {B} \) sends the two values \( Z_{i_0}^{x_{i_0}} \) and \( Z_{i_1}^{x_{i_1}} \) to \( \mathcal {C} \) as an indistinguishability challenge, where \( \mathcal {C} \) denotes an ElGamal encryption challenger. Upon receiving the challenge, \( \mathcal {C} \) chooses one of the two values to encrypt, i.e., \( C_{j1} = g_1^{y_{j}}, C_{j2} = Z_{i_b}^{x_{i_b}} \cdot T^{y_j} \), where b is a bit in \( \{0,1\} \) and \( y_j \in Z_q^* \).

\( \mathcal {B} \) then determines which value was encrypted by \( \mathcal {C} \) as follows: it chooses two elements \( \tau _j, w_j \in Z_q \) and computes \( Q_j = \tfrac{e(C_{j2},U)\cdot e(g_{1},g_{2})^{w_{j}}}{e(D \cdot C_{j1}^{\tau _{j}},g_{2})} \), where \( \tau _j \) is a multiple of \( \eta ^{\pi _j} \). For the tuple \( (\eta , \tau _j, C_{j1}, C_{j2}, Q_j) \), if the random hash oracle \( h(\cdot ) \) has already been set to some other value, there is a collision with the one-way hash function, which implies a failure.

Since the ciphertext of \( Z_{i_b}^{x_{i_b}} \) is embedded into a valid signature \( \sigma \) of the data block by group user \( i_b \), \( \mathcal {B} \) can respond to its indistinguishability challenge by forwarding \( \mathcal {F} \)'s response. In other words, if \( \mathcal {F} \) can break the identity anonymity of the proposed scheme, \( \mathcal {B} \) can break ElGamal encryption, which contradicts the semantic security of ElGamal encryption.

Therefore, the identity of any group user is anonymous under the proposed auditing scheme. \(\square \)

Theorem 6

The group manager can trace the identity of the group user by using the signatures on each data block.

Proof

Given a data block m, the corresponding signature should be \( (C_1, C_2, \tau , w, \theta ) \) if user i follows the GroupSign procedure correctly. As the group manager owns the tracing key t, it can compute the value \( Z_i^{x_i} = C_2 / C_1^t \). By looking this value up in the table Tab, which records each group user's value \( Z_i^{x_i} \), the group manager can reveal the identity of the group user who signed the data block m. \(\square \)

5.3 Security comparison

In order to highlight the security benefits of our proposed scheme, we compare it with a few recently proposed approaches, namely Wang et al.'s scheme [35], Yuan et al.'s scheme [40] and Fu et al.'s scheme [41]. For the sake of simplicity, let \( SR_1 \), \( SR_2 \), \( SR_3 \), \( SR_4 \), \( SR_5 \) and \( SR_6 \) represent the security requirements of public auditing, authorized auditing, data privacy, identity privacy, identity traceability and user revocation, respectively.

Table 2 shows that none of the previously proposed related schemes protects data privacy, because the curious TPA can recover the data content from the value \( \mu = \sum _{j \in L}v_j m_j \). Owing to the use of group signatures, the schemes in [35, 41] and our proposed scheme achieve identity privacy and traceability, but the scheme in [35] cannot provide user revocation. Although Yuan et al.'s scheme [40] provides user revocation, it cannot provide identity privacy and traceability for shared data in the cloud.

Table 2 Security comparisons of our proposed scheme and related schemes

6 Performance evaluation

In this section, we first analyze the performance of our proposed scheme in terms of computation and communication costs based on asymmetric (Type-3) bilinear map groups [42]. Then, we demonstrate its efficiency through practical experiments. We also compare the performance of our proposed scheme with another recent scheme [41]. According to [43] and Table 3 of [44], Type-3 bilinear map groups are more efficient than Type-1 groups for most mathematical operations at the 80-bit security level, so the following comparison results are based on Type-3 bilinear map groups at that security level.

6.1 Performance analysis

For the sake of simplicity, we employ some notations to describe the mathematical operations used in Fu et al.'s scheme [41] and ours. They are defined as follows: \( E_{G_1}, E_{G_2}, E_{G_T} \) and \( E_{Z_q} \) denote one exponentiation operation in group \( G_1, G_2, G_T \) and \( Z_q \), respectively; \( M_{G_1}, M_{G_T}, M_{Z_q} \) denote one multiplication operation in group \( G_1, G_T \) and \( Z_q \), respectively; \( D_{Z_q}, D_{G_T} \) represent one division operation in group \( Z_q \) and \( G_T \), respectively; \( A_{Z_q} \) denotes one addition operation in group \( Z_q \); Hs and hs denote one hash-to-point operation on group \( G_1 \) and one hash-to-integer operation in group \( Z_q \), respectively; and \( P_{1,2} \) represents one bilinear pairing operation. Assume that the number of sampled blocks to verify is c, and each block is split into s sectors. (The advantages of partitioning data blocks are given in [39].) Since the data blocks in Fu et al.'s scheme are not split into sectors, we set \( s = 1 \) to obtain fair comparison results between Fu et al.'s scheme and ours.

(1) Analysis on communication cost:

Since uploading the data file and the corresponding verification information is a one-time operation, we do not analyze the communication cost between each group user and the remote cloud server. However, the communication cost incurred by the auditing challenge and proof cannot be ignored.

For every ProofGen phase, the auditing challenge \( chal = \{(j, v_j), fn\} \) costs \( c(|j|+|q|)+ |fn| \) bits, where |j| is the size of a block index j, |q| is the size of an element chosen from group \( Z_q \), and |fn| is the size of the file identifier. The auditing proof \( pf = \{\Theta , \vec {\mu }, \varOmega , \vec {\varUpsilon } \} \) consumes \( (2c+1)|G_1| + c|G_T| + (2c+2s)|q| \) bits, where \( |G_1|, |G_T| \) denote the sizes of an element in group \( G_1 \) and \( G_T \), respectively. Thus, with \( s = 1 \), the total communication cost is \( (2c+1)|G_1| + c|G_T| + (3c+2)|q| + c|j| + |fn| \) bits. The total communication cost of Fu et al.'s scheme [41] is \( |G_1| + (16c+1)|q| + c(|j| + |id_j|) \) bits, so the communication cost of their scheme is higher than ours by \( (13c-1)|q| - 2c|G_1| - c|G_T| + (c-1)|id_j| \) bits (where |fn| is equal to \( |id_j| \) and \( c \ge 100 \) in practical experiments; the representation sizes of the mentioned elements can be found in [44]). Our proposed scheme is therefore much more efficient than Fu et al.'s scheme in terms of communication cost.
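The cost accounting above can be captured in a short helper. The following Python sketch uses illustrative element sizes (e.g., roughly 160-bit compressed \( G_1 \) points and \( |G_T| \approx 960 \) bits at the 80-bit level; the exact figures depend on the curve and encoding discussed in [44]) and checks that the challenge-plus-proof total matches the closed form given above for \( s = 1 \):

```python
def proof_and_challenge_bits(c, s, G1=160, GT=960, q=160, j=32, fn=32):
    """Total auditing communication (bits) for the proposed scheme.
    Element sizes are illustrative assumptions, not curve-exact values."""
    challenge = c * (j + q) + fn                       # chal = {(j, v_j), fn}
    proof = (2 * c + 1) * G1 + c * GT + (2 * c + 2 * s) * q
    return challenge + proof

# For s = 1 this matches the closed form stated in the text:
c, G1, GT, q, j, fn = 460, 160, 960, 160, 32, 32
total = proof_and_challenge_bits(c, 1, G1, GT, q, j, fn)
assert total == (2 * c + 1) * G1 + c * GT + (3 * c + 2) * q + c * j + fn
```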

(2) Analysis on computation cost:

For our proposed scheme, the computation cost is mainly caused by the ProofGen phase and the ProofVerify phase. To create an auditing proof, the computational operations of the cloud are \( (c+s)E_{G_1} +cM_{G_1} + sE_{Z_q} + csM_{Z_q} + (cs+s)A_{Z_q} \). After receiving the cloud’s proof, the TPA has to execute \( E_{G_T} + (c+1)M_{G_T} + D_{G_T} + (2c+s+2)E_{G_1} + (3c+s+3)M_{G_1} + (2c+s+1)E_{Z_q} + (3c+2s+2)M_{Z_q} + (c+s)A_{Z_q} + 4P_{1,2} + c(hs+Hs) \) computational operations to finish the auditing proof verification, where \( s = 1 \).

For Fu et al.'s scheme [41], the cloud only needs \( c(E_{G_1} + M_{G_1} + M_{Z_q} + A_{Z_q}) \) operations to generate an auditing proof. To verify it, the TPA has to execute \( (2c+1)E_{G_1} + (21c+2)M_{G_1} + (22c+2)E_{Z_q} + (11c+2)M_{Z_q} + (3c)D_{Z_q} + (7c)A_{Z_q} + 2P_{1,2} + c(2hs+Hs) \) operations. As we can see, Fu et al.'s scheme involves many exponentiation and division operations in \( Z_q \) during the auditing phase, which cause high computation overhead.

Based on this comparison, at the cloud's side, the computation cost of our proposed scheme is higher than that of Fu et al.'s by \( E_{G_1} + E_{Z_q} + A_{Z_q} \), which is used to blind the sampled blocks in order to protect data privacy. As a matter of fact, these additional operations are insignificant for a cloud because of its strong computing capability. However, at the TPA's side, the computation cost of our proposed scheme is lower than that of Fu et al.'s by \( (20c)E_{Z_q} + (18c-2)M_{G_1} + (8c-2)M_{Z_q} + (3c)D_{Z_q} + (6c-1)A_{Z_q} - D_{G_T} - (c+1)M_{G_T} - E_{G_T} - 2E_{G_1} - 2P_{1,2} + c\cdot hs \). Therefore, our proposed scheme is slightly more efficient than Fu et al.'s scheme in theory, as we demonstrate in the following practical experiments.

(3) Analysis on signature size:

The efficiency of our proposed scheme rests on the efficiency of the group signature, which only contains five elements for each data block, i.e., \( \sigma _{j} = (C_{j1},C_{j2},\tau _{j},\omega _{j},\theta _{j}) \). Thus, the signature size is \( n(3|G_1|+2|Z_q|) \) if there are n data blocks in a file. For Fu et al.'s scheme [41], a data block's signature consists of fifteen elements, and it costs \( n(|G_1|+14|Z_q|) \) bits to create the signatures for a whole file. Thus, on the premise of \( s = 1 \), the signature size of Fu et al.'s scheme is larger than ours by \( n(12|Z_q|-2|G_1|) \) for the same file. In fact, by referring to [39], the greater the value of s, the smaller the number of blocks n, which means that less bandwidth is needed to transfer the signatures to a remote cloud server and less time is needed to generate the signatures for a file.
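A quick Python sketch of this signature-size comparison (the element sizes are illustrative 160-bit values, not figures taken from [44]):

```python
def sig_bits_ours(n, G1=160, Zq=160):
    # sigma_j = (C_{j1}, C_{j2}, tau_j, w_j, theta_j): 3 group elements + 2 scalars
    return n * (3 * G1 + 2 * Zq)

def sig_bits_fu(n, G1=160, Zq=160):
    # fifteen elements per block: 1 group element + 14 scalars (per the text)
    return n * (G1 + 14 * Zq)

n, G1, Zq = 10000, 160, 160
gap = sig_bits_fu(n, G1, Zq) - sig_bits_ours(n, G1, Zq)
assert gap == n * (12 * Zq - 2 * G1)   # the n(12|Z_q| - 2|G_1|) difference
```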

6.2 Experimental results

In this subsection, we evaluate the performance of our proposed scheme through practical experiments. We use the free MIRACL library, written in C, to implement the cryptographic operations on a Windows system with an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz and 8 GB of RAM. To realize the bilinear operations (using the R-ate pairing [45]), the elliptic curve we selected is an MNT curve at the 80-bit security level, whose base field size is 159 bits and whose embedding degree is 6. The size of an element chosen from \( Z_q^{*} \) is therefore 160 bits. Let us assume that the total number of data blocks in the shared data is n, the number of sampled blocks to be audited is c, and the number of corrupted data blocks is t. Let X be the number of corrupted blocks among the sampled blocks; the probability P for an auditor to detect the corrupted blocks can be computed as \( P = P(X\ge 1) = 1 - P(X=0) = 1 - \frac{C_{n-t}^{c}}{C_{n}^c} = 1 - \frac{(n-t)(n-t-1)\cdots (n-t-c+1)}{n(n-1)\cdots (n-c+1)} \), which can be bounded by the inequality \( 1 - (\frac{n-t}{n})^{c} \le P \le 1 - (\frac{n-t-c+1}{n-c+1})^{c} \). For instance, if \( 1\% \) of the data blocks are corrupted, the auditor can detect the corruption with a probability of \( 95\% \) using 300 sampled blocks, and \( 99\% \) using 460 sampled blocks.
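The sampling analysis above is easy to reproduce. A short Python sketch computing the exact detection probability \( P = 1 - \prod _{i=0}^{c-1}\frac{n-t-i}{n-i} \) for a hypothetical file of \( n = 10{,}000 \) blocks with \( 1\% \) corruption (the block counts are illustrative choices):

```python
def detect_prob(n: int, t: int, c: int) -> float:
    """Exact probability that sampling c of n blocks hits at least one
    of the t corrupted blocks: P = 1 - C(n-t, c) / C(n, c)."""
    miss = 1.0
    for i in range(c):
        miss *= (n - t - i) / (n - i)
    return 1.0 - miss

n, t = 10000, 100          # 1% of the blocks corrupted
assert detect_prob(n, t, 300) >= 0.95
assert detect_prob(n, t, 460) >= 0.99
```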

Similar to the performance analysis, we also compare our proposed scheme with [41] in terms of computation cost and communication cost. In our experiments, the size of each sector in our proposed scheme (a data block consists of s sectors) is set to 160 bits, so the size of each block is about \( 160\times s \) bits. For simplicity and fair comparison, we still set the number of sectors to \( s = 1 \). All experimental results are averaged over 50 trials.

6.2.1 Experiments on computation cost

Firstly, to demonstrate the efficiency of signature generation, we evaluate the execution time of the GroupSign phase in our proposed scheme and in Fu et al.'s scheme [41]. Figure 3 shows the experimental results. The total number of data blocks is set to \( n = 10000 \), and the execution time of generating signatures is nearly linear in the number of data blocks. The results show that Fu et al.'s scheme is slightly more efficient than ours at the data user's side, but generating signatures on blocks is a one-time phase rather than a frequently executed one such as the auditing phase. Therefore, the signing cost of our proposed scheme is acceptable for practical applications.

Fig. 3
figure 3

Time of signature generation with different number of blocks

As the auditing efficiency is mainly affected by proof generation and proof verification, we evaluate the execution time of the ProofGen phase at the cloud's side and of the ProofVerify phase at the TPA's side, respectively. As shown in Fig. 4, the execution time of our scheme for generating auditing proofs is slightly higher than that of Fu et al.'s scheme because of the additional exponentiation operations, which are used to blind the original data blocks to improve auditing security. As a matter of fact, the extra computation cost is negligible in real applications because of the powerful computational capability of the cloud server. For the execution time of proof verification shown in Fig. 5, we note that our proposed scheme is more efficient than Fu et al.'s scheme. In particular, to generate the auditing proofs, it takes around 0.010 s and 0.011 s for our proposed scheme and the other scheme, respectively, when the number of blocks is \( c= 100 \). The execution times increase to around 0.056 s (with our scheme) and 0.057 s (with the other scheme) when the number of blocks increases to \( c=500 \). To verify the auditing proofs when \( c=100 \) and \( c=500 \), our proposed scheme only needs 0.098 s and 0.432 s, respectively, whereas Fu et al.'s scheme needs 0.155 s and 0.733 s, respectively. Moreover, to satisfy the required detection probability of corruption, we also test the execution time of proof verification when the number of sampled blocks is 300 or 460. The experimental results show that Fu et al.'s scheme takes 0.445 s and 0.671 s when \( c=300 \) and \( c=460 \), respectively. To finish the same auditing tasks, our proposed scheme only takes 0.265 s and 0.392 s, improving the efficiency by about \( 41\% \) in both cases. Therefore, our proposed scheme is more efficient in proof verification while improving security at the same time.

Fig. 4
figure 4

Time of auditing proof generation with different numbers of blocks

Fig. 5
figure 5

Time of auditing proof verification with different block numbers

Fig. 6
figure 6

Communication cost of the auditing proof with different numbers of blocks

6.2.2 Experiments on communication cost

When the data owner/users outsource(s) their data file and block signatures to the cloud server, the communication cost must be considered. As analyzed in the subsection on signature size earlier, our proposed scheme incurs lower communication cost for this procedure than the scheme in [41]; we do not compare it further here because it can be considered a one-time process for an outsourced file. For the auditing challenge message chal, since the extra communication cost produced by fn is no more than 4 bytes in our experiments, the communication cost in our proposed scheme is very close to that in Fu et al.'s scheme. As for the communication cost produced by the auditing proof (shown in Fig. 6), it increases as the number of sampled blocks grows in the auditing procedure. Fortunately, sampling \( c = 460 \) blocks is enough for the auditing tasks. Additionally, Fig. 6 demonstrates that our proposed scheme incurs lower communication cost than Fu et al.'s scheme [41]. To present the costs clearly, we list the computation and communication costs for \( c=300 \) and \( c=460 \), respectively, in Table 3.

Table 3 Comparisons of computation and communication costs for \(c=300\) and \(c=460\)

7 Conclusion

Cloud storage has emerged as a promising solution to the problem of managing the massive amounts of data generated in the big data era. To save costs, an application can share data among a large number of users working in the same group. To guarantee data integrity, many public auditing schemes have been proposed to check the correctness of data storage. However, it remains a significant challenge to design an efficient public auditing scheme for shared data while simultaneously preserving identity privacy.

In this paper, we propose an efficient privacy-preserving public auditing scheme for shared data on a remote cloud server. More concretely, we utilize an efficient group signature and homomorphic authenticators to compute the verification information for each data block, so that an authorized TPA can efficiently verify the integrity of the data but cannot learn the identity of the signer of the sampled blocks. Moreover, with the properties of the group signature, the group manager can reveal and revoke any group user who misbehaves. We also utilize the random masking technique to preserve data privacy against the TPA. Our security analysis demonstrates that the proposed scheme satisfies all the security requirements for public data auditing, and the performance evaluation shows that it achieves auditing efficiency as well.

As a part of our future work, we will design a more efficient public auditing scheme for shared data with constant verification time, while achieving both high security and optimal performance.