1 Introduction

The smart city, as a highly innovative urban morphology, has become a global strategic choice for urban development. In essence, a smart city mainly uses information and communication technologies (ICT) to achieve intelligent management of the various domains of a city and thereby create a better life for its citizens [1]. As shown in Fig. 1, a smart city involves all sorts of smart domains, such as smart transportation, smart care, smart education, and smart economy. These smart systems form the backbone of a city's livability, efficiency, and sustainability [2]. Take smart care as an example: people living in a smart city can register for a hospital appointment on their mobile phones at home instead of queuing at the hospital, which brings much convenience for both patients and hospitals. Moreover, electronic medical records (EMR) enable authorized doctors to diagnose patients at any time and from anywhere.

Fig. 1 The structure of a smart city

To provide accurate smart city services, all kinds of data should be collected and analyzed with advanced data processing technologies [3], as shown in Fig. 2. In other words, data can be considered the core of a smart city. Therefore, there should be an efficient infrastructure with a powerful ability to store and share data for archival, online, and offline analytics. Taking EMR as an example, a medical institution must collect and store a large number of medical records for many patients. Taking the reliability, scalability, and cost minimization requirements into account, it is a good choice to store these data on open-source cloud-based storage servers [4].

Fig. 2 The cloud storage in a smart city

As described above, cloud storage plays an increasingly important role in smart city construction while facing the explosive growth of data in the big data age. To enjoy smart city services, organizations and individuals have to move their massive data to cloud servers, since they lack powerful storage and management capabilities of their own. By delegating their data to a cloud server, organizations and individuals not only free themselves from managing complex hardware systems but also no longer need to worry about data backup. Moreover, they can access their data in the cloud at any time and from anywhere.

Along with the unprecedented convenience brought by cloud storage, some significant security and privacy challenges arise. Once people living in a smart city outsource their data to the cloud server, they give up physical possession of the outsourced data and authorize the cloud service providers (CSPs) to perform fundamental operations on it. This means that users can no longer modify, insert, or delete data in their local systems, while the cloud service provider can perform these operations even without informing the data owners. For example, CSPs may delete outsourced data that appear to be expired or infrequently accessed in order to save costs. Additionally, even if the CSPs are honest in storing users' data, they cannot avoid occasional software/hardware failures [5, 6], which may lead to data corruption.

When the above problems occur, CSPs may try to hide the corruption for the sake of reputation or economic benefit, leading the data owners to believe that their outsourced data are stored correctly on the cloud storage server. As a consequence, checking data integrity is necessary for ensuring data security [7, 8]. However, traditional cryptographic checking methods are not efficient for this task, because the users no longer physically possess their data. Since data are the core of a smart city, ensuring data security is the basic premise for achieving the sustainable development of the smart city. Hence, an efficient and secure data auditing mechanism that does not require downloading the whole data is needed to guarantee data integrity at the cloud server.

In recent years, researchers have proposed many schemes, including private auditing [9] and public auditing [10,11,12], to audit the integrity of remote data in the cloud. Private auditing is necessary in some cases, such as when the data owners' access to the Internet is strictly limited [13]. Public auditing schemes, by contrast, allow both the data owners and other authorized third parties to act as auditors and verify data integrity in the cloud computing environment. Additionally, it is convenient for users to delegate the auditing task to a trusted third-party auditor (TPA) when they cannot audit in person, and users do not need to pay any fee for their auditing requests. Hence, public auditing mechanisms seem more practical for verifying the data integrity of cloud storage [11].

1.1 Related work

Cryptographic schemes are well suited to implementing secure communication; many encryption schemes [14,15,16], authentication schemes [17,18,19], and digital signature schemes [20,21,22] have been proposed for practical applications. To ensure the integrity of the outsourced data in a smart city, a number of schemes [10,11,12,13, 23,24,25,26,27] have been proposed. Ateniese et al. [23] were the first to design a provable data possession (PDP) model for public auditing, which was the original security model for verifying storage correctness on an untrusted cloud. They utilized RSA-based homomorphic authenticators and enabled cloud users to verify the integrity of outsourced data without downloading the whole data file. However, their scheme did not support dynamic data. In 2007, Juels and Kaliski first proposed proofs of retrievability (PoR) [24], another method to audit data integrity on semi-trusted cloud servers, but it could only handle a limited number of queries.

Based on [23] and [24], a series of improved public auditing schemes have been proposed by other researchers. For example, compact proofs of retrievability (CPOR) [25] is a public auditing scheme based on BLS signatures [28]. Yuan et al. [10] proposed another public auditing scheme with constant communication cost. Wang et al. [29] designed a public auditing scheme supporting group dynamics, which efficiently handles signature updates by outsourcing the signature updating operations to the cloud via a secure proxy re-signature scheme. In [30], Wang et al. proposed a certificateless public auditing scheme to avoid the security issues of certificate management.

Recently, Wang et al. proposed the scheme Panda [31], an auditing method with efficient user revocation and secure public verification for outsourced data in the cloud. However, Yang et al. [32] pointed out that Panda is vulnerable to the data recovery attack and is not secure enough to resist the proof forgery attack. In fact, the variants of Panda proposed by Wang et al., i.e., [33, 34], suffer from these two vulnerabilities as well. In [32], Yang et al. then proposed an improved public auditing scheme based on Panda by utilizing the random masking technique. However, their method incurs extra communication and computation overhead; in particular, the TPA must perform some complicated inverse operations during verification.

Most recently, Li et al. proposed another provable data integrity scheme [35], which mainly reduces the cost of generating verification metadata at the client to achieve high efficiency. Unfortunately, we find that their scheme is also vulnerable to the data recovery attack and the proof forgery attack. Thus, we propose a new scheme that enhances the security of [35] using the random masking technique, while eliminating the complicated inverse operations required in [32]. Besides, the improvements can also be applied to other public auditing schemes, e.g., [36], an identity-based public auditing mechanism based on RSA.

1.2 Our contribution

In this paper, we mainly analyze the scheme proposed by Li et al. [35] and propose an improved data auditing scheme for checking the storage correctness of outsourced data, which inherits all advantages of [35] and further refines it by improving its security. To be specific, the three major contributions are presented as follows:

  • Firstly, we give a security analysis of the scheme in [35] and show that it has two typical security drawbacks. The first is that it cannot resist the data recovery attack, i.e., an external attacker can impersonate a public auditor to extract the entire outsourced data. The second is that it cannot resist the proof forgery attack, i.e., a malicious cloud server can generate a valid auditing proof without correctly storing the outsourced data.

  • Secondly, we propose an improved public data auditing scheme, and the security analysis demonstrates that it satisfies the security and privacy requirements of public data auditing. In particular, our scheme resists both the data recovery attack and the proof forgery attack. What is more, the improvements and analysis can also be applied to other public auditing schemes.

  • Finally, we present a detailed analysis in terms of the communication cost and computation cost to show that the proposed scheme can greatly improve the security at the expense of increasing communication and computation cost slightly.

1.3 Organization of this paper

The rest of this paper is organized as follows. Section 2 introduces the mathematical background and system models related to this paper. Section 3 briefly reviews Li et al.'s scheme [35]. Section 4 presents the details of the two attacks, namely the data privacy attack and the proof forgery attack. Section 5 describes the improved public auditing scheme built on [35]. Section 6 then analyzes the security of the proposed scheme, and Section 7 evaluates its communication and computation costs. Section 8 concludes the paper.

2 Preliminaries

The necessary cryptographic primitives, system model, and the security requirements are described in this section as follows.

2.1 Mathematical background

  1)

    Bilinear Maps. In order to enhance the readability of this article, some mathematical properties of bilinear maps are given below.

    Let \(G\), \(G_{T}\) be two cyclic groups with the same large prime order \(p\), and let \(e : G \times G \rightarrow G_{T}\) denote a bilinear map with the following properties:

    • Bilinearity For all \(P, Q \in G\) and \(a, b \in Z_{p}\), the equation \(e(P^{a}, Q^{b}) = e(P, Q)^{ab}\) holds.

    • Nondegeneracy For at least one element \(P \in G\), the inequality \(e(P, P) \neq 1_{G_{T}}\) holds.

    • Computability For any two elements \(P, Q \in G\), there is an efficient algorithm to compute the map \(e(P, Q)\).

  2)

    Security Assumptions. The security of our auditing scheme rests on the following two mathematical assumptions.

    • Computational Diffie-Hellman (CDH) Assumption Given \(g, g^{a}, g^{b} \in G\), where \(a, b\) are two unknown elements of \(Z_{p}\), no polynomial-time algorithm can compute \(g^{ab}\) with non-negligible probability.

    • Discrete Logarithm (DL) Assumption Given a generator \(g\) of \(G\) and an element \(g^{a} \in G\), it is computationally infeasible to compute the value of \(a\).
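
To give a concrete sense of the DL assumption, the following toy Python sketch (with an artificially small prime chosen only for illustration) recovers a discrete logarithm by brute force; the assumption states that once p is a prime of cryptographic size, no such search, or any substantially better algorithm, is feasible.

```python
# Toy illustration of the DL assumption; the parameters below are NOT secure.
def discrete_log_bruteforce(g, h, p):
    """Return some exponent a with g^a = h (mod p), by exhaustive search."""
    x = 1
    for a in range(p):
        if x == h:
            return a
        x = (x * g) % p
    return None

p = 1009                      # tiny prime, for illustration only
g = 11                        # base in Z_p^*
h = pow(g, 345, p)            # h = g^345 mod p

print(discrete_log_bruteforce(g, h, p))   # finds an exponent in at most p steps
# For a 256-bit prime the same search would take on the order of 2^256 steps,
# which is the intractability that the DL (and hence CDH) assumption relies on.
```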

2.2 System model

As shown in Fig. 3, the system model in this paper consists of three entities: the user, the cloud server, and the public auditor. Each entity is specifically defined as follows.

Fig. 3 System model

The user

The user is someone who owns a mass of data and outsources his/her data to the cloud after generating a signature for each data block. Then with the help of various interfaces provided by the cloud server, he/she can access the outsourced data at anytime.

The cloud server

The cloud server is a distributed storage system which has advantages in providing data storing, complex calculating, and data sharing for users.

The public auditor

The public auditor is the data owner or another third-party verifier, who executes the data integrity verification process to check the storage correctness of the outsourced data.

In this system model, the user can upload his/her data to the cloud server and delete the original data from the local memory, which greatly reduces the storage burden for users. Then the user can delegate the auditing task to a trusted TPA or act as a verifier himself/herself to verify the validity of outsourced data. Each data block gets stored with a signature signed by the private key of data owners.

2.3 Threat models

  • Integrity threat Under the security model proposed by Shacham and Waters for integrity checking in [25], the user or another third-party verifier can challenge the cloud server to examine whether the outsourced data are stored correctly in cloud. Generally, the integrity of outsourced data in cloud may encounter two types of threats. The first one is that an attacker can generate a set of forged signatures for all data blocks, and then make use of the signature set to return a forged proof; The other one is that the malicious attacker can forge the aggregated information of proof about the sampled data blocks to pass the verification of public auditors.

  • Privacy threat For each auditing mechanism, it is important to ensure there is no one who can get access to the outsourced data in cloud without the data owner’s authorization, which means the public auditor is only allowed to verify the integrity with no data leakage. If someone can extract the details of outsourced data from the auditing information, the security model suffers from data privacy attack.

2.4 Design goals

To achieve efficient and secure data integrity checking for public cloud storage, Li et al.'s scheme and our improved scheme should satisfy the following security and performance goals.

  • Public auditability The public auditor is allowed to verify the integrity of outsourced data stored in public cloud.

  • Storage correctness No malicious cloud server can pass the public verifier's auditing unless it actually stores the outsourced data correctly.

  • Dynamic operation In the auditing scheme, users are allowed to execute block-level operations on the outsourced data while the correctness of the modified file is still guaranteed.

  • Privacy preserving It is crucial for any auditing scheme that the public verifier cannot derive the outsourced data, or even individual data blocks, from the auditing information collected during the integrity checking process. As the later security analysis shows, the auditing method in Li et al.'s scheme suffers from this weakness, while this goal is achieved in our improved scheme.

3 Review of Li et al.’s scheme

In this section, we review Li et al.'s scheme [35] for achieving provable data integrity in public cloud storage, which consists of four algorithms: KeyGen(1^λ) → (pk, sk), SignGen(pk, sk, F) → Φ, GenProof(F, Φ, chal) → P, and CheckProof(pk, chal, P) → {0, 1}. The main verification process is shown in Fig. 4, and the scheme details are described below.

Fig. 4 The auditing process of Li et al.'s scheme

Let p be a large prime, which is the order of the two multiplicative groups G and \(G_{T}\), and let g be a generator of the group G. \(e : G \times G \rightarrow G_{T}\) is defined as a bilinear map, and \(H : \{0,1\}^{*} \rightarrow G\) denotes a hash function that maps strings uniformly onto the group G.

Firstly, the user executes KeyGen to generate a key pair (sk, pk), where sk is a random element \(x \in Z_{p}\) chosen by the user and \(pk = g^{x}\). Then, following the BLS-based constructions in [25, 26], the data file F is divided into n blocks, and each block is further split into s sectors. Thus, F can be denoted as below:

$$F = \left( \begin{array}{cc} &\overrightarrow{m_{1}}\\ &{\vdots} \\ &\overrightarrow{m_{n}} \end{array}\right) = \left( \begin{array}{cccc} &m_{11} & {\dots} & m_{1s} \\ &{\vdots} & {\ddots} & {\vdots} \\ &m_{n1} & {\dots} & m_{ns} \end{array}\right) \in Z_{p}^{n\times s} $$

Now, the user executes SignGen to generate a signature for each data block. First, the user randomly chooses s elements \(\alpha_{1}, \dots, \alpha_{s} \in Z_{p}\) to compute \(w_{j} = g^{\alpha _{j}}\); then, for each data block \(\overrightarrow {m_{i}}\), the user obtains the corresponding signature:

$$ \sigma_{i}=(H(id_{i})\cdot g^{{\sum}_{j=1}^{s}\alpha_{j}m_{ij}})^{sk}, 1 \leq i \leq n $$
(1)

where \(id_{i} = filename \| i\) denotes the identifier of data block \(m_{i}\), and filename is the name of the outsourced file F. \(\Phi = \{\sigma_{i}\}_{1 \leq i \leq n}\) denotes the signature set of the outsourced data blocks. Then, the user delivers the file F and the corresponding signature set Φ to the public cloud and removes the data copy from local memory. Meanwhile, the user sends the message \(\{filename, n, \{w_{j}\}_{1 \leq j \leq s}\}\) to the TPA.

When the TPA intends to check the data integrity in the cloud, it first randomly chooses a c-element subset I = {l 1,…,l c } of the set [1,n] as the sample to be checked. Then, the TPA chooses c random elements \(\upsilon _{l_{1}}, \dots , \upsilon _{l_{c}}\) from \(Z_{p}\). Finally, the TPA sends the challenge request \(chal = \{(i, \upsilon _{i})\}_{i \in I}\) to the cloud server.

Once the cloud server obtains the auditing request \(chal = \{(i, \upsilon _{i})\}_{i \in I}\), it executes the algorithm GenProof to generate the corresponding auditing proof. The proof consists of two parts, which are

$$\begin{array}{@{}rcl@{}} \mu_{j} &=& \sum\limits_{i=l_{1}}^{l_{c}}\upsilon_{i}m_{ij}, 1 \leq j \leq s \end{array} $$
(2)
$$\begin{array}{@{}rcl@{}} \sigma &=& \prod\limits_{i=l_{1}}^{l_{c}}\sigma_{i}^{\upsilon_{i}} \end{array} $$
(3)

The set {μ,σ}, where μ = {μ 1,…,μ s }, is the response proof sent to the TPA. Then, the TPA can verify the storage correctness of the outsourced data by running CheckProof via the following equation:

$$ e(\sigma, g) \stackrel{?}{=} e(\prod\limits_{(i,\upsilon_{i})\in chal}H(id_{i})^{\upsilon_{i}}\cdot \prod\limits_{j=1}^{s}w_{j}^{\mu_{j}}, pk) $$
(4)
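
To make the interplay of SignGen, GenProof, and CheckProof concrete, the following minimal Python sketch simulates the protocol purely at the exponent level: each hash value H(id_i) is modeled as g^{h_i} for a random h_i, and the pairing check of equation (4) is replaced by equality of the corresponding exponents modulo p. All parameter values are illustrative assumptions; this is not a pairing-based implementation of [35].

```python
import random

p = 7919                       # toy prime standing in for the group order
n, s = 8, 4                    # n blocks, s sectors per block (illustrative sizes)

# --- KeyGen ---
sk = random.randrange(1, p)    # private key x; pk = g^x is left implicit

# --- SignGen (exponent level) ---
M = [[random.randrange(p) for _ in range(s)] for _ in range(n)]   # file F
alpha = [random.randrange(1, p) for _ in range(s)]                # w_j = g^{alpha_j}
h = [random.randrange(1, p) for _ in range(n)]                    # H(id_i) = g^{h_i}
# sigma_i = (H(id_i) * g^{sum_j alpha_j m_ij})^sk  ->  exponent of sigma_i:
sig = [(h[i] + sum(alpha[j] * M[i][j] for j in range(s))) * sk % p for i in range(n)]

# --- Challenge ---
c = 4
I = random.sample(range(n), c)
v = {i: random.randrange(1, p) for i in I}

# --- GenProof ---
mu = [sum(v[i] * M[i][j] for i in I) % p for j in range(s)]       # eq. (2)
sigma = sum(v[i] * sig[i] for i in I) % p                          # exponent of eq. (3)

# --- CheckProof: exponent form of eq. (4) ---
rhs = (sum(v[i] * h[i] for i in I) + sum(alpha[j] * mu[j] for j in range(s))) * sk % p
print("proof verifies:", sigma == rhs)                             # True for an honest cloud
```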

4 Weaknesses of Li et al.’s scheme

As described in [35], any client can check the correctness of the data in cloud by acting as a public auditor. Since the public auditor is not fully trusted, it is possible for him/her to achieve his/her own purpose such as recovering the outsourced data by collecting enough auditing information. What is more, the cloud server is semi-trusted, so it may cheat the verifier by giving a forged proof for its benefits and reputation when it deletes some data blocks or hides some data corruptions.

The authors of [35] claimed that their scheme meets the following design requirements: public auditing, storage correctness, dynamic operation, batch auditing, and lightweight overhead. However, we find that the scheme suffers from the data privacy attack and the proof forgery attack, which are mounted by an external attacker (e.g., a curious TPA) and a malicious cloud server, respectively. The detailed attack processes are described below.

4.1 Data privacy attack

We assume that the curious public verifier is an attacker, who intends to obtain the outsourced data without the owner’s authorization. Then, we prove that it can achieve this target by collecting enough public auditing information pairs from the cloud server.

Let the data blocks indexed by the c-element subset I = {l 1,…,l c } of the set [1,n] chosen by the attacker be denoted as follows:

$$\begin{array}{@{}rcl@{}} F^{\prime} =& \left( \begin{array}{c} \overrightarrow{m_{l_{1}}}\\ {\vdots} \\ \overrightarrow{m_{l_{c}}} \end{array}\right) = \left( \begin{array}{ccc} m_{l_{1}1} & {\dots} & m_{l_{1}s} \\ {\vdots} & {\ddots} & {\vdots} \\ m_{l_{c}1} & {\dots} & m_{l_{c}s} \end{array}\right) = \begin{array}{ccc} (\mathbf{F_{1}} & {\dots} & \mathbf{F_{s}}) \end{array} \end{array} $$

For the sake of simplicity, let us take the data fragment F 1 as the target that the attacker intends to reveal, where

$$\mathbf{F_{1}}^{\mathrm{T}} = \{m_{l_{1}1}, m_{l_{2}1},\dots, m_{l_{c}1}\} $$

After the user completes KeyGen and SignGen and outsources the data to the cloud, the attacker can disguise itself as an auditor and perform the challenge-response phase with the cloud server at least c times. The c challenges it sends to the cloud server can be presented as

$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{lll} &chal_{1} = \{(l_{1}, \upsilon_{1l_{1}}), (l_{2}, \upsilon_{1l_{2}}), \cdots, (l_{c}, \upsilon_{1l_{c}}) \} \\ &chal_{2} = \{(l_{1}, \upsilon_{2l_{1}}), (l_{2}, \upsilon_{2l_{2}}), \cdots, (l_{c}, \upsilon_{2l_{c}}) \} \\ & \quad {\cdots} {\cdots} \\ &chal_{c} = \{(l_{1}, \upsilon_{cl_{1}}), (l_{2}, \upsilon_{cl_{2}}), \cdots, (l_{c}, \upsilon_{cl_{c}}) \} \end{array}\right. \end{array} $$

In return, it receives the c corresponding proofs: pf 1 = (μ 1,σ 1), pf 2 = (μ 2,σ 2), …, pf c = (μ c ,σ c ).

Because the attacker first targets only the data fragment F 1, there is merely one element in μ i , namely μ i = {μ 1i }, where \( \mu _{1i} = {\sum }_{j=1}^{c}\upsilon _{il_{j}}m_{l_{j}1} \), 1 ≤ i ≤ c. Assume that \(\mathbf {\upsilon _{1}} = (\upsilon _{1l_{1}}, \upsilon _{1l_{2}}, \dots , \upsilon _{1l_{c}})\), \(\mathbf {\upsilon _{2}} = (\upsilon _{2l_{1}}, \upsilon _{2l_{2}}, \dots , \upsilon _{2l_{c}}) \), …, \(\mathbf {\upsilon _{c}} = (\upsilon _{cl_{1}}, \upsilon _{cl_{2}}, \dots , \upsilon _{cl_{c}}) \), and construct the matrix V as

$$\mathbf{V} = \left( \begin{array}{cccc} \upsilon_{1l_{1}} & \upsilon_{1l_{2}} & {\dots} &\upsilon_{1l_{c}} \\ \upsilon_{2l_{1}} & \upsilon_{2l_{2}} & {\dots} &\upsilon_{2l_{c}} \\ {\vdots} & {\vdots} & {\ddots} &{\vdots} \\ \upsilon_{cl_{1}} & \upsilon_{cl_{2}} & {\dots} &\upsilon_{cl_{c}} \end{array}\right) $$

Let the vectors υ 1, υ 2, …, υ c be linearly independent; then there exists a matrix R satisfying RV = E, where E is the identity matrix.

Let μ = {μ 11,μ 12,…,μ 1c }. Since μ = V F 1, the external attacker can reveal F 1 by computing F 1 = R μ. In the same way, the attacker can derive the other data fragments F 2, …, F s . Furthermore, it can derive the data blocks m i for i ∉ I by performing other corresponding challenge-response interactions.

As a matter of fact, even if the attacker cannot play the role of an auditor under some restricted conditions, it is still able to recover the data blocks m i , provided that it can eavesdrop on or intercept the challenge-response messages. Hence, Li et al.'s scheme fails to preserve data privacy.
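
The linear-algebra core of this attack can be illustrated with a short Python sketch (toy parameters, first sector only): the curious verifier issues c challenges over the same c blocks with linearly independent coefficient vectors and then solves the resulting system modulo p to recover the data fragment exactly.

```python
import random

p = 7919                                   # toy prime order (illustrative)
c = 5                                      # number of challenged blocks

# True first-sector values F1 of the c challenged blocks (known only to the cloud).
F1 = [random.randrange(p) for _ in range(c)]

# The attacker issues c challenges; row k holds the coefficients of chal_k.
V = [[random.randrange(1, p) for _ in range(c)] for _ in range(c)]

# The honest cloud answers each challenge with mu_{1k} = sum_i v_{k,i} * m_{l_i 1} (eq. (2)).
mu = [sum(V[k][i] * F1[i] for i in range(c)) % p for k in range(c)]

def solve_mod_p(A, b, p):
    """Solve A x = b over Z_p by Gaussian elimination (A assumed invertible)."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix
    m = len(A)
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] % p != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], p - 2, p)              # modular inverse (p prime)
        A[col] = [x * inv % p for x in A[col]]
        for r in range(m):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[col])]
    return [row[-1] for row in A]

recovered = solve_mod_p(V, mu, p)
print("data fragment recovered:", recovered == F1)    # True whenever V is invertible mod p
```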

4.2 Proof forgery attack

We assume that the malicious cloud is an internal attacker who deletes the outsourced data owned by users or incorrectly stores the original data out of some interests. Then, we demonstrate that a malicious cloud can generate a forged auditing proof corresponding to the public auditor’s challenges and pass the verification successfully even without correct data storage.

Similar to the data recovery attack, let the data blocks indexed by the c-element subset I = {l 1,l 2,…,l c } of the set [1,n] chosen by the attacker be presented as

$$\begin{array}{@{}rcl@{}} F^{\prime} =& \left( \begin{array}{c} \overrightarrow{m_{l_{1}}}\\ {\vdots} \\ \overrightarrow{m_{l_{c}}} \end{array}\right) = \left( \begin{array}{ccc} m_{l_{1}1} & {\dots} & m_{l_{1}s} \\ {\vdots} & {\ddots} & {\vdots} \\ m_{l_{c}1} & {\dots} & m_{l_{c}s} \end{array}\right) = \begin{array}{ccc} (\mathbf{F_{1}} & {\dots} & \mathbf{F_{s}}) \end{array} \end{array} $$

For the sake of simplicity, we still take the fragment F 1 as the target to be audited by the auditor, where \(\mathbf {F_{1}}^{\mathrm {T}} = \{m_{l_{1}1}, m_{l_{2}1},\dots , m_{l_{c}1}\}\). Suppose that the auditor has performed the challenge-response phase with the cloud server c times and that the corresponding c challenges are

$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{lll} &chal_{1} = \{(l_{1}, \upsilon_{1l_{1}}), (l_{2}, \upsilon_{1l_{2}}), \cdots, (l_{c}, \upsilon_{1l_{c}}) \} \\ &chal_{2} = \{(l_{1}, \upsilon_{2l_{1}}), (l_{2}, \upsilon_{2l_{2}}), \cdots, (l_{c}, \upsilon_{2l_{c}}) \} \\ & \quad {\cdots} {\cdots} \\ &chal_{c} = \{(l_{1}, \upsilon_{cl_{1}}), (l_{2}, \upsilon_{cl_{2}}), \cdots, (l_{c}, \upsilon_{cl_{c}}) \} \end{array}\right. \end{array} $$

Then, the malicious cloud outputs corresponding auditing proofs as \(pf_{1} = (\mathbf {\mu _{1}}, \sigma _{1}^{\prime })\), \(pf_{2} = (\mathbf {\mu _{2}}, \sigma _{2}^{\prime }) \), …, \( pf_{c} = (\mathbf {\mu _{c}}, \sigma _{c}^{\prime }) \).

In the same way, because the malicious cloud server first targets the data fragment F 1, there is merely one element in μ i , namely μ i = {μ 1i }. The values of μ 1i and \(\sigma _{j}^{\prime }\) are denoted respectively as

$$\begin{array}{@{}rcl@{}} \mu_{1i} &= \sum\limits_{k=1}^{c}\upsilon_{il_{k}}m_{l_{k}1} , 1 \leqslant i \leqslant c \\ \sigma_{j}^{\prime} &= \prod\limits_{i=1}^{c}\sigma_{l_{i}}^{\upsilon_{jl_{i}}} , 1 \leqslant j \leqslant c \end{array} $$
(5)

Assume that \(\mathbf {\upsilon _{1}} = (\upsilon _{1l_{1}}, \upsilon _{1l_{2}}, \dots , \upsilon _{1l_{c}})\), \(\mathbf {\upsilon _{2}} = (\upsilon _{2l_{1}}, \upsilon _{2l_{2}}, \dots , \upsilon _{2l_{c}}) \), …,\( \mathbf {\upsilon _{c}} = (\upsilon _{cl_{1}}, \upsilon _{cl_{2}}, \dots , \upsilon _{cl_{c}})\) and the construction of matrix υ is

$$\mathbf{V} = \left( \begin{array}{cccc} \upsilon_{1l_{1}} & \upsilon_{1l_{2}} & {\dots} &\upsilon_{1l_{c}} \\ \upsilon_{2l_{1}} & \upsilon_{2l_{2}} & {\dots} &\upsilon_{2l_{c}} \\ {\vdots} & {\vdots} & {\ddots} &{\vdots} \\ \upsilon_{cl_{1}} & \upsilon_{cl_{2}} & {\dots} &\upsilon_{cl_{c}} \end{array}\right) $$

If det(V) ≠ 0, the vectors υ 1, υ 2, …, υ c are linearly independent, and we show below that the malicious cloud can generate valid auditing proofs from these c pairs of auditing requests and responses even after it deletes or destroys the user's data.

Suppose that the malicious cloud server has stored c pairs of auditing messages (chal i ,pf i ) for 1 ≤ i ≤ c. When it receives a new auditing challenge set \( chal^{*} = \{(l_{1}, \upsilon _{l_{1}}^{*}), (l_{2}, \upsilon _{l_{2}}^{*}), \dots , (l_{c}, \upsilon _{l_{c}}^{*}) \} \) on the data block set F′, it forges a valid auditing proof via the following steps:

  (1)

    Since det(V) ≠ 0 and \( \mathbf {\upsilon ^{*}} = (\upsilon _{l_{1}}^{*}, \upsilon _{l_{2}}^{*}, \dots , \upsilon _{l_{c}}^{*}) \), the malicious cloud can find a coefficient set {a 1,a 2,…,a c } that satisfies \( \mathbf {\upsilon ^{*}} = a_{1}\mathbf {\upsilon _{1}} + a_{2}\mathbf {\upsilon _{2}} + {\dots } + a_{c}\mathbf {\upsilon _{c}} \), that is, \( \upsilon _{l_{i}}^{*} = {\sum }_{j=1}^{c} a_{j}\upsilon _{jl_{i}} \).

  (2)

    The malicious cloud computes \( \mu _{1}^{*} = {\sum }_{i=1}^{c} a_{i}\mu _{1i} \) and \( \sigma ^{*} = {\prod }_{i=1}^{c} \sigma _{l_{i}}^{\upsilon _{l_{i}}^{*}} \). In the same way, it can compute \( \mu _{2}^{*} \), …, \( \mu _{s}^{*} \), where \( \mu _{j}^{*} = {\sum }_{i=1}^{c} a_{i}\mu _{ji} \). Finally, it outputs {μ*, σ*} as the forged proof.

The public verifier checks the correctness of auditing proofs by verifying the equation \( e(\sigma , g) = e({\prod }_{(l_{i}, \upsilon _{i}) \in chal}H(id_{i})^{\upsilon _{i}} \cdot {\prod }_{j=1}^{s}w_{j}^{\mu _{j}}, pk) \). Therefore, the proof {μ*, σ*} passes the check, as shown by the computation below:

$$\begin{array}{@{}rcl@{}} e(\sigma^{*}, g) &=& e(\prod\limits_{i=1}^{c} \sigma_{l_{i}}^{\upsilon_{l_{i}}^{*}}, g) = e(\prod\limits_{i=1}^{c} \sigma_{l_{i}}^{{\sum}_{j=1}^{c}a_{j}\upsilon_{jl_{i}}}, g) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}(H(id_{i}) \cdot g^{{\sum}_{j=1}^{s}\alpha_{j}m_{ij}})^{sk \cdot {\sum}_{j=1}^{c}a_{i}\upsilon_{ji}}, g) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i}^{*}} \cdot \prod\limits_{i=l_{1}}^{l_{c}}g^{{\sum}_{j=1}^{s}\alpha_{j}m_{ij}{\sum}_{k=1}^{c}a_{k}\upsilon_{il_{k}}}, g^{sk}) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i}^{*}} \cdot \prod\limits_{i=l_{1}}^{l_{c}}(\prod\limits_{j=1}^{s}w_{j}^{m_{ij}})^{{\sum}_{k=1}^{c}a_{k}\upsilon_{il_{k}}}, pk) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i}^{*}} \cdot \prod\limits_{j=1}^{s}w_{j}^{{\sum}_{i=l_{1}}^{l_{c}}m_{ij}{\sum}_{k=1}^{c}a_{k}\upsilon_{il_{k}}}, pk) \\ &=& e(\prod\limits_{(i,\upsilon_{i})\in chal}H(id_{i})^{\upsilon_{i}^{*}} \cdot \prod\limits_{j=1}^{s}w_{j}^{\mu^{*}}, pk) \end{array} $$
(6)

Since the auditing challenges from public verifiers are simply arranged over data blocks, it is easy to group the auditing challenges according to the identifier of each data block and match them with the corresponding auditing proofs in the same manner. Thus, if the malicious cloud server first collects enough pairs of auditing messages and then deletes some data blocks for its own benefit, it can still generate a valid auditing proof (μ*, σ*) by recombining the stored auditing proofs and challenges from its collection.

What is worse, an external attacker, who does not possess the outsourced data at all, can also generate a forged auditing proof for any challenge once he/she eavesdrops on enough valid challenge-proof pairs. In other words, any external attacker can impersonate the cloud server when he/she obtains enough auditing messages in some way. Obviously, both users and cloud servers may suffer unexpected risks caused by this serious security flaw.
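
The following Python sketch illustrates the aggregation part of the forgery with toy parameters: after recording c challenge-proof pairs, the cloud deletes the data, expresses a fresh challenge vector as a linear combination of the recorded ones, and recombines the recorded μ values into a forged μ* that matches what an honest server would have returned (the group element σ* would be recombined analogously by exponentiation).

```python
import random

p = 7919                                     # toy prime order (illustrative)
c = 5

# First-sector values of the challenged blocks; used only to build the proofs
# that the cloud recorded *before* deleting the data.
F1 = [random.randrange(p) for _ in range(c)]

# c recorded challenges (coefficient vectors v_1..v_c) and their recorded mu values.
V = [[random.randrange(1, p) for _ in range(c)] for _ in range(c)]
mu_rec = [sum(V[k][i] * F1[i] for i in range(c)) % p for k in range(c)]

def solve_mod_p(A, b, p):
    """Solve A x = b over Z_p by Gaussian elimination (A assumed invertible)."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]
    m = len(A)
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] % p != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], p - 2, p)
        A[col] = [x * inv % p for x in A[col]]
        for r in range(m):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[col])]
    return [row[-1] for row in A]

# A fresh challenge v* arrives after the data has been deleted.
v_star = [random.randrange(1, p) for _ in range(c)]

# Step (1): find a with v*_i = sum_k a_k * v_{k,i}, i.e. solve V^T a = v*.
Vt = [[V[k][i] for k in range(c)] for i in range(c)]
a = solve_mod_p(Vt, v_star, p)

# Step (2): forge mu* from the recorded proofs (sigma* would be forged likewise
# by raising the recorded group elements to the coefficients a_k).
mu_star = sum(a[k] * mu_rec[k] for k in range(c)) % p

# The forged mu* equals the value an honest cloud would have computed from F1.
honest = sum(v_star[i] * F1[i] for i in range(c)) % p
print("forged proof passes:", mu_star == honest)      # True
```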

5 Our proposed scheme

As described above, Li et al.'s scheme has two flaws: its public auditing method cannot guarantee data confidentiality when a curious verifier manages the auditing process, and it cannot guarantee the correctness of the auditing proof when a malicious cloud, driven by economic or reputation concerns, returns a counterfeit proof. Therefore, data privacy preservation is added as an additional requirement in our improved scheme, ensuring that the public auditor is incapable of extracting the original data blocks or files no matter how many operations he/she performs during the integrity auditing phase, and we further harden the new scheme against the proof forgery attack.

According to the two attacks analyzed in Section 4, the main cause of the security vulnerabilities is that the proof is a purely linear combination of the challenged blocks. Hence, any external attacker or malicious cloud server that collects enough such linear combinations can easily derive the original data file. To remedy the flaws of Li et al.'s scheme, we use the random masking technique to break the linear relationship between the data blocks and the integrity proofs; the details are illustrated in Fig. 5.

Fig. 5 The auditing process of the new scheme

Our new proposed scheme also consists of four parts: (1) The KeyGen(1^λ) algorithm is executed by the client to generate a public key and a private key from the security parameter λ. (2) In the SignGen(pk, sk, F) algorithm, the user generates a signature set Φ for the corresponding data file F. (3) The cloud server executes the ProofGen(F, Φ, chal) algorithm after receiving the auditor's request chal and sends the result as the integrity proof to the auditor. (4) The ProofCheck(pk, chal, pf) algorithm is executed by the auditor to verify the correctness of the proof returned by the public cloud, and it returns accept or reject. The first two algorithms are the same as in Li et al.'s scheme; we only revise the last two algorithms to achieve data privacy preservation and better security in the new proposed scheme.

5.1 KeyGen(1^λ) → (sk, pk)

This is the setup phase, where the user randomly selects \(x \in Z_{p}\) as the private key and takes \(g^{x}\) as the public key.

5.2 SignGen(pk, sk, F) → Φ

The user first divides the data file F into n blocks, namely F = {m i }1≤in . In order to reduce the computation and storage cost of generating the signature set [26], the user further divides each block into s ≥ 1 sectors, where \( m_{i}=\{m_{ij} \in Z^{*}_{p}\}_{1 \leq j \leq s} \). Then, the user randomly chooses s ≥ 1 numbers \(\alpha_{1},...,\alpha_{s} \in Z_{p}\) to compute \( \omega _{j}=g^{\alpha _{j}} \). To further cut down the computation cost, the user rewrites \( {\prod }^{s}_{j=1}\omega _{j}^{m_{ij}} \) as \( g^{{\sum }_{j=1}^{s}\alpha _{j} \cdot m_{ij}} \) and generates the signature for each file block m i , 1 ≤ i ≤ n, as follows:

$$ \sigma_{i} = (H(id_{i}) \cdot g^{{\sum}_{j=1}^{s}\alpha_{j} \cdot m_{ij}})^{sk}, 1 \leq i \leq n $$
(7)

Finally, the data owner sends the file F and the signature set Φ = {σ i }1≤in to the cloud and does not keep a local copy. Besides, it is necessary to deliver the message \(\{filename, n, \{\omega_{j}\}_{1 \leq j \leq s}\}\) to the TPA for the later auditing process.

5.3 ProofGen(F, Φ, chal) → pf

Before issuing the verification, the TPA generates an auditing challenge via the following steps:

  • Randomly choose a c-element subset L of the set [1,n], where L = {l 1,…,l c }.

  • Randomly pick a c-element vector \( \mathbf {\upsilon } =\{\upsilon _{1}, \dots , \upsilon _{c}\}_{\upsilon _{i} \in Z_{p}} \) corresponding to {l i }1≤ic .

  • Send the challenge request \(chal = \{(l_{i}, \upsilon _{i})\}_{l_{i} \in L}\) to the public cloud server.

After receiving the challenge request chal from the TPA, the cloud generates a proof of the integrity of the outsourced file. To resist the two attacks shown in Section 4, the cloud chooses s random elements \(\xi_{1},...,\xi_{s} \in Z_{p}\) and computes the proof as follows:

$$\begin{array}{@{}rcl@{}} \beta_{j} &=& {\sum}^{c}_{i=1}\upsilon_{i}m_{l_{i}j} + \xi_{_{j}} , 1 \leq j \leq s \\ \varphi_{j} &=& g^{-\xi_{j}} , 1 \leq j \leq s \\ \sigma &=& {\prod}_{i=1}^{c}\sigma_{l_{i}}^{\upsilon_{i}} \end{array} $$
(8)

So, the cloud server returns {σ,β,φ} as its auditing proof to TPA, where β = {β 1,β 2,…,β s } and φ = {φ 1,φ 2,…,φ s }.
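
The effect of the random masks on the attack of Section 4.1 can be seen in a tiny Python sketch (two blocks, one sector, toy values, all assumed for illustration): a verifier who inverts the challenge matrix now recovers the data shifted by unknown mask-dependent offsets rather than the data itself.

```python
import random

p = 7919                                  # toy prime order (illustrative)
m1, m2 = 123, 456                         # first-sector values of the two challenged blocks

# Two challenges with coefficient rows (1, 1) and (1, 2); the inverse of
# V = [[1, 1], [1, 2]] over Z_p is R = [[2, -1], [-1, 1]].
xi1, xi2 = random.randrange(1, p), random.randrange(1, p)   # fresh masks, one per proof
beta1 = (1 * m1 + 1 * m2 + xi1) % p                         # eq. (8), sector 1
beta2 = (1 * m1 + 2 * m2 + xi2) % p

# The Section 4.1 attack would compute F1 = R * beta:
guess1 = (2 * beta1 - beta2) % p
guess2 = (-beta1 + beta2) % p
print("attack recovers the data:", (guess1, guess2) == (m1, m2))   # False
# Without the masks (xi1 = xi2 = 0) the same computation returns exactly (m1, m2).
```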

5.4 ProofCheck(pk, chal, pf) → (true, false)

Once the public auditor receives the proof {σ,β,φ}, he/she checks its correctness on the basis of specified request message chal and the public key pk by validating the following equation:

$$ e(\sigma, g) \stackrel{?}{=} e(\prod\limits_{(l_{i}, \upsilon_{i}) \in chal}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{j=1}^{s}\omega_{j}^{\beta_{j}} \cdot {\prod}^{s}_{j=1}\varphi_{j}, pk) $$
(9)

If this equation holds, the result of ProofCheck(pk, chal, pf) is 1; that is, the public auditor is convinced that the data file F remains intact in the public cloud storage. Otherwise, when the output is 0, the auditor can assert that one or more data blocks of the outsourced file have been corrupted in the cloud.

If the cloud is honest and correctly executes the procedures of our scheme with the client, it must be able to pass the verifier's auditing. The validity of its proof is verified as below:

$$\begin{array}{@{}rcl@{}} e(\sigma, g) &=& e(\prod\limits_{i=l_{1}}^{l_{c}} \sigma_{i}^{\upsilon_{i}}, g) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}(H(id_{i}) \cdot g^{{\sum}_{j=1}^{s}\alpha_{j}m_{ij}})^{sk \cdot \upsilon_{i}}, g) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{i=l_{1}}^{l_{c}}g^{\upsilon_{i} \cdot {\sum}_{j=1}^{s}\alpha_{j}m_{ij}}, g^{sk}) \\ &=& e(\prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{j=1}^{s}(g^{\alpha_{j}\beta_{j}} \cdot g^{-\xi_{j}}),pk) \\ &=& e(\prod\limits_{(l_{i}, \upsilon_{i}) \in chal}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{j=1}^{s}\omega_{j}^{\beta_{j}} \cdot \prod\limits^{s}_{j=1}\varphi_{j}, pk) \end{array} $$
(10)

6 Security analysis and comparisons

In this section, we analyze the security of the proposed provable data possession scheme for cloud storage. We first prove that the scheme can resist the data privacy attack and the proof forgery attack. Then, we compare the security of our scheme with that of other recent schemes.

6.1 Security analysis

On the basis of the system model and the abilities of adversaries, we demonstrate that the signatures of outsourced data blocks cannot be forged. Under the premise, we further prove that the proposed scheme is secure against proof forgery attack. Then, we demonstrate that the proposed scheme can resist the data privacy attack; any external attacker cannot reveal the content of outsourced data from auditing message.

Theorem 1

It is impossible for a dishonest cloud or an external attacker to forge a signature of the data block as long as the CDH problem is computationally infeasible to solve.

Proof

If the cloud server or an external attacker could create a correct signature for a data block without possessing the owner's private key or the entire file, it would be highly possible to tamper with the original data and still pass the auditor's verification. Here, we prove that our scheme is secure against signature forgery. □

We first assume that an adversary \(\mathcal {A}\), which can choose messages and identities adaptively, is able to break this scheme with probability ε in time t after executing at most q H hash queries, q s sign queries, and q k key queries. Then, there exists a simulator \(\mathcal {B}\) that can solve the CDH problem in G with

$$\begin{array}{@{}rcl@{}} && t^{\prime} \leqslant t + q_{H}(T_{G}+T_{p}) + q_{s}T_{G} \\ && \varepsilon^{\prime} \geqslant \frac{\varepsilon}{q_{H}q_{k}} \end{array} $$
(11)

where T G denotes the time each exponentiation takes on G, and T p is the time one incorporation takes on Z p .

Given (g, g^a, g^b) as an instance of the CDH problem, the simulator \(\mathcal {B}\) plays the signature forgery game with the adversary as follows:

Setup:

When \( \mathcal {A} \) asks for the creation of system users, \( \mathcal {B} \) sets the security parameters (G,g,p) and guesses which user \( \mathcal {A} \) plans to forge a signature for. Without loss of generality, we set pk t = g^a as the target public key. For all other public keys, we randomly choose an element \( \tau _{i} \in Z_{p} \) and set the public key pk i as \( g^{\tau _{i}} \) for each i ≠ t.

Queries:

There only exist two kinds of oracle queries Q h a s h , Q s i g n that \( \mathcal {A} \) can execute and \( \mathcal {B} \) is supposed to give the corresponding valid answers.

  • Q hash : \( \mathcal {B} \) maintains a list L H to look up the Q hash records. For each input (id i ,m i ), \( \mathcal {B} \) checks whether the entry is already in L H ; if so, it returns the corresponding value to \( \mathcal {A} \). Otherwise, it guesses whether the query (id i ,m i ) involves the target identifier id* or the target block m* that \( \mathcal {A} \) will use in its forgery. If id i = id* and m i = m*, it sets \( H(id_{i})g^{\sum \limits _{j=1}^{s}\alpha _{j}m_{ij}} = g^{b} \). If not, it randomly chooses \( \gamma _{i} \in Z_{p} \), returns \( g^{\gamma _{i}} \) to \( \mathcal {A} \), and inserts this record into the list L H .

  • Q sign : \( \mathcal {B} \) maintains the list L sign consisting of four-tuples (pk j ,id i ,m i ,σ i ). For each input (pk j ,id i ,m i ), if j = t, id i = id*, and m i = m*, \( \mathcal {B} \) aborts. Otherwise, if j ≠ t, \( \mathcal {B} \) returns \(\sigma _{i} = (H(id_{i})g^{\sum \limits _{j=1}^{s}\alpha _{j}m_{ij}})^{\tau _{j}}\) via the hash queries (assuming for simplicity that \( \mathcal {A} \) has already executed the hash query); else, if j = t but id i ≠ id* or m i ≠ m*, \( \mathcal {B} \) returns \( \sigma _{i} = (g^{\gamma _{i}})^{a} = (g^{a})^{\gamma _{i}} \).

  • Forgery: Finally, the adversary \( \mathcal {A} \) outputs a forgery (pk j ,id*,m*,σ*) after its adaptive queries. If j ≠ t, \( \mathcal {B} \) has failed to guess the target user and the game aborts. If ProofCheck(pk j ,m*,σ*) ≠ 1, or ProofCheck(pk j ,m*,σ*) = 1 but σ* is the result of a Q sign query, it also aborts, because \( \mathcal {B} \) cannot forge a valid signature by itself. Otherwise, \( \mathcal {A} \) has succeeded in creating a forged signature without any information from the above oracle queries, which means that \( \mathcal {B} \) obtains \( \sigma = (H(id^{*})g^{{\sum }_{j=1}^{s}\alpha _{j}m_{j}^{*}})^{a} = (g^{b})^{a} = g^{ab} \) as a solution to the given CDH problem.

Since there are q k public key queries, \( \mathcal {B} \) can guess the target user with probability 1/q k , and the probability that \( \mathcal {B} \) correctly guesses the target identifier id* and the target block m* is 1/q H . Thus, \( \mathcal {B} \) solves the CDH problem with probability ε/(q k q H ) if \( \mathcal {A} \) can forge a valid signature with probability ε. Throughout the process, each hash query requires one exponentiation on G and one extra operation in Z p , and each sign query requires one exponentiation on G, so the total running time is t + q H (T G + T p ) + q s T G .

According to the analysis above, the simulator \( \mathcal {B} \) can solve the CDH problem with non-negligible probability in polynomial time, which contradicts the CDH assumption. Therefore, a forged signature can hardly be generated in this scheme.

Theorem 2

For the cloud server, the auditing proof is unforgeable under our improved scheme as long as the DL problem is computationally infeasible to solve.

Proof

By reference to the security game defined in [31] and [35], we can demonstrate that if the cloud server succeeds in winning the game, named Game 1, by generating a fake auditing proof without the correct outsourced data, then we can solve the DL problem on group G with non-negligible probability in polynomial time. □

Game1:

The data owner or another public verifier sends an auditing request \( chal = \{(l_{i}, \upsilon _{i})\}_{l_{i} \in L} \) to the cloud server and asks for the auditing proof {σ,β,φ} over the correctly stored data, which is sure to make the verification equation (9) hold. However, the cloud server may return a forged proof {σ,β′,φ} when it has lost or modified the original data, where \( \mathbf {\beta ^{\prime }} = \{\beta _{1}^{\prime }, ..., \beta _{s}^{\prime }\} \) and \( \beta _{i}^{\prime } = {\sum }_{i=1}^{c}\upsilon _{i}m_{l_{i}j} + \xi _{j}, i \in [1,s] \). Let \( \Delta \beta _{i} = \beta _{i}-\beta _{i}^{\prime } \); obviously, at least one element of {Δβ i } for i ∈ [1,s] is nonzero, since otherwise the outsourced data would be stored correctly. The cloud server wins Game 1 only if the fake proof passes the verification executed by the public auditor; otherwise, it fails.

Because the auditing proof {σ,β,φ} is correct, we get

$$ e(\sigma, g) = e(\prod\limits_{(l_{i}, \upsilon_{i}) \in chal}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{j=1}^{s}\omega_{j}^{\beta_{j}} \cdot \prod\limits^{s}_{j=1}\varphi_{j}, pk) $$
(12)

Now, we assume that the incorrect proof generated by cloud succeeds to pass the verification; then, in the same way, we get

$$ e(\sigma, g) = e(\prod\limits_{(l_{i}, \upsilon_{i}) \in chal}H(id_{i})^{\upsilon_{i}} \cdot \prod\limits_{j=1}^{s}\omega_{j}^{\beta_{j}^{\prime}} \cdot \prod\limits^{s}_{j=1}\varphi_{j}, pk) $$
(13)

According to the properties of bilinear maps, we can learn from (12) and equation (13) that

$$ \prod\limits_{j=1}^{s} \omega_{j}^{\beta_{j}} = \prod\limits_{j=1}^{s} \omega_{j}^{\beta^{\prime}_{j}} \Rightarrow \prod\limits_{j=1}^{s}\omega_{j}^{\Delta \beta_{j}} = 1 $$
(14)

Since \(\omega _{j} = g^{\alpha _{j}}\), where g is a generator of the cyclic group G and \( \alpha _{j} \in Z_{p} \), for any two other generators x and y of the cyclic group G there exists \( b \in Z_{p} \) satisfying \( y = x^{b} \). Then, we can select two random elements \( r_{j}, k_{j} \in Z_{p} \) such that \(\omega _{j} = x^{r_{j}}y^{k_{j}}\). So we have

$$ 1 = \prod\limits_{j=1}^{s}(x^{r_{j}}y^{k_{j}})^{\Delta \beta_{j}} = x^{{\sum}_{j=1}^{s}r_{j}\Delta \beta_{j}} \cdot y^{{\sum}_{j=1}^{s}k_{j}\Delta \beta_{j}} $$
(15)

Given \( x, y = x^{b} \in G \), we can obtain the value of b, which is a solution of the DL problem, as long as \( {\sum }_{j=1}^{s}k_{j}\Delta \beta _{j} \) is not equal to zero, as shown below:

$$\begin{array}{@{}rcl@{}} y = x^{- \frac{{\sum}_{j=1}^{s}r_{j}\Delta \beta_{j}}{{\sum}_{j=1}^{s}k_{j}\Delta \beta_{j}}} \Rightarrow b = - \frac{{\sum}_{j=1}^{s}r_{j}\Delta \beta_{j}}{{\sum}_{j=1}^{s}k_{j}\Delta \beta_{j}} \end{array} $$
(16)

As described in Game 1, at least one element of {Δβ j }1≤js is nonzero and the elements \( k_{j} \in Z_{p} \) are random. Therefore, the probability that \( {\sum }_{j=1}^{s}k_{j}\Delta \beta _{j} = 0 \) is at most 1/p, which means that we can solve the DL problem with probability not less than 1 − 1/p. Because p is a large prime, 1/p is negligible and 1 − 1/p is non-negligible, which contradicts the DL assumption. Therefore, the scheme is secure against the proof forgery attack.

Theorem 3

Our improved scheme preserves data privacy if the DL assumption holds. Simply speaking, owing to the hardness of the DL problem, it is very difficult for an external attacker to recover the data file from any auditing proof returned by the cloud server.

Proof

As for data privacy preserving, we show that none of the elements in auditing proof {β,φ,σ} can be used to recover the outsourced data by any internal or external attacker. □

First, we prove that the element β resists the data recovery attack shown in Section 4.1 and preserves the privacy of \( {\sum }_{i=1}^{c}\upsilon _{i}m_{l_{i}j} \). The reason is that we break the linear structure of the proof generation phase by embedding an element ξ j , setting \( \beta _{j} = {\sum }_{i=1}^{c}\upsilon _{i}m_{l_{i}j} + \xi _{j} \), where ξ j is randomly chosen by the cloud and hidden from the public auditor.

Second, even though the element φ is transparent to all people, it is still very hard to get the value of ξ j due to the computational complexity of the DL problem and CDH problem.

At last, we prove that the element σ can guarantee the data privacy. Following the definition that

$$\begin{array}{@{}rcl@{}} \sigma &=& \prod\limits_{i=l_{1}}^{l_{c}}(H(id_{i})g^{{\sum}_{j=1}^{s}\alpha_{j}m_{ij}})^{sk \upsilon_{i}} \\ &=& \prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i} sk} \cdot \prod\limits_{i=l_{1}}^{l_{c}}(g^{\upsilon_{i}{\sum}_{j=1}^{s}\alpha_{j}m_{ij}})^{sk} \\ &=& \prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i} sk} \cdot g^{sk{\sum}_{j=1}^{s}(\alpha_{j}{\sum}_{i=l_{1}}^{l_{c}}\upsilon_{i}m_{ij})} \\ &=& \prod\limits_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon_{i} sk} \cdot pk^{{\sum}_{j=1}^{s}(\alpha_{j}{\sum}_{i=l_{1}}^{l_{c}}\upsilon_{i}m_{ij})} \end{array} $$
(17)

From the equation above, we can see that \( pk^{{\sum }_{j=1}^{s}(\alpha _{j}{\sum }_{i=l_{1}}^{l_{c}}\upsilon _{i}m_{ij})} \) is masked by \( {\prod }_{i=l_{1}}^{l_{c}}H(id_{i})^{\upsilon _{i} sk} \), whose value is composed of H(id i ), υ i , and sk. Although H(id i ), υ i , and \( g^{sk} \) are public to the auditing verifier, it is still infeasible to compute this product due to the hardness of the CDH problem. Moreover, one can hardly extract any information about \( {\sum }_{i=1}^{c}\upsilon _{i}m_{l_{i}j} \) from \( pk^{{\sum }_{j=1}^{s}(\alpha _{j}{\sum }_{i=l_{1}}^{l_{c}}\upsilon _{i}m_{ij})} \) owing to the intractability of the DL problem. Thus, our scheme is secure in terms of preserving data privacy.

6.2 Security comparisons

We compare the security of the improved integrity auditing scheme with Li et al.'s scheme and two other recent schemes, [31] and [36], for public cloud storage. Let SR1, SR2, and SR3 denote public verification, unforgeability of auditing proofs, and preservation of data privacy, respectively. The comparison details of these schemes are shown in Table 1.

Table 1 Security comparisons of our proposed scheme and related schemes

According to Table 1, none of the three schemes (i.e., Wang's scheme [31], Li's scheme [35], and Yu's scheme [36]) meets all three basic security requirements (SR1 to SR3). In particular, Wang's scheme [31] and Li's scheme [35] ensure neither the preservation of data privacy nor the unforgeability of auditing proofs. Yu's scheme is secure against the proof forgery attack but suffers from the data privacy attack. Only our proposed scheme meets all three basic security requirements for public data auditing.

7 Performance analysis

In this section, we analyze the performance of our proposed scheme by evaluating its computation cost and communication cost. The results show that the improved scheme preserves data privacy and provides a more secure public auditing mechanism at the cost of only a small increase in computation and communication overhead compared with Li et al.'s scheme.

Communication cost analysis

The improved scheme does not add extra communication overhead for the data owner, because the method of attaching a signature to each data block is the same as in [35]. Therefore, we only analyze the communication overhead of the auditing challenges and the corresponding auditing proofs, as shown in Table 2. For simplicity, assume that c blocks are chosen to be verified during the challenge-response process. Then, the size of an auditing challenge {(l i ,υ i )}1≤ic is c(|p|+|n|) bits, where |p| denotes the bit length of an element of Z p and |n| denotes the bit length of a data block index. For the corresponding auditing proof {σ,β,φ}, there are s elements in β and in φ, respectively, so the size of the auditing proof is about (2s + 1)|p| bits. Compared with Li et al.'s scheme, the extra communication overhead mainly comes from the transmission of the vector φ, whose size is only s|p| bits. Inheriting the advantage of Li et al.'s scheme, the new scheme also adopts BLS short signatures, so the communication complexity is constant, i.e., asymptotically O(1).

Table 2 Comparisons of communication cost between our proposed scheme and Li’s schemes [35]
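
For a rough feel of these sizes, the following back-of-the-envelope computation plugs in assumed example parameters (a 160-bit group order, 20-bit block indices, c = 460, s = 100, counting every transmitted element as |p| bits); the concrete numbers are illustrative only.

```python
# Illustrative communication sizes under assumed example parameters.
p_bits, n_bits, c, s = 160, 20, 460, 100

challenge_bits  = c * (p_bits + n_bits)          # {(l_i, v_i)} for i = 1..c
proof_bits_li   = (s + 1) * p_bits               # {mu, sigma} in Li et al.'s scheme
proof_bits_ours = (2 * s + 1) * p_bits           # {sigma, beta, phi} in our scheme

print(challenge_bits, proof_bits_li, proof_bits_ours)
# 82800, 16160, 32160 bits: the extra cost is exactly the s*|p| bits of phi.
```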

Computation cost analysis

We compare the computation cost between our proposed scheme and Li et al.'s scheme [35] in Table 3. For convenience, the notations used to present the execution time are defined below:

  • MG: The time cost of a multiplication operation on group G.

  • Exp: The time cost of an exponent operation in group G.

  • MZ: The time cost of a multiplication operation in the field Z p .

  • Add: The time cost of an addition operation in the field Z p .

  • H: The time cost of one hash operation on group G.

  • BP: The time cost of a pairing operation on group G.

Table 3 Comparisons of computation cost between our proposed scheme and Li’s scheme [35]

As we can see, the initialization phase and the KeyGen and SignGen algorithms are exactly the same in [35] and in our proposed scheme. Therefore, the total computation overhead of these phases is also the same as in [35], namely nMG + nH + 2nExp + nsMZ + nsAdd. Since our scheme differs from Li et al.'s scheme only in the ProofGen and ProofCheck algorithms, we mainly discuss the computation cost of these two processes.

According to the equations presented in Fig. 5, the computation cost of generating an auditing proof is about csMZ + s(c + 1)Add + (c + s)Exp + cMG for the cloud server; likewise, the computation cost for the TPA to verify the auditing proof is about 2BP + (c + s)Exp + (c + 2s + 2)MG. Referring to Table 3, the extra computation cost of the new proposal is only sAdd for the cloud server and (s + 1)MG for the public auditor, yet the security is greatly strengthened compared with [35].

Auditing performance analysis

Next, we analyze the auditing performance of our proposed scheme. We carried out the experiments on a Dell personal computer (with an Intel i5-4460S 2.90 GHz processor, 4 GB of memory, and the Windows 8 operating system) using the MIRACL library. If the probability of data corruption is 1%, the TPA needs to randomly sample only 460 blocks to detect the misbehavior with probability greater than 99%. As in Li et al.'s scheme [35], we implemented the auditing operations with 300 sampled blocks (detection probability of about 95%) and 460 sampled blocks, respectively. Specifically, we assume that there are 100 sectors in each block. We then repeated the experiments for 100 trials and compared the results with [35], as shown in Fig. 6. Obviously, the auditing time increases with the value of c.
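
The sample sizes quoted above follow from the standard detection bound: if a fraction ρ of the blocks is corrupted and c blocks are sampled uniformly at random, the corruption is detected with probability about 1 − (1 − ρ)^c. A quick check:

```python
# Detection probability when c blocks are sampled and a fraction rho of the
# file is corrupted: P = 1 - (1 - rho)^c.
rho = 0.01
for c in (300, 460):
    print(c, 1 - (1 - rho) ** c)
# c = 300 -> about 0.951 (~95%), c = 460 -> about 0.990 (>99%),
# matching the sample sizes used in the experiments.
```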

Fig. 6 The comparison of the proposed scheme and Li et al.'s scheme

8 Conclusion

To realize the blueprint of the smart city in the big data age, it is crucial to ensure the security of data outsourced to public cloud servers. In recent years, many public integrity auditing protocols have been proposed, but the openness of these audit models may bring security and privacy risks. In this paper, we analyze two weaknesses of Li et al.'s scheme and demonstrate that it is vulnerable to the data privacy attack and the proof forgery attack. To address these two weaknesses, we propose a new provable data possession scheme. The improved scheme inherits the properties and advantages of Li et al.'s scheme [35], such as batch auditing, and the detailed security analysis and performance evaluation demonstrate that it achieves secure public auditing and data privacy preservation while incurring only a little extra communication and computation cost. Note that the cryptanalysis and the proposed method can also remedy the security weaknesses in [29, 33, 34, 36, 37] and other variants. For future work, we will devote ourselves to designing a more secure, practical, and efficient auditing scheme.