1 Introduction

Cloud computing is a technology that provides a wide range of on-demand services to cloud users over the internet. Its properties, viz. rapid elasticity, fault tolerance, measured service and resource pooling, have made it a promising technology. In recent years, cloud computing has attracted a large variety of users from IT companies, enterprises, educational institutes and healthcare sectors. According to the recent report “2020 Cloud Computing Survey” of IDG Communications Inc., around 92% of information technology organizations have adopted cloud technology and outsourced their data to cloud servers, while only 8% of organizations maintain their data on local on-premise servers.

Data storage is one of the essential services offered by the cloud service provider (CSP). Data owners (DO) can outsource their data to the cloud storage server and access it from anywhere. Once the data are uploaded to the cloud storage, the data owner loses physical control over the data, which increases its vulnerability. Challenges such as data tampering, deletion and data crashes are common in the cloud environment. The CSP may delete rarely accessed user data from the cloud to improve storage efficiency without the knowledge of the data owners (Liu et al. 2018; Tabrizchi and Kuchaki 2020). Moreover, to maintain the reliability and reputation of the company, the CSP may hide information about security attacks, crashes and tampering of data from the data owners (Khan et al. 2018).

To overcome these challenges, the data owner has to be convinced that the outsourced data are kept safe in the cloud storage servers. If the outsourced data face threats from notorious cyber-attackers or internal entities, the system has to notify the data owner immediately. This can be achieved only through an efficient data integrity verification mechanism.

Data integrity is a process that helps to maintain the accuracy, validity and consistency of the user’s data over its entire lifecycle (Ateniese et al. 2007). Any unintended or unauthorized change identified in the data is reported immediately to the data owners. The two major categories of data integrity verification are private and public integrity verification. Private auditing allows the data owners, or information proprietors, to contact the CSP directly to verify the integrity of the cipher text stored in the cloud servers. The data owner holds metadata about the encrypted data blocks and their corresponding storage locations. Using this metadata, the data owner challenges the CSP at regular intervals to prove the integrity of the cipher text. Private auditing schemes are secure and accurate, but they put an extra workload on the data owners by requiring them to challenge the cloud service provider directly. As a result, private auditing increases the computation and communication overhead on the data owners, which is detrimental to service-oriented applications (Ateniese et al. 2009; Wang 2013).

The second type of integrity verification scheme is public auditing, which is very popular in service-oriented cloud applications. It reduces the communication and computation workload on the data owner by introducing a third party auditor (TPA), which acts as a bridge between the data owners and the cloud service providers. The TPA helps the data owners verify the integrity of the data blocks stored in the cloud servers. The data owner shares limited information about the data chunks with the TPA, which uses it to challenge the CSP. Incorporating a TPA reduces the communication, computation and metadata storage cost on the data owner’s side. However, introducing a third party between the data owners and cloud service providers results in increased data vulnerability and trust issues (Kang et al. 2017).

Completely trusting and relying on the Third Party Auditor (TPA) for integrity verification is not advisable, as it may produce inaccurate results for the data owners in two circumstances: (1) if the trustworthiness of the TPA is compromised, the TPA may return fraudulent integrity results to the data owners; (2) the TPA maintains a dedicated index table to perform integrity verification tasks, and false positive errors in that index table may lead it to produce inaccurate results (Jiang et al. 2013, 2018).

In addition, most of the existing public auditing schemes use an RSA signature (Venkatesh et al. 2012; Yu et al. 2016; Xu et al. 2017) or a BLS (Boneh–Lynn–Shacham) signature (Wang et al. 2011; Mukundan et al. 2014; Liu et al. 2015; Xiling et al. 2018; Zhang et al. 2019) to verify the authenticity of the data owners in the cloud environment. However, the keys used in the RSA and BLS signature schemes are large, so the computation time to generate signatures and proofs and to verify authenticity remains high. To manage and verify the integrity of high-velocity data in the cloud environment, the authenticity verification procedure should incur a low computational cost.

1.1 Drawbacks of existing auditing schemes

The drawbacks of the existing auditing schemes are: (1) the data owner faces high computation and communication overhead while performing private auditing; (2) existing public auditing schemes completely rely on third party auditors and trust the integrity results without any cross-verification; (3) the TPA produces erroneous results due to false positive errors in the index table; (4) authenticity verification of data owners using RSA and BLS signatures increases the computation cost of public auditing schemes.

1.2 Contributions

To overcome the drawbacks of existing auditing schemes, and considering the need for a lightweight, secure integrity verification scheme in the cloud environment, this research work proposes LDuAP (lightweight dual auditing protocol). It combines public and private auditing to verify the integrity of the cipher text stored in the cloud servers. Initially, the proposed LDuAP scheme verifies the integrity of all the data chunks stored in the cloud using a third party auditor. Later, depending upon the available computational power and the importance of the data, private auditing is performed for a random number of data blocks.

The contributions of the proposed LDuAP scheme are,

  • Introduces a lightweight, secure public auditing scheme based on the Cramer-Shoup cryptosystem to reduce the computation and communication overhead.

  • Performs private auditing for random data blocks and cross-verifies the integrity results generated by the Third Party Auditor, which substantially increases the trustworthiness of the auditing results.

  • To overcome the false positive errors in the TPA’s index table, a scalable Bloom Matrix Hash Table (BMHT) is introduced.

  • A lightweight signature is introduced to verify the authenticity of the data owner.

The rest of the paper is organized as follows: Sect. 2 summarizes the related work on data integrity verification schemes. Section 3 explains the problem statement. Section 4 describes the system architecture model and discusses the design of key generation, encryption, and public and private auditing in the proposed model. Section 5 explains the structure of the proposed scalable Bloom Matrix Hash Table (BMHT). The advancements made in the proposed LDuAP scheme are discussed in Sect. 6. The security analysis of the proposed LDuAP scheme is presented in Sect. 7. The performance evaluation of the proposed LDuAP integrity scheme and BMHT is given in Sect. 8, and Sect. 9 concludes and discusses future work.

2 Related works

In this section, the recent research works that have been carried out on integrity verification schemes and the drawbacks of using RSA and BLS signatures for authenticity verification in public auditing are summarized.

Data integrity demands accuracy and completeness of the user data in cloud storage servers. Data owners always expect their data to be stored and managed in a trustworthy environment. Ensuring integrity remains challenging because the data are stored in different geographical locations. Depending upon their working nature, data integrity schemes can be categorized into two different classes: private and public auditing.

2.1 Private auditing schemes

Private auditing schemes allow the data owner to directly verify the integrity of the data blocks stored in the cloud servers. They can be further subclassified into two types: provable data possession (PDP) and proof of retrievability (POR).

PDP is a probabilistic integrity scheme that allows the data owners to directly verify whether the cloud storage server possesses the original data by sampling a set of random data blocks. PDP schemes can effectively identify corruption or alteration of the cipher text, but they do not support recovery of corrupted data. In POR schemes, the CSP generates proofs at frequent intervals to assure the data owners that their data blocks are safe and recoverable at all times.

Ateniese et al. (2007) proposed the first probabilistic proof-of-possession scheme based on sampling a random set of data blocks from the server. It allows the server to access only a small portion of the file to generate the proof. For authentication, RSA-based homomorphic verifiable tags are used. While uploading the data blocks, the data owner pre-computes the homomorphic tags for each data block and stores them in the server. Later, the data owner challenges the server to prove the integrity of a data block using the generated homomorphic tags. The server generates the proof of possession for the requested data block and sends it back to the data owner. However, in this scheme, the data owners have to stay active at all times, which is not practically possible. Sending frequent challenges to the CSP puts an extra workload on the data owners and increases the communication cost and computation overhead.

Later, Ateniese et al. (2011) proposed a remote PDP data integrity verification scheme based on spot-checking. In this scheme, the server uses a forward error correction (FEC) mechanism to prove data possession. Initially, the files are encoded with FEC by the data owners, and the encoded files are then used by the cloud servers to prove possession of the data blocks. FEC supports data recovery for minor attacks and detects major attacks effectively. Subsequently, Wang (2013, 2015), He et al. (2017) and Li et al. (2017) proposed PDP-based schemes. However, none of these schemes supports data recoverability.

POR schemes were introduced to support data recoverability. Shacham and Waters (2008) proposed the first POR method to verify the integrity of user data in distributed storage systems. It uses the BLS signature for short queries and pseudorandom functions for long queries. It uses a block-less verification method but fails to prevent data leakage. Later, Bowers et al. (2009) proposed the HAIL (high availability and integrity layer) architecture to maintain data integrity. HAIL uses an integrity-protected error-correcting code (IP-ECC), universal hash functions (UHFs) and a pseudorandom function for corruption resilience. Recently, Yuhan et al. (2017) proposed an IPOR scheme, which supports both integrity verification and recoverability. However, the signature used in the scheme is too large and results in a high computation cost on the data owner.

In summary, both PDP and POR private auditing schemes verify the integrity of the cipher text in the cloud environment accurately. However, allowing the data owners to verify the integrity themselves increases the computation and communication overhead on the data owner, because the cipher text stored in the cloud storage servers has to be transferred to the premises of the data owner and decrypted before every private integrity verification. In addition, letting the cipher text flow between the CSP and the data owner for private verification increases the possibility of network attacks.

2.2 Public auditing schemes

Public auditing reduces the communication overhead of integrity verification by employing a TPA between the data owners and the cloud service provider. The data owner shares metadata about the data blocks and an ownership signature with the TPA to enable public auditing. Most public auditing schemes use RSA or BLS signatures to verify the authenticity of the data owners.

Yu et al. (2016) proposed identity-based cloud data integrity checking (ID-CDIC) using the RSA encryption algorithm to support public auditing. It reduces the complexity of certificate management in PDP and POR and supports variable-sized blocks. The security of the ID-CDIC method depends heavily on RSA assumptions with large public exponents. However, RSA with a key size of 512 bits has already been proven breakable, and using a larger key size for each data block increases the computation overhead. In addition, the ID-CDIC method fails against recovery attacks, and the results of the integrity verification depend highly on the performance of the Third Party Auditor.

To improve the security against recovery attacks in Yu et al. (2016), Xu et al. (2017) proposed a modified identity-based integrity verification algorithm using the RSA signature. It tightens the security against recovery attacks, but the computation overhead to generate the signature and verify the proof becomes higher than in the Yu et al. (2016) scheme. Moreover, both schemes are infeasible for dynamic and batch auditing. Later, Walid et al. (2019) proposed an RSA-based cryptographic accumulator integrity verification scheme to protect highly sensitive data. It allows the data owners to perform an unlimited number of integrity checks to ensure the authenticity of the data.

The BLS (Boneh–Lynn–Shacham) signature is used as an alternative to the RSA signature. Since the size of the RSA signature is large (to achieve a security level of λ = 128 bits, the required RSA signature size is O(λ³), i.e., 2048 bits), the BLS signature scheme was introduced. It reduces the required signature size from O(λ³) to 2λ; the BLS signature size needed to achieve a security level of λ = 128 bits is 256 bits (twice the size of λ). Liu et al. (2015) proposed a public auditing scheme called MuR-DPA (multi-replica dynamic public auditing). It uses the BLS signature and a Merkle Hash Tree (MHT) to verify the integrity; the MHT stores all the replicas of each data block in the same sub-tree. It supports both dynamic and batch auditing. However, MuR-DPA uses an additional key server to generate a private key for the users to encrypt the data. Mukundan et al. (2014) proposed a dynamic multi-replica provable data possession (DMR-PDP) method to verify the integrity of replicated data in cloud storage based on the BLS signature and homomorphic encryption. It tightens the security of the user data stored in the cloud server, but the time taken to perform homomorphic operations on the encrypted data remains high, and implementing homomorphic encryption for integrity verification requires large computation power. Later, Xiling et al. (2018) proposed a privacy-preserving EoCo architecture to improve the efficiency of integrity verification; it reduces the communication overhead between the TPA and the CSP. Zhang et al. (2019) proposed a fuzzy auditing protocol that allows multiple cloud users to modify the data while ensuring data integrity. It uses the BLS signature to verify the authenticity of the data owner; again, the BLS signature size required to achieve the 128-bit security level is 2λ, i.e., 256 bits.

Moreover, all the existing public auditing schemes (Yu et al. 2016; Liu et al. 2015; Wang et al. 2011, 2018; Xiling et al. 2018; Zhang et al. 2019, 2020) completely trust and accept the integrity results generated by the TPA without any cross-verification. However, if the trustworthiness of the TPA is compromised, the TPA might send fraudulent integrity results to the data owners. Completely relying on and trusting the TPA increases the security vulnerabilities in many aspects.

2.3 False positive errors in index tables and its consequences in public auditing

In public auditing schemes, an index table or a hash-based tree structure is used by the TPA to perform auditing tasks. However, the rate of false positive errors in the index table increases as the number of data blocks in the cloud storage grows. This reduces the accuracy of the integrity results and increases the trust issues between the data owners and the CSPs.

Aditya et al. (2011) proposed an auditing scheme using a Bloom filter. Each block of data stored in the cloud server is verified using its corresponding hash values in the Bloom filter, which improves the space efficiency of computation and metadata management. However, the per-block errors compound: if a single user file has 1000 data blocks and the false positive rate of the Bloom filter is 0.001, the verification of the file is fully reliable only when every block is found, and the reliability of detecting a missing block is only about 90%. In addition, in this scheme, the data owner has to upload the entire file during ownership verification, which increases the vulnerability of the data. Subsequently, Nianmin et al. (2014), Xiang et al. (2016), Zhang et al. (2017) and Yan et al. (2018) proposed several variants of Bloom filter-based integrity verification schemes for cloud storage. However, the false positive rate of the Bloom filter is not controlled well enough to handle high-velocity data, i.e., big data.

As an alternative to the Bloom filter, Zhu et al. (2013) proposed a public auditing scheme using an Index Hash Table (IHT). It uses probabilistic query and verification to improve the performance of the auditing scheme. The IHT maintains a serial number, block number and version number for each record. It helps to achieve maximum accuracy, but it also incurs a large computational overhead while performing insertions and deletions in the hash table. Later, Tian et al. (2017) proposed a public auditing scheme using a Dynamic Hash Table (DHT), a two-dimensional data structure located at the TPA to record the data property information. It reduces the computational overhead of Zhu et al. (2013). However, the search operations become inefficient during verification and updating. In addition, both the IHT and the DHT are exposed to false positive errors.

3 Problem statement

As noted in the related works, the existing auditing schemes face four major issues: (1) allowing the data owners to contact the cloud storage servers directly and perform private integrity verification increases the computation and communication overhead on the data owner; (2) employing a third party auditor between the cloud service provider and the data owners increases the data vulnerability and trust issues; (3) existing public auditing schemes use RSA and BLS signatures to perform ownership verification, but the keys used in these signature schemes are large and increase the computation overhead; (4) false positive errors in the index table reduce the accuracy of public auditing.

In this research work, LDuAP is proposed to overcome the issues associated with the existing auditing schemes. It combines public and private auditing to verify the integrity of the cipher text stored in the cloud servers. The LDuAP scheme uses the IND-CCA2-secure (indistinguishable under adaptive chosen ciphertext attacks) Cramer-Shoup cryptosystem to perform the auditing task. To reduce the false positive errors of the hash table, a two-dimensional scalable Bloom Matrix Hash Table is proposed.

4 System model

The proposed LDuAP scheme creates an efficient framework to verify the integrity of the data stored in the cloud environment. The major entities involved in the proposed scheme are, (1) data owner, (2) third party auditor and (3) cloud service provider. The relationships and communications between the entities are shown in Fig. 1.

Fig. 1
figure 1

Overall architecture of the proposed LDuAP scheme

Data owners (DO) are either individuals or groups of people who have full rights to access, retrieve and modify their data from the cloud at any point of time. The data owner generates keys, a signature and metadata for each data block. The generated keys are used to encrypt the data block with the Cramer-Shoup cryptosystem. After encryption, the cipher texts are uploaded to the cloud storage, and the metadata about the data blocks is shared with the Third Party Auditor to enable public auditing. In addition, the DO is responsible for cross-verifying the integrity results generated by the TPA through public auditing.

The third party auditor (TPA) acts as a bridge between the data owners and the cloud service provider. The TPA manages a scalable Bloom Matrix Hash Table (BMHT) to perform auditing. At frequent intervals, the TPA performs two major tasks: (1) challenging the CSP to prove the integrity of the data blocks stored in the cloud storage server and (2) proof verification.

The cloud service provider (CSP) is an untrusted entity consisting of a cluster of hardware and software resources with a practically unlimited capacity to store the cipher text uploaded by the data owners. In the proposed integrity verification scheme, the cloud service provider performs two major tasks: (1) verifying the ownership of the challenged data block and (2) generating a proof and acknowledging the challenge raised by the TPA.

The operations performed by the data owners, viz. key generation, signature generation, metadata generation and data encryption, are executed only once. However, to ensure the integrity of the data blocks, the operations performed at the TPA and the CSP, such as integrity challenges, proof generation and proof verification, are performed an unlimited number of times.

4.1 Data security

The data owner generates asymmetric keys, a signature and metadata for each block of data and encrypts the block using the Cramer-Shoup cryptosystem. Algorithm 1 explains the key generation and encryption of the data block, and Algorithm 2 explains the creation of the metadata and signature for the data block.

figure a

For each data block, the data owner creates public as well as private keys. Here, ‘c’, ‘d’ and ‘h’ are used as the public key and ‘x1’, ‘x2’, ‘y1’, ‘y2’ and ‘z’ are used as the private key. The calculated values ‘U1’, ‘U2’, ‘Enc’ and ‘v’ form the cipher text of the data block ‘R’.
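Since Algorithm 1 is given only as a figure, the following is a minimal Python sketch of the textbook Cramer-Shoup key generation, encryption and ciphertext consistency test that it builds on, using the same variable names (c, d, h; x1, x2, y1, y2, z; U1, U2, Enc, v). The tiny demo group, the SHA-256-based value α, the integer encoding of the data block, and the fact that the textbook test combines the two exponentiated terms by group multiplication are assumptions made for this sketch, not the exact parameters of the proposed scheme.

```python
import hashlib
import secrets

# Toy group parameters, for illustration only: a small safe prime p = 2q + 1
# and two generators g1, g2 of the order-q subgroup. A real deployment would
# use a large, standardised group.
p, q = 227, 113
g1, g2 = 4, 9

def keygen():
    # Private key: five random exponents; public key: c, d, h (as in Algorithm 1).
    x1, x2, y1, y2, z = (secrets.randbelow(q) for _ in range(5))
    c = (pow(g1, x1, p) * pow(g2, x2, p)) % p
    d = (pow(g1, y1, p) * pow(g2, y2, p)) % p
    h = pow(g1, z, p)
    return (c, d, h), (x1, x2, y1, y2, z)

def alpha(u1, u2, enc):
    # alpha binds the first three ciphertext components together.
    digest = hashlib.sha256(f"{u1}|{u2}|{enc}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def encrypt(pub, r):
    # r stands for the group-encoded data block R; the output is (U1, U2, Enc, v).
    c, d, h = pub
    k = secrets.randbelow(q - 1) + 1
    u1, u2 = pow(g1, k, p), pow(g2, k, p)
    enc = (pow(h, k, p) * r) % p
    a = alpha(u1, u2, enc)
    v = (pow(c, k, p) * pow(d, k * a, p)) % p
    return u1, u2, enc, v

def consistent(priv, ct):
    # Consistency test: any tampering with (U1, U2, Enc) changes alpha and
    # breaks the relation with v, so the same check doubles as an integrity test.
    x1, x2, y1, y2, _ = priv
    u1, u2, enc, v = ct
    a = alpha(u1, u2, enc)
    return (pow(u1, x1 + y1 * a, p) * pow(u2, x2 + y2 * a, p)) % p == v

pub, priv = keygen()
ct = encrypt(pub, r=42)        # 42 stands in for an encoded data block
assert consistent(priv, ct)    # an untampered ciphertext passes the test
```

This consistency property is what Sect. 6.1 later exploits for direct integrity verification.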

Later, to create the signature for data block ‘R’, the block is hashed using a collision-resistant one-way hash function (HF), which produces an ‘L’-bit hash value. A prime number ‘e’ is chosen from \(\frac{L + 1}{2}\) hash bits and a random string ‘β’ is chosen from \(\frac{L}{2}\) hash bits. Here, ‘e’ and ‘β’ are used to compute the signature for the data block ‘R’. Instead of choosing the prime number ‘e’ from ‘L’ bits, the proposed method chooses the value of ‘e’ from \(\frac{L + 1}{2}\) bits, which reduces the size of the signature by 50% without compromising the security. Algorithm 2 explains the signature generation and metadata creation for data block ‘R’ in detail.

figure b

After completing the data encryption, signature generation and metadata creation, the cipher text (U1, U2, Enc and v) of data block ‘R’ is sent to the cloud storage server, and the corresponding metadata MDn = {αn, T, Bid, Uid} and signature (ye) are sent to the Third Party Auditor. In the metadata, αn represents the hash value of the data block, ‘T’ represents the time of data encryption, Bid represents the data block identification number and Uid represents the user identification number.
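Algorithm 2 is likewise shown only as a figure, so the sketch below gives one plausible reading of the signature and metadata step based on the quantities named in the text: the L-bit digest of HF, its halves H1(R) and H2(R), the prime ‘e’, the random string ‘β’, and the product later written out in Eq. (5) of Sect. 7.1. The modulus size, the use of SHA-256 as HF, the way ‘e’ is derived from the digest, the signature tuple (e, β, y) and the sample Bid/Uid values are assumptions for illustration only.

```python
import hashlib
import secrets
import time

def is_probable_prime(n, rounds=20):
    # Miller-Rabin probabilistic primality test; sufficient for a sketch.
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(n):
    n |= 1
    while not is_probable_prime(n):
        n += 2
    return n

# Signer state (assumed sizes): an RSA modulus n = P*Q whose factors stay with
# the data owner, and four random quadratic residues x, h1, h2, h3 modulo n.
P = next_prime(secrets.randbits(256) | (1 << 255))
Q = next_prime(secrets.randbits(256) | (1 << 255))
n, phi = P * Q, (P - 1) * (Q - 1)
x, h1, h2, h3 = (pow(secrets.randbelow(n - 2) + 2, 2, n) for _ in range(4))

L = 256  # output length of the collision-resistant hash HF (SHA-256 here)

def sign(block: bytes, bid: str, uid: str):
    # Hash the block, split the digest into H1 and H2, derive a prime 'e' from
    # the first half of the digest (roughly (L+1)/2 bits), pick a random (L/2)-bit
    # string 'beta', and extract the e-th root of the signature product mod n.
    digest = hashlib.sha256(block).digest()
    H1 = int.from_bytes(digest[:L // 16], "big")      # first L/2 bits of HF(R)
    H2 = int.from_bytes(digest[L // 16:], "big")      # last  L/2 bits of HF(R)
    e = next_prime(H1 | (1 << (L // 2 - 1)))
    beta = secrets.randbits(L // 2)
    rhs = (x * pow(h1, beta, n) * pow(h2, beta ^ H1, n) * pow(h3, beta ^ H2, n)) % n
    y = pow(rhs, pow(e, -1, phi), n)                  # y = rhs^(1/e) mod n
    metadata = {"alpha_n": digest.hex(), "T": time.time(), "Bid": bid, "Uid": uid}
    return (e, beta, y), metadata

def verify(block: bytes, signature):
    # Verifier (TPA or CSP) recomputes the product and checks y^e mod n against it.
    e, beta, y = signature
    digest = hashlib.sha256(block).digest()
    H1 = int.from_bytes(digest[:L // 16], "big")
    H2 = int.from_bytes(digest[L // 16:], "big")
    rhs = (x * pow(h1, beta, n) * pow(h2, beta ^ H1, n) * pow(h3, beta ^ H2, n)) % n
    return pow(y, e, n) == rhs

sig, md = sign(b"U1|U2|Enc|v of data block R", bid="block-17", uid="user-001")
assert verify(b"U1|U2|Enc|v of data block R", sig)
```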

4.2 Dynamic public auditing

Upon receiving the metadata MDn = {αn, T, Bid, Uid} and the signature (ye) from the data owners, the Third Party Auditor (TPA) calculates the corresponding hash bits and stores them in the Bloom Matrix Hash Table. The hash value (αn) in the metadata may consist of two or more message digests. For example, if the data owner chooses three hash functions (n = 3), three distinct message digests of different lengths are produced for a single data block ‘R’. From these message digests, the TPA derives the hash bits and stores them in the Bloom Matrix Hash Table. Each entry in the BMHT is either 0 or 1: a value of 0 means that no hash value has been accommodated in that particular location, and a value of 1 means that a hash value has been accommodated in that location. The structure and scalability feature of the proposed Bloom Matrix Hash Table are explained in detail in Sect. 5.

At frequent intervals, the TPA challenges (Chal) the cloud service provider (CSP) to prove the integrity of the data blocks stored in the cloud storage. Algorithm 3 explains the proposed dynamic public auditing method. Here, the timestamp (Ts) in the challenge query is used to maintain the sequential order and Uid is used for user identification. The workflow of the proposed LDuAP scheme is shown in Fig. 2.

figure c
Fig. 2
figure 2

Workflow of the proposed LDuAP scheme

The cloud service provider performs two major tasks: (1) verifying the ownership of the challenged data block and (2) generating a proof and acknowledging the challenge raised by the TPA.

Verifying the data owner’s signature is important in cloud storage to maintain legitimacy, as it protects the cloud storage from unauthorized access. After verifying the signature of the data owner, acknowledging the challenge helps the TPA to ensure the integrity. Algorithm 4 explains the signature verification and integrity proof generation in detail.

figure d
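Because Algorithms 3 and 4 are given as figures, the short sketch below illustrates the challenge–proof round trip they describe: the TPA challenges with {Bid, Uid, Ts}, the CSP (after the ownership check) re-hashes the stored cipher text with the same hash functions, and the TPA accepts only if every derived bit position is already set in its BMHT. The message layout, the flat bit-position encoding of the BMHT and the sample identifiers are assumptions for illustration.

```python
import hashlib
import time

M = 1440 * 1440          # number of bit positions in the TPA's BMHT (illustrative)
HASHES = (hashlib.md5, hashlib.sha1, hashlib.sha256)   # the n = 3 hash functions

def bit_positions(cipher_text: bytes):
    # Derive one BMHT bit position per hash function from the cipher text.
    return [int.from_bytes(h(cipher_text).digest(), "big") % M for h in HASHES]

# --- TPA side: positions recorded earlier from the data owner's metadata ------
bmht_bits = set()                                   # stand-in for the 0/1 matrix
owner_cipher_text = b"U1|U2|Enc|v of data block R"  # as uploaded by the owner
bmht_bits.update(bit_positions(owner_cipher_text))

def tpa_challenge(bid: str, uid: str):
    # Algorithm 3: the challenge carries the block id, user id and a timestamp Ts.
    return {"Bid": bid, "Uid": uid, "Ts": time.time()}

def csp_prove(chal, stored_cipher_text: bytes):
    # Algorithm 4 (after the ownership check): re-hash the stored cipher text
    # and return the derived bit positions as the possession proof.
    return {"Bid": chal["Bid"], "positions": bit_positions(stored_cipher_text)}

def tpa_verify(proof):
    # Proof verification: every reported position must already be set in the BMHT.
    return all(pos in bmht_bits for pos in proof["positions"])

chal = tpa_challenge("block-17", "user-001")
print(tpa_verify(csp_prove(chal, owner_cipher_text)))         # True: intact
print(tpa_verify(csp_prove(chal, b"tampered cipher text")))   # False (w.h.p.)
```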

The proposed public auditing is efficient and secure. However, if the trustworthiness of the TPA is compromised, the TPA might hide the original auditing results generated by the proposed scheme and continuously send fraudulent results to the data owners. To overcome this issue, a cross-verification mechanism is introduced using a private, or direct, auditing method. The proposed LDuAP scheme allows the data owner to cross-verify the integrity results produced by the Third Party Auditor.

In existing private auditing schemes, the integrity of a data block can be verified only after decrypting the cipher text, which is time consuming in both computational and communication aspects. In contrast, the proposed scheme allows the data owners to verify the integrity without decrypting the data. The integrity of the data block can be verified directly using the following equation,

$$U_{1}^{\,x_{1} + y_{1}\alpha} + U_{2}^{\,x_{2} + y_{2}\alpha} = v.$$
(1)

The process of verifying the integrity using direct verification is explained in Algorithm 5.

figure e

Introducing private auditing in the LDuAP scheme slightly increases the computation overhead of the data owners. However, the IND-CCA2-secure Cramer-Shoup cryptosystem allows the data owners to perform private auditing without decrypting the data block. In addition, it reduces the trust issues associated with the TPA.

5 Structure of the proposed scalable Bloom matrix Hash Table (BMHT)

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The standard Bloom filter uses a one-dimensional array to map elements to bit positions. However, standard Bloom filters are not scalable: the size of the array used in the Bloom filter has to be predetermined. As far as the cloud is concerned, predetermining the number of data blocks to be stored in the cloud storage server is impractical.

Using a standard non-scalable Bloom filter in the TPA creates a bottleneck when handling a high velocity of input data. It drastically reduces the processing speed of the TPA and results in increased false positive error rates.

To overcome the issues of the standard Bloom filter, the proposed LDuAP auditing scheme creates a two-dimensional BMHT, which helps the TPA to perform public auditing effectively.

Instead of an array, the proposed BMHT uses a two-dimensional scalable zero-matrix of size m × n, where ‘m’ and ‘n’ are always equal. The initial size ‘m’ of the BMHT is calculated using the following formula,

$$m = \sqrt{\frac{-\,\text{Total number of data chunks} \times \ln(\text{False positive rate})}{(\ln 2)^{2}}}.$$
(2)

While generating the metadata, the data owner hashes the data block ‘R’ using ‘n’ one-way hash functions. The hash functions used in the proposed scheme are MD5, SHA-1 and SHA-2. The creation of the Bloom Matrix Hash Table in the TPA is shown in Fig. 3.

Fig. 3
figure 3

Creation of Bloom Matrix Hash Table

Upon receiving the metadata (αn) from the data owner, the TPA calculates the hash bits and stores them in the corresponding bit positions in the BMHT. Each entry in the BMHT is either 0 or 1: a value of 0 means that no hash value has been accommodated in that particular location, and a value of 1 means that a hash value has been accommodated in that location.

Scalability of BMHT To reduce the false positive errors in public auditing, the proposed BMHT is designed to increase its size from ‘m’ to ‘2*m’. The Third Party Auditor continuously monitors the remaining free slots, i.e., the number of bit positions with value 0 in the BMHT. If the free slots in the BMHT reach a certain threshold limit, a new and larger BMHT is created. If the growth rate of the input data is low, the newly created BMHT is two times bigger than the old BMHT, and four times bigger if the growth rate is high. Since the new matrix has more slots, its false positive rate is always lower than that of the old matrix.

When the filter reaches a certain fill-ratio threshold, a new matrix is created with a tightened error probability. The tightening ratio ‘r’ controls the growth of the new matrices and is always maintained between 0.8 and 0.9. Regarding the fullness criterion of a BMHT matrix, the maximum number of items that can be stored in a given matrix at a 50% fill ratio is found by the formula \(n = m \times \frac{(\ln 2)^{2}}{\left|\ln P\right|}\), where ‘m’ is the number of bits and ‘P’ is the probability of a false positive error. Algorithm 6 explains the creation and scalability of the BMHT, and Fig. 4 diagrammatically represents the scalability feature of the proposed BMHT.

figure f
Fig. 4
figure 4

Scalability on Bloom Matrix Hash Table
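To make the growth rule concrete, the following is a minimal sketch of a scalable BMHT: it derives one matrix cell per hash function (MD5, SHA-1 and SHA-256 standing in for SHA-2), monitors its fill ratio, and, once the 50% threshold is crossed, chains a larger matrix whose side is doubled and whose target error probability is tightened by the ratio r. The cell-derivation rule, the fixed doubling (the text also allows quadrupling under high growth) and the default parameter values are assumptions for illustration.

```python
import hashlib
import math

class BloomMatrixHashTable:
    """Minimal sketch of the scalable two-dimensional BMHT kept by the TPA."""

    HASHES = (hashlib.md5, hashlib.sha1, hashlib.sha256)  # SHA-256 stands in for SHA-2

    def __init__(self, side, fp_rate, tightening=0.85, fill_threshold=0.5):
        self.m = side                          # the matrix is m x m
        self.matrix = [[0] * side for _ in range(side)]
        self.fp_rate = fp_rate
        self.tightening = tightening           # r, kept between 0.8 and 0.9
        self.fill_threshold = fill_threshold
        self.set_bits = 0
        self.next = None                       # larger matrix, created on demand

    @classmethod
    def for_blocks(cls, expected_blocks, fp_rate, **kw):
        # Eq. (2): required number of bits, then the side of the square matrix.
        bits = -expected_blocks * math.log(fp_rate) / (math.log(2) ** 2)
        return cls(int(math.sqrt(bits)), fp_rate, **kw)

    def _cells(self, element: bytes):
        # One (row, column) cell per hash function.
        for h in self.HASHES:
            d = int.from_bytes(h(element).digest(), "big")
            yield (d // self.m) % self.m, d % self.m

    def add(self, element: bytes):
        if self.set_bits >= self.fill_threshold * self.m * self.m:
            if self.next is None:              # grow: double the side, tighten p
                self.next = BloomMatrixHashTable(2 * self.m,
                                                 self.fp_rate * self.tightening,
                                                 self.tightening,
                                                 self.fill_threshold)
            self.next.add(element)
            return
        for row, col in self._cells(element):
            if self.matrix[row][col] == 0:
                self.matrix[row][col] = 1
                self.set_bits += 1

    def __contains__(self, element: bytes):
        here = all(self.matrix[r][c] for r, c in self._cells(element))
        return here or (self.next is not None and element in self.next)

bmht = BloomMatrixHashTable.for_blocks(expected_blocks=1024, fp_rate=0.01)
bmht.add(b"hash bits of cipher text block 1")
print(b"hash bits of cipher text block 1" in bmht)   # True
print(b"an unseen block" in bmht)                    # False with high probability
```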

5.1 Hash collision

The important issues that have to be considered while creating an index table are hash collisions and false positive errors. If the hash function maps two or more different elements, i.e., data blocks, to the same bit position in the hash table, a collision is said to occur.

To avoid hash collisions and the pigeonhole problem, the size of the index table should always be greater than the total number of data blocks stored in the cloud. Consider, at time t’, a data owner who wants to upload a dataset of 211 GB to the cloud storage. Initially, to reduce the computational complexity, the input dataset is chunked into multiple small data blocks. Here, the given dataset is chunked into 216,553 data blocks, each of 1 MB. The required size of the BMHT to store the hash bits of the 216,553 input data blocks is calculated from Eq. (2),

$$m = \sqrt{\frac{-\,216{,}553 \times \ln(0.01)}{(\ln 2)^{2}}} = \sqrt{2{,}075{,}686\ \text{bits}} \approx 1440.$$

So, the required size of the BMHT is 1440 × 1440. Table 1 represents the BMHT matrix size and the storage requirement of the BMHT for different input data sizes.

Table 1 Storage requirement of Bloom Matrix Hash Table
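The worked example above can be reproduced in a few lines (Eq. (2) with 216,553 one-megabyte chunks and a target false positive rate of 0.01; the truncation to an integer side length follows the text):

```python
import math

def bmht_side(num_chunks: int, fp_rate: float) -> int:
    # Eq. (2): bits required by the filter, then the side of the square matrix.
    bits = -num_chunks * math.log(fp_rate) / (math.log(2) ** 2)
    return int(math.sqrt(bits))

print(bmht_side(216_553, 0.01))   # 1440  ->  a 1440 x 1440 BMHT, matching the value above
```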

In the proposed LDuAP scheme, the data owners use three hash functions, MD5, SHA-1 and SHA-2, to hash the input data block ‘R’. The MD5 hash function produces a 128-bit message digest, and SHA-1 and SHA-2 produce 160-bit message digests. The collision probabilities of these hash functions are \(P(\text{Collision of MD5}) = \frac{1}{2^{128}}\), \(P(\text{Collision of SHA1}) = \frac{1}{2^{160}}\) and \(P(\text{Collision of SHA2}) = \frac{1}{2^{160}}\). So the probability of a total collision is \(P(\text{Collision}) = \frac{1}{2^{128}} \times \frac{1}{2^{160}} \times \frac{1}{2^{160}} = 1.375 \times 10^{-135}\), which is very low and practically negligible. Hence, the probability of a collision for two independent input data blocks is negligible in the proposed BMHT.
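A quick arithmetic check of the combined figure, using the digest lengths assumed above:

```python
from fractions import Fraction

p_md5 = Fraction(1, 2 ** 128)
p_sha1 = Fraction(1, 2 ** 160)
p_sha2 = Fraction(1, 2 ** 160)   # 160-bit digest, as assumed in the text

p_total = p_md5 * p_sha1 * p_sha2   # = 1 / 2**448
print(float(p_total))               # ~1.376e-135, i.e. about 1.375 x 10^-135 as stated above
```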

5.2 Probability of false positive errors in proposed BMHT

If an element maps to the same bit positions as another element in the BMHT, a false positive error is said to occur. For ‘k’ hash functions, ‘n’ inserted elements and ‘m’ bit positions, the probability of a false positive is calculated from,

$$P\left( \text{false positive} \right) = \left( 1 - e^{-\frac{kn}{m}} \right)^{k}.$$
(3)
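For reference, Eq. (3) expressed as a small helper, with the symbols following the standard Bloom-filter convention used above:

```python
import math

def false_positive_probability(k: int, n: int, m: int) -> float:
    # Eq. (3): k hash functions, n inserted elements, m bit positions in the BMHT.
    return (1.0 - math.exp(-k * n / m)) ** k
```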

The probability of a false positive in the proposed BMHT is derived for various input sizes. Collisions in the BMHT play a major role in increasing the probability of false positive errors. If the BMHT has ‘Cf’ collisions, then the probability of a false positive is \(P\left( \text{false positive} \right) = (1 - Cf)\) and the expected number of false positives due to collisions can be calculated by,

$$E\left( \text{false positive} \right) = \sum_{i = 0}^{n - 1} Cf\left( i \right).$$
(4)

where ‘i’ is the collision factor. From the above formula, it is clear that if collisions increase in the BMHT, the false positive rate also increases. As the proposed model uses independent hashing techniques with a negligible collision factor, it produces a very low false positive error rate. Table 2 represents the probability of false positives in the BMHT.

Table 2 Probability of False Positive error in BMHT

The acceptable false positive rate for an index table is 0.01. However, the probability of false positive errors in the proposed scalable BMHT ranges between 0.002 and 0.003, which is significantly lower than that of the standard Bloom filter. Table 3 shows the probability of false positive errors in the newly created BMHT.

Table 3 Probability of false positive error of newly created BMHT

The probability of a false positive error in the newly created BMHT is always lower than in the older BMHT. So, a sudden increase in the velocity of the input data will not affect the performance of the TPA.

6 Advantages of the proposed LDuAP scheme

In this section, the advantages of using the Cramer-Shoup cryptosystem, the BMHT and dual auditing in the proposed LDuAP scheme are discussed in detail.

6.1 Why is the Cramer-Shoup cryptosystem preferred over other encryption algorithms in the proposed LDuAP scheme?

Popular asymmetric encryption algorithms such as RSA, Paillier and ElGamal are malleable, i.e., it is possible for an adversary to transform a cipher text into another cipher text that decrypts to a related plaintext. The Cramer-Shoup cryptosystem, an extension of the ElGamal algorithm, is proven to be secure against indistinguishable adaptive chosen ciphertext attacks (IND-CCA2) and ensures non-malleability even against a resourceful attacker. To ensure the non-malleability of the cipher text stored in the cloud storage, the Cramer-Shoup cryptosystem performs a direct integrity verification by equating \(U_{1}^{\,x_{1} + y_{1}\alpha} + U_{2}^{\,x_{2} + y_{2}\alpha}\) with ‘v’ before every decryption. The proposed LDuAP scheme has incorporated this non-malleability feature of the Cramer-Shoup cryptosystem to perform direct integrity verification.

6.2 How does the Bloom Matrix Hash Table (BMHT) improve the performance of public auditing?

The standard Bloom filter uses a non-scalable one-dimensional array to map the elements. Since it is non-scalable, the size of the Bloom filter has to be predetermined based on the total number of input data blocks. However, in a cloud environment, predicting the number of real-time input data blocks is impossible. Using a standard non-scalable Bloom filter in the TPA creates a bottleneck in the cloud environment; it may drastically reduce the performance of the TPA and result in increased false positive error rates.

To overcome these issues, a scalable two-dimensional BMHT has been proposed in the LDuAP scheme. It effectively reduces hash collisions and false positive errors at the TPA and helps to increase the trust between the data owners and the third party auditor. The rate of false positive errors in the proposed BMHT is shown in Tables 2 and 3.

In addition, the false positive rate of the BMHT was tested with the IBM SPSS testing tool. To store 1 GB of user data in the cloud storage server, a two-dimensional matrix of size 99 × 99 is created by the TPA, i.e., it can store up to 9801 bits. The user data is chunked into 1024 data blocks (each of 1024 KB). The data blocks are encrypted using the Cramer-Shoup cryptosystem and uploaded to the cloud storage. Later, public auditing is performed by the TPA to ensure the integrity of the data blocks. From the 1024 data blocks, 750 public integrity results were collected, in which 742 results matched the actual value and the remaining 4 results fell into the false positive category. Tables 4 and 5 represent the results derived from the IBM SPSS testing tool.

Table 4 Case processing summary
Table 5 Cross-Tabulation of the proposed BMHT table

The results analyzed with IBM SPSS show that the false positive rate of the proposed Bloom Matrix Hash Table is very low. In addition, since the proposed BMHT supports scalability, the false positive rate can be reduced to a greater extent.

6.3 Importance of dual auditing scheme

In public auditing schemes, the proof verification, or integrity verification, is performed only on the premises of the TPA. The TPA maintains a Bloom Matrix Hash Table (BMHT) to map the elements received from the data owners to the corresponding bit positions. At any time ‘t’, the integrity of the data blocks stored in the cloud is confirmed only if the hash bits received from the data owner and the CSP are the same. Subsequently, the public integrity result is communicated to the corresponding data owners. If the trustworthiness of the TPA is compromised, the TPA may continuously send fraudulent public integrity results to the data owners. Existing public auditing schemes (Yu et al. 2016; Liu et al. 2015; Wang et al. 2011, 2018; Xiling et al. 2018; Zhang et al. 2019, 2020) do not allow the data owners to perform cross-verification. To reduce the trust issues between the data owners and the TPA, the proposed LDuAP scheme introduces a dual auditing scheme, which allows the data owners to perform cross-verification to ensure the correctness of the public auditing.

7 Analysis of the proposed LDuAP scheme

The primary goals of the proposed LDuAP auditing scheme are (a) privacy protection and data security in cloud storage, (b) cross-verification of the integrity results generated by the TPA and (c) signature size reduction, i.e., lightweight auditing.

7.1 Privacy protection and vulnerability assessment

Privacy protection and data security are two important features that have to be considered while creating an auditing scheme. Considering the privacy of the data owners in the cloud environment, the identity of the data owners has to be protected from internal and external adversarial attacks. In the proposed work, before uploading the data blocks to the cloud storage, the data owner generates a signature (ye) to prove the ownership of the data block. Later, the cipher text is uploaded to the cloud storage, and the signature of the data blocks is sent to the Third Party Auditor along with the metadata. The signature of the data owner is generated using,

$$y^{e} = \left( x \times h_{1}^{\beta} \times h_{2}^{\beta \oplus H_{1}(R)} \times h_{3}^{\beta \oplus H_{2}(R)} \right) \bmod n.$$
(5)

While creating the signature of the data owner, the data block is hashed using the collision-resistant one-way hash function ‘HF’, which makes it impossible for an adversary to acquire information about the data owners from the signature. Similarly, to secure the data blocks in the cloud storage, LDuAP uses the asymmetric Cramer-Shoup cryptosystem, which is proven to be secure against IND-CCA2 (indistinguishable adaptive chosen ciphertext attacks) under the Decisional Diffie-Hellman (DDH) assumption. The proposed LDuAP auditing scheme effectively protects the data owner’s privacy from internal and external adversarial attacks. In addition, the proposed LDuAP scheme is tested using the Coverity SAST (Static Application Security Testing) tool in the Polaris software integrity platform. The common weakness enumeration (CWE) results of the proposed LDuAP scheme are derived and shown in Fig. 5.

Fig. 5
figure 5

a Vulnerability assessment of the proposed cryptographic model in LDuAP scheme. b Vulnerability assessment of the data integrity process in the proposed LDuAP scheme

To assess the vulnerability of the proposed cryptographic model, the defect densities of weak encoding for password (CWE-261), missing cryptographic step (CWE-325), reversible one-way hash (CWE-328), small space of random values (CWE-334), use of cryptographically weak pseudo-random number generator (CWE-338) and use of password hash with insufficient computational effort (CWE-916) are derived using the Coverity SAST security testing tool.

Likewise, to assess the proposed dual auditing scheme, the defect densities of key exchange without entity authentication (CWE-322), improper verification of cryptographic signature (CWE-347), use of less trusted source (CWE-348), missing support for integrity check (CWE-353), download of code without integrity check (CWE-494) and improper enforcement of message integrity during transmission in a communication channel (CWE-924) are derived.

7.2 Cross verification

As discussed in Sect. 6.3, the proposed public auditing scheme does not assure the correctness of the integrity verification at all times. Hence, a direct verification method is proposed to ensure the correctness of the integrity results. For some random data blocks, the data owner generates a private challenge query Chalprivate and sends it to the Cloud Service Provider. After receiving Chalprivate, the CSP verifies the ownership of the data block and allows the data owner to perform direct verification as shown in Algorithm 5. Since the integrity is verified directly, the correctness of the proposed auditing scheme is assured and the trust issues associated with the TPA are reduced. However, it slightly increases the computation and communication overhead on the data owner’s side.

7.3 Lightweight auditing

To measure the computation overhead of the proposed LDuAP auditing scheme, four important factors are considered: (i) the size of the signature, (ii) the computation time to generate the signature and metadata, (iii) the computation time to challenge the CSP and verify the integrity proof, and (iv) the computation time to generate the integrity proof.

(i) Size of the signature In accordance with the signature generation process explained in Algorithm 2, each data block ‘R’ is hashed using the collision-resistant hash function ‘HF’, producing an ‘L’-bit hash value. This ‘L’-bit hash value is split into two equal halves, H1(R) and H2(R), each of L/2 bits. Later, the value of ‘e’ is chosen from a prime number of \(\frac{L + 1}{2}\) bits, and a random string ‘β’ is chosen from the \(\frac{L}{2}\)-bit hash values to compute the signature. Splitting the hash value into two halves reduces the value of ‘e’ and subsequently reduces the size of the signature. Choosing a prime number from either \(\frac{L + 1}{2}\) or \(\frac{L}{2}\) bits does not make any major difference to the security of the signature.

Consider the case where the hash values are not split into two halves: then the value of ‘e’ should be chosen from a prime number of (L + 1) bits, i.e., from any of the 161-bit prime numbers 159, 399, 493, 685, 709, 765, 973, 1011, 1099, 1263. Choosing a larger prime number for ‘e’ puts an extra load on the data owner. To reduce it, we split the hash value into two halves and choose a prime number of \(\frac{L + 1}{2}\) bits, i.e., 81 bits, so that the value of ‘e’ can be chosen from any of the 81-bit prime numbers such as 51, 63, 163, 205, 333, 349, 429, 433, 481, 553. Using a smaller prime value for ‘e’ in Eq. (5) produces a short signature. To be precise, the size of the signature is reduced by 50% compared to the existing schemes, which subsequently reduces the computation overhead on the data owner’s side. In addition, the signature size required to achieve 128-bit security is much smaller in the proposed LDuAP scheme than in the existing schemes. Table 6 compares the signature sizes required to achieve 128-bit security in various schemes. Likewise, the signature size and the time taken to generate a signature are compared with the existing RSA and BLS methods in Fig. 6.

Table 6 Required size of the signature to achieve 128 bit security
Fig. 6
figure 6

Time taken to generate signatures for 500 data blocks

(ii) Computation time to generate signature and metadata The time taken to create the signature for a single data block is 3MulQR + HashR + 3ExpQR + Modsign. Here, 3MulQR represents the multiplicative operations used to create the signature: the quadratic residues x, h1, h2 and h3 are multiplied with each other. Likewise, one collision-resistant hash function (HF), three exponentiation operations and one modulo-‘n’ operation are used by the data owner to create the signature.

On the other hand, to generate the metadata, the proposed LDuAP scheme uses ‘n’ hash functions to hash the cipher texts U1, U2 and Enc. Here, the value of ‘n’ is decided based on the total number of data blocks and the size of the BMHT. So, the total computation time required to create the metadata is nHashCT.

(iii) Computation time to challenge the CSP and verify the proof To perform public auditing, the Third Party Auditor challenges the Cloud Service Provider to prove the integrity of the data block, attaching a timestamp (Ts). Later, to perform the integrity proof verification, the TPA uses the space-efficient probabilistic BMHT. The proof verification is performed by comparing the hash bits received from the CSP with the hash bits received from the data owner. Challenging the CSP does not use any complex operations, and the computation time to insert and verify the hash bits in the BMHT is O(n).

(iv) Computation time for signature verification and proof generation To verify the signature of the data owner, the cloud service provider calculates the value of ‘ye’ as shown in Algorithm 4. In the proposed scheme, the time to generate the signature and to verify it are almost the same, i.e., 3MulQR + HashR + 3ExpQR + Modsign. Likewise, to generate the proof for the challenged data blocks, the CSP hashes the cipher text values using ‘n’ hash functions, so the required computation time is nHashCT. The computational overhead of the proposed system is compared with the existing schemes in Table 7.

Table 7 Computation overhead to perform dual integrity verification

(v) Computation time to perform private auditing Existing public auditing schemes do not support direct or private verification, because it puts an extra workload (in both computation and communication) on the data owners. However, to overcome the trust issues associated with the TPA, private auditing is introduced in the proposed LDuAP scheme.

In existing private auditing schemes, to perform integrity verification, the cipher text stored in the cloud has to be transferred to the premises of the data owner and then decrypted. This tedious process increases the computation overhead and the overall time of private auditing. In addition, transferring the cipher text between the data owner and the CSP to verify the integrity increases the data vulnerability and the possibility of network attacks.

To overcome these issues, the proposed LDuAP scheme allows the data owners to perform private integrity verification on the premises of the CSP without decrypting the cipher text. It reduces the computation and communication overhead of private auditing. To verify the integrity of the cipher text, the proposed private auditing scheme uses three addition operations and two exponentiation operations as shown in Eq. (1). The computation time required to perform direct verification is 3AddCT + 2Expkey.

In addition, the proposed private auditing scheme allows the data owners to verify the integrity of only a limited number of random data blocks. If the TPA performs ‘m’ audits in time ‘t’, the data owner is allowed to perform only ‘m/10’ direct audits. This further helps to keep the workload on the data owner under control.

8 Implementation and result analysis of the LDuAP scheme

The experimental analysis of the proposed LDuAP scheme is carried out on a private cloud. Eucalyptus v4.2.0 is installed on an Intel Xeon E5 2620 v4 processor with a processing speed of 2.1 GHz, 64 GB of RAM and 4 TB of storage space to create the private cloud. An open-source mHealth dataset, which consists of 172,824 vital-sign values of patients, is used to evaluate the performance of the proposed LDuAP scheme. The values in the dataset are sensed using various medical sensors.

The main objective of the proposed LDuAP scheme is to reduce the computation and communication overhead of integrity verification. In this section, to evaluate the total overhead of the proposed auditing scheme, the computational overhead on the data owner and on the TPA is measured separately. Likewise, the communication overhead of performing public and private auditing is measured and compared with the existing auditing schemes.

8.1 Computation overhead on the data owner

The data owner performs five major tasks in the proposed auditing scheme: (1) signature generation, (2) key generation, (3) metadata generation, (4) encrypting the data and (5) private auditing for random data blocks.

As explained in Sect. 7.3, the proposed scheme uses a lightweight signature to verify the ownership of the data, which reduces the computational overhead on the data owner and the TPA. The time taken to generate the proposed lightweight signature for a total of 500 data blocks is measured and compared with the existing RSA and BLS signatures in Fig. 6. Likewise, the computation time taken by the data owner to generate the asymmetric keys and metadata and to encrypt a total of 750 data blocks, each of 1 MB, is measured and shown in Fig. 7. In addition, the time taken to process a single data block on the data owner’s side is measured and compared with the existing schemes in Table 8.

Fig. 7
figure 7

Time taken on data owner to execute 750 data blocks

Table 8 Time taken by the data owner to execute a single data block

The time taken to generate the asymmetric keys in the proposed LDuAP scheme is slightly higher than in the existing schemes, because the proposed algorithm uses three exponentiation operations to generate the keys. However, the time taken for signature generation, encryption and metadata creation is low compared to the existing schemes (Yu et al. 2016; Xu et al. 2017; Xiling et al. 2018).

On the other hand, to perform private auditing for a random number of data blocks ‘R’, the data owner performs two major operations: (1) challenging (Chalprivate) the CSP directly using the block identification number (Bid) and the data owner’s signature, and (2) verifying the integrity of the cipher text on the premises of the CSP without decrypting it. The time taken to perform private auditing for 500 data blocks is measured and shown in Fig. 8.

Fig. 8
figure 8

Computation overhead to perform private auditing for 500 data blocks

8.2 Computation overhead on TPA

The Third Party Auditor performs two major tasks: (1) challenging the CSP to prove the integrity of the data blocks and (2) proof verification. The TPA uses the metadata to challenge the CSP and uses the BMHT to verify the proof generated by the CSP. The time taken to perform these operations for a single data block on the premises of the TPA is measured and shown in Fig. 9.

Fig. 9
figure 9

Computation overhead on TPA to perform public auditing

Likewise, the probability of false positive errors in the proposed BMHT is compared with the traditional Bloom filter. To measure the probability of false positive errors in the BMHT, a total of 500 data blocks ‘R’, each of 1 MB, is used. The number of hash functions (HF) used to hash the input data blocks is set to 3. The total size of the array used to measure the probability of false positive errors in the traditional Bloom filter is fixed at 500, i.e., the number of data blocks to be inserted. On the other hand, the initial matrix size (m × n) of the proposed BMHT is calculated using Eq. (2).

Later, when the filter reaches a certain fill-ratio threshold, a new matrix is created with a tightened error probability. The creation of a new, larger BMHT causes a sudden reduction in the probability of false positive errors in the BMHT. The probability of false positive errors in the proposed BMHT is compared with the traditional Bloom filter in Fig. 10.

Fig. 10
figure 10

Comparison of False positive errors in different schemes

8.3 Communication overhead of the proposed LDuAP auditing scheme

While considering the communication overhead of the proposed public auditing scheme, uploading the cipher text to the cloud storage server and uploading the metadata to the TPA are performed only once. However, the communication between the CSP and the TPA happens repeatedly to ensure the integrity of the data stored in the cloud. So, the communication between the TPA and the CSP is measured and represented in Fig. 11.

Fig. 11
figure 11

Communication cost of the public auditing

In summary, the performance of the proposed public auditing scheme is evaluated based on the computation and communication overhead of the proposed LDuAP scheme. It effectively reduces the size of the signature by 50% and subsequently reduces the computation overhead on the data owner. In addition, the proposed BMHT at the TPA effectively reduces the probability of false positive errors and increases the accuracy of public auditing.

9 Conclusion and future works

To verify the integrity of the outsourced data, auditing is performed at frequent intervals on the cloud storage servers. However, existing auditing schemes face three major problems: (1) trust issues associated with the TPA, (2) false positive errors in the index table that produce erroneous integrity results, and (3) the large size of the signature used to verify the ownership, which increases the computation overhead of the auditing. To overcome these issues, LDuAP, based on the Cramer-Shoup cryptosystem, is proposed. It combines public and private auditing to overcome the trust issues associated with the TPA. It also introduces a low-error index table called the BMHT, which reduces the false positive errors in the hash table and increases the accuracy of the public auditing. In addition, to reduce the computation overhead of the auditing, a lightweight signature is used for ownership verification. The computational overhead of the proposed LDuAP scheme is low enough for real-time applications. However, the size of the cipher text encrypted using the Cramer-Shoup cryptosystem is slightly larger than with existing encryption algorithms. In future work, the efficiency of the cloud storage can be improved by reducing the size of the cipher text produced by the Cramer-Shoup cryptosystem.