1 Introduction

Cloud computing is a distributed computing model that can host and provide various Internet services to customers. Its popularity has increased recently because of the many advantages it offers [1, 2]. Cloud computing is widely used in the commercial field for data storage and online applications. The main advantage of online services is that users can access data from multiple locations at any time. Cloud storage services relieve online service providers of storage complexity and high maintenance costs [3,4,5]. However, the cloud service provider cannot be trusted completely, because data may be modified on untrusted cloud servers. Data security is therefore the primary concern in cloud storage services [6, 7]. To protect sensitive data, users need to encrypt it before sending it to the storage cloud and then enforce access control by cryptographic methods [8]. Many techniques are available for keeping data secure, and cryptographic algorithms are especially helpful [9].

Many cloud computing encryption techniques are currently being researched in industry and academia. Securing files in the cloud and protecting private information is a significant task. Privacy preservation is the process of protecting sensitive data in the cloud [10]. It requires several security operations, such as key generation, encryption, and decryption. Consequently, various privacy preservation methods have been employed in existing research works to protect sensitive data [11]. Big data storage involves several challenging functions, such as multi-user access, maintenance, cost efficiency, and optimized storage. In addition, it does not support parallel computing techniques for integrating big data with cloud computing [12, 13]. Big data security in the cloud is very important, because large volumes of sensitive or confidential data from different domains may be attacked by malicious intruders [14]. Malicious users sometimes monitor different organizations’ data in order to steal it. Many researchers are developing architectures and frameworks to secure large volumes of data in the cloud [15]. To overcome these limitations, this paper proposes a privacy-preserving query retrieval system based on Elliptic Curve Diffie–Hellman (ECDH). The ECDH algorithm is used in the encryption and decryption functions to improve cloud storage security and to enable faster access to data in the cloud.

The rest of the paper is organized as follows. Recent research on securing big data, specifically in cloud-like environments, is described in Sect. 2. The proposed ECDH-based encryption and decryption of cloud storage data are explained in Sect. 3. The experimental analysis of the proposed and existing methods is presented in Sect. 4. Finally, Sect. 5 gives the conclusions and future enhancements of this work.

2 Literature review

Researchers have suggested a number of techniques for securing big data in cloud-like environments. Some of the significant contributions in the literature are briefly discussed below.

Yang et al. [16] presented IoT- and cloud-based cloud service systems. The method provided a good platform for reducing the complexity between cloud service providers and users, and an advanced encryption method protected the cloud data and prevented leakage of personal information. The proposed IoT- and cloud-based protocol minimized the computational cost incurred by bilinear pairings. However, the protocol only considers users belonging to the same group, and the number of users must be more than two.

Stergiou and Psannis [17] presented a survey of cloud and big data technologies and their security and privacy challenges. The work combined the functionality of the two technologies with the aim of examining the security benefits of integrating them. The big data cloud storage system uses the Advanced Encryption Standard (AES) algorithm, which provided more security and data privacy in cloud computing. However, the AES-based method falls somewhat short in data security and privacy because it performs no user verification.

Thangavel and Varalakshmi [18] presented an improved ElGamal-based asymmetric cryptosystem to overcome key handling problems in the cloud. It also provides high security for transferring the key file between owner and user, and offers better authentication and performance against attacks. The method supports both asymmetric and symmetric cryptosystems and improves security, cloud storage performance, and secure retrieval of cloud data. However, the encryption and decryption times increase as the character count increases.

He et al. [19] proposed a Privacy-Preserving Certificateless Provable Data Possession (PP-CLPDP) scheme for certificate management and privacy protection. The PP-CLPDP scheme uses the public cloud and provides data integrity for big data. It solves the certificate management and key escrow problems and preserves the integrity of big data. RSA is used to secure the data in cloud storage, and the proof challenge is verified. However, the computational cost of PP-CLPDP is similar to that of the existing CLPDP model, so further improvement is needed.

Song et al. [20] presented full-text retrieval with high privacy in cloud storage systems. The method used Bloom filters organized in a tree index to address storage issues. The Bloom-filter-based tree index measures the similarity between the encrypted documents and the query document through the membership entropies of index words, and a ranking algorithm finds the words in the membership index that are most related to the query. However, the storage space required for new documents increases the storage cost, which is relatively high.

Gnanaprakasam and Rajivkannan [21] proposed a double encryption technique combining RSA and the Optimal Elliptic Curve Cryptography (OECC) algorithm to encrypt user documents. The optimal key for ECC is selected with the cuckoo search algorithm. Once encrypted, a document is stored in the cloud together with an access structure that specifies which types of user may access it. The information is split and stored on two separate servers to reduce computation complexity, and greedy selection is applied to select an optimal key from the two servers. However, cuckoo search can become trapped in local optima and converges slowly, which increases the computation time of the method. The proposed ECDH method combines the elliptic curve with Diffie–Hellman key exchange to reduce computation complexity. Since the tags are created during secure retrieval index table generation, the challenge generation time is reduced.

Tewari and Gupta [22] developed an authentication protocol that uses bitwise operations to reduce communication cost and storage. Because the method is ultra-lightweight, its computation overhead is low. Analysis of the method shows that it provides untraceability. The method offers high security and processes IoT data for authentication. Its three main entities are the tag, the reader, and the database server. However, the method cannot withstand Denial of Service (DoS) attacks.

John and Thomas [23] analyzed various adversarial attacks used against malware detection classifiers. The study shows that existing defensive mechanisms such as simple retraining do not act as a defense in malware detection. A disadvantage of adversarial retraining is that, for malware, adversarial samples crafted by methods such as Jacobian-based attacks or the Fast Gradient Sign Method (FGSM) lie outside the training distribution.

Olakanmi and Dada [24] developed a semi-honest model that allows a client to perform privacy-preserving validation in the cloud without recomputing the outsourced computation. The client uses the morphism method to efficiently prove the correctness of the cloud data, and the method provides anonymity for the client when checking data integrity in the cloud. Modified Paillier encryption is used to increase data security. The method has lower computation overhead in verifying the correctness of data in the cloud. However, because it assumes a semi-honest model, it does not withstand attacks outside that model.

Azad and Navimipour [25] combined cultural and ant colony optimization to optimize makespan and energy consumption in the task scheduling problem. The experimental results show that the method performs better than other existing methods. However, its communication overhead needs to be reduced.

2.1 Problem definition

Existing data encryption techniques have limitations when processing data in the cloud. These problems are discussed below.

  1. The complexity of existing data encryption techniques is high and needs to be reduced, because it increases their execution time.

     a. Solution: elliptic curve cryptography, which is computationally lightweight, is applied for data encryption. The complexity of the proposed ECDH method is low, which reduces its execution time.

  2. Existing encryption techniques do not secure access pattern information, which affects privacy in the cloud.

     a. Solution: the proposed ECDH method also encrypts access pattern information. This increases security against attacks and protects the data in the cloud.

3 Proposed methodology

‘Big data’ refers to the large collections of complex distributed data created by all the digital sources available today. Research on big data facilitates scientific discovery and innovation. This paper proposes an ECDH algorithm that provides high security for the cloud storage environment. The key established by ECDH is used with a symmetric encryption method for efficient data encryption; that is, a Diffie–Hellman algorithm based on elliptic curves is used for cloud data security.

Figure 1 shows the cloud-based big data security model using the ECDH algorithm. The proposed architecture consists of four main components: the data owner, the encryption process, the decryption process, and the text retrieval process. The main responsibilities of the proposed big data security model are key generation and secure key sharing. The main benefit of using ECDH for key generation is that it uses smaller keys for encryption. A secure key procedure is also employed to generate the shared secret key. The pseudo-code of the proposed ECDH algorithm is described in the following sections, and the proposed block diagram is shown in detail in Fig. 2.

Fig. 1 Proposed architecture using big data security in cloud

Fig. 2 The proposed ECDH algorithm block diagram

The proposed ECDH algorithm first creates the private and public key pairs of the client and server. After initialization, the keys are shared securely between client and server using ECDH. Encryption and decryption are used to store and retrieve the user data. The input data are partitioned into several blocks; for each block, a secret key \({S}_{K}\) is created and the encrypted block is placed in the cloud. When the data are accessed, the encrypted blocks are decrypted and merged into a single document. The challenge and proof generation process verifies the blocks and grants authorized users permission to modify the data.
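The block-wise flow above (partition the input, derive a secret key \({S}_{K}\) per block, encrypt, then decrypt and merge) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the block size, the HMAC-based derivation of each \({S}_{K}\) from a shared master key, and the SHA-256 counter-mode keystream that stands in for the symmetric cipher are all hypothetical choices. A production system would use a vetted authenticated cipher such as AES-GCM.

```python
from __future__ import annotations

import hashlib
import hmac
import os

BLOCK_SIZE = 16  # hypothetical block size in bytes; the paper does not specify one


def block_key(master_key: bytes, index: int) -> bytes:
    """Derive the per-block secret key S_K from the shared master key (HMAC-based, assumed)."""
    return hmac.new(master_key, b"block-%d" % index, hashlib.sha256).digest()


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode: an illustrative stand-in for a real symmetric cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_document(master_key: bytes, document: bytes) -> list[tuple[bytes, bytes]]:
    """Partition the document into blocks and encrypt each block under its own S_K."""
    encrypted = []
    for start in range(0, len(document), BLOCK_SIZE):
        index = start // BLOCK_SIZE
        block = document[start:start + BLOCK_SIZE]
        nonce = os.urandom(12)  # fresh nonce per block
        stream = keystream(block_key(master_key, index), nonce, len(block))
        encrypted.append((nonce, xor(block, stream)))
    return encrypted


def decrypt_document(master_key: bytes, encrypted: list[tuple[bytes, bytes]]) -> bytes:
    """Decrypt every block with its S_K and merge the blocks into one document."""
    parts = []
    for index, (nonce, ciphertext) in enumerate(encrypted):
        stream = keystream(block_key(master_key, index), nonce, len(ciphertext))
        parts.append(xor(ciphertext, stream))
    return b"".join(parts)


if __name__ == "__main__":
    master = os.urandom(32)  # in the proposed scheme this would come from the ECDH exchange
    document = b"Sample Enron-style email body split into encrypted blocks."
    stored = encrypt_document(master, document)
    assert decrypt_document(master, stored) == document
    print(f"{len(stored)} blocks encrypted and merged back")
```

Deriving each \({S}_{K}\) from one shared key keeps key storage constant: only the master key has to be agreed through ECDH, and every block key can be recomputed from its index.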


The pseudo-code and the components of the proposed architecture are described in the following sections.

3.1 Data owner

An enterprise or individual that holds a large volume of private data is referred to as a data owner. The data owners’ servers have restricted storage space, because most computation and data storage facilities are provided by the cloud. The communication module provides unified database management by connecting all cloud databases. An SQL analyzer categorizes submitted user queries into read-only and read–write queries. The SQL distributor processes each query by selecting the most appropriate load balancing technique. The database management system synchronizes the entire database when a query modifies data, for efficient resource utilization.
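A minimal sketch of the SQL analyzer and SQL distributor described above is given below. It assumes that a query's leading keyword decides whether it is read-only and that round-robin over read replicas is the chosen load-balancing technique; `classify`, `SqlDistributor`, the node names, and the sample table and column names are hypothetical.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

# Leading keywords treated as read-only; anything else is routed as read-write.
READ_ONLY_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def classify(query: str) -> str:
    """SQL analyzer: label a query 'read' or 'write' by its first keyword."""
    words = query.strip().split(None, 1)
    first = words[0].upper() if words else ""
    return "read" if first in READ_ONLY_KEYWORDS else "write"


@dataclass
class SqlDistributor:
    primary: str          # hypothetical node that executes read-write queries
    replicas: list[str]   # hypothetical read-only replicas
    _next_replica: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._next_replica = itertools.cycle(self.replicas)

    def route(self, query: str) -> str:
        """Writes go to the primary; reads are spread round-robin over replicas."""
        if classify(query) == "write" or not self.replicas:
            return self.primary
        return next(self._next_replica)


if __name__ == "__main__":
    dist = SqlDistributor("db-primary", ["db-r1", "db-r2"])
    for q in ["SELECT * FROM messages", "select count(*) from employees",
              "DELETE FROM messages WHERE id = 1"]:
        print(dist.route(q), "<-", q)
```

Keyword-based classification is deliberately crude: anything it does not recognize (for example a `WITH ... SELECT` statement) is conservatively treated as a write and sent to the primary, which keeps the replicas consistent at the cost of some load-balancing opportunities.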

3.2 Dataset description

The Enron corpus database is used in this research work. The database includes four tables, whose entities are messages, employees, reference information, and recipients. The Enron Email dataset consists of 200,399 messages belonging to 158 users. Emails are randomly selected from it to build the experimental dataset.

3.3 Initialization (setup)

Setup(\(1^{k}\)): takes an implicit security parameter \(k\) and outputs the public parameters \(MPK\) and the master key \(MSK\). The CA selects a large prime \(p\), bilinear groups \(\left( G, G1 \right)\) of order \(p\) with a bilinear map \(e: G \times G \to G1\), generators \(g, h \in G\), a random \(y \in_{R} Z_{p}\), and \(t_{i,j} \in Z_{p}\) for \(i \in [1, n], j \in [1, n_{i}]\). The CA then computes \(Y = e(g, h)^{y}\) and \(T_{i,j} = g^{t_{i,j}}\) for the same ranges of \(i\) and \(j\). The public parameters and master key are given in Eq. (1),

$$\left\{ \begin{gathered} MPK = \left( {e,g,h,Y,T_{i,j} \left( {i \in \left[ {1,n} \right],j \in \left[ {1,n_{i} } \right]} \right)} \right) \hfill \\ MSK = \left( {y,t_{i,j} \left( {i \in \left[ {1,n} \right],j \in \left[ {1,n_{i} } \right]} \right)} \right) \hfill \\ \end{gathered} \right.$$
(1)

where \(Z_{p}\) denotes the integers modulo the large prime \(p\). Assume that \(t\) and \(t^{\prime}\) are two different universal hash functions, modeled as random oracles, that map \(\{0,1\}^{*} \times \{0,1\}^{*} \to Z_{p}\) such that \(t_{i,j} \ne t_{i,j}^{\prime}\); these values are known to the CA. The cloud server creates interlinked index nodes and provides the storage services.

3.4 Key sharing

Key sharing involves key generation (KeyGen), performed by a key generation algorithm. The key sharing algorithm is then run by the Central Authority (CA), which also supplies its inputs.

Given \(MSK\) and the attribute list \(L\) of user \(u\), the CA selects a random \(r \in_{R} Z_{p}\) and computes the secret key \(SK\) for user \(u\) as in Eq. (2).

$$SK_{L} = \left\{ h^{y + r},\; \forall v_{i,j} \in L : D_{i,j} = \left( T_{i,j} \right)^{r},\; g^{r},\; L \right\}$$
(2)

where \(v_{i,j}\) denotes an attribute in \(L\) and \(D_{i,j}\) is the key component generated for it. The secret key is shared between the data owner and the data user, and the generated key is used to encrypt and decrypt data [26].

3.5 Data encryption using elliptic curve Diffie–Hellman

The ECDH algorithm is proposed here for the encryption and decryption of messages. ECDH is a key agreement scheme: it lets two parties establish a shared secret over an insecure channel, from which symmetric encryption keys are derived. Each party holds a key pair: a public key, which is shared and used by the other party to derive the encryption key, and a private key, which is kept secret and used to derive the same key for decryption. Subsequent data transactions between the agreed parties in the channel use the derived key.

The sender learns the receiver’s public key \(G^{d}\), where \(d\) is the receiver’s private key. The sender then generates a fresh ephemeral value \(y\) and the associated value \(G^{y}\), and computes the symmetric key \(k\) with the key generation function \(\left( {KGF} \right)\) of Eq. (3). Given \(G^{y}\), the receiver obtains the same key, because \(\left( G^{y} \right)^{d} = G^{dy}\).

$$k = KGF\left( {G^{dy} } \right)$$
(3)
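To make Eq. (3) concrete, the following Python sketch runs the ephemeral–static exchange on a toy curve written additively (so \(G^{d}\) becomes \(dG\)). Assumptions: the textbook curve \(y^{2} = x^{3} + 2x + 2 \pmod{17}\) with generator \((5, 1)\) of order 19 is used only because its arithmetic is small enough to check by hand, and the \(KGF\) is taken to be SHA-256 of the shared point's x-coordinate. Neither choice comes from the paper, and the curve offers no security.

```python
import hashlib
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 with generator (5, 1) of prime order 19.
# Far too small to be secure; used only to make the arithmetic visible.
P, A = 17, 2
G = (5, 1)
N = 19  # order of G


def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # P + (-P) = infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, point):
    """Scalar multiplication k*point by right-to-left double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def kgf(shared_point):
    """Hypothetical KGF: hash the x-coordinate of the shared point to a 256-bit key."""
    return hashlib.sha256(shared_point[0].to_bytes(32, "big")).digest()


# Receiver key pair: private d, public dG (written G^d in Eq. (3)).
d = secrets.randbelow(N - 1) + 1
receiver_public = mul(d, G)

# Sender: fresh ephemeral y, publishes yG, derives k = KGF(y * dG).
y = secrets.randbelow(N - 1) + 1
ephemeral_public = mul(y, G)
k_sender = kgf(mul(y, receiver_public))

# Receiver: derives the same key from d * yG.
k_receiver = kgf(mul(d, ephemeral_public))
assert k_sender == k_receiver
```

Because \(y\) is fresh for every message, each transaction gets its own symmetric key even though the receiver's key pair is long-lived. A deployment would use a standard curve such as P-256 through a maintained cryptographic library rather than hand-written arithmetic.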

The ECDH algorithm is executed step by step for a data transaction between the sender (\(S\)) and the receiver (\(R\)). First, the elliptic curve domain parameters are generated. Next, each party selects a key pair consisting of a private key (\(d\)) and a public key (\(Q\)), derived as in Eq. (4),

$$Q = dG$$
(4)

where \(G\) is the curve generator (Eq. (3) writes the same operation multiplicatively as \(G^{d}\)). Let \(\left( d_{A}, Q_{A} \right)\) be the sender’s key pair and \(\left( d_{B}, Q_{B} \right)\) the receiver’s key pair. The public key \(Q\) is shared with the other party during communication. In ECDH, these values are points \(\left( x, y \right)\) on the elliptic curve. Each party computes the shared point \(\left( x, y \right)\), the sender as \(d_{A} Q_{B}\) and the receiver as \(d_{B} Q_{A}\), and the query is decrypted with the key derived from it. Equation (5) expresses this symmetry of ECDH.

$$d_{A} Q_{B} = d_{A} d_{B} G = d_{B} d_{A} G = d_{B} Q_{A}$$
(5)
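Eqs. (4) and (5) can be checked by hand on the textbook curve \(y^{2} = x^{3} + 2x + 2 \pmod{17}\) with generator \(G = (5, 1)\) of order 19. These are toy, insecure parameters, and the private keys \(d_{A} = 3\) and \(d_{B} = 5\) are hypothetical. Both parties arrive at \(15G\); since \(15 \equiv -4 \pmod{19}\) and \(4G = (3, 1)\), the shared point is the negation of \(4G\):

$$\begin{aligned} Q_{A} &= d_{A} G = 3G = (10, 6), \qquad Q_{B} = d_{B} G = 5G = (9, 16) \\ d_{A} Q_{B} &= 3 \cdot 5G = 15G, \qquad d_{B} Q_{A} = 5 \cdot 3G = 15G \\ 15G &= -4G = -(3, 1) = (3, -1 \bmod 17) = (3, 16) \end{aligned}$$

An eavesdropper sees only \(Q_{A}\) and \(Q_{B}\); recovering \((3, 16)\) requires solving the elliptic curve discrete logarithm problem, which is trivial on this 19-point group but infeasible on standard curves.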

The data owner stores files in the cloud storage system, and the files are encrypted immediately for security. The ECDH encryption involves several parameters: \(M\) is the message, \(MPK\) is the public parameter, \(A\) is the access structure of attributes, and \(DK\) is the data encryption key. Combining ECDH with the key generation parameter reduces communication cost and overhead. Equation (6) shows the encryption process,

$$CT = \left( {A,E = Enc_{DK} (M)} \right)$$
(6)

where \(CT\) is the ciphertext, that is, the encrypted data. To decrypt it, a user must hold a valid attribute set that satisfies the access policy. The access structure is assumed to be contained implicitly in \(CT\). An elliptic curve (EC) algorithm negotiates a session key at both ends of a communication with little data exchange while providing high security. The ECDH algorithm is more secure than the EC algorithm; it uses shorter keys, requires fewer resources, and computes faster. The ECDH algorithm is used to generate the keys for the cloud data.
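Eq. (6), together with the decryption of Eq. (7) and the access-policy condition, can be sketched as below. This is a simplification under stated assumptions: the access structure \(A\) is modelled as a set of attributes that must all be present (an AND policy), the policy is enforced by a plain check rather than cryptographically as in attribute-based encryption, and a SHA-256 keystream plus HMAC stands in for \(Enc_{DK}\). `encrypt`, `decrypt`, and `CipherText` are hypothetical names, and `decrypt` returns `None` where the text returns Ø.

```python
from __future__ import annotations

import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional


def _subkey(dk: bytes, label: bytes) -> bytes:
    # Separate encryption and MAC keys derived from DK (an assumption of this sketch).
    return hmac.new(dk, label, hashlib.sha256).digest()


def _stream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 counter-mode keystream standing in for Enc_DK; not a vetted cipher.
    chunks = (hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range((length + 31) // 32))
    return b"".join(chunks)[:length]


@dataclass(frozen=True)
class CipherText:
    access: frozenset[str]  # A: attributes that must all be held (AND policy, assumed)
    nonce: bytes
    body: bytes             # E = Enc_DK(M)
    tag: bytes              # HMAC over nonce and body, so a wrong DK is detected


def encrypt(dk: bytes, message: bytes, access: set[str]) -> CipherText:
    """Eq. (6): CT = (A, E = Enc_DK(M))."""
    nonce = os.urandom(12)
    stream = _stream(_subkey(dk, b"enc"), nonce, len(message))
    body = bytes(m ^ s for m, s in zip(message, stream))
    tag = hmac.new(_subkey(dk, b"mac"), nonce + body, hashlib.sha256).digest()
    return CipherText(frozenset(access), nonce, body, tag)


def decrypt(dk: bytes, ct: CipherText, attributes: set[str]) -> Optional[bytes]:
    """Eq. (7): return M if the attribute set S satisfies A, else None (the text's Ø)."""
    if not ct.access <= attributes:  # policy check only; ABE enforces this cryptographically
        return None
    expected = hmac.new(_subkey(dk, b"mac"), ct.nonce + ct.body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, ct.tag):
        return None
    stream = _stream(_subkey(dk, b"enc"), ct.nonce, len(ct.body))
    return bytes(c ^ s for c, s in zip(ct.body, stream))
```

For example, a message encrypted with access `{"finance", "manager"}` decrypts for a user holding `{"finance", "manager", "legal"}` but yields `None` for a user holding only `{"finance"}`. Carrying \(A\) in clear inside `CipherText` matches the assumption that the access structure is contained implicitly in \(CT\).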

3.6 Secure full text retrieval index

This section describes the retrieval of encrypted cloud storage data. Initially, the server in the cloud sets up the storage services. For retrieval, the cloud storage system is initialized with the global system parameters \(\left( H, m, k, p \right)\). \(H\) denotes the hash functions \(H_{1}, H_{2}, \ldots, H_{h}\), where each \(H_{i}: \{0,1\}^{*} \to [1, m]\) \(\left( 1 \le i \le h \right)\) maps an arbitrary string to an integer in the range 1 to \(m\). To maintain data security, the data owner encrypts files with symmetric encryption before outsourcing them. The owner stores a key pair \((key_{doc}, key_{index})\) locally and can distribute these keys to authorized users via secure channels. Before outsourcing a document \(d\), the data owner extracts all words from \(d\) and encrypts \(d\) with \({key}_{doc}\) using the ECDH algorithm. Equation (7) represents the decryption of a document retrieved by a user,

$$M = Dec_{DK} (E)$$
(7)

where \(E\) is the encrypted data of Eq. (6) and \(DK\) is the data encryption key. The user sends a query request to the cloud storage to decrypt documents sent by the data owner. Decryption takes the public parameters \(MPK\) as input. \(CT\) includes the access structure \(A\), and the attribute set \(S\) is determined by the secret key \(SK\). If \(S\) satisfies the access tree, \(CT\) is decrypted and the message \(M\) is returned; otherwise, the output is “Ø”. The decrypted result is then sent to the user from the cloud storage system. A huge volume of varied data (in structured or unstructured form) stored in cloud storage is termed a Big Data Cloud (BDC). In a BDC, huge volumes of data are shared among cloud users, so the ECDH algorithm is used to improve cloud security. It provides high data security in the cloud with low encryption, decryption, and key generation times. An experimental analysis of the performance of the proposed ECDH method and existing methods is described in the following section.
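The hash functions \(H_{1}, \ldots, H_{h}\) that map strings into \([1, m]\) are the ingredients of a Bloom filter, so the secure full-text index can be sketched as a keyed Bloom filter per document. Assumptions not taken from the paper: each \(H_{i}\) is HMAC-SHA256 under \(key_{index}\) with \(i\) prefixed to the word, keywords are lower-cased, and the defaults \(m = 256\), \(h = 4\) are arbitrary. The user computes a trapdoor (the bit positions of a query word) with \(key_{index}\), so the server can test membership without seeing the keyword; `KeyedBloomIndex`, `trapdoor`, and `build_index` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import hmac
import re


class KeyedBloomIndex:
    """Per-document Bloom filter over keywords; bit positions run from 1 to m."""

    def __init__(self, m: int = 256, h: int = 4) -> None:
        self.m = m                      # filter length
        self.h = h                      # number of hash functions H_1..H_h
        self.bits = [False] * (m + 1)   # slot 0 unused so positions match [1, m]

    def set_positions(self, positions: list[int]) -> None:
        for pos in positions:
            self.bits[pos] = True

    def matches(self, trapdoor_positions: list[int]) -> bool:
        """Server-side test: every queried position must be set."""
        return all(self.bits[pos] for pos in trapdoor_positions)


def trapdoor(key_index: bytes, word: str, m: int = 256, h: int = 4) -> list[int]:
    """H_i(word) for i = 1..h, each built from HMAC-SHA256 under key_index (assumed)."""
    positions = []
    for i in range(1, h + 1):
        digest = hmac.new(key_index, f"{i}:{word.lower()}".encode(), hashlib.sha256).digest()
        positions.append(int.from_bytes(digest, "big") % m + 1)
    return positions


def build_index(key_index: bytes, text: str, m: int = 256, h: int = 4) -> KeyedBloomIndex:
    """Data owner: extract the words of document d and insert them into its index."""
    index = KeyedBloomIndex(m, h)
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        index.set_positions(trapdoor(key_index, word, m, h))
    return index


if __name__ == "__main__":
    key = b"hypothetical key_index"
    idx = build_index(key, "Quarterly energy trading report for the west desk")
    print(idx.matches(trapdoor(key, "Energy")))  # True: inserted words always match
```

As with any Bloom filter, a match may be a false positive, but a stored keyword is never missed; the filter length \(m\) and the number of hash functions \(h\) trade index size against the false-positive rate.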

4 Experimental results and discussion

The simulation experiments were performed using CloudSim 3.0 PlanetLab on a PC with a 3.2 GHz i5 processor. The experimental data were taken from the Enron Email dataset, which consists of 200,399 messages belonging to 158 users. Different numbers of emails were randomly selected from the Enron Email dataset to form the experimental datasets. Each set of input keywords was randomly generated by the user. The cloud server then searched the database and extracted the relevant matching files. To evaluate the efficiency of the proposed algorithm, the following metrics were used: key generation time, encryption time, decryption time, and computation overhead.

  • Computation overhead: the computation overhead in the auditing phase comprises challenge generation, proof generation, and proof verification, and it also arises from the generation of private keys. It is calculated as in Eq. (8),

    $$computation\;overhead = n\left( {2\exp_{G1} + Mul_{G1} + Hash_{G1} } \right)$$
    (8)

    where \(n\) is the number of blocks in the common files, \(\exp_{G1}\) is one exponentiation in \(G1\), \(Mul_{G1}\) is one multiplication in \(G1\), and \(Hash_{G1}\) is one hash operation in \(G1\).

  • Challenge generation: to detect any modification of the stored data blocks or files made without user authentication, the user sends a challenge. For a received challenge, the storage server generates a proof message and forwards it to the user.

  • Proof generation: given the input files \(F\), an auditing challenge, and a set of corresponding authenticators, this step generates a proof \(P\). The proof \(P\) shows whether the cloud stores the files correctly.

  • Proof verification: this step takes the proof \(P\) and the public parameters of the system as input, and returns “success” for a valid proof and “failure” otherwise.
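Eq. (8) is a linear cost model in the number of blocks \(n\). A short sketch that evaluates it is given below; the per-operation costs are placeholders that would have to be measured on the target platform, not values reported in this paper, and `OpCosts` and `computation_overhead` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCosts:
    """Cost of one operation in ms; placeholder values, not measurements from this paper."""
    exp_g1: float   # one exponentiation in G1
    mul_g1: float   # one multiplication in G1
    hash_g1: float  # one hash operation in G1


def computation_overhead(n_blocks: int, costs: OpCosts) -> float:
    """Eq. (8): n * (2 exp_G1 + Mul_G1 + Hash_G1)."""
    if n_blocks < 0:
        raise ValueError("n_blocks must be non-negative")
    return n_blocks * (2 * costs.exp_g1 + costs.mul_g1 + costs.hash_g1)


if __name__ == "__main__":
    costs = OpCosts(exp_g1=1.0, mul_g1=0.25, hash_g1=0.5)  # hypothetical timings
    for n in (100, 200, 400):
        print(n, computation_overhead(n, costs))
```

With the placeholder costs of 1.0 ms, 0.25 ms, and 0.5 ms per exponentiation, multiplication, and hash, each block costs 2.75 ms, so 100 blocks give \(100 \times 2.75 = 275\) ms; doubling the number of challenged blocks doubles the overhead, and the two exponentiations dominate the per-block cost.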

Table 1 presents the key generation times of ECDH and the existing cryptographic methods. The proposed ECDH algorithm takes the least time to generate keys, whereas the existing RSA, MRSA, and MRSAC algorithms take the longest. ECDH uses a small key that requires little computation power and time to process the message. For a 2048-bit key length, the proposed method therefore needs 781 ms, whereas the existing method needs 8925 ms.

Table 1 Key generation time of proposed and existing methods

Figure 3 depicts the key generation time of the proposed and existing methods. The x-axis represents the key length (bits) and the y-axis the key generation time (ms). As the key length increases, the key generation time also increases. For every key length, the proposed ECDH algorithm has a lower key generation time than the existing methods, because it reduces the computational overhead of generating the keys for the encrypted message. Therefore, encryption and decryption keys are generated in minimal time.

Fig. 3 Key generation time

Tables 2 and 3 present the encryption and decryption times, respectively. The performance of the proposed and existing methods is evaluated for key lengths of 100, 128, 256, 512, 1024, 2048, and 4096 bits. The proposed ECDH algorithm performs better than the existing methods, taking the least encryption and decryption time. ECDH is a key exchange protocol that quickly exchanges the key between client and server. The existing methods, which are based on RSA, require a number of keys for encryption and in turn more computation time, whereas the proposed ECDH method generates fewer keys, which reduces the computation time. For a 4096-bit key length, the ECDH method has a decryption time of 187 ms, whereas the existing method needs 10,957 ms. Moreover, ECDH reduces the information delay between client and server and is faster than existing methods such as RSA, MRSA, and MRSAC.
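The relative improvements implied by the two single data points reported in this section (781 ms versus 8925 ms for key generation at 2048 bits, and 187 ms versus 10,957 ms for decryption at 4096 bits) work out as follows; they apply only at those key lengths:

$$\begin{aligned} 1 - \frac{781}{8925} &\approx 1 - 0.0875 = 0.9125 \;(\approx 91.2\%\ \text{less time}), &\quad \frac{8925}{781} &\approx 11.4 \times \text{faster} \\ 1 - \frac{187}{10957} &\approx 1 - 0.0171 = 0.9829 \;(\approx 98.3\%\ \text{less time}), &\quad \frac{10957}{187} &\approx 58.6 \times \text{faster} \end{aligned}$$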

Table 2 Encryption time performance of proposed and existing methods
Table 3 Decryption time of proposed and existing methods

Figures 4 and 5 show the encryption and decryption times, respectively. The x-axis indicates the key length in bits and the y-axis the time in ms. The proposed ECDH method takes less encryption and decryption time than the existing methods. Because the ECDH algorithm uses the key generation function, it reduces information loss and exchanges the key securely. The lower computation time comes from generating fewer keys for the encryption and decryption process than the existing methods do.

Fig. 4 Encryption time

Fig. 5 Decryption time

Table 4 presents the computation overhead for various numbers of blocks. Of the three phases, proof verification takes the longest and challenge generation the shortest [27, 28]. Compared to the existing technique, the ECDH algorithm takes less challenge generation and proof generation time. Challenge generation uses homomorphic tags. Because these tags have already been created during secure retrieval index table generation, the challenge process only fetches them from the table. This fetch is too fast to be measured in milliseconds, so the challenge generation time is reported as zero for all blocks.
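The point that challenge generation only fetches precomputed tags can be illustrated with the sketch below. It deliberately swaps in plain per-block HMAC tags for the homomorphic tags used in the paper, so the proof carries the challenged blocks themselves rather than a compact aggregate; `make_tags`, `challenge`, `prove`, and `verify` are hypothetical names. What it does show is that challenge generation is only index sampling and table lookups, with no cryptographic work, which is consistent with the near-zero challenge times in Table 4.

```python
from __future__ import annotations

import hashlib
import hmac
import random


def _tag(tag_key: bytes, index: int, block: bytes) -> bytes:
    # Binding the block index into the tag stops the server swapping blocks around.
    return hmac.new(tag_key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def make_tags(tag_key: bytes, blocks: list[bytes]) -> dict[int, bytes]:
    """Run once, at index-table generation time: one tag per stored block."""
    return {i: _tag(tag_key, i, b) for i, b in enumerate(blocks)}


def challenge(tag_table: dict[int, bytes], size: int, rng: random.Random) -> dict[int, bytes]:
    """Challenge generation: sample block indices and fetch their precomputed tags."""
    chosen = rng.sample(sorted(tag_table), size)
    return {i: tag_table[i] for i in chosen}


def prove(stored_blocks: list[bytes], challenged: dict[int, bytes]) -> dict[int, bytes]:
    """Proof generation (server): return the challenged blocks as the proof P."""
    return {i: stored_blocks[i] for i in challenged}


def verify(tag_key: bytes, challenged: dict[int, bytes], proof: dict[int, bytes]) -> bool:
    """Proof verification: 'success' (True) only if every returned block matches its tag."""
    for i, tag in challenged.items():
        block = proof.get(i)
        if block is None or not hmac.compare_digest(_tag(tag_key, i, block), tag):
            return False
    return True


if __name__ == "__main__":
    key = b"hypothetical tag key"
    blocks = [f"block {i}".encode() for i in range(10)]
    table = make_tags(key, blocks)
    chal = challenge(table, 3, random.Random(1))
    print(verify(key, chal, prove(blocks, chal)))  # True for untampered storage
```

The cost profile mirrors the one reported above: tag creation happens once up front, challenge generation is a lookup, and proof verification is where the hashing work (and, in a homomorphic scheme, the pairing work) is spent.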

Table 4 Computation overhead

Figure 6 shows the computation overhead of the proposed and existing methods. The x-axis shows the number of challenged blocks and the y-axis the computation overhead. The plot covers proof generation, proof verification, and challenge generation for the proposed ECDH method. Overall, the proposed ECDH algorithm achieved better results in terms of encryption time, decryption time, and key generation time.

Fig. 6 Computation overhead

5 Conclusion

The cloud computing paradigm has become popular recently because it can store huge amounts of data and offers flexible computation. To benefit from these advantages, many data owners outsource their data and data analysis operations (e.g., data queries, insertion, and modification) to the cloud. For security, a data owner may wish to encrypt data before outsourcing it. In this paper, the ECDH algorithm is used for cloud data security. User data encrypted with this algorithm are stored in the cloud storage, and stored data are retrieved with the decryption function according to the user query. The proposed ECDH method processes data with larger key sizes faster than existing algorithms. The performance of the ECDH algorithm was evaluated using encryption time, decryption time, key generation time, and computation overhead, and its execution time is around 70% better than that of the existing cryptographic methods. In future work, a mechanism for securely retrieving relevant data can be incorporated into the cloud storage.