1 Introduction

Computational capability has increased drastically over the past decade. Advances in networking are driving the rapid growth of web technologies and data centers [1]. The development of the Internet of Things (IoT) and big data is accelerating quickly and affecting all areas of technology, benefiting organizations as well as individuals [2]. IoT devices play a vital role in this growth. Big data can be characterized by three factors: velocity, volume, and variety. Gartner introduced these terms to describe the challenges of big data [3]. Analyzing IoT data is creating massive opportunities in domains such as smart cities, smart transportation, and health care.

The rapid growth of IoT devices makes big data analytics a challenging task. According to an estimate by IDC (International Data Corporation), the big data market will exceed US$125 billion by 2019. Big data analytics extracts useful information using various data mining techniques [4]. This information is useful for making business decisions and for revealing recent trends.

IoT data differs from big data collected through other systems. Because IoT data is collected from various sensors, it contains a lot of noise, heterogeneity, and variety [1]. Various studies estimate that the number of sensors will increase to 1 trillion by 2030, which will produce huge amounts of data [3]. Smart traffic, smart grids, intelligent logistics management, and intelligent buildings are some of the applications of IoT and big data.

Big data is a term that refers to large data sets that are complex in nature. Traditional data processing applications are not suitable for processing such data [2]. Big data consists of both structured and unstructured data, generated from sources such as social networking sites, health care applications, sensor networks, and many other organizations.

The Internet of Things is envisioned as an emerging trend with a lot of scope for research. The technology makes human life more comfortable and offers solutions to many problems related to logistics, transportation, urbanization, and the environment. It connects things in the physical world with the cyber world.

2 Introduction to Internet of Things and Big Data

2.1 Internet of Things

IoT is a platform in which devices communicate with one another over the internet. These devices contain various types of sensors and share information in a convenient manner. IoT has been termed the next-generation revolution and is being adopted in sectors such as smart cities, smart transportation, smart offices, smart retail, smart energy, and smart health care.

Figure 1 represents the big picture of the Internet of Things. It includes the communication mechanisms used in IoT, the different IoT gateways, the storage mechanisms used for the data produced by IoT devices, and the various applications that use the Internet of Things.

Fig. 1 Internet of Things big picture

Mobile devices, transportation vehicles, home appliances, health care devices, etc. are used for data actuation [3]. Wrist watches, doors, refrigerators, air conditioners, and microwave ovens are some examples of IoT devices. These devices are deployed in various geographical locations [5] and acquire real-time data. They are connected through communication mechanisms such as Bluetooth, Zigbee, and WiFi. Some 50 billion devices, such as laptops, smartphones, and sensors, are expected to connect to the internet.

The graph in Fig. 2 is generated from data provided by Cisco in 2011. It shows how rapidly the number of IoT devices is increasing in this decade.

Fig. 2 Growing of Internet of Things

Figure 3 shows that by 2020 there will be 50 billion connected devices for 7.6 billion people. It also shows that IoT is the combination of people, devices, and sensors interconnected with one another.

Fig. 3 Connected world

2.2 Big Data

IoT devices and various other software applications produce data continuously. This data consists of structured, unstructured, and semi-structured data, and in such huge amounts it is termed “big data”. Conventional databases are not sufficient to store this amount of data [6]. In simple terms, big data can be defined as data that cannot be handled by a single system. Conventional databases cannot be used for processing and analyzing data that is growing this rapidly.

Gartner proposed a model consisting of the 3V’s (volume, velocity, variety): the volume of data produced, the variety of data produced, and the velocity, or speed, at which data is produced [1]. Some investigations state that volume is the main characteristic of big data.

2.3 Big Data Analytics

Big data analytics examines large data sets containing a variety of data to provide useful business information and market trends. Analyzing this huge amount of data helps organizations obtain useful information. Big data analytics requires tools and technologies that can transform structured, unstructured, and semi-structured data into more useful data [46]. Scientists can analyze large volumes of big data using traditional tools.

2.4 Relationship Between Big Data Analytics and IoT

Big data analytics is used for decision making by analyzing the data that IoT devices produce continuously, and this large amount of data is stored using various storage technologies. Most of it is unstructured [7]. The analytic tools need to analyze the data at high speed so that business organizations can make decisions immediately. The need to adopt big data in Internet of Things applications is increasing dramatically. Figure 4 shows how big data and IoT depend on one another. As the usage of IoT devices increases, the use of big data also increases proportionally [8–10]. The combination of these two technologies provides good opportunities in business and research.

Figure 4 shows how big data analytics and the Internet of Things are interconnected. The figure consists of three phases. The first phase consists of IoT devices with sensors, which are interconnected with one another [11–14]. The second phase consists of different storage technologies: the data produced by IoT devices is stored on low-cost commodity hardware. This data can be called big data and has three main properties: volume, velocity, and variety. It is distributed among fault-tolerant databases. The third phase is the analytical phase, in which tools such as MapReduce, Spark, Skytree, and Splunk are used to analyze the data. These tools require a training data set. With the help of training data sets, queries are run to produce reports and result sets.

Fig. 4 Relationship between Internet of Things and big data analytics

2.5 Architecture of IoT for Big Data Analytics

Figure 5 represents the architecture of the Internet of Things for big data analytics. This architecture consists of seven layers. The first layer consists of IoT devices equipped with sensors. The second layer consists of communication technologies such as the Internet, Zigbee, WiFi, and Bluetooth.

Fig. 5 Architecture of IoT for big data analytics

The next layer is the cloud, which is built on commodity hardware. The data generated by IoT devices reaches the cloud through an IoT gateway and is stored there. The next phase is the big data analytics phase, in which the large amount of data stored on the commodity hardware is processed. The major purpose of this architecture is to provide ample business solutions.

3 Privacy of Big IoT Data

IoT devices capture sensitive information (personal details) about users, so security is one of the major issues for big IoT data. These systems depend heavily on third-party services. Traditional security solutions are not very effective in protecting this huge amount of data: existing security algorithms are designed mainly to secure static data, whereas the data produced by IoT devices is dynamic. The data can be protected at the data generation phase, the data storage phase, and the data processing phase. Information privacy means protecting the information of an individual from others. Security means using technology to protect the data from unauthorized recording, modification, and deletion.

3.1 Big Data Privacy at Data Generation Phase

Big data can be generated actively or passively. In active data generation, the user gives the data to the third party. In passive data generation, the data is generated by the user, but the user is not aware of whether it is being collected by a third party. Data can be protected at the data generation phase either by restricting access to it or by falsifying it.

a. Access Restriction

In most cases, users are not interested in sharing sensitive information. If the data is being provided passively, the user can take precautions to secure it, such as blocking advertisements, blocking scripts, and using encryption techniques.

b. Falsifying the Data

In several cases it is very difficult to protect sensitive information; in such cases, data falsification is used. In data falsification, the data is distorted using various tools. For example, when a credit card is used for online shopping, the Mask Me tool is used by most merchants.

3.2 Big Data Privacy at Data Storage Phase

Advances in big data technologies are overcoming the storage problem. However, if the security of a big data storage system is compromised, users' personal information will be disclosed. Traditional security mechanisms fall into four categories: data security schemes at the file level, the database level, and the media level, and encryption schemes at the application level. The big data infrastructure should be scalable. Storage virtualization can accommodate more than one application dynamically: several network storage devices are combined dynamically so that they appear as a single storage device. Data storage security and computation auditing security can be provided with the help of the SecCloud model.

3.3 Privacy Preservation Approaches for Cloud Storage

Three main factors must be considered when storing data securely in the cloud: integrity, confidentiality, and availability. Integrity and confidentiality relate directly to the security of the data. Availability means that authorized persons can access the data whenever they require it. There are some basic methods to meet these security requirements [15]. For example, the sender encrypts the data with the receiver's public key, and the receiver decrypts it using the corresponding private key. The mechanisms for ensuring the privacy of the data are attribute-based encryption, storage path encryption, homomorphic encryption, and the use of hybrid clouds.
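
The public-key example in this paragraph can be illustrated with a toy version of textbook RSA. This is only a sketch under stated assumptions: the primes, the exponent, and the function names are hypothetical choices for illustration, the numbers are far too small to be secure, and a real deployment would use a vetted cryptographic library with padding.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53                 # hypothetical small primes
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)       # Euler's totient of n
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

PUBLIC_KEY = (e, n)           # published by the receiver
PRIVATE_KEY = (d, n)          # kept secret by the receiver


def encrypt(message: int, public_key: tuple) -> int:
    """Sender side: encrypt an integer message (< n) with the receiver's public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Receiver side: recover the message with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, PUBLIC_KEY)
recovered = decrypt(ciphertext, PRIVATE_KEY)
```

Only the holder of the private exponent can undo the encryption, which is why the sender can safely use a key that everyone knows.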

3.4 Verification of Integrity of Data in Big Data Storage

When data is stored in a third-party cloud, the user has no control over it, so the data is at risk. In this scenario, the user needs to verify whether the data is stored properly in the cloud [16, 17]. This verification is called integrity checking. Several mechanisms are available for verifying the integrity of data: message authentication codes, digital signatures, checksums, trap-door hash functions, and Reed-Solomon codes. Integrity can also be verified by retrieving all the data stored in the cloud. Integrity verification has the highest priority among security concerns.
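
As a minimal sketch of the checksum approach, the user could keep one SHA-256 digest per block before uploading and compare digests after retrieval. The block size, the sample record, and the function names below are assumptions made for illustration. The digests are assumed to stay with the user; if they were stored next to the data, a keyed message authentication code or a digital signature would be needed instead.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(retrieved: bytes, expected: List[str], block_size: int) -> List[int]:
    """Return the indices of blocks whose digest differs, including missing or extra blocks."""
    actual = block_digests(retrieved, block_size)
    longest = max(len(actual), len(expected))
    return [i for i in range(longest)
            if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]]


# Hypothetical sensor record uploaded to a third-party cloud.
original = b"speed=42;lat=17.38;lon=78.48;t=1"
kept_locally = block_digests(original, block_size=8)

# Simulate the provider returning altered data.
tampered = original.replace(b"speed=42", b"speed=99")
```

Comparing per-block digests also tells the user which part of the data was altered, at the cost of still having to retrieve the data that is being checked.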

3.5 Privacy Preserving of Big Data in Data Processing

Batch processing, machine learning, stream processing, and graph processing are the big data processing paradigms [18, 19]. Privacy can be preserved in two phases. In the first phase, the data must be protected from disclosure to others; if it is disclosed, the user's personal or sensitive information is at risk. In the second phase, meaningful information must be extracted from the data without violating privacy.

4 A Secure Mechanism for Big Data Collection on Internet of Vehicles

The Internet of Vehicles is an extension of the Internet of Things and falls under the smart transportation domain. On the Internet of Vehicles, vehicles connect with one another and with the internet. This connectivity produces data of different dimensionalities [20], including a vehicle's location, its speed, and the route it has traveled. This information is collected by the vehicle's sensors. Its analysis carries huge research interest and is useful in traffic management [21]. The information may also include users' personal information; if it is not secured, users' privacy is at risk. Malicious vehicles may also transmit fraudulent information to disturb the traffic system intentionally, so precautions are necessary to keep malicious vehicles out.

In this security mechanism, vehicles first register at the big data centers so that they can be authenticated. Authentication uses a single sign-on algorithm. The basic architecture of the Internet of Vehicles is shown in Fig. 6. It consists of four blocks: vehicle nodes, roadside units, big data centers, and a storage module. The architecture also includes satellites [22]. Vehicle nodes communicate with one another and with the roadside units, which are also called slink nodes. Vehicles also communicate over the internet, and for that purpose they interact with the satellites. The data generated by the vehicles is collected by the slink nodes and transferred to the big data centers, where it is analyzed, and the resulting information is transmitted back to the vehicle nodes. This information indicates which routes have heavy traffic and which have light traffic [23]. Finally, the data is stored in the storage module.

Fig. 6 Architecture of internet of vehicles

A diagrammatic representation of this scheme is shown in Fig. 7. As the number of vehicles increases, the volume of data with different attributes also increases. This data is collected from different geographical locations and stored in the big data centers, which are distributed storage systems based on Hadoop architectures. In the first phase, all vehicle nodes are authenticated: they register at the big data centers and exchange the required information with them. After the registration phase, vehicles log on to the big data centers using a single sign-on algorithm. Data is then exchanged continuously until the vehicle logs out of the system.

Fig. 7 Collecting data in a secure manner

In this methodology, all vehicles first register at the big data center to enter the network. In the second phase, authentication is performed using a single sign-on algorithm [24]. In the next phase, the collected information is transferred securely and efficiently. In the final phase, the collected information is stored using distributed storage.

4.1 Initialization Phase

Each vehicle is equipped with a certificate issued by a third-party certification authority. In this phase, the vehicles register with the big data centers. The vehicles and the big data centers each generate a public and private key pair. As shown in Fig. 8, certificates and public keys are exchanged between the vehicle node and the big data center, with the slink node acting as a mediator [25]. Once the certificates are verified, the vehicle is registered in the big data center.

Fig. 8 Exchanging of messages at initialization phase

4.2 First Time Log-on

This section describes the log-on procedures for slink nodes and vehicle nodes using the single sign-on algorithm.

In the slink node's sign-on phase, the ID of the slink node, a random number to guard against replay attacks, the message time stamp, and the slink node's signature are sent to the big data center. The big data center checks all these details. If the message is from a valid slink node, the big data center generates a unique session key. It then sends back a packet consisting of the random number and the session key (sc_key), encrypted with the public key of the slink node. The slink node decrypts the packet with its private key and obtains the session key. Table 1 lists the symbols used in Fig. 9.

Table 1 Symbols notations

Fig. 9 Logging on of slink nodes for the first time
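
The role of the random number and the time stamp in the sign-on message can be sketched as a freshness check on the big data center side. The class name, the 30-second window, and the in-memory set of seen values are assumptions for illustration; signature verification and session key generation are deliberately left out.

```python
import time
from typing import Optional, Set, Tuple


class ReplayGuard:
    """Rejects sign-on messages that are stale or that reuse a random number."""

    def __init__(self, max_skew_seconds: float = 30.0) -> None:
        self.max_skew_seconds = max_skew_seconds
        self._seen: Set[Tuple[str, bytes]] = set()   # (node ID, random number) pairs

    def accept(self, node_id: str, rno: bytes, tstamp: float,
               now: Optional[float] = None) -> bool:
        """Return True only for a fresh message that has not been seen before."""
        current = time.time() if now is None else now
        if abs(current - tstamp) > self.max_skew_seconds:
            return False                             # time stamp outside the window
        if (node_id, rno) in self._seen:
            return False                             # same random number replayed
        self._seen.add((node_id, rno))
        return True
```

A captured sign-on message that an attacker resends later fails one of the two checks: either its time stamp has aged out of the window or its random number has already been recorded.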

In the vehicle node's sign-on phase, the vehicle node sends a signed ticket to the slink node. The ticket contains the message time stamp, the vehicle node ID, and a random number generated by the slink node to guard against replay attacks. The slink node then forwards the same ticket, with its own signature added, to the big data center [26]. The big data center validates these details and passes three tickets to the slink node. The first ticket (X2) consists of the time stamp, the ID of the big data center, and a random number generated by the big data center, along with the big data center's signature (see Fig. 10).

Fig. 10 Vehicle nodes’ first-time log-on

$$\begin{array}{l}
\text{X2}: (\text{Tstamp} \parallel \text{ID} \parallel \text{rno})\,\text{cen\_sign} \\
\text{X3}: \text{E}_{\text{veh\_pk}}(\text{vc\_key}) \\
\text{X4}: \text{E}_{\text{sc\_key}}(\text{X2} \parallel \text{X3}) \\
\text{X5}: \text{E}_{\text{veh\_pk}}(\text{vs\_key}) \\
\text{X6}: \text{E}_{\text{vs\_key}}(\text{X2} \parallel \text{X3})
\end{array}$$

The second ticket (X3) consists of the session key between the vehicle node and the big data center, encrypted with the vehicle's public key [27]. The third ticket (X4) consists of the first and second tickets, encrypted with the session key between the slink node and the big data center. The slink node generates a session key between the vehicle node and the slink node (vs_key). X2 and X3 are encrypted with this session key, and the packet is forwarded to the vehicle node. The session key itself is also forwarded to the vehicle node, encrypted with the vehicle's public key. Table 2 describes the symbols used in this scheme.

Table 2 Symbols notations of vehicle node

4.3 Once Again Log-on

Because vehicle nodes are moving, they need to leave the slink node they are currently logged on to and log on to the next slink node they reach. When a vehicle node wants to access another slink node after leaving its first log-on slink node, the procedure discussed in this section is followed [28]. Figure 11 represents the communication between the slink node and the vehicle node. In this communication process, the session key is updated. The stored ticket X2 and the vehicle certificate are forwarded to the slink node with the vehicle's signature.

Fig. 11 Later log-on of vehicle node

The ticket X2 carries the big data center's signature, which shows that the ticket was issued by the big data center. The slink node then encrypts the session key (vs_key) with the vehicle node's public key and forwards it along with the slink node certificate (Slink_cert).

4.4 Collecting Data Securely

In the previous scenarios, a secure connection is established among the vehicle node, the slink node, and the big data centers. The data is divided into two categories: business data and confidential data. Business data is exchanged in plain text, and confidential data is exchanged securely [29]. Business data consists of information such as temperature. M4 is calculated by concatenating the vehicle node's ID with the business message M1. The hash value of M4 is used to calculate an HMAC. The HMAC allows tampering with the data to be detected, so the receiver can confirm that the data arrived unaltered. The same scheme is used to transfer data from the big data center to the vehicle node, where M2 is the business message sent from the big data center to the vehicle node (see Fig. 12).

Fig. 12 Exchanging of business messages for big data collection
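
A minimal sketch of the business message exchange follows, assuming that the HMAC is keyed with a session key such as vs_key and that SHA-256 is the hash function; the source specifies neither choice. The function names, the length-prefixed encoding, and the sample values are hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def build_business_packet(vehicle_id: bytes, m1: bytes,
                          session_key: bytes) -> Tuple[bytes, bytes]:
    """Form M4 = ID || M1 and an HMAC computed over the hash of M4."""
    # A one-byte length prefix keeps the ID and message boundary unambiguous.
    m4 = len(vehicle_id).to_bytes(1, "big") + vehicle_id + m1
    digest = hashlib.sha256(m4).digest()
    tag = hmac.new(session_key, digest, hashlib.sha256).digest()
    return m4, tag            # M4 travels in plain text together with its tag


def verify_business_packet(m4: bytes, tag: bytes, session_key: bytes) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    digest = hashlib.sha256(m4).digest()
    expected = hmac.new(session_key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Hypothetical example values.
vs_key = b"\x11" * 16
m4, tag = build_business_packet(b"VEH-001", b"temperature=31C", vs_key)
```

The message stays readable by anyone on the path, but a change to either the ID or the reading invalidates the tag for a receiver that holds the session key.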

Confidential data, however, needs to be exchanged securely, so it is encrypted into cipher text. A random key, Z, is used for the encryption. To share the random key Z with the slink node and the big data center, vc_key and vs_key are used (see Fig. 13).

Fig. 13 Exchanging of confidential messages for big data collection
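
The idea of encrypting the payload under a random key Z and sharing Z under the session keys can be sketched as follows. The cipher here is a toy XOR keystream built from SHAKE-256, chosen only because it needs no third-party package; it is an assumption, not the cipher used by the scheme, and it provides no authentication. The packet layout and function names are likewise hypothetical, and a production system should use a vetted authenticated cipher.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream; the same call encrypts and decrypts."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def send_confidential(message: bytes, vs_key: bytes, vc_key: bytes) -> dict:
    """Vehicle side: encrypt under a fresh random key Z and wrap Z for each receiver."""
    z = secrets.token_bytes(16)
    nonce = secrets.token_bytes(12)
    return {
        "nonce": nonce,
        "ciphertext": toy_cipher(z, nonce, message),
        "z_for_slink": toy_cipher(vs_key, nonce, z),     # Z wrapped under vs_key
        "z_for_center": toy_cipher(vc_key, nonce, z),    # Z wrapped under vc_key
    }


def receive_confidential(packet: dict, wrapped_field: str, session_key: bytes) -> bytes:
    """Receiver side: unwrap Z with the session key, then decrypt the payload."""
    z = toy_cipher(session_key, packet["nonce"], packet[wrapped_field])
    return toy_cipher(z, packet["nonce"], packet["ciphertext"])
```

Because only Z is wrapped per receiver, the payload is encrypted once no matter how many parties need to read it.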

4.5 Data Storage Security

The previous sections discussed secure connection establishment and secure data collection [30]. This section describes how important data is stored securely in the big data center. Not all information related to a vehicle needs to be stored in the big data center, but some important information must be stored there securely [31]. Table 3 shows the data structure of the information to be stored, which covers both slink nodes and vehicle nodes. The first field is the ID and the second is the certificate; these two fields are used for identification. The third field is the status field, which takes one of two values, “on” and “off”. If any abnormal situation occurs, the status field changes from on to off. The next field is the time stamp period. If the time stamp period expires, the vehicle node or slink node must register again as a new node. The session key and public key are important for providing confidentiality to the data.

Table 3 Data structure of big data center for storing slink node and vehicle node data
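
The fields described for this data structure could be modeled as in the sketch below. The field names, the types, and the representation of the time stamp period as an expiry time are assumptions made for illustration, not the layout used by the authors.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ON = "on"
    OFF = "off"


@dataclass
class NodeRecord:
    """One stored entry for a slink node or a vehicle node."""
    node_id: str            # ID field, used for identification
    certificate: bytes      # certificate field, used for identification
    status: Status          # "on" normally, "off" after an abnormal situation
    valid_until: float      # end of the time stamp period (seconds since the epoch)
    session_key: bytes      # session key shared with the big data center
    public_key: bytes       # the node's public key

    def flag_abnormal(self) -> None:
        """Switch the status from on to off when something abnormal is detected."""
        self.status = Status.OFF

    def must_reregister(self, now: float) -> bool:
        """An expired time stamp period forces the node to register again as new."""
        return now > self.valid_until
```

Keeping the status and expiry alongside the keys lets the big data center refuse a node in a single lookup before any decryption is attempted.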

Business information is stored as plain text in the big data center [32]. Confidential information is encrypted with the session key between the vehicle node and the big data center (vc_key) before it is stored. The data can be retrieved only when the vehicle node itself interacts with the big data center [33]; otherwise, it cannot be retrieved. When the vehicle node interacts with the big data center, the data is decrypted with the same session key (see Fig. 14).

Fig. 14 Distributed storage system for big data collection

As the number of vehicles increases day by day, the data collected from them is also increasing rapidly. The Hadoop Distributed File System (HDFS) is a well-known system for storing big data. An HDFS cluster has one name node, and all the remaining nodes are data nodes. HDFS replicates data to more than one location to provide fault tolerance. Whenever a vehicle node wants to access data, it interacts with the big data center, and the client JVM requests the file name and block ID through the distributed file system [34]. The distributed file system interacts with the name node, which returns the block location and block ID [35]. Finally, the FSDataInputStream sends the block ID and byte range to the data node to acquire the data [36]. If the acquired data is business data, it is sent to the vehicle node as plain text; otherwise, it is sent as cipher text. The cipher text must be decrypted using the session key between the vehicle node and the big data center (vc_key).

5 Providing Security to Big Sensing Data Streams Using Dynamic Prime Number Based Security Verification

Real-time data processing is required in many applications, such as social networking applications like Facebook and Twitter, large-scale sensor networks, web exploration, financial data analysis, and surveillance data analysis [37]. Stream processing engines were introduced to process sensing data streams with a small delay. These engines process data in real time rather than after storing it, but they are not suitable for processing big data streams [38]. Such large quantities of data contain both structured and unstructured data. Because big data streams are continuous in nature, they need to be processed in real time [39]. The velocity and volume of this data are so high that it cannot all be stored, so conventional computing models are not suitable.

5.1 Security Verification of Data Streams

Big sensing data streams are used in critical applications such as military ones, so the data needs to be secured. Sensors have low processing power, little storage, and very limited energy [40]. The data streams need to be processed during the transmission phase itself, so securing the data is very important. Cryptographic models are used for this purpose. There are two kinds of cryptographic model: asymmetric and symmetric [41]. Asymmetric algorithms are much slower than symmetric ones, but symmetric cryptographic algorithms fail in many cases when securing streamed data.

Because symmetric-key cryptographic algorithms fail in many cases of big data streaming, the dynamic prime number based security verification (DPBSV) scheme addresses those challenges. In this scheme, the key is generated synchronously with the help of prime numbers, at regular intervals of time [42]. Prime number generation is performed at both the sensing device side and the data stream manager (DSM) side. Because the key is generated at both the source and the DSM, communication overhead is reduced [43]. The key is 64 bits in size; this smaller key allows faster processing of the streamed data without compromising security. The key is updated dynamically at both the source and the DSM.
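
The synchronized key update can be sketched as two identical key schedules, one at the sensor and one at the DSM, that advance through prime numbers in lockstep. The rule for picking the next prime, the truncated SHA-256 key derivation, and all names below are assumptions for illustration; they are not the functions defined by DPBSV.

```python
import hashlib


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small illustrative values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def next_prime(current: int, interval: int) -> int:
    """Hypothetical step rule: the first prime strictly above current + interval."""
    candidate = current + interval + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


class KeySchedule:
    """Derives a fresh 64-bit key from the current prime at every interval."""

    def __init__(self, secret: bytes, seed_prime: int, interval: int) -> None:
        self.secret = secret
        self.prime = seed_prime
        self.interval = interval

    def rotate(self) -> bytes:
        """Advance to the next prime and return the key derived from it."""
        self.prime = next_prime(self.prime, self.interval)
        material = self.secret + self.prime.to_bytes(8, "big")
        return hashlib.sha256(material).digest()[:8]     # 8 bytes = 64 bits
```

Because both sides start from the same secret, prime, and interval, each can compute the next key locally, so no key material has to cross the network after the initial setup.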

5.2 Architecture of Secure Data Stream

a. Data Stream Processing

Data stream processing is a revolutionary area, and many applications use it. In data stream processing, huge amounts of data need to be analyzed with a small delay, whereas conventional mechanisms analyze data after storing it [44]. The data is generated from various sources, and it is very difficult to handle data from all these sources at once [45]. In addition, the data blocks must undergo security verification in the DSM.

Figure 15 represents the architecture of the secure data stream. In this architecture, the data stream flows from various sensor devices to the cloud. The architecture focuses on three aspects: collecting, processing, and storing the data. The security-related and query-related processes are performed in the DSM (data stream manager). Security verification is performed first, followed by query processing [46]. Small buffers are maintained for both activities. In the final stage, the processed data is stored in the cloud [47]. The queries used for processing the data are continuous in nature, because the data flows continuously.

Fig. 15 Architecture of data stream

5.3 Purpose of Symmetric Key Cryptography

Symmetric keys are much smaller than asymmetric keys, so they require less computation power. A 128-bit symmetric key provides strength equal to that of a 3248-bit asymmetric key. The main aim for big sensing data streams is to secure them in real time, so symmetric-key cryptography is the best choice in this scenario. Symmetric-key cryptography is about 1000 times faster than public-key cryptographic algorithms. However, because the symmetric key is much smaller, an attacker can more easily attack data encrypted with it. To overcome this disadvantage, the keys are generated synchronously with a dynamic prime number based algorithm [48]. The keys are generated at both the sensing device end and the data stream manager at regular intervals of time. Figure 16 demonstrates this key generation scheme.

Fig. 16 Relative dynamic prime number generation

5.4 Setup of DPBSV System

Here the system is completely untrusted. The data stream manager (DSM) maintains all the sensor IDs and secret keys (see Table 4 and Fig. 17).

Table 4 Symbols notations and descriptions

Fig. 17 Secure authentication procedure between DSM and source sensing device

In this process, the sensor first sends its ID and a pseudo-random number to the DSM. After receiving these details, the DSM retrieves the sensor's secret key (Key_s) with the Retrieve function. It then generates a session key (Key_si) with the random function. The session key is combined with the secret key to produce a key (Key_enc) used for authentication. The session key and the generated key are encrypted with the shared key (key). The first hash value (H) is computed with the hash function. The computed hash value and the sensor's private key are passed to the sensor. The steps for computing H are given below.

$$\begin{array}{l}
\text{Key}_{\text{s}} \leftarrow \text{Retrieve}(\text{SID}) \\
\text{Key}_{\text{si}} \leftarrow \text{random}() \\
\text{Key}_{\text{enc}} \leftarrow \text{Key}_{\text{s}} \oplus \text{Key}_{\text{si}} \\
\text{M} = \text{Enc}_{\text{key}}(\text{Key}_{\text{si}}, \text{Key}_{\text{enc}}) \\
\text{H} = \text{Hash}(\text{Key}_{\text{enc}} \parallel \text{M} \parallel \text{rno})
\end{array}$$
(1)

The sensor then receives the hash value (H) and the key used for authentication (Key_enc). It uses these details to recover the session key from the authentication key and its own secret key. The session key and the authentication key (Key_enc) are encrypted to produce M′. The sensor computes Hash(Key_enc||M′||rno) and checks whether this value equals the hash value generated by the DSM. If M = M′ and the hash values are equal, the sensor has successfully authenticated the DSM. If authentication fails, the process begins again from step 1.

$$\begin{array}{*{20}l} {{\text{Key}}_{\text{si}} = {\text{Key}}_{\text{enc}} \oplus {\text{Key}}_{\text{s}} } \hfill \\ {{{\rm M}^{\prime}} = {\text{Enc}}_{\text{key}} \left( {{\text{Key}}_{\text{si}}, {\text{Key}}_{\text{enc}} } \right)} \hfill \\ {{\text{Hash}}\left( {{\text{Key}}_{\text{enc}} \parallel {{\rm M}^{\prime}}\parallel {\text{rno}}} \right)} \hfill \\ {{\text{M}} = {{\rm M}^{\prime}},{\text{ for authentication of DSM}}} \hfill \\ {{{\rm H}^{\prime}} = {\text{Hash}}\left( {1\parallel {\text{Key}}_{\text{enc}} \parallel {{\rm M}^{\prime}}\parallel {\text{rno}}} \right)} \hfill \\ \end{array}$$
(2)

H′ is then forwarded to the DSM, which compares the received value with Hash(1||Keyenc||M||rno). If the two values are equal, the sensor is authenticated successfully; if authentication fails, the protocol is terminated. In this way, the sensor and the DSM authenticate each other. After successful validation, the DSM sends another hash value to complete the protocol. This hash value, H″, is calculated as shown below.

$$\, {{\rm H}^{\prime\prime}}\;{\text{ = Hash}}\left( { 2\parallel {\text{Key}}_{\text{enc}} \parallel {\text{M}}\parallel {\text{rno}}} \right)$$
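The exchange of H, H′ and H″ can be sketched as below. This is a minimal illustration, not the authors' implementation: SHA-256, the ASCII encoding of the prefixes 1 and 2, and the function names `tagged_hash`, `sensor_respond` and `dsm_confirm` are assumptions made here. M and M′ are treated as opaque byte strings that each side has already computed.

```python
import hashlib
import hmac


def tagged_hash(prefix: bytes, key_enc: bytes, m: bytes, rno: bytes) -> bytes:
    """Hash(prefix || Key_enc || M || rno); an empty prefix gives H."""
    return hashlib.sha256(prefix + key_enc + m + rno).digest()


def sensor_respond(h, key_enc, m_prime, rno):
    """Sensor side: verify the DSM's H against M', then answer with H'.

    Returns None when the check fails (the process would restart).
    """
    expected = tagged_hash(b"", key_enc, m_prime, rno)
    if not hmac.compare_digest(h, expected):
        return None
    return tagged_hash(b"1", key_enc, m_prime, rno)  # H'


def dsm_confirm(h_prime, key_enc, m, rno):
    """DSM side: verify the sensor's H', then answer with H''.

    Returns None when the check fails (the protocol is terminated).
    """
    expected = tagged_hash(b"1", key_enc, m, rno)
    if h_prime is None or not hmac.compare_digest(h_prime, expected):
        return None
    return tagged_hash(b"2", key_enc, m, rno)  # H''
```

The distinct prefixes keep the three hash values different even though they cover the same Keyenc, M and rno, so a captured H cannot be replayed as H′ or H″. `hmac.compare_digest` is used for the comparisons because it runs in constant time.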

5.5 Handshaking of DPBSV

While the prime numbers are being calculated, the communication overhead must be kept low. The PF(RP) function is used to generate the prime numbers randomly on both sides, and these prime numbers have to be generated at regular time intervals. The DSM transmits the algorithms for generating the prime numbers, together with the keys (Keyd, it, RP, PF(RP), KeyGen, Keysh), to every individual sensor, encrypted with the shared key generated initially. The transferred information is stored in the trusted part of the sensor.
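One way to keep the overhead low is for both sides to derive each new prime locally instead of exchanging it. The sketch below illustrates the idea only. The rule used here (the smallest prime greater than the current prime plus the interval) is a hypothetical stand-in for PF(RP), whose exact definition is not given in this section, and trial division is chosen for brevity rather than speed.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small illustrative values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(current: int, interval: int) -> int:
    """Hypothetical PF(RP): smallest prime strictly greater than current + interval."""
    candidate = current + interval + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def prime_sequence(start: int, interval: int, count: int) -> list:
    """Primes produced by `count` successive updates from the same start."""
    primes = []
    current = start
    for _ in range(count):
        current = next_prime(current, interval)
        primes.append(current)
    return primes
```

Since the rule is deterministic, a sensor and the DSM that start from the same prime and interval obtain the same sequence without sending any further messages.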

After the handshaking process completes successfully, the data needs to be transmitted securely. Secure transmission and verification rely on several functions and keys. As discussed earlier, this scheme uses a dynamic prime number generation process, which runs on both the sensor side and the DSM side. Every sensor has its own key. Initially, the shared key and the prime numbers are generated by the DSM itself. Each subsequent prime number is generated from the current prime number and the interval time. The sensors generate the shared key using the formula Keysh = Hash(Enc(RP,keyd)). Each data block consists of two parts. The first part contains the encrypted data: the data, the secret key Keyi and the shared key keysh are combined by exclusive OR (XOR), giving DATA ⊕ Keyi ⊕ keysh. This part is mainly used for integrity checking. The second part, Si ⊕ keysh, is used for authentication checking. The resultant block is

$$\begin{array}{*{20}l} {({\text{DATA}} \oplus {\text{Key}}_{\text{i}} \oplus {\text{key}}_{\text{sh}} )\parallel ({\text{S}}_{\text{i}} \oplus {\text{key}}_{\text{sh}} )} \hfill \\ {{\text{Let I}}_{\text{d}} = {\text{DATA}} \oplus {\text{Key}}_{\text{i}} \oplus {\text{key}}_{\text{sh}} } \hfill \\ {{\text{A}}_{\text{d}} = {\text{S}}_{\text{i}} \oplus {\text{key}}_{\text{sh}} } \hfill \\ \end{array}$$

In the next step, the sensor sends the above data block in encrypted form:

$${\text{Enc}}_{\text{k}} \left( {{\text{A}}_{\text{d}} \parallel {\text{I}}_{\text{d}} } \right)$$
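The construction of the two block parts can be illustrated with the short sketch below. It assumes fixed 16-byte values for DATA, Keyi, keysh and the sensor identifier Si (padded where needed); these sizes and the function names are hypothetical. The outer encryption Enc_k is deliberately left out, because the chapter does not name the cipher, so the function returns the plaintext A_d || I_d that would be passed to it.

```python
from functools import reduce

BLOCK = 16  # assumed size in bytes of DATA, Key_i, key_sh and S_i


def xor_bytes(*parts: bytes) -> bytes:
    """Bytewise XOR of any number of equal-length byte strings."""
    if len({len(p) for p in parts}) != 1:
        raise ValueError("all operands must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*parts))


def build_block(data: bytes, key_i: bytes, key_sh: bytes, s_i: bytes) -> bytes:
    """Return A_d || I_d, the input to the outer encryption Enc_k."""
    i_d = xor_bytes(data, key_i, key_sh)  # integrity part
    a_d = xor_bytes(s_i, key_sh)          # authentication part
    return a_d + i_d


def split_block(block: bytes, key_i: bytes, key_sh: bytes):
    """Inverse of build_block: recover (S_i, DATA) from A_d || I_d."""
    a_d, i_d = block[:BLOCK], block[BLOCK:]
    return xor_bytes(a_d, key_sh), xor_bytes(i_d, key_i, key_sh)
```

XOR is its own inverse, so a receiver holding the same Keyi and keysh recovers Si and DATA by applying the same operations again.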

5.6 Security Verification of DPBSV

Security verification must be done in real time. Its main aim is to provide end-to-end security, and it is performed at the DSM side. During verification, the DSM checks whether the data has been modified and whether it comes from an authenticated node. The DSM first decrypts the data block so that integrity and authenticity can be checked. It authenticates every block, whereas integrity is checked at arbitrary block intervals. The interval may vary from 0 to 6, i.e., there are at least 0 and at most 6 blocks between two integrity checks.
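The verification policy described above (authenticate every block, check integrity only at random intervals of 0 to 6 blocks) can be sketched as follows. The function names, the set of registered sensor IDs and the use of Python's `random.Random` are assumptions for illustration; a deployment would draw the intervals from a cryptographically secure source. How the integrity check itself is carried out is not modelled here.

```python
import random

MAX_GAP = 6  # at most 6 blocks are skipped between two integrity checks


def integrity_check_indices(n_blocks: int, rng: random.Random) -> list:
    """Choose the block indices at which integrity is checked.

    Between two consecutive checks, 0 to MAX_GAP blocks are skipped.
    """
    indices = []
    i = rng.randint(0, MAX_GAP)
    while i < n_blocks:
        indices.append(i)
        i += 1 + rng.randint(0, MAX_GAP)
    return indices


def authenticate(a_d: bytes, key_sh: bytes, known_ids: set) -> bool:
    """Recover S_i = A_d XOR key_sh and test it against the registered IDs."""
    if len(a_d) != len(key_sh):
        return False
    s_i = bytes(x ^ y for x, y in zip(a_d, key_sh))
    return s_i in known_ids
```

Checking integrity on only a random subset of blocks reduces the work per block at the DSM, while the unpredictable spacing means a sender cannot know in advance which blocks will escape the check.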

Fig. 18 shows the updating of the shared key and the security verification of the data. The shared key is updated at both the source sensing device side and the data stream manager side, but security verification is done only at the DSM side.

Fig. 18 Updating of shared key and verification of security

6 Conclusion

In this chapter, we discussed two security algorithms. The first algorithm provides security for vehicular data. In this methodology, the vehicular nodes and sink nodes need to register at the big data center, a single sign-on algorithm is used to log in to the big data center, and symmetric key cryptography is used. The second algorithm provides security for sensor data. Here, the dynamic prime number based security scheme is used to provide security to the big data, with prime number generation carried out at both the source side and the big data center side.