ESSENCE: GPU-based and dynamic key-dependent efficient stream cipher for multimedia contents

Couturier, Raphaël; Noura, Hassan N.; Chehab, Ali

doi:10.1007/s11042-020-08613-2

ESSENCE: GPU-based and dynamic key-dependent efficient stream cipher for multimedia contents

Published: 31 January 2020

Volume 79, pages 13559–13579, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Multimedia Tools and Applications Aims and scope Submit manuscript

ESSENCE: GPU-based and dynamic key-dependent efficient stream cipher for multimedia contents

Download PDF

145 Accesses
2 Citations
Explore all metrics

Abstract

Data Confidentiality (DC) is considered one of the most important security services. Currently, a set of existing cipher algorithms is being used to ensure DC. However, researchers constantly investigate the design and implementation of more efficient cipher schemes. To this end, different versions of AES have been implemented efficiently on GPUs to increase the efficiency over big data. However, AES implementation on GPU exhibits limitations in terms of latency and hence, it might not be a suitable solution for high data rates in modern systems and applications. This often leads to a trade-off between system performance and security level. To address these challenges, we propose “ESSENCE”, a lightweight stream cipher scheme, which combines two different Pseudo-Random Number Generators (PRNG), and based on a dynamic key approach. The scheme achieves a high level of security with minimal latency and required resources when compared to existing cipher standards such as AES. Moreover, the implementation of the proposed dynamic key-dependent cipher scheme on GPU is more efficient compared to all existing AES implementations on GPUs. Experimental results indicate that the proposed cipher is highly efficient with a throughput more than 115 GB/s on a Titan X GPU, and more than 372 GB/s on a Titan V100 GPU. Thus, ESSENCE can be considered as a promising stream cipher candidate with high randomness degree (BigCrush of TestU01), periodicity, and key sensitivity.

ORSCA-GPU: one round stream cipher algorithm for GPU implementation

Article 16 February 2022

Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Stream cipher designs: a review

Article 10 February 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Security has become the most important armor responsible for protecting all kinds of resources and data from various types of threats targeting security services such as data confidentiality, integrity, and source authentication. These security services are typically ensured by resorting to cryptographic solutions, which are essential to overcome and limit such threats. Existing attacks can be either active or passive, where passive attacks can seriously impair the Data Confidentiality (DC) and privacy of the system, while active attacks can compromise its authentication (source, user, device), integrity, and availability. Moreover, the nature of passive attacks makes them very difficult to detect compared to active ones. An active attacker may insert, delete or modify data contents. Encrypting communicated or stored data can solve all problems related to passive attacks. However, this requires a distributed scheme and a robust key exchange mechanism. Typically, symmetric-key schemes are used for data encryption, especially since they are more efficient in terms of memory and computational complexity compared to asymmetric-key ones. Furthermore, conventional symmetric key ciphers are either block-based or stream-based. Several standardized cipher algorithms that ensure DC already exist, including the stream cipher RC4 [25], and the block cipher AES [17].

In general, a block cipher [23] uses a round function that can either be based on a Feistel Network (FN) such as DES, or on Substitution-Permutation Networks (SPN) such as AES. SPN lends itself to parallel implementation and requires a lower number of rounds compared to FN and hence, SPN exhibits lower latency and requires fewer resources than FN.

1.1 Related work

AES [6] is a block cipher that processes data in blocks of size 128 bits (16 bytes), and it uses keys of size 128, 192 and 256 bits. The design of AES depends on the SPN principle. It includes a round function, which consists of diffusion and substitution operations, and it is iterated r times, depending on the size of the secret key. The number of rounds, r, is equal to 10, 12, and 14 for a secret key of size 128, 192, and 256 bits, respectively.

Each round, except for the last, includes four operations:

RoundKeyAddition: it mixes the plain input block with the specific round key.
ByteSubstitution: the operation employs a substitution table, S-Box, to ensure the confusion property.
ShiftRows and MixColumn operations are used to ensure the diffusion property. Note that the MixColumn operation is eliminated in the last round.

1.2 Problem formulation

The security level of existing symmetric ciphers, against analytic attacks, depends on the number of rounds r, which leads to a trade-off between the security level and the required latency and resources. Ciphers that are based on a static structure have proven their resistance against analytic cryptanalysis. However, the static structure of the round function represents the main security issue. Moreover, since the cipher primitives are static, the required number of rounds r is high, where different substitution and diffusion operations are performed within each round [18, 20, 21].

Fixed cipher structures lend themselves to future potential attacks [4, 31], which would benefit from the fixed structure (substitution and diffusion primitives) to recover the secret key [27]. Examples of such attacks include implementation attacks such as side-channel attacks and fault attacks [27]. Hence, countermeasures against implementation attacks are required, which would increase the latency and required resources. This, in turn, reduces their performance and makes them not suitable for some of the future systems and applications [2].

1.3 Motivation

To overcome these limitations, our approach uses the dynamic key-dependent structure as in [18, 21] to reduce the required number of rounds and operations. This leads to a good balance between efficiency and security level, as well as offering a simple solution to prevent certain implementation attacks.

To reduce the execution time of the existing cryptographic algorithms, GPU (Graphic Processing Unit) implementations are being adopted. A GPU is useful for cryptographic algorithms, which can benefit from the hundreds and even thousands of cores in a GPU. Researchers use GPUs to generate pseudo-random numbers such as in [1, 13]. Also, standard cryptographic algorithms have been implemented on GPUs such as AES [9, 14, 15], which resulted in an impressive speed-up [5] compared to the CPU implementation. It is worth noting that the efficient implementation of an algorithm on a GPU requires the expertise to optimize the use of the GPU architecture in terms of shared memory, registers, and warp [22].

Recently, an optimized and efficient implementation of AES on GPU was presented in [15]. It achieved an excellent performance and the authors made various optimization compared to the previous related works. Accordingly, this implementation is selected as the reference for comparison against the proposed cipher solution. There is another recent implementation of AES on GPU, PHAST, which was described in [26]. This implementation is more generic and it resulted in about 10% decrease in performance as compared to [15].

1.4 Contributions

The proposed cipher solution follows the recent dynamic key-dependent approach of [7, 20, 21]. In contrast to these related solutions, no integer diffusion operation is used in the proposed dynamic key-dependent stream cipher. This operation is eliminated without weakening the cipher security level since the cryptographic primitives are updated for each new input data. Moreover, the proposed solution does not require the avalanche effect, but it is based on high key sensitivity.

The proposed cipher scheme uses an efficient and simple key-stream generation algorithm that uses dynamic permutation and substitution tables in addition to two different PRNGs with a large number of seeds. To the best of our knowledge, the proposed solution is the first dynamic key-dependent stream cipher algorithm with dynamic seeds and substitution/permutation tables.

Next, we list the technical contribution of this paper as compared to the existing cipher solutions:

The proposed cipher is based on a dynamic key-dependent approach, and it is based on a simple key derivation function that uses a variable session key and a Nonce, which change for each new input message, making it highly resistant against attacks.
The permutation table is used as a perturbation technique to modify the internal state, which increases the periodicity of the employed PRNGs.
The proposed solution uses a dynamic substitution process to increase the nonlinear degree of the generated key-stream and to achieve higher key sensitivity.
The proposed cipher exhibits a high level of randomness, which was verified using the “BigCrush” of “TestU01” [12] statistical suite tests on the generated key-stream.
The proposed cipher scheme uses lightweight PRNGs and simple operations, which minimizes the latency and required resources, and leads to a simple software implementation.

In summary, The proposed cipher satisfies the desirable cryptographic characteristics such as long periodicity, high level of key sensitivity, and high level of randomness and thus, higher resistance against attacks, with low latency and overhead.

1.5 Organization

The remainder of the paper is organized as follows. Section 2 describes and analyzes existing GPU cipher implementation. In Section 3, the proposed dynamic key derivation is presented. While in Section 4, the employed cipher primitives construction techniques are described. Then, in Section 5, we introduce and describe in details the proposed stream cipher algorithm, along with the functionality of each operation. In Sections 6 and 7, we respectively assess the robustness of the proposed cipher scheme and its performance. Finally, in Section 8, a conclusion and future directions are presented.

2 Existing GPU cipher algorithms and their corresponding implementations

A GPU (Graphic Processing Unit) is a commonly used architecture to accelerate computations. GPUs are used in many computing applications and systems ranging from smart-phones, embedded computing, to supercomputers. The architecture of a GPU is quite different from that of a CPU. In a GPU, the architecture is optimized to maximize the execution throughput of many simultaneous threads. The number of computing cores inside a GPU ranges from hundreds to even thousands. The hardware is designed to execute many threads, even if the bottleneck is the memory access itself. To benefit from the GPUs computing power, users need to use a number of threads that exceeds the number of cores. Hence, while some threads are waiting for their data, other threads are capable of executing. Typically, there are many kinds of memory in a GPU: global memory which is the slowest one, cache memory, texture memory, shared memory, local memory, and a limited set of registers having the fastest access. Consequently, memory management is critical within GPUs.

GPUs are composed of streaming multiprocessors (SMs), and their number varies for different GPU types. Typically, SMs are composed of 32 cores that can execute only a single instruction at a time. So, if two threads are executed on the same SM, one instruction will be executed, while the second one would have to wait. This is called thread divergence. Consequently, “IF” instructions and “WHILE” conditions must be avoided, whenever possible. Threads are scheduled by groups of 32 on an SM, and they are referred to as warps. In practice, threads are organized into blocks; depending on the GPU architecture, the maximum thread number per block is limited to 1024. Hence, it is important to keep in mind that GPUs technical details are constantly changing with every new generation.

3 Dynamic key derivation function

In this section, the proposed dynamic key generation function and the corresponding sub-keys generation schemes are presented and illustrated in Fig. 1. The cipher primitives (seeds and permutation boxes) are dynamic and they change based on this set of sub-keys. The specific secret key, SK, is mixed with a NONCE N_o (unique for each new input) to produce a dynamic secret, O. Then, the new dynamic key (DK) is obtained by hashing O using a secure cryptographic hash function. To ensure that a different DK is produced for each different input message or session, the SHA-512 hash function is chosen and it is known for its high resistance degree against collisions. The dynamicity introduces robustness against powerful attacks. The dynamic key, DK, is used to generate the required sub-keys as explained next.

Master Secret KeyK: It is shared between both legal entities to provide enhanced security. It allows the symmetric secret key to be renewed after each periodic interval, depending on the application itself. For example, Elliptic Curve Diffie Hellman (ECDH) protocols can be selected for this specific task.
NonceN_o: Each Nonce will be used only once; it is updated for every input image or session. Two possible Nonce generation techniques can be adopted i) generated by the sender and transmitted to the receiver in an encrypted form, by either employing a secret key or by employing the receiver public key; ii) producing the Nonce at the sender and receiver in a synchronized manner, through the use of a deterministic pseudo-random generator.
Dynamic KeyDK: The master secret key K is XORed with N₀, and the output is hashed using SHA-512. This generates the dynamic key, DK, which represents the MAC value with a size of 512 bits. Then, DK is divided into 4 main sub-keys {K_P,K_SKg₁,Kg₂} each with a size of 16 bytes (128 bits).

4 Construction of dynamic cipher primitives

The sub-keys are used to generate the required cipher primitives, as described below.

Permutation sub-keyK_P: it consists of the most significant 16 bytes of DK, and it is used to produce a set of permutation tables (32 P − boxes) that can be employed during the selection process. In this solution, any key-dependent permutation generation algorithm can be employed such as the ones in [19, 20]. The selection of the Modified Key Setup Algorithm (MKSA) of [19] is used to construct the required dynamic key-dependent permutation tables. In fact, MKSA is selected due to its simple hardware and software implementations. To ensure that a P-box has a good cryptographic performance, MKSA should be always iterated with a different key in order to produce different P-boxes. Therefore, K_P is used as a seed for the RC4 just to generate a set of permutation sub-keys, and each sub-key is used as a seed for the MKSA to produce a different permutation table. On the other hand, the weaknesses of RC4, as reported in [8, 11, 16] do not affect the proposed solution, which is based on a dynamic key-dependent structure.
RC4 will be iterated to generate a byte vector of length equals to Np × lp. Then, the output is reshaped to form a matrix with a size of Np × lp, where each row represents one of the dynamic permutation keys with a length of Qp = lp × 8 bits, to be used as a permutation table.
Note that RC4 is iterated with a dynamic sub-key to avoid any weakness and to achieve a high level of security.
Substitution sub-keyK_S: it represents the second set of the 16 most significant bytes, and it is used to produce a set of substitution sub-keys, where each sub-key is used to produce a dynamic substitution table (S-box). Any key-dependent algorithm could be used for the generation of the substitution tables. We adopt the simple technique used in [19], which is based on the Key Setup Algorithm (KSA) of RC4. The output of the original KSA, for any input key, is a substitution table that is used as a dynamic S-box. RC4 is iterated to form a byte vector of length equals to Ns × ls. Then, the output is reshaped to form a matrix of size Ns × ls; each row represents one of the dynamic substitution keys, with a size of Qs = ls × 8 bits, and used as a key-dependent substitution table.
First PRNG seedKg₁: it represents the third most significant 16 bytes of DK and it is used to produce a set of seeds of length lg1, one of which is selected for each thread. Also, in this step, RC4 is selected and it is iterated for $\frac {lg1 \times Qg1}{8}$ times to generate different N seeds, where N represents the possible number of threads, and Qg1 represents the precision of the first generator, which can be equal to 32, 64 or 128. The output key-stream is reshaped to form a byte matrix of size $N \times \frac {Qg1}{8}$. Each row of this matrix represents one of the seeds, and it has a length equals to Qg1 bits. Any repeated row (seed) is eliminated from this list and RC4 is re-iterated to produce a new seed.
Second PRNG seedKg₂: It represents the fourth most significant 16 bytes of DK and it is used to produce a set of seeds of length N, one of which is selected for each thread. Similarly, RC4 is selected and it is iterated for $\frac {N \times Qg2}{8}$ times to generate different N seeds. Besides, Qg2 represents the precision of the second generator. The output key-stream is reshaped to form a byte matrix of size $N \times \frac {Qg2}{8}$. Each row has a size of Qg2 bits, and represents one of the seeds. Any repeated row (seed) is also eliminated from this list, and RC4 is re-iterated to produce a new seed.

All notations are shown in Table 1. These steps guarantee a high level of sensitivity, where any tiny change in the dynamic key would result into a completely different cipher primitive in the generation process; such a change was proven in Section 6.2. The parameters’ derivation is illustrated in Fig. 1.

Table 1 Table of notations

Full size table

5 Proposed stream cipher algorithm

This section describes the proposed stream cipher, “ESSENCE”, which is designed with a single round to outperform AES. The main properties of the proposed solution are: high-security level, reduced computational complexity, and simple and parallel hardware and software implementations.

5.1 Basic Concepts

The proposed scheme is based on 3 main concepts:

Parallel Computing: This algorithm is designed to run in parallel. All the threads are independent of each other and they could be all executed in parallel (see Fig. 2), even if it is not possible to schedule all of them at the same time.
Fig. 2
Scheme of the proposed lightweight stream cipher algorithm for the i^th thread
Full size image
Multi-streaming multiprocessors (SM) contain each 32 syn-chrome threads and shared memory and hence, the same operation is applied on these syn-chrome threads but with different inputs. In the proposed scheme, every 32 threads are iterated to perform the same function. For example, the first PRNG with different seeds is iterated to produce 32 outputs, each represented by 32 bits.
Flexible Structure: The structure of the proposed stream cipher allows for any pair of outputs of the efficient PRNGs to be used [29]. Therefore, any PRNG that exhibits very good performance and satisfies the randomness properties could be used. For example, in this paper, we use “xoroshiro128plus” and “xorshift”, which were selected due to their simplicity and efficiency. The proposed solution uses both PRNGs since TestU01 can detect the link between all the threads, if only one PRNG is used.
Efficient & Lightweight Combination of both PRNGs The selected pairs of PRNGs are combined in an efficient manner (diffusion operation) to produce a key-stream with high periodicity, and a stable randomness degree. The proposed technique benefits from the shared memory of GPU, whereby the output of the first PRNG is stored in the shared memory. Then, the output of the second PRNG is mixed with two different outputs of the first PRNG.
Dynamic Selection of Shared Memories. These shared memories (O₀, and O₁) are selected according to dynamic permutation tables (32 different P-boxes).

In the following, we describe a set of possible pseudo-random generators that can be used in the proposed stream cipher.

5.2 xoroshiro128+ (XOR/rotate/shift/rotate)

It is a successor to Xorshift (implementation at xorshift128+). It uses a carefully handcrafted shift/rotate-based linear transformation, as shown in Algorithm 1. This PRNG ensures a significant reduction in latency and the corresponding resources. Also, this PRNG reaches a high level of randomness. It has a repetition period of (2¹²⁸-1), which is not long enough for cryptographic algorithms. Therefore, the proposed stream cipher scheme uses a higher number of threads and for each thread, a xoroshiro128+ PRNG is used. Also, a diffusion operation is applied to the output of three different xoroshiro128+ PRNGs (different for each iteration) to increase the periodicity of the proposed key-stream. The xorshift64 algorithm is presented in Algorithm 2.

5.3 Xorshift

Xorshift belongs to a class of PRNGs that is based on linear-feedback shift registers (LFSRs), which is described in Algorithm-2. Xorshift allows for an efficient implementation without the need of excessively using sparse polynomials. This makes them extremely fast on any modern computer architecture. Similar to LFSRs, the available parameters must be chosen with extreme caution in order to achieve a long period [24]. However, xorshift generators do not have non-linear steps. This makes them fail some statistical tests [24]. However, Xorshift generators do have numerous advantages including low execution time as well as a simple implementation.

5.4 Proposed encryption algorithm

Below, we describe the various steps of the proposed algorithm, as illustrated in Algorithm 3:

Note that the input data is stored in the d_input table, and the encrypted data (output) is stored in the d_output table. As the input is not changed, the keyword __restrict__ allows the compiler to optimize the variable’s access, which reduces the memory access time. Other unchanged variables also have this keyword during the execution of the algorithm.

The variable d_xoro is used to store the required internal values for the ”xoroshiro128plus” PRNG [3]. Each thread has a different value. To improve the performance, in many GPU algorithms, one is advised not to compute more than a single value per thread. Consequently, in our algorithm, the variable nb represents the number of elements that each thread is responsible for. The use of the loop is essential to reduce the number of threads used in the code to maximize the GPU’s occupancy. Without the loop, the performance would be diminished.

In the main loop, the xoro variable is used to select 2 permutation tables from the 32 generated ones. Note that we could have chosen bigger permutation tables. However, in this case, we would need to use the __syncthreads() instruction to synchronize threads on different warps. However, such an instruction reduces the performance significantly. These permutation tables are obtained using 32 P-boxes generated with the initial key provided to the proposed ESSENSE PRNG. So, the variable d_pbox contains 32 random permutations tables of size 32. Variable shmem is the shared memory that allows threads to exchange their values. It should be noted that each thread will have values coming from different permutation tables. For example, thread 0 will xor its result with threads 2 and 10, while thread 1 will xor its result with threads 3 and 8, and thread 2 will xor its results with threads 31 and 9, and so on. Moreover, at each iteration of the loop, the values of o0 and o1 change.

Then, the algorithm calls the xoroshiro128plus function, which changes the variable xoro, and puts the result into the variable res. Then, the shared memory is used to save this variable before xoring it with 2 other numbers generated by 2 other threads (according to the two permutation tables, as previously mentioned). The variable res is used as input to the second PRNG (xorshift64), and the result is saved in res2. Then, res2 is xor-ed with two other values coming from two other threads (in the same warp). Next, res and res2 are also xor-ed in order to obtain res3. Finally, a substitution table d_sbox is used to substitute 4 or 8 different bytes of res3 for an output of 32 or 64 bits, respectively. Note that the output is converted to an unsigned char table before applying the substitution operation on each element of the table. At the end of the loop, the internal value of xoro is saved for the next call of the function. Finally, it should be noted that nb_ele is the total number of threads, which depends on the size of the data to encrypt.

5.5 Proposed decryption algorithm

A legitimate receiver will use the same steps for decryption as the ones for encryption, and the same secret and Nonce to produce the specific dynamic key. This allows for the generation of the required cipher primitives. Then, the decryption algorithm proceeds in a similar manner to the encryption algorithm.

6 Security analysis

An efficient encryption algorithm should be able to resist the most known types of attacks such as statistical, differential, chosen/known plain-text, and brute-force attacks [19, 20]. Extensive experiments are performed in this section to demonstrate the efficiency and security level of the proposed scheme against such attacks. Note that the proposed solution can be used for any kind of data (structured or unstructured), but the following results are provided for multimedia image contents.

6.1 Statistical analysis

To guard against statistical attacks, a cipher must exhibit a high degree of randomness and uniformity [30]. To test the randomness degree, the following statistical security tests were carried out, (a) Probability Density Function (PDF) analysis, (b) Entropy analysis and (c) Correlation between plain and encrypted images.

6.1.1 Uniformity analysis

The most important test is the probability density function(PDF) of the encrypted image, which must be uniform; every symbol has a probability occurrence close to $\frac {1}{n}$, where n is the number of symbols. The PDFs of two original plain-images and their corresponding cipher images are shown in Fig. 3. It can be seen that the PDFs of the encrypted images are close to a uniform distribution, with a value close to 0.039 that is $\frac {1}{256}=3.9 \times 10^{-3}$.

6.1.2 Information Entropy Analysis

The information entropy, of a given image M, is a parameter that measures the uncertainty level in a random variable [32], and it is defined by:

$$ H(m)= -\sum\limits_{i=1}^{h^{2}} p(m_{i})\log_{2} \frac{1}{p(m_{i})} $$

(1)

The entropy is expressed in bits, and p(m_i) indicates the occurrence probability of symbol m_i, and NS the total number of symbols. If the entropy of the encrypted data is either equal to or close to log₂(NS), it can be considered as a true random source with a uniform distribution.

The Entropy analysis of the encrypted Lenna image, at the sub-matrix level with a dimension of 16 × 16, and by using a random dynamic key, is shown in Fig. 4. The results indicate that the encrypted images have an entropy similar to the desired value of 8. As such, the proposed cipher is sufficiently secure against any given entropy attack.

6.1.3 Independence

Removing any correlation between the sequence of elements is highly essential to ensure the robustness of the proposed cipher scheme [19]. Having a correlation coefficient close to zero means that the cipher scheme exhibits a high randomness degree. The correlation test is performed by randomly taking adjacent pixels from an original image and its corresponding encrypted image. This correlation can be done in horizontal, vertical and diagonal directions. The correlation coefficient r_xy is calculated using the following equation:

$$ \begin{array}{@{}rcl@{}} && r_{xy}={}{}\frac{cov (x,y)} {\sqrt{D(x)\times{D(y)}}}\\ && \text{where :} \\ && cov (x,y)=\frac{1}{N}\times \sum\limits_{i=1}^{N} (x_{i}-E(x))(y_{i}-E(y)) \\ && E_{x}=\frac{1}{N}\times \sum\limits_{i=1}^{N} x_{i} \\ &&{} D_{x}=\frac{1}{N}\times \sum\limits_{i=1}^{N} (x_{i}-E(x))^{2} \end{array} $$

(2)

The correlation results of the original and encrypted images, and for (2,000 pairs of adjacent pixels), are shown in Figs. 5 and 6, for one random key, and for 1,000 random keys, respectively. The results clearly show that the adjacent pixels of the plain image have a high correlation, close to 1. However, the coefficient correlation of the encrypted images tends is very low, close to 0, confirming the randomness property of the proposed cipher.

6.1.4 Plain data vs. encrypted data

The encrypted data should be very different from the original one, with a difference of at least 50%, at the bit level. According to the obtained result in Fig. 7a, the proposed cipher scheme satisfies the desirable difference results, with a percentage of at least 50% between the plain and the encrypted Lenna images.

6.2 Sensitivity tests

Differential attacks are based on studying the relation between two encrypted messages resulting from a slight change, such as a one-bit difference, between two original messages. The sensitivity tests must confirm that a small change in the plain-image or in the key affect the cipher image and generate a different one. The higher the difference, the better is the sensitivity of the encryption algorithm.

6.2.1 Key Sensitivity test

This is one of the most important tests, and it quantifies the sensitivity against a slight change in the secret key. The proposed key derivation function is based on a secret key and a Nonce. To further study the key sensitivity, two dynamic keys are used, DK₁ and DK₂, which differ by a single random bit. The two plain-images are then encrypted separately, and the Hamming distance of the corresponding encrypted images, C₁ and C₂, is computed and illustrated in Fig. 7b against 1,000 random dynamic keys. We can see that the majority of values are close to the optimal one (50%). This confirms the high key sensitivity of the proposed cipher algorithm. Additionally, the obtained results of 49.9970 are acceptable when compared to the reported ones of AES.

6.2.2 Plain-text sensitivity

Since a different dynamic key is being used for each input image, the algorithm produces a completely different cipher image for the same plain image. Hence, the proposed cipher successfully satisfies the avalanche criteria.

6.3 Visual degradation

This test is restricted to image and video contents, and it quantifies the visual degradation associated with the output of a cipher scheme. Two popular parameters are assessed to measure the visual quality: the Structural SIMilarity index (SSIM) [28], and the Peak Signal-to-Noise Ratio (PSNR) [10].

The PSNR is derived from the Mean Squared Error (MSE), which represents the cumulative squared error between the encrypted and original images. A low PSNR value indicates a high difference between the cipher and original images. On the other hand, SSIM lies in the [0,1] interval, where 0 means the absence of correlation between original and cipher images, while a value close to 1 indicates a high correlation between the original and cipher images. We measured PSNR and SSIM between the original and encrypted Lenna images for 1,000 random keys. The results are presented in Fig. 8a and b, respectively. It can be seen that the value of the PSNR is 9.23 dB, which is a low value and confirming the high difference between the original and encrypted images. Also, the SSIM values are always close to zero, which confirms that a high and hard visual distortion is achieved by the proposed cipher algorithm.

6.4 Cryptanalysis: Resistance against well-known types of attacks

In contrast to the majority of existing cipher solutions, our scheme is based on a dynamic key approach, with dynamic substitution, permutation and diffusion layers for each input data. Previous statistical tests (entropy analysis, probability density function, correlation tests) have confirmed the robustness of the proposed cipher scheme and its high resistance against statistical attacks. Moreover, the key sensitivity analysis demonstrated a high sensitivity against key-related attacks. These results are sufficient to infer that no useful information can be inferred from the encrypted data. On the other hand, the resistance against chosen/known plain-text attacks is verified due to the dynamic key approach, which drastically complicates the attacker’s task. As such, the problems of a single message failure and accidental key disclosure are avoided. Furthermore, differential and linear attacks are ineffective since any change in the dynamic key leads to a significant difference in the produced cipher primitive and in the encrypted message as well. Also, the key space of the secret key is of the order of 2¹²⁸, 2¹⁹² or 2²⁵⁶, which is sufficiently large to make brute-force attacks unfeasible. The same is true for the key space of the dynamic key, which is 2⁵¹². Note that a large secret key and a large dynamic key are being used since the difficulty of cipher-text-only attack is equivalent to one of the brute force attacks, making it impossible for a cipher-text-only attack to retrieve useful information from the cipher image.

6.5 Statistical tests with TestU01

As previously explained, ESSENCE has been tested with more than 100 seeds via TestU01 [12], and it successfully passed all the tests. In practice, a message of size 512*512 is typically used with all elements set to zero, and the key is initialized only once, at the beginning. All the other variables are also initialized once. Since TestU01 uses many pseudo-random numbers, the same message is used repeatedly over a very large number of iterations, with a single difference between iterations, the different numbers generated by the PRNGs.

7 Performance analysis

In this section, the cipher latency is quantified to assess the performance of the proposed cipher.

7.1 Experiments

To measure the performance of the proposed cipher, ESSENCE is evaluated on a Titan X GPU, which has the following characteristics:

Compute capability: 5.2
Global memory: 12,207 MB
GPU frequency: 1.25 GHz
Memory frequency: 3,505 MHz
Number of CUDA cores: 3,072

and on a Tesla V100 with the following characteristics:

Compute capability: 7.0
Global memory: 16,152 MB
GPU frequency: 1.53 GHz
Memory frequency: 877 MHz
Number of CUDA cores: 5,120

To compare the performance against the best AES implementation, we selected the implementation of [15], which uses the ECB operation mode, and we shall refer to it as AES-ECB. The performance tests are based on different 8-bit color images. Note that the throughput of AES-ECB is very close to the result in [26], 570.72 Gbps, which corresponds to 71.3 GBps.

The execution time of the encryption algorithm is the same as the one of the decryption algorithm (stream cipher). Note that our implementation is highly optimized, and the kernel operations of reading and writing an image take approximately the same time. The speed-up of ESSENCE compared to AES-ECB is shown in Tables 2 and 3 and in Fig. 9.

Table 2 Throughput of ESSENCE and AES-ECB on a Titan X GPU

Full size table

Table 3 Throughput of ESSENCE and AES-ECB on a Volta V100 GPU

Full size table

The obtained results indicate that the proposed cipher scheme is faster compared to AES-ECB, and the ratio varies between 1.4 and 2 depending on the message length on the Titan X, and between 2.4 and 2.9 for the Tesla V100. Therefore, the proposed cipher scheme is more suitable for real-time applications.

8 Conclusion

In this paper, we presented ESSENCE, a new dynamic, key-dependent, one-round stream cipher scheme with an efficient, parallel, and dynamic key-dependent structure, and which was designed targeting a GPU implementation. ESSENCE outperformed the most optimized implementation of AES on GPU, which makes it preferable for real-time applications. Moreover, the proposed cipher scheme offers a high degree of randomness, which was validated by quantifying the produced key-stream, which successfully passed the statistical tests of TestU01. Also, ESSENCE has a high periodicity since it combines the threads’ results of two PRNGs, which are then dynamically xor-ed based on 32 permutation tables, which are also generated and related to the dynamic key. Moreover, the implementation of ESSENCE is very simple compared to other existing cipher schemes. Equally important, the robustness of ESSENCE has been assessed and confirmed via cryptanalysis along with different benchmark tests. Note that other existing cryptanalysis techniques are designed to target static structures, which is not the case of the proposed scheme. In future work, the design of an efficient parallel dynamic key-dependent hash function for GPU will be investigated.

References

Bahi J, Couturier R, Guyeux C, Héam P-C (2015) Efficient and cryptographically secure generation of chaotic pseudorandom numbers on GPU. J Supercomput 71(10):3877–3903
Article Google Scholar
Beaulieu R, Treatman-Clark S, Shors D, Weeks B, Smith J, Wingers L (2015) The simon and speck lightweight block ciphers. In: 52nd ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, pp 1–6
Blackman D, Vigna S (2018) Scrambled linear pseudorandom number generators. CoRR, arXiv:1805.01407
Chen L, Zhang R (2008) A Key-dependent Cipher DSDP. In: 2008 International Symposium on Electronic Commerce and Security. IEEE, pp 310–313
Couturier R (2013) Designing Scientific Applications on GPUs. Numerical Analysis & Scientific Computating. Chapman & Hall/CRC, London
Daemen J, Rijmen V (2013) The Design of Rijndael: AES-the Advanced Encryption Standard. Springer Science & Business Media, Berlin
Fawaz Z, Noura Ha, Mostefaoui A (2016) An efficient and secure cipher scheme for images confidentiality preservation. Signal Process Image Commun 42:90–108
Article Google Scholar
Fluhrer S, Mantin I, Shamir A et al Weaknesses in the Key Scheduling Algorithm of RC4, Springer, Berlin
Guo G-l, Qian Q, Zhang R (2015) Different Implementations of AES Cryptographic Algorithm. In: IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) High Performance Computing and Communications (HPCC). IEEE, pp 1848–1853
Huynh-Thu Q, Ghanbari M (2008) Scope of validity of PSNR in Image/Video quality assessment. Electron Lett 44(13):800–801
Article Google Scholar
Klein A (2008) Attacks on the rc4 stream cipher. Designs Codes Cryptogr 48 (3):269–286
Article MathSciNet Google Scholar
L’Ecuyer P, Simard RJ (2007) TestU01: A C Library for Empirical Testing of Random Number Generators, vol 33
Article Google Scholar
Lee W-K, Cheong H-S, Phan RC-W, Goi B-M (2016) Fast Implementation of Block Ciphers and PRNGs in Maxwell GPU Architecture. Clust Comput 19(1):335–347
Article Google Scholar
Li Q, Zhong C, Zhao K, Mei X, Chu X (2012) Implementation and Analysis of AES Encryption on GPU. In: High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS). IEEE, pp 843–848
Lim RK, Petzold LR, Koç Ç (2016) Bitsliced High-performance AES-ECB on GPUs. In: The New Codebreakers. Springer, pp 125–133
Mantin I, Shamir A (2001) A practical attack on broadcast rc4. In: International workshop on fast software encryption. Springer, pp 152–164
Miller FP., Vandome AF., McBrewster J (2009) Advanced encryption standard. Alpha Press, Orlando
Noura H, Chehab A, Noura M, Couturier R, Mansour MM (2018) Lightweight, Dynamic and Efficient Image Encryption Scheme. Multimedia Tools and Applications 78:1–35
MATH Google Scholar
Noura H, Chehab A, Sleem L, Noura M, Couturier R, Mansour MM (2018) One Round Cipher Algorithm for Multimedia IoT Devices. Multimedia Tools and Applications 94:1–31
Google Scholar
Noura H, Sleem L, Noura M, Mansour MM, Chehab A, Couturier R (2018) A new efficient lightweight and secure image cipher scheme. Multimed Tools Appl 77(12):15457–15484
Article Google Scholar
Noura HN, Noura M, Chehab A, Mansour MM, Couturier R (2018) Efficient and secure cipher scheme for multimedia contents. Multimedia Tools and Applications 78:1–30
Google Scholar
Nvidia, CUDA. A C Programming Guide, version 9.0. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Paar C, Pelzl J (2009) Understanding cryptography: a textbook for students and practitioners. Springer Science & Business Media, Berlin
Panneton F, L’ecuyer P (2005) cOis On the xorshift Random Number Generators. ACM Trans Model Comput Simul (TOMACS) 15(4):346–361
Article Google Scholar
Paul G, Maitra S (2011) RC4 Stream cipher and its variants. CRC Press, Boca Raton
Book Google Scholar
Peccerillo B, Bartolini S, Koç ÇK (2017) Parallel Bitsliced AES through PHAST: a Single-Source High-Performance Library for Multi-Cores and GPUs. Journal of Cryptographic Engineering 9:1–13
Google Scholar
Stallings W (2017) Cryptography and Network Security: Principles and Practice. Pearson, Upper Saddle River
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Article Google Scholar
Wellons C (2017) Finding the Best 64-bit Simulation PRNG << null program. https://nullprogram.com/blog/21/09/2017
Xu S, Wang Y, Wang J, Tian M (2008) Cryptanalysis of Two Chaotic Image Encryption Schemes Based on Permutation and XOR Operations. In: 2008 CIS’08. International Conference on Computational Intelligence and Security. IEEE, vol 2, pp 433–437
Zhang R, Chen L (2008) A Block Cipher using Key-dependent S-box and Pboxes. In: 2008. ISIE 2008. IEEE International Symposium on Industrial Electronics. IEEE, pp 1463–1468
Zhang G, Liu Q (2011) A novel image encryption method based on total shuffling scheme. Opt Commun 284(12):2775–2780
Article Google Scholar

Download references

Acknowledgments

This paper is partially funded by the Maroun Semaan Faculty of Engineering and Architecture at the American University of Beirut and by the EIPHI Graduate School (contract “ANR-17-EURE-0002”). We also thank the supercomputer facilities of the Mésocentre de calcul de Franche-Comté.

Author information

Authors and Affiliations

CNRS, FEMTO-ST Institute, University of Bourgogne Franche-Comté (UBFC), Dijon, France
Raphaël Couturier
Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
Hassan N. Noura & Ali Chehab
Department of Computer Sciences, Arab Open University, Beirut, Lebanon
Hassan N. Noura

Authors

Raphaël Couturier
View author publications
You can also search for this author in PubMed Google Scholar
Hassan N. Noura
View author publications
You can also search for this author in PubMed Google Scholar
Ali Chehab
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Raphaël Couturier.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Couturier, R., Noura, H.N. & Chehab, A. ESSENCE: GPU-based and dynamic key-dependent efficient stream cipher for multimedia contents. Multimed Tools Appl 79, 13559–13579 (2020). https://doi.org/10.1007/s11042-020-08613-2

Download citation

Received: 15 March 2019
Revised: 02 October 2019
Accepted: 02 January 2020
Published: 31 January 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11042-020-08613-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

ESSENCE: GPU-based and dynamic key-dependent efficient stream cipher for multimedia contents

Abstract

Similar content being viewed by others

ORSCA-GPU: one round stream cipher algorithm for GPU implementation

Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Stream cipher designs: a review

1 Introduction

1.1 Related work

1.2 Problem formulation

1.3 Motivation

1.4 Contributions

1.5 Organization

2 Existing GPU cipher algorithms and their corresponding implementations

3 Dynamic key derivation function

4 Construction of dynamic cipher primitives

5 Proposed stream cipher algorithm

5.1 Basic Concepts

5.2 xoroshiro128+ (XOR/rotate/shift/rotate)

5.3 Xorshift

5.4 Proposed encryption algorithm

5.5 Proposed decryption algorithm

6 Security analysis

6.1 Statistical analysis

6.1.1 Uniformity analysis

6.1.2 Information Entropy Analysis

6.1.3 Independence

6.1.4 Plain data vs. encrypted data

6.2 Sensitivity tests

6.2.1 Key Sensitivity test

6.2.2 Plain-text sensitivity

6.3 Visual degradation

6.4 Cryptanalysis: Resistance against well-known types of attacks

6.5 Statistical tests with TestU01

7 Performance analysis

7.1 Experiments

8 Conclusion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation