Abstract
Today, with an ever-increasing number of computer users, protecting data and privacy from cyberattacks is of utmost importance. Many applications use the Advanced Encryption Standard (AES) to encrypt data for security reasons. This mainly concerns enterprises and businesses, which ultimately handle user data. However, many implementations of the AES algorithm consume large amounts of CPU time and fall short in terms of throughput. To tackle this problem, the proposed system makes use of GPUs, which are designed for parallel workloads. GPUs can perform parallel operations much faster than a CPU, ultimately increasing throughput and reducing resource consumption to some extent. The vital aspect of this approach is the speedup achieved through massive parallelism. This research aims to implement AES encryption and decryption using CUDA and benchmark it on various compute devices.
1 Introduction
Cybersecurity is critical these days, with cybercrime and cyberattacks by malicious users and hackers on the rise. To protect our data, encryption algorithms such as the Advanced Encryption Standard (AES) are used. AES, a block cipher, was standardized in 2001 by the National Institute of Standards and Technology (NIST) to address the security problems found in the Data Encryption Standard (DES). More than two decades later, AES still withstands all practical cyberattacks and is arguably the most widely used encryption algorithm. Its widespread use has led to the development of many optimized implementations for a variety of CPU architectures.
Traditionally, graphics processing units (GPUs) are used for gaming, game development, video rendering, and other workloads that require a considerably large amount of video memory and dedicated processing. GPU architectures follow the single instruction, multiple data (SIMD) model, which allows them to execute the same instruction on multiple data streams in parallel. This is often described as massive parallelism [1,2,3,4,5].
The main motive behind adopting this research topic is to enhance data security using encryption implementations that allow for minimal power consumption, high throughput, and low latency. This includes exploring general-purpose computing to extract all the benefits it offers. Encrypting large files on a CPU tends to take a long time, as the CPU performs each calculation sequentially, while offloading this computation to a GPU drastically reduces the time taken, as the GPU performs the same calculations in parallel [5,6,7,8,9,10]. This means that many similar calculations are performed concurrently, producing results faster. When GPUs are used for general tasks rather than video processing, they are known as general-purpose graphics processing units (GPGPUs). GPGPUs are used for tasks traditionally performed by CPUs, such as solving mathematical equations, cryptography, and cryptocurrency mining. GPGPUs are accessible through parallel computing platforms such as OpenCL or CUDA [10,11,12,13,14,15]. The proposed project makes use of the Compute Unified Device Architecture (CUDA), an NVIDIA-exclusive technology available on select NVIDIA compute devices. Compatibility can be checked on NVIDIA's official website.
The proposed study seeks to show the potential speedup and advantage of using a GPU to encrypt files with the AES algorithm. Despite the significant improvement in performance, this speedup is not directly beneficial to end users. Large corporations can truly harness this power, as they have to continuously encrypt a large number of files under time constraints. As a consequence, end users benefit indirectly, since it takes less time to respond to their requests. This technique saves not only time but also power, if the resources are used efficiently, lowering costs both for electricity and for cooling the machines. The applications of GPUs for general-purpose workloads are numerous; encryption is just one of many.
2 Related Work
A survey of related work is shown in Table 1.
3 Proposed Work
(A) AES Algorithm
The AES block cipher works with 128 bits, or 16 bytes, of input data at a time. AES is an iterative algorithm based on the substitution-permutation network principle (Fig. 1). The total number of rounds needed for the encryption or decryption process is determined by the size of the cryptographic key employed. AES's key lengths and the corresponding numbers of rounds are shown in Table 2.
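The key-length-to-rounds mapping in Table 2 follows the AES standard (FIPS-197): 10 rounds for 128-bit keys, 12 for 192-bit, and 14 for 256-bit. As a minimal sketch, it can be captured in a small C helper; the function name is illustrative, not part of the proposed implementation:

```c
/* Number of AES rounds for a given key length, per FIPS-197. */
static int aes_num_rounds(int key_bits) {
    switch (key_bits) {
        case 128: return 10;
        case 192: return 12;
        case 256: return 14;
        default:  return -1;  /* unsupported key length */
    }
}
```

A table lookup like this is typically how an implementation selects the round count before running the key schedule.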
The input is represented as a 4 × 4 byte grid or matrix in column-major arrangement, in contrast to the traditional row-major arrangement used in systems programming. The equation below shows the AES 16-byte state of 4 rows and 4 columns, where plaintext bytes a0 … a15 are mapped column by column:

state = | a0  a4  a8   a12 |
        | a1  a5  a9   a13 |
        | a2  a6  a10  a14 |
        | a3  a7  a11  a15 |
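This column-major mapping (input byte i goes to row i mod 4, column i / 4, per FIPS-197) can be sketched in C; `bytes_to_state` is an illustrative helper name, not taken from the proposed implementation:

```c
/* Map a 16-byte input block into the 4x4 AES state in column-major
 * order: input byte i goes to row i % 4, column i / 4 (FIPS-197). */
static void bytes_to_state(const unsigned char in[16],
                           unsigned char state[4][4]) {
    for (int i = 0; i < 16; i++)
        state[i % 4][i / 4] = in[i];
}
```

Note how consecutive input bytes fill a column, not a row, which is why the layout differs from the row-major convention most C programmers expect.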
Each round is composed of several processing steps, including substitution, transposition, and mixing of the input plaintext, to generate the final ciphertext. Each round is divided into 4 steps—
(1) SubBytes—Each of the 16 input bytes is substituted by looking it up in the S-box.
(2) ShiftRows—Each of the four rows of the matrix is cyclically shifted to the left, with row r shifted by r positions.
(3) MixColumns—Each column of four bytes is transformed using a special mathematical function (matrix multiplication in the Galois field GF(2^8)). This operation is not performed in the last round.
(4) AddRoundKey—The 128 bits of the state are XORed with the 128 bits of the round key. If the current round is the last, the output is the ciphertext.
(B) CUDA Implementation
The proposed project consists of a parallel implementation for
- AES-128 encryption
- AES-192 encryption
- AES-256 encryption.
The proposed implementation is developed using the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model created by NVIDIA for general computing on NVIDIA GPUs only. CUDA is not just an API, a programming language, or an SDK; it is a platform that spans the GPU hardware itself, with drivers and libraries running on top of it.
The AES workload is divided into two parts: one that runs on the CPU and another that runs on the GPU. The AES calculations are performed on the GPU, and the results are stored back in system memory. The CPU handles reading binary data from images and videos and creating new binary streams after the GPU has completed encryption or decryption.
Figure 2 presents the NVIDIA CUDA Compiler (NVCC) trajectory: the transformations from source to the executable that runs on the compute device. The .cu source is divided into host and device parts that execute on different hardware. This is called hybrid code, which enables parallel execution. We make use of CUDA function specifiers: __global__ denotes code that runs on the device and is called from the host; __device__ denotes functions that run on the device and are called from the device; and __host__ denotes functions that run on the host and are called from the host, just like ordinary library APIs or user-defined functions.
The key for encryption and decryption is stored in a text file and can be 128, 192, or 256 bits in length. The binary data, key, and number of threads are passed as command line arguments to the program. The binary data can be a text file, a video, or an image to be encrypted, given as a relative path. The number of threads is used to benchmark the performance of GPUs and measure their potential. After execution starts, the data, stored as blocks, and the key are copied from system memory (RAM) to the GPU's video RAM (VRAM) using arrays. The operations are performed over the number of rounds determined by the key length. After the encryption and decryption operations are completed in VRAM, the results are copied back to RAM, and the time required for the calculation is displayed.
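The reason this workload maps well to a GPU is that, in this scheme, each 16-byte block is processed independently, so each block can be assigned to its own thread. The CPU sketch below illustrates only that data partitioning; the per-block transform is a placeholder XOR, not real AES, and all names are illustrative rather than taken from the proposed implementation:

```c
#define BLOCK_BYTES 16

/* Placeholder per-block transform (NOT real AES): stands in for the
 * sequence of AES rounds applied to one 16-byte block. */
static void process_block(unsigned char *block,
                          const unsigned char key[BLOCK_BYTES]) {
    for (int i = 0; i < BLOCK_BYTES; i++)
        block[i] ^= key[i];
}

/* Each iteration is independent of the others, so on the GPU this
 * loop becomes a grid of threads, one per block index. */
static void process_all_blocks(unsigned char *data, long nbytes,
                               const unsigned char key[BLOCK_BYTES]) {
    for (long b = 0; b < nbytes / BLOCK_BYTES; b++)
        process_block(data + b * BLOCK_BYTES, key);
}
```

On the device side, the loop body would become a kernel and the loop index the thread index, which is what yields the massive parallelism discussed in Sect. 1.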
Figure 3 depicts a sample code snippet using CUDA specifiers. It includes the byte substitution process, which replaces the state array with the respective S-box values, and the addition of a round key, which comprises a binary XOR operation. In this manner, all the AES encryption and decryption operations are implemented as C functions, extended with CUDA to implement parallelism.
4 Result Analysis
(A) Evaluation Environment
For the purpose of evaluating the performance of our proposed algorithm, we have used a set of hardware and software components specified in Table 3.
Figure 4 demonstrates the use of our implementation. All of the samples are bitmap images with the .bmp file extension. After the program has finished executing, two bitmap images are generated in the root directory of the application: EncryptedImage.bmp and DecryptedImage.bmp. The figure is a screenshot combining both files: the left part shows the encrypted file, which the default image application is unable to open, and the right part shows the decrypted image.
Figure 5 shows the CUDA information for the computer system running the program, obtained through the cuda.h APIs cudaGetDeviceCount() and cudaGetDeviceProperties(). The summary lists parameters such as—
(1) Total number of CUDA-supporting GPU devices on the system
(2) CUDA driver and runtime information
  a. CUDA driver version
  b. CUDA runtime version
(3) GPU device general information
  a. GPU device number
  b. GPU device name
  c. GPU device compute capability
  d. GPU device clock rate
  e. GPU device type—integrated or discrete
(4) GPU device memory information
  a. GPU device total memory
  b. GPU device constant memory
  c. GPU device shared memory per streaming multiprocessor (SM)
(5) GPU device multiprocessor information
  a. GPU device number of SMs
  b. GPU device number of registers per SM
(6) GPU device thread information
  a. GPU device maximum number of threads per SM
  b. GPU device maximum number of threads per block
  c. GPU device threads per warp
  d. GPU device maximum thread dimensions
  e. GPU device maximum grid dimensions
(7) GPU device driver information
  a. Error Correcting Code (ECC) support—enabled/disabled
  b. GPU device CUDA driver mode—Tesla Compute Cluster (TCC)/Windows Display Driver Model (WDDM).
(B)
Evaluation Result
To compare against existing implementations, the proposed system includes two performance benchmarks: the first compares the performance obtained on different CPUs and GPUs, demonstrating the value of parallel computing, and the second compares the compute capability of different GPUs.
The calculation of the time taken for performing operations on the binary data is done using the “helper_timer” library offered by NVIDIA. This is achieved using the set of APIs—
a. sdkCreateTimer()—creates a timer pointer of type StopWatchInterface
b. sdkStartTimer()—starts the timer
c. sdkStopTimer()—stops the timer
d. sdkGetTimerValue()—gets the timer value after the timer is stopped
e. sdkDeleteTimer()—frees the timer pointer.
Figure 6 depicts the program results obtained using 2048 threads. The time required to encrypt and decrypt the images is calculated and displayed in seconds. The number of threads is configurable and is passed as a command line argument to the program.
Table 4 shows the different time values required to perform the encryption and decryption on various CPUs and GPUs.
a. Column 1—the device on which the program is tested.
b. Column 2—the sample size; samples are the bitmap images used for testing.
c. Column 3—the time required to encrypt the data, in seconds.
d. Column 4—the time required to decrypt the data, in seconds.
Figures 7 and 8 portray the time required for encryption and decryption on the CPU and GPU for different sample sizes. According to the results in Table 4, as the size of the input data increases, the GPU takes increasingly less time than the CPU to perform the AES operations.
Table 5 depicts the time required to perform encryption and decryption on various GPUs with a variable number of threads. The first column gives the name of the GPU, and the second the number of threads tested on it. The third and fourth columns state the time required for encryption and decryption, respectively, in seconds. These values can change across runs, but overall the table gives an idea of the performance capabilities of different NVIDIA GPUs.
Figure 9 visualizes the measured times and the resulting speedup factors. From the results, we can clearly see that the more powerful the GPU, the less time is required to complete the task. The number of threads is also a crucial factor when determining the best GPU. In our testing, the NVIDIA RTX 3060 was the best performer.
From the results, we can say that using CUDA substantially saves time and increases throughput. This can be useful for hash algorithms as well, which could then be applied in blockchain technology to compute block hashes much faster. CUDA can also help reduce the energy and power required to maintain a blockchain network. In short, it saves time and resources, and computational cost is reduced to a great extent.
5 Conclusion
We proposed a method to parallelize the encryption and decryption processes in order to overcome the high resource consumption of traditional CPU implementations of AES. We designed and implemented the AES encryption and decryption algorithm, which works with 128-bit, 192-bit, and 256-bit key sizes, to run on GPUs using CUDA, thereby reducing power consumption and increasing efficiency. This method provided a significant speedup over the CPU. It may change the way traditional resources are used, as such implementations can encrypt binary data in all forms, including images and videos, and could extend to full-disk encryption such as Microsoft BitLocker. Considerable fine-tuning would be required to make such implementations a standard for other security techniques.
6 Future Scope
Currently, the proposed system presents a parallel implementation of the AES algorithm that can only run on NVIDIA GPUs, as the presented research uses CUDA. This limits the portability of testing and deployment on infrastructure using AMD or Intel GPUs, whether integrated or discrete. To overcome this, we would need to develop a codebase using OpenCL, which would cover every GPU and CPU device. Various parameters are yet to be considered to optimize the algorithm so that it makes proper and efficient use of the GPU, saving energy while producing similar results.
References
Yuan Y, He Z, Gong Z, Qiu W (2014) Acceleration of AES encryption with OpenCL. In: 2014 Ninth Asia joint conference on information security, pp 64–70
Jaiswal M, Kumari R, Singh I (2018) Analysis and implementation of parallel AES algorithm based on T-table using CUDA on the multicore GPU. IJCRT 6(1). ISSN: 2320-2882
Ma J, Chen X, Xu R, Shi J (2017) Implementation and evaluation of different parallel designs of AES using CUDA. In: 2017 IEEE second international conference on data science in cyberspace (DSC), pp 606–614
Abdelrahman AA, Fouad MM, Dahshan H, Mousa AM (2017) High performance CUDA AES implementation: a quantitative performance analysis approach. In: 2017 Computing conference, pp 1077–1085
An S, Seo SC (2020) Highly efficient implementation of block ciphers on graphic processing units for massively large data. NATO Adv Sci Inst Ser E Appl Sci 10(11):3711
Biryukov A, Großschädl J (2012) Cryptanalysis of the full AES using GPU-like special-purpose hardware. Fund Inform 114(3–4):221–237
Li Q, Zhong C, Zhao K, Mei X, Chu X (2012) Implementation and analysis of AES encryption on GPU. In: 2012 IEEE 14th international conference on high performance computing and communication & 2012 IEEE 9th international conference on embedded software and systems, pp 843–848
Iwai K, Kurokawa T, Nisikawa N (2010) AES encryption implementation on CUDA GPU and its analysis. In: 2010 First international conference on networking and computing, pp 209–214
Mei C, Jiang H, Jenness J (2010) CUDA-based AES parallelization with fine-tuned GPU memory utilization. In: 2010 IEEE international symposium on parallel & distributed processing, workshops and Phd forum (IPDPSW), pp 1–7
Jadhav S, Vanjale SB, Mane PB (2014) Illegal access point detection using clock skews method in wireless LAN. In: 2014 International conference on computing for sustainable global development (INDIACom). https://doi.org/10.1109/indiacom.2014.6828057
Stallings W (2022) Cryptography and network security: principles and practice. Pearson
Daemen J, Rijmen V (2013) The design of Rijndael: AES—the advanced encryption standard. Springer Science & Business Media
Sanders J, Kandrot E (2010) CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley Professional
Wilt N (2020) The Cuda handbook: a comprehensive guide to Gpu Programming. Addison-Wesley Professional
Nguyen H, NVIDIA Corporation (2008) GPU gems 3. Addison-Wesley. https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-36-aes-encryption-and-decryption-gpu
Tezcan C (2021) Optimization of advanced encryption standard on graphics processing units. IEEE Access 9:67315–67326
Sanida T, Sideris A, Dasygenis M (2020) Accelerating the AES algorithm using OpenCL. In: 2020 9th International conference on modern circuits and systems technologies (MOCAST), pp 1–4
Bharadwaj B, Saira Banu J, Madiajagan M, Ghalib MR, Castillo O, Shankar A (2021) GPU-accelerated implementation of a genetically optimized image encryption algorithm. Soft Comput 25(22):14413–14428
Wang C, Chu X (2019) GPU accelerated AES algorithm. arXiv [cs.DC]. arXiv. http://arxiv.org/abs/1902.05234
Inampudi GR, Shyamala K, Ramachandram S (2018) Parallel implementation of cryptographic algorithm: AES using OpenCL on GPUs. In: 2018 2nd International conference on inventive systems and control (ICISC), pp 984–988
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Jadhav, S., Patel, U., Natu, A., Patil, B., Palwe, S. (2023). Cryptography Using GPGPU. In: Rajakumar, G., Du, KL., Rocha, Á. (eds) Intelligent Communication Technologies and Virtual Mobile Networks. ICICV 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 171. Springer, Singapore. https://doi.org/10.1007/978-981-99-1767-9_23
Print ISBN: 978-981-99-1766-2
Online ISBN: 978-981-99-1767-9