Abstract
Since the introduction of the CUDA programming model, GPUs have been considered a viable platform for accelerating non-graphical applications. Remarkable speedups have been reported for many cryptographic algorithms, block ciphers in particular. For stream ciphers, however, few GPU acceleration efforts have been reported, because their inherently iterative structures prohibit parallelization. In this paper, we propose an efficient methodology for implementing data-parallel cryptographic functions in a batch-processing fashion on modern GPUs in general, and optimizations for Salsa20 in particular. We present an autotuning framework that searches for the best set of device and application parameters for Salsa20 kernel variants, with throughput maximization as the figure of merit. The peak performance achieved by our implementation for Salsa20/12 on an NVIDIA GeForce GTX 590 is 2.7 GBps with memory transfers and 43.44 GBps without them. These figures beat the fastest reported GPU implementation of any stream cipher in the eSTREAM portfolio, including Salsa20/12, as well as the block cipher AES optimized by hand-tuning, and thus, to the best of our knowledge, set a new speed record.
This work was done in part while the second author was visiting RWTH Aachen, Germany as an Alexander von Humboldt Fellow.
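To illustrate why Salsa20 suits the batch-processing approach described in the abstract, the following is a minimal CPU-side sketch in Python, not the authors' CUDA kernel. It follows Bernstein's published Salsa20 specification for a 256-bit key, in which each 64-byte keystream block depends only on the key, the nonce, and a block counter. The helper names (`keystream_block`, `xor_batch`) are hypothetical, and the loop in `xor_batch` stands in for what a GPU would presumably do with one thread per block.

```python
import struct

MASK = 0xFFFFFFFF
# "expand 32-byte k" as four little-endian words (256-bit key constants)
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)
# Word indices touched by the column round, then the row round
COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK) | (x >> (32 - n))


def quarterround(y0, y1, y2, y3):
    """Salsa20 quarter-round on four 32-bit words."""
    y1 ^= rotl32((y0 + y3) & MASK, 7)
    y2 ^= rotl32((y1 + y0) & MASK, 9)
    y3 ^= rotl32((y2 + y1) & MASK, 13)
    y0 ^= rotl32((y3 + y2) & MASK, 18)
    return y0, y1, y2, y3


def salsa20_core(state, rounds=12):
    """Apply rounds/2 double-rounds, then add the input state word-wise."""
    x = list(state)
    for _ in range(rounds // 2):
        for a, b, c, d in COLUMNS + ROWS:
            x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    return [(xi + si) & MASK for xi, si in zip(x, state)]


def keystream_block(key, nonce, counter, rounds=12):
    """Return the 64-byte keystream block for one counter value."""
    k = struct.unpack("<8I", key)      # 32-byte key
    n = struct.unpack("<2I", nonce)    # 8-byte nonce
    state = [SIGMA[0], k[0], k[1], k[2], k[3],
             SIGMA[1], n[0], n[1], counter & MASK, (counter >> 32) & MASK,
             SIGMA[2], k[4], k[5], k[6], k[7], SIGMA[3]]
    return struct.pack("<16I", *salsa20_core(state, rounds))


def xor_batch(key, nonce, data, rounds=12):
    """Encrypt or decrypt data block by block.

    Iterations are independent of one another, so they could run in any
    order or concurrently (assumed mapping: one GPU thread per block).
    """
    out = bytearray(len(data))
    for block in range((len(data) + 63) // 64):
        ks = keystream_block(key, nonce, block, rounds)
        chunk = data[block * 64:(block + 1) * 64]
        out[block * 64:block * 64 + len(chunk)] = bytes(
            p ^ s for p, s in zip(chunk, ks))
    return bytes(out)
```

Because encryption is an XOR with the keystream, calling `xor_batch` twice with the same key and nonce returns the original data. The `rounds=12` default mirrors the Salsa20/12 variant named in the abstract; memory layout, thread-block sizing, and the other parameters the autotuning framework explores are outside the scope of this sketch.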
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khalid, A., Paul, G., Chattopadhyay, A. (2013). New Speed Records for Salsa20 Stream Cipher Using an Autotuning Framework on GPUs. In: Youssef, A., Nitaj, A., Hassanien, A.E. (eds) Progress in Cryptology – AFRICACRYPT 2013. AFRICACRYPT 2013. Lecture Notes in Computer Science, vol 7918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38553-7_11
DOI: https://doi.org/10.1007/978-3-642-38553-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38552-0
Online ISBN: 978-3-642-38553-7
eBook Packages: Computer Science (R0)