Abstract
This paper describes an algorithm for computing modular exponentiation using vector (SIMD) instructions. It demonstrates, for the first time, how such a software approach can outperform the classical scalar (ALU) implementations on high-end x86_64 platforms that have a wide SIMD architecture. Here, we target speeding up RSA2048 on Intel’s soon-to-arrive platforms that support the AVX2 instruction set. To this end, we applied our algorithm and generated an optimized AVX2-based software implementation of 1024-bit modular exponentiation. This implementation is seamlessly integrated into OpenSSL as a patch over OpenSSL 1.0.1. Our results show that our implementation requires 51% fewer instructions than the current OpenSSL 1.0.1 implementation. This illustrates the potentially significant speedup in RSA2048 performance that is expected on the coming (2013) Intel processors. The impact of such a speedup on servers is noticeable, especially since NIST recommends migrating to RSA2048 starting from 2013.
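The abstract pairs RSA2048 with 1024-bit modular exponentiation without stating why. The usual reason is the Chinese Remainder Theorem (CRT): a private-key operation modulo a 2048-bit n = p·q can be computed as two exponentiations modulo the 1024-bit primes p and q, followed by a cheap recombination. The following is a minimal Python sketch of that idea using toy primes. The function name `rsa_private_crt` and all parameters are illustrative assumptions, not taken from the paper or from OpenSSL, and the sketch shows neither the SIMD algorithm itself nor any side-channel protection.

```python
def rsa_private_crt(c, p, q, d):
    """Compute c^d mod (p*q) using two half-size modular exponentiations.

    Illustrative only: real RSA uses large random primes, padding and
    constant-time arithmetic.
    """
    dp = d % (p - 1)          # reduced exponent for the mod-p half
    dq = d % (q - 1)          # reduced exponent for the mod-q half
    q_inv = pow(q, -1, p)     # q^{-1} mod p (requires Python 3.8+)

    m1 = pow(c % p, dp, p)    # half-size exponentiation modulo p
    m2 = pow(c % q, dq, q)    # half-size exponentiation modulo q

    # Garner recombination: lift (m1, m2) back to a residue modulo p*q.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters (hypothetical, far too small for real use).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

message = 65
cipher = pow(message, e, n)
recovered = rsa_private_crt(cipher, p, q, d)
print(recovered == message)
```

With 1024-bit primes, the two `pow` calls inside the function are the 1024-bit modular exponentiations that an optimized implementation would replace.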
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gueron, S., Krasnov, V. (2012). Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures. In: Özbudak, F., Rodríguez-Henríquez, F. (eds) Arithmetic of Finite Fields. WAIFI 2012. Lecture Notes in Computer Science, vol 7369. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31662-3_9
DOI: https://doi.org/10.1007/978-3-642-31662-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31661-6
Online ISBN: 978-3-642-31662-3
eBook Packages: Computer Science, Computer Science (R0)