Abstract
This paper studies state-of-the-art software optimization methodology for symmetric cryptographic primitives on the new 64-bit x64 processors, AMD Athlon 64 (AMD64) and Intel Pentium 4 (EM64T). We fully exploit the newly introduced 64-bit registers and instructions to extract maximal performance from the target primitives. Our implementation of AES with a 128-bit key runs in 170 cycles/block on Athlon 64, which is, as far as we know, the fastest implementation of AES on a PC processor.
We also implemented “bitsliced” AES and Camellia for the first time, both of which achieved very good performance. A bitslice implementation is important as a countermeasure against cache timing attacks because it does not require lookup tables with key-dependent addresses. We also analyze the performance of the SHA256/512 and Whirlpool hash functions and show that SHA512 can run faster than SHA256 on Athlon 64. This paper reveals the undocumented fact that 64-bit right shifts and 64-bit rotations are extremely slow on Pentium 4, which often leads to serious and unavoidable performance penalties when programming encryption primitives on this processor.
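To illustrate why a bitslice implementation needs no key-dependent table addresses, the following is a minimal sketch in Python. It is not the AES or Camellia circuit from the paper: the 3-bit S-box, the function names, and the `MASK64` constant are all hypothetical, chosen only to show the idea. Bit i of 64 independent inputs is packed into one 64-bit word, and the S-box is then evaluated on all 64 inputs at once using only AND, NOT, and XOR.

```python
MASK64 = (1 << 64) - 1  # emulates a 64-bit register (assumption for this sketch)


def toy_sbox_table():
    """Table form of a hypothetical 3-bit S-box: y_i = x_i ^ (~x_{i+1} & x_{i+2})."""
    table = []
    for x in range(8):
        bits = [(x >> i) & 1 for i in range(3)]
        y = 0
        for i in range(3):
            yi = bits[i] ^ ((bits[(i + 1) % 3] ^ 1) & bits[(i + 2) % 3])
            y |= yi << i
        table.append(y)
    return table


def to_slices(values, width=3):
    """Transpose inputs: slice i holds bit i of every input, one input per bit lane."""
    slices = [0] * width
    for lane, v in enumerate(values):
        for i in range(width):
            slices[i] |= ((v >> i) & 1) << lane
    return slices


def from_slices(slices, count):
    """Inverse transpose: rebuild one value per bit lane."""
    return [
        sum(((s >> lane) & 1) << i for i, s in enumerate(slices))
        for lane in range(count)
    ]


def toy_sbox_bitsliced(slices):
    """Evaluate the S-box on all lanes at once; no memory lookup is involved."""
    x0, x1, x2 = slices
    y0 = x0 ^ (~x1 & x2 & MASK64)
    y1 = x1 ^ (~x2 & x0 & MASK64)
    y2 = x2 ^ (~x0 & x1 & MASK64)
    return [y0, y1, y2]


# Usage: 64 inputs processed in parallel, compared with the table-lookup version.
inputs = [(7 * i + 3) % 8 for i in range(64)]
table = toy_sbox_table()
via_table = [table[v] for v in inputs]  # index depends on the (secret) data
via_slices = from_slices(toy_sbox_bitsliced(to_slices(inputs)), len(inputs))
assert via_slices == via_table
```

In the table version the memory address depends on the data being processed, which is what a cache timing attack observes. The bitsliced version executes the same sequence of logic instructions regardless of the input values. A real cipher would need a much larger Boolean circuit for its S-box, plus the cost of transposing data into and out of sliced form.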
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Matsui, M. (2006). How Far Can We Go on the x64 Processors?. In: Robshaw, M. (eds) Fast Software Encryption. FSE 2006. Lecture Notes in Computer Science, vol 4047. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799313_22
DOI: https://doi.org/10.1007/11799313_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36597-6
Online ISBN: 978-3-540-36598-3