Abstract
This paper describes a vulnerability in several implementations of the Secure Hash Algorithm 3 (SHA-3) that have been released by its designers. The vulnerability has been present since the final-round update of Keccak was submitted to the National Institute of Standards and Technology (NIST) SHA-3 hash function competition in January 2011, and is present in the eXtended Keccak Code Package (XKCP) of the Keccak team. It affects all software projects that have integrated this code, such as the scripting languages Python and PHP Hypertext Preprocessor (PHP). The vulnerability is a buffer overflow that allows attacker-controlled values to be eXclusive-ORed (XORed) into memory (without any restrictions on values to be XORed and even far beyond the location of the original buffer), thereby making many standard protection measures against buffer overflows (e.g., canary values) completely ineffective. First, we provide Python and PHP scripts that cause segmentation faults when vulnerable versions of the interpreters are used. Then, we show how this vulnerability can be used to construct second preimages and preimages for the implementation, and we provide a specially constructed file that, when hashed, allows the attacker to execute arbitrary code on the victim’s device. The vulnerability applies to all hash value sizes, and all 64-bit Windows, Linux, and macOS operating systems, and may also impact cryptographic algorithms that require SHA-3 or its variants, such as the Edwards-curve Digital Signature Algorithm (EdDSA) when the Edwards448 curve is used. We introduce the Init-Update-Final Test (IUFT) to detect this vulnerability in implementations.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
A (cryptographic) hash function transforms a variable-length message into a fixed-length output, referred to as a “message digest,” a “hash value,” or simply a “hash.” This hash is intended to serve as a unique representative value of the message (i.e., as a “digital fingerprint”). A typical use of hash functions is in digital signature schemes, where the signature is typically applied to the hash of the message.
For such signature schemes to be secure, a hash must be uniquely identifiable by the corresponding message. Nevertheless, hash functions are many-to-one, therefore due to the pigeonhole principle, it is unavoidable that there exists a collision: two distinct messages with the same hash value.
A secure hash function is traditionally required to have three security properties: it should be computationally infeasible to find a collision, as well as to find a second preimage (another message that results in the same hash), or to find a preimage (i.e., to find a message that corresponds to a given hash). For a classical treatment of hash functions based on these three properties (preimage, second preimage, and collision resistance), we refer to the Handbook of Applied Cryptography [7, Chapter 9].
Wang et al. presented a colliding pair of messages for the Message Digest 5 (MD5) hash function at EUROCRYPT 2005 [21], and presented a collision attack for SHA-1 at CRYPTO 2005 [20]. In response to these attacks, NIST announced a competition for a new SHA-3 hash function standard in 2007 [11]. The Keccak hash function was one of the 64 hash functions submitted in 2008 and was eventually selected as the winner of the competition in 2012. In 2015, NIST published FIPS 202 [12], which specifies the SHA-3 standard.
In this paper, we will not focus on the specifications of hash functions, but on the correctness of their implementations. The source codes of the SHA-3 submissions have been subject to years of public scrutiny. Already at the beginning of the competition, Forsythe and Held of Fortify [4] performed a systematic analysis of all first-round candidates against typical programming errors and found buffer overflows, out-of-bound reads, memory leaks, and null dereferences in five reference implementations. In 2018, Mouha et al. [10] introduced a new testing strategy that showed bugs in 41 of the 86 reference implementations. Later at CT-RSA 2020, Mouha and Celi [9] announced a vulnerability in Apple’s CoreCrypto library that affected 11 out of the 12 hash functions that were implemented in the library.
In this paper, we present an undiscovered vulnerability that impacts the final-round submission of Keccak to the SHA-3 competition [13]. The vulnerability also affects the eXtended Keccak Code Package (XKCP) [2] of the Keccak team and various software projects (including Python and PHP) that are based on this source code. The vulnerability described in this paper does not affect the SHA-3 standard (as specified in FIPS 202 [12]), and not all implementations of SHA-3 are vulnerable. Most notably, the implementation of SHA-3 in OpenSSL is not affected.
Vulnerability Disclosure. CVE (Common Vulnerabilities and Exposures) identifiers are assigned by CVE Numbering Authorities (CNAs). The vulnerability did not seem to fit the scope of any of the regular CNAs, so the MITRE CNA of Last Resort (CNA-LR) was contacted on August 4, 2022. On August 7, CVE-2022-37454 was assigned to this vulnerability. The Keccak Team was contacted on August 21, and a series of discussions followed regarding the technical details and scope of the vulnerability, potential fixes, and a disclosure process. Proof-of-concept code was disclosed to the main projects that appeared to be impacted (Python, PHP, and pysha3) on October 11. There were no objections to publicly patching and disclosing the vulnerability on October 20. On October 21, the PyPy and SHA3 for Ruby projects were informed as well. The National Vulnerability Database (NVD) assigned the score “9.8 CRITICAL” to this vulnerability on October 25. Fixes are available for the affected projects, therefore the Python and PHP scripts in this paper may no longer produce segmentation faults even though older versions of the interpreters are vulnerable.
2 The SHA-3 Standard
SHA-3 uses the “sponge construction” to process the message in blocks of a fixed size (see Fig. 1). For the four hash functions (SHA3-224, SHA3-256, SHA3-384, and SHA3-512), the number in the suffix refers to the length of the hash value in bits. An eXtendable-Output Function (XOF) is a variant of a hash function that provides a hash value of any requested length. The two XOFs (SHAKE128 and SHAKE256) have a security strength of 128 and 256 bits respectively, assuming the requested output is sufficiently long.
The sponge construction is parameterized by a rate r and a capacity c, both given in bits. For the four hash functions and the two XOFs that are specified in the SHA-3 standard [12], the values of these parameters are given in Table 1. The message M is processed according to a specific padding ruleFootnote 1 so that the padded message becomes a positive multiple of r bits. This allows it to be split into blocks of r bits each: \(M_0\), \(M_1\), \(M_2\), \(\ldots \) As many output blocks \(Z_0\), \(Z_1\), \(Z_2\), \(\ldots \) are generated as necessary, these blocks are concatenated, and the hash value is obtained after truncating to the desired length.
We will use the notation \(0^s\) to refer to the all-zero bit string of length s. In the figures, the numbers given next to every line represent the length of the corresponding bit string and \(\oplus \) is the bitwise eXclusive-OR (XOR) operation. The function f is a cryptographic permutation. It is easy to evaluate f and its inverse \(f^{-1}\), but the outputs should appear “random,” so that any structure in the output only occurs by chance after evaluating f on a sufficient number of inputs. In “sponge” terminology, processing the padded message is referred to as “absorbing,” and generating the hash is referred to as “squeezing.”
This paper will focus mostly on the hash function with the smallest output size: SHA3-224. This is only for convenience and simplicity, as the source code that contains the vulnerability is used by the implementations of all the hash functions and XOFs in the SHA-3 standard, as well as the SHA-3 derived functions (cSHAKE, KMAC, TupleHash, and ParallelHash) specified in SP 800-185 [6].
Two application programmer interfaces (APIs) are common for hash function implementations. More specifically, the message can be processed either at once or incrementally. In the latter case, a call to Keccak_HashInitialize() is followed by any number of calls to Keccak_HashUpdate(), and then followed by a call to Keccak_HashFinal(). In this case, the calls to Keccak_HashUpdate() are “absorbing,” while the single call to Keccak_HashFinal() performs the “squeezing.” This makes it convenient to process a message that consists of several parts: it is not necessary to store these parts in a temporary buffer, but the hash can be computed on the fly.
Many cryptographic algorithms naturally lend themselves to processing the input in blocks: for the cryptographic library HACL\(^*\) [15, 22], 17 algorithms are spread out across 40 implementations, and at least a dozen of those follow a block-based paradigm as pointed out by Protzenko and Ho [17].
3 The Vulnerability
In XKCP versions released before October 20, 2022 (and in other projects such as the Python and PHP scripting languages that included this source code before they were patched), there is a vulnerability in the KeccakSponge.inc file that implements the processing of the message in fixed-size blocks. The same vulnerability is also present in the KeccakSponge.c file of the final-round source code made available by NIST on the SHA-3 competition website. As explained in Sect. 2, the block size is also known as the “rate” and its size in bytes is denoted by rateInBytes in the source code.
The KeccakSponge.inc file contains the following code in SpongeAbsorb() to process the input of the hash function in fixed-size blocks:
On all 64-bit Windows, Linux, and macOS operating systems, size_t variables are unsigned 64-bit integers and unsigned int variables are 32-bit unsigned integers. Therefore, the variable definitions (not shown here) imply that
-
partialBlock, instance->byteIOIndex, and rateInBytes are unsigned 32-bit integers, whereas
-
dataByteLen and i are unsigned 64-bit integers.
The comparison (partialBlock+instance->byteIOIndex> rateInBytes) is intended to detect when SpongeAbsorb() encounters (partial) inputs that, when added to the instance->byteIOIndex bytes already in the buffer from previous calls (if any) to SpongeAbsorb(), will be larger than the block size (rateInBytes).
This buffer may already contain some data. If this is the case, then a subsequent call to SpongeAbsorb() with an input that is just below \(2^{32}\) bytes (4 GiB) causes partialBlock+instance->byteIOIndex to wrap around due to an integer overflow. This incorrectly results in a value that is lower than the block size, so that the if condition evaluates to false. Consequently, a large value of partialBlock will be passed on to SnP_AddBytes(), resulting in a buffer overflow when these partialBlock bytes are XORed to memory inside SnP_AddBytes().
Additionally, there is an incorrect type casting. If an input of at least \(2^{32}\) bytes (4 GiB) is provided, then the higher bits are discarded due to the cast to an unsigned int. The code will nevertheless be correct if only one call to SpongeAbsorb() is performed. If, however, the buffer already contains some data and an input of at least \(2^{32}\) bytes is provided, then the program will enter into an infinite loop. Note the similarity here with the vulnerability presented at CT-RSA 2020 by Mouha and Celi [9], which affected every implemented hash function except MD2 in Apple’s CoreCrypto library, and also caused an infinite loop.
The infinite loop can be avoided as follows. Assume that an input of x bytes is processed (where \(0<x<\texttt {rateInBytes}\)), so that instance->byteIOIndex is set to x. The buffer then contains x bytes. Then, assume that this is followed by another input of \(2^{32}-x\) bytes. This will create a situation where a large number of bytes of the input message are XORed in memory. If this involves a write operation into unwritable memory, it will cause a segmentation fault. Proof of concept Python and PHP scripts that generate a segmentation fault in this way are given in Appendix A.
If we can ensure that the write is done into writable memory, then this specific input value will avoid an infinite loop, but instead, will exit the loop before the next iteration. We will not go into the details of the techniques to avoid write operations to unwritable memory, but we note that the typical techniques for this (such as stack spraying or heap spraying, depending on the location of the internal hash function state), may also help to mitigate Address Space Layout Randomization (ASLR) if present.
In the following, we explain that if a write operation to unwritable memory can be avoided, it will be possible to generate second preimages and preimages for this specific implementation of the SHA-3 hash function. We reiterate that this is not due to a weakness in the SHA-3 standard, but rather due to the implementation producing an incorrect hash value when provided with malicious inputs. We also show how to provide an exploit payload along with the message, which will overwrite the stack return address to point to the location of the payload inside the message.
3.1 Constructing a Second Preimage
The construction of a second preimage (which also implies a collision) is rather straightforward. As shown in Fig. 2, we process an all-zero message of \(2^{32}\) bytes (4 GiB) using two calls to Keccak_HashUpdate() (which will internally call SpongeAbsorb()). The first call consists of \(0<x<\texttt {rateInBytes}\) bytes, followed by a call of \(2^{32}-x\) bytes. The value of x can be any integer within the specified range, for simplicity we use \(x=1\) in the proof of concept code given in Appendix A.
The \(2^{32}\) bytes of the message will be XORed into memory. As we are XORing all-zero values, the content of the memory will not be changed but may result in a segmentation fault if the memory region is not writable. Therefore, the adjacent memory region, beyond the \(r+c\) bits of the sponge state, does not need to be known to the attacker but needs to be writable.
A call to Keccak_HashUpdate() of \(0<x<\texttt {rateInBytes}\) bytes followed by a call of \(2^{32}-x\) bytes will conveniently result in another integer overflow: instance->byteIOIndex will overflow and end up with a value of zero. Therefore, from the point of view of the implementation, the 4 GiB message is “forgotten” and the computation continues as if nothing has been processed yet.
Now, the padding of SHA-3 becomes relevant. As explained in [12], the padding consists of adding a fixed two- or four-bit suffix to the message (to distinguish the SHA-3 hash functions from the SHA-3 XOFs), followed by the “multi-rate padding rule” which consists of a ‘1’, followed by a possibly empty string of zeros, and a ‘1’. This padding is notably different from the MD4, MD5, SHA-1, and the SHA-2 family, which include the length of the message as part of the padding, a process known as Merkle–Damgård strengthening.
Because the padding for SHA-3 does not involve the number of bytes that were processed, we can perform a third call to Keccak_HashUpdate() (and any number of subsequent calls) and the hash value will be the same as when the first two calls to Keccak_HashUpdate() were not present.
As such, we find a second preimage for the vulnerable implementation: given any message, we can prepend 4 GiB of zeros to the message (to be processed as mentioned earlier in two calls) to obtain another message that results in the same hash value.
3.2 Constructing a Preimage of Zero
At SAC 2020, Benmocha et al. [1] studied implementations of the keyed-hash message authentication code (HMAC) when the API is used in an unintended way by adding extra data after the tag has already been computed. They noted that most APIs do not raise an error when used in such a way, and that for OpenSSL it is possible to instantly find collisions and multi-collisions that are also colliding under any key.
The SHA-3 implementation does raise an error when such an API misuse happens. To achieve this, the state contains a squeezing variable that is initialized to zero, and is set to a non-zero value when the padding has been processed. Every time new data is processed, the implementation confirms that the squeezing variable is zero, otherwise the calling function returns with a non-zero value to indicate an error.
On the other hand, an implementation would typically not check for errors that cannot occur if the implementation is correct, and even if they do, such checks might be eliminated as part of a common compiler optimization called “dead code elimination.” If this is the case, we show how to construct a preimage of zero for a vulnerable implementation.
More specifically, we can provide a 4 GiB message (processed using two calls to Keccak_HashUpdate() as before) to reach beyond the \(r+c\) bits of the sponge state, and access the internal variables of the hash function state (see Fig. 3). This allows us to set the squeezing variable to a non-zero value, and when Keccak_HashFinal() calls SpongeAbsorbLastFewBits() to process the padding, it will return early with an error when it finds that squeezing has a non-zero value. In the end, the hash will not be written but will contain the value to which it was initialized, most likely zero.
In Appendix A, we provide proof-of-concept code to use this technique to obtain a preimage of zero for a vulnerable implementation.
3.3 Constructing a Preimage of Any Value
Rather than just creating a preimage of zero, we can use the vulnerability to create a preimage of an arbitrary hash value.
For this, we start with the target hash value H, and pad it to the entire \(r+c\) sponge state. The contents of the padding do not matter, so we can just use zeros for simplicity. Recall that f is a permutation, so we can invert f on any value. The code for the inverse of f is not included in XKCP [2], however, it can be found in KeccakTools [3]. As SHA-3 initializes the \(r+c\) bits of the sponge state with zeros, all we need to do now is XOR the inverse of f with two padding bytes (see [12, App. B.2]) to obtain the first \(r+c\) bits of the \(M_0\Vert M_1\), which is again a message of 4 GiB that is processed in two calls. The other bits of \(M_0\Vert M_1\) are set to zero to avoid altering the adjacent memory regions. The entire procedure is illustrated in Fig. 4.
In literature, the attack is known as the correcting block attack as applied to hash functions based on Cipher Block Chaining (CBC) [16, Sect. 5.3.1.1], such as the attack on the first-round SHA-3 candidate Khichidi-1 [8, Sect. 2.6.3].
In Appendix A, we show how for a vulnerable implementation we can generate a preimage of 000102030405060708090a0b0c0d0e0f101112131415161718191a1b in this way.
3.4 Constructing a Message with an Exploit Payload
As shown in Fig. 5, a carefully constructed stack overflow allows the return address of the function to be overwritten. We illustrate this with a simple return-to-stack exploit when an attacker-provided file is hashed, which launches a Meterpreter Reverse TCP payload. This allows the attacker to download and upload files, view the webcam, run post-exploitation tools to pivot deeper into the victim’s device and/or to maintain persistence, etc. Proof-of-concept code is provided in Appendix A. Our exploit assumes that the stack is executable and that ASLR is not present. Note that these assumptions can be avoided by using more advanced exploitation techniques, such as return-oriented programming and techniques to reduce address randomization.
3.5 Attacking EdDSA
The use of SHA-3 and its variants is mandatory in certain NIST and Internet Engineering Task Force (IETF) standards. For example, EdDSA [5, 14] makes the use of SHAKE256 mandatory for Ed448. The vulnerability would then work as follows. If an implementation of Ed448 verification (with the default empty-string context) places a 10-byte encoded context, a 57-byte point R, and a 57-byte public key Q in the buffer, then \(10+57+57=124\) bytes are in the buffer before the message is processed. This is less than 136 bytes, which is the rate in bytes for SHAKE256. Therefore, a message of \(2^{32}-124\) bytes can be used to cause the buffer overflow described in this paper. Note that the message does not need to be correctly signed for the buffer overflow attack on the verification function to work.
As this example shows, the use of repeated calls to Keccak_HashUpdate() can occur quite naturally, for algorithms such as EdDSA where the input consists of a concatenation of various values.
4 Discussion
The execution time to process the 4 GiB message will depend on the platform. However, no calls to the cryptographic permutation f are involved, therefore the execution time is mainly the time required to XOR a 4 GiB value into memory. In our experiments on a recent laptop, we observed an execution time of 2 to 3 s to process this input. The proof-of-concept code in Appendix A shows various techniques to avoid a large amount of RAM or swap space, such as using mmap() to create file-backed and anonymous mappings. However, it may be necessary for some attacks that the amount of RAM and swap together is at least 4 GiB to avoid an error that insufficient memory is available. It does not seem that 32-bit systems are vulnerable because the address space is insufficient to malloc() or mmap() such an input.
Not all implementations of SHA-3 are vulnerable to this bug. For example, the implementation of OpenSSL is not based on the XKCP, and incidentally, Python 3.9 has been patched to use OpenSSL’s implementation when available [18]. As explained by Christian Heimes [19], both SHA-3 and SHAKE are listed in hashlib.algorithms_guaranteed, but using OpenSSL is optional. This explains why the vulnerable code has not been removed and that it may be reachable under some configurations. Since Python 3.11, however, the XKCP implementation was replaced by Saarinen’s tiny_sha3 [19].
Projects that are derived from Python, such as PyPy3, may remain vulnerable for a longer time due to a slower adoption of Python patches. For example, the PyPy 3.8 release is vulnerable, but the latest PyPy 3.9 release incorporates the patch to use the OpenSSL implementation.
A possible suggestion to mitigate the vulnerability is to switch the default SHA-3 and SHAKE from XKCP to OpenSSL, or Saarinen’s tiny_sha. Another suggestion to mitigate this bug is to limit the maximum size of a call to \(2^{32}-\texttt {rateInBytes}\) bytes, where rateInBytes is either the corresponding value in Table 1 for the given SHA-3 hash function or XOF, or a cautious upper limit of 200 (the size of the sponge state in bytes). Lastly, the vulnerability can also be avoided by always processing the entire message at once, which may require the use of a temporary buffer.
Note that the Large Data Test (LDT) that was introduced at CT-RSA 2020 by Mouha and Celi [9] is not effective to find this bug (nor for regression testing) because a specific sequence of calls is required; a single call with a large input will not trigger the vulnerability. Bugs of this type may be difficult to find through testing because they require a very specific sequence of calls, which may explain why this bug has not been discovered since it was first introduced in 2011. Nevertheless, the bug may be triggered using only one call to higher-level algorithms that are now introducing SHA-3 or its variants, as in the Ed448 example mentioned earlier.
The bug was not present in the first- and second-round submissions of the Keccak package to the NIST SHA-3 competition, but appears in the implementation that was submitted in the final round where partialBlock was changed from a 64-bit to a 32-bit variable. Nevertheless, we note a slight difference with the bug in the Keccak package: the incorrect line contains databitlen rather than dataByteLen, and therefore a message of \(2^{32}\) bits (0.5 GiB) rather than \(2^{32}\) bytes (4 GiB) is required to trigger the bug described in this paper. Taking this change into account, all attacks described in this paper also apply to the final-round Keccak submission to the SHA-3 competition.
5 Proposing the Init-Update-Final Test (IUFT)
Within the NIST Cryptographic Algorithm Validation Program (CAVP), when testing hash functions, a single call to Keccak_HashUpdate() is performed to compute the hash value. As we have shown, this is not sufficient to cover corner cases that appear in practice. Testing must match the real-world use cases of an implementation to be effective. Currently, there is a gap in the test coverage offered by NIST. To cover this gap, we propose the Init-Update-Final Test (IUFT) as a solution in Fig. 6. The example is given in the Automated Cryptographic Validation Protocol (ACVP) JavaScript Object Notation (JSON) format, and includes an optional Large Data Test (LDT) element proposed by Mouha and Celi at CT-RSA 2020 [9].
6 Conclusion
We described a buffer overflow vulnerability in the final-round Keccak submission package to the NIST SHA-3 competition, in the eXtended Keccak Code Package (XKCP), and in various projects such as the Python and PHP interpreters that incorporate this code.
The vulnerability is due to a 32-bit integer overflow that occurs when a large (around 4 GiB) call to Keccak_HashUpdate() is made after an incomplete number of blocks have been processed. Depending on the length of the call, this will result in either an infinite loop or an attacker-chosen 4 GiB value that is XORed into memory, resulting in a buffer overflow.
We showed how this buffer overflow can be leveraged to violate the cryptographic properties of the hash function (preimage, second preimage, and collision resistance), as it provides the attacker full control over the \(r+c\) bits of the sponge state. Moreover, we showed how to overwrite the stack pointer and execute an attacker-provided payload.
Lastly, we proposed the Init-Update-Final Test (IUFT) that can process an input in several parts.
Notes
- 1.
As we will explain in Sect. 3.1, the length of the message is not part of the padding. This property will be useful for our attacks.
References
Benmocha, G., Biham, E., Perle, S.: Unintended features of APIs: cryptanalysis of incremental HMAC. In: Dunkelman, O., Jacobson Jr., M.J., O’Flynn, C. (eds.) SAC 2020. LNCS, vol. 12804, pp. 301–325. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81652-0_12
Bertoni, G., Daemen, J., Hoffert, S., Peeters, M., Assche, G.V., Keer, R.V.: eXtended Keccak code package (2022). https://github.com/XKCP/XKCP
Bertoni, G., Daemen, J., Peeters, M., Assche, G.V.: KeccakTools (2018). https://github.com/KeccakTeam/KeccakTools
Forsythe, J., Held, D.: NIST SHA-3 competition security audit results. Fortify Software Blog (2009). http://web.archive.org/web/20120222155656if_/blog.fortify.com/repo/Fortify-SHA-3-Report.pdf
Josefsson, S., Liusvaara, I.: Edwards-curve digital signature algorithm (EdDSA). RFC 8032 (2017). http://www.ietf.org/rfc/rfc8032.txt
Kelsey, J., Chang, S., Perlner, R.: SHA-3 derived functions: cSHAKE, KMAC, TupleHash, and ParallelHash. NIST SP 800-185 (2016). https://doi.org/10.6028/NIST.SP.800-185
Menezes, A., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, BOca Raton (1996). https://doi.org/10.1201/9781439821916
Mouha, N.: Automated techniques for hash function and block cipher cryptanalysis. Ph.D. thesis, Katholieke Universiteit Leuven (2012)
Mouha, N., Celi, C.: Extending NIST’s CAVP testing of cryptographic hash function implementations. In: Jarecki, S. (ed.) CT-RSA 2020. LNCS, vol. 12006, pp. 129–145. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40186-3_7
Mouha, N., Raunak, M.S., Kuhn, D.R., Kacker, R.: Finding bugs in cryptographic hash function implementations. IEEE Trans. Reliab. 67(3), 870–884 (2018). https://doi.org/10.1109/TR.2018.2847247
National Institute of Standards and Technology: Announcing Request for Candidate Algorithm Nominations for a New Cryptographic Hash Algorithm (SHA-3) Family. 72 Fed. Reg. (2007). https://www.federalregister.gov/d/E7-21581
National Institute of Standards and Technology: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. NIST Federal Information Processing Standards Publication 202 (2015). https://doi.org/10.6028/NIST.FIPS.202
National Institute of Standards and Technology: Hash Functions: SHA-3 Project (2020). https://csrc.nist.gov/projects/hash-functions/sha-3-project
National Institute of Standards and Technology: Digital Signature Standard (DSS). NIST Federal Information Processing Standards Publication 186-5 (2023). https://doi.org/10.6028/NIST.FIPS.186-5
Polubelova, M., et al.: HACLxN: verified generic SIMD crypto (for all your favourite platforms). In: Ligatti, J., Ou, X., Katz, J., Vigna, G. (eds.) CCS 2020: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, 9–13 November 2020, pp. 899–918. ACM (2020). https://doi.org/10.1145/3372297.3423352
Preneel, B.: Analysis and design of cryptographic hash functions. Ph.D. thesis, Katholieke Universiteit Leuven (1993)
Protzenko, J., Ho, S.: Functional pearl: zero-cost, meta-programmed, dependently-typed stateful functors in F\(^*\). CoRR abs/2102.01644 (2021)
Python Tracker: Issue 37630: Investigate replacing SHA3 code with OpenSSL (2019). https://bugs.python.org/issue37630
Python Tracker: Issue 47098: sha3: Replace Keccak Code Package with tiny_sha3 (2022). https://bugs.python.org/issue47098
Wang, X., Yin, Y.L., Yu, H.: Finding collisions in the full SHA-1. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 17–36. Springer, Heidelberg (2005). https://doi.org/10.1007/11535218_2
Wang, X., Yu, H.: How to break MD5 and other hash functions. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 19–35. Springer, Heidelberg (2005). https://doi.org/10.1007/11426639_2
Zinzindohoué, J.K., Bhargavan, K., Protzenko, J., Beurdouche, B.: HACL\(^*\): a verified modern cryptographic library. In: Thuraisingham, B., Evans, D., Malkin, T., Xu, D. (eds.) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–03 November 2017, pp. 1789–1806. ACM (2017). https://doi.org/10.1145/3133956.3134043
Acknowledgments
The authors would like to thank Benjamin Livelsberger, Olivera Kotevska, Kevin Stine, and their NIST colleagues for their useful comments and suggestions. We also thank the Keccak team for their quick response to update their codebase and coordinate the disclosure of the vulnerability, and the security teams and maintainers of the Python, PHP, PyPy, and SHA3 for Ruby projects for promptly fixing the vulnerability. Products may be identified in this document, but identification does not imply recommendation or endorsement by NIST, nor that the products identified are necessarily the best available for the purpose.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
A Proof of Concept Code
A Proof of Concept Code
Below we provide proof-of-concept code that runs with little to no modification (assuming the necessary packages are installed) on a 64-bit Ubuntu Linux platform. The proof of concept script will attempt to set up a Python crash, a PHP crash, a second preimage, a preimage of zero, a preimage of an attacker-chosen value, and a buffer exploit on a file hashing tool. The script assumes docker is installed with access to a non-root user. As explained below, the location of the return address and the attacker’s IP address may need to be modified for the attack to work.
Expected output:
File run_all_attacks.sh:
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mouha, N., Celi, C. (2023). A Vulnerability in Implementations of SHA-3, SHAKE, EdDSA, and Other NIST-Approved Algorithms. In: Rosulek, M. (eds) Topics in Cryptology – CT-RSA 2023. CT-RSA 2023. Lecture Notes in Computer Science, vol 13871. Springer, Cham. https://doi.org/10.1007/978-3-031-30872-7_1
Download citation
DOI: https://doi.org/10.1007/978-3-031-30872-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30871-0
Online ISBN: 978-3-031-30872-7
eBook Packages: Computer ScienceComputer Science (R0)